How we listen for differences in audio

Couple of great points in this video about how to describe what we are hearing and what we like/don’t like: how experience improves certain aspects of your listening, and whether that is even something people would want. Like most things, learning takes time, and it is hard to discern what changes we may hear from DACs, amps, and headphones, let alone to describe those changes. There are a couple of sound examples they give too, like sibilance and decay. I also included some definitions I “stole” from Moon Audio and Headphonesty that I like and am still learning about:


  • Attack: The time it takes for a sound to rise from zero to its maximum amplitude.
  • Decay: The time it takes for the sound to fall from that peak to the sustain level.
  • Sustain: The level the sound holds until the note is released.
  • Release: The time it takes for the sound to fall from the sustain level back to zero amplitude.
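Those four stages form the classic ADSR envelope. As a rough illustration (my own sketch with made-up stage durations, not something from the video), here is how you could build one in NumPy:

```python
import numpy as np

def adsr_envelope(attack, decay, sustain_level, sustain_time, release, sr=44100):
    """Build an ADSR amplitude envelope (values in 0..1) as a NumPy array.

    attack, decay, sustain_time, and release are durations in seconds;
    sustain_level is the amplitude held during the sustain stage.
    """
    a = np.linspace(0.0, 1.0, int(attack * sr), endpoint=False)           # rise to peak
    d = np.linspace(1.0, sustain_level, int(decay * sr), endpoint=False)  # fall to sustain
    s = np.full(int(sustain_time * sr), sustain_level)                    # hold
    r = np.linspace(sustain_level, 0.0, int(release * sr))                # fade to zero
    return np.concatenate([a, d, s, r])

# A fast attack with a longer release, a bit like a plucked string:
env = adsr_envelope(attack=0.01, decay=0.05, sustain_level=0.7,
                    sustain_time=0.2, release=0.3)
t = np.arange(len(env)) / 44100
note = np.sin(2 * np.pi * 440 * t) * env    # shape a 440 Hz tone with the envelope
```

Shorter attack and decay times are what make a sound feel “fast” or “snappy”; a longer release is heard as a lingering tail.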


Airy

Describes the space and openness of the product, usually associated with open-back headphones and live music.


Boomy

An abundance of, or uncontrolled, bass response that overwhelms or interferes with higher (midrange) frequencies.


Coloration

The effect of a device on the music signal; the opposite of “neutral.” Various aspects can affect the tone, responsiveness, or frequency response of the music/audio.


Congestion

Poor clarity caused by overlapping sounds. Congested sound signatures lack detail and clarity, making it hard to hear separate instruments; they may also be called muddy or muffled.


Depth

Describes how far apart the instruments sound from front to back.


Dynamics

The variation in loudness between notes or sections within the music.


Dropout

A loss of a sample or block of samples in a bitstream during playback on a digital device, introducing noise. It can be caused by a number of factors, including sync/word-clock errors or buffer issues with the interface. It can happen on any digital device, which is why stable clocking and adequate buffering matter for playback. (Jitter is a related but different problem: timing error in the sample clock rather than lost samples.)
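A dropout like the one described above is easy to simulate: zero out a block of samples and listen for the click at the block edges. A minimal sketch (illustrative only; real dropouts depend on the device and driver):

```python
import numpy as np

sr = 44100
t = np.arange(sr) / sr                    # one second of audio
signal = np.sin(2 * np.pi * 440 * t)      # a 440 Hz test tone

# Simulate a lost 128-sample block: the device typically plays
# silence where the missing samples should have been.
dropped = signal.copy()
start = 22050                             # dropout begins at 0.5 s
dropped[start:start + 128] = 0.0

# The abrupt discontinuities at the block edges are what you
# hear as a click or a short burst of noise.
```

At 44.1 kHz a 128-sample block is under 3 ms, yet the discontinuity is usually quite audible.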


Listening Style

The way a user prefers to listen to music. Typically, users may prefer to listen more analytically, or they may prefer to relax and “get lost” in the music. There’s no right or wrong way to listen to music.

Low-Level Detail

The subtlest elements of musical sound, which include the delicate details of instrumental sounds and the final tail of reverberation decay.


Open

Described as good width and depth in the presentation of sound, with plenty of room between the instruments.


Punchy

Indicative of strong bass reproduction and dynamics, with fast attack and short decay, giving a sense of power while remaining coherent and controlled.


Reverb

Short for reverberation. A diminishing series of echoes spaced sufficiently closely in time that they merge into a smooth decay.


Sibilance

Harsh high-frequency peaks (typically on “s” and “t” sounds) that become unpleasant to the ear if too prevalent.


Soundstage

The ability of the equipment to create a perception of space (width, height, and depth) in the music, within which the instruments and vocalists are located.


Synergy

The interaction or cooperation of two or more audio components in a system which, when combined, produce an effect greater than the sum of their separate effects. Example: the synergy between a DAC and a headphone amp.


Texture

A perceptible pattern or structure in reproduced sound, even if random in nature. Texturing gives the impression that the energy continuum of the sound is composed of discrete particles, like the grain of a photograph.


Timbre

The basic tone of a note, or the recognizable characteristic sound signature of an instrument.


Tonality

In referring to music, tonality is the quality of the instrument’s tone. In referring to audio, it refers to the reproduction of the sound and the accuracy of the original timbres.


Transparency

Described as clarity in the sound presentation; being able to distinguish details and qualities.


Veiled

Lack of full clarity due to noise or loss of detail from limited transparency.


Warm

Engaging vocals, bumped mid-bass, and a clear midrange. Full sounding, with clarity.


Weight

The feeling of solidity and foundation contributed to music by extended, natural bass reproduction.


Width

The apparent lateral spread of a stereo image. If appropriately recorded, a reproduced image should sound no wider or narrower than the original.


How? Demo low-end, mid-range, hi-fi, and TOTL gear one after the other. Then you will know… :wink:

Two things to note about listening, at least for me:

  1. How can people hear SO differently? I have bought several pieces of gear that were and are universally praised. And I could not sell them fast enough.

  2. I am rocking about ten sets of headphones as I love variety. However, sometimes going from one set to another is not a pleasant experience. I’ll wear one set for a few days straight and then want to try something different. The next set (one that I have loved, or at least really liked, for months) goes on and sounds veiled, rough, bright, etc. After about 10 to 15, maybe even 20, minutes, I am loving the new sound and set for days. Funny how the brain works. :grin:


How much of that is the recording?


It all depends on how deep you want to go…

For some changes in patterns you don’t need that much training; some changes you can detect with the naked eye. For other, minute changes in pattern you will need a microscope as an instrument, and you will need the training and know-how to recognize what you are looking for.

For the subtle, minute changes in pattern, you need to be familiar, be aware, and have put your time in, so you can listen, see, smell, and taste with training.

It is like getting a new job: at first you are lost, and others who have been there take advantage of your lack of know-how, but as you stay longer, eventually you start to realize which department is better, what shift is better, which supervisor is the worst, etc. You have got to put your time in, so you can decode the pattern.

With music it also has to do with your personal taste, or with defects in your ears and brain. Some people cannot tolerate certain sounds; they can immediately identify a change in pattern because of the pain it brings to their ears.

You can have a system as simple as a powered speaker connected to a phone via the same 3.5mm-to-RCA audio cable. You can have two phones of the identical model number, download the same music app on both, play the same exact song back to back, and one phone will sound more pleasing to you.

So what if you buy the speakers, connect them to the one phone that you have, and it is not a good match? The phone doesn’t sound pleasing, and now you are blaming the speakers. But if you connect a different audio source, those speakers will have a completely different sound, and you don’t need that much training to detect that.

The only issue is that we are stuck with the phone that we have; we do not design or create any part of the phone, so we are stuck with what we get. In my opinion, devices can differ greatly.

My iPhone 6S Plus sounded much better than my iPhone 7 Plus.
My 2019 iPad sounded much better than my 2020 iPad, so I am looking for another used 2019 iPad because I can’t find one new anymore.

That is the frustrating part, we are at their mercy.


Warning: I’m about to talk about brains again!

First, read this: What do we actually mean by training our ears? - #4 by WaveTheory

…because how human brains encode information is a very dynamic process that builds on what happened before. Every human brain works by taking in sensory information and tagging it with memories that it deems relevant (a process we have no control over, unfortunately) for later retrieval. If you experience an event, your brain will immediately and unconsciously search for the things it has tagged as having some relevance to that event and bring those up to be quickly accessed. The set of memories/ideas/thoughts that comes up creates a frame through which we experience the new thing. For example: pizza. What happened when you read that word? You probably pictured a round, flat food with a tomato-based sauce and a variety of toppings like cheese or pepperoni or mushrooms…and maybe even recalled some smells or the name of your favorite pizza place or that time your college buddy dumped a slice all down the front of his shirt or you argued with yourself about New York vs Chicago. All of those memories are associated with that one word ‘pizza’. You may be introduced to a new type of pizza, or eat at a new pizza restaurant. And the moment you do your brain is going to bring up your pizza memories - specifically the patterns it associates with pizza - and compare and contrast the new sensory experiences, the new smells, textures, tastes, even the feeling of being in that new pizza place, with the existing ones it already has. That ‘tagging’ - connecting the new experiences to the old - is how our brains have evolved to form new memories and that’s how you’re going to remember the new experience of new pizza at a new pizza place.

Now, what happens if you introduce someone who has never eaten pizza to pizza? Their brain is going to pull up anything and everything it can to make sense of this new experience; do I taste tomato? This bread tastes like the flatbread I ate at my Aunt’s house that one time…do I tear off chunks like I did with that flatbread?…as examples.

Great. What does this have to do with audio? Well, even though every human brain works by the same mechanism of taking in stimuli and storing them, what gets stored and how it gets stored is strongly connected to our experiences and what our brain deems relevant (which again, unfortunately we have no conscious control over). No two people are going to have identical experiences, so no two people are going to connect the same incoming stimulus to the same relevant memories. This means that some music listeners, for whatever reason, are more likely to key in on bass and notice differences in bass performance. Others are going to hear vocal differences first. The list goes on and on… It’s not really about hearing differently, it’s about perceiving differently.

This is also why in general, as the Abyss video points out, the more experience you have listening, and the wider and more diverse your listening experience becomes, the easier it gets to perceive differences. When you have more points of reference to compare to, you have a better sense of what’s new. Also, the more experience we gain in ANYTHING, the more we can take in about that thing at once. The more expertise we gain, the more we can pay attention to at any one time because our brains have more relevant experiences to bring up and handle at one time.

OK, that’s enough rambling. I hope some find this useful.


THIS…It is so easy to get “used to” a certain sound and then declare it as “good”. The only way I could tell what I really prefer was to switch gear rather quickly after a day or two of not listening and then see what sounded “best”. I have realized that there is no best…just what our individual ears like, and that changes over time too.


The golden ear.

This particular Abyss video is great, and I enjoy their podcast overall.
Here’s another well-done video explaining the history and evolution of digital audio, along with simulations of what different bit depths and sample rates sound like. It goes great with headphones.
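The bit-depth half of that simulation is easy to try yourself. Here is a rough NumPy sketch (my own illustration, not the method used in the video): quantize a tone to fewer levels and measure how the error grows.

```python
import numpy as np

def quantize(signal, bits):
    """Round a float signal in [-1, 1] to the nearest of 2**bits levels."""
    levels = 2 ** (bits - 1)                # signed levels per side
    return np.round(signal * levels) / levels

sr = 48000
t = np.arange(sr) / sr
tone = 0.8 * np.sin(2 * np.pi * 1000 * t)   # 1 kHz test tone

for bits in (4, 8, 16):
    q = quantize(tone, bits)
    err = tone - q                          # quantization error ("noise")
    snr_db = 10 * np.log10(np.mean(tone ** 2) / np.mean(err ** 2))
    print(f"{bits:2d}-bit: measured SNR ~ {snr_db:.1f} dB")
```

Each extra bit doubles the number of levels, which shrinks the error and buys roughly 6 dB of signal-to-noise ratio.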

For me, it’s fun to listen for differences no matter what I’m playing.

Has a few inaccuracies when it comes to the mathematics behind it all.

This comment corrects the biggest misunderstanding:

To make this clear: The Sampling rate has NOTHING to do with Signal to Noise ratio.
That would be like saying the gearbox in your car influences the fuel capacity of the gas tank.
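That correction matches the standard quantization math: for ideal PCM, the noise floor is set by bit depth, with SNR ≈ 6.02·N + 1.76 dB for an N-bit full-scale sine, regardless of sample rate (sample rate instead sets the highest representable frequency). A quick check:

```python
def pcm_snr_db(bits: int) -> float:
    """Theoretical SNR of an ideal N-bit PCM quantizer for a full-scale sine."""
    return 6.02 * bits + 1.76

for bits in (16, 24):
    print(f"{bits}-bit PCM: ~{pcm_snr_db(bits):.0f} dB")
# 16-bit works out to about 98 dB and 24-bit to about 146 dB,
# whether you sample at 44.1 kHz or 192 kHz.
```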
