New Algorithm Captures What Pleases the Human Ear—and May Replace Human Instrument Tuners

By Sarah Zhang | March 29, 2012 10:16 am

As computer hardware and software become ever more powerful, they match and then exceed many human abilities. One point of superiority that humans have stubbornly refused to yield is tuning musical instruments. Pythagoras identified the precise, mathematical relationships between musical tones over 2,000 years ago, and modern machines can beat out any human when it comes to precise math. So why aren’t computers better than people? The professional tuner does have one incontrovertible advantage: a trained human ear.

Imprecision, it turns out, is embedded in our scales, instruments, and tuning system, so pros have to adjust each instrument by ear to make it sound its best. Electronic tuners can’t do this well because there has been no known way to calculate the adjustment. Basically, it’s an art, not a science. But now, a new algorithm posted on arXiv claims to be just as good as a professional tuner. To understand how this new algorithm works, it’s worth understanding how today’s electronic tuners don’t work.

Human 1, Machine 0

One major problem with automatic tuning is baked into the Western musical system and the limits of human hearing. In the equal temperament system, which is used for most modern Western instruments, the frequency of each note is greater than the half-step below it by a factor of 2^(1/12), or 1.0595. If you go up by 12 of those half-steps, the frequency of the note is twice where you started: an octave.
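
The arithmetic can be checked with a few lines of Python. This is a minimal sketch: the 440 Hz reference pitch and the names `SEMITONE` and `note_frequency` are assumptions chosen for illustration, not anything taken from the paper.

```python
# Equal temperament: each half-step multiplies the frequency by 2^(1/12).
SEMITONE = 2 ** (1 / 12)

def note_frequency(base_hz, half_steps):
    """Frequency of the note `half_steps` half-steps above `base_hz`."""
    return base_hz * SEMITONE ** half_steps

A4_HZ = 440.0  # assumed reference pitch, used only for illustration

print(round(SEMITONE, 4))                   # 1.0595
print(round(note_frequency(A4_HZ, 12), 6))  # 880.0, one octave up
```

Twelve half-steps multiply the frequency by (2^(1/12))^12 = 2, which is why the octave comes out exact while the intervals in between do not.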

But there’s a problem: Equal temperament systems don’t exactly generate intervals like a perfect fifth, where the ratio of the frequencies between the top and bottom notes should be exactly 3:2. (A perfect fifth is the leap from the first “Twinkle” to the second in “Twinkle Twinkle Little Star.”) On an instrument tuned according to strict equal temperament, the top note of a perfect fifth is 2^(7/12) times the frequency of the bottom, or 2.997:2 rather than exactly 3:2, and our ears naturally find whole-number ratios between the frequencies of notes to be most pleasing. (This previous sentence was corrected based on a comment below.) A musician with a good ear can hear the subtle difference. Over the entire range of an instrument, from its lowest to highest notes, this small difference is compounded, and interferes with what should be the pleasant, harmonious sounds of its overtones, the higher-pitched, secondary sounds created by any instrument.
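
To put a number on that gap, here is a small Python sketch comparing the two fifths. It expresses the difference in cents (1,200 cents per octave), a standard unit that the article itself does not use; the function and variable names are illustrative.

```python
import math

def cents(ratio):
    """Size of a frequency ratio in cents (1,200 cents = one octave)."""
    return 1200 * math.log2(ratio)

equal_fifth = 2 ** (7 / 12)  # seven equal-tempered half-steps
just_fifth = 3 / 2           # the whole-number ratio

print(round(2 * equal_fifth, 3))                 # 2.997, i.e. 2.997:2
gap_cents = cents(just_fifth) - cents(equal_fifth)
print(round(gap_cents, 2))                       # 1.96
```

The equal-tempered fifth is flat by roughly two cents, about a fiftieth of a half-step.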

The solution is to fudge it: tuners “stretch” the frequencies of some strings to make the instrument sound good overall. A seasoned pro figures out a way to optimize the instrument’s sound based on both our purely mathematical musical system and on human psychoacoustics. This stretching makes a pro-tuned instrument sound noticeably better than an electronically tuned one. The chart above shows an example of stretching on a human-tuned piano.

Human 1, Machine 1? 

The new study replaces the human ear’s ability to detect “pleasingness” with an algorithm that minimizes the Shannon entropy of the sound the instrument produces. (Shannon entropy is related to the randomness in a signal, like the waveform of a sound, and is unrelated to the entropy of matter and energy.) Entropy is high when notes are out of tune, say the researchers, and it decreases as they get into tune. The algorithm applies small random changes to a note’s frequency until it finds the lowest level of entropy, which is the optimal frequency for it, say the researchers. And setting tuners to follow this algorithm instead of the current, simpler formula would be an easy fix.
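
The idea can be illustrated with a toy Python sketch. It is not the paper’s implementation: the two-note setup, the purely harmonic overtones, the Gaussian peak shape, and every constant and function name below are assumptions made for illustration. One note is held fixed, the other starts 20 cents sharp, and random nudges are kept only when they lower the entropy of the combined spectrum.

```python
import math
import random

BIN_CENTS = 2.0     # grid resolution on a log-frequency axis (assumed)
WIDTH_CENTS = 15.0  # width of each spectral peak (assumed)
F_REF = 100.0       # origin of the grid in Hz (assumed)

def spectrum(fundamentals, n_partials=8):
    """Toy spectrum: one Gaussian peak per harmonic overtone of each note."""
    bins = {}
    reach = int(6 * WIDTH_CENTS / BIN_CENTS)
    for f0 in fundamentals:
        for n in range(1, n_partials + 1):
            center = 1200 * math.log2(n * f0 / F_REF) / BIN_CENTS
            nearest = round(center)
            for k in range(nearest - reach, nearest + reach + 1):
                offset = (k - center) * BIN_CENTS
                peak = math.exp(-0.5 * (offset / WIDTH_CENTS) ** 2) / n
                bins[k] = bins.get(k, 0.0) + peak
    return bins

def shannon_entropy(bins):
    """Shannon entropy of the spectrum, normalized to a probability distribution."""
    total = sum(bins.values())
    return -sum((v / total) * math.log(v / total) for v in bins.values() if v > 0)

def tune(fixed_hz, start_hz, steps=500, step_cents=2.0, seed=0):
    """Random search: keep a small random pitch change only if entropy drops."""
    rng = random.Random(seed)
    best = start_hz
    best_entropy = shannon_entropy(spectrum([fixed_hz, best]))
    for _ in range(steps):
        nudge = rng.uniform(-step_cents, step_cents)
        trial = best * 2 ** (nudge / 1200)
        entropy = shannon_entropy(spectrum([fixed_hz, trial]))
        if entropy < best_entropy:
            best, best_entropy = trial, entropy
    return best, best_entropy

start_hz = 440.0 * 2 ** (20 / 1200)  # upper note begins 20 cents sharp
tuned_hz, tuned_entropy = tune(220.0, start_hz)
print(round(start_hz, 2), "->", round(tuned_hz, 2))
```

In this toy the overtones are exact multiples of the fundamental, so the search simply pulls the upper note back toward a clean octave above 220 Hz. On a real piano the overtones are slightly off, as several commenters below point out, and the lowest-entropy tuning would no longer be the mathematically exact one.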

The paper has a graph comparing the results of human (black) and algorithmic (red) tuning as proof of the latter’s effectiveness. Not bad, but entropy-based tuning hasn’t passed the real test yet: a musician’s ear.

[via arXiv Blog]

Images via Haye Hinrichsen / arXiv

  • JoePasta

    Just thought I’d correct the math a little here:

    A perfect fifth is 2^(7/12) = 2.997:2 (rounded up instead of truncated)
    A perfect fourth is 2^(5/12) = 4.005:3

    So, a fifth is not 2^(5/12), a fourth is not 2^(4/12) (that’s a major third, by the way, or 5.040:4), and so on.

    Otherwise a nice explanation.

  • Amos Zeeberg (Discover Web Editor)

    @Joe: Right you are! Thanks for the correction. Fixed the text.

  • M Raymer

    I’m sorry, but the basic explanation above is incorrect, and not what the paper published on arxiv claims. The ‘problem’ is not with equal temperament, but with inharmonicity of stiff strings. That is, the frequencies of the higher overtones of a single vibrating string are not multiples of the fundamental frequency of the string. Therefore, when playing that string at the same time as a much higher-pitch string, the frequencies don’t line up. For example, a low C has an overtone that is close to but not equal to the fundamental frequency of a C that is 4 octaves higher. This has nothing to do with equal temperament, since the octaves are not affected at all by equal temperament tuning (they are perfect octaves).

    What the arxiv paper does is find a method to best compromise for this inharmonicity effect. The solution found is very close to that found by human tuners, who fully understand what they are doing.

  • Uranium Willy

    I was also under the impression that, because the harmonics of each individual note on a piano are stretched out the thicker the strings get (especially with upright pianos where the strings are even thicker), that the lower notes are tuned to match the first harmonic instead of the fundamental, because this is what our ear pays more attention to.

  • dina renna

    Sounds like we have a genius in our midst…

  • John Lerch

    A mention of the human voice would be nice too. I frequently am a bit off from the church’s electronic organ and the 2 other choir members with the most musical training would rather I were with the electronic organ.

  • Lawrence de M.

    The fallacy of machine tuning is assuming there is one tuning.

    A bit of expansion on the valid comments of M Raymer, John Lerch and Uranium Willy above: The composition of the strings changes discretely over the range, from steel to bronze wound over steel, to double wound. Also, the stiffness of the soundboard is critical in shaping the string harmonics and feeds back to some degree to shift the frequencies of the harmonics. We hear some composite of the fundamental and overtones as a single pitch even though a piano is complex.

    Piano soundboards are also far too small for the fundamentals of the lowest notes. This means that harmonics which tend sharp dominate the bass octaves.

    What no one has mentioned is that composition has intended temperaments. These varied widely in the Baroque era from country to country and city to city. Bach even had his own temperament. In this sense tuning is part of the voicing, which depends on the taste and repertoire of the performer.

    Piano temperament is particularly subject to aesthetic variation when used with just instruments as in a quintet with strings. String quartets often utilize perfect intervals and the piano would then sound most harmonious if tuned for the key. Of course, discord is an equally valid expression and can be equal or preferred to harmony by the tuning.

  • eyesoars

    The arxiv summary makes some sense, but the first four paragraphs of the technology review article are techno-gibberish. The author of the latter appears clueless.

    I can’t see the underlying article, but what M. Raymer says above would make sense in what context I can see. Uranium Willy’s comment also appears to make sense.

    In the guitar world, there are certainly similar inharmonies. A plucked note is initially sharp, and then tapers off towards the tuned note, and this is deliberate because it sounds “best”. Guitars also have bridge adjustments for each string (to lengthen or shorten it) so that they can be “intonated” to make the frets match the natural string harmonics. The size (diameter) and type of the string (typically solid for the treble strings, wound for the bass) have a significant effect on the actual string length required to make the 12th fret match the first harmonic.

  • Tim

    I think everybody is right here.
    First the article:
    I do believe this frequency ratio of 3:2 sounds pleasant for our human ear (better than 2.997:2). Because of this perfect ratio the “entropy” is low. However, I am talking about two perfect sine shaped (direct) sound waves entering 1 ear. Best fit is a ratio of 1:1, but best after that is a ratio of 2:1, the octave…
    Well, a piano, guitar or any physical instrument is not producing a perfect sine. Natural overtones/harmonics/acoustic reflections/natural AM+FM and more color the sound. We just find it most pleasant to hear when that entropy is low. This happens to be a “little off key”, I guess, when playing a physical instrument with many extra frequencies. Are the overtones produced “perfect” overtones? M. Raymer is saying something similar I believe…
    Interesting issue, since I play with synthesis and physical modelling…

  • Chris Pilcher

    Today on my piano I used a tuning program from the net, Dirk’s, and compared it continually with my Yamaha “little” instrument: they agreed perfectly.
    However, I did not have the full facility to work out the stretch over the full 88 notes; nonetheless, after six hours, I have a pleasing result.
    So !!!!!!!!!

