Re: Analyzing fundamental frequencies from musical signals



This is about analyzing fine detail in pitch and pitch changes
over time, given that you already know approximately what note
is being sung or played, and that any interfering tones are
at least a minor third or more away.

Note that the strongest frequency might be an overtone of
the fundamental pitch, so you might have to divide down the
strongest frequency to find the pitch relative to the notes
on the score sheet.
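
A rough sketch of that divide-down step, in Python. The function
name and the example numbers are made up for illustration; it
assumes you already know the score pitch to well within a
semitone, as described above.

```python
import math

def divide_down(f_peak, f_score):
    """Map the strongest measured frequency back to the sung pitch.

    f_peak  : strongest frequency found in the spectrum, in Hz
    f_score : nominal frequency of the note on the score, in Hz
    Returns (harmonic number, implied fundamental in Hz,
             deviation of that fundamental from the score in cents).
    """
    # Nearest whole-number ratio tells us which overtone we measured.
    h = max(1, round(f_peak / f_score))
    f0 = f_peak / h
    cents = 1200 * math.log2(f0 / f_score)
    return h, f0, cents
```

For example, divide_down(663.0, 330.0) treats 663 Hz as the 2nd
harmonic of a note sung at 331.5 Hz, a little sharp of the score.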

As you found, zero-padding and using a long FFT, although a very
accurate method of interpolating frequency, will not show fine
detail in the frequency envelope. However, frequency is the
derivative of phase, so what I might try is a technique from
phase vocoding. Use successive overlapped short FFTs and, in the
bin nearest the frequency of interest, compare the phase change
from one frame to the next with the phase change that a tone at
the bin center would produce over the same overlap offset. Plot
that phase difference, accumulated over time. The slope of the
plot will represent the frequency offset from the FFT bin center,
and any curvature in the plot will represent a change in frequency.
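
Here is a minimal sketch of that idea in Python, under some
assumptions of mine: a Hann window, a single DFT bin computed
directly with the standard library (in practice you would take
the same bin from an FFT), and function names (dft_bin,
bin_phase_track) that are purely illustrative. Instead of
plotting the phase difference, it converts each hop's excess
phase straight into a frequency, which is the slope of that plot.

```python
import cmath
import math

def dft_bin(x, start, n, k):
    """Hann-windowed DFT of x[start:start+n], evaluated at bin k only."""
    acc = 0j
    for i in range(n):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * i / n)  # Hann window
        acc += w * x[start + i] * cmath.exp(-2j * math.pi * k * i / n)
    return acc

def bin_phase_track(x, fs, n, hop, f_guess):
    """Frequency estimate per hop from frame-to-frame phase change.

    x       : list of samples
    fs      : sample rate in Hz
    n       : window length in samples
    hop     : offset between successive windows in samples
    f_guess : approximate frequency of interest in Hz
    """
    k = round(f_guess * n / fs)            # nearest bin of interest
    f_bin = k * fs / n                     # bin center frequency
    expected = 2 * math.pi * k * hop / n   # phase advance of a bin-center tone
    freqs = []
    prev = None
    for start in range(0, len(x) - n + 1, hop):
        phase = cmath.phase(dft_bin(x, start, n, k))
        if prev is not None:
            d = phase - prev - expected
            d = (d + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
            # Excess phase per hop -> frequency offset from bin center.
            freqs.append(f_bin + d * fs / (2 * math.pi * hop))
        prev = phase
    return freqs
```

Called as bin_phase_track(x, 44100, 1024, 256, 330.0), it returns
one frequency value per 256-sample hop. The offset it can report
without ambiguity is limited to fs / (2 * hop) either side of the
bin center, because the phase difference wraps.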

This could work with FFT windows as short as maybe a dozen
cycles or less of the dominant frequency (which itself may be
an overtone of the fundamental pitch), so you can get much
better time resolution. For 330 Hz, maybe try 75% overlapped
windows as short as 1024 samples at 44.1 kHz.
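
To put numbers on that suggestion, here is the arithmetic as a
few lines of Python. The variable names are mine; the values are
the ones from the paragraph above.

```python
fs = 44100.0      # sample rate in Hz
f0 = 330.0        # dominant frequency in Hz
n = 1024          # window length in samples
overlap = 0.75    # 75% overlap between successive windows

hop = int(n * (1 - overlap))         # 256 samples between windows
cycles_per_window = f0 * n / fs      # about 7.7 cycles of 330 Hz
bin_spacing = fs / n                 # about 43 Hz between bin centers
time_step_ms = 1000 * hop / fs       # about 5.8 ms per frequency value
max_offset = fs / (2 * hop)          # about 86 Hz before the phase wraps
```

So a 1024-sample window holds fewer than a dozen cycles at 330 Hz
and gives a new frequency value roughly every 6 ms.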

This FFT phase technique differs from autocorrelation in that
autocorrelation requires some interpolation to find the phase
to any given resolution finer than one sample step.
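
For comparison, a small Python sketch of the kind of
interpolation autocorrelation needs. Parabolic interpolation
around the peak lag is my choice here for illustration, not
something the method dictates, and the function names are made up.

```python
def autocorr(x, lag):
    """Unbiased autocorrelation of the sample list x at one integer lag."""
    m = len(x) - lag
    return sum(x[i] * x[i + lag] for i in range(m)) / m

def period_by_autocorr(x, lag_min, lag_max):
    """Period in (fractional) samples, searched between two integer lags."""
    r = {k: autocorr(x, k) for k in range(lag_min - 1, lag_max + 2)}
    # Best integer lag: resolution of one sample step at this point.
    k0 = max(range(lag_min, lag_max + 1), key=lambda k: r[k])
    # Fit a parabola through the peak and its two neighbours and
    # take the vertex as the sub-sample peak position.
    a, b, c = r[k0 - 1], r[k0], r[k0 + 1]
    denom = a - 2 * b + c
    shift = 0.5 * (a - c) / denom if denom != 0 else 0.0
    return k0 + shift
```

At 44.1 kHz a 330 Hz tone has a period of about 133.6 samples, so
the best integer lag alone (134) is off by roughly 1 Hz; the
interpolation step is what recovers the fraction.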

As for multiple voices at differing frequencies, I recommend
multiple microphones and a multi-channel recorder.

IMHO. YMMV.

------Original Message------
dieselviulu@xxxxxxxx wrote:

I'm a soon (hopefully) graduating music teacher who's quite lost in the
general field of signal processing. I'm trying to write my graduating
work about intonation in amateur choir music. My big question in my
work is "How does an amateur choir fine-tune itself when singing a
cappella (without accompanying instruments)?". In order to reliably
answer that question I need to be able to analyze the tones the people
are singing. For my purposes I have estimated that I'll need accuracy
of 0.1% of the frequency. Lowest absolute amounts of acceptable error
would be about 0.066 Hz when the bass sings (very) low C, about 66 Hz.
But that's the extreme situation, I can cope with less.

I'm doing my work in MATLAB, and so far I've been simply using fft.
I've been doing this with a friend who is an engineer, while I'm more
of a musician. I'm very good in mathematics and computing for a music
teacher, but I'm no engineer :). We played with fourier quite a bit,
and found out that it's not very accurate frequency-wise for this
purpose. Then we made quite a bit of progress by using extreme
zero-padding; it was in the scale of adding 1e6 zeros after a sample
with length 8820 points. After that we used contour function to make
our output plot more accurately readable. It provided accurate results
from synthetic signals, but I'm not at all sure that it would be
accurate and reliable on real audio signals. I have a gut feeling that
using that much zero-padding would not be riskless.

We also tried using something that my friend called "autocorrelation"
but it didn't work out. We cancelled that attempt when we got results
that were very nice, clean, accurate and totally incorrect.

The material with which I'm working is live recordings from choir
rehearsals and concerts, recorded on DAT with one stereo microphone,
samplerate 44100, 16 bits. Signal is quite noisy, and the number of
frequency peaks that I want to catch varies between 4 and 8. Frequency
range that I want to study is about from 66 Hz (lowest basses) to 1 kHz
(highest sopranos), although the upper limit could be much higher if it
turns out that I need the harmonics too.

We've been using sample length of 0.2 seconds in fft. The things that I
don't like in fft are:

1) When analyzing a synthetic signal which consists of one clean sine
wave of 440Hz, I get a plot that shows me quite a wide slope with peak
at 440Hz. This means that in the audio signal I cannot tell if for
example all the altos are singing the same tone as they should, or is
there diversity [1] within the altos.

2) fft cannot analyze any changes within the time of sample length.
This means that in the two cases where a) basses all sing a slightly
different tone, making their voice a band rather than a tone, or b)
basses sing the same note, but their fine tuning slides downwards within
the sample length, I get the same kind of result for both cases: a wider
peak.

3) I have to consider all things that are related to harmonics manually
/ visually. This is not impossible, though, since I'm one of the
mentioned basses and I know the music they are singing. This means that
I'm not looking for an audio-to-notation algorithm, since I do have the
sheet music. If I have to do this with only "tuned-up fft" I'd rather
lose the harmonics and study only fundamental tones.

So, is there a better way to do this kind of analysis? I've heard about
MUSIC algorithm, but I don't know anything about it. My ideal system
would present me a plot of 10 seconds of music with 4-8 curves that
represent the fundamental tones of different voices (high soprano,
lower soprano etc.) and how their frequency changes in time (so, a plot
with time as x and frequency as y). It would also (not obligatory if
it's otherwise reliable and accurate) present me with a chance to see
some kind of power-graph of the signal to see how much diversity there
is within one voice. I have a gut feeling that this *should* be
possible since we do that with our ears all the time. Also, all needed
information is in the signal, if I just can get it out because I can
hear a lot of things in the recording. I know some of the reasons why
fft-based analysis cannot give me a spectrum of a "moment" of music
(sample length approaching zero), but I believe ears can so it should
be possible somehow.

I would be very glad about suggestions! I know it's a lot to ask, but
please bear in mind that I'm not actually a mathematician and that
trying out something new in matlab takes several days before I can say
anything about it... Imagine writing a letter with a dictionary in a
language you know almost nothing about :). So if a light bulb lights up
in your head and you know just the thing that works, and why it works,
I'd love to hear that!

Happy processing!

Erkki Nurmi
Sibelius academy, Finland

[1] by diversity within a voice I mean a situation where a loud
"leading" alto is singing a note of 300Hz, and another loud alto in the
other end of the row of altos is singing 295Hz. All the other altos
sing (with varying amplitudes) something between those two frequencies.
This is not a rare situation.


--
Ron Nicholson
rhn A.T nicholson d.0.t C-o-M
