Is it possible to check the similarity of two recorded sounds, "throwing out" noise and a time offset?

Hello, users of this forum section! I have downloaded this plugin for audio analysis, but I don’t know the technical side of the question in the title (I am a programmer, not a sound designer). I want some set of checks (energy, spectral, etc.) to verify that, for example, during the last 5 seconds there was a sound “equivalent” in its “main” tones to a previously recorded one (it is clearly understandable that two “real” sounds recorded at different moments in time cannot be equal “bit by bit” because of noise and other natural factors), possibly with a different start time for that “main” sound. The “sensitivity” must be such that the check usually identifies sounds produced by the same source (a bell ringing, a door opening, etc.) as equivalent, but not so loose that the program recognizes a random sound that appeared in the background as equivalent. I think this is achievable by an ordinary programmer, not only a deep professional.

This is, in general, still an open research problem.

That being said, you may be able to do something like training an open-source deep learning model to detect similar sounds. Two models worth looking at are “Spleeter” (which by default separates vocals from accompaniment) and “Whisper” (which does speech recognition).
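
As a rough, untested sketch of the deep-learning route (this is an assumption about what would fit, not something the models above do out of the box): a general-purpose audio embedding model such as OpenL3 can turn each clip into a vector, and two clips can then be compared with cosine similarity, which tolerates noise and moderate time offsets reasonably well. It assumes the openl3 and soundfile Python packages, and the threshold is only illustrative.

import numpy as np
import openl3
import soundfile as sf

def clip_embedding(path):
    # Mean-pooled OpenL3 embedding of one audio file. Pooling over time makes
    # the comparison fairly insensitive to where the sound starts in the clip.
    audio, sr = sf.read(path)
    if audio.ndim > 1:
        audio = audio.mean(axis=1)          # downmix stereo to mono
    emb, _ = openl3.get_audio_embedding(audio, sr, content_type="env",
                                        embedding_size=512)
    pooled = emb.mean(axis=0)
    return pooled / np.linalg.norm(pooled)

def similar(path_a, path_b, threshold=0.8):
    # Cosine similarity of the two pooled embeddings; the 0.8 threshold is
    # only a starting point and would need tuning on real recordings.
    score = float(np.dot(clip_embedding(path_a), clip_embedding(path_b)))
    return score >= threshold

Usage would be something like similar("reference.wav", "last_5_seconds.wav"). Whether mean-pooling is good enough, or whether you need a sliding window over the longer recording, depends on how long the clips are.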

You might be able to do something by figuring out what kinds of sounds you want to detect, and which frequency ranges are most salient for those sounds, and then use an FFT as part of the detection. Because the sounds may play at slightly different speeds, a matching algorithm might start from the spectrogram of the reference recording, extract the fingerprints you want to “hit” in the input sound, and then scan the input sound for that sequence of fingerprints.

So, very briefly:

to record:

spectrograms = empty vector of spectrograms
lastspectrogram = empty
foreach block in original sound:
    blockspectrogram = calculate spectrogram from block
    if lastspectrogram is empty or different_enough(lastspectrogram, blockspectrogram) then
        spectrograms = append blockspectrogram to spectrograms
    lastspectrogram = blockspectrogram

to detect:

index = 0
foreach block in input sound:
    blockspectrogram = calculate spectrogram from block
    if not different_enough(blockspectrogram, spectrograms[index]) then
        index++
        if index == length(spectrograms) then
            return Success
return Failure

Something like that.

Obviously, “different_enough” needs to be tuned based on the kinds of sounds you’re interested in detecting, and “calculate spectrogram” should probably exclude buckets for frequencies you’re not interested in.
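
For completeness, here is a minimal, untested Python sketch of the pseudocode above, assuming numpy and scipy. The block length, frequency band and distance threshold are placeholders that would have to be tuned for the sounds you care about, and different_enough is implemented here as a cosine distance between normalized, band-limited spectra, which is only one possible choice.

import numpy as np
from scipy.signal import spectrogram

BLOCK_SECONDS = 0.5            # length of one analysis block (placeholder)
FREQ_RANGE = (200.0, 4000.0)   # band assumed to be salient for the target sound
THRESHOLD = 0.35               # distance above which two blocks count as "different enough"

def block_spectrum(block, sample_rate):
    # Average magnitude spectrum of one block, restricted to FREQ_RANGE
    # (the "calculate spectrogram" step, with uninteresting bins excluded).
    freqs, _, sxx = spectrogram(block, fs=sample_rate, nperseg=1024)
    band = (freqs >= FREQ_RANGE[0]) & (freqs <= FREQ_RANGE[1])
    spectrum = sxx[band].mean(axis=1)
    norm = np.linalg.norm(spectrum)
    return spectrum / norm if norm > 0 else spectrum

def different_enough(a, b):
    # Cosine distance between two normalized spectra.
    return 1.0 - float(np.dot(a, b)) > THRESHOLD

def blocks(samples, sample_rate):
    step = int(BLOCK_SECONDS * sample_rate)
    for start in range(0, len(samples) - step + 1, step):
        yield samples[start:start + step]

def record_fingerprints(samples, sample_rate):
    # "to record": keep a block's spectrum whenever it differs from the previous block's.
    fingerprints = []
    last = None
    for block in blocks(samples, sample_rate):
        spec = block_spectrum(block, sample_rate)
        if last is None or different_enough(last, spec):
            fingerprints.append(spec)
        last = spec
    return fingerprints

def detect(samples, sample_rate, fingerprints):
    # "to detect": scan the input for the recorded fingerprints, in order.
    if not fingerprints:
        return False
    index = 0
    for block in blocks(samples, sample_rate):
        spec = block_spectrum(block, sample_rate)
        if not different_enough(spec, fingerprints[index]):
            index += 1
            if index == len(fingerprints):
                return True
    return False

One thing to watch: as written (and as in the pseudocode), the fingerprints only have to appear in order, not back to back, so over a long input stream accidental partial matches can accumulate. You may want to reset index if too much time passes between hits.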


Thank you for your post. It was interesting to me, but, of course, it is not a solution. I hope I will still return to this problem, but not right now. Good luck!

In what way? Did you expect working code that you can copy-and-paste?

Sorry for the late answer. In the sense that the problem has not been solved. Do you understand what the word “solution” means? It means that the problem no longer exists. In this case that is not so. I liked your post because it is interesting and informative for me, but I think it is obvious that it is not the end of the problem.

You have to be a little more specific about what the problem is now.
What does “solution” look like to you?
Did you implement the algorithm and it didn’t work?

I did not implement the algorithm because it is not urgent at the moment. I had asked in general terms. Today I asked another question that is more important right now. The present question is not like that; it was asked only for information.