We live in a multi-media world...
So why shouldn't search engines work on audio files as well?
This question originally came up for me when I was looking for a particular episode of RadioLab. This is a wonderful podcast, with much that's thought-provoking and memorable.
Except when you can't remember WHICH episode that memorable comment was made in.
The other time I need to search through audio is when I have a recording of some event, and I'd like to be able to search the TEXT of that recording.
These audio search questions lead to this week's Challenges:
1. Is there some way I can search through all of the podcasts on the internet for ones that mention a particular topic? Let's try finding a few podcasts that discuss the way oceanic tides work. Can you find a podcast or two?
It's not hard to find a podcast about ocean tides:
[ ocean tides podcast ]
will turn up dozens. That's pretty straightforward. Most podcasts have gone out of their way to make the podcast a discoverable object by search engines. That means they have a title page with the name (including the word "podcast") and usually links to audio recordings. The better ones (in my opinion) also have transcripts of the podcast content. (RadioLab does this, but not every podcast does. You have to treasure the ones that do provide transcripts.)
A very real question is: "Have you found ALL of the podcasts?"
This is called coverage--that is, does your search engine provide a really complete set of results for the topic you're searching?
That's a difficult question to answer, but if you compare the results from Bing, DuckDuckGo, and Google, you'll see they're really pretty similar (the top 10 are exactly the same).
Which suggests that we might want to find a search engine that's specialized for podcasts.
So, I did a search for podcast search engines: [ search engine podcast ] and found several. Here's my list:
* CastBox.fm – (about Castbox) All they index is podcasts, so that's all you'll find here. Use the magnifying glass lens to bring up the search box. Alas, I couldn't figure out how to search the contents of a podcast.
* AudioBurst.com - (about) seems well suited for searching recent radio programs, but they couldn't find any podcasts with ocean tide (which seems odd--as we know, there are a bunch). They index the full text of the shows. One nice thing is that you can control the degree of match (exact, all, any).
* Google Podcasts - (about) This is the Google podcast search tool. Oddly, while it seems to index the contents of the podcast, it doesn't find nearly as many podcasts about ocean tides as regular old Google search does. Huh.
* ListenNotes - (about) I have to admit that this did the best job of all of the podcast search tools, finding many plausible casts. It returned so many results that I had to ultimately use double quotes to limit the results to just those with "ocean tides" in the podcast text.
There are a few other podcast search tools, but they're mostly limited in their coverage.
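If you're wondering what the "exact, all, any" match settings (and the double-quote trick) amount to, here is a minimal Python sketch. It is only an illustration under simple assumptions: the `tokenize` and `matches` functions are hypothetical names, and this is not how AudioBurst, ListenNotes, or any other tool actually implements its search.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def matches(query, text, mode="exact"):
    """Return True if `text` matches `query` under the given mode.

    exact: the query words appear together, in order (like double quotes)
    all:   every query word appears somewhere in the text
    any:   at least one query word appears in the text
    """
    q = tokenize(query)
    t = tokenize(text)
    if not q:
        return False
    if mode == "exact":
        n = len(q)
        return any(t[i:i + n] == q for i in range(len(t) - n + 1))
    if mode == "all":
        return set(q) <= set(t)
    if mode == "any":
        return bool(set(q) & set(t))
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical episode descriptions, made up for this example.
episodes = [
    "How ocean tides work",
    "Tides of the ocean and the moon",
    "The ocean is big",
]

for mode in ("exact", "all", "any"):
    hits = [e for e in episodes if matches("ocean tides", e, mode)]
    print(mode, len(hits))
```

The stricter the mode, the fewer results you get back, which is why adding double quotes is a handy way to trim an overly long result list.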
I would be remiss to not mention YouTube as a podcast source. Lots of podcasters put their casts up on YT, so be sure to check there as well.
2. If I have a recording of a conversation, what's the best way to be able to search the contents of that recording for mentions of a particular key word or phrase? How would you recommend I do this? (Bonus points if you can figure out how to do this for more than just English.)
There are a number of ways to do this. Here's the method that my buddy, Henk van Ess, posted about recently.
1. Convert audio to mp3 using one of the many converters available.
2. Use VideoIndexer to upload the file, run speech recognition on it, and index it.
3. Make sure you choose the right language. (And let's hope yours is supported.)
Another way is to use YouTube:
1. Upload your audio to YouTube (yes, create a new YouTube video with just the audio track).
2. After the upload is done, you can get access to a time-stamped text file with the text in it. According to YouTube, they support: English, Dutch, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish.
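Once you have a time-stamped text file, searching it for a key word or phrase is simple. Here is a minimal Python sketch. It assumes each line of the transcript starts with a timestamp such as `0:12` or `[0:01:23]` followed by the spoken text; that layout is an assumption for illustration, and the file you actually get may be formatted differently. The names `parse_transcript` and `find_mentions` are hypothetical, and the sample text is made up.

```python
import re

# Assumed line layout: an optional "[", a timestamp like M:SS or H:MM:SS,
# an optional "]", whitespace, then the caption text.
LINE_RE = re.compile(r"^\s*\[?(\d{1,2}(?::\d{2}){1,2})\]?\s+(.*)$")

def parse_transcript(text):
    """Return a list of (timestamp, caption) pairs; skip lines that don't fit."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            entries.append((m.group(1), m.group(2).strip()))
    return entries

def find_mentions(text, phrase):
    """Return the (timestamp, caption) pairs whose caption contains the phrase.

    casefold() gives a case-insensitive comparison that also behaves
    sensibly for many non-English scripts.
    """
    needle = phrase.casefold()
    return [(ts, cap) for ts, cap in parse_transcript(text)
            if needle in cap.casefold()]

# Made-up sample transcript for demonstration only.
SAMPLE = """\
0:00 welcome to the show
0:12 today we talk about ocean tides
1:05 the moon pulls on the water
1:40 and that is why Ocean Tides rise twice a day
"""

for ts, cap in find_mentions(SAMPLE, "ocean tides"):
    print(ts, cap)
```

One limitation of this sketch: a phrase that is split across two caption lines won't be found, so for short phrases it can help to search for each word separately as well.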
3. How can I find a particular non-spoken sound--say, the bells of Notre Dame or the sound of a glass harmonica?
This wasn't hard, and it was absolutely fun.
The glass harmonica (aka "armonica" -- both names are used) can be easily found on YouTube (examples: Thomas Bloch, Adagio for Glass Harmonica by Mozart, glass harmonica setup and assembly). Likewise for Notre Dame bells, finding pre-fire bells is easy (850th anniversary peal).
But of course, there are other, specialized collections of sounds that you might want to access (or download for your media project). In general, the best approach is to look for that particular collection and then search within the collection. Examples: [cartoon sound effects] or [famous speeches].
And, as always, don't forget the Internet Archive Audio file search. (Not just for podcasts, but also for all of those sounds you've been searching for.)
1. Searching for particular audio is possible, but it might require checking multiple sources. Google is good, but it doesn't have perfect coverage of all the possible audio on the internet. Check multiple sources! (And, every so often, look for a new audio/podcast search tool. You never know what you'll find.)
2. Doing your own speech recognition isn't hard anymore: just look for a transcription service (or use YouTube).
And that RadioLab clip I was looking for? All I remembered about it was that they were talking about some kind of butterfly--a kind of butterfly that was called a "satyr." My query, [ Radiolab satyr butterfly ], was enough to find me the episode... but ONLY because they provided the transcript!
As always... Search on!