Wednesday, February 23, 2022

Answer: How can I search over audio?

  We live in a multi-media world...   


So why shouldn't search engines work on audio files as well?  

This question originally came up for me when I was looking for a particular episode of RadioLab.  This is a wonderful podcast with much that's thought-provoking and memorable.

Except when you can't remember WHICH episode that memorable comment was made in.

The other time I need to search through audio is when I have a recording of some event, and I'd like to be able to search the TEXT of that recording. 

These audio search questions lead to this week's Challenges: 

1. Is there some way I can search through all of the podcasts on the internet for ones that mention a particular topic?  Let's try finding a few podcasts that discuss the way oceanic tides work.  Can you find a podcast or two? 

It's not hard to find a podcast about ocean tides:  

     [ ocean tides podcast ] 

will turn up dozens. That's pretty straightforward.  Most podcasts have gone out of their way to make the podcast a discoverable object by search engines.  That means they have a title page with the name (including the word "podcast") and usually links to audio recordings.  The better ones (in my opinion) also have transcripts of the podcast content.  (RadioLab does this, but not every podcast does. You have to treasure the ones that do provide transcripts.)  

A very real question is "have you found ALL of the podcasts"?  

This is called coverage--that is, does your search engine provide a really complete set of results for the topic you're searching?  

That's a difficult question to answer, but if you compare the results from Bing, DuckDuckGo, and from Google, you'll see they're really pretty similar (the top 10 are exactly the same).  

Which suggests that we might want to find a search engine that's specialized for podcasts.  
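If you want to put a rough number on how similar two engines' results are, one simple approach is to paste the top results from each into a list and measure the overlap. The sketch below is only an illustration: the URLs are made-up placeholders, and the function names `overlap` and `only_in` are my own, not part of any search tool.

```python
def overlap(results_a, results_b):
    """Jaccard overlap between two lists of result URLs (0.0 to 1.0)."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 1.0  # two empty lists are trivially identical
    return len(a & b) / len(a | b)


def only_in(results_a, results_b):
    """Results that engine A returned but engine B did not, in A's order."""
    seen = set(results_b)
    return [url for url in results_a if url not in seen]


# Hypothetical top results, pasted in by hand from two engines.
engine_a = ["example.org/tides-1", "example.org/tides-2", "example.org/tides-3"]
engine_b = ["example.org/tides-2", "example.org/tides-3", "example.org/tides-4"]

print(overlap(engine_a, engine_b))   # 0.5
print(only_in(engine_a, engine_b))   # ['example.org/tides-1']
```

An overlap near 1.0 means the second engine is telling you nothing new, which is a hint to go looking for a different kind of source.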

So, I did a search for podcast search engines:  [ search engine podcast ] and found several.  Here's my list: 

* CastBox.fm - (about Castbox)  All they index is podcasts, so that's all you'll find here.  Use the magnifying glass lens to bring up the search box.  Alas, I couldn't figure out how to search the contents of a podcast.  

* AudioBurst.com - (about) seems well suited for searching recent radio programs, but they couldn't find any podcasts with ocean tide (which seems odd--as we know, there are a bunch).  They index the full text of the shows.  One nice thing is that you can control the degree of match (exact, all, any).  

* Google Podcasts - (about) This is the Google podcast search tool. Oddly, while it seems to index the contents of the podcast, it doesn't find nearly as many podcasts about ocean tides as regular old Google search does.  Huh.   

* ListenNotes - (about) I have to admit that this did the best job of all of the podcast search tools, finding many plausible casts.  It returned so many results that I had to ultimately use double quotes to limit the results to just those with "ocean tides" in the podcast text.  

There are a few other podcast search tools, but they're mostly limited in their coverage.
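To make the difference between match modes concrete (exact, all, any, and what double quotes do to a query), here is a small sketch of how the three could behave on a snippet of podcast text. This is my own illustration of the idea, not how AudioBurst, ListenNotes, or any other tool actually implements matching; the function names and sample sentences are invented.

```python
import re


def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[\w']+", text.lower())


def matches(query, text, mode="exact"):
    """Return True if text matches query under the given mode.

    exact: the query words appear next to each other, in order
    all:   every query word appears somewhere
    any:   at least one query word appears
    """
    q, t = words(query), words(text)
    if not q:
        return False
    if mode == "exact":
        return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))
    if mode == "all":
        return all(w in t for w in q)
    if mode == "any":
        return any(w in t for w in q)
    raise ValueError(f"unknown mode: {mode}")


# Invented snippets standing in for podcast descriptions.
on_topic = "How ocean tides follow the moon"
metaphor = "The tide is turning for ocean shipping companies."

print(matches("ocean tides", on_topic, "exact"))  # True
print(matches("ocean tides", metaphor, "exact"))  # False
print(matches("ocean tides", metaphor, "any"))    # True
```

The "any" mode is why a loose query pulls in off-topic results, and the "exact" mode is what the double quotes buy you.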

I would be remiss to not mention YouTube as a podcast source. Lots of podcasters put their casts up on YT, so be sure to check there as well.  


2.  If I have a recording of a conversation, what's the best way to be able to search the contents of that recording for mentions of a particular key word or phrase?  How would you recommend I do this?  (Bonus points if you can figure out how to do this for more than just English.)

There are a number of ways to do this.  Here's the method that my buddy, Henk van Ess, posted about recently. 

Method 1: 

1. Convert audio to mp3 using one of the many converters available. 

2. Use VideoIndexer to upload the file, run speech recognition on it, and index the result.  

3. Make sure you choose the right language. (And let's hope yours is supported.)  


Method 2: 

1. Upload your audio to YouTube (yes, create a new YouTube video with just the audio track). 

2. After the upload is done, you can get access to a time-stamped text file with the text in it.  According to YouTube, they support: English, Dutch, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish.
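Once you have a time-stamped transcript saved as plain text, finding a key word or phrase is a few lines of code. The sketch below assumes each line starts with a timestamp such as 02:27 (or 1:02:27) followed by the spoken text; that layout is my assumption, so adjust the pattern if your file looks different. The function name and the sample transcript are invented for illustration.

```python
import re

# Assumed line format: a timestamp such as 02:27 or 1:02:27, a space, then the text.
LINE = re.compile(r"^(\d{1,2}(?::\d{2}){1,2})\s+(.*)$")


def search_transcript(transcript, phrase):
    """Return (timestamp, text) pairs for lines that contain the phrase."""
    needle = phrase.casefold()  # casefold handles case in non-English text too
    hits = []
    for raw in transcript.splitlines():
        m = LINE.match(raw.strip())
        if m and needle in m.group(2).casefold():
            hits.append((m.group(1), m.group(2)))
    return hits


# Invented sample transcript.
sample = """\
00:05 welcome back to the show
00:12 today we are talking about ocean tides
01:40 the moon pulls on the water
02:27 and that is why Ocean Tides come twice a day
"""

for stamp, text in search_transcript(sample, "ocean tides"):
    print(stamp, text)
```

One limitation to keep in mind: a phrase that is split across two transcript lines will be missed, so for longer phrases it can help to search for a single distinctive word first.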

As always, there are a number of additional ways to do this:  Art Weiss recommends HappyScribe.com (he tested it on a Hebrew audio file and was impressed with the accuracy).  I haven't tried it, but if you do, post your results in the comment thread.  

Jon also points to Otter.ai for transcription services.  They have automated speech / audio recognition with summary keywords, highlights, and full audio transcripts. Their service offers 600 mins free every month. It does require an account.  

3.  How can I find a particular non-spoken sound--say, the bells of Notre Dame or the sound of a glass harmonica?  

This wasn't hard, but absolutely fun.  

The glass harmonica (aka "armonica" -- both spellings are allowed) can be easily found on YouTube (examples: Thomas Bloch, Adagio for Glass Harmonica by Mozart, glass harmonica setup and assembly).  Likewise for Notre Dame bells, finding pre-fire bells is easy (850th anniversary peal).  

But of course, there are other, specialized collections of sounds that you might want to access (or download for your media project).  In general, the best approach is to look for that particular collection and then search within the collection.  Examples:  [cartoon sound effects] or [famous speeches].  

And, as always, don't forget the Internet Archive Audio file search.  (Not just for podcasts, but also for all of those sounds you've been searching for.)


SearchResearch Lessons

1. Searching for particular audio is possible, but it might require checking multiple sources.  Google is good, but it doesn't have perfect coverage of all the possible audio on the internet.  Check multiple sources!  (And, every so often, look for a new audio/podcast search tool.  You never know what you'll find.)  

2. Doing your own speech recognition isn't hard anymore: just look for a transcription service (or use YouTube).  


And that RadioLab clip I was looking for?  All I remembered about it was that they were talking about some kind of butterfly--a kind of butterfly that was called a "satyr."  My query, [ Radiolab satyr butterfly ] was enough to find me the episode... but ONLY because they provided the transcript!  

As always... Search on! 

13 comments:

  1. Thanks, Dr Russell!

    It is a great Challenge and lots of new tools and knowledge.

    Is there a way to search YouTube transcripts?

    Let's say we want to search for a phrase or something similar, and it's not part of the title. I searched and found a tool, but it's for learning how to speak a language.

    Can everyone create transcripts on YouTube? I once read that auto-generated transcripts only work in some specific cases.

    For Q3, I never understood the question. I thought you were looking for something other than just the sound on a video. Maybe, I thought, you meant something like ringtones or mp3 sounds. Something like that was my thought. I was wrong.

    Replies
    1. Sorry about that, Ramon. Once you've opened the transcript, you just Control-F your way to find the text you're interested in seeing.

      For Q3 I was interested in ANY kind of sound. YouTube just happens to have a gigantic collection of sounds, so it's a good resource to remember. (People tend to go with what they know, rather than stepping back to think about the entire breadth of options.)

    2. Thanks Dr Russell.

      About transcripts, my question is whether there is a way to search YouTube for some word and have YouTube find it in a transcript. I remember something about this from some time ago. My question (hopefully I can describe it correctly) is this: if we want to find YouTube videos whose transcripts mention certain words, but we don't yet have a video whose transcript we can Ctrl-F, is that possible?

      Youglish.com finds something like that. We search for a word and the site finds the videos in which that word is mentioned on transcripts.

    3. Mexican Oaks. Related to previous SRS Challenges
      https://www.bbc.com/future/article/20220222-the-mystery-of-mexicos-vanishing-stream-oaks

  2. Replies
    1. Nice finds, remmij -- I hadn't seen these. (And I did not know that RadioLab makes their podcasts available on YouTube. I should have checked.)

  3. Won't Google Assistant help identify any song?
    You can try humming or singing it. Here’s how:

    On an Android device, say, “Hey Google,” or touch and hold the home button.
    On an iPhone, open the Google app and tap the microphone button.
    Ask, “What’s this song?”
    If the song is playing, Google Assistant will name it and give you a YouTube link.
    You can also hum, whistle, or sing the melody, and Google will suggest potential matches.

  4. After the usual searches and results with listennotes.com, I departed from the challenge a bit and tried
    [podcast noaa how oceanic tides work]
    I got some podcasts to the point of the challenge, but also this interesting tidbit about how the tides were crucial in planning of the D-Day Normandy invasion:
    https://oceanservice.noaa.gov/podcast/june20/nop36-dday-tides.html

    I repeated the search using “nsf”, “nasa”, and “Friday Harbor Labs”, all (credible) outfits I know and trust. Except for NASA, there was no new material.
    I found one advantage to searching the science-oriented web sites. As I said earlier, the word “tide” has many metaphorical meanings. Some of the podcasts listennotes.com found reflected this with titles such as “Tide is turning, how do you prevent….?” even with oceanic (not in quotes) included.

    On a lark, I searched my tablet for [how oceanic tides work] because that is my podcast device of choice and found (among others) Rick Steves’ Podcast “634 Holy Land Easter; Mudlarking; Higher Tides”. The last segment is an interview with Jonathan White, author of *Tides: The Science and Spirit of the Ocean* which now resides on my library “For later” shelf. One point of discussion in the podcast was harnessing the previously-mentioned extraordinary Bay of Fundy tides to generate electricity. (Coincidentally, mudlarking is possible only because the Thames is a tidal river.)

    I typically listen to podcasts only when I cook but now that I have seen more of what is out there, I will listen more.

  5. I tried creating a transcript on Youtube and was amused that one of the language options was New Zealand English. Yay!

  6. video transcripts - used YouTube to look - seems to be an alternative/augmentation to your suggestions…
    …searched on youtube… took me a while to think of that – duh… can C&P to other documents… there are qualifiers & exceptions.
    YT transcriptions
    the SERP
    used this to try it out…
    used this one to check language options - previous link auto-generated in english only
    offered in Ukrainian… example (with time stamps - can toggle that off): btw, didn't see the NZ English option on these…
    02:27 з росіянами його прозвали привидом Києва ще один приклад
