Wednesday, February 23, 2022

Answer: How can I search over audio?

  We live in a multi-media world...   


So why shouldn't search engines work on audio files as well?  

This question originally came up for me when I was looking for a particular episode of RadioLab.  This is a wonderful podcast with much that's thought-provoking, and memorable.

Except when you can't remember WHICH episode that memorable comment was made in.

The other time I need to search through audio is when I have a recording of some event, and I'd like to be able to search the TEXT of that recording. 

These audio search questions lead to this week's Challenges: 

1. Is there some way I can search through all of the podcasts on the internet for ones that mention a particular topic?  Let's try finding a few podcasts that discuss the way oceanic tides work.  Can you find a podcast or two? 

It's not hard to find a podcast about ocean tides:  

     [ ocean tides podcast ] 

will turn up dozens. That's pretty straightforward.  Most podcasts have gone out of their way to make the podcast a discoverable object by search engines.  That means they have a title page with the name (including the word "podcast") and usually links to audio recordings.  The better ones (in my opinion) also have transcripts of the podcast content.  (RadioLab does this, but not every podcast does. You have to treasure the ones that do provide transcripts.)  

A very real question is "have you found ALL of the podcasts"?  

This is called coverage--that is, does your search engine provide a really complete set of results for the topic you're searching?  

That's a difficult question to answer, but if you compare the results from Bing, DuckDuckGo, and from Google, you'll see they're really pretty similar (the top 10 are exactly the same).  

Which suggests that we might want to find a search engine that's specialized for podcasts.  
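One way to make that comparison concrete is to measure how much two result lists overlap. The snippet below is only a minimal sketch: the URLs are invented placeholders (not real search results), and the `overlap` helper is a hypothetical name, not part of any search engine's tooling. It uses the Jaccard measure, which is the number of shared results divided by the number of distinct results.

```python
def overlap(results_a, results_b):
    """Jaccard overlap between two lists of result URLs (0.0 to 1.0)."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 1.0  # two empty lists are trivially identical
    return len(a & b) / len(a | b)


# Hypothetical top results, invented for illustration only.
engine_one = ["example.org/tides-1", "example.org/tides-2", "example.org/tides-3"]
engine_two = ["example.org/tides-1", "example.org/tides-2", "example.org/tides-4"]

# 2 shared URLs out of 4 distinct URLs
print(overlap(engine_one, engine_two))
```

A score near 1.0 means the engines are returning essentially the same set of pages, so checking a second general-purpose engine adds little coverage.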

So, I did a search for podcast search engines:  [ search engine podcast ] and found several.  Here's my list: 

* CastBox.fm - (about Castbox)  All they index is podcasts, so that's all you'll find here.  Use the magnifying glass lens to bring up the search box.  Alas, I couldn't figure out how to search the contents of a podcast.  

* AudioBurst.com - (about) seems well suited for searching recent radio programs, but they couldn't find any podcasts with ocean tide (which seems odd--as we know, there are a bunch).  They index the full text of the shows.  One nice thing is that you can control the degree of match (exact, all, any).  

* Google Podcasts - (about) This is the Google podcast search tool. Oddly, while it seems to index the contents of the podcast, it doesn't find nearly as many podcasts about ocean tides as regular old Google search does.  Huh.   

* ListenNotes - (about) I have to admit that this did the best job of all of the podcast search tools, finding many plausible casts.  It returned so many results that I had to ultimately use double quotes to limit the results to just those with "ocean tides" in the podcast text.  

There are a few other podcast search tools, but they're mostly limited in their coverage.

I would be remiss to not mention YouTube as a podcast source. Lots of podcasters put their casts up on YT, so be sure to check there as well.  


2.  If I have a recording of a conversation, what's the best way to be able to search the contents of that recording for mentions of a particular key word or phrase?  How would you recommend I do this?  (Bonus points if you can figure out how to do this for more than just English.)

There are a number of ways to do this.  Here's the method that my buddy, Henk van Ess, posted about recently. 

Method 1: 

1. Convert audio to mp3 using one of the many converters available. 

2. Use VideoIndexer to upload the file, run speech recognition on it, and index it.  

3. Make sure you choose the right language. (And let's hope yours is supported.)  


Method 2: 

1. Upload your audio to YouTube (yes, create a new YouTube video with just the audio track). 

2. After the upload is done, you can get access to a time-stamped text file with the text in it.  According to YouTube, they support: English, Dutch, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish.
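Once you have a time-stamped transcript, searching it for a key word is straightforward. Here's a minimal sketch, assuming the transcript has been saved in an SRT-style caption layout (an index line, a start/end time line, then the text); the format you actually download may differ, and the sample captions and the `find_mentions` helper are made up for illustration. The matching uses `casefold()`, so it isn't limited to English text.

```python
import re

# Invented sample captions in an SRT-style layout (assumption, see above).
SAMPLE = """1
00:00:01,000 --> 00:00:04,000
Today we talk about ocean tides.

2
00:00:04,500 --> 00:00:08,000
The moon pulls on the water.

3
00:00:08,500 --> 00:00:12,000
That is why tides rise and fall.
"""

# Captures the HH:MM:SS start time from a "start --> end" line.
TIME_LINE = re.compile(r"(\d{2}:\d{2}:\d{2})[,.]\d{3}\s*-->")


def find_mentions(captions, phrase):
    """Return (start_time, text) for each caption block containing phrase."""
    hits = []
    for block in re.split(r"\n\s*\n", captions.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIME_LINE.match(line)
            if match:
                # Everything after the time line is the caption text.
                text = " ".join(lines[i + 1:]).strip()
                if phrase.casefold() in text.casefold():
                    hits.append((match.group(1), text))
                break
    return hits


for start, text in find_mentions(SAMPLE, "tides"):
    print(start, text)
```

Each hit gives you the time to jump to in the original recording. Note that a phrase split across two caption blocks won't be found by this simple version.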

As always, there are a number of additional ways to do this:  Art Weiss recommends HappyScribe.com (he tested it on a Hebrew audio file and was impressed with the accuracy).  I haven't tried it, but if you do, post your results in the comment thread.  

Jon also points to Otter.ai for transcription services.  They have automated speech / audio recognition with summary keywords, highlights, and full audio transcripts. Their service offers 600 mins free every month. It does require an account.  

3.  How can I find a particular non-spoken sound--say, the bells of Notre Dame or the sound of a glass harmonica?  

This wasn't hard, but it was absolutely fun.  

The glass harmonica (aka "armonica" -- both spellings are allowed) can be easily found on YouTube (examples: Thomas Bloch, Adagio for Glass Harmonica by Mozart, glass harmonica setup and assembly).  Likewise for the Notre Dame bells: finding pre-fire bells is easy (850th anniversary peal).  

But of course, there are other, specialized collections of sounds that you might want to access (or download for your media project).  In general, the best approach is to look for that particular collection and then search within the collection.  Examples:  [cartoon sound effects] or [famous speeches].  

And, as always, don't forget the Internet Archive Audio file search.  (Not just for podcasts, but also for all of those sounds you've been searching for.)


SearchResearch Lessons

1. Searching for particular audio is possible, but it might require checking multiple sources.  Google is good, but it doesn't have perfect coverage of all the possible audio on the internet.  Check multiple sources!  (And, every so often, look for a new audio/podcast search tool.  You never know what you'll find.)  

2. Doing your own speech recognition isn't hard anymore: just look for a transcription service (or use YouTube).  


And that RadioLab clip I was looking for?  All I remembered about it was that they were talking about some kind of butterfly--a kind of butterfly that was called a "satyr."  My query, [ Radiolab satyr butterfly] was enough to find me the episode... but ONLY because they provided the transcript!  

As always... Search on! 

Wednesday, February 16, 2022

SearchResearch Challenge (2/16/22): How can I search over audio?

 Podcasts?  Well, of course!  


While I have my usual line-up of podcasts that I like (you can see some of them above), every so often I'll want to search around through audio to find something that's particularly on my topic of interest.  

Suppose I'm curious about something that we've discussed in earlier SRS posts--say, "how tides work"--is there some way to find a podcast or two on that topic?  That is, without manually scrubbing through lots of podcast descriptions, hoping to find one that mentions ocean tides in the podcast description.  

The other time I need to search through audio is when I have a recording of some event, and I'd like to be able to search the TEXT of that recording. 

These audio search questions lead to this week's Challenges: 

1. Is there some way I can search through all of the podcasts on the internet for ones that mention a particular topic?  Let's try finding a few podcasts that discuss the way oceanic tides work.  Can you find a podcast or two? 

2.  If I have a recording of a conversation, what's the best way to be able to search the contents of that recording for mentions of a particular key word or phrase?  How would you recommend I do this?  (Bonus points if you can figure out how to do this for more than just English.)

3.  How can I find a particular non-spoken sound--say, the bells of Notre Dame or the sound of a glass harmonica?  

As always, let us know HOW you found these.  We'd all like to be better searchers... of audio!  So tell us what you did! 

A big tip of the hat to my friend in online searching skills, Henk van Ess, for the audio search idea in the first place.  


Search on! 



Monday, February 14, 2022

How to find anything #5: Assessing Credibility of News Sources

 #8.3  Assessing Credibility of News Sources 

Foreword:  This is part two of our chapter on finding news (and late-breaking information).  Mario Callegaro and I have been writing this to let you know some of the best practices in how to find online news.  Here we talk about that important step--how to assess the credibility of a news source.  How do you know when you should believe a news source?  
  




When you’re looking for news sources you’ll naturally want to find credible sources.  The hard part here is what’s credible… and how can you spot a credible source?  In other words, do you trust a source to be accurate even when you can’t verify the claim yourself?  That’s why you want to use credible sources: that’s the trust you place in the source.  


In other words, credibility is a measure of how much you trust the source to be accurate and honest about their reporting.  Every news provider will make mistakes, but a credible provider will acknowledge them openly.  



Brands are all about trust. That trust is built in drops and lost in buckets. 

             – Kevin Plank, Feb 21, 2014 – USA Today interview.  



So, how can you identify a credible source?  


I hesitate to give any checklist of features to look for, for two reasons:

  1. Those lists tend to become out-of-date rather quickly, especially when different players start to game the system to appear more credible than they really are; and 

  2. The credibility of a source is built up over time and can’t be determined by examining a single article.  


(But if you want a checklist, here are a few: the CRAP checklist, the MLA checklist, CXL’s checklist, Mike Caulfield’s SIFT list.  You know how to find more.)  


How then can we identify credibility in the news we seek?  


Here’s the truth about credibility:  Identifying a credible site takes time.  Credibility, like trust, is easy to lose and difficult to gain.  Yet that’s what you’re doing as you read the news—you’re learning which sites have credible content.  What all those checklists are saying is that credibility has many hallmarks, but none of them are completely diagnostic—they’re brush strokes that are trying to help you assess how believable you find the source.  That is:  


Credibility is created by sustained accurate and honest reporting by a news source. When a source is credible, the expectation is that the information being presented is accurate, and implicitly, you don’t need to fact-check every detail.  


A particular article might tick all the boxes in a “credibility checklist”: it might have a byline, have citations, and present the information in a reasoned and verifiable way.  It might have quotes that you can trace back to the original source, and present the information in a sober, non-combative way.  Those are all great features to have.  But credibility is more complex than the presence or absence of those individual characteristics.  


Recognize that news sources often have a composite nature (e.g. a large paper will run editorials from different perspectives, and individual writers have different reputations).  News sources are often not monolithic and through-written by an individual.  Instead, the op-ed might well be written by an author with a very different perspective than the editorial stance of the site.  Bear in mind that a single article might well be non-credible, but be found in a credible journal.  (And vice-versa.)  


So, what can one do to assess the credibility of a source?  Here are some aspects to consider:  


Reputation is variable: Credible sources tend to be well-known, they tend to cite each other, and they have a good mutual reputation.  But bear in mind that different sources can have very different reputations in different communities.  You might find the Wall Street Journal to be a credible source of economic information, but not to be a trustworthy advocate for economic policy positions.  (I trust my Mother’s advice about our family’s recipes, but not so much her thoughts on the economy.)  


Check different versions of the story:  When looking for a source of information on a given story, search for a range of sources in the area of interest. Get different perspectives on the story, and then see how your source stands up in the reporting.  Are the facts consistent across different stories?  How do different news sites tell the same story?  The differences between stories by news source are often a useful guide to understanding the site’s credibility.  Often, a news aggregator (also called a feed reader or news reader) will put different versions of a story side-by-side, allowing for comparison reading.

A deep insight about news:  Remember that “facts” of the moment are often transitory and are constantly being updated as new information is discovered.  Keep this in mind as you wade through the news--our understanding of facts, what's going on, and how to interpret what we hear is constantly under revision.  

Credible sites admit to errors:  A site (or author) that never makes a mistake isn’t doing reporting, they’re writing religious tracts.  Check for updates, errata, corrections on the site.  Credible news outlets admit, identify, and correct their mistakes.  Non-credible ones don’t.    

Pay for subscriptions: As you identify credible sources, we encourage you to pay for subscriptions to quality news sources (you will have your own list, but for example, my paid subscription list includes the New York Times, Wall Street Journal, Washington Post, The Atlantic, The Economist, The New Yorker).  We also have some international sources such as BBC and Al Jazeera.

Fact-checking sites:  When determining credibility of news sources, it’s often useful to have some fact-checking sites in your credibility toolbox (e.g. Factcheck.org, Politifact.org, Washington Post Fact-check, AP Fact Check, FullFact.org, or Snopes.com).  Remember that Wikipedia maintains a fact checking page by country.

Credible specialty sites:  If you’re interested in specialty topics, you’ll probably want to find reliable sites that cover only that topic area.  You might be interested in the politics of your city/state/region/country, so we encourage you to spend some time looking at different perspectives on that topic.  This advice is as true for math as it is for political stories.  Different points of view--even about topics that don’t seem to have different perspectives (e.g. math)--can be useful if only as a way to get different ways of telling the same story. For a completely different point of view (for us), we will sometimes check the Australian news services News.AU and the ABC (AU).

Finally, note that credibility is potentially transient—what you find a credible source NOW might not be a credible source next year. (They could have changed, or YOU could have changed. In either case, keep checking!)  

 

The Can-you-tell-the-story? test

One of the simplest ways to self-assess whether a story is believable (and whether or not you understand the story and its sources) is to try to tell the story to someone else.  Imagine that you’re going to tell this story to your mother (or a friend, or whoever you trust).  Can you actually tell the simplest possible version of the story and feel as though you can give an honest account of it?  


If you actually try to re-tell the story (or write it out), gaps in your understanding will often appear.  And, in particular, you might well come to understand whether or not you should trust your news sources.  Does the story make sense, or are there hard-to-follow claims?  Do you understand all of the terms you’ve used in telling the story to your trusted friend?  Do you trust the sources that the news provider used in telling their story?  


All of this will come to the front as you retell the tale.  Can-you-tell-the-story?  It’s a simple sanity check.  If you can’t re-tell the story in a way that YOU believe, then you have to question the credibility of your sources.  



What we do to keep track

Mostly, we collect reputable sites and authors.  As we recommended in the previous post, keep track of the news sources that you find especially good.  Some useful sites that we follow and have in our list of good sites are Techmeme (technology), Metacritic (reviews of movies), (e)Science News (for a feed of science news) and ScienceNews (for another perspective on science news). For health news we follow Healthline, and for business news, Bloomberg.  Ask around within your community to learn what sources your friends and colleagues consider reliable and useful news sources.  

Of course, your mileage may vary: it’s ultimately up to you to find your own useful and believable sites (and then keep track of them).  

You can find your own news sites by searching for [ news site <topic> ].  For instance, [ news site classical music ] will lead you to a bunch of newsy sites on that topic. 

We also learn about authors as we read.  When you find good (and credible) articles, take note of the author.  Good writers tend to write multiple articles within a domain of expertise.  Note their names and seek out their reporting.  

 

___________________

And, just for extra, bonus points and a very different way to find news that you care about, check out the… 

 

Wikipedia Current Events Portal:  The Wikipedia current events portal is an interesting way to get an overview of the world’s stories without much commentary.  There's one line for each major news story, including many international stories.  There's even an archive feature (see red circle in image below) that can take you back to Jan 1994, although depth of coverage drops over time.  


 



Search on!

Thursday, February 10, 2022

Answer: Search in a world of changing names?

 This flower has changed names... 

P/C Daniel M. Russell (2022)

... which is a fairly common thing to do.  People generally call this a shooting star, although I've always known this particular flower as Dodecatheon hendersonii, a name that rolls off the tongue (try saying it aloud!) and has a nice reference to the 12 gods ("dodecatheon") of the Olympian pantheon.  It's also a fanciful name given by Pliny to a primrose purportedly protected by the gods.  

But the name was changed to reflect newer phylogenetic information, which is only the right thing to do.  Goodbye Dodecatheon, hello Primula, where it rejoins the rest of the primrose family.  

This led me to frame these three name-changing Challenges.  Can you figure out a search strategy for each?  

1. Speaking of Mark Twain, did he use any other names besides Mark Twain for writing?  (If so, what are they?) 

We could do the Wikipedia search (which actually is a pretty good article). OR you could do a search like this, spelling out the variations on "pen name": 

     [ Mark Twain pseudonym ] 

     [ Samuel L. Clemens pen name ] 

     [ Samuel L. Clemens nom de plume ] 

all of which will lead you to his aliases.  But I wanted to find only his authorial names (that is, ones under which he published books or articles).  

For something like that, I turn to the Library Of Congress online catalog, knowing full well that the catalog entry would also show me his pseudonyms.  


As you can see, the LOC shows Quintus Curtius Snodgrass, Louis de Conte, and Jean François Alden as other names that Clemens used as well.  

That's a great find.  BUT, as SRS Regular Reader Art showed us, it's often useful to look at multiple sources, each with a slightly different perspective.  With his query of [ Mark Twain "other names" ] he found a few additional resources, including this one from World History Edu about Mark Twain.  From that page he learned that Twain also wrote as “Josh” and as “Thomas Jefferson Snodgrass.” But the first time he used a pen name was in the early 1850s when he signed a sketch in his brother’s newspaper “W. Epaminondas Adrastus Perkins.”

Of course I checked, and sure enough, I found that Clemens had also written with these other names.  So it's clear we need to dig deeper.  

Pro tip:  If you look at the Wikipedia entry for Mark Twain, near the bottom you'll see some expandable sections including one called "Authority Control."  It looks like this:


If you click on the [show] button on the far right, it will expand and show you multiple authority control links: 


The first row (General) are links to different "Authority Control" web pages.  If you click on WorldCat, you'll land on the OCLC WorldCat authority control page for Mark Twain / S. L. Clemens.  Looks like this: 

WorldCat identities authority record for Mark Twain

(What is WorldCat?  It's a meta-library catalog (technically, a "union catalog") that includes the catalogs of over 15K libraries worldwide.  As you can imagine, like all libraries, they need to have an authority control system so you can find all of the works by the same writer, no matter if they changed names or write in a different writing system. This gets complicated when authors come from another orthography.  For example, see the WorldCat record for Lao Tzu.) 

Scroll down a bit on this page and you'll find the complete list of alternative names that Clemens used: 


This list goes on for quite a while.  When you think about it, that makes sense because the works of Mark Twain also appear in other orthographies such as Chinese, Korean, Arabic, Hebrew, Russian, etc.  Farther down this list you'll find: 


As I read through the list, I'd heard of most of his pseudonyms (as seen above), but was surprised to see the name Jean François Alden on the list.  Is that really another pen name? 

Life is complicated.  Searching for this name reveals that the book "Personal Recollections of Joan of Arc, by the Sieur Louis de Conte" is an 1896 novel by Mark Twain which recounts the life of Joan of Arc. The novel is presented as a translation from the French by one "Jean François Alden" of memoirs originally written by Louis de Conte, a fictionalized version of Joan of Arc's page Sieur Louis de Contes. 

So, Mark Twain (Samuel Clemens) wrote the book, but had a fictional memoir writer (de Conte) whose work was (fictionally) translated by Jean François Alden.  Is that a pen name?  Or is Alden just another character in a book by Twain?  I can see it either way, and apparently, so can the Library of Congress, since they also listed Alden in their authority record!  

Have we found ALL of the pen names of Twain/Clemens/Josh/Snodgrass?  It's hard to know because Twain was such a prolific writer that a complete compendium of his work is nearly impossible to compile.  He was an active writer and wrote copiously, often in obscure newspapers, sometimes changing pseudonyms at the drop of a hat. Researchers rediscovered some of his works as recently as 2015.  (See this article from the Guardian about the Mark Twain project at UC Berkeley, where they KEEP finding more articles!)  

So, for Clemens we have Quintus Curtius Snodgrass, Thomas Jefferson Snodgrass, Josh, W. Epaminondas Adrastus Perkins, Sieur Louis de Conte, and Jean François Alden.  But I have no doubt that we'll find more pseudonyms as bibliophiles continue to find more of his writings.  
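To see why an authority record is so handy, it helps to think of it as a lookup table from every variant name to a single preferred heading. The snippet below is just a sketch of that idea, not how WorldCat or the LOC actually store their records: the names are the ones found above, while the choice of "Mark Twain" as the preferred heading and the `build_index` / `resolve` helpers are assumptions made for illustration.

```python
# One authority record: preferred heading -> known variant names.
# The variants come from the searches above; the structure is illustrative only.
AUTHORITY = {
    "Mark Twain": [
        "Samuel L. Clemens",
        "Quintus Curtius Snodgrass",
        "Thomas Jefferson Snodgrass",
        "Josh",
        "W. Epaminondas Adrastus Perkins",
        "Louis de Conte",
        "Jean François Alden",
    ],
}


def build_index(authority):
    """Map every name (preferred or variant) to its preferred heading."""
    index = {}
    for heading, variants in authority.items():
        for name in [heading] + variants:
            index[name.casefold()] = heading
    return index


INDEX = build_index(AUTHORITY)


def resolve(name):
    """Return the preferred heading for a name, or None if it is unknown."""
    return INDEX.get(name.strip().casefold())


print(resolve("josh"))
print(resolve("Jean François Alden"))
print(resolve("Somebody Else"))
```

When a newly rediscovered pseudonym turns up, it only has to be added to the variant list once, and every lookup by that name then lands on the same author.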


2. You know that Istanbul was once known as Constantinople (there's even a song about that!), but what was Saint Petersburg, Russia (that is, the city of Санкт-Петербург) called before its current name?  

I began with the simplest possible search:  [ Constantinople ] and read a few pages (including the inevitable Wikipedia page) where I learned that Constantinople appears in many orthographies (Greek: Κωνσταντινούπολις (Kōnstantinoupolis); Latin: Constantinopolis; Ottoman Turkish: قسطنطينيه‎, romanized as Ḳosṭanṭīnīye).  

It was the capital of the Roman/Byzantine Empire (330–1204), the Latin Empire (1204–1261), the Byzantine Empire again (1261–1453), and then the Ottoman Empire (1453–1922). Officially renamed as Istanbul in 1930, the city is today the largest city and financial centre of the Republic of Turkey. 

What was it before 330?  

In 657 BCE, the ruler Byzas from the Greek city of Megara founded a settlement on the western side of the Strait of Bosporus, which linked the Black Sea with the Mediterranean. Thanks to the pristine natural harbor, his self-named city--Byzantium (Ancient Greek: Βυζάντιον, Byzántion)--grew into a thriving port city.  It kept that name until 330 CE when Constantine consecrated the empire's new capital there as New Rome (Nova Roma), a city which would one day bear his name. Constantinople would become the economic and cultural hub of the region. Its importance would take on new meaning with Alaric's invasion of Rome in 410 CE and the eventual fall of (old) Rome to Odoacer in 476 CE. 

In this case, my search strategy was simple: begin with the obvious search, and then read 5 or 6 different sources (in particular, ones that are NOT duplicates of each other), and look for the points of agreement.  I found these earlier names (Byzántion and Nova Roma) in multiple places with links to various histories of Turkey and Istanbul, so I'm pretty confident that these are the earlier names.  

Applying the same search technique to St. Petersburg tells us that it was founded on May 16, 1703, by Tsar Peter I (aka Peter the Great) at the mouth of the Neva, not far from Nien, at the site of a captured Swedish fortress. Later, the name was changed to Petrograd (1914–1924) and then Leningrad (1924–1991), before reverting to St. Petersburg in 1991.  (For reference, I used the Russian St. Petersburg Wikipedia page with Google Translate.)  

 

3. What were projected moving images (what we would now label a "movie") called before 1900?  

For this I took a historic approach with a search for: 

     [ history of movies ] 

and landed on the Wikipedia page for the History of Film along with its related articles History of Film Technology and Precursors of Film. While around 1659 the magic lantern was developed by Christiaan Huygens to project still images, it took a while to get the tech working for moving images.  But I will leave you to read about the detailed history.  (Fascination warning: You might go down the rabbit hole of old film mechanisms and never emerge.)  

We could spend many happy hours reading about the early motion image systems, but here's a list of the ones I found:   Bioscop, Zoetrope, Théâtre Optique, Phenakistiscope, Kinetoscope, Zoopraxiscope, Praxinoscope, etc.  

And, of course, my favorite image from this era (not a projected image, but the first motion capture): 

The Horse in Motion. A GIF of E. Muybridge's still photos of a
racehorse running past the Red Barn on Stanford campus.
1878 (P/C Wikimedia)  



SearchResearch Lessons 

1. An authority file has the "master record" of an author's names, pen names, pseudonyms, and spellings in other languages.  WorldCat and the LOC have the best authority files, but they're still a little wonky to use.  (I'll write another article soon about how to use them effectively.)  

2. Even so, it's worth looking around for other pseudonyms.  The authority files might not be conclusive.  (As we found with "Josh" as a pen name for Twain.)  

3. When searching in other orthographies (e.g., Russian), it's useful to consult multiple sources, including those in the original language.  Thank heavens for Google Translate.  For dataful queries like this, it's a fantastic resource.  


Search on!