Wednesday, October 19, 2022

Answer 2: Can you find characters from Moby Dick in other places?


Let's wrap this up... 

Side comment:  Why so long between posts?  Answer: In addition to my regular gig at Google as a research scientist, I'm ALSO teaching a class at Stanford University on "Human-Computer Interaction & AI/ML" with my friend and colleague Peter Norvig (syllabus).  It's a wonderful experience, but it's also taking a LOT of time.  I always forget how much effort it takes to create a new university-level course from scratch, especially one that's full of content-rich lectures.  Last week (and this week, to be honest) has been full of writing the course, creating tests, making slides, and organizing the material.  It got busy, and as you'll see below, writing up the Wikidata method wasn't straightforward.  Hope you'll bear with me for the next 7 weeks as we work through the course and write SRS posts.  I think I'll make the next few posts somewhat simpler questions--still very fun--that shouldn't take me as much time to write up.  The last day of the class is December 15th.  I'll have more time to post more advanced Challenges after that.


As I mentioned last time, there are multiple ways to think about answering this question.  Let me show you the Wikidata approach and then summarize.    

Reminder: Our Challenge was... 

1. Can you find a way to identify other major works of fiction (leaving out fan-fiction for the moment) in which the names of "Starbuck" and "Queequeg" appear (either independently or together)?  

Last time we looked at using queries like:    [ "starbuck" -starbucks ]  to search for all mentions of the word in ALL of Wikipedia.  (Note the use of the minus symbol to remove any mentions of that coffee company.)

My plan was to write up a long post here about how to use Wikidata to search the data underlying Wikipedia to find all mentions of Starbuck or Queequeg in any literary object (books, movies, cartoons, etc.).  

So... I spent several hours learning the SPARQL query language for Wikidata and figuring out how to write those queries.  Here's what they look like in the Wikidata SPARQL editor: 

Yeah.  In this example the term p:P1441 stands for "is present in work" and wd:Q3414055 stands for "Queequeg."  Roughly, this query translates to "search for everything that has Queequeg present in the work."
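
For anyone who'd rather see that as text, here's a minimal sketch in Python that builds a query along those lines.  It is an illustration, not necessarily the exact query from the editor: it assumes the direct wdt:P1441 form of the property (rather than the p: statement form), and the names works_featuring and query_url are made up for this example.  It only builds the query text and a URL for the public Wikidata endpoint; it doesn't contact the server.

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def works_featuring(character_qid: str) -> str:
    """Build a SPARQL query for works the given character is present in (P1441)."""
    # Assumption: the direct "wdt:" form of the property; the editor query may differ.
    return (
        "SELECT ?work ?workLabel WHERE {\n"
        f"  wd:{character_qid} wdt:P1441 ?work .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


def query_url(sparql: str) -> str:
    """Wrap the query text in a GET URL that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


if __name__ == "__main__":
    query = works_featuring("Q3414055")  # Q3414055 = Queequeg, per the text above
    print(query)
    print(query_url(query))
```

You can paste the printed query into the Wikidata SPARQL editor, or swap in a different Q-number to look for another character.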

You have to know that "in the work" means, specifically, 

"this (fictional or fictionalized) entity or person appears in that work as part of the narration (use P2860 for works citing other works, P361/P1433 for works being part of other works, P1343 for entities described in non-fictional accounts)"

The SPARQL language is very powerful--you can ask nearly anything.  If you'd like to learn more about it, here's the SPARQL tutorial that I used.  Using this, you can ask questions like "Who are the grandchildren of Johann Sebastian Bach?" and get answers: 
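
The grandchildren question boils down to following the "child" property twice: person to child, then child to grandchild.  Here's a minimal sketch in Python that builds such a query.  The helper name grandchildren_query is made up for illustration; Q1339 is the Wikidata item for Johann Sebastian Bach and P40 is the "child" property.

```python
def grandchildren_query(person_qid: str) -> str:
    """Build a SPARQL query that follows the 'child' property (P40) twice."""
    return (
        "SELECT ?grandChild ?grandChildLabel WHERE {\n"
        f"  wd:{person_qid} wdt:P40 ?child .\n"   # person -> child
        "  ?child wdt:P40 ?grandChild .\n"        # child -> grandchild
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


if __name__ == "__main__":
    # Q1339 is the Wikidata item for Johann Sebastian Bach.
    print(grandchildren_query("Q1339"))
```

Paste the printed text into the Wikidata query editor to run it.  The results can only be as complete as the "child" statements that have actually been recorded in Wikidata.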

As it happens, I know (because I'm a fan of early music) that JS Bach had 4 grandchildren, not 3: Anna Philippiana Friederica Bach, Wilhelm Friedrich Ernst Bach, Christina Luise Bach, and Johann Sebastian Altnickol.  That's a fact you can check in many ways, but notably by looking at the Wikipedia page about Bach's descendants.  And this illustrates a problem with Wikidata: it has entries for everything that's a first-class object (e.g., famous people), but not everything that's a piece of text in Wikipedia has an entry in Wikidata.  Thus, if you run the SPARQL query for works that contain Queequeg, you'll get only "first-class items," such as well-known books, movies, etc... but not all of them.  If a book has Queequeg as a character, but Wikidata doesn't have an entry for Queequeg in that book, you won't find it.

That's not really surprising--every database has a coverage issue.  (That is, a database contains only certain types and amounts of information; this is called its coverage.)  The coverage of Wikidata is less than the full text of Wikipedia.

It took me a while to figure this out.  I was hoping that Wikidata would be more extensive, and allow me to find new entities that simple text search would not--but it didn't work out that way.  

And, in truth, it took me quite a while to learn how to use SPARQL.  It's sufficiently complex that unless you're going to use it every day, it's probably not worth the time to learn it.  (It's a great project, but not quite ready for the ordinary SRSer who just wants to look things up.)  

SearchResearch Lessons

As I said last week,  

1. When searching for literary resources, no single source is going to give us everything we want.  For complex search tasks like this, the best you can do is to assemble data from multiple sources.  It's going to be really really hard to get a complete list of all the uses of Starbuck or Queequeg in literary works with just a single search.  The sources are too many, too diverse, and with very different interfaces. Just realize this when you set out on your journey.  Some SRS Challenges are still really hard. 

2. Use SITE: on Wikipedia.  As a first repository of cultural knowledge, Wikipedia is pretty good.  Just searching for your target on Wikipedia (using the MINUS operator judiciously as needed) can get you pretty far.  In a previous post we found a LOT of hits this way.

3. Consider searching other special collections.  Think about searching on Google Books (remembering that the Hathi Collection is mostly, but not quite, the same as Google Books).  Searching in Books will get you a bunch of additional hits, but also search in places with other cultural resources, such as movies (IMDb, to find "Age of Dragons," where Captain Ahab is in search of the Great White Dragon, with trusty Starbuck and Queequeg on hand in the crew) or recorded music (e.g., Spotify, to find the song Queequeg by Quarteto Minimo).

Search on! 


  1. …then you're into the holidays… then vacation… then it's almost 2024…
    sounds like fun engagement… can you use any of the Swiss prep?
    looked ahead for Week 8.A: Nov 15, 2022 (Doug Eck?, Peter away)* AI & Art
    "There's a new competitor to DALL-E out there: Google's Imagen." includes a number of side-by-side comparisons… interesting
    via twit_ter
    see this from the syllabus
    all AI work should lead to huge quantities of chocolate macaroons … per cantonal ordinance 2nd from the Blue Jays
    not that you said this, but jtbc, it's too chilly to bare in December…
    fwiw, Anna Magdalena Wilcke
    JSB, the painter
    …better than a lump of coal…
    * Where the future leads????????? (Dan’s vision)
    * Wrap-up / Summary / What have we learned?
    Finals Week 11: Dec 15, 2022 time: 3:30 - 6:30PM
    * Final exam: Group presentations

  2. Good points, especially how some searches require multiple attempts to get definitive answers (as was made clear during this particular query), and also simply using the site: operator with Wikipedia (i.e., rather than picking a language). I'll definitely have to remember that one.

    Regardless, thanks for taking the time to provide us those tips, and good luck with your class (though your students are fortunate to have a senior Google search engineer teaching them).

  3. As an aside, I just looked at your course's syllabus. It looks like it covers some interesting topics, and like its readings would enhance what students are supposed to learn.

  4. About Wikipedia: I use Wikipedia a lot and browse the references and follow the ones that seem most relevant (or interesting), then follow the ones that seem most relevant…. Primary sources, right?

    “Doubt is not a pleasant condition, but certainty is absurd.”

    “If you would seek certainty in life, go read the scores for yesterday’s sports games.”

  5. I have been offline for a while and during that time had an experience which took me back to July 2021. I attended an exhibition dedicated to a well-known artist. This included a double-decker bench with a display case in the lower part containing the skeleton of a rattlesnake, a replica of a piece of furniture in the artist’s home. Apparently, to this artist, skeletons do not signify death but may be more living than animals.

    I mentioned this to a friend who replied that she thought snakes did not have vertebrae since they are able to swallow such large, uh, meals. We know they can unhinge their jaws, but what about expanding the rest of their bodies? I was able to figure out that one with a few searches.

    1. I have to admit that until my friend raised the issue I had never thought of it. I have often seen snakes with bodies bulging and knew how they had managed to get their prey into their jaws, but nothing beyond that.

      [How do snakes swallow large prey?]

      The first result was this, credible to me since it is from one of my favorite universities:
      It explains snake swallowing in detail.

      [Do snakes' bodies expand when they swallow?]

      Another university web site with a grisly description of how a snake can swallow a man:


      I couldn’t watch this one but it may be instructive:

      [Smithsonian how snakes swallow]
      Contained one interesting point: “Like most snakes, they can detach their jaw to swallow prey much larger then themselves, though they are careful to weigh the risk of injury with large prey.”

      This web site contains the photo of a snake skeleton included in the 21 July 2021 challenge:
      The references weren’t helpful.

      I’m going to have nightmares tonight. A while back I had a not-so-pleasant encounter with a six foot python.

    2. One more thing:

      [Brad Moon Louisiana]

      The gentleman seems to know his stuff.

    3. skeletal art
      who was the artist with the rattlesnake skeleton furniture? where?…

    4. nice find on Mr/Prof. Moon
      bio, UM PhD
      "Rattlesnake tailshaker muscle is a great system for studying these things because it is specialized for sustaining high frequency contractions and shows very clear relationships between muscle speed, strength, motion, and energy use. With several colleagues, I have been using sonomicrometry and force transducers to record muscle shortening patterns and force exertion during rattling in western diamondback rattlesnakes (Crotalus atrox). Rattlesnake tailshaker muscle is an excellent study system because sustains extremely high twitch frequencies (up to 100 Hz!) without fatigue. These muscles show clear mechanical tradeoffs between contractile frequency and joint displacement that help to explain their unusually low energy use."

      research inclusive of jumping slugs…
      think I saw these on a post office wall…herpetology –also an area of FBI interest… mostly the two-legged variety…
      not quite the original, but still a southern moon

    5. Terminated by the lady herself:

      This is what I saw:

    6. Did I forget to post this?

      Terminated by the lady herself:

      This is what I saw (scroll down to “Detail of a built-in bench”):

      In any case, Happy Kibibyte Day everyone!

    7. thanks… (2nd try)
      "O’Keeffe got such a thrill from the coil of a rattlesnake skeleton she bought from a science supply warehouse, that she had a black velvet display case built for it to sit within the banco (clay bench) of her adobe home in Abiquiu."
      appears fragile
      Alfred, Georgia (not fragile) & snake poetry
      Kilobytes vs. Kibibytes