Let's wrap this up...
Side comment: Why so long between posts? Answer: In addition to my regular gig at Google as a research scientist, I'm ALSO teaching a class at Stanford University on "Human-Computer Interaction & AI/ML" with my friend and colleague Peter Norvig (syllabus). It's a wonderful experience, but it's also taking a LOT of time. I always forget how much effort it takes to create a new university-level course from scratch, especially one that's full of content-rich lectures. Last week (and this week, to be honest) was full of writing the course, creating tests, making slides, and organizing the material. On top of that, as you'll see below, writing up the Wikidata method wasn't straightforward. I hope you'll bear with me for the next 7 weeks as we work through the course and I write SRS posts. I think I'll make the next few posts somewhat simpler questions (still very fun) that shouldn't take me as much time to write up. The last day of the class is December 15th. I'll have more time to post more advanced Challenges after that.
As I mentioned last time, there are multiple ways to think about answering this question. Let me show you the Wikidata approach and then summarize.
Reminder: Our Challenge was...
1. Can you find a way to identify other major works of fiction (leaving out fan-fiction for the moment) in which the names of "Starbuck" and "Queequeg" appear (either independently or together)?
Last time we looked at using queries like: [ site:wikipedia.org "starbuck" -starbucks ] to search for all mentions of the word in ALL of Wikipedia. (Note the use of the minus sign to remove any mentions of that coffee company.)
My plan was to write up a long post here about how to use Wikidata to search the data underlying Wikipedia to find all mentions of Starbuck or Queequeg in any literary object (books, movies, cartoons, etc.).
So... I spent several hours learning the SPARQL query language for Wikidata and figuring out how to write those queries. Here's what they look like in the Wikidata SPARQL editor:
Yeah. In this example the term p:P1441 stands for "is present in work" and wd:Q3414055 stands for "Queequeg." Roughly, this query translates to "search for everything that has Queequeg present in the work."
You have to know that "in the work" means, specifically,
"this (fictional or fictionalized) entity or person appears in that work as part of the narration (use P2860 for works citing other works, P361/P1433 for works being part of other works, P1343 for entities described in non-fictional accounts)"
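For readers who want to try this themselves, here is a minimal sketch in Python of what such a query might look like. It is not the exact query from the editor screenshot. It assumes the simpler wdt: ("direct value") prefix instead of the p: statement prefix mentioned above, takes the Queequeg identifier Q3414055 from this post without independently checking it, and uses hypothetical helper names (present_in_work_query, run_query) and a made-up User-Agent string.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def present_in_work_query(character_qid):
    """Build a SPARQL query listing the works a character is 'present in' (P1441).

    Assumption: uses the wdt: prefix (direct values) rather than the p:
    statement prefix shown in the post, to keep the query short.
    """
    return f"""
SELECT ?work ?workLabel WHERE {{
  wd:{character_qid} wdt:P1441 ?work .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
""".strip()


def run_query(query):
    """Send a query to the endpoint and return the list of result rows."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    # The User-Agent value here is a placeholder for illustration only.
    request = urllib.request.Request(
        url, headers={"User-Agent": "srs-wikidata-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    return data["results"]["bindings"]
```

Calling run_query(present_in_work_query("Q3414055")) should return one row per work, with the readable title under row["workLabel"]["value"]. You can also paste the text returned by present_in_work_query straight into the Wikidata SPARQL editor.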
The SPARQL language is very powerful--you can ask nearly anything. If you'd like to learn more about it, here's the SPARQL tutorial that I used. Using this, you can ask questions like "Who are the grandchildren of Johann Sebastian Bach?" and get answers:
Johann Sebastian Altnickol. That's a fact you can check in many ways, but notably by looking at the Wikipedia page about Bach's descendants. And this illustrates a problem with Wikidata: it has entries for everything that's a first-class object (e.g., famous people), but not everything that's a piece of text in Wikipedia has an entry in Wikidata. Thus, if you run the SPARQL query for works that contain Queequeg, you'll get only "first-class items," such as well-known books, movies, etc., but not all of them. If a book has Queequeg as a character, but Wikidata doesn't have an entry for Queequeg in that book, you won't find it.
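The Bach grandchildren question can be sketched the same way. This is an illustrative guess at the query, not the one actually run for this post: it assumes Q1339 is the Wikidata item for Johann Sebastian Bach and P40 is the "child" property, and the function name grandchildren_query is hypothetical. The coverage limit shows up here too, since the query can only return grandchildren who have their own Wikidata item connected by "child" statements.

```python
def grandchildren_query(person_qid):
    """Build a SPARQL query for a person's grandchildren.

    Assumption: P40 is Wikidata's "child" property. The property path
    wdt:P40/wdt:P40 follows "child" twice: child of a child.
    """
    return f"""
SELECT ?grandchild ?grandchildLabel WHERE {{
  wd:{person_qid} wdt:P40/wdt:P40 ?grandchild .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
""".strip()


# Q1339 is assumed to be the item for Johann Sebastian Bach.
bach_query = grandchildren_query("Q1339")
```

The text in bach_query can be pasted into the Wikidata SPARQL editor as is.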
That's not really surprising--every database has a coverage issue. (That is, a database contains only certain types and amounts of information; this is called its coverage.) The coverage of Wikidata is less than the full text of Wikipedia.
It took me a while to figure this out. I was hoping that Wikidata would be more extensive, and allow me to find new entities that simple text search would not--but it didn't work out that way.
…then you're into the holidays… then vacation… then it's almost 2024…
sounds like fun engagement… can you use any of the Swiss prep?
looked ahead for Week 8.A: Nov 15, 2022 (Doug Eck?, Peter away)* AI & Art
"There's a new competitor to DALL-E out there: Google's Imagen."… includes a number of side-by-side comparisons… interesting
see this from the syllabus
all AI work should lead to huge quantities of chocolate macaroons … per cantonal ordinance, 2nd from the Blue Jays
not that you said this, but jtbc, it's too chilly to bare in December…
fwiw, Anna Magdalena Wilcke
JSB, the painter
…better than a lump of coal…
* Where the future leads????????? (Dan’s vision)
* Wrap-up / Summary / What have we learned?
Finals Week 11: Dec 15, 2022 time: 3:30 - 6:30PM
* Final exam: Group presentations
Good points, especially how some searches require multiple attempts to get some definitive answers (as was made clear during this particular query), and also simply using the site: operator with Wikipedia (i.e. site:wikipedia.org), rather than picking a language. I'll definitely have to remember that one.
Regardless, thanks for taking the time to provide us those tips, and good luck with your class (though your students are fortunate to have a senior Google search engineer teaching them).
As an aside, I just looked at your course's syllabus. It looks like it covers some interesting topics, and like its readings would enhance what students are supposed to learn.
About Wikipedia: I use Wikipedia a lot and browse the references and follow the ones that seem most relevant (or interesting), then follow the ones that seem most relevant…. Primary sources, right?
“Doubt is not a pleasant condition, but certainty is absurd.”
“If you would seek certainty in life, go read the scores for yesterday’s sports games.”
I have been offline for a while and during that time had an experience which took me back to July 2021. I attended an exhibition dedicated to a well-known artist. This included a double-decker bench with a display case in the lower part containing the skeleton of a rattlesnake, a replica of a piece of furniture in the artist’s home. Apparently, to this artist, skeletons do not signify death but may be more living than animals.
I mentioned this to a friend who replied that she thought snakes did not have vertebrae since they are able to swallow such large, uh, meals. We know they can unhinge their jaws, but what about expanding the rest of their bodies? I was able to figure out that one with a few searches.
I have to admit that until my friend raised the issue I had never thought of it. I have often seen snakes with bodies bulging and knew how they had managed to get their prey into their jaws, but nothing beyond that.
[How do snakes swallow large prey?]
The first result was this, credible to me since it is from one of my favorite universities:
It explains snake swallowing in detail.
[Do snakes' bodies expand when they swallow?]
Another university web site with a grisly description of how a snake can swallow a man:
I couldn’t watch this one but it may be instructive:
[Smithsonian how snakes swallow]
Contained one interesting point: “Like most snakes, they can detach their jaw to swallow prey much larger then themselves, though they are careful to weigh the risk of injury with large prey.”
This web site contains the photo of a snake skeleton included in the 21 July 2021 challenge:
The references weren’t helpful.
I’m going to have nightmares tonight. A while back I had a not-so-pleasant encounter with a six foot python.
One more thing:
[Brad Moon Louisiana]
The gentleman seems to know his stuff.
who was the artist with the rattlesnake skeleton furniture? where?…
nice find on Mr/Prof. Moon
bio, UM PhD
"Rattlesnake tailshaker muscle is a great system for studying these things because it is specialized for sustaining high frequency contractions and shows very clear relationships between muscle speed, strength, motion, and energy use. With several colleagues, I have been using sonomicrometry and force transducers to record muscle shortening patterns and force exertion during rattling in western diamondback rattlesnakes (Crotalus atrox). Rattlesnake tailshaker muscle is an excellent study system because sustains extremely high twitch frequencies (up to 100 Hz!) without fatigue. These muscles show clear mechanical tradeoffs between contractile frequency and joint displacement that help to explain their unusually low energy use."
research inclusive of jumping slugs…
think I saw these on a post office wall…herpetology –also an area of FBI interest… mostly the two-legged variety…
not quite the original, but still a southern moon
a random twitter pic - no snakes
wait, it is Monty Python no longer a 6 footer…
timber… kinda like furniture…
Terminated by the lady herself:
This is what I saw:
Did I forget to post this?
Terminated by the lady herself:
This is what I saw (scroll down to “Detail of a built-in bench”):
In any case, Happy Kibibyte Day everyone!
thanks… (2nd try)
"O’Keeffe got such a thrill from the coil of a rattlesnake skeleton she bought from a science supply warehouse, that she had a black velvet display case built for it to sit within the banco (clay bench) of her adobe home in Abiquiu."
Alfred, Georgia (not fragile) & snake poetry
Kilobytes vs. Kibibytes
#3 - Bach was a sRs student@St. Michael’s School…
salt & Bach
before ear buds… or white noise