Tuesday, November 30, 2010

Learning as you search along: Why learn-as-you-search is so important




Regular reader J writes... 


I had a mosquito bite that began oozing a clear-yellow fluid that became crusted. I began searching for:
[ yellow discharge insect bite ]
and got a lot of low-quality results - plenty of generic info from, for example, first aid kit distributors, all of which appeared to be paraphrased from similar original sources. None of it was really helpful. Some sources mentioned that the discharge might be indicative of infection, but the discharge was only mentioned in passing in any case.

I decided to rephrase, and upon searching for:
[ crusty yellow mosquito bite ] 
one of the top results mentioned something called "impetigo". I immediately did a search for:
[ impetigo ] 
and came up with many high-quality results that appeared to be written by experts. I found articles written by people who called themselves doctors that contained detailed descriptions. 

It turns out that impetigo is a bacterial infection of the skin which comes in three types, one of which results in a red, itchy bump with yellowish discharge that becomes crusted. Apparently, to the untrained eye, the symptoms of impetigo are quite similar to an insect bite, whether it started with one or not.

I thought you might find this interesting, because my search pattern started with my assumption that I had a mosquito bite, and was searching in the category "insect bites", but the information I really wanted was "classified" on the web under "bacterial infections of the skin". 

It took reading through quite a lot of search results, including one Google Books search result before I was able to reformulate my search terms. I had a better "scent" of information in the results which also used the word "crusted" or "crusty", and that scent led me to the word "impetigo", which had the best scent of all, and then I found my answer.



This is a pretty common kind of story--J started to search for a topic, then discovered that a shift in terms would yield MUCH better results.  In this case, the shift from insect bite to a search on impetigo did a great deal to improve the quality of the search results.  


I take two big lessons from J's story.


1.  The right query term selection REALLY matters.  You can spend a lot of time wandering in the search wilderness without a good term.  But once you have it, the search is frequently very short.  (And contrariwise, if you have the WRONG search term, you can be in big trouble.  Be sure you know what it is you're searching for!) 


2.  Learning-while-you-search is key to power searching. J's big insight came when he noticed an unfamiliar term (impetigo) in the middle of his results.  By recognizing it as a potentially useful word and turning his search toward it, he learned something about the area and can now search much more effectively.  
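
For the programmers in the audience, here's a toy sketch of lesson 2.  It is only an illustration, not how J searched and not how any search engine works: given a query and a few result snippets, it lists the words in the snippets that aren't already in the query, most frequent first.  The snippets, the tiny stopword list, and the function name candidate_terms are all invented for this example.

```python
import re
from collections import Counter

# Tiny hand-picked stopword list (an assumption made for this example).
STOPWORDS = {"a", "an", "and", "as", "be", "by", "can", "is", "it",
             "of", "or", "that", "the", "to", "with"}


def candidate_terms(query, snippets):
    """Return snippet words that are not in the query, most frequent first."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    counts = Counter()
    for snippet in snippets:
        for word in re.findall(r"[a-z]+", snippet.lower()):
            if word not in query_words and word not in STOPWORDS:
                counts[word] += 1
    return [word for word, _ in counts.most_common()]


# Invented snippets standing in for real search results.
snippets = [
    "A crusty yellow sore can be impetigo",
    "Impetigo is a skin infection",
]
new_terms = candidate_terms("crusty yellow mosquito bite", snippets)
print(new_terms)
```

The unfamiliar word that shows up most often ("impetigo" in this made-up data) is a good candidate for the next query.  A human reader does this kind of noticing without thinking about it, which is really the point of the lesson.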




Search On! 



Thursday, November 25, 2010

Answer: Original cranberry sauce recipe?

Answering this question is a bit tricky because so much has been written about the First Thanksgiving (1621, Plimouth, MA) during the past 100 years, but there's so little source material to work from!  


The big search lesson to take from the answer to this challenge is this:  Don't assume too much about what you think the answer will be.  Your initial assumptions might be very wrong, and you'd waste a lot of time trying to prove something that just isn't true.  


Here's the story... 

Background:  Cranberries (Vaccinium macrocarpon), are native to New England and a few other places in North America, growing in acidic bogs.  Many members of the heath family, such as blueberries (Vaccinium spp.) and azaleas (Rhododendron spp.), also grow well in acid, peat soils.

The cranberry plant is a very long-lived perennial less than eight inches high with trailing, thin stems with small, opposite, evergreen leaves. Cranberry flowers appear around the Fourth of July; these are white to light pink, downward-pointing, bell-shaped, axillary flowers. The name cranberry is a modification of the colonial name "crane berry," because the drooping flower looked like the neck and head of the sand crane, which was often seen eating the fruits.

Cranberry sauce, as we typically think of it, is a cooked, heavily sweetened concoction that's frequently augmented with citrus.  Of course, any citrus would have been impossible to have in 17th century Massachusetts.  An even larger problem is that there was very little sugar in 1621 America.  (Certainly not in sufficient quantities to make cranberry sauce!)  The settlers hadn't had time to make maple syrup or sugar, and honeybees were still years in the future.  


But how do we search for this kind of information?  


My first search was [ cranberries 1621 ], which gives a number of search results, the most interesting of which is "The Truth About Thanksgiving" from the Planet Blacksburg (VA) news site, which quotes Daniel Thorp (colonial history prof at Virginia Tech). 




After looking through lots of dead-end links (research is a slow process!), I finally decided to look for the original letters describing the Thanksgiving holiday in 1621 to see what I could find.  From the Daniel Thorp article I learned that the letter was written by Edward Winslow, so the obvious search
 [ Edward Winslow 1621 ] led me directly to a transcription (and handy translation into modern speech): http://www.pilgrimhall.org/1stthnks.htm 


I quote from their site (in modern spelling): 


"...our harvest being gotten in, our governor sent four men on fowling, that so we might after a special manner rejoice together, after we had gathered the fruits of our labors; they four in one day killed as much fowl, as with a little help beside, served the Company almost a week, at which time amongst other Recreations, we exercised our Arms, many of the Indians coming amongst us, and amongst the rest their greatest king Massasoit, with some ninety men, whom for three days we entertained and feasted, and they went out and killed five Deer, which they brought to the Plantation and bestowed on our Governor, and upon the Captain and others.  And although it be not always so plentiful, as it was at this time with us, yet by the goodness of God, we are so far from want,  that we often wish you partakers of our plenty."

From there, I found that William Bradford had also written about the First Thanksgiving.  The obvious query: 
[ William Bradford Plimouth Plantation ] leads to his writings on Thanksgiving.

He wrote:  

"...They began now to gather in the small harvest they had, and to fit up their houses and dwellings against winter, being all well recovered in health and strength and had all things in good plenty.  For as some were thus employed in affairs abroad, others were exercising in fishing, about cod and bass and other fish, of which they took good store, of which every family had their portion.  All the summer there was no want; and now began to come in store of fowl, as winter approached, of which this place did abound when they came first (but afterward decreased by degrees).  And besides waterfowl there was great store of wild turkeys, of which they took many, besides venison, etc.  Besides they had about a peck of meal a week to a person, or now since harvest, Indian corn to that proportion.  Which made many afterwards write so largely of their plenty here to their friends in England, which were not feigned but true reports."  

And that's about it for the historical record.  So, were cranberries on the menu, and if so, in what form?  

There certainly was a tradition of stewed fruits in 17th century English cooking, so it's probable that some kind of cooked fruit compote was served, but it's not actually in the record.  So the truth is... it's all speculation!  

Most probably, the Indians (the local Wampanoag) brought along some of their own supplies.  Ninety people is, after all, a pretty big crowd to bring to the Thanksgiving table.  If so, then they most likely brought along cranberries in the form of pemmican, a very calorically-dense survival and travel food.  It's made by mixing dried berries with dried deer meat and melted fat to form easy-to-carry slabs.  Think of ground up beef jerky mixed with a bit of bacon grease and dried cranberries, and you've got the idea.  

Fred's approach (in the comments) was pretty clever.  He ran the query [ cranberry history ], then used the Timeline tool to narrow down the results to the 1620s.  Once there, you see lots of references to pemmican.
http://en.wikipedia.org/wiki/Pemmican





Not my solution, but very nice.  


Happy Thanksgiving! 

Wednesday, November 24, 2010

Search challenge (Nov 24, 2010): What's the original cranberry recipe?

It's the day before Thanksgiving, and we're all pondering the origins of our national holiday.  Whether you believe it was a celebration of pure survival or a celebration of the domination of capitalism over socialism (you can go look it up!) matters not.  


Today's question, which I'm sure we've all thought about at one time or another, is this: 


What recipe for cranberry sauce was used at the first Thanksgiving feast at Plymouth in 1621?  


Just curious...  


Search on! 

Saturday, November 20, 2010

The Google Dictionary (and Wordnik too)

I'm constantly amazed at what one can find while poking around Google's apparently inexhaustible resources.

For example, although I use (and teach) the DEFINE: operator, I never knew about dictionary.Google.com!


And over on Footle blog I found a couple of nice bookmarklets that give you instant access to the Dictionary.  Here's the one for English:

 gDefine

Just drag that link into your bookmarks toolbar to make it runnable.  (Go ahead and try it. If you don't like it, just right-click on the button and hit CUT.  That'll remove it from your bookmarks.)

To use, just highlight a word (try it here:  install, then highlight this word--lassitude--and then click on the gDefine bookmarklet).  Nice.

Here are a few other examples to get started.  Again, just drag these into the toolbar up above and you'll have this cross-language look-up / define ability.


gDefine (Spanish)


gDefine (German)

And bookmarklets for autotranslation from a selected term into your preferred language:

gTranslate (French->English)


And I would be remiss if I didn't also point out that you can get a very similar kind of information from Wordnik. It doesn't do cross-language translation, but does give you another point of reference.

Wordnik


Just as above, you can drag this link to your bookmarks bar, select a word or phrase on the page, and click. There's a cool extra feature...  just clicking on the bookmarklet without highlighting a word gives you a random word in Wordnik (which is very entertaining for logophiles).

In my practice, I have them both in my bookmarks bar, and switch back and forth to see both sources (most recently used on "cardiomyopathy" and "efficacious" -- even though I know what both those words mean, it's interesting to see the differences in definition and use between Google and Wordnik).


Dictionary on!

Friday, November 19, 2010

Answer: Where's the shrimp shack?

When I got this problem, I started by looking for the script, figuring that it would have the shooting locations specified.

[ script "fast and furious" ]  -- I was actually hoping to find a shooting script. And...  I found a script, but it only had the dialog (so I know what they said at the shrimp place, but not where it was).

So I gave up on the script idea and went with the "shrimp shack" idea. 

[ shrimp shack  "fast and furious" ] -- then a couple of dead-end clicks before one of the pages I visited mentioned that it was in Malibu. (Here's an example of domain knowledge... Since I grew up in LA I know Malibu is on the coast; just as I also knew that San Fernando is NOT on the coast, so I didn't bother with any searches that involved San Fernando.)

So I started working on the Malibu angle...

[ shrimp shack  malibu "fast and furious" ]
-- and the first result is perfect.  It told me that it's Neptune's Net, on the Pacific Coast Highway near Malibu.  (Or, what the locals would call "PCH" -- aka Hwy 1.)

But naturally, I wanted to check this, as this page could be totally incorrect.  To check my work, I searched for

[ neptune's net fast and furious ]  for which the first result is a list of shot locations for the movie.  Since the page gives the street address for the Neptune's Net restaurant, it was easy to verify that this is in fact the correct place (see picture below for the StreetView image).

Looks pretty much like the movie shot.  And now that I look at it, I realize I've been here.  (Growing up in LA, it's a place to go... I just didn't realize it was connected to the movie!)














Comment:  Interestingly, commenter Fred Deventhal pointed out that filming locations are also a feature in IMDB.  Knowing this, I suspect he was far faster than I was.  The query:

 [ fast and furious imdb filming locations ] gets you to a pretty authoritative answer quickly.

My takeaway from this example is twofold:  (1) including the names of "only in the movie" restaurants isn't a great pathway to search...  and (2) IMDB is a pretty good resource for movie information, including shooting locations (but be sure to use the language that IMDB uses--"filming locations").

Search on!

Wednesday, November 17, 2010

Wednesday Search Challenge (Nov 17, 2010): Where's the shrimp shack from the movie "Fast and Furious"?

Someone asked an interestingly difficult question that took much longer to answer than I'd expected.  Here's the original statement of the question: 


"I wanted to find the shrimp stand that Brian and Dom had shrimp at along the coast in Southern California in the original Fast and the Furious movie ("Fast and Furious 1").   I wanted to go get me some shrimps while down in San Fernando."  


It took a little bit of time to find the answer that was specific enough to drive to.  


Can you do this?  If so, let us know how you solved the challenge! 


Search On! 



Tuesday, November 16, 2010

Operator changes: phonebook: and rphonebook: gone

Things change, and Google has dropped the phonebook: and rphonebook: operators for finding phone numbers (rphonebook: was for "residential phonebook" and focused on home phones).  As you can imagine, this was an endless source of hassles for people (who were surprised to see themselves searchable on Google) and for Google (who had to constantly deal with all of the takedown requests and outraged letters from folks who thought they were unlisted).  

It's the way things go, though.  All web-services flux their interfaces (it's one of the things they do best).  But that also means having to sunset features that just didn't work out for one reason or another.  There are very few commitments in the web-verse, and the perpetual provisioning of an operator is definitely in the category of Continuously Provisional.  


On the other hand, there are a whole lot more operators in the Left-Hand panel.  Unfortunately, they're not expressible in the query.  Date range still works correctly, but I don't know of any handy way to say (in the query) "Latest" or "Past 24 hours."  


It's the perpetual struggle between command complexity and UI efficiency.  Should Google let you type in all of those clever (yet arcane) commands--or should you select the options via the panel controls?  


And so it goes.  All the world's information no longer contains phone number listings.  Luckily, there are lots of other places to get this information.  (e.g., www.phonenumber.com) 


Of course, the other interesting phone number feature recently released is the "Emergency Search Feature" -- launched in 13 countries (Australia, Belgium, France, Germany, Hungary, Italy, Netherlands, New Zealand, Norway, Spain, Sweden, Switzerland, U.K.).  For certain queries (mostly language variations on "help" or "emergency"), the feature displays the phone number to call for poison emergencies, suicide prevention or general emergency services. It's the universal translator of 911 into whatever-the-local-number is.  




-------------------
Later: I just noticed a great point made by Barry Schwartz over on SearchEngineLand -- IF you put in a person's full name and zip, then the Phonebook onebox will trigger and show the phonebook info. 



BUT... you have to get the full-name (for example, "Dan Russell" doesn't work) AND you have to have the zip code.  So it's not really the same as the old phonebook: operator, but it still gives you a bunch of data.  


Friday, November 12, 2010

How many words should be in your search query?

There's always a discussion about how many words you should include in a search query.  Do you want it to be short & sweet, or long and descriptive? The thing is, what works for people (long and descriptive) might not work so well for a search engine.  Here's why...

When you do a search on Google, your words are implicitly AND-ed together.  What that means is that every word you add to your search changes the search, usually making the result set smaller and smaller.  That's even true for words that you might not think are very important (words like "the" "or" "of" "by" etc.).

Here's an example: Suppose you'd like to find out a bit about the history of the musical "Oklahoma."

If your first query is just:  [ Oklahoma ] -- you'll get about 112 million results.  (Don't panic, you don't have to read them all!  Just the first 10 or 12 are all you need to see.  Really!)



When you look at those results, it's pretty clear that they're not on topic.  Or, rather, they're great results for just Oklahoma (the state, the university and the football team).  But we're looking for the musical, right?

So try adding the terms "the musical" to the query:  you now get the query [ Oklahoma the musical ].  But look what happens to the size of your results!  It gets a LOT smaller--now it's down to 5.9 million results.  




And if the results aren't to your liking, then you can add the term "on Broadway" to your query like this:  
[ Oklahoma the musical on Broadway ] and you've made your result set smaller yet again.  Now you're at 4.2 million results.  






The big point to make here is that every time you add a word to your search, you're restricting the set of possible answers.  That is, when you add a word, every page in the result set has to match that word as well as all the others in the query.

Usually, that's good!  But if you happen to be looking for something that's NOT in the set, say what you're looking for is really one of the reviews of "Oklahoma," then by adding in "on Broadway," you might have already passed it by.  Here's a graphical illustration of what I mean.

If you're looking for just the right review of "Oklahoma" (say, from 1948), it's very possible that the result will NOT be in the yellow zone (the 4.2 million results).


And you'll have to back up a little bit in order to find that review.

The bottom line is a little subtle, but it's an important idea to keep in mind when searching.  When you make your search LONGER, you're typically making the result set smaller and smaller.  So if you're having trouble finding just what you want, try removing terms that might be sending you down the wrong rabbit hole.  And keep trying.  I'll sometimes do 10 searches in a row, removing words and swapping out one way of asking for another.
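
If you think in code, here's a small sketch of the same idea.  It assumes the strict "every word must match" behavior described above (real ranking is far more complicated than this), and the four mini-documents and the search function are invented purely for illustration.

```python
# Toy "index": each document is just a set of lowercase words.
# These documents are invented for the example.
DOCS = {
    "state": {"oklahoma", "state", "government"},
    "football": {"oklahoma", "football", "team"},
    "review_1948": {"oklahoma", "the", "musical", "review"},
    "broadway_show": {"oklahoma", "the", "musical", "on", "broadway"},
}


def search(query):
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    return {doc_id for doc_id, text in DOCS.items()
            if all(word in text for word in words)}


short_results = search("Oklahoma")
medium_results = search("Oklahoma the musical")
long_results = search("Oklahoma the musical on Broadway")

for label, results in [("short", short_results),
                       ("medium", medium_results),
                       ("long", long_results)]:
    print(label, len(results), sorted(results))
```

Each longer query returns a subset of the one before it.  In this toy data the 1948 review matches [ Oklahoma the musical ] but drops out as soon as "on Broadway" is added, which is exactly the "passed it by" problem.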

In my next post, I'll explain why and WHEN you might want to go with longer queries.  But for the moment, keep 'em short and to the point!

Search on!

Friday, November 5, 2010

Answer: Textbooks archive?

One of the things I find most interesting about writing this blog is that I sometimes have no idea how hard a question will be to answer.  I mean, ahead of time, would you think that finding a few textbooks online would be difficult?  


Well... surprise, surprise, surprise... it looks to be one of those impossible search tasks.  I've had several offblog discussions with people about textbooks online, and it seems to be fraught with copyright issues.  Textbook publishers are (I guess I should have expected this) VERY cautious about putting their materials online.  I don't know if it's a worry about liability, fear of copyright infringement or what... but the truth is that few textbooks have their entire text online.  


You might think that going to Google Books would solve the problem--aren't they there?  Well, yes, there are a few--but only ones that are pre-1923 (the current copyright cutoff date).  Since most of the concepts of interest are relatively recent (post 1955), these pre-1923 texts aren't much help.  


Since I was failing at my searches, I turned to the "Ask A Librarian" feature at the Library of Congress.  This is a fantastic feature that lets you send an email to a reference librarian with a question that they might hope to answer.  (I have to say that the good people at the LOC deserve every bit of credit they get for offering this service.  They're amazingly good!)  


In this particular case, I just forwarded my question to them, and got back the following answer: 


______________________________________


Your question was referred to the Science Reference Service at the Library of Congress <http://www.loc.gov/rr/scitech/> since your inquiry involves topics under our purview.

The Library of Congress has a good collection of high school and college textbooks, mainly by the big publishers. Many of the topics you listed are concepts from the mid to late 20th century. Works that are published after the 1920's will still be protected under copyright, so you would have to visit the Library to access these titles- more than likely they are not digitized or available online for free. Since you are in the Bay Area you might wish to consult with local universities that have special textbook collections.

You may wish to consult with Stanford's Cubberley Education Library <http://www.stanford.edu/group/cubberley/Stanford> about the Hurd collection- Paul Hart Hurd wrote extensively on science education and reform <http://www.stanford.edu/group/cubberley/collections/hurd>

Also, the San Francisco State University Library Marguerite Archer Collection of Children material might also be of interest <http://www.library.sfsu.edu/about/collections/archer/>

There are a number of ways to approach your research.

You might also wish to consult books or articles about teaching science in the 20th/21st century such as The National Science Foundation and pre-college science education, 1950-1975: report
<http://lccn.loc.gov/76601065>.

Using the Library of Congress online catalog you can search the following subjects:

Science study and teaching
Biology study and teaching (and other specific disciplines, plate tectonics, physical sciences, etc.)
Textbooks United States (can also search Textbooks United States History)

You also may find textbook catalogs or bibliographies helpful with identifying titles- For example El-Hi Textbooks in print <http://lccn.loc.gov/57004667>

In terms of articles you might wish to use the ERIC database < http://www.eric.ed.gov/>

Although the following is about current science education, you still may find it helpful with your research:  LC Science Tracer Bullet: Science Education
<http://www.loc.gov/rr/scitech/tracer-bullets/sciedtb.html>

______________________________________


This helps my friend out because he can go to the libraries listed.  But I'm not sure how to deal with this problem more generally.  


Any more ideas from the SearchResearchers? 




Still searching! 

Wednesday, November 3, 2010

Wednesday Search Challenge (Nov 3, 2010): Textbooks over time

A friend contacted me with the following question:  


"I'm doing a research project on the uptake of key scientific ideas into popular understanding. (e.g., the carbon cycle, DNA, plate tectonics, personal computers, internet search, evolution, the hole in the ozone, global warming)  My current approach is to identify when these key ideas first appear in high school science textbooks.

I'm looking for an archive collection of US science textbooks over the years.  Do you happen to know of any I could use?  (It would be really, truly great if they were online too!)" 



My friend lives in Menlo Park, CA.  


What would you tell my friend?  


How would you search for such a thing?  




Searching.... 

Tuesday, November 2, 2010

Google Video is NOT the same as YouTube!

Sometimes a common belief about search is just plain wrong.  I've told lots of people that the videos found on Google Video (at video.Google.com) are a superset of the YouTube videos, and had them react with an "I didn't know that!"  


(Okay.  How many of you reading this far had that reaction??  Fess up now.)  


The truth is that the videos on Video.Google.com are what Google finds by crawling the net, including video sites like YouTube.com, ABC.com, CBS.com, PBS.org, MySpace.com and so on.  


The key thing to keep in mind is that YouTube is video that has been explicitly uploaded to YouTube. 


Video.Google.com crawls all the video websites it can find, and gives links to those videos.  


In a Venn Diagram: 





Generally speaking, on YouTube you'll find shorter clips of the premium content (think of shows like "Survivor" which exist only on CBS.com -- you can find clips and remixes of Survivor on YouTube, but can't get the whole show).   


So when would you use Video.Google.com?  


That's easy: Whenever you want to find full-length, professionally produced content and don't know where it exists.  Since Video crawls other video sites, you can find out quickly where the original content exists, click through and watch it on the site.   For example (to use Survivor again), if I don't know which channel produced the show, I'd just go to Google.com, search for [ survivor ] and click on the Video tab.  I instantly find out that it was shown on CBS, and I can go there to watch it.  






Since I rarely remember what production company or station has shown a particular show, this is a great way to figure out where it lives... and how to find it. 
  

Search on!