Monday, March 5, 2012

We must go deeper: Why search is always going to be tricky

In the movie “Inception,” the main character keeps goading his team farther into the subconscious mind with “we must go deeper…”  Their goal is to get to the bottom of the many layers of experience that make up the human mind in order to plant a single idea.  Each layer depends on the ones below it; they pile up to make an image of perceived reality.  And that’s kind of what I’m thinking here…

I often have this feeling of “we must go deeper” when I’m doing a simple search for information on Google—and yet I’m surprised when I see that other people don’t seem to have that same urge to understand what’s really going on, but are satisfied with the first answer that pops to the top. 

Somewhere along the way I must have developed a fact-checker’s outlook.  You know who the fact-checkers are: they’re the woefully underpaid people who read news, books, and magazine articles pre-press and meticulously verify that what’s written actually lines up with observable reality.  They’re paid not just to look up the original publication of a fact, but also to follow up with the next few issues to see if any retraction or correction was published.  That’s what a real fact-checker does: running down truth, even if it takes several extra steps.

The simple reality is that looking stuff up has always been difficult.  Now, it’s just difficult in a different way (so many alternatives to consider, so many ways to think about things, so many ways to measure).  A fact-checker looks at a sentence like “There were 31 bars in Elko, Nevada in 1965” and immediately asks the question “How would I figure this out?  Would I count liquor licenses issued in the city of Elko during ALL of 1965, or just active licenses on 12/31/65?” 

When I talk to students as they search for simple facts, I find that they just want to get an answer that’s plausibly correct.  The actual truth of a fact doesn’t seem to matter as much as whether it confirms something previously known, or is merely close to something plausible.  It’s as though they live in an ongoing game of horseshoes, where getting your shoe close to the stake is often good enough to win.

The thing that worries me these days is that not only do students not want to second-source a reference (let alone look for retractions or corrections), but they often don’t want to understand the topic in enough detail to know how and why their fact might be incorrect.

The simplest example I can think of is the straightforward question “What’s the circumference of the earth?”  How hard could finding that be?

It IS the case that if you do a quick Google search, you can quickly end up with a very credible looking answer.  Here’s one answer: 24,901.55 miles, measured around the equator.  But here’s another: 24,859.82 miles, if you measure around the poles.  And of course, you can get different sets of numbers depending on who made the measurement, how it was made, and what model of the earth they’re using.

Now wait a second!  How does “model of the earth” enter into this discussion?  Isn’t the circumference an actual physical measurement? 

Well, yes, it is… but this is where you have to go deeper and realize that nobody actually takes the measurement.  Think about it for a second: walking around the equator with a yardstick just doesn’t work, for all kinds of practical reasons.  So to get the circumference, you measure something that’s easier to do—say, the difference between the vertical angles at two points on the surface.  (That is, if you project a vertical ray up from a point and see what star it hits, then walk north 100 miles and project a second vertical ray upwards, you can measure the difference in the angles and figure out what fraction of a full circle your walk subtended.  Since your 100 miles is to the whole circumference as that angle is to 360 degrees, you can work out the circumference of the earth.)
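That surveyor’s proportion is easy to sketch in a few lines of code.  Both input numbers here are hypothetical, chosen purely for illustration; they are not from any actual survey:

```python
# A sketch of the proportion described above: the distance walked is to
# the full circumference as the measured angle is to 360 degrees.
# Both numbers below are assumed, for illustration only.
arc_miles = 100.0     # hypothetical north-south distance between the two points
angle_deg = 1.4457    # hypothetical difference between the two vertical angles

circumference_miles = arc_miles * 360.0 / angle_deg
print(round(circumference_miles, 1))  # roughly 24,900 miles
```

Change either input slightly and the computed circumference moves with it—which is exactly why different surveys report different numbers.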

But here’s the tricky part:  If you measure some fraction of the earth’s surface, you then drop that into a MODEL of the earth to compute the circumference.   If you assume the Earth is a sphere, then it’s just simple geometry.  But  the truth is, you have to know that the earth isn’t a sphere.  (I assume you know that!)  But if it’s not a sphere, then what IS it?  That’s where the model comes in. 

Newton thought the earth was an oblate spheroid (a sphere that’s been squashed down a bit) with a flattening of 1:230 from truly spherical.  He derived that ratio from his model of gravity, and brilliant though it was to figure that out, it’s not quite right.  Measurements later showed that the flattening is more like 1:210.
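“Flattening” is just the fractional difference between the equatorial and polar radii, f = (a − b) / a.  As a sketch of how the ratio translates into actual radii, here are the modern WGS84 reference values (these are the current standard figures, not numbers from the paragraph above; note the modern flattening works out to roughly 1:298):

```python
import math

# Modern WGS84 reference ellipsoid values (standard figures, not from the post):
a = 6378137.0              # equatorial radius, in meters
f = 1 / 298.257223563      # flattening, defined as (a - b) / a

b = a * (1 - f)            # polar radius implied by that flattening
equatorial_circ_mi = 2 * math.pi * a / 1609.344   # 1609.344 meters per mile

print(round(b))                      # polar radius: about 6,356,752 m
print(round(equatorial_circ_mi, 2))  # about 24,901.46 miles
```

The equatorial circumference that falls out of these values lands right on the 24,901.55-mile figure quoted earlier—close, but not identical, because that figure came from a slightly different model.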

Except… we must go deeper

The model of Earth as oblate spheroid assumes that it’s a “body of revolution.”  That doesn’t mean it suffers from constant political uprisings; it means the model assumes it’s pretty smooth, as though it were turned on a lathe.  Of course, you know by now that nothing is that simple.  Earth is actually fairly lumpy, with bumps on it—think Rocky Mountains or Himalayas—and low spots like Death Valley.  So a better model is a complex ellipsoid with various undulations that make it NOT a body of revolution.

The real question is probably this: Why should I care?  The two numbers given above (24,901.55 and 24,859.82 miles) differ by less than 0.2%.  Does that matter?
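For the record, the gap between those two figures is a quick one-line computation (using exactly the two numbers quoted above):

```python
# The two circumference figures quoted earlier in the post, in miles.
equatorial_mi = 24901.55
polar_mi = 24859.82

# Relative difference between the two measurements.
relative_diff = (equatorial_mi - polar_mi) / equatorial_mi
print(round(100 * relative_diff, 2))  # about 0.17 percent
```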

If you’re launching a spacecraft, it matters.  But the real point is that when searching for something as straightforward as the “earth circumference,” there’s a backstory about the number that reveals a good deal.  (And personally, I find the backstory often more interesting than the final number… but I digress.)

Thing is, this is true for most things you look up.  Suppose someone asks “what’s the biggest city on Earth?”  A quick Google check shows that it’s Shanghai with 17,836,133 people.  Really?  Since the time you started reading this, some people died in Shanghai while others were born.  What does the instantaneous population of a city even mean? 

And so checking the backstory quickly leads to clarifying questions: “What do you mean by biggest?”  Are you measuring population or area?  Do you mean city boundary as determined by the city council or by measuring the urban core and excluding suburban regions? 

This is, of course, what reference librarians do—they conduct the “reference interview” to try and pin down the variable parts of someone’s question. 

When I was doing a field study at the University of Alaska (Anchorage),  I actually played the part of reference librarian for a couple of hours while the snow fell and it grew dark at 3PM.  Mostly the questions were “where’s the bathroom” (no backstory needed for that!) and “how do I get my printer to work?” 

But then a graduate student asked for a “list of the top ten journals in Sociology.”  He wanted to get a general awareness of what the field was about and what would be reasonable topics of current interest. 

Getting a list of sociology journals isn’t hard.  Rank ordering them according to some reasonable function is tough.  “What would you like?”  I asked.  “Readership size?  Citation rates?  Agreed-upon influence?  Most-often-checked-out?”  Those all produce somewhat different sort orders.

Thing was, he didn’t really care.  He didn’t need an ordered list—just ANY list of 10 good journals would be fine, thanks very much.

I’d just assumed that he wanted an ordered list.  That’s what working at Google does to you; I see the world as a ranking problem and just assumed he wanted a list with an evaluation.  But since he was going to spend as much time with journal #1 as with journal #10, it really didn’t matter. 

Any good psychologist will tell you that “the presenting problem is often not the root problem.”  Likewise, the first question is often not the real question.  Good searchers take a problem and worry on it a little.  Good searchers look behind the “answer” to understand how that answer came to be.  In the circumference case, it was the measurement method.  In the population problem, it was how “largest” was defined and then how that property was measured as well. 

If I could somehow give everyone a piece of search advice, I’d say “think like a fact-checker” and get one or two steps deeper into your topic matter.  You must go deeper in order to understand what it is you think you’ve found.  The fact you seek isn’t just a fact, but something based on a set of choices about how to measure and what to report. 

Of course, I can say this all I want.  But this approach hasn’t worked well in the past.  (People have been saying this for a long, long time.)  How can I help people see this is what they really want to do when they really want to understand a topic? 



  1. What do you mean by "really understand a topic"? Everyone's viewpoint is unique. I might be understanding this topic because it is part of another, larger topic or I'm curious because I'm travelling or .....


    1. It varies tremendously. I certainly don't claim to understand everything to all depths. My deeper point (forgive the pun) is that people often don't think about what they're finding on the SERP. If you're just looking for the correct spelling of "pneumonia," then no context, no backstory, no history, no interpretation is needed. But many things people look up DO need more background to understand the assumptions that go into the finding.

  2. you pose a vexing dilemma... do your fellow Googlers view it as a way to separate the "wheat from the chaff"? - 1% from 99%, in a different context?
    ... alas, one can lead a horse to water, but it is frowned upon if you drown Mr. Ed...
    interesting article you link from the NYT - with G playing a featured role:
    "and the catchphrase among the old guard became “You can’t trust Google.”
    "Soon — and astonishingly — Google became much more than trusted; it became shorthand for everything that had been recorded in modern history. The Internet wasn’t the accurate or the inaccurate thing; it was the only thing."
    It would seem frustratingly unlikely that people could be taught/compelled to take in more than the bare minimum they think they want or will satisfy an external requirement they have. Perhaps, in the short term, the most that can be done is to offer the least demanding/highest quality search results possible by developing a hybrid form of intuitive search engine - a largely thankless task. You might refer to these?: (included scheme-root as an example of active cross-linking.)
    btw, in addition to the resistance to go deeper, I've noticed another search-related phenomenon with your search quizzes that may be a contributing factor to superficiality: speed — I know I'm not even close to being a quick researcher, but I'm consistently surprised at the claimed speeds in answering your quiz questions... even when the results are inaccurate, misguided or missing altogether - it seems "speed" trumps almost everything else.
    Also, wanted to mention that I recently ran an ultramarathon event (100 miler) in just over 8 minutes, running in place in a specially modified DARPA "aircraft" @ ~ 300' above sea level - the only race-related injuries were some patches of mild skin abrasions. really, just fact check it. ;-)
    Will close suggesting you may want to do a stint as a "reference librarian" in Beijing for comparison to the AK experience... come to think of it, maybe making accurate information harder to access is a way to go?

    1. There's interesting discussion on this by researchers in search-engine-based IR to do with how the hell you measure 'success'. I know the big ones, using their own data and search engines, have their own algorithms. But assuming you don't, you can go for:

      1) Correspondence: 'success is 'out there'' - you can map information to needs perfectly, use of algorithms and 'experts' to assess success

      2) Coherence: 'success is constructed between system and user' - measure search duration, links traversed, etc.

      3) Affective measures: 'success is in there' - measure self-efficacy, subjective affect, etc.

      4) Communicative: 'success is use' - can the information be used effectively, but measuring that's hard because you need a far more nuanced perspective on the information need (and that need might fit into a wider scheme than the agent's direct need, e.g. a classroom), although you can probably use other approaches and supplement them.

      Anyway, what's interesting is that it might seem intuitively easy to measure success, and of course hit rate/usage is always going to be a pretty good indicator! But that doesn't tell us very much interesting.

  3. That's true to an extent Gord, but unless you're happy to be utterly subjectivist, the issue is that the activities or uses for which we gain information influence the nature of the need such that sometimes the cognitive environment - the search engine-agent pairing - is rather friendly and any plausible answer will do, and other times the uses are more complex.

    Dan - There's some cool research on epistemic beliefs, and framing (or scripting or schemata) and the ways in which individuals' (and their instructors') beliefs about knowledge impact on their behaviours. Sure they impact on the later evaluation aspect, but also on the way needs are defined, and then cycled through - the sort of fact checking you mention. The framing is sometimes to do with how the coarser grain task is defined - are we looking for worksheet completion, understanding, effective group work, etc.

    There's also some interesting philosophy stuff attached to aspects of this. Let me know if interested

    1. Sure. Drop something in here! We'd love to see it.

    2. So I think there's an interesting set of discussions about the psychological nature of epistemic beliefs and their impact on behaviour, but also on the ways in which epistemic beliefs could be read into policy, pedagogy, etc. That gets closer to a philosophical issue about what we mean by 'knowledge' (not what sorts of knowledge are important, or privileged or sociological issues - but what it means to know something).

      So I wrote my MA on what the implications for 'knowing' were of one take on mind (the extended mind perspective), suggesting that to 'know' you needed not only to internalise some token, but also to have some other 'knowing-how' skills to do with use. That's particularly interesting in the context of external artefacts like calculators, books, the act of writing, and of course the internet. Especially given that, in Denmark, students have access to the internet in a number of their school leaver exams - looks like quite a different take on knowledge. What that allows for is setting rather more complex creative questions rather than the sort of fact reproduction things other places often do. So basically, I wrote about this problem

      And part of the stance I took was to do with our nature as inherently technology using beings - see (coincidentally by my supervisor).

      I think a lot of the people writing in distributed cognition, anyone explicitly taking a pragmatic (epistemological) stance, etc. would be in broad agreement.

  4. Thanks for your understanding of what reference librarians do. But I think it goes deeper than fact checking. The reference interview is designed to elicit what the requestor really wants to know. In your example, it's entirely possible that the question "What is the circumference of the earth" is straightforward and, indeed, a fact-based question. It could equally be completely different. The requestor is actually curious about the number of miles Phileas Fogg would have traveled as he circumnavigated the globe in "Around the World in 80 Days" but thinks this is too complicated a question for a reference librarian, so tries to be "helpful" by "simplifying" the question.

    There's a second reference interview that information professionals employ -- and it happens in their brains. Having clarified the true nature of the question, they then ask "Who would be interested in this? Sufficiently interested that they would be collecting data?" This is where you've gone with your thinking about the bars in Elko in 1965 and your thoughts about NASA for the earth's circumference. This approach leads you to sources rather than instant answers.

    Let me go back to thinking as a fact checker. We information professionals and reference librarians would probably find that a bit too limiting. We are also looking for opinions, for different views of a topic, for a well-rounded group of information sources. We are also concerned with sources that present different answers and need to tease out why that is so. We can't reduce all the questions we get to the equivalent of fact checking. It's much more than that and requires critical thinking, creative approaches, and web search skills.

    This is not to demean fact checking. We do that too. And the mantra "we must go deeper" is a good one for information professionals.


    1. @Marydee -- I wasn't trying to capture all of what a reference librarian does or what a reference interview is. You're right; the point of the reference interview is to clarify what the requester is really after. I love it when a patron asks "What caused the Civil War?" because there's a great open opportunity for teaching. Of course, you have to be sensitive to how much information they want, what kind of information they can process, etc. But as I was trying to point out, even "fact-based" questions have interpretations that are assumed. Questions like "what is the air speed of an unladen swallow?" spring to mind. ("African or European...?" This is a Monty Python reference, if you care to go look it up...)

      So yes, fact checking is part of it; but a ref interview is much more. Absolutely.

    2. @Marydee -- I love your point about asking "Who would be interested in this? Sufficiently interested that they would be collecting data?" As a librarian and a search educator, I teach this as one of the most critical questions to help one locate information. If a searcher is able to think your question through, it suggests so many strategies to take in finding the information.

      Once that question is answered, it also helps to consider: "Who (of those who would be interested) do I trust to tell me about this?" Which brings one back to the fact-checking piece, in a way.

  5. Did you happen to hear the NPR program "On the Media" broadcast on February 24, 2012? It is entitled "The Lifespan of a Fact". It does address some of the issues that you discuss.

    Can we divide the world into two types of people? Those who believe that all the facts presented by the writer are true, and those who expect the writer to disclose at the very beginning that the reader should not assume the text is factual? Versus the people who cannot understand what the fuss is all about?

    To listen to the podcast go to:

  6. As a descriptivist editor, sometimes I do searches where I'm sure the Google search algorithm has no idea what information I actually want. For example, today I wanted to know whether the term "medium-format camera" should have a hyphen in it or not. The prescriptive rule says it should. But do people actually use it?

    I used to do an exclusionary Google search like this:
    "medium format camera" -"medium-format camera"
    followed by
    -"medium format camera" "medium-format camera"
    Then comparing the numbers of results would show me what usage the real world favors.

    But Google's search algorithm doesn't know that's what I want. It thinks I want to find a medium format camera, not information about the words in the phrase. Sometime recently the algorithm changed so it now treats hyphens in a phrase as if they were spaces. So both of the searches above now give zero results, and searching for "medium format camera" and "medium-format camera" give identical results.

    For the majority of people using Google to search, making this change in the algorithm was the correct choice. For those people, Google shouldn't exclude one type of result just because it has a hyphen and they searched for the phrase using a space instead.

    The algorithm doesn't have any way of knowing that I'm not one of those people, and that for me it makes a huge difference. But I sure wish there were a way to let the algorithm know.

    1. Peter, you can get the search you want with "verbatim" in the search tools on the left-hand navigation bar (but only AFTER doing the unsuccessful search).

      "medium format camera" -"medium-format camera" 85800
      "medium-format camera" -"medium format camera" 85900

      (not a big enough difference to claim that one is more popular than the other)