Tuesday, October 4, 2011

Language for searching is subtle

In central London, just across the bus-clogged street from Victoria Station, there is a very modern newsroom.  It's a large, open-plan space full of desks arranged like the spokes of a wheel pivoting around the central conference desk.  Each line of desks has two monitors at each place, and an autumnal scattering of newsprint broadsheets all around, lending a sense of functional chaos to the orderly and ergonomically correct work stations.  There’s a sense of new-media about the place.  Lots of stories being written around the clock, many Twitter feeds examined, telephones everywhere and even a tiny studio just off on the side so new media journalists can do a quick standup video or audio recording when needed.

With the radial layout, it’s a bit of a panopticon design, shades of Jeremy Bentham and the all-seeing eye, although in this case it’s intended to help people on different desks work together efficiently rather than the pervasive monitoring of inmates as Bentham proposed. 

I was in the newsroom to teach journalists a few of the finer points of search.  Of course they all use Google every day, so everyone knew many of the basics, but once they were outside of their comfort zone, I realized once again that even the best investigative reporters know only a fraction of what’s really possible.  One more time I see that while they’re often great reporters and have a drive to get to the bottom of a story, even the young reporters tend to follow the tropes and patterns of previous years, and that limits them.

It’s been like that everyplace I’ve gone in the past two weeks travelling throughout Europe lecturing, teaching and giving press briefings.  London, Dublin, Warsaw, Prague, Hamburg, Zurich…  The raw data tells you a bit: 2 invited talks; 14 press briefings;  7 classes taught (5 for Googlers, 2 for  journalists).  That’s a lot to do in 8 working days.  (And for the scariest piece of data:  40.2 hours of flight time.  That is, forty hours in the aluminum-tube-that-flies.) 

I did an interview on Czech television that was good fun, and then the next day I was in Hamburg, answering the same questions about what makes someone a good searcher on Google, but this time with a slightly more German twist.  Sample question:  “How can someone be the most efficient searcher possible?”  That’s an interestingly engineering-style question.  It’s very Googley, but also very different from questions I got from reporters in Dublin.  There the questions were more about the user’s experience—“how can we be sure that the searcher is really happy with what they find?”  I don’t mean to caricature, but there’s a reason the stereotypes are the way they are.   Hamburg… Dublin… they have very different outlooks on life.  I suspect they also search differently, although I didn’t do any studies to find out.

On the other hand, in the Swiss newspapers I became the “Chef für Kundenzufriedenheit,” literally, “chief for customer satisfaction,” which is not quite the way I think about myself, but I can see how they got that from “user happiness,” which IS in my job title.  It’s a distinction that matters in English, but does it work that way in German?  Don’t know. 

These subtleties in language kept coming up again and again.  One of the things I teach is the way to use additional search terms that describe the *kind* of thing you want to find.  A nice example is to do an Images search for [bicycle diagram]: 

which always gives you a page full of nice diagrams, each with the parts all labeled.  Alas, when you do this in German, it doesn’t work.  Turns out that the German word “diagramm” (that’s their spelling) has a slightly more limited meaning than “diagram” in English.  (You have to use the German word “schema” to get labeled diagrams.)  Likewise, a word like “serious” (“serious” or “seriös” in German) has several meanings in German.  But you can’t say (in German) “he was seriously ill”; that sense of “serious” as “substantial in number or size” doesn’t carry over into German.

I shouldn’t have been surprised, really.  I know all about false cognates (example: Spanish “día” means “day” but has no relationship to “diary” in English; it just looks like it does).  But somehow I had the impression that a relatively simple word like “diagram” (a word that IS a true cognate across language pairs) would also copy all of the subsenses of the word as well.  Nope.  Not true.

What’s so interesting to me is that search strategies that I thought would work across different languages don’t turn out to be very robust.  When I tried the [ diagram ] trick in German, I had to shift to a synonym for “diagram.”  On the other hand, it works fine in Spanish (where the query is [bicicleta diagram]).  But when I try this search tactic in Japanese, I can’t make it work at all!  (If you can make this work in Japanese, please let me know what you did!)

What this tells me is that language variations are even more subtle and profound than I’d thought.  It gets even trickier when you try to remember to account for loan-words across languages.  The technical term for an oft-used bicycle frame alloy of chromium and molybdenum is (in English) Chrome-moly.  But in Spanish it’s “Cro-Moly,” which makes *half* sense.  “Chromium” (EN) is “cromo” (ES), but molybdenum is “molibdeno” (ES), so you’d think the Spanish tech name would be “Cro-Moli.”  But it’s not: the term “Cro-Moly” is half-Spanish, half-English.  Who thinks this stuff up?

The bottom line is that language is rich, complex and variegated in its intricacies.  You’ve got to know it and love it.  You need a kind of panopticon of the mind to see all the variations.  (DE: Panoptikum; ES: panóptico; FR: panoptique)


  1. The image search
    japanese jitensha diagram
    worked fairly well for me. I started with
    japanese bicycle diagram
    and lost the "japanese" modifier, so changed "bicycle" to the japanese word.

    I also found a few with
    (Google translation of "exploded diagram of bicycle")

  2. In which sense is "diary" unrelated to "día"? As far as I can see, they both originate from Latin "dies".

  3. Greg -- The root of "diary" is from Latin, "dies" -- you're quite right. However, DAY has a more complex etymology that is NOT Latin in origin. DAY: Originally from Old English, dæg "day," also "lifetime," from P.Gmc. *dagaz (cf. O.S., M.Du., Du. dag, O.Fris. dei, O.H.G. tag, Ger. Tag, O.N. dagr, Goth. dags), from PIE *dhegh-. Not considered to be related to Latin dies (see diurnal), but rather to Skt. dah "to burn," Lith. dagas "hot season," O.Prus. dagis "summer." Meaning originally, in English, "the daylight hours;" expanded to mean "the 24-hour period" in late Anglo-Saxon times. (From: http://www.etymonline.com/index.php?allowed_in_frame=0&search=day&searchmode=none)