Thursday, January 16, 2014

The value of looking up stuff in other languages in Wikipedia

As noted, I'll write up the answer to this week's Search Challenge tomorrow (Friday, 1/17/14).  In the meantime, here's something I've been meaning to write up for a while.  Hope you find this interesting as well.  


You might have been using Wikipedia for a while, and not noticed that there are entire other worlds of information there on the page for you to examine.  Take a look at the left side of this Wikipedia entry about cats... do you see the column over on the left labelled "Languages"?  

Wikipedia comes in 287 languages (and more are being added).  Those links on the left are to the equivalent article in another language.  

What you might not realize is that articles are NOT THE SAME in all of the world's different languages.  Even fairly straightforward articles, like this one on "cats," can be very different, and well worth looking at for comparison purposes.  

Here, for example, are the outlines of the cat-article in English (on the left) and the Spanish Wikipedia article (on the right, in translation).  Note how very different they are.  

Wikipedia articles on "cat" -- EN version on the left, Spanish version on the right. 
For some reason, the Spanish authors of the "cat" entry go into MUCH greater detail about cat diseases (section 9.1 - 9.13) than the English authors (section 7.1).  The Spanish language edition covers, for example, the effects of second-hand tobacco smoke on the rate of oral cancers among cats.  (Bottom line:  second-hand smoke is bad for your pet cat as well.)  

Bear in mind that the topic of "cat" is fairly non-controversial.  When you compare more culturally, historically, or politically loaded topics, the differences become even greater.  

Here, for example, is a comparison of the English and Italian versions of the Wikipedia articles about "Leonardo da Vinci."  It makes sense that the Italian version would be much more in-depth, but it's also interesting to read the different treatments that each culture gives to his personal life and relationships.  (You can click on the image to see it full size.)  

Notice that just Section 1 on the life of Leonardo in Italian (12K words) is longer than the entire article about Leonardo in English (a mere 8K words).  

These kinds of cross-cultural comparisons are fascinating to make, but also potentially really useful when doing your research.  If you're frustrated by the lack of depth of the Wikipedia article in your language, try another language, perhaps one that's "closer to home" for the material.  
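If you'd rather find the other-language versions of an article programmatically than by scanning the sidebar, here is a minimal sketch in Python.  It assumes the MediaWiki Action API's `prop=langlinks` query with the default JSON format; the function names and the User-Agent string are my own inventions for illustration, and the sketch ignores result continuation, so treat it as a starting point rather than a finished tool.  

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def langlinks_url(title, lang="en"):
    """Build an Action API URL asking for an article's language links."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllimit": "max",   # ask for as many links as one request allows
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?" + urlencode(params)


def parse_langlinks(response):
    """Map language code -> article title from a decoded API response.

    Assumes the default JSON layout, where each link's title sits
    under the "*" key.
    """
    links = {}
    for page in response.get("query", {}).get("pages", {}).values():
        for link in page.get("langlinks", []):
            links[link["lang"]] = link["*"]
    return links


def fetch_langlinks(title, lang="en"):
    """Fetch and parse the language links (needs network access)."""
    request = Request(
        langlinks_url(title, lang),
        headers={"User-Agent": "langlinks-sketch/0.1"},  # hypothetical name
    )
    with urlopen(request) as reply:
        return parse_langlinks(json.load(reply))
```

Calling `fetch_langlinks("Leonardo da Vinci")` should give you a dictionary keyed by language code (such as "it"), with each value being the article's title in that language edition.  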

A tool for comparison 

I just recently ran across a wonderful tool that lets you see Wikipedia articles side-by-side.  Manypedia lets you look up a single topic in two languages, and then look at the Wikipedia articles from those languages side-by-side.  If we look up a topic we're interested in, say, the sculptor Bernini, we can see the English and Italian (in translation) versions side-by-side.  

I've added two numbers to the screencap to point out a couple of features. 

(1) is the list of all of the images from the Wikipedia article.  Notice how the list of pictures is VERY different between the two articles.  His great work "The Hermaphrodite" isn't even mentioned in the English version, but shows up prominently in the Italian article. You can spot this immediately by looking at the images summary. 

(2) shows a kind of word cloud (font size indicates word frequency in the article) for the Wikipedia entry.  Again, a quick scan shows real differences between the articles.  "Lorenzo" and "Barberini" are more important in Italian, but "Rome" and "Lorenzo" dominate in the English version. 
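The word-cloud idea is easy to try yourself on any two article texts.  Here's a rough sketch in Python: it counts word frequencies (the quantity that font size stands for in a word cloud) and lists the words that one text uses more than the other.  The function names and the four-letter minimum are my own choices for illustration; I don't know how the tool itself computes its clouds.  

```python
import re
from collections import Counter


def word_frequencies(text, min_length=4):
    """Count lowercased words that have at least min_length letters."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    return Counter(w for w in words if len(w) >= min_length)


def distinctive_words(freq_a, freq_b, top=5):
    """Words that appear more often in freq_a than in freq_b.

    Ranked by how much the count in freq_a exceeds the count in
    freq_b, with ties broken alphabetically.
    """
    diff = {w: c - freq_b.get(w, 0) for w, c in freq_a.items()}
    ranked = sorted(diff.items(), key=lambda item: (-item[1], item[0]))
    return [w for w, d in ranked[:top] if d > 0]
```

Run `word_frequencies` on the text of each language's article (translated, if need be), then pass the two results to `distinctive_words` in each direction to see what each version emphasizes.  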

Search lessons:  When you're looking up a topic that might "more naturally" fit in another language (such as Leonardo da Vinci, or the life of Gian Lorenzo Bernini), consider ALSO checking out the other languages of Wikipedia for additional perspectives and insights.  Often, you'll find a huge variation in the way in which the topics are discussed, and how they're presented.  (For a really interesting transcultural experience, compare the Wikipedia articles on "conspiracy theory" in multiple languages.  You'll see that not everyone thinks of conspiracies the way the US does!)  

Also, if you're interested in a deeper understanding of this kind of cross-language comparison analysis, I highly recommend this paper from my friends at Northwestern University.  They built a system, Omnipedia, that analyzes the differences in Wikipedia articles across languages.  It is a brilliant idea, and well worth the read.  


  1. Thanks Dr. Russell for the link and for taking extra time to give us more tools and knowledge.

    Some time ago, I thought Wikipedia in other languages was just copy/paste from the original. When I searched for a way to translate a page into another language, I learned that they are different, as you said, and therefore much more interesting. Now, with your link, we can learn more and faster.

    I also noticed that pages in some languages have a special mark to show they are well written and of more value to readers. So, we can look for those pages and translate them into our languages.

    Talking about languages, it was great to see how some peers searched using their own languages. In my case, most of the time I only search in English. A question for my peers whose primary language is not English: do you also search in English, or in your language and then translate to English?

  2. Hi Dan, I hadn't heard of Manypedia; it looks great (and like a great tool for cross-language Wikipedia editing!). If readers are interested in this stuff they might also like some of the Oxford Internet Institute (OII) Zero Geographies work on mapping controversial topics on Wikipedia (i.e. what language version of an article receives most edit wars).

  3. 2022 notes on this:
    * The link for "Manypedia" is broken as of 7-20-2022 (it returns a white screen), plus manually going there by entering the URL into my browser gives me the same result. As well, a Google search for it did not return that encyclopedia, plus its most recent Facebook post was from 2013. So, unfortunately, it looks like it's broken.
    * You made some very good points about the value of relying on Wikipedia for background information when doing search (both in this post and in "The Joy of Search"). And, based on that, I investigated further and found different studies had confirmed its value for that.

    However, last week, I learned that there had been a significant hoax concerning Wikipedia that came to light last month. In a nutshell, a Chinese housewife with just a high school education spent over a decade either creating false Chinese-language Wikipedia entries on medieval Russian history, or adding numerous fake details to existing entries. On top of that, she apparently used several different accounts to pull this off, along with being highly detailed and using lots of citations (albeit from sources that either did not exist or did not back up her claims).

    It was finally discovered in June, and the 206 entries she created or edited were deleted.

    Having written that, and acknowledging that research indicates that Wikipedia is highly accurate, it makes me wonder how many other completely or partially false entries like that there are on that site. So, in light of that, how would you advise that people use Wikipedia, when trying to look up information?

    * (only in Chinese) (apology letter from the vandal)

    1. Alas, Manypedia seems to have gone to that great online service in the sky. Too bad--it was really great. (And, worth noting, it was an academic research project. Once the graduate student who did it moved elsewhere, it was only a matter of time before it perished.)