Friday, February 19, 2010

Finding specific kinds of files on the web -- the wonders of filetype:

It's sometimes really useful to be able to find a specific kind of file when doing a Google search.  To do this, we use the filetype:  operator.  (Sounds like a scary word, but "operator" just means "tool" -- the thing we put in the query to get the effect we want.)

For instance, you might want to find a PowerPoint presentation on a particular topic, say, the botanical structure of flowers. A good query for this would be:

     [ flower tutorial filetype:PPT ] 

Here, the filetype:PPT part of the query limits the results to just PowerPoint files. (Note that capitalization doesn't matter. I just put the file extension in capitals to make it stand out.)
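If you find yourself running the same kinds of searches over and over, it can help to generate the search link from a script. Here's a minimal Python sketch that does that. It assumes Google takes the query text in the q parameter of https://www.google.com/search, and the function name search_url is just something I made up for illustration. The point is that the whole query, filetype: operator included, is plain text:

```python
from urllib.parse import urlencode

# Assumption: the search page accepts the query text in its "q" parameter.
SEARCH_BASE = "https://www.google.com/search"

def search_url(query: str) -> str:
    """Build a search URL for a query such as 'flower tutorial filetype:PPT' (hypothetical helper)."""
    # urlencode turns spaces into '+' and percent-encodes ':' as %3A
    return SEARCH_BASE + "?" + urlencode({"q": query})

if __name__ == "__main__":
    print(search_url("flower tutorial filetype:PPT"))
    # https://www.google.com/search?q=flower+tutorial+filetype%3APPT
```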

I most often use filetype:  as a way to look for scholarly papers.  I do a fair bit of reading, and often find myself with the name of a paper by a particular author.  

It turns out that academics love to put papers on the web in Adobe Acrobat format, which gives the file a PDF extension. So when I'm looking for a paper by Richard Nisbett and Timothy Wilson, I'll do the query:

     [ filetype:PDF Nisbett Wilson ] 

... and that will give me a whole set of papers by Nisbett & Wilson to read. This handy trick will often work when a paper is otherwise unavailable, which is useful to know when you're in deadline mode.

You can combine the filetype: operator with other operators to clarify what would otherwise be ambiguous searches. For instance, you can use double quotes to find Acrobat files that I've written, with a query like:

     [ filetype:PDF "Daniel M Russell" ] 

Why the double quotes? Because they tell Google to look for the terms "Daniel", "M", and "Russell" spelled exactly like that and in that order. If you take the quotes off, you'll get a million other hits, most of which have nothing to do with me. (So why didn't I put quotes around Nisbett and Wilson? Because Nisbett is a very uncommon name. Alas, Daniel, M, and Russell are all super-common, so I used the double quotes to restrict the search JUST to me.)
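To see how the pieces fit together, here's a small Python sketch of a query builder. The helper build_query is hypothetical (my own name, not any Google tool); it simply glues loose terms, an exact phrase wrapped in double quotes, and a filetype: restriction into one query string:

```python
def build_query(terms=(), phrase=None, filetype=None):
    """Assemble a search query from loose terms, an exact phrase, and a file type (hypothetical helper)."""
    parts = []
    if filetype:
        # Strip any leading dot so ".pdf" and "pdf" give the same operator
        parts.append("filetype:" + filetype.lstrip("."))
    if phrase:
        # Double quotes ask for these words exactly as written, in this order
        parts.append('"' + phrase + '"')
    # Unquoted terms can appear anywhere on the page, in any order
    parts.extend(terms)
    return " ".join(parts)

print(build_query(phrase="Daniel M Russell", filetype="PDF"))
# filetype:PDF "Daniel M Russell"
print(build_query(terms=["Nisbett", "Wilson"], filetype="PDF"))
# filetype:PDF Nisbett Wilson
```

Keeping the phrase quoting inside the helper means you can't accidentally forget the quotes on a common name like mine.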

The remarkable thing is that you can limit your searches to almost ANY kind of file you'd like. Here's the deep, dark secret: there's no magical list of file type extensions. The extension can be anything you want to search for.

What that means is that if you're looking for a very odd, very strange kind of file, you can limit your searches to just that kind of document.  For example, TSV often stands for "tab-separated values" and usually indicates a data file where the values are separated by tabs (rather than commas, or some other special character).

     [ filetype:TSV data ] 

This query looks for TSV files that also mention the term "data", which is a handy thing to know when you're scanning the web for data sets.
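Once you've found a TSV file, reading it is easy because it's just text with a tab between each value. Here's a minimal sketch using Python's standard csv module. The sample data is made up, and the sketch assumes the first line of the file holds the column names:

```python
import csv
import io

# A tiny made-up sample standing in for a downloaded .tsv file
SAMPLE = "name\tcount\napples\t3\npears\t5\n"

def read_tsv(text):
    """Parse tab-separated text into a list of rows keyed by the header line."""
    # delimiter="\t" is the only change needed from reading a comma-separated file
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)

for row in read_tsv(SAMPLE):
    print(row["name"], row["count"])
# apples 3
# pears 5
```

Note that every value comes back as a string, so you'll need to convert numbers yourself (for example, int(row["count"])).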

Other handy file extensions to know about:

LWP  -- Lotus Word Pro (a word processor format)
XLS -- Microsoft Excel file
PDF -- Adobe Acrobat (often used for documents with special layout)
TXT -- plain text (usually the format for README files)
PS  -- Adobe PostScript files
MP3 -- audio file format
MP4, AVI, MOV -- video file formats

But, if you want to look for [ filetype:CRAZY ], be my guest.  Search on!