Tuesday, July 2, 2013

What happened to the tilde ( ~ ) operator?

So…you might have heard by now, and have been asking: 

    Why did Google turn down the ~ (synonym) operator?

As regular readers will recall, until this past week Google had a single-term synonym operator. That is, you could do a query like this:

   [ ~beginner class ]       

and get automatic synonym expansion for "beginner" (novice, freshman, inexperienced...),

or you could write a query like:

   [ homicide investigation ~officer  “San Antonio” ~report ]

You can see why users might like this: they want Google's synonym expansion for "officer" and "report," but not for homicide, investigation, or San Antonio. That is, sophisticated searchers KNOW those terms (homicide, investigation, "San Antonio") will be in the articles they seek, but they don't know which synonym the article used in place of "officer" or "report" (“account,” perhaps, or whatever the synonym might be).

But the way to do this now is to quote the parts you don't want to change, and let Google's automatic synonym code do its work.  To wit: 

   [ "homicide" "investigation" officer "San Antonio" report ] 

Of course, in truth, I wouldn't use the quotes here for anything except "San Antonio."  

I agree it's a useful operator in a very small number of cases, but usage was minuscule, bordering on nonexistent, and the feature took up a disproportionately large amount of index space. So Google made the call to turn it off. (Or to put it in personal terms: I used ~ twice in all of 2011, and I write about this stuff.)

Changes in how synonyms work at Google mean that ~ really isn't needed much anymore. When Google launched ~, it didn't have automatic synonyms, and for the first few years after synonyms launched, Google was pretty conservative about using them to modify the query. That's no longer true. Today, in almost every multi-word query, Google adds synonyms to any unquoted term, but as a result of the past few years of analysis, it does a better job of keeping the query's meaning close to the original than ~ ever did. (There's a lot of clever bigram analysis underlying this, so it's clear that

By contrast, the ~ operator added synonyms irrespective of context, so they could radically change the meaning of a query. For instance, you probably don't really want to search for:

   [ ~dog star ]

since any high-frequency synonym for "dog" would almost certainly be used out of context here ("dog star" is a name for the star Sirius).

After fairly extensive measurements of this effect, it became clear that in most real use cases, the net interpretation of a query containing ~ was not what the searcher thought they were getting. In fact, ~ was actually damaging most of the queries that used it. (How do we know this? A: By running experiments... lots and lots of experiments asking people to rate the quality of the results from searches with and without the tilde. It wasn't even close.)

What’s more, ~ appeared in the logs so rarely that it wasn't worth the cost of maintaining the operator. Although it seems “free,” the index costs Google real money in storage, software complexity, and maintenance. (In addition, more than 90% of the uses of ~ we saw in the logs were clearly spurious and not intended to trigger synonyms. They were mostly just errors or bots.)

Tech-savvy users (e.g., librarians, reporters, professional researchers, you know… readers of this blog) can always specify synonyms manually using OR, e.g.,

   [ antidepressant OR SSRI OR fluoxetine side-effects ]

and get much tighter control over exactly which synonyms they REALLY want included in the expansion.
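For readers who assemble many research queries at once, the quote-what-must-stay and OR-the-alternatives approach can be sketched as a tiny string builder. This is a minimal sketch, not anything Google provides: the function names `quote`, `or_group`, and `build_query` are hypothetical, the code only produces the same kind of query text you would type by hand, and it assumes (as in the example query) that Google groups OR-joined terms together without needing parentheses.

```python
from typing import Sequence


def quote(term: str) -> str:
    """Wrap a term in double quotes so it is searched as written, without synonym expansion."""
    return f'"{term}"'


def or_group(synonyms: Sequence[str]) -> str:
    """Join hand-picked synonyms with OR: the manual replacement for the old ~ operator."""
    return " OR ".join(synonyms)


def build_query(
    fixed: Sequence[str] = (),
    synonym_groups: Sequence[Sequence[str]] = (),
    loose: Sequence[str] = (),
) -> str:
    """Build a query string (hypothetical helper).

    fixed          -- terms that must appear exactly as written (quoted)
    synonym_groups -- each inner list becomes one OR group you control
    loose          -- unquoted terms that Google may expand automatically
    """
    parts = [quote(t) for t in fixed]
    parts += [or_group(g) for g in synonym_groups]
    parts += list(loose)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(synonym_groups=[["antidepressant", "SSRI", "fluoxetine"]],
                      loose=["side-effects"]))
    print(build_query(fixed=["San Antonio"],
                      loose=["homicide", "investigation", "officer", "report"]))
```

Run as a script, it prints `antidepressant OR SSRI OR fluoxetine side-effects` and `"San Antonio" homicide investigation officer report`. Keeping the three kinds of terms in separate arguments mirrors the choice the searcher makes: which words are fixed, which alternatives are spelled out by hand, and which are left to Google's automatic synonyms.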

So while it’s true that Google has removed the manual ~ forced-synonym operator, this frees up resources for something even better and more globally useful, while also making the vast majority of former ~ queries much better.

7 comments:

  1. I decided the tilde was a bit useless long ago. But now that I'm thinking about it, I'm curious whether it worked inside quotes. Would the search ["he was an ~old man"] have returned results for 'he was an ancient man'?

    Alas, I am asking too late.


    1. Well... it DID work inside of quotes. (Again, not exactly well known.) Luckily, Google now applies synonyms inside of quotes (but very conservatively). If you really don't want synonym expansion, you have to quote each term within the scope of the outer quotes. Example: [ "Now is the time for all good "people" to come to the aid" ]. This will look for the phrase, but FORCE the use of the term "people" rather than any synonym, such as "man."

  2. Has Google ever considered replacing the old + sign operator with the = sign? Putting double quotes around single words is an annoyance that seems trivial, but I notice time and time again how much I miss the plus sign operator.

    1. Yes, we've considered it, but we don't think it would be widely used (at best). But I wonder why you think of double quotes as an annoyance. Isn't it just 1 extra keystroke? (Which is a pretty short amount of time, no?)

  3. I really appreciated this thorough and pragmatic explanation. Thanks.

  4. Just came back to check and see if this post was still here. I saw this posted to G+ https://plus.google.com/u/0/109856811237624829337/posts/UnEH67db8Ry

    At the bottom was a reference to the Tips and Tricks page on Inside Search. Sure enough, the tilde is still listed:
    http://www.google.com/insidesearch/tipstricks/all.html#similar-terms

    So is it still there, has it just been turned down in relevance, or does it not do anything at all?

    Just wondering. :-)

    1. Good catch. It's a bug. We'll remove it shortly. Thanks for pointing it out.
