So…you might have heard by now, and have been asking:
Why did Google turn down the ~ (synonym) operator?
As regular readers will recall, until this past week Google
had a single-term synonym operator. That
is, you could do a query like this:
[ ~beginner class ]
-- and get an automatic expansion of synonyms
for beginner (novice, freshman, inexperienced...)
or you could write a query like...
[ homicide investigation ~officer “San Antonio” ~report ]
You can see why users might like this: they want Google's
synonym expansion for "officer" and "report," but not for
homicide, investigation, or San Antonio.
That is, sophisticated searchers KNOW those terms (homicide, investigation, "San Antonio") will be in the
articles they seek, but they don't know which word was used for "officer"
or "report" (was it "account"?), or whatever the synonym might be.
But the way to do this now is to quote the parts you don't want to change, and let Google's automatic synonym code do its work. To wit:
[ "homicide" "investigation" officer "San Antonio" report ]
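If you assemble queries like this from a list of terms, the convention is mechanical: quote everything that must match as written, and leave bare only the terms you want expanded. Here is a minimal sketch, assuming you only need the query text to paste into the search box. `build_query` is a hypothetical helper, and nothing here calls Google. The square brackets in this post just mark where a query starts and ends, so the sketch leaves them out.

```python
def build_query(terms: list[str], flexible: set[str]) -> str:
    """Return query text that quotes every term except the flexible ones.

    Quoted terms (including multi-word names like San Antonio) are matched
    as written; bare terms stay eligible for automatic synonym expansion.
    Hypothetical helper: it only builds a string, it does not search.
    """
    parts = []
    for term in terms:
        if term in flexible:
            parts.append(term)          # bare: synonym expansion allowed
        else:
            parts.append(f'"{term}"')   # quoted: hold this term fixed
    return " ".join(parts)


print(build_query(
    ["homicide", "investigation", "officer", "San Antonio", "report"],
    flexible={"officer", "report"},
))
```

This reproduces the query above: "homicide" "investigation" officer "San Antonio" report.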
I agree it's a useful operator in a very small number of cases, but usage was minuscule, bordering
on nonexistent, and the feature took up a disproportionately large amount of
index space given that low usage. So Google made the call to turn it off. (Or to put it in personal terms: I used ~ twice in all of 2011, and I write about this stuff.)
Changes in how synonyms work at Google mean that ~
isn't really needed any more. When Google launched ~, it didn't have automatic synonyms, and for
the first few years after launching synonyms, Google was pretty conservative
about how they were used to modify the query. That's really not true any more.
In particular, in almost every multi-word query today, Google adds synonyms to
unquoted terms, but as a result of the past few years of analysis, Google does a better job of keeping the query's meaning close to
the original than ~ ever did. (There's a lot of clever bigram analysis underlying this.)
By contrast, the way the ~ operator used to work, it would add
synonyms irrespective of context, so they could radically change the meaning of
queries. For instance, you probably don't really want to search for:
[ ~dog star ]
because any high-frequency synonym for "dog" would almost certainly be wrong in this context (the "Dog Star" is the name of a star, Sirius).
After fairly extensive measurements of this effect, it became clear that in most real use cases, the interpretation of a query with ~ in it was not really what the searcher thought they were getting. In fact, ~ was actually making most of the queries that used it worse. (How do we know this? A: By running experiments... lots and lots of experiments asking people to rate the quality of the results from searches with the tilde and without the tilde. It wasn't even close.)
What's more, ~
appeared in the logs so rarely that it wasn't worth the cost of maintaining
the operator. Although an operator like this
seems "free," it actually costs Google real money in index storage,
software complexity, and associated maintenance. (In addition, more than 90% of
the uses of ~ we saw in the logs were clearly spurious and not
intended to trigger synonyms. They were
mostly just errors or bots.)
Tech-savvy users (e.g., librarians, reporters, professional
researchers, you know… readers of this blog) can always do manual synonyms
using OR, e.g.,
[ antidepressant OR SSRI OR fluoxetine side-effects ]
and get much tighter control over which synonyms they REALLY want to
see in the expansion.
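The manual OR technique is easy to script when you already have a hand-picked synonym list. Here is a minimal sketch, assuming you only want the query text. `manual_synonym_query` is a hypothetical helper, not part of any Google API. It writes OR in uppercase, because lowercase "or" is treated as an ordinary word, and it quotes multi-word entries so each one stays a phrase.

```python
def manual_synonym_query(synonyms: list[str], other_terms: list[str]) -> str:
    """Build 'syn1 OR syn2 OR ... other terms' from hand-picked synonyms.

    Hypothetical helper: it only assembles the query string.
    """
    def as_term(term: str) -> str:
        # Quote multi-word entries so they are searched as phrases.
        return f'"{term}"' if " " in term else term

    # OR must be uppercase to act as an operator rather than a word.
    group = " OR ".join(as_term(s) for s in synonyms)
    return " ".join([group, *(as_term(t) for t in other_terms)])


print(manual_synonym_query(["antidepressant", "SSRI", "fluoxetine"], ["side-effects"]))
```

Because you pick every alternative yourself, nothing outside your list gets added. That is where the tighter control comes from.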
So while it's true that Google has removed the manual ~ forced-synonym operator,
this frees up resources for us to do something even better and more globally
useful, while also making the vast majority of former ~ queries much better.
I decided long ago that the tilde was a bit useless. But now that I'm thinking about it, I'm curious whether it worked inside quotes. Would this search ["he was an ~old man"] have returned results for 'he was an ancient man'?
Alas, I am asking too late.
Well... it DID work inside of quotes. (Again, not exactly well known.) Luckily, Google now does synonyms inside of quotes (but very conservatively). If you really don't want synonym expansion, you have to quote each term you want held fixed, inside the outer quotes. Example: [ "Now is the time for all good "people" to come to the aid" ]. This will look for the phrase, but FORCE the use of the term "people" rather than any synonym, such as "man."
Has Google ever considered replacing the old + sign operator with the = sign? Putting double quotes around single words is an annoyance. It seems trivial, but time and time again I notice how much I miss the plus sign operator.
Yes, we've considered it, but it wasn't seen as something that would be widely used (at best). But I wonder why you think of double quotes as an annoyance. Isn't it 1 extra keystroke? (Which is a pretty short amount of time, no?)
I really appreciated this thorough and pragmatic explanation. Thanks.
Just came back to check and see if this post was still here. I saw this posted to G+ https://plus.google.com/u/0/109856811237624829337/posts/UnEH67db8Ry
At the bottom it had a reference to the Tips and Tricks page on Inside Search. Sure enough, the tilde is still listed:
http://www.google.com/insidesearch/tipstricks/all.html#similar-terms
So is it still there, has it been turned down in relevance, or does it just not do anything at all?
Just wondering. :-)
Good catch. It's a bug. We'll remove it shortly. Thanks for pointing it out.