Friday, June 6, 2014

Answer: Finding something in the UC System

This one turned out to be harder than I expected.  A bit like finding the eye of the lionfish.  
Lionfish up close. Bahamas, 2010.


These were the Challenges:  


1.  Can you find a talk (or seminar) that will be given somewhere in the UC schools during this week (between June 1 - 7, 2014) on some aspect of corals?  
2.  Remember my affinity for Parrotfish?  Is anyone in the UC system doing research on Parrotfish?  (Can you list their name(s) and what the gist of their research is?  Made-up example:  Dr. Smith at UCSD is studying feeding behavior of Bonarian Parrotfish.)  
3.  Suppose I decide to give up this crazy computer science lifestyle at Google and become a marine biologist. Which of the ten UC campuses has the best marine biology research program that I should join?   (Be sure to say why you believe that, given what you found in search.) 


The key to answering these Challenges is to realize that you need to search all 10 campuses at the same time.  

You could do a query like:  


     [ coral seminar June 2014 site:ucsd.edu OR site:ucdavis.edu OR  ...etc etc...   ] 

but that's pretty unwieldy, especially if you're going to do a query like this more than once.  
This is a task that calls for a Google Custom Search Engine.  

What IS a Custom Search Engine?  (aka CSE)  It's basically a way to encapsulate a search query and then add a few extra terms when you actually do the search.  

As an example, if you look on the right side of this blog you'll see a box that's labeled "Search the SearchResearch Blog" -- see that?  If you put search terms in there and click on the Search button next to it, you'll run a search over JUST the postings and comments from the blog.  It's just as though you'd done a search with a site:searchresearch1.blogspot.com filter on.  

In essence, it takes whatever terms you put into the query box, appends that site:searchresearch1.blogspot.com filter to them, and sends the result to regular Google.  So, if you want to find that post I wrote about parrotfish poop, you'd be doing a search like this: 

     [ parrotfish poop site:searchresearch1.blogspot.com ] 

Obviously, you can do a lot more with this (because ANY legal query can be the basis for the CSE), but for the moment you can think of the CSE as a giant SITE: restriction.  
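To make that mental model concrete, here's a minimal sketch in Python.  It only illustrates the "your terms plus a fixed filter" idea; it is not how Google actually implements a CSE, and the function name cse_query is made up for this example.

```python
def cse_query(user_terms: str,
              base_filter: str = "site:searchresearch1.blogspot.com") -> str:
    """Combine the searcher's terms with the fixed filter a CSE encapsulates.

    Illustrative only: cse_query is a hypothetical helper, not a Google API.
    """
    # Collapse stray whitespace so the final query reads cleanly.
    terms = " ".join(user_terms.split())
    return f"{terms} {base_filter}"


print(cse_query("parrotfish poop"))
```

Swapping in a different base_filter is all it takes to picture a different CSE.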

How does that help us?  

Because what I want to do is repeatedly do queries with a long list of SITE filters:  

     [ coral seminar June 2014 site:ucsd.edu OR site:ucdavis.edu OR   ...etc etc...   ] 
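Just to show how long that query gets, here's a rough Python sketch that assembles it.  The list of campus domains is my own assumption (only a few of them are named in this post), so check it before relying on it; multi_site_query is a made-up helper name.

```python
from typing import Sequence

# Assumed list of the ten UC campus domains; verify before relying on it.
UC_DOMAINS = [
    "berkeley.edu", "ucdavis.edu", "uci.edu", "ucla.edu", "ucmerced.edu",
    "ucr.edu", "ucsd.edu", "ucsf.edu", "ucsb.edu", "ucsc.edu",
]


def multi_site_query(terms: str, domains: Sequence[str]) -> str:
    """Build a query of the form: terms site:a OR site:b OR ..."""
    site_filter = " OR ".join(f"site:{domain}" for domain in domains)
    return f"{terms} {site_filter}"


query = multi_site_query("coral seminar June 2014", UC_DOMAINS)
print(query)
print(len(query), "characters")
```

Typing that by hand every time is exactly the chore we want to avoid.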

So let's build a CSE that searches over ALL of the ten UC campus websites. 

The first time you go to the CSE web page (Google.com/cse), it will look something like this: 



To make a new CSE, click on "Create a custom search engine"  (it's all free--don't worry).  

When the creation page opens up, enter a reasonable title for your CSE.  Here I've put in "UC Campus Search" 





And now you have to add in the sites for the UC campuses.  You do this by adding each campus site in the "Sites to search" box at the top of this page.  In the image below I've started to fill in a few... starting with UCSD.edu, then UCSF.edu, etc.  (Don't worry if you don't do all 10 campuses here; you can always add more later.)  

Note that I just used the top-level domain for each campus.  (e.g UCSD.edu)  The CSE will automatically include all of the subdomains (e.g., Scripps.UCSD.edu) in the search results.  
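Here's a small Python sketch of the matching rule I have in mind.  This is my approximation of the behavior, not Google's actual code, and in_site_scope is a hypothetical name: a page is in scope if its hostname is the listed site itself or ends with a dot followed by that site.  By the same rule, listing a site with a subdomain already attached (say, www.ucsd.edu) would narrow the scope to that subdomain.

```python
def in_site_scope(hostname: str, site: str) -> bool:
    """True if hostname is the listed site or any subdomain of it.

    A sketch of the rule described above, not Google's implementation.
    """
    hostname = hostname.lower().rstrip(".")
    site = site.lower().rstrip(".")
    # Require a dot before the site so "notucsd.edu" doesn't match "ucsd.edu".
    return hostname == site or hostname.endswith("." + site)


print(in_site_scope("Scripps.UCSD.edu", "UCSD.edu"))  # subdomain of the listed site
print(in_site_scope("ucsd.edu", "www.ucsd.edu"))      # listed site is narrower than the host
print(in_site_scope("notucsd.edu", "ucsd.edu"))       # different domain entirely
```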






Once you've added in the top-level domains for all 10 campuses, you'll have the "UC Campus Search" CSE!  

LINK to my "UC Campus Search" CSE.  You can use this to follow along for the rest of the blogpost, but I encourage you to try it out yourself.  

Now, when you go back to Google.com/cse you'll see this CSE.  If this is the first one you've made, you'll only have one CSE.  I've made quite a few; here's the top of my CSE list.  




AND NOW we've built our own tool for searching over all of the UC Campus web sites.  

You can use the CSE by clicking on the "Public URL" icon (looks like a chain) on the right.  That brings up a minimalist interface:  




Let's turn back to our Challenges.  

1.  Can you find a talk (or seminar) that will be given somewhere in the UC schools during this week (between June 1 - 7, 2014) on some aspect of corals?

I'm going to use my "UC Campus Search" CSE to search for: 

     [ coral seminar OR colloquium June 2014 ] 

(I did seminar OR colloquium because they're terms that are often used to describe talks given in a university.) 

Once I run this query, I get back 93 results: 



(I'm getting the ads at the top because I made the free version, which is supported by ads.  If you want to pay a bit, you can get the ad-free version of the CSE.  For my purposes, the free one is fine.)  

As you can see, there are quite a few seminars being given.  After poking around in the results for a bit, I find that there's a seminar on "Ocean Apocalypse Now" which discusses the future of corals in the sea, to be held at the Bren Center (UC Irvine) on June 2, 2014.  

You might notice that the first result ("Marine Biology Seminar") looks like it's incorrect.  If you click on that page and Control-F for "coral" you won't find anything.  BUT that's because it's a calendar page.  

Typically, when a calendar page is loaded, ALL of the months are loaded with it.  To find the seminar about coral, you have to go back to May, 2014, and you'll find there's a seminar on May 8th at Scripps entitled "The Benthic Underwater Microscope: A Novel Tool for In Situ Microscopic Observations of Coral Reefs."  Wish I'd seen that talk; it sounds excellent.  

See where this is going?  We've got a powerful search telescope that's trained on the UC campuses.  

2.  Remember my affinity for Parrotfish?  Is anyone in the UC system doing research on Parrotfish?  

Let's do the obvious search in CSE: 

     [ parrotfish research ] 

And we'll find lots of great hits.  If we'd tried to do this search WITHOUT the CSE, we'd be drowning in irrelevant results.  The CSE is a giant filter that gives us just the kinds of results we want.  





You can find lots of parrotfish scholars.  

The first result shown above is Melissa Roth at the Scripps Center for Marine Biodiversity and Conservation, who's doing really interesting research on parrotfish sizes on the reef.  Does it make a difference?  (Short answer:  Yes.  See her web page for details.)  



3.  Suppose I decide to give up this crazy computer science lifestyle at Google and become a marine biologist. Which of the ten UC campuses has the best marine biology research program that I should join?

So... this is a qualitative problem.  Let's first figure out WHICH of the campuses has a Marine Biology program.  

Using the CSE, I searched for: 

     [ "marine biology" major ] 

thinking that I would only want to go to a school with a marine bio major program.  

I quickly found that the leading contenders are UCSB (Santa Barbara), UCSD (San Diego), UCSC (Santa Cruz), and UC Davis.  The other schools have classes and programs, but these four schools seem to publish the most and have the highest activity in the area.  

So I'll just start drilling down into each school using a pattern like this: 

     [ UCSB "marine biology" ] 

in the CSE.  That quickly gives me a nice overview of the work going on at that school in marine bio.  
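If you wanted to write the whole batch of drill-down queries out ahead of time, a loop like this would do it.  It's a sketch only: the helper name drill_down_queries and the optional extra-term parameter are made up for illustration, and you'd still paste each query into the CSE by hand.

```python
# The four leading contenders named above.
CAMPUSES = ["UCSB", "UCSD", "UCSC", "UC Davis"]


def drill_down_queries(campuses, phrase="marine biology", extra=""):
    """Return one query per campus: CAMPUS "phrase" [extra]."""
    queries = []
    for campus in campuses:
        parts = [campus, f'"{phrase}"']  # quote the phrase for an exact match
        if extra:
            parts.append(extra)          # optional narrowing term (hypothetical)
        queries.append(" ".join(parts))
    return queries


for q in drill_down_queries(CAMPUSES):
    print(f"[ {q} ]")
```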

It doesn't take long for me to figure out that while all of the schools have fine programs, UCSD has the Scripps Institution of Oceanography, UCSB has the Marine Science Institute, UC Davis has the Bodega Marine Lab, and UCSC has the Institute of Marine Sciences (not to be confused with UCSB's similarly named institute) and works closely with the Monterey Bay Aquarium Research Institute.  

At this point, I'd start looking in detail at the research focus of each place.  I'd probably look closely at UCSD's Scripps, along with the institutes at UCSC and UCSB, and check out their interests in corals.  Probably by doing something like: 

     [ UCSD "marine biology"  coral ] 

which will tell me who on the faculty is currently doing active coral research, whether or not they have graduate students, and details of their work.  (That's how I found Jennifer Smith at UCSD who's working on coral reef stressors and effective management practices.  Now that sounds like a great research program.  Her YouTube lecture on "Benthic Coral Reef Community Dynamics" is great.  If I had a second career... )  

Bottom line:  For me, it's a toss-up between UCSD's Scripps (because of the coral research) and UCSC (because it's very good, it's close to where I live now, and it has an extensive research system).  

But they're all good.  You really couldn't go very wrong.   

Search Lessons: 

The biggest lesson for today is the use of the Custom Search Engine.  It's incredibly handy when trying to do repeated searches that involve filtering or query modification.  

CSEs are also really useful when you want to focus in on just a particular KIND of result (e.g., limited to a timespan, or with certain META tags).  I'll write up more on CSEs in the future, but for today, remember that they're incredibly easy to set up and share (by making them public).  


26 comments:

  1. A question for clarification - in your note above, you say "Note that I just used the top-level domain for each campus. (e.g UCSD.edu) The CSE will automatically include all of the subdomains (e.g., Scripps.UCSD.edu) in the search results."

    But in the image it shows using the wildcard (*) for parts of the site and also for subdomains. So is the wildcard not needed?

    1. The * is optional. Doesn't hurt (but doesn't do anything different either).

    2. Next for clarification - does inputting the url as www.example.com limit the SERP as opposed to example.com?

    3. Yes.

      The results from [ site:www.nytimes.com TERM ] are not the same as those from [ site:nytimes.com TERM ]

      If you add a subdomain (which is what the WWW in front of NYTimes.com is), then the search is limited only to that subdomain.

      Another example: if you search [ site:searchresearch1.blogspot.com Delventhal ] the results are different than if you searched for [ site:blogspot.com Delventhal ]

    4. I have to share this with everyone. It brings to the forefront how this blog is loaded with great information & some really good answers. Fred, while I was reviewing your comments here I wanted to be sure I understood the results mentioned above, so I ran the queries Dr. Dan used to explain how subdomains work and did [site:searchresearch1.blogspot.com Delventhal]. And the first item listed is an answer Fred gave in Sept 2010. Wow Fred, that was great! Pretty cool.
      http://searchresearch1.blogspot.ca/2010/09/wednesday-search-challenge-sep-29-2010.html

      Ramón you're right we need a way to see additional comments on old challenges. In the meantime I'm sure it's worth the time reviewing old challenges. I've done that on occasion but I would like to do more.

    5. Thanks Rosemary. The Google Search Stories Video Creator is, alas, one of the tools that is no longer available. Teachers and students loved it because it made the "how did you get your answer" or "What search terms did you use" or "SHOW YOUR WORK" fun and interesting.

    6. I thought the top-level domain included www. Thanks for your questions and answers, Fred, Rosemary and Dr. Russell. Thanks to that, I learned something new and got rid of a misconception. I am now reading more about domains.

      Remmij, you always find something interesting.

  2. Sure glad you posted this answer early! Never knew the capabilities of CSE until now. Anne and I had heard about CSE but didn't realize that it could be used for a search query like this. Hope you post more on this because it is really interesting.

    Dan, if you found that most people don't know about Control/Command F, the number of people who know about how to use CSE in this manner has got to be way less than 1% of the population.

    1. Good Morning, Dr. Russell and everyone.

      The answer gave us a fantastic new tool. I tried it once some time ago but, for some reason, didn't understand the tool, so I never used it again. Now, I finally understand its function. I thought it worked for something else.

      Thanks Jon for the Wikipedia tip.

      Thanks RoseMary for trying calendars. Yesterday I had no time to try. Glad you found a way and shared it with me.

      Luis, for Hangouts, you don't even need to follow someone. Just search for them and send a message. In settings you can choose whether they can contact you automatically or need to send an invite. Either way, the answer is almost automatic. A message appears or you can search in invites.

      Dr. Russell, I was thinking the other day about past challenges and that new posts are made weeks, months or even years after the challenge happened. Is there a way to notice when someone posts something new in past challenges without needing to check one by one? Thanks

      Have a great one, everyone.

  3. That's slick.

    Had never heard of it,

    Back to Live D-Day coverage

    jon

  4. A useful tool to use with Google's Custom Search Engine is the Google Marker bookmarklet. If you come across a site that you want to add to one of your CSEs, you just click the bookmarklet to add it. Looking at the others' CSEs for this challenge I realized I left off universityofcalifornia.edu. Here's a screenshot of what the popup looks like when I added it.
    It saves time when you find a site later and don't have the CSE site open already.

    1. Thanks for sharing this tool, Fred. It is very useful.

  5. caveat: none of the following is about CSEs or filter… even though they are super important.

    coral reefs - super important, arrrrr
    produced in partnership with the Center for Marine Biodiversity, The Scripps Institution of Oceanography

    evangelism is super important - Dan makes search super important.

    photomegatron
    notes
    may be of particular interest to Fred (btw, useful tip on the Google Marker… a definite time saver) -
    TED & Furby UCSD
    Furby, Sandin Lab bio
    scan page 11

    and just because… had to picture something other than CSEs…
    Mobula Munkiana, or Munk's Devil Ray, is named after Scripps Institution of Oceanography geophysicist Walter Munk
    Lion terminus
    Bumpheads & Waldo
    black tip

  6. Just wanted to say that the Lionfish eye is very hard to find! They look amazing in all colors that I look.

    Thanks for sharing this fish picture, Dr. Russell

    1. Ramón - enjoyed your photo from this a.m. (Ramón's Popocatépetl) Found this footage that seemed to be a nice complement to your image. Popocatépetl active
      Your picture worked well with Dan's Sea Ranch pic illustrating the ever changing nature of natural change — nicely done by both of you.

    2. Good Morning and great week, Dr. Russell, Remmij and everyone.

      Thanks for your comments, Remmij, and thanks for the video of Popocatépetl. It looks amazing. Glad you shared with us and being able to see it.

      Dr. Russell's photos are fantastic. And now with Photospheres, they look even more magnificent.

    3. … speaking of Madre Naturaleza
      you are correct about the DrD PhotoSphere, impressive, although I did wonder about the surf horizon - guess it is a
      hand-held thing, but I liked it, made it more human… in a non-Eugene sort of way;-∫

    4. The mis-alignment seems to be a particular difficulty that the Photosphere stitching software has with handheld images. (I made that with my Moto X phone camera.) I thought I was being steady, but I guess not quite steady enough...

    5. still think it is pretty impressive, the stitching software, the quality of the phone cam optics and the photographer's eye - just curious, if you recall - was that the start/stop point of the 360? there is a color difference right there too that might reflect a larger time gap in the sequence… may be more than being steady - a fair amount of discussion about "no-parallax point" & "the lens's entrance pupil"… it can become complicated quickly… regardless, you seemed to capture the spirit of the scene quite succinctly; I rather liked the jog in optic space… think you could probably exploit that characteristic - stitching perfection can wait. Wonder if anyone is doing much time-lapse photospheres? or rearranged sequences?
      A couple items for the PS (the other ps) curious:
      GooPS aid
      No-parallax
      complicated
      reddit/ANDROID

  7. Is there a way to search within hundreds of websites at the same time without affecting the quality of the results? From my understanding, the CSE becomes a lot less accurate once you reach 20 websites?

    1. CSEs work just as well with 21 websites as with 20. (Where did you hear this story about decreasing accuracy?)

      There IS a limit to the number of sites you can include in a CSE (I don't know what the max number is). It's an interesting question, though--what kind of task requires you to search through a limited number of sites? (something like a medical research task?)

    2. "If your custom search engine includes more than 20 sites, the results may differ from the results of a 'site:' search on Google.com and your search engine may sometimes display fewer results." https://support.google.com/programmable-search/answer/70392?hl=en

    3. https://support.google.com/programmable-search/thread/3082211/cse-domain-limit-reduced-from-5-000-to-20?hl=en

    4. In regards to the task: There are 82 external links in this webpage https://www.remotecompany.com/blog/remote-first-companies-list . I want to be able to search in them at once (keyword: content curator) or if that's too rare of a keyword then "analyst".

    5. One task I need it for is searching within fully distributed companies (companies that are 100% remote and accept applicants from any country). The positions I'm looking for are numerous, so I'd need to look for "analyst" or "curator" or "writer" or "psychology" or "seo" etc...
