SearchReSearch: research


Tuesday, July 12, 2011

Answer: How many planes from SFO / day?

Last week I asked a question that should be pretty simple to answer: How many regularly scheduled flights depart SFO (San Francisco airport) on a typical day?

The hard part here is to find a reasonable data source of flights from SFO. Here's what I did:

First, I searched for [ faa flight departures ]

When I scanned the results, I quickly found that the FAA has an "ASDI Data Feed," which shows all of the flights that the FAA tracks. That wouldn't be a bad thing to use, but I only wanted to do this once, for one airport, not over and over again, so it seemed like overkill.

In my reading of the results I also learned about codeshares. Ah, that's right: any listing of airline flights will show the same flight under multiple carriers. Since I wanted the number of planes departing, I needed to exclude the codeshare flights. I didn't want to double- or triple-count these, so I wanted a single code for each plane.

Then, after searching for [ SFO flights listing ], I found that FlightStats.com lets you search for all departures by the hour! Although it's inelegant (and I wouldn't want to do it this way more than once), I could manually copy/paste all of the departures from SFO into a spreadsheet. Luckily, FlightStats also has a "Remove codeshares" button, which let me skip the post-download step of removing dups.

It seems painful, but the truth is that I was able to grab the 18 screenfuls of data in about 3 minutes and paste them into my spreadsheet. Great!

After spot-checking to make sure that I really DID have all the flights, that there really were no codeshares, and that I hadn't accidentally duplicated data somehow, I was ready to go. Finding the number of flights was then just a matter of looking at the last row in the sheet: there are 583 regularly-scheduled flights / day out of SFO (at least for July 12, 2011).
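If you'd rather script that spot check than eyeball it, here's a minimal Python sketch. It assumes the pasted rows were saved as CSV with hypothetical columns named flight, departs, and destination, and the sample rows are made up for illustration (they are not real SFO data). It drops exact repeats (say, a screenful pasted twice) and flags rows that leave at the same minute for the same destination under different flight numbers, which deserve a second look as possible codeshares.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; column names are assumptions, not FlightStats' format.
SAMPLE_CSV = """flight,departs,destination
XX 100,06:05,LAX
XX 200,06:30,SEA
ZZ 900,06:30,SEA
XX 100,06:05,LAX
YY 300,07:15,JFK
"""

def load_departures(text):
    """Parse CSV text into a list of dicts, one per pasted row."""
    return list(csv.DictReader(io.StringIO(text)))

def row_key(row):
    return (row["flight"], row["departs"], row["destination"])

def drop_exact_duplicates(rows):
    """Keep the first copy of each row; later identical rows are paste errors."""
    seen = set()
    unique = []
    for row in rows:
        key = row_key(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def codeshare_candidates(rows):
    """Group flight numbers by (departure time, destination); report groups of 2+."""
    groups = defaultdict(set)
    for row in rows:
        groups[(row["departs"], row["destination"])].add(row["flight"])
    return {slot: sorted(f) for slot, f in groups.items() if len(f) > 1}

rows = load_departures(SAMPLE_CSV)
unique = drop_exact_duplicates(rows)
print(len(rows), len(unique))        # 5 4
print(codeshare_candidates(unique))  # {('06:30', 'SEA'): ['XX 200', 'ZZ 900']}
```

This is only a heuristic: two different planes can occasionally leave for the same city at the same minute, so flagged rows call for a human check rather than automatic deletion.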

But I couldn't stop there! From here, it was easy to write out a CSV file and then import that into Google Fusion Tables in order to use the map visualization tool that I know is there. A quick import, and then a click on "Visualize" and then "Map" yields:

Interesting, is it not, that there are more flights to more cities in China than to Russia, Africa and Australia combined?

And for my last quick look at the data, I computed a histogram to see the number of flights departing by hour of the day. As you can see, it's quiet after midnight, but pretty steady for most of the day.

And finally, this chart groups departures into 30-minute buckets...
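The hourly histogram and the 30-minute chart are the same computation with different bucket sizes. Here's a minimal Python sketch, assuming departure times are "HH:MM" strings on a 24-hour clock; the times below are hypothetical, not the real SFO list.

```python
from collections import Counter

def histogram(times, bucket_minutes=60):
    """Count departures per bucket, labeled by each bucket's start time."""
    counts = Counter()
    for t in times:
        hours, minutes = map(int, t.split(":"))
        minute_of_day = hours * 60 + minutes
        # Round down to the start of the bucket this departure falls in.
        bucket_start = (minute_of_day // bucket_minutes) * bucket_minutes
        counts[bucket_start] += 1
    return {f"{b // 60:02d}:{b % 60:02d}": counts[b] for b in sorted(counts)}

# Hypothetical departure times for illustration only.
times = ["00:15", "06:05", "06:30", "06:45", "07:15", "07:20"]

print(histogram(times, 60))  # {'00:00': 1, '06:00': 3, '07:00': 2}
print(histogram(times, 30))  # {'00:00': 1, '06:00': 1, '06:30': 2, '07:00': 2}
```

Empty buckets simply don't appear in the output; a charting tool (or a loop over every bucket of the day) fills them in as zeros.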

So now you know the quiet times for departures at SFO. I've flown enough times out of SFO to say with some certainty that this is probably right.

People who left comments (Hans, Julia and gasstationswithoutpumps) all had reasonable guesses, but they were estimates from other people's summaries. That's a valid method when you can't get to the actual data, but when possible (as in this case), I like to get the raw numbers and do the analysis myself. (Still--Hans got extremely close to my number! Nice job.)

There are a couple of search lessons here....

1. When you can, get the raw data and do your own analysis. Other people's analyses often contain assumptions that you can't verify (and they might not even explain). Be careful!

2. A really important part of the search analysis process is learning stuff along the way. This is such an important concept that we'll have to talk about it in more detail in a future post, but I wanted to highlight it here. I was reminded about the codeshare problem as I read through the initial search results... and that let me know how to frame my subsequent searches to be more to the point.

Keep searching!

Postscript: I also should mention here Aaron Koblin's wonderful visualizations of air traffic patterns. It's not quite what I was asking for, but you might be interested in seeing them.

Here I zoomed into his flow patterns around SFO. I added the labels for a bit of clarity.

Monday, March 21, 2011

What counts as a credible source for a newspaper article?

There's been a recent kerfuffle in the NY Times about what counts as a credible source for news reporting.

To start, there was an article in the Times about how Michelle Obama is advocating breastfeeding. The problem begins with the article containing a few quotes from “anonymous bloggers.”

Arthur Brisbane, the public editor for the Times and the “reader’s representative” who responds to complaints and comments, wrote in his commentary:

“The Times’s policy on anonymous sourcing describes the use of anonymous sources as a “last resort to obtain information that we believe to be newsworthy and reliable.” The policy goes on to say that the identity of every anonymous source should be known by at least one editor. Unfortunately, it is all but impossible to know the identity of an anonymous blogger.

That said, I can see some merit in reporting on how a particular political controversy is playing out in the blogosphere, so long as that is clearly described as the basis of the story.”

Kate Zernike, the reporter who wrote the original story, responded, saying:

“…her reporting included interviews, as well as the blog material, and that interviewees were reluctant to be quoted. The blog material supported what she was hearing in her interviews, she said.

“I wouldn’t have taken only blog postings, but given that they backed up what I was hearing elsewhere – and that this was generating so much comment online – they seemed to be relevant,” she said.

So… should you use anonymous blog posts in news coverage? I would generally say no; they really don't add any value to the conversation.

The point of reporting is to add some information or insight to the public conversation. If the blog quotes were brilliant, incisive bits of text that added a perspective or nuanced and colorful language, they might be a plausible addition. But the quotes are pedestrian. They feel like filler more than content.

When you’re trying to make sense of a complex argument or data-space, your task is to gather insightful, dataful and useful content. Anonymous bloggers do not a source of data make.

Now if she had said 90% of all blog posts in the US during the past week were pro- or con- this position, THAT would have been an interesting insight. (I presuppose that the author would have some way of crawling the blogosphere and sentiment-labeling the posts; something, perhaps, along the lines of “We Feel Fine” by Jonathan Harris and Sep Kamvar.)
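To make that hypothetical concrete, here's a deliberately naive Python sketch of just the counting step: label each post pro, con, or neutral by matching words from two hypothetical keyword lists, then report the share of each. It is not the method of the project mentioned above, and the keywords and posts are invented for illustration; real sentiment labeling would need a far better classifier (and a real crawl).

```python
import re
from collections import Counter

# Hypothetical keyword lists; a serious study would use a trained classifier.
PRO_WORDS = {"support", "agree", "great"}
CON_WORDS = {"oppose", "disagree", "wrong"}

def label_post(text):
    """Label a post by which keyword list it matches more often."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    pro = len(words & PRO_WORDS)
    con = len(words & CON_WORDS)
    if pro > con:
        return "pro"
    if con > pro:
        return "con"
    return "neutral"

def stance_shares(posts):
    """Fraction of posts labeled pro, con, and neutral."""
    if not posts:
        return {"pro": 0.0, "con": 0.0, "neutral": 0.0}
    counts = Counter(label_post(p) for p in posts)
    return {s: counts[s] / len(posts) for s in ("pro", "con", "neutral")}

# Invented example posts.
posts = [
    "I support this idea",
    "I oppose it, it is wrong",
    "No opinion here",
    "Great move, I agree",
]
print(stance_shares(posts))  # {'pro': 0.5, 'con': 0.25, 'neutral': 0.25}
```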

But one unknown blogger with an opinion is pretty cheap, content-free stuff. The NY Times really shouldn’t be reaching that low for their sources.

Keep searching! (And always use credible sources.)

Friday, January 14, 2011

Quick check on the validity of an image

Did anyone else see that amazing image of the International Space Station shown in silhouette against the sun?

http://media.npr.org/assets/img/2011/01/06/eclipse110104_solar_transit_33.jpg

I wondered--is that even possible? Or is this a completely Photoshopped image?

First, how big is the ISS? Here's a picture from NASA to give a sense of scale:

In other words, it's about one flying-football-field.

How big would a football field look if it were flying at the altitude of the ISS?

Well... here's a bit of math I did to do a quick check on it.

I did a few obvious searches to figure out that the ISS is 109m maximum width and flies at an average distance of 350km. With these two bits of info and a little geometry, you can work out what the subtended linear angle of the ISS would have to be. The geometry is easy...

54.5 m is half the ISS width, which you need as the base of the right triangle; the 350 km altitude is the other leg. By Pythagoras, the hypotenuse comes out to about 350.000004 km (the 54.5 m leg barely changes it).

To work out the subtended angle, you just compute arcsin ( 54.5 m / 350.000004 km ), which is about 0.009 degrees.

Luckily, this is really easy with Google.

How did I do that? Once I'd worked out the hypotenuse length (about 350.000004 km), I just turned to Google Calculator. Here's my query:

[ arcsin ( 54.5m / 350.000004 km ) in degrees ]

That is, this bit of trig computes "what is the subtended angle of the ISS?"

( NOTE: I added "in degrees" at the end because the Google Calculator gives back sin/cos/tan/arcsin... (etc) measurements in radians. But I wanted degrees, because I happen to know (from another query) that the sun is 0.53 degrees wide. )

This Google Calculator expression tells us that half the ISS width subtends about 0.009 degrees (remember, we split the ISS in half in order to do the right-angle trig above). So the subtended linear angle of the ISS at 350 km overhead is about 0.018 degrees.
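For anyone who wants to redo the Google Calculator step offline, here's a minimal Python sketch of the same right-angle trig. It uses the 109 m width and 350 km altitude from above and, like that calculation, assumes the ISS is directly overhead; the function name is just for illustration.

```python
import math

def half_angle_deg(width_m, distance_m):
    """Angle between the sight lines to the object's center and to one edge."""
    half_width = width_m / 2
    hypotenuse = math.hypot(distance_m, half_width)  # Pythagoras
    return math.degrees(math.asin(half_width / hypotenuse))

ISS_WIDTH_M = 109.0     # maximum width, from the searches above
ALTITUDE_M = 350_000.0  # average altitude; assumes the ISS is straight overhead

half = half_angle_deg(ISS_WIDTH_M, ALTITUDE_M)
full = 2 * half
print(f"half-angle {half:.4f} deg, full width {full:.4f} deg")
# half-angle 0.0089 deg, full width 0.0178 deg
```

The Pythagoras step hardly matters at this scale: math.hypot(350_000, 54.5) is only a few millimeters longer than 350 km, so the result is essentially 54.5 m / 350 km converted from radians to degrees.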

So...

... if you measure the photographed width of the ISS in the image, the width of the ISS image should be about 1/50th the width of the sun image. (How do I know that? The subtended linear angle of the sun is 0.53 degrees. Divide 0.53 by 0.01 and you get 53.)

I measured it quickly by doing a little copy/pasting on the image (see below), and got 1/52nd... close enough. (That's well within measurement error on my part.) Here's the diagram I drew to measure it. I took the image and drew a light blue line below the ISS that's exactly the width of the ISS. I then drew a bunch of dark blue bars that same width across the radius of the sun.

Each of the dark blue bars is 1 ISS width. Each of those light blue lines is 10 ISS widths across... so it looks to me like the sun is 52 ISS widths wide.

I also looked around a bit, and found multiple OTHER images made by other amateur astronomers, all showing the ISS at more-or-less the same size with respect to the sun.

Overall, the picture checks out. It's internally consistent (that's why I was measuring it) and it's been replicated by other astronomers. So yeah... I believe it.

What a beautiful image!

Search (and measure) on!

Saturday, January 30, 2010

About this blog--Why SearchReSearch?

I've been tempted for quite a while to create a blog. But I was finally pushed over the edge when I realized that there are too many good ideas about how-people-search, too many fascinating tales of mystery and woe that should be told, too many little morceaux that should be shared.

Seems to me that's what a blog is all about: Writing a little bit each week to crystallize an idea into a meaningful collection so that the combination of small strokes becomes a big IDEA.

I have to warn you before you start reading: In the back of my head, I want something tangible to emerge from this. Ideally, a book, or a series of books, about how people search... how they research... and how they get good at doing this.

When you think about it, search is not something you're born with--there are no inherent, latent skills for research (the way there are, say, for walking or spitting). Some people are really good at it, others just never quite get the basics.

That's what this blog is about: What skills, tricks, tips, ideas (both small ideas and big IDEAs) should you know in order to be an effective searcher? Better yet, which of these combine to make you a great researcher?

And, is there a difference between search and research?

Yes, there is. But you'll have to read along to find out what the difference is. Stay tuned. This blog will have moments of sublime insight; it will also have moments of pure personal reflection... but I'll try to stay on task. If you visit often, you'll learn a great deal about search, searching, researching, and coincidentally, web search engines.

And, to make my disclosure up front: I work at Google, and while I'll probably give lots of Google-specific hints and tips, I'll also describe other search engines' features from time to time as I find them useful or compelling (or even just really interesting).

-- Begin searching!
-- Dan --
