Wednesday, July 24, 2024

SearchResearch Challenge (7/24/24): Can you find the shellmounds?

Some ancient constructions are not made of stone...

Shellmound in Emeryville, CA, partly torn apart. From Publications in American Archaeology and Ethnography Journal, V 23 (1926)

... but are giant piles of debris. Throughout much of the world, these middens (also called shellmounds or shell heaps) are built up over centuries.

While driving over to Berkeley the other day, I crossed a road called Shellmound Street, which apparently was originally the road leading to the shellmound shown above.

There are lots of stories about the shellmound on Shellmound Street. It apparently once had a dance hall on top of it, and an amusement park off to the side. It was there until

Over the years I've seen shellmounds and "regular" Native American mounds at various places around North America. While the Emeryville shellmound is probably the biggest one I've lived near, the size of it made me wonder if there weren't a few others in the San Francisco Bay area.

Today's SearchResearch Challenge is fairly simple, but getting a good quality answer might be tricky.

1. I want to know if I NOW live near any Native American shellmounds. (By "near," I mean within 5 miles / 8km.) Can you find a map that shows an extensive set of shellmounds in the SF Bay Area? Are any of those near Mountain View, CA? (Extra credit: Can you find any images of the nearest shellmounds?)
2. A big part of my family hails from the area around Rice Lake, WI. Are there any Native American mound structures there? If so, where?
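Once you've found a map and can read off a site's approximate latitude and longitude, checking the 5 mile (8 km) criterion is a small calculation. Here's a minimal Python sketch using the standard haversine great-circle formula. The coordinates are my own rough city-center approximations for Mountain View and Emeryville, included only for illustration; they are not shellmound locations taken from any map.

```python
import math

EARTH_RADIUS_KM = 6371.0          # mean Earth radius
FIVE_MILES_KM = 5 * 1.609344      # 5 miles is about 8.05 km

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def is_near(home, site, radius_km=FIVE_MILES_KM):
    """True if site (lat, lon) lies within radius_km of home (lat, lon)."""
    return haversine_km(*home, *site) <= radius_km

# Approximate city-center coordinates, for illustration only (not surveyed site locations).
MOUNTAIN_VIEW = (37.386, -122.084)
EMERYVILLE = (37.831, -122.285)

if __name__ == "__main__":
    d = haversine_km(*MOUNTAIN_VIEW, *EMERYVILLE)
    print(f"Mountain View to Emeryville: {d:.1f} km; near? {is_near(MOUNTAIN_VIEW, EMERYVILLE)}")
```

Swap in the coordinates of each site on your map. At roughly 50 km away, the Emeryville mound clearly doesn't count as "near" Mountain View.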

Let us know HOW you found the answer to these Challenges! They're not hard, but as I suggested, sometimes it's difficult to find a really high resolution map that we can use to answer the questions.

Keep Searching!

Thursday, July 18, 2024

Answer: How can we find the best way to track developments in AI?

Tracking is an important skill to have...

Not an actual server farm, but one imagined by Meta's gen-AI.

... and we need to get good at it.

This week we asked ourselves, "How can we best teach ourselves about current developments in AI?" In other words, how can I be an autodidact about AI, a field that's changing rapidly?

So... this Challenge is really about what you do when you've got to drink from the raging torrent of your latest topic area. Here's some practical advice that might help you out...

1. What should I do to stay on top of / learn about / understand the developments happening right now in AI? Some of the tactics we talked about last week kind of don't work well (the textbooks haven't been written yet), and even taking a course means re-taking that course a year later when everything changes. So... what's a SearchResearcher to do? What advice would you give? What resources have you found that help answer this Challenge?

This kind of rapid following / tracking is an important skill, perhaps now more than ever.

Here's my summary of what I do to try and stay on top of the AI game. (You can substitute your favorite fast-changing topic for "AI" in all that follows.)

1. Track the news through blogs. I subscribe to new AI-tracking blogs when I come across them, but I also fairly actively prune away the ones that aren't satisfying my interests. I currently follow:

Ben's Bites - mostly AI product launches
MLearning.ai - lots of how-to articles, a few good in-depth reporting articles
Jakob Nielsen on UX - but these days, mostly about the UX of AI
TLDR - 1 post/day, AI, ML, data science news
OpenAI blog - somewhat irregular
Google's Gemini blog - 1/week; lots of Gemini promotional stuff
Perplexity's blog - also somewhat infrequent
MIT AI News - academic, but really interesting

But note that this list will change as I add new blogs and delete ones that have drifted away from my interests. Don't just keep adding stuff to your list! Keep looking for sources that are newer, better, or more aligned with your interests. (And be ruthless about getting rid of the deadwood.)
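Pruning is mostly a judgment call about relevance, but one mechanical signal of deadwood is a feed that has simply gone quiet. A minimal Python sketch, assuming you've noted the date of each blog's most recent post (the feed names and dates below are hypothetical): it lists feeds with nothing new in the last N days, as candidates to review, not to delete automatically.

```python
from datetime import date, timedelta

def stale_feeds(last_post_dates, today, max_quiet_days=60):
    """Return feed names whose most recent post is older than max_quiet_days."""
    cutoff = today - timedelta(days=max_quiet_days)
    return sorted(name for name, last in last_post_dates.items() if last < cutoff)

# Hypothetical feeds and last-post dates, for illustration only.
FEEDS = {
    "Blog A": date(2024, 7, 15),
    "Blog B": date(2024, 3, 2),
    "Blog C": date(2024, 6, 30),
}

if __name__ == "__main__":
    print(stale_feeds(FEEDS, today=date(2024, 7, 18)))  # ['Blog B']
```

Tightening max_quiet_days (say, to 10) widens the review list; the threshold is a personal choice, not a rule.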

SRS Regular Reader Arthur gave some good advice in his response:

So I try to drink from that fire hose through blogs (AI Secret, There's an AI for That, The Rundown AI, and also posts from Perplexity's blog, You.com and others that say what's happening). I file these in a dedicated folder that at some point I can use AI to summarise and draw out the gold. (I haven't yet - but it's an option.)
More importantly, when something new grabs my attention, I try it out, whether it's Claude 3.5, Llama 3, DBRX on Hugging Face, ChatGPT 4o and so on.

Arthur's advice is good: pick a few blogs to follow and try out new capabilities as you read about them. (Don't wait!) Experience beats reading in this regard.

2. Set up Google Alerts. We've talked about this before, but it's worth a reminder. You can set up alerts to run a daily query for you and then email you the latest results. Set up one or two and try it out at Google Alerts. You can set up an alert to track a specific company (e.g., OpenAI) or a specific site that publishes a lot of breaking results (e.g., arXiv.org).

3. Set up Google Scholar Alerts. We've also talked about this before... You can set up alerts to run a daily query for you just on Google Scholar contents and then email you the latest results. Very handy for tracking the academic work that's going on in your area of interest. Check out this great article about setting up a Scholar Alert from the University of Tennessee. Note that these results are completely disjoint from the regular Google Alerts.

4. Set up YouTube Alerts. Yes, YouTube has its own series of alerts. (Here's how to set them up. Yes, they are also disjoint from regular Alerts.) You can set them up to be notified when a favorite YouTuber drops an explainer or demo. (These can really be quite good: take a look at some of these papers.) Channels like "AI Explained" and "Two Minute Papers" provide accessible explanations of complex AI concepts and recent developments.
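Many of these sources can also be read as feeds instead of email: Google Alerts offers an RSS delivery option, and YouTube channels publish Atom feeds. If you'd rather skim them in one place, here's a minimal Python sketch that pulls the title, link, and update time out of each entry of an Atom feed using only the standard library. The sample feed is hand-written for illustration; fetching a real feed (and checking its exact URL and format) is left to you.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

def atom_entries(xml_text):
    """Return (title, link, updated) tuples for each <entry> in an Atom feed."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link_el = entry.find("atom:link", ATOM_NS)
        link = link_el.get("href", "") if link_el is not None else ""
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        entries.append((title.strip(), link, updated))
    return entries

# A tiny hand-written sample feed standing in for a real alert or channel feed.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example alert feed</title>
  <entry>
    <title>New model release notes</title>
    <link href="https://example.com/post-1"/>
    <updated>2024-07-17T09:00:00Z</updated>
  </entry>
  <entry>
    <title>Benchmark roundup</title>
    <link href="https://example.com/post-2"/>
    <updated>2024-07-16T12:30:00Z</updated>
  </entry>
</feed>"""

if __name__ == "__main__":
    for title, link, updated in atom_entries(SAMPLE):
        print(updated, title, link)
```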

5. LinkedIn. Remarkably, LinkedIn has become an incredibly valuable source of up-to-the-second information. I have a lot of connections that help keep me on top of the news, and I have also found it valuable to follow people like Ethan Mollick, Gary Marcus, Andrej Karpathy, and Eric Horvitz. You'll find more people who align with your interests.

Most importantly: I spend around 1 hour / day just doing my AI/UX reading--it's baked into my daily routine. It's really the only way to keep on top. Pretend you're taking a course on AI technology and this is your study time.

But don't fret if you miss a day or two--it's not worth your time to feel guilty about missing a few things. It really is a fire hose out there, and your FOMO is just a burden. Delete all the notifications you received (or give them a quick once-over before deleting) and get back to your normal practice.

Keep Searching!

Friday, July 12, 2024

PSA: How to grab text from an image

Every so often...

Pieces of paper falling from the sky. P/C by Gemini.

... I find myself with an image that contains a bunch of text. If it's short, I'll just type it into my notes. I'm a reasonably fast typist, so it's not a problem.

But then I get images with a lot of text, or very specific text with a lot of special characters (e.g., URLs or fragments of code), and I really would rather just copy/paste the text out of the image.

Here's a short list of ways to do this. Should this happen to you, learn one of these methods and make it a quick/easy thing to do. Hope you find this useful in your own SRS work:

1. Use your image clipping tool's grab text feature. I happen to use Snagit as my image grab tool of choice. When I see an image with text, I grab the image and then use Snagit's "Grab Text" tool (Edit>Grab Text) to copy the text out of the image. Here's an example. This is an image of a text page from one of Ethan Mollick's LinkedIn posts. As you can see, this is just long enough that I'd rather grab it than type it.

Then, grabbing the text is a click away:

It's then copy/pasteable. Most (modern) screen-capture tools have something like this. If yours doesn't, consider switching to one that does.

2. (Mac) Use Preview. If you open an image with text in the Preview app, use the Text Selection tool (Tools>Text Selection). After you select the text, you can CMD+C to copy it, then paste anywhere.

3. (Windows) Use the Snipping tool to do the same thing. (Here's a handy video showing you how.)

4. (iPhone) Use the text grab tool. When you take an image with text on it, use the tool in the lower right of the image area (looks like lines of text surrounded by a rounded-edge fence).

This is especially handy when traveling in another country: you can take a pic, then click on the "copy text" tool.

Then swipe over the text you want to copy (or translate) with your finger:

And then translate, copy, or look up as you wish:

5. (Android) Use the text grab tool. When you take an image with text on it, Android gives a large button that says "Copy text from image" (see below). You can also click on the "Lens" option to grab the text as well.

I had to look this up the other day to get a particular job done--and this PSA might help you avoid the same search.

Keep searching.

Thursday, July 11, 2024

More on: Do not believe citations created by Gemini

I wrote a post in February of this year...

... saying that the citations generated by Google's Gemini were not to be believed.

Since I believe in second chances and redemption, I revisited that post and re-did the queries.

It pains me to say that Gemini has not improved--if anything, it's gotten worse.

On the other hand, other LLMs have really stepped up their games. Perplexity, Claude, and ChatGPT 4o have gotten significantly better. They're now to the point where I'm going to use them in my daily research. I'll still check their work, but in several cases, I've learned things that I wouldn't have found otherwise.

The Details

Yesterday, I redid my queries to Gemini, Perplexity, Claude, ChatGPT4o, and Meta's AI (which uses Llama 3). The two queries were:

Q1: [ why have house sparrows expanded their range
dramatically since being introduced into the US,
while Eurasian tree sparrows have not?
They're so similar, you'd think they would
expand at a similar rate. ]

and...

Q2: [ can you suggest further reading in the scientific
literature about the differences in range expansion? ]

Here's the summary of what worked and what didn't.

Gemini: The answer to Q1 is short and correct. It did not go very deep into any reasons for the differences. It was merely an okay answer. Oddly, it listed 2 citations for its writing, but somehow neglected to actually give the citations! (That is, the text has reference numbers like this: [1] -- but there's no actual citation for [1]!)

But Gemini's answer to Q2 was terrible. It gave 3 suggestions for further reading, but instead of actually giving the citations or a link to the papers, it totally punted! In two of the three suggestions, it says "[invalid URL removed]" -- what? In the 3rd citation, it shows blue text, as though it were a link to a paper, but there's no link there. It's just blue text. WTF?

TLDR--Gemini's answers were short and misleading. No actual citations were produced. A pretty bad shortfall.

Perplexity: Answered Q1 with 7 reasons, all with citations (that worked!) to reasonable literature. Best of all, Perplexity found an answer to the question that all of the other LLMs missed (House sparrows have a really robust immune system, letting them outcompete the Eurasians). I'm impressed.

Perplexity's answer to Q2 listed 4 excellent papers in the scientific literature. Well done.

Even better, Perplexity now has a "Pro Search" capability that allows it to dig more deeply into the literature. That is, it does a kind of "slow search" (a term coined by my friend and fellow SearchResearch scientist Jaime Teevan), digging into extra resources to give a better answer to the query.

Perplexity Pro Search found 5 additional papers, all real and all spot on. (What impressed me the most is that Pro Search found a great paper that had eluded me when using traditional search methods. Kudos.)

ChatGPT4o: The overview answer to Q1 is fairly good--accurate in all details. Not especially deep, but fine as an introduction.

But ChatGPT's answer to Q2 wasn't as good as Perplexity's. The citations were often to book chapters that are VERY hard to access. One of them was written in 1951, which is fine, but really hard to access. (And, truthfully, the field has moved on from there!)

And, because hallucinations run deep, one of the citations is fictitious. Alas. It was going well, but then the LLM had to make up one of the papers. Dang.

Claude: Also had a good answer to Q1--deeper than Gemini, about the same as ChatGPT. No citations, but not bad.

The answer to Q2 was much like ChatGPT's answer--slightly dated, with one hallucinated citation. And one of the citations is to an actual chapter in a book, The Birds of North America, but it's a massive book that comes in 18 volumes (and costs more than $2000)... but the citation doesn't say which volume it's in! So close.

Meta AI: A decent answer to Q1, but the answer to Q2 is a bit confused. The answer lists 7 factors that contribute to the difference in success (well done!), but the citations are deeply messed up. Three of the bullet points list citation #1 as their source, but each bullet point describes the citation in a different way! (As a book chapter, as a summary article, or as a journal paper.) It looks like there were supposed to be 3 different citations, but they somehow all got lumped together. Maybe there's good stuff in there, but it's hard to tell--the citations are missing and mixed up with each other.

Bottom line: If I were giving out grades:

Why am I so tough on Gemini? Mostly because, like many teachers, I want to say "you're not living up to expectations." Google has Scholar, for heaven's sake. It should be trivial for Google to check its own results. There's no reason Gemini should be producing "[invalid URL removed]" in the results. This text suggests that they ARE checking, but then not following through when the check fails.
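To make "trivial to check" concrete, here's a minimal Python sketch of the most basic sanity check. It is not a description of how Gemini or any other product works. It flags numbered markers like [1] that have no matching reference entry, and counts "[invalid URL removed]" placeholders. It can't tell you whether a citation is real; for that you still need to look each one up (in Google Scholar, for instance). The answer text and references below are hypothetical.

```python
import re

# Numbered reference markers such as [1] or [12] in the answer text.
MARKER = re.compile(r"\[(\d+)\]")
# The placeholder string Gemini produced in place of links in this test.
PLACEHOLDER = "[invalid URL removed]"

def check_citations(answer_text, references):
    """Flag cited numbers with no (or empty) reference entry, and count placeholders.

    references maps citation numbers (ints) to citation strings.
    """
    cited = {int(n) for n in MARKER.findall(answer_text)}
    missing = sorted(n for n in cited if not references.get(n, "").strip())
    placeholders = answer_text.count(PLACEHOLDER) + sum(
        ref.count(PLACEHOLDER) for ref in references.values()
    )
    return {"missing": missing, "placeholders": placeholders}

# Hypothetical answer text and reference list, for illustration only.
ANSWER = "House sparrows spread quickly [1] and thrive near people [2]."
REFERENCES = {1: "Example citation (hypothetical)", 3: PLACEHOLDER}

if __name__ == "__main__":
    print(check_citations(ANSWER, REFERENCES))  # {'missing': [2], 'placeholders': 1}
```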

By contrast, I am really impressed with Perplexity. I'll be using it more-and-more! (But, as always, I'll be double-checking everything. Great results on this short test, but I still have my doubts.)

For people who want to delve more deeply into what I saw as the results, here's a link to a PDF with images of all the results. You can check them out if you'd like to see what I saw in my testing.

Keep searching. (And keep double checking those AI results!)

Wednesday, July 10, 2024

SearchResearch Challenge (7/10/24): How can we find the best way to track developments in AI?

A logical continuation from last week's Challenge...

Not an actual server farm, but one imagined by Meta's gen-AI.

... is to ask ourselves "How can we best teach ourselves about current developments in AI?" In other words, how can I be an autodidact about AI, a field that's changing rapidly?

Assuming you read last week's Answer to the Challenge "How to Find the Best Learning Resources in a Field?", you'll appreciate the distinction here. That was about how to teach yourself about a field that currently exists and has many resources to search out and use. It was all about how to teach yourself from a deep, quiet pool of knowledge in an area.

But the current wave of AI developments and research results is pretty staggering. It's a growing, flowering, burgeoning field... more of a raging torrent of claims, counter-claims, launches, take-downs, and lawsuits.

How can a person deal with all this?

You've heard the metaphor: tracking all of the changes in information technology is like "drinking from a fire hose." That's not bad, but fully charged fire hoses are relatively rare, while it seems that every few months yet another torrent of information springs up, raging down the hillsides and into our lives. It's not just AI, but also connectomics (how brains are wired), epigenetics (the study of heritable changes in gene expression that don't involve changes to the DNA sequence), synthetic biology (how to engineer organisms for various purposes), and health data sensors (gadgets to track health data). Hey, even a field that you THINK would be quiet and placid like archaeology turns out to be pretty active as new imaging systems and sensing technologies hit the field. We live in a time of rapid development--the science and tech rains deliver a new waterfall of knowledge each week!

So... this Challenge is really about what you do when you've got to drink from the raging torrent of your latest topic area. Let's frame this meta-Challenge with respect to AI. After all, in our SearchResearch land, the developments in AI are rapidly changing things we knew about online research.

1. What should I do to stay on top of / learn about / understand the developments happening right now in AI? Some of the tactics we talked about last week kind of don't work well (the textbooks haven't been written yet), and even taking a course means re-taking that course a year later when everything changes. So... what's a SearchResearcher to do? What advice would you give? What resources have you found that help answer this Challenge?

As always...

Keep Searching!

Friday, July 5, 2024

Answer: How to find the best learning resources in a crowded field?

If you're going to teach yourself something...

Pyramid of Giza. P/C Jeremy Bishop, Unsplash

... you might as well start at the beginning. Since Ancient Egyptian history begins around 3150 BCE, that pretty much qualifies as "the beginning." (Yes, I know there are earlier civs, but it's a nice, big topic with great resources to demonstrate my points today.)

As I said, suppose, just suppose, someone in your household gets an interest in learning about ancient Egypt. It won't take you long to learn that there's an entire scholarly discipline on the subject.

Knowing that brings up an entire world of questions--how can I approach teaching myself a big topic?

And so...

1. How would you organize a plan to learn about Ancient Egypt? What kinds of searches would you do to get to the heart of a big, well-established topic like this? What kinds of things should one think about when starting on such a project?

The first thing to think about (and be able to answer)...

1. What do I want to learn about this topic?

It's important to know for yourself what you're trying to learn. Note that it's totally fine to wander around without a goal, but understanding what you want to learn will help determine your steps.

There are several kinds of ways you can answer that question ("what do I want to learn?").

Topic area: The hardest thing when you start to teach yourself a topic is knowing what's in it. With a gigantic topic like Ancient Egypt, there are a zillion subtopics in that space. Advice: Start with what interested you and expand as you learn more. But be metacognitively Marie Kondo about your topic choices--"will learning this topic spark joy in my life?" Remember you can always change your topic areas as you learn more. (It might turn out that studying Ancient Egyptian sculpture styles no longer excites you as much as you thought!)

Breadth vs. depth: You could be interested in just a quick, broad overview of a topic. You've only got 2 days before your trip to Cairo--what are the top 20 things I need to know?

Or you could want a deep knowledge of the topic and plan on spending the next year studying Egypt every day.

In either case, what you want to learn will tell you how to organize your time. Quick and broad? Or deep and intensive? Both are fine approaches (I've taken many an online course for only the first few hours just so I'd have an idea of what the topic covers.)

Skill vs. knowledge learning: If you're teaching yourself something that's skill-based (I want to learn to speak Arabic), then your approach will be very different than if you want to understand a body of knowledge (the culture of Ancient Egypt).

Naturally, some topics have both parts: Learning physics or chemistry requires you to both learn the knowledge (e.g., what is inertia? what is a covalent bond?) AND learn the skill of doing the math that goes with the knowledge. You can learn physics without the math, but recognize that it's a very different kind of knowing. For our purposes here at SRS, I'll set aside teaching yourself a skill (like physics math or speaking German). Learning something like "play piano" is amazingly difficult from a book--performance / behavior skills are really much better learned with a tutor.

2. Organize your learning approach.

Where do you start? The topic of "Ancient Egypt" is just massive. We normally talk about "Ancient Egypt" as running from 3150 BCE to 30 BCE. There were about 170 Pharaohs (list of all of them) during this time span--that's a lot of history to take in.

Timeline from Wikipedia article on "Ancient Egypt"

Think about what you want to learn and organize your self-teaching to match your target. If you are looking for a broad-brush approach, seek out overviews and tutorials that match your time budget. If you're looking to learn more about a topic in detail, think about finding sources that go into depth on that. But first...

Get an overview to start: Even if you're going to plumb the depths of a topic, you need to start with a decent overview, if only to learn what the boundaries of your topic really are.

I usually go to the Wikipedia article as a starting point--they're usually quite good and people spend huge amounts of time arguing about what should (or should not) go into the article. For instance, the Wiki article on Ancient Egypt is around 13,000 words long (about 29 printed pages), with 216 citations, many with helpful links to get you to the original articles.

And even if you're going to pursue a topic in depth, there is often a Wikipedia entry about that topic as well. Examples: Music in Ancient Egypt; Ships in Ancient Egypt; Clothing in Ancient Egypt. Of course, you don't need to limit yourself to Wikipedia--there are many other encyclopedic collections online. (For instance, Britannica.com has a great intro to Ancient Egypt that has about 3 times as much detail as Wikipedia.)

Leverage other people's overviews: If your topic is likely to be taught in schools, chances are really good that someone (some teacher!) has gone to the trouble of making an outline, course overview, or syllabus that will let you know the lay of the land.

I know that Egyptology is a thing--many universities still teach it as a subject area. If I want to search for their syllabi, I'd do one (or all) of these searches:

[ site:.edu Ancient Egypt syllabus ]

[ "course overview" Ancient Egypt history ]

[ "course summary" Ancient Egypt ]
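If you want to run the same syllabus searches for several topics, a few lines of Python can generate the links. A minimal sketch, assuming the ordinary https://www.google.com/search?q= URL form; it only URL-encodes the queries (site: operator, quotes, and all) so you can open them in tabs. It doesn't fetch or scrape any results.

```python
from urllib.parse import urlencode

# Query templates from the syllabus searches above; {topic} is filled in per topic.
TEMPLATES = [
    "site:.edu {topic} syllabus",
    '"course overview" {topic} history',
    '"course summary" {topic}',
]

def search_urls(topic):
    """Return one URL-encoded Google search link per query template."""
    return [
        "https://www.google.com/search?" + urlencode({"q": t.format(topic=topic)})
        for t in TEMPLATES
    ]

if __name__ == "__main__":
    for url in search_urls("Ancient Egypt"):
        print(url)
```

Passing a different topic, e.g. search_urls("Ancient Mesopotamia"), gives the same three searches for that field.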

You'll find a LOT of syllabi that will each give their particular outline of the topic.

Make a list of what you want to learn: Once you have an overview, I highly suggest making a quick list of the things you want to learn. It doesn't need to be anything complicated--it's just a way to help you map out what you want to know at the end of your studies. (I have a couple of yellow sticky notes that I use to organize my thinking / reading / studying on a topic. Easy, fast, simple.)

3. Find high quality resources that match your interests.

Take everything we've ever talked about in SearchResearch and apply it here. Your goal in teaching yourself is to be efficient and accurate. Be constantly aware that there are multiple voices and opinions on everything, even something like Ancient Egypt, where you think things would have been figured out by now. Not true! Be sure to look for reputable sources, be sure to triangulate what you learn, etc.

Remember that there are a LOT of books online: Google Books, Hathi Trust, Internet Archive. (And of course, your local library!)

One other resource to consider is the plethora of online courses. In the case of Ancient Egypt, they're plentiful. A search like this will find what you seek:

[ online course "Ancient Egypt" ]

There are the usual online course providers (edX, Coursera, Udemy, etc.), but many universities also have online classes (they can be high quality, but usually also charge for the classes).

In addition, I've found the professional online teaching sites that offer Egypt classes (e.g., The Great Courses, or something like Class Central that organizes other online courses) to be quite good.

Finally, don't underestimate the value of in-person classes (there's much that you learn that's NOT in the video or books). I've taught a lot of in-person classes (probably around 2000 or so) and as you know, I've done a lot of online classes as well (around 20, each with many lessons, with a total student engagement of ~5 million students). So take it from me--as great as online classes are, the in-person classes are often better. You can ask questions, and the extra information and contextualization of knowledge is an important part of real teaching.

Overall, my recommendation is for you to be very metacognitive about what you're doing. (This means constantly asking yourself "What am I trying to do here? What's the next step? How will I know when I'm done? Is this the best approach?") Learn more about being successfully metacognitive about your own self-teaching here.

Keep Searching! (And have a great time learning whatever it is you want to learn!)

SearchReSearch
