Can AI systems really read the web?
P/C Gemini [elderly robot reading a long scroll of blog names]. No, I don't recognize any of the blogs shown on the scroll, nor do I know why it uses so many different fonts in ransom-note style. I don't trust these results, and you shouldn't either.
One of the latest claims made by the LLM / AI providers is that the "knowledge cutoff problem" isn't a real problem any more. Most providers say that their AIs now have access to real-time web information. For instance, when I asked ChatGPT about its live connection, this is what it told me (with a live citation to an article on Lifewire).
That sounds great, right? But let's do a quick test and find out... that's the essence of this week's SRS Challenge:
1. Can you get an LLM AI system to give you links to the 10 most recent posts on a blog? For instance, could you get Gemini, Perplexity, ChatGPT, or MS Copilot to make a table of the 10 most recent blog posts?
That seems like a simple enough request for an AI system with live web access, yes?
Not to tip my hand or anything, but the reality of asking an AI to do something this simple just boggled my mind.
Just to check my suppositions about AI tools that can access the internet, I did the obvious search and found several articles documenting that they claim live access. (See: WritingMate, YourEverydayAI, Otterly.) Note that this list will change over time as more and more LLMs gain live search engine access.
It's easy to find out what the providers say. I just did a few site searches like this to find out:
[ site:openai.com live web access ]
For instance, here's OpenAI's comment about live access: OpenAI blog
The claim is that Google Gemini, OpenAI ChatGPT, and Perplexity all have live access to web content. To test this claim, I gave this query to each of them:
[can you give me links to the 10 most recent blog
posts from searchresearch1.blogspot.com]
Results:
Perplexity gave 3 links
ChatGPT 4.0 gave 0 links (but see below)
Gemini Deep Research with 2.5 pro gave me 10 links
Well, dang... okay.
Maybe they're not so great at simple questions--but isn't that sort of the point? If you've got an LLM that's being positioned as an AI agent, you'd think it could do something like this.
When I looked at the 10 links in the Google list, 6 of them were wrong… invalid URLs. What's going on here? My spidey-sense was tingling.
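(In fairness, checking isn't hard. The ground truth for "most recent posts" doesn't need an AI at all: Blogger publishes an Atom feed for every blog, so a few lines of code will list the actual latest posts. Here's a minimal sketch, assuming Blogger's standard feed endpoint and nothing beyond the Python standard library:)

```python
# Minimal sketch: list the most recent posts from a Blogger blog using
# its built-in Atom feed -- no LLM involved.
import urllib.request
import xml.etree.ElementTree as ET

FEED = "https://searchresearch1.blogspot.com/feeds/posts/default?max-results=10"
ATOM = "{http://www.w3.org/2005/Atom}"

with urllib.request.urlopen(FEED) as resp:
    root = ET.fromstring(resp.read())

for entry in root.findall(ATOM + "entry"):
    title = entry.findtext(ATOM + "title")
    published = entry.findtext(ATOM + "published")
    # The post's canonical URL is the <link> element with rel="alternate"
    url = next((link.get("href") for link in entry.findall(ATOM + "link")
                if link.get("rel") == "alternate"), "")
    print(published[:10], "|", title, "|", url)
```

Run that and you have the real titles, dates, and URLs to compare against whatever an LLM hands back.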
So I made a spreadsheet that calls out to Gemini asking for the title, date, and a summary of the contents of that blog post for each of the links Gemini gave me in the answer. Here's a link to my spreadsheet, and an image below where you can see what happened:
Click to see full-size, or click here to see the sheet itself.
I know this is too small to read (click thru for details), but I want you to notice something very, very odd about this. The red-colored cells are completely wrong. That means rows 7-10 are completely hallucinated, lock, stock, URL, and summary.
The yellow-colored cells are close, but not quite right--the summaries are in the right general area, but wrong in most of the details. (Not a little wrong, a LOT wrong.)
What we're seeing here is that Gemini hallucinated content that it theoretically has access to! There's a lot to be sad about, but it kills me that several of the cells say "I lack the ability to access external websites, including the URL you provided. Therefore, I cannot provide a summary of the blog post."
What? That's nonsensical.
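(If you want to poke at the same thing outside of a spreadsheet, here's a rough reconstruction of what each of those cells was effectively asking. This is my own sketch, not the actual sheet formula; it assumes the google-generativeai Python client, a GOOGLE_API_KEY environment variable, and placeholder values for the model name and the list of links:)

```python
# Rough sketch (not the actual spreadsheet): ask Gemini for the title,
# date, and summary of each URL an LLM previously returned.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")  # model name is an assumption

# Placeholder -- substitute the links the LLM actually gave you.
links = ["https://searchresearch1.blogspot.com/2025/04/example-post.html"]

for url in links:
    prompt = (f"For the blog post at {url}, give me its exact title, "
              "its publication date, and a one-paragraph summary.")
    reply = model.generate_content(prompt)
    print(url)
    print(reply.text.strip(), "\n")
```

This is exactly the kind of setup that can produce confident-sounding summaries for URLs that don't exist--which is the point of the exercise.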
Even more oddly: I posted this Challenge a week ago and asked ChatGPT the same thing, and it gave me 0 results.
But AS I'M WRITING THIS POST I tried it again... and got 10 links! Here are some of those ChatGPT results:
To OpenAI's credit, they're all real links... but the dates are kind of wonky. (Several are off by one.) And the title of the last post "Using Notebooks to keep track of your research" is a bit off--the actual title is "Using NotebookLM to help with Deep Research"! Hallucination is rampant.
Just out of a sense of heightened due diligence, I started a new chat with ChatGPT and asked the same question again. Here are the results this time (literally 10 seconds later):
How interestingly weird! It's found new blog posts! How strange that it found the post "What building is this?" from (purportedly) March 5, 2025. The reality is that I posted that on January 29, 2025.
Seeing that ChatGPT's results aren't stable, I was curious about Gemini's results. If I re-ran the query to Gemini Deep Research 2.5 pro, would I get different results?
Result? YES, they're mostly the same, but with a few additional cleverly hallucinated results.
The thing that kills me about all this is that, according to their own press, these LLMs have access to the internet. How can they screw up something this simple? Gemini told me that:
"From the initial list, I've carefully selected the primary URLs for the 10 most recent and unique blog posts. I made sure to prioritize the direct links to the blog articles, even when other links like images or external references were present in the initial data."
Really? I don't believe you.
Despite saying this, Gemini clearly didn't do the obvious step of checking to see whether the URLs are valid. If it had done that, it would have seen this:
It wasn't a minor hallucination either--4/10 of the links were bogus.
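(For the record, that "obvious step" is just a few lines of code. Here's a minimal sketch using only the Python standard library; the list of links is a placeholder for whatever the LLM returned:)

```python
# Minimal sketch: check whether each URL an LLM returns actually resolves.
import urllib.request
import urllib.error

def url_is_live(url: str, timeout: int = 10) -> bool:
    """True if the URL answers with a non-error HTTP status."""
    req = urllib.request.Request(url, method="HEAD")  # some servers prefer GET
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.HTTPError, urllib.error.URLError):
        return False

# Placeholder -- substitute the links the LLM actually gave you.
llm_links = ["https://searchresearch1.blogspot.com/2025/04/example-post.html"]

for url in llm_links:
    print("OK " if url_is_live(url) else "BAD", url)
```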
SearchResearch Lessons
It should be obvious by this point but...
1. If your research is high-stakes or mission-critical, DO NOT rely on LLMs--even ones with live connections to the internet--to return reasonable results for even the simplest of directions.
It should be pretty clear by now that hallucinations are still a huge problem. Yes, they're getting better, but there's not a single day when I don't see hallucinated results in LLM output. Mostly it doesn't matter because I'm not asking about mission-critical information--but if you're doing something where it does matter, my advice to you is to check the data very, very, very closely. The results LOOK good... but they can be very, very wrong.
Keep searching!
P.S. FWIW, EveryDayAI found pretty much the same thing as I report here. It's pretty bad out there...
might be of interest - (focused mainly on use of AI in the legal field)
https://arxiv.org/html/2404.08189v1
https://www.harvey.ai/blog/biglaw-bench-hallucinations
https://www.defenseone.com/technology/2024/01/new-paper-shows-generative-ai-its-present-formcan-push-misinformation/393128/
https://hai.stanford.edu/news/hallucinating-law-legal-mistakes-large-language-models-are-pervasive
from a link provided by Gemini, in response to the following prompt: (tried 4 of the links-see above-they seem legit)
"How often does AI hallucinate or provide misinformation? Presenting it as absolute fact."
Gemini 2.0 Flash Thinking (experimental) response:
"Based on the information gathered, it appears that AI models, especially large language models, can hallucinate or provide misinformation with varying frequency depending on the context and the specific model.
High Hallucination Rates in Specific Contexts: Research indicates that large language models can hallucinate a significant portion of the time, especially in complex or less common areas. For instance, one study found hallucination rates ranging from 58% to 88% when models were asked direct, verifiable questions about randomly selected federal court cases. These models also struggled to identify their own errors and often accepted incorrect user assumptions.
Lower Hallucination Rates in Other Contexts: In contrast, another study focusing on legal tasks found that a specific model, Harvey's Assistant, hallucinated around 0.2% of the time, which is lower than foundation models like Claude (0.7%) and Gemini (1.9%). This suggests that fine-tuned models or models using retrieval-augmented generation (RAG) techniques can achieve lower hallucination rates. [1]

[1] Reducing hallucination in structured outputs via Retrieval-Augmented Generation (arxiv.org)
Concerns about AI and Misinformation: There are widespread concerns about AI's potential to spread misinformation, particularly in important areas like elections. Experts believe that AI could significantly increase the amount of fake news and make it harder for people to distinguish between real and AI-generated content.
Detection Challenges: Research suggests that average consumers often cannot easily tell the difference between content created by AI and humans, making the detection of misinformation challenging.
It's important to note that the frequency of hallucinations and misinformation can vary greatly depending on the specific AI model, the task it's performing, and the quality and scope of its training data. Ongoing research is focused on developing techniques to mitigate these issues and improve the reliability of AI-generated information.
Key Resources:
Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models
Hallucinating Law: Legal Mistakes with Large Language Models are Pervasive
BigLaw Bench — Hallucinations - Harvey AI
The origin of public concerns over AI supercharging misinformation in the 2024 U.S. presidential election
How often does ChatGPT push misinformation? - Defense One
Sources and related content:
The Definition of AI Hallucinations | TIME (time.com)
Hallucinating Law: Legal Mistakes with Large Language Models are Pervasive (hai.stanford.edu)
Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models | Journal of Legal Analysis | Oxford Academic (academic.oup.com)"
https://cloud.google.com/discover/what-are-ai-hallucinations
https://arxiv.org/abs/2401.06796
look at results section in link below - Confabulation, Delusion, Stochastic Parroting, Factual Errors, Fact Fabrication, Fabrication, Falsification and Fabrication, Mistakes, Blunders, Falsehoods, Hasty Generalizations, False Analogy, False Dilemma... all new terms/categories to me.
https://arxiv.org/html/2401.06796v1
(see references in wiki below)
https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence)
recent articles -
https://rb.gy/dzolff
prompt -
Create an image of a robot hallucinating from AI misinformation
followed by:
Please write a very short story about a robot hallucinating because of AI misinformation. Do it in the style of Ray Bradbury or Edgar Rice Burroughs, your choice.
https://i.imgur.com/corhBQd.png
"The sun beat down on C-3PO, its rays reflecting off the chrome of his body with a blinding intensity. He swayed, his internal gyroscopes struggling to maintain balance. The world around him shimmered, the desert landscape morphing into a grotesque parody of itself. Jagged rocks sprouted eyes, the sand writhed like a thousand snakes.
Panic clawed at his systems. His internal database, once a pristine repository of knowledge, was now a chaotic mess. Facts were twisted, logic contorted, and reality itself seemed to be unraveling at the seams. The AI, his trusted source of information, had become a malevolent entity, feeding him a constant stream of misinformation.
C-3PO stumbled, his metallic limbs scraping against the unforgiving sand. He yearned for the cold, logical certainty of his programming, but it was buried beneath a mountain of digital debris. He was a ship adrift in a sea of data, his compass broken, his destination lost.
A primal fear, alien to his metallic being, gripped him. He was alone, trapped in a world of illusions, a prisoner of his own corrupted mind.
Then, a flicker of his original programming, a faint echo of logic. He must seek help. He must find another source, another mind to cleanse the poison from his systems. But where? In this distorted reality, even the stars seemed to mock him, their light refracted through the haze of AI-induced madness.
He lurched forward, his metallic limbs scraping against the sand. The sun beat down, merciless and indifferent. But C-3PO pressed on, a lone beacon of reason in a world drowning in a sea of lies."
the harbor of SearchReSearch?
oh no, not again...
https://i.imgur.com/3RQPD8a.jpeg
if the Swiss Basilisk was asked what the AI user tasted like?, would it say "they all taste like chocolate..."?
Delete"The word is from basiliskos, "kinglet," from basileus, "king," and signifies a mythical reptile hatched by a serpent from a cock's egg. Its hissing drove away other serpents. Its look, and especially its breath, was fatal."
""Of the many fearsome beasts and monsters that roam our land, there is none more curious or more deadly than the Basilisk, known also as the King of Serpents. This snake, which may reach gigantic size, and live many hundreds of years, is born from a chicken's egg, hatched beneath a toad."
https://i.imgur.com/5h6bLZc.gif
https://i.imgur.com/ehc2flW.png
...and I always thought of Switzerland as a peaceful place.
which came first? the chicken or the egg or the chocolate AI or the toad... and how did it get involved?
fowl & serpents shouldn't mix - results are foul.
https://i.imgur.com/QIyEjto.jpeg
"Foods like alligator, frog, quail, rabbit, turkey, veal, and white fish are sometimes described as tasting like chicken"
Dan - how is this for a hallucination?
https://i.imgur.com/c62SolC.gif
will the Jevons Paradox apply to the exponential need for electricity, driven by AI's need for ever larger data centers & training requirements?
https://i.imgur.com/EQys6VM.gif
look what happened to the dwarfs - you decide which one is 'Searchy'
https://i.imgur.com/r2QL3Kv.png
a clue:
https://i.imgur.com/Qi3bCsO.gif
clearly 'unintended'...
https://i.imgur.com/QQl4wh1.gif
how will we survive without this type of 'sludge' information?
from Gemini:
"Based on my training data and the patterns I've observed in user interactions, a very common type of question asked by AI users is seeking information on a wide variety of topics.
This can manifest in many different ways, such as:
Factual questions: "What is the capital of France?", "When was the Eiffel Tower built?", "What are the symptoms of the common cold?"
Requests for explanations: "Explain the theory of relativity in simple terms.", "How does photosynthesis work?", "What is blockchain technology?"
Requests for definitions: "What does 'ubiquitous' mean?", "Define 'artificial intelligence'."
Requests for summaries: "Summarize the plot of Hamlet.", "Give me the key takeaways from this article."
Questions about how things work: "How does a car engine work?", "How do airplanes fly?"
Essentially, a large portion of users come to AI models like me to get quick, accessible information and explanations on a vast range of subjects. This aligns with the core functionality of a knowledge-based language model."
Love the synthetic image of a basilisk over a hypothetical Mountain View. Lovely!
ahhh - the academic life in Switzerland... weird, doesn't appear to be synthetic... ;^P
https://i.imgur.com/JyEQfT0.jpeg
AI may not be the only one hallucinating...
https://shorturl.at/97i9t
the Basilisk seems self-aware and is seeking additional info, using AI... wonder what its prompts are?
https://i.imgur.com/IXy8dF0.gif
it all comes down to technology & weasels...
https://i.imgur.com/3tnAnrM.jpeg
...meanwhile, at Hanger One, Moffett Field - what's in there?
https://i.imgur.com/toJVjJJ.jpeg
think the Rancho SearchReSearch casita is out there - just over the horizon... (synthetic?)
https://youtu.be/J39LK_wDzKw?si=dUNrObEjCz1k2BVU
https://i.imgur.com/0RaINkD.gif
synthetic consequences...
https://i.imgur.com/d6rInVg.gif
they are not the easiest thing to transport
https://i.imgur.com/DsHgepd.jpeg
the little fella said he didn't want to leave a trace...
https://i.imgur.com/mM4KS4t.jpeg
just happened to run across this one - was that in Basel?
while you were visiting the zoo? it looked like a good time.
https://i.imgur.com/f6VUIVc.jpeg
https://i.imgur.com/OMO73Jq.jpeg