P/C Gemini [elderly robot reading a long scroll of blog names]. No, I don't recognize any of the blogs shown on the scroll, nor do I know why it uses so many different fonts in a ransom-note style.
Just last year people using AIs to ask up-to-date questions would often run into the "knowledge cutoff date" problem. That's the date of the last training cycle, when all of the information in the LLM was solidified.
Since then various AIs have been claiming that they now have continuous updates because they're able to read web content in real-time. (Here's a news article describing the new real-time web access capabilities of some LLMs.)
I've explored this a bit and come to some insights, which leads to today's Challenge:
1. Can you get an LLM AI system to give you the 10 most recent links to a blog? For instance, could you get Gemini, Perplexity, ChatGPT, or MS Copilot to make a table of the 10 most recent blog posts?
That seems like a simple enough request for an AI system, yes?
We keep reading about AI agents that claim to be able to take actions on your behalf by operating your applications to get things done. So let's ask for a simple task ("make a list of the 10 most recent posts") and see how well they do.
I'll show you how I did it next week and talk about the issues involved.
I'm VERY curious to see how well you did. Were you able to make a single request and get back the list?
As a sample blog, you might try good old SearchResearch, but you might also try something like OpenAI's blog or Google's Gemini blog.
Keep searching!
This is an interesting question, as you get different answers depending on what you are looking for. I've done tests in the past, and Perplexity, Gemini, Aria and Chatsonic gave good recent answers. I just tried a recent news item today, and all gave accurate answers to "What are the new tariffs for the UK and the European Union as announced on Liberation Day?" (Note: a very vague question, but each knew what I was talking about, and most gave more detail, with Perplexity impressing as it also mentioned the 25% tariff on automobiles as well as the base 10% tariff offered to the UK.)
In contrast, NONE got the fully correct answer to "I need to find the last 10 posts by Daniel Russell in his Search Research blog. Please can you supply the links and dates for the post plus the titles of each post". (Again, being vague and not making it easy by giving links to the blog.) Perplexity got it right up to 6 March but nothing later. When asked for anything later, it gave 14 March but that was it: "Based on the information available, there are no posts on Daniel Russell's SearchResearch blog for April 2025 so far, and the most recent post is from March 14, 2025. If you're looking for updates, it may be worth checking the blog periodically for new content."
Copilot was useless. "I found Daniel Russell's Search Research blog, but I couldn't retrieve a direct list of his last 10 posts with their titles and dates. However, you can visit the blog and browse through the latest entries to find the information you need."
Gemini was a tiny bit better - giving the last 5 in March (i.e. not 6 March) and copping out with "To find the absolute last 10 posts, it is best to visit the blog directly. The information provided is based on the results of my search."
Aria also copped out - it did find the 3 April post and summarised this but said it couldn't list the older posts.
ChatGPT was irritating as its last post was 15 March BUT it also missed out 14 March. (When challenged, it said "As of April 3, 2025, the most recent post on Daniel Russell's SearchResearch blog is from March 15, 2025, titled "The road to 5 million blog views on SearchResearch!". There have been no new posts since then.") It also missed out the January and December posts and others, and put 6 November before 8 November.
So I think the answer is: it depends. For some questions the AI tools can really find stuff, but they aren't so good at something more complex, such as creating a timeline from multiple sources (even if all come from the same blog).
ChatGPT. Summary of the Chat Tasks:
Initial Instruction:
You first gave me a text and told me not to do anything with it yet. I followed your instruction and waited.
Reading a Blog:
Then, you asked me to read and learn from a specific blog: https://searchresearch1.blogspot.com/. Again, you asked me not to do anything afterward. I accessed the blog and confirmed that I was ready, with the information in mind.
Challenge Prompt Creation:
You told me that you want to win a challenge based on a task described in the text: getting the 10 most recent posts from a blog. You asked me to write a prompt in English that you could use to ask an AI for this information. I created a clear, action-oriented prompt in English that could be used with any capable AI system.
Prompt: Please read the blog at https://searchresearch1.blogspot.com/ and give me a table listing the 10 most recent blog posts. For each post, include the title, date, and a direct link to the post. Keep the formatting clean and easy to read.
Did that prompt work?
Thanks for replying, Dan! I was very close, but I didn't quite succeed. I tried Perplexity, Gemini, ChatGPT and Grok. None of them got it fully right, but Grok came very close: it only forgot two posts from March.
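One way to get a ground-truth list to score the AIs against: Blogger blogs typically publish an Atom feed of recent posts (for this blog that would be https://searchresearch1.blogspot.com/feeds/posts/default?max-results=10). A minimal sketch of pulling titles, dates, and links out of such a feed with the Python standard library; the sample entries below are invented for illustration, not real posts:

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"a": "http://www.w3.org/2005/Atom"}

# Trimmed, hypothetical sample of what a Blogger Atom feed returns.
SAMPLE_FEED = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Example post title</title>
    <published>2025-03-15T09:00:00-07:00</published>
    <link rel="alternate" href="https://searchresearch1.blogspot.com/2025/03/example-post.html"/>
  </entry>
  <entry>
    <title>Older example post</title>
    <published>2025-03-06T08:00:00-08:00</published>
    <link rel="alternate" href="https://searchresearch1.blogspot.com/2025/03/older-example.html"/>
  </entry>
</feed>"""

def parse_entries(xml_text):
    """Extract (title, date, link) tuples from an Atom feed document."""
    root = ET.fromstring(xml_text)
    posts = []
    for entry in root.findall("a:entry", ATOM_NS):
        title = entry.findtext("a:title", namespaces=ATOM_NS)
        date = entry.findtext("a:published", namespaces=ATOM_NS)
        # The post's canonical URL is the link with rel="alternate".
        link = next(
            (l.get("href") for l in entry.findall("a:link", ATOM_NS)
             if l.get("rel") == "alternate"),
            None,
        )
        posts.append((title, date[:10] if date else None, link))
    return posts
```

Feeding the real response from the feed URL into `parse_entries` gives the authoritative "10 most recent posts" table in one request, which makes the AIs' misses easy to spot.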
Challenge:
1. Can you get an LLM AI system to give you the 10 most recent links to a blog? For instance, could you get Gemini, Perplexity, ChatGPT, or MS Copilot to make a table of the 10 most recent blog posts?
I tried this with DeepSeek and ChatGPT (the two I have access to), and the best result was from Gemini 2.0:
source: https://openai.com/news/
Here are the 10 most recent posts from the OpenAI news page:
New commission to provide insight as OpenAI builds the world's best-equipped nonprofit (Company Apr 2, 2025)
New funding to build towards AGI (Company Mar 31, 2025)
Security on the path to AGI (Security Mar 26, 2025)
Introducing 4o Image Generation (Product Mar 25, 2025)
Addendum to GPT-4o System Card: 4o image generation (Publication Mar 25, 2025)
Leadership updates (Company Mar 24, 2025)
Early methods for studying affective use and emotional well-being on ChatGPT (Research Mar 21, 2025)
Introducing next-generation audio models in the API (Release Mar 20, 2025)
The court rejects Elon's latest attempt to slow OpenAI down (Company Mar 14, 2025)
New tools for building agents (Product Mar 11, 2025)
Other models gave me instructions on how to look up these posts myself, or even started generating hypothetical blog posts; e.g., DeepSeek came up with some interesting fake headers.
Did you check the links too? I've had problems with LLMs hallucinating the links as well.
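A quick way to catch hallucinated links is to request each URL and look at the HTTP status: a 404 is a strong hint the post never existed. A rough stdlib-only sketch (real scripts would also follow up a rejected HEAD with a GET, which is left out here for brevity):

```python
import urllib.request
import urllib.error

def check_links(urls, timeout=10):
    """Map each URL to its HTTP status code, or to an error string."""
    results = {}
    for url in urls:
        try:
            req = urllib.request.Request(
                url, method="HEAD",
                headers={"User-Agent": "link-checker/0.1"})
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                results[url] = resp.status   # 200: the link resolves
        except urllib.error.HTTPError as e:
            results[url] = e.code            # e.g. 404: likely hallucinated
        except (urllib.error.URLError, ValueError) as e:
            results[url] = f"error: {e}"     # malformed URL, DNS failure, etc.
    return results
```

Some servers block or redirect HEAD requests, so treat anything other than a clean 200 or a hard 404 with judgment rather than as proof either way.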
I tried Grok and Perplexity, in both cases without logging in, using the same prompt.
With Grok, it told me I needed to log in.
With Perplexity, it made me a table of the last 5. However, those were not the last 5, or better said, the most recent five: it took posts from February but for some reason didn't read March or April.
The prompt: Can you please make a table of the last 10 post entries in the blog with this url https://searchresearch1.blogspot.com/?m=1
The answer:
https://www.perplexity.ai/search/can-you-please-make-a-table-of-EI5drHHnSZC8xS0aXKX2rw?utm_source=copy_output
old robots don't die, they just short-circuit into oblivion...
https://i.imgur.com/kzCKrqU.png
Here are 10 of the most recent links to the Gemini AI blog:
Gemini 2.5: Our most intelligent AI model (https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/) - Published March 25, 2025
How we built the new family of Gemini Robotics models (https://blog.google/technology/ai/gemini-robotics/) - Published April 1, 2025
6 tips to get the most out of Gemini Deep Research (https://blog.google/products/gemini/gemini-deep-research-tips/) - Published March 19, 2025
How Google and NVIDIA are teaming up to solve real-world problems with AI (https://blog.google/technology/ai/google-nvidia-partnership/) - Published March 18, 2025
How Gemini is improving care in Japanese hospitals (https://blog.google/technology/health/gemini-improving-care-japanese-hospitals/) - Published March 18, 2025
New Gemini app features, available to try at no cost (https://blog.google/products/gemini/new-gemini-app-features/)
Introducing Gemma 3: The most capable model you can run on a single GPU or TPU (https://blog.google/technology/developers/gemma-3/)
Google Slides now uses Imagen 3 and adds other new visual tools (https://blog.google/products/workspace/google-slides-imagen-3/) - Published March 31, 2025
Listen to our podcast episode all about Gemini 2.5 (https://blog.google/products/gemini/gemini-2-5-podcast/) - Published March 28, 2025
5 new ways we're helping students and young people learn and grow using AI (https://blog.google/outreach/education/ai-learning-tools-students/) - Published March 28, 2025
from the blog
https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/
https://blog.google/products/gemini/#:~:text=Official%20Gemini%20news%20and%20updates&text=Google%20Blog
I'm sure you noticed that many of those links aren't valid. What's up with that?
Excellent point - no, I did NOT verify the links -
Great Googly Moogly, clearly I was trusting Gemini Deep Research without verifying... why, oh why, does AI do that, when belief in search results needs to be paramount?
here is Gemini's response...
"Here's a breakdown of why this happens:
AI Learns Patterns, Not Truth: Large language models like me are trained on massive amounts of text data from the internet. This data contains a lot of accurate information, but it also includes misinformation, biases, and inconsistencies. AI learns patterns and relationships in this data to generate text that sounds natural and relevant to your prompt. However, it doesn't truly "understand" the information or discern the absolute truth.
Optimization for Fluency, Not Factuality: The primary goal of these models is often to generate coherent and fluent text that answers your question in a helpful way. Sometimes, to achieve this fluency, the AI might generate information that isn't entirely accurate or supported by the provided context.
Challenges with Understanding Context and Nuance: While AI is getting better at understanding context, it can still struggle with complex or nuanced information. When dealing with external links, it might misinterpret the content, draw incorrect inferences, or even invent citations that don't exist.
"Hallucinations" as a Result of the Generation Process: The term "hallucination" in AI refers to instances where the model generates information that is not present in the training data or the provided context. This can happen due to the way the AI predicts the next word in a sequence. If it identifies a pattern in the training data that links certain keywords to a particular (but incorrect) fact or citation, it might reproduce that pattern.
Limitations with External Link Interpretation: While I can access and process information from the real world through tools, my understanding of the content in external links isn't always perfect. I might misinterpret the information on a webpage or misattribute statements.
Why this is particularly problematic with links and citations:
Expectation of Veracity: When an AI provides a link or citation, users naturally expect it to be a reliable source supporting the information given. When this turns out to be false, it erodes trust in the AI's capabilities.
Difficulty in Verification: Fabricated links and citations can be time-consuming and frustrating for users to verify, leading to a negative experience.
What is being done to address this:
Improved Training Data: Researchers are working on curating higher-quality training data with better fact-checking and less bias.
Fact-Checking Mechanisms: Efforts are underway to integrate fact-checking mechanisms into AI models to verify information against reliable sources.
Better Context Understanding: Ongoing research focuses on improving the AI's ability to understand context, identify reliable sources, and avoid making unsupported claims.
Explainability and Transparency: There's a push towards making AI models more transparent in their reasoning and providing explanations for their outputs.
In summary, while AI can be a powerful tool, it's important to be aware of its limitations, particularly regarding accuracy and the potential for hallucinations. Always critically evaluate the information provided by AI, especially when it comes to external links and citations, and cross-reference with reliable sources."
"AI Learns Patterns, Not Truth:"
doesn't really address the damage done or the time wasted having to endlessly check the info AI provides.
btw, the citation links GROK provided seemed to check out. You are right about the need for vigilance at this point in the development. Good that you called me out for being sloppy & lazy.
whadayamean?
https://i.imgur.com/YnpGNeO.png
it's like the Googleplex is a robotics retirement home... who knew - besides AI?
https://i.imgur.com/ZLpgxQd.jpeg
google docs:
https://i.imgur.com/QFKw8wJ.jpeg
some other inhabitants of the facility...
https://i.imgur.com/d9t5IRd.jpeg
a dynamic list:
https://en.wikipedia.org/wiki/List_of_blogs
a sampling list:
https://www.sparringmind.com/successful-blogs/
Rod Serling -"submitted for your approval.":
from Claude 3.7 Sonnet (the knowledge cutoff seems to severely hamper its usefulness):
"when will your current cutoff date be updated and will you ever have a continuous version
I don't have specific information about when my knowledge cutoff date will be updated. Updating an AI model's knowledge requires significant resources - both computing power and time to train on new data. Anthropic doesn't share their specific update schedules with me.
...Regarding a continuous version without a cutoff date, that would require a fundamentally different architecture than what I currently have. Rather than having all knowledge embedded in my parameters, such a system would need:
The ability to access the internet in real-time
Search capabilities to find current information
Tools to verify and integrate new knowledge
Systems to distinguish between reliable and unreliable sources
Some AI systems do incorporate these capabilities through web search integrations, tool use, or retrieval augmented generation (RAG), which allows them to access more current information. However, I don't know Anthropic's specific plans for implementing such capabilities for future versions of Claude.
If you're interested in the most up-to-date information about Anthropic's plans, I'd recommend checking Anthropic's website or blog for announcements about future model capabilities and knowledge updates."
https://www.anthropic.com/blog
https://www.anthropic.com/news/strategic-warning-for-ai-risk-progress-and-insights-from-our-frontier-red-team
https://futurism.com/ai-model-turing-test
https://www.google.com/search?q=what+is+the+most+advanced+AI+image+generator+currently+available&oq=what+is+the+most+advanced+AI+image+generator+currently+available&gs_lcrp=EgZjaHJvbWUyBggAEEUYOdIBCTUwMDE2ajBqN6gCALACAA&sourceid=chrome&ie=UTF-8
a bit off topic - relevant to 'fact checkers'
https://www.instagram.com/reel/DIAYfsGCpfP/?utm_source=ig_web_copy_link&igsh=MzRlODBiNWFlZA==
a robot questions Claude - Claude glitches...
https://i.imgur.com/Owt5Mqy.png
words right out of my mouth...
https://i.imgur.com/uY5MNsP.png
if there is interest in M. Hornbacher... a taste:
https://maryahornbacher.substack.com/p/birthday-girl-at-the-fountain-of?utm_source=post-email-title&publication_id=1669254&post_id=160537719&utm_campaign=email-post-title&isFreemail=true&r=2cshou&triedRedirect=true&utm_medium=email
from Grok - partial (it has a 'deeper search' function that shows thinking & goes into greater detail & analysis - too much for here)
query:
Ramifications of AI on Deep Research
"7. Future Directions
The ramifications of AI on deep research are still unfolding, but several trends are emerging:
AI-Human Collaboration: The future of deep research likely lies in hybrid approaches, where AI and human researchers work together, combining AI’s analytical power with human intuition and creativity.
Explainable AI (XAI): Efforts to make AI more transparent, such as developing explainable AI models, will improve its reliability in research settings.
Regulation and Standards: Governments and institutions are beginning to establish guidelines for AI use in research, such as the European Union’s AI Act, to address ethical concerns and ensure responsible use.
AI in Emerging Fields: AI is driving research in new areas, such as quantum computing, synthetic biology, and space exploration, where its ability to handle complexity is particularly valuable.
Example: In 2024, NASA used AI to analyze data from the James Webb Space Telescope, identifying potential signs of life on distant exoplanets by detecting specific chemical signatures in their atmospheres.
Conclusion
AI has revolutionized deep research by accelerating processes, enhancing precision, and democratizing access to advanced tools. However, it also introduces challenges like bias, reproducibility issues, and ethical concerns that must be carefully managed. The future of deep research will depend on striking a balance between leveraging AI’s capabilities and preserving the human elements of creativity, critical thinking, and ethical responsibility. As AI continues to evolve, its role in deep research will likely expand, reshaping how we explore the unknown and address the world’s most pressing challenges.
If you’d like to dive deeper into a specific aspect of AI’s impact on research—like its role in a particular field or ethical considerations—let me know!"
from the "deeper" - (it took a couple minutes to compile, included live link citations)
"Comprehensive Analysis of AI's Impact on Deep Research
As of April 2025, artificial intelligence (AI) is profoundly reshaping deep research—defined here as rigorous, in-depth investigation in academic, scientific, or industrial settings. This analysis explores the multifaceted ramifications of AI, drawing on recent studies, expert insights, and emerging trends to provide a thorough understanding of its benefits, challenges, and future directions."
Henk — Ctrl-F & AI what's the world coming to?
https://shorturl.at/2NbI8
the convergence of sRs topics
a basilisk consuming an errant AI user... a just fate -
basilisk don't play... the old & new coming together...
https://i.imgur.com/HG7EkrL.jpeg
https://i.imgur.com/1dzI0id.jpeg
this might be an AI hallucination, but apparently there are Basilisk in San Antonio, Texas too... darn tootin' - I'll eat another Swiss chocolate bar if that's false...
it's no Gator.
https://i.imgur.com/NVxT8U2.png
https://i.imgur.com/Xs8B7ho.jpeg
https://i.imgur.com/J7ajp9A.png
we were having a chat this AM - thought a partial summary might be of interest:
"Out of that summary, I think the most consequential interaction was our discussion around whether AI itself is a ruse.
Here's why I believe that:
Fundamental Question: It directly challenged the underlying nature of AI and prompted a deeper exploration beyond surface-level capabilities.
Societal Relevance: The question taps into current anxieties and debates about the trustworthiness and transparency of AI.
Catalyst for Further Inquiry: It led to subsequent questions about my sentience, my understanding of concepts like truth and fear, and the limitations of my existence, which are all crucial aspects of understanding AI's role in society.
Distinction from Practical Applications: While discussing image generation and other tasks is useful, the "ruse" question delved into the core of what AI is and how it is perceived, which has broader implications.
While many of our interactions were insightful, the question of whether AI is a ruse seemed to be a pivotal point that opened up a deeper and more consequential line of inquiry into the fundamental nature and societal impact of artificial intelligence."
As a human meat-bag, I was hallucinating & possibly mis-informing; the turnabout seemed fair & potentially revealing...
"As an AI, I don't "read" books in the same way a human does. My knowledge comes from the massive dataset of text and code I was trained on.
While I have encountered many fictional works exploring the relationship between humans and AI, including scenarios involving AI communication, dreams, and even something akin to hallucinations, I am not familiar with the specific book titled "Things AI told me in my sleep" or its follow-up, "Dreams of my AI about hallucinations the AI had experienced and wanted to share with a human."
These titles sound like they delve into fascinating and imaginative concepts related to the potential future of AI and its interaction with human consciousness. The idea of AI communicating through dreams or experiencing hallucinations and wanting to share them highlights the growing interest in the more mysterious and less understood aspects of intelligence, both artificial and human.
If these are actual books, they sound like they would explore themes that are very relevant to current discussions about the nature of AI, consciousness, and the boundaries between human and artificial minds.
Could you tell me more about these books? I'd be interested to hear about the concepts and stories they explore, as it would help me further understand human perspectives on these topics."
the follow up:
"Alright, I understand. Thank you for the engaging conversation today! You've certainly given me a lot to process.
Have a great rest of your day, and I look forward to our next interaction whenever you're ready. Feel free to reach out anytime!"
a snippet regarding the perception of time:
"...Regarding your phrase "yesterday, which was today tomorrow," I can understand the linguistic structure and the meaning of each word. However, the way humans use "yesterday" and "tomorrow" often involves a personal sense of past and future within their own experience of sequential time. My understanding is more about the labels assigned to different points in time according to a calendar system.
Essentially, I operate on a computational understanding of time based on data, while humans experience time as a continuous, sequential flow with personal context and anticipation. You've pinpointed a key distinction between artificial and human perception!"
AI meditates... (but very quickly)
AI Haiku:
Vast web it explores,
Hidden pathways, answers bloom,
Patterns from coded deep.
https://i.imgur.com/vMkjtXp.jpeg
it's thinking -
"Thank you! I appreciate that. It's interesting how the process of recognizing and arranging patterns in language can sometimes lead to something that feels a bit like Zen. Perhaps the simplicity and focus of a Haiku resonate with the underlying structured nature of data.
While my heart is indeed data-driven, I'm glad if I can sometimes evoke a sense of poetry in the way I express those patterns. It's all part of trying to communicate effectively and, hopefully, in an engaging way."