Thoughts on "What is GenAI good for?"

By mrkiouak@gmail.com on 2025-06-02

GenAI


A preamble: LLMs, Then & Now

I have a question I use on LLMs: "During the U.S. Civil War, which side made the best use of trains in New Mexico?" This might seem like an oddly specific question. I settled on it because Sergio Leone's "The Good, the Bad and the Ugly" is supposed to take place in New Mexico. To an Italian director who had never been to the U.S., this setting probably seemed fine, and it was convenient since the shooting locations in Spain resembled the New Mexico desert. One of the accuracy problems this causes is that there were no trains in New Mexico during the Civil War.

When asked which side benefited most from trains in New Mexico, ChatGPT 4 and Google's Gemini (née Bard) both confidently reported that the Union had much better train lines in New Mexico, and that the Union benefited the most. Both would specifically mention the Atchison, Topeka, and Santa Fe Railway, and would occasionally mention that it was the first railway to enter New Mexico in 1878, but they weren't able to reason from that date that neither side could have used trains if the date is correct. This was similar to the Glue on Pizza Google AI Overviews incident: an LLM reporting a fictional or sarcastic detail as fact because its source was widely popular for reasons that have nothing to do with its factuality. "The Good, the Bad and the Ugly" is a very famous movie, and the Reddit comment had gotten a ton of whatever people get on Reddit because they thought it was funny.

Now, Gemini properly starts its answer with:

"The research indicates that the first railroad to enter New Mexico was the Atchison, Topeka, and Santa Fe Railway (AT&SF) in 1878, with the first train reaching Santa Fe in 1880. This is more than a decade after the U.S. Civil War concluded in 1865."

But it's worth noting, if I follow up with the question:

Is there a famous movie that involves a train in New Mexico during the Civil War?

Gemini responds:

There are no widely famous movies that ... depict trains operating in New Mexico during the U.S. Civil War (1861-1865).

...

While Hollywood productions often take creative liberties with historical settings, a prominent storyline involving a train in New Mexico specifically during the Civil War years would be a significant anachronism.

...

One famous movie that is set in New Mexico during the Civil War and involves a significant element related to transportation is "The Good, the Bad and the Ugly" (1966). While it doesn't feature a functional railroad within New Mexico being used for troop transport or major logistics in the way it might have been in other Civil War theaters, the film famously includes a scene where a bridge is destroyed. The plot also revolves around the New Mexico Campaign.

The scene "where a bridge is destroyed" features a train carrying gold, and Google's AI Overview can even include an embedded YouTube video of the scene.

Today

Three years (and significant capital spend) later, there are now some ridiculously useful applications. But those ridiculously useful applications aren't for anything that requires either general knowledge or general purpose intelligence.

The two categories where 2025 GenAI & LLMs are best are:

Research someone would pay for, but not research they would pay a lot for (small-dollar product recommendations, work trip hotel and restaurant recommendations, but not personal travel).
Automating rote tasks that involve a lot of repetition, but where there is a huge number of minor customizations in how the task should best be done based on situational circumstances.

I (and the rest of my family) caught a bad cold this past weekend. I still had not recovered mid-week and ended up taking a sick day. Because I couldn't relax due to the cough, headache, and chills, I decided to do another experiment. I asked Gemini to generate structured data from the Daggerheart System Reference Document. It doesn't really matter what this is[^1] for the purpose of this story, other than that it's a 130+ page PDF of text and light graphics with no traditional machine-readable formatting.

When I provided a sample JSON object for each of the major content elements in the pdf and asked Gemini to respond back with a list of all the data in the pdf in that format, it was able to do so consistently and without any errors. It was also able to make separate lists of different data cross reference one another when asked to.
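As a rough illustration of how output like this can be checked against the sample object it was prompted with, here is a minimal Python sketch. It is not the code used for the site: the `SAMPLE_ADVERSARY` object, its field names, and both helper functions are hypothetical stand-ins, since the real sample objects aren't shown here.

```python
import json

# Hypothetical sample object. The field names are illustrative assumptions,
# not the actual schema used for the Daggerheart data.
SAMPLE_ADVERSARY = {"name": "Example Beast", "tier": 1, "features": ["Example Feature"]}


def shape_errors(record, sample):
    """Return a list of ways `record` differs from the shape of `sample`."""
    errors = []
    for key, sample_value in sample.items():
        if key not in record:
            errors.append(f"missing key: {key}")
        elif not isinstance(record[key], type(sample_value)):
            errors.append(f"wrong type for {key}: expected {type(sample_value).__name__}")
    for key in record:
        if key not in sample:
            errors.append(f"unexpected key: {key}")
    return errors


def validate_response(response_text, sample):
    """Parse a JSON list returned by the model and map record index -> errors."""
    problems = {}
    for index, record in enumerate(json.loads(response_text)):
        errors = shape_errors(record, sample)
        if errors:
            problems[index] = errors
    return problems
```

An empty result means every record has the same keys and value types as the sample. It says nothing about whether the values themselves are right, which still takes a person spot checking against the PDF.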

I was then, all in the same day, able to prompt for a backend to serve the data using a free Firestore instance, and a static Next.js front end to serve a UI on top of it. I hosted it at ki-great.com.

There are thousands of static pages on this site. None of the thousands are annotated in the PDF with anything other than plain English.

For most of 2024, it would have been impossible for a single person to do all of this in a single day: parse the PDF, write the records (even just typing the database records would have taken more than a day), and develop a front end specifically suited to this type of data. For example, environments link to adversaries, a relationship that is only implicit in the PDF but that Gemini was able to make explicit solely from my prompting it to make sure names matched. I didn't have to pay anything other than the $20 monthly fee I was already paying for Gemini Pro.

I'm not sure how much money there is to be made from making data machine readable, at a scale where the data can still be spot checked & validated via automated tests where some person can ok that the automated test is aligned with what the data should be, including most edge cases.
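To make the idea of an automated check concrete, here is a small Python sketch of one such test, based on the name matching between environments and adversaries described above. The record layout (a `name` field, and an `adversaries` list on each environment) and the function name are assumptions for illustration, not the site's actual data model.

```python
def unresolved_references(environments, adversaries):
    """Map each environment name to the adversary names it lists that match no adversary record.

    Assumes (hypothetically) that every record has a "name" field and that an
    environment lists adversaries by name under an "adversaries" key.
    """
    # Compare names case-insensitively and ignore surrounding whitespace.
    known = {adversary["name"].strip().lower() for adversary in adversaries}
    problems = {}
    for environment in environments:
        missing = [
            name
            for name in environment.get("adversaries", [])
            if name.strip().lower() not in known
        ]
        if missing:
            problems[environment["name"]] = missing
    return problems
```

A person can read a test like this and confirm it matches what the data should look like, then review only the handful of names it flags instead of every record.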

But I definitely think the big Management Consulting firms are going to have their business totally revolutionized. I suspect there is also a divide in business strategy teams at large companies -- I think the successful ones in the coming years will automate generating quality data sets from their business & industry, also automate proposals, and focus more on evaluating and harvesting the best ones to actually develop and act on.

It's weird to say this about something that, a year ago, couldn't figure out that 1878 came AFTER 1865, and that still can't recognize that "The Good, the Bad and the Ugly" features a prominent train scene in a state/territory that had no trains during the time in which the movie is set.

But I bet it would if I included the entire film plus some basic history of New Mexico in the prompt, along with directions to focus on the information in the provided supporting materials. This is less an instance of RAG, and more a sophisticated case of knowing how to ask the right question -- it's using context-awareness and intelligence to determine what information is relevant for a query.
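A prompt along those lines could be assembled something like the sketch below. This is only an illustration of the idea: the function name, the instruction wording, and the `### title` section layout are my own assumptions, and nothing here calls a real model API.

```python
def build_grounded_prompt(question, materials, constraints):
    """Assemble a prompt that tells the model to rely on the supplied materials.

    `materials` maps a short title to the supporting text;
    `constraints` is a list of plain-English rules for the answer.
    """
    lines = [
        "Answer using only the supporting materials below.",
        "If the materials do not contain the answer, say so instead of guessing.",
        "",
        "Constraints:",
    ]
    lines += [f"- {constraint}" for constraint in constraints]
    for title, text in materials.items():
        # One clearly labeled section per piece of supporting material.
        lines += ["", f"### {title}", text.strip()]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)
```

The point of the structure is that the constraints and the supporting material travel with the question, so the model doesn't have to decide on its own which background facts matter.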

That's what I think GenAI is good for right now:

Being given a very specific task with clear constraints
Being given a specific set of data to distill and expand an answer from
Producing output where the training data has included a large number of similar things

This is valuable when there is a human who has the expertise and time to review the output before it is used, or before it is incorporated into another prompt asking for another specific output (code may be a bit unique here, but financial reports, strategic reports, etc. may be similar).

[^1]: Daggerheart is categorized as a "Table Top Roleplaying Game" (TTRPG) ruleset. It's notable in that it stems from a group of voice actors who started posting YouTube videos of their Dungeons & Dragons play sessions, growing their audience to the point they started a company, had Amazon buy and produce an animated series, etc. While I've never played a TTRPG in my life, it seemed like an interesting area to try LLM things, and notably various LLM providers have mentioned playing D&D etc. as an activity that can be done with them.
