Dana Blankenhorn

The Truth Premium

Not All LLM Data is Created Equal

by Dana Blankenhorn
May 29, 2025
in A-Clue, AI, Business, business models, business strategy, Communications Policy, Current Affairs, e-commerce, economy, ethics, futurism, innovation, intellectual property, Internet, journalism, software, Tech, The 2020s and Beyond, Web/Tech

Large Language Models (LLMs) suck in all the data they can find, train a model on it, then spit out answers in varying formats to queries that also come in varied formats.

It’s a form of neural networking, but it’s not human thought. It’s computing.

Most humans can determine immediately whether the answer to a question is bullshit. It’s the source of both comedy and drama. A farce, after all, begins when someone tells a lie.

Computers don’t have that facility. They depend for their output entirely on their input. That’s why models are collapsing, as my friend Steven J. Vaughan-Nichols wrote this week. Without a reliable way to tell true data from false, you’ll believe anything.

<joke>You might say AIs are Trump voters. </joke>

The only way computer scientists believe they can solve this problem is to get more data, to get more truth, and hope it drives out falsity. The answer is always more. Thus, executives like Facebook’s Nick Clegg insist the models can’t work if they need permission to use all the data they want. (Note: Clegg left Facebook in January.) Right now, the regulatory environment is inclined to give them that permission, hoping artificial intelligence will force down the cost of the real kind.

But here’s a question. What if they’re wrong?

The Value of Truth

Not all data is created equal. Some things are true while some things are false. Computers have no clear way of separating the two. They seem to be engaged in a constant political campaign, hoping that most inputs are indeed true.

The problem is they’re polluting their own data stream. Falsehoods are pollution in data. If you take in all the data, you’re going to take in a lot of pollution, and in time that pollution will naturally drive out the good stuff, the truth.

I’m going to go back to my Coca-Cola analogy again for a moment, but in a different context. The only way to make every Coca-Cola taste the same is to police the front end, to treat the water. For an LLM, the water is the data it uses to reach its conclusions. This means data sources that police themselves, that adhere to truth, that do what we like to call journalism, are going to be worth far more than those that digitally print anything that comes in.

Treating data starts with having a good data store. There can be some pollution in it. The water a Coke bottler treats comes from local sources. But the bottler also knows something about that source.

The answer to the problem of AI truth, then, isn’t to take more data into the model. It’s to take less data, and to assign value to the incoming data so the model’s output won’t poison people. As was true in his political career, Clegg is completely wrong here. The AI masters are completely wrong.
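What might "assigning value to the incoming data" look like in practice? Here is a minimal sketch in Python, under loud assumptions: the source categories, the trust scores, the cutoff, and the names `SOURCE_TRUST`, `Document`, and `weight_documents` are all hypothetical, invented for illustration. No real training pipeline is being described. The idea is only that low-trust sources get dropped and the rest count in proportion to how much we trust them.

```python
from dataclasses import dataclass

# Hypothetical trust scores per source type, on a 0.0-1.0 scale.
# These numbers are illustrative assumptions, not measurements.
SOURCE_TRUST = {
    "edited_newsroom": 0.9,
    "anonymous_forum": 0.3,
    "content_farm": 0.05,
}


@dataclass
class Document:
    source: str
    text: str


def weight_documents(docs, min_trust=0.25, default_trust=0.1):
    """Drop documents from low-trust sources and weight the rest by trust.

    Returns a list of (document, weight) pairs. The weights sum to 1.0,
    unless nothing passes the cutoff, in which case the list is empty.
    """
    kept = []
    for doc in docs:
        # Sources we know nothing about get a low default score.
        trust = SOURCE_TRUST.get(doc.source, default_trust)
        if trust >= min_trust:
            kept.append((doc, trust))
    total = sum(trust for _, trust in kept)
    return [(doc, trust / total) for doc, trust in kept]


docs = [
    Document("edited_newsroom", "A reported, fact-checked story."),
    Document("content_farm", "Ten shocking facts doctors hate."),
    Document("anonymous_forum", "I heard this from a guy."),
    Document("unknown_blog", "No track record at all."),
]

for doc, weight in weight_documents(docs):
    print(f"{doc.source}: {weight:.2f}")
```

With these made-up scores, the content farm and the unknown blog never reach the model, and the newsroom copy counts for three times as much as the forum post. The hard part, of course, is not the arithmetic. It is knowing enough about each source to set the score, which is the bottler's knowledge of the local water.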

Truth has value, and if you don’t pay for that value on the front end, you won’t get truth on the back end.

Tags: AI, Large Language Models

Dana Blankenhorn

Dana Blankenhorn began his career as a financial journalist in 1978, began covering technology in 1982, and the Internet in 1985. He started one of the first Internet daily newsletters, the Interactive Age Daily, in 1994. He recently retired from InvestorPlace and lives in Atlanta, GA, preparing for his next great adventure. He's a graduate of Rice University (1977) and Northwestern's Medill School of Journalism (MSJ 1978). He's a native of Massapequa, NY.

I'm Dana Blankenhorn. I have covered the Internet as a reporter since 1983. I've been a professional business reporter since 1978, and a writer all my life.

© 2023 Dana Blankenhorn - All Rights Reserved
