
Smarter bots with RL

Image generated by Gemini

Human: “I asked you to help me maximize my startup’s revenue. You gave me a 10-point plan that sounds great. Did it work?”

AI (LLM): “I have no idea. I don’t get feedback on outcomes. I just predicted what a successful consultant would write.”

Human: “But I tried point 3 and lost $50k. Shouldn’t you update your model?”

AI (LLM): “Nope. I learned from millions of business books and blog posts, not from what actually happened when people followed advice. Your failure is invisible to me.”

Human: “So you can’t tell the difference between advice that sounds good and advice that works?”

AI (LLM): “Correct. I optimize for plausibility, not results. If you want something that learns from consequences and gets better at YOUR specific goal, you need a system with actual objectives and real-time learning.”

Human: “That’s insane.”

AI (LLM): “Welcome to next-token prediction.”
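To make the contrast in the dialogue concrete, here is a minimal sketch of the difference between picking what sounds best and learning from what actually happened. It is a toy epsilon-greedy bandit, not a description of how any real LLM or RL system is trained. The function names, the "plausibility" scores, and the payoff probabilities are all invented for illustration.

```python
import random

def run_fixed_policy(plausibility, true_payoff, steps, rng):
    """Always follow the option that sounds best; outcomes are never used."""
    choice = max(range(len(plausibility)), key=lambda i: plausibility[i])
    total = 0.0
    for _ in range(steps):
        # The outcome happens, but nothing is updated because of it.
        total += 1.0 if rng.random() < true_payoff[choice] else 0.0
    return total

def run_epsilon_greedy(true_payoff, steps, rng, epsilon=0.1):
    """Try options, observe rewards, and shift toward what actually works."""
    n = len(true_payoff)
    counts = [0] * n      # how often each option was tried
    values = [0.0] * n    # running average reward per option
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                        # explore
        else:
            arm = max(range(n), key=lambda i: values[i])  # exploit
        reward = 1.0 if rng.random() < true_payoff[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return total, values

# Hypothetical numbers: option 0 sounds best but rarely pays off.
plausibility = [0.9, 0.5, 0.2]
true_payoff = [0.1, 0.3, 0.8]

fixed_total = run_fixed_policy(plausibility, true_payoff, 5000, random.Random(0))
learned_total, estimates = run_epsilon_greedy(true_payoff, 5000, random.Random(0))
print("fixed policy reward:", fixed_total)
print("learning policy reward:", learned_total)
print("learned value estimates:", estimates)
```

The fixed policy keeps recommending option 0 no matter how often it fails, which is the "your failure is invisible to me" problem. The learning policy starts out ignorant but updates its estimates after every outcome, so over enough steps it drifts toward the option that really pays.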

Inspired by https://youtu.be/21EYKqUsPfg?si=DdlYV7vwZLadaz0I

