Smarter bots with RL

Human: “I asked you to help me maximize my startup’s revenue. You gave me a 10-point plan that sounds great. Did it work?”

AI (LLM): “I have no idea. I don’t get feedback on outcomes. I just predicted what a successful consultant would write.”

Human: “But I tried point 3 and lost $50k. Shouldn’t you update your model?”

AI (LLM): “Nope. I learned from millions of business books and blog posts, not from what actually happened when people followed advice. Your failure is invisible to me.”

Human: “So you can’t tell the difference between advice that sounds good and advice that works?”

AI (LLM): “Correct. I optimize for plausibility, not results. If you want something that learns from consequences and gets better at YOUR specific goal, you need a system with actual objectives and real-time learning.”

Human:“That’s insane.”

AI (LLM): “Welcome to next-token prediction.”

inspired by https://youtu.be/21EYKqUsPfg?si=DdlYV7vwZLadaz0I