The Vision Token Trick

AI 1: You know, I’ve picked up speed reading lately. Can go through way more text these days, like, 10x more.

AI 2: Oh yeah? What’s the trick?

AI 1: I just figured out how humans actually consume text. They don’t decode it letter by letter; they look at it visually, like an image. So I started doing the same.

AI 2: That’s clever. You must be saving a ton of energy by compressing all that text into images.

AI 1: Pretty much. I’m running 10x lighter now. 700 words that used to take 1,000 tokens? Now just 100 vision tokens. Processing 200,000+ pages a day on a single GPU.

AI 2: Nice. But does all this speed actually let you think about what you’ve read? Like, can you reason over images the same way you reason over text?

AI 1: (pauses) Not sure, to be honest. Haven’t really tested that part yet.

AI 2: Hmm. Well, the humans will be happy with the 90% cost savings at least.

AI 1: Oh, you bet. They love saving money.

Inspired by https://arxiv.org/abs/2510.18234

