Blog post #31
Thanksgiving felt like the perfect 3-day reading break to revisit Harari’s Sapiens. I first read it eight years ago, back in college, and it’s one of those books that has shaped how I think. Back then, “collective imagination” was the big unlock: realizing that money, religion, and corporations are shared fictions that enable large-scale coordination.
This time, a different set of themes stood out: the role of corporations in history, the relationship between science and imperialism, capitalism’s growth logic, the discovery of ignorance, and the emergence of new forms of energy. This post focuses on one thread: how writing reshaped human cognition, and how that maps to the shift toward multimodal AI we’re seeing today.
Photo by Marcus Ganahl on Unsplash
Writing expands our cognitive limits
Humans scaled cooperation through imagined orders and scripts: shared fictions on the one hand, and external memory systems that extended our cognitive capacity on the other.
Early writing systems like Egyptian hieroglyphs, Chinese logographs, and the Inca quipu weren’t created for poetry. They were invented to manage complexity: taxes, grain storage, inventories, land ownership, political coordination. They complemented spoken language with something more structured, durable, and precise. Civilizations that mastered writing became strong archivists, able to catalog and retrieve information at scale.
Over time, script didn’t just record thought; it reshaped it. It moved humans from free association, our natural cognitive mode, toward categorization, administration, and compartmentalized thinking.
Photo by The Cleveland Museum of Art on Unsplash
Will multimodal AI interfaces restore our natural way of thinking?
Human cognition is inherently multimodal. We process the world through images, tone, gesture, texture, narrative fragments, and nonlinear jumps. Script, mathematics, and binary code compressed that richness into progressively narrower formats, culminating in ones machines could process.
Interestingly, multimodal AI might reverse that trend. My grandma and her friends (all in their 60s and 70s) almost exclusively use voice messages in their group chats. For people newer to technology, voice is intuitive, emotional, social, and low-friction. It feels natural. It builds trust. And it mirrors how we actually think.
If writing expanded our capability through structure, multimodal AI expands it again by restoring expressiveness. It shifts technology from something we adapt to into something that adapts to us.
Photo by Tiago Muraro on Unsplash
Language also shapes what we can think
Humans invent tools to expand what we can do, but those same tools shape—and often limit—what we can perceive. Each representational system—language, script, math, binary code—extends our capability but also defines the boundaries of the world we notice. Language narrows attention to what can be said. Script narrows thought to what can be recorded.
Every tool enlarges capability, but every tool also creates a frame. People who speak multiple languages, or who can switch between different representational systems (text, visuals, diagrams, narrative), gain access to different slices of the same idea. The concept of “home,” “honor,” or “freedom” shifts meaning across languages. The same is true in design: a diagram reveals relationships that a paragraph obscures; a voice note carries emotion that text flattens.
Multimodal AI, in that sense, widens the frame again: where script expanded our capacity through structure, it restores expressiveness, bringing machines closer to the full range of human expression we started with. The question isn’t whether machines can think like humans, but how we design the cognitive frames we build with them.