Capture the action, share in seconds

This is my first winter back on the East Coast with icy streets and a snowstorm in New York (ugh), after spending the past few “winters” in California. Since moving to NYC, Urbana Gallery & Cafe in Chelsea has been my go-to writing spot on cold days. I’m watching people shuffle past in scarves through the window as I type this. Hope you’re staying warm this February!

This post is part 3 of an essay series on this idea: an exploration loop between capital, science, and an integrated stack (energy, data, and distribution) creates compounding advantage. I first framed it in December while rereading Sapiens, then looked at how it shows up in Shenzhen in January. Today, I want to ground it in something very tangible: the action camera market.

It’s a fun category driven by the creator economy on YouTube/TikTok, and also a clear example of how advantage gets built, defended, or lost. Two questions I’m most intrigued by:

  1. How does this exploration cycle show up in action cameras?

  2. How sustainable are these models in 5 years?

Photo by Red Zeppelin on Unsplash

Action camera market in 2026

Action cameras essentially sell this: capture something intense, then share it fast. The first part is a competition for hardware (e.g., portable form factors), while the second part is about software (e.g., AI tools that reduce effort from footage to story).

So in this market, the exploration cycle comes from being able to move quickly and control key inputs to keep moving quickly: components, manufacturing, distribution. The best players don’t just ship cameras, they ship the easiest path from moments to shareable content.

State of the market by late 2025 (estimates vary by tracker):

  • DJI leads with 66% of the global market, built on an integrated stack and industrial scale.

  • Insta360 is the fast climber (13-31%, with a surge in Asia-Pacific), built on speed, creativity, and workflow.

  • GoPro still has the name (18%), but its hardware momentum is falling; it leans on brand, subscriptions, and lawsuits.

Three kinds of compounding advantages

They represent three different kinds of loops, each bringing unique advantages:

DJI: the science, scale, and ecosystem

DJI is often called the “Apple of drones,” and it has the strongest compounding loop among the three: deep R&D, supply chain, industrial scale, and an ecosystem that spans consumer and enterprise (think agriculture and inspections).

They not only iterate quickly, but also build many of the “hardware legos” (sensor tech, stabilization hardware, transmission protocols, etc.) that can be reused across categories and set the market’s baseline. That’s what makes DJI hard to catch: their advantage isn’t one product, it’s compounding capability.

Insta360: speed, taste, and friction removal

Insta360 is the most interesting growth story in this category (with explosive YoY growth of 76.8%). It’s a classic “hardware lego” player: start with strong off-the-shelf components (often from the smartphone supply chain), then differentiate through speed and product taste.

Their advantage comes from three places. First, iteration speed: shorter cycles, more frequent experiments, and quicker response to what creators actually do. Second, creative form factors: sometimes “weird” designs that end up defining the category and leading the trend. Third, workflow: integrating AI and software that make editing feel lighter, so the camera doesn’t just capture moments but also makes them easy to publish.

Insta360 GO Ultra (wear the camera as a pendant)

Camera can be detached for flexibility

GoPro: defend the brand

GoPro’s hardware loop is slower today, but it still defends its position, especially in North America, through brand, subscription, and litigation. The subscription piece is underrated. When a user charges their GoPro and footage auto-uploads to the cloud, switching becomes annoying. By late 2025, GoPro hit a record 70% subscriber retention, and service revenue reached $27M in Q1 2025, quietly turning the company into a cloud service business with hardware attached. Litigation plays a different role: it doesn’t improve the product, but it can slow competitors and buy time.

How sustainable are these models in 5 years?

My take for the next five years: DJI stays the global #1 because it compounds across an integrated stack and industrial scale. Insta360 becomes the durable #2 by owning creator workflow and setting consumer trends, especially in Asia-Pacific. It keeps DJI from becoming a monopoly and it pushes faster cycles, more creative designs, and better prices. GoPro continues to shrink into a brand-plus-subscription business unless it meaningfully shortens its product cycles. The biggest wildcard is geopolitics: U.S. policy can slow DJI’s pace more than any competitor can.

The action camera category won’t be a single-winner market. DJI and Insta360 will both win, for different reasons.

  • DJI wins when connectivity, reliability, and complex use cases matter. Its ecosystem (drones, gimbals, delivery bots, microphones, enterprise products for agriculture and inspections) lets it reuse capabilities and supply chain advantages across categories. It will likely generate the most profit in the market, concentrated in enterprise and prosumer segments, especially in areas that look more like infrastructure than consumer gadgets and draw less tariff scrutiny.

  • Insta360 wins where speed, flexibility, and cultural pull matter. Its strengths in AI-empowered workflows (editing, 360-reframing, ready-to-post outputs) plus playful form factors and modularity position it to set consumer trends. It can expand beyond “action” into broader creator contexts and the workplace (meetings, conferences, events), where the same workflow problems exist.

Photo by Lorin Both on Unsplash

Where AI shifts the game: “don’t edit—ask”

The next AI interface shift isn’t just auto-editing, but also retrieval. People won’t want to scrub footage. They’ll want to ask: “Find the clip where I wiped out in the snow.” That favors companies that can build capture-to-understanding capabilities: image signal processing, metadata, and on-device intelligence.

By late 2025, both DJI and Insta360 are moving toward proprietary processing so the camera can “understand” what it’s seeing in real time without being constrained by heat and power.

Geopolitics as a speed brake

Tariffs and U.S.–China tensions add friction and can slow iteration, especially for DJI. The late-2025 move to effectively block FCC authorizations for new DJI drone models pushes DJI to pivot harder into enterprise (agriculture and delivery) and less-scrutinized non-drone products (Osmo Action, Pocket). Even when the underlying capabilities are strong, regulation can function like a brake on speed.

Photo by Leio McLaren on Unsplash

Ending note and open questions

This wraps the three-part series on the exploration loop and how it shows up at different altitudes (Sapiens → Shenzhen → action camera → DJI, Insta360, GoPro).

A few loose thoughts that aren’t part of the main thread:

  • Copying and IP. When DJI ships a new product, Shenzhen companies will copy the design and sometimes even lift DJI’s marketing footage in their own ads. Shenzhen doesn’t seem eager to tighten copyright enforcement, possibly hoping to expand the industry pie. The U.S. protects creators more aggressively. But does stronger enforcement also shrink the pie by limiting remixing and iteration? Where’s the real tradeoff between speed, fairness, and total output?

  • How DJI drives consumer adoption. DJI makes drones feel less intimidating, almost toy-like, which helped popularize consumer drones for aerial photography. Lower price points and ease of use also helped land it with more consumers.

  • The missing player in the Shenzhen story: ODMs. One add-on from part 2: I didn’t include ODMs (original design manufacturers). They handle R&D and product design, allowing small teams to ship without building a full hardware org. They are a big reason Shenzhen can move fast and accelerate prototyping, often alongside crowdfunding.

Thanks to Nemo for pointing out the ODM gap—super helpful. And thanks to Matthew for sharing interviews on DJI (from Sangrae and Jaewhan), which inspired me to look more closely at DJI’s design story while writing this post.

Let’s chat if you have thoughts, see anything missing, or are curious about similar questions here. See you in the next post!

Hardware legos and the cost of being curious

Happy new year! I started the new year traveling in Shanghai, Jingdezhen, Shenzhen, and Zhongshan, each with a different energy and a different side of the cool things happening in China: Shanghai and Shenzhen to learn from the tech ecosystem, Jingdezhen for a pottery retreat, and Zhongshan for seafood and chill vibes in my home city.

Jingdezhen tea space

What’s new for Soft Patterns in 2026? You will start seeing short essay series, where I go deep on one idea across 2–3 posts. We ended 2025 with the idea of an exploration loop, first sparked by Sapiens. This post is part two: a look at how that loop shows up in Shenzhen, and why China’s hardware and consumer tech system works so differently from Silicon Valley’s.

From factories to hardware R&D labs

In December, we explored the idea that players with an exploration loop between capital, science, and an integrated stack (energy, data, and distribution) get more advantage because they can fund long cycles of exploration and translate them into habit-forming products at scale.

Shenzhen is a clear example. The city is known for physical AI industries, from robotics, drones, smart EVs, and sensors to wearables, and for getting from prototype to production unusually fast.

That speed comes from two feedback loops:

  • The first loop is fast prototyping (0 to 1). In Silicon Valley, engineers often wait weeks for custom parts. In Shenzhen, you can go to Huaqiangbei and come back with rare sensors, custom battery packs, or unusual subassemblies the same day.

  • The second is fast industrialization (1 to 100K): building alongside the factory, on the manufacturing floor. In Silicon Valley, an engineer may design something that looks great in CAD but becomes painful to manufacture. In Shenzhen, factory teams give immediate feedback: they’ll tell engineers exactly how a screw placement slows down the line, or why a design choice raises defect rates.

That kind of tight feedback, repeated every week, makes teams better at building real products, not just prototypes, whereas US engineers often deal with manufacturing constraints later in the cycle. The co-location of R&D and manufacturing (often within an hour) turns the city into a working lab.

Shenzhen Innovation patterns, from David Li

Hardware legos lower the cost of being curious

In most markets, when hardware parts become cheap and standardized, people worry about margins falling (i.e. commoditization). And they’re not wrong: once a product ships, competitors can often copy it and sell something similar quickly.

In Shenzhen, cheap standard parts are also a big advantage because they lower the cost of trying things. Because the city built the world’s smartphones, it created a thriving marketplace of “standard parts.” Specialized tech components (like a brushless motor or camera module) are now so cheap that the cost of experimenting with them becomes small.

This changes behavior:

  • Each experiment costs less: If you picked the wrong sensor, you learn quickly and move on.

  • Mix and match becomes normal: Teams can combine modular parts like legos and focus on the product experience (e.g., a drone's flight logic) instead of reinventing basic components.

With China’s large domestic market and dense engineering talent, this creates a culture that rewards speed and iteration. That’s where the “shanzhai” (counterfeit products) reputation comes from: fast cycles, intense competition, and constant incremental upgrades.

This is different from Silicon Valley’s dominant pattern: long timelines for big bets, often built on research-heavy breakthroughs and multi-year funding cycles, especially in software.

Field research for Taobao village, from Xiaowei Wang

Who benefits most?

Shenzhen is great at scaling and mass production once the building blocks exist (e.g., making the most affordable AI-powered drone, 1 to 100K). Silicon Valley is great at creating new categories when the breakthrough is still uncertain (e.g., creating the first LLM, 0 to 1).

As AI interacts with the physical world, speed in real-world testing matters more. It’s not only about making hardware cheaper, but also running more real experiments per month.

So what kinds of businesses benefit from Shenzhen’s system?

Consumer electronics is the iconic category: smartphones, drones, wearables, robotics, foldables. These are not life-critical tools like medical devices, so they thrive on a “ship-and-patch” culture where teams can ship earlier, learn from users, and improve quickly. The product becomes a live test in the market.

As a result, there are three ways the market shifts:

  1. Cheaper components let smaller teams build serious products. Companies like Insta360 became possible because key smartphone-era components (GPS, cameras, batteries, IMUs) became cheap and widely available first.

  2. Standard interfaces create huge accessory businesses. When hardware standardizes (USB-C, Bluetooth, common mounts), it creates a predictable and large market for accessories, like the multi-billion dollar case and charger industry.

  3. Big players can push prices down until smaller players can’t survive. Sometimes the winner isn’t the most inventive, but the one that can survive thin margins the longest, force the market to standardize, and scale the fastest.

Shanghai Jing’an Temple

When does the Shenzhen loop break?

This system breaks when the value of the product depends on something you can’t buy off a shelf:

  • Unique physical data: A drone company shouldn't just iterate on motors, but on the navigation data collected with them. The advantage is the edge cases the AI has learned because the hardware was cheap enough to deploy at scale.

  • Taste and status: The “cool” factor. People buy a Leica for feel and identity, not because it has the cheapest sensors.

  • Hard science breakthroughs: Some things can’t be sped up by supply chain alone. A quantum computer can’t be built by combining off-the-shelf parts, though companies like Unitree show how far iteration can go in robotics (e.g., a humanoid robot can be bought for the cost of a used car).

  • Ecosystem lock-in: The 'sticky' factor. Apple doesn't have to be first because once you have the phone, the watch, and the laptop, leaving is too much of a headache.

What I’d watch next

A few practical filters:

  1. Cost of each learning cycle: Does this product get meaningfully better each time the team ships and learns? If yes, Shenzhen-style iteration can be a real advantage. If not, is it positioned to stand out using things that can’t be bought off a shelf (e.g., taste-driven brands, deep tech, platform ecosystem)?

  2. Avoiding commoditization: As teams grow, do they keep moving fast, or do they slow down when production becomes more complex? If they slow down, what do they build that keeps them ahead: better software, better data, better brand, better ecosystem?

  3. Design for manufacture: For US teams working on hardware AI, I’ll watch what frictions show up when design and assembly are far apart. Communication delays, part lead times, and “late surprises” can quietly drag down speed.

  4. Global standards and trust: Teams that ignore compliance and trust early often hit a wall outside China. I’d watch how early they design for global expectations: data handling, security, certifications, and component compatibility.

In the next post, we will take a deep dive into a Shenzhen startup that used this high-velocity loop to outcompete industry leaders in the consumer electronics market. Stay tuned!

Random, but I love how the owner of this noodle place clearly loves Jay Chou haha

Discovery of ignorance and the exploration loop

I reread Sapiens over Christmas. In it, Harari argues that much of human progress in the past 500 years runs on a compounding loop between science, capital, and empire. The surprising part is what starts this loop: discovery of ignorance as a mindset shift.

The scientific revolution wasn’t simply a surge in knowledge; it was a new public relationship to what we didn’t know. Pre-modern societies often assumed the big questions were already answered by God, tradition, or inherited authority, whereas early modern Europe started to admit the unknowns and build plans to explore them.

Maps of ignorance

As a fun visual comparison, we can see this shift in maps. Many medieval maps fill the unknown edges with creatures and stories. But by the early 1500s, maps like the Salviati Planisphere start leaving some unexplored regions unfilled rather than inventing detail. Blankness becomes a public admission that we don’t know what’s here yet, and an invitation to go find out.

Fra Mauro (1459)

Salviati Planisphere (1525)

That small design choice signals an ideological breakthrough shared by scientists and conquerors: they admit they’re ignorant of large parts of the world, so they need to go out to discover, which expands both knowledge and territories.

Compounding loop: Science ↔ Capital ↔ Empire

Here’s the loop. Science turns blank space into knowledge. Capital funds exploration before there’s proof it will work. Empire converts discovery into durable advantage—routes, treaties, control, legitimacy. And then it compounds: advantage brings more capital; more capital funds more exploration.

In 2025, “empire” often looks less like territory and more like an integrated stack: silicon, energy, cloud capacity, proprietary data, and default distribution. The advantage goes to players who can fund long cycles of exploration, translate discovery into products people actually use, and defend/scale the resulting position through contracts, platforms, and habits.

What would this look like in the age of AI?

The AI map is also mostly blank. We don’t fully know what models will reliably do in the wild. We don’t know what people will trust. We don’t know what becomes habit versus novelty. So my hypothesis is: enduring advantage in AI will come from teams that can own the exploration loop, not teams that land a single breakthrough.

Capital and infrastructure fund and enable exploration. Exploration produces knowledge. Knowledge creates power and advantage. Advantage attracts more capital. Model talent matters, but the dominant advantage comes from owning the loop (compute, data, distribution, real-world feedback).

In the 1500s–1800s, “exploration” meant ships, navigators, maps, ports, financiers, and state backing. In AI, exploration means running huge numbers of experiments (training and inference), but the constraints are different: compute, energy, deployment surfaces, and feedback loops.

Energy access is a clear example of a physical gate. Whoever secures it early can run more experiments, iterate faster, deploy more capacity, and earn more real-world feedback. That can translate into higher quality, broader distribution, more revenue, stronger habits, and then more capital to secure more infrastructure.

Two versions of the loop

OpenAI × Microsoft is the industrial-scale version: capital, compute, distribution, and governance intentionally linked. Microsoft has explicitly described a “multiyear, multibillion dollar” investment partnership, and the relationship is designed around turning frontier exploration into real-world deployment at scale.

Midjourney is the tight-loop version: a small but mighty team exploring a narrower knowledge gap (what people want in images and taste). They built a capital loop through subscriptions (steady funding to buy compute and keep iterating). Importantly, they built distribution and feedback through a community workflow (Discord), and as they moved into more compute-intensive territory (video), they emphasized that video generations cost significantly more GPU time than images.

Photo by NEOM on Unsplash

What would prove this wrong?

This hypothesis depends on exploration staying expensive, uncertain, and cumulative. It fails if those conditions disappear. If small teams can reliably reach frontier capability without sustained access to compute, energy, or distribution, then the “capital + empire” advantage stops compounding. If distribution alone can dominate, through defaults, bundling, or platform control, then learning velocity becomes secondary to placement. And if efficiency gains make experimentation cheap enough that almost anyone can run the loop, then infrastructure ceases to be the bottleneck, and advantage migrates elsewhere.

We’re already seeing early stress tests. Efficiency jumps suggest capability may be less capital-gated than people assume (e.g. DeepSeek). And platforms are clearly pushing default distribution (e.g. Apple Intelligence becoming default-on; Microsoft Copilot can be auto-installed via updates), which could make “control” matter more than “learning” in some contexts. The open question is whether these are exceptions or the early shape of the next loop.

Photo by NEOM on Unsplash

What I’d watch

If this framing is right, the interesting question isn’t “who has the smartest model,” but “who can sustain the exploration loop long enough to learn.” The strongest teams are the ones whose products get used, whose revenue funds more learning (revenue ties to real usage, clear feedback, and repeat behavior), and who take constraints like compute and distribution as important product problems, not just things to buy later.

I would be wary of teams that are strong in one area only but weak in others: great tech nobody uses, wide reach with shallow learning, or lots of spending without clear feedback.

🥳 And that wraps 2025

Looking forward to your thoughts, ideas, and pushbacks! And that wraps all the blog posts in 2025. Thank you for supporting this newsletter as we explored Singapore policy, game design, Moore’s law, Shanxi architecture, writing as a cognitive tool, and Broadway musical production together. It has been a wild ride. Wishing everyone a happy new year ahead in 2026!

Christmas tree: Gesture-controlled interactive hologram

A small holiday experiment: a digital tree made of particles, light, and hand movements, exploring how hand-tracking and particle systems can be used to create playful, ambient digital objects. Created with Google AI Studio & Gemini 3 Pro.

The tree responds to simple gestures:

  • Close fist → collapse

  • Move hand → rotate

  • Pinch → focus
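The gesture mapping above can be sketched in a few lines. This is an illustrative sketch only, not the code behind the actual tree: it assumes a hand tracker that yields normalized 2D fingertip and palm positions, and the `HandFrame` type, the `classify` function, and all thresholds are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from math import dist
from typing import Optional

Point = tuple[float, float]  # normalized (x, y) in [0, 1]; an assumed convention


@dataclass
class HandFrame:
    """One tracked hand in one video frame (hypothetical structure)."""
    palm: Point
    fingertips: list[Point]  # thumb, index, middle, ring, pinky


def classify(frame: HandFrame, prev_palm: Optional[Point],
             fist_radius: float = 0.12,
             pinch_gap: float = 0.05,
             move_step: float = 0.02) -> str:
    """Map a hand pose to one of the tree's actions (thresholds are guesses)."""
    # Closed fist: every fingertip is curled in close to the palm.
    if all(dist(tip, frame.palm) < fist_radius for tip in frame.fingertips):
        return "collapse"
    # Pinch: thumb tip and index tip nearly touch while the hand stays open.
    thumb, index = frame.fingertips[0], frame.fingertips[1]
    if dist(thumb, index) < pinch_gap:
        return "focus"
    # Moving hand: the palm shifted noticeably since the previous frame.
    if prev_palm is not None and dist(frame.palm, prev_palm) > move_step:
        return "rotate"
    return "idle"
```

The fist check runs first because the thumb and index finger are also close together in a closed fist, so testing for a pinch first would misread it.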

Music: https://pixabay.com/music/christmas-christmas-christmas-434436/

Does writing scale or limit cognitive thinking?

Blog post #31

Thanksgiving felt like the perfect 3-day reading break to revisit Harari’s Sapiens. I first read it eight years ago back in college and it’s one of those books that has been formative to how I think. Back then, “collective imagination” was the big unlock, realizing that money, religion, and corporations are shared fictions that enable large-scale coordination.

This time, a different set of themes stood out: the role of corporations in history, the relationship between science and imperialism, capitalism’s growth logic, the discovery of ignorance, and the emergence of new forms of energy. This post focuses on one thread: how writing reshaped human cognition, and how that maps to the shift toward multimodal AI we’re seeing today.

Writing expands our cognitive limits

Humans scaled cooperation through imagined orders and scripts, external memory systems that extended our cognitive capacity.

Early writing systems like Egyptian hieroglyphs, Chinese logographs, and the Inca quipu weren’t created for poetry. They were invented to manage complexity: taxes, grain storage, inventories, land ownership, political coordination. They complemented spoken language with something more structured, durable, and precise. Civilizations that mastered writing became strong archivists, able to catalog and retrieve information at scale.

Over time, script didn’t just record thought, it reshaped it. It moved humans from free association, our natural cognitive mode, toward categorization, administration, and compartmentalized thinking.

Will multimodal AI interfaces restore our natural way of thinking?

Human cognition is inherently multimodal. We process the world through images, tone, gesture, texture, narrative fragments, and nonlinear jumps. Script, mathematics, and binary code compressed that complexity into narrower formats machines could understand.

Interestingly, multimodal AI might reverse that trend. My grandma and her friends (all in their 60s and 70s) almost exclusively use voice messages in their group chats. For people newer to technology, voice is intuitive, emotional, social, and low-friction. It feels natural. It builds trust. And it mirrors how we actually think.

If writing expanded our capability through structure, multimodal AI expands it again by restoring expressiveness. It shifts technology from something we adapt to into something that adapts to us.

Language also shapes what we can think

Humans invent tools to expand what we can do, but those same tools shape—and often limit—what we can perceive. Each representational system—language, script, math, binary code—extends our capability but also defines the boundaries of the world we notice. Language narrows attention to what can be said. Script narrows thought to what can be recorded.

Every tool enlarges capability, but every tool also creates a frame. People who speak multiple languages, or who can switch between different representational systems (text, visuals, diagrams, narrative), gain access to different slices of the same idea. The concept of “home,” “honor,” or “freedom” shifts meaning across languages. The same is true in design: a diagram reveals relationships that a paragraph obscures; a voice note carries emotion that text flattens.

Photo by Sergio Li on Unsplash

Multimodal AI, in that sense, widens the frames again—bringing machines closer to the full range of human expression we started with. If script once expanded our capacity through structure, multimodal AI may expand it again through expressiveness. The question isn’t whether machines can think like humans, but how we design the cognitive frames we build with them.

Maybe happy ending: modular design for emotion

My sister visited New York last weekend, and we watched the Broadway musical Maybe Happy Ending, a show I’ve been eyeing ever since I first heard about it on the beloved Korean TV series Witty Mountain Village Life (《机智山村生活》) back in 2021. On the show, Jeon Mi-do, who originally played Claire in the 2015 Korean production, performed a sneak peek of “When you’re in love (Korean ver.).”

Maybe Happy Ending was inspiring for so many reasons, not only because it won the 2025 Tony Award for Best Musical, but also because of the story behind how this musical came to life. I kept wondering: how did a story written in two languages, set in futuristic Seoul, find its way to Broadway, and what kind of collaboration made that possible?

Photo credit: https://www.forbes.com/sites/jerylbrunner/2025/05/16/the-visionary-design-behind-the-broadway-musical-maybe-happy-ending/

A story about robots, but really about us

Set in Seoul in the 2060s, two helper robots, Claire and Oliver, discover an unexpected friendship, and perhaps something deeper, after being left behind by their human owners.

It sounds like a sci-fi premise, but the story is actually about what it means to be human in a world increasingly shaped by technology, a question that feels ever more relevant today.

Minimalist, modular design

One of the most striking aspects of the production was how modular its design felt—from the stage to the characters to the narrative itself. It reminded me of the IKEA approach: minimal, yet versatile and powerful.

With only four actors and a few movable set pieces, the stage seamlessly transformed from a bustling city street to a ferry ride, and to a quiet forest filled with fireflies. The use of technology was equally thoughtful: lighting, projection, and spatial design amplified the performances, creating a sense of scale and emotional depth far beyond the size of the cast.

It echoed the game design principles I explored in my last blog post: identifying the essential elements that define an experience and make it special is key to effective narrative, and it definitely applies to broader entertainment design.

Photo credit: https://www.forbes.com/sites/jerylbrunner/2025/05/16/the-visionary-design-behind-the-broadway-musical-maybe-happy-ending/

What was it like to write songs in two languages?

Creating an original musical not based on existing IP is already rare, let alone one written in two languages. Maybe Happy Ending was created by Will Aronson (music & book) and Hue Park (book & lyrics), a Korean-American duo who have spent the past decade refining it across stages and cultures.

In an interview, they shared their music creation process: Will composed the music first, then Hue, a K-pop lyricist, wrote the lyrics. This approach gave the lyricist more control because he is interpreting the music and turning it into lyrics. They began with full Korean lyrics before adapting them into English.

This reveals something interesting about bilingual storytelling: in English, lyrics often need to rhyme and tend to be more straightforward and specific to the narrative. In Korean, lyrics lean toward sound and feeling, with words chosen for their musicality and their meaning. These linguistic and cultural differences shape how the lyrics interact with the music to drive emotion and story.

Photo credit: https://www.forbes.com/sites/jerylbrunner/2025/05/16/the-visionary-design-behind-the-broadway-musical-maybe-happy-ending/

Walking out of the theatre, I kept thinking about the emotional resonance the show created through such a minimal, modular design. It’s a great example of design for emotion: achieving depth through simplicity. That balance is something I aspire to bring into the things I design every day as well.

Two weeks after moving to New York

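It’s drizzling outside the window, and fall has officially arrived in New York. Two weeks after moving here, my body and mind are still slowly adjusting. Moving from SF to New York is one of the bigger decisions I’ve made in recent years, and my feelings are mixed: leaving the good friends who kept me company in SF for so many years is definitely sad. Objectively, I didn’t have to move to New York, yet making the decision felt like being pushed along, as if I had no other choice. This week, while recording a podcast about the move at Tina’s place, I realized an important question: I had wanted to go to New York ever since finishing undergrad, I kept saying “I’m moving to New York next year” during my first two years in the Bay Area, and I always knew the city’s atmosphere and cultural diversity would suit my personality and interests better. So why did it take me six years to move here?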

Heart-shaped moon at dusk, seen from the office

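Honestly, the conditions just weren’t ripe yet. During my six years in SF, I was still figuring out and learning to handle more basic things: foundational training at work, understanding myself, and building and maintaining close relationships. Just when I thought these were gradually stabilizing, the difficulties I ran into at work early this year (the exhaustion of frequently switching teams, the inefficiency and loneliness of collaborating remotely with a European team, doubts about my fit with a big company) forced me to re-examine whether my current life really made me happy. In April, the pain peaked, and I seriously questioned why I was still staying in SF. Work takes up a large part of life, and SF is a city overshadowed by the tech industry and work, so the mismatch between the city and me became more obvious. Since ups and downs at work are a reality I have to accept, if I want my state to be more stable and sustainable, I at least want to live in an environment that makes me happier, so that my overall sense of well-being is steadier. So changing cities may look like breaking a kind of stability, but for me it is an attempt to build more inner stability.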

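Deciding to move felt like a survival instinct to actively break out of a rut, and over these past few months two conditions happened to be ready. First, over the past two years I’ve come to understand my own state and what I want to do more clearly, so I can handle a change like switching cities on my own. Second, I finally have enough courage to actively create the life I want, rather than just describing my hopes for it to other people. It sounds simple, but it took a long time to truly grasp it and to be able to act on it step by step. Without changing jobs, breaking inertia by moving to a new city is easier for me than making these changes while staying in SF, including testing how stable my main thread is, becoming more action-oriented, and updating how I look at problems and socialize.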

Catching the tail end of summer in Central Park

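I still believe “know thyself” is the most important question. As Naval Ravikant said, “What you do, who you’re with, and where you live are the three most important decisions in life.” Many people don’t take the choice of “where to live” seriously enough, so I want to test how well different cities fit me while I don’t yet have a partner and the cost is relatively low.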

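Whenever work or something I’m hoping for stalls, I feel a mild anxiety and want to escape my current environment, believing that a new environment will solve the problem. This is something I’m still learning: to accept life’s ups and downs more patiently, to trust the natural flow of timing, and to trust my intuition more. Everyone grows differently, and I need the feedback from colliding with new environments and new variables to get to know myself.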

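In these two weeks in a new environment, I’ve realized: I have fairly high standards for how comfortable and how varied my environment is; being with friends and leaving enough time to rest and write still bring me a lot of ease and joy; I have more patience and can take things slowly without rushing; I’m more motivated to spend time on what I want to do; when I want to try something, I do it right away; and I only make weekly plans, not yearly ones. On the other hand, the things I need to keep practicing are: expressing my needs clearly, not cutting corners on independent thinking and organizing my thoughts, being satisfied with a simplified, usable version, setting boundaries more comfortably, and no longer avoiding conflict.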

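For me, moving to New York is a form of self-rescue, just like what starting this blog three years ago meant to me. Of course, I’m still not sure whether this is the right decision, but as long as I try, I’ll understand myself better, so either way it counts as progress. Even if I eventually decide to move back to California or leave New York, that will be part of what I gain from this journey.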

Hudson river park

Neighborhood bakery

Designing AI like a game

One of my favorite lenses in The Art of Game Design by Jesse Schell is this: game design is decision making. Every choice, from rules and pacing to risk, reward, and visual feel, is a deliberate decision shaping how someone experiences a world. Increasingly, this same mindset applies to AI designers and builders, who are making small and big decisions that define the nature of artificial experiences every day.

AI product design is shifting away from deterministic, linear interfaces toward something more dynamic and game-like. Think of the shift from scrolling through Instagram to co-creating stories with Sora: the experience becomes less about consuming information and more about interacting, improvising, and discovering. As AI tools begin to support play, curiosity, and connection, not just productivity, users gain greater agency, and designers must think more like game makers as well.

1. Designing relationships, beyond functionalities

How do people relate to games? What is it that makes them so compelling? “I like playing with my friends,” “I like the physical activity,” “I like feeling immersed in another world,” or “I like solving problems.” What makes designing a game different from designing a tool, like a trip planner? You could argue that a trip planner does feel like a game because travel planning is often full of aspiration and fun. But when you think about the games you’ve played, the experience is often deeply personal. Some games could be much more personally significant, memorable, and compelling for one player, yet mean little to another. That’s because gameplay enables imaginary experiences that are often unsharable and uniquely significant.

Unlike tool-based experience design, games offer something harder to define: a sense of freedom, responsibility, accomplishment, play, friendship, and emotional connection. These feelings aren’t outcomes, they’re relationships we form through interaction, feedback, and immersion.

As AI becomes more embedded in our lives, it begins to inherit the same tensions that game designers have long navigated: balancing user agency with automation, structure with freedom, guidance with exploration. A good game has the right amount of tension, challenge, and reward, whereas a bad game has too few or too many challenges. This resembles behavioral science's approach to nudge design: the right amount of friction creates meaningful engagement. Not everything should be "streamlined." Sometimes, intentional pauses, detours, and small resistances are what make an experience feel alive.

Xiangqi, or Chinese chess 象棋

2. From linear to non-linear narratives

One of the biggest shifts in designing for AI is moving from linear, deterministic flows to non-linear, probabilistic systems, where outcomes are more varied and randomized. With the same prompt or input, a generative AI tool might produce different outputs each time. Unlike traditional UX flows with clear cause and effect, where there is a fairly direct mapping between what designers create and what the reader or viewer experiences, games, and increasingly AI tools, require thinking in multi-dimensional interaction spaces.

Designers don’t just specify what happens; they define how things might happen, under what conditions, and with what probabilities. As in games, we give users greater control and more affordances over the pacing, sequence, and outcome of events. The craft becomes less about locking down exact UI and more about defining the criteria that guide a meaningful system in which the user co-creates the experience.
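To make "with what probabilities" concrete, here is a minimal sketch of a probabilistic system. The output labels and weights are made up for illustration and do not describe any real AI tool; the point is only that the designer shapes a distribution rather than a single fixed outcome.

```python
import random

# Hypothetical output styles and weights for one prompt (illustrative numbers only).
OUTPUT_WEIGHTS = {
    "playful story": 0.5,
    "serious story": 0.3,
    "surprising twist": 0.2,
}


def generate(prompt: str, rng: random.Random) -> str:
    """Sample one output style; the same prompt can yield different results."""
    options = list(OUTPUT_WEIGHTS)
    weights = list(OUTPUT_WEIGHTS.values())
    return rng.choices(options, weights=weights, k=1)[0]


# Ten runs of the same prompt; a fixed seed keeps this sketch reproducible.
rng = random.Random(7)
samples = [generate("a story about a robot", rng) for _ in range(10)]
print(samples)
```

Changing the weights is the design decision here: it shifts what users are likely to encounter without scripting any single run.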

Black Myth: Wukong 黑神话:悟空 (2024)

3. Capturing the essence of an experience

How do you recreate the fight with your sister for the last piece of watermelon? Maybe it’s the heat of the summer air, the fan running loud in the background, the rules of rock-paper-scissors negotiated on the fly, or the sudden dash as someone grabs the fruit and runs. Sound, visuals, pacing, conflict, and rules all work together to convey not only a memory but a felt experience.

The goal of a game designer is to figure out the essential elements that define an experience and make it special. Similarly, when we design AI products that help people learn, shop, create, or navigate, we’re designing for the feeling of accomplishment after taking a baby step in learning, the delight of self-expression and exploration, and the confidence of getting to places as planned. There are emotional layers beyond utility that define the essence of an experience, especially as experiences become more personal: what relaxes one person might overwhelm another, and what feels expressive to one might feel superficial to someone else.

4. Fun is about generating new questions

So how do we design play? Jesse Schell defines a game as a problem-solving activity approached with a playful attitude. The essence of play isn’t just action—it’s also curiosity. For example, when an assembly line worker tries to answer the question “Can I beat my record,” the reason for his activity is not just to earn money, but to indulge his curiosity about a personal question.

Activity feels more like “play” than “work” when it attempts to answer questions like “What happens when I turn this knob?” “Can we beat this team?” “What can I make with this clay?” “What happens when I finish this level?” When we design AI experiences, let’s ask: What questions does this experience raise for the user? What gets them to care? And what might spark even more questions?

The Legend of Sword and Fairy 4 仙剑奇侠传四(2007)

5. A tree is just a means to an end

We often talk about being human-centered, but as experiences become more personal (think the game that really bonded your relationship with a friend, the game that inspired you to see things differently), I love this framing from Jesse Schell:

“If a tree falls in the forest, and no one is there to hear it, does it make a sound?”

“Well, what is a sound? … If our definition of sound is the experience of hearing a sound, then the answer is no, the tree makes no sound when no one is there.”

“The tree is just a means to an end. And if no one is there to hear it, well, we don’t care at all.”

As designers, we don’t care about the tree and how it falls; we care about the experience of hearing it. The design itself (the buttons, flows, and algorithms) is just the container, not the end. How people relate to it, experience it, and remember it is the end that we truly care about. That’s where game design and AI converge: they both ask us to care less about the system itself and more about the human on the other end.

Design notes on Shanxi’s ancient architecture

The original writing is in Chinese, see the English translation here.

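It was only after the game Black Myth: Wukong became a hit last year that I grew curious about Shanxi’s ancient architecture, so during my July vacation I decided to visit Shanxi despite the scorching heat. The moment we landed in Shanxi from hot, humid Guangdong, I could feel a cool breeze; no wonder people love coming to Shanxi to escape the summer heat. Before leaving I knew very little about ancient architecture beyond hazy memories from my undergraduate architectural history class, so I caught up with documentaries along the way. I especially recommend 《千年一窟看云冈》 (on the Yungang Grottoes) and 《千年一塔看应县》 (on the Ying County pagoda), presented by Wang Nan, an architecture professor at Tsinghua University: they are vivid and fun, and he is very endearing.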

Mahavira Hall (大雄宝殿), Huayan Temple (Photo credit to Bonnie Luo)

Auspicious Goddess (吉祥天女), Shanhua Temple (Photo credit to Bonnie Luo)

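Traveling from Taiyuan to Xinzhou, then on to Mount Wutai, Dai County, and Datong, I felt I had glimpsed only a corner of Shanxi’s hundreds of temples. Even in unremarkable small towns, you casually pass imposing wooden pagodas or pavilions, with carefully considered plaques and roofs. Shanxi’s ancient architecture is best represented by Tang-dynasty timber structures; the temple complex on Mount Wutai alone has more than three hundred temples, in a style that leans simple and substantial, unlike the ornate splendor of the Ming and Qing periods. Here are the notes I took while visiting old temples and museums through the lens of modern design: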

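1. The philosophy of timber construction: Why did China favor wooden buildings while Greece favored stone? This was the first key question my architectural history class raised for understanding the difference between Chinese and Western architecture. Beyond differences in natural resources, Chinese architectural philosophy puts more weight on harmony with nature, pursuing “气韵生动” (spirit resonance and vitality) and a lively spatial layout. Wood feels alive and echoes the seasons and the passage of time. The Chinese tradition sees buildings as dynamic and values passing them on through renewal and repair rather than making them last forever. Greek architectural philosophy, by contrast, reveres the eternal beauty of perfect proportion and geometry (such as the “golden ratio” in sculpture), using stone as the medium to express rational order and immortality. Because timber buildings are hard to preserve for long, another interesting difference is that China’s surviving relics are actually concentrated underground. More emphasis went into preserving grand mausoleum complexes and burial goods (the Terracotta Army, ritual vessels, jade), in the hope that the dead would continue the standard of living and comforts they enjoyed in life, as a symbol of family power and status. Greece’s relics are mainly above ground, such as temples, sculptures, theaters, and public buildings, which use public art to express the relationship between humans and gods or heroes rather than an individual’s life after death. Tombs were therefore relatively plain and did not try to rebuild the world of the living.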

Northern Qi Mural Museum

Dance and music figurines, Shanxi Museum

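2. Wishes carried by the roof: In ancient times, Buddhist temples were vessels for people’s imagination of the divine, and the fun part is that their decoration varied with each school’s method of practice. Jingtu Temple in Ying County was built according to the practice of the Pure Land school (one of the schools of Chinese Buddhism), whose core belief is cultivation through reciting the Buddha’s name and visualizing the Western Pure Land. The temple’s designers therefore rendered their vision of the Western Pure Land on the main hall’s caisson ceiling (藻井, the decorative top of the ceiling), which is more ornate than in other temples, and designed layer upon layer of dougong brackets on the roof to symbolize “heavenly palaces and pavilions.” Most temples have plainer caissons and, depending on the school and its practice, may emphasize the modeling of arhats or the proportions of bodhisattvas instead. I used to find that temples all looked alike and felt lost looking at them. Seen from a modern perspective, a temple’s design logic centers on helping visitors complete their goal of worship, with both visitors and deities as users of the building. The temple’s decoration, circulation, and modes of interaction (burning incense, giving alms, vegetarian meals, evening chanting) are then all vehicles that help people complete the task of worship.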

Caisson ceiling, Jingtu Temple, Ying County (Photo credit to Bonnie Luo)

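3. Deity-centered design: At work we always strive for “human-centered” design, but walking into Jinci and the old Buddhist temples, it took some adjusting to see that much of the design is “deity-centered.” Take the placement of the opera stage at Jinci: at first I was puzzled by the arrangement, and only later learned that the stage was for performing to the gods, not to people. The door frames of many temple halls were also designed as “picture frames” through which the deities watch the passing crowds and the courtyard landscape. For visitors, of course, they also mark the boundary between the secular world and the Buddha’s realm, creating a calm atmosphere of entering the sacred layer by layer. Although these buildings were made for the gods, people can’t exactly interview deities to understand their needs, so the architecture mostly reflects people’s imagination of the gods’ good life. As with the “heavenly palaces and pavilions” of Jingtu Temple, maybe the gods don’t live in palaces or eat peaches of immortality at all (laughs).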

Door frame of the main hall, Shanhua Temple (Photo credit to Bonnie Luo)

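4. An era of many cultures: The biggest surprise of this trip was learning that, starting in the Northern Wei period, Han culture in Shanxi blended with the cultures of the Western Regions: India (Buddhism’s spread to China) and Persia (the Silk Road). The murals, ornamental patterns, and sculptural styles in the grottoes and temples all reflect the religious and cultural diversity of the time. Merchants from Central and West Asia and the Western Regions brought glass, silverware, and artistic motifs into the daily lives of residents of Pingcheng (today’s Datong), and influenced the fashions and the dance and music styles of the Xianbei and the Han. This echoes the wave of globalization that began in the early 20th century: the dense flow of people, technology, and culture between communities and countries. As a place where the Western Regions, the Hu peoples, and the Han came together, Pingcheng truly deserved the phrase “美美与共,天下大同” (appreciate one another’s beauty, and the world will be in great harmony).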

Northern Qi Mural Museum

Appendix

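5. How did Buddhism localize in China? It was only on this visit to the Shanxi Museum that I realized Buddhism first took hold in China during the Wei, Jin, and Northern and Southern Dynasties, the era when Datong in Shanxi (then called Pingcheng) served as a capital. The Northern and Southern Dynasties were a time of constant war, and people were willing to believe in the Buddha because the teachings and practice encouraged enduring the suffering of this life in exchange for entering the Pure Land in the next, so Buddhism actually flourished in wartime. Its localization also shows in sculpture and architecture: in the Yungang Grottoes, for instance, Buddha statues began to wear Han-style clothing rather than the traditional kasaya, so Han believers could relate more easily. Past or present, the way to build connection and trust is the same: even deities need to be down to earth.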

Murals inside the East Hall of Foguang Temple, Xinzhou: our arhat friends in springtime Morandi colors (Photo credit to Bonnie Luo)

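6. Arhats in Morandi colors: Foguang Temple in Xinzhou was my favorite temple of the trip. It sits at the foot of Mount Wutai, with a simple, ancient architectural style and a quiet, comfortable courtyard full of green. The arhats in the East Hall are vivid and playful, their expressions and clothing brimming with life; it’s hard to imagine that the craftsmen of the time already knew how to use spring-and-summer Morandi colors.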

Inner courtyard of Foguang Temple, Xinzhou (Photo credit to Bonnie Luo)

Back to the basics: when breath becomes air

Over the past few months, I’ve been rethinking what brings me joy and learning to be more in tune with how my body feels. Back to the basics. I don’t have all the answers yet, but I’ve gathered enough courage to take small steps forward without full clarity—trusting that clarity comes from action, not speculation.

2025 has been the year I’ve made the most progress in understanding myself, owning both the good and the bad. It’s freeing to feel more honest, accepting, and comfortable in my own skin. The reassuring news is confirming, yet again, that my anchors are so easily accessible and simple: reading to explore and writing to make sense of it in my own way. These have quietly guided how I make decisions and how I spend my time.

The Philosopher’s Walk in Toronto, my favorite footpath in college (June 2025)

This past weekend, I picked up When Breath Becomes Air by Paul Kalanithi because it somehow felt like the right timing and his writing was indeed deeply moving. Even though I’m not in the medical field, I felt strangely connected to Paul’s motivation for practicing neurosurgery and how he viewed his role as a medical practitioner. Like him, I also double majored in Literature and Life Science, and was struck by the way he brought a multidisciplinary lens to his work. His approach reminded me of what I strive for in my own role as a researcher: bringing more human-centered perspectives into how we apply emerging technologies to real-world problems.

Paul sees medicine and neuroscience as disciplines where biology, morality, literature, and philosophy intersect. It had never occurred to me that some of life’s biggest questions, about identity, death, and meaning, often arise most urgently in medical contexts. Paul originally considered studying biological philosophy, but chose to gain direct experience through practicing medicine instead. That tension between abstract critique and hands-on impact echoes my own journey, where I’ve also chosen to be closer to applied research and product development rather than staying solely in behavioral science papers and literary theories. I wanted more direct impact and to learn from the messiness of implementation.

“The highest ideal was not saving lives—everyone dies eventually—but guiding a patient or family to an understanding of death or illness.” That line stayed with me. It’s not about heroism, it’s about guiding someone to make sense of the hardest things: what kind of life is worth living, and what quality of life is acceptable after treatment. In a strange way, it reminded me of my own work. Not because it’s equally weighty, but because research, too, is about helping teams understand perspectives that are often ambiguous, subjective, and deeply human. It’s about making space for hope, fear, love, beauty, envy, striving—things that don’t show up in dashboards or metrics, but are central to the experience of technology.

I was also struck by Paul’s reflection on the early meaning of the word patient: “one who endures hardship without complaint.” It’s a gentle reminder to meet people where they are. To see people not as problems to be solved, but as whole beings to be understood.

Toronto’s summer in King’s College Circle (June 2025)

Hope you’re enjoying the summer green as much as I am! (As you can tell, all the images are green and blue.) See you in the next post.

Keeping Moore's Law alive

Semiconductors, better known as “chips,” might sound abstract if you don’t work in hardware. But they power nearly everything in our daily lives: phones, laptops, cars, and increasingly, the infrastructure behind AI. The real challenge today isn’t just about having enough data, it’s also about having the computing power to process it. Chips are the bottleneck, and producing them is staggeringly complex, capital-intensive, and geopolitically sensitive.

Like many people in tech, I’ve heard the word “semiconductors” thrown around all the time, but never really understood why they’re so central to everything. A few friends recommended Chip War by Chris Miller last year, and I finally finished it a few weeks ago and loved it. Here are four ideas that stuck with me:

1. Moore’s Law becomes an industry growth roadmap

Moore’s Law is often described as an observation: the number of transistors on a chip doubles roughly every two years, driving exponential increases in computing power.
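As a rough back-of-the-envelope sketch of what "doubles roughly every two years" implies, the snippet below assumes an idealized, perfectly regular doubling. Real chips never followed the curve this exactly, and the function name and parameters are my own illustration rather than anything from the book.

```python
def projected_transistors(initial_count: float, years: float,
                          doubling_period_years: float = 2.0) -> float:
    """Project a transistor count assuming one doubling per period."""
    return initial_count * 2 ** (years / doubling_period_years)


# Relative growth starting from a count of 1.
growth_after_10_years = projected_transistors(1, 10)  # five doublings
growth_after_20_years = projected_transistors(1, 20)  # ten doublings
print(growth_after_10_years, growth_after_20_years)
```

Five doublings give a 32x increase and ten give 1,024x, which is why a decade or two on this curve feels less like steady improvement and more like a different era.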

What I didn’t realize is that it became a self-fulfilling growth roadmap for the semiconductor industry—a shared goal that governments, investors, and chipmakers aligned around. Despite concerns about physical and economic limits, companies organized roadmaps around making this "law" true, treating it less like physics and more like a shared mission. It’s a powerful example of how a narrative turns a forecast into a coordination mechanism for a global industry.

Photo by SpaceX on Unsplash

2. Early demand rarely points to the final use case

When transistors were first invented, few people knew what to do with them. Beyond replacing bulky vacuum tubes, their potential seemed limited, much like how it’s cognitively hard for people to envision how AI could fundamentally change our lives today.

What changed everything was an unexpected early adopter: the U.S. military. Defense agencies and NASA needed compact, high-performing electronics for missiles and space exploration in the 1960s. That early niche demand gave semiconductors a launchpad to scale production. As costs dropped, chips moved into everyday consumer products: radios, calculators, and eventually, personal computers.

What struck me most was a surprising parallel with modern UX and product strategy: Fairchild Semiconductor didn’t just wait for demand to emerge. They actively imagined it, creating detailed blueprints of future consumer devices powered by chips before the market even existed. It was a way to reduce uncertainty and spark demand, much like today’s visionary product mockups or AI pitch decks that help people visualize what doesn’t exist yet.

Photo by NASA on Unsplash

3. Why did Intel fall behind in the AI race while Nvidia took the lead?

Intel led the personal computing revolution in the 1970s, driven by Bob Noyce’s bold bet on microprocessors and his belief in the future of personal computing, a vision few shared at the time. But in the AI and graphics era, Intel struggled to keep up, especially in advanced chip manufacturing and AI infrastructure, where Nvidia and TSMC moved faster and captured the momentum.

Despite early investments in foundational technologies like EUV lithography, which later made the most advanced chips possible, Intel was slow to pivot towards AI computing. Nvidia, on the other hand, recognized the opportunity early and bet aggressively on AI acceleration, developing CUDA and positioning its GPUs as the backbone of AI computing. What began as a graphics company transformed into a core infrastructure player for AI.

Beyond technical challenges and leadership strategy, company culture played a key role in this divergence. Intel’s structured, risk-averse environment prioritized predictability and incremental progress, a pattern consistent with the classic innovator’s dilemma, where incumbents hesitate to disrupt their own successful models even as new paradigms emerge. In contrast, Nvidia built a fast-moving, mission-driven culture with a flat hierarchy and tight feedback loops. Under Jensen Huang’s leadership, the company has been able to move quickly and shape the AI landscape. Building a timeless company isn’t about one single bold move; it’s about making the right bets at the right time, again and again.

Photo by TangChi Lee on Unsplash

4. How did Asia break into the high-value part of the supply chain?

When we think of semiconductors, we often picture Silicon Valley. But today, the center of gravity for advanced chip manufacturing lies in Asia.

Taiwan produces nearly 40% of the world’s new computing power each year. South Korea dominates memory chips, and Japan supplies critical materials like silicon wafers and specialty gases. Europe and the U.S. still lead in manufacturing equipment and chip design, like ASML’s EUV machines and ARM’s architectures, but the most complex and valuable manufacturing steps are concentrated in Asia.

This shift wasn’t accidental. Asian governments took a proactive, hands-on approach, shaped by a Confucian-influenced philosophy of state-guided development. They funneled capital and pushed banks to fund strategic sectors, hired US-trained engineers, kept their exchange rates undervalued, and secured tech transfer through partnerships. In Taiwan and South Korea, support from the U.S., motivated in part by geopolitical rivalry with Japan, also played a key role.

Today, power in chips isn’t just about who makes them, it’s also about who buys them. China, though behind in cutting-edge chips, controls massive demand for lower-end components. That market power gives it leverage, as it remains both the U.S.’s biggest customer and competitor. It’s a complex balance of dependency and rivalry, one shaped as much by market dynamics as it is by politics and culture.

It’s fascinating to learn how one of today’s most critical industries has been shaped not just by technology, but by the interplay of markets, culture, and geopolitics. As we explore emerging use cases for AI, the history of the chip industry offers a mirror: technological shifts are rarely just about engineering; they’re about timing, narrative, and the systems we build around them.

The river Siddhartha met

Years after first reading Hermann Hesse's Siddhartha in college, I revisited the novel and noted a few themes that stood out to me. This time, I resonated much more with Siddhartha’s realization that seeking wisdom from others has its limits, with the idea that life’s meaning lies in the act of living itself, and with the depth of insight offered by nature and fictional narratives.

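This week I reread Hermann Hesse’s Siddhartha, a novel I’ve loved since my student days. Back then I had experienced too little; I understood only the plot and half-grasped the metaphors behind it. Reading it at twenty-seven, I still have questions about some of Siddhartha’s views, but I resonate with and understand his experiences at different stages much more.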

1 The limits of the sages’ teachings

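As a young man, Siddhartha realized he had to go in search of the “self” (Atman), so he left home to practice with the samanas, hoping to escape the self through discipline and meditation. Before long, however, he found that learning meditation and self-denial was only a temporary escape from life’s pain and sense of meaninglessness, and could not bring real peace. The wisdom of sages is ultimately a summary of other people’s experience and cannot replace one’s own insight. Many people followed Gautama (Shakyamuni) and took him as their faith, but if believers have no teaching and law of their own within, they will ultimately find it hard to be truly saved. Only by walking your own path and honestly facing your inner longing can you find peace. It is hard to create your own game instead of playing one that others have set up, but perhaps that is the path closest to the right answer.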

Photo by 雨空 on Unsplash

2 Living is the meaning of life

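Our generation is often encouraged to search for the meaning of life. I still think pursuing meaning matters, but I’ve also begun to accept that life itself may have no fixed meaning, and that living and continuing to experience things is the meaning. Getting to know yourself better through collisions with different people and events, and being able to face and accept yourself honestly, is already a hard enough lesson. So when I read that Siddhartha turned from seeking wisdom from sages to “taking himself as his teacher and getting to know the mysterious Siddhartha,” it resonated deeply. To the earlier Siddhartha, the forest, the stars, the animals, and the river had no meaning, so he looked past all things. Once he stopped asking about meaning, he could see the world clearly, simply appreciate the beauty of nature, and observe more keenly. The “self” cannot be captured by thinking alone or by following others’ experience; you have to actually live, and listen to your inner voice and the signals your environment gives you. I used to overlook these signals too and was not very sensitive to them, but I increasingly agree that important decisions require waiting for a signal. Apart from waiting patiently and strengthening your fundamentals, there isn’t much you can do. This follows a similar logic to what 学霸猫 once said: “Apart from taking good care of yourself, there is nothing else a person needs to do.”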

3 Behind a playful attitude is detachment from real life

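Living with a playful attitude also means standing by as a detached observer, seeking only enjoyment and being unwilling to commit fully to life and work. For a long time I was in that same not-quite-serious, detached state: I didn’t care enough about many things and felt it was enough if they were fun. As a knowledge worker, even though I love theory and building frameworks to deconstruct how people and the world work, and take pleasure in it, over the past few months at work I’ve come to feel more and more that theories and concepts have limited value in themselves; only when these ideas are realized do they create real value. A relaxed attitude may look like a healthier mindset on the surface and does reduce anxiety, but it also reflects inner irresolution and an avoidance of responsibility. This is my homework for the second half of the year: learning to be more “present” and being willing to take on real risk and responsibility.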

4 Listening to what the river reveals

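After Siddhartha entered the world and became a merchant and a gambler, he fell into despair under the weight of a dull, meaningless life. He came to the river intending to end his life. Standing before it and hearing the vitality of the flowing water, he suddenly realized that his suffering came from his pursuit of material things and his indulgence in a comfortable life, which had cost him the ability to love and to see all things. He discovered the river’s secret: “It flows on without rest, yet it is always here. It is forever this river, yet it is new at every moment.” This brings two metaphors of the river to mind. First, a person’s growth is like the river’s constant flow and cycling: starting over again and again, learning something and then making mistakes again, going through disappointment and pain, and standing back up. Even when it’s muddy, you follow it wholeheartedly; flow and instability are the norm. Second, the thousands of voices Siddhartha hears in the water, “the voice of a king, the voice of a soldier, the voice of a bull, the voice of a nightingale, the voice of one giving birth, the voice of one sighing,” symbolize the diversity and complexity of life. To listen to the river is to listen to life. The water surges toward lakes, rapids, and the sea; it reaches its goal and rushes on toward a new one, which is also a metaphor for the life cycle of generation after generation.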

5 The freedom and breadth of fiction

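Finally, looking back at my reading preferences over the years, my view of fiction and nonfiction has changed a great deal. As a student I read a lot of literary fiction and even chose literature as one of my undergraduate majors. But starting in my senior year, for work and professional reasons, I began reading more applied nonfiction, including a lot of behavioral economics and social science. It wasn’t until the past few months, as work entered a new phase, that I happened to rediscover my interest in fiction and realized that it has more freedom to explore metaphors and questions that reality can’t reach, such as the metaphor of the river in Siddhartha and the library and the garden in Borges’s short stories. They let us step outside the frame of reality and give us room to think about broader questions.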

P.S. Check out this song 河流(River)

The world as reinforcing cycles

Ray Dalio’s Principles for Dealing with the Changing World Order: Why Nations Succeed and Fail presents a comprehensive, longitudinal approach for understanding the world, one that our recency bias sometimes forgets. For those who have primarily experienced periods of growth or focused on the post-WWII era, including myself, it can be difficult to envision a world radically different.

To understand and navigate the complexities of our time, it’s important to explore a broad range of historical examples of how nations rise and fall, which would help uncover the fundamental, timeless patterns that shape these cycles. Dalio’s method of analyzing the intricate forces at play and synthesizing the cause-and-effect relationships behind historical progression is a powerful model. Personally, it has inspired me to rethink how we might study the complexity of user and market behaviors, especially how we could distill principles and patterns to better understand and guide the seemingly complex behaviors of AI/ML models as UX and product builders.

Dalio, Ray. Principles for Dealing with the Changing World Order

01 A 1400-year perspective

Looking back 1,400 years (~600 CE), human productivity has steadily increased global wealth and living standards. While different societies rose at different times, the reasons were consistent — education, inventiveness, work ethic, and economic systems turned ideas into output. For example, wealth once centered on agricultural land, then on machine output, and now on digital data and information processing.

Dalio, Ray. Principles for Dealing with the Changing World Order

Personal notes

  • Time scale: Dalio’s 1,400-year perspective is a powerful reminder that our current experiences are just a tiny part of a much larger cycle. Understanding our position within these cycles is crucial for discerning what truly matters amid the noise.

  • Diverse and global perspective: Drawing insights from a diverse, large sample size across space and time is essential. Too often, we only focus on a single country, missing valuable lessons that a global perspective can surface.

  • Cause-and-effect relationships: As we shift towards building probabilistic experiences with ML and AI, our role as designers and product builders increasingly involves defining and communicating the underlying cause-and-effect relationships that guide model behavior. Seeing how Dalio studied and presented the cause-and-effect patterns that drive historical progression is an inspiration for effective communication of complex insights.

Guiding question

  • How might we build collective intuition for long-term thinking?

02 Reinforcing nature of rises and declines

Productivity evolves steadily but doesn’t cause sudden shifts in wealth and power. These shifts come from cycles driven by logical cause-and-effect relationships, such as booms, busts, revolutions, and wars.

My biggest takeaway is the reminder that strengths and weaknesses are mutually reinforcing. For example, education, competitiveness, economic output, and share of world trade each contribute to the others being strong or weak, for logical reasons. This also reflects the old Chinese saying, “That which is long divided must unify; that which is long unified must divide.” (分久必合,合久必分)

Dalio identifies eight key determinants of a nation’s strength: education, competitiveness, innovation and technology, economic output, share of world trade, military strength, financial center strength, and reserve currency status. These determinants reinforce each other, driving a nation’s rise, peak, and decline:

  • Rise: Strong leadership, inventiveness, and education foster a strong culture and efficient resource allocation, leading to economic growth, strong markets, and financial centers.

  • Peak: The nation enjoys prosperity with low debt and minimal gaps in wealth, values, and politics, under a stable world order. However, within capitalist systems, uneven financial gains widen the wealth gap.

  • Decline: Excessive borrowing and financial bubbles weaken the nation as debt rises and wealth, values, and political divides grow. Emerging rivals challenge the nation, leading to a painful restructuring.

Dalio, Ray. Principles for Dealing with the Changing World Order

Personal notes

  • Reinforcing dynamics: Although I’ve long heard the saying that our weaknesses tend to hide behind our strengths, it wasn’t until reading this book that I truly saw how powerful this is at the scale of history: in aggregate, human strengths and weaknesses reinforce each other in a cyclical pattern.

  • Mirroring business lifecycles: The rise and fall of nations closely mirrors the lifecycle of a business—from growth to maturity to decline. Similarly, a strong founding team that allocates resources efficiently is more likely to achieve product-market fit, driving rapid growth. However, at its peak, a business may develop inefficiencies that undermine its strengths. Its ability to remain a market leader depends on managing these growth factors effectively.

Guiding question

  • What are the key factors that drive and hinder a company’s growth, and how can we accurately assess them to ensure long-term investment in the right areas?

03 Measuring real value

Dalio’s articulation of how the debt cycle works is the best I’ve seen, so it’s worth going into more detail in this section. In a capitalist system, money, credit, and economic growth are the biggest influences on how wealth and power rise and decline. The gap between real and market value varies at different points in the cycle, and a typical long debt cycle goes as follows:

  • Early stages: With little or no debt, hard money like gold is used for transactions because no trust/credit is required. Later, to avoid the risks and inconvenience of carrying metal money, credible parties issue paper claims on hard money, which soon function as money itself.

  • Middle stages: Initially, the number of paper claims matches the hard money in reserve. Over time, the appeal of credit and debt grows, leading to trouble when income can’t cover debts, or when claims on money outpace the growth of actual assets or goods to back them up, making debt repayment impossible.

  • Late stages: In a debt crisis, printing money becomes the quickest way to reduce debt, allowing the credit/debt cycle to restart. This approach, though not well understood, seems beneficial because it alleviates debt, obscures the harm to holders of money and debt assets, and inflates asset values in a depreciating currency, giving the illusion of increased wealth.

Dalio, Ray. Principles for Dealing with the Changing World Order

Personal notes

  • Measuring real value: Evaluating real value is crucial in early-stage research, which means uncovering the honest, unfiltered opinions and behaviors of users. Observing where real and market value diverge or align helps guide investment in products and infrastructure, especially in overhyped areas like generative AI and agents.

  • Aligning with ground truth: As builders of probabilistic ML models, how do we evaluate if our predictions actually align with ground truth to guide model training and iteration? This is a fascinating area to be further explored to keep our products truly user-centered.

Guiding question

  • How might we create a clear feedback loop for model iteration and collaboratively define principles for ML model behavior, involving UX, product, engineering, and data teams?

Being able to combine multi-disciplinary thinking to expand our perspective has been my ongoing passion, and Dalio sets a great example for the kind of in-depth, longitudinal studies needed to unpack the complexity of our world, uncovering clear cause-and-effect relationships that are easy to understand and learn from.

Ending on a personal note: sharing a photo of the Chapel of Souls in Porto with y'all since I've been traveling in Portugal lately. Humans are so small in front of this ;)

Paris: French Kaleidoscope

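In May I vacationed in France with my family, revisiting Paris and the south of France. My last visit was six years ago. After a long stretch in California, my perspective and taste can sometimes become narrow. Paris is a city that deeply respects beauty and the humanities, and I love the free, elegant, relaxed spirit of Parisians.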

01 The design character of Paris

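Paris will host the Olympics this July, and watching the city prepare felt refreshing. The fun part is that the Olympic venue actually painted the traditional brick-red running track a pale purple (“shades of lavender”). This seemingly small, convention-breaking design is creative without adding much cost, and it perfectly shows off French design skill. There is also a special group of sculptures at the Assemblée Nationale that reimagines the armless Venus de Milo as Greek deities taking part in various sports: playing tennis, shooting arrows, even surfing. Unlike the ivory white of traditional Greek statues, these use bright, highly saturated Olympic colors, which makes them both striking in contrast and full of energy. These two are the most creative Olympic designs I’ve seen so far, and my favorites.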

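The Jardin des Tuileries is still my personal favorite park. I first started paying attention to its design because of photographer Guillaume Lavrut’s fountain photo series. Before the trip I happened to be reading the chapter on park design in Jane Jacobs’s The Death and Life of Great American Cities, and walking through the garden this time I could clearly feel its people-centered design. The center of the garden is not the scenery but the tree-lined promenade for strolling. The fountain at the center of the garden is elegant and simple in design, drawing attention to the green reclining chairs around it. Every day, large numbers of tourists and locals sit around the fountain to chat, eat baguettes, daydream, go on dates, or get some air after visiting the Louvre, and these lively people around the fountain are the real stars of the park. All kinds of people gather around the fountain, naturally forming a center stage where those seated are both audience and performers. As Jacobs notes in the book, a good park needs a sufficiently rich mix of commercial and residential areas around it, so that people with all kinds of needs come and go at different times of day, and it also needs a center stage that naturally draws people together; only then does the park feel lively and vibrant. I think this is the design philosophy of the Tuileries as well.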

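Another memorable example is the design of Paris Charles de Gaulle Airport: the tables and chairs in the terminal pair velvet upholstery with vintage brass handles, and the lamps are shaped like fireworks about to burst. The overall design blends modern simplicity with classical opulence, and it must be one of the most beautiful airports in Europe.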

Jardin des Tuileries

Assemblée Nationale

Jardin des Tuileries

Charles de Gaulle airport

02 The French Open: Disneyland for tennis fans

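The French Open experience far exceeded my expectations; it felt like Disneyland for tennis fans. The grounds have three main stadiums where star players compete. Inside the main stadiums there are broadcast booths, with journalists reporting on the matches live. Outside there are several outdoor practice courts where spectators can watch players up close, and spectators can freely enter and leave the court after each point. Walking through the grounds, I truly felt for the first time the warm atmosphere of tennis fans from around the world coming together for one sport.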

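The day-session ticket I bought was for a third-round match. On Court Philippe-Chatrier, one of the main stadiums, I watched the men’s singles match between Sinner of Italy and Kotov of Russia. Watching them give everything to play at their best level and calmly respond to every attack, while the crowd followed every movement on court with full attention, was moving. That intense, focused atmosphere is a unique experience that is hard to replicate through a screen.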

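Inscribed in the stadium is the line “Victory belongs to the most tenacious.” It struck me deeply at the time. It’s not just encouragement and a reminder for the players: doing anything well takes focus, stamina, and tenacity of will. Watching Sinner and Kotov, I could clearly see each player’s strengths and weaknesses from the stands, and I kept thinking that a person’s strengths really do hide their weaknesses. Sinner, for example, is good at hitting balls close to the lines, which are hard to return but also prone to going out; Kotov likes to hit balls that just clear the net, which are also hard to return, but he often loses points when the ball is hit too softly to get over. Watching them play, I realized that the state and style we show in sport are also a metaphor for how we do things in life and understand our own strengths and weaknesses. I later learned that by the semifinals Sinner was actually the player with the most ranking points in men’s singles. The match had looked evenly matched only because he needed to play just well enough to beat his opponent. Sure enough, victory belongs to whoever can stay on the court the longest.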

Stade Roland-Garros

Court Philippe-Chatrier

Stade Roland-Garros

03 Human masterpieces in the Louvre’s collection

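Revisiting the Louvre, I was once again struck by the high caliber of its collection. I’m especially fond of group portraits, and my favorite is Jacques-Louis David’s The Coronation of Napoleon (1807). Standing before a work of this scale, you feel your own smallness, which in turn sets off the grandeur of the world it depicts. The crowd in the painting watches Napoleon crown the empress, each with a different expression and their own private thoughts. This is also why I like group portraits: everyone lives through certain historical events, and even if they seem to be insignificant bystanders, the diversity of individual perspectives is meaningful in itself.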

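After visiting Italy, my appreciation for sculpture improved a lot. This time at the Louvre I especially loved the ancient Greek statue Winged Victory (Victoire de Samothrace, 190 BCE). It expresses the idea of victory to the fullest: although the statue has lost both arms, the wings behind it and its forward-leaning pose convey an unforgettable sweep of power, elegant and light yet strong.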

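What art collections and competitive sports share is people’s love of exploring human potential. In the vast Louvre, the museum map marks only a dozen or so “must-see” works. These pieces, which have survived more than a thousand years of history, are like champions who have prevailed in competition: only works of the highest caliber still resonate with universal values today. That is also why enduring works revolve around humanity’s most essential longings, such as the pursuit of truth, goodness, beauty, power, love, and war. Each era’s art movements have their own media and modes of expression, but the underlying human longings and emotions they express do not change.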

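This also matches a simple, practical standard for judging art: when you stand before a work and feel that something in it moves you, then it has value. Personal preferences subtly add up to the values of contemporary society, and the taste of each era selects the masterpieces on the Louvre’s map that have stood the test of time.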

Victoire de Samothrace

Vénus de Milo

04 A mirror for experiencing the human world

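The best immersive experience was the Bourse de Commerce - Pinault Collection, a private museum that mainly exhibits the contemporary and emerging art François Pinault has collected over more than fifty years. The building was formerly the Paris commodities exchange and was later renovated in a contemporary architectural style by the Japanese architect Tadao Ando. Inside is a central exhibition space that gives visitors more varied routes and viewing angles.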

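My favorite this time was the mirror installation by the Korean artist Kimsooja in the new exhibition Le Monde Comme Il Va (The World As It Goes). The exhibition takes its name from a philosophical tale by Voltaire, which tells of an angel who sends an envoy to the human world to observe human behavior. Faced with the contradictions and uncertainty of human society, the gods are unsure whether humans deserve to go on living or should be destroyed to make way for a better civilization. In the end, the angel decides to let the world go on as it is, trusting that humans can take charge of their own fate.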

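In response to this story, Kimsooja laid an enormous mirror across the floor of the rotunda, and visitors can put on shoe covers and walk on it. A mirror seems like a transparent medium, yet it honestly reflects the surroundings, the many different people in the space, and the blue sky coming through the museum’s glass dome. Standing on the mirror and seeing the world upside down, your perception of the environment becomes more intense. I love the metaphor of the mirror: by continually going through things, meeting different people, and colliding with different environments, we use them as mirrors to reflect our own state. This is especially true in close relationships, where our perception of others often just reflects what is in our own minds. The mirror also reminds us to pay attention to real people and concrete situations, rather than seeing only the side we want to see. This is also my life lesson for the year: to truly know and improve myself by accepting and engaging with real environments and other people.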

Bourse de Commerce - Pinault Collection

Bonus: Château La Coste

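About a 20-minute drive north of Aix-en-Provence, in Provence, is the art-focused winery Château La Coste. Its founder is a man from Northern Ireland who loves wine and art, and the estate’s concept is similar to Donum Estate in California, combining a winery with an art center. It has many buildings designed by Tadao Ando, a walkway designed by Ai Weiwei, and Louise Bourgeois’s large spider sculpture in its collection. We arrived when it was almost sunset. We had only discovered it by chance on the map just before leaving, so it felt like fate. I hope to stay a few more days next time.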

Crouching Spider, by Louise Bourgeois (2003)

Drop, by Tom Shannon (2009)

Art Centre, by Tadao Ando (2011)

Mater Earth, by Prune Nourry

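During two weeks of unwinding in France, my body was traveling but my mind was still digesting thoughts from the first half of the year at work. In a new environment I was still sensitive to the design logic of things and people's roles within them, but I also realized that what I see is really just what I care about. My eyes always land on human-centered design, convention-breaking creativity, how people operate, and how shareable a work is. I also notice the abilities I have long wished for, such as tenacity in the arena and the openness to accept the real world. Seeing that people in different cultures are all moving forward in confusion, I felt less alone in my struggle. Perhaps, as the exhibition Le Monde Comme Il Va at the Pinault museum suggests, what matters is to stay aware of the real world, know yourself, and trust that human agency will keep carrying us forward.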

What makes a great park?

Why are some parks so lively and popular, while some are so lonely and even unsafe? When we think about how to improve our neighborhood, many would say we need more parks and open space. Parks have been perceived as a cure that can uplift a neighborhood, stabilize real estate value, and bring the community together — but that is a false reassurance because park behaviors are actually pretty volatile and extreme.

In the book The Death and Life of Great American Cities, one of the most influential books in the history of American city planning, Jane Jacobs talks about the use of neighborhood parks and the drivers that are critical in the making of a vibrant, well-loved park. She advocates for community-based planning and the importance of preserving diverse, mixed-use neighborhoods.

At the time, this book was written as a critique of the top-down city planning approach advocated by Robert Moses, the most powerful urban planner in NYC in the mid-20th century, who believed in large-scale urban renewals and modernizing the city at the expense of disrupting existing neighborhoods. Just like human behaviors, parks also have distinct “park behaviors” and layers of complexity, which is a mix of design, urban planning, and psychology.

01 Mixture of Primary Use

The top driver of a park’s success is a mixture of primary uses surrounding it. When you think of a lively park, what matters most is having enough people who enter and leave the park at different times. That requires a mix of primary uses around the park, including residential, office, and small business. That’s why parks in a financial district tend to be less lively: everyone operates on the same daily schedule, so people enter the park all at once, then leave after work hours, and the park sits empty for most of the day and evening. When an area has a single, dominant use, it imposes a limited schedule, which leads to the vicious circle of an unpopular park.

Photo by Lison Zhao on Unsplash

02 Diversity of Park Design

Besides diversity in schedule and use, Jane Jacobs identified 4 design elements that bring diversity to a lively park at different levels. The first is intricacy: at eye level, a vibrant park offers enough stimulation for different uses and moods. When the park is too small, or its design is so flat that you can see the whole park at a glance, there isn’t enough stimulation at eye level to keep people around. Changes in ground level or a variety of focal points introduce subtle differences at eye level that keep people curious to explore.

San Francisco is a great example of this principle: the ups and downs of its hills and valleys introduce intricacy at eye level. Another is the classical gardens of Suzhou, where the landscaping, rocks, hills, and water are strategically placed to offer subtle visual stimulation from every angle.

The other related elements are centering and enclosure. Just like a good story with a climax in the hero’s journey, a park needs a center: we can think of it as a stage where everyone is both spectator and performer at once. Finally, sun matters too. Without it, the park feels gloomy and sad, which attracts fewer people.

03 Differentiation & Demand Good

The third driver of a lively park is thinking about how it differentiates. There are many parks in a city and sometimes they have similar purposes to each other. Just like building a product, we need to think about what specific, differentiated value a park provides, because there are only so many people in a city and parks essentially are fighting for attention and limited time, just like TikTok and Instagram. 

Jane Jacobs encouraged us to think about the “demand good” for a specific park. For example, a nice landscape by itself isn’t a demand good, but sports fields, swimming pools, or activities like carnivals are. We can also figure out the demand good by observing a park’s natural use. This again resembles product development, where we identify product-market fit by observing how real needs are met by our offerings. This is the beauty of multi-disciplinary learning: we see similar patterns in how things operate across seemingly different fields.

Ultimately, the making of a beloved park isn’t just about the design of the park itself. It’s about nurturing diversified neighborhoods capable of using and supporting parks. For a design to succeed, designing the ecosystem around it and considering its overall context are critical to its long-term popularity.


There is a video version of this post on YouTube if you prefer a visual walkthrough.

HCI paper review: alignment in the design of interactive AI

In 2024, I set the goal to learn more about designing for human-centered AI and would love to share my learnings from reading academic papers in the field as part of the journey. The hope is to make knowledge in design, behavioral science, and human-computer interaction friendly and accessible for everyone.

In this blog post, I’ll share my review notes for the paper AI Alignment in the Design of Interactive AI: Specification Alignment, Process Alignment, and Evaluation Support by Michael Terry, Chinmay Kulkarni, Martin Wattenberg, Lucas Dixon, and Meredith Ringel Morris. 2023. arXiv:2311.00710.

User Interface Shifts in Computing History

For those who are unfamiliar with user experience (UX) or human-computer interaction (HCI), here is a high-level overview of how user interfaces (UI) have evolved in the past 60 years:

  1. Batch processing: The first general-purpose computer was introduced around 1945. The UI was a single point of contact where people needed to submit a batch of instructions (often a deck of punched cards) to a data center, then they would pick up the output of their batch the next day. It was common to need multiple days to fine-tune the batch to produce the desired outcome.

  2. Command-based interaction: Around 1964, the advent of time-sharing (multiple users sharing a computer’s resources for tasks) led to command-based interaction, where users and computers can take turns, one command at a time. In particular, graphical user interfaces (GUI), using visual elements that convey information and actions a user can take, have become the dominant UX since the launch of the Mac in 1984. A strength of GUI is that it shows the status after each command if designed well. Users don’t need to have a fully specified goal initially because they can reassess the situation and modify their goal/approach as they progress.

  3. Intent-based goal specification: With the third UI shift, represented by current generative AI (e.g. ChatGPT, Gemini), the user tells the computer what outcome they want, but does not specify how it should be accomplished. Today, users primarily interact with the system by issuing rounds of prompts to gradually refine the outcome. This form of interaction is currently poorly supported, which leaves rich opportunities for usability improvements and innovation.

From batch processing to command-based interaction, the speed of fine-tuning toward the desired outcome improved drastically. However, with the third shift to human-AI interaction, the lack of transparency in how the AI performs a task, especially in increasingly complex and high-consideration scenarios, presents new UX challenges for the HCI community today.

Interaction cycle for human-AI systems

The ultimate goal of human-AI interaction is to efficiently achieve a desirable goal for the user. Today, this process involves 3 basic steps: user input, system processing, and system output.

Different from the traditional command-based interaction, where a user monitors and gives commands at every step in the process, with an AI system, the user’s skills shift to focus on (1) being clear and effective at articulating the goal and providing input, and (2) once the output is available, being able to assess if their goal has been achieved.

As an analogy, the human’s role switches from being the executor (taking main control to execute) to being the manager (telling another person to execute for you). It requires a different set of skills and mindset, just as when an individual contributor (IC) switches to a manager role. For a team to be effective, the manager can’t micromanage every step, because doing so decreases overall productivity. In this case, what are the key touch points where humans (the manager) need to intentionally “align” with the AI system (the executor) to ensure the interaction is effective?

Overview of the paper

To ensure an AI produces desired outcomes, without undesirable side effects (also termed “AI Alignment”), Terry et al. introduce 3 dimensions to consider as we address user interface challenges with AI systems: Specification alignment, Process alignment, and Evaluation support.

  • Specification alignment is the first step in human-AI interaction, where the user defines the desired outcome for the AI system to execute. In addition, the paper points out the importance of specifying constraints (e.g. safe, cost-effective, aligned with human values). As an extreme example, consider the paperclip thought experiment, where an AI is tasked to produce as many paperclips as possible. The AI may eventually start destroying computers, refrigerators, or anything made of metal to make more paperclips, which is not aligned with how humans would achieve the goal.

  • Process alignment refers to giving users the ability to view and/or control the AI’s underlying execution process. The paper proposes mechanisms that (1) let the user understand, in human-comprehensible terms, how the system executes the task (“means alignment”), and (2) give users the ability to modify these choices (“control alignment”).

  • Evaluation support is the final step, where users validate that the AI’s output meets their goals. As AI becomes increasingly capable of difficult and complex tasks, a significant challenge is evaluating its outputs. The problem of evaluation can be further divided into two problems: (1) verification, or checking that the AI’s output correctly and completely fulfills the user’s intent, and (2) comprehension, or understanding the AI’s output, with comprehension being a much more important problem to solve.
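To make the three dimensions concrete for myself, here is a minimal Python sketch of one interaction round. It is my own illustration, not code from the paper: `Spec`, `run_interaction`, and the toy functions are hypothetical names, and a real system would replace the toy callables with an actual model and UI.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Spec:
    """Specification alignment: the desired outcome plus explicit constraints."""
    goal: str
    constraints: List[str] = field(default_factory=list)


def run_interaction(
    spec: Spec,
    plan: Callable[[Spec], List[str]],
    review_plan: Callable[[List[str]], List[str]],
    execute: Callable[[List[str]], str],
    evaluate: Callable[[Spec, str], bool],
) -> dict:
    """Run one round: specify, expose the plan for review, execute, evaluate."""
    steps = plan(spec)            # the system proposes human-readable steps
    steps = review_plan(steps)    # process alignment: the user views and edits them
    output = execute(steps)
    accepted = evaluate(spec, output)  # evaluation support: does it meet the goal?
    return {"steps": steps, "output": output, "accepted": accepted}


# Toy stand-ins (assumptions for illustration only).
def toy_plan(spec: Spec) -> List[str]:
    return [f"Draft {spec.goal}", "Add stock photos"]


def toy_review(steps: List[str]) -> List[str]:
    # The user removes the step that conflicts with their constraint.
    return [s for s in steps if "stock photos" not in s]


def toy_execute(steps: List[str]) -> str:
    return " -> ".join(steps)


def toy_evaluate(spec: Spec, output: str) -> bool:
    return spec.goal in output


result = run_interaction(
    Spec("a one-page trip summary", ["no stock photos"]),
    toy_plan, toy_review, toy_execute, toy_evaluate,
)
print(result)
```

The point of the sketch is only the shape of the loop: the user's effort sits at the start (the spec), at one checkpoint in the middle (reviewing the plan), and at the end (evaluating the output), rather than at every step.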

Personal notes

1\ Cognitive challenges with defining outcome. Counterintuitively, defining the outcome can be tricky because humans are not good at knowing or describing what they want initially, especially for complex and high-consideration tasks. Given human cognitive limitations, it’s important to account for the process through which users learn, then gradually understand and become able to describe their goal. This resembles a classic decision-making challenge when people shop. Although you know the goal is to buy a vacuum, you still need to go through the lengthy process of reading articles to learn about the major categories and functionalities, and talking to friends and family, before you know what you truly need and want. As with shopping research, the learning process is where we gradually build confidence in our judgment. Open question: how might we help users learn while maintaining efficiency in the process? One idea could be dynamic, personalized support that offers more or less explanation as users specify the requirement.

2\ Verifying interpretation upfront. One way to improve specification alignment for general-purpose AI is to let users verify, and correct where necessary, the AI’s interpretation of the intended outcome before it proceeds. I love this direction because it resembles how real-life human collaboration works. Think about the manager and IC example: to ensure your project goal is aligned with what your manager has in mind (which could sometimes be under-specified or ambiguous), paraphrasing the requirement and sharing your plan of action beforehand helps confirm that you and your manager are on the same page. It will be interesting for future research to explore (a) how best practices from real-life human collaboration and communication can be applied to human-AI interaction, and (b) the right balance between efficiency and verification effort.

3\ Bridging the Process Gulf with a Surrogate Process. The paper introduces the concept of Process Gulf, as an extension of Norman’s concepts of the Gulfs of Execution and Evaluation, that highlights the gulf that can arise between a person and an AI due to the qualitatively different ways in which each produces an outcome. For example, a diffusion model for image generation transforms an image of statistical noise into a coherent image, an image creation process unfamiliar to most people. To bridge the Process Gulf, the paper proposes creating a simplified, separately derived, but controllable representation of the AI’s actual process, also termed a Surrogate Process. With a more accessible representation of the set of choices the AI needs to make in the process, the user can better intervene and guide the execution. Open question: since AI systems can be understood at many levels of abstraction, what’s the right level of explainability so that humans can easily understand and control how AI solves a problem?

4\ In-context evaluation and learning. Today, an AI tasked to recommend clothes you like would simply show you visuals of the clothes for at-a-glance evaluation. However, when the task becomes complicated, like creating code for an app, the AI system may provide comments, a natural language summary, or an architectural diagram of the code produced to help you evaluate. Future research: explore ways to provide simple, dynamic, and accessible explanations (e.g. visuals, links to learn more) of the outcome produced, which would be useful for in-context evaluation and learning. As the paper alludes to, this also helps with understanding the state of the problem after the AI performs some work.

5\ Control mechanisms inspired by real-life tools. The importance of control mechanisms has been discussed extensively in the HCI community, and I especially love the principles outlined in the People + AI Guidebook. When thinking about appropriate levels of control, the common mechanism is providing parameters for a user to play with. For example, in Midjourney (a text-to-image model), users can adjust the “chaos” parameter to produce variations of the image. However, no support is currently provided to understand how a particular value will impact the generated images. Relatedly, as an interesting research exploration, PromptPaint lets users influence text-to-image generation through paint medium-like interactions, using the paint palette metaphor to provide more control. As a result, it helps users specify their goals at greater granularity and gives them the ability to modify the choices involved as the AI is producing the image. Future research: based on the specific task, what other real-life metaphors can be referenced as inspiration for control mechanisms (like the paint palette for image generation)?

In PromptPaint, the user can specify the area of generation by brushing (dark grey) with a prompt stencil. When the user completes brushing, the tool starts generating a part of the image while showing the process to the user (Chung and Adar, 2023).

6\ Interactive alignment for multiple users. The paper primarily discusses the user interface challenges and opportunities of a single user interacting with a single AI. As the paper alludes to, it would be useful to consider alignment for interactions that include multiple parties, which introduces additional dimensions and complexity. For example, an AI might be engaged in a music creation task involving two people. Future research: how would the alignment goals, processes, and dimensions evolve for a wider range of collaboration scenarios?

Thanks for reading

This post covers a broad set of themes in the AI alignment problem space. In upcoming HCI paper reviews, I’d love to explore specific use cases and verticals in the field. If you have any thoughts or suggestions, please leave a comment or get in touch!

Thanks to Bonnie Luo and Benjamin Yu for helpful discussions and feedback.

References

Jakob Nielsen. 2023. AI: First New UI Paradigm in 60 Years. https://www.nngroup.com/articles/ai-paradigm. Accessed: 2024-03-01.

John Joon Young Chung and Eytan Adar. 2023. PromptPaint: Steering Text-to-Image Generation Through Paint Medium-like Interactions. In Proceedings of UIST 2023. Association for Computing Machinery, New York, NY, USA, 17 pages. https://doi.org/10.1145/3586183.3606777

Michael Terry, Chinmay Kulkarni, Martin Wattenberg, Lucas Dixon, and Meredith Ringel Morris. 2023. AI Alignment in the Design of Interactive AI: Specification Alignment, Process Alignment, and Evaluation Support. arXiv 2023, arXiv:2311.00710.

Negotiation theory and human agency

In the past, I mostly associated negotiation with business or politics: intense debates, with each party advocating for its own interests. However, I began to appreciate and explore the concept more intentionally last year, when I was exposed to a more diverse set of collaboration scenarios. I realized that negotiation is everywhere, and that understanding its history, philosophy, and practice is important for thinking about how humans interact in a world of complexity. With a background in behavioral science, human-computer interaction, and design research, I began to see deeper connections between negotiation and each of these fields.

Okochi Sanso Garden, Kyoto 京都大河内山庄

Evolution of negotiation theory

Two notable milestones in negotiation literature are Getting to Yes (1981) by Fisher and Ury, and Never Split the Difference (2016) by Chris Voss. The former focuses on identifying interests and creating value for both parties, while the latter recognizes the emotional nature of negotiation and emphasizes the importance of building tactical empathy to gather information and influence the other party's thinking.

The shift from objectively identifying a win-win solution to challenging the idea of seeking a compromise is fascinating and counterintuitive at first. As the title Never Split the Difference suggests, Voss believes it’s better to not make a deal if compromise is involved. Instead, drawing from his experience as a former FBI hostage negotiator, he focused on uncovering Black Swans, which are hidden pieces of information that can change the course of a negotiation and push the other party towards a deal. This became his primary strategy for finding unconventional solutions.

This evolution in negotiation philosophy is an interesting parallel with the shift from classical economics to behavioral economics — both evolved to recognize the limitations of purely rational and utility-maximizing models. Similar to Never Split the Difference, behavioral economics shifts the focus from simplified, rational economic models to a more nuanced understanding of human behavior, which is shaped by emotions, biases, and heuristics.

Konchi-in Temple, Kyoto 京都金地院

Human agency at heart

People want to be heard, understood, and respected. In Never Split the Difference, building tactical empathy in negotiation means ensuring sufficient trust and safety for a real conversation to begin. Since change represents uncertainty and people want to be in control, saying no to a proposal is the easiest way to maintain that control and the status quo. This completely changed my perspective on the nature of negotiation: it’s ultimately about addressing fundamental human needs with psychological principles. It’s not just about fighting for individual interests. It’s much more about building connections, helping each other feel in control, and identifying creative solutions together.

Another memorable idea is that “Yes” has multiple layers (i.e. counterfeit, confirmation, and commitment), while “No” is the gateway to “Yes.” Saying “No” allows us time to pivot and adjust. It creates an environment for the one “Yes” that matters and gives us an opportunity to convince others that the proposed change is more advantageous than maintaining the status quo. Negotiation, then, is the process of helping the other party feel protected and safe, so they can consider other possibilities with a relaxed mindset.

This also resembles the dynamics of how humans interact with technology, especially with AI systems. When systems (e.g. algorithms) collect human input (e.g. data) without making people feel heard, respected, or in control, it becomes difficult to establish a genuine conversation (e.g. engagement). An effective feedback and control mechanism needs to account for human motivation and provide a clear incentive structure, so that the value and impact of the input are meaningful. When considering human-computer interaction through the lens of human-machine negotiation, it’s interesting that we’re applying similar psychological principles to help individuals maintain their agency as a foundational need.

Practice of tactical empathy

When it comes to practical steps for building tactical empathy or uncovering the Black Swan, the approach in Never Split the Difference shares a lot of similarities with moderation practices in user experience research. Methods like asking calibrated questions, focusing on discovery and uncovering insights, and active listening are all familiar to researchers. Although the relationship between a user and a researcher isn’t a negotiating one, the process and desired outcome are similar. Both the negotiator and the researcher aim to uncover insights about the other party to deeply understand their needs, so they can identify unconventional solutions or framing that change the course of the conversation or strategy.

Finally, the practice of emotional labeling reminds me of methods used in psychotherapy. It involves identifying and verbalizing the predictable emotions of a situation, which helps build empathy and insights for both parties. Once the emotions are labeled, we can talk about them without getting wound up, because using language to objectify negative thoughts makes them less frightening and disrupts their raw intensity.

Nanzen-ji Temple, Kyoto 京都南禅寺

At its core, negotiation is not about being competitive and skillful in applying complex methods or tactics. It is all about creating the right environment for genuine connection and conversation to begin.