Apple Misunderstands Spatial Computing and an Alternative Vision

I recently had the opportunity to demo the Apple Vision Pro (AVP) at the Apple Store. Here I want to share my reservations about Apple’s Spatial Computing strategy broadly and how I see the space playing out in the coming years.

This is not meant to be a review or “initial impressions” essay of the product, as there are so many out there already, and I only tried the product briefly, so I couldn’t add anything valuable on that front. But to get it out of the way, here are my impressions in a few bullet points.

  • The light bleeding in from the bottom was somewhat distracting

  • The device did feel a little heavy for me

  • Sound is impressive considering there’s nothing worn on the ears

  • Gesture navigation works well

From what I read online, people find that the best use case for these devices is watching movies and using one as a big external monitor for their Mac. I really don’t think the tradeoffs are worth it for this kind of use case. Having a larger monitor is an incremental (n+1) improvement in exchange for a tradeoff that is more than incremental, i.e. having a physical barrier between you (at least as far as your eyes are concerned) and your immediate surroundings.

As such, I can’t help but think that Apple has really missed the mark here. Of course there will be those defending the device, saying that it’s a first-generation product and that it’ll get better over time, but my concern is that they’ve gotten the foundation of the operating system and user interface wrong, and that is not something that can be recovered from so easily.

To provide some background, let’s take a look at the history of computing. Apple and Microsoft succeeded in personal computers, leaving IBM behind, in part because they recognized the importance of the keyboard-and-mouse graphical user interface while the incumbents thought terminals were sufficient. Similarly, Apple won in the smartphone era because it recognized that a touchscreen was the optimal interface, while BlackBerry, Nokia, and others remained attached to physical buttons. I believe we are seeing the same thing play out as we transition to the spatial computing era. The user interface of the AVP is not very different from macOS, and the different experiences it offers are built around apps taken straight out of iOS.

I believe this approach fundamentally misunderstands the value proposition of spatial computing, and committing to a flawed approach at such a base layer will make it very difficult for Apple to pivot and catch up when someone eventually creates a product with a user interface that capitalizes on the strengths of this modality.

2.5D Design

In creating a new computing paradigm, it is necessary to use skeuomorphic design to lead people gently into learning and adopting new behaviors. On the desktop we see this in elements such as folders that contain our files and the many icons reminiscent of their physical hardware counterparts. Apple tried to do this for the AVP by creating a 2.5D interface (2D, but with slight depth and shadows, particularly between a subject and the background), even though the hardware is capable of fully immersive 3D. The best of the AVP’s capabilities are showcased in its “immersive experiences,” in which one finds oneself fully immersed in a 3D world of such incredible fidelity that it feels quite realistic. The demo in which I stood next to African(?) kids playing soccer, and right in front of elephants mud bathing, was so compelling that it actually quenched my insatiable travel itch in a way that no technology has before. Ben Thompson has written about being amazed at using the AVP to sit in a courtside seat at a basketball game. If the best of VR is experienced as a 3D world, then why would Apple dumb it down so far as to make it a glorified Mac screen?

Apple has made it clear in its promotional content that it hopes the AVP will be a product people can use without being totally cut off from the outside world, hence the outward-facing screen that displays a disconcerting rendering of the user’s eyes for others to see. So one could rationalize the decision as an attempt to create an interface in which the user feels they are more or less engaging with the world as they would without a headset. However, as much as Apple would like you to believe the AVP is not a VR device, the simple truth of the matter is that when you are using it, your sense of vision is entirely mediated by Apple’s hardware and software. You give up your own vision for “Apple Vision,” in other words. When you consider the tradeoff that way, the opportunity to have a larger workspace for my Mac is, in my opinion, much less compelling.

Alternative Interface

It’s easy to be critical of an experimental new device, and harder to be constructive about what an alternative would look like, so I’d like to take a shot at that. I believe the primary interface element in VR should be an AI “butler” that is trained on your data, generatively overlays 3D skeuomorphic elements on top of your physical surroundings depending on where you are, your circumstances, and so on, and collaborates with you on fulfilling your aims. For example, it might create a virtual photo album on your desk that you can flip through until you find the photo you want to work on, at which point you can expand it and make the desired changes through conversation with the AI and your gestures. Menus from the desktop PC era are not a good fit for VR.

It may seem unfair to paint a lofty vision while unburdened by engineering and cost limitations, but I hope this exercise helps illustrate what I mean by Apple setting a bad foundation relative to where this technology is headed. To better understand this form factor, and what it may be most useful for, it is worth looking at a similar emerging product: smart glasses.

Considerations of AI and Comparisons to Smart Glasses

Apple does not make smart glasses, but its biggest competitor in this space, Meta, does, so I think it is instructive to compare the products. Smart glasses don’t project visuals onto our field of view; they are simply a tool for giving your AI assistant first-person vision. Yes, they can take photos and videos, but as of now that’s not the biggest consideration, because the quality is poor relative to smartphones, which we always have with us anyway. ChatGPT is incredibly useful, but it is severely hampered by its lack of access to the overall context of who its user is, the particular situation in which they are using the AI, and their broader aims in life (not that many people know what their aim in life is themselves). OpenAI is working on this by giving ChatGPT the ability to retain history and by introducing an omni-modal model. This context is particularly valuable at a time when AI companies are expressing concern about the lack of quality training data, and for the user it is even more useful because the training data would be about what matters most to them, i.e. their own lives. Whereas the AVP displays the world to the user through an Apple filter, smart glasses share the user’s vision with their AI butler and give it the context of the user’s life, unlocking the next level of usefulness.

AI needs to be able to generate 3D environments on the fly, with high fidelity, in order to unlock its true potential. The personalized AI butler angle is also important, but much of that is a matter of adequately addressing concerns about AI safety and the privacy of personal information.

Smart glasses are underappreciated, and Apple would have been better served by attempting them first before releasing the AVP, which has impressive specs but lacks the AI capabilities to be useful as a creative tool. As it is, it is doubtful Apple can even attract developers to build on the platform. Apple has the remarkable distinction of being one of the most trusted tech hardware companies, but it feels like it has given up on the mission of making computing more personal.

Whereas ChatGPT primarily interacts with users by talking to them, the AVP could have been the tool that enables your AI to show you rather than tell you. If a picture is worth a thousand words, then how many words is it worth to be immersed in an interactive 3D environment?
