In my previous article, What is Virtual Reality? [Definition & Examples], I examined the “what” and “who” of Virtual Reality. In this article, we’re going to look at the “how.”
How Virtual Reality works can be counterintuitive. VR is unique in that it places the user inside a digital content experience. VR displays require tailored content designed to work with stereo vision, so VR experience designers and developers can’t rely on what they know works for traditional desktop and mobile displays. In general, Virtual Reality calls into question the typical assumptions made by software designers, developers, and end users.
The most obvious difference between Virtual Reality and traditional displays is the way graphics are presented to a user. That is to say, the graphics are not automatically better or worse for Virtual Reality’s sake, but the presentation itself is fundamentally different from the flat experience that we’ve been relying on for decades.
Creating an illusion that the user actively participates in, one that defies their ordinary perception of the world, is central to VR. Virtual Reality works the same way as any magic trick: if you can fool the senses, then, to quote Morpheus, “your mind makes it real.” Here’s how it happens.
A stereoscopic display is one component that separates Virtual Reality from the majority of non-VR systems. Stereoscopic displays present a different view of the virtual scene to each eye, in the same way that stereo headphones play different sounds for each ear. Stereoscopy is a powerful cue to the brain that certain objects are farther away than others. Combined with other depth cues, such as parallax (objects in the distance appear to move slower than objects close up), converging lines, and shading, a stereoscopic display can be employed to effectively create a sense of presence.
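To make the stereoscopic idea concrete, here is a minimal sketch of how a renderer might generate the two per-eye viewpoints. The function name and the interpupillary distance (IPD) value are illustrative, not from any particular SDK:

```python
# Sketch: generating per-eye camera positions for stereoscopic rendering.
# The camera is offset half the interpupillary distance (IPD) along its
# "right" vector for each eye. The IPD value here is illustrative.

IPD = 0.064  # a commonly cited average interpupillary distance, in meters

def eye_positions(camera_pos, right_vector, ipd=IPD):
    """Offset the camera half the IPD along its right vector for each eye."""
    half = ipd / 2.0
    left = tuple(c - half * r for c, r in zip(camera_pos, right_vector))
    right = tuple(c + half * r for c, r in zip(camera_pos, right_vector))
    return left, right

# A camera 1.7 m off the floor, with +x as its right vector:
left_eye, right_eye = eye_positions((0.0, 1.7, 0.0), (1.0, 0.0, 0.0))
```

Each eye then renders the same scene from its own position; the slight horizontal disparity between the two images is the cue the brain reads as depth.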
Two flat rectangles are not the end of the story for display technology. The projection of the world, in the mathematical sense, is fundamentally different when the screen is attached to a user’s head. When used with the lenses in a head-mounted display (HMD), barrel distortion provides an enormous field of view (FOV) that seems to stretch around a user’s head. Simultaneously, HMDs make it possible to place the display closer to the eye.
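Barrel distortion is usually applied in software as a radial scaling of the image before it passes through the lenses. A minimal sketch of that transform follows; the k1/k2 coefficients are made up for illustration, since real values depend on the specific lens design:

```python
# Sketch: a radial "barrel" distortion of the kind pre-applied to rendered
# frames in many HMD pipelines. Points near the image center barely move;
# points near the edge are pushed outward, stretching the image across a
# wide field of view. Coefficients are illustrative only.

K1, K2 = 0.22, 0.24  # illustrative distortion coefficients

def barrel_distort(x, y):
    """Scale a point radially by 1 + k1*r^2 + k2*r^4 (image center at origin)."""
    r2 = x * x + y * y
    scale = 1.0 + K1 * r2 + K2 * r2 * r2
    return x * scale, y * scale

center = barrel_distort(0.0, 0.0)   # the center is unchanged
edge = barrel_distort(0.5, 0.0)     # an off-center point moves outward
```

The lenses in the headset then optically undo this warp, so the user sees straight lines as straight while the display fills far more of their field of view than its physical size would suggest.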
The nerves that line the retina of your eyes aren’t laid out in an evenly spaced rectangular grid. That is to say, they are not at all like the pixels in a typical computer monitor. Rather, their layout is log-polar: they are arranged in a spiral around the center of your retina, densely packed at the center and increasingly spread out farther from it. Unwrapping that spiral, a flat image becomes strikingly different, as seen below.
Left: A photo of a handsome, rugged man. Right: The same photo, unwrapped from log-polar form.
This means, among other things, that people are very good at seeing the details they’re focusing on, while peripheral vision carries considerably less detail. In the future, technology such as foveated imaging will make it possible to render ultra-high-definition graphics where detail is needed, while allowing the computer to relax where it doesn’t matter as much.
Curiously, the log-polar layout in the human eye also means that people are good at recognizing scaled and rotated versions of images they’ve seen before. Scale and rotation are reduced to shifts under this projection, meaning that to the brain, there’s very little difference between seeing something close up and seeing something far away, or seeing something right-side-up, and seeing something tilted.
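The math behind “scale and rotation are reduced to shifts” is short enough to show directly. In log-polar coordinates, a point (x, y) becomes (log r, θ); scaling the image multiplies r, which only adds a constant to log r. A small sketch:

```python
# Sketch: why scaling becomes a shift under a log-polar mapping. A point
# (x, y) maps to (log r, theta). Multiplying the whole image by a scale
# factor s adds the constant log(s) to the log-radius coordinate and
# leaves the angle untouched; rotation, likewise, only shifts theta.
import math

def to_log_polar(x, y):
    r = math.hypot(x, y)
    return math.log(r), math.atan2(y, x)

rho1, theta1 = to_log_polar(3.0, 4.0)  # original point, r = 5
rho2, theta2 = to_log_polar(6.0, 8.0)  # same point scaled by 2, r = 10
# rho2 - rho1 equals log(2), and theta is unchanged: the scaled image is
# just the original slid along the log-radius axis.
```

To circuitry that compares patterns by their relative layout, a shifted copy is still recognizably the same pattern, which is one reason familiar objects are easy to recognize at any distance.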
The mind is always grasping for the meaning of what it’s seeing. The same “wiring” that makes us good at recognizing what’s familiar makes it easy to be tricked, and VR takes advantage of that.
Several companies are already making use of flat screens inside their head-mounted displays. Oculus VR, maker of the recently revealed Oculus Rift, uses a pair of organic light-emitting diode (OLED) panels in its headset. OLED panels are displays whose light-emitting layer is a thin film of organic compounds. While OLED technology has been used in devices like phones and computer monitors since the early-to-mid 2000s, the screens used in headset displays like the Oculus Rift have stricter requirements.
To use an OLED panel in a headset, the refresh rate (how often new images are drawn to the screen to simulate motion) must be much higher than that of a television, monitor, or phone screen, and each pixel must switch quickly enough that the image doesn’t smear as the user turns their head.
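A quick calculation shows why the refresh-rate requirement is so demanding. The arithmetic below is just the definition of a frame budget, using common refresh rates for comparison:

```python
# Sketch: the per-frame time budget at common refresh rates. Higher
# refresh rates leave less time to render each frame, and a headset must
# render the scene for two eyes within that budget, every frame.

def frame_budget_ms(refresh_hz):
    """Milliseconds available to render one frame at a given refresh rate."""
    return 1000.0 / refresh_hz

for hz in (60, 90, 120):
    print(f"{hz} Hz -> {frame_budget_ms(hz):.1f} ms per frame")
```

At 90 Hz the renderer has roughly 11 ms per frame, compared to about 16.7 ms for a typical 60 Hz monitor, and missing that deadline even occasionally produces visible judder.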
Requirements like these represent a fundamental shift from the familiar call for more detailed graphics. Subtle improvements in the way details are presented matter: when the presentation takes these factors into account, the experience becomes immersive. The display stops “existing” at all.
Lightfields, laser waveguides, and holography
Other emerging display technology is poised to eclipse LED-based virtual reality. The displays used in the Magic Leap and Microsoft HoloLens are purported to use technologies like lightfields, laser waveguides, and holography to create a multi-focus 3D display, where users’ eyes can naturally focus on things close up or far away, rather than fixing on screens in front of their faces. These infinite-depth-of-field displays are a display counterpart to cameras like the Lytro. They contribute to suspension of disbelief in AR/VR experiences by allowing both Augmented Reality and Virtual Reality to behave much more like the human eye expects.
Virtual Reality just wouldn’t be Virtual Reality without the ability to look around. Tracking the user’s head position as they move is critical to maintaining the illusion of a 360-degree world.
Tracking a user’s motion is nothing new in the world of computer technology. If you think about it, your mouse is already a device for tracking hand motion, albeit a very simple device compared to an HMD like the Oculus Rift. A mouse could be described in engineering terms as a “two degrees of freedom” tracking device, since you can move it left-to-right and front-to-back across a desk (or, to use the geometric language, along the x and y axes).
Most virtual reality headsets have at least three degrees of freedom: you can “pitch” your head by tilting up and down, “roll” your head from shoulder to shoulder, or “yaw” your head from side to side (like you’re saying “no, I don’t want to take the headset off yet”). A few even let you move around a physical room in three dimensions, for a total of six degrees of freedom.
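One way to picture the difference between two, three, and six degrees of freedom is as fields in a pose record. This is an illustrative sketch, not any SDK’s actual data structure; the names and units are assumptions:

```python
# Sketch: a six-degrees-of-freedom pose -- three numbers for position and
# three for orientation (pitch, roll, yaw). Field names and units are
# illustrative, not taken from any particular VR SDK.
from dataclasses import dataclass

@dataclass
class Pose:
    x: float = 0.0      # left/right, meters
    y: float = 0.0      # up/down, meters
    z: float = 0.0      # forward/back, meters
    pitch: float = 0.0  # tilt up/down, radians
    roll: float = 0.0   # shoulder-to-shoulder tilt, radians
    yaw: float = 0.0    # side-to-side turn, radians

# A mouse updates only two of these fields (x and y); a 3-DOF headset
# updates only the three angles; a room-scale system updates all six.
turned_head = Pose(yaw=0.5)
```

Seen this way, the jump from a mouse to a room-scale headset is a jump from tracking two numbers to tracking six, all of which must stay consistent many times per second.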
Tracking someone or something moving around in a combination of six different ways is much more complicated than just the two a mouse records. One device for recording the kind of information needed for head-tracking is called an inertial measurement unit (IMU).
An IMU uses an accelerometer, a device like a miniature weight on springs that measures forces in three dimensions, to record linear movement and gravity. Alongside the accelerometer, a gyroscope measures angular movement. Finally, a magnetometer, like a three-dimensional compass, gives the orientation of the IMU relative to the earth. Combining the measurements of these devices in a process called sensor fusion, an IMU can estimate how a device is rotated or moved over time.
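One of the simplest forms sensor fusion can take is a complementary filter, sketched below for a single axis. This is a toy illustration of the principle, not the filter any particular headset ships; the blend constant and sensor values are made up:

```python
# Sketch: a one-axis complementary filter, a very simple form of sensor
# fusion. The gyro responds quickly but drifts over time; the
# accelerometer's gravity reading is noisy but drift-free. Each update
# trusts the integrated gyro short-term and the accelerometer long-term.
# ALPHA and the sensor readings below are illustrative.

ALPHA = 0.98  # how heavily to weight the integrated gyro each step

def fuse(angle, gyro_rate, accel_angle, dt):
    """Blend integrated gyro motion with the accelerometer's absolute angle."""
    return ALPHA * (angle + gyro_rate * dt) + (1.0 - ALPHA) * accel_angle

angle = 0.0
# Headset held still: the gyro reports a small bias while the
# accelerometer keeps reporting an angle of zero.
for _ in range(1000):
    angle = fuse(angle, gyro_rate=0.01, accel_angle=0.0, dt=0.01)
# The accelerometer term keeps the estimate pinned near zero instead of
# letting the gyro bias carry it away.
```

Real headsets use more sophisticated fusion (Kalman-style filters over all three axes plus the magnetometer), but the principle is the same: combine a fast, drifting sensor with a slow, absolute one.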
Most smartphones have an IMU: they’re used to allow the screen contents to rotate when the phone is turned on its side, or to allow the phone to act as a compass and level in a pinch. Headsets use an IMU (or more than one, in some cases) to estimate not just the device position, but also the position of the user’s head in a virtual world.
IMUs have a few shortcomings. Without a fixed point of reference, they tend to “drift.” This means that the real world stays put, but the virtual world slowly changes orientation as the small tracking errors add up.
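The drift problem is easy to demonstrate with a few lines of arithmetic. The bias value below is an illustrative assumption, not a measured spec:

```python
# Sketch: how a tiny constant gyro bias becomes visible drift when the
# rotation rate is integrated over time with no external reference.
# The bias value is illustrative.

BIAS_DEG_PER_SEC = 0.05   # small constant error in the gyro reading
DT = 1.0 / 90.0           # one tracking update at 90 Hz

yaw = 0.0
for _ in range(90 * 60):  # integrate one minute of updates
    yaw += BIAS_DEG_PER_SEC * DT

print(f"orientation error after one minute: {yaw:.1f} degrees")
```

Even a 0.05 deg/s bias accumulates to 3 degrees of error per minute, enough to noticeably rotate the virtual world, which is why headsets add an absolute reference like the external cameras described next.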
Some devices, including the Oculus, improve their accuracy greatly by using computer vision to track the headset from nearby cameras. Markers that are easy for a computer to see, called fiducial markers, may use infrared light or reflectors to give an absolute measurement of a user’s position to the Virtual Reality system.
Some AR and VR devices are using an increasingly sophisticated scheme for tracking. Instead of cameras pointed at the device, cameras and other sensors on the device watch the world outside the headset, an approach borrowed from robot navigation. These systems look for easily-identifiable features of the environment, in order to both build a 3D map of the world and also locate (or localize) the camera within the ever-changing map. In this way, simultaneous localization and mapping (SLAM) allows the world to act as a target for tracking. While SLAM is still far from perfect, in the future, it could be a significant part of melding the virtual and the real world.