Apple’s ARKit SDK has been a big hit among developers, including those here at Marxent. The software is stable, well-designed, and makes it much easier to build Augmented Reality applications for the iOS platform. But if there’s one knock on ARKit, it’s that the system takes a relatively long time to detect a surface plane and spawn 3D images.

“How annoying is ARKit’s surface detection for users?” asked an October column posted to AR Critic. The answer, according to AR Critic, is “really annoying.” So much so that the start-up process can turn users off before they even get to the core of an AR experience. But there’s a solution — instant initialization — which is made possible by pairing ARKit with Marxent’s proprietary tracking system, called MxT.

I talked to Marxent’s resident PhD, Dr. Ken Moser — the big brain behind MxT and its insanely fast start-up process — about how AR tracking works, what makes Marxent’s solution unique, and why ARKit is just better when paired with MxT. Ken was (as always) incredibly gracious with his time and his answers. Here’s our conversation:

Question: Can you briefly explain AR tracking? What is it? How does it work?

Ken Moser: Tracking, in the technical sense, is the localization of one (or both) of the translational offset (X, Y, Z) and the rotational orientation (roll, pitch, yaw) of an object with respect to some origin point. In terms of Augmented Reality, the object in question is predominantly taken to be a video camera, or a mobile device with an embedded camera. This is the camera through which the augmentations are viewed.
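
To make the six values concrete, here is a minimal Python sketch of a pose (translation plus roll, pitch, yaw) being used to move a point from camera coordinates into world coordinates. The function names and the rotation order (yaw about Z, then pitch about Y, then roll about X) are assumptions chosen for illustration; real tracking libraries each fix their own axis conventions.

```python
import math

def rotation_matrix(roll, pitch, yaw):
    """Build a 3x3 rotation matrix from angles in radians.

    Assumed convention: R = Rz(yaw) * Ry(pitch) * Rx(roll).
    """
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def camera_to_world(point, translation, roll, pitch, yaw):
    """Rotate a camera-space point, then offset it by the camera's position."""
    r = rotation_matrix(roll, pitch, yaw)
    return tuple(
        sum(r[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
```

For example, with a 90-degree yaw and a translation of (1, 2, 3), the camera-space point (1, 0, 0) lands at roughly (1, 3, 3) in world space: the rotation turns the X axis onto the Y axis, and the translation is then added.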

Are there different types of AR tracking?

The technologies and techniques for performing AR tracking vary widely in performance, cost to implement, coverage area, and accuracy. Professional-grade tracking systems use state-of-the-art infrared cameras to track arrangements of rigidly mounted retro-reflective spheres, referred to as “constellations” or “fiducials.” These systems are usually quite expensive, especially when used to cover large areas. Some notable purveyors of such systems are Optitrack and ART (Advanced Real-time Tracking). These systems are not practical for general consumer use, though.

But there are thousands of consumer-facing AR apps in the App Store, and AR apps have been around for years, right?

When it comes to using AR at home, it’s been the case for basically the last decade that tracking of the mobile device was done through 2D marker tracking. The fiducial in this case is a 2D image of predetermined size, which a mobile device is able to identify in its video feed thanks to computer vision algorithms. The downside of 2D marker tracking is, of course, that the camera has to keep the marker in sight during the entire tracking session. This means that the coverage area is limited by how much of the 2D image you can keep in sight. Multiple images can be used in close proximity to one another to extend the area, but a clear line of sight to at least one of the images is required at all times.
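
The reason the marker's size has to be known in advance can be shown with the standard pinhole camera relation: an object of known width appears smaller in the image in proportion to its distance. The sketch below is a simplified illustration, not the algorithm of any particular marker library, and it assumes the marker faces the camera head-on. The function and parameter names are hypothetical.

```python
def marker_distance(focal_length_px, marker_width_m, marker_width_px):
    """Estimate camera-to-marker distance with the pinhole model.

    focal_length_px: camera focal length expressed in pixels
    marker_width_m:  the marker's known printed width in meters
    marker_width_px: the marker's measured width in the image, in pixels
    """
    if marker_width_px <= 0:
        raise ValueError("marker is not visible in the image")
    return focal_length_px * marker_width_m / marker_width_px
```

With a focal length of 1000 pixels, a 0.2 m wide marker that spans 100 pixels in the image works out to about 2 m away. Once the marker leaves the frame there is nothing to measure, which is the line-of-sight limitation described above.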

But that’s no longer the case?

Since ARKit (and ARCore) were announced last year, a new era of AR tracking has emerged for home use. These new tracking libraries still provide the translational and rotational pose of the mobile device, but they do so through “markerless” tracking approaches. That is, they are able to start up and track in any arbitrary space without any prior knowledge of the surroundings. These markerless tracking libraries use what is known as a SLAM (Simultaneous Localization and Mapping) approach. This simply means that the device both builds a “map” of the surroundings and also localizes itself with respect to this map during run time.

What’s the “special sauce” that makes ARKit work?

Apple has described ARKit as utilizing “Visual Odometry,” which is a fancy way of saying it uses the video to determine how much it has moved. ARKit, of course, also makes use of the motion coprocessor chips within the iOS devices to provide high precision accelerometer and gyroscopic data as well. You can test this out in the ARKit apps you may have on your device by simply covering up the camera while looking at an AR object. With the camera covered up, rotate and translate the device. This will let you see how the internal sensors alone are used to aid in the tracking.
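
The covered-camera experiment works only briefly, and a small sketch shows why. Estimating position from an accelerometer alone means integrating acceleration twice, so even a tiny constant sensor error grows quickly. The code below is a generic dead-reckoning illustration, not ARKit's implementation; the function name, the 100 Hz sample rate, and the 0.05 m/s² bias in the usage note are assumptions picked for the example.

```python
def integrate_acceleration(accels, dt):
    """Dead-reckon 1D position from acceleration samples.

    accels: acceleration samples in m/s^2, one per time step
    dt:     time between samples in seconds
    Returns the estimated displacement in meters, starting from rest.
    """
    velocity = 0.0
    position = 0.0
    for a in accels:
        velocity += a * dt          # first integration: acceleration -> velocity
        position += velocity * dt   # second integration: velocity -> position
    return position
```

Feeding in a stationary device whose accelerometer reads a constant 0.05 m/s² bias at 100 Hz gives about 2.5 cm of phantom movement after one second, but about 2.5 m after ten seconds. The error grows with the square of elapsed time, which is why the camera's visual measurements are needed to keep the inertial estimate in check.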

The benefits of these new markerless tracking libraries are far greater than any of the 2D marker-based solutions of the past. The effective tracking area is much larger, possibly even infinite. What’s more, these tracking methods are able to provide positional updates in real-world scale, making it much easier for developers to add life-size and lifelike augmentations to novel applications.

How would you rate the tracking of Apple’s ARKit SDK?

ARKit, even as version 1.0, has very good tracking capabilities. The tracking itself starts quickly, and the accumulated error, while still present, is relatively small as long as you stay within a reasonable space.

I feel a “but” coming on …

One predominant hangup does exist with ARKit and ARCore, and that’s how they provide information about where the ground is. While both ARKit and ARCore may start tracking pretty quickly, they start without any knowledge of where the ground actually is. In order to be able to accurately place 3D images on the floor, the floor itself must first be identified. ARKit calls this “finding an anchor,” and it’s able to do this while the application is running. However, it requires the user to look at the floor and move the device, preferably in a horizontal translational motion, so that it is able to pick out clusters of features that it thinks all lie on the same horizontal plane.
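
The idea of picking out features that lie on the same horizontal plane can be sketched as a simple height histogram: bucket the tracked 3D feature points by height and look for a bucket that holds enough of them. This is a toy illustration of the general idea, not ARKit's or ARCore's actual plane-detection algorithm. The function name, the 2 cm bin size, and the 20-point threshold are assumptions.

```python
from collections import Counter

def dominant_horizontal_plane(points, bin_size=0.02, min_points=20):
    """Estimate the height of a horizontal plane from 3D feature points.

    points:     iterable of (x, y, z) tuples in meters, with y pointing up
    bin_size:   height of each histogram bucket in meters (assumed value)
    min_points: how many points a bucket needs before it counts (assumed value)
    Returns the mean height of the fullest bucket, or None if too few agree.
    """
    points = list(points)
    bins = Counter(round(y / bin_size) for _, y, _ in points)
    if not bins:
        return None
    best_bin, count = bins.most_common(1)[0]
    if count < min_points:
        return None  # not enough evidence yet: keep scanning
    heights = [y for _, y, _ in points if round(y / bin_size) == best_bin]
    return sum(heights) / len(heights)
```

The `None` branch is the part the user experiences as waiting. Until the camera has moved enough to triangulate a sufficient number of floor points, there is no plane to report.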

Sounds complicated. Then what?

Once it’s picked out enough points, ARKit identifies it as an anchor, and the application can then use this anchor information to place augmentations onto this plane. Unfortunately, ARKit does not provide any easy or intuitive out-of-the-box methods for guiding a user of your app through the process of manually identifying a floor anchor. And, in fact, it provides no feedback as to how close to completion the process of identifying a floor anchor is, or how well it is progressing.

That slow anchor scanning is causing concern, right?

This has actually been a real problem for many first-time users of Augmented Reality, who can quickly become frustrated with what may seem like poor tracking leading to bad user experiences. Savvy application developers can incorporate additional user interfaces to help guide the user through the process of scanning the floor for an anchor, but without the ability to update them on how close they are to completing the process, or how well their floor is being tracked, it is inevitable that many users will deem the experience not worth their time and will most likely not return for another try.

Marxent’s MxT tracking, which you built, initializes almost instantly, correct? Are you a wizard?

Marxent’s MxT tracking does indeed provide a near instantaneous markerless tracking experience for placing objects on the floor without the need for an initial scanning of ground anchors. The user simply points their device and tracking starts. It is as simple and frustration-free as it could get.

I note that you didn’t answer the wizard question. I’ll put you down for a yes. Moving on, what’s the difference between MxT and ARKit?

Fundamentally, MxT and ARKit use completely different tracking methods. As mentioned earlier, ARKit is a visual SLAM tracking experience that is able to provide device translation and rotation movement in real-world scale. That is, if you move the device a meter, then it can report that you’ve moved, approximately, a meter. It is, of course, fallible, and there will be some error that will compound over time, but it may be mitigated with some corrections.

MxT is a relative tracking approach, meaning that the scale of the tracking space is not absolute with respect to the world, but is based on an estimation about how the user is standing or sitting and using the application. MxT is best suited for short AR experiences, such as visualizing home furnishings quickly, within a relatively small space. Since it requires no action from the user to start placing objects on the ground, the experience allows for rapid viewing of augmentations.
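
MxT's internals are proprietary and not described here, so the following is only one plausible way to read "an estimation about how the user is standing": assume the device is held at a typical height, and intersect the camera's viewing ray with a floor at that assumed distance below it. The function name and the 1.4 m default height are hypothetical, chosen purely for illustration.

```python
import math

def floor_hit(pitch_down_rad, yaw_rad=0.0, assumed_height_m=1.4):
    """Find where the camera's center ray meets an assumed floor.

    pitch_down_rad:   how far the camera is tilted below horizontal, in radians
    yaw_rad:          compass-style heading of the camera, in radians
    assumed_height_m: guessed device height above the floor (hypothetical value)
    Returns the (x, z) floor offset from the point under the device,
    or None if the camera is not pointing downward.
    """
    if not 0.0 < pitch_down_rad <= math.pi / 2:
        return None
    distance = assumed_height_m / math.tan(pitch_down_rad)
    return (distance * math.sin(yaw_rad), distance * math.cos(yaw_rad))
```

Under this assumption, a device tilted 45 degrees downward places an object about 1.4 m ahead with no scanning step at all. It also shows why the scale is relative: if the device is really 1.0 m above the floor, every placement is off by the same factor, which goes unnoticed in a short session but is not true real-world scale.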

How does Marxent marry MxT to ARKit in an app?

Surprisingly enough, it’s not difficult to add the benefits of the MxT “Instant Start” to an ARKit experience. We’ve actually already prepared an SDK that allows an application to leverage the MxT relative tracking to instantly begin placing augmentations on the floor and in the room around the user. In the meantime, the ARKit anchor finding processes are still running, and once a ground anchor has been identified, the application can seamlessly transition from MxT to ARKit tracking without the user being any the wiser.
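
The SDK's actual hand-off logic is not described in this conversation. As a hedged sketch of what such a transition could involve, suppose objects were first placed using a guessed device height, and a later plane detection reveals the true height. Rescaling each placement by the ratio of the two keeps the objects in the same spot on screen while moving them to real-world scale. All names below are hypothetical and do not come from the MxT SDK or from ARKit.

```python
def rescale_placements(placements, assumed_height_m, measured_height_m):
    """Convert floor placements from an assumed scale to a measured one.

    placements:        list of (x, z) floor offsets made under the assumed height
    assumed_height_m:  device height guessed before any plane was detected
    measured_height_m: device height above the floor once a plane is detected
    Returns a new list of (x, z) offsets in real-world scale.
    """
    if assumed_height_m <= 0 or measured_height_m <= 0:
        raise ValueError("heights must be positive")
    scale = measured_height_m / assumed_height_m
    return [(x * scale, z * scale) for x, z in placements]
```

Because every offset and the height shrink or grow by the same factor, the viewing angles to each object are unchanged, which is one way a switch between trackers could avoid a visible jump.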

Are there benefits to combining MxT and ARKit beyond just the initialization?

Instant start is definitely the biggest technical benefit. This also comes with additional gains in usability for those new users wanting to try out AR for the first time. Being able to instantly place objects removes the biggest headache with the current slew of ARKit apps, which is the need to have unsuspecting consumers swing their device around, or to spend more time guiding them through an anchor-scanning experience than they will actually spend using the app to view products in their room. This ease of use of MxT instant start makes it far more likely that the user will return for another session.

Thanks, Ken! I really appreciate all the info.

My pleasure.