My team is in the process of building an AR experience that requires simultaneous hand tracking and image tracking functionality. The core idea is to enable users to interact with a 3D object overlaid on a real-world item, specifically a t-shirt with a logo. Here’s the envisioned workflow:
1. The user activates the camera within the application.
2. The camera identifies and tracks a specific logo on the t-shirt the user is wearing, using image tracking techniques.
3. Upon successful tracking, a 3D object is rendered in AR, appearing to emerge from the tracked logo on the shirt.
4. The user can then physically interact with this 3D object using their hand. The goal is for the user to be able to “lift” the 3D object off the shirt and hold it in their hand.
For this interaction to be seamless, both hand tracking and image tracking need to operate concurrently. This dual tracking enables the application to detect when the user’s hand is in close proximity to the 3D object, triggering the logic that allows the 3D object to follow the user’s hand movements.
To achieve this, I need help implementing concurrent tracking of both elements (image and hand); sample code would be ideal, and I’ve included a rough sketch of the interaction logic below. The primary challenge is ensuring that the hand tracking and image tracking processes do not interfere with each other and can run in parallel, allowing for real-time interaction and engagement with the 3D object.
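For context, here’s the kind of proximity/follow logic I have in mind once both trackers are running. This is only a sketch: the component name, the grab threshold, and the assumption that the tracked hand is exposed as an entity with a world transform are all placeholders, not working 8th Wall code.

```js
// Hypothetical A-Frame component: each frame, measure the distance
// between a hand entity and this 3D object; once the hand is close
// enough, the object "lifts" off the shirt and follows the hand.
AFRAME.registerComponent('grab-when-near', {
  schema: {
    hand: {type: 'selector'},    // entity carrying the tracked hand pose
    threshold: {default: 0.15},  // grab distance in meters (placeholder)
  },
  init() {
    this.grabbed = false
    this.handPos = new THREE.Vector3()
    this.objPos = new THREE.Vector3()
  },
  tick() {
    if (!this.data.hand) return
    this.data.hand.object3D.getWorldPosition(this.handPos)
    this.el.object3D.getWorldPosition(this.objPos)

    // Trigger the grab once the hand comes within the threshold.
    if (!this.grabbed && this.handPos.distanceTo(this.objPos) < this.data.threshold) {
      this.grabbed = true
    }

    // While grabbed, snap the object to the hand (a real version would
    // keep a grab offset and smooth the motion).
    if (this.grabbed) {
      this.el.object3D.position.copy(
        this.el.object3D.parent.worldToLocal(this.handPos.clone()))
    }
  },
})
// Usage (hypothetical ids):
// <a-entity gltf-model="#toy" grab-when-near="hand: #hand"></a-entity>
```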
Sounds like a really fun idea! At this time, unfortunately, hand tracking (xrhand) and world effects (xrweb, which provides both SLAM tracking and Image Tracking) cannot be used at the same time. However, you could switch between the two.
You might consider a slightly different experience where you “transition” from one type of tracking to the other, i.e.:

1. The experience starts using image tracking only.
2. The image is detected and the 3D model appears.
3. The xrweb component is removed (to stop world & image tracking), and xrhand is added (to start hand tracking) - see the sketch below.
4. The hand is detected and the 3D model is now anchored to the hand.
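A minimal sketch of that swap in A-Frame, assuming the standard xrimagefound event from image tracking (the xrhandfound event name is my guess by analogy - check the hand tracking docs for the exact event and its detail):

```js
const scene = document.querySelector('a-scene')
const model = document.getElementById('model')  // placeholder entity id

// Step 2: the logo is detected, so place and show the model.
scene.addEventListener('xrimagefound', ({detail}) => {
  model.object3D.position.copy(detail.position)
  model.object3D.quaternion.copy(detail.rotation)
  model.setAttribute('visible', 'true')

  // Step 3: swap tracking modes. Removing xrweb stops world + image
  // tracking; adding xrhand restarts the engine with hand tracking.
  scene.removeAttribute('xrweb')
  scene.setAttribute('xrhand', '')
})

// Step 4: once a hand is found, re-anchor the model to the hand.
scene.addEventListener('xrhandfound', () => {
  // Re-parent or reposition the model using the hand transform here;
  // how that transform is exposed depends on the xrhand API.
})
```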
You could use a similar strategy to this sample project, which switches between face tracking and world tracking.
Supporting simultaneous image + hand tracking is a feature others have requested as well - I’ve passed your feedback along to our product team, so it may be considered for future implementation.
Thanks, Tony! Is there a way to keep the transition smooth, without any UI delay or a loading spinner? Using a loader disrupts the seamless experience we’re trying to create.
Can this limitation be mentioned in an easier-to-find location, and an error thrown when a user attempts to do both simultaneously?
It’s currently a quick line hidden inside the examples, and it’s possible to run code that does both image tracking and hand tracking (where the hand tracking’s rotation is accurate but the position and scale are off), so it can seem like the hand tracking accuracy issue is caused by the user’s code rather than the engine’s limitations.
@hbiz when you stop image tracking (by removing xrweb) and switch to hand tracking (by adding xrhand), the AR Engine is stopped and restarted in the new mode - the camera feed will stop and re-initialize during that time. So there is going to be some interruption while things re-initialize, and at a minimum you’ll likely want to mask that, perhaps with a black overlay that fades in/out.
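Here’s a minimal sketch of that masking, assuming the realityready event (which fires on the scene once the camera feed is running) can be used to detect that the new mode has initialized - verify the event name against the docs:

```js
// Full-screen black overlay used to hide the engine restart.
const overlay = document.createElement('div')
overlay.style.cssText =
  'position:fixed;inset:0;background:#000;opacity:0;' +
  'transition:opacity 300ms ease;pointer-events:none;z-index:9999'
document.body.appendChild(overlay)

const swapToHandTracking = (scene) => {
  overlay.style.opacity = '1'  // fade to black first
  setTimeout(() => {
    // Swap tracking modes while the screen is covered.
    scene.removeAttribute('xrweb')
    scene.setAttribute('xrhand', '')
    // Fade back in once the camera feed is running again.
    scene.addEventListener('realityready', () => {
      overlay.style.opacity = '0'
    }, {once: true})
  }, 300)  // matches the CSS transition duration
}

// Usage: swapToHandTracking(document.querySelector('a-scene'))
```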
@Arthur_Sangouard great feedback - if you are using AFrame, the xrweb and xrface components will throw an error if you try to run both at the same time, but apparently the underlying pipeline modules don’t do this same check. I’ll pass this feedback along to the product team, as well as see where we can reference these limitations in more places in our documentation.