
Face Tracking in Movement SDK for OpenXR

This topic provides:
  • An overview of Natural Facial Expressions,
  • Policies and disclaimers,
  • And a usage guide for the Face Tracking OpenXR extension.

What Is Face Tracking?

Face Tracking for Meta Quest Pro relies on inward facing cameras to detect expressive facial movements. For devices without inward facing cameras, such as Meta Quest 2 and Meta Quest 3, Face Tracking relies on audio from the microphone to estimate facial movements. These movements are categorized into expressions based on the Facial Action Coding System (FACS), a reference that breaks facial movement down into expressions that map to common facial muscle movements, such as raising an eyebrow or wrinkling your nose.

Some common facial movements are represented by a combination of several of these expressions. For instance, a smile could combine the right and left lip corner pullers around the mouth with cheek movement and even the eyes slightly closing. For that reason, it is common to blend multiple motions together at the same time. To represent this effectively in VR or AR, the common practice is to use blendshapes (also known as morph targets in some tools), each with a strength that represents how strongly the face is expressing that action.

This API conveys each facial expression as a defined blendshape with a strength that indicates the activation of that blendshape. These blendshapes can be interpreted directly to tell whether the person has their eyes open, is blinking, or is smiling. They can also be combined together and retargeted to a character to provide Natural Facial Expressions.
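To make the blendshape mechanics concrete, here is a minimal stand-alone sketch of how weighted blendshapes combine: each shape stores per-vertex offsets from a neutral mesh, and its weight in [0, 1] scales how strongly those offsets are applied. The Blendshape struct and ApplyBlendshapes function are illustrative only, not part of the Movement SDK.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// A hypothetical morph target: per-vertex offsets from the neutral mesh.
struct Blendshape {
    std::vector<std::array<float, 3>> deltas; // one offset per vertex
};

// Blend the neutral mesh with each shape, scaled by its activation strength
// in [0, 1]. A smile activates several shapes (lip corner pullers, cheek
// raisers) at once, so the contributions are simply summed.
std::vector<std::array<float, 3>> ApplyBlendshapes(
    const std::vector<std::array<float, 3>>& neutral,
    const std::vector<Blendshape>& shapes,
    const std::vector<float>& weights) {
    std::vector<std::array<float, 3>> out = neutral;
    for (std::size_t s = 0; s < shapes.size(); ++s) {
        for (std::size_t v = 0; v < out.size(); ++v) {
            for (int c = 0; c < 3; ++c) {
                out[v][c] += weights[s] * shapes[s].deltas[v][c];
            }
        }
    }
    return out;
}
```

Driving a smile would mean raising the weights of the lip-corner-puller and cheek-raiser shapes together, exactly as described above.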

Policies and Disclaimers

Your use of the Face Tracking API must at all times be consistent with the Oculus SDK License Agreement, the Developer Data Use Policy, and all applicable Oculus and Meta policies, terms, and conditions. Your use of Movement may also be covered by applicable privacy and data protection laws.
In particular, you must post and abide by a publicly available and easily accessible privacy policy that clearly explains your collection, use, retention, and processing of data through the Face Tracking API. You must ensure that a user is provided with clear and comprehensive information about, and consents to, your access to and use of abstracted facial expression data prior to collection, including as required by applicable privacy and data protection laws.
Please note that we reserve the right to monitor your use of the Face Tracking API to enforce compliance with our policies.
When a user enables Natural Facial Expressions for your app, your app is granted access to real time abstracted facial expressions data which is user data under the Developer Data Use Policy. This data is only permitted to be used for purposes outlined in the Developer Data Use Policy. You are expressly forbidden from using this data for Data Use Prohibited Practices. The Natural Facial Expressions feature is powered by our Face Tracking API.

XrFace Sample App

Build XrFace Sample App

Download the Oculus Mobile OpenXR SDK (v47 or later), and then build and install the XrFace sample application with:
adb uninstall com.oculus.sdk.xrface
cd XrSamples/XrFace/Projects/Android
../../../../gradlew installDebug

Using XrFace Sample App

As a user, when you open the sample app, you will see a world-locked table with entries for the 70 blendshape weights, two confidence values for the upper and lower face, and two validity flags corresponding to all blendshapes and to the eye look blendshapes, respectively. The data source indicates which sensor data face tracking is estimated from: Visual means that facial movements are estimated from the inward facing cameras and, optionally, microphone data; Audio means that facial movements are estimated from microphone data only.
You can move face regions such as your mouth, cheeks, and eyes and watch the weights of the corresponding blendshapes change. You can also observe the confidence in the face tracking regions change with your facial movements.

The Face Tracking Extension

XR_FB_face_tracking2 is an extension that provides face tracking output. It takes images from custom sensors as input and outputs the blendshape weights corresponding to action in different facial regions. We highly encourage choosing XR_FB_face_tracking2 over the deprecated XR_FB_face_tracking extension, since XR_FB_face_tracking doesn’t support tongue tracking or audio-driven face tracking. If you are interested in the old XR_FB_face_tracking extension, visit the Khronos OpenXR Registry for more details.
For full API reference, go to API Reference.

Permissions

To use face tracking, the app must declare the following permissions in its Android manifest. You also need to request the RECORD_AUDIO permission if you want facial movements to be estimated from microphone audio.
<manifest xmlns:android="http://schemas.android.com/apk/res/android" >
   <!-- Tell the system this app can handle face tracking -->
   <uses-feature android:name="oculus.software.face_tracking" android:required="true" />
   <uses-permission android:name="com.oculus.permission.FACE_TRACKING" />

   <!-- Tell the system this app can use audio for face tracking -->
   <uses-permission android:name="android.permission.RECORD_AUDIO" />

   <!-- Tell the system this app can handle eye tracking -->
   <uses-feature android:name="oculus.software.eye_tracking" android:required="true" />
   <uses-permission android:name="com.oculus.permission.EYE_TRACKING" />

  ....
</manifest>
Important: Permissions for eye and face tracking are separated, so users may grant face tracking permission, yet deny eye tracking permission. Denying eye tracking permission prevents eye look blendshapes from being tracked.
The com.oculus.permission.EYE_TRACKING, com.oculus.permission.FACE_TRACKING, and android.permission.RECORD_AUDIO permissions are runtime permissions, so the application should explicitly ask the user to grant permission. For details about permissions, see Runtime permissions. The following example demonstrates how this can be handled.
  private static final String PERMISSION_FACE_TRACKING = "com.oculus.permission.FACE_TRACKING";
  private static final String PERMISSION_EYE_TRACKING = "com.oculus.permission.EYE_TRACKING";
  private static final String PERMISSION_RECORD_AUDIO = "android.permission.RECORD_AUDIO";
  private static final int REQUEST_CODE_PERMISSION_FACE_AND_EYE_TRACKING = 1;

  @Override
  protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);

    requestFaceAndEyeTrackingPermissionIfNeeded();
  }

  private void requestFaceAndEyeTrackingPermissionIfNeeded() {
    List<String> permissionsToRequest = new ArrayList<>();
    if (checkSelfPermission(PERMISSION_EYE_TRACKING) != PackageManager.PERMISSION_GRANTED) {
      permissionsToRequest.add(PERMISSION_EYE_TRACKING);
    }
    if (checkSelfPermission(PERMISSION_FACE_TRACKING) != PackageManager.PERMISSION_GRANTED) {
      permissionsToRequest.add(PERMISSION_FACE_TRACKING);
    }
    if (checkSelfPermission(PERMISSION_RECORD_AUDIO) != PackageManager.PERMISSION_GRANTED) {
      permissionsToRequest.add(PERMISSION_RECORD_AUDIO);
    }

    if (!permissionsToRequest.isEmpty()) {
      String[] permissionsAsArray =
          permissionsToRequest.toArray(new String[permissionsToRequest.size()]);
      requestPermissions(permissionsAsArray, REQUEST_CODE_PERMISSION_FACE_AND_EYE_TRACKING);
    }
  }

OpenXR Initialization

Before the app gets access to the functions of a specific OpenXR extension, you must create the OpenXR session and enable the required OpenXR extension. This part of the application is common to all extensions.
During initialization, you can create the following set of objects, which will be shared between all OpenXR extensions of the app:
XrInstance instance;
XrSystemId system;
XrSession session;
XrSpace sceneSpace;
For details, see the SampleXrFramework/Src/XrApp.h header.
The process is described in the OpenXR specification. All of this initialization is implemented in SampleXrFramework/Src/XrApp.cpp.

Enabling the Extension

All extensions must be explicitly listed when creating an XrInstance:
std::vector<const char*> extensions;

XrInstance instance = XR_NULL_HANDLE;

XrInstanceCreateInfo instanceCreateInfo = {XR_TYPE_INSTANCE_CREATE_INFO};
....
instanceCreateInfo.enabledExtensionCount = extensions.size();
instanceCreateInfo.enabledExtensionNames = extensions.data();

....
OXR(initResult = xrCreateInstance(&instanceCreateInfo, &instance));
For details, see SampleXrFramework/Src/XrApp.cpp.

Setting Up

The following sections provide instructions for setting up the Face Tracking extension.

Include Headers

In your source code, include the following header for face tracking.
   #include <openxr/openxr.h>

Initialize OpenXR

Before using face tracking, you must initialize an OpenXR session and enable the extension. For details about session initialization, read Creating Instances and Sessions.
Initialize the OpenXR extension once and share it between all calls to the OpenXR API. After successful initialization, you will have the following data:
    XrSession Session;
    XrSpace StageSpace;
For details, see the SampleXrFramework/Src/XrApp.h header.
We encourage you to use the constant XR_FB_FACE_TRACKING2_EXTENSION_NAME as the extension name.

Known Issue

Although only one extension will be used, you must enable XR_FB_EYE_TRACKING_SOCIAL_EXTENSION_NAME with XR_FB_FACE_TRACKING2_EXTENSION_NAME, otherwise eye-related blendshapes EYES_LOOK_* will not be provided.
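As an example, a hypothetical helper that builds the extension list passed to xrCreateInstance could enable both names together. The BuildExtensionList function is illustrative; the two extension name strings are reproduced inline so the sketch compiles without the OpenXR headers.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Extension name strings as published in the OpenXR registry; defined inline
// here so this sketch is self-contained.
#define XR_FB_FACE_TRACKING2_EXTENSION_NAME "XR_FB_face_tracking2"
#define XR_FB_EYE_TRACKING_SOCIAL_EXTENSION_NAME "XR_FB_eye_tracking_social"

// Illustrative helper: build the extension list for xrCreateInstance. Both
// extensions must be enabled together, otherwise the EYES_LOOK_* blendshapes
// are not provided.
std::vector<const char*> BuildExtensionList() {
    std::vector<const char*> extensions;
    extensions.push_back(XR_FB_FACE_TRACKING2_EXTENSION_NAME);
    extensions.push_back(XR_FB_EYE_TRACKING_SOCIAL_EXTENSION_NAME);
    return extensions;
}
```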

Check Compatibility

You must check whether the user’s headset supports face tracking. For a given XrInstance, validate this by retrieving the system properties with the xrGetSystemProperties function.
To do so, use the XrSystemFaceTrackingProperties2FB struct, which describes whether a system supports face tracking. Its definition follows.
typedef struct XrSystemFaceTrackingProperties2FB {
    XrStructureType    type;
    void* XR_MAY_ALIAS next;
    XrBool32           supportsVisualFaceTracking;
    XrBool32           supportsAudioFaceTracking;
} XrSystemFaceTrackingProperties2FB;
For details about this struct, see XrSystemFaceTrackingProperties2FB.
The following example demonstrates how to validate face tracking support.
    XrSystemFaceTrackingProperties2FB faceTrackingSystemProperties{
        XR_TYPE_SYSTEM_FACE_TRACKING_PROPERTIES2_FB};
    XrSystemProperties systemProperties{
        XR_TYPE_SYSTEM_PROPERTIES, &faceTrackingSystemProperties};
    OXR(xrGetSystemProperties(GetInstance(), GetSystemId(), &systemProperties));
    if (faceTrackingSystemProperties.supportsAudioFaceTracking ||
        faceTrackingSystemProperties.supportsVisualFaceTracking) {
        // face tracking is supported!
    }
If the supportsAudioFaceTracking field of the XrSystemFaceTrackingProperties2FB struct is XR_TRUE, audio-driven face tracking is supported. If the supportsVisualFaceTracking field is XR_TRUE, the device supports face tracking using inward facing cameras.
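Once you know which data sources the headset supports, you can request only those when creating the tracker. The following stand-alone sketch illustrates the selection logic; the enum and the PickDataSources function are hypothetical stand-ins for the real XrFaceTrackingDataSource2FB values.

```cpp
#include <cassert>
#include <vector>

// Stand-in for XrFaceTrackingDataSource2FB, so the sketch compiles without
// the OpenXR headers.
enum FaceDataSource { kVisual, kAudio };

// Request only the data sources the headset actually supports, in order of
// preference (visual first, audio as a fallback).
std::vector<FaceDataSource> PickDataSources(bool supportsVisual,
                                            bool supportsAudio) {
    std::vector<FaceDataSource> sources;
    if (supportsVisual) sources.push_back(kVisual);
    if (supportsAudio) sources.push_back(kAudio);
    return sources;
}
```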

Acquire Function Pointers

Before creating the face tracker, you must retrieve pointers to all the functions in the extension. For details, read xrGetInstanceProcAddr in the OpenXR spec. The following example demonstrates how to do this.
    PFN_xrCreateFaceTracker2FB xrCreateFaceTrackerFB_ = nullptr;
    PFN_xrDestroyFaceTracker2FB xrDestroyFaceTrackerFB_ = nullptr;
    PFN_xrGetFaceExpressionWeights2FB xrGetFaceExpressionWeightsFB_ = nullptr;

    OXR(xrGetInstanceProcAddr(
        GetInstance(),
        "xrCreateFaceTracker2FB",
        (PFN_xrVoidFunction*)(&xrCreateFaceTrackerFB_)));
    OXR(xrGetInstanceProcAddr(
        GetInstance(),
        "xrDestroyFaceTracker2FB",
        (PFN_xrVoidFunction*)(&xrDestroyFaceTrackerFB_)));
    OXR(xrGetInstanceProcAddr(
        GetInstance(),
        "xrGetFaceExpressionWeights2FB",
        (PFN_xrVoidFunction*)(&xrGetFaceExpressionWeightsFB_)));

Using the Face Tracking Extension

Creating a Face Tracker

To create and obtain an XrFaceTracker2FB handle to a face tracker, you must call the xrCreateFaceTracker2FB function, defined as:
XrResult xrCreateFaceTracker2FB(
   XrSession session,
   const XrFaceTrackerCreateInfo2FB* createInfo,
   XrFaceTracker2FB* faceTracker);
For details, see xrCreateFaceTracker2FB. The following example demonstrates how to use it.
    XrFaceTracker2FB faceTracker_ = XR_NULL_HANDLE;

    XrFaceTrackerCreateInfo2FB createInfo{XR_TYPE_FACE_TRACKER_CREATE_INFO2_FB};
    createInfo.faceExpressionSet = XR_FACE_EXPRESSION_SET2_DEFAULT_FB;
    createInfo.requestedDataSourceCount = 2;
    XrFaceTrackingDataSource2FB dataSources[2] = {
        XR_FACE_TRACKING_DATA_SOURCE2_VISUAL_FB,
        XR_FACE_TRACKING_DATA_SOURCE2_AUDIO_FB};
    createInfo.requestedDataSources = dataSources;

    OXR(xrCreateFaceTracker2FB_(GetSession(), &createInfo, &faceTracker_));
For more details about faceExpressionSet and requestedDataSources, see XrFaceTrackerCreateInfo2FB.
Only one instance of the face tracker is allowed per process, and multiple calls to this function will return the same handle. The handle is unique per process.
Important: For this call to succeed, your app must request the com.oculus.permission.FACE_TRACKING permission in its manifest, and the user must grant this permission.
Face tracking blendshape data is available by calling xrGetFaceExpressionWeights2FB immediately upon return of this call, as seen in the next section.

Retrieving Facial Expression/Blendshape Weights

You must first allocate memory for the weights and confidences of the 70 blendshapes that are tracked by the face tracker at a given point in time. For this purpose, use the XR_FACE_EXPRESSION2_COUNT_FB and XR_FACE_CONFIDENCE2_COUNT_FB constants.
    float weights_[XR_FACE_EXPRESSION2_COUNT_FB] = {};
    float confidence_[XR_FACE_CONFIDENCE2_COUNT_FB] = {};
To retrieve facial expression weights, you must call the xrGetFaceExpressionWeights2FB function. This function obtains the weights and confidences for the 70 blendshapes that are tracked by the face tracker at a given point in time. Its definition follows:
XrResult XRAPI_CALL xrGetFaceExpressionWeights2FB(
   XrFaceTracker2FB faceTracker,
   const XrFaceExpressionInfo2FB* expressionInfo,
   XrFaceExpressionWeights2FB* expressionWeights);
The XrFaceExpressionInfo2FB struct is an xrGetFaceExpressionWeights2FB function parameter that describes the time at which face expressions are being requested. Callers should request a time equal to the predicted display time for the rendered frame; the system will employ appropriate modeling to provide expressions for that time and will return the value at the closest possible timestamp to the requested one. The timestamp of the estimation is always provided, so the caller can determine the extent to which the system was able to fulfill the request. The definition of the XrFaceExpressionInfo2FB struct follows.
typedef struct XrFaceExpressionInfo2FB {
   XrStructureType type;
   const void*     XR_MAY_ALIAS next;
   XrTime          time;
} XrFaceExpressionInfo2FB;
For details, see XrFaceExpressionInfo2FB.
The XrFaceExpressionWeights2FB struct is an xrGetFaceExpressionWeights2FB function parameter that contains arrays describing the face tracking blendshape weights and confidences. Its definition follows.
typedef struct XrFaceExpressionWeights2FB {
    XrStructureType             type;
    void* XR_MAY_ALIAS          next;
    uint32_t                    weightCount;
    float*                      weights;
    uint32_t                    confidenceCount;
    float*                      confidences;
    XrBool32                    isValid;
    XrBool32                    isEyeFollowingBlendshapesValid;
    XrFaceTrackingDataSource2FB dataSource;
    XrTime                      time;
} XrFaceExpressionWeights2FB;
For details, see XrFaceExpressionWeights2FB.
The following example demonstrates how to call the xrGetFaceExpressionWeights2FB function.
    XrFaceExpressionWeights2FB expressionWeights{XR_TYPE_FACE_EXPRESSION_WEIGHTS2_FB};
    expressionWeights.next = nullptr;
    expressionWeights.weights = weights_;
    expressionWeights.confidences = confidence_;
    expressionWeights.weightCount = XR_FACE_EXPRESSION2_COUNT_FB;
    expressionWeights.confidenceCount = XR_FACE_CONFIDENCE2_COUNT_FB;

    XrFaceExpressionInfo2FB expressionInfo{XR_TYPE_FACE_EXPRESSION_INFO2_FB};
    expressionInfo.time = GetPredictedDisplayTime();

    OXR(xrGetFaceExpressionWeights2FB_(faceTracker_, &expressionInfo, &expressionWeights));
The weights will be in the array allocated by the app. The following example demonstrates how to reference them.
    for (uint32_t i = 0; i < XR_FACE_EXPRESSION2_COUNT_FB; ++i) {
        // weights_[i] contains one specific weight
        ....
    }
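Raw weights can jitter slightly from frame to frame, so apps often smooth them before retargeting to an avatar. This is an illustrative technique, not part of the extension; the WeightSmoother class below applies simple exponential smoothing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Exponential smoothing over successive weight samples, a common way to
// reduce frame-to-frame jitter before retargeting weights onto an avatar.
// alpha in (0, 1] controls responsiveness; 1.0 means no smoothing.
class WeightSmoother {
 public:
    WeightSmoother(std::size_t count, float alpha)
        : smoothed_(count, 0.0f), alpha_(alpha) {}

    // Blend the new raw sample into the running estimate.
    const std::vector<float>& Update(const std::vector<float>& raw) {
        for (std::size_t i = 0; i < smoothed_.size(); ++i) {
            smoothed_[i] += alpha_ * (raw[i] - smoothed_[i]);
        }
        return smoothed_;
    }

 private:
    std::vector<float> smoothed_;
    float alpha_;
};
```

A smoother would typically be sized with XR_FACE_EXPRESSION2_COUNT_FB and updated once per frame with the weights array filled by xrGetFaceExpressionWeights2FB.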

Destroying a Face Tracker

It is good practice to release resources before the application exits by calling the xrDestroyFaceTracker2FB function.
    OXR(xrDestroyFaceTracker2FB_(faceTracker_));

Using the Visemes Extension

Visemes are the visual representations of the face and mouth while speaking certain sounds in a spoken language. The XR_META_face_tracking_visemes extension can be used when you need visemes as a stop-gap solution before your rig supports blendshapes.

Check Compatibility

You must check whether the user’s headset supports visemes. For a given XrInstance, validate this by retrieving the system properties with the xrGetSystemProperties function.
To do so, use the XrSystemFaceTrackingVisemesPropertiesMETA struct, which describes whether a system supports visemes. Its definition follows.
typedef struct XrSystemFaceTrackingVisemesPropertiesMETA {
    XrStructureType    type;
    void*              next;
    XrBool32           supportsVisemes;
} XrSystemFaceTrackingVisemesPropertiesMETA;
The following example demonstrates how to validate visemes support.
    XrSystemFaceTrackingVisemesPropertiesMETA faceTrackingVisemesSystemProperties{
        XR_TYPE_SYSTEM_FACE_TRACKING_VISEMES_PROPERTIES_META};
    XrSystemProperties systemProperties{XR_TYPE_SYSTEM_PROPERTIES,
        &faceTrackingVisemesSystemProperties};
    OXR(xrGetSystemProperties(instance, systemId, &systemProperties));
    if (faceTrackingVisemesSystemProperties.supportsVisemes) {
        // visemes are supported!
    }

Retrieving Viseme Weights

To retrieve visemes, you must call the xrGetFaceExpressionWeights2FB function while adding XrFaceTrackingVisemesMETA to the next chain of the XrFaceExpressionWeights2FB structure. Make sure you check the validity of the returned data by checking the isValid flag after the call to xrGetFaceExpressionWeights2FB. The definition of the XrFaceTrackingVisemesMETA struct follows.
typedef struct XrFaceTrackingVisemesMETA {
    XrStructureType    type;
    const void*        next;
    XrBool32           isValid;
    float              visemes[XR_FACE_TRACKING_VISEME_COUNT_META];
} XrFaceTrackingVisemesMETA;
The following example demonstrates how to call the xrGetFaceExpressionWeights2FB function to get visemes, instead of blendshapes.
    XrFaceExpressionWeights2FB expressionWeights{XR_TYPE_FACE_EXPRESSION_WEIGHTS2_FB};
    expressionWeights.weightCount = 0;
    expressionWeights.confidenceCount = 0;

    XrFaceTrackingVisemesMETA visemeInfo{XR_TYPE_FACE_TRACKING_VISEMES_META};
    expressionWeights.next = &visemeInfo;

    XrFaceExpressionInfo2FB expressionInfo{XR_TYPE_FACE_EXPRESSION_INFO2_FB};
    expressionInfo.time = GetPredictedDisplayTime();

    OXR(xrGetFaceExpressionWeights2FB_(faceTracker_, &expressionInfo, &expressionWeights));

    if (visemeInfo.isValid) {
        for (uint32_t i = 0; i < XR_FACE_TRACKING_VISEME_COUNT_META; ++i) {
            // visemeInfo.visemes[i] contains a weight of specific visemes
        }
    }
If you pass valid values to weightCount, weights, confidenceCount, and confidences, instead of assigning 0 to weightCount and confidenceCount, the xrGetFaceExpressionWeights2FB function returns both blendshapes and visemes.
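As a stop-gap for rigs that can show only one mouth shape at a time, one simple approach is to pick the viseme with the highest weight each frame. The DominantViseme helper below is an illustrative sketch, not part of the extension; the threshold guards against flickering mouth shapes at near-silence.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the index of the strongest viseme, or -1 if none exceeds the
// threshold (treat that as silence/neutral). In a real app the input would
// be the visemes array from XrFaceTrackingVisemesMETA.
int DominantViseme(const std::vector<float>& visemes, float threshold) {
    int best = -1;
    float bestWeight = threshold;
    for (std::size_t i = 0; i < visemes.size(); ++i) {
        if (visemes[i] > bestWeight) {
            bestWeight = visemes[i];
            best = static_cast<int>(i);
        }
    }
    return best;
}
```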

Visemes Visual Reference

Each viseme has three reference images: a mild version (XR_META_Face_Tracking_Viseme_<ID>), an emphasized version (XR_META_Face_Tracking_Viseme_<ID>_emp), and a rotated version (XR_META_Face_Tracking_Viseme_<ID>_rot). SIL has no emphasized image.

| Viseme | Phonemes  | Examples         |
|--------|-----------|------------------|
| SIL    | neutral   |                  |
| PP     | p, b, m   | put, bat, mat    |
| FF     | f, v      | fat, vat         |
| TH     | th        | think, that      |
| DD     | t, d      | tip, doll        |
| KK     | k, g      | call, gas        |
| CH     | tS, dZ, S | chair, join, she |
| SS     | s, z      | sir, zeal        |
| NN     | n, l      | lot, not         |
| RR     | r         | red              |
| AA     | A:        | car              |
| E      | e         | bed              |
| IH     | ih        | tip              |
| OH     | oh        | toe              |
| OU     | ou        | book             |