Microsoft Kinect is an amazing device with state-of-the-art body tracking capabilities. The newest SDK version, however, adds some great new magic: face tracking.
Utilizing the infrared and color streams, the Kinect sensor can accurately track more than a thousand facial points. Oddly, Microsoft has implemented two APIs for tracking a face: Face Basics and Face HD. The first one only provides limited capabilities in 2D space. Face HD, though, includes a ton of hidden goodies for tracking a face in 3D space.
Vitruvius is featured on the official Microsoft Kinect website and Channel 9.
After reading this article, you’ll be able to understand how face tracking works, how you can access the points in the 3D space and how to display them in the 2D space.
Introducing the Face class
Even though the native Kinect API is powerful, it's quite messy. Microsoft only exposes a huge array of face vertices, along with a big enumeration. So, while building Vitruvius, we re-imagined the whole face tracking experience from a developer's perspective.
Please, meet the all-new Face class. The Face class contains all of the information you'd ever wish to know. Using the Face class, accessing properties like the nose, eyes, jaw, forehead, cheeks, or chin is now a matter of one line of code.
Vitruvius extends the native APIs and exposes a single, unified, powerful interface.
Let me show you how.
Accessing an HD Face object using C#
Just like every Kinect stream, HD Face has its own frame type. To properly use a Face frame, you need to include the following namespace:
using LightBuzz.Vitruvius;
After that, you need to subscribe to the FrameArrived event, just like you’d do for the Color, Depth, Infrared, or Body streams. If you are using Unity, you do not need to subscribe to the event, but simply check the frame readers in your Update method.
Once you've done those trivial tasks, you can simply call the Face() extension method.
// Private members
private KinectSensor _sensor = null;
private BodyFrameSource _bodySource = null;
private BodyFrameReader _bodyReader = null;
private HighDefinitionFaceFrameSource _faceSource = null;
private HighDefinitionFaceFrameReader _faceReader = null;

// Initialization
_sensor = KinectSensor.GetDefault();

if (_sensor != null)
{
    _bodySource = _sensor.BodyFrameSource;
    _bodyReader = _bodySource.OpenReader();
    _bodyReader.FrameArrived += BodyReader_FrameArrived;

    _faceSource = new HighDefinitionFaceFrameSource(_sensor);
    _faceReader = _faceSource.OpenReader();
    _faceReader.FrameArrived += FaceReader_FrameArrived;

    _sensor.Open();
}
// Event handlers
private void BodyReader_FrameArrived(object sender, BodyFrameArrivedEventArgs args)
{
    using (var frame = args.FrameReference.AcquireFrame())
    {
        if (frame != null)
        {
            Body body = frame.Bodies().Closest();

            if (!_faceSource.IsTrackingIdValid)
            {
                if (body != null)
                {
                    _faceSource.TrackingId = body.TrackingId;
                }
            }
        }
    }
}
private void FaceReader_FrameArrived(object sender, HighDefinitionFaceFrameArrivedEventArgs args)
{
    using (var frame = args.FrameReference.AcquireFrame())
    {
        if (frame != null && frame.IsFaceTracked)
        {
            Face face = frame.Face();
        }
    }
}
Accessing the HD Face properties
Let's now get to the fun part. Every facial characteristic is a property of the Face class. Every facial point is expressed as a CameraSpacePoint (X, Y, and Z values).
var eyeLeft = face.EyeLeft;
var eyeRight = face.EyeRight;
var cheekLeft = face.CheekLeft;
var cheekRight = face.CheekRight;
var nose = face.Nose;
var mouth = face.Mouth;
var chin = face.Chin;
var forehead = face.Forehead;
Insanely easy, right? Here's the result:
Even if you need to access the entire collection of vertices (more than 1000 points), Vitruvius has got you covered:
var vertices = face.Vertices;
Yeah, the result is really creepy! Don't try it at home alone!
Accessing the HD Face methods
What if you need to know more about the points that form each facial feature? For example, what if you need to get the outline of the eyes or the mouth? For such purposes, the Face class includes the following methods:
var eyeLeftOutline = face.EyeLeftPoints();
var eyeRightOutline = face.EyeRightPoints();
This is the result of drawing the contour of each eye, after calling the EyeLeftPoints() and EyeRightPoints() methods:
Converting from 3D to 2D coordinates
Accessing the HD Face features as a list of CameraSpacePoints gives you all the information about the 3D coordinates. To display those points in the 2D screen space, though, you’ll need to convert the 3D coordinates into 2D coordinates.
Kinect uses the CoordinateMapper class to convert between coordinates from different spaces. The Color space is an array of 1920×1080 pixels. The Depth & Infrared space is an array of 512×424 pixels. Vitruvius simplifies the coordinate mapping process with the handy ToPoint() method.
Converting from the 3D space to the Color space:
var nosePoint = face.Nose.ToPoint(Visualization.Color);
Converting from the 3D space to the Depth or Infrared space:
var nosePoint = face.Nose.ToPoint(Visualization.Depth);
Using the coordinate mapping process, you can now display the points on screen using Unity or XAML.
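For the XAML route, here's a minimal sketch (it assumes a Canvas named canvas sized to the 1920×1080 color frame, and an Ellipse field created once; the names are hypothetical):
// Created once and added to the Canvas, e.g. during initialization:
// _noseEllipse = new Ellipse { Width = 10, Height = 10, Fill = new SolidColorBrush(Colors.Red) };
// canvas.Children.Add(_noseEllipse);

// On every face frame, move the ellipse to the mapped 2D position of the nose.
var nosePoint = face.Nose.ToPoint(Visualization.Color);

Canvas.SetLeft(_noseEllipse, nosePoint.X - _noseEllipse.Width / 2.0);
Canvas.SetTop(_noseEllipse, nosePoint.Y - _noseEllipse.Height / 2.0);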
Supported platforms
Vitruvius HD Face supports the following platforms and frameworks:
- Unity3D
- WPF / .NET 4.5+
- Windows Store
Frequently Asked Questions
Finally, let me shed some light on a few topics almost every Kinect developer needs to know about.
1) What is the optimal distance from the sensor?
The sensor can accurately track a face between 40 centimeters and 2 meters. For best results, the optimal distance is 60-90cm.
2) What is the optimal rotation angle?
Kinect Face tracking works best when you are facing the sensor directly (en face). However, the tracking algorithm is pretty decent even if you rotate your head up to 50 degrees to the left or right. Face tracking won't work if your head is rotated e.g. 90 degrees to one side.
3) What about the lighting?
As mentioned above, Kinect face tracking relies strongly on the Color stream. As a result, the room should have a decent amount of lighting. Also, avoid pointing laser beams directly at the sensor.
4) Can I find the documentation online?
Definitely. You can check the documentation online.
So, this is it! How are you planning to use the HD Face capabilities in your Kinect apps? Let me know in the comments below.
‘Till the next time, keep Kinecting!
Get Vitruvius
As you can see, Vitruvius is helping innovative companies create Kinect apps fast. Vitruvius simplifies Kinect development, so you can now focus on what's really important: your app, your research, and your customers. Why not give it a try?
Excuse me, Mr. Pterneas,
As I told you, I am trying to write a simple program that detects whether the arm is up or down, the hand is near the mouth or not, the head is up or down, and whether the fingers hold a spoon or food.
I started with "hand near the mouth or not" (I have used HD Face).
Can you check whether the code and my idea are right?
using WindowsPreview.Kinect;
using Microsoft.Kinect.Face;
using LightBuzz.Vitruvius;

// The Blank Page item template is documented at http://go.microsoft.com/fwlink/?LinkId=234238
namespace Kinect2FaceHD_WinRT
{
    /// <summary>
    /// An empty page that can be used on its own or navigated to within a Frame.
    /// </summary>
    public sealed partial class MainPage : Page
    {
        private KinectSensor _sensor = null;
        private BodyFrameSource _bodySource = null;
        private BodyFrameReader _bodyReader = null;
        private HighDefinitionFaceFrameSource _faceSource = null;
        private HighDefinitionFaceFrameReader _faceReader = null;

        public MainPage()
        {
            InitializeComponent();

            _sensor = KinectSensor.GetDefault();

            if (_sensor != null)
            {
                _bodySource = _sensor.BodyFrameSource;
                _bodyReader = _bodySource.OpenReader();
                _bodyReader.FrameArrived += BodyReader_FrameArrived;

                _faceSource = new HighDefinitionFaceFrameSource(_sensor);
                _faceReader = _faceSource.OpenReader();
                _faceReader.FrameArrived += FaceReader_FrameArrived;

                _sensor.Open();
            }
        }

        private void BodyReader_FrameArrived(object sender, BodyFrameArrivedEventArgs e)
        {
            using (var frame = e.FrameReference.AcquireFrame())
            {
                if (frame != null)
                {
                    Body body = frame.Bodies().Closest();

                    if (!_faceSource.IsTrackingIdValid)
                    {
                        if (body != null)
                        {
                            _faceSource.TrackingId = body.TrackingId;
                        }
                    }
                }
            }
        }

        private void FaceReader_FrameArrived(object sender, HighDefinitionFaceFrameArrivedEventArgs e)
        {
            using (var frame = e.FrameReference.AcquireFrame())
            {
                if (frame != null && frame.IsFaceTracked)
                {
                    Face face = frame.Face();
                    var mouth = face.Mouth;
                    var nose = face.Nose;
                    var neck = face.Neck;
                    var hand = body.Joints[JointType.HandRight].Position;
                    var distance = mouth.Length(hand);

                    if (distance < 0.1)
                        // display the result on screen (how??) the hand is close to the mouth
                }
            }
        }
One last thing: how can I display the result on screen? What is the simplest way to use a Canvas?
Hi Hannan. You need to declare the body object as a private variable in your class. To display the results, use a XAML Canvas or a TextBlock.
using WindowsPreview.Kinect;
using Microsoft.Kinect.Face;
using LightBuzz.Vitruvius;

namespace Kinect2FaceHD_WinRT
{
    public sealed partial class MainPage : Page
    {
        private KinectSensor _sensor = null;
        private BodyFrameSource _bodySource = null;
        private BodyFrameReader _bodyReader = null;
        private HighDefinitionFaceFrameSource _faceSource = null;
        private HighDefinitionFaceFrameReader _faceReader = null;
        private Body body = null;

        public MainPage()
        {
            InitializeComponent();

            _sensor = KinectSensor.GetDefault();

            if (_sensor != null)
            {
                _bodySource = _sensor.BodyFrameSource;
                _bodyReader = _bodySource.OpenReader();
                _bodyReader.FrameArrived += BodyReader_FrameArrived;

                _faceSource = new HighDefinitionFaceFrameSource(_sensor);
                _faceReader = _faceSource.OpenReader();
                _faceReader.FrameArrived += FaceReader_FrameArrived;

                _sensor.Open();
            }
        }

        private void BodyReader_FrameArrived(object sender, BodyFrameArrivedEventArgs e)
        {
            using (var frame = e.FrameReference.AcquireFrame())
            {
                if (frame != null)
                {
                    body = frame.Bodies().Closest();

                    if (!_faceSource.IsTrackingIdValid)
                    {
                        if (body != null)
                        {
                            _faceSource.TrackingId = body.TrackingId;
                        }
                    }
                }
            }
        }

        private void FaceReader_FrameArrived(object sender, HighDefinitionFaceFrameArrivedEventArgs e)
        {
            using (var frame = e.FrameReference.AcquireFrame())
            {
                if (frame != null && frame.IsFaceTracked)
                {
                    var face = frame.Face();
                    var mouth = face.Mouth;
                    var nose = face.Nose;
                    var neck = face.Neck;
                    var hand = body.Joints[JointType.HandRight].Position;
                    var distance = mouth.Length(hand);

                    if (distance < 0.1) // ---> Experiment with this value.
                    {
                        System.Diagnostics.Debug.WriteLine("Hand close to mouth");
                        // Display the results in a TextBlock, Canvas, or any other visual element, based on your UI.
                    }
                }
            }
        }
    }
}
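For the Canvas part of your question, here is a minimal sketch (the ellipse and its name are hypothetical; it assumes a Canvas named canvas in your XAML and uses the ToPoint extension described in the article):
// Created once (e.g. in the constructor) and added to the Canvas:
// _handEllipse = new Ellipse { Width = 20, Height = 20, Fill = new SolidColorBrush(Colors.Red) };
// canvas.Children.Add(_handEllipse);

// Inside FaceReader_FrameArrived, after computing the hand position:
var handPoint = hand.ToPoint(Visualization.Color);

Canvas.SetLeft(_handEllipse, handPoint.X - _handEllipse.Width / 2.0);
Canvas.SetTop(_handEllipse, handPoint.Y - _handEllipse.Height / 2.0);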
We are using Kinect v2 and we want an outline for the upper and lower lip. Right now we have picked the outline points, added them to a list, and we are using this list. Is this the best way, or is there another way to achieve this?
Hello Ajay. I guess you are using the Vertices property to access the face points. If yes, that’s the most efficient way.
I was just wondering how I would go about saving or exporting the face points to a modeling program such as 3ds Max.
Hello. You can export the HD Face points in CSV format (or another format), but you cannot load them directly into 3ds Max.
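As a rough sketch of the CSV export (assuming the Vertices property described in the article and standard .NET file I/O on WPF; the file name is arbitrary):
// Write one vertex per line as X,Y,Z.
var csv = new System.Text.StringBuilder();
csv.AppendLine("X,Y,Z");

foreach (var vertex in face.Vertices)
{
    csv.AppendLine(vertex.X + "," + vertex.Y + "," + vertex.Z);
}

System.IO.File.WriteAllText("face.csv", csv.ToString());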
In this part, I faced this kind of error.
// set the high definition face source
_faceSource = new HighDefinitionFaceFrameSource(_sensor);
_faceReader = _faceSource.OpenReader();
_faceReader.FrameArrived += FaceReader_FrameArrived;
"The name 'FaceReader_FrameArrived' does not exist in the current context." <<< This is the error.
You need to add a method named FaceReader_FrameArrived to your class. For example:
private void FaceReader_FrameArrived(object sender, HighDefinitionFaceFrameArrivedEventArgs args)
{
    using (var frame = args.FrameReference.AcquireFrame())
    {
        if (frame != null && frame.IsFaceTracked)
        {
            Face face = frame.Face();
        }
    }
}
Hi Pterneas:
I used the following to create the outline of the left eye, but all I get are 4 points: left eye top center, bottom center, left side, and right side. Why don't I get the points as in your picture? Visual Studio only shows a count of 4 returning from the call "var eyeLeftOutline = face.EyeLeftPoints();".
Here is the code I used
// Display Eye Outline points.
var eyeLeftOutline = face.EyeLeftPoints();

Ellipse ellipse;

// Display all face points.
if (_ellipses.Count == 0)
{
    for (int index = 0; index < eyeLeftOutline.Count; index++)
    {
        ellipse = new Ellipse
        {
            Width = 1.5,
            Height = 1.5,
            Fill = new SolidColorBrush(Colors.Pink)
        };

        _ellipses.Add(ellipse);
        canvas.Children.Add(ellipse);
    }
}

for (int index = 0; index < eyeLeftOutline.Count; index++)
{
    ellipse = _ellipses[index];

    CameraSpacePoint vertex = eyeLeftOutline[index];
    PointF point = vertex.ToPoint(Visualization.Infrared);

    Canvas.SetLeft(ellipse, point.X - ellipse.Width / 2.0);
    Canvas.SetTop(ellipse, point.Y - ellipse.Height / 2.0);
}
Hello Duane. The Free version includes the Vertices property, as well as the primary (most common) face points. The Academic and Premium versions also include methods that allow you to get all of the available points of a particular face part. For example, the methods LeftEye() and LeftEyebrow() would give you all of the available points of the corresponding face parts.
Hi Pterneas,
I am looking for a Unity Kinect v2 package to start up my AR photo booth. I am currently adapting "Kinect v2 Examples with MS-SDK", which is a common package on the Asset Store. However, my application needs face tracking at a shorter distance. The mentioned package detects the whole body even for face tracking, which limits the detection distance. I would like to know: (Q1) Does Vitruvius also detect the whole body skeleton before tracking the face? (Q2) Does your package already include something similar for an AR photo booth (such as masking, an AR hat, etc.)? Many thanks.
Hello, Rex. To track a Face, Kinect needs to track a Body first. Using Vitruvius, once the body is detected, the face will be tracked even if you come close to the Kinect (e.g. 1 meter). So, you can detect the body when someone enters the room/booth and then keep the face information as the person comes closer to the camera. Vitruvius includes an AR basketball sample that positions a ball on top of the person's hand. You can use that sample to project any kind of 2D or 3D object on top of any human body joint. Feel free to contact our team if you need more information.
Thank you!
Hi Pterneas,
Thanks for your prompt reply. I downloaded the free version and ran it in WPF, but the HD Face tracking is not included. I want to try its performance before I buy. According to your website, HD Face should be included in the free trial version. Did I do anything wrong?
Hi Rex. Have you downloaded the latest version? If so, you can check the FacePage.xaml and FacePage.xaml.cs files. If you cannot find those files, just let me know.
I’m having a tough time figuring this out: is it possible to use HD face tracking on more than one face simultaneously? (Not “will it perform well” but “is it supported by the Kinect SDK”?)
Thanks for your reply!
Hi, Roger. Sure, it is definitely feasible. You’ll need 1-6 HighDefinitionFaceFrameReaders and HighDefinitionFaceFrameSources. Each Face reader only tracks one face. You can declare up to 6 face readers.
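A minimal sketch of the idea (assuming frame.Bodies() returns the tracked bodies as in the samples above; the field names are hypothetical):
// One HD Face source/reader pair per body slot (the Kinect SDK tracks up to 6 bodies).
_faceSources = new HighDefinitionFaceFrameSource[6];
_faceReaders = new HighDefinitionFaceFrameReader[6];

for (int i = 0; i < _faceSources.Length; i++)
{
    _faceSources[i] = new HighDefinitionFaceFrameSource(_sensor);
    _faceReaders[i] = _faceSources[i].OpenReader();
    _faceReaders[i].FrameArrived += FaceReader_FrameArrived;
}

// In the body frame handler, feed each source a different body's tracking ID.
int slot = 0;

foreach (var body in frame.Bodies())
{
    if (body != null && body.IsTracked && slot < _faceSources.Length)
    {
        if (!_faceSources[slot].IsTrackingIdValid)
        {
            _faceSources[slot].TrackingId = body.TrackingId;
        }

        slot++;
    }
}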
Hello Vangos,
Thanks for your site and its content.
I changed the HD Face source code so that it now displays the color stream.
But there is a problem!
The points aren't exactly on my face (a few centimeters to the left).
What shall I do?
Thanks
Hello, Amin. Thank you for your comment. To properly draw the points on top of the RGB frame, you need to use the ToPoint method with the Color parameter:
var position2D = face.Forehead.ToPoint(Visualization.Color);
Hi Vangos,
I created a program in accordance with the instructions on the website.
I worked with the points in face.Vertices.
But the points are always shaking, even when the face is completely still.
Small vibrations do not cause me big problems, but most of the time the points are heavily displaced. For example, points on the face jump about 1 to 2 cm.
I need stable points without shaking.
What is the problem? How can I fix this?
Thank you
Hello. Kinect has some jittering, but 1-2 cm is not normal. HD Face works well in good lighting conditions. Consider checking the lighting and keeping your face at a distance of 1-2 meters from the camera.
Other than that, you can manually add real-time smoothing. You can accomplish real-time smoothing by taking the median position of every point over the last 15 frames.
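As a rough sketch of that median filter (the names and window size are just for illustration; it requires System.Collections.Generic and System.Linq):
// Keep the last 15 samples of a coordinate and return their median.
private readonly Queue<float> _noseXHistory = new Queue<float>();

private static float Median(Queue<float> history, float newValue, int windowSize = 15)
{
    history.Enqueue(newValue);

    if (history.Count > windowSize)
    {
        history.Dequeue();
    }

    var sorted = history.OrderBy(v => v).ToList();

    return sorted[sorted.Count / 2];
}

// Usage, once per face frame:
// float smoothNoseX = Median(_noseXHistory, face.Nose.X);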
Hi,
I hope you are well.
I need to know the point indexes.
What do I do?
(If it's necessary to buy a package, which package should I buy?)
Hello Amin. The HD Face API includes the indexes of every face point. Additionally, the API includes methods that let you access the exact face part an index belongs to. For example, the following methods will return a list of points that belong to a particular face part:
var foreheadPoints = face.Forehead();
var leftTemplePoints = face.LeftTemple();
var rightTemplePoints = face.RightTemple();
var cheekLeftPoints = face.LeftCheek();
var cheekRightPoints = face.RightCheek();
var eyebrowLeftPoints = face.LeftEyebrow();
var eyebrowRightPoints = face.RightEyebrow();
var eyeLeftPoints = face.LeftEye();
var eyeRightPoints = face.RightEye();
var cheekboneLeftPoints = face.LeftCheekbone();
var cheekboneRightPoints = face.RightCheekbone();
var nosePoints = face.Nose();
var nostrilLeftPoints = face.LeftNostril();
var nostrilRightPoints = face.RightNostril();
var mustachePoints = face.Mustache();
var upperLipPoints = face.UpperLip();
var lowerLipPoints = face.LowerLip();
var mouthPoints = face.Mouth();
var jawLeftPoints = face.LeftJaw();
var jawRightPoints = face.RightJaw();
var chinPoints = face.Chin();
All these methods are included in Vitruvius Premium.
Hi Vangos,
I'm working on a program that will track specific points like Eyebrow points and Mouth points. I would really like those specific points to be a different color than the other Face points. However, I think it would be easiest to do this by using the indices of those points and setting the Ellipse color to a color different from the rest of the face. The methods you developed, such as getEyebrowPoints(), only return a List of CameraSpacePoint, and I am unable to get at the index numbers. Is there a way I can get those specific index numbers?
A work-around I've come up with is to make Lists of the CameraSpacePoints I need with your functions, make the Ellipses larger, and in effect display these points over the original Face points. Does this sound like the best method? I would like to keep displaying all the other Face points. Thank you for your time and for developing Vitruvius!
Matt
Hello Matt. Using the latest version of Vitruvius, you can access the indices of the points you need using the following properties:
int[] indices = FaceExtensions.CENTER_POINTS;
int[] indices = FaceExtensions.CONTOUR;
int[] indices = FaceExtensions.FOREHEAD;
int[] indices = FaceExtensions.LEFT_TEMPLE;
int[] indices = FaceExtensions.RIGHT_TEMPLE;
int[] indices = FaceExtensions.LEFT_CHEEK;
int[] indices = FaceExtensions.RIGHT_CHEEK;
int[] indices = FaceExtensions.LEFT_EYEBROW;
int[] indices = FaceExtensions.RIGHT_EYEBROW;
int[] indices = FaceExtensions.LEFT_EYE;
int[] indices = FaceExtensions.RIGHT_EYE;
int[] indices = FaceExtensions.LEFT_CHEEKBONE;
int[] indices = FaceExtensions.RIGHT_CHEEKBONE;
int[] indices = FaceExtensions.NOSE;
int[] indices = FaceExtensions.LEFT_NOSTRIL;
int[] indices = FaceExtensions.RIGHT_NOSTRIL;
int[] indices = FaceExtensions.MUSTACHE;
int[] indices = FaceExtensions.UPPER_LIP;
int[] indices = FaceExtensions.MOUTH;
int[] indices = FaceExtensions.LOWER_LIP;
int[] indices = FaceExtensions.LEFT_JAW;
int[] indices = FaceExtensions.RIGHT_JAW;
int[] indices = FaceExtensions.CHIN;
The above properties will give you an array of integers. Each integer corresponds to the appropriate vertex.
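For example, a rough sketch of coloring the left-eyebrow points differently (assuming one Ellipse per vertex, created in the same order as face.Vertices, as in the drawing samples above):
// Vertex indices of the left eyebrow, as exposed by Vitruvius.
var eyebrowIndices = new HashSet<int>(FaceExtensions.LEFT_EYEBROW);

for (int index = 0; index < _ellipses.Count; index++)
{
    _ellipses[index].Fill = eyebrowIndices.Contains(index)
        ? new SolidColorBrush(Colors.Yellow)
        : new SolidColorBrush(Colors.LightBlue);
}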
Hi Vitruvius users, our question relates to coordinate system transformations. Is there a simple way to transform the coordinate system of the Kinect sensor to that of the user’s head (axis origin being the head pivot point) at every time sample? Our goal is to track facial feature movements relative to their position on the face. To do this, we need to remove the rotational and translational movement of the head (a rigid body). For example, at rest a person’s vertical position of his or her eyebrow is considered 0 (regardless of how head is moving in space), but when raised would be a positive value (following right-hand-rule convention). We do not want to calculate the magnitude between features and a fixed reference point on head (e.g., Euclidean distance between nose & brow), as this loses directionality along the three separate axes which is important for distinguishing emotions (e.g., surprise/fear ~= upward vertical brow movement, anger ~= downward vertical brow movement).
Hi Matt. I think you can achieve this with a “calibration pose”. A calibration pose could be taken automatically (e.g. “when the rotation of the head is 0 degrees”) or manually (e.g. “Stay still to calibrate”).
You can then compare the relative positions of the key points and the calibrated data.
Hi Vangos, thank you very much! Yes that’s a good idea, we’ll need to take an initial calibration pose of each person to identify the coordinates of the resting position of their particular facial features. Do you know what function or math operation we can use to perform a coordinate transformation in 3D space in C#? We want to rotate from the coordinate system of the Xbox sensor (world coordinate system), to that of the person’s head (local coordinate system). Wikipedia describes the operation, but we’re unsure how to implement this in code – https://en.wikipedia.org/wiki/Quaternions_and_spatial_rotation#Using_quaternion_rotations
Thanks again,
Matt.
Hi Matt. You could simply get the position of the head joint and set it as your reference/camera point. So, the position of all the other joint positions would be recalculated based on the head position. The easiest way would be to subtract the corresponding vectors (CameraSpacePoints).
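A rough sketch of that subtraction (using the Head joint of the Body and any face point; the names are just for illustration):
// Express a face point relative to the head joint, so the head becomes the origin.
CameraSpacePoint head = body.Joints[JointType.Head].Position;
CameraSpacePoint brow = face.Forehead; // or any other face point

var browRelative = new CameraSpacePoint
{
    X = brow.X - head.X,
    Y = brow.Y - head.Y,
    Z = brow.Z - head.Z
};

// Comparing browRelative with its calibrated resting value removes the head's
// translation; compensating for rotation additionally requires the head
// orientation (e.g. a quaternion or rotation matrix) applied on top.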
Hello Vangos!
I'm amazed at how much useful information I found on your website. You're a savior!
I'm not a programmer, so my questions will be rather silly and general, but it's very important to me. I wonder if Kinect + HD Face could be used for efficient and accurate performance capture. I know the functionality is not actually there and it's rather access to the data, but could I ask you to answer a few basic questions and give your opinion on the whole idea?
1. Let's say I want to use Kinect mounted on a static rig and pointed at a human face. Let's also assume that the face is nicely and uniformly lit by lamps mounted on the same rig. Question: how accurate can the tracking get? Will it catch subtle twitches and micro-movements (basically human emotions)? Basically, how powerful can it be as a facial capture solution with the proper lighting/distance/mount?
2. Let's say that I'm not interested in live data for an application. Is it possible to record all 1,000+ point positions at least 30 frames per second and use them later as a point cloud animation in 3D software?
3. And a more general question: is it possible to use more than one Kinect in order to get more accurate data (less noise/jitter)?
Thanks in advance and sorry for my english / I’m from PL 😉
Hi Jakub. Thank you for your comment! Please, check my replies below:
1) Kinect Face tracking has limitations. You won’t be able to detect subtle twitches or micro-movements. This is why our team has been developing a new Vitruvius Face SDK.
2) You could record as many points as you like. My recommendation is to serialize the Face to JSON format and store one JSON file per frame (see the sketch after this list). Then, you'll be able to parse the JSON data and re-create the movements. Kinect Face relies on the RGB color frames, so you'll have a framerate between 15 and 30, depending on your computer.
3) The Kinect drivers do not recognize more than one sensor per computer. To use multiple Kinect devices, you'll need multiple computers. Combining data from different sensors will give you slightly more accurate results.
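Regarding point 2, a rough sketch of the per-frame dump (assuming Json.NET is referenced and you are on WPF; the file naming is arbitrary):
// Serialize the HD Face vertices of the current frame to a JSON file.
var json = Newtonsoft.Json.JsonConvert.SerializeObject(face.Vertices);
var path = string.Format("face_{0}.json", DateTime.Now.Ticks);

System.IO.File.WriteAllText(path, json);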