Paper Title
IR-Depth Face Detection and Lip Localization Using Kinect V2
Abstract
Face detection and lip localization are two essential building blocks in the development of audio-visual
automatic speech recognition (AV-ASR) systems. In many earlier works, face detection and lip localization were
conducted under ideal lighting conditions with simple backgrounds. However, such conditions are seldom met in real-world
applications. In this paper, we present an approach to face detection and lip localization that is invariant to lighting
conditions. This is done by employing infrared and depth images captured by the Kinect V2 device. First, we present the use
of infrared images for face detection and highlight its improved performance over the traditional method. Second, we use
the face’s inherent depth information to reduce the search area for the lips by developing a nose point detection method. Third, we
further reduce the search area by using a depth segmentation algorithm to separate the face from its background. Finally,
with the reduced search range, we present a method for lip localization based on depth gradients. Experimental results
demonstrate an accuracy of 100% for face detection and 96% for lip localization.
Keywords— Face Detection; Lip Localization; Audio-Visual Automatic Speech Recognition; Depth; Infrared (IR); Data
Fusion; Kinect