Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks
Xuepeng Shi1,2, Shiguang Shan1,3, Meina Kan1,3, Shuzhe Wu1,2, Xilin Chen1
1 Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 CAS Center for Excellence in Brain Science and Intelligence Technology
{xuepeng.shi, shiguang.shan, meina.kan, shuzhe.wu, xilin.chen}@vipl.ict.ac.cn
Abstract
Rotation-invariant face detection, i.e. detecting faces with arbitrary rotation-in-plane (RIP) angles, is widely required in unconstrained applications but remains a challenging task due to the large variations in face appearance. Most existing methods compromise on speed or accuracy to handle the large RIP variations. To address this problem more efficiently, we propose Progressive Calibration Networks (PCN) to perform rotation-invariant face detection in a coarse-to-fine manner. PCN consists of three stages, each of which not only distinguishes faces from non-faces but also progressively calibrates the RIP orientation of each face candidate toward upright. By dividing the calibration process into several progressive steps and predicting only coarse orientations in the early stages, PCN achieves precise and fast calibration. By performing binary classification of face vs. non-face over gradually decreasing RIP ranges, PCN can accurately detect faces with full 360° RIP angles. These designs lead to a real-time rotation-invariant face detector. Experiments on multi-oriented FDDB and on a challenging subset of WIDER FACE containing rotated faces in the wild show that PCN achieves quite promising performance. A demo of PCN is available at https://github.com/Jack-CV/PCN.
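The coarse-to-fine idea can be illustrated with a small numerical sketch. It is not the authors' implementation: the function names are hypothetical, each stage is replaced by an oracle that always decides correctly, and the per-stage decisions (a 180° flip, then a turn of -90°, 0°, or +90°, then a fine correction) are assumptions chosen only to show how the remaining RIP range can shrink from stage to stage.

```python
def wrap(angle):
    """Map an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def progressive_calibration(rip_angle):
    """Return the residual RIP angle after each of three oracle stages.

    The stage decisions below are illustrative assumptions, not details
    taken from the abstract.
    """
    residuals = []
    angle = wrap(rip_angle)

    # Stage 1 (assumed): binary up/down decision; flip faces pointing down.
    if abs(angle) > 90.0:
        angle = wrap(angle - 180.0)
    residuals.append(angle)  # now within [-90, 90]

    # Stage 2 (assumed): three-way decision; turn by -90, 0, or +90 degrees.
    if angle > 45.0:
        angle -= 90.0
    elif angle < -45.0:
        angle += 90.0
    residuals.append(angle)  # now within [-45, 45]

    # Stage 3 (assumed): an oracle fine regressor removes what is left.
    angle = 0.0
    residuals.append(angle)
    return residuals


print(progressive_calibration(-120))  # [60.0, -30.0, 0.0]
```

In this toy setting each early stage makes only a coarse decision, yet the range that the next stage must handle is halved every time.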
Introduction
Face detection serves as an important component in computer vision systems that aim to extract information from face images. Practical applications, such as face recognition and face animation, all need to detect faces in input images quickly and accurately beforehand. As with many other vision tasks, the performance of face detection has been substantially improved by Convolutional Neural Networks (CNNs) [4, 17, 14, 13, 20, 23, 12, 7, 15]. CNN-based detectors enjoy the natural advantage of a strong capability for non-linear feature learning. However, most works focus on designing an effective detector for generic faces without considering specific scenarios, such as detecting faces with full rotation-in-plane (RIP) angles as shown in Figure 1, and they become less satisfactory in such complex applications. Face detection in full RIP, i.e. rotation-invariant face detection, is quite challenging because faces can be captured from almost any RIP angle, leading to significant divergence in face appearance. An accurate rotation-invariant face detector can greatly boost the performance of subsequent processes, e.g. face alignment and face recognition.
Figure 1. Many complex situations need rotation-invariant face detectors. The face boxes are the outputs of our detector, and the blue line indicates the orientation of faces.
Generally, there are three strategies for dealing with rotation variations: data augmentation, divide-and-conquer, and rotation router [18], detailed as follows.
Figure 2. Three strategies for rotation-invariant face detection: (a) Data Augmentation; (b) Divide-and-Conquer; (c) Estimate RIP angles with a router network and rotate face candidates to upright [18]. “FD-full”, “FD-up”, “FD-down”, “FD-left”, and “FD-right” mean face detectors trained with faces in full RIP angles, with faces facing up, with faces facing down, with faces facing left, and with faces facing right, respectively.
Data Augmentation is the most straightforward solution for training a rotation-invariant face detector: it augments the training data by uniformly rotating the upright faces to full RIP angles. The advantage of this strategy is that the same scheme as that of upright face detectors can be used directly, without extra operations. However, to characterize such large variations of face appearance in a single detector, one usually needs a large neural network with high time cost, which is not practical in many applications.
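As a minimal sketch of this augmentation step, the following pure-Python code draws a rotation angle uniformly from the full RIP range and rotates a small image about its centre. The helper names `rotate_image` and `augment_with_rotation`, the list-of-lists image representation, and the nearest-neighbour resampling are all assumptions made for illustration; a real training pipeline would typically use an image library instead.

```python
import math
import random


def rotate_image(image, angle_deg):
    """Rotate a 2-D list-of-lists image about its centre (nearest neighbour).

    Output pixels whose source falls outside the image are filled with 0.
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: locate the source pixel for each output pixel.
            dx, dy = x - cx, y - cy
            sx = int(round(cx + dx * cos_t + dy * sin_t))
            sy = int(round(cy - dx * sin_t + dy * cos_t))
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out


def augment_with_rotation(upright_face, rng=random):
    """Return (rotated_face, angle), with the angle drawn uniformly
    between -180 and 180 degrees."""
    angle = rng.uniform(-180.0, 180.0)
    return rotate_image(upright_face, angle), angle


face = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(rotate_image(face, 180))  # [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
```

Because every angle is equally likely, a single detector trained on such samples has to model the whole range of appearances at once, which is the source of the model-size and time cost noted above.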
Divide-and-Conquer is another commonly used method for dealing with this problem: it trains multiple detectors, each for a small range of RIP angles, that together cover the full RIP range, as in [8]. For example, four detectors covering faces facing up, down, left, and right, respectively, are constructed to detect faces in full RIP angles, as shown in Figure 2(b). Because each detector deals with only a small range of face appearance variations, a small neural network with low time cost is enough for each detector. However, the overall time cost of running multiple detectors grows, and more false alarms are easily introduced.
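The trade-off can be made concrete with a short sketch. Everything here is hypothetical: the function names, the convention that 0° means upright, the assignment of "right" to 90° and "left" to 270°, and the stub detectors that return fixed scores. The point is only that the orientation of a window is unknown at test time, so every orientation-specific detector has to run on it.

```python
def orientation_bucket(rip_angle):
    """Map a RIP angle in degrees to one of four 90-degree ranges.

    Assumed convention: 0 is upright ("up"); ranges are centred on
    0, 90, 180, and 270 degrees.
    """
    names = ["up", "right", "down", "left"]
    return names[int(((rip_angle + 45.0) % 360.0) // 90.0)]


def detect_all_orientations(window, detectors):
    """Run every orientation-specific detector on one window.

    `detectors` maps an orientation name to a callable that returns a
    face score. Returns the best orientation, its score, and the number
    of detector evaluations spent on this window.
    """
    scores = {name: detector(window) for name, detector in detectors.items()}
    best = max(scores, key=scores.get)
    return best, scores[best], len(scores)


# Stub detectors standing in for four small networks.
stubs = {
    "up": lambda w: 0.1,
    "down": lambda w: 0.9,
    "left": lambda w: 0.2,
    "right": lambda w: 0.3,
}
print(orientation_bucket(180))               # down
print(detect_all_orientations(None, stubs))  # ('down', 0.9, 4)
```

`orientation_bucket` shows how training faces would be split among the detectors, while `detect_all_orientations` shows the test-time cost: four evaluations per window, each of which is a separate opportunity for a false alarm.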
Rotation Router. The large appearance variations of rotated faces come from their diverse RIP angles. Thus, a natural way is to estimate the faces' RIP angles explicitly, and then rotate them to upright, significantly reducing appearance variations of faces. In [18], a router network is firstly used to estima