research-article

Joint Rotation-Invariance Face Detection and Alignment with Angle-Sensitivity Cascaded Networks

Authors:

Xu-Cheng Yin

MM '19: Proceedings of the 27th ACM International Conference on Multimedia

Pages 1473 - 1480

https://doi.org/10.1145/3343031.3350877

Published: 15 October 2019

Abstract

Because of large angle variations, especially in unconstrained scenarios, face detection and alignment remain challenging tasks. Existing methods usually perform face detection and alignment separately, which can substantially increase the computational cost. Moreover, this separation discards the inherent correlation between the two tasks. In this paper, we propose a simple but effective architecture, named Angle-Sensitivity Cascaded Networks (ASCN), for joint rotation-invariant face detection and alignment. ASCN consists of three cascaded stages. In the first stage, it predicts the rotation angle and proposes candidate bounding boxes simultaneously. In the second stage, it refines the candidates and their orientations. In the last stage, it jointly learns accurate bounding boxes and facial landmarks. In addition, to locate landmarks accurately on hard examples, we introduce a pose-equitable loss that balances the contribution of faces with large poses. Extensive experiments on benchmark datasets demonstrate the strong performance of our method. Notably, our method runs in real time for both detection and alignment on an ordinary CPU.
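The abstract describes the pipeline only at a high level, so the following Python sketch is not the authors' implementation. It assumes a coarse-to-fine orientation scheme in the spirit of Progressive Calibration Networks (Shi et al., CVPR 2018): stage 1 decides upright versus upside-down (0 or 180 degrees), stage 2 picks one of -90, 0 or 90 degrees, and the summed angle is used to move stage-3 landmarks between the calibrated (upright) frame and the input image. The helper names (`accumulate_angle`, `to_upright`, `to_original`, `pose_weighted_landmark_loss`) and the linear yaw weighting controlled by `lam` are hypothetical stand-ins; the paper's actual pose-equitable loss may be defined differently.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotate_point(p: Point, center: Point, angle_deg: float) -> Point:
    """Rotate p about center by angle_deg.

    Positive angles are counter-clockwise in an x-right, y-up frame
    (which appears clockwise on screen when image y points down).
    """
    rad = math.radians(angle_deg)
    c, s = math.cos(rad), math.sin(rad)
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (center[0] + c * dx - s * dy, center[1] + s * dx + c * dy)


def accumulate_angle(stage1_upside_down: bool, stage2_bin_deg: int) -> float:
    """Hypothetical coarse-to-fine face orientation, wrapped to [-180, 180)."""
    if stage2_bin_deg not in (-90, 0, 90):
        raise ValueError("stage-2 bin must be -90, 0 or 90 degrees")
    angle = (180.0 if stage1_upside_down else 0.0) + stage2_bin_deg
    return (angle + 180.0) % 360.0 - 180.0


def to_upright(points: List[Point], center: Point, face_angle_deg: float) -> List[Point]:
    """Undo the estimated face rotation so later stages see an upright face."""
    return [rotate_point(p, center, -face_angle_deg) for p in points]


def to_original(points: List[Point], center: Point, face_angle_deg: float) -> List[Point]:
    """Map points predicted in the upright frame back into the input image."""
    return [rotate_point(p, center, face_angle_deg) for p in points]


def pose_weighted_landmark_loss(pred: List[Point], target: List[Point],
                                yaw_deg: float, lam: float = 1.0) -> float:
    """Hypothetical pose-balanced loss.

    Mean squared landmark error, scaled by 1 + lam * min(|yaw|, 90) / 90
    so that large-pose faces contribute more to training.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    sq_err = sum((px - tx) ** 2 + (py - ty) ** 2
                 for (px, py), (tx, ty) in zip(pred, target))
    weight = 1.0 + lam * min(abs(yaw_deg), 90.0) / 90.0
    return weight * sq_err / len(pred)


if __name__ == "__main__":
    angle = accumulate_angle(stage1_upside_down=True, stage2_bin_deg=-90)
    print(angle)  # 90.0
    center = (64.0, 64.0)
    upright = [(50.0, 40.0), (78.0, 40.0), (64.0, 60.0)]  # toy landmarks
    print(to_original(upright, center, angle))
    print(pose_weighted_landmark_loss([(1.0, 0.0)], [(0.0, 0.0)], yaw_deg=90.0))  # 2.0
```

Keeping the early stages as small classifiers over a few angle bins, rather than regressing arbitrary angles, is the design choice PCN uses to stay fast; the abstract does not say whether ASCN uses the same bins.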

Index Terms

Computing methodologies
  Artificial intelligence
    Computer vision
      Computer vision problems
        Object detection

Published In

MM '19: Proceedings of the 27th ACM International Conference on Multimedia

October 2019

2794 pages

ISBN:9781450368896

DOI:10.1145/3343031

General Chairs:
Laurent Amsaleg, CNRS-IRISA, France
Benoit Huet, EURECOM, France
Martha Larson, Radboud University and TU Delft, Netherlands

Program Chairs:
Guillaume Gravier, CNRS-IRISA, France
Hayley Hung, Delft University of Technology, Netherlands
Chong-Wah Ngo, City University of Hong Kong, Hong Kong
Wei Tsang Ooi, National University of Singapore, Singapore

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Beijing Natural Science Foundation
Fundamental Research Funds for the Central Universities of Central South University
China Postdoctoral Science Foundation

Conference

MM '19

Sponsor:

SIGMM

MM '19: The 27th ACM International Conference on Multimedia

October 21 - 25, 2019

Nice, France

Acceptance Rates

MM '19 Paper Acceptance Rate 252 of 936 submissions, 27%;

Overall Acceptance Rate 995 of 4,171 submissions, 24%

Bibliometrics & Citations

Article Metrics

Total Citations: 8
Total Downloads: 231
Downloads (Last 12 months): 21
Downloads (Last 6 weeks): 5

Reflects downloads up to 26 Jul 2024

Cited By

Yuan S, Li J, Ren L, Chen Z. (2024). Multi-Frequency Field Perception and Sparse Progressive Network for low-light image enhancement. Journal of Visual Communication and Image Representation 100, 104133. Online publication date: May 2024.
https://doi.org/10.1016/j.jvcir.2024.104133

Wang C, Wu H, Jin Z, El Saddik A, Mei T, Cucchiara R, Bertini M, Tobon Vallejo D, Atrey P, Hossain M. (2023). FourLLIE: Boosting Low-Light Image Enhancement by Fourier Frequency Information. Proceedings of the 31st ACM International Conference on Multimedia, 7459-7469. Online publication date: 26 Oct 2023.
https://dl.acm.org/doi/10.1145/3581783.3611909

Wang C, Jin Z, El Saddik A, Mei T, Cucchiara R, Bertini M, Tobon Vallejo D, Atrey P, Hossain M. (2023). Brighten-and-Colorize: A Decoupled Network for Customized Low-Light Image Enhancement. Proceedings of the 31st ACM International Conference on Multimedia, 8356-8366. Online publication date: 26 Oct 2023.
https://dl.acm.org/doi/10.1145/3581783.3611907

Agrwal S, Sharma S, Kant V. (2023). A Review on Unconstrained Real-Time Rotation-Invariant Face Detection. 2023 3rd International Conference on Intelligent Communication and Computational Techniques (ICCT), 1-7. Online publication date: 19 Jan 2023.
https://doi.org/10.1109/ICCT56969.2023.10076222

Song A, Xu X, Zhai X. (2021). SACN: A Novel Rotating Face Detector Based on Architecture Search. Electronics 10(5), 558. Online publication date: 27 Feb 2021.
https://doi.org/10.3390/electronics10050558

Hao S, Wang Z, Sun F. (2021). Stacked Pyramid Attention Network for Object Detection. Neural Processing Letters 54(4), 2759-2782. Online publication date: 7 Apr 2021.
https://doi.org/10.1007/s11063-021-10505-x

Zhou L, Gu Y, Liang S, Lei B, Liu J. (2020). Direction-Sensitivity Features Ensemble Network for Rotation-Invariant Face Detection. Pattern Recognition and Computer Vision, 581-590. Online publication date: 15 Oct 2020.
https://doi.org/10.1007/978-3-030-60639-8_48

Zhou L, Gu Y, Wang P, Liu F, Liu J, Xu T. (2020). Rotation-Invariant Face Detection with Multi-task Progressive Calibration Networks. Pattern Recognition and Artificial Intelligence, 513-524. Online publication date: 9 Oct 2020.
https://doi.org/10.1007/978-3-030-59830-3_44
