Abstract
Emotions are often manifested through facial expressions, where macro-expressions (MaEs) and micro-expressions (MEs) provide complementary visual cues for different affective applications. Spotting these intertwined expressions in long videos is an indispensable step in expression analysis and has attracted considerable interest. However, noise, irrelevant facial movements, and the confusion between MEs and MaEs make it difficult for deep learning models to learn discriminative intrinsic features. In this paper, we propose an efficient deep neural network, LGFINet, for MaE and ME spotting in long videos. Specifically, the model takes optical flow features as input and fuses local and global features to predict per-frame probability scores over an expression interval. To further strengthen the spotting network, LGFINet integrates multi-head self-attention and multi-head cross-attention into its backbone. Experiments on two public datasets, CAS(ME)2 and SAMM-LV, show that the proposed approach achieves F1 scores of 0.3710 and 0.4129 on SAMM-LV and CAS(ME)2, respectively. Extensive experiments further verify the robustness and superiority of LGFINet over competing models. The source code of LGFINet is available on GitHub (https://github.com/XionghuiYe/LGIF_Net).
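To make the local-global fusion idea concrete, the following is a minimal sketch, not the authors' implementation, of how per-frame local and global optical-flow features could be combined with multi-head self-attention and multi-head cross-attention to produce per-frame spotting scores. The module name, feature dimensions, and the score head are illustrative assumptions; only the use of self- and cross-attention over two feature streams follows the abstract.

```python
# Hedged sketch of local/global feature fusion with self- and cross-attention.
# All names, shapes, and the score head are assumptions for illustration only.
import torch
import torch.nn as nn


class LocalGlobalFusionBlock(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        # Self-attention refines each stream independently.
        self.local_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Cross-attention lets each stream attend to the other.
        self.local_to_global = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_to_local = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Per-frame score head over the concatenated fused streams.
        self.score_head = nn.Sequential(nn.LayerNorm(2 * dim), nn.Linear(2 * dim, 1))

    def forward(self, local_feat, global_feat):
        # local_feat, global_feat: (batch, frames, dim) per-frame optical-flow features.
        l, _ = self.local_self(local_feat, local_feat, local_feat)
        g, _ = self.global_self(global_feat, global_feat, global_feat)
        l2g, _ = self.local_to_global(l, g, g)   # local queries attend to global context
        g2l, _ = self.global_to_local(g, l, l)   # global queries attend to local detail
        fused = torch.cat([l + l2g, g + g2l], dim=-1)
        return self.score_head(fused).squeeze(-1)  # per-frame logits, shape (batch, frames)


# Example: 2 clips of 64 frames with 128-dim features per stream.
block = LocalGlobalFusionBlock()
scores = block(torch.randn(2, 64, 128), torch.randn(2, 64, 128))
print(scores.shape)  # torch.Size([2, 64])
```

In such a design, frame-level scores would then be thresholded and grouped into candidate MaE/ME intervals; how LGFINet actually performs this post-processing is not described in the abstract.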
Acknowledgements
This paper is supported by the National Natural Science Foundation of China (No. 62362037), the Natural Science Foundation of Jiangxi Province of China (No. 20224ACB202011), and the Jiangxi Province Graduate Innovation Special Fund Project (No. YC2023-X17).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Xie, Z., Ye, X. (2025). Local and Global Features Interactive Fusion Network for Macro- and Micro-expression Spotting in Long Videos. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15041. Springer, Singapore. https://doi.org/10.1007/978-981-97-8795-1_23