Abstract
Emotions are often manifested through facial expressions, where macro-expressions (MaEs) and micro-expressions (MEs) provide complementary visual cues for different affective applications. Spotting these intertwined expressions in long videos is an indispensable step in expression analysis and has attracted considerable interest. However, noise, irrelevant facial movements, and the confusion between MEs and MaEs make it difficult for deep learning models to learn discriminative intrinsic features. In this paper, we propose an efficient deep neural network, LGFINet, for MaE and ME spotting in long videos. Specifically, the model takes optical flow features as input and fuses local and global features to predict per-frame probability scores over an expression interval. To further strengthen the spotting network, LGFINet integrates multi-head self-attention and multi-head cross-attention into its backbone. Experiments on two public datasets, CAS(ME)2 and SAMM-LV, show that the proposed approach achieves F1 scores of 0.3710 and 0.4129 on SAMM-LV and CAS(ME)2, respectively. Extensive experiments further verify the robustness and superiority of LGFINet over competing models. The source code of LGFINet is available on GitHub (https://github.com/XionghuiYe/LGIF_Net).
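To make the local-global fusion idea concrete, the following is a minimal sketch, not the authors' implementation, of how per-frame local and global optical-flow features could be combined with multi-head self-attention and multi-head cross-attention to produce per-frame spotting scores. The module name, feature dimensions, and the score head are illustrative assumptions; only the use of self- and cross-attention over two feature streams follows the abstract.

```python
# Hedged sketch of local/global feature fusion with self- and cross-attention.
# All names, shapes, and the score head are assumptions for illustration only.
import torch
import torch.nn as nn


class LocalGlobalFusionBlock(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        # Self-attention refines each stream independently.
        self.local_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Cross-attention lets each stream attend to the other.
        self.local_to_global = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_to_local = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Per-frame score head over the concatenated fused streams.
        self.score_head = nn.Sequential(nn.LayerNorm(2 * dim), nn.Linear(2 * dim, 1))

    def forward(self, local_feat, global_feat):
        # local_feat, global_feat: (batch, frames, dim) per-frame optical-flow features.
        l, _ = self.local_self(local_feat, local_feat, local_feat)
        g, _ = self.global_self(global_feat, global_feat, global_feat)
        l2g, _ = self.local_to_global(l, g, g)   # local queries attend to global context
        g2l, _ = self.global_to_local(g, l, l)   # global queries attend to local detail
        fused = torch.cat([l + l2g, g + g2l], dim=-1)
        return self.score_head(fused).squeeze(-1)  # per-frame logits, shape (batch, frames)


# Example: 2 clips of 64 frames with 128-dim features per stream.
block = LocalGlobalFusionBlock()
scores = block(torch.randn(2, 64, 128), torch.randn(2, 64, 128))
print(scores.shape)  # torch.Size([2, 64])
```

In such a design, frame-level scores would then be thresholded and grouped into candidate MaE/ME intervals; how LGFINet actually performs this post-processing is not described in the abstract.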
Acknowledgements
This paper is supported by the National Natural Science Foundation of China (No. 62362037), the Natural Science Foundation of Jiangxi Province of China (No. 20224ACB202011), and the Jiangxi Province Graduate Innovation Special Fund Project (No. YC2023-X17).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Xie, Z., Ye, X. (2025). Local and Global Features Interactive Fusion Network for Macro- and Micro-expression Spotting in Long Videos. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15041. Springer, Singapore. https://doi.org/10.1007/978-981-97-8795-1_23