Liu R, Li C, Tang H, Ge Y, Shan Y and Li G. (2025). ST-LLM: Large Language Models Are Effective Temporal Learners. Computer Vision – ECCV 2024. 10.1007/978-3-031-72998-0_1. (1-18).

https://link.springer.com/10.1007/978-3-031-72998-0_1

Ma W, Li K, Jiang Z, Meshry M, Liu Q, Wang H, Häne C and Yuille A. (2025). Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data. Computer Vision – ECCV 2024. 10.1007/978-3-031-72624-8_15. (254-269).

https://link.springer.com/10.1007/978-3-031-72624-8_15

Li H, Hao Y, Yu J, Zhu B, Wang S and Xu T. (2024). CVLP-NaVD: Contrastive Visual-Language Pre-training Models for Non-annotated Visual Description. ACM Transactions on Multimedia Computing, Communications, and Applications. 10.1145/3708348.

https://dl.acm.org/doi/10.1145/3708348

Alrubaie A, Khodher M and Abdulameer A. (2024). Modification of the 5D Lorenz chaotic map with fuzzy numbers for video encryption in cloud computing. Open Engineering. 10.1515/eng-2024-0051. 14:1. Online publication date: 3-Dec-2024. Online publication date: 1-Jan-2024.

https://www.degruyter.com/document/doi/10.1515/eng-2024-0051/html

Malik H and Hewahi N. (2024). A Novel Model for Chart-to-Text Generation by Utilizing NN Models. 2024 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT). 10.1109/3ict64318.2024.10824513. 979-8-3315-3313-7. (26-29).

https://ieeexplore.ieee.org/document/10824513/

Liu X, Zhou T, Wang C, Wang Y, Wang Y, Cao Q, Du W, Yang Y, He J, Qiao Y and Shen Y. (2024). Toward the unification of generative and discriminative visual foundation model: a survey. The Visual Computer. 10.1007/s00371-024-03608-8.

https://link.springer.com/10.1007/s00371-024-03608-8

Emran N, Saleh N and Ali M. (2024). Sports Video Classification Using Convolutional Neural Network (CNN) with Normalization Flow. 2024 5th International Conference on Artificial Intelligence and Data Sciences (AiDAS). 10.1109/AiDAS63860.2024.10730229. 979-8-3315-2855-3. (1-6).

https://ieeexplore.ieee.org/document/10730229/

Wang Z, Zhang D and Hu Z. (2024). LSECA: local semantic enhancement and cross aggregation for video-text retrieval. International Journal of Multimedia Information Retrieval. 10.1007/s13735-024-00335-7. 13:3. Online publication date: 1-Sep-2024.

https://link.springer.com/10.1007/s13735-024-00335-7

Maradana C, An M, Rasheed A and Liu Q. Human Activity Recognition and Abnormality Detection Using Deep Learning. Proceedings of the 2024 6th International Conference on Pattern Recognition and Intelligent Systems. (87-92).

https://doi.org/10.1145/3689218.3689231

Babavalian M and Kiani K. (2024). Video captioning using transformer-based GAN. Multimedia Tools and Applications. 10.1007/s11042-024-19247-z.

https://link.springer.com/10.1007/s11042-024-19247-z

R A, G A and N S. (2024). AI Enhanced Video Sequence Description Generator. 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS). 10.1109/ADICS58448.2024.10533487. 979-8-3503-6482-8. (1-6).

https://ieeexplore.ieee.org/document/10533487/

Ren B, Liu M, Ding R and Liu H. (2024). A Survey on 3D Skeleton-Based Action Recognition Using Learning Method. Cyborg and Bionic Systems. 10.34133/cbsystems.0100. 5. Online publication date: 1-Jan-2024.

https://spj.science.org/doi/10.34133/cbsystems.0100

Luo X, Luo X, Wang D, Liu J, Wan B and Zhao L. (2024). Global semantic enhancement network for video captioning. Pattern Recognition. 10.1016/j.patcog.2023.109906. 145. (109906). Online publication date: 1-Jan-2024.

https://linkinghub.elsevier.com/retrieve/pii/S0031320323006040

Babavalian M and Kiani K. (2024). Learning distribution of video captions using conditional GAN. Multimedia Tools and Applications. 83:3. (9137-9159). Online publication date: 1-Jan-2024.

https://doi.org/10.1007/s11042-023-15933-6

Ghadekar P, Pungliya V, Purohit A, Bhonsle R, Raut A and Pate S. (2024). A Novel Approach for Deep Learning Based Video Classification and Captioning using Keyframe. Innovations in VLSI, Signal Processing and Computational Technologies. 10.1007/978-981-99-7077-3_50. (511-522).

https://link.springer.com/10.1007/978-981-99-7077-3_50

Darwich M, Khalil K, Ismail Y and Bayoumi M. (2023). Deep Learning-Driven Video Summarization on the Cloud: A Pathway to Efficient Storage and Quick Access. 2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC). 10.1109/MCSoC60832.2023.00060. 979-8-3503-9361-3. (360-365).

https://ieeexplore.ieee.org/document/10387897/

Rafiq G, Rafiq M and Choi G. (2023). Video description: A comprehensive survey of deep learning approaches. Artificial Intelligence Review. 10.1007/s10462-023-10414-6. 56:11. (13293-13372). Online publication date: 1-Nov-2023.

https://link.springer.com/10.1007/s10462-023-10414-6

Song Y, Zhang S, Tang F, Shi Y, Wu Y, He J, Chen Y and Li L. (2023). Behavior Recognition of Squid Jigger Based on Deep Learning. Fishes. 10.3390/fishes8100502. 8:10. (502).

https://www.mdpi.com/2410-3888/8/10/502

Fang B, Wu W, Liu C, Zhou Y, Song Y, Wang W, Shu X, Ji X and Wang J. (2023). UATVR: Uncertainty-Adaptive Text-Video Retrieval. 2023 IEEE/CVF International Conference on Computer Vision (ICCV). 10.1109/ICCV51070.2023.01262. 979-8-3503-0718-4. (13677-13687).

https://ieeexplore.ieee.org/document/10376945/

Khan R, Huang B, Hassan H, Zaman A and Ye Z. (2023). A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation. 2023 5th International Conference on Robotics and Computer Vision (ICRCV). 10.1109/ICRCV59470.2023.10328995. 979-8-3503-2636-9. (92-99).

https://ieeexplore.ieee.org/document/10328995/

Patil S, Patil P, Pawar Y, Pisal S and Pawar V. (2023). Key-frame extraction in Spacio-Temporal Neural Networks for Human-Action Classification. 2023 3rd Asian Conference on Innovation in Technology (ASIANCON). 10.1109/ASIANCON58793.2023.10270693. 979-8-3503-0228-8. (1-7).

https://ieeexplore.ieee.org/document/10270693/

Wang J, Yan M, Zhang Y and Sang J. From association to generation. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence. (4326-4334).

https://doi.org/10.24963/ijcai.2023/481

Moon W, Hyun S, Park S, Park D and Heo J. (2023). Query-Dependent Video Representation for Moment Retrieval and Highlight Detection. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 10.1109/CVPR52729.2023.02205. 979-8-3503-0129-8. (23023-23033).

https://ieeexplore.ieee.org/document/10205244/

Wu W, Luo H, Fang B, Wang J and Ouyang W. (2023). Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval? 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 10.1109/CVPR52729.2023.01031. 979-8-3503-0129-8. (10704-10713).

https://ieeexplore.ieee.org/document/10204125/

Yasrab R, Fu Z, Zhao H, Lee L, Sharma H, Drukker L, Papageorgiou A and Noble J. A Machine Learning Method for Automated Description and Workflow Analysis of First Trimester Ultrasound Scans. IEEE Transactions on Medical Imaging. 10.1109/TMI.2022.3226274. 42:5. (1301-1313).

https://ieeexplore.ieee.org/document/9968267/

Zhao S, Liu Y, Du S, Tian Z, Qu T and Xu L. (2023). CMFG: Cross-Model Fine-Grained Feature Interaction for Text-Video Retrieval. MultiMedia Modeling. 10.1007/978-3-031-27818-1_36. (435-445).

https://link.springer.com/10.1007/978-3-031-27818-1_36

Chen T, Grabs E, Petersons E, Efrosinin D, Ipatovs A, Bogdanovs N and Rjazanovs D. (2022). Multiclass Live Streaming Video Quality Classification Based on Convolutional Neural Networks. Automatic Control and Computer Sciences. 10.3103/S0146411622050029. 56:5. (455-466). Online publication date: 1-Oct-2022.

https://link.springer.com/10.3103/S0146411622050029

Kolobe T, Tu C and Owolawi P. (2022). A Review on Fall Detection in Smart Home for Elderly and Disabled People. Journal of Advanced Computational Intelligence and Intelligent Informatics. 10.20965/jaciii.2022.p0747. 26:5. (747-757). Online publication date: 20-Sep-2022.

https://www.fujipress.jp/jaciii/jc/jacii002600050747

Tian K, Liu Y, Yang H and Zheng Q. (2022). Cloud game computing offload based on Multi-Agent Reinforcement Learning. 2022 IEEE 96th Vehicular Technology Conference (VTC2022-Fall). 10.1109/VTC2022-Fall57202.2022.10012737. 978-1-6654-5468-1. (1-7).

https://ieeexplore.ieee.org/document/10012737/

Galindo-Lopez C, Beltran J, Perez C, Macias A and Castro L. (2022). Classifying interactions of parents and children with Down syndrome in educational environments using deep learning. 2022 IEEE Mexican International Conference on Computer Science (ENC). 10.1109/ENC56672.2022.9882907. 978-1-6654-7347-7. (1-8).

https://ieeexplore.ieee.org/document/9882907/

Feng L, Zhao Y, Zhao W and Tang J. (2021). A comparative review of graph convolutional networks for human skeleton-based action recognition. Artificial Intelligence Review. 10.1007/s10462-021-10107-y. 55:5. (4275-4305). Online publication date: 1-Jun-2022.

https://link.springer.com/10.1007/s10462-021-10107-y

S. S and M.V. J. (2022). Understanding dance semantics using spatio-temporal features coupled GRU networks. Entertainment Computing. 10.1016/j.entcom.2022.100484. 42. (100484). Online publication date: 1-May-2022.

https://linkinghub.elsevier.com/retrieve/pii/S1875952122000088

Zhou G, Aggarwal V, Yin M and Yu D. A Computer Vision Approach for Estimating Lifting Load Contributors to Injury Risk. IEEE Transactions on Human-Machine Systems. 10.1109/THMS.2022.3148339. 52:2. (207-219).

https://ieeexplore.ieee.org/document/9718506/

Zhao H, Jin K, Wang J and Yahya A. (2022). Automatic Recognition and Extraction of English Verb Types Based on Index Line Clustering. Mobile Information Systems. 2022. Online publication date: 1-Jan-2022.

https://doi.org/10.1155/2022/2652622

Mahmoud E, Wassif K and Bayomi H. (2022). Transfer Learning and Recurrent Neural Networks for Automatic Arabic Sign Language Recognition. The 8th International Conference on Advanced Machine Learning and Technologies and Applications (AMLTA2022). 10.1007/978-3-031-03918-8_5. (47-59).

https://link.springer.com/10.1007/978-3-031-03918-8_5

Shang L, Kou Z, Zhang Y and Wang D. (2021). A Multimodal Misinformation Detector for COVID-19 Short Videos on TikTok. 2021 IEEE International Conference on Big Data (Big Data). 10.1109/BigData52589.2021.9671928. 978-1-6654-3902-2. (899-908).

https://ieeexplore.ieee.org/document/9671928/

(2021). Electronic and Reverse Engineering. Electronics in Advanced Research Industries. 10.1002/9781119716907.ch8. (341-380). Online publication date: 23-Nov-2021.

https://onlinelibrary.wiley.com/doi/10.1002/9781119716907.ch8

Shitrit T. Feature Learning in Video-Based Analysis of Animal Emotional States. Proceedings of the Eight International Conference on Animal-Computer Interaction. (1-4).

https://doi.org/10.1145/3493842.3493896

García Garví A, Puchalt J, Layana Castro P, Navarro Moya F and Sánchez-Salmerón A. (2021). Towards Lifespan Automation for Caenorhabditis elegans Based on Deep Learning: Analysing Convolutional and Recurrent Neural Networks for Dead or Live Classification. Sensors. 10.3390/s21144943. 21:14. (4943).

https://www.mdpi.com/1424-8220/21/14/4943

KAZANÇ M, ENSARİ T and DAĞTEKİN M. (2021). Videoların Derin Öğrenme ile Sınıflandırılarak Filtrelenmesi. European Journal of Science and Technology. 10.31590/ejosat.952481.

https://dergipark.org.tr/tr/doi/10.31590/ejosat.952481

Ji W and Wang R. (2021). A Multi-instance Multi-label Dual Learning Approach for Video Captioning. ACM Transactions on Multimedia Computing, Communications, and Applications. 17:2s. (1-18). Online publication date: 21-Jun-2021.

https://doi.org/10.1145/3446792

Bhowmik A, Kumar S and Bhat N. (2021). Evolution of automatic visual description techniques-a methodological survey. Multimedia Tools and Applications. 10.1007/s11042-021-10964-3.

https://link.springer.com/10.1007/s11042-021-10964-3

Hossen M, Akter M and Saifuddin K. (2021). Automatic Digit and Alphabet Recognition Based Online Toll Collection System. 2021 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD). 10.1109/ICICT4SD50815.2021.9396844. 978-1-6654-1460-9. (189-194).

https://ieeexplore.ieee.org/document/9396844/

Wang G, Du J and Zhang H. (2021). Multi-feature fusion refine network for video captioning. Journal of Experimental & Theoretical Artificial Intelligence. 10.1080/0952813X.2021.1883745. (1-15).

https://www.tandfonline.com/doi/full/10.1080/0952813X.2021.1883745

Kim Y, Heo H, Chung S and Lee B. (2021). End-To-End Lip Synchronisation Based on Pattern Classification. 2021 IEEE Spoken Language Technology Workshop (SLT). 10.1109/SLT48900.2021.9383616. 978-1-7281-7066-4. (598-605).

https://ieeexplore.ieee.org/document/9383616/

Han X, Pan M, Ge H, Li S, Hu J, Zhao L, Li Y and Razmjooy N. (2021). Multilabel Video Classification Model of Navigation Mark’s Lights Based on Deep Learning. Computational Intelligence and Neuroscience. 2021. Online publication date: 1-Jan-2021.

https://doi.org/10.1155/2021/6794202

Rafiq M, Rafiq G and Choi G. Video Description: Datasets & Evaluation Metrics. IEEE Access. 10.1109/ACCESS.2021.3108565. 9. (121665-121685).

https://ieeexplore.ieee.org/document/9524610/

Memon F, Khan U, Shaikh A, Alghamdi A, Kumar P and Alrizq M. Predicting Actions in Videos and Action-Based Segmentation Using Deep Learning. IEEE Access. 10.1109/ACCESS.2021.3101175. 9. (106918-106932).

https://ieeexplore.ieee.org/document/9500219/

El-Meadawy S, Shalaby H, Ismail N, Farghal A, El-Samie F, Abd-Elnaby M and El-Shafai W. Performance Analysis of 3D Video Transmission Over Deep-Learning-Based Multi-Coded N-ary Orbital Angular Momentum FSO System. IEEE Access. 10.1109/ACCESS.2021.3083524. 9. (110116-110136).

https://ieeexplore.ieee.org/document/9440406/

Garg V, Markhedkar V, Lale S and Raghunandan T. (2021). Video Tagging and Recommender System Using Deep Learning. Innovations in Computational Intelligence and Computer Vision. 10.1007/978-981-15-6067-5_33. (302-310).

http://link.springer.com/10.1007/978-981-15-6067-5_33

Pandya M, Pillai A and Rupani H. (2021). Segregating and Recognizing Human Actions from Video Footages Using LRCN Technique. Advanced Machine Learning Technologies and Applications. 10.1007/978-981-15-3383-9_1. (3-13).

http://link.springer.com/10.1007/978-981-15-3383-9_1

ALPAY Ö and AKCAYOL M. (2020). VİDEO ETİKETLEME UYGULAMALARINDA DERİN ÖĞRENME YAKLAŞIMLARININ KULLANILMASI ÜZERİNE KAPSAMLI BİR İNCELEME [A COMPREHENSIVE REVIEW ON USING OF DEEP LEARNING APPROACHES IN VIDEO CAPTIONING APPLICATIONS]. Mühendislik Bilimleri ve Tasarım Dergisi. 10.21923/jesd.830587. 8:5. (271-289).

http://dergipark.org.tr/tr/doi/10.21923/jesd.830587

Sedky Adly A, Abdelwahab M, Hegazy I and Elarif T. (2020). Issues and Challenges for Content-Based Video Search Engines: A Survey. 2020 21st International Arab Conference on Information Technology (ACIT). 10.1109/ACIT50332.2020.9300062. 978-1-7281-8855-3. (1-18).

https://ieeexplore.ieee.org/document/9300062/

Jones P, Demaria G, Tigchelaar I, Asfaw D, Edgar D, Campbell P, Callaghan T and Crabb D. (2020). The Human Touch: Using a Webcam to Autonomously Monitor Compliance During Visual Field Assessments. Translational Vision Science & Technology. 10.1167/tvst.9.8.31. 9:8. (31). Online publication date: 20-Jul-2020.

https://tvst.arvojournals.org/article.aspx?articleid=2770338

Aggarwal A, Chauhan A, Kumar D, Mittal M, Roy S and Kim T. (2020). Video Caption Based Searching Using End-to-End Dense Captioning and Sentence Embeddings. Symmetry. 10.3390/sym12060992. 12:6. (992).

https://www.mdpi.com/2073-8994/12/6/992

Sanzharov V, Frolov V, Voloboy A, Galaktionov V and Pavlov D. (2020). Image datasets generation system for computer vision applications based on photorealistic rendering. Keldysh Institute Preprints. 10.20948/prepr-2020-80. 80. (1-29).

http://keldysh.ru/papers/2020/prep2020_80.pdf

Abdolmohammadi M, Toroghi R and Bastanfard A. (2020). Video Steganography Using 3D Convolutional Neural Networks. Pattern Recognition and Artificial Intelligence. 10.1007/978-3-030-37548-5_12. (149-161).

http://link.springer.com/10.1007/978-3-030-37548-5_12

Randive K and Mohan R. (2020). A State-of-Art Review on Automatic Video Annotation Techniques. Intelligent Systems Design and Applications. 10.1007/978-3-030-16657-1_99. (1060-1069).

https://link.springer.com/10.1007/978-3-030-16657-1_99

Bastan M, Yap K and Chau L. (2019). Remote detection of idling cars using infrared imaging and deep networks. Neural Computing and Applications. 10.1007/s00521-019-04077-0.

http://link.springer.com/10.1007/s00521-019-04077-0

Vasile D and Lukasiewicz T. (2018). Learning Structured Video Descriptions: Automated Video Knowledge Extraction for Video Understanding Tasks. On the Move to Meaningful Internet Systems. OTM 2018 Conferences. 10.1007/978-3-030-02671-4_20. (315-332).

http://link.springer.com/10.1007/978-3-030-02671-4_20