
Smart Director: An Event-Driven Directing System for Live Broadcasting

Published: 12 November 2021

Abstract

Live video broadcasting normally requires a multitude of skills and domain expertise to enable multi-camera production. As the number of cameras keeps increasing, directing a live sports broadcast has become more complicated and challenging than ever before. Broadcast directors need to be highly concentrated, responsive, and knowledgeable throughout the production. To relieve directors of this intensive effort, we develop an innovative automated sports broadcast directing system, called Smart Director, which mimics the typical human-in-the-loop broadcasting process to automatically create near-professional broadcast programs in real time using a set of advanced multi-view video analysis algorithms. Inspired by the so-called “three-event” construction of sports broadcast [14], we build our system as an event-driven pipeline consisting of three consecutive novel components: (1) Multi-View Event Localization, which detects events by modeling multi-view correlations; (2) Multi-View Highlight Detection, which ranks camera views by visual importance for view selection; and (3) the Auto-Broadcasting Scheduler, which controls the production of the broadcast video. To the best of our knowledge, our system is the first end-to-end automated directing system for multi-camera sports broadcasting that is driven entirely by the semantic understanding of sports events. It is also the first system to address the novel problem of multi-view joint event detection through cross-view relation modeling. We conduct both objective and subjective evaluations on a real-world multi-camera soccer dataset, which demonstrate that the quality of our auto-generated videos is comparable to that of human-directed videos. Thanks to its faster response, our system also captures fast-passing, short-duration events that human directors usually miss.
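The abstract describes an event-driven, three-stage pipeline: event localization across views, ranking of camera views per event, and a scheduler that assembles the output program. The following is a minimal, hypothetical Python sketch of how such a pipeline could be wired together; all class, function, and field names are illustrative and do not appear in the paper, and the stand-in scoring logic takes the place of the learned models the system actually uses.

```python
from dataclasses import dataclass

@dataclass
class Event:
    start: float   # event start time in seconds on the shared timeline
    end: float     # event end time in seconds
    label: str     # e.g. "shot", "corner", "foul"

def localize_events(view_streams):
    # Stage 1: Multi-View Event Localization.
    # The real system models cross-view correlations to detect events;
    # this sketch simply merges per-view detections onto one timeline.
    events = []
    for stream in view_streams:
        events.extend(stream.get("detections", []))
    return sorted(events, key=lambda e: e.start)

def rank_views(event, view_streams):
    # Stage 2: Multi-View Highlight Detection.
    # Rank candidate camera views for this event by a visual-importance
    # score; here a fixed per-stream number stands in for a learned score.
    scored = [(s.get("importance", 0.0), i) for i, s in enumerate(view_streams)]
    return [i for _, i in sorted(scored, reverse=True)]

def schedule(events, view_streams):
    # Stage 3: Auto-Broadcasting Scheduler.
    # Emit (event, chosen view) decisions in timeline order to drive
    # the production of the output broadcast.
    program = []
    for ev in events:
        best_view = rank_views(ev, view_streams)[0]
        program.append((ev, best_view))
    return program
```

The event-driven structure means the view switch is always justified by a detected event, rather than by a fixed switching schedule.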

References

[1]
Andrew Barnfield. 2013. Soccer, broadcasting, and narrative: On televising a live soccer match. Communication & Sport 1, 4 (2013), 326–341.
[2]
Shyamal Buch, Victor Escorcia, Chuanqi Shen, Bernard Ghanem, and Juan Carlos Niebles. 2017. SST: Single-stream temporal action proposals. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2911–2920.
[3]
Qi Cai, Yingwei Pan, Chong-Wah Ngo, Xinmei Tian, Lingyu Duan, and Ting Yao. 2019. Exploring object relation in mean teacher for cross-domain detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11457–11466.
[4]
Christine Chen, Oliver Wang, Simon Heinzle, Peter Carr, Aljoscha Smolic, and Markus Gross. 2013. Computational sports broadcasting: Automated director assistance for live sports. In 2013 IEEE International Conference on Multimedia and Expo (ICME’13). IEEE, 1–6.
[5]
Jianhui Chen and James J. Little. 2019. Sports camera calibration via synthetic data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 0–0.
[6]
Jianhui Chen, Keyu Lu, Sijia Tian, and Jim Little. 2019. Learning sports camera selection from internet videos. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV’19). IEEE, 1682–1691.
[7]
Jianhui Chen, Lili Meng, and James J. Little. 2018. Camera selection for broadcasting soccer games. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV’18). IEEE, 427–435.
[8]
Yang Chen, Yingwei Pan, Ting Yao, Xinmei Tian, and Tao Mei. 2019. Mocycle-GAN: Unpaired video-to-video translation. In Proceedings of the 27th ACM International Conference on Multimedia. 647–655.
[9]
Kyu-Hyoung Choi, Sang-Wook Lee, and Yong-Duek Seo. 2009. Automatic broadcast video generation for ball sports from multiple views. In Proceedings of the Korean Society of Broadcast Engineers Conference. The Korean Institute of Broadcast and Media Engineers, 193–198.
[10]
Fahad Daniyal and Andrea Cavallaro. 2011. Multi-camera scheduling for video production. In 2011 Conference for Visual Media Production. IEEE, 11–20.
[11]
Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, and Tao Mei. 2019. Relation distillation networks for video object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 7023–7032.
[12]
Adrien Gaidon, Zaid Harchaoui, and Cordelia Schmid. 2013. Temporal localization of actions with actoms. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 11 (2013), 2782–2795.
[13]
Jiyang Gao, Kan Chen, and Ram Nevatia. 2018. CTAP: Complementary temporal action proposal generation. In Proceedings of the European Conference on Computer Vision (ECCV’18). 68–83.
[14]
John Goldlust. 2018. Playing for Keeps: Sport, the Media and Society. Hybrid Publishers.
[15]
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. 2017. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1125–1134.
[16]
Ali Javed, Khalid Bashir Bajwa, Hafiz Malik, and Aun Irtaza. 2016. An efficient framework for automatic highlights generation from sports videos. IEEE Signal Processing Letters 23, 7 (2016), 954–958.
[17]
Hoseong Kim, Tao Mei, Hyeran Byun, and Ting Yao. 2018. Exploiting web images for video highlight detection with triplet deep ranking. IEEE Transactions on Multimedia 20, 9 (2018), 2415–2426.
[18]
Yu Kong, Zhiqiang Tao, and Yun Fu. 2017. Deep sequential context networks for action prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1473–1481.
[19]
Mackenzie Leake, Abe Davis, Anh Truong, and Maneesh Agrawala. 2017. Computational video editing for dialogue-driven scenes. ACM Transactions on Graphics 36, 4 (2017), 130:1–130:14.
[20]
Florent Lefevre, Vincent Bombardier, Patrick Charpentier, Nicolas Krommenacker, and Bertrand Petat. 2018. Automatic camera selection in the context of basketball game. In International Conference on Image and Signal Processing. Springer, 72–79.
[21]
Chunyang Li, Caiyan Jia, Zhineng Chen, Xiaoyan Gu, and Hongyun Bao. 2019. psDirector: An automatic director for watching view generation from panoramic soccer video. In International Conference on Multimedia Modeling. Springer, 218–230.
[22]
Yehao Li, Yingwei Pan, Ting Yao, Hongyang Chao, Yong Rui, and Tao Mei. 2019. Learning click-based deep structure-preserving embeddings with visual attention. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 15, 3 (2019), 1–19.
[23]
Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, and Tao Mei. 2018. Jointly localizing and describing events for dense video captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7492–7500.
[24]
Yehao Li, Ting Yao, Yingwei Pan, Hongyang Chao, and Tao Mei. 2019. Deep metric learning with density adaptivity. IEEE Transactions on Multimedia 22, 5 (2019), 1285–1297.
[25]
Tianwei Lin, Xu Zhao, and Zheng Shou. 2017. Single shot temporal action detection. In Proceedings of the 25th ACM International Conference on Multimedia. 988–996.
[26]
Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, and Tao Mei. 2019. Gaussian temporal awareness networks for action localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 344–353.
[27]
Pascal Mettes and Cees G. M. Snoek. 2019. Pointly-supervised action localization. International Journal of Computer Vision 127, 3 (2019), 263–281.
[28]
Dan Oneata, Jakob Verbeek, and Cordelia Schmid. 2013. Action and event recognition with Fisher vectors on a compact feature set. In Proceedings of the IEEE International Conference on Computer Vision. 1817–1824.
[29]
Jim Owens. 2015. Television Sports Production. CRC Press.
[30]
Yingwei Pan, Yehao Li, Ting Yao, Tao Mei, Houqiang Li, and Yong Rui. 2016. Learning deep intrinsic video representation by exploring temporal coherence and graph structure. In IJCAI. Citeseer, 3832–3838.
[31]
Yingwei Pan, Ting Yao, Houqiang Li, Chong-Wah Ngo, and Tao Mei. 2015. Semi-supervised hashing with semantic confidence for large scale visual search. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 53–62.
[32]
Yingwei Pan, Ting Yao, Yehao Li, and Tao Mei. 2020. X-linear attention networks for image captioning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10971–10980.
[33]
Yingwei Pan, Ting Yao, Xinmei Tian, Houqiang Li, and Chong-Wah Ngo. 2014. Click-through-based subspace learning for image search. In Proceedings of the 22nd ACM International Conference on Multimedia. 233–236.
[34]
José Luis Pech-Pacheco, Gabriel Cristóbal, Jesús Chamorro-Martinez, and Joaquín Fernández-Valdivia. 2000. Diatom autofocusing in brightfield microscopy: A comparative study. In Proceedings 15th International Conference on Pattern Recognition (ICPR-2000), Vol. 3. IEEE, 314–317.
[35]
Danila Potapov, Matthijs Douze, Zaid Harchaoui, and Cordelia Schmid. 2014. Category-specific video summarization. In European Conference on Computer Vision. Springer, 540–555.
[36]
Zhaofan Qiu, Ting Yao, and Tao Mei. 2017. Learning spatio-temporal representation with pseudo-3D residual networks. In Proceedings of the IEEE International Conference on Computer Vision. 5533–5541.
[37]
Arnau Raventos, Raul Quijada, Luis Torres, and Francesc Tarrés. 2015. Automatic summarization of soccer highlights using audio-visual descriptors. SpringerPlus 4, 1 (2015), 1–19.
[38]
Yong Rui, Anoop Gupta, and Alex Acero. 2000. Automatically extracting highlights for TV baseball programs. In Proceedings of the 8th ACM International Conference on Multimedia. 105–115.
[39]
Huang-Chia Shih. 2017. A survey of content-aware video analysis for sports. IEEE Transactions on Circuits and Systems for Video Technology 28, 5 (2017), 1212–1231.
[40]
Zheng Shou, Dongang Wang, and Shih-Fu Chang. 2016. Temporal action localization in untrimmed videos via multi-stage CNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1049–1058.
[41]
Chen Sun, Abhinav Shrivastava, Carl Vondrick, Rahul Sukthankar, Kevin Murphy, and Cordelia Schmid. 2019. Relational action forecasting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 273–283.
[42]
Jinjun Wang, Changsheng Xu, Engsiong Chng, Hanqing Lu, and Qi Tian. 2008. Automatic composition of broadcast sports video. Multimedia Systems 14, 4 (2008), 179–193.
[43]
Jinjun Wang, Changsheng Xu, Engsiong Chng, Kongwah Wah, and Qi Tian. 2004. Automatic replay generation for soccer video broadcasting. In Proceedings of the 12th Annual ACM International Conference on Multimedia. 32–39.
[44]
Xueting Wang, Kensho Hara, Yu Enokibori, Takatsugu Hirayama, and Kenji Mase. 2016. Personal multi-view viewpoint recommendation based on trajectory distribution of the viewing target. In Proceedings of the 24th ACM International Conference on Multimedia. 471–475.
[45]
Xueting Wang, Yuki Muramatu, Takatsugu Hirayama, and Kenji Mase. 2014. Context-dependent viewpoint sequence recommendation system for multi-view video. In 2014 IEEE International Symposium on Multimedia. IEEE, 195–202.
[46]
Bo Xiong, Yannis Kalantidis, Deepti Ghadiyaram, and Kristen Grauman. 2019. Less is more: Learning highlight detection from video duration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1258–1267.
[47]
Ting Yao, Tao Mei, and Yong Rui. 2016. Highlight detection with pairwise deep ranking for first-person video summarization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[48]
Ting Yao, Yingwei Pan, Yehao Li, and Tao Mei. 2018. Exploring visual relationship for image captioning. In Proceedings of the European Conference on Computer Vision (ECCV’18). 684–699.
[49]
Ting Yao, Yingwei Pan, Yehao Li, and Tao Mei. 2019. Hierarchy parsing for image captioning. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 2621–2629.
[50]
Ting Yao, Yiheng Zhang, Zhaofan Qiu, Yingwei Pan, and Tao Mei. 2020. SeCo: Exploring sequence supervision for unsupervised representation learning. CoRR abs/2008.00975. arXiv:2008.00975. https://arxiv.org/abs/2008.00975
[51]
Bin Yu and Anil K. Jain. 1997. Lane boundary detection using a multiresolution Hough transform. In Proceedings of International Conference on Image Processing, Vol. 2. IEEE, 748–751.
[52]
Ke Zhang, Wei-Lun Chao, Fei Sha, and Kristen Grauman. 2016. Video summarization with long short-term memory. In European Conference on Computer Vision. Springer, 766–782.
[53]
Ke Zhang, Kristen Grauman, and Fei Sha. 2018. Retrospective encoders for video summarization. In Proceedings of the European Conference on Computer Vision (ECCV’18). 383–399.
[54]
Ning Zhang, Jingen Liu, Ke Wang, Dan Zeng, and Tao Mei. 2020. Robust visual object tracking with two-stream residual convolutional networks. CoRR abs/2005.06536. arXiv:2005.06536. https://arxiv.org/abs/2005.06536
[55]
Shifeng Zhang, Xiangyu Zhu, Zhen Lei, Hailin Shi, Xiaobo Wang, and Stan Z. Li. 2017. FaceBoxes: A CPU real-time face detector with high accuracy. In 2017 IEEE International Joint Conference on Biometrics (IJCB’17). IEEE, 1–9.
[56]
Jiaojiao Zhao and Cees G. M. Snoek. 2019. Dance with flow: Two-in-one stream action detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9935–9944.
[57]
Yue Zhao, Yuanjun Xiong, Limin Wang, Zhirong Wu, Xiaoou Tang, and Dahua Lin. 2017. Temporal action detection with structured segment networks. In Proceedings of the IEEE International Conference on Computer Vision. 2914–2923.
[58]
Jiawei Zuo, Yue Chen, Linfang Wang, Yingwei Pan, Ting Yao, Ke Wang, and Tao Mei. 2020. iDirector: An intelligent directing system for live broadcast. In Proceedings of the 28th ACM International Conference on Multimedia. 4545–4547.


Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 4
November 2021, 529 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3492437

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 November 2021
Accepted: 01 February 2021
Revised: 01 January 2021
Received: 01 September 2020
Published in TOMM Volume 17, Issue 4


Author Tags

  1. Sports-broadcast directing
  2. multi-view event detection
  3. highlight detection

Qualifiers

  • Research-article
  • Refereed


Cited By

  • (2023) How Will You Pod? Implications of Creators’ Perspectives for Designing Innovative Podcasting Tools. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 3 (2023), 1–25. DOI: 10.1145/3625099
  • (2023) Real-time Computational Cinematographic Editing for Broadcasting of Volumetric-captured Events: An Application to Ultimate Fighting. In Proceedings of the 16th ACM SIGGRAPH Conference on Motion, Interaction and Games, 1–8. DOI: 10.1145/3623264.3624468
  • (2023) Intelligent Directing System for Music Concert Scene Based on Visual and Auditory Information. In Proceedings of the 2023 ACM International Conference on Interactive Media Experiences Workshops, 95–102. DOI: 10.1145/3604321.3604336
  • (2023) Boosting Relationship Detection in Images with Multi-Granular Self-Supervised Learning. ACM Transactions on Multimedia Computing, Communications, and Applications 19, 2s (2023), 1–18. DOI: 10.1145/3556978
  • (2023) Multi-Granularity Aggregation Transformer for Joint Video-Audio-Text Representation Learning. IEEE Transactions on Circuits and Systems for Video Technology 33, 6 (2023), 2990–3002. DOI: 10.1109/TCSVT.2022.3225549
  • (2023) TraceNet. Knowledge-Based Systems 277, C (2023). DOI: 10.1016/j.knosys.2023.110792
  • (2022) Automating Sports Broadcasting Using Ultra-High Definition Cameras, Neural Networks, and Classical Denoising. In Applications of Digital Image Processing XLV, 36. DOI: 10.1117/12.2633075
  • (2022) LTC-SUM: Lightweight Client-Driven Personalized Video Summarization Framework Using 2D CNN. IEEE Access 10 (2022), 103041–103055. DOI: 10.1109/ACCESS.2022.3209275
  • (2022) Joint Masked Autoencoding with Global Reconstruction for Point Cloud Learning. In Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges, 313–329. DOI: 10.1007/978-3-031-37660-3_22
  • (2022) 3D-Producer: A Hybrid and User-Friendly 3D Reconstruction System. In Artificial Intelligence, 526–531. DOI: 10.1007/978-3-031-20503-3_43
