Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article

Optimizing Video Selection LIMIT Queries with Commonsense Knowledge

Published: 30 May 2024 Publication History

Abstract

Video is becoming a major part of contemporary data collection. It is increasingly important to process video selection queries --- selecting videos that contain target objects. Advances in neural networks allow us to detect the objects in an image, and thereby offer query systems to examine the content of the video. Unfortunately, neural network-based approaches have long inference times. Processing this type of query through a standard scan would be time-consuming and would involve applying complex detectors to numerous irrelevant videos. It is tempting to try to improve query times by computing an index in advance. But unfortunately, many frames will never be beneficial for any query. Time spent processing them, whether at index time or at query time, is simply wasted computation.
We propose a novel index mechanism to optimize video selection queries with commonsense knowledge. Commonsense knowledge consists of fundamental information about the world, such as the fact that a tennis racket is a tool designed for hitting a tennis ball. To save computation, an inexpensive but lossy index can be intentionally created, but this may result in missed target objects and suboptimal query time performance. Our mechanism addresses this issue by constructing probabilistic models from commonsense knowledge to patch the lossy index and then prioritizing predicate-related videos at query time. This method can achieve significant performance improvements comparable to those of a full index while keeping the construction costs of a lossy index. We describe our prototype system, Paine, plus experiments on two video corpora. We show our best optimization method can process up to 97.79% fewer videos compared to baselines. Even the model constructed without any video content can yield a 75.39% improvement over baselines.

References

[1]
Sadiq H Abdulhussain, Basheera M Mahmmod, M Iqbal Saripan, SAR Al-Haddad, Thar Baker, Wameedh N Flayyih, Wissam A Jassim, et al. 2019. A fast feature extraction algorithm for image and video processing. In 2019 international joint conference on neural networks (IJCNN). IEEE, 1--8.
[2]
Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Paul Natsev, George Toderici, Balakrishnan Varadarajan, and Sudheendra Vijayanarasimhan. 2016. Youtube-8m: A large-scale video classification benchmark. arXiv preprint arXiv:1609.08675 (2016).
[3]
Michael R Anderson, Michael Cafarella, German Ros, and Thomas F Wenisch. 2019. Physical representation-based predicate optimization for a visual analytics database. In 2019 IEEE 35th International Conference on Data Engineering (ICDE). IEEE, 1466--1477.
[4]
Michael Armbrust, Reynold S Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K Bradley, Xiangrui Meng, Tomer Kaftan, Michael J Franklin, Ali Ghodsi, et al. 2015. Spark sql: Relational data processing in spark. In Proceedings of the 2015 ACM SIGMOD international conference on management of data. 1383--1394.
[5]
Favyen Bastani, Songtao He, Arjun Balasingam, Karthik Gopalakrishnan, Mohammad Alizadeh, Hari Balakrishnan, Michael Cafarella, Tim Kraska, and Sam Maddenee. 2020. MIRIS: fast object track queries in video. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 1907--1921.
[6]
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877--1901.
[7]
Yuru Cao, Hely Mehta, Ann E Norcross, Masahiko Taniguchi, and Jonathan S Lindsey. 2020. Analysis of Wikipedia pageviews to identify popular chemicals. In Reporters, Markers, Dyes, Nanoparticles, and Molecular Probes for Biomedical Applications XII, Vol. 11256. SPIE, 24--41.
[8]
Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R Hruschka, and Tom M Mitchell. 2010. Toward an architecture for never-ending language learning. In Twenty-Fourth AAAI conference on artificial intelligence.
[9]
Douglas Comer. 1979. Ubiquitous B-tree. ACM Computing Surveys (CSUR) 11, 2 (1979), 121--137.
[10]
Brian F Cooper, Neal Sample, Michael J Franklin, Gisli R Hjaltason, and Moshe Shadmon. 2001. A fast index for semistructured data. In VLDB, Vol. 1. 341--350.
[11]
Maureen Daum, Brandon Haynes, Dong He, Amrita Mazumdar, and Magdalena Balazinska. 2021. TASM: A tile-based storage manager for video analytics. In 2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE, 1775--1786.
[12]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[13]
Korry Douglas and Susan Douglas. 2003. PostgreSQL: a comprehensive guide to building, programming, and administering PostgresSQL databases. SAMS publishing.
[14]
Yuan Fang, Kingsley Kuan, Jie Lin, Cheston Tan, and Vijay Chandrasekhar. 2017. Object detection meets knowledge graphs. International Joint Conferences on Artificial Intelligence.
[15]
Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, and Chris Dyer. 2016. Problems with evaluation of word embeddings using word similarity tasks. arXiv preprint arXiv:1605.02276 (2016).
[16]
Maurice Fréchet. 1951. Sur les tableaux de corrélation dont les marges sont données. Ann. Univ. Lyon, 3 e serie, Sciences, Sect. A 14 (1951), 53--77.
[17]
Junyu Gao, Tianzhu Zhang, and Changsheng Xu. 2019. I know the relationships: Zero-shot action recognition via two-stream graph convolutional networks and knowledge graphs. In Proceedings of the AAAI conference on artificial intelligence, Vol. 33. 8303--8311.
[18]
Wenjia He, Michael R Anderson, Maxwell Strome, and Michael Cafarella. 2020. A Method for Optimizing Opaque Filter Queries. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 1257--1272.
[19]
Wenjia He and Michael Cafarella. 2022. Controlled intentional degradation in analytical video systems. In Proceedings of the 2022 International Conference on Management of Data. 2105--2119.
[20]
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735--1780.
[21]
Kevin Hsieh, Ganesh Ananthanarayanan, Peter Bodik, Shivaram Venkataraman, Paramvir Bahl, Matthai Philipose, Phillip B Gibbons, and Onur Mutlu. 2018. Focus: Querying large video datasets with low latency and low cost. In 13th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 18). 269--286.
[22]
Stratos Idreos, Martin L Kersten, Stefan Manegold, et al. 2007. Database Cracking. In CIDR, Vol. 7. 68--78.
[23]
Filip Ilievski, Pedro Szekely, and Bin Zhang. 2021. Cskg: The commonsense knowledge graph. In European Semantic Web Conference. Springer, 680--696.
[24]
Matthias Jarke and Jurgen Koch. 1984. Query optimization in database systems. ACM Computing surveys (CsUR) 16, 2 (1984), 111--152.
[25]
Junchen Jiang, Ganesh Ananthanarayanan, Peter Bodik, Siddhartha Sen, and Ion Stoica. 2018. Chameleon: scalable adaptation of video analytics. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication. 253--266.
[26]
James Joyce. 2003. Bayes' theorem. (2003).
[27]
Daniel Kang, Peter Bailis, and Matei Zaharia. 2018. BlazeIt: optimizing declarative aggregation and limit queries for neural network-based video analytics. arXiv preprint arXiv:1805.01046 (2018).
[28]
Daniel Kang, John Emmons, Firas Abuzaid, Peter Bailis, and Matei Zaharia. 2017. Noscope: optimizing neural network queries over video at scale. arXiv preprint arXiv:1703.02529 (2017).
[29]
Daniel Kang, John Guibas, Peter Bailis, Tatsunori Hashimoto, and Matei Zaharia. 2020. Task-agnostic indexes for deep learning-based queries over unstructured data. arXiv preprint arXiv:2009.04540 (2020).
[30]
Daniel Kang, Ankit Mathur, Teja Veeramacheneni, Peter Bailis, and Matei Zaharia. 2020. Jointly optimizing preprocessing and inference for DNN-based visual analytics. arXiv preprint arXiv:2007.13005 (2020).
[31]
Vadim Kantorov and Ivan Laptev. 2014. Efficient feature extraction, encoding and classification for action recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 2593--2600.
[32]
Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. 2014. Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. 1725--1732.
[33]
Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[34]
Ferdi Kossmann, Ziniu Wu, Eugenie Lai, Nesime Tatbul, Lei Cao, Tim Kraska, and Sam Madden. 2023. Extract-Transform-Load for Video Streams. Proceedings of the VLDB Endowment 16, 9 (2023), 2302--2315.
[35]
Sotiris Kotsiantis and Dimitris Kanellopoulos. 2006. Association rules mining: A recent overview. GESTS International Transactions on Computer Science and Engineering 32, 1 (2006), 71--82.
[36]
Tim Kraska, Alex Beutel, Ed H Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The case for learned index structures. In Proceedings of the 2018 International Conference on Management of Data. 489--504.
[37]
Hildegard Kuehne, Hueihan Jhuang, Estibaliz Garrote, Tomaso Poggio, and Thomas Serre. 2011. HMDB: a large video database for human motion recognition. In 2011 International conference on computer vision. IEEE, 2556--2563.
[38]
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. nature 521, 7553 (2015), 436--444.
[39]
Yao Lu, Aakanksha Chowdhery, and Srikanth Kandula. 2016. Optasia: A relational platform for efficient large-scale video analytics. In Proceedings of the Seventh ACM Symposium on Cloud Computing. 57--70.
[40]
Yao Lu, Aakanksha Chowdhery, Srikanth Kandula, and Surajit Chaudhuri. 2018. Accelerating machine learning inference with probabilistic predicates. In Proceedings of the 2018 International Conference on Management of Data. 1493--1508.
[41]
Thomas Marcoux, Nitin Agarwal, Recep Erol, Adewale Obadimu, and Muhammad Nihal Hussain. 2021. Analyzing cyber influence campaigns on YouTube using YouTubeTracker. In Big Data and Social Media Analytics. Springer, 101--111.
[42]
Kenneth Marino, Ruslan Salakhutdinov, and Abhinav Gupta. 2016. The more you know: Using knowledge graphs for image classification. arXiv preprint arXiv:1612.04844 (2016).
[43]
Ward Douglas Maurer and Ted G Lewis. 1975. Hash table methods. ACM Computing Surveys (CSUR) 7, 1 (1975), 5--19.
[44]
Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, and Josef Sivic. 2019. Howto100m: Learning a text-video embedding by watching hundred million narrated video clips. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 2630--2640.
[45]
George A Miller. 1995. WordNet: a lexical database for English. Commun. ACM 38, 11 (1995), 39--41.
[46]
Ross Mistry and Stacia Misner. 2014. Introducing Microsoft SQL Server 2014. Microsoft Press.
[47]
Oscar Moll, Favyen Bastani, Sam Madden, Mike Stonebraker, Vijay Gadepally, and Tim Kraska. 2022. Exsample: Efficient searches on video repositories through adaptive sampling. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 2956--2968.
[48]
AB MySQL. 2001. MySQL.
[49]
OpenAI. 2023. GPT-4 Technical Report. arXiv:2303.08774 [cs.CL]
[50]
Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in pytorch. (2017).
[51]
Alex Poms, Will Crichton, Pat Hanrahan, and Kayvon Fatahalian. 2018. Scanner: Efficient video analysis at scale. ACM Transactions on Graphics (TOG) 37, 4 (2018), 1--13.
[52]
Andrew Rabinovich, Andrea Vedaldi, Carolina Galleguillos, Eric Wiewiora, and Serge Belongie. 2007. Objects in context. In 2007 IEEE 11th International Conference on Computer Vision. IEEE, 1--8.
[53]
Joseph Redmon. 2013. Darknet: Open source neural networks in c.
[54]
Joseph Redmon and Ali Farhadi. 2017. YOLO9000: better, faster, stronger. In Proceedings of the IEEE conference on computer vision and pattern recognition. 7263--7271.
[55]
Irina Rish et al. 2001. An empirical study of the naive Bayes classifier. In IJCAI 2001 workshop on empirical methods in artificial intelligence, Vol. 3. 41--46.
[56]
Jie Shao, Heng Tao Shen, and Xiaofang Zhou. 2008. Challenges and techniques for effective and efficient similarity search in large video databases. Proceedings of the VLDB Endowment 1, 2 (2008), 1598--1603.
[57]
Push Singh, Thomas Lin, Erik T Mueller, Grace Lim, Travell Perkins, and Wan Li Zhu. 2002. Open mind common sense: Knowledge acquisition from the general public. In OTM Confederated International Conferences" On the Move to Meaningful Internet Systems". Springer, 1223--1237.
[58]
Robyn Speer, Joshua Chin, and Catherine Havasi. 2017. Conceptnet 5.5: An open multilingual graph of general knowledge. In Thirty-first AAAI conference on artificial intelligence.
[59]
Todd Andrew Stephenson. 2000. An introduction to Bayesian network theory and usage. Technical Report. Idiap.
[60]
Mingxing Tan, Ruoming Pang, and Quoc V Le. 2020. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 10781--10790.
[61]
Niket Tandon, Aparna S Varde, and Gerard de Melo. 2018. Commonsense knowledge in machine intelligence. ACM SIGMOD Record 46, 4 (2018), 49--52.
[62]
Yansong Tang, Dajun Ding, Yongming Rao, Yu Zheng, Danyang Zhang, Lili Zhao, Jiwen Lu, and Jie Zhou. 2019. Coin: A large-scale dataset for comprehensive instructional video analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1207--1216.
[63]
Lisa Torrey and Jude Shavlik. 2010. Transfer learning. In Handbook of research on machine learning applications and trends: algorithms, methods, and techniques. IGI global, 242--264.
[64]
Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: a free collaborative knowledgebase. Commun. ACM 57, 10 (2014), 78--85.
[65]
Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. 2017. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering 29, 12 (2017), 2724--2743.
[66]
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2020. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Association for Computational Linguistics, Online, 38--45. https://www.aclweb.org/anthology/2020.emnlp-demos.6
[67]
Zhuangdi Xu, Gaurav Tarlok Kakkar, Joy Arulraj, and Umakishore Ramachandran. 2022. EVA: A Symbolic Approach to Accelerating Exploratory Video Analytics with Materialized Views. In Proceedings of the 2022 International Conference on Management of Data. 602--616.
[68]
Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo, Antoine Miech, Jordi Pont-Tuset, Ivan Laptev, Josef Sivic, and Cordelia Schmid. 2023. Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning. arXiv preprint arXiv:2302.14115 (2023).
[69]
Rowan Zellers, Yonatan Bisk, Ali Farhadi, and Yejin Choi. 2019. From recognition to cognition: Visual commonsense reasoning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6720--6731.
[70]
Rowan Zellers, Yonatan Bisk, Roy Schwartz, and Yejin Choi. 2018. Swag: A large-scale adversarial dataset for grounded commonsense inference. arXiv preprint arXiv:1808.05326 (2018).
[71]
Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J Freedman. 2017. Live video analytics at scale with approximation and delay-tolerance. In 14th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 17). 377--392.
[72]
Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In Proceedings of the IEEE international conference on computer vision. 19--27.

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 17, Issue 7
March 2024
260 pages
Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 30 May 2024
Published in PVLDB Volume 17, Issue 7

Check for updates

Badges

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 28
    Total Downloads
  • Downloads (Last 12 months)28
  • Downloads (Last 6 weeks)10
Reflects downloads up to 16 Oct 2024

Other Metrics

Citations

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media