DOI: 10.1145/1291233.1291280
Article

Practical elimination of near-duplicates from web video search

Published: 29 September 2007

Abstract

Current web video search results rely exclusively on text keywords or user-supplied tags. A search for a typical popular video often returns many duplicate and near-duplicate videos in the top results. This paper outlines ways to cluster and filter out near-duplicate videos using a hierarchical approach. Initial triage is performed using fast signatures derived from color histograms. Only when a video cannot be clearly classified as novel or near-duplicate using these global signatures do we apply a more expensive local-feature-based near-duplicate detection, which provides very accurate duplicate analysis at higher computational cost. The results of 24 queries in a data set of 12,790 videos retrieved from Google, Yahoo! and YouTube show that this hierarchical approach can dramatically reduce the redundant video displayed to the user in the top result set, at relatively small computational cost.
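
The two-stage triage described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the distance measure, the thresholds, or how signatures are aggregated, so the L1 distance, the averaging of keyframe histograms, the values `dup_below` and `novel_above`, and the `local_match` callback (a stand-in for the expensive local-feature comparison) are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Histogram = Sequence[float]


def signature(frame_histograms: Sequence[Histogram]) -> List[float]:
    """Average per-keyframe color histograms into one normalized signature.

    Averaging is an assumption; the abstract only says the signature is
    derived from color histograms.
    """
    bins = len(frame_histograms[0])
    total = [0.0] * bins
    for hist in frame_histograms:
        for i, value in enumerate(hist):
            total[i] += value
    mass = sum(total)
    return [v / mass for v in total] if mass else total


def signature_distance(a: Histogram, b: Histogram) -> float:
    """L1 distance between two normalized histograms (range 0 to 2)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def filter_near_duplicates(
    signatures: Sequence[Histogram],
    local_match: Callable[[int, int], bool],
    dup_below: float = 0.2,    # hypothetical threshold: clearly near-duplicate
    novel_above: float = 0.8,  # hypothetical threshold: clearly novel
) -> List[int]:
    """Return indices of videos kept as novel, scanning in rank order.

    The cheap global signature decides the clear cases. Only when the
    distance falls between the two thresholds is the expensive
    local_match(i, j) check invoked.
    """
    kept: List[int] = []
    for i, sig in enumerate(signatures):
        is_duplicate = False
        for j in kept:
            d = signature_distance(sig, signatures[j])
            if d <= dup_below:
                is_duplicate = True               # cheap decision: duplicate
            elif d < novel_above:
                is_duplicate = local_match(i, j)  # ambiguous: costly check
            if is_duplicate:
                break
        if not is_duplicate:
            kept.append(i)
    return kept
```

In this sketch a video is compared only against videos already kept, so the number of expensive `local_match` calls depends on how many pairs fall in the ambiguous band between the two thresholds.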

Published In

MM '07: Proceedings of the 15th ACM international conference on Multimedia
September 2007
1115 pages
ISBN: 9781595937025
DOI: 10.1145/1291233
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. copy detection
  2. filtering
  3. multimodality
  4. near-duplicates
  5. novelty and redundancy detection
  6. similarity measure
  7. web video

Qualifiers

  • Article

Conference

MM07

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
