DOI: 10.1145/1178677.1178715
Article

Efficient filtering with sketches in the ferret toolkit

Published: 26 October 2006

Abstract

Ferret is a toolkit for building content-based similarity search systems for feature-rich data types such as audio, video, and digital photos. The key component of this toolkit is a content-based similarity search engine for generic, multi-feature object representations. This paper describes the filtering mechanism used in the Ferret toolkit and presents experimental results with several datasets. The filtering mechanism uses approximation algorithms to generate a candidate set, and then ranks the objects in the candidate set with a more sophisticated multi-feature distance measure. The paper compares two filtering methods: one using segment feature vectors and one using sketches constructed from segment feature vectors. Our experimental results show that filtering can substantially speed up the search process and reduce memory requirements while maintaining good search quality. To help system designers choose the filtering parameters, we have developed a rank-based analytical model for the filtering algorithm using sketches. Our experiments show that the model gives conservative and good predictions for different datasets.
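
The filter-then-rank idea in the abstract can be illustrated with a minimal sketch. This is not the Ferret implementation: the abstract does not specify how sketches are built or which multi-feature distance is used. The code below assumes one feature vector per object, a random-threshold bit sketch compared by Hamming distance, and plain L1 distance as a stand-in for the more sophisticated ranking measure. All function and parameter names (`make_sketcher`, `filtered_search`, `n_candidates`) are hypothetical.

```python
import random


def make_sketcher(dim, n_bits, lo, hi, seed=0):
    """Return a function mapping a feature vector to an n_bits-bit sketch.

    Assumed construction (not taken from the abstract): each bit compares
    one randomly chosen dimension against a random threshold drawn from
    that dimension's value range [lo[i], hi[i]].
    """
    rng = random.Random(seed)
    probes = [(i, rng.uniform(lo[i], hi[i]))
              for i in (rng.randrange(dim) for _ in range(n_bits))]

    def sketch(vec):
        bits = 0
        for k, (i, threshold) in enumerate(probes):
            if vec[i] > threshold:
                bits |= 1 << k
        return bits

    return sketch


def hamming(a, b):
    """Number of differing bits between two integer sketches."""
    return bin(a ^ b).count("1")


def l1(a, b):
    """Stand-in for the expensive ranking distance (assumption)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def filtered_search(query, dataset, sketches, sketch, n_candidates, k,
                    distance=l1):
    """Return indices of the top-k objects using filter-then-rank."""
    q = sketch(query)
    # Filtering step: cheap Hamming distance over the compact sketches.
    candidates = sorted(range(len(dataset)),
                        key=lambda j: hamming(q, sketches[j]))[:n_candidates]
    # Ranking step: the expensive distance runs on the candidates only.
    return sorted(candidates, key=lambda j: distance(query, dataset[j]))[:k]
```

In this sketch, `n_candidates` plays the role of a filtering parameter: a larger candidate set costs more ranking work but lowers the chance that a true near neighbor is filtered out, which is the kind of trade-off the paper's analytical model is meant to help with.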

Published In

MIR '06: Proceedings of the 8th ACM international workshop on Multimedia information retrieval
October 2006
344 pages
ISBN:1595934952
DOI:10.1145/1178677
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. feature-rich data
  2. filtering
  3. similarity search
  4. sketch
  5. toolkit

Qualifiers

  • Article

Conference

MM06: The 14th ACM International Conference on Multimedia 2006
October 26 - 27, 2006
Santa Barbara, California, USA
