DOI: 10.1145/1631058.1631067
research-article

Large-scale multimedia semantic concept modeling using robust subspace bagging and MapReduce

Published: 23 October 2009 Publication History

Abstract

With the rapid growth of multimedia data, it becomes increasingly important to develop semantic concept modeling approaches that are consistently effective, highly efficient, and easily scalable. To this end, we first propose the robust subspace bagging (RB-SBag) algorithm, which augments random subspace bagging with forward model selection. Compared with traditional modeling approaches, RB-SBag offers a considerably faster learning process while minimizing the risk of overfitting. Its ensemble structure also maps conveniently onto MapReduce, a simple parallel framework. To further improve scalability, we also develop a task scheduling algorithm that optimizes task placement for heterogeneous tasks. On a collection of more than 250,000 images and on several standard TRECVID benchmark datasets, RB-SBag achieved more than a 10-fold speedup with classification performance comparable to or even better than that of baseline SVMs. We also deployed the MapReduce implementation on a 16-node Hadoop cluster, where the proposed task scheduler demonstrates significantly better scalability than the baseline scheduler in the presence of task heterogeneity.
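
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: each base model is trained on a bootstrap sample of the rows and a random subset of the features (an independent, map-like task), and a greedy forward selection step (reduce-like) then keeps only the models that improve validation accuracy. The nearest-centroid base learner, the accuracy-based stopping rule, and all names (`make_toy_data`, `train_centroid_model`, `rb_sbag`, and so on) are hypothetical stand-ins chosen to keep the example self-contained; they are not taken from the paper.

```python
import random

def make_toy_data(n, seed):
    """Hypothetical toy data: feature 0 is informative, features 1-3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = rng.gauss(2.0 if label else -2.0, 0.5)
        X.append([informative] + [rng.gauss(0.0, 1.0) for _ in range(3)])
        y.append(label)
    return X, y

def train_centroid_model(X, y, rows, cols):
    """Map-like task: fit class centroids on one row sample and one feature subspace."""
    sums = {0: [0.0] * len(cols), 1: [0.0] * len(cols)}
    counts = {0: 0, 1: 0}
    for i in rows:
        counts[y[i]] += 1
        for k, c in enumerate(cols):
            sums[y[i]][k] += X[i][c]
    # A class missing from the sample falls back to an all-zero centroid.
    cents = {lab: [s / max(counts[lab], 1) for s in sums[lab]] for lab in (0, 1)}
    return cols, cents

def score(model, x):
    """Positive when x is closer to the class-1 centroid than to the class-0 centroid."""
    cols, cents = model
    d0 = sum((x[c] - cents[0][k]) ** 2 for k, c in enumerate(cols))
    d1 = sum((x[c] - cents[1][k]) ** 2 for k, c in enumerate(cols))
    return d0 - d1

def accuracy(models, X, y):
    """Accuracy of the averaged ensemble score, thresholded at zero."""
    hits = 0
    for x, label in zip(X, y):
        avg = sum(score(m, x) for m in models) / len(models)
        hits += int((avg > 0) == (label == 1))
    return hits / len(y)

def rb_sbag(X, y, X_val, y_val, n_models=20, row_frac=0.5, col_frac=0.5, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Map-like phase: every base model is independent of the others.
    pool = []
    for _ in range(n_models):
        rows = rng.choices(range(n), k=max(2, int(row_frac * n)))  # bootstrap rows
        cols = rng.sample(range(d), max(1, int(col_frac * d)))     # random subspace
        pool.append(train_centroid_model(X, y, rows, cols))
    # Reduce-like phase: greedy forward selection on held-out data.
    selected, best, remaining = [], 0.0, list(pool)
    while remaining:
        cand = max(remaining, key=lambda m: accuracy(selected + [m], X_val, y_val))
        acc = accuracy(selected + [cand], X_val, y_val)
        if acc <= best:  # stop once no remaining model improves the ensemble
            break
        selected.append(cand)
        remaining.remove(cand)
        best = acc
    return selected, best

if __name__ == "__main__":
    X, y = make_toy_data(120, seed=1)
    X_val, y_val = make_toy_data(60, seed=2)
    selected, best = rb_sbag(X, y, X_val, y_val)
    print(len(selected), "models kept, validation accuracy", best)
```

Because each pool entry depends only on its own row and feature sample, the first loop is the part that would be distributed across workers; the selection step needs all models at once and so sits naturally on the reduce side.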

    Published In

    LS-MMRM '09: Proceedings of the First ACM workshop on Large-scale multimedia retrieval and mining
    October 2009
    144 pages
    ISBN:9781605587561
    DOI:10.1145/1631058

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. mapreduce
    2. semantic concepts
    3. subspace bagging

    Conference

    MM09: ACM Multimedia Conference
    October 23, 2009
    Beijing, China

    Cited By

    • (2020) Tails in the cloud: a survey and taxonomy of straggler management within large-scale cloud data centres. The Journal of Supercomputing. DOI: 10.1007/s11227-020-03241-x. Online publication date: 12-Mar-2020
    • (2017) Deep learning ensembles for melanoma recognition in dermoscopy images. IBM Journal of Research and Development 61:4-5 (5:1-5:15). DOI: 10.1147/JRD.2017.2708299. Online publication date: 1-Jul-2017
    • (2016) Parallel ensemble of online sequential extreme learning machine based on MapReduce. Neurocomputing 174:PA (352-367). DOI: 10.1016/j.neucom.2015.04.105. Online publication date: 22-Jan-2016
    • (2016) Towards large-scale multimedia retrieval enriched by knowledge about human interpretation. Multimedia Tools and Applications 75:1 (297-331). DOI: 10.1007/s11042-014-2292-8. Online publication date: 1-Jan-2016
    • (2015) Applications Exploiting Multimedia Semantics. Multimedia Ontology (177-200). DOI: 10.1201/b18639-16. Online publication date: 26-Jun-2015
    • (2015) A generalized framework for medical image classification and recognition. IBM Journal of Research and Development 59:2/3 (1:1-1:18). DOI: 10.1147/JRD.2015.2390017. Online publication date: Mar-2015
    • (2015) GPU-based MapReduce for large-scale near-duplicate video retrieval. Multimedia Tools and Applications 74:23 (10515-10534). DOI: 10.1007/s11042-014-2185-x. Online publication date: 1-Dec-2015
    • (2014) Hadoop and Its Role in Modern Image Processing. Open Journal of Marine Science 04:04 (239-245). DOI: 10.4236/ojms.2014.44022. Online publication date: 2014
    • (2014) Multimedia Databases. Computing Handbook, Third Edition (14-1-14-28). DOI: 10.1201/b16768-17. Online publication date: May-2014
    • (2014) Large-Scale Correlation-Based Semantic Classification Using MapReduce. Cloud Computing and Digital Media (169-190). DOI: 10.1201/b16614-9. Online publication date: 6-Feb-2014
