research-article

Large-scale multimedia semantic concept modeling using robust subspace bagging and MapReduce

Authors: Marc-Olivier Fleury, Michele Merler, Apostol Natsev, John R. Smith

LS-MMRM '09: Proceedings of the First ACM workshop on Large-scale multimedia retrieval and mining

Pages 35 - 42

https://doi.org/10.1145/1631058.1631067

Published: 23 October 2009

Abstract

With the rapid growth of multimedia data, it is becoming increasingly important to develop semantic concept modeling approaches that are consistently effective, highly efficient, and easily scalable. To this end, we first propose the robust subspace bagging (RB-SBag) algorithm, which augments random subspace bagging with forward model selection. Compared with traditional modeling approaches, RB-SBag offers a considerably faster learning process while minimizing the risk of overfitting. Its ensemble structure also maps conveniently onto MapReduce, a simple parallel framework. To further improve scalability, we also develop a task scheduling algorithm that optimizes task placement for heterogeneous tasks. On a collection of more than 250,000 images and several standard TRECVID benchmark datasets, RB-SBag achieved more than a 10-fold speedup with classification performance comparable to, or even better than, baseline SVMs. We also deployed the MapReduce implementation on a 16-node Hadoop cluster, where the proposed task scheduler demonstrates significantly better scalability than the baseline scheduler in the presence of task heterogeneity.
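
The abstract names the two ingredients of RB-SBag, random subspace bagging and forward model selection, but gives no algorithmic detail. The following is a minimal sketch of that general combination, not the authors' implementation. Everything specific in it is an assumption made for illustration: the nearest-centroid base learner (the paper compares against SVMs), the per-class bootstrap sampling, validation accuracy as the selection criterion, and all function names.

```python
import random

def train_centroid(rows, labels, feats):
    """Nearest-centroid stand-in for the base learner, using only `feats`."""
    cents = {}
    for cls in (0, 1):
        members = [r for r, y in zip(rows, labels) if y == cls]
        cents[cls] = [sum(r[f] for r in members) / len(members) for f in feats]

    def score(row):
        # Positive when the row is closer to the class-1 centroid.
        d0 = sum((row[f] - c) ** 2 for f, c in zip(feats, cents[0]))
        d1 = sum((row[f] - c) ** 2 for f, c in zip(feats, cents[1]))
        return d0 - d1
    return score

def train_candidates(rows, labels, n_models, n_feats, frac, rng):
    """Train each candidate on a bootstrap sample and a random feature subset."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    models = []
    for _ in range(n_models):
        feats = rng.sample(range(len(rows[0])), n_feats)
        # Sample each class separately (with replacement) so both are present.
        idx = [rng.choice(pos) for _ in range(max(1, int(frac * len(pos))))]
        idx += [rng.choice(neg) for _ in range(max(1, int(frac * len(neg))))]
        models.append(train_centroid([rows[i] for i in idx],
                                     [labels[i] for i in idx], feats))
    return models

def accuracy(models, rows, labels):
    """Accuracy of the averaged ensemble score thresholded at zero."""
    hits = 0
    for row, y in zip(rows, labels):
        avg = sum(m(row) for m in models) / len(models)
        hits += int((avg > 0) == (y == 1))
    return hits / len(rows)

def forward_select(models, val_rows, val_labels):
    """Greedily add the model that most improves validation accuracy."""
    chosen, best, remaining = [], -1.0, list(models)
    while remaining:
        acc, cand = max(((accuracy(chosen + [m], val_rows, val_labels), m)
                         for m in remaining), key=lambda t: t[0])
        if acc <= best:  # stop once no remaining candidate helps
            break
        chosen.append(cand)
        remaining.remove(cand)
        best = acc
    return chosen, best

def make_toy_data(n, n_dims, rng):
    """Hypothetical toy data: only the first two dimensions are informative."""
    rows, labels = [], []
    for i in range(n):
        y = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
        row[0] += 2.0 * y
        row[1] += 2.0 * y
        rows.append(row)
        labels.append(y)
    return rows, labels

if __name__ == "__main__":
    rng = random.Random(0)
    train_rows, train_labels = make_toy_data(200, 10, rng)
    val_rows, val_labels = make_toy_data(100, 10, rng)
    models = train_candidates(train_rows, train_labels, 8, 3, 0.2, rng)
    chosen, best = forward_select(models, val_rows, val_labels)
    print(len(chosen), "of", len(models), "models kept; validation accuracy", best)
```

Each candidate sees only a fraction of the examples and a subset of the features, which is where a speedup over training one model on everything would come from. The candidates are also trained independently of one another, so the loop in `train_candidates` is the part one would plausibly distribute as map tasks, with selection done afterwards; that mapping is one reading of the abstract, not a detail it states.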

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations

Information

Published In

LS-MMRM '09: Proceedings of the First ACM workshop on Large-scale multimedia retrieval and mining

October 2009, 144 pages

ISBN: 9781605587561

DOI: 10.1145/1631058

General Chairs: Rong Yan (IBM TJ Watson Research Center), Qi Tian (Microsoft Research Asia and University of Texas, San Antonio), John R. Smith (IBM TJ Watson Research Center), Rahul Sukthankar (Intel Research and Carnegie Mellon)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

MM09: ACM Multimedia Conference

Sponsor: SIGMM

October 23, 2009

Beijing, China

Bibliometrics

Total Citations: 26

Total Downloads: 843

Downloads (Last 12 months): 2

Downloads (Last 6 weeks): 0

Reflects downloads up to 01 Jan 2025

Citations

Cited By

- Gill S, Ouyang X, Garraghan P (2020). Tails in the cloud: a survey and taxonomy of straggler management within large-scale cloud data centres. The Journal of Supercomputing. Online publication date: 12-Mar-2020. https://doi.org/10.1007/s11227-020-03241-x
- Codella N, Nguyen Q, Pankanti S, Gutman D, Helba B, Halpern A, Smith J (2017). Deep learning ensembles for melanoma recognition in dermoscopy images. IBM Journal of Research and Development, 61:4-5 (5:1-5:15). Online publication date: 1-Jul-2017. https://dl.acm.org/doi/10.1147/JRD.2017.2708299
- Huang S, Wang B, Qiu J, Yao J, Wang G, Yu G (2016). Parallel ensemble of online sequential extreme learning machine based on MapReduce. Neurocomputing, 174:PA (352-367). Online publication date: 22-Jan-2016. https://dl.acm.org/doi/10.1016/j.neucom.2015.04.105
- Shirahama K, Grzegorzek M (2016). Towards large-scale multimedia retrieval enriched by knowledge about human interpretation. Multimedia Tools and Applications, 75:1 (297-331). Online publication date: 1-Jan-2016. https://dl.acm.org/doi/10.1007/s11042-014-2292-8
- Chaudhury S, Mallik A, Ghosh H (2015). Applications Exploiting Multimedia Semantics. Multimedia Ontology (177-200). Online publication date: 26-Jun-2015. https://doi.org/10.1201/b18639-16
- Abedini M, Codella N, Connell J, Garnavi R, Merler M, Pankanti S, Smith J, Syeda-Mahmood T (2015). A generalized framework for medical image classification and recognition. IBM Journal of Research and Development, 59:2/3 (1:1-1:18). Online publication date: Mar-2015. https://doi.org/10.1147/JRD.2015.2390017
- Wang H, Zhu F, Xiao B, Wang L, Jiang Y (2015). GPU-based MapReduce for large-scale near-duplicate video retrieval. Multimedia Tools and Applications, 74:23 (10515-10534). Online publication date: 1-Dec-2015. https://dl.acm.org/doi/10.1007/s11042-014-2185-x
- Banaei S, Moghaddam H (2014). Hadoop and Its Role in Modern Image Processing. Open Journal of Marine Science, 04:04 (239-245). Online publication date: 2014. https://doi.org/10.4236/ojms.2014.44022
- Oria V, Li Y, Dorai C, Houle M (2014). Multimedia Databases. Computing Handbook, Third Edition (14-1-14-28). Online publication date: May-2014. https://doi.org/10.1201/b16768-17
- Fleites F, Ha H, Yang Y, Chen S, Li K, Li Q, Shih T (2014). Large-Scale Correlation-Based Semantic Classification Using MapReduce. Cloud Computing and Digital Media (169-190). Online publication date: 6-Feb-2014. https://doi.org/10.1201/b16614-9
