DOI: 10.1145/1101149.1101338
Article

Multimodal metadata fusion using causal strength

Published: 06 November 2005

Abstract

We propose a probabilistic framework that uses influence diagrams to fuse metadata of multiple modalities for photo annotation. We fuse contextual information (location, time, and camera parameters), visual content (holistic and local perceptual features), and semantic ontology in a synergistic way. We use causal strengths to encode causalities between variables, and between variables and semantic labels. Through analytical and empirical studies, we demonstrate that our fusion approach can achieve high-quality photo annotation and good interpretability, substantially better than traditional methods.
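The abstract describes encoding causalities between variables and semantic labels as causal strengths inside an influence diagram. The paper's actual model is not reproduced here; as a loose illustration only, the sketch below combines hypothetical per-modality causal strengths with a noisy-OR, one standard way independent causal strengths (in the Novick-Cheng sense) compose into a label probability. The function name, modality list, and strength values are all invented for this example.

```python
# Illustrative sketch only -- NOT the paper's influence-diagram model.
# Each modality that produces evidence for a label is treated as an
# independent cause; its causal strength is the probability that it
# alone would make the label true. Noisy-OR combines these causes.

def noisy_or_fusion(strengths, evidence):
    """Fuse binary per-modality evidence for one semantic label.

    strengths[i]: hypothetical causal strength of modality i (0..1).
    evidence[i]:  1 if modality i fired for this photo, else 0.
    Returns P(label) = 1 - prod over firing modalities of (1 - strength).
    """
    p_label_absent = 1.0
    for strength, fired in zip(strengths, evidence):
        if fired:
            p_label_absent *= (1.0 - strength)
    return 1.0 - p_label_absent

# Hypothetical modalities: [contextual (time/location), holistic visual,
# local perceptual]. Context and holistic features fire; local does not.
strengths = [0.6, 0.8, 0.5]
p = noisy_or_fusion(strengths, [1, 1, 0])  # 1 - (1-0.6)*(1-0.8) = 0.92
```

Under this toy combination rule, adding a firing modality can only raise the label probability, which matches the intuition that corroborating metadata cues reinforce an annotation; the paper's influence-diagram formulation is richer than this.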



    Published In

    MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia
    November 2005
    1110 pages
    ISBN:1595930442
    DOI:10.1145/1101149

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. causal strength
    2. influence diagram
    3. multimodal fusion
    4. photo annotation

    Qualifiers

    • Article

    Conference

    MM05

    Acceptance Rates

    MULTIMEDIA '05 Paper Acceptance Rate 49 of 312 submissions, 16%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Cited By

    • (2023) Museum Education Using XR Technologies: A Survey of Metadata Models. European Journal of Engineering and Technology Research, pp. 66-77. DOI: 10.24018/ejeng.2023.1.CIE.3139. Online publication date: 29-Dec-2023
    • (2016) Multimodal Data Mining in a Multimedia Database Based on Structured Max Margin Learning. ACM Transactions on Knowledge Discovery from Data, 10(3), pp. 1-30. DOI: 10.1145/2742549. Online publication date: 24-Feb-2016
    • (2015) Lightweight Adaptation of Classifiers to Users and Contexts: Trends of the Emerging Domain. The Scientific World Journal, 2015(1). DOI: 10.1155/2015/434826. Online publication date: 10-Sep-2015
    • (2014) Towards interactive, intelligent, and integrated multimedia analytics. 2014 IEEE Conference on Visual Analytics Science and Technology (VAST), pp. 3-12. DOI: 10.1109/VAST.2014.7042476. Online publication date: Oct-2014
    • (2012) Social multimedia. Multimedia Tools and Applications, 56(1), pp. 9-34. DOI: 10.1007/s11042-010-0538-7. Online publication date: 1-Jan-2012
    • (2011) Multimedia data mining. Multimedia Tools and Applications, 51(1), pp. 35-76. DOI: 10.1007/s11042-010-0645-5. Online publication date: 1-Jan-2011
    • (2011) Fusing Content and Context with Causality. Foundations of Large-Scale Multimedia Information Management and Retrieval, pp. 141-169. DOI: 10.1007/978-3-642-20429-6_7. Online publication date: 26-Aug-2011
    • (2011) Multimodal Fusion. Foundations of Large-Scale Multimedia Information Management and Retrieval, pp. 121-140. DOI: 10.1007/978-3-642-20429-6_6. Online publication date: 26-Aug-2011
    • (2011) Context-Based Support Vector Machines for Interconnected Image Annotation. Computer Vision - ACCV 2010, pp. 214-227. DOI: 10.1007/978-3-642-19315-6_17. Online publication date: 2011
    • (2010) Context-based support vector machines for interconnected image annotation. Proceedings of the 10th Asian conference on Computer vision - Volume Part I, pp. 214-227. DOI: 10.5555/1964320.1964343. Online publication date: 8-Nov-2010
