Abstract
This chapter\(^\dagger\) presents a generative framework that uses influence diagrams to fuse metadata of multiple modalities for photo annotation. We fuse contextual information (location, time, and camera parameters), visual content (holistic and local perceptual features), and semantic ontology in a synergistic way. We use causal strengths to encode causal relationships among variables, and between variables and semantic labels. Through analytical and empirical studies, we demonstrate that our fusion approach achieves high-quality photo annotation with good interpretability, and that it performs substantially better than traditional methods.
†© ACM, 2005. This chapter is a minor revision of the author’s work with Yi Wu and Belle Tseng [1] published in MULTIMEDIA’05. Permission to publish this chapter is granted under copyright license #2587660180893.
Notes
1. We use “network” and “graph” interchangeably to refer to “influence diagram.” The major difference between a network, a graph, and an influence diagram (which will become evident in Sect. 7.4.2) lies in how the weights of the edges are measured. Otherwise, an influence diagram or a probabilistic causal model under the assumption of the causal Markov condition is a Bayesian network [30].
2. In general, when two variables u and d are dependent, we cannot tell which causes which. For photo annotation, we can determine the direction of the arcs based on domain knowledge.
3. We changed the term \(P(u | \overline{d}, \xi)\) in [13] to \(P(u | \xi)\) in the formula, because \(\overline{d}\) could be interpreted as the negation (instead of absence) of d.
4. To conserve space, we draw the influence diagrams only using context and content features. Relationships between semantic labels can be found in Fig. 7.2.
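The substitution described in Note 3 can be illustrated with a small sketch. It assumes the generative causal-power ratio of [13], \((P(u | d) - P(u | \overline{d})) / (1 - P(u | \overline{d}))\), with the baseline term replaced by \(P(u | \xi)\) as the note states. The chapter's own formula is not reproduced on this page, so the exact form, the function names, and the counts below are illustrative assumptions, and the background context \(\xi\) is left implicit.

```python
def causal_strength(p_u_given_d: float, p_u: float) -> float:
    """Assumed causal-strength ratio: (P(u|d) - P(u)) / (1 - P(u)).

    This is the causal-power form with the baseline P(u | not d)
    replaced by P(u), following the substitution described in Note 3.
    The background context xi is implicit in both probabilities.
    """
    if not (0.0 <= p_u_given_d <= 1.0 and 0.0 <= p_u < 1.0):
        raise ValueError("probabilities must lie in [0, 1], with P(u) < 1")
    return (p_u_given_d - p_u) / (1.0 - p_u)


def estimate_from_counts(n_d_and_u: int, n_d: int, n_u: int, n_total: int) -> float:
    """Estimate the ratio from hypothetical co-occurrence counts.

    n_d_and_u: photos where both d and u are present
    n_d:       photos where d is present
    n_u:       photos where u is present
    n_total:   all photos
    """
    return causal_strength(n_d_and_u / n_d, n_u / n_total)


# Hypothetical counts: d present in 40 photos, 30 of which carry label u;
# u appears in 50 of 100 photos overall.
strength = estimate_from_counts(30, 40, 50, 100)
```

Under this assumed form, a value near 1 means that the presence of d almost always accompanies u beyond its base rate, while a value near 0 means d adds little over the base rate of u.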
References
Y. Wu, E.Y. Chang, B.L. Tseng, Multimodal metadata fusion using causal strength, in Proceedings of ACM Multimedia, pp. 872–881, 2005
B.S. Manjunath, W.Y. Ma, Texture features for browsing and retrieval of image data. IEEE Trans. Pattern Anal. Mach. Intell. 18, 837–842 (1996)
Y. Rui, T.S. Huang, S.F. Chang, Image retrieval: current techniques, promising directions and open issues. J. Vis. Commun. Image Represent. (1999)
D.G. Lowe, Object recognition from local scale-invariant features, in Proceedings of IEEE ICCV, pp. 1150–1157, 1999
D.G. Lowe, Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
M. Boutell, J. Luo, Bayesian fusion of camera metadata cues in semantic scene classification, in Proceedings of IEEE CVPR, pp. 623–630
M. Naaman, A. Paepcke, H. Garcia-Molina, From where to what: metadata sharing for digital photographs with geographic coordinates, in Proceedings of the International Conference on Cooperative Information Systems (CoopIS), pp. 196–217, 2003
E.Y. Chang, Extent: fusing context, content, and semantic ontology for photo annotation, in Proceedings of ACM Workshop on Computer Vision Meets Databases (CVDB) in conjunction with ACM SIGMOD, pp. 5–11, 2005
D. Heckerman, R. Shachter, Decision-theoretic foundations for causal reasoning. Microsoft technical report MSR-TR-94-11 (1994)
D. Heckerman, A Bayesian approach to learning causal networks, in Proceedings of the Conference on Uncertainty in Artificial Intelligence, pp. 107–118, 1995
J. Pearl, Causality: Models, Reasoning and Inference (Cambridge University Press, Cambridge, 2000)
J. Pearl, Causal inference in the health sciences: A conceptual introduction. Special issue on causal inference, Health Services and Outcomes Research Methodology, vol. 2, pp. 189–220 (Kluwer Academic Publishers, 2001)
L.R. Novick, P.W. Cheng, Assessing interactive causal influence. Psychol. Rev. 111(2), 455–485 (2004)
S. Tong, E. Chang, Support vector machine active learning for image retrieval, in Proceedings of ACM International Conference on Multimedia, pp. 107–118, October 2001
K. Barnard, D. Forsyth, Learning the Semantics of Words and Pictures. (2000), pp. 408–415
J.Z. Wang, J. Li, G. Wiederhold, Simplicity: semantics-sensitive integrated matching for picture libraries, in Proceedings of ACM Multimedia, pp. 483–484, 2000
M. Davis, S. King, N. Good, R. Sarvas, From context to content: leveraging context to infer media metadata, in Proceedings of the ACM International Conference on Multimedia, pp. 188–195, 2004
A.K. Dey, Understanding and using context. Pers. Ubiquitous Comput. J. 5(1), 4–7 (2001)
D. Spinellis, Position-annotated photographs: a geotemporal web. IEEE Pervasive Comput. 2(2) (2003)
M. Naaman, S. Harada, Q. Wang, H. Garcia-Molina, A. Paepcke, Context data in geo-referenced digital photo collections, in Proceedings of ACM International Conference on Multimedia, pp. 196–203, 2004
R. Jain, P. Sinha, Content without context is meaningless, in Proceedings of ACM Multimedia, pp. 1259–1268, 2010
M. Stricker, M. Orengo, Similarity of color images, in Proceedings SPIE Storage and Retrieval for Image and Video Databases, 1995
J.R. Smith, S.F. Chang, Tools and techniques for color image retrieval, in SPIE Proceedings Storage and Retrieval for Image and Video Databases IV, 1995
Y. Rui, A.C. She, T.S. Huang, Modified Fourier descriptors for shape representations: a practical approach, in Proceedings of First International Workshop on Image Databases and Multi Media Search, 1996
Y. Ke, R. Sukthankar, PCA-SIFT: a more distinctive representation for local image descriptors, in Proceedings of IEEE CVPR, 2004
L. Khan, D. McLeod, Effective retrieval of audio information from annotated text using ontologies, in Proceedings of Workshop of Multimedia Data Mining with ACM SIGKDD, pp. 37–45, 2000
J.R. Smith, S.F. Chang, Visually searching the web for content. IEEE Multimedia 4(3), 12–20 (1997)
J. Deng, W. Dong, R. Socher, L. Li, K. Li, F.F. Li, ImageNet: a large-scale hierarchical image database, in Proceedings of IEEE CVPR, pp. 248–255, 2009
J. Williamson, Causality, in Handbook of Philosophical Logic, ed. by D. Gabbay, F. Guenthner (Kluwer, 2005)
D. Geiger, D. Heckerman, Knowledge representation and inference in similarity networks and Bayesian multinets. Artif. Intell. 82, 45–74 (1996)
N. Friedman, D. Geiger, M. Goldszmidt, Bayesian network classifiers. Mach. Learn. 29, 131–161 (1997)
E.B. Goldstein, Sensation and Perception, 5th edn. (Wadsworth, Dordrecht, 1999)
N. Friedman, D. Koller, Learning Bayesian networks from data (tutorial), in Proceedings of NIPS, 2000
J.B. Tenenbaum, T.L. Griffiths, Generalization, similarity, and Bayesian inference. Behavior. Brain Sci. 24, 629–641 (2001)
P.J. Doshi, L.G. Greenwald, J.R. Clarke, Using Bayesian networks for cleansing trauma data, in Proceedings of FLAIRS Conference, pp. 72–76, 2003
T. Dietterich, G. Bakiri, Solving multiclass learning problems via error-correcting output codes. J. Artif. Intell. Res. 2, 263–286 (1995)
NIST. Common Evaluation Measures. Appendix in Special Publication 500-250 (TREC 2001), 2001
J. Platt, Probabilistic outputs for SVMs and comparisons to regularized likelihood methods, in Advances in Large Margin Classifiers (MIT Press, Cambridge, 1999)
Y. Wu, B.L. Tseng, J.R. Smith, Ontology-based multi-classification learning for video concept detection, in Proceedings of the IEEE International Conference on Multimedia and Expo, pp. 1003–1006, 2004
© 2011 Springer-Verlag Berlin Heidelberg and Tsinghua University Press
Cite this chapter
Chang, E.Y. (2011). Fusing Content and Context with Causality. In: Foundations of Large-Scale Multimedia Information Management and Retrieval. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20429-6_7
DOI: https://doi.org/10.1007/978-3-642-20429-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20428-9
Online ISBN: 978-3-642-20429-6
eBook Packages: Computer Science (R0)