Fusing Content and Context with Causality

Chang, Edward Y.

doi:10.1007/978-3-642-20429-6_7

Abstract

This chapter$^\dagger$ presents a generative framework that uses influence diagrams to fuse metadata of multiple modalities for photo annotation. We fuse contextual information (location, time, and camera parameters), visual content (holistic and local perceptual features), and semantic ontology in a synergistic way. We use causal strengths to encode causalities between variables, and between variables and semantic labels. Through analytical and empirical studies, we demonstrate that our fusion approach can achieve high-quality photo annotation and good interpretability, substantially better than traditional methods.

$^\dagger$ © ACM, 2005. This chapter is a minor revision of the author’s work with Yi Wu and Belle Tseng [1] published in MULTIMEDIA’05. Permission to publish this chapter is granted under copyright license #2587660180893.

Notes

1. We use “network” and “graph” interchangeably to refer to “influence diagram.” The major difference between a network, a graph, and an influence diagram (which will become evident in Sect. 7.4.2) lies in how the weights of the edges are measured. Otherwise, an influence diagram or a probabilistic causal model under the assumption of the causal Markov condition is a Bayesian network [30].
2. In general, when two variables u and d are dependent, we cannot tell which causes which. For photo annotation, we can determine the direction of the arcs based on domain knowledge.
3. We changed the term $P(u | \overline{d}, \xi)$ in [13] to $P(u | \xi)$ in the formula, because $\overline{d}$ could be interpreted as the negation (instead of absence) of d.
4. To conserve space, we draw the influence diagrams only using context and content features. Relationships between semantic labels can be found in Fig. 7.2.
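
The substitution described in Note 3 can be illustrated with a small numerical sketch. The chapter's own formula (Sect. 7.4.2) is not reproduced on this page, so the form below is an assumption: it is the generative causal power used in the causal-power literature that [13] belongs to, $(P(u|d,\xi) - P(u|\overline{d},\xi)) / (1 - P(u|\overline{d},\xi))$, with $P(u|\overline{d},\xi)$ replaced by $P(u|\xi)$ as Note 3 states. The function names and the photo counts are hypothetical.

```python
def causal_strength(p_u_given_d: float, p_u_base: float) -> float:
    """Generative causal power with the base rate P(u | xi) standing in
    for P(u | not-d, xi), following the substitution in Note 3.

    Assumed form, not taken from the chapter:
        (P(u | d, xi) - P(u | xi)) / (1 - P(u | xi))
    """
    for p in (p_u_given_d, p_u_base):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    if p_u_base == 1.0:
        # No room left for d to raise the probability of u.
        raise ValueError("strength is undefined when P(u | xi) = 1")
    return (p_u_given_d - p_u_base) / (1.0 - p_u_base)


def strength_from_counts(n_d_and_u: int, n_d: int, n_u: int, n_total: int) -> float:
    """Estimate the same quantity from co-occurrence counts in a labeled
    photo collection (simple relative frequencies, no smoothing)."""
    if n_d == 0 or n_total == 0:
        raise ValueError("need at least one photo with d and one photo overall")
    return causal_strength(n_d_and_u / n_d, n_u / n_total)


# Hypothetical collection: 200 photos, 100 carry label u, 40 exhibit
# feature d, and 30 of those 40 also carry label u.
print(strength_from_counts(n_d_and_u=30, n_d=40, n_u=100, n_total=200))  # 0.5
```

A value near 1 means that d almost always brings u with it beyond the base rate, and 0 means that d leaves the probability of u unchanged. The generative form is only meaningful when d raises that probability; a negative result signals that a preventive formulation would be needed instead.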

References

1. Y. Wu, E.Y. Chang, B.L. Tseng, Multimodal metadata fusion using causal strength, in Proceedings of ACM Multimedia, pp. 872–881, 2005
2. B.S. Manjunath, W.Y. Ma, Texture features for browsing and retrieval of image data. IEEE Trans. Pattern Anal. Mach. Intell. 18, 837–842 (1996)
3. Y. Rui, T.S. Huang, S.F. Chang, Image retrieval: current techniques, promising directions and open issues. J. Vis. Commun. Image Represent. (1999)
4. D.G. Lowe, Object recognition from local scale-invariant features, in Proceedings of IEEE ICCV, pp. 1150–1157, 1999
5. D.G. Lowe, Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–100 (2004)
6. M. Boutell, J. Luo, Bayesian fusion of camera metadata cues in semantic scene classification, in Proceedings of IEEE CVPR, pp. 623–630
7. M. Naaman, A. Paepcke, H. Garcia-Molina, From where to what: metadata sharing for digital photographs with geographic coordinates, in Proceedings of the International Conference on Cooperative Information Systems (CoopIS), pp. 196–217, 2003
8. E.Y. Chang, EXTENT: fusing context, content, and semantic ontology for photo annotation, in Proceedings of ACM Workshop on Computer Vision Meets Databases (CVDB) in conjunction with ACM SIGMOD, pp. 5–11, 2005
9. D. Heckerman, R. Shachter, Decision-theoretic foundations for causal reasoning. Microsoft technical report MSR-TR-94-11 (1994)
10. D. Heckerman, A Bayesian approach to learning causal networks, in Proceedings of the Conference on Uncertainty in Artificial Intelligence, pp. 107–118, 1995
11. J. Pearl, Causality: Models, Reasoning and Inference (Cambridge University Press, Cambridge, 2000)
12. J. Pearl, Causal inference in the health sciences: a conceptual introduction. Special issue on causal inference, Health Services and Outcomes Research Methodology, vol. 2, pp. 189–220 (Kluwer Academic Publishers, 2001)
13. L.R. Novick, P.W. Cheng, Assessing interactive causal influence. Psychol. Rev. 111(2), 455–485 (2004)
14. S. Tong, E. Chang, Support vector machine active learning for image retrieval, in Proceedings of ACM International Conference on Multimedia, pp. 107–118, October 2001
15. K. Barnard, D. Forsyth, Learning the Semantics of Words and Pictures (2000), pp. 408–415
16. J.Z. Wang, J. Li, G. Wiederhold, SIMPLIcity: semantics-sensitive integrated matching for picture libraries, in Proceedings of ACM Multimedia, pp. 483–484, 2000
17. M. Davis, S. King, N. Good, R. Sarvas, From context to content: leveraging context to infer media metadata, in Proceedings of the ACM International Conference on Multimedia, pp. 188–195, 2004
18. A.K. Dey, Understanding and using context. Pers. Ubiquitous Comput. J. 5(1), 4–7 (2001)
19. D.S. Diomidis, Position-annotated photographs: a geotemporal web. IEEE Pervasive Comput. 2(2) (2003)
20. M. Naaman, S. Harada, Q. Wang, H. Garcia-Molina, A. Paepcke, Context data in geo-referenced digital photo collections, in Proceedings of ACM International Conference on Multimedia, pp. 196–203, 2004
21. R. Jain, P. Sinha, Content without context is meaningless, in Proceedings of ACM Multimedia, pp. 1259–1268, 2010
22. http://www.exif.org
23. M. Stricker, M. Orengo, Similarity of color images, in Proceedings of SPIE Storage and Retrieval for Image and Video Databases, 1995
24. J.R. Smith, S.F. Chang, Tools and techniques for color image retrieval, in SPIE Proceedings Storage and Retrieval for Image and Video Databases IV, 1995
25. Y. Rui, A.C. She, T.S. Huang, Modified Fourier descriptors for shape representations – a practical approach, in Proceedings of First International Workshop on Image Databases and Multi Media Search, 1996
26. Y. Ke, R. Sukthankar, PCA-SIFT: a more distinctive representation for local image descriptors, in Proceedings of IEEE CVPR, 2004
27. L. Khan, D. McLeod, Effective retrieval of audio information from annotated text using ontologies, in Proceedings of Workshop of Multimedia Data Mining with ACM SIGKDD, pp. 37–45, 2000
28. J.R. Smith, S.F. Chang, Visually searching the web for content. IEEE Multimedia 4(3), 12–20 (1997)
29. J. Deng, W. Dong, R. Socher, L. Li, K. Li, F.F. Li, ImageNet: a large-scale hierarchical image database, in Proceedings of IEEE CVPR, pp. 156–161, 2009
30. J. Williamson, Causality, in Handbook of Philosophical Logic, ed. by D. Gabbay, F. Guenthner (Kluwer, 2005)
31. D. Geiger, D. Heckerman, Knowledge representation and inference in similarity networks and Bayesian multinets. Artif. Intell. 82, 45–74 (1996)
32. N. Friedman, D. Geiger, M. Goldszmidt, Bayesian network classifiers. Mach. Learn. 29, 131–161 (1997)
33. E.B. Goldstein, Sensation and Perception, 5th edn. (Wadsworth, Dordrecht, 1999)
34. N. Friedman, D. Koller, Learning Bayesian networks from data (tutorial), in Proceedings of NIPS, 2000
35. J.B. Tenenbaum, T.L. Griffiths, Generalization, similarity, and Bayesian inference. Behav. Brain Sci. 24, 629–641 (2001)
36. P.J. Doshi, L.G. Greenwald, J.R. Clarke, Using Bayesian networks for cleansing trauma data, in Proceedings of FLAIRS Conference, pp. 72–76, 2003
37. T. Dietterich, G. Bakiri, Solving multiclass learning problems via error-correcting output codes. J. Artif. Intell. Res. 2, 263–286 (1995)
38. NIST, Common Evaluation Measures. Appendix in Special Publication 500-250 (TREC 2001), 2001
39. J. Platt, Probabilistic outputs for SVMs and comparisons to regularized likelihood methods, in Advances in Large Margin Classifiers (MIT Press, Cambridge, 1999)
40. Y. Wu, B.L. Tseng, J.R. Smith, Ontology-based multi-classification learning for video concept detection, in Proceedings of the IEEE International Conference on Multimedia and Expo, pp. 1003–1006, 2004

Author information

Authors and Affiliations

Google Inc., Mountain View, CA, 94306, USA
Edward Y. Chang

Corresponding author

Correspondence to Edward Y. Chang.

About this chapter

Cite this chapter

Chang, E.Y. (2011). Fusing Content and Context with Causality. In: Foundations of Large-Scale Multimedia Information Management and Retrieval. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20429-6_7

DOI: https://doi.org/10.1007/978-3-642-20429-6_7
Published: 26 August 2011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20428-9
Online ISBN: 978-3-642-20429-6
eBook Packages: Computer Science, Computer Science (R0)