DOI: 10.1145/3652583.3658090

Context or Clutter? Efficiently Matching Objects Across Scenes

Published: 07 June 2024
Abstract

    Annotated images are required for numerous computer vision tasks; however, the annotation process can be time-consuming for crowdworkers and experts. Previous work has investigated novel interaction techniques and task reformulation to speed up this process; however, there remains a gap in optimizing more complex annotation tasks, such as object matching. In this paper, we explore the impact of varying the amount of context provided to annotators. We hypothesize that reducing the context around the object being matched will improve speed without sacrificing the accuracy of the annotation task. To test this hypothesis, we developed a semi-automated annotation pipeline that pre-processes images to adjust the amount of context shown around an object of interest. We conducted two studies (n = 130, n = 10) to assess the effects of context quantitatively and qualitatively. We found that while the accuracy remained the same, the time spent on the task was significantly reduced when there was less context surrounding the object. However, our qualitative findings revealed multiple scenarios in which context served as a means of guiding the object matching task, and many others in which the distinctiveness of the object guided the matching task and additional context was not needed.


    Published In

    ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval
    May 2024
    1379 pages
    ISBN:9798400706196
    DOI:10.1145/3652583

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. annotation
    2. computer vision problems
    3. computing methodologies
    4. hci design and evaluation methods
    5. human computer interaction (hci)
    6. human-centered computing
    7. matching
    8. object detection
    9. user studies

    Qualifiers

    • Research-article

    Conference

    ICMR '24
    Acceptance Rates

    Overall Acceptance Rate 254 of 830 submissions, 31%

