DOI: 10.1145/3652583.3658090

Context or Clutter? Efficiently Matching Objects Across Scenes

Published: 07 June 2024
Abstract

    Annotated images are required for numerous computer vision tasks; however, the annotation process can be time-consuming for crowdworkers and experts. Previous work has investigated novel interaction techniques and task reformulation to speed up this process; however, there remains a gap in optimizing more complex annotation tasks, such as object matching. In this paper, we explore the impact of varying the amount of context provided to annotators. We hypothesize that reducing the context around the object being matched will improve speed without sacrificing the accuracy of the annotation task. To test this hypothesis, we developed a semi-automated annotation pipeline that pre-processes images to adjust the amount of context shown around an object of interest. We conducted two studies (n = 130, n = 10) to assess the effects of context quantitatively and qualitatively. We found that while the accuracy remained the same, the time spent on the task was significantly reduced when there was less context surrounding the object. However, our qualitative findings revealed multiple scenarios in which context served as a means of guiding the object matching task, and many others in which the distinctiveness of the object guided the matching task and additional context was not needed.


    Published In

    ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval
    May 2024
    1379 pages
    ISBN:9798400706196
    DOI:10.1145/3652583

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. annotation
    2. computer vision problems
    3. computing methodologies
    4. hci design and evaluation methods
    5. human computer interaction (hci)
    6. human-centered computing
    7. matching
    8. object detection
    9. user studies

    Qualifiers

    • Research-article

    Conference

    ICMR '24
    Acceptance Rates

    Overall Acceptance Rate 254 of 830 submissions, 31%

