Human-Computer Interaction
- [1] arXiv:2407.20390 [pdf, other]
Title: Hug Reports: Supporting Expression of Appreciation between Users and Contributors of Open Source Software Packages
Subjects: Human-Computer Interaction (cs.HC)
Contributors to open source software packages often describe feeling discouraged by the lack of positive feedback from users. This paper describes a technology probe, Hug Reports, that provides users a communication affordance within their code editors, through which users can convey appreciation to contributors of packages they use. In our field study, 18 users interacted with the probe for 3 weeks, resulting in messages of appreciation to 550 contributors, 26 of whom participated in subsequent research. Our findings show how locating a communication affordance within the code editor, and allowing users to express appreciation in terms of the abstractions they are exposed to (packages, modules, functions), can support exchanges of appreciation that are meaningful to users and contributors. Findings also revealed the moments in which users expressed appreciation, the two meanings that appreciation took on -- as a measure of utility and as an act of expressive communication -- and how contributors' reactions to appreciation were influenced by their perceived level of contribution. Based on these findings, we discuss opportunities and challenges for designing appreciation systems for open source in particular, and peer production communities more generally.
- [2] arXiv:2407.20519 [pdf, other]
Title: DuA: Dual Attentive Transformer in Long-Term Continuous EEG Emotion Analysis
Comments: 11 pages, 3 figures
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Affective brain-computer interfaces (aBCIs) are increasingly recognized for their potential in monitoring and interpreting emotional states through electroencephalography (EEG) signals. Current EEG-based emotion recognition methods perform well with short segments of EEG data. However, these methods encounter significant challenges in real-life scenarios where emotional states evolve over extended periods. To address this issue, we propose a Dual Attentive (DuA) transformer framework for long-term continuous EEG emotion analysis. Unlike segment-based approaches, the DuA transformer processes an entire EEG trial as a whole, identifying emotions at the trial level, referred to as trial-based emotion analysis. This framework is designed to adapt to varying signal lengths, providing a substantial advantage over traditional methods. The DuA transformer incorporates three key modules: the spatial-spectral network module, the temporal network module, and the transfer learning module. The spatial-spectral network module simultaneously captures spatial and spectral information from EEG signals, while the temporal network module detects temporal dependencies within long-term EEG data. The transfer learning module enhances the model's adaptability across different subjects and conditions. We extensively evaluate the DuA transformer using a self-constructed long-term EEG emotion database, along with two benchmark EEG emotion databases. On the basis of the trial-based leave-one-subject-out cross-subject cross-validation protocol, our experimental results demonstrate that the proposed DuA transformer significantly outperforms existing methods in long-term continuous EEG emotion analysis, with an average enhancement of 5.28%.
- [3] arXiv:2407.20522 [pdf, other]
Title: Evaluating Fairness in Black-box Algorithmic Markets: A Case Study of Ride Sharing in Chicago
Comments: Accepted to the Humans, Algorithmic Decision-Making and Society: Modeling Interactions and Impact, co-located with the International Conference on Machine Learning, Vienna, Austria
Subjects: Human-Computer Interaction (cs.HC)
This study examines fairness within the rideshare industry, focusing on both drivers' wages and riders' trip fares. Through quantitative analysis, we found that drivers' hourly wages are significantly influenced by factors such as race/ethnicity, health insurance status, tenure on the platform, and working hours. Despite platforms' policies not intentionally embedding biases, disparities persist based on these characteristics. For ride fares, we propose a method to audit the pricing policy of a proprietary algorithm by replicating it; we conduct a hypothesis test to determine if the predicted rideshare fare is greater than the taxi fare, taking into account the approximation error in the replicated model. Challenges in accessing data and transparency hinder our ability to isolate discrimination from other factors, underscoring the need for collaboration with rideshare platforms and drivers to enhance fairness in algorithmic wage determination and pricing.
- [4] arXiv:2407.20570 [pdf, other]
Title: Fine-Tuned Large Language Model for Visualization System: A Study on Self-Regulated Learning in Education
Authors: Lin Gao, Jing Lu, Zekai Shao, Ziyue Lin, Shengbin Yue, Chiokit Ieong, Yi Sun, Rory James Zauner, Zhongyu Wei, Siming Chen
Subjects: Human-Computer Interaction (cs.HC)
Large Language Models (LLMs) have shown great potential in intelligent visualization systems, especially for domain-specific applications. Integrating LLMs into visualization systems presents challenges, and we categorize these challenges into three alignments: domain problems with LLMs, visualization with LLMs, and interaction with LLMs. To achieve these alignments, we propose a framework and outline a workflow to guide the application of fine-tuned LLMs to enhance visual interactions for domain-specific tasks. These alignment challenges are critical in education because of the need for an intelligent visualization system to support beginners' self-regulated learning. Therefore, we apply the framework to education and introduce Tailor-Mind, an interactive visualization system designed to facilitate self-regulated learning for artificial intelligence beginners. Drawing on insights from a preliminary study, we identify self-regulated learning tasks and fine-tuning objectives to guide visualization design and tuning data construction. Our focus on aligning visualization with fine-tuned LLM makes Tailor-Mind more like a personalized tutor. Tailor-Mind also supports interactive recommendations to help beginners better achieve their learning goals. Model performance evaluations and user studies confirm that Tailor-Mind improves the self-regulated learning experience, effectively validating the proposed framework.
- [5] arXiv:2407.20571 [pdf, other]
Title: Considering Visualization Example Galleries
Subjects: Human-Computer Interaction (cs.HC); Software Engineering (cs.SE)
Example galleries are often used to teach, document, and advertise visually-focused domain-specific languages and libraries, such as those producing visualizations, diagrams, or webpages. Despite their ubiquity, there is no consensus on the role of "example galleries", let alone what the best practices might be for their creation or curation. To understand gallery meaning and usage, we interviewed the creators (N=11) and users (N=9) of prominent visualization-adjacent tools. From these interviews we synthesized strategies and challenges for gallery curation and management (e.g. weighing the costs/benefits of adding new examples and trade-offs in richness vs ease of use), highlighted the differences between planned and actual gallery usage (e.g. opportunistic reuse vs search-engine optimization), and reflected on parts of the gallery design space not explored (e.g. highlighting the potential of tool assistance). We found that galleries are multi-faceted structures whose form and content are motivated to accommodate different usages--ranging from marketing material to test suite to extended documentation. This work offers a foundation for future support tools by characterizing gallery design and management, as well as by highlighting challenges and opportunities in the space (such as how more diverse galleries make reuse tasks simpler, but complicate upkeep).
- [6] arXiv:2407.20608 [pdf, other]
Title: Questionnaires for Everyone: Streamlining Cross-Cultural Questionnaire Adaptation with GPT-Based Translation Quality Evaluation
Comments: 19 pages, 13 figures
Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL)
Adapting questionnaires to new languages is a resource-intensive process often requiring the hiring of multiple independent translators, which limits the ability of researchers to conduct cross-cultural research and effectively creates inequalities in research and society. This work presents a prototype tool that can expedite the questionnaire translation process. The tool incorporates forward-backward translation using DeepL alongside GPT-4-generated translation quality evaluations and improvement suggestions. We conducted two online studies in which participants translated questionnaires from English to either German (Study 1; n=10) or Portuguese (Study 2; n=20) using our prototype. To evaluate the quality of the translations created using the tool, evaluation scores between conventionally translated and tool-supported versions were compared. Our results indicate that integrating LLM-generated translation quality evaluations and suggestions for improvement can help users independently attain results similar to those provided by conventional, non-NLP-supported translation methods. This is the first step towards more equitable questionnaire-based research, powered by AI.
- [7] arXiv:2407.20637 [pdf, other]
Title: A Qualitative Investigation to Design Empathetic Agents as Conversation Partners for People with Autism Spectrum Disorder
Comments: 4 pages, 1 figure, to be published in the conference proceedings of IEEE Conference on Games 2024
Subjects: Human-Computer Interaction (cs.HC)
Autism Spectrum Disorder (ASD) can profoundly affect reciprocal social communication, resulting in substantial and challenging impairments. One aspect is that, for people with ASD, conversations in everyday life are challenging due to difficulties in understanding social cues, interpreting emotions, and maintaining social verbal exchanges. To address these challenges and enhance social skills, we propose the development of a learning game centered around social interaction and conversation, featuring Artificial Intelligence agents. Our initial step involves seven expert interviews to gain insight into the requirements for empathetic and conversational agents in the field of improving social skills for people with ASD in a gamified environment. We have identified two distinct use cases: (1) Conversation partners to discuss real-life issues and (2) Training partners to experience various scenarios to improve social skills. In the latter case, users will receive quests for interacting with the agent. Additionally, the agent can assign quests to the user, prompting specific conversations in real life and providing rewards for successful completion of quests.
- [8] arXiv:2407.20666 [pdf, other]
Title: Steps Towards an Infrastructure for Scholarly Synthesis
Subjects: Human-Computer Interaction (cs.HC)
Sharing, reusing, and synthesizing knowledge is central to the research process, both individually, and with others. These core functions are not supported by our formal scholarly publishing infrastructure: instead of the smooth functioning of functional infrastructure, researchers resort to laborious "hacks" and workarounds to "mine" publications for what they need, and struggle to efficiently share the resulting information with others. Information scientists have proposed an alternative infrastructure based on the more appropriately granular model of a discourse graph of claims, and evidence, along with key rhetorical relationships between them. However, despite significant technical progress on standards and platforms, the predominant infrastructure remains steadfastly document-based. Drawing from infrastructure studies, we locate the current infrastructural bottlenecks in the lack of local systems that integrate discourse-centric models to augment synthesis work, from which an infrastructure for synthesis can be grown. Through 3 years of research through design and field deployment in a distributed community of hypertext notebook users, we elaborate a design vision of what can and should be built in order to grow a discourse-centric synthesis infrastructure: a thriving "installed base" of researchers authoring local, shareable discourse graphs to improve synthesis work, enhance primary research and research training, and augment collaborative research. We discuss how this design vision -- and our empirical work -- contributes steps towards a new infrastructure for synthesis, and increases HCI's capacity to advance collective intelligence and solve infrastructure-level problems.
- [9] arXiv:2407.20674 [pdf, other]
Title: TactIcons: Designing 3D Printed Map Icons for People who are Blind or have Low Vision
Comments: Published in Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23-28, 2023, Hamburg, Germany. ACM, New York, NY, USA
Subjects: Human-Computer Interaction (cs.HC)
Visual icons provide immediate recognition of features on print maps but do not translate well for touch reading by people who are blind or have low vision due to the low fidelity of tactile perception. We explored 3D printed icons as an equivalent to visual icons for tactile maps addressing these problems. We designed over 200 tactile icons (TactIcons) for street and park maps. These were touch tested by blind and sighted people, resulting in a corpus of 33 icons that can be recognised instantly and a further 34 icons that are easily learned. Importantly, this work has informed the creation of detailed guidelines for the design of TactIcons and a practical methodology for touch testing new TactIcons. It is hoped that this work will contribute to the creation of more inclusive, user-friendly tactile maps for people who are blind or have low vision.
- [10] arXiv:2407.20712 [pdf, other]
Title: Cocobo: Exploring Large Language Models as the Engine for End-User Robot Programming
Comments: This is the preprint version of a paper accepted for presentation at the IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 2024
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
End-user development allows everyday users to tailor service robots or applications to their needs. One user-friendly approach is natural language programming. However, it encounters challenges such as an expansive user expression space and limited support for debugging and editing, which restrict its application in end-user programming. The emergence of large language models (LLMs) offers promising avenues for the translation and interpretation between human language instructions and the code executed by robots, but their application in end-user programming systems requires further study. We introduce Cocobo, a natural language programming system with interactive diagrams powered by LLMs. Cocobo employs LLMs to understand users' authoring intentions, generate and explain robot programs, and facilitate the conversion between executable code and flowchart representations. Our user study shows that Cocobo has a low learning curve, enabling even users with zero coding experience to customize robot programs successfully.
- [11] arXiv:2407.20735 [pdf, other]
Title: Practices and Strategies in Responsive Thematic Map Design: A Report from Design Workshops with Experts
Comments: 10 pages, 4 figures, accepted at VIS 2024
Subjects: Human-Computer Interaction (cs.HC)
This paper discusses challenges and design strategies in responsive design for thematic maps in information visualization. Thematic maps pose a number of unique challenges for responsiveness, such as inflexible aspect ratios that do not easily adapt to varying screen dimensions, or densely clustered visual elements in urban areas becoming illegible at smaller scales. However, design guidance on how to best address these issues is currently lacking. We conducted design sessions with eight professional designers and developers of web-based thematic maps for information visualization. Participants were asked to redesign a given map for various screen sizes and aspect ratios and to describe their reasoning for when and how they adapted the design. We report general observations of practitioners' motivations, decision-making processes, and personal design frameworks. We then derive seven challenges commonly encountered in responsive maps, and 17 strategies to address them, such as repositioning elements, segmenting the map, or using alternative visualizations. We compile these challenges and strategies into an illustrated cheat sheet targeted at anyone designing or learning to design responsive maps. The cheat sheet is available online: this https URL
New submissions for Wednesday, 31 July 2024 (showing 11 of 11 entries)
- [12] arXiv:2407.20250 (cross-list from eess.SP) [pdf, other]
Title: Riemannian Geometry-Based EEG Approaches: A Literature Review
Authors: Imad Eddine Tibermacine, Samuele Russo, Ahmed Tibermacine, Abdelaziz Rabehi, Bachir Nail, Kamel Kadri, Christian Napoli
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
The application of Riemannian geometry in the decoding of brain-computer interfaces (BCIs) has swiftly garnered attention because of its straightforwardness, precision, and resilience, along with its aptitude for transfer learning, which has been demonstrated through significant achievements in global BCI competitions. This paper presents a comprehensive review of recent advancements in the integration of deep learning with Riemannian geometry to enhance EEG signal decoding in BCIs. Our review updates the findings since the last major review in 2017, comparing modern approaches that utilize deep learning to improve the handling of non-Euclidean data structures inherent in EEG signals. We discuss how these approaches not only tackle the traditional challenges of noise sensitivity, non-stationarity, and lengthy calibration times but also introduce novel classification frameworks and signal processing techniques to reduce these limitations significantly. Furthermore, we identify current shortcomings and propose future research directions in manifold learning and Riemannian-based classification, focusing on practical implementations and theoretical expansions, such as feature tracking on manifolds, multitask learning, feature extraction, and transfer learning. This review aims to bridge the gap between theoretical research and practical, real-world applications, making sophisticated mathematical approaches accessible and actionable for BCI enhancements.
- [13] arXiv:2407.20439 (cross-list from cs.RO) [pdf, other]
Title: Haptic feedback of front car motion can improve driving control
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC); Systems and Control (eess.SY)
This study investigates the role of haptic feedback in a car-following scenario, where information about the motion of the front vehicle is provided through a virtual elastic connection with it. Using a robotic interface in a simulated driving environment, we examined the impact of varying levels of such haptic feedback on the driver's ability to follow the road while avoiding obstacles. The results of an experiment with 15 subjects indicate that haptic feedback from the front car's motion can significantly improve driving control (i.e., reduce motion jerk and deviation from the road) and reduce mental load (evaluated via questionnaire). This suggests that haptic communication, as observed between physically interacting humans, can be leveraged to improve safety and efficiency in automated driving systems, warranting further testing in real driving scenarios.
- [14] arXiv:2407.20513 (cross-list from cs.CL) [pdf, other]
Title: Prompt2DeModel: Declarative Neuro-Symbolic Modeling with Natural Language
Comments: Accepted in NeSy 2024 Conference
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
This paper presents a conversational pipeline for crafting domain knowledge for complex neuro-symbolic models through natural language prompts. It leverages large language models to generate declarative programs in the DomiKnowS framework. The programs in this framework express concepts and their relationships as a graph in addition to logical constraints between them. The graph, later, can be connected to trainable neural models according to those specifications. Our proposed pipeline utilizes techniques like dynamic in-context demonstration retrieval, model refinement based on feedback from a symbolic parser, visualization, and user interaction to generate the tasks' structure and formal knowledge representation. This approach empowers domain experts, even those not well-versed in ML/AI, to formally declare their knowledge to be incorporated in customized neural models in the DomiKnowS framework.
- [15] arXiv:2407.20542 (cross-list from cs.CV) [pdf, other]
Title: HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation
Comments: Accepted as a conference paper to European Conference on Computer Vision (ECCV) 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
The extraction of keypoint positions from input hand frames, known as 3D hand pose estimation, is crucial for various human-computer interaction applications. However, current approaches often struggle with the dynamic nature of self-occlusion of hands and intra-occlusion with interacting objects. To address this challenge, this paper proposes the Denoising Adaptive Graph Transformer, HandDAGT, for hand pose estimation. The proposed HandDAGT leverages a transformer structure to thoroughly explore effective geometric features from input patches. Additionally, it incorporates a novel attention mechanism to adaptively weigh the contribution of kinematic correspondence and local geometric features for the estimation of specific keypoints. This attribute enables the model to adaptively employ kinematic and local information based on the occlusion situation, enhancing its robustness and accuracy. Furthermore, we introduce a novel denoising training strategy aimed at improving the model's robust performance in the face of occlusion challenges. Experimental results show that the proposed model significantly outperforms the existing methods on four challenging hand pose benchmark datasets. Codes and pre-trained models are publicly available at this https URL.
- [16] arXiv:2407.20845 (cross-list from cs.CV) [pdf, other]
Title: Assessing Graphical Perception of Image Embedding Models using Channel Effectiveness
Comments: In Proceedings of the 2024 IEEE Visualization and Visual Analytics (VIS)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
Recent advancements in vision models have greatly improved their ability to handle complex chart understanding tasks, like chart captioning and question answering. However, it remains challenging to assess how these models process charts. Existing benchmarks only roughly evaluate model performance without evaluating the underlying mechanisms, such as how models extract image embeddings. This limits our understanding of the model's ability to perceive fundamental graphical components. To address this, we introduce a novel evaluation framework to assess the graphical perception of image embedding models. For chart comprehension, we examine two main aspects of channel effectiveness: accuracy and discriminability of various visual channels. Channel accuracy is assessed through the linearity of embeddings, measuring how well the perceived magnitude aligns with the size of the stimulus. Discriminability is evaluated based on the distances between embeddings, indicating their distinctness. Our experiments with the CLIP model show that it perceives channel accuracy differently from humans and shows unique discriminability in channels like length, tilt, and curvature. We aim to develop this work into a broader benchmark for reliable visual encoders, enhancing models for precise chart comprehension and human-like perception in future applications.
- [17] arXiv:2407.20900 (cross-list from cs.SE) [pdf, other]
Title: Visual Analysis of GitHub Issues to Gain Insights
Subjects: Software Engineering (cs.SE); Human-Computer Interaction (cs.HC)
Version control systems are integral to software development, with GitHub emerging as a popular online platform due to its comprehensive project management tools, including issue tracking and pull requests. However, GitHub lacks a direct link between issues and commits, making it difficult for developers to understand how specific issues are resolved. Although GitHub's Insights page provides some visualization for repository data, the representation of issues and commits related data in a textual format hampers quick evaluation of issue management. This paper presents a prototype web application that generates visualizations to offer insights into issue timelines and reveals different factors related to issues. It focuses on the lifecycle of issues and depicts vital information to enhance users' understanding of development patterns in their projects. We demonstrate the effectiveness of our approach through case studies involving three open-source GitHub repositories. Furthermore, we conducted a user evaluation to validate the efficacy of our prototype in conveying crucial repository information more efficiently and rapidly.
- [18] arXiv:2407.20990 (cross-list from cs.AI) [pdf, other]
Title: From Feature Importance to Natural Language Explanations Using LLMs with RAG
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
As machine learning becomes increasingly integral to autonomous decision-making processes involving human interaction, the necessity of comprehending the model's outputs through conversational means increases. Most recently, foundation models are being explored for their potential as post hoc explainers, providing a pathway to elucidate the decision-making mechanisms of predictive models. In this work, we introduce traceable question-answering, leveraging an external knowledge repository to inform the responses of Large Language Models (LLMs) to user queries within a scene understanding task. This knowledge repository comprises contextual details regarding the model's output, containing high-level features, feature importance, and alternative probabilities. We employ subtractive counterfactual reasoning to compute feature importance, a method that entails analysing output variations resulting from decomposing semantic features. Furthermore, to maintain a seamless conversational flow, we integrate four key characteristics - social, causal, selective, and contrastive - drawn from social science research on human explanations into a single-shot prompt, guiding the response generation process. Our evaluation demonstrates that explanations generated by the LLMs encompassed these elements, indicating its potential to bridge the gap between complex model outputs and natural language expressions.
- [19] arXiv:2407.21010 (cross-list from cs.CY) [pdf, other]
Title: Human-Data Interaction Framework: A Comprehensive Model for a Future Driven by Data and Humans
Comments: 39 pages, 138 references, Human-Data Interaction
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
In an age defined by rapid data expansion, the connection between individuals and their digital footprints has become more intricate. The Human-Data Interaction (HDI) framework has become an essential approach to tackling the challenges and ethical issues associated with data governance and utilization in the modern digital world. This paper outlines the fundamental steps required for organizations to seamlessly integrate HDI principles, emphasizing auditing, aligning, formulating considerations, and the need for continuous monitoring and adaptation. Through a thorough audit, organizations can critically assess their current data management practices, trace the data lifecycle from collection to disposal, and evaluate the effectiveness of existing policies, security protocols, and user interfaces. The next step involves aligning these practices with the main HDI principles, such as informed consent, data transparency, user control, algorithm transparency, and ethical data use, to identify gaps that need strategic action. Formulating preliminary considerations includes developing policies and technical solutions to close identified gaps, ensuring that these practices not only meet legal standards, but also promote fairness and accountability in data interactions. The final step, monitoring and adaptation, highlights the need for setting up continuous evaluation mechanisms and being responsive to technological, regulatory, and societal developments, ensuring HDI practices stay up-to-date and effective. Successful implementation of the HDI framework requires multi-disciplinary collaboration, incorporating insights from technology, law, ethics, and user experience design. The paper posits that this comprehensive approach is vital for building trust and legitimacy in digital environments, ultimately leading to more ethical, transparent, and user-centric data interactions.
Cross submissions for Wednesday, 31 July 2024 (showing 8 of 8 entries)
- [20] arXiv:2310.00491 (replaced) [pdf, other]
Title: StreetNav: Leveraging Street Cameras to Support Precise Outdoor Navigation for Blind Pedestrians
Authors: Gaurav Jain, Basel Hindi, Zihao Zhang, Koushik Srinivasula, Mingyu Xie, Mahshid Ghasemi, Daniel Weiner, Sophie Ana Paris, Xin Yi Therese Xu, Michael Malcolm, Mehmet Turkcan, Javad Ghaderi, Zoran Kostic, Gil Zussman, Brian A. Smith
Subjects: Human-Computer Interaction (cs.HC)
Blind and low-vision (BLV) people rely on GPS-based systems for outdoor navigation. GPS's inaccuracy, however, causes them to veer off track, run into obstacles, and struggle to reach precise destinations. While prior work has made precise navigation possible indoors via hardware installations, enabling this outdoors remains a challenge. Interestingly, many outdoor environments are already instrumented with hardware such as street cameras. In this work, we explore the idea of repurposing existing street cameras for outdoor navigation. Our community-driven approach considers both technical and sociotechnical concerns through engagements with various stakeholders: BLV users, residents, business owners, and Community Board leadership. The resulting system, StreetNav, processes a camera's video feed using computer vision and gives BLV pedestrians real-time navigation assistance. Our evaluations show that StreetNav guides users more precisely than GPS, but its technical performance is sensitive to environmental occlusions and distance from the camera. We discuss future implications for deploying such systems at scale.
- [21] arXiv:2406.19528 (replaced) [pdf, other]
Title: Harnessing LLMs for Automated Video Content Analysis: An Exploratory Workflow of Short Videos on Depression
Comments: 7 pages, 2 figures, accepted by CSCW 24
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Despite the growing interest in leveraging Large Language Models (LLMs) for content analysis, current studies have primarily focused on text-based content. In the present work, we explored the potential of LLMs in assisting video content analysis by conducting a case study that followed a new workflow of LLM-assisted multimodal content analysis. The workflow encompasses codebook design, prompt engineering, LLM processing, and human evaluation. We strategically crafted annotation prompts to get LLM Annotations in structured form and explanation prompts to generate LLM Explanations for a better understanding of LLM reasoning and transparency. To test LLM's video annotation capabilities, we analyzed 203 keyframes extracted from 25 YouTube short videos about depression. We compared the LLM Annotations with those of two human coders and found that LLM has higher accuracy in object and activity Annotations than emotion and genre Annotations. Moreover, we identified the potential and limitations of LLM's capabilities in annotating videos. Based on the findings, we explore opportunities and challenges for future research and improvements to the workflow. We also discuss ethical concerns surrounding future studies based on LLM-assisted video analysis.
- [22] arXiv:2407.08501 (replaced) [pdf, other]
Title: SelfIE: Self-Initiated Explorable Instructions Towards Enhanced User Experience
Subjects: Human-Computer Interaction (cs.HC)
Given the widespread use of procedural instructions with non-linear access (situational information retrieval), there has been a proposal to accommodate both linear and non-linear usage in instructional design. However, this proposal has received little scholarly attention and remains underexplored. This paper introduces Self-Initiated Explorable (SelfIE) instructions, a new design concept aimed at enabling users to navigate instructions flexibly by blending linear and non-linear access according to individual needs and situations during tasks. Using a Wizard-of-Oz (WoZ) protocol, we initially embodied SelfIE instructions within a toy-block assembly context and compared them with baseline instructions offering linear-only access (N=21). Results show a 71% increase in user preference, owing to the ease with which SelfIE instructions reflect individual differences, empirically supporting the prior proposal. In addition, our observations identify three strategies for flexible access and suggest the potential of enhancing the user experience by considering cognitive processes and implementing flexible access in a wearable configuration. Following the design phase, we translated the WoZ-based design embodiment into working prototypes on a tablet and an OHMD to assess usability and compare user experience between the two configurations (N=8). The data yield insights into managing the trade-offs between the two configurations, informing the development of more effective flexible access.
- [23] arXiv:2407.18803 (replaced) [pdf, other]
Title: Design Frictions on Social Media: Balancing Reduced Mindless Scrolling and User Satisfaction
Comments: 6 pages, 1 figure, Muc '24
Subjects: Human-Computer Interaction (cs.HC)
Design features of social media platforms, such as infinite scroll, increase users' likelihood of experiencing normative dissociation -- a mental state of absorption that diminishes self-awareness and disrupts memory. This paper investigates how adding design frictions to the interface of a social media platform reduces mindless scrolling and user satisfaction. We conducted a study with 30 participants and compared their memory recognition of posts in two scenarios: one where participants had to react to each post to access further content and another using an infinite scroll design. Participants who used the design frictions interface exhibited significantly better content recall, although a majority of participants found the interface frustrating. We discuss design recommendations and scenarios where adding design frictions to social media platforms can be beneficial.
- [24] arXiv:2407.18874 (replaced) [pdf, other]
Title: Engaging with Children's Artwork in Mixed Visual-Ability Families
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
We present two studies exploring how blind or low-vision (BLV) family members engage with their sighted children's artwork, strategies to support understanding and interpretation, and the potential role of technology, such as AI, therein. Our first study involved 14 BLV individuals, and the second included five groups of BLV individuals with their children. Through semi-structured interviews with AI descriptions of children's artwork and multi-sensory design probes, we found that BLV family members value artwork engagement as a bonding opportunity, preferring the child's storytelling and interpretation over other nonvisual representations. Additionally, despite some inaccuracies, BLV family members felt that AI-generated descriptions could facilitate dialogue with their children and aid self-guided art discovery. We close with specific design considerations for supporting artwork engagement in mixed visual-ability families, including enabling artwork access through various methods, supporting children's corrections of AI output, and distinctions in context vs. content and interpretation vs. description of children's artwork.
- [25] arXiv:2407.19537 (replaced) [pdf, other]
Title: Enabling Uniform Computer Interaction Experience for Blind Users through Large Language Models
Subjects: Human-Computer Interaction (cs.HC)
Blind individuals, who by necessity depend on screen readers to interact with computers, face considerable challenges in navigating the diverse and complex graphical user interfaces of different computer applications. The heterogeneity of application interfaces often requires blind users to remember different keyboard combinations and navigation methods to use each application effectively. To alleviate this significant interaction burden, we present Savant, a novel assistive technology powered by large language models (LLMs) that allows blind screen reader users to interact uniformly with any application interface through natural language. Savant can also automate a series of tedious screen reader actions on the control elements of the application when prompted by a natural language command from the user. These commands are flexible in that the user is not strictly required to specify the exact names of the control elements. A user study of Savant with 11 blind participants demonstrated significant improvements in interaction efficiency and usability compared to current practices.
- [26] arXiv:2209.08199 (replaced) [pdf, other]
Title: ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots
Yu-Chung Hsiao, Fedir Zubach, Gilles Baechler, Victor Carbune, Jason Lin, Maria Wang, Srinivas Sunkara, Yun Zhu, Jindong Chen
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
We present a new benchmark and dataset, ScreenQA, for screen content understanding via question answering. Existing screen datasets focus either on structure and component-level understanding, or on much higher-level composite tasks such as navigation and task completion. We attempt to bridge the gap between these two by annotating 86K question-answer pairs over the RICO dataset in the hope of benchmarking screen reading comprehension capacity. This work is also the first to annotate answers for different application scenarios, including both full sentences and short forms, as well as supporting UI contents on screen and their bounding boxes. With the rich annotation, we discuss and define the evaluation metrics of the benchmark, show applications of the dataset, and provide a few baselines using closed- and open-source models.
- [27] arXiv:2210.08731 (replaced) [pdf, other]
Title: Evaluation of Pedestrian Safety in a High-Fidelity Simulation Environment Framework
Lin Ma, Longrui Chen, Yan Zhang, Mengdi Chu, Wenjie Jiang, Jiahao Shen, Chuxuan Li, Yifeng Shi, Nairui Luo, Jirui Yuan, Guyue Zhou, Jiangtao Gong
Comments: Resubmitted to ITSC 2024; accepted by ITSC 2024
Subjects: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Robotics (cs.RO)
Pedestrian safety is a crucial factor in assessing autonomous driving scenarios. However, pedestrian safety evaluation is rarely considered by existing autonomous driving simulation platforms. This paper proposes a pedestrian safety evaluation method for autonomous driving that considers not only collision events but also conflict events, together with the characteristics of pedestrians. Moreover, to apply the pedestrian safety evaluation system, we construct a high-fidelity simulation framework embedded with pedestrian safety-critical characteristics. We demonstrate our simulation framework and pedestrian safety evaluation in a comparative experiment with two kinds of autonomous driving perception algorithms -- single-vehicle perception and vehicle-to-infrastructure (V2I) cooperative perception. The results show that our framework can evaluate different autonomous driving algorithms with detailed and quantitative pedestrian safety indexes. Thus, the proposed simulation method and framework can be used to assess different autonomous driving algorithms and evaluate pedestrian safety performance in future autonomous driving simulations, which can inspire more pedestrian-friendly autonomous driving algorithms.
- [28] arXiv:2402.16654 (replaced) [pdf, other]
Title: GigaPevt: Multimodal Medical Assistant
Pavel Blinov, Konstantin Egorov, Ivan Sviridov, Nikolay Ivanov, Stepan Botman, Evgeniy Tagin, Stepan Kudin, Galina Zubkova, Andrey Savchenko
Comments: IJCAI 2024, 4 pages, 2 figures, 2 tables
Journal-ref: Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI) Demo Track, 2024, pp. 8614-8618
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Building an intelligent and efficient medical assistant is still a challenging AI problem. The major limitation comes from the scarcity of data modalities, which prevents a comprehensive perception of the patient. This demo paper presents GigaPevt, the first multimodal medical assistant that combines the dialog capabilities of large language models with specialized medical models. Such an approach shows immediate advantages in dialog quality and metric performance, with a 1.18% accuracy improvement in the question-answering task.
- [29] arXiv:2402.17270 (replaced) [pdf, other]
Title: Multi-Agent, Human-Agent and Beyond: A Survey on Cooperation in Social Dilemmas
Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
The study of cooperation within social dilemmas has long been a fundamental topic across various disciplines, including computer science and social science. Recent advancements in Artificial Intelligence (AI) have significantly reshaped this field, offering fresh insights into understanding and enhancing cooperation. This survey examines three key areas at the intersection of AI and cooperation in social dilemmas. First, focusing on multi-agent cooperation, we review the intrinsic and external motivations that support cooperation among rational agents, and the methods employed to develop effective strategies against diverse opponents. Second, looking into human-agent cooperation, we discuss the current AI algorithms for cooperating with humans and the human biases towards AI agents. Third, we review the emergent field of leveraging AI agents to enhance cooperation among humans. We conclude by discussing future research avenues, such as using large language models, establishing unified theoretical frameworks, revisiting existing theories of human cooperation, and exploring multiple real-world applications.
- [30] arXiv:2407.10246 (replaced) [pdf, other]
Title: CourseAssist: Pedagogically Appropriate AI Tutor for Computer Science Education
Comments: Accepted to SIGCSE Virtual 2024
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
The growing enrollments in computer science courses and increase in class sizes necessitate scalable, automated tutoring solutions to adequately support student learning. While Large Language Models (LLMs) like GPT-4 have demonstrated potential in assisting students through question-answering, educators express concerns over student overreliance, miscomprehension of generated code, and the risk of inaccurate answers. Rather than banning these tools outright, we advocate for a constructive approach that harnesses the capabilities of AI while mitigating potential risks. This poster introduces CourseAssist, a novel LLM-based tutoring system tailored for computer science education. Unlike generic LLM systems, CourseAssist uses retrieval-augmented generation, user intent classification, and question decomposition to align AI responses with specific course materials and learning objectives, thereby ensuring pedagogical appropriateness of LLMs in educational settings. We evaluated CourseAssist against a baseline of GPT-4 using a dataset of 50 question-answer pairs from a programming languages course, focusing on the criteria of usefulness, accuracy, and pedagogical appropriateness. Evaluation results show that CourseAssist significantly outperforms the baseline, demonstrating its potential to serve as an effective learning assistant. We have also deployed CourseAssist in 6 computer science courses at a large public R1 research university reaching over 500 students. Interviews with 20 student users show that CourseAssist improves computer science instruction by increasing the accessibility of course-specific tutoring help and shortening the feedback loop on their programming assignments. Future work will include extensive pilot testing at more universities and exploring better collaborative relationships between students, educators, and AI that improve computer science learning experiences.