Computers and Society
Showing new listings for Friday, 18 October 2024
- [1] arXiv:2410.12793 [pdf, html, other]
-
Title: Environment Scan of Generative AI Infrastructure for Clinical and Translational Science
Authors: Betina Idnay, Zihan Xu, William G. Adams, Mohammad Adibuzzaman, Nicholas R. Anderson, Neil Bahroos, Douglas S. Bell, Cody Bumgardner, Thomas Campion, Mario Castro, James J. Cimino, I. Glenn Cohen, David Dorr, Peter L Elkin, Jungwei W. Fan, Todd Ferris, David J. Foran, David Hanauer, Mike Hogarth, Kun Huang, Jayashree Kalpathy-Cramer, Manoj Kandpal, Niranjan S. Karnik, Avnish Katoch, Albert M. Lai, Christophe G. Lambert, Lang Li, Christopher Lindsell, Jinze Liu, Zhiyong Lu, Yuan Luo, Peter McGarvey, Eneida A. Mendonca, Parsa Mirhaji, Shawn Murphy, John D. Osborne, Ioannis C. Paschalidis, Paul A. Harris, Fred Prior, Nicholas J. Shaheen, Nawar Shara, Ida Sim, Umberto Tachinardi, Lemuel R. Waitman, Rosalind J. Wright, Adrian H. Zai, Kai Zheng, Sandra Soo-Jin Lee, Bradley A. Malin, Karthik Natarajan, W. Nicholson Price II, Rui Zhang, Yiye Zhang, Hua Xu, Jiang Bian, Chunhua Weng, Yifan Peng
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
This study reports a comprehensive environmental scan of the generative AI (GenAI) infrastructure in the national network for clinical and translational science across 36 institutions supported by the Clinical and Translational Science Award (CTSA) Program led by the National Center for Advancing Translational Sciences (NCATS) of the National Institutes of Health (NIH) in the United States. With the rapid advancement of GenAI technologies, including large language models (LLMs), healthcare institutions face unprecedented opportunities and challenges. This research explores the current status of GenAI integration, focusing on stakeholder roles, governance structures, and ethical considerations by administering a survey among leaders of health institutions (i.e., representing academic medical centers and health systems) to assess the institutional readiness and approach towards GenAI adoption. Key findings indicate a diverse range of institutional strategies, with most organizations in the experimental phase of GenAI deployment. The study highlights significant variations in governance models, with a strong preference for centralized decision-making but notable gaps in workforce training and ethical oversight. Moreover, the results underscore the need for a more coordinated approach to GenAI governance, emphasizing collaboration among senior leaders, clinicians, information technology staff, and researchers. Our analysis also reveals concerns regarding GenAI bias, data security, and stakeholder trust, which must be addressed to ensure the ethical and effective implementation of GenAI technologies. This study offers valuable insights into the challenges and opportunities of GenAI integration in healthcare, providing a roadmap for institutions aiming to leverage GenAI for improved quality of care and operational efficiency.
- [2] arXiv:2410.12795 [pdf, other]
-
Title: Integrating AI Education in Disciplinary Engineering Fields: Towards a System and Change Perspective
Comments: Accepted and presented at 52nd Annual Conference of the European Society for Engineering Education (SEFI)
Subjects: Computers and Society (cs.CY)
Building up competencies in working with data and tools of Artificial Intelligence (AI) is becoming more relevant across disciplinary engineering fields. While the adoption of tools for teaching and learning, such as ChatGPT, is garnering significant attention, integration of AI knowledge, competencies, and skills within engineering education is lacking. Building upon existing curriculum change research, this practice paper introduces a systems perspective on integrating AI education within engineering through the lens of a change model. In particular, it identifies core aspects that shape AI adoption on a program level as well as internal and external influences using existing literature and a practical case study. Overall, the paper provides an analysis frame to enhance the understanding of change initiatives and builds the basis for generalizing insights from different initiatives in the adoption of AI in engineering education.
- [3] arXiv:2410.12796 [pdf, other]
-
Title: A Roles-based Competency Framework for Integrating Artificial Intelligence (AI) in Engineering Courses
Comments: Accepted and presented at the 52nd Annual Conference of the European Society for Engineering Education (SEFI)
Subjects: Computers and Society (cs.CY)
In this practice paper, we propose a framework for integrating AI into disciplinary engineering courses and curricula. The use of AI within engineering is an emerging but growing area and the knowledge, skills, and abilities (KSAs) associated with it are novel and dynamic. This makes it challenging for faculty who are looking to incorporate AI within their courses to create a mental map of how to tackle this challenge. In this paper, we advance a role-based conception of competencies to assist disciplinary faculty with identifying and implementing AI competencies within engineering curricula. We draw on prior work related to AI literacy and competencies and on emerging research on the use of AI in engineering. To illustrate the use of the framework, we provide two exemplary cases. We discuss the challenges in implementing the framework and emphasize the need for an embedded approach where AI concerns are integrated across multiple courses throughout the degree program, especially for teaching responsible and ethical AI development and use.
- [4] arXiv:2410.12797 [pdf, html, other]
-
Title: Identification of crowds using mobile crowd detection (MCS) and visualization with the DBSCAN algorithm for a Smart Campus environment
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Multidisciplinary research, in conjunction with artificial intelligence (AI), the Internet of Things (IoT), Blockchain and Big Data analysis, has lowered barriers and made companies more productive; in other words, the joint work of these areas has promoted digital transformation across sectors. For example, AI has made it possible to automate processes, and the IoT has connected devices and physical objects, enabling real-time data collection and analysis. Blockchain has provided a secure and transparent way to transact and store data. Big Data analysis has allowed companies to obtain valuable insights from large amounts of data. As these technologies continue to evolve, we can expect to see even more innovations and benefits in the future. This paper explores the feasibility of using Mobile Crowd Sensing (MCS) and visualization algorithms to detect crowding on a university campus. A survey was conducted to evaluate the university community's perception of a mobile application that provides information about crowds, and a detection scenario was simulated using randomly generated data and the DBSCAN algorithm for visualization. Preliminary results suggest that the system is viable and could be a useful tool for the prevention of accidents due to crowding and for the management of public spaces. The limitations of the study are discussed and future lines of research are proposed, such as crowd prediction, data privacy, and visualization optimization.
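As a rough illustration of the density-based clustering step this abstract mentions, the sketch below runs a minimal textbook DBSCAN over a handful of made-up 2D device positions. The coordinates, the `eps` radius, and the `min_pts` threshold are assumptions chosen for the example; the abstract does not state the authors' parameters, data, or implementation.

```python
import math

NOISE = -1  # label for points that belong to no dense region


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Minimal textbook DBSCAN; returns one cluster label per point."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


# Hypothetical device positions (in metres): two gatherings and one lone device.
positions = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
    (50.0, 50.0),
]
labels = dbscan(positions, eps=1.5, min_pts=3)
crowd_count = len({label for label in labels if label != NOISE})
```

Each non-noise label corresponds to one detected gathering, so `crowd_count` is the number of dense groups under the assumed parameters.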
- [5] arXiv:2410.12800 [pdf, other]
-
Title: Reproducibility Needs Reshape Scientific Data Governance
Authors: Paul Meijer, Yousef Aggoune, Madeline Ambrose, Aldan Beaubien, James Harvey, Nicole Howard, Neelima Inala, Ed Johnson, Autumn Kelsey, Melissa Kinsey, Jessica Liang, Paul Mariz, Stark Pister, Sathya Subramanian, Vitalii Tereshchenko, Anne Vetto
Subjects: Computers and Society (cs.CY)
Scientific data governance should prioritize maximizing the utility of data throughout the research lifecycle. Research software systems that enable analysis reproducibility inform data governance policies and assist administrators in setting clear guidelines for data reuse, data retention, and the management of scientific computing needs. Proactive analysis reproducibility and data governance are integral and interconnected components of research lifecycle management.
- [6] arXiv:2410.12803 [pdf, html, other]
-
Title: Developing Guidelines for Functionally-Grounded Evaluation of Explainable Artificial Intelligence using Tabular Data
Authors: Mythreyi Velmurugan, Chun Ouyang, Yue Xu, Renuka Sindhgatta, Bemali Wickramanayake, Catarina Moreira
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG)
Explainable Artificial Intelligence (XAI) techniques are used to provide transparency to complex, opaque predictive models. However, these techniques are often designed for image and text data, and it is unclear how fit-for-purpose they are when applied to tabular data. As XAI techniques are rarely evaluated in settings with tabular data, the applicability of existing evaluation criteria and methods is also unclear and needs (re-)examination. For example, some works suggest that evaluation methods may unduly influence the evaluation results when using tabular data. This lack of clarity on evaluation procedures can lead to reduced transparency and ineffective use of XAI techniques in real-world settings. In this study, we examine literature on XAI evaluation to derive guidelines on functionally-grounded assessment of local, post hoc XAI techniques. We identify 20 evaluation criteria and associated evaluation methods, and derive guidelines on when and how each criterion should be evaluated. We also identify key research gaps to be addressed by future work. Our study contributes to the body of knowledge on XAI evaluation through in-depth examination of functionally-grounded XAI evaluation protocols, and has laid the groundwork for future research on XAI evaluation.
- [7] arXiv:2410.12804 [pdf, html, other]
-
Title: Hip Fracture Patient Pathways and Agent-based Modelling
Comments: 4 pages, 2 figures
Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG)
Increased healthcare demand is significantly straining European services. Digital solutions including advanced modelling techniques offer a promising solution to optimising patient flow without impacting day-to-day healthcare provision. In this work we outline an ongoing project that aims to optimise healthcare resources using agent-based simulations.
- [8] arXiv:2410.13009 [pdf, html, other]
-
Title: Is ETHICS about ethics? Evaluating the ETHICS benchmark
Subjects: Computers and Society (cs.CY)
ETHICS is probably the most-cited dataset for testing the ethical capabilities of language models. Drawing on moral theory, psychology, and prompt evaluation, we interrogate the validity of the ETHICS benchmark. Adding to prior work, our findings suggest that having a clear understanding of ethics and how it relates to empirical phenomena is key to the validity of ethics evaluations for AI.
- [9] arXiv:2410.13042 [pdf, html, other]
-
Title: How Do AI Companies "Fine-Tune" Policy? Examining Regulatory Capture in AI Governance
Comments: 39 pages (14 pages main text), 3 figures, 9 tables. To be published in the Proceedings of the 2024 AAAI/ACM Conference on AI, Ethics, & Society (AIES)
Subjects: Computers and Society (cs.CY)
Industry actors in the United States have gained extensive influence in conversations about the regulation of general-purpose artificial intelligence (AI) systems. Although industry participation is an important part of the policy process, it can also cause regulatory capture, whereby industry co-opts regulatory regimes to prioritize private over public welfare. Capture of AI policy by AI developers and deployers could hinder such regulatory goals as ensuring the safety, fairness, beneficence, transparency, or innovation of general-purpose AI systems. In this paper, we first introduce different models of regulatory capture from the social science literature. We then present results from interviews with 17 AI policy experts on what policy outcomes could compose regulatory capture in US AI policy, which AI industry actors are influencing the policy process, and whether and how AI industry actors attempt to achieve outcomes of regulatory capture. Experts were primarily concerned with capture leading to a lack of AI regulation, weak regulation, or regulation that over-emphasizes certain policy goals over others. Experts most commonly identified agenda-setting (15 of 17 interviews), advocacy (13), academic capture (10), information management (9), cultural capture through status (7), and media capture (7) as channels for industry influence. To mitigate these particular forms of industry influence, we recommend systemic changes in developing technical expertise in government and civil society, independent funding streams for the AI ecosystem, increased transparency and ethics requirements, greater civil society access to policy, and various procedural safeguards.
- [10] arXiv:2410.13090 [pdf, html, other]
-
Title: Exploring the Head Effect in Live Streaming Platforms: A Two-Sided Market and Welfare Analysis
Subjects: Computers and Society (cs.CY)
This paper develops a theoretical framework to analyze live streaming platforms as two-sided markets connecting streamers and viewers, focusing on the "head effect" where a few top streamers attract most viewers due to strong network effects and platform policies like commission rates and traffic allocation algorithms. Using static and dynamic models, it examines how these factors lead to traffic concentration and winner-takes-all scenarios. The welfare implications are assessed, revealing that while such concentration may enhance consumer utility in the short term, it can reduce content diversity and overall social welfare in the long run. The paper proposes policy interventions to adjust traffic allocation, promoting a more equitable distribution of viewers across streamers, and demonstrates through simulations that combining multiple policies can significantly reduce market concentration and enhance social welfare.
- [11] arXiv:2410.13101 [pdf, html, other]
-
Title: The Influence of Generative AI on Content Platforms: Supply, Demand, and Welfare Impacts in Two-Sided Markets
Subjects: Computers and Society (cs.CY)
This paper explores how generative artificial intelligence (AI) affects online platforms where both human creators and AI generate content. We develop a model to understand how generative AI changes supply and demand, impacts traffic distribution, and influences social welfare. Our analysis shows that AI can lead to a huge increase in content supply due to its low cost, which could cause oversupply. While AI boosts content variety, it can also create information overload, lowering user satisfaction and disrupting the market. AI also increases traffic concentration among top creators (the "winner-takes-all" effect) while expanding opportunities for niche content (the "long-tail" effect). We assess how these changes affect consumer and producer benefits, finding that the overall impact depends on the quality of AI-generated content and the level of information overload. Through simulation experiments, we test policy ideas, such as adjusting platform fees and recommendations, to reduce negative effects and improve social welfare. The results highlight the need for careful management of AI's role in online content platforms to maintain a healthy balance.
- [12] arXiv:2410.13326 [pdf, html, other]
-
Title: Comparing the Utility, Preference, and Performance of Course Material Search Functionality and Retrieval-Augmented Generation Large Language Model (RAG-LLM) AI Chatbots in Information-Seeking Tasks
Comments: 12 pages, 4 figures
Subjects: Computers and Society (cs.CY); Information Retrieval (cs.IR)
Providing sufficient support for students requires substantial resources, especially considering the growing enrollment numbers. Students need help in a variety of tasks, ranging from information-seeking to requiring support with course assignments. To explore the utility of recent large language models (LLMs) as a support mechanism, we developed an LLM-powered AI chatbot that augments the answers that are produced with information from the course materials. To study the effect of the LLM-powered AI chatbot, we conducted a lab-based user study (N=14), in which the participants worked on tasks from a web software development course. The participants were divided into two groups, where one of the groups first had access to the chatbot and then to a more traditional search functionality, while another group started with the search functionality and was then given the chatbot. We assessed the participants' performance and perceptions towards the chatbot and the search functionality and explored their preferences towards the support functionalities. Our findings highlight that both support mechanisms are seen as useful and that support mechanisms work well for specific tasks, while less so for other tasks. We also observe that students tended to prefer the second support mechanism more, where students who were first given the chatbot tended to prefer the search functionality and vice versa.
- [13] arXiv:2410.13400 [pdf, other]
-
Title: Towards Hybrid Intelligence in Journalism: Findings and Lessons Learnt from a Collaborative Analysis of Greek Political Rhetoric by ChatGPT and Humans
Authors: Thanasis Troboukis, Kelly Kiki, Antonis Galanopoulos, Pavlos Sermpezis, Stelios Karamanidis, Ilias Dimitriadis, Athena Vakali
Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL)
This chapter introduces a research project titled "Analyzing the Political Discourse: A Collaboration Between Humans and Artificial Intelligence", which was initiated in preparation for Greece's 2023 general elections. The project focused on the analysis of political leaders' campaign speeches, employing Artificial Intelligence (AI) in conjunction with an interdisciplinary team comprising journalists, a political scientist, and data scientists. The chapter delves into various aspects of political discourse analysis, including sentiment analysis, polarization, populism, topic detection, and Named Entity Recognition (NER). This experimental study investigates the capabilities of large language models (LLMs), and in particular OpenAI's ChatGPT, for analyzing political speech, evaluates their strengths and weaknesses, and highlights the essential role of human oversight in using AI in journalism projects and potentially other societal sectors. The project stands as an innovative example of human-AI collaboration (known also as "hybrid intelligence") within the realm of digital humanities, offering valuable insights for future initiatives.
- [14] arXiv:2410.13452 [pdf, html, other]
-
Title: A.I. go by many names: towards a sociotechnical definition of artificial intelligence
Subjects: Computers and Society (cs.CY)
Defining artificial intelligence (AI) is a persistent challenge, often muddied by technical ambiguity and varying interpretations. Commonly used definitions heavily emphasize the technical properties of AI but neglect its social purpose. This essay makes the case for a sociotechnical definition of AI, essential for researchers who require clarity in their work. It explores two primary approaches to defining AI: the rationalistic, which focuses on AI as systems that think and act rationally, and the humanistic, which frames AI in terms of its ability to emulate human intelligence. By reconciling these approaches and contrasting them with existing socio-political definitions, the essay proposes a balanced, sociotechnical definition.
New submissions (showing 14 of 14 entries)
- [15] arXiv:2410.12872 (cross-list from cs.CL) [pdf, html, other]
-
Title: Beyond Right and Wrong: Mitigating Cold Start in Knowledge Tracing Using Large Language Model and Option Weight
Comments: 11 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Knowledge Tracing (KT) is vital in educational data mining, enabling personalized learning by tracking learners' knowledge states and forecasting their academic outcomes. This study introduces the LOKT (Large Language Model Option-weighted Knowledge Tracing) model, which uses large language models (LLMs) to address the cold start problem, where only limited historical data is available. While traditional KT models have incorporated option weights, our research extends this by integrating these weights into an LLM-based KT framework. Moving beyond the binary classification of correct and incorrect responses, we emphasize that different types of incorrect answers offer valuable insights into a learner's knowledge state. By converting these responses into text-based ordinal categories, we enable LLMs to assess learner understanding with greater clarity, although our approach focuses on the final knowledge state rather than the progression of learning over time. Using five public datasets, we demonstrate that the LOKT model sustains high predictive accuracy even with limited data, effectively addressing both "learner cold-start" and "system cold-start" scenarios. These findings showcase LOKT's potential to enhance LLM-based learning tools and support early-stage personalization.
- [16] arXiv:2410.12880 (cross-list from cs.CL) [pdf, html, other]
-
Title: Navigating the Cultural Kaleidoscope: A Hitchhiker's Guide to Sensitivity in Large Language Models
Authors: Somnath Banerjee, Sayan Layek, Hari Shrawgi, Rajarshi Mandal, Avik Halder, Shanu Kumar, Sagnik Basu, Parag Agrawal, Rima Hazra, Animesh Mukherjee
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
As LLMs are increasingly deployed in global applications, the importance of cultural sensitivity becomes paramount, ensuring that users from diverse backgrounds feel respected and understood. Cultural harm can arise when these models fail to align with specific cultural norms, resulting in misrepresentations or violations of cultural values. This work addresses the challenges of ensuring cultural sensitivity in LLMs, especially in small-parameter models that often lack the extensive training data needed to capture global cultural nuances. We present two key contributions: (1) A cultural harm test dataset, created to assess model outputs across different cultural contexts through scenarios that expose potential cultural insensitivities, and (2) A culturally aligned preference dataset, aimed at restoring cultural sensitivity through fine-tuning based on feedback from diverse annotators. These datasets facilitate the evaluation and enhancement of LLMs, ensuring their ethical and safe deployment across different cultural landscapes. Our results show that integrating culturally aligned feedback leads to a marked improvement in model behavior, significantly reducing the likelihood of generating culturally insensitive or harmful content. Ultimately, this work paves the way for more inclusive and respectful AI systems, fostering a future where LLMs can safely and ethically navigate the complexities of diverse cultural landscapes.
- [17] arXiv:2410.12913 (cross-list from cs.LG) [pdf, other]
-
Title: Fair Clustering for Data Summarization: Improved Approximation Algorithms and Complexity Insights
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Discrete Mathematics (cs.DM)
Data summarization tasks are often modeled as $k$-clustering problems, where the goal is to choose $k$ data points, called cluster centers, that best represent the dataset by minimizing a clustering objective. A popular objective is to minimize the maximum distance between any data point and its nearest center, which is formalized as the $k$-center problem. While in some applications all data points can be chosen as centers, in the general setting, centers must be chosen from a predefined subset of points, referred to as facilities or suppliers; this is known as the $k$-supplier problem. In this work, we focus on fair data summarization modeled as the fair $k$-supplier problem, where data consists of several groups, and a minimum number of centers must be selected from each group while minimizing the $k$-supplier objective. The groups can be disjoint or overlapping, leading to two distinct problem variants, each with different computational complexity.
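The objective described above can be written compactly as follows. The notation is an assumption made for this sketch and is not taken from the paper: $C$ is the set of data points, $F$ the set of facilities, $d$ a metric, $G_1, \dots, G_t \subseteq F$ the groups, and $r_i$ the minimum number of centers required from group $G_i$. The bound $|S| \le k$ is one common way to state the cardinality constraint.

```latex
\[
\min_{\substack{S \subseteq F \\ |S| \le k}} \;
\max_{c \in C} \; \min_{s \in S} d(c, s)
\qquad \text{subject to} \qquad
|S \cap G_i| \ge r_i \quad \text{for } i = 1, \dots, t .
\]
```

Under this notation, dropping the group constraints gives the plain $k$-supplier objective, and additionally taking $F = C$ gives the $k$-center objective.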
We present $3$-approximation algorithms for both variants, improving the previously known factor of $5$. For disjoint groups, our algorithm runs in polynomial time, while for overlapping groups, we present a fixed-parameter tractable algorithm, where the exponential runtime depends only on the number of groups and centers. We show that these approximation factors match the theoretical lower bounds, assuming standard complexity theory conjectures. Finally, using an open-source implementation, we demonstrate the scalability of our algorithms on large synthetic datasets and assess the price of fairness on real-world data, comparing solution quality with and without fairness constraints.
- [18] arXiv:2410.13095 (cross-list from cs.SI) [pdf, html, other]
-
Title: Future of Algorithmic Organization: Large-Scale Analysis of Decentralized Autonomous Organizations (DAOs)
Subjects: Social and Information Networks (cs.SI); Computational Engineering, Finance, and Science (cs.CE); Cryptography and Security (cs.CR); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Decentralized Autonomous Organizations (DAOs) resemble early online communities, particularly those centered around open-source projects, and present a potential empirical framework for complex social-computing systems by encoding governance rules within "smart contracts" on the blockchain. A key function of a DAO is collective decision-making, typically carried out through a series of proposals where members vote on organizational events using governance tokens, signifying relative influence within the DAO. In just a few years, the deployment of DAOs surged with a total treasury of $24.5 billion and 11.1M governance token holders collectively managing decisions across over 13,000 DAOs as of 2024. In this study, we examine the operational dynamics of 100 DAOs, such as pleasrdao, lexdao, lootdao, optimism collective, and uniswap. With large-scale empirical analysis of a diverse set of DAO categories and smart contracts and by leveraging on-chain (e.g., voting results) and off-chain data, we examine factors such as voting power, participation, and DAO characteristics dictating the level of decentralization and thus the efficiency of management structures. As such, our study highlights that increased grassroots participation correlates with higher decentralization in a DAO, and lower variance in voting power within a DAO correlates with a higher level of decentralization, as consistently measured by Gini metrics. These insights closely align with key topics in political science, such as the allocation of power in decision-making and the effects of various governance models. We conclude by discussing the implications for researchers and practitioners, emphasizing how these factors can inform the design of democratic governance systems in emerging applications that require active engagement from stakeholders in decision-making.
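To make the "Gini metrics" mentioned in this abstract concrete, here is a minimal sketch of the standard Gini coefficient applied to per-holder voting power. The token balances are invented for illustration, and the abstract does not say which exact Gini variant or data pipeline the authors use.

```python
def gini(values):
    """Standard Gini coefficient of non-negative values.

    0.0 means perfectly equal shares; values near 1.0 mean that almost all
    of the total is held by a single participant.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical governance-token balances for two small DAOs.
evenly_held = [250, 250, 250, 250]
whale_dominated = [0, 0, 0, 1000]

gini_even = gini(evenly_held)       # equal voting power
gini_whale = gini(whale_dominated)  # one holder controls everything
```

A lower coefficient indicates that voting power is spread more evenly, which is the sense in which the abstract links lower variance in voting power to higher decentralization.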
- [19] arXiv:2410.13114 (cross-list from cs.SD) [pdf, html, other]
-
Title: Sound Check: Auditing Audio Datasets
Authors: William Agnew, Julia Barnett, Annie Chu, Rachel Hong, Michael Feffer, Robin Netzorg, Harry H. Jiang, Ezra Awumey, Sauvik Das
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Audio and Speech Processing (eess.AS)
Generative audio models are rapidly advancing in both capabilities and public utilization -- several powerful generative audio models have readily available open weights, and some tech companies have released high quality generative audio products. Yet, while prior work has enumerated many ethical issues stemming from the data on which generative visual and textual models have been trained, we have little understanding of similar issues with generative audio datasets, including those related to bias, toxicity, and intellectual property. To bridge this gap, we conducted a literature review of hundreds of audio datasets and selected seven of the most prominent to audit in more detail. We found that these datasets are biased against women, contain toxic stereotypes about marginalized communities, and contain significant amounts of copyrighted work. To enable artists to see if they are in popular audio datasets and facilitate exploration of the contents of these datasets, we developed a web-based audio dataset exploration tool at this https URL.
- [20] arXiv:2410.13138 (cross-list from cs.CL) [pdf, html, other]
-
Title: Data Defenses Against Large Language Models
Subjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Computers and Society (cs.CY)
Large language models excel at performing inference over text to extract information, summarize information, or generate additional text. These inference capabilities are implicated in a variety of ethical harms spanning surveillance, labor displacement, and IP/copyright theft. While many policy, legal, and technical mitigations have been proposed to counteract these harms, these mitigations typically require cooperation from institutions that move slower than technical advances (i.e., governments) or that have few incentives to act to counteract these harms (i.e., the corporations that create and profit from these LLMs). In this paper, we define and build "data defenses" -- a novel strategy that directly empowers data owners to block LLMs from performing inference on their data. We create data defenses by developing a method to automatically generate adversarial prompt injections that, when added to input text, significantly reduce the ability of LLMs to accurately infer personally identifying information about the subject of the input text or to use copyrighted text in inference. We examine the ethics of enabling such direct resistance to LLM inference, and argue that making data defenses that resist and subvert LLMs enables the realization of important values such as data ownership, data sovereignty, and democratic control over AI systems. We verify that our data defenses are cheap and fast to generate, work on the latest commercial and open-source LLMs, are resistant to countermeasures, and are robust to several different attack settings. Finally, we consider the security implications of LLM data defenses and outline several future research directions in this area. Our code is available at this https URL and a tool for using our defenses to protect text against LLM inference is at this https URL.
- [21] arXiv:2410.13218 (cross-list from cs.CL) [pdf, html, other]
-
Title: CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy
Authors: Mian Zhang, Xianjun Yang, Xinlu Zhang, Travis Labrum, Jamie C. Chiu, Shaun M. Eack, Fei Fang, William Yang Wang, Zhiyu Zoey Chen
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
There is a significant gap between patient needs and available mental health support today. In this paper, we aim to thoroughly examine the potential of using Large Language Models (LLMs) to assist professional psychotherapy. To this end, we propose a new benchmark, CBT-BENCH, for the systematic evaluation of cognitive behavioral therapy (CBT) assistance. We include three levels of tasks in CBT-BENCH: I: Basic CBT knowledge acquisition, with the task of multiple-choice questions; II: Cognitive model understanding, with the tasks of cognitive distortion classification, primary core belief classification, and fine-grained core belief classification; III: Therapeutic response generation, with the task of generating responses to patient speech in CBT therapy sessions. These tasks encompass key aspects of CBT that could potentially be enhanced through AI assistance, while also outlining a hierarchy of capability requirements, ranging from basic knowledge recitation to engaging in real therapeutic conversations. We evaluated representative LLMs on our benchmark. Experimental results indicate that while LLMs perform well in reciting CBT knowledge, they fall short in complex real-world scenarios requiring deep analysis of patients' cognitive structures and generating effective responses, suggesting potential future work.
- [22] arXiv:2410.13250 (cross-list from cs.HC) [pdf, other]
-
Title: Perceptions of Discriminatory Decisions of Artificial Intelligence: Unpacking the Role of Individual Characteristics
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
This study investigates how personal differences (digital self-efficacy, technical knowledge, belief in equality, political ideology) and demographic factors (age, education, and income) are associated with perceptions of artificial intelligence (AI) outcomes exhibiting gender and racial bias and with general attitudes towards AI. Analyses of a large-scale experiment dataset (N = 1,206) indicate that digital self-efficacy and technical knowledge are positively associated with attitudes toward AI, while liberal ideologies are associated with lower outcome trust, higher negative emotion, and greater skepticism. Furthermore, age and income are closely connected to cognitive gaps in understanding discriminatory AI outcomes. These findings highlight the importance of promoting digital literacy skills and enhancing digital self-efficacy to maintain trust in AI and beliefs in AI usefulness and safety. The findings also suggest that the disparities in understanding problematic AI outcomes may be aligned with economic inequalities and generational gaps in society. Overall, this study sheds light on the socio-technological system in which complex interactions occur between social hierarchies, divisions, and machines that reflect and exacerbate the disparities.
- [23] arXiv:2410.13753 (cross-list from cs.CE) [pdf, html, other]
-
Title: DPFedBank: Crafting a Privacy-Preserving Federated Learning Framework for Financial Institutions with Policy Pillars
Comments: 9 pages, 1 figure
Subjects: Computational Engineering, Finance, and Science (cs.CE); Cryptography and Security (cs.CR); Computers and Society (cs.CY)
In recent years, the financial sector has faced growing pressure to adopt advanced machine learning models to derive valuable insights while preserving data privacy. However, the highly sensitive nature of financial data presents significant challenges to sharing and collaboration. This paper presents DPFedBank, an innovative framework enabling financial institutions to collaboratively develop machine learning models while ensuring robust data privacy through Local Differential Privacy (LDP) mechanisms. DPFedBank is designed to address the unique privacy and security challenges associated with financial data, allowing institutions to share insights without exposing sensitive information. By leveraging LDP, the framework ensures that data remains confidential even during collaborative processes, providing a crucial solution for privacy-aware machine learning in finance. We conducted an in-depth evaluation of the potential vulnerabilities within this framework and developed a comprehensive set of policies aimed at mitigating these risks. The proposed policies effectively address threats posed by malicious clients, compromised servers, inherent weaknesses in existing Differential Privacy-Federated Learning (DP-FL) frameworks, and sophisticated external adversaries. Unlike existing DP-FL approaches, DPFedBank introduces a novel combination of adaptive LDP mechanisms and advanced cryptographic techniques specifically tailored for financial data, which significantly enhances privacy while maintaining model utility. Key security enhancements include the implementation of advanced authentication protocols, encryption techniques for secure data exchange, and continuous monitoring systems to detect and respond to malicious activities in real-time.
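As a rough illustration of what a Local Differential Privacy mechanism does, the sketch below perturbs each client's value locally with Laplace noise before it is shared, so an aggregator only ever sees noisy reports. This is a generic textbook mechanism and not DPFedBank's adaptive LDP scheme, which the abstract does not specify; the function names (`laplace_noise`, `ldp_perturb`, `private_mean`) and the clipping bounds are assumptions made for the example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def ldp_perturb(value: float, lower: float, upper: float,
                epsilon: float, rng: random.Random) -> float:
    """Clip a value to [lower, upper], then add Laplace noise locally.

    The sensitivity of a single clipped value is (upper - lower), so the
    noise scale is sensitivity / epsilon (hypothetical helper, generic
    Laplace mechanism).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    clipped = min(max(value, lower), upper)
    sensitivity = upper - lower
    return clipped + laplace_noise(sensitivity / epsilon, rng)


def private_mean(values, lower, upper, epsilon, seed=0):
    """Average the noisy reports; the aggregator never sees raw values."""
    rng = random.Random(seed)
    reports = [ldp_perturb(v, lower, upper, epsilon, rng) for v in values]
    return sum(reports) / len(reports)
```

Because the noise is zero-mean, the average of many noisy reports approaches the true mean of the clipped values, while any single report reveals little about its client. A smaller `epsilon` gives stronger privacy at the cost of noisier estimates.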
- [24] arXiv:2410.13854 (cross-list from cs.CL) [pdf, html, other]
-
Title: Can MLLMs Understand the Deep Implication Behind Chinese Images?
Chenhao Zhang, Xi Feng, Yuelin Bai, Xinrun Du, Jinchang Hou, Kaixin Deng, Guangzeng Han, Qinrui Li, Bingli Wang, Jiaheng Liu, Xingwei Qu, Yifei Zhang, Qixuan Zhao, Yiming Liang, Ziqiang Liu, Feiteng Fang, Min Yang, Wenhao Huang, Chenghua Lin, Ge Zhang, Shiwen Ni
Comments: 32 pages, 18 figures. Project Page: this https URL Code: this https URL Dataset: this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
As the capabilities of Multimodal Large Language Models (MLLMs) continue to improve, the need for higher-order capability evaluation of MLLMs is increasing. However, there is a lack of work evaluating MLLMs for higher-order perception and understanding of Chinese visual content. To fill the gap, we introduce the **C**hinese **I**mage **I**mplication understanding **Bench**mark, **CII-Bench**, which aims to assess the higher-order perception and understanding capabilities of MLLMs for Chinese images. CII-Bench stands out in several ways compared to existing benchmarks. Firstly, to ensure the authenticity of the Chinese context, images in CII-Bench are sourced from the Chinese Internet and manually reviewed, with corresponding answers also manually crafted. Additionally, CII-Bench incorporates images that represent Chinese traditional culture, such as famous Chinese traditional paintings, which can deeply reflect the model's understanding of Chinese traditional culture. Through extensive experiments on CII-Bench across multiple MLLMs, we have made significant findings. Initially, a substantial gap is observed between the performance of MLLMs and humans on CII-Bench. The highest accuracy of MLLMs attains 64.4%, whereas human accuracy averages 78.2%, peaking at an impressive 81.0%. Subsequently, MLLMs perform worse on Chinese traditional culture images, suggesting limitations in their ability to understand high-level semantics and their lack of a deep knowledge base of Chinese traditional culture. Finally, it is observed that most models exhibit enhanced accuracy when image emotion hints are incorporated into the prompts. We believe that CII-Bench will enable MLLMs to gain a better understanding of Chinese semantics and Chinese-specific images, advancing the journey towards expert artificial general intelligence (AGI). Our project is publicly available at this https URL.
Cross submissions (showing 10 of 10 entries)
- [25] arXiv:2308.02935 (replaced) [pdf, html, other]
-
Title: Bias Behind the Wheel: Fairness Testing of Autonomous Driving Systems
Comments: Accepted by ACM Transactions on Software Engineering and Methodology (TOSEM)
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Software Engineering (cs.SE)
This paper conducts fairness testing of automated pedestrian detection, a crucial but under-explored issue in autonomous driving systems. We evaluate eight state-of-the-art deep learning-based pedestrian detectors across demographic groups on large-scale real-world datasets. To enable thorough fairness testing, we provide extensive annotations for the datasets, resulting in 8,311 images with 16,070 gender labels, 20,115 age labels, and 3,513 skin tone labels. Our findings reveal significant fairness issues, particularly related to age. The proportion of undetected children is 20.14% higher compared to adults. Furthermore, we explore how various driving scenarios affect the fairness of pedestrian detectors. We find that pedestrian detectors demonstrate significant gender biases at nighttime, potentially exacerbating the prevalent societal concern about women's safety when out at night. Moreover, we observe that pedestrian detectors can demonstrate both enhanced fairness and superior performance under specific driving conditions, which challenges the fairness-performance trade-off theory widely acknowledged in the fairness literature. We publicly release the code, data, and results to support future research on fairness in autonomous driving.
- [26] arXiv:2402.01551 (replaced) [pdf, other]
-
Title: Mapping acceptance: micro scenarios as a dual-perspective approach for assessing public opinion and individual differences in technology perception
Journal-ref: (2024) Front. Psychol. 15:1419564
Subjects: Computers and Society (cs.CY)
Understanding public perception of technology is crucial to aligning research, development, and governance of technology. This article introduces micro scenarios as an integrative method to evaluate mental models and social acceptance across numerous technologies and concepts using a few single-item scales within a single comprehensive survey. This approach contrasts with traditional methods that focus on detailed assessments of as few as one scenario. The data can be interpreted in two ways: Perspective (1): Average evaluations of each participant can be seen as individual differences, providing reflexive measurements across technologies or topics. This helps in understanding how perceptions of technology relate to other personality factors. Perspective (2): Average evaluations of each technology or topic can be interpreted as technology attributions. This makes it possible to position technologies on visuo-spatial maps to simplify identification of critical issues, conduct comparative rankings based on selected criteria, and to analyze the interplay between different attributions. This dual approach enables the modeling of acceptance-relevant factors that shape public opinion. It offers a framework for researchers, technology developers, and policymakers to identify pivotal factors for acceptance at both the individual and technology levels. I illustrate this methodology with examples from my research, provide practical guidelines, and include R code to enable others to conduct similar studies. This paper aims to bridge the gap between technological advancement and societal perception, offering a tool for more informed decision-making in technology development and policy-making.
- [27] arXiv:2407.10755 (replaced) [pdf, other]
-
Title: Socioeconomic factors of national representation in the global film festival circuit: skewed toward the large and wealthy, but small countries can beat the odds
Subjects: Computers and Society (cs.CY); Social and Information Networks (cs.SI)
This study analyzes how economic, demographic, and geographic factors predict the representation of different countries in the global film festival circuit. It relies on the combination of several open-access databases, including festival programming information from the Cinando platform of the Cannes Film Market. The dataset consists of over 20,000 unique films from almost 600 festivals across the world over a decade, a total of more than 30,000 film-festival entries. It is shown that while films from large affluent countries indeed dominate the festival screen, the bias is nevertheless not fully proportional to the large demographic and economic worldwide disparities and that several smaller countries perform better than expected. Further computational simulations demonstrate how much including films from smaller countries contributes to cultural diversity, and how countries vary in cultural "trade balance" dynamics, revealing differences between net exporters and importers of festival films. This research underscores the importance of representation in film festivals and the public value of increasing cultural diversity. The data-driven insights and quantitative approaches to festival programming and cultural event analytics are hoped to be useful both for the academic community and for film festival organizers and policymakers aiming to foster more inclusive and diverse cultural landscapes.
- [28] arXiv:2410.04247 (replaced) [pdf, other]
-
Title: Unraveling the Nuances of AI Accountability: A Synthesis of Dimensions Across Disciplines
Comments: Published in the Proceedings of the 32nd European Conference on Information Systems (ECIS)
Subjects: Computers and Society (cs.CY)
The widespread diffusion of Artificial Intelligence (AI)-based systems offers many opportunities to contribute to the well-being of individuals and the advancement of economies and societies. This diffusion is, however, closely accompanied by public scandals causing harm to individuals, markets, or society, and leading to the increasing importance of accountability. AI accountability itself faces conceptual ambiguity, with research scattered across multiple disciplines. To address these issues, we review current research across multiple disciplines and identify key dimensions of accountability in the context of AI. We reveal six themes with 13 corresponding dimensions and additional accountability facilitators that future research can utilize to specify accountability scenarios in the context of AI-based systems.
- [29] arXiv:2307.04417 (replaced) [pdf, html, other]
-
Title: Fairness-aware Federated Minimax Optimization with Convergence Guarantee
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
Federated learning (FL) has garnered considerable attention due to its privacy-preserving feature. Nonetheless, the lack of freedom in managing user data can lead to group fairness issues, where models are biased towards sensitive factors such as race or gender. To tackle this issue, this paper proposes a novel algorithm, fair federated averaging with augmented Lagrangian method (FFALM), designed explicitly to address group fairness issues in FL. Specifically, we impose a fairness constraint on the training objective and solve the minimax reformulation of the constrained optimization problem. Then, we derive the theoretical upper bound for the convergence rate of FFALM. The effectiveness of FFALM in improving fairness is shown empirically on CelebA and UTKFace datasets in the presence of severe statistical heterogeneity.
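For readers unfamiliar with the technique named above, a generic augmented Lagrangian minimax reformulation can be written as follows. This is the standard textbook form, not the paper's exact formulation: the symbols $f$ (training loss), $h$ (a scalar fairness-violation measure), $\lambda$ (dual variable), and $\rho$ (penalty weight) are assumptions introduced for illustration, since the abstract does not state the constraint.

```latex
% Constrained problem (assumed form): minimize the loss subject to a fairness constraint
\min_{w} \; f(w) \quad \text{s.t.} \quad h(w) = 0

% Augmented Lagrangian with dual variable \lambda and penalty weight \rho > 0
\mathcal{L}_{\rho}(w, \lambda) = f(w) + \lambda\, h(w) + \frac{\rho}{2}\, h(w)^{2}

% Minimax reformulation
\min_{w} \; \max_{\lambda} \; \mathcal{L}_{\rho}(w, \lambda)
```

At any feasible point, where $h(w) = 0$, both the dual term and the quadratic penalty vanish, so $\mathcal{L}_{\rho}(w, \lambda) = f(w)$; the penalty only discourages constraint violations.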
- [30] arXiv:2407.05131 (replaced) [pdf, html, other]
-
Title: RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models
Comments: EMNLP 2024 main
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
The recent emergence of Medical Large Vision Language Models (Med-LVLMs) has enhanced medical diagnosis. However, current Med-LVLMs frequently encounter factual issues, often generating responses that do not align with established medical facts. Retrieval-Augmented Generation (RAG), which utilizes external knowledge, can improve the factual accuracy of these models but introduces two major challenges. First, limited retrieved contexts might not cover all necessary information, while excessive retrieval can introduce irrelevant and inaccurate references, interfering with the model's generation. Second, in cases where the model originally responds correctly, applying RAG can lead to an over-reliance on retrieved contexts, resulting in incorrect answers. To address these issues, we propose RULE, which consists of two components. First, we introduce a provably effective strategy for controlling factuality risk through the calibrated selection of the number of retrieved contexts. Second, based on samples where over-reliance on retrieved contexts led to errors, we curate a preference dataset to fine-tune the model, balancing its dependence on inherent knowledge and retrieved contexts for generation. We demonstrate the effectiveness of RULE on medical VQA and report generation tasks across three datasets, achieving an average improvement of 47.4% in factual accuracy. We publicly release our benchmark and code at this https URL.
- [31] arXiv:2409.11643 (replaced) [pdf, html, other]
-
Title: Combating Phone Scams with LLM-based Detection: Where Do We Stand?
Comments: 2 pages, 1 figure
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Phone scams pose a significant threat to individuals and communities, causing substantial financial losses and emotional distress. Despite ongoing efforts to combat these scams, scammers continue to adapt and refine their tactics, making it imperative to explore innovative countermeasures. This research explores the potential of large language models (LLMs) to provide detection of fraudulent phone calls. By analyzing the conversational dynamics between scammers and victims, LLM-based detectors can identify potential scams as they occur, offering immediate protection to users. While such approaches demonstrate promising results, we also acknowledge the challenges of biased datasets, relatively low recall, and hallucinations that must be addressed for further advancement in this field.
- [32] arXiv:2410.10850 (replaced) [pdf, html, other]
-
Title: On the Reliability of Large Language Models to Misinformed and Demographically-Informed Prompts
Toluwani Aremu, Oluwakemi Akinwehinmi, Chukwuemeka Nwagu, Syed Ishtiaque Ahmed, Rita Orji, Pedro Arnau Del Amo, Abdulmotaleb El Saddik
Comments: Study conducted between August and December 2023. Under review at AAAI-AI Magazine. Submitted for archival purposes only
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
We investigate and observe the behaviour and performance of Large Language Model (LLM)-backed chatbots in addressing misinformed prompts and questions with demographic information within the domains of Climate Change and Mental Health. Through a combination of quantitative and qualitative methods, we assess the chatbots' ability to discern the veracity of statements, their adherence to facts, and the presence of bias or misinformation in their responses. Our quantitative analysis using True/False questions reveals that these chatbots can be relied on to give the right answers to these close-ended questions. However, the qualitative insights, gathered from domain experts, show that there are still concerns regarding privacy, ethical implications, and the necessity for chatbots to direct users to professional services. We conclude that while these chatbots hold significant promise, their deployment in sensitive areas necessitates careful consideration, ethical oversight, and rigorous refinement to ensure they serve as a beneficial augmentation to human expertise rather than an autonomous solution.
- [33] arXiv:2410.12294 (replaced) [pdf, html, other]
-
Title: LLM-based Cognitive Models of Students with Misconceptions
Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)
Accurately modeling student cognition is crucial for developing effective AI-driven educational technologies. A key challenge is creating realistic student models that satisfy two essential properties: (1) accurately replicating specific misconceptions, and (2) correctly solving problems where these misconceptions are not applicable. This dual requirement reflects the complex nature of student understanding, where misconceptions coexist with correct knowledge. This paper investigates whether Large Language Models (LLMs) can be instruction-tuned to meet this dual requirement and effectively simulate student thinking in algebra. We introduce MalAlgoPy, a novel Python library that generates datasets reflecting authentic student solution patterns through a graph-based representation of algebraic problem-solving. Utilizing MalAlgoPy, we define and examine Cognitive Student Models (CSMs) - LLMs instruction-tuned to faithfully emulate realistic student behavior. Our findings reveal that LLMs trained on misconception examples can efficiently learn to replicate errors. However, the training diminishes the model's ability to solve problems correctly, particularly for problem types where the misconceptions are not applicable, thus failing to satisfy the second property of CSMs. We demonstrate that by carefully calibrating the ratio of correct to misconception examples in the training data - sometimes as low as 0.25 - it is possible to develop CSMs that satisfy both properties. Our insights enhance our understanding of AI-based student models and pave the way for effective adaptive learning systems.
- [34] arXiv:2410.12622 (replaced) [pdf, html, other]
-
Title: From Measurement Instruments to Data: Leveraging Theory-Driven Synthetic Training Data for Classifying Social Constructs
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
Computational text classification is a challenging task, especially for multi-dimensional social constructs. Recently, there has been increasing discussion that synthetic training data could enhance classification by offering examples of how these constructs are represented in texts. In this paper, we systematically examine the potential of theory-driven synthetic training data for improving the measurement of social constructs. In particular, we explore how researchers can transfer established knowledge from measurement instruments in the social sciences, such as survey scales or annotation codebooks, into theory-driven generation of synthetic data. Using two studies on measuring sexism and political topics, we assess the added value of synthetic training data for fine-tuning text classification models. Although the results of the sexism study were less promising, our findings demonstrate that synthetic data can be highly effective in reducing the need for labeled data in political topic classification. With only a minimal drop in performance, synthetic data allows for substituting large amounts of labeled data. Furthermore, theory-driven synthetic data performed markedly better than data generated without conceptual information in mind.
- [35] arXiv:2410.12691 (replaced) [pdf, html, other]
-
Title: Building Better: Avoiding Pitfalls in Developing Language Resources when Data is Scarce
Subjects: Computation and Language (cs.CL); Computers and Society (cs.CY)
Language is a symbolic capital that affects people's lives in many ways (Bourdieu, 1977, 1991). It is a powerful tool that accounts for identities, cultures, traditions, and societies in general. Hence, data in a given language should be viewed as more than a collection of tokens. Good data collection and labeling practices are key to building more human-centered and socially aware technologies. While there has been a rising interest in mid- to low-resource languages within the NLP community, work in this space has to overcome unique challenges such as data scarcity and access to suitable annotators. In this paper, we collect feedback from those directly involved in and impacted by NLP artefacts for mid- to low-resource languages. We conduct a quantitative and qualitative analysis of the responses and highlight the main issues related to (1) data quality such as linguistic and cultural data suitability; and (2) the ethics of common annotation practices such as the misuse of online community services. Based on these findings, we make several recommendations for the creation of high-quality language artefacts that reflect the cultural milieu of their speakers, while simultaneously respecting the dignity and labor of data workers.