
Showing 1–50 of 103,407 results for author: R.

Searching in archive cs. Search in all archives.
  1. arXiv:2407.04694  [pdf, other]

    cs.CL cs.AI cs.LG

    Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs

    Authors: Rudolf Laine, Bilal Chughtai, Jan Betley, Kaivalya Hariharan, Jeremy Scheurer, Mikita Balesni, Marius Hobbhahn, Alexander Meinke, Owain Evans

    Abstract: AI assistants such as ChatGPT are trained to respond to users by saying, "I am a large language model". This raises questions. Do such models know that they are LLMs and reliably act on this knowledge? Are they aware of their current circumstances, such as being deployed to the public? We refer to a model's knowledge of itself and its circumstances as situational awareness. To quantify situational…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 11 page main body, 98 page appendix, 58 figures

  2. arXiv:2407.04681  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge

    Authors: Yuanze Lin, Yunsheng Li, Dongdong Chen, Weijian Xu, Ronald Clark, Philip Torr, Lu Yuan

    Abstract: In recent years, multimodal large language models (MLLMs) have made significant strides by training on vast high-quality image-text datasets, enabling them to generally understand images well. However, the inherent difficulty in explicitly conveying fine-grained or spatially dense information in text, such as masks, poses a challenge for MLLMs, limiting their ability to answer questions requiring…

    Submitted 5 July, 2024; originally announced July 2024.

  3. arXiv:2407.04675  [pdf, other]

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chen Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 5 July, 2024; originally announced July 2024.

  4. arXiv:2407.04674  [pdf, other]

    cs.SE

    Game Elements to Engage Students Learning the Open Source Software Contribution Process

    Authors: Italo Santos, Katia Romero Felizardo, Marco A. Gerosa, Igor Steinmacher

    Abstract: Contributing to OSS projects can help students to enhance their skills and expand their professional networks. However, novice contributors often feel discouraged due to various barriers. Gamification techniques hold the potential to foster engagement and facilitate the learning process. Nevertheless, it is unknown which game elements are effective in this context. This study explores students' pe…

    Submitted 5 July, 2024; originally announced July 2024.

  5. arXiv:2407.04629  [pdf, other]

    cs.CL cs.AI

    Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework

    Authors: Reza Averly, Xia Ning

    Abstract: Clinical named entity recognition (NER) aims to retrieve important entities within clinical narratives. Recent works have demonstrated that large language models (LLMs) can achieve strong performance in this task. While previous works focus on proprietary LLMs, we investigate how open NER LLMs, trained specifically for entity recognition, perform in clinical NER. In this paper, we aim to improve t…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Preprint

  6. arXiv:2407.04626  [pdf, ps, other]

    cs.CC math.AG

    Determination Problems for Orbit Closures and Matrix Groups

    Authors: Rida Ait El Manssour, George Kenison, Mahsa Shirmohammadi, James Worrell

    Abstract: Computational problems concerning the orbit of a point under the action of a matrix group occur in numerous subfields of computer science, including complexity theory, program analysis, quantum computation, and automata theory. In many cases the focus extends beyond orbits proper to orbit closures under a suitable topology. Typically one starts from a group and several points and asks questions ab…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 22 pages

  7. arXiv:2407.04622  [pdf, other]

    cs.LG

    On scalable oversight with weak LLMs judging strong LLMs

    Authors: Zachary Kenton, Noah Y. Siegel, János Kramár, Jonah Brown-Cohen, Samuel Albanie, Jannis Bulian, Rishabh Agarwal, David Lindner, Yunhao Tang, Noah D. Goodman, Rohin Shah

    Abstract: Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AI's compete to convince a judge; consultancy, where a single AI tries to convince a judge that asks questions; and compare to a baseline of direct question-answering, where the judge just answers outright without the AI. We use large language models (LLMs) as both AI a…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 15 pages (53 including appendices)

  8. arXiv:2407.04621  [pdf, other]

    cs.CV

    OneRestore: A Universal Restoration Framework for Composite Degradation

    Authors: Yu Guo, Yuan Gao, Yuxu Lu, Huilin Zhu, Ryan Wen Liu, Shengfeng He

    Abstract: In real-world scenarios, image impairments often manifest as composite degradations, presenting a complex interplay of elements such as low light, haze, rain, and snow. Despite this reality, existing restoration methods typically target isolated degradation types, thereby falling short in environments where multiple degrading factors coexist. To bridge this gap, our study proposes a versatile imag…

    Submitted 5 July, 2024; originally announced July 2024.

  9. arXiv:2407.04589  [pdf, other]

    cs.LG

    Remembering Everything Makes You Vulnerable: A Limelight on Machine Unlearning for Personalized Healthcare Sector

    Authors: Ahan Chatterjee, Sai Anirudh Aryasomayajula, Rajat Chaudhari, Subhajit Paul, Vishwa Mohan Singh

    Abstract: As the prevalence of data-driven technologies in healthcare continues to rise, concerns regarding data privacy and security become increasingly paramount. This thesis aims to address the vulnerability of personalized healthcare models, particularly in the context of ECG monitoring, to adversarial attacks that compromise patient privacy. We propose an approach termed "Machine Unlearning" to mitigat…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 15 Pages, Exploring unlearning techniques on ECG Classifier

  10. arXiv:2407.04581  [pdf, other]

    cs.LG cs.ET

    Leveraging Large Language Models for Integrated Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions

    Authors: Shumaila Javaid, Ruhul Amin Khalil, Nasir Saeed, Bin He, Mohamed-Slim Alouini

    Abstract: Integrated satellite, aerial, and terrestrial networks (ISATNs) represent a sophisticated convergence of diverse communication technologies to ensure seamless connectivity across different altitudes and platforms. This paper explores the transformative potential of integrating Large Language Models (LLMs) into ISATNs, leveraging advanced Artificial Intelligence (AI) and Machine Learning (ML) capab…

    Submitted 5 July, 2024; originally announced July 2024.

  11. arXiv:2407.04579  [pdf, other]

    cs.LG

    GOALPlace: Begin with the End in Mind

    Authors: Anthony Agnesina, Rongjian Liang, Geraldo Pradipta, Anand Rajaram, Haoxing Ren

    Abstract: Co-optimizing placement with congestion is integral to achieving high-quality designs. This paper presents GOALPlace, a new learning-based general approach to improving placement congestion by controlling cell density. Our method efficiently learns from an EDA tool's post-route optimized results and uses an empirical Bayes technique to adapt this goal/target to a specific placer's solutions, effec…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 10 pages, 7 figures, preprint

  12. arXiv:2407.04578  [pdf, other]

    cs.SD cs.NE eess.AS

    Resource-Efficient Speech Quality Prediction through Quantization Aware Training and Binary Activation Maps

    Authors: Mattias Nilsson, Riccardo Miccini, Clément Laroche, Tobias Piechowiak, Friedemann Zenke

    Abstract: As speech processing systems in mobile and edge devices become more commonplace, the demand for unintrusive speech quality monitoring increases. Deep learning methods provide high-quality estimates of objective and subjective speech quality metrics. However, their significant computational requirements are often prohibitive on resource-constrained devices. To address this issue, we investigated bi…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted for Interspeech 2024

  13. arXiv:2407.04577  [pdf, other]

    cs.IR

    Optimizing Nepali PDF Extraction: A Comparative Study of Parser and OCR Technologies

    Authors: Prabin Paudel, Supriya Khadka, Ranju G. C., Rahul Shah

    Abstract: This research compares PDF parsing and Optical Character Recognition (OCR) methods for extracting Nepali content from PDFs. PDF parsing offers fast and accurate extraction but faces challenges with non-Unicode Nepali fonts. OCR, specifically PyTesseract, overcomes these challenges, providing versatility for both digital and scanned PDFs. The study reveals that while PDF parsers are faster, their a…

    Submitted 5 July, 2024; originally announced July 2024.

  14. arXiv:2407.04563  [pdf, other]

    cs.SE

    Experiences in Using the V-Model as a Framework for Applied Doctoral Research

    Authors: Rodrigo Falcão, Andreas Jedlitschka, Frank Elberzhager, Dieter Rombach

    Abstract: The pervasive role played by software in virtually all industries has fostered ever-increasing development of applied research in software engineering. In this chapter, we contribute our experience in using the V-Model as a framework for teaching how to conduct applied research in empirical software engineering. The foundational idea of using the V-Model is presented, and guidance for using it to…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: This is a preprint of a chapter in the book "Teaching Empirical Research Methods in Software Engineering"

  15. arXiv:2407.04561  [pdf, other]

    cs.NI eess.SP

    Wireless Spectrum in Rural Farmlands: Status, Challenges and Opportunities

    Authors: Mukaram Shahid, Kunal Das, Taimoor Ul Islam, Christ Somiah, Daji Qiao, Arsalan Ahmad, Jimming Song, Zhengyuan Zhu, Sarath Babu, Yong Guan, Tusher Chakraborty, Suraj Jog, Ranveer Chandra, Hongwei Zhang

    Abstract: Due to factors such as low population density and expansive geographical distances, network deployment falls behind in rural regions, leading to a broadband divide. Wireless spectrum serves as the blood and flesh of wireless communications. Shared white spaces such as those in the TVWS and CBRS spectrum bands offer opportunities to expand connectivity, innovate, and provide affordable access to hi…

    Submitted 5 July, 2024; originally announced July 2024.

  16. arXiv:2407.04559  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition

    Authors: Aditya K Surikuchi, Raquel Fernández, Sandro Pezzelle

    Abstract: Visual storytelling consists in generating a natural language story given a temporally ordered sequence of images. This task is not only challenging for models, but also very difficult to evaluate with automatic metrics since there is no consensus about what makes a story 'good'. In this paper, we introduce a novel method that measures story quality in terms of human likeness regarding three key a…

    Submitted 5 July, 2024; originally announced July 2024.

  17. arXiv:2407.04557  [pdf, other]

    cond-mat.mtrl-sci cs.LG

    Structural Constraint Integration in Generative Model for Discovery of Quantum Material Candidates

    Authors: Ryotaro Okabe, Mouyang Cheng, Abhijatmedhi Chotrattanapituk, Nguyen Tuan Hung, Xiang Fu, Bowen Han, Yao Wang, Weiwei Xie, Robert J. Cava, Tommi S. Jaakkola, Yongqiang Cheng, Mingda Li

    Abstract: Billions of organic molecules are known, but only a tiny fraction of the functional inorganic materials have been discovered, a particularly relevant problem to the community searching for new quantum materials. Recent advancements in machine-learning-based generative models, particularly diffusion models, show great promise for generating new, stable materials. However, integrating geometric patt…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 512 pages total, 4 main figures + 218 supplementary figures

  18. arXiv:2407.04549  [pdf, other]

    cs.CL cs.AI

    Spontaneous Reward Hacking in Iterative Self-Refinement

    Authors: Jane Pan, He He, Samuel R. Bowman, Shi Feng

    Abstract: Language models are capable of iteratively improving their outputs based on natural language feedback, thus enabling in-context optimization of user preference. In place of human users, a second language model can be used as an evaluator, providing feedback along with numerical ratings which the generator attempts to optimize. However, because the evaluator is an imperfect proxy of user preference…

    Submitted 5 July, 2024; originally announced July 2024.

  19. arXiv:2407.04541  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    PoPreRo: A New Dataset for Popularity Prediction of Romanian Reddit Posts

    Authors: Ana-Cristina Rogoz, Maria Ilinca Nechita, Radu Tudor Ionescu

    Abstract: We introduce PoPreRo, the first dataset for Popularity Prediction of Romanian posts collected from Reddit. The PoPreRo dataset includes a varied compilation of post samples from five distinct subreddits of Romania, totaling 28,107 data samples. Along with our novel dataset, we introduce a set of competitive models to be used as baselines for future research. Interestingly, the top-scoring model ac…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted at ICPR 2024

  20. Mapping Cardinality-based Feature Models to Weighted Automata over Featured Multiset Semirings (Extended Version)

    Authors: Robert Müller, Mathis Weiß, Malte Lochau

    Abstract: Cardinality-based feature models permit to select multiple copies of the same feature, thus generalizing the notion of product configurations from subsets of Boolean features to multisets of feature instances. This increased expressiveness shapes a-priori infinite and non-convex configuration spaces, which renders established solution-space mappings based on Boolean presence conditions insufficien…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: This is the author's version of the work. The definitive version will be published in Proceedings of 28th ACM International Systems and Software Product Lines Conference (SPLC'24)

  21. arXiv:2407.04486  [pdf, other]

    q-bio.QM cs.AI

    Variational and Explanatory Neural Networks for Encoding Cancer Profiles and Predicting Drug Responses

    Authors: Tianshu Feng, Rohan Gnanaolivu, Abolfazl Safikhani, Yuanhang Liu, Jun Jiang, Nicholas Chia, Alexander Partin, Priyanka Vasanthakumari, Yitan Zhu, Chen Wang

    Abstract: Human cancers present a significant public health challenge and require the discovery of novel drugs through translational research. Transcriptomics profiling data that describes molecular activities in tumors and cancer cell lines are widely utilized for predicting anti-cancer drug responses. However, existing AI models face challenges due to noise in transcriptomics data and lack of biological i…

    Submitted 5 July, 2024; originally announced July 2024.

  22. arXiv:2407.04485  [pdf, other]

    cs.CL cs.LG

    Leveraging Graph Structures to Detect Hallucinations in Large Language Models

    Authors: Noa Nonkes, Sergei Agaronian, Evangelos Kanoulas, Roxana Petcu

    Abstract: Large language models are extensively applied across a wide range of tasks, such as customer support, content creation, educational tutoring, and providing financial guidance. However, a well-known drawback is their predisposition to generate hallucinations. This damages the trustworthiness of the information these models provide, impacting decision-making and user confidence. We propose a method…

    Submitted 5 July, 2024; originally announced July 2024.

    Journal ref: Proceedings of the TextGraphs-17 Workshop, ACL 2024

  23. arXiv:2407.04467  [pdf, other]

    cs.AI cs.CL cs.GT

    Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games

    Authors: Nathan Herr, Fernando Acero, Roberta Raileanu, María Pérez-Ortiz, Zhibin Li

    Abstract: Large Language Models (LLMs) have been increasingly used in real-world settings, yet their strategic abilities remain largely unexplored. Game theory provides a good framework for assessing the decision-making abilities of LLMs in interactions with other agents. Although prior studies have shown that LLMs can solve these tasks with carefully curated prompts, they fail when the problem setting or p…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 8 pages (19 with appendix), 6 figures in the main body (4 in the appendix), 4 tables in the main body

  24. arXiv:2407.04442  [pdf, other]

    cs.CR

    GoSurf: Identifying Software Supply Chain Attack Vectors in Go

    Authors: Carmine Cesarano, Vivi Andersson, Roberto Natella, Martin Monperrus

    Abstract: In Go, the widespread adoption of open-source software has led to a flourishing ecosystem of third-party dependencies, which are often integrated into critical systems. However, the reuse of dependencies introduces significant supply chain security risks, as a single compromised package can have cascading impacts. Existing supply chain attack taxonomies overlook language-specific features that can…

    Submitted 5 July, 2024; originally announced July 2024.

  25. arXiv:2407.04411  [pdf, other]

    cs.CR cs.AI cs.CL

    Waterfall: Framework for Robust and Scalable Text Watermarking

    Authors: Gregory Kang Ruey Lau, Xinyuan Niu, Hieu Dao, Jiangwei Chen, Chuan-Sheng Foo, Bryan Kian Hsiang Low

    Abstract: Protecting intellectual property (IP) of text such as articles and code is increasingly important, especially as sophisticated attacks become possible, such as paraphrasing by large language models (LLMs) or even unauthorized training of LLMs on copyrighted text to infringe such IP. However, existing text watermarking methods are not robust enough against such attacks nor scalable to millions of u…

    Submitted 5 July, 2024; originally announced July 2024.

  26. arXiv:2407.04407  [pdf, other]

    cs.LG

    Trustworthy Classification through Rank-Based Conformal Prediction Sets

    Authors: Rui Luo, Zhixin Zhou

    Abstract: Machine learning classification tasks often benefit from predicting a set of possible labels with confidence scores to capture uncertainty. However, existing methods struggle with the high-dimensional nature of the data and the lack of well-calibrated probabilities from modern classification models. We propose a novel conformal prediction method that employs a rank-based score function suitable fo…

    Submitted 5 July, 2024; originally announced July 2024.

  27. arXiv:2407.04404  [pdf]

    cs.AR

    Multi-Antenna Technology for 6G Integrated Sensing and Communication

    Authors: Yong Zeng, Zhenjun Dong, Huizhi Wang, Lipeng Zhu, Ziyao Hong, Qingji Jiang, Dongming Wang, Shi Jin, Rui Zhang

    Abstract: By deploying antenna arrays at the transmitter/receiver to provide additional spatial-domain degrees of freedom (DoFs), multi-antenna technology greatly improves the reliability and efficiency of wireless communication. Meanwhile, the application of multi-antenna technology in the radar field has achieved spatial angle resolution and improved sensing DoF, thus significantly enhancing wireless sens…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: in Chinese language

  28. arXiv:2407.04369  [pdf, other]

    cs.CV

    ZARRIO @ Ego4D Short Term Object Interaction Anticipation Challenge: Leveraging Affordances and Attention-based models for STA

    Authors: Lorenzo Mur-Labadia, Ruben Martinez-Cantin, Josechu Guerrero-Campo, Giovanni Maria Farinella

    Abstract: Short-Term object-interaction Anticipation (STA) consists of detecting the location of the next-active objects, the noun and verb categories of the interaction, and the time to contact from the observation of egocentric video. We propose STAformer, a novel attention-based architecture integrating frame-guided temporal pooling, dual image-video attention, and multi-scale feature fusion to support S…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2406.01194

  29. arXiv:2407.04367  [pdf, ps, other]

    math.CO cs.DS

    Reconfiguration of Independent Transversals

    Authors: Pjotr Buys, Ross J. Kang, Kenta Ozeki

    Abstract: Given integers $Δ\ge 2$ and $t\ge 2Δ$, suppose there is a graph of maximum degree $Δ$ and a partition of its vertices into blocks of size at least $t$. By a seminal result of Haxell, there must be some independent set of the graph that is transversal to the blocks, a so-called independent transversal. We show that, if moreover $t\ge2Δ+1$, then every independent transversal can be transformed withi…

    Submitted 5 July, 2024; originally announced July 2024.

    MSC Class: 05C35; 05C69; 05C15; 68R05; 68R10

  30. arXiv:2407.04355  [pdf, other]

    cs.CV

    Data-Driven Tissue- and Subject-Specific Elastic Regularization for Medical Image Registration

    Authors: Anna Reithmeir, Lina Felsner, Rickmer Braren, Julia A. Schnabel, Veronika A. Zimmer

    Abstract: Physics-inspired regularization is desired for intra-patient image registration since it can effectively capture the biomechanical characteristics of anatomical structures. However, a major challenge lies in the reliance on physical parameters: Parameter estimations vary widely across the literature, and the physical properties themselves are inherently subject-specific. In this work, we introduce…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted at MICCAI 2024

  31. arXiv:2407.04346  [pdf]

    cs.CV

    MobileFlow: A Multimodal LLM For Mobile GUI Agent

    Authors: Songqin Nong, Jiali Zhu, Rui Wu, Jiongchao Jin, Shuo Shan, Xiutian Huang, Wenhao Xu

    Abstract: Currently, the integration of mobile Graphical User Interfaces (GUIs) is ubiquitous in most people's daily lives. And the ongoing evolution of multimodal large-scale models, such as GPT-4v, Qwen-VL-Max, has significantly bolstered the capabilities of GUI comprehension and user action analysis, showcasing the potentiality of intelligent GUI assistants. However, current GUI Agents often need to acce…

    Submitted 5 July, 2024; originally announced July 2024.

  32. arXiv:2407.04328  [pdf, other]

    cs.RO cs.LG eess.SY

    EAGERx: Graph-Based Framework for Sim2real Robot Learning

    Authors: Bas van der Heijden, Jelle Luijkx, Laura Ferranti, Jens Kober, Robert Babuska

    Abstract: Sim2real, that is, the transfer of learned control policies from simulation to real world, is an area of growing interest in robotics due to its potential to efficiently handle complex tasks. The sim2real approach faces challenges due to mismatches between simulation and reality. These discrepancies arise from inaccuracies in modeling physical phenomena and asynchronous control, among other factor…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: For an introductory video, see http://www.youtube.com/watch?v=D0CQNnTT010 . The documentation, tutorials, and our open-source code can be found at http://eagerx.readthedocs.io

  33. arXiv:2407.04293  [pdf, other]

    cs.CL cs.SD eess.AS

    Systematic Evaluation of Online Speaker Diarization Systems Regarding their Latency

    Authors: Roman Aperdannier, Sigurd Schacht, Alexander Piazza

    Abstract: In this paper, different online speaker diarization systems are evaluated on the same hardware with the same test data with regard to their latency. The latency is the time span from audio input to the output of the corresponding speaker label. As part of the evaluation, various model combinations within the DIART framework, a diarization system based on the online clustering algorithm UIS-RNN-SML…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 6 pages

  34. arXiv:2407.04291  [pdf, other]

    eess.AS cs.LG

    We Need Variations in Speech Synthesis: Sub-center Modelling for Speaker Embeddings

    Authors: Ismail Rasim Ulgen, Carlos Busso, John H. L. Hansen, Berrak Sisman

    Abstract: In speech synthesis, modeling of rich emotions and prosodic variations present in human voice are crucial to synthesize natural speech. Although speaker embeddings have been widely used in personalized speech synthesis as conditioning inputs, they are designed to lose variation to optimize speaker recognition accuracy. Thus, they are suboptimal for speech synthesis in terms of modeling the rich va…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Submitted to IEEE Signal Processing Letters

  35. arXiv:2407.04285  [pdf, other]

    cs.LG cs.AI

    Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling

    Authors: Jiawei Xu, Rui Yang, Feng Luo, Meng Fang, Baoxiang Wang, Lei Han

    Abstract: Learning policies from offline datasets through offline reinforcement learning (RL) holds promise for scaling data-driven decision-making and avoiding unsafe and costly online interactions. However, real-world data collected from sensors or humans often contains noise and errors, posing a significant challenge for existing offline RL methods. Our study indicates that traditional offline RL methods…

    Submitted 5 July, 2024; originally announced July 2024.

  36. arXiv:2407.04263  [pdf, other]

    cs.SE

    Drop it All or Pick it Up? How Developers Responded to the Log4JShell Vulnerability

    Authors: Vittunyuta Maeprasart, Ali Ouni, Raula Gaikovina Kula

    Abstract: Although using third-party libraries has become prevalent in contemporary software development, developers often struggle to update their dependencies. Prior works acknowledge that due to the migration effort, priority and other issues cause lags in the migration process. The common assumption is that developers should drop all other activities and prioritize fixing the vulnerability. Our objectiv…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted to SERA24. arXiv admin note: text overlap with arXiv:2406.11362

  37. arXiv:2407.04249  [pdf, other]

    cs.CV

    FeatureSORT: Essential Features for Effective Tracking

    Authors: Hamidreza Hashempoor, Rosemary Koikara, Yu Dong Hwang

    Abstract: In this work, we introduce a novel tracker designed for online multiple object tracking with a focus on being simple, while being effective. we provide multiple feature modules each of which stands for a particular appearance information. By integrating distinct appearance features, including clothing color, style, and target direction, alongside a ReID network for robust embedding extraction, our…

    Submitted 5 July, 2024; originally announced July 2024.

  38. arXiv:2407.04247  [pdf, other]

    cs.CL cs.AI cs.CV

    ArAIEval Shared Task: Propagandistic Techniques Detection in Unimodal and Multimodal Arabic Content

    Authors: Maram Hasanain, Md. Arid Hasan, Fatema Ahmed, Reem Suwaileh, Md. Rafiul Biswas, Wajdi Zaghouani, Firoj Alam

    Abstract: We present an overview of the second edition of the ArAIEval shared task, organized as part of the ArabicNLP 2024 conference co-located with ACL 2024. In this edition, ArAIEval offers two tasks: (i) detection of propagandistic textual spans with persuasion techniques identification in tweets and news articles, and (ii) distinguishing between propagandistic and non-propagandistic memes. A total of…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: propaganda, span detection, disinformation, misinformation, fake news, LLMs, GPT-4, multimodality, multimodal LLMs

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

  39. Exploration of Class Center for Fine-Grained Visual Classification

    Authors: Hang Yao, Qiguang Miao, Peipei Zhao, Chaoneng Li, Xin Li, Guanwen Feng, Ruyi Liu

    Abstract: Different from large-scale classification tasks, fine-grained visual classification is a challenging task due to two critical problems: 1) evident intra-class variances and subtle inter-class differences, and 2) overfitting owing to fewer training samples in datasets. Most existing methods extract key features to reduce intra-class variances, but pay no attention to subtle inter-class differences…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted by TCSVT. Code and trained models are here: https://github.com/hyao1/ECC

  40. arXiv:2407.04242  [pdf, other]

    cs.CV

    Fine-grained Context and Multi-modal Alignment for Freehand 3D Ultrasound Reconstruction

    Authors: Zhongnuo Yan, Xin Yang, Mingyuan Luo, Jiongquan Chen, Rusi Chen, Lian Liu, Dong Ni

    Abstract: Fine-grained spatio-temporal learning is crucial for freehand 3D ultrasound reconstruction. Previous works mainly resorted to the coarse-grained spatial features and the separated temporal dependency learning and struggles for fine-grained spatio-temporal learning. Mining spatio-temporal information in fine-grained scales is extremely challenging due to learning difficulties in long-range dependen…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted at MICCAI 2024. This is the submitted manuscript and the preprint has not undergone peer review (when applicable) or any post-submission improvements or corrections

  41. arXiv:2407.04241  [pdf, other]

    cs.CV cs.AI

    AnySR: Realizing Image Super-Resolution as Any-Scale, Any-Resource

    Authors: Wengyi Zhan, Mingbao Lin, Chia-Wen Lin, Rongrong Ji

    Abstract: In an effort to improve the efficiency and scalability of single-image super-resolution (SISR) applications, we introduce AnySR, to rebuild existing arbitrary-scale SR methods into any-scale, any-resource implementation. As a contrast to off-the-shelf methods that solve SR tasks across various scales with the same computing costs, our AnySR innovates in: 1) building arbitrary-scale tasks as any-re…

    Submitted 5 July, 2024; originally announced July 2024.

  42. arXiv:2407.04240  [pdf, other]

    cs.LG

    A Two-Step Minimax Q-learning Algorithm for Two-Player Zero-Sum Markov Games

    Authors: Shreyas S R, Antony Vijesh

    Abstract: An interesting iterative procedure is proposed to solve two-player zero-sum Markov games. First, this problem is expressed as a min-max Markov game. Next, a two-step Q-learning algorithm for solving Markov decision problems (MDPs) is suitably modified to solve this Markov game. Under a suitable assumption, the boundedness of the proposed iterates is obtained theoretically. Using results from stocha…

    Submitted 4 July, 2024; originally announced July 2024.

  43. arXiv:2407.04231  [pdf, other]

    cs.CV

    Efficient GANs for Document Image Binarization Based on DWT and Normalization

    Authors: Rui-Yang Ju, KokSheik Wong, Jen-Shiun Chiang

    Abstract: For the document image binarization task, generative adversarial networks (GANs) can generate images where shadows and noise are effectively removed, which allows for text information extraction. The current state-of-the-art (SOTA) method proposes a three-stage network architecture that utilizes six GANs. Despite its excellent model performance, the SOTA network architecture requires long training and…

    Submitted 4 July, 2024; originally announced July 2024.

  44. arXiv:2407.04208  [pdf, other]

    cs.CV

    AMD: Automatic Multi-step Distillation of Large-scale Vision Models

    Authors: Cheng Han, Qifan Wang, Sohail A. Dianat, Majid Rabbani, Raghuveer M. Rao, Yi Fang, Qiang Guan, Lifu Huang, Dongfang Liu

    Abstract: Transformer-based architectures have become the de-facto standard models for diverse vision tasks owing to their superior performance. As the size of the models continues to scale up, model distillation becomes extremely important in various real applications, particularly on devices limited by computational resources. However, prevailing knowledge distillation methods exhibit diminished efficacy…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 19 pages, 5 figures

  45. arXiv:2407.04206  [pdf, other]

    math.NA cs.CE

    Computational Graph Representation of Equations System Constructors in Hierarchical Circuit Simulation

    Authors: Zichao Long, Lin Li, Lei Han, Xianglong Meng, Chongjun Ding, Ruiyan Li, Wu Jiang, Fuchen Ding, Jiaqing Yue, Zhichao Li, Yisheng Hu, Ding Li, Heng Liao

    Abstract: Equations system constructors of hierarchical circuits play a central role in device modeling, nonlinear equations solving, and circuit design automation. However, existing constructors present limitations in applications to different extents. For example, the costs of developing and reusing device models -- especially coarse-grained equivalent models of circuit modules -- remain high while parame…

    Submitted 4 July, 2024; originally announced July 2024.

  46. arXiv:2407.04203  [pdf, other]

    cs.CV

    HCS-TNAS: Hybrid Constraint-driven Semi-supervised Transformer-NAS for Ultrasound Image Segmentation

    Authors: Renqi Chen

    Abstract: Accurate ultrasound segmentation is pursued because it aids clinicians in achieving a comprehensive diagnosis. Due to the presence of low image quality and high costs associated with annotation, two primary concerns arise: (1) enhancing the understanding of multi-scale features, and (2) improving the resistance to data dependency. To mitigate these concerns, we propose HCS-TNAS, a novel neural arc…

    Submitted 4 July, 2024; originally announced July 2024.

  47. arXiv:2407.04190  [pdf, other]

    cs.CV

    Computer Vision for Clinical Gait Analysis: A Gait Abnormality Video Dataset

    Authors: Rahm Ranjan, David Ahmedt-Aristizabal, Mohammad Ali Armin, Juno Kim

    Abstract: Clinical gait analysis (CGA) using computer vision is an emerging field in artificial intelligence that faces barriers in accessible, real-world data and clear task objectives. This paper lays the foundation for current developments in CGA as well as vision-based methods and datasets suitable for gait analysis. We introduce The Gait Abnormality in Video Dataset (GAVD) in response to our review of…

    Submitted 4 July, 2024; originally announced July 2024.

    ACM Class: I.2.10

  48. arXiv:2407.04183  [pdf, other]

    cs.CL cs.AI cs.CY cs.HC

    Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms

    Authors: Joshua Ashkinaze, Ruijia Guan, Laura Kurek, Eytan Adar, Ceren Budak, Eric Gilbert

    Abstract: Large language models (LLMs) are trained on broad corpora and then used in communities with specialized norms. Is providing LLMs with community rules enough for models to follow these norms? We evaluate LLMs' capacity to detect (Task 1) and correct (Task 2) biased Wikipedia edits according to Wikipedia's Neutral Point of View (NPOV) policy. LLMs struggled with bias detection, achieving only 64% ac…

    Submitted 4 July, 2024; originally announced July 2024.

  49. arXiv:2407.04182  [pdf, other]

    cs.AR

    Towards Generalized On-Chip Communication for Programmable Accelerators in Heterogeneous Architectures

    Authors: Joseph Zuckerman, John-David Wellman, Ajay Vanamali, Manish Shankar, Gabriele Tombesi, Karthik Swaminathan, Kevin Lee, Mohit Kapur, Robert Philhower, Pradip Bose, Luca P. Carloni

    Abstract: We present several enhancements to the open-source ESP platform to support flexible and efficient on-chip communication for programmable accelerators in heterogeneous SoCs. These enhancements include 1) a flexible point-to-point communication mechanism between accelerators, 2) a multicast NoC that supports data forwarding to multiple accelerators simultaneously, 3) accelerator synchronization leve…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Appeared in the Sixth International Workshop on Domain Specific System Architecture (DOSSA-6)

  50. arXiv:2407.04157  [pdf, other]

    cs.CE cs.LG

    Finite Operator Learning: Bridging Neural Operators and Numerical Methods for Efficient Parametric Solution and Optimization of PDEs

    Authors: Shahed Rezaei, Reza Najian Asl, Kianoosh Taghikhani, Ahmad Moeineddin, Michael Kaliske, Markus Apel

    Abstract: We introduce a method that combines neural operators, physics-informed machine learning, and standard numerical methods for solving PDEs. The proposed approach extends each of the aforementioned methods and unifies them within a single framework. We can parametrically solve partial differential equations in a data-free manner and provide accurate sensitivities, meaning the derivatives of the solut…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2401.02363