Showing 1–50 of 120 results for author: Chang, A

Searching in archive cs.
  1. arXiv:2408.05614  [pdf, other]

    cs.AR cs.ET eess.SY

    ICGMM: CXL-enabled Memory Expansion with Intelligent Caching Using Gaussian Mixture Model

    Authors: Hanqiu Chen, Yitu Wang, Luis Vitorio Cargnini, Mohammadreza Soltaniyeh, Dongyang Li, Gongjin Sun, Pradeep Subedi, Andrew Chang, Yiran Chen, Cong Hao

    Abstract: Compute Express Link (CXL) emerges as a solution for the wide gap between computational speed and data communication rates among host and multiple devices. It fosters a unified and coherent memory space between host and CXL storage devices such as solid-state drives (SSDs) for memory expansion, with a corresponding DRAM implemented as the device cache. However, this introduces challenges such as…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: This paper is accepted by DAC2024

  2. arXiv:2408.03178  [pdf, other]

    cs.CV cs.GR cs.LG

    An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion

    Authors: Xingguang Yan, Han-Hung Lee, Ziyu Wan, Angel X. Chang

    Abstract: We introduce a new approach for generating realistic 3D models with UV maps through a representation termed "Object Images." This approach encapsulates surface geometry, appearance, and patch structures within a 64x64 pixel image, effectively converting complex 3D shapes into a more manageable 2D format. By doing so, we address the challenges of both geometric and semantic irregularity inherent in…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Project Page: https://omages.github.io/

  3. arXiv:2408.02211  [pdf, other]

    cs.GR

    SceneMotifCoder: Example-driven Visual Program Learning for Generating 3D Object Arrangements

    Authors: Hou In Ivan Tam, Hou In Derek Pun, Austin T. Wang, Angel X. Chang, Manolis Savva

    Abstract: Despite advances in text-to-3D generation methods, generation of multi-object arrangements remains challenging. Current methods exhibit failures in generating physically plausible arrangements that respect the provided text description. We present SceneMotifCoder (SMC), an example-driven framework for generating 3D object arrangements through visual program learning. SMC leverages large language m…

    Submitted 4 August, 2024; originally announced August 2024.

  4. arXiv:2407.13183  [pdf]

    eess.IV cs.CV

    Methods to Measure the Broncho-Arterial Ratio and Wall Thickness in the Right Lower Lobe for Defining Radiographic Reversibility of Bronchiectasis

    Authors: Abhijith R. Beeravolu, Ian Brent Masters, Mirjam Jonkman, Kheng Cher Yeo, Spyridon Prountzos, Rahul J Thomas, Eva Ignatious, Sami Azam, Gabrielle B McCallum, Efthymia Alexopoulou, Anne B Chang, Friso De Boer

    Abstract: The diagnosis of bronchiectasis requires measuring abnormal bronchial dilation. It is confirmed using a chest CT scan, where the key feature is an increased broncho-arterial ratio (BAR) (>0.8 in children), often with bronchial wall thickening. Image processing methods facilitate quicker interpretation and detailed evaluations by lobes and segments. Challenges like inclined nature, oblique orientat…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 14 pages

  5. arXiv:2407.12952  [pdf, other]

    cs.CV

    Denoising Diffusions in Latent Space for Medical Image Segmentation

    Authors: Fahim Ahmed Zaman, Mathews Jacob, Amanda Chang, Kan Liu, Milan Sonka, Xiaodong Wu

    Abstract: Diffusion models (DPMs) have demonstrated remarkable performance in image generation, often outperforming other generative models. Since their introduction, the powerful noise-to-image denoising pipeline has been extended to various discriminative tasks, including image segmentation. In the case of medical imaging, the images are often large 3D scans, where segmenting one image using DPMs…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 9 pages, 7 figures

  6. arXiv:2406.12723  [pdf, other]

    cs.LG

    BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity

    Authors: Zahra Gharaee, Scott C. Lowe, ZeMing Gong, Pablo Millan Arias, Nicholas Pellegrino, Austin T. Wang, Joakim Bruslund Haurum, Iuliia Zarubiieva, Lila Kari, Dirk Steinke, Graham W. Taylor, Paul Fieguth, Angel X. Chang

    Abstract: As part of an ongoing worldwide effort to comprehend and monitor insect biodiversity, this paper presents the BIOSCAN-5M Insect dataset to the machine learning community and establishes several benchmark tasks. BIOSCAN-5M is a comprehensive dataset containing multi-modal information for over 5 million insect specimens, and it significantly expands existing image-based biological datasets by includin…

    Submitted 24 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  7. arXiv:2406.11579  [pdf, other]

    cs.CV

    Duoduo CLIP: Efficient 3D Understanding with Multi-View Images

    Authors: Han-Hung Lee, Yiming Zhang, Angel X. Chang

    Abstract: We introduce Duoduo CLIP, a model for 3D representation learning that learns shape encodings from multi-view images instead of point-clouds. The choice of multi-view images allows us to leverage 2D priors from off-the-shelf CLIP models to facilitate fine-tuning with 3D data. Our approach not only shows better generalization compared to existing point cloud methods, but also reduces GPU requirement…

    Submitted 17 June, 2024; originally announced June 2024.

  8. arXiv:2406.03575   

    cs.LG

    Reconciling Heterogeneous Effects in Causal Inference

    Authors: Audrey Chang, Emily Diana, Alexander Williams Tolbert

    Abstract: In this position and problem pitch paper, we offer a solution to the reference class problem in causal inference. We apply the Reconcile algorithm for model multiplicity in machine learning to reconcile heterogeneous effects in causal inference. Discrepancy between conditional average treatment effect (CATE) estimators of heterogeneous effects poses the reference class problem, where estimates for…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: This version has been removed by arXiv administrators as the submitter did not have the right to agree to the license at the time of submission

  9. arXiv:2405.17537  [pdf, other]

    cs.AI cs.CL cs.CV

    BIOSCAN-CLIP: Bridging Vision and Genomics for Biodiversity Monitoring at Scale

    Authors: ZeMing Gong, Austin T. Wang, Joakim Bruslund Haurum, Scott C. Lowe, Graham W. Taylor, Angel X. Chang

    Abstract: Measuring biodiversity is crucial for understanding ecosystem health. While prior works have developed machine learning models for the taxonomic classification of photographic images and DNA separately, in this work, we introduce a multimodal approach combining both, using CLIP-style contrastive learning to align images, DNA barcodes, and textual data in a unified embedding space. This allows for…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 16 pages with 9 figures

  10. arXiv:2405.10255  [pdf, other]

    cs.CV cs.RO

    When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

    Authors: Xianzheng Ma, Yash Bhalgat, Brandon Smart, Shuai Chen, Xinghui Li, Jian Ding, Jindong Gu, Dave Zhenyu Chen, Songyou Peng, Jia-Wang Bian, Philip H Torr, Marc Pollefeys, Matthias Nießner, Ian D Reid, Angel X. Chang, Iro Laina, Victor Adrian Prisacariu

    Abstract: As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the methodologies enabling LLMs to process, understand, and generate 3D data. Highlighting the unique advantages of LLMs, such as in-context lear…

    Submitted 16 May, 2024; originally announced May 2024.

  11. arXiv:2405.05010  [pdf, other]

    cs.CV

    ${M^2D}$NeRF: Multi-Modal Decomposition NeRF with 3D Feature Fields

    Authors: Ning Wang, Lefei Zhang, Angel X Chang

    Abstract: Neural fields (NeRF) have emerged as a promising approach for representing continuous 3D scenes. Nevertheless, the lack of semantic encoding in NeRFs poses a significant challenge for scene decomposition. To address this challenge, we present a single model, Multi-Modal Decomposition NeRF (${M^2D}$NeRF), that is capable of both text-based and visual patch-based edits. Specifically, we use multi-mo…

    Submitted 8 May, 2024; originally announced May 2024.

  12. arXiv:2405.00738  [pdf, other]

    cs.AR cs.AI cs.LG

    HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis

    Authors: Andy He, Darren Key, Mason Bulling, Andrew Chang, Skyler Shapiro, Everett Lee

    Abstract: Graphics Processing Units (GPUs) have become the leading hardware accelerator for deep learning applications and are used widely in training and inference of transformers; transformers have achieved state-of-the-art performance in many areas of machine learning and are especially used in most modern Large Language Models (LLMs). However, GPUs require large amounts of energy, which poses environmen…

    Submitted 29 April, 2024; originally announced May 2024.

    Comments: 7 pages, 2 figures

  13. arXiv:2404.04231  [pdf, other]

    cs.CV

    Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation

    Authors: Ji-Jia Wu, Andy Chia-Hao Chang, Chieh-Yu Chuang, Chun-Pei Chen, Yu-Lun Liu, Min-Hung Chen, Hou-Ning Hu, Yung-Yu Chuang, Yen-Yu Lin

    Abstract: This paper addresses text-supervised semantic segmentation, aiming to learn a model capable of segmenting arbitrary visual concepts within images by using only image-text pairs without dense annotations. Existing methods have demonstrated that contrastive learning on image-text pairs effectively aligns visual segments with the meanings of texts. We notice that there is a discrepancy between text a…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  14. arXiv:2403.13754  [pdf, other]

    cs.CL

    Different Tokenization Schemes Lead to Comparable Performance in Spanish Number Agreement

    Authors: Catherine Arnett, Pamela D. Rivière, Tyler A. Chang, Sean Trott

    Abstract: The relationship between language model tokenization and performance is an open area of research. Here, we investigate how different tokenization schemes impact number agreement in Spanish plurals. We find that morphologically-aligned tokenization performs similarly to other tokenization schemes, even when induced artificially for words that would not be tokenized that way during training. We then…

    Submitted 20 March, 2024; originally announced March 2024.

  15. arXiv:2403.13289  [pdf, other]

    cs.CV

    Text-to-3D Shape Generation

    Authors: Han-Hung Lee, Manolis Savva, Angel X. Chang

    Abstract: Recent years have seen an explosion of work and interest in text-to-3D shape generation. Much of the progress is driven by advances in 3D representations, large-scale pretraining and representation learning for text and image data enabling generative AI models, and differentiable rendering. Computational systems that can perform text-to-3D shape generation have captivated the popular imagination a…

    Submitted 20 March, 2024; originally announced March 2024.

  16. arXiv:2403.12301  [pdf, other]

    cs.CV

    R3DS: Reality-linked 3D Scenes for Panoramic Scene Understanding

    Authors: Qirui Wu, Sonia Raychaudhuri, Daniel Ritchie, Manolis Savva, Angel X Chang

    Abstract: We introduce the Reality-linked 3D Scenes (R3DS) dataset of synthetic 3D scenes mirroring the real-world scene arrangements from Matterport3D panoramas. Compared to prior work, R3DS has more complete and densely populated scenes with objects linked to real-world observations in panoramas. R3DS also provides an object support hierarchy, and matching object sets (e.g., same chairs around a dining ta…

    Submitted 18 March, 2024; originally announced March 2024.

  17. arXiv:2403.08904  [pdf, other]

    cs.CL

    Detecting Hallucination and Coverage Errors in Retrieval Augmented Generation for Controversial Topics

    Authors: Tyler A. Chang, Katrin Tomanek, Jessica Hoffmann, Nithum Thain, Erin van Liemt, Kathleen Meier-Hellstern, Lucas Dixon

    Abstract: We explore a strategy to handle controversial topics in LLM-based chatbots based on Wikipedia's Neutral Point of View (NPOV) principle: acknowledge the absence of a single true answer and surface multiple perspectives. We frame this as retrieval augmented generation, where perspectives are retrieved from a knowledge base and the LLM is tasked with generating a fluent and faithful response from the…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted at LREC-COLING 2024

  18. arXiv:2403.00686  [pdf, other]

    cs.CL

    A Bit of a Problem: Measurement Disparities in Dataset Sizes Across Languages

    Authors: Catherine Arnett, Tyler A. Chang, Benjamin K. Bergen

    Abstract: How should text dataset sizes be compared across languages? Even for content-matched (parallel) corpora, UTF-8 encoded text can require a dramatically different number of bytes for different languages. In our work, we define the byte premium between two languages as the ratio of bytes used to encode content-matched text in those languages. We compute byte premiums for 1155 languages, and we use li…

    Submitted 1 March, 2024; originally announced March 2024.

  19. arXiv:2402.15700  [pdf, other]

    cs.LG cs.AI cs.CL

    CoRelation: Boosting Automatic ICD Coding Through Contextualized Code Relation Learning

    Authors: Junyu Luo, Xiaochen Wang, Jiaqi Wang, Aofei Chang, Yaqing Wang, Fenglong Ma

    Abstract: Automatic International Classification of Diseases (ICD) coding plays a crucial role in the extraction of relevant information from clinical notes for proper recording and billing. One of the most important directions for boosting the performance of automatic ICD coding is modeling ICD code relations. However, current methods insufficiently model the intricate relationships among ICD codes and oft…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: LREC-Coling 2024

  20. arXiv:2402.01077  [pdf, ps, other]

    cs.LG cs.AI

    Recent Advances in Predictive Modeling with Electronic Health Records

    Authors: Jiaqi Wang, Junyu Luo, Muchao Ye, Xiaochen Wang, Yuan Zhong, Aofei Chang, Guanjie Huang, Ziyi Yin, Cao Xiao, Jimeng Sun, Fenglong Ma

    Abstract: The development of electronic health records (EHR) systems has enabled the collection of a vast amount of digitized patient data. However, utilizing EHR data for predictive modeling presents several challenges due to its unique characteristics. With the advancements in machine learning techniques, deep learning has demonstrated its superiority in various applications, including healthcare. This su…

    Submitted 13 August, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: This paper has been accepted by IJCAI 24 Survey Track

  21. arXiv:2401.00405  [pdf, other]

    cs.CV

    Generalizing Single-View 3D Shape Retrieval to Occlusions and Unseen Objects

    Authors: Qirui Wu, Daniel Ritchie, Manolis Savva, Angel X. Chang

    Abstract: Single-view 3D shape retrieval is a challenging task that is increasingly important with the growth of available 3D data. Prior work that has studied this task has not focused on evaluating how realistic occlusions impact performance, and how shape retrieval methods generalize to scenarios where either the target 3D shape database contains unseen shapes, or the input image contains unseen objects.…

    Submitted 31 December, 2023; originally announced January 2024.

  22. arXiv:2312.14369  [pdf, other]

    cs.CY cs.LG

    Quality-Diversity Generative Sampling for Learning with Synthetic Data

    Authors: Allen Chang, Matthew C. Fontaine, Serena Booth, Maja J. Matarić, Stefanos Nikolaidis

    Abstract: Generative models can serve as surrogates for some real data sources by creating synthetic training datasets, but in doing so they may transfer biases to downstream tasks. We focus on protecting quality and diversity when generating synthetic training datasets. We propose quality-diversity generative sampling (QDGS), a framework for sampling data uniformly across a user-defined measure space, desp…

    Submitted 27 February, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted at AAAI 2024; 7 pages main, 12 pages total, 9 figures

  23. arXiv:2312.12653  [pdf, other]

    eess.IV cs.CV

    Diagnosis Of Takotsubo Syndrome By Robust Feature Selection From The Complex Latent Space Of DL-based Segmentation Network

    Authors: Fahim Ahmed Zaman, Wahidul Alam, Tarun Kanti Roy, Amanda Chang, Kan Liu, Xiaodong Wu

    Abstract: Researchers have shown significant correlations among segmented objects in various medical imaging modalities and disease-related pathologies. Several studies showed that using hand-crafted features for disease prediction neglects the immense possibility to use latent features from deep learning (DL) models which may reduce the overall accuracy of differential diagnosis. However, directly using cl…

    Submitted 18 January, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: 5 pages, 3 figures, conference

  24. arXiv:2312.12649  [pdf, other]

    eess.IV cs.CV

    Surf-CDM: Score-Based Surface Cold-Diffusion Model For Medical Image Segmentation

    Authors: Fahim Ahmed Zaman, Mathews Jacob, Amanda Chang, Kan Liu, Milan Sonka, Xiaodong Wu

    Abstract: Diffusion models have shown impressive performance for image generation, often outperforming other generative models. Since their introduction, researchers have extended the powerful noise-to-image denoising pipeline to discriminative tasks, including image segmentation. In this work we propose a conditional score-based generative modeling framework for medical image segmentation which relie…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 5 pages, 5 figures, conference

  25. arXiv:2312.03141  [pdf, other]

    cs.AR

    NDSEARCH: Accelerating Graph-Traversal-Based Approximate Nearest Neighbor Search through Near Data Processing

    Authors: Yitu Wang, Shiyu Li, Qilin Zheng, Linghao Song, Zongwang Li, Andrew Chang, Hai "Helen" Li, Yiran Chen

    Abstract: Approximate nearest neighbor search (ANNS) is a key retrieval technique for vector databases and many data center applications, such as person re-identification and recommendation systems. It is now also fundamental to retrieval augmented generation (RAG) for large language models (LLMs). Among all the ANNS algorithms, graph-traversal-based ANNS achieves the highest recall rate. However, as the size…

    Submitted 28 May, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  26. arXiv:2311.09205  [pdf, other]

    cs.CL

    When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages

    Authors: Tyler A. Chang, Catherine Arnett, Zhuowen Tu, Benjamin K. Bergen

    Abstract: Multilingual language models are widely used to extend NLP systems to low-resource languages. However, concrete evidence for the effects of multilinguality on language modeling performance in individual languages remains scarce. Here, we pre-train over 10,000 monolingual and multilingual language models for over 250 languages, including multiple language families that are under-studied in NLP. We…

    Submitted 15 November, 2023; originally announced November 2023.

  27. arXiv:2311.09194  [pdf, other]

    cs.CL

    Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models

    Authors: James A. Michaelov, Catherine Arnett, Tyler A. Chang, Benjamin K. Bergen

    Abstract: Abstract grammatical knowledge - of parts of speech and grammatical patterns - is key to the capacity for linguistic generalization in humans. But how abstract is grammatical knowledge in large language models? In the human literature, compelling evidence for grammatical abstraction comes from structural priming. A sentence that shares the same grammatical structure as a preceding sentence is proc…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: Accepted at EMNLP 2023

  28. arXiv:2311.02401  [pdf, other]

    cs.LG

    BarcodeBERT: Transformers for Biodiversity Analysis

    Authors: Pablo Millan Arias, Niousha Sadjadi, Monireh Safari, ZeMing Gong, Austin T. Wang, Scott C. Lowe, Joakim Bruslund Haurum, Iuliia Zarubiieva, Dirk Steinke, Lila Kari, Angel X. Chang, Graham W. Taylor

    Abstract: Understanding biodiversity is a global challenge, in which DNA barcodes - short snippets of DNA that cluster by species - play a pivotal role. In particular, invertebrates, a highly diverse and under-explored group, pose unique taxonomic complexities. We explore machine learning approaches, comparing supervised CNNs, fine-tuned foundation models, and a DNA barcode-specific masking strategy across…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: Main text: 5 pages, Total: 9 pages, 2 figures, accepted at the 4th Workshop on Self-Supervised Learning: Theory and Practice (NeurIPS 2023)

  29. arXiv:2310.07929  [pdf, other]

    cs.CL

    Crosslingual Structural Priming and the Pre-Training Dynamics of Bilingual Language Models

    Authors: Catherine Arnett, Tyler A. Chang, James A. Michaelov, Benjamin K. Bergen

    Abstract: Do multilingual language models share abstract grammatical representations across languages, and if so, when do these develop? Following Sinclair et al. (2022), we use structural priming to test for abstract grammatical representations with causal effects on model outputs. We extend the approach to a Dutch-English bilingual setting, and we evaluate a Dutch-English language model during pre-trainin…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Extended abstract accepted to the 3rd Multilingual Representation Learning workshop at EMNLP 2023

  30. arXiv:2309.05251  [pdf, other]

    cs.CV

    Multi3DRefer: Grounding Text Description to Multiple 3D Objects

    Authors: Yiming Zhang, ZeMing Gong, Angel X. Chang

    Abstract: We introduce the task of localizing a flexible number of objects in real-world 3D scenes using natural language descriptions. Existing 3D visual grounding tasks focus on localizing a unique object given a text description. However, such a strict setting is unnatural as localizing potentially multiple objects is a common need in real-world scenarios and robotic tasks (e.g., visual navigation and ob…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: ICCV 2023

  31. arXiv:2308.15419  [pdf, other]

    cs.CL

    Characterizing Learning Curves During Language Model Pre-Training: Learning, Forgetting, and Stability

    Authors: Tyler A. Chang, Zhuowen Tu, Benjamin K. Bergen

    Abstract: How do language models learn to make predictions during pre-training? To study this, we extract learning curves from five autoregressive English language model pre-training runs, for 1M unseen tokens in context. We observe that the language models generate short repetitive phrases before learning to generate longer and more coherent text. We also find that individual tokens often exhibit sudden in…

    Submitted 30 July, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

    Comments: Accepted to TACL (pre-MIT Press version)

  32. arXiv:2308.13746  [pdf, other]

    cs.CV cs.AI cs.LG

    PE-MED: Prompt Enhancement for Interactive Medical Image Segmentation

    Authors: Ao Chang, Xing Tao, Xin Yang, Yuhao Huang, Xinrui Zhou, Jiajun Zeng, Ruobing Huang, Dong Ni

    Abstract: Interactive medical image segmentation refers to the accurate segmentation of the target of interest through interaction (e.g., click) between the user and the image. It has been widely studied in recent years as it is less dependent on abundant annotated data and more flexible than fully automated segmentation. However, current studies have not fully explored user-provided prompt information (e.g…

    Submitted 25 August, 2023; originally announced August 2023.

    Comments: Accepted by MICCAI MLMI 2023

  33. arXiv:2308.08269  [pdf, other]

    eess.IV cs.CV

    OnUVS: Online Feature Decoupling Framework for High-Fidelity Ultrasound Video Synthesis

    Authors: Han Zhou, Dong Ni, Ao Chang, Xinrui Zhou, Rusi Chen, Yanlin Chen, Lian Liu, Jiamin Liang, Yuhao Huang, Tong Han, Zhe Liu, Deng-Ping Fan, Xin Yang

    Abstract: Ultrasound (US) imaging is indispensable in clinical practice. To diagnose certain diseases, sonographers must observe corresponding dynamic anatomic structures to gather comprehensive information. However, the limited availability of specific US video cases causes teaching difficulties in identifying corresponding diseases, which potentially impacts the detection rate of such cases. The synthesis…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: 14 pages, 13 figures and 6 tables

  34. arXiv:2307.16385  [pdf, other]

    cs.RO

    Multi-gait Locomotion Planning and Tracking for Tendon-actuated Terrestrial Soft Robot (TerreSoRo)

    Authors: Arun Niddish Mahendran, Caitlin Freeman, Alexander H. Chang, Michael McDougall, Patricio A. Vela, Vishesh Vikas

    Abstract: The adaptability of soft robots makes them ideal candidates to maneuver through unstructured environments. However, locomotion challenges arise due to complexities in modeling the body mechanics, actuation, and robot-environment dynamics. These factors contribute to the gap between their potential and actual autonomous field deployment. A closed-loop path planning framework for soft robot locomoti…

    Submitted 30 July, 2023; originally announced July 2023.

    Comments: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023)

  35. arXiv:2307.11272  [pdf, other]

    cs.ET

    Quantum Communication in 6G Satellite Networks: Entanglement Distribution Across Changing Topologies

    Authors: A. Sen, C. Sumnicht, S. Choudhuri, A. Chang, G. Xue

    Abstract: As LEO/VLEO satellites offer many attractive features, such as low transmission delay, they are expected to be an integral part of 6G. Global entanglement distribution over LEO and VLEO satellite networks must reckon with satellite movement over time. Current studies do not fully capture the dynamic nature of satellite constellations. We model a dynamic LEO/VLEO satellite network as a time-varying…

    Submitted 20 July, 2023; originally announced July 2023.

  36. arXiv:2307.10455  [pdf, other]

    cs.CV cs.AI cs.LG

    A Step Towards Worldwide Biodiversity Assessment: The BIOSCAN-1M Insect Dataset

    Authors: Zahra Gharaee, ZeMing Gong, Nicholas Pellegrino, Iuliia Zarubiieva, Joakim Bruslund Haurum, Scott C. Lowe, Jaclyn T. A. McKeown, Chris C. Y. Ho, Joschka McLeod, Yi-Yun C Wei, Jireh Agda, Sujeevan Ratnasingham, Dirk Steinke, Angel X. Chang, Graham W. Taylor, Paul Fieguth

    Abstract: In an effort to catalog insect biodiversity, we propose a new large dataset of hand-labelled insect images, the BIOSCAN-Insect Dataset. Each record is taxonomically classified by an expert, and also has associated genetic information including raw nucleotide barcode sequences and assigned barcode index numbers, which are genetically-based proxies for species classification. This paper presents a c…

    Submitted 13 November, 2023; v1 submitted 19 July, 2023; originally announced July 2023.

  37. arXiv:2307.07096  [pdf, other]

    eess.AS cs.SD

    Low Rank Properties for Estimating Microphones Start Time and Sources Emission Time

    Authors: Faxian Cao, Yongqiang Cheng, Adil Mehmood Khan, Zhijing Yang, S. M. Ahsan Kazmi, Yingxiu Chang

    Abstract: Uncertainty in timing information pertaining to the start time of microphone recordings and sources' emission time poses significant challenges in various applications, such as joint microphones and sources localization. Traditional optimization methods, which directly estimate this unknown timing information (UTIm), often fall short compared to approaches exploiting the low-rank property (LRP). LR…

    Submitted 21 July, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: 13 pages for main content; 9 pages for proof of proposed low rank properties; 13 figures

  38. arXiv:2306.11565  [pdf, other]

    cs.RO cs.AI cs.CV

    HomeRobot: Open-Vocabulary Mobile Manipulation

    Authors: Sriram Yenamandra, Arun Ramachandran, Karmesh Yadav, Austin Wang, Mukul Khanna, Theophile Gervet, Tsung-Yen Yang, Vidhi Jain, Alexander William Clegg, John Turner, Zsolt Kira, Manolis Savva, Angel Chang, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi, Yonatan Bisk, Chris Paxton

    Abstract: HomeRobot (noun): An affordable compliant robot that navigates homes and manipulates a wide range of objects in order to complete everyday tasks. Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment, and placing it in a commanded location. This is a foundational challenge for robots to be useful assistants in human environments, because it invol…

    Submitted 10 January, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: 37 pages, 22 figures, 8 tables

  39. arXiv:2306.11290  [pdf, other]

    cs.CV

    Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation

    Authors: Mukul Khanna, Yongsen Mao, Hanxiao Jiang, Sanjay Haresh, Brennan Shacklett, Dhruv Batra, Alexander Clegg, Eric Undersander, Angel X. Chang, Manolis Savva

    Abstract: We contribute the Habitat Synthetic Scene Dataset, a dataset of 211 high-quality 3D scenes, and use it to test navigation agent generalization to realistic 3D environments. Our dataset represents real interiors and contains a diverse set of 18,656 models of real-world objects. We investigate the impact of synthetic 3D scene dataset scale and realism on the task of training embodied agents to find…

    Submitted 7 December, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

  40. arXiv:2306.08894  [pdf, other]

    cs.NI quant-ph

    Entanglement Distribution in Satellite-based Dynamic Quantum Networks

    Authors: Alena Chang, Yinxin Wan, Guoliang Xue, Arunabha Sen

    Abstract: Low Earth Orbit (LEO) satellites present a compelling opportunity for the establishment of a global quantum information network. However, satellite-based entanglement distribution from a networking perspective has not been fully investigated. Existing works often do not account for satellite movement over time when distributing entanglement and/or often do not permit entanglement distribution alon…

    Submitted 15 June, 2023; originally announced June 2023.

  41. arXiv:2305.18557  [pdf, other]

    cs.CV

    Evaluating 3D Shape Analysis Methods for Robustness to Rotation Invariance

    Authors: Supriya Gadi Patil, Angel X. Chang, Manolis Savva

    Abstract: This paper analyzes the robustness of recent 3D shape descriptors to SO(3) rotations, something that is fundamental to shape modeling. Specifically, we formulate the task of rotated 3D object instance detection. To do so, we consider a database of 3D indoor scenes, where objects occur in different orientations. We benchmark different methods for feature extraction and classification in the context…

    Submitted 29 May, 2023; originally announced May 2023.

    Comments: 20th Conference on Robots and Vision (CRV) 2023

  42. arXiv:2305.18383  [pdf, other]

    stat.ML cs.LG

    A Three-regime Model of Network Pruning

    Authors: Yefan Zhou, Yaoqing Yang, Arin Chang, Michael W. Mahoney

    Abstract: Recent work has highlighted the complex influence training hyperparameters, e.g., the number of training epochs, can have on the prunability of machine learning models. Perhaps surprisingly, a systematic approach to predict precisely how adjusting a specific hyperparameter will affect prunability remains elusive. To address this gap, we introduce a phenomenological model grounded in the statistica…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: ICML 2023

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:42790-42809, 2023

  43. arXiv:2305.17127  [pdf, other]

    cs.CL

    Characterizing and Measuring Linguistic Dataset Drift

    Authors: Tyler A. Chang, Kishaloy Halder, Neha Anna John, Yogarshi Vyas, Yassine Benajiba, Miguel Ballesteros, Dan Roth

    Abstract: NLP models often degrade in performance when real world data distributions differ markedly from training data. However, existing dataset drift metrics in NLP have generally not considered specific dimensions of linguistic drift that affect model performance, and they have not been validated in their ability to predict model performance at the individual example level, where such metrics are often…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023

  44. arXiv:2304.14660  [pdf, other]

    eess.IV cs.CV cs.LG

    Segment Anything Model for Medical Images?

    Authors: Yuhao Huang, Xin Yang, Lian Liu, Han Zhou, Ao Chang, Xinrui Zhou, Rusi Chen, Junxuan Yu, Jiongquan Chen, Chaoyu Chen, Sijing Liu, Haozhe Chi, Xindi Hu, Kejuan Yue, Lei Li, Vicente Grau, Deng-Ping Fan, Fajin Dong, Dong Ni

    Abstract: The Segment Anything Model (SAM) is the first foundation model for general image segmentation. It has achieved impressive results on various natural image segmentation tasks. However, medical image segmentation (MIS) is more challenging because of the complex modalities, fine anatomical structures, uncertain and complex object boundaries, and wide-range object scales. To fully validate SAM's perfo…

    Submitted 17 January, 2024; v1 submitted 28 April, 2023; originally announced April 2023.

    Comments: Accepted by Medical Image Analysis. 23 pages, 18 figures, 8 tables

  45. arXiv:2304.03696  [pdf, other]

    cs.RO cs.CV

    MOPA: Modular Object Navigation with PointGoal Agents

    Authors: Sonia Raychaudhuri, Tommaso Campari, Unnat Jain, Manolis Savva, Angel X. Chang

    Abstract: We propose a simple but effective modular approach MOPA (Modular ObjectNav with PointGoal agents) to systematically investigate the inherent modularity of the object navigation task in Embodied AI. MOPA consists of four modules: (a) an object detection module trained to identify objects from RGB images, (b) a map building module to build a semantic map of the observed objects, (c) an exploration m…

    Submitted 27 January, 2024; v1 submitted 7 April, 2023; originally announced April 2023.

  46. arXiv:2303.14087  [pdf, other]

    cs.CV

    OPDMulti: Openable Part Detection for Multiple Objects

    Authors: Xiaohao Sun, Hanxiao Jiang, Manolis Savva, Angel Xuan Chang

    Abstract: Openable part detection is the task of detecting the openable parts of an object in a single-view image, and predicting corresponding motion parameters. Prior work investigated the unrealistic setting where all input images only contain a single openable object. We generalize this task to scenes with multiple objects each potentially possessing openable parts, and create a corresponding dataset ba…

    Submitted 24 March, 2023; originally announced March 2023.

  47. arXiv:2303.11504  [pdf, ps, other]

    cs.CL

    Language Model Behavior: A Comprehensive Survey

    Authors: Tyler A. Chang, Benjamin K. Bergen

    Abstract: Transformer language models have received widespread public attention, yet their generated text is often surprising even to NLP researchers. In this survey, we discuss over 250 recent studies of English language model behavior before task-specific fine-tuning. Language models possess basic capabilities in syntax, semantics, pragmatics, world knowledge, and reasoning, but these capabilities are sen…

    Submitted 25 August, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: 32 pages, accepted to Computational Linguistics

  48. Multimodal Speech Recognition for Language-Guided Embodied Agents

    Authors: Allen Chang, Xiaoyuan Zhu, Aarav Monga, Seoho Ahn, Tejas Srinivasan, Jesse Thomason

    Abstract: Benchmarks for language-guided embodied agents typically assume text-based instructions, but deployed agents will encounter spoken instructions. While Automatic Speech Recognition (ASR) models can bridge the input gap, erroneous ASR transcripts can hurt the agents' ability to complete tasks. In this work, we propose training a multimodal ASR model to reduce errors in transcribing spoken instructio…

    Submitted 9 October, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: 5 pages, 5 figures

    Journal ref: Proceedings of Interspeech 2023, 1608-1612

  49. arXiv:2302.05991  [pdf, other]

    cs.CV

    Digital Twin Tracking Dataset (DTTD): A New RGB+Depth 3D Dataset for Longer-Range Object Tracking Applications

    Authors: Weiyu Feng, Seth Z. Zhao, Chuanyu Pan, Adam Chang, Yichen Chen, Zekun Wang, Allen Y. Yang

    Abstract: Digital twin is a problem of augmenting real objects with their digital counterparts. It can underpin a wide range of applications in augmented reality (AR), autonomy, and UI/UX. A critical component in a good digital-twin system is real-time, accurate 3D object tracking. Most existing works solve 3D object tracking through the lens of robotic grasping, employ older generations of depth sensors, a…

    Submitted 11 April, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

  50. arXiv:2212.00836  [pdf, other]

    cs.CV

    UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding

    Authors: Dave Zhenyu Chen, Ronghang Hu, Xinlei Chen, Matthias Nießner, Angel X. Chang

    Abstract: Performing 3D dense captioning and visual grounding requires a common and shared understanding of the underlying multimodal relationships. However, despite some previous attempts on connecting these two related tasks with highly task-specific neural modules, it remains understudied how to explicitly depict their shared nature to learn them simultaneously. In this work, we propose UniT3D, a simple…

    Submitted 1 December, 2022; originally announced December 2022.