Search | arXiv e-print repository

TikTok Engagement Traces Over Time and Health Risky Behaviors: Combining Data Linkage and Computational Methods

Abstract: Digital technologies and social algorithms are revolutionizing the media landscape, altering how we select and consume health information. Extending the selectivity paradigm with research on social media engagement, the convergence perspective, and algorithmic impact, this study investigates how individuals' liked TikTok videos on various health-risk topics are associated with their vaping and dri… ▽ More Digital technologies and social algorithms are revolutionizing the media landscape, altering how we select and consume health information. Extending the selectivity paradigm with research on social media engagement, the convergence perspective, and algorithmic impact, this study investigates how individuals' liked TikTok videos on various health-risk topics are associated with their vaping and drinking behaviors. Methodologically, we relied on data linkage to objectively measure selective engagement on social media, which involves combining survey self-reports with digital traces from TikTok interactions for the consented respondents (n = 166). A computational analysis of 13,724 health-related videos liked by these respondents from 2020 to 2023 was conducted. Our findings indicate that users who initially liked drinking-related content on TikTok are inclined to favor more of such videos over time, with their likes on smoking, drinking, and fruit and vegetable videos influencing their self-reported vaping and drinking behaviors. Our study highlights the methodological value of combining digital traces, computational analysis, and self-reported data for a more objective examination of social media consumption and engagement, as well as a more ecologically valid understanding of social media's behavioral impact. △ Less

Submitted 22 June, 2024; originally announced June 2024.

Comments: 12 pages. Under review

arXiv:2406.09630 [pdf, other]

Muharaf: Manuscripts of Handwritten Arabic Dataset for Cursive Text Recognition

Authors: Mehreen Saeed, Adrian Chan, Anupam Mijar, Joseph Moukarzel, Georges Habchi, Carlos Younes, Amin Elias, Chau-Wai Wong, Akram Khater

Abstract: We present the Manuscripts of Handwritten Arabic~(Muharaf) dataset, which is a machine learning dataset consisting of more than 1,600 historic handwritten page images transcribed by experts in archival Arabic. Each document image is accompanied by spatial polygonal coordinates of its text lines as well as basic page elements. This dataset was compiled to advance the state of the art in handwritten… ▽ More We present the Manuscripts of Handwritten Arabic~(Muharaf) dataset, which is a machine learning dataset consisting of more than 1,600 historic handwritten page images transcribed by experts in archival Arabic. Each document image is accompanied by spatial polygonal coordinates of its text lines as well as basic page elements. This dataset was compiled to advance the state of the art in handwritten text recognition (HTR), not only for Arabic manuscripts but also for cursive text in general. The Muharaf dataset includes diverse handwriting styles and a wide range of document types, including personal letters, diaries, notes, poems, church records, and legal correspondences. In this paper, we describe the data acquisition pipeline, notable dataset features, and statistics. We also provide a preliminary baseline result achieved by training convolutional neural networks using this data. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08411 [pdf, other]

Tailoring Generative AI Chatbots for Multiethnic Communities in Disaster Preparedness Communication: Extending the CASA Paradigm

Authors: Xinyan Zhao, Yuan Sun, Wenlin Liu, Chau-Wai Wong

Abstract: This study is among the first to develop different prototypes of generative AI (GenAI) chatbots powered by GPT 4 to communicate hurricane preparedness information to diverse residents. Drawing from the Computers Are Social Actors (CASA) paradigm and the literature on disaster vulnerability and cultural tailoring, this study conducted a between-subjects experiment with 441 Black, Hispanic, and Cauc… ▽ More This study is among the first to develop different prototypes of generative AI (GenAI) chatbots powered by GPT 4 to communicate hurricane preparedness information to diverse residents. Drawing from the Computers Are Social Actors (CASA) paradigm and the literature on disaster vulnerability and cultural tailoring, this study conducted a between-subjects experiment with 441 Black, Hispanic, and Caucasian residents of Florida. A computational analysis of chat logs (N = 7,848) shows that anthropomorphism and personalization are key communication topics in GenAI chatbot-user interactions. SEM results (N = 441) suggest that GenAI chatbots varying in tone formality and cultural tailoring significantly predict bot perceptions and, subsequently, hurricane preparedness outcomes. These results highlight the potential of using GenAI chatbots to improve diverse communities' disaster preparedness. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: 21 pages

MSC Class: 68U15

arXiv:2405.20429 [pdf, other]

Quantum Preference Query

Authors: Hao Liu, Xiaotian You, Raymond Chi-Wing Wong

Abstract: Given a large dataset of many tuples, it is hard for users to pick out their preferred tuples. Thus, the preference query problem, which is to find the most preferred tuples from a dataset, is widely discussed in the database area. In this problem, a utility function is given by the user to evaluate to what extent the user prefers a tuple. However, considering a dataset consisting of N tuples, the… ▽ More Given a large dataset of many tuples, it is hard for users to pick out their preferred tuples. Thus, the preference query problem, which is to find the most preferred tuples from a dataset, is widely discussed in the database area. In this problem, a utility function is given by the user to evaluate to what extent the user prefers a tuple. However, considering a dataset consisting of N tuples, the existing algorithms need O(N) time to answer a query, or need O(N) time for a cold start to answer a query. The reason is that in a classical computer, a linear time is needed to evaluate the utilities by the utility function for N tuples. In this paper, we discuss the Quantum Preference Query (QPQ) problem, where the dataset is given in a quantum memory, and we use a quantum computer to return the answers. Due to quantum parallelism, the quantum algorithm can theoretically perform better than their classical competitors. We discuss this problem in different kinds of input and output. In the QPQ problem, the input can be a number k or a threshold theta. Given k, the problem is to return k tuples with the highest utilities. Given theta, the problem is to return all the tuples with utilities higher than theta. Also, in QPQ problem, the output can be classical (i.e., a list of tuples) or quantum (i.e., a superposition in quantum bits). We proposed four quantum algorithms to solve the problems in the above four scenarios. We analyze the number of memory accesses needed for each quantum algorithm, which shows that the proposed quantum algorithms are at least quadratically faster than their classical competitors. In our experiments, we show that to answer a QPQ problem, the quantum algorithms achieve up to 1000x improvement in number of memory accesses than their classical competitors, which proved that QPQ problem could be a future direction of the study of preference query problems. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.20416 [pdf, other]

First Tree-like Quantum Data Structure: Quantum B+ Tree

Authors: Hao Liu, Xiaotian You, Raymond Chi-Wing Wong

Abstract: Quantum computing is a popular topic in computer science, which has recently attracted many studies in various areas such as machine learning and network. However, the topic of quantum data structures seems neglected. There is an open problem in the database area: Can we improve existing data structures by quantum techniques? Consider a dataset of key-record pairs. Given an interval as a query ran… ▽ More Quantum computing is a popular topic in computer science, which has recently attracted many studies in various areas such as machine learning and network. However, the topic of quantum data structures seems neglected. There is an open problem in the database area: Can we improve existing data structures by quantum techniques? Consider a dataset of key-record pairs. Given an interval as a query range, a classical B+ tree can report all the records with keys within this interval, which is called a range query, in O(log N + k) time, where N is the total number of records and k is the output size. It is asymptotically optimal in a classical computer but not efficient enough in a quantum computer, because it is expected that the execution time and the output size are linear in a quantum computer. In this paper, we propose the quantum range query problem. Different from the classical range queries, a quantum range query returns the results in quantum bits, which has broad potential applications due to the foreseeable advance of quantum computers and quantum algorithms. To the best of our knowledge, we design the first tree-like quantum data structure called the quantum B+ tree. Based on this data structure, we propose a hybrid quantum-classical algorithm to do the range search. It answers a static quantum range query in O(log_B N) time, which is asymptotically optimal in quantum computers. Since the execution time does not depend on the output size (i.e., k, which could be as large as O(N)), it is significantly faster than the classical data structure. Moreover, we extend our quantum B+ tree to answer the dynamic and d-dimensional quantum range queries efficiently in O(log^2_B N) and O(log^d_B N) time, respectively. Our experimental results show that our proposed quantum data structures achieve up to 1000x improvement in the number of memory accesses compared to their classical competitors. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.17940 [pdf, other]

World Models for General Surgical Grasping

Authors: Hongbin Lin, Bin Li, Chun Wai Wong, Juan Rojas, Xiangyu Chu, Kwok Wai Samuel Au

Abstract: Intelligent vision control systems for surgical robots should adapt to unknown and diverse objects while being robust to system disturbances. Previous methods did not meet these requirements due to mainly relying on pose estimation and feature tracking. We propose a world-model-based deep reinforcement learning framework "Grasp Anything for Surgery" (GAS), that learns a pixel-level visuomotor poli… ▽ More Intelligent vision control systems for surgical robots should adapt to unknown and diverse objects while being robust to system disturbances. Previous methods did not meet these requirements due to mainly relying on pose estimation and feature tracking. We propose a world-model-based deep reinforcement learning framework "Grasp Anything for Surgery" (GAS), that learns a pixel-level visuomotor policy for surgical grasping, enhancing both generality and robustness. In particular, a novel method is proposed to estimate the values and uncertainties of depth pixels for a rigid-link object's inaccurate region based on the empirical prior of the object's size; both depth and mask images of task objects are encoded to a single compact 3-channel image (size: 64x64x3) by dynamically zooming in the mask regions, minimizing the information loss. The learned controller's effectiveness is extensively evaluated in simulation and in a real robot. Our learned visuomotor policy handles: i) unseen objects, including 5 types of target grasping objects and a robot gripper, in unstructured real-world surgery environments, and ii) disturbances in perception and control. Note that we are the first work to achieve a unified surgical control system that grasps diverse surgical objects using different robot grippers on real robots in complex surgery scenes (average success rate: 69%). Our system also demonstrates significant robustness across 6 conditions including background variation, target disturbance, camera pose variation, kinematic control error, image noise, and re-grasping after the gripped target object drops from the gripper. Videos and codes can be found on our project page: https://linhongbin.github.io/gas/. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Journal ref: Robotics: Science and Systems 2024

arXiv:2405.05241 [pdf, other]

BenthicNet: A global compilation of seafloor images for deep learning applications

Authors: Scott C. Lowe, Benjamin Misiuk, Isaac Xu, Shakhboz Abdulazizov, Amit R. Baroi, Alex C. Bastos, Merlin Best, Vicki Ferrini, Ariell Friedman, Deborah Hart, Ove Hoegh-Guldberg, Daniel Ierodiaconou, Julia Mackin-McLaughlin, Kathryn Markey, Pedro S. Menandro, Jacquomo Monk, Shreya Nemani, John O'Brien, Elizabeth Oh, Luba Y. Reshitnyk, Katleen Robert, Chris M. Roelfsema, Jessica A. Sameoto, Alexandre C. G. Schimel, Jordan A. Thomson , et al. (4 additional authors not shown)

Abstract: Advances in underwater imaging enable the collection of extensive seafloor image datasets that are necessary for monitoring important benthic ecosystems. The ability to collect seafloor imagery has outpaced our capacity to analyze it, hindering expedient mobilization of this crucial environmental information. Recent machine learning approaches provide opportunities to increase the efficiency with… ▽ More Advances in underwater imaging enable the collection of extensive seafloor image datasets that are necessary for monitoring important benthic ecosystems. The ability to collect seafloor imagery has outpaced our capacity to analyze it, hindering expedient mobilization of this crucial environmental information. Recent machine learning approaches provide opportunities to increase the efficiency with which seafloor image datasets are analyzed, yet large and consistent datasets necessary to support development of such approaches are scarce. Here we present BenthicNet: a global compilation of seafloor imagery designed to support the training and evaluation of large-scale image recognition models. An initial set of over 11.4 million images was collected and curated to represent a diversity of seafloor environments using a representative subset of 1.3 million images. These are accompanied by 2.6 million annotations translated to the CATAMI scheme, which span 190,000 of the images. A large deep learning model was trained on this compilation and preliminary results suggest it has utility for automating large and small-scale image analysis tasks. The compilation and model are made openly available for use by the scientific community at https://doi.org/10.20383/103.0614. △ Less

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2404.13135 [pdf, other]

Hybrid Continuum-Eversion Robot: Precise Navigation and Decontamination in Nuclear Environments using Vine Robot

Authors: Mohammed Al-Dubooni, Cuebong Wong, Kaspar Althoefer

Abstract: Soft growing vine robots show great potential for navigation and decontamination tasks in the nuclear industry. This paper introduces a novel hybrid continuum-eversion robot designed to address certain challenges in relation to navigating and operating within pipe networks and enclosed remote vessels. The hybrid robot combines the flexibility of a soft eversion robot with the precision of a contin… ▽ More Soft growing vine robots show great potential for navigation and decontamination tasks in the nuclear industry. This paper introduces a novel hybrid continuum-eversion robot designed to address certain challenges in relation to navigating and operating within pipe networks and enclosed remote vessels. The hybrid robot combines the flexibility of a soft eversion robot with the precision of a continuum robot at its tip, allowing for controlled steering and movement in hard to access and/or complex environments. The design enables the delivery of sensors, liquids, and aerosols to remote areas, supporting remote decontamination activities. This paper outlines the design and construction of the robot and the methods by which it achieves selective steering. We also include a comprehensive review of current related work in eversion robotics, as well as other steering devices and actuators currently under research, which underpin this novel active steering approach. This is followed by an experimental evaluation that demonstrates the robot's real-world capabilities in delivering liquids and aerosols to remote locations. The experiments reveal successful outcomes, with over 95% success in precision spraying tests. The paper concludes by discussing future work alongside limitations in the current design, ultimately showcasing its potential as a solution for remote decontamination operations in the nuclear industry. △ Less

Submitted 19 April, 2024; originally announced April 2024.

Comments: 7 pages, 8 figures, conference

arXiv:2404.11816 [pdf, other]

Tailoring Generative Adversarial Networks for Smooth Airfoil Design

Authors: Joyjit Chattoraj, Jian Cheng Wong, Zhang Zexuan, Manna Dai, Xia Yingzhi, Li Jichao, Xu Xinxing, Ooi Chin Chun, Yang Feng, Dao My Ha, Liu Yong

Abstract: In the realm of aerospace design, achieving smooth curves is paramount, particularly when crafting objects such as airfoils. Generative Adversarial Network (GAN), a widely employed generative AI technique, has proven instrumental in synthesizing airfoil designs. However, a common limitation of GAN is the inherent lack of smoothness in the generated airfoil surfaces. To address this issue, we prese… ▽ More In the realm of aerospace design, achieving smooth curves is paramount, particularly when crafting objects such as airfoils. Generative Adversarial Network (GAN), a widely employed generative AI technique, has proven instrumental in synthesizing airfoil designs. However, a common limitation of GAN is the inherent lack of smoothness in the generated airfoil surfaces. To address this issue, we present a GAN model featuring a customized loss function built to produce seamlessly contoured airfoil designs. Additionally, our model demonstrates a substantial increase in design diversity compared to a conventional GAN augmented with a post-processing smoothing filter. △ Less

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.10194 [pdf, other]

Impostor Syndrome in Final Year Computer Science Students: An Eye Tracking and Biometrics Study

Authors: Alyssia Chen, Carol Wong, Katy Tarrit, Anthony Peruma

Abstract: Imposter syndrome is a psychological phenomenon that affects individuals who doubt their skills and abilities, despite possessing the necessary competencies. This can lead to a lack of confidence and poor performance. While research has explored the impacts of imposter syndrome on students and professionals in various fields, there is limited knowledge on how it affects code comprehension in softw… ▽ More Imposter syndrome is a psychological phenomenon that affects individuals who doubt their skills and abilities, despite possessing the necessary competencies. This can lead to a lack of confidence and poor performance. While research has explored the impacts of imposter syndrome on students and professionals in various fields, there is limited knowledge on how it affects code comprehension in software engineering. In this exploratory study, we investigate the prevalence of imposter syndrome among final-year undergraduate computer science students and its effects on their code comprehension cognition using an eye tracker and heart rate monitor. Key findings demonstrate that students identifying as male exhibit lower imposter syndrome levels when analyzing code, and higher imposter syndrome is associated with increased time reviewing a code snippet and a lower likelihood of solving it correctly. This study provides initial data on this topic and establishes a foundation for further research to support student academic success and improve developer productivity and mental well-being. △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: Accepted at: 18th International Conference, AC 2024, Held as Part of the 26th HCI International Conference, HCII 2024

arXiv:2404.07135 [pdf, other]

Towards Robustness of Text-to-Visualization Translation against Lexical and Phrasal Variability

Authors: Jinwei Lu, Yuanfeng Song, Haodi Zhang, Chen Zhang, Raymond Chi-Wing Wong

Abstract: Text-to-Vis is an emerging task in the natural language processing (NLP) area that aims to automatically generate data visualizations from natural language questions (NLQs). Despite their progress, existing text-to-vis models often heavily rely on lexical matching between words in the questions and tokens in data schemas. This overreliance on lexical matching may lead to a diminished level of mode… ▽ More Text-to-Vis is an emerging task in the natural language processing (NLP) area that aims to automatically generate data visualizations from natural language questions (NLQs). Despite their progress, existing text-to-vis models often heavily rely on lexical matching between words in the questions and tokens in data schemas. This overreliance on lexical matching may lead to a diminished level of model robustness against input variations. In this study, we thoroughly examine the robustness of current text-to-vis models, an area that has not previously been explored. In particular, we construct the first robustness dataset nvBench-Rob, which contains diverse lexical and phrasal variations based on the original text-to-vis benchmark nvBench. Then, we found that the performance of existing text-to-vis models on this new dataset dramatically drops, implying that these methods exhibit inadequate robustness overall. Finally, we propose a novel framework based on Retrieval-Augmented Generation (RAG) technique, named GRED, specifically designed to address input perturbations in these two variants. The framework consists of three parts: NLQ-Retrieval Generator, Visualization Query-Retrieval Retuner and Annotation-based Debugger, which are used to tackle the challenges posed by natural language variants, programming style differences and data schema variants, respectively. Extensive experimental evaluations show that, compared to the state-of-the-art model RGVisNet in the Text-to-Vis field, GRED performs better in terms of model robustness, with a 32% increase in accuracy on the proposed nvBench-Rob dataset. △ Less

Submitted 11 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.00032 [pdf, other]

Deployment of Deep Learning Model in Real World Clinical Setting: A Case Study in Obstetric Ultrasound

Authors: Chun Kit Wong, Mary Ngo, Manxi Lin, Zahra Bashir, Amihai Heen, Morten Bo Søndergaard Svendsen, Martin Grønnebæk Tolsgaard, Anders Nymark Christensen, Aasa Feragen

Abstract: Despite the rapid development of AI models in medical image analysis, their validation in real-world clinical settings remains limited. To address this, we introduce a generic framework designed for deploying image-based AI models in such settings. Using this framework, we deployed a trained model for fetal ultrasound standard plane detection, and evaluated it in real-time sessions with both novic… ▽ More Despite the rapid development of AI models in medical image analysis, their validation in real-world clinical settings remains limited. To address this, we introduce a generic framework designed for deploying image-based AI models in such settings. Using this framework, we deployed a trained model for fetal ultrasound standard plane detection, and evaluated it in real-time sessions with both novice and expert users. Feedback from these sessions revealed that while the model offers potential benefits to medical practitioners, the need for navigational guidance was identified as a key area for improvement. These findings underscore the importance of early deployment of AI models in real-world settings, leading to insights that can guide the refinement of the model and system based on actual user feedback. △ Less

Submitted 22 March, 2024; originally announced April 2024.

Comments: 10 pages

arXiv:2403.08002 [pdf, other]

Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation

Authors: Juan Manuel Zambrano Chaves, Shih-Cheng Huang, Yanbo Xu, Hanwen Xu, Naoto Usuyama, Sheng Zhang, Fei Wang, Yujia Xie, Mahmoud Khademi, Ziyi Yang, Hany Awadalla, Julia Gong, Houdong Hu, Jianwei Yang, Chunyuan Li, Jianfeng Gao, Yu Gu, Cliff Wong, Mu Wei, Tristan Naumann, Muhao Chen, Matthew P. Lungren, Akshay Chaudhari, Serena Yeung-Levy, Curtis P. Langlotz , et al. (2 additional authors not shown)

Abstract: The scaling laws and extraordinary performance of large foundation models motivate the development and utilization of such models in biomedicine. However, despite early promising results on some biomedical benchmarks, there are still major challenges that need to be addressed before these models can be used in real-world clinics. Frontier general-domain models such as GPT-4V still have significant… ▽ More The scaling laws and extraordinary performance of large foundation models motivate the development and utilization of such models in biomedicine. However, despite early promising results on some biomedical benchmarks, there are still major challenges that need to be addressed before these models can be used in real-world clinics. Frontier general-domain models such as GPT-4V still have significant performance gaps in multimodal biomedical applications. More importantly, less-acknowledged pragmatic issues, including accessibility, model cost, and tedious manual evaluation make it hard for clinicians to use state-of-the-art large models directly on private patient data. Here, we explore training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology. To maximize data efficiency, we adopt a modular approach by incorporating state-of-the-art pre-trained models for image and text modalities, and focusing on training a lightweight adapter to ground each modality to the text embedding space, as exemplified by LLaVA-Med. For training, we assemble a large dataset of over 697 thousand radiology image-text pairs. For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation. For best practice, we conduct a systematic ablation study on various choices in data engineering and multimodal training. The resulting LlaVA-Rad (7B) model attains state-of-the-art results on standard radiology tasks such as report generation and cross-modal retrieval, even outperforming much larger models such as GPT-4V and Med-PaLM M (84B). The inference of LlaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications. △ Less

Submitted 26 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.00192 [pdf, other]

Block-MDS QC-LDPC Codes for Information Reconciliation in Key Distribution

Authors: Lev Tauz, Debarnab Mitra, Jayanth Shreekumar, Murat Can Sarihan, Chee Wei Wong, Lara Dolecek

Abstract: Quantum key distribution (QKD) is a popular protocol that provides information theoretically secure keys to multiple parties. Two important post-processing steps of QKD are 1) the information reconciliation (IR) step, where parties reconcile mismatches in generated keys through classical communication, and 2) the privacy amplification (PA) step, where parties distill their common key into a new se… ▽ More Quantum key distribution (QKD) is a popular protocol that provides information theoretically secure keys to multiple parties. Two important post-processing steps of QKD are 1) the information reconciliation (IR) step, where parties reconcile mismatches in generated keys through classical communication, and 2) the privacy amplification (PA) step, where parties distill their common key into a new secure key that the adversary has little to no information about. In general, these two steps have been abstracted as two distinct problems. In this work, we consider a new technique of performing the IR and PA steps jointly through sampling that relaxes the requirement on the IR step, allowing for more success in key creation. We provide a novel LDPC code construction known as Block-MDS QC-LDPC codes that can utilize the relaxed requirement by creating LDPC codes with pre-defined sub-matrices of full-rank. We demonstrate through simulations that our technique of sampling can provide notable gains in successfully creating secret keys. △ Less

Submitted 29 February, 2024; originally announced March 2024.

Comments: 7 pages, 1 figure, submitted to the International Symposium on Information Theory (ISIT) 2024

arXiv:2402.08294 [pdf, other]

Learning semantic image quality for fetal ultrasound from noisy ranking annotation

Authors: Manxi Lin, Jakob Ambsdorf, Emilie Pi Fogtmann Sejer, Zahra Bashir, Chun Kit Wong, Paraskevas Pegios, Alberto Raheli, Morten Bo Søndergaard Svendsen, Mads Nielsen, Martin Grønnebæk Tolsgaard, Anders Nymark Christensen, Aasa Feragen

Abstract: We introduce the notion of semantic image quality for applications where image quality relies on semantic requirements. Working in fetal ultrasound, where ranking is challenging and annotations are noisy, we design a robust coarse-to-fine model that ranks images based on their semantic image quality and endow our predicted rankings with an uncertainty estimate. To annotate rankings on training dat… ▽ More We introduce the notion of semantic image quality for applications where image quality relies on semantic requirements. Working in fetal ultrasound, where ranking is challenging and annotations are noisy, we design a robust coarse-to-fine model that ranks images based on their semantic image quality and endow our predicted rankings with an uncertainty estimate. To annotate rankings on training data, we design an efficient ranking annotation scheme based on the merge sort algorithm. Finally, we compare our ranking algorithm to a number of state-of-the-art ranking algorithms on a challenging fetal ultrasound quality assessment task, showing the superior performance of our method on the majority of rank correlation metrics. △ Less

Submitted 13 February, 2024; originally announced February 2024.

Comments: Extended version of the accepted paper at ISBI 2024

arXiv:2401.16247 [pdf, other]

Towards Red Teaming in Multimodal and Multilingual Translation

Authors: Christophe Ropers, David Dale, Prangthip Hansanti, Gabriel Mejia Gonzalez, Ivan Evtimov, Corinne Wong, Christophe Touret, Kristina Pereyra, Seohyun Sonia Kim, Cristian Canton Ferrer, Pierre Andrews, Marta R. Costa-jussà

Abstract: Assessing performance in Natural Language Processing is becoming increasingly complex. One particular challenge is the potential for evaluation datasets to overlap with training data, either directly or indirectly, which can lead to skewed results and overestimation of model performance. As a consequence, human evaluation is gaining increasing interest as a means to assess the performance and reli… ▽ More Assessing performance in Natural Language Processing is becoming increasingly complex. One particular challenge is the potential for evaluation datasets to overlap with training data, either directly or indirectly, which can lead to skewed results and overestimation of model performance. As a consequence, human evaluation is gaining increasing interest as a means to assess the performance and reliability of models. One such method is the red teaming approach, which aims to generate edge cases where a model will produce critical errors. While this methodology is becoming standard practice for generative AI, its application to the realm of conditional AI remains largely unexplored. This paper presents the first study on human-based red teaming for Machine Translation (MT), marking a significant step towards understanding and improving the performance of translation models. We delve into both human-based red teaming and a study on automation, reporting lessons learned and providing recommendations for both translation models and red teaming drills. This pioneering work opens up new avenues for research and development in the field of MT. △ Less

Submitted 29 January, 2024; originally announced January 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2312.05187

ACM Class: I.2.7

arXiv:2401.07654 [pdf, other]

Foundation Models for Biomedical Image Segmentation: A Survey

Authors: Ho Hin Lee, Yu Gu, Theodore Zhao, Yanbo Xu, Jianwei Yang, Naoto Usuyama, Cliff Wong, Mu Wei, Bennett A. Landman, Yuankai Huo, Alberto Santamaria-Pang, Hoifung Poon

Abstract: Recent advancements in biomedical image analysis have been significantly driven by the Segment Anything Model (SAM). This transformative technology, originally developed for general-purpose computer vision, has found rapid application in medical image processing. Within the last year, marked by over 100 publications, SAM has demonstrated its prowess in zero-shot learning adaptations for medical im… ▽ More Recent advancements in biomedical image analysis have been significantly driven by the Segment Anything Model (SAM). This transformative technology, originally developed for general-purpose computer vision, has found rapid application in medical image processing. Within the last year, marked by over 100 publications, SAM has demonstrated its prowess in zero-shot learning adaptations for medical imaging. The fundamental premise of SAM lies in its capability to segment or identify objects in images without prior knowledge of the object type or imaging modality. This approach aligns well with tasks achievable by the human visual system, though its application in non-biological vision contexts remains more theoretically challenging. A notable feature of SAM is its ability to adjust segmentation according to a specified resolution scale or area of interest, akin to semantic priming. This adaptability has spurred a wave of creativity and innovation in applying SAM to medical imaging. Our review focuses on the period from April 1, 2023, to September 30, 2023, a critical first six months post-initial publication. We examine the adaptations and integrations of SAM necessary to address longstanding clinical challenges, particularly in the context of 33 open datasets covered in our analysis. While SAM approaches or achieves state-of-the-art performance in numerous applications, it falls short in certain areas, such as segmentation of the carotid artery, adrenal glands, optic nerve, and mandible bone. Our survey delves into the innovative techniques where SAM's foundational approach excels and explores the core concepts in translating and applying these models effectively in diverse medical imaging scenarios. △ Less

Submitted 15 January, 2024; originally announced January 2024.

Comments: 22 pages, 4 figures, 7 tables

arXiv:2312.15006 [pdf, other]

Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities

Authors: Yuhao Chen, Chloe Wong, Hanwen Yang, Juan Aguenza, Sai Bhujangari, Benthan Vu, Xun Lei, Amisha Prasad, Manny Fluss, Eric Phuong, Minghao Liu, Raja Kumar, Vanshika Vats, James Davis

Abstract: This study critically evaluates the efficacy of prompting methods in enhancing the mathematical reasoning capability of large language models (LLMs). The investigation uses three prescriptive prompting methods - simple, persona, and conversational prompting - known for their effectiveness in enhancing the linguistic tasks of LLMs. We conduct this analysis on OpenAI's LLM chatbot, ChatGPT-3.5, on e… ▽ More This study critically evaluates the efficacy of prompting methods in enhancing the mathematical reasoning capability of large language models (LLMs). The investigation uses three prescriptive prompting methods - simple, persona, and conversational prompting - known for their effectiveness in enhancing the linguistic tasks of LLMs. We conduct this analysis on OpenAI's LLM chatbot, ChatGPT-3.5, on extensive problem sets from the MATH, GSM8K, and MMLU datasets, encompassing a broad spectrum of mathematical challenges. A grading script adapted to each dataset is used to determine the effectiveness of these prompting interventions in enhancing the model's mathematical analysis power. Contrary to expectations, our empirical analysis reveals that none of the investigated methods consistently improves over ChatGPT-3.5's baseline performance, with some causing significant degradation. Our findings suggest that prompting strategies do not necessarily generalize to new domains, in this study failing to enhance mathematical performance. △ Less

Submitted 20 February, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.08034 [pdf, other]

Individualized Deepfake Detection Exploiting Traces Due to Double Neural-Network Operations

Authors: Mushfiqur Rahman, Runze Liu, Chau-Wai Wong, Huaiyu Dai

Abstract: In today's digital landscape, journalists urgently require tools to verify the authenticity of facial images and videos depicting specific public figures before incorporating them into news stories. Existing deepfake detectors are not optimized for this detection task when an image is associated with a specific and identifiable individual. This study focuses on the deepfake detection of facial ima… ▽ More In today's digital landscape, journalists urgently require tools to verify the authenticity of facial images and videos depicting specific public figures before incorporating them into news stories. Existing deepfake detectors are not optimized for this detection task when an image is associated with a specific and identifiable individual. This study focuses on the deepfake detection of facial images of individual public figures. We propose to condition the proposed detector on the identity of the identified individual given the advantages revealed by our theory-driven simulations. While most detectors in the literature rely on perceptible or imperceptible artifacts present in deepfake facial images, we demonstrate that the detection performance can be improved by exploiting the idempotency property of neural networks. In our approach, the training process involves double neural-network operations where we pass an authentic image through a deepfake simulating network twice. Experimental results show that the proposed method improves the area under the curve (AUC) from 0.92 to 0.94 and reduces its standard deviation by 17\%. For evaluating the detection performance of individual public figures, a facial image dataset with individuals' names is required, a criterion not met by the current deepfake datasets. To address this, we curated a dataset comprising 32k images featuring 45 public figures, which we intend to release to the public after the paper is published. △ Less

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2312.05187 [pdf, other]

Seamless: Multilingual Expressive and Streaming Speech Translation

Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek , et al. (40 additional authors not shown)

Abstract: Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4… ▽ More Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4T model-SeamlessM4T v2. This newer model, incorporating an updated UnitY2 framework, was trained on more low-resource language data. SeamlessM4T v2 provides the foundation on which our next two models are initiated. SeamlessExpressive enables translation that preserves vocal styles and prosody. Compared to previous efforts in expressive speech research, our work addresses certain underexplored aspects of prosody, such as speech rate and pauses, while also preserving the style of one's voice. As for SeamlessStreaming, our model leverages the Efficient Monotonic Multihead Attention mechanism to generate low-latency target translations without waiting for complete source utterances. As the first of its kind, SeamlessStreaming enables simultaneous speech-to-speech/text translation for multiple source and target languages. To ensure that our models can be used safely and responsibly, we implemented the first known red-teaming effort for multimodal machine translation, a system for the detection and mitigation of added toxicity, a systematic evaluation of gender bias, and an inaudible localized watermarking mechanism designed to dampen the impact of deepfakes. Consequently, we bring major components from SeamlessExpressive and SeamlessStreaming together to form Seamless, the first publicly available system that unlocks expressive cross-lingual communication in real-time. The contributions to this work are publicly released and accessible at https://github.com/facebookresearch/seamless_communication △ Less

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2312.04022 [pdf, other]

doi 10.1109/TIP.2024.3409189

Analysis of Coding Gain Due to In-Loop Reshaping

Authors: Chau-Wai Wong, Chang-Hong Fu, Mengting Xu, Guan-Ming Su

Abstract: Reshaping, a point operation that alters the characteristics of signals, has been shown capable of improving the compression ratio in video coding practices. Out-of-loop reshaping that directly modifies the input video signal was first adopted as the supplemental enhancement information (SEI) for the HEVC/H.265 without the need to alter the core design of the video codec. VVC/H.266 further improve… ▽ More Reshaping, a point operation that alters the characteristics of signals, has been shown capable of improving the compression ratio in video coding practices. Out-of-loop reshaping that directly modifies the input video signal was first adopted as the supplemental enhancement information (SEI) for the HEVC/H.265 without the need to alter the core design of the video codec. VVC/H.266 further improves the coding efficiency by adopting in-loop reshaping that modifies the residual signal being processed in the hybrid coding loop. In this paper, we theoretically analyze the rate-distortion performance of the in-loop reshaping and use experiments to verify the theoretical result. We prove that the in-loop reshaping can improve coding efficiency when the entropy coder adopted in the coding pipeline is suboptimal, which is in line with the practical scenarios that video codecs operate in. We derive the PSNR gain in a closed form and show that the theoretically predicted gain is consistent with that measured from experiments using standard testing video sequences. △ Less

Submitted 19 June, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: Published in IEEE Transactions on Image Processing

arXiv:2312.03243 [pdf, other]

Generalizable Neural Physics Solvers by Baldwinian Evolution

Authors: Jian Cheng Wong, Chin Chun Ooi, Abhishek Gupta, Pao-Hsiung Chiu, Joshua Shao Zheng Low, My Ha Dao, Yew-Soon Ong

Abstract: Physics-informed neural networks (PINNs) are at the forefront of scientific machine learning, making possible the creation of machine intelligence that is cognizant of physical laws and able to accurately simulate them. In this paper, the potential of discovering PINNs that generalize over an entire family of physics tasks is studied, for the first time, through a biological lens of the Baldwin ef… ▽ More Physics-informed neural networks (PINNs) are at the forefront of scientific machine learning, making possible the creation of machine intelligence that is cognizant of physical laws and able to accurately simulate them. In this paper, the potential of discovering PINNs that generalize over an entire family of physics tasks is studied, for the first time, through a biological lens of the Baldwin effect. Drawing inspiration from the neurodevelopment of precocial species that have evolved to learn, predict and react quickly to their environment, we envision PINNs that are pre-wired with connection strengths inducing strong biases towards efficient learning of physics. To this end, evolutionary selection pressure (guided by proficiency over a family of tasks) is coupled with lifetime learning (to specialize on a smaller subset of those tasks) to produce PINNs that demonstrate fast and physics-compliant prediction capabilities across a range of empirically challenging problem instances. The Baldwinian approach achieves an order of magnitude improvement in prediction accuracy at a fraction of the computation cost compared to state-of-the-art results with PINNs meta-learned by gradient descent. This paper marks a leap forward in the meta-learning of PINNs as generalizable physics solvers. △ Less

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.13016 [pdf, other]

doi 10.1109/IGARSS52108.2023.10281551

Image-Based Soil Organic Carbon Remote Sensing from Satellite Images with Fourier Neural Operator and Structural Similarity

Authors: Ken C. L. Wong, Levente Klein, Ademir Ferreira da Silva, Hongzhi Wang, Jitendra Singh, Tanveer Syeda-Mahmood

Abstract: Soil organic carbon (SOC) sequestration is the transfer and storage of atmospheric carbon dioxide in soils, which plays an important role in climate change mitigation. SOC concentration can be improved by proper land use, thus it is beneficial if SOC can be estimated at a regional or global scale. As multispectral satellite data can provide SOC-related information such as vegetation and soil prope… ▽ More Soil organic carbon (SOC) sequestration is the transfer and storage of atmospheric carbon dioxide in soils, which plays an important role in climate change mitigation. SOC concentration can be improved by proper land use, thus it is beneficial if SOC can be estimated at a regional or global scale. As multispectral satellite data can provide SOC-related information such as vegetation and soil properties at a global scale, estimation of SOC through satellite data has been explored as an alternative to manual soil sampling. Although existing studies show promising results, they are mainly based on pixel-based approaches with traditional machine learning methods, and convolutional neural networks (CNNs) are uncommon. To study the use of CNNs on SOC remote sensing, here we propose the FNO-DenseNet based on the Fourier neural operator (FNO). By combining the advantages of the FNO and DenseNet, the FNO-DenseNet outperformed the FNO in our experiments with hundreds of times fewer parameters. The FNO-DenseNet also outperformed a pixel-based random forest by 18% in the mean absolute percentage error. △ Less

Submitted 21 November, 2023; originally announced November 2023.

Comments: This paper was accepted by the 2023 IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2023)

arXiv:2311.09581 [pdf, other]

DocLens: Multi-aspect Fine-grained Evaluation for Medical Text Generation

Authors: Yiqing Xie, Sheng Zhang, Hao Cheng, Pengfei Liu, Zelalem Gero, Cliff Wong, Tristan Naumann, Hoifung Poon, Carolyn Rose

Abstract: Medical text generation aims to assist with administrative work and highlight salient information to support decision-making. To reflect the specific requirements of medical text, in this paper, we propose a set of metrics to evaluate the completeness, conciseness, and attribution of the generated text at a fine-grained level. The metrics can be computed by various types of evaluators including in… ▽ More Medical text generation aims to assist with administrative work and highlight salient information to support decision-making. To reflect the specific requirements of medical text, in this paper, we propose a set of metrics to evaluate the completeness, conciseness, and attribution of the generated text at a fine-grained level. The metrics can be computed by various types of evaluators including instruction-following (both proprietary and open-source) and supervised entailment models. We demonstrate the effectiveness of the resulting framework, DocLens, with three evaluators on three tasks: clinical note generation, radiology report summarization, and patient question summarization. A comprehensive human study shows that DocLens exhibits substantially higher agreement with the judgments of medical experts than existing metrics. The results also highlight the need to improve open-source evaluators and suggest potential directions. △ Less

Submitted 18 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.03032 [pdf, other]

Reconfigurable, Transformable Soft Pneumatic Actuator with Tunable 3D Deformations for Dexterous Soft Robotics Applications

Authors: Dickson Chiu Yu Wong, Mingtan Li, Shijie Kang, Lifan Luo, Hongyu Yu

Abstract: Numerous soft actuators based on PneuNet design have already been proposed and extensively employed across various soft robotics applications in recent years. Despite their widespread use, a common limitation of most existing designs is that their action is pre-determined during the fabrication process, thereby restricting the ability to modify or alter their function during operation. To address… ▽ More Numerous soft actuators based on PneuNet design have already been proposed and extensively employed across various soft robotics applications in recent years. Despite their widespread use, a common limitation of most existing designs is that their action is pre-determined during the fabrication process, thereby restricting the ability to modify or alter their function during operation. To address this shortcoming, in this article the design of a Reconfigurable, Transformable Soft Pneumatic Actuator (RT-SPA) is proposed. The working principle of the RT-SPA is analogous to the conventional PneuNet. The key distinction between the two lies in the ability of the RT-SPA to undergo controlled transformations, allowing for more versatile bending and twisting motions in various directions. Furthermore, the unique reconfigurable design of the RT-SPA enables the selection of actuation units with different sizes to achieve a diverse range of three-dimensional deformations. This versatility enhances the potential of the RT-SPA for adaptation to a multitude of tasks and environments, setting it apart from traditional PneuNet. The paper begins with a detailed description of the design and fabrication of the RT-SPA. Following this, a series of experiments are conducted to evaluate the performance of the RT-SPA. Finally, the abilities of the RT-SPA for locomotion, gripping, and object manipulation are demonstrated to illustrate the versatility of the RT-SPA across different aspects. △ Less

Submitted 6 November, 2023; originally announced November 2023.

Comments: Submitted to Soft Robotics Journal. 12 pages, 10 figures

arXiv:2311.01301 [pdf, other]

TRIALSCOPE: A Unifying Causal Framework for Scaling Real-World Evidence Generation with Biomedical Language Models

Authors: Javier González, Cliff Wong, Zelalem Gero, Jass Bagga, Risa Ueno, Isabel Chien, Eduard Oravkin, Emre Kiciman, Aditya Nori, Roshanthi Weerasinghe, Rom S. Leidner, Brian Piening, Tristan Naumann, Carlo Bifulco, Hoifung Poon

Abstract: The rapid digitization of real-world data offers an unprecedented opportunity for optimizing healthcare delivery and accelerating biomedical discovery. In practice, however, such data is most abundantly available in unstructured forms, such as clinical notes in electronic medical records (EMRs), and it is generally plagued by confounders. In this paper, we present TRIALSCOPE, a unifying framework… ▽ More The rapid digitization of real-world data offers an unprecedented opportunity for optimizing healthcare delivery and accelerating biomedical discovery. In practice, however, such data is most abundantly available in unstructured forms, such as clinical notes in electronic medical records (EMRs), and it is generally plagued by confounders. In this paper, we present TRIALSCOPE, a unifying framework for distilling real-world evidence from population-level observational data. TRIALSCOPE leverages biomedical language models to structure clinical text at scale, employs advanced probabilistic modeling for denoising and imputation, and incorporates state-of-the-art causal inference techniques to combat common confounders. Using clinical trial specification as generic representation, TRIALSCOPE provides a turn-key solution to generate and reason with clinical hypotheses using observational data. In extensive experiments and analyses on a large-scale real-world dataset with over one million cancer patients from a large US healthcare network, we show that TRIALSCOPE can produce high-quality structuring of real-world data and generates comparable results to marquee cancer trials. In addition to facilitating in-silicon clinical trial design and optimization, TRIALSCOPE may be used to empower synthetic controls, pragmatic trials, post-market surveillance, as well as support fine-grained patient-like-me reasoning in precision diagnosis and treatment. △ Less

Submitted 6 November, 2023; v1 submitted 2 November, 2023; originally announced November 2023.

Comments: 6 Figures, 22 Pages, 3 Tables

arXiv:2310.17894 [pdf, other]

Natural Language Interfaces for Tabular Data Querying and Visualization: A Survey

Authors: Weixu Zhang, Yifei Wang, Yuanfeng Song, Victor Junqiu Wei, Yuxing Tian, Yiyan Qi, Jonathan H. Chan, Raymond Chi-Wing Wong, Haiqin Yang

Abstract: The emergence of natural language processing has revolutionized the way users interact with tabular data, enabling a shift from traditional query languages and manual plotting to more intuitive, language-based interfaces. The rise of large language models (LLMs) such as ChatGPT and its successors has further advanced this field, opening new avenues for natural language processing techniques. This… ▽ More The emergence of natural language processing has revolutionized the way users interact with tabular data, enabling a shift from traditional query languages and manual plotting to more intuitive, language-based interfaces. The rise of large language models (LLMs) such as ChatGPT and its successors has further advanced this field, opening new avenues for natural language processing techniques. This survey presents a comprehensive overview of natural language interfaces for tabular data querying and visualization, which allow users to interact with data using natural language queries. We introduce the fundamental concepts and techniques underlying these interfaces with a particular emphasis on semantic parsing, the key technology facilitating the translation from natural language to SQL queries or data visualization commands. We then delve into the recent advancements in Text-to-SQL and Text-to-Vis problems from the perspectives of datasets, methodologies, metrics, and system designs. This includes a deep dive into the influence of LLMs, highlighting their strengths, limitations, and potential for future improvements. Through this survey, we aim to provide a roadmap for researchers and practitioners interested in developing and applying natural language interfaces for data interaction in the era of large language models. △ Less

Submitted 19 May, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

Comments: 20 pages, 4 figures, 5 tables. Accepted by IEEE TKDE

arXiv:2310.08873 [pdf, other]

Interactive Navigation in Environments with Traversable Obstacles Using Large Language and Vision-Language Models

Authors: Zhen Zhang, Anran Lin, Chun Wai Wong, Xiangyu Chu, Qi Dou, K. W. Samuel Au

Abstract: This paper proposes an interactive navigation framework by using large language and vision-language models, allowing robots to navigate in environments with traversable obstacles. We utilize the large language model (GPT-3.5) and the open-set Vision-language Model (Grounding DINO) to create an action-aware costmap to perform effective path planning without fine-tuning. With the large models, we ca… ▽ More This paper proposes an interactive navigation framework by using large language and vision-language models, allowing robots to navigate in environments with traversable obstacles. We utilize the large language model (GPT-3.5) and the open-set Vision-language Model (Grounding DINO) to create an action-aware costmap to perform effective path planning without fine-tuning. With the large models, we can achieve an end-to-end system from textual instructions like "Can you pass through the curtains to deliver medicines to me?", to bounding boxes (e.g., curtains) with action-aware attributes. They can be used to segment LiDAR point clouds into two parts: traversable and untraversable parts, and then an action-aware costmap is constructed for generating a feasible path. The pre-trained large models have great generalization ability and do not require additional annotated data for training, allowing fast deployment in the interactive navigation tasks. We choose to use multiple traversable objects such as curtains and grasses for verification by instructing the robot to traverse them. Besides, traversing curtains in a medical scenario was tested. All experimental results demonstrated the proposed framework's effectiveness and adaptability to diverse environments. △ Less

Submitted 12 March, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: Accepted by 2024 IEEE International Conference on Robotics and Automation (ICRA), 7 pages, 8 figures

arXiv:2310.05370 [pdf, other]

SocialCircle: Learning the Angle-based Social Interaction Representation for Pedestrian Trajectory Prediction

Authors: Conghao Wong, Beihao Xia, Ziqian Zou, Yulong Wang, Xinge You

Abstract: Analyzing and forecasting trajectories of agents like pedestrians and cars in complex scenes has become more and more significant in many intelligent systems and applications. The diversity and uncertainty in socially interactive behaviors among a rich variety of agents make this task more challenging than other deterministic computer vision tasks. Researchers have made a lot of efforts to quantif… ▽ More Analyzing and forecasting trajectories of agents like pedestrians and cars in complex scenes has become more and more significant in many intelligent systems and applications. The diversity and uncertainty in socially interactive behaviors among a rich variety of agents make this task more challenging than other deterministic computer vision tasks. Researchers have made a lot of efforts to quantify the effects of these interactions on future trajectories through different mathematical models and network structures, but this problem has not been well solved. Inspired by marine animals that localize the positions of their companions underwater through echoes, we build a new anglebased trainable social interaction representation, named SocialCircle, for continuously reflecting the context of social interactions at different angular orientations relative to the target agent. We validate the effect of the proposed SocialCircle by training it along with several newly released trajectory prediction models, and experiments show that the SocialCircle not only quantitatively improves the prediction performance, but also qualitatively helps better simulate social interactions when forecasting pedestrian trajectories in a way that is consistent with human intuitions. △ Less

Submitted 26 March, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

Comments: CVPR 2024 accepted

arXiv:2310.04466 [pdf, other]

doi 10.1007/978-3-031-43901-8_35

HartleyMHA: Self-Attention in Frequency Domain for Resolution-Robust and Parameter-Efficient 3D Image Segmentation

Authors: Ken C. L. Wong, Hongzhi Wang, Tanveer Syeda-Mahmood

Abstract: With the introduction of Transformers, different attention-based models have been proposed for image segmentation with promising results. Although self-attention allows capturing of long-range dependencies, it suffers from a quadratic complexity in the image size especially in 3D. To avoid the out-of-memory error during training, input size reduction is usually required for 3D segmentation, but th… ▽ More With the introduction of Transformers, different attention-based models have been proposed for image segmentation with promising results. Although self-attention allows capturing of long-range dependencies, it suffers from a quadratic complexity in the image size especially in 3D. To avoid the out-of-memory error during training, input size reduction is usually required for 3D segmentation, but the accuracy can be suboptimal when the trained models are applied on the original image size. To address this limitation, inspired by the Fourier neural operator (FNO), we introduce the HartleyMHA model which is robust to training image resolution with efficient self-attention. FNO is a deep learning framework for learning mappings between functions in partial differential equations, which has the appealing properties of zero-shot super-resolution and global receptive field. We modify the FNO by using the Hartley transform with shared parameters to reduce the model size by orders of magnitude, and this allows us to further apply self-attention in the frequency domain for more expressive high-order feature combination with improved efficiency. When tested on the BraTS'19 dataset, it achieved superior robustness to training image resolution than other tested models with less than 1% of their model parameters. △ Less

Submitted 5 October, 2023; originally announced October 2023.

Comments: This paper was accepted by the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2023). arXiv admin note: text overlap with arXiv:2310.03872

arXiv:2310.03872 [pdf, other]

doi 10.1109/ISBI53787.2023.10230586

FNOSeg3D: Resolution-Robust 3D Image Segmentation with Fourier Neural Operator

Authors: Ken C. L. Wong, Hongzhi Wang, Tanveer Syeda-Mahmood

Abstract: Due to the computational complexity of 3D medical image segmentation, training with downsampled images is a common remedy for out-of-memory errors in deep learning. Nevertheless, as standard spatial convolution is sensitive to variations in image resolution, the accuracy of a convolutional neural network trained with downsampled images can be suboptimal when applied on the original resolution. To… ▽ More Due to the computational complexity of 3D medical image segmentation, training with downsampled images is a common remedy for out-of-memory errors in deep learning. Nevertheless, as standard spatial convolution is sensitive to variations in image resolution, the accuracy of a convolutional neural network trained with downsampled images can be suboptimal when applied on the original resolution. To address this limitation, we introduce FNOSeg3D, a 3D segmentation model robust to training image resolution based on the Fourier neural operator (FNO). The FNO is a deep learning framework for learning mappings between functions in partial differential equations, which has the appealing properties of zero-shot super-resolution and global receptive field. We improve the FNO by reducing its parameter requirement and enhancing its learning capability through residual connections and deep supervision, and these result in our FNOSeg3D model which is parameter efficient and resolution robust. When tested on the BraTS'19 dataset, it achieved superior robustness to training image resolution than other tested models with less than 1% of their model parameters. △ Less

Submitted 5 October, 2023; originally announced October 2023.

Comments: This paper was accepted by the IEEE International Symposium on Biomedical Imaging (ISBI) 2023

arXiv:2309.12218 [pdf, other]

SR-PredictAO: Session-based Recommendation with High-Capability Predictor Add-On

Authors: Ruida Wang, Raymond Chi-Wing Wong, Weile Tan

Abstract: Session-based recommendation, aiming at making the prediction of the user's next item click based on the information in a single session only even in the presence of some random user's behavior, is a complex problem. This complex problem requires a high-capability model of predicting the user's next action. Most (if not all) existing models follow the encoder-predictor paradigm where all studies f… ▽ More Session-based recommendation, aiming at making the prediction of the user's next item click based on the information in a single session only even in the presence of some random user's behavior, is a complex problem. This complex problem requires a high-capability model of predicting the user's next action. Most (if not all) existing models follow the encoder-predictor paradigm where all studies focus on how to optimize the encoder module extensively in the paradigm but they ignore how to optimize the predictor module. In this paper, we discover the existing critical issue of the low-capability predictor module among existing models. Motivated by this, we propose a novel framework called \emph{\underline{S}ession-based \underline{R}ecommendation with \underline{Pred}ictor \underline{A}dd-\underline{O}n} (SR-PredictAO). In this framework, we propose a high-capability predictor module which could alleviate the effect of random user's behavior for prediction. It is worth mentioning that this framework could be applied to any existing models, which could give opportunities for further optimizing the framework. Extensive experiments on two real benchmark datasets for three state-of-the-art models show that \emph{SR-PredictAO} out-performs the current state-of-the-art model by up to 2.9\% in HR@20 and 2.3\% in MRR@20. More importantly, the improvement is consistent across almost all the existing models on all datasets, which could be regarded as a significant contribution in the field. △ Less

Submitted 20 September, 2023; originally announced September 2023.

arXiv:2309.07921 [pdf, other]

OpenIllumination: A Multi-Illumination Dataset for Inverse Rendering Evaluation on Real Objects

Authors: Isabella Liu, Linghao Chen, Ziyang Fu, Liwen Wu, Haian Jin, Zhong Li, Chin Ming Ryan Wong, Yi Xu, Ravi Ramamoorthi, Zexiang Xu, Hao Su

Abstract: We introduce OpenIllumination, a real-world dataset containing over 108K images of 64 objects with diverse materials, captured under 72 camera views and a large number of different illuminations. For each image in the dataset, we provide accurate camera parameters, illumination ground truth, and foreground segmentation masks. Our dataset enables the quantitative evaluation of most inverse renderin… ▽ More We introduce OpenIllumination, a real-world dataset containing over 108K images of 64 objects with diverse materials, captured under 72 camera views and a large number of different illuminations. For each image in the dataset, we provide accurate camera parameters, illumination ground truth, and foreground segmentation masks. Our dataset enables the quantitative evaluation of most inverse rendering and material decomposition methods for real objects. We examine several state-of-the-art inverse rendering methods on our dataset and compare their performances. The dataset and code can be found on the project page: https://oppo-us-research.github.io/OpenIllumination. △ Less

Submitted 1 February, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

arXiv:2309.07650 [pdf, other]

Automatic Data Visualization Generation from Chinese Natural Language Questions

Authors: Yan Ge, Victor Junqiu Wei, Yuanfeng Song, Jason Chen Zhang, Raymond Chi-Wing Wong

Abstract: Data visualization has emerged as an effective tool for getting insights from massive datasets. Due to the hardness of manipulating the programming languages of data visualization, automatic data visualization generation from natural languages (Text-to-Vis) is becoming increasingly popular. Despite the plethora of research effort on the English Text-to-Vis, studies have yet to be conducted on data… ▽ More Data visualization has emerged as an effective tool for getting insights from massive datasets. Due to the hardness of manipulating the programming languages of data visualization, automatic data visualization generation from natural languages (Text-to-Vis) is becoming increasingly popular. Despite the plethora of research effort on the English Text-to-Vis, studies have yet to be conducted on data visualization generation from questions in Chinese. Motivated by this, we propose a Chinese Text-to-Vis dataset in the paper and demonstrate our first attempt to tackle this problem. Our model integrates multilingual BERT as the encoder, boosts the cross-lingual ability, and infuses the $n$-gram information into our word representation learning. Our experimental results show that our dataset is challenging and deserves further research. △ Less

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2308.02180 [pdf, other]

Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology

Authors: Cliff Wong, Sheng Zhang, Yu Gu, Christine Moung, Jacob Abel, Naoto Usuyama, Roshanthi Weerasinghe, Brian Piening, Tristan Naumann, Carlo Bifulco, Hoifung Poon

Abstract: Clinical trial matching is a key process in health delivery and discovery. In practice, it is plagued by overwhelming unstructured data and unscalable manual processing. In this paper, we conduct a systematic study on scaling clinical trial matching using large language models (LLMs), with oncology as the focus area. Our study is grounded in a clinical trial matching system currently in test deplo… ▽ More Clinical trial matching is a key process in health delivery and discovery. In practice, it is plagued by overwhelming unstructured data and unscalable manual processing. In this paper, we conduct a systematic study on scaling clinical trial matching using large language models (LLMs), with oncology as the focus area. Our study is grounded in a clinical trial matching system currently in test deployment at a large U.S. health network. Initial findings are promising: out of box, cutting-edge LLMs, such as GPT-4, can already structure elaborate eligibility criteria of clinical trials and extract complex matching logic (e.g., nested AND/OR/NOT). While still far from perfect, LLMs substantially outperform prior strong baselines and may serve as a preliminary solution to help triage patient-trial candidates with humans in the loop. Our study also reveals a few significant growth areas for applying LLMs to end-to-end clinical trial matching, such as context limitation and accuracy, especially in structuring patient information from longitudinal medical records. △ Less

Submitted 18 August, 2023; v1 submitted 4 August, 2023; originally announced August 2023.

Comments: 24 pages, 5 figures, accepted at Machine Learning for Healthcare (MLHC) 2023

arXiv:2307.16013 [pdf, other]

Marrying Dialogue Systems with Data Visualization: Interactive Data Visualization Generation from Natural Language Conversations

Authors: Yuanfeng Song, Xuefang Zhao, Raymond Chi-Wing Wong

Abstract: Data visualization (DV) has become the prevailing tool in the market due to its effectiveness into illustrating insights in vast amounts of data. To lower the barrier of using DVs, automatic DV tasks, such as natural language question (NLQ) to visualization translation (formally called text-to-vis), have been investigated in the research community. However, text-to-vis assumes the NLQ to be well-o… ▽ More Data visualization (DV) has become the prevailing tool in the market due to its effectiveness into illustrating insights in vast amounts of data. To lower the barrier of using DVs, automatic DV tasks, such as natural language question (NLQ) to visualization translation (formally called text-to-vis), have been investigated in the research community. However, text-to-vis assumes the NLQ to be well-organized and expressed in a single sentence. However, in real-world settings, complex DV is needed through consecutive exchanges between the DV system and the users. In this paper, we propose a new task named CoVis, short for Conversational text-to-Visualization, aiming at constructing DVs through a series of interactions between users and the system. Since it is the task which has not been studied in the literature, we first build a benchmark dataset named Dial-NVBench, including dialogue sessions with a sequence of queries from a user and responses from the system. Then, we propose a multi-modal neural network named MMCoVisNet to answer these DV-related queries. In particular, MMCoVisNet first fully understands the dialogue context and determines the corresponding responses. Then, it uses adaptive decoders to provide the appropriate replies: (i) a straightforward text decoder is used to produce general responses, (ii) an SQL-form decoder is applied to synthesize data querying responses, and (iii) a DV-form decoder tries to construct the appropriate DVs. We comparatively evaluate MMCoVisNet with other baselines over our proposed benchmark dataset. Experimental results validate that MMCoVisNet performs better than existing baselines and achieves a state-of-the-art performance. △ Less

Submitted 29 July, 2023; originally announced July 2023.

arXiv:2307.06439 [pdf, other]

Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events

Authors: Yu Gu, Sheng Zhang, Naoto Usuyama, Yonas Woldesenbet, Cliff Wong, Praneeth Sanapathi, Mu Wei, Naveen Valluri, Erika Strandberg, Tristan Naumann, Hoifung Poon

Abstract: Large language models (LLMs), such as GPT-4, have demonstrated remarkable capabilities across a wide range of tasks, including health applications. In this paper, we study how LLMs can be used to scale biomedical knowledge curation. We find that while LLMs already possess decent competency in structuring biomedical text, by distillation into a task-specific student model through self-supervised le… ▽ More Large language models (LLMs), such as GPT-4, have demonstrated remarkable capabilities across a wide range of tasks, including health applications. In this paper, we study how LLMs can be used to scale biomedical knowledge curation. We find that while LLMs already possess decent competency in structuring biomedical text, by distillation into a task-specific student model through self-supervised learning, substantial gains can be attained over out-of-box LLMs, with additional advantages such as cost, efficiency, and white-box model access. We conduct a case study on adverse drug event (ADE) extraction, which is an important area for improving care. On standard ADE extraction evaluation, a GPT-3.5 distilled PubMedBERT model attained comparable accuracy as supervised state-of-the-art models without using any labeled data. Despite being over 1,000 times smaller, the distilled model outperformed its teacher GPT-3.5 by over 6 absolute points in F1 and GPT-4 by over 5 absolute points. Ablation studies on distillation model choice (e.g., PubMedBERT vs BioGPT) and ADE extraction architecture shed light on best practice for biomedical knowledge extraction. Similar gains were attained by distillation for other standard biomedical knowledge extraction tasks such as gene-disease associations and protected health information, further illustrating the promise of this approach. △ Less

Submitted 12 July, 2023; originally announced July 2023.

arXiv:2307.04336 [pdf]

doi 10.1162/dint_a_00200

Source-Aware Embedding Training on Heterogeneous Information Networks

Authors: Tsai Hor Chan, Chi Ho Wong, Jiajun Shen, Guosheng Yin

Abstract: Heterogeneous information networks (HINs) have been extensively applied to real-world tasks, such as recommendation systems, social networks, and citation networks. While existing HIN representation learning methods can effectively learn the semantic and structural features in the network, little awareness was given to the distribution discrepancy of subgraphs within a single HIN. However, we find… ▽ More Heterogeneous information networks (HINs) have been extensively applied to real-world tasks, such as recommendation systems, social networks, and citation networks. While existing HIN representation learning methods can effectively learn the semantic and structural features in the network, little awareness was given to the distribution discrepancy of subgraphs within a single HIN. However, we find that ignoring such distribution discrepancy among subgraphs from multiple sources would hinder the effectiveness of graph embedding learning algorithms. This motivates us to propose SUMSHINE (Scalable Unsupervised Multi-Source Heterogeneous Information Network Embedding) -- a scalable unsupervised framework to align the embedding distributions among multiple sources of an HIN. Experimental results on real-world datasets in a variety of downstream tasks validate the performance of our method over the state-of-the-art heterogeneous information network embedding algorithms. △ Less

Submitted 10 July, 2023; originally announced July 2023.

Comments: Published in Data Intelligence 2023

arXiv:2306.05436 [pdf, other]

Remaining Useful Life Modelling with an Escalator Health Condition Analytic System

Authors: Inez M. Zwetsloot, Yu Lin, Jiaqi Qiu, Lishuai Li, William Ka Fai Lee, Edmond Yin San Yeung, Colman Yiu Wah Yeung, Chris Chun Long Wong

Abstract: The refurbishment of an escalator is usually linked with its design life as recommended by the manufacturer. However, the actual useful life of an escalator should be determined by its operating condition which is affected by the runtime, workload, maintenance quality, vibration, etc., rather than age only. The objective of this project is to develop a comprehensive health condition analytic syste… ▽ More The refurbishment of an escalator is usually linked with its design life as recommended by the manufacturer. However, the actual useful life of an escalator should be determined by its operating condition which is affected by the runtime, workload, maintenance quality, vibration, etc., rather than age only. The objective of this project is to develop a comprehensive health condition analytic system for escalators to support refurbishment decisions. The analytic system consists of four parts: 1) online data gathering and processing; 2) a dashboard for condition monitoring; 3) a health index model; and 4) remaining useful life prediction. The results can be used for a) predicting the remaining useful life of the escalators, in order to support asset replacement planning and b) monitoring the real-time condition of escalators; including alerts when vibration exceeds the threshold and signal diagnosis, giving an indication of possible root cause (components) of the alert signal. △ Less

Submitted 7 June, 2023; originally announced June 2023.

Comments: 14 pages, 12 figures, 7 tables

arXiv:2306.00890 [pdf, other]

LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day

Authors: Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, Jianfeng Gao

Abstract: Conversational generative AI has demonstrated remarkable promise for empowering biomedical practitioners, but current investigations focus on unimodal text. Multimodal conversational AI has seen rapid progress by leveraging billions of image-text pairs from the public web, but such general-domain vision-language models still lack sophistication in understanding and conversing about biomedical imag… ▽ More Conversational generative AI has demonstrated remarkable promise for empowering biomedical practitioners, but current investigations focus on unimodal text. Multimodal conversational AI has seen rapid progress by leveraging billions of image-text pairs from the public web, but such general-domain vision-language models still lack sophistication in understanding and conversing about biomedical images. In this paper, we propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images. The key idea is to leverage a large-scale, broad-coverage biomedical figure-caption dataset extracted from PubMed Central, use GPT-4 to self-instruct open-ended instruction-following data from the captions, and then fine-tune a large general-domain vision-language model using a novel curriculum learning method. Specifically, the model first learns to align biomedical vocabulary using the figure-caption pairs as is, then learns to master open-ended conversational semantics using GPT-4 generated instruction-following data, broadly mimicking how a layperson gradually acquires biomedical knowledge. This enables us to train a Large Language and Vision Assistant for BioMedicine (LLaVA-Med) in less than 15 hours (with eight A100s). LLaVA-Med exhibits excellent multimodal conversational capability and can follow open-ended instruction to assist with inquiries about a biomedical image. On three standard biomedical visual question answering datasets, LLaVA-Med outperforms previous supervised state-of-the-art on certain metrics. To facilitate biomedical multimodal research, we will release our instruction-following data and the LLaVA-Med model. △ Less

Submitted 1 June, 2023; originally announced June 2023.

Comments: 17 pages; Website: https://aka.ms/llava-med

arXiv:2305.11199 [pdf, other]

At-Admission Prediction of Mortality and Pulmonary Embolism in COVID-19 Patients Using Statistical and Machine Learning Methods: An International Cohort Study

Authors: Munib Mesinovic, Xin Ci Wong, Giri Shan Rajahram, Barbara Wanjiru Citarella, Kalaiarasu M. Peariasamy, Frank van Someren Greve, Piero Olliaro, Laura Merson, Lei Clifton, Christiana Kartsonaki, ISARIC Characterisation Group

Abstract: By September, 2022, more than 600 million cases of SARS-CoV-2 infection have been reported globally, resulting in over 6.5 million deaths. COVID-19 mortality risk estimators are often, however, developed with small unrepresentative samples and with methodological limitations. It is highly important to develop predictive tools for pulmonary embolism (PE) in COVID-19 patients as one of the most seve… ▽ More By September, 2022, more than 600 million cases of SARS-CoV-2 infection have been reported globally, resulting in over 6.5 million deaths. COVID-19 mortality risk estimators are often, however, developed with small unrepresentative samples and with methodological limitations. It is highly important to develop predictive tools for pulmonary embolism (PE) in COVID-19 patients as one of the most severe preventable complications of COVID-19. Using a dataset of more than 800,000 COVID-19 patients from an international cohort, we propose a cost-sensitive gradient-boosted machine learning model that predicts occurrence of PE and death at admission. Logistic regression, Cox proportional hazards models, and Shapley values were used to identify key predictors for PE and death. Our prediction model had a test AUROC of 75.9% and 74.2%, and sensitivities of 67.5% and 72.7% for PE and all-cause mortality respectively on a highly diverse and held-out test set. The PE prediction model was also evaluated on patients in UK and Spain separately with test results of 74.5% AUROC, 63.5% sensitivity and 78.9% AUROC, 95.7% sensitivity. Age, sex, region of admission, comorbidities (chronic cardiac and pulmonary disease, dementia, diabetes, hypertension, cancer, obesity, smoking), and symptoms (any, confusion, chest pain, fatigue, headache, fever, muscle or joint pain, shortness of breath) were the most important clinical predictors at admission. Our machine learning model developed from an international cohort can serve to better regulate hospital risk prioritisation of at-risk patients. △ Less

Submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.06403 [pdf, other]

Sensor Observability Analysis for Maximizing Task-Space Observability of Articulated Robots

Authors: Christopher Yee Wong, Wael Suleiman

Abstract: We propose a novel performance metric for articulated robots with distributed directional sensors called the sensor observability analysis (SOA). These robot-mounted distributed directional sensors (e.g., joint torque sensors) change their individual sensing directions as the joints move. SOA transforms individual sensors axes in joint space to provide the cumulative sensing quality of these senso… ▽ More We propose a novel performance metric for articulated robots with distributed directional sensors called the sensor observability analysis (SOA). These robot-mounted distributed directional sensors (e.g., joint torque sensors) change their individual sensing directions as the joints move. SOA transforms individual sensors axes in joint space to provide the cumulative sensing quality of these sensors to observe each task-space axis, akin to forward kinematics for sensors. For example, certain joint configurations may align joint torque sensors in such a way that they are unable to observe interaction forces in one or more task-space axes. The resultant sensor observability performance metrics can then be used in optimization and in null-space control to avoid sensor observability singular configurations or to maximize sensor observability in particular directions. We use the specific case of force sensing in serial robot manipulators to showcase the analysis. Parallels are drawn between sensor observability and the traditional kinematic manipulability; SOA is shown to be more generalizable in terms of analysing non-joint-mounted sensors and can potentially be applied to sensor types other than for force sensing. Simulations and experiments using a custom 3-DOF robot and the Baxter robot demonstrate the utility and importance of sensor observability in physical interactions. △ Less

Submitted 29 May, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: 15 pages, 11 figures, journal paper. arXiv admin note: substantial text overlap with arXiv:2206.10798

arXiv:2305.00956 [pdf, other]

Non-Binary LDPC Code Design for Energy-Time Entanglement Quantum Key Distribution

Authors: Debarnab Mitra, Lev Tauz, Murat Can Sarihan, Chee Wei Wong, Lara Dolecek

Abstract: In energy-time entanglement Quantum Key Distribution (QKD), two users extract a shared secret key from the arrival times (discretized as symbols) of entangled photon pairs. In prior work, Zhou et al. proposed a multi-level coding (MLC) scheme that splits the observed symbols into bit layers and utilizes binary Low-Density Parity-Check (LDPC) codes for reconciliation of the symbols. While binary LD… ▽ More In energy-time entanglement Quantum Key Distribution (QKD), two users extract a shared secret key from the arrival times (discretized as symbols) of entangled photon pairs. In prior work, Zhou et al. proposed a multi-level coding (MLC) scheme that splits the observed symbols into bit layers and utilizes binary Low-Density Parity-Check (LDPC) codes for reconciliation of the symbols. While binary LDPC codes offer low latency for key generation, splitting the symbols into bits results in a loss of key generation rate due to error propagation. Additionally, existing LDPC codes do not fully utilize the properties of the QKD channel to optimize the key rates. In this paper, we mitigate the above issues by first generalizing the MLC scheme to a non-binary(NB) MLC scheme that has layers with non-binary symbols and utilizes NB-LDPC codes. We show the NB-MLC scheme offers flexibility in system design. Additionally, we show that the NB-MLC scheme with a small symbol size per layer offers the best trade-off between latency and key rate. We then propose a framework to jointly optimize the rate and degree profile of the NB-LDPC codes that is tailored towards the QKD channel resulting in higher key rates than prior work. △ Less

Submitted 1 May, 2023; originally announced May 2023.

Comments: 5 pages, 4 figures, submitted to International Symposium on Topics in Coding

arXiv:2304.14937 [pdf, other]

Contactless hand tremor amplitude measurement using smartphones: development and pilot evaluation

Authors: James Bungay, Osasenaga Emokpae, Samuel D. Relton, Jane Alty, Stefan Williams, Hui Fang, David C. Wong

Abstract: Background: Physiological tremor is defined as an involuntary and rhythmic shaking. Tremor of the hand is a key symptom of multiple neurological diseases, and its frequency and amplitude differs according to both disease type and disease progression. In routine clinical practice, tremor frequency and amplitude are assessed by expert rating using a 0 to 4 integer scale. Such ratings are subjective… ▽ More Background: Physiological tremor is defined as an involuntary and rhythmic shaking. Tremor of the hand is a key symptom of multiple neurological diseases, and its frequency and amplitude differs according to both disease type and disease progression. In routine clinical practice, tremor frequency and amplitude are assessed by expert rating using a 0 to 4 integer scale. Such ratings are subjective and have poor inter-rater reliability. There is thus a clinical need for a practical and accurate method for objectively assessing hand tremor. Objective: to develop a proof of principle method to measure hand tremor amplitude from smartphone videos. Methods: We created a computer vision pipeline that automatically extracts salient points on the hand and produces a 1-D time series of movement due to tremor, in pixels. Using the smartphones' depth measurement, we convert this measure into real distance units. We assessed the accuracy of the method using 60 videos of simulated tremor of different amplitudes from two healthy adults. Videos were taken at distances of 50, 75 and 100 cm between hand and camera. The participants had skin tone II and VI on the Fitzpatrick scale. We compared our method to a gold-standard measurement from a slide rule. Bland-Altman methods agreement analysis indicated a bias of 0.04 cm and 95% limits of agreement from -1.27 to 1.20 cm. Furthermore, we qualitatively observed that the method was robust to differences in skin tone and limited occlusion, such as a band-aid affixed to the participant's hand. Clinical relevance: We have demonstrated how tremor amplitude can be measured from smartphone videos. In conjunction with tremor frequency, this approach could be used to help diagnose and monitor neurological diseases △ Less

Submitted 28 April, 2023; originally announced April 2023.

Comments: Accepted to IEEE EMBC 2023, Sydney (pre-refereed version)

arXiv:2304.05463 [pdf, other]

doi 10.1007/978-3-031-44521-7_2

An Automatic Guidance and Quality Assessment System for Doppler Imaging of Umbilical Artery

Authors: Chun Kit Wong, Manxi Lin, Alberto Raheli, Zahra Bashir, Morten Bo Søndergaard Svendsen, Martin Grønnebæk Tolsgaard, Aasa Feragen, Anders Nymark Christensen

Abstract: Examination of the umbilical artery with Doppler ultrasonography is performed to investigate blood supply to the fetus through the umbilical cord, which is vital for the monitoring of fetal health. Such examination involves several steps that must be performed correctly: identifying suitable sites on the umbilical artery for the measurement, acquiring the blood flow curve in the form of a Doppler… ▽ More Examination of the umbilical artery with Doppler ultrasonography is performed to investigate blood supply to the fetus through the umbilical cord, which is vital for the monitoring of fetal health. Such examination involves several steps that must be performed correctly: identifying suitable sites on the umbilical artery for the measurement, acquiring the blood flow curve in the form of a Doppler spectrum, and ensuring compliance to a set of quality standards. These steps rely heavily on the operator's skill, and the shortage of experienced sonographers has thus created a demand for machine assistance. In this work, we propose an automatic system to fill the gap. By using a modified Faster R-CNN network, we obtain an algorithm that can suggest locations suitable for Doppler measurement. Meanwhile, we have also developed a method for assessment of the Doppler spectrum's quality. The proposed system is validated on 657 images from a national ultrasound screening database, with results demonstrating its potential as a guidance system. △ Less

Submitted 6 July, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

Comments: Fetal Ultrasound, Umbilical Artery, Doppler Ultrasound

Journal ref: ASMUS 2023. Simplifying Medical Ultrasound pp 13-22. Lecture Notes in Computer Science, vol 14337

arXiv:2304.05106 [pdf, other]

Another Vertical View: A Hierarchical Network for Heterogeneous Trajectory Prediction via Spectrums

Authors: Conghao Wong, Beihao Xia, Qinmu Peng, Xinge You

Abstract: With the fast development of AI-related techniques, the applications of trajectory prediction are no longer limited to easier scenes and trajectories. More and more heterogeneous trajectories with different representation forms, such as 2D or 3D coordinates, 2D or 3D bounding boxes, and even high-dimensional human skeletons, need to be analyzed and forecasted. Among these heterogeneous trajectorie… ▽ More With the fast development of AI-related techniques, the applications of trajectory prediction are no longer limited to easier scenes and trajectories. More and more heterogeneous trajectories with different representation forms, such as 2D or 3D coordinates, 2D or 3D bounding boxes, and even high-dimensional human skeletons, need to be analyzed and forecasted. Among these heterogeneous trajectories, interactions between different elements within a frame of trajectory, which we call the ``Dimension-Wise Interactions'', would be more complex and challenging. However, most previous approaches focus mainly on a specific form of trajectories, which means these methods could not be used to forecast heterogeneous trajectories, not to mention the dimension-wise interaction. Besides, previous methods mostly treat trajectory prediction as a normal time sequence generation task, indicating that these methods may require more work to directly analyze agents' behaviors and social interactions at different temporal scales. In this paper, we bring a new ``view'' for trajectory prediction to model and forecast trajectories hierarchically according to different frequency portions from the spectral domain to learn to forecast trajectories by considering their frequency responses. Moreover, we try to expand the current trajectory prediction task by introducing the dimension $M$ from ``another view'', thus extending its application scenarios to heterogeneous trajectories vertically. Finally, we adopt the bilinear structure to fuse two factors, including the frequency response and the dimension-wise interaction, to forecast heterogeneous trajectories via spectrums hierarchically in a generic way. Experiments show that the proposed model outperforms most state-of-the-art methods on ETH-UCY, Stanford Drone Dataset and nuScenes with heterogeneous trajectories, including 2D coordinates, 2D and 3D bounding boxes. △ Less

Submitted 11 April, 2023; originally announced April 2023.

arXiv:2303.11899 [pdf, other]

Large-Scale Traffic Signal Control Using Constrained Network Partition and Adaptive Deep Reinforcement Learning

Authors: Hankang Gu, Shangbo Wang, Xiaoguang Ma, Dongyao Jia, Guoqiang Mao, Eng Gee Lim, Cheuk Pong Ryan Wong

Abstract: Multi-agent Deep Reinforcement Learning (MADRL) based traffic signal control becomes a popular research topic in recent years. To alleviate the scalability issue of completely centralized RL techniques and the non-stationarity issue of completely decentralized RL techniques on large-scale traffic networks, some literature utilizes a regional control approach where the whole network is firstly part… ▽ More Multi-agent Deep Reinforcement Learning (MADRL) based traffic signal control becomes a popular research topic in recent years. To alleviate the scalability issue of completely centralized RL techniques and the non-stationarity issue of completely decentralized RL techniques on large-scale traffic networks, some literature utilizes a regional control approach where the whole network is firstly partitioned into multiple disjoint regions, followed by applying the centralized RL approach to each region. However, the existing partitioning rules either have no constraints on the topology of regions or require the same topology for all regions. Meanwhile, no existing regional control approach explores the performance of optimal joint action in an exponentially growing regional action space when intersections are controlled by 4-phase traffic signals (EW, EWL, NS, NSL). In this paper, we propose a novel RL training framework named RegionLight to tackle the above limitations. Specifically, the topology of regions is firstly constrained to a star network which comprises one center and an arbitrary number of leaves. Next, the network partitioning problem is modeled as an optimization problem to minimize the number of regions. Then, an Adaptive Branching Dueling Q-Network (ABDQ) model is proposed to decompose the regional control task into several joint signal control sub-tasks corresponding to particular intersections. Subsequently, these sub-tasks maximize the regional benefits cooperatively. Finally, the global control strategy for the whole network is obtained by concatenating the optimal joint actions of all regions. Experimental results demonstrate the superiority of our proposed framework over all baselines under both real and synthetic datasets in all evaluation metrics. △ Less

Submitted 7 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

arXiv:2303.00915 [pdf, other]

BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs

Authors: Sheng Zhang, Yanbo Xu, Naoto Usuyama, Hanwen Xu, Jaspreet Bagga, Robert Tinn, Sam Preston, Rajesh Rao, Mu Wei, Naveen Valluri, Cliff Wong, Andrea Tupini, Yu Wang, Matt Mazzola, Swadheen Shukla, Lars Liden, Jianfeng Gao, Matthew P. Lungren, Tristan Naumann, Sheng Wang, Hoifung Poon

Abstract: Biomedical data is inherently multimodal, comprising physical measurements and natural language narratives. A generalist biomedical AI model needs to simultaneously process different modalities of data, including text and images. Therefore, training an effective generalist biomedical model requires high-quality multimodal data, such as parallel image-text pairs. Here, we present PMC-15M, a novel d… ▽ More Biomedical data is inherently multimodal, comprising physical measurements and natural language narratives. A generalist biomedical AI model needs to simultaneously process different modalities of data, including text and images. Therefore, training an effective generalist biomedical model requires high-quality multimodal data, such as parallel image-text pairs. Here, we present PMC-15M, a novel dataset that is two orders of magnitude larger than existing biomedical multimodal datasets such as MIMIC-CXR, and spans a diverse range of biomedical image types. PMC-15M contains 15 million biomedical image-text pairs collected from 4.4 million scientific articles. Based on PMC-15M, we have pretrained BiomedCLIP, a multimodal foundation model, with domain-specific adaptations tailored to biomedical vision-language processing. We conducted extensive experiments and ablation studies on standard biomedical imaging tasks from retrieval to classification to visual question-answering (VQA). BiomedCLIP achieved new state-of-the-art results in a wide range of standard datasets, substantially outperforming prior approaches. Intriguingly, by large-scale pretraining on diverse biomedical image types, BiomedCLIP even outperforms state-of-the-art radiology-specific models such as BioViL in radiology-specific tasks such as RSNA pneumonia detection. In summary, BiomedCLIP is a fully open-access foundation model that achieves state-of-the-art performance on various biomedical tasks, paving the way for transformative multimodal biomedical discovery and applications. We release our models at https://aka.ms/biomedclip to facilitate future research in multimodal biomedical AI. △ Less

Submitted 16 January, 2024; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: The models are released at https://aka.ms/biomedclip

arXiv:2302.01753 [pdf]

A Process Model to Improve Information Security Governance in Organisations

Authors: Chee Kong Wong

Abstract: Information security governance (ISG) is a relatively new and under-researched topic. A review of literature shows the lack of an ISG framework or model that can help the implementation of ISG. This research aims to introduce an empirically grounded ISG process model as a practical reference to facilitate the implementation of ISG in organisations. This research has adopted an exploratory resear… ▽ More Information security governance (ISG) is a relatively new and under-researched topic. A review of literature shows the lack of an ISG framework or model that can help the implementation of ISG. This research aims to introduce an empirically grounded ISG process model as a practical reference to facilitate the implementation of ISG in organisations. This research has adopted an exploratory research approach where a conceptual ISG process model was proposed based on synthesis of extant literature and detailed review of relevant frameworks and models. The conceptual ISG process model was subsequently refined based on empirical data gathered from 3 case study organisations. The refined ISG process model was finally validated in 6 expert interviews. This research has developed an empirically grounded ISG process model identifying stakeholder groups and explaining how core ISG processes and sub-processes interact. Specifically, the research contributes by: (1) developing ISG process theory, as ISG is a series of events occurring within an organisational context; and (2) developing an information-processing perspective on ISG, as the process model identifies the information and communication flows, and the relationships among stakeholder groups. In addition, the research has: (3) empirically examined and validated the ISG process model based on how ISG is practised in real-world organisations; (4) examined corporate governance theories to provide additional perspectives to ensure that the ISG process model is aligned with corporate governance objectives; (5) identified additional factors that influence the implementation of ISG requiring further research; and finally (6) expanded existing seminal research by introducing an empirically grounded ISG process model that has been developed based on synthesis of cumulative knowledge from previous research and validated with empirical data. △ Less

Submitted 26 January, 2023; originally announced February 2023.

Comments: 313 pages, PhD Thesis

arXiv:2302.01518 [pdf, other]

doi 10.1109/IJCNN54540.2023.10191236

LSA-PINN: Linear Boundary Connectivity Loss for Solving PDEs on Complex Geometry

Authors: Jian Cheng Wong, Pao-Hsiung Chiu, Chinchun Ooi, My Ha Dao, Yew-Soon Ong

Abstract: We present a novel loss formulation for efficient learning of complex dynamics from governing physics, typically described by partial differential equations (PDEs), using physics-informed neural networks (PINNs). In our experiments, existing versions of PINNs are seen to learn poorly in many problems, especially for complex geometries, as it becomes increasingly difficult to establish appropriate… ▽ More We present a novel loss formulation for efficient learning of complex dynamics from governing physics, typically described by partial differential equations (PDEs), using physics-informed neural networks (PINNs). In our experiments, existing versions of PINNs are seen to learn poorly in many problems, especially for complex geometries, as it becomes increasingly difficult to establish appropriate sampling strategy at the near boundary region. Overly dense sampling can adversely impede training convergence if the local gradient behaviors are too complex to be adequately modelled by PINNs. On the other hand, if the samples are too sparse, existing PINNs tend to overfit the near boundary region, leading to incorrect solution. To prevent such issues, we propose a new Boundary Connectivity (BCXN) loss function which provides linear local structure approximation (LSA) to the gradient behaviors at the boundary for PINN. Our BCXN-loss implicitly imposes local structure during training, thus facilitating fast physics-informed learning across entire problem domains with order of magnitude sparser training samples. This LSA-PINN method shows a few orders of magnitude smaller errors than existing methods in terms of the standard L2-norm metric, while using dramatically fewer training samples and iterations. Our proposed LSA-PINN does not pose any requirement on the differentiable property of the networks, and we demonstrate its benefits and ease of implementation on both multi-layer perceptron and convolutional neural network versions as commonly used in current PINN literature. △ Less

Submitted 2 March, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

Comments: 11 pages, 7 figures

Journal ref: 2023 International Joint Conference on Neural Networks (IJCNN)

Showing 1–50 of 194 results for author: Wong, C