-
Mobile Robot Oriented Large-Scale Indoor Dataset for Dynamic Scene Understanding
Authors:
Yifan Tang,
Cong Tai,
Fangxing Chen,
Wanting Zhang,
Tao Zhang,
Xueping Liu,
Yongjin Liu,
Long Zeng
Abstract:
Most existing robotic datasets capture static scene data and thus are limited in evaluating robots' dynamic performance. To address this, we present a mobile robot oriented large-scale indoor dataset, denoted as THUD (Tsinghua University Dynamic) robotic dataset, for training and evaluating their dynamic scene understanding algorithms. Specifically, the THUD dataset construction is first detailed, including organization, acquisition, and annotation methods. It comprises both real-world and synthetic data, collected with a real robot platform and a physical simulation platform, respectively. Our current dataset includes 13 large-scale dynamic scenarios, 90K image frames, 20M 2D/3D bounding boxes of static and dynamic objects, camera poses, and IMU. The dataset is still continuously expanding. Then, the performance of mainstream indoor scene understanding tasks, e.g. 3D object detection, semantic segmentation, and robot relocalization, is evaluated on our THUD dataset. These experiments reveal serious challenges for some robot scene understanding tasks in dynamic scenes. By sharing this dataset, we aim to foster and quickly iterate new mobile robot algorithms for robots' actual dynamic working environments, i.e. complex crowded dynamic scenes.
Submitted 30 June, 2024; v1 submitted 28 June, 2024;
originally announced June 2024.
-
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Authors:
Holy Lovenia,
Rahmad Mahendra,
Salsabil Maulana Akbar,
Lester James V. Miranda,
Jennifer Santoso,
Elyanah Aco,
Akhdan Fadhilah,
Jonibek Mansurov,
Joseph Marvin Imperial,
Onno P. Kampman,
Joel Ruben Antony Moniz,
Muhammad Ravi Shulthan Habibi,
Frederikus Hudi,
Railey Montalan,
Ryan Ignatius,
Joanito Agili Lopo,
William Nixon,
Börje F. Karlsson,
James Jaya,
Ryandito Diandaru,
Yuze Gao,
Patrick Amadeus,
Bin Wang,
Jan Christian Blaise Cruz,
Chenxi Whitehouse
, et al. (36 additional authors not shown)
Abstract:
Southeast Asia (SEA) is a region rich in linguistic diversity and cultural variety, with over 1,300 indigenous languages and a population of 671 million people. However, prevailing AI models suffer from a significant lack of representation of texts, images, and audio datasets from SEA, compromising the quality of AI models for SEA languages. Evaluating models for SEA languages is challenging due to the scarcity of high-quality datasets, compounded by the dominance of English training data, raising concerns about potential cultural misrepresentation. To address these challenges, we introduce SEACrowd, a collaborative initiative that consolidates a comprehensive resource hub that fills the resource gap by providing standardized corpora in nearly 1,000 SEA languages across three modalities. Through our SEACrowd benchmarks, we assess the quality of AI models on 36 indigenous languages across 13 tasks, offering valuable insights into the current AI landscape in SEA. Furthermore, we propose strategies to facilitate greater AI advancements, maximizing potential utility and resource equity for the future of AI in SEA.
Submitted 5 July, 2024; v1 submitted 14 June, 2024;
originally announced June 2024.
-
Optimizing Synthetic Correlated Diffusion Imaging for Breast Cancer Tumour Delineation
Authors:
Chi-en Amy Tai,
Alexander Wong
Abstract:
Breast cancer is a significant cause of death from cancer in women globally, highlighting the need for improved diagnostic imaging to enhance patient outcomes. Accurate tumour identification is essential for diagnosis, treatment, and monitoring, emphasizing the importance of advanced imaging technologies that provide detailed views of tumour characteristics and disease. Synthetic correlated diffusion imaging (CDI$^s$) is a recent method that has shown promise for prostate cancer delineation compared to current MRI images. In this paper, we explore tuning the coefficients in the computation of CDI$^s$ for breast cancer tumour delineation by maximizing the area under the receiver operating characteristic curve (AUC) using a Nelder-Mead simplex optimization strategy. We show that the best AUC is achieved by the CDI$^s$ - Optimized modality, outperforming the best gold-standard modality by 0.0044. Notably, the optimized CDI$^s$ modality also achieves AUC values over 0.02 higher than the Unoptimized CDI$^s$ value, demonstrating the importance of optimizing the CDI$^s$ exponents for the specific cancer application.
Submitted 13 May, 2024;
originally announced May 2024.
-
Enhancing Clinically Significant Prostate Cancer Prediction in T2-weighted Images through Transfer Learning from Breast Cancer
Authors:
Chi-en Amy Tai,
Alexander Wong
Abstract:
In 2020, prostate cancer saw a staggering 1.4 million new cases, resulting in over 375,000 deaths. The accurate identification of clinically significant prostate cancer is crucial for delivering effective treatment to patients. Consequently, there has been a surge in research exploring the application of deep neural networks to predict clinical significance based on magnetic resonance images. However, these networks demand extensive datasets to attain optimal performance. Recently, transfer learning emerged as a technique that leverages acquired features from a domain with richer data to enhance the performance of a domain with limited data. In this paper, we investigate the improvement of clinically significant prostate cancer prediction in T2-weighted images through transfer learning from breast cancer. The results demonstrate a remarkable improvement of over 30% in leave-one-out cross-validation accuracy.
Submitted 13 May, 2024;
originally announced May 2024.
-
Improving Breast Cancer Grade Prediction with Multiparametric MRI Created Using Optimized Synthetic Correlated Diffusion Imaging
Authors:
Chi-en Amy Tai,
Alexander Wong
Abstract:
Breast cancer was diagnosed for over 7.8 million women between 2015 and 2020. Grading plays a vital role in breast cancer treatment planning. However, the current tumor grading method involves extracting tissue from patients, leading to stress, discomfort, and high medical costs. A recent paper leveraging volumetric deep radiomic features from synthetic correlated diffusion imaging (CDI$^s$) for breast cancer grade prediction showed immense promise for noninvasive methods for grading. Motivated by the impact of CDI$^s$ optimization for prostate cancer delineation, this paper examines using optimized CDI$^s$ to improve breast cancer grade prediction. We fuse the optimized CDI$^s$ signal with diffusion-weighted imaging (DWI) to create a multiparametric MRI for each patient. Using a larger patient cohort and training across all the layers of a pretrained MONAI model, we achieve a leave-one-out cross-validation accuracy of 95.79%, over 8% higher than that previously reported.
Submitted 13 May, 2024;
originally announced May 2024.
-
Using Multiparametric MRI with Optimized Synthetic Correlated Diffusion Imaging to Enhance Breast Cancer Pathologic Complete Response Prediction
Authors:
Chi-en Amy Tai,
Alexander Wong
Abstract:
In 2020, 685,000 deaths across the world were attributed to breast cancer, underscoring the critical need for innovative and effective breast cancer treatment. Neoadjuvant chemotherapy has recently gained popularity as a promising treatment strategy for breast cancer, attributed to its efficacy in shrinking large tumors and leading to pathologic complete response. However, the current process to recommend neoadjuvant chemotherapy relies on the subjective evaluation of medical experts, which contains inherent biases and significant uncertainty. A recent study, utilizing volumetric deep radiomic features extracted from synthetic correlated diffusion imaging (CDI$^s$), demonstrated significant potential in noninvasive breast cancer pathologic complete response prediction. Inspired by the positive outcomes of optimizing CDI$^s$ for prostate cancer delineation, this research investigates the application of optimized CDI$^s$ to enhance breast cancer pathologic complete response prediction. Using multiparametric MRI that fuses optimized CDI$^s$ with diffusion-weighted imaging (DWI), we obtain a leave-one-out cross-validation accuracy of 93.28%, over 5.5% higher than that previously reported.
Submitted 13 May, 2024;
originally announced May 2024.
-
NutritionVerse-Direct: Exploring Deep Neural Networks for Multitask Nutrition Prediction from Food Images
Authors:
Matthew Keller,
Chi-en Amy Tai,
Yuhao Chen,
Pengcheng Xi,
Alexander Wong
Abstract:
Many aging individuals encounter challenges in effectively tracking their dietary intake, exacerbating their susceptibility to nutrition-related health complications. Self-reporting methods are often inaccurate and suffer from substantial bias; however, leveraging intelligent prediction methods can automate and enhance precision in this process. Recent work has explored using computer vision prediction systems to predict nutritional information from food images. Still, these methods are often tailored to specific situations, require other inputs in addition to a food image, or do not provide comprehensive nutritional information.
This paper aims to enhance the efficacy of dietary intake estimation by leveraging various neural network architectures to directly predict a meal's nutritional content from its image. Through comprehensive experimentation and evaluation, we present NutritionVerse-Direct, a model utilizing a vision transformer base architecture with three fully connected layers that lead to five regression heads predicting calories (kcal), mass (g), protein (g), fat (g), and carbohydrates (g) present in a meal. NutritionVerse-Direct yields a combined mean average error score on the NutritionVerse-Real dataset of 412.6, an improvement of 25.5% over the Inception-ResNet model, demonstrating its potential for improving dietary intake estimation accuracy.
Submitted 13 May, 2024;
originally announced May 2024.
-
Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering
Authors:
Hongda Sun,
Yuxuan Liu,
Chengwei Wu,
Haiyu Yan,
Cheng Tai,
Xin Gao,
Shuo Shang,
Rui Yan
Abstract:
Open-domain question answering (ODQA) has emerged as a pivotal research spotlight in information systems. Existing methods follow two main paradigms to collect evidence: (1) The \textit{retrieve-then-read} paradigm retrieves pertinent documents from an external corpus; and (2) the \textit{generate-then-read} paradigm employs large language models (LLMs) to generate relevant documents. However, neither can fully address multifaceted requirements for evidence. To this end, we propose LLMQA, a generalized framework that formulates the ODQA process into three basic steps: query expansion, document selection, and answer generation, combining the superiority of both retrieval-based and generation-based evidence. Since LLMs exhibit their excellent capabilities to accomplish various tasks, we instruct LLMs to play multiple roles as generators, rerankers, and evaluators within our framework, integrating them to collaborate in the ODQA process. Furthermore, we introduce a novel prompt optimization algorithm to refine role-playing prompts and steer LLMs to produce higher-quality evidence and answers. Extensive experimental results on widely used benchmarks (NQ, WebQ, and TriviaQA) demonstrate that LLMQA achieves the best performance in terms of both answer accuracy and evidence quality, showcasing its potential for advancing ODQA research and applications.
Submitted 8 March, 2024;
originally announced March 2024.
-
NutritionVerse-Real: An Open Access Manually Collected 2D Food Scene Dataset for Dietary Intake Estimation
Authors:
Chi-en Amy Tai,
Saeejith Nair,
Olivia Markham,
Matthew Keller,
Yifan Wu,
Yuhao Chen,
Alexander Wong
Abstract:
Dietary intake estimation plays a crucial role in understanding the nutritional habits of individuals and populations, aiding in the prevention and management of diet-related health issues. Accurate estimation requires comprehensive datasets of food scenes, including images, segmentation masks, and accompanying dietary intake metadata. In this paper, we introduce NutritionVerse-Real, an open access manually collected 2D food scene dataset for dietary intake estimation with 889 images of 251 distinct dishes and 45 unique food types. The NutritionVerse-Real dataset was created by manually collecting images of food scenes in real life, measuring the weight of every ingredient and computing the associated dietary content of each dish using the ingredient weights and nutritional information from the food packaging or the Canada Nutrient File. Segmentation masks were then generated through human labelling of the images. We provide further analysis on the data diversity to highlight potential biases when using this data to develop models for dietary intake estimation. NutritionVerse-Real is publicly available at https://www.kaggle.com/datasets/nutritionverse/nutritionverse-real as part of an open initiative to accelerate machine learning for dietary sensing.
Submitted 20 November, 2023;
originally announced January 2024.
-
NutritionVerse-Synth: An Open Access Synthetically Generated 2D Food Scene Dataset for Dietary Intake Estimation
Authors:
Saeejith Nair,
Chi-en Amy Tai,
Yuhao Chen,
Alexander Wong
Abstract:
Manually tracking nutritional intake via food diaries is error-prone and burdensome. Automated computer vision techniques show promise for dietary monitoring but require large and diverse food image datasets. To address this need, we introduce NutritionVerse-Synth (NV-Synth), a large-scale synthetic food image dataset. NV-Synth contains 84,984 photorealistic meal images rendered from 7,082 dynamically plated 3D scenes. Each scene is captured from 12 viewpoints and includes perfect ground truth annotations such as RGB, depth, semantic, instance, and amodal segmentation masks, bounding boxes, and detailed nutritional information per food item. We demonstrate the diversity of NV-Synth across foods, compositions, viewpoints, and lighting. As the largest open-source synthetic food dataset, NV-Synth highlights the value of physics-based simulations for enabling scalable and controllable generation of diverse photorealistic meal images to overcome data limitations and drive advancements in automated dietary assessment using computer vision. In addition to the dataset, the source code for our data generation framework is also made publicly available at https://saeejithnair.github.io/nvsynth.
Submitted 11 December, 2023;
originally announced December 2023.
-
FoodFusion: A Latent Diffusion Model for Realistic Food Image Generation
Authors:
Olivia Markham,
Yuhao Chen,
Chi-en Amy Tai,
Alexander Wong
Abstract:
Current state-of-the-art image generation models such as Latent Diffusion Models (LDMs) have demonstrated the capacity to produce visually striking food-related images. However, these generated images often exhibit an artistic or surreal quality that diverges from the authenticity of real-world food representations. This inadequacy renders them impractical for applications requiring realistic food imagery, such as training models for image-based dietary assessment. To address these limitations, we introduce FoodFusion, a Latent Diffusion model engineered specifically for the faithful synthesis of realistic food images from textual descriptions. The development of the FoodFusion model involves harnessing an extensive array of open-source food datasets, resulting in over 300,000 curated image-caption pairs. Additionally, we propose and employ two distinct data cleaning methodologies to ensure that the resulting image-text pairs maintain both realism and accuracy. The FoodFusion model, thus trained, demonstrates a remarkable ability to generate food images that exhibit a significant improvement in terms of both realism and diversity over the publicly available image generation models. We openly share the dataset and fine-tuned models to support advancements in this critical field of food image synthesis at https://bit.ly/genai4good.
Submitted 6 December, 2023;
originally announced December 2023.
-
Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection
Authors:
Cheng-Ju Ho,
Chen-Hsuan Tai,
Yen-Yu Lin,
Ming-Hsuan Yang,
Yi-Hsuan Tsai
Abstract:
Semi-supervised object detection is crucial for 3D scene understanding, efficiently addressing the limitation of acquiring large-scale 3D bounding box annotations. Existing methods typically employ a teacher-student framework with pseudo-labeling to leverage unlabeled point clouds. However, producing reliable pseudo-labels in a diverse 3D space remains challenging. In this work, we propose Diffusion-SS3D, a new perspective of enhancing the quality of pseudo-labels via the diffusion model for semi-supervised 3D object detection. Specifically, we add noise to produce corrupted 3D object size and class label distributions, and then utilize the diffusion model as a denoising process to obtain bounding box outputs. Moreover, we integrate the diffusion model into the teacher-student framework, so that the denoised bounding boxes can be used to improve pseudo-label generation, as well as the entire semi-supervised learning process. We conduct experiments on the ScanNet and SUN RGB-D benchmark datasets to demonstrate that our approach achieves state-of-the-art performance against existing methods. We also present extensive analysis to understand how our diffusion model design affects performance in semi-supervised learning.
Submitted 5 December, 2023;
originally announced December 2023.
-
Cancer-Net PCa-Gen: Synthesis of Realistic Prostate Diffusion Weighted Imaging Data via Anatomic-Conditional Controlled Latent Diffusion
Authors:
Aditya Sridhar,
Chi-en Amy Tai,
Hayden Gunraj,
Yuhao Chen,
Alexander Wong
Abstract:
In Canada, prostate cancer is the most common form of cancer in men and accounted for 20% of new cancer cases for this demographic in 2022. Due to recent successes in leveraging machine learning for clinical decision support, there has been significant interest in the development of deep neural networks for prostate cancer diagnosis, prognosis, and treatment planning using diffusion weighted imaging (DWI) data. A major challenge hindering widespread adoption in clinical use is poor generalization of such networks due to scarcity of large-scale, diverse, balanced prostate imaging datasets for training such networks. In this study, we explore the efficacy of latent diffusion for generating realistic prostate DWI data through the introduction of an anatomic-conditional controlled latent diffusion strategy. To the best of the authors' knowledge, this is the first study to leverage conditioning for synthesis of prostate cancer imaging. Experimental results show that the proposed strategy, which we call Cancer-Net PCa-Gen, enhances synthesis of diverse prostate images through controllable tumour locations and better anatomical and textural fidelity. These crucial features make it well-suited for augmenting real patient data, enabling neural networks to be trained on a more diverse and comprehensive data distribution. The Cancer-Net PCa-Gen framework and sample images have been made publicly available at https://www.kaggle.com/datasets/deetsadi/cancer-net-pca-gen-dataset as a part of a global open-source initiative dedicated to accelerating advancement in machine learning to aid clinicians in the fight against cancer.
Submitted 30 November, 2023;
originally announced November 2023.
-
COVIDx CXR-4: An Expanded Multi-Institutional Open-Source Benchmark Dataset for Chest X-ray Image-Based Computer-Aided COVID-19 Diagnostics
Authors:
Yifan Wu,
Hayden Gunraj,
Chi-en Amy Tai,
Alexander Wong
Abstract:
The global ramifications of the COVID-19 pandemic remain significant, exerting persistent pressure on nations even three years after its initial outbreak. Deep learning models have shown promise in improving COVID-19 diagnostics but require diverse and larger-scale datasets to improve performance. In this paper, we introduce COVIDx CXR-4, an expanded multi-institutional open-source benchmark dataset for chest X-ray image-based computer-aided COVID-19 diagnostics. COVIDx CXR-4 expands significantly on the previous COVIDx CXR-3 dataset by increasing the total patient cohort size by greater than 2.66 times, resulting in 84,818 images from 45,342 patients across multiple institutions. We provide extensive analysis on the diversity of the patient demographic, imaging metadata, and disease distributions to highlight potential dataset biases. To the best of the authors' knowledge, COVIDx CXR-4 is the largest and most diverse open-source COVID-19 CXR dataset and is made publicly available as part of an open initiative to advance research to aid clinicians against the COVID-19 disease.
Submitted 29 November, 2023;
originally announced November 2023.
-
Double-Condensing Attention Condenser: Leveraging Attention in Deep Learning to Detect Skin Cancer from Skin Lesion Images
Authors:
Chi-en Amy Tai,
Elizabeth Janes,
Chris Czarnecki,
Alexander Wong
Abstract:
Skin cancer is the most common type of cancer in the United States and is estimated to affect one in five Americans. Recent advances have demonstrated strong performance on skin cancer detection, as exemplified by state-of-the-art performance in the SIIM-ISIC Melanoma Classification Challenge; however, these solutions leverage ensembles of complex deep neural architectures requiring immense storage and compute costs, and therefore may not be tractable. A recent movement for TinyML applications is integrating Double-Condensing Attention Condensers (DC-AC) into a self-attention neural network backbone architecture to allow for faster and more efficient computation. This paper explores leveraging an efficient self-attention structure to detect skin cancer in skin lesion images and introduces a deep neural network design with DC-AC customized for skin cancer detection from skin lesion images. The final model is publicly available as a part of a global open-source initiative dedicated to accelerating advancement in machine learning to aid clinicians in the fight against cancer.
Submitted 20 November, 2023;
originally announced November 2023.
-
Cancer-Net PCa-Data: An Open-Source Benchmark Dataset for Prostate Cancer Clinical Decision Support using Synthetic Correlated Diffusion Imaging Data
Authors:
Hayden Gunraj,
Chi-en Amy Tai,
Alexander Wong
Abstract:
The recent introduction of synthetic correlated diffusion (CDI$^s$) imaging has demonstrated significant potential in the realm of clinical decision support for prostate cancer (PCa). CDI$^s$ is a new form of magnetic resonance imaging (MRI) designed to characterize tissue characteristics through the joint correlation of diffusion signal attenuation across different Brownian motion sensitivities. Despite the performance improvement, the CDI$^s$ data for PCa has not been previously made publicly available. In our commitment to advance research efforts for PCa, we introduce Cancer-Net PCa-Data, an open-source benchmark dataset of volumetric CDI$^s$ imaging data of PCa patients. Cancer-Net PCa-Data consists of CDI$^s$ volumetric images from a patient cohort of 200 patient cases, along with full annotations (gland masks, tumor masks, and PCa diagnosis for each tumor). We also analyze the demographic and label region diversity of Cancer-Net PCa-Data for potential biases. Cancer-Net PCa-Data is the first-ever public dataset of CDI$^s$ imaging data for PCa, and is a part of the global open-source initiative dedicated to advancement in machine learning and imaging research to aid clinicians in the global fight against cancer.
Submitted 20 November, 2023;
originally announced November 2023.
-
GRID: Scene-Graph-based Instruction-driven Robotic Task Planning
Authors:
Zhe Ni,
Xiaoxin Deng,
Cong Tai,
Xinyue Zhu,
Qinghongbing Xie,
Weihang Huang,
Xiang Wu,
Long Zeng
Abstract:
Recent works have shown that Large Language Models (LLMs) can facilitate the grounding of instructions for robotic task planning. Despite this progress, most existing works have primarily focused on utilizing raw images to aid LLMs in understanding environmental information. However, this approach not only limits the scope of observation but also typically necessitates extensive multimodal data collection and large-scale models. In this paper, we propose a novel approach called Graph-based Robotic Instruction Decomposer (GRID), which leverages scene graphs instead of images to perceive global scene information and iteratively plan subtasks for a given instruction. Our method encodes object attributes and relationships in graphs through an LLM and Graph Attention Networks, integrating instruction features to predict subtasks consisting of pre-defined robot actions and target objects in the scene graph. This strategy enables robots to acquire semantic knowledge widely observed in the environment from the scene graph. To train and evaluate GRID, we establish a dataset construction pipeline to generate synthetic datasets for graph-based robotic task planning. Experiments have shown that our method outperforms GPT-4 by over 25.4% in subtask accuracy and 43.6% in task accuracy. Moreover, our method achieves a real-time speed of 0.11s per inference. Experiments conducted on datasets of unseen scenes and scenes with varying numbers of objects demonstrate that the task accuracy of GRID declined by at most 3.8%, showcasing its robust cross-scene generalization ability. We validate our method in both physical simulation and the real world. More details can be found on the project page https://jackyzengl.github.io/GRID.github.io/.
Submitted 10 March, 2024; v1 submitted 14 September, 2023;
originally announced September 2023.
-
NutritionVerse: Empirical Study of Various Dietary Intake Estimation Approaches
Authors:
Chi-en Amy Tai,
Matthew Keller,
Saeejith Nair,
Yuhao Chen,
Yifan Wu,
Olivia Markham,
Krish Parmar,
Pengcheng Xi,
Heather Keller,
Sharon Kirkpatrick,
Alexander Wong
Abstract:
Accurate dietary intake estimation is critical for informing policies and programs to support healthy eating, as malnutrition has been directly linked to decreased quality of life. However, self-reporting methods such as food diaries suffer from substantial bias. Other conventional dietary assessment techniques and emerging alternative approaches such as mobile applications incur high time costs and may necessitate trained personnel. Recent work has focused on using computer vision and machine learning to automatically estimate dietary intake from food images, but the lack of comprehensive datasets with diverse viewpoints, modalities and food annotations hinders the accuracy and realism of such methods. To address this limitation, we introduce NutritionVerse-Synth, the first large-scale dataset of 84,984 photorealistic synthetic 2D food images with associated dietary information and multimodal annotations (including depth images, instance masks, and semantic masks). Additionally, we collect a real image dataset, NutritionVerse-Real, containing 889 images of 251 dishes to evaluate realism. Leveraging these novel datasets, we develop and benchmark NutritionVerse, an empirical study of various dietary intake estimation approaches, including indirect segmentation-based and direct prediction networks. We further fine-tune models pretrained on synthetic data with real images to provide insights into the fusion of synthetic and real data. Finally, we release both datasets (NutritionVerse-Synth, NutritionVerse-Real) on https://www.kaggle.com/nutritionverse/datasets as part of an open initiative to accelerate machine learning for dietary sensing.
Submitted 14 September, 2023;
originally announced September 2023.
-
Roll Up Your Sleeves: Working with a Collaborative and Engaging Task-Oriented Dialogue System
Authors:
Lingbo Mo,
Shijie Chen,
Ziru Chen,
Xiang Deng,
Ashley Lewis,
Sunit Singh,
Samuel Stevens,
Chang-You Tai,
Zhen Wang,
Xiang Yue,
Tianshu Zhang,
Yu Su,
Huan Sun
Abstract:
We introduce TacoBot, a user-centered task-oriented digital assistant designed to guide users through complex real-world tasks with multiple steps. Covering a wide range of cooking and how-to tasks, we aim to deliver a collaborative and engaging dialogue experience. Equipped with language understanding, dialogue management, and response generation components supported by a robust search engine, TacoBot ensures efficient task assistance. To enhance the dialogue experience, we explore a series of data augmentation strategies using LLMs to train advanced neural models continuously. TacoBot builds upon our successful participation in the inaugural Alexa Prize TaskBot Challenge, where our team secured third place among ten competing teams. We offer TacoBot as an open-source framework that serves as a practical example for deploying task-oriented dialogue systems.
Submitted 29 July, 2023;
originally announced July 2023.
-
Exploring Chain-of-Thought Style Prompting for Text-to-SQL
Authors:
Chang-You Tai,
Ziru Chen,
Tianshu Zhang,
Xiang Deng,
Huan Sun
Abstract:
In-context learning with large language models (LLMs) has recently caught increasing attention due to its superior few-shot performance on various tasks. However, its performance on text-to-SQL parsing still has much room for improvement. In this paper, we hypothesize that a crucial aspect of LLMs to improve for text-to-SQL parsing is their multi-step reasoning ability. Thus, we systematically study how to enhance LLMs' reasoning ability through chain of thought (CoT) style prompting, including the original chain-of-thought prompting (Wei et al., 2022b) and least-to-most prompting (Zhou et al., 2023). Our experiments demonstrate that iterative prompting as in Zhou et al. (2023) may be unnecessary for text-to-SQL parsing, and using detailed reasoning steps tends to have more error propagation issues. Based on these findings, we propose a new CoT-style prompting method for text-to-SQL parsing. It brings 5.2 and 6.5 point absolute gains on the Spider development set and the Spider Realistic set, respectively, compared to the standard prompting method without reasoning steps; 2.4 and 1.5 point absolute gains, compared to the least-to-most prompting method.
Submitted 27 October, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Cancer-Net BCa-S: Breast Cancer Grade Prediction using Volumetric Deep Radiomic Features from Synthetic Correlated Diffusion Imaging
Authors:
Chi-en Amy Tai,
Hayden Gunraj,
Alexander Wong
Abstract:
The prevalence of breast cancer continues to grow, affecting about 300,000 females in the United States in 2023. However, there are different levels of severity of breast cancer requiring different treatment strategies, and hence, grading breast cancer has become a vital component of breast cancer diagnosis and treatment planning. Specifically, the gold-standard Scarff-Bloom-Richardson (SBR) grade has been shown to consistently indicate a patient's response to chemotherapy. Unfortunately, the current method to determine the SBR grade requires removing some cancer cells from the patient, which can cause stress and discomfort and incur high costs. In this paper, we study the efficacy of deep learning for breast cancer grading based on synthetic correlated diffusion (CDI$^s$) imaging, a new magnetic resonance imaging (MRI) modality, and find that it achieves better performance on SBR grade prediction than models learnt using gold-standard imaging modalities. Hence, we introduce Cancer-Net BCa-S, a volumetric deep radiomics approach for predicting SBR grade based on volumetric CDI$^s$ data. Given the promising results, this proposed method to identify the severity of the cancer would allow for better treatment decisions without the need for a biopsy. Cancer-Net BCa-S has been made publicly available as part of a global open-source initiative for advancing machine learning for cancer care.
Submitted 12 April, 2023;
originally announced April 2023.
-
A Multi-Institutional Open-Source Benchmark Dataset for Breast Cancer Clinical Decision Support using Synthetic Correlated Diffusion Imaging Data
Authors:
Chi-en Amy Tai,
Hayden Gunraj,
Alexander Wong
Abstract:
Recently, a new form of magnetic resonance imaging (MRI) called synthetic correlated diffusion (CDI$^s$) imaging was introduced and showed considerable promise for clinical decision support for cancers such as prostate cancer when compared to current gold-standard MRI techniques. However, the efficacy of CDI$^s$ for other forms of cancer such as breast cancer has not been as well explored, nor have CDI$^s$ data been previously made publicly available. Motivated to advance efforts in the development of computer-aided clinical decision support for breast cancer using CDI$^s$, we introduce Cancer-Net BCa, a multi-institutional open-source benchmark dataset of volumetric CDI$^s$ imaging data of breast cancer patients. Cancer-Net BCa contains CDI$^s$ volumetric images from a pre-treatment cohort of 253 patients across ten institutions, along with detailed annotation metadata (the lesion type, genetic subtype, longest diameter on the MRI (MRLD), the Scarff-Bloom-Richardson (SBR) grade, and the post-treatment breast cancer pathologic complete response (pCR) to neoadjuvant chemotherapy). We further examine the demographic and tumour diversity of the Cancer-Net BCa dataset to gain deeper insights into potential biases. Cancer-Net BCa is publicly available as a part of a global open-source initiative dedicated to accelerating advancement in machine learning to aid clinicians in the fight against cancer.
Submitted 12 April, 2023;
originally announced April 2023.
-
NutritionVerse-Thin: An Optimized Strategy for Enabling Improved Rendering of 3D Thin Food Models
Authors:
Chi-en Amy Tai,
Jason Li,
Sriram Kumar,
Saeejith Nair,
Yuhao Chen,
Pengcheng Xi,
Alexander Wong
Abstract:
With the growth in capabilities of generative models, there has been growing interest in using photo-realistic renders of common 3D food items to improve downstream tasks such as food printing, nutrition prediction, or management of food wastage. Despite 3D modelling capabilities being more accessible than ever due to the success of NeRF based view-synthesis, such rendering methods still struggle to correctly capture thin food objects, often generating meshes with significant holes. In this study, we present an optimized strategy for enabling improved rendering of thin 3D food models, and demonstrate qualitative improvements in rendering quality. Our method generates the 3D model mesh via a proposed thin-object-optimized differentiable reconstruction method and tailors the strategy at both the data collection and training stages to better handle thin objects. While simple, we find that this technique can be employed for quick and highly consistent capturing of thin 3D objects.
Submitted 12 April, 2023;
originally announced April 2023.
-
NutritionVerse-3D: A 3D Food Model Dataset for Nutritional Intake Estimation
Authors:
Chi-en Amy Tai,
Matthew Keller,
Mattie Kerrigan,
Yuhao Chen,
Saeejith Nair,
Pengcheng Xi,
Alexander Wong
Abstract:
77% of adults over 50 want to age in place today, presenting a major challenge to ensuring adequate nutritional intake. It has been reported that one in four adults aged 65 or older is malnourished, and given the direct link between malnutrition and decreased quality of life, numerous studies have examined how to efficiently track nutritional intake. Recent advancements in machine learning and computer vision show promise for automated nutrition tracking, but such methods require a large, high-quality dataset in order to accurately identify the nutrients in the food on the plate. Unlike existing datasets, a collection of 3D models with nutritional information allows for view synthesis to create an infinite number of 2D images for any given viewpoint/camera angle along with the associated nutritional information. In this paper, we develop a methodology for collecting high-quality 3D models for food items with a particular focus on speed and consistency, and introduce NutritionVerse-3D, a large-scale high-quality high-resolution dataset of 105 3D food models, in conjunction with their associated weight, food name, and nutritional value. These models allow for large-quantity food intake scenes, diverse and customizable scene layouts, and an infinite number of camera settings and lighting conditions. NutritionVerse-3D is publicly available as a part of an open initiative to accelerate machine learning for nutrition sensing.
Submitted 12 April, 2023;
originally announced April 2023.
-
RAPID: Enabling Fast Online Policy Learning in Dynamic Public Cloud Environments
Authors:
Drew Penney,
Bin Li,
Lizhong Chen,
Jaroslaw J. Sydir,
Anna Drewek-Ossowicka,
Ramesh Illikkal,
Charlie Tai,
Ravi Iyer,
Andrew Herdrich
Abstract:
Resource sharing between multiple workloads has become a prominent practice among cloud service providers, motivated by demand for improved resource utilization and reduced cost of ownership. Effective resource sharing, however, remains an open challenge due to the adverse effects that resource contention can have on high-priority, user-facing workloads with strict Quality of Service (QoS) requirements. Although recent approaches have demonstrated promising results, those works remain largely impractical in public cloud environments since workloads are not known in advance and may only run for a brief period, thus prohibiting offline learning and significantly hindering online learning. In this paper, we propose RAPID, a novel framework for fast, fully-online resource allocation policy learning in highly dynamic operating environments. RAPID leverages lightweight QoS predictions, enabled by domain-knowledge-inspired techniques for sample efficiency and bias reduction, to decouple control from conventional feedback sources and guide policy learning at a rate orders of magnitude faster than prior work. Evaluation on a real-world server platform with representative cloud workloads confirms that RAPID can learn stable resource allocation policies in minutes, as compared with hours in prior state-of-the-art, while improving QoS by 9.0x and increasing best-effort workload performance by 19-43%.
Submitted 3 September, 2023; v1 submitted 10 April, 2023;
originally announced April 2023.
-
Learning Object-level Point Augmentor for Semi-supervised 3D Object Detection
Authors:
Cheng-Ju Ho,
Chen-Hsuan Tai,
Yi-Hsuan Tsai,
Yen-Yu Lin,
Ming-Hsuan Yang
Abstract:
Semi-supervised object detection is important for 3D scene understanding because obtaining large-scale 3D bounding box annotations on point clouds is time-consuming and labor-intensive. Existing semi-supervised methods usually employ teacher-student knowledge distillation together with an augmentation strategy to leverage unlabeled point clouds. However, these methods adopt global augmentation with scene-level transformations and hence are sub-optimal for instance-level object detection. In this work, we propose an object-level point augmentor (OPA) that performs local transformations for semi-supervised 3D object detection. In this way, the resultant augmentor is derived to emphasize object instances rather than irrelevant backgrounds, making the augmented data more useful for object detector training. Extensive experiments on the ScanNet and SUN RGB-D datasets show that the proposed OPA performs favorably against the state-of-the-art methods under various experimental settings. The source code will be available at https://github.com/nomiaro/OPA.
Submitted 19 December, 2022;
originally announced December 2022.
-
Why the pseudo label based semi-supervised learning algorithm is effective?
Authors:
Zeping Min,
Qian Ge,
Cheng Tai
Abstract:
Recently, pseudo label based semi-supervised learning has achieved great success in many fields. The core idea of the pseudo label based semi-supervised learning algorithm is to use the model trained on the labeled data to generate pseudo labels on the unlabeled data, and then train a model to fit the previously generated pseudo labels. In this paper, we give a theoretical analysis of why pseudo label based semi-supervised learning is effective. We mainly compare the generalization error of the model trained under two settings: (1) there are N labeled data; (2) there are N unlabeled data and a suitable initial model. Our analysis shows, first, that when the amount of unlabeled data tends to infinity, the pseudo label based semi-supervised learning algorithm can obtain a model with the same generalization error upper bound as a model obtained by standard training when the amount of labeled data tends to infinity. More importantly, we prove that when the amount of unlabeled data is large enough, the generalization error upper bound of the model obtained by the pseudo label based semi-supervised learning algorithm can converge to the optimal upper bound at a linear convergence rate. We also give a lower bound on the sampling complexity required to achieve the linear convergence rate. Our analysis contributes to understanding the empirical successes of pseudo label based semi-supervised learning.
Submitted 24 January, 2023; v1 submitted 18 November, 2022;
originally announced November 2022.
-
LiDAL: Inter-frame Uncertainty Based Active Learning for 3D LiDAR Semantic Segmentation
Authors:
Zeyu Hu,
Xuyang Bai,
Runze Zhang,
Xin Wang,
Guangyuan Sun,
Hongbo Fu,
Chiew-Lan Tai
Abstract:
We propose LiDAL, a novel active learning method for 3D LiDAR semantic segmentation by exploiting inter-frame uncertainty among LiDAR frames. Our core idea is that a well-trained model should generate robust results irrespective of viewpoints for scene scanning and thus the inconsistencies in model predictions across frames provide a very reliable measure of uncertainty for active sample selection. To implement this uncertainty measure, we introduce new inter-frame divergence and entropy formulations, which serve as the metrics for active selection. Moreover, we demonstrate additional performance gains by predicting and incorporating pseudo-labels, which are also selected using the proposed inter-frame uncertainty measure. Experimental results validate the effectiveness of LiDAL: we achieve 95% of the performance of fully supervised learning with less than 5% of annotations on the SemanticKITTI and nuScenes datasets, outperforming state-of-the-art active learning methods. Code release: https://github.com/hzykent/LiDAL.
Submitted 10 November, 2022;
originally announced November 2022.
-
Enhancing Clinical Support for Breast Cancer with Deep Learning Models using Synthetic Correlated Diffusion Imaging
Authors:
Chi-en Amy Tai,
Hayden Gunraj,
Nedim Hodzic,
Nic Flanagan,
Ali Sabri,
Alexander Wong
Abstract:
Breast cancer is the second most common type of cancer in women in Canada and the United States, representing over 25% of all new female cancer cases. As such, there has been immense research and progress on improving screening and clinical support for breast cancer. In this paper, we investigate enhancing clinical support for breast cancer with deep learning models using a newly introduced magnetic resonance imaging (MRI) modality called synthetic correlated diffusion imaging (CDI$^s$). More specifically, we leverage a volumetric convolutional neural network to learn volumetric deep radiomic features from a pre-treatment cohort and construct a predictor based on the learnt features for grade and post-treatment response prediction. As the first study to learn CDI$^s$-centric radiomic sequences from a deep learning perspective for clinical decision support, we evaluated the proposed approach using the ACRIN-6698 study against those learnt using gold-standard imaging modalities. We find that the proposed approach can achieve better performance for both grade and post-treatment response prediction and thus may be a useful tool to aid oncologists in improving treatment recommendations for patients. Subsequently, the approach to leverage volumetric deep radiomic features for breast cancer can be further extended to other applications of CDI$^s$ in the cancer domain to further improve clinical support.
Submitted 4 August, 2023; v1 submitted 9 November, 2022;
originally announced November 2022.
-
Prospective Preference Enhanced Mixed Attentive Model for Session-based Recommendation
Authors:
Bo Peng,
Chang-Yu Tai,
Srinivasan Parthasarathy,
Xia Ning
Abstract:
Session-based recommendation aims to generate recommendations for the next item of users' interest based on a given session. In this manuscript, we develop a prospective preference enhanced mixed attentive model (P2MAM) to generate session-based recommendations using two important factors: temporal patterns and estimates of users' prospective preferences. Unlike existing methods, P2MAM models the temporal patterns using a lightweight yet effective position-sensitive attention mechanism. In P2MAM, we also leverage the estimate of users' prospective preferences to signify important items and generate better recommendations. Our experimental results demonstrate that P2MAM models significantly outperform the state-of-the-art methods on six benchmark datasets, with an improvement of as much as 19.2%. In addition, our run-time performance comparison demonstrates that during testing, P2MAM models are much more efficient than the best baseline method, with a significant average speedup of 47.7-fold.
Submitted 3 June, 2022;
originally announced June 2022.
-
PoseCoach: A Customizable Analysis and Visualization System for Video-based Running Coaching
Authors:
Jingyuan Liu,
Nazmus Saquib,
Zhutian Chen,
Rubaiat Habib Kazi,
Li-Yi Wei,
Hongbo Fu,
Chiew-Lan Tai
Abstract:
Videos are an accessible form of media for analyzing sports postures and providing feedback to athletes. Existing sport-specific systems embed bespoke human pose attributes and thus can be hard to scale for new attributes, especially for users without programming experience. Some systems retain scalability by directly showing the differences between two poses, but they might not clearly visualize the key differences that viewers would like to pursue. Besides, video-based coaching systems often present feedback on the correctness of poses by augmenting videos with visual markers or reference poses. However, previewing and augmenting videos limit the analysis and visualization of human poses due to the fixed viewpoints in videos, which confine the observation of captured human movements and cause ambiguity in the augmented feedback. To address these issues, we study customizable human pose data analysis and visualization in the context of running pose attributes, such as joint angles and step distances. Based on existing literature and a formative study, we have designed and implemented a system, PoseCoach, to provide feedback on running poses for amateurs by comparing the running poses between a novice and an expert. PoseCoach adopts a customizable data analysis model that gives users control over defining pose attributes of interest through our interface. To avoid the influence of viewpoint differences and provide intuitive feedback, PoseCoach visualizes the pose differences as part-based 3D animations on a human model to imitate the demonstration of a human coach. We conduct a user study to verify our design components and conduct expert interviews to evaluate the usefulness of the system.
Submitted 27 February, 2023; v1 submitted 19 April, 2022;
originally announced April 2022.
-
Filter-based Discriminative Autoencoders for Children Speech Recognition
Authors:
Chiang-Lin Tai,
Hung-Shin Lee,
Yu Tsao,
Hsin-Min Wang
Abstract:
Children's speech recognition is indispensable but challenging due to the diversity of children's speech. In this paper, we propose a filter-based discriminative autoencoder for acoustic modeling. To filter out the influence of various speaker types and pitches, auxiliary information of the speaker and pitch features is input into the encoder together with the acoustic features to generate phonetic embeddings. In the training phase, the decoder uses the auxiliary information and the phonetic embedding extracted by the encoder to reconstruct the input acoustic features. The autoencoder is trained by simultaneously minimizing the ASR loss and feature reconstruction error. The framework can make the phonetic embedding purer, resulting in more accurate senone (triphone-state) scores. Evaluated on the test set of the CMU Kids corpus, our system achieves a 7.8% relative WER reduction compared to the baseline system. In the domain adaptation experiment, our system also outperforms the baseline system on the British-accent PF-STAR task.
Submitted 23 May, 2022; v1 submitted 31 March, 2022;
originally announced April 2022.
-
TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers
Authors:
Xuyang Bai,
Zeyu Hu,
Xinge Zhu,
Qingqiu Huang,
Yilun Chen,
Hongbo Fu,
Chiew-Lan Tai
Abstract:
LiDAR and camera are two important sensors for 3D object detection in autonomous driving. Despite the increasing popularity of sensor fusion in this field, the robustness against inferior image conditions, e.g., bad illumination and sensor misalignment, is under-explored. Existing fusion methods are easily affected by such conditions, mainly due to a hard association of LiDAR points and image pixels, established by calibration matrices. We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions. Specifically, our TransFusion consists of convolutional backbones and a detection head based on a transformer decoder. The first layer of the decoder predicts initial bounding boxes from a LiDAR point cloud using a sparse set of object queries, and its second decoder layer adaptively fuses the object queries with useful image features, leveraging both spatial and contextual relationships. The attention mechanism of the transformer enables our model to adaptively determine where and what information should be taken from the image, leading to a robust and effective fusion strategy. We additionally design an image-guided query initialization strategy to deal with objects that are difficult to detect in point clouds. TransFusion achieves state-of-the-art performance on large-scale datasets. We provide extensive experiments to demonstrate its robustness against degraded image quality and calibration errors. We also extend the proposed method to the 3D tracking task and achieve 1st place on the nuScenes tracking leaderboard, showing its effectiveness and generalization capability.
Submitted 22 March, 2022;
originally announced March 2022.
-
ORCA: A Network and Architecture Co-design for Offloading us-scale Datacenter Applications
Authors:
Yifan Yuan,
Jinghan Huang,
Yan Sun,
Tianchen Wang,
Jacob Nelson,
Dan R. K. Ports,
Yipeng Wang,
Ren Wang,
Charlie Tai,
Nam Sung Kim
Abstract:
Responding to the "datacenter tax" and "killer microseconds" problems for datacenter applications, diverse solutions including Smart NIC-based ones have been proposed. Nonetheless, they often suffer from high overhead of communications over network and/or PCIe links. To tackle the limitations of the current solutions, this paper proposes ORCA, a holistic network and architecture co-design solution that leverages current RDMA and emerging cache-coherent off-chip interconnect technologies. Specifically, ORCA consists of four hardware and software components: (1) unified abstraction of inter- and intra-machine communications managed by one-sided RDMA write and cache-coherent memory write; (2) efficient notification of requests to accelerators assisted by cache coherence; (3) cache-coherent accelerator architecture directly processing requests received by NIC; and (4) adaptive device-to-host data transfer for modern server memory systems consisting of both DRAM and NVM exploiting state-of-the-art features in CPUs and PCIe. We prototype ORCA with a commercial system and evaluate three popular datacenter applications: in-memory key-value store, chain replication-based distributed transaction system, and deep learning recommendation model inference. The evaluation shows that ORCA provides 30.1~69.1% lower latency, up to 2.5x higher throughput, and 3x higher power efficiency than the current state-of-the-art solutions.
Submitted 17 October, 2022; v1 submitted 16 March, 2022;
originally announced March 2022.
-
PROMPT: Learning Dynamic Resource Allocation Policies for Network Applications
Authors:
Drew Penney,
Bin Li,
Jaroslaw Sydir,
Lizhong Chen,
Charlie Tai,
Stefan Lee,
Eoin Walsh,
Thomas Long
Abstract:
A growing number of service providers are exploring methods to improve server utilization and reduce power consumption by co-scheduling high-priority latency-critical workloads with best-effort workloads. This practice requires strict resource allocation between workloads to reduce contention and maintain Quality-of-Service (QoS) guarantees. Prior work demonstrated promising opportunities to dynamically allocate resources based on workload demand, but may fail to meet QoS objectives in more stringent operating environments due to the presence of resource allocation cliffs, transient fluctuations in workload performance, and rapidly changing resource demand. We therefore propose PROMPT, a novel resource allocation framework using proactive QoS prediction to guide a reinforcement learning controller. PROMPT enables more precise resource optimization, more consistent handling of transient behaviors, and more robust generalization when co-scheduling new best-effort workloads not encountered during policy training. Evaluation shows that the proposed method incurs 4.2x fewer QoS violations, reduces severity of QoS violations by 12.7x, improves best-effort workload performance, and improves overall power efficiency over prior work.
Submitted 24 March, 2023; v1 submitted 19 January, 2022;
originally announced January 2022.
-
Hyperbolic Disentangled Representation for Fine-Grained Aspect Extraction
Authors:
Chang-You Tai,
Ming-Yao Li,
Lun-Wei Ku
Abstract:
Automatic identification of salient aspects from user reviews is especially useful for opinion analysis. There has been significant progress in utilizing weakly supervised approaches, which require only a small set of seed words for training aspect classifiers. However, there is always room for improvement. First, no weakly supervised approaches fully utilize latent hierarchies between words. Second, each seed word's representation should have different latent semantics and be distinct when it represents a different aspect. In this paper, we propose HDAE, a hyperbolic disentangled aspect extractor in which a hyperbolic aspect classifier captures words' latent hierarchies, and aspect-disentangled representation models the distinct latent semantics of each seed word. Compared to previous baselines, HDAE achieves average F1 performance gains of 18.2% and 24.1% on Amazon product review and restaurant review datasets, respectively. In addition, the embedding visualization experiment demonstrates that HDAE is a more effective approach to leveraging seed words. An ablation study and a case study further attest to the effectiveness of the proposed components.
Submitted 16 December, 2021;
originally announced December 2021.
-
Learning to Match Features with Seeded Graph Matching Network
Authors:
Hongkai Chen,
Zixin Luo,
Jiahui Zhang,
Lei Zhou,
Xuyang Bai,
Zeyu Hu,
Chiew-Lan Tai,
Long Quan
Abstract:
Matching local features across images is a fundamental problem in computer vision. Targeting high accuracy and efficiency, we propose Seeded Graph Matching Network, a graph neural network with sparse structure to reduce redundant connectivity and learn compact representation. The network consists of 1) a Seeding Module, which initializes the matching by generating a small set of reliable matches as seeds, and 2) a Seeded Graph Neural Network, which utilizes seed matches to pass messages within/across images and predicts assignment costs. Three novel operations are proposed as basic elements for message passing: 1) Attentional Pooling, which aggregates keypoint features within the image to seed matches; 2) Seed Filtering, which enhances seed features and exchanges messages across images; and 3) Attentional Unpooling, which propagates seed features back to original keypoints. Experiments show that our method significantly reduces computational and memory complexity compared with typical attention-based networks while achieving competitive or higher performance.
Submitted 19 August, 2021;
originally announced August 2021.
-
VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation
Authors:
Zeyu Hu,
Xuyang Bai,
Jiaxiang Shang,
Runze Zhang,
Jiayu Dong,
Xin Wang,
Guangyuan Sun,
Hongbo Fu,
Chiew-Lan Tai
Abstract:
In recent years, sparse voxel-based methods have become the state of the art for 3D semantic segmentation of indoor scenes, thanks to powerful 3D CNNs. Nevertheless, being oblivious to the underlying geometry, voxel-based methods suffer from ambiguous features on spatially close objects and struggle with handling complex and irregular geometries due to the lack of geodesic information. In view of this, we present Voxel-Mesh Network (VMNet), a novel 3D deep architecture that operates on the voxel and mesh representations leveraging both the Euclidean and geodesic information. Intuitively, the Euclidean information extracted from voxels can offer contextual cues representing interactions between nearby objects, while the geodesic information extracted from meshes can help separate objects that are spatially close but have disconnected surfaces. To incorporate such information from the two domains, we design an intra-domain attentive module for effective feature aggregation and an inter-domain attentive module for adaptive feature fusion. Experimental results validate the effectiveness of VMNet: specifically, on the challenging ScanNet dataset for large-scale segmentation of indoor scenes, it outperforms the state-of-the-art SparseConvNet and MinkowskiNet (74.6% vs 72.5% and 73.6% in mIoU) with a simpler network structure (17M vs 30M and 38M parameters). Code release: https://github.com/hzykent/VMNet
Submitted 25 July, 2022; v1 submitted 29 July, 2021;
originally announced July 2021.
-
Accelerating SLIDE Deep Learning on Modern CPUs: Vectorization, Quantizations, Memory Optimizations, and More
Authors:
Shabnam Daghaghi,
Nicholas Meisburger,
Mengnan Zhao,
Yong Wu,
Sameh Gobriel,
Charlie Tai,
Anshumali Shrivastava
Abstract:
Deep learning implementations on CPUs (Central Processing Units) are gaining more traction. Enhanced AI capabilities on commodity x86 architectures are commercially appealing due to the reuse of existing hardware and the ease of virtualization. A notable work in this direction is the SLIDE system. SLIDE is a C++ implementation of sparse hash-table-based back-propagation, which was shown to be significantly faster than GPUs in training neural models with hundreds of millions of parameters. In this paper, we argue that SLIDE's current implementation is sub-optimal and does not exploit several opportunities available in modern CPUs. In particular, we show how SLIDE's computations allow for a unique possibility of vectorization via AVX (Advanced Vector Extensions)-512. Furthermore, we highlight opportunities for different kinds of memory optimization and quantization. Combining all of them, we obtain up to 7x speedup in the computations on the same hardware. Our experiments are focused on large (hundreds of millions of parameters) recommendation and NLP models. Our work highlights several novel perspectives and opportunities for implementing randomized algorithms for deep learning on modern CPUs. We provide the code and benchmark scripts at https://github.com/RUSH-LAB/SLIDE
Submitted 5 March, 2021;
originally announced March 2021.
-
PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency
Authors:
Xuyang Bai,
Zixin Luo,
Lei Zhou,
Hongkai Chen,
Lei Li,
Zeyu Hu,
Hongbo Fu,
Chiew-Lan Tai
Abstract:
Removing outlier correspondences is one of the critical steps for successful feature-based point cloud registration. Despite the increasing popularity of introducing deep learning methods in this field, spatial consistency, which is essentially established by a Euclidean transformation between point clouds, has received almost no individual attention in existing learning frameworks. In this paper, we present PointDSC, a novel deep neural network that explicitly incorporates spatial consistency for pruning outlier correspondences. First, we propose a nonlocal feature aggregation module, weighted by both feature and spatial coherence, for feature embedding of the input correspondences. Second, we formulate a differentiable spectral matching module, supervised by pairwise spatial compatibility, to estimate the inlier confidence of each correspondence from the embedded features. With modest computation cost, our method outperforms the state-of-the-art hand-crafted and learning-based outlier rejection approaches on several real-world datasets by a significant margin. We also show its wide applicability by combining PointDSC with different 3D local descriptors.
Submitted 9 March, 2021;
originally announced March 2021.
-
Influence Maximization Based on Dynamic Personal Perception in Knowledge Graph
Authors:
Ya-Wen Teng,
Yishuo Shi,
Chih-Hua Tai,
De-Nian Yang,
Wang-Chien Lee,
Ming-Syan Chen
Abstract:
Viral marketing on social networks, also known as Influence Maximization (IM), aims to select k users for the promotion of a target item by maximizing the total spread of their influence. However, most previous works on IM do not explore the dynamic user perception of promoted items in the process. In this paper, by exploiting the knowledge graph (KG) to capture dynamic user perception, we formulate the problem of Influence Maximization with Dynamic Personal Perception (IMDPP) that considers user preferences and social influence reflecting the impact of relevant item adoptions. We prove the hardness of IMDPP and design an approximation algorithm, named Dynamic perception for seeding in target markets (Dysim), by exploring the concepts of dynamic reachability, target markets, and substantial influence to select and promote a sequence of relevant items. We evaluate the performance of Dysim in comparison with the state-of-the-art approaches using real social networks with real KGs. The experimental results show that, on large datasets, Dysim achieves up to 6.7 times the influence spread of the state-of-the-art approaches.
Submitted 30 September, 2021; v1 submitted 14 October, 2020;
originally announced October 2020.
-
An Overview of Generalized Frequency Division Multiplexing (GFDM)
Authors:
Ching-Lun Tai,
Tzu-Han Wang,
Yu-Hua Huang
Abstract:
As a candidate waveform for next-generation wireless communications, generalized frequency division multiplexing (GFDM) features several desirable properties that make it promising. In this paper, we systematically overview the research on GFDM. We start with GFDM transceivers and their main components, which consist of prototype filter design, low-complexity transceiver implementation, and symbol detection algorithms. Then, we investigate several non-ideal issues of GFDM, including synchronization issues, channel estimation, and in-phase/quadrature (I/Q) imbalance compensation. Lastly, we study the applications of GFDM-based cognitive radio and full-duplex radio, which boast high spectral efficiency.
Submitted 20 August, 2020;
originally announced August 2020.
-
JSENet: Joint Semantic Segmentation and Edge Detection Network for 3D Point Clouds
Authors:
Zeyu Hu,
Mingmin Zhen,
Xuyang Bai,
Hongbo Fu,
Chiew-lan Tai
Abstract:
Semantic segmentation and semantic edge detection can be seen as two dual problems with close relationships in computer vision. Despite the fast evolution of learning-based 3D semantic segmentation methods, little attention has been drawn to the learning of 3D semantic edge detectors, even less to a joint learning method for the two tasks. In this paper, we tackle the 3D semantic edge detection task for the first time and present a new two-stream fully-convolutional network that jointly performs the two tasks. In particular, we design a joint refinement module that explicitly wires region information and edge information to improve the performances of both tasks. Further, we propose a novel loss function that encourages the network to produce semantic segmentation results with better boundaries. Extensive evaluations on S3DIS and ScanNet datasets show that our method achieves on par or better performance than the state-of-the-art methods for semantic segmentation and outperforms the baseline methods for semantic edge detection. Code release: https://github.com/hzykent/JSENet
Submitted 14 July, 2020;
originally announced July 2020.
-
IOCA: High-Speed I/O-Aware LLC Management for Network-Centric Multi-Tenant Platform
Authors:
Yifan Yuan,
Mohammad Alian,
Yipeng Wang,
Ilia Kurakin,
Ren Wang,
Charlie Tai,
Nam Sung Kim
Abstract:
In modern server CPUs, the last-level cache (LLC) is a critical hardware resource that exerts significant influence on the performance of workloads, and how to manage the LLC is key to performance isolation and QoS in the multi-tenant cloud. In this paper, we argue that besides CPU cores, high-speed network I/O is also important for LLC management. This is because of an Intel architectural innovation, Data Direct I/O (DDIO), that directly injects the inbound I/O traffic to (part of) the LLC instead of the main memory. We summarize two problems caused by DDIO and show that (1) the default DDIO configuration may not always achieve optimal performance, and (2) DDIO can decrease the performance of non-I/O workloads that share the LLC with it by as much as 32%.
We then present IOCA, the first LLC management mechanism for network-centric platforms that treats I/O as a first-class citizen. IOCA monitors and analyzes the performance of the cores, LLC, and DDIO using the CPU's hardware performance counters, and adaptively adjusts the number of LLC ways for DDIO or the tenants that demand more LLC capacity. In addition, IOCA dynamically chooses the tenants that share its LLC resource with DDIO, to minimize the performance interference by both the tenants and the I/O. Our experiments with multiple microbenchmarks and real-world applications in two major end-host network models demonstrate that IOCA can effectively reduce the performance degradation caused by DDIO, with minimal overhead.
Submitted 4 March, 2021; v1 submitted 9 July, 2020;
originally announced July 2020.
-
MVIN: Learning Multiview Items for Recommendation
Authors:
Chang-You Tai,
Meng-Ru Wu,
Yun-Wei Chu,
Shao-Yu Chu,
Lun-Wei Ku
Abstract:
Researchers have begun to utilize heterogeneous knowledge graphs (KGs) as auxiliary information in recommendation systems (RS) to mitigate the cold start and sparsity issues. However, utilizing a graph neural network (GNN) to capture information in a KG and further apply it in RS is still problematic, as it is unable to see each item's properties from multiple perspectives. To address these issues, we propose the multi-view item network (MVIN), a GNN-based recommendation model which provides superior recommendations by describing items from a unique mixed view from user and entity angles. MVIN learns item representations from both the user view and the entity view. From the user view, user-oriented modules score and aggregate features to make recommendations from a personalized perspective constructed according to KG entities which incorporates user click information. From the entity view, the mixing layer contrasts layer-wise GCN information to further obtain comprehensive features from internal entity-entity interactions in the KG. We evaluate MVIN on three real-world datasets: MovieLens-1M (ML-1M), LFM-1b 2015 (LFM-1b), and Amazon-Book (AZ-book). Results show that MVIN significantly outperforms state-of-the-art methods on these three datasets. In addition, from user-view cases, we find that MVIN indeed captures entities that attract users. Figures further illustrate that mixing layers in a heterogeneous KG play a vital role in neighborhood information aggregation.
Submitted 26 May, 2020;
originally announced May 2020.
-
End-to-End Learning Local Multi-view Descriptors for 3D Point Clouds
Authors:
Lei Li,
Siyu Zhu,
Hongbo Fu,
Ping Tan,
Chiew-Lan Tai
Abstract:
In this work, we propose an end-to-end framework to learn local multi-view descriptors for 3D point clouds. To adopt a similar multi-view representation, existing studies use hand-crafted viewpoints for rendering in a preprocessing stage, which is detached from the subsequent descriptor learning stage. In our framework, we integrate the multi-view rendering into neural networks by using a differentiable renderer, which allows the viewpoints to be optimizable parameters for capturing more informative local context of interest points. To obtain discriminative descriptors, we also design a soft-view pooling module to attentively fuse convolutional features across views. Extensive experiments on existing 3D registration benchmarks show that our method outperforms existing local descriptors both quantitatively and qualitatively.
Submitted 16 March, 2020; v1 submitted 12 March, 2020;
originally announced March 2020.
-
D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features
Authors:
Xuyang Bai,
Zixin Luo,
Lei Zhou,
Hongbo Fu,
Long Quan,
Chiew-Lan Tai
Abstract:
A successful point cloud registration often relies on robust establishment of sparse matches through discriminative 3D local features. Despite the fast evolution of learning-based 3D feature descriptors, little attention has been drawn to the learning of 3D feature detectors, even less to a joint learning of the two tasks. In this paper, we leverage a 3D fully convolutional network for 3D point clouds, and propose a novel and practical learning mechanism that densely predicts both a detection score and a description feature for each 3D point. In particular, we propose a keypoint selection strategy that overcomes the inherent density variations of 3D point clouds, and further propose a self-supervised detector loss guided by the on-the-fly feature matching results during training. Finally, our method achieves state-of-the-art results in both indoor and outdoor scenarios, evaluated on 3DMatch and KITTI datasets, and shows its strong generalization ability on the ETH dataset. Towards practical use, we show that by adopting a reliable feature detector, sampling a smaller number of features is sufficient to achieve accurate and fast point cloud alignment. Code release: https://github.com/XuyangBai/D3Feat
Submitted 6 March, 2020;
originally announced March 2020.
-
SketchDesc: Learning Local Sketch Descriptors for Multi-view Correspondence
Authors:
Deng Yu,
Lei Li,
Youyi Zheng,
Manfred Lau,
Yi-Zhe Song,
Chiew-Lan Tai,
Hongbo Fu
Abstract:
In this paper, we study the problem of multi-view sketch correspondence, where we take as input multiple freehand sketches with different views of the same object and predict as output the semantic correspondence among the sketches. This problem is challenging since the visual features of corresponding points at different views can be very different. To this end, we take a deep learning approach and learn a novel local sketch descriptor from data. We contribute a training dataset by generating the pixel-level correspondence for the multi-view line drawings synthesized from 3D shapes. To handle the sparsity and ambiguity of sketches, we design a novel multi-branch neural network that integrates a patch-based representation and a multi-scale strategy to learn the pixel-level correspondence among multi-view sketches. We demonstrate the effectiveness of our proposed approach with extensive experiments on hand-drawn sketches and multi-view line drawings rendered from multiple 3D shape datasets.
Submitted 10 August, 2020; v1 submitted 16 January, 2020;
originally announced January 2020.
-
Interference-Precancelled Pilot Design for LMMSE Channel Estimation of GFDM
Authors:
Ching-Lun Tai,
Borching Su,
Cai Jia
Abstract:
Generalized frequency division multiplexing (GFDM) is a promising candidate waveform for next-generation wireless communication systems. However, GFDM channel estimation is still challenging due to the inherent interference. In this paper, we formulate a pilot design framework with linear minimum mean square error (LMMSE) channel estimation for GFDM, and propose a novel pilot design to achieve interference precancellation during pilot generation with the fixed transmit sample values at selected frequency bins. Numerical results demonstrate that the proposed method reduces the channel estimation mean square error and the symbol error rate (SER) in high signal-to-noise ratio (SNR) regions, compared with the conventional methods.
Submitted 29 September, 2019;
originally announced September 2019.
-
Greedy Algorithms for Hybrid Compressed Sensing
Authors:
Ching-Lun Tai,
Sung-Hsien Hsieh,
Chun-Shien Lu
Abstract:
Compressed sensing (CS) is a technique which uses fewer measurements than dictated by the Nyquist sampling theorem. The traditional CS with linear measurements achieves efficient recovery performance, but it suffers from large bit consumption due to the huge storage occupied by those measurements. The one-bit CS with binary measurements was then proposed to save the bit budget, but it is infeasible when the energy information of signals is not available as prior knowledge. Subsequently, the hybrid CS, which combines the traditional CS and one-bit CS, appeared, striking a balance between the pros and cons of both types of CS. Considering the fact that the one-bit CS is optimal for the direction estimation of signals under noise with a fixed bit budget and that the traditional CS is able to provide residue information and estimated signals, we focus in this paper on the design of greedy algorithms for the hybrid CS, which consist of the main steps of support detection and recovered signal update. We first propose a theorem on the random uniform tessellations for sparse signals to further investigate the properties of one-bit CS. Afterwards, we propose two greedy algorithms for the hybrid CS, with the one-bit CS responsible for support detection and traditional CS offering updated residues and signal estimates. For each of the proposed algorithms, we provide a corresponding theorem with proof to analyze its capabilities theoretically. Simulation results demonstrate the efficacy of the proposed greedy algorithms under a limited bit budget in noisy environments.
Submitted 17 August, 2019;
originally announced August 2019.