Search | arXiv e-print repository

arXiv:2405.19012 [pdf, other]

Implicit Neural Image Field for Biological Microscopy Image Compression

Authors: Gaole Dai, Cheng-Ching Tseng, Qingpo Wuwu, Rongyu Zhang, Shaokang Wang, Ming Lu, Tiejun Huang, Yu Zhou, Ali Ata Tuz, Matthias Gunzer, Jianxu Chen, Shanghang Zhang

Abstract: The rapid pace of innovation in biological microscopy imaging has led to large images, putting pressure on data storage and impeding efficient sharing, management, and visualization. This necessitates the development of efficient compression solutions. Traditional CODEC methods struggle to adapt to the diverse bioimaging data and often suffer from sub-optimal compression. In this study, we propose an adaptive compression workflow based on Implicit Neural Representation (INR). This approach permits application-specific compression objectives, capable of compressing images of any shape and arbitrary pixel-wise decompression. We demonstrated on a wide range of microscopy images from real applications that our workflow not only achieved high, controllable compression ratios (e.g., 512x) but also preserved detailed information critical for downstream analysis.

Submitted 29 May, 2024; originally announced May 2024.
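
The compression ratio quoted in the abstract above can be turned into a rough parameter budget for an INR network. The sketch below is only illustrative arithmetic, not the authors' workflow: it assumes the ratio is raw image bytes divided by stored model bytes, that each parameter is stored as a 4-byte float, and that the network is a plain fully connected MLP. The function names and the example image shape are hypothetical.

```python
def raw_size_bytes(shape, bytes_per_voxel):
    """Uncompressed size of an image with the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_voxel


def parameter_budget(shape, bytes_per_voxel, target_ratio, bytes_per_param=4):
    """Largest parameter count that still meets the target compression ratio.

    Assumes ratio = raw bytes / model bytes and float32 parameters.
    """
    return raw_size_bytes(shape, bytes_per_voxel) // (target_ratio * bytes_per_param)


def mlp_parameter_count(in_dim, hidden, n_hidden_layers, out_dim):
    """Weights plus biases of a plain fully connected network."""
    dims = [in_dim] + [hidden] * n_hidden_layers + [out_dim]
    return sum(a * b + b for a, b in zip(dims, dims[1:]))


# Hypothetical 16-bit volume of 64 x 1024 x 1024 voxels at a 512x ratio.
budget = parameter_budget((64, 1024, 1024), 2, 512)
size = mlp_parameter_count(in_dim=3, hidden=64, n_hidden_layers=2, out_dim=1)
print(budget, size, size <= budget)
```

Under these assumptions, choosing the network width and depth is what makes the ratio controllable: a smaller network gives a higher ratio at the cost of reconstruction quality.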

arXiv:2404.19134 [pdf, other]

Evaluating Deep Clustering Algorithms on Non-Categorical 3D CAD Models

Authors: Siyuan Xiang, Chin Tseng, Congcong Wen, Deshana Desai, Yifeng Kou, Binil Starly, Daniele Panozzo, Chen Feng

Abstract: We introduce the first work on benchmarking and evaluating deep clustering algorithms on large-scale non-categorical 3D CAD models. We first propose a workflow to allow expert mechanical engineers to efficiently annotate 252,648 carefully sampled pairwise CAD model similarities, from a subset of the ABC dataset with 22,968 shapes. Using seven baseline deep clustering methods, we then investigate the fundamental challenges of evaluating clustering methods for non-categorical data. Based on these challenges, we propose a novel and viable ensemble-based clustering comparison approach. This work is the first to directly target the underexplored area of deep clustering algorithms for 3D shapes, and we believe it will be an important building block to analyze and utilize the massive 3D shape collections that are starting to appear in deep geometric computing.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.03787 [pdf, other]

doi 10.2312/evs.20241073

Revisiting Categorical Color Perception in Scatterplots: Sequential, Diverging, and Categorical Palettes

Authors: Chin Tseng, Arran Zeyu Wang, Ghulam Jilani Quadri, Danielle Albers Szafir

Abstract: Existing guidelines for categorical color selection are heuristic, often grounded in intuition rather than empirical studies of readers' abilities. While design conventions recommend palettes maximize hue differences, more recent exploratory findings indicate other factors, such as lightness, may play a role in effective categorical palette design. We conducted a crowdsourced experiment on mean value judgments in multi-class scatterplots using five color palette families--single-hue sequential, multi-hue sequential, perceptually-uniform multi-hue sequential, diverging, and multi-hue categorical--that differ in how they manipulate hue and lightness. Participants estimated relative mean positions in scatterplots containing 2 to 10 categories using 20 colormaps. Our results confirm heuristic guidance that hue-based categorical palettes are most effective. However, they also provide additional evidence that scalable categorical encoding relies on more than hue variance.

Submitted 16 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: Accepted for publication in EuroVis 2024 Short Paper

Journal ref: In Proceedings of the 26th EG/VGTC Conference on Visualization (EuroVis 2024), May 27-31, 2024, Odense, Denmark

arXiv:2401.14989 [pdf]

Mapping-to-Parameter Nonlinear Functional Regression with Novel B-spline Free Knot Placement Algorithm

Authors: Chengdong Shi, Ching-Hsun Tseng, Wei Zhao, Xiao-Jun Zeng

Abstract: We propose a novel approach to nonlinear functional regression, called the Mapping-to-Parameter function model, which addresses complex and nonlinear functional regression problems in parameter space by employing any supervised learning technique. Central to this model is the mapping of function data from an infinite-dimensional function space to a finite-dimensional parameter space. This is accomplished by concurrently approximating multiple functions with a common set of B-spline basis functions by any chosen order, with their knot distribution determined by the Iterative Local Placement Algorithm, a newly proposed free knot placement algorithm. In contrast to the conventional equidistant knot placement strategy that uniformly distributes knot locations based on a predefined number of knots, our proposed algorithms determine knot location according to the local complexity of the input or output functions. The performance of our knot placement algorithms is shown to be robust in both single-function approximation and multiple-function approximation contexts. Furthermore, the effectiveness and advantage of the proposed prediction model in handling both function-on-scalar regression and function-on-function regression problems are demonstrated through several real data applications, in comparison with four groups of state-of-the-art methods.

Submitted 26 January, 2024; originally announced January 2024.

arXiv:2312.01361 [pdf, other]

MoEC: Mixture of Experts Implicit Neural Compression

Authors: Jianchen Zhao, Cheng-Ching Tseng, Ming Lu, Ruichuan An, Xiaobao Wei, He Sun, Shanghang Zhang

Abstract: Emerging Implicit Neural Representation (INR) is a promising data compression technique, which represents the data using the parameters of a Deep Neural Network (DNN). Existing methods manually partition a complex scene into local regions and overfit the INRs into those regions. However, manually designing the partition scheme for a complex scene is very challenging and fails to jointly learn the partition and INRs. To solve the problem, we propose MoEC, a novel implicit neural compression method based on the theory of mixture of experts. Specifically, we use a gating network to automatically assign a specific INR to a 3D point in the scene. The gating network is trained jointly with the INRs of different local regions. Compared with block-wise and tree-structured partitions, our learnable partition can adaptively find the optimal partition in an end-to-end manner. We conduct detailed experiments on massive and diverse biomedical data to demonstrate the advantages of MoEC against existing approaches. In most of experiment settings, we have achieved state-of-the-art results. Especially in cases of extreme compression ratios, such as 6000x, we are able to uphold the PSNR of 48.16.

Submitted 3 December, 2023; originally announced December 2023.

arXiv:2311.13621 [pdf, other]

Knowledge From the Dark Side: Entropy-Reweighted Knowledge Distillation for Balanced Knowledge Transfer

Authors: Chi-Ping Su, Ching-Hsun Tseng, Shin-Jye Lee

Abstract: Knowledge Distillation (KD) transfers knowledge from a larger "teacher" model to a compact "student" model, guiding the student with the "dark knowledge", the implicit insights present in the teacher's soft predictions. Although existing KDs have shown the potential of transferring knowledge, the gap between the two parties still exists. With a series of investigations, we argue the gap is the result of the student's overconfidence in prediction, signaling an imbalanced focus on pronounced features while overlooking the subtle yet crucial dark knowledge. To overcome this, we introduce the Entropy-Reweighted Knowledge Distillation (ER-KD), a novel approach that leverages the entropy in the teacher's predictions to reweight the KD loss on a sample-wise basis. ER-KD precisely refocuses the student on challenging instances rich in the teacher's nuanced insights while reducing the emphasis on simpler cases, enabling a more balanced knowledge transfer. Consequently, ER-KD not only demonstrates compatibility with various state-of-the-art KD methods but also further enhances their performance at negligible cost. This approach offers a streamlined and effective strategy to refine the knowledge transfer process in KD, setting a new paradigm in the meticulous handling of dark knowledge. Our code is available at https://github.com/cpsu00/ER-KD.

Submitted 22 November, 2023; originally announced November 2023.
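
The sample-wise reweighting described in the abstract above can be illustrated with a small standard-library sketch. This is not the authors' implementation (their code is at the URL in the abstract). It assumes the per-sample weight is simply the Shannon entropy of the teacher's temperature-softened distribution and that the base KD loss is the KL divergence from teacher to student; the paper's exact weighting, normalization, and temperature handling may differ. All function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def kl_divergence(p, q):
    """KL(p || q) for two probability vectors of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def entropy_reweighted_kd_loss(teacher_logits, student_logits, temperature=4.0):
    """Mean over samples of teacher entropy times KL(teacher || student).

    Assumed form only: samples on which the teacher is uncertain
    (high entropy) contribute more to the loss than confident ones.
    """
    total = 0.0
    for t_logits, s_logits in zip(teacher_logits, student_logits):
        p_teacher = softmax(t_logits, temperature)
        p_student = softmax(s_logits, temperature)
        total += entropy(p_teacher) * kl_divergence(p_teacher, p_student)
    return total / len(teacher_logits)


# Two samples: the teacher is confident on the first, uncertain on the second.
teacher = [[8.0, 0.5, 0.1], [1.2, 1.0, 0.9]]
student = [[5.0, 1.0, 0.3], [2.5, 0.2, 0.1]]
print(entropy_reweighted_kd_loss(teacher, student))
```

In this form the weight depends only on the teacher, so it can be computed once per sample and multiplied into an existing distillation loss, which is consistent with the abstract's claim of negligible extra cost.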

arXiv:2309.09658 [pdf]

A Novel Method of Fuzzy Topic Modeling based on Transformer Processing

Authors: Ching-Hsun Tseng, Shin-Jye Lee, Po-Wei Cheng, Chien Lee, Chih-Chieh Hung

Abstract: Topic modeling is admittedly a convenient way to monitor markets trend. Conventionally, Latent Dirichlet Allocation, LDA, is considered a must-do model to gain this type of information. By given the merit of deducing keyword with token conditional probability in LDA, we can know the most possible or essential topic. However, the results are not intuitive because the given topics cannot wholly fit human knowledge. LDA offers the first possible relevant keywords, which also brings out another problem of whether the connection is reliable based on the statistic possibility. It is also hard to decide the topic number manually in advance. As the booming trend of using fuzzy membership to cluster and using transformers to embed words, this work presents the fuzzy topic modeling based on soft clustering and document embedding from state-of-the-art transformer-based model. In our practical application in a press release monitoring, the fuzzy topic modeling gives a more natural result than the traditional output from LDA.

Submitted 18 September, 2023; originally announced September 2023.

Comments: Asian Journal of Information and Communications, Vol.12, No. 1, 125-140

arXiv:2309.00131 [pdf, other]

Effects of data distribution and granularity on color semantics for colormap data visualizations

Authors: Clementine Zimnicki, Chin Tseng, Danielle Albers Szafir, Karen B. Schloss

Abstract: To create effective data visualizations, it helps to represent data using visual features in intuitive ways. When visualization designs match observer expectations, visualizations are easier to interpret. Prior work suggests that several factors influence such expectations. For example, the dark-is-more bias leads observers to infer that darker colors map to larger quantities, and the opaque-is-more bias leads them to infer that regions appearing more opaque (given the background color) map to larger quantities. Previous work suggested that the background color only plays a role if visualizations appear to vary in opacity. The present study challenges this claim. We hypothesized that the background color modulate inferred mappings for colormaps that should not appear to vary in opacity (by previous measures) if the visualization appeared to have a "hole" that revealed the background behind the map (hole hypothesis). We found that spatial aspects of the map contributed to inferred mappings, though the effects were inconsistent with the hole hypothesis. Our work raises new questions about how spatial distributions of data influence color semantics in colormap data visualizations.

Submitted 31 August, 2023; originally announced September 2023.

arXiv:2308.10515 [pdf, other]

QD-BEV : Quantization-aware View-guided Distillation for Multi-view 3D Object Detection

Authors: Yifan Zhang, Zhen Dong, Huanrui Yang, Ming Lu, Cheng-Ching Tseng, Yuan Du, Kurt Keutzer, Li Du, Shanghang Zhang

Abstract: Multi-view 3D detection based on BEV (bird-eye-view) has recently achieved significant improvements. However, the huge memory consumption of state-of-the-art models makes it hard to deploy them on vehicles, and the non-trivial latency will affect the real-time perception of streaming applications. Despite the wide application of quantization to lighten models, we show in our paper that directly applying quantization in BEV tasks will 1) make the training unstable, and 2) lead to intolerable performance degradation. To solve these issues, our method QD-BEV enables a novel view-guided distillation (VGD) objective, which can stabilize the quantization-aware training (QAT) while enhancing the model performance by leveraging both image features and BEV features. Our experiments show that QD-BEV achieves similar or even better accuracy than previous methods with significant efficiency gains. On the nuScenes datasets, the 4-bit weight and 6-bit activation quantized QD-BEV-Tiny model achieves 37.2% NDS with only 15.8 MB model size, outperforming BevFormer-Tiny by 1.8% with an 8x model compression. On the Small and Base variants, QD-BEV models also perform superbly and achieve 47.9% NDS (28.2 MB) and 50.9% NDS (32.9 MB), respectively.

Submitted 21 August, 2023; originally announced August 2023.

Comments: ICCV 2023 Accept

arXiv:2308.07717 [pdf, other]

Real-time Automatic M-mode Echocardiography Measurement with Panel Attention from Local-to-Global Pixels

Authors: Ching-Hsun Tseng, Shao-Ju Chien, Po-Shen Wang, Shin-Jye Lee, Wei-Huan Hu, Bin Pu, Xiao-jun Zeng

Abstract: Motion mode (M-mode) recording is an essential part of echocardiography to measure cardiac dimension and function. However, the current diagnosis cannot build an automatic scheme, as there are three fundamental obstructs: Firstly, there is no open dataset available to build the automation for ensuring constant results and bridging M-mode echocardiography with real-time instance segmentation (RIS); Secondly, the examination is involving the time-consuming manual labelling upon M-mode echocardiograms; Thirdly, as objects in echocardiograms occupy a significant portion of pixels, the limited receptive field in existing backbones (e.g., ResNet) composed from multiple convolution layers are inefficient to cover the period of a valve movement. Existing non-local attentions (NL) compromise being unable real-time with a high computation overhead or losing information from a simplified version of the non-local block.
Therefore, we proposed RAMEM, a real-time automatic M-mode echocardiography measurement scheme, contributes three aspects to answer the problems: 1) provide MEIS, a dataset of M-mode echocardiograms for instance segmentation, to enable consistent results and support the development of an automatic scheme; 2) propose panel attention, local-to-global efficient attention by pixel-unshuffling, embedding with updated UPANets V2 in a RIS scheme toward big object detection with global receptive field; 3) develop and implement AMEM, an efficient algorithm of automatic M-mode echocardiography measurement enabling fast and accurate automatic labelling among diagnosis. The experimental results show that RAMEM surpasses existing RIS backbones (with non-local attention) in PASCAL 2012 SBD and human performances in real-time MEIS tested. The code of MEIS and dataset are available at https://github.com/hanktseng131415go/RAME.

Submitted 15 August, 2023; originally announced August 2023.

arXiv:2305.00086 [pdf, other]

An Integrated System Dynamics and Discrete Event Supply Chain Simulation Framework for Supply Chain Resilience with Non-Stationary Pandemic Demand

Authors: Mustafa Can Camur, Chin-Yuan Tseng, Aristotelis E. Thanos, Chelsea C. White, Walter Yund, Eleftherios Iakovou

Abstract: COVID-19 resulted in some of the largest supply chain disruptions in recent history. To mitigate the impact of future disruptions, we propose an integrated hybrid simulation framework to couple nonstationary demand signals from an event like COVID-19 with a model of an end-to-end supply chain. We first create a system dynamics susceptible-infected-recovered (SIR) model, augmenting a classic epidemiological model to create a realistic portrayal of demand patterns for oxygen concentrators (OC). Informed by this granular demand signal, we then create a supply chain discrete event simulation model of OC sourcing, manufacturing, and distribution to test production augmentation policies to satisfy this increased demand. This model utilizes publicly available data, engineering teardowns of OCs, and a supply chain illumination to identify suppliers. Our findings indicate that this coupled approach can use realistic demand during a disruptive event to enable rapid recommendations of policies for increased supply chain resilience with controlled cost.

Submitted 15 August, 2023; v1 submitted 28 April, 2023; originally announced May 2023.
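
For readers unfamiliar with the classic SIR model that the abstract above builds on, the following is a minimal sketch of the textbook version, integrated with forward Euler steps. It does not reproduce the paper's augmented system dynamics model or its oxygen concentrator demand signal, which the abstract does not specify. The function name and all parameter values are hypothetical.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Classic SIR dynamics with forward Euler integration.

    beta is the transmission rate and gamma the recovery rate, both per day.
    Returns one (susceptible, infected, recovered) tuple per day,
    including day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters: 1000 people, one initial case, R0 = beta / gamma = 3.
history = simulate_sir(population=1000, initial_infected=1,
                       beta=0.3, gamma=0.1, days=160)
peak_day = max(range(len(history)), key=lambda day: history[day][1])
print(peak_day, round(history[peak_day][1], 1))
```

A time-varying infected curve of this kind is the sort of nonstationary signal that could drive a downstream discrete event simulation, for example by scaling it into a daily demand series.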

arXiv:2303.15583 [pdf, other]

doi 10.1145/3544548.3581416

Measuring Categorical Perception in Color-Coded Scatterplots

Authors: Chin Tseng, Ghulam Jilani Quadri, Zeyu Wang, Danielle Albers Szafir

Abstract: Scatterplots commonly use color to encode categorical data. However, as datasets increase in size and complexity, the efficacy of these channels may vary. Designers lack insight into how robust different design choices are to variations in category numbers. This paper presents a crowdsourced experiment measuring how the number of categories and choice of color encodings used in multiclass scatterplots influences the viewers' abilities to analyze data across classes. Participants estimated relative means in a series of scatterplots with 2 to 10 categories encoded using ten color palettes drawn from popular design tools. Our results show that the number of categories and color discriminability within a color palette notably impact people's perception of categorical data in scatterplots and that the judgments become harder as the number of categories grows. We examine existing palette design heuristics in light of our results to help designers make robust color choices informed by the parameters of their data.

Submitted 27 March, 2023; originally announced March 2023.

Comments: The paper has been accepted to the ACM CHI 2023. 14 pages, 7 figures

arXiv:2210.04029 [pdf, other]

EDU-level Extractive Summarization with Varying Summary Lengths

Authors: Yuping Wu, Ching-Hsun Tseng, Jiayu Shang, Shengzhong Mao, Goran Nenadic, Xiao-Jun Zeng

Abstract: Extractive models usually formulate text summarization as extracting fixed top-$k$ salient sentences from the document as a summary. Few works exploited extracting finer-grained Elementary Discourse Unit (EDU) with little analysis and justification for the extractive unit selection. Further, the selection strategy of the fixed top-$k$ salient sentences fits the summarization need poorly, as the number of salient sentences in different documents varies and therefore a common or best $k$ does not exist in reality. To fill these gaps, this paper first conducts the comparison analysis of oracle summaries based on EDUs and sentences, which provides evidence from both theoretical and experimental perspectives to justify and quantify that EDUs make summaries with higher automatic evaluation scores than sentences. Then, considering this merit of EDUs, this paper further proposes an EDU-level extractive model with Varying summary Lengths and develops the corresponding learning algorithm. EDU-VL learns to encode and predict probabilities of EDUs in the document, generate multiple candidate summaries with varying lengths based on various $k$ values, and encode and score candidate summaries, in an end-to-end training manner. Finally, EDU-VL is experimented on single and multi-document benchmark datasets and shows improved performances on ROUGE scores in comparison with state-of-the-art extractive models, and further human evaluation suggests that EDU-constituent summaries maintain good grammaticality and readability.

Submitted 13 March, 2023; v1 submitted 8 October, 2022; originally announced October 2022.

Comments: Accepted to EACL 2023 Findings

arXiv:2209.13507 [pdf, other]

CrossDTR: Cross-view and Depth-guided Transformers for 3D Object Detection

Authors: Ching-Yu Tseng, Yi-Rong Chen, Hsin-Ying Lee, Tsung-Han Wu, Wen-Chin Chen, Winston H. Hsu

Abstract: To achieve accurate 3D object detection at a low cost for autonomous driving, many multi-camera methods have been proposed and solved the occlusion problem of monocular approaches. However, due to the lack of accurate estimated depth, existing multi-camera methods often generate multiple bounding boxes along a ray of depth direction for difficult small objects such as pedestrians, resulting in an extremely low recall. Furthermore, directly applying depth prediction modules to existing multi-camera methods, generally composed of large network architectures, cannot meet the real-time requirements of self-driving applications. To address these issues, we propose Cross-view and Depth-guided Transformers for 3D Object Detection, CrossDTR. First, our lightweight depth predictor is designed to produce precise object-wise sparse depth maps and low-dimensional depth embeddings without extra depth datasets during supervision. Second, a cross-view depth-guided transformer is developed to fuse the depth embeddings as well as image features from cameras of different views and generate 3D bounding boxes. Extensive experiments demonstrated that our method hugely surpassed existing multi-camera methods by 10 percent in pedestrian detection and about 3 percent in overall mAP and NDS metrics. Also, computational analyses showed that our method is 5 times faster than prior approaches. Our codes will be made publicly available at https://github.com/sty61010/CrossDTR.

Submitted 3 February, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

Comments: Accepted by IEEE International Conference on Robotics and Automation (ICRA) 2023. The code is available at https://github.com/sty61010/CrossDTR

arXiv:2208.07256 [pdf, other]

Multi-modal Transformer Path Prediction for Autonomous Vehicle

Authors: Chia Hong Tseng, Jie Zhang, Min-Te Sun, Kazuya Sakai, Wei-Shinn Ku

Abstract: Reasoning about vehicle path prediction is an essential and challenging problem for the safe operation of autonomous driving systems. There exist many research works for path prediction. However, most of them do not use lane information and are not based on the Transformer architecture. By utilizing different types of data collected from sensors equipped on the self-driving vehicles, we propose a path prediction system named Multi-modal Transformer Path Prediction (MTPP) that aims to predict long-term future trajectory of target agents. To achieve more accurate path prediction, the Transformer architecture is adopted in our model. To better utilize the lane information, the lanes which are in opposite direction to target agent are not likely to be taken by the target agent and are consequently filtered out. In addition, consecutive lane chunks are combined to ensure the lane input to be long enough for path prediction. An extensive evaluation is conducted to show the efficacy of the proposed system using nuScene, a real-world trajectory forecasting dataset.

Submitted 15 August, 2022; originally announced August 2022.

Comments: 9 pages, 12 figures, and 5 tables

arXiv:2201.13324 [pdf, other]

Guided Semi-Supervised Non-negative Matrix Factorization on Legal Documents

Authors: Pengyu Li, Christine Tseng, Yaxuan Zheng, Joyce A. Chew, Longxiu Huang, Benjamin Jarman, Deanna Needell

Abstract: Classification and topic modeling are popular techniques in machine learning that extract information from large-scale datasets. By incorporating a priori information such as labels or important features, methods have been developed to perform classification and topic modeling tasks; however, most methods that can perform both do not allow for guidance of the topics or features. In this paper, we propose a method, namely Guided Semi-Supervised Non-negative Matrix Factorization (GSSNMF), that performs both classification and topic modeling by incorporating supervision from both pre-assigned document class labels and user-designed seed words. We test the performance of this method through its application to legal documents provided by the California Innocence Project, a nonprofit that works to free innocent convicted persons and reform the justice system. The results show that our proposed method improves both classification accuracy and topic coherence in comparison to past methods like Semi-Supervised Non-negative Matrix Factorization (SSNMF) and Guided Non-negative Matrix Factorization (Guided NMF).

Submitted 31 January, 2022; originally announced January 2022.

Comments: 14 pages, 4 figures

arXiv:2112.01348 [pdf, other]

3rd Place Solution for NeurIPS 2021 Shifts Challenge: Vehicle Motion Prediction

Authors: Ching-Yu Tseng, Po-Shao Lin, Yu-Jia Liou, Kuan-Chih Huang, Winston H. Hsu

Abstract: Shifts Challenge: Robustness and Uncertainty under Real-World Distributional Shift is a competition held by NeurIPS 2021. The objective of this competition is to search for methods to solve the motion prediction problem in cross-domain. In the real world dataset, It exists variance between input data distribution and ground-true data distribution, which is called the domain shift problem. In this report, we propose a new architecture inspired by state of the art papers. The main contribution is the backbone architecture with self-attention mechanism and predominant loss function. Subsequently, we won 3rd place as shown on the leaderboard.

Submitted 2 December, 2021; originally announced December 2021.

Journal ref: Bayesian Deep Learning Workshop, NeurIPS 2021

arXiv:2110.05280 [pdf]

Multi-institutional Validation of Two-Streamed Deep Learning Method for Automated Delineation of Esophageal Gross Tumor Volume using planning-CT and FDG-PETCT

Authors: Xianghua Ye, Dazhou Guo, Chen-kan Tseng, Jia Ge, Tsung-Min Hung, Ping-Ching Pai, Yanping Ren, Lu Zheng, Xinli Zhu, Ling Peng, Ying Chen, Xiaohua Chen, Chen-Yu Chou, Danni Chen, Jiaze Yu, Yuzhen Chen, Feiran Jiao, Yi Xin, Lingyun Huang, Guotong Xie, Jing Xiao, Le Lu, Senxiang Yan, Dakai Jin, Tsung-Ying Ho

Abstract: Background: The current clinical workflow for esophageal gross tumor volume (GTV) contouring relies on manual delineation of high labor-costs and interuser variability. Purpose: To validate the clinical applicability of a deep learning (DL) multi-modality esophageal GTV contouring model, developed at 1 institution whereas tested at multiple ones. Methods and Materials: We collected 606 esophageal cancer patients from four institutions. 252 institution-1 patients had a treatment planning-CT (pCT) and a pair of diagnostic FDG-PETCT; 354 patients from other 3 institutions had only pCT. A two-streamed DL model for GTV segmentation was developed using pCT and PETCT scans of a 148 patient institution-1 subset. This built model had the flexibility of segmenting GTVs via only pCT or pCT+PETCT combined. For independent evaluation, the rest 104 institution-1 patients behaved as unseen internal testing, and 354 institutions 2-4 patients were used for external testing. We evaluated manual revision degrees by human experts to assess the contour-editing effort. The performance of the deep model was compared against 4 radiation oncologists in a multiuser study with 20 random external patients. Contouring accuracy and time were recorded for the pre-and post-DL assisted delineation process. Results: Our model achieved high segmentation accuracy in internal testing (mean Dice score: 0.81 using pCT and 0.83 using pCT+PET) and generalized well to external evaluation (mean DSC: 0.80). Expert assessment showed that the predicted contours of 88% patients need only minor or no revision. In multi-user evaluation, with the assistance of a deep model, inter-observer variation and required contouring time were reduced by 37.6% and 48.0%, respectively. Conclusions: Deep learning predicted GTV contours were in close agreement with the ground truth and could be adopted clinically with mostly minor or no changes.

Submitted 11 October, 2021; originally announced October 2021.

Comments: 36 pages, 10 figures

arXiv:2110.00199 [pdf]

doi 10.1109/IJCNN55064.2022.9892245

Perturbated Gradients Updating within Unit Space for Deep Learning

Authors: Ching-Hsun. Tseng, Liu-Hsueh. Cheng, Shin-Jye. Lee, Xiaojun Zeng

Abstract: In deep learning, optimization plays a vital role. By focusing on image classification, this work investigates the pros and cons of the widely used optimizers, and proposes a new optimizer: Perturbated Unit Gradient Descent (PUGD) algorithm with extending normalized gradient operation in tensor within perturbation to update in unit space. Via a set of experiments and analyses, we show that PUGD is locally bounded updating, which means the updating from time to time is controlled. On the other hand, PUGD can push models to a flat minimum, where the error remains approximately constant, not only because of the nature of avoiding stationary points in gradient normalization but also by scanning sharpness in the unit ball. From a series of rigorous experiments, PUGD helps models to gain a state-of-the-art Top-1 accuracy in Tiny ImageNet and competitive performances in CIFAR- {10, 100}. We open-source our code at link: https://github.com/hanktseng131415go/PUGD.

Submitted 24 January, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

arXiv:2104.04687 [pdf, other]

Learning from 2D: Contrastive Pixel-to-Point Knowledge Transfer for 3D Pretraining

Authors: Yueh-Cheng Liu, Yu-Kai Huang, Hung-Yueh Chiang, Hung-Ting Su, Zhe-Yu Liu, Chin-Tang Chen, Ching-Yu Tseng, Winston H. Hsu

Abstract: Most 3D neural networks are trained from scratch owing to the lack of large-scale labeled 3D datasets. In this paper, we present a novel 3D pretraining method by leveraging 2D networks learned from rich 2D datasets. We propose the contrastive pixel-to-point knowledge transfer to effectively utilize the 2D information by mapping the pixel-level and point-level features into the same embedding space. Due to the heterogeneous nature between 2D and 3D networks, we introduce the back-projection function to align the features between 2D and 3D to make the transfer possible. Additionally, we devise an upsampling feature projection layer to increase the spatial resolution of high-level 2D feature maps, which enables learning fine-grained 3D representations. With a pretrained 2D network, the proposed pretraining process requires no additional 2D or 3D labeled data, further alleviating the expensive 3D data annotation cost. To the best of our knowledge, we are the first to exploit existing 2D trained weights to pretrain 3D deep neural networks. Our intensive experiments show that the 3D models pretrained with 2D knowledge boost the performances of 3D networks across various real-world 3D downstream tasks.

Submitted 27 December, 2021; v1 submitted 10 April, 2021; originally announced April 2021.

arXiv:2104.02215 [pdf, other]

When Pigs Fly: Contextual Reasoning in Synthetic and Natural Scenes

Authors: Philipp Bomatter, Mengmi Zhang, Dimitar Karev, Spandan Madan, Claire Tseng, Gabriel Kreiman

Abstract: Context is of fundamental importance to both human and machine vision; e.g., an object in the air is more likely to be an airplane than a pig. The rich notion of context incorporates several aspects including physics rules, statistical co-occurrences, and relative object sizes, among others. While previous work has focused on crowd-sourced out-of-context photographs from the web to study scene context, controlling the nature and extent of contextual violations has been a daunting task. Here we introduce a diverse, synthetic Out-of-Context Dataset (OCD) with fine-grained control over scene context. By leveraging a 3D simulation engine, we systematically control the gravity, object co-occurrences and relative sizes across 36 object categories in a virtual household environment. We conducted a series of experiments to gain insights into the impact of contextual cues on both human and machine vision using OCD. We conducted psychophysics experiments to establish a human benchmark for out-of-context recognition, and then compared it with state-of-the-art computer vision models to quantify the gap between the two. We propose a context-aware recognition transformer model, fusing object and contextual information via multi-head attention. Our model captures useful information for contextual reasoning, enabling human-level performance and better robustness in out-of-context conditions compared to baseline models across OCD and other out-of-context datasets. All source code and data are publicly available at https://github.com/kreimanlab/WhenPigsFlyContext

Submitted 11 August, 2021; v1 submitted 5 April, 2021; originally announced April 2021.

Comments: International Conference on Computer Vision (ICCV), 2021

arXiv:2103.08640 [pdf]

doi 10.3390/e24091243

UPANets: Learning from the Universal Pixel Attention Networks

Authors: Ching-Hsun Tseng, Shin-Jye Lee, Jia-Nan Feng, Shengzhong Mao, Yu-Ping Wu, Jia-Yu Shang, Mou-Chung Tseng, Xiao-Jun Zeng

Abstract: Among image classification, skip and densely-connection-based networks have dominated most leaderboards. Recently, from the successful development of multi-head attention in natural language processing, it is sure that now is a time of either using a Transformer-like model or hybrid CNNs with attention. However, the former need a tremendous resource to train, and the latter is in the perfect balance in this direction. In this work, to make CNNs handle global and local information, we proposed UPANets, which equips channel-wise attention with a hybrid skip-densely-connection structure. Also, the extreme-connection structure makes UPANets robust with a smoother loss landscape. In experiments, UPANets surpassed most well-known and widely-used SOTAs with an accuracy of 96.47% in Cifar-10, 80.29% in Cifar-100, and 67.67% in Tiny Imagenet. Most importantly, these performances have high parameters efficiency and only trained in one customer-based GPU. We share implementing code of UPANets in https://github.com/hanktseng131415go/UPANets.

Submitted 22 March, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

arXiv:2103.05927 [pdf]

Deep Sensing of Urban Waterlogging

Authors: Shi-Wei Lo, Jyh-Horng Wu, Jo-Yu Chang, Chien-Hao Tseng, Meng-Wei Lin, Fang-Pang Lin

Abstract: In the monsoon season, sudden flood events occur frequently in urban areas, which hamper the social and economic activities and may threaten the infrastructure and lives. The use of an efficient large-scale waterlogging sensing and information system can provide valuable real-time disaster information to facilitate disaster management and enhance awareness of the general public to alleviate losses during and after flood disasters. Therefore, in this study, a visual sensing approach driven by deep neural networks and information and communication technology was developed to provide an end-to-end mechanism to realize waterlogging sensing and event-location mapping. The use of a deep sensing system in the monsoon season in Taiwan was demonstrated, and waterlogging events were predicted on the island-wide scale. The system could sense approximately 2379 vision sources through an internet of video things framework and transmit the event-location information in 5 min. The proposed approach can sense waterlogging events at a national scale and provide an efficient and highly scalable alternative to conventional waterlogging sensing methods.

Submitted 15 August, 2021; v1 submitted 10 March, 2021; originally announced March 2021.

Comments: 19 pages, 14 figures, under submitting and patenting

Report number: revise-2021-05-25

arXiv:2102.03049 [pdf]

doi 10.1371/journal.pone.0254134

Benchmarking of eight recurrent neural network variants for breath phase and adventitious sound detection on a self-developed open-access lung sound database-HF_Lung_V1

Authors: Fu-Shun Hsu, Shang-Ran Huang, Chien-Wen Huang, Chao-Jung Huang, Yuan-Ren Cheng, Chun-Chieh Chen, Jack Hsiao, Chung-Wei Chen, Li-Chin Chen, Yen-Chun Lai, Bi-Fang Hsu, Nian-Jhen Lin, Wan-Lin Tsai, Yi-Lin Wu, Tzu-Ling Tseng, Ching-Ting Tseng, Yi-Tsun Chen, Feipei Lai

Abstract: A reliable, remote, and continuous real-time respiratory sound monitor with automated respiratory sound analysis ability is urgently required in many clinical scenarios-such as in monitoring disease progression of coronavirus disease 2019-to replace conventional auscultation with a handheld stethoscope. However, a robust computerized respiratory sound analysis algorithm has not yet been validated in practical applications. In this study, we developed a lung sound database (HF_Lung_V1) comprising 9,765 audio files of lung sounds (duration of 15 s each), 34,095 inhalation labels, 18,349 exhalation labels, 13,883 continuous adventitious sound (CAS) labels (comprising 8,457 wheeze labels, 686 stridor labels, and 4,740 rhonchi labels), and 15,606 discontinuous adventitious sound labels (all crackles). We conducted benchmark tests for long short-term memory (LSTM), gated recurrent unit (GRU), bidirectional LSTM (BiLSTM), bidirectional GRU (BiGRU), convolutional neural network (CNN)-LSTM, CNN-GRU, CNN-BiLSTM, and CNN-BiGRU models for breath phase detection and adventitious sound detection. We also conducted a performance comparison between the LSTM-based and GRU-based models, between unidirectional and bidirectional models, and between models with and without a CNN. The results revealed that these models exhibited adequate performance in lung sound analysis. The GRU-based models outperformed, in terms of F1 scores and areas under the receiver operating characteristic curves, the LSTM-based models in most of the defined tasks. Furthermore, all bidirectional models outperformed their unidirectional counterparts. Finally, the addition of a CNN improved the accuracy of lung sound analysis, especially in the CAS detection tasks.

Submitted 12 July, 2022; v1 submitted 5 February, 2021; originally announced February 2021.

Comments: 48 pages, 8 figures. Accepted by PLoS One

Journal ref: PLoS ONE, 2021, 16(7): e0254134

arXiv:2101.07779 [pdf, other]

doi 10.1002/spe.3120

Collaborative Experience between Scientific Software Projects using Agile Scrum Development

Authors: A. L. Baxter, S. Y. BenZvi, W. Bonivento, A. Brazier, M. Clark, A. Coleiro, D. Collom, M. Colomer-Molla, B. Cousins, A. Delgado Orellana, D. Dornic, V. Ekimtcov, S. ElSayed, A. Gallo Rosso, P. Godwin, S. Griswold, A. Habig, S. Horiuchi, D. A. Howell, M. W. G. Johnson, M. Juric, J. P. Kneller, A. Kopec, C. Kopper, V. Kulikovskiy, et al. (27 additional authors not shown)

Abstract: Developing sustainable software for the scientific community requires expertise in software engineering and domain science. This can be challenging due to the unique needs of scientific software, the insufficient resources for software engineering practices in the scientific community, and the complexity of developing for evolving scientific contexts. While open-source software can partially address these concerns, it can introduce complicating dependencies and delay development. These issues can be reduced if scientists and software developers collaborate. We present a case study wherein scientists from the SuperNova Early Warning System collaborated with software developers from the Scalable Cyberinfrastructure for Multi-Messenger Astrophysics project. The collaboration addressed the difficulties of open-source software development, but presented additional risks to each team. For the scientists, there was a concern of relying on external systems and lacking control in the development process. For the developers, there was a risk in supporting a user-group while maintaining core development. These issues were mitigated by creating a second Agile Scrum framework in parallel with the developers' ongoing Agile Scrum process. This Agile collaboration promoted communication, ensured that the scientists had an active role in development, and allowed the developers to evaluate and implement the scientists' software requirements. The collaboration provided benefits for each group: the scientists actuated their development by using an existing platform, and the developers utilized the scientists' use-case to improve their systems. This case study suggests that scientists and software developers can avoid scientific computing issues by collaborating and that Agile Scrum methods can address emergent concerns.

Submitted 2 August, 2022; v1 submitted 19 January, 2021; originally announced January 2021.

Comments: Revisions: in response to peer-review recommendations, most sections have been substantially expanded and reworked, five new figures have been added, and the title has been changed. Results unchanged

arXiv:1911.07349 [pdf, other]

Putting visual object recognition in context

Authors: Mengmi Zhang, Claire Tseng, Gabriel Kreiman

Abstract: Context plays an important role in visual recognition. Recent studies have shown that visual recognition networks can be fooled by placing objects in inconsistent contexts (e.g., a cow in the ocean). To model the role of contextual information in visual recognition, we systematically investigated ten critical properties of where, when, and how context modulates recognition, including the amount of context, context and object resolution, geometrical structure of context, context congruence, and temporal dynamics of contextual modulation. The tasks involved recognizing a target object surrounded with context in a natural image. As an essential benchmark, we conducted a series of psychophysics experiments where we altered one aspect of context at a time, and quantified recognition accuracy. We propose a biologically-inspired context-aware object recognition model consisting of a two-stream architecture. The model processes visual information at the fovea and periphery in parallel, dynamically incorporates object and contextual information, and sequentially reasons about the class label for the target object. Across a wide range of behavioral tasks, the model approximates human level performance without retraining for each task, captures the dependence of context enhancement on image properties, and provides initial steps towards integrating scene and object information for visual recognition. All source code and data are publicly available: https://github.com/kreimanlab/Put-In-Context.

Submitted 25 March, 2020; v1 submitted 17 November, 2019; originally announced November 2019.

Comments: 8 pages, CVPR2020

arXiv:1909.01526 [pdf, other]

Deep Esophageal Clinical Target Volume Delineation using Encoded 3D Spatial Context of Tumors, Lymph Nodes, and Organs At Risk

Authors: Dakai Jin, Dazhou Guo, Tsung-Ying Ho, Adam P. Harrison, Jing Xiao, Chen-kan Tseng, Le Lu

Abstract: Clinical target volume (CTV) delineation from radiotherapy computed tomography (RTCT) images is used to define the treatment areas containing the gross tumor volume (GTV) and/or sub-clinical malignant disease for radiotherapy (RT). High intra- and inter-user variability makes this a particularly difficult task for esophageal cancer. This motivates automated solutions, which is the aim of our work. Because CTV delineation is highly context-dependent--it must encompass the GTV and regional lymph nodes (LNs) while also avoiding excessive exposure to the organs at risk (OARs)--we formulate it as a deep contextual appearance-based problem using encoded spatial contexts of these anatomical structures. This allows the deep network to better learn from and emulate the margin- and appearance-based delineation performed by human physicians. Additionally, we develop domain-specific data augmentation to inject robustness to our system. Finally, we show that a simple 3D progressive holistically nested network (PHNN), which avoids computationally heavy decoding paths while still aggregating features at different levels of context, can outperform more complicated networks. Cross-validated experiments on a dataset of 135 esophageal cancer patients demonstrate that our encoded spatial context approach can produce concrete performance improvements, with an average Dice score of 83.9% and an average surface distance of 4.2 mm, representing improvements of 3.8% and 2.4 mm, respectively, over the state-of-the-art approach.

Submitted 5 September, 2019; v1 submitted 3 September, 2019; originally announced September 2019.

Comments: MICCAI 2019 (early accept)

arXiv:1909.01524 [pdf, other]

Accurate Esophageal Gross Tumor Volume Segmentation in PET/CT using Two-Stream Chained 3D Deep Network Fusion

Authors: Dakai Jin, Dazhou Guo, Tsung-Ying Ho, Adam P. Harrison, Jing Xiao, Chen-kan Tseng, Le Lu

Abstract: Gross tumor volume (GTV) segmentation is a critical step in esophageal cancer radiotherapy treatment planning. Inconsistencies across oncologists and prohibitive labor costs motivate automated approaches for this task. However, leading approaches are only applied to radiotherapy computed tomography (RTCT) images taken prior to treatment. This limits the performance as RTCT suffers from low contrast between the esophagus, tumor, and surrounding tissues. In this paper, we aim to exploit both RTCT and positron emission tomography (PET) imaging modalities to facilitate more accurate GTV segmentation. By utilizing PET, we emulate medical professionals who frequently delineate GTV boundaries through observation of the RTCT images obtained after prescribing radiotherapy and PET/CT images acquired earlier for cancer staging. To take advantage of both modalities, we present a two-stream chained segmentation approach that effectively fuses the CT and PET modalities via early and late 3D deep-network-based fusion. Furthermore, to effect the fusion and segmentation we propose a simple yet effective progressive semantically nested network (PSNN) model that outperforms more complicated models. Extensive 5-fold cross-validation on 110 esophageal cancer patients, the largest analysis to date, demonstrates that both the proposed two-stream chained segmentation pipeline and the PSNN model can significantly improve the quantitative performance over the previous state-of-the-art work by 11% in absolute Dice score (DSC) (from 0.654 to 0.764) and, at the same time, reducing the Hausdorff distance from 129 mm to 47 mm.

Submitted 5 September, 2019; v1 submitted 3 September, 2019; originally announced September 2019.

Comments: MICCAI 2019 (early accept and oral presentation)

arXiv:1904.03086 [pdf, other]

Radiotherapy Target Contouring with Convolutional Gated Graph Neural Network

Authors: Chun-Hung Chao, Yen-Chi Cheng, Hsien-Tzu Cheng, Chi-Wen Huang, Tsung-Ying Ho, Chen-Kan Tseng, Le Lu, Min Sun

Abstract: Tomography medical imaging is essential in the clinical workflow of modern cancer radiotherapy. Radiation oncologists identify cancerous tissues, applying delineation on treatment regions throughout all image slices. This kind of task is often formulated as a volumetric segmentation task by means of 3D convolutional networks with considerable computational cost. Instead, inspired by the treating methodology of considering meaningful information across slices, we used Gated Graph Neural Network to frame this problem more efficiently. More specifically, we propose convolutional recurrent Gated Graph Propagator (GGP) to propagate high-level information through image slices, with learnable adjacency weighted matrix. Furthermore, as physicians often investigate a few specific slices to refine their decision, we model this slice-wise interaction procedure to further improve our segmentation result. This can be set by editing any slice effortlessly as updating predictions of other slices using GGP. To evaluate our method, we collect an Esophageal Cancer Radiotherapy Target Treatment Contouring dataset of 81 patients which includes tomography images with radiotherapy target. On this dataset, our convolutional graph network produces state-of-the-art results and outperforms the baselines. With the addition of interactive setting, performance is improved even further. Our method has the potential to be easily applied to diverse kinds of medical tasks with volumetric images. Incorporating both the ability to make a feasible prediction and to consider the human interactive input, the proposed method is suitable for clinical scenarios.

Submitted 5 April, 2019; originally announced April 2019.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018. Version 2

arXiv:1902.00163 [pdf, other]

Lift-the-flap: what, where and when for context reasoning

Authors: Mengmi Zhang, Claire Tseng, Karla Montejo, Joseph Kwon, Gabriel Kreiman

Abstract: Context reasoning is critical in a wide variety of applications where current inputs need to be interpreted in the light of previous experience and knowledge. Both spatial and temporal contextual information play a critical role in the domain of visual recognition. Here we investigate spatial constraints (what image features provide contextual information and where they are located), and temporal constraints (when different contextual cues matter) for visual recognition. The task is to reason about the scene context and infer what a target object hidden behind a flap is in a natural image. To tackle this problem, we first describe an online human psychophysics experiment recording active sampling via mouse clicks in lift-the-flap games and identify clicking patterns and features which are diagnostic for high contextual reasoning accuracy. As a proof of the usefulness of these clicking patterns and visual features, we extend a state-of-the-art recurrent model capable of attending to salient context regions, dynamically integrating useful information, making inferences, and predicting class label for the target object over multiple clicks. The proposed model achieves human-level contextual reasoning accuracy, shares human-like sampling behavior and learns interpretable features for contextual reasoning.

Submitted 24 September, 2019; v1 submitted 31 January, 2019; originally announced February 2019.

arXiv:1801.08650 [pdf]

doi 10.1109/FUZZ-IEEE.2018.8491610

Ontology-based Fuzzy Markup Language Agent for Student and Robot Co-Learning

Authors: Chang-Shing Lee, Mei-Hui Wang, Tzong-Xiang Huang, Li-Chung Chen, Yung-Ching Huang, Sheng-Chi Yang, Chien-Hsun Tseng, Pi-Hsia Hung, Naoyuki Kubota

Abstract: An intelligent robot agent based on domain ontology, machine learning mechanism, and Fuzzy Markup Language (FML) for students and robot co-learning is presented in this paper. The machine-human co-learning model is established to help various students learn the mathematical concepts based on their learning ability and performance. Meanwhile, the robot acts as a teacher's assistant to co-learn with children in the class. The FML-based knowledge base and rule base are embedded in the robot so that the teachers can get feedback from the robot on whether students make progress or not. Next, we inferred students' learning performance based on learning content's difficulty and students' ability, concentration level, as well as teamwork sprit in the class. Experimental results show that learning with the robot is helpful for disadvantaged and below-basic children. Moreover, the accuracy of the intelligent FML-based agent for student learning is increased after machine learning mechanism.

Submitted 25 January, 2018; originally announced January 2018.

Comments: This paper is submitted to IEEE WCCI 2018 Conference for review

arXiv:1709.08463 [pdf, other]

doi 10.1109/TITS.2018.2839265

Improving Viability of Electric Taxis by Taxi Service Strategy Optimization: A Big Data Study of New York City

Authors: Chien-Ming Tseng, Sid Chi-Kin Chau, Xue Liu

Abstract: Electrification of transportation is critical for a low-carbon society. In particular, public vehicles (e.g., taxis) provide a crucial opportunity for electrification. Despite the benefits of eco-friendliness and energy efficiency, adoption of electric taxis faces several obstacles, including constrained driving range, long recharging duration, limited charging stations and low gas price, all of which impede taxi drivers' decisions to switch to electric taxis. On the other hand, the popularity of ride-hailing mobile apps facilitates the computerization and optimization of taxi service strategies, which can provide computer-assisted decisions of navigation and roaming for taxi drivers to locate potential customers. This paper examines the viability of electric taxis with the assistance of taxi service strategy optimization, in comparison with conventional taxis with internal combustion engines. A big data study is provided using a large dataset of real-world taxi trips in New York City. Our methodology is to first model the computerized taxi service strategy by Markov Decision Process (MDP), and then obtain the optimized taxi service strategy based on NYC taxi trip dataset. The profitability of electric taxi drivers is studied empirically under various battery capacity and charging conditions. Consequently, we shed light on the solutions that can improve viability of electric taxis.

Submitted 18 May, 2018; v1 submitted 25 September, 2017; originally announced September 2017.

Comments: This paper appears in IEEE Transactions on Intelligent Transportation Systems

Journal ref: IEEE Transactions on Intelligent Transportation Systems, Vol. 20, No. 3, pp. 817-829, Mar 2019

arXiv:1703.10049 [pdf, other]

doi 10.1109/TASE.2022.3175565

Autonomous Recharging and Flight Mission Planning for Battery-operated Autonomous Drones

Authors: Rashid Alyassi, Majid Khonji, Areg Karapetyan, Sid Chi-Kin Chau, Khaled Elbassioni, Chien-Ming Tseng

Abstract: Unmanned aerial vehicles (UAVs), commonly known as drones, are being increasingly deployed throughout the globe as a means to streamline monitoring, inspection, mapping, and logistic routines. When dispatched on autonomous missions, drones require an intelligent decision-making system for trajectory planning and tour optimization. Given the limited capacity of their onboard batteries, a key design challenge is to ensure the underlying algorithms can efficiently optimize the mission objectives along with recharging operations during long-haul flights. With this in view, the present work undertakes a comprehensive study on automated tour management systems for an energy-constrained drone: (1) We construct a machine learning model that estimates the energy expenditure of typical multi-rotor drones while accounting for real-world aspects and extrinsic meteorological factors. (2) Leveraging this model, the joint program of flight mission planning and recharging optimization is formulated as a multi-criteria Asymmetric Traveling Salesman Problem (ATSP), wherein a drone seeks for the time-optimal energy-feasible tour that visits all the target sites and refuels whenever necessary. (3) We devise an efficient approximation algorithm with provable worst-case performance guarantees and implement it in a drone management system, which supports real-time flight path tracking and re-computation in dynamic environments. (4) The effectiveness and practicality of the proposed approach are validated through extensive numerical simulations as well as real-world experiments.

Submitted 19 April, 2022; v1 submitted 29 March, 2017; originally announced March 2017.

Journal ref: IEEE Transactions on Automation Science and Engineering, vol. 20, no. 2, pp. 1034-1046, April 2023

arXiv:1611.01032 [pdf, other]

doi 10.1109/TITS.2017.2691606

Drive Mode Optimization and Path Planning for Plug-in Hybrid Electric Vehicles

Authors: Chi-Kin Chau, Khaled Elbassioni, Chien-Ming Tseng

Abstract: Drive modes are driver-selectable pre-set configurations of powertrain and certain vehicle parameters. Plug-in hybrid electric vehicles (PHEVs) typically feature special options of drive modes that can affect the hybrid energy source management system, for example, electric vehicle (EV) mode (that draws fully on battery) and charge sustaining (CS) mode (that utilizes internal combustion engine to charge battery while propelling the vehicle). This paper studies an optimization problem to enable the driver to select the appropriate drive modes for fuel minimization. We develop optimization algorithms that optimize the decisions of drive modes based on trip information, and integrated with path planning to find an optimal path, considering intermediate filling and charging stations. We further provide an online algorithm that is based on the revealed trip information. We evaluate our algorithms empirically on a Chevrolet Volt, which shows significant fuel savings.

Submitted 4 April, 2017; v1 submitted 2 November, 2016; originally announced November 2016.

Comments: To appear in IEEE Transactions on Intelligent Transportation Systems

Journal ref: IEEE Transactions on Intelligent Transportation Systems (Volume: 18, Issue: 12, Dec. 2017), pp. 3421-3432

arXiv:1610.00171 [pdf, other]

doi 10.1109/TITS.2017.2672880

Personalized Prediction of Vehicle Energy Consumption based on Participatory Sensing

Authors: Chien-Ming Tseng, Chi-Kin Chau

Abstract: The advent of abundant on-board sensors and electronic devices in vehicles populates the paradigm of participatory sensing to harness crowd-sourced data gathering for intelligent transportation applications, such as distance-to-empty prediction and eco-routing. While participatory sensing can provide diverse driving data, there lacks a systematic study of effective utilization of the data for personalized prediction. There are considerable challenges on how to interpolate the missing data from a sparse dataset, which often arises from participatory sensing. This paper presents and compares various approaches for personalized vehicle energy consumption prediction, including a blackbox framework that identifies driver/vehicle/environment-dependent factors and a collaborative filtering approach based on matrix factorization. Furthermore, a case study of distance-to-empty prediction for electric vehicles by participatory sensing data is conducted and evaluated empirically, which shows that our approaches can significantly improve the prediction accuracy.

Submitted 20 February, 2017; v1 submitted 1 October, 2016; originally announced October 2016.

Comments: To appear in IEEE Transactions on Intelligent Transportation Systems

Journal ref: IEEE Transactions on Intelligent Transportation Systems (Volume: 18, Issue: 11, Nov. 2017), pp. 3103-3113

arXiv:1311.7225

Link Quality Control Mechanism for Selective and Opportunistic AF Relaying in Cooperative ARQs: A MLSD Perspective

Authors: Chun-Kai Tseng, Sau-Hsuan Wu

Abstract: Incorporating relaying techniques into Automatic Repeat reQuest (ARQ) mechanisms gives a general impression of diversity and throughput enhancements. Allowing overhearing among multiple relays is also a known approach to increase the number of participating relays in ARQs. However, when opportunistic amplify-and-forward (AF) relaying is applied to cooperative ARQs, the system design becomes nontrivial and even involved. Based on outage analysis, the spatial and temporal diversities are first found sensitive to the received signal qualities of relays, and a link quality control mechanism is then developed to prescreen candidate relays in order to explore the diversity of cooperative ARQs with a selective and opportunistic AF (SOAF) relaying method. According to the analysis, the temporal and spatial diversities can be fully exploited if proper thresholds are set for each hop along the relaying routes. The SOAF relaying method is further examined from a packet delivery viewpoint. By the principle of the maximum likelihood sequence detection (MLSD), sufficient conditions on the link quality are established for the proposed SOAF-relaying-based ARQ scheme to attain its potential diversity order in the packet error rates (PERs) of MLSD. The conditions depend on the minimum codeword distance and the average signal-to-noise ratio (SNR). Furthermore, from a heuristic viewpoint, we also develop a threshold searching algorithm for the proposed SOAF relaying and link quality method to exploit both the diversity and the SNR gains in PER. The effectiveness of the proposed thresholding mechanism is verified via simulations with trellis codes.

Submitted 1 February, 2016; v1 submitted 28 November, 2013; originally announced November 2013.

Comments: This paper has been withdrawn by the authors due to an improper proof for Theorem 2. To avoid a misleading understanding, we thus decide to withdraw this paper

arXiv:1308.5168 [pdf, ps, other]

Is Somebody Watching Your Facebook Newsfeed?

Authors: Shan-Hung Wu, Man-Ju Chou, Ming-Hung Wang, Chun-Hsiung Tseng, Yuh-Jye Lee, Kuan-Ta Chen

Abstract: With the popularity of Social Networking Services (SNS), more and more sensitive information is stored online and associated with SNS accounts. The obvious value of SNS accounts motivates the usage stealing problem -- unauthorized, stealthy use of SNS accounts on the devices owned/used by account owners without any technology hacks. For example, anxious parents may use their kids' SNS accounts to inspect the kids' social status; husbands/wives may use their spouses' SNS accounts to spot possible affairs. Usage stealing could happen anywhere in any form, and seriously invades the privacy of account owners. However, there is no currently known defense against such usage stealing. To an SNS operator (e.g., Facebook Inc.), usage stealing is hard to detect using traditional methods because such attackers come from the same IP addresses/devices, use the same credentials, and share the same accounts as the owners do. In this paper, we propose a novel continuous authentication approach that analyzes user browsing behavior to detect SNS usage stealing incidents. We use Facebook as a case study and show that it is possible to detect such incidents by analyzing SNS browsing behavior. Our experiment results show that our proposal can achieve higher than 80% detection accuracy within 2 minutes, and higher than 90% detection accuracy after 7 minutes of observation time.

Submitted 23 August, 2013; originally announced August 2013.

arXiv:1010.5691 [pdf, ps, other]

A Bio-Inspired Robust Adaptive Random Search Algorithm for Distributed Beamforming

Authors: Chia-Shiang Tseng, Chang-Ching Chen, Che Lin

Abstract: A bio-inspired robust adaptive random search algorithm (BioRARSA), designed for distributed beamforming for sensor and relay networks, is proposed in this work. It has been shown via a systematic framework that BioRARSA converges in probability and its convergence time scales linearly with the number of distributed transmitters. More importantly, extensive simulation results demonstrate that the proposed BioRARSA outperforms existing adaptive distributed beamforming schemes by as large as 29.8% on average. This increase in performance results from the fact that BioRARSA can adaptively adjust its sampling stepsize via the "swim" behavior inspired by the bacterial foraging mechanism. Hence, the convergence time of BioRARSA is insensitive to the initial sampling stepsize of the algorithm, which makes it robust against the dynamic nature of distributed wireless networks.

Submitted 15 February, 2011; v1 submitted 27 October, 2010; originally announced October 2010.

Comments: 6 pages, 5 figures, In proc. ICC 2011

arXiv:0808.4160 [pdf, ps, other]

doi 10.1016/j.physa.2008.08.035

Using Relative Entropy to Find Optimal Approximations: an Application to Simple Fluids

Authors: Chih-Yuan Tseng, Ariel Caticha

Abstract: We develop a maximum relative entropy formalism to generate optimal approximations to probability distributions. The central results consist in (a) justifying the use of relative entropy as the uniquely natural criterion to select a preferred approximation from within a family of trial parameterized distributions, and (b) to obtain the optimal approximation by marginalizing over parameters using the method of maximum entropy and information geometry. As an illustration we apply our method to simple fluids. The "exact" canonical distribution is approximated by that of a fluid of hard spheres. The proposed method first determines the preferred value of the hard-sphere diameter, and then obtains an optimal hard-sphere approximation by a suitably weighed average over different hard-sphere diameters. This leads to a considerable improvement in accounting for the soft-core nature of the interatomic potential. As a numerical demonstration, the radial distribution function and the equation of state for a Lennard-Jones fluid (argon) are compared with results from molecular dynamics simulations.

Submitted 29 August, 2008; originally announced August 2008.

Comments: 5 figures, accepted for publication in Physica A, 2008

Journal ref: Physica A387, 6759 (2008)

Showing 1–39 of 39 results for author: Tseng, C