Search | arXiv e-print repository

Approximately counting maximal independent set is equivalent to #SAT

Abstract: A maximal independent set is an independent set that is not a subset of any other independent set. It is also a key problem in mathematics, computer science, and other fields. A counting problem is a type of computational problem that is associated with the number of solutions. Besides, counting problems help us better understand several fields such as algorithm analysis, complexity theory, and artificial intelligence. The problem of counting maximal independent sets is #P-complete, so it is natural to consider approximate counting for the maximal independent sets problem. In this article, we study the complexity of approximately counting maximal independent sets. Specifically, we are the first to prove that the #MIS problem on a given general graph is AP-interreducible with #SAT.

Submitted 13 September, 2024; v1 submitted 11 September, 2024; originally announced September 2024.

Comments: After discussion, this is already known in JCSS (arXiv:1411.6829), which proves that approximately counting MIS in bipartite graphs is equivalent to #SAT under AP-reductions. That result is stronger because it is restricted to bipartite graphs, and it implies the result for general graphs. Therefore, this paper tends to be more of a direct proof exercise.
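
To make the definition in the abstract above concrete, the following is a minimal brute-force sketch that counts maximal independent sets exactly on a tiny graph. It is not the paper's method (the paper concerns the complexity of approximate counting); the function name `count_maximal_independent_sets` and the edge-list input format are assumptions made only for this illustration, and the enumeration is exponential in the number of vertices.

```python
def count_maximal_independent_sets(n, edges):
    """Count maximal independent sets of a simple graph on vertices 0..n-1.

    Hypothetical helper for illustration only: it enumerates all 2^n vertex
    subsets, so it is usable only for very small graphs.
    """
    # Build adjacency sets from the undirected edge list.
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    count = 0
    for mask in range(1 << n):
        chosen = {v for v in range(n) if (mask >> v) & 1}
        # Independent: no chosen vertex has a chosen neighbour.
        if not all(adj[v].isdisjoint(chosen) for v in chosen):
            continue
        # Maximal: every vertex outside the set has a neighbour inside it,
        # so no further vertex can be added without breaking independence.
        if all(adj[v] & chosen for v in range(n) if v not in chosen):
            count += 1
    return count


if __name__ == "__main__":
    # Path 0-1-2 has exactly two maximal independent sets: {1} and {0, 2}.
    print(count_maximal_independent_sets(3, [(0, 1), (1, 2)]))
```

Exact counting by enumeration like this does not scale, which is consistent with the abstract's motivation for studying approximate counting.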

arXiv:2409.01994 [pdf, other]

BinPRE: Enhancing Field Inference in Binary Analysis Based Protocol Reverse Engineering

Authors: Jiayi Jiang, Xiyuan Zhang, Chengcheng Wan, Haoyi Chen, Haiying Sun, Ting Su

Abstract: Protocol reverse engineering (PRE) aims to infer the specification of network protocols when the source code is not available. Specifically, field inference is one crucial step in PRE to infer the field formats and semantics. To perform field inference, binary analysis based PRE techniques are one major approach category. However, such techniques face two key challenges - (1) the format inference is fragile when the logic of processing input messages varies among different protocol implementations, and (2) the semantic inference is limited by inadequate and inaccurate inference rules. To tackle these challenges, we present BinPRE, a binary analysis based PRE tool. BinPRE incorporates (1) an instruction-based semantic similarity analysis strategy for format extraction; (2) a novel library composed of atomic semantic detectors for improving semantic inference adequacy; and (3) a cluster-and-refine paradigm to further improve semantic inference accuracy. We have evaluated BinPRE against five existing PRE tools, including Polyglot, AutoFormat, Tupni, BinaryInferno and DynPRE. The evaluation results on eight widely-used protocols show that BinPRE outperforms the prior PRE tools in both format and semantic inference. BinPRE achieves the perfection of 0.73 on format extraction and the F1-score of 0.74 (0.81) on semantic inference of types (functions), respectively. The field inference results of BinPRE have helped improve the effectiveness of protocol fuzzing by achieving 5-29% higher branch coverage, compared to those of the best prior PRE tool. BinPRE has also helped discover one new zero-day vulnerability, which otherwise cannot be found.

Submitted 3 September, 2024; originally announced September 2024.

Comments: Accepted by ACM Conference on Computer and Communications Security (CCS) 2024

arXiv:2408.13855 [pdf, other]

An Empirical Study of False Negatives and Positives of Static Code Analyzers From the Perspective of Historical Issues

Authors: Han Cui, Menglei Xie, Ting Su, Chengyu Zhang, Shin Hwei Tan

Abstract: Static code analyzers are widely used to help find program flaws. However, in practice the effectiveness and usability of such analyzers are affected by the problems of false negatives (FNs) and false positives (FPs). This paper aims to investigate the FNs and FPs of such analyzers from a new perspective, i.e., examining the historical issues of FNs and FPs of these analyzers reported by the maintainers, users and researchers in their issue repositories -- each of these issues manifested as a FN or FP of these analyzers in the history and has already been confirmed and fixed by the analyzers' developers. To this end, we conduct the first systematic study on a broad range of 350 historical issues of FNs/FPs from three popular static code analyzers (i.e., PMD, SpotBugs, and SonarQube). All these issues have been confirmed and fixed by the developers. We investigated these issues' root causes and the characteristics of the corresponding issue-triggering programs. It reveals several new interesting findings and implications on mitigating FNs and FPs. Furthermore, guided by some findings of our study, we designed a metamorphic testing strategy to find FNs and FPs. This strategy successfully found 14 new issues of FNs/FPs, 11 of which have been confirmed and 9 have already been fixed by the developers. Our further manual investigation of the studied analyzers revealed one rule specification issue and four additional FNs/FPs due to the weaknesses of the implemented static analysis. We have made all the artifacts (datasets and tools) publicly available at https://zenodo.org/doi/10.5281/zenodo.11525129.

Submitted 25 August, 2024; originally announced August 2024.

arXiv:2407.20773 [pdf]

UpDown: Programmable fine-grained Events for Scalable Performance on Irregular Applications

Authors: Andronicus Rajasukumar, Jiya Su, Yuqing Wang, Tianshuo Su, Marziyeh Nourian, Jose M Monsalve Diaz, Tianchi Zhang, Jianru Ding, Wenyi Wang, Ziyi Zhang, Moubarak Jeje, Henry Hoffmann, Yanjing Li, Andrew A. Chien

Abstract: Applications with irregular data structures, data-dependent control flows and fine-grained data transfers (e.g., real-world graph computations) perform poorly on cache-based systems. We propose the UpDown accelerator that supports fine-grained execution with novel architecture mechanisms - lightweight threading, event-driven scheduling, efficient ultra-short threads, and split-transaction DRAM access with software-controlled synchronization. These hardware primitives support software programmable events, enabling high performance on diverse data structures and algorithms. UpDown also supports scalable performance; hardware replication enables programs to scale up performance. Evaluation results show UpDown's flexibility and scalability enable it to outperform CPUs on graph mining and analytics computations by up to 116-195x geomean speedup and more than 4x speedup over prior accelerators. We show that UpDown generates high memory parallelism (~4.6x over CPU) required for memory intensive graph computations. We present measurements that attribute the performance of UpDown (23x architectural advantage) to its individual architectural mechanisms. Finally, we also analyze the area and power cost of UpDown's mechanisms for software programmability.

Submitted 30 July, 2024; originally announced July 2024.

Comments: 14 pages, 23 figures

arXiv:2407.19625 [pdf, other]

LoginMEA: Local-to-Global Interaction Network for Multi-modal Entity Alignment

Authors: Taoyu Su, Xinghua Zhang, Jiawei Sheng, Zhenyu Zhang, Tingwen Liu

Abstract: Multi-modal entity alignment (MMEA) aims to identify equivalent entities between two multi-modal knowledge graphs (MMKGs), whose entities can be associated with relational triples and related images. Most previous studies treat the graph structure as a special modality, and fuse different modality information with separate uni-modal encoders, neglecting valuable relational associations in modalities. Other studies refine each uni-modal information with graph structures, but may introduce unnecessary relations in specific modalities. To this end, we propose a novel local-to-global interaction network for MMEA, termed LoginMEA. Particularly, we first fuse local multi-modal interactions to generate holistic entity semantics and then refine them with global relational interactions of entity neighbors. In this design, the uni-modal information is fused adaptively, and can be refined with relations accordingly. To enrich local interactions of multi-modal entity information, we devise modality weights and low-rank interactive fusion, allowing diverse impacts and element-level interactions among modalities. To capture global interactions of graph structures, we adopt relation reflection graph attention networks, which fully capture relational associations between entities. Extensive experiments demonstrate superior results of our method over 5 cross-KG or bilingual benchmark datasets, indicating the effectiveness of capturing local and global interactions.

Submitted 28 July, 2024; originally announced July 2024.

Comments: Accepted by ECAI 2024

arXiv:2407.19302 [pdf, other]

IBMEA: Exploring Variational Information Bottleneck for Multi-modal Entity Alignment

Authors: Taoyu Su, Jiawei Sheng, Shicheng Wang, Xinghua Zhang, Hongbo Xu, Tingwen Liu

Abstract: Multi-modal entity alignment (MMEA) aims to identify equivalent entities between multi-modal knowledge graphs (MMKGs), where the entities can be associated with related images. Most existing studies integrate multi-modal information heavily relying on the automatically-learned fusion module, rarely suppressing the redundant information for MMEA explicitly. To this end, we explore variational information bottleneck for multi-modal entity alignment (IBMEA), which emphasizes the alignment-relevant information and suppresses the alignment-irrelevant information in generating entity representations. Specifically, we devise multi-modal variational encoders to generate modal-specific entity representations as probability distributions. Then, we propose four modal-specific information bottleneck regularizers, limiting the misleading clues in refining modal-specific entity representations. Finally, we propose a modal-hybrid information contrastive regularizer to integrate all the refined modal-specific representations, enhancing the entity similarity between MMKGs to achieve MMEA. We conduct extensive experiments on two cross-KG and three bilingual MMEA datasets. Experimental results demonstrate that our model consistently outperforms previous state-of-the-art methods, and also shows promising and robust performance in low-resource and high-noise data scenarios.

Submitted 27 July, 2024; originally announced July 2024.

Comments: Accepted by ACM MM 2024

arXiv:2407.18955 [pdf, other]

Real Face Video Animation Platform

Authors: Xiaokai Chen, Xuan Liu, Donglin Di, Yongjia Ma, Wei Chen, Tonghua Su

Abstract: In recent years, facial video generation models have gained popularity. However, these models often lack expressive power when dealing with exaggerated anime-style faces due to the absence of high-quality anime-style face training sets. We propose a facial animation platform that enables real-time conversion from real human faces to cartoon-style faces, supporting multiple models. Built on the Gradio framework, our platform ensures excellent interactivity and user-friendliness. Users can input a real face video or image and select their desired cartoon style. The system will then automatically analyze facial features, execute necessary preprocessing, and invoke appropriate models to generate expressive anime-style faces. We employ a variety of models within our system to process the HDTF dataset, thereby creating an animated facial video dataset.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.08949 [pdf, other]

One-Shot Pose-Driving Face Animation Platform

Authors: He Feng, Donglin Di, Yongjia Ma, Wei Chen, Tonghua Su

Abstract: The objective of face animation is to generate dynamic and expressive talking head videos from a single reference face, utilizing driving conditions derived from either video or audio inputs. Current approaches often require fine-tuning for specific identities and frequently fail to produce expressive videos due to the limited effectiveness of Wav2Pose modules. To facilitate the generation of one-shot and more consecutive talking head videos, we refine an existing Image2Video model by integrating a Face Locator and Motion Frame mechanism. We subsequently optimize the model using extensive human face video datasets, significantly enhancing its ability to produce high-quality and expressive talking head videos. Additionally, we develop a demo platform using the Gradio framework, which streamlines the process, enabling users to quickly create customized talking head videos.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.08234 [pdf, other]

Model Predictive Control For Mobile Manipulators Based On Neural Dynamics (Extended version)

Authors: Tao Su, Shiqi Zheng

Abstract: This article focuses on the trajectory tracking problem of mobile manipulators (MMs). Firstly, we construct a position and orientation model predictive tracking control (POMPTC) scheme for mobile manipulators. The proposed POMPTC scheme can simultaneously minimize the tracking error, joint velocity, and joint acceleration. Moreover, it can achieve synchronous control for the position and orientation of the end-effector. Secondly, a finite-time convergent neural dynamics (FTCND) model is constructed to find the optimal solution of the POMPTC scheme. Then, based on the proposed POMPTC scheme, a non-singular fast terminal sliding mode (NFTSM) control method is presented, which considers the disturbances caused by the base motion on the manipulator at the dynamic level. It can achieve finite-time tracking performance and improve the anti-disturbance ability. Finally, simulation and experiments show that the proposed control method has the advantages of strong robustness, fast convergence, and high control accuracy.

Submitted 11 July, 2024; originally announced July 2024.

Comments: This article consists of 13 pages, including the text and the proof process

arXiv:2407.05138 [pdf, other]

Vortex under Ripplet: An Empirical Study of RAG-enabled Applications

Authors: Yuchen Shao, Yuheng Huang, Jiawei Shen, Lei Ma, Ting Su, Chengcheng Wan

Abstract: Large language models (LLMs) enhanced by retrieval-augmented generation (RAG) provide effective solutions in various application scenarios. However, developers face challenges in integrating RAG-enhanced LLMs into software systems, due to lack of interface specification, requirements from software context, and complicated system management. In this paper, we manually studied 100 open-source applications that incorporate RAG-enhanced LLMs, and their issue reports. We have found that more than 98% of applications contain multiple integration defects that harm software functionality, efficiency, and security. We have also generalized 19 defect patterns and proposed guidelines to tackle them. We hope this work could aid LLM-enabled software development and motivate future research.

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.01636 [pdf, other]

Learning Frequency-Aware Dynamic Transformers for All-In-One Image Restoration

Authors: Zenglin Shi, Tong Su, Pei Liu, Yunpeng Wu, Le Zhang, Meng Wang

Abstract: This work aims to tackle the all-in-one image restoration task, which seeks to handle multiple types of degradation with a single model. The primary challenge is to extract degradation representations from the input degraded images and use them to guide the model's adaptation to specific degradation types. Recognizing that various degradations affect image content differently across frequency bands, we propose a new all-in-one image restoration approach from a frequency perspective, leveraging advanced vision transformers. Our method consists of two main components: a frequency-aware Degradation prior learning transformer (Dformer) and a degradation-adaptive Restoration transformer (Rformer). The Dformer captures the essential characteristics of various degradations by decomposing inputs into different frequency components. By understanding how degradations affect these frequency components, the Dformer learns robust priors that effectively guide the restoration process. The Rformer then employs a degradation-adaptive self-attention module to selectively focus on the most affected frequency components, guided by the learned degradation representations. Extensive experimental results demonstrate that our approach outperforms the existing methods on four representative restoration tasks, including denoising, deraining, dehazing and deblurring. Additionally, our method offers benefits for handling spatially variant degradations and unseen degradation levels.

Submitted 30 June, 2024; originally announced July 2024.

Comments: 8 pages

arXiv:2406.00644 [pdf, other]

Ultrasound Report Generation with Cross-Modality Feature Alignment via Unsupervised Guidance

Authors: Jun Li, Tongkun Su, Baoliang Zhao, Faqin Lv, Qiong Wang, Nassir Navab, Ying Hu, Zhongliang Jiang

Abstract: Automatic report generation has arisen as a significant research area in computer-aided diagnosis, aiming to alleviate the burden on clinicians by generating reports automatically based on medical images. In this work, we propose a novel framework for automatic ultrasound report generation, leveraging a combination of unsupervised and supervised learning methods to aid the report generation process. Our framework incorporates unsupervised learning methods to extract potential knowledge from ultrasound text reports, serving as the prior information to guide the model in aligning visual and textual features, thereby addressing the challenge of feature discrepancy. Additionally, we design a global semantic comparison mechanism to enhance the performance of generating more comprehensive and accurate medical reports. To enable the implementation of ultrasound report generation, we constructed three large-scale ultrasound image-text datasets from different organs for training and validation purposes. Extensive evaluations against other state-of-the-art approaches show its superior performance across all three datasets. Code and dataset are available at this link.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.20727 [pdf, other]

GANcrop: A Contrastive Defense Against Backdoor Attacks in Federated Learning

Authors: Xiaoyun Gan, Shanyu Gan, Taizhi Su, Peng Liu

Abstract: With heightened awareness of data privacy protection, Federated Learning (FL) has attracted widespread attention as a privacy-preserving distributed machine learning method. However, the distributed nature of federated learning also provides opportunities for backdoor attacks, where attackers can guide the model to produce incorrect predictions without affecting the global model training process. This paper introduces a novel defense mechanism against backdoor attacks in federated learning, named GANcrop. This approach leverages contrastive learning to deeply explore the disparities between malicious and benign models for attack identification, followed by the utilization of Generative Adversarial Networks (GAN) to recover backdoor triggers and implement targeted mitigation strategies. Experimental findings demonstrate that GANcrop effectively safeguards against backdoor attacks, particularly in non-IID scenarios, while maintaining satisfactory model accuracy, showcasing its remarkable defensive efficacy and practical utility.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.16557 [pdf, other]

Scalable Numerical Embeddings for Multivariate Time Series: Enhancing Healthcare Data Representation Learning

Authors: Chun-Kai Huang, Yi-Hsien Hsieh, Ta-Jung Chien, Li-Cheng Chien, Shao-Hua Sun, Tung-Hung Su, Jia-Horng Kao, Che Lin

Abstract: Multivariate time series (MTS) data, when sampled irregularly and asynchronously, often present extensive missing values. Conventional methodologies for MTS analysis tend to rely on temporal embeddings based on timestamps that necessitate subsequent imputations, yet these imputed values frequently deviate substantially from their actual counterparts, thereby compromising prediction accuracy. Furthermore, these methods typically fail to provide robust initial embeddings for values infrequently observed or even absent within the training set, posing significant challenges to model generalizability. In response to these challenges, we propose SCAlable Numerical Embedding (SCANE), a novel framework that treats each feature value as an independent token, effectively bypassing the need for imputation. SCANE regularizes the traits of distinct feature embeddings and enhances representational learning through a scalable embedding mechanism. Coupling SCANE with the Transformer Encoder architecture, we develop the Scalable nUMerical eMbeddIng Transformer (SUMMIT), which is engineered to deliver precise predictive outputs for MTS characterized by prevalent missing entries. Our experimental validation, conducted across three disparate electronic health record (EHR) datasets marked by elevated missing value frequencies, confirms the superior performance of SUMMIT over contemporary state-of-the-art approaches addressing similar challenges. These results substantiate the efficacy of SCANE and SUMMIT, underscoring their potential applicability across a broad spectrum of MTS data analytical tasks.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.09820 [pdf, other]

Densely Distilling Cumulative Knowledge for Continual Learning

Authors: Zenglin Shi, Pei Liu, Tong Su, Yunpeng Wu, Kuien Liu, Yu Song, Meng Wang

Abstract: Continual learning, involving sequential training on diverse tasks, often faces catastrophic forgetting. While knowledge distillation-based approaches exhibit notable success in preventing forgetting, we pinpoint a limitation in their ability to distill the cumulative knowledge of all the previous tasks. To remedy this, we propose Dense Knowledge Distillation (DKD). DKD uses a task pool to track the model's capabilities. It partitions the output logits of the model into dense groups, each corresponding to a task in the task pool. It then distills all tasks' knowledge using all groups. Because using all the groups can be computationally expensive, we also suggest random group selection in each optimization step. Moreover, we propose an adaptive weighting scheme, which balances the learning of new classes and the retention of old classes, based on the count and similarity of the classes. Our DKD outperforms recent state-of-the-art baselines across diverse benchmarks and scenarios. Empirical analysis underscores DKD's ability to enhance model stability, promote flatter minima for improved generalization, and remain robust across various memory budgets and task orders. Moreover, it seamlessly integrates with other CL methods to boost performance and proves versatile in offline scenarios like model compression.

Submitted 16 May, 2024; originally announced May 2024.

Comments: 12 pages; Continual Learning; Class-incremental Learning; Knowledge Distillation; Forgetting
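
The abstract above describes partitioning output logits into groups and distilling per group, with random group selection to reduce cost. The following is a rough sketch of that general idea only, not the authors' DKD implementation: the function names, the use of a temperature-scaled KL divergence, the averaging over groups, and the `sample_size` parameter are all assumptions introduced for illustration, and the adaptive weighting scheme is not modeled.

```python
import math
import random


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def group_kl(teacher_logits, student_logits, group, temperature=2.0):
    """KL(teacher || student) restricted to the class indices in `group`."""
    p = softmax([teacher_logits[i] for i in group], temperature)
    q = softmax([student_logits[i] for i in group], temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def grouped_distill_loss(teacher_logits, student_logits, groups,
                         sample_size=None, rng=random):
    """Average per-group distillation loss (hypothetical helper).

    `groups` is a list of index lists, one per task. If `sample_size` is
    given, only that many randomly chosen groups are used in this step.
    """
    selected = groups if sample_size is None else rng.sample(groups, sample_size)
    losses = [group_kl(teacher_logits, student_logits, g) for g in selected]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    groups = [[0, 1], [2, 3], [4, 5]]  # assumed: two classes per task
    teacher = [2.0, 0.5, 1.0, 1.5, 0.1, 0.3]
    student = [1.0, 1.0, 0.2, 2.0, 0.4, 0.1]
    print(grouped_distill_loss(teacher, student, groups))
    print(grouped_distill_loss(teacher, student, groups, sample_size=1))
```

Sampling a subset of groups trades a noisier loss estimate for less work per step, which matches the motivation the abstract gives for random group selection.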

arXiv:2404.04212 [pdf, other]

Unlocking Parameter-Efficient Fine-Tuning for Low-Resource Language Translation

Authors: Tong Su, Xin Peng, Sarubi Thillainathan, David Guzmán, Surangika Ranathunga, En-Shiun Annie Lee

Abstract: Parameter-efficient fine-tuning (PEFT) methods are increasingly vital in adapting large-scale pre-trained language models for diverse tasks, offering a balance between adaptability and computational efficiency. They are important in Low-Resource Language (LRL) Neural Machine Translation (NMT) to enhance translation accuracy with minimal resources. However, their practical effectiveness varies significantly across different languages. We conducted comprehensive empirical experiments with varying LRL domains and sizes to evaluate the performance of 8 PEFT methods with a total of 15 architectures using the SacreBLEU score. We showed that 6 PEFT architectures outperform the baseline for both in-domain and out-domain tests and the Houlsby+Inversion adapter has the best performance overall, proving the effectiveness of PEFT methods.

Submitted 5 April, 2024; originally announced April 2024.

Comments: Accepted to the Findings of NAACL 2024

arXiv:2404.00226 [pdf, other]

Design as Desired: Utilizing Visual Question Answering for Multimodal Pre-training

Authors: Tongkun Su, Jun Li, Xi Zhang, Haibo Jin, Hao Chen, Qiong Wang, Faqin Lv, Baoliang Zhao, Yin Hu

Abstract: Multimodal pre-training demonstrates its potential in the medical domain, where it learns medical visual representations from paired medical reports. However, many pre-training tasks require extra annotations from clinicians, and most of them fail to explicitly guide the model to learn the desired features of different pathologies. To the best of our knowledge, we are the first to utilize Visual Question Answering (VQA) for multimodal pre-training to guide the framework to focus on targeted pathological features. In this work, we leverage descriptions in medical reports to design multi-granular question-answer pairs associated with different diseases, which assist the framework in pre-training without requiring extra annotations from experts. We also propose a novel pre-training framework with a quasi-textual feature transformer, a module designed to transform visual features into a quasi-textual space closer to the textual domain via a contrastive learning strategy. This narrows the vision-language gap and facilitates modality alignment. Our framework is applied to four downstream tasks: report generation, classification, segmentation, and detection across five datasets. Extensive experiments demonstrate the superiority of our framework compared to other state-of-the-art methods. Our code will be released upon acceptance.

Submitted 8 April, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

arXiv:2403.09712 [pdf, other]

doi 10.1145/3589334.3645406

A Knowledge-Injected Curriculum Pretraining Framework for Question Answering

Authors: Xin Lin, Tianhuang Su, Zhenya Huang, Shangzi Xue, Haifeng Liu, Enhong Chen

Abstract: Knowledge-based question answering (KBQA) is a key task in NLP research, and also an approach to access the web data and knowledge, which requires exploiting knowledge graphs (KGs) for reasoning. In the literature, one promising solution for KBQA is to incorporate the pretrained language model (LM) with KGs by generating KG-centered pretraining corpus, which has shown its superiority. However, these methods often depend on specific techniques and resources to work, which may not always be available and restrict their application. Moreover, existing methods focus more on improving language understanding with KGs, while neglecting the more important human-like complex reasoning. To this end, in this paper, we propose a general Knowledge-Injected Curriculum Pretraining framework (KICP) to achieve comprehensive KG learning and exploitation for KBQA tasks, which is composed of knowledge injection (KI), knowledge adaptation (KA) and curriculum reasoning (CR). Specifically, the KI module first injects knowledge into the LM by generating KG-centered pretraining corpus, and generalizes the process into three key steps that could work with different implementations for flexible application. Next, the KA module learns knowledge from the generated corpus with LM equipped with an adapter as well as keeps its original natural language understanding ability to reduce the negative impacts of the difference between the generated and natural corpus. Last, to enable the LM with complex reasoning, the CR module follows human reasoning patterns to construct three corpora with increasing difficulties of reasoning, and further trains the LM from easy to hard in a curriculum manner. We provide an implementation of the general framework, and evaluate the proposed KICP on four real-world datasets. The results demonstrate that our framework can achieve higher performances.

Submitted 10 March, 2024; originally announced March 2024.

Comments: Accepted by WWW 2024

arXiv:2401.15119 [pdf, other]

Interpreting Time Series Transformer Models and Sensitivity Analysis of Population Age Groups to COVID-19 Infections

Authors: Md Khairul Islam, Tyler Valentine, Timothy Joowon Sue, Ayush Karmacharya, Luke Neil Benham, Zhengguang Wang, Kingsley Kim, Judy Fox

Abstract: Interpreting deep learning time series models is crucial in understanding the model's behavior and learning patterns from raw data for real-time decision-making. However, the complexity inherent in transformer-based time series models poses challenges in explaining the impact of individual features on predictions. In this study, we leverage recent local interpretation methods to interpret state-of-the-art time series models. To use real-world datasets, we collected three years of daily case data for 3,142 US counties. Firstly, we compare six transformer-based models and choose the best prediction model for COVID-19 infection. Using 13 input features from the last two weeks, we can predict the cases for the next two weeks. Secondly, we present an innovative way to evaluate the prediction sensitivity to 8 population age groups over highly dynamic multivariate infection data. Thirdly, we compare our proposed perturbation-based interpretation method with related work, including a total of eight local interpretation methods. Finally, we apply our framework to traffic and electricity datasets, demonstrating that our approach is generic and can be applied to other time-series domains.

Submitted 25 January, 2024; originally announced January 2024.

arXiv:2312.04126 [pdf, other]

An Improved Scheduling with Advantage Actor-Critic for Storm Workloads

Authors: Gaoqiang Dong, Jia Wang, Mingjing Wang, Tingting Su

Abstract: Various resources are the essential elements of data centers, and the completion time is vital to users. In terms of the persistence, the periodicity and the spatial-temporal dependence of stream workload, a new Storm scheduler with Advantage Actor-Critic is proposed to improve resource utilization for minimizing the completion time. A new weighted embedding with a Graph Neural Network is designed to depend on the features of a job comprehensively, which includes the dependence, the types and the positions of tasks in a job. An improved Advantage Actor-Critic integrating task selection and executor assignment is proposed to schedule tasks to executors in order to improve resource utilization. Then the statuses of tasks and executors are updated for the next scheduling. Compared to existing methods, experimental results show that the proposed Storm scheduler improves resource utilization. The completion time is reduced by almost 17% on the TPC-H data set and reduced by almost 25% on the Alibaba data set.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2311.01311 [pdf, other]

Software Engineering for OpenHarmony: A Research Roadmap

Authors: Li Li, Xiang Gao, Hailong Sun, Chunming Hu, Xiaoyu Sun, Haoyu Wang, Haipeng Cai, Ting Su, Xiapu Luo, Tegawendé F. Bissyandé, Jacques Klein, John Grundy, Tao Xie, Haibo Chen, Huaimin Wang

Abstract: Mobile software engineering has been a hot research topic for decades. Our fellow researchers have proposed various approaches (with over 7,000 publications for Android alone) in this field that essentially contributed to the great success of the current mobile ecosystem. Existing research efforts mainly focus on popular mobile platforms, namely Android and iOS. OpenHarmony, a newly open-sourced mobile platform, has rarely been considered, although it is the one requiring the most attention as OpenHarmony is expected to occupy one-third of the market in China (if not in the world). To fill the gap, we present to the mobile software engineering community a research roadmap for encouraging our fellow researchers to contribute promising approaches to OpenHarmony. Specifically, we start by presenting a literature review of mobile software engineering, attempting to understand what problems have been targeted by the mobile community and how they have been resolved. We then summarize the existing (limited) achievements of OpenHarmony and subsequently highlight the research gap between Android/iOS and OpenHarmony. This research gap eventually helps in forming the roadmap for conducting software engineering research for OpenHarmony.

Submitted 21 November, 2023; v1 submitted 2 November, 2023; originally announced November 2023.

arXiv:2308.10021 [pdf, other]

Effects of Convolutional Autoencoder Bottleneck Width on StarGAN-based Singing Technique Conversion

Authors: Tung-Cheng Su, Yung-Chuan Chang, Yi-Wen Liu

Abstract: Singing technique conversion (STC) refers to the task of converting from one voice technique to another while leaving the original singer identity, melody, and linguistic components intact. Previous STC studies, as well as singing voice conversion research in general, have utilized convolutional autoencoders (CAEs) for conversion, but how the bottleneck width of the CAE affects the synthesis quality has not been thoroughly evaluated. To this end, we constructed a GAN-based multi-domain STC system which took advantage of the WORLD vocoder representation and the CAE architecture. We varied the bottleneck width of the CAE, and evaluated the conversion results subjectively. The model was trained on a Mandarin dataset which features four singers and four singing techniques: the chest voice, the falsetto, the raspy voice, and the whistle voice. The results show that a wider bottleneck corresponds to better articulation clarity but does not necessarily lead to higher likeness to the target technique. Among the four techniques, we also found that the whistle voice is the easiest target for conversion, while the other three techniques as a source produce more convincing conversion results than the whistle.

Submitted 19 August, 2023; originally announced August 2023.

Comments: The original edition of this paper will be published in the CMMR 2023 Proceedings. This ArXiv publication is a copy

arXiv:2306.01260 [pdf, other]

FREPA: An Automated and Formal Approach to Requirement Modeling and Analysis in Aircraft Control Domain

Authors: Jincao Feng, Weikai Miao, Hanyue Zheng, Yihao Huang, Jianwen Li, Zheng Wang, Ting Su, Bin Gu, Geguang Pu, Mengfei Yang, Jifeng He

Abstract: Formal methods are promising for modeling and analyzing system requirements. However, applying formal methods to large-scale industrial projects is a remaining challenge. The industrial engineers are suffering from the lack of automated engineering methodologies to effectively conduct precise requirement models, and rigorously validate and verify (V&V) the generated models. To tackle this challenge, in this paper, we present a systematic engineering approach, named Formal Requirement Engineering Platform in Aircraft (FREPA), for formal requirement modeling and V&V in the aerospace and aviation control domains. FREPA is an outcome of the seamless collaboration between the academy and industry over the last eight years. The main contributions of this paper include 1) an automated and systematic engineering approach FREPA to construct requirement models, validate and verify systems in the aerospace and aviation control domain, 2) a domain-specific modeling language AASRDL to describe the formal specification, and 3) a practical FREPA-based tool AeroReq which has been used by our industry partners. We have successfully adopted FREPA to seven real aerospace gesture control and two aviation engine control systems. The experimental results show that FREPA and the corresponding tool AeroReq significantly facilitate formal modeling and V&V in the industry. Moreover, we also discuss the experiences and lessons gained from using FREPA in aerospace and aviation projects.

Submitted 2 June, 2023; originally announced June 2023.

Comments: 12 pages, Published by FSE 2020

arXiv:2305.08322 [pdf, other]

C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models

Authors: Yuzhen Huang, Yuzhuo Bai, Zhihao Zhu, Junlei Zhang, Jinghan Zhang, Tangjun Su, Junteng Liu, Chuancheng Lv, Yikai Zhang, Jiayi Lei, Yao Fu, Maosong Sun, Junxian He

Abstract: New NLP benchmarks are urgently needed to align with the rapid development of large language models (LLMs). We present C-Eval, the first comprehensive Chinese evaluation suite designed to assess advanced knowledge and reasoning abilities of foundation models in a Chinese context. C-Eval comprises multiple-choice questions across four difficulty levels: middle school, high school, college, and professional. The questions span 52 diverse disciplines, ranging from humanities to science and engineering. C-Eval is accompanied by C-Eval Hard, a subset of very challenging subjects in C-Eval that requires advanced reasoning abilities to solve. We conduct a comprehensive evaluation of the most advanced LLMs on C-Eval, including both English- and Chinese-oriented models. Results indicate that only GPT-4 could achieve an average accuracy of over 60%, suggesting that there is still significant room for improvement for current LLMs. We anticipate C-Eval will help analyze important strengths and shortcomings of foundation models, and foster their development and growth for Chinese users.

Submitted 6 November, 2023; v1 submitted 14 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023. Website: https://cevalbenchmark.com

arXiv:2304.10097 [pdf, other]

Scene Style Text Editing

Authors: Tonghua Su, Fuxiang Yang, Xiang Zhou, Donglin Di, Zhongjie Wang, Songze Li

Abstract: In this work, we propose a task called "Scene Style Text Editing (SSTE)", changing the text content as well as the text style of the source image while keeping the original text scene. Existing methods neglect fine-grained adjustment of the style of the foreground text, such as its rotation angle, color, and font type. To tackle this task, we propose a quadruple framework named "QuadNet" to embed and adjust foreground text styles in the latent feature space. Specifically, QuadNet consists of four parts, namely background inpainting, style encoder, content encoder, and fusion generator. The background inpainting erases the source text content and recovers the appropriate background with a highly authentic texture. The style encoder extracts the style embedding of the foreground text. The content encoder provides target text representations in the latent feature space to implement the content edits. The fusion generator combines the information yielded from the mentioned parts and generates the rendered text images. Practically, our method is capable of performing promisingly on real-world datasets with merely string-level annotation. To the best of our knowledge, our work is the first to finely manipulate the foreground text content and style by deeply semantic editing in the latent feature space. Extensive experiments demonstrate that QuadNet has the ability to generate photo-realistic foreground text and avoid source text shadows in real-world scenes when editing text content.

Submitted 20 April, 2023; originally announced April 2023.

arXiv:2304.10020 [pdf, other]

A Survey on Deep Neural Network Partition over Cloud, Edge and End Devices

Authors: Di Xu, Xiang He, Tonghua Su, Zhongjie Wang

Abstract: Deep neural network (DNN) partition is a research problem that involves splitting a DNN into multiple parts and offloading them to specific locations. Because of the recent advancement in multi-access edge computing and edge intelligence, DNN partition has been considered as a powerful tool for improving DNN inference performance when the computing resources of edge and end devices are limited and the remote transmission of data from these devices to clouds is costly. This paper provides a comprehensive survey on the recent advances and challenges in DNN partition approaches over the cloud, edge, and end devices based on a detailed literature collection. We review how DNN partition works in various application scenarios, and provide a unified mathematical model of the DNN partition problem. We developed a five-dimensional classification framework for DNN partition approaches, consisting of deployment locations, partition granularity, partition constraints, optimization objectives, and optimization algorithms. Each existing DNN partition approach can be perfectly defined in this framework by instantiating each dimension into specific values. In addition, we suggest a set of metrics for comparing and evaluating the DNN partition approaches. Based on this, we identify and discuss research challenges that have not yet been investigated or fully addressed. We hope that this work helps DNN partition researchers by highlighting significant future research directions in this domain.

Submitted 19 April, 2023; originally announced April 2023.

arXiv:2304.04233 [pdf, other]

ODDFUZZ: Discovering Java Deserialization Vulnerabilities via Structure-Aware Directed Greybox Fuzzing

Authors: Sicong Cao, Biao He, Xiaobing Sun, Yu Ouyang, Chao Zhang, Xiaoxue Wu, Ting Su, Lili Bo, Bin Li, Chuanlei Ma, Jiajia Li, Tao Wei

Abstract: Java deserialization vulnerability is a severe threat in practice. Researchers have proposed static analysis solutions to locate candidate vulnerabilities and fuzzing solutions to generate proof-of-concept (PoC) serialized objects to trigger them. However, existing solutions have limited effectiveness and efficiency. In this paper, we propose a novel hybrid solution ODDFUZZ to efficiently discover Java deserialization vulnerabilities. First, ODDFUZZ performs lightweight static taint analysis to identify candidate gadget chains that may cause deserialization vulnerabilities. In this step, ODDFUZZ tries to locate all candidates and avoid false negatives. Then, ODDFUZZ performs directed greybox fuzzing (DGF) to explore those candidates and generate PoC testcases to mitigate false positives. Specifically, ODDFUZZ applies a structure-aware seed generation method to guarantee the validity of the testcases, and adopts a novel hybrid feedback and a step-forward strategy to guide the directed fuzzing. We implemented a prototype of ODDFUZZ and evaluated it on the popular Java deserialization repository ysoserial. Results show that ODDFUZZ could discover 16 out of 34 known gadget chains, while two state-of-the-art baselines only identify three of them. In addition, we evaluated ODDFUZZ on real-world applications including Oracle WebLogic Server, Apache Dubbo, Sonatype Nexus, and protostuff, and found six previously unreported exploitable gadget chains with five CVEs assigned.

Submitted 9 April, 2023; originally announced April 2023.

Comments: To appear in the Main Track of IEEE S&P 2023

arXiv:2303.17568 [pdf, other]

CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X

Authors: Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, Teng Su, Zhilin Yang, Jie Tang

Abstract: Large pre-trained code generation models, such as OpenAI Codex, can generate syntax- and function-correct code, making the coding of programmers more productive and our pursuit of artificial general intelligence closer. In this paper, we introduce CodeGeeX, a multilingual model with 13 billion parameters for code generation. CodeGeeX is pre-trained on 850 billion tokens of 23 programming languages as of June 2022. Our extensive experiments suggest that CodeGeeX outperforms multilingual code models of similar scale for both the tasks of code generation and translation on HumanEval-X. Building upon HumanEval (Python only), we develop the HumanEval-X benchmark for evaluating multilingual models by hand-writing the solutions in C++, Java, JavaScript, and Go. In addition, we build CodeGeeX-based extensions on Visual Studio Code, JetBrains, and Cloud Studio, generating 4.7 billion tokens for tens of thousands of active users per week. Our user study demonstrates that CodeGeeX can help to increase coding efficiency for 83.4% of its users. Finally, CodeGeeX is publicly accessible and in Sep. 2022, we open-sourced its code, model weights (the version of 850B tokens), API, extensions, and HumanEval-X at https://github.com/THUDM/CodeGeeX.

Submitted 9 July, 2024; v1 submitted 30 March, 2023; originally announced March 2023.

arXiv:2303.12071 [pdf, other]

ProphNet: Efficient Agent-Centric Motion Forecasting with Anchor-Informed Proposals

Authors: Xishun Wang, Tong Su, Fang Da, Xiaodong Yang

Abstract: Motion forecasting is a key module in an autonomous driving system. Due to the heterogeneous nature of multi-sourced input, multimodality in agent behavior, and low latency required by onboard deployment, this task is notoriously challenging. To cope with these difficulties, this paper proposes a novel agent-centric model with anchor-informed proposals for efficient multimodal motion prediction. We design a modality-agnostic strategy to concisely encode the complex input in a unified manner. We generate diverse proposals, fused with anchors bearing goal-oriented scene context, to induce multimodal prediction that covers a wide range of future trajectories. Our network architecture is highly uniform and succinct, leading to an efficient model amenable for real-world driving deployment. Experiments reveal that our agent-centric network compares favorably with the state-of-the-art methods in prediction accuracy, while achieving scene-centric level inference latency.

Submitted 28 June, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

Comments: CVPR 2023 (Highlight)

arXiv:2303.10845 [pdf, other]

PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing

Authors: Xiaozhe Ren, Pingyi Zhou, Xinfan Meng, Xinjing Huang, Yadao Wang, Weichao Wang, Pengfei Li, Xiaoda Zhang, Alexander Podolskiy, Grigory Arshinov, Andrey Bout, Irina Piontkovskaya, Jiansheng Wei, Xin Jiang, Teng Su, Qun Liu, Jun Yao

Abstract: The scaling of large language models has greatly improved natural language understanding, generation, and reasoning. In this work, we develop a system that trained a trillion-parameter language model on a cluster of Ascend 910 AI processors and MindSpore framework, and present the language model with 1.085T parameters named PanGu-Σ. With parameters inherited from PanGu-α, we extend the dense Transformer model to a sparse one with Random Routed Experts (RRE), and efficiently train the model over 329B tokens by using Expert Computation and Storage Separation (ECSS). This resulted in a 6.3x increase in training throughput through heterogeneous computing. Our experimental findings show that PanGu-Σ provides state-of-the-art performance in zero-shot learning of various Chinese NLP downstream tasks. Moreover, it demonstrates strong abilities when fine-tuned in application data of open-domain dialogue, question answering, machine translation and code generation.

Submitted 19 March, 2023; originally announced March 2023.

arXiv:2301.04285 [pdf, other]

TAPS: Topology-Aware Intra-Operator Parallelism Strategy Searching Algorithm for Deep Neural Networks

Authors: Peng Liang, Hao Zheng, Teng Su, Linbo Qiao, Dongsheng Li

Abstract: TAPS is a Topology-Aware intra-operator Parallelism strategy Searching algorithm that generates intra-operator parallelism strategies by considering both intra-node and inter-node bandwidth. Most of the existing auto-parallelism works use the communication volume as the communication cost directly when generating strategies, which we prove to be sub-optimal in multi-nodes cases. We design a topology-aware cost model for multi-node intra-operator parallelism strategy searching. Numerical experiments demonstrate that TAPS can generate strategies with up to 85% fewer communication costs, which outperform the latest baselines.

Submitted 10 January, 2023; originally announced January 2023.

Comments: 11 pages, 6 figures. To be submitted to conference proceedings or a journal after modifications

arXiv:2301.02738 [pdf]

doi 10.1061/JENMDT.EMENG-6945

LS-DYNA Machine Learning-based Multiscale Method for Nonlinear Modeling of Short Fiber-Reinforced Composites

Authors: Haoyan Wei, C. T. Wu, Wei Hu, Tung-Huan Su, Hitoshi Oura, Masato Nishi, Tadashi Naito, Stan Chung, Leo Shen

Abstract: Short-fiber-reinforced composites (SFRC) are high-performance engineering materials for lightweight structural applications in the automotive and electronics industries. Typically, SFRC structures are manufactured by injection molding, which induces heterogeneous microstructures, and the resulting nonlinear anisotropic behaviors are challenging to predict by conventional micromechanical analyses. In this work, we present a machine learning-based multiscale method by integrating injection molding-induced microstructures, material homogenization, and Deep Material Network (DMN) in the finite element simulation software LS-DYNA for structural analysis of SFRC. DMN is a physics-embedded machine learning model that learns the microscale material morphologies hidden in representative volume elements of composites through offline training. By coupling DMN with finite elements, we have developed a highly accurate and efficient data-driven approach, which predicts nonlinear behaviors of composite materials and structures at a computational speed orders-of-magnitude faster than the high-fidelity direct numerical simulation. To model industrial-scale SFRC products, transfer learning is utilized to generate a unified DMN database, which effectively captures the effects of injection molding-induced fiber orientations and volume fractions on the overall composite properties. Numerical examples are presented to demonstrate the promising performance of this LS-DYNA machine learning-based multiscale method for SFRC modeling.

Submitted 6 January, 2023; originally announced January 2023.

Comments: Final version of this manuscript is published in Journal of Engineering Mechanics. Wei, H., Wu, C. T., Hu, W., Su, T. H., Oura H., Nishi, M., Naito T., Chung S., Shen L. (2023). LS-DYNA machine learning-based multiscale method for nonlinear modeling of short-fiber-reinforced composites. Journal of Engineering Mechanics. 149(3): 04023003. https://doi.org/10.1061/JENMDT.EMENG-6945

Journal ref: Journal of Engineering Mechanics, 2023, 149(3): 04023003

arXiv:2211.10678 [pdf, other]

Entity-Assisted Language Models for Identifying Check-worthy Sentences

Authors: Ting Su, Craig Macdonald, Iadh Ounis

Abstract: We propose a new uniform framework for text classification and ranking that can automate the process of identifying check-worthy sentences in political debates and speech transcripts. Our framework combines the semantic analysis of the sentences, with additional entity embeddings obtained through the identified entities within the sentences. In particular, we analyse the semantic meaning of each sentence using state-of-the-art neural language models such as BERT, ALBERT, and RoBERTa, while embeddings for entities are obtained from knowledge graph (KG) embedding models. Specifically, we instantiate our framework using five different language models, entity embeddings obtained from six different KG embedding models, as well as two combination methods leading to several Entity-Assisted neural language models. We extensively evaluate the effectiveness of our framework using two publicly available datasets from the CLEF' 2019 & 2020 CheckThat! Labs. Our results show that the neural language models significantly outperform traditional TF.IDF and LSTM methods. In addition, we show that the ALBERT model is consistently the most effective model among all the tested neural language models. Our entity embeddings significantly outperform other existing approaches from the literature that are based on similarity and relatedness scores between the entities in a sentence, when used alongside a KG embedding.

Submitted 19 November, 2022; originally announced November 2022.

Comments: 22 pages, 15 tables, 3 figures

arXiv:2211.10672 [pdf, other]

Leveraging Users' Social Network Embeddings for Fake News Detection on Twitter

Authors: Ting Su, Craig Macdonald, Iadh Ounis

Abstract: Social networks (SNs) are increasingly important sources of news for many people. The online connections made by users allow information to spread more easily than traditional news media (e.g., newspaper, television). However, they also make the spread of fake news easier than in traditional media, especially through the users' social network connections. In this paper, we focus on investigating if the SNs' users connection structure can aid fake news detection on Twitter. In particular, we propose to embed users based on their follower or friendship networks on the Twitter platform, so as to identify the groups that users form. Indeed, by applying unsupervised graph embedding methods on the graphs from the Twitter users' social network connections, we observe that users engaged with fake news are more tightly clustered together than users only engaged in factual news. Thus, we hypothesise that the embedded user's network can help detect fake news effectively. Through extensive experiments using a publicly available Twitter dataset, our results show that applying graph embedding methods on SNs, using the user connections as network information, can indeed classify fake news more effectively than most language-based approaches. Specifically, we observe a significant improvement over using only the textual information (i.e., TF.IDF or a BERT language model), as well as over models that deploy both advanced textual features (i.e., stance detection) and complex network features (e.g., users network, publishers cross citations). We conclude that the Twitter users' friendship and followers network information can significantly outperform language-based approaches, as well as the existing state-of-the-art fake news detection models that use a more sophisticated network structure, in classifying fake news on Twitter.

Submitted 19 November, 2022; originally announced November 2022.

Comments: 15 pages, 5 figures

arXiv:2210.12064

Embedded Silicon-Organic Integrated Neuromorphic System

Authors: Shengjie Zheng, Ling Liu, Junjie Yang, Jianwei Zhang, Tao Su, Bin Yue, Xiaojian Li

Abstract: The development of artificial intelligence (AI) and robotics are both based on the tenet of "science and technology are people-oriented", and both need to achieve efficient communication with the human brain. Based on multi-disciplinary research in systems neuroscience, computer architecture, and functional organic materials, we proposed the concept of using AI to simulate the operating principles and materials of the brain in hardware to develop brain-inspired intelligence technology, and realized the preparation of neuromorphic computing devices and basic materials. We simulated neurons and neural networks in terms of material and morphology, using a variety of organic polymers as the base materials for neuroelectronic devices, for building neural interfaces as well as organic neural devices and silicon neural computational modules. We assemble organic artificial synapses with simulated neurons from silicon-based Field-Programmable Gate Array (FPGA) into organic artificial neurons, the basic components of neural networks, and later construct biological neural network models based on the interpreted neural circuits. Finally, we also discuss how to further build neuromorphic devices based on these organic artificial neurons, which have both a neural interface friendly to nervous tissue and interact with information from real biological neural networks.

Submitted 25 June, 2024; v1 submitted 17 October, 2022; originally announced October 2022.

Comments: This article needs to be updated with the corrected figure and data

arXiv:2209.08719 [pdf, other]

doi 10.1145/3533767.3534402

Detecting and Fixing Data Loss Issues in Android Apps

Authors: Wunan Guo, Zhen Dong, Liwei Shen, Wei Tian, Ting Su, Xin Peng

Abstract: Android apps are event-driven, and their execution is often interrupted by external events. This interruption can cause data loss issues that annoy users. For instance, when the screen is rotated, the current app page will be destroyed and recreated. If the app state is improperly preserved, user data will be lost. In this work, we present an approach and tool iFixDataloss that automatically detects and fixes data loss issues in Android apps. To achieve this, we identify scenarios in which data loss issues may occur, develop strategies to reveal data loss issues, and design patch templates to fix them. Our experiments on 66 Android apps show iFixDataloss detected 374 data loss issues (284 of them were previously unknown) and successfully generated patches for 188 of the 374 issues. Out of 20 submitted patches, 16 have been accepted by developers. In comparison with state-of-the-art techniques, iFixDataloss performed significantly better in terms of the number of detected data loss issues and the quality of generated patches.

Submitted 18 September, 2022; originally announced September 2022.

Journal ref: Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2022), pp. 605-616

arXiv:2207.06553 [pdf, other]

QML for Argoverse 2 Motion Forecasting Challenge

Authors: Tong Su, Xishun Wang, Xiaodong Yang

Abstract: To safely navigate in various complex traffic scenarios, autonomous driving systems are generally equipped with a motion forecasting module to provide vital information for the downstream planning module. For real-world onboard applications, both accuracy and latency of a motion forecasting model are essential. In this report, we present an effective and efficient solution, which ranked 3rd in the Argoverse 2 Motion Forecasting Challenge 2022.

Submitted 13 July, 2022; originally announced July 2022.

arXiv:2204.04932 [pdf]

doi 10.1109/CVCI56766.2022.9964574

Optimized SC-F-LOAM: Optimized Fast LiDAR Odometry and Mapping Using Scan Context

Authors: Lizhou Liao, Chunyun Fu, Binbin Feng, Tian Su

Abstract: LiDAR odometry can achieve accurate vehicle pose estimation for short driving range or in small-scale environments, but for long driving range or in large-scale environments, the accuracy deteriorates as a result of cumulative estimation errors. This drawback necessitates the inclusion of loop closure detection in a SLAM framework to suppress the adverse effects of cumulative errors. To improve the accuracy of pose estimation, we propose a new LiDAR-based SLAM method which uses F-LOAM as LiDAR odometry, Scan Context for loop closure detection, and GTSAM for global optimization. In our approach, an adaptive distance threshold (instead of a fixed threshold) is employed for loop closure detection, which achieves more accurate loop closure detection results. Besides, a feature-based matching method is used in our approach to compute vehicle pose transformations between loop closure point cloud pairs, instead of using the raw point cloud obtained by the LiDAR sensor, which significantly reduces the computation time. The KITTI dataset is used for verifications of our method, and the experimental results demonstrate that the proposed method outperforms typical LiDAR odometry/SLAM methods in the literature. Our code is made publicly available for the benefit of the community.

Submitted 15 March, 2023; v1 submitted 11 April, 2022; originally announced April 2022.

Journal ref: Proceedings of the 2022 6th CAA International Conference on Vehicular Control and Intelligence (CVCI), Nanjing, China, 28-30 October 2022

arXiv:2112.06624 [pdf, other]

Pedestrian Trajectory Prediction via Spatial Interaction Transformer Network

Authors: Tong Su, Yu Meng, Yan Xu

Abstract: As a core technology of the autonomous driving system, pedestrian trajectory prediction can significantly enhance the function of active vehicle safety and reduce road traffic injuries. In traffic scenes, when encountering oncoming people, pedestrians may make sudden turns or stop immediately, which often leads to complicated trajectories. To predict such unpredictable trajectories, we can gain insights into the interaction between pedestrians. In this paper, we present a novel generative method named Spatial Interaction Transformer (SIT), which learns the spatio-temporal correlation of pedestrian trajectories through attention mechanisms. Furthermore, we introduce the conditional variational autoencoder (CVAE) framework to model the future latent motion states of pedestrians. In particular, the experiments based on the large-scale traffic dataset nuScenes [2] show that SIT outperforms state-of-the-art (SOTA) methods. Experimental evaluation on the challenging ETH and UCY datasets confirms the robustness of our proposed model.

Submitted 13 December, 2021; originally announced December 2021.

arXiv:2110.11634 [pdf, ps, other]

High-performance Estimation of Jamming Covariance Matrix for IRS-aided Directional Modulation Network with a Malicious Attacker

Authors: Hangjia He, Ting Su, Hongjun Wang, Yin Teng, Weiping Shi, Feng Shu, Jiangzhou Wang

Abstract: In this paper, we investigate the anti-jamming problem of a directional modulation (DM) system with the aid of intelligent reflecting surface (IRS). As an efficient tool to combat malicious jamming, receive beamforming (RBF) is usually designed to be on the null-space of the jamming channel or covariance matrix from Mallory to Bob. Thus, it is necessary to estimate the receive jamming covariance matrix (JCM) at Bob. To achieve a precise JCM estimate, three JCM estimation methods, including eigenvalue decomposition (EVD), parametric estimation method by gradient descent (PEM-GD) and parametric estimation method by alternating optimization (PEM-AO), are proposed. Here, the proposed EVD is under a rank-2 constraint of JCM. The PEM-GD method fully explores the structure features of JCM and the PEM-AO is to decrease the computational complexity of the former via dimensionality reduction. The simulation results show that in low and medium jamming-noise ratio (JNR) regions, the proposed three methods perform better than the existing sample covariance matrix method. The proposed PEM-GD and PEM-AO outperform the EVD method and the existing clutter and disturbance covariance estimator RCML.

Submitted 22 October, 2021; originally announced October 2021.

Comments: 5 pages, 5 figures

arXiv:2106.09508 [pdf, other]

KIT Bus: A Shuttle Model for CARLA Simulator

Authors: Yusheng Xiang, Shuo Wang, Tianqing Su, Jun Li, Samuel S. Mao, Marcus Geimer

Abstract: With the continuous development of science and technology, self-driving vehicles will surely change the nature of transportation and realize the automotive industry's transformation in the future. Compared with self-driving cars, self-driving buses are more efficient in carrying passengers and more environmentally friendly in terms of energy consumption. Therefore, it is speculated that in the future, self-driving buses will become more and more important. As a simulator for autonomous driving research, the CARLA simulator can help people accumulate experience in autonomous driving technology faster and more safely. However, a shortcoming is that there is no modern bus model in the CARLA simulator. Consequently, people cannot simulate autonomous driving on buses or the scenarios interacting with buses. Therefore, we built a bus model in 3ds Max software and imported it into CARLA to fill this gap. Our model, namely KIT bus, is proven to work in CARLA by testing it with the autopilot simulation. The video demo is shown on our YouTube channel.

Submitted 17 June, 2021; originally announced June 2021.

Comments: 6 pages, 12 figures

arXiv:2105.06635 [pdf, other]

An Extension of BIM Using AI: a Multi Working-Machines Pathfinding Solution

Authors: Yusheng Xiang, Kailun Liu, Tianqing Su, Jun Li, Shirui Ouyang, Samuel S. Mao, Marcus Geimer

Abstract: Multi working-machines pathfinding solution enables more mobile machines simultaneously to work inside of a working site so that the productivity can be expected to increase evolutionary. To date, the potential cooperation conflicts among construction machinery limit the amount of construction machinery investment in a concrete working site. To solve the cooperation problem, civil engineers optimize the working site from a logistic perspective while computer scientists improve pathfinding algorithms' performance on the given benchmark maps. In the practical implementation of a construction site, it is sensible to solve the problem with a hybrid solution; therefore, in our study, we proposed an algorithm based on a cutting-edge multi-pathfinding algorithm to enable the massive number of machines cooperation and offer the advice to modify the unreasonable part of the working site in the meantime. Using the logistic information from BIM, such as unloading and loading point, we added a pathfinding solution for multi machines to improve the whole construction fleet's productivity. In the previous study, the experiments were limited to no more than ten participants, and the computational time to gather the solution was not given; thus, we publish our pseudo-code, our tested map, and benchmark our results. Our algorithm's most extensive feature is that it can quickly replan the path to overcome the emergency on a construction site.

Submitted 14 May, 2021; originally announced May 2021.

Comments: 17 pages, 12 figures

arXiv:2104.12369 [pdf, other]

PanGu-$α$: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation

Authors: Wei Zeng, Xiaozhe Ren, Teng Su, Hui Wang, Yi Liao, Zhiwei Wang, Xin Jiang, ZhenZhang Yang, Kaisheng Wang, Xiaoda Zhang, Chen Li, Ziyan Gong, Yifan Yao, Xinjing Huang, Jun Wang, Jianfeng Yu, Qi Guo, Yue Yu, Yan Zhang, Jin Wang, Hengtao Tao, Dasen Yan, Zexuan Yi, Fang Peng, Fangqing Jiang , et al. (13 additional authors not shown)

Abstract: Large-scale Pretrained Language Models (PLMs) have become the new paradigm for Natural Language Processing (NLP). PLMs with hundreds of billions of parameters such as GPT-3 have demonstrated strong performances on natural language understanding and generation with few-shot in-context learning. In this work, we present our practice on training large-scale autoregressive language models named PanGu-$α$, with up to 200 billion parameters. PanGu-$α$ is developed under MindSpore and trained on a cluster of 2048 Ascend 910 AI processors. The training parallelism strategy is implemented based on MindSpore Auto-parallel, which composes five parallelism dimensions to scale the training task to 2048 processors efficiently, including data parallelism, op-level model parallelism, pipeline model parallelism, optimizer model parallelism and rematerialization. To enhance the generalization ability of PanGu-$α$, we collect 1.1TB high-quality Chinese data from a wide range of domains to pretrain the model. We empirically test the generation ability of PanGu-$α$ in various scenarios including text summarization, question answering, dialogue generation, etc. Moreover, we investigate the effect of model scales on the few-shot performances across a broad range of Chinese NLP tasks. The experimental results demonstrate the superior capabilities of PanGu-$α$ in performing various tasks under few-shot or zero-shot settings.

Submitted 26 April, 2021; originally announced April 2021.

Comments: The technique report for PanGu-$α$

arXiv:2103.10602 [pdf, other]

doi 10.1145/3451262

HeterSkinNet: A Heterogeneous Network for Skin Weights Prediction

Authors: Xiaoyu Pan, Jiancong Huang, Jiaming Mai, He Wang, Honglin Li, Tongkui Su, Wenjun Wang, Xiaogang Jin

Abstract: Character rigging is universally needed in computer graphics but notoriously laborious. We present a new method, HeterSkinNet, aiming to fully automate such processes and significantly boost productivity. Given a character mesh and skeleton as input, our method builds a heterogeneous graph that treats the mesh vertices and the skeletal bones as nodes of different types and uses graph convolutions to learn their relationships. To tackle the graph heterogeneity, we propose a new graph network convolution operator that transfers information between heterogeneous nodes. The convolution is based on a new distance HollowDist that quantifies the relations between mesh vertices and bones. We show that HeterSkinNet is robust for production characters by providing the ability to incorporate meshes and skeletons with arbitrary topologies and morphologies (e.g., out-of-body bones, disconnected mesh components, etc.). Through exhaustive comparisons, we show that HeterSkinNet outperforms state-of-the-art methods by large margins in terms of rigging accuracy and naturalness. HeterSkinNet provides a solution for effective and robust character rigging.

Submitted 18 March, 2021; originally announced March 2021.

Comments: I3D 2021

arXiv:2102.11089 [pdf, ps, other]

doi 10.1109/TCOMM.2021.3078776

Belief-Propagation Decoding of LDPC Codes with Variable Node-Centric Dynamic Schedules

Authors: Tofar C. -Y. Chang, Pin-Han Wang, Jian-Jia Weng, I-Hsiang Lee, Yu T. Su

Abstract: Belief propagation (BP) decoding of low-density parity-check (LDPC) codes with various dynamic decoding schedules has been proposed to improve the efficiency of the conventional flooding schedule. As the ultimate goal of an ideal LDPC code decoder is to have correct bit decisions, a dynamic decoding schedule should be variable node (VN)-centric and be able to find the VNs that have probable incorrect decisions and a good chance of being corrected if chosen for update. We propose a novel and effective metric called conditional innovation (CI) which serves this design goal well. To make the most of dynamic scheduling which produces high-reliability bit decisions, we limit our search for the candidate VNs to those related to the latest updated nodes only. Based on the CI metric and the new search guideline separately or in combination, we develop several highly efficient decoding schedules. To reduce decoding latency, we introduce multi-edge updating versions which offer extra latency-performance tradeoffs. Numerical results show that both single-edge and multi-edge algorithms provide better decoding performance against most dynamic schedules and the CI-based algorithms are particularly impressive at the first few decoding iterations.

Submitted 22 February, 2021; originally announced February 2021.

arXiv:2011.01830 [pdf, other]

Where am I? SLAM for Mobile Machines on A Smart Working Site

Authors: Yusheng Xiang, Dianzhao Li, Tianqing Su, Quan Zhou, Christine Brach, Samuel S. Mao, Marcus Geimer

Abstract: The current optimization approaches of construction machinery are mainly based on internal sensors. However, the decision of a reasonable strategy is not only determined by its intrinsic signals, but also very strongly by environmental information, especially the terrain. Due to the dynamic changes of the construction site and the consequent absence of a high definition map, the Simultaneous Localization and Mapping (SLAM) offering the terrain information for construction machines is still challenging. Current SLAM technologies proposed for mobile machines are strongly dependent on costly or computationally expensive sensors, such as RTK GPS and cameras, so that commercial use is rare. In this study, we proposed an affordable SLAM method to create a multi-layer grid map for the construction site so that the machine can have the environmental information and be optimized accordingly. Concretely, after the machine passes by, we can get the local information and record it. Combining with positioning technology, we then create a map of the interesting places of the construction site. As a result of our research gathered from Gazebo, we showed that a suitable layout is the combination of 1 IMU and 2 differential GPS antennas using the unscented Kalman filter, which keeps the average distance error lower than 2m and the mapping error lower than 1.3% in the harsh environment. As an outlook, our SLAM technology provides the cornerstone to activate many efficiency improvement approaches.

Submitted 5 November, 2020; v1 submitted 3 November, 2020; originally announced November 2020.

Comments: 13 pages; 41 figures

arXiv:2010.08502 [pdf]

A Reversible Data hiding Scheme in Encrypted Domain for Secret Image Sharing based on Chinese Remainder Theorem

Authors: Yan Ke, Minqing Zhang, Xinpeng Zhang, Jia Liu, Tingting Su, Xiaoyuan Yang

Abstract: Reversible data hiding in encrypted domain (RDH-ED) schemes based on symmetric or public key encryption are mainly applied to the security of end-to-end communication. Aimed at providing reliable technical support for multi-party security scenarios, a separable RDH-ED scheme for secret image sharing based on the Chinese remainder theorem (CRT) is presented. In the application of (t, n) secret image sharing, the image is first shared into n different shares of ciphertext. Only when no fewer than t shares are obtained can the original image be reconstructed. In our scheme, additional data could be embedded into the image shares. To realize data extraction from the image shares and the reconstructed image separably, two data hiding methods are proposed: one is homomorphic difference expansion in encrypted domain (HDE-ED) that supports data extraction from the reconstructed image by utilizing the addition homomorphism of CRT secret sharing; the other is difference expansion in image shares (DE-IS) that supports the data extraction from the marked shares before image reconstruction. Experimental results demonstrate that the proposed scheme could not only maintain the security and the threshold function of the secret sharing system, but also obtain better reversibility and efficiency compared with most existing RDH-ED algorithms. The maximum embedding rate of HDE-ED could reach 0.5000 bits per pixel and the average embedding rate of DE-IS is 0.0545 bits per bit of ciphertext.

Submitted 25 September, 2020; originally announced October 2020.

arXiv:2009.05033 [pdf, other]

5G meets Construction Machines: Towards a Smart working Site

Authors: Yusheng Xiang, Bing Xu, Tianqing Su, Christine Brach, Samuel S. Mao, Marcus Geimer

Abstract: The fleet management of mobile working machines with the help of connectivity can increase safety and productivity. Although in our previous study, we proposed a solution to use IEEE 802.11p to achieve the fleet management of construction machines, the shortcomings of Wi-Fi may limit the usage of this technology in some cases. Alternatively, the fifth-generation mobile networks (5G) have shown great potential to solve the problems. Thus, as the world's first academic paper investigating 5G and construction machines' cooperation, we demonstrated the scenarios where 5G can have a significant effect on the construction machines industry. Also, based on the simulation we made in ns-3, we compared the performance of 4G and 5G for the most relevant construction machines scenarios. Last but not least, we showed the feasibility of remote-control and self-working construction machines with the help of 5G.

Submitted 10 September, 2020; originally announced September 2020.

Comments: 8 pages, 12 figures

arXiv:2008.06997 [pdf, other]

doi 10.1038/s41467-021-23235-4

Deep Learning Predicts Cardiovascular Disease Risks from Lung Cancer Screening Low Dose Computed Tomography

Authors: Hanqing Chao, Hongming Shan, Fatemeh Homayounieh, Ramandeep Singh, Ruhani Doda Khera, Hengtao Guo, Timothy Su, Ge Wang, Mannudeep K. Kalra, Pingkun Yan

Abstract: Cancer patients have a higher risk of cardiovascular disease (CVD) mortality than the general population. Low dose computed tomography (LDCT) for lung cancer screening offers an opportunity for simultaneous CVD risk estimation in at-risk patients. Our deep learning CVD risk prediction model, trained with 30,286 LDCTs from the National Lung Cancer Screening Trial, achieved an area under the curve (AUC) of 0.871 on a separate test set of 2,085 subjects and identified patients with high CVD mortality risks (AUC of 0.768). We validated our model against ECG-gated cardiac CT based markers, including coronary artery calcification (CAC) score, CAD-RADS score, and MESA 10-year risk score from an independent dataset of 335 subjects. Our work shows that, in high-risk patients, deep learning can convert LDCT for lung cancer screening into a dual-screening quantitative tool for CVD risk estimation.

Submitted 29 March, 2021; v1 submitted 16 August, 2020; originally announced August 2020.

arXiv:2008.03585 [pdf, other]

Fully Automated Functional Fuzzing of Android Apps for Detecting Non-crashing Logic Bugs

Authors: Ting Su, Yichen Yan, Jue Wang, Jingling Sun, Yiheng Xiong, Geguang Pu, Ke Wang, Zhendong Su

Abstract: Android apps are GUI-based event-driven software and have become ubiquitous in recent years. Obviously, functional correctness is critical for an app's success. However, in addition to crash bugs, non-crashing functional bugs (in short as "non-crashing bugs" in this work) like inadvertent function failures, silent user data loss and incorrect display information are prevalent, even in popular, well-tested apps. These non-crashing functional bugs are usually caused by program logic errors and manifest themselves on the graphical user interfaces (GUIs). In practice, such bugs pose significant challenges in effectively detecting them because (1) current practices heavily rely on expensive, small-scale manual validation (the lack of automation); and (2) modern fully automated testing has been limited to crash bugs (the lack of test oracles). This paper fills this gap by introducing independent view fuzzing, a novel, fully automated approach for detecting non-crashing functional bugs in Android apps. Inspired by metamorphic testing, our key insight is to leverage the commonly-held independent view property of Android apps to manufacture property-preserving mutant tests from a set of seed tests that validate certain app properties. The mutated tests help exercise the tested apps under additional, adverse conditions. Any property violations indicate likely functional bugs for further manual confirmation. We have realized our approach as an automated, end-to-end functional fuzzing tool, Genie. Given an app, (1) Genie automatically detects non-crashing bugs without requiring human-provided tests and oracles (thus fully automated); and (2) the detected non-crashing bugs are diverse (thus general and not limited to specific functional properties), which set Genie apart from prior work.

Submitted 5 October, 2021; v1 submitted 8 August, 2020; originally announced August 2020.

Showing 1–50 of 98 results for author: Su, T