Search | arXiv e-print repository

Structured Model Pruning for Efficient Inference in Computational Pathology

Authors: Mohammed Adnan, Qinle Ba, Nazim Shaikh, Shivam Kalra, Satarupa Mukherjee, Auranuch Lorsakul

Abstract: Recent years have seen significant efforts to adopt Artificial Intelligence (AI) in healthcare for various use cases, from computer-aided diagnosis to ICU triage. However, the size of AI models has been rapidly growing due to scaling laws and the success of foundational models, which poses an increasing challenge to leverage advanced models in practical applications. It is thus imperative to develop efficient models, especially for deploying AI solutions under resource constraints or with time sensitivity. One potential solution is to perform model compression, a set of techniques that remove less important model components or reduce parameter precision, to reduce model computation demand. In this work, we demonstrate that model pruning, as a model compression technique, can effectively reduce inference cost for computational and digital pathology based analysis with a negligible loss of analysis performance. To this end, we develop a methodology for pruning the widely used U-Net-style architectures in biomedical imaging, with which we evaluate multiple pruning heuristics on nuclei instance segmentation and classification, and empirically demonstrate that pruning can compress models by at least 70% with a negligible drop in performance.

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2403.07905 [pdf]

Enhancing Kubernetes Automated Scheduling with Deep Learning and Reinforcement Techniques for Large-Scale Cloud Computing Optimization

Authors: Zheng Xu, Yulu Gong, Yanlin Zhou, Qiaozhi Bao, Wenpin Qian

Abstract: With the continuous expansion of the scale of cloud computing applications, artificial intelligence technologies such as Deep Learning and Reinforcement Learning have gradually become the key tools to solve the automated task scheduling of large-scale cloud computing systems. To address the complexity and real-time requirements of task scheduling in large-scale cloud computing systems, this paper proposes an automatic task scheduling scheme based on deep learning and reinforcement learning. First, deep learning is used to monitor and predict the parameters of the cloud computing system in real time to obtain the system status information. Then, combined with a reinforcement learning algorithm, the task scheduling strategy is dynamically adjusted according to the real-time system state and task characteristics to achieve optimal utilization of system resources and maximum task execution efficiency. This paper verifies the effectiveness and performance advantages of the proposed scheme in experiments, and demonstrates the potential and application prospects of deep learning and reinforcement learning in automatic task scheduling in large-scale cloud computing systems.

Submitted 26 February, 2024; originally announced March 2024.

arXiv:2401.01078 [pdf, other]

Vietnamese Poem Generation & The Prospect Of Cross-Language Poem-To-Poem Translation

Authors: Triet Minh Huynh, Quan Le Bao

Abstract: Poetry generation has been a challenging task in the field of Natural Language Processing, as it requires the model to understand the nuances of language, sentiment, and style. In this paper, we propose using Large Language Models to generate Vietnamese poems of various genres from natural language prompts, thereby facilitating an intuitive process with enhanced content control. Our most efficacious model, the GPT-3 Babbage variant, achieves a custom evaluation score of 0.8, specifically tailored to the "luc bat" genre of Vietnamese poetry. Furthermore, we also explore the idea of paraphrasing poems into normal text prompts and yield a relatively high score of 0.781 in the "luc bat" genre. This experiment presents the potential for cross-language poem-to-poem translation with translated poems as the inputs while concurrently maintaining complete control over the generated content.

Submitted 4 January, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

arXiv:2310.09430 [pdf, ps, other]

Assessing and Enhancing the Robustness of Large Language Models with Task Structure Variations for Logical Reasoning

Authors: Qiming Bao, Gael Gendron, Alex Yuxuan Peng, Wanjun Zhong, Neset Tan, Yang Chen, Michael Witbrock, Jiamou Liu

Abstract: Large language models (LLMs), such as LLaMA, Alpaca, Vicuna, GPT-3.5 and GPT-4, have advanced the performance of AI systems on various natural language processing tasks to human-like levels. However, their generalisation and robustness when performing logical reasoning have not been sufficiently assessed. To comprehensively evaluate this ability, we develop three new logical reasoning datasets named "ReClor-plus", "LogiQA-plus" and "LogiQAv2-plus" that extend standard logical reasoning datasets to evaluate the robustness of the LLM's reasoning. For each, we create three subsets: the first with randomly shuffled options, the second with the correct choices replaced by "none of the other options is correct", and the third with a combination of shuffling and substitution. Experiments on these datasets show that these simple augmentations greatly hinder the models' performance. Despite their high performance on the original publicly available datasets, we find that all models perform poorly on these newly constructed datasets. We also demonstrate that introducing task variations into the training set can markedly improve the model's performance on both the original and our developed datasets. Finally, we show that applying logic-driven data augmentation for fine-tuning and prompting can enhance generalisation in both discriminative and generative models, offering a path to improving their robustness for tasks involving logical reasoning. Source code and data are made publicly available at https://github.com/Strong-AI-Lab/Logical-and-abstract-reasoning.

Submitted 30 March, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: The short version (v3) was accepted for oral presentation at the first LLM@IJCAI 2023 non-archival symposium; the full version is under review
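
The three subsets described in the abstract above (shuffled options, substituted correct choice, and both combined) can be sketched roughly as follows. This is a minimal illustration, not the authors' released code: the function names, the list-plus-index representation of a question, and the use of a seeded `random.Random` are all assumptions made for the example.

```python
import random

# Replacement text quoted in the abstract.
NONE_OPTION = "none of the other options is correct"


def shuffle_options(options, answer_idx, rng):
    """Return shuffled options and the new index of the correct answer."""
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    return shuffled, order.index(answer_idx)


def replace_answer(options, answer_idx):
    """Replace the correct option's text; its position stays the answer."""
    replaced = list(options)  # copy so the input list is left untouched
    replaced[answer_idx] = NONE_OPTION
    return replaced, answer_idx


def shuffle_and_replace(options, answer_idx, rng):
    """Combine both variations: shuffle first, then substitute."""
    shuffled, new_idx = shuffle_options(options, answer_idx, rng)
    return replace_answer(shuffled, new_idx)
```

Passing an explicit `rng` keeps the variations reproducible, which matters if the perturbed subsets are to be shared as fixed benchmarks.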

arXiv:2309.10444 [pdf, other]

Exploring Iterative Enhancement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models

Authors: Qiming Bao, Juho Leinonen, Alex Yuxuan Peng, Wanjun Zhong, Gaël Gendron, Timothy Pistotti, Alice Huang, Paul Denny, Michael Witbrock, Jiamou Liu

Abstract: Large language models exhibit superior capabilities in processing and understanding language, yet their applications in educational contexts remain underexplored. Learnersourcing enhances learning by engaging students in creating their own educational content. When learnersourcing multiple-choice questions, creating explanations for the solution of a question is a crucial step; it helps other students understand the solution and promotes a deeper understanding of related concepts. However, it is often difficult for students to craft effective solution explanations, due to limited subject understanding. To help scaffold the task of automated explanation generation, we present and evaluate a framework called "ILearner-LLM", that iteratively enhances the generated explanations for the given questions with large language models. Comprising an explanation generation model and an explanation evaluation model, the framework generates high-quality student-aligned explanations by iteratively feeding the quality rating score from the evaluation model back into the instruction prompt of the explanation generation model. Experimental results demonstrate the effectiveness of our ILearner-LLM on LLaMA2-13B and GPT-4 to generate higher quality explanations that are closer to those written by students on five PeerWise datasets. Our findings represent a promising path to enrich the learnersourcing experience for students and to enhance the capabilities of large language models for educational applications.

Submitted 10 March, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

Comments: The short version (v4) was accepted as a non-archival workshop paper at AGI@ICLR 2024; the full version is under review

arXiv:2306.08253 [pdf, ps, other]

doi 10.1109/TIFS.2023.3279979

Measures and Optimization for Robustness and Vulnerability in Disconnected Networks

Authors: Liwang Zhu, Qi Bao, Zhongzhi Zhang

Abstract: The function or performance of a network is strongly dependent on its robustness, quantifying the ability of the network to continue functioning under perturbations. While a wide variety of robustness metrics have been proposed, they have their respective limitations. In this paper, we propose to use the forest index as a measure of network robustness, which overcomes the deficiencies of existing metrics. Using such a measure as an optimization criterion, we propose and study the problem of breaking down a network by attacking some key edges. We show that the objective function of the problem is monotonic but not submodular, which makes the problem more challenging. We thus resort to greedy algorithms extended for non-submodular functions by iteratively deleting the most promising edges. We first propose a simple greedy algorithm with a proved bound for the approximation ratio and cubic-time complexity. To confront the computation challenge for large networks, we further propose an improved nearly-linear time greedy algorithm, which significantly speeds up the process for edge selection but sacrifices little accuracy. Extensive experimental results for a large set of real-world networks verify the effectiveness and efficiency of our algorithms, demonstrating that our algorithms outperform several baseline schemes.

Submitted 14 June, 2023; originally announced June 2023.

Comments: 13 pages

Journal ref: IEEE Transactions on Information Forensics and Security, pp. 3350-3362, 2023

arXiv:2306.02850 [pdf, other]

TRACE: 5D Temporal Regression of Avatars with Dynamic Cameras in 3D Environments

Authors: Yu Sun, Qian Bao, Wu Liu, Tao Mei, Michael J. Black

Abstract: Although the estimation of 3D human pose and shape (HPS) is rapidly progressing, current methods still cannot reliably estimate moving humans in global coordinates, which is critical for many applications. This is particularly challenging when the camera is also moving, entangling human and camera motion. To address these issues, we adopt a novel 5D representation (space, time, and identity) that enables end-to-end reasoning about people in scenes. Our method, called TRACE, introduces several novel architectural components. Most importantly, it uses two new "maps" to reason about the 3D trajectory of people over time in camera, and world, coordinates. An additional memory unit enables persistent tracking of people even during long occlusions. TRACE is the first one-stage method to jointly recover and track 3D humans in global coordinates from dynamic cameras. By training it end-to-end, and using full image information, TRACE achieves state-of-the-art performance on tracking and HPS benchmarks. The code and dataset are released for research purposes.

Submitted 20 November, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: Project page: https://www.yusun.work/TRACE/TRACE.html

arXiv:2305.19555 [pdf, ps, other]

Large Language Models Are Not Strong Abstract Reasoners

Authors: Gaël Gendron, Qiming Bao, Michael Witbrock, Gillian Dobbie

Abstract: Large Language Models have shown tremendous performance on a large variety of natural language processing tasks, ranging from text comprehension to common sense reasoning. However, the mechanisms responsible for this success remain opaque, and it is unclear whether LLMs can achieve human-like cognitive capabilities or whether these models are still fundamentally circumscribed. Abstract reasoning is a fundamental task for cognition, consisting of finding and applying a general pattern from few data. Evaluating deep neural architectures on this task could give insight into their potential limitations regarding reasoning and their broad generalisation abilities, yet this is currently an under-explored area. In this paper, we introduce a new benchmark for evaluating language models beyond memorization on abstract reasoning tasks. We perform extensive evaluations of state-of-the-art LLMs, showing that they currently achieve very limited performance in contrast with other natural language tasks, even when applying techniques that have been shown to improve performance on other NLP tasks. We argue that guiding LLM generation to follow causal paths could help improve the generalisation and reasoning abilities of LLMs.

Submitted 2 January, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

Comments: 50 pages, 14 pages for the main paper and 36 pages for the supplement, 35 figures, 17 tables. V3: performed additional experiments

ACM Class: I.2.2; I.2.3; I.2.7; I.5.1

arXiv:2305.12599 [pdf, other]

Abstract Meaning Representation-Based Logic-Driven Data Augmentation for Logical Reasoning

Authors: Qiming Bao, Alex Yuxuan Peng, Zhenyun Deng, Wanjun Zhong, Gael Gendron, Timothy Pistotti, Neset Tan, Nathan Young, Yang Chen, Yonghua Zhu, Paul Denny, Michael Witbrock, Jiamou Liu

Abstract: Combining large language models with logical reasoning enhances their capacity to address problems in a robust and reliable manner. Nevertheless, the intricate nature of logical reasoning poses challenges when gathering reliable data from the web to build comprehensive training datasets, subsequently affecting performance on downstream tasks. To address this, we introduce a novel logic-driven data augmentation approach, AMR-LDA. AMR-LDA converts the original text into an Abstract Meaning Representation (AMR) graph, a structured semantic representation that encapsulates the logical structure of the sentence, upon which operations are performed to generate logically modified AMR graphs. The modified AMR graphs are subsequently converted back into text to create augmented data. Notably, our methodology is architecture-agnostic and enhances both generative large language models, such as GPT-3.5 and GPT-4, through prompt augmentation, and discriminative large language models through contrastive learning with logic-driven data augmentation. Empirical evidence underscores the efficacy of our proposed method with improvement in performance across seven downstream tasks, such as reading comprehension requiring logical reasoning, textual entailment, and natural language inference. Furthermore, our method leads on the ReClor leaderboard at https://eval.ai/web/challenges/challenge-page/503/leaderboard/1347. The source code and data are publicly available at https://github.com/Strong-AI-Lab/Logical-Equivalence-driven-AMR-Data-Augmentation-for-Representation-Learning.

Submitted 6 June, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

Comments: 21 pages, 8 figures, the Findings of ACL 2024

arXiv:2303.07585 [pdf, other]

Input-length-shortening and text generation via attention values

Authors: Neşet Özkan Tan, Alex Yuxuan Peng, Joshua Bensemann, Qiming Bao, Tim Hartill, Mark Gahegan, Michael Witbrock

Abstract: Identifying words that impact a task's performance more than others is a challenge in natural language processing. Transformer models have recently addressed this issue by incorporating an attention mechanism that assigns greater attention (i.e., relevance) scores to some words than others. Because of the attention mechanism's high computational cost, transformer models usually have an input-length limitation caused by hardware constraints. This limitation applies to many transformers, including the well-known bidirectional encoder representations of the transformer (BERT) model. In this paper, we examined BERT's attention assignment mechanism, focusing on two questions: (1) How can attention be employed to reduce input length? (2) How can attention be used as a control mechanism for conditional text generation? We investigated these questions in the context of a text classification task. We discovered that BERT's early layers assign more critical attention scores for text classification tasks compared to later layers. We demonstrated that the first layer's attention sums could be used to filter tokens in a given sequence, considerably decreasing the input length while maintaining good test accuracy. We also applied filtering, which uses a compute-efficient semantic similarities algorithm, and discovered that retaining approximately 6% of the original sequence is sufficient to obtain 86.5% accuracy. Finally, we showed that we could generate data in a stable manner and indistinguishable from the original one by only using a small percentage (10%) of the tokens with high attention scores according to BERT's first layer.

Submitted 13 March, 2023; originally announced March 2023.

Comments: 7 pages, 4 figures. AAAI23-EMC2
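
The idea of filtering tokens by first-layer attention sums, as described in the abstract above, might look roughly like the sketch below. The abstract does not specify how attention is aggregated, so summing the attention each token receives over all heads and query positions is an assumption here, as are the function names and the nested-list layout `attn[head][query][key]`. No real model is called; the attention weights are taken as given.

```python
def attention_sums(attn):
    """Total attention received by each token (key position).

    attn[h][q][k] is the weight from query q to key k in head h.
    Assumes square attention matrices (one row and column per token).
    """
    seq_len = len(attn[0])
    sums = [0.0] * seq_len
    for head in attn:
        for row in head:
            for k, weight in enumerate(row):
                sums[k] += weight
    return sums


def filter_tokens(tokens, attn, keep_ratio):
    """Keep the highest-scoring fraction of tokens, in original order."""
    sums = attention_sums(attn)
    n_keep = max(1, round(len(tokens) * keep_ratio))  # never drop everything
    top = sorted(range(len(tokens)), key=lambda i: sums[i], reverse=True)[:n_keep]
    return [tokens[i] for i in sorted(top)]
```

Keeping the surviving tokens in their original order preserves whatever word-order signal remains for the downstream classifier.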

arXiv:2209.02431 [pdf, other]

DPIT: Dual-Pipeline Integrated Transformer for Human Pose Estimation

Authors: Shuaitao Zhao, Kun Liu, Yuhang Huang, Qian Bao, Dan Zeng, Wu Liu

Abstract: Human pose estimation aims to figure out the keypoints of all people in different scenes. Current approaches still face some challenges despite promising results. Existing top-down methods deal with a single person individually, without the interaction between different people and the scene they are situated in. Consequently, the performance of human detection degrades when serious occlusion happens. On the other hand, existing bottom-up methods consider all people at the same time and capture the global knowledge of the entire image. However, they are less accurate than the top-down methods due to the scale variation. To address these problems, we propose a novel Dual-Pipeline Integrated Transformer (DPIT) by integrating top-down and bottom-up pipelines to explore the visual clues of different receptive fields and achieve their complementarity. Specifically, DPIT consists of two branches: the bottom-up branch deals with the whole image to capture the global visual information, while the top-down branch extracts the feature representation of local vision from the single-human bounding box. Then, the extracted feature representations from bottom-up and top-down branches are fed into the transformer encoder to fuse the global and local knowledge interactively. Moreover, we define the keypoint queries to explore both full-scene and single-human posture visual clues to realize the mutual complementarity of the two pipelines. To the best of our knowledge, this is one of the first works to integrate the bottom-up and top-down pipelines with transformers for human pose estimation. Extensive experiments on COCO and MPII datasets demonstrate that our DPIT achieves comparable performance to the state-of-the-art methods.

Submitted 2 September, 2022; originally announced September 2022.

arXiv:2209.01059 [pdf, other]

In-Place Gestures Classification via Long-term Memory Augmented Network

Authors: Lizhi Zhao, Xuequan Lu, Qianyue Bao, Meili Wang

Abstract: In-place gesture-based virtual locomotion techniques enable users to control their viewpoint and intuitively move in the 3D virtual environment. A key research problem is to accurately and quickly recognize in-place gestures, since they can trigger specific movements of virtual viewpoints and enhance user experience. However, to achieve real-time experience, only short-term sensor sequence data (up to about 300ms, 6 to 10 frames) can be taken as input, which actually affects the classification performance due to limited spatio-temporal information. In this paper, we propose a novel long-term memory augmented network for in-place gestures classification. It takes as input both short-term gesture sequence samples and their corresponding long-term sequence samples that provide extra relevant spatio-temporal information in the training phase. We store long-term sequence features with an external memory queue. In addition, we design a memory augmented loss to help cluster features of the same class and push apart features from different classes, thus enabling our memory queue to memorize more relevant long-term sequence features. In the inference phase, we input only short-term sequence samples to recall the stored features accordingly, and fuse them together to predict the gesture class. We create a large-scale in-place gestures dataset from 25 participants with 11 gestures. Our method achieves a promising accuracy of 95.1% with a latency of 192ms, and an accuracy of 97.3% with a latency of 312ms, and is demonstrated to be superior to recent in-place gesture classification techniques. A user study also validates our approach. Our source code and dataset will be made available to the community.

Submitted 2 September, 2022; originally announced September 2022.

Comments: This paper is accepted to IEEE ISMAR2022

arXiv:2209.00776 [pdf, other]

doi 10.1145/3503161.3547743

WOC: A Handy Webcam-based 3D Online Chatroom

Authors: Chuanhang Yan, Yu Sun, Qian Bao, Jinhui Pang, Wu Liu, Tao Mei

Abstract: We develop WOC, a webcam-based 3D virtual online chatroom for multi-person interaction, which captures the 3D motion of users and drives their individual 3D virtual avatars in real-time. Compared to the existing wearable equipment-based solution, WOC offers convenient and low-cost 3D motion capture with a single camera. To promote the immersive chat experience, WOC provides high-fidelity virtual avatar manipulation, which also supports user-defined characters. With the distributed data flow service, the system delivers highly synchronized motion and voice for all users. WOC is deployed on the web and requires no installation; users can freely experience the virtual online chat at https://yanch.cloud.

Submitted 17 March, 2023; v1 submitted 1 September, 2022; originally announced September 2022.

arXiv:2208.03609 [pdf, other]

Continual Learning for Tumor Classification in Histopathology Images

Authors: Veena Kaustaban, Qinle Ba, Ipshita Bhattacharya, Nahil Sobh, Satarupa Mukherjee, Jim Martin, Mohammad Saleh Miri, Christoph Guetter, Amal Chaturvedi

Abstract: Recent years have seen great advancements in the development of deep learning models for histopathology image analysis in digital pathology (DP) applications, evidenced by the increasingly common deployment of these models in both research and clinical settings. Although such models have shown unprecedented performance in solving fundamental computational tasks in DP applications, they suffer from catastrophic forgetting when adapted to unseen data with transfer learning. With an increasing need for deep learning models to handle ever changing data distributions, including evolving patient population and new diagnosis assays, continual learning (CL) models that alleviate model forgetting need to be introduced in DP based analysis. However, to the best of our knowledge, there is no systematic study of such models for DP-specific applications. Here, we propose CL scenarios in DP settings, where histopathology image data from different sources/distributions arrive sequentially, the knowledge of which is integrated into a single model without training all the data from scratch. We then established an augmented dataset for colorectal cancer H&E classification to simulate shifts of image appearance and evaluated CL model performance in the proposed CL scenarios. We leveraged a breast tumor H&E dataset along with the colorectal cancer to evaluate CL from different tumor types. In addition, we evaluated CL methods in an online few-shot setting under the constraints of annotation and computational resources. We revealed promising results of CL in DP applications, potentially paving the way for application of these methods in clinical practice.

Submitted 6 August, 2022; originally announced August 2022.

Comments: Accepted by MOVI, a MICCAI2022 workshop: https://sites.google.com/view/movi2022

arXiv:2207.14000 [pdf, other]

Multi-Step Deductive Reasoning Over Natural Language: An Empirical Study on Out-of-Distribution Generalisation

Authors: Qiming Bao, Alex Yuxuan Peng, Tim Hartill, Neset Tan, Zhenyun Deng, Michael Witbrock, Jiamou Liu

Abstract: Combining deep learning with symbolic logic reasoning aims to capitalize on the success of both fields and is drawing increasing attention. Inspired by DeepLogic, an end-to-end model trained to perform inference on logic programs, we introduce IMA-GloVe-GA, an iterative neural inference network for multi-step reasoning expressed in natural language. In our model, reasoning is performed using an iterative memory neural network based on RNN with a gated attention mechanism. We evaluate IMA-GloVe-GA on three datasets: PARARULES, CONCEPTRULES V1 and CONCEPTRULES V2. Experimental results show DeepLogic with gated attention can achieve higher test accuracy than DeepLogic and other RNN baseline models. Our model achieves better out-of-distribution generalisation than RoBERTa-Large when the rules have been shuffled. Furthermore, to address the issue of unbalanced distribution of reasoning depths in the current multi-step reasoning datasets, we develop PARARULE-Plus, a large dataset with more examples that require deeper reasoning steps. Experimental results show that the addition of PARARULE-Plus can increase the model's performance on examples requiring deeper reasoning depths. The source code and data are available at https://github.com/Strong-AI-Lab/Multi-Step-Deductive-Reasoning-Over-Natural-Language.

Submitted 30 March, 2024; v1 submitted 28 July, 2022; originally announced July 2022.

Comments: 10 pages, 3 figures, The 2nd International Joint Conference on Learning & Reasoning and 16th International Workshop on Neural-Symbolic Learning and Reasoning (IJCLR-NeSy 2022)

arXiv:2203.12186 [pdf, other]

AbductionRules: Training Transformers to Explain Unexpected Inputs

Authors: Nathan Young, Qiming Bao, Joshua Bensemann, Michael Witbrock

Abstract: Transformers have recently been shown to be capable of reliably performing logical reasoning over facts and rules expressed in natural language, but abductive reasoning - inference to the best explanation of an unexpected observation - has been underexplored despite significant applications to scientific discovery, common-sense reasoning, and model interpretability. We present AbductionRules, a group of natural language datasets designed to train and test generalisable abduction over natural-language knowledge bases. We use these datasets to finetune pretrained Transformers and discuss their performance, finding that our models learned generalisable abductive techniques but also learned to exploit the structure of our data. Finally, we discuss the viability of this approach to abductive reasoning and ways in which it may be improved in future work.

Submitted 23 March, 2022; originally announced March 2022.

Comments: Findings of ACL 2022

arXiv:2201.04024 [pdf, other]

Smart Director: An Event-Driven Directing System for Live Broadcasting

Authors: Yingwei Pan, Yue Chen, Qian Bao, Ning Zhang, Ting Yao, Jingen Liu, Tao Mei

Abstract: Live video broadcasting normally requires a multitude of skills and expertise with domain knowledge to enable multi-camera productions. As the number of cameras keep increasing, directing a live sports broadcast has now become more complicated and challenging than ever before. The broadcast directors need to be much more concentrated, responsive, and knowledgeable, during the production. To relieve the directors from their intensive efforts, we develop an innovative automated sports broadcast directing system, called Smart Director, which aims at mimicking the typical human-in-the-loop broadcasting process to automatically create near-professional broadcasting programs in real-time by using a set of advanced multi-view video analysis algorithms. Inspired by the so-called "three-event" construction of sports broadcast, we build our system with an event-driven pipeline consisting of three consecutive novel components: 1) the Multi-view Event Localization to detect events by modeling multi-view correlations, 2) the Multi-view Highlight Detection to rank camera views by the visual importance for view selection, 3) the Auto-Broadcasting Scheduler to control the production of broadcasting videos. To our best knowledge, our system is the first end-to-end automated directing system for multi-camera sports broadcasting, completely driven by the semantic understanding of sports events. It is also the first system to solve the novel problem of multi-view joint event detection by cross-view relation modeling.
We conduct both objective and subjective evaluations on a real-world multi-camera soccer dataset, which demonstrate the quality of our auto-generated videos is comparable to that of the human-directed. Thanks to its faster response, our system is able to capture more fast-passing and short-duration events which are usually missed by human directors.

Submitted 11 January, 2022; originally announced January 2022.

Comments: ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)

arXiv:2201.00616 [pdf, other]

doi 10.1142/S2705078521500156

Relating Blindsight and AI: A Review

Authors: Joshua Bensemann, Qiming Bao, Gaël Gendron, Tim Hartill, Michael Witbrock

Abstract: Processes occurring in brains, a.k.a. biological neural networks, can and have been modeled within artificial neural network architectures. Due to this, we have conducted a review of research on the phenomenon of blindsight in an attempt to generate ideas for artificial intelligence models. Blindsight can be considered as a diminished form of visual experience. If we assume that artificial networks have no form of visual experience, then deficits caused by blindsight give us insights into the processes occurring within visual experience that we can incorporate into artificial neural networks. This article has been structured into three parts. Section 2 is a review of blindsight research, looking specifically at the errors occurring during this condition compared to normal vision. Section 3 identifies overall patterns from Section 2 to generate insights for computational models of vision. Section 4 demonstrates the utility of examining biological research to inform artificial intelligence research by examining computation models of visual attention relevant to one of the insights generated in Section 3. The research covered in Section 4 shows that incorporating one of our insights into computational vision does benefit those models. Future research will be required to determine whether our other insights are as valuable.

Submitted 8 December, 2021; originally announced January 2022.

Comments: Preprint of an article published in Journal of Artificial Intelligence and Consciousness, 2021 doi.org/10.1142/S2705078521500156 © World Scientific Publishing Company www.worldscientific.com/worldscinet/jaic

Journal ref: Journal of Artificial Intelligence and Consciousness, 1-15 (2021)

arXiv:2201.00466 [pdf, other]

RFormer: Transformer-based Generative Adversarial Network for Real Fundus Image Restoration on A New Clinical Benchmark

Authors: Zhuo Deng, Yuanhao Cai, Lu Chen, Zheng Gong, Qiqi Bao, Xue Yao, Dong Fang, Shaochong Zhang, Lan Ma

Abstract: Ophthalmologists have used fundus images to screen and diagnose eye diseases. However, different equipments and ophthalmologists pose large variations to the quality of fundus images. Low-quality (LQ) degraded fundus images easily lead to uncertainty in clinical screening and generally increase the risk of misdiagnosis. Thus, real fundus image restoration is worth studying. Unfortunately, real clinical benchmark has not been explored for this task so far. In this paper, we investigate the real clinical fundus image restoration problem. Firstly, We establish a clinical dataset, Real Fundus (RF), including 120 low- and high-quality (HQ) image pairs. Then we propose a novel Transformer-based Generative Adversarial Network (RFormer) to restore the real degradation of clinical fundus images. The key component in our network is the Window-based Self-Attention Block (WSAB) which captures non-local self-similarity and long-range dependencies. To produce more visually pleasant results, a Transformer-based discriminator is introduced. Extensive experiments on our clinical benchmark show that the proposed RFormer significantly outperforms the state-of-the-art (SOTA) methods. In addition, experiments of downstream tasks such as vessel segmentation and optic disc/cup detection demonstrate that our proposed RFormer benefits clinical fundus image analysis and applications. The dataset, code, and models are publicly available at https://github.com/dengzhuo-AI/Real-Fundus

Submitted 3 August, 2022; v1 submitted 2 January, 2022; originally announced January 2022.

Comments: IEEE J-BHI 2022; The First Benchmark and First Transformer-based Method for Real Clinical Fundus Image Restoration

arXiv:2112.08274 [pdf, other]

Putting People in their Place: Monocular Regression of 3D People in Depth

Authors: Yu Sun, Wu Liu, Qian Bao, Yili Fu, Tao Mei, Michael J. Black

Abstract: Given an image with multiple people, our goal is to directly regress the pose and shape of all the people as well as their relative depth. Inferring the depth of a person in an image, however, is fundamentally ambiguous without knowing their height. This is particularly problematic when the scene contains people of very different sizes, e.g. from infants to adults. To solve this, we need several things. First, we develop a novel method to infer the poses and depth of multiple people in a single image. While previous work that estimates multiple people does so by reasoning in the image plane, our method, called BEV, adds an additional imaginary Bird's-Eye-View representation to explicitly reason about depth. BEV reasons simultaneously about body centers in the image and in depth and, by combing these, estimates 3D body position. Unlike prior work, BEV is a single-shot method that is end-to-end differentiable. Second, height varies with age, making it impossible to resolve depth without also estimating the age of people in the image. To do so, we exploit a 3D body model space that lets BEV infer shapes from infants to adults. Third, to train BEV, we need a new dataset. Specifically, we create a "Relative Human" (RH) dataset that includes age labels and relative depth relationships between the people in the images. Extensive experiments on RH and AGORA demonstrate the effectiveness of the model and training scheme. BEV outperforms existing methods on depth reasoning, child shape estimation, and robustness to occlusion.
The code and dataset are released for research purposes.

Submitted 19 April, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

Comments: CVPR 2022; Code https://github.com/Arthur151/ROMP ; Dataset https://github.com/Arthur151/Relative_Human

arXiv:2111.10058 [pdf, other]

DeepQR: Neural-based Quality Ratings for Learnersourced Multiple-Choice Questions

Authors: Lin Ni, Qiming Bao, Xiaoxuan Li, Qianqian Qi, Paul Denny, Jim Warren, Michael Witbrock, Jiamou Liu

Abstract: Automated question quality rating (AQQR) aims to evaluate question quality through computational means, thereby addressing emerging challenges in online learnersourced question repositories. Existing methods for AQQR rely solely on explicitly-defined criteria such as readability and word count, while not fully utilising the power of state-of-the-art deep-learning techniques. We propose DeepQR, a novel neural-network model for AQQR that is trained using multiple-choice-question (MCQ) datasets collected from PeerWise, a widely-used learnersourcing platform. Along with designing DeepQR, we investigate models based on explicitly-defined features, or semantic features, or both. We also introduce a self-attention mechanism to capture semantic correlations between MCQ components, and a contrastive-learning approach to acquire question representations using quality ratings. Extensive experiments on datasets collected from eight university-level courses illustrate that DeepQR has superior performance over six comparative models.

Submitted 19 November, 2021; originally announced November 2021.

Comments: EAAI 22

arXiv:2110.07872 [pdf, ps, other]

Role Similarity Metric Based on Spanning Rooted Forest

Authors: Qi Bao, Zhongzhi Zhang, Haibin Kan

Abstract: As a fundamental issue in network analysis, structural node similarity has received much attention in academia and is adopted in a wide range of applications. Among these proposed structural node similarity measures, role similarity stands out because of satisfying several axiomatic properties including automorphism conformation. Existing role similarity metrics cannot handle top-k queries on large real-world networks due to the high time and space cost. In this paper, we propose a new role similarity metric, namely \textsf{ForestSim}. We prove that \textsf{ForestSim} is an admissible role similarity metric and devise the corresponding top-k similarity search algorithm, namely \textsf{ForestSimSearch}, which is able to process a top-k query in $O(k)$ time once the precomputation is finished. Moreover, we speed up the precomputation by using a fast approximate algorithm to compute the diagonal entries of the forest matrix, which reduces the time and space complexity of the precomputation to $O(ε^{-2}m\log^5{n}\log{\frac{1}ε})$ and $O(m\log^3{n})$, respectively. Finally, we conduct extensive experiments on 26 real-world networks. The results show that \textsf{ForestSim} works efficiently on million-scale networks and achieves comparable performance to the state-of-art methods.

Submitted 1 April, 2024; v1 submitted 15 October, 2021; originally announced October 2021.

Comments: 10 pages, 5 figures

arXiv:2108.13246 [pdf, other]

LUAI Challenge 2021 on Learning to Understand Aerial Images

Authors: Gui-Song Xia, Jian Ding, Ming Qian, Nan Xue, Jiaming Han, Xiang Bai, Michael Ying Yang, Shengyang Li, Serge Belongie, Jiebo Luo, Mihai Datcu, Marcello Pelillo, Liangpei Zhang, Qiang Zhou, Chao-hui Yu, Kaixuan Hu, Yingjia Bu, Wenming Tan, Zhe Yang, Wei Li, Shang Liu, Jiaxuan Zhao, Tianzhi Ma, Zi-han Gao, Lingqi Wang , et al. (11 additional authors not shown)

Abstract: This report summarizes the results of Learning to Understand Aerial Images (LUAI) 2021 challenge held on ICCV 2021, which focuses on object detection and semantic segmentation in aerial images. Using DOTA-v2.0 and GID-15 datasets, this challenge proposes three tasks for oriented object detection, horizontal object detection, and semantic segmentation of common categories in aerial images. This challenge received a total of 146 registrations on the three tasks. Through the challenge, we hope to draw attention from a wide range of communities and call for more efforts on the problems of learning to understand aerial images.

Submitted 17 September, 2021; v1 submitted 30 August, 2021; originally announced August 2021.

Comments: 7 pages, 2 figures, accepted by ICCVW 2021

arXiv:2104.11536 [pdf, other]

Recent Advances in Monocular 2D and 3D Human Pose Estimation: A Deep Learning Perspective

Authors: Wu Liu, Qian Bao, Yu Sun, Tao Mei

Abstract: Estimation of the human pose from a monocular camera has been an emerging research topic in the computer vision community with many applications. Recently, benefited from the deep learning technologies, a significant amount of research efforts have greatly advanced the monocular human pose estimation both in 2D and 3D areas. Although there have been some works to summarize the different approaches, it still remains challenging for researchers to have an in-depth view of how these approaches work. In this paper, we provide a comprehensive and holistic 2D-to-3D perspective to tackle this problem. We categorize the mainstream and milestone approaches since the year 2014 under unified frameworks. By systematically summarizing the differences and connections between these approaches, we further analyze the solutions for challenging cases, such as the lack of data, the inherent ambiguity between 2D and 3D, and the complex multi-person scenarios. We also summarize the pose representation styles, benchmarks, evaluation metrics, and the quantitative performance of popular approaches. Finally, we discuss the challenges and give deep thinking of promising directions for future research. We believe this survey will provide the readers with a deep and insightful understanding of monocular human pose estimation.

Submitted 23 April, 2021; originally announced April 2021.

arXiv:2102.07087 [pdf, other]

Survey on Aerial Radio Access Networks: Toward a Comprehensive 6G Access Infrastructure

Authors: Nhu-Ngoc Dao, Quoc-Viet Pham, Ngo Hoang Tu, Tran Thien Thanh, Vo Nguyen Quoc Bao, Demeke Shumeye Lakew, Sungrae Cho

Abstract: Current network access infrastructures are characterized by heterogeneity, low latency, high throughput, and high computational capability, enabling massive concurrent connections and various services. Unfortunately, this design does not pay significant attention to mobile services in underserved areas. In this context, the use of aerial radio access networks (ARANs) is a promising strategy to complement existing terrestrial communication systems. Involving airborne components such as unmanned aerial vehicles, drones, and satellites, ARANs can quickly establish a flexible access infrastructure on demand. ARANs are expected to support the development of seamless mobile communication systems toward a comprehensive sixth-generation (6G) global access infrastructure. This paper provides an overview of recent studies regarding ARANs in the literature. First, we investigate related work to identify areas for further exploration in terms of recent knowledge advancements and analyses. Second, we define the scope and methodology of this study. Then, we describe ARAN architecture and its fundamental features for the development of 6G networks. In particular, we analyze the system model from several perspectives, including transmission propagation, energy consumption, communication latency, and network mobility. Furthermore, we introduce technologies that enable the success of ARAN implementations in terms of energy replenishment, operational management, and data delivery. Subsequently, we discuss application scenarios envisioned for these technologies.
Finally, we highlight ongoing research efforts and trends toward 6G ARANs.

Submitted 27 February, 2021; v1 submitted 14 February, 2021; originally announced February 2021.

Comments: Accepted by the IEEE Communications Surveys and Tutorials

arXiv:2101.08143 [pdf, other]

doi 10.1145/3442381.3449812

Fast Evaluation for Relevant Quantities of Opinion Dynamics

Authors: Wanyue Xu, Qi Bao, Zhongzhi Zhang

Abstract: One of the main subjects in the field of social networks is to quantify conflict, disagreement, controversy, and polarization, and some quantitative indicators have been developed to quantify these concepts. However, direct computation of these indicators involves the operations of matrix inversion and multiplication, which make it computationally infeasible for large-scale graphs with millions of nodes. In this paper, by reducing the problem of computing relevant quantities to evaluating $\ell_2$ norms of some vectors, we present a nearly linear time algorithm to estimate all these quantities. Our algorithm is based on the Laplacian solvers, and has a proved theoretical guarantee of error for each quantity. We execute extensive numerical experiments on a variety of real networks, which demonstrate that our approximation algorithm is efficient and effective, scalable to large graphs having millions of nodes.

Submitted 12 June, 2021; v1 submitted 20 January, 2021; originally announced January 2021.

Journal ref: Proceedings of The Web Conference 2021, pp.2037-2045

arXiv:2012.13577 [pdf, other]

doi 10.1609/aaai.v36i10.21291

LOREN: Logic-Regularized Reasoning for Interpretable Fact Verification

Authors: Jiangjie Chen, Qiaoben Bao, Changzhi Sun, Xinbo Zhang, Jiaze Chen, Hao Zhou, Yanghua Xiao, Lei Li

Abstract: Given a natural language statement, how to verify its veracity against a large-scale textual knowledge source like Wikipedia? Most existing neural models make predictions without giving clues about which part of a false claim goes wrong. In this paper, we propose LOREN, an approach for interpretable fact verification. We decompose the verification of the whole claim at phrase-level, where the veracity of the phrases serves as explanations and can be aggregated into the final verdict according to logical rules. The key insight of LOREN is to represent claim phrase veracity as three-valued latent variables, which are regularized by aggregation logical rules. The final claim verification is based on all latent variables. Thus, LOREN enjoys the additional benefit of interpretability -- it is easy to explain how it reaches certain results with claim phrase veracity. Experiments on a public fact verification benchmark show that LOREN is competitive against previous approaches while enjoying the merit of faithful and accurate interpretability. The resources of LOREN are available at: https://github.com/jiangjiechen/LOREN.

Submitted 9 December, 2021; v1 submitted 25 December, 2020; originally announced December 2020.

Comments: Accepted to AAAI 2022

arXiv:2012.04821 [pdf, ps, other]

Complex Relation Extraction: Challenges and Opportunities

Authors: Haiyun Jiang, Qiaoben Bao, Qiao Cheng, Deqing Yang, Li Wang, Yanghua Xiao

Abstract: Relation extraction aims to identify the target relations of entities in texts. Relation extraction is very important for knowledge base construction and text understanding. Traditional binary relation extraction, including supervised, semi-supervised and distant supervised ones, has been extensively studied and significant results are achieved. In recent years, many complex relation extraction tasks, i.e., the variants of simple binary relation extraction, are proposed to meet the complex applications in practice. However, there is no literature to fully investigate and summarize these complex relation extraction works so far. In this paper, we first report the recent progress in traditional simple binary relation extraction. Then we summarize the existing complex relation extraction tasks and present the definition, recent progress, challenges and opportunities for each task.

Submitted 8 December, 2020; originally announced December 2020.

Comments: 7 pages

arXiv:2010.14036 [pdf, other]

Synthetic Training for Monocular Human Mesh Recovery

Authors: Yu Sun, Qian Bao, Wu Liu, Wenpeng Gao, Yili Fu, Chuang Gan, Tao Mei

Abstract: Recovering 3D human mesh from monocular images is a popular topic in computer vision and has a wide range of applications. This paper aims to estimate 3D mesh of multiple body parts (e.g., body, hands) with large-scale differences from a single RGB image. Existing methods are mostly based on iterative optimization, which is very time-consuming. We propose to train a single-shot model to achieve this goal. The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images. To solve this problem, we design a multi-branch framework to disentangle the regression of different body properties, enabling us to separate each component's training in a synthetic training manner using unpaired data available. Besides, to strengthen the generalization ability, most existing methods have used in-the-wild 2D pose datasets to supervise the estimated 3D pose via 3D-to-2D projection. However, we observe that the commonly used weak-perspective model performs poorly in dealing with the external foreshortening effect of camera projection. Therefore, we propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants for more proper supervision. The proposed method outperforms previous methods on the CMU Panoptic Studio dataset according to the evaluation results and achieves comparable results on the Human3.6M body and STB hand benchmarks.
More impressively, the performance in close shot images gets significantly improved using the proposed D2S projection for weak supervision, while maintains obvious superiority in computational efficiency.

Submitted 26 October, 2020; originally announced October 2020.

arXiv:2008.12272 [pdf, other]

Monocular, One-stage, Regression of Multiple 3D People

Authors: Yu Sun, Qian Bao, Wu Liu, Yili Fu, Michael J. Black, Tao Mei

Abstract: This paper focuses on the regression of multiple 3D people from a single RGB image. Existing approaches predominantly follow a multi-stage pipeline that first detects people in bounding boxes and then independently regresses their 3D body meshes. In contrast, we propose to Regress all meshes in a One-stage fashion for Multiple 3D People (termed ROMP). The approach is conceptually simple, bounding box-free, and able to learn a per-pixel representation in an end-to-end manner. Our method simultaneously predicts a Body Center heatmap and a Mesh Parameter map, which can jointly describe the 3D body mesh on the pixel level. Through a body-center-guided sampling process, the body mesh parameters of all people in the image are easily extracted from the Mesh Parameter map. Equipped with such a fine-grained representation, our one-stage framework is free of the complex multi-stage process and more robust to occlusion. Compared with state-of-the-art methods, ROMP achieves superior performance on the challenging multi-person benchmarks, including 3DPW and CMU Panoptic. Experiments on crowded/occluded datasets demonstrate the robustness under various types of occlusion. The released code is the first real-time implementation of monocular multi-person 3D mesh regression.

Submitted 16 September, 2021; v1 submitted 27 August, 2020; originally announced August 2020.

Comments: ICCV 2021, Code https://github.com/Arthur151/ROMP

arXiv:2008.06285 [pdf, other]

Rb-PaStaNet: A Few-Shot Human-Object Interaction Detection Based on Rules and Part States

Authors: Shenyu Zhang, Zichen Zhu, Qingquan Bao

Abstract: Existing Human-Object Interaction (HOI) Detection approaches have achieved great progress on nonrare classes while rare HOI classes are still not well-detected. In this paper, we intend to apply human prior knowledge into the existing work. So we add human-labeled rules to PaStaNet and propose Rb-PaStaNet aimed at improving rare HOI classes detection. Our results show a certain improvement of the rare classes, while the non-rare classes and the overall improvement is more considerable.

Submitted 14 August, 2020; originally announced August 2020.

arXiv:2006.13693 [pdf, other]

PECAIQR: A Model for Infectious Disease Applied to the Covid-19 Epidemic

Authors: Richard Bao, August Chen, Jethin Gowda, Shiva Mudide

Abstract: The Covid-19 pandemic has made clear the need to improve modern multivariate time-series forecasting models. Current state-of-the-art predictions of future daily deaths and, especially, hospital resource usage have confidence intervals that are unacceptably wide. Policy makers and hospitals require accurate forecasts to make informed decisions on passing legislation and allocating resources. We used US county-level data on daily deaths and population statistics to forecast future deaths. We extended the SIR epidemiological model to a novel model we call the PECAIQR model. It adds several new variables and parameters to the naive SIR model by taking into account the ramifications of the partial quarantining implemented in the US. We fitted data to the model parameters with numerical integration. Because of the fit degeneracy in parameter space and the non-constant nature of the parameters, we developed several methods to optimize our fit, such as training on the data tail and training on specific policy regimes. We use cross-validation to tune our hyperparameters at the county level and generate a CDF for future daily deaths. For predictions made from training data up to May 25th, we consistently obtained an averaged pinball loss score of 0.096 on a 14-day forecast. We finally present examples of possible avenues for utility from our model. We generate longer-time-horizon predictions over various 1-month windows in the past, forecast how many medical resources such as ventilators and ICU beds will be needed in counties, and evaluate the efficacy of our model in other countries.

Submitted 17 June, 2020; originally announced June 2020.

arXiv:2002.03140 [pdf, other]

doi 10.1145/3373017.3373049

HHH: An Online Medical Chatbot System based on Knowledge Graph and Hierarchical Bi-Directional Attention

Authors: Qiming Bao, Lin Ni, Jiamou Liu

Abstract: This paper proposes a chatbot framework that adopts a hybrid model which consists of a knowledge graph and a text similarity model. Based on this chatbot framework, we build HHH, an online question-and-answer (QA) Healthcare Helper system for answering complex medical questions. HHH maintains a knowledge graph constructed from medical data collected from the Internet. HHH also implements a novel text representation and similarity deep learning model, Hierarchical BiLSTM Attention Model (HBAM), to find the most similar question from a large QA dataset. We compare HBAM with other state-of-the-art language models such as bidirectional encoder representation from transformers (BERT) and Manhattan LSTM Model (MaLSTM). We train and test the models with a subset of the Quora duplicate questions dataset in the medical area. The experimental results show that our model achieves superior performance compared with these existing methods.

Submitted 8 February, 2020; originally announced February 2020.

Comments: 10 pages, 9 figures, 3 tables. Proceedings of the Australasian Computer Science Week Multiconference (ACSW 2020)

arXiv:1911.08045 [pdf, ps, other]

The k-Power Domination Number in Some Self-Similar Graphs

Authors: Yulun Xu, Qi Bao, Zhongzhi Zhang

Abstract: The $k$-power domination problem is a problem in graph theory, which has applications in many areas. However, it is hard to calculate the exact $k$-power domination number, since determining the $k$-power domination number of a generic graph is an NP-complete problem. We determine the exact $k$-power domination number in two graphs which have the same number of vertices and edges: the pseudofractal scale-free web and the Sierpiński gasket. The $k$-power domination number becomes 1 for $k\ge2$ in the Sierpiński gasket, while the $k$-power domination number increases at an exponential rate with regard to the number of vertices in the pseudofractal scale-free web. The scale-free property may account for the difference in the behavior of the two graphs.

Submitted 18 November, 2019; originally announced November 2019.

arXiv:1911.07444 [pdf, other]

A Code Injection Method for Rapid Docker Image Building

Authors: Yujing Wang, Qinyang Bao

Abstract: Docker images are composed of multiple layers, each of which contains a set of instructions and an archive of files. Layers allow Docker to separate a large build task into smaller ones, such that when a part of the program is changed, only the corresponding layer needs to be changed. Yet the current implementation has major inefficiencies that make the rebuilding of an image unnecessarily slow when changes in bottom layers are required: uneven content distribution amongst layers, the need to rebuild an entire layer during update, and the rebuild fall-throughs in many cases. In this paper, we propose a code injection method that overcomes these inefficiencies by targeting only the changed layer and then bypassing the layer's content checksum. This process is developed specifically for an interpreted language such as Python, where changes can be detected explicitly via text diff tools and run as-is without compilation. We then demonstrate that this method can accelerate rebuilds, effectively reducing the rebuild time from O(n), where n is the size of the layer, to O(1). For compiled languages, however, literal code injection cannot guarantee integrity in compiled machine code. Expanding on the same code injection principle, multi-layer targeted code injection will be addressed in a future discussion.

Submitted 25 November, 2019; v1 submitted 18 November, 2019; originally announced November 2019.

Comments: 3 pages, to be submitted to IEEE mobisecserv 2020

arXiv:1911.01075 [pdf, other]

Adapting a Container Infrastructure for Autonomous Vehicle Development

Authors: Yujing Wang, Qinyang Bao

Abstract: In the field of Autonomous Vehicle (AV) development, having a robust yet flexible infrastructure enables code to be continuously integrated and deployed, which in turn accelerates the rapid prototyping process. The platform-agnostic and scalable container infrastructure, often exploited by developers in the cloud domain, presents a viable solution addressing this need in AV development. Developers use tools such as Docker to build containers and Kubernetes to set up container networks. This paper presents a container infrastructure strategy for AV development, discusses the scenarios in which this strategy is useful, and analyzes container boundary overhead and its impact on a Mix Critical System (MCS). An experiment was conducted to compare both operation runtime and communication delay of running a Gauss-Seidel algorithm with I/O in four different environments: native OS, new container, existing container, and nested container. The comparison reveals that running in containers indeed adds a delay to signal response time but behaves more deterministically, and that nested containers do not stack up delays but make the process less deterministic. With these concerns in mind, developers may be more informed when setting up the container infrastructure, and can take full advantage of the new infrastructure while avoiding some common pitfalls.

Submitted 19 November, 2019; v1 submitted 4 November, 2019; originally announced November 2019.

Comments: to be submitted to IEEE CCWC, JAN 2020

arXiv:1811.10674 [pdf, other]

Exact Penalization of Generalized Nash Equilibrium Problems

Authors: Qin Ba, Jong-Shi Pang

Abstract: This paper presents an exact penalization theory of the generalized Nash equilibrium problem (GNEP) that has its origin from the renowned Arrow-Debreu general economic equilibrium model. While the latter model is the foundation of much of mathematical economics, the GNEP provides a mathematical model of multi-agent non-cooperative competition that has found many contemporary applications in diverse engineering domains. The most salient feature of the GNEP that distinguishes it from a standard non-cooperative (Nash) game is that each player's optimization problem contains constraints that couple all players' decision variables. Extending results for stand-alone optimization problems, the penalization theory aims to convert the GNEP into a game of the standard kind without the coupled constraints, which is known to be more readily amenable to solution methods and analysis. Starting with an illustrative example to motivate the development, the paper focuses on two kinds of coupled constraints, shared (i.e., common) and finitely representable. Constraint residual functions and the associated error bound theory play an important role throughout the development.

Submitted 1 December, 2018; v1 submitted 26 November, 2018; originally announced November 2018.

Comments: 23 pages, 1 figure

arXiv:1712.06064 [pdf, other]

Computing Optimal Control of Cascading Failure in DC Networks

Authors: Qin Ba, Ketan Savla

Abstract: We consider discrete-time dynamics for cascading failure in DC networks, whose map is the composition of a failure rule with control actions. Supply-demand at the nodes is monotonically non-increasing under admissible control. Under the failure rule, a link is removed permanently if its flow exceeds capacity constraints. We consider finite horizon optimal control to steer the network from an arbitrary initial state, defined in terms of active link set and supply-demand at the nodes, to a feasible state, i.e., a state which is invariant under the failure rule. There is no running cost and the reward associated with a feasible terminal state is the associated cumulative supply-demand. We propose two approaches for computing optimal control. The first approach, geared towards tree reducible networks, decomposes the global problem into a system of coupled local problems, which can be solved to optimality in two iterations. When restricted to the class of one-shot control actions, the optimal solutions to the local problems possess a piecewise affine property, which facilitates analytical solution. The second approach computes optimal control by searching over the reachable set, which is shown to admit an equivalent finite representation by aggregation of control actions leading to the same reachable active link set. An algorithmic procedure to construct this representation is provided by leveraging and extending tools for arrangement of hyperplanes and polytopes. Illustrative simulations, including showing the effectiveness of a projection-based approximation algorithm, are also presented.

Submitted 20 March, 2018; v1 submitted 17 December, 2017; originally announced December 2017.

arXiv:1301.0384 [pdf, ps, other]

Spectrum Sharing-based Multi-hop Decode-and-Forward Relay Networks under Interference Constraints: Performance Analysis and Relay Position Optimization

Authors: Vo Nguyen Quoc Bao, Tran Thien Thanh, Tuan Duc Nguyen, Thanh Dinh Vu

Abstract: The exact closed-form expressions for outage probability and bit error rate of spectrum sharing-based multi-hop decode-and-forward (DF) relay networks in non-identical Rayleigh fading channels are derived. We also provide the approximate closed-form expression for the system ergodic capacity. Utilizing these tractable analytical formulas, we can study the impact of key network parameters on the performance of cognitive multi-hop relay networks under interference constraints. Using a linear network model, we derive an optimum relay position scheme by numerically solving an optimization problem of balancing average signal-to-noise ratio (SNR) of each hop. The numerical results show that the optimal scheme leads to SNR performance gains of more than 1 dB. All the analytical expressions are verified by Monte-Carlo simulations confirming the advantage of multi-hop DF relaying networks in cognitive environments.

Submitted 3 January, 2013; originally announced January 2013.

Comments: 11 pages, 8 figures, accepted on "Journal of Communications and Networks"

Showing 1–39 of 39 results for author: Bao, Q