Search | arXiv e-print repository

Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models

Authors: Zhangyue Yin, Qiushi Sun, Qipeng Guo, Zhiyuan Zeng, Xiaonan Li, Tianxiang Sun, Cheng Chang, Qinyuan Cheng, Ding Wang, Xiaofeng Mou, Xipeng Qiu, XuanJing Huang

Abstract: Recent advancements in Chain-of-Thought prompting have facilitated significant breakthroughs for Large Language Models (LLMs) in complex reasoning tasks. Current research enhances the reasoning performance of LLMs by sampling multiple reasoning chains and ensembling based on the answer frequency. However, this approach fails in scenarios where the correct answers are in the minority. We identify this as a primary factor constraining the reasoning capabilities of LLMs, a limitation that cannot be resolved solely based on the predicted answers. To address this shortcoming, we introduce a hierarchical reasoning aggregation framework AoR (Aggregation of Reasoning), which selects answers based on the evaluation of reasoning chains. Additionally, AoR incorporates dynamic sampling, adjusting the number of reasoning chains in accordance with the complexity of the task. Experimental results on a series of complex reasoning tasks show that AoR outperforms prominent ensemble methods. Further analysis reveals that AoR not only adapts various LLMs but also achieves a superior performance ceiling when compared to current methods.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 17 pages, 14 figures, accepted by LREC-COLING 2024

arXiv:2404.11699 [pdf, other]

Retrieval-Augmented Embodied Agents

Authors: Yichen Zhu, Zhicai Ou, Xiaofeng Mou, Jian Tang

Abstract: Embodied agents operating in complex and uncertain environments face considerable challenges. While some advanced agents handle complex manipulation tasks with proficiency, their success often hinges on extensive training data to develop their capabilities. In contrast, humans typically rely on recalling past experiences and analogous situations to solve new problems. Aiming to emulate this human approach in robotics, we introduce the Retrieval-Augmented Embodied Agent (RAEA). This innovative system equips robots with a form of shared memory, significantly enhancing their performance. Our approach integrates a policy retriever, allowing robots to access relevant strategies from an external policy memory bank based on multi-modal inputs. Additionally, a policy generator is employed to assimilate these strategies into the learning process, enabling robots to formulate effective responses to tasks. Extensive testing of RAEA in both simulated and real-world scenarios demonstrates its superior performance over traditional methods, representing a major leap forward in robotic technology.

Submitted 17 April, 2024; originally announced April 2024.

Comments: CVPR2024

arXiv:2404.11051 [pdf]

WPS-Dataset: A benchmark for wood plate segmentation in bark removal processing

Authors: Rijun Wang, Guanghao Zhang, Fulong Liang, Bo Wang, Xiangwei Mou, Yesheng Chen, Peng Sun, Canjin Wang

Abstract: Using deep learning methods is a promising approach to improving bark removal efficiency and enhancing the quality of wood products. However, the lack of publicly available datasets for wood plate segmentation in bark removal processing poses challenges for researchers in this field. To address this issue, a benchmark for wood plate segmentation in bark removal processing named WPS-dataset is proposed in this study, which consists of 4863 images. We designed an image acquisition device and assembled it on a bark removal equipment to capture images in real industrial settings. We evaluated the WPS-dataset using six typical segmentation models. The models effectively learn and understand the WPS-dataset characteristics during training, resulting in high performance and accuracy in wood plate segmentation tasks. We believe that our dataset can lay a solid foundation for future research in bark removal processing and contribute to advancements in this field.

Submitted 25 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Report number: b06d7e0b-306f-476a-a72d-59a8793ac232 | v.1.2

arXiv:2403.07153 [pdf, other]

2023 Low-Power Computer Vision Challenge (LPCVC) Summary

Authors: Leo Chen, Benjamin Boardley, Ping Hu, Yiru Wang, Yifan Pu, Xin Jin, Yongqiang Yao, Ruihao Gong, Bo Li, Gao Huang, Xianglong Liu, Zifu Wan, Xinwang Chen, Ning Liu, Ziyi Zhang, Dongping Liu, Ruijie Shan, Zhengping Che, Fachao Zhang, Xiaofeng Mou, Jian Tang, Maxim Chuprov, Ivan Malofeev, Alexander Goncharenko, Andrey Shcherbin, et al. (5 additional authors not shown)

Abstract: This article describes the 2023 IEEE Low-Power Computer Vision Challenge (LPCVC). Since 2015, LPCVC has been an international competition devoted to tackling the challenge of computer vision (CV) on edge devices. Most CV researchers focus on improving accuracy, at the expense of ever-growing sizes of machine models. LPCVC balances accuracy with resource requirements. Winners must achieve high accuracy with short execution time when their CV solutions run on an embedded device, such as Raspberry PI or Nvidia Jetson Nano. The vision problem for 2023 LPCVC is segmentation of images acquired by Unmanned Aerial Vehicles (UAVs, also called drones) after disasters. The 2023 LPCVC attracted 60 international teams that submitted 676 solutions during the submission window of one month. This article explains the setup of the competition and highlights the winners' methods that improve accuracy and shorten execution time.

Submitted 11 March, 2024; originally announced March 2024.

Comments: LPCVC 2023, website: https://lpcv.ai/

arXiv:2402.16333 [pdf, other]

Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation

Authors: Xinyi Mou, Zhongyu Wei, Xuanjing Huang

Abstract: Social media has emerged as a cornerstone of social movements, wielding significant influence in driving societal change. Simulating the response of the public and forecasting the potential impact has become increasingly important. However, existing methods for simulating such phenomena encounter challenges concerning their efficacy and efficiency in capturing the behaviors of social movement participants. In this paper, we introduce a hybrid framework HiSim for social media user simulation, wherein users are categorized into two types. Core users are driven by Large Language Models, while numerous ordinary users are modeled by deductive agent-based models. We further construct a Twitter-like environment to replicate their response dynamics following trigger events. Subsequently, we develop a multi-faceted benchmark SoMoSiMu-Bench for evaluation and conduct comprehensive experiments across real-world datasets. Experimental results demonstrate the effectiveness and flexibility of our method.

Submitted 17 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: Accepted to findings of ACL 2024

arXiv:2402.13022 [pdf, other]

SoMeLVLM: A Large Vision Language Model for Social Media Processing

Authors: Xinnong Zhang, Haoyu Kuang, Xinyi Mou, Hanjia Lyu, Kun Wu, Siming Chen, Jiebo Luo, Xuanjing Huang, Zhongyu Wei

Abstract: The growth of social media, characterized by its multimodal nature, has led to the emergence of diverse phenomena and challenges, which calls for an effective approach to uniformly solve automated tasks. The powerful Large Vision Language Models make it possible to handle a variety of tasks simultaneously, but even with carefully designed prompting methods, the general domain models often fall short in aligning with the unique speaking style and context of social media tasks. In this paper, we introduce a Large Vision Language Model for Social Media Processing (SoMeLVLM), which is a cognitive framework equipped with five key capabilities including knowledge & comprehension, application, analysis, evaluation, and creation. SoMeLVLM is designed to understand and generate realistic social media behavior. We have developed a 654k multimodal social media instruction-tuning dataset to support our cognitive framework and fine-tune our model. Our experiments demonstrate that SoMeLVLM achieves state-of-the-art performance in multiple social media tasks. Further analysis shows its significant advantages over baselines in terms of cognitive abilities.

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.00084 [pdf, other]

EPSD: Early Pruning with Self-Distillation for Efficient Model Compression

Authors: Dong Chen, Ning Liu, Yichen Zhu, Zhengping Che, Rui Ma, Fachao Zhang, Xiaofeng Mou, Yi Chang, Jian Tang

Abstract: Neural network compression techniques, such as knowledge distillation (KD) and network pruning, have received increasing attention. Recent work "Prune, then Distill" reveals that a pruned student-friendly teacher network can benefit the performance of KD. However, the conventional teacher-student pipeline, which entails cumbersome pre-training of the teacher and complicated compression steps, makes pruning with KD less efficient. In addition to compressing models, recent compression techniques also emphasize the aspect of efficiency. Early pruning demands significantly less computational cost in comparison to the conventional pruning methods as it does not require a large pre-trained model. Likewise, a special case of KD, known as self-distillation (SD), is more efficient since it requires no pre-training or student-teacher pair selection. This inspires us to collaborate early pruning with SD for efficient model compression. In this work, we propose the framework named Early Pruning with Self-Distillation (EPSD), which identifies and preserves distillable weights in early pruning for a given SD task. EPSD efficiently combines early pruning and self-distillation in a two-step process, maintaining the pruned network's trainability for compression. Instead of a simple combination of pruning and SD, EPSD enables the pruned network to favor SD by keeping more distillable weights before training to ensure better distillation of the pruned network. We demonstrated that EPSD improves the training of pruned networks, supported by visual and quantitative analyses. Our evaluation covered diverse benchmarks (CIFAR-10/100, Tiny-ImageNet, full ImageNet, CUB-200-2011, and Pascal VOC), with EPSD outperforming advanced pruning and SD techniques.

Submitted 31 January, 2024; originally announced February 2024.

Comments: The first two authors are with equal contributions. Paper accepted by AAAI 2024

arXiv:2401.02330 [pdf, other]

LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model

Authors: Yichen Zhu, Minjie Zhu, Ning Liu, Zhicai Ou, Xiaofeng Mou, Jian Tang

Abstract: In this paper, we introduce LLaVA-$φ$ (LLaVA-Phi), an efficient multi-modal assistant that harnesses the power of the recently advanced small language model, Phi-2, to facilitate multi-modal dialogues. LLaVA-Phi marks a notable advancement in the realm of compact multi-modal models. It demonstrates that even smaller language models, with as few as 2.7B parameters, can effectively engage in intricate dialogues that integrate both textual and visual elements, provided they are trained with high-quality corpora. Our model delivers commendable performance on publicly available benchmarks that encompass visual comprehension, reasoning, and knowledge-based perception. Beyond its remarkable performance in multi-modal dialogue tasks, our model opens new avenues for applications in time-sensitive environments and systems that require real-time interaction, such as embodied agents. It highlights the potential of smaller language models to achieve sophisticated levels of understanding and interaction, while maintaining greater resource efficiency. The project is available at https://github.com/zhuyiche/llava-phi.

Submitted 22 February, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

Comments: The datasets were incomplete as they did not include all the necessary copyrights

arXiv:2311.07547 [pdf, other]

GPT-4V(ision) as A Social Media Analysis Engine

Authors: Hanjia Lyu, Jinfa Huang, Daoan Zhang, Yongsheng Yu, Xinyi Mou, Jinsheng Pan, Zhengyuan Yang, Zhongyu Wei, Jiebo Luo

Abstract: Recent research has offered insights into the extraordinary capabilities of Large Multimodal Models (LMMs) in various general vision and language tasks. There is growing interest in how LMMs perform in more specialized domains. Social media content, inherently multimodal, blends text, images, videos, and sometimes audio. Understanding social multimedia content remains a challenging problem for contemporary machine learning frameworks. In this paper, we explore GPT-4V(ision)'s capabilities for social multimedia analysis. We select five representative tasks, including sentiment analysis, hate speech detection, fake news identification, demographic inference, and political ideology detection, to evaluate GPT-4V. Our investigation begins with a preliminary quantitative analysis for each task using existing benchmark datasets, followed by a careful review of the results and a selection of qualitative samples that illustrate GPT-4V's potential in understanding multimodal social media content. GPT-4V demonstrates remarkable efficacy in these tasks, showcasing strengths such as joint understanding of image-text pairs, contextual and cultural awareness, and extensive commonsense knowledge. Despite the overall impressive capacity of GPT-4V in the social media domain, there remain notable challenges. GPT-4V struggles with tasks involving multilingual social multimedia comprehension and has difficulties in generalizing to the latest trends in social media. Additionally, it exhibits a tendency to generate erroneous information in the context of evolving celebrity and politician knowledge, reflecting the known hallucination problem. The insights gleaned from our findings underscore a promising future for LMMs in enhancing our comprehension of social media content and its users through the analysis of multimodal information.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2310.03118 [pdf]

Blind CT Image Quality Assessment Using DDPM-derived Content and Transformer-based Evaluator

Authors: Yongyi Shi, Wenjun Xia, Ge Wang, Xuanqin Mou

Abstract: Lowering radiation dose per view and utilizing sparse views per scan are two common CT scan modes, albeit often leading to distorted images characterized by noise and streak artifacts. Blind image quality assessment (BIQA) strives to evaluate perceptual quality in alignment with what radiologists perceive, which plays an important role in advancing low-dose CT reconstruction techniques. An intriguing direction involves developing BIQA methods that mimic the operational characteristic of the human visual system (HVS). The internal generative mechanism (IGM) theory reveals that the HVS actively deduces primary content to enhance comprehension. In this study, we introduce an innovative BIQA metric that emulates the active inference process of IGM. Initially, an active inference module, implemented as a denoising diffusion probabilistic model (DDPM), is constructed to anticipate the primary content. Then, the dissimilarity map is derived by assessing the interrelation between the distorted image and its primary content. Subsequently, the distorted image and dissimilarity map are combined into a multi-channel image, which is inputted into a transformer-based image quality evaluator. Remarkably, by exclusively utilizing this transformer-based quality evaluator, we won the second place in the MICCAI 2023 low-dose computed tomography perceptual image quality assessment grand challenge. Leveraging the DDPM-derived primary content, our approach further improves the performance on the challenge dataset.

Submitted 4 October, 2023; originally announced October 2023.

Comments: 10 pages, 6 figures

arXiv:2301.01621 [pdf]

Grammar construction methods for extended deterministic expressions

Authors: Xiaoying Mou, Haiming Chen

Abstract: Extended regular expressions with counting and interleaving are widely used in practice. However, the related theoretical studies for this kind of expression currently cannot meet the needs of practical work. This paper develops syntax definitions for extended deterministic expressions and their subclasses, aiming to completely solve the long-standing problem that there are no syntax definitions for this kind of expression, which has become an important reason for restricting the use of extended expressions.

Submitted 2 February, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

Comments: in Chinese language

arXiv:2210.14529 [pdf, other]

Is MultiWOZ a Solved Task? An Interactive TOD Evaluation Framework with User Simulator

Authors: Qinyuan Cheng, Linyang Li, Guofeng Quan, Feng Gao, Xiaofeng Mou, Xipeng Qiu

Abstract: Task-Oriented Dialogue (TOD) systems are drawing more and more attention in recent studies. Current methods focus on constructing pre-trained models or fine-tuning strategies while the evaluation of TOD is limited by a policy mismatch problem. That is, during evaluation, the user utterances are from the annotated dataset while these utterances should interact with previous responses which can have many alternatives besides annotated texts. Therefore, in this work, we propose an interactive evaluation framework for TOD. We first build a goal-oriented user simulator based on pre-trained models and then use the user simulator to interact with the dialogue system to generate dialogues. Besides, we introduce a sentence-level and a session-level score to measure the sentence fluency and session coherence in the interactive evaluation. Experimental results show that RL-based TOD systems trained by our proposed user simulator can achieve nearly 98% inform and success rates in the interactive evaluation of MultiWOZ dataset and the proposed scores measure the response quality besides the inform and success rates. We are hoping that our work will encourage simulator-based interactive evaluations in the TOD task.

Submitted 26 October, 2022; originally announced October 2022.

Comments: Accepted by Findings of EMNLP 2022

arXiv:2210.10994 [pdf, other]

MBTI Personality Prediction for Fictional Characters Using Movie Scripts

Authors: Yisi Sang, Xiangyang Mou, Mo Yu, Dakuo Wang, Jing Li, Jeffrey Stanton

Abstract: An NLP model that understands stories should be able to understand the characters in them. To support the development of neural models for this purpose, we construct a benchmark, Story2Personality. The task is to predict a movie character's MBTI or Big 5 personality types based on the narratives of the character. Experiments show that our task is challenging for the existing text classification models, as none is able to largely outperform random guesses. We further proposed a multi-view model for personality prediction using both verbal and non-verbal descriptions, which gives improvement compared to using only verbal descriptions. The uniqueness and challenges in our dataset call for the development of narrative comprehension techniques from the perspective of understanding characters.

Submitted 19 October, 2022; originally announced October 2022.

Comments: paper accepted to EMNLP 2022

arXiv:2209.06596 [pdf, other]

Few Clean Instances Help Denoising Distant Supervision

Authors: Yufang Liu, Ziyin Huang, Yijun Wang, Changzhi Sun, Man Lan, Yuanbin Wu, Xiaofeng Mou, Ding Wang

Abstract: Existing distantly supervised relation extractors usually rely on noisy data for both model training and evaluation, which may lead to garbage-in-garbage-out systems. To alleviate the problem, we study whether a small clean dataset could help improve the quality of distantly supervised models. We show that besides getting a more convincing evaluation of models, a small clean dataset also helps us to build more robust denoising models. Specifically, we propose a new criterion for clean instance selection based on influence functions. It collects sample-level evidence for recognizing good instances (which is more informative than loss-level evidence). We also propose a teacher-student mechanism for controlling purity of intermediate results when bootstrapping the clean set. The whole approach is model-agnostic and demonstrates strong performances on both denoising real (NYT) and synthetic noisy datasets.

Submitted 14 September, 2022; originally announced September 2022.

Comments: Accepted by COLING 2022

arXiv:2207.01472 [pdf, other]

Deep Contrastive One-Class Time Series Anomaly Detection

Authors: Rui Wang, Chongwei Liu, Xudong Mou, Kai Gao, Xiaohui Guo, Pin Liu, Tianyu Wo, Xudong Liu

Abstract: The accumulation of time-series data and the absence of labels make time-series Anomaly Detection (AD) a self-supervised deep learning task. Single-normality-assumption-based methods, which reveal only a certain aspect of the whole normality, are incapable of handling tasks that involve a large number of anomalies. Specifically, Contrastive Learning (CL) methods distance negative pairs, many of which consist of two normal samples, thus reducing the AD performance. Existing multi-normality-assumption-based methods are usually two-staged, first pre-training through certain tasks whose target may differ from AD, which limits their performance. To overcome these shortcomings, the authors propose a deep Contrastive One-Class Anomaly detection method of time series (COCA), following the normality assumptions of CL and one-class classification. It treats the original and reconstructed representations as the positive pair of negative-sample-free CL, namely "sequence contrast". Next, invariance terms and variance terms compose a contrastive one-class loss function in which the loss of the assumptions is optimized by invariance terms simultaneously and the "hypersphere collapse" is prevented by variance terms. In addition, extensive experiments on two real-world time-series datasets show that the proposed method achieves state-of-the-art performance.

Submitted 16 April, 2023; v1 submitted 4 July, 2022; originally announced July 2022.

arXiv:2205.00299 [pdf, ps, other]

A Survey of Machine Narrative Reading Comprehension Assessments

Authors: Yisi Sang, Xiangyang Mou, Jing Li, Jeffrey Stanton, Mo Yu

Abstract: As the body of research on machine narrative comprehension grows, there is a critical need for consideration of performance assessment strategies as well as the depth and scope of different benchmark tasks. Based on narrative theories, reading comprehension theories, as well as existing machine narrative reading comprehension tasks and datasets, we propose a typology that captures the main similarities and differences among assessment tasks; and discuss the implications of our typology for new task design and the challenges of narrative reading comprehension.

Submitted 30 April, 2022; originally announced May 2022.

Comments: accepted for the IJCAI-ECAI2022 Survey Track

arXiv:2204.07721 [pdf, other]

TVShowGuess: Character Comprehension in Stories as Speaker Guessing

Authors: Yisi Sang, Xiangyang Mou, Mo Yu, Shunyu Yao, Jing Li, Jeffrey Stanton

Abstract: We propose a new task for assessing machines' skills of understanding fictional characters in narrative stories. The task, TVShowGuess, builds on the scripts of TV series and takes the form of guessing the anonymous main characters based on the backgrounds of the scenes and the dialogues. Our human study supports that this form of task covers comprehension of multiple types of character persona, including understanding characters' personalities, facts and memories of personal experience, which are well aligned with the psychological and literary theories about the theory of mind (ToM) of human beings on understanding fictional characters during reading. We further propose new model architectures to support the contextualized encoding of long scene texts. Experiments show that our proposed approaches significantly outperform baselines, yet still largely lag behind the (nearly perfect) human performance. Our work serves as a first step toward the goal of narrative character comprehension.

Submitted 16 April, 2022; originally announced April 2022.

Comments: Accepted at NAACL 2022

arXiv:2203.07644 [pdf, other]

Efficient Long Sequence Encoding via Synchronization

Authors: Xiangyang Mou, Mo Yu, Bingsheng Yao, Lifu Huang

Abstract: Pre-trained Transformer models have achieved successes in a wide range of NLP tasks, but are inefficient when dealing with long input sequences. Existing studies try to overcome this challenge via segmenting the long sequence followed by hierarchical encoding or post-hoc aggregation. We propose a synchronization mechanism for hierarchical encoding. Our approach first identifies anchor tokens across segments and groups them by their roles in the original input sequence. Then, inside the Transformer layer, anchor embeddings are synchronized within their group via a self-attention module. Our approach is a general framework with sufficient flexibility -- when adapted to a new task, it can easily be enhanced with task-specific anchor definitions. Experiments on two representative tasks with different types of long input texts, NarrativeQA summary setting and wild multi-hop reasoning from HotpotQA, demonstrate that our approach is able to improve the global information exchange among segments while maintaining efficiency.

Submitted 15 March, 2022; originally announced March 2022.

Comments: 5 pages, short paper

arXiv:2203.02384 [pdf, other]

AutoMO-Mixer: An automated multi-objective Mixer model for balanced, safe and robust prediction in medicine

Authors: Xi Chen, Jiahuan Lv, Dehua Feng, Xuanqin Mou, Ling Bai, Shu Zhang, Zhiguo Zhou

Abstract: Accurately identifying a patient's status through medical images plays an important role in diagnosis and treatment. Artificial intelligence (AI), especially deep learning, has achieved great success in many fields. However, more reliable AI models are needed in image-guided diagnosis and therapy. To achieve this goal, developing a balanced, safe and robust model with a unified framework is desirable. In this study, a new unified model termed the automated multi-objective Mixer (AutoMO-Mixer) model was developed, which utilized a recently developed multiple layer perceptron Mixer (MLP-Mixer) as its base. To build a balanced model, sensitivity and specificity were considered as the objective functions simultaneously in the training stage. Meanwhile, a new evidential reasoning based on entropy was developed to achieve a safe and robust model in the testing stage. The experiment on an optical coherence tomography dataset demonstrated that AutoMO-Mixer can obtain safer, more balanced, and robust results compared with MLP-Mixer and other available models.

Submitted 4 March, 2022; originally announced March 2022.

arXiv:2106.03826 [pdf, other]

Narrative Question Answering with Cutting-Edge Open-Domain QA Techniques: A Comprehensive Study

Authors: Xiangyang Mou, Chenghao Yang, Mo Yu, Bingsheng Yao, Xiaoxiao Guo, Saloni Potdar, Hui Su

Abstract: Recent advancements in open-domain question answering (ODQA), i.e., finding answers from a large open-domain corpus like Wikipedia, have led to human-level performance on many datasets. However, progress in QA over book stories (Book QA) lags behind despite its similar task formulation to ODQA. This work provides a comprehensive and quantitative analysis of the difficulty of Book QA: (1) We benchmark the research on the NarrativeQA dataset with extensive experiments with cutting-edge ODQA techniques. This quantifies the challenges Book QA poses, as well as advances the published state-of-the-art with a ~7% absolute improvement on Rouge-L. (2) We further analyze the detailed challenges in Book QA through human studies (https://github.com/gorov/BookQA). Our findings indicate that the event-centric questions dominate this task, which exemplifies the inability of existing QA models to handle event-oriented scenarios.

Submitted 7 June, 2021; originally announced June 2021.

Comments: Accepted to TACL

arXiv:2103.11643 [pdf, other]

Complementary Evidence Identification in Open-Domain Question Answering

Authors: Xiangyang Mou, Mo Yu, Shiyu Chang, Yufei Feng, Li Zhang, Hui Su

Abstract: This paper proposes a new problem of complementary evidence identification for open-domain question answering (QA). The problem aims to efficiently find a small set of passages that covers full evidence from multiple aspects so as to answer a complex question. To this end, we propose a method that learns vector representations of passages and models the sufficiency and diversity within the selected set, in addition to the relevance between the question and passages. Our experiments demonstrate that our method considers the dependence within the supporting evidence and significantly improves the accuracy of complementary evidence selection in the QA domain.

Submitted 5 April, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

Comments: 7 pages, EACL 2021

arXiv:2012.11525 [pdf]

A Shift-insensitive Full Reference Image Quality Assessment Model Based on Quadratic Sum of Gradient Magnitude and LOG signals

Authors: Congmin Chen, Xuanqin Mou

Abstract: Image quality assessment, which aims at estimating the subjective quality of images, builds models to evaluate the perceptual quality of the image in different applications. Based on the fact that the human visual system (HVS) is highly sensitive to structural information, edge information extraction is widely applied in different IQA metrics. According to previous studies, the image gradient magnitude (GM) and the Laplacian of Gaussian (LOG) operator are two efficient structural features in IQA tasks. However, most of the IQA metrics achieve good performance only when the distorted image is totally registered with the reference image, but fail to perform on images with small translations. In this paper, we propose an FR-IQA model with the quadratic sum of the GM and the LOG signals, which obtains good performance in image quality estimation considering the shift-insensitive property for not well-registered reference and distorted image pairs. Experimental results show that the proposed model works robustly on three large scale subjective IQA databases which contain a variety of distortion types and levels, and remains among the state-of-the-art FR-IQA models whether for a single distortion type or across a whole database. Furthermore, we validated that the proposed metric performs better with the shift-insensitive property compared with the CW-SSIM metric that is considered to be a shift-insensitive IQA so far. Meanwhile, the proposed model is much simpler than the CW-SSIM, which is efficient for applications.

Submitted 21 December, 2020; originally announced December 2020.

Comments: 10 pages, 7 figures

MSC Class: 94A08 ACM Class: I.4.7

arXiv:2007.09903 [pdf, other]

Multimodal Dialogue State Tracking By QA Approach with Data Augmentation

Authors: Xiangyang Mou, Brandyn Sigouin, Ian Steenstra, Hui Su

Abstract: Recently, a more challenging state tracking task, Audio-Video Scene-Aware Dialogue (AVSD), is catching an increasing amount of attention among researchers. Different from purely text-based dialogue state tracking, the dialogue in AVSD contains a sequence of question-answer pairs about a video and the final answer to the given question requires additional understanding of the video. This paper interprets the AVSD task from an open-domain Question Answering (QA) point of view and proposes a multimodal open-domain QA system to deal with the problem. The proposed QA system uses a common encoder-decoder framework with multimodal fusion and attention. Teacher forcing is applied to train a natural language generator. We also propose a new data augmentation approach specifically under the QA assumption. Our experiments show that our model and techniques bring significant improvements over the baseline model on the DSTC7-AVSD dataset and demonstrate the potential of our data augmentation techniques.

Submitted 20 July, 2020; originally announced July 2020.

Comments: AAAI DSTC8 Workshop

arXiv:2007.09878 [pdf, ps, other]

Frustratingly Hard Evidence Retrieval for QA Over Books

Authors: Xiangyang Mou, Mo Yu, Bingsheng Yao, Chenghao Yang, Xiaoxiao Guo, Saloni Potdar, Hui Su

Abstract: A lot of progress has been made to improve question answering (QA) in recent years, but the special problem of QA over narrative book stories has not been explored in-depth. We formulate BookQA as an open-domain QA task given its similar dependency on evidence retrieval. We further investigate how state-of-the-art open-domain QA approaches can help BookQA. Besides achieving state-of-the-art on the NarrativeQA benchmark, our study also reveals the difficulty of evidence retrieval in books with a wealth of experiments and analysis - which necessitates future effort on novel solutions for evidence retrieval in BookQA.

Submitted 20 July, 2020; originally announced July 2020.

Comments: ACL 2020 NUSE Workshop, 6 pages

arXiv:2004.13369 [pdf, other]

SSIM-Based CTU-Level Joint Optimal Bit Allocation and Rate Distortion Optimization

Authors: Yang Li, Xuanqin Mou

Abstract: Structural similarity (SSIM)-based distortion $D_\text{SSIM}$ is more consistent with human perception than the traditional mean squared error $D_\text{MSE}$. To achieve better video quality, many studies on optimal bit allocation (OBA) and rate-distortion optimization (RDO) used $D_\text{SSIM}$ as the distortion metric. However, many of them failed to optimize OBA and RDO jointly based on SSIM, thus causing a non-optimal R-$D_\text{SSIM}$ performance. This problem is due to the lack of an accurate R-$D_\text{SSIM}$ model that can be used uniformly in both OBA and RDO. To solve this problem, we propose a $D_\text{SSIM}$-$D_\text{MSE}$ model first. Based on this model, the complex R-$D_\text{SSIM}$ cost in RDO can be calculated as simpler R-$D_\text{MSE}$ cost with a new SSIM-related Lagrange multiplier. This not only reduces the computation burden of SSIM-based RDO, but also enables the R-$D_\text{SSIM}$ model to be uniformly used in OBA and RDO. Moreover, with the new SSIM-related Lagrange multiplier in hand, the joint relationship of R-$D_\text{SSIM}$-$λ_\text{SSIM}$ (the negative derivative of R-$D_\text{SSIM}$) can be built, based on which the R-$D_\text{SSIM}$ model parameters can be calculated accurately. With accurate and unified R-$D_\text{SSIM}$ model, SSIM-based OBA and SSIM-based RDO are unified together in our scheme, called SOSR. Compared with the HEVC reference encoder HM16.20, SOSR saves 4%, 10%, and 14% bitrate under the same SSIM in all-intra, hierarchical and non-hierarchical low-delay-B configurations, which is superior to other state-of-the-art schemes.

Submitted 3 April, 2021; v1 submitted 28 April, 2020; originally announced April 2020.

Comments: An improved version of this manuscript has been accepted by IEEE Transactions on Broadcasting DOI 10.1109/TBC.2021.3068871. The project page is located at http://gr.xjtu.edu.cn/web/xqmou/sosr

arXiv:1905.10150 [pdf, other]

doi 10.1117/1.JEI.28.2.023025

Saliency detection based on structural dissimilarity induced by image quality assessment model

Authors: Yang Li, Xuanqin Mou

Abstract: The distinctiveness of image regions is widely used as the cue of saliency. Generally, the distinctiveness is computed according to the absolute difference of features. However, according to the image quality assessment (IQA) studies, the human visual system is highly sensitive to structural changes rather than absolute difference. Accordingly, we propose the computation of the structural dissimilarity between image patches as the distinctiveness measure for saliency detection. Similar to IQA models, the structural dissimilarity is computed based on the correlation of the structural features. The global structural dissimilarity of a patch to all the other patches represents saliency of the patch. We adopt two widely used structural features, namely the local contrast and gradient magnitude, into the structural dissimilarity computation in the proposed model. Without any postprocessing, the proposed model based on the correlation of either of the two structural features outperforms 11 state-of-the-art saliency models on three saliency databases.

Submitted 24 May, 2019; originally announced May 2019.

Comments: For associated source code, see https://github.com/yangli-xjtu/SDS

Journal ref: J. Electron. Imag. 28(2) 023025 (3 April 2019)

arXiv:1805.12503 [pdf, other]

Practical Study of Deterministic Regular Expressions from Large-scale XML and Schema Data

Authors: Yeting Li, Xinyu Chu, Xiaoying Mou, Chunmei Dong, Haiming Chen

Abstract: Regular expressions are a fundamental concept in computer science and widely used in various applications. In this paper we focused on deterministic regular expressions (DREs). Considering that researchers didn't have large datasets as evidence before, we first harvested a large corpus of real data from the Web then conducted a practical study to investigate the usage of DREs. One feature of our work is that the data set is sufficiently large compared with previous work, which is obtained using several data collection strategies we proposed. The results show more than 98% of expressions in Relax NG are DRE, and more than 56% of expressions from RegExLib are DRE, while both Relax NG and RegExLib do not have the determinism constraint. These observations indicate that DREs are commonly used in practice. The results also show further study of subclasses of DREs is necessary. As far as we know, we are the first to analyze the determinism and the subclasses of DREs of Relax NG and RegExLib, and give these results. Furthermore, we give some discussions and applications of the data set. We obtain a DRE data set from the original data, which will be useful in practice and it has value in its own right. We find current research in new subclasses of DREs is insufficient, therefore it is necessary to do further study. We also analyze the referencing relationships among XSDs and define SchemaRank, which can be used in XML Schema design.

Submitted 31 May, 2018; originally announced May 2018.

Comments: 9 pages, 5 figures

arXiv:1708.00961 [pdf, other]

doi 10.1109/TMI.2018.2827462

Low Dose CT Image Denoising Using a Generative Adversarial Network with Wasserstein Distance and Perceptual Loss

Authors: Qingsong Yang, Pingkun Yan, Yanbo Zhang, Hengyong Yu, Yongyi Shi, Xuanqin Mou, Mannudeep K. Kalra, Ge Wang

Abstract: In this paper, we introduce a new CT image denoising method based on the generative adversarial network (GAN) with Wasserstein distance and perceptual similarity. The Wasserstein distance is a key concept of the optimal transport theory, and promises to improve the performance of the GAN. The perceptual loss compares the perceptual features of a denoised output against those of the ground truth in an established feature space, while the GAN helps migrate the data noise distribution from strong to weak. Therefore, our proposed method transfers our knowledge of visual perception to the image denoising task, and is capable of not only reducing the image noise level but also keeping the critical information at the same time. Promising results have been obtained in our experiments with clinical CT images.

Submitted 24 April, 2018; v1 submitted 2 August, 2017; originally announced August 2017.

Journal ref: IEEE Trans. Med. Imaging. 37(2018) 1348-1357

arXiv:1511.04691 [pdf]

Optimization of the Block-level Bit Allocation in Perceptual Video Coding based on MINMAX

Authors: Chao Wang, Xuanqin Mou, Lei Zhang

Abstract: In video coding, it is expected that the encoder could adaptively select the encoding parameters (e.g., quantization parameter) to optimize the bit allocation to different sources under the given constraint. However, in hybrid video coding, the dependency between sources brings high complexity for the bit allocation optimization, especially in the block-level, and existing optimization methods mostly focus on frame-level bit allocation. In this paper, we propose a macroblock (MB) level bit allocation method based on the minimum maximum (MINMAX) criterion, which has acceptable encoding complexity for offline applications. An iterative-based algorithm, namely maximum distortion descend (MDD), is developed to reduce quality fluctuation among MBs within a frame, where the Structure SIMilarity (SSIM) index is used to measure the perceptual distortion of MBs. Our extensive experimental results on benchmark video sequences show that the proposed method can greatly enhance the encoding performance in terms of both bits saving and perceptual quality improvement.

Submitted 15 November, 2015; originally announced November 2015.

Comments: 11 pages, 17 figures

ACM Class: I.4.2; E.4

arXiv:1510.02884 [pdf, other]

Learn to Evaluate Image Perceptual Quality Blindly from Statistics of Self-similarity

Authors: Wufeng Xue, Xuanqin Mou, Lei Zhang

Abstract: Among the various image quality assessment (IQA) tasks, blind IQA (BIQA) is particularly challenging due to the absence of knowledge about the reference image and distortion type. Features based on natural scene statistics (NSS) have been successfully used in BIQA, while the quality relevance of the feature plays an essential role to the quality prediction performance. Motivated by the fact that the early processing stage in human visual system aims to remove the signal redundancies for efficient visual coding, we propose a simple but very effective BIQA method by computing the statistics of self-similarity (SOS) in an image. Specifically, we calculate the inter-scale similarity and intra-scale similarity of the distorted image, extract the SOS features from these similarities, and learn a regression model to map the SOS features to the subjective quality score. Extensive experiments demonstrate very competitive quality prediction performance and generalization ability of the proposed SOS based BIQA method.

Submitted 10 October, 2015; originally announced October 2015.

arXiv:1502.04727 [pdf, ps, other]

doi 10.1109/VTCSpring.2015.7146165

Wireless Power Transfer: Survey and Roadmap

Authors: Xiaolin Mou, Hongjian Sun

Abstract: Wireless power transfer (WPT) technologies have been widely used in many areas, e.g., the charging of electric toothbrushes, mobile phones, and electric vehicles. This paper introduces fundamental principles of three WPT technologies, i.e., inductive coupling-based WPT, magnetic resonant coupling-based WPT, and electromagnetic radiation-based WPT, together with discussions of their strengths and weaknesses. Main research themes are then presented, i.e., improving the transmission efficiency and distance, and designing multiple transmitters/receivers. The state-of-the-art techniques are reviewed and categorised. Several WPT applications are described. Open research challenges are then presented with a brief discussion of a potential roadmap.

Submitted 16 February, 2015; originally announced February 2015.

Comments: To appear in Proceedings of IEEE VTC 2015 Spring, First International Workshop on Integrating Communications, Control, Computing Technologies for Smart Grid (ICT4SG)

Journal ref: 2015 IEEE 81st Vehicular Technology Conference: VTC2015-Spring. Glasgow, UK, IEEE, Glasgow

arXiv:1308.3052 [pdf]

doi 10.1109/TIP.2013.2293423

Gradient Magnitude Similarity Deviation: A Highly Efficient Perceptual Image Quality Index

Authors: Wufeng Xue, Lei Zhang, Xuanqin Mou, Alan C. Bovik

Abstract: It is an important task to faithfully evaluate the perceptual quality of output images in many applications such as image compression, image restoration and multimedia streaming. A good image quality assessment (IQA) model should not only deliver high quality prediction accuracy but also be computationally efficient. The efficiency of IQA metrics is becoming particularly important due to the increasing proliferation of high-volume visual data in high-speed networks. We present a new effective and efficient IQA model, called gradient magnitude similarity deviation (GMSD). The image gradients are sensitive to image distortions, while different local structures in a distorted image suffer different degrees of degradations. This motivates us to explore the use of global variation of gradient based local quality map for overall image quality prediction. We find that the pixel-wise gradient magnitude similarity (GMS) between the reference and distorted images, combined with a novel pooling strategy, the standard deviation of the GMS map, can accurately predict perceptual image quality. The resulting GMSD algorithm is much faster than most state-of-the-art IQA methods, and delivers highly competitive prediction accuracy.

Submitted 25 November, 2013; v1 submitted 14 August, 2013; originally announced August 2013.

arXiv:nlin/0407052 [pdf, ps, other]

doi 10.1063/1.1856711

Breaking a chaos-noise-based secure communication scheme

Authors: Shujun Li, Gonzalo Álvarez, Guanrong Chen, Xuanqin Mou

Abstract: This paper studies the security of a secure communication scheme based on two discrete-time intermittently-chaotic systems synchronized via a common random driving signal. Some security defects of the scheme are revealed: 1) the key space can be remarkably reduced; 2) the decryption is insensitive to the mismatch of the secret key; 3) the key-generation process is insecure against known/chosen-plaintext attacks. The first two defects mean that the scheme is not secure enough against brute-force attacks, and the third one means that an attacker can easily break the cryptosystem by approximately estimating the secret key once he has a chance to access a fragment of the generated keystream. Yet it remains to be clarified if intermittent chaos could be used for designing secure chaotic cryptosystems.

Submitted 13 September, 2005; v1 submitted 23 July, 2004; originally announced July 2004.

Comments: RevTeX4, 11 pages, 15 figures

Journal ref: Chaos, vol. 15, no. 1, art. no. 013703, March 2005

arXiv:cs/0402054 [pdf, ps, other]

doi 10.1109/TCSII.2004.838657

On the Security of the Yi-Tan-Siew Chaos-Based Cipher

Authors: Shujun Li, Guanrong Chen, Xuanqin Mou

Abstract: This paper presents a comprehensive analysis on the security of the Yi-Tan-Siew chaotic cipher proposed in [IEEE TCAS-I 49(12):1826-1829 (2002)]. A differential chosen-plaintext attack and a differential chosen-ciphertext attack are suggested to break the sub-key $K$, under the assumption that the time stamp can be altered by the attacker, which is reasonable in such attacks. Also, some security problems about the sub-keys $α$ and $β$ are clarified, from both theoretical and experimental points of view. Further analysis shows that the security of this cipher is independent of the use of the chaotic tent map, once the sub-key $K$ is removed via the proposed differential chosen-plaintext attack.

Submitted 1 December, 2004; v1 submitted 24 February, 2004; originally announced February 2004.

Comments: 5 pages, 3 figures, IEEEtrans.cls v 1.6

ACM Class: E.3

Journal ref: IEEE Trans. CAS-II, vol. 51, no. 12, pp. 665-669, 2004

arXiv:cs/0402004 [pdf, ps, other]

doi 10.1016/j.physleta.2004.09.028

Baptista-type chaotic cryptosystems: Problems and countermeasures

Authors: Shujun Li, Guanrong Chen, Kwok-Wo Wong, Xuanqin Mou, Yuanlong Cai

Abstract: In 1998, M. S. Baptista proposed a chaotic cryptosystem, which has attracted much attention from the chaotic cryptography community: some of its modifications and also attacks have been reported in recent years. In [Phys. Lett. A 307 (2003) 22], we suggested a method to enhance the security of Baptista-type cryptosystem, which can successfully resist all proposed attacks. However, the enhanced Baptista-type cryptosystem has a nontrivial defect, which produces errors in the decrypted data with a generally small but nonzero probability, and the consequent error propagation exists. In this Letter, we analyze this defect and discuss how to rectify it. In addition, we point out some newly-found problems existing in all Baptista-type cryptosystems and consequently propose corresponding countermeasures.

Submitted 3 November, 2004; v1 submitted 2 February, 2004; originally announced February 2004.

Comments: 13 pages, 2 figures

ACM Class: E.3

Journal ref: Physics Letters A, 332(5-6):368-375, 2004

Showing 1–35 of 35 results for author: Mou, X