-
Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning
Authors:
Hourui Deng,
Hongjie Zhang,
Jie Ou,
Chaosheng Feng
Abstract:
Spatial reasoning in Large Language Models (LLMs) is the foundation for embodied intelligence. However, even in simple maze environments, LLMs still encounter challenges in long-term path-planning, primarily influenced by their spatial hallucination and the context inconsistency hallucination caused by long-term reasoning. To address this challenge, this study proposes an innovative model, Spatial-to-Relational Transformation and Curriculum Q-Learning (S2RCQL). To address the spatial hallucination of LLMs, we propose the Spatial-to-Relational approach, which transforms spatial prompts into entity relations and paths representing entity relation chains. This approach fully taps the potential of LLMs in terms of sequential thinking. As a result, we design a path-planning algorithm based on Q-learning to mitigate the context inconsistency hallucination, which enhances the reasoning ability of LLMs. Using the Q-value of state-action as auxiliary information for prompts, we correct the hallucinations of LLMs, thereby guiding LLMs to learn the optimal path. Finally, we propose a reverse curriculum learning technique based on LLMs to further mitigate the context inconsistency hallucination. LLMs can rapidly accumulate successful experiences by reducing task difficulty and leveraging them to tackle more complex tasks. We performed comprehensive experiments based on Baidu's self-developed LLM: ERNIE-Bot 4.0. The results showed that our S2RCQL achieved a 23% to 40% improvement in both success and optimality rates compared with advanced prompt engineering.
Submitted 26 August, 2024; v1 submitted 23 August, 2024;
originally announced August 2024.
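The abstract above describes using Q-values of state-action pairs as auxiliary prompt information. The following is only a minimal sketch of that general idea, not the authors' S2RCQL: it runs plain tabular Q-learning on a hypothetical 4x4 maze and formats the learned Q-values as a text hint. The maze layout, reward values, hyperparameters, and the function names `train`, `greedy_path`, and `q_hint` are all assumptions made for illustration, and no LLM is involved.

```python
import random

# Hypothetical 4x4 maze (0 = free cell, 1 = wall); not taken from the paper.
MAZE = [
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
START, GOAL = (0, 0), (3, 3)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Apply an action; moves into walls or off the grid leave the state unchanged."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0]) and MAZE[r][c] == 0:
        state = (r, c)
    reward = 1.0 if state == GOAL else -0.01  # assumed reward shaping
    return state, reward, state == GOAL


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state = START
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))
            else:
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q.get((nxt, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Follow the highest-valued action from START until GOAL or the step cap."""
    state, path = START, [START]
    for _ in range(max_steps):
        if state == GOAL:
            break
        action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        state, _, _ = step(state, action)
        path.append(state)
    return path


def q_hint(q, state):
    """Format one state's Q-values as text that could be appended to a prompt."""
    return ", ".join(f"{a}: {q.get((state, a), 0.0):.2f}" for a in ACTIONS)
```

Calling `greedy_path(train())` returns the cell sequence the learned table prefers, and `q_hint(q, START)` yields a one-line summary of action values for a given cell.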
-
Compensate Quantization Errors+: Quantized Models Are Inquisitive Learners
Authors:
Yifei Gao,
Jie Ou,
Lei Wang,
Fanhua Shang,
Jaji Wu,
Jun Cheng
Abstract:
Large Language Models (LLMs) showcase remarkable performance and robust deductive capabilities, yet their expansive size complicates deployment and raises environmental concerns due to substantial resource consumption. The recent development of a quantization technique known as Learnable Singular-value Increment (LSI) has addressed some of these quantization challenges. Leveraging insights from LSI and our extensive research, we have developed innovative methods that enhance the performance of quantized LLMs, particularly in low-bit settings. Our methods consistently deliver state-of-the-art results across various quantization scenarios and offer deep theoretical insights into the quantization process, elucidating the potential of quantized models for widespread application.
Submitted 15 August, 2024; v1 submitted 22 July, 2024;
originally announced July 2024.
-
WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment
Authors:
Jiefu Ou,
Arda Uzunoglu,
Benjamin Van Durme,
Daniel Khashabi
Abstract:
AI systems make decisions in physical environments through primitive actions or affordances that are accessed via API calls. While deploying AI agents in the real world involves numerous high-level actions, existing embodied simulators offer a limited set of domain-salient APIs. This naturally brings up the questions: how many primitive actions (APIs) are needed for a versatile embodied agent, and what should they look like? We explore this via a thought experiment: assuming that wikiHow tutorials cover a wide variety of human-written tasks, what is the space of APIs needed to cover these instructions? We propose a framework to iteratively induce new APIs by grounding wikiHow instructions to situated agent policies. Inspired by recent successes in large language models (LLMs) for embodied planning, we propose few-shot prompting to steer GPT-4 to generate Pythonic programs as agent policies and bootstrap a universe of APIs by 1) reusing a seed set of APIs; and then 2) fabricating new API calls when necessary. The focus of this thought experiment is on defining these APIs rather than their executability. We apply the proposed pipeline on instructions from wikiHow tutorials. On a small fraction (0.5%) of tutorials, we induce an action space of 300+ APIs necessary for capturing the rich variety of tasks in the physical world. A detailed automatic and human analysis of the induction output reveals that the proposed pipeline enables effective reuse and creation of APIs. Moreover, a manual review revealed that existing simulators support only a small subset of the induced APIs (9 of the top 50 frequent APIs), motivating the development of action-rich embodied environments.
Submitted 10 July, 2024;
originally announced July 2024.
-
Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other
Authors:
Yifei Gao,
Jie Ou,
Lei Wang,
Yuting Xiao,
Zhiyuan Xiang,
Ruiting Dai,
Jun Cheng
Abstract:
Emergent Large Language Models (LLMs) distinguish themselves from traditional language models through their extraordinary performance and powerful deduction capacity. However, the computational and storage costs of these LLMs are staggering, which has made quantization a trending topic. To address the accuracy decay caused by quantization, two streams of work in post-training quantization stand out. One uses other weights to compensate for existing quantization error, while the other transfers the quantization difficulty to other parts of the model. Combining the merits of both, we introduce Learnable Singular value Increment (LSI) as an advanced solution. LSI uses Singular Value Decomposition to extract the singular values of the weights and makes them learnable, helping weights compensate for each other conditioned on activation. Incorporating LSI with existing techniques, we achieve state-of-the-art performance in diverse quantization settings, whether weight-only, weight-activation, or extremely low-bit scenarios. By unleashing the potential of LSI, efficient finetuning on quantized models is no longer a prohibitive problem.
Submitted 23 June, 2024;
originally announced June 2024.
-
Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data
Authors:
Jingyang Ou,
Shen Nie,
Kaiwen Xue,
Fengqi Zhu,
Jiacheng Sun,
Zhenguo Li,
Chongxuan Li
Abstract:
Discrete diffusion models with absorbing processes have shown promise in language modeling. The key quantities to be estimated are the ratios between the marginal probabilities of two transitive states at all timesteps, called the concrete score. In this paper, we reveal that the concrete score in absorbing diffusion can be expressed as conditional probabilities of clean data, multiplied by a time-dependent scalar in an analytic form. Motivated by this finding, we propose reparameterized absorbing discrete diffusion (RADD), a dedicated diffusion model without time-condition that characterizes the time-independent conditional probabilities. Besides its simplicity, RADD can reduce the number of function evaluations (NFEs) by caching the output of the time-independent network when the noisy sample remains unchanged in a sampling interval. Empirically, RADD is up to 3.5 times faster while achieving similar performance with the strongest baseline. Built upon the new perspective of conditional distributions, we further unify absorbing discrete diffusion and any-order autoregressive models (AO-ARMs), showing that the upper bound on the negative log-likelihood for the diffusion model can be interpreted as an expected negative log-likelihood for AO-ARMs. Further, our RADD models achieve SOTA performance among diffusion models on 5 zero-shot language modeling benchmarks (measured by perplexity) at the GPT-2 scale. Our code is available at https://github.com/ML-GSAI/RADD.
Submitted 6 July, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
HAFFormer: A Hierarchical Attention-Free Framework for Alzheimer's Disease Detection From Spontaneous Speech
Authors:
Zhongren Dong,
Zixing Zhang,
Weixiang Xu,
Jing Han,
Jianjun Ou,
Björn W. Schuller
Abstract:
Automatically detecting Alzheimer's Disease (AD) from spontaneous speech plays an important role in its early diagnosis. Recent approaches rely heavily on Transformer architectures due to their efficiency in modelling long-range context dependencies. However, the quadratic increase in computational complexity associated with self-attention and the length of audio poses a challenge when deploying such models on edge devices. In this context, we construct a novel framework, namely Hierarchical Attention-Free Transformer (HAFFormer), to better deal with long speech for AD detection. Specifically, we employ an attention-free module of Multi-Scale Depthwise Convolution to replace the self-attention and thus avoid the expensive computation, and a GELU-based Gated Linear Unit to replace the feedforward layer, aiming to automatically filter out the redundant information. Moreover, we design a hierarchical structure to force it to learn a variety of information grains, from the frame level to the dialogue level. In extensive experiments on the ADReSS-M dataset, the introduced HAFFormer achieves results (82.6% accuracy) competitive with other recent work, with a significant reduction in computational complexity and model size compared to the standard Transformer. This shows the efficiency of HAFFormer in dealing with long audio for AD detection.
Submitted 6 May, 2024;
originally announced May 2024.
-
Bootstrap 3D Reconstructed Scenes from 3D Gaussian Splatting
Authors:
Yifei Gao,
Jie Ou,
Lei Wang,
Jun Cheng
Abstract:
Recent developments in neural rendering techniques have greatly enhanced the rendering of photo-realistic 3D scenes across both academic and commercial fields. The latest method, known as 3D Gaussian Splatting (3D-GS), has set new benchmarks for rendering quality and speed. Nevertheless, the limitations of 3D-GS become pronounced in synthesizing new viewpoints, especially for views that greatly deviate from those seen during training. Additionally, issues such as dilation and aliasing arise when zooming in or out. These challenges can all be traced back to a single underlying issue: insufficient sampling. In our paper, we present a bootstrapping method that significantly addresses this problem. This approach employs a diffusion model to enhance the rendering of novel views using trained 3D-GS, thereby streamlining the training process. Our results indicate that bootstrapping effectively reduces artifacts and yields clear improvements on the evaluation metrics. Furthermore, we show that our method is versatile and can be easily integrated, allowing various 3D reconstruction projects to benefit from our approach.
Submitted 12 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Inductive-Deductive Strategy Reuse for Multi-Turn Instructional Dialogues
Authors:
Jiao Ou,
Jiayu Wu,
Che Liu,
Fuzheng Zhang,
Di Zhang,
Kun Gai
Abstract:
Aligning large language models (LLMs) with human expectations requires high-quality instructional dialogues, which can be achieved by raising diverse, in-depth, and insightful instructions that deepen interactions. Existing methods target instructions from real instruction dialogues as a learning goal and fine-tune a user simulator for posing instructions. However, the user simulator struggles to implicitly model complex dialogue flows and pose high-quality instructions. In this paper, we take inspiration from the cognitive abilities inherent in human learning and propose the explicit modeling of complex dialogue flows through instructional strategy reuse. Specifically, we first induce high-level strategies from various real instruction dialogues. These strategies are applied to new dialogue scenarios deductively, where the instructional strategies facilitate high-quality instructions. Experimental results show that our method can generate diverse, in-depth, and insightful instructions for a given dialogue history. The constructed multi-turn instructional dialogues can outperform competitive baselines on the downstream chat model.
Submitted 17 April, 2024;
originally announced April 2024.
-
Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding
Authors:
Jie Ou,
Yueming Chen,
Wenhong Tian
Abstract:
While Large Language Models (LLMs) have shown remarkable abilities, they are hindered by significant resource consumption and considerable latency due to autoregressive processing. In this study, we introduce Adaptive N-gram Parallel Decoding (ANPD), an innovative and lossless approach that accelerates inference by allowing the simultaneous generation of multiple tokens. ANPD incorporates a two-stage approach: it begins with a rapid drafting phase that employs an N-gram module, which adapts based on the current interactive context, followed by a verification phase, during which the original LLM assesses and confirms the proposed tokens. Consequently, ANPD preserves the integrity of the LLM's original output while enhancing processing speed. We further leverage a multi-level architecture for the N-gram module to enhance the precision of the initial draft, consequently reducing inference latency. ANPD eliminates the need for retraining or extra GPU memory, making it an efficient and plug-and-play enhancement. In our experiments, models such as LLaMA and its fine-tuned variants have shown speed improvements up to 3.67x, validating the effectiveness of our proposed ANPD.
Submitted 10 July, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
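The draft-then-verify loop described in the abstract above can be illustrated with a toy example. This is a sketch under stated assumptions, not the ANPD implementation: `target_next` is a hypothetical stand-in for an LLM's greedy next-token choice, the drafter is a single-level bigram table built from the current context (the paper describes a multi-level N-gram module), and verification is done token by token rather than in one batched forward pass.

```python
def target_next(tokens):
    """Hypothetical stand-in for an LLM's greedy next-token choice."""
    cycle = ["the", "cat", "sat", "on", "the", "mat", "."]
    return cycle[len(tokens) % len(cycle)]


def build_bigrams(tokens):
    """Map each token to the continuation most recently seen after it."""
    table = {}
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev] = nxt
    return table


def draft(tokens, table, k):
    """Propose up to k tokens by chaining bigram lookups from the last token."""
    proposal, last = [], tokens[-1]
    for _ in range(k):
        if last not in table:
            break
        last = table[last]
        proposal.append(last)
    return proposal


def decode_with_drafts(prompt, n_new, k=3):
    """Generate n_new tokens; returns (tokens, number of verification passes)."""
    tokens = list(prompt)  # prompt is assumed to be non-empty
    passes = 0
    while len(tokens) - len(prompt) < n_new:
        proposal = draft(tokens, build_bigrams(tokens), k)
        passes += 1
        accepted = []
        for tok in proposal:
            # Keep a drafted token only if the target model would emit it too.
            if tok != target_next(tokens + accepted):
                break
            accepted.append(tok)
        # The target model always contributes one token after the accepted prefix.
        accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[: len(prompt) + n_new], passes


def decode_plain(prompt, n_new):
    """Reference: ordinary one-token-at-a-time greedy decoding."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target_next(tokens))
    return tokens
```

Because every accepted token is checked against the target's own choice, `decode_with_drafts(["the"], 20)` produces the same sequence as `decode_plain(["the"], 20)` while needing fewer verification passes than generated tokens; that equality is what "lossless" refers to in this sketch.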
-
Enhancing Role-playing Systems through Aggressive Queries: Evaluation and Improvement
Authors:
Yihong Tang,
Jiao Ou,
Che Liu,
Fuzheng Zhang,
Di Zhang,
Kun Gai
Abstract:
The advent of Large Language Models (LLMs) has propelled dialogue generation into new realms, particularly in the field of role-playing systems (RPSs). While enhanced with ordinary role-relevant training dialogues, existing LLM-based RPSs still struggle to align with roles when handling intricate and trapped queries in boundary scenarios. In this paper, we design the Modular ORchestrated Trap-setting Interaction SystEm (MORTISE) to benchmark and improve the role-playing LLMs' performance. MORTISE can produce highly role-relevant aggressive queries through the collaborative effort of multiple LLM-based modules, and formulate corresponding responses to create an adversarial training dataset via a consistent response generator. We select 190 Chinese and English roles to construct aggressive queries to benchmark existing role-playing LLMs. Through comprehensive evaluation, we find that existing models exhibit a general deficiency in role alignment capabilities. We further select 180 of the roles to collect an adversarial training dataset (named RoleAD) and retain the other 10 roles for testing. Experiments on models improved by RoleAD indicate that our adversarial dataset ameliorates this deficiency, with the improvements demonstrating a degree of generalizability in ordinary scenarios.
Submitted 15 June, 2024; v1 submitted 16 February, 2024;
originally announced February 2024.
-
PMFSNet: Polarized Multi-scale Feature Self-attention Network For Lightweight Medical Image Segmentation
Authors:
Jiahui Zhong,
Wenhong Tian,
Yuanlun Xie,
Zhijia Liu,
Jie Ou,
Taoran Tian,
Lei Zhang
Abstract:
Current state-of-the-art medical image segmentation methods prioritize accuracy but often at the expense of increased computational demands and larger model sizes. Applying these large-scale models to the relatively limited scale of medical image datasets tends to induce redundant computation, complicating the process without the necessary benefits. This approach not only adds complexity but also presents challenges for the integration and deployment of lightweight models on edge devices. For instance, recent transformer-based models have excelled in 2D and 3D medical image segmentation due to their extensive receptive fields and high parameter count. However, their effectiveness comes with a risk of overfitting when applied to small datasets, and they often neglect the vital inductive biases of Convolutional Neural Networks (CNNs), essential for local feature representation. In this work, we propose PMFSNet, a novel medical imaging segmentation model that effectively balances global and local feature processing while avoiding the computational redundancy typical in larger models. PMFSNet streamlines the UNet-based hierarchical structure and simplifies the self-attention mechanism's computational complexity, making it suitable for lightweight applications. It incorporates a plug-and-play PMFS block, a multi-scale feature enhancement module based on attention mechanisms, to capture long-term dependencies. Extensive experimental results demonstrate that even with a model of less than 1 million parameters, our method achieves superior performance in various segmentation tasks across different data scales. It achieves IoU scores of 84.68%, 82.02%, and 78.82% on public datasets of teeth CT (CBCT), ovarian tumors ultrasound (MMOTU), and skin lesions dermoscopy images (ISIC 2018), respectively. The source code is available at https://github.com/yykzjh/PMFSNet.
Submitted 15 January, 2024;
originally announced January 2024.
-
DialogBench: Evaluating LLMs as Human-like Dialogue Systems
Authors:
Jiao Ou,
Junda Lu,
Che Liu,
Yihong Tang,
Fuzheng Zhang,
Di Zhang,
Kun Gai
Abstract:
Large language models (LLMs) have achieved remarkable breakthroughs in new dialogue capabilities by leveraging instruction tuning, which refreshes human impressions of dialogue systems. The long-standing goal of dialogue systems is to be human-like enough to establish long-term connections with users. Therefore, there has been an urgent need to evaluate LLMs as human-like dialogue systems. In this paper, we propose DialogBench, a dialogue evaluation benchmark that contains 12 dialogue tasks to probe the capabilities that LLMs should have as human-like dialogue systems. Specifically, we prompt GPT-4 to generate evaluation instances for each task. We first design the basic prompt based on widely used design principles and further mitigate the existing biases to generate higher-quality evaluation instances. Our extensive tests of 26 LLMs on English and Chinese DialogBench show that instruction tuning improves the human likeness of LLMs to a certain extent, but most LLMs still have much room for improvement as human-like dialogue systems. Interestingly, results also show that the positioning of assistant AI can make instruction tuning weaken the human emotional perception of LLMs and their mastery of information about human daily life.
Submitted 29 March, 2024; v1 submitted 2 November, 2023;
originally announced November 2023.
-
Exploring the Limits of Historical Information for Temporal Knowledge Graph Extrapolation
Authors:
Yi Xu,
Junjie Ou,
Hui Xu,
Luoyi Fu,
Lei Zhou,
Xinbing Wang,
Chenghu Zhou
Abstract:
Temporal knowledge graphs, representing the dynamic relationships and interactions between entities over time, have been identified as a promising approach for event forecasting. However, a limitation of most temporal knowledge graph reasoning methods is their heavy reliance on the recurrence or periodicity of events, which brings challenges to inferring future events related to entities that lack historical interaction. In fact, the current state of affairs is often the result of a combination of historical information and underlying factors that are not directly observable. To this end, we investigate the limits of historical information for temporal knowledge graph extrapolation and propose a new event forecasting model called Contrastive Event Network (CENET) based on a novel training framework of historical contrastive learning. CENET learns both the historical and non-historical dependency to distinguish the most potential entities that best match the given query. Simultaneously, by launching contrastive learning, it trains representations of queries to probe whether the current moment is more dependent on historical or non-historical events. These representations further help train a binary classifier, whose output is a boolean mask, indicating the related entities in the search space. During the inference process, CENET employs a mask-based strategy to generate the final results. We evaluate our proposed model on five benchmark graphs. The results demonstrate that CENET significantly outperforms all existing methods in most metrics, achieving at least 8.3% relative improvement of Hits@1 over previous state-of-the-art baselines on event-based datasets.
Submitted 28 August, 2023;
originally announced August 2023.
-
Pragmatic Inference with a CLIP Listener for Contrastive Captioning
Authors:
Jiefu Ou,
Benno Krojer,
Daniel Fried
Abstract:
We propose a simple yet effective and robust method for contrastive captioning: generating discriminative captions that distinguish target images from very similar alternative distractor images. Our approach is built on a pragmatic inference procedure that formulates captioning as a reference game between a speaker, which produces possible captions describing the target, and a listener, which selects the target given the caption. Unlike previous methods that derive both speaker and listener distributions from a single captioning model, we leverage an off-the-shelf CLIP model to parameterize the listener. Compared with captioner-only pragmatic models, our method benefits from rich vision language alignment representations from CLIP when reasoning over distractors. Like previous methods for discriminative captioning, our method uses a hyperparameter to control the tradeoff between the informativity (how likely captions are to allow a human listener to discriminate the target image) and the fluency of the captions. However, we find that our method is substantially more robust to the value of this hyperparameter than past methods, which allows us to automatically optimize the captions for informativity, outperforming past methods for discriminative captioning by 11% to 15% accuracy in human evaluations.
Submitted 14 June, 2023;
originally announced June 2023.
-
Ensemble Reinforcement Learning: A Survey
Authors:
Yanjie Song,
P. N. Suganthan,
Witold Pedrycz,
Junwei Ou,
Yongming He,
Yingwu Chen,
Yutong Wu
Abstract:
Reinforcement Learning (RL) has emerged as a highly effective technique for addressing various scientific and applied problems. Despite its success, certain complex tasks remain challenging to be addressed solely with a single model and algorithm. In response, ensemble reinforcement learning (ERL), a promising approach that combines the benefits of both RL and ensemble learning (EL), has gained widespread popularity. ERL leverages multiple models or training algorithms to comprehensively explore the problem space and possesses strong generalization capabilities. In this study, we present a comprehensive survey on ERL to provide readers with an overview of recent advances and challenges in the field. Firstly, we provide an introduction to the background and motivation for ERL. Secondly, we conduct a detailed analysis of strategies such as model selection and combination that have been successfully implemented in ERL. Subsequently, we explore the application of ERL, summarize the datasets, and analyze the algorithms employed. Finally, we outline several open questions and discuss future research directions of ERL. By offering guidance for future scientific research and engineering applications, this survey significantly contributes to the advancement of ERL.
Submitted 13 December, 2023; v1 submitted 5 March, 2023;
originally announced March 2023.
-
Hierarchical Event Grounding
Authors:
Jiefu Ou,
Adithya Pratapa,
Rishubh Gupta,
Teruko Mitamura
Abstract:
Event grounding aims at linking mention references in text corpora to events from a knowledge base (KB). Previous work on this task focused primarily on linking to a single KB event, thereby overlooking the hierarchical aspects of events. Events in documents are typically described at various levels of spatio-temporal granularity (Glavas et al. 2014). These hierarchical relations are utilized in downstream tasks of narrative understanding and schema construction. In this work, we present an extension to the event grounding task that requires tackling hierarchical event structures from the KB. Our proposed task involves linking a mention reference to a set of event labels from a subevent hierarchy in the KB. We propose a retrieval methodology that leverages event hierarchy through an auxiliary hierarchical loss (Murty et al. 2018). On an automatically created multilingual dataset from Wikipedia and Wikidata, our experiments demonstrate the effectiveness of the hierarchical loss against retrieve and re-rank baselines (Wu et al. 2020; Pratapa, Gupta, and Mitamura 2022). Furthermore, we demonstrate the systems' ability to aid hierarchical discovery among unseen events.
Submitted 8 February, 2023;
originally announced February 2023.
-
Temporal Knowledge Graph Reasoning with Historical Contrastive Learning
Authors:
Yi Xu,
Junjie Ou,
Hui Xu,
Luoyi Fu
Abstract:
Temporal knowledge graph, serving as an effective way to store and model dynamic relations, shows promising prospects in event forecasting. However, most temporal knowledge graph reasoning methods are highly dependent on the recurrence or periodicity of events, which brings challenges to inferring future events related to entities that lack historical interaction. In fact, the current moment is often the combined effect of a small part of historical information and those unobserved underlying factors. To this end, we propose a new event forecasting model called Contrastive Event Network (CENET), based on a novel training framework of historical contrastive learning. CENET learns both the historical and non-historical dependency to distinguish the most potential entities that can best match the given query. Simultaneously, it trains representations of queries to investigate whether the current moment depends more on historical or non-historical events by launching contrastive learning. The representations further help train a binary classifier whose output is a boolean mask to indicate related entities in the search space. During the inference process, CENET employs a mask-based strategy to generate the final results. We evaluate our proposed model on five benchmark graphs. The results demonstrate that CENET significantly outperforms all existing methods in most metrics, achieving at least 8.3% relative improvement of Hits@1 over previous state-of-the-art baselines on event-based datasets.
Submitted 2 December, 2022; v1 submitted 20 November, 2022;
originally announced November 2022.
-
Counterfactual Data Augmentation via Perspective Transition for Open-Domain Dialogues
Authors:
Jiao Ou,
Jinchao Zhang,
Yang Feng,
Jie Zhou
Abstract:
The construction of open-domain dialogue systems requires high-quality dialogue datasets. The dialogue data admits a wide variety of responses for a given dialogue history, especially responses with different semantics. However, collecting such a high-quality dataset is labor-intensive and time-consuming in most scenarios. In this paper, we propose a data augmentation method to automatically augment high-quality responses with different semantics by counterfactual inference. Specifically, given an observed dialogue, our counterfactual generation model first infers semantically different responses by replacing the observed reply perspective with substituted ones. Furthermore, our data selection method filters out detrimental augmented responses. Experimental results show that our data augmentation method can augment high-quality responses with different semantics for a given dialogue history, and can outperform competitive baselines on multiple downstream tasks.
Submitted 30 October, 2022;
originally announced October 2022.
-
Influence maximization under limited network information: Seeding high-degree neighbors
Authors:
Jiamin Ou,
Vincent Buskens,
Arnout Van De Rijt,
Debabrata Panja
Abstract:
The diffusion of information, norms, and practices across a social network can be initiated by compelling a small number of seed individuals to adopt first. Strategies proposed in previous work assume either full network information or a large degree of control over what information is collected. However, privacy settings on the Internet and high non-response in surveys often severely limit available connectivity information. Here we propose a seeding strategy for scenarios with limited network information: only the degrees and connections of some random nodes are known. This new strategy is a modification of "random neighbor sampling" and seeds the highest-degree neighbors of randomly selected nodes. In simulations of a linear threshold model on a range of synthetic and real-world networks, we find that this new strategy outperforms other seeding strategies, including high-degree seeding and clustered seeding.
Submitted 8 February, 2022;
originally announced February 2022.
-
ESAN: Efficient Sentiment Analysis Network of A-Shares Research Reports for Stock Price Prediction
Authors:
Tuo Sun,
Wanrong Zheng,
Shufan Yu,
Mengxun Li,
Jiarui Ou
Abstract:
In this paper, we develop a natural language processing model to help predict stocks in the long term. The whole network includes two modules. The first module is a natural language processing model that seeks out reliable factors from input reports, while the other is a time-series forecasting model that takes the factors as input and aims to predict stocks' earnings yield. To indicate the efficiency of our model in combining the sentiment analysis module and the time-series forecasting module, we name our method ESAN.
Submitted 2 December, 2021;
originally announced December 2021.
-
An Adaptive Sampling and Edge Detection Approach for Encoding Static Images for Spiking Neural Networks
Authors:
Peyton Chandarana,
Junlin Ou,
Ramtin Zand
Abstract:
Current state-of-the-art methods of image classification using convolutional neural networks are often constrained by both latency and power consumption. This places a limit on the devices, particularly low-power edge devices, that can employ these methods. Spiking neural networks (SNNs) are considered to be the third generation of artificial neural networks which aim to address these latency and power constraints by taking inspiration from biological neuronal communication processes. Before data such as images can be input into an SNN, however, they must be first encoded into spike trains. Herein, we propose a method for encoding static images into temporal spike trains using edge detection and an adaptive signal sampling method for use in SNNs. The edge detection process consists of first performing Canny edge detection on the 2D static images and then converting the edge detected images into two X and Y signals using an image-to-signal conversion method. The adaptive signaling approach consists of sampling the signals such that the signals maintain enough detail and are sensitive to abrupt changes in the signal. Temporal encoding mechanisms such as threshold-based representation (TBR) and step-forward (SF) can then be used to convert the sampled signals into spike trains. We use various error and indicator metrics to optimize and evaluate the efficiency and precision of the proposed image encoding approach. Comparison results between the original and reconstructed signals from spike trains generated using edge-detection and adaptive temporal encoding mechanism exhibit 18x and 7x reduction in average root mean square error (RMSE) compared to the conventional SF and TBR encoding, respectively, when used to encode the MNIST dataset.
Submitted 19 October, 2021;
originally announced October 2021.
-
Boost Neural Networks by Checkpoints
Authors:
Feng Wang,
Guoyizhe Wei,
Qiao Liu,
Jinxiang Ou,
Xian Wei,
Hairong Lv
Abstract:
Training multiple deep neural networks (DNNs) and averaging their outputs is a simple way to improve the predictive performance. Nevertheless, the multiplied training cost prevents this ensemble method from being practical and efficient. Several recent works attempt to save and ensemble the checkpoints of DNNs, which only requires the same computational cost as training a single network. However, these methods suffer from either marginal accuracy improvements due to the low diversity of checkpoints or high risk of divergence due to the cyclical learning rates they adopt. In this paper, we propose a novel method to ensemble the checkpoints, where a boosting scheme is utilized to accelerate model convergence and maximize the checkpoint diversity. We theoretically prove that it converges by reducing exponential loss. The empirical evaluation also indicates that our proposed ensemble outperforms a single model and existing ensembles in terms of accuracy and efficiency. With the same training budget, our method achieves 4.16% lower error on Cifar-100 and 6.96% on Tiny-ImageNet with ResNet-110 architecture. Moreover, the adaptive sample weights in our method make it an effective solution to address the imbalanced class distribution. In the experiments, it yields up to 5.02% higher accuracy over a single EfficientNet-B0 on the imbalanced datasets.
Submitted 25 October, 2021; v1 submitted 3 October, 2021;
originally announced October 2021.
-
Constructing Emotion Consensus and Utilizing Unpaired Data for Empathetic Dialogue Generation
Authors:
Lei Shen,
Jinchao Zhang,
Jiao Ou,
Xiaofang Zhao,
Jie Zhou
Abstract:
Research on dialogue empathy aims to endow an agent with the capacity to accurately understand and properly respond to emotions. Existing models for empathetic dialogue generation focus on the emotion flow in one direction, that is, from the context to response. We argue that conducting an empathetic conversation is a bidirectional process, where empathy occurs when the emotions of two interlocutors could converge on the same point, i.e., reaching an emotion consensus. We also find that the empathetic dialogue corpus is extremely limited, which further restricts the model performance. To address the above issues, we propose a dual-generative model, Dual-Emp, to simultaneously construct the emotion consensus and utilize some external unpaired data. Specifically, our model integrates a forward dialogue model, a backward dialogue model, and a discrete latent variable representing the emotion consensus into a unified architecture. Then, to alleviate the constraint of paired data, we extract unpaired emotional data from open-domain conversations and employ Dual-Emp to produce pseudo paired empathetic samples, which is more efficient and lower-cost than human annotation. Automatic and human evaluations demonstrate that our method outperforms competitive baselines in producing coherent and empathetic responses.
Submitted 18 September, 2021; v1 submitted 16 September, 2021;
originally announced September 2021.
-
Hidden dependence of spreading vulnerability on topological complexity
Authors:
Mark M. Dekker,
Raoul D. Schram,
Jiamin Ou,
Debabrata Panja
Abstract:
Many dynamical phenomena in complex systems concern spreading that plays out on top of networks with changing architecture over time -- commonly known as temporal networks. A complex system's proneness to facilitate spreading phenomena, which we abbreviate as its `spreading vulnerability', is often surmised to be related to the topology of the temporal network featured by the system. Yet, cleanly extracting spreading vulnerability of a complex system directly from the topological information of the temporal network remains a challenge. Here, using data from a diverse set of real-world complex systems, we develop the `entropy of temporal entanglement' as a novel and insightful quantity to measure topological complexities of temporal networks. We show that this parameter-free quantity naturally allows for topological comparisons across vastly different complex systems. Importantly, by simulating three different types of stochastic dynamical processes playing out on top of temporal networks, we demonstrate that the entropy of temporal entanglement serves as a quantitative embodiment of the systems' spreading vulnerability, irrespective of the details of the processes. In being able to do so, i.e., in being able to quantitatively extract a complex system's proneness to facilitate spreading phenomena from topology, this entropic measure opens itself for applications in a wide variety of natural, social, biological and engineered systems.
Submitted 14 April, 2022; v1 submitted 4 July, 2021;
originally announced July 2021.
-
Quantifying agent impacts on contact sequences in social interactions
Authors:
Mark M. Dekker,
Tessa F. Blanken,
Fabian Dablander,
Jiamin Ou,
Denny Borsboom,
Debabrata Panja
Abstract:
Human social behavior plays a crucial role in how pathogens like SARS-CoV-2 or fake news spread in a population. Social interactions determine the contact network among individuals, while spreading, requiring individual-to-individual transmission, takes place on top of the network. Studying the topological aspects of a contact network, therefore, not only has the potential of leading to valuable insights into how the behavior of individuals impacts spreading phenomena, but it may also open up possibilities for devising effective behavioral interventions. Because of the temporal nature of interactions - since the topology of the network, containing who is in contact with whom, when, for how long, and in which precise sequence, varies (rapidly) in time - analyzing them requires developing network methods and metrics that respect temporal variability, in contrast to those developed for static (i.e., time-invariant) networks. Here, by means of event mapping, we propose a method to quantify how quickly agents mingle by transforming temporal network data of agent contacts. We define a novel measure called 'contact sequence centrality', which quantifies the impact of an individual on the contact sequences, reflecting the individual's behavioral potential for spreading. Comparing contact sequence centrality across agents allows for ranking the impact of agents and identifying potential 'behavioral super-spreaders'. The method is applied to social interaction data collected at an art fair in Amsterdam. We relate the measure to the existing network metrics, both temporal and static, and find that (mostly at longer time scales) traditional metrics lose their resemblance to contact sequence centrality. Our work highlights the importance of accounting for the sequential nature of contacts when analyzing social interactions.
Submitted 14 April, 2022; v1 submitted 3 July, 2021;
originally announced July 2021.
-
Exploring Discourse Structures for Argument Impact Classification
Authors:
Xin Liu,
Jiefu Ou,
Yangqiu Song,
Xin Jiang
Abstract:
Discourse relations among arguments reveal logical structures of a debate conversation. However, no prior work has explicitly studied how the sequence of discourse relations influences a claim's impact. This paper empirically shows that the discourse relations between two arguments along the context path are essential factors for identifying the persuasive power of an argument. We further propose DisCOC to inject and fuse the sentence-level structural discourse information with contextualized features derived from large-scale language models. Experimental results and extensive analysis show that the attention and gate mechanisms that explicitly model contexts and texts can indeed help the argument impact classification task defined by Durmus et al. (2019), and discourse structures along the context path of the claim to be classified can further boost the performance.
Submitted 2 June, 2021;
originally announced June 2021.
-
Full-Resolution Encoder-Decoder Networks with Multi-Scale Feature Fusion for Human Pose Estimation
Authors:
Jie Ou,
Mingjian Chen,
Hong Wu
Abstract:
To achieve more accurate 2D human pose estimation, we extend the successful encoder-decoder network, simple baseline network (SBN), in three ways. To reduce the quantization errors caused by the large output stride size, two more decoder modules are appended to the end of the simple baseline network to get full output resolution. Then, the global context blocks (GCBs) are added to the encoder and decoder modules to enhance them with global context features. Furthermore, we propose a novel spatial-attention-based multi-scale feature collection and distribution module (SA-MFCD) to fuse and distribute multi-scale features to boost the pose estimation. Experimental results on the MS COCO dataset indicate that our network can remarkably improve the accuracy of human pose estimation over SBN. Our network using ResNet34 as the backbone network can even achieve the same accuracy as SBN with ResNet152, and our networks can achieve superior results with big backbone networks.
Submitted 1 June, 2021;
originally announced June 2021.
-
SE-DAE: Style-Enhanced Denoising Auto-Encoder for Unsupervised Text Style Transfer
Authors:
Jicheng Li,
Yang Feng,
Jiao Ou
Abstract:
Text style transfer aims to change the style of sentences while preserving the semantic meanings. Due to the lack of parallel data, the Denoising Auto-Encoder (DAE) is widely used in this task to model distributions of different sentence styles. However, because of the conflict between the target of the conventional denoising procedure and the target of the style transfer task, the vanilla DAE cannot produce sufficiently satisfying results. To improve the transferability of the model, most of the existing works combine DAE with various complicated unsupervised networks, which makes the whole system overly complex. In this work, we design a novel DAE model named Style-Enhanced DAE (SE-DAE), which is specifically designed for the text style transfer task. Compared with previous complicated style-transfer models, our model does not include any complicated unsupervised networks, but relies only on the high-quality pseudo-parallel data generated by a novel data refinement mechanism. Moreover, to alleviate the conflict between the targets of the conventional denoising procedure and the style transfer task, we propose another novel style denoising mechanism, which is more compatible with the target of the style transfer task. We validate the effectiveness of our model on two style benchmark datasets. Both automatic evaluation and human evaluation show that our proposed model is highly competitive compared with previous strong state-of-the-art (SOTA) approaches and greatly outperforms the vanilla DAE.
Submitted 27 April, 2021;
originally announced April 2021.
-
ASER: Towards Large-scale Commonsense Knowledge Acquisition via Higher-order Selectional Preference over Eventualities
Authors:
Hongming Zhang,
Xin Liu,
Haojie Pan,
Haowen Ke,
Jiefu Ou,
Tianqing Fang,
Yangqiu Song
Abstract:
Commonsense knowledge acquisition and reasoning have long been a core artificial intelligence problem. However, in the past, there has been a lack of scalable methods to collect commonsense knowledge. In this paper, we propose to develop principles for collecting commonsense knowledge based on selectional preference. We generalize the definition of selectional preference from one-hop linguistic syntactic relations to higher-order relations over linguistic graphs. Unlike previous commonsense knowledge definition (e.g., ConceptNet), selectional preference (SP) knowledge only relies on statistical distribution over linguistic graphs, which can be efficiently and accurately acquired from the unlabeled corpus with modern tools. Following this principle, we develop a large-scale eventuality (a linguistic term covering activity, state, and event)-based knowledge graph ASER, where each eventuality is represented as a dependency graph, and the relation between them is a discourse relation defined in shallow discourse parsing. The higher-order selectional preference over collected linguistic graphs reflects various kinds of commonsense knowledge. Moreover, motivated by the observation that humans understand events by abstracting the observed events to a higher level and can thus transfer their knowledge to new events, we propose a conceptualization module to significantly boost the coverage of ASER. In total, ASER contains 648 million edges between 438 million eventualities. After conceptualization with Probase, a selectional preference based concept-instance relational knowledge base, our concept graph contains 15 million conceptualized eventualities and 224 million edges between them. Detailed analysis is provided to demonstrate its quality. All the collected data, APIs, and tools are available at https://github.com/HKUST-KnowComp/ASER.
Submitted 16 January, 2022; v1 submitted 5 April, 2021;
originally announced April 2021.
-
InFillmore: Frame-Guided Language Generation with Bidirectional Context
Authors:
Jiefu Ou,
Nathaniel Weir,
Anton Belyy,
Felix Yu,
Benjamin Van Durme
Abstract:
We propose a structured extension to bidirectional-context conditional language generation, or "infilling," inspired by Frame Semantic theory (Fillmore, 1976). Guidance is provided through two approaches: (1) model fine-tuning, conditioning directly on observed symbolic frames, and (2) a novel extension to disjunctive lexically constrained decoding that leverages frame semantic lexical units. Automatic and human evaluations confirm that frame-guided generation allows for explicit manipulation of intended infill semantics, with minimal loss in distinguishability from human-generated text. Our methods flexibly apply to a variety of use scenarios, and we provide a codebase and interactive demo available from https://nlp.jhu.edu/demos/infillmore.
Submitted 22 March, 2022; v1 submitted 8 March, 2021;
originally announced March 2021.
-
MicroHECL: High-Efficient Root Cause Localization in Large-Scale Microservice Systems
Authors:
Dewei Liu,
Chuan He,
Xin Peng,
Fan Lin,
Chenxi Zhang,
Shengfang Gong,
Ziang Li,
Jiayu Ou,
Zheshun Wu
Abstract:
Availability issues of industrial microservice systems (e.g., drop of successfully placed orders and processed transactions) directly affect the running of the business. These issues are usually caused by various types of service anomalies which propagate along service dependencies. Accurate and high-efficient root cause localization is thus a critical challenge for large-scale industrial microservice systems. Existing approaches use service dependency graph based analysis techniques to automatically locate root causes. However, these approaches are limited due to their inaccurate detection of service anomalies and inefficient traversing of service dependency graph. In this paper, we propose a high-efficient root cause localization approach for availability issues of microservice systems, called MicroHECL. Based on a dynamically constructed service call graph, MicroHECL analyzes possible anomaly propagation chains, and ranks candidate root causes based on correlation analysis. We combine machine learning and statistical methods and design customized models for the detection of different types of service anomalies (i.e., performance, reliability, traffic). To improve the efficiency, we adopt a pruning strategy to eliminate irrelevant service calls in anomaly propagation chain analysis. Experimental studies show that MicroHECL significantly outperforms two state-of-the-art baseline approaches in terms of both accuracy and efficiency. MicroHECL has been used in Alibaba and achieves a top-3 hit ratio of 68% with root cause localization time reduced from 30 minutes to 5 minutes.
Submitted 1 March, 2021;
originally announced March 2021.
-
Efficient Human Pose Estimation with Depthwise Separable Convolution and Person Centroid Guided Joint Grouping
Authors:
Jie Ou,
Hong Wu
Abstract:
In this paper, we propose efficient and effective methods for 2D human pose estimation. A new ResBlock is proposed based on depthwise separable convolution and is utilized instead of the original one in Hourglass network. It can be further enhanced by replacing the vanilla depthwise convolution with a mixed depthwise convolution. Based on it, we propose a bottom-up multi-person pose estimation method. A rooted tree is used to represent human pose by introducing person centroid as the root which connects to all body joints directly or hierarchically. Two branches of sub-networks are used to predict the centroids, body joints and their offsets to their parent nodes. Joints are grouped by tracing along their offsets to the closest centroids. Experimental results on the MPII human dataset and the LSP dataset show that both our single-person and multi-person pose estimation methods can achieve competitive accuracies with low computational costs.
Submitted 6 December, 2020;
originally announced December 2020.
-
On the Importance of Word and Sentence Representation Learning in Implicit Discourse Relation Classification
Authors:
Xin Liu,
Jiefu Ou,
Yangqiu Song,
Xin Jiang
Abstract:
Implicit discourse relation classification is one of the most difficult parts of shallow discourse parsing, as relation prediction without explicit connectives requires language understanding at both the text span level and the sentence level. Previous studies mainly focus on the interactions between two arguments. We argue that a powerful contextualized representation module, a bilateral multi-perspective matching module, and a global information fusion module are all important to implicit discourse analysis. We propose a novel model to combine these modules together. Extensive experiments show that our proposed model outperforms BERT and other state-of-the-art systems by around 8% on the PDTB dataset and around 16% on the CoNLL 2016 datasets. We also analyze the effectiveness of different modules in the implicit discourse relation classification task and demonstrate how different levels of representation learning can affect the results.
Submitted 28 April, 2020; v1 submitted 27 April, 2020;
originally announced April 2020.
-
FGN: Fully Guided Network for Few-Shot Instance Segmentation
Authors:
Zhibo Fan,
Jin-Gang Yu,
Zhihao Liang,
Jiarong Ou,
Changxin Gao,
Gui-Song Xia,
Yuanqing Li
Abstract:
Few-shot instance segmentation (FSIS) conjoins the few-shot learning paradigm with general instance segmentation, which provides a possible way of tackling instance segmentation when abundant labeled training data is lacking. This paper presents a Fully Guided Network (FGN) for few-shot instance segmentation. FGN views FSIS as a guided model in which a so-called support set is encoded and utilized to guide the predictions of a base instance segmentation network (i.e., Mask R-CNN); the guidance mechanism is critical to this design. Accordingly, FGN introduces different guidance mechanisms into the key components of Mask R-CNN, including the Attention-Guided RPN, Relation-Guided Detector, and Attention-Guided FCN, in order to make full use of the guidance from the support set and adapt better to inter-class generalization. Experiments on public datasets demonstrate that our proposed FGN can outperform the state-of-the-art methods.
Submitted 31 March, 2020;
originally announced March 2020.
-
Autonomous quadrotor obstacle avoidance based on dueling double deep recurrent Q-learning with monocular vision
Authors:
Jiajun Ou,
Xiao Guo,
Ming Zhu,
Wenjie Lou
Abstract:
The rapid development of unmanned aerial vehicles (UAVs) places higher demands on autonomous obstacle avoidance. Due to the limited payload and power supply, small UAVs such as quadrotors usually carry simple sensors and computation units, which makes traditional methods more challenging to implement. In this paper, a novel framework is demonstrated that controls a quadrotor flying autonomously through crowded environments using monocular vision. The framework adopts a two-stage architecture, consisting of a sensing module and a decision module. The sensing module is based on an unsupervised deep learning method, and the decision module uses dueling double deep recurrent Q-learning to eliminate the adverse effects of the limited observation capacity of an on-board monocular camera. The framework enables the quadrotor to realize autonomous obstacle avoidance without any prior environment information or labeled datasets for training. The trained model shows a high success rate in simulation and good generalization ability in transformed scenarios.
Submitted 2 March, 2020; v1 submitted 9 February, 2020;
originally announced February 2020.