-
HDNet: Physics-Inspired Neural Network for Flow Estimation based on Helmholtz Decomposition
Authors:
Miao Qi,
Ramzi Idoughi,
Wolfgang Heidrich
Abstract:
Flow estimation problems are ubiquitous in scientific imaging. Often, the underlying flows are subject to physical constraints that can be exploited in the flow estimation; for example, incompressible (divergence-free) flows are expected for many fluid experiments, while irrotational (curl-free) flows arise in the analysis of optical distortions and wavefront sensing. In this work, we propose a Physics-Inspired Neural Network (PINN) named HDNet, which performs a Helmholtz decomposition of an arbitrary flow field, i.e., it decomposes the input flow into a divergence-only and a curl-only component. HDNet can be trained exclusively on synthetic data generated by reverse Helmholtz decomposition, which we call Helmholtz synthesis. As a PINN, HDNet is fully differentiable and can easily be integrated into arbitrary flow estimation problems.
Submitted 12 June, 2024;
originally announced June 2024.
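The "Helmholtz synthesis" idea in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the grid size `N`, the potentials `phi` and `psi`, and the periodic central-difference operators are all assumptions chosen for illustration. The sketch builds a curl-free field as the gradient of a scalar potential, builds a divergence-free field as the rotated gradient of a stream function, sums them into one flow, and checks numerically that each component has the expected property.

```python
import math

N = 16  # hypothetical periodic grid size


def ddx(f):
    """Central difference along x (column index) with periodic boundaries."""
    return [[(f[i][(j + 1) % N] - f[i][(j - 1) % N]) / 2.0 for j in range(N)]
            for i in range(N)]


def ddy(f):
    """Central difference along y (row index) with periodic boundaries."""
    return [[(f[(i + 1) % N][j] - f[(i - 1) % N][j]) / 2.0 for j in range(N)]
            for i in range(N)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def max_abs(f):
    return max(abs(x) for row in f for x in row)


# Two arbitrary smooth potentials (illustrative choices, not from the paper).
w = 2.0 * math.pi / N
phi = [[math.sin(w * j) * math.cos(w * i) for j in range(N)] for i in range(N)]
psi = [[math.cos(2 * w * j) + math.sin(w * i) for j in range(N)] for i in range(N)]

# Curl-free component: gradient of the scalar potential phi.
u_c, v_c = ddx(phi), ddy(phi)

# Divergence-free component: rotated gradient of psi, (d psi/dy, -d psi/dx).
u_s = ddy(psi)
v_s = [[-x for x in row] for row in ddx(psi)]

# Synthesized flow: the sum of the two components.
u, v = add(u_c, u_s), add(v_c, v_s)

# Discrete curl of the first component and divergence of the second.
curl_of_c = sub(ddx(v_c), ddy(u_c))
div_of_s = add(ddx(u_s), ddy(v_s))

print("max |curl| of curl-free part:", max_abs(curl_of_c))
print("max |div| of divergence-free part:", max_abs(div_of_s))
```

Both printed values should be at the level of floating-point rounding, because the two difference operators commute on a periodic grid. A pair such as `(u, v)` with known components `(u_c, v_c)` and `(u_s, v_s)` is the kind of synthetic training example the abstract describes, under the assumptions stated here.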
-
Watching Grass Grow: Long-term Visual Navigation and Mission Planning for Autonomous Biodiversity Monitoring
Authors:
Matthew Gadd,
Daniele De Martini,
Luke Pitt,
Wayne Tubby,
Matthew Towlson,
Chris Prahacs,
Oliver Bartlett,
John Jackson,
Man Qi,
Paul Newman,
Andrew Hector,
Roberto Salguero-Gómez,
Nick Hawes
Abstract:
We describe a challenging robotics deployment in a complex ecosystem to monitor a rich plant community. The study site is dominated by dynamic grassland vegetation and is thus visually ambiguous and liable to drastic appearance change over the course of a day and especially through the growing season. This dynamism and complexity in appearance seriously impact the stability of the robotics platform, as localisation is a foundational part of that control loop, and so routes must be carefully taught and retaught until autonomy is robust and repeatable. Our system is demonstrated over a 6-week period monitoring the response of grass species to experimental climate change manipulations. We also discuss the applicability of our pipeline to monitor biodiversity in other complex natural settings.
Submitted 1 May, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Is Next Token Prediction Sufficient for GPT? Exploration on Code Logic Comprehension
Authors:
Mengnan Qi,
Yufan Huang,
Yongqiang Yao,
Maoquan Wang,
Bin Gu,
Neel Sundaresan
Abstract:
Large language models (LLMs) have experienced exponential growth and demonstrate remarkable performance across various tasks. Nevertheless, contemporary research primarily centers on enhancing the size and quality of pretraining data while still using the next token prediction task on an autoregressive transformer architecture. The efficacy of this task in truly facilitating the model's comprehension of code logic remains questionable; we speculate that the model still interprets code as mere text, whereas humans emphasize the underlying logical knowledge. To test this, we introduce a new task, "Logically Equivalent Code Selection," which requires selecting logically equivalent code from a candidate set, given a query code. Our experimental findings indicate that current LLMs underperform in this task, since they understand code as an unordered bag of keywords. To improve their performance, we propose an advanced pretraining task, "Next Token Prediction+". This task aims to modify the sentence embedding distribution of the LLM without sacrificing its generative capabilities. Our experimental results reveal that following this pretraining, both Code Llama and StarCoder, the prevalent code domain pretraining models, display significant improvements on our logically equivalent code selection task and the code completion task.
Submitted 12 April, 2024;
originally announced April 2024.
-
HDA-LVIO: A High-Precision LiDAR-Visual-Inertial Odometry in Urban Environments with Hybrid Data Association
Authors:
Jian Shi,
Wei Wang,
Mingyang Qi,
Xin Li,
Ye Yan
Abstract:
To enhance localization accuracy in urban environments, an innovative LiDAR-Visual-Inertial odometry, named HDA-LVIO, is proposed by employing hybrid data association. The proposed HDA-LVIO system can be divided into two subsystems: the LiDAR-Inertial subsystem (LIS) and the Visual-Inertial subsystem (VIS). In the LIS, the LiDAR point cloud is used to calculate the Iterative Closest Point (ICP) error, which serves as the measurement value of an Error State Iterated Kalman Filter (ESIKF) to construct the global map. In the VIS, an incremental method is first employed to adaptively extract planes from the global map. The centroids of these planes are projected onto the image to obtain projection points. Then, feature points are extracted from the image and tracked along with the projection points using Lucas-Kanade (LK) optical flow. Next, leveraging the vehicle states from previous intervals, sliding window optimization is performed to estimate the depth of the feature points. Concurrently, a method based on epipolar geometric constraints is proposed to address tracking failures for feature points, which can improve the accuracy of depth estimation for feature points by ensuring sufficient parallax within the sliding window. Subsequently, the feature points and projection points are hybridly associated to construct the reprojection error, which serves as the measurement value of the ESIKF to estimate vehicle states. Finally, the localization accuracy of the proposed HDA-LVIO is validated using public datasets and data from our equipment. The results demonstrate that the proposed algorithm achieves an obvious improvement in localization accuracy compared to various existing algorithms.
Submitted 11 March, 2024;
originally announced March 2024.
-
Mechanism for Decision-aware Collaborative Federated Learning: A Pitfall of Shapley Values
Authors:
Meng Qi,
Mingxi Zhu
Abstract:
This paper investigates mechanism design for decision-aware collaboration via federated learning (FL) platforms. Our framework consists of a digital platform and multiple decision-aware agents, each endowed with proprietary data sets. The platform offers an infrastructure that enables access to the data, creates incentives for collaborative learning aimed at operational decision-making, and conducts FL to avoid direct raw data sharing. The computation and communication efficiency of the FL process is inherently influenced by the agent participation equilibrium induced by the mechanism. Therefore, assessing the system's efficiency involves two critical factors: the surplus created by coalition formation and the communication costs incurred across the coalition during FL. To evaluate the system efficiency under the intricate interplay between mechanism design, agent participation, operational decision-making, and the performance of FL algorithms, we introduce a multi-action collaborative federated learning (MCFL) framework for decision-aware agents. Under this framework, we further analyze the equilibrium for the renowned Shapley value based mechanisms. Specifically, we examine the issue of false-name manipulation, a form of dishonest behavior where participating agents create duplicate fake identities to split their original data among these identities. By solving the agent participation equilibrium, we demonstrate that while Shapley value effectively maximizes coalition-generated surplus by encouraging full participation, it inadvertently promotes false-name manipulation. This further significantly increases the communication costs when the platform conducts FL. Thus, we highlight a significant pitfall of Shapley value based mechanisms, which implicitly incentivizes data splitting and identity duplication, ultimately impairing the overall efficiency in FL systems.
Submitted 7 March, 2024;
originally announced March 2024.
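The false-name manipulation described in the abstract above can be made concrete with a toy calculation. This sketch is not the paper's MCFL model: the surplus function `value` (square root of pooled data), the data amounts, and the agent names are hypothetical, chosen only so that the effect is visible. It computes exact Shapley values by enumerating arrival orders, then compares agent A's payoff when participating honestly with its combined payoff after splitting its data across two fake identities.

```python
from itertools import permutations
import math


def shapley(data, value):
    """Exact Shapley values for a game whose coalition worth is
    value(total data held by the coalition)."""
    players = list(data)
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        pooled = 0.0
        for p in order:
            before = value(pooled)
            pooled += data[p]
            # Marginal contribution of p when joining in this order.
            totals[p] += value(pooled) - before
    return {p: t / len(orders) for p, t in totals.items()}


# Hypothetical concave surplus: diminishing returns in pooled data.
value = math.sqrt

# Honest participation: two agents with equal data.
honest = shapley({"A": 2.0, "B": 2.0}, value)

# Agent A splits its data across two fake identities A1 and A2.
split = shapley({"A1": 1.0, "A2": 1.0, "B": 2.0}, value)

payoff_honest = honest["A"]
payoff_split = split["A1"] + split["A2"]

print("A's payoff when honest:", round(payoff_honest, 4))
print("A's payoff when split: ", round(payoff_split, 4))
```

In this toy game the total surplus is the same in both cases, yet A's combined share is larger after splitting, so the Shapley allocation rewards identity duplication. Whether and how strongly this holds in a given setting depends on the surplus function; the concave choice here is only an assumption.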
-
Bitcoin Inscriptions: Foundations and Beyond
Authors:
Ningran Li,
Minfeng Qi,
Qin Wang,
Shiping Chen
Abstract:
Bitcoin inscription marks a pivotal moment in blockchain technology. This report presents a primary exploration of Bitcoin inscriptions. We dive into the technological underpinnings and offer a detailed comparative analysis between Bitcoin inscriptions and NFTs on other blockchains. Further, we explore a wide range of use cases and significant opportunities for future innovation, including inscription derivative protocols, Bitcoin Layer2 solutions, and interoperability techniques.
Submitted 30 January, 2024;
originally announced January 2024.
-
Mutual Distillation Learning For Person Re-Identification
Authors:
Huiyuan Fu,
Kuilong Cui,
Chuanming Wang,
Mengshi Qi,
Huadong Ma
Abstract:
With the rapid advancements in deep learning technologies, person re-identification (ReID) has witnessed remarkable performance improvements. However, the majority of prior works have focused on solving the problem by extracting features solely from a single perspective, such as uniform partitioning, hard attention mechanisms, or semantic masks. While these approaches have demonstrated efficacy within specific contexts, they fall short in diverse situations. In this paper, we propose a novel approach, Mutual Distillation Learning For Person Re-identification (termed MDPR), which addresses the challenging problem from multiple perspectives within a single unified model, leveraging the power of mutual distillation to enhance the feature representations collectively. Specifically, our approach encompasses two branches: a hard content branch to extract local features via a uniform horizontal partitioning strategy, and a soft content branch to dynamically distinguish between foreground and background and facilitate the extraction of multi-granularity features via a carefully designed attention mechanism. To facilitate knowledge exchange between these two branches, a mutual distillation and fusion process is employed, promoting the capability of the outputs of each branch. Extensive experiments are conducted on widely used person ReID datasets to validate the effectiveness and superiority of our approach. Notably, our method achieves an impressive $88.7\%/94.4\%$ in mAP/Rank-1 on the DukeMTMC-reID dataset, surpassing the current state-of-the-art results. Our source code is available at https://github.com/KuilongCui/MDPR.
Submitted 12 January, 2024;
originally announced January 2024.
-
Uncovering the human motion pattern: Pattern Memory-based Diffusion Model for Trajectory Prediction
Authors:
Yuxin Yang,
Pengfei Zhu,
Mengshi Qi,
Huadong Ma
Abstract:
Human trajectory forecasting is a critical challenge in fields such as robotics and autonomous driving. Due to the inherent uncertainty of human actions and intentions in real-world scenarios, various unexpected occurrences may arise. To uncover latent motion patterns in human behavior, we introduce a novel memory-based method, named Motion Pattern Priors Memory Network. Our method involves constructing a memory bank derived from clustered prior knowledge of motion patterns observed in the training set trajectories. We introduce an addressing mechanism to retrieve the matched pattern and the potential target distributions for each prediction from the memory bank, which enables the identification and retrieval of natural motion patterns exhibited by agents, subsequently using the target priors memory token to guide the diffusion model to generate predictions. Extensive experiments validate the effectiveness of our approach, achieving state-of-the-art trajectory prediction accuracy. The code will be made publicly available.
Submitted 8 January, 2024; v1 submitted 5 January, 2024;
originally announced January 2024.
-
Multi-Stage Contrastive Regression for Action Quality Assessment
Authors:
Qi An,
Mengshi Qi,
Huadong Ma
Abstract:
In recent years, there has been growing interest in video-based action quality assessment (AQA). Most existing methods solve the AQA problem by considering the entire video, overlooking the inherent stage-level characteristics of actions. To address this issue, we design a novel Multi-stage Contrastive Regression (MCoRe) framework for the AQA task. This approach allows us to efficiently extract spatial-temporal information while simultaneously reducing computational costs by segmenting the input video into multiple stages or procedures. Inspired by graph contrastive learning, we propose a new stage-wise contrastive learning loss function to enhance performance. As a result, MCoRe achieves state-of-the-art results on the widely adopted fine-grained AQA dataset.
Submitted 5 January, 2024;
originally announced January 2024.
-
Efficient Cloud-edge Collaborative Inference for Object Re-identification
Authors:
Chuanming Wang,
Yuxin Yang,
Mengshi Qi,
Huadong Ma
Abstract:
Current object re-identification (ReID) systems follow the centralized processing paradigm, i.e., all computations are conducted in the cloud server and edge devices are only used to capture and send images. As the number of videos grows rapidly, this paradigm has become impractical due to finite computational resources. In such a scenario, the ReID system should be converted to fit the cloud-edge collaborative processing paradigm, which is crucial to boost the scalability and practicality of ReID systems. However, current relevant work lacks research on this issue, making it challenging for ReID methods to be adapted effectively. Therefore, we pioneer a cloud-edge collaborative inference framework for ReID systems and, in particular, propose a distribution-aware correlation modeling network (DaCM) that makes the desired image return to the cloud server as soon as possible by learning to model the spatial-temporal correlations among instances. DaCM embeds the spatial-temporal correlations implicitly included in the timestamps into a graph structure, and it can be applied in the cloud to regulate the size of the upload window and on the edge device to adjust the sequence of images. Traditional ReID methods can be combined with DaCM seamlessly, enabling their application within our proposed cloud-edge collaborative framework. Extensive experiments demonstrate that our method clearly reduces transmission overhead and significantly improves performance. We will release our code and model.
Submitted 3 January, 2024;
originally announced January 2024.
-
Rethinking the Instruction Quality: LIFT is What You Need
Authors:
Yang Xu,
Yongqiang Yao,
Yufan Huang,
Mengnan Qi,
Maoquan Wang,
Bin Gu,
Neel Sundaresan
Abstract:
Instruction tuning, a specialized technique to enhance large language model (LLM) performance via instruction datasets, relies heavily on the quality of employed data. Existing quality improvement methods alter instruction data through dataset expansion or curation. However, the expansion method risks data redundancy, potentially compromising LLM performance, while the curation approach confines the LLM's potential to the original dataset. Our aim is to surpass the original data quality without encountering these shortcomings. To achieve this, we propose LIFT (LLM Instruction Fusion Transfer), a novel and versatile paradigm designed to elevate the instruction quality to new heights. LIFT strategically broadens data distribution to encompass more high-quality subspaces and eliminates redundancy, concentrating on high-quality segments across overall data subspaces. Experimental results demonstrate that, even with a limited quantity of high-quality instruction data selected by our paradigm, LLMs not only consistently uphold robust performance across various tasks but also surpass some state-of-the-art results, highlighting the significant improvement in instruction quality achieved by our paradigm.
Submitted 27 December, 2023; v1 submitted 11 December, 2023;
originally announced December 2023.
-
VIoTGPT: Learning to Schedule Vision Tools towards Intelligent Video Internet of Things
Authors:
Yaoyao Zhong,
Mengshi Qi,
Rui Wang,
Yuhan Qiu,
Yang Zhang,
Huadong Ma
Abstract:
Video Internet of Things (VIoT) has shown great potential in collecting an unprecedented volume of video data. Learning to schedule perceiving models and analyzing the collected videos intelligently will be potential sparks for VIoT. In this paper, to address the challenges posed by the fine-grained and interrelated vision tool usage of VIoT, we build VIoTGPT, a framework based on LLMs that correctly interacts with humans, queries knowledge videos, and invokes vision models to accomplish complicated tasks. To support VIoTGPT and related future work, we meticulously crafted the training dataset and established benchmarks involving 11 representative vision models across three categories based on semi-automatic annotations. To guide the LLM to act as an intelligent agent for intelligent VIoT, we resort to ReAct instruction tuning based on the collected VIoT dataset to learn the tool capability. Quantitative and qualitative experimental results and analyses demonstrate the effectiveness of VIoTGPT.
Submitted 1 December, 2023;
originally announced December 2023.
-
Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning
Authors:
Changsheng Lv,
Shuai Zhang,
Yapeng Tian,
Mengshi Qi,
Huadong Ma
Abstract:
In this paper, we propose a Disentangled Counterfactual Learning (DCL) approach for physical audiovisual commonsense reasoning. The task aims to infer objects' physics commonsense based on both video and audio input, with the main challenge being how to imitate the reasoning ability of humans. Most current methods fail to take full advantage of the different characteristics of multi-modal data, and the lack of causal reasoning ability in models impedes progress in inferring implicit physical knowledge. To address these issues, our proposed DCL method decouples videos into static (time-invariant) and dynamic (time-varying) factors in the latent space by the disentangled sequential encoder, which adopts a variational autoencoder (VAE) to maximize the mutual information with a contrastive loss function. Furthermore, we introduce a counterfactual learning module to augment the model's reasoning ability by modeling physical knowledge relationships among different objects under counterfactual intervention. Our proposed method is a plug-and-play module that can be incorporated into any baseline. In experiments, we show that our proposed method improves baseline methods and achieves state-of-the-art performance. Our source code is available at https://github.com/Andy20178/DCL.
Submitted 1 November, 2023; v1 submitted 30 October, 2023;
originally announced October 2023.
-
SUT: Active Defects Probing for Transcompiler Models
Authors:
Mengnan Qi,
Yufan Huang,
Maoquan Wang,
Yongqiang Yao,
Zihan Liu,
Bin Gu,
Colin Clement,
Neel Sundaresan
Abstract:
Automatic program translation has enormous application value and hence has been attracting significant interest from AI researchers. However, we observe that current program translation models still make elementary syntax errors, particularly when the target language does not have syntax elements in the source language. Metrics like BLEU, CodeBLEU, and computation accuracy may not expose these issues. In this paper, we introduce new metrics for programming language translation that address these basic syntax errors. We develop a novel active defects probing suite called Syntactic Unit Tests (SUT), which includes a highly interpretable evaluation harness for accuracy and test scoring. Experiments have shown that even powerful models like ChatGPT still make mistakes on these basic unit tests. Specifically, compared to its pass rate on previous program translation evaluation datasets, its pass rate on our unit tests decreases by 26.15%. Further, our evaluation harness reveals the syntactic elements on which these models exhibit deficiencies.
Submitted 22 October, 2023;
originally announced October 2023.
-
Program Translation via Code Distillation
Authors:
Yufan Huang,
Mengnan Qi,
Yongqiang Yao,
Maoquan Wang,
Bin Gu,
Colin Clement,
Neel Sundaresan
Abstract:
Software version migration and program translation are an important and costly part of the lifecycle of large codebases. Traditional machine translation relies on parallel corpora for supervised translation, which is not feasible for program translation due to a dearth of aligned data. Recent unsupervised neural machine translation techniques have overcome data limitations by including techniques such as back translation and low-level compiler intermediate representations (IR). These methods face significant challenges due to the noise in code snippet alignment and the diversity of IRs, respectively. In this paper, we propose a novel model called Code Distillation (CoDist), whereby we capture the semantic and structural equivalence of code in a language-agnostic intermediate representation. Distilled code serves as a translation pivot for any programming language, leading by construction to parallel corpora which scale to all available source code by simply applying the distillation compiler. We demonstrate that our approach achieves state-of-the-art performance on the CodeXGLUE and TransCoder GeeksforGeeks translation benchmarks, with an average absolute increase of 12.7% on the TransCoder GeeksforGeeks translation benchmark compared to TransCoder-ST.
Submitted 17 October, 2023;
originally announced October 2023.
-
YOLO-based Semantic Communication with Generative AI-aided Resource Allocation for Digital Twins Construction
Authors:
Baoxia Du,
Hongyang Du,
Haifeng Liu,
Dusit Niyato,
Peng Xin,
Jun Yu,
Mingyang Qi,
You Tang
Abstract:
Digital Twins play a crucial role in bridging the physical and virtual worlds. Given the dynamic and evolving characteristics of the physical world, a huge volume of data transmission and exchange is necessary to attain synchronized updates in the virtual world. In this paper, we propose a semantic communication framework based on You Only Look Once (YOLO) to construct a virtual apple orchard with the aim of mitigating the costs associated with data transmission. Specifically, we first employ the YOLOv7-X object detector to extract semantic information from images captured by edge devices, thereby reducing the volume of transmitted data and saving transmission costs. Afterwards, we quantify the importance of each piece of semantic information by the confidence generated by the object detector. Based on this, we propose two resource allocation schemes, i.e., the confidence-based scheme and the artificial intelligence-generated scheme, aimed at enhancing the transmission quality of important semantic information. The proposed diffusion model generates an optimal allocation scheme that outperforms both the average allocation scheme and the confidence-based allocation scheme. Moreover, to obtain semantic information more effectively, we enhance the detection capability of the YOLOv7-X object detector by introducing new Efficient Layer Aggregation Network-HorNet (ELAN-H) and SimAM attention modules, while reducing the model parameters and computational complexity, making it easier to run on edge devices with limited performance. The numerical results indicate that our proposed semantic communication framework and resource allocation schemes significantly reduce transmission costs while enhancing the transmission quality of important information in communication services.
Submitted 25 June, 2023;
originally announced June 2023.
-
RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion
Authors:
Haowen Wang,
Zhengping Che,
Yufan Yang,
Mingyuan Wang,
Zhiyuan Xu,
Xiuquan Qiao,
Mengshi Qi,
Feifei Feng,
Jian Tang
Abstract:
Raw depth images captured in indoor scenarios frequently exhibit extensive missing values due to the inherent limitations of the sensors and environments. For example, transparent materials frequently elude detection by depth sensors; surfaces may introduce measurement inaccuracies due to their polished textures, extended distances, and oblique incidence angles from the sensor. The presence of incomplete depth maps imposes significant challenges for subsequent vision applications, prompting the development of numerous depth completion techniques to mitigate this problem. Numerous methods excel at reconstructing dense depth maps from sparse samples, but they often falter when faced with extensive contiguous regions of missing depth values, a prevalent and critical challenge in indoor environments. To overcome these challenges, we design a novel two-branch end-to-end fusion network named RDFC-GAN, which takes a pair of RGB and incomplete depth images as input to predict a dense and completed depth map. The first branch employs an encoder-decoder structure, by adhering to the Manhattan world assumption and utilizing normal maps from RGB-D information as guidance, to regress the local dense depth values from the raw depth map. The other branch applies an RGB-depth fusion CycleGAN, adept at translating RGB imagery into detailed, textured depth maps while ensuring high fidelity through cycle consistency. We fuse the two branches via adaptive fusion modules named W-AdaIN and train the model with the help of pseudo depth maps. Comprehensive evaluations on NYU-Depth V2 and SUN RGB-D datasets show that our method significantly enhances depth completion performance particularly in realistic indoor settings.
Submitted 11 April, 2024; v1 submitted 6 June, 2023;
originally announced June 2023.
-
Weakly-Supervised Temporal Action Localization by Inferring Salient Snippet-Feature
Authors:
Wulian Yun,
Mengshi Qi,
Chuanming Wang,
Huadong Ma
Abstract:
Weakly-supervised temporal action localization aims to locate action regions and identify action categories in untrimmed videos simultaneously by taking only video-level labels as the supervision. Pseudo label generation is a promising strategy to solve the challenging problem, but the current methods ignore the natural temporal structure of the video that can provide rich information to assist such a generation process. In this paper, we propose a novel weakly-supervised temporal action localization method by inferring salient snippet-feature. First, we design a saliency inference module that exploits the variation relationship between temporal neighbor snippets to discover salient snippet-features, which can reflect the significant dynamic change in the video. Secondly, we introduce a boundary refinement module that enhances salient snippet-features through the information interaction unit. Then, a discrimination enhancement module is introduced to enhance the discriminative nature of snippet-features. Finally, we adopt the refined snippet-features to produce high-fidelity pseudo labels, which could be used to supervise the training of the action localization network. Extensive experiments on two publicly available datasets, i.e., THUMOS14 and ActivityNet v1.3, demonstrate our proposed method achieves significant improvements compared to the state-of-the-art methods.
Submitted 24 December, 2023; v1 submitted 22 March, 2023;
originally announced March 2023.
-
SGFormer: Semantic Graph Transformer for Point Cloud-based 3D Scene Graph Generation
Authors:
Changsheng Lv,
Mengshi Qi,
Xia Li,
Zhengyuan Yang,
Huadong Ma
Abstract:
In this paper, we propose a novel model called SGFormer, Semantic Graph TransFormer for point cloud-based 3D scene graph generation. The task aims to parse a point cloud-based scene into a semantic structural graph, with the core challenge of modeling the complex global structure. Existing methods based on graph convolutional networks (GCNs) suffer from the over-smoothing dilemma and can only propagate information from limited neighboring nodes. In contrast, SGFormer uses Transformer layers as the base building block to allow global information passing, with two types of newly designed layers tailored for the 3D scene graph generation task. Specifically, we introduce the graph embedding layer to best utilize the global information in graph edges while maintaining comparable computation costs. Furthermore, we propose the semantic injection layer to leverage linguistic knowledge from a large-scale language model (i.e., ChatGPT) to enhance objects' visual features. We benchmark our SGFormer on the established 3DSSG dataset and achieve a 40.94% absolute improvement in relationship prediction's R@50 and an 88.36% boost on the subset with complex scenes over the state-of-the-art. Our analyses further show SGFormer's superiority in the long-tail and zero-shot scenarios. Our source code is available at https://github.com/Andy20178/SGFormer.
Submitted 20 December, 2023; v1 submitted 20 March, 2023;
originally announced March 2023.
-
Unsupervised Self-Driving Attention Prediction via Uncertainty Mining and Knowledge Embedding
Authors:
Pengfei Zhu,
Mengshi Qi,
Xia Li,
Weijian Li,
Huadong Ma
Abstract:
Predicting attention regions of interest is an important yet challenging task for self-driving systems. Existing methodologies rely on large-scale labeled traffic datasets that are labor-intensive to obtain. Besides, the huge domain gap between natural scenes and traffic scenes in current datasets also limits the potential for model training. To address these challenges, we are the first to introduce an unsupervised way to predict self-driving attention by uncertainty modeling and driving knowledge integration. Our approach's Uncertainty Mining Branch (UMB) discovers commonalities and differences from multiple generated pseudo-labels achieved from models pre-trained on natural scenes by actively measuring the uncertainty. Meanwhile, our Knowledge Embedding Block (KEB) bridges the domain gap by incorporating driving knowledge to adaptively refine the generated pseudo-labels. Quantitative and qualitative results with equivalent or even more impressive performance compared to fully-supervised state-of-the-art approaches across all three public datasets demonstrate the effectiveness of the proposed method and the potential of this direction. The code will be made publicly available.
Submitted 15 July, 2023; v1 submitted 16 March, 2023;
originally announced March 2023.
-
The non-overlapping statistical approximation to overlapping group lasso
Authors:
Mingyu Qi,
Tianxi Li
Abstract:
Group lasso is a commonly used regularization method in statistical learning in which parameters are eliminated from the model according to predefined groups. However, when the groups overlap, optimizing the group lasso penalized objective can be time-consuming on large-scale problems because of the non-separability induced by the overlapping groups. This bottleneck has seriously limited the application of overlapping group lasso regularization in many modern problems, such as gene pathway selection and graphical model estimation. In this paper, we propose a separable penalty as an approximation of the overlapping group lasso penalty. Thanks to the separability, the computation of regularization based on our penalty is substantially faster than that of the overlapping group lasso, especially for large-scale and high-dimensional problems. We show that the penalty is the tightest separable relaxation of the overlapping group lasso norm within the family of $\ell_{q_1}/\ell_{q_2}$ norms. Moreover, we show that the estimator based on the proposed separable penalty is statistically equivalent to the one based on the overlapping group lasso penalty with respect to their error bounds and the rate-optimal performance under the squared loss. We demonstrate the faster computational time and statistical equivalence of our method compared with the overlapping group lasso in simulation examples and a classification problem of cancer tumors based on gene expression and multiple gene pathways.
Submitted 20 February, 2024; v1 submitted 16 November, 2022;
originally announced November 2022.
-
Multi-scale frequency separation network for image deblurring
Authors:
Yanni Zhang,
Qiang Li,
Miao Qi,
Di Liu,
Jun Kong,
Jianzhong Wang
Abstract:
Image deblurring aims to restore the detailed texture information or structures from blurry images, and it has become an indispensable step in many computer vision tasks. Although various methods have been proposed to deal with the image deblurring problem, most of them treat the blurry image as a whole and neglect the characteristics of different image frequencies. In this paper, we present a new method called multi-scale frequency separation network (MSFS-Net) for image deblurring. MSFS-Net introduces the frequency separation module (FSM) into an encoder-decoder network architecture to capture the low- and high-frequency information of the image at multiple scales. Then, a cycle-consistency strategy and a contrastive learning module (CLM) are designed to retain the low-frequency information and recover the high-frequency information, respectively, during deblurring. Finally, the features of different scales are fused by a cross-scale feature fusion module (CSFFM). Extensive experiments on benchmark datasets show that the proposed network achieves state-of-the-art performance.
Submitted 8 December, 2022; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Odd coloring of two subclasses of planar graphs
Authors:
Mengke Qi,
Xin Zhang
Abstract:
A proper coloring of a graph is odd if every non-isolated vertex has some color that appears an odd number of times on its neighborhood. Petruševski and Škrekovski conjectured in 2021 that every planar graph admits an odd $5$-coloring. We confirm this conjecture for outer-1-planar graphs and 2-boundary planar graphs, which are two subclasses of planar graphs.
Submitted 19 May, 2022;
originally announced May 2022.
-
Coarse-to-Fine Video Denoising with Dual-Stage Spatial-Channel Transformer
Authors:
Wulian Yun,
Mengshi Qi,
Chuanming Wang,
Huiyuan Fu,
Huadong Ma
Abstract:
Video denoising aims to recover high-quality frames from noisy video. Most existing approaches adopt convolutional neural networks (CNNs) to separate the noise from the original visual content; however, CNNs focus on local information and ignore the interactions between long-range regions in the frame. Furthermore, most related works directly take the output after basic spatio-temporal denoising as the final result, thereby neglecting the fine-grained denoising process. In this paper, we propose a Dual-stage Spatial-Channel Transformer (DSCT) for coarse-to-fine video denoising, which inherits the advantages of both Transformers and CNNs. Specifically, DSCT is based on a progressive dual-stage architecture, namely a coarse-level and a fine-level stage, to extract dynamic features and static features, respectively. At both stages, a Spatial-Channel Encoding Module is designed to model the long-range contextual dependencies at both spatial and channel levels. Meanwhile, we design a Multi-Scale Residual Structure to preserve multiple aspects of information at different stages, which contains a Temporal Features Aggregation Module to summarize the dynamic representation. Extensive experiments on four publicly available datasets demonstrate that our proposed method achieves significant improvements compared to the state-of-the-art methods.
Submitted 16 January, 2023; v1 submitted 30 April, 2022;
originally announced May 2022.
-
Completing Networks by Learning Local Connection Patterns
Authors:
Zhang Zhang,
Ruyi Tao,
Yongzai Tao,
Mingze Qi,
Jiang Zhang
Abstract:
Network completion is a harder problem than link prediction because it tries to infer not only missing links but also missing nodes. Different methods have been proposed to solve this problem, but few of them employ structural information, namely the similarity of local connection patterns. In this paper, we propose a model named C-GIN to capture the local structural patterns from the observed part of a network, based on the Graph Auto-Encoder framework equipped with the Graph Isomorphism Network model, and to generalize these patterns to complete the whole graph. Experiments and analysis on synthetic and real-world networks from different domains show that C-GIN achieves competitive performance with less information and, in most cases, higher accuracy than baseline prediction models. We further propose a metric, "Reachable Clustering Coefficient (CC)", based on network structure, and experiments show that our model performs better on networks with higher Reachable CC.
Submitted 7 August, 2022; v1 submitted 25 April, 2022;
originally announced April 2022.
-
Self-Supervised Light Field Depth Estimation Using Epipolar Plane Images
Authors:
Kunyuan Li,
Jun Zhang,
Jun Gao,
Meibin Qi
Abstract:
Exploiting light field data makes it possible to obtain dense and accurate depth maps. However, synthetic scenes with a limited disparity range cannot capture the diversity of real scenes, so current learning-based methods trained on synthetic data do not perform well in real scenes. In this paper, we propose a self-supervised learning framework for light field depth estimation. Unlike existing end-to-end training methods that use a disparity label per pixel, our approach trains the network by estimating the EPI disparity shift after refocusing, which extends the disparity range of epipolar lines. To reduce the sensitivity of EPIs to noise, we propose a new input mode called EPI-Stack, which stacks EPIs in the view dimension. This method is less sensitive to noisy scenes than the traditional input mode and improves the efficiency of estimation. Compared with other state-of-the-art methods, the proposed method can also obtain higher-quality results in real-world scenarios, especially in regions with complex occlusion and depth discontinuity.
Submitted 28 March, 2022;
originally announced March 2022.
-
RGB-Depth Fusion GAN for Indoor Depth Completion
Authors:
Haowen Wang,
Mingyuan Wang,
Zhengping Che,
Zhiyuan Xu,
Xiuquan Qiao,
Mengshi Qi,
Feifei Feng,
Jian Tang
Abstract:
The raw depth image captured by the indoor depth sensor usually has an extensive range of missing depth values due to inherent limitations such as the inability to perceive transparent objects and limited distance range. The incomplete depth map burdens many downstream vision tasks, and a rising number of depth completion methods have been proposed to alleviate this issue. While most existing methods can generate accurate dense depth maps from sparse and uniformly sampled depth maps, they are not suitable for complementing the large contiguous regions of missing depth values, which is common and critical. In this paper, we design a novel two-branch end-to-end fusion network, which takes a pair of RGB and incomplete depth images as input to predict a dense and completed depth map. The first branch employs an encoder-decoder structure to regress the local dense depth values from the raw depth map, with the help of local guidance information extracted from the RGB image. In the other branch, we propose an RGB-depth fusion GAN to transfer the RGB image to the fine-grained textured depth map. We adopt adaptive fusion modules named W-AdaIN to propagate the features across the two branches, and we append a confidence fusion head to fuse the two outputs of the branches for the final depth map. Extensive experiments on NYU-Depth V2 and SUN RGB-D demonstrate that our proposed method clearly improves the depth completion performance, especially in a more realistic setting of indoor environments with the help of the pseudo depth map.
Submitted 21 March, 2022;
originally announced March 2022.
-
Device-system Co-design of Photonic Neuromorphic Processor using Reinforcement Learning
Authors:
Yingheng Tang,
Princess Tara Zamani,
Ruiyang Chen,
Jianzhu Ma,
Minghao Qi,
Cunxi Yu,
Weilu Gao
Abstract:
The incorporation of high-performance optoelectronic devices into photonic neuromorphic processors can substantially accelerate computationally intensive operations in machine learning (ML) algorithms. However, conventional device design wisdom is disconnected from system optimization. We report a device-system co-design methodology to optimize a free-space optical general matrix multiplication (GEMM) hardware accelerator by engineering a spatially reconfigurable array made from chalcogenide phase change materials. With a highly parallelized hardware emulator constructed based on experimental information, we demonstrate the design of the unit device by optimizing GEMM calculation accuracy via reinforcement learning, including a deep Q-learning neural network, Bayesian optimization, and their cascaded approach, which show a clear correlation between system performance metrics and physical device specifications. Furthermore, we employ physics-aware training approaches to deploy optimized hardware to the tasks of image classification, materials discovery, and a closed-loop design of optical ML accelerators. The demonstrated framework offers insights into the co-design of optoelectronic devices and systems with reduced human-supervision and domain-knowledge barriers.
Submitted 9 March, 2022;
originally announced March 2022.
-
Conflict-free incidence coloring of outer-1-planar graphs
Authors:
Mengke Qi,
Xin Zhang
Abstract:
An incidence of a graph $G$ is a vertex-edge pair $(v,e)$ such that $v$ is incident with $e$. A conflict-free incidence coloring of a graph is a coloring of the incidences in such a way that two incidences $(u,e)$ and $(v,f)$ get distinct colors if and only if they conflict with each other, i.e., (i) $u=v$, (ii) $uv$ is $e$ or $f$, or (iii) there is a vertex $w$ such that $uw=e$ and $vw=f$. The minimum number of colors used among all conflict-free incidence colorings of a graph is the conflict-free incidence chromatic number. A graph is outer-1-planar if it can be drawn in the plane so that vertices are on the outer-boundary and each edge is crossed at most once. In this paper, we show that the conflict-free incidence chromatic number of an outer-1-planar graph with maximum degree $Δ$ is either $2Δ$ or $2Δ+1$ unless the graph is a cycle on three vertices, and moreover, all outer-1-planar graphs with conflict-free incidence chromatic number $2Δ$ or $2Δ+1$ are completely characterized. An efficient algorithm for constructing an optimal conflict-free incidence coloring of a connected outer-1-planar graph is given.
Submitted 9 October, 2022; v1 submitted 8 February, 2022;
originally announced February 2022.
-
Reinforcement learning for multi-item retrieval in the puzzle-based storage system
Authors:
Jing He,
Xinglu Liu,
Qiyao Duan,
Wai Kin Victor Chan,
Mingyao Qi
Abstract:
Fast delivery services have created the need for high-density warehouses. The puzzle-based storage (PBS) system is a practical way to enhance storage density, but it faces difficulties in the retrieval process. In this work, a deep reinforcement learning algorithm, specifically the Double&Dueling Deep Q Network, is developed to solve the multi-item retrieval problem in the system with general settings, where multiple desired items, escorts, and I/O points are placed randomly. Additionally, we propose a general compact integer programming model to evaluate the solution quality. Extensive numerical experiments demonstrate that the reinforcement learning approach can yield high-quality solutions and outperforms three related state-of-the-art heuristic algorithms. Furthermore, a conversion algorithm and a decomposition framework are proposed to handle simultaneous movement and large-scale instances, respectively, thus improving the applicability of the PBS system.
Submitted 5 February, 2022;
originally announced February 2022.
-
ISP-Agnostic Image Reconstruction for Under-Display Cameras
Authors:
Miao Qi,
Yuqi Li,
Wolfgang Heidrich
Abstract:
Under-display cameras have been proposed in recent years as a way to reduce the form factor of mobile devices while maximizing the screen area. Unfortunately, placing the camera behind the screen results in significant image distortions, including loss of contrast, blur, noise, color shift, scattering artifacts, and reduced light sensitivity. In this paper, we propose an image-restoration pipeline that is ISP-agnostic, i.e. it can be combined with any legacy ISP to produce a final image that matches the appearance of regular cameras using the same ISP. This is achieved with a deep learning approach that performs a RAW-to-RAW image restoration. To obtain large quantities of real under-display camera training data with sufficient contrast and scene diversity, we furthermore develop a data capture method utilizing an HDR monitor, as well as a data augmentation method to generate suitable HDR content. The monitor data is supplemented with real-world data that has less scene diversity but allows us to achieve fine detail recovery without being limited by the monitor resolution. Together, this approach successfully restores color and contrast as well as image detail.
Submitted 2 November, 2021;
originally announced November 2021.
-
Shape and Reflectance Reconstruction in Uncontrolled Environments by Differentiable Rendering
Authors:
Rui Li,
Guangmin Zang,
Miao Qi,
Wolfgang Heidrich
Abstract:
Simultaneous reconstruction of geometry and reflectance properties in uncontrolled environments remains a challenging problem. In this paper, we propose an efficient method to reconstruct the scene's 3D geometry and reflectance from multi-view photography using conventional hand-held cameras. Our method automatically builds a virtual scene in a differentiable rendering system that roughly matches the real world's scene parameters, optimized by minimizing photometric objectives alternatingly and stochastically. With the optimal scene parameters evaluated, photo-realistic novel views for various viewing angles and distances can then be generated by our approach. We present the results of captured scenes with complex geometry and various reflection types. Our method also shows superior performance compared to state-of-the-art alternatives in novel view synthesis visually and quantitatively.
Submitted 28 February, 2022; v1 submitted 25 October, 2021;
originally announced October 2021.
-
Integrated Conditional Estimation-Optimization
Authors:
Meng Qi,
Paul Grigas,
Zuo-Jun Max Shen
Abstract:
Many real-world optimization problems involve uncertain parameters with probability distributions that can be estimated using contextual feature information. In contrast to the standard approach of first estimating the distribution of uncertain parameters and then optimizing the objective based on the estimation, we propose an integrated conditional estimation-optimization (ICEO) framework that estimates the underlying conditional distribution of the random parameter while considering the structure of the optimization problem. We directly model the relationship between the conditional distribution of the random parameter and the contextual features, and then estimate the probabilistic model with an objective that aligns with the downstream optimization problem. We show that our ICEO approach is asymptotically consistent under moderate regularity conditions and further provide finite performance guarantees in the form of generalization bounds. Computationally, performing estimation with the ICEO approach is a non-convex and often non-differentiable optimization problem. We propose a general methodology for approximating the potentially non-differentiable mapping from estimated conditional distribution to the optimal decision by a differentiable function, which greatly improves the performance of gradient-based algorithms applied to the non-convex problem. We also provide a polynomial optimization solution approach in the semi-algebraic case. Numerical experiments are also conducted to show the empirical success of our approach in different situations including with limited data samples and model mismatches.
Submitted 1 August, 2023; v1 submitted 24 October, 2021;
originally announced October 2021.
-
Exploring Uncertainty in Deep Learning for Construction of Prediction Intervals
Authors:
Yuandu Lai,
Yucheng Shi,
Yahong Han,
Yunfeng Shao,
Meiyu Qi,
Bingshuai Li
Abstract:
Deep learning has achieved impressive performance on many tasks in recent years. However, it has been found that it is still not enough for deep neural networks to provide only point estimates. For high-risk tasks, we need to assess the reliability of the model predictions. This requires us to quantify the uncertainty of model predictions and construct prediction intervals. In this paper, we explore the uncertainty in deep learning to construct prediction intervals. We comprehensively consider two categories of uncertainty: aleatory uncertainty and epistemic uncertainty. We design a special loss function that enables us to learn uncertainty without uncertainty labels; we only need to supervise the learning of the regression task. The aleatory uncertainty is learned implicitly from the loss function, and the epistemic uncertainty is accounted for in ensembled form. Our method correlates the construction of prediction intervals with the uncertainty estimation. Impressive results on some publicly available datasets show that the performance of our method is competitive with other state-of-the-art methods.
Submitted 26 April, 2021;
originally announced April 2021.
-
Smart Feasibility Pump: Reinforcement Learning for (Mixed) Integer Programming
Authors:
Meng Qi,
Mengxin Wang,
Zuo-Jun Shen
Abstract:
In this work, we propose a deep reinforcement learning (DRL) model for finding a feasible solution for (mixed) integer programming (MIP) problems. Finding a feasible solution for MIP problems is critical because many successful heuristics rely on a known initial feasible solution. However, it is in general NP-hard. Inspired by the feasibility pump (FP), a well-known heuristic for searching feasible MIP solutions, we develop a smart feasibility pump (SFP) method using DRL. In addition to a multi-layer perceptron (MLP), we propose a novel convolutional neural network (CNN) structure for the policy network to capture the hidden information of the constraint matrix of the MIP problem. Numerical experiments on various problem instances show that SFP significantly outperforms the classic FP in terms of the number of steps required to reach the first feasible solution. Moreover, the CNN structure works without the projection of the current solution as the input, which saves the computational effort at each step of the FP algorithms to find projections. This highlights the representational power of the CNN structure.
Submitted 16 July, 2021; v1 submitted 18 February, 2021;
originally announced February 2021.
-
Image deblurring based on lightweight multi-information fusion network
Authors:
Yanni Zhang,
Yiming Liu,
Qiang Li,
Miao Qi,
Dahong Xu,
Jun Kong,
Jianzhong Wang
Abstract:
Recently, deep learning based image deblurring has been well developed. However, exploiting the detailed image features in a deep learning framework always requires a mass of parameters, which inevitably makes the network suffer from a high computational burden. To solve this problem, we propose a lightweight multi-information fusion network (LMFN) for image deblurring. The proposed LMFN is designed as an encoder-decoder architecture. In the encoding stage, the image feature is reduced to various small-scale spaces for multi-scale information extraction and fusion without a large amount of information loss. Then, a distillation network is used in the decoding stage, which allows the network to benefit the most from residual learning while remaining sufficiently lightweight. Meanwhile, an information fusion strategy between distillation modules and feature channels is also carried out by an attention mechanism. Through fusing different information in the proposed approach, our network can achieve state-of-the-art image deblurring results with a smaller number of parameters and outperforms existing methods in model complexity.
Submitted 13 January, 2021;
originally announced January 2021.
-
Unsupervised Domain Adaptation with Temporal-Consistent Self-Training for 3D Hand-Object Joint Reconstruction
Authors:
Mengshi Qi,
Edoardo Remelli,
Mathieu Salzmann,
Pascal Fua
Abstract:
Deep learning-solutions for hand-object 3D pose and shape estimation are now very effective when an annotated dataset is available to train them to handle the scenarios and lighting conditions they will encounter at test time. Unfortunately, this is not always the case, and one often has to resort to training them on synthetic data, which does not guarantee that they will work well in real situations. In this paper, we introduce an effective approach to addressing this challenge by exploiting 3D geometric constraints within a cycle generative adversarial network (CycleGAN) to perform domain adaptation. Furthermore, in contrast to most existing works, which fail to leverage the rich temporal information available in unlabeled real videos as a source of supervision, we propose to enforce short- and long-term temporal consistency to fine-tune the domain-adapted model in a self-supervised fashion. We will demonstrate that our approach outperforms state-of-the-art 3D hand-object joint reconstruction methods on three widely-used benchmarks and will make our code publicly available.
Submitted 21 December, 2020;
originally announced December 2020.
-
Envisioning Device-to-Device Communications in 6G
Authors:
Shangwei Zhang,
Jiajia Liu,
Hongzhi Guo,
Mingping Qi,
Nei Kato
Abstract:
To fulfill the requirements of various emerging applications, the future sixth generation (6G) mobile network is expected to be an innately intelligent, highly dynamic, ultradense heterogeneous network that interconnects all things with extremely low-latency and high-speed data transmission. It is believed that artificial intelligence (AI) will be the most innovative technique that can achieve intelligent automated network operations, management and maintenance in future complex 6G networks. Driven by AI techniques, device-to-device (D2D) communication will be one of the pieces of the 6G jigsaw puzzle. To construct an efficient implementation of intelligent D2D in future 6G, we outline a number of potential D2D solutions associated with 6G in terms of mobile edge computing, network slicing, and non-orthogonal multiple access (NOMA) cognitive networking.
Submitted 11 December, 2019;
originally announced December 2019.
-
Making Study Populations Visible through Knowledge Graphs
Authors:
Shruthi Chari,
Miao Qi,
Nkcheniyere N. Agu,
Oshani Seneviratne,
James P. McCusker,
Kristin P. Bennett,
Amar K. Das,
Deborah L. McGuinness
Abstract:
Treatment recommendations within Clinical Practice Guidelines (CPGs) are largely based on findings from clinical trials and case studies, referred to here as research studies, that are often based on highly selective clinical populations, referred to here as study cohorts. When medical practitioners apply CPG recommendations, they need to understand how well their patient population matches the characteristics of those in the study cohort, and thus are confronted with the challenges of locating the study cohort information and making an analytic comparison. To address these challenges, we develop an ontology-enabled prototype system, which exposes the population descriptions in research studies in a declarative manner, with the ultimate goal of allowing medical practitioners to better understand the applicability and generalizability of treatment recommendations. We build a Study Cohort Ontology (SCO) to encode the vocabulary of study population descriptions, that are often reported in the first table in the published work, thus they are often referred to as Table 1. We leverage the well-used Semanticscience Integrated Ontology (SIO) for defining property associations between classes. Further, we model the key components of Table 1s, i.e., collections of study subjects, subject characteristics, and statistical measures in RDF knowledge graphs. We design scenarios for medical practitioners to perform population analysis, and generate cohort similarity visualizations to determine the applicability of a study population to the clinical population of interest. Our semantic approach to make study populations visible, by standardized representations of Table 1s, allows users to quickly derive clinically relevant inferences about study populations.
Submitted 9 July, 2019;
originally announced July 2019.
-
New Kloosterman sum identities from the Helleseth-Zinoviev result on $ Z_{4}$-linear Goethals codes
Authors:
Minglong Qi,
Shengwu Xiong
Abstract:
In the paper of Tor Helleseth and Victor Zinoviev (Designs, Codes and Cryptography, \textbf{17}, 269-288(1999)), the number of solutions of the system of equations from $ Z_{4} $-linear Goethals codes $ G_{4} $ was determined and stated in Theorem 4. We found that Theorem 4 is wrong for $ m $ even. In this note, we complete Theorem 4, and present a series of new Kloosterman sum identities deduced from Theorem 4. Moreover, we show that several previously established formulas on the Kloosterman sum identities can be rediscovered from Theorem 4 with much simpler proofs.
Submitted 1 April, 2019;
originally announced April 2019.
-
A partial knowledge of friends of friends speeds social search
Authors:
Amr Elsisy,
Boleslaw K. Szymanski,
Jasmine A. Plum,
Miao Qi,
Alex Pentland
Abstract:
Milgram empirically showed that people knowing only connections to their friends could locate any person in the U.S. in a few steps. Later research showed that social network topology enables a node aware of its full routing to find an arbitrary target in even fewer steps. Yet, the success of people in forwarding efficiently knowing only personal connections is still not fully explained. To study this problem, we emulate it on a real location-based social network, Gowalla. It provides explicit information about friends and temporal locations of each user useful for studies of human mobility. Here, we use it to conduct a massive computational experiment to establish new necessary and sufficient conditions for achieving social search efficiency. The results demonstrate that only the distribution of friendship edges and the partial knowledge of friends of friends are essential and sufficient for the efficiency of social search. Surprisingly, the efficiency of the search using the original distribution of friendship edges is not dependent on how the nodes are distributed into space. Moreover, the effect of using a limited knowledge that each node possesses about friends of its friends is strongly nonlinear. We show that gains of such use grow statistically significantly only when this knowledge is limited to a small fraction of friends of friends.
Submitted 20 August, 2021; v1 submitted 13 April, 2019;
originally announced April 2019.
-
Two classes of linear codes with a few weights based on twisted Kloosterman sums
Authors:
Minglong Qi,
Shengwu Xiong
Abstract:
Linear codes with a few weights have wide applications in information security, data storage systems, consumer electronics and communication systems. The construction of linear codes with a few weights and the determination of their parameters are important research topics in coding theory. In this paper, we construct two classes of linear codes with a few weights and determine their complete weight enumerators based on twisted Kloosterman sums.
Submitted 17 January, 2019;
originally announced January 2019.
-
Attentive Relational Networks for Mapping Images to Scene Graphs
Authors:
Mengshi Qi,
Weijian Li,
Zhengyuan Yang,
Yunhong Wang,
Jiebo Luo
Abstract:
Scene graph generation refers to the task of automatically mapping an image into a semantic structural graph, which requires correctly labeling each extracted object and their interaction relationships. Despite the recent success in object detection using deep learning techniques, inferring complex contextual relationships and structured graph representations from visual data remains a challenging topic. In this study, we propose a novel Attentive Relational Network that consists of two key modules with an object detection backbone to approach this problem. The first module is a semantic transformation module utilized to capture semantic embedded relation features, by translating visual features and linguistic features into a common semantic space. The other module is a graph self-attention module introduced to embed a joint graph representation through assigning various importance weights to neighboring nodes. Finally, accurate scene graphs are produced by the relation inference module to recognize all entities and the corresponding relations. We evaluate our proposed method on the widely-adopted Visual Genome Dataset, and the results demonstrate the effectiveness and superiority of our model.
Submitted 6 April, 2019; v1 submitted 26 November, 2018;
originally announced November 2018.
-
Sequence-based Person Attribute Recognition with Joint CTC-Attention Model
Authors:
Hao Liu,
Jingjing Wu,
Jianguo Jiang,
Meibin Qi,
Bo Ren
Abstract:
Attribute recognition has become crucial because of its wide applications in many computer vision tasks, such as person re-identification. As in many object recognition problems, variations in viewpoint and illumination, and recognition at far distance, all make this task challenging. In this work, we propose a joint CTC-Attention model (JCM), which maps attribute labels into sequences to learn the semantic relationship among attributes. In addition, this network uses a neural network to encode images into sequences, and employs the connectionist temporal classification (CTC) loss to train the network with the aim of improving its encoding performance. At the same time, it adopts the attention model to decode the sequences, which aligns the sequences and better learns the semantic information from attributes. Extensive experiments on three public datasets, i.e., the Market-1501 attribute dataset, the Duke attribute dataset and the PETA dataset, demonstrate the effectiveness of the proposed method.
Submitted 27 November, 2018; v1 submitted 20 November, 2018;
originally announced November 2018.
-
On a Theorem of Kyureghyan and Pott
Authors:
Minglong Qi,
Shenwu Xiong
Abstract:
In the paper of Gohar M. Kyureghyan and Alexander Pott (Designs, Codes and Cryptography, 29, 149-164, 2003), the linear feedback polynomials of the Sidel'nikov-Lempel-Cohn-Eastman sequences were determined for some special cases. When referring to that paper, we found that Corollary 4 and Theorem 2 of that paper are wrong because there exist many counterexamples for these two results. In this note, we give some counterexamples of Corollary 4 and Theorem 2 of that paper.
Submitted 5 October, 2018;
originally announced October 2018.
-
On the complete weight enumerators of some linear codes with a few weights
Authors:
Minglong Qi,
Shengwu Xiong,
Jingling Yuan,
Wenbi Rao,
Luo Zhong
Abstract:
Linear codes with a few weights have important applications in authentication codes, secret sharing, consumer electronics, etc. The determination of parameters such as the Hamming weight distributions and complete weight enumerators of linear codes is an important research topic. In this paper, we consider some classes of linear codes with a few weights and determine the complete weight enumerators, from which the corresponding Hamming weight distributions are derived with the help of some sums involving the Legendre symbol.
Submitted 13 November, 2017; v1 submitted 1 November, 2017;
originally announced November 2017.
-
On Some Exponential Sums Related to the Coulter's Polynomial
Authors:
Minglong Qi,
Shengwu Xiong,
Jingling Yuan,
Wenbi Rao,
Luo Zhong
Abstract:
In this paper, the formulas of some exponential sums over finite field, related to the Coulter's polynomial, are settled based on the Coulter's theorems on Weil sums, which may have potential application in the construction of linear codes with few weights.
Submitted 30 July, 2017;
originally announced July 2017.
-
Neural Person Search Machines
Authors:
Hao Liu,
Jiashi Feng,
Zequn Jie,
Karlekar Jayashree,
Bo Zhao,
Meibin Qi,
Jianguo Jiang,
Shuicheng Yan
Abstract:
We investigate the problem of person search in the wild in this work. Instead of comparing the query against all candidate regions generated in a query-blind manner, we propose to recursively shrink the search area from the whole image until achieving precise localization of the target person, by fully exploiting information from the query and contextual cues in every recursive search step. We develop the Neural Person Search Machines (NPSM) to implement such recursive localization for person search. Benefiting from its neural search mechanism, NPSM is able to selectively shrink its focus from a loose region to a tighter one containing the target automatically. In this process, NPSM employs an internal primitive memory component to memorize the query representation, which modulates the attention and augments its robustness to other distracting regions. Evaluations on two benchmark datasets, the CUHK-SYSU Person Search dataset and the PRW dataset, have demonstrated that our method can outperform current state-of-the-art methods in both mAP and top-1 evaluation protocols.
Submitted 21 July, 2017;
originally announced July 2017.
-
On the Hamming Auto- and Cross-correlation Functions of a Class of Frequency Hopping Sequences of Length $ p^{n} $
Authors:
Minglong Qi,
Shenwu Xiong,
Jingling Yuan
Abstract:
In this paper, a new class of frequency hopping sequences (FHSs) of length $ p^{n} $ is constructed by using Ding-Helleseth generalized cyclotomic classes of order 2, of which the Hamming auto- and cross-correlation functions are investigated (for the Hamming cross-correlation, only the case $ p\equiv 3\pmod 4 $ is considered). It is shown that the set of the constructed FHSs is optimal with respect to the average Hamming correlation functions.
Submitted 14 June, 2017;
originally announced June 2017.
-
Video-based Person Re-identification with Accumulative Motion Context
Authors:
Hao Liu,
Zequn Jie,
Karlekar Jayashree,
Meibin Qi,
Jianguo Jiang,
Shuicheng Yan,
Jiashi Feng
Abstract:
Video based person re-identification plays a central role in realistic security and video surveillance. In this paper we propose a novel Accumulative Motion Context (AMOC) network for addressing this important problem, which effectively exploits the long-range motion context for robustly identifying the same person under challenging conditions. Given a video sequence of the same or different persons, the proposed AMOC network jointly learns appearance representation and motion context from a collection of adjacent frames using a two-stream convolutional architecture. Then AMOC accumulates clues from motion context by recurrent aggregation, allowing effective information flow among adjacent frames and capturing the dynamic gist of the persons. The architecture of AMOC is end-to-end trainable and thus motion context can be adapted to complement appearance clues under unfavorable conditions (e.g. occlusions). Extensive experiments are conducted on three public benchmark datasets, i.e., the iLIDS-VID, PRID-2011 and MARS datasets, to investigate the performance of AMOC. The experimental results demonstrate that the proposed AMOC network significantly outperforms state-of-the-art methods for video-based re-identification and confirm the advantage of exploiting long-range motion context for video based person re-identification, validating our motivation.
Submitted 12 June, 2017; v1 submitted 31 December, 2016;
originally announced January 2017.