-
Learning to Retrieve for Job Matching
Authors:
Jianqiang Shen,
Yuchin Juan,
Shaobo Zhang,
Ping Liu,
Wen Pu,
Sriram Vasudevan,
Qingquan Song,
Fedor Borisyuk,
Kay Qianqi Shen,
Haichao Wei,
Yunxiang Ren,
Yeou S. Chiou,
Sicong Kuang,
Yuan Yin,
Ben Zheng,
Muchen Wu,
Shaghayegh Gharghabi,
Xiaoqing Wang,
Huichao Xue,
Qi Guo,
Daniel Hewlett,
Luke Simon,
Liangjie Hong,
Wenjing Zhang
Abstract:
Web-scale search systems typically tackle the scalability challenge with a two-step paradigm: retrieval and ranking. The retrieval step, also known as candidate selection, often involves extracting standardized entities, creating an inverted index, and performing term matching for retrieval. Such traditional methods require manual and time-consuming development of query models. In this paper, we discuss applying learning-to-retrieve technology to enhance LinkedIn's job search and recommendation systems. In the realm of promoted jobs, the key objective is to improve the quality of applicants, thereby delivering value to recruiter customers. To achieve this, we leverage confirmed hire data to construct a graph that evaluates a seeker's qualification for a job, and utilize learned links for retrieval. Our learned model is easy to explain, debug, and adjust. On the other hand, the focus for organic jobs is to optimize seeker engagement. We accomplished this by training embeddings for personalized retrieval, fortified by a set of rules derived from the categorization of member feedback. In addition to a solution based on a conventional inverted index, we developed an on-GPU solution capable of supporting both KNN and term matching efficiently.
Submitted 20 February, 2024;
originally announced February 2024.
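The combination of term matching and KNN retrieval mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' on-GPU system: it is a brute-force, CPU-only toy in pure Python, and the function name `retrieve`, the job tuples, the term sets, and the two-dimensional embeddings are all hypothetical values chosen for illustration.

```python
import heapq


def retrieve(query_vec, required_terms, jobs, k=2):
    """Return the ids of the top-k jobs that contain every required term.

    jobs is a list of (job_id, terms, embedding) tuples (hypothetical layout).
    Term matching acts as a hard filter; the surviving jobs are ranked by the
    dot product between the query embedding and the job embedding.
    """
    scored = []
    for job_id, terms, embedding in jobs:
        if required_terms <= terms:  # term matching: all required terms present
            score = sum(q * e for q, e in zip(query_vec, embedding))
            scored.append((score, job_id))
    # Exact k-nearest-neighbour selection by score (brute force, no index).
    return [job_id for _, job_id in heapq.nlargest(k, scored)]


# Hypothetical toy corpus.
JOBS = [
    ("a", {"python", "remote"}, [1.0, 0.0]),
    ("b", {"python"}, [0.9, 0.1]),
    ("c", {"java", "remote"}, [1.0, 0.0]),
    ("d", {"python", "remote"}, [0.0, 1.0]),
]

print(retrieve([1.0, 0.0], {"python"}, JOBS))
```

Filtering before scoring keeps the example short; a production system would more likely use an approximate nearest-neighbour index or batched GPU scoring, which this sketch does not attempt.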
-
NeRF-LiDAR: Generating Realistic LiDAR Point Clouds with Neural Radiance Fields
Authors:
Junge Zhang,
Feihu Zhang,
Shaochen Kuang,
Li Zhang
Abstract:
Labeling LiDAR point clouds for training autonomous driving is extremely expensive and difficult. LiDAR simulation aims at generating realistic LiDAR data with labels for training and verifying self-driving algorithms more efficiently. Recently, Neural Radiance Fields (NeRF) have been proposed for novel view synthesis using implicit reconstruction of 3D scenes. Inspired by this, we present NeRF-LiDAR, a novel LiDAR simulation method that leverages real-world information to generate realistic LiDAR point clouds. Different from existing LiDAR simulators, we use real images and point cloud data collected by self-driving cars to learn the 3D scene representation, point cloud generation and label rendering. We verify the effectiveness of our NeRF-LiDAR by training different 3D segmentation models on the generated LiDAR point clouds. The results show that the trained models achieve accuracy similar to that of the same models trained on real LiDAR data. Besides, the generated data can boost accuracy through pre-training, which helps reduce the amount of real labeled data required.
Submitted 20 January, 2024; v1 submitted 28 April, 2023;
originally announced April 2023.
-
Auxiliary Task-based Deep Reinforcement Learning for Quantum Control
Authors:
Shumin Zhou,
Hailan Ma,
Sen Kuang,
Daoyi Dong
Abstract:
Due to its property of not requiring prior knowledge of the environment, reinforcement learning has significant potential for quantum control problems. In this work, we investigate the effectiveness of continuous control policies based on deep deterministic policy gradient. To address the sparse reward signal in quantum learning control problems, we propose an auxiliary task-based deep reinforcement learning (AT-DRL) method for quantum control. In particular, we first design a guided reward function based on the fidelity of quantum states that enables incremental fidelity improvement. Then, we introduce the concept of an auxiliary task whose network shares parameters with the main network to predict the reward provided by the environment (called the main task). The auxiliary task learns synchronously with the main task, allowing one to select the most relevant features of the environment, thus aiding the agent in comprehending how to achieve the desired state. The numerical simulations demonstrate that the proposed AT-DRL can provide a solution to the sparse reward problem in quantum systems, and has great potential in designing control pulses that achieve efficient quantum state preparation.
Submitted 28 February, 2023;
originally announced February 2023.
-
MSCDA: Multi-level Semantic-guided Contrast Improves Unsupervised Domain Adaptation for Breast MRI Segmentation in Small Datasets
Authors:
Sheng Kuang,
Henry C. Woodruff,
Renee Granzier,
Thiemo J. A. van Nijnatten,
Marc B. I. Lobbes,
Marjolein L. Smidt,
Philippe Lambin,
Siamak Mehrkanoon
Abstract:
Deep learning (DL) applied to breast tissue segmentation in magnetic resonance imaging (MRI) has received increased attention in the last decade; however, the domain shift that arises from different vendors, acquisition protocols, and biological heterogeneity remains an important but challenging obstacle on the path towards clinical implementation. In this paper, we propose a novel Multi-level Semantic-guided Contrastive Domain Adaptation (MSCDA) framework to address this issue in an unsupervised manner. Our approach incorporates self-training with contrastive learning to align feature representations between domains. In particular, we extend the contrastive loss by incorporating pixel-to-pixel, pixel-to-centroid, and centroid-to-centroid contrasts to better exploit the underlying semantic information of the image at different levels. To resolve the data imbalance problem, we utilize a category-wise cross-domain sampling strategy to sample anchors from target images and build a hybrid memory bank to store samples from source images. We have validated MSCDA with a challenging task of cross-domain breast MRI segmentation between datasets of healthy volunteers and invasive breast cancer patients. Extensive experiments show that MSCDA effectively improves the model's feature alignment capabilities between domains, outperforming state-of-the-art methods. Furthermore, the framework is shown to be label-efficient, achieving good performance with a smaller source dataset. The code is publicly available at https://github.com/ShengKuangCN/MSCDA.
Submitted 8 June, 2023; v1 submitted 4 January, 2023;
originally announced January 2023.
-
Towards Stable Co-saliency Detection and Object Co-segmentation
Authors:
Bo Li,
Lv Tang,
Senyun Kuang,
Mofei Song,
Shouhong Ding
Abstract:
In this paper, we present a novel model for simultaneous stable co-saliency detection (CoSOD) and object co-segmentation (CoSEG). To detect co-saliency (segmentation) accurately, the core problem is to model the inter-image relations within an image group well. Some methods design sophisticated modules, such as recurrent neural networks (RNNs), to address this problem. However, order sensitivity is the major drawback of RNNs, and it heavily affects the stability of the resulting CoSOD (CoSEG) model. In this paper, inspired by RNN-based models, we first propose a multi-path stable recurrent unit (MSRU), containing dummy orders mechanisms (DOM) and a recurrent unit (RU). Our proposed MSRU not only helps the CoSOD (CoSEG) model capture robust inter-image relations, but also reduces order sensitivity, resulting in a more stable inference and training process. Moreover, we design a cross-order contrastive loss (COCL) that further addresses the order-sensitivity problem by pulling together the feature embeddings generated from different input orders. We validate our model on five widely used CoSOD datasets (CoCA, CoSOD3k, Cosal2015, iCoseg and MSRC) and three widely used datasets for object co-segmentation (Internet, iCoseg and PASCAL-VOC). The results demonstrate the superiority of the proposed approach over state-of-the-art (SOTA) methods.
Submitted 1 October, 2022; v1 submitted 24 September, 2022;
originally announced September 2022.
-
BAST: Binaural Audio Spectrogram Transformer for Binaural Sound Localization
Authors:
Sheng Kuang,
Kiki van der Heijden,
Siamak Mehrkanoon
Abstract:
Accurate sound localization in a reverberant environment is essential for human auditory perception. Recently, Convolutional Neural Networks (CNNs) have been utilized to model the binaural human auditory pathway. However, CNNs have difficulty capturing global acoustic features. To address this issue, we propose a novel end-to-end Binaural Audio Spectrogram Transformer (BAST) model to predict the sound azimuth in both anechoic and reverberant environments. Two modes of implementation, i.e. BAST-SP and BAST-NSP, corresponding to the BAST model with shared and non-shared parameters respectively, are explored. Our model with subtraction interaural integration and hybrid loss achieves an angular distance of 1.29 degrees and a Mean Square Error of 1e-3 at all azimuths, significantly surpassing the CNN-based model. The exploratory analysis of BAST's performance on the left-right hemifields and in anechoic and reverberant environments shows its generalization ability as well as the feasibility of binaural Transformers in sound localization. Furthermore, an analysis of the attention maps is provided to give additional insights into the interpretation of the localization process in a natural reverberant environment.
Submitted 8 July, 2022;
originally announced July 2022.
-
Spiral Language Modeling
Authors:
Yong Cao,
Yukun Feng,
Shaohui Kuang,
Gu Xu
Abstract:
In almost all text generation applications, word sequences are constructed in a left-to-right (L2R) or right-to-left (R2L) manner, as natural language sentences are written either L2R or R2L. However, we find that the natural language written order is not essential for text generation. In this paper, we propose Spiral Language Modeling (SLM), a general approach that enables one to construct natural language sentences beyond the L2R and R2L order. SLM allows one to form natural language text by starting from an arbitrary token inside the result text and expanding the remaining tokens around the selected ones. It makes the decoding order a new optimization objective besides the language model perplexity, which further improves the diversity and quality of the generated text. Furthermore, SLM makes it possible to manipulate the text construction process by selecting a proper starting token. SLM also introduces generation orderings as additional regularization to improve model robustness in low-resource scenarios. Experiments on 8 widely studied Neural Machine Translation (NMT) tasks show that SLM is consistently effective, with up to a 4.7 BLEU increase compared to the conventional L2R decoding approach.
Submitted 20 December, 2021;
originally announced December 2021.
-
AR: Auto-Repair the Synthetic Data for Neural Machine Translation
Authors:
Shanbo Cheng,
Shaohui Kuang,
Rongxiang Weng,
Heng Yu,
Changfeng Zhu,
Weihua Luo
Abstract:
Many studies have shown that, compared with using only limited authentic parallel data as the training corpus, incorporating synthetic parallel data generated by back translation (BT) or forward translation (FT, or self-training) into the NMT training process can significantly improve translation quality. However, as a well-known shortcoming, synthetic parallel data is noisy because it is generated by an imperfect NMT system. As a result, the improvements in translation quality brought by the synthetic parallel data are greatly diminished. In this paper, we propose a novel Auto-Repair (AR) framework to improve the quality of synthetic data. Our proposed AR model can learn the transformation from a low-quality (noisy) input sentence to a high-quality sentence based on large-scale monolingual data with BT and FT techniques. The noise in synthetic parallel data will be sufficiently eliminated by the proposed AR model, and the repaired synthetic parallel data can then help NMT models achieve larger improvements. Experimental results show that our approach can effectively improve the quality of synthetic parallel data, and that the NMT model with the repaired synthetic data achieves consistent improvements on both the WMT14 EN→DE and IWSLT14 DE→EN translation tasks.
Submitted 5 April, 2020;
originally announced April 2020.
-
Merging External Bilingual Pairs into Neural Machine Translation
Authors:
Tao Wang,
Shaohui Kuang,
Deyi Xiong,
António Branco
Abstract:
As neural machine translation (NMT) is not easily amenable to explicit correction of errors, incorporating pre-specified translations into NMT is widely regarded as a non-trivial challenge. In this paper, we propose and explore three methods to endow NMT with pre-specified bilingual pairs. Instead of, for instance, modifying the beam search algorithm during decoding or making complex modifications to the attention mechanism (the mainstream approaches to tackling this challenge), we experiment with the training data being appropriately pre-processed to add information about pre-specified translations. Extra embeddings are also used to distinguish pre-specified tokens from the other tokens. Extensive experimentation and analysis indicate that over 99% of the pre-specified phrases are successfully translated (given an 85% baseline) and that there is also a substantive improvement in translation quality with the methods explored here.
Submitted 1 December, 2019;
originally announced December 2019.
-
Learning to Reuse Translations: Guiding Neural Machine Translation with Examples
Authors:
Qian Cao,
Shaohui Kuang,
Deyi Xiong
Abstract:
In this paper, we study the problem of enabling neural machine translation (NMT) to reuse previous translations from similar examples in target prediction. Distinguishing reusable translations from noisy segments and learning to reuse them in NMT are non-trivial. To solve these challenges, we propose an Example-Guided NMT (EGNMT) framework with two models: (1) a noise-masked encoder model that masks out noisy words according to word alignments and encodes the noise-masked sentences with an additional example encoder and (2) an auxiliary decoder model that predicts reusable words via an auxiliary decoder sharing parameters with the primary decoder. We define and implement the two models with the state-of-the-art Transformer. Experiments show that the noise-masked encoder model allows NMT to learn useful information from examples with low fuzzy match scores (FMS) while the auxiliary decoder model is good for high-FMS examples. More experiments on Chinese-English, English-German and English-Spanish translation demonstrate that the combination of the two EGNMT models can achieve improvements of up to +9 BLEU points over the baseline system and +7 BLEU points over a two-encoder Transformer.
Submitted 27 November, 2019; v1 submitted 25 November, 2019;
originally announced November 2019.
-
Fusing Recency into Neural Machine Translation with an Inter-Sentence Gate Model
Authors:
Shaohui Kuang,
Deyi Xiong
Abstract:
Neural machine translation (NMT) systems are usually trained on a large number of bilingual sentence pairs and translate one sentence at a time, ignoring inter-sentence information. This may make the translation of a sentence ambiguous or even inconsistent with the translations of neighboring sentences. In order to handle this issue, we propose an inter-sentence gate model that uses the same encoder to encode two adjacent sentences and controls the amount of information flowing from the preceding sentence to the translation of the current sentence with an inter-sentence gate. In this way, our proposed model can capture the connection between sentences and fuse recency from neighboring sentences into neural machine translation. On several NIST Chinese-English translation tasks, our experiments demonstrate that the proposed inter-sentence gate model achieves substantial improvements over the baseline.
Submitted 12 June, 2018;
originally announced June 2018.
-
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than Character Level
Authors:
Lifeng Han,
Shaohui Kuang
Abstract:
In neural machine translation (NMT), researchers face the challenge of translating unseen (out-of-vocabulary, OOV) words. To solve this, some researchers propose splitting words in western languages such as English and German into sub-words or compounds. In this paper, we try to address this OOV issue and improve NMT adequacy with a harder language, Chinese, whose characters are even more sophisticated in composition. We integrate Chinese radicals into the NMT model with different settings to address the unseen-word challenge in Chinese-to-English translation. This can also be considered a semantic part of the MT system, since Chinese radicals usually carry the essential meaning of the words they are constructed in. Meaningful radicals and new characters can be integrated into NMT systems with our models. We use an attention-based NMT system as a strong baseline system. Experiments on the standard Chinese-to-English NIST translation shared task data from 2006 and 2008 show that our designed models outperform the baseline model on a wide range of state-of-the-art evaluation metrics including LEPOR, BEER, and CharacTER, in addition to BLEU and NIST scores, especially on adequacy-level translation. We also have some interesting findings from the results of our various experiment settings about the performance of words and characters in Chinese NMT, which is different from other languages. For instance, fully character-level NMT may perform well or even reach the state of the art in some other languages, as researchers demonstrated recently; however, in the Chinese NMT model, word boundary knowledge is important for model learning.
Submitted 24 June, 2019; v1 submitted 3 May, 2018;
originally announced May 2018.
-
Modeling Coherence for Neural Machine Translation with Dynamic and Topic Caches
Authors:
Shaohui Kuang,
Deyi Xiong,
Weihua Luo,
Guodong Zhou
Abstract:
Sentences in a well-formed text are connected to each other via various links to form the cohesive structure of the text. Current neural machine translation (NMT) systems translate a text in a conventional sentence-by-sentence fashion, ignoring such cross-sentence links and dependencies. This may lead to an incoherent target text for a coherent source text. In order to handle this issue, we propose a cache-based approach to modeling coherence for neural machine translation by capturing contextual information either from recently translated sentences or the entire document. Particularly, we explore two types of caches: a dynamic cache, which stores words from the best translation hypotheses of preceding sentences, and a topic cache, which maintains a set of target-side topical words that are semantically related to the document to be translated. On this basis, we build a new layer to score target words in these two caches with a cache-based neural model. Here the estimated probabilities from the cache-based neural model are combined with NMT probabilities into the final word prediction probabilities via a gating mechanism. Finally, the proposed cache-based neural model is trained jointly with the NMT system in an end-to-end manner. Experiments and analysis presented in this paper demonstrate that the proposed cache-based model achieves substantial improvements over several state-of-the-art SMT and NMT baselines.
Submitted 14 June, 2018; v1 submitted 29 November, 2017;
originally announced November 2017.
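The gating step described in the abstract above, where cache probabilities and NMT probabilities are merged into a final word distribution, can be sketched as a convex combination. This is only an illustration under stated assumptions: the abstract does not give the formula, so the interpolation form, the scalar `gate_logit` (standing in for a gate that the paper learns from model states), and the function names are all hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, mapping a real-valued logit to a gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def combine_with_gate(p_nmt, p_cache, gate_logit):
    """Mix two word distributions with a scalar gate (assumed form).

    p_nmt and p_cache map target words to probabilities. The result is
    (1 - g) * p_nmt + g * p_cache with g = sigmoid(gate_logit), which is
    again a probability distribution when both inputs are.
    """
    g = sigmoid(gate_logit)
    vocab = set(p_nmt) | set(p_cache)
    return {
        w: (1.0 - g) * p_nmt.get(w, 0.0) + g * p_cache.get(w, 0.0)
        for w in vocab
    }


# Hypothetical toy distributions: the cache favours a word seen earlier
# in the document, which shifts probability mass towards it.
p_nmt = {"bank": 0.6, "shore": 0.4}
p_cache = {"bank": 1.0}
mixed = combine_with_gate(p_nmt, p_cache, gate_logit=0.0)
print(mixed)
```

With a gate logit of 0 the gate equals 0.5, so the two distributions are averaged; a strongly negative logit would fall back to the NMT distribution alone.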
-
Attention Focusing for Neural Machine Translation by Bridging Source and Target Embeddings
Authors:
Shaohui Kuang,
Junhui Li,
António Branco,
Weihua Luo,
Deyi Xiong
Abstract:
In neural machine translation, a source sequence of words is encoded into a vector from which a target sequence is generated in the decoding phase. Unlike in statistical machine translation, the associations between source words and their possible target counterparts are not explicitly stored. Source and target words are at the two ends of a long information processing procedure, mediated by hidden states at both the source encoding and the target decoding phases. This makes it possible for a source word to be incorrectly translated into a target word that is not any of its admissible equivalent counterparts in the target language.
In this paper, we seek to somewhat shorten the distance between source and target words in that procedure, and thus strengthen their association, by means of a method we term bridging source and target word embeddings. We experiment with three strategies: (1) a source-side bridging model, where source word embeddings are moved one step closer to the output target sequence; (2) a target-side bridging model, which explores the more relevant source word embeddings for the prediction of the target sequence; and (3) a direct bridging model, which directly connects source and target word embeddings seeking to minimize errors in the translation of ones by the others.
Experiments and analysis presented in this paper demonstrate that the proposed bridging models are able to significantly improve the quality of both sentence translation, in general, and alignment and translation of individual source words with target words, in particular.
Submitted 10 May, 2018; v1 submitted 14 November, 2017;
originally announced November 2017.
-
Random Caching in Backhaul-Limited Multi-Antenna Networks: Analysis and Area Spectrum Efficiency Optimization
Authors:
Sufeng Kuang,
Nan Liu
Abstract:
Caching at base stations (BSs) is a promising technology to satisfy the increasing capacity requirements and reduce the backhaul loads in future wireless networks. Careful design of random caching can fully exploit the file popularity and achieve good performance. However, previous works on random caching schemes usually assumed a single antenna at BSs and users, which is not the case in practical multi-antenna networks. In this paper, we consider the analysis and optimization of cache-enabled multi-antenna networks with limited backhaul. We first derive a closed-form expression and a simple tight upper bound of the successful transmission probability, using tools from stochastic geometry and a gamma approximation. Based on the analytic results, we then consider the area spectrum efficiency maximization by optimizing design parameters, which is a complicated mixed-integer optimization problem. After analyzing the optimal properties, we obtain a local optimal solution with lower complexity. To further simplify the optimization, we then solve an asymptotic optimization problem in the high user density region, using the upper bound as the objective function. Numerical simulations show that the asymptotic optimal caching scheme achieves better performance than existing caching schemes. The analysis and optimization results provide insightful design guidelines for random caching in practical networks.
Submitted 19 September, 2017;
originally announced September 2017.