Oswaldo Ludwig holds a Ph.D. in Electrical and Computer Engineering from the University of Coimbra, Portugal, and completed a postdoctoral fellowship in the Department of Computer Science at KU Leuven, Belgium. His research interests include Machine Learning with applications in Natural Language Processing, Automatic Speech Recognition and Computer Vision.
Wav2vec2 self-supervised multilingual training learns speech units common to multiple languages, leading to better generalization capacity. However, Wav2vec2 is larger than other end-to-end (E2E) ASR models such as the Conformer. The objective of this work is therefore to reduce the Wav2vec2 footprint by pruning rows from the intermediate dense layers of the encoder block, since these layers account for about two thirds of the encoder parameters. We apply Genetic Algorithms (GA) to solve the combinatorial optimization problem associated with pruning, which requires running many copies of the Wav2vec2 decoder in parallel using multiprocessing on a computer grid, so an effort was made to tune the GA for good performance with few CPUs. The experiments show a small absolute word error rate degradation of 0.21% (1.26% relative) at 40% pruning, and compare this result with those of the usual L1-norm pruning and of model restructuring by singular value decomposition.
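The combinatorial search described above can be sketched with a minimal genetic algorithm that selects which rows of a single toy dense-layer weight matrix to prune so that the layer's output changes as little as possible on calibration data. All shapes, the fitness function, and the GA hyperparameters below are illustrative assumptions, not the paper's actual Wav2vec2 setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for one intermediate dense layer (assumed shapes, not Wav2vec2's).
W = rng.normal(size=(64, 32))    # 64 rows; pruning a row removes one hidden unit
x = rng.normal(size=(100, 32))   # calibration inputs
y_ref = x @ W.T                  # unpruned layer output

def fitness(mask):
    """Negative output distortion when rows with mask == 0 are removed."""
    y = x @ (W * mask[:, None]).T
    return -np.linalg.norm(y - y_ref)

def ga_prune(n_keep, pop=30, gens=40):
    n = W.shape[0]
    def random_mask():
        m = np.zeros(n)
        m[rng.choice(n, n_keep, replace=False)] = 1.0
        return m
    population = [random_mask() for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop // 2]          # elitist selection
        children = []
        while len(children) < pop - len(parents):
            i, j = rng.choice(len(parents), 2, replace=False)
            # Uniform crossover, then repair to keep exactly n_keep rows.
            child = np.where(rng.random(n) < 0.5, parents[i], parents[j])
            on, off = np.flatnonzero(child), np.flatnonzero(child == 0)
            if len(on) > n_keep:
                child[rng.choice(on, len(on) - n_keep, replace=False)] = 0.0
            elif len(on) < n_keep:
                child[rng.choice(off, n_keep - len(on), replace=False)] = 1.0
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

best = ga_prune(n_keep=int(0.6 * 64))  # prune 40% of the rows
```

In the actual work, evaluating the fitness of each chromosome means running a full decoder copy, which is why the paper parallelizes the population over a grid.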
This paper introduces a new biologically inspired training method named Continual Learning through Adjustment Suppression and Sparsity Promotion (CLASSP). CLASSP is based on two main principles observed in neuroscience, particularly in the context of synaptic transmission and Long-Term Potentiation (LTP). The first principle is a decay rate over the weight adjustments, implemented as a generalization of the AdaGrad optimization algorithm: weights that have received many updates should have lower learning rates, as they likely encode important information about previously seen data. However, this principle alone produces a diffuse distribution of updates throughout the model, since it favors updates to weights that have not been updated before, whereas a sparse update distribution is preferable to leave weights unassigned for future tasks. Therefore, the second principle introduces a threshold on the loss gradient: a weight is updated only if the loss gradient with respect to it exceeds a certain threshold, i.e. only weights with a significant impact on the current loss are updated. Both principles reflect phenomena observed in LTP, where a threshold effect and a gradual saturation of potentiation have been reported. CLASSP is implemented as a Python/PyTorch class, making it applicable to any model. When compared with Elastic Weight Consolidation (EWC) on Computer Vision datasets, CLASSP demonstrates superior performance in terms of accuracy and memory footprint.
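The two principles can be sketched with a toy NumPy optimizer rather than the actual CLASSP PyTorch class; the class name and the `lr / sqrt(accum + 1)` schedule are assumed AdaGrad-like forms for illustration, not the paper's exact update rule.

```python
import numpy as np

class ClasspLikeOptimizer:
    """Toy sketch of CLASSP's two principles (not the official implementation):
    an AdaGrad-style decay over accumulated updates plus a gradient-magnitude
    threshold that keeps the update distribution sparse."""

    def __init__(self, params, lr=0.1, threshold=0.05):
        self.params = params                              # numpy arrays, updated in place
        self.lr, self.threshold = lr, threshold
        self.accum = [np.zeros_like(p) for p in params]   # accumulated squared gradients

    def step(self, grads):
        for p, g, a in zip(self.params, grads, self.accum):
            # Principle 2: update only weights whose gradient is significant.
            active = np.abs(g) > self.threshold
            # Principle 1: frequently updated weights get lower learning rates
            # (the 1/sqrt(accum + 1) schedule is an assumed AdaGrad-like form).
            scale = self.lr / np.sqrt(a + 1.0)
            p -= np.where(active, scale * g, 0.0)
            a += np.where(active, g * g, 0.0)

# Usage on f(w) = 0.5 * ||w||^2, whose gradient is w itself:
w = np.array([1.0, 0.001])          # the second weight stays below the threshold
opt = ClasspLikeOptimizer([w])
for _ in range(50):
    opt.step([w.copy()])
```

After training, the first weight has decayed with progressively smaller steps, while the sub-threshold weight is never touched and remains available for future tasks.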
This work presents a method for maximizing the mutual information between segment representations and the generated sequence of phonemes in GAN-based unsupervised speech recognition, providing better control over the inclusion of unrelated textual information in the transcription and allowing experiments with deeper generators.
This paper presents a new adversarial learning method for generative conversational agents (GCA), along with a new GCA model. As in previous work on adversarial learning for dialogue generation, our method treats the GCA as a generator that aims to fool a discriminator that labels dialogues as human-generated or machine-generated; however, in our approach the discriminator performs token-level classification, i.e. it indicates whether the current token was generated by humans or machines. To do so, the discriminator also receives as input the context utterances (the dialogue history) and the incomplete answer up to the current token. This new approach makes end-to-end training by backpropagation possible. A self-conversation process produces a more diverse set of generated data for the adversarial training, which improves performance on questions not related to the training data. Experimental results with human and adversarial evaluations show that the adversarial method yields significant performance gains over the usual teacher forcing training.
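The token-level discriminator idea can be sketched as follows, with a toy bag-of-embeddings scorer standing in for the actual neural discriminator; the vocabulary size, embedding dimension, and pooling scheme are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 20, 8                                       # assumed toy vocabulary / embedding sizes
E = rng.normal(scale=0.1, size=(V, D))             # shared token embeddings
w_out, b_out = rng.normal(scale=0.1, size=D), 0.0  # logistic scoring head

def token_scores(context, answer):
    """Per-token P(human) for each answer token, conditioned on the dialogue
    history plus the incomplete answer up to that token. A bag-of-embeddings
    stand-in for the paper's neural discriminator."""
    scores = []
    for t in range(len(answer)):
        seen = context + answer[:t + 1]        # history + answer prefix
        h = E[seen].mean(axis=0)               # pooled representation (assumption)
        scores.append(1.0 / (1.0 + np.exp(-(h @ w_out + b_out))))
    return np.array(scores)

scores = token_scores(context=[1, 2, 3], answer=[4, 5, 6])
```

The key structural point carries over from the paper: because every answer prefix gets its own human/machine score, the generator receives a learning signal at each token rather than one label per dialogue.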
This repository presents a new adversarial learning method for generative conversational agents (GCA), along with a new GCA model. Our method treats the GCA as a generator that aims to fool a discriminator that labels dialogues as human-generated or machine-generated; however, in our approach the discriminator performs token-level classification, i.e. it indicates whether the current token was generated by humans or machines. To do so, the discriminator also receives as input the context utterances (the dialogue history) and the incomplete answer up to the current token. This new approach makes end-to-end training by backpropagation possible. A self-conversation process produces a more diverse set of generated data for the adversarial training, which improves performance on questions not related to the training data. Moreover, the adversarial training also yields a trained discriminator that can be used to select the best answer when different mod...
2008 11th International IEEE Conference on Intelligent Transportation Systems, 2008
In this paper, a multilayer feedforward neural-network-based approach for vehicle detection is proposed. The main idea is to use the network to perform both feature extraction and classification; this simplicity enables real-time applications. To achieve these capabilities, the network is trained by a new algorithm, proposed in this paper, named minimization of inter-class interference (MCI). The algorithm aims to create a hidden space (i.e. feature space) in which the patterns have a desirable statistical distribution. Regarding the neural architecture, the linear output layer is replaced by a Mahalanobis kernel in order to improve generalization. Experiments are performed on a dataset that includes two standard Caltech car-rear datasets. Finally, disturbed images are used to evaluate the robustness of the neural-network-based vehicle detection. The proposed method achieves a low miss rate, a low false alarm rate, and a high area under the ROC curve. In a Matlab environment, the algorithm spends only 3.280e-4 seconds per image. These facts encourage this research line.
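The Mahalanobis output stage can be illustrated with a minimal sketch: each class is represented by a mean and an inverse covariance estimated in the hidden (feature) space, and a pattern is assigned to the class at the smallest Mahalanobis distance. The statistics below are hypothetical placeholders, not values from the paper.

```python
import numpy as np

def mahalanobis_classify(h, means, inv_covs):
    """Assign hidden-space feature vector h to the class with the smallest
    squared Mahalanobis distance. Class means and inverse covariances are
    assumed to be estimated beforehand from the training patterns."""
    d2 = [(h - m) @ S @ (h - m) for m, S in zip(means, inv_covs)]
    return int(np.argmin(d2))

# Hypothetical two-class statistics in a 2-D feature space.
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
inv_covs = [np.eye(2), np.eye(2)]
```

Unlike a linear output layer, this decision rule accounts for the spread of each class in the learned feature space, which is why the paper uses it to improve generalization.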
This paper extends our previous work on regularization of neural networks using Eigenvalue Decay by employing a soft approximation of the dominant eigenvalue that enables the calculation of its derivatives with respect to the synaptic weights, and therefore the application of back-propagation, a primary requirement for deep learning. Moreover, we extend our previous theoretical analysis to deep neural networks and multiclass classification problems. Our method is implemented as an additional regularizer in Keras, a modular neural networks library written in Python, and evaluated on the benchmark datasets Reuters Newswire Topics Classification, the IMDB database for binary sentiment classification, the MNIST database of handwritten digits, and the CIFAR-10 image classification dataset.
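One differentiable-in-principle way to estimate the dominant eigenvalue, shown here as a hedged NumPy sketch (the paper's exact soft approximation may differ from this surrogate), is a fixed number of power-iteration steps followed by a Rayleigh quotient:

```python
import numpy as np

def dominant_eigenvalue(W, iters=50):
    """Estimate the largest eigenvalue of W @ W.T via power iteration and a
    Rayleigh quotient. For a fixed number of iterations every operation is
    smooth in W, so the penalty admits back-propagation; this is one possible
    surrogate, not necessarily the paper's soft approximation."""
    A = W @ W.T
    v = np.ones(A.shape[0]) / np.sqrt(A.shape[0])
    for _ in range(iters):
        v = A @ v
        v /= np.linalg.norm(v)
    return v @ A @ v

def regularized_loss(W, data_loss, c=1e-2):
    # Eigenvalue Decay: penalize the dominant eigenvalue of W W^T.
    return data_loss + c * dominant_eigenvalue(W)

W_demo = np.array([[2.0, 0.0], [0.0, 1.0]])
lam = dominant_eigenvalue(W_demo)   # W W^T = diag(4, 1), so lam is close to 4
```

A hard `max` over eigenvalues would be non-smooth where eigenvalues cross; a smooth surrogate like this is what makes the regularizer usable inside a framework such as Keras.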
This position paper addresses hierarchical multi-task learning (HMTL) in the seq2seq context, i.e. a specific type of multi-task learning in which tasks are hierarchically related in terms of level of abstraction. An example of HMTL is the pipeline of end-to-end Automatic Speech Recognition (ASR) and Natural Language Understanding (NLU) tasks, in which the ASR model takes waveforms as input and generates the corresponding textual transcription, while the NLU model learns to map this textual information to a structured representation according to the context/application. Back-propagating errors through a cascade of seq2seq models is not straightforward, because the sampling operations performed during decoding are not differentiable, requiring approximations of the argmax operation. Therefore, this document proposes a training method in which each seq2seq block is trained separately, but aware of the errors of the next seq2seq block(s).
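The argmax approximation mentioned above can be illustrated with a low-temperature softmax, one common differentiable surrogate for the hard token-selection step between cascaded blocks (a sketch, not necessarily the approximation used in the paper):

```python
import numpy as np

def soft_argmax(logits, tau=0.1):
    """Low-temperature softmax as a differentiable surrogate for argmax:
    it returns a near-one-hot distribution over tokens instead of a hard,
    non-differentiable selection, so gradients can flow through it."""
    z = logits / tau
    z = z - z.max()               # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = np.array([1.0, 3.0, 0.5])   # hypothetical token logits from one block
p = soft_argmax(logits)
```

As the temperature `tau` approaches zero the output approaches the hard one-hot argmax, which is exactly the trade-off that makes such approximations delicate in a cascade and motivates the paper's alternative of training each block separately.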
Papers by Oswaldo Ludwig