Deep learning techniques have been increasingly applied to the natural sciences, e.g., for property prediction and optimization or material discovery. A fundamental ingredient of such approaches is the vast quantity of labeled data needed to train the model. This poses severe challenges in data-scarce settings where obtaining labels requires substantial computational or labor resources. Noting that problems in natural sciences often benefit from easily obtainable auxiliary information sources, we introduce surrogate- and invariance-boosted contrastive learning (SIB-CL), a deep learning framework which incorporates three inexpensive and easily obtainable auxiliary information sources to overcome data scarcity. Specifically, these are: abundant unlabeled data, prior knowledge of symmetries or invariances, and surrogate data obtained at near-zero cost. We demonstrate SIB-CL’s effectiveness and generality on various scientific problems, e.g., predicting the density-of-states of 2D photo...
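To make the contrastive ingredient concrete, here is a minimal sketch of an NT-Xent-style pretraining step in which positive pairs are produced by a known invariance applied to unlabeled (or surrogate) samples. The linear encoder, the `random_symmetry_op` augmentation, and all sizes are hypothetical stand-ins, not the SIB-CL implementation.

```python
# Minimal sketch (not the authors' code): a contrastive step where positive pairs come
# from a known invariance applied to unlabeled or cheaply generated surrogate samples.
import numpy as np

def random_symmetry_op(x, rng):
    """Hypothetical invariance: flip and/or cyclically shift a 1D input."""
    x = np.flip(x) if rng.random() < 0.5 else x
    return np.roll(x, rng.integers(len(x)))

def encode(x, W):
    """Toy encoder: a single linear map followed by L2 normalization."""
    z = W @ x
    return z / (np.linalg.norm(z) + 1e-12)

def nt_xent_loss(Z1, Z2, tau=0.1):
    """NT-Xent over a batch: row i of Z1 is positive with row i of Z2, negative with the rest."""
    sim = Z1 @ Z2.T / tau                      # (B, B) similarities scaled by temperature
    sim -= sim.max(axis=1, keepdims=True)      # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 64))                  # unlabeled (or surrogate) batch
W = rng.normal(size=(16, 64)) / 8.0
Z1 = np.stack([encode(random_symmetry_op(x, rng), W) for x in X])
Z2 = np.stack([encode(random_symmetry_op(x, rng), W) for x in X])
print("contrastive loss:", nt_xent_loss(Z1, Z2))
```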
2021 IEEE International Conference on Big Data (Big Data)
Meteorological applications such as precipitation nowcasting, synthetic radar generation, statistical downscaling and others have benefited from deep learning (DL) approaches; however, several challenges remain for widespread adaptation of these complex models in operational systems. One of these challenges is adequate generalizability; deep learning models trained on datasets collected in specific contexts should not be expected to perform as well when applied to different contexts required by large operational systems. One obvious mitigation is to collect massive amounts of training data that cover all expected meteorological contexts; however, this is not only costly and difficult to manage, but is also not possible in many parts of the globe where certain sensing platforms are sparse. In this paper, we describe an application of transfer learning to perform domain transfer for deep learning models. We demonstrate a transfer learning algorithm called weight superposition to adapt a Convolutional Neural Network trained in a source context to a new target context. Weight superposition is a method for storing multiple models within a single set of parameters, thus greatly simplifying model maintenance and training. This approach also addresses the issue of catastrophic forgetting, where a model, once adapted to a new context, performs poorly in the original context. We apply weight superposition to the problem of synthetic weather radar generation and show that in scenarios where the target context has less data, a model adapted with weight superposition is better at maintaining performance when compared to simpler methods. Conversely, the simple adapted model performs better on the source context when the source and target contexts have comparable amounts of data.
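As a rough illustration of the binding idea behind storing several models in one parameter set (not the algorithm as used in the paper), the sketch below keeps one shared weight matrix and a fixed random ±1 key per context; a context's linear map is read out by binding the input with its key. Interference between contexts shrinks as dimensionality grows, since random keys are nearly orthogonal in high dimensions.

```python
# Minimal sketch of parameter superposition with binary context keys (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_contexts = 512, 32, 4

keys = rng.choice([-1.0, 1.0], size=(n_contexts, d_in))   # fixed binding key per context
W_shared = np.zeros((d_out, d_in))                         # the single stored parameter set

def forward(x, c):
    """Context-c linear map: bind the input with key c before applying the shared weights."""
    return W_shared @ (x * keys[c])

def sgd_step(x, y, c, lr=0.05):
    """One squared-error SGD step for context c, written into the shared parameters."""
    global W_shared
    xb = x * keys[c]
    W_shared -= lr * np.outer(forward(x, c) - y, xb)

# Usage: interleave updates for different contexts; each map is later read out with its key.
x = rng.normal(size=d_in)
sgd_step(x, rng.normal(size=d_out), c=0)
y0 = forward(x, 0)
```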
2010 Mathematics Subject Classification: Primary: 05C81. Secondary: 60G50. We consider self-avoiding walks on the square grid graph. More precisely, we investigate the number of walks of a fixed length on Z×{-1,0,1}. Using combinatorial arguments, we derive the related generating function. We present asymptotic estimates of the number of walks under consideration, as well as the important connective constants.
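For intuition, the brute-force enumerator below counts self-avoiding walks of length n on the strip Z×{-1,0,1}; it only verifies small counts and approximates the connective constant from growth ratios, and is not the combinatorial derivation of the generating function. Starting the walks at (0, 0), i.e., on the middle row, is an assumption.

```python
# Verification sketch: enumerate self-avoiding walks of length n on Z x {-1, 0, 1}.
def count_saws(n, height=(-1, 0, 1)):
    lo, hi = min(height), max(height)

    def extend(path, visited):
        if len(path) == n + 1:          # n steps have been taken
            return 1
        x, y = path[-1]
        total = 0
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if lo <= nxt[1] <= hi and nxt not in visited:
                total += extend(path + [nxt], visited | {nxt})
        return total

    return extend([(0, 0)], {(0, 0)})

# Usage: the first few counts; the connective constant is approximated by the
# growth rate count(n+1)/count(n) for large n.
for n in range(1, 8):
    print(n, count_saws(n))
```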
Rumen Rumenov Dangovski, Kalina Hristova Petrova. We examine the number of self-avoiding walks of a fixed length on the square grid graph and, more specifically, complete the analysis of the lattice strip of height one. Using combinatorial arguments, we obtain an exact formula for the number of self-avoiding walks on a lattice strip bounded on the left and on the right, and we investigate the formula asymptotically. 2010 Mathematics Subject Classification: Primary: 52A40.
A key factor in the modern success of deep learning is the astonishing expressive power of neural networks. However, this comes at the cost of complex, black-box models that are unable to extrapolate beyond the domain of the training dataset, conflicting with goals of expressing physical laws or building human-readable programs. In this paper, we introduce OccamNet, a neural network model that can find interpretable, compact and sparse solutions for fitting data, à la Occam's razor. Our model defines a probability distribution over a non-differentiable function space, and we introduce an optimization method that samples functions and updates the weights based on cross-entropy matching in an evolutionary strategy: we train by biasing the probability mass towards better-fitting solutions. We demonstrate that we can fit a variety of algorithms, ranging from simple analytic functions through recursive programs to even simple image classification. Our method takes minimal memory fo...
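The sketch below illustrates the training idea on a toy symbolic-regression problem: a categorical distribution per composition slot over a hypothetical primitive set, sampling of candidate compositions, and reinforcement of the best-fitting samples, cross-entropy-method style. It is not the OccamNet architecture itself; the primitive set, depth, and update rule are assumptions.

```python
# Minimal sketch of sampling a discrete function space and biasing probability mass
# toward better-fitting samples (not the OccamNet model).
import numpy as np

rng = np.random.default_rng(0)
PRIMITIVES = [np.sin, np.cos, np.square, np.abs]          # hypothetical primitive set
target = lambda x: np.sin(np.square(x))                   # function we try to recover
x_data = np.linspace(-2, 2, 100)
y_data = target(x_data)

depth = 2
logits = np.zeros((depth, len(PRIMITIVES)))               # one categorical per composition slot

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def sample_and_score():
    idx = [rng.choice(len(PRIMITIVES), p=softmax(logits[d])) for d in range(depth)]
    y = x_data
    for d in reversed(range(depth)):                      # compose inner-to-outer
        y = PRIMITIVES[idx[d]](y)
    return idx, -np.mean((y - y_data) ** 2)               # higher score = better fit

for step in range(200):
    samples = [sample_and_score() for _ in range(64)]
    samples.sort(key=lambda s: s[1], reverse=True)
    for idx, _ in samples[:8]:                            # reinforce the top samples
        for d, k in enumerate(idx):
            logits[d, k] += 0.1

best = [PRIMITIVES[np.argmax(logits[d])].__name__ for d in range(depth)]
print("recovered composition (outer to inner):", best)
```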
The attention mechanism is a key component of the neural revolution in Natural Language Processing (NLP). As the size of attention-based models has been scaling with the available computational resources, a number of pruning techniques have been developed to detect and to exploit sparseness in such models in order to make them more efficient. The majority of such efforts have focused on looking for attention patterns and then hard-coding them to achieve sparseness, or on pruning the weights of the attention mechanisms based on statistical information from the training data. In this paper, we marry these two lines of research by proposing Attention Pruning (AP): a novel pruning framework that collects observations about the attention patterns in a fixed dataset and then induces a global sparseness mask for the model. Through attention pruning, we find that about 90% of the attention computation can be reduced for language modelling and about 50% for machine translation and natural lang...
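A minimal sketch of the mask-induction idea (not the paper's framework): average the attention maps observed on a fixed dataset, keep only the cells whose average weight is in the top fraction, and mask the remaining positions at inference. The `keep_fraction` value and the random stand-in statistics are assumptions.

```python
# Induce a global sparseness mask from observed attention patterns (illustrative sketch).
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def collect_mask(attention_maps, keep_fraction=0.1):
    """attention_maps: (num_examples, seq_len, seq_len). Keep the top `keep_fraction` cells."""
    avg = attention_maps.mean(axis=0)
    threshold = np.quantile(avg, 1.0 - keep_fraction)
    return avg >= threshold                               # boolean (seq_len, seq_len) mask

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention with disallowed positions pushed to -inf before softmax."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ v

# Usage with random stand-ins for real attention statistics and activations.
rng = np.random.default_rng(0)
maps = softmax(rng.normal(size=(512, 16, 16)))            # pretend these are observed maps
mask = collect_mask(maps, keep_fraction=0.1)
q, k, v = (rng.normal(size=(16, 32)) for _ in range(3))
out = masked_attention(q, k, v, mask)
print("kept cells:", mask.sum(), "of", mask.size)
```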
The concepts of unitary evolution matrices and associative memory have boosted the field of Recurrent Neural Networks (RNN) to state-of-the-art performance in a variety of sequential tasks. However, RNNs still have a limited capacity to manipulate long-term memory. To bypass this weakness, the most successful applications of RNNs use external techniques such as attention mechanisms. In this paper, we propose a novel RNN model that unifies the state-of-the-art approaches: the Rotational Unit of Memory (RUM). The core of RUM is its rotational operation, which is, naturally, a unitary matrix, providing architectures with the power to learn long-term dependencies by overcoming the vanishing and exploding gradients problem. Moreover, the rotational unit also serves as associative memory. We evaluate our model on synthetic memorization, question answering and language modeling tasks. RUM learns the Copying Memory task completely and improves the state-of-the-art result in the Recall task. RUM'...
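The sketch below shows one way to construct the kind of rotational operation at RUM's core: an orthogonal matrix that rotates within the plane spanned by two vectors (by the angle between them) and acts as the identity on the orthogonal complement. Applying such matrices to the hidden state preserves its norm, which is the property used against vanishing and exploding gradients. The construction and the roles of the two vectors are illustrative, not the cell's exact parameterization.

```python
# Rotation in the plane spanned by a and b, identity elsewhere (illustrative sketch).
import numpy as np

def rotation_between(a, b, eps=1e-8):
    """Orthogonal matrix rotating direction a toward direction b in their common plane."""
    n = a.shape[0]
    u = a / (np.linalg.norm(a) + eps)
    w = b - (u @ b) * u                          # component of b orthogonal to a
    if np.linalg.norm(w) < eps:                  # a and b (anti)parallel: nothing to rotate
        return np.eye(n)
    v = w / np.linalg.norm(w)
    cos = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    sin = np.sqrt(max(0.0, 1.0 - cos ** 2))
    return (np.eye(n)
            + (cos - 1.0) * (np.outer(u, u) + np.outer(v, v))
            + sin * (np.outer(v, u) - np.outer(u, v)))

# Usage: rotate a hidden state by the rotation defined by an "input" and a "memory" vector.
rng = np.random.default_rng(0)
a, b, h = (rng.normal(size=8) for _ in range(3))
R = rotation_between(a, b)
print("orthogonal:", np.allclose(R @ R.T, np.eye(8)))
print("norm preserved:", np.isclose(np.linalg.norm(R @ h), np.linalg.norm(h)))
```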
The emergence of unsupervised word embeddings, pre-trained on very large monolingual text corpora, is at the core of the ongoing neural revolution in Natural Language Processing (NLP). Initially introduced for English, such pre-trained word embeddings quickly emerged for a number of other languages. Subsequently, there have been a number of attempts to align the embedding spaces across languages, which could enable a number of cross-language NLP applications. Performing the alignment using unsupervised cross-lingual learning (UCL) is especially attractive as it requires little data and often rivals supervised and semi-supervised approaches. Here, we analyze popular methods for UCL and we find that often their objectives are, intrinsically, versions of the Wasserstein-Procrustes problem. Hence, we devise an approach to solve Wasserstein-Procrustes in a direct way, which can be used to refine and to improve popular UCL methods such as iterative closest point (ICP), multilingual unsupe...
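A minimal sketch of the alternation underlying Wasserstein-Procrustes (not the paper's solver): with the correspondence fixed, orthogonal Procrustes has a closed-form SVD solution; with the rotation fixed, the correspondence is recomputed, here with the Hungarian algorithm on a small vocabulary. A real system would need Sinkhorn or batching for large vocabularies, and a careful initialization, since a random start can land in a poor local optimum.

```python
# Alternating Procrustes / assignment sketch for aligning two point sets.
import numpy as np
from scipy.optimize import linear_sum_assignment

def procrustes(X, Y):
    """Orthogonal Q minimizing ||X Q - Y||_F, from the SVD of X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def wasserstein_procrustes(X, Y, iters=20):
    Q = np.eye(X.shape[1])
    for _ in range(iters):
        cost = -(X @ Q) @ Y.T                    # negative similarity as assignment cost
        rows, cols = linear_sum_assignment(cost) # 1-1 correspondence between the two sets
        P = np.zeros((len(X), len(Y)))
        P[rows, cols] = 1.0
        Q = procrustes(X, P @ Y)                 # re-fit the rotation to the matched pairs
    return Q, P

# Usage on synthetic "embeddings": Y is a shuffled, rotated copy of X.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))
true_Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
Y = (X @ true_Q)[rng.permutation(100)]
Q, P = wasserstein_procrustes(X, Y)
print("alignment residual:", np.linalg.norm(X @ Q - P @ Y))
```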