Showing 1–4 of 4 results for author: Gressmann, F

Searching in archive cs.
  1. arXiv:2108.06277  [pdf, other]

    cs.CL cs.LG

    Towards Structured Dynamic Sparse Pre-Training of BERT

    Authors: Anastasia Dietrich, Frithjof Gressmann, Douglas Orr, Ivan Chelombiev, Daniel Justus, Carlo Luschi

    Abstract: Identifying algorithms for computationally efficient unsupervised training of large language models is an important and active area of research. In this work, we develop and study a straightforward, dynamic always-sparse pre-training approach for the BERT language modeling task, which leverages periodic compression steps based on magnitude pruning followed by random parameter re-allocation. This approac…

    Submitted 13 August, 2021; originally announced August 2021.

  2. arXiv:2106.05822  [pdf, other]

    cs.CL cs.LG

    GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures

    Authors: Ivan Chelombiev, Daniel Justus, Douglas Orr, Anastasia Dietrich, Frithjof Gressmann, Alexandros Koliousis, Carlo Luschi

    Abstract: Attention-based language models have become a critical component in state-of-the-art natural language processing systems. However, these models have significant computational requirements, due to long training times, dense operations and large parameter counts. In this work we demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture. First,…

    Submitted 10 June, 2021; originally announced June 2021.

  3. arXiv:2011.04720  [pdf, other]

    cs.LG cs.NE stat.ML

    Improving Neural Network Training in Low Dimensional Random Bases

    Authors: Frithjof Gressmann, Zach Eaton-Rosen, Carlo Luschi

    Abstract: Stochastic Gradient Descent (SGD) has proven to be remarkably effective in optimizing deep neural networks that employ ever-larger numbers of parameters. Yet, improving the efficiency of large-scale optimization remains a vital and highly active area of research. Recent work has shown that deep neural networks can be optimized in randomly-projected subspaces of much smaller dimensionality than the…

    Submitted 9 November, 2020; originally announced November 2020.

    Comments: Published in NeurIPS 2020

  4. arXiv:1801.00753  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Probabilistic supervised learning

    Authors: Frithjof Gressmann, Franz J. Király, Bilal Mateen, Harald Oberhauser

    Abstract: Predictive modelling and supervised learning are central to modern data science. With predictions from an ever-expanding number of supervised black-box strategies (e.g., kernel methods, random forests, deep learning aka neural networks) being employed as a basis for decision-making processes, it is crucial to understand the statistical uncertainty associated with these predictions. As a genera…

    Submitted 7 May, 2019; v1 submitted 2 January, 2018; originally announced January 2018.