Pfeiffer, Jonas (2023)
Modular and Parameter-efficient Fine-tuning of Language Models.
Technische Universität Darmstadt
doi: 10.26083/tuprints-00024565
Ph.D. Thesis, Primary publication, Publisher's Version
PhD_thesis_Jonas (9).pdf (Text, 19 MB). License: CC BY-SA 4.0 International (Creative Commons Attribution-ShareAlike).
| Item Type: | Ph.D. Thesis |
| --- | --- |
| Type of entry: | Primary publication |
| Title: | Modular and Parameter-efficient Fine-tuning of Language Models |
| Language: | English |
| Referees: | Gurevych, Prof. Dr. Iryna ; Glavaš, Prof. Dr. Goran ; Vulić, Prof. Dr. Ivan |
| Date: | 7 November 2023 |
| Place of Publication: | Darmstadt |
| Collation: | xiv, 164 pages |
| Date of oral examination: | 21 April 2023 |
| DOI: | 10.26083/tuprints-00024565 |
| Abstract: | Transfer learning has recently become the dominant paradigm of natural language processing. Models pre-trained on unlabeled data can be fine-tuned for downstream tasks based on only a handful of examples. A long-term goal is to develop models that acquire new information at scale without incurring negative transfer and that generalize systematically to new settings. Modular deep learning has emerged as a promising solution to these challenges, by updating parameter-efficient units of computation locally and asynchronously. These units are often implemented as modules that are interlaid between layers, interpolated with pre-trained parameters, or concatenated to the inputs. Conditioned on tasks or examples, information is routed to multiple modules through a fixed or learned function, followed by an aggregation of their outputs. This property enables compositional generalization by disentangling knowledge and recombining it in new ways. In this thesis, we provide a unified view of modularity in natural language processing spanning four dimensions; specifically, we disentangle modularity into computation functions, routing functions, aggregation functions, and the training setting. Along those axes, we propose multiple contributions: a research framework which encompasses all dimensions; a novel attention-based aggregation function which combines the knowledge stored within different modules; routing mechanisms for out-of-distribution generalization in cross-lingual transfer scenarios; a dataset and modular training strategies for multimodal and multilingual transfer learning; and a modular pre-training strategy to tackle catastrophic interference of heterogeneous data. |
| Status: | Publisher's Version |
| URN: | urn:nbn:de:tuda-tuprints-245651 |
| Classification DDC: | 000 Generalities, computers, information > 004 Computer science |
| Divisions: | 20 Department of Computer Science > Ubiquitous Knowledge Processing |
| TU-Projects: | HMWK \| LOEWE \| emergenC TP Gurevych |
| Date Deposited: | 07 Nov 2023 15:38 |
| Last Modified: | 21 Nov 2023 09:04 |
| URI: | https://tuprints.ulb.tu-darmstadt.de/id/eprint/24565 |
| PPN: | 513349545 |
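
The abstract describes the building blocks of modular deep learning: parameter-efficient computation modules inserted between layers, routing of information to modules, and an aggregation function (here attention-based) that combines their outputs. The following is a minimal, hypothetical PyTorch sketch of a bottleneck adapter and an attention-based aggregation over several adapters' outputs. It is not code from the thesis or from the associated adapter libraries, and all class and parameter names (`BottleneckAdapter`, `AttentionAggregation`, `bottleneck_dim`) are assumptions made for illustration.

```python
# Illustrative sketch only (not the thesis implementation): a bottleneck adapter
# and a learned attention over several adapters' outputs, assuming hidden states
# of shape (batch, seq_len, hidden_dim) from a pre-trained Transformer.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Small residual module inserted between Transformer layers."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # down-projection
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # up-projection
        self.act = nn.ReLU()

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the pre-trained representation intact.
        return hidden + self.up(self.act(self.down(hidden)))


class AttentionAggregation(nn.Module):
    """Combine the outputs of several adapters with learned attention weights."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.query = nn.Linear(hidden_dim, hidden_dim)
        self.key = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, hidden: torch.Tensor,
                adapter_outputs: list[torch.Tensor]) -> torch.Tensor:
        values = torch.stack(adapter_outputs, dim=2)        # (b, s, n_adapters, h)
        q = self.query(hidden).unsqueeze(2)                 # (b, s, 1, h)
        k = self.key(values)                                # (b, s, n_adapters, h)
        scores = (q * k).sum(-1) / hidden.size(-1) ** 0.5   # (b, s, n_adapters)
        weights = scores.softmax(dim=-1).unsqueeze(-1)      # attention over adapters
        return (weights * values).sum(dim=2)                # weighted combination


# Toy usage: pass one hidden state through two task adapters and aggregate.
hidden = torch.randn(2, 8, 768)
adapters = [BottleneckAdapter(768) for _ in range(2)]
fusion = AttentionAggregation(768)
combined = fusion(hidden, [a(hidden) for a in adapters])
print(combined.shape)  # torch.Size([2, 8, 768])
```

In a setup like the one the abstract describes, the pre-trained Transformer and the individual adapters would typically stay frozen while only the aggregation parameters are trained, which keeps the fine-tuning parameter-efficient.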