Adaptive immune receptor repertoires (AIRR) are key targets for biomedical research as they record past and ongoing adaptive immune responses. The capacity of machine learning (ML) to identify complex discriminative sequence patterns renders it an ideal approach for AIRR-based diagnostic and therapeutic discovery. To date, widespread adoption of AIRR ML has been inhibited by a lack of reproducibility, transparency, and interoperability. immuneML (immuneml.uio.no) addresses these concerns by implementing each step of the AIRR ML process in an extensible, open-source software ecosystem that is based on fully specified and shareable workflows. To facilitate widespread user adoption, immuneML is available as a command-line tool and through an intuitive Galaxy web interface, and extensive documentation of workflows is provided. We demonstrate the broad applicability of immuneML by (i) reproducing a large-scale study on immune state prediction, (ii) developing, integrating, and applying a...
Celiac disease (CeD) is a common autoimmune disorder caused by an abnormal immune response to dietary gluten proteins. The disease has high heritability. HLA is the major susceptibility factor, and the HLA effect is mediated via presentation of deamidated gluten peptides by disease-associated HLA-DQ variants to CD4+ T cells. In addition to gluten-specific CD4+ T cells, the patients have antibodies to transglutaminase 2 (autoantigen) and deamidated gluten peptides. These disease-specific antibodies recognize defined epitopes, and they display common usage of specific heavy and light chains across patients. Interactions between T cells and B cells are likely central in the pathogenesis, but how the repertoires of naïve T and B cells relate to the pathogenic effector cells is unexplored. To this end, we applied machine learning classification models to naïve B cell receptor (BCR) repertoires from CeD patients and healthy controls. Strikingly, we obtained a promising classification perfor...
Germline variations in immunoglobulin genes influence the repertoire of B cell receptors and antibodies, and such polymorphisms may impact disease susceptibility. However, knowledge of the genomic variation of the immunoglobulin loci is scarce. Here, we report 25 novel germline IGHV alleles as inferred from rearranged naïve B cell cDNA repertoires of 98 individuals. Thirteen novel alleles were selected for validation, of which ten were successfully confirmed by targeted amplification and Sanger sequencing of non-B cell DNA. Moreover, we detected a high degree of variability upstream of the V-region in the 5’UTR, leader 1, and leader 2 sequences, and found that identical V-region alleles can differ in upstream sequences. Thus, we have identified large genetic variation not only in the V-region but also in the upstream sequences of IGHV genes. Our findings challenge current approaches used for annotating immunoglobulin repertoire sequencing data.
DNA methylation and demethylation at cytosine residues are epigenetic modifications that regulate gene expression associated with early cell development, somatic cell differentiation, cellular reprogramming and malignant transformation. While the process of DNA methylation and maintenance by DNA methyltransferases is well described, the nature of DNA demethylation remains poorly understood. The current model of DNA demethylation proposes modification of 5-methylcytosine followed by DNA repair-dependent cytosine substitution. Although there is debate on the extent of its involvement in DNA demethylation, activation-induced cytidine deaminase (AID) has recently emerged as an enzyme that is capable of deaminating 5-methylcytosine to thymine, creating a T:G mismatch which can be repaired back to cytosine through DNA repair pathways. AID is expressed at low levels in many tissue types but is most highly expressed in germinal center B cells where it deaminates cytidine to uracil during so...
The adaptive immune receptor repertoire (AIRR) contains information on an individual's immune past, present and potential in the form of the evolving sequences that encode the B cell receptor (BCR) repertoire. AIRR sequencing (AIRR-seq) studies rely on databases of known BCR germline variable (V), diversity (D) and joining (J) genes to detect somatic mutations in AIRR-seq data via comparison to the best-aligning database alleles. However, it has been shown that these databases are far from complete, leading to systematic misidentification of mutated positions in subsets of sample sequences. We previously presented TIgGER, a computational method to identify subject-specific V gene genotypes, including the presence of novel V gene alleles, directly from AIRR-seq data. However, the original algorithm was unable to detect alleles that differed by more than 5 single nucleotide polymorphisms (SNPs) from a database allele. Here we present and apply an improved version of the TIgGER algorit...
Recent advances in data acquisition technologies in biology have led to major challenges in mining relevant information from large datasets. For example, single-cell RNA sequencing technologies are producing expression and sequence information from tens of thousands of cells in every single experiment. A common task in analyzing biological data is to cluster samples or features (e.g. genes) into groups sharing common characteristics. This is an NP-hard problem for which numerous heuristic algorithms have been developed. However, in many cases, the clusters created by these algorithms do not reflect biological reality. To overcome this, a Networks Based Clustering (NBC) approach was recently proposed, by which the samples or genes in the dataset are first mapped to a network and then community detection (CD) algorithms are used to identify clusters of nodes. Here, we created an open and flexible Python-based toolkit for NBC that enables easy and accessible network construction and comm...
Analysis of antibody repertoires by high-throughput sequencing is of major importance in understanding adaptive immune responses. Our knowledge of variations in the genomic loci encoding antibody genes is incomplete, mostly due to technical difficulties in aligning short reads to these highly repetitive loci. This partial knowledge results in conflicting V-D-J gene assignments between different algorithms, and biased genotype and haplotype inference. Previous studies have shown that haplotypes can be inferred by taking advantage of IGHJ6 heterozygosity, observed in approximately one third of the population. Here, we propose a robust novel method for determining V-D-J haplotypes by adapting a Bayesian framework. Our method extends haplotype inference to IGHD- and IGHV-based analysis, thereby enabling inference of complex genetic events like deletions and copy number variations in the entire population. We generated the largest multi-individual data set, to date, of naïve B-cell reper...
Cancer immunotherapy has made enormous progress in offering safer and more effective treatments for the disease. Specifically, programmed death ligand 1 antibody (αPDL1), designed to perform immune checkpoint blockade (ICB), is now considered a pillar in cancer immunotherapy. However, due to the complexity and heterogeneity of tumors, as well as the diversity in patient response, ICB therapy only has a 30% success rate, at most; moreover, the efficacy of ICB can be evaluated only two months after start of treatment. Therefore, early identification of potential responders and nonresponders to therapy, using noninvasive means, is crucial for improving treatment decisions. Here, we report a straightforward approach for fast, image-guided prediction of therapeutic response to ICB. In a colon cancer mouse model, we demonstrate that the combination of computed tomography imaging and gold nanoparticles conjugated to αPDL1 allowed prediction of therapeutic response, as early as 48 h after t...
The role of B cells and posttranslational modifications in pathogenesis of organ-specific immune diseases is increasingly envisioned but remains poorly understood, particularly in human disorders. In celiac disease, transglutaminase 2-modified (TG2-modified; deamidated) gluten peptides drive disease-specific T cell and B cell responses, and antibodies to deamidated gluten peptides are excellent diagnostic markers. Here, we substantiate by high-throughput sequencing of IGHV genes that antibodies to a disease-specific, deamidated, and immunodominant B cell epitope of gluten (PLQPEQPFP) have biased and stereotyped usage of IGHV3-23 and IGHV3-15 gene segments with modest somatic mutations. X-ray crystal structures of 2 prototype IGHV3-15/IGKV4-1 and IGHV3-23/IGLV4-69 antibodies reveal peptide interaction mainly via germline-encoded residues. In-depth mutational analysis showed restricted selection and substitution patterns at positions involved in antigen binding. While the IGHV3-15/IGK...
The T cell receptor (TCR) controls the cellular adaptive immune response to antigens, but our understanding of TCR repertoire diversity and response to challenge is still incomplete. For example, TCR clones shared by different individuals with minimal alteration to germline gene sequences (public clones) are detectable in all vertebrates, but their significance is unknown. Although small in size, the zebrafish TCR repertoire is controlled by processes similar to those operating in mammals. Thus, we studied the zebrafish TCR repertoire and its response to stimulation with self and foreign antigens. We found that cross-reactive public TCRs dominate the T cell response, endowing a limited TCR repertoire with the ability to cope with diverse antigenic challenges. These features of vertebrate public TCRs might provide a mechanism for the rapid generation of protective T cell immunity, allowing a short temporal window for the development of more specific private T cell responses.
Papers by Gur Yaari