Functional genomics: Difference between revisions
{{Short description|Field of molecular biology}}
{{cs1 config |name-list-style=vanc|display-authors=6}}
'''Functional genomics''' is a field of [[molecular biology]] that attempts to describe [[gene]] (and [[protein]]) functions and interactions. Functional genomics makes use of the vast data generated by [[genomics|genomic]] and [[Transcriptomics|transcriptomic]] projects (such as [[genome project|genome sequencing projects]] and [[RNA-Seq|RNA sequencing]]). Functional genomics focuses on the dynamic aspects such as gene [[transcription (genetics)|transcription]], [[translation (biology)|translation]], [[regulation of gene expression]] and [[protein–protein interaction]]s, as opposed to the static aspects of the genomic information such as [[DNA sequence]] or structures. A key characteristic of functional genomics studies is their genome-wide approach to these questions, generally involving high-throughput methods rather than a more traditional "candidate-gene" approach.
==Definition and goals==
To understand functional genomics, it is important to first define function. In their paper,<ref>{{cite journal | vauthors = Graur D, Zheng Y, Price N, Azevedo RB, Zufall RA, Elhaik E | title = On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE | journal = Genome Biology and Evolution | volume = 5 | issue = 3 | pages = 578–90 | date = 20 February 2013 | pmid = 23431001 | doi = 10.1093/gbe/evt028 | pmc=3622293}}</ref> Graur et al. define function in two possible ways: "selected effect" and "causal role". The "selected effect" function refers to the function for which a trait (DNA, RNA, protein, etc.) has been selected. The "causal role" function refers to the function that a trait is sufficient and necessary for. Functional genomics usually tests the "causal role" definition of function.

The goal of functional genomics is to understand the function of genes or proteins, and eventually of all components of a genome. The term functional genomics is often used to refer to the many technical approaches to study an organism's genes and proteins, including the "biochemical, cellular, and/or physiological properties of each and every gene product",<ref name="Gibson and Muse">{{cite book |vauthors=Gibson G, Muse SV |title=A primer of genome science |edition=3rd |publisher=Sinauer Associates |location=Sunderland, MA}}</ref> while some authors include the study of nongenic elements in their definition.<ref name="Pevsner">{{cite book | vauthors = Pevsner J |year=2009 |title=Bioinformatics and functional genomics | url = https://archive.org/details/bioinformaticsfu00pevs_0 | url-access = registration |edition=2nd |publisher=Wiley-Blackwell |location=Hoboken, NJ|isbn=9780470085851 }}</ref> Functional genomics may also include studies of natural genetic variation over time (such as an organism's development) or space (such as its body regions), as well as functional disruptions such as mutations.

The promise of functional genomics is to generate and synthesize genomic and proteomic knowledge into an understanding of the dynamic properties of an organism. This could potentially provide a more complete picture of how the genome specifies function compared to studies of single genes. Integration of functional genomics data is often a part of [[systems biology]] approaches.
====DNA/Protein interactions====
{{Main|ChIP sequencing}}

Proteins formed by the translation of mRNA (messenger RNA, which carries coded information from DNA for protein synthesis) play a major role in regulating gene expression. To understand how they regulate gene expression, it is necessary to identify the DNA sequences that they interact with. Techniques have been developed to identify sites of DNA-protein interactions. These include [[ChIP-sequencing]], [[CUT&RUN sequencing]] and Calling Cards.<ref>{{cite journal | vauthors = Wang H, Mayhew D, Chen X, Johnston M, Mitra RD | title = Calling Cards enable multiplexed identification of the genomic targets of DNA-binding proteins | journal = Genome Research | volume = 21 | issue = 5 | pages = 748–55 | date = May 2011 | pmid = 21471402 | doi = 10.1101/gr.114850.110 | pmc = 3083092 }}</ref>
====DNA accessibility assays====
Assays have been developed to identify regions of the genome that are accessible. These regions of accessible chromatin are candidate regulatory regions. These assays include [[ATAC-seq]], [[DNase-Seq]] and [[FAIRE-Seq]].
===At the RNA level===
{{Main|DNA microarray}}
[[File:DNA microarray.svg|thumb|A [[DNA microarray]]]]

Microarrays measure the amount of mRNA in a sample that corresponds to a given gene or probe DNA sequence. Probe sequences are immobilized on a solid surface and allowed to [[Nucleic acid hybridization|hybridize]] with fluorescently labeled "target" mRNA. The intensity of fluorescence of a spot is proportional to the amount of target sequence that has hybridized to that spot and therefore to the abundance of that mRNA sequence in the sample. Microarrays allow for the identification of candidate genes involved in a given process based on variation between transcript levels for different conditions and shared expression patterns with genes of known function.
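As a minimal sketch of how candidate genes might be flagged from such intensity data, the following Python snippet computes log2 fold changes between two conditions and keeps genes above a threshold. The gene names, intensity values, pseudocount and threshold are invented for illustration; real analyses also involve background correction, normalization and statistical testing.

```python
import math

def log2_fold_changes(control, treated, pseudocount=1.0):
    """Return log2(treated / control) per gene; the pseudocount avoids log(0)."""
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
        if gene in treated
    }

def candidate_genes(fold_changes, threshold=1.0):
    """Genes whose absolute log2 fold change reaches the (assumed) threshold."""
    return sorted(g for g, lfc in fold_changes.items() if abs(lfc) >= threshold)

# Hypothetical spot intensities for three genes in two conditions.
control = {"geneA": 99.0, "geneB": 199.0, "geneC": 49.0}
treated = {"geneA": 399.0, "geneB": 199.0, "geneC": 24.0}

lfc = log2_fold_changes(control, treated)
print(candidate_genes(lfc))
```

Working on a log scale treats a doubling and a halving of signal symmetrically, which is why the threshold is applied to the absolute value.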
====SAGE====
RNA sequencing has largely replaced microarray and SAGE technology in recent years, as noted in 2016, and has become the most efficient way to study transcription and gene expression. This is typically done by [[DNA sequencing|next-generation sequencing]].<ref>{{cite journal | vauthors = Hrdlickova R, Toloue M, Tian B | title = RNA-Seq methods for transcriptome analysis | journal = Wiley Interdisciplinary Reviews: RNA | volume = 8 | issue = 1 | pages = e1364 | date = January 2017 | pmid = 27198714 | pmc = 5717752 | doi = 10.1002/wrna.1364 }}</ref>

A subset of sequenced RNAs are small RNAs, a class of non-coding RNA molecules that are key regulators of transcriptional and post-transcriptional gene silencing, or [[RNA silencing]]. Next-generation sequencing is the gold standard tool for [[non-coding RNA]] discovery, profiling and expression analysis.
====Massively Parallel Reporter Assays (MPRAs)====
Massively parallel reporter assays are a technology to test the cis-regulatory activity of DNA sequences.<ref>{{cite journal | vauthors = Kwasnieski JC, Fiore C, Chaudhari HG, Cohen BA | title = High-throughput functional testing of ENCODE segmentation predictions | journal = Genome Research | volume = 24 | issue = 10 | pages = 1595–602 | date = October 2014 | pmid = 25035418 | pmc = 4199366 | doi = 10.1101/gr.173518.114 }}</ref><ref>{{cite journal | vauthors = Patwardhan RP, Hiatt JB, Witten DM, Kim MJ, Smith RP, May D, Lee C, Andrie JM, Lee SI, Cooper GM, Ahituv N, Pennacchio LA, Shendure J | title = Massively parallel functional dissection of mammalian enhancers in vivo | journal = Nature Biotechnology | volume = 30 | issue = 3 | pages = 265–70 | date = February 2012 | pmid = 22371081 | pmc = 3402344 | doi = 10.1038/nbt.2136 }}</ref> MPRAs use a [[plasmid]] with a synthetic cis-regulatory element upstream of a promoter driving a synthetic reporter gene such as green fluorescent protein. MPRAs are usually used to test a library of cis-regulatory elements, which can contain from hundreds to thousands of elements. The cis-regulatory activity of the elements is assayed by using the downstream reporter activity. The activity of all the library members is assayed in parallel using barcodes for each cis-regulatory element. One limitation of MPRAs is that the activity is assayed on a plasmid and may not capture all aspects of gene regulation observed in the genome.
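The barcode-based readout can be illustrated with a short Python sketch. It assumes, purely for illustration, that each barcode maps to one element and that activity is summarized as the ratio of summed RNA barcode counts to summed plasmid DNA barcode counts; the barcode sequences, element names and counts are hypothetical, and published MPRA analyses use more careful statistics.

```python
from collections import defaultdict

def element_activity(barcode_to_element, rna_counts, dna_counts):
    """Sum barcode counts per element and return the RNA/DNA ratio per element."""
    rna = defaultdict(int)
    dna = defaultdict(int)
    for barcode, element in barcode_to_element.items():
        rna[element] += rna_counts.get(barcode, 0)
        dna[element] += dna_counts.get(barcode, 0)
    # Elements with no plasmid DNA counts cannot be normalized and are skipped.
    return {element: rna[element] / dna[element] for element in dna if dna[element] > 0}

# Hypothetical library: two barcodes for one enhancer, one for a control element.
barcode_to_element = {"AAAC": "enhancer_1", "AAGT": "enhancer_1", "CCTG": "control"}
rna_counts = {"AAAC": 30, "AAGT": 50, "CCTG": 10}
dna_counts = {"AAAC": 10, "AAGT": 10, "CCTG": 10}

activity = element_activity(barcode_to_element, rna_counts, dna_counts)
print(activity)
```

Pooling several barcodes per element reduces the influence of any single barcode sequence on the measured activity.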
====STARR-seq====
{{Main|STARR-seq}}
STARR-seq is a technique similar to MPRAs to assay enhancer activity of randomly sheared genomic fragments. In the original publication,<ref>{{cite journal | vauthors = Arnold CD, Gerlach D, Stelzer C, Boryń ŁM, Rath M, Stark A | title = Genome-wide quantitative enhancer activity maps identified by STARR-seq | journal = Science | volume = 339 | issue = 6123 | pages = 1074–7 | date = March 2013 | pmid = 23328393 | doi = 10.1126/science.1232542 | bibcode = 2013Sci...339.1074A | s2cid = 54488955 }}</ref> randomly sheared fragments of the ''Drosophila'' genome were placed downstream of a minimal promoter. Candidate enhancers among the randomly sheared fragments drive their own transcription from the minimal promoter. By using sequencing as a readout and controlling for the input amount of each sequence, the strength of putative enhancers is assayed by this method.
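Controlling for input amounts can be sketched as follows in Python. This is an illustrative simplification, not the analysis used in the original publication: each fragment's read count is converted to a fraction of its library, and enrichment is the ratio of the RNA fraction to the input DNA fraction. Fragment names and read counts are hypothetical.

```python
def input_normalized_enrichment(rna_reads, input_reads):
    """Ratio of a fragment's share of RNA reads to its share of input DNA reads."""
    rna_total = sum(rna_reads.values())
    input_total = sum(input_reads.values())
    enrichment = {}
    for fragment, input_count in input_reads.items():
        if input_count == 0:
            continue  # a fragment absent from the input library cannot be normalized
        rna_fraction = rna_reads.get(fragment, 0) / rna_total
        input_fraction = input_count / input_total
        enrichment[fragment] = rna_fraction / input_fraction
    return enrichment

# Hypothetical read counts for three sheared fragments.
input_reads = {"frag1": 100, "frag2": 100, "frag3": 200}
rna_reads = {"frag1": 400, "frag2": 50, "frag3": 350}

enrichment = input_normalized_enrichment(rna_reads, input_reads)
print(enrichment)
```

Values above 1 indicate fragments that are over-represented among transcripts relative to their abundance in the input library.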
====Perturb-seq====
A yeast [[two-hybrid screening]] (Y2H) tests a "bait" protein against many potential interacting proteins ("prey") to identify physical protein–protein interactions. This system is based on a transcription factor, originally GAL4,<ref name=pmid2547163>{{cite journal | vauthors = Fields S, Song O | title = A novel genetic system to detect protein-protein interactions | journal = Nature | volume = 340 | issue = 6230 | pages = 245–6 | date = July 1989 | pmid = 2547163 | doi = 10.1038/340245a0 | bibcode = 1989Natur.340..245F | s2cid = 4320733 }}</ref> whose separate DNA-binding and transcription activation domains are both required in order for the protein to cause transcription of a reporter gene. In a Y2H screen, the "bait" protein is fused to the binding domain of GAL4, and a library of potential "prey" (interacting) proteins is recombinantly expressed in a vector with the activation domain. In vivo interaction of bait and prey proteins in a yeast cell brings the activation and binding domains of GAL4 close enough together to result in expression of a [[reporter gene]]. It is also possible to systematically test a library of bait proteins against a library of prey proteins to identify all possible interactions in a cell.
====MS and AP/MS====
{{Main|Protein mass spectrometry|Affinity purification}}

[[Mass spectrometry]] (MS) can identify proteins and their relative levels, and can therefore be used to study protein expression. When combined with [[affinity purification]] (AP/MS), mass spectrometry can be used to study protein complexes, that is, which proteins interact with one another in complexes and in which ratios. To purify protein complexes, a "bait" protein is usually tagged with a specific protein or peptide that can be used to pull the complex out of a complex mixture. The purification is usually done using an antibody or a compound that binds to the fusion part. The proteins are then digested into short [[peptide]] fragments and mass spectrometry is used to identify the proteins based on the mass-to-charge ratios of those fragments.
====Deep mutational scanning====
In deep mutational scanning, every possible amino acid change in a given protein is first synthesized.<ref>{{cite journal |last1=Araya |first1=Carlos |last2=Fowler |first2=Douglas |title=Deep mutational scanning: assessing protein function on a massive scale |journal=Trends in Biotechnology |date=September 29, 2011 |volume=29 |issue=9 |pages=435–442 |doi=10.1016/j.tibtech.2011.04.003 |pmid=21561674|pmc=3159719 }}</ref> The activity of each of these protein variants is assayed in parallel using barcodes for each variant.<ref>{{cite journal | vauthors = Penn WD, McKee AG, Kuntz CP, Woods H, Nash V, Gruenhagen TC, Roushar FJ, Chandak M, Hemmerich C, Rusch DB, Meiler J, Schlebach JP| title = Probing biophysical sequence constraints within the transmembrane domains of rhodopsin by deep mutational scanning| journal = Sci Adv | volume = 6 | issue = 10 | pages = eaay7505| date = March 2020 | pmid = 32181350 | doi = 10.1126/sciadv.aay7505| pmc = 7056298 | bibcode = 2020SciA....6.7505P}}</ref> By comparing the activity to the wild-type protein, the effect of each mutation is identified. While it is possible to assay every possible single amino-acid change, two or more concurrent mutations are hard to test because the number of combinations grows rapidly. Deep mutational scanning experiments have also been used to infer protein structure and protein-protein interactions.<ref>{{cite journal |last1=Rollins |first1=N.J. |last2=Brock |first2=K.P. |last3=Poelwijk |first3=F.J |last4=Marks |first4=Debora |title=Inferring protein 3D structure from deep mutation scans |journal=Nature Genetics |date=2019 |volume=51 |issue=7 |pages=1170–1176 |doi=10.1038/s41588-019-0432-9 |pmid=31209393 |pmc=7295002 }}</ref> Deep mutational scanning is an example of a multiplexed assay of variant effect (MAVE), a family of methods that involve mutagenesis of a DNA-encoded protein or regulatory element followed by a multiplexed assay for some aspect of function. MAVEs enable the generation of "variant effect maps" characterizing aspects of the function of every possible single nucleotide change in a gene or functional element of interest.<ref>{{cite journal |last1=Fowler |first1=DM |last2=Adams |first2=DJ |last3=Gloyn |first3=AL |last4=Starita |first4=Lea |title=An Atlas of Variant Effects to understand the genome at nucleotide resolution |journal=Genome Biology |date=2023 |volume=24 |issue=1 |page=147 |doi=10.1186/s13059-023-02986-x |doi-access=free |pmid=37394429|pmc=10316620 }}</ref>
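The combinatorial argument and the comparison to wild type can be made concrete with a small Python sketch. It assumes 19 alternative amino acids per position (ignoring stop codons) and scores a variant as the log2 ratio of its enrichment to the wild-type enrichment; the function names, the 100-residue protein and the read counts are hypothetical.

```python
import math

def n_single_variants(length, alternatives=19):
    """Number of single amino-acid substitutions in a protein of given length."""
    return alternatives * length

def n_double_variants(length, alternatives=19):
    """Number of variants carrying substitutions at two different positions."""
    return math.comb(length, 2) * alternatives ** 2

def relative_score(pre, post, wt_pre, wt_post):
    """log2 enrichment of a variant (post/pre counts) relative to wild type."""
    return math.log2((post / pre) / (wt_post / wt_pre))

# A hypothetical 100-residue protein.
print(n_single_variants(100), n_double_variants(100))

# Hypothetical read counts before and after selection.
print(relative_score(pre=100, post=50, wt_pre=100, wt_post=200))
```

For a 100-residue protein this gives 1,900 single variants but about 1.8 million double variants, which illustrates why exhaustive testing of concurrent mutations quickly becomes impractical.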
===Mutagenesis and phenotyping===

An important functional feature of genes is the phenotype caused by mutations. Mutants can be produced by random mutations or by directed mutagenesis, including site-directed mutagenesis, deleting complete genes, or other techniques.
====Knock-outs (gene deletions)====
Gene function can be investigated by systematically "knocking out" genes one by one. This is done by either [[gene knockout|deletion]] or disruption of function (such as by [[insertional mutagenesis]]), and the resulting organisms are screened for phenotypes that provide clues to the function of the disrupted gene. Knock-outs have been produced for whole genomes, that is, by deleting each gene of a genome individually. For [[essential gene]]s, this is not possible, so other techniques are used, e.g. deleting a gene while expressing the gene from a [[plasmid]], using an inducible promoter, so that the level of gene product can be changed at will (and thus a "functional" deletion achieved).
====Site-directed mutagenesis====

[[Site-directed mutagenesis]] is used to mutate specific bases (and thus [[amino acid]]s). This is critical for investigating the function of specific amino acids in a protein, e.g. in the active site of an [[enzyme]].
====RNAi====
{{Main|RNAi}}

[[RNA interference]] (RNAi) methods can be used to transiently silence or knock down gene expression using ~20 base-pair double-stranded RNA, typically delivered by transfection of synthetic ~20-mer short-interfering RNA molecules (siRNAs) or by virally encoded short-hairpin RNAs (shRNAs). RNAi screens, typically performed in cell culture-based assays or experimental organisms (such as ''C. elegans''), can be used to systematically disrupt nearly every gene in a genome or subsets of genes (sub-genomes); possible functions of disrupted genes can be assigned based on observed [[phenotype]]s.
====CRISPR screens====
[[File:Journal.pbio.2006951.g001-B.png|thumb|upright=1.5|An example of a CRISPR loss-of-function screen<ref>{{cite journal |title=Genome-wide CRISPR screens for Shiga toxins and ricin reveal Golgi proteins critical for glycosylation |vauthors=Tian S, Muneeruddin K, Choi MY, Tao L, Bhuiyan RH, Ohmi Y, Furukawa K, Furukawa K, Boland S, Shaffer SA, Adam RM, Dong M |date=27 November 2018 |journal= PLOS Biology |volume=16 |issue=11 |at=e2006951 |doi-access=free |doi=10.1371/journal.pbio.2006951|pmid=30481169 |pmc=6258472 }}</ref>]]
CRISPR-Cas9 has been used to delete genes in a multiplexed manner in cell lines. Quantifying the amount of guide RNAs for each gene before and after the experiment can point towards essential genes. If a guide RNA disrupts an essential gene, it will lead to the loss of that cell, and hence that particular guide RNA will be depleted after the screen. In a CRISPR-Cas9 experiment in mammalian cell lines, around 2000 genes were found to be essential in multiple cell lines.<ref>{{cite journal | vauthors = Hart T, Chandrashekhar M, Aregger M, Steinhart Z, Brown KR, MacLeod G, Mis M, Zimmermann M, Fradet-Turcotte A, Sun S, Mero P, Dirks P, Sidhu S, Roth FP, Rissland OS, Durocher D, Angers S, Moffat J | title = High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype-Specific Cancer Liabilities | journal = Cell | volume = 163 | issue = 6 | pages = 1515–26 | date = December 2015 | pmid = 26627737 | doi = 10.1016/j.cell.2015.11.015 | doi-access = free }}</ref><ref>{{cite journal | vauthors = Shalem O, Sanjana NE, Hartenian E, Shi X, Scott DA, Mikkelson T, Heckl D, Ebert BL, Root DE, Doench JG, Zhang F | title = Genome-scale CRISPR-Cas9 knockout screening in human cells | journal = Science | volume = 343 | issue = 6166 | pages = 84–87 | date = January 2014 | pmid = 24336571 | pmc = 4089965 | doi = 10.1126/science.1247005 | bibcode = 2014Sci...343...84S }}</ref> Some of these genes were essential in only one cell line. Most of these genes are part of multi-protein complexes. This approach can be used to identify synthetic lethality by using the appropriate genetic background. CRISPRi and CRISPRa enable loss-of-function and gain-of-function screens in a similar manner. CRISPRi identified ~2100 essential genes in the K562 cell line.<ref>{{cite journal | vauthors = Gilbert LA, Horlbeck MA, Adamson B, Villalta JE, Chen Y, Whitehead EH, Guimaraes C, Panning B, Ploegh HL, Bassik MC, Qi LS, Kampmann M, Weissman JS | title = Genome-Scale CRISPR-Mediated Control of Gene Repression and Activation | journal = Cell | volume = 159 | issue = 3 | pages = 647–61 | date = October 2014 | pmid = 25307932 | pmc = 4253859 | doi = 10.1016/j.cell.2014.09.029 }}</ref><ref>{{cite journal | vauthors = Horlbeck MA, Gilbert LA, Villalta JE, Adamson B, Pak RA, Chen Y, Fields AP, Park CY, Corn JE, Kampmann M, Weissman JS | title = Compact and highly active next-generation libraries for CRISPR-mediated gene repression and activation | journal = eLife | volume = 5 | date = September 2016 | pmid = 27661255 | pmc = 5094855 | doi = 10.7554/eLife.19760 | doi-access = free }}</ref> CRISPR deletion screens have also been used to identify potential regulatory elements of a gene. For example, a technique called ScanDel was published which attempted this approach. The authors deleted regions outside a gene of interest (''HPRT1'', involved in a Mendelian disorder) in an attempt to identify regulatory elements of this gene.<ref>{{cite journal | vauthors = Gasperini M, Findlay GM, McKenna A, Milbank JH, Lee C, Zhang MD, Cusanovich DA, Shendure J | title = CRISPR/Cas9-Mediated Scanning for Regulatory Elements Required for HPRT1 Expression via Thousands of Large, Programmed Genomic Deletions | journal = American Journal of Human Genetics | volume = 101 | issue = 2 | pages = 192–205 | date = August 2017 | pmid = 28712454 | pmc = 5544381 | doi = 10.1016/j.ajhg.2017.06.010 }}</ref> Gasperini et al. did not identify any distal regulatory elements for ''HPRT1'' using this approach; however, such approaches can be extended to other genes of interest.
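The guide-depletion logic of a loss-of-function screen can be sketched in Python as follows. This is a simplified illustration rather than the method of any cited study: guide counts are converted to library fractions, a log2 fold change is computed per guide, and guides are summarized per gene by the median. The guide names, gene names, counts and pseudocount are hypothetical.

```python
import math
from statistics import median

def guide_log2_fold_change(before, after, pseudocount=1):
    """log2 change in each guide's share of the library after the screen."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    lfc = {}
    for guide, count in before.items():
        fraction_before = (count + pseudocount) / total_before
        fraction_after = (after.get(guide, 0) + pseudocount) / total_after
        lfc[guide] = math.log2(fraction_after / fraction_before)
    return lfc

def gene_scores(lfc, guide_to_gene):
    """Median guide-level log2 fold change per targeted gene."""
    per_gene = {}
    for guide, value in lfc.items():
        per_gene.setdefault(guide_to_gene[guide], []).append(value)
    return {gene: median(values) for gene, values in per_gene.items()}

# Hypothetical counts: two guides per gene, sequenced before and after the screen.
guide_to_gene = {"g1": "essential_gene", "g2": "essential_gene",
                 "g3": "neutral_gene", "g4": "neutral_gene"}
before = {"g1": 100, "g2": 100, "g3": 100, "g4": 100}
after = {"g1": 10, "g2": 15, "g3": 190, "g4": 185}

scores = gene_scores(guide_log2_fold_change(before, after), guide_to_gene)
print(scores)
```

A strongly negative gene score indicates that cells carrying guides against that gene were lost during the screen, which is the signature expected for an essential gene.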
===Functional annotations for genes===
== Bioinformatics methods for Functional genomics ==
Because of the large quantity of data produced by these techniques and the desire to find biologically meaningful patterns, [[bioinformatics]] is crucial to analysis of functional genomics data. Examples of techniques in this class are [[data clustering]] or [[principal component analysis]] for unsupervised [[machine learning]] (class detection) as well as [[artificial neural network]]s or [[support vector machine]]s for supervised machine learning (class prediction, [[Statistical classification|classification]]). Functional enrichment analysis is used to determine the extent of over- or under-expression (positive or negative regulators in the case of RNAi screens) of functional categories relative to a background set. [[Gene ontology]]-based enrichment analysis is provided by [[Gene set enrichment analysis#DAVID|DAVID]] and [[gene set enrichment analysis]] (GSEA),<ref>{{cite journal | vauthors = Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP | title = Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles | journal = Proceedings of the National Academy of Sciences of the United States of America | volume = 102 | issue = 43 | pages = 15545–50 | date = October 2005 | pmid = 16199517 | pmc = 1239896 | doi = 10.1073/pnas.0506580102 | bibcode = 2005PNAS..10215545S | doi-access = free }}</ref> pathway-based analysis by Ingenuity<ref>{{cite web |url=http://www.ingenuity.com/ |archive-url=https://web.archive.org/web/19990125100225/http://www.ingenuity.com/ |url-status=dead |archive-date=1999-01-25 |title=Ingenuity Systems |access-date=2007-12-31 }}</ref> and Pathway Studio,<ref>{{cite web |url=http://www.ariadnegenomics.com/products/pathway-studio/ |title=Ariadne Genomics: Pathway Studio |access-date=2007-12-31 |archive-url=https://web.archive.org/web/20071230035556/http://www.ariadnegenomics.com/products/pathway-studio |archive-date=2007-12-30 |url-status=dead }}</ref> and protein complex-based analysis by COMPLEAT.<ref>{{cite journal | vauthors = Vinayagam A, Hu Y, Kulkarni M, Roesel C, Sopko R, Mohr SE, Perrimon N | title = Protein complex-based analysis framework for high-throughput data sets | journal = Science Signaling | volume = 6 | issue = 264 | pages = rs5 | date = February 2013 | pmid = 23443684 | pmc = 3756668 | doi = 10.1126/scisignal.2003629 | url = http://www.flyrnai.org/compleat/ }}</ref>
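One simple way to quantify over-representation of a functional category relative to a background set is a hypergeometric test, sketched below in Python. This is a generic illustration and not the exact statistic used by any of the tools named above (GSEA, for example, uses a rank-based statistic); the function name and the gene numbers are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """P(X >= overlap) when `sample` genes are drawn without replacement from
    `population` background genes, of which `annotated` belong to the category."""
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(population, sample)

# Hypothetical example: 20 background genes, 5 of them in the category;
# a hit list of 5 genes contains 3 category members.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=5, overlap=3)
print(p_value)
```

In practice such a test is repeated for many categories, so the resulting p-values are usually corrected for multiple testing.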
[[File:Phydms.jpg|thumb|An overview of a phydms workflow]]
New computational methods have been developed for understanding the results of a deep mutational scanning experiment. 'phydms' compares the result of a deep mutational scanning experiment to a phylogenetic tree.<ref name="Hilton_2017">{{cite journal | vauthors = Hilton SK, Doud MB, Bloom JD | title = phydms: software for phylogenetic analyses informed by deep mutational scanning | journal = PeerJ | volume = 5 | pages = e3657 | date = 2017 | pmid = 28785526 | pmc = 5541924 | doi = 10.7717/peerj.3657 | doi-access = free }}</ref> This allows the user to infer whether the selection process in nature applies similar constraints on a protein as the results of the deep mutational scan indicate. This may allow an experimenter to choose between different experimental conditions based on how well they reflect nature. Deep mutational scanning has also been used to infer protein-protein interactions.<ref>{{cite journal | vauthors = Diss G, Lehner B | title = The genetic landscape of a physical interaction | journal = eLife | volume = 7 | date = April 2018 | pmid = 29638215 | pmc = 5896888 | doi = 10.7554/eLife.32472 | doi-access = free }}</ref> The authors used a thermodynamic model to predict the effects of mutations in different parts of a dimer. Deep mutational scanning can also be used to infer protein structure. Strong positive epistasis between two mutations in a deep mutational scan can be indicative of two parts of the protein that are close to each other in 3-D space. This information can then be used to infer protein structure. A proof of principle of this approach was shown by two groups using the protein GB1.<ref>{{cite journal | vauthors = Schmiedel JM, Lehner B | title = Determining protein structures using deep mutagenesis | journal = Nature Genetics | volume = 51 | issue = 7 | pages = 1177–1186 | date = July 2019 | pmid = 31209395 | pmc = 7610650 | doi = 10.1038/s41588-019-0431-x | doi-access = free }}</ref><ref>{{cite journal | vauthors = Rollins NJ, Brock KP, Poelwijk FJ, Stiffler MA, Gauthier NP, Sander C, Marks DS | title = Inferring protein 3D structure from deep mutation scans | journal = Nature Genetics | volume = 51 | issue = 7 | pages = 1170–1176 | date = July 2019 | pmid = 31209393 | pmc = 7295002 | doi = 10.1038/s41588-019-0432-9 }}</ref>
||
Results from MPRA experiments have required machine learning approaches to interpret the data. A gapped k-mer SVM model has been used to infer the kmers that are enriched within cis-regulatory sequences with high activity compared to sequences with lower activity.<ref>{{cite journal | vauthors = Ghandi M, Lee D, Mohammad-Noori M, Beer MA | title = Enhanced regulatory sequence prediction using gapped k-mer features | journal = PLOS Computational Biology | volume = 10 | issue = 7 | pages = e1003711 | date = July 2014 | pmid = 25033408 | pmc = 4102394 | doi = 10.1371/journal.pcbi.1003711 | doi-access = free | bibcode = 2014PLSCB..10E3711G }}</ref> These models provide high predictive power. Deep learning and random forest approaches have also been used to interpret the results of these high-dimensional experiments.<ref>{{cite journal | vauthors = Li Y, Shi W, Wasserman WW | title = Genome-wide prediction of cis-regulatory regions using supervised deep learning methods | journal = BMC Bioinformatics | volume = 19 | issue = 1 | pages = 202 | date = May 2018 | pmid = 29855387 | pmc = 5984344 | doi = 10.1186/s12859-018-2187-1 }}</ref> These models are beginning to help develop a better understanding of non-coding DNA function towards gene-regulation. |
Results from MPRA experiments have required machine learning approaches to interpret the data. A gapped k-mer SVM model has been used to infer the kmers that are enriched within cis-regulatory sequences with high activity compared to sequences with lower activity.<ref>{{cite journal | vauthors = Ghandi M, Lee D, Mohammad-Noori M, Beer MA | title = Enhanced regulatory sequence prediction using gapped k-mer features | journal = PLOS Computational Biology | volume = 10 | issue = 7 | pages = e1003711 | date = July 2014 | pmid = 25033408 | pmc = 4102394 | doi = 10.1371/journal.pcbi.1003711 | doi-access = free | bibcode = 2014PLSCB..10E3711G }}</ref> These models provide high predictive power. Deep learning and random forest approaches have also been used to interpret the results of these high-dimensional experiments.<ref>{{cite journal | vauthors = Li Y, Shi W, Wasserman WW | title = Genome-wide prediction of cis-regulatory regions using supervised deep learning methods | journal = BMC Bioinformatics | volume = 19 | issue = 1 | pages = 202 | date = May 2018 | pmid = 29855387 | pmc = 5984344 | doi = 10.1186/s12859-018-2187-1 | doi-access = free }}</ref> These models are beginning to help develop a better understanding of [[non-coding DNA]] function towards gene-regulation. |
||
==Consortium projects |
==Consortium projects== |
||
=== The ENCODE project === |
=== The ENCODE project === |
||
{{Main|ENCODE}} |
{{Main|ENCODE}} |
||
The ENCODE (Encyclopedia of DNA elements) project is an in-depth analysis of the human genome whose goal is to identify all the functional elements of genomic DNA, in both coding and |
The [[ENCODE]] (Encyclopedia of DNA elements) project is an in-depth analysis of the human genome whose goal is to identify all the functional elements of genomic DNA, in both coding and non-coding regions. Important results include evidence from genomic tiling arrays that most nucleotides are transcribed as coding transcripts, non-coding RNAs, or random transcripts, the discovery of additional transcriptional regulatory sites, further elucidation of chromatin-modifying mechanisms. |
||
=== The Genotype-Tissue Expression (GTEx) project === |
=== The Genotype-Tissue Expression (GTEx) project === |
||
[[File:Nature24277-f1.jpg|thumb|Samples used and eQTLs discovered in GTEx v6]] |
[[File:Nature24277-f1.jpg|thumb|Samples used and eQTLs discovered in GTEx v6]] |
||
The GTEx project is a human genetics project aimed at understanding the role of genetic variation in shaping variation in the transcriptome across tissues. The project has collected a variety of tissue samples (> 50 different tissues) from more than 700 post-mortem donors. This has resulted in the collection of >11,000 samples. GTEx has helped understand the tissue-sharing and tissue-specificity of [[ |
The GTEx project is a human genetics project aimed at understanding the role of genetic variation in shaping variation in the transcriptome across tissues. The project has collected a variety of tissue samples (> 50 different tissues) from more than 700 post-mortem donors. This has resulted in the collection of >11,000 samples. GTEx has helped understand the tissue-sharing and tissue-specificity of [[eQTL]]s.<ref>{{cite journal | vauthors = Battle A, Brown CD, Engelhardt BE, Montgomery SB | collaboration = GTEx Consortium | title = Genetic effects on gene expression across human tissues | journal = Nature | volume = 550 | issue = 7675 | pages = 204–213 | date = October 2017 | pmid = 29022597 | pmc = 5776756 | doi = 10.1038/nature24277 | bibcode = 2017Natur.550..204A }}</ref> The genomic resource was developed to "enrich our understanding of how differences in our DNA sequence contribute to health and disease."<ref>{{Cite web|url=https://commonfund.nih.gov/highlights2017|title=GTEx Creates a Reference Data Set to Study Genetic Changes and Gene Expression|access-date=2022-01-13| publisher = U.S. National Institutes of Health | work = Office of Strategic Coordination - The Common Fund|date=8 February 2018 }}</ref> |
||
=== The Atlas of Variant Effects Alliance === |
|||
[https://www.varianteffect.org/ The Atlas of Variant Effects Alliance] (AVE),<ref>{{cite web |title=Atlas of Variant Effects Alliance |url=https://ror.org/00p2ftz29 |website=Research Organization Registry}}</ref> founded in 2020, is an international consortium aiming to catalog the impact of all possible genetic variants for disease-related functional genomics by creating variant effect maps that reveal the function of every possible single nucleotide change in a gene or regulatory element. AVE is funded in part through the Brotman Baty Institute at the University of Washington and the National Human Genome Research Institute, via funding from the Center of Excellence in Genome Science grant (NHGRI RM1HG010461).<ref name=":0">{{Cite web |title=Scientists Launch 'Herculean' Project Creating Atlas of Human Genome Variants {{!}} Brotman Baty Institute |url=http://brotmanbaty.org/news/scientists-launch-herculean-project-creating-atlas-of-human-genome-variants |access-date=2024-02-05 |website=brotmanbaty.org |language=en}}</ref> |
|||
== See also == |
== See also == |
||
{{div col|colwidth=18em}} |
{{div col|colwidth=18em}} |
||
*[[Systems biology]] |
* [[Systems biology]] |
||
*[[Structural genomics]] |
* [[Structural genomics]] |
||
*[[Comparative genomics]] |
* [[Comparative genomics]] |
||
*[[Pharmacogenomics]] |
* [[Pharmacogenomics]] |
||
*[[MGED Society]] |
* [[MGED Society]] |
||
*[[Epigenetics]] |
* [[Epigenetics]] |
||
*[[Bioinformatics]] |
* [[Bioinformatics]] |
||
*[[Epistasis and functional genomics]] |
* [[Epistasis and functional genomics]] |
||
*[[Synthetic viability]] |
* [[Synthetic viability]] |
||
*[[Protein function prediction]] |
* [[Protein function prediction]] |
||
{{div col end}} |
{{div col end}} |
||
Line 128: | Line 138: | ||
== External links == |
== External links == |
||
* [http://archives.esf.org/coordinating-research/research-networking-programmes/life-earth-and-environmental-sciences-lee/completed-esf-research-networking-programmes-in-life-earth-and-environmental-sciences/ffg.html European Science Foundation Programme on Frontiers of Functional Genomics] |
|||
*[http://www.ebi.ac.uk/training/online/course/functional-genomics-introduction-ebi-resources Functional genomics: An introduction to the EBI resources on Train OnLine] |
|||
*[http://www. |
* [http://www.mugen-noe.org/ MUGEN NoE] — Integrated Functional Genomics in Mutant Mouse Models |
||
*[http://www.mugen-noe.org/ MUGEN NoE] — Integrated Functional Genomics in Mutant Mouse Models |
|||
*[http://www.bioinfbook.org/ ''Bioinformatics and functional genomics''] — Companion site for ''Bioinformatics and functional genomics, 2nd ed.'' |
|||
*[http://www.esffg2010.org/ 4th European Science Foundation Conference on Functional Genomics and Disease ] |
|||
{{genomics-footer}} |
{{genomics-footer}} |
||
Line 141: | Line 148: | ||
[[Category:Molecular biology]] |
[[Category:Molecular biology]] |
||
[[Category:Genomics]] |
[[Category:Genomics]] |
||
[[Category:Omics]] |
Latest revision as of 04:45, 28 May 2024
Functional genomics is a field of molecular biology that attempts to describe gene (and protein) functions and interactions. Functional genomics makes use of the vast data generated by genomic and transcriptomic projects (such as genome sequencing projects and RNA sequencing). Functional genomics focuses on the dynamic aspects such as gene transcription, translation, regulation of gene expression and protein–protein interactions, as opposed to the static aspects of the genomic information such as DNA sequence or structures. A key characteristic of functional genomics studies is their genome-wide approach to these questions, generally involving high-throughput methods rather than a more traditional "candidate-gene" approach.
Definition and goals
To understand functional genomics, it is important to first define function. Graur et al.[1] define function in two possible ways: "selected effect" and "causal role". The "selected effect" function is the function for which a trait (DNA, RNA, protein, etc.) was selected. The "causal role" function is the function for which a trait is sufficient and necessary. Functional genomics usually tests the "causal role" definition of function.
The goal of functional genomics is to understand the function of genes or proteins, eventually all components of a genome. The term functional genomics is often used to refer to the many technical approaches to study an organism's genes and proteins, including the "biochemical, cellular, and/or physiological properties of each and every gene product"[2] while some authors include the study of nongenic elements in their definition.[3] Functional genomics may also include studies of natural genetic variation over time (such as an organism's development) or space (such as its body regions), as well as functional disruptions such as mutations.
The promise of functional genomics is to generate and synthesize genomic and proteomic knowledge into an understanding of the dynamic properties of an organism. This could potentially provide a more complete picture of how the genome specifies function compared to studies of single genes. Integration of functional genomics data is often a part of systems biology approaches.
Techniques and applications
Functional genomics includes function-related aspects of the genome itself such as mutation and polymorphism (such as single nucleotide polymorphism (SNP) analysis), as well as the measurement of molecular activities. The latter comprise a number of "-omics" such as transcriptomics (gene expression), proteomics (protein production), and metabolomics. Functional genomics uses mostly multiplex techniques to measure the abundance of many or all gene products such as mRNAs or proteins within a biological sample. A more focused functional genomics approach might test the function of all variants of one gene and quantify the effects of mutants by using sequencing as a readout of activity. Together these measurement modalities endeavor to quantitate the various biological processes and improve our understanding of gene and protein functions and interactions.
At the DNA level
Genetic interaction mapping
Systematic pairwise deletion of genes or inhibition of gene expression can be used to identify genes with related function, even if they do not interact physically. Epistasis refers to the fact that effects for two different gene knockouts may not be additive; that is, the phenotype that results when two genes are inhibited may be different from the sum of the effects of single knockouts.
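The non-additivity described above can be illustrated with a small calculation. The following is a minimal sketch, assuming an additive null model and a single quantitative phenotype such as growth rate; the function name and the measurements are hypothetical and not taken from any cited study.

```python
def epistasis_score(wild_type, single_a, single_b, double_ab):
    """Deviation of the double-knockout effect from the sum of single effects.

    All arguments are phenotype measurements (for example, growth rate).
    A score near zero means the two knockouts combine additively.
    """
    effect_a = single_a - wild_type
    effect_b = single_b - wild_type
    effect_ab = double_ab - wild_type
    return effect_ab - (effect_a + effect_b)


# Hypothetical growth rates: the double knockout is sicker than expected.
score = epistasis_score(wild_type=1.0, single_a=0.9, single_b=0.8, double_ab=0.3)
print(f"epistasis score: {score:.2f}")
```

A negative score indicates that the double knockout is more severe than the additive expectation, and a positive score that it is milder. Other null models (for example, multiplicative) are also used in practice; the additive one is chosen here only because it matches the description in the text.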
DNA/Protein interactions
Proteins formed by the translation of mRNA (messenger RNA, which carries coded information from DNA for protein synthesis) play a major role in regulating gene expression. To understand how they regulate gene expression, it is necessary to identify the DNA sequences that they interact with. Techniques have been developed to identify sites of DNA-protein interactions. These include ChIP-sequencing, CUT&RUN sequencing and Calling Cards.[4]
DNA accessibility assays
Assays have been developed to identify regions of the genome that are accessible. These regions of accessible chromatin are candidate regulatory regions. These assays include ATAC-seq, DNase-Seq and FAIRE-Seq.
At the RNA level
Microarrays
Microarrays measure the amount of mRNA in a sample that corresponds to a given gene or probe DNA sequence. Probe sequences are immobilized on a solid surface and allowed to hybridize with fluorescently labeled "target" mRNA. The intensity of fluorescence of a spot is proportional to the amount of target sequence that has hybridized to that spot and therefore to the abundance of that mRNA sequence in the sample. Microarrays allow for the identification of candidate genes involved in a given process based on variation between transcript levels for different conditions and shared expression patterns with genes of known function.
SAGE
Serial analysis of gene expression (SAGE) is an alternate method of analysis based on RNA sequencing rather than hybridization. SAGE relies on the sequencing of 10–17 base pair tags which are unique to each gene. These tags are produced from poly-A mRNA and ligated end-to-end before sequencing. SAGE gives an unbiased measurement of the number of transcripts per cell, since it does not depend on prior knowledge of what transcripts to study (as microarrays do).
RNA sequencing
RNA sequencing has largely replaced microarray and SAGE technology in recent years, as noted in 2016, and has become the most efficient way to study transcription and gene expression. This is typically done by next-generation sequencing.[5]
A subset of sequenced RNAs are small RNAs, a class of non-coding RNA molecules that are key regulators of transcriptional and post-transcriptional gene silencing, or RNA silencing. Next-generation sequencing is the gold standard tool for non-coding RNA discovery, profiling and expression analysis.
Massively Parallel Reporter Assays (MPRAs)
Massively parallel reporter assays are a technology to test the cis-regulatory activity of DNA sequences.[6][7] MPRAs use a plasmid with a synthetic cis-regulatory element upstream of a promoter driving a synthetic gene such as Green Fluorescent Protein. A library of cis-regulatory elements is usually tested using MPRAs; a library can contain from hundreds to thousands of cis-regulatory elements. The cis-regulatory activity of the elements is assayed by using the downstream reporter activity. The activity of all the library members is assayed in parallel using barcodes for each cis-regulatory element. One limitation of MPRAs is that the activity is assayed on a plasmid and may not capture all aspects of gene regulation observed in the genome.
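The barcode-based readout can be sketched as follows. This is an illustrative simplification, assuming that activity is summarized as the log2 ratio of barcode counts in reporter RNA to barcode counts in the plasmid DNA input; the function name, the pseudocount and the example counts are hypothetical.

```python
import math
from collections import defaultdict


def mpra_activity(barcode_to_element, rna_counts, dna_counts, pseudocount=1.0):
    """Per-element activity as log2(RNA barcode counts / DNA barcode counts).

    Counts from all barcodes assigned to the same element are pooled first.
    The pseudocount avoids division by zero for unobserved barcodes.
    """
    rna = defaultdict(float)
    dna = defaultdict(float)
    for barcode, element in barcode_to_element.items():
        rna[element] += rna_counts.get(barcode, 0)
        dna[element] += dna_counts.get(barcode, 0)
    return {
        element: math.log2((rna[element] + pseudocount) / (dna[element] + pseudocount))
        for element in dna
    }


# Hypothetical library: two barcodes tag one enhancer, one tags a control.
barcodes = {"bc1": "enhancer_A", "bc2": "enhancer_A", "bc3": "control"}
rna = {"bc1": 30, "bc2": 33, "bc3": 7}
dna = {"bc1": 8, "bc2": 7, "bc3": 7}
print(mpra_activity(barcodes, rna, dna))
```

Pooling barcodes before taking the ratio is one of several possible choices; averaging per-barcode ratios is another, and real analyses typically also normalize for sequencing depth.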
STARR-seq
STARR-seq is a technique similar to MPRAs to assay enhancer activity of randomly sheared genomic fragments. In the original publication,[8] randomly sheared fragments of the Drosophila genome were placed downstream of a minimal promoter. Candidate enhancers amongst the randomly sheared fragments will transcribe themselves using the minimal promoter. By using sequencing as a readout and controlling for input amounts of each sequence, the strength of putative enhancers is assayed by this method.
Perturb-seq
Perturb-seq couples CRISPR-mediated gene knockdowns with single-cell gene expression. Linear models are used to calculate the effect of the knockdown of a single gene on the expression of multiple genes.
At the protein level
Yeast two-hybrid system
A yeast two-hybrid screening (Y2H) tests a "bait" protein against many potential interacting proteins ("prey") to identify physical protein–protein interactions. This system is based on a transcription factor, originally GAL4,[9] whose separate DNA-binding and transcription activation domains are both required in order for the protein to cause transcription of a reporter gene. In a Y2H screen, the "bait" protein is fused to the binding domain of GAL4, and a library of potential "prey" (interacting) proteins is recombinantly expressed in a vector with the activation domain. In vivo interaction of bait and prey proteins in a yeast cell brings the activation and binding domains of GAL4 close enough together to result in expression of a reporter gene. It is also possible to systematically test a library of bait proteins against a library of prey proteins to identify all possible interactions in a cell.
MS and AP/MS
Mass spectrometry (MS) can identify proteins and their relative levels, hence it can be used to study protein expression. When used in combination with affinity purification, mass spectrometry (AP/MS) can be used to study protein complexes, that is, which proteins interact with one another in complexes and in which ratios. In order to purify protein complexes, usually a "bait" protein is tagged with a specific protein or peptide that can be used to pull out the complex from a complex mix. The purification is usually done using an antibody or a compound that binds to the fusion part. The proteins are then digested into short peptide fragments and mass spectrometry is used to identify the proteins based on the mass-to-charge ratios of those fragments.
Deep mutational scanning
In deep mutational scanning, every possible amino acid change in a given protein is first synthesized.[10] The activity of each of these protein variants is assayed in parallel using barcodes for each variant.[11] By comparing the activity to the wild-type protein, the effect of each mutation is identified. While it is possible to assay every possible single amino-acid change, combinatorics makes two or more concurrent mutations hard to test. Deep mutational scanning experiments have also been used to infer protein structure and protein-protein interactions.[12] Deep mutational scanning is an example of a multiplexed assay of variant effect (MAVE), a family of methods that involve mutagenesis of a DNA-encoded protein or regulatory element followed by a multiplexed assay for some aspect of function. MAVEs enable the generation of ‘variant effect maps’ characterizing aspects of the function of every possible single nucleotide change in a gene or functional element of interest.[13]
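The comparison to wild type can be made concrete with a simple enrichment score. The sketch below assumes that variant counts are available before and after selection and scores each variant as the log2 of its enrichment ratio relative to the wild-type ratio; the function name, the "WT" label and the counts are hypothetical, and published pipelines use more elaborate statistics.

```python
import math


def dms_scores(pre, post, wild_type="WT", pseudocount=0.5):
    """Score each variant as log2(variant enrichment / wild-type enrichment).

    `pre` and `post` map variant names to read counts before and after
    selection. A score of 0 means the variant behaves like wild type.
    """
    def ratio(variant):
        return (post.get(variant, 0) + pseudocount) / (pre.get(variant, 0) + pseudocount)

    wt_ratio = ratio(wild_type)
    return {
        variant: math.log2(ratio(variant) / wt_ratio)
        for variant in pre
        if variant != wild_type
    }


# Hypothetical counts: A2G is neutral, L5P is depleted after selection.
pre_counts = {"WT": 100, "A2G": 100, "L5P": 100}
post_counts = {"WT": 200, "A2G": 200, "L5P": 50}
print(dms_scores(pre_counts, post_counts, pseudocount=0.0))
```

The same calculation also shows why multiple concurrent mutations are hard to cover: a protein of length L has 19 × L single amino-acid variants, but the number of double mutants grows roughly with the square of that figure.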
Mutagenesis and phenotyping
An important functional feature of genes is the phenotype caused by mutations. Mutants can be produced by random mutations or by directed mutagenesis, including site-directed mutagenesis, deleting complete genes, or other techniques.
Knock-outs (gene deletions)
Gene function can be investigated by systematically "knocking out" genes one by one. This is done by either deletion or disruption of function (such as by insertional mutagenesis) and the resulting organisms are screened for phenotypes that provide clues to the function of the disrupted gene. Knock-outs have been produced for whole genomes, i.e. by deleting all genes in a genome. For essential genes, this is not possible, so other techniques are used, e.g. deleting a gene while expressing the gene from a plasmid, using an inducible promoter, so that the level of gene product can be changed at will (and thus a "functional" deletion achieved).
Site-directed mutagenesis
Site-directed mutagenesis is used to mutate specific bases (and thus amino acids). This is critical to investigate the function of specific amino acids in a protein, e.g. in the active site of an enzyme.
RNAi
RNA interference (RNAi) methods can be used to transiently silence or knockdown gene expression using ~20 base-pair double-stranded RNA typically delivered by transfection of synthetic ~20-mer short-interfering RNA molecules (siRNAs) or by virally encoded short-hairpin RNAs (shRNAs). RNAi screens, typically performed in cell culture-based assays or experimental organisms (such as C. elegans) can be used to systematically disrupt nearly every gene in a genome or subsets of genes (sub-genomes); possible functions of disrupted genes can be assigned based on observed phenotypes.
CRISPR screens
CRISPR-Cas9 has been used to delete genes in a multiplexed manner in cell-lines. Quantifying the amount of guide-RNAs for each gene before and after the experiment can point towards essential genes. If a guide-RNA disrupts an essential gene it will lead to the loss of that cell and hence there will be a depletion of that particular guide-RNA after the screen. In a recent CRISPR-Cas9 experiment in mammalian cell-lines, around 2000 genes were found to be essential in multiple cell-lines.[15][16] Some of these genes were essential in only one cell-line. Most of these genes are part of multi-protein complexes. This approach can be used to identify synthetic lethality by using the appropriate genetic background. CRISPRi and CRISPRa enable loss-of-function and gain-of-function screens in a similar manner. CRISPRi identified ~2100 essential genes in the K562 cell-line.[17][18] CRISPR deletion screens have also been used to identify potential regulatory elements of a gene. For example, a technique called ScanDel was published which attempted this approach. The authors deleted regions outside a gene of interest (HPRT1, involved in a Mendelian disorder) in an attempt to identify regulatory elements of this gene.[19] Gasperini et al. did not identify any distal regulatory elements for HPRT1 using this approach; however, such approaches can be extended to other genes of interest.
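The depletion logic of such a screen can be sketched in a few lines. The example below is a simplification under stated assumptions: guide counts are normalized by the total library size at each time point, each guide receives a log2 fold change, and a gene is summarized by the median over its guides. The function name, the pseudocount and the counts are hypothetical; dedicated analysis tools apply more careful statistics.

```python
import math
from statistics import median


def gene_log2_fold_changes(guide_to_gene, before, after, pseudocount=1.0):
    """Median log2 fold change of guide-RNA frequency for each gene.

    `before` and `after` map guide names to read counts. Strongly negative
    values indicate guides that were depleted during the screen.
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    per_gene = {}
    for guide, gene in guide_to_gene.items():
        freq_before = (before.get(guide, 0) + pseudocount) / total_before
        freq_after = (after.get(guide, 0) + pseudocount) / total_after
        per_gene.setdefault(gene, []).append(math.log2(freq_after / freq_before))
    return {gene: median(changes) for gene, changes in per_gene.items()}


# Hypothetical screen: guides against GENE_X drop out, GENE_Y guides do not.
guides = {"g1": "GENE_X", "g2": "GENE_X", "g3": "GENE_Y", "g4": "GENE_Y"}
counts_before = {"g1": 100, "g2": 100, "g3": 100, "g4": 100}
counts_after = {"g1": 10, "g2": 5, "g3": 200, "g4": 185}
print(gene_log2_fold_changes(guides, counts_before, counts_after))
```

In this toy data set GENE_X would be flagged as a candidate essential gene because both of its guides are depleted, whereas GENE_Y guides are retained.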
Functional annotations for genes
Genome annotation
Putative genes can be identified by scanning a genome for regions likely to encode proteins, based on characteristics such as long open reading frames, transcriptional initiation sequences, and polyadenylation sites. A sequence identified as a putative gene must be confirmed by further evidence, such as similarity to cDNA or EST sequences from the same organism, similarity of the predicted protein sequence to known proteins, association with promoter sequences, or evidence that mutating the sequence produces an observable phenotype.
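Scanning for long open reading frames, one of the characteristics mentioned above, can be sketched as follows. This is a deliberately minimal example: it assumes the standard start codon ATG and the three standard stop codons, searches only the forward strand, and ignores introns and the other signals listed in the text. The function name and the length threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Return (start, end) positions of forward-strand open reading frames.

    An ORF runs from the first ATG in a frame to the next in-frame stop
    codon. `end` is exclusive and includes the stop codon. `min_codons`
    counts the codons before the stop, including the ATG.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Hypothetical toy sequence containing a single short ORF.
dna = "CCATGAAACCCGGGTAACC"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```

Real gene finders use much larger length thresholds, search both strands, and combine ORF evidence with the additional lines of evidence described in the paragraph above.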
Rosetta stone approach
The Rosetta stone approach is a computational method for de-novo protein function prediction. It is based on the hypothesis that some proteins involved in a given physiological process may exist as two separate genes in one organism and as a single gene in another. Genomes are scanned for sequences that are independent in one organism and in a single open reading frame in another. If two genes have fused, it is predicted that they have similar biological functions that make such co-regulation advantageous.
Bioinformatics methods for Functional genomics
Because of the large quantity of data produced by these techniques and the desire to find biologically meaningful patterns, bioinformatics is crucial to analysis of functional genomics data. Examples of techniques in this class are data clustering or principal component analysis for unsupervised machine learning (class detection) as well as artificial neural networks or support vector machines for supervised machine learning (class prediction, classification). Functional enrichment analysis is used to determine the extent of over- or under-expression (positive or negative regulators in the case of RNAi screens) of functional categories relative to a background set. Gene ontology based enrichment analyses are provided by DAVID and gene set enrichment analysis (GSEA),[20] pathway based analysis by Ingenuity[21] and Pathway studio[22] and protein complex based analysis by COMPLEAT.[23]
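Over-representation of a functional category relative to a background set is often assessed with a hypergeometric (one-sided Fisher) test. The sketch below illustrates that general idea only; it is not the specific procedure implemented by any of the tools named above, and the function name and the numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(background_size, category_size, hit_count, overlap):
    """Probability of seeing at least `overlap` category genes among the hits.

    background_size: number of genes in the background set
    category_size:   background genes annotated with the category
    hit_count:       genes in the hit list (for example, screen hits)
    overlap:         hit-list genes annotated with the category
    """
    upper = min(category_size, hit_count)
    total = comb(background_size, hit_count)
    tail = sum(
        comb(category_size, k) * comb(background_size - category_size, hit_count - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 40 of 200 hits fall in a category of 500 genes,
# drawn from a background of 20,000 genes (5 would be expected by chance).
p = enrichment_p_value(background_size=20000, category_size=500,
                       hit_count=200, overlap=40)
print(f"p = {p:.3g}")
```

When many categories are tested at once, the resulting p-values need correction for multiple testing, which this sketch does not perform.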
New computational methods have been developed for understanding the results of a deep mutational scanning experiment. 'phydms' compares the result of a deep mutational scanning experiment to a phylogenetic tree.[24] This allows the user to infer if the selection process in nature applies similar constraints on a protein as the results of the deep mutational scan indicate. This may allow an experimenter to choose between different experimental conditions based on how well they reflect nature. Deep mutational scanning has also been used to infer protein-protein interactions.[25] The authors used a thermodynamic model to predict the effects of mutations in different parts of a dimer. Deep mutational scanning can also be used to infer protein structure. Strong positive epistasis between two mutations in a deep mutational scan can be indicative of two parts of the protein that are close to each other in 3-D space. This information can then be used to infer protein structure. A proof of principle of this approach was shown by two groups using the protein GB1.[26][27]
Results from MPRA experiments have required machine learning approaches to interpret the data. A gapped k-mer SVM model has been used to infer the k-mers that are enriched within cis-regulatory sequences with high activity compared to sequences with lower activity.[28] These models provide high predictive power. Deep learning and random forest approaches have also been used to interpret the results of these high-dimensional experiments.[29] These models are beginning to help develop a better understanding of the function of non-coding DNA in gene regulation.
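The notion of a gapped k-mer feature can be illustrated with a small counting routine. This sketch covers only feature extraction, not the support vector machine or the kernel computation of the cited work; it assumes a gapped k-mer is a word of fixed length in which a chosen number of positions are informative and the rest are wildcards (written here as "N"). The function and parameter names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(sequence, word_length=4, informative=3):
    """Count gapped k-mers: words of `word_length` with wildcard positions.

    For every window of `word_length` bases, each choice of `informative`
    positions is kept and the remaining positions are replaced by "N".
    """
    counts = Counter()
    for start in range(len(sequence) - word_length + 1):
        word = sequence[start:start + word_length]
        for kept in combinations(range(word_length), informative):
            feature = "".join(
                word[i] if i in kept else "N" for i in range(word_length)
            )
            counts[feature] += 1
    return counts


# Hypothetical short sequence; each window yields four gapped features.
print(gapped_kmer_counts("ACGTA"))
```

Feature vectors of this kind, computed for high-activity and low-activity sequences, are the sort of input a classifier could be trained on to find sequence patterns associated with regulatory activity.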
Consortium projects
The ENCODE project
The ENCODE (Encyclopedia of DNA elements) project is an in-depth analysis of the human genome whose goal is to identify all the functional elements of genomic DNA, in both coding and non-coding regions. Important results include evidence from genomic tiling arrays that most nucleotides are transcribed as coding transcripts, non-coding RNAs, or random transcripts; the discovery of additional transcriptional regulatory sites; and further elucidation of chromatin-modifying mechanisms.
The Genotype-Tissue Expression (GTEx) project
The GTEx project is a human genetics project aimed at understanding the role of genetic variation in shaping variation in the transcriptome across tissues. The project has collected a variety of tissue samples (> 50 different tissues) from more than 700 post-mortem donors. This has resulted in the collection of >11,000 samples. GTEx has helped understand the tissue-sharing and tissue-specificity of eQTLs.[30] The genomic resource was developed to "enrich our understanding of how differences in our DNA sequence contribute to health and disease."[31]
The Atlas of Variant Effects Alliance
The Atlas of Variant Effects Alliance (AVE),[32] founded in 2020, is an international consortium aiming to catalog the impact of all possible genetic variants for disease-related functional genomics by creating variant effect maps that reveal the function of every possible single nucleotide change in a gene or regulatory element. AVE is funded in part through the Brotman Baty Institute at the University of Washington and the National Human Genome Research Institute, via funding from the Center of Excellence in Genome Science grant (NHGRI RM1HG010461).[33]
See also
References
- ^ Graur D, Zheng Y, Price N, Azevedo RB, Zufall RA, Elhaik E (20 February 2013). "On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE". Genome Biology and Evolution. 5 (3): 578–90. doi:10.1093/gbe/evt028. PMC 3622293. PMID 23431001.
- ^ Gibson G, Muse SV. A primer of genome science (3rd ed.). Sunderland, MA: Sinauer Associates.
- ^ Pevsner J (2009). Bioinformatics and functional genomics (2nd ed.). Hoboken, NJ: Wiley-Blackwell. ISBN 9780470085851.
- ^ Wang H, Mayhew D, Chen X, Johnston M, Mitra RD (May 2011). "Calling Cards enable multiplexed identification of the genomic targets of DNA-binding proteins". Genome Research. 21 (5): 748–55. doi:10.1101/gr.114850.110. PMC 3083092. PMID 21471402.
- ^ Hrdlickova R, Toloue M, Tian B (January 2017). "RNA-Seq methods for transcriptome analysis". Wiley Interdisciplinary Reviews: RNA. 8 (1): e1364. doi:10.1002/wrna.1364. PMC 5717752. PMID 27198714.
- ^ Kwasnieski JC, Fiore C, Chaudhari HG, Cohen BA (October 2014). "High-throughput functional testing of ENCODE segmentation predictions". Genome Research. 24 (10): 1595–602. doi:10.1101/gr.173518.114. PMC 4199366. PMID 25035418.
- ^ Patwardhan RP, Hiatt JB, Witten DM, Kim MJ, Smith RP, May D, et al. (February 2012). "Massively parallel functional dissection of mammalian enhancers in vivo". Nature Biotechnology. 30 (3): 265–70. doi:10.1038/nbt.2136. PMC 3402344. PMID 22371081.
- ^ Arnold CD, Gerlach D, Stelzer C, Boryń ŁM, Rath M, Stark A (March 2013). "Genome-wide quantitative enhancer activity maps identified by STARR-seq". Science. 339 (6123): 1074–7. Bibcode:2013Sci...339.1074A. doi:10.1126/science.1232542. PMID 23328393. S2CID 54488955.
- ^ Fields S, Song O (July 1989). "A novel genetic system to detect protein-protein interactions". Nature. 340 (6230): 245–6. Bibcode:1989Natur.340..245F. doi:10.1038/340245a0. PMID 2547163. S2CID 4320733.
- ^ Araya C, Fowler D (September 29, 2011). "Deep mutational scanning: assessing protein function on a massive scale". Trends in Biotechnology. 29 (9): 435–442. doi:10.1016/j.tibtech.2011.04.003. PMC 3159719. PMID 21561674.
- ^ Penn WD, McKee AG, Kuntz CP, Woods H, Nash V, Gruenhagen TC, et al. (March 2020). "Probing biophysical sequence constraints within the transmembrane domains of rhodopsin by deep mutational scanning". Sci Adv. 6 (10): eaay7505. Bibcode:2020SciA....6.7505P. doi:10.1126/sciadv.aay7505. PMC 7056298. PMID 32181350.
- ^ Rollins N, Brock K, Poelwijk F, Marks D (2019). "Inferring protein 3D structure from deep mutation scans". Nature Genetics. 51 (7): 1170–1176. doi:10.1038/s41588-019-0432-9. PMC 7295002. PMID 31209393.
- ^ Fowler DM, Adams DJ, Gloyn AL, Starita L (2023). "An Atlas of Variant Effects to understand the genome at nucleotide resolution". Genome Biology. 24 (1): 147. doi:10.1186/s13059-023-02986-x. PMC 10316620. PMID 37394429.
- ^ Tian S, Muneeruddin K, Choi MY, Tao L, Bhuiyan RH, Ohmi Y, et al. (27 November 2018). "Genome-wide CRISPR screens for Shiga toxins and ricin reveal Golgi proteins critical for glycosylation". PLOS Biology. 16 (11): e2006951. doi:10.1371/journal.pbio.2006951. PMC 6258472. PMID 30481169.
- ^ Hart T, Chandrashekhar M, Aregger M, Steinhart Z, Brown KR, MacLeod G, et al. (December 2015). "High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype-Specific Cancer Liabilities". Cell. 163 (6): 1515–26. doi:10.1016/j.cell.2015.11.015. PMID 26627737.
- ^ Shalem O, Sanjana NE, Hartenian E, Shi X, Scott DA, Mikkelson T, et al. (January 2014). "Genome-scale CRISPR-Cas9 knockout screening in human cells". Science. 343 (6166): 84–87. Bibcode:2014Sci...343...84S. doi:10.1126/science.1247005. PMC 4089965. PMID 24336571.
- ^ Gilbert LA, Horlbeck MA, Adamson B, Villalta JE, Chen Y, Whitehead EH, et al. (October 2014). "Genome-Scale CRISPR-Mediated Control of Gene Repression and Activation". Cell. 159 (3): 647–61. doi:10.1016/j.cell.2014.09.029. PMC 4253859. PMID 25307932.
- ^ Horlbeck MA, Gilbert LA, Villalta JE, Adamson B, Pak RA, Chen Y, et al. (September 2016). "Compact and highly active next-generation libraries for CRISPR-mediated gene repression and activation". eLife. 5. doi:10.7554/eLife.19760. PMC 5094855. PMID 27661255.
- ^ Gasperini M, Findlay GM, McKenna A, Milbank JH, Lee C, Zhang MD, et al. (August 2017). "CRISPR/Cas9-Mediated Scanning for Regulatory Elements Required for HPRT1 Expression via Thousands of Large, Programmed Genomic Deletions". American Journal of Human Genetics. 101 (2): 192–205. doi:10.1016/j.ajhg.2017.06.010. PMC 5544381. PMID 28712454.
- ^ Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. (October 2005). "Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles". Proceedings of the National Academy of Sciences of the United States of America. 102 (43): 15545–50. Bibcode:2005PNAS..10215545S. doi:10.1073/pnas.0506580102. PMC 1239896. PMID 16199517.
- ^ "Ingenuity Systems". Archived from the original on 1999-01-25. Retrieved 2007-12-31.
- ^ "Ariadne Genomics: Pathway Studio". Archived from the original on 2007-12-30. Retrieved 2007-12-31.
- ^ Vinayagam A, Hu Y, Kulkarni M, Roesel C, Sopko R, Mohr SE, et al. (February 2013). "Protein complex-based analysis framework for high-throughput data sets". Science Signaling. 6 (264): rs5. doi:10.1126/scisignal.2003629. PMC 3756668. PMID 23443684.
- ^ Hilton SK, Doud MB, Bloom JD (2017). "phydms: software for phylogenetic analyses informed by deep mutational scanning". PeerJ. 5: e3657. doi:10.7717/peerj.3657. PMC 5541924. PMID 28785526.
- ^ Diss G, Lehner B (April 2018). "The genetic landscape of a physical interaction". eLife. 7. doi:10.7554/eLife.32472. PMC 5896888. PMID 29638215.
- ^ Schmiedel JM, Lehner B (July 2019). "Determining protein structures using deep mutagenesis". Nature Genetics. 51 (7): 1177–1186. doi:10.1038/s41588-019-0431-x. PMC 7610650. PMID 31209395.
- ^ Rollins NJ, Brock KP, Poelwijk FJ, Stiffler MA, Gauthier NP, Sander C, et al. (July 2019). "Inferring protein 3D structure from deep mutation scans". Nature Genetics. 51 (7): 1170–1176. doi:10.1038/s41588-019-0432-9. PMC 7295002. PMID 31209393.
- ^ Ghandi M, Lee D, Mohammad-Noori M, Beer MA (July 2014). "Enhanced regulatory sequence prediction using gapped k-mer features". PLOS Computational Biology. 10 (7): e1003711. Bibcode:2014PLSCB..10E3711G. doi:10.1371/journal.pcbi.1003711. PMC 4102394. PMID 25033408.
- ^ Li Y, Shi W, Wasserman WW (May 2018). "Genome-wide prediction of cis-regulatory regions using supervised deep learning methods". BMC Bioinformatics. 19 (1): 202. doi:10.1186/s12859-018-2187-1. PMC 5984344. PMID 29855387.
- ^ Battle A, Brown CD, Engelhardt BE, Montgomery SB, et al. (GTEx Consortium) (October 2017). "Genetic effects on gene expression across human tissues". Nature. 550 (7675): 204–213. Bibcode:2017Natur.550..204A. doi:10.1038/nature24277. PMC 5776756. PMID 29022597.
- ^ "GTEx Creates a Reference Data Set to Study Genetic Changes and Gene Expression". Office of Strategic Coordination - The Common Fund. U.S. National Institutes of Health. 8 February 2018. Retrieved 2022-01-13.
- ^ "Atlas of Variant Effects Alliance". Research Organization Registry.
- ^ "Scientists Launch 'Herculean' Project Creating Atlas of Human Genome Variants | Brotman Baty Institute". brotmanbaty.org. Retrieved 2024-02-05.
External links
- European Science Foundation Programme on Frontiers of Functional Genomics
- MUGEN NoE — Integrated Functional Genomics in Mutant Mouse Models
- Nature insights: functional genomics
- ENCODE