Microarray 09
Stuart M. Brown
NYU School of Medicine
The Central Dogma of Molecular Biology
DNA is transcribed into RNA, which is then translated into protein
[Figure: central dogma diagram. DNA (replication) -> transcription -> RNA -> translation -> protein. Label: "Measured by Microarray"]
What is a Microarray
A simple concept: Dot Blot + Northern
Reverse the hybridization: put the probes on the filter and label the bulk RNA
Make probes for lots of genes: a massively parallel experiment
Make it tiny so you don't need so much RNA from your experimental cells.
Make quantitative measurements
[Figure: bar chart for the years 2000 to 2005, y-axis 0 to 5000. Bar values: 294, 834, 1557, 2421, 3509, 4406]
A Filter Array
DNA Chip Microarrays
Put a large number (~100K) of cDNA sequences or synthetic DNA oligomers onto a glass slide (or other substrate) in known locations on a grid.
Label an RNA sample and hybridize
Measure amounts of RNA bound to each square in the grid
Make comparisons
Cancerous vs. normal tissue
Treated vs. untreated
Time course
Many applications in both basic and clinical research
cDNA Microarray Technologies
Spot cloned cDNAs onto a glass microscope slide
usually PCR amplified segments of plasmids
Label 2 RNA samples with 2 different colors of fluorescent dye: control vs. experimental
Mix two labeled RNAs and hybridize to the chip
Make two scans, one for each color
Combine the images to calculate ratios of amounts of each RNA that bind to each spot
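The ratio step can be sketched as follows. This is a minimal illustration, not the method of any particular scanner software: it assumes the experimental sample was labeled red and the control green, and that intensities are already background-subtracted. The function name, the `floor` parameter, and the spot values are hypothetical.

```python
import math

def log2_ratios(red, green, floor=1.0):
    """Return log2(red / green) for each spot.

    Intensities below `floor` are clamped so the ratio is always defined.
    """
    ratios = []
    for r, g in zip(red, green):
        r = max(r, floor)
        g = max(g, floor)
        ratios.append(math.log2(r / g))
    return ratios

# Hypothetical intensities for four spots (assumed: red = experimental, green = control).
experimental_red = [1200.0, 300.0, 800.0, 50.0]
control_green = [600.0, 300.0, 1600.0, 400.0]

ratios = log2_ratios(experimental_red, control_green)
for spot, value in enumerate(ratios, start=1):
    print(f"spot {spot}: log2 ratio = {value:+.2f}")
```

On the log2 scale a value of +1 means twice as much signal in the experimental channel and -1 means half as much, so increases and decreases are symmetric around 0.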
Robot spotter
Ordinary glass microscope slide
Combine scans for Red & Green
cDNA Spotted Microarrays
Affymetrix GeneChip system
Uses 25 base oligos synthesized in place on a chip (20 pairs of oligos for each gene)
RNA labeled and scanned in a single color
one sample per chip
Can have as many as 20,000 genes on a chip
Arrays get smaller every year (more genes)
Chips are expensive
Proprietary system: "black box" software, can only use their chips
Affymetrix Technology
Data Acquisition
Scan the arrays
Quantitate each spot
Subtract background
Normalize
Export a table of fluorescent intensities for each gene in the array
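Two of these steps, background subtraction and table export, can be sketched as below. This is a simplified illustration under stated assumptions: each spot has one foreground and one local background value, and the table is tab-delimited. The function names and the numbers are hypothetical, and normalization is left out here.

```python
import csv
import io

def subtract_background(foreground, background):
    """Subtract local background from each spot; negative results are set to 0."""
    return [max(f - b, 0.0) for f, b in zip(foreground, background)]

def export_table(gene_ids, intensities):
    """Return a tab-delimited table (as text) with one intensity per gene."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene", "intensity"])
    for gene, value in zip(gene_ids, intensities):
        writer.writerow([gene, f"{value:.1f}"])
    return buffer.getvalue()

# Hypothetical spot measurements.
genes = ["gene1", "gene2", "gene3"]
spot_signal = [520.0, 95.0, 1310.0]
local_background = [100.0, 110.0, 90.0]

corrected = subtract_background(spot_signal, local_background)
table = export_table(genes, corrected)
print(table)
```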
Automate!!
All of this can be done automatically by software.
Much more consistent
Mistakes will be made (especially in the spot quantitation), but you can't manually check hundreds of thousands of spots
Affymetrix Software
Affymetrix System is totally automated
Computes a single value for each gene from 40 probes (using surprisingly kludgy math)
Highly reproducible
(rescan of same chip or hyb. of duplicate chips with same labeled sample gives very similar results)
Incorporates false results due to image artefacts
dust, bubbles
pixel spillover from bright spot to neighboring dark spots
Goals of a Microarray Experiment
1. Find the genes that change expression
Basic Data Analysis
Fold change (relative increase or decrease in intensity for each gene)
Set cutoff filter for low values (background + noise)
Cluster genes by similar changes: only really meaningful across multiple treatments or time points
Cluster samples by similar gene expression profiles
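Fold change with a low-value cutoff can be sketched like this. It is one simple convention among several: values below the cutoff are raised to the cutoff, genes where both samples are below it are dropped, and decreases are reported as negative fold changes. The function names, the cutoff of 100, and the intensities are all hypothetical.

```python
def signed_fold_change(control, treated, floor):
    """Fold change as treated/control, reported as a negative number for decreases."""
    c = max(control, floor)
    t = max(treated, floor)
    return t / c if t >= c else -(c / t)

def filter_and_score(control, treated, floor=100.0, min_fold=2.0):
    """Drop genes at background level in both samples, keep changes >= min_fold."""
    changed = {}
    for gene in control:
        if max(control[gene], treated[gene]) < floor:
            continue  # both values are within background + noise
        fc = signed_fold_change(control[gene], treated[gene], floor)
        if abs(fc) >= min_fold:
            changed[gene] = fc
    return changed

# Hypothetical intensities for four genes.
control = {"geneA": 200.0, "geneB": 1000.0, "geneC": 30.0, "geneD": 500.0}
treated = {"geneA": 800.0, "geneB": 250.0, "geneC": 60.0, "geneD": 550.0}

changed = filter_and_score(control, treated)
print(changed)
```

Here geneC doubles from 30 to 60, but both values are below the cutoff, so it is not reported as a change.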
[Figure: analysis workflow diagram. Boxes: Raw data; Normalize (RMA); Filter (Present/Absent, Minimum value, Fold change); Significance (t-test, SAM, Rank Product); Classification (PAM, Machine learning); Clustering; Gene lists; Function (Gene Ontology)]
Sources of Variability
Image analysis (identifying and quantitating each spot on the array)
Scanning (laser and detector, chemistry of the fluorescent label)
Hybridization (temperature, time, mixing, etc.)
Probe labeling
RNA extraction
Biological variability
Scatter plot of all genes in a simple comparison of two controls (A) and two treatments (B: high vs. low glucose) showing changes in expression greater than 2.2 and 3 fold.
Normalization
Can control for many of the experimental sources of variability (systematic, not random or gene specific)
Bring each image to the same average brightness
Can use simple math or fancy
divide by the mean (whole chip or by sectors)
LOESS (locally weighted regression)
No sure biological standards
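The "simple math" option, dividing by the whole-chip mean, can be sketched as below. This assumes one list of intensities per chip with genes in the same order, and rescales every chip to a common target mean. The function name, the choice of target, and the values are hypothetical. LOESS is not shown.

```python
from statistics import mean

def scale_to_common_mean(arrays, target=None):
    """Rescale each chip so that all chips share the same mean intensity.

    arrays: {chip_name: [intensity, ...]}
    target: desired mean; defaults to the average of the per-chip means.
    """
    chip_means = {name: mean(values) for name, values in arrays.items()}
    if target is None:
        target = mean(chip_means.values())
    return {
        name: [v * target / chip_means[name] for v in values]
        for name, values in arrays.items()
    }

# Hypothetical data: chip2 was scanned twice as bright as chip1.
arrays = {
    "chip1": [100.0, 200.0, 300.0],
    "chip2": [200.0, 400.0, 600.0],
}

normalized = scale_to_common_mean(arrays)
print(normalized)
```

After scaling, both chips have the same average brightness, so the remaining differences between them are not simply a matter of overall scan intensity.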
RMA
Robust Multichip Average
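RMA as published combines background correction, quantile normalization, and a robust summary of the probes in each probe set. The sketch below illustrates only the quantile normalization step, in plain Python, with no special handling of tied values. It is not the Bioconductor implementation, and the function name and numbers are hypothetical.

```python
def quantile_normalize(columns):
    """Force every chip (column) to share the same distribution of values.

    Each value is replaced by the mean, across chips, of the values
    that hold the same rank on their own chip.
    """
    n = len(columns[0])
    sorted_columns = [sorted(col) for col in columns]
    rank_means = [
        sum(col[i] for col in sorted_columns) / len(columns) for i in range(n)
    ]
    result = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices from low to high
        normalized = [0.0] * n
        for rank, index in enumerate(order):
            normalized[index] = rank_means[rank]
        result.append(normalized)
    return result

# Hypothetical intensities: two chips, three probes each.
chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
qn = quantile_normalize(chips)
print(qn)
```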
Are the Treatments Different?
Analysis of microarray data has tended to focus on making lists of genes that are up or down regulated between treatments
Before making these lists, ask the question:
"Are the treatments different?"
Use standard statistical methods to evaluate expression profiles for each treatment (t-test or F-test)
If there are differences, find the genes most responsible
If there are not significant overall differences, then lists of genes with large fold changes may only reflect random variability.
Statistics
When you have variability in measurements, you need replication and statistics to find real differences
It's not just the genes with 2 fold increase, but those with a significant p-value across replicates
Non-parametric (i.e. rank) or paired value statistics may be more appropriate
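The point about fold change versus significance can be illustrated with a per-gene t statistic. The sketch below uses Welch's form of the t statistic (an assumption; the slides do not say which variant) and only the standard library, so it stops at the statistic and does not compute a p-value. The replicate values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(control, treated):
    """Welch's t statistic: difference in means divided by its standard error."""
    standard_error = sqrt(
        variance(control) / len(control) + variance(treated) / len(treated)
    )
    return (mean(treated) - mean(control)) / standard_error

# Hypothetical replicates (3 control, 3 treated) for two genes.
noisy_control, noisy_treated = [100.0, 400.0, 250.0], [900.0, 200.0, 400.0]
steady_control, steady_treated = [200.0, 210.0, 190.0], [300.0, 310.0, 290.0]

t_noisy = welch_t(noisy_control, noisy_treated)     # 2 fold change on average
t_steady = welch_t(steady_control, steady_treated)  # only 1.5 fold change

print(f"noisy gene:  2.0 fold, t = {t_noisy:.2f}")
print(f"steady gene: 1.5 fold, t = {t_steady:.2f}")
```

The noisy gene has the larger fold change but the smaller t statistic, because its replicates disagree with each other.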
Multiple Comparisons
In a microarray experiment, each gene (each probe or probe set) is really a separate experiment
Yet if you treat each gene as an independent comparison, you will always find some with significant differences
(the tails of a normal distribution)
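A small simulation makes this concrete. It assumes that no gene is truly different, in which case each gene's p-value behaves like a uniform random number between 0 and 1. The function name, the seed, and the array size are hypothetical choices.

```python
import random

def count_null_hits(n_genes, alpha, seed=0):
    """Count genes that pass the p-value cutoff when nothing has really changed."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_genes) if rng.random() < alpha)

# Hypothetical array of 10,000 genes, none of them truly different.
hits = count_null_hits(n_genes=10_000, alpha=0.01)
print(f"{hits} of 10000 unchanged genes look 'significant' at p < 0.01")
```

The count varies with the random seed but stays close to 1% of the genes on the array.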
False Discovery
Statisticians call false positives a "type I error" or a "False Discovery"
The expected number of false discoveries is roughly the p-value cutoff of the t-test × the number of genes in the array; the False Discovery Rate (FDR) is that number as a fraction of the genes called significant
For a p-value of 0.01 × 10,000 genes = 100 false "different" genes
You cannot eliminate false positives, but by choosing a more stringent p-value, you can keep them manageable (try p = 0.001)
The number of false discoveries must be smaller than the number of real differences that you find, which in turn depends on the size of the differences and variability of the measured expression values
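The arithmetic on this slide can be written out as a short sketch. It assumes most genes on the array are unchanged, so the expected number of false positives is the p-value cutoff times the number of genes. Dividing by the number of genes actually called significant gives a rough estimate of the fraction of the list that is false. The function names and the counts of called genes (400 and 150) are hypothetical.

```python
def expected_false_positives(p_cutoff, n_genes):
    """Expected number of unchanged genes that pass the cutoff by chance."""
    return p_cutoff * n_genes

def estimated_false_fraction(p_cutoff, n_genes, n_called):
    """Rough fraction of the called genes expected to be false positives."""
    if n_called == 0:
        return 0.0
    return min(1.0, expected_false_positives(p_cutoff, n_genes) / n_called)

n_genes = 10_000
loose = expected_false_positives(0.01, n_genes)    # cutoff from the slide
strict = expected_false_positives(0.001, n_genes)  # the more stringent cutoff

# Hypothetical gene list sizes at each cutoff.
loose_fraction = estimated_false_fraction(0.01, n_genes, n_called=400)
strict_fraction = estimated_false_fraction(0.001, n_genes, n_called=150)

print(f"p = 0.01:  about {loose:.0f} false genes, {loose_fraction:.0%} of a 400 gene list")
print(f"p = 0.001: about {strict:.0f} false genes, {strict_fraction:.0%} of a 150 gene list")
```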
SAM
Significance Analysis of Microarrays
Tusher, Tibshirani and Chu (2001): Significance analysis of microarrays applied to the ionizing radiation response. PNAS 2001 98: 5116-5121 (Apr 24).
Excel plugin
Free
Permutation based
Most published method of microarray data analysis
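"Permutation based" means that significance is judged by shuffling the sample labels and seeing how often the shuffled data look as extreme as the real data. The sketch below shows that core idea for a single gene, using a plain difference in means. It is not SAM itself, which uses its own modified statistic and estimates false discoveries across all genes. The function name, the number of permutations, and the data are hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(control, treated, n_perm=2000, seed=0):
    """Fraction of label shuffles with a mean difference at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(control) + list(treated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled_control = pooled[:len(control)]
        shuffled_treated = pooled[len(control):]
        # Small tolerance so rounding error does not hide exact ties.
        if abs(mean(shuffled_treated) - mean(shuffled_control)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical replicates for one gene.
control = [200.0, 210.0, 190.0, 205.0]
p_changed = permutation_p_value(control, [300.0, 310.0, 290.0, 305.0])
p_same = permutation_p_value(control, [205.0, 190.0, 210.0, 200.0])

print(f"clearly shifted gene: p = {p_changed:.3f}")
print(f"unchanged gene:       p = {p_same:.3f}")
```

Because the reference distribution comes from the data themselves, this approach does not assume that expression values follow a normal distribution.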
Higher Level Microarray data analysis
Clustering and pattern detection
Data mining and visualization
Controls and normalization of results
Statistical validation
Linkage between gene expression data and gene sequence/function/metabolic pathways databases
Discovery of common sequences in co-regulated genes
Meta-studies using data from multiple experiments
Types of Clustering
Hierarchical
Link similar genes, build up to a tree of all
Self Organizing Maps (SOM)
Split all genes into similar sub-groups
Finds its own groups (machine learning)
Principal Component
every gene is a dimension (vector), find a single dimension that best represents the differences in the data
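The hierarchical approach, "link similar genes, build up to a tree of all", can be sketched in plain Python. This version assumes correlation distance (1 minus the Pearson correlation) and average linkage; both are common choices but are assumptions here, as are the function names and the four hypothetical expression profiles.

```python
from math import sqrt

def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for identical shapes, 2 for opposite shapes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # a flat profile has no defined correlation
    return 1.0 - cov / (sx * sy)

def hierarchical_cluster(profiles):
    """Repeatedly merge the two closest clusters; return the tree as nested tuples."""
    clusters = [((name,), name) for name in profiles]  # (member names, subtree)

    def average_distance(c1, c2):
        pairs = [(a, b) for a in c1[0] for b in c2[0]]
        total = sum(pearson_distance(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        candidates = [
            (i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))
        ]
        i, j = min(
            candidates, key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]])
        )
        merged = (clusters[i][0] + clusters[j][0], (clusters[i][1], clusters[j][1]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][1]

# Hypothetical expression profiles across four time points.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 4.0, 3.0],
}

tree = hierarchical_cluster(profiles)
print(tree)
```

With correlation distance, geneA and geneB join first even though geneB is twice as bright, because the two profiles rise in the same pattern.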
Cluster by color difference
GeneSpring
SOM Clusters
Classification
How to sort samples into two classes
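One simple way to sort samples into two classes is a nearest-centroid rule: average the training profiles of each class, then assign a new sample to the class whose average it is closest to. The sketch below is a plain nearest-centroid classifier, assuming Euclidean distance. It is not PAM, which additionally shrinks the centroids. The class labels and expression values are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

def class_centroids(training):
    """training: {label: [profile, ...]} -> {label: mean profile across samples}."""
    return {
        label: [sum(column) / len(samples) for column in zip(*samples)]
        for label, samples in training.items()
    }

def classify(profile, centroids):
    """Return the label of the closest class centroid."""
    return min(centroids, key=lambda label: dist(profile, centroids[label]))

# Hypothetical training samples: three genes measured in each sample.
training = {
    "normal": [[1.0, 5.0, 2.0], [1.2, 4.8, 2.2]],
    "tumor": [[4.0, 1.0, 2.1], [3.8, 1.2, 1.9]],
}

centroids = class_centroids(training)
prediction_1 = classify([3.5, 1.5, 2.0], centroids)
prediction_2 = classify([0.9, 5.2, 2.0], centroids)
print(prediction_1, prediction_2)
```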
BioConductor
All of these normalization, statistical,
Functional Genomics
Take a list of "interesting" genes and
significance/classification analysis of
microarrays, proteomics, or other high-throughput methods
knowledge"
Gene Ontology
How to organize biological
knowledge?
Biologists work on a variety of
GO
Biologists got together a few years ago
Biological Pathways
Microarray Databases
Large experiments may have hundreds of individual array hybridizations
Core lab at an institution or multiple investigators using one machine: data archive and validate across experiments
Data mining: look for similar patterns of gene expression across different experiments
Public Databases
Gene Expression data is an essential aspect of annotating the genome
Publication and data exchange for microarray experiments
Data mining / Meta-studies
Common data format: XML
MIAME (Minimum Information About a Microarray Experiment)
GEO at the NCBI
ArrayExpress at EMBL
Gene Expression Technologies
cDNA (EST) libraries
SAGE
Microarray
rtPCR
RNAseq
The Cancer Genome Anatomy Project
CGAP has collected a large amount of cDNA and related data online
http://cgap.nci.nih.gov/
cDNA libraries from various tissues
search for genes
compare expression levels
SAGE
Serial Analysis of Gene Expression is a technology that sequences very short fragments of mRNA (10 or 17 bp) that have been randomly ligated together
The short tags are assigned to genes and then relative counts for each gene are computed for cDNA libraries from various tissues
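The tag-to-gene counting step can be sketched as follows. It assumes the tags have already been extracted from the sequenced concatemers and that a lookup table from tag to gene is available. The 10 bp tag sequences, the gene names, and the function names are hypothetical.

```python
from collections import Counter

def count_genes(tags, tag_to_gene):
    """Count tags per gene; tags that match no known gene are skipped."""
    counts = Counter()
    for tag in tags:
        gene = tag_to_gene.get(tag)
        if gene is not None:
            counts[gene] += 1
    return counts

def relative_counts(counts):
    """Express each gene's count as a fraction of all assigned tags."""
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}

# Hypothetical lookup table and tags observed in one library.
tag_to_gene = {"ACGTACGTAC": "geneA", "TTGCAAGGCC": "geneB"}
library_tags = [
    "ACGTACGTAC", "ACGTACGTAC", "TTGCAAGGCC", "GGGGGGGGGG", "ACGTACGTAC",
]

gene_counts = count_genes(library_tags, tag_to_gene)
fractions = relative_counts(gene_counts)
print(dict(gene_counts), fractions)
```

Using fractions rather than raw counts lets libraries of different sizes, for example from different tissues, be compared.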
SAGE Genie
SAGE Anatomic Viewer
SAGE Digital Gene Expression Displayer
Digital Northern
SAGE Experiment Viewer
Microarray
GEO database at NCBI
Microarray experiments
Defined arrays
Published results
Also lots of inconclusive experiments
Tools to search for specific genes
Unreliable to search for tissue or disease in experiment description text
RNAseq
Next Generation DNA sequencing
NYU currently has one Illumina Genome Analyzer
generates more than 1 million RNA sequences per sample
Currently seeking funding for a Roche/454
produces 100K reads of 250-400 bp
Count Transcripts
Technology exists to accurately count transcripts and compare samples
Digital Gene Expression
Can also identify alternate isoforms, splice variants, etc.
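Comparing transcript counts between samples needs a correction for sequencing depth, since one sample may simply have more reads. A minimal sketch, assuming reads have already been assigned to genes: scale each sample to counts per million, then take a log2 ratio with a small pseudocount to avoid dividing by zero. The function names, the pseudocount, and the read counts are hypothetical.

```python
import math

def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}

def log2_fold_changes(cpm_a, cpm_b, pseudocount=1.0):
    """log2(B / A) per gene; the pseudocount keeps zero counts finite."""
    genes = set(cpm_a) | set(cpm_b)
    return {
        gene: math.log2(
            (cpm_b.get(gene, 0.0) + pseudocount) / (cpm_a.get(gene, 0.0) + pseudocount)
        )
        for gene in genes
    }

# Hypothetical read counts; sample B was sequenced twice as deeply as sample A.
sample_a = {"geneA": 100, "geneB": 900}
sample_b = {"geneA": 800, "geneB": 1200}

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
changes = log2_fold_changes(cpm_a, cpm_b)
print(changes)
```

The raw count for geneB goes up from 900 to 1200, but after depth correction its share of the sample has gone down.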