D68–D73 Nucleic Acids Research, 2006, Vol. 34, Database issue
doi:10.1093/nar/gkj075
cisRED: a database system for genome-scale
computational discovery of regulatory elements
5
G. Robertson*, M. Bilenky, K. Lin, A. He, W. Yuen, M. Dagpinar, R. Varhol, K. Teague,
O. L. Griffith, X. Zhang, Y. Pan, M. Hassel, M. C. Sleumer, W. Pan, E. D. Pleasance,
M. Chuang, H. Hao, Y. Y. Li, N. Robertson, C. Fjell, B. Li, S. B. Montgomery, T. Astakhova,
J. Zhou1, J. Sander1, A. S. Siddiqui and S. J. M. Jones
Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
and 1Department of Computing Science, University of Alberta, Edmonton, AB, Canada
Received August 15, 2005; Revised and Accepted October 8, 2005
10
ABSTRACT
15
20
25
We describe cisRED, a database for conserved regulatory elements that are identified and ranked by a
genome-scale computational system (www.cisred.
org). The database and high-throughput predictive
pipeline are designed to address diverse target
genomes in the context of rapidly evolving data
resources and tools. Motifs are predicted in promoter
regions using multiple discovery methods applied to
sequence sets that include corresponding sequence
regions from vertebrates. We estimate motif significance by applying discovery and post-processing
methods to randomized sequence sets that are
adaptively derived from target sequence sets, retain
motifs with p-values below a threshold and identify
groups of similar motifs and co-occurring motif
patterns. The database offers information on atomic
motifs, motif groups and patterns. It is webaccessible, and can be queried directly, downloaded
or installed locally.
contributions to motif discovery of different genomes or
sets of genomes; improving the predictive reliability of motifs
by genome-scale clustering and co-occurrence; determining a
best minimal set of motif discovery methods, probably in a
discovery approach that uses multiple methods (6); and using
coexpression and other functional data types.
In this report, we describe a new cisRED database that
contains predictions for whole-genome discovery of regulatory elements in mammals and other eukaryotes. We also
describe the predictive system behind the database, which
uses genome-scale approaches to predict deeply conserved
ab initio motifs and identifies groups of similar motifs and
co-occurring patterns of motifs. Results are available in a webaccessible and downloadable MySQL database. The system is
designed to be readily maintained and extended in the context
of rapidly evolving resources, data types and tools.
40
45
50
55
DATA SOURCES AND PROCESSING
The system is outlined in Figure 1, and is described in detail
at www.cisred.org/content/databases_methods. The upstream
section of the pipeline loads the database with significant
discovered atomic motifs, and the section downstream of
the database identifies groups of similar atomic motifs and
co-occurring patterns of motifs. An atomic motif consists of
a set of sequences, typically with a common length between 6
and 12 bp, members of which are present in a sequence region
on the target species and in corresponding regions on other
genomes.
The system uses genome resources that are a combination of
directly downloaded and processed sequences, annotations
and relationships (e.g. orthology, coexpression and interactions). These are stored in an automatically updated local
resource that holds a wide range of public and commercial
databases.
60
30
35
INTRODUCTION
Approaches for identifying transcriptional regulatory motifs
computationally have been reviewed previously (1–3).
Recently, progress has been made towards identifying a ‘comprehensive catalog’ of mammalian elements (4,5). As genome
resources, data types and tools evolve, predictive approaches
developed for such work can also be directed at a number
of closely connected issues. Some of these include taking
advantage in motif discovery of increasing numbers of genomes, including low-coverage sequences; quantifying the
65
70
*To whom correspondence should be addressed. Tel: +1 604 675 8170; Fax: +1 604 876 3561; Email: grobertson@bcgsc.ca
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors
Ó The Author 2006. Published by Oxford University Press. All rights reserved.
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access
version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press
are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but
only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org
Nucleic Acids Research, 2006, Vol. 34, Database issue
D69
Figure 1. Data processing system for high-throughput motif discovery, clustering, co-occurrence, annotation and performance assessment.
5
10
Motif discovery was carried out in a search region based
on a single major transcript for each gene. An input target
sequence set consisted of a sequence from the target species,
and corresponding sequences from homologous vertebrate
genes. The following rules were used to assemble target
sequence sets for cisRED 1.2e. We identified homologous
genes by combining data from Compara (7), HomoloGene
(8), Inparanoid (9) and KEGG (10). For each target human
gene and each of its orthologues, we took the major Ensembl
(7) transcript if its protein sequence was N-terminal complete.
We further required that the human gene had an annotated
Ensembl 50 -untranslated region (50 -UTR) that was at least
10 bp long. For each target gene’s orthologue set, we required
at least one of Dog, Mouse and Rat, and at least one of
Chicken, Frog, Fugu, Tetraodon and Zebrafish. If the human
50 -UTR was <500 bp, we applied no UTR requirements to
orthologues; however, if the human 50 -UTR was >500 bp,
we required that all orthologues had annotated Ensembl
50 -UTRs that were at least 10 bp long. After this filtering,
for any orthologous region that was missing from an input
sequence set, we added a sequence region from the current
UCSC (11,12) multiple sequence alignment. For a small subset
of the target genes, corresponding regions from ENCODEspecific species were then added from the current ENCODE
multiple sequence alignment (13). Finally, regions were added
from an internally processed version of the unannotated
15
20
25
D70
5
10
15
20
25
30
35
40
Nucleic Acids Research, 2006, Vol. 34, Database issue
Macaca mullata genome from http://www.hgsc.bcm.tmc.edu/
projects/rmacaque/. These rules delivered 7500 human target genes with sequence sets that contained a minimum of
three and an average of six orthologous sequences from
other genomes. Sequence regions extended 1.5 kb upstream
and 100 bp downstream of transcription start sites (TSS), net
of all types of repeats except LTR/ERV1, LTR/ERVL, LTR/
MaLR and of coding sequences, which were masked.
In addition to a target sequence set, we supplied each discovery program with a genomic ‘background’ input file. The
background input for each species consisted of 1000 concatenated search regions that were randomly selected from the
genome’s entire set of search regions.
We used multiple discovery methods in parallel, running
each method with a range of parameter settings; typically
target motif width and motif occurrence model were varied.
We used a compact but diverse base set of discovery methods
that consisted of CONSENSUS (14), MEME (15) and MotifSampler (16). Raw discovered motifs were post-processed, for
example, to remove identical motifs reported by the same
algorithm, and to merge strongly overlapped motifs.
We assigned p-values to motifs discovered by multiple
methods across large sets of target genes whose sequence
sets varied in species composition, as follows. First, we identified a representative subset of target sequence sets that
sampled the range of species compositions of the target
sets. Then, for each representative target sequence set, we
created random sequence sets by retaining the original
sequence from the target species and replacing each orthologous sequence with a synthetic sequence. The random synthetic
sequence was generated from the target sequence by a tool we
developed to simulate neutral evolution using published substitution rates and indel rates and lengths. Each random
sequence set was generated 30 times in order to avoid statistical bias. We then submitted each target sequence set and all
random sequence sets to identical motif discovery and postprocessing procedures. We assigned method-independent
(MI) scores to all motifs discovered in target and random
sequence sets, using a trainable function that contained four
non-negative parameters:
scoreðaÞ ¼
45
50
55
ð1 þ a1 DÞð1 þ a2 Cremote Þð1 þ a4 WÞ
B‚
ð1 þ a3 ð1 Cclose ÞÞ
where coefficients took the following possible values: D,
W 2 R, B 2 {0, 1} and C 2 [0, 1]. In the above equation,
D characterized the number of site sequences in a motif; W and
B characterized the shape of a motif’s information content
profile; and C characterized motif sequence conservation.
The score increased when the motif was conserved for species
that are evolutionarily remote from the target, and decreased
when the motif was not conserved for species that are close to
the target (primates, in the case of human). We used the distribution of MI scores for motifs from a target gene’s random
sequences to transform the MI scores for the target gene’s
motifs into p-values. Finally, we loaded the database with
motifs whose p-values were below a threshold (which, for
cisRED 1.2e, was 0.05).
A library of known transcription factor binding sites, split
into mutually exclusive training and testing fractions, was
used to optimize the scoring function and to characterize
the performance of the system. The library contained
1000 sites for 300 human genes from TRANSFAC v9.1
(17), and 250 binding sites that we curated from the literature. We optimized the scoring function by simulated annealing, using a training fraction of known sites from randomly
selected genes and two objective functions: the area under a
receiver operating characteristic (ROC) curve and the number
of experimentally known motifs that were not predicted. We
assessed the system’s predictive performance with a test fraction of known sites, using observables like sensitivity, specificity and positive predictive value (PPV) (e.g. www.cisred.
org/content/databases_methods/human_1_2e/performance).
Although comparisons of method performance are constrained
by many factors, current system performance compared
favourably to results in a recent study (6).
To identify groups of similar motifs, we defined two pairwise motif similarity metrics. For the first metric, we used a
version of the Levenstein edit distance between two sequences
that was modified to permit no internal gaps (18). For each
motif, M, and its reverse complement, RC(M), we scanned
motif pairs relative to each other, and reported the overall
minimum average mutual edit distance to motif K, i.e.
min(d(M, K), d(RC(M), K)). The second metric was based
on the maximum information content shared between position
frequency matrices derived for each motif, and also treated a
motif and its reverse complement as equivalent. We hierarchically clustered pairwise dissimilarity matrices with the local
density-based OPTICS algorithm (19). We extracted clusters
from OPTICS’ reachability output by applying an automatic
cluster recognition method that identifies cluster boundaries as
inflection points in the reachability plot (20), then traversing
these hierarchical segmentation results with an algorithm that
traced a deepest available path, constrained by a maximum
preset depth.
The large size of mammalian genomes makes it challenging
to organize the computational hardware and software infrastructure required to address such issues routinely. We did the
large-scale discovery, similarity and co-occurrence calculations on a Beowulf-style, 400 CPU (Pentium III, Xeon,
Opteron) OSCAR cluster (http://oscar.openclustergroup.org)
running Red Hat Linux 9; and remotely on the Beowulf
1700 CPU ‘glacier’ cluster at WestGrid (www.westgrid.ca).
We clustered motifs on a 12 dual-core CPU (UltraSPARC IV)
SMP server with 96 GB of RAM running Solaris 9.
60
65
70
75
80
85
90
95
100
CISRED DATABASE CONTENTS,
STRUCTURE AND ACCESS
Figure 2 shows a schematic diagram of the database design
from a user perspective. The database infrastructure is
designed to evolve to hold results from a range of mammals,
as well as results from model organisms. Because promoter
regions are enriched sources of regulatory elements, motif
discovery and the cisRED design were both based on regions
around TSSs. cisRED human v1.2 contains motifs from promoter regions of 7500 human genes using Ensembl v30
(NCBI 35) data, as well as a pilot result set of 250 mouse
genes.
The current database design makes three types or levels
of information available for regulatory elements: (i) atomic
motifs, which are discovered independently in each target
105
110
115
Nucleic Acids Research, 2006, Vol. 34, Database issue
D71
Figure 2. Database contents and high-level links, from a web user perspective, as of cisRED human v1.2. ‘Atomic motifs’ are motifs discovered in target sequence
sets. ‘Groups’ are clusters of similar motifs that are identified by large-scale OPTICS (19) clustering. ‘Patterns’ are co-occurring sets of group-labelled motifs.
5
10
15
20
region sequence set; (ii) groups of similar motifs, each of
which is a putative model for the binding site of a single
transcription factor; and (iii) co-occurring patterns of models,
which are putative regulatory modules.
Predicted regulatory elements can be viewed directly in
cisRED’s web user interface. From this interface, motifs
can be viewed in the UCSC genome browser, in the Ensembl
genome browser via a DAS server, or in the Sockeye
comparative genomics workspace (21). A user can filter the
displayed motifs by criteria like the p-value threshold, the
orthologous species present in a motif and the discovery
method.
The database contains a table of high-confidence globally
coexpressed genes (22); genes coexpressed with each cisRED
target gene are listed on that gene’s page. Coexpression
resources are available for download at http://www.bcgsc.
ca/gc/bomge/coexpression. The database also contains a
table of single nucleotide polymorphisms (SNPs) from
dbSNP (8) that occur in predicted motifs. When a predicted
human motif contains a SNP, the cisRED atomic motif page
highlights this variation on the target sequence of a motif site
sequence set, and the highlighted base hyperlinks to the SNP’s
primary source information in dbSNP.
The database can be accessed in several ways. A web user
interface is available at www.cisred.org. A current schema
diagram is available from the ‘Databases & Methods’
page, and direct SQL queries can be run on the MySQL
databases at db.cisred.org. A user can download the data,
with SQL files, as well as a compressed file that contains
all input FASTA sequence sets. As genome resources and
our data processing evolve, we update database contents.
Because older versions of the database may not be compatible
with the current user interface, only recent versions for each
species can be accessed through the web or via a MySQL
client. However, historical releases of cisRED databases are
archived and can be downloaded.
Certain parts of the system’s software are available from
a tab on the cisRED home page. Sockeye (21) permits, for
example, a user to assess details of conserved regions relative
to genomic annotations in multiple sequence alignments. The
HitPlotter visualizer displays large sets of discovered motifs
from multiple-method discovery runs, and is available on
25
30
35
40
D72
Nucleic Acids Research, 2006, Vol. 34, Database issue
request as a beta release. The database infrastructure is
designed to facilitate installing cisRED locally.
high-confidence globally coexpressed gene pairs. The coexpression database makes results available from this global
analysis and from two other recent analyses (28,29). Although
coexpressed genes can have similar regulatory elements, the
system’s predictive performance improved only marginally
when inputs included coexpressed genes in addition to orthologous genes (data not shown). Given these results, currently
we are assessing an approach that includes no coexpressed
genes in motif discovery inputs, but uses coexpression
information to assess groups and co-occurring patterns identified in genome-scale sets of atomic motifs.
We will extend the database user interface to offer more
complex user filtering, as well as motif searches based on
consensus strings or matrices. For work with classes of regulatory elements that are defined by wet lab data types based
on, for example, ChIP or DNase I hypersensitivity (e.g. see
ENCODE tracks within the UCSC genome browser) (11,13),
we have designed a new schema that is based on search regions
rather than on genes, and will extend the user interface to
support this.
We will continue to assess the contributions to regulatory
element predictions of different genomes and sets of genomes.
We will integrate and assess new motif discovery methods,
and will identify a best minimal set of methods on an ongoing
basis.
50
FUTURE WORK
5
10
15
20
25
30
35
40
45
Database contents will be extended to include large-scale
results for human, mouse, rat, Caenorhabditis elegans and
Drosophila melanogaster. A new schema design supports
these species. For example, cisRED human v2 will contain
18 000 human genes. The input sequence sets for this database were based on human TSSs that were identified by considering Ensembl (7) and RefSeq (23) annotations. To address
the limitations of gene and transcript annotations for nonhuman species, corresponding vertebrate search regions
were taken from UCSC (11,12) and ENCODE multiple
sequence alignments (13). We are extending our ability to
take advantage of unannotated and low-coverage genome
sequence data.
We are continuing to improve motif post-processing and
scoring. Optimizing the scoring function depends on having a
large library of known motif sites; we anticipate that a newly
created web database for submitting and curating binding
sites from the literature may help us to enlarge and improve
this resource (www.oreganno.org; S. B. Montgomery and
O. L. Griffith, manuscript in preparation).
Given a scalable clustering method, we continue to assess
motif similarity metrics and how best to use the hierarchical
information output from OPTICS. Given groups of similar
motifs, we are applying group labels to atomic motifs, then
identifying overrepresented co-occurring motif patterns using
hypergeometric statistics, imposing separation constraints on
neighbouring motifs, and searching in two stages for patterns
larger than pairs (24,25). We are implementing methods for
annotating discovered motifs as known or novel against known
site resources. Given annotated motifs, we will annotate
overrepresented co-occurring patterns of human motif pairs
as known versus novel using the TRANSCompel resource
(26). We are applying genome-scale motif clustering and
co-occurrence as filters for predicted motifs that may improve
the predictive reliability and the resulting catalogue of conserved regulatory elements.
We have assembled a large multi-species coexpression
resource that contains public microarray and SAGE data from
diverse sources (22) (Table 1). We have shown that combining
global coexpression data from multiple platforms improves
confidence in coexpression predictions when assessed against
the Gene Ontology (GO) (27). From this, we established
GO-based Pearson correlation thresholds that identified
Table 1. Contents of coexpression database (22)
Species
Platform
Experiments
Unique genes
H.sapiens
SAGE (short)
Oligo microarray
cDNA microarray
SAGE (short)
Oligo microarray
cDNA microarray
SAGE (long/short)
cDNA microarray
272
1640
2852
85
1802
366
26
1059
8102
20 312
12 452
13 111
12 715
8 164
8 102
15 685
15 956
54 434
M.musculus
C.elegans
Total
55
60
65
70
ACKNOWLEDGEMENTS
We gratefully acknowledge funding from Genome Canada,
Genome British Columbia and the BC Cancer Foundation.
S.J.M.J., M.C.S., S.B.M. and O.L.G. were supported by the
Michael Smith Foundation for Health Research (MSFHR).
E.D.P. was supported by the Canadian Institutes of Health
Research (CIHR). S.B.M. and O.L.G. were also supported
by the Natural Sciences and Engineering Research Council.
H.H. was supported by the CIHR/MSFHR Strategic Training
Program in Bioinformatics. This research has been enabled by
the use of WestGrid computing resources, which are funded in
part by the Canada Foundation for Innovation, Alberta
Innovation and Science, BC Advanced Education, and the participating research institutions. Funding to pay the Open Access
publication charges for this article was provided by Genome
British Columbia.
75
80
85
Conflict of interest statement. None declared.
REFERENCES
1. Wasserman,W.W. and Sandelin,A. (2004) Applied bioinformatics for the
identification of regulatory elements. Nature Rev. Genet., 5, 276–87.
2. Bulyk,M.L. (2003) Computational prediction of transcription-factor
binding site locations. Genome Biol., 5, 201.
3. Stormo,G.D. (2000) DNA binding sites: representation and discovery.
Bioinformatics, 16, 16–23.
4. Xie,X., Lu,J., Kulbokas,E.J., Golub,T.R., Mootha,V., Lindblad-Toh,K.,
Lander,E.S. and Kellis,M. (2005) Systematic discovery of regulatory
motifs in human promoters and 30 UTRs by comparison of several
mammals. Nature, 434, 338–345.
5. Elemento,O. and Tavazoie,S. (2005) Fast and systematic genome-wide
discovery of conserved regulatory elements using a non-alignment based
approach. Genome Biol., 6, R18.
6. Tompa,M., Li,N., Bailey,T.L., Church,G.M., De Moor,B., Eskin,E.,
Favorov,A.V., Frith,M.C., Fu,Y., Kent,W.J. et al. (2005) Assessing
computational tools for the discovery of transcription factor binding sites.
Nat. Biotechnol., 23, 137–144.
90
95
100
105
Nucleic Acids Research, 2006, Vol. 34, Database issue
7. Hubbard,T., Andrews,D., Caccamo,M., Cameron,G., Chen,Y.,
Clamp,M., Clarke,L., Coates,G., Cox,T., Cunningham,F. et al. (2005)
Ensembl 2005. Nucleic Acids Res., 33, D447–D453.
8. Wheeler,D.L., Barrett,T., Benson,D.A., Bryant,S.H., Canese,K.,
Church,D.M., DiCuccio,M., Edgar,R., Federhen,S., Helmberg,W. et al.
(2005) Database resources of the National Center for Biotechnology
Information. Nucleic Acids Res., 33, D39–D45.
9. O’Brien,K.P., Remm,M. and Sonnhammer,E.L.L. (2005) Inparanoid:
a comprehensive database of eukaryotic orthologs. Nucleic Acids Res., 33,
D476–D480.
10. Kanehisa,M., Goto,S., Kawashima,S., Okuno,Y. and Hattori,M. (2004)
The KEGG resource for deciphering the genome. Nucleic Acids Res.,
32, D277–D280.
11. Karolchik,D., Baertsch,R., Diekhans,M., Furey,T.S., Hinrichs,A.,
Lu,Y.T., Roskin,K.M., Schwartz,M., Sugnet,C.W., Thomas,D.J.,
Weber,R.J., Haussler,D. and Kent,W.J. (2003) The UCSC Genome
Browser database. Nucleic Acids Res., 31, 51–54.
12. Blanchette,M., Kent,W.J., Riemer,C., Elnitski,L., Smit,A.F.,
Roskin,K.M., Baertsch,R., Rosenbloom,K., Clawson,H., Green,E.D.
et al. (2004) Aligning multiple genomic sequences with the threaded
blockset aligner. Genome Res., 14, 708–715.
13. Feingold,E.A., Good,P.J., Guyer,M.S., Kamholz,S., Liefer,L.,
Wetterstrand,K., Collins,F.S., Gingeras,T.R., Kampa,D., Sekinger,E.A.
et al. (2004) The ENCODE (ENCyclopedia of DNA Elements) Project.
Science, 306, 636–640.
14. Hertz,G.Z. and Stormo,G.D. (1999) Identifying DNA and protein patterns
with statistically significant alignments of multiple sequences.
Bioinformatics, 15, 563–577.
15. Bailey,T.L. and Elkan,C. (1994) Fitting a mixture model by expectation
maximization to discover motifs in biopolymers. Proc. Int. Conf. Intell.
Syst. Mol. Biol., 2, 28–36.
16. Thijs,G., Marchal,K., Lescot,M., Rombauts,S., De Moor,B., Rouzé,P. and
Moreau,Y. (2002) A Gibbs Sampling method to detect over-represented
motifs in upstream regions of coexpressed genes. J. Comp. Biol., 9,
447–464.
17. Wingender,E. (2004) TRANSFAC, TRANSPATH and CYTOMER as
starting points for an ontology of regulatory networks. In Silico Biol., 4,
55–61.
18. Levenstein,V.I. (1966) Binary codes capable of correcting insertions and
reversals. Sov. Phys. Docl., 10, 707–710.
5
10
15
20
25
30
35
40
D73
19. Ankerst,M., Breunig,M.M., Kriegel,H.-P. and Sander,J. (1999) OPTICS:
Ordering Points To Identify the Clustering Structure. In Proceedings
of the Int. Conf. on Management of Data (ACM SIGMOD’99),
Philadelphia, PA.
20. Brecheisen,S., Kriegel,H.-P., Kröger,P. and Pfeifle,P.M. (2004) Visual
mining through cluster hierarchies, Proc. SIAM Int. Conf. on Data Mining
(SDM’04), Lake Buena Vista, FL, pp. 400–412.
21. Montgomery,S.B., Astakhova,T., Bilenky,M., Birney,E., Fu,T.,
Hassel,M., Melsopp,C., Rak,M., Robertson,A.G., Sleumer,M.C. et al.
(2004) Sockeye: a 3D environment for comparative genomics.
Genome Res., 14, 956–962.
22. Griffith,O.L., Pleasance,E.D., Fulton,D.L., Oveisi,M., Ester,M.,
Siddiqui,A.S. and Jones,S.J.M. (2005) Assessment and integration of
publicly available SAGE, cDNA microarray, and oligonucleotide
microarray expression data for global coexpression analyses.
Genomics, 86, 476–488.
23. Pruitt,K.D., Tatusova,T. and Maglott,D.R. (2005) NCBI Reference
Sequence (RefSeq): a curated non-redundant sequence database
of genomes, transcripts and proteins. Nucleic Acids Res., 33,
D501–D504.
24. Zhu,Z., Shendure,J. and Church,G.M. (2005) Discovering functional
transcription-factor combinations in the human cell cycle. Genome Res.,
15, 848–855.
25. Kreiman,G. (2004) Identification of sparsely distributed clusters of
cis-regulatory elements in sets of co-expressed genes. Nucleic Acids Res.,
32, 2889–2900.
26. Kel-Margoulis,O.V., Kel,A.E., Reuter,I., Deinenko,I.V. and
Wingender,E. (2002) TRANSCompel: a database on composite
regulatory elements in eukaryotic genes. Nucleic Acids Res., 30,
332–334.
27. Gene Ontology Consortium (2004), The Gene Ontology (GO)
database and informatics resource. Nucleic Acids Res., 32,
D258–D261.
28. Jensen,L.J., Lagarde,J., von Mering,C. and Bork,P. (2004)
ArrayProspector: a web resource of functional associations inferred
from microarray expression data. Nucleic Acids Res., 32,
W445–W448.
29. Lee,H.K., Hsu,A.K., Sajdak,J., Qin,J. and Pavlidis,P. (2004)
Coexpression analysis of human genes across many microarray data sets.
Genome Res., 14, 1085–1094.
45
50
55
60
65
70
75
80