Genomics 92 (2008) 353–358
Contents lists available at ScienceDirect
Genomics
journal homepage: www.elsevier.com/locate/ygeno
Development of a set of SNP markers present in expressed genes of the apple
David Chagné a,1, Ksenija Gasic b,1, Ross N. Crowhurst c, Yuepeng Han b, Heather C. Bassett a,
Deepa R. Bowatte a, Timothy J. Lawrence c, Erik H.A. Rikkerink c, Susan E. Gardiner a, Schuyler S. Korban b,⁎
a The Horticulture and Food Research Institute of New Zealand (HortResearch) Palmerston North, PB 11030, Manawatu Mail Centre, Palmerston North 4442, New Zealand
b Department of Natural Resources & Environmental Sciences, University of Illinois, Urbana, IL 61801, USA
c HortResearch Mount Albert, PB 92169, Auckland 1142, New Zealand
Article info
Article history:
Received 5 June 2008
Accepted 29 July 2008
Available online 14 September 2008
Keywords
Malus × domestica
Single nucleotide polymorphisms
Expressed sequence tags
Candidate genes
Bin mapping
Abstract
Molecular markers associated with gene coding regions are useful tools for bridging functional and structural
genomics. Due to their high abundance in plant genomes, single nucleotide polymorphisms (SNPs) are
present within virtually all genomic regions, including most coding sequences. The objective of this study
was to develop a set of SNPs for the apple by taking advantage of the wealth of genomics resources available
for the apple, including a large collection of expressed sequence tags (ESTs). Using bioinformatics tools, a
search for SNPs within an EST database of approximately 350,000 sequences developed from a variety of
apple accessions was conducted. This resulted in the identification of a total of 71,482 putative SNPs. As the
apple genome is reported to be an ancient polyploid, attempts were made to verify whether those SNPs
detected in silico were attributable either to allelic polymorphisms or to gene duplication or paralogous or
homeologous sequence variations. To this end, a set of 464 PCR primer pairs was designed, amplification was
performed on two subsets of plants, and the PCR products were sequenced. The SNPs retrieved from these sequences
were then mapped onto apple genetic maps, including a newly constructed map of a Royal Gala × A689-24
cross and a Malling 9 × Robusta 5 map, using a bin mapping strategy. The SNP genotyping was performed
using the high-resolution melting (HRM) technique. A total of 93 new markers containing 210 coding SNPs
were successfully mapped. This new set of SNP markers for the apple offers new opportunities for
understanding the genetic control of important horticultural traits using quantitative trait loci (QTL) or
linkage disequilibrium analysis. These also serve as useful markers for aligning physical and genetic maps,
and as potential transferable markers across the Rosaceae family.
© 2008 Elsevier Inc. All rights reserved.
Introduction
Recent advances in plant genomics have moved beyond model
systems to various plant species of significant agronomical and
horticultural importance. Since the release of genome sequences of
Arabidopsis and rice in the past few years [1–4], a number of
comprehensive tools such as bioinformatics tools for sequence
assembly and functional annotation, microarray platforms for high-throughput gene expression, transformation systems, and large cDNA
and gDNA libraries have been developed for a range of species,
including those with a relatively small research investment such as
the apple (Malus × domestica Borkh.) [5]. Following five years of
intensive research efforts and investment in genomics, the apple
now possesses a large collection of expressed sequence tags (ESTs)
(>250 K, dbEST April 2008) comparable to those of well-investigated
animal livestock species and cereal crops. Moreover, a public database
for Rosaceae genomics (www.rosaceae.org), a number of saturated
⁎ Corresponding author. Fax: +1 217 333 8298.
E-mail address: korban@uiuc.edu (S.S. Korban).
1 The first two authors contributed equally to this work.
0888-7543/$ – see front matter © 2008 Elsevier Inc. All rights reserved.
doi:10.1016/j.ygeno.2008.07.008
genetic maps [6], and whole genome sequencing, currently in progress
(R. Velasco, unpublished), are also available. A major challenge for
plant biologists working with apple is to integrate these various tools
to better understand genome structure and function in this woody
temperate perennial fruit crop. This will provide opportunities for
developing robust and reliable molecular markers linked to genes of
interest as well as isolation and characterization of target genes and
regulatory elements for crop improvement efforts using marker-assisted breeding and/or plant transformation using native genes and
regulatory elements, i.e., cis-genesis [7]. To pursue effective crop
improvement strategies, when molecular controls of traits of interest
are under investigation, these must be associated with observed
phenotypic variations. One strategy that can be used to achieve this
goal involves correlating observed DNA sequence polymorphism with
the phenotype using either genetic linkage mapping or association
studies. As a prelude to these approaches, a comprehensive set of
molecular markers must be developed from characterized genes using
genomics tools.
Single nucleotide polymorphisms (SNPs) are good candidates for
marker development, as they constitute the most common DNA
sequence variations found in genomes of most organisms, including
the apple [5]. SNPs, defined broadly to include small deletions and
insertions, are found in most genomic regions, including coding
regions, thus rendering them effective markers for mapping genes. In
general, implementation of SNPs for genetic studies, such as linkage
mapping or association studies, involves a three-step process. The first
step is the SNP discovery phase, which involves the detection of de
novo polymorphism by conducting bioinformatics sequence comparisons or molecular biology techniques or a combination of both. This is
followed by a validation step to distinguish DNA polymorphisms of
genuine allelic variants from those of other biological phenomena,
such as gene duplication events; i.e., paralogous or homeologous
genes, as well as those of technical errors, primarily sequencing errors,
as many databases are based on high-throughput single-pass
sequencing. The final step involves characterization of large numbers
of individuals using a high-throughput genotyping approach. A wide
range of molecular techniques suitable for pursuing these three steps
have been described [8], each characterized by a distinct cost-scale
and throughput capacity, and utilizing different technology platforms.
An example of developing a SNP marker from a candidate gene for
red flesh and foliage color in apple has recently been reported [9]. In a
targeted study on a small number of candidate genes, Chagné et al. [9]
have demonstrated that a single 4-bp insertion–deletion marker
located within a candidate gene cosegregates with red color in a
mapping population of more than 500 individuals. In this study, a
comprehensive approach is adopted to identify a set of markers linked
(and in a few cases possibly corresponding) to complex as well as
simply inherited traits. SNP markers located within expressed genes
have been identified in public apple EST collections and then placed on
a genetic map constructed for an important commercial apple cultivar,
Royal Gala, using selective (or bin) mapping [10,11]. A critical
application for these new markers will be the alignment of genetic
and physical maps in order to assemble the forthcoming apple
genome sequence, and the subsequent fast-tracking of positional
cloning strategies for major genes and quantitative trait loci (QTL)
controlling important agronomic characters. This resource of new SNP
markers will be particularly useful for future applications in reverse
genetics studies and in apple breeding, and also for understanding
apple genome evolution within the family Rosaceae.
Results
Genetic map construction and bin mapping set
Genetic maps were constructed for the parents Royal Gala and
A689-24 using 152 markers for each (Supplemental Material 1). These
markers consisted of 73 AFLP fragments, 155 SSRs (51 of which were
common to both maps), and 25 SCARs. A total of 20 and 18
linkage groups (LGs) spanning 997.1 and 1053.2 cM were obtained for
Royal Gala and A689-24, respectively. For LG 1 in Royal Gala, only two
SSRs were mapped and cosegregated perfectly, resulting in a 0 cM
group. Most other LGs contained at least two previously described SSR
markers, except for four LGs. For Royal Gala, LGs 12 and 13 had only
one published SSR and LG 7 did not contain any SSR marker; however,
a published SCAR marker (NLscDdARM) [12] was mapped. For A689-24, only one LG did not contain any published SSR marker; however, as
it was the last group to remain orphaned, it was deduced that this
probably corresponded to LG 7. Altogether, this allowed alignment of
both maps to published apple maps, and assignment of the same LG
numbering [13].
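The map lengths above are expressed in centimorgans. The Materials and methods state that the Kosambi function was used for map distance calculation; the sketch below shows that conversion from a recombination fraction r to a distance, d = 25 ln((1 + 2r)/(1 − 2r)) cM. The function name and the example values are illustrative assumptions and are not taken from JoinMap.

```python
import math


def kosambi_cm(r: float) -> float:
    """Convert a recombination fraction (0 <= r < 0.5) to a Kosambi distance in cM."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    # Kosambi: d (Morgans) = 0.25 * ln((1 + 2r) / (1 - 2r)); times 100 gives cM.
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


if __name__ == "__main__":
    # Hypothetical recombination fractions between adjacent markers.
    for r in (0.0, 0.05, 0.10, 0.20):
        print(f"r = {r:.2f} -> {kosambi_cm(r):.1f} cM")
```

Two markers that cosegregate perfectly have r = 0 and therefore a distance of 0 cM, which is the situation reported for LG 1 of Royal Gala.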
For the bin mapping set selection, markers from the Royal Gala
dataset were inspected for absence of missing data and for suspect
double-recombination events. The number of markers was reduced to
allow division of the genome map into 54 and 60 bins for Royal Gala
and A689-24, respectively. A subset of 14 individuals was selected,
representing a large number of distinct recombination events, and
was designated as the bin mapping set. The bin mapping set was
validated by adding 10 apple and pear SSR markers of known
locations. These markers were chosen to fill in gaps that were not
covered by our genetic maps, when compared with published maps.
All 10 SSRs mapped to their expected positions (Supplemental
Material 1), thus indicating that bin mapping was accurate enough
to justify adding new markers. In addition, this strategy made it
possible to increase the coverage of the framework map.
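The selection of a small set of individuals that together carry many distinct recombination events can be illustrated with a greedy procedure. This is only a sketch: the study selected its 14 individuals iteratively with manual expert evaluation, so the automated rule, the function names, and the toy genotype strings below are all assumptions made for illustration. Each string gives the parental phase (`a` or `b`) of one individual at ordered framework markers of a single linkage group.

```python
def breakpoints(genotypes: str) -> set[int]:
    """Indices i where the phase changes between ordered markers i and i + 1."""
    return {i for i in range(len(genotypes) - 1) if genotypes[i] != genotypes[i + 1]}


def pick_bin_set(progeny: dict[str, str], size: int) -> list[str]:
    """Greedily pick individuals that add the most not-yet-covered breakpoints."""
    covered: set[int] = set()
    chosen: list[str] = []
    candidates = dict(progeny)
    while candidates and len(chosen) < size:
        # sorted() makes tie-breaking deterministic (alphabetical by name).
        best = max(sorted(candidates),
                   key=lambda name: len(breakpoints(candidates[name]) - covered))
        gain = breakpoints(candidates[best]) - covered
        if not gain:
            break  # no remaining individual adds a new recombination interval
        covered |= gain
        chosen.append(best)
        del candidates[best]
    return chosen


if __name__ == "__main__":
    toy_progeny = {  # hypothetical data, six ordered markers
        "P1": "aaabbb",
        "P2": "aabbbb",
        "P3": "aaaaaa",
        "P4": "abbbba",
        "P5": "aaabbb",
    }
    print(pick_bin_set(toy_progeny, size=3))
```

Adjacent markers that are never separated by a breakpoint in the chosen individuals fall into the same bin, which is why fewer individuals give coarser bins.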
SNP discovery in the apple EST dataset
The new EST assembly using 350,051 EST sequences resulted in a
total of 93,959 nonredundant sequences (NRs) at a clustering
threshold of 95%. This included 37,885 contigs representing a total
estimated sequence length of 32.1 Mb of expressed sequences,
approximately 4% of the estimated genome size of the apple. Of
these contigs, 17,825 contained at least four ESTs per alignment, and
could thus be used for SNP discovery. A set of 68 contigs, each
containing more than 200 ESTs in their alignment, was excluded
because of computing time issues. Finally, a set of 17,757 contigs
representing an estimated length of 10.7 Mb were used for SNP
detection (Table 1).
A total of 71,482 bi-allelic SNPs were detected in 9555 contigs
(53.8%), corresponding to an average occurrence of one SNP every
149 bp. A total of 34,361 transitions (A↔G and C↔T) and 37,121
transversions (A↔C, A↔T, C↔G, G↔T) (Fig. 1) were detected, with A↔G
being the most common (18,365; 25.7%) and C↔G the least common
(6181; 8.6%) variation observed.
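The transition and transversion classes used above can be reproduced with a few lines of code. The sketch below classifies a pair of alleles and recomputes the reported percentages from the counts given in the text; the function name is an illustrative assumption.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def snp_class(allele1: str, allele2: str) -> str:
    """Return 'transition' or 'transversion' for a bi-allelic SNP."""
    pair = {allele1.upper(), allele2.upper()}
    if len(pair) != 2 or not pair <= PURINES | PYRIMIDINES:
        raise ValueError("need two different bases from A, C, G, T")
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    same_ring_type = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_ring_type else "transversion"


if __name__ == "__main__":
    # Counts as reported in the text.
    total, transitions, transversions = 71_482, 34_361, 37_121
    a_g, c_g = 18_365, 6_181
    assert transitions + transversions == total
    print(f"A<->G: {100 * a_g / total:.1f}% of all SNPs")
    print(f"C<->G: {100 * c_g / total:.1f}% of all SNPs")
    print(f"transition/transversion ratio: {transitions / transversions:.2f}")
```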
SNPs, bioinformatics analysis, and validation
A subset of 600 NR sequences with significant blast matches was
selected. Putative functions of these sequences were recorded based
on their blastx scores. In addition, we recorded the types of SNPs
detected, and whether or not sequence variations induced changes in
amino acid sequences, i.e., synonymous vs nonsynonymous. A total of
1949 SNPs were identified, including 852 (43.7%) nonsynonymous
SNPs and 1097 synonymous SNPs. For Royal Gala, 1266 SNPs were
found to be heterozygous.
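Whether a coding SNP is synonymous or nonsynonymous follows directly from translating the reference and variant codons. The sketch below uses the standard genetic code and assumes the reading frame is already known; the function name and the example codons are illustrative and are not drawn from the apple dataset.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(codon: str, pos: int, alt: str) -> tuple[str, str, str]:
    """Classify a substitution at codon position pos (0-2); return (kind, ref_aa, alt_aa)."""
    codon, alt = codon.upper(), alt.upper()
    if len(codon) != 3 or pos not in (0, 1, 2) or alt not in BASES or alt == codon[pos]:
        raise ValueError("need a 3-base codon, pos in 0..2, and a different alt base")
    mutant = codon[:pos] + alt + codon[pos + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return kind, ref_aa, alt_aa


if __name__ == "__main__":
    print(classify_snp("GCT", 2, "C"))  # third-position change within alanine codons
    print(classify_snp("GAT", 2, "A"))  # aspartate to glutamate
```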
From these 600 NRs, a set of 464 PCR primer pairs encompassing
1434 SNPs was designed (Supplemental Material 2). Of these primer
pairs, 341 amplified a single PCR product size, while the remainder
yielded either no amplification product or complex patterns (more
than two size products). Two approaches were employed according to
the set of plants used to validate the SNPs. In the first approach, 25
amplicons were sequenced from the bin mapping set; while in the
second approach, 316 amplicons were sequenced from six apple
genotypes. A total of 110 amplicons, 10 and 100 using the first and
second approaches, respectively, yielded poor sequence quality that
could not be used for SNP detection. The remaining 231 (67.7%)
amplicons and their sequences were used for verification of SNPs
detected using ESTs. Out of 728 putative SNPs found in transcribed
regions of these 231 amplicons, 257 could not be verified because
of poor sequence quality. For the remaining 471 putative SNPs, 327
(69.4%) were independently verified in new sequences and
Table 1
Description of the contig set used for single nucleotide polymorphism detection

                                        Number     Cumulative size (Mb)
ESTs                                    350,051
Singletons                              56,074
Contigs                                 37,885     32.1
Total                                   93,959
NRs (more than 4 ESTs per contig)       17,825
NRs (more than 200 ESTs per contig)     68
Total NRs used for SNP detection        17,757     10.7
NRs containing SNP                      9,555
Fig. 1. Classes of single nucleotide polymorphism detected in the full contig set of the
apple.
corresponded to true SNPs, whereas 112 (23.7%) were classified as
paralogous sequence variations, as they did not show any segregation in the bin mapping set. The remaining 32 SNPs (6.8%) were
deemed probable sequencing errors in EST datasets, as these SNPs
were not found following independent sequencing.
SNPs verified using the first approach were scored, and nine markers
were mapped. For markers validated using the second approach, a high-resolution melting (HRM) analysis was used. For both approaches, SNP
data were compared with framework markers, and the position of the
new marker was assigned by visual inspection. When the HRM analysis
was monomorphic for the Royal Gala × A689-24 population, the bin
mapping set developed by Celton et al. [14], derived from a Malling
9 × Robusta 5 (M.9 × R5) map, was used instead. Of 167 amplicons tested
using the HRM analysis, 84 (50.3%) were polymorphic and were
mapped. A total of 93 markers segregated, and the locations of 90
markers were assigned to known bins in the maps of both Royal
Gala × A689-24 and M.9 × R5 (Supplemental Material 3). Three markers
could not be assigned to any LG. Altogether, a set of 93 new EST-based
markers corresponding to 210 new coding SNPs were added to the apple
genetic map. To assess the utility of this resource for future comparative
genomics studies in the family Rosaceae, contig sequences associated
with these 93 markers were blasted against Prunus sequences. A large
number of these apple contigs (73 out of 93) detected potentially
orthologous sequences in Prunus (Blastn score <1 × 10⁻²⁰).
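Bin assignment amounts to comparing the genotype pattern of a new marker across the bin mapping set with the pattern that defines each bin. In this study the comparison was made by visual inspection; the sketch below is a hypothetical automated equivalent, with invented bin names, toy patterns for six individuals, and `-` standing for a missing genotype.

```python
def assign_bin(marker: str, bins: dict[str, str], max_mismatch: int = 0) -> list[str]:
    """Return the names of bins whose pattern matches the marker genotypes."""
    hits = []
    for name, pattern in bins.items():
        if len(pattern) != len(marker):
            raise ValueError(f"pattern length differs for bin {name}")
        # Missing marker genotypes ('-') are skipped rather than counted as mismatches.
        mismatches = sum(1 for m, p in zip(marker, pattern) if m != "-" and m != p)
        if mismatches <= max_mismatch:
            hits.append(name)
    return hits


if __name__ == "__main__":
    toy_bins = {  # hypothetical bin signatures
        "LG2_bin1": "aabbab",
        "LG2_bin2": "aabbbb",
        "LG15_bin3": "babbaa",
    }
    print(assign_bin("aabbbb", toy_bins))  # unique match
    print(assign_bin("aabb-b", toy_bins))  # missing data leaves two candidate bins
    print(assign_bin("bbbbbb", toy_bins))  # no match: marker stays unassigned
```

An empty result corresponds to a marker that cannot be placed in any known bin, as was the case for three markers here.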
Discussion
The efficacy of in silico SNP detection in apple ESTs
The availability of large sequence databases for a number of plant
species makes it possible to identify DNA variations corresponding to
SNPs [15,16]. SNP discovery within EST collections using bioinformatics tools has been successful in several plant species [17–20]. In
this study, a total of 71,482 SNPs were detected in 9555 nonredundant
coding regions from a set of 350,051 apple ESTs. This corresponds to an
average occurrence of one SNP every 149 bp, and approximately one
out of two contigs contains at least one putative SNP. This is
comparable to the frequency of SNP discovery observed in other
outcrossing plant species, such as pine (one SNP every 102 bp) [18],
and lower than white clover (one SNP every 54 bp) [21].
A preliminary SNP detection conducted using the HortResearch
EST dataset alone [5] reported the presence of 18,408 SNPs in 3915
contigs. In this study, with an increase in number of ESTs analyzed
from 151,687 to 350,051, number of contigs and cumulative sequence
lengths covered showed a 2.1- to 2.4-fold increase, resulting in a
drastic increase (3.9-fold) in the number of SNPs detected. This
increase is probably due to the larger number of genotypes used in
generating cDNA libraries represented in the combined EST dataset.
The analysis by Newcomb et al. [5] was based on a dataset
predominantly (79.8%) consisting of sequences from one apple
cultivar (Royal Gala), while the expanded set was derived from a
diverse set of apple cultivars, thus reducing the contribution from
Royal Gala sequences to 40.1%, followed by GoldRush (32.2%) and then
M.9 (5.0%). These results confirm the reported hypothesis that the
number of individuals used for generating ESTs has a strong influence
on SNP detection [18] and frequency.
The forthcoming whole genome sequence of the apple will be
based on a single apple genotype, Golden Delicious. Although Golden
Delicious is a diploid cultivar, thus permitting the detection of SNPs
heterozygous for this cultivar, many variants present in the wider apple
germ plasm base will remain undetected. With a single genotype
sequence, the probability of detecting SNPs will be limited by the fact
that only two haploid genomes are represented in any one individual,
even in cases where the read depth is high (whole genome sequence
data usually have a minimum read depth of 10× coverage). Indeed, out
of 164 SNPs heterozygous in Royal Gala that were sequenced in Golden
Delicious, 76 (46.3%) were found to be homozygous in Golden
Delicious. While providing a useful framework for assessing genome
variation, the apple whole genome sequence is not expected to provide
a complete picture of the extant genetic variation present in the entire
species. Additional sequencing using multiple genotypes will be
required to enhance the power and/or efficacy of SNP detection and
its downstream utilization. This is now possible even for crops such as
the apple, given the recent reduction in sequencing costs due to new
technologies and platforms for high-throughput sequencing.
In this study, a markedly higher proportion of synonymous SNPs
(56.3%) was detected than expected (24%) if mutations occurred
randomly. This observation is likely to be due to selective pressure
operating on the position of SNPs within a gene and imposing variation
constraints. This results in synonymous SNPs being more likely to be
retained at certain sites for genes under purifying selection. However,
it should be noted that the approach for determining the position of
start codons used in this study is very conservative, i.e., comparison of
highly conserved sequences between apple and queried protein
sequences. This indicates that the set of sequences used in this study
is biased and as a result may be subjected to a correspondingly biased
set of selection pressures. This renders extending our conclusions to all
other apple genes or to the whole genome difficult.
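A random-mutation expectation close to the 24% quoted above can be obtained by enumerating every single-base substitution in the sense codons of the standard genetic code. The source does not state how its expected value was derived, so the assumptions here (equal codon usage, equal rates for all substitutions, stop codons excluded as starting points) are ours.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def substitution_counts() -> tuple[int, int]:
    """Count synonymous and nonsynonymous single-base changes over all sense codons."""
    syn = nonsyn = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # start only from sense codons
        for pos in range(3):
            for alt in BASES:
                if alt == codon[pos]:
                    continue
                mutant = codon[:pos] + alt + codon[pos + 1:]
                if CODON_TABLE[mutant] == aa:
                    syn += 1
                else:
                    nonsyn += 1  # includes changes to a stop codon
    return syn, nonsyn


if __name__ == "__main__":
    syn, nonsyn = substitution_counts()
    print(f"expected synonymous fraction: {100 * syn / (syn + nonsyn):.1f}%")
    # Observed in the 600 annotated contigs: 1097 synonymous out of 1949 SNPs.
    print(f"observed synonymous fraction: {100 * 1097 / 1949:.1f}%")
```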
Although both synonymous and nonsynonymous SNPs are equally
useful for mapping, nonsynonymous SNPs are arguably better targets
for correlating genotypes with phenotypes in candidate gene mapping
approaches, since nonsynonymous changes are more likely to lead to
changes in protein structure which, in turn, are more likely to have an
effect on plant phenotype. To determine whether nonsynonymous SNPs
can contribute to changes in protein structure, additional bioinformatics
analysis is required, e.g., analysis of a number of conservative versus
nonconservative substitutions generated by nonsynonymous changes.
However, synonymous SNPs may also contribute to phenotypic
variations and should not be completely ruled out [22]. This dataset
provides an important resource for association studies to determine
those SNPs linked to trait(s) of interest to breeders.
True SNPs versus paralogous variations
When bioinformatics tools are used to assemble EST datasets and
detect SNPs, alignments of contig sequences correspond only to
putative gene contigs, and may contain both paralogous and homeologous sequences. Unfortunately, there is no set value for sequence
assembly that can eliminate this problem, and the risk of compacting
contigs creates difficulties. As the apple genome is known to have
originated from an ancient tetraploidization [23], presence of homeologs in contigs of highly conserved genes is not surprising. Similarly,
it is not unusual to find paralogs within contigs of genes that have
undergone a recent duplication event or where the duplication event
is more ancient, but both copies are under selective constraints. In this
study, 23.7% of putative SNPs correspond to either paralogous or
homeologous sequence variations, which is a lower proportion than
that observed in white clover, another paleotetraploid outcrossing
plant [21]. It might be expected that the approach used to select for NR
contigs has resulted in additional bias toward contigs containing
homeologous genes, and this may have influenced the frequency of
SNPs detected. An independent assembly of the public Malus ESTs has
been performed and it is available on the Genomics Database for
Rosaceae (GDR; www.rosaceae.org; Malus assembly v3). This contig
set has detected 14,298 SNPs in 23,868 contigs. However, this
bioinformatics analysis is based on assembly parameters less stringent
(CAP3, -p 90) than those used in this study, which likely increases the
abundance of paralogous sequences clustered within contigs. Hence,
we speculate that a substantial number of the putative SNPs detected
from the GDR assembly may be due to nonallelic variations.
SNP validation and bin mapping
The SNP validation strategy used in this study consists of
sequencing PCR products obtained from genomic DNA PCR amplification. Segregation is then tested in a bin mapping set using both
resequencing and HRM. Allelic variants, true SNPs, are likely verified
in alignments obtained from sequenced fragments. Moreover, if these
SNPs exhibit variation for Royal Gala in EST alignments, they should
also segregate in a controlled cross having Royal Gala as one of the
parents. For those instances where in silico-detected SNPs could not
be detected upon resequencing, these were deemed either sequencing
or cloning errors. For those cases where verified sequence variants did
not exhibit any segregation in the bin mapping set, these must have
resulted from either paralogous or homeologous loci. Overall, only a
small subset of original SNPs identified were evaluated by resequencing. From an initial 1434 putative SNPs located in 464 PCR amplicons
designed, only 471 SNPs, including paralogous variants and sequencing errors, could be retrieved and analyzed. This is attributed to the
sequencing technique used, which generated a high proportion of
poor-quality sequences. Similar observations were made when a
comparable method was used to develop SNP markers in Vitis vinifera
[24]. When the HRM approach was used, a higher proportion of
candidate SNPs were validated and mapped (84 out of 167 amplicons),
and thus this approach was deemed more efficient. Relatively few
HRM reactions gave no amplification products due to either one or
more primers traversing a splicing site, but some did exhibit complex
melting patterns that were difficult to score. Therefore, the HRM
approach is the method of choice for future SNP development and for
medium level throughput genotyping. Although the HRM technique
has been used to detect mutations associated with chronic diseases in
humans [25,26] and for detection of RNA editing in Arabidopsis [27],
this is the first example, to our knowledge, of its use in gene mapping
in plants.
The strategy used in this study went beyond verification of
segregation of apple SNP markers, as a selective mapping approach
has been also used to identify putative chromosomal locations of
these SNPs. Use of a carefully chosen set of individuals has enabled
efficient validation with simultaneous mapping of SNPs. The apple
genetic framework map developed in this study for this purpose is the
first for the high-quality cultivar Royal Gala, and this is the first bin
mapping set developed for this cultivar. Previously, a bin mapping
strategy has been successfully employed in the peach, another
Rosaceae species [10], and in an apple rootstock map [14].
Potential use of these new SNPs for marker/trait associations
Although some EST-based SSR and NBS-LRR markers have recently
been mapped [14,28–30] (166 in total), published genome maps for
the apple are relatively poor in transcribed sequences compared with
other crops. This study adds a new set of 93 gene coding markers to the
Royal Gala map, which is the largest increase in gene coding sequence
for an apple genetic map reported so far. This set represents a good
resource for identifying genome colocation events between candidate
genes and QTLs. For example, our new set of markers include some
resistance gene analogues that map close to major resistance genes for
several pests and diseases. SNP markers developed from DQ644420
and EB121887 mapped to the middle and bottom of LG 2, respectively,
where several resistance genes controlling the fungal disease apple
scab have been reported [31]. Other previously mapped genes, such as
ACC synthase (MDU73816, LG 15 [32]) and an allergen protein
(EB133053, LG 13 [33]), have been remapped using these new apple
SNPs, confirming the validity of the approach.
Contigs containing these newly developed SNP markers will
enable the development of orthologous markers for comparative
genome mapping studies in Rosaceae. When the best blastn score
against Prunus sequences was analyzed (Supplemental Material 3),
high sequence similarities between ESTs used to design these new
markers and Prunus ESTs were found, with 78.5% of the new apple
contigs showing an expected blastn score <1 × 10⁻²⁰. This dataset
presents a particularly useful resource to build on previous mapping
studies [34], and thus for assessing the degree of synteny among
members of the Rosaceae family. Candidate orthologous markers for
these apple genes can be used across the family Rosaceae, using either
sequencing or HRM to identify the SNPs present in each genus.
This study is the first example of systematic SNP development
from sequence information in the apple. It has demonstrated the
importance of using sequence databases containing a broad germ
plasm base. The candidate genes and approaches used are valuable
resources for future SNP development for genetic mapping, comparative genome mapping, and association studies.
Materials and methods
Construction of a genetic map for a ‘Royal Gala’ × A689/24 cross and
development of a bin mapping set
A genetic map was constructed using a population of 173
individuals from a cross between Royal Gala and A689-24. Trees
were grown at the HortResearch orchard in Havelock North, New
Zealand. DNA was extracted as described by Gardiner et al. [35]. The
map was constructed using simple sequence repeats (SSRs), amplified
fragment length polymorphisms (AFLPs), and sequence characterized
amplified regions (SCARs). SSRs were PCR amplified as described by
Maliepaard et al. [36]. The parental linkage maps were constructed
with the aid of Joinmap v3.0 [37] using markers informative for each
of the parents, and according to the double-pseudo-testcross mapping
strategy [38]. A minimum LOD score of 3.0 was used for grouping, the
Kosambi function was used for map distance calculation, and the maps
were constructed after three rounds.
SNPs were validated by evaluating their segregation in a subset of
plants selected from the whole mapping population using the
following protocol. For each linkage group, markers were sorted and
selected to cover the genome with intervals of 10 to 30 cM, referred to
as bins. Markers with missing data were not selected. Those individuals
presenting the most unique recombination events for these bins were
selected to reduce redundancy in the progeny set. A set of 14 of these
individuals (the bin mapping set), representing a large number of
distinct recombination events over available linkage groups, was
selected by an iterative approach using manual expert evaluation [14].
EST sources and database construction
A total of 350,051 sequences from three different sets of apple ESTs
was used to populate the database (Supplemental Material 4). The first
set was described by Newcomb et al. [5] and consists of 151,687 ESTs,
with a large proportion (78.9%) obtained from the cultivar Royal Gala.
A second set was developed at the University of Illinois and consisted
of 101,581 publicly released ESTs [39] and 80,660 new sequences
originating from a number of apple accessions. A third, smaller set
corresponded to various apple coding sequences deposited in
GenBank. ESTs were assembled into a single nonredundant dataset
and annotated using Bioview [40] as described by Newcomb et al. [5].
Sequencing errors were minimized during the assembly process by
removing low-quality sequence regions. The information on the
cultivar of origin was recorded for each EST in order to identify the
genetic background of each of the sequences, and information for sets
one and two is available on the GDR database.
SNP discovery and characterization
An automated SNP tool developed within Bioview [40] was used
to search for SNPs within EST alignments. Sequence variation with
both variants occurring at least twice in the contig alignments was
retained in order to minimize detection of sequencing errors. A
conservative approach was used to find open reading frames (ORFs), both
to ensure accurate annotation and to account for the lack of a
publicly available whole genome sequence for apple for use in
aligning gene sequences. A subset of highly conserved contigs was
used rather than an error-prone automated bioinformatics analysis
(e.g., based on ORF finder-type scripts). A subset of 600 nonredundant
contigs was selected according to the following criteria: (a) best
blastx matches with expected values of less than 1 × 10⁻²⁰ when
checked against plant proteins (UniRef90 database, [41]), and (b)
proteins from alignments between queried and translated sequences
beginning with the first amino acid, along with the first 15 amino
acids being identical. Contigs were first checked for the presence of
SNPs heterozygous in Royal Gala, and then SNPs were annotated by
comparing the amino acid translations to determine whether they
were synonymous or nonsynonymous, i.e., whether they coded for an
amino acid change.
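The detection rule stated at the start of this section, that both variants must occur at least twice in a contig alignment, can be sketched as a column-wise allele count. This is not the Bioview tool itself: the function name, the toy reads, and the choice to ignore any third base seen fewer than `min_count` times are assumptions made for illustration.

```python
from collections import Counter


def call_snps(reads: list[str], min_count: int = 2) -> list[tuple[int, str, str]]:
    """Return (0-based position, allele1, allele2) for columns with two supported alleles."""
    snps = []
    for pos, column in enumerate(zip(*reads)):
        # Count only unambiguous bases; gaps and N are ignored.
        counts = Counter(base for base in column if base in "ACGT")
        supported = sorted(base for base, n in counts.items() if n >= min_count)
        if len(supported) == 2:
            snps.append((pos, supported[0], supported[1]))
    return snps


if __name__ == "__main__":
    toy_alignment = [  # hypothetical aligned ESTs from one contig
        "ACGTAC",
        "ACGTAC",
        "ATGTAC",
        "ATGTAA",
        "ACGTAC",
    ]
    # Column 1 has C x3 and T x2 and is reported; the single A in column 5 is not.
    print(call_snps(toy_alignment))
```

With this rule a contig needs at least four ESTs before any SNP can be called, which matches the minimum alignment depth used for SNP discovery in the Results.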
SNP validation and mapping
A set of SNPs located in translated regions, both synonymous and
nonsynonymous, were selected for a validation trial and to estimate
the proportion of sequence variations corresponding to either true
SNPs or sequencing errors or attributable to gene duplication. PCR
primer pairs were designed using Primer 3 (http://frodo.wi.mit.edu/primer3/input.htm)
to yield PCR products that could be sequenced in a
single-pass sequencing reaction (300 to 550 bp), encompassing at
least one SNP, and preferably showing some sequence variability for
Royal Gala in EST alignments. Based on different sets of individuals
used for PCR amplification and sequencing, two approaches were used
to validate SNPs. The first approach consisted of resequencing PCR
fragments amplified from the 14 highly informative individuals from
the Royal Gala × A689-24 mapping population (bin mapping set) along
with Golden Delicious and Coop 17. The second approach consisted of
resequencing PCR fragments from six apple genotypes (Royal Gala,
Malling 9, Golden Delicious, Coop 17, Fuji, and GoldRush) and then
redesigning primers to generate shorter fragments (<300 bp), suitable
for SNP analysis by the high-resolution melting (HRM) technique [42]
on a LightCycler 480 instrument (Roche Diagnostics), in the bin
mapping set. Briefly, this latter technique allows detection of
mutations based on differential melting of PCR-amplified double-stranded DNA fragments. The melting analysis is performed at the end
of the PCR reaction, and the reaction mix contains a high-fidelity
intercalating dye. Products are slowly denatured, reannealed to
initiate the formation of heteroduplexes, and then melted again. The
decrease in the fluorescence intensity is measured, and the difference
in the melting temperature signals whether or not the sample
contains heteroduplexes (and hence is heterozygous). For both
approaches, raw sequence traces were aligned for each amplicon
using Sequencher v4.5 (Gene Codes, Ann Arbor, MI, USA), and SNPs
were visually scored for each genotype.
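The amplicon criteria stated above (300 to 550 bp for single-pass sequencing, shorter than 300 bp for HRM, and at least one SNP within the product) can be expressed as a simple screen. The sketch below is illustrative only: the Amplicon class, the suits_assay function, the coordinates, and the lower size bound for HRM are assumptions, and primer footprints are ignored.

```python
from dataclasses import dataclass

# Size ranges in bp, inclusive. The sequencing range comes from the text;
# the text gives only an upper limit (< 300 bp) for HRM, so the lower
# bound of 50 bp is a hypothetical placeholder.
SEQUENCING_RANGE = (300, 550)
HRM_RANGE = (50, 299)


@dataclass
class Amplicon:
    name: str
    start: int  # 0-based position of the first amplified base on the contig
    end: int    # position just past the last amplified base

    @property
    def length(self):
        return self.end - self.start


def suits_assay(amplicon, snp_positions, size_range):
    """Return True if the amplicon size is in range and it spans >= 1 SNP."""
    min_len, max_len = size_range
    spans_snp = any(amplicon.start <= p < amplicon.end for p in snp_positions)
    return min_len <= amplicon.length <= max_len and spans_snp


if __name__ == "__main__":
    snps = [150, 600]  # hypothetical SNP positions on one contig
    long_product = Amplicon("hypothetical_long", 100, 520)   # 420 bp
    short_product = Amplicon("hypothetical_short", 120, 260)  # 140 bp
    for amp in (long_product, short_product):
        print(amp.name,
              "sequencing:", suits_assay(amp, snps, SEQUENCING_RANGE),
              "HRM:", suits_assay(amp, snps, HRM_RANGE))
```

A screen of this kind only checks size and SNP coverage; whether a sequence variation is a true SNP is still decided by resequencing or melting analysis as described above.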
Acknowledgments
This work was funded by HortResearch (to DC and SEG). This
project was also supported by the USDA Cooperative State Research,
Education and Extension Service, National Research Initiative, Plant
Genome Program Grant 2005-35300-15538 (to SSK). The authors
thank Dr. Jean-Marc Celton for providing DNA from the M.9 × R5
population, Paula Jones and Anthony Thrush from Roche Diagnostics
NZ Ltd for their help with the HRM technique, and Drs. Nnadozie
Oraguzie and Vincent Bus for helpful comments on the manuscript.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in
the online version, at doi:10.1016/j.ygeno.2008.07.008.
References
[1] J. Yu, S.N. Hu, J. Wang, G.K.S. Wong, S.G. Li, B. Liu, Y.J. Deng, L. Dai, Y. Zhou, X.Q.
Zhang, et al., A draft sequence of the rice genome (Oryza sativa L. ssp indica),
Science 296 (5565) (2002) 79–92.
[2] S.A. Goff, D. Ricke, T.H. Lan, G. Presting, R.L. Wang, M. Dunn, J. Glazebrook, A.
Sessions, P. Oeller, H. Varma, et al., A draft sequence of the rice genome (Oryza
sativa L. ssp japonica), Science 296 (5565) (2002) 92–100.
[3] X.Y. Lin, S.S. Kaul, S. Rounsley, T.P. Shea, M.I. Benito, C.D. Town, C.Y. Fujii, T. Mason,
C.L. Bowman, M. Barnstead, et al., Sequence and analysis of chromosome 2 of the
plant Arabidopsis thaliana, Nature 402 (6763) (1999) 761.
[4] K. Mayer, C. Schuller, R. Wambutt, G. Murphy, G. Volckaert, T. Pohl, A. Dusterhoft,
W. Stiekema, K.D. Entian, N. Terryn, et al., Sequence and analysis of chromosome 4
of the plant Arabidopsis thaliana, Nature 402 (6763) (1999) 769.
[5] R.D. Newcomb, R.N. Crowhurst, A.P. Gleave, E.H.A. Rikkerink, A.C. Allan, L.L.
Beuning, J.H. Bowen, E. Gera, K.R. Jamieson, B.J. Janssen, et al., Analyses of
expressed sequence tags from apple, Plant Physiol. 141 (1) (2006) 147–166.
[6] S.E. Gardiner, V.G.M. Bus, R.L. Rushholme, D. Chagné, E. Rikkerink, Apple, in: C.R. K
(Ed.), Genome Mapping & Molecular Breeding in Plants, Horticultural trees, vol. 4,
Springer, New York, 2006, pp. 1–62.
[7] C.M. Rommens, All-native DNA transformation: a new approach to plant genetic
engineering, Trends Plant Sci. 9 (9) (2004) 457–464.
[8] D. Chagné, J. Batley, D. Edwards, J. Forster, Chapter 5: Single Nucleotide
Polymorphism genotyping in plants, in: N.C. Oraguzie, E. Rikkerink, S.E. Gardiner,
N.H. De Silva (Eds.), Association mapping in plants, Springer, New York, USA, 2006,
pp. 77–94.
[9] D. Chagné, C.M. Carlisle, C. Blond, R.K. Volz, C.J. Whitworth, N.C. Oraguzie, R.N.
Crowhurst, A.C. Allan, R.V. Espley, R.P. Hellens, et al., Mapping a candidate gene
(MdMYB10) for red flesh and foliage colour in apple, BMC Genomics (2007) 8.
[10] W. Howad, T. Yamamoto, E. Dirlewanger, R. Testolin, P. Cosson, G. Cipriani, A.J.
Monforte, L. Georgi, A.G. Abbott, P. Arus, Mapping with a few plants: Using
selective mapping for microsatellite saturation of the Prunus reference map,
Genetics 171 (3) (2005) 1305–1309.
[11] T.J. Vision, D.G. Brown, D.B. Shmoys, R.T. Durrett, S.D. Tanksley, Selective mapping:
A strategy for optimizing the construction of high-density linkage maps, Genetics
155 (1) (2000) 407–420.
[12] P. Roche, G. van Arkel, A.W. van Heusden, A specific PCR assay for resistance to
biotypes 1 and 2 of the rosy leaf curling aphid in apple based on an RFLP marker
closely linked to the Sd(1) gene, Plant Breeding 116 (6) (1997) 567–572.
[13] R. Liebhard, B. Koller, L. Gianfranceschi, C. Gessler, Creating a saturated reference
map for the apple (Malus × domestica Borkh.) genome, Theor. Appl. Genet. 106 (8)
(2003) 1497–1508.
[14] J.-M. Celton, D.S. Tustin, D. Chagné, S.E. Gardiner, Construction of a dense genetic
linkage map for apple rootstocks using SSRs developed from Malus ESTs and Pyrus
genomic sequences, Tree Genet. Genomes, doi:10.1007/s11295-008-0171-z.
[15] L. Picoult-Newberg, T.E. Ideker, M.G. Pohl, S.L. Taylor, M.A. Donaldson, D.A.
Nickerson, M. Boyce-Jacino, Mining SNPs from EST databases, Genome Res. 9 (2)
(1999) 167–174.
[16] P. Taillon-Miller, Z. Gu, Q. Li, L. Hillier, P.-Y. Kwok, Overlapping genomic sequences:
A treasure trove of single-nucleotide polymorphisms, Genome Res. 8 (7) (1998)
748–754.
[17] J. Batley, G. Barker, H. O'Sullivan, K.J. Edwards, D. Edwards, Mining for single
nucleotide polymorphisms and insertions/deletions in maize expressed sequence
tag data, Plant Physiol. 132 (1) (2003) 84–91.
[18] L. Le Dantec, D. Chagné, D. Pot, O. Cantin, P. Garnier-Gere, F. Bedon, J.M. Frigerio, P.
Chaumeil, P. Leger, V. Garcia, et al., Automated SNP detection in expressed
sequence tags: statistical considerations and application to maritime pine
sequences, Plant Mol. Biol. 54 (3) (2004) 461–470.
[19] C. Lopez, B. Piegu, R. Cooke, M. Delseny, J. Tohme, V. Verdier, Using cDNA and
genomic sequences as tools to develop SNP strategies in cassava (Manihot
esculenta Crantz), Theor. Appl. Genet. 110 (3) (2005) 425–431.
[20] D.J. Somers, R. Kirkpatrick, M. Moniwa, A. Walsh, Mining single-nucleotide
polymorphisms from hexaploid wheat ESTs, Genome 46 (3) (2003) 431–437.
[21] N.O.I. Cogan, R.C. Ponting, A.C. Vecchies, M.C. Drayton, J. George, P.M. Dracatos,
M.P. Dobrowolski, T.I. Sawbridge, K.F. Smith, G.C. Spangenberg, et al., Gene-associated single nucleotide polymorphism discovery in perennial ryegrass
(Lolium perenne L.), Mol. Genet. Genom. 276 (2) (2006) 101–112.
[22] C. Kimchi-Sarfaty, J.M. Oh, I.-W. Kim, Z.E. Sauna, A.M. Calcagno, S.V. Ambudkar,
M.M. Gottesman, A “Silent” Polymorphism in the MDR1 Gene Changes Substrate
Specificity, Science 315 (5811) (2007) 525–528.
[23] R.C. Evans, C.S. Campbell, The origin of the apple subfamily (Maloideae; Rosaceae)
is clarified by DNA sequence data from duplicated GBSSI genes, Am. J. Bot. 89 (9)
(2002) 1478–1484.
[24] M. Troggio, G. Malacarne, G. Coppola, C. Segala, D.A. Cartwright, M. Pindo, M.
Stefanini, R. Mank, M. Moroldo, M. Morgante, et al., A Dense Single-Nucleotide
Polymorphism-Based Genetic Linkage Map of Grapevine (Vitis vinifera L.)
Anchoring Pinot Noir Bacterial Artificial Chromosome Contigs, Genetics 176 (4)
(2007) 2637–2650.
[25] M. Liew, L. Nelson, R. Margraf, S. Mitchell, M. Erali, R. Mao, E. Lyon, C. Wittwer,
Genotyping of human platelet antigens 1 to 6 and 15 by high-resolution amplicon
melting and conventional hybridization probes, J. Mol. Diagnost. 8 (1) (2006)
97–104.
[26] J. Montgomery, C.T. Wittwer, J.O. Kent, L.M. Zhou, Scanning the cystic fibrosis
transmembrane conductance regulator gene using high-resolution DNA melting
analysis, Clin. Chem. 53 (11) (2007) 1891–1898.
[27] A.L. Chateigner-Boutin, I. Small, A rapid high-throughput method for the detection
and quantification of RNA editing based on high-resolution melting of amplicons,
Nucl. Acids Res. 35 (17) (2007).
[28] F. Calenge, C.G. van der Linden, E. van de Weg, H.J. Schouten, G. van Arkel, C.
Denance, C.E. Durel, Resistance gene analogues identified through the NBS-profiling method map close to major genes and QTL for disease resistance in apple,
Theor. Appl. Genet. 110 (4) (2005) 660–668.
[29] E. Silfverberg-Dilworth, C.L. Matasci, W.E. van de Weg, M.P.W. van Kaauwen, M.
Walser, L.P. Kodde, V. Soglio, L. Gianfranceschi, C.E. Durel, F. Costa, et al.,
Microsatellite markers spanning the apple (Malus × domestica Borkh.) genome,
Tree Genet. Genomes 2 (4) (2006) 202–224.
[30] N. Suresh, C. Hampson, K. Gasic, G. Bakkeren, S.S. Korban, Development and
linkage mapping of E-STS and RGA markers for functional gene homologues in
apple, Genome 49 (8) (2006) 959–968.
[31] V.G.M. Bus, E.H.A. Rikkerink, W.E.v.d. Weg, R.L. Rusholme, S.E. Gardiner, H.C.M.
Bassett, L.P. Kodde, L. Parisi, F.N.D. Laurens, E.J. Meulenbroek, et al., The Vh2 and
Vh4 scab resistance genes in two differential hosts derived from Russian apple
R12740-7A map to the same linkage group of apple, Mol. Breeding 15 (1) (2005)
103–116.
[32] F. Costa, S. Stella, W.E. Van de Weg, W. Guerra, M. Cecchinel, J. Dallavia, B. Koller,
S. Sansavini, Role of the genes Md-ACO1 and Md-ACS1 in ethylene production
and shelf life of apple (Malus domestica Borkh), Euphytica 141 (1-2) (2005)
181–190.
[33] Z.S. Gao, W.E. van de Weg, J.G. Schaart, H.J. Schouten, D.H. Tran, L.P. Kodde, I.M. van
der Meer, A.H.M. van der Geest, J. Kodde, H. Breiteneder, et al., Genomic cloning
and linkage mapping of the Mal d 1 (PR-10) gene family in apple (Malus
domestica), Theor. Appl. Genet. 111 (1) (2005) 171–183.
[34] E. Dirlewanger, E. Graziano, T. Joobeur, F. Garriga-Caldere, P. Cosson, W. Howad, P.
Arus, Comparative mapping and marker-assisted selection in Rosaceae fruit crops,
Proc. Natl. Acad. Sci. U.S.A. 101 (26) (2004) 9891–9896.
[35] S.E. Gardiner, H.C.M. Bassett, D.A.M. Noiton, V.G. Bus, M.E. Hofstee, A.G. White, R.D.
Ball, R.L.S. Forster, E.H.A. Rikkerink, A detailed linkage map around an apple scab
resistance gene demonstrates that two disease resistance classes both carry the V
(f) gene, Theor. Appl. Genet. 93 (4) (1996) 485–493.
[36] C. Maliepaard, F.H. Alston, G. Van Arkel, L.M. Brown, E. Chevreau, F. Dunemann,
K.M. Evans, S. Gardiner, P. Guilford, A.W. Van Heusden, et al., Aligning male and
female linkage maps of apple (Malus pumila Mill.) using multi-allelic markers,
Theor. Appl. Genet. 97 (1-2) (1998) 60–73.
[37] J.W. Van Ooijen, R.E. Voorrips, JoinMap® 3.0, Software for the calculation of genetic
linkage maps, Plant Research International, Wageningen, The Netherlands, 2001.
[38] D. Grattapaglia, R. Sederoff, Genetic linkage maps of Eucalyptus grandis and
Eucalyptus urophylla using a pseudo-testcross: Mapping strategy and RAPD
markers, Genetics 137 (4) (1994) 1121–1137.
[39] S. Naik, C. Hampson, K. Gasic, G. Bakkeren, S.S. Korban, Development and linkage
mapping of E-STS and RGA markers for functional gene homologs in apple,
Genome 49 (2006) 959–968.
[40] R.N. Crowhurst, M. Davy, C. Deng, BioView - an enterprise bioinformatics system
for automated analysis and annotation of non-genomic DNA sequence, 3rd
Rosaceae Genomics Conference: 2006, Napier, New Zealand, 2006.
[41] B.E. Suzek, H.Z. Huang, P. McGarvey, R. Mazumder, C.H. Wu, UniRef: comprehensive and non-redundant UniProt reference clusters, Bioinformatics 23 (10) (2007)
1282–1288.
[42] M. Liew, R. Pryor, R. Palais, C. Meadows, M. Erali, E. Lyon, C. Wittwer, Genotyping of
single-nucleotide polymorphisms by high-resolution melting of small amplicons,
Clin. Chem. 50 (7) (2004) 1156–1164.