Supporting Text
All sequenced P. acnes genomes encode three copies of the 16S rRNA gene, and these copies are
identical within each isolate, with the exception of KPA171202. According to the KPA171202 genome
(Bruggemann et al., 2004), one of its 16S rRNA gene copies differs by one nucleotide from the other
two copies, which are identical and correspond to RT1. However, this mutation was never observed
in our 16S rDNA dataset. We amplified, cloned, and sequenced multiple 16S rDNA clones from
KPA171202 and did not find any sequence harboring this mutation. Therefore, we believe that
KPA171202 also has three identical copies of the RT1 16S rRNA gene.
To determine whether the P. acnes ribotypes and their relative abundances measured in this
study are unique to pilosebaceous units, we applied a similar analysis to the 16S rDNA microbiome
data from the Human Microbiome Project (HMP) and from Grice et al. (2009). Both datasets were
obtained from healthy subjects. The relative abundances of the major ribotypes in healthy subjects
from our study were largely similar to those in these two datasets, even though the datasets were
sampled from different anatomical sites (Figure S2). In the HMP data, RT6 (6.3%) was more
abundant than RT4 and RT5 combined (2.8%), similar to our normal cohort, in which RT6
represented 4.8% and RT4 and RT5 combined represented 1.2% of the clones. The same five main
microbiome types were observed in the two datasets (Figure S5). This also suggests that our
sampling and analysis of the
Genome clustering and phylogenetic tree
The recA gene has been widely used to classify P. acnes strains into four known types: IA, IB, II,
and III (McDowell et al., 2008; McDowell et al., 2005). The phylogenetic tree of the 71 genomes
based on the SNPs in the core genome matched the recA types perfectly, except for one isolate,
HL097PA1. Most of the genomes with ribotypes 1, 4, 5, and 532 were grouped into the recA type IA
clade, which can be further divided into subclades IA-1 and IA-2. Clade IA-2 is composed mostly
of RT4 and RT5 genomes. RT4 and most of the RT5 genomes appear to belong to the same lineage,
with very similar genome sequences. All the isolates with ribotypes 3, 8, and 16, which share the
T1007C mutation in the 16S rDNA gene, were grouped into the recA type IB clade. Most of the RT3
genomes form subclade IB-2, and the RT8 genomes form a separate subclade, IB-1, which was highly
associated with acne. Notably, RT2 and RT6, which share the T854C mutation, appear to be more
distantly related to the other ribotypes and were grouped into the recA type II clade. This is
consistent with previous studies (Lomholt and Kilian, 2010; McDowell et al., 2005). We did not
find P. acnes isolates of recA type III in our samples.
We further analyzed the associations of P. acnes lineages with health and disease states. The
strength of the association of the clades with acne shifted clearly along the phylogenetic tree
(Figure S6). The three sequenced ribotypes identified as being strongly associated with acne
(RT4, RT5, and RT8) were found at one end of the tree in clades IA-2 and IB-1, while RT6,
identified as being associated with normal skin, was at the other end of the tree, at the tip of
clade II (Figure S6).
Antibiotic resistance
P. acnes ribotypes 4, 5, and 10 have a single nucleotide substitution, G1058C, in the 16S rDNA
sequence, which has previously been shown to confer increased resistance to tetracycline (Ross
et al., 1998a; Ross et al., 2001). In addition to the substitution in the 16S rDNA sequence, we
found that all the RT4 and RT5 strains that we sequenced have a nucleotide substitution in the
23S rDNA sequence, which confers increased resistance to other antibiotics, erythromycin and
clindamycin (Ross et al., 1997; Ross et al., 1998b). We experimentally confirmed that these
isolates, except two that were unculturable, were resistant to tetracycline,
We examined whether the enrichment of these ribotypes in the acne group could be due to
antibiotic treatment. However, in our study only a small percentage of the subjects harboring
ribotypes 4, 5, or 10 were treated with antibiotics (Table S2). Eighteen of the 29 subjects who
harbored any of these three ribotypes reported both past and current treatments. Among them,
50% (9/18) were never treated; 33% (6/18) were treated with retinoids; 11% (2/18) were treated
with antibiotics in the past; and 5.6% (1/18) were treated with both antibiotics and retinoids in
the past. Therefore, our data do not favor the hypothesis that antibiotic treatment selected for
these ribotypes. Previous surveys of antibiotic-resistant strains in acne patients demonstrated
that previous use of antibiotics did not always result in the presence of resistant strains and
that some patients without previous antibiotic use harbored resistant strains (Coates et al.,
2002; Dreno et al., 2001). Our observation in this study is consistent with these previous
studies. Nonetheless, we cannot completely rule out a possible influence of past antibiotic
exposure, other acne therapies, or an altered cutaneous environment on the
Although their GC content is more similar to that of P. acnes genomes, four unique spacer
sequences found in RT2 and RT6 strains best match the genome of Clostridium leptum, a commensal
bacterium of the gut microbiota (Table 2). The 55 kb plasmid harbored by HL096PA1 and other RT4
and RT5 genomes also carries a large cluster of 35 genes that are identical to genes found in
C. leptum, including the Tad locus. While P. acnes is also found in the gut microbiota, it is
unclear how these genes were horizontally transferred between P. acnes and C. leptum, or possibly
from a progenitor organism. Further investigation into the relationships between the microbes at
different body sites will be crucial to determine how genetic material is transferred between
different microbiomes and how this mechanism affects
Metagenomic DNA extraction, PCR amplification, cloning and 16S rDNA sequencing
Metagenomic DNA extraction – Individual microcomedones were isolated from the adhesive
nose strip using sterile forceps and placed in a 2 mL sterile microcentrifuge tube filled with ATL
buffer (Qiagen) and 0.1 mm diameter glass beads (BioSpec Products, Inc., Bartlesville, OK).
Cells were lysed using a bead beater for 3 minutes at 4,800 rpm at room temperature. After
centrifugation at 14,000 rpm for 5 minutes, the supernatant was retrieved and used for genomic
DNA extraction using the QIAamp DNA Micro Kit (Qiagen). The manufacturer's protocol for
extracting DNA from chewing gum was used. Concentration of the genomic DNA was
16S rDNA PCR amplification, cloning and sequencing – Most of the metagenomic samples were
amplified in triplicate using 16S rDNA specific primers with the following sequences: 27f-MP
3’. PCR reactions contained 0.5 U/µL Platinum Taq DNA Polymerase High Fidelity
(Invitrogen), 1X Pre-mix E PCR buffer from Epicentre Fail-Safe PCR system, 0.12 µM
concentration of each primer 27f-MP and 1492r-MP, and Sigma PCR grade water. One
microliter of DNA (ranging from 0.2 to 10 ng total) was added to each reaction. The G-Storm
GS4 thermocycler conditions were as follows: initial denaturation at 96°C for 5 minutes;
30 cycles of denaturation at 94°C for 30 seconds, annealing at 57°C for 1 minute, and extension
at 72°C for 2 minutes; and a final extension at 72°C for 7 minutes. Following amplification, an
A-tailing reaction was performed by adding 1 U of GoTaq DNA Polymerase directly to the
amplification reaction and incubating it in the thermocycler at 72°C for 10 minutes.
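As a quick check of the program length, the hold times listed above can be summed. This is a minimal sketch: ramp times between temperatures are ignored (an assumption), and the `PROGRAM` table and `total_hold_minutes` helper are illustrative names, not part of any thermocycler software.

```python
# Thermocycler program as (step, temperature in °C, hold in seconds, repeats).
# Ramp times between temperatures are ignored (simplifying assumption).
PROGRAM = [
    ("initial denaturation", 96, 5 * 60, 1),
    ("denaturation", 94, 30, 30),
    ("annealing", 57, 60, 30),
    ("extension", 72, 2 * 60, 30),
    ("final extension", 72, 7 * 60, 1),
    ("A-tailing (GoTaq)", 72, 10 * 60, 1),
]


def total_hold_minutes(program):
    """Return the summed hold time (seconds x repeats) in minutes."""
    return sum(seconds * repeats for _, _, seconds, repeats in program) / 60


if __name__ == "__main__":
    print(f"Total hold time: {total_hold_minutes(PROGRAM):.0f} min")
```

With the values above the holds sum to 127 minutes, including the 10-minute A-tailing step; the real run is longer because of temperature ramps.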
The three PCR amplification reactions from each source DNA were pooled and gel purified
(1.2% agarose gel stained with SYBR Green fluorescent dye). The 1.4 kb product was excised
and further purified using the Qiagen QIAquick Gel Extraction Kit. The purified product was
cloned into One Shot E. coli cells using the TOPO TA Cloning Kit from Invitrogen.
White colonies were picked into a 384-well tray containing terrific broth, glycerol, and
kanamycin using a QPix picking robot. Each tray was prepared for sequencing using a magnetic
bead prep from Agilent and sequenced with 1/16th BigDye Terminator chemistry from ABI. Sequencing
was done with a universal forward, universal reverse, and for a subset, internal 16S rDNA primer
Sequence reactions were loaded on ABI 3730 machines from ABI on 50 cm arrays with a long
A slightly different PCR and cloning protocol without automation was used for several initial
samples as described below. 16S rDNA was amplified using universal primers 8F (5’-
(Gao et al., 2007). Thermocycling conditions were as follows: an initial denaturation step of 5
minutes at 94°C; 30 cycles of denaturation at 94°C for 45 seconds, annealing at 52°C for 30
seconds, and elongation at 72°C for 90 seconds; and a final elongation step at 72°C for 20
minutes.
PCR products were purified using DNA Clean and Concentrator Kit (Zymo Research).
Subsequently, the 16S rDNA amplicons were cloned into pCR 2.1-TOPO vector (Invitrogen).
One-Shot TOP-10 Chemically Competent E. coli cells (Invitrogen) were transformed with the
vectors and plated on selective media. Individual positive colonies were picked and inoculated
into selective LB liquid medium. After 14 hours of incubation, the plasmids were extracted and
purified using PrepEase MiniSpin Plasmid Kit (USB Corporation) or Zyppy Plasmid Miniprep
Kit (Zymo Research). The clones were sequenced bidirectionally by the Sanger method with 1/8th
chemistry on an ABI 3730 sequencer (Applied Biosystems Inc.).
P. acnes isolation and culturing
Sample culture plate – Microcomedones on the inner surface of the nose strip were mashed and
scraped using a sterile loop (Fisherbrand, Pittsburgh, PA), and plated onto a blood agar plate
(Teknova Brucella Agar Plate with Hemin and Vitamin K, Teknova, Hollister, CA). The plates
were incubated at 37°C for 5 - 7 days anaerobically using the AnaeroPack System (Mitsubishi
Isolation and culturing of individual strains - Colonies with the macroscopic characteristics of P.
acnes were picked from each sample plate and streaked onto A-media plates (pancreatic digest of
casein, Difco yeast extract, glucose, KH2PO4, MgSO4, Difco agar, and water). These first-pass
plates were then incubated anaerobically at 37°C for 5-7 days. In the second pass, single isolated
colonies were picked from the first-pass plates and streaked onto new A-media plates. These plates
were then incubated anaerobically at 37°C for 5-7 days. The colonies on these plates were picked
for culturing, genotyping, and genome sequencing in the subsequent steps.
Genotyping of the P. acnes isolates – Each isolate was analyzed by PCR amplification of the 16S
rDNA gene. The ribotypes were determined based on the full-length sequences. Isolates with the
desired ribotypes were selected for further culturing and genome sequencing.
medium under anaerobic conditions at 37°C for 5-7 days. Cultures were pelleted by
centrifugation and washed with 3 mL phosphate-buffered saline (PBS). The same protocol used for
the metagenomic DNA extraction was used for extracting the genomic DNA of the isolates.
Metagenomic DNA from microcomedone samples of 22 individuals with normal skin was pooled and
sequenced using Roche/454 FLX. The average read length was 236 bp. Sequencing was limited to
13,291 reads. Sequence reads were aligned against NCBI's non-redundant database using BLAST.
Species assignment was based on 97% identity
Assembly and alignment - Base calling and quality were determined with Phred (Ewing and
Green, 1998; Ewing et al., 1998) using default parameters. Bidirectional reads were assembled
AmosCmp16Spipeline and NAST-ier, which are from the Microbiome Utilities Portal of the
et al., 2004), Mummer (Kurtz et al., 2004), Lucy (Chou and Holmes, 2001), BLAST (Altschul et
were identified using ChimeraSlayer and WigeoN (Haas et al., 2011). Sequences with at least
90% bootstrap support for a chimeric breakpoint (ChimeraSlayer) or containing a region that
varies at more than the 99% quantile of expected variation (WigeoN) were removed from further
analysis.
Quality screening - For diversity analysis of the P. acnes population, sequences with at least
99% identity over 1,400 nucleotides to the P. acnes KPA171202 (Bruggemann et al., 2004) 16S
rDNA were trimmed to positions 29-1483 (numbering based on the E. coli system of
nomenclature (Brosius et al., 1978)). Sequences without full coverage over this region were
excluded from further strain-level analysis. Chimera screening, as described above, resulted in
removal of less than 0.35% of the sequences. This may underestimate the number of chimeras,
since the majority of sequences differ by only one or two nucleotides, so chimeras between them
are difficult to detect. Low-quality sequences, defined as those with more than 50 nucleotides
with Phred quality scores below 15 between positions 79 and 1433, were excluded. To allow
detailed strain-level analysis, the data were extensively manually edited. Chromatograms were
visually inspected at all bases with a Phred quality score < 30, and appropriate corrections were
applied. For analysis at the species level, the 16S rDNA sequences were not manually edited.
Chimera screening of assembled sequences resulted in removal of less than 0.65% of the
sequences. Aligned sequences were trimmed to E. coli equivalent positions 29-1483 (Brosius et
al., 1978). Sequences without full coverage over this
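The coverage and quality rules for strain-level analysis can be expressed as a small filter. This is a minimal sketch under stated assumptions: each read is assumed to be already aligned to E. coli numbering and stored as a hypothetical position-to-(base, Phred quality) mapping, insertions relative to E. coli are ignored, and the identity screen against KPA171202 and the chimera checks are not shown. The names `AlignedRead` and `screen_read` are illustrative.

```python
from typing import Dict, Optional, Tuple

# Hypothetical input: a read already aligned to E. coli 16S numbering,
# mapping each E. coli position to (base, Phred quality).
AlignedRead = Dict[int, Tuple[str, int]]

TRIM_START, TRIM_END = 29, 1483  # retained region (E. coli numbering)
QC_START, QC_END = 79, 1433      # region checked for low-quality bases
MIN_PHRED = 15                   # bases below this count as low quality
MAX_LOW_QUALITY = 50             # more than this many low-quality bases -> reject


def screen_read(read: AlignedRead) -> Optional[AlignedRead]:
    """Return the read trimmed to positions 29-1483, or None if it fails screening."""
    # Reads without full coverage of the trimmed region are excluded.
    if any(pos not in read for pos in range(TRIM_START, TRIM_END + 1)):
        return None
    # Count bases with Phred < 15 between positions 79 and 1433.
    low = sum(1 for pos in range(QC_START, QC_END + 1) if read[pos][1] < MIN_PHRED)
    if low > MAX_LOW_QUALITY:
        return None
    return {pos: read[pos] for pos in range(TRIM_START, TRIM_END + 1)}
```

A read with exactly 50 low-quality bases in the window passes; one with 51 is rejected, matching the "more than 50" wording.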
Sequence editing - Nearly 62,000 Sanger sequence reads representing the 26,446 assembled P.
acnes sequences were mapped to the RT1 sequence in CONSED (Gordon, 2003; Gordon et al.,
1998). Comprehensive semi-manual editing of the large number of sequences was made feasible
by their very high pairwise similarities: a median of only one nucleotide change from RT1 per
sequence (three nucleotide changes prior to editing). Editing was facilitated by the use of scripts
and the custom navigation feature of CONSED allowing single click jumps to sites requiring
inspection. Chromatograms were inspected at all low-quality (Phred < 30) bases that differed
from RT1 and corrected as needed; the corrections included many commonly occurring sequence
errors. To minimize the effects of base misincorporation and chimeras, specific base differences
from RT1 occurring in fewer than four sequences (frequency < 0.00015) were considered unreliable
and reverted to the corresponding RT1 base. Ribotypes were assigned for the resulting
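The reversion rule for rare differences can be sketched as follows. With 26,446 sequences, a threshold of four sequences corresponds to a frequency of about 0.00015. This sketch assumes all sequences are already aligned to RT1 and have equal length, which simplifies the CONSED-based workflow; `revert_rare_differences` and `MIN_SUPPORT` are hypothetical names.

```python
from collections import Counter
from typing import List

MIN_SUPPORT = 4  # differences seen in fewer than 4 sequences are treated as errors


def revert_rare_differences(reference: str, sequences: List[str]) -> List[str]:
    """Revert base differences from the reference seen in < MIN_SUPPORT sequences.

    Assumes every sequence is already aligned to the reference and has the
    same length (a simplification of the CONSED-based editing workflow).
    """
    # Count how many sequences carry each (position, base) difference.
    support = Counter(
        (i, base)
        for seq in sequences
        for i, (base, ref_base) in enumerate(zip(seq, reference))
        if base != ref_base
    )
    cleaned = []
    for seq in sequences:
        bases = [
            ref_base if base != ref_base and support[(i, base)] < MIN_SUPPORT else base
            for i, (base, ref_base) in enumerate(zip(seq, reference))
        ]
        cleaned.append("".join(bases))
    return cleaned
```

A difference is kept only when the same base at the same position appears in at least four sequences, so shared, real variants survive while isolated errors are reverted.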
OTUs and taxonomy assignments – QIIME (Caporaso et al., 2010b) was used to cluster the
sequences into OTUs using a 99% identity cutoff, furthest-neighbor clustering, and UCLUST
(Edgar, 2010). The most abundant sequence of each OTU was selected as its representative and
aligned to the Greengenes database using PyNAST (Caporaso et al., 2010a). Taxonomy was assigned
using the RDP method (Cole et al., 2009). The alignment was filtered with the lanemask provided
by Greengenes, and a
Wilcoxon test on the top ten ribotypes - For each sample, the number of clones of each of the top
ten ribotypes was normalized by the total number of P. acnes clones in the sample. The
normalized counts were used to test the significance of enrichment between the acne group and
the normal group. The function wilcox_test in the R program (http://www.R-project.org) was
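The normalization and test can be illustrated with a stdlib-only sketch of a two-sided Wilcoxon rank-sum (Mann-Whitney) test using a normal approximation with tie correction and no continuity correction. It is not claimed to reproduce the p-values of the R function cited above, which may use exact or differently corrected computations; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def normalize(counts: Dict[str, int], total_pacnes_clones: int) -> Dict[str, float]:
    """Divide each ribotype's clone count by the sample's total P. acnes clone count."""
    return {rt: n / total_pacnes_clones for rt, n in counts.items()}


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for x, two-sided p) using a tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    values = list(x) + list(y)
    n = n1 + n2
    u = sum(average_ranks(values)[:n1]) - n1 * (n1 + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(values).values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))
```

In use, each ribotype would be tested separately: collect its normalized abundance in every acne sample as `x` and in every normal sample as `y`, then call `rank_sum_test(x, y)`.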
Microbiome type assignments – Microbiome types were assigned based on the largest clades
seen when samples were clustered using thetayc similarity in MOTHUR (Schloss et al., 2009)
(Figures 2 and S4) or hierarchical clustering (Eisen et al., 1998) (Figure S5).
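The thetayc similarity used for clustering is, as commonly defined (Yue and Clayton), computed from the relative abundances a_i and b_i of each ribotype in two samples as θ_YC = Σ a_i b_i / (Σ (a_i - b_i)^2 + Σ a_i b_i), and the corresponding distance is 1 - θ_YC. The sketch below computes only this pairwise distance, under the assumption that MOTHUR's thetayc distance follows this definition; the clustering itself is not reproduced, and `thetayc_distance` is a hypothetical name.

```python
from typing import Dict


def thetayc_distance(sample_a: Dict[str, int], sample_b: Dict[str, int]) -> float:
    """Return 1 - theta_YC for two samples given per-ribotype clone counts."""
    total_a = sum(sample_a.values())
    total_b = sum(sample_b.values())
    ribotypes = set(sample_a) | set(sample_b)
    # Convert counts to relative abundances over the union of ribotypes.
    a = {rt: sample_a.get(rt, 0) / total_a for rt in ribotypes}
    b = {rt: sample_b.get(rt, 0) / total_b for rt in ribotypes}
    shared = sum(a[rt] * b[rt] for rt in ribotypes)
    squared_diff = sum((a[rt] - b[rt]) ** 2 for rt in ribotypes)
    theta = shared / (squared_diff + shared)
    return 1.0 - theta
```

Samples with identical ribotype proportions have distance 0, and samples sharing no ribotypes have distance 1.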
Assigning ribotypes to datasets of HMP and Grice et al. 2009 - A sequence was assigned to a
ribotype if it met the following criteria. First, it had a single best match. Second, it covered
the range required to discriminate between the top 45 ribotypes (positions 58-1388). Third, it
had no Ns at discriminatory positions. Lastly, it had no more than ten non-discriminatory
differences.
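The four criteria can be sketched as a single assignment function. Assumptions, all hypothetical: every sequence is represented as a mapping from a shared alignment position to a base, discriminatory positions are those where the supplied reference ribotypes disagree, and Ns outside discriminatory positions are not counted as differences. The names `assign_ribotype` and `discriminatory_positions` are illustrative.

```python
from typing import Dict, Optional, Set

Seq = Dict[int, str]  # hypothetical: aligned position -> base

REGION = range(58, 1389)  # positions 58-1388 inclusive
MAX_OTHER_DIFFS = 10


def discriminatory_positions(ribotypes: Dict[str, Seq]) -> Set[int]:
    """Positions in REGION where the reference ribotypes do not all agree."""
    return {p for p in REGION if len({rt[p] for rt in ribotypes.values()}) > 1}


def assign_ribotype(query: Seq, ribotypes: Dict[str, Seq]) -> Optional[str]:
    """Return the ribotype name, or None if any assignment criterion fails."""
    # The query must cover the whole discriminating range.
    if any(p not in query for p in REGION):
        return None
    disc = discriminatory_positions(ribotypes)
    # No ambiguous bases (N) at discriminatory positions.
    if any(query[p] == "N" for p in disc):
        return None
    # Score ribotypes by mismatches at discriminatory positions.
    scores = {name: sum(query[p] != rt[p] for p in disc) for name, rt in ribotypes.items()}
    best = min(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    if len(winners) != 1:  # require a single best match
        return None
    name = winners[0]
    # At most ten differences at the remaining (non-discriminatory) positions.
    other = sum(
        1
        for p in REGION
        if p not in disc and query[p] != "N" and query[p] != ribotypes[name][p]
    )
    return name if other <= MAX_OTHER_DIFFS else None
```

Scoring only discriminatory positions is enough to rank ribotypes, because all references agree elsewhere and would add the same mismatch count to every score.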
The HMP 16S rDNA Sanger sequence dataset was downloaded with permission from the HMP
Data Analysis and Coordination Center. It has 8,492 P. acnes sequences from 14 subjects and
nine body sites (retroauricular crease, anterior nares, hard palate, buccal mucosa, throat, palatine
tonsils, antecubital fossa, saliva, and subgingival plaque). More details on the dataset can be
found at http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000228.v2.p1.
In this dataset, low quality bases (Phred quality < 20) were converted to Ns, and 26% of the
sequences were not assigned due to excessive Ns or Ns at ribotype discriminatory sites. Less
than 1% was unresolved due to equal best matches or greater than ten mismatches to RT1.
The dataset from Grice et al. (2009) is available at NCBI (GenBank accession numbers
GQ000001 to GQ116391). It has 22,378 P. acnes sequences from ten subjects and 21 skin sites
(buttock, elbow, hypothenar palm, volar forearm, antecubital fossa, axillary vault, gluteal crease,
inguinal crease, interdigital web space, nare, plantar heel, popliteal fossa, toe web space,
umbilicus, alar crease, back, external auditory canal, glabella, manubrium, occiput, and
retroauricular crease). Three percent of the sequences were unassigned due to greater than ten
mismatches to RT1, and 1.6% was unassigned due to equal best matches.
For comparison, our unedited 16S rDNA sequences were assigned to ribotypes by the same method
described above, and the result is shown in Figure S2. Less than 0.6% of the sequences were
unassigned due to more than ten mismatches to RT1, and 1.7% was
Genome HL096PA1 - The genome was sequenced using Roche/454 FLX at the UCLA
Genotyping and Sequencing Core. A total of 590,054 sequence reads were generated with an
average read length of 230 bp. Of these, 433,896 were assembled into two contigs, a circular
main chromosome of 2,494,190 bp and a linear plasmid of 55,585 bp. Assembly was
(Roche) with extensive manual editing in CONSED. GeneMark v2.6r (Borodovsky and
McIninch, 1993) and GLIMMER v2.0 (Salzberg et al., 1998) were used to perform ab initio
protein coding gene prediction. tRNAscan-SE 1.23 was used for tRNA identification, and
RNAmmer was used for predicting ribosomal RNA genes (5S, 16S, and 23S). Genome
annotation results were based on automated searches in public databases, including Pfam
Genomes of the other 65 isolates - The genomes were sequenced using Illumina/Solexa Genome
Analyzer IIx and annotated by the Genome Center of Washington University at St. Louis.
Assembly: Each genomic DNA sample was randomly sheared and an indexed library was
constructed using standard Illumina protocols. Twelve uniquely tagged libraries were pooled
and run on one lane of a GAIIx flowcell and paired end sequences were generated. Following
deconvolution of the tagged reads into the separate samples, each dataset was quality-trimmed at
a q10 threshold using BWA (Li and Durbin, 2009). Reads trimmed to less than 35 bp in length were
discarded, and the remaining reads were assembled using oneButtonVelvet, an optimizer program
that runs the Velvet assembler (Zerbino and Birney, 2008) numerous times over a user-supplied
k-mer range while varying several of the assembler parameters, and selects the parameter set
that yields the longest N50 contig length.
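The selection criterion, the N50 contig length, can be illustrated with a short sketch. It does not reproduce oneButtonVelvet's parameter sweep; the input `assemblies_by_k`, a mapping from a k-mer size to the resulting contig lengths, is a hypothetical stand-in for the optimizer's results.

```python
from typing import Dict, List


def n50(contig_lengths: List[int]) -> int:
    """Largest length L such that contigs of length >= L cover half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def best_k(assemblies_by_k: Dict[int, List[int]]) -> int:
    """Return the k-mer size whose assembly has the longest N50 (hypothetical input)."""
    return max(assemblies_by_k, key=lambda k: n50(assemblies_by_k[k]))
```

For contigs of lengths 2 through 10 (54 bp in total), the 10, 9, and 8 bp contigs reach half of the assembly, so the N50 is 8.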
Annotation: Coding sequences were predicted using GeneMark v3.3 (Borodovsky and McIninch,
1993) and GLIMMER v2.13 (Salzberg et al., 1998). Intergenic regions not spanned by
GeneMark and GLIMMER were aligned using BLAST against NCBI's non-redundant database
and predictions were generated based on protein alignments. tRNA genes were determined using
tRNAscan-SE 1.23 and non-coding RNA genes were determined by RNAmmer-1.2 and Rfam
v8.0. The final gene set was processed through a suite of protein categorization tools consisting
of Interpro, psort-b and KEGG. The gene product naming comes from the BER pipeline (JCVI).
http://hmpdacc.org/doc/sops/reference_genomes/annotation/WUGC_SOP_DACC.pdf.
Identification of the core regions of P. acnes genomes - The “core” regions were defined as
genome sequences that are present in all 71 genomes. P. acnes KPA171202 was used as the
reference genome. Each of the other 70 genome sequences (a series of contigs for most of the
genomes, and two complete genomes) was mapped to the reference genome using Nucmer (Kurtz
et al., 2004). All 70 “.coords” output files of the Nucmer program were analyzed with a Perl
script to identify overlapping regions based on the KPA171202 coordinates. Finally, the “core”
sequences were extracted from the KPA171202 genome sequence using the coordinates
calculated above. On average, 90% (ranging from 88% to 92%) of the genomes were included in
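The core-region step amounts to intersecting, across all genomes, the reference intervals covered by each genome's alignments. The original analysis used a Perl script; this is a Python sketch that assumes the reference start and end columns have already been parsed from each ".coords" file into lists of intervals. Parsing is not shown, and the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive reference (KPA171202) coordinates


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping or adjacent intervals from one genome."""
    merged: List[Interval] = []
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted, merged interval lists with a two-pointer sweep."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def core_regions(per_genome: List[List[Interval]]) -> List[Interval]:
    """Reference intervals covered by alignments from every genome."""
    core = merge(per_genome[0])
    for intervals in per_genome[1:]:
        core = intersect(core, merge(intervals))
    return core


def core_fraction(core: List[Interval], genome_length: int) -> float:
    """Fraction of a genome of the given length that lies in the core."""
    return sum(end - start + 1 for start, end in core) / genome_length
```

Merging each genome's intervals first keeps the pairwise intersection linear in the number of intervals.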
Identification of SNPs in the core regions – Single nucleotide polymorphisms (SNPs) were
identified from the Nucmer alignments using the “show-snps” utility of the MUMmer package (Kurtz
et al., 2004) with default settings. The genome sequence of P. acnes KPA171202 was used as the
reference genome. All 70 “.snps” output files were analyzed with a Perl script to identify unique
SNP positions based on the KPA171202 coordinates. The SNPs in the core
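Collecting the union of core SNP positions and building one concatenated SNP string per genome, as used for the tree, can be sketched as follows. Assumptions: SNP records have already been parsed into (reference position, reference base, query base) tuples; a "." base is taken to mark an indel, which is skipped; and the reference genome itself contributes an all-reference string. The function names are hypothetical.

```python
from bisect import bisect_right
from typing import Iterable, List, Sequence, Tuple

Interval = Tuple[int, int]  # sorted, non-overlapping core intervals (1-based)
Snp = Tuple[int, str, str]  # (reference position, reference base, query base)


def in_core(pos: int, core: Sequence[Interval]) -> bool:
    """True if pos falls inside one of the sorted core intervals."""
    idx = bisect_right(core, (pos, float("inf"))) - 1
    return idx >= 0 and core[idx][0] <= pos <= core[idx][1]


def core_snp_positions(per_genome: Iterable[List[Snp]], core: Sequence[Interval]) -> List[int]:
    """Union of substitution positions (indels skipped) that lie in the core."""
    positions = set()
    for snps in per_genome:
        for pos, ref_base, qry_base in snps:
            if ref_base == "." or qry_base == ".":  # '.' assumed to mark an indel
                continue
            if in_core(pos, core):
                positions.add(pos)
    return sorted(positions)


def concatenate(reference: str, snps: List[Snp], positions: List[int]) -> str:
    """Build one genome's SNP string: its own base where it differs, else the reference."""
    alt = {pos: q for pos, r, q in snps if r != "." and q != "."}
    return "".join(alt.get(pos, reference[pos - 1]) for pos in positions)
```

Every genome's string has one character per core SNP position, so the strings are directly comparable column by column.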
Phylogenetic tree construction - The 71 concatenated sequences of the 96,887 SNP nucleotides
in the core regions were used to construct a phylogenetic tree of the P. acnes genomes. The
evolutionary distance of the core regions among the genomes was inferred using the
Neighbor-Joining method (Saitou and Nei, 1987). The bootstrap consensus tree inferred from 1,000
replicates was used, and branches corresponding to partitions reproduced in fewer than 80% of
bootstrap replicates were collapsed. Figure 3 shows only the topology. In Figure S6, the tree is
drawn to scale, with branch lengths in the same units as the evolutionary distances used to infer
the phylogenetic tree. The evolutionary distances were computed using the p-distance method and
are in units of the number of nucleotide differences per site. This tree compares the genomes
based only on the core regions; the distances do not represent the true evolutionary distances
between genomes, since the non-core regions of each genome were not considered.
All positions containing gaps and missing data were eliminated. Evolutionary analysis was
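The p-distance with complete deletion of gapped or missing positions can be sketched as below. The neighbor-joining step itself is not shown. Which characters count as gaps or missing data is an assumption (`MISSING`), and when applied to concatenated SNP strings the result is a proportion of differing SNP sites rather than of all genome sites.

```python
from itertools import combinations
from typing import Dict, Tuple

MISSING = set("-N?")  # characters treated as gaps or missing data (assumption)


def p_distances(aligned: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances after deleting every column with a gap or missing base."""
    names = list(aligned)
    length = len(aligned[names[0]])
    # Complete deletion: keep only columns without gaps/missing data in any sequence.
    keep = [i for i in range(length) if all(aligned[n][i] not in MISSING for n in names)]
    distances = {}
    for a, b in combinations(names, 2):
        diffs = sum(aligned[a][i] != aligned[b][i] for i in keep)
        distances[(a, b)] = diffs / len(keep) if keep else float("nan")
    return distances
```

Because a column is dropped if any sequence has a gap there, adding one poorly covered genome reduces the sites used for every pair.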
Gene content comparison - To assess the conservation of gene content across the 71 genomes, the
protein coding genes of all the genomes were clustered using UCLUST (Edgar, 2010). Sequences
were first sorted by decreasing length; each sequence was then assigned to an existing seed
sequence if it had at least 90% nucleotide identity to it over its entire length, and otherwise it
became a new seed. For visualization, the data were reformatted into a matrix with columns
representing genes and rows representing genomes. A gene present in one or more copies in a
genome was treated as present. Gene columns were sorted by their positions based on the
coordinates of the HL096PA1 genome, a fully finished genome with a 55 kb plasmid. Genome rows
were sorted by their positions in the
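The greedy seed-based clustering described above can be sketched as follows. This is not UCLUST: identity is approximated with Python's `difflib.SequenceMatcher` (matched bases divided by query length) instead of a true alignment, and each gene is compared with seeds in creation order rather than by UCLUST's seed search. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def identity(query: str, seed: str) -> float:
    """Approximate identity: matched bases / query length (difflib, not an aligner)."""
    matcher = SequenceMatcher(None, query, seed, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(query)


def greedy_cluster(genes: Dict[str, str], min_identity: float = 0.90) -> Dict[str, str]:
    """Map each gene to its seed: longest genes first, first matching seed wins."""
    seeds: List[str] = []
    assignment: Dict[str, str] = {}
    for name in sorted(genes, key=lambda n: len(genes[n]), reverse=True):
        for seed in seeds:
            if identity(genes[name], genes[seed]) >= min_identity:
                assignment[name] = seed
                break
        else:  # no seed matched: this gene starts a new cluster
            seeds.append(name)
            assignment[name] = name
    return assignment
```

Sorting by decreasing length means each seed is at least as long as the genes assigned to it, so identity measured over the query's full length is well defined.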
Identification of CRISPR/Cas – CRISPRFinder (Grissa et al., 2007) was used to identify the
CRISPR repeat-spacer sequences. The annotation of HL110PA3 was used for BLAST
alignment in order to identify the presence of CRISPR/Cas structure and CRISPR repeat-spacer
HL110PA4 and J139. Each spacer sequence was annotated by BLAST alignment against
NCBI’s non-redundant nucleotide database and the reference genomic sequences database
(refseq_genomic).
Sequence coverage analysis – MAQ (Li et al., 2008) was used to map the raw sequence reads
from the Illumina/Roche platform to the reference genomes. Briefly, the “map” command was used for
mapping, the “assemble” command was used for calling the consensus sequences from the read
mapping, and the “cns2win” command was used to extract information averaged in a tiling
window. A window size of 1,000 bp was used. One million randomly selected reads were used
for mapping. This corresponded to approximately 40X coverage for all the genomes except
HL096PA2, HL096PA3, HL097PA1, and HL099PA1, which had approximately 55X to 75X
coverage. BWA (Li and Durbin, 2010) was used to map the raw sequence reads from the Roche/454
platform to the reference genome HL096PA1. The average coverage was calculated in 1,000 bp
windows.
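Window averaging and the expected coverage from a fixed number of reads can be sketched as below. This does not reproduce MAQ's `cns2win` output. The read length of 100 bp in the example is a hypothetical value, since the Illumina read length is not stated here; the genome size of about 2.5 Mb is taken from the HL096PA1 chromosome.

```python
from typing import List


def window_means(depth: List[int], window: int = 1000) -> List[float]:
    """Mean per-base depth in consecutive, non-overlapping windows (last may be shorter)."""
    means = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


def expected_coverage(n_reads: int, read_length: int, genome_length: int) -> float:
    """Expected mean coverage: total sequenced bases divided by genome length."""
    return n_reads * read_length / genome_length
```

Under the assumed 100 bp read length, `expected_coverage(1_000_000, 100, 2_500_000)` gives 40.0, which is in line with the approximately 40X coverage stated above.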
Quantitative PCR
Quantitative PCR (qPCR) targeting TadA on the plasmid (Locus 3) and housekeeping genes Pak
and RecA on the chromosome was performed using the genomic DNA extracted from the P.
acnes isolates. The LightCycler 480 High Resolution Melting Master kit was used (Roche Diagnostics
GmbH, Mannheim, Germany). Each 10 µL reaction consisted of 5 µL master mix
(2X concentrate), 1 µL 25 mM MgCl2, 0.5 µL of 4 µM forward and reverse primers, and DNA
template. Four qPCR runs were performed on a Roche LightCycler 480. Primer sequences for
GCTTCCTCATACCACTGGTCATC-3’ (reverse). All samples were run in duplicate in each
qPCR run, except in the second run, which was not duplicated. Thermocycling conditions were as
follows: an initial activation step of 10 minutes at 95°C; 50 amplification cycles, each
consisting of 10 seconds at 95°C, 15 seconds at 65°C in the first cycle with a stepwise 0.5°C
decrease for each succeeding cycle, and 30 seconds at 72°C; and a final melting curve step from
65°C to 99°C with a ramp rate of 0.02°C/s and an acquisition rate of 25 per °C. DNA
concentration standards were run in duplicate. Copy number ratios of genes were calculated
Data Availability
16S rDNA sequences have been deposited at GenBank under the project ID 46327. Whole
genome shotgun sequences and annotations of the P. acnes strains have been deposited at
ADZN00000000, ADZO00000000, ADZP00000000, ADZQ00000000, ADZR00000000,
CP003294.
References
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool.
J Mol Biol 215:403-10.
Brosius J, Palmer ML, Kennedy PJ, Noller HF (1978) Complete nucleotide sequence of a 16S
ribosomal RNA gene from Escherichia coli. Proc Natl Acad Sci U S A 75:4801-5.
Caporaso JG, Bittinger K, Bushman FD, DeSantis TZ, Andersen GL, Knight R (2010a)
PyNAST: a flexible tool for aligning sequences to a template alignment. Bioinformatics 26:266-7.
Chou HH, Holmes MH (2001) DNA sequence quality trimming and vector removal.
Bioinformatics 17:1093-104.
Cole JR, Wang Q, Cardenas E, Fish J, Chai B, Farris RJ, et al. (2009) The Ribosomal Database
Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37:D141-5.
Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics
26:2460-1.
Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-
wide expression patterns. Proc Natl Acad Sci U S A 95:14863-8.
Gordon D (2003) Viewing and editing assembled sequences using Consed. Curr Protoc
Bioinformatics Chapter 11:Unit 11.2.
Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform.
Bioinformatics 26:589-95.
Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform.
Bioinformatics 25:1754-60.
Li H, Ruan J, Durbin R (2008) Mapping short DNA sequencing reads and calling variants using
mapping quality scores. Genome Res 18:1851-8.
McDowell A, Perry AL, Lambert PA, Patrick S (2008) A new phylogenetic group of
Propionibacterium acnes. J Med Microbiol 57:218-24.
McDowell A, Valanne S, Ramage G, Tunney MM, Glenn JV, McLorinan GC, et al. (2005)
Propionibacterium acnes types I and II represent phylogenetically distinct groups. J Clin
Microbiol 43:326-34.
Pop M, Phillippy A, Delcher AL, Salzberg SL (2004) Comparative genome assembly. Brief
Bioinform 5:237-48.
Price MN, Dehal PS, Arkin AP (2009) FastTree: computing large minimum evolution trees with
profiles instead of a distance matrix. Mol Biol Evol 26:1641-50.
Ross JI, Eady EA, Cove JH, Jones CE, Ratyal AH, Miller YW, et al. (1997) Clinical resistance
to erythromycin and clindamycin in cutaneous propionibacteria isolated from acne patients is
associated with mutations in 23S rRNA. Antimicrob Agents Chemother 41:1162-5.
Ross JI, Eady EA, Cove JH, Ratyal AH, Cunliffe WJ (1998b) Resistance to erythromycin and
clindamycin in cutaneous propionibacteria is associated with mutations in 23S rRNA.
Dermatology 196:69-70.
Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing
phylogenetic trees. Mol Biol Evol 4:406-25.
Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. (2009)
Introducing mothur: open-source, platform-independent, community-supported software for
describing and comparing microbial communities. Appl Environ Microbiol 75:7537-41.
Table S1. Six phyla and 42 genera found in pilosebaceous units.
Table S2. Past and current treatments of the subjects.
Table S3. P-values calculated using four different statistical tests of non-random distribution
between groups show that P. acnes population structures in acne patients and normal individuals
The tests were performed for two different sets of ribotypes: the top ten most abundant ribotypes
and the top 110 most abundant ribotypes (with 9 or more clones), and for three pairs of groups:
acne vs. normal skin; male vs. female; or random pairing. The analyses were based on thetayc
sample-to-sample distances and did not consider distances between ribotypes, i.e., they used only
ribotype counts. All tests were based on 100,000 iterations and were performed using MOTHUR
(a) Random assignment of subjects to groups A and B (repeated five times and averaged).
Table S4. Summary of genes encoded in loci 1, 2, 3.
[Figure S1 plot: number of clones per ribotype; Acne vs. Normal]
Figure S1. Rank abundance of P. acnes ribotypes shows a distribution similar to that seen at higher taxonomic levels.
A few highly abundant ribotypes and a large number of rare ribotypes were observed in the samples. Some ribotypes were
highly enriched in acne patients. Only the 30 most abundant ribotypes are shown.
[Figure S2 plot: relative abundance (0-100%) of RT1-RT10 in acne (this study), normal (this study), HMP, Grice et al. 2009, and Grice et al. 2009 with HV4 removed]
Figure S2. The most abundant P. acnes ribotypes in pilosebaceous units were also abundant at other body sites. The major
ribotypes found in acne patients and normal individuals from this study were compared to the datasets from the HMP and
Grice et al. (2009). The top three ribotypes are the most abundant in each of the datasets. The excess of RT4 and RT5 seen
in the dataset by Grice et al. (2009) was due to one subject, HV4, whose P. acnes strain population was dominated by these
two ribotypes at every skin site sampled. After removal of this subject, the ribotype distribution is similar to that of the
HMP samples and the normal skin samples in this study. RT6 is also abundant in the HMP dataset, which was collected from
healthy individuals.
[Figure S3 plot: principal coordinates of samples; Acne vs. Normal]
Figure S3. P. acnes population structures differ in acne and normal skin. P. acnes populations from samples were clustered
using principal coordinates analysis of the weighted UniFrac distance matrix for the top ten most abundant ribotypes. The
principal coordinate 1 (P1) explains 43.64% of the variation and P2 explains 20.07% of the variation. Analysis was performed
using QIIME (Caporaso et al., 2010b).
[Figure S4 plot: relative abundance (0-100%) of RT1-RT10 in each sample, grouped by microbiome types I, IV, II, III, and V and minor microbiome types; asterisks mark differently clustered samples]
Figure S4. Distribution of the top ten most abundant P. acnes ribotypes in all samples, without separating the acne and normal
groups. Each column represents the percentage of the top ten ribotypes identified in one sample. When all samples were
clustered together, the same five major microbiome types were observed at the P. acnes strain level, suggesting that the
microbiome classification does not depend on disease state. Only three of the 99 samples were clustered differently from
Figure 2 (marked with asterisks). Two samples (one from acne, one from normal skin) with fewer than 50 P. acnes 16S rDNA
sequences are not shown.
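The per-sample columns in Figures S4 and S5 can be derived from ribotype counts in a few steps: rank ribotypes by pooled abundance, keep the top ten, convert each sample to percentages, and drop samples with too few sequences. A minimal sketch, assuming counts are given as nested dictionaries and (our assumption, not stated in the captions) that percentages are taken over each sample's top-ten sequences so that every column sums to 100%; the function and parameter names are hypothetical, and the clustering of profiles into microbiome types is not shown.

```python
from collections import Counter


def top_ribotype_profiles(sample_counts, top_n=10, min_total=50, min_top=1):
    """Per-sample percentages of the top_n most abundant ribotypes (pooled ranking).

    sample_counts: {sample_id: {ribotype: number of P. acnes 16S rDNA sequences}}
    min_total: drop samples with fewer P. acnes sequences than this in total.
    min_top: drop samples with fewer sequences than this among the top_n ribotypes.
    Percentages are taken over each sample's top_n-ribotype sequences (assumption),
    so every retained profile sums to 100.
    """
    pooled = Counter()
    for counts in sample_counts.values():
        pooled.update(counts)
    # Rank by pooled abundance; ties are broken by name so the result is deterministic
    ranked = sorted(pooled.items(), key=lambda item: (-item[1], item[0]))
    top = [rt for rt, _ in ranked[:top_n]]

    profiles = {}
    for sample, counts in sample_counts.items():
        top_total = sum(counts.get(rt, 0) for rt in top)
        # Skip sparse samples; max(..., 1) also prevents division by zero
        if sum(counts.values()) < min_total or top_total < max(min_top, 1):
            continue
        profiles[sample] = {rt: 100.0 * counts.get(rt, 0) / top_total for rt in top}
    return top, profiles
```

With the defaults, samples with fewer than 50 P. acnes sequences are dropped, as in Figure S4. The Figure S5 rule (excluding samples with fewer than ten sequences of the top ten ribotypes) corresponds to calling it with `min_total=0, min_top=10`.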
[Figure S5: stacked bar chart; y-axis: relative abundance (0 to 100%) of the top ten ribotypes (RT1 to RT10); samples grouped by microbiome type (V, IV, I, II, III) and by source: samples in this study, samples in the HMP dataset, and samples in Grice et al. 2009.]
Figure S5. The same five major microbiome types were observed in multiple datasets. Samples from this study, the HMP and
Grice et al. (2009) were clustered together based on the top ten most abundant P. acnes ribotypes; 284 samples were included
in total. Each column represents the percentage of the top ten ribotypes identified in one sample. Because both the HMP
samples and the samples from Grice et al. (2009) were collected from healthy individuals, microbiome types IV and V are
under-represented in this analysis. Samples with fewer than ten sequences of the top ten ribotypes were not included.
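[Figure S6: phylogenetic tree of the 71 genomes; clade labels include IA-1, IB-1, IB-2, IB-3 and II, with a "Health associated" annotation; scale bar: 0.01.]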
Figure S6. A phylogenetic tree constructed from the 96,887 SNPs in the P. acnes core genome shows that the 71 genomes
cluster into distinct clades, consistent with the recA types that have been used to classify P. acnes strains. The 16S ribotypes
of the genomes largely reflect the relationships among the lineages. At one end of the tree, clades IA-2 and IB-1 consist mainly
of the ribotypes enriched in acne; at the other end, RT6 in clade II was found mainly in healthy subjects. A bootstrap test with
1,000 replicates was performed. The distances between branches were calculated from the SNPs in the core genome and do
not reflect the non-core regions of each genome. The enlarged branches are colored according to the 16S ribotypes, as in
Figure 3.
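The tree and its bootstrap support both start from the core-genome SNP alignment. A hedged sketch of the two data-handling steps involved: raw pairwise SNP differences between genomes, and the standard nonparametric bootstrap, which resamples alignment columns with replacement. We assume that is the form of bootstrap used here; the caption does not name the tree-building method, so tree construction for each replicate is not shown, and the function names and toy alleles are hypothetical.

```python
import random


def snp_distance_matrix(snps):
    """Pairwise counts of differing SNP sites between genomes.

    snps: {genome_name: allele string over the core-genome SNP sites}; all strings
    must have the same length and site order.
    Returns (names, matrix) with names sorted alphabetically.
    """
    names = sorted(snps)
    matrix = [[sum(a != b for a, b in zip(snps[x], snps[y])) for y in names]
              for x in names]
    return names, matrix


def bootstrap_replicate(snps, rng):
    """One nonparametric bootstrap pseudo-alignment: sample SNP columns with replacement.

    The same column indices are used for every genome, so each column stays intact.
    """
    length = len(next(iter(snps.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in snps.items()}


# Toy usage: 1,000 replicates; a tree would be built from each one and the
# frequency of every clade across replicate trees reported as its support.
snps = {"G1": "ACGTAC", "G2": "ACGAAC", "G3": "TCGAGC"}
rng = random.Random(1)
replicates = [bootstrap_replicate(snps, rng) for _ in range(1000)]
```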
[Figure S7: map of the predicted chromosomal ORFs (columns) across the 71 genomes (rows), with Locus 1 and Locus 2 indicated. Rows, top to bottom: HL036PA1, HL036PA2, HL036PA3, HL046PA2, HL002PA3, HL002PA2, HL005PA3, HL005PA2, HL020PA1, HL027PA2, HL100PA1, HL087PA2, HL013PA2, HL063PA1, HL096PA3, J165, HL072PA1, HL072PA2, HL078PA1, SK137, HL106PA2, HL099PA1, HL083PA1, HL038PA1, HL005PA1, HL074PA1, HL007PA1, HL096PA2, HL096PA1, HL043PA1, HL043PA2, HL056PA1, HL053PA1, HL045PA1, HL025PA1, SK187, HL086PA1, HL082PA1, HL110PA2, HL053PA2, HL110PA1, HL092PA1, HL063PA2, HL030PA2, HL037PA1, HL059PA1, HL059PA2, HL025PA2, HL067PA1, HL005PA4, HL002PA1, HL027PA1, HL083PA2, HL046PA1, HL013PA1, HL087PA1, HL087PA3, HL050PA1, HL050PA3, HL097PA1, HL030PA1, KPA171202, HL050PA2, HL060PA1, HL103PA1, HL082PA2, HL001PA1, HL106PA1, J139, HL110PA3, HL110PA4.]
Figure S7. Genome comparison of 71 P. acnes strains shows that the genomes of RT4 and RT5 are distinct from the others
(extension of Figure 3). All predicted open reading frames (ORFs) encoded on the chromosome are shown. Each row
represents one P. acnes genome, colored according to its ribotype. Rows are ordered by the phylogeny calculated from the
SNPs in the P. acnes core genome; only the tree topology is shown. Columns represent ORFs, ordered by their positions along
the finished genome HL096PA1.
[Figure S8: one coverage panel per genome (y-axis: sequence coverage; x-axis: chromosome, with Locus 1 and Locus 2 marked, followed by the plasmid). Panels, top to bottom: HL046PA2 (RT1), HL096PA3 (RT1), HL072PA1 (RT5), HL072PA2 (RT5), HL078PA1 (RT1), HL099PA1 (RT4), HL038PA1 (RT4), HL005PA1 (RT4), HL074PA1 (RT4), HL007PA1 (RT4), HL096PA1 (RT5), HL096PA2 (RT5), HL043PA1 (RT5), HL043PA2 (RT5), HL053PA1 (RT4), HL045PA1 (RT4), HL067PA1 (RT3), HL027PA1 (RT3), HL087PA3 (RT3), HL097PA1 (RT5), HL110PA3 (RT6), HL110PA4 (RT6), HL056PA1 (RT4, no plasmid).]
Figure S8. Comparison of sequence coverage between the chromosome and the plasmid region in all genomes harboring a
putative plasmid shows that the plasmid copy number ranges from 1 to 3 per genome. The x-axis represents positions along
the chromosome, based on the coordinates of the finished genome HL096PA1, followed by the plasmid sequences. The y-axis
represents sequence coverage. The genomes are in the same order as in Figure 3, except HL056PA1, which has no plasmid
and is included as a negative control.
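Because the plasmid and the chromosome are sequenced from the same DNA, the ratio of their read coverages approximates the number of plasmid copies per chromosome. A minimal sketch, assuming per-position coverage depths are already available (e.g. from read mapping) and that a ratio of median coverages is an adequate estimator; the function name is hypothetical, and the original analysis may have read the ratio directly off the coverage plots.

```python
import statistics


def plasmid_copy_number(chromosome_coverage, plasmid_coverage):
    """Estimate plasmid copies per chromosome from per-position read coverage.

    Both arguments are sequences of coverage depths. Medians are used so that
    repeats or poorly covered gaps do not dominate the estimate.
    Returns (ratio, rounded_copies).
    """
    chrom_median = statistics.median(chromosome_coverage)
    if chrom_median == 0:
        raise ValueError("chromosome coverage must be non-zero")
    ratio = statistics.median(plasmid_coverage) / chrom_median
    return ratio, round(ratio)


# Toy example: plasmid covered about three times as deeply as the chromosome
print(plasmid_copy_number([50, 52, 48, 51, 49], [148, 152, 150]))
```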
[Figure S9: bar chart of gene copy number ratios (y-axis, 0 to 4.5) across P. acnes strains (x-axis) for TadA/Pak (by sequence coverage), TadA/Pak (by qPCR) and RecA/Pak (by qPCR).]
Figure S9. Quantitative PCR (qPCR) confirmed that the plasmid copy number in each genome is 1 to 3, as predicted from the
sequence coverage comparison. Pak and RecA are housekeeping genes located on the chromosome, and TadA is a conserved
gene in the Tad locus located on the plasmid. The copy number ratio between TadA and Pak ranges from 1 to 3 across
genomes, while the ratio between RecA and Pak is 1 in all genomes. In HL078PA1 and HL045PA1, TadA amplified only in late
qPCR cycles, so its copy number could not be quantified reliably. Conventional PCR confirmed amplification of TadA in these
two strains, while other strains without the plasmid showed no amplification (data not shown).
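The caption does not state how the qPCR ratios were calculated. One common approach for gene copy number on a shared genomic DNA template is the comparative Ct method, in which each cycle of earlier threshold crossing corresponds to roughly twofold more template. A hedged sketch under that assumption: equal amplification efficiency for both amplicons, and a hypothetical `max_ct` cut-off beyond which amplification is treated as too late to quantify (the situation described for HL078PA1 and HL045PA1). Neither the function nor the cut-off comes from the original study.

```python
def copy_number_ratio(ct_target, ct_reference, efficiency=2.0, max_ct=35.0):
    """Relative copy number of a target gene versus a single-copy reference gene
    measured by qPCR on the same genomic DNA template (comparative Ct method).

    Assumes both amplicons share the same amplification efficiency
    (2.0 = perfect doubling per cycle). max_ct is a hypothetical cut-off above
    which amplification is treated as too late to quantify; returns None then.
    """
    if ct_target > max_ct or ct_reference > max_ct:
        return None
    # A target crossing threshold one cycle earlier has about twice the template
    return efficiency ** (ct_reference - ct_target)


# Toy example: TadA crosses threshold one cycle before Pak, suggesting ~2 copies
print(copy_number_ratio(ct_target=19.0, ct_reference=20.0))
```

If measured efficiencies differ between the TadA and Pak primer pairs, an efficiency-corrected formula (one efficiency per amplicon) would be needed instead.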
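[Figure S10: schematic of the CRISPR/cas locus of E. coli K-12 W3110 (genes ygcB, ygcL, ygcK, ygcJ, ygcI, ygcH, ygbT, ygbF) compared with the P. acnes locus; repeat sequences shown: GAGTTCCCCGCGCCAGCGGGGATAAACCG and GTATTCCCCGCCTATGCGGGGGTGAGCC.]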
Figure S10. Comparison of the CRISPR/cas systems in P. acnes and E. coli. All the P. acnes CRISPR/cas systems, found in
isolates of RT2 and RT6, are homologous to the CRISPR systems in E. coli and to CRISPR4 of Streptococcus thermophilus (not
shown).