The Black Truffle Genome Uncovers Evolutionary Origins and Mechanisms of Symbiosis

Francis Martin 1*, Raffaella Balestrini 2+, Olivier Jaillon 3–5+, Annegret Kohler 1+, Barbara Montanini 6+, Emmanuelle Morin 1+, Claude Murat 1+, Benjamin Noel 3–5+, Riccardo Percudani 6+, Bettina Porcel 3–5, Andrea Rubini 7+, Antonella Amicucci 8, Joelle Amselem 9, Véronique Anthouard 3–5, Sergio Arcioni 7, François Artiguenave 3–5, Jean-Marc Aury 3–5, Paola Ballario 10, Angelo Bolchi 6, Andrea Brenna 10, Annick Brun 1, Marc Buée 1, Brandi Cantarel 11, Gérard Chevalier 12, Arnaud Couloux 3–5, Pedro Coutinho 11, Corinne Da Silva 3–5, France Denoeud 3–5, Sébastien Duplessis 1, Stefano Ghignone 2, Bernard Henrissat 11, Benoît Hilselberger 1,9, Mirco Iotti 13, Antonietta Mello 2, Michele Miranda 14, Giovanni Pacioni 15, Hadi Quesneville 9, Claudia Riccioni 7, Roberta Ruotolo 6, Richard Splivallo 16, Vilberto Stocchi 8, Alessandra Zambonelli 13, Elisa Zampieri 2, Arturo Roberto Viscomi 6, Francesco Paolocci 7++, Paola Bonfante 2++, Simone Ottonello 6++ & Patrick Wincker 3–5++

1 INRA, UMR 1136, INRA-Nancy Université, Interactions Arbres/Microorganismes, 54280 Champenoux, France.
2 Istituto per la Protezione delle Piante del CNR, sez. di Torino c/o Dipartimento Biologia Vegetale, Università degli Studi di Torino, Viale Mattioli, 25, 10125 Torino, Italy.
3 CEA, IG, Genoscope, 2 rue Gaston Crémieux CP5702, 91057 Evry.
4 CNRS, UMR 8030, 2 rue Gaston Crémieux, CP5706, F-91057 Evry, France.
5 Université d’Evry, F-91057 Evry, France.
6 Dipartimento di Biochimica e Biologia Molecolare, Università degli Studi di Parma, Viale G.P. Usberti 23/A, 43100 Parma, Italy.
7 CNR-IGV Istituto di Genetica Vegetale, Unità Organizzativa di Supporto di Perugia, via Madonna Alta, 130, 06128 Perugia, Italy.
8 Dipartimento di Scienze Biomolecolari, Università degli Studi di Urbino, Via Saffi 2 - 61029 Urbino (PU), Italy.
9 INRA, URGI, Route de Saint-Cyr 78026 Versailles cedex.
10 Dipartimento di Genetica e Biologia Molecolare & IBPM (CNR), Università La Sapienza, Roma, Piazzale A. Moro 5, 00185 Roma, Italy.
11 Architecture et Fonction des Macromolécules Biologiques, UMR 6098 CNRS-Universités Aix-Marseille I & II, Marseille, France.
12 INRA, UMR Amélioration et Santé des Plantes, INRA-Université Blaise Pascal, centre INRA de Clermont-Ferrand-Theix, France.
13 Dipartimento di Protezione e Valorizzazione Agroalimentare, Università degli Studi di Bologna, Bologna, Italy.
14 Dipartimento di Biologia di Base ed Applicata, Università degli Studi dell’Aquila, Via Vetoio Coppito 1 - 67100 L’Aquila, Italy.
15 Dipartimento di Scienze Ambientali, Università degli Studi dell’Aquila, Via Vetoio Coppito 1 - 67100 L’Aquila, Italy.
16 University of Goettingen, Molecular Phytopathology and Mycotoxin Research, Grisebachstrasse 6, D-37077 Goettingen, Germany.

* To whom correspondence should be addressed. E-mail: fmartin@nancy.inra.fr
+ These authors contributed equally to this work as second authors.
++ These authors contributed equally to this work as senior authors.

One-sentence summary. In addition to unraveling specific features of a ‘cult food’, the analysis of the Black Truffle genome shows that Ascomycete and Basidiomycete fungi have acquired different combinations of molecular adaptations and genetic predispositions to evolve the ectomycorrhizal symbiosis.

Abstract. The Perigord black truffle is a delicious gourmet food and an ectomycorrhizal symbiont. To understand the biology and evolution of this fungus, its haploid genome was sequenced. Proliferation of transposable elements explains the larger size of this genome compared with other fungi. The truffle genome contains only ~7,500 genes with very few multigene families. It lacks large sets of carbohydrate-degrading enzymes, but endoglucanases and pectinases involved in degradation of cell walls are expressed in ectomycorrhiza. Identification of two different mating-type loci demonstrated that T. melanosporum is a heterothallic, obligate outcrossing species. Consistent with a role in flavour formation, the expression of several sulfur metabolism genes is upregulated in the fruiting body. Our study suggests that genetic predispositions for symbiosis evolved in different ways in ascomycetes and basidiomycetes. The truffle genome sequence is thus a key resource for understanding the evolution of symbiosis and for accelerating genetic improvement for truffle production.

Ectomycorrhizas (ECM) are the most frequent mycorrhizal type in forests of temperate and boreal latitudes. ECM fungi establish a mutualistic symbiosis with their host trees and as such are essential contributors to carbon and nitrogen cycles in soils (1). In the basidiomycete Laccaria bicolor, the expansion of gene families may have acted as a putative ‘symbiosis toolkit’ and might thus be a hallmark of symbiosis evolution (2). However, this contention must be tempered by the caveat that the above features may reflect the evolution of this particular mycorrhizal taxon and not a general trait shared by all ECM species (3). We therefore sequenced the nuclear genome of the Perigord black truffle (Tuber melanosporum Vittad.) (table S1) (4) [supporting online material (SOM) text S1].
This tree ECM symbiont is endemic to calcareous soils in southern Europe and produces hypogeous fruit bodies, so-called truffles (5) (Fig. S1), which are highly appreciated in European gastronomy for their organoleptic properties (i.e., taste and flavor) (5). For understanding the evolution of fungal genetic and developmental complexity, the T. melanosporum genome is likely to be critically important, as the Pezizomycetes, a class regarded as the basal clade of Ascomycota (Fig. S2), previously lacked any genomic coverage. The present genome analysis highlights processes that may underlie the symbiotic lifestyle as well as fruiting body formation.

Transposable elements and genome defense. The 125 megabase genome of T.
melanosporum is the largest sequenced fungal genome to date (6) (table S1), but no evidence for large-scale duplications was observed. The ~4-fold larger size of the truffle genome compared with other sequenced ascomycetes is accounted for by multi-copy transposable elements (TEs) (Fig. S3), which constitute about 66% of the assembled genome (Fig. S4) (SOM text S4). Estimated insertion times suggest a major wave of retrotransposition less than 5 million years ago (Fig. S5). Most TEs are not uniformly distributed across the genome (Fig. 1). Relics of TEs riddled with stop codons have been found in a number of ascomycetes as a result of repeat-induced point mutation (RIP) (7). No RIP footprint (8) was detected in the T. melanosporum genome (SOM text S3), suggesting either that RIP was not an active defense mechanism when TEs invaded this genome, or that the TEs were adapted to tolerate and escape RIP or related methylation mechanisms, such as methylation induced premeiotically (MIP). The proliferation of TEs within the truffle genome may result from its low effective population size (SOM text S2.5) (9) during postglacial migrations (10). In filamentous fungi, global genome defense mainly relies on RNAi and on DNA methylation. Two DNA methyltransferases (DMTs) are present in the T. melanosporum genome: TmelDMT2 and TmelDMT1 (Fig. S6). Distinct functional roles for these two DMTs are supported by the presence of a single DNA methylase domain in both TmelDMT2 and Neurospora crassa DIM-2, rather than two separate domains as in MIP-related Masc1 and TmelDMT1, and by the preferential expression of the latter in the fruiting body. T. melanosporum thus likely uses general, rather than specialized, RNAi and DNA methylation processes for genome defense (SOM text S6.1).

The gene complement of Tuber. The predicted proteome is in the lower range of sequenced filamentous fungi, as only 7496 protein-coding genes were identified (6) (SOM text S4). They are mainly located in TE-poor regions (Fig.
1) and the gene density is heterogeneous when compared with that of other ascomycetes (Fig. S7). Amongst the predicted proteins, only 3970, 5596 and 5644 showed significant sequence similarity to proteins from Saccharomyces cerevisiae, Neurospora crassa and Aspergillus niger, respectively (Fig. S8). This agrees with the predicted ancient separation (>450 Myr ago) of the Pezizomycetes from the other ancestral fungal lineages (Fig. S2) (11). Of the ~5600 T. melanosporum genes that have an ortholog, very few show conservation of neighboring orthologs (synteny) in at least one of the other species (Fig. S9, SOM text S5.2). The T. melanosporum genome shows a structural organization strikingly different from that of other sequenced ascomycetes; the largest syntenic region (with Coccidioides immitis) contains only 99 genes with 39 orthologs (Fig. S10). TE proliferation likely facilitated genome rearrangements. Some regions of mesosynteny were, however, detected (Fig. S9), suggesting that T. melanosporum could be used for assessing the genome organization of ancestral ascomycete clades. Expression of 98% of the predicted genes was detected in free-living mycelia, ECM root tips and/or fruiting bodies by custom oligoarrays (SOM text S8, table S2, Fig. S23) and EST sequencing (SOM text S2.4). Only a low proportion (6%) is differentially expressed (fold-ratio >4.0, P<0.05) in either ectomycorrhiza (table S3) or fruiting body (table S4). Only 61 ectomycorrhiza-, fruiting body- or free-living mycelium-specific transcripts were detected (table S5). Transcripts coding for lectins, MFS transporters, redox proteins and polysaccharide-degrading enzymes are strikingly enriched in symbiotic tissues. They may play a role in adhesion to host cells and colonization of the root apoplast.

Gene and domain family expansion patterns. One of the most striking characteristics of the T. melanosporum genome is the almost complete absence of highly similar gene pairs.
Of the predicted 7496 protein-coding genes, only seven pairs share >90% amino-acid identity in their coding sequence, whereas 30 pairs share >80% identity (Fig. S11A). The latter value is significant as RIP mutates duplicated sequences that share greater than ~80% nucleotide similarity (10). An ancestral RIP, or a similarly acting mechanism, has likely prevented the emergence of novel genes through duplication. Multigene families in T. melanosporum are limited in number and comprise only 19% of the predicted proteome; most families have only two diverging members (Fig. S12). The rate of gene family gain is much lower than the rate of gene loss (Fig. S11B) and amongst the 11234 gene families found in ascomycetes, 5695 appear to be missing in T. melanosporum (Fig. S11) (15). This feature may reflect the genome organization of the ascomycete common ancestor, as T. melanosporum is the earliest diverging lineage within the Pezizomycotina clade. By comparison with other ascomycetes, gene families predicted to encode metabolite transporters (e.g., amino acid and sugar permeases), secondary metabolism enzymes (e.g., polyketide synthases and cytochrome P450s) and carbohydrate-active enzymes (table S6) are lacking. Besides its low rate of gene family gain, T. melanosporum is characterized by a small tRNA gene repertoire, a strikingly uniform codon usage, and an extremely weak translational selection (12) compared to other sequenced filamentous ascomycetes (SOM text S7.1).

Truffles: a hypogeous fruiting body delicacy. T. melanosporum is the first sequenced fungus producing highly flavoured hypogeous fruiting bodies (SOM text S6.4). Genomic signatures of the long-standing (>2000 years old) reputation of the black truffle as a gastronomic delicacy are its extremely low allergenic potential (Fig.
S15), coupled with the lack of key mycotoxin biosynthetic enzymes (SOM text S6.2, table S14), and the preferential overexpression of various flavour-related enzymes in the fruiting body (Figs. S16-S18). Among the latter are specific subsets of sulfate assimilation and S-amino acid interconversion enzymes (especially cystathionine lyases, known to promote the side-formation of methyl sulfide volatiles abundant in truffles (13)), as well as various enzymes involved in amino acid degradation through the Ehrlich pathway and giving rise to known truffle volatiles and flavors, e.g. 2-methyl-1-butanal (SOM text S7.4) (Figs. S17 and S18). Also notable, given the subterranean habitat of this fungus, is the presence of various putative light-sensing components (SOM text S6.6), which might be involved in light avoidance mechanisms and/or in the control of seasonal developmental variations, especially those related to fruiting body formation and sexual reproduction. The analysis of genes implicated in the mating process, including pheromone response, meiosis and fruiting body development, showed that most sex-related components identified in other ascomycetes are also present in T. melanosporum (table S11). Sexual reproduction in ascomycete filamentous fungi is partly controlled by two different mating-type (MAT) genes that establish sexual compatibility (14): one MAT gene codes for a protein with an alpha box domain, whereas the other encodes a high mobility group (HMG) protein (SOM text S6.5). It was widely believed that T. melanosporum was a homothallic or even an exclusively selfing species (15). The sequenced Mel28 strain contains the HMG locus, and the opposite, linked alpha MAT locus was identified in another natural isolate (Fig. S19), confirming recent hints that T. melanosporum is heterothallic and thus an obligate outcrossing species (16).
This result has major implications for truffle cultivation, which should be improved by the use of host plants harboring truffle strains of opposite mating types. In most ascomycetes, the genomic region flanking the MAT locus shows an extended conservation (14), but there is no synteny of the MAT loci between T. melanosporum and other sequenced fungi (Fig. S20).

Saprotrophism. We observed an extreme reduction in the number of enzymes involved in the degradation of plant cell wall (PCW) oligo- and polysaccharides (table S11). A comparison of T. melanosporum candidate CAZymes (17) (SOM text S6.11) with those of ascomycetous phytopathogens points to an adaptation to symbiosis (tables S23 and S24). Reduction in PCW CAZymes affects almost all glycosyl hydrolase (GH) families, some of which are completely absent. For instance, there is no GH5 cellulase appended to a cellulose-binding module (CBM1), and no cellulases from families GH6 and GH7 were found in the genome (table S24). However, a GH5 endoglucanase, together with a secreted GH12 xyloglucan-specific endoglucanase, a pectin methylesterase, a secreted GH28 polygalacturonase and a rhamnogalacturonan acetylesterase, were amongst the most highly upregulated transcripts in ECM root tips, suggesting a role for these enzymes in PCW degradation and remodeling during host colonization (table S3 and Fig. S21). This repertoire of mycorrhiza-induced cell wall degrading enzymes is likely to be of profound importance for the symbiotic interaction. At variance with L. bicolor (5), it appears that T. melanosporum mycelium penetrates between colonized roots by degrading apoplastic pectin polymers.

The ability to establish ECM symbioses is a widespread characteristic of various ascomycetes and basidiomycetes (1). The truffle genome reveals features of an ancestral symbiotic lineage that diverged from other fungal lineages >450 Myr ago (11).
Despite their similar symbiotic structures and similar beneficial effects on plant growth, the ascomycete T. melanosporum and the basidiomycete L. bicolor encode strikingly different proteomes (compact with very few multigene families vs. large with many expanded multigene families) and symbiosis-regulated genes. Effector-like proteins, such as the L. bicolor ECM-induced SSP MiSSP7 (2), are not expressed in T. melanosporum ectomycorrhizas. Based on our results, the ECM appears as an ancient innovation which developed several times during the course of Mycota evolution using different ‘toolkits’ (18). Sequencing of the T. melanosporum genome has provided unprecedented insights into the molecular bases of symbiosis, sex and fruiting in a most popular representative of the only lifestyle not yet addressed by Ascomycota genomics (19). It will be a major step in moving truffle research into the realm of ecosystem science, to say nothing of the exceptional social and cultural impact of a deeper understanding of the genome of one of the worldwide recognized icons of European gastronomy and culture.

References and Notes
1. S. E. Smith, D. J. Read, Mycorrhizal Symbiosis (Academic Press, London, ed. 2, 1996).
2. F. Martin et al., Nature 452, 88 (2008).
3. F. Martin, M. A. Selosse, New Phytol. 180, 296 (2008).
4. Material and methods are available as supporting material on Science Online.
5. A. Mello, C. Murat, P. Bonfante, FEMS Microbiol. Lett. 260, 1 (2006).
6. J. E. Galagan, M. R. Henn, L. J. Ma, C. A. Cuomo, B. Birren, Genome Res. 15, 1620 (2005).
7. J. E. Galagan, E. U. Selker, Trends Genet. 20, 417 (2004).
8. J. K. Hane, R. P. Oliver, BMC Bioinformatics 9, 478 (2008).
9. M. Lynch, J. S. Conery, Science 302, 1401 (2003).
10. C. Murat, J. Diez, P. Luis, C. Delaruelle, C. Dupré, G. Chevalier, P. Bonfante, F. Martin, New Phytol. 164, 401 (2004).
11. J. W. Taylor, M. L. Berbee, Mycologia 98, 838 (2006).
12. P. G. Higgs, W. Ran, Mol. Biol. Evol. 25, 2279 (2008).
13. R.
Splivallo, S. Bossi, M. Maffei, P. Bonfante, Phytochemistry 68, 2584 (2007).
14. J. A. Fraser, J. Heitman, Mol. Microbiol. 51, 299 (2004).
15. G. Bertault, M. Raymond, A. Berthomieu, G. Callot, D. Fernandez, Nature 394, 734 (1998).
16. C. Riccioni, B. Belfiori, A. Rubini, V. Passeri, S. Arcioni, F. Paolocci, New Phytol. 180, 466 (2008).
17. B. L. Cantarel, P. M. Coutinho, C. Rancurel, T. Bernard, V. Lombard, B. Henrissat, Nucleic Acids Res. 37, D233 (2009).
18. D. S. Hibbett, P. B. Matheny, BMC Biology 7, 13 (2009).
19. D. M. Soanes, I. Alam, M. Cornell, H. M. Wong, C. Hedeler, N. W. Paton, M. Rattray, S. J. Hubbard, S. G. Oliver, N. J. Talbot, PLoS ONE 3, e2300 (2008).
20. We thank the late L. Riousset and C. Dupré for providing the Mel28 isolate. The authors acknowledge Jean Weissenbach and Marc-Henri Lebrun for continuous support. The genome sequencing of Tuber melanosporum was funded by the Genoscope, Institut de Génomique, CEA and Agence Nationale de la Recherche (ANR). Annotation and transcriptome analysis were supported by INRA, the European FP6 Network of Excellence EVOLTREE, Région Lorraine, the ANR FungEffector project, Fondazione Cariparma, Compagnia di San Paolo and the Italian Ministry of Education, University and Research (MIUR), Regione Umbria and Istituto Pasteur Fondazione Cenci Bolognetti. F.M. coordinated the project, annotation and transcriptome analysis; P.W. coordinated the sequencing and automated annotation at Genoscope. F.M. and S.O. wrote the manuscript with input from P.B. R.B., A.K., O.J., B.M., E.M., C.M., B.N., R.P., B.P., A.R. and P.W. also made substantial contributions (listed in alphabetical order). All others contributed as members of the Tuber genome consortium or Genoscope sequencing and are listed in alphabetical order. Susanne von Pall di Tolna assisted in EST analysis.
We would like to thank Antonella Bonfigli, Michele Buffalini, Sabrina Colafarina, Timothé Flutre, Shwet Kamal, Paola Ceccaroli, Christophe Roux, Roberta Saltarelli, and Osvaldo Zarivi for their assistance in annotation, and David Hibbett and John Heitman for useful comments on an early draft of the manuscript. Assemblies and annotations are available at INRA (http://mycor.nancy.inra.fr/IMGC/TuberGenome/) and Genoscope (https://www.genoscope.cns.fr/secure-nda/Tuber/html/entry_ggb.html). Genome assemblies together with predicted gene models and annotations were deposited at DNA Data Bank of Japan/European Molecular Biology Laboratory/GenBank under the project accession numbers CABJ01000001-CABJ01004455 (WGS data) and FN429986-FN430383 (scaffolds and annotations).

Supporting Online Material
Supplementary Methods, Results & Discussion
References for Supplementary Materials
Supplementary Tables
Supplementary Figures

Figure Legends

Figure 1. Genomic landscape of Tuber melanosporum. Area charts quantify transposable elements (66%) and genes (18%) on supercontig 5 of the Arachne assembly. Heat-map tracks detail the distribution of selected elements. SSR, simple sequence repeats; TE, transposable elements; LTR, long terminal repeat retrotransposons; LINE, long interspersed elements; TIR, terminal inverted repeats; no cat, unknown class of TE. Figures for the 15 largest supercontigs are available at INRA TuberDB.

SUPPLEMENTARY INFORMATION

The Black Truffle Genome Uncovers Evolutionary Origins and Mechanisms of Symbiosis

Supplementary Methods, Results & Discussion
1. Background information
2. Genome sequencing and assembly
3. Transposable elements
4. Gene prediction and annotation
5. Orthology, synteny, tandem repeats and multigene families
6. Targeted annotation of specific gene categories
7. Non-coding RNAs
8. Whole-genome exon oligoarray analyses
9.
References for Supplementary Materials
Supplementary Tables
Supplementary Figures

1. Background information

1.1. The life cycle of Tuber melanosporum

The Perigord truffle (Tuber melanosporum Vittad.) is endemic to calcareous soils in southern Europe and found in symbiotic association with roots of deciduous trees, mostly oaks (Quercus spp.) and hazelnut trees (Corylus avellana). The fungus requires a host tree to complete its life cycle and produce hypogeous fruit bodies, so-called truffles (Fig. S1) (1). The meiotic spores germinate in spring, producing a vegetative mycelium that grows in the soil and the rhizosphere, which results in colonisation of root tips and further development of ectomycorrhizas. Extramatrical hyphae then aggregate to form fruit body initials. The latter develop into fruit bodies during fall and early winter, completing the truffle life cycle. In the mature truffle, the ascogenous heterokaryotic pseudotissues, resulting from an unknown fertilization process after outcrossing, are surrounded by homokaryotic maternal pseudotissues. After meiosis, the ascospores are dispersed, mainly by mycophagous animals, including wild boars. The spores pass through the digestive tract and are dispersed in the faeces over short distances (several kilometres). This short-distance land dispersal contrasts with the longer-distance dispersal of epigeous fungal species through airborne spores. Tuber species are found in temperate, mediterranean, and continental climates. They are excluded from tropical, dry (annual rainfall less than 350 mm) and very cold climates. The fruiting body of T. melanosporum is an edible truffle (= hypogeous ascocarp or ascoma), which is highly appreciated for its delicate organoleptic properties (i.e., taste and aroma).
The high price of the Perigord truffle (from 300 to 1000 €/kg) has prompted the development of its culture through man-made inoculation of seedlings. It can be assumed that the natural distribution and genetic structure of populations of this black truffle species have been shaped by at least five major factors: (i) the distribution of its host plant species (i.e. ectomycorrhizal deciduous trees), (ii) spore dispersal by mycophagous animals, (iii) limiting ecological factors (calcareous soils and a temperate climate), (iv) geographical barriers (i.e. the Mediterranean Sea, which limits its expansion towards North Africa) and (v) historical events (i.e. northward recolonization routes from glacial refugia in southern Europe). Phylogeographic analysis of T. melanosporum populations (2,3) suggested that host post-glacial expansion was one of the major factors that shaped the truffle population structure. The fruiting of ectomycorrhizal Tuber spp. depends on a complex set of variables, including metabolites and signals produced by the host plant, the nutritional status of the substrate, and unknown environmental cues (e.g., humidity and temperature) (4,5). The different types of cells and (pseudo)tissues of fruit bodies of ascomycetes (ascomata) are the result of a differentiation process leading to the production of asci containing meiospores. Morphological descriptions of ascoma development in truffles are scarce and mainly illustrate advanced developmental stages (6). This situation is due to the hypogeous habitat of truffles, which leads to erratic sampling. In addition, symbiotic relationships are required for the development of the truffle fruit body, and fruit bodies cannot be produced in vitro (1).

1.2. Phylogeny of T. melanosporum

According to molecular clock analyses, the genus Tuber arose between 270 and 140 million years (Myr) ago (7).
Modern plant phylogenies show that the ectomycorrhizal lifestyle has arisen independently over the course of evolution in Pinaceae and several disparate lineages of angiosperms (8,9). The oldest known fossil ectomycorrhizas date from 50 Myr ago (11), but it is now thought that the symbiosis predates this period by some time (~135 Myr), because the Pinaceae and many of the angiosperm families whose current members establish ectomycorrhizal symbiosis, including the Dipterocarpaceae, were extant well before 50 Myr ago, along with the major fungal lineages with modern ectomycorrhizal representatives (12,13,14). The Perigord truffle (Tuber melanosporum Vittad.) belongs to the Ascomycota (Discomycetes, Pezizomycota, Pezizomycetes, Pezizales, Tuberaceae). It is the first member of the Pezizomycetes sequenced to date. This class is considered the earliest diverging lineage within the Pezizomycotina clade. The plant pathogens Fusarium graminearum and Magnaporthe grisea, and the saprotrophs Podospora anserina and Neurospora crassa, belong to the Sordariomycetes. The human pathogens Aspergillus fumigatus, Neosartorya fischeri and Coccidioides immitis, and the saprotrophs Aspergillus nidulans and A. niger, belong to the Eurotiomycetes. The plant pathogens Phaeosphaeria nodorum and Pyrenophora tritici-repentis belong to the Dothideomycetes (Fig. S2). The analysis of the T. melanosporum genome therefore has the potential to illuminate features of their last common ancestor (the ancestral Ascomycotina), which lived approximately 400 to 800 Myr ago (15). Phylogenetic analysis based on well-conserved protein-coding genes showed that T. melanosporum is the earliest diverging clade among the currently sequenced fungi belonging to the Pezizomycotina (Fig. S2).

1.3. Tuber melanosporum Mel28: origin of the sequenced strain and culture conditions

A fruiting body of T.
melanosporum was collected by Louis Riousset at St Rémy de Provence (Bouches-du-Rhône, France) in February 1988 and deposited at the INRA-Clermont-Ferrand Tuber Collection. Free-living vegetative mycelium from this fruiting body was subcultured (= strain Mel28) by Chantal Dupré and Gérard Chevalier (INRA-Clermont-Ferrand). For purification of the high molecular weight DNA used for genomic library construction, the haploid homokaryotic strain Mel28 was grown in liquid medium and incubated at 25°C. This strain is available upon request from Francis Martin (INRA-Nancy).

2. Genome sequencing and assembly

2.1. Shotgun sequencing strategy and results

A whole-genome shotgun (WGS) strategy was adopted for sequencing and assembling the T. melanosporum draft genome. All genomic DNA was obtained from the vegetative mycelium of the homokaryotic haploid strain Mel28. Template DNA (200 µg) was extracted from mycelium using the Qiagen Genomic Tip DNA extraction kit. This high molecular weight DNA (~40 kb average size) was randomly sheared and size-fractionated to create plasmid libraries with inserts of roughly 3 kb and 10 kb. From these two libraries, 1284900 reads were obtained by Sanger sequencing at the Genoscope facilities. The reads were screened for vector using cross_match, then trimmed for vector and quality. Reads shorter than 100 bases after trimming were excluded. After trimming, removal of multiple read attempts and exclusion of overly short reads, the pool of data available for the assembly consisted of 1262177 reads, with ~1250 Mb of sequence. Nearly 10X total sequence redundancy of the predicted 125-megabase genome was thus obtained from the 3 kb and 10 kb plasmids. The data were assembled using the ARACHNE assembler (16). The 4484 contigs (N50 = 62 kb) were assembled (%ID > 98%) into 398 supercontigs (N50 = 637 kb) corresponding to 124.946 Mb of sequence (%GC = 52.02).
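The assembly statistics quoted above (N50 and fold redundancy) are simple to recompute. The sketch below is illustrative only: the function names and the toy contig lengths are hypothetical, and the N50 definition used (the length of the shortest sequence in the smallest set of longest sequences that covers at least half of the total length) is the common convention, assumed here rather than taken from the ARACHNE output.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed convention: sort lengths in decreasing order and return the
    length at which the running sum first reaches half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must not be empty")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def fold_coverage(total_read_bases, genome_size):
    """Average sequence redundancy: sequenced bases divided by genome size."""
    return total_read_bases / genome_size


# Hypothetical toy contig lengths (not the real assembly).
toy_contigs = [100, 80, 60, 40, 20]
print("toy N50:", n50(toy_contigs))

# Figures from the text: ~1250 Mb of trimmed reads, ~125 Mb predicted genome.
print("redundancy:", fold_coverage(1_250_000_000, 125_000_000))
```

With the figures given in the text, the second call reproduces the stated redundancy of about 10X.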
Based on the number of alignments per read, the main genome scaffolds were at a depth of 10X. The largest supercontig has a size of 2.785 Mb. The T. melanosporum genome is the largest fungal genome published so far [see the Broad Institute’s Fungal Genome Initiative (FGI) (http://www.broad.mit.edu/annotation/fungi/fgi/) and JGI (http://www.jgi.doe.gov/) web sites]. No evidence of whole-genome duplication or large-scale dispersed segmental duplication was detected. The 20 largest supercontigs (between 1 and 2.7 Mb) presumably correspond to chromosome arms or even entire chromosomes. Assemblies and annotations are available at INRA (http://mycor.nancy.inra.fr/IMGC/TuberGenome/) and Genoscope (https://www.genoscope.cns.fr/secure-nda/Tuber/html/entry_ggb.html). Genome assemblies, together with predicted gene models and annotations, were deposited at DNA Data Bank of Japan/European Molecular Biology Laboratory/GenBank under the project accession numbers CABJ01000001-CABJ01004455 (WGS data) and FN429986-FN430383 (scaffolds and annotations).

2.2. Telomeric repeats

Putative telomeric repeats were identified from repetitive sequences found at the ends of ARACHNE supercontigs. The consensus telomeric repeat, T2AC3, was used as a query sequence in a BLASTN search (with dust filter disabled, E-value < 1e-10) against the assembled genome; 17 supercontigs have a telomere at one end, and most of these telomeres are associated with a LINE transposable element. This suggests that there are at least eight chromosomes in the haploid T. melanosporum genome, in agreement with karyotyping (17).

2.3. Completeness of the assembly

Only a small fraction of the shotgun reads (19214) was left out of the ARACHNE assembly, suggesting that the assembled regions capture the vast majority of protein-coding genes in T. melanosporum. This was checked by aligning 88829 Sanger and 113855 ‘454’ ESTs to the assembly using a two-step strategy.
As a first step, BLAST was used to generate alignments between the repeat-masked EST sequences and the genomic sequence with the following settings: W = 20, X = 8, match score = 5, mismatch score = -4. The sum of the scores of the high-scoring pairs was calculated for each possible location, and the location with the highest score was retained if the sum of scores exceeded 1000. Once the location of the transcript sequence was determined, the corresponding genomic region was extended by 5 kb on either side. As a second step, transcript sequences were realigned on the extended region using EST_GENOME (mismatch 2, gap penalty 3) (http://www.well.ox.ac.uk/~rmott/ESTGENOME/est_genome.shtml) to define transcript exons. These transcript models were fused by a single-linkage clustering approach, in which transcripts from the same genomic region sharing at least 100 bp are merged. The assembled genome sequence provides near-complete coverage of genes, since 98.5% of T. melanosporum cDNAs (section 2.4) could be aligned to the assembly, covering 14.8 Mb (12%) of it.

2.4. cDNA libraries and EST clustering

cDNA libraries were constructed from free-living mycelium (FLM) and fruiting body (FB) at the Genoscope. The FLM library was obtained from the homokaryotic strain Mel28 used for genome sequencing. Gleba (sterile mycelium) of a fruit body harvested in Auvergne was used for the FB library. Harvested mycelia were frozen in liquid nitrogen and stored at -80°C prior to RNA extraction. Poly-A+ RNA was used to make cDNA using the CloneMiner cDNA library construction kit (pDONR™222 vector, Invitrogen Life Technologies) following the supplier’s instructions. Paired-end sequencing of cDNA clones was performed at the Genoscope using conventional Sanger sequencing technology (ABI3730xl DNA analyzers, Applied Biosystems). Base calling of the 104549 5’- and 3’-cDNA sequences was carried out using Phred.
Leading and trailing vector and polylinker sequences were removed with Seqclean filters. Groups of sequences were assembled into clusters using Cap3 and parsed using dedicated Perl scripts. The 92371 edited Sanger ESTs will be available at the National Center for Biotechnology Information (NCBI) dbEST (accession numbers FP383504-FP458874, FP458876-FP475875). Pyrosequencing was carried out on cDNAs from 200 µg of total RNA extracted from a fruiting body (sample #20044802) collected under an oak tree in October 2007 by Henri Dessolas in St Front d’Alemps, Dordogne. PolyA+ RNA was purified on Oligotex (Qiagen) according to the manufacturer’s instructions. cDNAs were synthesized using the SMART cDNA synthesis kit (Clontech) according to the manufacturer’s instructions and purified on a QIAquick PCR purification column (Qiagen). Adapter ligation, nebulization and DNA sequencing were performed by COGENICS (Meylan, France). A half-plate pyrosequencing run on the Genome Sequencer FLX 454 System (454 Life Sciences/Roche Applied Biosystems, Nutley, New Jersey, USA) resulted in 164904 reads, of which 136640 satisfied the length and sequence quality criteria and were assembled using Newbler; 5641 TCs mapped to the genome assembly or the gene models.

2.5. Detection of single nucleotide polymorphisms in ESTs

To detect SNPs in the cDNA pools, we used the gene models as a reference sequence to which individual ESTs were aligned. We used three EST datasets:
– 44361 Sanger ESTs from the free-living homokaryotic mycelium of the isolate Mel28 from Provence (used for genome sequencing) (= Tm_FLM_AB).
– 44468 Sanger ESTs from the heterokaryotic fruiting body designated ‘Auvergne_Chevalier’ (= Tm_FB_CD).
– 136640 ‘454’ ESTs from the heterokaryotic fruiting body designated ‘Dordogne_Dessolas’ (= Tm_FB_454).
Sequence alignments were carried out with BLASTN and parsed with custom-made scripts based on Bioperl.
Each read was aligned to a single best homologous site in the reference genomic sequence. Reads aligning equally well at more than one location in the genome were discarded. For the analysis reported here, we considered only single nucleotide polymorphisms (SNPs), excluding all indels and variants involving more than one nucleotide. We also imposed the constraint that a nucleotide position must be covered by at least two ESTs in a given isolate. The number of sites meeting these criteria for the different isolates was as follows: Tm_FLM_AB, 1164268 sites; Tm_FB_CD, 1221914 sites; and Tm_FB_454, 526759 sites. A site was considered to be polymorphic if (i) the substitution was confirmed by at least two different ESTs in a given isolate; and (ii) the frequency of the mutation in the isolate was >=40%. Protein-coding sequences for each consensus were delimited using the T. melanosporum peptides translated from gene models. Next, we determined whether SNPs positioned in consensus coding sequences introduced synonymous or nonsynonymous mutations by comparing the translated amino acids from the reference and variant sequences. By considering the proportion of silent (~25%) and replacement (~75%) sites, we calculated the rate of synonymous substitutions per silent site (S) and the rate of nonsynonymous substitutions per replacement site (R). For the ‘Tm_FLM_AB’ ESTs, 292803 synonymous and 878410 nonsynonymous sites out of a total of 1171213 sites were found; 7 synonymous and 14 nonsynonymous substitutions were identified (S = 2.4e-05, R = 1.6e-05; R/S = 0.67). For the ‘Tm_FB_454’ ESTs, 133634 synonymous and 400902 nonsynonymous sites out of a total of 534536 sites were found; 95 synonymous and 153 nonsynonymous substitutions were identified (S = 7.1e-04, R = 3.8e-04; R/S = 0.54).
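The S and R values in this section are ratios of substitution counts to site counts, and the relation S = 2Nu used in this section turns S into a rough effective population size. The sketch below recomputes them from the counts quoted in the text; the function names are hypothetical, and the mutation rate passed to the population-size helper is the assumed order-of-magnitude value discussed in the text, not a measured quantity.

```python
def substitution_rate(substitutions, sites):
    """Substitutions per site (used for both S and R)."""
    if sites <= 0:
        raise ValueError("sites must be positive")
    return substitutions / sites


def effective_population_size(s, mutation_rate):
    """Solve S = 2 * N * u for N, given S and an assumed mutation rate u."""
    return s / (2 * mutation_rate)


# Counts quoted in the text:
# (synonymous substitutions, silent sites,
#  nonsynonymous substitutions, replacement sites)
datasets = {
    "Tm_FLM_AB": (7, 292803, 14, 878410),
    "Tm_FB_454": (95, 133634, 153, 400902),
}

for name, (syn, silent, nonsyn, replacement) in datasets.items():
    s = substitution_rate(syn, silent)
    r = substitution_rate(nonsyn, replacement)
    print(f"{name}: S = {s:.1e}, R = {r:.1e}, R/S = {r / s:.2f}")

# Illustrative conversion with an assumed mutation rate of 1e-8.
print(effective_population_size(2.1e-4, 1e-8))
```

The same two helpers apply unchanged to any other dataset for which substitution and site counts are available.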
For the ‘Tm_FB_CD’ ESTs, 307272 synonymous and 921818 nonsynonymous sites out of a total of 1229090 sites were found; 66 synonymous and 85 nonsynonymous substitutions were identified (S = 2.1e-04, R = 9.2e-05; R/S = 0.43). It follows from the above calculations that the level of neutral polymorphism (S) in T. melanosporum is in the 1e-04 range, two orders of magnitude lower than that observed in N. crassa (S = ~2e-02) and S. cerevisiae (S = ~4e-02) (18). The parameter S can be used as an estimator of the composite parameter Nu (where N is the long-term effective population size and u is the mutation rate per nucleotide per generation). Given the relation S = 2Nu and assuming a mutation rate of the order of 1e-08 to 1e-09, the long-term effective population size for T. melanosporum would be between 10000 and 350000 individuals.

3. Transposable elements

Overall, the T. melanosporum genome contains an unusually rich and diverse population of transposable elements (TEs) (Fig. S3). These TEs were identified de novo using the REPET pipeline (19). The TEdenovo pipeline was used to detect TEs, group them into families and classify the consensus of each family. The consensus sequences were obtained with the PILER (20), RECON (21) and BLASTER (19) clustering methods. The TEannot pipeline annotated TEs in the genome using the consensus library produced by TEdenovo. Using the 2515 consensus sequences from the TEdenovo pipeline, TEannot masked 71.32 Mb, corresponding to 57.72% of the T. melanosporum genome (Fig. S3). Although the major TE superfamilies previously identified in other fungi were found in T. melanosporum, 728 of the 2515 TE consensus sequences (12.97% of the genome) were specific to T. melanosporum (Fig. S3). The most abundant TEs are Class 1 Gypsy/Ty3-like elements, which represent 29.51% of the genome (Fig. S4). To identify full-length LTR retrotransposons, a de novo search was also performed with LTR_STRUC (22).
The program yielded 304 full-length candidate LTR retrotransposon sequences, which were checked for homology using BLASTN against the consensus sequences from the REPET pipeline. Amongst the 304 putative full-length LTR elements, 271 were attributed to Gypsy/Ty3-like elements and 13 to Copia/Ty1-like elements. The other 20 elements displayed sequence identity with several different TE families, indicating that they did not correspond to a single repeat element. Comparison of the T. melanosporum genome with other fungal genomes reveals an unusual genome organization, comprising blocks of protein-coding genes in which gene density is relatively high and repeat content (e.g., LTR-Rs) is relatively low, separated by regions in which gene density is low and repeat content is high (Fig. 1). Recent proliferation of Gypsy elements underlies the genome expansion. The insertion age of full-length LTR elements (Fig. S5) was determined from the evolutionary distance between the 5’- and 3’-solo LTRs, derived from a ClustalW alignment of the two LTR sequences using the Kimura correction in ClustalW. For the conversion of the sequence distance to putative insertion age, a substitution rate of 1.3e-8 mutations per site per year was used (23). Most full-length Gypsy/Ty3-like elements were inserted in the T. melanosporum genome 2 to 3 million years ago (Fig. S5). LINE I elements are the second most frequent TE family, corresponding to 5.68% of the T. melanosporum genome. Amongst the class II elements, the Tc1/Mariner elements are the most abundant, with 258 consensus sequences corresponding to 4.20% of the T. melanosporum genome (Fig. S3 & S4). Consistent with a model of repeat-driven expansion of the T. melanosporum genome, the majority of TEs in the genome are highly similar to their consensus sequences, indicating a high rate of recent transposon activity (Fig. S5).
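The LTR dating step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the Kimura two-parameter form of the distance, and it assumes the conversion T = K / (2r) that is commonly used for LTR pairs (both LTRs accumulate substitutions independently after insertion). The text itself gives only the rate r = 1.3e-8 substitutions per site per year. All function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def insertion_age_years(distance, rate=1.3e-8):
    """Assumed conversion T = K / (2 * r); rate is per site per year."""
    return distance / (2 * rate)


# Hypothetical distance chosen only to show the scale of the conversion.
print(insertion_age_years(0.065))
```

Under these assumptions, a distance of about 0.05 to 0.08 between the two LTRs of an element corresponds to an insertion 2 to 3 million years ago.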
In addition, we have observed and experimentally confirmed examples of active elements (>1000) by transcript profiling using the NimbleGen oligoarrays (section 9).

4. Gene prediction and annotation

Most of the genome comparisons were performed with repeat-masked sequences. For this purpose, we searched for and sequentially masked several kinds of repeats: known T. melanosporum TEs (see SOM section 4), repeats and transposons available in Repbase (http://www.girinst.org/repbase/update/index.html) with the RepeatMasker program (24) (http://www.repeatmasker.org/), and tandem repeats with the TRF program (25). The UniProt (26) database was used to detect genes that are well conserved between T. melanosporum and other species. As GeneWise (27) is time-consuming, the UniProt database was first aligned with the T. melanosporum genome assembly using BLAT (with parameter minIdentity = 0). High-scoring segment pairs (HSPs) were then filtered on their score and length. HSPs from the same protein were clustered by genomic position, to assign one (or several) loci to each peptide. For a given locus, the five best matches were chosen for a GeneWise alignment. The ab initio gene prediction programs GeneID (28) and SNAP (Semi-HMM-based Nucleic Acid Parser) (29) were trained on 250 protein-coding genes that had been manually annotated using cDNA sequences and reviewed by the Consortium experts. All the resources described here were used to automatically build T. melanosporum gene models using GAZE (30) (http://www.sanger.ac.uk/Software/analysis/GAZE). Individual predictions from each of the programs (GeneID, SNAP, GeneWise and est2genome) were broken down into segments (coding, intron, intergenic) and signals (start codon, stop codon, splice acceptor, splice donor, transcript start, transcript stop). Exons predicted by the ab initio programs, GeneWise and est2genome were used as coding segments.
Introns predicted by GeneWise and est2genome were used as intron segments. Intergenic segments were created from the span of each mRNA, with a negative score (coercing GAZE not to split genes). Predicted repeats were used as intron segments, and non-coding RNAs as intergenic segments, to avoid prediction of protein-coding genes in such regions. The whole genome was scanned to find signals (splice sites, start and stop codons), and two signals, transcript START and STOP, were extracted from the ends of mRNAs. Each segment extracted from the output of a program that predicts exon boundaries (such as GeneWise, est2genome or the ab initio predictors) was used by GAZE only if GAZE chose the same boundaries. Each segment or signal from a given program was given a value reflecting our confidence in the data, and these values were used as scores for the arcs of the GAZE automaton. All signals were given a fixed score, but segment scores were context sensitive: coding segment scores were linked to the percentage identity (%ID) of the alignment, and intronic segment scores were linked to the %ID of the flanking exons. The impact of each data source (GeneID, etc.) was evaluated on a reference sequence, and a weight was assigned to each resource to further reflect its reliability and accuracy in predicting gene models. This weight acts as a multiplier for the score of each information source before processing by GAZE. When applied to the entire assembled sequence, GAZE predicted 7496 gene models; 1309 of these were manually curated and revised where needed. Protein domains were predicted using InterProScan (31) against various domain libraries (Prints, Prosite, Pfam, ProDom & SMART) (http://www.ebi.ac.uk/interpro/).
Annotations were also assigned to Gene Ontology (GO) (32) (http://www.geneontology.org/), eukaryotic clusters of orthologous groups (KOG) (33) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (34) (http://www.genome.jp/kegg/) by homology search against the corresponding databases, and to EC numbers using PRIAM (http://priam.prabi.fr/REL_JUL06/index_jul06.html). The reference metabolic pathways included in the KEGG database were deduced from the EC numbers. The 7496 nuclear gene models that were automatically predicted, annotated and promoted to a ‘Reference’ set, including 1309 models manually annotated at community “jamborees”, are available at the Genoscope Genome Portal for T. melanosporum (https://www.genoscope.cns.fr/secure-nda/Tuber/html/entry_ggb.html) and at the INRA TuberDB (http://mycor.nancy.inra.fr/IMGC/TuberGenome/). As in other fungi, the majority (88%) of genes showed a multi-exon structure, with an average of 4.5 exons per gene. The average gene length was 2073 bp and the average predicted protein length was 439 amino acids (table S1). A broad distribution of exon lengths peaked at around 340 nucleotides, whereas for introns the peak occurred at around 107 nucleotides. This intron length was consistent with the trend observed in other fungi. We assigned GO terms to 3646 (48.6%) T. melanosporum proteins, including 3157, 2507 and 1499 genes with molecular function, cellular component and biological process terms, respectively. We also assigned 5146 (68%) proteins to KOG clusters. Analysis of gene density (Fig. S7) was performed by counting genes in a sliding window (100 kb for T. melanosporum and 30 kb for other fungi); the counts were then binned and plotted.
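The window-based gene density count described above can be sketched in a few lines. This is an assumption-laden illustration rather than the procedure actually used for Fig. S7: genes are counted by their start coordinate only, windows are non-overlapping unless a smaller step is supplied, and the function name and toy coordinates are hypothetical.

```python
def gene_density(gene_starts, contig_length, window=100_000, step=None):
    """Count genes per window along one contig.

    gene_starts: 0-based start coordinates of genes on the contig.
    window: window size in bp (100 kb here, as used for T. melanosporum).
    step: offset between consecutive windows; defaults to the window size,
          which gives non-overlapping bins (an assumption of this sketch).
    Returns a list of (window_start, window_end, gene_count) tuples.
    """
    if step is None:
        step = window
    counts = []
    for win_start in range(0, contig_length, step):
        win_end = min(win_start + window, contig_length)
        n_genes = sum(1 for start in gene_starts if win_start <= start < win_end)
        counts.append((win_start, win_end, n_genes))
    return counts


# Hypothetical gene start positions on a 300 kb contig.
toy_starts = [10, 50_000, 150_000, 250_000, 260_000]
for win_start, win_end, n_genes in gene_density(toy_starts, 300_000):
    print(win_start, win_end, n_genes)
```

Passing a step smaller than the window gives overlapping windows, which smooths the profile at the cost of correlated neighbouring counts.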
Up to 241 fragments (<100 AA) of protein-coding exons showing significant similarity (BLASTX, cut-off e-value < 1e-5) to known proteins in the NCBI non-redundant database are located in TE-rich regions, suggesting that TE activity played a key role in pseudogene generation.

5. Orthology, synteny, tandem repeats and multigene families

5.1. Orthology

Up to 5990 of the T. melanosporum predicted proteins (80%) showed significant similarity (BLASTX, cut-off e-value < 1e-5) (35) to known proteins in the NCBI non-redundant database (May 2009). Most putative homologs (best reciprocal hits, BRH) were found in the Pezizomycotina. Amongst the predicted proteins, 3970, 5596 and 5644 showed significant sequence similarity to proteins from Saccharomyces cerevisiae, Neurospora crassa and Aspergillus niger, respectively (42 to 48% mean sequence similarity) (Fig. S8). The number of T. melanosporum genes showing sequence similarity with N. crassa proteins ranged from 4667 to 5124 using a BLASTP cut-off e-value of 1e-10 or 1e-1, respectively. This agrees with the predicted ancient separation (>450 Myr ago) of the Pezizomycetes from the other ancestral fungal lineages (15). The divergence between the modern major families of Ascomycetes occurred approximately 450 Myr ago (Fig. S2). A substantial fraction (20%) of the predicted genes in T. melanosporum lack sequence similarity to any of the genes in public databases. The origin of these species-specific genes, or orphan genes, is poorly understood. An enrichment of orphan genes has been found in subtelomeric regions in Aspergillus species (36) and N. crassa (37). We therefore mapped the location of T. melanosporum orphan genes on the largest supercontigs of the assembly (>0.5 Mbp) (Fig. S14). The orphan genes were evenly distributed along the supercontigs in most protein-rich regions of the genome.

5.2.
Computation of blocks of conserved gene order amongst Ascomycete genomes

Regions of conserved collinear gene order (syntenic regions) between T. melanosporum and Aspergillus nidulans, A. fumigatus, A. oryzae, A. terreus, Chaetomium globosum, Coccidioides immitis, Fusarium graminearum, F. verticillioides, Magnaporthe grisea, Neurospora crassa, Podospora anserina, Saccharomyces cerevisiae, S. kluyveri, Trichoderma reesei, and Yarrowia lipolytica were computed using FISH (38) based on BLASTP matches with an E-value threshold of 1e-5. Input files for FISH were produced using custom Perl code; FISH was run with default parameters, requiring each block to contain at least four anchors. Of the ~5600 T. melanosporum genes that have an ortholog in other fungi, very few show conservation of neighboring orthologs (synteny) in at least one of the other species (Fig. S9). The T. melanosporum genome therefore shows a structural organization strikingly different from that of other sequenced ascomycetes; the largest syntenic region (with Coccidioides immitis) contains only 99 genes with 39 orthologs (Fig. S10).

5.3. Identification of tandem duplications, segmental duplications and duplications located anywhere in the genome

Three categories of gene duplicates were identified in the ascomycete genomes: tandem duplicates, when duplicated genes sit next to each other; segmental duplicates, when duplicated genes are part of a longer genome fragment duplicated as a whole; and the remaining duplicates, which occur anywhere in the genome. Tandem duplicates (>80% nucleotide identity over their whole length) were identified by BLASTP (35) comparison of adjacent genes using a cutoff of 1e-5 and a sliding window of 10 kbp. Segmentally duplicated genes were identified through the synteny analysis as described in Section 6.
Duplicated gene pairs located anywhere in the genome were identified with the BLASTCLUST program (35) with a bidirectional length coverage of 0.9. Two sequence identity cutoffs were used: 80% and 90%. The two datasets were subsequently divided into gene pairs having <80% identity (low similarity) and >80% identity (high similarity). One of the most striking characteristics of the T. melanosporum genome sequence is the almost complete absence of highly similar gene pairs (Fig. S11A & S12). Of the predicted 7496 protein-coding genes, only 11 pairs share >80% amino-acid identity in their coding sequence. These include unlinked pairs of adenylosuccinate lyase C (GSTUMT00000346001/ GSTUMT00000611001), amino acid permease (GSTUMT00001333001/ GSTUMT00008162001), ubiquitin (GSTUMT00001535001/ GSTUMT00005111001), Hsp70 (GSTUMT00011349001/ GSTUMT00002222001), Major Facilitator Superfamily protein (GSTUMT00005575001/ GSTUMT00004129001), acyl-CoA dehydrogenase (GSTUMT00006563001/ GSTUMT00008938001) and cytochrome P450 (GSTUMT00000763001/ GSTUMT00009894001) genes. At the protein level, seven pairs share >90% amino-acid identity in their coding sequence, whereas 30 pairs (60 genes) share >80% amino-acid identity (Fig. S11A). Only three tandem duplications were found, coding for an aldolase, a DEAD domain-containing protein and an orphan protein.

5.4. Multigene families and evolutionary analysis of multigene families (CAFE analysis)

The choice of species aimed to maximize the phylogenetic coverage with similar lineage radiation times in Sordariomycetes, Dothideomycetes, Leotiomycetes, and Eurotiomycetes.
Protein sets from Stagonospora nodorum (Phaeosphaeria nodorum) (16597 models), Botrytis cinerea (Botryotinia fuckeliana) (16389 models), Sclerotinia sclerotiorum (14522 models), Fusarium graminearum (Gibberella zeae) (15707 models), Neurospora crassa (9823 models), Magnaporthe grisea (12832 models), Aspergillus fumigatus (9631 models), and Laccaria bicolor (20614 models) were retrieved from the NCBI Genome Database, the Joint Genome Institute Portal or the Broad Institute Portal. Multigene families were generated from proteins in T. melanosporum, representative Pezizomycotina lineages and L. bicolor (128941 predicted proteins) using TribeMCL tools (39) with default settings (BLASTP, cut-off e-value < 1e-5, inflation parameter = 3). In total, 11234 protein families (containing at least two sequences) were identified within the Ascomycetes (Fig. S11B). Of these 11234 ascomycetous (Tribe-MCL) protein families, 4056 were found in T. melanosporum. Amongst the 7496 T. melanosporum genes, 1441 (19%) were found in (Tribe-MCL) gene families (≥ 2 members), a value very similar to that of N. crassa (18%) and much lower than that of the ectomycorrhizal basidiomycete Laccaria bicolor (55%). The percentage of proteins found in protein families was not related to genome size and was lowest in T. melanosporum (Fig. S12). This was mainly due to the smaller size of protein families (Fig. S11), but also to the lower number of protein families in T. melanosporum as compared to the other ascomycetes. The most abundant gene families code for proteins with NB-ARC, protein kinase, helicase, AAA ATPase, or WD40 domains (table S6). Multigene families were analysed for evolutionary changes in protein family size using the CAFE program (40) (Fig. S11).
The program makes phylogenetic inferences on changes in family size based on the topology and divergence times of a user-defined linearized tree, using maximum likelihood estimation to model the random birth and death process of genes in each family. CAFE calculates for each branch in the tree whether a protein family has not changed, has expanded, has contracted or has gone extinct. Gene families that are lineage specific (i.e., unique to one species) were analysed separately, since CAFE assumes that at least one member of each family existed at the root. In addition to the classification, a p-value (0.001) for each branch is reported for each family to assess the significance compared to a random dataset, using a Monte Carlo resampling procedure of gene gains and losses. This indicates whether the changes in family size reflect adaptive expansions or contractions. A linearized phylogenetic tree was constructed with estimates of the divergence times between T. melanosporum and other ascomycetes (Magnaporthe grisea 70-15 (GenBank accession # AACU00000000), Nectria haematococca/Fusarium solani (http://genome.jgi-psf.org/Necha2/Necha2.home.html), N. crassa OR74A (#AABX00000000), Botrytis cinerea/Botryotinia fuckeliana B05.10 (#AAID00000000), Sclerotinia sclerotiorum 1980 (#AAGT00000000), Stagonospora nodorum/Phaeosphaeria nodorum SN15 (#AAGI00000000) and Aspergillus nidulans (#AACD00000000)). The unique protein families were excluded from the analysis. In T. melanosporum, 269 families were expanded, 5270 showed no change and 5695 families had undergone contraction by comparison with a putative common Pezizomycotina ancestor having 11234 gene families. Comparing the counts of protein families on all branches of the Pezizomycotina tree, the largest average contraction in protein family size occurred along the T. melanosporum lineage (Fig. S11B), with an average expansion rate of -0.518.
Tables S7 and S8 present the most contracted and most expanded gene families in the T. melanosporum genome, respectively. The putative functions of these families were inferred by homology searches using the PFAM tools and database (41). The relative abundance of the various protein domains was then visualized by hierarchical clustering of PFAM domain abundances after transformation of the frequency values into z-scores (Fig. S13). Dramatic gene family contraction occurred in genes predicted to have roles in metabolite transport (e.g., MFS, amino acid & sugar permeases), secondary metabolism (e.g., polyketide synthases, cytochrome P450), and carbohydrate-active enzymes (see below) (table S7, Fig. S13). On the other hand, several gene families showed a significant expansion, such as those coding for proteins containing the TPR, NB-ARC and protein kinase domains (table S8, Fig. S13). Overall, T. melanosporum contains only 58 species-specific gene families (most of which comprise two members) (table S9), but a fairly large number of single-copy orphan genes (ca. 1356) scattered in gene-rich regions (Fig. S14). Only 14 gene families are unique to L. bicolor and T. melanosporum (table S10), and none of their members is specifically overexpressed in symbiosis.

6. Targeted annotation of specific gene categories

Gene categories corresponding to proteins playing a role in fungal development, symbiosis and fruiting body formation, such as genome defense-related proteins, sex genes, carbohydrate-degrading enzymes, and transporters, were targeted for manual review, phylogenetic analysis and gene model editing using ARTEMIS (http://www.sanger.ac.uk/Software/Artemis/). Evidence for identifying genes and editing exon boundaries was derived from protein and EST alignments, with focus provided by related proteins of known function.

6.1.
RNA silencing and DNA methylation

In filamentous fungi, global genome defense relies on RNA interference (RNAi), also known as RNA-mediated gene silencing, and on DNA methylation. Homologs of genes involved in both processes were identified in the T. melanosporum genome (table S12). No orthologs of the annotated genes, except for one putative helicase and the PP1 phosphatase, are present in S. cerevisiae. Some deviations in gene numbers, mainly regarding RNA silencing components (e.g., the two paralogous Argonaute proteins TmelAGO2 and TmelAGO3), were observed by comparison with an expanded set of reference ascomycetes (table S13). All but two siRNA/DNA methyltransferase (DMT)-related genes were covered by ESTs, and all of them gave above-background hybridization signals in at least one of the three life-cycle stages that were subjected to transcriptome analysis. Gene numbers for core RNA silencing components in T. melanosporum are very close to those of N. crassa and identical to those of M. grisea (table S13). Most of them are basal to groups of proteins containing functionally identified RNAi components from Neurospora and other fungi (e.g., M. grisea) in which siRNA-mediated post-transcriptional gene silencing (PTGS) has been documented. However, except for the RNA-dependent RNA polymerase TmelRRPc, no putative component of the RNAi machinery of T. melanosporum co-clusters with known PTGS (e.g., “quelling”; 42) components from N. crassa. Similar considerations hold for meiotic silencing by unpaired DNA (MSUD; 43), a component of which, the SAD-1 RRP chaperone SAD-2, is missing in T. melanosporum (Fig. S6). Thus, if there are quelling- or MSUD-like processes in T. melanosporum, they are likely to be quite different from those operating in N. crassa. Interestingly, one of the two Argonaute paralogs (TmelAGO3) clusters with the AGO protein of Schizosaccharomyces pombe (Fig.
S6), a fungus in which siRNA-mediated transcriptional gene silencing (RITS) has been described (44). The latter process involves siRNA- and AGO-directed deposition of histone H3 K9 methylation marks on centromeric DNA repeats by a specialized histone methyltransferase (CLR4), an ortholog of which is also present in T. melanosporum (TmelCLR4; 53% identity; E-value = 2e-27). Similar siRNA-mediated gene silencing processes have been described in plants, while their existence and mode of action in animals are still controversial. The lack of specialized miRNA biogenesis components such as a Drosha-like nuclease or a HEN1-like methyltransferase (45) suggests the absence in T. melanosporum (as in all fungi sequenced so far) of these particular non-coding RNAs and associated silencing mechanisms. Two putative DNA methyltransferases are present in the T. melanosporum genome, as in the genomes of most filamentous ascomycetes sequenced so far (table S12). One of them (TmelDMT2) is homologous to, and clusters with, the DNA methyltransferase DIM-2 from N. crassa (Fig. S6) and is accompanied by the orthologs of all the proteins required for DIM-2 recruitment on hemimethylated DNA in this organism: the histone H3 Ser10-P phosphatase TmelPP1 (and its associated regulatory subunit TmelSDS22), the H3 K9 methyltransferase TmelCLR4 (DIM-5 in N. crassa), and the H3 3MeK9-binding (and heterochromatin-forming) protein TmelHP1 (46) (table S12). The other DNA methyltransferase (TmelDMT1) is orthologous to, and clusters with, the “methylation-induced premeiotically” (MIP; 47) enzyme MASC1 from Ascobolus immersus, a fungus that is basal to the Pezizales (the order to which T. melanosporum belongs), rather than with the “Repeat-Induced Point Mutation” (RIP; 48) enzyme RID from N. crassa (Fig. S6).
Potentially distinct functional roles of the two Tuber DMTs are also supported by the presence of a single C-5 cytosine-specific DNA methylase domain (PF00145) in both TmelDMT2 and DIM-2, rather than two separate domains as in MASC1, RID and TmelDMT1, and by the preferential expression of the latter gene in fruiting bodies (six fruiting body-derived ESTs vs. 0 free-living mycelium-derived ESTs and a FB/FLM expression ratio >3, as opposed to the lack of any fruiting body expression bias for TmelDMT2). Finally, putative homologs of the Arabidopsis SWI/SNF chromatin remodeling proteins DDM1 and DRM1 and of the histone deacetylase HDA6, which are involved in RNA-independent and siRNA-mediated DNA methylation in plants (49-51), were identified (GSTUMT00001029001, GSTUMT00000296001 and GSTUMT00010406001), suggesting that siRNA-mediated DNA methylation may also contribute to gene silencing in T. melanosporum.

6.2. Allergome and mycotoxin profiling

Many fungi (e.g., Aspergillus fumigatus, Penicillium oxalicum, Penicillium citrinum, Cladosporium herbarum and Alternaria alternata) are strongly allergenic (59). Fungal allergens are usually (glyco)proteins or polysaccharides that elicit IgE antibody production upon interaction with the immune system of atopic individuals. An even more serious threat is posed by mycotoxins: chemically diverse small molecules that can cause a variety of diseases, including cancer. Mycotoxins are produced by various filamentous Ascomycetes, especially members of the Aspergillus and Fusarium genera, through well-defined biosynthetic pathways (60). Both issues are relevant to T. melanosporum because of its GRAS (Generally Recognized as Safe) status and production of edible, highly prized fruiting bodies with a long-standing reputation as a “gourmet food” and delicacy. To gain insight into the allergenic potential of truffles we searched the predicted proteome of T.
melanosporum using as reference the fungal allergen database, an inventory of the 92 allergenic proteins presently identified in fungi (http://www.allergen.org). Since immunoreactions are determined by specific (often surface-exposed) epitopes, the mere presence of a putative allergen ortholog does not represent conclusive evidence for allergenicity. Also, recent estimates indicate that cross-reactivity (and thus prospective allergenicity) becomes significant at amino acid sequence identity values ≥ 50% (61). A sequence identity criterion was thus used to analyze the allergenic potential of T. melanosporum and to compare it with that of GRAS (S. cerevisiae and N. crassa) and strongly allergenic (A. fumigatus) ascomycetes as well as with that of other fungi, including the basidiomycete Laccaria bicolor, not explicitly associated with allergenicity. As revealed by the heat map reported in Fig. S15, T. melanosporum, amongst the examined fungi, presents the lowest allergenic potential after S. cerevisiae, with only four predicted proteins exhibiting >80% identity with known allergens. Only one of these proteins, the ribosomal protein encoded by GSTUMT00008226001, is more similar to its parent allergen (Alt a 12) than the corresponding protein from N. crassa. A similar analysis was conducted on genes involved in mycotoxin biosynthesis in Aspergillus, Fusarium, Alternaria, Penicillium and Trichothecium spp. Four mycotoxins (aflatoxin, trichothecene, fumonisin and gliotoxin) and the corresponding biosynthetic genes and pathways were interrogated (62-65). As shown in table S14, 51 potential homologs (E-value < 1e-10) of the 81 analyzed candidate mycotoxin biosynthetic genes were found in the T. melanosporum genome. It should be noted, however, that in all the investigated cases a minimum of six genes coding for key mycotoxin biosynthetic enzymes appear to be missing in T. melanosporum (table S14). 
Therefore, to the best of our present knowledge, we conclude that none of the above mycotoxins is produced by T. melanosporum.

6.3. Sulfur assimilation and metabolism

Ten sulfur metabolism-related sub-pathways, outlined in Fig. S16A, were interrogated and a total of 126 genes coding for most of the corresponding enzymes, permeases and regulators were identified (table S15). When compared with a reference set of five different Ascomycetes (S. cerevisiae, N. crassa, M. grisea, B. cinerea and A. nidulans), 21 genes appear to be the Tuber-specific paralogs of genes that are also present in S. cerevisiae, four genes are the Tuber-specific paralogs of filamentous ascomycete-specific genes (a total of 25), while five genes were only found in T. melanosporum (table S15). The latter include phytochelatin synthase, the first enzyme of this kind to be described in filamentous ascomycetes; three putative dipeptidyl aminopeptidases possibly involved in the hydrolysis of Cys-Gly (and other dipeptides); and a putative chromate efflux transporter similar to bacterial proteins conferring resistance to chromate (a toxic metal anion that is internalized by sulfate permeases). The most significant deviations in gene number with respect to N. crassa and other filamentous ascomycetes (Fig. S16) are the duplication of PAPS reductase, the presence of an extra copy of a putative sulfate transporter and the presence of 19 copies of putative cysteine desulfurase genes (table S15). The genes that appear to be missing in T. melanosporum include those coding for arylsulfatase, alkanesulfonate monooxygenase and sulfide dehydrogenase. More than 75% of the manually annotated genes were covered by ESTs and 93% of them gave above-background hybridization signals in at least one of the three life-cycle stages that were subjected to transcript profiling [free-living mycelium (FLM), fruiting body (FB), and ectomycorrhiza (ECM)].
Normalized EST redundancy for core S-metabolism genes (pathways 1, 2, 3, 4 and 7 in Fig. S16) was on average 6.8-fold higher in T. melanosporum than in N. crassa, with a 3.5-fold higher redundancy for FB-derived vs. FLM-derived ESTs. Similar results with regard to the fruiting body-preferential expression of S-metabolism genes were obtained from transcriptome analysis, which also included ectomycorrhiza (Fig. S16B, table S15). Sulfur metabolism thus appears to be strikingly active in T. melanosporum, and particularly in fruiting bodies. The three pathways exhibiting the strongest fruiting body bias were “sulfate internalization & reduction”, “Cys/Met biosynthesis & interconversion” and “methionine uptake & utilization”. The genes with the highest expression prevalence in fruiting bodies include those encoding two sulfate permeases (ST1 and ST2) and various enzymes involved in S-amino acid biosynthesis/interconversion, especially methionine/SAM formation and recycling. Also worthy of note is a case of markedly disproportionate transcriptional output (and EST redundancy) for otherwise convergent or coupled reactions involved in cysteine/homocysteine interconversion in fruiting bodies (Fig. S17, table S15). Here, the two most represented enzymes are cystathionine γ-lyase (CGL; TmelCYS3) and cystathionine β-lyase (CBL; TmelSTR3). The standard substrate of these enzymes is cystathionine, which is produced by two cystathionine synthases (TmelSTR2 and TmelCYS4) that are both expressed at exceedingly low levels in fruiting bodies (Fig. S17). This suggests that the entire pathway is strongly polarized toward Met (and to a lesser extent Cys) to start with, and that these two overexpressed lyases may actually be involved in different (cystathionine metabolism-unrelated) reactions leading to the formation of S-volatiles (or S-VOC precursors).
Consistent with this hypothesis, CGLs and CBLs from several microorganisms (especially bacteria involved in cheese ripening) are known to act on various S-containing substrates besides cystathionine (66). As shown in Fig. S17, these alternative, nonstandard reactions include cysteine/homocysteine desulfhydrylation and H2S production, and the dethiomethylation of methionine to methanethiol, which can then spontaneously decompose to H2S, dimethyldisulfide and other methyl sulfides, many of which are known constituents (or precursors) of truffle flavour compounds (52-54). Additional flavour-related pathways originating from methionine are centered on 4-methylthio-2-oxobutanoic acid (also called α-keto-γ-(methylthio)butyric acid or KMBA), which can be degraded chemically or enzymatically to methanethiol and sulfides, or be converted enzymatically into 3-(methylthio)propanal and 3-(methylthio)propanol (or the corresponding acids; the so-called “fusel alcohols/acids”), through the Ehrlich pathway (Fig. S18) (see also section 6.4). The latter pathway as well as other “Methionine uptake & utilization” components (including TmelMsrA, a gene coding for a methionine sulfoxide reductase enzyme implicated in dimethyl sulfoxide reduction and flavor formation in yeast; 63) are also well represented in the T. melanosporum genome and transcriptome. Despite the presence of a Cys-degrading taurine dioxygenase (TmelTDI1c) that is highly expressed in fruiting bodies and a potentially flavour-related, but as yet unidentified, role of the enzymes encoded by the multigene cysteine desulfurase family, the data suggest a more prominent role of methionine as a key sulfur metabolite and S-VOC precursor in T. melanosporum fruiting bodies. Many S-metabolism genes, including one sulfate permease, various sulfate reducing enzymes, a single methionine transporter and a methionine synthase, are expressed at fairly high levels in mycorrhiza (Fig. S16B).
This suggests potential symbiosis-related roles for these proteins, such as an improved sulfur nutrition and metal tolerance of the host plant and an enhanced redox capacity of (pre)symbiotic hyphae to counteract the oxidative burst mounted by the plant upon infection.

6.4. Truffle aroma and volatile organic compounds (VOC)

The truffle aroma is made of hundreds of volatiles that vary in proportion and composition depending on the species, maturity and origin of the isolates (52-54). Several isoprenoids (also known as terpenoids) have been identified amongst VOCs produced by ripe T. borchii fruiting bodies, along with a sustained expression of genes involved in the isoprenoid pathway (55). Isoprenoids belong to a vast group of secondary metabolites synthesized from isopentenyl diphosphate (IPP), which includes flavor enhancers and fragrances. A complete set of genes involved in the biosynthesis of isoprenoid units through the mevalonate intermediate, plus a group of putative polyisoprenoid/terpenoid biosynthetic genes, were identified in the T. melanosporum genome (table S16). Fusel alcohols, which are also produced by yeasts via amino acid catabolism through the Ehrlich pathway (56), are another major constituent of truffle aroma. Earlier studies suggested that truffle aroma as a whole could be a mixture of compounds produced by both the mycelium (so-called “gleba”) and fruiting body-associated microorganisms (53, 57). However, a large set of genes homologous to the Ehrlich pathway genes operating in yeast is present in the T. melanosporum genome (Fig. S18) and many of them are preferentially expressed in fruiting bodies, thus suggesting that truffles can produce most (if not all) of these compounds on their own. Similar considerations hold for sulfur-containing volatiles (S-VOCs), which are also major and characteristic constituents of truffle aroma found in all Tuber species (58).
Indeed, S-metabolism genes, especially those coding for sulfur assimilation and Cys/Met metabolism components, are amongst the most highly expressed genes in T. melanosporum, with a strong expression bias for fruiting bodies (see section 6.3). Surprisingly, we found no gene coding for lipoxygenase, the enzyme that in most edible fungi is responsible for the biosynthesis of 1-octen-3-ol, a key component of ‘mushroom aroma’.

6.5. Sex and mating type genes

The analysis of genes implicated in the mating process, including pheromone response, meiosis and fruiting body development, showed that most sex-related components identified in other ascomycetes are also present in T. melanosporum (table S11).

6.5.1. Identification of mating type genes

The T. melanosporum genome sequence, strain Mel28, was analyzed for the presence of mating type homologous genes using BLASTN, with the T. melanosporum EST sequences showing a high similarity with the MAT1-2-1 gene from Diaporthe sp. (BAE93753, BAE93759), Cryphonectria parasitica (AAK83343) and Fusarium sacchari (BAE94382) as queries; this was confirmed using other Ascomycete MAT genes as queries. The BLAST search identified the corresponding gene model (GSTUMT00001090001) in scaffold 247. On the other hand, a BLAST search using the α-box containing genes from different Ascomycetes as queries did not identify any region with significant similarity in the Mel28 genome. The MAT1-2-1 gene consists of 4 exons and contains the HMG-box sequence typical of the MAT1-2-1 genes of Ascomycetes (Fig. S19). The deduced amino acid sequence is 297 residues long.

6.5.2. Identification of mat1-1 and mat1-2 T. melanosporum strains

To assess the presence of the HMG-box containing region in different T. melanosporum strains, the gene-specific primers GMmat121intf: 5’-TTTCTTTGATGGGTCGGATGGAG-3’ and GMmat121intR: 5’-GCCCTTGCCTATTAATGTGTTAGTG-3’ were designed and used to perform a PCR screening on 15 T. melanosporum ascocarps.
Thirteen out of the 15 samples screened produced the expected amplicon (673 bp). The two samples (mel206 and mel151) that did not give rise to any amplification product were therefore presumed to harbor the opposite mating type (MAT1-1). Conversely, all samples yielded the expected amplicons when amplified with primers for β-tubulin as a control.

6.5.3. Isolation of the mat1-1 idiomorph

To identify conserved regions flanking the two putative idiomorphs, a set of primers was designed to amplify the 5’ and 3’ genomic regions surrounding the MAT1-2-1 gene. These primers were used both on samples harboring the putative MAT1-2 (mel271 and mel459) and MAT1-1 (mel206 and mel151) idiomorphs. On the 5’ flanking region, the primers GMmatext2F: 5’-CAATCTCTTCCATCGCCCGTCCAG-3’ and GMmatextr6: 5’-TGGTATATGTGGATGTATTGATAACTATAAT-3’ yielded an amplicon of about 370 bp on both MAT1-2 and MAT1-1 strains. On the 3’ flanking region, the primers GMextF2: 5’-AGAGATAGAGAAATAGCATGGCTCGG-3’ and GMmatEXT2r: 5’-AAGTAACCTTTGTGCCATTGCTCCA-3’ produced an amplicon of about 1200 bp on MAT1-2 strains and a fragment of about 1700 bp on MAT1-1 strains. The PCR fragments obtained from sample mel206 with primer pairs GMmatext2F/GMextr6 and GMextf2/GMmatext2r were cloned and sequenced. Sequence alignment showed that these two fragments are highly similar (~96% sequence identity) to the corresponding regions on the sequenced genome, although the fragment generated by primers GMextf2/GMmatext2r showed an insertion of 495 bp with respect to the 3’ region downstream of the MAT1-2 idiomorph. On this insertion a primer specific to MAT1-1 strains (mat111R1: 5’-GCCAACCTCTAGTTGGGATATTTGTTCAGGAC-3’) was designed and used in combination with GMmatext2f to amplify the entire MAT1-1 idiomorph on mel206. This amplification produced a fragment of 10326 bp whose sequencing confirmed the identification of the MAT1-1 idiomorph.
Within this amplicon an idiomorphic region of 7470 bp containing the MAT1-1-1 gene was identified (Fig. S19). The deduced amino acid sequence of the MAT1-1-1 gene is 319 residues long and alignment with MAT1-1-1 sequences from other ascomycetes revealed a conserved α-box domain that typifies the MAT1-1-1 gene of filamentous ascomycetes (data not shown). More specifically, BLASTX analysis against the NCBI nr database revealed a sequence similarity with the mat1-1-1 protein of Alternaria brassicae (AAK85543.1, score = 35.4, E-value = 2.7), Penicillium marneffei and Ajellomyces capsulatus. The mat1-1-1 sequence was deposited in GenBank under accession number 000.

6.5.4. Pheromone and mating signal transduction-related genes

Complementary alpha-factor and a-factor pheromones are required for sex in heterothallic species. Binding of the pheromone to a cognate G-protein coupled receptor triggers the activation of a mitogen-activated protein (MAP) kinase signaling pathway ultimately targeting a homeodomain transcription factor. Most of the key genes for pheromone and MAP-kinase cascades were identified in the T. melanosporum genome (table S18). A putative pheromone precursor gene similar to the alpha-factor precursor gene of Saccharomyces cerevisiae and a series of genes involved in enzymatic processing, maturation (i.e., prenylation) and efflux of both alpha-factor and a-factor pheromones were identified in the sequenced Mel28 genome (table S11). Conversely, the a-factor-like pheromone precursor gene was not identified in this strain. Two G-protein coupled receptors (with a characteristic seven-transmembrane domain) for the a- and alpha-factor-like pheromones were also identified along with genes coding for the alpha, beta and gamma G-protein subunits. The MAP-kinase cascade and the homeodomain transcription factor STE12 are conserved in T. melanosporum. Furthermore, a gene containing an HMG-box domain similar to the transcription factor STE11 of S.
pombe and a MADS-box gene similar to MCM1 of S. cerevisiae were identified in the Mel28 genome. In S. pombe, STE11 is involved in the induction of mating type genes in response to nutritional starvation and in the transcriptional activation of meiotic genes. In S. cerevisiae, MCM1 interacts with STE12 and with mating type transcription factors to trigger mating type-specific gene expression in a and α cells.

6.6. Light perception and potential photoresponses

Light from UV-C to far red is perceived (and transduced) by fungi, in which it modulates growth and morphogenesis and other processes such as pigmentation, sexual/asexual development and secondary metabolism. Eight genes coding for putative “photoreceptors and light-dependent regulators” plus five “accessory components and modulators” were identified in the T. melanosporum genome (table S17). The former group includes the homologs of the blue light receptor WC-1 and its partner protein WC-2 from N. crassa (68-69), a putative phytochrome, and an opsin-like protein. All but one of the annotated genes were covered by ESTs, and all of them gave above-background hybridization signals in at least one of the three life-cycle stages that were subjected to transcriptome analysis. The results of genome sequence analysis are in line with previous data pointing to the occurrence of a blue light-induced phenotype (i.e., apical growth inhibition), likely mediated by TmelWc-1, in Tuber borchii (70). At variance with its N. crassa homolog, TmelWc-1 lacks the polyQ region involved in transcriptional activation, suggesting a repressive, rather than an activating, role for this protein. The putative phytochrome identified in T. melanosporum is homologous to one of the two phytochromes present in Neurospora (Phy-1; 71), and similarly bears two histidine kinase domains and single PAS, GAF and “response regulator” domains.
It is accompanied by a suite of putative transducers and regulators (TmelVeA, TmelVelB, TmelLaeA and TmelVosA) resembling the components of the blue/red-light sensing complex of A. nidulans (72). The other putative light-sensing component is a bacteriopsin-like protein (TmelOrp1) homologous to the opsin-related protein-1 from N. crassa. Despite the presence of a canonical seven transmembrane-helix domain, TmelOrp1 lacks a conserved Schiff-base forming lysine as well as 10 of the 22 amino acid residues that bind all-trans retinaldehyde in the N. crassa bacteriopsin Nop-1 (73). In keeping with this observation, two putative polyisoprenoid synthetases (data not shown), but no true β-carotene (nor retinaldehyde) biosynthetic enzyme, were found in T. melanosporum. Also, comparison with opsin-like proteins from other organisms suggests that oxygen-, rather than light-sensing, might be the main function of TmelOrp1, which is preferentially expressed in fruiting bodies. The light-sensing components that appear to be missing in T. melanosporum include FREQUENCY, a regulator of circadian rhythmicity, and VIVID, a PAS/LOV protein that mediates adaptation to irradiation intensity in Neurospora (69). The lack of FREQUENCY is in line with previous observations pointing to the absence of circadian rhythmicity in T. borchii (70). Perhaps long-term seasonal variations, rather than finely tuned daily rhythms, are more suited to a hypogeous fungus, whose life-cycle is likely influenced by light/temperature variations rather than by circadian rhythmicity. The lack of VIVID, a sensor specialized in the adaptation to changes in day-light irradiation intensity, may similarly reflect adaptation to a soil-screened subterranean habitat.
Therefore, although the actual function of the above components remains to be defined, it is tempting to speculate that a photosystem such as the one revealed by genome analysis may be instrumental to: (i) enforce subterranean growth and development by acting as a sort of light escape/avoidance mechanism; and (ii) control seasonal developmental variations, especially those related to sexual differentiation and secondary metabolism reprogramming. Light sensing in truffles is also supported by molecular phylogenetic data attesting to repeated episodes of epigeous to hypogeous lifestyle transitions, with no instance of character reversal, in the evolutionary history of the Pezizales (74).

6.7. Gene families involved in transduction pathways

We have carried out a genome-wide analysis of gene families encoding components of T. melanosporum signaling pathways, including monomeric G-proteins of the ras family, subunits of heterotrimeric G-proteins, G protein-coupled receptors (GPCR), and kinases (table S18). T. melanosporum signaling genes were compared to signaling genes in yeast and ascomycete genomes (N. crassa, A. nidulans, B. cinerea and M. grisea) using the best reciprocal BLAST hit (BRH) method to identify orthologous genes. Gene expression in mycelium or in fruiting bodies was estimated based on EST abundance. Transcript profiling was carried out using the NimbleGen exon oligonucleotide arrays (section S8). The T. melanosporum genome encodes the signaling genes documented in other filamentous ascomycetes (table S18). All proteins involved in pathways controlling key cellular processes such as stress response, filamentous growth, virulence and mating were identified and curated. Except for the GPCR Pth11 family (see below), no significant expansion or contraction of signaling gene families was found. Up to 80% of the gene models were supported by ESTs.
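The best reciprocal BLAST hit criterion used in section 6.7 to call orthologs can be illustrated with a minimal Python sketch. It assumes that the two all-against-all searches have already been reduced to (query, subject, bit score) triples; the gene identifiers and scores below are hypothetical toy values, not data from this study, and the actual pipeline may have applied additional filters that are not described here.

```python
def best_hits(hits):
    """Map each query to its single highest-scoring subject.

    `hits` is an iterable of (query, subject, bitscore) triples.
    """
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical toy input: Tuber genes searched against N. crassa and back.
tuber_vs_ncrassa = [
    ("Tmel_g1", "Nc_g7", 310.0),
    ("Tmel_g1", "Nc_g9", 120.0),
    ("Tmel_g2", "Nc_g9", 95.0),
]
ncrassa_vs_tuber = [
    ("Nc_g7", "Tmel_g1", 305.0),
    ("Nc_g9", "Tmel_g1", 118.0),
]

orthologs = reciprocal_best_hits(tuber_vs_ncrassa, ncrassa_vs_tuber)
print(orthologs)
```

In this toy case only Tmel_g1/Nc_g7 is retained: the best hit of Tmel_g2 is Nc_g9, but the best hit of Nc_g9 is Tmel_g1, so that pair fails the reciprocity test.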
During the interaction with the host plant, most signaling transcripts are strongly expressed and only a few were barely detected or not detected at all. Only a few transcripts showed a differential expression in ectomycorrhizal root tips (table S18). We identified 22 genes encoding the pathogenesis-related GPCR Pth11 in T. melanosporum. The number of paralogs in this multigene GPCR family was lower in T. melanosporum than in M. grisea (86 members), A. nidulans (71 members) or B. cinerea (55 members) and more comparable to the reduced set present in N. crassa (30 members); no homologs are present in the yeast genome. Interestingly, a phylogenetic analysis of this GPCR family clustered five T. melanosporum Pth11-related genes in a subgroup that may represent functionally distinct Pth11-related GPCRs specific to T. melanosporum (data not shown). In this large gene family, more than half of the genes showed low expression levels while some exhibited a very strong expression in different biological situations. Interestingly, a few Pth11-like GPCR transcripts strongly accumulated in ectomycorrhizal tissues (e.g., Tmel_Pth11_rel4) or in fruiting bodies (Tmel_Pth11_rel15, Tmel_Pth11_rel17, Tmel_Pth11_rel20) compared to the free-living mycelium. The three most highly expressed Pth11 transcripts in fruiting bodies all belong to the Tuber-specific Pth11 subgroup. The function of Pth11-related proteins is not yet known, but since GPCRs are key components of external stimulus sensing, one can speculate that they might be involved in the complex cross-talk between the mycobiont and its host plant.

6.8. Secretome

Secreted proteins were identified using a custom pipeline including the TargetP (http://www.cbs.dtu.dk/services/TargetP/) and SignalP (http://www.cbs.dtu.dk/services/SignalP/) algorithms (75). The 1449 proteins predicted to carry a signal peptide were then screened for transmembrane proteins using TMHMM (http://www.cbs.dtu.dk/services/TMHMM/), T.
melanosporum TE fragments, and coding sequences with matches only in the secretory leader sequence. The pipeline identified 125 genes coding for cysteine-rich small secreted proteins (SSP) (≥4 Cys, 80 < AA < 300). Of them, 70 SSPs were lineage-specific. A single SSP cluster (defined as a group of at least three SSPs in 10 kbp), corresponding to hydrophobin genes, was found in T. melanosporum. Amongst the most highly upregulated transcripts in T. melanosporum/Corylus avellana ectomycorrhizal root tips were several encoding predicted secreted proteins [e.g., LysM domain (GSTUMT00012780001)] (table S3). Several transcripts encoding Tuber-specific SSPs were also differentially expressed in fruiting bodies, e.g., GSTUMT00006097001, GSTUMT00012378001, and GSTUMT00007269001, suggesting that they play a role in the differentiation of the fruiting body (pseudo)tissues.

6.9. Environmental stress response genes

T. melanosporum experiences various stress conditions at different stages of its lifecycle. In particular, production of fruiting bodies follows a fairly strict seasonal timing, which is likely determined, and influenced, by various abiotic factors and potential stressors, such as high or low temperatures and drought. Proper response to these stressful conditions likely influences the T. melanosporum lifecycle, including its symbiotic interactions with its host plants, nutrient exchange and thus fruiting body production. We have curated genes coding for heat shock proteins (Hsp), as well as other chaperones and proteins binding to Hsps, i.e., co-chaperones. Seventy-four genes were manually annotated; 15 gene models were edited. Sixty-four of them were assigned to 16 gene families identified through PFAM search (table S19), while the remaining 10 genes did not match any of the known families. Especially noteworthy amongst the latter was the dehydrin (DHN1) gene.
Initially classified as LEA (late embryogenesis abundant), dehydrins (DHNs) are now believed to play a protective role during plant dehydration. The first fungal DHN-like gene (TbDHN1) was identified in the whitish truffle Tuber borchii. A homolog of TbDHN1, designated as TmelDHN1, was identified in the T. melanosporum genome. Despite a significant overall identity of 47%, the two amino acid sequences differ in length and in the number of repeats of the ‘DPRVDS’ motif. As expected, TmelDHN1 has no ortholog in S. cerevisiae, whereas potential orthologs are present in other filamentous fungi. The other stress-response related gene families that were annotated in the T. melanosporum genome are: CPN60-TCP1, Hsp70, DNAJ, cyclophilin, Hsp90 and associated proteins, Bag, Fes, Pam16, Hsp20, Hsp9/12, Grpe, Clp, Cpn10, ClpA/B, FKBP and Usp (table S19). The CPN60-TCP1 family includes molecular chaperones belonging to the octameric complex TCP1 (CCT). Genes coding for eight of the known TCP1 subunits were identified in T. melanosporum (Tmelcct1, Tmelcct2, Tmelcct3, Tmelcct4, Tmelcct5, Tmelcct6, Tmelcct7, and Tmelcct8). According to the PFAM analysis, Tmelhsp60 also belongs to this family. All the above gene products were subjected to phylogenetic analysis (data not shown), which confirmed their identity and tentative designation as CPN60-TCP1 components. Twelve Hsp70-like genes were identified in the T. melanosporum genome. As revealed by phylogenetic analysis, Tmelhsp70 and Tmelhsp88 clustered with homologous proteins from A. nidulans, while TmelSSB is closely related to a homologous Hsp from N. crassa (bootstrap value > 50). Within this family, Tmelhspa12b and Tmelhspa12a_1 are likely to be specific to filamentous fungi. Members of the DNAJ family are involved in protein folding, protein transport, and response to stress (http://ghr.nlm.nih.gov/geneFamily=dnaj). Fourteen putative members of this family were identified in the T.
melanosporum genome; these, along with a set of 90 homologs, were subjected to phylogenetic analysis (data not shown). Seven genes, members of the cyclophilin superfamily, were identified in the T. melanosporum genome. Hsp90 is a conserved heat shock protein, which can bind other proteins, defined as co-chaperones. In the T. melanosporum genome we found a single Tmelhsp90 and five co-chaperones (Tmelcdc37, Tmelsti1, Tmelcns1, Tmelwos2 and a not yet assigned gene). Members of the other families (Pam16, Bag, Fes, Hsp9/12, Hsp20, Grpe, Clp, Cpn10, ClpA/B, FKBP, Usp) were also identified in the T. melanosporum genome (table S19).

6.10. Aminoacyl-tRNA synthetases and translation factors

The main components of the translation machinery are the 80S ribosome, the enzymes that form activated amino acids (aminoacyl-tRNAs), known as “aminoacyl-tRNA synthetases”, and the translation factors. A conserved set of 77 ribosomal proteins (RPs), comprising 32 RPs plus the 18S rRNA associated with the small subunit and 45 RPs plus the 25S, 5.8S and 5S rRNA associated with the large subunit, is encoded by the genomes of all the ascomycetes (unicellular and multicellular) sequenced so far. Special attention was thus given to the annotation of the other two main components of the translation machinery, the “aminoacyl-tRNA synthetases” and the “translation factors”. The latter belong to a large and heterogeneous set of proteins assisting and orchestrating the three functional phases of translation: initiation, elongation and termination.

6.10.1. Aminoacyl-tRNA synthetases

Twenty cytosolic and eleven nuclear-encoded mitochondrial aminoacyl-tRNA synthetase (ARS) genes were identified by similarity with orthologous genes of known function from other fungi. T. melanosporum ARSs, named according to the standard nomenclature (http://www.genenames.org), are listed in table S20.
Paralogous genes with no assigned function or potential pseudogenes were found for leucyl-tRNA synthetase (GSTUMT00003081001, GSTUMT00000386001, GSTUMT00005548001, GSTUMT00009004001), lysyl-tRNA synthetase (GSTUMT00005082001), histidyl-tRNA synthetase (GSTUMT00009282001) and valyl-tRNA synthetase (GSTUMT00003370001). Ninety-seven percent of the annotated genes were covered by ESTs and 90% of them gave above-background hybridization signals in at least one of the three life-cycle stages that were subjected to transcriptome analysis (free-living mycelium, FLM; fruiting body, FB; and ectomycorrhiza, ECM) using the NimbleGen oligoarrays (section 8). The three ARS genes that appear not to be expressed under any of the presently examined life-cycle stages/growth conditions are the mitochondrial leucyl-tRNA synthetase, methionyl-tRNA synthetase, and lysyl-tRNA synthetase.

6.10.2. Translation factors

Forty-nine translation factor genes (37 initiation factors plus one putative polyA binding protein; eight elongation factors; and three termination factors) were identified in the T. melanosporum genome (table S21). Ninety-eight percent of the annotated genes were covered by ESTs and all of them gave above-background hybridization signals in at least one of the three life-cycle stages that were subjected to transcriptome analysis. When compared with a reference set of five sequenced ascomycetes (S. cerevisiae, N. crassa, M. grisea, B. cinerea and A. nidulans), seven of the annotated genes have no homolog in S. cerevisiae, whereas one of them, the paralog of the release factor 1 gene eRF1, was only present in T. melanosporum. The filamentous fungus-specific translation factors identified in the T. melanosporum genome include three components of initiation factor eIF3 (eIF3e, eIF3j, eIF3k) (table S22).
eIF3 is the largest eukaryotic initiation factor and serves as a scaffold for the interaction with other initiation factors as well as with other complexes and processes such as the COP9 signalosome, the 26S proteasome and nonsense-mediated mRNA decay. The most significant deviations in gene number are the presence in T. melanosporum of three copies of elongation factor eEF1a and the apparent absence of eEF1Bβ. As revealed by an extended sequence comparison (table S22), most T. melanosporum translation factors (28 out of 49) share the highest similarity with homologous components from Schizosaccharomyces pombe, while some of them are most similar to the corresponding animal (8 gene models) or plant (3 gene models) factors.

6.11. Carbohydrate Active enZymes (CAZymes)

Enzymes that cleave, build and rearrange oligo- and polysaccharides play a central role in the biology of saprotrophic, pathogenic and symbiotic fungi and are key to optimizing biomass degradation by these species. Given the relative importance of these protein families to the ecology of ectomycorrhizal fungi, we performed a detailed examination of the genes coding for carbohydrate active enzymes (CAZymes) in the T. melanosporum genome and compared them with the corresponding gene subsets from saprotrophic, pathogenic, and symbiotic fungi (tables S23 & S24). The search for catalytic modules specific to CAZymes [glycoside hydrolases (GH), glycosyltransferases (GT), polysaccharide lyases (PL) and carbohydrate esterases (CE)] and their ancillary carbohydrate-binding modules (CBMs) in T. melanosporum was performed exactly as for the daily updates of the Carbohydrate-Active enZymes (CAZy) database (76) (http://www.cazy.org). Each protein model was compared with a library of over 100,000 constitutive modules (catalytic modules, CBMs and other non-catalytic modules or domains of unknown function) using BLASTP. Models that returned an e-value passing the 0.1 threshold were automatically sorted and manually analyzed.
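The automatic sorting step described above can be illustrated with a minimal Python sketch. It is not the CAZy pipeline itself: it assumes BLASTP results in the standard 12-column tabular format (e-value in the eleventh column), it reads "passing the 0.1 threshold" as an e-value of at most 0.1, and the model and module names in the example are hypothetical.

```python
import csv
import io

# Threshold quoted in the text; "passing" is assumed here to mean e-value <= 0.1.
E_VALUE_CUTOFF = 0.1


def filter_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Keep BLASTP hits whose e-value is at most `cutoff`.

    `tabular_text` is BLAST tabular output (12 tab-separated columns:
    query, subject, %identity, length, mismatches, gap opens, q.start,
    q.end, s.start, s.end, e-value, bit score).
    Returns (query, subject, e-value) tuples sorted by query, then e-value.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue <= cutoff:
            hits.append((query, subject, evalue))
    return sorted(hits, key=lambda hit: (hit[0], hit[2]))


# Hypothetical hits, for illustration only (not data from this study).
example = (
    "model_A\tGH5_module\t45.0\t300\t150\t5\t1\t300\t1\t298\t1e-60\t220\n"
    "model_A\tCBM1_module\t30.0\t40\t28\t0\t5\t44\t1\t40\t0.5\t25\n"
    "model_B\tGT2_module\t28.0\t200\t140\t4\t10\t209\t3\t200\t0.05\t35\n"
)
retained = filter_hits(example)
```

A threshold this permissive keeps distant relatives in the candidate list, which is why the retained models were then inspected manually rather than annotated automatically.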
The presence of the catalytic machinery was verified for distant relatives whenever known in the family. The models that displayed significant similarities were retained for functional annotation and classified in the appropriate classes and families. A strong similarity to an enzyme with a characterized activity allows annotation as 'candidate activity'. Often, however, a safer prediction of substrate specificity calls for an annotation such as 'candidate α- or β-glycosidase', as the stereochemistry of the α- or β-glycosidic bond is more conserved than the nature of the sugar itself. Each protein model was compared to the manually curated CAZy database, and a functional annotation was assigned according to the relevance. All uncharacterized protein models were thus annotated as 'candidates' or 'related to' or 'distantly related to' their characterized match as a function of their similarity. The overall results of the annotation of the set of CAZymes from T. melanosporum were compared to the content and distribution of CAZymes in several fungal species (table S23) in order to identify singularities in the families' distributions. This allowed the identification of significant reductions of specific CAZyme families in T. melanosporum. As expected for a symbiotic fungus living in the root apoplast, T. melanosporum has few genes encoding glycoside hydrolases (GHs). With a total of 91 GH-encoding genes (table S23), it has far fewer GHs than the phytopathogens (e.g., M. grisea and F. graminearum) and the saprotrophs (e.g., N. crassa and P. anserina). This repertoire is even smaller than that of the symbiotic basidiomycete L. bicolor. Based on its CAZome (table S24), T. melanosporum has a limited ability to hydrolyze plant cell wall (PCW) polysaccharides. For instance, there is no GH5 cellulase appended to a cellulose-binding module (CBM1) and no cellulases from families GH6 and GH7 were found in the genome.
However, we have detected a few genes encoding enzymes acting on PCW:
– Cellulose degradation: Endoglucanases [GH5 (1 gene model); GH61 (1 to 4); CBM1 (1)];
– Hemicellulose degradation: Xylanase [GH10 (1 gene model)]; Xyloglucanase [GH12 (1)]; Arabinanase [GH43 (1)]; but no Galactanase [GH53 (0)];
– Pectin degradation: Pectinase [GH28 (2 gene models); GH78 (2 gene models); PL1 (2 gene models); PL4 (1 gene model); CE8 (1 gene model); CE12 (1 gene model)].

The single GH5 endoglucanase, together with the single secreted GH12 xyloglucan-specific endoglucanase, a pectin methylesterase, a secreted GH28 polygalacturonase and a rhamnogalacturonan acetylesterase, were amongst the most highly upregulated transcripts in ECM root tips, suggesting a role for these enzymes in PCW degradation and remodeling during host colonization (table S3 and fig. S21). With 103 glycosyltransferases (GT), T. melanosporum is close to the average amongst Sordariomycetes and Eurotiomycetes, suggesting that glycosyltransferases possess basal intracellular activities and that variations in composition may reflect species divergence rather than ecological niche pressure. The enzymes involved in plant polysaccharide depolymerization often carry a carbohydrate-binding module (CBM) appended to their catalytic domain. As expected, the T. melanosporum genome has the smallest number of CBM-containing proteins amongst the sequenced filamentous fungi, even lower than the ectomycorrhizal L. bicolor. The polysaccharide lyase gene set is also very small. Overall, the T. melanosporum genome encodes few enzymes involved in PCW depolymerization, but still encodes and expresses several degrading enzymes able to facilitate the progression of the hyphae in the pectin-rich middle lamella during the formation of the intraradicular Hartig net.

6.12. Secreted peptidases

The total number of secreted peptidases (49 members) identified in the T.
melanosporum genome using the MEROPS database (http://merops.sanger.ac.uk) is similar to that of other sequenced fungi (Fig. S22). However, the number of aspartyl proteases is much lower in T. melanosporum than in the other sequenced ascomycetes and the ectomycorrhizal L. bicolor. Several of these proteases may play a role in developmental processes as they are either up- or down-regulated in fruiting bodies and ectomycorrhizal root tips (data not shown). Interestingly, two putative aminopeptidases (M28A family), showing strong amino acid identity with leupeptin-inactivating enzyme (LIE), were strongly up-regulated in fruiting bodies and ectomycorrhizal root tips.

6.13. Membrane transporters

A process that is pivotal to the success of ectomycorrhizal associations is the exchange of nutrients between the symbiont and its host plant. The gene models coding for membrane transporters were identified and curated by using the Transport Classification Database (http://www.tcdb.org/). A comparison with other ascomycetes and basidiomycetes (table S25) revealed that the total number of predicted transporters in most T. melanosporum families is in the lower range of the values reported for Sordariomycetes and Eurotiomycetes. This is in contrast to the ectomycorrhizal L. bicolor, which displays an expansion of several transporter gene families. Several of the identified transporters, however, likely play an important role in the symbiosis metabolism as their transcripts are strikingly upregulated in the ectomycorrhizal root tips (table S3).

6.14. Other gene categories

A series of papers describing detailed analyses of the T. melanosporum gene categories and their expression will be published elsewhere.

7. Non-coding RNAs

7.1. Transfer RNAs (tRNA) gene abundance, anticodon/codon usage and translational selection

tRNA coding genes were searched with tRNAscan (77) and Pol3scan (78).
Their identity was further verified by homology searches conducted against a reference set of tRNA sequences in order to eliminate organellar and mispredicted genes. A total of 143 tRNA genes was thus identified, 65 of which contain introns. They correspond to 45 different anticodons (table S26), the maximum number expected for a non-redundant decoding system capable of decoding all the standard amino acids (79, 80); neither a selenocysteine tRNA nor any suppressor tRNA gene was found in the T. melanosporum genome. The anticodon repertoire in this genome is consistent with a ‘restricted’ use of wobbling (i.e., allowed anticodon:codon pairings: I/ANN:NNU,NNC; GNN:NNU,NNC; UNN:NNA; CNN:NNG). Many other filamentous ascomycetes share the same assortment of anticodons, suggesting that it was already present in the stem Pezizomycota ancestors. The number of tRNA genes in T. melanosporum is, however, at the lower end of the range found in Pezizomycota. It is about one third of the number found in N. crassa, and substantially less than that found in A. nidulans, M. grisea, and B. cinerea. The disparity between the tRNA gene repertoire of T. melanosporum and that of other fungi is even more pronounced when genome size is taken into account: for every Mb of DNA sequence there are on average 22 tRNA genes in S. cerevisiae, 10.3 in N. crassa, 6.2 in A. nidulans, 5.4 in M. grisea, 4.5 in B. cinerea, but only 1.1 in T. melanosporum (143 genes in ~125 Mb). Also peculiar is the codon usage in T. melanosporum, which is strikingly uniform, as revealed, for example, by relative synonymous codon usage (RSCU) values close to unity (table S27). When codon usage for ribosomal proteins, which are usually highly expressed and encoded by genes with a strong codon bias, is taken into account, a slight preference for some synonymous codons becomes apparent (table S28). These ‘preferred codons’ correspond to tRNA genes with high copy numbers (cf.
tables S26 & S27), thus suggesting a selection for optimal translation (78). However, translational selection appears to be extremely weak in T. melanosporum compared with other fungi.

7.2. Spliceosomal RNAs (snRNA)

Spliceosomal RNA gene prediction was performed with cmsearch of the INFERNAL package (81) using the relevant covariance model from Rfam (82). For each covariance model, the window size and trusted cut-off score indicated in the Rfam database were used. The T. melanosporum genome contains nine copies of U1 snRNA, 14 copies of U2 snRNA, eight copies of U4 snRNA and two copies of U6 snRNA. No U5 snRNA gene was found. Similarly, no U11, U12, U4at and U6at snRNA candidate genes were identified, in keeping with the inability to identify U12-type introns in this genome as in the genomes of all the other fungi analyzed so far.

7.3. Ribosomal RNAs (rRNA)

Due to their high repeat content, ribosomal DNA (rDNA) repeat regions typically do not get assembled into supercontigs in fungal genomes. Partial 18S, 5.8S and 25S sequences from fungi, retrieved from the NCBI, were used as initial queries with BLASTN (35) against the T. melanosporum genomic sequence. Sequences of the rDNA tandem repeat were found in several scaffolds, including: 23, 24, 297, 298, 354 and 355. The size of the T. melanosporum 18S-5.8S-26S ribosomal RNA (rRNA) repeat was estimated to be ~13.6 kbp.

8. Whole-genome exon oligoarray analyses

The T. melanosporum custom-exon expression array (4 x 72K) manufactured by Roche NimbleGen Systems Limited (Madison, WI) (http://www.nimblegen.com/products/exp/index.html) contained five independent, nonidentical, 60-mer probes per gene model coding sequence. Included in the oligoarray were 12232 annotated gene models, 3913 random 60-mer control probes and labelling controls. Sequences used for the oligonucleotide design were from an early draft of the gene catalog containing several TE families.
For 1876 gene models, technical duplicates were included on the array. Free-living mycelium of T. melanosporum Mel28 was grown on 1% malt agar (Cristomalt-D, Difal, Villefranche-sur-Saône, France) for either five weeks or four months before harvesting. Ectomycorrhizal root tips were sampled from five-month-old Common Hazel (Corylus avellana L.) plantlets inoculated with a mycelium slurry produced from a fruiting body harvested in Meuse (France) by Gérard Chevalier. Inoculated plants were grown in the AGRI-TRUFFE (Saint-Maixant, France) greenhouse. Fruiting bodies of T. melanosporum were collected below Common Hazel trees or oak trees at different locations [Auvergne, Meuse and Dordogne (France), and Piceno (Italy)]. Tissues were snap frozen in liquid nitrogen and RNA extraction was carried out using the RNeasy Plant Mini Kit including a DNase treatment (Qiagen, Cat No. 74904). RNA quality and integrity were checked prior to cDNA synthesis using the Bio-Rad Experion analyzer. Total RNA preparations (four biological replicates for ectomycorrhizas, five for fruiting bodies and seven for free-living mycelium) were amplified using the SMART PCR cDNA Synthesis Kit (Clontech) according to the manufacturer’s instructions. Single dye labeling of samples, hybridization procedures, data acquisition, background correction and normalization were performed at the NimbleGen facilities (NimbleGen Systems, Reykjavik, Iceland) following their standard protocol. Microarray probe intensities were quantile normalized across all chips. Average expression levels were calculated for each gene from the independent probes on the array and were used for further analysis. Raw array data were filtered for non-specific probes (a probe was considered as non-specific if it shared more than 90% homology with a gene model other than the gene model it was made for) and renormalized using the ARRAYSTAR software (DNASTAR, Inc., Madison, WI, USA). For 1015 gene models no reliable probe was left.
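The quantile normalization and per-gene averaging steps mentioned above can be sketched as follows. This is a generic illustration, not the NimbleGen or ARRAYSTAR implementation: the function names are hypothetical, the input is assumed to be a probes x chips intensity matrix, and tied intensities are ranked arbitrarily rather than averaged.

```python
import numpy as np


def quantile_normalize(intensities):
    """Quantile-normalize a (probes x chips) matrix.

    After normalization every chip (column) has the same distribution:
    the k-th smallest value on each chip is replaced by the mean of the
    k-th smallest values across all chips. Ties are not given special
    treatment in this sketch.
    """
    x = np.asarray(intensities, dtype=float)
    order = np.argsort(x, axis=0)                # sort order within each chip
    ranks = np.argsort(order, axis=0)            # rank of each probe on its chip
    reference = np.sort(x, axis=0).mean(axis=1)  # shared reference distribution
    return reference[ranks]


def gene_means(normalized, probe_gene_ids):
    """Average the probes of each gene model, separately for each chip."""
    ids = np.asarray(probe_gene_ids)
    return {gene: normalized[ids == gene].mean(axis=0) for gene in np.unique(ids)}


# Toy data: 4 probes x 3 chips, two hypothetical gene models with 2 probes each.
raw = np.array([[5.0, 4.0, 3.0],
                [2.0, 1.0, 4.0],
                [3.0, 6.0, 6.0],
                [4.0, 2.0, 8.0]])
normalized = quantile_normalize(raw)
per_gene = gene_means(normalized, ["g1", "g1", "g2", "g2"])
```

On the real array the per-gene average would be taken over the probes that survive the specificity filter (up to five per gene model), which is why gene models with no reliable probe left drop out of the analysis.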
A transcript was deemed expressed when its signal intensity was three-fold higher than the mean signal-to-noise threshold (cut-off value) of 3913 random oligonucleotide probes present on the array (50 to 100 arbitrary units). Gene models with an expression value higher than three-fold the cut-off level were considered as transcribed (table S2, Fig. S23). A Student t-test with false discovery rate (FDR) (Benjamini-Hochberg) multiple testing correction was applied to the data using the ARRAYSTAR software (DNASTAR). Transcripts with a significant p-value (<0.05) and more than a five-fold change in transcript level were considered as differentially expressed in ectomycorrhizal root tips or fruiting body. The complete expression dataset is available as series (accession number # 000) at the Gene Expression Omnibus at NCBI (http://www.ncbi.nlm.nih.gov/geo/).

9. References for Supplementary Materials

1. Mello A, Murat C, Bonfante P, FEMS Microbiol. Lett. 260, 1 (2006). 2. Murat C, Diez J, Luis P, Delaruelle C, Dupre C, Chevalier G, Bonfante P, Martin F, New Phytologist 164, 401 (2004). 3. Riccioni C, Belfiori B, Rubini A, Passeri V, Arcioni S, Paolocci F, New Phytol. 180, 466 (2008). 4. Poma A, Limongi T, Pacioni G, Appl. Microbiol. Biotech., 72, 437 (2006). 5. Hall IR, Yun W, Amicucci A, Tr. Biotech., 21, 433 (2003). 6. Pargney JC, Leduc JP, Bull. Soc. bot. Fr., 137, 21-34 (1990). 7. S Jeandroz, C Murat, W Yongjin, P Bonfante, F Le Tacon, J. Biogeography 35, 815 (2008). 8. Hibbett DS, Gilbert LB, Donoghue MJ, Nature 407, 506 (2000). 9.
Lutzoni F, Kauff F, Cox CJ, McLaughlin D, Celio G, Dentinger B, Padamsee M, Hibbett D, James TY, Baloch E, Grube M, Reeb V, Hofstetter V, Schoch C, Arnold AE, Miadlikowska J, Spatafora J, Johnson D, Hambleton S, Crockett M, Shoemaker R, Sung GH, Lücking R, Lumbsch T, O'Donnell K, Binder M, Diederich P, Ertz D, Gueidan C, Hansen K, Harris RC, Hosaka K, Lim YW, Matheny B, Nishida H, Pfister D, Rogers J, Rossman A, Schmitt I, Sipman H, Stone J, Sugiyama J, Yahr R, Vilgalys R, Am. J. Bot. 91, 1446 (2004). 11. B. A. LePage, R. S. Currah, R. A. Stockey, G. W. Rothwell, Am. J. Bot. 84, 410 (1997). 12. I. J. Alexander, New Phytol. 172, 589 (2006). 13. B. Moyersoen, New Phytol. 172, 753 (2006). 14. D. S. Hibbett, P. B. Matheny, BMC Biol. 7, 13 (2009). 15. JW Taylor, ML Berbee, Mycologia 98, 838 (2006). 16. D. B. Jaffe, J. Butler, S. Gnerre, E. Mauceli, K. Lindblad-Toh, J. P. Mesirov, M. C. Zody, & E. S. Lander, Genome Res. 13, 91 (2003). 17. A. Poma, G. Venora, M. Miranda, G. Pacioni, Caryologia 55, 307 (2002). 18. M. Lynch, J.S. Conery, Science 302, 1401 (2003). 19. H. Quesneville, C. M. Bergman, O. Andrieu, D. Autard, D. Nouaud, M. Ashburner, D. Anxolabehere D, PLoS Comput. Biol., 1, e22 (2005). 20. Z. Bao, S. R. Eddy, Genome Res. 12, 1269 (2002). 21. R. C. Edgar, E. W. Myers, Bioinformatics 21 Suppl 1, i152-158. 22. Mc Carthy E & Mc Donald JF, Bioinformatics 19, 362-367 (2003). 23. Ma J, Bennetzen J.L, Proc. Natl. Acad. Sci. USA 101: 12404 (2004). 24. Smit, AFA, Hubley, R & Green, P. RepeatMasker Open-3.0. 1996-2004 <http://www.repeatmasker.org>. 25. G. Benson, Nucl Ac. Res. 27(2), 573 (1999). 26. The UniProt Consortium, Nucl Ac. Res. 36:D190-D195(2008). 27. E. Birney, M. Clamp, R. Durbin, Genome Res. 14(5), 988 (2004). 28. R. Guigó, S. Knudsen, N. Drake & T. F. Smith, J. Mol. Biol. 226, 141 (1992). 29. I. Korf, BMC Bioinformatics 5, 59 (2004). 30. K. L. Howe, T. Chothia & R. Durbin, Genome Res. 12, 1418 (2002). 31. E.M. Zdobnov & R. 
Apweiler, Bioinformatics 17(9), 847 (2001). 32. The Gene Ontology Consortium, Nat. Genet. 25(1), 25 (2000). 33. R. L. Tatusov, N. D. Fedorova, J. D. Jackson, A. R. Jacobs, B. Kiryutin, E. V. Koonin, D. M. Krylov, R. Mazumder, S. L. Mekhedov, A. N. Nikolskaya, B. S. Rao, S. Smirnov, A. V. Sverdlov, S. Vasudevan, Y. I. Wolf, J. J. Yin, D. A. Natale, BMC Bioinformatics 4, 41 (2003) 34. M. Kanehisa, & S. Goto, Nucl Ac. Res. 28, 27 (2000) 35. S. F. Altschul, W. Gish, W. Miller, E. W. Myers & D. J. Lipman, J. Mol. Biol. 215, 403 (1990) 36. Wortman JR, Fedorova N, Crabtree J, Joardar V, Maiti R, et al., Med Mycol 44: S3 (2006). 37. Kasuga T, Mannhaupt G, Glass L, PLoS ONE 4, e5286 (2009). 38. P. P. Calabrese, S. Chakravarty, T. J. Vision, Bioinformatics 19, i74 (2003). 39. A. J. Enright , S. Van Dongen & C. A. Ouzounis CA, Nucl Ac. Res. 30(7), 1575 (2002) 40. T. De Bie, N. Cristianini, J. P. Demuth & M. W. Hahn, Bioinformatics 22(10), 1269 (2006) 41. R. D. Finn, J. Tate, J. Mistry, P. C. Coggill, S. J. Sammut, HR Hotz, G. Ceric, K. Forslund, S. R. Eddy, E. L. L. Sonnhammer & A. Bateman, Nucl Ac. Res. 36, D281 (2008) 42. C. Catalanotto, G. Azzalin, G. Macino, C. Cogoni, Genes Dev. 16, 790 (2002). 43. P. K. Shiu, N. B. Raju, D. Zickler, R. L. Metzenberg, Cell 107, 905 (2001). 44. R. A. Martienssen, M. Zaratiegui, D. B. Goto, Trends Genet. 21, 450 (2005). 45. C. Matranga, P. D. Zamore, Curr. Biol. 17, R790 (2007). 46. K. K. Adhvaryu, E. U. Selker, Genes Dev. 22, 3391 (2008). 47. F. Malagnac, B. Wendel, C. Goyon, G. Faugeron, D. Zickler, J.L. Rossignol, M. Noyer-Weidner, P. Vollmayr, T.A. Trautner, J. Walter, Cell 91, 281 (1997). 48. J. E. Galagan, E. U. Selker, Trends Genet. 20, 417 (2004). 49. J. A. Jeddeloh, T. L. Stokes, E. J. Richards, Nature Genet. 22, 94 (1999). 50. T. Kanno, M. F. Mette, D. P. Kreil, W. Aufsatz, M. Matzke, Curr Biol. 14, 801 (2004). 51. A.V. Gendrel, Z. Lippman, C. Yordan, V. Colot, R. A. Martienssen, Science 297, 1871 (2002). 52. S. Zeppa, A. M. 
Gioacchini, C. Guidi, M. Guescini, R. Pierleoni, A. Zambonelli, V. Stocchi, Rapid Commun. Mass Spectrom. 18, 199 (2004). 53. R. Splivallo, S. Bossi, M. Maffei, P. Bonfante, Phytochemistry 68, 2584 (2007). 54. A. M. Gioacchini, M. Menotta, M. Guescini, R. Saltarelli, P. Ceccaroli, A. Amicucci, E. Barbieri, G. Giomaro, V. Stocchi, Rapid Commun Mass Spectrom. 22, 3147 (2008). 55. S. Gabella, S. Abbà, S. Duplessis, B. Montanini, F. Martin, P. Bonfante, Eukaryot Cell 4, 1599 (2005). 56. L. A. Hazelwood, J. M. Daran, A. J. A. van Maris, J. T. Pronk, R. Dickinson, Appl. Env. Microbiol. 74, 2259 (2008). 57. P. Buzzini, C. Gasparetti, B. Turchetti, M. R. Cramarossa, A. Vaughan-Martini, A. Martini, U. M. Pagnoni, L. Forti, Arch. Microbiol. 184, 187 (2005). 58. F. Pelusio, T. Nillsson, L. Montanarella, R. Tilio, B. Larsen, S. Facchetti, J. Ø. Madsen, J. Agric. Food Chem. 34, 2138 (1995). 59. V. P. Kurup, H. D. Shen, H. Vijay, Int. Arch. Allergy Immunol. 129, 181 (2002). 60. N. P. Keller, G. Turner, J. W. Bennett, Nat. Rev. Microbiol. 3, 937 (2005). 61. P. Bowyer, M. Fraczek, W. Denning, BMC Genomics 7, 251 (2006). 62. D. Bhatnagar, K. C. Ehrlich, T. E. Cleveland, Appl. Microbiol. Biotechnol. 61, 83 (2003). 63. D. M. Gardiner, B. J. Howlett, FEMS Microbiol. Lett. 248, 241 (2005). 64. M. Kimura, T. Tokai, N. Takahashi-Ando, S. Ohsato, M. Fujimura, Biosci. Biotechnol. Biochem. 71, 2105 (2007). 65. R.H. Proctor, M. Busman, J. A. Seo, Y. W. Lee, R. D. Plattner, Fungal Genet. Biol. 45, 1016 (2008). 66. M. Liu, A. Nauta, C. Francke, R. J. Siezen, Appl. Environ. Microbiol. 74, 4590 (2008). 67. J. Hansen, Appl. Environ. Microbiol. 65, 3915 (1999). 68. C. Talora, L. Franchi, H. Linden, P. Ballario, G. Macino, EMBO J. 18, 4961 (1999). 69. J. C. Dunlap, J. J. Loros, Curr. Opin. Microbiol. 9, 579 (2006). 70. R. Ambra, B. Grimaldi, S. Zamboni, P. Filetici, G. Macino, P. Ballario, Fungal Genet. Biol. 41, 688 (2004). 71. A. C. Froehlich, B. Noh, R. D. Vierstra, J.
Loros, J. C. Dunlap, Eukaryot. Cell 4, 2140 (2005). 72. J. Purschwitz, S. Müller, C. Kastner, M. Schöser, H. Haas, E. A. Espeso, A. Atoui, A. M. Calvo, R. Fischer, Curr. Biol. 18, 255 (2008). 73. J. A. Bieszke, E. L. Braun, L. E. Bean, S. Kang, D. O. Natvig, K. A. Borkovich, Proc. Natl. Acad. Sci. (U.S.A.) 96, 8034 (1999). 74. R. Percudani, A. Trevisi, A. Zambonelli, S. Ottonello, Mol. Phylogenet. Evol. 13, 169 (1999). 75. J. D. Bendtsen, H. Nielsen, G. von Heijne, S. Brunak, J Mol Biol. 340, 783 (2004). 76. B. L. Cantarel, P. M. Coutinho, C. Rancurel, T. Bernard, V. Lombard, B. Henrissat, Nucleic Acids Res. 37, D233 (2009). 77. T. M. Lowe, S. R. Eddy, Nucleic Acids Res. 25, 955 (1997). 78. R. Percudani, A. Pavesi, S. Ottonello, J. Mol. Biol. 268, 322 (1997). 79. C. Marck, H. Grosjean, RNA 8, 1189 (2002). 80. R. Percudani, Tr Genet. 17, 133 (2002). 81. Eddy S. R., BMC Bioinformatics, 3:18 (2002). 82. Griffiths-Jones S., Annu. Rev. Genom. Hum. Genet. 8:279–298 (2007). 83. Aguileta G., Marthey S., Chiapello H., Lebrun M.-H., Rodolphe F., Fournier E., Gendrault-Jacquemard A., Giraud T., Syst. Biol., 57:1 (2008). 84. Ronquist F., Huelsenbeck J. P., Bioinformatics 19:1572 (2003). 85. Wicker T., Sabot F., Hua-Van A., Bennetzen J. L., Capy P., Chalhoub B., Flavell A., Leroy L., Morgante M., Panaud O., Paux E., SanMiguel Ph., Schulman A.H., Nature Rev. Genet., 8, 973 (2007).

Supplementary Tables

Table S1. The main features of the T. melanosporum nuclear genome.
Genome features                            Values
Size                                       124946 kb
Chromosomes                                ≥8
GC percentage (total genome)               52.02
GC percentage in coding sequences          55.87
GC percentage in non-coding regions        48.82
tRNA genes                                 143
rDNA repeat number                         5 in the assembly
Consensus rDNA repeat size                 ~13.6 kb
5S rRNAs                                   >7 in the assembly
snRNA genes                                33
Percent transposable elements              58
Protein coding genes (CDSs)                7496
Percent coding                             18
Average CDS size (bp)                      2073 bp
Average codon exon size (bp)               340
Average number of coding exons per gene    4.51
Average codon intron size (bp)             107
Average codon peptide size (aa)            439

Table S2. Validation of predicted gene models of T. melanosporum based on the NimbleGen exon oligoarray.

Samples                 Number of gene models with expression value > cut-off (%)
Free-living mycelium    8204 (96.9 %)
Ectomycorrhiza          8137 (96.1 %)
Fruiting body           8191 (96.8 %)
Total                   8329 (98.4 %)

Values represent the proportion of genes expressed above the background control threshold. A transcript was deemed expressed when its signal intensity was three-fold higher than the mean background expression value (cut-off value) of 3913 random oligos present on the array (see section 8 for details).

Table S3. The most highly upregulated transcripts in T. melanosporum/Corylus avellana ectomycorrhizal root tips compared to free-living mycelium and fruiting body.
SEQ_ID             ECM level  FB level  FLM level  ECM/FLM ratio  FB/FLM ratio  Definition                                       Size  Location  TMD
GSTUMT00012772001  13262      731       1          13262          731           H-type lectin                                    279   _         0
GSTUMT00012792001  11423      250       1          11423          250           Fasciclin-like arabinogalactan protein           414   S         0
GSTUMT00012437001  19542      180       2          9796           90            Lipase/esterase                                  346   S         0
GSTUMT00009894001  18205      2         2          7721           1             Cytochrome P450                                  396   M         0
GSTUMT00008973001  10866      2         2          6213           1             Endoglucanase GH5                                342   _         0
GSTUMT00008992001  19305      9         4          4538           2             Laccase                                          586   S         0
GSTUMT00006890001  5436       3787      2          2636           1836          Sporulation-induced protein                      481   _         0
GSTUMT00010076001  2063       1         1          2063           1             Tyrosinase                                       604   _         0
GSTUMT00003538001  3588       2818      2          2050           1610          FAD oxidoreductase                               564   _         0
GSTUMT00012780001  6542       44        4          1745           12            LysM domain protein                              87    S         0
GSTUMT00009016001  6737       3257      5          1432           692           DUF1479-domain protein                           379   _         0
GSTUMT00005760001  12966      989       9          1401           107           Major facilitator superfamily (MFS) permease*    142   S         2
GSTUMT00007927001  1073       593       1          1073           593           Hypothetical protein                             306   _         0
GSTUMT00008954001  19493      35        18         1066           2             Major facilitator superfamily (MFS) permease     496   _         10
GSTUMT00000763001  902        2         1          902            2             Cytochrome P450                                  397   M         0
GSTUMT00002130001  1534       1         2          833            1             Beta-glucan synthesis-associated protein (SKN1)  575   M         1
GSTUMT00006579001  14241      8701      22         662            404           DUF1479-domain protein                           426   M         0
GSTUMT00010279001  7848       1926      12         651            160           DUF2235-domain protein                           403   _         0
GSTUMT00000499001  49524      634       77         644            8             Phosphatidylserine decarboxylase                 316   _         0
GSTUMT00012667001  1151       7         2          575            3             Hypothetical protein                             883   _         2
GSTUMT00009500001  2529       2846      6          403            453           Tuber-specific protein                           86    M         0

Transcript profiling was performed on free-living mycelium, fruiting bodies and ectomycorrhizal root tips. Values are the means of seven, five and four biological replicates, respectively. Based on the statistical analysis, a gene was considered significantly upregulated if it met both criteria: (i) t-test P-value < 0.05 (ArrayStar, DNASTAR); (ii) mycorrhiza vs.
free-living mycelium fold change ≥ 4; 571 genes (7.6% of the total gene repertoire) showed upregulated expression. Before a transcript was declared present, the signal-to-noise threshold (signal background) was calculated based on the mean intensity of 3,913 random probes present on the microarray. Cut-off values for signal intensity (50 to 100 arbitrary units), corresponding to three times the background values estimated from random 60-mer probes on the NimbleGen oligoarrays, were then subtracted from the normalized intensity values. The highest signal intensity values observed on these arrays were ~65,189 arbitrary units. Signals below the cut-off values were assigned a signal intensity value of 1. Abbreviations: FB, fruiting body; FLM, free-living mycelium; ECM, ectomycorrhizal root tips; S, secreted; M, mitochondrial; TMD, transmembrane domain. * truncated sequence.

Table S4. The most highly upregulated transcripts in T. melanosporum fruiting body compared to free-living mycelium and T. melanosporum/C. avellana ectomycorrhizal root tips.
SEQ_ID ECM level FB level GSTUMT00001784001 4 41047 GSTUMT00009814001 1 4478 GSTUMT00009616001 11 10491 GSTUMT00001879001 39 3692 GSTUMT00006890001 5436 3787 GSTUMT00003538001 3588 2818 GSTUMT00010017001 17 1769 GSTUMT00001878001 1 14686 GSTUMT00006097001 105 13736 GSTUMT00012772001 13262 731 GSTUMT00002786001 5 978 GSTUMT00009016001 6737 3257 GSTUMT00009465001 18 6646 GSTUMT00003182001 17 1260 GSTUMT00007927001 1073 593 GSTUMT00002874001 1443 15022 GSTUMT00002703001 24 11228 GSTUMT00008314001 154 626 GSTUMT00009500001 2529 2846 GSTUMT00012516001 29 787 GSTUMT00004141001 2669 13898 GSTUMT00005006001 11706 18751 GSTUMT00006579001 14241 8701 GSTUMT00010468001 10 1363 GSTUMT00012378001 10 4272 GSTUMT00003188001 282 4551 GSTUMT00012817001 64 3666 GSTUMT00009177001 1 342 GSTUMT00005385001 860 1964 FLM FB/FLM Definition level ratio 3 1 4 2 2 2 2 14 17 1 1 5 10 2 1 30 23 1 6 2 33 44 22 4 12 12 11 1 6 12097 4478 2377 2303 1836 1610 1091 1026 820 731 708 692 666 619 593 500 481 454 453 443 427 423 404 388 368 366 345 342 315 Tuber-specific protein GAL4-like DNA-binding domain protein integral membrane protein Pth11-like Tuber-specific protein DUF2235 Sporulation associated protein FAD binding domain Atrophin-1 family Hypothetical protein Tuber-specific small secreted protein H-type lectin O-glycosylated cell wall protein DUF1479-domain protein WW domain containing protein Flavin containing amine oxidoreductase Hypothetical protein Lipase Hypothetical protein Hypothetical protein Tuber-specific mitochondrial protein O-methyltransferase GMC oxidoreductase Tuber-specific secreted protein DUF1479-domain protein Hypothetical protein Tuber-specific small secreted protein Hypothetical protein FAD binding domain Tuber-specific mitochondrial protein Tuber-specific protein Size Location TMD 332 585 253 71 481 564 767 389 131 279 161 379 259 439 306 336 62 133 86 333 607 180 426 890 216 336 510 598 206 _ _ S _ _ _ _ S S _ S _ _ _ _ S _ _ M _ _ S M _ S S S M _ 0 0 3 0 0 0 0 1 0 0 0 0 0 
1 0 0 0 0 0 0 0 3 0 0 0 7 1 0 0

Transcript profiling was performed on free-living mycelium, fruiting bodies and ectomycorrhizal root tips. Values are the means of seven, five and four biological replicates, respectively. Based on the statistical analysis, a gene was considered significantly upregulated if it met both criteria: (i) t-test P-value < 0.05 (ArrayStar, DNASTAR); and (ii) fruiting bodies vs. free-living mycelium fold change ≥ 4. Before a transcript was declared present, the signal-to-noise threshold (signal background) was calculated based on the mean intensity of 3,913 random probes present on the microarray. Cut-off values for signal intensity (50 to 100 arbitrary units), corresponding to three times the background values estimated from random 60-mer probes on the NimbleGen oligoarrays, were then subtracted from the normalized intensity values. The highest signal intensity values observed on these arrays were ~65,189 arbitrary units. Signals below the cut-off values were assigned a signal intensity value of 1. Abbreviations: FB, fruiting body; FLM, free-living mycelium; ECM, ectomycorrhizal root tips; S, secreted; M, mitochondrial; TMD, transmembrane domain.

Table S5. Tissue-specific transcripts in T. melanosporum. A transcript was considered tissue-specific when it was not detectable in the two other tissues or when its level in this tissue was at least 100-fold higher than in the two other tissues. Ratios between 100 and 1000 are coloured in light pink, ratios higher than 1000 in dark pink.
ECM level FB level FLM level 2063 375 19542 18205 10866 19305 6542 19493 902 1534 nd nd 180 2.1 1.8 9 44 35 1.6 nd nd nd 2.0 2.4 1.7 4 4 18 nd 1.8 GSTUMT00001976001 659 GSTUMT00012667001 1151 GSTUMT00008934001 385 GSTUMT00012588001 6518 GSTUMT00009298001 6905 GSTUMT00001432001 13664 GSTUMT00006630001 18170 GSTUMT00003580001 5014 GSTUMT00008388001 3047 GSTUMT00011889001 834 GSTUMT00010195001 6221 2 7 2 4 11 nd 3 nd nd 6 nd 342 148 41047 4478 10491 1769 14686 13736 978 6646 11228 1363 4272 4907 3776 524 2005 404 1962 34625 Tissue SEQ_ID Definition Size Location TMD 604 104 346 396 342 586 87 496 397 575 S S M _ S S _ M M 0 0 0 0 0 0 0 10 0 1 nd 2 nd 29 31 63 110 33 28 8 63 Tyrosinase Hypothetical protein Ab hydrolase_3 Cytochrome P450 Cellulase Multicopper oxidase Hypothetical protein Major Facilitator Superfamily Cytochrome P450 Beta-glucan synthesis-associated protein Endonuclease/Exonuclease/phosphatase family Hypothetical protein Tuber-specific protein Hypothetical protein Glycosyl hydrolase family 12 Major Facilitator Superfamily Class II Aldolase Phospholipase A2 Hypothetical protein Hypothetical protein Sugar (and other) transporter 461 883 120 353 241 435 283 216 667 810 452 S _ _ S S _ _ S _ S _ 0 2 0 6 0 10 0 0 3 0 9 nd nd 3.4 nd 4.4 1.6 14 17 1.4 10 23 3.5 12 17 16 2,4 9 2 11 219 Tuber-specific protein DEAD/DEAH box helicase Tuber-specific protein GAL4-like DNA-binding domain protein Hypothetical protein Atrophin-1 family Hypothetical protein Tuber-specific protein O-glycosylated cell wall protein WW domain containing protein Hypothetical protein Hypothetical protein Hypothetical protein Tuber-specific protein Tuber-specific protein Hypothetical protein Glycolipid anchored surface protein Tuber-specific protein DEAD/DEAH box helicase Tuber-specific protein 598 476 332 585 253 767 389 131 161 259 62 890 216 229 339 302 446 91 1535 193 M _ _ _ S _ S S S _ _ _ S _ S _ S _ _ S 0 0 0 0 3 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 ECM GSTUMT00010076001 GSTUMT00004343001 
GSTUMT00012437001 GSTUMT00009894001 GSTUMT00008973001 GSTUMT00008992001 GSTUMT00012780001 GSTUMT00008954001 GSTUMT00000763001 GSTUMT00002130001 FB GSTUMT00009177001 GSTUMT00008309001 GSTUMT00001784001 GSTUMT00009814001 GSTUMT00009616001 GSTUMT00010017001 GSTUMT00001878001 GSTUMT00006097001 GSTUMT00002786001 GSTUMT00009465001 GSTUMT00002703001 GSTUMT00010468001 GSTUMT00012378001 GSTUMT00011444001 GSTUMT00003743001 GSTUMT00003678001 GSTUMT00002553001 GSTUMT00008040001 GSTUMT00003621001 GSTUMT00007269001 nd nd 4 1,3 11 17 nd 105 4,6 18 24 10 9.8 9.1 1.9 nd 7.6 1 3.2 169 29 SUPPLEMENTARY INFORMATION GSTUMT00011091001 GSTUMT00003226001 GSTUMT00006318001 nd nd 3.2 15694 524 3569 106 3.7 26 GSTUMT00005042001 GSTUMT00010990001 GSTUMT00002278001 GSTUMT00012523001 GSTUMT00002331001 GSTUMT00008156001 GSTUMT00011724001 GSTUMT00007237001 GSTUMT00008657001 GSTUMT00005026001 GSTUMT00006197001 GSTUMT00012511001 GSTUMT00002274001 GSTUMT00011356001 GSTUMT00008966001 GSTUMT00006468001 GSTUMT00005001001 GSTUMT00002323001 GSTUMT00009209001 GSTUMT00009688001 nd nd nd nd 4.4 14.8 3 nd 154 16 6,0 4 64,5 1,8 2.8 4,9 134 3 2.4 4.6 nd nd 2.8 4,0 nd 13.6 10 1.8 58 nd 10.6 3.4 60,5 nd 5.3 1,4 169 2.2 nd 5.4 189 105 1521 891 2469 7595 1294 271 39669 3733 1366 874 13377 356 542 837 18248 386 323 540 Tuber-specific protein Tuber-specific protein Tuber-specific protein 68 78 200 M _ _ 1 0 0 Tuber-specific protein Tuber-specific protein Tuber-specific protein Hypothetical protein Hypothetical protein Tuber-specific protein Tuber-specific protein Tuber-specific protein peptidyl-prolyl cis-trans isomerase Tuber-specific protein Hypothetical protein Zn(2)-Cys(6) cluster domain Tuber-specific protein RNase3 domain Major Facilitator Superfamily Sugar transporter Major Facilitator Superfamily Tuber-specific protein Hypothetical protein Tuber-specific protein 63 64 217 103 494 91 77 197 168 623 480 685 74 1402 514 407 565 225 331 140 M S M S _ _ _ M _ M _ _ M _ M _ _ _ _ M 0 0 1 1 0 0 0 1 0 0 1 0 0 0 13 7 
14 0 0 0 FLM

Table S6. The top 10 most abundant (Tribe-MCL) protein families (excluding TE-related families) in the T. melanosporum genome.

Family #  PFAM description                     Tuber  Neurospora  Botrytis  Nectria  Magnaporthe  Stagonospora  Sclerotinia  Aspergillus  Laccaria  Total
5         NB-ARC domain                           46           3         9       11            4             5            5            5       205    293
7         Protein kinase domain                   34          26        30       31           30            32           25           30        48    286
15        Helicase domain                         22          23        26       23           23            23           23           22        28    213
22        AAA-ATPase family                       22          16        19       18           18            16           16           16        16    157
2         WD40, WD domain                         21          19        25       31           18            21           35           26       143    339
10        Short chain dehydrogenase               17          17        31       64           24            33           26           30        12    254
17        Methyltransferase domain                17          25        12       89           10            15           10            7         2    187
0         MFS1, Major Facilitator Superfamily     15          26        36      158           52            82           29           51        18    467
27        SNF2 family N-terminal domain           14          13        15       13           13            18           15           15        15    131
31        Ubiquitin-conjugating enzyme            14          12        13       14           13            13           13           14        12    118

Table S7. The protein families showing the highest rate of contraction in the T. melanosporum genome.
N° members Family ID TUBME NECHA NEUCR MAGGR BOTCI SCLSC STANO ASPFU PFAM accession PFAM description 0 15 158 26 52 36 29 82 51 PF07690, PF00083 1 12 79 15 49 55 34 63 27 PF00067, Major Facilitator Superfamily (MFS), Sugar (and other) transporter Cytochrome P450 2 21 31 19 18 25 35 21 26 PF04047, PF08625, 3 9 72 22 27 45 34 48 49 4 7 74 16 51 28 25 76 33 PF07690, PF00083, PF00854 PF05730 6 0 150 29 10 28 16 56 1 8 12 51 15 16 47 35 43 44 10 17 64 17 24 31 26 33 30 12 7 46 15 24 43 23 37 27 14 0 90 12 13 17 9 71 18 3 44 8 19 35 14 19 8 58 12 16 12 20 3 14 7 38 31 21 6 46 10 17 23 0 10 0 6 Periodic tryptophan protein 2 WD repeat, WD40 associated domain, PGAP1-like protein MFS, Sugar (and other) transporter, POT family CFEM domain PF06985, PF00106, PF00596 PF00083, PF07690 Heterokaryon incompatibility protein (HET), short chain dehydrogenase, Sugar (and other) transporter, MFS NAD dependent epimerase/ dehydratase family, 3-beta hydroxysteroid dehydrogenase/isomerase family MFS, Sugar (and other) transporter, 
Beta-ketoacyl synthase 3 PF08659, PF01370, PF01073 PF07690, PF00083, PF02801 PF06985, PF00023 30 14 PF00135, PF02734 Carboxylesterase, DAK2 domain 12 21 22 PF00171, PF05893 Aldehyde dehydrogenase family, Acyl-CoA reductase 22 33 20 19 17 15 17 PF00698, PF08659, PF00975, PF08242 PF00324 Acyl transferase domain, KR domain, Thioesterase domain, Methyltransferase domain Amino acid permease 11 4 14 4 PF00069, PF00023 Protein kinase domain, Ankyrin repeat HET, Ankyrin repeat

The table lists 20 TRIBE-MCL families that have contracted in the T. melanosporum lineage (CAFE analysis, P<0.001) (Fig. S11B). Annotations are based on searches of T. melanosporum protein sequences against the PFAM database. Abbreviations: TUBME, T. melanosporum; NECHA, Nectria haematococca; NEUCR, N. crassa; MAGGR, M. grisea; BOTCI, B. cinerea; SCLSC, Sclerotinia sclerotiorum; STANO, Stagonospora nodorum; and ASPFU, A. fumigatus. Further information is found in SOM text S5.3.

Table S8. Protein families showing the highest rate of expansion in the T. melanosporum genome.
           N° members
Family ID  TUBME  NECHA  NEUCR  MAGGR  BOTCI  SCLSC  STANO  ASPFU  Pfam accession                      Pfam description
5             46     11      3      4      9      5      5      5  PF07721, PF00931, PF07819           Tetratricopeptide repeat (TPR), NB-ARC domain, PGAP1-like protein
7             34     31     26     30     30     25     32     30  PF00069, PF00023, PF08587, PF07714  Protein kinase domain, Ankyrin repeat, Ubiquitin associated domain (UBA), Protein tyrosine kinase
17            17     89     25     10     12     10     15      7  PF08242, PF08241                    Methyltransferase domain
22            22     18     16     18     19     16     16     16  PF01078, PF07726                    Magnesium chelatase, subunit ChlI, ATPase family (AAA)
42            13     10     10     11     11     11     10     10  PF08477, PF00071                    Miro-like protein, Ras family
57            10      8      8      8      8      8      8      8  PF00118                             TCP-1/cpn60 chaperonin family
72             9      6      6      7      7      6      8      6  PF00012, PF00096                    Hsp70 protein, Zinc finger, C2H2 type
96             7      7      5      4      4      4      4      4  PF00240, PF01599, PF01020           Ubiquitin family, Ribosomal protein S27a, Ribosomal L40e family
116            5      4      4      4      5      4      6      5  PF00160, PF00515, PF07719           Cyclophilin type peptidyl-prolyl cis-trans isomerase/CLD, TPR
135            5      9      3      6      2      2      7      3  PF00082                             Subtilase family
149            8      3      4      2      2      2      3      2  PF00069, PF07714                    Protein kinase domain, Protein tyrosine kinase
164            9      5      4      5      2      2      3      3  NO PFAM                             -
169           10      6      2      2      0      0      0      0  PF01926                             GTPase of unknown function

The table lists 20 TRIBE-MCL families that have expanded in the T. melanosporum lineage (CAFE analysis, P<0.001) (Fig. S11B). Annotations are based on searches of T. melanosporum protein sequences against the PFAM database. Abbreviations: TUBME, T. melanosporum; NECHA, Nectria haematococca; NEUCR, N. crassa; MAGGR, M. grisea; BOTCI, B. cinerea; SCLSC, Sclerotinia sclerotiorum; STANO, Stagonospora nodorum; and ASPFU, A. fumigatus. Further information is found in SOM text S5.3.

Table S9. Gene families unique to T. melanosporum
Tuber N° family ID PFAM # PFAM T. 
melanosporum gene model ID description 4601 2 - GSTUMT00010765001,GSTUMT00010766001 5042 2 - 5046 6 - GSTUMT00001741001,GSTUMT00007092001 GSTUMT00005968001,GSTUMT00005970001,GSTUMT00005973001,GSTUMT00005978001, GSTUMT00005985001,GSTUMT00005986001 5932 4 - 5936 4 PF0785 5938 4 - GSTUMT00003636001,GSTUMT00009890001,GSTUMT00009892001,GSTUMT00012733001 6709 3 - GSTUMT00005969001,GSTUMT00005976001,GSTUMT00005979001 6712 3 PF01185 Hydrophobin GSTUMT00006864001,GSTUMT00012443001,GSTUMT00012444001 6713 3 PF0572 GSTUMT00007355001,GSTUMT00012494001,GSTUMT00012778001 6717 3 - 6726 3 PF0771 6728 3 PF00096 zf-C2H2 9370 2 - 9373 2 PF0598 9374 2 - GSTUMT00000524001,GSTUMT00002591001 9375 2 - GSTUMT00000531001,GSTUMT00012382001 9377 2 PF01636 APH GSTUMT00000735001,GSTUMT00012643001 9384 2 PF1064 GSTUMT00001850001,GSTUMT00008931001 9385 2 - GSTUMT00001883001,GSTUMT00002072001 9386 2 - GSTUMT00001953001,GSTUMT00011182001 9388 2 PF00226 DnaJ GSTUMT00001985001,GSTUMT00006727001 9389 2 PF0857 GSTUMT00002245001,GSTUMT00006509001 9390 2 - GSTUMT00002313001,GSTUMT00005467001 9391 2 - GSTUMT00002606001,GSTUMT00002607001 9394 2 - 9395 2 PF0407 9396 2 - 9397 2 - 9398 2 PF0950 9401 2 - GSTUMT00003419001,GSTUMT00010002001 9402 2 - GSTUMT00003421001,GSTUMT00010000001 9403 2 - GSTUMT00003540001,GSTUMT00009531001 9406 2 - GSTUMT00003946001,GSTUMT00006860001 9409 2 - 9411 2 PF0056 ZZ GSTUMT00004719001,GSTUMT00006855001 9413 2 PF0347 MOSC_N GSTUMT00004993001,GSTUMT00004995001 9414 2 - GSTUMT00005128001,GSTUMT00005129001 9416 2 - GSTUMT00005332001,GSTUMT00007336001 9417 2 - GSTUMT00005622001,GSTUMT00012378001 9418 2 - GSTUMT00005917001,GSTUMT00012308001 9419 2 - GSTUMT00006042001,GSTUMT00012706001 9421 2 - GSTUMT00006905001,GSTUMT00010024001 9422 2 - GSTUMT00006917001,GSTUMT00007178001 9423 2 - GSTUMT00007121001,GSTUMT00008465001 9424 2 - GSTUMT00007558001,GSTUMT00007584001 9425 2 - GSTUMT00007581001,GSTUMT00007582001 9431 2 - GSTUMT00008393001,GSTUMT00012669001 9432 2 - 
GSTUMT00008439001,GSTUMT00008440001 9433 2 - GSTUMT00008916001,GSTUMT00009351001 9435 2 - GSTUMT00010064001,GSTUMT00012770001 9439 2 - GSTUMT00010824001,GSTUMT00012767001 9443 2 - GSTUMT00012279001,GSTUMT00012788001 GSTUMT00000178001,GSTUMT00000180001,GSTUMT00002847001,GSTUMT00012366001 Abhydrolase_3 GSTUMT00003552001,GSTUMT00012317001,GSTUMT00012437001,GSTUMT00012479001 TPMT GSTUMT00008303001,GSTUMT00008304001,GSTUMT00012650001 Pkinase_Tyr GSTUMT00012343001,GSTUMT00012419001,GSTUMT00012651001 GSTUMT00012427001,GSTUMT00012671001,GSTUMT00012761001 GSTUMT00000201001,GSTUMT00012769001 MED7 Carb_bind SAE2 GSTUMT00000345001,GSTUMT00000613001 GSTUMT00002818001,GSTUMT00009789001 YbaK GSTUMT00002819001,GSTUMT00009788001 GSTUMT00002854001,GSTUMT00002861001 GSTUMT00002932001,GSTUMT00009503001 CDC27 GSTUMT00002952001,GSTUMT00009250001 GSTUMT00004570001,GSTUMT00005977001 9445 2 - GSTUMT00012417001,GSTUMT00012654001 9446 2 - GSTUMT00012421001,GSTUMT00012653001 9447 2 PF01636 APH GSTUMT00012446001,GSTUMT00012447001 9448 2 PF00400 WD40 GSTUMT00012471001,GSTUMT00012763001 9449 2 - GSTUMT00012517001,GSTUMT00012678001 9450 2 - GSTUMT00012693001,GSTUMT00012694001

The table lists families that are unique to the T. melanosporum lineage (CAFE analysis, P<0.001). Annotations are based on searches of T. melanosporum protein sequences against the PFAM database. Further information is found in SOM text S5.3.

Table S10. Genes unique to the symbiotic fungi T. melanosporum and Laccaria bicolor
N° N° PFAM Tuber Laccaria ID PFAM description T. 
melanosporum Genoscope gene model ID 1 12 - - GSTUMT00012275001 2 5 - - GSTUMT00010765001, GSTUMT00010766001 1 2 6 4 PF01026 TatD_DNase - GSTUMT00012621001 GSTUMT00001741001, GSTUMT00007092001 1 1 1 1 1 1 1 1 1 1 5 2 1 1 1 1 1 1 1 1 PF01031 PF0218 PF0419 PF0179 PF1020 PF0384 PF0046 Dynamin_M YDG_SRA PQ-loop DeoC GSTUMT00003569001 GSTUMT00003278001 GSTUMT00000062001 GSTUMT00000719001 GSTUMT00001031001 GSTUMT00001736001 DUF2340 GSTUMT00001979001 GSTUMT00002691001 TFIID_20kDa GSTUMT00011058001 Ribosomal_L34 GSTUMT00011976001 L. bicolor JGI gene model ID 144541, 293566, 295948, 304863, 316954, 317895, 317896, 317897, 318946, 325570, 325617, 326160 295937, 320976, 321501, 321502, 333496 295047, 299432, 30237, 303664, 328304, 328338 313830, 317358, 325447, 326780 298267, 316255, 316256, 317888, 334853 153937, 30226 161810 294524 296435 149239 244798 305749 165152 150208

The table lists gene models that are unique to the T. melanosporum and L. bicolor lineages. Annotations are based on searches of protein sequences from T. melanosporum and L. bicolor (http://genome.jgi-psf.org/Lacbi1/Lacbi1.home.html) against the PFAM database.

Table S11. Genes implicated in sexual reproduction in the T. melanosporum genome.
Gene Mating processes MAT1 (matB) MAT2 (matA) Sc MFAL1, MFAL2 (An ppgA ) Sc MFA1, MFA2 (An ppgB) Sc KEX1 Sc KEX2 (An KexB) Sc STE13 Sc STE23 Sc RCE1 Sc STE24 Sc RAM1/STE16 Sc RAM2 Sc STE14/Sp mam4 Sc STE6/Sp mam1(An atrD) Sp Ste11 Sc MCM1 Mating signalling Sc STE2/Sp mam2 (An preB) Sc STE3/Sp map3 (An preA) Sc GPA1 (An fadA) Sc STE4 (An sfaD) Sc STE5 Sc STE18 Sc STE20 Sc STE11/Sp byr2 (An steC) Sc STE7/Sp byr1 Sc FUS3/Sp spk1 (An mpkB) Sc STE12 (An steA) Sc FAR1 Sc STE50 Sc DIG1/RST1 Sc DIG2/RST2 EST # Function Gene model Gene name MAT1-1-1 mating-type protein (alpha-box domain transcriptional activator) MAT1-2-1 mating-type protein (HMG-box domain transcriptional activator) pheromone precursor (Hypothetical mating factor alpha) pheromone precursor (a-factor) pheromone processing carboxypeptidase KexA (Kex1) putative pheromone processing endoprotease KexB (Kex2) pheromone maturation dipeptidyl aminopeptidase A a-pheromone processing metallopeptidase Ste23 CAAX prenyl protease 2 CAAX prenyl protease 1 protein farnesyltransferase subunit beta protein farnesyltransferase/geranylgeranyltransferase type-1 subunit alpha prenyl cysteine carboxylmethyltransferase Ste14 mating factor a secretion protein STE6 STE11 like HMG-box protein DNA-binding protein Mcm1 (MADS box family transcription factor) ** GSTUMT00001090001 GSTUMT00004099001 no GSTUMT00004029001 GSTUMT00008707001 GSTUMT00010539001 GSTUMT00003807001 GSTUMT00005185001 GSTUMT00011980001 GSTUMT00007324001 GSTUMT00001514001 GSTUMT00010209001 GSTUMT00003208001 GSTUMT00000587001* GSTUMT00012198001 TmelMAT1-1-1 TmelMAT1-2-1 TmelMFAL1 1 19 TmelKexA TmelkexB TmelDap1 TmelSte23 TmelRCE1 TmelSTE24 TmelRAM1 TmelRAM2 TmelSTE14 TmelSTE6 TmelSte11like TmelMcm1like 9 7 12 33 4 13 7 6 2 no no 27 pheromone alpha-factor receptor PreB/Ste2 pheromone a factor receptor PreA/Ste3 heterotrimeric G-protein alpha subunit (type 1 G-alpha, GPA1) heterotrimeric G-protein beta subunit scaffold protein heterotrimeric G-protein gamma subunit 
serine/threonine-protein kinase Ste20 mitogen activated protein kinase kinase kinase Ste11 mitogen activated protein kinase kinase, Dual specificity protein kinase, STE7-like mitogen-activated protein kinase, Fus3 sexual development transcription factor Ste12 (Homeodomain DNA binding) cell cycle arrest in G1/various other roles protein kinase regulator Ste50 Transcription factor, interacts ste12 pheromone response Transcription factor, interacts ste12 pheromone response GSTUMT00009053001 GSTUMT00012510001 GSTUMT00011064001 GSTUMT00010108001 no GSTUMT00012095001 GSTUMT00006969001 GSTUMT00005262001 TmelPreB TmelPreA TmelGPA1 TmelGPB 1 no 49 11 TmelGPG TmelSte20 TmelMAPKKK_Ste11 26 4 7 GSTUMT00011865001 GSTUMT00000551001 GSTUMT00006450001 no GSTUMT00010173001 no no TmelMAPKK_Ste7like TmelMAPK_Fus3 Tmelste12 4 12 10 TmelSte50 7 37 Aspergillus nidulans ID AN2755.2 AN4734.2 AN5791.2 Ambiguous AN1384.2 AN3583.2 AN2946.2 AN8044.2* AN6528.2 Yes** AN2002.2 AN3867.2 AN6162.2 AN2300.2 AN3667.2* AN8676.2* AN2520.2 AN7743.2 AN0651.2 AN0081.2 AN2742.2 * AN2067.2* AN2269.2 AN3422.2* AN3719.2 AN2290.2 AN7252.2 * - SUPPLEMENTARY INFORMATION Core meiotic genes in Budding Yeast DSB generation Sc MEI4 Sc MEK1/MRE4 Sc MER1 Sc MER2/REC107 Sc MER3 Sc NAM8/MRE2 Sc REC102 Sc REC103/SK18 Sc REC104 Sc RED1 Sc SPO11/Sp rec12 Removal of Spo11 protein from DNA Sc MRE11 Sc RAD50 Sc SAE2/COM1 Resection of ends Sc XRS2 Strand invasion Sc RAD51 Sc RAD52 Sc RAD54 Sc RAD55 Sc RAD57 Sc RDH54/TID1 Sc RFA1 Sc RFA2 Sc RFA3 Sc SAE3 Synapsis and synaptonemal complex formation Sc HOP1 Sc MND1 Sc ZIP1 Sc ZIP2 Regulation of Crossover frequency Sc MEI5 Sc MLH1 Sc MLH3 Sc MSH4 meiosis-specific protein MEI4 meiosis-specific serine/threonine-protein kinase MEK1 meiotic recombination 1 protein meiotic recombination 2 protein ATP-dependent DNA helicase MER3 RNA binding protein required for meiotic recombination meiotic recombination protein REC102 superkiller protein 8 meiotic recombination protein REC104 protein 
RED1 required for synaptonemal complex formation NO GSTUMT00012509001 no no GSTUMT00003621001 GSTUMT00001670001 no GSTUMT00010494001 no no GSTUMT00012793001 double-strand break repair protein MRE11 DNA repair protein RAD50 protein SAE2 TmelMEK1 no TmelMER3 TmelNAM8_like 5 32 TmelSKI8 5 TmelSPO11 no GSTUMT00004321001 GSTUMT00001634001 no Tmelmus-23 TmelRAD50 1 11 AN0556.3 AN3619.3 no DNA repair protein XRS2 GSTUMT00002619001 TmelRca 2 no DNA repair protein RAD51 DNA repair and recombination protein RAD52 DNA repair and recombination protein RAD54 DNA repair protein RAD55 DNA repair protein RAD57 DNA repair and recombination protein RDH54 replication factor A protein 1 replication factor A protein 2 replication factor A protein 3 pachytene arrest protein SAE3 GSTUMT00002392001 GSTUMT00010491001 GSTUMT00000296001 no GSTUMT00009344001 GSTUMT00003008001 GSTUMT00005528001 GSTUMT00003154001 no no TmelRad51 Tmelrad22 TmelRAD54 4 7 3 TmelRAD57 TmelRad54b TmelRFA1 TmelRFA2_like 5 1 12 5 uvsC radC AN10677.3 AN6728.3 AN10145.3 AN0855.3 AN7423.3 AN0582.3 no no meiosis-specific protein HOP1 meiotic nuclear division protein 1 synaptonemal complex protein ZIP1 protein ZIP2 GSTUMT00003611001* TmelHOP1 no no no meiosis protein 5 DNA mismatch repair protein MLH1 no GSTUMT00003139001 GSTUMT00008427001 GSTUMT00003206001 GSTUMT00009562001 DNA mismatch repair protein MLH3 MutS protein homolog 4 38 TmelMLH1 TmelMLH1_like TmelMLH3 TmelMSH4 4 3 1 6 12 AN4279.3 no no AN5514.3 AN9090.2 No AN1387.3 no no AN8259.2 AN5516.3 AN1843.3 AN3062.3 no no AN0126.3 AN4365.3 SUPPLEMENTARY INFORMATION Sc MSH5 Sc TAM1/NDJ1 Mismatch repair Sc MLH2 Sc MSH2 Sc MSH3 Sc MSH6 Sc PMS1 Resolution of recombination intermediates Sc MMS4/SLX2 Sc SLX1 Sc SLX3/MUS81 Sc SLX4 Sc SLX8 Sc HEX3/SLX5 Sc TOP1/MAK1/MAK17 Sc TOP2/TOR3/TRF3 Sc TOP3/EDR1 Nonhomologous end joining Sc LIF1 Sc LIG4 Sc YKU70/HDF1/NES24 Sc YKU80/HDF2 Core Meiotic Transcriptome Conserved in S. cerevisiae and S. 
pombe Anaphase-promoting complex Sc CDC27/ Sp nuc2 Sc APC4/ Sp cut20 Sc CDC16/ Sp cut9 Sc APC1/ Sp cut4 Sc APC5/ Sp apc5 Sc CDC23/ Sp cut23 Sc CDC26/ Sp hcn1 Sc HCT1/ Sp ste9 Sc CDC20/ Sp mfr1 Sc AMA1/ Sp slp1 Septins Sc CDC10/ Sp spn2 Sc CDC3/ Sp spn5 Sc SPR3/ Sp spn6 MutS protein homolog 5 non-disjunction protein 1 GSTUMT00011879001 no DNA mismatch repair protein MLH2 DNA mismatch repair protein MSH2 DNA mismatch repair protein MSH3 DNA mismatch repair protein MSH6 DNA mismatch repair protein PMS1 no GSTUMT00006266001 GSTUMT00011039001 GSTUMT00001828001 GSTUMT00002148001 crossover junction endonuclease MMS4 structure-specific endonuclease subunit slx1 crossover junction endonuclease MUS81 structure-specific endonuclease subunit SLX4 E3 ubiquitin-protein ligase complex SLX5-SLX8 subunit SLX8 E3 ubiquitin-protein ligase complex SLX5-SLX8 subunit SLX5 DNA topoisomerase 1 DNA topoisomerase 2 DNA topoisomerase 3 no GSTUMT00009143001 GSTUMT00006347001 no GSTUMT00007668001 no GSTUMT00008425001 GSTUMT00006716001 GSTUMT00005673001 ligase-interacting factor 1 DNA ligase 4 protein Ku70 protein Ku80 no GSTUMT00007703001 GSTUMT00005220001 GSTUMT00001928001 APC component APC component APC component APC component APC component APC component APC component APC regulator APC regulator APC regulator septin septin sporulation regulated septin 39 TmelMSH5 no AN8531.3 AN1564.3 AN10621.3 AN3749.3 AN1708.3 AN6316.3 TmelMSH2 TmelMSH3 TmelMSH6 TmelPMS1 1 1 10 7 TmelSLX1 TmelMUS81 14 5 TmelSLX8 7 TmelTOP1 TmelTOP2 TmelTOP3 12 5 2 AN6878.3 AN8212.3 AN3118.2 no AN10006.3 no AN0253.3 AN5406.3 AN4555.3 TmelLIG4 TmelKu70 TmelKu80 5 2 8 no AN0097.3 AN7753.3 AN4552.3 GSTUMT00009492001 no GSTUMT00003803001 GSTUMT00007432001 GSTUMT00010617001 GSTUMT00003819001 no GSTUMT00009718001 GSTUMT00005056001 GSTUMT00000167001 MANTUM00009492001 1 TmelCdc16 Tmelapc1 Tmelapc5 MANTUM00003819001 5 5 11 5 Tmelcdh1 MANTUM00005056001 TmelAMA1 1 1 no bimA AN0905.2 AN8002.2 AN2772.2 AN4735.2 AN8013.2 No AN2965.2 
AN2965.2 AN0814.2 GSTUMT00010495001 GSTUMT00001755001 GSTUMT00003373001 TmelCdc10 TmelCdc3 TmelCdc12 9 34 12 AN1394.2 sepB AN8182.2 SUPPLEMENTARY INFORMATION Sc SPR28/ Sp spn7 Cell cycle regulators Sc CDC14/ Sp clp1 Sc CDC5/ Sp plo1 Sc CLB1/ Sp cig2 Sc CLB3/ Sp cdc13 Sc CLB4/ Sp cig1 Sc CLB5 Sc CLB6 Recombination/chromosome cohesion Sc REC114/ Sp rec7 Sc DMC1/ Sp dmc1 Sc MND1/ Sp mcp7 Sc HOP2/ Sp meu13 Sc SMC3/ Sp smc3 Sc REC8/ Sp rec8 Chromosome segregation Sc STU1/ Sp dis1 Sc TID3/ Sp ncd10 Sc UBC11/ Sp ubc11 DNA repair Sc RAD23/ Sp rhp23 Sc EXO1/ Sp exo1 Sc HRR25/ Sp hhp1 sporulation regulated septin GSTUMT00000090001 TmelCdc11 8 AN4667.2 protein phosphatase polo kinase B-type cyclin B-type cyclin B-type cyclin B-type cyclin B-type cyclin GSTUMT00011110001 GSTUMT00008436001 GSTUMT00007556001 GSTUMT00005720001 GSTUMT00011271001 no no TmelCdc14 MANTUM00008436001 TmelNimE TmelCLB3 TmelCLB4 3 no no 1 6 AN5057.2 AN1560.2 nimE AN2137.2 meiotic recombination protein DNA-binding helix-hairpin-helix protein, DNA strand exchange recombinatino and meioic nuclear division, interacts with Hop2 prevents synapsis between nonhomologous chromosomes cohesin cohesin complex (meiotic) no GSTUMT00009804001 no no GSTUMT00000412001 GSTUMT00012822001 TmelDMC1 14 Tmelsmc3 Tmelrec8 2 no No AN9092.2 AN1843.2 No AN6364.2 No spindle pole body component chromosome segregation, kinetochore-associated Ndc80 complex ubiquitin-conjugating enzyme, chromosome segregation in Sp GSTUMT00003608001 GSTUMT00000570001 GSTUMT00001653001 MANTUM00003608001 TmelTID3_like MANTUM00001653001 7 1 1 AN5521.2 AN4969.2 AN5495.2 DNA excision-repair, NEF2 subunit DNA repair, recombination casein kinase involved in DNA repair and chromsome segregation GSTUMT00006514001 GSTUMT00009088001 GSTUMT00002552001 TmelRAD23 TmelEXO1 TmelHRR25 30 1 25 AN2304.2 AN3035.2 AN4563.2 Other genes in the conserved meiotic core program Sc HUL4/ SPBP87.27 Sc LEE1/ Sp scp3 Sc ENA2/ Sp cta3 hect domain E3 ubiquitin-protein ligase Zn finger 
transcription factor, unknown function P-type ATPase sodium pump GSTUMT00002325001 GSTUMT00003772001 GSTUMT00000681001 TmelHUL4 TmelScp3 TmelENA2_like 10 6 4 Sc PMC1/ SPBC839.06 Sc CHS1/ Sp chs1 Sc ISA1/ SPCC645.03C Sc HTZ1/ Sp pht1 Sc AUT7/ SPBP8B7.24C Sc BAG7/ SPBC557.01 Sc ROM2/ SPAC1006.06 Sc RAS2/ Sp ras1 Sc GNA1/ Sp gna1 Sc SGA1/ Sp meu17 Sc CLG1/ SPBC1D7.03 vacuolar ATPase Ca2+ pump chitin synthase, pheromone inducible mitochondrial matrix protein, iron metabolism histone H2AZ varient required for autophagic vesicle delivery to vacuole in Sc Rho-GAP Rho-GEF Ras glucosamine acetyl transferase involved in cell cycle progression sporulation-specific glucosamylase, Sp Mei4 target gene cyclin-like protein interacts with Sc Pho85 GSTUMT00003738001 GSTUMT00011849001 GSTUMT00009441001 GSTUMT00009338001 GSTUMT00002234001 GSTUMT00005421001 GSTUMT00000084001 GSTUMT00002293001 GSTUMT00004721001 GSTUMT00004366001 GSTUMT00003449001 TmelPmc1 TmelCHS1 TmelISA1 TmelHTZ1 TmelAUT7 TmelBag7 TmelROM2 TmelRAS2 TmelGNA1 TmelSGA1_like TmelCLG1 15 25 8 12 122 9 9 2 no 359 29 AN0444.2 AN3447.2 AN6642.2, AN1628.2 AN1189.2 AN4566.2 AN1974.2 AN8039.2 AN5131.2 AN7650.2 AN4719.2 AN0182.2 AN8706.2 AN8904.2 AN4984.2 40 SUPPLEMENTARY INFORMATION Sc CYB2/ SPAB1A11.03 Sc ECM4/ SPCC1281.07C cytochrome-c oxidoreductase glutathione-S-transferase domain, unknown function Sc TOS7/ SPCC1739.10 Sc ARN2/ SPCC61.01C Sc GTT1/ SPAC688.04C Sc RIB5/ SPCC1450.13C Sc CHO1/ SPCC1442.12 Sc XKS1/ SPCPJ732.02C Sc PCT1/ SPCC1827.02C Sc ELC1/ SPBC1861.07 Sc SYF2/ SPBC3E7.13C Sc PGM2/ SPBC32F12.10 Sc RKI1/ Sp ppi Sc SUR4/ SPAC1B2.03C Sc PIB1/ SPBC36B7.05C Sc PIN3/ Sp csh3 Sc FBP1/ Sp fbp1 Sc GLG1/ SPBC4C3.08 Sc ARE2/ SPAC13G7.05 Sc GDI1/ Sp gdi1 Sc PDC1/ SPAC3H8.01 Sc OXR1/ SPAC8C9.16C Sc KGD1// SPBC3H7.03C unknown function in yeasts, pH regulation in A. 
nidulans ARN family of transporters for siderophore-iron chelates ER associated glutathione S-transferase riboflavin synthase, alpha subunit phosphatidylserine synthase xyulose kinase CTP:phosphocholine cytidylyltransferase transcription elongation factor splicesome component phosphoglucomutase ribose-5-phosphate isomerase long chain fatty acid elongation enzyme RING-type ubiquitin ligase, FYVE finger domain SH3-domain protein fructose-1,6-biphosphatase self-glucosylating initiator of glycogen synthesis acyl-CoA:sterol acyltransferase secretory pathway regulator pyruvate decarboxylase unknown function mitochondrial alpha-ketoglutarate dehydrogenase complex Sc damage response, related to mammalian membrane progesterone receptors Sc DAP1/ SPAC26H5.15 Other genes involved in Mating, Karyogamy and Meiosis in Budding and Fisson Yeast Sc BIM1 microtubule-binding protein Sc BNI1/Sp fus1 formin Sc CDC31 spindle pole body component Sc CMK2 calmodulin-dependent protein kinase Sc CSM1 mediates accurate chromosome segregation during Meiosis I Sc CSM3 mediates accurate chromosome segregation during meiosis Sc DIT1 sporulation-specific enzyme required for spore wall maturation Sc HO endonuclease, mating type switching Sc IDS2 modulator of Ime2 activity Sc IME1 master transcriptional regulator of meiosis Sc IME2/Sp mde3 inducer of meiosis, S/T kinase Sc IME4 mRNA N6-adenosine methyltransferase, IME1 regulation 41 GSTUMT00008475001 GSTUMT00010048001 GSTUMT00004090001 GSTUMT00006030001 GSTUMT00002078001 NO GSTUMT00010331001 GSTUMT00003859001 GSTUMT00006227001 GSTUMT00008349001 GSTUMT00003942001 GSTUMT00010887001 GSTUMT00002911001 GSTUMT00001433001 GSTUMT00009363001 GSTUMT00003835001 GSTUMT00002841001 NO GSTUMT00000649001 GSTUMT00005701001 GSTUMT00001327001 GSTUMT00006399001 GSTUMT00006321001 GSTUMT00003778001 GSTUMT00000661001 TmelCYB2 TmelGto_like(1) TmelGto_like(2) TmelGto_like(3) TmePalI 19 4 14 no 2 AN3901.2 AN5831.2 TmelGTT1 TmelRIB5 TmelCHO1 TmelXKS1 TmelPCT1 TmelELC1 
TmelSYF2 TmelPgmA TmelRki1 TmelSUR4 TmelPIB1_like 6 10 4 1 3 no 4 4 12 7 7 Tmelfbp Tmelgyg TmelARE2_like TmelRab-gdi TmelPDC1 TmelOXR1 Tmelkgd1 149 10 13 10 270 7 19 palI AN5378.2 AN0629.2 AN4231.2 AN5661.2 AN8790.2 AN1357.2 No AN1861.2 AN2867.2 AN2440.2 AN8117.2 AN0627.2* AN2995.2 AN5604.2 AN4082.2 AN4208.2 AN5895.2 AN4888.2 AN3004.2 AN5571.2 GSTUMT00001452001 TmelDap1 4 AN4939.2 GSTUMT00005677001 GSTUMT00006074001 GSTUMT00001917001 GSTUMT00008759001 no GSTUMT00004513001 no no no no GSTUMT00001158001 no TmelEB1_like TmelsepA TmelCDC31 TmelCmk2_like 14 7 2 18 TmelCSM3 1 TmelIme2_like 2 AN2862.2 AN6523.2 AN5618.2* AN2412.2 No No AN2705.2 No No No AN6243.2 No SUPPLEMENTARY INFORMATION Sc KAR1 Sc KAR3/Sp pkl1 Sc KAR4 Sc KAR5/Sp tht1 Sc KAR9 Sc MUM2 Sc NDT80 Sc RIM4 Sc RME1 Sc SPO1 Sc SPO13 Sc SST2/Sp rgs1 Sc SPO22 Sc SUM1 Sc UME3/SSN8/Sp ume3 Sc UME6 required for karyogamy kinesin-like motor required for karyogamy TF required for mating and meiosis nuclear membrane fusion during karyogamy cytoplasmic microtubule orientation during karyogamy essential for meiotic DNA replication meiosis-specific transcriptional activator RNA-binding protein, early and middle sporulation gene expression Zn finger transcriptonal repressor of IME1 meiosis-specific phospholipase B meiosis specific protein required for Meiosis I and II RGS protein, regulates desensitization to alpha-factor meiosis-specific phospholipase A2 transcriptional repressor of middle sporulation-specific genes C-type cyclin C6 Zn finger regulator of early meiotic genes no GSTUMT00010256001 no GSTUMT00012548001 GSTUMT00006847001 no GSTUMT00006371001 GSTUMT00011359001 no no no GSTUMT00010835001 no no GSTUMT00010769001 no Sp pat1 Sp atf21 Sp bgs1 Sp cmk1 Sp dhc1 Sp gpa1 Sp hsk1 Sp isp5 Sp isp6 Sp lid2 Sp map1 Sp map2 Sp mat1-Mc topisomerase II associated protein (Pat1) bZip TF, involved in regulation of meiosis 1,3-beta-glucan synthase subunit, Sp Mei4 target gene calmodulin-dependent protein kinase dynein heavy chain, 
homologue of Sc DYN1 GTP binding (alpha-1 subunit) involved in conjugation homologue of Sc CDC7, Dbf4-dependent kinase amino acid permease involved in sexual differentiation serine protease involved in sexual differentiation homologue of Um rum1, Sc ECM5 MADS-box domain TF, pheromone receptor activator P-factor pheromone mating-type M-specific polypeptide Mc, HMG-box TF Sp mat1-Mi Sp mat1-Pc Sp mat1-Pi Sp mde5 mating-type M-specific polypeptide Mi mating-type P-specific polypeptide Pc, HMG-box TF mating-type P-specific polypeptide Pi, homeodomain TF Mei4-dependent protein 5 Sp mde6 Sp mde7 Sp mfm1,mfm2,mfm3 Sp mei2 Sp mei3 ketch repeat protein, Sp Mei4 target gene RNA-binding protein involved in meiosis, Sp Mei4 target M-factor pheromone precursor RNA-binding protein involved in meiosis meiosis inducing protein, inactivates Sp Ran1 GSTUMT00010739001 GSTUMT00007619001 GSTUMT00007493001 GSTUMT00007542001 GSTUMT00000794001 GSTUMT00011103001 GSTUMT00000259001 GSTUMT00001488001 GSTUMT00000037001 GSTUMT00011726001 GSTUMT00012198001 no GSTUMT00001090001* GSTUMT00008644001* no no no GSTUMT00006610001 GSTUMT00005820001 no GSTUMT00000091001 no GSTUMT00005188001 no 42 No AN6340.2 No ambiguous No No AN6015.2* No No No No flbA No No AN2172.2 No TmelKlpA_like 10 TmelKAR5_like TmelKAR9 no 2 TmelNDT80_like TmelRIM4 27 no TmelRGS_FLBA 69 TmelSSN8_like 115 TmelPat1 TmelATF21 Tmel_bgs1_like TmelCmk1 Tmeldync TmelGPA2 Tmelhsk1 MANTUM00001488001 TmelAlp2 Tmellid2 TmelMcm1like 14 2 8 25 6 6 1 1 55 4 27 AN2751.3 AN6849.2* AN3729.2 AN3065.2 nudA AN3090.2 AN3450.2 AN5678.2 AN0238.2 AN8211.2 AN8676.2* TmelMAT1-2-1 MANTUM00008644001 1 6 AN1962.2* Tmelamy1_like Tmelamy1 no 2 Tmelmde7_like 6 TmelMei2 1 No No No No No AN7700.2 No AN6494.2* No SUPPLEMENTARY INFORMATION Sp mei4 Sp mes1 Sp meu1 Sp meu14 Sp rad22 Sp ran1/pat1 Sp rec6 Sp rec10 Sp rec11 Sp rec15 Sp rem1 Sp rep1 Sp rhp51 Sp spo6 Sp ssm4 Sp sso1 Sp ste4 Sp ste6 Sp ste7 Sp sxa2 fork head domain TF, meiotic regulator meiosis II protein, 
Sp Mei4 target gene Sp Mei4 target gene involved in Meiosis II nuclear division, Sp Mei4 target gene DNA repair protein serine/threonine protein kinase, negative regulator of meiosis meiotic recombination protein sister chromatid cohesion sister chromatid cohesion meiotic recombination protein meiotic B-type cyclin regulator of pre-meiotic DNA replication homologue of Sc RAD51//52, Nc mei-3 homologue of Sc DBF4, required for orign of replication firing dynactin complex, homologue of Sc NIP100 syntaxin SAM domain, similar to Sc STE50 GEF involved in conjugation; related to Sc CDC25 meiotic suppressor protein serine carboxypeptidase, degrades extracellular P-factor GSTUMT00006159001 no GSTUMT00007435001 GSTUMT00006892001 GSTUMT00010491001 GSTUMT00001466001 no no no no no no GSTUMT00002392001 GSTUMT00000807001 GSTUMT00001061001 GSTUMT00002243001 GSTUMT00010173001 GSTUMT00007746001 no GSTUMT00010123001 *homology limited to functional domain; **Found in a different strain; Sc: Saccharomyces cerevisiae; Sp: Schizosaccharomyces pombe; An: Aspergillus nidulans. 43 TmelSep1 no TmelApsB TmelLSP1 Tmelrad22 TmelRan1 1 4 7 64 TmelRad51 TmelnimO Tmelro-3 TmelPsy1_like TmelSte50 TmelCdc25 4 1 2 45 7 6 TmelSxa2 no AN8858.2* No apsB AN3931.2 radC AN4935.2 No No No No AN2137.2 No uvsC nimO AN6323.2 AN3416.2 AN7252.2 AN2130.2 No AN2555.2 SUPPLEMENTARY INFORMATION Table S12. Genes involved in RNA silencing and DNA methylation in T. melanosporum genome. Gene model Name Putative function (length) EST number FLM FB Yeast BRH1 Acc. No. (length) Filamentous BRH2 % id. Acc. No. (length) % id. BC1G_15614 (1174) 48 NCU021784 (1050) 37 AN2717 (1204) 40 NCU08435 (1352) 32 NCU07534 (1403) 32 BC1G_10104 (1843) 39 NCU08270 (1585) 38 RNA silencing 1. 
RNA-dependent RNA polymerase GSTUMT00008250001 GSTUMT00008249001 GSTUMT00000785001 GSTUMT00001133001 TmelRRPA TmelRRPB TmelRRPC RNA-dependent RNA polymerase (1542) BEAH3: Neosartorya fischeri (1599) RNA-dependent RNA polymerase (1218) BEAH: Penicillium marneffei (1207) RNA-dependent RNA polymerase (1700) BEAH: Neurospora crassa (1402) 8 4 3 14 2 1 ----- ----- ----- ----- ----- ----- 2. Dicer GSTUMT00011356001 GSTUMT00001152001 TmelDCL1 TmelDCL2 RNA helicase/RNAse III, putative Dicer-like protein 1 (1423) BEAH: Neosartorya fischeri (1538) RNA helicase/RNAse III, putative Dicer-like protein 2 (1490) BEAH: Aspergillus clavatus (1389) ----- ----- No5 ----- 33 8 ----- No5 ----- NCU06766 (1422) 26 95 ----- ----- NCU04730 (1090) 40 1 3 ----- ----- No ----- BC1G_06939 (940) 39 NCU09434 (990) 34 3. Argonaute GSTUMT00011067001 GSTUMT00011068001 TmelAGO1 GSTUMT00010783001 TmelAGO2 GSTUMT00007541001 TmelAGO3 Argonaute-like protein (944) BEAH: Laccaria bicolor (961) Argonaute (863) BEAH: Schizosacc. japonicus (842) Argonaute (868) BEAH: Schizosacc. pombe (834) 5 44 2 ----- ----- SUPPLEMENTARY INFORMATION 4. RNA silencing accessory proteins (Ago-binding protein, H3 K9 methyltransferase and RISC helicases) GSTUMT00004385001 GSTUMT00006817001 GSTUMT00004046001 GSTUMT00004402001 GSTUMT00003241001 TmelARB1 putative AGO binding protein-Arb16 (457) TmelRQH1 ----- TmelDBP2 TmelCLR4 2 ATP-dependent DEAD/DEAH box helicase (1325) ATP-dependent DEAD/DEAH box helicase DEAD, helicase (898) ATP-dependent DEAD/DEAH box RNA helicase DBP2 (547) ----- 2 25 Histone H3 K9 methylase (355) 12 0 ----- ----- 48.27 2 ----- ----- 25 NP_014287.1 ( 546) No 44 NCU06316 (537) AN2087 (1535) 30 NCU08598 (1956) 53 BC1G_02690 (654) 42 MGG_12894 (549) 83 NCU07839 (547) AN1170 (552) 82 NCU04402 (332) 43 ----- NP_013915.1 (1447) 1 AN0157 (178) 54 72.03 48 ----- Yeast BRH=”best reciprocal hit” against S. cerevisiae. Filamentous BRH= “best reciprocal hit” against reference Ascomycetes (N. crassa, M. grisea, B. 
cinerea, A. nidulans)
3 BEAH = "best absolute hit" (i.e., against the entire sequence database).
4 Functionally characterized N. crassa homolog is indicated, even if not BRH.
5 Both genes have non-BRH homologs in S. cerevisiae belonging to the DEAH family of helicases (DNA replication) and RNA helicases (splicing), respectively.
6 Argonaute siRNA chaperone (ARC) complex subunit Arb1, required for histone H3 Lys9 (H3-K9) methylation, heterochromatin assembly and siRNA generation in fission yeast.
1 2
DNA methylation
5. DNA methyltransferases
GSTUMT00012206001 GSTUMT00003328001 GSTUMT00003329001 putative cytosine DNA methyltransferase (857) 0 cytosine (C5)-DNA methyltransferase (1124) 1 1 ----- TmelPP1 PP1 protein phosphatase catalytic subunit (325) 5 8 NP_011059 (312) TmelSDS22 PP1 protein phosphatase regulatory subunit (359) 2 Heterochromatin-associated, chromo domain protein HP1 (225) 13 TmelDMT1 TmelDMT2 6 ----- MGG_02795 (893) 40 NCU02034 (845) 38 ----- NCU02247 (1455) 37 91 NCU00043 (309) 99 AN10088 (356) 66 NCU08385 (384) 61 NCU04017 (267) 44 -----
6. DNA methylation accessory components
GSTUMT00009673001 GSTUMT00005072001 GSTUMT00000912001 TmelHP1 45 2 1 NP_012728 (338) 49 ----- -----

Table S13. Distribution of genes involved in core RNA silencing components and DNA methyltransferases in different Ascomycetes.

Core RNA silencing components (comparative abundance/distribution)
Component   Tuber  Neurospora  Magnaporthe  A. nidulans  Podospora  S. pombe, S. japonicus  S. cerevisiae
RdRP        3      3           3            2            2          1                       0
Dicer       2      2           2            1            2          1                       0
Argonaute   3      2           3            1            2          1                       0

DNA methyltransferases (comparative abundance/distribution)
Type                           Tuber  Ascobolus  Neurospora  Magnaporthe  Aspergillus spp.  Fusarium spp.  Podospora  S. cerevisiae, S. pombe
DMT1 (specialized methylases)  1      1          1           1            1                 1              1          0
DMT2 (general methylases)      1      1          1           1            0                 1              1          0

Table S14. Homologs of genes coding for the indicated mycotoxin biosynthesis components in T. melanosporum genome.
Gene name Function Putative T.
melanosporum homolog

1. Aflatoxin biosynthesis
aflF         Dehydrogenase                           YES
aflU         P450 monooxygenase                      YES
aflT         Putative ABC transporter                YES
aflC         Polyketide synthase                     YES
aflD         Reductase                               NO
aflB         Fatty acid synthase β                   YES
aflA         Fatty acid synthase α                   YES
aflR         Transcriptional activator               NO
aflS         Transcriptional enhancer                NO
aflH         Alcohol dehydrogenase                   YES
aflJ         Esterase                                YES
aflE         NOR-reductase                           YES
aflM         Dehydrogenase                           YES
aflN         Monooxygenase                           YES
aflG         P450 monooxygenase                      YES
aflL         Desaturase                              YES
aflI         Oxidase                                 NO
aflO         O-methyltransferase B                   YES
aflP         O-methyltransferase A                   YES
aflQ         Oxidoreductase                          YES
aflK         VERB synthase                           YES
aflV         P450 monooxygenase                      YES
aflW         Monooxygenase                           YES
aflX         Monooxygenase/oxidase                   NO
nadA         NADH oxidase                            YES
hxtA         hexose transporter                      YES
glcA         Glucosidase                             YES
sugR         Sugar regulator                         NO

2. Fumonisin biosynthesis
NPT          Nicotinate phosphoribosyl transferase   YES
WDR1         WD protein                              YES
PNG1         Peptide N-glycanase                     YES
ZNF1         Transcription factor/kinase             YES
ZBD1         Zinc-binding dehydrogenase/reductase    NO
FUM1         Polyketide synthase                     YES
FUM6         P450 monooxygenase                      YES
FUM7         Dehydrogenase                           NO
FUM8         Aminotransferase                        YES
FUM9         Dioxygenase                             NO
FUM10        Fatty acyl-CoA synthetase               YES
FUM11        Tricarboxylate transporter              YES
FUM12        P450 monooxygenase                      YES
FUM13        Short-chain dehydrogenase/reductase     NO
FUM14        Peptide synthetase condensation domain  NO
FUM15        P450 monooxygenase                      YES
FUM16        Fatty acyl-CoA synthetase               YES
FUM17        Longevity assurance factor              YES
FUM18        Longevity assurance factor              YES
FUM19        ABC transporter                         YES
ORF20        Transcription factor                    NO
ORF21        Transcription factor/kinase             NO
MPU1         Mannose-P-dolichol utilization          YES

3. Gliotoxin biosynthesis
AFUA6G09580  C6 finger domain protein                YES
AFUA6G09590  Zinc alcohol dehydrogenase              YES
AFUA6G09600  Zinc metallo peptidase                  NO
AFUA6G09610  Non-ribosomal peptide synthase          YES
AFUA6G09620  Hypothetical protein                    NO
AFUA6G09630  C6 finger domain protein                NO
AFUA6G09640  Aminotransferase GliI                   NO
AFUA6G09650  Membrane dipeptidase                    YES
AFUA6G09660  Non-ribosomal peptide synthase          YES
AFUA6G09670  P450 oxidoreductase                     NO
AFUA6G09680  O-methyltransferase                     NO
AFUA6G09690  Glutathione S-transferase               YES
AFUA6G09700  Gliotoxin biosynthesis protein          YES
AFUA6G09710  MFS gliotoxin efflux transporter        YES
AFUA6G09720  Methyltransferase                       YES
AFUA6G09730  P450 oxidoreductase                     YES

4. Trichothecene biosynthesis
TRI5         Trichodiene synthase                    NO
TRI4         Trichodiene oxygenase                   YES
TRI11        Isotrichodermin C-15 hydroxylase        NO
TRI101       Trichothecene 3-O-acetyltransferase     NO
TRI3         15-O-acetyltransferase                  NO
TRI8         T-2 toxin biosynthesis protein          NO
TRI13        P450 monooxygenase                      NO
TRI7         T-2 toxin biosynthesis protein          NO

Table S15. Genes involved in sulfur assimilation, metabolism and S-volatile compounds in T. melanosporum genome.
Gene model1 Name2 EST number3 Putative function (length)2 FLM Yeast BRH4 FB Acc. No. (length) Filamentous BRH4 % id. Acc. No. (length) % id.
Sulfur assimilation
1. Sulfate internalization & reduction
GSTUMT00000861001 TmelSUL1 Sulfate permease (826) 0 77 NP_009853.1 (859) 41.97 GSTUMT00005125001 TmelSUL2 Sulfate permease (520) 0 0 No ----- Sulfate permease (719) 1 24 NP_015328.1 (754) 35.19 TmelST2 Similar to sulfate (and other anions) transporter (1091) 1 69 NP_011641.1 (1036) 40.88 NCU02632 (1119) 56.18 TmelMET3 ATP sulfurylase (566) 9 168 NP_012543.1 (511) 64.02 NCU01985 (574) 78.84 GSTUMT00003745001 TmelMET14 Adenosine 5'-phosphosulfate kinase (200) 0 10 NP_012925.1 (202) 70.35 MGG_06348 (209) 76.10 GSTUMT00002663001 TmelMET16A PAPS reductase (273) 53 139 NP_015493.1 (261) 54.15 GSTUMT00001561001 TmelMET16B 1 9 No ----- No ----- GSTUMT00001411001 TmelTRX1 34 34 NP_013144.1 (103) 43.56 AN0170 (190) 53.66 GSTUMT00008708001 TmelTRR1 Thioredoxin reductase (344) 20 3 NP_010640.1 (319) 70.49 MGG_01284 (329) 74.59 GSTUMT00006994001 TmelMET22 3'(2'),5'-bisphosphate nucleotidase (342) 0 33 NP_014577.1 (357) 43.92 GSTUMT00001735001 GSTUMT00000585001 GSTUMT00002747001 TmelST1 PAPS reductase (289) Thioredoxin (119) 49 BC1G_02187 (826) No BC1G_03076 (668)
BC1G_12227 (309) MGG_04311 (355) 56.99 ----55.85 62.45 57.88 SUPPLEMENTARY INFORMATION GSTUMT00000610001 TmelMET10 GSTUMT00010016001 TmelECM17 Sulfite reductase alpha subunit (1044) Sulfite reductase beta subunit (1517) 11 26 9 13 NP_116686.1 (1035) NP_116579.1 (1442) 38.68 50.26 BC1G_04925 (1098) BC1G_08712 (1531) 54.41 AN2229 (490) 59.34 BC1G_09615 (525) 69.72 AN8277 (438) AN8057 (371) AN1513 (429) 82.08 68.62 2. Cys/Met biosynthesis & interconversion GSTUMT00000943001 TmelMET2 Homoserine acetyl-transferase (438) 1 25 NP_014122.1 (486) 50.53 GSTUMT00004828001 TmelCYSA Serine acetyl-transferase (459) 2 1 No ----- GSTUMT00009485001 TmelMET17 13 29 NP_013406.1 (444) 62.79 GSTUMT00003830001 TmelCYSB 4 1 No ----- GSTUMT00008902001 TmelCYSSYNTH2 0 0 NP_011526.1 (393) 53.89 GSTUMT00009043001 TmelSTR2 Cystathionine gamma-synthase (542) 7 7 NP_012664.1 (639) 40.72 AN3456 (670) 55.62 GSTUMT00006346001 TmelSTR3 Cystathionine beta-lyase (447) 1 5 NP_011331.1 (465) 48.66 AN7051 (459) 77.28 GSTUMT00006909001 TmelCYS4 Cystathionine beta-synthase (510) 1 4 NP_011671.1 (507) 52.48 BC1G_10936 (449) 79.69 GSTUMT00010528001 TmelCYS3 Cystathionine gamma-lyase (411) 157 333 NP_009390.1 (394) 65.45 AN1446 (420) 75.44 GSTUMT00005323001 TmelMET6 4 8 NP_011015.1 (767) 60.65 BC1G_12307 (768) 74.48 GSTUMT00005542001 TmelMHMT 0 0 ----- ----- AN5019 (439) 24.34 GSTUMT00008874001 TmelSAM1 0 17 75.20 TmelSAH1 9 32 NCU02657 (396) MGG_05155 (450) 87.37 GSTUMT00011415001 NP_010790.1 (384) NP_010961.1 (449) Homocysteine synthase (452) Cysteine synthase (367) Cysteine synthase, mitochondrial (419) cobalamin-independent Met synthase (774) similar to MHMTase, truncated cobalamin-independent Met synthase (376) S-adenosyl-methionine synthetase (382) Adenosyl-homo-cysteinase (417) Cys/Met utilization & catabolism 3a. Cysteine catabolism Sulfinoalanine-independ. 
Cys degradation 50 75.30 77.51 68.98 83.21 SUPPLEMENTARY INFORMATION GSTUMT00006350001 TmelAAT2_SPT Aspartate amino-transferase (417) 3 10 NP_013127.2 (418) 51.57 AN6048 (446) 69.93 GSTUMT00000780001 TmelTHIOTRANS Thiosulfate sulfur-transferase (336) 3 11 NP_014894.1 (304) 41.39 MGG_04087 (346) 50.77 Sulfinoalanine-depend. Cys degradation GSTUMT00004091001 TmelCDI1 Cysteine dioxygenase (223) 1 3 ----- ----- NCU06625 (228) 51.72 GSTUMT00003243001 TmelAAT Aspartate amino-transferase (410) 4 5 No ----- AN1993 (430) 72.66 NCU06112.3 ( 545) similar to glutamic acid decarboxylase isoform 67 Contig4544 ----- Sulfinoalanine decarboxylase 2 10 No ----- Contig3853 ----MGG_03869.6 ( 515) cysteine sulfinic acid decarboxylase TmelNFR1 AIF-like (NAD/FAD) oxido-reductase (551) 14 21 ----- ----- BC1G_09030 (548) 59.27 GSTUMT00002834001 TmelTDI1B Taurine dioxygenase (361) 20 0 NP_013043.1 (412) 43.03 AN6739 (377) 56.39 GSTUMT00003308001 TmelTDI1A Taurine dioxygenase (381) 49 1 No ----- MGG_03117 (372) 54.05 GSTUMT00010867001 TmelTDI1C Taurine dioxygenase (393) 3 99 No ----- BC1G_03189 (395) 64.91 7 6 72.14 1 NCU04636 (506) No 81.82 0 NP_009912.2 (497) No GSTUMT00001651001 Taurine degradation 3b. 
Cysteine desulfuration GSTUMT00004448001 TmelNFS1_1 GSTUMT00010387001 TmelNFS1_2 Cysteine desulfurase (404) Cysteine desulfurase 51 ----- ----- SUPPLEMENTARY INFORMATION GSTUMT00010387001 TmelNFS1_2 GSTUMT00004634001 TmelNFS1_3 GSTUMT00003703001 No express data TmelNFS1_4 GSTUMT00002637001 TmelNFS1_5 GSTUMT00002211001 No express data GSTUMT00001222001 No express data TmelNFS1_6 TmelNFS1_7 GSTUMT00010391001 TmelNFS1_8 GSTUMT00004633001 TmelNFS1_9 GSTUMT00008466001 TmelNFS1_10 GSTUMT00006565001 TmelNFS1_11 GSTUMT00008685001 TmelNFS1_12 GSTUMT00010389001 TmelNFS1_13 GSTUMT00007003001 TmelNFS1_14 GSTUMT00000388001 TmelNFS1_15 GSTUMT00008467001 TmelNFS1_16 GSTUMT00003702001 TmelNFS1_17 GSTUMT00001219001 TmelNFS1_18 GSTUMT00008793001 TmelNFS1_19 Cysteine desulfurase (374) Cysteine desulfurase (562) Cysteine desulfurase (277) Cysteine desulfurase (379) Cysteine desulfurase (276) Cysteine desulfurase (345) Cysteine desulfurase (276) Cysteine desulfurase (635) Cysteine desulfurase (398) Cysteine desulfurase (234) Cysteine desulfurase (605) Cysteine desulfurase (247) Cysteine desulfurase (247) Cysteine desulfurase (194) Cysteine desulfurase (218) Cysteine desulfurase (107) Cysteine desulfurase (590) Cysteine desulfurase (199) 0 1 No ----- No ----- 0 0 No ----- No ----- 0 0 No ----- No ----- 0 0 No ----- No ----- 0 0 No ----- No ----- 0 2 No ----- No ----- 0 0 No ----- No ----- 0 0 No ----- No ----- 2 0 No ----- No ----- 0 0 No ----- No ----- 0 0 No ----- No ----- 0 0 No ----- No ----- 0 0 No ----- No ----- 0 0 No ----- No ----- 0 0 No ----- No ----- 0 0 No ----- No ----- 4 0 No ----- No ----- 0 0 No ----- No ----- NP_011569.1 (574) 33.27 NCU10675 (604) 43.33 Methionine uptake, modification & catabolism 4. 
Methionine uptake & utilization GSTUMT00005629001 TmelMUP1 Methionine permease (638) 7 52 0 SUPPLEMENTARY INFORMATION GSTUMT00011592001 TmelMUPA Methionine permease (690) 4 2 No ----- BC1G_05423 (573) 38.11 GSTUMT00006065001 MANTUM00006065001 Methionine (& other) permease (576) 2 3 No ----- AN10825 (610) 58.59 GSTUMT00009497001 MANTUM00009497001 Methionine (& other) permease (540) 0 34 No ----- MGG_00285 (508) 36.34 GSTUMT00007355001 TmelTMT1 similar to thiol methyl-transferase (267) 0 2 ----- ----- AN6094 (284) 47.55 GSTUMT00012494001 TmelTMT2 0 0 ----- ----- No No GSTUMT00012778001 TmelTMT3 0 0 ----- ----- No No 6 403 NP_010960.1 (184) 44.77 AN10562 (214) 66.67 0 0 NP_009897.1 (168) 51.52 MGG_02496 (148) 67.69 5 44 NP_011313.1 (500) 38.63 BC1G_07219 (524) 60.59 5 5 No ----- AN8172 (591) 45.56 50 220 NP_013145.1 (563) 51.61 NCU02397 (519) 57.17 4 179 NP_014032.1 (348) 60.68 AN2286 (353) 70.57 12 3 No ----- NCU09285 (347) 61.00 539 1 NP_014051.1 (360) 43.29 AN5355 (360) 54.60 similar to thiol methyl-transferase (250) similar to thiol methyl-transferase (273) Methionine sulfoxide reduction Peptide methionine sulfoxide reductase (175) similar to protein-methionine-RGSTUMT00000808001 TmelMSRB oxide reductase (199) 5. 
Met degradation via 2-oxo 4-methyl-thiobutanoate-Ehrlich/”fusel” pathway Aromatic amino acid amino GSTUMT00008805001 TmelARO8A transferase (575) Aromatic amino acid amino GSTUMT00009292001 TmelARO8B transferase (570) 2-oxo acid/ phenylpyruvate GSTUMT00006321001 TmelPDC1ARO10 decarboxylase (563) Alcohol dehydrogenase, class V GSTUMT00006862001 TmelALCC/ADH5A (349) hypothetical protein similar to GSTUMT00006980001 TmelADH5B alcohol dehydrogenase (364) Alcohol dehydrogenase, class V GSTUMT00001645001 TmelADH6 (346) GSTUMT00004791001 TmelMSRA 53 SUPPLEMENTARY INFORMATION GSTUMT00001074001 TmelADH4 Alcohol dehydrogenase, class IV (492) 26.37 AN1868 (495) 69.45 NP_013117.1 (337) 43.36 AN10230 (356) 62.46 0 NP_015443.1 (411) 41.79 AN4290 (380) 57.96 0 4 NP_012558.1 (244) 55.70 AN3593 (241) 61.18 1 0 NP_010876.2 (227) 35.51 BC1G_15439 (231) 45.70 2 0 NP_013722.1 (179) 45.93 AN9527 (179) 70.59 0 0 NP_014590.1 (396) 33.48 MGG_14297 (479) 55.21 0 1 0 3 1 NP_011258.1 (465) 6. Methionine salvage pathway GSTUMT00007013001 TmelMEU1 GSTUMT00011972001 TmelMRI1 GSTUMT00008825001 TmelMDE1 GSTUMT00002414001 TmelUTR4 GSTUMT00006393001 TmelADI1 GSTUMT00011721001 TmelSPE2 Methylthio-adenosine phosphorylase (402) Methyl-thioribose 1-phosphate isomerase (363) Methyl-thioribulose-1-phosphate dehydratase (233) 2,3-diketo-5-methyl-thiopentyl-1-P enolase-phosphatase (256) Acireductone dioxygenase (Fe2+/Ni2+) (173) SAM decarboxylase proenzyme (495) Glutathione biosynthesis, reduction/oxidation & utilization 7. GSH biosynthesis GSTUMT00005466001 TmelGSH1 γ-glutamyl-Cysteine synthetase (655) 11 2 NP_012434.1 (678) 41.35 MGG_07317 (696) 62.48 GSTUMT00006508001 TmelGSH2 Glutathione synthetase (512) 5 7 NP_014593.1 (491) 39.88 BC1G_07364 (510) 51.44 8. 
Glutathione reduction/oxidation & utilization GSTUMT00003950001 TmelGLR1 Glutathione reductase- NADPH (467) 10 28 NP_015234.1 (483) 56.87 MGG_12749 (485) 68.51 GSTUMT00005402001 TmelHYR1 Glutathione peroxidase (158) 8 19 NP_012303.1 (163) 51.57 NCU09534 (168) 49.40 GSTUMT00006625001 TmelGST2 Glutathione S-transferase (224) 307 1 NP_014170.1 (354) 38.56 MGG_09138 (224) 44.50 GSTUMT00006029001 GSTUMT00006030001 TmelGST3B Glutathione S-transferase (337) 0 0 No ----- No ----- GSTUMT00006718001 TmelGSTO1 Glutathione S-transferase (234) 100 53 ----- ----- AN3299 (237) 45.21 54 SUPPLEMENTARY INFORMATION GSTUMT00010331001 TmelGTT1 Glutathione S-transferase (303) 0 6 NP_012304.1 (234) 33.82 BC1G_15218 (250) 46.12 GSTUMT00010048001 TmelECM4 Glutathione S-transferase (330) 3 1 NP_013002.1 (370) 47.22 AN10273 (345) 63.35 GSTUMT00003178001 TmelGSTR1 Glutathione S-transferase related (332) 2 11 ----- ----- MGG_12319 (330) 38.95 GSTUMT00004090001 TmelGST3A Glutathione S-transferase (360) 5 9 No ----- NCU04368 (347) 59.16 GSTUMT00011418001 TmelGSTR2 3 1 No ----- AN10695 (290) 66.39 GSTUMT00001002001 TmelPCS 0 0 ----- ----- ----- ----- GSTUMT00008111001 TmelGFA1 3 0 ----- ----- AN7594 (137) 32.14 GSTUMT00006398001 TmelGFA2 0 0 ----- ----- MGG_12759 (158) GSTUMT00005441001 TmelFDH1 2 1 NP_010113.1 (386) 69.6 AN7632 (380) 83.11 GSTUMT00004188001 TmelECM38 0 21 NP_013402.1 (660) 36.01 MGG_02134 (696) 36.31 GSTUMT00003663001 TmelGGTASE 5 3 No ----- AN5658 (607) 53.71 GSTUMT00010539001 TmelDAP_1 3 9 NP_011893.1 (818) 42.34 BC1G_13641 (922) 62.84 GSTUMT00002049001 TmelDAP2 2 0 No ----- AN2572 (723) 47.81 GSTUMT00003350001 TmelDAP3 0 0 ----- ----- No ----- GSTUMT00000118001 TmelDAP4 0 1 ----- ----- AN6438 (774) 53.00 GSTUMT00000119001 GSTUMT00000120001 TmelDAP5 0 0 ----- ----- ----- ----- Glutathione S-transferase related (481) Phytochelatin synthase (416) similar to glutathione-dependent formaldehyde-activating enzyme (159) similar to glutathione-dependent formaldehyde-activating enzyme 
(120) S-(hydroxy-methyl) GSH dehydrogenase (373) γ-glutamyl-trans-peptidase (657) γ-glutamyl-trans-peptidase (591) Dipeptidyl amino-peptidase (cytoplasmic) (907) Dipeptidyl amino-peptidase (secreted) (710) Dipeptidyl amino-peptidase (cytoplasmic) (654) Dipeptidyl amino-peptidase (secreted) (757) Dipeptidyl amino-peptidase (cytoplasmic) 55 31.62 SUPPLEMENTARY INFORMATION GSTUMT00000119001 GSTUMT00000120001 TmelDAP5 Dipeptidyl amino-peptidase (cytoplasmic) (546) Dipeptidyl amino-peptidase GSTUMT00000121001 TmelDAP6 (secreted) (443) Dipeptidyl amino-peptidase GSTUMT00000122001 TmelDAP7 (cytoplasmic) (782) 9. Ancillary biosynthetic/interconversion reactions & accessory components Choline sulfatase (573) 0 0 ----- ----- ----- ----- 0 0 ----- ----- ----- ----- 1 ----- ----- ----- ----- ----- 0 0 ----- ----- AN6847 (617) 52.9 3 7 ----- ----- AN3341 (511) 61.02 0 0 ----- ----- ----- ----- 4 2 ----- ----- AN8058 (381) 50.26 GSTUMT00006469001 TmelCHS1 GSTUMT00004987001 TmelCHR1 GSTUMT00000370001 TmelCHR2 GSTUMT00003848001 TmelSUOX1 GSTUMT00009048001 TmelHOM6 Homoserine dehydrogenase (367) 1 2 NP_012673.1 (359) 49.29 MGG_11450 (371) 60.56 GSTUMT00007110001 TmelMET7 Folyl polyglutamate synthase (521) 2 7 NP_014884.1 (548) 46.00 AN3840 (518) 48.34 GSTUMT00003931001 TmelFOL3 4 4 38.68 TmelSHM2 0 7 GSTUMT00005470001 TmelGSTHM 3 5 No ----- NCU01337 (413) BC1G_06851 (478) AN10745 (601) 45.35 GSTUMT00006315001 NP_013831.1 (427) NP_013159.1 (469) GSTUMT00003351001 TmelMET12 1 4 NP_015302.1 (657) 45.18 BC1G_02720 (664) 58.01 GSTUMT00010817001 TmelMET13 12 66 NP_011390.2 (600) 54.19 AN5883 (628) 64.55 GSTUMT00008511001 TmelMOCOBP 0 0 ----- ----- MGG_08902 (212) 47.59 GSTUMT00001509001 TmelMOCOSULF1 0 7 ----- ----- BC1G_15280 (814) 52.11 GSTUMT00008546001 TmelMOCOSULF2 0 0 ----- ----- No ----- putative chromate efflux transporter (518) putative chromate efflux transporter (487) Sulfite oxidase (402) Dihydrofolate synthetase (415) Serine hydroxymethyl transferase (473) Glycine 
hydroxylmethyl transferase (502) 5,10-Methylene tetrahydrofolate reductase 1 (608) 5,10-Methylene tetrahydrofolate reductase 2 (600) similar to Mo cofactor biosynth protein (254) putative Mo cofactor sulfurase (780) putative Mo cofactor sulfurase/ SeCys lyase 56 68.80 76.81 70.25 SUPPLEMENTARY INFORMATION GSTUMT00004264001 TmelMOBP GSTUMT00002750001 TmelLIPA putative molybdopterin-binding protein (296) putative lipoate synthase (426) 44.36 AN0183 (310) 58.50 NP_014839.1 (414) 65.8 BC1G_01056 (372) 73.58 20 ----- ----- MGG_11778 (162) 4 12 ----- ----- AN4361 (295) 10 7 NP_012594.1 (352) 3 0 NP_010539.1 (191) 47.46 AN8663 (731) 82.14 3 33 NP_012218.1 (640) 42.17 AN6359 (679) 57.55 54 25 NP_010615.1 (194) 48.19 BC1G_01974 (168) 79.25 1 2 NP_014508.1 (121) 54.87 MGG_08844 (120) 79.83 3 0 3 1 53 NP_013903.1 (274) 10. S-metabolism-related transcription factors & regulators Sulfur metabolite activator control protein (289) Sulfur metabolite activator control protein (257) Sulfur metabolite activator control protein (351) Zn-finger DNA-binding regulator of Met biosynthetic genes (413) S-metabolite repression control protein (SconB); Ub-ligase component (717) S-metabolism negative regulator SconC/SkpA component of SCF Ub-ligase (160) HrtA subunit of SCF Ub-ligase (112) 61.68 GSTUMT00008609001 TmelCYS3-1 GSTUMT00004031001 TmelCYS3-2 GSTUMT00000814001 TmelCBF1 GSTUMT00012172001 TmelMET32 GSTUMT00000398001 TmelMET30 GSTUMT00006439001 TmelSKP GSTUMT00002417001 TmelHRT1 GSTUMT00000778001 TmelAPC11 putative Apc11 subunit of APC/C Ub ligase (86) 1 4 NP_010276.1 (165) 46.46 AN10394 (105) 78.48 GSTUMT00006793001 TmelCUL3 CulC subunit of SCF Ub-ligase (757) 2 6 NP_011517.1 (744) 25.10 AN3939 (824) 47.57 57 49.56 BC1G_07375 (296) 49.22 82.61 SUPPLEMENTARY INFORMATION Table S16. Genes involved in non-polyketide secondary metabolism in T. melanosporum genome. Gene model Name EST number Putative function (length) Yeast BRH Acc. No. 
(length aa) FLM FB 5 44 NP_011313.1 (500) 5 5 No 3 3 NP_011079.1 (443) 3 49 2 %id. Filamentous BRH Acc. No. %id. (length aa) 1. Fusel alcohol/acid formation (Ehrlich pathway) GSTUMT00008805001 TmelARO8A GSTUMT00009292001 TmelARO8B GSTUMT00008941001 MANTUM00008941001 GSTUMT00003753001 TmelBAT1 GSTUMT00011000001 TmelTOXF GSTUMT00000969001 TmelECA39 GSTUMT00006321001 TmelPDC1 GSTUMT00005441001 TmelFDH1 GSTUMT00006862001 TmelADH5A GSTUMT00006980001 TmelADH5B GSTUMT00001645001 TmelADH6 GSTUMT00001074001 TmelADH4 GSTUMT00004330001 TmelALD5 aromatic amino acid aminotransferase (575 ) aromatic amino acid aminotransferase (579) putative aromatic amino acid aminotransferase (445) branched-chain-amino-acid aminotransferase (mitochondrial) (405) putative branched-chain amino acid aminotransferase (367) branched-chain amino acid aminotransferase (mitochondrial precursor) (449) pyruvate decarboxylase (563) S-(hydroxymethyl) glutathione dehydrogenase (373) alcohol dehydrogenase 3 (349) hypothetical protein similar to alcohol dehydrogenase (364) NADP-dependent alcohol dehydrogenase 6 (359) NAD-dependent methanol dehydrogenase (492) aldehyde dehydrogenase (496) 58 38.63 BC1G_07219 (524) 60.59 AN8172 (591) 45.56 40.00 AN0780 (438) 34.09 No ----- No ----- 0 No ----- AN0385 (395) 63.10 5 106 NP_012682.1 (376) 50 220 NP_013145.1 (563) 2 1 NP_010113.1 (386) 15 168 NP_014032.1 (348) 12 3 No 539 1 NP_014051.1 (360) 0 1 NP_011258.1 (465) 6 35 NP_010996.1 (520) ----- 41.47 51.61 69.60 60.68 ----43.29 26.37 55.65 BC1G_10603 (406) NCU02397 (519) AN7632 (380) AN2286 (353) NCU09285 (347) BC1G_09005 (402) BC1G_02180 (299) BC1G_06362 (497) 54.79 57.17 83.11 70.57 61.00 55.12 70.89 73.43 SUPPLEMENTARY INFORMATION GSTUMT00012138001 TmelSNQ2L GSTUMT00007904001 TmelSNQ2L1 putative ABC-cassette multidrug transporter (1514) putative ABC-cassette multidrug transporter (1405) 1 1 NP_010294.1 (1501) 36.73 BC1G_03332 (1562) 0 0 No ----- BC1G_00425 (1476) 4 2 NP_015297.1 (398) 3 6 No ----- BC1G_02275 (435) 
11 87 NP_013580.1 51.50 BC1G_09652 (460) 3 5 NP_013555.1 (1045) 1 1 NP_013935.1 (443) 2 5 NP_013947.1 (451) 2 8 NP_014441.1 (396) 0 0 No 4 13 NP_015208.1 (288) 7 3 NP_012368.1 (352) 5 54 NP_015256.1 (353) 0 0 NP_009557.1 (473) 55.81 69.30 2. Isoprenoids 2a. Five-carbon isoprene unit synthesis GSTUMT00003447001 TmelERG10 GSTUMT00006667001 TmelACAT1 GSTUMT00001922001 GSTUMT00001923001 GSTUMT00001924001 TmelERG13 GSTUMT00009050001 TmelHMGR GSTUMT00001515001 TmelRAR1 GSTUMT00008001001 TmelERG8 GSTUMT00005252001 TmelMVD1 GSTUMT00003692001 TmelMVD2 GSTUMT00006247001 TmelIDI1 acetyl-CoA C acetyltransferase (acetoacetyl-CoA thiolase) (397) acetyl-CoA C acetyltransferase (acetoacetyl-CoA thiolase) (424) 3-hydroxy-3-methylglutaryl-CoA synthase (472) 3-hydroxy-3-methylglutaryl-coenzyme A reductase (1092) mevalonate kinase (442) phosphomevalonate kinase (425) diphosphomevalonate decarboxylase (390) similar to diphosphomevalonate decarboxylase (174) isopentenyl-diphosphate delta-isomerase (299) 63.41 43.92 38.50 34.49 56.82 BC1G_11051 (401) BC1G_01518 (1153) AN3869 (736) BC1G_07491 (527 ) BC1G_14194 (383) 72.78 66.25 63.50 67.35 58.52 50.12 68.77 ----- No ----- 58.12 AN0579 (269) 73.59 2b. 
Polyisoprenoid/terpenoid biosynthesis GSTUMT00003820001 TmelFPP1 GSTUMT00011059001 TmelBTS1 GSTUMT00003780001 TmelCOQ1 farnesyl pyrophosphate synthetase (391) geranyl-geranyl pyrophosphate synthetase (343) putative isoprenyl (decahexa-prenyl) pyrophosphate synthetase (343) 59 47.36 44.48 44.97 NCU01175 (348) NCU01427 (434) BC1G_08074 (221) 64.45 72.30 54.82 SUPPLEMENTARY INFORMATION GSTUMT00002915001 TmelNUS1 GSTUMT00000645001 TmelRER2 GSTUMT00004443001 TmelERG9 GSTUMT00004857001 TmelERG1 GSTUMT00008512001 putative isoprenyl (undecaprenyl) diphosphate synthase (324) putative cis-prenyl transferase (dehydrodolichyl diphosphate synthase) (267) putative squalene synthetase (472) 32.11 3 NP_010088.1 (375) 3 3 NP_009556.1 (286) 1 1 NP_012060.1 (444) putative squalene monooxygenase (466) 0 5 NP_011691.1 (496) ----- putative polyisoprenoid (β-carotene/ lignostilbene) dioxygenase (521) 0 0 ----- ----- MGG_08016 (558) 45.05 GSTUMT00006317001 MANTUM00006317001 putative prenylcysteine oxidase (563) 6 11 ----- ----- AN3057 (555) 49.48 GSTUMT00011429001 MANTUM00011429001 putative prenylcysteine oxidase precursor (705) 7 5 ----- ----- ----- ----- 0 0 ----- ----- MGG_10859 (1109) 51.18 2 6 ----- ----- No ----- AN9177 (389) 47.38 45.74 52.07 41.24 BC1G_13721 (95) 54.39 4 BC1G_01412 (354) BC1G_01273 (483) AN11008 (484) 71.97 64.33 59.19 3. 
Fatty acid peroxidation products linoleate diol synthase/ fatty acid oxygenase (1119) linoleate diol synthase/ fatty acid oxygenase (1079) GSTUMT00000322001 TmelPPO1 GSTUMT00006891001 TmelPPO2 GSTUMT00007644001 TmelOYE2 12-oxophytodienoate reductase (378) 37 0 NP_012049.1 (400) 41.88 GSTUMT00004213001 TmelCYP2C30 linoleic acid epoxygenase (564) 2 7 No ----- AN7399 (544) 36.36 GSTUMT00000137001 TmelCYP617 cytochrome P450 hydroxylase (552) 0 3 No ----- BC1G_11822 (486) 40.73 GSTUMT00009186001 TmelCYP52 4 12 NP_010690.1 (486) 29.24 AN7131 (475) 51.11 GSTUMT00004620001 TmelCYPNF5 0 1 No ----- No ----- cytochrome P450 52A3 (515) cytochrome P450 52A13 (epoxygenase activity) (388) 60 SUPPLEMENTARY INFORMATION GSTUMT00003498001 TmelCYPNF3 cytochrome P450 (457) 54 10 No ----- AN8615 (451) 35.08 GSTUMT00005386001 TmelEPHX2.1 similar to epoxide hydrolase 2 (248) 0 1 ----- ----- BC1G_13574 (281) 42.22 GSTUMT00010225001 TmelEPHX2.2 putative epoxide hydrolase (305) 7 0 No ----- MGG_05175 (367) GSTUMT00005692001 TmelEPHX2.3 9 1 ----- ----- MGG_07232 (341) GSTUMT00005966001 TmelLAP2 1 3 NP_014353.1 (671) 46.78 GSTUMT00008016001 TmelCYPNF4B 0 2 No ----- GSTUMT00005402001 TmelHYR1 8 18 NP_012303.1 (163) 51.57 GSTUMT00010330001 TmelCBR1 0 1 No ----- NCU05223 (880) GSTUMT00006538001 TmelAKR1 0 3 No ----- AN7708 (284) 57.97 GSTUMT00005097001 TmelGRE3 aldose reductase (322) 0 0 NP_011972.1 (327) 53.46 AN0423 (320) 66.56 putative aryl-alcohol dehydrogenase (353) 13 0 No ----- MGG_12003 (308) 0 0 NP_015237.1 (342) 49.54 AN9474 (349) 48.41 0 0 NP_009560.1 (497) 48.68 AN3829 (532) 65.18 1 8 No AN3591 (537) 71.62 putative epoxide hydrolase (326) leucyl aminopeptidase / epoxide (leukotriene-A4) hydrolase (615) cytochrome P450 17-alpha-hydroxylase, 17,20-lyase (521) glutathione peroxidase (involved in omega-6 fatty acid 20:4(ω-6) metabolism) (158) carbonyl reductase (NADPH) (involved in omega-6 fatty acid 20:4(ω-6) metabolism) (273) aldehyde reductase (283) BC1G_09514 (670) BC1G_10350 (461) 
NCU09534 (168) 34.79 41.23 61.15 37.59 49.4 36.3 4. Other non-polyketide secondary metabolism-related enzymes GSTUMT00010607001 TmelAAD GSTUMT00009316001 TmelAAD1 GSTUMT00000440001 MANTUM00000440001 GSTUMT00008742001 TmelMMSDH1 putative aryl-alcohol dehydrogenase (345) succinate-semialdehyde dehydrogenase (Glu and other aa catabolism/ butanoate metabolism) (449) methylmalonate-semialdehyde dehydrogenase (Val, Leu, Ile degradation/propanoate metabolism) (541) 61 ----- 58.77 SUPPLEMENTARY INFORMATION GSTUMT00006975001 TmelBCKD_E1A GSTUMT00000127001 TmelBCKDHB GSTUMT00007426001 TmelECH3 GSTUMT00007620001 TmelHIBCH1 GSTUMT00008499001 TmelBHBD GSTUMT00000078001 TmelMCCA GSTUMT00000074001 TmelMCCB GSTUMT00008529001 TmelNAHF1 GSTUMT00002943001 TmelNAHG1 GSTUMT00007846001 TmelNAHG2 putative 2-oxoisovalerate dehydrogenase alpha (Val, Leu, Ile degradation/ methyloxopentanoate metabolism) (450) putative 2-oxoisovalerate dehydrogenase beta (Val, Leu, Ile degradation/ methyloxopentanoate metabolism) (394) similar to methylglutaconyl-CoA hydratase (Val,Leu,Ile degradation/methylbutanoate/oxopentanoate metabolism (290) similar to 3-hydroxyisobutyryl-CoA hydrolase (Val, Leu, Ile degradation) (474) similar to 3-hydroxybutyryl-CoA dehydrogenase (butanoate metabolism) (311) 3-methylcrotonyl-CoA carboxylase (alpha subunit) (Val, Leu, Ile degradation, methylbutanoate metabolism) (735) 3-methylcrotonyl-CoA carboxylase (beta subunit) (Val, Leu, Ile degradation, methylbutanoate metabolism) (566) phenolic aldehyde (salicylaldehyde/vanillin) dehydrogenase (447) phenolic acid (salicylate) hydroxylase (399) protein similar to monooxygenase (putative salicylate hydroxylase) (444) 62 1 11 No ----- AN1726 (465) 64.32 0 6 No ----- BC1G_14088 (305) 57.70 0 6 No ----- BC1G_10324 (326) 2 18 NP_010321.1 (500) 35.91 4 44 ----- ----- 2 81 NP_009767.1 (1835) 41.9 6 18 ----- ----- BC1G_08864 (567) 1 0 No ----- AN4050 (483) 1 0 ----- ----- BC1G_05013 (688) 1 0 ----- ----- BC1G_03120 (479) BC1G_06802 (503) 
AN7008 (320) BC1G_08870 (346) 58.01 53.19 70.43 72.78 72.44 46.91 50.42 59.66
GSTUMT00003194001 TmelNAHG3 protein similar to monooxygenase (putative salicylate/phenolic acid hydroxylase) (438)
GSTUMT00012239001 TmelNAHG4 putative phenolic acid hydroxylase (473) 1 0 ----- ----- MGG_06552 (431) 51.75
GSTUMT00012487001 TmelIRL similar to isoflavone reductase (272) 0 0 ----- ----- ----- -----
GSTUMT00004447001 TmelFRL1 putative flavonol reductase (341) 0 1 NP_010830.1 (344) 38.53 AN5977 (335) 58.82
GSTUMT00007431001 TmelFRL2 putative flavonol reductase (129) 0 0 No ----- No -----
0 0 ----- ----- AN7684 (507) 40.35

Table S17. Genes involved in light perception and potential photoresponses in T. melanosporum genome.
Gene model Putative function (length) Name EST number FLM Yeast BRH FB Acc.No. (length) % id. Filamentous BRH Acc. No. % (length) id.
1. Photoreceptors and light-dependent regulators
GSTUMT00007548001 GSTUMT00001635001 GSTUMT00005055001 TmelWC-1 TmelWC-2 TmelLAEA putative blue light photoreceptor (874) Blue light regulator 2 (453) Methyl transferase master regulator of secondary metabolism (314) GSTUMT00007375001 TmelVEA Regulator of sexual development (696) GSTUMT00009560001 TmelVEB VeA-like protein (366) GSTUMT00012049001 TmelVOSA GSTUMT00007042001 TmelPHY-1 GSTUMT00000117001 Regulator of secondary metabolism (490) Sensory transduction histidine kinase (1008) 0 8 4 13 1 0 3 ----- NP_013856 (560) ----- ----- ---- 50 ---- ---- 54 NCU02356 (1131) BC1G_01840 (510) 50 NCU00902 (532) 46 AN0807 (361) 51 47 AN1052 (538) 58 NCU01731 (554) 42 4 1 ----- ---- MGG_01620 (455) 69 4 1 ----- ---- MGG_00617 (473) 41 1 0 ----- ---- NCU04834 (1536) 57 BC1G_13906 (339) 39 NCU01735 (293) 38 NCU01427 (433) 72 BC1G_08074 (221) 55 NCU02305 (449) 54 Bacteriorhodopsin-like protein (315) 0 TmelBTS1 Geranyl geranyl pyrophosphate synthetase (343) 5 TmelHPS Probable hexaprenyl pyrophosphate synthetase, mitochondrial (343) TmelORP-1 3
BC1G_13505 (1160) 26 NP_009610 (344) 27 NP_015256 (355) 44 2. Accessory components & modulators GSTUMT00011059001 GSTUMT00003780001 GSTUMT00005262001 TmelMAPKKK_Ste11 GSTUMT00005199001 TmelCAMK GSTUMT00008887001 TmelCKA Serine/threonine protein kinase (881) Ca2+/calmodulin-dependent protein kinase (606) Casein kinase II, alpha subunit (335) 64 0 54 0 NP_009557 (473) 45 2 5 NP_013466 (717) 56 MGG_12855 (660) 42 1 2 NP_011336 (560) 34 BC1G_06577 (704) 51 7 3 NP_014704 (339) 63 BC1G_01857 (336) 92 SUPPLEMENTARY INFORMATION Table S18. Genes coding for transduction pathway proteins in T. melanosporum and closest homologs in yeast and filamentous ascomycetes Gene Model # Gene name Putative function (length aa) EST number Yeast BRH Filamentous BRH Mycelium Fruiting body Accession # (length aa) % id Accession # (length aa) % id 29 BC1G_13906 (339) AN7743 (427) AN2520 (431) AN4932 (319) BC1G_11854 (500) AN6680 (490) AN8262 (405) MGG_00258 (599) BC1G_03450 (652) AN5720 (319) MGG_04698 (418) AN10166 (425) AN8548 (371) AN0857 (393) AN2249 (440) NCU09823 (502) MGG_05214 (372) AN0751 39 G-protein coupled receptors (GPCR) GSTUMT00000117001 Tmel BacRhodopsin GPCR similar to bacterial Rhodopsin (315) --- 26 GSTUMT00012510001 Tmel Ste3p Pheromone receptor Ste3p (374) --- ---- GSTUMT00009053001 Tmel Ste2p Pheromone receptor Ste2p (367) --- 1 GSTUMT00001471001 Tmel mPR 1 GPCR similar to mPR receptor (306) --- 1 GSTUMT00012001001 Tmel mPR 2 GPCR similar to mPR receptor (543) 3 --- GSTUMT00008690001 Tmel animalGPCR like GPCR similar to animal GPCR (590) 11 1 NP_009610 (344) NP_012743 (470) NP_116627 (431) NP_014641 (317) NP_013123 (543) --- GSTUMT00010085001 Tmel cAMP_R 1 cAMP receptor (385) --- --- --- --- GSTUMT00012270001 Tmel cAMP_R 2 cAMP receptor (404) --- --- --- --- GSTUMT00009991001 Tmel Gpr1 like Glucose sensor (596) 2 1 30 GSTUMT00003917001 Tmel PQloop 1 3 3 GSTUMT00011683001 Tmel PQloop 2 7 2 GSTUMT00007380001 Tmel PQloop 3 3 6 GSTUMT00000156001 Tmel Pth11 related 1 13 2 
GSTUMT00004902001 Tmel_Pth11 related 2 GPCR of unknown function with PQ loop domain (320) GPCR of unknown function with PQ loop domain (307) GPCR of unknown function with PQ loop domain (347) GPCR related to Magnaporthe grisea Pth11p (397) GPCR related to M. grisea Pth11p (374) NP_010249 (961) NP_014549 (308) NP_014549 (308) NP_010639 (317) ---- --- --- --- --- GSTUMT00002940001 Tmel_Pth11 related 3 GPCR related to M. grisea Pth11p (369) 3 2 --- --- GSTUMT00012481001 Tmel_Pth11 related 4 GPCR related to M. grisea Pth11p (388) --- --- --- --- GSTUMT00004192001 Tmel_Pth11 related 5 GPCR related to M. grisea Pth11p (383) 1 --- --- --- GSTUMT00010450001 Tmel_Pth11 related 6 GPCR related to M. grisea Pth11p (341) 1 --- --- --- 65 27 41 39 --- 31 40 28 --- 30 32 56 53 39 44 44 39 42 50 38 38 42 29 24 39 33 SUPPLEMENTARY INFORMATION GSTUMT00012382001 Tmel_Pth11 related 7 GPCR related to M. grisea Pth11p (266) --- --- --- --- GSTUMT00000531001 Tmel_Pth11 related 8 GPCR related to M. grisea Pth11p (337) 2 --- --- --- GSTUMT00012745001 Tmel_Pth11 related 9 GPCR related to M. grisea Pth11p (213) --- --- --- --- GSTUMT00012376001 Tmel_Pth11 related 10 GPCR related to M. grisea Pth11p (284) --- --- --- --- GSTUMT00012373001 Tmel_Pth11 related 11 GPCR related to M. grisea Pth11p (354) --- --- --- --- GSTUMT00012424001 Tmel_Pth11 related 12 GPCR related to M. grisea Pth11p (387) --- --- --- --- GSTUMT00010364001 Tmel_Pth11 related 13 GPCR related to M. grisea Pth11p (282) 1 --- --- --- GSTUMT00009022001 Tmel_Pth11 related 14 GPCR related to M. grisea Pth11p (250) --- --- --- --- GSTUMT00003188001 Tmel_Pth11 related 15 GPCR related to M. grisea Pth11p (336) --- 4 --- --- GSTUMT00003592001 Tmel_Pth11 related 16 GPCR related to M. grisea Pth11p (369) 6 --- --- --- GSTUMT00003181001 Tmel_Pth11 related 17 GPCR related to M. grisea Pth11p (408) --- 3 --- --- GSTUMT00004274001 Tmel_Pth11 related 18 GPCR related to M. 
grisea Pth11p (308) 10 12 --- --- GSTUMT00004223001 Tmel_Pth11 related 19 GPCR related to M. grisea Pth11p (481) --- 2 --- --- GSTUMT00009616001 Tmel_Pth11 related 20 GPCR related to M. grisea Pth11p (253) --- 18 --- --- GSTUMT00012518001 Tmel_Pth11 related 21 GPCR related to M. grisea Pth11p (219) --- --- --- --- GSTUMT00012596001 Tmel_Pth11 related 22 GPCR related to M. grisea Pth11p (323) --- --- --- --- Adenylate cyclase (AC) and associated protein (CAP) GSTUMT00006808001 Tmel AC Adenylate cyclase (2015) 8 8 41 GSTUMT00007790001 Tmel CAP AC associated protein (520) 2 2 NP_012529 (2026) NP_014261 (526) Phosphodiesterases (PDE) GSTUMT00009828001 Tmel PDEase I High affinity cAMP phosphodiesterase (794) 5 --- NP_015005 (526) GSTUMT00003009001 Tmel PDEase II Low affinity cAMP phosphodiesterase (684) --- 4 NP_011266 66 (545) AN10886 (372) AN10886 (372) BC1G_12534 (335) AN1930 (361) MGG_09070 (372) NCU02903 (483) AN9387 (313) BC1G_09796 (309) NCU00700 (438) NCU06531 (541) BC1G_10169 (363) NCU00700 (438) AN9387 (313) BC1G_01571 (243) BC1G_12534 (335) BC1G_09188 (371) 27 27 32 34 30 31 36 38 29 26 26 27 52 39 36 26 BC1G_02865 (1740) AN0999 (530) 59 28 BC1G_08417 (946) 43 28 NCU00237 49 38 46 SUPPLEMENTARY INFORMATION (369) Ras (monomeric G-protein) GSTUMT00004380001 Tmel Ras1 Ras small G-protein Ras1p like (382) 4 3 GSTUMT00002293001 Tmel Ras2 Ras small G-protein, Ras2p like (224) 1 1 GSTUMT00000073001 Tmel Rsr1 (ras like) Ras-like protein Rsr1 (203) 31 3 GSTUMT00007546001 Tmel Ras like Ras-like protein (232) --- 1 Heterotrimeric G-proteins GSTUMT00011064001 Tmel Gpa1 30 19 GSTUMT00011103001 Tmel Gpa2 5 1 GSTUMT00005851001 Tmel Gpa3 9 5 GSTUMT00010108001 Tmel Gpb 6 5 GSTUMT00012095001 Tmel Gpg Gpa1, alpha subunit of heterotrimeric Gprotein (353) Gpa2, alpha subunit of heterotrimeric Gprotein (351) Gpa3, alpha subunit of heterotrimeric Gprotein (352) Gpb, beta subunit of heterotrimeric G-protein (347) Gpg, gamma subunit of heterotrimeric Gprotein (88) 19 7 31 38 30 76 
Regulator of G-protein signaling (RGS) GSTUMT00010835001 Tmel Rgs FlbA GSTUMT00004040001 Tmel RgsA Regulator of G-protein, GTPase activating protein (508) Regulator of G-protein (376) GSTUMT00008293001 Tmel RgsB Regulator of G-protein (350) 6 4 GSTUMT00002373001 Tmel RgsC Regulator of G-protein (1230) --- 2 cAMP-dependent protein kinase catalytic subunit (356) cAMP-dependent protein kinase catalytic subunit (430) Regulatory subunit of the cyclic AMPdependent protein kinase (514) Protein kinase similar to PKA (940) 2 4 3 2 4 6 2 5 Mitogen activated protein kinase similar to Fus3/Kss1 (351) Mitogen activated protein kinase similar to 6 6 5 2 cAMP-dependent protein kinase (PKA) GSTUMT00000447001 Tmel PkaA GSTUMT00000085001 Tmel PkaB GSTUMT00007366001 Tmel PkaR GSTUMT00004530001 Tmel Pka like Mitogen Activated protein kinase (MAPK) GSTUMT00000551001 Tmel MAPK Fus3 like GSTUMT00001034001 Tmel MAPK MpkA like 67 (687) NP_014301 (322) NP_014301 (322) NP_011668 (272) NP_014744 (309) 72 NP_010937 (449) NP_010937 (449) NP_010937 (449) NP_014855 (423) NP_012619 (110) 43 NP_013557 (698) --- 63 NP_014945 (435) NP_013603 (1127) 27 NP_015121 (380) NP_012371 (397) NP_012231 (416) NP_012075 (824) 72 NP_011554 (368) NP_011895 58 58 60 38 44 54 41 51 --- 31 46 53 69 69 AN0182 (213) BC1G_04437 (233) AN4685 (203) AN3434 (198) 92 MGG_00365 (354) BC1G_08985 (354) AN1016 (357) AN0081 (353) AN2742 (96) 92 AN5893 (720) MGG_03146 (581) MGG_03726 (365) NCU03937 (1261) 21 MGG_06368 (532) AN4717 (397) AN4987 (413) BC1G_14873 (978) 84 AN3719 (355) AN5666 93 82 64 61 62 78 87 72 51 62 47 56 68 70 80 SUPPLEMENTARY INFORMATION GSTUMT00005852001 Tmel MAPK Hog1 like MpkA/Slt2 (427) Mitogen activated protein kinase similar to Hog1 (332) 60 17 MAPK kinase (MAPKK) GSTUMT00011865001 Tmel MAPKK Ste7 like MAPK kinase similar to Ste7 (573) 3 1 GSTUMT00003004001 Tmel MAPKK Pbs2 like MAPK kinase similar to Pbs2 (346) 7 2 GSTUMT00005681001 Tmel MAPKK Mkk2 like MAPK kinase similar to Mkk1/Mkk2 (438) 11 8 MAPKK 
kinase (MAPKKK) GSTUMT00002374001 Tmel MAPKKK Ssk like MAPKK Kinase similar to Ssk2 (1356) 6 6 GSTUMT00005262001 Tmel MAPKKK Ste11 like MAPKK Kinase similar to Ste11p (881) 2 5 GSTUMT00006322001 Tmel MAPKKK Bck1 like MAPKK Kinase similar to Bck1 (1927) 3 3 68 (484) NP_013214 (435) 74 NP_010122 (515) NP_012407 (668) NP_015185 (506) 45 NP_014428 (1579) NP_013466 (717) NP_012440 (1478) 43 57 56 56 43 (331) NCU07024 (359) 83 AN3422 540) BC1G_07633 (605) BC1G_11713 (495) 61 AN10153 (1314) MGG_12855 (660) AN4887 (1559) 55 70 59 49 63 SUPPLEMENTARY INFORMATION Table S19. Genes coding for heat shock- and stress-related proteins in T. melanosporum genome. Gene name Putative function (length) GSTUMT00008363001 CLP FAMILY Tmelbag102 BAG family molecular chaperone regulator 1B (266) - 5 - - NCU01220 (358) 38.46 GSTUMT00004219001* HSP90 FAMILY-associated Tmelhsp98 Heat shock protein HSP98 (931) 51 15 NP_013074.1 (908) 50.94 NCU00104 (811) 71.40 GSTUMT00012112001** Tmelhsp90 2 - no file BRH Tmelhsp90 718 26 72.62 no file BRH BC1G_07315 (702) no file BRH GSTUMT00012114001** no file BRH NP_013911.1 (705) GSTUMT00012113001** Tmelhsp90 - - no file BRH no file BRH no file BRH 12 2 40,34 MGG_00814 (329) 59.02 46 8 37.93 11 4 6 29,13 AN6921 (211) BC1G_01523 (588) BC1G_10364 (497) 56.00 72 1 1 no file BRH NP_010500.1 (350) NP_012805.1 (216) NP_014670.1 (589) NP_010452.1 (506) NP_009713.1 (385) 38,98 NCU06340 (472) 41,86 Gene model EST number Fruiting Mycelium body Yeast BRH Acc. No. (length aa) % id Filamentous BRH Acc. No. 
(length aa) % id BAG FAMILY GSTUMT00012030001 MANTUM00012030001 GSTUMT00005825001 Tmelwos2 GSTUMT00000199001 Tmelsti1 GSTUMT00010261001 TmelCDC37 GSTUMT00008279001 FES FAMILY TmelCNS1 Heat shock protein 90, putative (701) Heat shock protein 90, putative (701) Heat shock protein 90, putative (701) Uncharacterized protein C1711.08 (324) Protein wos2 (232) Heat shock protein sti1 homolog (570) Hsp90 co-chaperone Cdc37 (515) Cyclophilin seven suppressor 1 (394) GSTUMT00006161001 PAM16 FAMILY TmelFES1 Hsp70 nucleotide exchange factor FES1 (213) 4 6 NP_009659.1 (290) 48.5 BC1G_08200 (218) 49.77 GSTUMT0001150200* GRPE FAMILY Tmelpam16 Mitochondrial import inner membrane translocase subunit tim16 (136) - 3 NP_012431.1 (149) 39.39 BC1G_00158 (161) 62.50 GSTUMT00002348001 HSP20 FAMILY Tmelmge-1 GrpE protein homolog, mitochondrial precursor (255) 1 1 NP_014875.1 (228) 50.29 NCU01516 (239) 57.78 69 57.81 75.37 63.41 53.33 SUPPLEMENTARY INFORMATION GSTUMT00011894001 Tmelhsp30_3 GSTUMT00008756001 Tmelhsp30_2 GSTUMT00000685001 HSP9/12 FAMILY Tmelhsp30_1 GSTUMT00000236001 Tmelhsp9_1 GSTUMT00004694001 DNAJ FAMILY Tmelhsp9_2 GSTUMT00001300001 GSTUMT00004488001 GSTUMT00005543001 GSTUMT00006401001 GSTUMT00009811001 GSTUMT00010039001 GSTUMT00011043001 GSTUMT00003606001 GSTUMT00007480001 GSTUMT00010431001 GSTUMT00004954001 GSTUMT00005686001 GSTUMT00005206001 GSTUMT00007528001 30 kDa heat shock protein (161) 30 kDa heat shock protein (149) 30 kDa heat shock protein (207) Heat shock protein hsp9 (87) Heat shock protein hsp9 (100) DnaJ protein homolog xdj1 (433) Uncharacterized J domain-containing protein C1778.01c MANTUM00004488001 (442) DnaJ-related protein SCJ1 precursor TmelSCJ1 (404) DnaJ homolog 1, mitochondrial precursor Tmelmdj1 (489) Chaperone protein dnaJ TmeldnaJ (343) J domain-containing protein spf31 Tmelspf31 (212) Uncharacterized J domain-containing protein C2E1P5.03 precursor (383) MANTUM00011043001 Uncharacterized J domain-containing protein C1071.09c MANTUM00003606001 
(367) DnaJ homolog subfamily C member 3 Tmeldnajc3 (521) Mitochondrial import inner membrane translocase subunit tim14 (99) Tmelpam18 DnaJ homolog subfamily B member 12 TmelDnajb12 (349) DnaJ homolog subfamily C member 7 TmelDNAJC7 (553) Protein psi1 Tmelpsi1 (373) Translocation protein sec63 Tmelsec63 (684) Tmelxdj1 70 476 161 No hit No hit AN3555 (181) 43.89 - 1 - - no no 7 - - - no no 1 1 No hit 2 - No hit NP_116640.1 (109) 31.63 no no 2 6 no BC1G_09575 (429) 54.46 7 4 2 - 6 - no NP_011801.1 (433) NP_013941.2 (377) NP_116638.1 (511) 4 5 no 6 4 1 6 2 - - 1 - - 1 1 20 11 20 12 8 9 (90) 52.81 63.76 no AN7143 (448) BC1G_09129 (417) BC1G_11653 (460) BC1G_10020 (358) - - AN0590 (211) 69.15 NP_116699.1 (295) 19.82 BC1G_00200 (394) 43.40 no AN6233 (300) 48.80 52.17 AN3463 (520) 50.22 53.01 BC1G_10252 (107) 70.00 50.67 AN4441 (340) 67.59 no NCU00170 (785) BC1G_06525 (381) BC1G_03757 (697) 56.14 no NP_012462.1 (645) NP_013108.1 (168) NP_013884.1 (224) no NP_014391.1 (352) NP_014897.1 (663) 46.60 AN3725 43.86 36.52 38.68 27.30 65.81 52.61 42.54 49.22 53.96 SUPPLEMENTARY INFORMATION CPN60-TCP1 FAMILY GSTUMT00005900001 Tmelcct1 GSTUMT00009564001* Tmelcct2 GSTUMT00007765001 Tmelcct3 GSTUMT00011767001 Tmelcct4 GSTUMT00007689001 Tmelcct5 GSTUMT00008182001 Tmelcct6 GSTUMT00000052001 Tmelcct7 GSTUMT00011157001 Tmelcct8 GSTUMT00009341001 HSP70 FAMILY TmelHSP60 GSTUMT00000289001** Tmelhsp70 GSTUMT00000288001** Tmelhsp70 GSTUMT00000866001 TmelHSPa12b GSTUMT00001960001 TmelHSPa12a_1 GSTUMT00004279001 TmelHSPa12a_2 GSTUMT00009214001* Tme HSPa12a_3 GSTUMT00001987001 TmelHSPa12a_4 GSTUMT00010876001 TmelHSPa12a_5 GSTUMT00001748001 TmelHSP88 GSTUMT00008839001* TmelSSB GSTUMT00010477001* TmelbipA T-complex protein 1 subunit alpha (506) Probable T-complex protein 1 subunit beta (567) T-complex protein 1 subunit gamma (538) T-complex protein 1 subunit delta (531) T-complex protein 1 subunit epsilon (550) T-complex protein 1 subunit zeta (641) Probable T-complex protein 1 subunit eta (555) 
Probable T-complex protein 1 subunit theta (547) Heat shock protein 60, mitochondrial precursor (592) Heat shock 70 kDa protein (648) Heat shock 70 kDa protein (648) Heat shock 70 kDa protein 12B (587) Heat shock 70 kDa protein 12A (649) Heat shock 70 kDa protein 12A (642) Heat shock 70 kDa protein 12A (637) Heat shock 70 kDa protein 12A (587) Heat shock 70 kDa protein 12A (587) Heat shock protein Hsp88 (692) Heat shock protein SSB (613) 78 kDa glucose-regulated protein homolog precursor (666) 71 34 11 8 16 - 6 8 5 2 12 1 28 5 10 2 2 56 15 3 - 532 NP_010498.1 (559) NP_012124.1 (527) NP_012520.1 (534) NP_010138.1 (528) NP_012598.1 (562) NP_010474.1 (546) NP_012424.1 (550) NP_011493.1 (718) NP_013360.1 (572) 73.22 BC1G_07480 (567) BC1G_11867 (539) 71.92 NCU01843 (541) 84.48 69.77 NCU02839 (534) 79.42 68.07 81.75 67.46 MGG_11889 (552) BC1G_07745 (541) BC1G_14004 (545) BC1G_02512 (754) 71.76 NCU01589 (491) 84.75 75.69 61.55 69.57 81.32 81.37 86.38 79.55 76.13 no no no 35 no NP_013076.1 (639) 81,59 NCU09602 (647) 88,78 4 - - - AN2646 (688) 32,39 4 5 - - no 2 - no no no BC1G_12181 (741) 93.80 2 1 no no MGG_07156 (613) 58.79 1 - no no AN0587 (586) 35,54 1 - no - 69.94 6 10 76,14 no BC1G_09769 (712) BC1G_10846 (425) no 18 35 36 no NP_009728.1 (693) NP_014190.1 (613) NP_012500.1 (682) 70,24 NCU03982 (662) 82,18 50,29 77.67 SUPPLEMENTARY INFORMATION 3 6 TmelSSC1 Ribosome-associated complex subunit SSZ1 (542) Heat shock protein SSC1, mitochondrial precursor (679) 24 6 NP_011931.2 (538) NP_012579.1 (654) GSTUMT00002165001* CLPA/B FAMILY TmelHSP10 10 kDa heat shock protein, mitochondrial (104) 12 5 GSTUMT00009533001* USP FAMILY TmelHSP78 Heat shock protein 78, mitochondrial precursor (793) 8 GSTUMT00002779001 MANTUM00002779001 GSTUMT00004495001 CYCLOPHILIN FAMILY MANTUM00004495001 GSTUMT00002739001 TmelSSZ1 GSTUMT00011349001* CPN10 FAMILY GSTUMT00005334001 GSTUMT00008642001 GSTUMT00008657001 Uncharacterized protein C167.05 (731) Universal stress protein A family protein 
C25B2.10 (357) Peptidyl-prolyl cis-trans isomerase-like 1 (162) peptidyl prolyl cis-trans isomerase Cyclophilin, putative MANTUM00008642001 (152) peptidyl-prolyl cis-trans isomerase, mitochondrial precursor MANTUM000086457001 (168) Tmelcyp1 GSTUMT00008859001 Tmelcyp3 GSTUMT00007865001 Tmelcyp15 GSTUMT00010900001 Tmelcyp4 GSTUMT00010103001* FKBP FAMILY Tmelcyp8 GSTUMT00010499001 Tmelfpr1A GSTUMT00000766001 NO FAMILY Tmelfkbp22 GSTUMT00009281001 TmelAGP2 GSTUMT00004833001 GSTUMT00007543001* TmelSSU81 TmelYAR1 Peptidyl-prolyl cis-trans isomerase H (174) Peptidyl-prolyl cis-trans isomerase cyp15 (645) Peptidyl-prolyl cis-trans isomerase-like 3 (167) Peptidyl-prolyl cis-trans isomerase-like 2 (567) FK506-binding protein 1A (108) FK506-binding protein 2 (139) General amino acid permease AGP2 (555) Protein SSU81 (318) Ankyrin repeat-containing protein YAR1 72 43,89 AN4616 (564) 58.56 63,29 AN6010 (667) 68.89 NP_014663.1 (106) 63,11 MGG_13221 (106) 78.85 1 NP_010544.1 (811) 59,14 12 3 - - MGG_09507 (719) 49.73 3 2 - - AN6061 (440) 70.30 1 1 no no 74.48 2 3 no no 29 9 no 1 1 no NP_013633.1 (182) 50.00 AN8680 (163) BC1G_15349 (154) BC1G_01740 (182) BC1G_01528 (181) - 2 no no AN0380 (630) 65.34 3 - no no 68.86 2 2 no no MGG_03239 (169) BC1G_11448 (754) 4 1 62.38 AN3598 (109) 70.37 4 3 48.31 AN8343 (136) 63.00 2 27 49,9 BC1G_09689 (463) 77.41 2 5 1 4 31,12 32,5 AN7698 (288) - 53.04 - NP_014264.1 (114) NP_010807.1 (135) NP_009690.1 (596) NP_011043.1 (367) NP_015085.1 AN1163 (801) 66.38 56.21 64.90 67.46 55.29 SUPPLEMENTARY INFORMATION GSTUMT00002556001* TmelDHN1 GSTUMT00006625001 Tmelgst2 GSTUMT00006718001 TmelGsto1 GSTUMT00001990001 MANTUM00001990001 GSTUMT00001866001 Tmelucp7 GSTUMT00007789001* TmelNAM9 GSTUMT00000027001 TmeldgoD (233) DHN1 (264) Glutathione S-transferase 2 (224) Glutathione transferase omega-1 (234) J domain-containing protein C21orf55 (512) UBA domain-containing protein 7 (922) 37S ribosomal protein NAM9, mitochondrial precursor (311) D-galactonate 
dehydratase (387) *The gene model has been edited. ** The gene models have been merged. 73 (200) - 6 - NCU04667 (328) 42,44 1 NP_014170.1 (354) 307 38,56 44.50 100 53 - - 1 - - 2 2 35,18 2 42,29 AN8798 (884) BC1G_14515 (424) 38.71 2 NP_010606.1 (668) NP_014262.1 (486) MGG_09138 (225) BC1G_12605 (137) BC1G_01357 (565) - 5 - - MGG_07935 (385) 67.18 55,36 42.35 55.32 SUPPLEMENTARY INFORMATION Table S20. Amino acyl-tRNA synthetase genes in T. melanosporum. Gene Model Name Putative function (length) EST number Mycelium GSTUMT00007997001 TmelAARS GSTUMT00006820001 TmelCARS GSTUMT00002673001 TmelDARS GSTUMT00008559001 TmelDARS2 GSTUMT00010264001 TmelEARS GSTUMT00001301001 TmelEARS2 GSTUMT00002349001 TmelFARS2 GSTUMT00004266001 TmelFARSA GSTUMT00000015001 TmelFARSB GSTUMT00003052001 TmelGARS GSTUMT00004084001 TmelHARS GSTUMT00005330001 TmelIARS GSTUMT00011254001 TmelIARS2 GSTUMT00002716001 TmelKARS GSTUMT00009944001 TmelLARS GSTUMT00006079001 TmelLARS2 GSTUMT00008601001 TmelMARS GSTUMT00011730001 TmelMARS2 Alanyl-tRNA synthetase, cytoplasmic (959) Cysteinyl-tRNA synthetase (718) Aspartyl-tRNA synthetase, cytoplasmic (552) Aspartyl-tRNA synthetase, mitochondrial (593) Glutamyl-tRNA synthetase, cytoplasmic (721) Glutamyl-tRNA synthetase, mitochondrial (381) Phenylalanyl-tRNA synthetase, mitochondrial (421) Phenylalanyl-tRNA synthetase alpha chain (484) Phenylalanyl-tRNA synthetase beta chain (592) Glycyl-tRNA synthetase (609) Histidyl-tRNA synthetase (490) Isoleucyl-tRNA synthetase, cytoplasmic (1075) Isoleucyl-tRNA synthetase, mitochondrial (956) Lysyl-tRNA synthetase, cytoplasmic (618) Leucyl-tRNA synthetase, cytoplasmic (1066) Leucyl-tRNA synthetase, mitochondrial (838) Methionyl-tRNA synthetase, cytoplasmic (817) Methionyl-tRNA synthetase, mitochondrial 74 Yeast BRH Fruiting Acc. No. % body (length aa) id Filamentous BRH Acc. No. 
(length aa) % id 6 10 NP_014980 AN9419 62% (958) (962) 2 8 NP_014152 BC1G_05496 44% 42% (767) (797) 2 5 NP_013083 AN4550 61% (557) (556) 54% 0 1 NP_015221 AN1710 38% (658) (680) 46% 6 22 NP_011269 NCU08894 54% (708) (637) 57% 0 1 NP_014609 BC1G_09576 47% 54% (563) (613) 0 1 NP_015372 BC1G_07094 53% 55% (469) (493) 2 5 NP_116631 NCU05095 54% (503) (518) 3 2 NP_013161 BC1G_10189 55% 70% (595) (610) 8 17 NP_009679 MGG_06321 58% (667) (666) 4 4 NP_015358 BC1G_07247 55% 64% (546) (535) 8 4 NP_009477 BC1G_13385 61% 74% (1072) (1070) 4 0 NP_015285 AN10393 38% (1002) (1001) 46% 1 4 NP_010322 AN1913 63% (591) (607) 69% 1 4 NP_015165 BC1G_15093 55% 60% (1090) (1125) 1 1 NP_013486 MGG_04042 42% (894) (918) 4 24 NP_011780 BC1G_04159 62% 62% (751) (643) 0 1 NP_011687 AN3865 38% (575) (580) 68% 62% 61% 47% 55% SUPPLEMENTARY INFORMATION (589) Asparaginyl-tRNA GSTUMT00011574001 TmelNARS synthetase, cytoplasmic (579) Prolyl-tRNA synthetase GSTUMT00009788001 TmelPARS (317) Glutaminyl-tRNA GSTUMT00008811001 TmelQARS synthetase, cytoplasmic (791) Asparaginyl-tRNA synthetase, GSTUMT00000298001 TmelQARS2 mitochondrial (492) Arginyl-tRNA GSTUMT00011569001 TmelRARS synthetase, cytoplasmic (633) Seryl-tRNA synthetase, GSTUMT00005319001 TmelSARS cytoplasmic (444) Seryl-tRNA synthetase, GSTUMT00001005001 TmelSARS2 mitochondrial (527) Threonyl-tRNA GSTUMT00010648001 TmelTARS synthetase, cytoplasmic (658) Threonyl-tRNA synthetase, GSTUMT00009747001 TmelTARS2 mitochondrial (459) Valyl-tRNA synthetase GSTUMT00009839001 TmelVARS (905) Tryptophanyl-tRNA GSTUMT00002674001 TmelWARS synthetase, cytoplasmic (444) Tryptophanyl-tRNA synthetase, GSTUMT00007805001 TmelWARS2 mitochondrial (369) Tyrosyl-tRNA GSTUMT00012225001 TmelYARS synthetase, cytoplasmic (418) Tyrosyl-tRNA synthetase, GSTUMT00011504001 TmelYARS2 mitochondrial (449) 75 2 11 NP_011883 AN7479 55% (554) (575) 11 8 NP_011884 BC1G_07479 39% 55% (688) (589) 2 3 NP_014811 NCU07926 50% (809) (652) 62% 4 9 NP_009953 NCU06866 39% (492) (490) 48% 5 
5 NP_010628 AN6368 61% (607) (651) 58% 2 2 NP_010306 BC1G_06103 61% 71% (462) (472) 0 0 NP_011875 NCU09594 38% (446) (552) 4 0 NP_116578 BC1G_11336 56% 65% (734) (788) 0 1 NP_012727 NCU03129 42% (462) (499) 55% 6 2 NP_011608 MGG_04396 48% (1104) (1100) 49% 1 1 NP_014544 BC1G_07147 60% 61% (432) (475) 1 1 NP_010554 AN6488 46% (379) (385) 54% 1 3 NP_011701 MGG_02449 65% (394) (389) 58% 1 0 NP_015228 NCU03030 38% (492) (670) 36% 61% 50% SUPPLEMENTARY INFORMATION Table S21. Translation factor genes in T. melanosporum. EST number Gene Model Name Putative function (length) Yeast BRH Acc. No. (length) FLM FB 79 71 17 9 1 0 3 10 8 10 11 30 8 5 ----- ----- 3 3 6 3 4 1 2 4 9 5 2 2 4 3 6 5 5 11 ----- 3 6 NO Filamentous BRH % id Acc. No. ( length) % id 1. Initiation GSTUMT90000712001 TmelSUI1 GSTUMT00004265001 TmelTIF11 GSTUMT00001399001 TmelEIF1A2 GSTUMT00002346001 TmelSUI2 GSTUMT00006311001 TmeleIF2A GSTUMT00006825001 TmelSUI3 GSTUMT00008017001 TmelGCD11 GSTUMT00004384001 TmelGCN3 GSTUMT00004293001 TmelCDC7 GSTUMT00011372001 TmelEIF2B3 GSTUMT00010167001 TmelGCD2 GSTUMT00003358001 TmelGCD6 GSTUMT00012096001 TmelTIF32 GSTUMT00005362001 TmelSPAC25G10.08 GSTUMT00004436001 TmelTIF33 GSTUMT00000237001 TMELTIF35 GSTUMT00008033001 GSTUMT00008034001 TMELMOE1 GSTUMT00004946001 TMELSPBC4C3.07 Initiation factor 1 (114) Initiation factor 1A (145) Initiation factor 1A-like (148) Initiation factor 2a (306) Initiation factor 2A (713) Initiation factor 2b (316) Initiation factor 2g (530) Initiation factor 2Ba (332) Initiation factor 2Bb (137) Initiation factor 2Bg (573) Initiation factor 2Bd (506) Initiation factor 2Be (683) Initiation factor 3A (1072) Initiation factor 3B (725) Initiation factor 3C (854) Initiation factor 3G (289) Initiation factor 3D (578) Initiation factor 3F (332) NP_014155.1 (108) NP_013987.1 (145) ----NP_012540.1 (304) NP_011568.1 (642) 60.00 64.06 NP_015087.1 (285) NP_010942.1 (527) NP_012951.1 (305) 45.83 NP_013394.1 (381) NP_014903.1 (578) NP_011597.1 (651) 38.37 
NP_010497.1 (712) NP_009635.1 (964) NP_015006.1 (763) 34.23 NP_014040.1 (812) NP_010717.1 (274) 35.86 BC1G_06212 (870) 35.77 AN10765 (290) BC1G_12797 ----(601) AN10182 ----(346) 81.57 39.14 AN2992 (305) AN4470 (515) AN0167 (353) 63.90 83.74 35.63 AN1344 40.00 (434) 22.40 BC1G_00730 35.46 (557) 35.94 BC1G_08380 40.83 (478) AN10459 41.67 (705) 37.24 AN2743 69.60 (1036) 35.83 BC1G_11866 60.98 (745) Initiation factor 3E (454) 5 13 ----- ----- GSTUMT00005308001 TMELGCD10 Initiation factor 3g (534) 1 4 NP_014337.1 (478) 29.18 GSTUMT00011424001 TMELSPAC821.05 3 6 ----- GSTUMT00009054001 TMELSUM1 6 11 NP_013866.1 (347) GSTUMT00009595001 TMELEIF3J 1 1 ----- GSTUMT00007524001 TMELEIF3K ----- 2 ----- GSTUMT00003100001 TMELEIF3X 2 13 NP_013725.1 (1277) 76 82.93 AN0105 42.36 (136) 68.04 NCU08277 81.72 (311) 37.79 MGG_00847 50.41 (705) TMELINT6 Initiation factor 3K (233) Initiation factor 3X (1296) 62.71 ----- GSTUMT00006261001 GSTUMT00006262001 GSTUMT00006263001 Initiation factor 3H (366) Initiation factor 3I (333) Initiation factor 3J (283) AN4742 (115) AN8712 (150) 65.70 66.67 73.17 67.97 AN2907 (449) 54.55 AN8066 (551) 42.97 AN1270 59.49 (367) 54.30 BC1G_04452 73.64 (336) MGG_05134 45.52 ----(273) ----- BC1G_11363 64.98 (246) 24.94 BC1G_11770 57.75 (1307) ----- SUPPLEMENTARY INFORMATION GSTUMT00004430001 TMELEIF3L GSTUMT00008763001 TMELEIF4A3 GSTUMT00004300001 TMELEIF4A3B GSTUMT00007732001 TMELSCE3 GSTUMT00011734001 TMELEIF4EA GSTUMT00009184001 TMELCDC33 GSTUMT00007631001 TMELEIF4EB GSTUMT00004418001 TMELTIF471 Initiation factor 3l (478) Initiation factor 4A-III (472) Initiation factor 4A-IIIb Initiation factor 4B (493) Initiation factor 4Ea (340) Initiation factor 4Eb (249) Initiation factor 4Ec (253) Initiation factor 4G (1068) Initiation factor 5 (420) Initiation factor 5A (158) Initiation factor 5B (1075) Initiation factor 6 (246) NCU06279 69.87 (475) 70.30 BC1G_00466 92.66 (399) 6 4 ----- 20 36 NP_012397.1 (395) 6 3 11 5 3 2 NO 9 3 NP_014502.1 (213) 49.20 4 2 NO ----- 
10 6 40.46 23 13 10 11 2 6 1 8 NP_011466.1 (914) NP_015366.1 (405) NP_012581.1 (157) NP_009365.1 (1002) NP_015341.1 (245) NP_010304.1 (399) NP_015489.1 (436) ----- 62.28 BC1G_07971 91.73 (400) 28.80 BC1G_07588 39.13 (575) AN8191 48.86 ----(351) 41.56 AN3411 68.72 (245) BC1G_00854 45.73 (245) AN6060 (1519) AN6067 (424) NCU05274 (164) AN4038 (1073) MGG_01671 (246) 48.33 60.58 GSTUMT00004696001 TMELTIF5 GSTUMT00002158001 TMELHYP2 GSTUMT00009928001 TMELFUN12 GSTUMT00006515001 TMELTIF6 GSTUMT00011471001 TMELPABP Poly-A binding protein (750) 3 2 NP_011092.1 (577) 60.16 MGG_09505 83.40 (762) GSTUMT00000021001 TMELTEF1A Elongation factor 1A (421) 129 42 NP_015405.1 (458) 74.47 BC1G_09492 80.41 (461) GSTUMT00010479001 GSTUMT00010480001 TMELTEF1B 6 4 NO GSTUMT00010962001 TMELTEF1C 7 4 NO GSTUMT00001146001 TMELEFB1 14 1 NP_009398.1 (206) BC1G_04388 55.46 (757) NCU09513 58.78 ----(651 50.47 NCU06035 59.31 (232) GSTUMT00005165001 GSTUMT00005166001 TMELTEF4 10 8 NP_015277.1 (415) 47.94 BC1G_00939 57.86 (416) GSTUMT00001342001 GSTUMT00001343001 GSTUMT00001344001 TMELRIA1A Elongation factor 2 (830) 30 14 NP_014776.1 (842) 79.35 GSTUMT00007649001 TMELRIA1B Elongation factor 2 (1075) 1 ----- NP_014236.1 (1110) 46.65 BC1G_09557 47.89 (1042) TMELHEF3 Elongation factor 3 (1064) 135 17 NP_014384.1 (1044) 62.16 BC1G_15638 73.82 (947) GSTUMT00006283001 TMELSUP45B Release factor 1 (445) 2 3 NO ----- GSTUMT00002415001 TMELSUP45A 8 3 71.90 GSTUMT00006992001 TMELSUP35 5 17 NP_009701.1 (437) NP_010457.1 (685) 71.43 66.61 77.14 73.89 77.06 84.55 2. Elongation GSTUMT00001743001 GSTUMT00001744001 3. Termination Elongation factor 1A (589) Elongation factor 1A (615) Elongation factor 1Bα (220) Elongation factor 1Bγ (402) Release factor 1 (434) Release factor 3 (730) 77 ----- AN6330 (845) NO 89.68 ----- AN8853 85.31 (436) 65.19 BC1G_14662 64.56 (727) SUPPLEMENTARY INFORMATION Table S22. Comparison of T. melanosporum translation factor genes with those of other eukaryotic organisms. 
BLAST RESULT Gene name GSTUMT No. Best match of Neurospora crassa homolog Best matcha S. cerevisiae S. pombe 00000712001 00004265001 00001399001 00002346001 00006311001 00006825001 00008017001 00004384001 00004293001 00011372001 00010167001 00003358001 00012096001 00005362001 00004436001 00000237001 00008034001 00004946001 00006263001 00005308001 00011424001 00009054001 00009595001 00007524001 00003100001 00004430001 00008763001 00004300001 00007732001 00011734001 00009184001 00007631001 00004418001 00004696001 00002158001 00009928001 00006515001 00011471001 Sp Sp Sp Sc Sp Sp Sc Sp Sc Animal Sp Sp Sp Sp Sp Sp Sp Sp Animal Sc Plant Sp NF Animal Sp Animal Sp Sp Sp Plant Sp Plant Sp Sp Sc Sp Sc Sp 9e-31 3e-40 NFd 3e-90 2e-76 1e-47 0.00 2e-54 6e-11 6e-13 7e-55 e-109 2e-93 e-121 e-118 3e-32 NF NF NF 6e-47 NF e-102 NF NF 2e-80 NF e-143 e-139 NF e-23 2e-45 NF e-58 6e-84 6e-57 0.00 e-107 2e-88 7e-31 4e-46 2e-09 5e-89 e-142 3e-49 0.00 3e-58 5e-07 8e-23 1e-60 e-143 0.00 e-175 0.00 e-55 2e-98 9e-86 NF 2e-46 6e-47 e-132 NF NF 0.00 NF 0.00 0.00 5e-29 2e-21 3e-51 9e-20 e-86 2e-87 5e-53 0.00 e-100 e-106 4e-26 6e-45 6e-05 7e-64 4e-87 3e-43 0.00 2e-38 2e-05 2e-25 2e-54 3e-78 e-143 e-132 e-120 2e-39 4e-80 4e-45 2e-18 2e-23 3e-47 6e-98 NF 6e-22 e-141 e-107 e-159 e-177 8e-12 2e-40 e-36 4e-28 3e-41 2e-68 6e-51 0.00 2e-97 4e-84 5e-26 1e-37 NFd e-68 e-67 4e-41 e-172 7e-46 2e-06 4e-14 8e-51 8e-95 e-117 e-123 e-140 e-34 3e83 4e-43 2e-16 8e-12 5e-51 e-78 NF 2e-17 9e-90 4e-83 e-150 e-163 9e-9 2e-44 5e-32 2e-30 3e-38 7e-38 4e-42 0.00 5e-93 e-73 Sc Animal Sp Sp Sp Sc Sc Sp Sc Sp Sp Sp Sp Sp Sp Animal Sp Sp Animal Sc Sp Sp Animal Animal Sp Animal Sp Sp Sp Plant Sp Animal Sp Sp Sc Sc Sc Sp 00000021001 00010479001 00010962001 00001146001 00005165001 00001342001 00007649001 00001743001 Sp Animal Animal Sp Sc Sc Sp Sp e-139 NF NF 2e-44 3e-72 0.00 e-120 0.00 e-140 5e-63 NF 8e-51 3e-54 0.00 0.00 0.00 NF 8e-72 e-112 e-47 2e-29 0.00 e-113 6e-57 NF NF NF 3e-27 3e-34 e-170 e-143 3e-52 Sp Animal Animal Sp 
Sc Sc Sp Sc 00006283001 00006992001 00002415001 Animal Sc Animal e-169 0.00 e-176 e-155 0.00 e-166 e-172 4e-42 0.00 e-165 6e-42 e-173 --Sc Animal Animalsb Plantsc
1. Initiation EIF1 EIF1A EIF1A-like EIF2α EIF2A EIF2β EIF2γ EIF2Bα EIF2Bβ EIF2Bγ EIF2Bδ EIF2Bε EIF3A EIF3B EIF3C EIF3G EIF3D EIF3F EIF3E EIF3γ EIF3H EIF3I EIF3J EIF3K EIF3X EIF3L EIF4A3 EIF4A3B EIF4B EIF4EA EIF4EB EIF4EC EIF4G EIF5 EIF5A EIF5B EIF6 PABP
2. Elongation EEF1A EEF1A EEF1A EEF1Bα EEF1Bγ EEF2 EEF2 EEF3
3. Termination ERF1 ERF3 ERF1
a Sp, S. pombe; Sc, S. cerevisiae.
b Anopheles gambiae, Drosophila melanogaster, Homo sapiens, Mus musculus, Oryctolagus cuniculus, Rattus norvegicus, Xenopus laevis.
c Arabidopsis thaliana, Oryza sativa, Triticum aestivum, Zea mays.
d NF, not found.

Table S23. Major classes of carbohydrate-active enzymes in T. melanosporum compared with other sequenced ascomycetous and basidiomycetous fungi.

Species | GH* | GT | CBM | CE | PL
Pezizomycetes
Tuber melanosporum | 95 | 96 | 18 | 13 | 3
Saccharomycetes
Saccharomyces cerevisiae S288C | 45 | 67 | 12 | 3 | 0
Candida albicans SC5314 | 58 | 69 | 4 | 3 | 0
Candida glabrata CBS138 | 38 | 73 | 12 | 3 | 0
Eurotiomycetes
Aspergillus nidulans FGSC A4 | 247 | 91 | 36 | 29 | 19
Aspergillus oryzae RIB40 | 285 | 114 | 30 | 26 | 21
Aspergillus fumigatus Af293 | 263 | 103 | 55 | 29 | 13
Sordariomycetes
Magnaporthe grisea 70-15 | 231 | 94 | 58 | 47 | 4
Trichoderma reesei | 200 | 103 | 36 | 16 | 3
Fusarium graminearum PH-1 | 243 | 110 | 61 | 42 | 20
Neurospora crassa 74A | 171 | 76 | 39 | 21 | 3
Podospora anserina S mat+ | 229 | 88 | 75 | 16 | 7
Archiascomycetes
Schizosaccharomyces pombe 972H | 46 | 61 | 5 | 5 | 0
Basidiomycetes
Cryptococcus neoformans JEC21 | 75 | 68 | 10
Ustilago maydis 521 | 97 | 64 | 9
Laccaria bicolor S238 | 165 | 88 | 14
Phanerochaete chrysosporium RP78 | 179 | 66 | 47
Coprinopsis cinerea Okayama 7 (#130) | 210 | 72 | 90
CE and PL values for the Basidiomycetes (row assignment not recoverable): 9 3 1 7 4 13 20 7

* GH, glycosyl hydrolase family; GT, glycosyl transferase family; PL, polysaccharide lyase family; CE, carbohydrate esterase family; CBM, carbohydrate-binding module family.
Grey cells indicate the species having the largest set of enzymes. The highest number of entries in each category is indicated in bold.

Table S24. Carbohydrate-active enzymes (CAZymes) encoded in the T. melanosporum genome.

Gene model ID | Family | Definition | Metabolism

Carbohydrate Esterase Family
GSTUMT00009714001 | CE1 | candidate esterase related to S-formylglutathione hydrolase and feruloyl esterase |
GSTUMT00012611001 | CE1 | candidate esterase distantly related to feruloyl esterase |
GSTUMT00004210001 | CE4 | candidate esterase related to chitin deacetylase; N-term CBM18 module | cell-wall chitin
GSTUMT00007845001 | CE4 | candidate esterase related to cyclic imide hydrolase |
GSTUMT00009302001 | CE4 | candidate esterase related to chitin deacetylase | cell-wall chitin
GSTUMT00005594001 | CE4 | candidate esterase distantly related to chitin deacetylase | cell-wall chitin
GSTUMT00000534001 | CE4 | candidate esterase related to chitin deacetylase | cell-wall chitin
GSTUMT00012125001 | CE4 | candidate esterase related to chitin deacetylase | cell-wall chitin
GSTUMT00003032001 | CE4 | candidate esterase related to chitin deacetylase | cell-wall chitin
GSTUMT00007976001 | CE4 | candidate esterase related to chitin deacetylase; N-term CBM18 module | cell-wall chitin
GSTUMT00006497001 | CE4 | candidate esterase related to chitin deacetylase; 3 N-term CBM18 modules | cell-wall chitin
GSTUMT00012789001 | CE8 | candidate esterase distantly related to pectin methylesterase; possibly GPI-anchored | pectin
GSTUMT00003723001 | CE12 | candidate esterase related to rhamnogalacturonan acetylesterase | pectin

Carbohydrate-binding Module Family
GSTUMT00001211001 GSTUMT00012353001 GSTUMT00004210001 GSTUMT00002591001 GSTUMT00000524001 GSTUMT00010448001 GSTUMT00007976001 GSTUMT00009270001 GSTUMT00006497001 GSTUMT00004733001 GSTUMT00007975001 GSTUMT00000402001 GSTUMT00007852001 GSTUMT00010976001 GSTUMT00007490001 GSTUMT00006246001 GSTUMT00007342001 GSTUMT00007945001
CBM1 CBM18 CBM18 CBM18 CBM18 CBM18 CBM18 CBM18 CBM18 CBM18 CBM19
CBM21 CBM21 CBM32 CBM43 CBM48 CBM48 CBM48
candidate ß-glycosidase related to endoglucanase; C-term CBM1 module
Chitin binding domain
candidate esterase related to chitin deacetylase; N-term CBM18 module
candidate esterase related to chitin deacetylase; N-term CBM18 module
candidate esterase related to chitin deacetylase; 3 N-term CBM18 modules
candidate ß-1,3/6-glucan-active enzyme; N-term CBM18 module; GPI-anchored
distantly related to protein phosphatase glycogen-binding regulatory subunit
candidate α-glycosidase distantly related to α-amylase; N-term CBM21 module
candidate α,α-trehalase; C-term CBM32 module
candidate ß-1,3-glucanosyltransferase; C-term CBM43 module; GPI-anchored
candidate α-1,4-glucan branching enzyme

Glycosyl Hydrolase Family
GSTUMT00001302001 GSTUMT00003864001 GSTUMT00005036001 GSTUMT00010944001 GSTUMT00007083001 GSTUMT00003224001 GSTUMT00005119001 GSTUMT00001003001 GSTUMT00006007001 GSTUMT00005599001 GSTUMT00008973001 GSTUMT00008530001 GSTUMT00006999001 GSTUMT00010324001 GSTUMT00003767001 GSTUMT00003582001 GSTUMT00009198001 GSTUMT00009298001 GSTUMT00008058001 GSTUMT00004119001 GSTUMT00006246001 GSTUMT00002531001 GSTUMT00004257001 GSTUMT00005820001 GSTUMT00007852001 GSTUMT00006610001 GSTUMT00004366001 GSTUMT00002130001 GSTUMT00006794001
GH1 GH1 GH2 GH2 GH3 GH3 GH3 GH3 GH3 GH3 GH5 GH5 GH5 GH5 GH5 GH5 GH10 GH12 GH13 GH13 GH13 GH13 GH13 GH13 GH13 GH13 GH15 GH16 GH16
candidate ß-glucosidase
candidate ß-glycosidase distantly related to ß-galactosidase/ß-glucosidase
candidate ß-mannosidase
candidate ß-glycosidase related to ß-mannosidase
candidate ß-glucosidase
candidate ß-glucosidase
candidate ß-glycosidase distantly related to bacterial ß-N-acetylhexosaminidase
candidate ß-glucosidase or exo-ß-1,3-glucosidase
candidate ß-glycosidase
candidate ß-glucosidase
candidate endoglucanase
candidate exo-1,3-ß-glucanase
candidate exo-1,3-ß-glucanase
candidate ß-1,6-glucanase
candidate ß-glycosidase distantly related to bacterial endoglycoceramidase
candidate
ß-glycosidase distantly related to bacterial endoglycoceramidase candidate ß-xylanase candidate endo-1,4-glucanase (xyloglucan-specific) candidate α-glycosidase related to oligo-1,6-glucosidase candidate glycogen-debranching enzyme candidate α-1,4-glucan branching enzyme candidate α-1,3/4-glucan synthase candidate maltotriose-producing a-amylase candidate α-amylase candidate α-glycosidase distantly related to a-amylase candidate α-amylase candidate glucoamylase candidate cell-wall ß-1,6-glucan active enzyme; membrane-anchored candidate ß-1,3/6-glucan-active enzyme; GPI-anchored 80 cell-wall ß-glucan cellulose/xyloglucan cell-wall ß-glucan cell-wall ß-glucan cell-wall ß-glucan cell-wall xylan xyloglucan starch/glycogen starch/glycogen starch/glycogen cell-wall a-glucan cell-wall a-glucan starch/glycogen starch/glycogen starch/glycogen starch/glycogen cell-wall ß-glucan cell-wall ß-glucan SUPPLEMENTARY INFORMATION GSTUMT00003034001 GSTUMT00001660001 GSTUMT00004630001 GSTUMT00002321001 GSTUMT00004733001 GSTUMT00003996001 GSTUMT00008448001 GSTUMT00010908001 GSTUMT00007484001 GSTUMT00011326001 GSTUMT00011529001 GSTUMT00011196001 GSTUMT00001975001 GSTUMT00006229001 GSTUMT00010084001 GSTUMT00006876001 GSTUMT00012061001 GSTUMT00010928001 GSTUMT00012011001 GSTUMT00004199001 GSTUMT00005998001 GSTUMT00008333001 GSTUMT00004616001 GSTUMT00010862001 GSTUMT00003340001 GSTUMT00007119001 GH16 GH16 GH16 GH16 GH16 GH17 GH17 GH17 GH17 GH18 GH18 GH18 GH18 GH18 GH20 GH20 GH24 GH25 GH28 GH28 GH31 GH31 GH31 GH31 GH31 GH32 GSTUMT00007318001 GSTUMT00001902001 GSTUMT00006536001 GSTUMT00003229001 GSTUMT00011089001 GSTUMT00006238001 GSTUMT00006375001 GSTUMT00001932001 GSTUMT00008220001 GSTUMT00006578001 GSTUMT00011889001 GSTUMT00008775001 GSTUMT00012573001 GSTUMT00008986001 GSTUMT00001211001 GSTUMT00012497001 GSTUMT00006099001 GSTUMT00010976001 GSTUMT00004023001 GSTUMT00010889001 GSTUMT00007490001 GSTUMT00002553001 GSTUMT00000527001 GSTUMT00012299001 GSTUMT00004785001 GSTUMT00008680001 
GSTUMT00004651001 GSTUMT00004301001 GSTUMT00000874001 GSTUMT00006913001 GSTUMT00005050001 GSTUMT00005750001 GSTUMT00008143001 GSTUMT00007295001 GSTUMT00011606001 GSTUMT00007970001 GH36 GH36 GH38 GH43 GH45 GH47 GH47 GH47 GH47 GH47 GH55 GH55 GH61 GH61 GH61 GH61 GH63 GH65 GH72 GH72 GH72 GH72 GH75 GH76 GH76 GH76 GH76 GH76 GH76 GH78 GH78 GH81 GH81 GH81 GH85 GH92 GSTUMT00008776001 GSTUMT00001255001 GSTUMT00004007001 GSTUMT00005391001 – – – – GSTUMT00010384001 GSTUMT00011693001 GSTUMT00007486001 GSTUMT00009030001 GT1 GT1 GT1 GT1 candidate ß-(trans)glycosidase candidate ß-(trans)glycosidase; membrane-anchored candidate ß-(trans)glycosidase distantly related to bacterial endo-1,3-b-glucanase candidate cell-wall ß-1,6-glucan active enzyme; membrane-anchored candidate ß-1,3/6-glucan-active enzyme; N-term CBM18 module; GPI-anchored candidate ß-(trans)glycosidase related to ß-1,3-glucanase; membrane-anchored candidate ß-(trans)glycosidase distantly related to exo-b-1,3-glucanase candidate ß-(trans)glycosidase distantly related to exo-b-1,3-glucanase candidate ß-(trans)glycosidase distantly related to exo-b-1,3-glucanase candidate ß-glycosidase related to chitinase candidate ß-glycosidase distantly related to chitinase candidate ß-glycosidase related to chitinase candidate ß-glycosidase distantly related to bacterial chitinase candidate chitinase candidate ß-glycosidase distantly related to N-acetylglucosaminidase candidate ß-glycosidase related to exochitinase candidate ß-glycosidase candidate ß-glycosidase candidate polygalacturonase candidate ß-glycosidase related to rhamnogalacturonase candidate α-1,4-glucan lyase candidate α-1,4-glucan lyase candidate α-glucosidase candidate α-glucosidase candidate α-glycosidase related to a-glucosidase candidate ß-(transglycosidase) distantly related to ß-fructosidase and ßfructosyltransferase candidate α-glycosidase distantly related to plant α-galactosidase candidate α-galactosidase distantly related to α-galactosidase candidate 
α-mannosidase candidate arabinanase candidate endoglucanase candidate α-glycosidase related to α-mannosidase; transmembrane-anchored candidate α-glycosidase related to α-1,2-mannosidase; transmembrane-anchored candidate α-1,2-mannosidase; transmembrane-anchored candidate α-1,2-mannosidase candidate α-glycosidase distantly related to animal α-1,2-mannosidase candidate exo-b-1,3-glucanase candidate ß-glycosidase related to exo-b-1,3-glucanase candidate ß-glycosidase distantly related to endoglucanase candidate ß-glycosidase distantly related to endoglucanase candidate ß-glycosidase related to endoglucanase; C-term CBM1 module candidate ß-glycosidase distantly related to endoglucanase candidate processing a-glucosidase; ER-retention signal candidate α,α-trehalase; C-term CBM32 module candidate ß-1,3-glucanosyltransferase; GPI-anchored candidate ß-1,3-glucanosyltransferase; GPI-anchored candidate ß-1,3-glucanosyltransferase; C-term CBM43 module; GPI-anchored candidate ß-1,3-glucanosyltransglycosylase; GPI-anchored fragment of candidate chitosanase candidate α-(trans)glycosidase candidate α-(trans)glycosidase candidate α-(trans)glycosidase candidate cell-wall α-(trans)glycosidase; GPI-anchored candidate cell-wall α-(trans)glycosidase; GPI-anchored candidate cell α-(trans)glycosidase; GPI-anchored candidate α-L-rhamnosidase candidate ß-glycosidase related to α-L-rhamnosidase candidate ß-glycosidase related to ß-1,3-glucanase candidate ß-glycosidase related to ß-1,3-glucanase candidate ß-glycosidase distantly related to ß-1,3-glucanase candidate ß-glycosidase distantly related to endo-b-N-acetylglucosaminidase candidate α-glycosidase distantly related to bacterial α-1,2-mannosidase; transmembrane-anchored not assigned not assigned not assigned not assigned Glycosyl Transferase Family candidate ß-glycosyltransferase related to plant UDP-Glc:sterol ß-glucosyltransferase candidate ß-glycosyltransferase; N-terminal domain candidate UDP-Glc: sterol ß-glucosyltransferase 
candidate ß-glycosyltransferase; C-terminal domain 81 cell-wall ß-glucan cell-wall ß-glucan cell-wall ß-glucan cell-wall ß-glucan cell-wall ß-glucan cell-wall ß-glucan cell-wall ß-glucan cell-wall chitin cell-wall chitin cell-wall chitin cell-wall chitin cell-wall chitin cell-wall chitin cell-wall chitin pectin pectin starch/glycogen; ? starch/glycogen; ? sucrose? cellulose/xyloglucan N-glycan N-glycan N-glycan cell-wall a-mannan N-glycan cell-wall ß-glucan cell-wall ß-glucan cellulose? cellulose? cellulose cellulose? N-glycans trehalose cell-wall ß-glucan cell-wall ß-glucan cell-wall ß-glucan cell-wall ß-glucan cell-wall chitin cell-wall cell-wall cell-wall cell-wall cell-wall pectin pectin cell-wall ß-glucan cell-wall ß-glucan cell-wall ß-glucan cell-wall SUPPLEMENTARY INFORMATION GSTUMT00003671001 GT1 GSTUMT00001747001 GSTUMT00009079001 GSTUMT00002409001 GSTUMT00008447001 GSTUMT00011849001 GSTUMT00008119001 GSTUMT00005561001 GSTUMT00008120001 GSTUMT00004894001 GT2 GT2 GT2 GT2 GT2 GT2 GT2 GT2 GT2 candidate ß-glycosyltransferase distantly related to plant UDP-Glc: sterol ßglucosyltransferase candidate chitin synthase candidate chitin synthase candidate chitin synthase candidate chitin synthase candidate chitin synthase candidate chitin synthase candidate chitin synthase candidate chitin synthase candidate dolichyl-phosphate ß-glucosyltransferase GSTUMT00008671001 GT2 candidate ß-glycosyltransferase related to dolichyl-phosphate ß-mannosyltransferase GSTUMT00003828001 GSTUMT00001241001 GT3 GT4 candidate glycogen synthase candidate NDP-sugar α-glycosyltransferase related to α-1,3/6-mannosyltransferases GSTUMT00011323001 GSTUMT00003184001 GT4 GT4 candidate UDP-GlcNAc: phosphatidylinositol-α-N-acetylglucosaminyltransferase candidate NDP-sugar a-glycosyltransferase related to α-1,2-mannosyltransferase GSTUMT00000764001 GSTUMT00000765001 GSTUMT00000797001 GSTUMT00000799001 GSTUMT00000802001 GSTUMT00001113001 GSTUMT00001114001 GSTUMT00001115001 GSTUMT00003187001 
GSTUMT00004809001 GSTUMT00006327001 GSTUMT00006328001 GSTUMT00006329001 GSTUMT00006330001 GSTUMT00006331001 GSTUMT00006332001 GSTUMT00006333001 GSTUMT00008118001 GSTUMT00009893001 GSTUMT00009901001 GSTUMT00009971001 GSTUMT00009975001 GSTUMT00011362001 GSTUMT00011363001 GSTUMT00011364001 GSTUMT00011366001 GSTUMT00011486001 GSTUMT00011488001 GSTUMT00011489001 GSTUMT00011491001 GSTUMT00011493001 GSTUMT00011494001 GSTUMT00011496001 GSTUMT00011497001 GSTUMT00011507001 GSTUMT00002531001 GSTUMT00005701001 GSTUMT00011097001 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT4 GT5 GT8 GT8 GSTUMT00011376001 GT15 candidate trehalose phosphorylase (TP) candidate trehalose phosphorylase candidate trehalose phosphorylase candidate trehalose phosphorylase candidate trehalose phosphorylase candidate NDP-sugar α-glycosyltransferase/α-glycan phosphorylase related to TP candidate trehalose phosphorylase candidate trehalose phosphorylase candidate trehalose phosphorylase candidate trehalose phosphorylase candidate NDP-sugar α-glycosyltransferase/α-glycan phosphorylase related to TP candidate trehalose phosphorylase candidate NDP-sugar α-glycosyltransferase/α-glycan phosphorylase related to TP candidate trehalose phosphorylase candidate trehalose phosphorylase candidate trehalose phosphorylase candidate NDP-sugar α-glycosyltransferase/α-glycan phosphorylase related to TP candidate NDP-sugar α-glycosyltransferase/α-glycan phosphorylase related to TP candidate trehalose phosphorylase candidate trehalose phosphorylase candidate trehalose phosphorylase candidate NDP-sugar α−glycosyltransferase/α-glycan phosphorylase related to TP candidate trehalose phosphorylase candidate α-glycosyltransferase candidate NDP-sugar α-glycosyltransferase/α-glycan phosphorylase related to TP candidate trehalose-phosphorylase candidate NDP-sugar α-glycosyltransferase/α-glycan phosphorylase related to TP candidate trehalose 
phosphorylase candidate NDP-sugar α-glycosyltransferase/α-glycan phosphorylase related to TP candidate NDP-sugar α-glycosyltransferase/α-glycan phosphorylase related to TP candidate trehalose phosphorylase candidate NDP-sugar α-glycosyltransferase/α-glycan phosphorylase related to TP candidate trehalose phosphorylase candidate trehalose phosphorylase candidate trehalose phosphorylase candidate α-1,3/4-glucan synthase candidate glycogenin candidate NDP-sugar α-glycosyltransferase distantly related to inositol 1-αgalactosyltransferase candidate α-1-2-mannosyltransferase GSTUMT00008926001 GT15 candidate α-1-2-mannosyltransferase GSTUMT00000550001 GSTUMT00001845001 GSTUMT0001196100 GT20 GT20 GT20 GSTUMT00011684001 GSTUMT00006689001 GT21 GT22 GSTUMT00002928001 GT22 GSTUMT00006440001 GT22 candidate α,α-trehalose-phosphate synthase candidate α−α-trehalose-phosphate synthase candidate bifunctional α,α-trehalose-phosphate synthase / α,α-trehalose-phosphate phosphatase candidate NDP-sugar ß-glycosyltransferase related to ceramide glucosyltransferase candidate Dol-P-sugar a-glycosyltransferase distantly related to Dol-P-Man: α-1,2mannosyltransferase candidate Dol-P-sugar α-glycosyltransferase distantly related to α-1,2/6mannosyltransferases candidate Dol-P-sugar α-glycosyltransferase related to Dol-P-Man: Man3-GlcNAcphosphatidylinositol α-1,2-mannosyltransferase 82 chitin chitin chitin chitin chitin chitin chitin chitin N-glycans; Oglycans N-glycans; Oglycans glycogen N-glycans; Oglycans GPI N-glycans; Oglycans trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose trehalose α-1,3/4-glucan glycogen N-glycans; Oglycans N-glycans; Oglycans trehalose trehalose trehalose N-glycans; Oglycans GPI 
SUPPLEMENTARY INFORMATION GSTUMT00011236001 GT22 GSTUMT00003782001 GT24 candidate Dol-P-sugar α-glycosyltransferase;related to Dol-P-Man: Man2-GlcNAcphosphatidylinositoll α-1,2-mannosyltransferase α-1,2-mannosyltransferase candidate UDP-Glc: glycoprotein α-glucosyltransferase; ER-retention signal GSTUMT00004829001 GT32 candidate α-1-6-mannosyltransferase GSTUMT00006149001 GSTUMT00011329001 GSTUMT00011393001 GSTUMT00002332001 GT32 GT33 GT34 GT34 GSTUMT00005850001 GSTUMT00002656001 GSTUMT00001582001 GSTUMT00002470001 GSTUMT00009848001 GT35 GT39 GT39 GT39 GT41 GSTUMT00007491001 GSTUMT00007493001 GSTUMT00003075001 GSTUMT00009264001 GT48 GT48 GT50 GT57 GSTUMT00001621001 GT57 GSTUMT00001398001 GSTUMT00010811001 GT58 GT59 GSTUMT00011096001 GSTUMT00004018001 GSTUMT00010033001 GT62 GT62 GT62 GSTUMT00011157001 GSTUMT00006001001 GSTUMT00007196001 GSTUMT00003042001 GT66 GT69 GT69 GT76 candidate NDP-sugar α-glycosyltransferase candidate NDP-sugar ß-glycosyltransferase distantly related to ß-mannosyltransferase candidate NDP-sugar a-glycosyltransferase candidate NDP-sugar α-glycosyltransferase distantly related to α-1,2galactosyltransferase candidate glycogen phosphorylase candidate Dol-P-Man: protein-O-mannosyltransferase candidate protein-O-mannosyltransferase candidate Dol-P-Man: protein O-mannosyltransferase candidate NDP-sugar ß-glycosyltransferase related to plant and animal peptide Nacetylglucosaminyltransferase candidate 1,3-β-glucan synthase candidate 1,3-β-glucan synthase candidate Dol-P-sugar α-glycosyltransferase related to α-1,4-mannosyltransferase candidate Dol-P-sugar α-glycosyltransferase related to Dol-P-Glc: GlcMan9GlcNAc2PP-Dol α-1,3-glucosyltransferase candidate Dol-P-sugar α-glycosyltransferase related to Dol-P-Glc: Man9GlcNAc2-PPDol α-1,3-glucosyltransferase candidate Dol-P-sugar α-glycosyltransferase related to α-1,2/3-mannosyltransferases candidate Dol-P-sugar α-glycosyltransferase distantly related to α-1,2glucosyltransferase candidate 
α-1/6-mannosyltransferase candidate NDP-sugar α-glycosyltransferase related to α-1,2/6-mannosyltransferases candidate NDP-sugar α-glycosyltransferase distantly related to α-1,2/6-mannosyltransferases candidate oligosaccharyltransferase GSTUMT00010168001 GT90 candidate NDP-sugar α-glycosyltransferase candidate Dol-P-sugar α-glycosyltransferase distantly related to α-1,6-mannosyltransferase candidate NDP-sugar ß-glycosyltransferase GPI N-glycans; O-glycans N-glycans; O-glycans starch/glycogen O-glycans O-glycans O-glycans N-glycans; O-glycans β-1,3-glucan β-1,3-glucan GPI N-glycans N-glycans N-glycans N-glycans GPI

Polysaccharide Lyase Family
GSTUMT00006239001  PL1  candidate polysaccharide lyase related to pectin lyase (EC 4.2.2.10)  pectin
GSTUMT00008534001  PL1  candidate pectate lyase  pectin
GSTUMT00004952001  PL4  candidate polysaccharide lyase related to rhamnogalacturonan lyase  pectin

Carbohydrate-active enzymes (CAZymes) are categorized into different classes and families in the CAZy database (http://www.cazy.org).

Table S25. Distribution of genes coding for membrane transporter families in T. melanosporum, other sequenced ascomycetes and Laccaria bicolor.
Transporter type Family ATP-Dependent ABC (ATP-binding Cassette) ArsAB (Arsenite-Antimonite Efflux Family) F-ATPase (H+- or Na+translocating F-type, V-type and A-type ATPase MPT (Mitochondrial Protein Translocase) P-ATPase (P-type ATPase) Sec (General Secretory Pathway) TOTAL PROTEINS (ATPDependent) Ion Channels Amt (Ammonium Transporter) Annexin ClC (Chloride Channel) Mid1 (Yeast StretchActivated, Cation-Selective, Ca2+ Channel) MIP (Major Intrinsic Protein) MIT (CorA Metal Ion Transporter) MPP (Mitochondrial and Plastid Porin) MscS (Small Conductance Mechanosensitive Ion Channel NSCC2 (Non-selective Cation Channel-2) TRP-CC (Transient Receptor Potential Ca2+ Channel) VIC (Voltage-gated Ion Channel ) TOTAL PROTEINS (Ion Channels) Secondary Transporter AAAP (Amino Acid/Auxin Permease) ACR3 (Arsenical Resistance-3) AE (Anion Exchanger) AEC APC (Amino AcidPolyamine-Organocation) ACT (Amino Acid/Choline Transporter) LAT (L-type Amino Acid Transporter) YAT (Yeast Amino Acid Transporter) BASS CaCA (Ca2+:Cation Antiporter ) Evolutionary changes in Tuber Tuber1 Magnaporte2 Neurospora Botrytis Aspergillus Saccharomyces Laccaria 27 38 29 41 39 22 53 c 1 2 1 1 1 1 1 n 21 20 21 17 28 25 23 n 20 17 15 23 17 18 18 21 16 21 24 16 18 24 n n 7 7 7 6 7 10 8 n 4 1 3 3 1 3 1 3 3 1 3 4 2 3 3 0 1 7 1 3 n n n 1 3 1 6 1 1 1 10 1 5 1 4 1 7 n c 7 5 3 4 7 5 5 e 1 1 1 1 1 2 1 n 2 3 2 2 2 0 3 n 1 1 1 1 1 1 1 n 3 4 5 4 3 1 2 n 5 6 5 7 4 2 4 e 4 10 8 10 15 7 3 c 1 2 2 1 1 2 1 2 2 1 2 3 1 2 2 1 1 0 3 2 2 n n n 14 26 17 30 47 24 27 c 5 5 4 13 18 4 9 c 4 6 4 3 6 2 7 n 5 0 15 1 9 0 14 0 23 1 17 1 11 0 c n 5 7 8 9 6 4 7 c 4 84 SUPPLEMENTARY INFORMATION CCC (Cation-Chloride Cotransporter) CDF (Cation Diffusion Facilitator) CHR (Chromate Ion Transporter) CNT (Concentrative Nucleoside Transporter) CPA1 (Monovalent Cation:Proton Antiporter-1) CPA2 (Monovalent Cation:Proton Antiporter-2) DAACS DASS (Divalent Anion:Na+ Symporter) DMT (Drug/Metabolite Transporter) ENT (Equilibrative Nucleoside Transporter) 
FNT (Formate-Nitrite Transporter) GPH (Glycoside-PentosideHexuronide: Cation Symporter) GUP (Glycerol Uptake) KUP (K+ Uptake Permease) LCT (Lysosomal Cystine Transporter) MC (Mitochondrial Carrier) MFS (Major Facilitator Superfamily) MOP (Multidrug/Oligosaccharidyllipid/Polysaccharide) Flippase) MTC (Mitochondrial Tricarboxylate Carrier) NCS1 (Nucleobase:Cation Symporter-1) NCS2 (Nucleobase:Cation Symporter-2) NiCoT (Ni2+-Co2+ Transporter) Nramp (Metal Ion (Mn2+iron) Transporter) NSS OPT (Oligopeptide Transporter) Oxa1 (Cytochrome Oxidase Biogenesis) PiT (Inorganic Phosphate Transporter) POT (Proton-dependent Oligopeptide Transporter) RND (ResistanceNodulation-Cell Division) SSS (Solute:Sodium Symporter) SulP (Sulfate Permease) TDT (Teluriteresistance/Dicarboxylate Transporter) ThrE Trk (K+ Transporter) ZIP (Zinc (Zn2+)-Iron (Fe2+) Permease) TOTAL PROTEINS 1 1 1 1 1 1 0 n 5 8 8 8 6 5 8 c 1 1 1 0 2 0 4 n 1 1 1 0 1 0 1 n 2 4 3 4 6 2 5 c 1 0 3 1 2 0 2 0 3 1 1 0 2 0 c c 1 1 1 1 1 3 1 n 11 10 10 6 11 9 10 e 1 1 1 1 1 1 1 n 0 1 1 1 1 1 0 c 1 4 1 3 1 1 2 1 1 3 1 1 2 1 0 0 2 0 2 1 1 c e n 2 34 2 41 2 34 2 38 2 38 1 34 0 36 n c 91 251 141 204 356 85 96 c* 2 3 2 1 2 2 6 n 1 1 1 1 1 1 0 e 4 8 3 8 11 10 11 c 3 2 2 2 2 0 3 e 1 2 1 1 1 0 1 n 1 0 0 2 2 0 11 1 0 11 1 1 3 0 1 0 n c 3 12 6 3 10 c 1 1 1 1 1 1 1 n 1 3 1 1 4 1 0 n 3 4 2 3 4 2 3 n 1 1 1 1 1 1 1 n 1 4 3 6 2 4 2 4 4 4 1 4 2 5 c n 3 1 2 3 1 4 3 1 2 2 1 2 3 3 3 1 2 0 1 3 2 n n n 7 8 5 8 7 3 5 e 85 SUPPLEMENTARY INFORMATION (Secondary Transporter) Incompletely Characterized Transport Systems ATP-E (ATP Exporter) Ctr (Copper Transporter ) FP (Ferroportin) ILT (Iron/Lead Transporter) MgtE (Mg2+ Transporter-E) MHP (Metal Homeostasis Protein) MnHP (Mn2+ Homeostasis Protein) PF27 PLI (Phospholipid Importer) PPI (Protein Importer) SHP (Stress-Induced Hydrophobic Peptide) YaaH (ATO) 1 2 1 1 0 1 6 1 1 0 1 2 0 1 0 1 2 1 1 0 1 3 1 0 0 1 1 1 1 1 0 1 2 7 0 1 1 7 0 1 1 7 0 1 1 5 0 1 1 8 0 0 0 0 0 0 1 1 3 3 1 3 0 2 0 1 1 4 2 2 2 1 1 0 1 3 
8 3 4 1 8 5 2 2

Family names are based on the Transport Classification Database (http://www.tcdb.org/). Lineage-specific gene gain and loss in the membrane transporter families were estimated using CAFE33. Listed are 74 transporter families containing 31 expansions (8), 42 no change (n), 24 contraction (c) families in the T. melanosporum lineage. “*” indicates significant expansion (P<0.001) in the Tuber branch (SOM). Abbreviations: xxx
1 http://genome.jgi-psf.org/Lacbi1/Lacbi1.home.html

Table S26. T. melanosporum tRNA genes grouped by anticodons. The tRNAscan-SE algorithm was used with default parameters and the eukaryotic model.

Table S27. Overall codon usage of T. melanosporum genes.

Table S28. Codon usage of T. melanosporum genes coding for ribosomal proteins.

The Black Truffle Genome Uncovers Evolutionary Origins and Mechanisms of Symbiosis
Supplementary Tables

Figure S1. The life cycle of the black truffle of Perigord (Tuber melanosporum). (1) The spores released from mature truffles germinate (2) in the following spring, producing a vegetative mycelium, which colonises tree root tips and develops the symbiosis further (3). In the ectomycorrhizal symbiotic relationship, long, branching fungal filaments known as hyphae ramify between cells of the host root’s outer layers, form a sheath around the root, and radiate outwards into the surrounding soil and litter. In early summer, extramatrical hyphae aggregate to form fruit body initials (4). These develop into fruit bodies during fall and early winter, giving rise to mature truffles (5).

Figure S2. Phylogeny of some representative sequenced Ascomycetes and Basidiomycetes. The alignment was built from the two best phylogenetic genes (MS277 and MS456) identified by (83).
The Bayesian inference (BI), based upon the posterior probability distribution of trees, was performed with MrBayes (84) with the following settings: Lset rates=gamma; Prset aamodelpr=mixed; mcmcp ngen=100,000 samplefreq=50; other settings = default. The sump burnin=500 command was used to verify the stationarity of the analysis. The sumt burnin=500 command was used to produce summary statistics for trees sampled during the Bayesian analysis. The consensus tree was visualized and edited with FigTree v1.2.1 (http://tree.bio.ed.ac.uk/).

Figure S3. The diversity and distribution of class I and class II transposable elements (TEs) in T. melanosporum. TEs were identified using the REPET pipeline (19). TE structures are depicted according to (85). The number of TE occurrences and the % genome coverage were determined with RepeatMasker (www.repeatmasker.org) using the 846 consensus sequences from the TEdenovo pipeline (19).

Figure S4. Genome coverage (%) of the different T. melanosporum transposable element families.

Figure S5. Major cycles of LTR retrotransposon activity. T. melanosporum underwent at least two cycles of LTR retrotransposon (LTR-R) amplification. The most recent activity peaks at an estimated 2.5 Mya, preceded by a gradual increase starting 5.5 Mya. An older burst of activity occurred >10 Mya. The decrease between 10 and 5 Mya probably reflects element deterioration, which reduces the ability to detect these elements. Consensus full-length copies of each element are shown. A substitution rate of 1.3 x 10^-8 was used.

Figure S6. Phylogenetic relationships amongst RNA silencing- and DNA methylation-related gene products from T. melanosporum and other fungi. Functionally characterized components from N. crassa and S. pombe as well as putative homologs from T.
melanosporum are in bold; different types of siRNA-related processes are indicated on the right of the neighbor-joining trees.

Figure S7. Distribution of gene density in T. melanosporum and in representative ascomycetes from the Eurotiomycetes and Sordariomycetes.

Figure S8. (A) Analysis of molecular divergence between the Pezizomycete T. melanosporum and selected fungi, the Zygomycete Rhizopus oryzae, Ascomycetes from the Saccharomycetes (Saccharomyces cerevisiae), Eurotiomycetes (Aspergillus nidulans), Leotiomycetes (Botrytis cinerea), Sordariomycetes (Neurospora crassa, Magnaporthe grisea), and Basidiomycetes (Laccaria bicolor). The truffle–yeast pair displays the lowest amino acid identity (41.8%), in agreement with their proposed ancient separation, > 450 Myr ago (13). The figure shows the cumulative frequencies of amino acid identity across each set of potential orthologous pairs. (B) Distribution of protein similarity (%) between T. melanosporum and representative ascomycetes. BLASTP (Best Reciprocal Hits)

Fig. S9. Synteny between Tuber melanosporum and selected ascomycete genomes. (A) Oxford plot of T. melanosporum scaffolds (x axis) plotted against Coccidioides immitis chromosomes (y axis). In such a presentation, conserved segmental homologies are visualized by diagonally oriented clusters, or at least by co-clustering of genes on genomic scaffolds. The lack of such clusters indicates the lack of any major synteny between the two fungal genomes, although several microsyntenic regions can be seen, e.g., between T. melanosporum scaffold 6 and Coccidioides immitis chromosome 12. (B) Table summarizing the number of genomic blocks and genes showing synteny between Tuber melanosporum and selected ascomycetes belonging to the major ascomycete phylogenetic groups.

Figure S10.
The largest syntenic region between T. melanosporum (Pezizomycetes) and Coccidioides immitis (Eurotiomycetes). This region only contains 99 genes with 39 orthologs. The scheme shows the central part of this syntenic region.

Figure S11. Gene families in the truffle genome. (A) The percentage of amino-acid identity of the top-scoring self-matches for protein-coding genes in T. melanosporum, Saccharomyces cerevisiae, Aspergillus nidulans, Neurospora crassa, Magnaporthe grisea, and Botrytis cinerea. For each fungus, the protein-coding regions for each gene were compared with those of every other gene in the same genome using BLASTX. Top-scoring matches were aligned and percentages of identity were calculated. N. crassa and T. melanosporum possess few highly similar (>90%) gene pairs, 13 and 7, respectively. (B) Expanding and contracting gene families (as determined by TRIBE-MCL) in T. melanosporum and representative sequenced ascomycetes. The numbers on the branches show the numbers of expanded (left, red), unchanged (middle, black) or contracted (right, blue) protein families along the lineages.

Figure S12. Distribution of multigene families (TRIBE-MCL) in T. melanosporum, representative ascomycetes and the ectomycorrhizal basidiomycete Laccaria bicolor.

Figure S13. Functional comparison of the PFAM protein domains of T. melanosporum with other sequenced ascomycetes. Hierarchical clustering based on the relative frequency of PFAM protein domains. The top 100 PFAM domains found in T. melanosporum were selected. The frequency values were transformed into z-scores, which are a measure of relative enrichment (red) and depletion (green). The data were clustered according to species and PFAM domains by using a Euclidean distance metric (Cluster 3.0) (http://www.falw.vu/~huik/cluster.htm). The results were visualized by using Java Treeview (http://sourceforge.net/projects/jtreeview/).
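The z-score transformation and Euclidean distance named in the Figure S13 caption can be sketched as follows. This is a minimal illustration, not the Cluster 3.0 implementation: it assumes z-scores are computed per PFAM domain across species using the population standard deviation (the caption does not say which), and the domain frequencies shown are hypothetical.

```python
import math
from statistics import mean, pstdev


def z_scores(values):
    """Convert one domain's frequencies across species into z-scores.

    Assumption: population standard deviation; a constant row maps to zeros.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def euclidean(a, b):
    """Euclidean distance between two equally long z-score profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical relative frequencies of two domains in three species.
domain_a = z_scores([0.010, 0.020, 0.030])
domain_b = z_scores([0.030, 0.020, 0.010])

# Positive z-scores mark relative enrichment, negative ones depletion.
print(domain_a)
print(euclidean(domain_a, domain_b))
```

A hierarchical clustering tool would then merge the rows (or columns) with the smallest pairwise distances first.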
The T. melanosporum proteome is enriched for proteins containing TPR1, histone, and PPR domains (see section 5.3).

Figure S14. Distribution of orphan genes coding for lineage-specific proteins on the largest genomic supercontigs of T. melanosporum. Orphans are in yellow, whereas gene models are in blue and TEs in red. Orphan genes are randomly scattered over the protein-coding regions.

Fig. S15. Heat map showing the identity of 92 established fungal allergens to their best homologs in the predicted T. melanosporum proteome and in seven additional fungal genomes, including the reference GRAS fungi S. cerevisiae and N. crassa, and the highly allergenic A. fumigatus. The map was constructed from an MS Excel file using EPCLUST. Identities are coded by increasing colour saturation, with bright red corresponding to the highest degree of identity and black indicating the lack of a given allergen homolog in a given genome.

Figure S16. (Top panel) Outline of sulfur metabolism and of the corresponding genes and pathways in T. melanosporum. Numbers in bold to the right of the names of the 10 main S-pathways indicate the number of the corresponding genes; numbers in italics indicate the number of gene products involved in specific reactions. Orphan reactions are crossed out; pathways or components that are more represented in Tuber than in Neurospora are indicated with green numbers. (Bottom panel) Relative expression levels of genes involved in sulfate internalization and reduction (A) and Cys/Met biosynthesis and interconversion (B) in different life-cycle stages of T. melanosporum. The specific log2 ratios used for EPCLUST analysis, represented in a false color scale, are indicated above each column; gene names, shown on the right, are as specified in the metabolic map.

Figure S17.
Metabolic map of the “Cys/Met biosynthesis & interconversion” pathway and mRNA expression levels of the corresponding enzymes. Expression levels are the mean of filtered and normalized hybridization signals derived from multireplicate experiments: expression values for free-living mycelia (FLM) and fruiting bodies (FB) are shown in green and black, respectively. Of note: i) the preference for homocysteine (rxn. #2) vs cysteine (rxn. #1) biosynthesis in FB; and ii) the disproportionately high expression levels of cystathionine γ-lyase (CGL, #5) and cystathionine β-lyase (CBL, #4) in FBs compared to those of the preceding cystathionine synthase enzymes (#3 and #5), both of which are less expressed (3-6 fold) in FBs than in FLM. Alternative reactions supported by CGL and CBL homologs from lactic bacteria (Liu et al. 2008), potentially relevant for S-VOC formation in T. melanosporum, are indicated by dashed arrows. OAS, O-acetylserine; OAH, O-acetylhomoserine; SAM, S-adenosylmethionine; SAH, S-adenosylhomocysteine; DMS, dimethyl sulfide; DMDS, dimethyl disulfide; DMTS, dimethyl trisulfide.

Figure S18. Predicted Ehrlich pathways leading to characteristic truffle volatile organic compounds (VOC). Based on known yeast pathways, the catabolism of five amino acids in T. melanosporum could lead to the formation of the aldehydes and alcohols (given with their structures) that are key contributors to the truffle aroma. Compounds with a high volatility are in bold. Genes identified in T. melanosporum that are potentially involved in the Ehrlich pathway (transamination, decarboxylation and reduction steps, full arrows) are listed on the right part of the figure. The formation of dimethyl sulfide (DMS), dimethyl disulfide (DMDS) and dimethyl trisulfide (DMTS) from 3-(methylthio)propanal (dashed arrow) can occur through chemical non-enzymatic degradation.

Figure S19. Schematic representation of the mating type locus in T.
melanosporum. The idiomorphic regions are in light grey; the black lines indicate the common flanking regions. The MAT1-2 idiomorph is 4770 bp long and contains the MAT1-2-1 gene (red arrowed box). The MAT1-1 idiomorph is 7470 bp long and contains the MAT1-1-1 gene (blue arrowed box). Within each MAT gene, the positions of introns and of the conserved HMG-box and α-box regions are indicated. The arrows indicate the direction of transcription. An additional putative ORF was detected in the MAT1-2 idiomorph (pink arrowed box; gene model: GSTUMT00001089001). The yellow, green and dark-grey boxes represent regions sharing sequence similarities between idiomorphs. Yellow boxes indicate inverted repeat elements with about 82% sequence similarity. The dark-grey and green boxes indicate regions with 71% and 76% sequence similarity, respectively. The regions marked with the green boxes show high similarity (BlastX: Score = 242, E-value = 2e-62) with an ankyrin repeat-containing protein. The 45° grid pattern indicates the opposite orientation of similar sequences between the idiomorphs. The flanking region downstream of the MAT1-1 idiomorph contains a 493 bp long insertion used to design a backward primer to specifically amplify the MAT1-1 idiomorph. A putative ORF (white arrowed box; gene model: GSTUMT00001092001) with no significant similarity in the GenBank database but highly conserved (94% sequence identity) between idiomorphs is present downstream of the MAT locus.

Fig. S20. Schematic comparison of the genomic regions flanking the MAT1-2 locus in T. melanosporum, Fusarium graminearum, Botrytis cinerea and Coccidioides immitis, drawn using Chromomapper software (Niculita-Hirzel & Hirzel 2008). This analysis shows the low level of synteny in the T. melanosporum mating type region compared with other ascomycetes. In T.
melanosporum, the only conserved gene in linkage with MAT1-2 is the gene coding for cytochrome c oxidase subunit VIa (COX13; GSTUMT00001086001).

Figure S21. Hierarchical clustering tree view of transcripts coding for carbohydrate-active enzymes (CAZymes) from T. melanosporum in free-living mycelium, ectomycorrhizal root tips and fruiting bodies. Clustering analysis was carried out using the EPCLUST clustering tool. Each horizontal line displays the expression ratio for one gene in symbiotic tissues, fruiting bodies or free-living mycelium vs. a mean expression reference calculated from all arrays. Each gene is represented by a row of coloured boxes and each stage is represented by a single column. Regulation levels range from pale to saturated colors (red for induction; green for repression). Black indicates no change in gene expression. ECM, T. melanosporum/Corylus avellana ectomycorrhizae. See also Supplementary section 9.

Figure S22. Distribution of secreted peptidase gene models in T. melanosporum and other saprotrophic or pathogenic fungi. Secreted peptidase classification is based on the MEROPS database (http://merops.sanger.ac.uk). Signal peptides were predicted using TargetP (http://www.cbs.dtu.dk/services/TargetP/).

Figure S23. Distribution of expression levels for the predicted T. melanosporum gene models. Low expression, <500; medium expression, 500-5000; high expression, >5000. FLM, free-living mycelium; ECM, ectomycorrhizal root tips; FB, fruiting bodies.
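The three expression classes given in the Figure S23 caption can be illustrated with a short sketch. The thresholds (500 and 5000) come from the caption; the function names, the handling of values exactly at a threshold (counted as medium here) and the example signals are assumptions made for illustration only.

```python
from collections import Counter

LOW_MAX = 500    # signals below this value are classed as "low"
HIGH_MIN = 5000  # signals above this value are classed as "high"


def expression_class(signal):
    """Assign one expression signal to the low, medium or high class."""
    if signal < LOW_MAX:
        return "low"
    if signal > HIGH_MIN:
        return "high"
    return "medium"  # 500 to 5000 inclusive (boundary handling is assumed)


def class_distribution(signals):
    """Count how many gene models fall into each expression class."""
    return Counter(expression_class(s) for s in signals)


# Hypothetical signals for a handful of gene models in each stage.
example = {
    "FLM": [120.0, 800.0, 5200.0, 40.0],
    "ECM": [300.0, 4500.0, 9000.0, 650.0],
    "FB": [15.0, 499.0, 5000.0, 12000.0],
}

for stage, signals in example.items():
    print(stage, dict(class_distribution(signals)))
```

Applying `class_distribution` to each stage's full set of gene-model signals would give the per-stage counts that such a distribution plot summarizes.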