The complete genome sequence of the Gram-positive bacterium Bacillus subtilis

Kunst, F.; Ogasawara, N.; Moszer, I.; Albertini, A. M.; Alloni, G.; Azevedo, V.; Bertero, M. G.; Bessières, P.; Bolotin, A.; Borchert, S.; Borriss, R.; Boursier, L.; Brans, A.; Braun, M.; Brignell, S. C.; Bron, S.; Brouillet, S.; Bruschi, C. V.; Caldwell, B.; Capuano, V.; Carter, N. M.; Choi, S.-K.; Codani, J.-J.; Connerton, I. F.; Cummings, N. J.; Daniel, R. A.; Denizot, F.; Devine, K. M.; Düsterhöft, A.; Ehrlich, S. D.; Emmerson, P. T.; Entian, K. D.; Errington, J.; Fabret, C.; Ferrari, E.; Foulger, D.; Fritz, C.; Fujita, M.; Fujita, Y.; Fuma, S.; Galizzi, A.; Galleron, N.; Ghim, S.-Y.; Glaser, P.; Goffeau, A.; Golightly, E. J.; Grandi, G.; Guiseppi, G.; Guy, B. J.; Haga, K.; Haiech, J.; Harwood, C. R.; Hénaut, A.; Hilbert, H.; Holsappel, S.; Hosono, S.; Hullo, M.-F.; Itaya, M.; Jones, L.; Joris, B.; Karamata, D.; Kasahara, Y.; Klaerr-Blanchard, M.; Klein, C.; Kobayashi, Y.; Koetter, P.; Koningstein, G.; Krogh, S.; Kumano, M.; Kurita, K.; Lapidus, A.; Lardinois, S.; Lauber, J.; Lazarevic, V.; Lee, S.-M.; Levine, A.; Liu, H.; Masuda, S.; Mauël, C.; Médigue, C.; Medina, N.; Mellado, R. P.; Mizuno, M.; Moestl, D.; Nakai, S.; Noback, M.; Noone, D.; O'Reilly, M.; Ogawa, K.; Ogiwara, A.; Oudega, B.; Park, S.-H.; Parro, V.; Pohl, T. M.; Portetelle, D.; Porwollik, S.; Prescott, A. M.; Presecan, E.; Pujic, P.; Purnelle, B.; Rapoport, G.; Rey, M.; Reynolds, S.; Rieger, M.; Rivolta, C.; Rocha, E.; Roche, B.; Rose, M.; Sadaie, Y.; Sato, T.; Scanlan, E.; Schleich, S.; Schroeter, R.; Scoffone, F.; Sekiguchi, J.; Sekowska, A.; Seror, S. 
J.; Serror, P.; Shin, B.-S.; Soldo, B.; Sorokin, A.; Tacconi, E.; Takagi, T.; Takahashi, H.; Takemaru, K.; Takeuchi, M.; Tamakoshi, A.; Tanaka, T.; Terpstra, P.; Tognoni, A.; Tosato, V.; Uchiyama, S.; Vandenbol, M.; Vannier, F.; Vassarotti, A.; Viari, A.; Wambutt, R.; Wedler, E.; Wedler, H.; Weitzenegger, T.; Winters, P.; Wipat, A.; Yamamoto, H.; Yamane, K.; Yasumoto, K.; Yata, K.; Yoshida, K.; Yoshikawa, H.-F.; Zumstein, E.; Yoshikawa, H.; Danchin, A.

doi:10.1038/36786

Article
Open access
Published: 20 November 1997

F. Kunst¹,
N. Ogasawara²,
I. Moszer³,
A. M. Albertini⁴,
G. Alloni⁴,
V. Azevedo⁵,
M. G. Bertero³,⁴,
P. Bessières⁵,
A. Bolotin⁵,
S. Borchert⁶,
R. Borriss⁷,
L. Boursier³,
A. Brans⁸,
M. Braun⁹,
S. C. Brignell¹⁰,
S. Bron¹¹,
S. Brouillet³,¹²,
C. V. Bruschi¹³,
B. Caldwell¹⁴,
V. Capuano⁵,
N. M. Carter¹⁰,
S.-K. Choi¹⁵,
J.-J. Codani¹⁶,
I. F. Connerton¹⁷,
N. J. Cummings¹⁷,
R. A. Daniel¹⁸,
F. Denizot¹⁹,
K. M. Devine²⁰,
A. Düsterhöft⁹,
S. D. Ehrlich⁵,
P. T. Emmerson²¹,
K. D. Entian⁶,
J. Errington¹⁸,
C. Fabret¹⁹,
E. Ferrari¹⁴,
D. Foulger¹⁸,
C. Fritz⁹,
M. Fujita²²,
Y. Fujita²³,
S. Fuma²⁴,
A. Galizzi⁴,
N. Galleron⁵,
S.-Y. Ghim¹⁵,
P. Glaser³,
A. Goffeau²⁵,
E. J. Golightly²⁶,
G. Grandi²⁷,
G. Guiseppi¹⁹,
B. J. Guy¹⁰,
K. Haga²⁸,
J. Haiech¹⁹,
C. R. Harwood¹⁰,
A. Hénaut²⁹,
H. Hilbert⁹,
S. Holsappel¹¹,
S. Hosono³⁰,
M.-F. Hullo³,
M. Itaya³¹,
L. Jones³²,
B. Joris⁸,
D. Karamata³³,
Y. Kasahara²,
M. Klaerr-Blanchard³,
C. Klein⁶,
Y. Kobayashi³⁰,
P. Koetter⁶,
G. Koningstein³⁴,
S. Krogh²⁰,
M. Kumano²⁴,
K. Kurita²⁴,
A. Lapidus⁵,
S. Lardinois⁸,
J. Lauber⁹,
V. Lazarevic³³,
S.-M. Lee³⁵,
A. Levine³⁶,
H. Liu²⁸,
S. Masuda³⁰,
C. Mauël³³,
C. Médigue³,¹²,
N. Medina³⁶,
R. P. Mellado³⁷,
M. Mizuno³⁰,
D. Moestl⁹,
S. Nakai²,
M. Noback¹¹,
D. Noone²⁰,
M. O'Reilly²⁰,
K. Ogawa²⁴,
A. Ogiwara³⁸,
B. Oudega³⁴,
S.-H. Park¹⁵,
V. Parro³⁷,
T. M. Pohl³⁹,
D. Portetelle⁴⁰,
S. Porwollik⁷,
A. M. Prescott¹⁸,
E. Presecan³,
P. Pujic⁵,
B. Purnelle²⁵,
G. Rapoport¹,
M. Rey²⁶,
S. Reynolds³³,
M. Rieger⁴¹,
C. Rivolta³³,
E. Rocha³,¹²,
B. Roche³⁶,
M. Rose⁶,
Y. Sadaie²²,
T. Sato³⁰,
E. Scanlan²⁰,
S. Schleich³,
R. Schroeter⁷,
F. Scoffone⁴,
J. Sekiguchi⁴²,
A. Sekowska³,
S. J. Seror³⁶,
P. Serror⁵,
B.-S. Shin¹⁵,
B. Soldo³³,
A. Sorokin⁵,
E. Tacconi⁴,
T. Takagi⁴³,
H. Takahashi²⁸,
K. Takemaru³⁰,
M. Takeuchi³⁰,
A. Tamakoshi²⁴,
T. Tanaka⁴⁴,
P. Terpstra¹¹,
A. Tognoni²⁷,
V. Tosato¹³,
S. Uchiyama⁴²,
M. Vandenbol⁴⁰,
F. Vannier³⁶,
A. Vassarotti⁴⁵,
A. Viari¹²,
R. Wambutt⁴⁶,
E. Wedler⁴⁶,
H. Wedler⁴⁶,
T. Weitzenegger³⁹,
P. Winters¹⁴,
A. Wipat¹⁰,
H. Yamamoto⁴²,
K. Yamane²⁴,
K. Yasumoto²⁸,
K. Yata²²,
K. Yoshida²³,
H.-F. Yoshikawa²⁸,
E. Zumstein⁵,
H. Yoshikawa² &
A. Danchin³

Nature volume 390, pages 249–256 (1997)

Abstract

Bacillus subtilis is the best-characterized member of the Gram-positive bacteria. Its genome of 4,214,810 base pairs comprises 4,100 protein-coding genes. Of these protein-coding genes, 53% are represented once, while a quarter of the genome corresponds to several gene families that have been greatly expanded by gene duplication, the largest family containing 77 putative ATP-binding transport proteins. In addition, a large proportion of the genetic capacity is devoted to the utilization of a variety of carbon sources, including many plant-derived molecules. The identification of five signal peptidase genes, as well as several genes for components of the secretion apparatus, is important given the capacity of Bacillus strains to secrete large amounts of industrially important enzymes. Many of the genes are involved in the synthesis of secondary metabolites, including antibiotics, that are more typically associated with Streptomyces species. The genome contains at least ten prophages or remnants of prophages, indicating that bacteriophage infection has played an important evolutionary role in horizontal gene transfer, in particular in the propagation of bacterial pathogenesis.

Main

Techniques for large-scale DNA sequencing have brought about a revolution in our perception of genomes. Together with our understanding of intermediary metabolism, it is now realistic to envisage a time when it should be possible to provide an extensive chemical definition of many living organisms. During the past couple of years, the genome sequences of Haemophilus influenzae, Mycoplasma genitalium, Synechocystis PCC6803, Methanococcus jannaschii, Mycoplasma pneumoniae, Escherichia coli, Helicobacter pylori, Archaeoglobus fulgidus and the yeast Saccharomyces cerevisiae have been published in their entirety¹,²,³,⁴,⁵,⁶,⁷,⁸, and at least 40 prokaryotic genomes are currently being sequenced. Regularly updated lists of genome sequencing projects are available at http://www.mcs.anl.gov/home/gaasterl/genomes.html (Argonne National Laboratory, Illinois, USA) and http://www.tigr.org (TIGR, Rockville, Maryland, USA).

The list of sequenced microorganisms does not currently include a paradigm for Gram-positive bacteria, which are known to be important for the environment, medicine and industry. Bacillus subtilis has been chosen to fill this gap⁹,¹⁰ as its biochemistry, physiology and genetics have been studied intensely for more than 40 years. B. subtilis is an aerobic, endospore-forming, rod-shaped bacterium commonly found in soil, water sources and in association with plants. B. subtilis and its close relatives are an important source of industrial enzymes (such as amylases and proteases), and much of the commercial interest in these bacteria arises from their capacity to secrete these enzymes at gram per litre concentrations. It has therefore been used for the study of protein secretion and for development as a host for the production of heterologous proteins¹¹. B. subtilis (natto) is also used in the production of Natto, a traditional Japanese dish of fermented soya beans.

Under conditions of nutritional starvation, B. subtilis stops growing and initiates responses to restore growth by increasing metabolic diversity. These responses include the induction of motility and chemotaxis, and the production of macromolecular hydrolases (proteases and carbohydrases) and antibiotics. If these responses fail to re-establish growth, the cells are induced to form chemically, irradiation- and desiccation-resistant endospores. Sporulation involves a perturbation of the normal cell cycle and the differentiation of a binucleate cell into two cell types. The division of the cell into a smaller forespore and a larger mother cell, each with an entire copy of the chromosome, is the first morphological indication of sporulation. The former is engulfed by the latter and differential expression of their respective genomes, coupled to a complex network of interconnected regulatory pathways and developmental checkpoints, culminates in the programmed death and lysis of the mother cell and release of the mature spore¹². In an alternative developmental process, B. subtilis is also able to differentiate into a physiological state, the competent state, that allows it to undergo genetic transformation¹³.

General features of the DNA sequence

Analysis at the replicon level. The B. subtilis chromosome has 4,214,810 base pairs (bp), with the origin of replication coinciding with the base numbering start point¹⁴, and the terminus at about 2,017 kilobases (kb)¹⁵. The average G + C ratio is 43.5%, but it varies considerably throughout the chromosome. This average is also different if one considers the nucleotide content of coding sequences, for which G and A (24% and 30%) are relatively more abundant than their counterparts C and T (20% and 26%). A significant inversion of the relative G − C/G + C ratio is visible at the origin of replication, indicating asymmetry of the nucleotide composition between the replication leading strand and the lagging strand¹⁶. Several A + T-rich islands are likely to reveal the signature of bacteriophage lysogens or other inserted elements (Fig. 1, see below).
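The strand asymmetry described here is commonly visualized as a G − C skew computed in sliding windows. The following is a minimal sketch of that calculation, not the authors' procedure: the function name is hypothetical, and the window and step sizes are borrowed from the A + T analysis of Fig. 1 as an assumption.

```python
def gc_skew_windows(seq, window=10_000, step=5_000):
    """Return (window_start, skew) pairs, with skew = (G - C) / (G + C).

    Window and step defaults are assumptions taken from the Fig. 1 caption;
    the text does not state the values used for the skew analysis.
    """
    seq = seq.upper()
    results = []
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows without any G or C are reported as zero skew.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, skew))
    return results


# Toy sequence: a G-rich half followed by a C-rich half, so the skew changes sign.
toy = "GGGGC" * 4 + "CCCCG" * 4
toy_skew = gc_skew_windows(toy, window=20, step=20)
```

On a real chromosome, the sign change of the skew would be expected near the origin and terminus of replication.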

**Figure 1: Distribution of A + T-rich islands along the chromosome of *B. subtilis*, in sliding windows of 10,000 nucleotides, with a step of 5,000 nucleotides.**

We have analysed the abundance of oligonucleotides (‘words’) in the genome in various ways: absolute number of words in the genomic text, or comparison with the expected count derived from several models of the chromosome (for example, Markov models, or simulated sequences in which previously known features of the genome were conserved¹⁷). Comparing the experimental data with various models allowed us to define under- and overrepresentation of words in the experimental data set by reference to the model chosen. In general, the dinucleotide bias follows closely what has been described for other prokaryotes¹⁸,¹⁹, in that the dinucleotides most overrepresented are AA, TT and GC, whereas those less represented are TA, AC and GT. Plots of the frequencies of AG, GA, CT and TC in sliding windows along the chromosome show dramatic decreases or increases around the origin and terminus of replication (data not shown). Trinucleotide frequency, directly related to the coding frame, will be discussed below. The distribution of words of four, five and six nucleotides shows significant correlations between the usage of some words and replication (several such oligonucleotides are very significantly overrepresented in one of the strands and underrepresented in the other one).
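To illustrate what over- and underrepresentation means relative to a model, the sketch below compares observed dinucleotide frequencies with those expected if nucleotides were independent. This zero-order model is the simplest possible choice and is an assumption made here for illustration; the Markov models and simulated sequences used in the analysis above are not reproduced. The function name is hypothetical.

```python
from collections import Counter


def dinucleotide_odds(seq):
    """Observed/expected ratio for each dinucleotide present in seq.

    Expected frequency is f(x) * f(y), i.e. nucleotides drawn independently
    (a zero-order model, assumed here only for illustration).
    A ratio above 1 suggests overrepresentation, below 1 underrepresentation.
    """
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    odds = {}
    for pair, count in di.items():
        observed = count / (n - 1)
        expected = (mono[pair[0]] / n) * (mono[pair[1]] / n)
        odds[pair] = observed / expected
    return odds


toy_odds = dinucleotide_odds("ACGT")
```

Only one strand is scanned in this sketch; a strand-symmetric count would also include the reverse complement.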

Setting a statistical cut-off for the significance of duplications at 10⁻³, we expected duplication by chance of words longer than 24 nucleotides to be rare²⁰. In fact, the genome of B. subtilis contains a plethora of such duplications, some of them appearing more than twice. Among the duplications, we identified, as expected, the ribosomal RNA genes and their flanking regions, but also regions known to correspond to genes comprising long sequence repeats (such as pks and srf). We also found several regions that were not expected: a 182-bp repetition within the yyaL and yyaO genes; a 410-bp repetition between the yxaK and yxaL genes; an internal duplication of 174 bp inside ydcI; and significant duplications in the regions involved in the transcriptional control of several genes (such as 118 bp repeated three times between yxbB and yxbC). Finally, we found several repetitions at the borders of regions that might be involved in bacteriophage integration.
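A search for exact duplications of words longer than 24 nucleotides can be sketched by indexing every 25-mer and keeping those seen at more than one position. This is an illustrative approach under stated assumptions, not the method of ref. 20: the function name is hypothetical, only the given strand is scanned, and a dictionary of all words is memory-hungry for a 4.2-Mb sequence (suffix-array methods would scale better).

```python
from collections import defaultdict


def repeated_words(seq, k=25):
    """Map each k-mer occurring more than once to its start positions (0-based).

    k=25 corresponds to 'words longer than 24 nucleotides'. Longer repeats
    show up as runs of overlapping repeated k-mers and would need merging.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {word: pos for word, pos in positions.items() if len(pos) > 1}


# Toy example with a 10-nucleotide word duplicated around a short spacer.
toy = "ACGTACGTAA" + "TTTT" + "ACGTACGTAA"
toy_repeats = repeated_words(toy, k=10)
```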

The most prominent duplication was a 190-bp element that was repeated ten times in the chromosome. Multiple alignment of the ten repeats showed that they could be classified into two subfamilies with six and three copies each, plus a copy of what appears to be a chimaera. Similar sequences have also been described in the closely related species Bacillus licheniformis²¹,²². A striking feature of these repeats is that they are only found in half of the chromosome, at either side of the origin of replication, with five repeats on each side. Furthermore, with the exception of the most distal repeat at position 737,062, they lie in the same orientation with respect to the movement of the replication fork (Figs 2 and 3). Putative secondary structures conserved by compensatory mutations, as well as an insert in three of the copies, suggest that this element could indicate a structural RNA molecule.

**Figure 3: Density of coding nucleotides along the *B. subtilis* chromosome.** Yellow stands for the density of coding nucleotides in both strands of the sequence; red indicates the density of coding nucleotides in the clockwise strand (nucleotides involved in genes transcribed in the clockwise orientation).

Analysis at the transcription and translation level. Over 4,000 putative protein coding sequences (CDSs) have been identified, with an average size of 890 bp, covering 87% of the genome sequence (Fig. 2). We found that 78% of the genes started with ATG, 13% with TTG and 9% with GTG, which compares with 85%, 3% and 14%, respectively, in E. coli⁸. Fifteen genes (eight in the predicted CDSs in bacteriophage SPβ) exhibiting unusual start codons (namely ATT and CTG) were also identified through their similarities to known genes in other organisms or because they had a good GeneMark prediction (see Methods). This has not yet been substantiated experimentally. However, in the case of the gene coding for translation initiation factor 3, the similarity with its E. coli counterpart strongly suggests that the initiation codon is ATT, as is the case in E. coli.
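The quoted coverage can be checked with simple arithmetic from figures given in the text. The sketch below assumes the CDS count of 4,100 stated in the abstract and ignores overlaps between neighbouring genes, so it is only a consistency check.

```python
GENOME_BP = 4_214_810   # chromosome length from the text
N_CDS = 4_100           # CDS count from the abstract (assumed exact here)
MEAN_CDS_BP = 890       # average CDS size from the text

# Fraction of the genome covered by CDSs, ignoring any gene overlaps.
coding_fraction = N_CDS * MEAN_CDS_BP / GENOME_BP
print(f"{coding_fraction:.1%}")  # prints 86.6%
```

The result, about 86.6%, agrees with the stated 87% after rounding.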

We have not annotated CDSs that largely or entirely overlap existing genes, although such genes (for example, comS inside srfAA) certainly exist. It is also likely that some of the short CDSs present in the B. subtilis genome have been overlooked. For these reasons and possible sequencing errors, the estimated number of B. subtilis CDSs will fluctuate around the present figure of 4,100.

In several cases, in-frame termination codons or frameshifts were confirmed to be present on the chromosome (for example, an internal termination codon in ywtF, or the known programmed translational frameshift in prfB), indicating that the genes are either non-functional (pseudogenes) or subject to regulatory processes. It will therefore be of interest to determine whether these gene features are conserved in related Bacillus species, especially as strain 168 is derived from the Marburg strain that was subjected to X-ray irradiation²³.

A few regions do not have any identifiable feature indicating that they are transcribed: they could be ‘grey holes’ of the type described in E. coli²⁴. Preliminary studies involving all regions of more than 400 bp without annotated CDSs indicated that, of ~300 such regions, only 15% were likely to be really devoid of protein-coding sequences. One of the longest such regions, located between yfjO and yfjN, is 1,628 bp long. Grey holes seem generally to be clustered near the terminus of replication. However, a grey-hole cluster located at ~600 kb might be related to the temporary chromosome partition observed during the first stages of sporulation, when a segment of about one-third of the chromosome enters the prespore, and remains the sole part of the chromosome in the prespore for a significant transition period²⁵.

The codon usage of B. subtilis CDSs was analysed using factorial correspondence analysis¹⁷. We found that the CDSs of B. subtilis could be separated into three well-defined classes (Fig. 4). Class 1 comprises the majority of the B. subtilis genes (3,375 CDSs), including most of the genes involved in sporulation. Class 2 (188 CDSs) includes genes that are highly expressed under exponential growth conditions, such as genes encoding the transcription and translation machineries, core intermediary metabolism, stress proteins, and one-third of genes of unknown function. Class 3 (537 CDSs) contains a very high proportion of genes of unidentified function (84%), and the members of this class have codons enriched in A + T residues. These genes are usually clustered into groups of between 15 and 160 genes (for example, bacteriophage SPβ) and correspond to the A + T-rich islands described above (Fig. 1). When they are of known function, or when their products display similarity to proteins of known function, they usually correspond to functions found in, or associated with, bacteriophages or transposons, as well as functions related to the cell envelope. This includes the region ydc/ydd/yde (40 genes that are missing in some B. subtilis strains²⁶), where gene products showing similarities to bacteriophage and transposon proteins are intertwined. Many of these genes are associated with virulence genes identified in pathogenic Gram-positive bacteria, suggesting that such virulence factors are transmitted horizontally among bacteria at a much higher frequency than previously thought. If we include these A + T-rich regions as possible cryptic phages, together with known bacteriophages or bacteriophage-like elements (SPβ, PBSX and the skin element), we find that the genome of B. subtilis 168 contains at least 10 such elements (Figs 2 and 3). Annotation of the corresponding regions often reveals the presence of genes that are similar to bacteriophage lytic enzymes, perhaps accounting for the observation that B. subtilis cultures are extremely prone to lysis.
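Correspondence analysis of codon usage starts from a table with one row per CDS and one column per codon. The sketch below builds a single row of relative codon frequencies and confirms that the three class sizes add up to the total CDS count; the correspondence analysis itself is not reproduced, and the function name is hypothetical.

```python
from collections import Counter


def codon_frequencies(cds):
    """Relative frequency of each codon in an in-frame coding sequence.

    Trailing nucleotides that do not fill a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Toy CDS: ATG AAA AAA TAA
toy_row = codon_frequencies("ATGAAAAAATAA")

# Class sizes quoted in the text sum to the 4,100 CDSs of the abstract.
class_sizes = {"class 1": 3_375, "class 2": 188, "class 3": 537}
total_cds = sum(class_sizes.values())
```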

**Figure 4: Factorial correspondence analysis of codon usage in the *B. subtilis* CDSs.**

The ribosomal RNA genes have been previously identified and shown to be organized into ten rRNA operons, mainly clustered around the origin of replication of the chromosome (Figs 2 and 3). In addition to the 84 previously identified tRNA genes, by using the Palingol²⁷ and tRNAscan²⁸ programs, we propose four putative new tRNA loci (at 1,262 kb, 1,945 kb, 2,003 kb and 2,899 kb), specific for lysine, proline and arginine (UUU, GGG, CCU and UCU anticodons, respectively). The 10S RNA involved in degradation of proteins made from truncated mRNA has been identified (ssrA), as well as the RNA component of RNase P (rnpB) and the 4.5S RNA involved in the secretion apparatus (scr).

There is a strong transcription orientation bias with respect to the movement of the replication fork: 75% of the predicted genes are transcribed in the direction of replication. Plotting the density of coding nucleotides in each strand along the chromosome readily identifies the replication origin and terminus (Fig. 3). To identify putative operons, we followed ref. 29 for describing Rho-independent transcription termination sites. This yielded ~1,630 putative terminators (340 of which were bidirectional). We retained only those that were located less than 100 bp downstream of a gene, or that were considered by the program to be ‘very strong’ (in order to account for possible erroneous CDSs). This yielded a total of ~1,250 terminators, with a mean operon size of three genes. A similar approach to the identification of promoters is problematical, especially because at least 14 sigma factors, recognizing different promoter sequences, have been identified in B. subtilis. Nevertheless, the consensus of the main vegetative sigma factor (σᴬ) appears to be identical to its counterpart in E. coli (σ⁷⁰): 5′-TTGACA-n₁₇-TATAAT-3′. Relaxing the constraints of the similarity to sigma-specific consensus sequences led to an extremely high number of false-positive results, suggesting that the consensus-oriented approach to the identification of promoters should be replaced by another approach¹⁷.
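A consensus-oriented promoter scan of the kind discussed here can be sketched as a search for the two hexamers separated by a 17-nucleotide spacer, with an adjustable number of tolerated mismatches. The function names and the mismatch parameter are hypothetical, and only the given strand is scanned; the sketch is meant to show why relaxing the match quickly admits more candidates, not to reproduce the analysis.

```python
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sigma_a_sites(seq, max_mismatches=0, spacer=17):
    """Return (start, mismatches) for windows matching TTGACA-n17-TATAAT.

    max_mismatches is an illustrative knob (an assumption, not a value from
    the text); raising it relaxes the consensus constraint.
    """
    seq = seq.upper()
    span = len(MINUS_35) + spacer + len(MINUS_10)
    hits = []
    for i in range(len(seq) - span + 1):
        box35 = seq[i:i + 6]
        box10 = seq[i + 6 + spacer:i + span]
        mismatches = hamming(box35, MINUS_35) + hamming(box10, MINUS_10)
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


exact = "CC" + "TTGACA" + "A" * 17 + "TATAAT" + "GG"
near = "TTGACA" + "C" * 17 + "TATGAT"   # one mismatch in the -10 box
```

With `max_mismatches=0` only `exact` yields a hit; allowing one mismatch also recovers the site in `near`.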

Classification of gene products

Genes were classified according to ref. 14, based on the representation of cells as Turing machines in which one distinguishes between the machine and the program (Table 1). Using the BLAST2P software running against a composite protein databank composed of SWISS-PROT (release 34), TREMBL (release 3, update 1) and B. subtilis proteins, we assigned at least one significant counterpart with a known function to 58% of the B. subtilis proteins. Thus for up to 42% of the gene products, the function cannot be predicted by similarity to proteins of known function: 4% of the proteins are similar only to other unknown proteins of B. subtilis; 12% are similar to unknown proteins from some other organism; and 26% of the proteins are not significantly similar to any other proteins in databanks. This preliminary analysis should be interpreted with caution, because only ~1,200 gene functions (30%) have been experimentally identified in B. subtilis. We used the ‘y’ prefix in gene names to emphasize that the function has not been ascertained (2,853 ‘y’ genes, representing 70%).
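The percentages in this paragraph can be cross-checked against each other. The sketch below uses only numbers quoted in the text plus the CDS total of 4,100 from the abstract; treating the non-‘y’ genes as the complement of the ‘y’ genes is an assumption made here for the check.

```python
TOTAL_CDS = 4_100   # from the abstract
Y_GENES = 2_853     # 'y' genes quoted in the text

known_pct = 58
unknown_breakdown_pct = {
    "similar only to unknown B. subtilis proteins": 4,
    "similar to unknown proteins from other organisms": 12,
    "no significant similarity in databanks": 26,
}
unknown_pct = sum(unknown_breakdown_pct.values())

y_share = Y_GENES / TOTAL_CDS
# Assumption: every CDS without a 'y' name counts as functionally named.
named_share = 1 - y_share

print(unknown_pct, f"{y_share:.1%}", f"{named_share:.1%}")
```

The breakdown sums to 42%, and the two shares round to the quoted 70% and 30%.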

Table 1: Functional classification of the Bacillus subtilis protein-coding genes

Regulatory systems. Transcription regulatory proteins. Helix–turn–helix proteins form a large family of regulatory proteins found in both prokaryotes and eukaryotes. There are several classes, including repressors, activators and sigma factors. Using BLAST searches, we constructed consensus matrices for helix–turn–helix proteins to analyse the B. subtilis protein library. We identified 18 sigma or sigma-like factors, of which nine (including a new one) are of the SigA type. We also putatively identified 20 regulators (among which 18 were products of ‘y’ genes) of the GntR family, 19 regulators (15 ‘y’ genes) of the LysR family, and 12 regulators (5 ‘y’ genes) of the LacI family. Other transcription regulatory proteins were of the AraC family (11 members, 10 ‘y’), the Lrp family (7 members, 3 ‘y’), the DeoR family (6 members, 3 ‘y’), or additional families (such as the MarR, ArsR or TetR families). A puzzling observation is that several regulatory proteins display significant similarity to aminotransferases (seven such enzymes have been identified as showing similarity to repressors).

Two-component signal-transduction pathways. Two-component regulatory systems, consisting of a sensor protein kinase and a response regulator, are widespread among prokaryotes. We have identified 34 genes encoding response regulators in B. subtilis, most of which have adjacent genes encoding histidine kinases. Response regulators possess a well-conserved N-terminal phospho-acceptor domain³⁰, whereas their C-terminal DNA-binding domains share similarities with previously identified response regulators in E. coli, Rhizobium meliloti, Klebsiella pneumoniae or Staphylococcus aureus. Representatives of the four subfamilies recently identified in E. coli³¹ (OmpR, FixJ, CitB and LytR) have been identified in B. subtilis. In a fifth subfamily, CheY, the DNA-binding domain is absent. The DNA-binding domain of a single B. subtilis response regulator, YesN, shares similarity with regulatory proteins of the AraC family.

Quorum sensing. The B. subtilis genome contains 11 aspartate phosphatase genes, whose products are involved in dephosphorylation of response regulators, that do not seem to have counterparts in Gram-negative bacteria such as E. coli. Downstream from the corresponding genes are some small genes, called phr, encoding regulatory peptides that may serve as quorum sensors³². Seven phr genes have been identified so far, including three new genes (phrG, phrI and phrK).

Protein secretion. It is known that B. subtilis and related Bacillus species, in particular B. licheniformis and B. amyloliquefaciens, have a high capacity to secrete proteins into the culture medium. Several genes encoding proteins of the major secretion pathway have been identified: secA, secD, secE, secF, secY, ffh and ftsY. Surprisingly, there is no gene for the SecB chaperone. It is thought that other chaperone(s) and targeting factor(s), such as Ffh and FtsY, may take over the SecB function. Further, although there is only one such gene in E. coli, five type I signal peptidase genes (sipS, sipT, sipU, sipV and sipW) have been found³³. The lsp gene, encoding a type II signal peptidase required for processing of lipo-modified precursors, was also identified. PrsA, located at the outer side of the membrane, is important for the refolding of several mature proteins after their translocation through the membrane.

Other families of proteins. ABC transporters were the most frequent class of proteins found in B. subtilis. They must be extremely important in Gram-positive bacteria, because they have an envelope comprising a single membrane. ABC transporters will therefore allow such bacteria to escape the toxic action of many compounds. We propose that 77 such transporters are encoded in the genome. In general they involve the interaction of at least three gene products, specified by genes organized into an operon. Other families comprised 47 transport proteins similar to facilitators (and perhaps sometimes part of the ABC transport systems), 18 amino-acid permeases (probably antiporters), and at least 16 sugar transporters belonging to the PEP-dependent phosphotransferase system.

General stress proteins are important for the survival of bacteria under a variety of environmental conditions. We identified 43 temperature-shock and general stress proteins displaying strong similarity to E. coli counterparts.

Missing genes. Histone-like proteins such as HU and H-NS have been identified in E. coli. We found that B. subtilis encodes two putative histone-like proteins that show similarity to E. coli HU, namely HBsu and YonN, but found no homologue to H-NS. It is known that the hbs gene encoding HBsu is essential, but we do not expect the yonN gene to be essential because it is present in the SPβ prophage. IHF is similar to HU, and it is not known whether HBsu plays a similar role to that of IHF in E. coli. Similarly, no protein similar to FIS could be found.

Genes encoding products that interact with methylated DNA, such as seqA in E. coli, involved in the regulation of replication initiation timing, or mutH, the endonuclease recognizing the newly synthesized strand during mismatch repair at hemi-methylated GATC sites, are also missing. This is in line with the absence of known methylation in B. subtilis, equivalent to Dam methylation in E. coli. Similarly, E. coli sfiA, encoding an inhibitor of FtsZ action in the SOS response, has no counterpart in B. subtilis. In contrast, B. subtilis replication initiation-specific genes, such as dnaB and dnaD, are missing in E. coli. The exact counterpart of the E. coli mukB gene, involved in chromosome partitioning, does not exist in B. subtilis, but genes spo0J and smc (Smc is weakly similar to MukB), which are suggested to be involved in partitioning of the B. subtilis chromosome, are missing in E. coli.

Turnover of mRNA is controlled in E. coli by a ‘degradosome’ comprising RNase E. It has a counterpart in B. subtilis, but we failed to find a clear homologue of RNase E in this organism. Whether this is related to the role of ribosomal protein S1 as an RNA helicase involved in mRNA turnover in E. coli requires further investigation. In particular, a homologue of rpsA (S1 structural gene), ypfD, might be involved in a structure homologous to the degradosome³⁴.

Structurally unrelated genes of similar function. Several genes encode products that have similar functions in E. coli and B. subtilis, but have no evident common structure. This is the case for the helicase loader genes, E. coli dnaC and B. subtilis dnaI; the genes coding for the replication termination protein, E. coli tus and B. subtilis rtp; and the division topology specifier genes, E. coli minE and B. subtilis divIVA. The situation may even be more complex in multisubunit enzymes: B. subtilis synthesizes two DNA polymerase III α chains, one having 3′–5′ proofreading exonuclease activity (PolC) and the other without the exonuclease activity (DnaE); in E. coli, only the latter exists. E. coli DNA polymerase II is structurally related to DNA polymerase α of eukaryotes, whereas B. subtilis YshC is related to DNA polymerase β.

Metabolism of small molecules

The type and range of metabolism used for the interconversion of low-molecular-weight compounds provide important clues to an organism's natural environment(s) and its biological activity. Here we briefly outline the main metabolic pathways of B. subtilis before the reconstruction of these pathways in silico, the correlation of genes with specific steps in the pathway, and ultimately the prediction of patterns of gene expression.

Intermediary metabolism. It has long been known that B. subtilis can use a variety of carbohydrates. As expected, it encodes an Embden–Meyerhof–Parnas glycolytic pathway, coupled to a functional tricarboxylic acid cycle. Further, B. subtilis is also able to grow anaerobically in the presence of nitrate as an electron acceptor. This metabolism is, at least in part, regulated by the FNR protein, binding to sites upstream of at least eight genes (four sites experimentally confirmed and four putative sites). A noteworthy feature of B. subtilis metabolism is an apparent requirement of branched short-chain carboxylic acids for lipid biosynthesis³⁵. Branched-chain 2-keto acid decarboxylase activity exists and may be linked to a variety of genes, suggesting that B. subtilis can synthesize and utilize linear branched short-chain carboxylic acids and alcohols.

Amino-acid and nucleotide metabolism. Pyrimidine metabolism of B. subtilis seems to be regulated in a way fundamentally different from that of E. coli, as it has two carbamylphosphate synthetases (one specific for arginine synthesis, the other for pyrimidine). Additionally, the aspartate transcarbamylase of B. subtilis does not act as an allosteric regulator as it does in E. coli. As in other microorganisms, pyrimidine deoxyribonucleotides are synthesized from ribonucleoside diphosphates, not triphosphates. The cytidine diphosphate required for DNA synthesis is derived from either the salvage pathway of mRNA turnover or from the synthesis of phospholipids and components of the cell wall. This means that polynucleotide phosphorylase is of fundamental importance in nucleic acid metabolism, and may account for its important role in competence³⁶. Two ribonucleoside reductases, both of class I, NrdEF type, are encoded by the B. subtilis chromosome, in one case from within the SPβ genome. In this latter case, the gene corresponding to the large subunit both contains an intron and codes for an intein (V.L., unpublished data). The gene of the small subunit of this enzyme also contains an intron, encoding an endonuclease, as was found for the homologue in bacteriophage T4.

By similarity with genes from other organisms, there appears to be, in addition to genes involved in amino-acid degradation (such as the roc operon, which degrades arginine and related amino acids), a large number of genes involved in the degradation of molecules such as opines and related molecules, derived from plants. This is also in line with the fact that B. subtilis degrades polygalacturonate, and suggests that, in its biotope, it forms specific relations with plants.

Secondary metabolism. In addition to many genes coding for degradative enzymes, almost 4% of the B. subtilis genome codes for large multifunctional enzymes (for example, the srf, pps and pks loci), similar to those involved in the synthesis of antibiotics in other genera of Gram-positive bacteria such as Streptomyces. Natural isolates of B. subtilis produce compounds with antibiotic activity, such as surfactin, fengycin and difficidin, that can be related to the above-mentioned loci. This bacterium therefore provides a simple and genetically amenable model in which to study the synthesis of antibiotics and its regulation. These pathways are often organized in very long operons (for example, the pks region spans 78.5 kb, about 2% of the genome). The corresponding sequences are mostly located near the terminus of replication, together with prophages and prophage-like sequences.

Paralogues and orthologues

It is important to relate intermediary metabolism to genome structure, function and evolution. We therefore compared the B. subtilis proteins with themselves, as well as with proteins from known complete genomes, using a consistent statistical method that allows the evaluation of unbiased probabilities of similarities between proteins³⁷,³⁸. For Z-scores higher than 13, the number of proteins similar to each given protein does not vary, indicating that this cut-off value identifies sets of proteins that are significantly similar.
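As a rough illustration of the Z-score idea, the sketch below compares the alignment score of a real pair of sequences with the scores obtained after shuffling one of them. The toy scoring scheme, the number of shuffles and all function names are assumptions made for illustration only; the statistic actually used in this work is the one described in refs 37 and 38.

```python
import random
import statistics


def align_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Global (Needleman-Wunsch) alignment score with a toy scoring scheme (assumed)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def z_score(a: str, b: str, n_shuffles: int = 100, seed: int = 0) -> float:
    """Z-score of the observed score against scores for shuffled copies of b."""
    rng = random.Random(seed)
    observed = align_score(a, b)
    background = []
    for _ in range(n_shuffles):
        shuffled = list(b)
        rng.shuffle(shuffled)  # keeps composition, destroys order
        background.append(align_score(a, "".join(shuffled)))
    sd = statistics.pstdev(background)
    if sd == 0:
        return 0.0  # no spread in the background: no evidence either way
    return (observed - statistics.mean(background)) / sd


def is_significant(z: float, cutoff: float = 13.0) -> bool:
    """Apply the cut-off quoted in the text (Z-scores higher than 13)."""
    return z > cutoff
```

Shuffling preserves amino-acid composition, so a high Z-score reflects similarity in residue order rather than in composition alone.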

Families of paralogues. Many of the paralogues constitute large families of functionally related proteins, involved in the transport of compounds into and out of the cell, or involved in transcription regulation. Another part of the genome consists of gene doublets (568 genes), triplets (273 genes), quadruplets (168 genes) and quintuplets (100 genes). Finally, about half of the genome is made of genes coding for proteins with no apparent paralogues (Fig. 5). No large family comprises only proteins without any similarity to proteins of known function.
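The family sizes quoted above can be pictured as the sizes of connected groups in a graph whose edges join significantly similar proteins. The following is a minimal sketch, assuming single-linkage grouping (the clustering rule is not stated here) and using hypothetical gene names.

```python
from collections import Counter


def paralogue_families(genes, similar_pairs):
    """Group genes into families by single linkage, using union-find."""
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for a, b in similar_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    families = {}
    for g in genes:
        families.setdefault(find(g), []).append(g)
    return list(families.values())


def genes_per_family_size(families):
    """Map family size -> total number of genes found in families of that size."""
    counts = Counter()
    for fam in families:
        counts[len(fam)] += len(fam)
    return dict(counts)


# Hypothetical example: one doublet, one triplet and one gene without paralogue.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
pairs = [("g1", "g2"), ("g3", "g4"), ("g4", "g5")]
families = paralogue_families(genes, pairs)
print(genes_per_family_size(families))
```

Under this counting convention, the 568 genes in doublets correspond to 284 pairs, the 273 genes in triplets to 91 families, the 168 genes in quadruplets to 42 families, and the 100 genes in quintuplets to 20 families.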

Figure 5: Gene paralogue distribution in the genome of *B. subtilis*

The process by which paralogues are generated is not well understood, but we might find clues by studying some of the duplications in the genome. Several approximate DNA repetitions, associated with very high levels of protein identity, were found, mainly within regions putatively or previously identified as prophages. This is in line with previous observations about PBSX and the skin element³⁹,⁴⁰, and suggests that these prophage-like elements share a common ancestor and have diverged relatively recently. In addition, several protein duplications are in genes that are located very close to each other, such as yukL and dhbF (the corresponding proteins are 65% identical in an overlap of 580 amino acids), yugJ and yugK (proteins 73% identical), yxjG and yxjH (proteins 70% identical), and the entire opuB operon, which is duplicated 3 kb away (opuC operon, yielding ∼80% of amino-acid identity in the corresponding proteins).

The study of paralogues showed that, as in other genomes, a few classes of genes have been highly expanded. This argues against the idea that the genome evolved through a series of duplications of ancestral genomes, and instead supports the idea of genes as living organisms, subject to evolutionary constraints, some being subjected to expansion and natural selection, and others to local duplications of DNA regions.

Among paralogue doublets, some were unexpected, such as the three aminoacyl-tRNA synthetase doublets (hisS (2,817 kb) and hisZ (3,588 kb); thrS (2,960 kb) and thrZ (3,855 kb); tyrS (3,036 kb) and tyrZ (3,945 kb)) or the two mutS paralogues (mutS and yshD). This latter situation is similar to that found in Synechocystis. In the case of B. subtilis, the presence of two MutS proteins could indicate that there are two different pathways for long-patch mismatch repair, possibly a consequence of the active genetic transformation mechanism of B. subtilis.

Families of orthologues. Because Mycoplasma spp. are thought to be derived from Gram-positive bacteria similar to B. subtilis, we compared the B. subtilis genome with that of M. genitalium. Among the 450 genes encoded by M. genitalium, the products of 300 are similar to proteins of B. subtilis. Among the 146 remaining gene products, a further 3 are similar to proteins of other Bacillus species, and 9 to proteins of other Gram-positive bacteria; 25 are similar to proteins of Gram-negative bacteria; and 19 are similar to proteins of other Mycoplasma spp. This leaves only 90 genes that would be specific to M. genitalium and might be involved in the interaction of this organism with its host.

The B. subtilis genome is similar in size to that of E. coli. Because these bacteria probably diverged more than one billion years ago, it is of evolutionary value to investigate their relative similarity. About 1,000 B. subtilis genes have clear orthologous counterparts in E. coli (one-quarter of the genome). These genes did not belong either to the prophage-like regions or to regions coding for secondary metabolism (∼15% of the B. subtilis genome). This indicates that a large fraction of these genomes shared similar functions. At first sight, however, it seems that little of the operon structure has been conserved. We nevertheless found that ∼100 putative operons or parts of operons were conserved between E. coli and B. subtilis. Among these, ∼12 exhibited a reshuffled gene order (typically, the arabinose operon is araABD in B. subtilis and araBAD in E. coli). In addition to the core of the translation and transcription machinery, we identified other classes of operons that were well conserved between the two organisms, including major integrated functions such as ATP synthesis (atp operon) and electron transfer (cta and qox operons). As well as being well preserved, the murein biosynthetic region was partly duplicated, allowing creation of part of the genes required for the sporulation division machinery⁴¹. The amino-acid biosynthesis genes differ more in their organization: the E. coli genes for arginine biosynthesis are spread throughout the chromosome, whereas the arginine biosynthesis genes of B. subtilis form an operon. The same is true for purine biosynthetic genes. Genes responsible for the biosynthesis of coenzymes and prosthetic groups in B. subtilis are often clustered in operons that differ from those found in E. coli. Finally, several operons conserved in E. coli and B. subtilis correspond to unknown functions, and should therefore be priority targets for the functional analysis of these model genomes.

Comparison with Synechocystis PCC6803 revealed about 800 orthologues. However, in this case the putative operon structure is extremely poorly conserved, apart from four of the ribosomal protein operons, the groES–groEL operon, yfnHG (rfbFG in Synechocystis), rpsB-tsf, ylxS-nusA-infB, asd-dapGA-ymfA, spmAB, efp-accB, grpE-dnaK and yurXW. The nine-gene atp operon of B. subtilis is split into two parts in Synechocystis: atpBE and atpIHGFDAC.

Conclusion

The biochemistry, physiology and molecular biology of B. subtilis have been extensively studied over the past 40 years. In particular, B. subtilis has been used to study postexponential phase phenomena such as sporulation and competence for DNA uptake. The genome sequences of E. coli and B. subtilis provide a means of studying the evolutionary divergence, one billion years ago, of eubacteria into the Gram-positive and Gram-negative groups. The availability of powerful genetic tools will allow the B. subtilis genome sequence data to be exploited fully within the framework of a systematic functional analysis program, undertaken by a consortium of 19 European and 7 Japanese laboratories coordinated by S. D. Ehrlich (INRA, Jouy-en-Josas, France) and by N. Ogasawara and H. Yoshikawa (Nara Institute of Science and Technology, Nara, Japan).

Methods

Genome cloning and sequencing. An international consortium was established to sequence the genome of B. subtilis strain 168 (refs 9, 10, 42). At its peak, 25 European, seven Japanese and one Korean laboratory participated in the program, together with two biotechnology companies. Five contiguous DNA regions totalling 0.94 Mb, and two additional regions of 0.28 and 0.14 Mb, were sequenced by the Japanese partners, while the European partners sequenced a total of 2.68 Mb. A few sequences from strain 168 published previously were not resequenced when long overlaps did not indicate differences.

A major technical difficulty was the inability to construct in E. coli gene banks representative of the entire B. subtilis chromosome using vectors that have proved efficient for other sources of bacterial DNA (such as bacteriophage or cosmid vectors). This was due to the generally very high level of expression of B. subtilis genes in E. coli, leading to toxic effects. This limitation was overcome by: cloning into a variety of vectors⁹,⁴³,⁴⁴; using an E. coli strain maintaining low-copy number plasmids⁴⁴; using an integrative plasmid/marker rescue genome-walking strategy⁴⁴; and in vitro amplification using polymerase chain reaction (PCR) techniques⁴⁵,⁴⁶.

Although cloning vectors were used in the early stages as templates for sequencing reactions, they were largely superseded in the later stages by long-range and inverse PCR techniques. To reduce sequencing errors resulting from PCR amplification artefacts, at least eight amplification reactions were performed independently and subsequently pooled. The various sequencing groups were free to choose their own strategy, except that all DNA sequences had to be determined entirely on both strands.

Sequence annotation and verification. The sequences were annotated by the groups, and sent to a central depository at the Institut Pasteur¹⁴. The Japanese sequences were also sent there through the Japanese depository at the Nara Institute of Science and Technology. The same procedures were used to identify CDSs and to detect frameshifts. They were embedded within a cooperative computer environment dedicated to automatic sequence annotation and analysis³⁹. In a first step, we identified in all six possible frames the open reading frames (ORFs) that were at least 100 codons in length. In a second step, three independent methods were used: the first method used the GeneMark coding-sequence prediction method⁴⁷ together with the search for CDSs preceded by typical translation initiation signals (5′-AAGGAGGTG-3′), located 4–13 bases upstream of the putative start codons (ATG, TTG or GTG); the second method used the results of a BLAST2X analysis performed on the entire B. subtilis genome against the non-redundant protein databank at the NCBI; and the third method was based on the distribution of non-overlapping trinucleotides or hexanucleotides in the three frames of an ORF⁴⁸.
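The first step, and the translation-initiation-signal search of the second step, can be sketched as follows. This is an illustration under stated assumptions, not the consortium's software: an ORF is taken here to be a stop-free stretch in one frame, the 4–13 base distance is read as the spacer between the 3′ end of the signal and the start codon, and the tolerance `min_match` is a hypothetical parameter (the text does not say how many mismatches were accepted).

```python
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "TTG", "GTG"}  # start codons named in the text
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100):
    """Yield (strand, frame, start, end) for stop-free stretches of >= min_codons codons.

    Coordinates are 0-based, end-exclusive, and refer to the scanned strand
    (for "-", the reverse complement of seq).
    """
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq.upper()))):
        for frame in range(3):
            orf_start = frame
            for pos in range(frame, len(s) - 2, 3):
                if s[pos:pos + 3] in STOPS:
                    if (pos - orf_start) // 3 >= min_codons:
                        yield strand, frame, orf_start, pos
                    orf_start = pos + 3
            # Open stretch running to the end of the sequence without a stop.
            end = frame + ((len(s) - frame) // 3) * 3
            if (end - orf_start) // 3 >= min_codons:
                yield strand, frame, orf_start, end


def has_rbs(s: str, start: int, rbs: str = "AAGGAGGTG", min_match: int = 6) -> bool:
    """True if a window ending 4-13 bases upstream of `start` resembles `rbs`.

    `s` is expected in upper case; `min_match` is an assumed tolerance.
    """
    for spacer in range(4, 14):
        rbs_end = start - spacer
        rbs_start = rbs_end - len(rbs)
        if rbs_start < 0:
            continue
        window = s[rbs_start:rbs_end]
        if sum(x == y for x, y in zip(window, rbs)) >= min_match:
            return True
    return False


def candidate_starts(s: str, orf_start: int, orf_end: int):
    """In-frame start codons within an ORF that are preceded by an RBS-like signal."""
    return [p for p in range(orf_start, orf_end, 3)
            if s[p:p + 3] in STARTS and has_rbs(s, p)]
```

Scanning for stop-free stretches first and looking for start codons afterwards mirrors the two-step description above: ORFs are only candidates, and the presence of an initiation signal is one of several lines of evidence for a CDS.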

In general, frameshifts and missense mutations generating termination codons or eliminating start codons are relatively easy to detect. We shall devise a procedure for detecting another type of error, GC instead of CG or vice versa, which is much more difficult to identify. It should be noted that putative frameshift errors should not be corrected automatically. The sequences of the flanking regions of a 500-bp fragment centred around a putative error were sent to an independent verification group, which performed PCR amplifications using chromosomal DNA as template, and sequenced the corresponding DNA products.

Organization and accessibility of data. The B. subtilis sequence data have been combined with data from other sources (biochemical, physiological and genetic) in a specialized database, SubtiList⁴⁹, available as a Macintosh or Windows stand-alone application (4th Dimension runtime) by anonymous FTP at ftp://ftp.pasteur.fr/pub/GenomeDB/SubtiList. SubtiList is also accessible through a World-Wide Web server at http://www.pasteur.fr/Bio/SubtiList.html, where it has been implemented on a UNIX system using the Sybase relational database management system. A completely rewritten version of SubtiList is in preparation to facilitate browsing of the information of the whole chromosome. Flat files of the whole DNA and protein sequences in EMBL and FASTA format will be made available at the above ftp address. Another B. subtilis genome database is also under development at the Human Genome Center of Tokyo University (http://www.genome.ad.jp), and SubtiList will also be available there.

**Figure 2: General view of the *B. subtilis* chromosome.**

References

1. Fleischmann, R. D. et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496–512 (1995).
2. Fraser, C. M. et al. The minimal gene complement of Mycoplasma genitalium. Science 270, 397–403 (1995).
3. Kaneko, T. et al. Sequence analysis of the genome of the unicellular Cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Res. 3, 109–136 (1996).
4. Bult, C. J. et al. Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science 273, 1058–1073 (1996).
5. Himmelreich, R. et al. Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res. 24, 4420–4449 (1996).
6. Goffeau, A. et al. The yeast genome directory. Nature 387, 5–105 (1997).
7. Tomb, J.-F. et al. The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 388, 539–547 (1997).
8. Blattner, F. R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1462 (1997).
9. Kunst, F., Vassarotti, A. & Danchin, A. Organization of the European Bacillus subtilis genome sequencing project. Microbiology 389, 84–87 (1995).
10. Ogasawara, N. & Yoshikawa, H. The systematic sequencing of the Bacillus subtilis genome in Japan. Microbiology 142, 2993–2994 (1996).
11. Harwood, C. R. Bacillus subtilis and its relatives: molecular biological and industrial workhorses. Trends Biotechnol. 10, 247–256 (1992).
12. Stragier, P. & Losick, R. Molecular genetics of sporulation in Bacillus subtilis. Annu. Rev. Genet. 30, 297–341 (1996).
13. Solomon, J. M. & Grossman, A. D. Who's competent and when: regulation of natural genetic competence in bacteria. Trends Genet. 12, 150–155 (1996).
14. Moszer, I., Kunst, F. & Danchin, A. The European Bacillus subtilis genome sequencing project: current status and accessibility of the data from a new World Wide Web site. Microbiology 142, 2987–2991 (1996).
15. Franks, A. H., Griffiths, A. A. & Wake, R. G. Identification and characterization of new DNA replication terminators in Bacillus subtilis. Mol. Microbiol. 17, 13–23 (1995).
16. Lobry, J. R. Asymmetric substitution patterns in the two DNA strands of bacteria. Mol. Biol. Evol. 13, 660–665 (1996).
17. Hénaut, A. & Danchin, A. in Escherichia coli and Salmonella: Cellular and Molecular Biology (eds Neidhardt, F. et al.) 2047–2066 (ASM, Washington DC, 1996).
18. Nussinov, R. The universal dinucleotide asymmetry rules in DNA and amino acid codon choice. Nucleic Acids Res. 17, 237–244 (1981).
19. Karlin, S., Burge, C. & Campbell, A. M. Statistical analyses of counts and distributions of restriction sites in DNA sequences. Nucleic Acids Res. 20, 1363–1370 (1992).
20. Burge, C., Campbell, A. M. & Karlin, S. Over- and under-representation of short oligonucleotides in DNA sequences. Proc. Natl Acad. Sci. USA 89, 1358–1362 (1992).
21. Kasahara, Y., Nakai, S. & Ogasawara, H. Sequence analysis of the 36-kb region between gntZ and trnY genes of Bacillus subtilis genome. DNA Res. 4, 155–159 (1997).
22. Presecan, E. et al. The Bacillus subtilis genome from gerBC (311°) to licR (334°). Microbiology 143, 3313–3328 (1997).
23. Burkholder, P. R. & Giles, N. H. Induced biochemical mutations in Bacillus subtilis. Am. J. Bot. 33, 345–348 (1947).
24. Daniels, D. L., Plunkett, G. II, Burland, V. & Blattner, F. R. Analysis of the Escherichia coli genome: DNA sequence of the region from 84.5 to 86.5 minutes. Science 257, 771–778 (1992).
25. Wu, L. J. & Errington, J. Bacillus subtilis SpoIIIE protein required for DNA segregation during asymmetric cell division. Science 264, 572–575 (1994).
26. Itaya, M. Stability and asymmetric replication of the Bacillus subtilis 168 chromosome structure. J. Bacteriol. 175, 741–749 (1993).
27. Billoud, B., Kontic, M. & Viari, A. Palingol: a declarative programming language to describe nucleic acids' secondary structures and to scan sequence database. Nucleic Acids Res. 24, 1395–1403 (1996).
28. Fichant, G. A. & Burks, C. Identifying potential tRNA genes in genomic DNA sequences. J. Mol. Biol. 220, 659–671 (1991).
29. d'Aubenton Carafa, Y., Brody, E. & Thermes, C. Prediction of rho-independent Escherichia coli transcription terminators. A statistical analysis of their RNA stem-loop structures. J. Mol. Biol. 216, 835–858 (1990).
30. Stock, J. B., Surette, M. G., Levitt, M. & Park, P. in Two-Component Signal Transduction (eds Hoch, J. A. & Silhavy, T. J.) 25–51 (ASM, Washington DC, 1995).
31. Mizuno, T. Compilation of all genes encoding two-component phosphotransfer signal transducers in the genome of Escherichia coli. DNA Res. 4, 161–168 (1997).
32. Perego, M., Glaser, P. & Hoch, J. A. Aspartyl-phosphate phosphatases deactivate the response regulator components of the sporulation signal transduction system in Bacillus subtilis. Mol. Microbiol. 19, 1151–1157 (1996).
33. Tjalsma, H. et al. Bacillus subtilis contains four closely related type I signal peptidases with overlapping substrate specificities: constitutive and temporally controlled expression of different sip genes. J. Biol. Chem. 272, 25983–25992 (1997).
34. Danchin, A. Comparison between the Escherichia coli and Bacillus subtilis genomes suggests that a major function of polynucleotide phosphorylase is to synthesize CDP. DNA Res. 4, 9–18 (1997).
35. Suutari, M. & Laakso, S. Unsaturated and branched chain-fatty acids in temperature adaptation of Bacillus subtilis and Bacillus megaterium. Biochim. Biophys. Acta 1126, 119–124 (1992).
36. Luttinger, A., Hahn, J. & Dubnau, D. Polynucleotide phosphorylase is necessary for competence development in Bacillus subtilis. Mol. Microbiol. 19, 343–356 (1996).
37. Landès, C., Hénaut, A. & Risler, J.-L. A comparison of several similarity indices used in the classification of protein sequences: a multivariate analysis. Nucleic Acids Res. 20, 3631–3637 (1992).
38. Glémet, E. & Codani, J.-J. LASSAP, a LArge Scale Sequence compArison Package. Comput. Appl. Biosci. 13, 137–143 (1997).
39. Médigue, C., Moszer, I., Viari, A. & Danchin, A. Analysis of a Bacillus subtilis genome fragment using a co-operative computer system prototype. Gene 165, GC37–GC51 (1995).
40. Krogh, S., O'Reilly, M., Nolan, N. & Devine, K. M. The phage-like element PBSX and part of the skin element, which are resident at different locations on the Bacillus subtilis chromosome, are highly homologous. Microbiology 142, 2031–2040 (1996).
41. Daniel, R. A., Drake, S., Buchanan, C. E., Scholle, R. & Errington, J. The Bacillus subtilis spoVD gene encodes a mother-cell-specific penicillin-binding protein required for spore morphogenesis. J. Mol. Biol. 235, 209–220 (1994).
42. Anagnostopoulos, C. & Spizizen, J. Requirements for transformation in Bacillus subtilis. J. Bacteriol. 81, 741–746 (1961).
43. Azevedo, V. et al. An ordered collection of Bacillus subtilis DNA segments cloned in yeast artificial chromosomes. Proc. Natl Acad. Sci. USA 90, 6047–6051 (1993).
44. Glaser, P. et al. Bacillus subtilis genome project: cloning and sequencing of the 97 kb region from 325° to 333°. Mol. Microbiol. 10, 371–384 (1993).
45. Ogasawara, N., Nakai, S. & Yoshikawa, H. Systematic sequencing of the 180 kilobase region of the Bacillus subtilis chromosome containing the replication origin. DNA Res. 1, 1–14 (1994).
46. Sorokin, A. et al. A new approach using multiplex long accurate PCR and yeast artificial chromosomes for bacterial chromosome mapping and sequencing. Genome Res. 6, 448–453 (1996).
47. Borodovsky, M. & McIninch, J. GENMARK: parallel gene recognition for both DNA strands. Comput. Chem. 17, 123–133 (1993).
48. Fichant, G. A. & Quentin, Y. A frameshift error detection algorithm for DNA sequencing projects. Nucleic Acids Res. 23, 2900–2908 (1995).
49. Moszer, I., Glaser, P. & Danchin, A. SubtiList: a relational database for the Bacillus subtilis genome. Microbiology 141, 261–268 (1995).


Acknowledgements

We thank C. Anagnostopoulos, R. Dedonder and J. Hoch for their pioneering efforts, and A. Bairoch for advice in annotating B. subtilis protein data. The main funding of the European network was provided by the European Commission under the Biotechnology program. The Japanese project was included in the Human Genome Program, and supported by a research grant from the Ministry of Education, Science and Culture, and the Proposal-Based Advanced Industrial Technology R&D Program from the New Energy and Industrial Technology Development Organization. The Swiss and Korean projects were funded by the Swiss National Fund and the Korean government, respectively. An industrial platform was set up to facilitate contacts between participants of the European consortium and some European biotechnology companies: DuPont de Nemours (France, USA), Frimond (Belgium), Genencor (Finland, USA), Gist Brocades (The Netherlands), Glaxo-Wellcome (UK, Italy), Hoechst Marion Roussel (France, Germany), F. Hoffmann-La Roche AG (Switzerland), Novo Nordisk (Denmark), SmithKline Beecham (UK).

Author information

Authors and Affiliations

Institut Pasteur, Unité de Biochimie Microbienne, 25 rue du Docteur Roux, Paris, 75724, Cedex 15, France
F. Kunst & G. Rapoport
Nara Institute of Science and Technology, Graduate School of Biological Sciences, Ikoma, 630-01, Nara, Japan
N. Ogasawara, Y. Kasahara, S. Nakai & H. Yoshikawa
Institut Pasteur, Unité de Régulation de l'Expression Génétique, 28 rue du Docteur Roux, Paris, 75724, Cedex 15, France
I. Moszer, M. G. Bertero, L. Boursier, S. Brouillet, P. Glaser, M.-F. Hullo, M. Klaerr-Blanchard, C. Médigue, E. Presecan, E. Rocha, S. Schleich, A. Sekowska & A. Danchin
Dipartimento di Genetica e Microbiologia, Universita di Pavia, Via Abbiategrasso 207, 27100, Pavia, Italy
A. M. Albertini, G. Alloni, M. G. Bertero, A. Galizzi, F. Scoffone & E. Tacconi
INRA, Génétique Microbienne, Domaine de Vilvert, 78352, Jouy-en-Josas Cedex, France
V. Azevedo, P. Bessières, A. Bolotin, V. Capuano, S. D. Ehrlich, N. Galleron, A. Lapidus, P. Pujic, P. Serror, A. Sorokin & E. Zumstein
Institut für Mikrobiologie, J. W. Goethe-Universität, Marie Curie Strasse 9, 60439, Frankfurt/Main, Germany
S. Borchert, K. D. Entian, C. Klein, P. Koetter & M. Rose
Institut für Genetik und Mikrobiologie, Humboldt Universität, Chausseestrasse 17, D-10115, Berlin, Germany
R. Borriss, S. Porwollik & R. Schroeter
Centre d'Ingénierie des Protéines, Université de Liège, Institut de Chimie B6, Sart Tilman, B-4000, Liège, Belgium
A. Brans, B. Joris & S. Lardinois
QIAGEN GmbH, Max-Volmer-Strasse 4, D-40724, Hilden, Germany
M. Braun, A. Düsterhöft, C. Fritz, H. Hilbert, J. Lauber & D. Moestl
Department of Microbiological, Immunological and Virological Sciences, The Medical School, University of Newcastle, Framlington Place, NE2 4HH, Newcastle upon Tyne, UK
S. C. Brignell, N. M. Carter, B. J. Guy, C. R. Harwood & A. Wipat
Department of Genetics, University of Groningen, Kerklaan 30, 9751, NN Haren, The Netherlands
S. Bron, S. Holsappel, M. Noback & P. Terpstra
Atelier de BioInformatique, Université Paris VI, 12 rue Cuvier, 75005, Paris, France
S. Brouillet, C. Médigue, E. Rocha & A. Viari
ICGEB, AREA Science Park, Padriciano 99, I-34012, Trieste, Italy
C. V. Bruschi & V. Tosato
Genencor International, 925 Page Mill Road, Palo Alto, 94304-1013, California, USA
B. Caldwell, E. Ferrari & P. Winters
Applied Microbiology Research Division, Bacterial Molecular Genetics Research Unit, KRIBB, PO Box 115, Yusong, 305-600, Taejon, Korea
S.-K. Choi, S.-Y. Ghim, S.-H. Park & B.-S. Shin
INRIA, Domaine de Voluceau, PB 105, Le Chesnay, 78153, Cedex, France
J.-J. Codani
Department of Food Macromolecular Science, Institute of Food Research, Reading Laboratory, Earley Gate, Whiteknights Road, RG6 6BZ, Reading, UK
I. F. Connerton & N. J. Cummings
Sir William Dunn School of Pathology, University of Oxford, South Parks Road, OX1 3RE, Oxford, UK
R. A. Daniel, J. Errington, D. Foulger & A. M. Prescott
Laboratoire de Chimie Bactérienne, CNRS BP 71, 31 Chemin Joseph Aiguier, Marseille, 13402, Cedex 09, France
F. Denizot, C. Fabret, G. Guiseppi & J. Haiech
Department of Genetics, Trinity College, Lincoln Place Gate, 2, Dublin, Republic of Ireland
K. M. Devine, S. Krogh, D. Noone, M. O'Reilly & E. Scanlan
Department of Biochemistry and Genetics, The Medical School, University of Newcastle, Framlington Place, NE2 4HH, Newcastle upon Tyne, UK
P. T. Emmerson
Radioisotope Center, National Institute of Genetics, Mishima, 411, Shizuoka-ken, Japan
M. Fujita, Y. Sadaie & K. Yata
Department of Biotechnology, Faculty of Engineering, Fukuyama University, Higashimura-cho, Fukuyama-shi, 729-02, Hiroshima, Japan
Y. Fujita & K. Yoshida
Institute of Biological Sciences, Tsukuba University, Tsukuba-shi, 305, Ibaraki, Japan
S. Fuma, M. Kumano, K. Kurita, K. Ogawa, A. Tamakoshi & K. Yamane
Faculté des Sciences Agronomiques, Unité de Biochimie Physiologique, Université Catholique de Louvain, Place Croix du Sud, 2-20 B-1348, Louvain-la-Neuve, Belgium
A. Goffeau & B. Purnelle
Novo Nordisk Biotech, 1445 Drew Avenue, Davis, 95616-4880, California, USA
E. J. Golightly & M. Rey
Eniricerche, Via Maritano 26, San Donato Milanese, Milan, 20097, Italy
G. Grandi & A. Tognoni
Institute of Molecular and Cellular Biology, The University of Tokyo, Bunkyo-ku, 113, Tokyo, Japan
K. Haga, H. Liu, H. Takahashi, K. Yasumoto & H.-F. Yoshikawa
Laboratoire Génome et Informatique, Université de Versailles, Bâtiment Buffon, 45 Avenue des États-Unis, 78035, Versailles Cedex, France
A. Hénaut
Faculty of Agriculture, Tokyo University of Agriculture and Technology, Fuchu, 183, Tokyo, Japan
S. Hosono, Y. Kobayashi, S. Masuda, M. Mizuno, T. Sato, K. Takemaru & M. Takeuchi
Mitsubishi Kasei Institute of Life Sciences, 11 Minamiooya, Machida-shi, 194, Tokyo, Japan
M. Itaya
Institut Pasteur, Service d'Informatique Scientifique, 28 rue du Docteur Roux, Paris, 75724, Cedex 15, France
L. Jones
Institut de Génétique et Biologie Microbiennes, Université de Lausanne, 19 rue César Roux, 1005, Lausanne, Switzerland
D. Karamata, V. Lazarevic, C. Mauël, S. Reynolds, C. Rivolta & B. Soldo
Department of Molecular Microbiology, MBW/BCA, Faculty of Biology, Vrije Universiteit Amsterdam, De Boelelaan 1087, 1081, HV Amsterdam, The Netherlands
G. Koningstein & B. Oudega
Chongju University College of Science and Engineering, Chongju City, Korea
S.-M. Lee
Institut de Génétique et Microbiologie, Université Paris Sud, URA CNRS 2225, Université Paris XI–Bâtiment 409, 91405, Orsay Cedex, France
A. Levine, N. Medina, B. Roche, S. J. Seror & F. Vannier
Centro Nacional de Biotecnologia (CSIC), Campus Universidad Autonoma, Cantoblanco, 28049, Madrid, Spain
R. P. Mellado & V. Parro
National Institute of Basic Biology, 38 Nishigounaka, Myoudaiji-chou, 444, Okazaki, Japan
A. Ogiwara
Gesellschaft für Analyse-Technik und Consulting mbH, Fritz-Arnold Straße 23, D-78467, Konstanz, Germany
T. M. Pohl & T. Weitzenegger
Department of Microbiology, Faculty of Agronomy, 6 Avenue du Maréchal Juin, B-5030, Gembloux, Belgium
D. Portetelle & M. Vandenbol
Biotech Research, BMF, Wilhelmsfeld, Klingelstrasse 35, D-69434, Hirschhorn, Germany
M. Rieger
Department of Applied Biology, Faculty of Textile Science and Technology, Shinshu University 3-15-1, Tokida, Ueda-shi, 386, Nagano, Japan
J. Sekiguchi, S. Uchiyama & H. Yamamoto
Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, 108, Tokyo, Japan
T. Takagi
Department of Marine Science, School of Marine Science and Technology, Tokai University, 3-20-1 Orido Shimizu, 424, Shizuoka, Japan
T. Tanaka
European Commission, DG XII-E-1, SDME 8/78, Rue de la Loi 200, B-1049, Brussels, Belgium
A. Vassarotti
AGOWA GmbH, Glienicker Weg 185, 12489, Berlin, Germany
R. Wambutt, E. Wedler & H. Wedler

Authors

F. Kunst
N. Ogasawara
I. Moszer
A. M. Albertini
G. Alloni
V. Azevedo
M. G. Bertero
P. Bessières
A. Bolotin
S. Borchert
R. Borriss
L. Boursier
A. Brans
M. Braun
S. C. Brignell
S. Bron
S. Brouillet
C. V. Bruschi
B. Caldwell
V. Capuano
N. M. Carter
S.-K. Choi
J.-J. Codani
I. F. Connerton
N. J. Cummings
R. A. Daniel
F. Denizot
K. M. Devine
A. Düsterhöft
S. D. Ehrlich
P. T. Emmerson
K. D. Entian
J. Errington
C. Fabret
E. Ferrari
D. Foulger
C. Fritz
M. Fujita
Y. Fujita
S. Fuma
A. Galizzi
N. Galleron
S.-Y. Ghim
P. Glaser
A. Goffeau
E. J. Golightly
G. Grandi
G. Guiseppi
B. J. Guy
K. Haga
J. Haiech
C. R. Harwood
H. Hilbert
S. Holsappel
S. Hosono
M.-F. Hullo
M. Itaya
L. Jones
B. Joris
D. Karamata
Y. Kasahara
M. Klaerr-Blanchard
C. Klein
Y. Kobayashi
P. Koetter
G. Koningstein
S. Krogh
M. Kumano
K. Kurita
A. Lapidus
S. Lardinois
J. Lauber
V. Lazarevic
S.-M. Lee
A. Levine
H. Liu
S. Masuda
C. Mauël
N. Medina
R. P. Mellado
M. Mizuno
D. Moestl
S. Nakai
M. Noback
D. Noone
M. O'Reilly
K. Ogawa
A. Ogiwara
B. Oudega
S.-H. Park
V. Parro
T. M. Pohl
D. Portetelle
S. Porwollik
A. M. Prescott
E. Presecan
P. Pujic
B. Purnelle
G. Rapoport
M. Rey
S. Reynolds
M. Rieger
C. Rivolta
E. Rocha
B. Roche
M. Rose
Y. Sadaie
T. Sato
E. Scanlan
S. Schleich
R. Schroeter
F. Scoffone
J. Sekiguchi
A. Sekowska
S. J. Seror
P. Serror
B.-S. Shin
B. Soldo
A. Sorokin
E. Tacconi
T. Takagi
H. Takahashi
K. Takemaru
M. Takeuchi
A. Tamakoshi
T. Tanaka
P. Terpstra
A. Tognoni
V. Tosato
S. Uchiyama
M. Vandenbol
F. Vannier
A. Vassarotti
A. Viari
R. Wambutt
E. Wedler
H. Wedler
T. Weitzenegger
P. Winters
A. Wipat
H. Yamamoto
K. Yamane
K. Yasumoto
K. Yata
K. Yoshida
H.-F. Yoshikawa
E. Zumstein
H. Yoshikawa
A. Danchin

Corresponding authors

Correspondence to F. Kunst or N. Ogasawara.

Rights and permissions

This article is distributed under the terms of the Creative Commons Attribution-Non-Commercial-Share Alike licence (http://creativecommons.org/licenses/by-nc-sa/3.0/), which permits distribution and reproduction in any medium, provided the original author and source are credited. This licence does not permit commercial exploitation, and derivative works must be licensed under the same or a similar licence.

About this article

Cite this article

Kunst, F., Ogasawara, N., Moszer, I. et al. The complete genome sequence of the Gram-positive bacterium Bacillus subtilis. Nature 390, 249–256 (1997). https://doi.org/10.1038/36786

Received: 16 July 1997
Accepted: 29 September 1997
Issue Date: 20 November 1997
DOI: https://doi.org/10.1038/36786
