Article (DOI: 10.5555/1758089.1758110)

Frequency of symbol occurrences in simple non-primitive stochastic models

Published: 07 July 2003

Abstract

We study the random variable Y_n representing the number of occurrences of a given symbol in a word of length n generated at random. The stochastic model we assume is a simple non-ergodic model defined by the product of two primitive rational formal series, which form two distinct ergodic components. We obtain asymptotic evaluations for the mean and the variance of Y_n and its limit distribution. It turns out that there are two main cases: if one component is dominant and nondegenerate we get a Gaussian limit distribution; if the two components are equipotent and have different leading terms of the mean, we get a uniform limit distribution. Other particular limit distributions are obtained in the case of a degenerate dominant component and in the equipotent case when the leading terms of the expectation values are equal.
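
To make the model concrete, the following is a minimal numerical sketch, not the authors' method (the paper derives asymptotics analytically). It assumes a two-letter alphabet {a, b}, that each component series is given by a linear representation (xi, A, B, eta) with A and B the non-negative matrices assigned to a and b, and that a word is drawn with probability proportional to its weight in the product series. The function names `symbol_count_poly` and `product_model_distribution`, and the toy 1x1 components with parameters p = 0.8 and q = 0.2, are hypothetical choices made for illustration only.

```python
import numpy as np

def symbol_count_poly(xi, A, B, eta, n):
    """Return c with c[j] = total weight of words of length n containing
    exactly j occurrences of 'a', for the series w -> xi * mu(w) * eta,
    where mu(a) = A and mu(b) = B (assumed linear representation)."""
    m = len(xi)
    V = np.zeros((n + 1, m))   # V[j] = row-vector coefficient of x^j
    V[0] = xi
    for _ in range(n):
        W = V @ B              # appending 'b' keeps the count of 'a'
        W[1:] += V[:-1] @ A    # appending 'a' raises the count by one
        V = W
    return V @ eta

def product_model_distribution(comp1, comp2, n):
    """Exact distribution of Y_n when the weight of a word w is the product
    series: the sum over factorizations w = uv of r1(u) * r2(v)."""
    total = np.zeros(n + 1)
    for k in range(n + 1):
        left = symbol_count_poly(*comp1, k)        # prefix u of length k
        right = symbol_count_poly(*comp2, n - k)   # suffix v of length n - k
        total += np.convolve(left, right)          # counts of 'a' add up
    return total / total.sum()

# Toy equipotent case (assumption): both components have dominant eigenvalue 1
# but emit 'a' at different rates p and q.
p, q, n = 0.8, 0.2, 20
comp1 = (np.array([1.0]), np.array([[p]]), np.array([[1 - p]]), np.array([1.0]))
comp2 = (np.array([1.0]), np.array([[q]]), np.array([[1 - q]]), np.array([1.0]))

dist = product_model_distribution(comp1, comp2, n)
mean = float(np.arange(n + 1) @ dist)
print("mean of Y_n:", mean)
print("P(Y_n = j) for j = 0..n:", np.round(dist, 4))
```

In this toy case every split point k between the two components carries the same total weight, so the mean of Y_n is n(p + q)/2 (approximately 10.0 here) and the mass of Y_n/n spreads roughly evenly between q and p, which is consistent with the uniform limit described for equipotent components with different leading terms of the mean. Making one component's dominant eigenvalue strictly larger would instead concentrate the mass around that component's rate.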


Cited By

  • (2004) On the maximum coefficients of rational formal series in commuting variables. Proceedings of the 8th international conference on Developments in Language Theory, pages 114-126. DOI: 10.1007/978-3-540-30550-7_10. Online publication date: 13-Dec-2004

Published In

DLT'03: Proceedings of the 7th international conference on Developments in language theory
July 2003
437 pages
ISBN: 3540404341
Editors: Zoltán Ésik, Zoltán Fülöp

Publisher

Springer-Verlag

Berlin, Heidelberg
