Abstract
Engineering diagnosis often involves analyzing complex records of system states printed to large, textual log files. Typically, the logs are designed to accommodate the widest range of debugging needs, without rigorous plans for formatting. As a result, critical quantities and flags are mixed with less important messages in a loose structure. Once the system is sealed, the log format cannot be changed, which causes great difficulty for the technicians who need to understand how events correlate. We describe a modular system for analyzing such logs, in which document analysis, report generation, and data exploration tools are factored into generic, reusable components and domain-dependent, isolated plug-ins. The system supports incremental, focused analysis of complicated symptoms with minimal programming effort and software installation. We discuss important concerns that set the analysis of logs apart from the understanding of natural language text or rigorously structured computer programs. We highlight the research challenges that would guide the development of a deep analysis system for many kinds of semi-structured documents.
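To make the split between generic components and domain-dependent plug-ins concrete, here is a minimal Python sketch. It is not the authors' implementation: the names `extract_records` and `alarm_plugin`, the record shape, and the sample log lines are all hypothetical, invented only to illustrate the idea. It assumes a plug-in is a function that either recognizes a line and returns its fields or declines it.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, Optional

# A record is a flat mapping of field names to string values (an assumption).
Record = Dict[str, str]
# A plug-in inspects one log line and returns a record, or None to decline it.
LinePlugin = Callable[[str], Optional[Record]]


def extract_records(lines: Iterable[str],
                    plugins: Iterable[LinePlugin]) -> Iterator[Record]:
    """Generic component: offer each line to the plug-ins in order."""
    plugin_list = list(plugins)
    for line in lines:
        for plugin in plugin_list:
            record = plugin(line)
            if record is not None:
                yield record
                break  # the first plug-in that recognizes the line wins


# Hypothetical domain plug-in; this line format is invented for illustration.
_ALARM = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) ALARM (?P<unit>\w+) code=(?P<code>\d+)$"
)


def alarm_plugin(line: str) -> Optional[Record]:
    """Domain-dependent part: knows only what an alarm line looks like."""
    match = _ALARM.match(line.strip())
    return match.groupdict() if match else None


# Invented sample: important alarm lines mixed with less important messages.
sample_log = [
    "12:00:01 INFO heartbeat ok",
    "12:00:02 ALARM fan3 code=17",
    "debug: retrying link negotiation",
    "12:00:09 ALARM psu1 code=4",
]

records = list(extract_records(sample_log, [alarm_plugin]))
for record in records:
    print(record)
```

In this sketch, supporting another kind of log means writing another plug-in function and passing it to `extract_records`; the generic loop is left untouched.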
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Flaster, M., Hillyer, B., Ho, T.K. (2006). Exploratory Analysis System for Semi-structured Engineering Logs. In: Bunke, H., Spitz, A.L. (eds) Document Analysis Systems VII. DAS 2006. Lecture Notes in Computer Science, vol 3872. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11669487_26
DOI: https://doi.org/10.1007/11669487_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32140-8
Online ISBN: 978-3-540-32157-6
eBook Packages: Computer Science, Computer Science (R0)