Abstract
A parsing algorithm visualizer is a tool that visualizes the construction of a parser for a given context-free grammar and then illustrates how that parser parses a given string. Parsing algorithm visualizers are used to teach compiler construction, a course that is invariably included in undergraduate computer science curricula. This paper presents a new parsing algorithm visualizer that can visualize six parsing algorithms, viz. predictive parsing, simple LR parsing, canonical LR parsing, look-ahead LR parsing, Earley parsing and CYK parsing. The tool explains the parsing process step by step, showing the calculations involved in each step. Its output is structured to maximize learning outcomes and includes important constructs such as FIRST and FOLLOW sets, item sets, the parsing table, the parse tree, and the leftmost or rightmost derivation, depending on the algorithm being visualized. The tool has been used to teach compiler construction at both the undergraduate and graduate levels. Student feedback was positive overall, with 89% of the students saying that the tool helped them understand the parsing algorithms. The tool can visualize multiple parsing algorithms, and 88% of the students used it to compare the algorithms.
APPENDIX
1.1 Predictive parsing
Given below is a sample output of the module that visualizes the predictive parsing algorithm. We provide the grammar given in Section 2 and the string “(i + i)*i” as input. The grammar is left-recursive but does not require left-factoring. The module first displays the LL(1) grammar obtained after eliminating left recursion. It then displays the FIRST sets of the 5 terminals and the 5 nonterminals, and the FOLLOW sets of the 5 nonterminals. Next, the module displays the parsing table, which has 5 rows, one for each nonterminal, and 6 columns, one for each terminal and the ‘$’ character. Each cell of the table either contains a production rule or is empty. The module then illustrates the table-driven parsing of the string. For each step, it displays the part of the string already matched, the contents of the parsing stack, the part of the string yet to be matched and the production rule used in that step, if any. The process ends with the acceptance of the string when the entire string has been matched and the parsing stack contains only the ‘$’ character. Since the string is accepted, the module displays the parse tree. Finally, since predictive parsing follows a leftmost derivation, the module displays the leftmost derivation of the string from the start symbol of the grammar.
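The predictive parsing steps described above (FIRST and FOLLOW sets, the LL(1) parsing table and the table-driven parse) can be sketched in Python as follows. This is a minimal illustration, not the tool's implementation. It assumes that the Section 2 grammar is the classic expression grammar E → E + T | T, T → T * F | F, F → ( E ) | i, which agrees with the counts reported here (5 terminals, and 5 nonterminals after left-recursion elimination). The names E' and T' for the new nonterminals, the production numbering and all function names are hypothetical.

```python
# LL(1) predictive parsing sketch. ASSUMPTION: the Section 2 grammar is the classic
# expression grammar; this is its form after eliminating left recursion.
EPS = ""  # stands for epsilon
GRAMMAR = [  # production index -> (lhs, rhs); an empty rhs is an epsilon rule
    ("E", ["T", "E'"]),        # 0
    ("E'", ["+", "T", "E'"]),  # 1
    ("E'", []),                # 2
    ("T", ["F", "T'"]),        # 3
    ("T'", ["*", "F", "T'"]),  # 4
    ("T'", []),                # 5
    ("F", ["(", "E", ")"]),    # 6
    ("F", ["i"]),              # 7
]
START = "E"
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
TERMINALS = {s for _, rhs in GRAMMAR for s in rhs if s not in NONTERMINALS}

def first_of_seq(seq, first):
    """FIRST of a symbol sequence; contains EPS iff every symbol can vanish."""
    result = set()
    for sym in seq:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    return result | {EPS}

def compute_first():
    first = {t: {t} for t in TERMINALS}
    first.update({n: set() for n in NONTERMINALS})
    changed = True
    while changed:  # iterate until no FIRST set grows
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first_of_seq(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

def compute_follow(first):
    follow = {n: set() for n in NONTERMINALS}
    follow[START].add("$")
    changed = True
    while changed:  # iterate until no FOLLOW set grows
        changed = False
        for lhs, rhs in GRAMMAR:
            for k, sym in enumerate(rhs):
                if sym in NONTERMINALS:
                    rest = first_of_seq(rhs[k + 1:], first)
                    new = (rest - {EPS}) | (follow[lhs] if EPS in rest else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

def build_table(first, follow):
    """Maps (nonterminal, lookahead) to a production index; empty cells are absent."""
    table = {}
    for p, (lhs, rhs) in enumerate(GRAMMAR):
        f = first_of_seq(rhs, first)
        for a in (f - {EPS}) | (follow[lhs] if EPS in f else set()):
            if (lhs, a) in table:
                raise ValueError(f"not LL(1): conflict at ({lhs}, {a})")
            table[(lhs, a)] = p
    return table

def parse(tokens, table):
    """Table-driven parse; returns production indices in leftmost-derivation order."""
    stack, tokens, pos, used = ["$", START], list(tokens) + ["$"], 0, []
    while stack[-1] != "$":
        top, a = stack.pop(), tokens[pos]
        if top in TERMINALS:
            if top != a:
                raise SyntaxError(f"expected {top!r}, found {a!r}")
            pos += 1  # terminal matched
        else:
            p = table.get((top, a))
            if p is None:
                raise SyntaxError(f"empty table cell ({top}, {a!r})")
            used.append(p)
            stack.extend(reversed(GRAMMAR[p][1]))  # leftmost RHS symbol ends on top
    if tokens[pos] != "$":
        raise SyntaxError("input left over after the stack emptied")
    return used

def leftmost_derivation(used):
    """Replays each production on the leftmost nonterminal of the sentential form."""
    form, steps = [START], [START]
    for p in used:
        k = next(j for j, s in enumerate(form) if s in NONTERMINALS)
        form[k:k + 1] = GRAMMAR[p][1]
        steps.append(" ".join(form))
    return steps

if __name__ == "__main__":
    first = compute_first()
    table = build_table(first, compute_follow(first))
    for step in leftmost_derivation(parse("(i+i)*i", table)):
        print(step)
```

Under these assumptions the table has 13 non-empty cells, and the parse of “(i + i)*i” uses 16 productions, so the printed leftmost derivation has 17 sentential forms ending in `( i + i ) * i`. Tokens are single characters here only because every terminal of this grammar is one character long.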
1.2 Simple LR parsing
Given below is the output of the module that visualizes the simple LR parsing algorithm for the grammar given in Section 2 and the string “(i + i)*i”. The module first displays the augmented grammar, with Z as the new start symbol. It then displays the FIRST sets of the 5 terminals and the 4 nonterminals, and the FOLLOW sets of the 4 nonterminals. Next, the module displays the item sets; the canonical collection contains 12 item sets, viz. I0 to I11. The module then displays the parsing table, which has 12 rows corresponding to the states constructed from the 12 item sets. The ACTION part of the table has 6 columns, one for each terminal and the ‘$’ character, while the GOTO part has 3 columns, one for each nonterminal except Z. A cell in the ACTION part either specifies the action to be performed or is empty: ‘sj’ means that the current input symbol is shifted and state j is pushed onto the parsing stack, while ‘rj’ means that a reduction is performed using the jth production rule. A cell in the GOTO part either gives the next state or is empty. The module then illustrates the table-driven parsing of the string. For each step, it displays the contents of the parsing stack, the part of the string yet to be matched, the action performed and, if the action is a reduction, the production rule used. Since the string is accepted, the module displays the parse tree. Note that the parse trees displayed by the modules for predictive parsing and simple LR parsing differ, because the two algorithms preprocess the grammar differently: predictive parsing works on the grammar obtained after eliminating left recursion, whereas simple LR parsing works on the augmented grammar. Finally, since simple LR parsing traces a rightmost derivation in reverse, the module displays the rightmost derivation in reverse order.
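The simple LR construction can be sketched in Python as follows: build the canonical collection of LR(0) item sets, fill the ACTION and GOTO parts using FOLLOW sets, and run the shift-reduce driver. The sketch again assumes that the Section 2 grammar is the classic expression grammar, which is consistent with the 12 item sets reported above. The production numbering 0 to 6 and all function names are illustrative assumptions, and the FOLLOW sets are hardcoded rather than computed to keep the block short.

```python
# SLR(1) sketch. ASSUMPTION: the Section 2 grammar is the classic expression grammar.
GRAMMAR = [  # 0 is the augmented start rule; the numbering 1-6 is illustrative
    ("Z", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("i",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
# FOLLOW sets of E, T and F for this grammar, hardcoded for brevity.
FOLLOW = {"E": {"+", ")", "$"}, "T": {"+", "*", ")", "$"}, "F": {"+", "*", ")", "$"}}

def closure(items):
    """LR(0) closure; an item is (production index, dot position)."""
    result, work = set(items), list(items)
    while work:
        p, dot = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)

def goto(items, sym):
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == sym}
    return closure(moved) if moved else None

def canonical_collection():
    """Item sets I0, I1, ... and the transitions between them."""
    states, trans = [closure({(0, 0)})], {}
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    i = 0
    while i < len(states):
        for sym in symbols:
            target = goto(states[i], sym)
            if target is not None:
                if target not in states:
                    states.append(target)
                trans[(i, sym)] = states.index(target)
        i += 1
    return states, trans

def slr_table(states, trans):
    action, goto_part = {}, {}

    def set_action(i, a, act):
        if action.setdefault((i, a), act) != act:
            raise ValueError(f"SLR conflict in state {i} on {a!r}")

    for (i, sym), j in trans.items():
        if sym in NONTERMINALS:
            goto_part[(i, sym)] = j
        else:
            set_action(i, sym, ("s", j))  # 'sj': shift and push state j
    for i, items in enumerate(states):
        for p, dot in items:
            lhs, rhs = GRAMMAR[p]
            if dot == len(rhs):  # completed item
                if p == 0:
                    set_action(i, "$", ("acc",))
                else:
                    for a in FOLLOW[lhs]:
                        set_action(i, a, ("r", p))  # 'rp': reduce by production p
    return action, goto_part

def slr_parse(tokens, action, goto_part):
    """Returns the reductions in order, i.e. a rightmost derivation in reverse."""
    stack, tokens, pos, reductions = [0], list(tokens) + ["$"], 0, []
    while True:
        act = action.get((stack[-1], tokens[pos]))
        if act is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {stack[-1]}")
        if act[0] == "s":
            stack.append(act[1])
            pos += 1
        elif act[0] == "r":
            lhs, rhs = GRAMMAR[act[1]]
            del stack[len(stack) - len(rhs):]  # pop one state per RHS symbol
            stack.append(goto_part[(stack[-1], lhs)])
            reductions.append(act[1])
        else:
            return reductions

if __name__ == "__main__":
    states, trans = canonical_collection()
    print(len(states))  # 12
    print(slr_parse("(i+i)*i", *slr_table(states, trans)))
```

The parsing stack here holds only state numbers, since the grammar symbols can be recovered from the states. Reading the returned list of reductions backwards gives the productions of the rightmost derivation, which is why the derivation is naturally presented in reverse order.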
1.3 Canonical LR parsing
Given below is the output of the module that visualizes the canonical LR parsing algorithm for the grammar given in Section 2 and the string “(i + i)*i”. The module first displays the augmented grammar, with Z as the new start symbol, and then the FIRST sets of the 5 terminals and the 4 nonterminals. It then displays the item sets; the canonical collection contains 22 item sets, viz. I0 to I21. The parsing table displayed next has the same layout as the one used for simple LR parsing, except that it has 22 rows. For a sufficiently large grammar, the canonical LR parsing table can have even an order of magnitude more rows than the simple LR parsing table. The module then illustrates the table-driven parsing of the string; the number of steps is exactly the same as for simple LR parsing. Finally, the module displays the parse tree and the rightmost derivation in reverse order, which are again the same as for simple LR parsing.
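The jump from 12 to 22 item sets comes from the look-ahead symbols carried by LR(1) items. The sketch below builds the canonical collection of LR(1) item sets, again assuming that the Section 2 grammar is the classic expression grammar; the function names are hypothetical. Because this grammar has no ε-productions, FIRST of a symbol string reduces to FIRST of its first symbol, which keeps the closure short.

```python
# Canonical LR(1) item sets. ASSUMPTION: the Section 2 grammar is the classic
# expression grammar, which has no epsilon productions.
GRAMMAR = [
    ("Z", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("i",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def compute_first():
    """FIRST sets; with no epsilon rules only the leading RHS symbol matters."""
    first = {s: {s} for _, rhs in GRAMMAR for s in rhs if s not in NONTERMINALS}
    first.update({n: set() for n in NONTERMINALS})
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            if not first[rhs[0]] <= first[lhs]:
                first[lhs] |= first[rhs[0]]
                changed = True
    return first

FIRST = compute_first()

def closure(items):
    """LR(1) closure; an item is (production index, dot position, lookahead)."""
    result, work = set(items), list(items)
    while work:
        p, dot, la = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            beta = rhs[dot + 1:]
            # FIRST(beta la) is FIRST(beta[0]) when beta is non-empty (no epsilon rules)
            lookaheads = FIRST[beta[0]] if beta else {la}
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs != rhs[dot]:
                    continue
                for b in lookaheads:
                    if (q, 0, b) not in result:
                        result.add((q, 0, b))
                        work.append((q, 0, b))
    return frozenset(result)

def goto(items, sym):
    moved = {(p, d + 1, la) for p, d, la in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == sym}
    return closure(moved) if moved else None

def lr1_collection():
    states = [closure({(0, 0, "$")})]
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    i = 0
    while i < len(states):
        for sym in symbols:
            target = goto(states[i], sym)
            if target is not None and target not in states:
                states.append(target)
        i += 1
    return states

if __name__ == "__main__":
    print(len(lr1_collection()))  # 22
```

Under this assumption the collection has 22 item sets, matching I0 to I21. Most of the extra sets are copies of LR(0) item sets that differ only in their look-ahead symbols, for example one copy for items parsed inside parentheses (look-ahead ‘)’) and one for items outside them (look-ahead ‘$’).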
1.4 Look-ahead LR parsing
The output of the module that visualizes the look-ahead LR parsing algorithm is similar to that of the module for canonical LR parsing, given the same grammar and the same string. However, because item sets with identical cores (the same items when look-ahead symbols are ignored) are merged, the parsing table has fewer rows. In fact, for a given grammar, the look-ahead LR parsing table has the same number of rows as the simple LR parsing table.
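One straightforward, though not the most efficient, way to obtain the look-ahead LR item sets is to build the canonical LR(1) collection and then merge the sets that share a core. The sketch below does exactly that for the assumed classic expression grammar. It repeats the LR(1) construction so that it runs on its own, and its function names are hypothetical.

```python
# LALR(1) item sets by merging LR(1) sets with equal cores.
# ASSUMPTION: the Section 2 grammar is the classic expression grammar (no epsilon rules).
GRAMMAR = [
    ("Z", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("i",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def compute_first():
    first = {s: {s} for _, rhs in GRAMMAR for s in rhs if s not in NONTERMINALS}
    first.update({n: set() for n in NONTERMINALS})
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            if not first[rhs[0]] <= first[lhs]:
                first[lhs] |= first[rhs[0]]
                changed = True
    return first

FIRST = compute_first()

def closure(items):
    """LR(1) closure; an item is (production index, dot position, lookahead)."""
    result, work = set(items), list(items)
    while work:
        p, dot, la = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            lookaheads = FIRST[rhs[dot + 1]] if dot + 1 < len(rhs) else {la}
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot]:
                    for b in lookaheads:
                        if (q, 0, b) not in result:
                            result.add((q, 0, b))
                            work.append((q, 0, b))
    return frozenset(result)

def goto(items, sym):
    moved = {(p, d + 1, la) for p, d, la in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == sym}
    return closure(moved) if moved else None

def lr1_collection():
    states = [closure({(0, 0, "$")})]
    symbols = sorted({s for _, rhs in GRAMMAR for s in rhs})
    i = 0
    while i < len(states):
        for sym in symbols:
            target = goto(states[i], sym)
            if target is not None and target not in states:
                states.append(target)
        i += 1
    return states

def merge_cores(states):
    """Union the LR(1) item sets whose cores (items without lookaheads) coincide."""
    merged = {}
    for items in states:
        core = frozenset((p, d) for p, d, _ in items)
        merged.setdefault(core, set()).update(items)
    return [frozenset(items) for items in merged.values()]

if __name__ == "__main__":
    lr1 = lr1_collection()
    print(len(lr1), len(merge_cores(lr1)))  # 22 12
```

For this grammar the 22 canonical LR(1) item sets collapse to 12, the same number as the LR(0) item sets used for simple LR parsing. Since merging never changes the cores, the merged sets correspond one-to-one with the LR(0) item sets, which is why the two tables always have the same number of rows.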
1.5 Earley parsing
Given below is the output of the module that visualizes the Earley parsing algorithm for the grammar given in Section 2 and the string “(i + i)*i”. The module displays the augmented grammar, with Z as the new start symbol, and then the 8 Earley item sets, viz. S[0] to S[7], one for each position in the 7-terminal string. Since S[7] contains the completed item [Z → E., 0], the string is accepted.
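The Earley item sets can be reproduced with a short recognizer. The sketch below assumes the same classic expression grammar augmented with Z → E, and its function names are hypothetical. An item is written as (left-hand side, right-hand side, dot position, origin), so the item [Z → E., 0] corresponds to `("Z", ("E",), 1, 0)`. Since this grammar has no ε-productions, the sketch omits the extra handling that such productions require.

```python
# Earley recognizer sketch. ASSUMPTION: the classic expression grammar augmented
# with Z -> E; it has no epsilon productions.
GRAMMAR = {
    "Z": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("i",)],
}

def earley(tokens, start="Z"):
    """Builds the Earley sets S[0..n]; an item is (lhs, rhs, dot, origin)."""
    n = len(tokens)
    sets = [[] for _ in range(n + 1)]

    def add(k, item):
        if item not in sets[k]:
            sets[k].append(item)

    for rhs in GRAMMAR[start]:
        add(0, (start, rhs, 0, 0))
    for k in range(n + 1):
        j = 0
        while j < len(sets[k]):  # S[k] may grow while it is being processed
            lhs, rhs, dot, origin = sets[k][j]
            j += 1
            if dot < len(rhs) and rhs[dot] in GRAMMAR:  # predictor
                for alt in GRAMMAR[rhs[dot]]:
                    add(k, (rhs[dot], alt, 0, k))
            elif dot < len(rhs):  # scanner
                if k < n and tokens[k] == rhs[dot]:
                    add(k + 1, (lhs, rhs, dot + 1, origin))
            else:  # completer; origin < k because there are no epsilon rules
                for l2, r2, d2, o2 in sets[origin]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(k, (l2, r2, d2 + 1, o2))
    return sets

def accepts(tokens, start="Z"):
    """Accept iff the last set holds a completed start item with origin 0."""
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in earley(tokens, start)[-1])

if __name__ == "__main__":
    for k, items in enumerate(earley("(i+i)*i")):
        print(f"S[{k}]: {len(items)} items")
    print(accepts("(i+i)*i"))  # True
```

The left-recursive rules E → E + T and T → T * F cause no trouble here: predicting E from an item whose dot is before E simply re-adds an item that is already in the set, and the duplicate check in `add` stops the recursion.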
1.6 CYK parsing
Given below is the output of the module that visualizes the CYK parsing algorithm for the same grammar and the same string. The module displays the grammar after it has been converted to Chomsky normal form (CNF). As many as 8 new nonterminals have been introduced, viz. R, S, U, V, W, X, Y and Z, with Z being the new start symbol. The module then displays the binary matrices. Since the string contains 7 terminals, 7 binary matrices are printed. A value of 1 in the jth row and kth column of the ith binary matrix means that the substring of i terminals starting at the kth terminal can be derived from the jth nonterminal. The string is accepted because the cell corresponding to Z and the first terminal, ‘(’, in the seventh binary matrix has a value of 1. The module also displays the same information as a lower triangular matrix: a nonterminal in the ith row from the bottom and the kth column can derive the substring of i terminals starting at the kth terminal. The presence of Z, the start symbol, in the seventh row from the bottom and the first column denotes the acceptance of the string.
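The binary matrices can be computed with a direct CYK implementation, sketched below. The CNF grammar is a conversion of the assumed classic expression grammar made for this sketch, not the tool's output: unit productions are removed, and PLUS, TIMES, LP, RP, E1, T1 and E2 are hypothetical helper names. Together with Z they also amount to 8 new nonterminals, but their correspondence with R, S, U, V, W, X and Y is unknown. Indices are 0-based, whereas the text counts from 1.

```python
# CYK with one boolean matrix per substring length.
# ASSUMPTION: a hand-made CNF form of the classic expression grammar; helper
# nonterminal names are hypothetical and need not match the tool's R, S, U, ...
UNARY = {  # A -> a
    "Z": {"i"}, "E": {"i"}, "T": {"i"}, "F": {"i"},
    "PLUS": {"+"}, "TIMES": {"*"}, "LP": {"("}, "RP": {")"},
}
BINARY = [  # A -> B C
    ("Z", "E", "E1"), ("Z", "T", "T1"), ("Z", "LP", "E2"),
    ("E", "E", "E1"), ("E", "T", "T1"), ("E", "LP", "E2"),
    ("T", "T", "T1"), ("T", "LP", "E2"), ("F", "LP", "E2"),
    ("E1", "PLUS", "T"), ("T1", "TIMES", "F"), ("E2", "E", "RP"),
]
NONTERMINALS = sorted(set(UNARY) | {a for a, _, _ in BINARY})
ROW = {a: j for j, a in enumerate(NONTERMINALS)}

def cyk(tokens):
    """P[i-1][j][k] is True iff NONTERMINALS[j] derives the i tokens starting at k."""
    n = len(tokens)
    P = [[[False] * n for _ in NONTERMINALS] for _ in range(n)]
    for k, tok in enumerate(tokens):  # matrix for substrings of length 1
        for a, terminals in UNARY.items():
            if tok in terminals:
                P[0][ROW[a]][k] = True
    for length in range(2, n + 1):
        for k in range(n - length + 1):
            for left in range(1, length):  # B covers `left` tokens, C the rest
                for a, b, c in BINARY:
                    if P[left - 1][ROW[b]][k] and P[length - left - 1][ROW[c]][k + left]:
                        P[length - 1][ROW[a]][k] = True
    return P

def accepts(tokens, start="Z"):
    return len(tokens) > 0 and cyk(tokens)[len(tokens) - 1][ROW[start]][0]

def triangle(tokens):
    """Triangular view: entry [i-1][k] is the set of nonterminals deriving i tokens from k."""
    P, n = cyk(tokens), len(tokens)
    return [[{a for a in NONTERMINALS if P[i][ROW[a]][k]} for k in range(n - i)]
            for i in range(n)]

if __name__ == "__main__":
    tokens = "(i+i)*i"
    for i, row in enumerate(triangle(tokens), start=1):
        print(i, [sorted(cell) for cell in row])
    print(accepts(tokens))  # True
```

Each matrix `P[i - 1]` has one row per nonterminal and one column per starting position, mirroring the binary matrices described above; `triangle` merely regroups the same bits by cell, which corresponds to the lower triangular view.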
Cite this article
Sangal, S., Kataria, S., Tyagi, T. et al. PAVT: a tool to visualize and teach parsing algorithms. Educ Inf Technol 23, 2737–2764 (2018). https://doi.org/10.1007/s10639-018-9739-x
DOI: https://doi.org/10.1007/s10639-018-9739-x