Pseudocode to Python - A Compiler Approach

This paper presents a compiler that converts pseudocode into Python code, facilitating software development and enhancing educational experiences by allowing students to focus on algorithms rather than syntax. The compiler includes components for lexical analysis, syntax analysis, and code generation, automating the conversion process and improving efficiency. The research highlights the importance of this tool in programming education, particularly for beginners, by minimizing the need for complex syntax knowledge.

2024 8th International Conference on Computational System and Information Technology for Sustainable Solutions (CSITSS) | 979-8-3315-0546-2/24/$31.00 ©2024 IEEE | DOI: 10.1109/CSITSS64042.2024.10816767

Pseudocode to Python - A Compiler Approach


Mokshit P, P Syam Prasad, N Sumanth Reddy, Meena Belwal
Department of Computer Science and Engineering,
Amrita School of Computing, Bengaluru, Amrita Vishwa Vidyapeetham, India
mokshithsreekar3@gmail.com, syamprasad7798@gmail.com, sumanthreddynara@gmail.com,
b meena@blr.amrita.edu

Authorized licensed use limited to: University of Wales Trinity Saint David. Downloaded on March 13,2025 at 17:07:07 UTC from IEEE Xplore. Restrictions apply.

Abstract—The transformation of pseudocode to Python is vital because it lets students concentrate on algorithms without being distracted by syntax, and it is a key stage in software development and computer science education. This research presents a tool for turning pseudocode into Python. The compiler supports basic data types, control structures, loops, and functions, which are the main components of introductory programming courses. It consists of three main components: lexical analysis, syntax analysis, and code generation, which are also the major processes of an interpreter. The implementation seeks to offer a swift and dependable way of transforming pseudocode into Python, thereby accelerating software development and improving the educational experience.

Index Terms—Compiler Design, Lex, Pseudocode Conversion, Python

I. INTRODUCTION

The development of programming skills, especially in the educational use of coded algorithms, commonly relies on pseudocode. Pseudocode abstracts away the syntax of specific programming languages while still indicating the flow of a program. By using pseudocode, students can refine their ability to write correct code without losing their ability to think algorithmically. Nevertheless, it may be hard for beginners to write executable code from pseudocode, and their success depends on how familiar they are with the syntax of the target programming language and with the structure of algorithms.

This work develops a pseudocode-to-Python converter that requires minimal user input, namely structured pseudocode, which the compiler [18] automatically translates into Python, one of the most widely used programming languages in both academia and industry. Consequently, this tool not only helps students in practical programming lessons but also allows educators to demonstrate the practical implementation of algorithms. The compiler supports basic data types, branches, loops, and function definitions. A programmer who is learning the basics of programming need not know anything else to start writing programs.

The compiler's design is centered around three main components: tokenization [1][19], grammar analysis, and code output. In the lexical analysis phase, the input code is broken down into tokens, the elements that carry the meaning of the code. The syntax analysis phase [2] matches the tokens against a pre-defined grammar and verifies that the pseudocode follows the pre-specified structure of a programming language. In the last stage, code generation, all tokens are converted into syntactically correct Python code [20] that executes exactly the same operations as stated in the original pseudocode. This work was implemented in Python.

This paper presents the design and implementation of a compiler that translates pseudocode into Python code using tokenization and syntax translation techniques. By breaking down pseudocode into tokens and mapping these tokens to Python constructs, the compiler automates the conversion process, enhancing development efficiency and reducing errors. The rest of the paper is organised as follows: Section II discusses related work, Section III presents the proposed methodology, Section IV reports the results, and Section V concludes the paper.

II. RELATED WORK

Akanbi et al. [3] present a detailed study of code generation techniques in compiler design, focusing on the structure and design strategies needed for particular computational environments. The article covers methods such as peephole optimization, parse trees, and three-address code generation, and explains the specific benefits and suitability of each in various situations. The study also shows how these techniques affect compiler efficiency, stressing the importance of code optimization for performance and resource management.

Patade et al. [4] describe the design of an automatic code generator for C and C++ that uses structured flowcharts to assist the coding process. The tool, intended for both experts and novices, lets users design an algorithm graphically through flowcharts, which are then automatically turned into executable code. The research examined user satisfaction and found that the user groups were highly satisfied with the tool, which the authors take as evidence that it is effective and usable. By minimizing the need to remember complex syntax, the application simplifies the programming task, making programs easier to understand and manipulate and reducing the chance of mistakes.

Ravichandran et al. [5] analyze the intricacies of code generation techniques in compiler design, considering both the theoretical and practical aspects of these techniques. Their study shows different methods, among which are peephole
optimization, parse trees, and three-address code generation, which together increase compiler efficiency and accuracy. The study also examines the effect of these methods on the optimization of compiler back-end processes and offers insights into their appropriateness for different computing environments. This analysis supports improvements in compiler design and execution in real applications.

Parekh et al. [6] created a system to translate pseudocode into source code in languages such as C, C++, and Java using neural networks, specifically cascade back-propagation neural networks. The system is designed to support both beginners and experienced programmers by automating the conversion of logical constructs into syntactically correct programming statements, thereby bridging natural and programming languages. Their approach not only makes coding easier but also improves learning and coding efficiency by concentrating on algorithmic logic instead of language syntax. The tool thus helps bring programming to a broader audience while reducing the cognitive load on developers.

Kapse et al. [7] describe the transformation of high-level language algorithms into C/C++ through a rule-based approach assisted by syntax-directed translation (SDT), without the use of an intermediate representation. Their system automatically translates algorithms written in English into C/C++ code using tools such as Flex for token generation and Bison for parsing. The methodology addresses the ambiguity of natural language by using a common dictionary that maps synonyms to the same term, thus simplifying the translation. The system aims to make programming more accessible by letting users enter commands in natural language, which are then automatically converted into syntactically correct code, lowering the barrier to entry for non-programmers.

Lexical analysis is a main part of compilation, in which the source code is scanned and turned into tokens. Farhanaaz et al. [8] describe its roles in error handling, preprocessing, and token generation. Tokens are specified by regular expressions and recognized by finite automata. The research compares deterministic finite automata (DFA) and non-deterministic finite automata (NFA) and describes the conversion methods. Regular expressions (REs) are presented as language descriptors that are translated into finite state machines (FSMs) for recognition. Lexical specification tools like Lex perform this process from user-defined regular expressions. In multi-core systems, lexical analysis can exploit parallelism to make tokenization faster; techniques such as loop parallelism and processor affinity optimize the compilation process by distributing tasks across cores. The study stresses the significance of tokenization and illustrates the benefit of parallelization in reducing compile time.

Sridhar et al. [9] propose a method for converting natural language instructions into executable code with a compiler-based system. It addresses the gap between human language and programming languages through a complete framework comprising speech recognition, lexical analysis, syntax analysis, semantic analysis, and code generation. The architecture consists of a Flask server that processes user instructions and a web-based code editor for displaying and editing the generated code. The main parts are a speech engine for voice input, lexical analysis for keyword detection, syntax analysis for text-to-code mapping, and semantic analysis for type checking and variable management. The system is evaluated by measuring processing time, accuracy, precision, recall, and error percentage. Future work is proposed to extend the system to other programming languages and to use dynamic learning algorithms for improved functionality. The method offers a way to make programming more accessible and efficient by means of natural language interfaces.

Steuwer et al. [10] propose a language-oriented compiler design, motivated by trends toward specialized software and hardware, especially in machine learning. The paper suggests viewing intermediate representations (IRs) as formal programming languages, so that a type system can be used to ensure their correct use. They discuss the Shine compiler, which compiles the RISE functional pattern-based language to C, OpenCL, and OpenMP through a hybrid functional-imperative IR. The authors argue that this design offers increased reliability and certainty compared with the LIFT IR. The method highlights clear levels of IR abstraction and expressive transformation capabilities. Experimental evaluation shows that the generated GPU code is of high quality, indicating that the language-oriented design is useful for specialized compilers.

Sesha et al. [11] introduce EzLang, a new interpreted programming language designed to simplify data visualization tasks. EzLang features dynamic typing, common programming constructs, and a rich library of functions tailored for data visualization. Its performance compares favorably with existing languages in terms of execution speed and memory usage, making it suitable for small to medium-sized programs. The research covers interpreter design, error handling, and library development, offering insights into effective language implementation. Future enhancements include better memory management, stricter type checking, and expanded code reuse capabilities. EzLang was developed using Bison for the grammar, Flex for lexical analysis, and C for the evaluation engine, demonstrating competitive performance for data visualization in various applications.

Xiao et al. [12] created a C-like language interpreter in C++ with a modular architecture that comprises lexical analysis, syntax analysis, and expression evaluation components. The interpreter supports simple language components like loops, conditionals, arithmetic operations, and function calls. Extensive testing on Windows XP, Mac OS X 10.6.1, and iPhone OS 3.0 shows its ability to work on different platforms and its robust functionality. Moreover, the interpreter can be embedded into games to enable dynamic modification of game logic via text input without recompilation. Its structure emphasizes code reuse and extensibility, which allows new interface functions to be added at runtime. Nevertheless, the program lacks some features such as object code generation and thorough variable scope control. Next steps could include the development of further data types and the improvement of error detection. In summary, this research shows how interpreters can provide flexible and dynamic behavior in software systems, especially in gaming applications.

Nguyen et al. [13] apply Abstract Syntax Trees (ASTs) to the evaluation of code in programming education. Their work shows how ASTs improve the evaluation process on educational platforms by increasing accuracy, speed, and error detection. They show how code patterns can be matched on ASTs using tree regular expressions so that code can be assessed automatically. The paper shows that Python can work with ASTs, so that code submissions handled as strings can be analyzed. The application of ASTs in platforms like LeetCode and HackerRank has been a key factor in improving grading and feedback, which shows their potential to enhance the evaluation of programming skills and the development of assessment techniques in educational tools.

Moore et al. [14] created an educational tool that uses keystroke data to build a temporal AST that helps computer science beginners understand the coding thought process better. The method tries to portray the nonlinear mental construction associated with software development by observing how a student's code changes over time. The tool connects keystroke dynamics with structural changes in the AST, so that the student's code-writing strategy and problem solving are shown in a real-time graphical form. The research builds on the Code Process Chart (CPC), which displays coding actions over time, and tries to combine these findings with video of the student to get a better view of the coding process, which in turn can help students improve their approach to software development.

Lin et al. [15] proposed Block-wise Abstract Syntax Tree Splitting (BASTS), a method for improving automatic code summarization. BASTS addresses the problems of modeling and producing code summaries from ASTs by splitting the code block-wise according to the dominator tree of the control flow graph. The method produces a separate AST for each code segment, and these are processed by a Tree-LSTM with a pre-training strategy to capture complex syntax patterns. These patterns are then used in a Transformer-based model to create more effective code summaries. Their technique demonstrated substantial improvement over earlier methods on the benchmarks, and they plan future improvements involving more sophisticated program slicing.

Imam et al. [16] developed CodeComposer, an automatic code generator that converts pseudocode into C code. The tool uses natural language processing techniques such as verb classification, thematic roles, and semantic role labeling to analyze pseudocode, which in this case is treated as natural English text. The analyses provide a semantic rule-based mapping of each component of the code for its composition. CodeComposer, an I-CASE tool, has been assessed for its effectiveness, achieving a precision of 88 percent, a recall of 91 percent, and an F-measure of 89 percent. This indicates its ability to help developers by making it easier to turn pseudocode into executable code.

Amal et al. [17] developed a software tool that converts pseudocode into different programming languages, thus making the coding process simpler for beginners. The tool lets users type pseudocode, select a language of their choice, and get the corresponding code, so the user can skip learning the syntax of the language. The program is designed to make the introduction to programming easier for new programmers. The tool has a user-friendly interface and a flexible library that can be extended to cover more languages. Ultimately, this could be turned into a universal programming tool that can create code in any language from a single pseudocode input, making it easier for beginners to understand the code.

There has been limited work on pseudocode-to-Python converters implemented in Python, which are important for beginners. In the next section, we present our work.

III. PROPOSED METHODOLOGY

The proposed system is presented in Fig. 1 and explained below.

Fig. 1. System Diagram

A. Lexical Analysis

Lexical analysis transforms the input pseudocode text into tokens, the elementary constituents for parsing. This stage defines token patterns using regular expressions that describe identifiers, constants, operators, and punctuation. The Python re module is employed to match these patterns. The pseudocode is scanned, tokens are extracted, and whitespace is ignored, so that the resulting token sequence reflects the syntactic structure of the input.

B. Syntax Analysis

Syntax analysis organizes the token sequence into a structured form that is used in the generation of Python code.
In this phase, pseudocode-specific language constructs are replaced by their Python equivalents; for instance, "is empty" is changed to "== []". The phase also manages indentation carefully, because Python relies on it to define code blocks, so that the translated code conforms to Python's strict syntax. It also handles the translation of pseudocode function definitions into Python functions, parsing and incorporating their parameters.
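
The tokenization of Section III-A and the phrase replacement described above can be illustrated with a minimal sketch. It is not the authors' implementation: the token patterns in `TOKEN_SPEC`, the helper names `tokenize` and `map_phrases`, and the error handling are assumptions made for illustration. Only the use of the Python `re` module and the "is empty" to "== []" mapping come from the text.

```python
import re

# Hypothetical token patterns; the paper does not list its exact regular expressions.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),          # integer or decimal constants
    ("IDENT", r"[A-Za-z_]\w*"),            # identifiers and keywords
    ("OP", r"==|!=|<=|>=|[+\-*/%<>=]"),    # operators, two-character ones first
    ("PUNCT", r"[()\[\],:]"),              # punctuation
    ("SKIP", r"\s+"),                      # whitespace, ignored
    ("MISMATCH", r"."),                    # anything else is an error
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(line):
    """Return (kind, text) pairs for one pseudocode line, ignoring whitespace."""
    tokens = []
    for match in TOKEN_RE.finditer(line):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens


# Phrase-level rewrites; only "is empty" -> "== []" is taken from the paper.
PHRASE_MAP = [(re.compile(r"\bis empty\b"), "== []")]


def map_phrases(line):
    """Replace pseudocode-specific phrases with their Python equivalents."""
    for pattern, replacement in PHRASE_MAP:
        line = pattern.sub(replacement, line)
    return line


print(tokenize("total = total + 10"))
print(map_phrases("if stack is empty"))  # prints: if stack == []
```

Listing the patterns in priority order and joining them into one alternation keeps the scanner to a single pass per line; further phrase rewrites would be added to `PHRASE_MAP` in the same way.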
C. Code Generation

The last stage, code generation, transforms the structured token stream into executable Python code. This phase begins by re-checking the tokenizer output to make sure that all syntactic parts are properly analyzed. It then applies the transformations and mappings decided during syntax analysis and translates them into Python code blocks. Assembling the Python script from the tokens ensures that it is syntactically correct and ready for execution, thus converting the pseudocode into a working Python program.

The proposed compiler methodology thus automates the translation from pseudocode to Python and produces accurate, executable Python scripts. This organized method of code conversion increases productivity and code quality, which is important in both educational and software development contexts. In the next section, we present our results.

IV. RESULTS

The proposed method accurately converted pseudocode examples into Python code, including calculating factorials, summing natural numbers, and matrix multiplication. We present the obtained results in the following examples.

Example 1: Factorial(n)

Fig. 2 is the input for factorial(n), and Fig. 3 is the corresponding output.

Fig. 2. Pseudocode for factorial(n)

Fig. 3. Python code for factorial(n)

Example 2: Sum of n natural numbers

The input and output for the sum of n natural numbers are depicted in Fig. 4 and Fig. 5, respectively.

Fig. 4. Pseudocode for sum of n natural numbers

Fig. 5. Python code for sum of n natural numbers

Example 3: Matrix multiplication

Fig. 6 and Fig. 7 depict the input and output, respectively, for matrix multiplication.

The generated Python code produced correct results: matrix multiplication yielded [[19, 22], [43, 50]], the factorial of 5 yielded 120, and the sum of the first 10 natural numbers yielded 55. This demonstrates the method's effectiveness and potential for streamlining Python code development from pseudocode. To the best of our knowledge, no other work has applied lex and YACC to convert pseudocode to Python. In the next section, we conclude our work.
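
To make the reported values concrete, the sketch below runs a small line-based translator over three pseudocode functions and executes the generated Python. It is an illustration under stated assumptions, not the authors' compiler: the keyword set (`FUNCTION`, `IF ... THEN`, `FOR ... FROM ... TO`, `ENDFOR`, and so on), the `translate` function, and the pseudocode text itself are hypothetical, because the figures with the original input are not reproduced here. The input matrices `[[1, 2], [3, 4]]` and `[[5, 6], [7, 8]]` are likewise assumed; the paper reports only the product.

```python
import re

# Hypothetical block openers and their Python templates (not from the paper).
OPENERS = [
    (re.compile(r"^FUNCTION\s+(\w+)\s*\((.*)\)$"), r"def \1(\2):"),
    (re.compile(r"^IF\s+(.+)\s+THEN$"), r"if \1:"),
    (re.compile(r"^WHILE\s+(.+)\s+DO$"), r"while \1:"),
    (re.compile(r"^FOR\s+(\w+)\s+FROM\s+(.+)\s+TO\s+(.+)$"), r"for \1 in range(\2, \3 + 1):"),
]
CLOSERS = {"ENDFUNCTION", "ENDIF", "ENDWHILE", "ENDFOR"}


def translate(pseudocode):
    """Translate block-structured pseudocode into indented Python source."""
    depth = 0
    output = []
    for raw in pseudocode.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in CLOSERS:          # a closing keyword only reduces the indent
            depth -= 1
            continue
        if line == "ELSE":           # else sits one level above the block body
            output.append("    " * (depth - 1) + "else:")
            continue
        for pattern, template in OPENERS:
            if pattern.match(line):
                output.append("    " * depth + pattern.sub(template, line))
                depth += 1
                break
        else:                        # plain statement, RETURN is lowercased
            statement = re.sub(r"^RETURN\b", "return", line)
            output.append("    " * depth + statement)
    return "\n".join(output) + "\n"


PSEUDOCODE = """
FUNCTION factorial(n)
    IF n == 0 THEN
        RETURN 1
    ENDIF
    RETURN n * factorial(n - 1)
ENDFUNCTION

FUNCTION sum_to(n)
    total = 0
    FOR i FROM 1 TO n
        total = total + i
    ENDFOR
    RETURN total
ENDFUNCTION

FUNCTION matmul(a, b)
    result = [[0, 0], [0, 0]]
    FOR i FROM 0 TO 1
        FOR j FROM 0 TO 1
            FOR k FROM 0 TO 1
                result[i][j] = result[i][j] + a[i][k] * b[k][j]
            ENDFOR
        ENDFOR
    ENDFOR
    RETURN result
ENDFUNCTION
"""

namespace = {}
exec(translate(PSEUDOCODE), namespace)  # run the generated Python source

print(namespace["factorial"](5))                               # 120
print(namespace["sum_to"](10))                                 # 55
print(namespace["matmul"]([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Tracking a single `depth` counter is enough here because every block in this assumed dialect has an explicit closing keyword; a dialect that relies on the indentation of the input would need a different scheme.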

Fig. 6. Pseudocode for matrix multiplication

Fig. 7. Python code for matrix multiplication

V. CONCLUSION

This paper introduces a method for converting pseudocode to Python using a tokenizer and a translator function. The approach defines token patterns and keywords to tokenize the input pseudocode, which is then translated into Python code with proper syntax and indentation. Our method effectively manages control structures, function definitions, and arithmetic operations. The successful translation of a matrix multiplication pseudocode example demonstrates the method's practicality. This automated process can significantly enhance software development by simplifying the transition from pseudocode to executable code. Future work will focus on supporting more complex pseudocode constructs and improving error handling to make the translator more robust and versatile.

REFERENCES

[1] Vaikunta Pai T and P. S. Aithal, "A Systematic Literature Review of Lexical Analyzer Implementation Techniques in Compiler Design," International Journal of Applied Engineering and Management Letters (IJAEML), 4(2), 285-301, ISSN: 2581-7000, 2021.
[2] Manju and Rajesh Kumar, "Automatic Scanning and Parsing using LEX and YACC," International Journal of Computer Science and Mobile Computing, Vol. 6, Issue 7, July 2017, pp. 77-82.
[3] Johnson Akanbi, Allen Akinkitan, and Kareem afiss Emiola, "Code Generation Techniques in Compiler Design: Conceptual and Structural Review," SSRG International Journal of Recent Engineering Science, Volume 9, Issue 3, 1-6, May-Jun 2022, ISSN: 2349-7157.
[4] Sanika Patade, Pratiksha Patil, Ashwini Kamble, and Prof. Madhuri Patil, "Automatic Code Generation for C and C++ Programming," International Research Journal of Engineering and Technology (IRJET), Volume 08, Issue 05, May 2021.
[5] S. Ravichandran, K. G. S. Venkatesan, and V. Sitharamulu, "Design and Development of Code Generation Procedures in Compiler Scheme: Theoretical and Essential Appraise," Journal of Network Communications and Emerging Technologies (JNCET), Volume 12, Issue 6, June 2022.
[6] Parekh, V. and Nilesh, D., "Pseudocode to source code translation," Intl. J. Emerging Technologies and Innovative Research (JETIR), 3(11), pp. 45-52, 2016.
[7] Kapse, A.S., Thakare, V.M. and Kapse, A.S., "Translation of High Level Language Algorithm in C/C++ Program using Syntax Directed Translation."
[8] Farhanaaz and V. Sanju, "An exploration on lexical analysis," 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), Chennai, India, 2016, pp. 253-258, doi: 10.1109/ICEEOT.2016.7755127.
[9] Sashank Sridhar and Sowmya Sanagavarapu, "A Compiler-based Approach for Natural Language to Code Conversion," International Conference on Computer and Informatics Engineering (IC2IE), 2020.
[10] Steuwer, M., Koehler, T., Köpcke, B. and Pizzuti, F., "RISE and shine: Language-oriented compiler design," arXiv preprint arXiv:2201.03611, 2022.
[11] Sesha, P.J., Bairagi, S.A., Abhishek, K., Yadav, D. and Bharati, R., "EzLang: A C Based Programming Language," in 2023 7th International Conference On Computing, Communication, Control And Automation (ICCUBEA), pp. 1-5, IEEE, August 2023.
[12] Xiao, X. and Xu, Y., "The design and implementation of C-like language interpreter," in 2011 2nd International Symposium on Intelligence Information Processing and Trusted Computing, pp. 104-107, IEEE, October 2011.
[13] Nguyen, A.T.P. and Hoang, V.D., "Development of Code Evaluation System based on Abstract Syntax Tree," Journal of Technical Education Science, 19(1), pp. 15-24, 2021.
[14] Moore, D., Edwards, J., Karimi, H., Khadka, R. and Bodily, P., "Temporal Abstract Syntax Trees for Understanding Student Coding Thought Process," in 2022 Intermountain Engineering, Technology and Computing (IETC), pp. 1-6, IEEE, May 2022.
[15] Lin, C., Ouyang, Z., Zhuang, J., Chen, J., Li, H. and Wu, R., "Improving code summarization with block-wise abstract syntax tree splitting," in 2021 IEEE/ACM 29th International Conference on Program Comprehension (ICPC), pp. 184-195, IEEE, May 2021.
[16] Imam, A.T. and Alnsour, A.J., "The use of natural language processing approach for converting pseudo code to C code," Journal of Intelligent Systems, 29(1), pp. 1388-1407, 2019.
[17] Amal, M.R., Jamsheedh, C.V. and Mathew, L.S., "Software tool for translating pseudocode to a programming language," Intl. J. Cybernetics and Informatics, 5(2), pp. 79-87, 2016.
[18] Likhith, Arlagadda Naga, Kothuru Gurunadh, Vimal Chinthapalli, and Meena Belwal, "Compiler For Mathematical Operations Using English Like Sentences," in 2023 7th International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), pp. 1-6, IEEE, 2023.
[19] Sougandh, Thatavarthi Giri, Nithish Sagar Reddy, and Meena Belwal, "Automated Resume Parsing: A Natural Language Processing Approach," in 2023 7th International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), pp. 1-6, IEEE, 2023.
[20] Tiwari, S.P., Prasad, S. and Thushara, M.G., "Machine Learning for Translating Pseudocode to Python: A Comprehensive Review," in 2023 7th International Conference on Intelligent Computing and Control Systems (ICICCS), pp. 274-280, IEEE, May 2023.
