2022, IRJET
The process of turning a string of characters into a string of tokens is known as lexical analysis, commonly referred to as lexing or tokenization. These tokens may be keywords, identifiers, constants, operators, or other language-specific symbols. The word "lexical" derives from "lexeme", the term for the unit of text matched as a token. Lexical analysis usually involves reading the input character by character, grouping characters into tokens, and passing these tokens to a parser or other program for further processing. It is often the first step in compiling or interpreting a program, and it is also used in natural language processing, information retrieval, and other fields where the elements of a body of text must be identified and classified. In general, lexical analysis breaks a stream of text into a sequence of tokens, which other programs can then process and analyze further. It is an important step in the compilation and interpretation of programming languages, as well as in the processing of natural language.
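To make the procedure concrete, the following Python sketch reads the input character by character and groups characters into keyword, identifier, constant, and operator tokens. The token categories and the tiny keyword set are illustrative assumptions, not anything prescribed by the paper.

```python
# Minimal character-by-character lexer sketch. The keyword and operator
# sets below are illustrative assumptions.
KEYWORDS = {"if", "else", "while", "return"}
OPERATORS = set("+-*/=<>")

def tokenize(source):
    tokens, i = [], 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                       # whitespace separates tokens
            i += 1
        elif ch.isalpha() or ch == "_":        # group letters into a word
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            word = source[start:i]
            tokens.append(("KEYWORD" if word in KEYWORDS else "IDENTIFIER", word))
        elif ch.isdigit():                     # group digits into a constant
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("CONSTANT", source[start:i]))
        elif ch in OPERATORS:
            tokens.append(("OPERATOR", ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens

print(tokenize("while x < 10 return x + 1"))
# [('KEYWORD', 'while'), ('IDENTIFIER', 'x'), ('OPERATOR', '<'), ('CONSTANT', '10'), ...]
```

The resulting token stream is exactly what a parser would consume in the next phase.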
International Journal of Applied Engineering and Management Letters (IJAEML), 2020
The term "lexical" in the lexical analysis phase of compilation is derived from the word "lexeme", which is the basic conceptual unit of linguistic morphological study. In computer science, lexical analysis, also referred to as lexing, scanning, or tokenization, is the process of transforming the string of characters in a source program into a stream of tokens, where a token is a string with a designated and identified meaning. It is the first phase of the two-step compilation processing model known as the analysis stage, which the compiler uses to understand the input source program. The objective is to convert character streams into words and recognize each token's type. The generated stream of tokens is then used by the parser to determine the syntax of the source program. The compilation-phase program that performs lexical analysis is termed a lexical analyzer, lexer, scanner, or tokenizer. Lexical analyzers are used in various computer science applications, such as word processing, information retrieval systems, pattern recognition systems, and language-processing systems; however, the scope of our review study is language processing. Various tools are used for the automatic generation of tokenizers and are better suited to sequential execution of the process. Recent advances in multi-core architectures have created a need to re-engineer the compilation process to exploit them: by parallelizing the recognition of tokens across multiple cores, the cores can be used optimally, thus reducing compilation time. To attain parallelism in tokenization on multi-core machines, the lexical analyzer phase of compilation needs to be restructured to accommodate the multi-core architecture, exploiting the language constructs that can run in parallel and the concept of processor affinity. This paper provides a systematic analysis of the literature to discuss emerging approaches and issues related to lexical analyzer implementation and the adoption of improved methodologies. This has been achieved by reviewing 30 published articles on the implementation of lexical analyzers. The results of this review indicate various techniques, recent developments, and current approaches for implementing auto-generated and hand-crafted scanners. Based on the findings, we assess the efficacy of lexical analyzer implementation techniques from the results discussed in the selected studies, and the paper identifies future research challenges and previously under-researched areas of the scanner implementation process that merit exploration.
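As a rough illustration of the parallel tokenization idea the review surveys (not the method of any particular reviewed paper), the sketch below splits the source on line boundaries and scans the chunks in separate worker processes. It assumes no token spans a line break, which holds for many languages but fails for multi-line comments or strings.

```python
# Sketch of tokenization parallelized across cores. Splitting on line
# boundaries assumes no token crosses a line break (an assumption that
# multi-line comments or strings would violate).
import re
from concurrent.futures import ProcessPoolExecutor

TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[+\-*/=<>();]")

def scan_chunk(lines):
    # Each worker tokenizes its contiguous group of lines sequentially.
    return [tok for line in lines for tok in TOKEN_RE.findall(line)]

def parallel_tokenize(source, workers=4):
    lines = source.splitlines()
    size = max(1, len(lines) // workers)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so the concatenated stream matches
        # what a sequential scan of the whole source would produce.
        return [tok for chunk in pool.map(scan_chunk, chunks) for tok in chunk]

if __name__ == "__main__":
    print(parallel_tokenize("x = 1\ny = x + 2\nz = y * 3\n"))
```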
Syntactic Wordclass Tagging. Kluwer Academic …, 1999
1994
Any linguistic treatment of freely occurring text must provide an answer to what is considered as a token. In artificial languages, the definition of what is considered as a token can be precisely and unambiguously defined. Natural languages, on the other hand, display such a rich variety that there are many ways to decide upon what will be considered as a unit for a computational approach to text. Here we will discuss tokenization as a problem for computational lexicography. Our discussion will cover the aspects of what is usually considered preprocessing of text in order to prepare it for some automated treatment. We present the roles of tokenization, methods of tokenizing, grammars for recognizing acronyms, abbreviations, and regular expressions such as numbers and dates. We present the problems encountered and discuss the effects of seemingly innocent choices.
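In the spirit of the tokenization grammars the paper discusses, a small regex-based tokenizer for natural-language text might look like the Python sketch below. The patterns for dates, numbers, and acronyms are deliberately simplistic assumptions; real systems need far richer rules, and the example shows how a seemingly innocent choice (here, having no pattern for "%") silently drops material.

```python
# Regex tokenizer sketch for natural-language text. The patterns are
# illustrative assumptions, far simpler than production rules.
import re

TOKEN_RE = re.compile(r"""
    \d{1,2}/\d{1,2}/\d{2,4}        # dates such as 12/05/1994
  | \d+(?:\.\d+)?                  # integers and decimals
  | (?:[A-Za-z]\.){2,}             # acronyms such as U.S.A.
  | [A-Za-z]+(?:'[A-Za-z]+)?       # words, keeping simple clitics (didn't)
  | [.,;:!?]                       # sentence punctuation as its own token
""", re.VERBOSE)

text = "The U.S.A. budget grew 3.5% on 12/05/1994, didn't it?"
print(TOKEN_RE.findall(text))
# Note: '%' matches no pattern and is silently discarded -- one of the
# seemingly innocent choices whose effects the paper examines.
```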
International Journal of Scientific Research in Science, Engineering and Technology, 2020
A compiler is a computer program that translates computer code written in one programming language (the source language) into another language (the target language). The name compiler is primarily used for programs that translate source code from a high-level programming language to a lower-level language [7]. The three main processes of compilation are lexical analysis, syntax analysis, and semantic analysis. A compiler has two components, a front-end and a back-end. The front-end portion of a compiler has two main tasks: lexical analysis and syntax analysis. In lexical analysis, the input source code is scanned and split into various tokens [6]. This system uses the front-end portion of the compiler, namely lexical analysis. There are many token elements in the C++ programming language. In this system, the line-break token, the white-space tokens (space and tab), and the operators (+, -, *, /, =, += and so on) are used as token elements for the assignment statements of a C++ source program. The system takes all the assignment statements of a C++ program as input. The extracted assignment statements may be literal or value assignments (e.g. x=3; or pi=3.142;), variable assignments (e.g. x=y; or x=z;), or expression assignments (e.g. a=b+c; or x=y*z; or a=b*(c+d);), and the system produces a symbol table, a step-by-step recognition table built using finite state automata, and a lexeme table.
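As a sketch of this kind of system, the toy finite-state scanner below recognizes the pieces of a C++ assignment statement and emits a simple lexeme table. The state names, character classes, and output layout are illustrative assumptions, not the paper's exact tables.

```python
# Toy finite-state scanner for C++ assignment statements such as
# "a=b*(c+d);". States and table layout are illustrative assumptions.
def classify(ch):
    if ch.isalpha() or ch == "_": return "letter"
    if ch.isdigit() or ch == ".": return "digit"
    if ch in "+-*/=();":          return "symbol"
    if ch in " \t":               return "space"
    raise ValueError(f"illegal character {ch!r}")

def scan(stmt):
    lexemes, state, start = [], "START", 0
    for i, ch in enumerate(stmt + " "):    # trailing space flushes the last lexeme
        cls = classify(ch)
        if state == "IDENT" and cls not in ("letter", "digit"):
            lexemes.append(("identifier", stmt[start:i])); state = "START"
        elif state == "NUMBER" and cls != "digit":
            lexemes.append(("number", stmt[start:i])); state = "START"
        if state == "START":
            if cls == "letter":   state, start = "IDENT", i
            elif cls == "digit":  state, start = "NUMBER", i
            elif cls == "symbol": lexemes.append(("operator", ch))
    return lexemes

print(scan("pi=3.142;"))
# [('identifier', 'pi'), ('operator', '='), ('number', '3.142'), ('operator', ';')]
```

Each transition out of the IDENT or NUMBER state flushes the accumulated characters as one lexeme, which is how a step-by-step tabular FSA recognition proceeds.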
Proceedings of the 14th Conference on Computational Linguistics, 1992