Lex - A Text Scanner
Uploaded by
Emilian Francu
Contents

* Lex as a Stand-Alone tool
* Lex Program Structure
* The Lex Specification
* Lex Patterns
* Lex Actions
* Precedence of Lex Patterns
* The "longest match" rule
* Interaction between Lex and C
* Lex is a State-Machine Generator
* Lex generates yylex()
* yylex() and return()
* Examples of Lex programs
* A Lex squid redirector
* Using yylex() within a parser

Example programs:

* rip-url - for multiple files
* squid-redirector - a squid redirector
* isitspam - lex with a C-language parser

Lex as a Stand-Alone tool

Although Lex is often used as a front-end to a parser, it has been designed such that it can be used stand-alone. Used in this fashion, Lex makes for a simple but very powerful text processing tool. In the following discussion, we will be considering lex mostly in this role, without calling upon its usual partner, yacc.

Lex Program Structure

A lex program has the following basic structure:

    %{
    C declarations and includes
    %}
    Lex macro definitions
    %%
    Lex Specification, in the form of pattern/action statements
    %%
    C language program (the rest)

However, the only mandatory part is the first %%.

The most important part of the Lex program is the Lex Specification. This is a series of statements of the form:

    pattern    action

or to put it more practically:

    regular_expression    { C-program statements }

The simplest Lex program you can write would read the standard input, write to the standard output, and look something like this:

    %%
    http:\/\/[^ \n<>"]*    printf("%s\n", yytext);
    .|\n                   ;

This program reads the standard input, extracts any hyper-text references, and writes them to the standard output.

To compile it, save it as rip-url.l and run the command:

    make LDLIBS=-ll rip-url

And it's all done. Finished. You've just created an executable called rip-url. You can take the rest of the afternoon off.

That was too easy. So what did we really do?

The Lex Specification

Lex Patterns

Lex Patterns are (more or less) standard Unix regular expressions, after the style of grep, sed, awk, perl etc.
See the lex (or flex) man page for all the gory details, but here's a quick summary:

alpha_numeric
    Most characters are taken literally, just like the "http" in our example. This includes all letters, digits, and _, plus several others. They are, in effect, taken to be single character regular expressions.

[abcde]
    A single character regular expression consisting of any one of the characters in between the square brackets [ ].
    A range of characters may be specified by using a hyphen, like this: [A-Za-z]
    To include the hyphen, put it first or last in the group, like this: [-A-Za-z]
    To include a ], put it first in the group, like this: []abc]
    If the first character is ^, it means any character except those listed. So the second part of our example, [^ \n<>"], means "anything but a space, newline, quote, less-than or greater-than".

\
    Any character following the \ loses its special meaning, and is taken literally. So the \/\/ in our example really means //
    The \ is also used to specify the following special characters:

    \a      0x7     The alert (ie bell) character
    \b      0x8     Backspace
    \f      0xC     Form Feed
    \n      0xA     New Line
    \r      0xD     Carriage return
    \t      0x9     Tab
    \v      0xB     Vertical Tab
    \0      0x0     Null character
    \123    0x53    octal representation of a character
    \x53    0x53    hexadecimal representation of a character

    Caveat: Some of the above may be flex enhancements.

"text"
    Characters between the quotes "" lose their special meanings, and are interpreted literally. So we could have written the first part of our pattern as "http://" instead of http:\/\/

^ and $
    The ^ and $ characters constrain the regular expression to the start or end of the line, respectively.

+ and *
    The + and * imply repetition of the preceding single character regular expression.
    + means "one or more occurrences of".
    * means "zero or more occurrences of".

{n,m}
    The range expression {n,m} also implies repetition of the preceding regular expression.
    {3,5} means "3 to 5 occurrences of"
    {2,} means "2 or more occurrences of"
    {3} means "exactly 3 occurrences of"

?
    The ? implies that the preceding single character regular expression is optional. In effect, it means "zero or one occurrences of".

( and )
    The round brackets imply grouping, such that the regular expression between the brackets is treated as if it was a single character (or nearly enough). See the discussion of precedence in the flex man-page for more information.

.
    Any single character except a newline (\n).

|
    The | is used to specify a "logical OR" of two regular expressions. The exact text which is OR'd is governed by the precedence rules, so it's best to use brackets, like this:

        (ftp|http|telnet):\/\/[^ \n<>"]*

/
    The / is used to specify a "trailing context". For example, the expression:

        C/-Program

    matches the letter "C" if and only if it is followed by the text "-Program". Note that only the letter C is matched, and copied to yytext. The "-Program" is not consumed by the rule, and will match subsequent regular expressions, too. However, for the purposes of deciding the "longest match", the whole text "C-Program" is considered.
    Putting $ at the end of a regex is the same as putting /\n

Let's examine the 1st pattern in detail:

    http:\/\/[^ \n<>"]*

http:
    is taken literally.

\/\/
    means two slashes //

[^ \n<>"]
    is a character-set, which specifies the space, newline, quote or angle brackets. However, since the 1st character is a caret ^, these characters are excluded from the set, and everything else is included.

*
    means zero or more instances of the character-set in the preceding [ ... ]

.|\n
    means everything else, one character at a time. Since our action consists only of an empty statement ( ; ) the text is just discarded.

So our regular expression means: any string starting with "http://"
and which doesn't contain a space or any of \n < > "

It is worth mentioning that our reg-ex would not match an upper-case prefix such as HTTP://. It is case-sensitive, unless we tell flex to build a case-insensitive lexer using the "-i" option.

The 2nd pattern-action statement is essential, because lex has a default action for any text which did not match any rule in the specification. The default action is to print it on the standard output.

This can be useful occasionally, if you just want to do a small modification on the input stream, such as stripping out html tags, and replacing text, eg:

    %%
    \<[^>]*\>    ;
    &amp;        putchar('&');

The above lex-specification will discard any text between angle-brackets (even multi-line text), and print the rest to the standard output. It's a working lex program, and you can compile it by saving it as strip-html.l and running:

    make LDLIBS=-ll strip-html

The text matched by the first rule has an action statement which is simply ';' ie an empty statement. This means that the text is read, but nothing else is done. The matched text is just 'swallowed' by the lexer.

The remainder of the text is copied to the output because it does not match any rule, and the default action takes over.

The above example might be useful as a front-end to a program which indexes web-pages.

Lex Actions

Lex Actions are typically just C-program statements. An action may be a single C-statement (as in our example), or multiple statements enclosed in curly braces { ... }

An action within curly braces { ... } can span multiple lines. No special continuation character is required, but each extra line should be indented by at least one tab from the start of the line, like this:

    http:\/\/[^ \n<>"]*    {
                printf("%s\n", yytext);
            }
    .|\n                   ;

There are some other "special" actions which lex also understands. These can be invoked stand-alone, or from within the C-statements in curly braces { ... }.

ECHO;
    Print the matched text on the standard output.

REJECT;
    Do not consume the matched text.
    Re-interpret the input and use the "second best" match instead (see also the section Precedence of Lex Patterns). This is a lot more complicated than it sounds! Use with caution.

BEGIN state;
    Set Lex into the named state (also known as a "start condition"). Start conditions must be declared in the section Lex macro definitions. A pattern carrying a start-condition is only applied when the appropriate state has been entered.
    States may be exclusive or inclusive. An exclusive start-condition is where no other patterns are applied, except those with the appropriate start-condition. An inclusive start-condition is where the rule is applied together with any other rules which are not constrained by start-conditions. Exclusive states are a feature of flex.
    Start conditions are a powerful but easy to use feature of lex. See the man page for more information.

yymore()
    Scan for the next pattern as usual, but prepend the text from this rule on to the yytext variable of the next rule.

yyless(n)
    Retain the first n characters from this pattern, but return the rest to the input stream, such that they will be used in the next pattern-matching operation.

Lex also provides a number of variables which we can use within actions.

(char *) yytext
    This is a copy of the text which matched the current pattern, as a null-terminated string.

(int) yyleng
    This is the length of yytext.

Please read the man page on flex for other, more exotic ways of using actions, and their subtleties.

Precedence of Lex Patterns

In most situations where we use regular expressions, there is only one regular expression active at a time. In the case of a lex specification, there are multiple regular expressions, all active at the same time. This leads to the situation where a particular piece of text may be legitimately interpreted by more than one lex pattern.
In order to resolve such ambiguities, lex uses the following rules of precedence:

* The longest possible match is chosen first (remember that any trailing context is considered part of the matched-length).
* If two patterns match the same length of text, the first pattern in the specification is used.

Please see the flex man page for further discussion of precedence, and discussions of the precedence of elements within a lex pattern.

The "longest match" rule

As was mentioned above in Precedence of Lex Patterns, lex patterns are to be considered "in parallel", and the longest match is the one which is eventually chosen and its action executed.

This point is worth stressing, as it is the most common cause of "unexpected behaviour" from a lex-generated program.

In particular, a rule like

    .*    { ... }

is usually a bad one to use, because .* would match an entire line of text, excluding only the trailing \n, for EVERY line in the input. The net effect of this is that any other rules would not get a look-in. In those instances where .* is appropriate, it is best to precede it with a start-condition.

Interaction between Lex and C

Lex is a State-Machine Generator

So far, we've discussed lex as if it interprets the regular expressions in our lex-specification at run-time, as sed or awk would do. However, this is not exactly true. Lex is in reality a C-code generator, more like a compiler than an interpreter. Our Lex Specification is "compiled" into C-code.

The C-code Lex generates is in the form of a state-machine, which processes input character-by-character. Although this makes debugging tricky, it does have one important advantage: pure, unadulterated speed.

Consider the case where you are trying to build a Web-Browser. If you did it the "traditional" way, using (for example) scanf() and strcmp(), you would get something like this:

    while ( (c = getchar()) != EOF ) {
        if ( c != '<' ) {
            /* write char to screen */
        } else {
            char tag[20];
            scanf("%19[A-Za-z0-9]", tag);   /* width limit protects tag[] */
            if ( strcmp(tag,"HEAD") == 0 )
                state = HEADER;
            else if ( strcmp(tag,"H1") == 0 )
                /* change font to h1 */ ;
            else if ( strcmp(tag,"H2") == 0 )
                /* change font to h2 */ ;
            else if ( strcmp(tag,"H3") == 0 )
                /* change font to h3 */ ;
            else if ( strcmp( ...
        }
    }

So what's wrong with it? Readability for a start, but that's not our main concern.

Consider the code that is executed when the markup word
<H3> is encountered.

* First, the statement if ( c != '<' ) detects that this is the start of a markup word.
* Now we scan the first part of the word in using scanf(). So far, so good.
* Now we compare the string to the text "HEAD". The compare fails, but not until strcmp() has compared the 2nd character of the 2 strings.
* We do the same again, using "H1". Again, we have to get to the second character of the strings to determine that our match has failed.
* Eventually, we will get to the "H3" compare, but not before we have compared 'H'=='H' three times, and discarded the subsequent result.

Given that there are dozens of possible markup words, we could be calling strcmp() dozens of times. Worse than that, strcmp() may have to get several characters into the compare before returning a negative result.

So it would be better if, instead, we did it like this:

    char tag[20];
    scanf("%19[A-Za-z0-9]", tag);
    if ( tag[0] == 'H' ) {
        if ( tag[1] == 'E' ) {
            if ( strcmp(tag,"HEAD") == 0 )
                state = HEADER;
            ...
        } else if ( tag[1] == '1' && tag[2] == '\0' ) {
            /* change font to h1 ... */
        } else if ( tag[1] == '2' && tag[2] == '\0' ) {
            /* change font to h2 ... */
        } else if ( tag[1] == '3' && tag[2] == '\0' ) {
            ...

Now, we only do the comparison 'H'=='H' once, and go straight onto the second character.

We have, without even realising it, created a state-machine. When we scan the character 'H', we go to a new 'state'. Each level of nested-if statements creates additional sub-states of the higher state.

But why settle for nested-if statements? We can create a top-level case-statement, with a case for each character A-Z, and have nested-case statements for processing the 2nd char, etc.

So we now have a high-performance scanner, but we've had to sacrifice what little readability our original source-code still had.

Lex is designed to generate a suitable state-machine based text analyser, while maintaining readability at the source-level. So we can have our cake, and eat it, too.
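The case-statement idea described above can be sketched in plain C. This is only a hand-written illustration under stated assumptions, not the code lex generates: the enum tag codes and the function name classify_tag are hypothetical, and only the four markup words used in the discussion are recognised.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag codes, for illustration only. */
enum tag { TAG_UNKNOWN, TAG_HEAD, TAG_H1, TAG_H2, TAG_H3 };

/* Classify a markup word by walking it one character at a time.
 * Each switch level acts as one "state", so the leading 'H' is
 * examined exactly once however many words share it. */
enum tag classify_tag(const char *s)
{
    assert(s != NULL);
    switch (s[0]) {
    case 'H':
        switch (s[1]) {
        case 'E':
            /* In this sketch only "HEAD" starts with "HE". */
            if (s[2] == 'A' && s[3] == 'D' && s[4] == '\0')
                return TAG_HEAD;
            return TAG_UNKNOWN;
        case '1': return s[2] == '\0' ? TAG_H1 : TAG_UNKNOWN;
        case '2': return s[2] == '\0' ? TAG_H2 : TAG_UNKNOWN;
        case '3': return s[2] == '\0' ? TAG_H3 : TAG_UNKNOWN;
        default:  return TAG_UNKNOWN;
        }
    default:
        return TAG_UNKNOWN;
    }
}
```

Calling classify_tag("H3") reaches its answer after looking at three characters, with no strcmp() calls at all. Every new markup word means editing the nested switches by hand, which is exactly the bookkeeping a lex specification takes over.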
Lex generates yylex()

So far, we have been using lex in a "stand-alone" mode, and linking in the (hitherto mysterious) libl library using LDLIBS=-ll

In fact, lex does not generate a complete program. Lex generates a single function, (int) yylex(), and some associated global variables.

yylex() reads the input indicated by the global variable (FILE*) yyin. yyin defaults to the standard input.

When lex reaches the end of the file it is reading, it calls a function (int) yywrap(). If yywrap() returns non-zero, yylex() returns a zero value. If yywrap() returns zero, yylex() keeps scanning, from where it left off, with whatever input is available on yyin. This is only useful if yywrap() has changed yyin to provide for additional input.

The library libl (or libfl for flex) provides two functions which are needed to complete our stand-alone lex program:

* main(), which simply calls yylex()
* yywrap(), which always returns non-zero.

Let's rewrite rip-url such that we do not need the standard libl, and add a few more features along the way.

    %{
    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>

    int file_num;
    int file_num_max;
    char **files;
    %}
    %%
    (ftp|http):\/\/[^ \n<>"]*    printf("%s\n", yytext);
    .|\n                         ;
    %%
    int main(int argc, char *argv[]) {
        file_num = 1;
        file_num_max = argc;
        files = argv;
        if ( argc > 1 ) {
            /* open the first file named on the command line */
            if ( (yyin = fopen(argv[file_num],"r")) == 0 ) {
                perror(argv[file_num]);
                exit(1);
            }
        }
        while ( yylex() )
            ;
        return 0;
    }

    int yywrap() {
        fclose(yyin);
        if ( ++file_num < file_num_max ) {
            /* switch yyin to the next file and keep scanning */
            if ( (yyin = fopen(files[file_num],"r")) == 0 ) {
                perror(files[file_num]);
                exit(1);
            }
            return 0;
        } else {
            return 1;
        }
    }

We now have:

* a function main() which opens the first file (if specified) and calls yylex()
* When yylex() has finished with the first file, it calls yywrap(), which opens the next file, and yylex() continues.
* When yywrap() has exhausted all the command line arguments, it returns 1, and yylex() returns with value 0 (but we don't use the return value).

Moreover, since we have now provided both main() and yywrap(), we no longer need the libl library, and we can compile rip-url using simply:

    make rip-url

Notice that the libl library (lex library) is required only if we do not provide main() or yywrap(). The libl library is not required for any of the text-processing; this is all done by the lex-generated C-code.

yylex() and return()

Although none of our examples have so far done so, it is valid to execute a return() statement within a lex rule.

yylex() is of type int, so a non-zero integer value would normally be returned. Returning zero would be ambiguous, because the zero value is what is returned by yylex() when it encounters an end-of-file, and yywrap() returns a non-zero.

After yylex() has returned, it is possible to call it again and again, and the scanner will continue exactly where it left off each time. If any start-condition was in force when the return() was executed, it will still apply when yylex() is called again.

This aspect of yylex() plays a key role when lex is being used as a front-end to a parser, such as yacc.
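The calling contract described here (a non-zero code per token, zero at end of input, and scanning that resumes where the previous call stopped) can be imitated with a small hand-written scanner. The sketch below is a stand-in for illustration, not lex output: scan_init, next_token, TOK_WORD, TOK_NUMBER and the three state variables are hypothetical names that play the roles of yylex(), the token codes, and lex's globals.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes; non-zero, because 0 means end of input. */
enum { TOK_WORD = 1, TOK_NUMBER = 2 };

/* Scanner state kept between calls, in the role of lex's globals. */
static const char *scan_pos;    /* next unread character */
static const char *tok_start;   /* start of the most recent token */
static size_t tok_len;          /* length of the most recent token */

void scan_init(const char *input)
{
    assert(input != NULL);
    scan_pos = input;
    tok_start = input;
    tok_len = 0;
}

/* Return the next token code, or 0 at end of input.
 * Each call resumes exactly where the previous one stopped. */
int next_token(void)
{
    while (*scan_pos != '\0') {
        unsigned char c = (unsigned char)*scan_pos;
        if (isalpha(c)) {
            tok_start = scan_pos;
            while (isalpha((unsigned char)*scan_pos))
                scan_pos++;
            tok_len = (size_t)(scan_pos - tok_start);
            return TOK_WORD;
        }
        if (isdigit(c)) {
            tok_start = scan_pos;
            while (isdigit((unsigned char)*scan_pos))
                scan_pos++;
            tok_len = (size_t)(scan_pos - tok_start);
            return TOK_NUMBER;
        }
        scan_pos++;   /* anything else is discarded, like an empty action */
    }
    return 0;
}
```

A caller would typically drive it with a loop of the form while ((tok = next_token()) != 0) { ... }, reading tok_start and tok_len after each call, much as a parser reads yytext and yyleng after yylex() returns.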
When writing a stand-alone lex program, it is generally not required to have a return() statement within a lex rule.

Examples of Lex programs

A Lex squid redirector

If you are using the Squid http proxy (and who doesn't?) you should be aware that it supports redirectors. A squid redirector is a pipe, which is invoked by squid and fed a line of text at a time, like this:

    url ip-addr/fqdn ident method

where:

url
    is the requested URL
ip-addr
    is the IP-address of the host where the URL request came from
fqdn
    is the fully qualified domain-name of the host where the request came from, if available, otherwise '-'
ident
    is the result of the ident_lookup, if enabled in the config file, otherwise '-'
method
    is GET, PUT, POST etc.

The redirector program writes either a blank line, or a new URL. In the latter case, the new URL is fetched in lieu of the original (hence "redirector"). The most obvious application is for virtual-hosting, where a single squid proxy is made to look like several different servers.

Squid redirectors are typically implemented using regular expressions of some kind. The most obvious tools for implementing redirectors would be programs like sed, awk or perl. All of these are really over-kill, in that they use a complex program to solve a simple problem. As such, they use more memory and CPU capacity than is strictly necessary.

If you have a busy squid proxy, you would probably want to use a special C-program to act as your redirector, such as squirm (see http://www.senet.com.au/squirm).

Lex is designed with performance being a major goal. That makes lex an ideal candidate for implementing a squid-redirector for a performance-critical application. The proof of whether or not lex is actually faster than its alternatives is "left as an exercise to the reader".

A lex-based squid redirector would look something like this:

    %x SKIP
    %x COPY
    %option always-interactive
    %%
    "http://www.debian.org/"    |
    "http://www.yahoo.com/"     {
                yytext[yyleng-1] = '\0';
                printf("%s.au/", yytext);
                BEGIN COPY;
            }
    "http://"(www\.)?"altavista.digital.com/"    {
                printf("http://www.altavista.yellowpages.com.au/");
                BEGIN COPY;
            }
    "ftp://sunsite.anu.edu.au/pub/linux/"    |
    "ftp://sunsite.unc.edu/pub/Linux/"       |
    "ftp://ftp.funet.fi/pub/Linux/"          {
                printf("ftp://ftp.monash.edu.au/pub/Linux/");
                BEGIN COPY;
            }
    "http://weather/"    { printf("http://www.bom.gov.au/"); BEGIN COPY; }
    "http://bypass."     { printf("http://"); BEGIN COPY; }
    "ftp://bypass."      { printf("ftp://"); BEGIN COPY; }
    <COPY>[^ \n]         ECHO;
    <COPY>" "            BEGIN SKIP;
    .                    BEGIN SKIP;
    <SKIP>.              ;
    <*>\n                { putchar('\n'); fflush(stdout); BEGIN 0; }

To build the above redirector, save it to a file such as squid-redirector.l and build it using:

    make LDLIBS=-ll squid-redirector

Note that it is vital that the redirector process does not buffer its input or output by more than a single line at a time, otherwise squid will hang. The line %option always-interactive takes care of the input, while the statement fflush(stdout); takes care of the output.

Some key features of the above program:

* In the first action, we actually modify yytext before writing it out. We are free to modify yytext as we would any string variable (with all the usual perils) provided that we do not use yyless() afterwards. Refer to the flex man page for more information. Note in particular the lex-directives %array and %pointer in the documentation, and how they impact yyless().
* The directive %option always-interactive is essential in this case. Without it, flex will try to read an extra character ahead, even after the newline. In the absence of any character to read, this will result in the program going into a blocked state (waiting for input) before it writes out the result of the current line. This will cause squid to hang. The %option always-interactive tells flex to do the minimal look-ahead only. In this case, the rule which contains the newline is unambiguous, and there is no need for look-ahead. Hence this option prevents unwanted buffering on the input.
* After writing the output, we must use fflush() to defeat unwanted buffering on the output. Otherwise, the output will just sit around in a buffer instead of being sent straight to the squid process. Again, this would cause squid to hang.
* The (exclusive) start-condition SKIP is used to discard input. This happens either
    * when the initial text does not match any of the explicit rules ("http://..."), or
    * after the first space in a URL which is being modified.
  If none of our explicit rules (ie "http://...") match, we use the single-dot rule to put us straight into the state SKIP. We are relying on the "longest match" mechanism to ensure that the explicit rules are given preference over the default rule. In this case, we want to write a blank line instead of a modified URL.
* If one of our explicit rules (ie "http://...") matches, then we write the first part of the URL using printf(), and immediately invoke the (exclusive) start-condition COPY. From here on, the rest of the modified URL is copied from the input to the output (up to the first space). The lex macro ECHO is used as a matter of convenience; putchar() would be just as good.
* We could use a non-exclusive start-condition for SKIP, but then we would have to change the rule to:

        <SKIP>.*    ;

  to ensure that the SKIP rule takes precedence over the other rules, by virtue of being the "longest match" rule. It would have to appear before the other rules too, just to be safe. The same applies for COPY.
* The statement BEGIN 0; is used at every newline to reset the start-condition to its "normal" state (ie when no start-condition applies).
* The start-condition <*> means that the rule \n applies for all start-conditions, even the exclusive ones SKIP and COPY.
* The redirector creates a synonym "weather" for the site www.bom.gov.au
* We have provided a feature whereby we can effectively bypass our redirector by using, eg, http://bypass.www.yahoo.com to refer to http://www.yahoo.com/

In order to realise the benefits of the lex state-based analysis, it is important to avoid things like yyless(), REJECT and any construct which may result in the lexer doing "backup". You should read carefully the section "Performance Considerations" in the flex man-page.

Using yylex() within a parser

Lex is actually designed to be used as a front-end for a parser. Let's look at how a typical parser would use lex.

One statement which we have so far only touched on in our lex actions is the return statement. In the simple examples we have considered so far, putting a return into an action would result in a premature end to the file processing. However, this need not be the case. After yylex() returns, we can simply call it again, and processing will continue exactly where it left off. This feature of yylex() is what makes it suitable as a front-end to a parser.

Let's consider a simple parser, using just C-code. Our example will be a program which reads an e-mail message, analyses the headers, and tells us if it is likely to be spam, or not.

Let's use a 100-point check, based on the following criteria:

* Precedence: if it's "bulk", we'll give it 10 points; if it's "junk", that's worth at least 20 points.
* To: Cc: This is the biggest clue. If the To: address is the same as the From: address, let's add 30 points (if either To: or From: is missing, we'll add the 30 points anyway).
* If our user-name is in neither the To nor CC fields, then let's add another 30 points.
* Another clue is if there are a lot of recipients. Let's add 10 points if there are more than 10 recipients. (Any message sent to more than 10 people probably isn't worth reading, anyway.)
* Lastly, let's consider the message size. Most people write reasonably short messages, typically less than 5k. Spam is often 10k or more. We'll add 10 points for any message > 5k in size.

Let's do the lexer first. In this case it's fairly simple. It looks for the header fields

    ^From
    From:
    To:
    Cc:
    Precedence:

and returns a specific code for each of these. Any other header-field is returned as "OTHER".

Our lexer is also made to find the e-mail addresses for us, and return them. Any other text is returned word-by-word.

The message body is counted by the lexer, but nothing is returned to the parser.

    %{
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <strings.h>
    #include <unistd.h>

    typedef enum {
        START_OF_HEADER=256, FROM=257, TO=258, CC=259, PRECEDENCE=260,
        OTHER=261, NAME=262, END_OF_HEADER=263, EMAIL=264
    } token_t;

    token_t token, hdr_field;
    char *token_txt;

    void is_it_spam(int points, int msgn);
    int get_email_address(char **user, char **host);
    void hdr_to(token_t hdr_field);
    void hdr_cc();
    void hdr_from(void);
    int precedence(void);

    char *my_name;
    int my_name_len;
    int to_me;
    char *to_addr1;
    char *from_addr;
    int body;
    %}
    %x BODY
    %%
    ^From               return START_OF_HEADER;
    ^From:              return FROM;
    ^To:                return TO;
    ^Cc:                return CC;
    ^Precedence:        return PRECEDENCE;
    ^[A-Za-z0-9-]+:     return OTHER;
    [A-Za-z_0-9\.]+     { token_txt = strdup(yytext); return NAME; }
    [A-Za-z_0-9\.]+@[A-Za-z_0-9\.]+    { token_txt = strdup(yytext); return EMAIL; }
    ^\n                 { /* empty line */ BEGIN BODY; body=0; return END_OF_HEADER; }
    .|\n                ; /* Ignore what we don't understand */
    <BODY>.|\n          body++;
    <BODY>\nFrom        { BEGIN 0; yyless(0); /* yyless() breaks the "^" mechanism */ }
    %%
    int main(int argc, char *argv[]) {
        int points=0, msgn=0;
        int receivers=0;

        if ( argc > 1 ) {
            if ( (yyin = fopen(argv[1],"r")) == 0 ) {
                perror(argv[1]);
                exit(1);
            }
        }
        my_name = getlogin();
        if ( my_name == 0 ) my_name = "";   /* getlogin() can fail */
        my_name_len = strlen(my_name);
        while ( (token = yylex()) ) {
            switch(token) {
            case START_OF_HEADER:
                hdr_field=START_OF_HEADER;
                /* finish off the previous message, if there was one */
                if ( body>5000 ) points+=10;
                if ( msgn ) is_it_spam(points,msgn);
                to_addr1=0; from_addr=0; to_me=0; receivers=0;
                points=0;
                msgn++;
                break;
            case FROM:
            case PRECEDENCE:
            case TO:
            case CC:
                hdr_field=token;
                break;
            case END_OF_HEADER:
                hdr_field=END_OF_HEADER;
                if ( to_addr1 == 0 ) points+=30;
                else if ( from_addr == 0 ) points+=30;
                else if ( strcasecmp(to_addr1,from_addr) == 0 ) points+=30;
                if ( !to_me ) points+=30;
                if ( receivers>10 ) points+=10;
                break;
            case NAME:
                switch (hdr_field) {
                case PRECEDENCE:
                    points+=precedence();
                    break;
                default:
                    break;
                }
                break;
            case EMAIL:
                switch (hdr_field) {
                case TO:
                case CC:
                    receivers++;   /* count every To:/Cc: address */
                    hdr_to(hdr_field);
                    break;
                case FROM:
                    hdr_from();
                    break;
                default:
                    break;
                }
                break;
            case OTHER:
                hdr_field=OTHER;
                break;
            default:
                break;
            }
        }
        if ( msgn ) is_it_spam(points,msgn);
        return points;
    }

    int yywrap() {
        return(1);
    }

    int precedence(void) {
        if ( strcasecmp("junk",token_txt)==0 ) return 30;
        if ( strcasecmp("bulk",token_txt)==0 ) return 10;
        return 0;
    }

    void hdr_to(token_t hdr_field) {
        if ( strncasecmp(token_txt,my_name,my_name_len) == 0 ) {
            if ( token_txt[my_name_len] == '\0' ||
                 token_txt[my_name_len] == '@' ) {
                to_me=1;
            }
        }
        if ( to_addr1 == 0 && hdr_field == TO )
            to_addr1 = token_txt;
    }

    void hdr_from(void) {
        from_addr = token_txt;
    }

    void is_it_spam(int points, int msgn) {
        if ( points >= 80 )
            printf("Message %4d scored %2d, almost certainly spam\n",msgn,points);
        else if ( points >= 50 )
            printf("Message %4d scored %2d, probably spam\n",msgn,points);
        else if ( points >= 30 )
            printf("Message %4d scored %2d, possibly spam\n",msgn,points);
        else
            printf("Message %4d scored %2d, appears legitimate\n",msgn,points);
    }

Since mail-headers and e-mail addresses are supposed to be case-insensitive, you should compile the above program using the case-insensitive option of flex:

    make LFLAGS=-i isitspam

The parser is implemented in C-code.
The main feature of the parser is the loop:

    while ( (token = yylex()) ) { ... }

The parser calls yylex() repeatedly, processing each token in turn, taking whatever action is appropriate to that token. When lex reaches the end-of-file, yylex() returns 0, and the parser exits its while() loop.

Where additional information must be transferred between the lexer and the parser, this is done using global variables. For example, when the lexer encounters an e-mail address, it returns the token EMAIL, but it also places the actual text into the variable token_txt, where the parser can find it.

We will be using the above mechanisms later, when we start to employ Yacc. Lex is designed to interwork with Yacc. However, as the above example tries to show, if we think Yacc may not be suitable to our needs, we are free to use whatever parser suits our needs.

By the way, the above program does a pretty poor job of detecting spam reliably. However, it does illustrate a few key points of a typical parser.

Lex and make

You may have noticed that we have been building our lex-programs using make, but without having to write a makefile.

This is because lex is one of the programs for which make has an "implicit rule". The default make rule for a file ending in the extension ".l" is to invoke lex on that file, and generate a ".c" file from it. From there on, the better known implicit rules for compiling C programs take over.

The default rule for lex is:

    $(LEX) $(LFLAGS) -t $< > $@

which usually resolves to something like:

    lex -t file.l > file.c

Note 1

Lex pattern-action statements should be separated by one or more TAB characters. Flex is not fussy, and will accept spaces, but some versions of lex need to see tabs as separators. Similarly, actions which run over more than one line should be indented by at least one tab.

Also avoid putting blank lines within the lex specification. Although flex does not have a problem with this, some versions of lex do.
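As a closing illustration, the two precedence rules from the section Precedence of Lex Patterns (longest match first, then the earliest rule on a tie) can be sketched in C. This is a simplified model under stated assumptions, not how lex works internally: rules are tried one after another instead of in parallel, patterns are literal strings in which '.' stands for any single character except newline, and the names match_len and pick_rule are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Length matched by `pat` at the start of `input`, or 0 if it does not
 * match. A '.' in the pattern stands for any character except newline. */
static size_t match_len(const char *pat, const char *input)
{
    size_t i;
    for (i = 0; pat[i] != '\0'; i++) {
        if (input[i] == '\0')
            return 0;                       /* input ran out first */
        if (pat[i] == '.' ? input[i] == '\n' : input[i] != pat[i])
            return 0;
    }
    return i;
}

/* Return the index of the rule lex-style precedence would choose,
 * or -1 if no rule matches at the start of `input`. */
int pick_rule(const char *const rules[], size_t nrules, const char *input)
{
    int best = -1;
    size_t best_len = 0;
    assert(rules != NULL && input != NULL);
    for (size_t i = 0; i < nrules; i++) {
        size_t len = match_len(rules[i], input);
        if (len > best_len) {   /* strictly longer, so earlier rules win ties */
            best = (int)i;
            best_len = len;
        }
    }
    return best;
}
```

With the rules "From", "From:" and "....:" (in that order) and the input "From: me", both the second and third rules match five characters, and the second is chosen because it appears first. This mirrors how ^From: takes priority over the catch-all header rule in the isitspam lexer.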
Author: George_Hansper@apana.org.au
Last modified: $Date: 2000/04/05 23:46:29 $
Previous: Introduction
Next: Yacc - A parser generator