Tutorial on Lex and the internal working of Lex
LEX
Lex is a tool that generates a lexical analyser. It takes regular expressions as input and builds a DFA (deterministic finite automaton) corresponding to those regular expressions.
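To make the idea of an automaton built from a regular expression concrete, here is a minimal hand-written sketch in C++ of the kind of deterministic automaton that could recognise the pattern [0-9]+. The names State, step and accepts_digits are hypothetical and chosen only for illustration; the code Lex actually generates is table-driven and far more general.

```cpp
#include <string>

// Hypothetical states of an automaton recognising [0-9]+ (one or more digits).
enum class State { Start, InNumber, Dead };

// One transition: current state + input character -> next state.
State step(State s, char c) {
    bool is_digit = (c >= '0' && c <= '9');
    if (s == State::Dead || !is_digit) return State::Dead;
    return State::InNumber;  // Start or InNumber followed by a digit
}

// Returns true when the whole string is accepted by the automaton.
bool accepts_digits(const std::string& input) {
    State s = State::Start;
    for (char c : input) s = step(s, c);
    return s == State::InNumber;  // InNumber is the only accepting state
}
```

Calling accepts_digits("2024") yields true, while an empty string or a string containing a letter is rejected.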

A Lex program has the following sections:-
Declaration
%%
RULES and ACTIONS
%%
Auxiliary function
a) Declaration:-
Two types of declaration are available in Lex:
-> Auxiliary Declaration
-> Regular Declaration
An auxiliary declaration starts with %{ and ends with %}. All statements written inside it are copied directly into lex.yy.c [the generated lexical analyser].
A regular declaration defines a named pattern (for example DIGIT) that can be used in the rules part. This section is also used to set options such as noyywrap.
Example :-
%{
#include<iostream> //DECLARATION
using namespace std;
%}
%option noyywrap
%%
int {cout<<"Integer detected...";} //RULES
%%
int main(){ //AUXILIARY FUNCTION
    yylex();
    return 0;
}
CODE 1.1
b) Rules :-
Rules consist of two parts.
-> Pattern to be matched
-> Action to be taken
[see code 1.1]
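The pairing of a pattern with an action can be sketched outside Lex as well. The following C++ fragment is only an illustration built on std::regex, not what Lex generates: Rule and run_rules are hypothetical names. It follows the usual Lex convention that the longest match wins and that the earlier rule wins a tie.

```cpp
#include <functional>
#include <regex>
#include <string>
#include <vector>

// A hypothetical rule: a pattern plus the action to run on a match.
struct Rule {
    std::regex pattern;
    std::function<void(const std::string&)> action;
};

// Scans `input` left to right. At each position the rule with the longest
// match is chosen (the earlier rule wins ties) and its action is run.
// A character that no rule matches is simply skipped in this sketch.
void run_rules(const std::string& input, const std::vector<Rule>& rules) {
    auto pos = input.cbegin();
    while (pos != input.cend()) {
        const Rule* best = nullptr;
        std::string best_text;
        for (const Rule& r : rules) {
            std::smatch m;
            // match_continuous: the match must start exactly at `pos`.
            if (std::regex_search(pos, input.cend(), m, r.pattern,
                                  std::regex_constants::match_continuous) &&
                static_cast<std::size_t>(m.length(0)) > best_text.size()) {
                best = &r;
                best_text = m.str(0);
            }
        }
        if (best) {
            best->action(best_text);   // run the action on the lexeme
            pos += best_text.size();   // continue after the lexeme
        } else {
            ++pos;                     // no rule matched this character
        }
    }
}
```

With one rule for the pattern int and another for [a-z]+, the input "int integer" triggers the first rule for "int" and the second rule for "integer", because the second pattern gives the longer match there.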
c) Auxiliary Function :-
If we want to define functions of our own in addition to the code generated by the Lex tool, we write them in the auxiliary function section [see code 1.1].
Functions of Lex File
a.) int yylex():-
yylex is the main scanner function. Lex generates yylex but does not call it; we need to call yylex from main to run the lexical analyser.
-> Otherwise, we need to link lex.yy.c with the Lex library using the option -ll, which supplies a default main that calls yylex.
Note:- If no action returns from yylex, it keeps reading input in a loop until end of file is reached. For console input (stdin) on Unix-like systems, we can signal end of file by pressing Ctrl+D.
If an action does return from yylex, calling yylex again resumes scanning from the place where we left off.
The default action for input that matches no pattern is ECHO, which copies the unmatched text to the output.
b.) int yywrap() :-
Lex calls the yywrap function whenever it encounters the end of the input file. If yywrap returns a non-zero value, yylex terminates and returns to main [with value zero]. If the programmer wants to scan more than one input file, yywrap should return zero, indicating that the work is not finished. Before returning, yywrap can point yyin at the next file to be read.
Note:- Lex does not define yywrap by default, so we must define it ourselves; otherwise the build fails with an undefined reference to yywrap (unless the Lex library is linked with -ll). Alternatively, we can use %option noyywrap, which makes the scanner behave as if yywrap were defined internally.
This internal implementation returns a non-zero value, stating that the work is finished. [see code 1.1]
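The zero / non-zero contract of yywrap can be illustrated with a small model in C++. Everything here is hypothetical: MultiInputScanner, wrap and count_digits are illustrative names, and a list of strings stands in for a list of files.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical model of a scanner reading several inputs in sequence.
struct MultiInputScanner {
    std::vector<std::string> inputs;  // stands in for a list of files
    std::size_t current = 0;          // index of the input being read
    std::size_t pos = 0;              // read position inside that input

    // Plays the role of yywrap(): called at the end of the current input.
    // Returns 0 after switching to the next input (more work to do),
    // or 1 when nothing is left (scanning is finished).
    int wrap() {
        if (current + 1 < inputs.size()) {
            ++current;
            pos = 0;
            return 0;
        }
        return 1;
    }

    // Plays the role of yylex() with no returning actions: counts the
    // digits in all inputs and stops only when wrap() reports completion.
    int count_digits() {
        int digits = 0;
        if (inputs.empty()) return digits;
        for (;;) {
            if (pos == inputs[current].size()) {
                if (wrap() != 0) return digits;  // wrap said "finished"
                continue;                        // keep scanning new input
            }
            if (std::isdigit(static_cast<unsigned char>(inputs[current][pos])))
                ++digits;
            ++pos;
        }
    }
};
```

In a real Lex program the same idea is expressed by a yywrap that opens the next file, assigns it to yyin and returns 0, and returns 1 once no files remain.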
Variables in lex tools
There are three commonly used variables in Lex.
1. yyin
2. yytext
3. yyleng
1.) yyin :-
yyin is a variable of type FILE * that points to the file from which the lexical analyser reads its input.
If we set yyin to some file, the scanner reads its character stream from that file.
Example:-
//AUXILIARY FUNCTION
int main(void){
FILE *f=fopen("Temprary","r");
if(f){
yyin=f;
}
yylex();
return 0;
}
By default, the value of yyin is stdin.
->Inside lex.yy.c
if(!yyin)
{
yyin=stdin;
}
This shows that if yyin has not been set to point to any file, it is initialised to stdin [standard input].
2.) yytext :-
yytext is of type char * and holds the most recently matched lexeme. Its value is overwritten every time a pattern is matched.
Example :-
%{
#include<iostream>
using namespace std;
%}
%option noyywrap
DIGIT [0-9]+
%%
{DIGIT} {cout<<"Digit Found"<<yytext;}
%%
int main(void){
    yylex();
    return 0;
}
Note:- In this example we define DIGIT as a named Lex pattern. To use it in the rules section, we wrap the name in {}.
3.) yyleng :-
yyleng is of type int. It stores the length of the matched lexeme, that is, the number of characters in yytext.
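The relationship between the matched text and its length can be shown with a short C++ sketch. It is an illustration only: digit_lexemes is a hypothetical helper, and the local variables text and leng merely play the roles of yytext and yyleng for a scanner whose single pattern is a run of digits.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Collects (lexeme, length) pairs for every run of digits in `input`.
// `text` and `leng` are hypothetical stand-ins for yytext and yyleng:
// both are overwritten on every match.
std::vector<std::pair<std::string, int>> digit_lexemes(const std::string& input) {
    std::vector<std::pair<std::string, int>> found;
    std::string text;  // plays the role of yytext
    int leng = 0;      // plays the role of yyleng
    std::size_t pos = 0;
    while (pos < input.size()) {
        if (!std::isdigit(static_cast<unsigned char>(input[pos]))) {
            ++pos;  // skip characters that are not part of a number
            continue;
        }
        std::size_t start = pos;
        while (pos < input.size() &&
               std::isdigit(static_cast<unsigned char>(input[pos])))
            ++pos;
        text = input.substr(start, pos - start);  // overwrite previous lexeme
        leng = static_cast<int>(text.size());     // always the size of text
        found.emplace_back(text, leng);           // copy before the next match
    }
    return found;
}
```

Because the value is overwritten on each match, an action that needs a lexeme later has to copy it, which is what the emplace_back call does here.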