GNU Flex
At the conceptual level, the compiler is a program that translates from one language to another.
Most often, the compiler translates from a high-level language (C, C ++, Java), which is written and understood by a programmer, into a low-level language (bytecode, assembly language, machine code) that can be more than executed by the processor. The lexical analysis is the first step in the compiler’s understanding of the source files.
Flex is a generator of lexical analyzers.
It receives at the input a file that contains a specification of the atoms to be recognized by the generated analyzer, as well as a series of semantic actions to execute. The specifications will be written in a text file that can have any name/extension.
Usually, the .lex extension is used.
Based on the specifications, Flex generates a C program that contains analysis tables and an analysis function called YYLEX. The default name of the file thus generated is lex.yy.c. It will have to be compiled (GCC or another compiler) to get the executable file.
Flex is able to work with several different sets of rules simultaneously. This can be useful, for example if you want to analyze a source file that contains code sequences written in different languages, to which different sets of regular expressions correspond.
GNU bison
Bison is a generator of syntactic analyzers. It receives at the input the grammar of a language which it translates into a parser for that language. Bison grammars are described using a variant of BNF (Backus Naur Form). BNF grammars can be used for context-independent languages.
Bison-generated parsers can use one of the following parsing methods:
1. LALR (1) – Look-Ahead Left to Right with a one-token lookahead
2. GLR – Generalized Left to Right
Most starters use LALR (1), which has fewer capabilities but is significantly faster and easier to use than GLR.
Bison is the GNU variant of YACC (Yet Another Compiler-Compiler).
The input file, which contains the grammar, has the extension .y (it is a convention). By default, the generated C program has the same name as the specification file, but the .y extension is changed to .tab.c.
The rules of a grammar contain terminals and non-terminals. In the grammar accepted by the bison we will name the tokens terminals. They are the result of the lexical analysis made with the help of Flex.
Bison does not check the correctness of the C code in the actions, but only copies it to the parser’s .c file, where it will be checked by the C. compiler. Therefore, C errors will only be reported when the parser is compiled. We will now see how the lexical analyzer (Flex) communicates with the syntactic analyzer (bison)
Bison generates a parser, from name_gram.y, in the file name_gram.tab.c and a header file name_gram.tab.h. In the header file, each token has a number associated with it. The lex input file (.l) includes the header file (with the extension .tab.h) and uses the numbers assigned to the tokens in this file.
The bison entry is a string of tokens provided by Flex through the YYLEX () function.
Suppose we have a grammar that defines the INTEGER token and it has received the value 258 in the header file name_gram.tab.h. Each occurrence of the “return INTEGER” statement in the .l file will be replaced, in this case by “return 258”. To get the tokens, the bison calls the YYLEX function. In other words, the file with the extension .tab.h assigns each token/terminal a unique identifier of type int. As this header is included in the * .l file, the token names are replaced with the unique identifier.