What is a compiler

A compiler is a program, or a set of programs, that translates the text of a program written in a source language into another computer language, called the target language.

The input is usually called the source code, and the output the object code. The name “compiler” is mainly used for a program that translates from a high-level language into a low-level language.

A program that performs the reverse operation is called a “decompiler”, and one that translates between two high-level languages is called a “translator” (also known as a source-to-source compiler).

The process of compiling a program takes place in several phases. Each phase is a self-contained operation that transforms the source program from one representation into another.

The main phases of compilation are:

1. Lexical analysis

The source text is read as a sequence of characters, which are grouped into entities called atoms (more commonly known as tokens). Each atom is assigned a lexical code, so the output of this phase is the source program expressed as a sequence of such codes. Examples of atoms include keywords, identifiers, numeric constants, and punctuation marks.
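As an illustration of this phase, here is a minimal sketch of a lexical analyzer in Python. The token categories and the `tokenize` function are assumptions made up for this example and describe a tiny hypothetical language, not any real compiler.

```python
import re

# Hypothetical token categories for a tiny illustrative language.
# KEYWORD is listed before IDENT so that reserved words win the match.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("KEYWORD", r"\b(?:if|while|return)\b"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("PUNCT", r"[=+\-*/();]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Turn a character sequence into a list of (category, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates atoms but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `tokenize("x = 42;")` groups the seven characters into four atoms: an identifier, two punctuation marks, and a numeric constant.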

2. Syntactic analysis

This phase groups the atoms produced by lexical analysis into syntactic structures. A syntactic structure can be seen as a tree whose leaf nodes represent atoms, while inner nodes represent the strings of atoms that form a logical entity. Examples of syntactic structures include expressions, statements, and declarations.
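The tree-building idea can be sketched with a small recursive-descent parser for arithmetic expressions. The grammar, the nested-tuple tree shape, and the `parse_expression` name are assumptions chosen for this example; real compilers often use generated parsers and richer node types.

```python
def parse_expression(tokens):
    """Parse a list of atoms such as ["1", "+", "2", "*", "3"] into a tree.

    Leaves are the atoms themselves; inner nodes are (operator, left, right).
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = peek()
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok is None or not tok.isalnum():
            raise SyntaxError(f"unexpected token {tok!r}")
        return advance()  # a number or a name becomes a leaf

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

For the atoms `1 + 2 * 3` this yields `("+", "1", ("*", "2", "3"))`, so the multiplication sits deeper in the tree and is evaluated first.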

3. Semantic analysis

Semantic analysis is usually performed together with syntactic analysis. It consists of checks related to:
– the compatibility of data types with the operations in which they are involved;
– compliance with the visibility (scope) rules imposed by the source language.
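Both kinds of check can be sketched in a few lines of Python. The node shapes (`"num"`, `"str"`, `"var"`, `"add"`), the type names, and the `type_of` function are hypothetical and exist only for this example.

```python
class SemanticError(Exception):
    """Raised when a program is well formed syntactically but not meaningful."""


def type_of(node, scope):
    """Return the type of an expression node, or raise SemanticError.

    `scope` maps each visible variable name to its declared type.
    """
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "str":
        return "string"
    if kind == "var":
        name = node[1]
        if name not in scope:  # visibility check
            raise SemanticError(f"'{name}' is not visible in this scope")
        return scope[name]
    if kind == "add":
        left = type_of(node[1], scope)
        right = type_of(node[2], scope)
        if left != right:  # type compatibility check
            raise SemanticError(f"cannot add {left} and {right}")
        return left
    raise SemanticError(f"unknown node kind {kind!r}")
```

With the scope `{"x": "int"}`, the expression `("add", ("num", 1), ("var", "x"))` is accepted with type `int`, while adding a number to a string, or using an undeclared `y`, raises `SemanticError`.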

4. Intermediate code generation

In this phase, the syntax tree is transformed into a sequence of simple instructions, similar to the instructions of an assembly language. The main difference between intermediate code and assembly language is that intermediate code does not specify the registers used in the operations. Examples of intermediate representations include postfix notation and three-address code. Intermediate code has the advantage of being easier to optimize than machine code.
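To make the two representations concrete, the sketch below converts a small expression tree into three-address code and into postfix notation. The nested-tuple tree format, the temporary names `t1`, `t2`, and both function names are assumptions made for this example.

```python
def to_three_address(tree):
    """Flatten an (operator, left, right) tree into three-address instructions.

    Returns the instruction list and the name that holds the final result.
    """
    code = []
    counter = 0

    def emit(node):
        nonlocal counter
        if isinstance(node, str):  # a leaf: a variable name or a constant
            return node
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        counter += 1
        temp = f"t{counter}"  # a fresh temporary, not a machine register
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = emit(tree)
    return code, result


def to_postfix(tree):
    """Return the postfix (reverse Polish) form of the same tree."""
    if isinstance(tree, str):
        return [tree]
    op, left, right = tree
    return to_postfix(left) + to_postfix(right) + [op]
```

For the tree `("+", "a", ("*", "b", "c"))`, the first function produces the instructions `t1 = b * c` and `t2 = a + t1`, and the second produces `a b c * +`. Neither form mentions registers.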

5. Code optimization

This is an optional phase whose role is to modify portions of the generated intermediate code so that the resulting program meets specific performance criteria, such as execution time and/or memory usage.
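One classic optimization is constant folding, which evaluates at compile time any operation whose operands are already known. The sketch below applies it to an expression tree for brevity; the tuple format and the `fold_constants` name are assumptions for this example, and a real optimizer performs many other transformations as well.

```python
import operator

# Operators this sketch knows how to evaluate at compile time.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(node):
    """Replace sub-expressions made only of integer constants by their value."""
    if not isinstance(node, tuple):
        return node  # a leaf: an integer constant or a variable name
    op, left, right = node
    left = fold_constants(left)
    right = fold_constants(right)
    if isinstance(left, int) and isinstance(right, int) and op in OPS:
        return OPS[op](left, right)
    return (op, left, right)
```

Here `("+", "x", ("*", 2, 3))` becomes `("+", "x", 6)`: the multiplication no longer has to be performed each time the program runs.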

6. Final code generation

This phase translates the (possibly optimized) intermediate code instructions into machine instructions (or assembly language) for the target computer, the one on which the compiled program will run.
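A very naive version of this translation is sketched below. The input format (`dest = left op right` strings), the single register `R0`, and the mnemonics `LOAD`, `STORE`, `ADD`, `SUB`, `MUL`, `DIV` belong to an invented machine used only for illustration; they do not correspond to any real instruction set.

```python
# Mnemonics of a hypothetical one-register machine.
MNEMONICS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate_assembly(three_address_code):
    """Translate instructions like "t1 = b * c" into hypothetical assembly."""
    asm = []
    for line in three_address_code:
        dest, _equals, left, op, right = line.split()
        asm.append(f"LOAD R0, {left}")              # bring the first operand into R0
        asm.append(f"{MNEMONICS[op]} R0, {right}")  # combine it with the second
        asm.append(f"STORE R0, {dest}")             # write the result back to memory
    return asm
```

This is the point at which registers finally appear. A real code generator would also decide which values stay in registers between instructions instead of storing and reloading every result.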

In addition to the actions listed above, the compilation process also includes the following:

7. Symbol table management

The symbol table is a data structure that stores information about the symbols (names) appearing in the source program; the compiler refers to this table in almost all phases of compilation.
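A common way to organize such a table is as a stack of dictionaries, one per nested scope. The `SymbolTable` class and its method names below are assumptions for this sketch, not the layout used by any particular compiler.

```python
class SymbolTable:
    """A stack of scopes; each scope maps a name to information about it."""

    def __init__(self):
        self.scopes = [{}]  # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, info):
        """Record a new symbol in the innermost scope."""
        if name in self.scopes[-1]:
            raise KeyError(f"'{name}' is already declared in this scope")
        self.scopes[-1][name] = info

    def lookup(self, name):
        """Search from the innermost scope outward; return None if not found."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None
```

A short usage example: after `table.declare("x", "int")`, entering a new scope and declaring `x` again as `"string"` makes `table.lookup("x")` return `"string"`; once the inner scope is exited, the same lookup returns `"int"` again.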

8. Error handling

A compiler must be able to recognize specific categories of errors that may appear in the source program. Handling an error means detecting it, issuing a corresponding message, and recovering from it, that is, continuing the compilation process as far as possible until the end of the source text, so that the number of compilations required to eliminate all errors in a program is as small as possible. Each compilation phase has its own specific kinds of errors.
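The detect, report, and continue pattern can be sketched as follows. The two-statement toy language (`declare NAME` and `use NAME`) and the `check_program` function are invented for this example.

```python
def check_program(lines):
    """Report every problem found instead of stopping at the first one."""
    declared = set()
    errors = []
    for number, line in enumerate(lines, start=1):
        parts = line.split()
        try:
            if len(parts) == 2 and parts[0] == "declare":
                declared.add(parts[1])
            elif len(parts) == 2 and parts[0] == "use":
                if parts[1] not in declared:
                    raise ValueError(f"'{parts[1]}' used before declaration")
            else:
                raise ValueError("unrecognized statement")
        except ValueError as err:
            # Record the message, then resume with the next statement.
            errors.append(f"line {number}: {err}")
    return errors
```

Given the four lines `declare x`, `use y`, `bogus`, `use x`, a single pass reports both the undeclared `y` on line 2 and the unrecognized statement on line 3, so the programmer can fix both before compiling again.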

The GNU Compiler Collection (GCC) is a set of compilers for various programming languages produced by the GNU Project. GCC has been adopted as the standard compiler on many Unix-like operating systems, including most Linux distributions; FreeBSD and Mac OS X also used it as their default compiler before switching to Clang.

About the author

Ilias spiros