What is AWK

AWK is a specialized scripting language for text processing. A text file processed by AWK is treated as a sequence of records, and by default each line of text is a record. Each record is divided into fields, which are separated by whitespace by default, so the first field is the first word, the second field is the second word, and so on.
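As a small illustration of records and fields, the sketch below feeds two made-up lines to awk and prints the second and first field of each record. The sample text is an assumption chosen for the example; $1 and $2 are awk's standard names for the first and second field.

```shell
# Each input line is one record; whitespace splits it into fields $1, $2, ...
out=$(printf 'alpha beta gamma\none two three\n' | awk '{ print $2, $1 }')
echo "$out"
```

The whole record is also available as $0, so `{ print $0 }` would reproduce each line unchanged.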

An awk program consists of a sequence of pattern-action pairs:
pattern { action }
pattern { action }

Here the pattern is typically a logical expression or a regular expression, and the action is a series of commands. Each line of the input data is tested against every pattern, and the action is executed for each pattern that evaluates as true. Either part of the pattern-action pair can be omitted: an omitted pattern matches every data line, and an omitted action prints the matching line by default.
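The three forms of a pattern-action pair can be sketched as follows. The fruit list and the variable names are assumptions made up for this example, not part of any real data set.

```shell
# Hypothetical sample data: a name and a number on each line.
data='apple 3
banana 12
cherry 7'

# Pattern only: records matching /an/ are printed unchanged.
only_pattern=$(printf '%s\n' "$data" | awk '/an/')

# Action only: the action runs for every record.
only_action=$(printf '%s\n' "$data" | awk '{ print $1 }')

# Pattern and action: records containing "e" get the action applied.
both=$(printf '%s\n' "$data" | awk '/e/ { print $1, $2 * 2 }')

printf '%s\n---\n%s\n---\n%s\n' "$only_pattern" "$only_action" "$both"
```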

Two special patterns, BEGIN and END, are supported. Their actions are executed before the first data line is read and after the last one has been read, respectively. A range pattern of the form pattern1, pattern2 executes its action for every line starting with one that matches pattern1 and ending with the next line that matches pattern2, inclusive.
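A minimal sketch of BEGIN, END, and a range pattern working together is shown here. The input lines and the words "start" and "stop" are assumptions picked for the example.

```shell
out=$(printf 'header\nstart\n10\n20\nstop\nfooter\n' | awk '
BEGIN           { print "begin" }                  # runs before any input is read
/start/, /stop/ { print "in range: " $0 }          # from a "start" line through the next "stop" line
END             { print "records read: " NR }      # runs after all input is read
')
echo "$out"
```

The lines "header" and "footer" fall outside the range, so only the four lines from "start" through "stop" are printed between the BEGIN and END output.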

AWK includes most of the elements found in modern programming languages: variables, functions, logical operators, arithmetic operators, and control structures. Their syntax is similar to that of the C language.
In expressions, we can use all the classic operators. Variables are automatically initialized to the empty string or 0, as required. There are some built-in variables and functions:

1) FS is the field separator: a single character or a regular expression that defines how fields are separated.
2) NF is the number of fields in the current record.
3) NR is the ordinal number of the current record. It is a global count, that is, how many records have been read so far across all input files, not the line number in the current file.
4) FNR is the record number in the current file.
5) FILENAME is the name of the file awk is currently reading. This is useful because you can pass several files to the same script and use FILENAME to tell them apart.
6) ARGC is the number of command-line arguments.
7) ARGV is the array of command-line arguments.
8) length returns the length of a string.
9) substr returns a substring selected by a start position and an optional length.
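Several of the variables and functions listed above can be tried in one short sketch. The colon-separated sample lines and the argument name foo are assumptions for illustration; foo is never opened, because a program that contains only a BEGIN action reads no input.

```shell
# FS, NR, NF, length and substr on colon-separated records.
out=$(printf 'root:x:0\ndaemon:x:1:extra\n' | awk '
BEGIN { FS = ":" }                                   # split fields on ":" instead of whitespace
{ print NR, NF, length($1), substr($1, 1, 3) }       # record number, field count, length and first 3 chars of field 1
')
echo "$out"

# ARGC counts the program name as well, so one argument gives ARGC = 2.
args=$(awk 'BEGIN { print ARGC, ARGV[1] }' foo)
echo "$args"
```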

The awk command is installed by default on virtually all modern Linux systems, so there is usually no need to install anything to get started.

The basic format of an awk command is:
awk '/search_pattern/ { action_to_take_on_matches; another_action }'

Patterns consist of regular expressions and relational expressions, that is, comparisons applied to the fields, and they determine the lines to which the given actions apply. Two more special patterns are BEGIN and END, which match before the first line is read and after the last line is read, respectively. Both are global: END matches once after all input has been read, not after each file.
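A relational pattern might look like the sketch below, which skips a header line and keeps only rows whose second field is at least 10. The small parts table is invented for the example.

```shell
# Hypothetical table: a header line followed by name, quantity and unit price.
data='name qty price
bolt 40 0.25
nut 5 0.10
gear 12 3.50'

# NR > 1 skips the header; $2 >= 10 compares the second field numerically.
out=$(printf '%s\n' "$data" | awk 'NR > 1 && $2 >= 10 { print $1, $2 * $3 }')
echo "$out"
```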

There is also an empty pattern, which matches every line.
The BEGIN and END keywords are used in the pattern position just like search patterns, but they match before and after the input has been processed rather than matching any line.

Awk is a much bigger subject than this overview covers, and it is a complete programming language. It can be used inside shell scripts to format text reliably.

There are two ways to use awk programs:

1) The first is to pass the instruction sequence to awk as a command-line parameter. This becomes quite tedious with longer programs.
2) The second is to give awk a "script" file that contains the instructions, using the -f option.

awk 'instructions' name1 name2
awk -f name_scen name1 name2

where name_scen is the script file, and name1 and name2 are the files we want to run our program on.
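The script-file form can be tried end to end with the sketch below. It assumes a temporary directory created with mktemp and invents the contents of name1, name2 and name_scen purely for illustration; the script prints FILENAME, FNR and NR so the difference between the per-file and global record counters is visible.

```shell
# Create throwaway input files and a one-line awk script in a temp directory.
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/name1"
printf 'c\n'    > "$workdir/name2"
printf '%s\n' '{ print FILENAME, FNR, NR, $0 }' > "$workdir/name_scen"

# Run the script file against both inputs.
out=$(cd "$workdir" && awk -f name_scen name1 name2)
echo "$out"

# Clean up the temporary files.
rm -rf "$workdir"
```

FNR restarts at 1 when awk moves on to name2, while NR keeps counting across both files.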

About the author

Ilias spiros