phunix/man/man9/awk.9

.CD "awk \(en pattern matching language"
.SX "awk \fIrules\fR [\fIfile\fR] ...
.FL "\fR(none)"
.EX "awk rules input" "Process \fIinput\fR according to \fIrules\fR"
.EX "awk rules \(en  >out" "Input from terminal, output to \fIout\fR"
.PP
AWK is a programming language devised by Aho, Weinberger, and Kernighan
at Bell Labs (hence the name).
\fIAwk\fR programs search files for
specific patterns and performs \*(OQactions\*(CQ for every occurrence
of these patterns.  The patterns can be \*(OQregular expressions\*(CQ
as used in the \fIed\fR editor.  The actions are expressed
using a subset of the C language.
.PP
The patterns and actions are usually placed in a \*(OQrules\*(CQ file
whose name must be the first argument in the command line,
preceded by the flag \fB\(enf\fR.  Otherwise, the first argument on the
command line is taken to be a string containing the rules
themselves. All other arguments are taken to be the names of text
files on which the rules are to be applied, with \fB\(en\fR being the
standard input.  To take rules from the standard input, use \fB\(enf \(en\fR.
.PP
The command:
.HS
.Cx "awk  rules  prog.\d\s+2*\s0\u"
.HS
would read the patterns and actions rules from the file \fIrules\fR
and apply them to all the arguments.
.PP
The general format of a rules file is:
.HS
~~~<pattern> { <action> }
~~~<pattern> { <action> }
~~~...
.HS
There may be any number of these <pattern> { <action> }
sequences in the rules file.  \fIAwk\fR reads a line of input from
the current input file and applies every <pattern> { <action> }
in sequence to the line.
.PP
If the <pattern> corresponding to any { <action> } is missing,
the action is applied to every line of input.  The default
{ <action> } is to print the matched input line.
.SS "Patterns"
.PP
The <pattern>s may consist of any valid C expression.  If the
<pattern> consists of two expressions separated by a comma, it
is taken to be a range and the <action> is performed on all
lines of input that match the range.  <pattern>s may contain
\*(OQregular expressions\*(CQ delimited by an @ symbol.  Regular
expressions can be thought of as a generalized \*(OQwildcard\*(CQ
string matching mechanism, similar to that used by many
operating systems to specify file names.  Regular expressions
may contain any of the following characters:
.HS
.in +0.75i
.ta +0.5i
.ti -0.5i
x	An ordinary character
.ti -0.5i
\\	The backslash quotes any character
.ti -0.5i
^	A circumflex at the beginning of an expr matches the beginning of a line.
.ti -0.5i
$	A dollar-sign at the end of an expression matches the end of a line.
.ti -0.5i
\&.	A period matches any single character except newline.
.ti -0.5i
*	An expression followed by an asterisk matches zero or more occurrences
of that expression: \*(OQfo*\*(CQ matches \*(OQf\*(CQ, \*(OQfo\*(CQ, \*(OQfoo\*(CQ, \*(OQfooo\*(CQ, etc.
.ti -0.5i
+	An expression followed by a plus sign matches one or more occurrences
of that expression: \*(OQfo+\*(CQ matches \*(OQfo\*(CQ, \*(OQfoo\*(CQ, \*(OQfooo\*(CQ, etc.
.ti -0.5i
[]	A string enclosed in square brackets matches any single character in that
string, but no others.  If the first character in the string is a circumflex, the
expression matches any character except newline and the characters in the
string.  For example, \*(OQ[xyz]\*(CQ matches \*(OQxx\*(CQ and \*(OQzyx\*(CQ, while
\*(OQ[^xyz]\*(CQ matches \*(OQabc\*(CQ but not \*(OQaxb\*(CQ.  A range of characters may be
specified by two characters separated by \*(OQ-\*(CQ.
.in -0.75i
.SS "Actions"
.PP
Actions are expressed as a subset of the C language.  All
variables are global and default to int's if not formally
declared.
Only char's and int's and pointers and arrays of
char and int are allowed.  \fIAwk\fR allows only decimal integer
constants to be used\(emno hex (0xnn) or octal (0nn). String
and character constants may contain all of the special C
escapes (\\n, \\r, etc.).
.PP
\fIAwk\fR supports the \*(OQif\*(CQ, \*(OQelse\*(CQ,
\*(OQwhile\*(CQ and \*(OQbreak\*(CQ flow of
control constructs, which behave exactly as in C.
.PP
Also supported are the following unary and binary operators,
listed in order from highest to lowest precedence:
.HS
.ta 0.25i 1.75i 3.0i
.nf
\fB	Operator	Type	Associativity\fR
	() []	unary	left to right
.tr ~~
	! ~ ++ \(en\(en \(en * &	unary	right to left
.tr ~
	* / %	binary	left to right
	+ \(en	binary	left to right
	<< >>	binary	left to right
	< <= > >=	binary	left to right
	== !=	binary	left to right
	&	binary	left to right
	^	binary	left to right
	|	binary	left to right
	&&	binary	left to right
	||	binary	left to right
	=	binary	right to left
.fi
.HS
Comments are introduced by a '#' symbol and are terminated by
the first newline character.  The standard \*(OQ/*\*(CQ and \*(OQ*/\*(CQ
comment delimiters are not supported and will result in a
syntax error.
.SP 0.5
.SS "Fields"
.SP 0.5
.PP
When \fIawk\fR reads a line from the current input file, the
record is automatically separated into \*(OQfields.\*(CQ  A field is
simply a string of consecutive characters delimited by either
the beginning or end of line, or a \*(OQfield separator\*(CQ character.
Initially, the field separators are the space and tab character.
The special unary operator '$' is used to reference one of the
fields in the current input record (line).  The fields are
numbered sequentially starting at 1.  The expression \*(OQ$0\*(CQ
references the entire input line.
.PP
Similarly, the \*(OQrecord separator\*(CQ is used to determine the end
of an input \*(OQline,\*(CQ initially the newline character.  The field
and record separators may be changed programatically by one of
the actions and will remain in effect until changed again.
.PP
Multiple (up to 10) field separators are allowed at a time, but
only one record separator.
.PP
Fields behave exactly like strings; and can be used in the same
context as a character array.  These \*(OQarrays\*(CQ can be considered
to have been declared as:
.SP 0.15
.HS
~~~~~char ($n)[ 128 ];
.HS
.SP 0.15
In other words, they are 128 bytes long.  Notice that the
parentheses are necessary because the operators [] and $
associate from right to left; without them, the statement
would have parsed as:
.HS
.SP 0.15
~~~~~char $(1[ 128 ]);
.HS
.SP 0.15
which is obviously ridiculous.
.PP
If the contents of one of these field arrays is altered, the
\*(OQ$0\*(CQ field will reflect this change.  For example, this
expression:
.HS
.SP 0.15
~~~~~*$4 = 'A';
.HS
.SP 0.15
will change the first character of the fourth field to an upper-
case letter 'A'.  Then, when the following input line:
.HS
.SP 0.15
~~~~~120 PRINT "Name         address        Zip"
.SP 0.15
.HS
is processed, it would be printed as:
.HS
.SP 0.15
~~~~~120 PRINT "Name         Address        Zip"
.HS
.SP 0.15
Fields may also be modified with the strcpy() function (see
below).  For example, the expression:
.HS
~~~~~strcpy( $4, "Addr." );
.HS
applied to the same line above would yield:
.HS
~~~~~120 PRINT "Name         Addr.        Zip"
.HS
.SS "Predefined Variables"
.PP
The following variables are pre-defined:
.HS
.in +1.5i
.ta +1.25i
.ti -1.25i
FS	Field separator (see below).
.ti -1.25i
RS	Record separator (see below also).
.ti -1.25i
NF	Number of fields in current input record (line).
.ti -1.25i
NR	Number of records processed thus far.
.ti -1.25i
FILENAME	Name of current input file.
.ti -1.25i
BEGIN	A special <pattern> that matches the beginning of input text.
.ti -1.25i
END	A special <pattern> that matches the end of input text.
.in -1.5i
.HS
\fIAwk\fR also provides some useful built-in functions for string
manipulation and printing:
.HS
.in +1.5i
.ta +1.25i
.ti -1.25i
print(arg)	Simple printing of strings only, terminated by '\\n'.
.ti -1.25i
printf(arg...)	Exactly the printf() function from C.
.ti -1.25i
getline()	Reads the next record and returns 0 on end of file.
.ti -1.25i
nextfile()	Closes the current input file and begins processing the next file
.ti -1.25i
strlen(s)	Returns the length of its string argument.
.ti -1.25i
strcpy(s,t)	Copies the string \*(OQt\*(CQ to the string \*(OQs\*(CQ.
.ti -1.25i
strcmp(s,t)	Compares the \*(OQs\*(CQ to \*(OQt\*(CQ and returns 0 if they match.
.ti -1.25i
toupper(c)	Returns its character argument converted to upper-case.
.ti -1.25i
tolower(c)	Returns its character argument converted to lower-case.
.ti -1.25i
match(s,@re@)	Compares the string \*(OQs\*(CQ to the regular expression \*(OQre\*(CQ and
returns the number of matches found (zero if none).
.in -1.5i
.SS "Authors"
.PP
\fIAwk\fR was written by Saeko Hirabauashi and Kouichi Hirabayashi.