249 lines
		
	
	
		
			8.4 KiB
		
	
	
	
		
			Groff
		
	
	
	
	
	
			
		
		
	
	
			249 lines
		
	
	
		
			8.4 KiB
		
	
	
	
		
			Groff
		
	
	
	
	
	
| .so mnx.mac
 | |
| .TH AWK 9
 | |
| .CD "awk \(en pattern matching language"
 | |
| .SX "awk \fIrules\fR [\fIfile\fR] ...
 | |
| .FL "\fR(none)"
 | |
| .EX "awk rules input" "Process \fIinput\fR according to \fIrules\fR"
 | |
| .EX "awk rules \(en  >out" "Input from terminal, output to \fIout\fR"
 | |
| .PP
 | |
| AWK is a programming language devised by Aho, Weinberger, and Kernighan
 | |
| at Bell Labs (hence the name).
 | |
| \fIAwk\fR programs search files for
 | |
| specific patterns and performs \*(OQactions\*(CQ for every occurrence
 | |
| of these patterns.  The patterns can be \*(OQregular expressions\*(CQ
 | |
| as used in the \fIed\fR editor.  The actions are expressed
 | |
| using a subset of the C language.
 | |
| .PP
 | |
| The patterns and actions are usually placed in a \*(OQrules\*(CQ file
 | |
| whose name must be the first argument in the command line,
 | |
| preceded by the flag \fB\(enf\fR.  Otherwise, the first argument on the
 | |
| command line is taken to be a string containing the rules
 | |
| themselves. All other arguments are taken to be the names of text
 | |
| files on which the rules are to be applied, with \fB\(en\fR being the
 | |
| standard input.  To take rules from the standard input, use \fB\(enf \(en\fR.
 | |
| .PP
 | |
| The command:
 | |
| .HS
 | |
| .Cx "awk  rules  prog.\d\s+2*\s0\u"
 | |
| .HS
 | |
| would read the patterns and actions rules from the file \fIrules\fR
 | |
| and apply them to all the arguments.
 | |
| .PP
 | |
| The general format of a rules file is:
 | |
| .HS
 | |
| ~~~<pattern> { <action> }
 | |
| ~~~<pattern> { <action> }
 | |
| ~~~...
 | |
| .HS
 | |
| There may be any number of these <pattern> { <action> }
 | |
| sequences in the rules file.  \fIAwk\fR reads a line of input from
 | |
| the current input file and applies every <pattern> { <action> }
 | |
| in sequence to the line.
 | |
| .PP
 | |
| If the <pattern> corresponding to any { <action> } is missing,
 | |
| the action is applied to every line of input.  The default
 | |
| { <action> } is to print the matched input line.
 | |
| .SS "Patterns"
 | |
| .PP
 | |
| The <pattern>s may consist of any valid C expression.  If the
 | |
| <pattern> consists of two expressions separated by a comma, it
 | |
| is taken to be a range and the <action> is performed on all
 | |
| lines of input that match the range.  <pattern>s may contain
 | |
| \*(OQregular expressions\*(CQ delimited by an @ symbol.  Regular
 | |
| expressions can be thought of as a generalized \*(OQwildcard\*(CQ
 | |
| string matching mechanism, similar to that used by many
 | |
| operating systems to specify file names.  Regular expressions
 | |
| may contain any of the following characters:
 | |
| .HS
 | |
| .in +0.75i
 | |
| .ta +0.5i
 | |
| .ti -0.5i
 | |
| x	An ordinary character
 | |
| .ti -0.5i
 | |
| \\	The backslash quotes any character
 | |
| .ti -0.5i
 | |
| ^	A circumflex at the beginning of an expr matches the beginning of a line.
 | |
| .ti -0.5i
 | |
| $	A dollar-sign at the end of an expression matches the end of a line.
 | |
| .ti -0.5i
 | |
| \&.	A period matches any single character except newline.
 | |
| .ti -0.5i
 | |
| *	An expression followed by an asterisk matches zero or more occurrences
 | |
| of that expression: \*(OQfo*\*(CQ matches \*(OQf\*(CQ, \*(OQfo\*(CQ, \*(OQfoo\*(CQ, \*(OQfooo\*(CQ, etc.
 | |
| .ti -0.5i
 | |
| +	An expression followed by a plus sign matches one or more occurrences 
 | |
| of that expression: \*(OQfo+\*(CQ matches \*(OQfo\*(CQ, \*(OQfoo\*(CQ, \*(OQfooo\*(CQ, etc.
 | |
| .ti -0.5i
 | |
| []	A string enclosed in square brackets matches any single character in that 
 | |
| string, but no others.  If the first character in the string is a circumflex, the 
 | |
| expression matches any character except newline and the characters in the 
 | |
| string.  For example, \*(OQ[xyz]\*(CQ matches \*(OQxx\*(CQ and \*(OQzyx\*(CQ, while 
 | |
| \*(OQ[^xyz]\*(CQ matches \*(OQabc\*(CQ but not \*(OQaxb\*(CQ.  A range of characters may be 
 | |
| specified by two characters separated by \*(OQ-\*(CQ.
 | |
| .in -0.75i
 | |
| .SS "Actions"
 | |
| .PP
 | |
| Actions are expressed as a subset of the C language.  All
 | |
| variables are global and default to int's if not formally
 | |
| declared.  
 | |
| Only char's and int's and pointers and arrays of
 | |
| char and int are allowed.  \fIAwk\fR allows only decimal integer
 | |
| constants to be used\(emno hex (0xnn) or octal (0nn). String
 | |
| and character constants may contain all of the special C
 | |
| escapes (\\n, \\r, etc.).
 | |
| .PP
 | |
| \fIAwk\fR supports the \*(OQif\*(CQ, \*(OQelse\*(CQ, 
 | |
| \*(OQwhile\*(CQ and \*(OQbreak\*(CQ flow of
 | |
| control constructs, which behave exactly as in C.
 | |
| .PP
 | |
| Also supported are the following unary and binary operators,
 | |
| listed in order from highest to lowest precedence:
 | |
| .HS
 | |
| .ta 0.25i 1.75i 3.0i
 | |
| .nf
 | |
| \fB	Operator	Type	Associativity\fR
 | |
| 	() []	unary	left to right
 | |
| .tr ~~
 | |
| 	! ~ ++ \(en\(en \(en * &	unary	right to left
 | |
| .tr ~
 | |
| 	* / %	binary	left to right
 | |
| 	+ \(en	binary	left to right
 | |
| 	<< >>	binary	left to right
 | |
| 	< <= > >=	binary	left to right
 | |
| 	== !=	binary	left to right
 | |
| 	&	binary	left to right
 | |
| 	^	binary	left to right
 | |
| 	|	binary	left to right
 | |
| 	&&	binary	left to right
 | |
| 	||	binary	left to right
 | |
| 	=	binary	right to left
 | |
| .fi
 | |
| .HS
 | |
| Comments are introduced by a '#' symbol and are terminated by
 | |
| the first newline character.  The standard \*(OQ/*\*(CQ and \*(OQ*/\*(CQ
 | |
| comment delimiters are not supported and will result in a
 | |
| syntax error.
 | |
| .SP 0.5
 | |
| .SS "Fields"
 | |
| .SP 0.5
 | |
| .PP
 | |
| When \fIawk\fR reads a line from the current input file, the
 | |
| record is automatically separated into \*(OQfields.\*(CQ  A field is
 | |
| simply a string of consecutive characters delimited by either
 | |
| the beginning or end of line, or a \*(OQfield separator\*(CQ character.
 | |
| Initially, the field separators are the space and tab character.
 | |
| The special unary operator '$' is used to reference one of the
 | |
| fields in the current input record (line).  The fields are
 | |
| numbered sequentially starting at 1.  The expression \*(OQ$0\*(CQ
 | |
| references the entire input line.
 | |
| .PP
 | |
| Similarly, the \*(OQrecord separator\*(CQ is used to determine the end
 | |
| of an input \*(OQline,\*(CQ initially the newline character.  The field
 | |
| and record separators may be changed programatically by one of
 | |
| the actions and will remain in effect until changed again.
 | |
| .PP
 | |
| Multiple (up to 10) field separators are allowed at a time, but
 | |
| only one record separator.
 | |
| .PP
 | |
| Fields behave exactly like strings; and can be used in the same
 | |
| context as a character array.  These \*(OQarrays\*(CQ can be considered
 | |
| to have been declared as:
 | |
| .SP 0.15
 | |
| .HS
 | |
| ~~~~~char ($n)[ 128 ];
 | |
| .HS
 | |
| .SP 0.15
 | |
| In other words, they are 128 bytes long.  Notice that the
 | |
| parentheses are necessary because the operators [] and $
 | |
| associate from right to left; without them, the statement
 | |
| would have parsed as:
 | |
| .HS
 | |
| .SP 0.15
 | |
| ~~~~~char $(1[ 128 ]);
 | |
| .HS
 | |
| .SP 0.15
 | |
| which is obviously ridiculous.
 | |
| .PP
 | |
| If the contents of one of these field arrays is altered, the
 | |
| \*(OQ$0\*(CQ field will reflect this change.  For example, this
 | |
| expression:
 | |
| .HS
 | |
| .SP 0.15
 | |
| ~~~~~*$4 = 'A';
 | |
| .HS
 | |
| .SP 0.15
 | |
| will change the first character of the fourth field to an upper-
 | |
| case letter 'A'.  Then, when the following input line:
 | |
| .HS
 | |
| .SP 0.15
 | |
| ~~~~~120 PRINT "Name         address        Zip"
 | |
| .SP 0.15
 | |
| .HS
 | |
| is processed, it would be printed as:
 | |
| .HS
 | |
| .SP 0.15
 | |
| ~~~~~120 PRINT "Name         Address        Zip"
 | |
| .HS
 | |
| .SP 0.15
 | |
| Fields may also be modified with the strcpy() function (see
 | |
| below).  For example, the expression:
 | |
| .HS
 | |
| ~~~~~strcpy( $4, "Addr." );
 | |
| .HS
 | |
| applied to the same line above would yield:
 | |
| .HS
 | |
| ~~~~~120 PRINT "Name         Addr.        Zip"
 | |
| .HS
 | |
| .SS "Predefined Variables"
 | |
| .PP
 | |
| The following variables are pre-defined:
 | |
| .HS
 | |
| .in +1.5i
 | |
| .ta +1.25i
 | |
| .ti -1.25i
 | |
| FS	Field separator (see below).
 | |
| .ti -1.25i
 | |
| RS	Record separator (see below also).
 | |
| .ti -1.25i
 | |
| NF	Number of fields in current input record (line).
 | |
| .ti -1.25i
 | |
| NR	Number of records processed thus far.
 | |
| .ti -1.25i
 | |
| FILENAME	Name of current input file.
 | |
| .ti -1.25i
 | |
| BEGIN	A special <pattern> that matches the beginning of input text.
 | |
| .ti -1.25i
 | |
| END	A special <pattern> that matches the end of input text.
 | |
| .in -1.5i
 | |
| .HS
 | |
| \fIAwk\fR also provides some useful built-in functions for string
 | |
| manipulation and printing:
 | |
| .HS
 | |
| .in +1.5i
 | |
| .ta +1.25i
 | |
| .ti -1.25i
 | |
| print(arg)	Simple printing of strings only, terminated by '\\n'.
 | |
| .ti -1.25i
 | |
| printf(arg...)	Exactly the printf() function from C.
 | |
| .ti -1.25i
 | |
| getline()	Reads the next record and returns 0 on end of file.
 | |
| .ti -1.25i
 | |
| nextfile()	Closes the current input file and begins processing the next file
 | |
| .ti -1.25i
 | |
| strlen(s)	Returns the length of its string argument.
 | |
| .ti -1.25i
 | |
| strcpy(s,t)	Copies the string \*(OQt\*(CQ to the string \*(OQs\*(CQ.
 | |
| .ti -1.25i
 | |
| strcmp(s,t)	Compares the \*(OQs\*(CQ to \*(OQt\*(CQ and returns 0 if they match.
 | |
| .ti -1.25i
 | |
| toupper(c)	Returns its character argument converted to upper-case.
 | |
| .ti -1.25i
 | |
| tolower(c)	Returns its character argument converted to lower-case.
 | |
| .ti -1.25i
 | |
| match(s,@re@)	Compares the string \*(OQs\*(CQ to the regular expression \*(OQre\*(CQ and 
 | |
| returns the number of matches found (zero if none).
 | |
| .in -1.5i
 | |
| .SS "Authors"
 | |
| .PP
 | |
| \fIAwk\fR was written by Saeko Hirabauashi and Kouichi Hirabayashi.
 | 
