FLEX(1)                  USER COMMANDS                    FLEX(1)

NAME
     flex - fast lexical analyzer generator

SYNOPSIS
     flex [-bcdfhilnpstvwBFILTV78+? -C[aefFmr] -ooutput  -Pprefix
     -Sskeleton] [--help --version] [filename ...]

OVERVIEW
     This manual describes flex, a tool for generating programs
     that perform pattern-matching on text.  The manual includes
     both tutorial and reference sections:

         Description
             a brief overview of the tool

         Some Simple Examples

         Format Of The Input File

         Patterns
             the extended regular expressions used by flex

         How The Input Is Matched
             the rules for determining what has been matched

         Actions
             how to specify what to do when a pattern is matched

         The Generated Scanner
             details regarding the scanner that flex produces;
             how to control the input source

         Start Conditions
             introducing context into your scanners, and
             managing "mini-scanners"

         Multiple Input Buffers
             how to manipulate multiple input sources; how to
             scan from strings instead of files

         End-of-file Rules
             special rules for matching the end of the input

         Miscellaneous Macros
             a summary of macros available to the actions

         Values Available To The User
             a summary of values available to the actions

         Interfacing With Yacc
             connecting flex scanners together with yacc parsers

Version 2.5          Last change: April 1995                    1

         Options
             flex command-line options, and the "%option"
             directive

         Performance Considerations
             how to make your scanner go as fast as possible

         Generating C++ Scanners
             the (experimental) facility for generating C++
             scanner classes

         Incompatibilities With Lex And POSIX
             how flex differs from AT&T lex and the POSIX lex
             standard

         Diagnostics
             those error messages produced by flex (or scanners
             it generates) whose meanings might not be apparent

         Files
             files used by flex

         Deficiencies / Bugs
             known problems with flex

         See Also
             other documentation, related tools

         Author
             includes contact information

DESCRIPTION
     flex is a tool for generating scanners: programs which
     recognize lexical patterns in text.  flex reads the given
     input files, or its standard input if no file names are
     given, for a description of a scanner to generate.  The
     description is in the form of pairs of regular expressions
     and C code, called rules.  flex generates as output a C
     source file, lex.yy.c, which defines a routine yylex().  This
     file is compiled and linked with the -lfl library to produce
     an executable.  When the executable is run, it analyzes its
     input for occurrences of the regular expressions.  Whenever
     it finds one, it executes the corresponding C code.

SOME SIMPLE EXAMPLES
     First some simple examples to get the flavor of how one uses
     flex.  The following flex input specifies a scanner which
     whenever it encounters the string "username" will replace it
     with the user's login name:

         %%
         username    printf( "%s", getlogin() );

     By default, any text not matched by a flex scanner is copied
     to the output, so the net effect of this scanner is to copy
     its input file to its output with each occurrence of
     "username" expanded.  In this input, there is just one rule.
     "username" is the pattern and the "printf" is the action.
     The "%%" marks the beginning of the rules.

     Here's another simple example:

                 int num_lines = 0, num_chars = 0;

         %%
         \n      ++num_lines; ++num_chars;
         .       ++num_chars;

         %%
         int main()
                 {
                 yylex();
                 printf( "# of lines = %d, # of chars = %d\n",
                         num_lines, num_chars );
                 }

     This scanner counts the number of characters and the number
     of lines in its input (it produces no output other than the
     final report on the counts).  The first line declares two
     globals, "num_lines" and "num_chars", which are accessible
     both inside yylex() and in the main() routine declared after
     the second "%%".  There are two rules, one which matches a
     newline ("\n") and increments both the line count and the
     character count, and one which matches any character other
     than a newline (indicated by the "." regular expression).

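     For comparison, the work done by these two rules can be
     sketched in plain C without flex.  This is only an
     illustration: the function name count_text() and the use of a
     NUL-terminated string as input are assumptions made here, not
     part of flex or of the scanner it generates.

```c
/*
 * Hypothetical plain-C counterpart of the two rules above.
 * Scans a NUL-terminated string and reports its line and
 * character counts through the two output parameters.
 */
void count_text(const char *text, int *num_lines, int *num_chars)
{
    *num_lines = 0;
    *num_chars = 0;
    for (const char *p = text; *p != '\0'; ++p) {
        if (*p == '\n')
            ++*num_lines;   /* the "\n" rule also counts a line */
        ++*num_chars;       /* both rules count the character */
    }
}
```

     Every character is counted exactly once because, at any
     position, exactly one of the two patterns matches.
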
     A somewhat more complicated example:

         /* scanner for a toy Pascal-like language */

         %{
         /* need this for the calls to atoi() and atof() below */
         #include <stdlib.h>
         %}

         DIGIT    [0-9]
         ID       [a-z][a-z0-9]*

         %%

         {DIGIT}+    {
                     printf( "An integer: %s (%d)\n", yytext,
                             atoi( yytext ) );
                     }

         {DIGIT}+"."{DIGIT}*        {
                     printf( "A float: %s (%g)\n", yytext,
                             atof( yytext ) );
                     }

         if|then|begin|end|procedure|function        {
                     printf( "A keyword: %s\n", yytext );
                     }

         {ID}        printf( "An identifier: %s\n", yytext );

         "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );

         "{"[^}\n]*"}"     /* eat up one-line comments */

         [ \t\n]+          /* eat up whitespace */

         .           printf( "Unrecognized character: %s\n", yytext );

         %%

         int main( int argc, char **argv )
             {
             ++argv, --argc;  /* skip over program name */
             if ( argc > 0 )
                     yyin = fopen( argv[0], "r" );
             else
                     yyin = stdin;

             yylex();
             }

     This is the beginning of a simple scanner for a language
     like Pascal.  It identifies different types of tokens and
     reports on what it has seen.

     The details of this example will be explained in the
     following sections.

FORMAT OF THE INPUT FILE
     The flex input file consists of three sections, separated by
     a line with just %% in it:

         definitions
         %%
         rules
         %%
         user code

     The definitions section contains declarations of simple name
     definitions to simplify the scanner specification, and
     declarations of start conditions, which are explained in a
     later section.

     Name definitions have the form:

         name definition

     The "name" is a word beginning with a letter or an underscore
     ('_') followed by zero or more letters, digits, '_', or '-'
     (dash).  The definition is taken to begin at the first
     non-white-space character following the name and to continue
     to the end of the line.  The definition can subsequently be
     referred to using "{name}", which will expand to
     "(definition)".  For example,

         DIGIT    [0-9]
         ID       [a-z][a-z0-9]*

     defines "DIGIT" to be a regular expression which matches a
     single digit, and "ID" to be a regular expression which
     matches a letter followed by zero-or-more letters-or-digits.
     A subsequent reference to

         {DIGIT}+"."{DIGIT}*

     is identical to

         ([0-9])+"."([0-9])*

     and matches one-or-more digits followed by a '.' followed by
     zero-or-more digits.

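     The textual substitution described above can be sketched in
     C.  The helper expand_name() below is hypothetical and is not
     how flex itself is implemented; it replaces each "{name}"
     with "(definition)" and, as a simplification, does not treat
     quoted strings or character classes specially.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: copy `pattern` into `out`, replacing every
 * occurrence of "{name}" with "(definition)".  Returns 0 on success
 * and -1 if the name is too long or `out` is too small.
 */
int expand_name(const char *pattern, const char *name,
                const char *definition, char *out, size_t out_size)
{
    char ref[64];
    size_t ref_len, def_len = strlen(definition), used = 0;
    int n;

    if (out_size == 0)
        return -1;
    n = snprintf(ref, sizeof ref, "{%s}", name);
    if (n < 0 || (size_t)n >= sizeof ref)
        return -1;
    ref_len = (size_t)n;

    while (*pattern != '\0') {
        if (strncmp(pattern, ref, ref_len) == 0) {
            /* room for '(' + definition + ')' + terminating NUL */
            if (used + def_len + 2 >= out_size)
                return -1;
            out[used++] = '(';
            memcpy(out + used, definition, def_len);
            used += def_len;
            out[used++] = ')';
            pattern += ref_len;
        } else {
            /* room for one character + terminating NUL */
            if (used + 1 >= out_size)
                return -1;
            out[used++] = *pattern++;
        }
    }
    out[used] = '\0';
    return 0;
}
```

     The added parentheses are what keep a following operator such
     as '+' or '*' applied to the whole definition rather than to
     its last element.
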
     The rules section of the flex input contains a series of
     rules of the form:

         pattern   action

     where the pattern must be unindented and the action must
     begin on the same line.

     See below for a further description of patterns and actions.

     Finally, the user code section is simply copied to lex.yy.c
     verbatim.  It is used for companion routines which call or
     are called by the scanner.  The presence of this section is
     optional; if it is missing, the second %% in the input file
     may be skipped, too.

     In the definitions and rules sections, any indented text or
     text enclosed in %{ and %} is copied verbatim to the output
     (with the %{}'s removed).  The %{}'s must appear unindented
     on lines by themselves.

     In the rules section, any indented or %{} text appearing
     before the first rule may be used to declare variables which
     are local to the scanning routine and (after the
     declarations) code which is to be executed whenever the
     scanning routine is entered.  Other indented or %{} text in
     the rule section is still copied to the output, but its
     meaning is not well-defined and it may well cause
     compile-time errors (this feature is present for POSIX
     compliance; see below for other such features).

     In the definitions section (but not in the rules section),
     an unindented comment (i.e., a line beginning with "/*") is
     also copied verbatim to the output up to the next "*/".

PATTERNS
     The patterns in the input are written using an extended set
     of regular expressions.  These are:

         x          match the character 'x'
         .          any character (byte) except newline
         [xyz]      a "character class"; in this case, the pattern
                      matches either an 'x', a 'y', or a 'z'
         [abj-oZ]   a "character class" with a range in it; matches
                      an 'a', a 'b', any letter from 'j' through 'o',
                      or a 'Z'
         [^A-Z]     a "negated character class", i.e., any character
                      but those in the class.  In this case, any
                      character EXCEPT an uppercase letter.
         [^A-Z\n]   any character EXCEPT an uppercase letter or
                      a newline
         r*         zero or more r's, where r is any regular expression
         r+         one or more r's
         r?         zero or one r's (that is, "an optional r")
         r{2,5}     anywhere from two to five r's
         r{2,}      two or more r's
         r{4}       exactly 4 r's
         {name}     the expansion of the "name" definition
                    (see above)
         "[xyz]\"foo"
                    the literal string: [xyz]"foo
         \X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
                      then the ANSI-C interpretation of \X.
                      Otherwise, a literal 'X' (used to escape
                      operators such as '*')
         \0         a NUL character (ASCII code 0)
         \123       the character with octal value 123
         \x2a       the character with hexadecimal value 2a
         (r)        match an r; parentheses are used to override
                      precedence (see below)

         rs         the regular expression r followed by the
                      regular expression s; called "concatenation"

         r|s        either an r or an s

         r/s        an r but only if it is followed by an s.  The
                      text matched by s is included when determining
                      whether this rule is the "longest match",
                      but is then returned to the input before
                      the action is executed.  So the action only
                      sees the text matched by r.  This type
                      of pattern is called "trailing context".
                      (There are some combinations of r/s that flex
                      cannot match correctly; see notes in the
                      Deficiencies / Bugs section below regarding
                      "dangerous trailing context".)
         ^r         an r, but only at the beginning of a line (i.e.,
                      when just starting to scan, or right after a
                      newline has been scanned).
         r$         an r, but only at the end of a line (i.e., just
                      before a newline).  Equivalent to "r/\n".

                    Note that flex's notion of "newline" is exactly
                    whatever the C compiler used to compile flex
                    interprets '\n' as; in particular, on some DOS
                    systems you must either filter out \r's in the
                    input yourself, or explicitly use r/\r\n for "r$".

         <s>r       an r, but only in start condition s (see
                      below for discussion of start conditions)
         <s1,s2,s3>r
                    same, but in any of start conditions s1,
                      s2, or s3
         <*>r       an r in any start condition, even an exclusive one.

         <<EOF>>    an end-of-file
         <s1,s2><<EOF>>
                    an end-of-file when in start condition s1 or s2

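     The effect of trailing context ("r/s" in the list above) can
     be illustrated for the simplest case, where r and s are both
     literal strings.  The function match_trailing() is a
     hypothetical sketch, not flex's matching algorithm: it
     reports how many characters the action would see in yytext,
     which is the length of r alone, even though s had to be
     present for the rule to match.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of "r/s" for literal strings only.
 * Returns strlen(r) if `input` begins with r immediately followed
 * by s, and 0 otherwise.  The text matched by s is checked but is
 * not counted, mirroring how it is returned to the input.
 */
size_t match_trailing(const char *input, const char *r, const char *s)
{
    size_t rlen = strlen(r);
    size_t slen = strlen(s);

    if (strncmp(input, r, rlen) != 0)
        return 0;               /* r itself is not present */
    if (strncmp(input + rlen, s, slen) != 0)
        return 0;               /* r is present but not followed by s */
    return rlen;                /* only r's text reaches the action */
}
```

     Passing "\n" as s gives the literal-string analogue of "r$".
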
     Note that inside of a character class, all regular
     expression operators lose their special meaning except
     escape ('\') and the character class operators, '-', ']',
     and, at the beginning of the class, '^'.

     The regular expressions listed above are grouped according
     to precedence, from highest precedence at the top to lowest
     at the bottom.  Those grouped together have equal
     precedence.  For example,

         foo|bar*

     is the same as

         (foo)|(ba(r*))

     since the '*' operator has higher precedence than
     concatenation, and concatenation higher than alternation
     ('|').  This pattern therefore matches either the string
     "foo" or the string "ba" followed by zero-or-more r's.  To
     match "foo" or zero-or-more "bar"'s, use:

         foo|(bar)*

     and to match zero-or-more "foo"'s-or-"bar"'s:

         (foo|bar)*

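     The grouping implied by these precedence rules can be
     checked with the POSIX regcomp()/regexec() interface, which
     groups '*', concatenation, and '|' in the same order for
     extended regular expressions.  The sketch assumes a POSIX
     system providing <regex.h>; full_match() is a hypothetical
     helper, and POSIX regular expressions differ from flex
     patterns in other respects.

```c
#include <regex.h>
#include <stdio.h>

/*
 * Return 1 if the whole of `text` matches the POSIX extended
 * regular expression `pattern`, 0 if it does not, and -1 if the
 * pattern is too long or fails to compile.
 */
int full_match(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    /* Wrap as ^(pattern)$ so that only whole-string matches count. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

     With this helper, "foo|bar*" accepts "barrr" but rejects
     "barbar", while "foo|(bar)*" accepts "barbar" and
     "(foo|bar)*" accepts "foobarfoo".
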
     In addition to characters and ranges of characters,
     character classes can also contain character class
     expressions.  These are expressions enclosed inside [: and
     :] delimiters (which themselves must appear between the '['
     and ']' of the character class; other elements may occur
     inside the character class, too).  The valid expressions
     are:

         [:alnum:] [:alpha:] [:blank:]
         [:cntrl:] [:digit:] [:graph:]
         [:lower:] [:print:] [:punct:]
         [:space:] [:upper:] [:xdigit:]

     These expressions all designate a set of characters
     equivalent to the corresponding standard C isXXX function.
     For example, [:alnum:] designates those characters for which
     isalnum() returns true - i.e., any alphabetic or numeric
     character.  Some systems don't provide isblank(), so flex
     defines [:blank:] as a blank or a tab.

     For example, the following character classes are all
     equivalent:

         [[:alnum:]]
         [[:alpha:][:digit:]]
         [[:alpha:]0-9]
         [a-zA-Z0-9]

     If your scanner is case-insensitive (the -i flag), then
     [:upper:] and [:lower:] are equivalent to [:alpha:].

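     The equivalences among the four character classes listed
     above can be checked against the C library directly.  The
     sketch below assumes the default "C" locale and an
     ASCII-compatible character set; the function names are
     hypothetical, and in_blank() follows the blank-or-tab
     definition given above rather than calling isblank().

```c
#include <ctype.h>

/* Each function expects c in the range 0..UCHAR_MAX. */

/* [[:alnum:]] : the set accepted by isalnum() */
int in_alnum(int c)
{
    return isalnum(c) != 0;
}

/* [[:alpha:][:digit:]] : union of two class expressions */
int in_alpha_or_digit(int c)
{
    return isalpha(c) != 0 || isdigit(c) != 0;
}

/* [a-zA-Z0-9] : explicit ranges (assumes ASCII ordering) */
int in_explicit_ranges(int c)
{
    return (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9');
}

/* [[:blank:]] as this manual defines it: a blank or a tab */
int in_blank(int c)
{
    return c == ' ' || c == '\t';
}
```

     Under those assumptions the first three functions agree for
     every 7-bit character code.
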
     Some notes on patterns:

     -    A negated character class such as the example "[^A-Z]"
          above will match a newline unless "\n" (or an
          equivalent escape sequence) is one of the characters
          explicitly present in the negated character class
          (e.g., "[^A-Z\n]").  This is unlike how many other
          regular expression tools treat negated character
          classes, but unfortunately the inconsistency is
          historically entrenched.  Matching newlines means that
          a pattern like [^"]* can match the entire input unless
          there's another quote in the input.

     -    A rule can have at most one instance of trailing
          context (the '/' operator or the '$' operator).  The
          start condition, '^', and "<<EOF>>" patterns can only
          occur at the beginning of a pattern and, like '/' and
          '$', cannot be grouped inside parentheses.  A '^'
          which does not occur at the beginning of a rule or a
          '$' which does not occur at the end of a rule loses its
          special properties and is treated as a normal
          character.

          The following are illegal:

              foo/bar$
              <sc1>foo<sc2>bar

          Note that the first of these can be written
          "foo/bar\n".

          The following will result in '$' or '^' being treated
          as a normal character:

              foo|(bar$)
              foo|^bar

          If what's wanted is a "foo" or a
          bar-followed-by-a-newline, the following could be used
          (the special '|' action is explained below):

              foo      |
              bar$     /* action goes here */

          A similar trick will work for matching a foo or a
          bar-at-the-beginning-of-a-line.

HOW THE INPUT IS MATCHED
     When the generated scanner is run, it analyzes its input
     looking for strings which match any of its patterns.  If it
     finds more than one match, it takes the one matching the
     most text (for trailing context rules, this includes the
     length of the trailing part, even though it will then be
     returned to the input).  If it finds two or more matches of
     the same length, the rule listed first in the flex input
     file is chosen.

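     The two disambiguation rules above (longest match first,
     then earliest rule) can be sketched as follows.  pick_rule()
     is a hypothetical helper, not part of flex: it assumes the
     caller has already computed how much text each rule could
     match at the current position, with 0 standing for "this
     rule does not match" as a simplification.

```c
#include <stddef.h>

/*
 * Hypothetical sketch of rule selection.  match_len[i] is the
 * number of characters rule i could match at the current input
 * position (0 = no match); rules are indexed in the order they
 * appear in the input file.  Returns the index of the chosen
 * rule, or -1 if none matches and the default rule would apply.
 */
int pick_rule(const size_t match_len[], int num_rules)
{
    int best = -1;
    size_t best_len = 0;

    for (int i = 0; i < num_rules; ++i) {
        /* Strictly greater: on a tie the earlier rule is kept. */
        if (match_len[i] > best_len) {
            best = i;
            best_len = match_len[i];
        }
    }
    return best;
}
```

     For the toy Pascal scanner shown earlier, the input "ifx"
     gives the keyword rule a match of length 2 and the {ID} rule
     a match of length 3, so {ID} is chosen; for "if" alone both
     match 2 characters and the keyword rule wins because it is
     listed first.
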
     Once the match is determined, the text corresponding to the
     match (called the token) is made available in the global
     character pointer yytext, and its length in the global
     integer yyleng.  The action corresponding to the matched
     pattern is then executed (a more detailed description of
     actions follows), and then the remaining input is scanned
     for another match.

     If no match is found, then the default rule is executed: the
     next character in the input is considered matched and copied
     to the standard output.  Thus, the simplest legal flex input
     is:

         %%

     which generates a scanner that simply copies its input (one
     character at a time) to its output.

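     In C terms, the default rule behaves like a loop that echoes
     each character it reads.  copy_unmatched() below is a
     hypothetical stand-in for such a scanner, written against
     <stdio.h> streams; it is a sketch of the observable behavior
     only, and a generated scanner normally buffers its input
     rather than reading it one character at a time.

```c
#include <stdio.h>

/*
 * Hypothetical sketch of a scanner with no rules: every input
 * character falls through to the default rule and is copied to
 * the output.  Returns the number of characters copied.
 */
long copy_unmatched(FILE *in, FILE *out)
{
    long copied = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        putc(c, out);   /* default rule: echo the character */
        ++copied;
    }
    return copied;
}
```

     Calling copy_unmatched( stdin, stdout ) reproduces the
     input unchanged on the output.
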
     Note that yytext can be defined in two different ways:
     either as a character pointer or as a character array.  You
     can control which definition flex uses by including one of
     the special directives %pointer or %array in the first
     (definitions) section of your flex input.  The default is
     %pointer, unless you use the -l lex compatibility option, in
     which case yytext will be an array.  The advantage of using
     %pointer is substantially faster scanning and no buffer
     overflow when matching very large tokens (unless you run out
     of dynamic memory).  The disadvantage is that you are
     restricted in how your actions can modify yytext (see the
     next section), and calls to the unput() function destroy
     the present contents of yytext, which can be a considerable
     porting headache when moving between different lex versions.

     The advantage of %array is that you can then modify yytext
     to your heart's content, and calls to unput() do not destroy
     yytext (see below).  Furthermore, existing lex programs
     sometimes access yytext externally using declarations of the
     form:

         extern char yytext[];

     This definition is erroneous when used with %pointer, but
     correct for %array.

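     The declaration matters because an array and a pointer are
     distinct object types in C, so a declaration in another file
     has to agree with the definition.  The sketch below uses
     hypothetical names (token_array, token_pointer) and an
     arbitrary size of 8192; neither is taken from flex's own
     definitions.

```c
#include <stddef.h>

/* %array style: the named object is the character storage itself. */
char token_array[8192];

/* %pointer style: the named object only holds an address. */
char *token_pointer = token_array;

/* Size of the array object: the full storage. */
size_t array_object_size(void)
{
    return sizeof token_array;
}

/* Size of the pointer object: one address, whatever it points to. */
size_t pointer_object_size(void)
{
    return sizeof token_pointer;
}
```

     Code that declares "extern char yytext[];" treats the symbol
     as the storage itself, so if the definition is really a
     pointer it would typically end up reading the bytes of the
     pointer as if they were token text.
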
     %array defines yytext to be an array of YYLMAX characters,
     which defaults to a fairly large value.  You can change the
     size by simply #define'ing YYLMAX to a different value in
     the first section of your flex input.  As mentioned above,
     with %pointer yytext grows dynamically to accommodate large
     tokens.  While this means your %pointer scanner can
     accommodate very large tokens (such as matching entire
     blocks of comments), bear in mind that each time the scanner
     must resize yytext it also must rescan the entire token from
     the beginning, so matching such tokens can prove slow.
     yytext presently does not dynamically grow if a call to
     unput() results in too much text being pushed back; instead,
     a run-time error results.

     Also note that you cannot use %array with C++ scanner
     classes (the c++ option; see below).

ACTIONS
 | 
						|
     Each pattern in a rule has a corresponding action, which can
 | 
						|
     be any arbitrary C statement.  The pattern ends at the first
 | 
						|
     non-escaped whitespace character; the remainder of the  line
 | 
						|
     is  its  action.  If the action is empty, then when the pat-
 | 
						|
     tern is matched the input token is  simply  discarded.   For
 | 
						|
     example,  here  is  the  specification  for  a program which
 | 
						|
     deletes all occurrences of "zap me" from its input:
 | 
						|
 | 
						|
         %%
 | 
						|
         "zap me"
 | 
						|
 | 
						|
     (It will copy all other characters in the input to the  out-
 | 
						|
     put since they will be matched by the default rule.)
 | 
						|
 | 
						|
     Here is a program which compresses multiple blanks and  tabs
 | 
						|
     down  to a single blank, and throws away whitespace found at
 | 
						|
     the end of a line:
 | 
						|
 | 
						|
         %%
 | 
						|
         [ \t]+        putchar( ' ' );
 | 
						|
         [ \t]+$       /* ignore this token */
 | 
						|
 | 
						|
 | 
						|
     If the action contains a '{', then the action spans till the
 | 
						|
     balancing  '}'  is  found, and the action may cross multiple
 | 
						|
     lines.  flex knows about C strings and comments and won't be
 | 
						|
     fooled  by braces found within them, but also allows actions
 | 
						|
     to begin with %{ and will consider the action to be all  the
 | 
						|
     text up to the next %} (regardless of ordinary braces inside
 | 
						|
     the action).
 | 
						|
 | 
						|
     An action consisting solely of a vertical  bar  ('|')  means
 | 
						|
     "same  as  the  action for the next rule."  See below for an
 | 
						|
     illustration.
 | 
						|
 | 
						|
     Actions can  include  arbitrary  C  code,  including  return
 | 
						|
     statements  to  return  a  value  to whatever routine called
 | 
						|
     yylex(). Each time yylex() is called it continues processing
 | 
						|
     tokens  from  where it last left off until it either reaches
 | 
						|
     the end of the file or executes a return.
 | 
						|
 | 
						|
     Actions are free to modify yytext except for lengthening  it
 | 
						|
     (adding  characters  to  its end--these will overwrite later
 | 
						|
     characters in the input  stream).   This  however  does  not
 | 
						|
     apply  when  using  %array (see above); in that case, yytext
 | 
						|
     may be freely modified in any way.
 | 
						|
 | 
						|
     Actions are free to modify yyleng except they should not  do
 | 
						|
     so if the action also includes use of yymore() (see below).
 | 
						|
 | 
						|
     There are a  number  of  special  directives  which  can  be
 | 
						|
     included within an action:
 | 
						|
 | 
						|
     -    ECHO copies yytext to the scanner's output.
 | 
						|
 | 
						|
     -    BEGIN followed by the name of a start condition  places
 | 
						|
          the  scanner  in the corresponding start condition (see
 | 
						|
          below).
 | 
						|
 | 
						|
     -    REJECT directs the scanner to proceed on to the "second
 | 
						|
          best"  rule which matched the input (or a prefix of the
 | 
						|
          input).  The rule is chosen as described above in  "How
 | 
						|
          the  Input  is  Matched",  and yytext and yyleng set up
 | 
						|
          appropriately.  It may either be one which  matched  as
 | 
						|
          much  text as the originally chosen rule but came later
 | 
						|
          in the flex input file, or one which matched less text.
 | 
						|
          For example, the following will both count the words in
 | 
						|
          the input  and  call  the  routine  special()  whenever
 | 
						|
          "frob" is seen:
 | 
						|
 | 
						|
                      int word_count = 0;
 | 
						|
              %%
 | 
						|
 | 
						|
              frob        special(); REJECT;
 | 
						|
              [^ \t\n]+   ++word_count;
 | 
						|
 | 
						|
          Without the REJECT, any "frob"'s in the input would not
 | 
						|
          be  counted  as  words, since the scanner normally exe-
 | 
						|
          cutes only one action per token.  Multiple REJECT's are
 | 
						|
          allowed,  each  one finding the next best choice to the
 | 
						|
          currently active rule.  For example, when the following
 | 
						|
          scanner  scans the token "abcd", it will write "abcdab-
 | 
						|
          caba" to the output:
 | 
						|
 | 
						|
              %%
 | 
						|
              a        |
 | 
						|
              ab       |
 | 
						|
              abc      |
 | 
						|
              abcd     ECHO; REJECT;
 | 
						|
              .|\n     /* eat up any unmatched character */
 | 
						|
 | 
						|
          (The first three rules share the fourth's action  since
 | 
						|
          they   use   the  special  '|'  action.)  REJECT  is  a
 | 
						|
          particularly expensive feature in terms of scanner per-
 | 
						|
          formance; if it is used in any of the scanner's actions
 | 
						|
          it will  slow  down  all  of  the  scanner's  matching.
 | 
						|
          Furthermore,  REJECT cannot be used with the -Cf or -CF
 | 
						|
          options (see below).
 | 
						|
 | 
						|
          Note also that unlike the other special actions, REJECT
 | 
						|
          is  a  branch;  code  immediately  following  it in the
 | 
						|
          action will not be executed.
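
     The "abcd" example above can be modeled in plain C.  The
     sketch below is not flex code and says nothing about how
     flex implements REJECT; simulate_reject is a hypothetical
     helper that tries the four prefix rules from longest to
     shortest, echoing each match, and then lets the catch-all
     rule consume one character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the scanner above: the rules "a", "ab",
 * "abc" and "abcd" all run ECHO; REJECT, and ".|\n" silently eats
 * one character.  Appends what ECHO would print to out, which needs
 * room for 10 * strlen(input) + 1 bytes in the worst case.
 */
static void simulate_reject(const char *input, char *out)
{
    /* Longest rule first: REJECT falls back to the next best match. */
    static const char *const rules[] = { "abcd", "abc", "ab", "a" };
    size_t pos, used = 0;

    assert(input != NULL && out != NULL);

    for (pos = 0; input[pos] != '\0'; ++pos) {
        size_t r;

        for (r = 0; r < sizeof rules / sizeof rules[0]; ++r) {
            size_t len = strlen(rules[r]);

            if (strncmp(input + pos, rules[r], len) == 0) {
                memcpy(out + used, rules[r], len);   /* ECHO */
                used += len;
                /* REJECT: keep looping to try the next rule */
            }
        }
        /* every prefix rule has rejected; ".|\n" eats one character
         * (the ++pos of the enclosing loop)
         */
    }
    out[used] = '\0';
}
```

     Calling simulate_reject("abcd", out) leaves "abcdabcaba" in
     out, matching the output described above.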
 | 
						|
 | 
						|
     -    yymore() tells  the  scanner  that  the  next  time  it
 | 
						|
          matches  a  rule,  the  corresponding  token  should be
 | 
						|
          appended onto the current value of yytext  rather  than
 | 
						|
          replacing  it.   For  example,  given  the input "mega-
 | 
						|
          kludge" the following will write "mega-mega-kludge"  to
 | 
						|
          the output:
 | 
						|
 | 
						|
              %%
 | 
						|
              mega-    ECHO; yymore();
 | 
						|
              kludge   ECHO;
 | 
						|
 | 
						|
          First "mega-" is matched  and  echoed  to  the  output.
 | 
						|
          Then  "kludge"  is matched, but the previous "mega-" is
 | 
						|
          still hanging around at the beginning of yytext so  the
 | 
						|
          ECHO  for  the "kludge" rule will actually write "mega-
 | 
						|
          kludge".
 | 
						|
 | 
						|
     Two notes regarding use of yymore(). First, yymore() depends
 | 
						|
     on  the value of yyleng correctly reflecting the size of the
 | 
						|
     current token, so you must not  modify  yyleng  if  you  are
 | 
						|
     using  yymore().  Second,  the  presence  of yymore() in the
 | 
						|
     scanner's action entails a minor performance penalty in  the
 | 
						|
     scanner's matching speed.
 | 
						|
 | 
						|
     -    yyless(n) returns all but the first n characters of the
 | 
						|
          current token back to the input stream, where they will
 | 
						|
          be rescanned when the scanner looks for the next match.
 | 
						|
          yytext  and  yyleng  are  adjusted appropriately (e.g.,
 | 
						|
          yyleng will now be equal to n).  For example, on the
 | 
						|
          input  "foobar"  the  following will write out "foobar-
 | 
						|
          bar":
 | 
						|
 | 
						|
              %%
 | 
						|
              foobar    ECHO; yyless(3);
 | 
						|
              [a-z]+    ECHO;
 | 
						|
 | 
						|
          An argument of  0  to  yyless  will  cause  the  entire
 | 
						|
          current  input  string  to  be  scanned  again.  Unless
 | 
						|
          you've changed how the scanner will  subsequently  pro-
 | 
						|
          cess  its  input  (using BEGIN, for example), this will
 | 
						|
          result in an endless loop.
 | 
						|
 | 
						|
     Note that yyless is a macro and can only be used in the flex
 | 
						|
     input file, not from other source files.
 | 
						|
 | 
						|
     -    unput(c) puts the  character  c  back  onto  the  input
 | 
						|
          stream.   It  will  be the next character scanned.  The
 | 
						|
          following action will take the current token and  cause
 | 
						|
          it to be rescanned enclosed in parentheses.
 | 
						|
 | 
						|
              {
 | 
						|
              int i;
 | 
						|
              /* Copy yytext because unput() trashes yytext */
 | 
						|
              char *yycopy = strdup( yytext );
 | 
						|
              unput( ')' );
 | 
						|
              for ( i = yyleng - 1; i >= 0; --i )
 | 
						|
                  unput( yycopy[i] );
 | 
						|
              unput( '(' );
 | 
						|
              free( yycopy );
 | 
						|
              }
 | 
						|
 | 
						|
          Note that since each unput() puts the  given  character
 | 
						|
          back at the beginning of the input stream, pushing back
 | 
						|
          strings must be done back-to-front.
 | 
						|
 | 
						|
     An important potential problem when using unput() is that if
 | 
						|
     you are using %pointer (the default), a call to unput() des-
 | 
						|
     troys the contents of yytext, starting  with  its  rightmost
 | 
						|
     character  and devouring one character to the left with each
 | 
						|
     call.  If you need the value of  yytext  preserved  after  a
 | 
						|
     call  to  unput() (as in the above example), you must either
 | 
						|
     first copy it elsewhere, or build your scanner using  %array
 | 
						|
     instead (see How The Input Is Matched).
 | 
						|
 | 
						|
     Finally, note that you cannot put back  EOF  to  attempt  to
 | 
						|
     mark the input stream with an end-of-file.
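
     Why strings must be pushed back-to-front can be seen in a
     small stand-alone model.  The sketch below does not touch
     the scanner's real buffer; struct pushback, model_unput,
     model_input and unput_parenthesized are hypothetical names
     that only imitate the last-in, first-out behavior of
     unput() described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PUSHBACK_MAX 128

/* Hypothetical stand-in for the scanner's input: characters pushed
 * back are read again in last-in, first-out order.
 */
struct pushback {
    char   data[PUSHBACK_MAX];
    size_t count;
};

/* Model of unput(c): c becomes the next character to be read. */
static void model_unput(struct pushback *pb, char c)
{
    assert(pb != NULL && pb->count < PUSHBACK_MAX);
    pb->data[pb->count++] = c;
}

/* Model of input(): returns the next character, or -1 when empty. */
static int model_input(struct pushback *pb)
{
    assert(pb != NULL);
    return pb->count > 0 ? (unsigned char) pb->data[--pb->count] : -1;
}

/* Push back "(" + text + ")" so that it is re-read in that order.
 * The characters are pushed last-to-first.
 */
static void unput_parenthesized(struct pushback *pb, const char *text)
{
    size_t i = strlen(text);

    model_unput(pb, ')');
    while (i > 0)
        model_unput(pb, text[--i]);
    model_unput(pb, '(');
}
```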
 | 
						|
 | 
						|
     -    input() reads the next character from the input stream.
 | 
						|
          For  example, the following is one way to eat up C com-
 | 
						|
          ments:
 | 
						|
 | 
						|
              %%
 | 
						|
              "/*"        {
 | 
						|
                          register int c;
 | 
						|
 | 
						|
                          for ( ; ; )
 | 
						|
                              {
 | 
						|
                              while ( (c = input()) != '*' &&
 | 
						|
                                      c != EOF )
 | 
						|
                                  ;    /* eat up text of comment */
 | 
						|
 | 
						|
                              if ( c == '*' )
 | 
						|
                                  {
 | 
						|
                                  while ( (c = input()) == '*' )
 | 
						|
                                      ;
 | 
						|
                                  if ( c == '/' )
 | 
						|
                                      break;    /* found the end */
 | 
						|
                                  }
 | 
						|
 | 
						|
                              if ( c == EOF )
 | 
						|
                                  {
 | 
						|
                                  error( "EOF in comment" );
 | 
						|
                                  break;
 | 
						|
                                  }
 | 
						|
                              }
 | 
						|
                          }
 | 
						|
 | 
						|
          (Note that if the scanner is compiled using  C++,  then
 | 
						|
          input()  is  instead referred to as yyinput(), in order
 | 
						|
          to avoid a name clash with the C++ stream by  the  name
 | 
						|
          of input.)
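
     The same loop can be tried outside a scanner.  In the
     sketch below, next_char is a hypothetical stand-in for
     input() that reads from a string, and skip_c_comment
     reports the end-of-file case through its return value
     rather than by calling error(); both names are assumptions
     made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for input(): returns the next character of
 * a NUL-terminated string, or EOF at its end.
 */
static int next_char(const char **cursor)
{
    return **cursor != '\0' ? (unsigned char) *(*cursor)++ : EOF;
}

/* Skips the body of a C comment whose opening delimiter has already
 * been consumed.  Returns 0 once the closing delimiter has been
 * read, or -1 if the input ends first.
 */
static int skip_c_comment(const char **cursor)
{
    int c;

    assert(cursor != NULL && *cursor != NULL);

    for ( ; ; ) {
        /* eat up text of comment */
        while ((c = next_char(cursor)) != '*' && c != EOF)
            ;

        if (c == '*') {
            /* a run of '*'s may precede the closing '/' */
            while ((c = next_char(cursor)) == '*')
                ;
            if (c == '/')
                return 0;    /* found the end */
        }

        if (c == EOF)
            return -1;       /* EOF in comment */
    }
}
```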
 | 
						|
 | 
						|
     -    YY_FLUSH_BUFFER flushes the scanner's  internal  buffer
 | 
						|
          so  that  the next time the scanner attempts to match a
 | 
						|
          token, it will first refill the buffer  using  YY_INPUT
 | 
						|
          (see  The  Generated Scanner, below).  This action is a
 | 
						|
          special case  of  the  more  general  yy_flush_buffer()
 | 
						|
          function, described below in the section Multiple Input
 | 
						|
          Buffers.
 | 
						|
 | 
						|
     -    yyterminate() can be used in lieu of a return statement
 | 
						|
          in  an action.  It terminates the scanner and returns a
 | 
						|
          0 to the scanner's caller, indicating "all  done".   By
 | 
						|
          default,  yyterminate()  is also called when an end-of-
 | 
						|
          file is encountered.  It is a macro and  may  be  rede-
 | 
						|
          fined.
 | 
						|
 | 
						|
THE GENERATED SCANNER
 | 
						|
     The output of flex is the file lex.yy.c, which contains  the
 | 
						|
     scanning  routine yylex(), a number of tables used by it for
 | 
						|
     matching tokens, and a number of auxiliary routines and mac-
 | 
						|
     ros.  By default, yylex() is declared as follows:
 | 
						|
 | 
						|
         int yylex()
 | 
						|
             {
 | 
						|
             ... various definitions and the actions in here ...
 | 
						|
             }
 | 
						|
 | 
						|
     (If your environment supports function prototypes,  then  it
 | 
						|
     will  be  "int  yylex(  void  )".)   This  definition may be
 | 
						|
     changed by defining the "YY_DECL" macro.  For  example,  you
 | 
						|
     could use:
 | 
						|
 | 
						|
         #define YY_DECL float lexscan( a, b ) float a, b;
 | 
						|
 | 
						|
     to give the scanning routine the name lexscan,  returning  a
 | 
						|
     float, and taking two floats as arguments.  Note that if you
 | 
						|
     give  arguments  to  the  scanning  routine  using  a   K&R-
 | 
						|
     style/non-prototyped  function  declaration,  you  must ter-
 | 
						|
     minate the definition with a semi-colon (;).
 | 
						|
 | 
						|
     Whenever yylex() is called, it scans tokens from the  global
 | 
						|
     input  file  yyin  (which  defaults to stdin).  It continues
 | 
						|
     until it either reaches an end-of-file (at  which  point  it
 | 
						|
     returns the value 0) or one of its actions executes a return
 | 
						|
     statement.
 | 
						|
 | 
						|
     If the scanner reaches an end-of-file, subsequent calls  are
 | 
						|
     undefined  unless either yyin is pointed at a new input file
 | 
						|
     (in which case scanning continues from that file), or yyres-
 | 
						|
     tart()  is called.  yyrestart() takes one argument, a FILE *
 | 
						|
     pointer (which can be NULL, if you've set up YY_INPUT to scan
 | 
						|
     from  a  source  other  than yyin), and initializes yyin for
 | 
						|
     scanning from that file.  Essentially there is no difference
 | 
						|
     between  just  assigning  yyin  to a new input file or using
 | 
						|
     yyrestart() to do so; the latter is available  for  compati-
 | 
						|
     bility with previous versions of flex, and because it can be
 | 
						|
     used to switch input files in the middle  of  scanning.   It
 | 
						|
     can  also be used to throw away the current input buffer, by
 | 
						|
     calling it with an argument of yyin; but better  is  to  use
 | 
						|
     YY_FLUSH_BUFFER (see above).  Note that yyrestart() does not
 | 
						|
     reset the start condition to INITIAL (see Start  Conditions,
 | 
						|
     below).
 | 
						|
 | 
						|
     If yylex() stops scanning due to executing a  return  state-
 | 
						|
     ment  in  one of the actions, the scanner may then be called
 | 
						|
     again and it will resume scanning where it left off.
 | 
						|
 | 
						|
     By default (and for purposes  of  efficiency),  the  scanner
 | 
						|
     uses  block-reads  rather  than  simple getc() calls to read
 | 
						|
     characters from yyin. The nature of how it  gets  its  input
 | 
						|
     can   be   controlled   by   defining  the  YY_INPUT  macro.
 | 
						|
     YY_INPUT's           calling           sequence           is
 | 
						|
     "YY_INPUT(buf,result,max_size)".   Its action is to place up
 | 
						|
     to max_size characters in the character array buf and return
 | 
						|
     in  the integer variable result either the number of charac-
 | 
						|
     ters read or the constant YY_NULL (0  on  Unix  systems)  to
 | 
						|
     indicate  EOF.   The  default YY_INPUT reads from the global
 | 
						|
     file-pointer "yyin".
 | 
						|
 | 
						|
     A sample definition of YY_INPUT (in the definitions  section
 | 
						|
     of the input file):
 | 
						|
 | 
						|
         %{
 | 
						|
         #define YY_INPUT(buf,result,max_size) \
 | 
						|
             { \
 | 
						|
             int c = getchar(); \
 | 
						|
             result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
 | 
						|
             }
 | 
						|
         %}
 | 
						|
 | 
						|
     This definition will change the input  processing  to  occur
 | 
						|
     one character at a time.
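
     The YY_INPUT contract (store up to max_size characters,
     report how many were stored, report 0 at end of input) can
     also be met by an ordinary function.  The following is a
     minimal sketch that reads from memory; struct mem_source
     and mem_read are hypothetical names, and the use of int
     for the count follows the "integer variable result"
     wording above rather than any particular flex release.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory source; not part of flex. */
struct mem_source {
    const char *data;   /* bytes still to be delivered */
    size_t      left;   /* number of bytes remaining   */
};

/* Follows the YY_INPUT(buf,result,max_size) contract as a function:
 * copies at most max_size characters into buf and returns how many
 * were stored, or 0 (the Unix value of YY_NULL) at end of input.
 */
static int mem_read(struct mem_source *src, char *buf, int max_size)
{
    size_t n;

    assert(src != NULL && buf != NULL && max_size >= 0);

    n = src->left < (size_t) max_size ? src->left : (size_t) max_size;
    memcpy(buf, src->data, n);
    src->data += n;
    src->left -= n;
    return (int) n;
}
```

     A YY_INPUT definition could then assign the return value
     of such a function to result; how the source object is
     made visible to the macro is left to the scanner's author.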
 | 
						|
 | 
						|
     When the scanner receives  an  end-of-file  indication  from
 | 
						|
     YY_INPUT, it then checks the yywrap() function.  If yywrap()
 | 
						|
     returns false (zero), then it is assumed that  the  function
 | 
						|
     has  gone  ahead  and  set up yyin to point to another input
 | 
						|
     file, and scanning continues.   If  it  returns  true  (non-
 | 
						|
     zero),  then  the  scanner  terminates,  returning  0 to its
 | 
						|
     caller.  Note that  in  either  case,  the  start  condition
 | 
						|
     remains unchanged; it does not revert to INITIAL.
 | 
						|
 | 
						|
     If you do not supply your own version of yywrap(), then  you
 | 
						|
     must  either use %option noyywrap (in which case the scanner
 | 
						|
     behaves as though yywrap() returned 1),  or  you  must  link
 | 
						|
     with  -lfl  to  obtain  the  default version of the routine,
 | 
						|
     which always returns 1.
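
     A user-supplied yywrap() that moves on to the next input
     file might look roughly like the sketch below.  The
     pending_files list is a hypothetical variable that the
     caller is assumed to set up, and yyin is defined here only
     so that the fragment compiles on its own; in a real
     scanner it comes from the generated lex.yy.c.

```c
#include <assert.h>
#include <stdio.h>

/* Defined by lex.yy.c in a real scanner; defined here only so that
 * this sketch is self-contained.
 */
FILE *yyin;

/* Hypothetical NULL-terminated list of files still to be scanned.
 * The caller must set it before the first end-of-file is reached.
 */
static const char *const *pending_files;

/* Returns 0 after pointing yyin at the next file that can be
 * opened, or 1 when none remain, which makes the scanner terminate.
 */
int yywrap(void)
{
    assert(pending_files != NULL);

    if (yyin != NULL && yyin != stdin)
        fclose(yyin);
    yyin = NULL;

    while (*pending_files != NULL) {
        yyin = fopen(*pending_files++, "r");
        if (yyin != NULL)
            return 0;        /* scanning continues from the new file */
    }
    return 1;                /* all done */
}
```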
 | 
						|
 | 
						|
     Three routines are available  for  scanning  from  in-memory
 | 
						|
     buffers     rather     than     files:     yy_scan_string(),
 | 
						|
     yy_scan_bytes(), and yy_scan_buffer(). See the discussion of
 | 
						|
     them below in the section Multiple Input Buffers.
 | 
						|
 | 
						|
     The scanner writes its  ECHO  output  to  the  yyout  global
 | 
						|
     (default, stdout), which may be redefined by the user simply
 | 
						|
     by assigning it to some other FILE pointer.
 | 
						|
 | 
						|
START CONDITIONS
 | 
						|
     flex  provides  a  mechanism  for  conditionally  activating
 | 
						|
     rules.   Any rule whose pattern is prefixed with "<sc>" will
 | 
						|
     only be active when the scanner is in  the  start  condition
 | 
						|
     named "sc".  For example,
 | 
						|
 | 
						|
         <STRING>[^"]*        { /* eat up the string body ... */
 | 
						|
                     ...
 | 
						|
                     }
 | 
						|
 | 
						|
     will be active only when the  scanner  is  in  the  "STRING"
 | 
						|
     start condition, and
 | 
						|
 | 
						|
         <INITIAL,STRING,QUOTE>\.        { /* handle an escape ... */
 | 
						|
                     ...
 | 
						|
                     }
 | 
						|
 | 
						|
     will be active only when  the  current  start  condition  is
 | 
						|
     either "INITIAL", "STRING", or "QUOTE".
 | 
						|
 | 
						|
     Start conditions are declared  in  the  definitions  (first)
 | 
						|
     section  of  the input using unindented lines beginning with
 | 
						|
     either %s or %x followed by a list  of  names.   The  former
 | 
						|
     declares  inclusive  start  conditions, the latter exclusive
 | 
						|
     start conditions.  A start condition is activated using  the
 | 
						|
     BEGIN  action.   Until  the  next  BEGIN action is executed,
 | 
						|
     rules with the given start  condition  will  be  active  and
 | 
						|
     rules  with other start conditions will be inactive.  If the
 | 
						|
     start condition is inclusive, then rules with no start  con-
 | 
						|
     ditions  at  all  will  also be active.  If it is exclusive,
 | 
						|
     then only rules qualified with the start condition  will  be
 | 
						|
     active.   A  set  of  rules contingent on the same exclusive
 | 
						|
     start condition describe a scanner which is  independent  of
 | 
						|
     any  of the other rules in the flex input.  Because of this,
 | 
						|
     exclusive start conditions make it easy  to  specify  "mini-
 | 
						|
     scanners"  which scan portions of the input that are syntac-
 | 
						|
     tically different from the rest (e.g., comments).
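
     The activation rule just described can be restated as a
     small predicate.  This is a model for illustration only,
     not how flex represents rules internally; struct rule and
     rule_is_active are hypothetical names, and the <*>
     specifier is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not flex's implementation.  A rule lists the
 * start conditions it is qualified with; n_scs == 0 means the rule
 * has no <sc> prefix at all.
 */
struct rule {
    const int *scs;
    size_t     n_scs;
};

/* Is the rule active while the scanner is in start condition
 * current?  current_is_exclusive says whether current was declared
 * with %x (INITIAL behaves as an inclusive condition).
 */
static bool rule_is_active(const struct rule *r, int current,
                           bool current_is_exclusive)
{
    size_t i;

    assert(r != NULL);

    if (r->n_scs == 0)
        return !current_is_exclusive;   /* unqualified rule */

    for (i = 0; i < r->n_scs; ++i)
        if (r->scs[i] == current)
            return true;
    return false;
}
```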
 | 
						|
 | 
						|
     If the distinction between  inclusive  and  exclusive  start
 | 
						|
     conditions  is still a little vague, here's a simple example
 | 
						|
     illustrating the connection between the  two.   The  set  of
 | 
						|
     rules:
 | 
						|
 | 
						|
         %s example
 | 
						|
         %%
 | 
						|
 | 
						|
         <example>foo   do_something();
 | 
						|
 | 
						|
         bar            something_else();
 | 
						|
 | 
						|
     is equivalent to
 | 
						|
 | 
						|
         %x example
 | 
						|
         %%
 | 
						|
 | 
						|
         <example>foo   do_something();
 | 
						|
 | 
						|
         <INITIAL,example>bar    something_else();
 | 
						|
 | 
						|
     Without the <INITIAL,example> qualifier, the bar pattern  in
 | 
						|
     the second example wouldn't be active (i.e., couldn't match)
 | 
						|
     when in start condition example. If we just  used  <example>
 | 
						|
     to  qualify  bar,  though,  then  it would only be active in
 | 
						|
     example and not in INITIAL, while in the first example  it's
 | 
						|
     active  in  both,  because  in the first example the example
 | 
						|
     start condition is an inclusive (%s) start condition.
 | 
						|
 | 
						|
     Also note that the  special  start-condition  specifier  <*>
 | 
						|
     matches  every  start  condition.   Thus,  the above example
 | 
						|
     could also have been written:
 | 
						|
 | 
						|
         %x example
 | 
						|
         %%
 | 
						|
 | 
						|
         <example>foo   do_something();
 | 
						|
 | 
						|
         <*>bar    something_else();
 | 
						|
 | 
						|
 | 
						|
     The default rule (to ECHO any unmatched  character)  remains
 | 
						|
     active in start conditions.  It is equivalent to:
 | 
						|
 | 
						|
         <*>.|\n     ECHO;
 | 
						|
 | 
						|
 | 
						|
     BEGIN(0) returns to the original state where only the  rules
 | 
						|
     with no start conditions are active.  This state can also be
 | 
						|
     referred   to   as   the   start-condition   "INITIAL",   so
 | 
						|
     BEGIN(INITIAL)  is  equivalent to BEGIN(0). (The parentheses
 | 
						|
     around the start condition name are  not  required  but  are
 | 
						|
     considered good style.)
 | 
						|
 | 
						|
     BEGIN actions can also be given  as  indented  code  at  the
 | 
						|
     beginning  of the rules section.  For example, the following
 | 
						|
     will cause the scanner to enter the "SPECIAL"  start  condi-
 | 
						|
     tion  whenever  yylex()  is  called  and the global variable
 | 
						|
     enter_special is true:
 | 
						|
 | 
						|
                 int enter_special;
 | 
						|
 | 
						|
         %x SPECIAL
 | 
						|
         %%
 | 
						|
                 if ( enter_special )
 | 
						|
                     BEGIN(SPECIAL);
 | 
						|
 | 
						|
         <SPECIAL>blahblahblah
 | 
						|
         ...more rules follow...
 | 
						|
 | 
						|
 | 
						|
     To illustrate the  uses  of  start  conditions,  here  is  a
 | 
						|
     scanner  which  provides  two different interpretations of a
 | 
						|
     string like "123.456".  By default it will treat it as three
 | 
						|
     tokens,  the  integer  "123",  a  dot ('.'), and the integer
 | 
						|
     "456".  But if the string is preceded earlier in the line by
 | 
						|
     the  string  "expect-floats"  it  will  treat it as a single
 | 
						|
     token, the floating-point number 123.456:
 | 
						|
 | 
						|
         %{
 | 
						|
         #include <math.h>
 | 
						|
         %}
 | 
						|
         %s expect
 | 
						|
 | 
						|
         %%
 | 
						|
         expect-floats        BEGIN(expect);
 | 
						|
 | 
						|
         <expect>[0-9]+"."[0-9]+      {
 | 
						|
                     printf( "found a float, = %f\n",
 | 
						|
                             atof( yytext ) );
 | 
						|
                     }
 | 
						|
         <expect>\n           {
 | 
						|
                     /* that's the end of the line, so
 | 
						|
                      * we need another "expect-floats"
 | 
						|
                      * before we'll recognize any more
 | 
						|
                      * numbers
 | 
						|
                      */
 | 
						|
                     BEGIN(INITIAL);
 | 
						|
                     }
 | 
						|
 | 
						|
         [0-9]+      {
 | 
						|
                     printf( "found an integer, = %d\n",
 | 
						|
                             atoi( yytext ) );
 | 
						|
                     }
 | 
						|
 | 
						|
         "."         printf( "found a dot\n" );
 | 
						|
 | 
						|
     Here is a scanner which recognizes (and discards) C comments
 | 
						|
     while maintaining a count of the current input line.
 | 
						|
 | 
						|
         %x comment
 | 
						|
         %%
 | 
						|
                 int line_num = 1;
 | 
						|
 | 
						|
         "/*"         BEGIN(comment);
 | 
						|
 | 
						|
         <comment>[^*\n]*        /* eat anything that's not a '*' */
 | 
						|
         <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
 | 
						|
         <comment>\n             ++line_num;
 | 
						|
         <comment>"*"+"/"        BEGIN(INITIAL);
 | 
						|
 | 
						|
     This scanner goes to a bit of trouble to match as much  text
 | 
						|
     as  possible with each rule.  In general, when attempting to
 | 
						|
     write a high-speed scanner try to match as much as possible in
 | 
						|
     each rule, as it's a big win.
 | 
						|
 | 
						|
     Note that start-condition names are really integer values
 | 
						|
     and  can  be  stored  as  such.   Thus,  the  above could be
 | 
						|
     extended in the following fashion:
 | 
						|
 | 
						|
         %x comment foo
 | 
						|
         %%
 | 
						|
                 int line_num = 1;
 | 
						|
                 int comment_caller;
 | 
						|
 | 
						|
         "/*"         {
 | 
						|
                      comment_caller = INITIAL;
 | 
						|
                      BEGIN(comment);
 | 
						|
                      }
 | 
						|
 | 
						|
         ...
 | 
						|
 | 
						|
         <foo>"/*"    {
 | 
						|
                      comment_caller = foo;
 | 
						|
                      BEGIN(comment);
 | 
						|
                      }
 | 
						|
 | 
						|
         <comment>[^*\n]*        /* eat anything that's not a '*' */
 | 
						|
         <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
 | 
						|
         <comment>\n             ++line_num;
 | 
						|
         <comment>"*"+"/"        BEGIN(comment_caller);
 | 
						|
 | 
						|
     Furthermore, you can  access  the  current  start  condition
 | 
						|
     using  the  integer-valued YY_START macro.  For example, the
 | 
						|
     above assignments to comment_caller could instead be written:
 | 
						|
 | 
						|
         comment_caller = YY_START;
 | 
						|
 | 
						|
     Flex provides YYSTATE as an alias for YY_START  (since  that
 | 
						|
     is what's used by AT&T lex).
 | 
						|
 | 
						|
     Note that start conditions do not have their own name-space;
 | 
						|
     %s's   and  %x's  declare  names  in  the  same  fashion  as
 | 
						|
     #define's.
 | 
						|
 | 
						|
     Finally, here's an example of how to  match  C-style  quoted
 | 
						|
     strings using exclusive start conditions, including expanded
 | 
						|
     escape sequences (but not including checking  for  a  string
 | 
						|
     that's too long):
 | 
						|
 | 
						|
         %x str
 | 
						|
 | 
						|
         %%
 | 
						|
                 char string_buf[MAX_STR_CONST];
 | 
						|
                 char *string_buf_ptr;
 | 
						|
 | 
						|
 | 
						|
         \"      string_buf_ptr = string_buf; BEGIN(str);
 | 
						|
 | 
						|
         <str>\"        { /* saw closing quote - all done */
 | 
						|
                 BEGIN(INITIAL);
 | 
						|
                 *string_buf_ptr = '\0';
 | 
						|
                 /* return string constant token type and
 | 
						|
                  * value to parser
 | 
						|
                  */
 | 
						|
                 }
 | 
						|
 | 
						|
         <str>\n        {
 | 
						|
                 /* error - unterminated string constant */
 | 
						|
                 /* generate error message */
 | 
						|
                 }
 | 
						|
 | 
						|
         <str>\\[0-7]{1,3} {
 | 
						|
                 /* octal escape sequence */
 | 
						|
                 unsigned int result;   /* "%o" stores an unsigned int */
 | 
						|
 | 
						|
                 (void) sscanf( yytext + 1, "%o", &result );
 | 
						|
 | 
						|
                 if ( result > 0xff )
                         {
                         /* error, constant is out-of-bounds */
                         }
 | 
						|
 | 
						|
                 *string_buf_ptr++ = result;
 | 
						|
                 }
 | 
						|
 | 
						|
         <str>\\[0-9]+ {
 | 
						|
                 /* generate error - bad escape sequence; something
 | 
						|
                  * like '\48' or '\0777777'
 | 
						|
                  */
 | 
						|
                 }
 | 
						|
 | 
						|
         <str>\\n  *string_buf_ptr++ = '\n';
 | 
						|
         <str>\\t  *string_buf_ptr++ = '\t';
 | 
						|
         <str>\\r  *string_buf_ptr++ = '\r';
 | 
						|
         <str>\\b  *string_buf_ptr++ = '\b';
 | 
						|
         <str>\\f  *string_buf_ptr++ = '\f';
 | 
						|
 | 
						|
         <str>\\(.|\n)  *string_buf_ptr++ = yytext[1];
 | 
						|
 | 
						|
         <str>[^\\\n\"]+        {
 | 
						|
                 char *yptr = yytext;
 | 
						|
 | 
						|
                 while ( *yptr )
 | 
						|
                         *string_buf_ptr++ = *yptr++;
 | 
						|
                 }
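
     The octal-escape action above can also be written as a small
     stand-alone C helper, which makes the range check easier to
     test.  This is a minimal sketch, not part of flex: the name
     decode_octal_escape and its return convention are assumptions
     made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: decode an octal escape such as "\101"
 * (a backslash followed by one to three octal digits), i.e. the
 * kind of text matched by the <str>\\[0-7]{1,3} rule.
 * Returns 0 and stores the byte in *out on success; returns -1 if
 * the text is malformed or the value does not fit in one byte. */
int decode_octal_escape(const char *text, unsigned char *out)
{
    unsigned int result;    /* %o requires an unsigned int */

    if (text[0] != '\\')
        return -1;
    if (sscanf(text + 1, "%3o", &result) != 1)
        return -1;          /* no octal digit after the backslash */
    if (result > 0xff)
        return -1;          /* constant is out of bounds */

    *out = (unsigned char) result;
    return 0;
}
```

     Inside the rule's action one could then call, for example,
     decode_octal_escape( yytext, &c ) and report an error when it
     returns -1 instead of storing the byte.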
 | 
						|
 | 
						|
 | 
						|
     Often, such as in some of the examples above,  you  wind  up
 | 
						|
     writing  a  whole  bunch  of  rules all preceded by the same
 | 
						|
     start condition(s).  Flex makes this  a  little  easier  and
 | 
						|
     cleaner  by introducing a notion of start condition scope. A
 | 
						|
     start condition scope is begun with:
 | 
						|
 | 
						|
         <SCs>{
 | 
						|
 | 
						|
     where SCs is a list of one or more start conditions.  Inside
 | 
						|
     the  start condition scope, every rule automatically has the
 | 
						|
     prefix <SCs> applied to it, until a '}'  which  matches  the
 | 
						|
     initial '{'. So, for example,
 | 
						|
 | 
						|
         <ESC>{
 | 
						|
             "\\n"   return '\n';
 | 
						|
             "\\r"   return '\r';
 | 
						|
             "\\f"   return '\f';
 | 
						|
             "\\0"   return '\0';
 | 
						|
         }
 | 
						|
 | 
						|
     is equivalent to:
 | 
						|
 | 
						|
         <ESC>"\\n"  return '\n';
 | 
						|
         <ESC>"\\r"  return '\r';
 | 
						|
         <ESC>"\\f"  return '\f';
 | 
						|
         <ESC>"\\0"  return '\0';
 | 
						|
 | 
						|
     Start condition scopes may be nested.
 | 
						|
 | 
						|
     Three routines are  available  for  manipulating  stacks  of
 | 
						|
     start conditions:
 | 
						|
 | 
						|
     void yy_push_state(int new_state)
 | 
						|
          pushes the current start condition onto the top of  the
 | 
						|
          start  condition  stack  and  switches  to new_state as
 | 
						|
          though you had used BEGIN new_state (recall that  start
 | 
						|
          condition names are also integers).
 | 
						|
 | 
						|
     void yy_pop_state()
 | 
						|
          pops the top of the stack and switches to it via BEGIN.
 | 
						|
 | 
						|
     int yy_top_state()
 | 
						|
          returns the top  of  the  stack  without  altering  the
 | 
						|
          stack's contents.
 | 
						|
 | 
						|
     The start condition stack grows dynamically and  so  has  no
 | 
						|
     built-in  size  limitation.  If memory is exhausted, program
 | 
						|
     execution aborts.
 | 
						|
 | 
						|
     To use start condition stacks, your scanner must  include  a
 | 
						|
     %option stack directive (see Options below).
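
     The push/pop/top semantics described above can be modeled in
     plain C.  The sketch below only illustrates the behavior; it
     is not flex's own implementation, and the names sc_stack,
     sc_push, sc_pop and sc_top are hypothetical.  Aborting on an
     empty stack is likewise an assumption of this model.

```c
#include <stdlib.h>

/* Hypothetical model of a start-condition stack. */
struct sc_stack {
    int *items;     /* saved start conditions */
    size_t depth;   /* number of saved entries */
    size_t cap;     /* allocated capacity */
    int current;    /* active start condition (0 models INITIAL) */
};

/* Save the current condition, then switch to new_state. */
void sc_push(struct sc_stack *s, int new_state)
{
    if (s->depth == s->cap) {
        size_t new_cap = s->cap ? s->cap * 2 : 8;
        int *p = realloc(s->items, new_cap * sizeof *p);
        if (!p)
            abort();    /* memory exhausted: execution aborts */
        s->items = p;
        s->cap = new_cap;
    }
    s->items[s->depth++] = s->current;
    s->current = new_state;
}

/* Switch back to the most recently saved condition. */
void sc_pop(struct sc_stack *s)
{
    if (s->depth == 0)
        abort();        /* assumption: empty-stack pop is fatal */
    s->current = s->items[--s->depth];
}

/* Return the most recently saved condition; the stack is unchanged. */
int sc_top(const struct sc_stack *s)
{
    if (s->depth == 0)
        abort();
    return s->items[s->depth - 1];
}
```

     A stack initialized with struct sc_stack s = {0}; starts in
     condition 0 with nothing saved, and grows on demand as
     conditions are pushed.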
 | 
						|
 | 
						|
MULTIPLE INPUT BUFFERS
 | 
						|
     Some scanners (such as those which support "include"  files)
 | 
						|
     require   reading  from  several  input  streams.   As  flex
 | 
						|
     scanners do a large amount of buffering, one cannot  control
 | 
						|
     where  the  next input will be read from by simply writing a
 | 
						|
     YY_INPUT  which  is  sensitive  to  the  scanning   context.
 | 
						|
     YY_INPUT  is only called when the scanner reaches the end of
 | 
						|
     its buffer, which may be a long time after scanning a state-
 | 
						|
     ment such as an "include" which requires switching the input
 | 
						|
     source.
 | 
						|
 | 
						|
     To negotiate  these  sorts  of  problems,  flex  provides  a
 | 
						|
     mechanism  for creating and switching between multiple input
 | 
						|
     buffers.  An input buffer is created by using:
 | 
						|
 | 
						|
         YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
 | 
						|
 | 
						|
     which takes a FILE pointer and a size and creates  a  buffer
 | 
						|
     associated with the given file and large enough to hold size
 | 
						|
     characters (when in doubt, use YY_BUF_SIZE  for  the  size).
 | 
						|
     It  returns  a  YY_BUFFER_STATE  handle,  which  may then be
 | 
						|
     passed to other routines (see below).   The  YY_BUFFER_STATE
 | 
						|
     type is a pointer to an opaque struct yy_buffer_state struc-
 | 
						|
     ture, so you may safely initialize YY_BUFFER_STATE variables
 | 
						|
     to  ((YY_BUFFER_STATE) 0) if you wish, and also refer to the
 | 
						|
     opaque structure in order to correctly declare input buffers
 | 
						|
     in  source files other than that of your scanner.  Note that
 | 
						|
     the FILE pointer in the call  to  yy_create_buffer  is  only
 | 
						|
     used  as the value of yyin seen by YY_INPUT; if you redefine
 | 
						|
     YY_INPUT so it no longer uses yyin, then you can safely pass
 | 
						|
     a nil FILE pointer to yy_create_buffer. You select a partic-
 | 
						|
     ular buffer to scan from using:
 | 
						|
 | 
						|
         void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
 | 
						|
 | 
						|
     which switches the scanner's input buffer so subsequent tokens
 | 
						|
     will  come  from new_buffer. Note that yy_switch_to_buffer()
 | 
						|
     may be used by yywrap() to set things up for continued scan-
 | 
						|
     ning, instead of opening a new file and pointing yyin at it.
 | 
						|
     Note  also  that  switching   input   sources   via   either
 | 
						|
     yy_switch_to_buffer()  or yywrap() does not change the start
 | 
						|
     condition.
 | 
						|
 | 
						|
         void yy_delete_buffer( YY_BUFFER_STATE buffer )
 | 
						|
 | 
						|
     is used to reclaim the storage associated with a buffer.  (buffer
     can be nil, in which case the routine does nothing.)
 | 
						|
     You can also clear the current contents of a buffer using:
 | 
						|
 | 
						|
         void yy_flush_buffer( YY_BUFFER_STATE buffer )
 | 
						|
 | 
						|
     This function discards the buffer's contents,  so  the  next
 | 
						|
     time  the scanner attempts to match a token from the buffer,
 | 
						|
     it will first fill the buffer anew using YY_INPUT.
 | 
						|
 | 
						|
     yy_new_buffer() is an alias for yy_create_buffer(), provided
 | 
						|
     for  compatibility  with  the  C++ use of new and delete for
 | 
						|
     creating and destroying dynamic objects.
 | 
						|
 | 
						|
     Finally,   the    YY_CURRENT_BUFFER    macro    returns    a
 | 
						|
     YY_BUFFER_STATE handle to the current buffer.
 | 
						|
 | 
						|
     Here is an example of using these  features  for  writing  a
 | 
						|
     scanner  which expands include files (the <<EOF>> feature is
 | 
						|
     discussed below):
 | 
						|
 | 
						|
         /* the "incl" state is used for picking up the name
 | 
						|
          * of an include file
 | 
						|
          */
 | 
						|
         %x incl
 | 
						|
 | 
						|
         %{
 | 
						|
         #define MAX_INCLUDE_DEPTH 10
 | 
						|
         YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
 | 
						|
         int include_stack_ptr = 0;
 | 
						|
         %}
 | 
						|
 | 
						|
         %%
 | 
						|
         include             BEGIN(incl);
 | 
						|
 | 
						|
         [a-z]+              ECHO;
 | 
						|
         [^a-z\n]*\n?        ECHO;
 | 
						|
 | 
						|
         <incl>[ \t]*      /* eat the whitespace */
 | 
						|
         <incl>[^ \t\n]+   { /* got the include file name */
 | 
						|
                 if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
 | 
						|
                     {
 | 
						|
                     fprintf( stderr, "Includes nested too deeply\n" );
 | 
						|
                     exit( 1 );
 | 
						|
                     }
 | 
						|
 | 
						|
                 include_stack[include_stack_ptr++] =
 | 
						|
                     YY_CURRENT_BUFFER;
 | 
						|
 | 
						|
                 yyin = fopen( yytext, "r" );
 | 
						|
 | 
						|
                 if ( ! yyin )
 | 
						|
                     error( ... );
 | 
						|
 | 
						|
                 yy_switch_to_buffer(
 | 
						|
                     yy_create_buffer( yyin, YY_BUF_SIZE ) );
 | 
						|
 | 
						|
                 BEGIN(INITIAL);
 | 
						|
                 }
 | 
						|
 | 
						|
         <<EOF>> {
 | 
						|
                 if ( --include_stack_ptr < 0 )
 | 
						|
                     {
 | 
						|
                     yyterminate();
 | 
						|
                     }
 | 
						|
 | 
						|
                 else
 | 
						|
                     {
 | 
						|
                     yy_delete_buffer( YY_CURRENT_BUFFER );
 | 
						|
                     yy_switch_to_buffer(
 | 
						|
                          include_stack[include_stack_ptr] );
 | 
						|
                     }
 | 
						|
                 }
 | 
						|
 | 
						|
     Three routines are available for setting  up  input  buffers
 | 
						|
     for  scanning  in-memory  strings  instead of files.  All of
 | 
						|
     them create a new input buffer for scanning the string,  and
 | 
						|
     return  a  corresponding  YY_BUFFER_STATE  handle (which you
 | 
						|
     should delete with yy_delete_buffer() when  done  with  it).
 | 
						|
     They    also    switch    to    the    new    buffer   using
 | 
						|
     yy_switch_to_buffer(), so the  next  call  to  yylex()  will
 | 
						|
     start scanning the string.
 | 
						|
 | 
						|
     yy_scan_string(const char *str)
 | 
						|
          scans a NUL-terminated string.
 | 
						|
 | 
						|
     yy_scan_bytes(const char *bytes, int len)
 | 
						|
          scans len bytes (including possibly NUL's) starting  at
 | 
						|
          location bytes.
 | 
						|
 | 
						|
     Note that both of these functions create and scan a copy  of
 | 
						|
     the  string or bytes.  (This may be desirable, since yylex()
 | 
						|
     modifies the contents of the buffer it  is  scanning.)   You
 | 
						|
     can avoid the copy by using:
 | 
						|
 | 
						|
     yy_scan_buffer(char *base, yy_size_t size)
 | 
						|
          which scans in place the buffer starting at base,  con-
 | 
						|
          sisting of size bytes, the last two bytes of which must
 | 
						|
          be YY_END_OF_BUFFER_CHAR (ASCII NUL).  These  last  two
 | 
						|
          bytes  are  not  scanned;  thus,  scanning  consists of
 | 
						|
          base[0] through base[size-3], inclusive.
 | 
						|
 | 
						|
          If you fail to set up base in this manner (i.e., forget
 | 
						|
          the   final   two  YY_END_OF_BUFFER_CHAR  bytes),  then
 | 
						|
          yy_scan_buffer()  returns  a  nil  pointer  instead  of
 | 
						|
          creating a new input buffer.
 | 
						|
 | 
						|
          The type yy_size_t is an integral type to which you can
 | 
						|
          cast  an  integer expression reflecting the size of the
 | 
						|
          buffer.
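
     Preparing a buffer with the two trailing NUL bytes that
     yy_scan_buffer() requires can be sketched as follows.  The
     helper name make_scan_buffer is hypothetical, and the sketch
     assumes the caller frees the memory itself once the flex
     buffer has been deleted.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy len bytes from src into a freshly
 * allocated buffer that ends in two NUL bytes, which is the layout
 * yy_scan_buffer() expects.  The total size (len + 2) is stored in
 * *size_out.  Returns NULL if allocation fails. */
char *make_scan_buffer(const char *src, size_t len, size_t *size_out)
{
    char *base = malloc(len + 2);

    if (!base)
        return NULL;

    memcpy(base, src, len);
    base[len] = '\0';       /* first YY_END_OF_BUFFER_CHAR */
    base[len + 1] = '\0';   /* second YY_END_OF_BUFFER_CHAR */
    *size_out = len + 2;
    return base;
}
```

     Within a scanner, the result would be handed over with
     something like yy_scan_buffer( base, (yy_size_t) size ), and a
     nil return value checked for as described above.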
 | 
						|
 | 
						|
END-OF-FILE RULES
 | 
						|
     The special rule "<<EOF>>" indicates actions which are to be
 | 
						|
     taken  when  an  end-of-file  is  encountered  and  yywrap()
 | 
						|
     returns non-zero (i.e., indicates no further files  to  pro-
 | 
						|
     cess).  The action must finish by doing one of four things:
 | 
						|
 | 
						|
     -    assigning yyin to a new input file  (in  previous  ver-
 | 
						|
          sions  of  flex,  after doing the assignment you had to
 | 
						|
          call the special action YY_NEW_FILE; this is no  longer
 | 
						|
          necessary);
 | 
						|
 | 
						|
     -    executing a return statement;
 | 
						|
 | 
						|
     -    executing the special yyterminate() action;
 | 
						|
 | 
						|
     -    or,    switching    to    a    new     buffer     using
 | 
						|
          yy_switch_to_buffer() as shown in the example above.
 | 
						|
 | 
						|
     <<EOF>> rules may not be used with other patterns; they  may
 | 
						|
     only  be  qualified  with a list of start conditions.  If an
 | 
						|
     unqualified <<EOF>> rule is given, it applies to  all  start
 | 
						|
     conditions  which  do  not already have <<EOF>> actions.  To
 | 
						|
     specify an <<EOF>> rule for only the  initial  start  condi-
 | 
						|
     tion, use
 | 
						|
 | 
						|
         <INITIAL><<EOF>>
 | 
						|
 | 
						|
 | 
						|
     These rules are useful for  catching  things  like  unclosed
 | 
						|
     comments.  An example:
 | 
						|
 | 
						|
         %x quote
 | 
						|
         %%
 | 
						|
 | 
						|
         ...other rules for dealing with quotes...
 | 
						|
 | 
						|
         <quote><<EOF>>   {
 | 
						|
                  error( "unterminated quote" );
 | 
						|
                  yyterminate();
 | 
						|
                  }
 | 
						|
         <<EOF>>  {
 | 
						|
                  if ( *++filelist )
 | 
						|
                      yyin = fopen( *filelist, "r" );
 | 
						|
                  else
 | 
						|
                     yyterminate();
 | 
						|
                  }
 | 
						|
 | 
						|
 | 
						|
MISCELLANEOUS MACROS
 | 
						|
     The macro YY_USER_ACTION can be defined to provide an action
 | 
						|
     which is always executed prior to the matched rule's action.
 | 
						|
     For example, it could be #define'd to call a routine to con-
 | 
						|
     vert  yytext to lower-case.  When YY_USER_ACTION is invoked,
 | 
						|
     the variable yy_act gives the number  of  the  matched  rule
 | 
						|
     (rules  are  numbered starting with 1).  Suppose you want to
 | 
						|
     profile how often each of your rules is matched.   The  fol-
 | 
						|
     lowing would do the trick:
 | 
						|
 | 
						|
         #define YY_USER_ACTION ++ctr[yy_act]
 | 
						|
 | 
						|
     where ctr is an array to hold the counts for  the  different
 | 
						|
     rules.   Note  that  the  macro YY_NUM_RULES gives the total
 | 
						|
     number of rules (including the default rule, even if you use
 | 
						|
     -s), so a correct declaration for ctr is:
 | 
						|
 | 
						|
         int ctr[YY_NUM_RULES];
 | 
						|
 | 
						|
 | 
						|
     The macro YY_USER_INIT may be defined to provide  an  action
 | 
						|
     which  is  always executed before the first scan (and before
 | 
						|
     the scanner's internal initializations are done).  For exam-
 | 
						|
     ple,  it  could  be used to call a routine to read in a data
 | 
						|
     table or open a logging file.
 | 
						|
 | 
						|
     The macro yy_set_interactive(is_interactive) can be used  to
 | 
						|
     control  whether  the  current buffer is considered interac-
 | 
						|
     tive. An interactive buffer is processed  more  slowly,  but
 | 
						|
     must  be  used  when  the  scanner's  input source is indeed
 | 
						|
     interactive to avoid problems due to waiting to fill buffers
 | 
						|
     (see the discussion of the -I flag below).  A non-zero value
 | 
						|
     in the macro invocation marks the buffer as  interactive,  a
 | 
						|
     zero  value as non-interactive.  Note that use of this macro
 | 
						|
     overrides  %option  always-interactive  or  %option   never-
 | 
						|
     interactive  (see Options below).  yy_set_interactive() must
 | 
						|
     be invoked prior to beginning to scan the buffer that is (or
 | 
						|
     is not) to be considered interactive.
 | 
						|
 | 
						|
     The macro yy_set_bol(at_bol) can be used to control  whether
 | 
						|
     the  current  buffer's  scanning  context for the next token
 | 
						|
     match is done as though at the beginning of a line.  A  non-
     zero macro argument makes rules anchored with '^' active, while
     a zero argument makes '^' rules inactive.
 | 
						|
 | 
						|
     The macro YY_AT_BOL() returns true if the next token scanned
 | 
						|
     from  the  current  buffer will have '^' rules active, false
 | 
						|
     otherwise.
 | 
						|
 | 
						|
     In the generated scanner, the actions are  all  gathered  in
 | 
						|
     one  large  switch  statement  and separated using YY_BREAK,
 | 
						|
     which may be redefined.  By default, it is simply a "break",
 | 
						|
     to  separate  each  rule's action from the following rule's.
 | 
						|
     Redefining  YY_BREAK  allows,  for  example,  C++  users  to
     #define  YY_BREAK  to  do  nothing (while being very careful
     that every rule ends with a "break" or a "return"!) to avoid
     unreachable-statement warnings in cases where a rule's action
     ends with "return" and the YY_BREAK that follows it can never
     be reached.
 | 
						|
 | 
						|
VALUES AVAILABLE TO THE USER
 | 
						|
     This section summarizes the various values available to  the
 | 
						|
     user in the rule actions.
 | 
						|
 | 
						|
     -    char *yytext holds the text of the current  token.   It
 | 
						|
          may  be  modified but not lengthened (you cannot append
 | 
						|
          characters to the end).
 | 
						|
 | 
						|
          If the special directive %array appears  in  the  first
 | 
						|
          section  of  the  scanner  description,  then yytext is
 | 
						|
          instead declared char yytext[YYLMAX], where YYLMAX is a
 | 
						|
          macro  definition  that  you  can redefine in the first
 | 
						|
          section if you don't like the default value  (generally
 | 
						|
          8KB).    Using   %array   results  in  somewhat  slower
 | 
						|
          scanners, but the value of  yytext  becomes  immune  to
 | 
						|
          calls to input() and unput(), which potentially destroy
 | 
						|
          its value when yytext  is  a  character  pointer.   The
 | 
						|
          opposite of %array is %pointer, which is the default.
 | 
						|
 | 
						|
          You cannot  use  %array  when  generating  C++  scanner
 | 
						|
          classes (the -+ flag).
 | 
						|
 | 
						|
     -    int yyleng holds the length of the current token.
 | 
						|
 | 
						|
     -    FILE *yyin is the file  which  by  default  flex  reads
 | 
						|
          from.   It  may  be  redefined  but doing so only makes
 | 
						|
          sense before scanning begins or after an EOF  has  been
 | 
						|
          encountered.  Changing it in the midst of scanning will
 | 
						|
          have unexpected results since flex buffers  its  input;
 | 
						|
          use  yyrestart()  instead.   Once  scanning  terminates
 | 
						|
          because an end-of-file has been seen,  you  can  point
          yyin  at  the  new input file and then call the scanner
 | 
						|
          again to continue scanning.
 | 
						|
 | 
						|
     -    void yyrestart( FILE *new_file ) may be called to point
 | 
						|
          yyin at the new input file.  The switch-over to the new
 | 
						|
          file is immediate (any previously buffered-up input  is
 | 
						|
          lost).   Note  that calling yyrestart() with yyin as an
 | 
						|
          argument thus throws away the current input buffer  and
 | 
						|
          continues scanning the same input file.
 | 
						|
 | 
						|
     -    FILE *yyout is the file to which ECHO actions are done.
 | 
						|
          It can be reassigned by the user.
 | 
						|
 | 
						|
     -    YY_CURRENT_BUFFER returns a YY_BUFFER_STATE  handle  to
 | 
						|
          the current buffer.
 | 
						|
 | 
						|
     -    YY_START returns an integer value corresponding to  the
 | 
						|
          current start condition.  You can subsequently use this
 | 
						|
          value with BEGIN to return to that start condition.
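
     Because yytext in the default %pointer mode can be destroyed
     by later calls to input() and unput(), an action that needs
     the token text afterwards may copy it first.  A minimal
     sketch, assuming a hypothetical helper named save_token that
     is not part of flex:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a NUL-terminated heap copy of the
 * first len bytes of text (for example yytext and yyleng), or NULL
 * if len is negative or allocation fails.  The caller frees it. */
char *save_token(const char *text, int len)
{
    char *copy;

    if (len < 0)
        return NULL;

    copy = malloc((size_t) len + 1);
    if (!copy)
        return NULL;

    memcpy(copy, text, (size_t) len);
    copy[len] = '\0';
    return copy;
}
```

     An action could call save_token( yytext, yyleng ) before
     using input() or unput(), and free the copy once it is no
     longer needed.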
 | 
						|
 | 
						|
INTERFACING WITH YACC
 | 
						|
     One of the main uses of flex is as a companion to  the  yacc
 | 
						|
     parser-generator.   yacc  parsers  expect  to call a routine
 | 
						|
     named yylex() to find the next input token.  The routine  is
 | 
						|
     supposed  to  return  the  type of the next token as well as
 | 
						|
     putting any associated value in the global  yylval.  To  use
 | 
						|
     flex  with  yacc,  one  specifies  the  -d option to yacc to
 | 
						|
     instruct it to generate the file y.tab.h containing  defini-
 | 
						|
     tions  of all the %tokens appearing in the yacc input.  This
 | 
						|
     file is then included in the flex scanner.  For example,  if
 | 
						|
     one of the tokens is "TOK_NUMBER", part of the scanner might
 | 
						|
     look like:
 | 
						|
 | 
						|
         %{
 | 
						|
         #include "y.tab.h"
 | 
						|
         %}
 | 
						|
 | 
						|
         %%
 | 
						|
 | 
						|
         [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
 | 
						|
 | 
						|
 | 
						|
OPTIONS
 | 
						|
     flex has the following options:
 | 
						|
 | 
						|
     -b   Generate backing-up information to lex.backup. This  is
 | 
						|
          a  list  of scanner states which require backing up and
 | 
						|
          the input characters on which they do  so.   By  adding
 | 
						|
          rules   one  can  remove  backing-up  states.   If  all
 | 
						|
          backing-up states are eliminated  and  -Cf  or  -CF  is
 | 
						|
          used, the generated scanner will run faster (see the -p
 | 
						|
          flag).  Only users who wish to squeeze every last cycle
 | 
						|
          out  of  their  scanners  need worry about this option.
 | 
						|
          (See the section on Performance Considerations below.)
 | 
						|
 | 
						|
     -c   is a do-nothing, deprecated option included  for  POSIX
 | 
						|
          compliance.
 | 
						|
 | 
						|
     -d   makes the generated scanner run in debug  mode.   When-
 | 
						|
          ever   a   pattern   is   recognized   and  the  global
 | 
						|
          yy_flex_debug is non-zero (which is the  default),  the
 | 
						|
          scanner will write to stderr a line of the form:
 | 
						|
 | 
						|
              --accepting rule at line 53 ("the matched text")
 | 
						|
 | 
						|
          The line number refers to the location of the  rule  in
 | 
						|
          the  file defining the scanner (i.e., the file that was
 | 
						|
          fed to flex).  Messages are  also  generated  when  the
 | 
						|
          scanner backs up, accepts the default rule, reaches the
 | 
						|
          end of its input buffer (or encounters a NUL;  at  this
 | 
						|
          point,  the  two  look the same as far as the scanner's
 | 
						|
          concerned), or reaches an end-of-file.
 | 
						|
 | 
						|
     -f   specifies fast scanner. No table  compression  is  done
 | 
						|
          and  stdio  is bypassed.  The result is large but fast.
 | 
						|
          This option is equivalent to -Cfr (see below).
 | 
						|
 | 
						|
     -h   generates a "help" summary of flex's options to  stdout
 | 
						|
          and then exits.  -? and --help are synonyms for -h.
 | 
						|
 | 
						|
     -i   instructs flex to generate a case-insensitive  scanner.
 | 
						|
          The  case  of  letters given in the flex input patterns
 | 
						|
          will be ignored,  and  tokens  in  the  input  will  be
 | 
						|
          matched  regardless of case.  The matched text given in
 | 
						|
          yytext will have the preserved case (i.e., it will  not
 | 
						|
          be folded).
 | 
						|
 | 
						|
     -l   turns on maximum compatibility with the  original  AT&T
 | 
						|
          lex  implementation.  Note that this does not mean full
 | 
						|
          compatibility.  Use of this option costs a considerable
 | 
						|
          amount  of  performance, and it cannot be used with the
 | 
						|
          -+, -f, -F, -Cf, or -CF options.  For  details  on  the
 | 
						|
          compatibilities  it provides, see the section "Incompa-
 | 
						|
          tibilities With Lex And POSIX" below.  This option also
 | 
						|
          results  in the name YY_FLEX_LEX_COMPAT being #define'd
 | 
						|
          in the generated scanner.
 | 
						|
 | 
						|
     -n   is another do-nothing, deprecated option included  only
 | 
						|
          for POSIX compliance.
 | 
						|
 | 
						|
     -p   generates a performance report to stderr.   The  report
 | 
						|
          consists  of  comments  regarding  features of the flex
 | 
						|
          input file which will cause a serious loss  of  perfor-
 | 
						|
          mance  in  the resulting scanner.  If you give the flag
 | 
						|
          twice, you will also get  comments  regarding  features
 | 
						|
          that lead to minor performance losses.
 | 
						|
 | 
						|
          Note that the use  of  REJECT,  %option  yylineno,  and
 | 
						|
          variable  trailing context (see the Deficiencies / Bugs
 | 
						|
          section  below)  entails  a   substantial   performance
 | 
						|
          penalty;  use  of  yymore(), the ^ operator, and the -I
 | 
						|
          flag entail minor performance penalties.
 | 
						|
 | 
						|
     -s   causes the default rule (that unmatched  scanner  input
 | 
						|
          is  echoed to stdout) to be suppressed.  If the scanner
 | 
						|
          encounters input that does not match any of its  rules,
 | 
						|
          it  aborts  with  an  error.  This option is useful for
 | 
						|
          finding holes in a scanner's rule set.
 | 
						|
 | 
						|
     -t   instructs flex to write the  scanner  it  generates  to
 | 
						|
          standard output instead of lex.yy.c.
 | 
						|
 | 
						|
     -v   specifies that flex should write to stderr a summary of
 | 
						|
          statistics regarding the scanner it generates.  Most of
 | 
						|
          the statistics are meaningless to the casual flex user,
 | 
						|
          but the first line identifies the version of flex (same
 | 
						|
          as reported by -V), and the next line  the  flags  used
 | 
						|
          when  generating  the scanner, including those that are
 | 
						|
          on by default.
 | 
						|
 | 
						|
     -w   suppresses warning messages.
 | 
						|
 | 
						|
     -B   instructs flex to generate a batch scanner,  the  oppo-
 | 
						|
          site  of  interactive  scanners  generated  by  -I (see
 | 
						|
          below).  In general, you use -B when  you  are  certain
 | 
						|
          that your scanner will never be used interactively, and
 | 
						|
          you want to squeeze a little more  performance  out  of
 | 
						|
          it.   If your goal is instead to squeeze out a lot more
 | 
						|
          performance, you  should   be  using  the  -Cf  or  -CF
 | 
						|
          options  (discussed  below), which turn on -B automati-
 | 
						|
          cally anyway.
 | 
						|
 | 
						|
     -F   specifies that the fast  scanner  table  representation
 | 
						|
          should  be used (and stdio bypassed).  This representa-
 | 
						|
          tion is about as fast as the full table  representation
 | 
						|
          (-f),  and  for some sets of patterns will be consider-
 | 
						|
          ably smaller (and for others, larger).  In general,  if
 | 
						|
          the  pattern  set contains both "keywords" and a catch-
 | 
						|
          all, "identifier" rule, such as in the set:
 | 
						|
 | 
						|
              "case"    return TOK_CASE;
 | 
						|
              "switch"  return TOK_SWITCH;
 | 
						|
              ...
 | 
						|
              "default" return TOK_DEFAULT;
 | 
						|
              [a-z]+    return TOK_ID;
 | 
						|
 | 
						|
          then you're better off using the full table representa-
 | 
						|
          tion.  If only the "identifier" rule is present and you
 | 
						|
          then use a hash table or some such to detect  the  key-
 | 
						|
          words, you're better off using -F.
 | 
						|
 | 
						|
          This option is equivalent to -CFr (see below).  It can-
 | 
						|
          not be used with -+.
 | 
						|
 | 
						|
     -I   instructs flex to generate an interactive scanner.   An
 | 
						|
          interactive  scanner  is  one  that only looks ahead to
 | 
						|
          decide what token has been  matched  if  it  absolutely
 | 
						|
          must.  It turns out that always looking one extra char-
 | 
						|
          acter ahead, even  if  the  scanner  has  already  seen
 | 
						|
          enough text to disambiguate the current token, is a bit
 | 
						|
          faster than only looking  ahead  when  necessary.   But
 | 
						|
          scanners  that always look ahead give dreadful interac-
 | 
						|
          tive performance; for example, when a user types a new-
 | 
						|
          line,  it  is  not  recognized as a newline token until
 | 
						|
          they enter another token, which often means  typing  in
 | 
						|
          another whole line.
 | 
						|
 | 
						|
          Flex scanners default to interactive unless you use the
 | 
						|
          -Cf  or  -CF  table-compression  options  (see  below).
 | 
						|
          That's because if you're looking  for  high-performance
 | 
						|
          you  should  be  using  one of these options, so if you
 | 
						|
          didn't, flex assumes you'd rather trade off  a  bit  of
 | 
						|
          run-time    performance   for   intuitive   interactive
 | 
						|
          behavior.  Note also that you cannot use -I in conjunc-
 | 
						|
          tion  with  -Cf or -CF. Thus, this option is not really
 | 
						|
          needed; it is on by default  for  all  those  cases  in
 | 
						|
          which it is allowed.
 | 
						|
 | 
						|
          You can force a scanner to not be interactive by  using
 | 
						|
          -B (see above).
 | 
						|
 | 
						|
     -L   instructs  flex  not  to  generate  #line   directives.
 | 
						|
          Without this option, flex peppers the generated scanner
 | 
						|
          with #line directives so error messages in the  actions
 | 
						|
          will  be  correctly  located with respect to either the
 | 
						|
          original flex input file (if the errors are due to code
 | 
						|
          in  the  input  file),  or  lex.yy.c (if the errors are
 | 
						|
          flex's fault -- you should report these sorts of errors
 | 
						|
          to the email address given below).
 | 
						|
 | 
						|
     -T   makes flex run in trace mode.  It will generate  a  lot
 | 
						|
          of  messages to stderr concerning the form of the input
 | 
						|
          and the resultant non-deterministic  and  deterministic
 | 
						|
          finite  automata.   This  option  is  mostly for use in
 | 
						|
          maintaining flex.
 | 
						|
 | 
						|
     -V   prints the version number to stdout and exits.  --version
          is a synonym for -V.
 | 
						|
 | 
						|
     -7   instructs flex to generate a 7-bit scanner,  i.e.,  one
 | 
						|
          which  can  only  recognize  7-bit  characters  in  its
 | 
						|
          input.  The advantage of using -7 is that the scanner's
 | 
						|
          tables  can  be  up to half the size of those generated
 | 
						|
          using the -8 option (see below).  The  disadvantage  is
 | 
						|
          that  such  scanners often hang or crash if their input
 | 
						|
          contains an 8-bit character.
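
          The "up to half the size" figure follows from simple
          arithmetic.  The sketch below assumes an uncompressed
          table with one 16-bit entry per (state, input character)
          pair; the function name and that layout are assumptions
          made for illustration, not a description of the code
          flex emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Size in bytes of a hypothetical full transition table for a scanner
 * with the given number of states and 7-bit or 8-bit input characters.
 */
size_t full_table_bytes(size_t states, unsigned char_bits)
{
    size_t columns;

    assert(char_bits == 7 || char_bits == 8);
    columns = (size_t)1 << char_bits;        /* 128 or 256 columns */
    return states * columns * sizeof(int16_t);
}
```

          Under these assumptions a 100-state scanner needs 25600
          bytes with 7-bit input and 51200 bytes with 8-bit input.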
 | 
						|
 | 
						|
          Note, however, that unless you  generate  your  scanner
 | 
						|
          using  the -Cf or -CF table compression options, use of
 | 
						|
          -7 will save only a small amount of  table  space,  and
 | 
						|
          make  your  scanner considerably less portable.  Flex's
 | 
						|
          default behavior is to generate an 8-bit scanner unless
 | 
						|
          you use -Cf or -CF, in which case  flex  defaults  to
 | 
						|
          generating 7-bit scanners unless your site  was  always
 | 
						|
          configured to generate 8-bit scanners (as will often be
 | 
						|
          the case with non-USA sites).   You  can  tell  whether
 | 
						|
          flex  generated a 7-bit or an 8-bit scanner by inspect-
 | 
						|
          ing the flag summary in  the  -v  output  as  described
 | 
						|
          above.
 | 
						|
 | 
						|
          Note that if you use -Cfe or -CFe (those table compres-
 | 
						|
          sion  options,  but  also  using equivalence classes as
 | 
						|
          discussed below), flex still defaults to generating
 | 
						|
          an  8-bit scanner, since usually with these compression
 | 
						|
          options full 8-bit tables are not much  more  expensive
 | 
						|
          than 7-bit tables.
 | 
						|
 | 
						|
     -8   instructs flex to generate an 8-bit scanner, i.e.,  one
 | 
						|
          which  can  recognize  8-bit  characters.  This flag is
 | 
						|
          only needed for scanners generated using -Cf or -CF, as
 | 
						|
          otherwise  flex defaults to generating an 8-bit scanner
 | 
						|
          anyway.
 | 
						|
 | 
						|
          See the discussion  of  -7  above  for  flex's  default
 | 
						|
          behavior  and  the  tradeoffs  between  7-bit and 8-bit
 | 
						|
          scanners.
 | 
						|
 | 
						|
     -+   specifies that you want flex to generate a C++  scanner
 | 
						|
          class.   See  the  section  on  Generating C++ Scanners
 | 
						|
          below for details.
 | 
						|
 | 
						|
     -C[aefFmr]
 | 
						|
          controls the degree of table compression and, more gen-
 | 
						|
          erally,  trade-offs  between  small  scanners  and fast
 | 
						|
          scanners.
 | 
						|
 | 
						|
          -Ca ("align") instructs flex to trade off larger tables
 | 
						|
          in the generated scanner for faster performance because
 | 
						|
          the elements of  the  tables  are  better  aligned  for
 | 
						|
          memory  access and computation.  On some RISC architec-
 | 
						|
          tures, fetching  and  manipulating  longwords  is  more
 | 
						|
          efficient  than with smaller-sized units such as short-
 | 
						|
          words.  This option can double the size of  the  tables
 | 
						|
          used by your scanner.
 | 
						|
 | 
						|
          -Ce directs  flex  to  construct  equivalence  classes,
 | 
						|
          i.e.,  sets  of characters which have identical lexical
 | 
						|
          properties (for example,  if  the  only  appearance  of
 | 
						|
          digits  in  the  flex  input  is in the character class
 | 
						|
          "[0-9]" then the digits '0', '1', ..., '9' will all  be
 | 
						|
          put   in  the  same  equivalence  class).   Equivalence
 | 
						|
          classes usually give dramatic reductions in  the  final
 | 
						|
          table/object file sizes (typically a factor of 2-5) and
 | 
						|
          are pretty cheap performance-wise  (one  array  look-up
 | 
						|
          per character scanned).
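
          One way to picture how such classes could be computed:
          two characters are equivalent when they belong to exactly
          the same character classes of the input.  The sketch
          below applies that test directly.  The function name, the
          representation of each character class as a string of
          its members, and the limit of 32 classes are assumptions
          made for this example; flex's own algorithm may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCHARS 256

/*
 * sets[i] lists the member characters of the i-th character class.
 * On return, ec[c] holds the 1-based equivalence class of character c.
 * Returns the number of equivalence classes found.
 */
size_t build_equivalence_classes(const char *const sets[], size_t nsets,
                                 int ec[NCHARS])
{
    uint32_t member[NCHARS] = {0};   /* bit i set: character is in sets[i] */
    size_t nclasses = 0;

    assert(nsets <= 32);

    for (size_t i = 0; i < nsets; ++i)
        for (const unsigned char *p = (const unsigned char *)sets[i]; *p; ++p)
            member[*p] |= (uint32_t)1 << i;

    for (int c = 0; c < NCHARS; ++c) {
        ec[c] = 0;
        /* Reuse the class of an earlier character with the same memberships. */
        for (int d = 0; d < c; ++d) {
            if (member[d] == member[c]) {
                ec[c] = ec[d];
                break;
            }
        }
        if (ec[c] == 0)
            ec[c] = (int)++nclasses;
    }
    return nclasses;
}
```

          With the two classes "[0-9]" and "[a-z]" this yields
          three equivalence classes: the digits, the lower-case
          letters, and everything else.  A transition table then
          needs one column per class instead of one per character,
          at the price of the ec[c] look-up for each character
          scanned.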
 | 
						|
 | 
						|
          -Cf specifies that the full scanner  tables  should  be
 | 
						|
          generated - flex should not compress the tables by taking
          advantage of similar transition functions for different
          states.
 | 
						|
 | 
						|
          -CF specifies that the alternate fast scanner represen-
 | 
						|
          tation  (described  above  under the -F flag) should be
 | 
						|
          used.  This option cannot be used with -+.
 | 
						|
 | 
						|
          -Cm directs flex to construct meta-equivalence classes,
 | 
						|
          which  are  sets of equivalence classes (or characters,
 | 
						|
          if equivalence classes are not  being  used)  that  are
 | 
						|
          commonly  used  together.  Meta-equivalence classes are
 | 
						|
          often a big win when using compressed tables, but  they
 | 
						|
          have  a  moderate  performance  impact (one or two "if"
 | 
						|
          tests and one array look-up per character scanned).
 | 
						|
 | 
						|
          -Cr causes the generated scanner to bypass use  of  the
 | 
						|
          standard  I/O  library  (stdio)  for input.  Instead of
 | 
						|
          calling fread() or getc(), the  scanner  will  use  the
 | 
						|
          read()  system  call,  resulting  in a performance gain
 | 
						|
          which varies from system to system, but in  general  is
 | 
						|
          probably  negligible  unless  you are also using -Cf or
 | 
						|
          -CF. Using -Cr can cause strange behavior if, for exam-
 | 
						|
          ple,  you  read  from yyin using stdio prior to calling
 | 
						|
          the scanner (because the  scanner  will  miss  whatever
 | 
						|
          text  your  previous  reads  left  in  the  stdio input
 | 
						|
          buffer).
 | 
						|
 | 
						|
          -Cr has no effect if you define YY_INPUT (see The  Gen-
 | 
						|
          erated Scanner above).
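
          The hazard described for -Cr can be reproduced without
          flex.  The sketch below (POSIX only) reads one character
          through stdio and then calls read() on the same
          descriptor, much as a -Cr scanner would after the caller
          had used stdio.  The function name and the use of a pipe
          as the input source are choices made for this
          illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for pipe() and fdopen() */
#endif
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Feeds a short text through a pipe, reads one character with getc(),
 * then asks read() for the rest.  Returns the number of bytes read()
 * still sees, or -1 if the demonstration could not be set up.
 */
long bytes_left_for_read(void)
{
    static const char text[] = "first second\n";
    char rest[64];
    int fds[2];
    FILE *in;
    long seen;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], text, strlen(text)) != (ssize_t)strlen(text)) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);                 /* end of input for the reader */

    in = fdopen(fds[0], "r");
    if (in == NULL) {
        close(fds[0]);
        return -1;
    }
    if (getc(in) != 'f') {         /* stdio fills its own buffer here */
        fclose(in);
        return -1;
    }
    seen = (long)read(fds[0], rest, sizeof rest);  /* bypasses that buffer */
    fclose(in);
    return seen;
}
```

          With a fully buffered stream the call typically returns
          0: stdio has already taken the whole text into its
          buffer, so read() finds none of the 12 characters that
          follow the first one.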
 | 
						|
 | 
						|
          A lone -C specifies that the scanner tables  should  be
 | 
						|
          compressed  but  neither  equivalence classes nor meta-
 | 
						|
          equivalence classes should be used.
 | 
						|
 | 
						|
          The options -Cf or  -CF  and  -Cm  do  not  make  sense
 | 
						|
          together - there is no opportunity for meta-equivalence
 | 
						|
          classes if the table is not being  compressed.   Other-
 | 
						|
          wise  the  options may be freely mixed, and are cumula-
 | 
						|
          tive.
 | 
						|
 | 
						|
          The default setting is -Cem, which specifies that  flex
 | 
						|
          should   generate   equivalence   classes   and   meta-
 | 
						|
          equivalence classes.  This setting provides the highest
 | 
						|
          degree   of  table  compression.   You  can  trade  off
 | 
						|
          faster-executing scanners at the cost of larger  tables
 | 
						|
          with the following generally being true:
 | 
						|
 | 
						|
              slowest & smallest
 | 
						|
                    -Cem
 | 
						|
                    -Cm
 | 
						|
                    -Ce
 | 
						|
                    -C
 | 
						|
                    -C{f,F}e
 | 
						|
                    -C{f,F}
 | 
						|
                    -C{f,F}a
 | 
						|
              fastest & largest
 | 
						|
 | 
						|
          Note that scanners with the smallest tables are usually
 | 
						|
          generated and compiled the quickest, so during develop-
 | 
						|
          ment you will usually want to use the default,  maximal
 | 
						|
          compression.
 | 
						|
 | 
						|
          -Cfe is often a good compromise between speed and  size
 | 
						|
          for production scanners.
 | 
						|
 | 
						|
     -ooutput
 | 
						|
          directs flex to write the scanner to  the  file  output
 | 
						|
          instead  of  lex.yy.c.  If  you  combine -o with the -t
 | 
						|
          option, then the scanner is written to stdout  but  its
 | 
						|
          #line directives (see the -L option above) refer to the
 | 
						|
          file output.
 | 
						|
 | 
						|
     -Pprefix
 | 
						|
          changes the default yy prefix  used  by  flex  for  all
 | 
						|
          globally-visible variable and function names to instead
 | 
						|
          be prefix. For  example,  -Pfoo  changes  the  name  of
 | 
						|
          yytext  to  footext.  It  also  changes the name of the
 | 
						|
          default output file from lex.yy.c  to  lex.foo.c.  Here
 | 
						|
          are all of the names affected:
 | 
						|
 | 
						|
              yy_create_buffer
 | 
						|
              yy_delete_buffer
 | 
						|
              yy_flex_debug
 | 
						|
              yy_init_buffer
 | 
						|
              yy_flush_buffer
 | 
						|
              yy_load_buffer_state
 | 
						|
              yy_switch_to_buffer
 | 
						|
              yyin
 | 
						|
              yyleng
 | 
						|
              yylex
 | 
						|
              yylineno
 | 
						|
              yyout
 | 
						|
              yyrestart
 | 
						|
              yytext
 | 
						|
              yywrap
 | 
						|
 | 
						|
          (If you are using a C++ scanner, then only  yywrap  and
 | 
						|
          yyFlexLexer  are affected.) Within your scanner itself,
 | 
						|
          you can still refer to the global variables  and  func-
 | 
						|
          tions  using  either  version of their name; but exter-
 | 
						|
          nally, they have the modified name.
 | 
						|
 | 
						|
          This option lets you easily link together multiple flex
 | 
						|
          programs  into the same executable.  Note, though, that
 | 
						|
          using this option also renames  yywrap(),  so  you  now
 | 
						|
          must either provide your own (appropriately-named) ver-
 | 
						|
          sion of the routine for your scanner,  or  use  %option
 | 
						|
          noyywrap,  as  linking with -lfl no longer provides one
 | 
						|
          for you by default.
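
          As a minimal sketch, a scanner built with -Pfoo could
          supply the renamed routine itself.  The name foowrap
          follows the yytext-to-footext renaming described above,
          and the body assumes there is never a further input file
          to switch to.

```c
/*
 * Called by a -Pfoo scanner at end of input in place of yywrap().
 * A non-zero return value tells the scanner there is no more input.
 */
int foowrap(void)
{
    return 1;
}
```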
 | 
						|
 | 
						|
     -Sskeleton_file
 | 
						|
          overrides the default skeleton  file  from  which  flex
 | 
						|
          constructs its scanners.  You'll never need this option
 | 
						|
          unless you are doing flex maintenance or development.
 | 
						|
 | 
						|
     flex also  provides  a  mechanism  for  controlling  options
 | 
						|
     within  the  scanner  specification itself, rather than from
 | 
						|
     the flex command-line.  This is done  by  including  %option
 | 
						|
     directives  in  the  first section of the scanner specifica-
 | 
						|
     tion.  You  can  specify  multiple  options  with  a  single
 | 
						|
     %option directive, and multiple directives in the first sec-
 | 
						|
     tion of your flex input file.
 | 
						|
 | 
						|
     Most options are given simply as names, optionally  preceded
 | 
						|
     by  the word "no" (with no intervening whitespace) to negate
 | 
						|
     their meaning.  A number are equivalent  to  flex  flags  or
 | 
						|
     their negation:
 | 
						|
 | 
						|
         7bit            -7 option
 | 
						|
         8bit            -8 option
 | 
						|
         align           -Ca option
 | 
						|
         backup          -b option
 | 
						|
         batch           -B option
 | 
						|
         c++             -+ option
 | 
						|
 | 
						|
         caseful or
 | 
						|
         case-sensitive  opposite of -i (default)
 | 
						|
 | 
						|
         case-insensitive or
 | 
						|
         caseless        -i option
 | 
						|
 | 
						|
         debug           -d option
 | 
						|
         default         opposite of -s option
 | 
						|
         ecs             -Ce option
 | 
						|
         fast            -F option
 | 
						|
         full            -f option
 | 
						|
         interactive     -I option
 | 
						|
         lex-compat      -l option
 | 
						|
         meta-ecs        -Cm option
 | 
						|
         perf-report     -p option
 | 
						|
         read            -Cr option
 | 
						|
         stdout          -t option
 | 
						|
         verbose         -v option
 | 
						|
         warn            opposite of -w option
 | 
						|
                         (use "%option nowarn" for -w)
 | 
						|
 | 
						|
         array           equivalent to "%array"
 | 
						|
         pointer         equivalent to "%pointer" (default)
 | 
						|
 | 
						|
     Some %options provide features otherwise not available:
 | 
						|
 | 
						|
     always-interactive
 | 
						|
          instructs flex to generate a scanner which always  con-
 | 
						|
          siders  its input "interactive".  Normally, on each new
 | 
						|
          input file the scanner calls isatty() in an attempt  to
 | 
						|
          determine   whether   the  scanner's  input  source  is
 | 
						|
          interactive and thus should be read a  character  at  a
 | 
						|
          time.   When this option is used, however, then no such
 | 
						|
          call is made.
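
          The kind of test the scanner performs can be sketched as
          follows (POSIX only).  The helper name is hypothetical
          and the code is an illustration, not a copy of what flex
          generates.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for fileno() */
#endif
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if the stream is attached to a terminal, 0 otherwise. */
int input_is_interactive(FILE *in)
{
    return in != NULL && isatty(fileno(in)) > 0;
}
```

          A stream opened on a regular file or on /dev/null is
          reported as non-interactive; one attached to a terminal
          is reported as interactive.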
 | 
						|
 | 
						|
     main directs flex to provide a default  main()  program  for
 | 
						|
          the  scanner,  which  simply calls yylex(). This option
 | 
						|
          implies noyywrap (see below).
 | 
						|
 | 
						|
     never-interactive
 | 
						|
          instructs flex to generate a scanner which  never  con-
 | 
						|
          siders  its input "interactive" (again, no call made to
 | 
						|
          isatty()). This is the opposite of always-interactive.
 | 
						|
 | 
						|
     stack
 | 
						|
          enables the use of start condition  stacks  (see  Start
 | 
						|
          Conditions above).
 | 
						|
 | 
						|
     stdinit
 | 
						|
          if set (i.e., %option  stdinit)  initializes  yyin  and
 | 
						|
          yyout  to  stdin  and stdout, instead of the default of
 | 
						|
          nil.  Some  existing  lex  programs  depend   on   this
 | 
						|
          behavior,  even though it is not compliant with ANSI C,
 | 
						|
          which does not require stdin and stdout to be  compile-
 | 
						|
          time constant.
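
          Because stdin need not be a constant expression, a
          portable program cannot initialize a file-scope FILE
          pointer with it.  A common workaround, sketched below,
          is to start with a null pointer and assign stdin on
          first use.  The names scan_in and current_input() are
          hypothetical stand-ins for this example.

```c
#include <stdio.h>

static FILE *scan_in = NULL;    /* hypothetical stand-in for yyin */

/* Returns the current input stream, falling back to stdin on first use. */
FILE *current_input(void)
{
    if (scan_in == NULL)
        scan_in = stdin;
    return scan_in;
}
```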
 | 
						|
 | 
						|
     yylineno
 | 
						|
          directs flex to generate a scanner that  maintains  the
 | 
						|
          number  of  the current line read from its input in the
 | 
						|
          global variable yylineno. This  option  is  implied  by
 | 
						|
          %option lex-compat.
 | 
						|
 | 
						|
     yywrap
 | 
						|
          if unset (i.e., %option noyywrap),  makes  the  scanner
 | 
						|
          not  call  yywrap()  upon  an  end-of-file,  but simply
 | 
						|
          assume that there are no more files to scan (until  the
 | 
						|
          user  points  yyin  at  a  new  file  and calls yylex()
 | 
						|
          again).
 | 
						|
 | 
						|
     flex scans your rule actions to determine  whether  you  use
 | 
						|
     the  REJECT  or  yymore()  features.   The reject and yymore
 | 
						|
     options are available to override its decision as to whether
 | 
						|
     you use the features, either by setting them (e.g., %option
 | 
						|
     reject) to indicate the feature is indeed used, or unsetting
 | 
						|
     them  to  indicate  it  actually  is not used (e.g., %option
 | 
						|
     noyymore).
 | 
						|
 | 
						|
     Three options take string-delimited values, offset with '=':
 | 
						|
 | 
						|
         %option outfile="ABC"
 | 
						|
 | 
						|
     is equivalent to -oABC, and
 | 
						|
 | 
						|
         %option prefix="XYZ"
 | 
						|
 | 
						|
     is equivalent to -PXYZ. Finally,
 | 
						|
 | 
						|
         %option yyclass="foo"
 | 
						|
 | 
						|
     only applies when generating a C++ scanner (-+ option).   It
 | 
						|
     informs  flex  that  you  have  derived foo as a subclass of
 | 
						|
     yyFlexLexer, so flex will place your actions in  the  member
 | 
						|
     function  foo::yylex()  instead  of yyFlexLexer::yylex(). It
 | 
						|
     also generates a yyFlexLexer::yylex() member  function  that
 | 
						|
     emits      a      run-time      error      (by      invoking
 | 
						|
     yyFlexLexer::LexerError()) if called.   See  Generating  C++
 | 
						|
     Scanners, below, for additional information.
 | 
						|
 | 
						|
     A number of options are available for lint purists who  want
 | 
						|
     to  suppress the appearance of unneeded routines in the gen-
 | 
						|
     erated scanner.  Each of  the  following,  if  unset  (e.g.,
 | 
						|
     %option nounput), results in the corresponding routine  not
 | 
						|
     appearing in the generated scanner:
 | 
						|
 | 
						|
         input, unput
 | 
						|
         yy_push_state, yy_pop_state, yy_top_state
 | 
						|
         yy_scan_buffer, yy_scan_bytes, yy_scan_string
 | 
						|
 | 
						|
     (though yy_push_state()  and  friends  won't  appear  anyway
 | 
						|
     unless you use %option stack).
 | 
						|
 | 
						|
PERFORMANCE CONSIDERATIONS
 | 
						|
     The main design goal of  flex  is  that  it  generate  high-
 | 
						|
     performance  scanners.   It  has  been optimized for dealing
 | 
						|
     well with large sets of rules.  Aside from  the  effects  on
 | 
						|
     scanner  speed  of the table compression -C options outlined
 | 
						|
     above, there are a number of options/actions  which  degrade
 | 
						|
     performance.  These are, from most expensive to least:
 | 
						|
 | 
						|
         REJECT
 | 
						|
         %option yylineno
 | 
						|
         arbitrary trailing context
 | 
						|
 | 
						|
         pattern sets that require backing up
 | 
						|
         %array
 | 
						|
         %option interactive
 | 
						|
         %option always-interactive
 | 
						|
 | 
						|
         '^' beginning-of-line operator
 | 
						|
         yymore()
 | 
						|
 | 
						|
     with the first three all being quite expensive and the  last
 | 
						|
     two  being  quite  cheap.   Note also that unput() is imple-
 | 
						|
     mented as a routine call that potentially does quite  a  bit
 | 
						|
     of work, while yyless() is a quite-cheap macro; so  if  you
     are  just  putting  back  some  excess  text  you  scanned,
     use yyless().
 | 
						|
 | 
						|
     REJECT should be avoided at all costs  when  performance  is
 | 
						|
     important.  It is a particularly expensive option.
 | 
						|
 | 
						|
     Getting rid of backing up is messy and can often be an enormous
     amount of work for a complicated scanner.  In principle, one
     begins by using the -b flag to generate a lex.backup file.  For
     example, on the input
 | 
						|
 | 
						|
         %%
 | 
						|
         foo        return TOK_KEYWORD;
 | 
						|
         foobar     return TOK_KEYWORD;
 | 
						|
 | 
						|
     the file looks like:
 | 
						|
 | 
						|
         State #6 is non-accepting -
 | 
						|
          associated rule line numbers:
 | 
						|
                2       3
 | 
						|
          out-transitions: [ o ]
 | 
						|
          jam-transitions: EOF [ \001-n  p-\177 ]
 | 
						|
 | 
						|
         State #8 is non-accepting -
 | 
						|
          associated rule line numbers:
 | 
						|
                3
 | 
						|
          out-transitions: [ a ]
 | 
						|
          jam-transitions: EOF [ \001-`  b-\177 ]
 | 
						|
 | 
						|
         State #9 is non-accepting -
 | 
						|
          associated rule line numbers:
 | 
						|
                3
 | 
						|
          out-transitions: [ r ]
 | 
						|
          jam-transitions: EOF [ \001-q  s-\177 ]
 | 
						|
 | 
						|
         Compressed tables always back up.
 | 
						|
 | 
						|
     The first few lines tell us that there's a scanner state  in
 | 
						|
     which  it  can  make  a  transition on an 'o' but not on any
 | 
						|
     other character,  and  that  in  that  state  the  currently
 | 
						|
     scanned text does not match any rule.  The state occurs when
 | 
						|
     trying to match the rules found at lines  2  and  3  in  the
 | 
						|
     input  file.  If the scanner is in that state and then reads
 | 
						|
     something other than an 'o', it will have to back up to find
 | 
						|
     a  rule  which is matched.  With a bit of headscratching one
 | 
						|
     can see that this must be the state it's in when it has seen
 | 
						|
     "fo".   When  this  has  happened,  if  anything  other than
 | 
						|
     another 'o' is seen, the scanner will have  to  back  up  to
 | 
						|
     simply match the 'f' (by the default rule).
 | 
						|
 | 
						|
     The comment regarding State #8 indicates there's  a  problem
 | 
						|
     when  "foob"  has  been  scanned.   Indeed, on any character
 | 
						|
     other than an 'a', the scanner  will  have  to  back  up  to
 | 
						|
     accept  "foo".  Similarly, the comment for State #9 concerns
 | 
						|
     when "fooba" has been scanned and an 'r' does not follow.
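
     The behavior these three states describe can be  imitated  by
     hand.   The sketch below is a hand-written stand-in for the
     scanner built from the rules "foo" and "foobar" plus the
     default single-character rule; it is not flex output, and the
     function name is an assumption made for this example.  It
     remembers the last accepting position and reports how many
     characters had to be given back.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the length of the token finally matched at the start of s
 * and stores in *backed_up the number of characters that were consumed
 * and then returned to the input.
 */
size_t match_with_backup(const char *s, size_t *backed_up)
{
    static const char word[] = "foobar";
    size_t pos = 0;          /* characters consumed so far */
    size_t last_accept = 0;  /* length at the most recent accepting state */

    assert(s != NULL && backed_up != NULL);

    while (pos < sizeof word - 1 && s[pos] == word[pos]) {
        ++pos;
        /* "f" (default rule), "foo" and "foobar" are accepting;
           "fo", "foob" and "fooba" are the non-accepting states. */
        if (pos == 1 || pos == 3 || pos == 6)
            last_accept = pos;
    }
    if (pos == 0 && s[0] != '\0')
        pos = last_accept = 1;   /* any other character: default rule */

    *backed_up = pos - last_accept;
    return last_accept;
}
```

     On "foobaz" the function consumes "fooba", then backs up  two
     characters  to accept "foo".  On "fox" it consumes "fo" and
     backs up one character to accept 'f' by the default rule.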
 | 
						|
 | 
						|
     The final comment reminds us that there's no point going  to
 | 
						|
     all the trouble of removing backing up from the rules unless
 | 
						|
     we're using -Cf or -CF, since there's  no  performance  gain
 | 
						|
     doing so with compressed scanners.
 | 
						|
 | 
						|
     The way to remove the backing up is to add "error" rules:
 | 
						|
 | 
						|
         %%
 | 
						|
         foo         return TOK_KEYWORD;
 | 
						|
         foobar      return TOK_KEYWORD;
 | 
						|
 | 
						|
         fooba       |
 | 
						|
         foob        |
 | 
						|
         fo          {
 | 
						|
                     /* false alarm, not really a keyword */
 | 
						|
                     return TOK_ID;
 | 
						|
                     }
 | 
						|
 | 
						|
 | 
						|
     Eliminating backing up among a list of keywords can also  be
 | 
						|
     done using a "catch-all" rule:
 | 
						|
 | 
						|
         %%
 | 
						|
         foo         return TOK_KEYWORD;
 | 
						|
         foobar      return TOK_KEYWORD;
 | 
						|
 | 
						|
         [a-z]+      return TOK_ID;
 | 
						|
 | 
						|
     This is usually the best solution when appropriate.
 | 
						|
 | 
						|
     Backing up messages tend to cascade.  With a complicated set
 | 
						|
     of  rules it's not uncommon to get hundreds of messages.  If
 | 
						|
     one can decipher them, though, it often only takes  a  dozen
 | 
						|
     or so rules to eliminate the backing up, though it's easy to
     make a mistake and have an error rule accidentally  match  a
     valid  token.   (A possible future flex feature will be to
     automatically add rules to eliminate backing up.)
 | 
						|
 | 
						|
     It's important to keep in mind that you gain the benefits of
 | 
						|
     eliminating  backing up only if you eliminate every instance
 | 
						|
     of backing up.  Leaving just one means you gain nothing.
 | 
						|
 | 
						|
     Variable trailing context (where both the leading and trail-
 | 
						|
     ing  parts  do  not  have a fixed length) entails almost the
 | 
						|
     same performance loss as  REJECT  (i.e.,  substantial).   So
 | 
						|
     when possible a rule like:
 | 
						|
 | 
						|
         %%
 | 
						|
         mouse|rat/(cat|dog)   run();
 | 
						|
 | 
						|
     is better written:
 | 
						|
 | 
						|
         %%
 | 
						|
         mouse/cat|dog         run();
 | 
						|
         rat/cat|dog           run();
 | 
						|
 | 
						|
     or as
 | 
						|
 | 
						|
         %%
 | 
						|
         mouse|rat/cat         run();
 | 
						|
         mouse|rat/dog         run();
 | 
						|
 | 
						|
     Note that here the special '|' action does not  provide  any
 | 
						|
     savings,  and can even make things worse (see Deficiencies /
 | 
						|
     Bugs below).
 | 
						|
 | 
						|
     Another area where the user can increase a scanner's perfor-
 | 
						|
     mance  (and  one that's easier to implement) arises from the
 | 
						|
     fact that the longer the  tokens  matched,  the  faster  the
 | 
						|
     scanner will run.  This is because with long tokens the pro-
 | 
						|
     cessing of most input characters takes place in the  (short)
 | 
						|
     inner  scanning  loop, and does not often have to go through
 | 
						|
     the additional work of setting up the  scanning  environment
 | 
						|
     (e.g.,  yytext)  for  the  action.  Recall the scanner for C
 | 
						|
     comments:
 | 
						|
 | 
						|
         %x comment
 | 
						|
         %%
 | 
						|
                 int line_num = 1;
 | 
						|
 | 
						|
         "/*"         BEGIN(comment);
 | 
						|
 | 
						|
         <comment>[^*\n]*
 | 
						|
         <comment>"*"+[^*/\n]*
 | 
						|
         <comment>\n             ++line_num;
 | 
						|
         <comment>"*"+"/"        BEGIN(INITIAL);
 | 
						|
 | 
						|
     This could be sped up by writing it as:
 | 
						|
 | 
						|
         %x comment
 | 
						|
         %%
 | 
						|
                 int line_num = 1;
 | 
						|
 | 
						|
         "/*"         BEGIN(comment);
 | 
						|
 | 
						|
         <comment>[^*\n]*
 | 
						|
         <comment>[^*\n]*\n      ++line_num;
 | 
						|
         <comment>"*"+[^*/\n]*
 | 
						|
         <comment>"*"+[^*/\n]*\n ++line_num;
 | 
						|
         <comment>"*"+"/"        BEGIN(INITIAL);
 | 
						|
 | 
						|
     Now instead of each  newline  requiring  the  processing  of
 | 
						|
     another  action,  recognizing  the newlines is "distributed"
 | 
						|
     over the other rules to keep the matched  text  as  long  as
 | 
						|
     possible.   Note  that  adding  rules does not slow down the
 | 
						|
     scanner!  The speed of the scanner  is  independent  of  the
 | 
						|
     number  of  rules or (modulo the considerations given at the
 | 
						|
     beginning of this section) how  complicated  the  rules  are
 | 
						|
     with regard to operators such as '*' and '|'.
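
     The saving can be estimated by counting rule matches.   The
     sketch below imitates, by hand, how each of the two comment
     rule sets would split up the text that follows the opening
     "/*".  It is a simplified model written for this
     illustration, not generated code, and the function name is
     hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Counts the rule matches needed to scan comment text s, up to and
 * including the closing star-slash.  With merge_newlines == 0 a newline
 * is always a match of its own (first rule set); otherwise it is folded
 * into the preceding text (second rule set).
 */
size_t count_comment_matches(const char *s, int merge_newlines)
{
    size_t matches = 0;

    assert(s != NULL);
    while (*s != '\0') {
        size_t len;

        if (*s == '*') {
            size_t stars = strspn(s, "*");
            if (s[stars] == '/')
                return matches + 1;      // stars followed by a slash
            // stars followed by text without star, slash or newline
            len = stars + strcspn(s + stars, "*/\n");
        } else {
            len = strcspn(s, "*\n");     // text without star or newline
        }
        if (s[len] == '\n' && (merge_newlines || len == 0))
            ++len;                       // take the newline as well
        s += len;
        ++matches;
    }
    return matches;
}
```

     For the comment text "a\nb\nc*/" the first rule set needs six
     matches and the second needs four.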
 | 
						|
 | 
						|
     A final example in speeding up a scanner: suppose  you  want
 | 
						|
     to  scan through a file containing identifiers and keywords,
 | 
						|
     one per line and with no other  extraneous  characters,  and
 | 
						|
     recognize all the keywords.  A natural first approach is:
 | 
						|
 | 
						|
         %%
 | 
						|
         asm      |
 | 
						|
         auto     |
 | 
						|
         break    |
 | 
						|
         ... etc ...
 | 
						|
         volatile |
 | 
						|
         while    /* it's a keyword */
 | 
						|
 | 
						|
         .|\n     /* it's not a keyword */
 | 
						|
 | 
						|
     To eliminate the back-tracking, introduce a catch-all rule:
 | 
						|
 | 
						|
         %%
 | 
						|
         asm      |
 | 
						|
         auto     |
 | 
						|
         break    |
 | 
						|
         ... etc ...
 | 
						|
         volatile |
 | 
						|
         while    /* it's a keyword */
 | 
						|
 | 
						|
         [a-z]+   |
 | 
						|
         .|\n     /* it's not a keyword */
 | 
						|
 | 
						|
     Now, if it's guaranteed that there's exactly  one  word  per
 | 
						|
     line,  then  we  can reduce the total number of matches by a
 | 
						|
     half by merging in the recognition of newlines with that  of
 | 
						|
     the other tokens:
 | 
						|
 | 
						|
         %%
 | 
						|
         asm\n    |
 | 
						|
         auto\n   |
 | 
						|
         break\n  |
 | 
						|
         ... etc ...
 | 
						|
         volatile\n |
 | 
						|
         while\n  /* it's a keyword */
 | 
						|
 | 
						|
         [a-z]+\n |
 | 
						|
         .|\n     /* it's not a keyword */
 | 
						|
 | 
						|
     One has to be careful here,  as  we  have  now  reintroduced
 | 
						|
     backing  up  into the scanner.  In particular, while we know
 | 
						|
     that there will never be any characters in the input  stream
 | 
						|
     other  than letters or newlines, flex can't figure this out,
 | 
						|
     and it will plan for possibly needing to back up when it has
 | 
						|
     scanned  a  token like "auto" and then the next character is
 | 
						|
     something other than a newline or a letter.   Previously  it
 | 
						|
     would  then  just match the "auto" rule and be done, but now
 | 
						|
     it has no "auto" rule, only an "auto\n" rule.  To  eliminate
 | 
						|
     the possibility of backing up, we could either duplicate all
 | 
						|
     rules but without final newlines, or, since we never  expect
 | 
						|
     to  encounter  such  an  input  and therefore don't care how it's
 | 
						|
     classified, we can introduce one more catch-all  rule,  this
 | 
						|
     one which doesn't include a newline:
 | 
						|
 | 
						|
         %%
 | 
						|
         asm\n    |
 | 
						|
         auto\n   |
 | 
						|
         break\n  |
 | 
						|
         ... etc ...
 | 
						|
         volatile\n |
 | 
						|
         while\n  /* it's a keyword */
 | 
						|
 | 
						|
         [a-z]+\n |
 | 
						|
         [a-z]+   |
 | 
						|
         .|\n     /* it's not a keyword */
 | 
						|
 | 
						|
     Compiled with -Cf, this is about as fast as one  can  get  a
 | 
						|
     flex scanner to go for this particular problem.
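
     The halving of the match count can be checked with a small C
     sketch.  It is a hand-written simulation, not flex output, and
     the name count_matches is an illustrative assumption.  With
     merge_newlines set to 0 it models the rule set that has the
     [a-z]+ catch-all and a separate rule for newlines; with
     merge_newlines non-zero it models the final rule set above:

```c
#include <stddef.h>

/*
 * Hand simulation (not flex output) of longest-match scanning over
 * input made of lowercase words and newlines.  Keyword rules and the
 * [a-z]+ catch-all consume the same text, so they are not told apart
 * here; only the number of matches is counted.
 * merge_newlines == 0: words and newlines are matched separately.
 * merge_newlines != 0: a newline directly after a word joins its match.
 */
int count_matches(const char *input, int merge_newlines)
{
    size_t i = 0;
    int matches = 0;

    while (input[i] != '\0') {
        if (input[i] >= 'a' && input[i] <= 'z') {
            while (input[i] >= 'a' && input[i] <= 'z')
                ++i;                 /* a keyword or the catch-all */
            if (merge_newlines && input[i] == '\n')
                ++i;                 /* word plus newline as one token */
        } else {
            ++i;                     /* any other single character */
        }
        ++matches;
    }
    return matches;
}
```

     For the input "auto\nfoo\nwhile\n" this returns 6 when newlines
     are matched separately and 3 when they are merged.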
 | 
						|
 | 
						|
     A final note: flex is slow when matching NUL's, particularly
 | 
						|
     when  a  token  contains multiple NUL's.  It's best to write
 | 
						|
     rules which match short amounts of text if it's  anticipated
 | 
						|
     that the text will often include NUL's.
 | 
						|
 | 
						|
     Another final note regarding performance: as mentioned above
 | 
						|
     in  the section How the Input is Matched, dynamically resiz-
 | 
						|
     ing yytext to accommodate huge  tokens  is  a  slow  process
 | 
						|
     because  it presently requires that the (huge) token be res-
 | 
						|
     canned from the beginning.  Thus if  performance  is  vital,
 | 
						|
     you  should  attempt to match "large" quantities of text but
 | 
						|
     not "huge" quantities, where the cutoff between the  two  is
 | 
						|
     at about 8K characters/token.
 | 
						|
 | 
						|
GENERATING C++ SCANNERS
 | 
						|
     flex provides two different ways to generate scanners for use
     with C++.  The first way is to simply compile a scanner
     generated by flex using a C++ compiler instead of a C
     compiler.  You should not encounter any compilation errors
     (please report any you find to the email address given in
     the Author section below).  You can then use C++ code in
     your rule actions instead of C code.  Note that the default
     input source for your scanner remains yyin, and default
     echoing is still done to yyout.  Both of these remain FILE *
     variables and not C++ streams.
 | 
						|
 | 
						|
     You can also use flex to generate a C++ scanner class, using
 | 
						|
     the  -+  option  (or,  equivalently,  %option c++), which is
 | 
						|
     automatically specified if the name of the  flex  executable
 | 
						|
     ends  in a '+', such as flex++. When using this option, flex
 | 
						|
     defaults to generating the scanner  to  the  file  lex.yy.cc
 | 
						|
     instead  of  lex.yy.c.  The  generated  scanner includes the
 | 
						|
     header file FlexLexer.h, which defines the interface to  two
 | 
						|
     C++ classes.
 | 
						|
 | 
						|
     The first class, FlexLexer, provides an abstract base  class
 | 
						|
     defining  the  general scanner class interface.  It provides
 | 
						|
     the following member functions:
 | 
						|
 | 
						|
     const char* YYText()
 | 
						|
          returns the text of the most  recently  matched  token,
 | 
						|
          the equivalent of yytext.
 | 
						|
 | 
						|
     int YYLeng()
 | 
						|
          returns the length of the most recently matched  token,
 | 
						|
          the equivalent of yyleng.
 | 
						|
 | 
						|
     int lineno() const
 | 
						|
          returns the current  input  line  number  (see  %option
 | 
						|
          yylineno), or 1 if %option yylineno was not used.
 | 
						|
 | 
						|
     void set_debug( int flag )
 | 
						|
          sets the debugging flag for the scanner, equivalent  to
 | 
						|
          assigning  to  yy_flex_debug  (see  the Options section
 | 
						|
          above).  Note that you must  build  the  scanner  using
 | 
						|
          %option debug to include debugging information in it.
 | 
						|
 | 
						|
     int debug() const
 | 
						|
          returns the current setting of the debugging flag.
 | 
						|
 | 
						|
     Also   provided   are   member   functions   equivalent   to
 | 
						|
     yy_switch_to_buffer(),  yy_create_buffer() (though the first
 | 
						|
     argument is an istream* object pointer  and  not  a  FILE*),
 | 
						|
     yy_flush_buffer(),   yy_delete_buffer(),   and   yyrestart()
 | 
						|
     (again, the first argument is an istream* object pointer).
 | 
						|
 | 
						|
     The second class  defined  in  FlexLexer.h  is  yyFlexLexer,
 | 
						|
     which  is  derived  from FlexLexer. It defines the following
 | 
						|
     additional member functions:
 | 
						|
 | 
						|
     yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )
 | 
						|
          constructs a yyFlexLexer object using the given streams
 | 
						|
          for  input  and  output.  If not specified, the streams
 | 
						|
          default to cin and cout, respectively.
 | 
						|
 | 
						|
     virtual int yylex()
 | 
						|
          performs the same role as  yylex()  does  for  ordinary
 | 
						|
          flex  scanners:  it  scans  the input stream, consuming
 | 
						|
          tokens, until a rule's action returns a value.  If  you
 | 
						|
          derive a subclass S from yyFlexLexer and want to access
 | 
						|
          the member functions and variables of S inside yylex(),
 | 
						|
          then you need to use %option yyclass="S" to inform flex
 | 
						|
          that you will be using that subclass instead of yyFlex-
 | 
						|
          Lexer.   In   this   case,   rather   than   generating
 | 
						|
          yyFlexLexer::yylex(), flex  generates  S::yylex()  (and
 | 
						|
          also  generates a dummy yyFlexLexer::yylex() that calls
 | 
						|
          yyFlexLexer::LexerError() if called).
 | 
						|
 | 
						|
     virtual void switch_streams( istream* new_in = 0, ostream* new_out = 0 )
          reassigns yyin to new_in (if non-nil) and yyout to
          new_out (ditto), deleting the previous input buffer if
          yyin is reassigned.
 | 
						|
 | 
						|
     int yylex( istream* new_in, ostream* new_out = 0 )
 | 
						|
          first switches the input  streams  via  switch_streams(
 | 
						|
          new_in,  new_out  )  and  then  returns  the  value  of
 | 
						|
          yylex().
 | 
						|
 | 
						|
     In addition, yyFlexLexer  defines  the  following  protected
 | 
						|
     virtual  functions which you can redefine in derived classes
 | 
						|
     to tailor the scanner:
 | 
						|
 | 
						|
     virtual int LexerInput( char* buf, int max_size )
 | 
						|
          reads up to max_size characters into  buf  and  returns
 | 
						|
          the  number  of  characters  read.  To indicate end-of-
 | 
						|
          input, return 0 characters.   Note  that  "interactive"
 | 
						|
          scanners  (see  the  -B  and -I flags) define the macro
 | 
						|
          YY_INTERACTIVE. If you redefine LexerInput()  and  need
 | 
						|
          to  take  different actions depending on whether or not
 | 
						|
          the scanner might  be  scanning  an  interactive  input
 | 
						|
          source,  you can test for the presence of this name via
 | 
						|
          #ifdef.
 | 
						|
 | 
						|
     virtual void LexerOutput( const char* buf, int size )
 | 
						|
          writes out size characters from the buffer buf,  which,
 | 
						|
          while NUL-terminated, may also contain "internal" NUL's
 | 
						|
          if the scanner's rules can match  text  with  NUL's  in
 | 
						|
          them.
 | 
						|
 | 
						|
     virtual void LexerError( const char* msg )
 | 
						|
          reports a fatal error message.  The default version  of
 | 
						|
          this function writes the message to the stream cerr and
 | 
						|
          exits.
 | 
						|
 | 
						|
     Note that a yyFlexLexer object contains its entire  scanning
 | 
						|
     state.   Thus  you  can use such objects to create reentrant
 | 
						|
     scanners.  You can instantiate  multiple  instances  of  the
 | 
						|
     same  yyFlexLexer  class,  and you can also combine multiple
 | 
						|
     C++ scanner classes together in the same program  using  the
 | 
						|
     -P option discussed above.
 | 
						|
 | 
						|
     Finally, note that the %array feature is  not  available  to
 | 
						|
     C++ scanner classes; you must use %pointer (the default).
 | 
						|
 | 
						|
     Here is an example of a simple C++ scanner:
 | 
						|
 | 
						|
             // An example of using the flex C++ scanner class.
 | 
						|
 | 
						|
         %{
 | 
						|
         int mylineno = 0;
 | 
						|
         %}
 | 
						|
 | 
						|
         string  \"[^\n"]+\"
 | 
						|
 | 
						|
         ws      [ \t]+
 | 
						|
 | 
						|
         alpha   [A-Za-z]
 | 
						|
         dig     [0-9]
 | 
						|
         name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
 | 
						|
         num1    [-+]?{dig}+\.?([eE][-+]?{dig}+)?
 | 
						|
         num2    [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
 | 
						|
         number  {num1}|{num2}
 | 
						|
 | 
						|
         %%
 | 
						|
 | 
						|
         {ws}    /* skip blanks and tabs */
 | 
						|
 | 
						|
         "/*"    {
 | 
						|
                 int c;
 | 
						|
 | 
						|
                 while((c = yyinput()) != 0)
 | 
						|
                     {
 | 
						|
                     if(c == '\n')
 | 
						|
                         ++mylineno;
 | 
						|
 | 
						|
                     else if(c == '*')
 | 
						|
                         {
 | 
						|
                         if((c = yyinput()) == '/')
 | 
						|
                             break;
 | 
						|
                         else
 | 
						|
                             unput(c);
 | 
						|
                         }
 | 
						|
                     }
 | 
						|
                 }
 | 
						|
 | 
						|
         {number}  cout << "number " << YYText() << '\n';
 | 
						|
 | 
						|
         \n        mylineno++;
 | 
						|
 | 
						|
         {name}    cout << "name " << YYText() << '\n';
 | 
						|
 | 
						|
         {string}  cout << "string " << YYText() << '\n';
 | 
						|
 | 
						|
         %%
 | 
						|
 | 
						|
         int main( int /* argc */, char** /* argv */ )
 | 
						|
             {
 | 
						|
             FlexLexer* lexer = new yyFlexLexer;
 | 
						|
             while(lexer->yylex() != 0)
 | 
						|
                 ;
 | 
						|
             return 0;
 | 
						|
             }
 | 
						|
     If you want to create multiple  (different)  lexer  classes,
 | 
						|
     you  use  the -P flag (or the prefix= option) to rename each
 | 
						|
     yyFlexLexer to some other xxFlexLexer. You then can  include
 | 
						|
     <FlexLexer.h>  in  your  other sources once per lexer class,
 | 
						|
     first renaming yyFlexLexer as follows:
 | 
						|
 | 
						|
         #undef yyFlexLexer
 | 
						|
         #define yyFlexLexer xxFlexLexer
 | 
						|
         #include <FlexLexer.h>
 | 
						|
 | 
						|
         #undef yyFlexLexer
 | 
						|
         #define yyFlexLexer zzFlexLexer
 | 
						|
         #include <FlexLexer.h>
 | 
						|
 | 
						|
     if, for example, you used %option  prefix="xx"  for  one  of
 | 
						|
     your scanners and %option prefix="zz" for the other.
 | 
						|
 | 
						|
     IMPORTANT: the present form of the scanning class is experi-
 | 
						|
     mental and may change considerably between major releases.
 | 
						|
 | 
						|
INCOMPATIBILITIES WITH LEX AND POSIX
 | 
						|
     flex is a rewrite of the AT&T Unix lex tool (the two  imple-
 | 
						|
     mentations  do not share any code, though), with some exten-
 | 
						|
     sions and incompatibilities, both of which are of concern to
 | 
						|
     those who wish to write scanners acceptable to either imple-
 | 
						|
     mentation.  Flex is  fully  compliant  with  the  POSIX  lex
 | 
						|
     specification,   except   that   when  using  %pointer  (the
 | 
						|
     default), a call to unput() destroys the contents of yytext,
 | 
						|
     which is counter to the POSIX specification.
 | 
						|
 | 
						|
     In this section we discuss all of the known areas of  incom-
 | 
						|
     patibility  between flex, AT&T lex, and the POSIX specifica-
 | 
						|
     tion.
 | 
						|
 | 
						|
     flex's -l option turns on  maximum  compatibility  with  the
 | 
						|
     original  AT&T  lex  implementation,  at the cost of a major
 | 
						|
     loss in the generated scanner's performance.  We note  below
 | 
						|
     which incompatibilities can be overcome using the -l option.
 | 
						|
 | 
						|
     flex is fully compatible with lex with the following  excep-
 | 
						|
     tions:
 | 
						|
 | 
						|
     -    The undocumented lex scanner internal variable yylineno
 | 
						|
          is not supported unless -l or %option yylineno is used.
 | 
						|
 | 
						|
          yylineno should be maintained on  a  per-buffer  basis,
 | 
						|
          rather  than  a  per-scanner  (single  global variable)
 | 
						|
          basis.
 | 
						|
 | 
						|
          yylineno is not part of the POSIX specification.
 | 
						|
 | 
						|
     -    The input() routine is not redefinable, though  it  may
 | 
						|
          be  called  to  read  characters following whatever has
 | 
						|
          been matched by a rule.  If input() encounters an  end-
 | 
						|
          of-file  the  normal  yywrap()  processing  is done.  A
 | 
						|
          ``real'' end-of-file is returned by input() as EOF.
 | 
						|
 | 
						|
          Input is instead controlled by  defining  the  YY_INPUT
 | 
						|
          macro.
 | 
						|
 | 
						|
          The flex restriction that input() cannot  be  redefined
 | 
						|
          is  in  accordance  with the POSIX specification, which
 | 
						|
          simply does not specify  any  way  of  controlling  the
 | 
						|
          scanner's input other than by making an initial assign-
 | 
						|
          ment to yyin.
 | 
						|
 | 
						|
     -    The unput() routine is not redefinable.  This  restric-
 | 
						|
          tion is in accordance with POSIX.
 | 
						|
 | 
						|
     -    flex scanners are not as reentrant as lex scanners.  In
 | 
						|
          particular,  if  you have an interactive scanner and an
 | 
						|
          interrupt handler which long-jumps out of the  scanner,
 | 
						|
          and  the  scanner is subsequently called again, you may
 | 
						|
          get the following message:
 | 
						|
 | 
						|
              fatal flex scanner internal error--end of buffer missed
 | 
						|
 | 
						|
          To reenter the scanner, first use
 | 
						|
 | 
						|
              yyrestart( yyin );
 | 
						|
 | 
						|
          Note that this call will throw away any buffered input;
 | 
						|
          usually  this  isn't  a  problem  with  an  interactive
 | 
						|
          scanner.
 | 
						|
 | 
						|
          Also note that flex C++ scanner classes are  reentrant,
 | 
						|
          so  if  using  C++ is an option for you, you should use
 | 
						|
          them instead.  See "Generating C++ Scanners" above  for
 | 
						|
          details.
 | 
						|
 | 
						|
     -    output() is not supported.  Output from the ECHO  macro
 | 
						|
          is done to the file-pointer yyout (default stdout).
 | 
						|
 | 
						|
          output() is not part of the POSIX specification.
 | 
						|
 | 
						|
     -    lex does not support exclusive start  conditions  (%x),
 | 
						|
          though they are in the POSIX specification.
 | 
						|
 | 
						|
     -    When definitions are expanded, flex  encloses  them  in
 | 
						|
          parentheses.  With lex, the following:
 | 
						|
 | 
						|
              NAME    [A-Z][A-Z0-9]*
 | 
						|
              %%
 | 
						|
              foo{NAME}?      printf( "Found it\n" );
 | 
						|
              %%
 | 
						|
 | 
						|
          will not match the string "foo" because when the macro
          is expanded the rule is equivalent to
          "foo[A-Z][A-Z0-9]*?" and the precedence is such that
          the '?' is associated with "[A-Z0-9]*".  With flex, the
          rule will be expanded to "foo([A-Z][A-Z0-9]*)?" and so
          the string "foo" will match.
 | 
						|
 | 
						|
          Note that if the definition begins with ^ or ends  with
 | 
						|
          $  then  it  is not expanded with parentheses, to allow
 | 
						|
          these operators to appear in definitions without losing
 | 
						|
          their  special  meanings.   But the <s>, /, and <<EOF>>
 | 
						|
          operators cannot be used in a flex definition.
 | 
						|
 | 
						|
          Using -l results in the lex behavior of no  parentheses
 | 
						|
          around the definition.
 | 
						|
 | 
						|
          The POSIX  specification  is  that  the  definition  be
 | 
						|
          enclosed in parentheses.
 | 
						|
 | 
						|
     -    Some implementations of lex allow a  rule's  action  to
 | 
						|
          begin  on  a  separate  line, if the rule's pattern has
 | 
						|
          trailing whitespace:
 | 
						|
 | 
						|
              %%
 | 
						|
              foo|bar<space here>
 | 
						|
                { foobar_action(); }
 | 
						|
 | 
						|
          flex does not support this feature.
 | 
						|
 | 
						|
     -    The lex %r (generate a Ratfor scanner)  option  is  not
 | 
						|
          supported.  It is not part of the POSIX specification.
 | 
						|
 | 
						|
     -    After a call to unput(), yytext is undefined until  the
 | 
						|
          next  token  is  matched,  unless the scanner was built
 | 
						|
          using %array. This is not the  case  with  lex  or  the
 | 
						|
          POSIX specification.  The -l option does away with this
 | 
						|
          incompatibility.
 | 
						|
 | 
						|
     -    The precedence of the {} (numeric  range)  operator  is
 | 
						|
          different.   lex  interprets  "abc{1,3}" as "match one,
 | 
						|
          two, or  three  occurrences  of  'abc'",  whereas  flex
 | 
						|
          interprets  it  as "match 'ab' followed by one, two, or
 | 
						|
          three occurrences of 'c'".  The latter is in  agreement
 | 
						|
          with the POSIX specification.
 | 
						|
 | 
						|
     -    The precedence of the ^  operator  is  different.   lex
 | 
						|
          interprets  "^foo|bar"  as  "match  either 'foo' at the
 | 
						|
          beginning of a line, or 'bar' anywhere",  whereas  flex
 | 
						|
          interprets  it  as "match either 'foo' or 'bar' if they
 | 
						|
          come at the beginning of a line".   The  latter  is  in
 | 
						|
          agreement with the POSIX specification.
 | 
						|
 | 
						|
     -    The special table-size declarations  such  as  %a  sup-
 | 
						|
          ported  by  lex are not required by flex scanners; flex
 | 
						|
          ignores them.
 | 
						|
 | 
						|
     -    The name FLEX_SCANNER is #define'd so scanners  may  be
 | 
						|
          written  for use with either flex or lex. Scanners also
 | 
						|
          include YY_FLEX_MAJOR_VERSION and YY_FLEX_MINOR_VERSION
 | 
						|
          indicating  which version of flex generated the scanner
 | 
						|
          (for example, for the 2.5 release, these defines  would
 | 
						|
          be 2 and 5 respectively).
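
     The difference in how definitions are expanded, described in
     the list above, is purely textual and can be reproduced with a
     short C sketch.  The helper expand_definition and its
     parameters are illustrative assumptions, not part of flex or
     lex; it substitutes a definition for the first {name}
     reference in a pattern, with or without the enclosing
     parentheses:

```c
#include <stdio.h>
#include <string.h>

/*
 * Illustrative helper (not part of flex or lex): substitute `def` for
 * the first "{name}" in `pattern`, writing the result to `out`.
 * parenthesize != 0 wraps the definition in parentheses, as flex and
 * POSIX do, except when it begins with '^' or ends with '$';
 * parenthesize == 0 pastes it in bare, as AT&T lex (or flex -l) does.
 * Returns 0 on success, -1 if {name} is absent or `out` is too small.
 */
int expand_definition(const char *pattern, const char *name,
                      const char *def, int parenthesize,
                      char *out, size_t out_size)
{
    char ref[64];
    const char *hit;
    size_t def_len = strlen(def);
    int wrap, n;

    /* Build the "{name}" reference to search for. */
    n = snprintf(ref, sizeof ref, "{%s}", name);
    if (n < 0 || (size_t)n >= sizeof ref)
        return -1;
    hit = strstr(pattern, ref);
    if (hit == NULL)
        return -1;

    wrap = parenthesize && def_len > 0 &&
           def[0] != '^' && def[def_len - 1] != '$';

    /* prefix + optional "(" + definition + optional ")" + suffix */
    n = snprintf(out, out_size, "%.*s%s%s%s%s",
                 (int)(hit - pattern), pattern,
                 wrap ? "(" : "", def, wrap ? ")" : "",
                 hit + strlen(ref));
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

     Expanding "foo{NAME}?" with NAME defined as [A-Z][A-Z0-9]*
     gives "foo[A-Z][A-Z0-9]*?" without parentheses and
     "foo([A-Z][A-Z0-9]*)?" with them, which are the two forms
     discussed in the list.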
 | 
						|
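
     The two precedence differences in the list above (the {}
     numeric range and the ^ operator) can be made concrete by
     writing each reading with explicit parentheses and trying it
     on sample text.  The sketch below uses the C library's POSIX
     regcomp() and regexec() rather than the flex matching engine,
     so it only illustrates how the operators group; the helper
     name ere_matches and the four macro names are illustrative
     assumptions:

```c
#include <regex.h>
#include <stddef.h>

/* The two readings of abc{1,3}, written with explicit grouping. */
#define FLEX_RANGE "^ab(c{1,3})$"   /* flex and POSIX */
#define LEX_RANGE  "^(abc){1,3}$"   /* AT&T lex */

/* The two readings of ^foo|bar, written with explicit grouping. */
#define FLEX_BOL   "^(foo|bar)"     /* flex and POSIX */
#define LEX_BOL    "(^foo)|bar"     /* AT&T lex */

/*
 * Return 1 if `text` contains a match for the POSIX extended regular
 * expression `pattern`, 0 if it does not, and -1 if the pattern fails
 * to compile.
 */
int ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

     Under the flex reading "abccc" matches and "abcabc" does not;
     under the lex reading it is the other way around.  Likewise
     "xbar" matches only the lex reading of ^foo|bar, because there
     'bar' is not tied to the beginning of the line.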
 | 
						|
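
     As a minimal sketch of how the FLEX_SCANNER name from the last
     item above might be used, the following C function could be
     placed in the user code section of a scanner input file meant
     to build with either tool.  The function name
     describe_generator is an illustrative assumption; only
     FLEX_SCANNER, YY_FLEX_MAJOR_VERSION and YY_FLEX_MINOR_VERSION
     come from flex:

```c
#include <stdio.h>

/*
 * Write a short description of the scanner generator into `out`.
 * FLEX_SCANNER, YY_FLEX_MAJOR_VERSION and YY_FLEX_MINOR_VERSION are
 * defined inside flex-generated scanners; anywhere else (including a
 * lex-generated scanner) the #else branch is compiled instead.
 * Returns the value of snprintf().
 */
int describe_generator(char *out, size_t out_size)
{
#ifdef FLEX_SCANNER
    return snprintf(out, out_size, "flex %d.%d",
                    YY_FLEX_MAJOR_VERSION, YY_FLEX_MINOR_VERSION);
#else
    return snprintf(out, out_size, "not flex");
#endif
}
```

     Compiled outside a flex-generated scanner, the function writes
     "not flex".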
     The following flex features are not included in lex  or  the
 | 
						|
     POSIX specification:
 | 
						|
 | 
						|
         C++ scanners
 | 
						|
         %option
 | 
						|
         start condition scopes
 | 
						|
         start condition stacks
 | 
						|
         interactive/non-interactive scanners
 | 
						|
         yy_scan_string() and friends
 | 
						|
         yyterminate()
 | 
						|
         yy_set_interactive()
 | 
						|
         yy_set_bol()
 | 
						|
         YY_AT_BOL()
 | 
						|
         <<EOF>>
 | 
						|
         <*>
 | 
						|
         YY_DECL
 | 
						|
         YY_START
 | 
						|
         YY_USER_ACTION
 | 
						|
         YY_USER_INIT
 | 
						|
         #line directives
 | 
						|
         %{}'s around actions
 | 
						|
         multiple actions on a line
 | 
						|
 | 
						|
     plus almost all of the flex flags.  The last feature in  the
 | 
						|
     list  refers to the fact that with flex you can put multiple
 | 
						|
     actions on the same line, separated with semi-colons,  while
 | 
						|
     with lex, the following
 | 
						|
 | 
						|
         foo    handle_foo(); ++num_foos_seen;
 | 
						|
 | 
						|
     is (rather surprisingly) truncated to
 | 
						|
 | 
						|
         foo    handle_foo();
 | 
						|
 | 
						|
     flex does not truncate the action.   Actions  that  are  not
 | 
						|
     enclosed  in  braces are simply terminated at the end of the
 | 
						|
     line.
 | 
						|
 | 
						|
DIAGNOSTICS
 | 
						|
     warning, rule cannot be matched  indicates  that  the  given
 | 
						|
     rule  cannot  be matched because it follows other rules that
 | 
						|
     will always match the same text as it.  For example, in  the
 | 
						|
     following  "foo" cannot be matched because it comes after an
 | 
						|
     identifier "catch-all" rule:
 | 
						|
 | 
						|
         [a-z]+    got_identifier();
 | 
						|
         foo       got_foo();
 | 
						|
 | 
						|
     Using REJECT in a scanner suppresses this warning.
 | 
						|
 | 
						|
     warning, -s option given but default  rule  can  be  matched
 | 
						|
     means  that  it  is  possible  (perhaps only in a particular
 | 
						|
     start condition) that the default  rule  (match  any  single
 | 
						|
     character)  is  the  only  one  that will match a particular
 | 
						|
     input.  Since -s was given, presumably this is not intended.
 | 
						|
 | 
						|
     reject_used_but_not_detected          undefined           or
 | 
						|
     yymore_used_but_not_detected  undefined  -  These errors can
 | 
						|
     occur at compile time.  They indicate that the scanner  uses
 | 
						|
     REJECT  or yymore() but that flex failed to notice the fact,
 | 
						|
     meaning that flex scanned the first two sections looking for
 | 
						|
     occurrences  of  these  actions  and failed to find any, but
 | 
						|
     somehow you snuck some in (via a #include  file,  for  exam-
 | 
						|
     ple).   Use  %option reject or %option yymore to indicate to
 | 
						|
     flex that you really do use these features.
 | 
						|
 | 
						|
     flex scanner jammed - a scanner compiled with -s has encoun-
 | 
						|
     tered  an  input  string  which wasn't matched by any of its
 | 
						|
     rules.  This error can also occur due to internal problems.
 | 
						|
 | 
						|
     token too large, exceeds YYLMAX - your scanner  uses  %array
 | 
						|
     and one of its rules matched a string longer than the YYLMAX
 | 
						|
     constant (8K bytes by default).  You can increase the  value
 | 
						|
     by  #define'ing  YYLMAX  in  the definitions section of your
 | 
						|
     flex input.
 | 
						|
 | 
						|
     scanner requires -8 flag to use the  character  'x'  -  Your
 | 
						|
     scanner specification includes recognizing the 8-bit charac-
 | 
						|
     ter 'x' and you did  not  specify  the  -8  flag,  and  your
 | 
						|
     scanner  defaulted  to 7-bit because you used the -Cf or -CF
 | 
						|
     table compression options.  See the  discussion  of  the  -7
 | 
						|
     flag for details.
 | 
						|
 | 
						|
     flex scanner push-back overflow - you used unput()  to  push
 | 
						|
     back  so  much text that the scanner's buffer could not hold
 | 
						|
     both the pushed-back text and the current token  in  yytext.
 | 
						|
     Ideally  the scanner should dynamically resize the buffer in
 | 
						|
     this case, but at present it does not.
 | 
						|
 | 
						|
     input buffer overflow, can't enlarge buffer because  scanner
 | 
						|
     uses  REJECT  -  the  scanner  was  working  on  matching an
 | 
						|
     extremely large token and needed to expand the input buffer.
 | 
						|
     This doesn't work with scanners that use REJECT.
 | 
						|
 | 
						|
     fatal flex scanner internal error--end of  buffer  missed  -
 | 
						|
     This  can  occur  in  a  scanner  which is reentered after a
 | 
						|
     long-jump has jumped out (or over) the scanner's  activation
 | 
						|
     frame.  Before reentering the scanner, use:
 | 
						|
 | 
						|
         yyrestart( yyin );
 | 
						|
 | 
						|
     or, as noted above, switch to using the C++ scanner class.
 | 
						|
 | 
						|
     too many start conditions in <> - you listed more start
     conditions in a <> construct than exist (so you must have
     listed at least one of them twice).
 | 
						|
 | 
						|
FILES
 | 
						|
     -lfl
          library with which scanners must be linked.
 | 
						|
 | 
						|
     lex.yy.c
 | 
						|
          generated scanner (called lexyy.c on some systems).
 | 
						|
 | 
						|
     lex.yy.cc
 | 
						|
          generated C++ scanner class, when using -+.
 | 
						|
 | 
						|
     <FlexLexer.h>
 | 
						|
          header file defining the C++ scanner base class,  Flex-
 | 
						|
          Lexer, and its derived class, yyFlexLexer.
 | 
						|
 | 
						|
     flex.skl
 | 
						|
          skeleton scanner.  This file is only used when building
 | 
						|
          flex, not when flex executes.
 | 
						|
 | 
						|
     lex.backup
 | 
						|
          backing-up information for -b flag (called  lex.bck  on
 | 
						|
          some systems).
 | 
						|
 | 
						|
DEFICIENCIES / BUGS
 | 
						|
     Some trailing context patterns cannot  be  properly  matched
 | 
						|
     and  generate  warning  messages  ("dangerous  trailing con-
 | 
						|
     text").  These are patterns where the ending  of  the  first
 | 
						|
     part  of  the rule matches the beginning of the second part,
 | 
						|
     such as "zx*/xy*", where the 'x*' matches  the  'x'  at  the
 | 
						|
     beginning  of  the  trailing  context.  (Note that the POSIX
 | 
						|
     draft states that the text matched by such patterns is unde-
 | 
						|
     fined.)
 | 
						|
 | 
						|
     For some trailing context rules, parts  which  are  actually
 | 
						|
     fixed-length  are  not  recognized  as  such, leading to the
 | 
						|
     abovementioned performance loss.  In particular, parts using
 | 
						|
     '|'   or  {n}  (such  as  "foo{3}")  are  always  considered
 | 
						|
     variable-length.
 | 
						|
 | 
						|
     Combining trailing context with the special '|'  action  can
 | 
						|
     result  in fixed trailing context being turned into the more
 | 
						|
     expensive variable trailing context.  For  example,  in  the
 | 
						|
     following:
 | 
						|
 | 
						|
         %%
 | 
						|
         abc      |
 | 
						|
         xyz/def
 | 
						|
 | 
						|
 | 
						|
     Use of unput() invalidates yytext  and  yyleng,  unless  the
 | 
						|
     %array directive or the -l option has been used.
 | 
						|
 | 
						|
     Pattern-matching  of  NUL's  is  substantially  slower  than
 | 
						|
     matching other characters.
 | 
						|
 | 
						|
     Dynamic resizing of the input buffer is slow, as it  entails
 | 
						|
     rescanning  all the text matched so far by the current (gen-
 | 
						|
     erally huge) token.
 | 
						|
 | 
						|
     Due to both buffering of input and  read-ahead,  you  cannot
 | 
						|
     intermix  calls to <stdio.h> routines, such as, for example,
 | 
						|
     getchar(), with flex rules and  expect  it  to  work.   Call
 | 
						|
     input() instead.
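
     The read-ahead problem can be reproduced without flex at all.
     What follows is only a sketch in plain C, not flex's actual
     buffering code.  The helper names read_block() and
     char_seen_by_direct_read(), the 4-byte block size, and the use
     of a temporary file are all assumptions made for illustration:
     read_block() stands in for the scanner filling its private
     buffer, and the direct fgetc() call stands in for getchar().

```c
#include <assert.h>
#include <stdio.h>

#define BLOCK_SIZE 4   /* illustrative only; real scanner buffers are far larger */

/* Hypothetical stand-in for a scanner filling its private buffer:
 * reads up to BLOCK_SIZE bytes from fp into buf, returns the count. */
static size_t read_block(FILE *fp, char *buf)
{
    return fread(buf, 1, BLOCK_SIZE, fp);
}

/* Writes "abcdef" to a temporary file, lets the "scanner" read ahead
 * one block, treats only the first buffered character as matched, and
 * then reads directly from the stream.  Returns the character that the
 * direct read yields, or EOF if the temporary file cannot be set up. */
static int char_seen_by_direct_read(void)
{
    char buf[BLOCK_SIZE];
    FILE *fp = tmpfile();
    int c;

    if (fp == NULL)
        return EOF;

    fputs("abcdef", fp);
    rewind(fp);

    if (read_block(fp, buf) != BLOCK_SIZE) {
        fclose(fp);
        return EOF;
    }

    /* The "scanner" has matched buf[0] only; the rest of the block is
     * still waiting, unread, inside buf. */
    assert(buf[0] == 'a');

    /* A direct stdio read bypasses buf and continues after the block. */
    c = fgetc(fp);
    fclose(fp);
    return c;
}
```

     Calling char_seen_by_direct_read() yields 'e' rather than 'b':
     the characters 'b', 'c' and 'd' are already held in the private
     buffer, so the direct read skips them.  The same effect is why
     the scanner's own input() routine, which reads from that buffer,
     is the safe choice inside actions.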
 | 
						|
 | 
						|
     The total number of table entries reported by the -v flag
     excludes the entries needed to determine which rule has
     been matched.  The number of those entries is equal to the
     number of DFA states if the scanner does not use REJECT, and
     somewhat greater than the number of states if it does.
 | 
						|
 | 
						|
     REJECT cannot be used with the -f or -F options.
 | 
						|
 | 
						|
     The flex internal algorithms need documentation.
 | 
						|
 | 
						|
SEE ALSO
 | 
						|
     lex(1), yacc(1), sed(1), awk(1).
 | 
						|
 | 
						|
     John Levine,  Tony  Mason,  and  Doug  Brown,  Lex  &  Yacc,
 | 
						|
     O'Reilly and Associates.  Be sure to get the 2nd edition.
 | 
						|
 | 
						|
     M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator
 | 
						|
 | 
						|
     Alfred Aho, Ravi Sethi and Jeffrey Ullman, Compilers:  Prin-
 | 
						|
     ciples,   Techniques   and   Tools,  Addison-Wesley  (1986).
 | 
						|
     Describes  the  pattern-matching  techniques  used  by  flex
 | 
						|
     (deterministic finite automata).
 | 
						|
 | 
						|
AUTHOR
 | 
						|
     Vern Paxson, with the help of many ideas and  much  inspira-
 | 
						|
     tion  from Van Jacobson.  Original version by Jef Poskanzer.
 | 
						|
     The fast table representation is a partial implementation of
 | 
						|
     a  design done by Van Jacobson.  The implementation was done
 | 
						|
     by Kevin Gong and Vern Paxson.
 | 
						|
 | 
						|
     Thanks to the many flex beta-testers, feedbackers, and  con-
 | 
						|
     tributors,  especially Francois Pinard, Casey Leedom, Robert
 | 
						|
     Abramovitz,  Stan  Adermann,  Terry  Allen,  David   Barker-
 | 
						|
     Plummer,  John  Basrai,  Neal  Becker,  Nelson  H.F.  Beebe,
 | 
						|
     benson@odi.com, Karl Berry, Peter A. Bigot, Simon Blanchard,
 | 
						|
     Keith  Bostic,  Frederic Brehm, Ian Brockbank, Kin Cho, Nick
 | 
						|
     Christopher, Brian Clapper, J.T.  Conklin,  Jason  Coughlin,
 | 
						|
     Bill  Cox,  Nick  Cropper, Dave Curtis, Scott David Daniels,
 | 
						|
     Chris  G.  Demetriou,  Theo  Deraadt,  Mike  Donahue,  Chuck
 | 
						|
     Doucette,  Tom  Epperly,  Leo  Eskin,  Chris  Faylor,  Chris
 | 
						|
     Flatters, Jon Forrest, Jeffrey Friedl, Joe Gayda,  Kaveh  R.
 | 
						|
     Ghazi,  Wolfgang  Glunz, Eric Goldman, Christopher M. Gould,
 | 
						|
     Ulrich Grepel, Peer Griebel, Jan  Hajic,  Charles  Hemphill,
 | 
						|
     NORO  Hideo,  Jarkko  Hietaniemi, Scott Hofmann, Jeff Honig,
 | 
						|
     Dana Hudes, Eric Hughes,  John  Interrante,  Ceriel  Jacobs,
 | 
						|
     Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones, Henry
 | 
						|
     Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O  Kane,
 | 
						|
     Amir  Katz, ken@ken.hilco.com, Kevin B. Kenny, Steve Kirsch,
 | 
						|
     Winfried Koenig, Marq  Kole,  Ronald  Lamprecht,  Greg  Lee,
 | 
						|
     Rohan  Lenard, Craig Leres, John Levine, Steve Liddle, David
 | 
						|
     Loffredo, Mike Long, Mohamed el Lozy, Brian  Madsen,  Malte,
 | 
						|
     Joe Marshall, Bengt Martensson, Chris Metcalf, Luke Mewburn,
 | 
						|
     Jim Meyering,  R.  Alexander  Milowski,  Erik  Naggum,  G.T.
 | 
						|
     Nicol,  Landon  Noll,  James  Nordby,  Marc  Nozell, Richard
 | 
						|
     Ohnemus, Karsten Pahnke, Sven Panne,  Roland  Pesch,  Walter
 | 
						|
     Pelissero,  Gaumond  Pierre, Esmond Pitt, Jef Poskanzer, Joe
 | 
						|
     Rahmeh, Jarmo Raiha, Frederic Raimbault,  Pat  Rankin,  Rick
 | 
						|
     Richardson,  Kevin  Rodgers,  Kai  Uwe  Rommel, Jim Roskind,
 | 
						|
     Alberto Santini,  Andreas  Scherer,  Darrell  Schiebel,  Raf
 | 
						|
     Schietekat,  Doug  Schmidt,  Philippe  Schnoebelen,  Andreas
 | 
						|
     Schwab, Larry Schwimmer, Alex Siegel, Eckehard  Stolz,  Jan-
 | 
						|
     Erik  Strömquist, Mike Stump, Paul Stuart, Dave Tallman, Ian
 | 
						|
     Lance Taylor, Chris Thewalt, Richard M. Timoney, Jodi  Tsai,
 | 
						|
     Paul  Tuinenga,  Gary  Weik, Frank Whaley, Gerhard Wilhelms,
 | 
						|
     Kent Williams, Ken Yap,  Ron  Zellar,  Nathan  Zelle,  David
 | 
						|
     Zuhn,  and  those whose names have slipped my marginal mail-
 | 
						|
     archiving skills but whose contributions are appreciated all
 | 
						|
     the same.
 | 
						|
 | 
						|
     Thanks to Keith Bostic, Jon  Forrest,  Noah  Friedman,  John
 | 
						|
     Gilmore, Craig Leres, John Levine, Bob Mulcahy, G.T.  Nicol,
 | 
						|
     Francois Pinard, Rich Salz, and Richard  Stallman  for  help
 | 
						|
     with various distribution headaches.
 | 
						|
 | 
						|
     Thanks to Esmond Pitt and Earle Horton for  8-bit  character
 | 
						|
     support; to Benson Margulies and Fred Burke for C++ support;
 | 
						|
     to Kent Williams and Tom Epperly for C++ class  support;  to
 | 
						|
     Ove  Ewerlid  for  support  of NUL's; and to Eric Hughes for
 | 
						|
     support of multiple buffers.
 | 
						|
 | 
						|
     This work was primarily done when I was with the  Real  Time
 | 
						|
     Systems  Group at the Lawrence Berkeley Laboratory in Berke-
 | 
						|
     ley, CA.  Many  thanks  to  all  there  for  the  support  I
 | 
						|
     received.
 | 
						|
 | 
						|
     Send comments to vern@ee.lbl.gov.
 | 