| FLEX(1)                  USER COMMANDS                    FLEX(1)
 | |
| 
 | |
| 
 | |
| 
 | |
| NAME
 | |
|      flex - fast lexical analyzer generator
 | |
| 
 | |
| SYNOPSIS
 | |
|      flex [-bcdfhilnpstvwBFILTV78+? -C[aefFmr] -ooutput  -Pprefix
 | |
|      -Sskeleton] [--help --version] [filename ...]
 | |
| 
 | |
| OVERVIEW
 | |
|      This manual describes flex, a tool for  generating  programs
 | |
|      that  perform pattern-matching on text.  The manual includes
 | |
|      both tutorial and reference sections:
 | |
| 
 | |
|          Description
 | |
|              a brief overview of the tool
 | |
| 
 | |
|          Some Simple Examples
 | |
| 
 | |
|          Format Of The Input File
 | |
| 
 | |
|          Patterns
 | |
|              the extended regular expressions used by flex
 | |
| 
 | |
|          How The Input Is Matched
 | |
|              the rules for determining what has been matched
 | |
| 
 | |
|          Actions
 | |
|              how to specify what to do when a pattern is matched
 | |
| 
 | |
|          The Generated Scanner
 | |
|              details regarding the scanner that flex produces;
 | |
|              how to control the input source
 | |
| 
 | |
|          Start Conditions
 | |
|              introducing context into your scanners, and
 | |
|              managing "mini-scanners"
 | |
| 
 | |
|          Multiple Input Buffers
 | |
|              how to manipulate multiple input sources; how to
 | |
|              scan from strings instead of files
 | |
| 
 | |
|          End-of-file Rules
 | |
|              special rules for matching the end of the input
 | |
| 
 | |
|          Miscellaneous Macros
 | |
|              a summary of macros available to the actions
 | |
| 
 | |
|          Values Available To The User
 | |
|              a summary of values available to the actions
 | |
| 
 | |
|          Interfacing With Yacc
 | |
|              connecting flex scanners together with yacc parsers
 | |
| 
 | |
| Version 2.5          Last change: April 1995                    1
 | |
| 
 | |
|          Options
 | |
|              flex command-line options, and the "%option"
 | |
|              directive
 | |
| 
 | |
|          Performance Considerations
 | |
|              how to make your scanner go as fast as possible
 | |
| 
 | |
|          Generating C++ Scanners
 | |
|              the (experimental) facility for generating C++
 | |
|              scanner classes
 | |
| 
 | |
|          Incompatibilities With Lex And POSIX
 | |
|              how flex differs from AT&T lex and the POSIX lex
 | |
|              standard
 | |
| 
 | |
|          Diagnostics
 | |
|              those error messages produced by flex (or scanners
 | |
|              it generates) whose meanings might not be apparent
 | |
| 
 | |
|          Files
 | |
|              files used by flex
 | |
| 
 | |
|          Deficiencies / Bugs
 | |
|              known problems with flex
 | |
| 
 | |
|          See Also
 | |
|              other documentation, related tools
 | |
| 
 | |
|          Author
 | |
|              includes contact information
 | |
| 
 | |
| 
 | |
| DESCRIPTION
 | |
|      flex is a  tool  for  generating  scanners:  programs  which
 | |
|      recognize  lexical  patterns in text.  flex reads the given
 | |
|      input files, or its standard input  if  no  file  names  are
 | |
|      given,  for  a  description  of  a scanner to generate.  The
 | |
|      description is in the form of pairs of  regular  expressions
 | |
|      and  C  code,  called  rules.  flex  generates as output a C
 | |
|      source file, lex.yy.c, which defines a routine yylex(). This
 | |
|      file is compiled and linked with the -lfl library to produce
 | |
|      an executable.  When the executable is run, it analyzes  its
 | |
|      input  for occurrences of the regular expressions.  Whenever
 | |
|      it finds one, it executes the corresponding C code.
 | |
| 
 | |
| SOME SIMPLE EXAMPLES
 | |
|      First some simple examples to get the flavor of how one uses
 | |
|      flex.  The  following  flex  input specifies a scanner which
 | |
|      whenever it encounters the string "username" will replace it
 | |
|      with the user's login name:
 | |
| 
 | |
|          %%
 | |
|          username    printf( "%s", getlogin() );
 | |
| 
 | |
|      By default, any text not matched by a flex scanner is copied
 | |
|      to  the output, so the net effect of this scanner is to copy
 | |
|      its input file to its output with each occurrence of
 | |
|      "username" expanded.  In this input, there is just one rule.
 | |
|      "username" is the pattern and the "printf"  is  the  action.
 | |
|      The "%%" marks the beginning of the rules.
 | |
| 
 | |
|      Here's another simple example:
 | |
| 
 | |
|                  int num_lines = 0, num_chars = 0;
 | |
| 
 | |
|          %%
 | |
|          \n      ++num_lines; ++num_chars;
 | |
|          .       ++num_chars;
 | |
| 
 | |
|          %%
 | |
|          int main( void )
 | |
|                  {
 | |
|                  yylex();
 | |
|                  printf( "# of lines = %d, # of chars = %d\n",
 | |
|                          num_lines, num_chars );
 | |
|                  return 0;
 | |
|                  }
 | |
| 
 | |
|      This scanner counts the number of characters and the  number
 | |
|      of  lines in its input (it produces no output other than the
 | |
|      final report on the counts).  The first  line  declares  two
 | |
|      globals,  "num_lines"  and "num_chars", which are accessible
 | |
|      both inside yylex() and in the main() routine declared after
 | |
|      the  second  "%%".  There are two rules, one which matches a
 | |
|      newline ("\n") and increments both the line  count  and  the
 | |
|      character  count,  and one which matches any character other
 | |
|      than a newline (indicated by the "." regular expression).
 | |
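     The effect of the two rules can be sketched in plain C, without
     flex.  This is only an illustration under stated assumptions: the
     function name count_text() is hypothetical, and the sketch reads a
     NUL-terminated string rather than yyin.

```c
/* Apply the two rules to every character of a NUL-terminated string:
 * a newline bumps both counters, any other character bumps only
 * num_chars.  count_text() is a hypothetical name for this sketch. */
void count_text(const char *text, int *num_lines, int *num_chars)
{
    *num_lines = 0;
    *num_chars = 0;

    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '\n')
            ++*num_lines;   /* the "\n" rule also counts a line */
        ++*num_chars;       /* both rules count the character */
    }
}
```

     Called on the text "ab\ncd\n", the sketch reports two lines and six
     characters, which is what the scanner above would print for the
     same input.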
| 
 | |
|      A somewhat more complicated example:
 | |
| 
 | |
|          /* scanner for a toy Pascal-like language */
 | |
| 
 | |
|          %{
 | |
|          /* need this for the calls to atoi() and atof() below */
 | |
|          #include <stdlib.h>
 | |
|          %}
 | |
| 
 | |
|          DIGIT    [0-9]
 | |
|          ID       [a-z][a-z0-9]*
 | |
| 
 | |
|          %%
 | |
| 
 | |
|          {DIGIT}+    {
 | |
|                      printf( "An integer: %s (%d)\n", yytext,
 | |
|                              atoi( yytext ) );
 | |
|                      }
 | |
| 
 | |
|          {DIGIT}+"."{DIGIT}*        {
 | |
|                      printf( "A float: %s (%g)\n", yytext,
 | |
|                              atof( yytext ) );
 | |
|                      }
 | |
| 
 | |
|          if|then|begin|end|procedure|function        {
 | |
|                      printf( "A keyword: %s\n", yytext );
 | |
|                      }
 | |
| 
 | |
|          {ID}        printf( "An identifier: %s\n", yytext );
 | |
| 
 | |
|          "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );
 | |
| 
 | |
|          "{"[^}\n]*"}"     /* eat up one-line comments */
 | |
| 
 | |
|          [ \t\n]+          /* eat up whitespace */
 | |
| 
 | |
|          .           printf( "Unrecognized character: %s\n", yytext );
 | |
| 
 | |
|          %%
 | |
| 
 | |
|          int main( int argc, char **argv )
 | |
|              {
 | |
|              ++argv, --argc;  /* skip over program name */
 | |
|              if ( argc > 0 )
 | |
|                      yyin = fopen( argv[0], "r" );
 | |
|              else
 | |
|                      yyin = stdin;
 | |
| 
 | |
|              yylex();
 | |
|              return 0;
 | |
|              }
 | |
| 
 | |
|      This is the beginning of a simple scanner for a language
 | |
|      like  Pascal.   It  identifies different types of tokens and
 | |
|      reports on what it has seen.
 | |
| 
 | |
|      The details of this example will be explained in the follow-
 | |
|      ing sections.
 | |
| 
 | |
| FORMAT OF THE INPUT FILE
 | |
|      The flex input file consists of three sections, separated by
 | |
|      a line with just %% in it:
 | |
| 
 | |
|          definitions
 | |
|          %%
 | |
|          rules
 | |
|          %%
 | |
|          user code
 | |
| 
 | |
|      The definitions section contains declarations of simple name
 | |
|      definitions  to  simplify  the  scanner  specification,  and
 | |
|      declarations of start conditions, which are explained  in  a
 | |
|      later section.
 | |
| 
 | |
|      Name definitions have the form:
 | |
| 
 | |
|          name definition
 | |
| 
 | |
|      The "name" is a word beginning with a letter  or  an  under-
 | |
|      score  ('_')  followed by zero or more letters, digits, '_',
 | |
|      or '-' (dash).  The definition is  taken  to  begin  at  the
 | |
|      first  non-white-space character following the name and con-
 | |
|      tinuing to the end of the line.  The definition  can  subse-
 | |
|      quently  be referred to using "{name}", which will expand to
 | |
|      "(definition)".  For example,
 | |
| 
 | |
|          DIGIT    [0-9]
 | |
|          ID       [a-z][a-z0-9]*
 | |
| 
 | |
|      defines "DIGIT" to be a regular expression which  matches  a
 | |
|      single  digit,  and  "ID"  to  be a regular expression which
 | |
|      matches a letter followed by zero-or-more letters-or-digits.
 | |
|      A subsequent reference to
 | |
| 
 | |
|          {DIGIT}+"."{DIGIT}*
 | |
| 
 | |
|      is identical to
 | |
| 
 | |
|          ([0-9])+"."([0-9])*
 | |
| 
 | |
|      and matches one-or-more digits followed by a '.' followed by
 | |
|      zero-or-more digits.
 | |
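     The textual substitution of "{name}" by "(definition)" can be
     sketched in C as follows.  This is a simplified illustration, not
     flex's own code: the names struct name_def, find_def() and
     expand_names() are hypothetical, and the sketch does not treat
     quoted strings or character classes specially.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

struct name_def {                 /* hypothetical table entry */
    const char *name;
    const char *definition;
};

/* Return the definition for name[0..len-1], or NULL if unknown. */
static const char *find_def(const struct name_def *defs, size_t ndefs,
                            const char *name, size_t len)
{
    for (size_t i = 0; i < ndefs; i++)
        if (strlen(defs[i].name) == len &&
            strncmp(defs[i].name, name, len) == 0)
            return defs[i].definition;
    return NULL;
}

/* Copy pattern to out, replacing each known {name} by (definition).
 * Returns 0 on success, -1 if out (outsz bytes) is too small. */
int expand_names(const char *pattern, const struct name_def *defs,
                 size_t ndefs, char *out, size_t outsz)
{
    size_t o = 0;

    while (*pattern != '\0') {
        const char *close = NULL;
        const char *def = NULL;

        /* A name starts with a letter or '_', so r{2,5} is left alone. */
        if (pattern[0] == '{' &&
            (isalpha((unsigned char)pattern[1]) || pattern[1] == '_') &&
            (close = strchr(pattern, '}')) != NULL)
            def = find_def(defs, ndefs, pattern + 1,
                           (size_t)(close - pattern - 1));

        if (def != NULL) {
            int n = snprintf(out + o, outsz - o, "(%s)", def);
            if (n < 0 || (size_t)n >= outsz - o)
                return -1;
            o += (size_t)n;
            pattern = close + 1;
        } else {
            if (o + 1 >= outsz)
                return -1;
            out[o++] = *pattern++;
        }
    }
    if (o >= outsz)
        return -1;
    out[o] = '\0';
    return 0;
}
```

     With DIGIT defined as [0-9], the sketch turns {DIGIT}+"."{DIGIT}*
     into ([0-9])+"."([0-9])*, matching the expansion shown above.  The
     added parentheses are what keep a following operator such as '+'
     applied to the whole definition.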
| 
 | |
|      The rules section of the flex input  contains  a  series  of
 | |
|      rules of the form:
 | |
| 
 | |
|          pattern   action
 | |
| 
 | |
|      where the pattern must be unindented  and  the  action  must
 | |
|      begin on the same line.
 | |
| 
 | |
|      See below for a further description of patterns and actions.
 | |
| 
 | |
|      Finally, the user code section is simply copied to  lex.yy.c
 | |
|      verbatim.   It  is used for companion routines which call or
 | |
|      are called by the scanner.  The presence of this section  is
 | |
|      optional;  if it is missing, the second %% in the input file
 | |
|      may be skipped, too.
 | |
| 
 | |
|      In the definitions and rules sections, any indented text  or
 | |
|      text  enclosed in %{ and %} is copied verbatim to the output
 | |
|      (with the %{}'s removed).  The %{}'s must appear  unindented
 | |
|      on lines by themselves.
 | |
| 
 | |
|      In the rules section, any indented  or  %{}  text  appearing
 | |
|      before the first rule may be used to declare variables which
 | |
|      are local to the scanning routine and  (after  the  declara-
 | |
|      tions)  code  which  is to be executed whenever the scanning
 | |
|      routine is entered.  Other indented or %{} text in the  rule
 | |
|      section  is  still  copied to the output, but its meaning is
 | |
|      not well-defined and it may well cause  compile-time  errors
 | |
|      (this feature is present for POSIX compliance; see below for
 | |
|      other such features).
 | |
| 
 | |
|      In the definitions section (but not in the  rules  section),
 | |
|      an  unindented comment (i.e., a line beginning with "/*") is
 | |
|      also copied verbatim to the output up to the next "*/".
 | |
| 
 | |
| PATTERNS
 | |
|      The patterns in the input are written using an extended  set
 | |
|      of regular expressions.  These are:
 | |
| 
 | |
|          x          match the character 'x'
 | |
|          .          any character (byte) except newline
 | |
|          [xyz]      a "character class"; in this case, the pattern
 | |
|                       matches either an 'x', a 'y', or a 'z'
 | |
|          [abj-oZ]   a "character class" with a range in it; matches
 | |
|                       an 'a', a 'b', any letter from 'j' through 'o',
 | |
|                       or a 'Z'
 | |
|          [^A-Z]     a "negated character class", i.e., any character
 | |
|                       but those in the class.  In this case, any
 | |
|                       character EXCEPT an uppercase letter.
 | |
|          [^A-Z\n]   any character EXCEPT an uppercase letter or
 | |
|                       a newline
 | |
|          r*         zero or more r's, where r is any regular expression
 | |
|          r+         one or more r's
 | |
|          r?         zero or one r's (that is, "an optional r")
 | |
|          r{2,5}     anywhere from two to five r's
 | |
|          r{2,}      two or more r's
 | |
|          r{4}       exactly 4 r's
 | |
|          {name}     the expansion of the "name" definition
 | |
|                     (see above)
 | |
|          "[xyz]\"foo"
 | |
|                     the literal string: [xyz]"foo
 | |
|          \X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
 | |
|                       then the ANSI-C interpretation of \X.
 | |
|                       Otherwise, a literal 'X' (used to escape
 | |
|                       operators such as '*')
 | |
|          \0         a NUL character (ASCII code 0)
 | |
|          \123       the character with octal value 123
 | |
|          \x2a       the character with hexadecimal value 2a
 | |
|          (r)        match an r; parentheses are used to override
 | |
|                       precedence (see below)
 | |
| 
 | |
|          rs         the regular expression r followed by the
 | |
|                       regular expression s; called "concatenation"
 | |
| 
 | |
| 
 | |
|          r|s        either an r or an s
 | |
| 
 | |
| 
 | |
|          r/s        an r but only if it is followed by an s.  The
 | |
|                       text matched by s is included when determining
 | |
|                       whether this rule is the "longest match",
 | |
|                       but is then returned to the input before
 | |
|                       the action is executed.  So the action only
 | |
|                       sees the text matched by r.  This type
 | |
|                       of pattern is called "trailing context".
 | |
|                       (There are some combinations of r/s that flex
 | |
|                       cannot match correctly; see notes in the
 | |
|                       Deficiencies / Bugs section below regarding
 | |
|                       "dangerous trailing context".)
 | |
|          ^r         an r, but only at the beginning of a line (i.e.,
 | |
|                       when just starting to scan, or right after a
 | |
|                       newline has been scanned).
 | |
|          r$         an r, but only at the end of a line (i.e., just
 | |
|                       before a newline).  Equivalent to "r/\n".
 | |
| 
 | |
|                     Note that flex's notion of "newline" is exactly
 | |
|                     whatever the C compiler used to compile flex
 | |
|                     interprets '\n' as; in particular, on some DOS
 | |
|                     systems you must either filter out \r's in the
 | |
|                     input yourself, or explicitly use r/\r\n for "r$".
 | |
| 
 | |
| 
 | |
|          <s>r       an r, but only in start condition s (see
 | |
|                       below for discussion of start conditions)
 | |
|          <s1,s2,s3>r
 | |
|                     same, but in any of start conditions s1,
 | |
|                       s2, or s3
 | |
|          <*>r       an r in any start condition, even an exclusive one.
 | |
| 
 | |
| 
 | |
|          <<EOF>>    an end-of-file
 | |
|          <s1,s2><<EOF>>
 | |
|                     an end-of-file when in start condition s1 or s2
 | |
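     The escape entries in the table above (\X, \0, \123 and \x2a) can
     be sketched as a small C decoder.  This is an illustration only:
     decode_escape() is a hypothetical name, the limits of three octal
     and two hexadecimal digits are assumptions of the sketch, and the
     hexadecimal arithmetic assumes an ASCII-compatible character set.

```c
#include <ctype.h>
#include <stddef.h>

/* Decode the escape sequence starting at s (s[0] must be a backslash).
 * Returns the character value and stores in *used how many bytes of s
 * the sequence occupied. */
int decode_escape(const char *s, size_t *used)
{
    const char *p = s + 1;          /* skip the backslash */
    int value = 0;
    int digits = 0;

    if (*p == '\0') {               /* lone backslash at end of string */
        *used = 1;
        return '\\';
    }

    if (*p == 'x' && isxdigit((unsigned char)p[1])) {
        /* \x2a style: hexadecimal value */
        for (p++; digits < 2 && isxdigit((unsigned char)*p); p++, digits++) {
            int c = tolower((unsigned char)*p);
            value = value * 16 + (isdigit(c) ? c - '0' : c - 'a' + 10);
        }
    } else if (*p >= '0' && *p <= '7') {
        /* \0 and \123 style: octal value */
        for (; digits < 3 && *p >= '0' && *p <= '7'; p++, digits++)
            value = value * 8 + (*p - '0');
    } else {
        switch (*p) {
        case 'a': value = '\a'; break;
        case 'b': value = '\b'; break;
        case 'f': value = '\f'; break;
        case 'n': value = '\n'; break;
        case 'r': value = '\r'; break;
        case 't': value = '\t'; break;
        case 'v': value = '\v'; break;
        default:  value = (unsigned char)*p; break;  /* literal X */
        }
        p++;
    }

    *used = (size_t)(p - s);
    return value;
}
```

     In ASCII, \123 decodes to 'S' and \x2a to '*', while \* simply
     yields a literal '*', which is how an operator is escaped.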
| 
 | |
|      Note that inside of a character class, all  regular  expres-
 | |
|      sion  operators  lose  their  special  meaning except escape
 | |
|      ('\') and the character class operators, '-', ']',  and,  at
 | |
|      the beginning of the class, '^'.
 | |
| 
 | |
|      The regular expressions listed above are  grouped  according
 | |
|      to  precedence, from highest precedence at the top to lowest
 | |
|      at the bottom.   Those  grouped  together  have  equal  pre-
 | |
|      cedence.  For example,
 | |
| 
 | |
|          foo|bar*
 | |
| 
 | |
|      is the same as
 | |
| 
 | |
|          (foo)|(ba(r*))
 | |
| 
 | |
|      since the '*' operator has higher precedence than concatena-
 | |
|      tion, and concatenation higher than alternation ('|').  This
 | |
|      pattern therefore matches either the  string  "foo"  or  the
 | |
|      string "ba" followed by zero-or-more r's.  To match "foo" or
 | |
|      zero-or-more "bar"'s, use:
 | |
| 
 | |
|          foo|(bar)*
 | |
| 
 | |
|      and to match zero-or-more "foo"'s-or-"bar"'s:
 | |
| 
 | |
|          (foo|bar)*
 | |
| 
 | |
| 
 | |
|      In addition to characters and ranges of characters,  charac-
 | |
|      ter  classes  can  also contain character class expressions.
 | |
|      These are expressions enclosed inside [: and  :]  delimiters
 | |
|      (which themselves must appear between the '[' and ']' of the
 | |
|      character class; other elements may occur inside the charac-
 | |
|      ter class, too).  The valid expressions are:
 | |
| 
 | |
|          [:alnum:] [:alpha:] [:blank:]
 | |
|          [:cntrl:] [:digit:] [:graph:]
 | |
|          [:lower:] [:print:] [:punct:]
 | |
|          [:space:] [:upper:] [:xdigit:]
 | |
| 
 | |
|      These  expressions  all  designate  a  set   of   characters
 | |
|      equivalent  to  the corresponding standard C isXXX function.
 | |
|      For example, [:alnum:] designates those characters for which
 | |
|      isalnum() returns true, i.e., any alphabetic or numeric character.
 | |
|      Some  systems  don't  provide  isblank(),  so  flex  defines
 | |
|      [:blank:] as a blank or a tab.
 | |
| 
 | |
|      For  example,  the  following  character  classes  are   all
 | |
|      equivalent:
 | |
| 
 | |
|          [[:alnum:]]
 | |
|          [[:alpha:][:digit:]]
 | |
|          [[:alpha:]0-9]
 | |
|          [a-zA-Z0-9]
 | |
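     The equivalence of [[:alnum:]] and [a-zA-Z0-9] can be checked
     directly against isalnum().  The following C sketch assumes the
     default "C" locale and an ASCII character set, and restricts
     itself to the values 0 through 127; the function names are
     hypothetical.

```c
#include <ctype.h>

/* Membership test for the hand-written class [a-zA-Z0-9]. */
int in_explicit_alnum(int c)
{
    return (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9');
}

/* [:blank:] as this manual defines it: a blank or a tab. */
int in_blank(int c)
{
    return c == ' ' || c == '\t';
}

/* Count the ASCII values on which isalnum() and [a-zA-Z0-9] disagree. */
int count_alnum_mismatches(void)
{
    int mismatches = 0;

    for (int c = 0; c < 128; c++)
        if ((isalnum(c) != 0) != (in_explicit_alnum(c) != 0))
            mismatches++;
    return mismatches;
}
```

     Under those assumptions count_alnum_mismatches() returns zero.  In
     other locales isalnum() may accept additional characters, in which
     case the classes are no longer identical.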
| 
 | |
|      If your scanner is  case-insensitive  (the  -i  flag),  then
 | |
|      [:upper:] and [:lower:] are equivalent to [:alpha:].
 | |
| 
 | |
|      Some notes on patterns:
 | |
| 
 | |
|      -    A negated character class such as the example  "[^A-Z]"
 | |
|           above   will   match  a  newline  unless  "\n"  (or  an
 | |
|           equivalent escape sequence) is one  of  the  characters
 | |
|           explicitly  present  in  the  negated  character  class
 | |
|           (e.g., "[^A-Z\n]").  This is unlike how many other reg-
 | |
|           ular  expression tools treat negated character classes,
 | |
|           but unfortunately  the  inconsistency  is  historically
 | |
|           entrenched.   Matching  newlines  means  that a pattern
 | |
|           like [^"]* can match the entire  input  unless  there's
 | |
|           another quote in the input.
 | |
| 
 | |
|      -    A rule can have at most one instance of  trailing  con-
 | |
|           text (the '/' operator or the '$' operator).  The start
 | |
|           condition, '^', and "<<EOF>>" patterns can  only  occur
 | |
|           at the beginning of a pattern and, like '/' and  '$',
 | |
|           cannot be grouped inside parentheses.   A  '^'
 | |
|           which  does  not  occur at the beginning of a rule or a
 | |
|           '$' which does not occur at the end of a rule loses its
 | |
|           special  properties  and is treated as a normal charac-
 | |
|           ter.
 | |
| 
 | |
|           The following are illegal:
 | |
| 
 | |
|               foo/bar$
 | |
|               <sc1>foo<sc2>bar
 | |
| 
 | |
|           Note  that  the  first  of  these   can   be   written
 | |
|           "foo/bar\n".
 | |
| 
 | |
|           The following will result in '$' or '^'  being  treated
 | |
|           as a normal character:
 | |
| 
 | |
|               foo|(bar$)
 | |
|               foo|^bar
 | |
| 
 | |
|           If what's wanted is a  "foo"  or  a  bar-followed-by-a-
 | |
|           newline,  the  following could be used (the special '|'
 | |
|           action is explained below):
 | |
| 
 | |
|               foo      |
 | |
|               bar$     /* action goes here */
 | |
| 
 | |
|           A similar trick will work for matching a foo or a  bar-
 | |
|           at-the-beginning-of-a-line.
 | |
| 
 | |
| HOW THE INPUT IS MATCHED
 | |
|      When the generated scanner is run,  it  analyzes  its  input
 | |
|      looking  for strings which match any of its patterns.  If it
 | |
|      finds more than one match, it takes  the  one  matching  the
 | |
|      most  text  (for  trailing  context rules, this includes the
 | |
|      length of the trailing part, even though  it  will  then  be
 | |
|      returned  to the input).  If it finds two or more matches of
 | |
|      the same length, the rule listed first  in  the  flex  input
 | |
|      file is chosen.
 | |
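     The selection rule (longest match wins; on a tie, the rule listed
     first wins) can be sketched in C.  This is an illustration and not
     how flex is implemented: a generated scanner uses a single
     table-driven automaton rather than trying rules one by one.  The
     names matcher, match_keyword(), match_id() and pick_rule() are
     hypothetical, and the two matchers stand in for the keyword and
     {ID} rules of the Pascal-like example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A matcher returns how many leading characters of input it matches,
 * or 0 if it does not match at all. */
typedef size_t (*matcher)(const char *input);

/* Stand-in for: if|then|begin|end|procedure|function */
size_t match_keyword(const char *s)
{
    static const char *const kw[] = {
        "if", "then", "begin", "end", "procedure", "function"
    };
    size_t best = 0;

    for (size_t i = 0; i < sizeof kw / sizeof kw[0]; i++) {
        size_t n = strlen(kw[i]);
        if (strncmp(s, kw[i], n) == 0 && n > best)
            best = n;
    }
    return best;
}

/* Stand-in for: [a-z][a-z0-9]*  (assumes the "C" locale) */
size_t match_id(const char *s)
{
    size_t n;

    if (!islower((unsigned char)s[0]))
        return 0;
    for (n = 1; islower((unsigned char)s[n]) ||
                isdigit((unsigned char)s[n]); n++)
        ;
    return n;
}

/* Return the index of the winning rule, or -1 if none matches
 * (the case handled by the default rule).  *len receives the
 * length of the winning match. */
int pick_rule(const matcher *rules, size_t nrules,
              const char *input, size_t *len)
{
    int best = -1;
    size_t best_len = 0;

    for (size_t i = 0; i < nrules; i++) {
        size_t n = rules[i](input);
        if (n > best_len) {     /* strict '>' keeps the earlier rule on ties */
            best = (int)i;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

     With the keyword matcher listed before the identifier matcher, the
     input "if x" selects the keyword rule (both match two characters,
     and the first rule wins), while "iffy" selects the identifier rule
     because it matches four characters instead of two.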
| 
 | |
|      Once the match is determined, the text corresponding to  the
 | |
|      match  (called  the  token)  is made available in the global
 | |
|      character pointer yytext,  and  its  length  in  the  global
 | |
|      integer yyleng. The action corresponding to the matched pat-
 | |
|      tern is  then  executed  (a  more  detailed  description  of
 | |
|      actions  follows),  and  then the remaining input is scanned
 | |
|      for another match.
 | |
| 
 | |
|      If no match is found, then the default rule is executed: the
 | |
|      next character in the input is considered matched and copied
 | |
|      to the standard output.  Thus, the simplest legal flex input
 | |
|      is:
 | |
| 
 | |
|          %%
 | |
| 
 | |
|      which generates a scanner that simply copies its input  (one
 | |
|      character at a time) to its output.
 | |
| 
 | |
|      Note that yytext can  be  defined  in  two  different  ways:
 | |
|      either  as  a character pointer or as a character array. You
 | |
|      can control which definition flex uses by including  one  of
 | |
|      the  special  directives  %pointer  or  %array  in the first
 | |
|      (definitions) section of your flex input.   The  default  is
 | |
|      %pointer, unless you use the -l lex compatibility option, in
 | |
|      which case yytext will be an array.  The advantage of  using
 | |
|      %pointer  is  substantially  faster  scanning  and no buffer
 | |
|      overflow when matching very large tokens (unless you run out
 | |
|      of  dynamic  memory).  The disadvantage is that you are res-
 | |
|      tricted in how your actions can modify yytext (see the  next
 | |
|      section),  and  calls  to  the unput() function destroy the
 | |
|      present contents of yytext,  which  can  be  a  considerable
 | |
|      porting headache when moving between different lex versions.
 | |
| 
 | |
|      The advantage of %array is that you can then  modify  yytext
 | |
|      to your heart's content, and calls to unput() do not destroy
 | |
|      yytext (see  below).   Furthermore,  existing  lex  programs
 | |
|      sometimes access yytext externally using declarations of the
 | |
|      form:
 | |
|          extern char yytext[];
 | |
|      This definition is erroneous when used  with  %pointer,  but
 | |
|      correct for %array.
 | |
| 
 | |
|      %array defines yytext to be an array of  YYLMAX  characters,
 | |
|      which  defaults to a fairly large value.  You can change the
 | |
|      size by simply #define'ing YYLMAX to a  different  value  in
 | |
|      the  first  section of your flex input.  As mentioned above,
 | |
|      with %pointer yytext grows dynamically to accommodate  large
 | |
|      tokens.  While this means your %pointer scanner can accommo-
 | |
|      date very large tokens (such as matching  entire  blocks  of
 | |
|      comments),  bear  in  mind  that  each time the scanner must
 | |
|      resize yytext it also must rescan the entire token from  the
 | |
|      beginning,  so  matching such tokens can prove slow.  yytext
 | |
|      presently does not dynamically grow if  a  call  to  unput()
 | |
|      results  in too much text being pushed back; instead, a run-
 | |
|      time error results.
 | |
| 
 | |
|      Also note that  you  cannot  use  %array  with  C++  scanner
 | |
|      classes (the c++ option; see below).
 | |
| 
 | |
| ACTIONS
 | |
|      Each pattern in a rule has a corresponding action, which can
 | |
|      be any arbitrary C statement.  The pattern ends at the first
 | |
|      non-escaped whitespace character; the remainder of the  line
 | |
|      is  its  action.  If the action is empty, then when the pat-
 | |
|      tern is matched the input token is  simply  discarded.   For
 | |
|      example,  here  is  the  specification  for  a program which
 | |
|      deletes all occurrences of "zap me" from its input:
 | |
| 
 | |
|          %%
 | |
|          "zap me"
 | |
| 
 | |
|      (It will copy all other characters in the input to the  out-
 | |
|      put since they will be matched by the default rule.)
 | |
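     The combined effect of an empty action and the default rule can be
     sketched in plain C.  The function name delete_pattern() is
     hypothetical, and the sketch handles only a literal string, not a
     general regular expression.

```c
#include <string.h>

/* Copy input to out, dropping every occurrence of the literal pattern.
 * out must have room for strlen(input) + 1 bytes. */
void delete_pattern(const char *input, const char *pattern, char *out)
{
    size_t plen = strlen(pattern);

    while (*input != '\0') {
        if (plen > 0 && strncmp(input, pattern, plen) == 0)
            input += plen;          /* matched: empty action, token discarded */
        else
            *out++ = *input++;      /* default rule: copy one character */
    }
    *out = '\0';
}
```

     For the pattern "zap me", the input "a zap me b" comes out as
     "a  b": the token is dropped and the blanks on either side of it
     are copied unchanged.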
| 
 | |
|      Here is a program which compresses multiple blanks and  tabs
 | |
|      down  to a single blank, and throws away whitespace found at
 | |
|      the end of a line:
 | |
| 
 | |
|          %%
 | |
|          [ \t]+        putchar( ' ' );
 | |
|          [ \t]+$       /* ignore this token */
 | |
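     The interplay of these two rules can be sketched in C.  The second
     rule wins before a newline because its trailing context makes the
     match one character longer.  The function name squeeze_blanks() is
     hypothetical; the sketch works on a NUL-terminated string and, as
     with "r/\n", treats a run of blanks at the very end of the input
     (with no newline after it) like any other run.

```c
/* Copy input to out, replacing each run of blanks and tabs by a single
 * blank and dropping runs that directly precede a newline.
 * out must have room for strlen(input) + 1 bytes. */
void squeeze_blanks(const char *input, char *out)
{
    while (*input != '\0') {
        if (*input == ' ' || *input == '\t') {
            while (*input == ' ' || *input == '\t')
                input++;            /* consume the whole run: [ \t]+ */
            if (*input != '\n')
                *out++ = ' ';       /* first rule: emit one blank */
            /* otherwise the second rule applies: ignore the token */
        } else {
            *out++ = *input++;      /* default rule: copy the character */
        }
    }
    *out = '\0';
}
```

     For example, the two input lines "a <tab> b" followed by two
     trailing blanks, and "c" followed by a trailing tab, come out as
     "a b" and "c".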
| 
 | |
| 
 | |
|      If the action contains a '{', then the action spans till the
 | |
|      balancing  '}'  is  found, and the action may cross multiple
 | |
|      lines.  flex knows about C strings and comments and won't be
 | |
|      fooled  by braces found within them, but also allows actions
 | |
|      to begin with %{ and will consider the action to be all  the
 | |
|      text up to the next %} (regardless of ordinary braces inside
 | |
|      the action).
 | |
| 
 | |
|      An action consisting solely of a vertical  bar  ('|')  means
 | |
|      "same  as  the  action for the next rule."  See below for an
 | |
|      illustration.
 | |
| 
 | |
|      Actions can  include  arbitrary  C  code,  including  return
 | |
|      statements  to  return  a  value  to whatever routine called
 | |
|      yylex(). Each time yylex() is called it continues processing
 | |
|      tokens  from  where it last left off until it either reaches
 | |
|      the end of the file or executes a return.
 | |
| 
 | |
|      Actions are free to modify yytext except for lengthening  it
 | |
|      (adding  characters  to  its end--these will overwrite later
 | |
|      characters in the input  stream).   This  however  does  not
 | |
|      apply  when  using  %array (see above); in that case, yytext
 | |
|      may be freely modified in any way.
 | |
| 
 | |
|      Actions are free to modify yyleng except they should not  do
 | |
|      so if the action also includes use of yymore() (see below).
 | |
| 
 | |
|      There are a  number  of  special  directives  which  can  be
 | |
|      included within an action:
 | |
| 
 | |
|      -    ECHO copies yytext to the scanner's output.
 | |
| 
 | |
|      -    BEGIN followed by the name of a start condition  places
 | |
|           the  scanner  in the corresponding start condition (see
 | |
|           below).
 | |
| 
 | |
|      -    REJECT directs the scanner to proceed on to the "second
 | |
|           best"  rule which matched the input (or a prefix of the
 | |
|           input).  The rule is chosen as described above in  "How
 | |
|           the  Input  is  Matched",  and yytext and yyleng set up
 | |
|           appropriately.  It may either be one which  matched  as
 | |
|           much  text as the originally chosen rule but came later
 | |
|           in the flex input file, or one which matched less text.
 | |
|           For example, the following will both count the words in
 | |
|           the input  and  call  the  routine  special()  whenever
 | |
|           "frob" is seen:
 | |
| 
 | |
|                       int word_count = 0;
 | |
|               %%
 | |
| 
 | |
|               frob        special(); REJECT;
 | |
|               [^ \t\n]+   ++word_count;
 | |
| 
 | |
|           Without the REJECT, any "frob"'s in the input would not
 | |
|           be  counted  as  words, since the scanner normally exe-
 | |
|           cutes only one action per token.  Multiple REJECT's are
 | |
|           allowed,  each  one finding the next best choice to the
 | |
|           currently active rule.  For example, when the following
 | |
|           scanner scans the token "abcd", it will write
 | |
|           "abcdabcaba" to the output:
 | |
| 
 | |
|               %%
 | |
|               a        |
 | |
|               ab       |
 | |
|               abc      |
 | |
|               abcd     ECHO; REJECT;
 | |
|               .|\n     /* eat up any unmatched character */
 | |
| 
 | |
|           (The first three rules share the fourth's action  since
 | |
|           they   use   the  special  '|'  action.)  REJECT  is  a
 | |
|           particularly expensive feature in terms of scanner per-
 | |
|           formance; if it is used in any of the scanner's actions
 | |
|           it will  slow  down  all  of  the  scanner's  matching.
 | |
|           Furthermore,  REJECT cannot be used with the -Cf or -CF
 | |
|           options (see below).
 | |
| 
 | |
|           Note also that unlike the other special actions, REJECT
 | |
|           is  a  branch;  code  immediately  following  it in the
 | |
|           action will not be executed.
 | |
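
          The order in which REJECT works through the alternatives
          can be modelled in plain C.  This is only a sketch for
          literal patterns, not the way flex implements REJECT; the
          name echo_reject_chain() and its parameters are invented
          for the illustration.  It appends every pattern that
          matches a prefix of the input, longest first, with equal
          lengths taken in rule order:

```c
#include <string.h>

/*
 * Model of a chain of "ECHO; REJECT;" actions over literal patterns.
 * Every pattern that matches a prefix of input is appended to out,
 * longest match first; patterns of equal length keep their rule order.
 * Output that would not fit in outsize bytes is silently dropped.
 */
static void echo_reject_chain(const char *input,
                              const char *const patterns[], size_t npatterns,
                              char *out, size_t outsize)
{
    size_t used = 0;
    size_t len, i;

    if (outsize == 0)
        return;
    out[0] = '\0';

    for (len = strlen(input); len > 0; --len) {
        for (i = 0; i < npatterns; ++i) {
            if (strlen(patterns[i]) == len &&
                strncmp(input, patterns[i], len) == 0 &&
                used + len < outsize) {
                memcpy(out + used, patterns[i], len);   /* the ECHO */
                used += len;
                out[used] = '\0';
            }
            /* leaving the if models the REJECT: try the next candidate */
        }
    }
}
```

          Called with the patterns "a", "ab", "abc" and "abcd" on
          the input "abcd", it leaves "abcdabcaba" in out, the same
          text the scanner above writes.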
| 
 | |
|      -    yymore() tells  the  scanner  that  the  next  time  it
 | |
|           matches  a  rule,  the  corresponding  token  should be
 | |
|           appended onto the current value of yytext  rather  than
 | |
|           replacing it.  For example, given the input
 | |
|           "mega-kludge", the following will write
 | |
|           "mega-mega-kludge" to the output:
 | |
| 
 | |
|               %%
 | |
|               mega-    ECHO; yymore();
 | |
|               kludge   ECHO;
 | |
| 
 | |
|           First "mega-" is matched  and  echoed  to  the  output.
 | |
|           Then  "kludge"  is matched, but the previous "mega-" is
 | |
|           still hanging around at the beginning of yytext, so
 | |
|           the ECHO for the "kludge" rule will actually write
 | |
|           "mega-kludge".
 | |
| 
 | |
|      Two notes regarding use of yymore(). First, yymore() depends
 | |
|      on  the value of yyleng correctly reflecting the size of the
 | |
|      current token, so you must not  modify  yyleng  if  you  are
 | |
|      using  yymore().  Second,  the  presence  of yymore() in the
 | |
|      scanner's action entails a minor performance penalty in  the
 | |
|      scanner's matching speed.
 | |
| 
 | |
|      -    yyless(n) returns all but the first n characters of the
 | |
|           current token back to the input stream, where they will
 | |
|           be rescanned when the scanner looks for the next match.
 | |
|           yytext  and  yyleng  are  adjusted appropriately (e.g.,
 | |
|           yyleng will now be equal to n).  For example, on the
 | |
|           input "foobar" the following will write out
 | |
|           "foobarbar":
 | |
| 
 | |
|               %%
 | |
|               foobar    ECHO; yyless(3);
 | |
|               [a-z]+    ECHO;
 | |
| 
 | |
|           An argument of  0  to  yyless  will  cause  the  entire
 | |
|           current  input  string  to  be  scanned  again.  Unless
 | |
|           you've changed how the scanner will  subsequently  pro-
 | |
|           cess  its  input  (using BEGIN, for example), this will
 | |
|           result in an endless loop.
 | |
| 
 | |
|      Note that yyless is a macro and can only be used in the flex
 | |
|      input file, not from other source files.
 | |
| 
 | |
|      -    unput(c) puts the  character  c  back  onto  the  input
 | |
|           stream.   It  will  be the next character scanned.  The
 | |
|           following action will take the current token and  cause
 | |
|           it to be rescanned enclosed in parentheses.
 | |
| 
 | |
|               {
 | |
|               int i;
 | |
|               /* Copy yytext because unput() trashes yytext */
 | |
|               char *yycopy = strdup( yytext );
 | |
|               unput( ')' );
 | |
|               for ( i = yyleng - 1; i >= 0; --i )
 | |
|                   unput( yycopy[i] );
 | |
|               unput( '(' );
 | |
|               free( yycopy );
 | |
|               }
 | |
| 
 | |
|           Note that since each unput() puts the  given  character
 | |
|           back at the beginning of the input stream, pushing back
 | |
|           strings must be done back-to-front.
 | |
| 
 | |
|      An important potential problem when using unput() is that if
 | |
|      you are using %pointer (the default), a call to unput() des-
 | |
|      troys the contents of yytext, starting  with  its  rightmost
 | |
|      character  and devouring one character to the left with each
 | |
|      call.  If you need the value of  yytext  preserved  after  a
 | |
|      call  to  unput() (as in the above example), you must either
 | |
|      first copy it elsewhere, or build your scanner using  %array
 | |
|      instead (see How The Input Is Matched).
 | |
| 
 | |
|      Finally, note that you cannot put back  EOF  to  attempt  to
 | |
|      mark the input stream with an end-of-file.
 | |
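
     Why strings must be pushed back back-to-front can be seen
     from a small stand-alone model.  This is a sketch, not flex's
     buffer code: model_unput(), model_input() and
     model_unput_string() are hypothetical names, and the fixed
     64-byte pushback area is an assumption made to keep the
     example short.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the scanner's input: a LIFO pushback area. */
#define PUSHBACK_MAX 64
static char pushback[PUSHBACK_MAX];
static size_t pushback_len = 0;

/* Model of unput(c): c becomes the next character to be read.
 * Returns 0 on success, -1 if the pushback area is full. */
static int model_unput(int c)
{
    if (pushback_len >= PUSHBACK_MAX)
        return -1;
    pushback[pushback_len++] = (char) c;
    return 0;
}

/* Model of input(): the most recently pushed character comes out
 * first; returns -1 when nothing is left. */
static int model_input(void)
{
    if (pushback_len == 0)
        return -1;
    return (unsigned char) pushback[--pushback_len];
}

/* Push a whole string back so that it is re-read in its original
 * order: the last character goes in first. */
static int model_unput_string(const char *s)
{
    size_t i = strlen(s);

    while (i > 0)
        if (model_unput(s[--i]) != 0)
            return -1;
    return 0;
}
```

     After model_unput_string("abc"), successive calls to
     model_input() yield 'a', 'b' and 'c' in that order.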
| 
 | |
|      -    input() reads the next character from the input stream.
 | |
|           For  example, the following is one way to eat up C com-
 | |
|           ments:
 | |
| 
 | |
|               %%
 | |
|               "/*"        {
 | |
|                           register int c;
 | |
| 
 | |
|                           for ( ; ; )
 | |
|                               {
 | |
|                               while ( (c = input()) != '*' &&
 | |
|                                       c != EOF )
 | |
|                                   ;    /* eat up text of comment */
 | |
| 
 | |
|                               if ( c == '*' )
 | |
|                                   {
 | |
|                                   while ( (c = input()) == '*' )
 | |
|                                       ;
 | |
|                                   if ( c == '/' )
 | |
|                                       break;    /* found the end */
 | |
|                                   }
 | |
| 
 | |
|                               if ( c == EOF )
 | |
|                                   {
 | |
|                                   error( "EOF in comment" );
 | |
|                                   break;
 | |
|                                   }
 | |
|                               }
 | |
|                           }
 | |
| 
 | |
|           (Note that if the scanner is compiled using  C++,  then
 | |
|           input()  is  instead referred to as yyinput(), in order
 | |
|           to avoid a name clash with the C++ stream by  the  name
 | |
|           of input.)
 | |
| 
 | |
|      -    YY_FLUSH_BUFFER flushes the scanner's  internal  buffer
 | |
|           so  that  the next time the scanner attempts to match a
 | |
|           token, it will first refill the buffer  using  YY_INPUT
 | |
|           (see  The  Generated Scanner, below).  This action is a
 | |
|           special case  of  the  more  general  yy_flush_buffer()
 | |
|           function, described below in the section Multiple Input
 | |
|           Buffers.
 | |
| 
 | |
|      -    yyterminate() can be used in lieu of a return statement
 | |
|           in  an action.  It terminates the scanner and returns a
 | |
|           0 to the scanner's caller, indicating "all  done".   By
 | |
|           default,  yyterminate()  is also called when an end-of-
 | |
|           file is encountered.  It is a macro and  may  be  rede-
 | |
|           fined.
 | |
| 
 | |
| THE GENERATED SCANNER
 | |
|      The output of flex is the file lex.yy.c, which contains  the
 | |
|      scanning  routine yylex(), a number of tables used by it for
 | |
|      matching tokens, and a number of auxiliary routines and mac-
 | |
|      ros.  By default, yylex() is declared as follows:
 | |
| 
 | |
|          int yylex()
 | |
|              {
 | |
|              ... various definitions and the actions in here ...
 | |
|              }
 | |
| 
 | |
|      (If your environment supports function prototypes,  then  it
 | |
|      will  be  "int  yylex(  void  )".)   This  definition may be
 | |
|      changed by defining the "YY_DECL" macro.  For  example,  you
 | |
|      could use:
 | |
| 
 | |
|          #define YY_DECL float lexscan( a, b ) float a, b;
 | |
| 
 | |
|      to give the scanning routine the name lexscan,  returning  a
 | |
|      float, and taking two floats as arguments.  Note that if you
 | |
|      give arguments to the scanning routine using a
 | |
|      K&R-style/non-prototyped function declaration, you must
 | |
|      terminate the definition with a semicolon (;).
 | |
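
     With a prototyped declaration no trailing semicolon is
     needed, because the macro text is followed directly by the
     function body.  A small sketch that compiles on its own; the
     function body here is a placeholder invented for the
     example, standing in for the generated scanner code:

```c
/* Prototype-style declaration: no trailing semicolon is needed. */
#define YY_DECL float lexscan( float a, float b )

/*
 * The generated scanner places its body directly after YY_DECL.
 * The body below is a placeholder for this sketch only.
 */
YY_DECL
{
    return a + b;
}
```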
| 
 | |
|      Whenever yylex() is called, it scans tokens from the  global
 | |
|      input  file  yyin  (which  defaults to stdin).  It continues
 | |
|      until it either reaches an end-of-file (at  which  point  it
 | |
|      returns the value 0) or one of its actions executes a return
 | |
|      statement.
 | |
| 
 | |
|      If the scanner reaches an end-of-file, subsequent calls  are
 | |
|      undefined  unless either yyin is pointed at a new input file
 | |
|      (in which case scanning continues from that file), or
 | |
|      yyrestart() is called.  yyrestart() takes one argument, a FILE *
 | |
|      pointer (which can be nil, if you've set up YY_INPUT to scan
 | |
|      from  a  source  other  than yyin), and initializes yyin for
 | |
|      scanning from that file.  Essentially there is no difference
 | |
|      between  just  assigning  yyin  to a new input file or using
 | |
|      yyrestart() to do so; the latter is available  for  compati-
 | |
|      bility with previous versions of flex, and because it can be
 | |
|      used to switch input files in the middle  of  scanning.   It
 | |
|      can  also be used to throw away the current input buffer, by
 | |
|      calling it with an argument of yyin; but better  is  to  use
 | |
|      YY_FLUSH_BUFFER (see above).  Note that yyrestart() does not
 | |
|      reset the start condition to INITIAL (see Start  Conditions,
 | |
|      below).
 | |
| 
 | |
|      If yylex() stops scanning due to executing a  return  state-
 | |
|      ment  in  one of the actions, the scanner may then be called
 | |
|      again and it will resume scanning where it left off.
 | |
| 
 | |
|      By default (and for purposes  of  efficiency),  the  scanner
 | |
|      uses  block-reads  rather  than  simple getc() calls to read
 | |
|      characters from yyin. The nature of how it  gets  its  input
 | |
|      can   be   controlled   by   defining  the  YY_INPUT  macro.
 | |
|      YY_INPUT's calling sequence is
 | |
|      "YY_INPUT(buf,result,max_size)".  Its action is to place up
 | |
|      to max_size characters in the character array buf and return
 | |
|      in  the integer variable result either the number of charac-
 | |
|      ters read or the constant YY_NULL (0  on  Unix  systems)  to
 | |
|      indicate  EOF.   The  default YY_INPUT reads from the global
 | |
|      file-pointer "yyin".
 | |
| 
 | |
|      A sample definition of YY_INPUT (in the definitions  section
 | |
|      of the input file):
 | |
| 
 | |
|          %{
 | |
|          #define YY_INPUT(buf,result,max_size) \
 | |
|              { \
 | |
|              int c = getchar(); \
 | |
|              result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
 | |
|              }
 | |
|          %}
 | |
| 
 | |
|      This definition will change the input  processing  to  occur
 | |
|      one character at a time.
 | |
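
     A YY_INPUT that fills the buffer in blocks follows the same
     calling sequence.  The sketch below reads from an in-memory
     string; src_text and src_pos are hypothetical names, and
     YY_NULL is defined here only so that the fragment compiles
     outside a scanner (flex supplies its own definition).

```c
#include <string.h>

#ifndef YY_NULL
#define YY_NULL 0   /* assumption: only needed outside a flex scanner */
#endif

/* Hypothetical input source: a NUL-terminated string and a read cursor. */
static const char *src_text = "";
static size_t src_pos = 0;

/* Copy up to max_size characters into buf and report how many were
 * copied, or YY_NULL once the string is exhausted. */
#define YY_INPUT(buf, result, max_size) \
    { \
    size_t remaining_ = strlen( src_text ) - src_pos; \
    size_t n_ = remaining_ < (size_t) (max_size) ? \
                remaining_ : (size_t) (max_size); \
    memcpy( (buf), src_text + src_pos, n_ ); \
    src_pos += n_; \
    (result) = (n_ == 0) ? YY_NULL : (int) n_; \
    }
```

     In a scanner the definition would go in a %{ ... %} block in
     the definitions section, as in the getchar() example above.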
| 
 | |
|      When the scanner receives  an  end-of-file  indication  from
 | |
|      YY_INPUT, it then checks the yywrap() function.  If yywrap()
 | |
|      returns false (zero), then it is assumed that  the  function
 | |
|      has  gone  ahead  and  set up yyin to point to another input
 | |
|      file, and scanning continues.   If  it  returns  true  (non-
 | |
|      zero),  then  the  scanner  terminates,  returning  0 to its
 | |
|      caller.  Note that  in  either  case,  the  start  condition
 | |
|      remains unchanged; it does not revert to INITIAL.
 | |
| 
 | |
|      If you do not supply your own version of yywrap(), then  you
 | |
|      must  either use %option noyywrap (in which case the scanner
 | |
|      behaves as though yywrap() returned 1),  or  you  must  link
 | |
|      with  -lfl  to  obtain  the  default version of the routine,
 | |
|      which always returns 1.
 | |
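
     A user-supplied yywrap() that moves on to the next input file
     might look like the following sketch.  The names next_files
     and files_left are hypothetical, and the caller is assumed to
     set them up before scanning starts.  yyin is declared here
     only so that the fragment compiles alone; inside a scanner
     file it is already defined by the generated code.

```c
#include <stdio.h>

FILE *yyin;   /* assumption: omit this line inside a real scanner file */

/* Hypothetical list of input files still to be scanned. */
static const char **next_files = NULL;
static int files_left = 0;

/* Return 0 after pointing yyin at the next readable file,
 * or 1 when no files remain. */
int yywrap(void)
{
    while (files_left > 0) {
        const char *name = *next_files++;

        --files_left;
        if (yyin != NULL && yyin != stdin)
            fclose(yyin);

        yyin = fopen(name, "r");
        if (yyin != NULL)
            return 0;       /* more input: scanning continues */

        perror(name);       /* unreadable file: try the next one */
    }
    return 1;               /* no more input: yylex() returns 0 */
}
```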
| 
 | |
|      Three routines are available for scanning from in-memory
 | |
|      buffers rather than files: yy_scan_string(),
 | |
|      yy_scan_bytes(), and yy_scan_buffer().  See the discussion of
 | |
|      them below in the section Multiple Input Buffers.
 | |
| 
 | |
|      The scanner writes its  ECHO  output  to  the  yyout  global
 | |
|      (default, stdout), which may be redefined by the user simply
 | |
|      by assigning it to some other FILE pointer.
 | |
| 
 | |
| START CONDITIONS
 | |
|      flex  provides  a  mechanism  for  conditionally  activating
 | |
|      rules.   Any rule whose pattern is prefixed with "<sc>" will
 | |
|      only be active when the scanner is in  the  start  condition
 | |
|      named "sc".  For example,
 | |
| 
 | |
|          <STRING>[^"]*        { /* eat up the string body ... */
 | |
|                      ...
 | |
|                      }
 | |
| 
 | |
|      will be active only when the  scanner  is  in  the  "STRING"
 | |
|      start condition, and
 | |
| 
 | |
|          <INITIAL,STRING,QUOTE>\.        { /* handle an escape ... */
 | |
|                      ...
 | |
|                      }
 | |
| 
 | |
|      will be active only when  the  current  start  condition  is
 | |
|      either "INITIAL", "STRING", or "QUOTE".
 | |
| 
 | |
|      Start conditions are declared  in  the  definitions  (first)
 | |
|      section  of  the input using unindented lines beginning with
 | |
|      either %s or %x followed by a list  of  names.   The  former
 | |
|      declares  inclusive  start  conditions, the latter exclusive
 | |
|      start conditions.  A start condition is activated using  the
 | |
|      BEGIN  action.   Until  the  next  BEGIN action is executed,
 | |
|      rules with the given start  condition  will  be  active  and
 | |
|      rules  with other start conditions will be inactive.  If the
 | |
|      start condition is inclusive, then rules with no start  con-
 | |
|      ditions  at  all  will  also be active.  If it is exclusive,
 | |
|      then only rules qualified with the start condition  will  be
 | |
|      active.   A  set  of  rules contingent on the same exclusive
 | |
|      start condition describe a scanner which is  independent  of
 | |
|      any  of the other rules in the flex input.  Because of this,
 | |
|      exclusive start conditions make it easy  to  specify  "mini-
 | |
|      scanners"  which scan portions of the input that are syntac-
 | |
|      tically different from the rest (e.g., comments).
 | |
| 
 | |
|      If the distinction between  inclusive  and  exclusive  start
 | |
|      conditions  is still a little vague, here's a simple example
 | |
|      illustrating the connection between the  two.   The  set  of
 | |
|      rules:
 | |
| 
 | |
|          %s example
 | |
|          %%
 | |
| 
 | |
|          <example>foo   do_something();
 | |
| 
 | |
|          bar            something_else();
 | |
| 
 | |
|      is equivalent to
 | |
| 
 | |
|          %x example
 | |
|          %%
 | |
| 
 | |
|          <example>foo   do_something();
 | |
| 
 | |
|          <INITIAL,example>bar    something_else();
 | |
| 
 | |
|      Without the <INITIAL,example> qualifier, the bar pattern  in
 | |
|      the second example wouldn't be active (i.e., couldn't match)
 | |
|      when in start condition example. If we just  used  <example>
 | |
|      to  qualify  bar,  though,  then  it would only be active in
 | |
|      example and not in INITIAL, while in the first example  it's
 | |
|      active  in  both,  because  in the first example the example
 | |
|      start condition is an inclusive (%s) start condition.
 | |
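
     The activation rule can be stated as a small predicate.  This
     is a model for illustration, not code taken from flex:
     rule_is_active() and its parameters are hypothetical,
     INITIAL is treated as an inclusive condition, and the <*>
     specifier is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Is a rule active in the current start condition?
 * rule_scs lists the start conditions named in the rule's <...>
 * prefix; n_scs == 0 means the rule has no prefix at all.
 */
static bool rule_is_active(int current, bool current_is_exclusive,
                           const int *rule_scs, size_t n_scs)
{
    size_t i;

    /* Unqualified rules are active unless the condition is exclusive. */
    if (n_scs == 0)
        return !current_is_exclusive;

    /* Qualified rules are active only in the conditions they name. */
    for (i = 0; i < n_scs; ++i)
        if (rule_scs[i] == current)
            return true;

    return false;
}
```

     Applied to the examples above, the unqualified bar rule is
     active in example when it is declared with %s but not when
     it is declared with %x, which is why the second version
     needs the <INITIAL,example> prefix.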
| 
 | |
|      Also note that the  special  start-condition  specifier  <*>
 | |
|      matches  every  start  condition.   Thus,  the above example
 | |
|      could also have been written:
 | |
| 
 | |
|          %x example
 | |
|          %%
 | |
| 
 | |
|          <example>foo   do_something();
 | |
| 
 | |
|          <*>bar    something_else();
 | |
| 
 | |
| 
 | |
|      The default rule (to ECHO any unmatched  character)  remains
 | |
|      active in start conditions.  It is equivalent to:
 | |
| 
 | |
|          <*>.|\n     ECHO;
 | |
| 
 | |
| 
 | |
|      BEGIN(0) returns to the original state where only the  rules
 | |
|      with no start conditions are active.  This state can also be
 | |
|      referred   to   as   the   start-condition   "INITIAL",   so
 | |
|      BEGIN(INITIAL)  is  equivalent to BEGIN(0). (The parentheses
 | |
|      around the start condition name are  not  required  but  are
 | |
|      considered good style.)
 | |
| 
 | |
|      BEGIN actions can also be given  as  indented  code  at  the
 | |
|      beginning  of the rules section.  For example, the following
 | |
|      will cause the scanner to enter the "SPECIAL"  start  condi-
 | |
|      tion  whenever  yylex()  is  called  and the global variable
 | |
|      enter_special is true:
 | |
| 
 | |
|                  int enter_special;
 | |
| 
 | |
|          %x SPECIAL
 | |
|          %%
 | |
|                  if ( enter_special )
 | |
|                      BEGIN(SPECIAL);
 | |
| 
 | |
|          <SPECIAL>blahblahblah
 | |
|          ...more rules follow...
 | |
| 
 | |
| 
 | |
|      To illustrate the  uses  of  start  conditions,  here  is  a
 | |
|      scanner  which  provides  two different interpretations of a
 | |
|      string like "123.456".  By default it will treat it as three
 | |
|      tokens,  the  integer  "123",  a  dot ('.'), and the integer
 | |
|      "456".  But if the string is preceded earlier in the line by
 | |
|      the  string  "expect-floats"  it  will  treat it as a single
 | |
|      token, the floating-point number 123.456:
 | |
| 
 | |
|          %{
 | |
|          #include <stdlib.h>   /* declares atof() and atoi() */
 | |
|          %}
 | |
|          %s expect
 | |
| 
 | |
|          %%
 | |
|          expect-floats        BEGIN(expect);
 | |
| 
 | |
|          <expect>[0-9]+"."[0-9]+      {
 | |
|                      printf( "found a float, = %f\n",
 | |
|                              atof( yytext ) );
 | |
|                      }
 | |
|          <expect>\n           {
 | |
|                      /* that's the end of the line, so
 | |
|                       * we need another "expect-floats"
 | |
|                       * before we'll recognize any more
 | |
|                       * numbers
 | |
|                       */
 | |
|                      BEGIN(INITIAL);
 | |
|                      }
 | |
| 
 | |
|          [0-9]+      {
 | |
|                      printf( "found an integer, = %d\n",
 | |
|                              atoi( yytext ) );
 | |
|                      }
 | |
| 
 | |
|          "."         printf( "found a dot\n" );
 | |
| 
 | |
|      Here is a scanner which recognizes (and discards) C comments
 | |
|      while maintaining a count of the current input line.
 | |
| 
 | |
|          %x comment
 | |
|          %%
 | |
|                  int line_num = 1;
 | |
| 
 | |
|          "/*"         BEGIN(comment);
 | |
| 
 | |
|          <comment>[^*\n]*        /* eat anything that's not a '*' */
 | |
|          <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
 | |
|          <comment>\n             ++line_num;
 | |
|          <comment>"*"+"/"        BEGIN(INITIAL);
 | |
| 
 | |
|      This scanner goes to a bit of trouble to match as much  text
 | |
|      as  possible with each rule.  In general, when attempting to
 | |
|      write a high-speed scanner try to match as much as possible
 | |
|      in each rule, as it's a big win.
 | |
| 
 | |
|      Note that start-condition names are really integer values
 | |
|      and  can  be  stored  as  such.   Thus,  the  above could be
 | |
|      extended in the following fashion:
 | |
| 
 | |
|          %x comment foo
 | |
|          %%
 | |
|                  int line_num = 1;
 | |
|                  int comment_caller;
 | |
| 
 | |
|          "/*"         {
 | |
|                       comment_caller = INITIAL;
 | |
|                       BEGIN(comment);
 | |
|                       }
 | |
| 
 | |
|          ...
 | |
| 
 | |
|          <foo>"/*"    {
 | |
|                       comment_caller = foo;
 | |
|                       BEGIN(comment);
 | |
|                       }
 | |
| 
 | |
|          <comment>[^*\n]*        /* eat anything that's not a '*' */
 | |
|          <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
 | |
|          <comment>\n             ++line_num;
 | |
|          <comment>"*"+"/"        BEGIN(comment_caller);
 | |
| 
 | |
|      Furthermore, you can  access  the  current  start  condition
 | |
|      using  the  integer-valued YY_START macro.  For example, the
 | |
|      above assignments to comment_caller could instead be written
 | |
| 
 | |
|          comment_caller = YY_START;
 | |
| 
 | |
|      Flex provides YYSTATE as an alias for YY_START  (since  that
 | |
|      is what's used by AT&T lex).
 | |
| 
 | |
|      Note that start conditions do not have their own name-space;
 | |
|      %s's   and  %x's  declare  names  in  the  same  fashion  as
 | |
|      #define's.
 | |
| 
 | |
|      Finally, here's an example of how to  match  C-style  quoted
 | |
|      strings using exclusive start conditions, including expanded
 | |
|      escape sequences (but not including checking  for  a  string
 | |
|      that's too long):
 | |
| 
 | |
|          %x str
 | |
| 
 | |
|          %%
 | |
|                  char string_buf[MAX_STR_CONST];
 | |
|                  char *string_buf_ptr;
 | |
| 
 | |
| 
 | |
|          \"      string_buf_ptr = string_buf; BEGIN(str);
 | |
| 
 | |
|          <str>\"        { /* saw closing quote - all done */
 | |
|                  BEGIN(INITIAL);
 | |
|                  *string_buf_ptr = '\0';
 | |
|                  /* return string constant token type and
 | |
|                   * value to parser
 | |
|                   */
 | |
|                  }
 | |
| 
 | |
|          <str>\n        {
 | |
|                  /* error - unterminated string constant */
 | |
|                  /* generate error message */
 | |
|                  }
 | |
| 
 | |
|          <str>\\[0-7]{1,3} {
 | |
|                  /* octal escape sequence */
 | |
|                  unsigned int result;   /* "%o" expects an unsigned int */
 | |
| 
 | |
|                  (void) sscanf( yytext + 1, "%o", &result );
 | |
| 
 | |
|                  if ( result > 0xff )
 | |
|                          {
 | |
|                          /* error, constant is out-of-bounds */
 | |
|                          }
 | |
|                  else
 | |
|                          *string_buf_ptr++ = result;
 | |
|                  }
 | |
| 
 | |
|          <str>\\[0-9]+ {
 | |
|                  /* generate error - bad escape sequence; something
 | |
|                   * like '\48' or '\0777777'
 | |
|                   */
 | |
|                  }
 | |
| 
 | |
|          <str>\\n  *string_buf_ptr++ = '\n';
 | |
|          <str>\\t  *string_buf_ptr++ = '\t';
 | |
|          <str>\\r  *string_buf_ptr++ = '\r';
 | |
|          <str>\\b  *string_buf_ptr++ = '\b';
 | |
|          <str>\\f  *string_buf_ptr++ = '\f';
 | |
| 
 | |
|          <str>\\(.|\n)  *string_buf_ptr++ = yytext[1];
 | |
| 
 | |
|          <str>[^\\\n\"]+        {
 | |
|                  char *yptr = yytext;
 | |
| 
 | |
|                  while ( *yptr )
 | |
|                          *string_buf_ptr++ = *yptr++;
 | |
|                  }
 | |
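
     The octal-escape rule above hands the digits to sscanf() and
     then range-checks the value.  The same step can be written as
     a stand-alone helper.  This is a sketch: the name
     decode_octal_escape() is hypothetical, and the text is
     assumed to have already matched \\[0-7]{1,3}, so no check is
     made for non-octal digits that follow.

```c
#include <stdio.h>

/*
 * Decode an escape of the form \ooo (one to three octal digits).
 * text must point at the backslash.  Returns the character value
 * 0..255, or -1 if the escape is malformed or out of range.
 */
static int decode_octal_escape(const char *text)
{
    unsigned int value;

    if (text[0] != '\\' || sscanf(text + 1, "%3o", &value) != 1)
        return -1;

    if (value > 0xff)
        return -1;      /* constant is out-of-bounds */

    return (int) value;
}
```

     For instance, the escape \101 decodes to 65 (the letter A in
     ASCII), while \777 is rejected because it exceeds 0xff.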
| 
 | |
| 
 | |
|      Often, such as in some of the examples above,  you  wind  up
 | |
|      writing  a  whole  bunch  of  rules all preceded by the same
 | |
|      start condition(s).  Flex makes this  a  little  easier  and
 | |
|      cleaner  by introducing a notion of start condition scope. A
 | |
|      start condition scope is begun with:
 | |
| 
 | |
|          <SCs>{
 | |
| 
 | |
|      where SCs is a list of one or more start conditions.  Inside
 | |
|      the  start condition scope, every rule automatically has the
 | |
|      prefix <SCs> applied to it, until a '}'  which  matches  the
 | |
|      initial '{'. So, for example,
 | |
| 
 | |
|          <ESC>{
 | |
|              "\\n"   return '\n';
 | |
|              "\\r"   return '\r';
 | |
|              "\\f"   return '\f';
 | |
|              "\\0"   return '\0';
 | |
|          }
 | |
| 
 | |
|      is equivalent to:
 | |
| 
 | |
|          <ESC>"\\n"  return '\n';
 | |
|          <ESC>"\\r"  return '\r';
 | |
|          <ESC>"\\f"  return '\f';
 | |
|          <ESC>"\\0"  return '\0';
 | |
| 
 | |
|      Start condition scopes may be nested.
 | |
| 
 | |
|      Three routines are  available  for  manipulating  stacks  of
 | |
|      start conditions:
 | |
| 
 | |
|      void yy_push_state(int new_state)
 | |
|           pushes the current start condition onto the top of  the
 | |
|           start  condition  stack  and  switches  to new_state as
 | |
|           though you had used BEGIN new_state (recall that  start
 | |
|           condition names are also integers).
 | |
| 
 | |
|      void yy_pop_state()
 | |
|           pops the top of the stack and switches to it via BEGIN.
 | |
| 
 | |
|      int yy_top_state()
 | |
|           returns the top  of  the  stack  without  altering  the
 | |
|           stack's contents.
 | |
| 
 | |
|      The start condition stack grows dynamically and  so  has  no
 | |
|      built-in  size  limitation.  If memory is exhausted, program
 | |
|      execution aborts.
 | |
| 
 | |
|      To use start condition stacks, your scanner must  include  a
 | |
|      %option stack directive (see Options below).
 | |
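
     The behaviour of the three routines can be pictured with a
     stand-alone model.  It is a sketch only, not flex's
     implementation: the model_ names, the current_state variable
     standing in for the active start condition, and the choice to
     exit on underflow are all assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>

static int current_state = 0;   /* models the active start condition */
static int *stack = NULL;       /* saved start conditions */
static size_t depth = 0, capacity = 0;

/* Save the current start condition and switch to new_state. */
static void model_push_state(int new_state)
{
    if (depth == capacity) {
        size_t new_cap = capacity ? capacity * 2 : 8;
        int *p = realloc(stack, new_cap * sizeof *p);

        if (p == NULL) {
            fputs("out of memory\n", stderr);
            exit(2);
        }
        stack = p;
        capacity = new_cap;
    }
    stack[depth++] = current_state;
    current_state = new_state;      /* like BEGIN new_state */
}

/* Switch back to the most recently saved start condition. */
static void model_pop_state(void)
{
    if (depth == 0) {
        fputs("start-condition stack underflow\n", stderr);
        exit(2);
    }
    current_state = stack[--depth];
}

/* Report the saved condition on top of the stack, or -1 if empty. */
static int model_top_state(void)
{
    return depth ? stack[depth - 1] : -1;
}
```

     Pushing condition 1 and then condition 2 from the initial
     state leaves 2 active with 1 on top of the stack; two pops
     return first to 1 and then to the initial state.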
| 
 | |
| MULTIPLE INPUT BUFFERS
 | |
|      Some scanners (such as those which support "include"  files)
 | |
|      require   reading  from  several  input  streams.   As  flex
 | |
|      scanners do a large amount of buffering, one cannot  control
 | |
|      where  the  next input will be read from by simply writing a
 | |
|      YY_INPUT  which  is  sensitive  to  the  scanning   context.
 | |
|      YY_INPUT  is only called when the scanner reaches the end of
 | |
|      its buffer, which may be a long time after scanning a state-
 | |
|      ment such as an "include" which requires switching the input
 | |
|      source.
 | |
| 
 | |
|      To negotiate  these  sorts  of  problems,  flex  provides  a
 | |
|      mechanism  for creating and switching between multiple input
 | |
|      buffers.  An input buffer is created by using:
 | |
| 
 | |
|          YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
 | |
| 
 | |
|      which takes a FILE pointer and a size and creates  a  buffer
 | |
|      associated with the given file and large enough to hold size
 | |
|      characters (when in doubt, use YY_BUF_SIZE  for  the  size).
 | |
|      It  returns  a  YY_BUFFER_STATE  handle,  which  may then be
 | |
|      passed to other routines (see below).   The  YY_BUFFER_STATE
 | |
|      type is a pointer to an opaque struct yy_buffer_state struc-
 | |
|      ture, so you may safely initialize YY_BUFFER_STATE variables
 | |
|      to  ((YY_BUFFER_STATE) 0) if you wish, and also refer to the
 | |
|      opaque structure in order to correctly declare input buffers
 | |
|      in  source files other than that of your scanner.  Note that
 | |
|      the FILE pointer in the call  to  yy_create_buffer  is  only
 | |
|      used  as the value of yyin seen by YY_INPUT; if you redefine
 | |
|      YY_INPUT so it no longer uses yyin, then you can safely pass
 | |
|      a nil FILE pointer to yy_create_buffer. You select a partic-
 | |
|      ular buffer to scan from using:
 | |
| 
 | |
|          void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
 | |
| 
 | |
|      switches the scanner's input  buffer  so  subsequent  tokens
 | |
|      will  come  from new_buffer. Note that yy_switch_to_buffer()
 | |
|      may be used by yywrap() to set things up for continued scan-
 | |
|      ning, instead of opening a new file and pointing yyin at it.
 | |
|      Note  also  that  switching   input   sources   via   either
 | |
|      yy_switch_to_buffer()  or yywrap() does not change the start
 | |
|      condition.
 | |
| 
 | |
|          void yy_delete_buffer( YY_BUFFER_STATE buffer )
 | |
| 
 | |
|      is used to reclaim the storage associated with a buffer.
 | |
|      (buffer can be nil, in which case the routine does nothing.)
 | |
|      You can also clear the current contents of a buffer using:
 | |
| 
 | |
|          void yy_flush_buffer( YY_BUFFER_STATE buffer )
 | |
| 
 | |
|      This function discards the buffer's contents,  so  the  next
 | |
|      time  the scanner attempts to match a token from the buffer,
 | |
|      it will first fill the buffer anew using YY_INPUT.
 | |
| 
 | |
|      yy_new_buffer() is an alias for yy_create_buffer(), provided
 | |
|      for  compatibility  with  the  C++ use of new and delete for
 | |
|      creating and destroying dynamic objects.
 | |
| 
 | |
|      Finally,   the    YY_CURRENT_BUFFER    macro    returns    a
 | |
|      YY_BUFFER_STATE handle to the current buffer.
 | |
| 
 | |
|      Here is an example of using these  features  for  writing  a
 | |
|      scanner  which expands include files (the <<EOF>> feature is
 | |
|      discussed below):
 | |
| 
 | |
|          /* the "incl" state is used for picking up the name
 | |
|           * of an include file
 | |
|           */
 | |
|          %x incl
 | |
| 
 | |
|          %{
 | |
|          #define MAX_INCLUDE_DEPTH 10
 | |
|          YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
 | |
|          int include_stack_ptr = 0;
 | |
|          %}
 | |
| 
 | |
|          %%
 | |
|          include             BEGIN(incl);
 | |
| 
 | |
|          [a-z]+              ECHO;
 | |
|          [^a-z\n]*\n?        ECHO;
 | |
| 
 | |
|          <incl>[ \t]*      /* eat the whitespace */
 | |
|          <incl>[^ \t\n]+   { /* got the include file name */
 | |
|                  if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
 | |
|                      {
 | |
|                      fprintf( stderr, "Includes nested too deeply\n" );
 | |
|                      exit( 1 );
 | |
|                      }
 | |
| 
 | |
|                  include_stack[include_stack_ptr++] =
 | |
|                      YY_CURRENT_BUFFER;
 | |
| 
 | |
|                  yyin = fopen( yytext, "r" );
 | |
| 
 | |
|                  if ( ! yyin )
 | |
|                      error( ... );
 | |
| 
 | |
|                  yy_switch_to_buffer(
 | |
|                      yy_create_buffer( yyin, YY_BUF_SIZE ) );
 | |
| 
 | |
|                  BEGIN(INITIAL);
 | |
|                  }
 | |
| 
 | |
|          <<EOF>> {
 | |
|                  if ( --include_stack_ptr < 0 )
 | |
|                      {
 | |
|                      yyterminate();
 | |
|                      }
 | |
| 
 | |
|                  else
 | |
|                      {
 | |
|                      yy_delete_buffer( YY_CURRENT_BUFFER );
 | |
|                      yy_switch_to_buffer(
 | |
|                           include_stack[include_stack_ptr] );
 | |
|                      }
 | |
|                  }
 | |
| 
 | |
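The stack handling in the example above can be tried out separately from the scanner. The following is a minimal sketch in plain C, not code taken from flex: struct include_stack, include_push() and include_pop() are hypothetical names, and a void pointer stands in for YY_BUFFER_STATE so that the fragment compiles without a generated scanner. Unlike the rule above, it tests for an empty stack before decrementing, so the depth never goes negative.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INCLUDE_DEPTH 10

/* Hypothetical stand-in for the scanner's include_stack; a void pointer
 * takes the place of YY_BUFFER_STATE so the file compiles without flex.
 */
struct include_stack {
    void *saved[MAX_INCLUDE_DEPTH];
    int   depth;                /* number of buffers currently saved */
};

/* Save the current buffer before descending into an include file.
 * Returns 0 if the nesting limit has been reached, 1 otherwise.
 */
int include_push(struct include_stack *s, void *current_buffer)
{
    assert(s != NULL);
    if (s->depth >= MAX_INCLUDE_DEPTH)
        return 0;
    s->saved[s->depth++] = current_buffer;
    return 1;
}

/* Called at end-of-file.  Returns the buffer to switch back to, or NULL
 * when the outermost file has ended and the scanner should terminate.
 */
void *include_pop(struct include_stack *s)
{
    assert(s != NULL);
    if (s->depth == 0)
        return NULL;
    return s->saved[--s->depth];
}
```

In a scanner, the <incl> rule would presumably pass YY_CURRENT_BUFFER to include_push() before switching buffers, and a NULL result from include_pop() in the <<EOF>> rule would correspond to the yyterminate() branch.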
|      Three routines are available for setting  up  input  buffers
 | |
|      for  scanning  in-memory  strings  instead of files.  All of
 | |
|      them create a new input buffer for scanning the string,  and
 | |
|      return  a  corresponding  YY_BUFFER_STATE  handle (which you
 | |
|      should delete with yy_delete_buffer() when  done  with  it).
 | |
|      They    also    switch    to    the    new    buffer   using
 | |
|      yy_switch_to_buffer(), so the  next  call  to  yylex()  will
 | |
|      start scanning the string.
 | |
| 
 | |
|      yy_scan_string(const char *str)
 | |
|           scans a NUL-terminated string.
 | |
| 
 | |
|      yy_scan_bytes(const char *bytes, int len)
 | |
|           scans len bytes (including possibly NUL's) starting  at
 | |
|           location bytes.
 | |
| 
 | |
|      Note that both of these functions create and scan a copy  of
 | |
|      the  string or bytes.  (This may be desirable, since yylex()
 | |
|      modifies the contents of the buffer it  is  scanning.)   You
 | |
|      can avoid the copy by using:
 | |
| 
 | |
|      yy_scan_buffer(char *base, yy_size_t size)
 | |
|           which scans in place the buffer starting at base,  con-
 | |
|           sisting of size bytes, the last two bytes of which must
 | |
|           be YY_END_OF_BUFFER_CHAR (ASCII NUL).  These  last  two
 | |
|           bytes  are  not  scanned;  thus,  scanning  consists of
 | |
|           base[0] through base[size-3], inclusive.
 | |
| 
 | |
|           If you fail to set up base in this manner (i.e., forget
 | |
|           the   final   two  YY_END_OF_BUFFER_CHAR  bytes),  then
 | |
|           yy_scan_buffer()  returns  a  nil  pointer  instead  of
 | |
|           creating a new input buffer.
 | |
| 
 | |
|           The type yy_size_t is an integral type to which you can
 | |
|           cast  an  integer expression reflecting the size of the
 | |
|           buffer.
 | |
| 
 | |
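The layout that yy_scan_buffer() expects is easy to get wrong, so here is one way the buffer might be prepared. This is a sketch in plain C under stated assumptions: make_scan_buffer() is a hypothetical helper, not part of flex, and it copies the text once so that the scanner can then work on that copy in place.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy len bytes of text into a freshly allocated buffer that ends with
 * the two NUL bytes yy_scan_buffer() requires.  The total size, counting
 * those two bytes, is stored in *size_out.  Returns NULL if allocation
 * fails.  make_scan_buffer is a hypothetical helper, not part of flex.
 */
char *make_scan_buffer(const char *text, size_t len, size_t *size_out)
{
    assert(text != NULL && size_out != NULL);

    char *base = malloc(len + 2);
    if (base == NULL)
        return NULL;

    memcpy(base, text, len);
    base[len] = '\0';           /* first YY_END_OF_BUFFER_CHAR  */
    base[len + 1] = '\0';       /* second YY_END_OF_BUFFER_CHAR */
    *size_out = len + 2;
    return base;
}
```

With a generated scanner linked in, the result would be handed over as yy_scan_buffer( base, (yy_size_t) size ). Since the scanner did not allocate base, the caller should expect to free it after the buffer has been deleted.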
| END-OF-FILE RULES
 | |
|      The special rule "<<EOF>>" indicates actions which are to be
 | |
|      taken  when  an  end-of-file  is  encountered  and  yywrap()
 | |
|      returns non-zero (i.e., indicates no further files  to  pro-
 | |
|      cess).  The action must finish by doing one of four things:
 | |
| 
 | |
|      -    assigning yyin to a new input file  (in  previous  ver-
 | |
|           sions  of  flex,  after doing the assignment you had to
 | |
|           call the special action YY_NEW_FILE; this is no  longer
 | |
|           necessary);
 | |
| 
 | |
|      -    executing a return statement;
 | |
| 
 | |
|      -    executing the special yyterminate() action;
 | |
| 
 | |
|      -    or,    switching    to    a    new     buffer     using
 | |
|           yy_switch_to_buffer() as shown in the example above.
 | |
| 
 | |
|      <<EOF>> rules may not be used with other patterns; they  may
 | |
|      only  be  qualified  with a list of start conditions.  If an
 | |
|      unqualified <<EOF>> rule is given, it applies to  all  start
 | |
|      conditions  which  do  not already have <<EOF>> actions.  To
 | |
|      specify an <<EOF>> rule for only the  initial  start  condi-
 | |
|      tion, use
 | |
| 
 | |
|          <INITIAL><<EOF>>
 | |
| 
 | |
| 
 | |
|      These rules are useful for  catching  things  like  unclosed
 | |
|      comments.  An example:
 | |
| 
 | |
|          %x quote
 | |
|          %%
 | |
| 
 | |
|          ...other rules for dealing with quotes...
 | |
| 
 | |
|          <quote><<EOF>>   {
 | |
|                   error( "unterminated quote" );
 | |
|                   yyterminate();
 | |
|                   }
 | |
|          <<EOF>>  {
 | |
|                   if ( *++filelist )
 | |
|                       yyin = fopen( *filelist, "r" );
 | |
|                   else
 | |
|                      yyterminate();
 | |
|                   }
 | |
| 
 | |
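The unqualified <<EOF>> rule above assigns the result of fopen() to yyin without checking it. A more defensive variant might skip names that cannot be opened. The following is only a sketch: next_input() is a hypothetical helper, not part of flex, and like the rule above it assumes the list pointer starts out at the file currently being read.

```c
#include <assert.h>
#include <stdio.h>

/* Advance *listp through a NULL-terminated array of file names and
 * return the first one that can be opened for reading.  Names that fail
 * to open are reported on stderr and skipped.  Returns NULL, leaving
 * *listp at the terminating NULL entry, when no names remain.
 * next_input is a hypothetical helper, not part of flex.
 */
FILE *next_input(char ***listp)
{
    assert(listp != NULL && *listp != NULL && **listp != NULL);

    while (*++*listp != NULL) {
        FILE *fp = fopen(**listp, "r");
        if (fp != NULL)
            return fp;
        perror(**listp);
    }
    return NULL;
}
```

Inside the rule, the result would be assigned to yyin, with yyterminate() called when it is NULL.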
| 
 | |
| MISCELLANEOUS MACROS
 | |
|      The macro YY_USER_ACTION can be defined to provide an action
 | |
|      which is always executed prior to the matched rule's action.
 | |
|      For example, it could be #define'd to call a routine to con-
 | |
|      vert  yytext to lower-case.  When YY_USER_ACTION is invoked,
 | |
|      the variable yy_act gives the number  of  the  matched  rule
 | |
|      (rules  are  numbered starting with 1).  Suppose you want to
 | |
|      profile how often each of your rules is matched.   The  fol-
 | |
|      lowing would do the trick:
 | |
| 
 | |
|          #define YY_USER_ACTION ++ctr[yy_act]
 | |
| 
 | |
|      where ctr is an array to hold the counts for  the  different
 | |
|      rules.   Note  that  the  macro YY_NUM_RULES gives the total
 | |
|      number of rules (including the default rule, even if you use
 | |
|      -s), so a correct declaration for ctr is:
 | |
| 
 | |
|          int ctr[YY_NUM_RULES];
 | |
| 
 | |
| 
 | |
|      The macro YY_USER_INIT may be defined to provide  an  action
 | |
|      which  is  always executed before the first scan (and before
 | |
|      the scanner's internal initializations are done).  For exam-
 | |
|      ple,  it  could  be used to call a routine to read in a data
 | |
|      table or open a logging file.
 | |
| 
 | |
|      The macro yy_set_interactive(is_interactive) can be used  to
 | |
|      control  whether  the  current buffer is considered interac-
 | |
|      tive. An interactive buffer is processed  more  slowly,  but
 | |
|      must  be  used  when  the  scanner's  input source is indeed
 | |
|      interactive to avoid problems due to waiting to fill buffers
 | |
|      (see the discussion of the -I flag below).  A non-zero value
 | |
|      in the macro invocation marks the buffer as  interactive,  a
 | |
|      zero  value as non-interactive.  Note that use of this macro
 | |
|      overrides  %option  always-interactive  or  %option   never-
 | |
|      interactive  (see Options below).  yy_set_interactive() must
 | |
|      be invoked prior to beginning to scan the buffer that is (or
 | |
|      is not) to be considered interactive.
 | |
| 
 | |
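One plausible way to pick the argument for yy_set_interactive() is to ask whether the input stream is attached to a terminal. The sketch below assumes a POSIX system, since it relies on fileno() and isatty(); stream_is_interactive() is a hypothetical helper, not part of flex.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Return 1 if fp appears to be connected to a terminal, 0 otherwise.
 * Relies on the POSIX calls fileno() and isatty();
 * stream_is_interactive is a hypothetical helper, not part of flex.
 */
int stream_is_interactive(FILE *fp)
{
    assert(fp != NULL);
    return isatty(fileno(fp)) == 1;
}
```

The result could then be passed to yy_set_interactive() before the first token is scanned from the buffer in question.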
|      The macro yy_set_bol(at_bol) can be used to control  whether
 | |
|      the  current  buffer's  scanning  context for the next token
 | |
|      match is done as though at the beginning of a line.  A non-zero
 | |
|      macro argument makes rules anchored with '^' active, while a
 | |
|      zero argument makes '^' rules inactive.
 | |
| 
 | |
|      The macro YY_AT_BOL() returns true if the next token scanned
 | |
|      from  the  current  buffer will have '^' rules active, false
 | |
|      otherwise.
 | |
| 
 | |
|      In the generated scanner, the actions are  all  gathered  in
 | |
|      one  large  switch  statement  and separated using YY_BREAK,
 | |
|      which may be redefined.  By default, it is simply a "break",
 | |
|      to  separate  each  rule's action from the following rule's.
 | |
|      Redefining  YY_BREAK  allows,  for  example,  C++  users  to
 | |
|      #define  YY_BREAK  to  do  nothing (while being very careful
 | |
|      that every rule ends with a "break" or a "return"!).  This avoids
 | |
|      unreachable-statement  warnings,  which  arise  when  a  rule's
 | |
|      action ends with "return" and the YY_BREAK after it can never be
 | |
|      reached.
 | |
| 
 | |
| VALUES AVAILABLE TO THE USER
 | |
|      This section summarizes the various values available to  the
 | |
|      user in the rule actions.
 | |
| 
 | |
|      -    char *yytext holds the text of the current  token.   It
 | |
|           may  be  modified but not lengthened (you cannot append
 | |
|           characters to the end).
 | |
| 
 | |
|           If the special directive %array appears  in  the  first
 | |
|           section  of  the  scanner  description,  then yytext is
 | |
|           instead declared char yytext[YYLMAX], where YYLMAX is a
 | |
|           macro  definition  that  you  can redefine in the first
 | |
|           section if you don't like the default value  (generally
 | |
|           8KB).    Using   %array   results  in  somewhat  slower
 | |
|           scanners, but the value of  yytext  becomes  immune  to
 | |
|           calls to input() and unput(), which potentially destroy
 | |
|           its value when yytext  is  a  character  pointer.   The
 | |
|           opposite of %array is %pointer, which is the default.
 | |
| 
 | |
|           You cannot  use  %array  when  generating  C++  scanner
 | |
|           classes (the -+ flag).
 | |
| 
 | |
|      -    int yyleng holds the length of the current token.
 | |
| 
 | |
|      -    FILE *yyin is the file  which  by  default  flex  reads
 | |
|           from.   It  may  be  redefined  but doing so only makes
 | |
|           sense before scanning begins or after an EOF  has  been
 | |
|           encountered.  Changing it in the midst of scanning will
 | |
|           have unexpected results since flex buffers  its  input;
 | |
|           use  yyrestart()  instead.   Once  scanning  terminates
 | |
|           because an end-of-file has been  seen,  you  can  point
 | |
|           yyin  at  the  new input file and then call the scanner
 | |
|           again to continue scanning.
 | |
| 
 | |
|      -    void yyrestart( FILE *new_file ) may be called to point
 | |
|           yyin at the new input file.  The switch-over to the new
 | |
|           file is immediate (any previously buffered-up input  is
 | |
|           lost).   Note  that calling yyrestart() with yyin as an
 | |
|           argument thus throws away the current input buffer  and
 | |
|           continues scanning the same input file.
 | |
| 
 | |
|      -    FILE *yyout is the file to which ECHO actions are done.
 | |
|           It can be reassigned by the user.
 | |
| 
 | |
|      -    YY_CURRENT_BUFFER returns a YY_BUFFER_STATE  handle  to
 | |
|           the current buffer.
 | |
| 
 | |
|      -    YY_START returns an integer value corresponding to  the
 | |
|           current start condition.  You can subsequently use this
 | |
|           value with BEGIN to return to that start condition.
 | |
| 
 | |
| INTERFACING WITH YACC
 | |
|      One of the main uses of flex is as a companion to  the  yacc
 | |
|      parser-generator.   yacc  parsers  expect  to call a routine
 | |
|      named yylex() to find the next input token.  The routine  is
 | |
|      supposed  to  return  the  type of the next token as well as
 | |
|      putting any associated value in the global  yylval.  To  use
 | |
|      flex  with  yacc,  one  specifies  the  -d option to yacc to
 | |
|      instruct it to generate the file y.tab.h containing  defini-
 | |
|      tions  of all the %tokens appearing in the yacc input.  This
 | |
|      file is then included in the flex scanner.  For example,  if
 | |
|      one of the tokens is "TOK_NUMBER", part of the scanner might
 | |
|      look like:
 | |
| 
 | |
|          %{
 | |
|          #include "y.tab.h"
 | |
|          %}
 | |
| 
 | |
|          %%
 | |
| 
 | |
|          [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
 | |
| 
 | |
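The rule above uses atoi(), which cannot report a number too large to fit in an int. A scanner that needs to reject such literals might convert the token text with strtol() instead. The following is a sketch in plain C; token_to_int() is a hypothetical helper, not part of flex or yacc.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Convert a string of decimal digits to an int.  Returns 1 and stores
 * the value in *out on success; returns 0 if the text is not entirely
 * a number or the value does not fit in an int.  token_to_int is a
 * hypothetical helper, not part of flex or yacc.
 */
int token_to_int(const char *text, int *out)
{
    assert(text != NULL && out != NULL);

    char *end;
    errno = 0;
    long value = strtol(text, &end, 10);

    if (end == text || *end != '\0')
        return 0;                   /* no digits, or trailing junk */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                   /* out of range for an int */

    *out = (int) value;
    return 1;
}
```

In the action, a zero return could be reported as an error before (or instead of) returning TOK_NUMBER.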
| 
 | |
| OPTIONS
 | |
|      flex has the following options:
 | |
| 
 | |
|      -b   Generate backing-up information to lex.backup. This  is
 | |
|           a  list  of scanner states which require backing up and
 | |
|           the input characters on which they do  so.   By  adding
 | |
|           rules   one  can  remove  backing-up  states.   If  all
 | |
|           backing-up states are eliminated  and  -Cf  or  -CF  is
 | |
|           used, the generated scanner will run faster (see the -p
 | |
|           flag).  Only users who wish to squeeze every last cycle
 | |
|           out  of  their  scanners  need worry about this option.
 | |
|           (See the section on Performance Considerations below.)
 | |
| 
 | |
|      -c   is a do-nothing, deprecated option included  for  POSIX
 | |
|           compliance.
 | |
| 
 | |
|      -d   makes the generated scanner run in debug  mode.   When-
 | |
|           ever   a   pattern   is   recognized   and  the  global
 | |
|           yy_flex_debug is non-zero (which is the  default),  the
 | |
|           scanner will write to stderr a line of the form:
 | |
| 
 | |
|               --accepting rule at line 53 ("the matched text")
 | |
| 
 | |
|           The line number refers to the location of the  rule  in
 | |
|           the  file defining the scanner (i.e., the file that was
 | |
|           fed to flex).  Messages are  also  generated  when  the
 | |
|           scanner backs up, accepts the default rule, reaches the
 | |
|           end of its input buffer (or encounters a NUL;  at  this
 | |
|           point,  the  two  look the same as far as the scanner's
 | |
|           concerned), or reaches an end-of-file.
 | |
| 
 | |
|      -f   specifies fast scanner. No table  compression  is  done
 | |
|           and  stdio  is bypassed.  The result is large but fast.
 | |
|           This option is equivalent to -Cfr (see below).
 | |
| 
 | |
|      -h   generates a "help" summary of flex's options to  stdout
 | |
|           and then exits.  -? and --help are synonyms for -h.
 | |
| 
 | |
|      -i   instructs flex to generate a case-insensitive  scanner.
 | |
|           The  case  of  letters given in the flex input patterns
 | |
|           will be ignored,  and  tokens  in  the  input  will  be
 | |
|           matched  regardless of case.  The matched text given in
 | |
|           yytext will have the preserved case (i.e., it will  not
 | |
|           be folded).
 | |
| 
 | |
|      -l   turns on maximum compatibility with the  original  AT&T
 | |
|           lex  implementation.  Note that this does not mean full
 | |
|           compatibility.  Use of this option costs a considerable
 | |
|           amount  of  performance, and it cannot be used with the
 | |
|           -+, -f, -F, -Cf, or -CF options.  For  details  on  the
 | |
|           compatibilities  it provides, see the section "Incompa-
 | |
|           tibilities With Lex And POSIX" below.  This option also
 | |
|           results  in the name YY_FLEX_LEX_COMPAT being #define'd
 | |
|           in the generated scanner.
 | |
| 
 | |
|      -n   is another do-nothing, deprecated option included  only
 | |
|           for POSIX compliance.
 | |
| 
 | |
|      -p   generates a performance report to stderr.   The  report
 | |
|           consists  of  comments  regarding  features of the flex
 | |
|           input file which will cause a serious loss  of  perfor-
 | |
|           mance  in  the resulting scanner.  If you give the flag
 | |
|           twice, you will also get  comments  regarding  features
 | |
|           that lead to minor performance losses.
 | |
| 
 | |
|           Note that the use  of  REJECT,  %option  yylineno,  and
 | |
|           variable  trailing context (see the Deficiencies / Bugs
 | |
|           section  below)  entails  a   substantial   performance
 | |
|           penalty;  use  of  yymore(), the ^ operator, and the -I
 | |
|           flag entail minor performance penalties.
 | |
| 
 | |
|      -s   causes the default rule (that unmatched  scanner  input
 | |
|           is  echoed to stdout) to be suppressed.  If the scanner
 | |
|           encounters input that does not match any of its  rules,
 | |
|           it  aborts  with  an  error.  This option is useful for
 | |
|           finding holes in a scanner's rule set.
 | |
| 
 | |
|      -t   instructs flex to write the  scanner  it  generates  to
 | |
|           standard output instead of lex.yy.c.
 | |
| 
 | |
|      -v   specifies that flex should write to stderr a summary of
 | |
|           statistics regarding the scanner it generates.  Most of
 | |
|           the statistics are meaningless to the casual flex user,
 | |
|           but the first line identifies the version of flex (same
 | |
|           as reported by -V), and the next line  the  flags  used
 | |
|           when  generating  the scanner, including those that are
 | |
|           on by default.
 | |
| 
 | |
|      -w   suppresses warning messages.
 | |
| 
 | |
|      -B   instructs flex to generate a batch scanner,  the  oppo-
 | |
|           site  of  interactive  scanners  generated  by  -I (see
 | |
|           below).  In general, you use -B when  you  are  certain
 | |
|           that your scanner will never be used interactively, and
 | |
|           you want to squeeze a little more  performance  out  of
 | |
|           it.   If your goal is instead to squeeze out a lot more
 | |
|           performance, you  should   be  using  the  -Cf  or  -CF
 | |
|           options  (discussed  below), which turn on -B automati-
 | |
|           cally anyway.
 | |
| 
 | |
|      -F   specifies that the fast  scanner  table  representation
 | |
|           should  be used (and stdio bypassed).  This representa-
 | |
|           tion is about as fast as the full table  representation
 | |
|           (-f),  and  for some sets of patterns will be consider-
 | |
|           ably smaller (and for others, larger).  In general,  if
 | |
|           the  pattern  set contains both "keywords" and a catch-
 | |
|           all, "identifier" rule, such as in the set:
 | |
| 
 | |
|               "case"    return TOK_CASE;
 | |
|               "switch"  return TOK_SWITCH;
 | |
|               ...
 | |
|               "default" return TOK_DEFAULT;
 | |
|               [a-z]+    return TOK_ID;
 | |
| 
 | |
|           then you're better off using the full table representa-
 | |
|           tion.  If only the "identifier" rule is present and you
 | |
|           then use a hash table or some such to detect  the  key-
 | |
|           words, you're better off using -F.
 | |
| 
 | |
|           This option is equivalent to -CFr (see below).  It can-
 | |
|           not be used with -+.
 | |
| 
 | |
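When only the "identifier" rule is present, keywords have to be picked out in the action. The description above suggests a hash table or some such; the sketch below uses a binary search over a sorted table instead, simply because the C library provides bsearch(). The token codes and classify_identifier() are hypothetical; a real scanner would take its token codes from y.tab.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; a real scanner would use those in y.tab.h. */
enum { TOK_ID = 256, TOK_CASE, TOK_DEFAULT, TOK_SWITCH };

struct keyword {
    const char *name;
    int         token;
};

/* Must stay sorted by name, since bsearch() is used to search it. */
static const struct keyword keywords[] = {
    { "case",    TOK_CASE    },
    { "default", TOK_DEFAULT },
    { "switch",  TOK_SWITCH  },
};

/* Comparison callback for bsearch(): key is the identifier text. */
static int compare_keyword(const void *key, const void *entry)
{
    return strcmp((const char *) key,
                  ((const struct keyword *) entry)->name);
}

/* Return the keyword's token code, or TOK_ID if text is not a keyword. */
int classify_identifier(const char *text)
{
    assert(text != NULL);

    const struct keyword *hit =
        bsearch(text, keywords,
                sizeof keywords / sizeof keywords[0],
                sizeof keywords[0], compare_keyword);

    return hit != NULL ? hit->token : TOK_ID;
}
```

The single [a-z]+ rule would then end with something like "return classify_identifier( yytext );".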
|      -I   instructs flex to generate an interactive scanner.   An
 | |
|           interactive  scanner  is  one  that only looks ahead to
 | |
|           decide what token has been  matched  if  it  absolutely
 | |
|           must.  It turns out that always looking one extra char-
 | |
|           acter ahead, even  if  the  scanner  has  already  seen
 | |
|           enough text to disambiguate the current token, is a bit
 | |
|           faster than only looking  ahead  when  necessary.   But
 | |
|           scanners  that always look ahead give dreadful interac-
 | |
|           tive performance; for example, when a user types a new-
 | |
|           line,  it  is  not  recognized as a newline token until
 | |
|           they enter another token, which often means  typing  in
 | |
|           another whole line.
 | |
| 
 | |
|           Flex scanners default to interactive unless you use the
 | |
|           -Cf  or  -CF  table-compression  options  (see  below).
 | |
|           That's because if you're looking  for  high-performance
 | |
|           you  should  be  using  one of these options, so if you
 | |
|           didn't, flex assumes you'd rather trade off  a  bit  of
 | |
|           run-time    performance   for   intuitive   interactive
 | |
|           behavior.  Note also that you cannot use -I in conjunc-
 | |
|           tion  with  -Cf or -CF. Thus, this option is not really
 | |
|           needed; it is on by default  for  all  those  cases  in
 | |
|           which it is allowed.
 | |
| 
 | |
|           You can force a scanner to not be interactive by  using
 | |
|           -B (see above).
 | |
| 
 | |
|      -L   instructs  flex  not  to  generate  #line   directives.
 | |
|           Without this option, flex peppers the generated scanner
 | |
|           with #line directives so error messages in the  actions
 | |
|           will  be  correctly  located with respect to either the
 | |
|           original flex input file (if the errors are due to code
 | |
|           in  the  input  file),  or  lex.yy.c (if the errors are
 | |
|           flex's fault -- you should report these sorts of errors
 | |
|           to the email address given below).
 | |
| 
 | |
|      -T   makes flex run in trace mode.  It will generate  a  lot
 | |
|           of  messages to stderr concerning the form of the input
 | |
|           and the resultant non-deterministic  and  deterministic
 | |
|           finite  automata.   This  option  is  mostly for use in
 | |
|           maintaining flex.
 | |
| 
 | |
|      -V   prints the version number to stdout and exits.   --ver-
 | |
|           sion is a synonym for -V.
 | |
| 
 | |
|      -7   instructs flex to generate a 7-bit scanner,  i.e.,  one
 | |
|           which  can  only  recognize  7-bit  characters   in its
 | |
|           input.  The advantage of using -7 is that the scanner's
 | |
|           tables  can  be  up to half the size of those generated
 | |
|           using the -8 option (see below).  The  disadvantage  is
 | |
|           that  such  scanners often hang or crash if their input
 | |
|           contains an 8-bit character.
 | |
| 
 | |
|           Note, however, that unless you  generate  your  scanner
 | |
|           using  the -Cf or -CF table compression options, use of
 | |
|           -7 will save only a small amount of  table  space,  and
 | |
|           make  your  scanner considerably less portable.  Flex's
 | |
|           default behavior is to generate an 8-bit scanner unless
 | |
|           you use -Cf or -CF, in which case  flex  defaults  to
 | |
|           generating 7-bit scanners unless your site  was  always
 | |
|           configured to generate 8-bit scanners (as will often be
 | |
|           the case with non-USA sites).   You  can  tell  whether
 | |
|           flex  generated a 7-bit or an 8-bit scanner by inspect-
 | |
|           ing the flag summary in  the  -v  output  as  described
 | |
|           above.
 | |
| 
 | |
|           Note that if you use -Cfe or -CFe (those table compres-
 | |
|           sion  options,  but  also  using equivalence classes as
 | |
|           discussed  below),  flex  still  defaults to generating
 | |
|           an  8-bit scanner, since usually with these compression
 | |
|           options full 8-bit tables are not much  more  expensive
 | |
|           than 7-bit tables.
 | |
| 
 | |
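Since a 7-bit scanner may hang or crash on 8-bit input, a program that cannot vouch for its input might check the data before scanning it. The following is a sketch in plain C; is_seven_bit() is a hypothetical helper, not part of flex.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if none of the len bytes at buf has its high bit set, i.e.
 * the data contains only 7-bit characters; return 0 otherwise.
 * is_seven_bit is a hypothetical helper, not part of flex.
 */
int is_seven_bit(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; ++i)
        if ((unsigned char) buf[i] & 0x80)
            return 0;
    return 1;
}
```

Such a check costs one pass over the input, so it only makes sense where the table-size savings of -7 actually matter.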
|      -8   instructs flex to generate an 8-bit scanner, i.e.,  one
 | |
|           which  can  recognize  8-bit  characters.  This flag is
 | |
|           only needed for scanners generated using -Cf or -CF, as
 | |
|           otherwise  flex defaults to generating an 8-bit scanner
 | |
|           anyway.
 | |
| 
 | |
|           See the discussion  of  -7  above  for  flex's  default
 | |
|           behavior  and  the  tradeoffs  between  7-bit and 8-bit
 | |
|           scanners.
 | |
| 
 | |
|      -+   specifies that you want flex to generate a C++  scanner
 | |
|           class.   See  the  section  on  Generating C++ Scanners
 | |
|           below for details.
 | |
| 
 | |
|      -C[aefFmr]
 | |
|           controls the degree of table compression and, more gen-
 | |
|           erally,  trade-offs  between  small  scanners  and fast
 | |
|           scanners.
 | |
| 
 | |
|           -Ca ("align") instructs flex to trade off larger tables
 | |
|           in the generated scanner for faster performance because
 | |
|           the elements of  the  tables  are  better  aligned  for
 | |
|           memory  access and computation.  On some RISC architec-
 | |
|           tures, fetching  and  manipulating  longwords  is  more
 | |
|           efficient  than with smaller-sized units such as short-
 | |
|           words.  This option can double the size of  the  tables
 | |
|           used by your scanner.
 | |
| 
 | |
|           -Ce directs  flex  to  construct  equivalence  classes,
 | |
|           i.e.,  sets  of characters which have identical lexical
 | |
|           properties (for example,  if  the  only  appearance  of
 | |
|           digits  in  the  flex  input  is in the character class
 | |
|           "[0-9]" then the digits '0', '1', ..., '9' will all  be
 | |
|           put   in  the  same  equivalence  class).   Equivalence
 | |
|           classes usually give dramatic reductions in  the  final
 | |
|           table/object file sizes (typically a factor of 2-5) and
 | |
|           are pretty cheap performance-wise  (one  array  look-up
 | |
|           per character scanned).
 | |
| 
 | |
|           -Cf specifies that the full scanner  tables  should  be
 | |
|           generated - flex should not compress the tables by taking
 | |
|           advantage of similar transition functions for different
 | |
|           states.
 | |
| 
 | |
|           -CF specifies that the alternate fast scanner represen-
 | |
|           tation  (described  above  under the -F flag) should be
 | |
|           used.  This option cannot be used with -+.
 | |
| 
 | |
|           -Cm directs flex to construct meta-equivalence classes,
 | |
|           which  are  sets of equivalence classes (or characters,
 | |
|           if equivalence classes are not  being  used)  that  are
 | |
|           commonly  used  together.  Meta-equivalence classes are
 | |
|           often a big win when using compressed tables, but  they
 | |
|           have  a  moderate  performance  impact (one or two "if"
 | |
|           tests and one array look-up per character scanned).
 | |
| 
 | |
|           -Cr causes the generated scanner to bypass use  of  the
 | |
|           standard  I/O  library  (stdio)  for input.  Instead of
 | |
|           calling fread() or getc(), the  scanner  will  use  the
 | |
|           read()  system  call,  resulting  in a performance gain
 | |
|           which varies from system to system, but in  general  is
 | |
|           probably  negligible  unless  you are also using -Cf or
 | |
|           -CF. Using -Cr can cause strange behavior if, for exam-
 | |
|           ple,  you  read  from yyin using stdio prior to calling
 | |
|           the scanner (because the  scanner  will  miss  whatever
 | |
|           text  your  previous  reads  left  in  the  stdio input
 | |
|           buffer).
 | |
| 
 | |
|           -Cr has no effect if you define YY_INPUT (see The  Gen-
 | |
|           erated Scanner above).
 | |
| 
 | |
|           A lone -C specifies that the scanner tables  should  be
 | |
|           compressed  but  neither  equivalence classes nor meta-
 | |
|           equivalence classes should be used.
 | |
| 
 | |
|           The options -Cf or  -CF  and  -Cm  do  not  make  sense
 | |
|           together - there is no opportunity for meta-equivalence
 | |
|           classes if the table is not being  compressed.   Other-
 | |
|           wise  the  options may be freely mixed, and are cumula-
 | |
|           tive.
 | |
| 
 | |
|           The default setting is -Cem, which specifies that  flex
 | |
|           should   generate   equivalence   classes   and   meta-
 | |
|           equivalence classes.  This setting provides the highest
 | |
|           degree   of  table  compression.   You  can  trade  off
 | |
|           faster-executing scanners at the cost of larger  tables
 | |
|           with the following generally being true:
 | |
| 
 | |
|               slowest & smallest
 | |
|                     -Cem
 | |
|                     -Cm
 | |
|                     -Ce
 | |
|                     -C
 | |
|                     -C{f,F}e
 | |
|                     -C{f,F}
 | |
|                     -C{f,F}a
 | |
|               fastest & largest
 | |
| 
 | |
|           Note that scanners with the smallest tables are usually
 | |
|           generated and compiled the quickest, so during develop-
 | |
|           ment you will usually want to use the default,  maximal
 | |
|           compression.
 | |
| 
 | |
|           -Cfe is often a good compromise between speed and  size
 | |
|           for production scanners.
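To see why equivalence classes shrink the tables, here is a small C sketch. It is not flex's own algorithm: it simply treats two input symbols as equivalent when every state of a toy transition table reacts to them identically, so their columns need to be stored only once. The function name build_classes and the row-major table layout are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * table holds nstates rows of nsyms entries (row-major); entry
 * table[s * nsyms + c] is the next state from state s on symbol c.
 * Two symbols share a class exactly when their columns are identical.
 * Fills class_of[c] for every symbol and returns the number of classes.
 */
size_t build_classes(const int *table, size_t nstates, size_t nsyms,
                     size_t *class_of)
{
    size_t nclasses = 0;
    size_t c, k, s;

    assert(table != NULL && class_of != NULL);

    for (c = 0; c < nsyms; c++) {
        /* Look for an earlier symbol whose column matches this one. */
        for (k = 0; k < c; k++) {
            for (s = 0; s < nstates; s++)
                if (table[s * nsyms + c] != table[s * nsyms + k])
                    break;
            if (s == nstates)
                break;               /* column k is identical */
        }
        class_of[c] = (k < c) ? class_of[k] : nclasses++;
    }
    return nclasses;
}
```

A compressed table then needs one column per class rather than one per character, at the price of an extra lookup through class_of on every input character. That extra step is the kind of cost behind the size-versus-speed ordering listed above.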
 | |
| 
 | |
|      -ooutput
 | |
|           directs flex to write the scanner to  the  file  output
 | |
|           instead  of  lex.yy.c.  If  you  combine -o with the -t
 | |
|           option, then the scanner is written to stdout  but  its
 | |
|           #line directives (see the -L option above) refer to the
 | |
|           file output.
 | |
| 
 | |
|      -Pprefix
 | |
|           changes the default yy prefix  used  by  flex  for  all
 | |
|           globally-visible variable and function names to instead
 | |
|           be prefix. For  example,  -Pfoo  changes  the  name  of
 | |
|           yytext  to  footext.  It  also  changes the name of the
 | |
|           default output file from lex.yy.c  to  lex.foo.c.  Here
 | |
|           are all of the names affected:
 | |
| 
 | |
|               yy_create_buffer
 | |
|               yy_delete_buffer
 | |
|               yy_flex_debug
 | |
|               yy_init_buffer
 | |
|               yy_flush_buffer
 | |
|               yy_load_buffer_state
 | |
|               yy_switch_to_buffer
 | |
|               yyin
 | |
|               yyleng
 | |
|               yylex
 | |
|               yylineno
 | |
|               yyout
 | |
|               yyrestart
 | |
|               yytext
 | |
|               yywrap
 | |
| 
 | |
|           (If you are using a C++ scanner, then only  yywrap  and
 | |
|           yyFlexLexer  are affected.) Within your scanner itself,
 | |
|           you can still refer to the global variables  and  func-
 | |
|           tions  using  either  version of their name; but exter-
 | |
|           nally, they have the modified name.
 | |
| 
 | |
|           This option lets you easily link together multiple flex
 | |
|           programs  into the same executable.  Note, though, that
 | |
|           using this option also renames  yywrap(),  so  you  now
 | |
|           must either provide your own (appropriately-named) ver-
 | |
|           sion of the routine for your scanner,  or  use  %option
 | |
|           noyywrap,  as  linking with -lfl no longer provides one
 | |
|           for you by default.
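The statement that either name works inside the scanner while only the modified name is visible externally can be pictured with preprocessor macros. The C sketch below uses the hypothetical prefix "foo" and a stand-in yylex(); it illustrates the naming effect only and makes no claim about the code flex actually emits.

```c
/* --- stand-in for a scanner generated with the prefix "foo" --- */
#define yyleng fooleng
#define yytext footext
#define yylex  foolex

int yyleng = 0;                 /* really defines fooleng */
const char *yytext = "";        /* really defines footext */

int yylex(void)                 /* really defines foolex() */
{
    yytext = "abc";             /* pretend a three-character token matched */
    yyleng = 3;
    return 1;
}

#undef yyleng
#undef yytext
#undef yylex
/* --- end of the stand-in scanner --- */

/* Code outside the scanner has to use the prefixed names. */
int scan_once_and_get_length(void)
{
    return foolex() ? fooleng : -1;
}
```

Because every scanner's globals end up under their own prefix, two such scanners can be linked into one executable without symbol clashes.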
 | |
| 
 | |
|      -Sskeleton_file
 | |
|           overrides the default skeleton  file  from  which  flex
 | |
|           constructs its scanners.  You'll never need this option
 | |
|           unless you are doing flex maintenance or development.
 | |
| 
 | |
|      flex also  provides  a  mechanism  for  controlling  options
 | |
|      within  the  scanner  specification itself, rather than from
 | |
|      the flex command-line.  This is done  by  including  %option
 | |
|      directives  in  the  first section of the scanner specifica-
 | |
|      tion.  You  can  specify  multiple  options  with  a  single
 | |
|      %option directive, and multiple directives in the first sec-
 | |
|      tion of your flex input file.
 | |
| 
 | |
|      Most options are given simply as names, optionally  preceded
 | |
|      by  the word "no" (with no intervening whitespace) to negate
 | |
|      their meaning.  A number are equivalent  to  flex  flags  or
 | |
|      their negation:
 | |
| 
 | |
|          7bit            -7 option
 | |
|          8bit            -8 option
 | |
|          align           -Ca option
 | |
|          backup          -b option
 | |
|          batch           -B option
 | |
|          c++             -+ option
 | |
| 
 | |
|          caseful or
 | |
|          case-sensitive  opposite of -i (default)
 | |
| 
 | |
|          case-insensitive or
 | |
|          caseless        -i option
 | |
| 
 | |
|          debug           -d option
 | |
|          default         opposite of -s option
 | |
|          ecs             -Ce option
 | |
|          fast            -F option
 | |
|          full            -f option
 | |
|          interactive     -I option
 | |
|          lex-compat      -l option
 | |
|          meta-ecs        -Cm option
 | |
|          perf-report     -p option
 | |
|          read            -Cr option
 | |
|          stdout          -t option
 | |
|          verbose         -v option
 | |
|          warn            opposite of -w option
 | |
|                          (use "%option nowarn" for -w)
 | |
| 
 | |
|          array           equivalent to "%array"
 | |
|          pointer         equivalent to "%pointer" (default)
 | |
| 
 | |
|      Some %option's provide features otherwise not available:
 | |
| 
 | |
|      always-interactive
 | |
|           instructs flex to generate a scanner which always  con-
 | |
|           siders  its input "interactive".  Normally, on each new
 | |
|           input file the scanner calls isatty() in an attempt  to
 | |
|           determine   whether   the  scanner's  input  source  is
 | |
|           interactive and thus should be read a  character  at  a
 | |
|           time.   When this option is used, however, then no such
 | |
|           call is made.
 | |
| 
 | |
|      main directs flex to provide a default  main()  program  for
 | |
|           the  scanner,  which  simply calls yylex(). This option
 | |
|           implies noyywrap (see below).
 | |
| 
 | |
|      never-interactive
 | |
|           instructs flex to generate a scanner which  never  con-
 | |
|           siders  its input "interactive" (again, no call made to
 | |
|           isatty()). This is the opposite of always-interactive.
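The decision that always-interactive and never-interactive override can be sketched in C as follows. This is an illustration under assumptions, not flex's generated code: the enum, its constants, and the function name are hypothetical, and isatty() and fileno() are POSIX rather than ISO C.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>   /* isatty(): POSIX */

/* Hypothetical names for the three behaviours described above. */
enum interactive_mode {
    MODE_DETECT,      /* default: ask isatty() for each new input file */
    MODE_ALWAYS,      /* like %option always-interactive */
    MODE_NEVER        /* like %option never-interactive */
};

/* Returns 1 if the input should be read a character at a time, else 0. */
int treat_as_interactive(FILE *in, enum interactive_mode mode)
{
    if (mode == MODE_ALWAYS)
        return 1;                        /* no isatty() call is made */
    if (mode == MODE_NEVER)
        return 0;                        /* no isatty() call is made */
    return in != NULL && isatty(fileno(in)) == 1;
}
```

A regular file or pipe is reported as non-interactive in MODE_DETECT, while a terminal is reported as interactive; the other two modes ignore the stream entirely.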
 | |
| 
 | |
|      stack
 | |
|           enables the use of start condition  stacks  (see  Start
 | |
|           Conditions above).
 | |
| 
 | |
|      stdinit
 | |
|           if set (i.e., %option  stdinit)  initializes  yyin  and
 | |
|           yyout  to  stdin  and stdout, instead of the default of
 | |
|           nil.  Some  existing  lex  programs  depend   on   this
 | |
|           behavior,  even though it is not compliant with ANSI C,
 | |
|           which does not require stdin and stdout to be  compile-
 | |
|           time constant.
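The portability remark can be made concrete with a short C sketch. Because ISO C does not promise that stdin and stdout are constant expressions, a file-scope initializer such as "FILE *in = stdin;" may be rejected by a conforming compiler; assigning the streams at run time avoids the problem. The names scan_in, scan_out and the helper functions are hypothetical stand-ins, not identifiers from flex.

```c
#include <stdio.h>

/* Hypothetical stand-ins for yyin and yyout; both start out as nil. */
static FILE *scan_in  = NULL;
static FILE *scan_out = NULL;

/* Fill in the standard streams at run time, only where still unset. */
void scanner_init_streams(void)
{
    if (scan_in == NULL)
        scan_in = stdin;
    if (scan_out == NULL)
        scan_out = stdout;
}

/* Accessors that make sure the defaults are in place before use. */
FILE *scanner_input(void)
{
    scanner_init_streams();
    return scan_in;
}

FILE *scanner_output(void)
{
    scanner_init_streams();
    return scan_out;
}
```

Code that inspects the stream variables before the first initialization sees nil here, which is exactly the difference that lex programs relying on the stdinit behavior would notice.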
 | |
| 
 | |
|      yylineno
 | |
|           directs flex to generate a scanner that  maintains  the
 | |
|           number  of  the current line read from its input in the
 | |
|           global variable yylineno. This  option  is  implied  by
 | |
|           %option lex-compat.
 | |
| 
 | |
|      yywrap
 | |
|           if unset (i.e., %option noyywrap),  makes  the  scanner
 | |
|           not  call  yywrap()  upon  an  end-of-file,  but simply
 | |
|           assume that there are no more files to scan (until  the
 | |
|           user  points  yyin  at  a  new  file  and calls yylex()
 | |
|           again).
 | |
| 
 | |
|      flex scans your rule actions to determine  whether  you  use
 | |
|      the  REJECT  or  yymore()  features.   The reject and yymore
 | |
|      options are available to override its decision as to whether
 | |
|      you  use  the options, either by setting them (e.g., %option
 | |
|      reject) to indicate the feature is indeed used, or unsetting
 | |
|      them  to  indicate  it  actually  is not used (e.g., %option
 | |
|      noyymore).
 | |
| 
 | |
|      Three options take string-delimited values, offset with '=':
 | |
| 
 | |
|          %option outfile="ABC"
 | |
| 
 | |
|      is equivalent to -oABC, and
 | |
| 
 | |
|          %option prefix="XYZ"
 | |
| 
 | |
|      is equivalent to -PXYZ. Finally,
 | |
| 
 | |
|          %option yyclass="foo"
 | |
| 
 | |
|      only applies when generating a C++ scanner (-+ option).  It
 | |
|      informs  flex  that  you  have  derived foo as a subclass of
 | |
|      yyFlexLexer, so flex will place your actions in  the  member
 | |
|      function  foo::yylex()  instead  of yyFlexLexer::yylex(). It
 | |
|      also generates a yyFlexLexer::yylex() member  function  that
 | |
|      emits a run-time error (by invoking yyFlexLexer::LexerError())
 | |
|      if called.  See Generating C++ Scanners, below, for additional
 | |
|      information.
 | |
| 
 | |
|      A number of options are available for lint purists who  want
 | |
|      to  suppress the appearance of unneeded routines in the gen-
 | |
|      erated scanner.  Each of  the  following,  if  unset  (e.g.,
 | |
|      %option nounput), results in the corresponding routine not
 | |
|      appearing in the generated scanner:
 | |
| 
 | |
|          input, unput
 | |
|          yy_push_state, yy_pop_state, yy_top_state
 | |
|          yy_scan_buffer, yy_scan_bytes, yy_scan_string
 | |
| 
 | |
|      (though yy_push_state()  and  friends  won't  appear  anyway
 | |
|      unless you use %option stack).
 | |
| 
 | |
| PERFORMANCE CONSIDERATIONS
 | |
|      The main design goal of  flex  is  that  it  generate  high-
 | |
|      performance  scanners.   It  has  been optimized for dealing
 | |
|      well with large sets of rules.  Aside from  the  effects  on
 | |
|      scanner  speed  of the table compression -C options outlined
 | |
|      above, there are a number of options/actions  which  degrade
 | |
|      performance.  These are, from most expensive to least:
 | |
| 
 | |
|          REJECT
 | |
|          %option yylineno
 | |
|          arbitrary trailing context
 | |
| 
 | |
|          pattern sets that require backing up
 | |
|          %array
 | |
|          %option interactive
 | |
|          %option always-interactive
 | |
| 
 | |
|          '^' beginning-of-line operator
 | |
|          yymore()
 | |
| 
 | |
|      with the first three all being quite expensive and the  last
 | |
|      two  being  quite  cheap.   Note also that unput() is imple-
 | |
|      mented as a routine call that potentially does quite  a  bit
 | |
|      of  work,  while yyless() is a quite-cheap macro; so if just
 | |
|      putting back some excess text you scanned, use yyless().
 | |
| 
 | |
|      REJECT should be avoided at all costs  when  performance  is
 | |
|      important.  It is a particularly expensive option.
 | |
| 
 | |
|      Getting rid of backing up is messy and can often be an enormous
 | |
|      amount of work for a complicated scanner.  In principle, one
 | |
|      begins by using the -b flag to generate a lex.backup file.
 | |
|      For example, on the input
 | |
| 
 | |
|          %%
 | |
|          foo        return TOK_KEYWORD;
 | |
|          foobar     return TOK_KEYWORD;
 | |
| 
 | |
|      the file looks like:
 | |
| 
 | |
|          State #6 is non-accepting -
 | |
|           associated rule line numbers:
 | |
|                 2       3
 | |
|           out-transitions: [ o ]
 | |
|           jam-transitions: EOF [ \001-n  p-\177 ]
 | |
| 
 | |
|          State #8 is non-accepting -
 | |
|           associated rule line numbers:
 | |
|                 3
 | |
|           out-transitions: [ a ]
 | |
|           jam-transitions: EOF [ \001-`  b-\177 ]
 | |
| 
 | |
|          State #9 is non-accepting -
 | |
|           associated rule line numbers:
 | |
|                 3
 | |
|           out-transitions: [ r ]
 | |
|           jam-transitions: EOF [ \001-q  s-\177 ]
 | |
| 
 | |
|          Compressed tables always back up.
 | |
| 
 | |
|      The first few lines tell us that there's a scanner state  in
 | |
|      which  it  can  make  a  transition on an 'o' but not on any
 | |
|      other character,  and  that  in  that  state  the  currently
 | |
|      scanned text does not match any rule.  The state occurs when
 | |
|      trying to match the rules found at lines  2  and  3  in  the
 | |
|      input  file.  If the scanner is in that state and then reads
 | |
|      something other than an 'o', it will have to back up to find
 | |
|      a  rule  which is matched.  With a bit of headscratching one
 | |
|      can see that this must be the state it's in when it has seen
 | |
|      "fo".   When  this  has  happened,  if  anything  other than
 | |
|      another 'o' is seen, the scanner will have  to  back  up  to
 | |
|      simply match the 'f' (by the default rule).
 | |
| 
 | |
|      The comment regarding State #8 indicates there's  a  problem
 | |
|      when  "foob"  has  been  scanned.   Indeed, on any character
 | |
|      other than an 'a', the scanner  will  have  to  back  up  to
 | |
|      accept  "foo".  Similarly, the comment for State #9 concerns
 | |
|      when "fooba" has been scanned and an 'r' does not follow.
 | |
| 
 | |
|      The final comment reminds us that there's no point going  to
 | |
|      all the trouble of removing backing up from the rules unless
 | |
|      we're using -Cf or -CF, since there's  no  performance  gain
 | |
|      doing so with compressed scanners.
 | |
| 
 | |
|      The way to remove the backing up is to add "error" rules:
 | |
| 
 | |
|          %%
 | |
|          foo         return TOK_KEYWORD;
 | |
|          foobar      return TOK_KEYWORD;
 | |
| 
 | |
|          fooba       |
 | |
|          foob        |
 | |
|          fo          {
 | |
|                      /* false alarm, not really a keyword */
 | |
|                      return TOK_ID;
 | |
|                      }
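The effect of the error rules can be checked with a toy longest-match simulation in C. It is only a sketch: literal strings stand in for rules, a built-in default rule matches any single character, and "backing up" is counted as the number of characters consumed past the end of the accepted token. The function name match_token and its interface are assumptions for this illustration and do not describe flex's internals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Matches one token at the start of `in` against the literal `rules`
 * plus a default rule that accepts any single character.  Returns the
 * length of the accepted token and stores in *backed_up how many
 * characters had been consumed beyond it when the scanner jammed.
 */
size_t match_token(const char *in, const char *const rules[],
                   size_t nrules, size_t *backed_up)
{
    size_t consumed = 0;   /* characters on which a transition succeeded */
    size_t accepted = 0;   /* length of the longest match seen so far */

    assert(in != NULL && backed_up != NULL);

    while (in[consumed] != '\0') {
        size_t len = consumed + 1;
        int alive = (len == 1);        /* default rule: one character */
        int accepting = (len == 1);
        size_t i;

        for (i = 0; i < nrules; i++) {
            size_t rl = strlen(rules[i]);
            if (rl >= len && strncmp(rules[i], in, len) == 0) {
                alive = 1;             /* text is a prefix of some rule */
                if (rl == len)
                    accepting = 1;     /* text is exactly some rule */
            }
        }
        if (!alive)
            break;                     /* jam: no rule can continue */
        consumed = len;
        if (accepting)
            accepted = len;
    }
    *backed_up = consumed - accepted;
    return accepted;
}
```

With only "foo" and "foobar" as rules, the input "foobaz" is accepted as "foo" after two characters of backing up, and "fox" falls back to the single character 'f' after one. Once "fooba", "foob" and "fo" are added, every state reached along the way is accepting and the count drops to zero for both inputs.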
 | |
| 
 | |
| 
 | |
|      Eliminating backing up among a list of keywords can also  be
 | |
|      done using a "catch-all" rule:
 | |
| 
 | |
|          %%
 | |
|          foo         return TOK_KEYWORD;
 | |
|          foobar      return TOK_KEYWORD;
 | |
| 
 | |
|          [a-z]+      return TOK_ID;
 | |
| 
 | |
|      This is usually the best solution when appropriate.
 | |
| 
 | |
|      Backing-up messages tend to cascade.  With a complicated set
 | |
|      of rules it's not uncommon to get hundreds of messages.  If
 | |
|      one can decipher them, though, it often takes only a dozen
 | |
|      or so rules to eliminate the backing up.  It is easy, however,
 | |
|      to make a mistake and have an error rule accidentally match a
 | |
|      valid token.  (A possible future flex feature will be to
 | |
|      automatically add rules to eliminate backing up.)
 | |
| 
 | |
|      It's important to keep in mind that you gain the benefits of
 | |
|      eliminating  backing up only if you eliminate every instance
 | |
|      of backing up.  Leaving just one means you gain nothing.
 | |
| 
 | |
|      Variable trailing context (where both the leading and trail-
 | |
|      ing  parts  do  not  have a fixed length) entails almost the
 | |
|      same performance loss as  REJECT  (i.e.,  substantial).   So
 | |
|      when possible a rule like:
 | |
| 
 | |
|          %%
 | |
|          mouse|rat/(cat|dog)   run();
 | |
| 
 | |
|      is better written:
 | |
| 
 | |
|          %%
 | |
|          mouse/cat|dog         run();
 | |
|          rat/cat|dog           run();
 | |
| 
 | |
|      or as
 | |
| 
 | |
|          %%
 | |
|          mouse|rat/cat         run();
 | |
|          mouse|rat/dog         run();
 | |
| 
 | |
|      Note that here the special '|' action does not  provide  any
 | |
|      savings,  and can even make things worse (see Deficiencies /
 | |
|      Bugs below).
 | |
| 
 | |
|      Another area where the user can increase a scanner's perfor-
 | |
|      mance  (and  one that's easier to implement) arises from the
 | |
|      fact that the longer the  tokens  matched,  the  faster  the
 | |
|      scanner will run.  This is because with long tokens the pro-
 | |
|      cessing of most input characters takes place in the  (short)
 | |
|      inner  scanning  loop, and does not often have to go through
 | |
|      the additional work of setting up the  scanning  environment
 | |
|      (e.g.,  yytext)  for  the  action.  Recall the scanner for C
 | |
|      comments:
 | |
| 
 | |
|          %x comment
 | |
|          %%
 | |
|                  int line_num = 1;
 | |
| 
 | |
|          "/*"         BEGIN(comment);
 | |
| 
 | |
|          <comment>[^*\n]*
 | |
|          <comment>"*"+[^*/\n]*
 | |
|          <comment>\n             ++line_num;
 | |
|          <comment>"*"+"/"        BEGIN(INITIAL);
 | |
| 
 | |
|      This could be sped up by writing it as:
 | |
| 
 | |
|          %x comment
 | |
|          %%
 | |
|                  int line_num = 1;
 | |
| 
 | |
|          "/*"         BEGIN(comment);
 | |
| 
 | |
|          <comment>[^*\n]*
 | |
|          <comment>[^*\n]*\n      ++line_num;
 | |
|          <comment>"*"+[^*/\n]*
 | |
|          <comment>"*"+[^*/\n]*\n ++line_num;
 | |
|          <comment>"*"+"/"        BEGIN(INITIAL);
 | |
| 
 | |
|      Now instead of each  newline  requiring  the  processing  of
 | |
|      another  action,  recognizing  the newlines is "distributed"
 | |
|      over the other rules to keep the matched  text  as  long  as
 | |
|      possible.   Note  that  adding  rules does not slow down the
 | |
|      scanner!  The speed of the scanner  is  independent  of  the
 | |
|      number  of  rules or (modulo the considerations given at the
 | |
|      beginning of this section) how  complicated  the  rules  are
 | |
|      with regard to operators such as '*' and '|'.
 | |
| 
 | |
|      A final example in speeding up a scanner: suppose  you  want
 | |
|      to  scan through a file containing identifiers and keywords,
 | |
|      one per line and with no other  extraneous  characters,  and
 | |
|      recognize all the keywords.  A natural first approach is:
 | |
| 
 | |
|          %%
 | |
|          asm      |
 | |
|          auto     |
 | |
|          break    |
 | |
|          ... etc ...
 | |
|          volatile |
 | |
|          while    /* it's a keyword */
 | |
| 
 | |
|          .|\n     /* it's not a keyword */
 | |
| 
 | |
|      To eliminate the back-tracking, introduce a catch-all rule:
 | |
| 
 | |
|          %%
 | |
|          asm      |
 | |
|          auto     |
 | |
|          break    |
 | |
|          ... etc ...
 | |
|          volatile |
 | |
|          while    /* it's a keyword */
 | |
| 
 | |
|          [a-z]+   |
 | |
|          .|\n     /* it's not a keyword */
 | |
| 
 | |
|      Now, if it's guaranteed that there's exactly  one  word  per
 | |
|      line,  then  we  can reduce the total number of matches by a
 | |
|      half by merging in the recognition of newlines with that  of
 | |
|      the other tokens:
 | |
| 
 | |
|          %%
 | |
|          asm\n    |
 | |
|          auto\n   |
 | |
|          break\n  |
 | |
|          ... etc ...
 | |
|          volatile\n |
 | |
|          while\n  /* it's a keyword */
 | |
| 
 | |
|          [a-z]+\n |
 | |
|          .|\n     /* it's not a keyword */
 | |
| 
 | |
|      One has to be careful here,  as  we  have  now  reintroduced
 | |
|      backing  up  into the scanner.  In particular, while we know
 | |
|      that there will never be any characters in the input  stream
 | |
|      other  than letters or newlines, flex can't figure this out,
 | |
|      and it will plan for possibly needing to back up when it has
 | |
|      scanned  a  token like "auto" and then the next character is
 | |
|      something other than a newline or a letter.   Previously  it
 | |
|      would  then  just match the "auto" rule and be done, but now
 | |
|      it has no "auto" rule, only an "auto\n" rule.  To  eliminate
 | |
|      the possibility of backing up, we could either duplicate all
 | |
|      rules but without final newlines, or, since we never  expect
 | |
|      to  encounter  such  an  input  and therefore don't care how
 | |
|      it's classified, we can introduce one more catch-all  rule,
 | |
|      this one which doesn't include a newline:
 | |
| 
 | |
|          %%
 | |
|          asm\n    |
 | |
|          auto\n   |
 | |
|          break\n  |
 | |
|          ... etc ...
 | |
|          volatile\n |
 | |
|          while\n  /* it's a keyword */
 | |
| 
 | |
|          [a-z]+\n |
 | |
|          [a-z]+   |
 | |
|          .|\n     /* it's not a keyword */
 | |
| 
 | |
|      Compiled with -Cf, this is about as fast as one  can  get  a
 | |
|      flex scanner to go for this particular problem.
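The claim that merging newline recognition halves the number of matches can be checked with a few lines of C. The sketch counts how many matches, and therefore actions, a hand-written stand-in for the rule sets above would perform. Keywords are not distinguished from other words, since they have the same length either way. The function name count_matches and its flag are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* True for the characters matched by the pattern [a-z]. */
static int is_lower_ascii(char ch)
{
    return ch >= 'a' && ch <= 'z';
}

/*
 * Counts the matches needed to scan `in`.  With merge_newline == 0 the
 * rules are "[a-z]+" and ".|\n"; with merge_newline != 0 a word also
 * swallows a directly following newline, as "[a-z]+\n" does.
 */
size_t count_matches(const char *in, int merge_newline)
{
    size_t matches = 0;

    assert(in != NULL);

    while (*in != '\0') {
        if (is_lower_ascii(*in)) {
            while (is_lower_ascii(*in))
                in++;                          /* [a-z]+ */
            if (merge_newline && *in == '\n')
                in++;                          /* the trailing \n */
        } else {
            in++;                              /* .|\n */
        }
        matches++;
    }
    return matches;
}
```

For the three-line input "auto\nfoo\nwhile\n" the separate-newline rules need six matches and the merged rules need three.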
 | |
| 
 | |
|      A final note: flex is slow when matching NUL's, particularly
 | |
|      when  a  token  contains multiple NUL's.  It's best to write
 | |
|      rules which match short amounts of text if it's  anticipated
 | |
|      that the text will often include NUL's.
 | |
| 
 | |
|      Another final note regarding performance: as mentioned above
 | |
|      in  the section How the Input is Matched, dynamically resiz-
 | |
|      ing yytext to accommodate huge  tokens  is  a  slow  process
 | |
|      because  it presently requires that the (huge) token be res-
 | |
|      canned from the beginning.  Thus if  performance  is  vital,
 | |
|      you  should  attempt to match "large" quantities of text but
 | |
|      not "huge" quantities, where the cutoff between the  two  is
 | |
|      at about 8K characters/token.
 | |
| 
 | |
| GENERATING C++ SCANNERS
 | |
|      flex provides two different ways to  generate  scanners  for
 | |
|      use  with C++.  The first way is to simply compile a scanner
 | |
|      generated by flex using a C++ compiler instead of a C compiler.
 | |
|      You should not encounter any compilation errors
 | |
|      (please report any you find to the email  address  given  in
 | |
|      the  Author  section  below).   You can then use C++ code in
 | |
|      your rule actions instead of C code.  Note that the  default
 | |
|      input  source  for  your  scanner  remains yyin, and default
 | |
|      echoing is still done to yyout. Both of these remain FILE  *
 | |
|      variables and not C++ streams.
 | |
| 
 | |
|      You can also use flex to generate a C++ scanner class, using
 | |
|      the  -+  option  (or,  equivalently,  %option c++), which is
 | |
|      automatically specified if the name of the  flex  executable
 | |
|      ends  in a '+', such as flex++. When using this option, flex
 | |
|      defaults to generating the scanner  to  the  file  lex.yy.cc
 | |
|      instead  of  lex.yy.c.  The  generated  scanner includes the
 | |
|      header file FlexLexer.h, which defines the interface to  two
 | |
|      C++ classes.
 | |
| 
 | |
|      The first class, FlexLexer, provides an abstract base  class
 | |
|      defining  the  general scanner class interface.  It provides
 | |
|      the following member functions:
 | |
| 
 | |
|      const char* YYText()
 | |
|           returns the text of the most  recently  matched  token,
 | |
|           the equivalent of yytext.
 | |
| 
 | |
|      int YYLeng()
 | |
|           returns the length of the most recently matched  token,
 | |
|           the equivalent of yyleng.
 | |
| 
 | |
|      int lineno() const
 | |
|           returns the current  input  line  number  (see  %option
 | |
|           yylineno), or 1 if %option yylineno was not used.
 | |
| 
 | |
|      void set_debug( int flag )
 | |
|           sets the debugging flag for the scanner, equivalent  to
 | |
|           assigning  to  yy_flex_debug  (see  the Options section
 | |
|           above).  Note that you must  build  the  scanner  using
 | |
|           %option debug to include debugging information in it.
 | |
| 
 | |
|      int debug() const
 | |
|           returns the current setting of the debugging flag.
 | |
| 
 | |
|      Also   provided   are   member   functions   equivalent   to
 | |
|      yy_switch_to_buffer(),  yy_create_buffer() (though the first
 | |
|      argument is an istream* object pointer  and  not  a  FILE*),
 | |
|      yy_flush_buffer(),   yy_delete_buffer(),   and   yyrestart()
 | |
|      (again, the first argument is an istream* object pointer).
 | |
| 
 | |
|      The second class  defined  in  FlexLexer.h  is  yyFlexLexer,
 | |
|      which  is  derived  from FlexLexer. It defines the following
 | |
|      additional member functions:
 | |
| 
 | |
|      yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )
 | |
|           constructs a yyFlexLexer object using the given streams
 | |
|           for  input  and  output.  If not specified, the streams
 | |
|           default to cin and cout, respectively.
 | |
| 
 | |
|      virtual int yylex()
 | |
|           performs the same role as yylex()  does  for  ordinary
 | |
|           flex  scanners:  it  scans  the input stream, consuming
 | |
|           tokens, until a rule's action returns a value.  If  you
 | |
|           derive a subclass S from yyFlexLexer and want to access
 | |
|           the member functions and variables of S inside yylex(),
 | |
|           then you need to use %option yyclass="S" to inform flex
 | |
|           that you will be using that subclass instead of yyFlex-
 | |
|           Lexer.   In   this   case,   rather   than   generating
 | |
|           yyFlexLexer::yylex(), flex  generates  S::yylex()  (and
 | |
|           also  generates a dummy yyFlexLexer::yylex() that calls
 | |
|           yyFlexLexer::LexerError() if called).
 | |
| 
 | |
|      virtual void switch_streams( istream* new_in = 0, ostream* new_out = 0 )
 | |
|           reassigns yyin to new_in (if non-nil) and yyout to new_out
 | |
|           (ditto), deleting the previous input buffer if yyin is
 | |
|           reassigned.
 | |
| 
 | |
|      int yylex( istream* new_in, ostream* new_out = 0 )
 | |
|           first switches the input  streams  via  switch_streams(
 | |
|           new_in,  new_out  )  and  then  returns  the  value  of
 | |
|           yylex().
 | |
| 
 | |
|      In addition, yyFlexLexer  defines  the  following  protected
 | |
|      virtual  functions which you can redefine in derived classes
 | |
|      to tailor the scanner:
 | |
| 
 | |
|      virtual int LexerInput( char* buf, int max_size )
 | |
|           reads up to max_size characters into  buf  and  returns
 | |
|           the  number  of  characters  read.  To indicate end-of-
 | |
|           input, return 0 characters.   Note  that  "interactive"
 | |
|           scanners  (see  the  -B  and -I flags) define the macro
 | |
|           YY_INTERACTIVE. If you redefine LexerInput()  and  need
 | |
|           to  take  different actions depending on whether or not
 | |
|           the scanner might  be  scanning  an  interactive  input
 | |
|           source,  you can test for the presence of this name via
 | |
|           #ifdef.
 | |
| 
 | |
|      virtual void LexerOutput( const char* buf, int size )
 | |
|           writes out size characters from the buffer buf,  which,
 | |
|           while NUL-terminated, may also contain "internal" NUL's
 | |
|           if the scanner's rules can match  text  with  NUL's  in
 | |
|           them.
 | |
| 
 | |
|      virtual void LexerError( const char* msg )
 | |
|           reports a fatal error message.  The default version  of
 | |
|           this function writes the message to the stream cerr and
 | |
|           exits.
 | |
| 
 | |
|      Note that a yyFlexLexer object contains its entire  scanning
 | |
|      state.   Thus  you  can use such objects to create reentrant
 | |
|      scanners.  You can instantiate  multiple  instances  of  the
 | |
|      same  yyFlexLexer  class,  and you can also combine multiple
 | |
|      C++ scanner classes together in the same program  using  the
 | |
|      -P option discussed above.
 | |
| 
 | |
|      Finally, note that the %array feature is  not  available  to
 | |
|      C++ scanner classes; you must use %pointer (the default).
 | |
| 
 | |
|      Here is an example of a simple C++ scanner:
 | |
| 
 | |
|              // An example of using the flex C++ scanner class.
 | |
| 
 | |
|          %{
 | |
|          int mylineno = 0;
 | |
|          %}
 | |
| 
 | |
|          string  \"[^\n"]+\"
 | |
| 
 | |
|          ws      [ \t]+
 | |
| 
 | |
|          alpha   [A-Za-z]
 | |
|          dig     [0-9]
 | |
|          name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
 | |
|          num1    [-+]?{dig}+\.?([eE][-+]?{dig}+)?
 | |
|          num2    [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
 | |
|          number  {num1}|{num2}
 | |
| 
 | |
|          %%
 | |
| 
 | |
|          {ws}    /* skip blanks and tabs */
 | |
| 
 | |
|          "/*"    {
 | |
|                  int c;
 | |
| 
 | |
|                  while((c = yyinput()) != 0)
 | |
|                      {
 | |
|                      if(c == '\n')
 | |
|                          ++mylineno;
 | |
| 
 | |
|                      else if(c == '*')
 | |
|                          {
 | |
|                          if((c = yyinput()) == '/')
 | |
|                              break;
 | |
|                          else
 | |
|                              unput(c);
 | |
|                          }
 | |
|                      }
 | |
|                  }
 | |
| 
 | |
|          {number}  cout << "number " << YYText() << '\n';
 | |
| 
 | |
|          \n        mylineno++;
 | |
| 
 | |
|          {name}    cout << "name " << YYText() << '\n';
 | |
| 
 | |
|          {string}  cout << "string " << YYText() << '\n';
 | |
| 
 | |
|          %%
 | |
| 
 | |
|          int main( int /* argc */, char** /* argv */ )
 | |
|              {
 | |
|              FlexLexer* lexer = new yyFlexLexer;
 | |
|              while(lexer->yylex() != 0)
 | |
|                  ;
 | |
|              return 0;
 | |
|              }
 | |
| 
 | |
|      If you want to create multiple  (different)  lexer  classes,
 | |
|      you  use  the -P flag (or the prefix= option) to rename each
 | |
|      yyFlexLexer to some other xxFlexLexer. You then can  include
 | |
|      <FlexLexer.h>  in  your  other sources once per lexer class,
 | |
|      first renaming yyFlexLexer as follows:
 | |
| 
 | |
|          #undef yyFlexLexer
 | |
|          #define yyFlexLexer xxFlexLexer
 | |
|          #include <FlexLexer.h>
 | |
| 
 | |
|          #undef yyFlexLexer
 | |
|          #define yyFlexLexer zzFlexLexer
 | |
|          #include <FlexLexer.h>
 | |
| 
 | |
|      if, for example, you used %option  prefix="xx"  for  one  of
 | |
|      your scanners and %option prefix="zz" for the other.
 | |
| 
 | |
|      IMPORTANT: the present form of the scanning class is experi-
 | |
|      mental and may change considerably between major releases.
 | |
| 
 | |
| INCOMPATIBILITIES WITH LEX AND POSIX
 | |
|      flex is a rewrite of the AT&T Unix lex tool (the two  imple-
 | |
|      mentations  do not share any code, though), with some exten-
 | |
|      sions and incompatibilities, both of which are of concern to
 | |
|      those who wish to write scanners acceptable to either imple-
 | |
|      mentation.  Flex is  fully  compliant  with  the  POSIX  lex
 | |
|      specification,   except   that   when  using  %pointer  (the
 | |
|      default), a call to unput() destroys the contents of yytext,
 | |
|      which is counter to the POSIX specification.
 | |
| 
 | |
|      In this section we discuss all of the known areas of  incom-
 | |
|      patibility  between flex, AT&T lex, and the POSIX specifica-
 | |
|      tion.
 | |
| 
 | |
|      flex's -l option turns on  maximum  compatibility  with  the
 | |
|      original  AT&T  lex  implementation,  at the cost of a major
 | |
|      loss in the generated scanner's performance.  We note  below
 | |
|      which incompatibilities can be overcome using the -l option.
 | |
| 
 | |
|      flex is fully compatible with lex with the following  excep-
 | |
|      tions:
 | |
| 
 | |
|      -    The undocumented lex scanner internal variable yylineno
 | |
|           is not supported unless -l or %option yylineno is used.
 | |
| 
 | |
|           yylineno should be maintained on  a  per-buffer  basis,
 | |
|           rather  than  a  per-scanner  (single  global variable)
 | |
|           basis.
 | |
| 
 | |
|           yylineno is not part of the POSIX specification.
 | |
| 
 | |
|      -    The input() routine is not redefinable, though  it  may
 | |
|           be  called  to  read  characters following whatever has
 | |
|           been matched by a rule.  If input() encounters an  end-
 | |
|           of-file  the  normal  yywrap()  processing  is done.  A
 | |
|           ``real'' end-of-file is returned by input() as EOF.
 | |
| 
 | |
|           Input is instead controlled by  defining  the  YY_INPUT
 | |
|           macro.
 | |
| 
 | |
|           The flex restriction that input() cannot  be  redefined
 | |
|           is  in  accordance  with the POSIX specification, which
 | |
|           simply does not specify  any  way  of  controlling  the
 | |
|           scanner's input other than by making an initial assign-
 | |
|           ment to yyin.
 | |
| 
 | |
|      -    The unput() routine is not redefinable.  This  restric-
 | |
|           tion is in accordance with POSIX.
 | |
| 
 | |
|      -    flex scanners are not as reentrant as lex scanners.  In
 | |
|           particular,  if  you have an interactive scanner and an
 | |
|           interrupt handler which long-jumps out of the  scanner,
 | |
|           and  the  scanner is subsequently called again, you may
 | |
|           get the following message:
 | |
| 
 | |
|               fatal flex scanner internal error--end of buffer missed
 | |
| 
 | |
|           To reenter the scanner, first use
 | |
| 
 | |
|               yyrestart( yyin );
 | |
| 
 | |
|           Note that this call will throw away any buffered input;
 | |
|           usually  this  isn't  a  problem  with  an  interactive
 | |
|           scanner.
 | |
| 
 | |
|           Also note that flex C++ scanner classes are  reentrant,
 | |
|           so  if  using  C++ is an option for you, you should use
 | |
|           them instead.  See "Generating C++ Scanners" above  for
 | |
|           details.
 | |
| 
 | |
|      -    output() is not supported.  Output from the ECHO  macro
 | |
|           is done to the file-pointer yyout (default stdout).
 | |
| 
 | |
|           output() is not part of the POSIX specification.
 | |
| 
 | |
|      -    lex does not support exclusive start  conditions  (%x),
 | |
|           though they are in the POSIX specification.
 | |
| 
 | |
|      -    When definitions are expanded, flex  encloses  them  in
 | |
|           parentheses.  With lex, the following:
 | |
| 
 | |
|               NAME    [A-Z][A-Z0-9]*
 | |
|               %%
 | |
|               foo{NAME}?      printf( "Found it\n" );
 | |
|               %%
 | |
| 
 | |
|           will not match the string "foo" because when the  macro
 | |
|           is expanded the rule is equivalent to
 | |
|           "foo[A-Z][A-Z0-9]*?" and the precedence is such that the
 | |
|           '?' is associated with "[A-Z0-9]*".  With flex, the rule
 | |
|           will be expanded to "foo([A-Z][A-Z0-9]*)?" and  so  the
 | |
|           string "foo" will match.
 | |
| 
 | |
|           Note that if the definition begins with ^ or ends  with
 | |
|           $  then  it  is not expanded with parentheses, to allow
 | |
|           these operators to appear in definitions without losing
 | |
|           their  special  meanings.   But the <s>, /, and <<EOF>>
 | |
|           operators cannot be used in a flex definition.
 | |
| 
 | |
|           Using -l results in the lex behavior of no  parentheses
 | |
|           around the definition.
 | |
| 
 | |
|           The POSIX  specification  is  that  the  definition  be
 | |
|           enclosed in parentheses.
 | |
| 
 | |
|      -    Some implementations of lex allow a  rule's  action  to
 | |
|           begin  on  a  separate  line, if the rule's pattern has
 | |
|           trailing whitespace:
 | |
| 
 | |
|               %%
 | |
|               foo|bar<space here>
 | |
|                 { foobar_action(); }
 | |
| 
 | |
|           flex does not support this feature.
 | |
| 
 | |
|      -    The lex %r (generate a Ratfor scanner)  option  is  not
 | |
|           supported.  It is not part of the POSIX specification.
 | |
| 
 | |
|      -    After a call to unput(), yytext is undefined until  the
 | |
|           next  token  is  matched,  unless the scanner was built
 | |
|           using %array. This is not the  case  with  lex  or  the
 | |
|           POSIX specification.  The -l option does away with this
 | |
|           incompatibility.
 | |
| 
 | |
|      -    The precedence of the {} (numeric  range)  operator  is
 | |
|           different.   lex  interprets  "abc{1,3}" as "match one,
 | |
|           two, or  three  occurrences  of  'abc'",  whereas  flex
 | |
|           interprets  it  as "match 'ab' followed by one, two, or
 | |
|           three occurrences of 'c'".  The latter is in  agreement
 | |
|           with the POSIX specification.
 | |
| 
 | |
|      -    The precedence of the ^  operator  is  different.   lex
 | |
|           interprets  "^foo|bar"  as  "match  either 'foo' at the
 | |
|           beginning of a line, or 'bar' anywhere",  whereas  flex
 | |
|           interprets  it  as "match either 'foo' or 'bar' if they
 | |
|           come at the beginning of a line".   The  latter  is  in
 | |
|           agreement with the POSIX specification.
 | |
| 
 | |
|      -    The special table-size declarations  such  as  %a  sup-
 | |
|           ported  by  lex are not required by flex scanners; flex
 | |
|           ignores them.
 | |
| 
 | |
|      -    The name FLEX_SCANNER is #define'd so scanners  may  be
 | |
|           written  for use with either flex or lex. Scanners also
 | |
|           include YY_FLEX_MAJOR_VERSION and YY_FLEX_MINOR_VERSION
 | |
|           indicating  which version of flex generated the scanner
 | |
|           (for example, for the 2.5 release, these defines  would
 | |
|           be 2 and 5 respectively).
 | |
| 
 | |
|      The following flex features are not included in lex  or  the
 | |
|      POSIX specification:
 | |
| 
 | |
|          C++ scanners
 | |
|          %option
 | |
|          start condition scopes
 | |
|          start condition stacks
 | |
|          interactive/non-interactive scanners
 | |
|          yy_scan_string() and friends
 | |
|          yyterminate()
 | |
|          yy_set_interactive()
 | |
|          yy_set_bol()
 | |
|          YY_AT_BOL()
 | |
|          <<EOF>>
 | |
|          <*>
 | |
|          YY_DECL
 | |
|          YY_START
 | |
|          YY_USER_ACTION
 | |
|          YY_USER_INIT
 | |
|          #line directives
 | |
|          %{}'s around actions
 | |
|          multiple actions on a line
 | |
| 
 | |
|      plus almost all of the flex flags.  The last feature in  the
 | |
|      list  refers to the fact that with flex you can put multiple
 | |
|      actions on the same line, separated with semicolons,   while
 | |
|      with lex, the following
 | |
| 
 | |
|          foo    handle_foo(); ++num_foos_seen;
 | |
| 
 | |
|      is (rather surprisingly) truncated to
 | |
| 
 | |
|          foo    handle_foo();
 | |
| 
 | |
|      flex does not truncate the action.   Actions  that  are  not
 | |
|      enclosed  in  braces are simply terminated at the end of the
 | |
|      line.
 | |
| 
 | |
| DIAGNOSTICS
 | |
|      warning, rule cannot be matched  indicates  that  the  given
 | |
|      rule  cannot  be matched because it follows other rules that
 | |
|      will always match the same text as it.  For example, in  the
 | |
|      following  "foo" cannot be matched because it comes after an
 | |
|      identifier "catch-all" rule:
 | |
| 
 | |
|          [a-z]+    got_identifier();
 | |
|          foo       got_foo();
 | |
| 
 | |
|      Using REJECT in a scanner suppresses this warning.
 | |
| 
 | |
|      warning, -s option given but default  rule  can  be  matched
 | |
|      means  that  it  is  possible  (perhaps only in a particular
 | |
|      start condition) that the default  rule  (match  any  single
 | |
|      character)  is  the  only  one  that will match a particular
 | |
|      input.  Since -s was given, presumably this is not intended.
 | |
| 
 | |
|      reject_used_but_not_detected undefined or
 | |
|      yymore_used_but_not_detected undefined - These errors  can
 | |
|      occur at compile time.  They indicate that the scanner  uses
 | |
|      REJECT  or yymore() but that flex failed to notice the fact,
 | |
|      meaning that flex scanned the first two sections looking for
 | |
|      occurrences  of  these  actions  and failed to find any, but
 | |
|      somehow you snuck some in (via a #include  file,  for  exam-
 | |
|      ple).   Use  %option reject or %option yymore to indicate to
 | |
|      flex that you really do use these features.
 | |
| 
 | |
|      flex scanner jammed - a scanner compiled with -s has encoun-
 | |
|      tered  an  input  string  which wasn't matched by any of its
 | |
|      rules.  This error can also occur due to internal problems.
 | |
| 
 | |
|      token too large, exceeds YYLMAX - your scanner  uses  %array
 | |
|      and one of its rules matched a string longer than the YYLMAX
 | |
|      constant (8K bytes by default).  You can increase the  value
 | |
|      by  #define'ing  YYLMAX  in  the definitions section of your
 | |
|      flex input.
 | |
| 
 | |
|      scanner requires -8 flag to use the  character  'x'  -  Your
 | |
|      scanner specification includes recognizing the 8-bit charac-
 | |
|      ter 'x' and you did  not  specify  the  -8  flag,  and  your
 | |
|      scanner  defaulted  to 7-bit because you used the -Cf or -CF
 | |
|      table compression options.  See the  discussion  of  the  -7
 | |
|      flag for details.
 | |
| 
 | |
|      flex scanner push-back overflow - you used unput()  to  push
 | |
|      back  so  much text that the scanner's buffer could not hold
 | |
|      both the pushed-back text and the current token  in  yytext.
 | |
|      Ideally  the scanner should dynamically resize the buffer in
 | |
|      this case, but at present it does not.
 | |
| 
 | |
|      input buffer overflow, can't enlarge buffer because  scanner
 | |
|      uses  REJECT  -  the  scanner  was  working  on  matching an
 | |
|      extremely large token and needed to expand the input buffer.
 | |
|      This doesn't work with scanners that use REJECT.
 | |
| 
 | |
|      fatal flex scanner internal error--end of  buffer  missed  -
 | |
|      This  can  occur  in  a  scanner  which is reentered after a
 | |
|      long-jump has jumped out (or over) the scanner's  activation
 | |
|      frame.  Before reentering the scanner, use:
 | |
| 
 | |
|          yyrestart( yyin );
 | |
| 
 | |
|      or, as noted above, switch to using the C++ scanner class.
 | |
| 
 | |
|      too many start conditions in <> - you  listed  more  start
 | |
|      conditions  in a <> construct than exist (so you must have
 | |
|      listed at least one of them twice).
 | |
| 
 | |
| FILES
 | |
|      -lfl
 | |
|           library with which scanners must be linked.
 | |
| 
 | |
|      lex.yy.c
 | |
|           generated scanner (called lexyy.c on some systems).
 | |
| 
 | |
|      lex.yy.cc
 | |
|           generated C++ scanner class, when using -+.
 | |
| 
 | |
|      <FlexLexer.h>
 | |
|           header file defining the C++ scanner base class,  Flex-
 | |
|           Lexer, and its derived class, yyFlexLexer.
 | |
| 
 | |
|      flex.skl
 | |
|           skeleton scanner.  This file is only used when building
 | |
|           flex, not when flex executes.
 | |
| 
 | |
|      lex.backup
 | |
|           backing-up information for -b flag (called  lex.bck  on
 | |
|           some systems).
 | |
| 
 | |
| DEFICIENCIES / BUGS
 | |
|      Some trailing context patterns cannot  be  properly  matched
 | |
|      and  generate  warning  messages  ("dangerous  trailing con-
 | |
|      text").  These are patterns where the ending  of  the  first
 | |
|      part  of  the rule matches the beginning of the second part,
 | |
|      such as "zx*/xy*", where the 'x*' matches  the  'x'  at  the
 | |
|      beginning  of  the  trailing  context.  (Note that the POSIX
 | |
|      draft states that the text matched by such patterns is unde-
 | |
|      fined.)
 | |
| 
 | |
|      For some trailing context rules, parts  which  are  actually
 | |
|      fixed-length  are  not  recognized  as  such, leading to the
 | |
|      abovementioned performance loss.  In particular, parts using
 | |
|      '|'   or  {n}  (such  as  "foo{3}")  are  always  considered
 | |
|      variable-length.
 | |
| 
 | |
|      Combining trailing context with the special '|'  action  can
 | |
|      result  in fixed trailing context being turned into the more
 | |
|      expensive variable trailing context, as  in  the  following
 | |
|      example:
 | |
| 
 | |
|          %%
 | |
|          abc      |
 | |
|          xyz/def
 | |
| 
 | |
| 
 | |
|      Use of unput() invalidates yytext  and  yyleng,  unless  the
 | |
|      %array directive or the -l option has been used.
 | |
| 
 | |
|      Pattern-matching  of  NUL's  is  substantially  slower  than
 | |
|      matching other characters.
 | |
| 
 | |
|      Dynamic resizing of the input buffer is slow, as it  entails
 | |
|      rescanning  all the text matched so far by the current (gen-
 | |
|      erally huge) token.
 | |
| 
 | |
|      Due to both buffering of input and  read-ahead,  you  cannot
 | |
|      intermix  calls to <stdio.h> routines, such as, for example,
 | |
|      getchar(), with flex rules and  expect  it  to  work.   Call
 | |
|      input() instead.
 | |
| 
 | |
|      The total number of table entries listed  by  the  -v  flag
 | |
|      excludes the table entries needed to determine what rule has
 | |
|      been matched.  The number of entries is equal to the  number
 | |
|      of  DFA states if the scanner does not use REJECT, and some-
 | |
|      what greater than the number of states if it does.
 | |
| 
 | |
|      REJECT cannot be used with the -f or -F options.
 | |
| 
 | |
|      The flex internal algorithms need documentation.
 | |
| 
 | |
| SEE ALSO
 | |
|      lex(1), yacc(1), sed(1), awk(1).
 | |
| 
 | |
|      John Levine,  Tony  Mason,  and  Doug  Brown,  Lex  &  Yacc,
 | |
|      O'Reilly and Associates.  Be sure to get the 2nd edition.
 | |
| 
 | |
|      M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator
 | |
| 
 | |
|      Alfred Aho, Ravi Sethi and Jeffrey Ullman, Compilers:  Prin-
 | |
|      ciples,   Techniques   and   Tools,  Addison-Wesley  (1986).
 | |
|      Describes  the  pattern-matching  techniques  used  by  flex
 | |
|      (deterministic finite automata).
 | |
| 
 | |
| AUTHOR
 | |
|      Vern Paxson, with the help of many ideas and  much  inspira-
 | |
|      tion  from Van Jacobson.  Original version by Jef Poskanzer.
 | |
|      The fast table representation is a partial implementation of
 | |
|      a  design done by Van Jacobson.  The implementation was done
 | |
|      by Kevin Gong and Vern Paxson.
 | |
| 
 | |
|      Thanks to the many flex beta-testers, feedbackers, and  con-
 | |
|      tributors,  especially Francois Pinard, Casey Leedom, Robert
 | |
|      Abramovitz,  Stan  Adermann,  Terry  Allen,  David   Barker-
 | |
|      Plummer,  John  Basrai,  Neal  Becker,  Nelson  H.F.  Beebe,
 | |
|      benson@odi.com, Karl Berry, Peter A. Bigot, Simon Blanchard,
 | |
|      Keith  Bostic,  Frederic Brehm, Ian Brockbank, Kin Cho, Nick
 | |
|      Christopher, Brian Clapper, J.T.  Conklin,  Jason  Coughlin,
 | |
|      Bill  Cox,  Nick  Cropper, Dave Curtis, Scott David Daniels,
 | |
|      Chris  G.  Demetriou,  Theo  Deraadt,  Mike  Donahue,  Chuck
 | |
|      Doucette,  Tom  Epperly,  Leo  Eskin,  Chris  Faylor,  Chris
 | |
|      Flatters, Jon Forrest, Jeffrey Friedl, Joe Gayda,  Kaveh  R.
 | |
|      Ghazi,  Wolfgang  Glunz, Eric Goldman, Christopher M. Gould,
 | |
|      Ulrich Grepel, Peer Griebel, Jan  Hajic,  Charles  Hemphill,
 | |
|      NORO  Hideo,  Jarkko  Hietaniemi, Scott Hofmann, Jeff Honig,
 | |
|      Dana Hudes, Eric Hughes,  John  Interrante,  Ceriel  Jacobs,
 | |
|      Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones, Henry
 | |
|      Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O  Kane,
 | |
|      Amir  Katz, ken@ken.hilco.com, Kevin B. Kenny, Steve Kirsch,
 | |
|      Winfried Koenig, Marq  Kole,  Ronald  Lamprecht,  Greg  Lee,
 | |
|      Rohan  Lenard, Craig Leres, John Levine, Steve Liddle, David
 | |
|      Loffredo, Mike Long, Mohamed el Lozy, Brian  Madsen,  Malte,
 | |
|      Joe Marshall, Bengt Martensson, Chris Metcalf, Luke Mewburn,
 | |
|      Jim Meyering,  R.  Alexander  Milowski,  Erik  Naggum,  G.T.
 | |
|      Nicol,  Landon  Noll,  James  Nordby,  Marc  Nozell, Richard
 | |
|      Ohnemus, Karsten Pahnke, Sven Panne,  Roland  Pesch,  Walter
 | |
|      Pelissero,  Gaumond  Pierre, Esmond Pitt, Jef Poskanzer, Joe
 | |
|      Rahmeh, Jarmo Raiha, Frederic Raimbault,  Pat  Rankin,  Rick
 | |
|      Richardson,  Kevin  Rodgers,  Kai  Uwe  Rommel, Jim Roskind,
 | |
|      Alberto Santini,  Andreas  Scherer,  Darrell  Schiebel,  Raf
 | |
|      Schietekat,  Doug  Schmidt,  Philippe  Schnoebelen,  Andreas
 | |
|      Schwab, Larry Schwimmer, Alex Siegel, Eckehard  Stolz,  Jan-
 | |
|      Erik  Strömquist, Mike Stump, Paul Stuart, Dave Tallman, Ian
 | |
|      Lance Taylor, Chris Thewalt, Richard M. Timoney, Jodi  Tsai,
 | |
|      Paul  Tuinenga,  Gary  Weik, Frank Whaley, Gerhard Wilhelms,
 | |
|      Kent Williams, Ken Yap,  Ron  Zellar,  Nathan  Zelle,  David
 | |
|      Zuhn,  and  those whose names have slipped my marginal mail-
 | |
|      archiving skills but whose contributions are appreciated all
 | |
|      the same.
 | |
| 
 | |
|      Thanks to Keith Bostic, Jon  Forrest,  Noah  Friedman,  John
 | |
|      Gilmore, Craig Leres, John Levine, Bob Mulcahy, G.T.  Nicol,
 | |
|      Francois Pinard, Rich Salz, and Richard  Stallman  for  help
 | |
|      with various distribution headaches.
 | |
| 
 | |
|      Thanks to Esmond Pitt and Earle Horton for  8-bit  character
 | |
|      support; to Benson Margulies and Fred Burke for C++ support;
 | |
|      to Kent Williams and Tom Epperly for C++ class  support;  to
 | |
|      Ove  Ewerlid  for  support  of NUL's; and to Eric Hughes for
 | |
|      support of multiple buffers.
 | |
| 
 | |
|      This work was primarily done when I was with the  Real  Time
 | |
|      Systems  Group at the Lawrence Berkeley Laboratory in Berke-
 | |
|      ley, CA.  Many  thanks  to  all  there  for  the  support  I
 | |
|      received.
 | |
| 
 | |
|      Send comments to vern@ee.lbl.gov.
 | |