| #	@(#)TOUR	5.1 (Berkeley) 3/7/91
 | |
| 
 | |
|                        A Tour through Ash
 | |
| 
 | |
|                Copyright 1989 by Kenneth Almquist.
 | |
| 
 | |
| 
 | |
| DIRECTORIES:  The subdirectory bltin contains commands which can
 | |
| be compiled stand-alone.  The rest of the source is in the main
 | |
| ash directory.
 | |
| 
 | |
| SOURCE CODE GENERATORS:  Files whose names begin with "mk" are
 | |
| programs that generate source code.  A complete list of these
 | |
| programs is:
 | |
| 
 | |
|         program         input files         generates
 | |
|         -------         ------------        ---------
 | |
|         mkbuiltins      builtins            builtins.h builtins.c
 | |
|         mkinit          *.c                 init.c
 | |
|         mknodes         nodetypes           nodes.h nodes.c
 | |
|         mksignames          -               signames.h signames.c
 | |
|         mksyntax            -               syntax.h syntax.c
 | |
|         mktokens            -               token.def
 | |
|         bltin/mkexpr    unary_op binary_op  operators.h operators.c
 | |
| 
 | |
| There are undoubtedly too many of these.  Mkinit searches all the
 | |
| C source files for entries looking like:
 | |
| 
 | |
|         INIT {
 | |
|               x = 1;    /* executed during initialization */
 | |
|         }
 | |
| 
 | |
|         RESET {
 | |
|               x = 2;    /* executed when the shell does a longjmp
 | |
|                            back to the main command loop */
 | |
|         }
 | |
| 
 | |
|         SHELLPROC {
 | |
|               x = 3;    /* executed when the shell runs a shell procedure */
 | |
|         }
 | |
| 
 | |
| It pulls this code out into routines which are called when particular
 | |
| events occur.  The intent is to improve modularity by isolating
 | |
| the information about which modules need to be explicitly
 | |
| initialized/reset within the modules themselves.
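
As a rough illustration of the idea, the three blocks shown above
might end up collected in init.c along the following lines.  This
is a sketch, not the real generated file: the routine names init,
reset and initshellproc, and the variable x, are assumptions made
for the example.

```c
/* Hypothetical shape of a generated init.c; all names are assumptions. */

int x;                      /* stands in for some module's private state */

/* Body collected from every INIT { ... } block in the *.c files. */
void init(void) {
    x = 1;                  /* executed during initialization */
}

/* Body collected from every RESET { ... } block. */
void reset(void) {
    x = 2;                  /* executed after a longjmp to the main loop */
}

/* Body collected from every SHELLPROC { ... } block. */
void initshellproc(void) {
    x = 3;                  /* executed when a shell procedure is run */
}
```

The point of the arrangement is that the main loop only needs to
know about the three collected routines, never about the modules
that contributed code to them.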
 | |
| 
 | |
| Mkinit recognizes several constructs for placing declarations in
 | |
| the init.c file.
 | |
|         INCLUDE "file.h"
 | |
| includes a file.  The storage class MKINIT makes a declaration
 | |
| available in the init.c file, for example:
 | |
|         MKINIT int funcnest;    /* depth of function calls */
 | |
| MKINIT alone on a line introduces a structure or union declara-
 | |
| tion:
 | |
|         MKINIT
 | |
|         struct redirtab {
 | |
|               short renamed[10];
 | |
|         };
 | |
| Preprocessor #define statements are copied to init.c without any
 | |
| special action to request this.
 | |
| 
 | |
| INDENTATION:  The ash source is indented in multiples of six
 | |
| spaces.  The only study that I have heard of on the subject con-
 | |
| cluded that the optimal amount to indent is in the range of four
 | |
| to six spaces.  I use six spaces since it is not too big a jump
 | |
| from the widely used eight spaces.  If you really hate six space
 | |
| indentation, use the adjind (source included) program to change
 | |
| it to something else.
 | |
| 
 | |
| EXCEPTIONS:  Code for dealing with exceptions appears in
 | |
| exceptions.c.  The C language doesn't include exception handling,
 | |
| so I implement it using setjmp and longjmp.  The global variable
 | |
| exception contains the type of exception.  EXERROR is raised by
 | |
| calling error.  EXINT is an interrupt.  EXSHELLPROC is an excep-
 | |
| tion which is raised when a shell procedure is invoked.  The pur-
 | |
| pose of EXSHELLPROC is to perform the cleanup actions associated
 | |
| with other exceptions.  After these cleanup actions, the shell
 | |
| can interpret a shell procedure itself without exec'ing a new
 | |
| copy of the shell.
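
The setjmp/longjmp technique can be sketched as follows.  This is
a minimal example, not the shell's own code: the names handler,
exraise, risky and guarded, and the numeric values of the
exception types, are assumptions.

```c
#include <setjmp.h>

/* Exception types named in the text; the numeric values are assumed. */
enum { EXINT = 1, EXERROR = 2, EXSHELLPROC = 3 };

jmp_buf *handler;           /* innermost active handler (hypothetical) */
int exception;              /* type of the exception being raised */

/* Raise an exception: record its type and jump to the active handler. */
void exraise(int e) {
    exception = e;
    longjmp(*handler, 1);
}

/* Stand-in for a routine deep inside the shell that may fail. */
void risky(int fail) {
    if (fail)
        exraise(EXERROR);
}

/* Run risky() under a handler.  Returns 0 if it completed normally,
 * otherwise the type of the exception that was raised. */
int guarded(int fail) {
    jmp_buf here;
    jmp_buf *saved = handler;   /* handlers nest: remember the outer one */

    handler = &here;
    if (setjmp(here) != 0) {    /* we arrive here via longjmp */
        handler = saved;
        return exception;
    }
    risky(fail);
    handler = saved;
    return 0;
}
```

Note that a routine which installs a handler must restore the
previous one on every path out, which is why guarded resets handler
both after a normal return and after catching the exception.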
 | |
| 
 | |
| INTERRUPTS:  In an interactive shell, an interrupt will cause an
 | |
| EXINT exception to return to the main command loop.  (Exception:
 | |
| EXINT is not raised if the user traps interrupts using the trap
 | |
| command.)  The INTOFF and INTON macros (defined in exception.h)
 | |
| provide uninterruptible critical sections.  Between the execution
 | |
| of INTOFF and the execution of INTON, interrupt signals will be
 | |
| held for later delivery.  INTOFF and INTON can be nested.
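
One way to get nestable critical sections is a depth counter plus
a pending flag.  The sketch below illustrates that scheme; apart
from INTOFF and INTON, every name in it (suppressint, intpending,
delivered, and this simplified onint) is an assumption.

```c
int suppressint;            /* nesting depth of INTOFF (hypothetical) */
int intpending;             /* an interrupt arrived while suppressed */
int delivered;              /* counts interrupts actually acted upon */

/* What the interrupt handler would do in this sketch. */
void onint(void) {
    if (suppressint > 0) {  /* inside a critical section: hold it */
        intpending = 1;
        return;
    }
    intpending = 0;
    delivered++;            /* a real shell would raise EXINT here */
}

/* Enter a critical section; sections may nest. */
#define INTOFF (suppressint++)

/* Leave a critical section; deliver a held interrupt only when the
 * outermost section is left. */
#define INTON do { \
        if (--suppressint == 0 && intpending) \
            onint(); \
    } while (0)
```

Because only the outermost INTON delivers the held interrupt, a
routine can use INTOFF and INTON freely without knowing whether its
caller is already inside a critical section.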
 | |
| 
 | |
| MEMALLOC.C:  Memalloc.c defines versions of malloc and realloc
 | |
| which call error when there is no memory left.  It also defines a
 | |
| stack oriented memory allocation scheme.  Allocating off a stack
 | |
| is probably more efficient than allocation using malloc, but the
 | |
| big advantage is that when an exception occurs all we have to do
 | |
| to free up the memory in use at the time of the exception is to
 | |
| restore the stack pointer.  The stack is implemented using a
 | |
| linked list of blocks.
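
A stack allocator built from a linked list of blocks might look
roughly like this.  It is a simplified sketch written for this
tour, not the code in memalloc.c: the routine names, the block size
and the structure layouts are all assumptions, and it calls abort
where the shell would call error.

```c
#include <stddef.h>
#include <stdlib.h>

#define MINSIZE 512                     /* assumed payload size of a block */
#define ALIGN   sizeof(max_align_t)     /* requests are rounded up to this */

struct stack_block {
    struct stack_block *prev;           /* older block in the linked list */
    size_t size;                        /* payload bytes in this block */
    size_t used;                        /* payload bytes handed out so far */
    max_align_t space[];                /* payload, suitably aligned */
};

struct stackmark {                      /* a saved "stack pointer" */
    struct stack_block *block;
    size_t used;
};

static struct stack_block *top;         /* newest block, NULL if none */

/* Allocate n bytes off the stack, adding a block if the top is full. */
void *stalloc(size_t n) {
    n = (n + ALIGN - 1) / ALIGN * ALIGN;
    if (top == NULL || top->size - top->used < n) {
        size_t size = n > MINSIZE ? n : MINSIZE;
        struct stack_block *b = malloc(sizeof *b + size);
        if (b == NULL)
            abort();                    /* the shell would call error */
        b->prev = top;
        b->size = size;
        b->used = 0;
        top = b;
    }
    void *p = (char *)top->space + top->used;
    top->used += n;
    return p;
}

/* Remember the current position of the stack. */
void setstackmark(struct stackmark *m) {
    m->block = top;
    m->used = top ? top->used : 0;
}

/* Free everything allocated since the mark was taken. */
void popstackmark(const struct stackmark *m) {
    while (top != m->block) {
        struct stack_block *b = top;
        top = b->prev;
        free(b);
    }
    if (top)
        top->used = m->used;
}
```

An exception handler that took a mark before starting work only
has to pop back to that mark; it does not need to know what was
allocated in between.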
 | |
| 
 | |
| STPUTC:  If the stack were contiguous, it would be easy to store
 | |
| strings on the stack without knowing in advance how long the
 | |
| string was going to be:
 | |
|         p = stackptr;
 | |
|         *p++ = c;       /* repeated as many times as needed */
 | |
|         stackptr = p;
 | |
| The following three macros (defined in memalloc.h) perform these
 | |
| operations, but grow the stack if you run off the end:
 | |
|         STARTSTACKSTR(p);
 | |
|         STPUTC(c, p);   /* repeated as many times as needed */
 | |
|         grabstackstr(p);
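
To show why the pointer has to be passed to the macro, here is a
simplified stand-in for the three operations.  These are NOT the
definitions in memalloc.h: the sketch grows a single region with
realloc instead of using the block stack, and the helper growstr
and the variables strbase, strleft and strsize are assumptions.

```c
#include <stdlib.h>

static char *strbase;       /* start of the string being built */
static int strleft;         /* free bytes remaining in the region */
static int strsize;         /* current capacity of the region */

/* Called only when the region is full.  Growing may move the string,
 * so the new write position is returned to replace the caller's p. */
char *growstr(void) {
    int len = strsize;
    strsize = strsize ? strsize * 2 : 16;
    strbase = realloc(strbase, (size_t)strsize);
    if (strbase == NULL)
        abort();
    strleft = strsize - len - 1;    /* -1: one char is stored right away */
    return strbase + len;
}

#define STARTSTACKSTR(p) ((p) = strbase, strleft = strsize)

#define STPUTC(c, p) \
    (--strleft >= 0 ? (*(p)++ = (c)) : ((p) = growstr(), *(p)++ = (c)))

/* Finish the string and hand its storage to the caller. */
char *grabstackstr(char *p) {
    char *s = strbase;
    (void)p;                        /* end pointer unused in this sketch */
    strbase = NULL;
    strsize = 0;
    strleft = 0;
    return s;
}
```

The common case of STPUTC is a decrement, a test and a store; the
function call happens only when the region runs out.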
 | |
| 
 | |
| We now start a top-down look at the code:
 | |
| 
 | |
| MAIN.C:  The main routine performs some initialization, executes
 | |
| the user's profile if necessary, and calls cmdloop.  Cmdloop
 | |
| repeatedly parses and executes commands.
 | |
| 
 | |
| OPTIONS.C:  This file contains the option processing code.  It is
 | |
| called from main to parse the shell arguments when the shell is
 | |
| invoked, and it also contains the set builtin.  The -i and -j op-
 | |
| tions (the latter turns on job control) require changes in signal
 | |
| handling.  The routines setjobctl (in jobs.c) and setinteractive
 | |
| (in trap.c) are called to handle changes to these options.
 | |
| 
 | |
| PARSING:  The parser code is all in parser.c.  A recursive des-
 | |
| cent parser is used.  Syntax tables (generated by mksyntax) are
 | |
| used to classify characters during lexical analysis.  There are
 | |
| three tables:  one for normal use, one for use when inside single
 | |
| quotes, and one for use when inside double quotes.  The tables
 | |
| are machine dependent because they are indexed by character vari-
 | |
| ables and the range of a char varies from machine to machine.
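
The machine dependence can be seen in a small sketch of such a
table.  The class names, the SYNBASE offset and the routines below
are assumptions made for illustration, and the real tables are
emitted by mksyntax as initialized arrays rather than filled in at
run time.

```c
#include <limits.h>

/* Hypothetical character classes; CWORD must be 0 (see initsyntax). */
enum { CWORD, CNL, CSQUOTE, CDQUOTE, CVAR, CSPCL };

/* Offset added to a char so that the lowest possible char indexes
 * element 0.  It is 0 where char is unsigned and 128 where char is
 * an 8-bit signed type, which is what makes the tables machine
 * dependent. */
#define SYNBASE (-CHAR_MIN)
#define TABSIZE (CHAR_MAX - CHAR_MIN + 1)

unsigned char basesyntax[TABSIZE];      /* outside quotes */
unsigned char sqsyntax[TABSIZE];        /* inside single quotes */

void initsyntax(void) {
    /* Static arrays start out zeroed and CWORD is 0, so only the
     * characters with a special meaning need an entry. */
    basesyntax[SYNBASE + '\n'] = CNL;
    basesyntax[SYNBASE + '\''] = CSQUOTE;
    basesyntax[SYNBASE + '"'] = CDQUOTE;
    basesyntax[SYNBASE + '$'] = CVAR;
    basesyntax[SYNBASE + ';'] = CSPCL;

    sqsyntax[SYNBASE + '\n'] = CNL;
    sqsyntax[SYNBASE + '\''] = CSQUOTE; /* ends the quoted string */
}

/* Classify c according to the table for the current quoting state. */
int classify(const unsigned char *table, char c) {
    return table[SYNBASE + c];
}
```

The lexer then switches tables when it enters or leaves quotes,
rather than testing the quoting state for every character.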
 | |
| 
 | |
| PARSE OUTPUT:  The output of the parser consists of a tree of
 | |
| nodes.  The various types of nodes are defined in the file node-
 | |
| types.
 | |
| 
 | |
| Nodes of type NARG are used to represent both words and the con-
 | |
| tents of here documents.  An early version of ash kept the con-
 | |
| tents of here documents in temporary files, but keeping here do-
 | |
| cuments in memory typically results in significantly better per-
 | |
| formance.  It would have been nice to make it an option to use
 | |
| temporary files for here documents, for the benefit of small
 | |
| machines, but the code to keep track of when to delete the tem-
 | |
| porary files was complex and I never fixed all the bugs in it.
 | |
| (AT&T has been maintaining the Bourne shell for more than ten
 | |
| years, and to the best of my knowledge they still haven't gotten
 | |
| it to handle temporary files correctly in obscure cases.)
 | |
| 
 | |
| The text field of a NARG structure points to the text of the
 | |
| word.  The text consists of ordinary characters and a number of
 | |
| special codes defined in parser.h.  The special codes are:
 | |
| 
 | |
|         CTLVAR              Variable substitution
 | |
|         CTLENDVAR           End of variable substitution
 | |
|         CTLBACKQ            Command substitution
 | |
|         CTLBACKQ|CTLQUOTE   Command substitution inside double quotes
 | |
|         CTLESC              Escape next character
 | |
| 
 | |
| A variable substitution contains the following elements:
 | |
| 
 | |
|         CTLVAR type name '=' [ alternative-text CTLENDVAR ]
 | |
| 
 | |
| The type field is a single character specifying the type of sub-
 | |
| stitution.  The possible types are:
 | |
| 
 | |
|         VSNORMAL            $var
 | |
|         VSMINUS             ${var-text}
 | |
|         VSMINUS|VSNUL       ${var:-text}
 | |
|         VSPLUS              ${var+text}
 | |
|         VSPLUS|VSNUL        ${var:+text}
 | |
|         VSQUESTION          ${var?text}
 | |
|         VSQUESTION|VSNUL    ${var:?text}
 | |
|         VSASSIGN            ${var=text}
 | |
|         VSASSIGN|VSNUL      ${var:=text}
 | |
| 
 | |
| In addition, the type field will have the VSQUOTE flag set if the
 | |
| variable is enclosed in double quotes.  The name of the variable
 | |
| comes next, terminated by an equals sign.  If the type is not
 | |
| VSNORMAL, then the text field in the substitution follows, ter-
 | |
| minated by a CTLENDVAR byte.
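
The layout just described can be decoded with a few lines of C.
The sketch below is illustrative only: the numeric values of the
codes and flags, the varsub structure and the routine parsevarsub
are assumptions, and it ignores CTLESC bytes and nested
substitutions inside the alternative text.

```c
#include <stddef.h>
#include <string.h>

/* Numeric values are assumptions made for this sketch. */
#define CTLVAR      '\202'
#define CTLENDVAR   '\203'
#define VSTYPE      0x07        /* mask selecting the substitution type */
#define VSNORMAL    1
#define VSMINUS     2
#define VSNUL       0x20        /* colon form, e.g. ${var:-text} */
#define VSQUOTE     0x40        /* substitution is inside double quotes */

struct varsub {
    int type;                   /* VSNORMAL, VSMINUS, ... */
    int nul;                    /* nonzero if VSNUL was set */
    int quoted;                 /* nonzero if VSQUOTE was set */
    const char *name;           /* variable name, not NUL terminated */
    size_t namelen;
    const char *alt;            /* alternative text, if any */
    size_t altlen;
    const char *end;            /* first byte after the substitution */
};

/* Decode the substitution starting at p.  Returns 0 on success and
 * -1 if the bytes do not follow the layout. */
int parsevarsub(const char *p, struct varsub *v) {
    if (*p++ != CTLVAR)
        return -1;
    unsigned char flags = (unsigned char)*p;
    if (flags == 0)
        return -1;
    p++;
    v->type = flags & VSTYPE;
    v->nul = (flags & VSNUL) != 0;
    v->quoted = (flags & VSQUOTE) != 0;

    const char *eq = strchr(p, '=');        /* name ends at '=' */
    if (eq == NULL)
        return -1;
    v->name = p;
    v->namelen = (size_t)(eq - p);
    v->alt = eq + 1;
    v->altlen = 0;
    v->end = eq + 1;

    if (v->type != VSNORMAL) {              /* text up to CTLENDVAR */
        const char *endvar = strchr(eq + 1, CTLENDVAR);
        if (endvar == NULL)
            return -1;
        v->altlen = (size_t)(endvar - (eq + 1));
        v->end = endvar + 1;
    }
    return 0;
}
```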
 | |
| 
 | |
| Commands in back quotes are parsed and stored in a linked list.
 | |
| The locations of these commands in the string are indicated by
 | |
| CTLBACKQ and CTLBACKQ|CTLQUOTE characters, depending upon whether
 | |
| the back quotes were enclosed in double quotes.
 | |
| 
 | |
| The character CTLESC escapes the next character, so that in case
 | |
| any of the CTL characters mentioned above appear in the input,
 | |
| they can be passed through transparently.  CTLESC is also used to
 | |
| escape '*', '?', '[', and '!' characters which were quoted by the
 | |
| user and thus should not be used for file name generation.
 | |
| 
 | |
| CTLESC characters have proved to be particularly tricky to get
 | |
| right.  In the case of here documents which are not subject to
 | |
| variable and command substitution, the parser doesn't insert any
 | |
| CTLESC characters to begin with (so the contents of the text
 | |
| field can be written without any processing).  Other here docu-
 | |
| ments, and words which are not subject to splitting and file name
 | |
| generation, have the CTLESC characters removed during the
 | |
| variable and command substitution phase.  Words which are subject
 | |
| to splitting and file name generation have the CTLESC characters
 | |
| removed as part of the file name generation phase.
 | |
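
Whichever phase does it, removing the escapes amounts to a single
in-place pass over the string.  A minimal sketch, assuming a value
for CTLESC and using the hypothetical routine name rmescapes:

```c
#define CTLESC '\201'       /* value is an assumption */

/* Remove CTLESC bytes in place, keeping the character that each one
 * protects.  An escaped CTLESC therefore yields one literal CTLESC. */
void rmescapes(char *s) {
    char *q = s;            /* write position; never ahead of s */

    while (*s != '\0') {
        if (*s == CTLESC && s[1] != '\0')
            s++;            /* drop the escape, keep what follows */
        *q++ = *s++;
    }
    *q = '\0';
}
```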
| 
 | |
| EXECUTION:  Command execution is handled by the following files:
 | |
|         eval.c     The top level routines.
 | |
|         redir.c    Code to handle redirection of input and output.
 | |
|         jobs.c     Code to handle forking, waiting, and job control.
 | |
|         exec.c     Code to do path searches and the actual exec sys call.
 | |
|         expand.c   Code to evaluate arguments.
 | |
|         var.c      Maintains the variable symbol table.  Called from expand.c.
 | |
| 
 | |
| EVAL.C:  Evaltree recursively executes a parse tree.  The exit
 | |
| status is returned in the global variable exitstatus.  The alter-
 | |
| native entry evalbackcmd is called to evaluate commands in back
 | |
| quotes.  It saves the result in memory if the command is a buil-
 | |
| tin; otherwise it forks off a child to execute the command and
 | |
| connects the standard output of the child to a pipe.
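
The recursive structure of a tree evaluator can be sketched as
follows.  The node layout and the handful of node types are
assumptions (the real ones come from the nodetypes file), and a
"command" here simply yields a fixed exit status instead of
running anything.

```c
#include <stddef.h>

/* A few node types; names and values are assumed for this sketch. */
enum { NCMD, NSEMI, NAND, NOR, NNOT };

struct node {
    int type;
    struct node *left;      /* first child; unused for NCMD */
    struct node *right;     /* second child; unused for NCMD and NNOT */
    int status;             /* NCMD only: status the "command" yields */
};

int exitstatus;             /* exit status of the last command run */

/* Execute a parse tree, leaving the result in exitstatus. */
void evaltree(const struct node *n) {
    if (n == NULL)
        return;
    switch (n->type) {
    case NCMD:                      /* simple command */
        exitstatus = n->status;
        break;
    case NSEMI:                     /* a ; b */
        evaltree(n->left);
        evaltree(n->right);
        break;
    case NAND:                      /* a && b */
        evaltree(n->left);
        if (exitstatus == 0)
            evaltree(n->right);
        break;
    case NOR:                       /* a || b */
        evaltree(n->left);
        if (exitstatus != 0)
            evaltree(n->right);
        break;
    case NNOT:                      /* ! a */
        evaltree(n->left);
        exitstatus = !exitstatus;
        break;
    }
}
```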
 | |
| 
 | |
| JOBS.C:  To create a process, you call makejob to return a job
 | |
| structure, and then call forkshell (passing the job structure as
 | |
| an argument) to create the process.  Waitforjob waits for a job
 | |
| to complete.  These routines take care of process groups if job
 | |
| control is defined.
 | |
| 
 | |
| REDIR.C:  Ash allows file descriptors to be redirected and then
 | |
| restored without forking off a child process.  This is
 | |
| accomplished by duplicating the original file descriptors.  The
 | |
| redirtab structure records where the file descriptors have been
 | |
| duplicated to.
 | |
| 
 | |
| EXEC.C:  The routine find_command locates a command, and enters
 | |
| the command in the hash table if it is not already there.  The
 | |
| third argument specifies whether it is to print an error message
 | |
| if the command is not found.  (When a pipeline is set up,
 | |
| find_command is called for all the commands in the pipeline
 | |
| before any forking is done, so as to get the commands into the hash
 | |
| table of the parent process.  But to make command hashing as
 | |
| transparent as possible, we silently ignore errors at that point
 | |
| and only print error messages if the command cannot be found
 | |
| later.)
 | |
| 
 | |
| The routine shellexec is the interface to the exec system call.
 | |
| 
 | |
| EXPAND.C:  Arguments are processed in three passes.  The first
 | |
| (performed by the routine argstr) performs variable and command
 | |
| substitution.  The second (ifsbreakup) performs word splitting
 | |
| and the third (expandmeta) performs file name generation.  If the
 | |
| "/u" directory is simulated, then when "/u/username" is replaced
 | |
| by the user's home directory, the flag "didudir" is set.  This
 | |
| tells the cd command that it should print out the directory name,
 | |
| just as it would if the "/u" directory were implemented using
 | |
| symbolic links.
 | |
| 
 | |
| VAR.C:  Variables are stored in a hash table.  Probably we should
 | |
| switch to extensible hashing.  The variable name is stored in the
 | |
| same string as the value (using the format "name=value") so that
 | |
| no string copying is needed to create the environment of a com-
 | |
| mand.  Variables which the shell references internally are preal-
 | |
| located so that the shell can reference the values of these vari-
 | |
| ables without doing a lookup.
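
The benefit of the "name=value" format shows up when the
environment is built.  In the sketch below the table size, the hash
function, the structure layout and the routine bodies are all
assumptions made for illustration; only the storage format comes
from the description above.

```c
#include <stdlib.h>
#include <string.h>

#define VTABSIZE 39         /* table size is an assumption */

struct var {
    struct var *next;       /* next entry on the same hash chain */
    char *text;             /* "name=value", kept as one string */
};

static struct var *vartab[VTABSIZE];

/* Hash only the name, i.e. the characters before '='. */
static unsigned hashvar(const char *p) {
    unsigned h = 0;
    while (*p != '\0' && *p != '=')
        h = h * 31 + (unsigned char)*p++;
    return h % VTABSIZE;
}

/* True if entry ("name=value") has the name s ("name" or "name=..."). */
static int varequal(const char *entry, const char *s) {
    while (*entry == *s && *entry != '=' && *entry != '\0') {
        entry++;
        s++;
    }
    return *entry == '=' && (*s == '=' || *s == '\0');
}

/* Enter a "name=value" string, replacing any entry of the same name.
 * The table keeps the pointer; the string is not copied. */
void setvareq(char *text) {
    struct var **head = &vartab[hashvar(text)];
    for (struct var *vp = *head; vp != NULL; vp = vp->next) {
        if (varequal(vp->text, text)) {
            vp->text = text;
            return;
        }
    }
    struct var *vp = malloc(sizeof *vp);
    if (vp == NULL)
        abort();
    vp->text = text;
    vp->next = *head;
    *head = vp;
}

/* Return the value of a variable, or NULL if it is not set. */
const char *lookupvar(const char *name) {
    for (struct var *vp = vartab[hashvar(name)]; vp != NULL; vp = vp->next)
        if (varequal(vp->text, name))
            return strchr(vp->text, '=') + 1;
    return NULL;
}

/* Build a NULL-terminated envp array.  Its entries point straight at
 * the stored strings, so no string copying is needed. */
char **environment(void) {
    size_t n = 0;
    for (int i = 0; i < VTABSIZE; i++)
        for (struct var *vp = vartab[i]; vp != NULL; vp = vp->next)
            n++;
    char **env = malloc((n + 1) * sizeof *env);
    if (env == NULL)
        abort();
    n = 0;
    for (int i = 0; i < VTABSIZE; i++)
        for (struct var *vp = vartab[i]; vp != NULL; vp = vp->next)
            env[n++] = vp->text;
    env[n] = NULL;
    return env;
}
```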
 | |
| 
 | |
| When a program is run, the code in eval.c sticks any environment
 | |
| variables which precede the command (as in "PATH=xxx command") in
 | |
| the variable table as the simplest way to strip duplicates, and
 | |
| then calls "environment" to get the value of the environment.
 | |
| There are two consequences of this.  First, if an assignment to
 | |
| PATH precedes the command, the value of PATH before the assign-
 | |
| ment must be remembered and passed to shellexec.  Second, if the
 | |
| program turns out to be a shell procedure, the strings from the
 | |
| environment variables which preceded the command must be pulled
 | |
| out of the table and replaced with strings obtained from malloc,
 | |
| since the former will automatically be freed when the stack (see
 | |
| the entry on memalloc.c) is emptied.
 | |
| 
 | |
| BUILTIN COMMANDS:  The procedures for handling these are scat-
 | |
| tered throughout the code, depending on which location appears
 | |
| most appropriate.  They can be recognized because their names al-
 | |
| ways end in "cmd".  The mapping from names to procedures is
 | |
| specified in the file builtins, which is processed by the mkbuil-
 | |
| tins command.
 | |
| 
 | |
| A builtin command is invoked with argc and argv set up like a
 | |
| normal program.  A builtin command is allowed to overwrite its
 | |
| arguments.  Builtin routines can call nextopt to do option pars-
 | |
| ing.  This is kind of like getopt, but you don't pass argc and
 | |
| argv to it.  Builtin routines can also call error.  This routine
 | |
| normally terminates the shell (or returns to the main command
 | |
| loop if the shell is interactive), but when called from a builtin
 | |
| command it causes the builtin command to terminate with an exit
 | |
| status of 2.
 | |
| 
 | |
| The directory bltin contains commands which can be compiled
 | |
| independently but can also be built into the shell for efficiency
 | |
| reasons.  The makefile in this directory compiles these programs
 | |
| in the normal fashion (so that they can be run regardless of
 | |
| whether the invoker is ash), but also creates a library named
 | |
| bltinlib.a which can be linked with ash.  The header file bltin.h
 | |
| takes care of most of the differences between the ash and the
 | |
| stand-alone environment.  The user should call the main routine
 | |
| "main", and #define main to be the name of the routine to use
 | |
| when the program is linked into ash.  This #define should appear
 | |
| before bltin.h is included; bltin.h will #undef main if the pro-
 | |
| gram is to be compiled stand-alone.
 | |
| 
 | |
| CD.C:  This file defines the cd and pwd builtins.  The pwd com-
 | |
| mand runs /bin/pwd the first time it is invoked (unless the user
 | |
| has already done a cd to an absolute pathname), but then
 | |
| remembers the current directory and updates it when the cd com-
 | |
| mand is run, so subsequent pwd commands run very fast.  The main
 | |
| complication in the cd command is in the docd routine, which
 | |
| resolves symbolic links into actual names and informs the user
 | |
| where the user ended up if he crossed a symbolic link.
 | |
| 
 | |
| SIGNALS:  Trap.c implements the trap command.  The routine set-
 | |
| signal figures out what action should be taken when a signal is
 | |
| received and invokes the signal system call to set the signal ac-
 | |
| tion appropriately.  When a signal that a user has set a trap for
 | |
| is caught, the routine "onsig" sets a flag.  The routine dotrap
 | |
| is called at appropriate points to actually handle the signal.
 | |
| When an interrupt is caught and no trap has been set for that
 | |
| signal, the routine "onint" in error.c is called.
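
The split between catching a signal and acting on it can be
sketched like this.  The arrays, the MAXSIG bound and the routine
settrap are assumptions, and where the shell would evaluate the
user's trap action this sketch only counts the event.

```c
#include <signal.h>

#define MAXSIG 64           /* assumed bound on signal numbers */

static volatile sig_atomic_t gotsig[MAXSIG];    /* set by the handler */
static volatile sig_atomic_t pendingsigs;       /* is any flag set? */
int handled[MAXSIG];        /* how often each trap action has "run" */

/* Signal handler: do nothing except record that the signal arrived. */
void onsig(int signo) {
    if (signo > 0 && signo < MAXSIG) {
        gotsig[signo] = 1;
        pendingsigs = 1;
    }
}

/* Called at safe points in the main code to act on caught signals. */
void dotrap(void) {
    if (!pendingsigs)
        return;
    pendingsigs = 0;
    for (int i = 1; i < MAXSIG; i++) {
        if (gotsig[i]) {
            gotsig[i] = 0;
            handled[i]++;   /* the shell would run the trap action */
        }
    }
}

/* Arrange for signo to be caught.  Returns 0 on success, -1 on error. */
int settrap(int signo) {
    return signal(signo, onsig) == SIG_ERR ? -1 : 0;
}
```

Keeping the handler this small means the trap action itself always
runs from ordinary code, never from inside the signal handler.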
 | |
| 
 | |
| OUTPUT:  Ash uses its own output routines.  There are three
 | |
| output structures allocated.  "Output" represents the standard
 | |
| output, "errout" the standard error, and "memout" contains output
 | |
| which is to be stored in memory.  This last is used when a buil-
 | |
| tin command appears in backquotes, to allow its output to be col-
 | |
| lected without doing any I/O through the UNIX operating system.
 | |
| The variables out1 and out2 normally point to output and errout,
 | |
| respectively, but they are set to point to memout when appropri-
 | |
| ate inside backquotes.
 | |
| 
 | |
| INPUT:  The basic input routine is pgetc, which reads from the
 | |
| current input file.  There is a stack of input files; the current
 | |
| input file is the top file on this stack.  The code allows the
 | |
| input to come from a string rather than a file.  (This is for the
 | |
| -c option and the "." and eval builtin commands.)  The global
 | |
| variable plinno is saved and restored when files are pushed and
 | |
| popped from the stack.  The parser routines store the number of
 | |
| the current line in this variable.
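
A stack of input sources, with plinno saved and restored around
each push and pop, might be sketched as below.  Only pgetc and
plinno are named in the text; the structure and the routines
setinputstring and popfile are assumptions, and the sketch handles
string input only.

```c
#include <stdio.h>
#include <stdlib.h>

struct parsefile {
    struct parsefile *prev; /* entry below this one on the stack */
    const char *nextc;      /* next unread character of the string */
    int linno;              /* plinno of the source that was interrupted */
};

static struct parsefile *parsefile;     /* top of stack, NULL if empty */
int plinno = 1;             /* current line, maintained by the parser */

/* Push a string onto the input stack and start reading from it. */
void setinputstring(const char *s) {
    struct parsefile *pf = malloc(sizeof *pf);
    if (pf == NULL)
        abort();
    pf->prev = parsefile;
    pf->nextc = s;
    pf->linno = plinno;     /* save the outer source's line number */
    parsefile = pf;
    plinno = 1;
}

/* Pop the current source and resume the one below it. */
void popfile(void) {
    struct parsefile *pf = parsefile;
    if (pf == NULL)
        return;
    plinno = pf->linno;     /* restore the outer line number */
    parsefile = pf->prev;
    free(pf);
}

/* Return the next character of the current source, or EOF. */
int pgetc(void) {
    if (parsefile == NULL || *parsefile->nextc == '\0')
        return EOF;
    return (unsigned char)*parsefile->nextc++;
}
```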
 | |
| 
 | |
| DEBUGGING:  If DEBUG is defined in shell.h, then the shell will
 | |
| write debugging information to the file $HOME/trace.  Most of
 | |
| this is done using the TRACE macro, which takes a set of printf
 | |
| arguments inside two sets of parentheses.  Example:
 | |
| "TRACE(("n=%d\n", n))".  The double parentheses are necessary
 | |
| because the preprocessor can't handle macros with a variable
 | |
| number of arguments.  Defining DEBUG also causes the shell to
 | |
| generate a core dump if it is sent a quit signal.  The tracing
 | |
| code is in show.c.
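
The double-parenthesis trick can be reproduced in a few lines.
This sketch is not the code in show.c: the routine trace and the
buffer it writes to are assumptions, with the buffer standing in
for the trace file.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define DEBUG 1

char tracebuf[256];         /* stands in for the trace file */
size_t tracelen;            /* bytes of tracebuf in use */

/* An ordinary function, so it can take a variable number of
 * arguments.  Output that does not fit in tracebuf is dropped. */
void trace(const char *fmt, ...) {
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(tracebuf + tracelen, sizeof tracebuf - tracelen, fmt, ap);
    va_end(ap);
    if (n > 0) {
        size_t room = sizeof tracebuf - tracelen - 1;
        tracelen += (size_t)n < room ? (size_t)n : room;
    }
}

/* The macro takes ONE argument: the whole parenthesized list.
 * TRACE(("n=%d\n", n)) expands to trace ("n=%d\n", n). */
#ifdef DEBUG
#define TRACE(args) trace args
#else
#define TRACE(args)         /* expands to nothing without DEBUG */
#endif
```

With DEBUG undefined the call and its arguments vanish entirely,
so tracing costs nothing in a production build.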
 | 
