Running the Tests
=================

All the tests are executed using the "Run" script in the top-level directory.

The simplest way to generate results is with the command:

    ./Run

This will run a standard "index" test (see "The BYTE Index" below), and
save the report in the "results" directory, with a filename like

    hostname-2007-09-23-01

An HTML version is also saved.

If you want to generate both the basic system index and the graphics index,
then do:

    ./Run gindex

If your system has more than one CPU, the tests will be run twice -- once
with a single copy of each test running at once, and once with N copies,
where N is the number of CPUs.  Some categories of tests, however (currently
the graphics tests) will only run with a single copy.

Since the tests are based on constant time (variable work), a "system"
run usually takes about 29 minutes; the "graphics" part about 18 minutes.
A "gindex" run on a dual-core machine will do 2 "system" passes (single-
and dual-processing) and one "graphics" run, for a total of around one
and a quarter hours.

============================================================================

Detailed Usage
==============

The Run script takes a number of options which you can use to customise a
test, and you can specify the names of the tests to run.  The full usage
is:

    Run [ -q | -v ] [-i <n> ] [-c <n> [-c <n> ...]] [test ...]

The option flags are:

  -q            Run in quiet mode.
  -v            Run in verbose mode.
  -i <count>    Run <count> iterations for each test -- slower tests
                use <count> / 3, but at least 1.  Defaults to 10 (3 for
                slow tests).
  -c <n>        Run <n> copies of each test in parallel.

The -c option can be given multiple times; for example:

    ./Run -c 1 -c 4

will run a single-streamed pass, then a 4-streamed pass.  Note that some
tests (currently the graphics tests) will only run in a single-streamed pass.

The remaining non-flag arguments are taken to be the names of tests to run.
The default is to run "index".  See "Tests" below.

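For example, to run a quick sanity check -- a single iteration each of two
individual tests, with two copies running in parallel -- you could use:

    ./Run -i 1 -c 2 pipe syscall

(See "Tests" below for the full list of test names.)
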
When running the tests, I do *not* recommend switching to single-user mode
("init 1").  This seems to change the results in ways I don't understand,
and it's not realistic (unless your system will actually be running in this
mode, of course).  However, if using a windowing system, you may want to
switch to a minimal window setup (for example, log in to a "twm" session),
so that randomly-churning background processes don't randomise the results
too much.  This is particularly true for the graphics tests.


============================================================================

Tests
=====

The available tests are organised into categories; when generating index
scores (see "The BYTE Index" below) the results for each category are
produced separately.  The categories are:

   system          The original Unix system tests (not all are actually
                   in the index)
   2d              2D graphics tests (not all are actually in the index)
   3d              3D graphics tests
   misc            Various non-indexed tests

The following individual tests are available:

  system:
    dhry2reg         Dhrystone 2 using register variables
    whetstone-double Double-Precision Whetstone
    syscall          System Call Overhead
    pipe             Pipe Throughput
    context1         Pipe-based Context Switching
    spawn            Process Creation
    execl            Execl Throughput
    fstime-w         File Write 1024 bufsize 2000 maxblocks
    fstime-r         File Read 1024 bufsize 2000 maxblocks
    fstime           File Copy 1024 bufsize 2000 maxblocks
    fsbuffer-w       File Write 256 bufsize 500 maxblocks
    fsbuffer-r       File Read 256 bufsize 500 maxblocks
    fsbuffer         File Copy 256 bufsize 500 maxblocks
    fsdisk-w         File Write 4096 bufsize 8000 maxblocks
    fsdisk-r         File Read 4096 bufsize 8000 maxblocks
    fsdisk           File Copy 4096 bufsize 8000 maxblocks
    shell1           Shell Scripts (1 concurrent) (runs "looper 60 multi.sh 1")
    shell8           Shell Scripts (8 concurrent) (runs "looper 60 multi.sh 8")
    shell16          Shell Scripts (16 concurrent) (runs "looper 60 multi.sh 16")

  2d:
    2d-rects         2D graphics: rectangles
    2d-lines         2D graphics: lines
    2d-circle        2D graphics: circles
    2d-ellipse       2D graphics: ellipses
    2d-shapes        2D graphics: polygons
    2d-aashapes      2D graphics: aa polygons
    2d-polys         2D graphics: complex polygons
    2d-text          2D graphics: text
    2d-blit          2D graphics: images and blits
    2d-window        2D graphics: windows

  3d:
    ubgears          3D graphics: gears

  misc:
    C                C Compiler Throughput ("looper 60 $cCompiler cctest.c")
    arithoh          Arithoh (huh?)
    short            Arithmetic Test (short) (this is arith.c configured for
                     "short" variables; ditto for the ones below)
    int              Arithmetic Test (int)
    long             Arithmetic Test (long)
    float            Arithmetic Test (float)
    double           Arithmetic Test (double)
    dc               Dc: sqrt(2) to 99 decimal places (runs
                     "looper 30 dc < dc.dat", using your system's copy of "dc")
    hanoi            Recursion Test -- Tower of Hanoi
    grep             Grep for a string in a large file, using your system's
                     copy of "grep"
    sysexec          Exercise fork() and exec().

The following pseudo-test names are aliases for combinations of other
tests:

    arithmetic       Runs arithoh, short, int, long, float, double,
                     and whetstone-double
    dhry             Alias for dhry2reg
    dhrystone        Alias for dhry2reg
    whets            Alias for whetstone-double
    whetstone        Alias for whetstone-double
    load             Runs shell1, shell8, and shell16
    misc             Runs C, dc, and hanoi
    speed            Runs the arithmetic and system groups
    oldsystem        Runs execl, fstime, fsbuffer, fsdisk, pipe, context1,
                     spawn, and syscall
    system           Runs oldsystem plus shell1, shell8, and shell16
    fs               Runs fstime-w, fstime-r, fstime, fsbuffer-w,
                     fsbuffer-r, fsbuffer, fsdisk-w, fsdisk-r, and fsdisk
    shell            Runs shell1, shell8, and shell16

    index            Runs the tests which constitute the official index:
                     the oldsystem group, plus dhry2reg, whetstone-double,
                     shell1, and shell8.
                     See "The BYTE Index" below for more information.
    graphics         Runs the tests which constitute the graphics index:
                     2d-rects, 2d-ellipse, 2d-aashapes, 2d-text, 2d-blit,
                     2d-window, and ubgears
    gindex           Runs the index and graphics groups, to generate both
                     sets of index results

    all              Runs all tests

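For example, to run the filesystem group and the shell scripts group in a
single invocation:

    ./Run fs shell
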
============================================================================

The BYTE Index
==============

The purpose of this test is to provide a basic indicator of the performance
of a Unix-like system; hence, multiple tests are used to test various
aspects of the system's performance.  These test results are then compared
to the scores from a baseline system to produce an index value, which is
generally easier to handle than the raw scores.  The entire set of index
values is then combined to make an overall index for the system.

Since 1995, the baseline system has been "George", a SPARCstation 20-61
with 128 MB RAM, a SPARC Storage Array, and Solaris 2.3, whose ratings
were set at 10.0.  (So a system which scores 520 is 52 times faster than
this machine.)  Since the numbers are really only useful in a relative
sense, there's no particular reason to update the base system, so for the
sake of consistency it's probably best to leave it alone.  George's scores
are in the file "pgms/index.base"; this file is used to calculate the
index scores for any particular run.

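To illustrate the arithmetic with made-up numbers: if a test produces a raw
score 30 times the corresponding baseline figure in "pgms/index.base", that
test's index value is 30 * 10.0 = 300.0.  The per-test index values are then
combined (as a geometric mean) into the overall index.
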
Over the years, various changes have been made to the set of tests in the
index.  Although there is a desire for a consistent baseline, various tests
have been determined to be misleading, and have been removed; and a few
alternatives have been added.  These changes are detailed in the README,
and should be borne in mind when looking at old scores.

A number of tests are included in the benchmark suite which are not part of
the index, for various reasons; these tests can of course be run manually.
See "Tests" above.


============================================================================

Graphics Tests
==============

As of version 5.1, UnixBench contains some graphics benchmarks.  These
are intended to give a rough idea of the general graphics performance of
a system.

The graphics tests are in categories "2d" and "3d", so the index scores
for these tests are separate from the basic system index.  This seems
like a sensible division, since the graphics performance of a system
depends largely on the graphics adaptor.

The tests currently consist of some 2D "x11perf" tests and "ubgears".

* The 2D tests are a selection of the x11perf tests, using the host
  system's x11perf command (which must be installed and in the search
  path).  Only a few of the x11perf tests are used, in the interests
  of completing a test run in a reasonable time; if you want to do
  detailed diagnosis of an X server or graphics chip, then use x11perf
  directly.

* The 3D test is "ubgears", a modified version of the familiar "glxgears".
  This version runs for 5 seconds to "warm up", then performs a timed
  run and displays the average frames-per-second.

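If you do want to run x11perf by hand, something like the following will
confirm that it's installed and run one of its rectangle tests (the exact
set of x11perf tests UnixBench invokes may differ between versions):

    which x11perf
    x11perf -rect500
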
On multi-CPU systems, the graphics tests will only run in single-processing
mode.  This is because the meaning of running two copies of a test at once
is dubious; and the test windows tend to overlay each other, meaning that
the window behind isn't actually doing any work.


============================================================================

Multiple CPUs
=============

If your system has multiple CPUs, the default behaviour is to run the selected
tests twice -- once with one copy of each test program running at a time,
and once with N copies, where N is the number of CPUs.  (You can override
this with the "-c" option; see "Detailed Usage" above.)  This is designed to
allow you to assess:

 - the performance of your system when running a single task
 - the performance of your system when running multiple tasks
 - the gain from your system's implementation of parallel processing

The results, however, need to be handled with care.  Here are the results
of two runs on a dual-processor system, one in single-processing mode, one
dual-processing:

  Test                    Single     Dual   Gain
  --------------------    ------   ------   ----
  Dhrystone 2              562.5   1110.3    97%
  Double Whetstone         320.0    640.4   100%
  Execl Throughput         450.4    880.3    95%
  File Copy 1024           759.4    595.9   -22%
  File Copy 256            535.8    438.8   -18%
  File Copy 4096          1261.8   1043.4   -17%
  Pipe Throughput          481.0    979.3   104%
  Pipe-based Switching     326.8   1229.0   276%
  Process Creation         917.2   1714.1    87%
  Shell Scripts (1)       1064.9   1566.3    47%
  Shell Scripts (8)       1567.7   1709.9     9%
  System Call Overhead     944.2   1445.5    53%
  --------------------    ------   ------   ----
  Index Score:             678.2   1026.2    51%

As expected, the heavily CPU-dependent tasks -- dhrystone, whetstone,
execl, pipe throughput, process creation -- show close to 100% gain when
running 2 copies in parallel.

The Pipe-based Context Switching test measures context switching overhead
by sending messages back and forth between 2 processes.  I don't know why
it shows such a huge gain with 2 copies (i.e. 4 processes total) running,
but it seems to be consistent on my system.  I think this may be an issue
with the SMP implementation.

The System Call Overhead test shows a lesser gain, presumably because it
uses a lot of CPU time in single-threaded kernel code.  The shell scripts
test with 8 concurrent processes shows almost no gain -- because the test
itself runs 8 scripts in parallel, it's already using both CPUs, even when
the benchmark is run in single-stream mode.  The same test with one process
per copy shows a real gain.

The filesystem throughput tests show a loss, instead of a gain, when
multi-processing.  That there's no gain is to be expected, since the tests
are presumably constrained by the throughput of the I/O subsystem and the
disk drive itself; the drop in performance is presumably down to the
increased contention for resources, and perhaps greater disk head movement.

So what tests should you use, how many copies should you run, and how should
you interpret the results?  Well, that's up to you, since it depends on
what it is you're trying to measure.

Implementation
--------------

The multi-processing mode is implemented at the level of test iterations.
During each iteration of a test, N slave processes are started using fork().
Each of these slaves executes the test program using fork() and exec(),
reads and stores the entire output, times the run, and prints all the
results to a pipe.  The Run script reads the pipes for each of the slaves
in turn to get the results and times.  The scores are added, and the times
averaged.

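As a rough illustration only (the real implementation lives in the "Run"
script, and reports over pipes rather than temporary files), the pattern
for 2 copies of a hypothetical test program "./testprog" looks something
like this in shell:

    for i in 1 2; do
        ( start=$(date +%s)
          output=$(./testprog 2>&1)     # slave fork()s and exec()s the test
          end=$(date +%s)
          printf 'elapsed %s\n%s\n' "$((end - start))" "$output"
        ) > /tmp/slave$i.out &          # the real script reports via a pipe
    done
    wait                                # all copies run concurrently
    for i in 1 2; do                    # collect each slave's results in turn
        cat /tmp/slave$i.out; rm /tmp/slave$i.out
    done
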
The result is that each test program has N copies running at once.  They
should all finish at around the same time, since they run for constant time.

If a test program itself starts off K multiple processes (as with the shell8
test), then the effect will be that there are N * K processes running at
once.  This is probably not very useful for testing multi-CPU performance.


============================================================================

The Language Setting
====================

The $LANG environment variable determines how programs and library
routines interpret text.  This can have a big impact on the test results.

If $LANG is set to POSIX, or is left unset, text is treated as ASCII; if
it is set to en_US.UTF-8, for example, then text is treated as being
encoded in UTF-8, which is more complex and therefore slower.  Setting
it to other languages can have varying results.

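You can see the same effect outside the benchmark; for example (in bash,
with "bigfile" standing in for any large text file):

    time LANG=POSIX grep -ci foo bigfile
    time LANG=en_US.UTF-8 grep -ci foo bigfile

The second run has to do multibyte-aware, locale-sensitive matching, and
is often noticeably slower.
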
To ensure consistency between test runs, the Run script now (as of version
5.1.1) sets $LANG to "en_US.utf8".

This setting is configured with the variable "$language".  You
should not change this if you want to share your results to allow
comparisons between systems; however, you may want to change it to see
how different language settings affect performance.

Each test report now includes the language settings in use.  The reported
language is what is set in $LANG, and is not necessarily supported by the
system; but we also report the character mapping and collation order which
are actually in use (as reported by "locale").

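You can inspect these values yourself with:

    locale

which prints, among other settings, the LC_CTYPE (character mapping) and
LC_COLLATE (collation order) values currently in effect.
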
============================================================================

Interpreting the Results
========================

Interpreting the results of these tests is tricky, and totally depends on
what you're trying to measure.

For example, are you trying to measure how fast your CPU is?  Or how good
your compiler is?  Because these tests are all recompiled using your host
system's compiler, the performance of the compiler will inevitably impact
the performance of the tests.  Is this a problem?  If you're choosing a
system, you probably care about its overall speed, which may well depend
on how good its compiler is; so including that in the test results may be
the right answer.  But you may want to ensure that the right compiler is
used to build the tests.

On the other hand, with the vast majority of Unix systems being x86 / PC
compatibles, running Linux and the GNU C compiler, the results will tend
to be more dependent on the hardware; but the versions of the compiler and
OS can make a big difference.  (I measured a 50% gain between SUSE 10.1
and OpenSUSE 10.2 on the same machine.)  So you may want to make sure that
all your test systems are running the same version of the OS; or at least
publish the OS and compiler versions with your results.  Then again, it may
be compiler performance that you're interested in.

The C test is very dubious -- it tests the speed of compilation.  If you're
running the exact same compiler on each system, OK; but otherwise, the
results should probably be discarded.  A slower compilation doesn't say
anything about the speed of your system, since the compiler may simply be
spending more time to super-optimise the code, which would actually make it
faster.

This will be particularly true on architectures like IA-64 (Itanium etc.)
where the compiler spends huge amounts of effort scheduling instructions
to run in parallel, with a resultant significant gain in execution speed.

Some tests are even more dubious in terms of host-dependency -- for example,
the "dc" test uses the host's version of dc (a calculator program).  The
version of this which is available can make a huge difference to the score,
which is why it's not in the index group.  Read through the release notes
for more on these kinds of issues.

Another age-old issue is that of the benchmarks being too trivial to be
meaningful.  With compilers getting ever smarter, and performing more
wide-ranging flow path analyses, the danger of parts of the benchmarks
simply being optimised out of existence is always present.

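As a contrived illustration (hypothetical file names; exact behaviour
depends on your compiler), consider a file "trivial.c" containing:

    int main(void) {
        long i, sum = 0;
        for (i = 0; i < 100000000; i++)
            sum += i;               /* result is never used */
        return 0;
    }

Comparing unoptimised and optimised builds:

    cc -O0 trivial.c -o t0 && time ./t0
    cc -O2 trivial.c -o t2 && time ./t2

most modern compilers will delete the entire loop from the -O2 build as
dead code, so the "benchmark" finishes almost instantly and measures
nothing.
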
All in all, the "index" and "gindex" tests (see above) are designed to
give a reasonable measure of overall system performance; but the results
of any test run should always be used with care.