Running the Tests
=================

All the tests are executed using the "Run" script in the top-level directory.

The simplest way to generate results is with the command:

    ./Run

This will run a standard "index" test (see "The BYTE Index" below), and
save the report in the "results" directory, with a filename like

    hostname-2007-09-23-01

An HTML version is also saved.

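Once a run has finished, you can list the saved reports with, for
example:

    ls results/
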
If you want to generate both the basic system index and the graphics index,
then do:

    ./Run gindex

If your system has more than one CPU, the tests will be run twice -- once
with a single copy of each test running at once, and once with N copies,
where N is the number of CPUs.  Some categories of tests, however (currently
the graphics tests), will only run with a single copy.

Since the tests are based on constant time (variable work), a "system"
run usually takes about 29 minutes; the "graphics" part about 18 minutes.
A "gindex" run on a dual-core machine will do 2 "system" passes (single-
and dual-processing) and one "graphics" run, for a total of around one and
a quarter hours (2 x 29 + 18 = 76 minutes).

============================================================================

Detailed Usage
==============

The Run script takes a number of options which you can use to customise a
test, and you can specify the names of the tests to run.  The full usage
is:

    Run [ -q | -v ] [-i <n> ] [-c <n> [-c <n> ...]] [test ...]

The option flags are:

  -q            Run in quiet mode.
  -v            Run in verbose mode.
  -i <count>    Run <count> iterations for each test -- slower tests
                use <count> / 3, but at least 1.  Defaults to 10 (3 for
                slow tests).
  -c <n>        Run <n> copies of each test in parallel.

The -c option can be given multiple times; for example:

    ./Run -c 1 -c 4

will run a single-streamed pass, then a 4-streamed pass.  Note that some
tests (currently the graphics tests) will only run in a single-streamed pass.

The remaining non-flag arguments are taken to be the names of tests to run.
The default is to run "index".  See "Tests" below.

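As a further illustration (a combination not shown above), the following
would run three quiet iterations of the filesystem and shell test groups,
first single-streamed and then 4-streamed:

    ./Run -q -i 3 -c 1 -c 4 fs shell
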
When running the tests, I do *not* recommend switching to single-user mode
("init 1").  This seems to change the results in ways I don't understand,
and it's not realistic (unless your system will actually be running in this
mode, of course).  However, if using a windowing system, you may want to
switch to a minimal window setup (for example, log in to a "twm" session),
so that randomly-churning background processes don't randomise the results
too much.  This is particularly true for the graphics tests.


============================================================================

Tests
=====

The available tests are organised into categories; when generating index
scores (see "The BYTE Index" below) the results for each category are
produced separately.  The categories are:

   system          The original Unix system tests (not all are actually
                   in the index)
   2d              2D graphics tests (not all are actually in the index)
   3d              3D graphics tests
   misc            Various non-indexed tests

The following individual tests are available:

  system:
    dhry2reg         Dhrystone 2 using register variables
    whetstone-double Double-Precision Whetstone
    syscall          System Call Overhead
    pipe             Pipe Throughput
    context1         Pipe-based Context Switching
    spawn            Process Creation
    execl            Execl Throughput
    fstime-w         File Write 1024 bufsize 2000 maxblocks
    fstime-r         File Read 1024 bufsize 2000 maxblocks
    fstime           File Copy 1024 bufsize 2000 maxblocks
    fsbuffer-w       File Write 256 bufsize 500 maxblocks
    fsbuffer-r       File Read 256 bufsize 500 maxblocks
    fsbuffer         File Copy 256 bufsize 500 maxblocks
    fsdisk-w         File Write 4096 bufsize 8000 maxblocks
    fsdisk-r         File Read 4096 bufsize 8000 maxblocks
    fsdisk           File Copy 4096 bufsize 8000 maxblocks
    shell1           Shell Scripts (1 concurrent) (runs "looper 60 multi.sh 1")
    shell8           Shell Scripts (8 concurrent) (runs "looper 60 multi.sh 8")
    shell16          Shell Scripts (16 concurrent) (runs "looper 60 multi.sh 16")

  2d:
    2d-rects         2D graphics: rectangles
    2d-lines         2D graphics: lines
    2d-circle        2D graphics: circles
    2d-ellipse       2D graphics: ellipses
    2d-shapes        2D graphics: polygons
    2d-aashapes      2D graphics: aa polygons
    2d-polys         2D graphics: complex polygons
    2d-text          2D graphics: text
    2d-blit          2D graphics: images and blits
    2d-window        2D graphics: windows

  3d:
    ubgears          3D graphics: gears

  misc:
    C                C Compiler Throughput ("looper 60 $cCompiler cctest.c")
    arithoh          Arithoh (a measure of the arithmetic test's loop overhead)
    short            Arithmetic Test (short) (this is arith.c configured for
                     "short" variables; ditto for the ones below)
    int              Arithmetic Test (int)
    long             Arithmetic Test (long)
    float            Arithmetic Test (float)
    double           Arithmetic Test (double)
    dc               Dc: sqrt(2) to 99 decimal places (runs
                     "looper 30 dc < dc.dat", using your system's copy of "dc")
    hanoi            Recursion Test -- Tower of Hanoi
    grep             Grep for a string in a large file, using your system's
                     copy of "grep"
    sysexec          Exercise fork() and exec().

The following pseudo-test names are aliases for combinations of other
tests:

    arithmetic       Runs arithoh, short, int, long, float, double,
                     and whetstone-double
    dhry             Alias for dhry2reg
    dhrystone        Alias for dhry2reg
    whets            Alias for whetstone-double
    whetstone        Alias for whetstone-double
    load             Runs shell1, shell8, and shell16
    misc             Runs C, dc, and hanoi
    speed            Runs the arithmetic and system groups
    oldsystem        Runs execl, fstime, fsbuffer, fsdisk, pipe, context1,
                     spawn, and syscall
    system           Runs oldsystem plus shell1, shell8, and shell16
    fs               Runs fstime-w, fstime-r, fstime, fsbuffer-w,
                     fsbuffer-r, fsbuffer, fsdisk-w, fsdisk-r, and fsdisk
    shell            Runs shell1, shell8, and shell16

    index            Runs the tests which constitute the official index:
                     the oldsystem group, plus dhry2reg, whetstone-double,
                     shell1, and shell8
                     See "The BYTE Index" below for more information.
    graphics         Runs the tests which constitute the graphics index:
                     2d-rects, 2d-ellipse, 2d-aashapes, 2d-text, 2d-blit,
                     2d-window, and ubgears
    gindex           Runs the index and graphics groups, to generate both
                     sets of index results

    all              Runs all tests


============================================================================

The BYTE Index
==============

The purpose of this test is to provide a basic indicator of the performance
of a Unix-like system; hence, multiple tests are used to test various
aspects of the system's performance.  These test results are then compared
to the scores from a baseline system to produce an index value, which is
generally easier to handle than the raw scores.  The entire set of index
values is then combined to make an overall index for the system.

Since 1995, the baseline system has been "George", a SPARCstation 20-61
with 128 MB RAM, a SPARC Storage Array, and Solaris 2.3, whose ratings
were set at 10.0.  (So a system which scores 520 is 52 times faster than
this machine.)  Since the numbers are really only useful in a relative
sense, there's no particular reason to update the base system, so for the
sake of consistency it's probably best to leave it alone.  George's scores
are in the file "pgms/index.base"; this file is used to calculate the
index scores for any particular run.

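As an illustrative worked example (with invented numbers): if George's raw
result for some test is 4,000 loops per second, and your machine manages
208,000 loops per second, then your index score for that test is
208000 / 4000 * 10 = 520.  The individual test indexes are then combined
(as a geometric mean) into the overall index.
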
Over the years, various changes have been made to the set of tests in the
index.  Although there is a desire for a consistent baseline, various tests
have been determined to be misleading, and have been removed; and a few
alternatives have been added.  These changes are detailed in the README,
and should be borne in mind when looking at old scores.

A number of tests are included in the benchmark suite which are not part of
the index, for various reasons; these tests can of course be run manually.
See "Tests" above.


============================================================================

Graphics Tests
==============

As of version 5.1, UnixBench now contains some graphics benchmarks.  These
are intended to give a rough idea of the general graphics performance of
a system.

The graphics tests are in categories "2d" and "3d", so the index scores
for these tests are separate from the basic system index.  This seems
like a sensible division, since the graphics performance of a system
depends largely on the graphics adaptor.

The tests currently consist of some 2D "x11perf" tests and "ubgears".

* The 2D tests are a selection of the x11perf tests, using the host
  system's x11perf command (which must be installed and in the search
  path; a quick check is shown after this list).  Only a few of the
  x11perf tests are used, in the interests of completing a test run in
  a reasonable time; if you want to do detailed diagnosis of an X server
  or graphics chip, then use x11perf directly.

* The 3D test is "ubgears", a modified version of the familiar "glxgears".
  This version runs for 5 seconds to "warm up", then performs a timed
  run and displays the average frames-per-second.

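As a quick sanity check (illustrative; any equivalent test will do), you
can verify that x11perf is installed and in your search path with:

    which x11perf || echo "x11perf not found"
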
On multi-CPU systems, the graphics tests will only run in single-processing
mode.  This is because the meaning of running two copies of a test at once
is dubious; and the test windows tend to overlay each other, meaning that
the window behind isn't actually doing any work.


============================================================================

Multiple CPUs
=============

If your system has multiple CPUs, the default behaviour is to run the selected
tests twice -- once with one copy of each test program running at a time,
and once with N copies, where N is the number of CPUs.  (You can override
this with the "-c" option; see "Detailed Usage" above.  An example is shown
after the following list.)  This is designed to allow you to assess:

 - the performance of your system when running a single task
 - the performance of your system when running multiple tasks
 - the gain from your system's implementation of parallel processing

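For example (an illustrative command, assuming a Linux system where the
"nproc" utility from GNU coreutils is available), you could make the
single-copy and N-copy passes explicit with:

    ./Run -c 1 -c $(nproc) index
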
The results, however, need to be handled with care.  Here are the results
of two runs on a dual-processor system, one in single-processing mode, one
dual-processing:

  Test                    Single     Dual   Gain
  --------------------    ------   ------   ----
  Dhrystone 2              562.5   1110.3    97%
  Double Whetstone         320.0    640.4   100%
  Execl Throughput         450.4    880.3    95%
  File Copy 1024           759.4    595.9   -22%
  File Copy 256            535.8    438.8   -18%
  File Copy 4096          1261.8   1043.4   -17%
  Pipe Throughput          481.0    979.3   104%
  Pipe-based Switching     326.8   1229.0   276%
  Process Creation         917.2   1714.1    87%
  Shell Scripts (1)       1064.9   1566.3    47%
  Shell Scripts (8)       1567.7   1709.9     9%
  System Call Overhead     944.2   1445.5    53%
  --------------------    ------   ------   ----
  Index Score:             678.2   1026.2    51%

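(The Gain column is simply Dual / Single - 1; for example, the Dhrystone
figure is 1110.3 / 562.5 - 1 = 0.97, i.e. about a 97% gain.)
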
As expected, the heavily CPU-dependent tasks -- dhrystone, whetstone,
execl, pipe throughput, process creation -- show close to 100% gain when
running 2 copies in parallel.

The Pipe-based Context Switching test measures context switching overhead
by sending messages back and forth between 2 processes.  I don't know why
it shows such a huge gain with 2 copies (ie. 4 processes total) running,
but it seems to be consistent on my system.  I think this may be an issue
with the SMP implementation.

The System Call Overhead shows a lesser gain, presumably because it uses a
lot of CPU time in single-threaded kernel code.  The shell scripts test with
8 concurrent processes shows no gain -- because the test itself runs 8
scripts in parallel, it's already using both CPUs, even when the benchmark
is run in single-stream mode.  The same test with one process per copy
shows a real gain.

The filesystem throughput tests show a loss, instead of a gain, when
multi-processing.  That there's no gain is to be expected, since the tests
are presumably constrained by the throughput of the I/O subsystem and the
disk drive itself; the drop in performance is presumably down to the
increased contention for resources, and perhaps greater disk head movement.

So what tests should you use, how many copies should you run, and how should
you interpret the results?  Well, that's up to you, since it depends on
what it is you're trying to measure.

Implementation
--------------

The multi-processing mode is implemented at the level of test iterations.
During each iteration of a test, N slave processes are started using fork().
Each of these slaves executes the test program using fork() and exec(),
reads and stores the entire output, times the run, and prints all the
results to a pipe.  The Run script reads the pipes for each of the slaves
in turn to get the results and times.  The scores are added, and the times
averaged.

The result is that each test program has N copies running at once.  They
should all finish at around the same time, since they run for constant time.

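A minimal shell sketch of the same pattern (illustrative only -- this is
not the actual Run implementation, which also handles timing and result
parsing; "testprog" is a placeholder, and the "seq" utility is assumed to
be available):

    N=4
    for i in $(seq 1 $N); do
        ./testprog > result.$i 2>&1 &   # start one copy in the background
    done
    wait                                # all copies finish at about the same time
    cat result.*                        # collect the individual outputs
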
If a test program itself starts off K multiple processes (as with the shell8
test), then the effect will be that there are N * K processes running at
once.  This is probably not very useful for testing multi-CPU performance.


============================================================================

The Language Setting
====================

The $LANG environment variable determines how programs and library
routines interpret text.  This can have a big impact on the test results.

If $LANG is set to POSIX, or is left unset, text is treated as ASCII; if
it is set to en_US.UTF-8, for example, then text is treated as being
encoded in UTF-8, which is more complex and therefore slower.  Setting
it to other languages can have varying results.

To ensure consistency between test runs, the Run script now (as of version
5.1.1) sets $LANG to "en_US.utf8".

This setting is configured with the variable "$language" in the Run
script.  You should not change this if you want to share your results to
allow comparisons between systems; however, you may want to change it to
see how different language settings affect performance.

Each test report now includes the language settings in use.  The reported
language is what is set in $LANG, and is not necessarily supported by the
system; but we also report the character mapping and collation order which
are actually in use (as reported by "locale").

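For example (illustrative), you can see the character mapping and
collation order that a given setting actually produces on your system
with:

    LANG=en_US.utf8 locale
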

============================================================================

Interpreting the Results
========================

Interpreting the results of these tests is tricky, and totally depends on
what you're trying to measure.

For example, are you trying to measure how fast your CPU is?  Or how good
your compiler is?  Because these tests are all recompiled using your host
system's compiler, the performance of the compiler will inevitably impact
the performance of the tests.  Is this a problem?  If you're choosing a
system, you probably care about its overall speed, which may well depend
on how good its compiler is; so including that in the test results may be
the right answer.  But you may want to ensure that the right compiler is
used to build the tests.

On the other hand, with the vast majority of Unix systems being x86 / PC
compatibles, running Linux and the GNU C compiler, the results will tend
to be more dependent on the hardware; but the versions of the compiler and
OS can make a big difference.  (I measured a 50% gain between SUSE 10.1
and OpenSUSE 10.2 on the same machine.)  So you may want to make sure that
all your test systems are running the same version of the OS; or at least
publish the OS and compiler versions with your results.  Then again, it may
be compiler performance that you're interested in.

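A convenient way to capture this information alongside your results
(illustrative; "cc --version" works for gcc and clang, but not for every
compiler) is:

    uname -a > sysinfo.txt
    cc --version >> sysinfo.txt
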
The C test is very dubious -- it tests the speed of compilation.  If you're
running the exact same compiler on each system, OK; but otherwise, the
results should probably be discarded.  A slower compilation doesn't say
anything about the speed of your system, since the compiler may simply be
spending more time to super-optimise the code, which would actually make it
faster.

This will be particularly true on architectures like IA-64 (Itanium etc.)
where the compiler spends huge amounts of effort scheduling instructions
to run in parallel, with a resultant significant gain in execution speed.

Some tests are even more dubious in terms of host-dependency -- for example,
the "dc" test uses the host's version of dc (a calculator program).  The
version of this which is available can make a huge difference to the score,
which is why it's not in the index group.  Read through the release notes
for more on these kinds of issues.

Another age-old issue is that of the benchmarks being too trivial to be
meaningful.  With compilers getting ever smarter, and performing more
wide-ranging flow path analyses, the danger of parts of the benchmarks
simply being optimised out of existence is always present.

All in all, the "index" and "gindex" tests (see above) are designed to
give a reasonable measure of overall system performance; but the results
of any test run should always be used with care.