Version 5.1.2 -- 2007-12-26

================================================================
To use Unixbench:

1. UnixBench from version 5.1 on has both system and graphics tests.
   If you want to use the graphic tests, edit the Makefile and make sure
   that the line "GRAPHIC_TESTS = defined" is not commented out; then check
   that the "GL_LIBS" definition is OK for your system.  Also make sure
   that the "x11perf" command is on your search path.

   If you don't want the graphics tests, then comment out the
   "GRAPHIC_TESTS = defined" line.  Note: comment it out, don't
   set it to anything.

2. Do "make".

3. Do "Run" to run the system test; "Run graphics" to run the graphics
   tests; "Run gindex" to run both.

You will need perl, as Run is written in perl.
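
For example, a typical session (run from the unixbench directory, with the
graphics line in the Makefile commented out) might look like this:

    # Build the benchmark programs.
    make

    # Run the system tests; or use "graphics" / "gindex" as described above.
    ./Run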

For more information on using the tests, read "USAGE".

For information on adding tests into the benchmark, see "WRITING_TESTS".


===================== RELEASE NOTES =====================================

========================  Dec 07 ==========================

v5.1.2

One big fix: if unixbench is installed in a directory whose pathname contains
a space, it should now run (previously it failed).

To avoid possible clashes, the environment variables unixbench uses are now
prefixed with "UB_".  These are all optional, and for most people will be
completely unnecessary, but if you want you can set these:

    UB_BINDIR      Directory where the test programs live.
    UB_TMPDIR      Temp directory, for temp files.
    UB_RESULTDIR   Directory to put results in.
    UB_TESTDIR     Directory where the tests are executed.
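
For example, to keep temporary files and results outside the source tree
(the paths below are purely illustrative), you might do:

    # Create the directories, point unixbench at them, and run as usual.
    mkdir -p /var/tmp/unixbench "$HOME/ub-results"
    UB_TMPDIR=/var/tmp/unixbench
    UB_RESULTDIR=$HOME/ub-results
    export UB_TMPDIR UB_RESULTDIR
    ./Run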

And a couple of tiny fixes:
* In pgms/tst.sh, changed "sort -n +1" to "sort -n -k 1"
* In Makefile, made it clearer that GRAPHIC_TESTS should be commented
  out (not set to 0) to disable graphics
Thanks to nordi for pointing these out.


Ian Smith, December 26, 2007
johantheghost at yahoo period com


========================  Oct 07 ==========================

v5.1.1

It turns out that the setting of LANG is crucial to the results.  This
explains why people in different regions were seeing odd results, and also
why runlevel 1 produced odd results -- runlevel 1 doesn't set LANG, and
hence reverts to ASCII, whereas most people use a UTF-8 encoding, which is
much slower in some tests (e.g. shell tests).

So now we manually set LANG to "en_US.utf8", which is configured with the
variable "$language".  Don't change this if you want to share your results.
We also report the language settings in use.

See "The Language Setting" in USAGE for more info.  Thanks to nordi for
pointing out the LANG issue.
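
If you want to see what locale your own shell session would otherwise have
used (Run forces LANG via "$language", as described above), something like
this will show it:

    # Print the current locale settings, and the LANG value if any.
    locale
    echo "${LANG:-LANG is not set}"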

I also added the "grep" and "sysexec" tests.  These are non-index tests,
and "grep" uses the system's grep, so it's not much use for comparing
different systems.  But some folks on the OpenSuSE list have been finding
these useful.  They aren't in any of the main test groups; do "Run grep
sysexec" to run them.

Index Changes
-------------

The setting of LANG will affect consistency with systems where this is
not the default value.  However, it should produce more consistent results
in future.


Ian Smith, October 15, 2007
johantheghost at yahoo period com


========================  Oct 07 ==========================

v5.1

The major new feature in this version is the addition of graphical
benchmarks.  Since these may not compile on all systems, you can enable/
disable them with the GRAPHIC_TESTS variable in the Makefile.

As before, each test is run for 3 or 10 iterations.  However, we now discard
the worst 1/3 of the scores before averaging the remainder.  The logic is
that a glitch in the system (background process waking up, for example) may
make one or two runs go slow, so let's discard those.  Hopefully this will
produce more consistent and repeatable results.  Check the log file
for a test run to see the discarded scores.
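
As a rough sketch of the idea only (not the actual code in Run): with three
hypothetical raw scores, dropping the worst third and averaging the rest
could be done like this:

    # Sort best-first, keep the best 2 of 3 scores, then average them.
    echo "10.2 9.8 4.1" | tr ' ' '\n' | sort -rn | head -n 2 |
        awk '{ sum += $1; n++ } END { print sum / n }'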

Made the tests compile and run on x86-64/Linux (fixed an execl bug passing
int instead of pointer).

Also fixed some general bugs.

Thanks to Stefan Esser for help and testing / bug reporting.

Index Changes
-------------

The tests are now divided into categories, and each category generates
its own index.  This keeps the graphics test results separate from
the system tests.

The "graphics" test and corresponding index are new.

The "discard the worst scores" strategy should produce slightly higher
test scores, but at least they should (hopefully!) be more consistent.
The scores should not be higher than the best scores you would have got
with 5.0, so this should not be a huge consistency issue.

Ian Smith, October 11, 2007
johantheghost at yahoo period com


========================  Sep 07 ==========================

v5.0

All the work I've done on this release is Linux-based, because that's
the only Unix I have access to.  I've tried to make it more OS-agnostic
if anything; for example, it no longer has to figure out the format reported
by /usr/bin/time.  However, it's possible that portability has been damaged.
If anyone wants to fix this, please feel free to mail me patches.

In particular, the analysis of the system's CPUs is done via /proc/cpuinfo.
For systems which don't have this, please make appropriate changes in
getCpuInfo() and getSystemInfo().
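
On Linux, the kind of information involved is easy to inspect by hand; for
example (these commands are just an illustration, not what the script runs):

    # Count processors and show the first model name reported by the kernel.
    grep -c '^processor' /proc/cpuinfo
    grep '^model name' /proc/cpuinfo | head -n 1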

The big change has been to make the tests multi-CPU aware.  See the
"Multiple CPUs" section in "USAGE" for details.  Other changes:

* Completely rewrote Run in Perl; drastically simplified the way data is
  processed.  The confusing system of interlocking shell and awk scripts is
  now just one script.  Various intermediate files used to store and process
  results are now replaced by Perl data structures internal to the script.

* Removed from the index runs the file system read and write tests, which
  were ignored for the index and wasted about 10 minutes per run (see
  fstime.c).  The read and write tests can now be selected individually.
  Made fstime.c take parameters, so we no longer need to build 3 versions
  of it.

* Made the output file names unique; they are built from
  hostname-date-sequence.

* Worked on result reporting, error handling, and logging.  See TESTS.
  We now generate both text and HTML reports.

* Removed some obsolete files.

Index Changes
-------------

The index is still based on David Niemi's SPARCstation 20-61 (rated at 10.0),
and the intention in the changes I've made has been to keep the tests
unchanged, in order to maintain consistency with old result sets.

However, the following changes have been made to the index:

* The Pipe-based Context Switching test (context1) was being dropped
  from the index report in v4.1.0 due to a bug; I've put it back in.

* I've added shell1 to the index, to get a measure of how the shell tests
  scale with multiple CPUs (shell8 already exercises all the CPUs, even
  in single-copy mode).  I made up the baseline score for this by
  extrapolation.

Both of these tests can be dropped, if you wish, by editing the "TEST
SPECIFICATIONS" section of Run.

Ian Smith, September 20, 2007
johantheghost at yahoo period com

========================  Aug 97 ==========================

v4.1.0

Double precision Whetstone put in place instead of the old "double" benchmark.

Removal of some obsolete files.

"system" suite adds shell8.

perlbench and poll added as "exhibition" (non-index) benchmarks.

Incorporates several suggestions by Andre Derrick Balsa <andrewbalsa@usa.net>.

Code cleanups to reduce compiler warnings by David C Niemi <niemi@tux.org>
and Andy Kahn <kahn@zk3.dec.com>; Digital Unix options by Andy Kahn.

========================  Jun 97 ==========================

v4.0.1

Minor change to fstime.c to fix overflow problems on fast machines.  Counting
is now done in units of 256 (smallest BUFSIZE) and unsigned longs are used,
giving another 23 dB or so of headroom ;^)  Results should be virtually
identical aside from very small rounding errors.

========================  Dec 95 ==========================

v4.0

Byte no longer seems to have anything to do with this benchmark, and I was
unable to reach any of the original authors; so I have taken it upon myself
to clean it up.

This is version 4.  Major assumptions made in these benchmarks have changed
since they were written, but they are nonetheless popular (particularly for
measuring hardware for Linux).  Some changes made:

-	The biggest change is to put a lot more operating system-oriented
	tests into the index.  I experimented for a while with a decibel-like
	logarithmic scale, but finally settled on using a geometric mean for
	the final index (the individual scores are normalized, and their
	logs are averaged; the resulting value is exponentiated); a worked
	sketch appears after this list of changes.

	"George", a certain SPARCstation 20-61 with 128 MB RAM, a SPARC Storage
	Array, and Solaris 2.3 is my new baseline; it is rated at 10.0 in each
	of the index scores for a final score of 10.0.

	Overall I find the geometric averaging is a big improvement for
	avoiding the skew that was once possible (e.g. a Pentium-75 which got
	40 on the buggy version of fstime, such that fstime accounted for over
	half of its total score and hence wildly skewed its average).

	I also expect that the new numbers look different enough from the old
	ones that no one is too likely to casually mistake them for each other.

	I am finding new SPARCs running Solaris 2.4 getting about 15-20, and
	my 486 DX2-66 Compaq running Linux 1.3.45 got a 9.1.  It got
	understandably poor scores on CPU and FPU benchmarks (a horrible
	1.8 on "double" and 1.3 on "fsdisk"); but made up for it by averaging
	over 20 on the OS-oriented benchmarks.  The Pentium-75 running
	Linux gets about 20 (and it *still* runs Windows 3.1 slowly.  Oh well).

-	It is difficult to get a modern compiler to even consider making
	dhry2 without registers, short of turning off *all* optimizations.
	This is also not a terribly meaningful test, even if it were possible,
	as no one compiles without registers nowadays.  Replaced this benchmark
	with dhry2reg in the index, and dropped it out of usage in general as
	it is so hard to make a legitimate one.

-	fstime: this had some bugs when compiled on modern systems which return
	the number of bytes read/written for read(2)/write(2) calls.  The code
	assumed that a negative return code was given for EOF, but most modern
	systems return 0 (certainly on SunOS 4, Solaris 2, and Linux, which is
	what counts for me).  The old code yielded wildly inflated read scores,
	would eat up tens of MB of disk space on fast systems, and yielded
	copy scores roughly 50% lower than they should have been.

	Also, it counted partial blocks *fully*; made it count the proportional
	part of the block which was actually finished.

	Made bigger and smaller variants of fstime which are designed to beat
	up the disk I/O and the buffer cache, respectively.  Adjusted the
	sleeps so that they are short for short benchmarks.

-	Instead of 1, 2, 4, and 8-shell benchmarks, went to 1, 8, and 16 to
	give a broader range of information (and to run 1 fewer test).
	The only real problem with this is that not many iterations get
	done with 16 at a time on slow systems, so there are some significant
	rounding errors; 8 is therefore still used for the benchmark.  There is
	also the problem that the last (uncompleted) loop is counted as a full
	loop, so it is impossible to score below 1.0 lpm (which gave my laptop
	a break).  Probably redesigning Shell to do each loop a bit more
	quickly (but with less intensity) would be a good idea.

	This benchmark appears to be very heavily influenced by the speed
	of the loader, by which shell is being used as /bin/sh, and by how
	well-compiled some of the common shell utilities like grep, sed, and
	sort are.  With a consistent tool set it is also a good indicator of
	the bandwidth between main memory and the CPU (e.g. Pentia score about
	twice as high as 486es due to their 64-bit bus).  Small, sometimes
	broken shells like "ash-linux" do particularly well here, while big,
	robust shells like bash do not.

-	"dc" is a somewhat iffy benchmark, because there are two versions of
	it floating around, one being small, very fast, and buggy, and one
	being more correct but slow.  It was never in the index anyway.

-	Execl is a somewhat troubling benchmark in that it yields much higher
	scores if compiled statically.  I frown on this practice because it
	distorts the scores away from reflecting how programs are really used
	(i.e. dynamically linked).

-	Arithoh is really more an indicator of the compiler quality than of
	the computer itself.  For example, GCC 2.7.x with -O2 and a few extra
	options optimizes much of it away, resulting in about a 1200% boost
	to the score.  Clearly not a good one for the index.
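
As mentioned in the first item above, the final index is a geometric mean.
Here is a minimal sketch of that calculation, using made-up normalized
scores (the real index first normalizes each test against its baseline
score):

    # Geometric mean: average the logs, then exponentiate the result.
    echo "12.0 8.5 15.3" | tr ' ' '\n' |
        awk '{ sum += log($1); n++ } END { print exp(sum / n) }'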

I am still a bit unhappy with the variance in some of the benchmarks, most
notably the fstime suite; and with how long it takes to run.  But I think
it gets significantly more reliable results than the older version in less
time.

If anyone has ideas on how to make these benchmarks faster, lower-variance,
or more meaningful; or has nice, new, portable benchmarks to add, don't
hesitate to e-mail me.

David C Niemi <niemi@tux.org>		7 Dec 1995

========================  May 91 ==========================
This is version 3. This set of programs should be able to determine if
your system is BSD or SysV. (It uses the output format of time(1) to
see.) If you have any problems, contact me (by email, preferably):
ben@bytepb.byte.com

---

The document doc/bench.doc describes the basic flow of the
benchmark system. The document doc/bench3.doc describes the major
changes in design of this version. As a user of the benchmarks,
you should understand some of the methods that have been
implemented to generate loop counts:

Tests that are compiled C code:
  The function wake_me(second, func) is included (from the file
timeit.c). This function uses signal and alarm to set a countdown
for the time requested by the benchmark administration script
(Run). As soon as the clock is started, the test is run with a
counter keeping track of the number of loops that the test makes.
When alarm sends its signal, the loop counter value is sent to stderr
and the program terminates. Since the time resolution, signal
trapping and other factors don't ensure that the test runs for the
precise time that was requested, the test program is also run
from the time(1) command. The real time value returned from time(1)
is what is used in calculating the number of loops per second
(or minute, depending on the test).  As is obvious, there is some
overhead time that is not taken into account; therefore the
number of loops per second is not absolute. The overhead of the
test starting and stopping and the signal and alarm calls is
common to the overhead of real applications. If a program loads
quickly, the number of loops per second increases; a phenomenon
that favors systems that can load programs quickly. (Setting the
sticky bit of the test programs is not considered fair play.)
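
The resulting arithmetic is simple; for instance, with a made-up loop count
of 1234 and a real time of 10.3 seconds from time(1), the rates would be:

    # Loops per second and loops per minute from the hypothetical values.
    awk 'BEGIN { loops = 1234; real = 10.3;
                 print loops / real, loops * 60 / real }'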

Tests that use existing UNIX programs or shell scripts:
  The concept is the same as that of compiled tests, except the
alarm and signal are contained in a separate compiled program,
looper (source is looper.c). Looper uses an execvp to invoke the
test with its arguments. Here, the overhead includes the
invocation and execution of looper.

--

The index numbers are generated from a baseline file that is in
pgms/index.base. You can put tests that you wish in this file.
All you need to do is take the results/log file from your
baseline machine, edit out the comment and blank lines, and sort
the result (vi/ex command: 1,$!sort). The sort is necessary
because the process of generating the index report uses join (1).
You can regenerate the reports by running "make report".
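
For example, a non-interactive way to prepare the baseline (assuming a
hypothetical log file name, and that comment lines start with "#") would be:

    # Strip comments and blank lines, sort, and install as the new baseline.
    grep -v '^#' results/baseline.log | grep -v '^$' | sort > pgms/index.base
    make report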

--

========================= Jan 90 =============================
Tom Yager has joined the effort here at BYTE; he is responsible
for many refinements in the UNIX benchmarks.

The memory access tests have been deleted from the benchmarks.
The file access tests have been reversed so that the test is run
for a fixed time. The amount of data transferred (written, read,
and copied) is the variable. !WARNING! This test can eat up a
large hunk of disk space.

The initial line of all shell scripts has been changed from the
SCO and XENIX form (:) to the more standard form "#! /bin/sh".
But different systems handle shell switching differently. Check
the documentation on your system and find out how you are
supposed to do it. Or, simpler yet, just run the benchmarks from
the Bourne shell. (You may need to set SHELL=/bin/sh as well.)
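
For example, from a non-Bourne login shell you might start a run like this:

    # Make sure SHELL points at the Bourne shell, then run the benchmarks.
    SHELL=/bin/sh
    export SHELL
    ./Run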

The options to Run have not been checked in a while. They may no
longer function. Next time, I'll get back on them. There needs to
be another option added (next time) that halts testing between
each test. !WARNING! Some systems have caches that are not getting flushed
before the next test or iteration is run. This can cause
erroneous values.

========================= Sept 89 =============================
The database (db) programs now have a tuneable message queue space.
The default set in the Run script is 1024 bytes.
Other major changes are in the format of the times. We now show
Arithmetic and Geometric mean and standard deviation for User
Time, System Time, and Real Time. Generally, in reporting, we
plan on using the Real Time values with the benchmarks run with one
active user (the bench user). Comments and arguments are requested.

contact: BIX bensmith or rick_g