This patch imports the unmodified current version of NetBSD libc. The NetBSD includes are in /nbsd_include, while the libc code itself is split between lib/nbsd_libc and common/lib/libc.
		
			
				
	
	
		
			373 lines
		
	
	
		
			16 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			373 lines
		
	
	
		
			16 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
$NetBSD: softfloat.txt,v 1.2 2006/11/24 19:46:58 christos Exp $
 | 
						|
 | 
						|
SoftFloat Release 2a General Documentation
 | 
						|
 | 
						|
John R. Hauser
 | 
						|
1998 December 13
 | 
						|
 | 
						|
 | 
						|
-------------------------------------------------------------------------------
 | 
						|
Introduction
 | 
						|
 | 
						|
SoftFloat is a software implementation of floating-point that conforms to
 | 
						|
the IEC/IEEE Standard for Binary Floating-Point Arithmetic.  As many as four
 | 
						|
formats are supported:  single precision, double precision, extended double
 | 
						|
precision, and quadruple precision.  All operations required by the standard
 | 
						|
are implemented, except for conversions to and from decimal.
 | 
						|
 | 
						|
This document gives information about the types defined and the routines
 | 
						|
implemented by SoftFloat.  It does not attempt to define or explain the
 | 
						|
IEC/IEEE Floating-Point Standard.  Details about the standard are available
 | 
						|
elsewhere.
 | 
						|
 | 
						|
 | 
						|
-------------------------------------------------------------------------------
 | 
						|
Limitations
 | 
						|
 | 
						|
SoftFloat is written in C and is designed to work with other C code.  The
 | 
						|
SoftFloat header files assume an ISO/ANSI-style C compiler.  No attempt
 | 
						|
has been made to accommodate compilers that are not ISO-conformant.  In
 | 
						|
particular, the distributed header files will not be acceptable to any
 | 
						|
compiler that does not recognize function prototypes.
 | 
						|
 | 
						|
Support for the extended double-precision and quadruple-precision formats
 | 
						|
depends on a C compiler that implements 64-bit integer arithmetic.  If the
 | 
						|
largest integer format supported by the C compiler is 32 bits, SoftFloat is
 | 
						|
limited to only single and double precisions.  When that is the case, all
 | 
						|
references in this document to the extended double precision, quadruple
 | 
						|
precision, and 64-bit integers should be ignored.
 | 
						|
 | 
						|
 | 
						|
-------------------------------------------------------------------------------
 | 
						|
Contents
 | 
						|
 | 
						|
    Introduction
 | 
						|
    Limitations
 | 
						|
    Contents
 | 
						|
    Legal Notice
 | 
						|
    Types and Functions
 | 
						|
    Rounding Modes
 | 
						|
    Extended Double-Precision Rounding Precision
 | 
						|
    Exceptions and Exception Flags
 | 
						|
    Function Details
 | 
						|
        Conversion Functions
 | 
						|
        Standard Arithmetic Functions
 | 
						|
        Remainder Functions
 | 
						|
        Round-to-Integer Functions
 | 
						|
        Comparison Functions
 | 
						|
        Signaling NaN Test Functions
 | 
						|
        Raise-Exception Function
 | 
						|
    Contact Information
 | 
						|
 | 
						|
 | 
						|
 | 
						|
-------------------------------------------------------------------------------
 | 
						|
Legal Notice
 | 
						|
 | 
						|
SoftFloat was written by John R. Hauser.  This work was made possible in
 | 
						|
part by the International Computer Science Institute, located at Suite 600,
 | 
						|
1947 Center Street, Berkeley, California 94704.  Funding was partially
 | 
						|
provided by the National Science Foundation under grant MIP-9311980.  The
 | 
						|
original version of this code was written as part of a project to build
 | 
						|
a fixed-point vector processor in collaboration with the University of
 | 
						|
California at Berkeley, overseen by Profs. Nelson Morgan and John Wawrzynek.
 | 
						|
 | 
						|
THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE.  Although reasonable effort
 | 
						|
has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
 | 
						|
TIMES RESULT IN INCORRECT BEHAVIOR.  USE OF THIS SOFTWARE IS RESTRICTED TO
 | 
						|
PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
 | 
						|
AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.
 | 
						|
 | 
						|
 | 
						|
-------------------------------------------------------------------------------
 | 
						|
Types and Functions
 | 
						|
 | 
						|
When 64-bit integers are supported by the compiler, the `softfloat.h' header
 | 
						|
file defines four types:  `float32' (single precision), `float64' (double
 | 
						|
precision), `floatx80' (extended double precision), and `float128'
 | 
						|
(quadruple precision).  The `float32' and `float64' types are defined in
 | 
						|
terms of 32-bit and 64-bit integer types, respectively, while the `float128'
 | 
						|
type is defined as a structure of two 64-bit integers, taking into account
 | 
						|
the byte order of the particular machine being used.  The `floatx80' type
 | 
						|
is defined as a structure containing one 16-bit and one 64-bit integer, with
 | 
						|
the machine's byte order again determining the order of the `high' and `low'
 | 
						|
fields.
 | 
						|
 | 
						|
When 64-bit integers are _not_ supported by the compiler, the `softfloat.h'
 | 
						|
header file defines only two types:  `float32' and `float64'.  Because
 | 
						|
ISO/ANSI C guarantees at least one built-in integer type of 32 bits,
 | 
						|
the `float32' type is identified with an appropriate integer type.  The
 | 
						|
`float64' type is defined as a structure of two 32-bit integers, with the
 | 
						|
machine's byte order determining the order of the fields.
 | 
						|
 | 
						|
In either case, the types in `softfloat.h' are defined such that if a system
 | 
						|
implements the usual C `float' and `double' types according to the IEC/IEEE
 | 
						|
Standard, then the `float32' and `float64' types should be indistinguishable
 | 
						|
in memory from the native `float' and `double' types.  (On the other hand,
 | 
						|
when `float32' or `float64' values are placed in processor registers by
 | 
						|
the compiler, the type of registers used may differ from those used for the
 | 
						|
native `float' and `double' types.)
 | 
						|
 | 
						|
SoftFloat implements the following arithmetic operations:
 | 
						|
 | 
						|
-- Conversions among all the floating-point formats, and also between
 | 
						|
   integers (32-bit and 64-bit) and any of the floating-point formats.
 | 
						|
 | 
						|
-- The usual add, subtract, multiply, divide, and square root operations
 | 
						|
   for all floating-point formats.
 | 
						|
 | 
						|
-- For each format, the floating-point remainder operation defined by the
 | 
						|
   IEC/IEEE Standard.
 | 
						|
 | 
						|
-- For each floating-point format, a ``round to integer'' operation that
 | 
						|
   rounds to the nearest integer value in the same format.  (The floating-
 | 
						|
   point formats can hold integer values, of course.)
 | 
						|
 | 
						|
-- Comparisons between two values in the same floating-point format.
 | 
						|
 | 
						|
The only functions required by the IEC/IEEE Standard that are not provided
 | 
						|
are conversions to and from decimal.
 | 
						|
 | 
						|
 | 
						|
-------------------------------------------------------------------------------
 | 
						|
Rounding Modes
 | 
						|
 | 
						|
All four rounding modes prescribed by the IEC/IEEE Standard are implemented
 | 
						|
for all operations that require rounding.  The rounding mode is selected
 | 
						|
by the global variable `float_rounding_mode'.  This variable may be set
 | 
						|
to one of the values `float_round_nearest_even', `float_round_to_zero',
 | 
						|
`float_round_down', or `float_round_up'.  The rounding mode is initialized
 | 
						|
to nearest/even.
 | 
						|
 | 
						|
 | 
						|
-------------------------------------------------------------------------------
 | 
						|
Extended Double-Precision Rounding Precision
 | 
						|
 | 
						|
For extended double precision (`floatx80') only, the rounding precision
 | 
						|
of the standard arithmetic operations is controlled by the global variable
 | 
						|
`floatx80_rounding_precision'.  The operations affected are:
 | 
						|
 | 
						|
   floatx80_add   floatx80_sub   floatx80_mul   floatx80_div   floatx80_sqrt
 | 
						|
 | 
						|
When `floatx80_rounding_precision' is set to its default value of 80, these
 | 
						|
operations are rounded (as usual) to the full precision of the extended
 | 
						|
double-precision format.  Setting `floatx80_rounding_precision' to 32
 | 
						|
or to 64 causes the operations listed to be rounded to reduced precision
 | 
						|
equivalent to single precision (`float32') or to double precision
 | 
						|
(`float64'), respectively.  When rounding to reduced precision, additional
 | 
						|
bits in the result significand beyond the rounding point are set to zero.
 | 
						|
The consequences of setting `floatx80_rounding_precision' to a value other
 | 
						|
than 32, 64, or 80 is not specified.  Operations other than the ones listed
 | 
						|
above are not affected by `floatx80_rounding_precision'.
 | 
						|
 | 
						|
 | 
						|
-------------------------------------------------------------------------------
 | 
						|
Exceptions and Exception Flags
 | 
						|
 | 
						|
All five exception flags required by the IEC/IEEE Standard are
 | 
						|
implemented.  Each flag is stored as a unique bit in the global variable
 | 
						|
`float_exception_flags'.  The positions of the exception flag bits within
 | 
						|
this variable are determined by the bit masks `float_flag_inexact',
 | 
						|
`float_flag_underflow', `float_flag_overflow', `float_flag_divbyzero', and
 | 
						|
`float_flag_invalid'.  The exception flags variable is initialized to all 0,
 | 
						|
meaning no exceptions.
 | 
						|
 | 
						|
An individual exception flag can be cleared with the statement
 | 
						|
 | 
						|
    float_exception_flags &= ~ float_flag_<exception>;
 | 
						|
 | 
						|
where `<exception>' is the appropriate name.  To raise a floating-point
 | 
						|
exception, the SoftFloat function `float_raise' should be used (see below).
 | 
						|
 | 
						|
In the terminology of the IEC/IEEE Standard, SoftFloat can detect tininess
 | 
						|
for underflow either before or after rounding.  The choice is made by
 | 
						|
the global variable `float_detect_tininess', which can be set to either
 | 
						|
`float_tininess_before_rounding' or `float_tininess_after_rounding'.
 | 
						|
Detecting tininess after rounding is better because it results in fewer
 | 
						|
spurious underflow signals.  The other option is provided for compatibility
 | 
						|
with some systems.  Like most systems, SoftFloat always detects loss of
 | 
						|
accuracy for underflow as an inexact result.
 | 
						|
 | 
						|
 | 
						|
-------------------------------------------------------------------------------
 | 
						|
Function Details
 | 
						|
 | 
						|
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 | 
						|
Conversion Functions
 | 
						|
 | 
						|
All conversions among the floating-point formats are supported, as are all
 | 
						|
conversions between a floating-point format and 32-bit and 64-bit signed
 | 
						|
integers.  The complete set of conversion functions is:
 | 
						|
 | 
						|
   int32_to_float32      int64_to_float32
 | 
						|
   int32_to_float64      int64_to_float32
 | 
						|
   int32_to_floatx80     int64_to_floatx80
 | 
						|
   int32_to_float128     int64_to_float128
 | 
						|
 | 
						|
   float32_to_int32      float32_to_int64
 | 
						|
   float32_to_int32      float64_to_int64
 | 
						|
   floatx80_to_int32     floatx80_to_int64
 | 
						|
   float128_to_int32     float128_to_int64
 | 
						|
 | 
						|
   float32_to_float64    float32_to_floatx80   float32_to_float128
 | 
						|
   float64_to_float32    float64_to_floatx80   float64_to_float128
 | 
						|
   floatx80_to_float32   floatx80_to_float64   floatx80_to_float128
 | 
						|
   float128_to_float32   float128_to_float64   float128_to_floatx80
 | 
						|
 | 
						|
Each conversion function takes one operand of the appropriate type and
 | 
						|
returns one result.  Conversions from a smaller to a larger floating-point
 | 
						|
format are always exact and so require no rounding.  Conversions from 32-bit
 | 
						|
integers to double precision and larger formats are also exact, and likewise
 | 
						|
for conversions from 64-bit integers to extended double and quadruple
 | 
						|
precisions.
 | 
						|
 | 
						|
Conversions from floating-point to integer raise the invalid exception if
 | 
						|
the source value cannot be rounded to a representable integer of the desired
 | 
						|
size (32 or 64 bits).  If the floating-point operand is a NaN, the largest
 | 
						|
positive integer is returned.  Otherwise, if the conversion overflows, the
 | 
						|
largest integer with the same sign as the operand is returned.
 | 
						|
 | 
						|
On conversions to integer, if the floating-point operand is not already an
 | 
						|
integer value, the operand is rounded according to the current rounding
 | 
						|
mode as specified by `float_rounding_mode'.  Because C (and perhaps other
 | 
						|
languages) require that conversions to integers be rounded toward zero, the
 | 
						|
following functions are provided for improved speed and convenience:
 | 
						|
 | 
						|
   float32_to_int32_round_to_zero    float32_to_int64_round_to_zero
 | 
						|
   float64_to_int32_round_to_zero    float64_to_int64_round_to_zero
 | 
						|
   floatx80_to_int32_round_to_zero   floatx80_to_int64_round_to_zero
 | 
						|
   float128_to_int32_round_to_zero   float128_to_int64_round_to_zero
 | 
						|
 | 
						|
These variant functions ignore `float_rounding_mode' and always round toward
 | 
						|
zero.
 | 
						|
 | 
						|
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 | 
						|
Standard Arithmetic Functions
 | 
						|
 | 
						|
The following standard arithmetic functions are provided:
 | 
						|
 | 
						|
   float32_add    float32_sub    float32_mul    float32_div    float32_sqrt
 | 
						|
   float64_add    float64_sub    float64_mul    float64_div    float64_sqrt
 | 
						|
   floatx80_add   floatx80_sub   floatx80_mul   floatx80_div   floatx80_sqrt
 | 
						|
   float128_add   float128_sub   float128_mul   float128_div   float128_sqrt
 | 
						|
 | 
						|
Each function takes two operands, except for `sqrt' which takes only one.
 | 
						|
The operands and result are all of the same type.
 | 
						|
 | 
						|
Rounding of the extended double-precision (`floatx80') functions is affected
 | 
						|
by the `floatx80_rounding_precision' variable, as explained above in the
 | 
						|
section _Extended_Double-Precision_Rounding_Precision_.
 | 
						|
 | 
						|
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 | 
						|
Remainder Functions
 | 
						|
 | 
						|
For each format, SoftFloat implements the remainder function according to
 | 
						|
the IEC/IEEE Standard.  The remainder functions are:
 | 
						|
 | 
						|
   float32_rem
 | 
						|
   float64_rem
 | 
						|
   floatx80_rem
 | 
						|
   float128_rem
 | 
						|
 | 
						|
Each remainder function takes two operands.  The operands and result are all
 | 
						|
of the same type.  Given operands x and y, the remainder functions return
 | 
						|
the value x - n*y, where n is the integer closest to x/y.  If x/y is exactly
 | 
						|
halfway between two integers, n is the even integer closest to x/y.  The
 | 
						|
remainder functions are always exact and so require no rounding.
 | 
						|
 | 
						|
Depending on the relative magnitudes of the operands, the remainder
 | 
						|
functions can take considerably longer to execute than the other SoftFloat
 | 
						|
functions.  This is inherent in the remainder operation itself and is not a
 | 
						|
flaw in the SoftFloat implementation.
 | 
						|
 | 
						|
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 | 
						|
Round-to-Integer Functions
 | 
						|
 | 
						|
For each format, SoftFloat implements the round-to-integer function
 | 
						|
specified by the IEC/IEEE Standard.  The functions are:
 | 
						|
 | 
						|
   float32_round_to_int
 | 
						|
   float64_round_to_int
 | 
						|
   floatx80_round_to_int
 | 
						|
   float128_round_to_int
 | 
						|
 | 
						|
Each function takes a single floating-point operand and returns a result of
 | 
						|
the same type.  (Note that the result is not an integer type.)  The operand
 | 
						|
is rounded to an exact integer according to the current rounding mode, and
 | 
						|
the resulting integer value is returned in the same floating-point format.
 | 
						|
 | 
						|
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 | 
						|
Comparison Functions
 | 
						|
 | 
						|
The following floating-point comparison functions are provided:
 | 
						|
 | 
						|
   float32_eq    float32_le    float32_lt
 | 
						|
   float64_eq    float64_le    float64_lt
 | 
						|
   floatx80_eq   floatx80_le   floatx80_lt
 | 
						|
   float128_eq   float128_le   float128_lt
 | 
						|
 | 
						|
Each function takes two operands of the same type and returns a 1 or 0
 | 
						|
representing either _true_ or _false_.  The abbreviation `eq' stands for
 | 
						|
``equal'' (=); `le' stands for ``less than or equal'' (<=); and `lt' stands
 | 
						|
for ``less than'' (<).
 | 
						|
 | 
						|
The standard greater-than (>), greater-than-or-equal (>=), and not-equal
 | 
						|
(!=) functions are easily obtained using the functions provided.  The
 | 
						|
not-equal function is just the logical complement of the equal function.
 | 
						|
The greater-than-or-equal function is identical to the less-than-or-equal
 | 
						|
function with the operands reversed; and the greater-than function can be
 | 
						|
obtained from the less-than function in the same way.
 | 
						|
 | 
						|
The IEC/IEEE Standard specifies that the less-than-or-equal and less-than
 | 
						|
functions raise the invalid exception if either input is any kind of NaN.
 | 
						|
The equal functions, on the other hand, are defined not to raise the invalid
 | 
						|
exception on quiet NaNs.  For completeness, SoftFloat provides the following
 | 
						|
additional functions:
 | 
						|
 | 
						|
   float32_eq_signaling    float32_le_quiet    float32_lt_quiet
 | 
						|
   float64_eq_signaling    float64_le_quiet    float64_lt_quiet
 | 
						|
   floatx80_eq_signaling   floatx80_le_quiet   floatx80_lt_quiet
 | 
						|
   float128_eq_signaling   float128_le_quiet   float128_lt_quiet
 | 
						|
 | 
						|
The `signaling' equal functions are identical to the standard functions
 | 
						|
except that the invalid exception is raised for any NaN input.  Likewise,
 | 
						|
the `quiet' comparison functions are identical to their counterparts except
 | 
						|
that the invalid exception is not raised for quiet NaNs.
 | 
						|
 | 
						|
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 | 
						|
Signaling NaN Test Functions
 | 
						|
 | 
						|
The following functions test whether a floating-point value is a signaling
 | 
						|
NaN:
 | 
						|
 | 
						|
   float32_is_signaling_nan
 | 
						|
   float64_is_signaling_nan
 | 
						|
   floatx80_is_signaling_nan
 | 
						|
   float128_is_signaling_nan
 | 
						|
 | 
						|
The functions take one operand and return 1 if the operand is a signaling
 | 
						|
NaN and 0 otherwise.
 | 
						|
 | 
						|
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 | 
						|
Raise-Exception Function
 | 
						|
 | 
						|
SoftFloat provides a function for raising floating-point exceptions:
 | 
						|
 | 
						|
    float_raise
 | 
						|
 | 
						|
The function takes a mask indicating the set of exceptions to raise.  No
 | 
						|
result is returned.  In addition to setting the specified exception flags,
 | 
						|
this function may cause a trap or abort appropriate for the current system.
 | 
						|
 | 
						|
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 | 
						|
 | 
						|
 | 
						|
-------------------------------------------------------------------------------
 | 
						|
Contact Information
 | 
						|
 | 
						|
At the time of this writing, the most up-to-date information about
 | 
						|
SoftFloat and the latest release can be found at the Web page `http://
 | 
						|
HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/SoftFloat.html'.
 | 
						|
 | 
						|
 |