mirror of
https://github.com/johnwinans/rvalp.git
synced 2025-09-27 13:12:03 -04:00
Word smithing repairs to binary number chapter.
This commit is contained in:
parent
8eabe7ae0d
commit
af6327e220
@ -4,7 +4,7 @@
|
||||
This chapter discusses how data are represented and stored in a computer.
|
||||
|
||||
In the context of computing, {\em boolean} refers to a condition that can
|
||||
be either true and false and {\em binary} refers to the use of a base-2
|
||||
be either true or false and {\em binary} refers to the use of a base-2
|
||||
numeric system to represent numbers.
|
||||
|
||||
RISC-V assembly language uses binary to represent all values, be they
|
||||
@ -20,7 +20,7 @@ LSB,\ldots\ perhaps relocated from the RV32I chapter?}
|
||||
|
||||
Boolean functions apply on a per-bit basis.
|
||||
When applied to multi-bit values, each bit position is operated upon
|
||||
independently of the other bits.
|
||||
independent of the other bits.
|
||||
|
||||
RISC-V assembly language uses zero to represent {\em false} and one
|
||||
to represent {\em true}. In general, however, it is useful to relax
|
||||
@ -30,9 +30,10 @@ that is not {\em false} is therefore {\em true}.%
|
||||
many other languages as well as the common assembly language idioms
|
||||
discussed in this text.}
|
||||
|
||||
The reason for this relaxation is because, while a single binary digit
|
||||
(\gls{bit}) can represent the two values zero and one, the vast majority
|
||||
of the time data is processed by the CPU in groups of bits. These
|
||||
The reason for this relaxation is to describe the common case
|
||||
where the CPU processes data, multiple \gls{bit}s at-a-time.
|
||||
|
||||
These
|
||||
groups have names like \gls{byte} (8 bits), \gls{halfword} (16 bits)
|
||||
and \gls{fullword} (32 bits).
|
||||
|
||||
@ -49,7 +50,7 @@ If the input is 1 then the output is 0. If the input is 0 then the
|
||||
output is 1. In other words, the output value is {\em not} that of the
|
||||
input value.
|
||||
|
||||
Expressing the {\em not} function in the form a a truth table:
|
||||
Expressing the {\em not} function in the form of a truth table:
|
||||
|
||||
\begin{center}
|
||||
\begin{tabular}{c|c}
|
||||
@ -95,7 +96,7 @@ single bit. The output is 1 if and only if all of the input values are 1.
|
||||
Otherwise it is 0.
|
||||
|
||||
This function works like it does in spoken language. For example
|
||||
if A is 1 {\em AND} B is 1 then the output is 1 (true).
|
||||
if A is 1 {\em and} B is 1 then the output is 1 (true).
|
||||
Otherwise the output is 0 (false).
|
||||
|
||||
In mathematical notation, the {\em and} operator is expressed the same way
|
||||
@ -115,7 +116,7 @@ A & B & AB \\
|
||||
\end{center}
|
||||
|
||||
This text will use the operator used in the C language when discussing
|
||||
the {\em AND} operator in symbolic form. Specifically the ampersand: `\verb@&@'.
|
||||
the {\em and} operator in symbolic form. Specifically the ampersand: `\verb@&@'.
|
||||
|
||||
An eight-bit example:
|
||||
|
||||
@ -136,7 +137,7 @@ The boolean {\em or} function has two or more inputs and the output is a
|
||||
single bit. The output is 1 if at least one of the input values is 1.
|
||||
|
||||
This function works like it does in spoken language. For example
|
||||
if A is 1 {\em OR} B is 1 then the output is 1 (true).
|
||||
if A is 1 {\em or} B is 1 then the output is 1 (true).
|
||||
Otherwise the output is 0 (false).
|
||||
|
||||
In mathematical notation, the {\em or} operator is expressed using the plus
|
||||
@ -154,7 +155,7 @@ A & B & A$+$B \\
|
||||
\end{center}
|
||||
|
||||
This text will use the operator used in the C language when discussing
|
||||
the {\em OR} operator in symbolic form. Specifically the pipe: `\verb@|@'.
|
||||
the {\em or} operator in symbolic form. Specifically the pipe: `\verb@|@'.
|
||||
|
||||
An eight-bit example:
|
||||
|
||||
@ -175,7 +176,7 @@ The boolean {\em exclusive or} function has two or more inputs and the
|
||||
output is a single bit. The output is 1 if and only if an odd number of inputs
|
||||
are 1. Otherwise the output will be 0.
|
||||
|
||||
Note that when {\em XOR} is used with two inputs, the output
|
||||
Note that when {\em xor} is used with two inputs, the output
|
||||
is set to 1 (true) when the inputs have different values and 0
|
||||
(false) when the inputs both have the same value.
|
||||
|
||||
@ -194,7 +195,7 @@ A & B & A$\oplus{}$B \\
|
||||
\end{center}
|
||||
|
||||
This text will use the operator used in the C language when discussing
|
||||
the {\em XOR} operator in symbolic form. Specifically the carrot: `\verb@^@'.
|
||||
the {\em xor} operator in symbolic form. Specifically the caret: `\verb@^@'.
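
As a minimal sketch of how these C operators behave on eight-bit values, the following helper functions apply each boolean function to every bit position independently. The function names are hypothetical and chosen only for this illustration, and the use of the C tilde operator for {\em not} is an assumption about C rather than something stated in this text.

```c
#include <stdint.h>

/* Hypothetical helpers: each one applies a boolean function to all
 * eight bit positions at once, each position independent of the others. */

/* not: every 1 becomes 0 and every 0 becomes 1 */
uint8_t bit_not(uint8_t a)
{
    return (uint8_t)~a;
}

/* and: a result bit is 1 only if both input bits are 1 */
uint8_t bit_and(uint8_t a, uint8_t b)
{
    return (uint8_t)(a & b);
}

/* or: a result bit is 1 if at least one input bit is 1 */
uint8_t bit_or(uint8_t a, uint8_t b)
{
    return (uint8_t)(a | b);
}

/* xor: a result bit is 1 if the two input bits differ */
uint8_t bit_xor(uint8_t a, uint8_t b)
{
    return (uint8_t)(a ^ b);
}
```

For example, with the inputs 11110000 and 11001100, \verb@bit_and@ yields 11000000, \verb@bit_or@ yields 11111100 and \verb@bit_xor@ yields 00111100.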
|
||||
|
||||
|
||||
An eight-bit example:
|
||||
@ -218,7 +219,7 @@ A binary integer is constructed with only 1s and 0s in the same
|
||||
manner as decimal numbers are constructed with values from 0 to 9.
|
||||
|
||||
Counting in binary (base-2) uses the same basic rules as decimal (base-10).
|
||||
The difference comes in when we consider that there are ten decimal digits and
|
||||
The difference is that there are ten decimal digits and
|
||||
only two binary digits. Therefore, in base-10, we must carry when adding one to
|
||||
nine (because there is no digit representing a ten) and, in base-2, we must
|
||||
carry when adding one to one (because there is no digit representing a two.)
|
||||
@ -293,8 +294,8 @@ Interpreting the hexadecimal value on the fourth row by converting it to decimal
|
||||
\index{Most significant bit}\index{MSB|see {Most significant bit}}%
|
||||
\index{Least significant bit}\index{LSB|see {Least significant bit}}%
|
||||
We refer to the place values with the largest exponent (the one furthest to the
|
||||
left for any given base) as the {\em most significant} digit and the place value
|
||||
with the lowest exponent as the {\em least significant} digit. For binary
|
||||
left for any given base) as the most significant digit and the place value
|
||||
with the lowest exponent as the least significant digit. For binary
|
||||
numbers these are the \acrfull{msb} and \acrfull{lsb} respectively.%
|
||||
\footnote{Changing the value of the MSB will have a more {\em significant}
|
||||
impact on the numeric value than changing the value of the LSB.}
|
||||
@ -309,7 +310,7 @@ pattern is 0-1-0-1-0-1-0-\ldots) The next column in each base
|
||||
will cycle in the same manner except each of the values is repeated
|
||||
as many times as is represented by the place value (in the case of
|
||||
decimal, $10^1$ times, binary $2^1$ times, hex $16^1$ times. Again,
|
||||
the for binary numbers this pattern is 0-0-1-1-0-0-1-1-\ldots)
|
||||
for binary numbers this pattern is 0-0-1-1-0-0-1-1-\ldots)
|
||||
This continues for as many columns as are needed to represent the
|
||||
magnitude of the desired number.
|
||||
|
||||
@ -364,11 +365,11 @@ numbers that start with 0b are interpreted as binary.
|
||||
\subsubsection{From Binary to Decimal}
|
||||
\label{section:bindec}
|
||||
|
||||
Alas, it is occasionally necessary to convert between decimal,
|
||||
It is occasionally necessary to convert between decimal,
|
||||
binary and/or hex.
|
||||
|
||||
To convert from binary to decimal, put the decimal value of the place values
|
||||
{\ldots8 4 2 1} over the binary digits like this:
|
||||
{\ldots8, 4, 2, 1} over the binary digits like this:
|
||||
|
||||
\begin{verbatim}
|
||||
Base-2 place values: 128 64 32 16 8 4 2 1
|
||||
@ -416,10 +417,11 @@ Hex: 6 D A E
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\subsubsection{From Hexadecimal to Binary}
|
||||
|
||||
Again, the four-bit mapping between binary and hex makes this
|
||||
task as straight forward as using a look-up table.
|
||||
The four-bit mapping between binary and hex makes this
|
||||
task as straightforward as using a look-up table to
|
||||
translate each \gls{hit} (Hex digIT) to its unique
|
||||
four-bit pattern.
|
||||
|
||||
For each \gls{hit} (Hex digIT), translate it to its unique four-bit pattern.
|
||||
Perform this task either by memorizing each of the 16 patterns
|
||||
or by converting each hit to decimal first and then converting
|
||||
each four-bit binary value to decimal using the place-value summing
|
||||
@ -476,7 +478,7 @@ or by first converting the decimal value to binary and then
|
||||
from binary to hex by using the methods discussed above.
|
||||
|
||||
Because binary and hex are so closely related, performing
|
||||
a conversion by way of binary is quite straight forward.
|
||||
a conversion by way of binary is straightforward.
|
||||
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
@ -508,7 +510,7 @@ For example:
|
||||
\subsection{Signed Numbers}
|
||||
|
||||
There are multiple methods used to represent signed binary integers.
|
||||
The method used by most modern computers is called ``two's complement.''
|
||||
The method used by most modern computers is called {\em two's complement}.
|
||||
|
||||
A two's complement number is encoded in such a manner as to simplify
|
||||
the hardware used to add, subtract and compare integers.
|
||||
@ -577,10 +579,10 @@ ignored.
|
||||
\subsubsection{Converting between Positive and Negative}
|
||||
|
||||
Changing the sign on two's complement numbers can be described as
|
||||
inverting all of the bits (which is also known as the one's complement)
|
||||
inverting all of the bits (which is also known as the {\em one's complement})
|
||||
and then adding one.
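
The invert-and-add-one rule can be sketched in C for eight-bit values as shown below. The function name \verb@negate8@ is hypothetical, and the final cast back to a signed type assumes a two's complement host, which is an assumption about the compiler rather than something this text guarantees.

```c
#include <stdint.h>

/* Hypothetical helper: change the sign of an 8-bit two's complement
 * value by inverting all of its bits and then adding one. */
int8_t negate8(int8_t x)
{
    uint8_t bits = (uint8_t)x;              /* the raw 8-bit pattern       */
    uint8_t ones = (uint8_t)~bits;          /* one's complement            */
    uint8_t twos = (uint8_t)(ones + 1u);    /* add one; a carry out of the */
                                            /* MSB is discarded            */
    return (int8_t)twos;                    /* assumes two's complement    */
}
```

Note that \verb@negate8(-128)@ returns $-128$ because $+128$ can not be represented in eight signed bits.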
|
||||
|
||||
For example, inverting the number {\em four}:
|
||||
For example, negating the number four:
|
||||
|
||||
\begin{verbatim}
|
||||
-128 64 32 16 8 4 2 1
|
||||
@ -644,8 +646,8 @@ To calculate $-4-8 = -12$
|
||||
|
||||
\begin{verbatim}
|
||||
-128 64 32 16 8 4 2 1
|
||||
1 1 1 1 1 1 0 0 <== -4
|
||||
- 0 0 0 0 1 0 0 0 <== 8
|
||||
1 1 1 1 1 1 0 0 <== -4 (minuend)
|
||||
- 0 0 0 0 1 0 0 0 <== 8 (subtrahend)
|
||||
|
||||
|
||||
1 1 1 <== carries
|
||||
@ -727,7 +729,7 @@ When subtracting {\em signed}, an overflow only occurs when the
|
||||
minuend is positive and the subtrahend is negative and the difference is negative
|
||||
or when the minuend is negative and the subtrahend is positive and the
|
||||
difference is positive.%
|
||||
\footnote{Yeah, I had to look it up to remember which were which
|
||||
\footnote{I had to look it up to remember which were which
|
||||
too\ldots\ it is: minuend - subtrahend = difference.\cite{subtrahend}}
|
||||
|
||||
Consider the results of the addition of two {\em signed} numbers
|
||||
@ -748,7 +750,7 @@ while looking more closely at the carry values.
|
||||
|
||||
|
||||
|
||||
\autoref{sum:64+64} is an example of an {\em overflow}. As you can see, the problem is
|
||||
\autoref{sum:64+64} is an example of {\em signed overflow}. As shown, the problem is
|
||||
that the sum of two positive numbers has resulted in an obviously incorrect
|
||||
negative result due to a carry flowing into the sign-bit in the MSB.
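
One way to detect this condition in software is sketched below in C for eight-bit values. The function name \verb@add8@ and its \verb@overflow@ parameter are hypothetical. The sketch relies on the observation that a signed overflow has occurred when both addends have the same sign and the sign of the truncated sum differs from it; it is an illustration under that assumption, not a description of how any particular CPU reports overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: add two 8-bit signed values, discarding the
 * carry out of the MSB, and report whether signed overflow occurred. */
int8_t add8(int8_t a, int8_t b, bool *overflow)
{
    uint8_t ua = (uint8_t)a;
    uint8_t ub = (uint8_t)b;
    uint8_t sum = (uint8_t)(ua + ub);   /* truncated to eight bits */

    /* Overflow: the sign bit of the sum differs from the sign bit
     * of both addends (which therefore had the same sign). */
    *overflow = ((ua ^ sum) & (ub ^ sum) & 0x80u) != 0;

    return (int8_t)sum;                 /* assumes two's complement */
}
```

Adding 64 and 64 with this sketch sets the overflow indication and returns $-128$, while adding $-2$ and 10 discards the carry out of the MSB and returns 8 with no overflow.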
|
||||
|
||||
@ -776,10 +778,10 @@ We say that this result has been {\em truncated}.
|
||||
\label{sum:-128+-128}
|
||||
\end{figure}
|
||||
|
||||
Truncation is not necessarily a bad thing. Consider figures
|
||||
\ref{sum:-3+-5} and \ref{sum:-2+10} where truncation is not a problem.
|
||||
In fact \autoref{sum:-2+10} demonstrates the importance of discarding
|
||||
the carry from the sum of the MSBs of {\em signed} numbers when addends
|
||||
Truncation is not necessarily a problem. Consider the truncations in
|
||||
figures \ref{sum:-3+-5} and \ref{sum:-2+10}.
|
||||
\autoref{sum:-2+10} demonstrates the importance of discarding
|
||||
the carry from the sum of the MSBs of signed numbers when addends
|
||||
do not have the same sign.
|
||||
|
||||
\begin{figure}[H]
|
||||
@ -808,7 +810,7 @@ do not have the same sign.
|
||||
\label{sum:-2+10}
|
||||
\end{figure}
|
||||
|
||||
Just like an unsigned number can {\em wrap around} as a result of
|
||||
Just like an unsigned number can wrap around as a result of
|
||||
successive additions, a signed number can do the same thing. The
|
||||
only difference is that signed numbers won't wrap from the maximum
|
||||
value back to zero; instead they will wrap from the most positive to
|
||||
@ -854,7 +856,8 @@ As do these:
|
||||
00000000000000000000000000000001100 <== 12
|
||||
\end{verbatim}
|
||||
|
||||
The phenomenon illustrated here is called {\em sign extension}.
|
||||
The lengthening of these numbers by replicating the digits on the left
|
||||
is what is called {\em sign extension}.
|
||||
|
||||
\begin{tcolorbox}
|
||||
Any signed number can have any quantity of additional MSBs added to it,
|
||||
@ -862,9 +865,9 @@ provided that they repeat the value of the sign bit.
|
||||
\end{tcolorbox}
|
||||
|
||||
\autoref{Figure:SignExtendNegative} illustrates extending the negative sign
|
||||
bit of {\em val} to the left by replicating it.
|
||||
When {\em val} is negative, its \acrshort{msb} (bit 19 in this example) will
|
||||
be set to 1. Extending this value to the left will set all the new bits
|
||||
bit to the left by replicating it.
|
||||
A negative number will have its \acrshort{msb} (bit 19 in this example)
|
||||
set to 1. Extending this value to the left will set all the new bits
|
||||
to the left of it to 1 as well.
|
||||
|
||||
\begin{figure}[ht]
|
||||
@ -874,9 +877,9 @@ to the left of it to 1 as well.
|
||||
\label{Figure:SignExtendNegative}
|
||||
\end{figure}
|
||||
|
||||
\autoref{Figure:SignExtendPositive} illustrates extending the positive sign
|
||||
bit of {\em val} to the left by replicating it.
|
||||
When {\em val} is positive, its \acrshort{msb} will be set to 0. Extending this
|
||||
\autoref{Figure:SignExtendPositive} illustrates extending the sign bit of a
|
||||
positive number to the left by replicating it.
|
||||
A positive number will have its \acrshort{msb} set to 0. Extending this
|
||||
value to the left will set all the new bits to the left of it to 0 as well.
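
Sign extension of a 20-bit number to 32 bits can be sketched in C as follows. The function name \verb@sign_extend20@ is hypothetical, and the conversion of the final bit pattern to a signed type assumes a two's complement host.

```c
#include <stdint.h>

/* Hypothetical helper: treat the low 20 bits of val as a signed
 * number and extend it to 32 bits by replicating bit 19. */
int32_t sign_extend20(uint32_t val)
{
    uint32_t low = val & 0x000FFFFFu;   /* keep bits 19..0           */

    if (low & 0x00080000u)              /* bit 19 is the sign bit    */
        low |= 0xFFF00000u;             /* negative: fill with ones  */
                                        /* positive: bits stay zero  */

    return (int32_t)low;                /* assumes two's complement  */
}
```

With this sketch, the 20-bit pattern \verb@0xFFFFF@ extends to $-1$ and the pattern \verb@0x0000C@ extends to 12.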
|
||||
|
||||
\begin{figure}[ht]
|
||||
@ -888,8 +891,9 @@ value to the left will set all the new bits to the left of it to 0 as well.
|
||||
|
||||
|
||||
\label{ZeroExtension}
|
||||
In a similar vein, any {\em unsigned} number also may have any quantity of
|
||||
additional MSBs added to it provided that they are all zero. For example,
|
||||
In a similar vein, any unsigned number also may have any quantity of
|
||||
additional MSBs added to it provided that they are all zero. This is
|
||||
called {\em zero extension}. For example,
|
||||
the following all represent the same value:
|
||||
\begin{verbatim}
|
||||
1111 <== 15
|
||||
@ -902,8 +906,8 @@ Any {\em unsigned} number may be {\em zero extended} to any size.
|
||||
\end{tcolorbox}
|
||||
|
||||
\enote{Remove the sign-bit boxes from this figure?}%
|
||||
\autoref{Figure:ZeroExtend} illustrates zero-extending a 20-bit {\em val} to the
|
||||
left to form a 32-bit fullword.
|
||||
\autoref{Figure:ZeroExtend} illustrates zero-extending a 20-bit number to the
|
||||
left to form a 32-bit number.
|
||||
|
||||
\begin{figure}[ht]
|
||||
\centering
|
||||
@ -990,7 +994,7 @@ when using them to dump the contents of memory and/or files.}
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\subsection{Memory Dump}
|
||||
|
||||
\listingRef{rvddt_memdump.out} shows a memory dump from the rvddt
|
||||
\listingRef{rvddt_memdump.out} shows a {\em memory dump} from the rvddt
|
||||
`d' command requesting a dump starting at address \hex{00002600}
|
||||
for the default quantity (\hex{100}) of bytes.
|
||||
|
||||
@ -1024,16 +1028,16 @@ The choice of which end of a multi-byte value is to be stored at the
|
||||
lowest byte address is referred to as {\em endianness.} For example,
|
||||
if a CPU were to store a \gls{halfword} into memory, should the byte
|
||||
containing the \acrfull{msb} (the {\em big} end) go first or does
|
||||
the byte with the \acrfull{lsb} (the {\em little} end) go first/into
|
||||
the lowest memory address?
|
||||
the byte with the \acrfull{lsb} (the {\em little} end) go first?
|
||||
|
||||
On the one hand the choice is arbitrary. On the other hand, it is
|
||||
possible that the choice could impact the performance of the system.%
|
||||
\footnote{See~\cite{IEN137} for some history of the big/little-endian ``controversy.''}
|
||||
|
||||
IBM mainframe CPUs and the 68000 family store their bytes in big-endian
|
||||
order. While the Intel Pentium and most embedded processors are little
|
||||
endian. Some CPUs are even {\em bi-endian} in that they instructions that
|
||||
order, while the Intel Pentium and most embedded processors use
|
||||
little-endian order.
|
||||
Some CPUs are even {\em bi-endian} in that they have instructions that
|
||||
can change their order on the fly.
|
||||
|
||||
The RISC-V system uses the little-endian byte order.
|
||||
@ -1071,7 +1075,7 @@ CPU would recognize the contents as follows:
|
||||
\end{itemize}
|
||||
|
||||
\begin{tcolorbox}
|
||||
On a little-endian syatem, the bytes in the dump are in backwards order as
|
||||
On a little-endian system, the bytes in the dump are in the reverse of the order in which
|
||||
they would be used by the CPU if it were to read them as a multi-byte value.
|
||||
\end{tcolorbox}
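
The reversal can be made concrete with a short C sketch that assembles a 32-bit value from four bytes in the order they would appear in a dump. The function name \verb@load_le32@ is hypothetical. It computes the value arithmetically so that the result does not depend on the byte order of the machine running it.

```c
#include <stdint.h>

/* Hypothetical helper: assemble a 32-bit value from four bytes that
 * are stored in little-endian order. The byte at the lowest address
 * (p[0]) supplies the least significant eight bits. */
uint32_t load_le32(const uint8_t *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

For example, four bytes dumped as \verb@76 61 6c 3d@ would be read as the 32-bit value \verb@0x3d6c6176@.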
|
||||
|
||||
@ -1089,12 +1093,12 @@ non-standard big-endian or bi-endian systems.''\cite[p.~6]{rvismv1v22:2017}
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\subsection{Arrays and Character Strings}
|
||||
|
||||
While Endianness defines to how single values are stored in memory,
|
||||
While endianness defines how single values are stored in memory,
|
||||
the {\em array} defines how multiple values are stored.
|
||||
|
||||
An array is a data structure comprised of an ordered set of elements.
|
||||
This text will limit its definition of {\em array} to those sets
|
||||
of elements that are all of the same {\em type}. Where {\em type}
|
||||
This text will limit its definition of array to a plurality of
|
||||
elements that are all of the same type, where type
|
||||
refers to the size (number of bytes) and representation (signed,
|
||||
unsigned,\ldots) of each element.
|
||||
|
||||
@ -1156,29 +1160,35 @@ be conveyed to any code needing to consume or process the string.
|
||||
|
||||
In \listingRef{rvddt_memdump.out}, the 5-byte long array starting
|
||||
at address \hex{00002658} contains a string whose value can be
|
||||
expressed as either of:
|
||||
expressed as either: % \verb@76 61 6c 3d 00@ or \verb@"val="@.
|
||||
|
||||
\begin{itemize}
|
||||
\item \verb@76 61 6c 3d 00@
|
||||
\item \verb@"val="@
|
||||
\end{itemize}
|
||||
\verb@76 61 6c 3d 00@
|
||||
|
||||
or
|
||||
|
||||
\verb@"val="@
|
||||
|
||||
%\begin{itemize}
|
||||
%\item \verb@76 61 6c 3d 00@
|
||||
%\item \verb@"val="@
|
||||
%\end{itemize}
|
||||
|
||||
\index{ASCII}
|
||||
\index{ASCIIZ}
|
||||
When the double-quoted text form is used, the GNU assembler used in
|
||||
this text differentiates between {\em ascii} and {\em asciiz} strings
|
||||
such that an {\em ascii} string is {\em not} null terminated and an
|
||||
{\em asciiz} string {\em is} null terminated.
|
||||
such that an {\em ascii} string is {\bf not} null terminated and an
|
||||
{\em asciiz} string {\bf is} null terminated.
|
||||
|
||||
The value of providing a method to create a string that is {\em not}
|
||||
The value of providing a method to create a string that is not
|
||||
null terminated is that a program may define a large string by
|
||||
concatenating a number of {\em ascii} strings together and following the
|
||||
last with a byte of zero to null-terminate the lot.
|
||||
last with a byte of zero to null-terminate it.
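
The same idea can be sketched in C, used here only as an illustration since this text otherwise relies on the GNU assembler for string definitions. The function name \verb@join_pieces@ and its parameters are hypothetical. Two pieces that carry no terminator of their own are placed back to back and a single zero byte is appended after the last one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy two unterminated pieces into out, one
 * after the other, then append a single zero byte. Returns the total
 * number of bytes written, including the terminator. The caller must
 * provide at least alen + blen + 1 bytes of space in out. */
size_t join_pieces(char *out,
                   const char *a, size_t alen,
                   const char *b, size_t blen)
{
    memcpy(out, a, alen);           /* first piece, no terminator  */
    memcpy(out + alen, b, blen);    /* second piece, no terminator */
    out[alen + blen] = '\0';        /* null-terminate the result   */
    return alen + blen + 1;
}
```

Joining the two-byte pieces \verb@va@ and \verb@l=@ in this manner produces the five bytes \verb@76 61 6c 3d 00@.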
|
||||
|
||||
It is a common mistake to create a string with a missing
|
||||
null terminator. The result of printing such a ``string'' is that
|
||||
the string is printed and as well as whatever random data bytes in
|
||||
memory that follows it until a byte whose value is zero is found
|
||||
null terminator. The result of printing such a string is that
|
||||
the string will be printed as well as whatever random data bytes in
|
||||
memory follow it until a byte whose value is zero is encountered
|
||||
by chance.
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
|