\chapter{Numbers and Storage Systems}
\label{chapter:numbers}

This chapter discusses how data are represented and stored in a computer.

In the context of computing, {\em boolean} refers to a condition that can
be either true or false and {\em binary} refers to the use of a base-2
numeric system to represent numbers.

RISC-V assembly language uses binary to represent all values, be they
boolean or numeric. It is the context within which they are used that
determines whether they are boolean or numeric.

\enote{Add some diagrams here showing bits, bytes and the MSB,
LSB,\ldots\ perhaps relocated from the RV32I chapter?}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Boolean Functions}

Boolean functions apply on a per-bit basis.
When applied to multi-bit values, each bit position is operated upon
independently of the other bits.

RISC-V assembly language uses zero to represent {\em false} and one
to represent {\em true}. In general, however, it is useful to relax
this and define zero {\bf and only zero} to be {\em false} and anything
that is not {\em false} is therefore {\em true}.%
\footnote{This is how {\em true} and {\em false} behave in C, C++, and
many other languages as well as the common assembly language idioms
discussed in this text.}

The reason for this relaxation is to describe the common case
where the CPU processes multiple \gls{bit}s at a time.
These groups of bits have names like \gls{byte} (8 bits),
\gls{halfword} (16 bits) and \gls{fullword} (32 bits).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{NOT}

The {\em NOT} operator applies to a single operand and represents the
opposite of the input.
\enote{Need to define unary, binary and ternary operators without
confusing binary operators with binary numbers.}

If the input is 1 then the output is 0. If the input is 0 then the
output is 1. In other words, the output value is {\em not} that of the
input value.

Expressing the {\em not} function in the form of a truth table:

\begin{center}
\begin{tabular}{c|c}
A & $\overline{\mbox{A}}$\\
\hline
0 & 1 \\
1 & 0 \\
\end{tabular}
\end{center}

A truth table is drawn by indicating all of the possible input values on
the left of the vertical bar with each row displaying the output values
that correspond to the input for that row. The column headings are used
to define the illustrated operation expressed using a mathematical
notation. The {\em not} operation is indicated by the presence of
an {\em overline}.

In computer programming languages, things like an overline cannot be
efficiently expressed using a standard keyboard. Therefore it is common
to use a notation such as that used by the C language when discussing
the {\em NOT} operator in symbolic form. Specifically the tilde: `\verb@~@'.

It is also uncommon for programming languages to express boolean operations
on single-bit input(s). A more generalized operation is used that applies
to a set of bits all at once. For example, performing a {\em not} operation
on eight bits at once can be illustrated as:

\begin{verbatim}
  ~ 1 1 1 1 0 1 0 1  <== A
  -----------------
    0 0 0 0 1 0 1 0  <== output
\end{verbatim}

In a line of code the above might read like this: \verb@output = ~A@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{AND}

The boolean {\em and} function has two or more inputs and the output is a
single bit. The output is 1 if and only if all of the input values are 1.
Otherwise it is 0.

This function works like it does in spoken language. For example
if A is 1 {\em and} B is 1 then the output is 1 (true).
Otherwise the output is 0 (false).

In mathematical notation, the {\em and} operator is expressed the same way
as is {\em multiplication}. That is by a raised dot between, or by
juxtaposition of, two variable names. It is also worth noting that,
in base-2, the {\em and} operation on single bits actually {\em is}
multiplication!

\begin{center}
\begin{tabular}{cc|c}
A & B & AB \\
\hline
0 & 0 & 0 \\
0 & 1 & 0 \\
1 & 0 & 0 \\
1 & 1 & 1 \\
\end{tabular}
\end{center}

This text will use the operator used in the C language when discussing
the {\em and} operator in symbolic form. Specifically the ampersand: `\verb@&@'.

An eight-bit example:

\begin{verbatim}
    1 1 1 1 0 1 0 1  <== A
  & 1 0 0 1 0 0 1 1  <== B
  -----------------
    1 0 0 1 0 0 0 1  <== output
\end{verbatim}

In a line of code the above might read like this: \verb@output = A & B@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{OR}

The boolean {\em or} function has two or more inputs and the output is a
single bit. The output is 1 if at least one of the input values is 1.
Otherwise it is 0.

This function works like it does in spoken language. For example
if A is 1 {\em or} B is 1 then the output is 1 (true).
Otherwise the output is 0 (false).

In mathematical notation, the {\em or} operator is expressed using the plus
($+$).

\begin{center}
\begin{tabular}{cc|c}
A & B & A$+$B \\
\hline
0 & 0 & 0 \\
0 & 1 & 1 \\
1 & 0 & 1 \\
1 & 1 & 1 \\
\end{tabular}
\end{center}

This text will use the operator used in the C language when discussing
the {\em or} operator in symbolic form. Specifically the pipe: `\verb@|@'.

An eight-bit example:

\begin{verbatim}
    1 1 1 1 0 1 0 1  <== A
  | 1 0 0 1 0 0 1 1  <== B
  -----------------
    1 1 1 1 0 1 1 1  <== output
\end{verbatim}

In a line of code the above might read like this: \verb@output = A | B@

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{XOR}

The boolean {\em exclusive or} function has two or more inputs and the
output is a single bit. The output is 1 if and only if an odd number of
inputs are 1. Otherwise the output will be 0.

Note that when {\em xor} is used with two inputs, the output
is set to 1 (true) when the inputs have different values and 0
(false) when the inputs both have the same value.

In mathematical notation, the {\em xor} operator is expressed using the plus
in a circle ($\oplus$).

\begin{center}
\begin{tabular}{cc|c}
A & B & A$\oplus{}$B \\
\hline
0 & 0 & 0 \\
0 & 1 & 1 \\
1 & 0 & 1 \\
1 & 1 & 0 \\
\end{tabular}
\end{center}

This text will use the operator used in the C language when discussing
the {\em xor} operator in symbolic form. Specifically the caret: `\verb@^@'.

An eight-bit example:

\begin{verbatim}
    1 1 1 1 0 1 0 1  <== A
  ^ 1 0 0 1 0 0 1 1  <== B
  -----------------
    0 1 1 0 0 1 1 0  <== output
\end{verbatim}

In a line of code the above might read like this: \verb@output = A ^ B@
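
As a minimal sketch of how the four operators in this section might be
tried in C, each function below applies one boolean operator to all eight
bit positions at once. The function names \verb@not8@, \verb@and8@,
\verb@or8@ and \verb@xor8@ are made up for this illustration.

\begin{verbatim}
#include <stdint.h>

/* Hypothetical helpers: one boolean operator applied to eight bits. */

uint8_t not8(uint8_t a)
{
    /* C widens the operand to int before applying ~, so the
       cast discards the extra upper bits again. */
    return (uint8_t)~a;
}

uint8_t and8(uint8_t a, uint8_t b) { return (uint8_t)(a & b); }
uint8_t or8(uint8_t a, uint8_t b)  { return (uint8_t)(a | b); }
uint8_t xor8(uint8_t a, uint8_t b) { return (uint8_t)(a ^ b); }
\end{verbatim}

Called with the bit patterns used in the eight-bit examples
(\verb@0xf5@ for A and \verb@0x93@ for B, written in the hexadecimal
shorthand that the next section introduces), these return \verb@0x0a@,
\verb@0x91@, \verb@0xf7@ and \verb@0x66@ respectively.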

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Integers and Counting}

A binary integer is constructed with only 1s and 0s in the same
manner as decimal numbers are constructed with values from 0 to 9.

Counting in binary (base-2) uses the same basic rules as decimal (base-10).
The difference is that there are ten decimal digits and
only two binary digits. Therefore, in base-10, we must carry when adding one to
nine (because there is no digit representing a ten) and, in base-2, we must
carry when adding one to one (because there is no digit representing a two.)

\autoref{Figure:integers} shows an abridged table of the decimal, binary and
hexadecimal values ranging from $0_{10}$ to $128_{10}$.

\begin{figure}[t]
\begin{center}
\begin{tabular}{|c|c|c||c|c|c|c|c|c|c|c||c|c|}
\hline
\multicolumn{3}{|c||}{Decimal} & \multicolumn{8}{|c||}{Binary} & \multicolumn{2}{|c|}{Hex}\\
\hline
$10^2$ & $10^1$ & $10^0$ & $2^7$ & $2^6$ & $2^5$ & $2^4$ & $2^3$ & $2^2$ & $2^1$ & $2^0$ & $16^1$ & $16^0$ \\
\hline
100 & 10 & 1 & 128 & 64 & 32 & 16 & 8 & 4 & 2 & 1 & 16 & 1 \\
\hline \hline
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 2 \\
0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 3 \\
0 & 0 & 4 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 4 \\
0 & 0 & 5 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 5 \\
0 & 0 & 6 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 6 \\
0 & 0 & 7 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 7 \\
0 & 0 & 8 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 8 \\
0 & 0 & 9 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 9 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & a \\
0 & 1 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & b \\
0 & 1 & 2 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & c \\
0 & 1 & 3 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 1 & 0 & d \\
0 & 1 & 4 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & e \\
0 & 1 & 5 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & f \\
0 & 1 & 6 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 7 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
\hline
\multicolumn{3}{|c||}{\ldots} & \multicolumn{8}{|c||}{\ldots} & \multicolumn{2}{|c|}{\ldots}\\
\hline
1 & 2 & 5 & 0 & 1 & 1 & 1 & 1 & 1 & 0 & 1 & 7 & d \\
1 & 2 & 6 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 0 & 7 & e \\
1 & 2 & 7 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 7 & f \\
1 & 2 & 8 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 8 & 0 \\
\hline
\end{tabular}
\end{center}
\captionof{figure}{Counting in decimal, binary and hexadecimal.}
\label{Figure:integers}
\end{figure}

One way to look at this table is on a per-row basis where each place
value is represented by the base raised to the power of the place value
position (shown in the column headings.)
%This is useful when converting arbitrary numeric values between bases.
For example to interpret the decimal value on the fourth row:

\begin{equation}
0 \times 10^2 + 0 \times 10^1 + 3 \times 10^0 = 3_{10}
\end{equation}

Interpreting the binary value on the fourth row by converting it to decimal:

\begin{equation}
0 \times 2^7 + 0 \times 2^6 +0 \times 2^5 +0 \times 2^4 +0 \times 2^3 +0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = 3_{10}
\end{equation}

Interpreting the hexadecimal value on the fourth row by converting it to decimal:

\begin{equation}
0 \times 16^1 + 3 \times 16^0 = 3_{10}
\end{equation}
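
The three calculations above follow one pattern. As a sketch, using
symbols that are introduced here only for illustration, the value of an
$n$-digit number with the digits $d_{n-1} \ldots d_1 d_0$ written in
base $b$ is:

\begin{equation}
d_{n-1} \times b^{n-1} + \cdots + d_1 \times b^1 + d_0 \times b^0
\end{equation}

For example, applying this to the hexadecimal value on the last row of
\autoref{Figure:integers}:

\begin{equation}
8 \times 16^1 + 0 \times 16^0 = 128_{10}
\end{equation}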

\index{Most significant bit}\index{MSB|see {Most significant bit}}%
\index{Least significant bit}\index{LSB|see {Least significant bit}}%
We refer to the place value with the largest exponent (the one furthest to the
left for any given base) as the most significant digit and the place value
with the lowest exponent as the least significant digit. For binary
numbers these are the \acrfull{msb} and \acrfull{lsb} respectively.%
\footnote{Changing the value of the MSB will have a more {\em significant}
impact on the numeric value than changing the value of the LSB.}

Another way to look at this table is on a per-column basis. When
tasked with drawing such a table by hand, it might be useful
to observe that, just as in decimal, the right-most column will
cycle through all of the values represented in the chosen base
then cycle back to zero and repeat. (For example, in binary this
pattern is 0-1-0-1-0-1-0-\ldots) The next column in each base
will cycle in the same manner except each of the values is repeated
as many times as is represented by the place value (in the case of
decimal, $10^1$ times, binary $2^1$ times, hex $16^1$ times). Again,
the binary numbers for this pattern are 0-0-1-1-0-0-1-1-\ldots\
This continues for as many columns as are needed to represent the
magnitude of the desired number.

Another item worth noting is that any even binary number will always
have a 0 LSB and odd numbers will always have a 1 LSB.

As is customary in decimal, leading zeros are sometimes not shown
for readability.

The relationship between binary and hex values is also worth
noting. Because $2^4 = 16$, there is a clean and simple grouping
of 4 \gls{bit}s to 1 \gls{hit} (aka \gls{nybble}).
There is no such relationship between binary and decimal.

Writing and reading numbers in binary that are longer than 8 bits
is cumbersome and prone to error. The simple conversion between
binary and hex makes hex a convenient shorthand for expressing
binary values in many situations.

For example, consider the following value expressed in binary,
hexadecimal and decimal (spaced to show the relationship
between binary and hex):

\begin{verbatim}
Binary value:  0010 0111 1011 1010 1100 1100 1111 0101
Hex Value:        2    7    B    A    C    C    F    5
Decimal Value:                               666553589
\end{verbatim}

Empirically we can see that grouping the bits into sets of four
allows an easy conversion to hex, that the hex form is
$\frac{1}{4}$ as long as the binary form, and that it is just as
easy to convert back to binary.

The decimal value in this example does not easily convey a sense
of the binary value.

\begin{tcolorbox}
In programming languages like C, its derivatives and RISC-V
assembly, numeric values are interpreted as decimal {\bfseries unless}
they start with a zero (0).
Numbers that start with 0 are interpreted as octal (base-8),
numbers starting with 0x are interpreted as hexadecimal and
numbers that start with 0b are interpreted as binary.
\end{tcolorbox}
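
As a small illustration of these prefixes in C, the sketch below writes
the value twenty-seven in three bases. The function name
\verb@literals_agree@ is made up for this example. The \verb@0b@ prefix
is left out of the code because it became part of standard C only with
C23; earlier compilers such as GCC and Clang accept it as an extension.

\begin{verbatim}
/* Hypothetical check: the same value written in three bases. */
int literals_agree(void)
{
    int dec = 27;     /* no prefix:  decimal     */
    int oct = 033;    /* leading 0:  octal       */
    int hex = 0x1b;   /* leading 0x: hexadecimal */

    /* 1 when all three literals name the same value */
    return dec == oct && oct == hex;
}
\end{verbatim}

Note the trap for the unwary: writing \verb@033@ to line up a column of
decimal numbers silently changes the value to twenty-seven.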

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Converting Between Bases}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{From Binary to Decimal}
\label{section:bindec}

It is occasionally necessary to convert between decimal,
binary and/or hex.

To convert from binary to decimal, put the decimal value of the place values
{\ldots8, 4, 2, 1} over the binary digits like this:

\begin{verbatim}
Base-2 place values: 128  64  32  16   8   4   2   1
Binary:                0   0   0   1   1   0   1   1
Decimal:                          16  +8      +2  +1 = 27
\end{verbatim}

Now sum the place-values that are expressed in decimal for each
bit with the value of 1: $16+8+2+1$. The integer binary value
$00011011_2$ represents the decimal value $27_{10}$.
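
The place-value summing shown above can be sketched in C. This is an
illustration only: the function name \verb@bin_to_dec@ is hypothetical
and the code assumes that the string holds nothing but `0' and `1'
characters and has no more digits than an \verb@unsigned@ has bits.

\begin{verbatim}
#include <string.h>

/* Hypothetical helper: sum the place value of every 1 bit
   in a string such as "00011011" (MSB first). */
unsigned bin_to_dec(const char *bits)
{
    size_t n = strlen(bits);
    unsigned sum = 0;

    for (size_t i = 0; i < n; ++i) {
        /* place value of this column: ..., 8, 4, 2, 1 */
        unsigned place = 1u << (n - 1 - i);
        if (bits[i] == '1')
            sum += place;
    }
    return sum;
}
\end{verbatim}

Passing the string \verb@"00011011"@ adds the place values 16, 8, 2
and 1, just as in the table above.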

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{From Binary to Hexadecimal}
\label{section:binhex}

Conversion from binary to hex involves grouping the bits into
sets of four and then performing the same summing process as
shown above. If there is not a multiple of four bits then
extend the binary to the left with zeros to make it so.

Grouping the bits into sets of four and summing:

\begin{verbatim}
Base-2 place values:  8 4 2 1    8 4 2 1    8 4 2 1    8 4 2 1
Binary:               0 1 1 0    1 1 0 1    1 0 1 0    1 1 1 0
Decimal:                4+2  =6  8+4+  1=13 8+  2  =10 8+4+2  =14
\end{verbatim}

After the summing, convert each decimal value to hex. The decimal
values from 0--9 are the same values in hex. Because we don't have any
more numerals to represent the values from 10--15, we use the first 6
letters (See the right-most column of \autoref{Figure:integers}.)
Fortunately there are only six hex mappings involving letters. Thus
it is reasonable to memorize them.

Continuing this example:

\begin{verbatim}
Decimal:  6  13  10  14
Hex:      6   D   A   E
\end{verbatim}
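
The grouping and summing can be sketched in C as follows. The function
name \verb@bin_to_hex@ and its calling convention are assumptions made
for this illustration; the caller is assumed to supply a string of `0'
and `1' characters and an output buffer that is large enough.

\begin{verbatim}
#include <string.h>

/* Hypothetical helper: convert a string of '0'/'1' characters
   (MSB first) to hex digits. The buffer out must have room for
   (strlen(bits) + 3) / 4 + 1 characters. */
void bin_to_hex(const char *bits, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t n = strlen(bits);
    size_t pad = (4 - n % 4) % 4;   /* zeros to imagine on the left */
    unsigned sum = 0;
    size_t o = 0;

    for (size_t i = 0; i < n; ++i) {
        /* moving one column right doubles what was summed so far */
        sum = sum * 2 + (unsigned)(bits[i] - '0');
        if ((i + pad) % 4 == 3) {   /* last bit of a group of four */
            out[o++] = digits[sum];
            sum = 0;
        }
    }
    out[o] = '\0';
}
\end{verbatim}

For the sixteen bits used above this produces the string \verb@6DAE@.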

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{From Hexadecimal to Binary}

The four-bit mapping between binary and hex makes this
task as straightforward as using a look-up table to
translate each \gls{hit} (Hex digIT) to its unique
four-bit pattern.

Perform this task either by memorizing each of the 16 patterns
or by converting each hit to decimal first and then finding the
four-bit binary value whose place values sum to that decimal value,
using the place-value summing
method discussed in \autoref{section:bindec}.

For example:

\begin{verbatim}
Hex:             7            C
Decimal Sum:   4+2+1=7    8+4    =12
Binary:      0 1 1 1      1 1 0 0
\end{verbatim}
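
The look-up table approach might be sketched in C like this. The names
\verb@PATTERN@ and \verb@hex_to_bin@, and the choice to return $-1$ for
a character that is not a hex digit, are assumptions made for this
illustration.

\begin{verbatim}
#include <string.h>

/* The 16 four-bit patterns, indexed by the value of the hit. */
static const char *const PATTERN[16] = {
    "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
    "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"
};

/* Hypothetical helper: out needs 4 * strlen(hex) + 1 characters.
   Returns 0 on success or -1 if a character is not a hex digit. */
int hex_to_bin(const char *hex, char *out)
{
    for (; *hex != '\0'; ++hex) {
        int v;
        if (*hex >= '0' && *hex <= '9')
            v = *hex - '0';
        else if (*hex >= 'a' && *hex <= 'f')
            v = *hex - 'a' + 10;
        else if (*hex >= 'A' && *hex <= 'F')
            v = *hex - 'A' + 10;
        else
            return -1;
        memcpy(out, PATTERN[v], 4);   /* copy one four-bit pattern */
        out += 4;
    }
    *out = '\0';
    return 0;
}
\end{verbatim}

Given the string \verb@7C@ this writes \verb@01111100@, matching the
example above.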

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{From Decimal to Binary}

To convert arbitrary decimal numbers to binary, extend the list
of binary place values until it exceeds the value of the decimal
number being converted. Then make successive subtractions of each
of the place values that would yield a non-negative result.

For example, to convert $1234_{10}$ to binary:

\begin{verbatim}
Base-2 place values: 2048-1024-512-256-128-64-32-16-8-4-2-1

0   2048            (too big)
1   1234 - 1024 = 210
0    512            (too big)
0    256            (too big)
1    210 -  128 = 82
1     82 -   64 = 18
0     32            (too big)
1     18 -   16 = 2
0      8            (too big)
0      4            (too big)
1      2 -    2 = 0
0      1            (too big)
\end{verbatim}

The answer using this notation is listed vertically
in the left column with the \acrshort{msb} on the top and
the \acrshort{lsb} on the bottom line: $010011010010_2$.
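
The successive subtraction can be sketched in C as shown below. The
function name \verb@dec_to_bin@ and its parameters are hypothetical;
the caller is assumed to choose \verb@nbits@ so that the largest place
value exceeds the number being converted.

\begin{verbatim}
/* Hypothetical helper: write value as nbits binary digits, MSB
   first. The buffer out needs nbits + 1 characters and value is
   assumed to be less than 2 to the power nbits. */
void dec_to_bin(unsigned value, unsigned nbits, char *out)
{
    for (unsigned i = 0; i < nbits; ++i) {
        unsigned place = 1u << (nbits - 1 - i);
        if (place <= value) {   /* the difference is non-negative */
            value -= place;
            out[i] = '1';
        } else {                /* "too big" */
            out[i] = '0';
        }
    }
    out[nbits] = '\0';
}
\end{verbatim}

Calling \verb@dec_to_bin(1234, 12, buf)@ walks the place values from
2048 down to 1 and leaves \verb@010011010010@ in \verb@buf@.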

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{From Decimal to Hex}

Conversion from decimal to hex can be done by using the place
values for base-16 and the same math as from decimal to binary
or by first converting the decimal value to binary and then
from binary to hex by using the methods discussed above.

Because binary and hex are so closely related, performing
a conversion by way of binary is straightforward.
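
As a worked sketch of the first method, the following converts
$1234_{10}$ to hex using the base-16 place values. Each step subtracts
the largest multiple of the place value that yields a non-negative
result:

\begin{verbatim}
Base-16 place values: 4096-256-16-1

0   4096                (too big)
4   1234 -  4*256 = 210
d    210 - 13*16  = 2
2      2 -  2*1   = 0
\end{verbatim}

The left column reads $4d2_{16}$. This agrees with the second method:
grouping the binary form $0100\ 1101\ 0010_2$ of $1234_{10}$ into sets
of four bits gives the same three hits.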

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Addition of Binary Numbers}

The addition of binary numbers can be performed long-hand the
same way decimal addition is taught in grade school. In fact binary
addition is easier since it only involves adding 0 or 1.

The first thing to note is that in any number base $0+0=0$, $0+1=1$, and
$1+0=1$. Since there is no ``two'' in binary (just like there is
no ``ten'' in decimal) adding $1+1$ results in a zero with a carry as
in: $1+1=10_2$ and in: $1+1+1=11_2$. Using these five sums, any two
binary integers can be added.

\index{Full Adder}%
This truth table shows what is called a {\em Full Adder}.
A full adder is a function that can add three input bits
(the two addends and a carry value from a ``prior column'')
and produce the sum and carry output values.\footnote{
Note that the sum could be expressed in Boolean Algebra as:
$sum = ci \oplus{} a \oplus{} b$}

\begin{center}
\begin{tabular}{|ccc|cc|}
\hline
%\multicolumn{3}{c}{input} & \multicolumn{2}{c}{output}\\
$ci$ & $a$ & $b$ & $co$ & $sum$\\
\hline
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 1 \\
0 & 1 & 1 & 1 & 0 \\
1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 \\
1 & 1 & 1 & 1 & 1 \\
\hline
\end{tabular}
\end{center}

Adding two unsigned binary numbers using 16 full adders:

\begin{verbatim}
       111111  1111   <== carries
    0110101111001111  <== addend
  + 0000011101100011  <== addend
  ------------------
    0111001100110010  <== sum
\end{verbatim}

Note that the carry ``into'' the LSB is zero.
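
A chain of full adders like this can be sketched in C. The function
names \verb@full_add@ and \verb@ripple_add16@ are made up for this
illustration and the carry-out expression is one of several equivalent
ways to write the $co$ column of the truth table; it is an assumption of
this sketch rather than something the table prescribes.

\begin{verbatim}
#include <stdint.h>

/* Hypothetical full adder: three input bits in, sum bit returned,
   carry-out bit stored through co. */
static unsigned full_add(unsigned a, unsigned b, unsigned ci,
                         unsigned *co)
{
    *co = (a & b) | (ci & (a ^ b)); /* 1 when two or more inputs are 1 */
    return a ^ b ^ ci;              /* 1 for an odd number of 1 inputs */
}

/* Chain 16 full adders, LSB first. The carry into the LSB is 0. */
uint16_t ripple_add16(uint16_t x, uint16_t y)
{
    uint16_t sum = 0;
    unsigned carry = 0;

    for (int i = 0; i < 16; ++i) {
        unsigned s = full_add((x >> i) & 1u, (y >> i) & 1u,
                              carry, &carry);
        sum |= (uint16_t)(s << i);
    }
    return sum;   /* the carry out of the MSB is dropped */
}
\end{verbatim}

Adding the two addends shown above, \verb@0x6bcf@ and \verb@0x0763@,
yields \verb@0x7332@, which is the sum on the bottom line.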

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Signed Numbers}

There are multiple methods used to represent signed binary integers.
The method used by most modern computers is called {\em two's complement}.

A two's complement number is encoded in such a manner as to simplify
the hardware used to add, subtract and compare integers.

A simple method of thinking about two's complement numbers is to
negate the place value of the \acrshort{msb}. For example, the
number one is represented the same as discussed before:

\begin{verbatim}
Base-2 place values: -128 64 32 16  8  4  2  1
Binary:                 0  0  0  0  0  0  0  1
\end{verbatim}

The \acrshort{msb} of any negative number in this format will always
be 1. For example the value $-1_{10}$ is:

\begin{verbatim}
Base-2 place values: -128 64 32 16  8  4  2  1
Binary:                 1  1  1  1  1  1  1  1
\end{verbatim}

\ldots because: $-128+64+32+16+8+4+2+1=-1$.
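
The negated place value can be sketched in C as shown below. The
function name \verb@twos_value8@ is hypothetical; it interprets an
8-bit pattern exactly as described above.

\begin{verbatim}
#include <stdint.h>

/* Hypothetical helper: interpret an 8-bit pattern as two's
   complement by giving the MSB a place value of -128. */
int twos_value8(uint8_t bits)
{
    int value = (bits & 0x80) ? -128 : 0;  /* the MSB         */
    value += bits & 0x7f;                  /* 64, 32, ..., 1  */
    return value;
}
\end{verbatim}

With this sketch the pattern \verb@0xff@ (all ones) evaluates to $-1$
and the pattern \verb@0x01@ evaluates to 1.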

This format has the virtue of allowing the same addition logic discussed above to be
used to calculate the sums of signed numbers as unsigned numbers.

Calculating the signed addition: $4+5 = 9$

\begin{verbatim}
     1      <== carries
   000100   <== 4 = 0 + 0 + 0 + 4 + 0 + 0
  +000101   <== 5 = 0 + 0 + 0 + 4 + 0 + 1
  -------
   001001   <== 9 = 0 + 0 + 8 + 0 + 0 + 1
\end{verbatim}

Calculating the signed addition: $-4+ -5 = -9$

\begin{verbatim}
 1 11       <== carries
   111100   <== -4 = -32 + 16 + 8 + 4 + 0 + 0
  +111011   <== -5 = -32 + 16 + 8 + 0 + 2 + 1
---------
 1 110111   <== -9 (with a truncation) = -32 + 16 + 0 + 4 + 2 + 1
\end{verbatim}

Calculating the signed addition: $-1+1=0$

\begin{verbatim}
     -128 64 32 16  8  4  2  1   <== place value
   1    1  1  1  1  1  1  1      <== carries
        1  1  1  1  1  1  1  1   <== addend (-1)
 +      0  0  0  0  0  0  0  1   <== addend (1)
 -----------------------------
   1    0  0  0  0  0  0  0  0   <== sum (0 with a truncation)
\end{verbatim}

{\em In order for this to work, the carry out of the sum of the MSBs {\bfseries must} be discarded.}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Converting between Positive and Negative}

Changing the sign on two's complement numbers can be described as
inverting all of the bits (which is also known as the {\em one's complement})
and then adding one.

For example, negating the number four:

\begin{minipage}{\textwidth}
\begin{verbatim}
     -128 64 32 16  8  4  2  1
        0  0  0  0  0  1  0  0   <== 4

                       1  1      <== carries
        1  1  1  1  1  0  1  1   <== one's complement of 4
 +      0  0  0  0  0  0  0  1   <== plus 1
 -----------------------------
        1  1  1  1  1  1  0  0   <== -4
\end{verbatim}
\end{minipage}

This can be verified by adding 5 to the result and observing that
the sum is 1:

\begin{verbatim}
     -128 64 32 16  8  4  2  1
   1    1  1  1  1  1            <== carries
        1  1  1  1  1  1  0  0   <== -4
 +      0  0  0  0  0  1  0  1   <== 5
 -----------------------------
   1    0  0  0  0  0  0  0  1   <== 1 (with a truncation)
\end{verbatim}

Note that the changing of the sign using this method is symmetric
in that it is identical when converting from negative to positive
and when converting from positive to negative: {\em flip the bits and
add 1.}

For example, changing the value -4 to 4 to illustrate the
reverse of the conversion above:

\begin{verbatim}
     -128 64 32 16  8  4  2  1
        1  1  1  1  1  1  0  0   <== -4

                       1  1      <== carries
        0  0  0  0  0  0  1  1   <== one's complement of -4
 +      0  0  0  0  0  0  0  1   <== plus 1
 -----------------------------
        0  0  0  0  0  1  0  0   <== 4
\end{verbatim}
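
In C, the {\em flip the bits and add 1} rule might be sketched like
this. The function name \verb@negate8@ is made up for this illustration
and the value is handled as a raw 8-bit pattern.

\begin{verbatim}
#include <stdint.h>

/* Hypothetical helper: change the sign of an 8-bit two's
   complement pattern. */
uint8_t negate8(uint8_t x)
{
    /* one's complement, plus one, truncated back to 8 bits */
    return (uint8_t)(~x + 1u);
}
\end{verbatim}

Applying \verb@negate8@ to \verb@0x04@ (4) gives \verb@0xfc@ ($-4$) and
applying it again returns \verb@0x04@, mirroring the two conversions
above.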

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Subtraction of Binary Numbers}

Subtraction%
\enote{This section needs more examples of subtracting
signed and unsigned numbers and a discussion on how
signedness is not relevant until the results are interpreted.
For example adding $-4+ -8=-12$ using two 8-bit numbers
is the same as adding $252+248=500$ and truncating the result
to 244.}
of binary numbers is performed by first negating
the subtrahend and then adding the two numbers. Due to the
nature of two's complement numbers this method will work for both
signed and unsigned numbers!

Observation: Since we always have a carry-in of zero into the LSB when
adding, we can take advantage of that fact by (ab)using that carry input
to add the extra 1 to the subtrahend as part of
changing its sign in the examples below.

An example showing the subtraction of two {\em signed} binary numbers: $-4-8 = -12$

\begin{verbatim}
     -128 64 32 16  8  4  2  1
        1  1  1  1  1  1  0  0   <== -4 (minuend)
 -      0  0  0  0  1  0  0  0   <== 8 (subtrahend)
 -----------------------------

   1    1  1  1  1  1  1  1  1   <== carries
        1  1  1  1  1  1  0  0   <== -4
 +      1  1  1  1  0  1  1  1   <== one's complement of 8
 -----------------------------
   1    1  1  1  1  0  1  0  0   <== -12
\end{verbatim}
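
The same steps can be sketched in C. The function name \verb@sub8@ and
its optional \verb@carry_out@ parameter are assumptions made for this
illustration; the values are handled as raw 8-bit patterns so that the
same code serves signed and unsigned operands.

\begin{verbatim}
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: subtract by adding the one's complement
   of the subtrahend with a carry-in of 1. If carry_out is not
   NULL it receives the carry out of the MSB. */
uint8_t sub8(uint8_t minuend, uint8_t subtrahend,
             unsigned *carry_out)
{
    /* 9-bit result: minuend + ~subtrahend + carry-in */
    unsigned wide = (unsigned)minuend + (uint8_t)~subtrahend + 1u;

    if (carry_out != NULL)
        *carry_out = (wide >> 8) & 1u;
    return (uint8_t)wide;        /* truncate to 8 bits */
}
\end{verbatim}

Calling \verb@sub8(0xfc, 0x08, &c)@ reproduces the example above: the
result is \verb@0xf4@ ($-12$) and \verb@c@ is set to 1.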

%An example showing the subtraction of two {\em unsigned} binary numbers: $252+248=500$
%
%\begin{verbatim}
%      128 64 32 16  8  4  2  1
%
%   1    1  1  1  1               <== carries
%        1  1  1  1  1  1  0  0   <== 252
% +      1  1  1  1  1  0  0  0   <== 248
% -----------------------------
%   1    1  1  1  1  0  1  0  0   <== 500 (if we do NOT truncate the MSB)
%\end{verbatim}
%
%An example showing the subtraction of two {\em unsigned} binary numbers: $252+248=500$

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Truncation}
\index{truncation}
\index{overflow}
\index{carry}

Discarding the carry bit that can be generated from the MSB is called {\em truncation}.

So far we have been ignoring the carries that can come from the MSBs when adding and subtracting.
We have also been ignoring the potential impact of a carry causing a signed number to change
its sign in an unexpected way.

In the examples above, truncating the results either had 1) no impact on the calculated sums
or 2) was absolutely necessary to correct the sum in cases such as: $-4 + 5$.

For example, note what happens when we try to subtract 1 from the most
negative value that we can represent in a 4-bit two's complement number:

\begin{verbatim}
       -8  4  2  1
        1  0  0  0   <== -8 (minuend)
 -      0  0  0  1   <== 1 (subtrahend)
 -----------------

   1             1   <== carries
        1  0  0  0   <== -8
 +      1  1  1  0   <== one's complement of 1
 -----------------
   1    0  1  1  1   <== this SHOULD be -9 but with truncation it is 7
\end{verbatim}

The problem with this example is that we cannot represent $-9_{10}$ using a 4-bit
two's complement number.

Granted, if we had used 5-bit numbers, then the ``answer'' would have fit OK.
But the same problem would return when trying to calculate $-16 - 1$.
So simply ``making more room'' does not solve this problem.

%However, as calculating $-1+1=0$ has demonstrated above, it was necessary for that
%case to discard the carry out of the MSB to get the correct result.

%In the case of calculating $-1+1=0$ the addends and result all fit into same-sized
%(8-bit) values. When calculating $-8-1=-9$ the addends each can fit into 4-bit
%two's complement numbers but the result would require a 5-bit number.

This is not just a problem when subtracting, nor is it just a problem with
signed numbers.

The same situation can happen with {\em unsigned} numbers.
For example:

\begin{verbatim}
        8  4  2  1
   1    1  1  0  0   <== carries
        1  1  1  0   <== 14 (addend)
 +      0  0  1  1   <== 3 (addend)
 -----------------
   1    0  0  0  1   <== this SHOULD be 17 but with truncation it is 1
\end{verbatim}

How to handle such a truncation depends on whether the {\em original} values
being added are signed or unsigned.

The RV ISA refers to discarding the carry out of the MSB after an
add (or subtract) of two {\em unsigned} numbers as an {\em unsigned overflow}%
\footnote{Most microprocessors refer to {\em unsigned overflow} simply as a
{\em carry} condition.}
and the situation where carries create an incorrect sign in the
result of adding (or subtracting) two {\em signed} numbers as a
{\em signed overflow}.~\cite[p.~13]{rvismv1v22:2017}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Unsigned Overflow}
\index{overflow!unsigned}

When adding {\em unsigned} numbers, an overflow only occurs when there
is a carry out of the MSB resulting in a sum that is truncated to fit
into the number of bits allocated to contain the result.

\autoref{sum:240+17} illustrates an unsigned overflow during addition:

\begin{figure}[H]
\centering
\begin{BVerbatim}
   1    1  1  1  0  0  0  0  0   <== carries
        1  1  1  1  0  0  0  0   <== 240
 +      0  0  0  1  0  0  0  1   <== 17
 -----------------------------
   1    0  0  0  0  0  0  0  1   <== sum = 1
\end{BVerbatim}
%{\captionof{figure}{$240+16=0$ (overflow)}\label{sum:240+17}}
\caption{$240+17=1$ (overflow)}
\label{sum:240+17}
\end{figure}

Sometimes an overflow like this is referred to as a {\em wrap around}
because of the way that successive additions will result in a value that
increases until it {\em wraps} back {\em around} to zero and then
returns to increasing in value until it wraps around again.

\begin{tcolorbox}
When adding, {\em unsigned overflow} occurs whenever there is a carry
{\em out of} the most significant bit.
\end{tcolorbox}

When subtracting {\em unsigned} numbers, an overflow only occurs when the
|
|
subtrahend is greater than the minuend (because in those cases the
|
|
different would have to be negative and there are no negative values
|
|
that can be represented with an unsigned binary number.)
|
|
|
|
\autoref{sum:3-4} illustrates an unsigned overflow during subtraction:
|
|
|
|
\begin{figure}[H]
|
|
\centering
|
|
\begin{BVerbatim}
|
|
0 0 0 0 0 0 1 1 <== 3 (minuend)
|
|
- 0 0 0 0 0 1 0 0 <== 4 (subtrahend)
|
|
-----------------
|
|
|
|
|
|
0 0 0 0 0 0 1 1 1 <== carries
|
|
0 0 0 0 0 0 1 1 <== 3
|
|
+ 1 1 1 1 1 0 1 1 <== one's complement of 4
|
|
-----------------
|
|
1 1 1 1 1 1 1 1 <== 255 (overflow)
|
|
\end{BVerbatim}
|
|
\caption{$3-4=255$ (overflow)}
|
|
\label{sum:3-4}
|
|
\end{figure}
|
|
|
|
\begin{tcolorbox}
|
|
When subtracting, {\em unsigned overflow} occurs whenever there is {\em not} a carry
{\em out of} the most significant bit (provided that the carry-in on the LSB is used to add the
extra 1 to the subtrahend when changing its sign).
|
|
\end{tcolorbox}
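
For contrast, the following sketch works through $4-3$ in the same style
(this example is an addition for illustration and assumes the same 8-bit
width). Because the subtrahend is not greater than the minuend, there
{\em is} a carry out of the most significant bit and the truncated result
is correct:

\begin{verbatim}
  1 1 1 1 1 1 0 0 1  <== carries (the rightmost 1 is the carry-in)
    0 0 0 0 0 1 0 0  <== 4
  + 1 1 1 1 1 1 0 0  <== one's complement of 3
  -----------------
    0 0 0 0 0 0 0 1  <== 1 (no overflow)
\end{verbatim}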
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\subsubsection{Signed Overflow}
|
|
\index{overflow!signed}
|
|
|
|
When adding {\em signed} numbers, an overflow only occurs when the two
|
|
addends are positive and the sum is negative or the addends are both negative
|
|
and the sum is positive.
|
|
|
|
When subtracting {\em signed} numbers, an overflow only occurs when the
|
|
minuend is positive and the subtrahend is negative and the difference is negative
|
|
or when the minuend is negative and the subtrahend is positive and the
|
|
difference is positive.%
|
|
\footnote{I had to look it up to remember which were which
|
|
too\ldots\ it is: minuend - subtrahend = difference.\cite{subtrahend}}
|
|
|
|
Consider the results of the addition of two {\em signed} numbers
|
|
while looking more closely at the carry values.
|
|
|
|
\begin{figure}[H]
|
|
\centering
|
|
\begin{BVerbatim}
  0 1 0 0 0 0 0 0 0  <== carries
    0 1 0 0 0 0 0 0  <== 64
  + 0 1 0 0 0 0 0 0  <== 64
 ---------------------
    1 0 0 0 0 0 0 0  <== sum = -128
\end{BVerbatim}
|
|
\caption{$64+64 = -128$ (overflow)}
|
|
\label{sum:64+64}
|
|
\end{figure}
|
|
|
|
|
|
|
|
\autoref{sum:64+64} is an example of {\em signed overflow}. As shown, the problem is
|
|
that the sum of two positive numbers has resulted in an obviously incorrect
|
|
negative result due to a carry flowing into the sign-bit in the MSB.
|
|
|
|
Granted, if the same values were added using values larger than 8 bits
then the sum would have been correct. However, these examples assume that
all the operations are performed on (and results stored into) 8-bit values.
Given any finite number of bits, there are values that could be added such that
an overflow occurs.
|
|
|
|
\index{truncation}
|
|
\autoref{sum:-128+-128} shows another overflow situation that is caused
|
|
by the fact that there is nowhere for the carry out of the sign-bit to go.
|
|
We say that this result has been {\em truncated}.
|
|
|
|
\begin{figure}[H]
|
|
\centering
|
|
\begin{BVerbatim}
  1 0 0 0 0 0 0 0 0  <== carries
    1 0 0 0 0 0 0 0  <== -128
  + 1 0 0 0 0 0 0 0  <== -128
 ---------------------
    0 0 0 0 0 0 0 0  <== sum = 0
\end{BVerbatim}
|
|
\caption{$-128+-128 = 0$ (overflow)}
|
|
\label{sum:-128+-128}
|
|
\end{figure}
|
|
|
|
Truncation is not necessarily a problem. Consider the truncations in
|
|
figures \ref{sum:-3+-5} and \ref{sum:-2+10}.
|
|
\autoref{sum:-2+10} demonstrates the importance of discarding
|
|
the carry from the sum of the MSBs of signed numbers when addends
|
|
do not have the same sign.
|
|
|
|
\begin{figure}[H]
|
|
\centering
|
|
\begin{BVerbatim}
  1 1 1 1 1 1 1 1 0  <== carries
    1 1 1 1 1 1 0 1  <== -3
  + 1 1 1 1 1 0 1 1  <== -5
 ---------------------
    1 1 1 1 1 0 0 0  <== sum = -8
\end{BVerbatim}
|
|
\captionof{figure}{$-3+-5 = -8$}
|
|
\label{sum:-3+-5}
|
|
\end{figure}
|
|
|
|
\begin{figure}[H]
|
|
\centering
|
|
\begin{BVerbatim}
  1 1 1 1 1 1 1 0 0  <== carries
    1 1 1 1 1 1 1 0  <== -2
  + 0 0 0 0 1 0 1 0  <== 10
 ---------------------
    0 0 0 0 1 0 0 0  <== sum = 8
\end{BVerbatim}
|
|
\captionof{figure}{$-2+10 = 8$}
|
|
\label{sum:-2+10}
|
|
\end{figure}
|
|
|
|
Just like an unsigned number can wrap around as a result of
successive additions, a signed number can do the same thing. The
only difference is that a signed number won't wrap from the maximum
value back to zero; instead it will wrap from the most positive to
the most negative value as shown in \autoref{sum:127+1}.
|
|
|
|
\begin{figure}[H]
|
|
\centering
|
|
\begin{BVerbatim}
  0 1 1 1 1 1 1 1 0  <== carries
    0 1 1 1 1 1 1 1  <== 127
  + 0 0 0 0 0 0 0 1  <== 1
 ---------------------
    1 0 0 0 0 0 0 0  <== sum = -128
\end{BVerbatim}
|
|
\captionof{figure}{$127+1 = -128$}
|
|
\label{sum:127+1}
|
|
\end{figure}
|
|
|
|
\begin{tcolorbox}
|
|
Formally, a {\em signed overflow} occurs whenever the carry
|
|
{\em into} the most significant bit is not the same as the
|
|
carry {\em out of} the most significant bit.
|
|
\end{tcolorbox}
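
The rule can be checked against the signed examples in this section by
reading the two leftmost entries of the carries row in each figure: the
leftmost entry is the carry {\em out of} the MSB and the entry next to it
is the carry {\em into} the MSB. The following summary is an added
illustration derived from those figures:

\begin{center}
\begin{tabular}{|l|c|c|l|}
\hline
Sum & Carry into MSB & Carry out of MSB & Signed overflow? \\
\hline
$64+64$ (\autoref{sum:64+64}) & 1 & 0 & yes \\
$-128+-128$ (\autoref{sum:-128+-128}) & 0 & 1 & yes \\
$-3+-5$ (\autoref{sum:-3+-5}) & 1 & 1 & no \\
$-2+10$ (\autoref{sum:-2+10}) & 1 & 1 & no \\
$127+1$ (\autoref{sum:127+1}) & 1 & 0 & yes \\
\hline
\end{tabular}
\end{center}

RV32I has no carry flags to inspect, so software normally tests the
operands and the result instead. One way to do so, sketched here for 32-bit
registers with arbitrarily chosen register names and a hypothetical label
{\tt ovf}, follows from the observation that adding a negative number must
make the result smaller and adding a non-negative number must not:

\begin{verbatim}
    add   t0, t1, t2    # t0 = t1 + t2
    slti  t3, t2, 0     # t3 = 1 if t2 is negative
    slt   t4, t0, t1    # t4 = 1 if the sum is less than t1
    bne   t3, t4, ovf   # the two disagree only if signed overflow occurred
\end{verbatim}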
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\section{Sign and Zero Extension}
|
|
|
|
\index{sign extension}
|
|
\label{SignExtension}
|
|
Due to the nature of the two's complement encoding scheme, the following
|
|
numbers all represent the same value:
|
|
\begin{verbatim}
    1111 <== -1
    11111111 <== -1
    11111111111111111111 <== -1
    1111111111111111111111111111 <== -1
\end{verbatim}
|
|
As do these:
|
|
\begin{verbatim}
    01100 <== 12
    0000001100 <== 12
    00000000000000000000000000000001100 <== 12
\end{verbatim}
|
|
|
|
The lengthening of these numbers by replicating the digits on the left
|
|
is what is called {\em sign extension}.
|
|
|
|
\begin{tcolorbox}
|
|
Any signed number can have any quantity of additional MSBs added to it,
|
|
provided that they repeat the value of the sign bit.
|
|
\end{tcolorbox}
|
|
|
|
\autoref{Figure:SignExtendNegative} illustrates extending the negative sign
|
|
bit to the left by replicating it.
|
|
A negative number will have its \acrshort{msb} (bit 19 in this example)
|
|
set to 1. Extending this value to the left will set all the new bits
|
|
to the left of it to 1 as well.
|
|
|
|
\begin{figure}[ht]
|
|
\centering
|
|
\DrawBitBoxSignExtendedPicture{32}{10100000000000000010}
|
|
\captionof{figure}{Sign-extending a negative integer from 20 bits to 32 bits.}
|
|
\label{Figure:SignExtendNegative}
|
|
\end{figure}
|
|
|
|
\autoref{Figure:SignExtendPositive} illustrates extending the sign bit of a
|
|
positive number to the left by replicating it.
|
|
A positive number will have its \acrshort{msb} set to 0. Extending this
|
|
value to the left will set all the new bits to the left of it to 0 as well.
|
|
|
|
\begin{figure}[ht]
|
|
\centering
|
|
\DrawBitBoxSignExtendedPicture{32}{01000000000000000010}
|
|
\captionof{figure}{Sign-extending a positive integer from 20 bits to 32 bits.}
|
|
\label{Figure:SignExtendPositive}
|
|
\end{figure}
|
|
|
|
|
|
\label{ZeroExtension}
|
|
In a similar vein, any unsigned number also may have any quantity of
|
|
additional MSBs added to it provided that they are all zero. This is
|
|
called {\em zero extension}. For example,
|
|
the following all represent the same value:
|
|
\begin{verbatim}
    1111 <== 15
    01111 <== 15
    00000000000000000000000001111 <== 15
\end{verbatim}
|
|
|
|
\begin{tcolorbox}
|
|
Any {\em unsigned} number may be {\em zero extended} to any size.
|
|
\end{tcolorbox}
|
|
|
|
\enote{Remove the sign-bit boxes from this figure?}%
|
|
\autoref{Figure:ZeroExtend} illustrates zero-extending a 20-bit number to the
|
|
left to form a 32-bit number.
|
|
|
|
\begin{figure}[ht]
|
|
\centering
|
|
\DrawBitBoxZeroExtendedPicture{32}{10000000000000000010}
|
|
\captionof{figure}{Zero-extending an unsigned integer from 20 bits to 32 bits.}
|
|
\label{Figure:ZeroExtend}
|
|
\end{figure}
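
Expressed in hexadecimal, the three figures in this section extend the
following 20-bit values to 32 bits (the hex values are computed here from
the bit patterns in the figures):

\begin{verbatim}
    20-bit value    extension    32-bit result
       a0002          sign         fffa0002
       40002          sign         00040002
       80002          zero         00080002
\end{verbatim}

As a rough sketch of how this can be done in software, assume that the
20-bit value sits in the low-order bits of register {\tt t0} on a 32-bit
machine (the registers used are an arbitrary choice). Shifting the value
left until its MSB lands in bit 31 and then shifting it back to the right
performs the extension. The shift operations themselves are discussed in
the next section:

\begin{verbatim}
    slli  t1, t0, 12    # move bit 19 up into bit 31
    srai  t2, t1, 12    # shift back replicating the sign bit: sign extension
    srli  t3, t1, 12    # shift back filling with zeros: zero extension
\end{verbatim}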
|
|
|
|
%Sign- and zero-extending binary numbers are common operations used to
|
|
%fit a byte or halfword into a fullword.
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\section{Shifting}
|
|
|
|
We were all taught how to multiply and divide decimal numbers by ten
|
|
by moving (or {\em shifting}) the decimal point to the right or left
|
|
respectively. Doing the same in any other base has the same effect
|
|
in that it will multiply or divide the number by its base.
|
|
|
|
\enote{Include decimal values in the shift diagrams.}%
|
|
Multiplication and division are only two reasons for shifting. There
|
|
can be other occasions where doing so is useful.
|
|
|
|
As implemented by a CPU, shifting applies to the value in a register
and the result is stored back into a register of finite size. Therefore
a shift result will always be truncated to fit into a register.
|
|
|
|
\enote{Add some examples showing the rounding of positive and negative values.}%
|
|
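Note that when dealing with numeric values, any truncation performed
during a right-shift will manifest itself as rounding down: toward
zero for unsigned (and non-negative signed) values, and toward negative
infinity when a negative value is shifted right arithmetically.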
|
|
|
|
\subsection{Logical Shifting}
|
|
|
|
Shifting {\em logically} to the left or right is a matter of re-aligning
|
|
the bits in a register and truncating the result.
|
|
|
|
\enote{Redraw these with arrows tracking the shifted bits and the truncated values}%
|
|
To shift left two positions:
|
|
|
|
\DrawBitBoxUnsignedPicture{10111000000000000010}\\
|
|
\DrawBitBoxUnsignedPicture{11100000000000001000}
|
|
|
|
To shift right one position:
|
|
|
|
\DrawBitBoxUnsignedPicture{10111000000000000010}\\
|
|
\DrawBitBoxUnsignedPicture{01011100000000000001}
|
|
|
|
\begin{tcolorbox}
|
|
Note that the vacated bit positions are always filled with zero.
|
|
\end{tcolorbox}
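
Treating the 20-bit patterns above as unsigned integers shows the
arithmetic effect of each shift. The hex and decimal values below are
computed here for illustration and assume a register that is exactly 20
bits wide, as drawn in the diagrams:

\begin{verbatim}
    left by 2:   b8002 (753666)  ==>  e0008 (917512)
                 753666 * 4 = 3014664 = 2e0008, truncated to 20 bits

    right by 1:  b8002 (753666)  ==>  5c001 (376833)
                 753666 / 2 = 376833 exactly, so nothing is lost
\end{verbatim}

On RV32I the corresponding instructions are {\tt slli} and {\tt srli}
(or {\tt sll} and {\tt srl} when the shift amount is held in a register).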
|
|
|
|
\subsection{Arithmetic Shifting}
|
|
|
|
Sometimes it is desirable to retain the value of the sign bit when
shifting. The RISC-V ISA provides an arithmetic right shift
instruction for this purpose (there is no arithmetic left shift in
this ISA).
|
|
|
|
\begin{tcolorbox}
|
|
When shifting to the right {\em arithmetically}, vacated bit positions are
|
|
filled by replicating the value of the sign bit.
|
|
\end{tcolorbox}
|
|
|
|
An arithmetic right shift of a negative number by 4 bit positions:
|
|
|
|
\DrawBitBoxSignedPicture{10111000000000000010}\\
|
|
\DrawBitBoxSignedPicture{11111011100000000000}
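
Interpreted as 20-bit two's complement integers, the value shifted above
is $-294910$ and the result is $-18432$. Since $-294910 / 16 = -18431.875$,
the bits discarded on the right cause a negative result to round toward
negative infinity rather than toward zero. A smaller 8-bit sketch, added
here for illustration, shows the same behavior for a shift of one position:

\begin{verbatim}
     7 = 0 0 0 0 0 1 1 1  ==>  0 0 0 0 0 0 1 1 =  3   ( 7/2 =  3.5)
    -7 = 1 1 1 1 1 0 0 1  ==>  1 1 1 1 1 1 0 0 = -4   (-7/2 = -3.5)
\end{verbatim}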
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\section{Main Memory Storage}
|
|
|
|
As mentioned in \autoref{VolatileStorage}, the main memory in a RISC-V
|
|
system is byte-addressable. For that reason we will visualize it by
displaying ranges of bytes in hex and in \gls{ascii}. As will
become obvious, the ASCII part makes it easier to find text messages.%
|
|
\footnote{Most of the memory dumps in this text are generated by \gls{rvddt}
|
|
and are shown on a per-byte basis without any attempt to reorder their
|
|
values. Some other applications used to dump memory do not dump the bytes
|
|
in address-order! It is important to know how your software tools operate
|
|
when using them to dump the contents of memory and/or files.}
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\subsection{Memory Dump}
|
|
|
|
\listingRef{rvddt_memdump.out} shows a {\em memory dump} from the rvddt
|
|
`d' command requesting a dump starting at address \hex{00002600}
|
|
for the default quantity (\hex{100}) of bytes.
|
|
|
|
\listing{rvddt_memdump.out}{{\tt rvddt} memory dump}
|
|
|
|
\begin{itemize}
|
|
\item [$\ell$ 1] The rvddt prompt showing the dump command.
|
|
\item [$\ell$ 2] From left to right, the dump is presented as the address
	of the first byte (\hex{00002600}) followed by a colon, the value
	of the byte at address \hex{00002600} expressed in hex, the next byte
	(at address \hex{00002601}) and so on for 16 bytes. There is a
	double-space
	between bytes number 7 and 8 (counting from zero) to provide a visual
	reference for the center, making it easy to locate bytes on the right end. For
	example, the byte at address \hex{0000260c} is four bytes to the
	right of byte number eight (at the gap) and contains \hex{13}.
	To the right of the 16 bytes is an asterisk-enclosed set of 16 columns
	showing the ASCII characters that each byte represents. If a byte
	has a value that corresponds to a printable character code, the character
	will be displayed. For any unprintable byte value, a dot
	is shown to make it easier to count the columns.
|
|
\item [$\ell$ 3-17] More of the same as seen on $\ell$ 2. The address
|
|
at the left can be seen to advance by $16_{10}$ (or $10_{16}$)
|
|
for each line shown.
|
|
\end{itemize}
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\subsection{Endianness}
|
|
|
|
The choice of which end of a multi-byte value is to be stored at the
|
|
lowest byte address is referred to as {\em endianness.} For example,
|
|
if a CPU were to store a \gls{halfword} into memory, should the byte
|
|
containing the \acrfull{msb} (the {\em big} end) go first or does
|
|
the byte with the \acrfull{lsb} (the {\em little} end) go first?
|
|
|
|
On the one hand the choice is arbitrary. On the other hand, it is
|
|
possible that the choice could impact the performance of the system.%
|
|
\footnote{See\cite{IEN137} for some history of the big/little-endian ``controversy.''}
|
|
|
|
IBM mainframe CPUs and the 68000 family store their bytes in big-endian
order, while the Intel Pentium and most embedded processors use
little-endian order.
|
|
Some CPUs are even {\em bi-endian} in that they have instructions that
|
|
can change their order on the fly.
|
|
|
|
The RISC-V system uses the little-endian byte order.
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\subsubsection{Big-Endian}
|
|
\label{BigEndian}
|
|
|
|
Using the contents of \listingRef{rvddt_memdump.out}, a {\em big-endian}
|
|
CPU would recognize the contents as follows:
|
|
|
|
\begin{itemize}
|
|
\item The 8-bit value stored at address \hex{00002658} is \hex{76}.
|
|
\item The 16-bit value stored at address \hex{00002658} is \hex{7661}.
|
|
\item The 32-bit value stored at address \hex{00002658} is \hex{76616c3d}.
|
|
\end{itemize}
|
|
|
|
\begin{tcolorbox}
|
|
On a big-endian system, the bytes in the dump are in the same order as
|
|
they would be used by the CPU if it were to read them as a multi-byte
|
|
value.
|
|
\end{tcolorbox}
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\subsubsection{Little-Endian}
|
|
\label{LittleEndian}
|
|
|
|
Using the contents of \listingRef{rvddt_memdump.out}, a {\em little-endian}
|
|
CPU would recognize the contents as follows:
|
|
|
|
\begin{itemize}
|
|
\item The 8-bit value stored at address \hex{00002658} is \hex{76}.
|
|
\item The 16-bit value stored at address \hex{00002658} is \hex{6176}.
|
|
\item The 32-bit value stored at address \hex{00002658} is \hex{3d6c6176}.
|
|
\end{itemize}
|
|
|
|
\begin{tcolorbox}
|
|
On a little-endian system, the bytes in the dump are in the reverse of the
order in which they would be used by the CPU if it were to read them as a
multi-byte value.
|
|
\end{tcolorbox}
|
|
|
|
Note that in a little-endian system, the number of bytes used to represent
|
|
the value does not change the place value of the first byte(s). In this
|
|
example, the \hex{76} at address \hex{00002658} is the least significant
|
|
byte in all representations.
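
One way to see this is to write out the place value that each byte
contributes to the 32-bit value at address \hex{00002658}. The following
is an added illustration in which all byte values are in hex and each
successive place is worth 256 times the previous one:

\begin{verbatim}
    bytes in address order:  76 61 6c 3d

    little-endian:  76*256^0 + 61*256^1 + 6c*256^2 + 3d*256^3 = 3d6c6176
    big-endian:     76*256^3 + 61*256^2 + 6c*256^1 + 3d*256^0 = 76616c3d
\end{verbatim}

In the little-endian case the weight of the byte at the lowest address is
always $256^0$ regardless of how many bytes are read, which is why \hex{76}
is the least significant byte of the 8-bit, 16-bit and 32-bit values alike.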
|
|
|
|
The RISC-V ISA specification notes that ``A minor point is that we have also found
|
|
little-endian memory systems to be more natural for hardware
|
|
designers. However, certain application areas, such as IP networking, operate
|
|
on big-endian data structures, and so we leave open the possibility of
|
|
non-standard big-endian or bi-endian systems.''\cite[p.~6]{rvismv1v22:2017}
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\subsection{Arrays and Character Strings}
|
|
|
|
While endianness defines how single values are stored in memory,
|
|
the {\em array} defines how multiple values are stored.
|
|
|
|
An array is a data structure composed of an ordered set of elements.
This text will limit its definition of an array to a plurality of
elements that are all of the same type, where type
refers to the size (number of bytes) and representation (signed,
unsigned,\ldots) of each element.
|
|
|
|
In an array, the elements are stored adjacent to one another such that the
|
|
address $e$ of any element $x[n]$ is:
|
|
|
|
\begin{equation}
|
|
e = a + n \times s
|
|
\end{equation}
|
|
|
|
Where $x$ is the name of the array, $n$ is the element number of interest,
|
|
$e$ is the address of interest, $a$ is the address of the first element in
|
|
the array and $s$ is the size (in bytes) of each element.
|
|
|
|
Given an array $x$ containing $m$ elements, $x[0]$ is the first element of
|
|
the array and $x[m-1]$ is the last element of the array.%
|
|
\footnote{Some computing languages (C, C++, Java, C\#, Python, Perl,\ldots)
|
|
define an array such that the first element is indexed as $x[0]$,
while others (FORTRAN, MATLAB) define the first element of an
array to be $x[1]$.}
|
|
|
|
Using this definition, and the memory dump shown in
|
|
\listingRef{rvddt_memdump.out}, and the knowledge that
|
|
we are using a little-endian machine and given that
|
|
$a = $ \hex{00002656} and $s = 2$, the values of the first 8 elements
|
|
of array $x$ are:
|
|
|
|
\begin{itemize}
|
|
\item $x[0]$ is \hex{0000} and is stored at \hex{00002656}.
|
|
\item $x[1]$ is \hex{6176} and is stored at \hex{00002658}.
|
|
\item $x[2]$ is \hex{3d6c} and is stored at \hex{0000265a}.
|
|
\item $x[3]$ is \hex{0000} and is stored at \hex{0000265c}.
|
|
\item $x[4]$ is \hex{0000} and is stored at \hex{0000265e}.
\item $x[5]$ is \hex{0000} and is stored at \hex{00002660}.
\item $x[6]$ is \hex{0000} and is stored at \hex{00002662}.
\item $x[7]$ is \hex{8480} and is stored at \hex{00002664}.
|
|
\end{itemize}
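
As a worked instance of the address formula, with $a = $ \hex{00002656},
$s = 2$ and $n = 3$ the address of $x[3]$ is
\hex{00002656} $+\ 3 \times 2 = $ \hex{0000265c}.

The same calculation can be sketched in RV32I assembly. This fragment is
an illustration only and assumes that register {\tt a0} holds the address
$a$ of the array and that {\tt a1} holds the element number $n$. Because
$s = 2$ is a power of two, the multiplication is performed with a shift:

\begin{verbatim}
    slli  t0, a1, 1     # t0 = n * 2  (s = 2 bytes per element)
    add   t0, a0, t0    # t0 = a + n * s
    lhu   t1, 0(t0)     # t1 = x[n], read as an unsigned halfword
\end{verbatim}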
|
|
|
|
\begin{tcolorbox}
|
|
In general, there is no fixed rule nor notion as to how many
|
|
elements an array has. It is up to the programmer to ensure that
|
|
the starting address and the number of elements in any given array
|
|
(its size) are used properly so that data bytes outside an array
|
|
are not accidentally used as elements.
|
|
\end{tcolorbox}
|
|
|
|
There is, however, a common convention used for an array of
|
|
characters that is used to hold a text message
|
|
(called a {\em character string} or just {\em string}).
|
|
|
|
When an array is used to hold a string the element past the last
|
|
character in the string is set to zero. This is because 1) zero
|
|
is not a valid printable ASCII character and 2) it simplifies
|
|
software in that knowing no more than the starting address of a
|
|
string is all that is needed to process it. Without this zero
|
|
{\em sentinel} value (called a {\em null} terminator), some knowledge
|
|
of the number of characters in the string would have to otherwise
|
|
be conveyed to any code needing to consume or process the string.
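
To illustrate how the sentinel simplifies software, the following sketch
counts the characters in a string given nothing more than its starting
address. The routine name {\tt strlen} and the use of register {\tt a0} for
both the argument and the result are assumptions made for this example:

\begin{verbatim}
strlen: mv    t0, a0        # t0 = address of the current character
1:      lbu   t1, 0(t0)     # fetch the next byte of the string
        beqz  t1, 2f        # stop when the null terminator is found
        addi  t0, t0, 1     # advance to the next character
        j     1b
2:      sub   a0, t0, a0    # length = address of terminator - start
        ret
\end{verbatim}

Applied to the string at address \hex{00002658}, the loop would stop at
the zero byte at \hex{0000265c} and report a length of 4.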
|
|
|
|
In \listingRef{rvddt_memdump.out}, the 5-byte long array starting
|
|
at address \hex{00002658} contains a string whose value can be
|
|
expressed as either: % \verb@76 61 6c 3d 00@ or \verb@"val="@.
|
|
|
|
\verb@76 61 6c 3d 00@
|
|
|
|
or
|
|
|
|
\verb@"val="@
|
|
|
|
%\begin{itemize}
|
|
%\item \verb@76 61 6c 3d 00@
|
|
%\item \verb@"val="@
|
|
%\end{itemize}
|
|
|
|
\index{ASCII}
|
|
\index{ASCIIZ}
|
|
When the double-quoted text form is used, the GNU assembler used in
|
|
this text differentiates between {\em ascii} and {\em asciiz} strings
|
|
such that an {\em ascii} string is {\bf not} null terminated and an
|
|
{\em asciiz} string {\bf is} null terminated.
|
|
|
|
The value of providing a method to create a string that is not
|
|
null terminated is that a program may define a large string by
|
|
concatenating a number of {\em ascii} strings together and following the
|
|
last with a byte of zero to null-terminate it.
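
A minimal sketch of both approaches using the GNU assembler's {\tt .ascii},
{\tt .asciz} and {\tt .byte} directives is shown below. The labels
{\tt msg1} and {\tt msg2} are hypothetical, and the comments show the bytes
that each line is expected to place into memory:

\begin{verbatim}
msg1:   .ascii  "va"        # 76 61           (no terminator)
        .ascii  "l="        # 6c 3d           (no terminator)
        .byte   0           # 00              null terminator for the whole string
msg2:   .asciz  "val="      # 76 61 6c 3d 00
\end{verbatim}

Both labels refer to the same five bytes of string data.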
|
|
|
|
It is a common mistake to create a string with a missing
|
|
null terminator. The result of printing such a string is that
|
|
the string will be printed as well as whatever random data bytes in
|
|
memory follow it until a byte whose value is zero is encountered
|
|
by chance.
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\subsection{Context is Important!}
|
|
|
|
Data values can be interpreted differently depending on the context in
which they are used. Assuming what a set of bytes is used for based on
their contents can be very misleading! For example, there is a \hex{76} at
address \hex{00002658}. This is a `v' if you use it as an ASCII
(see~\autoref{chapter:ascii}) character, a $118_{10}$ if it is an integer
value and {\em true} if it is a conditional.
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\subsection{Alignment}
|
|
|
|
\enote{Include the obligatory diagram showing the overlapping data types
|
|
when they are all aligned.}%
|
|
With respect to memory and storage, {\em \gls{alignment}} refers to the
{\em location} of a data element: an element is aligned when the address
at which it is stored is an exact multiple of a power of 2.
|
|
|
|
The primary alignments of concern are typically 2 (a halfword),
|
|
4 (a fullword), 8 (a double word) and 16 (a quad-word) bytes.
|
|
|
|
For example, any data element that is aligned to a 2-byte boundary
must have a (hex) address that ends in any of: 0, 2, 4, 6, 8, A,
C or E.
Any 4-byte aligned element must be located at an address ending
in 0, 4, 8 or C. An 8-byte aligned element must be located at an
address ending in 0 or 8, and 16-byte aligned elements must be located at
|
|
addresses ending in zero.
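
Equivalently, an address is aligned to a $2^k$-byte boundary exactly when
its $k$ least significant bits are all zero. Two addresses used earlier in
this chapter serve as an added illustration:

\begin{verbatim}
    00002658 = ...0101 1000   low 3 bits 000   ==> 2-, 4- and 8-byte aligned
                              low 4 bits 1000  ==> not 16-byte aligned
    00002656 = ...0101 0110   low 1 bit  0     ==> 2-byte aligned
                              low 2 bits 10    ==> not 4-byte aligned
\end{verbatim}

In software the test can be sketched with a single AND operation. Here the
address is assumed to be in register {\tt a0} and {\tt unaligned} is a
hypothetical label:

\begin{verbatim}
    andi  t0, a0, 3         # t0 = the 2 least significant bits of a0
    bnez  t0, unaligned     # nonzero ==> a0 is not 4-byte aligned
\end{verbatim}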
|
|
|
|
Such alignments are important when exchanging data between the CPU
|
|
and memory because the hardware implementations are optimized to
|
|
transfer aligned data. Therefore, aligning data used by any program
|
|
will reap the benefit of running faster.%
|
|
\footnote{Alignment of data, while important for efficient performance,
|
|
is not mandatory for RISC-V systems.\cite[p.~19]{rvismv1v22:2017}}
|
|
|
|
An element of data is considered to be {\em aligned to its natural size}
|
|
when its address is an exact multiple of the number of bytes used to
|
|
represent the data. Note that the ISA we are concerned with {\em only}
|
|
operates on elements that have sizes that are powers of two.
|
|
|
|
For example, a 32-bit integer consumes one full word. If the four bytes
are stored in main memory at an address that is a multiple of 4 then
the integer is considered to be naturally aligned.
|
|
|
|
The same would apply to 16-bit, 64-bit, 128-bit and other such values
|
|
as they fit into 2, 8 and 16 byte elements respectively.
|
|
|
|
Some CPUs can deliver four (or more) bytes at the same time while others
|
|
might only be capable of delivering one or two bytes at a time. Such
|
|
differences in hardware typically impact the cost and performance of a
|
|
system.%
|
|
\footnote{The design and implementation
|
|
choices that determine how any given system operates are part of what is
|
|
called a system's {\em organization} and are beyond the scope of this text.
|
|
See~\cite{codriscv:2017} for more information on computer organization.}
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
\subsection{Instruction Alignment}
|
|
|
|
The RISC-V ISA requires that all instructions be aligned to their
|
|
natural boundaries.
|
|
|
|
Every possible instruction that an RV32I CPU can execute contains
|
|
exactly 32 bits. Therefore they are always stored on a full word
|
|
boundary. Any {\em unaligned} instruction is {\em illegal}.%
|
|
\footnote{This rule is relaxed by the C extension to allow an
|
|
instruction to start at any even address.\cite[p.~5]{rvismv1v22:2017}}
|
|
|
|
An attempt to fetch an instruction from an unaligned address
|
|
will result in an error referred to as an alignment {\em \gls{exception}}.
|
|
This and other exceptions cause the CPU to stop executing the
|
|
current instruction and start executing a different set of instructions
|
|
that are prepared to handle the problem. Often an exception is
|
|
handled by completely stopping the program in a way that is commonly
|
|
referred to as a system or application {\em crash}.
|