Cleanup signed, unsigned, adding, & overflow

2025-09-27 05:04:39 -04:00 · 2020-08-18 16:04:52 -05:00 · 2020-08-18 16:04:52 -05:00 · 7ebde15709
commit 7ebde15709
parent 90744ac90d
1 changed files with 157 additions and 66 deletions
--- a/book/binary/chapter.tex
+++ b/book/binary/chapter.tex
@ -564,46 +564,42 @@ Binary:                  1  1  1  1  1  1  1  1

 \ldots because: $-128+64+32+16+8+4+2+1=-1$.

+This format has the virtue of allowing the same addition logic discussed above to be 
+used to calculate the sums of both signed and unsigned numbers.

-Calculating $4+5 = 9$
+Calculating the signed addition: $4+5 = 9$

 \begin{verbatim}
-	   1    <== carries
-	 000100 <== 4
-	+000101 <== 5
-     ------
-	 001001 <== 9
+       1    <== carries
+     000100 <== 4 = 0 + 0 + 0 + 4 + 0 + 0
+    +000101 <== 5 = 0 + 0 + 0 + 4 + 0 + 1
+    -------
+     001001 <== 9 = 0 + 0 + 8 + 0 + 0 + 1
 \end{verbatim}

-Calculating $-4+ -5 = -9$
+Calculating the signed addition: $-4+ -5 = -9$

 \begin{verbatim}
-	1 11     <== carries
-	  111100 <== -4
-	 +111011 <== -5
+    1 11     <== carries
+      111100 <== -4 = -32 + 16 + 8 + 4 + 0 + 0
+     +111011 <== -5 = -32 + 16 + 8 + 0 + 2 + 1
   ---------
-	1 110111 <== -9 (with a truncation)
-
-  -32 16 8 4 2 1
-    1  1 0 1 1 1
- -32 + 16 + 4 + 2 + 1 = -9
+    1 110111 <== -9 (with a truncation) = -32 + 16 + 4 + 2 + 1 = -9
 \end{verbatim}


-This format has the virtue of allowing the same addition logic 
-discussed above to be used to calculate $-1+1=0$.
+Calculating the signed addition: $-1+1=0$

 \begin{verbatim}
   -128 64 32 16  8  4  2  1 <== place value
-   1  1  1  1  1  1  1  1  0 <== carries
+   1  1  1  1  1  1  1  1    <== carries
      1  1  1  1  1  1  1  1 <== addend (-1)
    + 0  0  0  0  0  0  0  1 <== addend (1)
      ----------------------
   1  0  0  0  0  0  0  0  0 <== sum (0 with a truncation)
 \end{verbatim}

-In order for this to work, the carry out of the sum of the MSBs is 
-ignored.
+{\em In order for this to work, the carry out of the sum of the MSBs {\bfseries must} be discarded.}
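+
+One way to see why discarding this carry is harmless here (a sketch that reads the 8-bit patterns as unsigned values, an interpretation added for illustration): dropping the carry out of the MSB is the same as subtracting $2^8=256$ from the sum.
+\[
+  255 + 1 = 256, \qquad 256 - 2^8 = 0
+\]
+The bit pattern of $-1$ is the same as that of the unsigned value $255$, so the truncated sum is the expected $0$.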

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsubsection{Converting between Positive and Negative}
@ -612,8 +608,9 @@ Changing the sign on two's complement numbers can be described as
 inverting all of the bits (which is also known as the {\em one's complement})
 and then adding one.

-For example, inverting the number four:
+For example, negating the number four:

+\begin{minipage}{\textwidth}
 \begin{verbatim}
   -128 64 32 16  8  4  2  1
      0  0  0  0  0  1  0  0 <== 4
@ -624,23 +621,24 @@ For example, inverting the number four:
      ----------------------
      1  1  1  1  1  1  0  0 <== -4
 \end{verbatim}
+\end{minipage}

 This can be verified by adding 5 to the result and observing that
 the sum is 1:

 \begin{verbatim}
   -128 64 32 16  8  4  2  1
-      1  1  1  1  1          <== carries
+  1   1  1  1  1  1          <== carries
      1  1  1  1  1  1  0  0 <== -4
    + 0  0  0  0  0  1  0  1 <== 5
      ----------------------
-   1  0  0  0  0  0  0  0  1
+  1   0  0  0  0  0  0  0  1 <== 1 (with a truncation)
 \end{verbatim}

 Note that the changing of the sign using this method is symmetric
 in that it is identical when converting from negative to positive
-and when converting from positive to negative: flip the bits and
-add 1.
+and when converting from positive to negative: {\em flip the bits and
+add 1.}
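+
+A short justification, sketched for an $n$-bit value $x$ with $\overline{x}$ denoting its bitwise inverse (notation introduced here for illustration): every bit position of $x+\overline{x}$ holds exactly one 1, so
+\[
+  x + \overline{x} = 2^n - 1 \quad\Longrightarrow\quad \overline{x} + 1 = 2^n - x
+\]
+and $2^n - x$, truncated to $n$ bits, is the two's complement pattern for $-x$ (whenever $-x$ is representable).  The reasoning does not depend on the sign of $x$, which is why the operation is symmetric.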

 For example, changing the value -4 to 4 to illustrate the
 reverse of the conversion above:
@ -661,45 +659,56 @@ reverse of the conversion above:
 \subsection{Subtraction of Binary Numbers}


-Subtraction of binary numbers is performed by first negating
-the subtrahend and then adding the two numbers.  Due to the
-nature of two's complement numbers this will work for both 
-signed and unsigned numbers.
+Subtraction%
 \enote{This section needs more examples of subtracting 
 signed and unsigned numbers and a discussion on how 
 signedness is not relevant until the results are interpreted. 
 For example adding $-4+ -8=-12$ using two 8-bit numbers 
 is the same as adding $252+248=500$ and truncating the result 
 to 244.}
+of binary numbers is performed by first negating
+the subtrahend and then adding the two numbers.  Due to the
+nature of two's complement numbers this method will work for both 
+signed and unsigned numbers!

-To calculate $-4-8 = -12$
+Observation: Since the carry-in to the LSB is always zero when
+adding, we can take advantage of that fact by (ab)using that carry input
+to add the extra 1 that is needed when changing the sign of the
+subtrahend, as is done in the examples below. 
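+
+As a sketch of why this works, write $\overline{b}$ for the one's complement of an $n$-bit subtrahend $b$ (a symbol introduced here for illustration), so that $-b$ has the same bit pattern as $\overline{b}+1$.  Then
+\[
+  a - b \equiv a + \overline{b} + 1 \pmod{2^n}
+\]
+and the trailing $+1$ can be supplied by a carry-in of 1 on the LSB.  Reading the 8-bit patterns for $-4-8$ as unsigned values gives $252 + 247 + 1 = 500$, and $500 - 256 = 244$, which is the bit pattern for $-12$.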

-\enote{This example is unclear. That the adding of one to the subtrahend
-has to be done as part of the same operation as the sum of the two values.
-otherwise adding 1000 to 0001 will {\em not} result in a proper overflow 
-staus as discussed below.}
+An example showing the subtraction of two {\em signed} binary numbers: $-4-8 = -12$

 \begin{verbatim}
   -128 64 32 16  8  4  2  1
      1  1  1  1  1  1  0  0 <== -4  (minuend)
    - 0  0  0  0  1  0  0  0 <== 8   (subtrahend)
+    ------------------------


-                  1  1  1    <== carries
-      1  1  1  1  0  1  1  1 <== one's complement of -8
-    + 0  0  0  0  0  0  0  1 <== plus 1
-      ----------------------
-      1  1  1  1  1  0  0  0 <== -8
-      
-	  
-      1  1  1  1             <== carries
+  1   1  1  1  1  1  1  1  1 <== carries
      1  1  1  1  1  1  0  0 <== -4
-    + 1  1  1  1  1  0  0  0 <== -8
-      ----------------------
-   1  1  1  1  1  0  1  0  0 < == -12
+    + 1  1  1  1  0  1  1  1 <== one's complement of 8
+    ------------------------
+  1   1  1  1  1  0  1  0  0 <== -12
 \end{verbatim}


+%An example showing the subtraction of two {\em unsigned} binary numbers: $252+248=500$
+%
+%\begin{verbatim}
+%    128 64 32 16  8  4  2  1
+%
+%  1   1  1  1  1             <== carries
+%      1  1  1  1  1  1  0  0 <== 252
+%    + 1  1  1  1  1  0  0  0 <== 248
+%      ----------------------
+%  1   1  1  1  1  0  1  0  0 < == 500 (if we do NOT truncate the MSB)
+%\end{verbatim}
+%
+%An example showing the subtraction of two {\em unsigned} binary numbers: $252+248=500$
+
+
+
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsection{Truncation}
@ -707,10 +716,64 @@ staus as discussed below.}
 \index{overflow}
 \index{carry}

-So far we have been ignoring (truncating) the carries that can come from 
-the MSBs when adding and subtracting.  We have also been ignoring the 
-potential impact of a carry causing a signed number to change its sign in
-a destructive way.
+Discarding the carry bit that can be generated from the MSB is called {\em truncation}.
+
+So far we have been ignoring the carries that can come from the MSBs when adding and subtracting.  
+We have also been ignoring the potential impact of a carry causing a signed number to change 
+its sign in an unexpected way.
+
+In the examples above, truncating the results either 1) had no impact on the calculated sums
+or 2) was absolutely necessary to correct the sum in cases such as: $-4 + 5$.
+
+For example, note what happens when we try to subtract 1 from the most 
+negative value that we can represent in a 4-bit two's complement number:
+
+\begin{verbatim}
+     -8  4  2  1
+      1  0  0  0 <== -8  (minuend)
+    - 0  0  0  1 <==  1  (subtrahend)
+    ------------
+
+
+   1           1 <== carries
+      1  0  0  0 <== -8
+    + 1  1  1  0 <== one's complement of 1
+      ----------
+   1  0  1  1  1 <== this SHOULD be -9 but with truncation it is 7 
+\end{verbatim}
+
+The problem with this example is that we cannot represent $-9_{10}$ using a 4-bit 
+two's complement number.  
+
+Granted, if we had used 5-bit numbers, then the ``answer'' would have fit OK.
+But the same problem would return when trying to calculate $-16 - 1$. 
+So simply ``making more room'' does not solve this problem.
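+
+In general (a standard fact, stated here as a sketch), an $n$-bit two's complement number can only represent values $x$ in the range
+\[
+  -2^{n-1} \le x \le 2^{n-1} - 1
+\]
+so for $n=4$ the range is $-8$ to $7$ and for $n=5$ it is $-16$ to $15$.  Whatever $n$ is chosen, $-2^{n-1} - 1$ falls just outside of it.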
+
+%However, as calculating $-1+1=0$ has demonmstrated above, it was necessary for that
+%case to discard the carry out of the MSB to get the correct result.
+
+%In the case of calculating $-1+1=0$ the addends and result all fit into same-sized
+%(8-bit) values. When calculating $-8-1=-9$ the addends each can fit into 4-bit
+%two's complement numbers but the result would require a 5-bit number.
+
+This is not just a problem when subtracting, nor is it just a problem with
+signed numbers.
+
+The same situation can happen with {\em unsigned} numbers. 
+For example:
+
+\begin{verbatim}
+      8  4  2  1
+  1   1  1  0  0 <== carries
+      1  1  1  0 <== 14  (addend)
+    + 0  0  1  1 <==  3  (addend)
+    ------------
+  1   0  0  0  1 <== this SHOULD be 17 but with truncation it is 1
+\end{verbatim}
+
+
+How to handle such a truncation depends on whether the {\em original} values 
+being added are signed or unsigned.

 The RV ISA refers to discarding the carry out of the MSB after an 
 add (or subtract) of two {\em unsigned} numbers as an {\em unsigned overflow}%
@ -728,38 +791,66 @@ When adding {\em unsigned} numbers, an overflow only occurs when there
 is a carry out of the MSB resulting in a sum that is truncated to fit 
 into the number of bits allocated to contain the result.

-When subtracting {\em unsigned} numbers, an overflow only occurs when the
-difference is negative (because there are no negative unsigned numbers.)
-
-\autoref{sum:240+17} illustrates an unsigned overflow.
+\autoref{sum:240+17} illustrates an unsigned overflow during addition:

 \begin{figure}[H]
 \centering
 \begin{BVerbatim}
-   1 1 1 1 0 0 0 0   <== carries
-     1 1 1 1 0 0 0 0 <== 240
- +   0 0 0 1 0 0 0 1 <== 17
+   1  1 1 1 0 0 0 0 0 <== carries
+      1 1 1 1 0 0 0 0 <== 240
+ +    0 0 0 1 0 0 0 1 <== 17
 ---------------------
-     0 0 0 0 0 0 0 1 <== sum = 1
+   1  0 0 0 0 0 0 0 1 <== sum = 1
 \end{BVerbatim}
 %{\captionof{figure}{$240+16=0$ (overflow)}\label{sum:240+17}}
 \caption{$240+17=1$ (overflow)}
 \label{sum:240+17}
 \end{figure}

-\enote{Need to add an example of an unsigned overflow after a subtraction.
-When subtracting by adding the two's complement of the subtrahend, the unsigned
-overflow status is represented by a 0 carry out of the most significant bit!} 
 Sometimes an overflow like this is referred to as a {\em wrap around}
 because of the way that successive additions will result in a value that
 increases until it {\em wraps} back {\em around} to zero and then 
 returns to increasing in value until it wraps around again.

 \begin{tcolorbox}
-An {\em unsigned overflow} occurs when ever there is a carry
+When adding, {\em unsigned overflow} occurs whenever there is a carry
 {\em out of} the most significant bit.
 \end{tcolorbox}

+
+
+When subtracting {\em unsigned} numbers, an overflow only occurs when the
+subtrahend is greater than the minuend (because in those cases the 
+difference would have to be negative and there are no negative values 
+that can be represented with an unsigned binary number).
+
+\autoref{sum:3-4} illustrates an unsigned overflow during subtraction:
+
+\begin{figure}[H]
+\centering
+\begin{BVerbatim}
+     0 0 0 0 0 0 1 1 <== 3 (minuend)
+   - 0 0 0 0 0 1 0 0 <== 4 (subtrahend)
+   -----------------
+
+
+  0  0 0 0 0 0 1 1 1 <== carries
+     0 0 0 0 0 0 1 1 <== 3
+   + 1 1 1 1 1 0 1 1 <== one's complement of 4
+   -----------------
+     1 1 1 1 1 1 1 1 <== 255 (overflow)
+\end{BVerbatim}
+\caption{$3-4=255$ (overflow)}
+\label{sum:3-4}
+\end{figure}
+
+\begin{tcolorbox}
+When subtracting, {\em unsigned overflow} occurs whenever there is {\em not} a carry
+{\em out of} the most significant bit (IFF the carry-in on the LSB is used to add the
+extra 1 to the subtrahend when changing its sign).
+\end{tcolorbox}
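+
+The rule can be checked with a little arithmetic (a sketch, using $\overline{b} = 2^n - 1 - b$ for the one's complement of an $n$-bit unsigned value $b$, notation introduced here for illustration):
+\[
+  a + \overline{b} + 1 = 2^n + (a - b)
+\]
+The sum reaches $2^n$, producing a carry out of the MSB, exactly when $a \ge b$.  With $a=3$, $b=4$ and $n=8$ the sum is $3 + 251 + 1 = 255 < 256$, so there is no carry out and the subtraction has overflowed.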
+
+
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsubsection{Signed Overflow}
 \index{overflow!signed}
@ -781,7 +872,7 @@ while looking more closely at the carry values.
 \begin{figure}[H]
 \centering
 \begin{BVerbatim}
-   0 1 0 0 0 0 0 0   <== carries
+   0 1 0 0 0 0 0 0 0 <== carries
     0 1 0 0 0 0 0 0 <== 64
 +   0 1 0 0 0 0 0 0 <== 64
 ---------------------
@ -811,7 +902,7 @@ We say that this result has been {\em truncated}.
 \begin{figure}[H]
 \centering
 \begin{BVerbatim}
-   1 0 0 0 0 0 0 0   <== carries
+   1 0 0 0 0 0 0 0 0 <== carries
     1 0 0 0 0 0 0 0 <== -128
 +   1 0 0 0 0 0 0 0 <== -128
 ---------------------
@ -830,7 +921,7 @@ do not have the same sign.
 \begin{figure}[H]
 \centering
 \begin{BVerbatim}
-   1 1 1 1 1 1 1 1   <== carries
+   1 1 1 1 1 1 1 1 0 <== carries
     1 1 1 1 1 1 0 1 <== -3
 +   1 1 1 1 1 0 1 1 <== -5
 ---------------------
@ -843,7 +934,7 @@ do not have the same sign.
 \begin{figure}[H]
 \centering
 \begin{BVerbatim}
-   1 1 1 1 1 1 1 0   <== carries
+   1 1 1 1 1 1 1 0 0 <== carries
     1 1 1 1 1 1 1 0 <== -2
 +   0 0 0 0 1 0 1 0 <== 10
 ---------------------
@ -862,7 +953,7 @@ the most negative value as shown in \autoref{sum:127+1}.
 \begin{figure}[H]
 \centering
 \begin{BVerbatim}
-   0 1 1 1 1 1 1 1   <== carries
+   0 1 1 1 1 1 1 1 0 <== carries
     0 1 1 1 1 1 1 1 <== 127
 +   0 0 0 0 0 0 0 1 <== 1
 ---------------------