Clean up discussion of instruction formats.

2025-09-28 13:40:16 -04:00 · 2020-03-10 11:10:06 -05:00 · 2020-03-10 11:10:06 -05:00 · 34ff94a932
commit 34ff94a932
parent 176d2e851c
1 changed files with 167 additions and 80 deletions
--- a/book/rv32/chapter.tex
+++ b/book/rv32/chapter.tex
@ -262,13 +262,7 @@ immediate, register, base-displacement, pc-relative
 %in the same position.  Also note that imm[19:12] and imm[10:5] can only be 
 %found in one place.  imm[4:0] can only be found in one of two places\ldots

-The method/format of an instruction is designed with an eye on the ease
-of future manufacture of the machine that will execute them.  It is 
-easier to build a machine if it does not have to accommodate many different 
-ways to perform the same task.  The result is that a machine can be 
-built with fewer gates, consumes less power, and can run faster than
-if it were built when a priority is on how a user might prefer to decode
-the same instructions from a hex dump.
+

 This document concerns itself with the RISC-V instruction formats shown 
 in \autoref{Figure:riscvFormats}.
@ -276,17 +270,48 @@ in \autoref{Figure:riscvFormats}.
 %\autoref{Figure:riscvFormats} Shows the RISC-V instruction formats.

 \begin{figure}[ht]
-\DrawInsnTypeBTikz{00000000000000000000000000000000}\\
 \DrawInsnTypeUTikz{00000000000000000000000000000000}\\
 \DrawInsnTypeJTikz{00000000000000000000000000000000}\\
+\DrawInsnTypeRTikz{00000000000000000000000000000000}\\
 \DrawInsnTypeITikz{00000000000000000000000000000000}\\
 \DrawInsnTypeIShiftTikz{00000000000000000000000000000000}\\
 \DrawInsnTypeSTikz{00000000000000000000000000000000}\\
-\DrawInsnTypeRTikz{00000000000000000000000000000000}
+\DrawInsnTypeBTikz{00000000000000000000000000000000}\\
 \captionof{figure}{RISC-V instruction formats.}
 \label{Figure:riscvFormats}
 \end{figure}

+The instruction formats were designed with an eye toward the ease of 
+manufacturing the machines that will execute them.  It is easier to build 
+a machine if it does not have to accommodate many different ways to perform 
+the same task.  The result is a machine that can be built with fewer gates, 
+consume less power, and run faster than one whose instruction formats were 
+designed primarily around how a user might prefer to decode the 
+instructions from a hex dump.
+
+Observe that all instructions have their opcode in bits 0-6 and when they
+include an \verb@rd@ register it will be specified in bits 7-11, 
+an \verb@rs1@ register in bits 15-19, an \verb@rs2@ register in bits 20-24,
+and so on.  This has a seemingly strange impact on the placement of any 
+immediate operands.
+
+When immediate operands are present in an instruction, they are placed in
+the remaining unused bits.  However, they are organized such that
+the sign bit is ALWAYS in bit 31 and the remaining bits are placed so
+as to minimize the number of different positions that any given 
+immediate bit occupies across the instruction formats.
+
+For example, consider immediate operand bits 12-19.  In the U-type format
+they are in instruction bit positions 12-19.  In the J-type format they are 
+also in positions 12-19.  In the J-type format immediate operand bits 1-10 
+are in the same instruction bit positions as they are in the I-type format 
+and immediate operand bits 5-10 are in the same positions as they are in 
+the B-type format.
+
+While this is inconvenient for anyone looking at a memory hexdump, it does
+make sense when considering the impact of this choice on the number of
+gates needed to implement circuitry to extract the immediate operands. 
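+
+As a rough summary of that sharing, the following table lists which 
+instruction bit supplies each bit of the decoded 32-bit immediate value 
+in each format.  The \verb@inst[h:l]@ and \verb@imm[h:l]@ notation is 
+introduced here for illustration only.  An entry of \verb@0@ means that 
+the bit is always zero and an entry of \verb@inst[31]@ for a range of 
+bits means that the sign bit is replicated into each of them.
+
+\begin{verbatim}
+imm bits    I-type       S-type       B-type       U-type       J-type
+imm[31]     inst[31]     inst[31]     inst[31]     inst[31]     inst[31]
+imm[30:20]  inst[31]     inst[31]     inst[31]     inst[30:20]  inst[31]
+imm[19:12]  inst[31]     inst[31]     inst[31]     inst[19:12]  inst[19:12]
+imm[11]     inst[31]     inst[31]     inst[7]      0            inst[20]
+imm[10:5]   inst[30:25]  inst[30:25]  inst[30:25]  0            inst[30:25]
+imm[4:1]    inst[24:21]  inst[11:8]   inst[11:8]   0            inst[24:21]
+imm[0]      inst[20]     inst[7]      0            0            0
+\end{verbatim}
+
+Reading across any one row shows how few distinct instruction bit 
+positions can feed a given immediate bit.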
+
+
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@ -295,21 +320,22 @@ in \autoref{Figure:riscvFormats}.
 \label{insnformat:utype}

 The U-Type format is used for instructions that use a 20-bit immediate operand 
-and a destination register.
+and an \verb@rd@ destination register.
 
-\DrawInsnTypeUTikz{11010110000000000011001010110111}
+%\DrawInsnTypeUTikz{11010110000000000011001010110111}

 The \reg{rd} field contains an \reg{x} register number to be set to a value that
 depends on the instruction.

-The imm field 
-contains a 20-bit value that will be converted into \Gls{xlen} bits by 
-using the {\em imm} operand for bits 31:12 and then sign-extending it 
-to the left\footnote{When XLEN is larger than 32.} and zero-extending 
-the LSBs as discussed in \autoref{extension:zr}.
+%The imm field 
+%contains a 20-bit value that will be converted into \Gls{xlen} bits by 
+%using the {\em imm} operand for bits 31:12 and then sign-extending it 
+%to the left\footnote{When XLEN is larger than 32.} and zero-extending 
+%the LSBs as discussed in \autoref{extension:zr}.

-If \Gls{xlen}=32 then the {\em imm} value example will extracted from the instruction
-and converted as shown in \autoref{Figure:u_type_decode}.
+If \Gls{xlen}=32 then the {\em imm} value will be extracted from the instruction
+and converted as shown in \autoref{Figure:u_type_decode} to form the
+\verb@imm_u@ value.

 \begin{figure}[ht]
 \centering
@ -323,13 +349,9 @@ and converted as shown in \autoref{Figure:u_type_decode}.
 Notice that the 20-bits of the imm field are mapped in the same order and 
 in the same relative position that they appear in the instruction when 
 they are used to create the value of the immediate operand.  
-Shifting the imm value to the left, into the ``upper bits'' of the immediate 
+Leaving the imm bits on the left, in the ``upper bits'' of the \verb@imm_u@ 
 value suggests a rationale for the name of this format.

-%from $01010110000000000011_2$ (\verb@d6003@$_{16}$) to 
-%$11010110000000000011000000000000_2$ (\verb@d6003000@$_{16}$).
-
-
 \begin{itemize}
 \item\instructionHeader{lui\ \ \ rd,imm}
 \label{insn:lui}
@ -351,10 +373,9 @@ memory address \verb@0x800012f4@ then register \verb@x22@ will be set to
 \end{itemize}


-
-If \Gls{xlen}=64 then the \verb@imm_u@ value in this example will be converted to the
-same two's complement integer value by extending the sign-bit (indicated by \verb@a@ 
-in \autoref{Figure:u_type_decode}) to the left.
+If \Gls{xlen}=64 then the \verb@imm_u@ value in this example will be converted 
+to the same two's complement integer value by extending the sign-bit 
+(indicated by \verb@a@ in \autoref{Figure:u_type_decode}) further to the left.



@ -366,41 +387,43 @@ in \autoref{Figure:u_type_decode}) to the left.
 \subsection{J Type}
 \label{insnformat:jtype}

-The J-type format is used for instructions that use a 20-bit immediate operand
-and a destination register.  It is similar to the U-type.  However, the immediate
-operand is constructed by arranging the {\em imm} bits in a different manner.
+The J-type instruction format is used to encode the \verb@jal@ instruction 
+with an immediate value that determines the jump target address.
+It is similar to the U-type, but the bits in the immediate operand are 
+arranged in a different order.

-\DrawInsnTypeJTikz{00111001001110000001001111101111}
+%\DrawInsnTypeJTikz{00111001001110000001001111101111}

-The \reg{rd} field contains an \reg{x} register number to be set to a value that
-depends on the instruction.
+Note that the \verb@imm_j@ value is expressed in the instruction as a target 
+address that is converted to a 21-bit value in the range of 
+$[-1048576..1048575]$ representing a \verb@pc@-relative offset to the 
+target address. 

+%In the J-type format the 20 {\em imm} bits are arranged such 
+%that they represent the ``lower'' portion of the immediate value.  Unlike 
+%the U-type instructions, the J-type requires the bits to be re-ordered 
+%and shifted to the right before they are used.
+%\footnote{The reason that the J-type 
+%bits are reordered like this is because it simplifies the implementation of 
+%hardware as discussed in \autoref{section:EncodingFormats}.}

-In the J-type format the 20 {\em imm} bits are arranged such 
-that they represent the ``lower'' portion of the immediate value.  Unlike 
-the U-type 
-instructions, the J-type requires the bits to be re-ordered and shifted 
-to the right before they are used.\footnote{The reason that the J-type 
-bits are reordered like this is because it simplifies the implementation of 
-hardware as discussed in \autoref{section:EncodingFormats}.}
+%The example above shows that the bit positions in the {\em imm} field 
+%description.  We see that the 20 {\em imm} bits are re-ordered according to: 
+%[20\textbar10:1\textbar11\textbar19:12].  
+%This means that the \acrshort{msb} of the {\em imm} field is to be placed 
+%into bit 20 of the immediate integer value ultimately used by the instruction 
+%when it is converted into \Gls{xlen} bits.  
+%The next bit to the right in the {\em imm} field is to be placed into bit 10 of 
+%the immediate value and so on.

-The example above shows that the bit positions in the {\em imm} field 
-description.  We see that the 20 {\em imm} bits are re-ordered according to: 
-[20\textbar10:1\textbar11\textbar19:12].  
-This means that the \acrshort{msb} of the {\em imm} field is to be placed 
-into bit 20 of the immediate integer value ultimately used by the instruction 
-when it is converted into \Gls{xlen} bits.  
-The next bit to the right in the {\em imm} field is to be placed into bit 10 of 
-the immediate value and so on.
+%After the {\em imm} bits are re-positioned into bits 20:1 of the immediate value
+%being constructed, a zero-bit will be added to the \acrshort{lsb} 
+%and the value in bit-position 20 will be replicated to sign-extend the 
+%value to \Gls{xlen} bits as discussed in \autoref{extension:slzr}.

-After the {\em imm} bits are re-positioned into bits 20:1 of the immediate value
-being constructed, a zero-bit will be added to the \acrshort{lsb} 
-and the value in bit-position 20 will be replicated to sign-extend the 
-value to \Gls{xlen} bits as discussed in \autoref{extension:slzr}.
-
-
-If \Gls{xlen}=32 then the {\em imm} value example will extracted from the instruction
-and converted as shown in \autoref{Figure:j_type_decode}.
+If \Gls{xlen}=32 then the {\em imm} value will be extracted from the 
+instruction and converted as shown in \autoref{Figure:j_type_decode} to
+form the \verb@imm_j@ value.

 \begin{figure}[ht]
 \centering
@ -424,10 +447,11 @@ and converted as shown in \autoref{Figure:j_type_decode}.
 %\DrawBitBoxSignLeftZeroRightExtendedPicture{32}{11000000110111001001}{1}

 The J-type format is used by the Jump And Link instruction that calculates 
-a target address by adding a signed immediate value to the current program 
+the target address by adding \verb@imm_j@ to the current program 
 counter.  Since no instruction can be placed at an odd address the 20-bit 
 imm value is zero-extended to the right to represent a 21-bit signed offset 
-capable of representing numbers twice the magnitude of the 20-bit imm value.
+capable of expressing a wider range of target addresses than the 20-bit 
+imm value alone.

 \begin{itemize}
 \item\instructionHeader{jal\ \ \ rd,imm}
@ -435,10 +459,30 @@ capable of representing numbers twice the magnitude of the 20-bit imm value.

 Set register \verb@rd@ to the address of the next instruction that would 
 otherwise be executed (the address of the \verb@jal@ instruction + 4) and then
-jump to an address given by the sum of the \verb@pc@ register and the 
-\verb@imm_j@ value as decoded from the instruction shown in \autoref{imm.j:decode}.
+jump to the address given by the sum of the \verb@pc@ register and the 
+\verb@imm_j@ value as decoded from the instruction shown in 
+\autoref{imm.j:decode}.

+Note that \verb@imm_j@ is expressed in the instruction as a target address 
+that is converted to a 21-bit value representing a \verb@pc@-relative offset 
+to the target address. For example, consider the \verb@jal@ instructions in the 
+following code:

+\begin{verbatim}
+00000010: 000002ef  jal    x5,0x10      # jump to self (address 0x10)
+00000014: 008002ef  jal    x5,0x1c      # jump to address 0x1c
+00000018: 00100073  ebreak    
+0000001c: 00100073  ebreak    
+\end{verbatim}
+
+The instruction at address \verb@0x10@ has a target address of \verb@0x10@
+and the \verb@imm_j@ is zero because the offset from the ``current instruction''
+to the target is zero.
+
+The instruction at address \verb@0x14@ has a target address of \verb@0x1c@
+and the \verb@imm_j@ is \verb@0x08@ because \verb@0x1c - 0x14 = 0x08@.
+
+See also \autoref{insnformat:btype}.

 \end{itemize}
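+
+As a hand-worked check of the second \verb@jal@ in the listing above, the 
+word \verb@0x008002ef@ can be split into its J-type fields.  The 
+\verb@inst[h:l]@ and \verb@imm[h:l]@ notation is used here for 
+illustration only.
+
+\begin{verbatim}
+0x008002ef = 0 0000000100 0 00000000 00101 1101111
+
+inst[31]    = 0           -> imm[20]
+inst[30:21] = 0000000100  -> imm[10:1]
+inst[20]    = 0           -> imm[11]
+inst[19:12] = 00000000    -> imm[19:12]
+inst[11:7]  = 00101       -> rd = x5
+inst[6:0]   = 1101111     -> opcode (jal)
+
+imm_j  = 0x00000008       (imm[3] is the only bit set)
+target = 0x14 + 0x08 = 0x1c
+\end{verbatim}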

@ -455,7 +499,8 @@ The R-type instructions are used for operations that set a destination
 register \verb@rd@ to the result of an arithmetic, logical or shift operation
 applied to source registers \verb@rs1@ and \verb@rs2@.

-Note that bit 30 is used to select between the \verb@add@ and \verb@sub@ instructions
+Note that instruction bit 30 (part of the \verb@funct7@ field) 
+is used to select between the \verb@add@ and \verb@sub@ instructions 
 as well as to select between arithmetic and logical shifting.

 \begin{itemize}
@ -547,10 +592,15 @@ value \verb@0xaa55ee11@.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsection{I Type}
 \label{insnformat:itype}
-\DrawInsnTypeITikz{00000000010000011000001110000011}
+%\DrawInsnTypeITikz{00000000010000011000001110000011}

-If \Gls{xlen}=32 then the {\em imm} value example will extracted from the instruction
-and converted as shown in \autoref{Figure:i_type_decode}.
+The I-type instruction format is used to encode instructions with a
+signed 12-bit immediate operand with a range of $[-2048..2047]$,
+an \verb@rd@ register, and an \verb@rs1@ register.
+
+If \Gls{xlen}=32 then the 12-bit {\em imm} value will be extracted from 
+the instruction and converted as shown in \autoref{Figure:i_type_decode}
+to form the \verb@imm_i@ value.

 \begin{figure}[ht]
 \centering
@ -561,10 +611,11 @@ and converted as shown in \autoref{Figure:i_type_decode}.
 \index{imm\protect\_i}
 \end{figure}

-A special case of the I-type used for shift-immediate instructions where 
-the {\em imm} field is used as an immediate value named {\em shamt\_i} 
-representing the number of bit positions to shift as shown in 
-\autoref{Figure:shamt_i_type_decode}.
+A special case of the I-type is used for shift-immediate instructions 
+where only five bits of the imm field are used to represent the number 
+of bit positions to shift as shown in \autoref{Figure:shamt_i_type_decode}. 
+In this variation, the least significant five bits of the imm field are 
+zero-extended to form the \verb@shamt_i@ value.

 \begin{figure}[ht]
 \centering
@ -759,10 +810,15 @@ Therefore if \verb@x17@ = \verb@0x55551111@ then the instruction
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsection{S Type}
 \label{insnformat:stype}
-\DrawInsnTypeSTikz{00000000111100011000100110100011}
+%\DrawInsnTypeSTikz{00000000111100011000100110100011}

-If \Gls{xlen}=32 then the {\em imm} value example will extracted from the instruction
-and converted as shown \autoref{Figure:imm_s_type_decode}.
+The S-type instruction format is used to encode instructions with a
+signed 12-bit immediate operand with a range of $[-2048..2047]$,
+an \verb@rs1@ register, and an \verb@rs2@ register.
+
+If \Gls{xlen}=32 then the 12-bit {\em imm} value will be extracted 
+from the instruction and converted as shown in \autoref{Figure:imm_s_type_decode}
+to form the \verb@imm_s@ value.

 \begin{figure}[ht]
 \centering
@ -836,10 +892,16 @@ then the instruction \verb@sw x12,0(x13)@ will change the memory word at address
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \subsection{B Type}
 \label{insnformat:btype}
-\DrawInsnTypeBTikz{00000000111100011000100011100011}
+%\DrawInsnTypeBTikz{00000000111100011000100011100011}

-If \Gls{xlen}=32 then the {\em imm} value example will extracted from the instruction
-and converted as shown in \autoref{Figure:imm_b_type_decode}.
+The B-type instruction format is used for branch instructions that 
+require an even immediate value that is used to determine the
+branch target address as an offset from the current instruction's
+address. 
+
+If \Gls{xlen}=32 then the 12-bit {\em imm} value will be extracted from 
+the instruction and converted as shown in \autoref{Figure:imm_b_type_decode}
+to form the \verb@imm_b@ value.

 \begin{figure}[ht]
 \centering
@ -850,42 +912,67 @@ and converted as shown in \autoref{Figure:imm_b_type_decode}.
 \index{imm\protect\_b}
 \end{figure}

+Note that \verb@imm_b@ is expressed in the instruction as a target 
+address that is converted to a 13-bit value in the range of 
+$[-4096..4095]$ representing a \verb@pc@-relative offset to the
+target address. For example, consider the branch instructions in
+the following code:
+
+\begin{verbatim}
+00000000: 00520063  beq    x4,x5,0x0    # branches to self (address 0x0)
+00000004: 00520463  beq    x4,x5,0xc    # branches to address 0xc
+00000008: fe520ce3  beq    x4,x5,0x0    # branches to address 0x0
+0000000c: 00100073  ebreak    
+\end{verbatim}
+
+The instruction at address \verb@0x0@ has a target address of zero and
+\verb@imm_b@ is zero because the offset from the ``current instruction''
+to the target is zero.\footnote{This is in contrast to many other
+instruction sets with {\tt pc}-relative addressing modes that express
+a branch target offset from the ``next instruction.''}
+
+The instruction at address \verb@0x4@ has a target address of \verb@0xc@
+and it has an \verb@imm_b@ of \verb@0x08@ because \verb@0x4 + 0x08 = 0x0c@.
+
+The instruction at address \verb@0x8@ has a target address of zero and
+\verb@imm_b@ is \verb@0xfffffff8@ (-8) because \verb@0x8 + 0xfffffff8 = 0x0@.
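+
+As a hand-worked check of the third branch in the listing above, the word 
+\verb@0xfe520ce3@ can be split into its B-type fields.  The 
+\verb@inst[h:l]@ and \verb@imm[h:l]@ notation is used here for 
+illustration only.
+
+\begin{verbatim}
+0xfe520ce3 = 1 111111 00101 00100 000 1100 1 1100011
+
+inst[31]    = 1        -> imm[12]
+inst[30:25] = 111111   -> imm[10:5]
+inst[24:20] = 00101    -> rs2 = x5
+inst[19:15] = 00100    -> rs1 = x4
+inst[14:12] = 000      -> funct3 (beq)
+inst[11:8]  = 1100     -> imm[4:1]
+inst[7]     = 1        -> imm[11]
+inst[6:0]   = 1100011  -> opcode (branch)
+
+imm[12:0] = 1 1 111111 1100 0 = 0x1ff8
+imm_b     = 0xfffffff8   (0x1ff8 sign-extended to 32 bits, -8)
+target    = 0x8 + 0xfffffff8 = 0x0   (modulo 2^32)
+\end{verbatim}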
+
 \begin{itemize}
-\item\instructionHeader{beq\ \ \ rs1,rs2,imm}
+\item\instructionHeader{beq\ \ \ rs1,rs2,pcrel\_13}
 \label{insn:beq}

 If \verb@rs1@ is equal to \verb@rs2@ then add \verb@imm_b@ to the 
 \verb@pc@ register.

-\item\instructionHeader{bge\ \ \ rs1,rs2,imm}
+\item\instructionHeader{bge\ \ \ rs1,rs2,pcrel\_13}
 \label{insn:bge}

 If the signed value in \verb@rs1@ is greater than or equal to the 
 signed value in \verb@rs2@ then add \verb@imm_b@ to the 
 \verb@pc@ register.

-\item\instructionHeader{bgeu\ \ \ rs1,rs2,imm}
+\item\instructionHeader{bgeu\ \ rs1,rs2,pcrel\_13}
 \label{insn:bgeu}

 If the unsigned value in \verb@rs1@ is greater than or equal to the 
 unsigned value in \verb@rs2@ then add \verb@imm_b@ to the 
 \verb@pc@ register.

-\item\instructionHeader{blt\ \ \ rs1,rs2,imm}
+\item\instructionHeader{blt\ \ \ rs1,rs2,pcrel\_13}
 \label{insn:blt}

 If the signed value in \verb@rs1@ is less than the 
 signed value in \verb@rs2@ then add \verb@imm_b@ to the 
 \verb@pc@ register.

-\item\instructionHeader{bltu\ \ \ rs1,rs2,imm}
+\item\instructionHeader{bltu\ \ rs1,rs2,pcrel\_13}
 \label{insn:bltu}

 If the unsigned value in \verb@rs1@ is less than the 
 unsigned value in \verb@rs2@ then add \verb@imm_b@ to the 
 \verb@pc@ register.

-\item\instructionHeader{bne\ \ \ rs1,rs2,imm}
+\item\instructionHeader{bne\ \ \ rs1,rs2,pcrel\_13}
 \label{insn:bne}

 If \verb@rs1@ is not equal to \verb@rs2@ then add \verb@imm_b@ to the