
\chapter{Writing RISC-V Programs}

\enote{Introduce the ISA register names and aliases in here?}%
This chapter introduces each of the RV32I instructions by developing programs
that demonstrate their usefulness.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Use {\tt ebreak} to Stop \rvddt{} Execution}
\index{Instruction!ebreak}
\label{uguide:ebreak}

It is a good idea to learn how to stop before learning how to go!

The \insn{ebreak} instruction exists for the sole purpose of transferring control back
to a debugging environment.\cite[p.~24]{rvismv1v22:2017}

When \rvddt{} executes an \insn{ebreak} instruction, it immediately terminates any
{\em trace} or {\em go} command that is currently executing and returns to the
command prompt without advancing the \reg{pc} register.

The machine language encoding shows that \insn{ebreak} has no operands.

\DrawInsnTypeEPicture{ebreak}{00000000000100000000000001110011}

\listingRef{ebreak/ebreak.out} demonstrates that since \rvddt{} does
not advance the \reg{pc} when it encounters an \insn{ebreak} instruction,
subsequent {\em trace} and/or {\em go} commands will re-execute the same \insn{ebreak}
and halt the simulation again (and again).
This feature is intended to help prevent overzealous users from accidentally
running past the end of a code fragment.\footnote{This was one of the first {\em enhancements}
I needed for myself \tt:-)}

\listing{ebreak/ebreak.S}{A one-line \insn{ebreak} program.}

\listing{ebreak/ebreak.out}{\insn{ebreak} stops \rvddt{} without advancing \reg{pc}.}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Using the \insn{addi} Instruction}
\index{Instruction!addi}
\label{uguide:addi}

\enote{Define what constant and immediate values are somewhere.}%
When the \insn{addi} instruction is executed, it performs the following
steps. It:
\begin{enumerate}
\item Sign-extends the immediate operand.
\item Adds the sign-extended immediate operand to the contents of the \reg{rs1} register.
\item Stores the sum in the \reg{rd} register.
\item Adds four to the \reg{pc} register (so that it points to the next instruction).
\end{enumerate}

In the following example \reg{rs1} = \reg{x28}, \reg{rd} = \reg{x29} and
the immediate operand is -1.

\DrawInsnTypeIPicture{addi x29, x28, -1}{11111111111111100000111010010011}

Depending on the values of the fields in this instruction a number of
different operations can be performed.  The most obvious is that it
can add things.  But it can also be used to copy registers, set a
register to zero and even, when you need to, accomplish nothing.
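
As a brief sketch of these four uses (the register choices and the constant 5
are arbitrary and only for illustration):

{\small
\begin{verbatim}
    addi    x29, x28, 5     # add:     x29 = x28 + 5
    addi    x29, x28, 0     # copy:    x29 = x28
    addi    x29, x0, 0      # zero:    x29 = 0
    addi    x0, x0, 0       # nothing: the sum is discarded
\end{verbatim}
}

Each of these is discussed in more detail below.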

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{No Operation}
\index{Instruction!nop}

It might seem odd but it is sometimes important to be able to execute
an instruction that accomplishes nothing while simply advancing the
\reg{pc} to the next instruction.  One reason for this is to fill
unused memory between two instructions in a program.%
\footnote{This can happen during the evolution of one portion of code
that reduces in size but has to continue to fit into a system without
altering any other code\ldots\ or sometimes you just need to waste
a small amount of time in a device driver.}

An instruction that accomplishes nothing is called a \insn{nop}
(sometimes systems call these \insn{noop}).  The name means
{\em no operation}.
The intent of a \insn{nop} is to execute without having any side effects
other than to advance the \reg{pc} register.

The \insn{addi} instruction can serve as a \insn{nop} by coding it like this:

\DrawInsnTypeIPicture{addi x0, x0, 0}{00000000000000000000000000010011}

This adds zero to zero and discards the sum (because a value can never
be stored into the \reg{x0} register).

The RISC-V assembler provides a pseudoinstruction specifically for this
purpose that you can use to improve the readability of your code.  Note
that the \insn{addi} and \insn{nop} instructions in \listingRef{nop/nop.S}
are assembled into the exact same binary machine instruction,
as can be seen in the
\verb@objdump@ output in \listingRef{nop/nop.lst}
and the \verb@rvddt@ output in \listingRef{nop/nop.out}.

%(The \hex{00000013} you can see are stored at addresses \hex{0} and \hex{4})
%as seen by looking at the \verb@objdump@ listing in \listingRef{nop/nop.lst}.
%In fact, you can see that objdump shows both instructions as a \insn{nop}
%while \listingRef{nop/nop.out} shows that \rvddt{} displays both as
%\verb@addi x0, x0, 0@.

\listing{nop/nop.S}{Demonstrate that \insn{addi} can be used as a \insn{nop}.}

\index{objdump}
\listing{nop/nop.lst}{Using \insn{addi} to perform a \insn{nop}}

\listing{nop/nop.out}{Using \insn{addi} to perform a \insn{nop}}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Copying the Contents of One Register to Another}

By adding zero to one register and storing the sum in another register
the \insn{addi} instruction can be used to copy the value stored in one
register to another register.  The following instruction will copy
the contents of \reg{t4} into \reg{t3}.

\DrawInsnTypeIPicture{addi t3, t4, 0}{00000000000011101000111000010011}

\index{Instruction!mv}
This is a commonly required operation.  To make your intent clear
you may use the \insn{mv} pseudoinstruction for this purpose.

\listingRef{mv/mv.S} shows the source of a program that is dumped in
\listingRef{mv/mv.lst} illustrating that the assembler has generated the
same machine instruction (\hex{000e8e13} at addresses \hex{0} and \hex{4})
for both of the instructions.

\listing{mv/mv.S}{Comparing \insn{addi} to \insn{mv}}

\listing{mv/mv.lst}{An objdump of an \insn{addi} and \insn{mv} Instruction.}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Setting a Register to Zero}

Recall that \reg{x0} always contains the value zero.  Any register
can be set to zero by copying the contents of \reg{x0} using \insn{mv}
(aka \insn{addi}).%
\footnote{There are other pseudoinstructions (such as \insn{li}) that can also
turn into an \insn{addi} instruction.  Objdump might display `{\tt addi t3,x0,0}'
as `{\tt mv t3,x0}' or `{\tt li t3,0}'.}

For example, to set \reg{t3} to zero:

\DrawInsnTypeIPicture{addi t3, x0, 0}{00000000000000000000111000010011}

\listing{mvzero/mv.S}{Using \insn{mv} (aka \insn{addi}) to zero-out a register.}

\listingRef{mvzero/mv.out} traces the execution of the program in
\listingRef{mvzero/mv.S} showing how \reg{t3} is changed from \hex{f0f0f0f0}
(seen on $\ell 16$) to \hex{00000000} (seen on $\ell 26$.)

\listing{mvzero/mv.out}{Setting \reg{t3} to zero.}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Adding a 12-bit Signed Value}


\DrawInsnTypeIPicture{addi x1, x7, 4}{00000000010000111000000010010011}


{\small
\begin{verbatim}
    addi    t0, zero, 4         # t0 = 4
    addi    t1, t0, 100         # t1 = t0 + 100 = 104

    addi    t0, zero, 0x123     # t0 = 0x123
    addi    t0, t0, -1          # t0 = 0x122 (the 12-bit immediate field holds 0xfff)

    addi    t0, zero, -1        # t0 = 0xffffffff (-1)  (diagram out the chaining carry)
                                # refer back to the overflow/truncation discussion in binary chapter

    addi    x0, x0, 0           # no operation (pseudo: nop)
    addi    rd, rs, 0           # copy reg rs to rd (pseudo: mv rd, rs)
\end{verbatim}
}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{todo}

Ideas for the order of introducing instructions.


\section{Other Instructions With Immediate Operands}

\label{uguide:andi}
\label{uguide:ori}
\label{uguide:xori}
\label{uguide:slti}
\label{uguide:sltiu}
\label{uguide:srai}
\label{uguide:slli}
\label{uguide:srli}
{\small
\begin{verbatim}
    andi
    ori
    xori

    slti
    sltiu
    srai
    slli
    srli
\end{verbatim}
}
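
A minimal sketch of how these instructions behave is shown below.
The register choices and constants are arbitrary, and the values in the
comments were worked out by hand rather than taken from an \rvddt{} trace.

{\small
\begin{verbatim}
    addi    t0, zero, 0x5a      # t0 = 0x0000005a
    andi    t1, t0, 0x0f        # t1 = 0x0000000a  (keep only the low 4 bits)
    ori     t2, t0, 0x0f        # t2 = 0x0000005f  (set the low 4 bits)
    xori    t3, t0, -1          # t3 = 0xffffffa5  (invert every bit)

    slti    t4, t0, 100         # t4 = 1  (0x5a = 90 is less than 100, signed)
    sltiu   t5, t0, 10          # t5 = 0  (90 is not less than 10, unsigned)

    slli    t1, t0, 4           # t1 = 0x000005a0
    srli    t2, t3, 4           # t2 = 0x0ffffffa  (zeros shifted in from the left)
    srai    t4, t3, 4           # t4 = 0xfffffffa  (sign bit copied in from the left)
\end{verbatim}
}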

\section{Transferring Data Between Registers and Memory}

RISC-V is a load-store architecture.  This means that the only way that the
CPU can interact with memory is via the {\em load} and {\em store}
instructions.  All other data manipulation must be performed on register
values.

Copying values from memory to a register (first examples using regs set with addi):
\label{uguide:lb}
\label{uguide:lh}
\label{uguide:lw}
\label{uguide:lbu}
\label{uguide:lhu}
{\small
\begin{verbatim}
    lb
    lh
    lw
    lbu
    lhu
\end{verbatim}
}
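
The following sketch illustrates the difference between the sign-extending
and zero-extending loads.  It assumes (purely for illustration) that \reg{t0}
already holds the address of four bytes of memory containing
\hex{bb}, \hex{aa}, \hex{99}, \hex{88} in that order.

{\small
\begin{verbatim}
    lw      t1, 0(t0)           # t1 = 0x8899aabb  (little-endian word)
    lh      t2, 0(t0)           # t2 = 0xffffaabb  (sign-extended halfword)
    lhu     t3, 0(t0)           # t3 = 0x0000aabb  (zero-extended halfword)
    lb      t4, 3(t0)           # t4 = 0xffffff88  (sign-extended byte)
    lbu     t5, 3(t0)           # t5 = 0x00000088  (zero-extended byte)
\end{verbatim}
}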

Copying values from a register to memory:
\label{uguide:sb}
\label{uguide:sh}
\label{uguide:sw}
{\small
\begin{verbatim}
    sb
    sh
    sw
\end{verbatim}
}
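
As a sketch, assuming that \reg{t0} holds the word-aligned address of some
writable memory (an assumption made only for this example), the three store
instructions write the low-order 32, 16 or 8 bits of a register:

{\small
\begin{verbatim}
    addi    t1, zero, 0x123     # t1 = 0x00000123
    sw      t1, 0(t0)           # bytes at t0+0..t0+3 become: 23 01 00 00
    sh      t1, 4(t0)           # bytes at t0+4..t0+5 become: 23 01
    sb      t1, 6(t0)           # byte  at t0+6       becomes: 23
\end{verbatim}
}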

\section{RR operations}
\label{uguide:add}
\label{uguide:sub}
\label{uguide:and}
\label{uguide:or}
\label{uguide:sra}
\label{uguide:srl}
\label{uguide:sll}
\label{uguide:xor}
\label{uguide:sltu}
\label{uguide:slt}
{\small
\begin{verbatim}
    add
    sub
    and
    or
    sra
    srl
    sll
    xor
    sltu
    slt
\end{verbatim}
}
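
A minimal sketch of the register-register operations is shown below.
The operand values are arbitrary and the results in the comments were
worked out by hand.

{\small
\begin{verbatim}
    addi    t0, zero, 12        # t0 = 0x0000000c
    addi    t1, zero, -3        # t1 = 0xfffffffd

    add     t2, t0, t1          # t2 = 0x00000009  (12 + -3)
    sub     t2, t0, t1          # t2 = 0x0000000f  (12 - -3)
    and     t2, t0, t1          # t2 = 0x0000000c
    or      t2, t0, t1          # t2 = 0xfffffffd
    xor     t2, t0, t1          # t2 = 0xfffffff1

    addi    t3, zero, 2         # t3 = 2 (the shift amount)
    sll     t2, t0, t3          # t2 = 0x00000030
    srl     t2, t1, t3          # t2 = 0x3fffffff  (zeros shifted in)
    sra     t2, t1, t3          # t2 = 0xffffffff  (sign bit shifted in)

    slt     t2, t1, t0          # t2 = 1  (-3 < 12 when compared as signed)
    sltu    t2, t1, t0          # t2 = 0  (0xfffffffd > 12 when compared as unsigned)
\end{verbatim}
}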


\section{Setting registers to large values using lui with addi}

\label{uguide:lui}
\label{uguide:auipc}
{\small
\begin{verbatim}
    addi        // useful for values from -2048 to 2047
    lui         // useful for loading any multiple of 0x1000

    Setting a register to any other value must be done using a combo of insns:

    auipc       // Load an address relative to the current PC (see la pseudo)
    addi

    lui         // Load a constant into bits 31:12  (see li pseudo)
    addi        // add a constant to fill in bits 11:0
                    if bit 11 is set then need to +1 the lui value to compensate
\end{verbatim}
}
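
As a sketch of the \insn{lui} and \insn{addi} combination, the two arbitrary
constants below show one case where bit 11 of the low-order part is clear
and one where it is set:

{\small
\begin{verbatim}
    lui     t0, 0x12345         # t0 = 0x12345000
    addi    t0, t0, 0x678       # t0 = 0x12345678

    lui     t1, 0xdeadc         # t1 = 0xdeadc000  (0xdeadb + 1 because bit 11 of 0xeef is set)
    addi    t1, t1, -273        # t1 = 0xdeadc000 + 0xfffffeef = 0xdeadbeef
\end{verbatim}
}

Here $-273$ is the signed value of the 12-bit pattern \hex{eef}.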


\section{Labels and Branching}

Start to introduce addressing here?

\label{uguide:beq}
\label{uguide:bne}
\label{uguide:blt}
\label{uguide:bge}
\label{uguide:bltu}
\label{uguide:bgeu}
\label{uguide:bgt}
\label{uguide:ble}
\label{uguide:bgtu}
\label{uguide:beqz}
\label{uguide:bnez}
\label{uguide:blez}
\label{uguide:bgez}
\label{uguide:bltz}
\label{uguide:bgtz}
{\small
\begin{verbatim}
    beq
    bne
    blt
    bge
    bltu
    bgeu

    bgt rs, rt, offset      # pseudo for: blt rt, rs, offset    (reverse the operands)
    ble rs, rt, offset      # pseudo for: bge rt, rs, offset    (reverse the operands)
    bgtu rs, rt, offset     # pseudo for: bltu rt, rs, offset   (reverse the operands)
    bleu rs, rt, offset     # pseudo for: bgeu rt, rs, offset   (reverse the operands)

    beqz rs, offset         # pseudo for: beq rs, x0, offset
    bnez rs, offset         # pseudo for: bne rs, x0, offset
    blez rs, offset         # pseudo for: bge x0, rs, offset
    bgez rs, offset         # pseudo for: bge rs, x0, offset
    bltz rs, offset         # pseudo for: blt rs, x0, offset
    bgtz rs, offset         # pseudo for: blt x0, rs, offset
\end{verbatim}
}
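
A minimal sketch of a loop built from a label and a conditional branch is
shown below.  The label name \verb@loop@ and the register assignments are
arbitrary choices for this example.

{\small
\begin{verbatim}
    addi    t0, zero, 0         # t0 = running sum
    addi    t1, zero, 1         # t1 = loop counter
    addi    t2, zero, 10        # t2 = last value to add
loop:
    add     t0, t0, t1          # sum = sum + counter
    addi    t1, t1, 1           # counter = counter + 1
    ble     t1, t2, loop        # pseudo for: bge t2, t1, loop
    ebreak                      # t0 = 55 here (1 + 2 + ... + 10)
\end{verbatim}
}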


\section{Jumps}

Introduce and present subroutines but not nesting until introduce stack operations.

\label{uguide:jal}
\label{uguide:jalr}
{\small
\begin{verbatim}
    jal
    jalr
\end{verbatim}
}
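
The following sketch shows one (non-nested) subroutine call.  The label
\verb@twice@ and the use of \reg{a0} to pass a value are assumptions made
for this example; \reg{ra} is the alias for \reg{x1}.

{\small
\begin{verbatim}
    addi    a0, zero, 20        # value to hand to the subroutine
    jal     ra, twice           # ra = address of the ebreak below, then jump to twice
    ebreak                      # a0 = 40 here

twice:
    add     a0, a0, a0          # a0 = a0 + a0
    jalr    zero, 0(ra)         # return to the address saved in ra (pseudo: ret)
\end{verbatim}
}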


\section{Pseudoinstructions}

{\small
\begin{verbatim}
    li   rd,constant lui      rd,(constant >>U 12)+(constant & 0x00000800 ? 1 : 0)
                     addi     rd,rd,(constant & 0xfff)

    la   rd,label
                     auipc    rd,((label-.) >>U 12) + ((label-.) & 0x00000800 ? 1 : 0)
                     addi     rd,rd,((label-(.-4)) & 0xfff)

    l{b|h|w} rd,label
                     auipc    rd,((label-.) >>U 12) + ((label-.) & 0x00000800 ? 1 : 0)
                     l{b|h|w} rd,((label-(.-4)) & 0xfff)(rd)

    s{b|h|w} rd,label,rt               # rt used as a temp reg for the operation (default=x6)
                     auipc    rt,((label-.) >>U 12) + ((label-.) & 0x00000800 ? 1 : 0)
                     s{b|h|w} rd,((label-(.-4)) & 0xfff)(rt)

    call label       auipc    x1,((label-.) >>U 12) + ((label-.) & 0x00000800 ? 1 : 0)
                     jalr     x1,((label-(.-4)) & 0xfff)(x1)

    tail label,rt                      # rt used as a temp reg for the operation (default=x6)
                     auipc    rt,((label-.) >>U 12) + ((label-.) & 0x00000800 ? 1 : 0)
                     jalr     x0,((label-(.-4)) & 0xfff)(rt)

    mv   rd,rs       addi     rd,rs,0

    j    label       jal      x0,label
    jal  label       jal      x1,label
    jr   rs          jalr     x0,0(rs)
    jalr rs          jalr     x1,0(rs)
    ret              jalr     x0,0(x1)
\end{verbatim}
}

\subsection{The {\tt li} Pseudoinstruction}

Note that the {\tt li} pseudoinstruction includes a conditional addition of 1 to the operand
in the {\tt lui} instruction.  This is because the immediate operand in the
{\tt addi} instruction is sign-extended before it is added to \verb@rd@.
If the immediate operand to the {\tt addi} has its most-significant-bit set to 1 then
it will have the effect of subtracting 1 from the operand in the \verb@lui@ instruction.

Consider the case of putting the value {\tt 0x12345800} into register {\tt x5}:

{\small
\begin{verbatim}
    li  x5,0x12345800
\end{verbatim}
}
{\color{red}
A naive (incorrect) solution might be:
{\small
\begin{verbatim}
    lui  x5,0x12345    // x5 = 0x12345000
    addi x5,x5,0x800   // x5 = 0x12345000 + sx(0x800) = 0x12345000 + 0xfffff800 = 0x12344800
\end{verbatim}
}
The result of the above code is that an incorrect value has been placed into x5.
}

To remedy this problem, the value used in the {\tt lui} instruction can be altered
(by adding 1 to its operand) to compensate for the sign-extension in the {\tt addi}
instruction:
{\small
\begin{verbatim}
    lui  x5,0x12346    // x5 = 0x12346000  (note this is 0x12345 + 1)
    addi x5,x5,0x800   // x5 = 0x12346000 + sx(0x800) = 0x12346000 + 0xfffff800 = 0x12345800
\end{verbatim}
}
Keep in mind that the {\em only} time that this altering of the operand in the {\tt lui}
instruction should take place is when the most-significant-bit of the operand in the
{\tt addi} is set to one.

Consider the case where we wish to put the value {\tt 0x12345700} into register {\tt x5}:
{\small
\begin{verbatim}
    lui  x5,0x12345    // x5 = 0x12345000  (note this is 0x12345 + 0)
    addi x5,x5,0x700   // x5 = 0x12345000 + sx(0x700) = 0x12345000 + 0x00000700 = 0x12345700
\end{verbatim}
}
In this example, the sign-extension performed by the {\tt addi} instruction converts
{\tt 0x700} to {\tt 0x00000700} before the addition.

Therefore, the {\tt li} pseudoinstruction must {\em only} increment the operand of the
{\tt lui} instruction when it is known that the operand of the subsequent {\tt addi}
instruction will be a negative number.
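
When writing such an expansion by hand, keep in mind that (as far as I know)
the GNU assembler only accepts an {\tt addi} immediate operand in the range
from $-2048$ to $2047$.  Under that assumption, the 12-bit pattern {\tt 0x800}
from the example above would be written as its signed value:

{\small
\begin{verbatim}
    lui  x5,0x12346    // x5 = 0x12346000
    addi x5,x5,-2048   // x5 = 0x12346000 + 0xfffff800 = 0x12345800
\end{verbatim}
}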


\subsection{The {\tt la} Pseudoinstruction}

The \verb@la@ (and others that use \verb@auipc@ such as
the \verb@l{b|h|w}@, \verb@s{b|h|w}@, \verb@call@, and \verb@tail@) pseudoinstructions
also compensate for a sign-extended negative number when adding a 12-bit immediate
operand. The only difference is that these use a \verb@pc@-relative addressing mode.

For example, consider the task of putting an address represented by the label \verb@var1@
into register x10:

{\small
\begin{verbatim}
00010040     la     x10,var1
00010048 ...                 # note that the la pseudoinstruction expands into 8 bytes
...

         var1:
00010900     .word  999       # a 32-bit integer constant stored in memory at address var1
\end{verbatim}
}
The \verb@la@ instruction here will expand into:
{\small
\begin{verbatim}
00010040     auipc x10,((var1-.) >>U 12) + ((var1-.) & 0x00000800 ? 1 : 0)
00010044     addi  x10,x10,((var1-(.-4)) & 0xfff)
\end{verbatim}
}

Note that \verb@auipc@ will shift the immediate operand to the left 12 bits and then
add that to the \verb@pc@ register (see \autoref{insn:auipc}.)

The assembler will calculate the value of \verb@(var1-.)@ by subtracting the address
of the current instruction (which is expressed as `.') from the address represented
by the label \verb@var1@, resulting in the number of bytes from the current instruction
to the target label\ldots{} which is \verb@0x000008c0@.

Therefore the expanded pseudoinstruction example will become:

{\small
\begin{verbatim}
00010040     auipc x10,((0x00010900-0x00010040) >>U 12) + ((0x00010900-0x00010040) & 0x00000800 ? 1 : 0)
00010044     addi  x10,x10,((0x00010900-(0x00010044-4)) & 0xfff)      # note the extra -4 here!
\end{verbatim}
}
After performing the subtractions, it will reduce to this:
{\small
\begin{verbatim}
00010040     auipc x10,(0x000008c0 >>U 12) + ((0x000008c0) & 0x00000800 ? 1 : 0)
00010044     addi  x10,x10,(0x000008c0 & 0xfff)
\end{verbatim}
}
Continuing to reduce the math operations we get:
{\small
\begin{verbatim}
00010040     auipc x10,0x00000 + 1    # add 1 here because 0x8c0 below has MSB = 1
00010044     addi  x10,x10,0x8c0
\end{verbatim}
}

\ldots and\ldots

{\small
\begin{verbatim}
00010040     auipc x10,0x00001
00010044     addi  x10,x10,0x8c0
\end{verbatim}
}

Note that the \verb@la@ exhibits the same sort of technique as the \verb@li@ in that
if/when the immediate operand of the \verb@addi@ instruction has its most significant
bit set then the operand in the \verb@auipc@ has to be incremented by 1 to compensate.
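
As a check of the arithmetic (using the same {\tt sx()} notation as in the
\verb@li@ discussion), the two instructions do arrive at the address
of \verb@var1@:

{\small
\begin{verbatim}
00010040     auipc x10,0x00001      # x10 = 0x00010040 + 0x00001000 = 0x00011040
00010044     addi  x10,x10,0x8c0    # x10 = 0x00011040 + sx(0x8c0) = 0x00011040 + 0xfffff8c0 = 0x00010900
\end{verbatim}
}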


\section{Relocation}

Because expressions that refer to constants and address labels are common in
assembly language programs, a shorthand notation is available for calculating
the pairs of values that are used in the implementation of things like the
\verb@li@ and \verb@la@ pseudoinstructions (that have to be written to
compensate for the sign-extension that will take place in the immediate operand
that appears in instructions like \verb@addi@ and \verb@jalr@.)

\subsection{Absolute Addresses}

To refer to an absolute value, the following operators can be used:
{\small
\begin{verbatim}
    %hi(constant)    // becomes: (constant >>U 12)+(constant & 0x00000800 ? 1 : 0)
    %lo(constant)    // becomes: (constant & 0xfff)
\end{verbatim}
}

Thus, the \verb@li@ pseudoinstruction can be expressed like this:

{\small
\begin{verbatim}
    li   rd,constant  lui      rd,%hi(constant)
                      addi     rd,rd,%lo(constant)
\end{verbatim}
}
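
The same pair of operators can also be applied to the absolute address of a
label.  The following is a sketch that assumes a label named \verb@var1@ is
defined elsewhere in the program:

{\small
\begin{verbatim}
    lui     t0,%hi(var1)        // t0 = bits 31:12 of the address (adjusted for sign-extension)
    addi    t1,t0,%lo(var1)     // t1 = the address of var1
    lw      t2,%lo(var1)(t0)    // t2 = the word stored at var1
\end{verbatim}
}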


\subsection{PC-Relative Addresses}

The following can be used for PC-relative addresses:
{\small
\begin{verbatim}
    %pcrel_hi(symbol) // becomes: ((symbol-.) >>U 12) + ((symbol-.) & 0x00000800 ? 1 : 0)
    %pcrel_lo(lab)    // becomes: ((symbol-lab) & 0xfff)
\end{verbatim}
}

Note the subtlety involved with the \verb@lab@ on \verb@%pcrel_lo@. It is needed to
determine the address of the instruction that contains the corresponding \verb@%pcrel_hi@.
(The label \verb@lab@ MUST be on a line that uses a \verb@%pcrel_hi()@; otherwise the
assembler will report an error.)

Thus, the \verb@la rd,label@ pseudoinstruction can be expressed like this:
{\small
\begin{verbatim}
xxx:  auipc rd,%pcrel_hi(label)
      addi  rd,rd,%pcrel_lo(xxx)  // the xxx tells pcrel_lo where to find the matching pcrel_hi
\end{verbatim}
}

Examples of using the \verb@auipc@ \& \verb@addi@ together with \verb@%pcrel_hi()@ and
\verb@%pcrel_lo()@:

{\small
\begin{verbatim}
xxx:    auipc   t1,%pcrel_hi(yyy)     // ((yyy-xxx) >>U 12) + ((yyy-xxx) & 0x00000800 ? 1 : 0)
        addi    t1,t1,%pcrel_lo(xxx)  // ((yyy-xxx) & 0xfff)
...
yyy:                                  // the address: yyy is saved into t1 above
...
\end{verbatim}
}


Things like this are legal:
{\small
\begin{verbatim}
label:  auipc   t1,%pcrel_hi(symbol)
        addi    t2,t1,%pcrel_lo(label)
        addi    t3,t1,%pcrel_lo(label)
        lw      t4,%pcrel_lo(label)(t1)
        sw      t5,%pcrel_lo(label)(t1)
\end{verbatim}
}

\section{Relaxation}

\enote{I'm not sure I want to get into the details of how this is done. Just assume it works.}%
In the simplest of terms, {\em Relaxation} refers to the ability of the
linker (not the compiler!) to determine if/when the instructions that
were generated with the \verb@xxx_hi@ and \verb@xxx_lo@ operators are
unneeded (and thus waste execution time and memory) and can therefore
be removed.

However, doing so will result in moving things around
in memory, possibly changing the values of address labels in the
already-assembled program!  Therefore, while the motivation for
relaxation is obvious, the process of implementing it is non-trivial.
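
As one hedged illustration (my understanding of what a relaxing linker can
do, not a description of any specific linker version), consider the
\verb@call@ pseudoinstruction with a hypothetical label \verb@func@:

{\small
\begin{verbatim}
    call    func        # assembled as 8 bytes: an auipc followed by a jalr

                        # if the linker finds that func is close enough to be
                        # reached by a single jal, the pair can be replaced
                        # with one 4-byte instruction:
    jal     x1,func
\end{verbatim}
}

Every label located after the shortened sequence then moves down by four bytes,
which is why the linker must revisit the other address calculations.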

See: \url{https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md}