mirror of
https://github.com/johnwinans/rvalp.git
synced 2025-09-27 05:04:39 -04:00
Add a draft of Arrays and Character Strings
This commit is contained in:
parent
c7ce1c5df7
commit
fd5c875926
@ -953,7 +953,7 @@ be used by the CPU if it were to read them as a multi-byte value.
|
||||
Note that in a little-endian system, the number of bytes used to represent
|
||||
the value does not change the place value of the first byte(s). In this
|
||||
example, the \hex{76} at address \hex{00002658} is the least significant
|
||||
bytes in all representations.
|
||||
byte in all representations.
|
||||
|
||||
In the RISC-V ISA it is noted that ``A minor point is that we have also found
|
||||
little-endian memory systems to be more natural for hardware
|
||||
@ -962,12 +962,94 @@ on big-endian data structures, and so we leave open the possibility of
|
||||
non-standard big-endian or bi-endian systems.''\cite[p.~6]{rvismv1v22:2017}
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\subsection{Character Strings and Arrays}
|
||||
\subsection{Arrays and Character Strings}
|
||||
|
||||
Define character strings and arrays.
|
||||
While endianness defines how single values are stored in memory,
|
||||
the {\em array} defines how multiple values are stored.
|
||||
|
||||
Using the prior memory dump, discuss how and where things are stored and
|
||||
retrieved.
|
||||
An array is a data structure comprised of an ordered set of elements.
|
||||
This text will limit its definition of {\em array} to those sets
|
||||
of elements that are all of the same {\em type}, where {\em type}
|
||||
refers to the size (number of bytes) and representation (signed,
|
||||
unsigned) of each element.
|
||||
|
||||
In an array, the elements are stored adjacent to one another such that the
|
||||
address of any element may be defined as:
|
||||
|
||||
\begin{equation}
|
||||
e = a + n \times s
|
||||
\end{equation}
|
||||
|
||||
Where $n$ is the element number of interest, $e$ is the address of the element
|
||||
of interest, $a$ is the address of the first element in the array, $s$
|
||||
is the size of each element, $a[0]$ is the first element of the array
|
||||
and $a[n-1]$ is the last element of the array.%
|
||||
\footnote{Some computing languages (C, C++, Java, C\#, Python, Perl,\ldots)
|
||||
define an array such that the first element is indexed as $a[0]$.
|
||||
Others (FORTRAN, MATLAB) define the first element of an
|
||||
array to be $a[1]$.}
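The address calculation above can be sketched in C. This is only an illustration under stated assumptions: the function name \verb@element_address@ and the use of \verb@uint32_t@ to model a 32-bit address are hypothetical choices made for this example.

```c
#include <stdint.h>

/* Hypothetical helper implementing e = a + n * s.
 *   a: address of the first element of the array
 *   n: zero-based element number of interest
 *   s: size of each element in bytes
 * Returns e, the address of element n. */
uint32_t element_address(uint32_t a, uint32_t n, uint32_t s)
{
    return a + n * s;
}
```

For example, with $a = $\hex{00002656} and $s = 2$, calling \verb@element_address(0x00002656, 3, 2)@ yields \hex{0000265c}.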
|
||||
|
||||
Using this definition, \listingRef{rvddt_memdump.out}, knowledge that
|
||||
we are using a little-endian machine and given that
|
||||
$a = $\hex{00002656} and $s = 2$, the values of the first 8 elements
|
||||
of array $a$ are:
|
||||
|
||||
\begin{itemize}
|
||||
\item $a[0]$ is \hex{0000} and is stored at \hex{00002656}.
|
||||
\item $a[1]$ is \hex{6176} and is stored at \hex{00002658}.
|
||||
\item $a[2]$ is \hex{3d6c} and is stored at \hex{0000265a}.
|
||||
\item $a[3]$ is \hex{0000} and is stored at \hex{0000265c}.
|
||||
\item $a[4]$ is \hex{0000} and is stored at \hex{0000265e}.
\item $a[5]$ is \hex{0000} and is stored at \hex{00002660}.
\item $a[6]$ is \hex{8480} and is stored at \hex{00002662}.
\item $a[7]$ is \hex{412e} and is stored at \hex{00002664}.
|
||||
\end{itemize}
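The way each two-byte element is assembled from a little-endian memory image might be sketched as follows. The byte values below are reconstructed from the first four elements listed above rather than copied from the dump, and the names \verb@dump@ and \verb@read_le16@ are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* The eight bytes starting at address 0x00002656, reconstructed from
 * the values of a[0] through a[3] listed above (an assumption made
 * for this example). */
static const uint8_t dump[8] = {
    0x00, 0x00,   /* a[0] = 0x0000 */
    0x76, 0x61,   /* a[1] = 0x6176 */
    0x6c, 0x3d,   /* a[2] = 0x3d6c */
    0x00, 0x00    /* a[3] = 0x0000 */
};

/* Hypothetical helper: read element n of an array of 2-byte
 * little-endian values whose first byte is at base. */
uint16_t read_le16(const uint8_t *base, size_t n)
{
    const uint8_t *p = base + n * 2;        /* e = a + n * s with s = 2 */
    return (uint16_t)(p[0] | (p[1] << 8));  /* least significant byte first */
}
```

Here \verb@read_le16(dump, 1)@ combines the bytes \hex{76} and \hex{61} into the value \hex{6176}.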
|
||||
|
||||
In general, there is no fixed rule or notion as to how many
|
||||
elements an array has. It is up to the programmer to ensure that
|
||||
the starting address and the number of elements in any given array
|
||||
(its size) are used properly so that data bytes outside an array
|
||||
are not accidentally used as elements.
|
||||
|
||||
There is, however, a common convention used for an array of
|
||||
characters that is used to hold a text message
|
||||
(called a {\em character string} or just {\em string}).
|
||||
|
||||
When an array is used to hold a string, the element past the last
|
||||
character in the string is set to zero. This is because 1) zero
|
||||
is not a valid printable ASCII character and 2) it simplifies
|
||||
software in that knowing no more than the starting address of a
|
||||
string is all that is needed to process it. Without this zero
|
||||
{\em sentinel} value (called a {\em null} terminator), some knowledge
|
||||
of the number of characters in the string would have to otherwise
|
||||
be conveyed to any code needing to consume or process the string.
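As a sketch of why the sentinel is sufficient, the following C function finds the length of a string given nothing but its starting address. The name \verb@string_length@ is hypothetical; the standard C library function \verb@strlen@ performs the same job.

```c
#include <stddef.h>

/* Hypothetical helper: count the characters that precede the null
 * terminator. Only the starting address of the string is needed. */
size_t string_length(const char *s)
{
    size_t n = 0;
    while (s[n] != 0) {   /* stop at the zero sentinel */
        ++n;
    }
    return n;
}
```

Note that the loop has no upper bound of its own. It relies entirely on the terminator being present.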
|
||||
|
||||
In \listingRef{rvddt_memdump.out}, the 5-byte long array starting
|
||||
at address \hex{00002658} contains a string whose value can be
|
||||
expressed as either of:
|
||||
|
||||
\begin{itemize}
|
||||
\item \verb@76 61 6c 3d 00@
|
||||
\item \verb@"val="@
|
||||
\end{itemize}
|
||||
|
||||
\index{ASCII}
|
||||
\index{ASCIIZ}
|
||||
When the double-quoted text form is used, the GNU assembler used in
|
||||
this text differentiates between {\em ascii} and {\em asciiz} strings
|
||||
such that an ascii string is {\em not} null terminated and an
|
||||
asciiz string {\em is} null terminated.
|
||||
|
||||
The value of providing a method to create a string that is {\em not}
|
||||
null terminated is that a program may define a large string by
|
||||
concatenating a number of ascii strings together and following the
|
||||
last with a byte of zero to null-terminate the lot.
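A loose analogue of this idea exists in C, shown here only as an illustration and not as GNU assembler syntax: adjacent string literals are joined by the compiler and a single zero byte is appended after the last one. The array name \verb@message@ is hypothetical.

```c
/* The two adjacent literals are joined into one string and a single
 * null terminator is appended after the last piece, so the array
 * holds the five bytes 76 61 6c 3d 00. */
static const char message[] = "val" "=";
```

The first piece carries no terminator of its own, which is what allows the pieces to form one continuous string.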
|
||||
|
||||
It is a common mistake to create a string with a missing
|
||||
null terminator. The result of printing such a ``string'' is that
|
||||
the string is printed, along with whatever random data bytes
follow it in memory, until a byte whose value is zero is found
|
||||
by chance.
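One defensive approach, sketched here under the assumption that the caller knows the most bytes the array could hold, is to bound the scan so that it cannot run past the array. The name \verb@bounded_length@ is hypothetical; it is similar in spirit to the POSIX function \verb@strnlen@.

```c
#include <stddef.h>

/* Hypothetical helper: return the number of characters before the
 * null terminator, or limit if no terminator is found within the
 * first limit bytes. The scan never reads past s[limit - 1]. */
size_t bounded_length(const char *s, size_t limit)
{
    size_t n = 0;
    while (n < limit && s[n] != 0) {
        ++n;
    }
    return n;
}
```

A return value equal to \verb@limit@ tells the caller that no terminator was seen in the bytes examined.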
|
||||
|
||||
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
||||
\subsection{Context is Important!}
|
||||
@ -1034,7 +1116,9 @@ natural boundaries.
|
||||
|
||||
Every possible instruction that an RV32I CPU can execute contains
|
||||
exactly 32 bits. Therefore they are always stored on a full word
|
||||
boundary. Any {\em unaligned} instruction would is {\em illegal}.
|
||||
boundary. Any {\em unaligned} instruction is {\em illegal}.%
|
||||
\footnote{This rule is relaxed by the C extension to allow an
|
||||
instruction to start at any even address.\cite[p.~5]{rvismv1v22:2017}}
|
||||
|
||||
An attempt to fetch an instruction from an unaligned address
|
||||
will result in an error referred to as an alignment {\em \gls{exception}}.
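The full word boundary test for an RV32I instruction address can be sketched in C as follows. The name \verb@is_word_aligned@ is hypothetical, and the check does not model the relaxed rule of the C extension.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: an address is on a full word (4-byte) boundary
 * when its two least significant bits are both zero. */
bool is_word_aligned(uint32_t addr)
{
    return (addr & 3u) == 0;
}
```

For example, \hex{00002658} is word aligned while \hex{00002656} is not.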
|
||||
|