Chapter 6 External interface to Fsimg and Fsim
6.1 Executable and Linking Format (ELF)
ELF was originally developed and published by UNIX System Laboratories (USL)
as part of the Application Binary Interface (ABI). The Tool Interface Standards committee
(TIS)[3] selected the evolving ELF standard as a portable object file format that works on
32-bit architecture environments for a variety of operating systems.
There are three main types of ELF object files.
- Relocatable file - This type of ELF file holds the code and data for linking with
other object files to create an executable or a shared object file.
- Executable file - This type of ELF file holds a program suitable for program
execution. This file specifies how to create the program's process image.
- Shared Object file - This type of ELF file holds code and data suitable for both
static and dynamic linking.
Linking is done in two ways. In static linking, the link editor (ld) processes
several relocatable and shared object files to create another object file.
In dynamic linking, the dynamic linker combines an executable file with other shared object
files to create a process image.
Object files participate in both program linking and program execution, and the object file
format presents a different view in each of these contexts. The file starts with a
machine-independent header, the ELF header, which describes the organization of the rest of
the file. The linking view consists of a set of sections which hold the information needed for
linking, such as instructions, data, the symbol table and relocation information. A section
header table describes these sections. The execution view consists of a set of segments and a
program header table, which describes how to create a process image.
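As a rough illustration of how the ELF header identifies the three file types listed above, the
following C sketch reads the type field from a raw header. It relies only on the standard layout
(a 16-byte identification block followed by the 2-byte e_type field, with the byte order given in
identification byte 5). The function name elf_file_kind is hypothetical and not part of Fsimg.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: classify an ELF file from the first bytes of its header.
   Returns NULL when the buffer does not hold a recognizable ELF header. */
const char *elf_file_kind(const uint8_t *buf, size_t len)
{
    uint16_t e_type;

    if (len < 18)
        return NULL;                       /* need e_ident (16 bytes) + e_type (2 bytes) */
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return NULL;                       /* magic number mismatch */

    /* e_ident[5] gives the data encoding: 1 = little-endian, 2 = big-endian. */
    if (buf[5] == 1)
        e_type = (uint16_t)(buf[16] | (buf[17] << 8));
    else if (buf[5] == 2)
        e_type = (uint16_t)((buf[16] << 8) | buf[17]);
    else
        return NULL;

    switch (e_type) {
    case 1:  return "relocatable";         /* ET_REL  */
    case 2:  return "executable";          /* ET_EXEC */
    case 3:  return "shared object";       /* ET_DYN  */
    default: return "other";
    }
}
```

Only the header is inspected here. The section header table and the program header table would
have to be read separately to obtain the linking and execution views.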
6.2 Dynamic Library Calls
ELF executable files are of two types, depending on the way they are linked with the library.
A dynamically linked executable file contains references to the library functions which reside
inside the shared object files. The dynamic linker resolves these references while creating the
process image for this executable. This enables many programs to share the same library.
A statically linked executable contains all the code, including
the library functions, and its size may be very large.
Simulation of interactive programs is very difficult because interactive
programs interact with the operating system, devices etc. Typically programs use standard
library functions for this interaction. These library calls can be diverted to the
simulator host's library calls for simulating interactive programs. Dynamically linked
executables are more suitable for this purpose, as the library calls are easy to identify
in their code.
6.2.1 Handling Dynamic Calls
The Fsimg has the capability to identify the dynamic calls in the program and to generate code
for Fsim which diverts these calls to the host system's library calls. Fsimg achieves this
through the external interface. The list of
dynamic calls that may be used by the program can be specified through a configuration file.
In this file, for each dynamic function to be diverted, a corresponding user function is
specified along with its parameters and their sizes.
The user functions substituted for the diverted dynamic calls are linked
with Fsim. Fsimg passes the parameters to user functions by reference. The user function
extracts the parameters and calls the host library function. The return value of the host
library function should be modified, if necessary, and returned by the user function to Fsim.
The handling of the return value is also specified in the configuration file.
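A user function of this kind might look like the following minimal sketch. The name usr_abs and
its signature are assumptions made for illustration, since the exact calling convention expected
by Fsim is not given here. Only the overall pattern (parameter by reference, call to the host
library, result returned to Fsim) follows the description above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user function substituted for a diverted call to abs().
   The single parameter is received by reference. */
int32_t usr_abs(const int32_t *arg)
{
    int32_t value = *arg;               /* extract the parameter */
    int host_result = abs((int)value);  /* call the host library function */
    return (int32_t)host_result;        /* hand the result back to Fsim */
}
```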
The issue of endianness is also important. The simulated processor may have a different
endianness than the simulating host. In such cases the parameters which are passed to the user
function are converted to the simulating host's endianness, and the return value has to be
converted back from the host's endianness to the simulated processor's endianness. Fsimg detects
this mismatch of endianness and generates code for changing the endianness of the
parameters before passing them to the user function.
The return value is converted similarly. For this, Fsimg needs to know the sizes of the
parameters and of the return value, which are optionally specified in the configuration file.
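The mismatch check and the conversion can be sketched in C as follows. This is not the code that
Fsimg generates. The function names are hypothetical, and the host endianness is probed at run
time only to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when the simulating host is little-endian, 0 otherwise. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}

/* Reverses the byte order of one item of `size` bytes in place. */
void reverse_bytes(uint8_t *item, size_t size)
{
    for (size_t i = 0; i < size / 2; i++) {
        uint8_t tmp = item[i];
        item[i] = item[size - 1 - i];
        item[size - 1 - i] = tmp;
    }
}

/* Converts an item only when host and simulated processor disagree.
   The same call serves both directions (parameter and return value). */
void convert_if_mismatch(uint8_t *item, size_t size, int target_is_little)
{
    if (host_is_little_endian() != (target_is_little != 0))
        reverse_bytes(item, size);
}
```

The size argument is what makes the conversion possible, which is why no conversion can be
performed for a parameter whose size is not known.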
In addition, Fsimg needs to know the instructions through which the simulated program
may call dynamic library functions. This information can be given through command
line options. The call instructions typically modify the PC and save the old PC.
Using the destination address of the call instructions, Fsimg generates code for diverting the
calls.
6.2.2 Specifying Dynamic Calls
The configuration file has the following syntax, given as a yacc style grammar. A terminal symbol
starts with an upper case letter and a non-terminal symbol starts with a lower case letter.
function_definitions :
| function_definitions function_definition;
function_definition : Lib_Function => user_function;
user_function : return_variable = Function_Name ( parameters )
| Function_Name ( parameters );
return_variable : Name
| Name [ expr ]
| Name : Size
| Name [ expr ] : Size;
parameters :
| parameter_list;
parameter_list : parameter_definition
| parameter_definition , parameter_list;
parameter_definition : parameter
| parameter : size_info;
parameter : Name
| Name [ expr ];
size_info : parameter_size
| parameter_size $ size_info;
parameter_size : No_Of_Elements - Element_Size;
expr : Name
| Name [ expr ]
| Constant
| expr + expr
| expr - expr
| expr * expr
| expr / expr;
Where,
Name : [a-zA-Z_][a-zA-Z0-9_]*
Constant : [0-9]+
Size : [0-9]+
No_Of_Elements : [0-9]+
Element_Size : [0-9]+
Lib_Function : [a-zA-Z_][a-zA-Z0-9_]*
Function_Name : [a-zA-Z_][a-zA-Z0-9_]*
The grammar specifies that the configuration file is a sequence of function definitions.
Each function definition starts with the name of the dynamic function (the function called
in the binary executable file being simulated)
followed by the token ``=>''. The dynamic function is diverted to the user function specified on
the right side of ``=>''. The dynamic function and the user function take the same number of
parameters. All parameters are specified on the right side.
According to the grammar, the return value specification need not be given if the user function
does not return any value (or if the value is to be discarded).
The optional size information for a variable is given after the variable
name, separated from it by a ``:'' token.
All parameter specifications for the user function
are given within ``('' and ``)''. If the
function does not take any parameters, the parameter information can be left out. A parameter is
given by its name followed by optional size information. Since Fsimg passes parameters by
reference, passing of structures is also possible. When passing structures, the size
information gives the number of structure elements and their respective sizes. The following
example explains this.
myfunc =>
R[0] : 4 = usr_myfunc ( M[SP] : 1-4 $ 1-2 $ 2-4, M[SP+14] : 1-4 )
The above example specifies that the user function ``usr_myfunc'' should be called instead of
simulating a call to the dynamic function ``myfunc''.
The user function takes two parameters from the stack. The return value should be
placed in register R0, whose size is 4 bytes.
The first parameter of the user function is a structure starting at the address given by the
register SP of the processor being simulated.
The structure has four members whose sizes are 4, 2, 4, and 4 bytes respectively. The
syntax x-y stands for x elements of y bytes each, and the $ separates
the declarations of members of different sizes.
The second parameter is a single 4-byte item at address SP+14, immediately after the 14-byte
structure. If all members of a structure are of the same
size then it can be given by a single x-y declaration, for example, 4-2 stands for four members
of two bytes each.
The parameter size information is used by Fsimg to generate code for changing the endianness
of the parameters before passing them
to the user function if the endianness of the simulated processor differs from that of the
simulating host. If no size
information is given, Fsimg does not generate the code to
change the endianness of the parameters.
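To make the role of the size information concrete, the sketch below walks a list of x-y entries
and reverses each element of a structure separately. The struct size_entry type and the function
name are assumptions introduced for illustration; the code that Fsimg actually generates is not
shown in this chapter.

```c
#include <stddef.h>
#include <stdint.h>

/* One x-y entry of the size information: `count` elements of `size` bytes each. */
struct size_entry {
    size_t count;
    size_t size;
};

/* Reverses the byte order of every element of a parameter in place,
   following the $-separated list of x-y entries. */
void swap_by_size_info(uint8_t *param, const struct size_entry *info, size_t entries)
{
    uint8_t *p = param;

    for (size_t e = 0; e < entries; e++) {
        for (size_t k = 0; k < info[e].count; k++) {
            /* reverse one element of info[e].size bytes */
            for (size_t i = 0; i < info[e].size / 2; i++) {
                uint8_t tmp = p[i];
                p[i] = p[info[e].size - 1 - i];
                p[info[e].size - 1 - i] = tmp;
            }
            p += info[e].size;   /* advance to the next element */
        }
    }
}
```

For the first parameter of the example, the list would be {1,4}, {1,2}, {2,4}, covering the
14 bytes of the structure. Reversing the structure as a whole would be wrong, since each member
has to be converted on its own.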
A sample configuration file for PowerPC 603 is shown below.
printf =>
GPR[3] = lib_printf( GPR[3], GPR[4], GPR[5],
GPR[6], GPR[7], GPR[8], GPR[9] )
memcpy =>
GPR[3] = lib_memcpy( GPR[3], GPR[4], GPR[5] )
open =>
GPR[3] = lib_open( GPR[3], GPR[4] )
read =>
GPR[3] = lib_read( GPR[3], GPR[4], GPR[5] )
The parameters to the functions are
passed through registers from GPR3 onwards and the return value is put in GPR3. The size
information for the parameters is not given, so Fsimg does not generate any endianness change
code for them.
In general, parameters passed through registers need no endianness change.
The registers are simulated in the simulator's memory, and the load instructions already
take care of endianness when loading values into them.
Parameters that reside in memory (e.g. on the stack), however, do need the endianness
change, because the memory image which is loaded from the file is in the endianness of the
processor being simulated.
6.3 Fsim Library
The Fsim library contains functions that are needed by Fsim. They implement
certain Sim-nML operators for which no corresponding operator is present in C.
There are also a few other miscellaneous functions.
6.3.1 Sim-nML Operators
The library provides the following functions for Sim-nML operators.
These operators are for integer data types only. In the current implementation of the library,
they do not work with floating-point arguments.
In general, bit-level operators are rarely used on floating-point data types.
- OpLeftRotate - This function implements the Sim-nML left rotate operator
``<<<''.
- OpRightRotate - This function implements the Sim-nML right rotate operator
``>>>''.
- OpBitField - This function implements the Sim-nML bit-field select
operator ``< lsb .. msb >''.
- OpSetBitField - This function implements the Sim-nML bit-field operator on
the left side of the expression for setting selected bits.
- OpExp - This function implements the Sim-nML exponentiation operator
``**''.
- OpBitConcat - This function implements the Sim-nML bit-concatenation
operator ``mem1 :: mem2''.
- OpSetBitConcat - This function implements the Sim-nML bit-concatenation
operator on the left side of the expression.
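The semantics of these operators can be sketched in C for values of up to 32 bits as shown
below. The function names, parameter lists and the explicit width arguments are assumptions made
for this illustration. The actual signatures of the library functions listed above are not given
in this chapter and may differ.

```c
#include <stdint.h>

/* Mask with the low `width` bits set (width in 0..32). */
uint32_t low_mask(unsigned width)
{
    return width >= 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* Rotate a `width`-bit value left by `count` bits (width in 1..32). */
uint32_t rotate_left(uint32_t value, unsigned count, unsigned width)
{
    value &= low_mask(width);
    count %= width;
    if (count == 0)
        return value;
    return ((value << count) | (value >> (width - count))) & low_mask(width);
}

/* Rotate a `width`-bit value right by `count` bits (width in 1..32). */
uint32_t rotate_right(uint32_t value, unsigned count, unsigned width)
{
    count %= width;
    return rotate_left(value, (width - count) % width, width);
}

/* Select bits lsb..msb of value (0 <= lsb <= msb <= 31). */
uint32_t bit_field(uint32_t value, unsigned lsb, unsigned msb)
{
    return (value >> lsb) & low_mask(msb - lsb + 1);
}

/* Return value with bits lsb..msb replaced by the low bits of `bits`. */
uint32_t set_bit_field(uint32_t value, unsigned lsb, unsigned msb, uint32_t bits)
{
    uint32_t mask = low_mask(msb - lsb + 1) << lsb;
    return (value & ~mask) | ((bits << lsb) & mask);
}

/* Concatenate: `high` occupies the bits above the `low_width` bits of `low`. */
uint64_t bit_concat(uint32_t high, uint32_t low, unsigned low_width)
{
    return ((uint64_t)high << low_width) | (low & low_mask(low_width));
}

/* Integer exponentiation by repeated squaring (wraps modulo 2^32). */
uint32_t int_pow(uint32_t base, uint32_t exponent)
{
    uint32_t result = 1;
    while (exponent != 0) {
        if (exponent & 1u)
            result *= base;
        base *= base;
        exponent >>= 1;
    }
    return result;
}
```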
6.3.2 Miscellaneous Functions
There are a few miscellaneous functions needed by Fsim. The first one is
InitMem, which initializes the memory before the start of the simulation. The second
one is EndianChange, which is used for changing the endianness of data.
If the Sim-nML specification contains any canonical functions, then the user has to provide
those functions in the library as well.
6.4 Input Information
The Fsimg needs some information regarding the specification to generate code for Fsim.
This information is given through command line options.
Stack Size and its Direction of growth
The stack size for the program is given through the command line option `-S size',
where size is in kilobytes. By default the stack grows from higher addresses to
lower addresses. This can be changed by giving a negative value for size. If this option is
not given, the default stack size and direction are used.
Program Counter, Stack Pointer and Current Instruction Pointer
The program counter (PC) is the variable name used in the Sim-nML description for the
program counter of the processor. The stack pointer (SP) is the variable name used for the
stack pointer. Some processors do not have a special register for the stack pointer. In such
cases the compiler uses one of the general purpose registers as the stack pointer, and
the variable name of that register should be given. The current instruction
pointer is a
dummy program counter used for branch instructions which use the current instruction address,
as described in section 3.3.
This information is needed for correct code generation. It can be given by the
following command line options.
- -p program_counter_name
- -s stack_pointer_name
- -P current_instruction_pointer_name
Call instructions and Configuration file
The information about the call instructions can be given with the option
`-f call-instruction-node', where call-instruction-node is the top
and-rule or or-rule node for call instructions in the specification.
The configuration file for dynamic functions can be given with the `-c file' option.
6.5 Constraints
Fsimg has certain limitations, which impose some restrictions on writing
specifications for it.
6.5.1 Writing Specification for Fsimg
Sim-nML offers specification writers many features, some of which are difficult to
implement. Here we discuss the restrictions on writing specifications for Fsimg.
Data Types
- In Sim-nML the specification writer can use data types of any length. Fsimg
allows lengths only up to the maximum size supported by the simulating host.
For example, if one declares an integer of 128 bits and the simulating host supports
only 64-bit integers, then Fsimg would not allow this.
- Bit operations are not allowed on floating-point data types.
- The enumerated data type of Sim-nML is not supported.
Operators
When the bit-concatenation operator is used on the left side of an expression, the sizes of
its operands should be natural sizes of the simulating host's data types. That is, if
the machine supports 8, 16 and 32-bit integers, then the arguments should be of one of these
sizes only.
Fsimg disallows the use of the bit-field operator and the bit-concatenation operator at the same
time on the left side of an expression. The following code shows a situation which is
not allowed by Fsimg.
mem VAR1 [ 1 , card ( 8 ) ]
mem VAR2 [ 1 , card ( 16 ) ]
mem VAR3 [ 1 , card ( 16 ) ]
VAR2 < 0 .. 7 > :: VAR1 = VAR3;
Aliases
Fsimg allows the use of aliases in a restricted manner. Only byte level aliases are
supported. In other words, the size of an alias should be a multiple of 8 bits
(8, 16 etc.) and the location
to which it is aliased should be byte aligned.
The following example shows various valid and invalid alias definitions.
reg AC [ 1 , card ( 32 ) ]
reg BX [ 1 , card ( 20 ) ]
mem ALIAS_1 [ 1 , card ( 8 ) ] alias = AC [ 7 ]
mem ALIAS_2 [ 1 , card ( 8 ) ] alias = AC [ 11 ]
mem ALIAS_3 [ 1 , card ( 4 ) ] alias = AC [ 3 ]
mem ALIAS_4 [ 1 , card ( 8 ) ] alias = BX [ 7 ]
In the above example two registers, AC and BX, are declared with sizes of 32 bits and
20 bits respectively.
Four variables are declared which are aliased to these registers. ALIAS_1 is a valid alias
definition because its size is one byte and it is aliased to the least significant byte
of AC, whose size is a multiple of the byte size. ALIAS_2 is not a valid alias definition because
it is aliased to a location that is not byte aligned.
ALIAS_3 is also not valid because its size is 4 bits and it is
aliased to an invalid position. ALIAS_4 is also not valid because it is aliased to BX, whose
size is not a multiple of the byte size.
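The alias restrictions can be summarized as a small check, sketched here in C. The sketch
assumes that the index in `alias = REG [ n ]' names the most significant bit position covered by
the alias. This reading is inferred from the four examples above (ALIAS_1 covers bits 0 to 7 of
AC) and is not stated explicitly. The function name alias_is_valid is hypothetical.

```c
/* Returns 1 when an alias of `alias_bits` bits, whose top bit is `top_bit`
   of a target of `target_bits` bits, satisfies the byte-level restrictions. */
int alias_is_valid(unsigned alias_bits, unsigned target_bits, unsigned top_bit)
{
    if (alias_bits == 0 || alias_bits % 8 != 0)
        return 0;                 /* alias size must be a multiple of 8 bits */
    if (target_bits % 8 != 0)
        return 0;                 /* target size must be a multiple of 8 bits */
    if ((top_bit + 1) % 8 != 0)
        return 0;                 /* aliased location must be byte aligned */
    if (top_bit >= target_bits || top_bit + 1 < alias_bits)
        return 0;                 /* alias must lie inside the target */
    return 1;
}
```

Under this reading the four definitions evaluate as described: (8, 32, 7) is accepted, while
(8, 32, 11), (4, 32, 3) and (8, 20, 7) are rejected.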
Load and Store Instructions
The main memory used in a Sim-nML description is typically byte addressable.
However, Sim-nML allows multi-byte items to be accessed with a single address. The following
example explains this.
type byte = int ( 8 )
mem M [ 2**16 , byte ]
reg REG [ 1 , int ( 16 ) ]
mem EA [ 1 , card ( 16 ) ]
M [ EA ] = REG;
Although M is byte addressable according to its declaration, the effect of the above
statement is to store the contents of REG in two consecutive bytes starting from EA.
This brings in the issue of endianness even in the specification.
Fsimg does not support this feature currently. Hence the specification writer
has to write the code for storing these bytes separately.
The registers of a machine can be regarded as always big-endian, but the memory endianness
depends on the machine. The byte order may therefore have to be changed whenever a multi-byte
item is loaded from memory into a register or stored from a register to memory. In general this
can be done in one of two places: dynamically by the simulator during simulation,
or in the specification itself, in which case the
simulator need not be concerned with the conversion. The first method imposes a large overhead
on the simulator, which has to keep track of many things and perform the conversion for each
memory access.
The second method removes this overhead from the simulator. Since the specification writer
knows the endianness of the processor, the writer can take care of it while writing the
specification for load and store instructions. Fsimg requires the specification writer to
take care of endianness.
This can be achieved by using byte level aliases or bit operators. For floating-point
load and store instructions, the use of aliases is the only possibility.
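The byte-wise store that the specification writer has to express for the 16-bit example can be
pictured in C as follows. This is only an illustration of the two possible byte orders, not
Sim-nML code and not code produced by Fsimg. The function names are hypothetical, and mem is
assumed to point to the 2**16 bytes of the simulated memory M.

```c
#include <stdint.h>

/* Store a 16-bit register at address ea for a big-endian processor:
   the most significant byte goes to the lower address. */
void store16_big_endian(uint8_t *mem, uint16_t ea, uint16_t reg)
{
    mem[ea] = (uint8_t)(reg >> 8);
    mem[(uint16_t)(ea + 1)] = (uint8_t)(reg & 0xFF);   /* address wraps at 2**16 */
}

/* Store a 16-bit register at address ea for a little-endian processor:
   the least significant byte goes to the lower address. */
void store16_little_endian(uint8_t *mem, uint16_t ea, uint16_t reg)
{
    mem[ea] = (uint8_t)(reg & 0xFF);
    mem[(uint16_t)(ea + 1)] = (uint8_t)(reg >> 8);
}
```

In a specification, the two bytes would be selected with bit-field operators or byte level
aliases and assigned to M in the order required by the processor.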
Program Termination
To terminate the simulation gracefully, the simulated program has to call exit() at
the end. Further, the behaviour of exit() should be specified in the configuration file.
In the absence of a call to the exit() library function, the generated Fsim exhibits
unpredictable behaviour.