Next: Input Binary File (ELF) Up:  Disassembler using High Level Processor Models Previous: Design and Implementation of

Design and Implementation of Disassembler

A disassembler is a tool that takes a binary file (a relocatable object file, an executable file, etc.) as input and produces the corresponding assembly language program as output. We have designed and implemented a generic symbolic disassembler (referred to simply as the disassembler from here on) which takes an ELF[18] binary file for a processor and generates the assembly language program. The disassembler is generic and processor independent: it takes a processor specification in the IR as a second input. In the output program it refers to locations and functions by symbols rather than by absolute addresses, so the output file resembles the original source from which the binary file was produced. In the output file, assembler directives follow the AT&T format[20], and assembly language instructions follow the syntax given in the processor specification.

The process of disassembly involves reading a binary instruction, searching for it in the instruction set, and generating the corresponding assembly language instruction. The input binary file contains most of the necessary information from the original source file. Even so, disassembly is non-trivial, because a binary file is not designed to be disassembled: assemblers discard much of the information in the original source that is irrelevant to the execution of the program. The greatest problem in disassembly is to distinguish code (instructions) from data, since both are represented as sequences of bytes. Designing a generic disassembler involves further effort, because the information about a processor's instruction set is encoded in the processor specification file. The instruction set must be extracted into a form in which an instruction read from the binary file can be identified easily. In addition, the number of instructions in the instruction set, the length of an instruction, the parameters of an instruction, and so on vary from processor to processor. Processors also differ in how they compute the target address of a jump instruction from the bits available in the instruction, and this affects the design of the disassembler.
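The read, search, and generate cycle described above can be illustrated with a minimal table-driven sketch. Everything in it is an assumption made for illustration: the `InsnSpec` record, the 16-bit big-endian toy instruction set `TOY_ISA`, and the mask/match lookup are hypothetical and are not taken from the processor specification format or the IR used by the disassembler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InsnSpec:
    """Hypothetical description of one instruction format."""
    mnemonic: str
    mask: int      # bits that identify the instruction
    match: int     # value those bits must have
    fields: tuple  # (name, shift, width) for each operand


# Hypothetical 16-bit toy instruction set, not a real processor.
TOY_ISA = (
    InsnSpec("add", 0xF000, 0x1000, (("rd", 8, 4), ("rs", 4, 4), ("rt", 0, 4))),
    InsnSpec("jmp", 0xF000, 0x2000, (("target", 0, 12),)),
)


def decode(word, isa=TOY_ISA):
    """Search the instruction set for a format matching `word`."""
    for spec in isa:
        if (word & spec.mask) == spec.match:
            operands = {
                name: (word >> shift) & ((1 << width) - 1)
                for name, shift, width in spec.fields
            }
            return spec.mnemonic, operands
    return None  # no instruction matches; the word may be data


def disassemble(code, isa=TOY_ISA):
    """Decode a byte string as a sequence of 16-bit big-endian words."""
    lines = []
    for offset in range(0, len(code) - 1, 2):
        word = int.from_bytes(code[offset:offset + 2], "big")
        decoded = decode(word, isa)
        if decoded is None:
            # Emit unrecognised words as raw data.
            lines.append(f".word 0x{word:04x}")
        else:
            mnemonic, operands = decoded
            lines.append(mnemonic + " " + ", ".join(str(v) for v in operands.values()))
    return lines
```

Because the instruction formats live in a table rather than in the decoding loop, supporting another processor in this sketch means supplying a different table, which mirrors the role the processor specification plays for a generic disassembler. A word that matches no format is emitted as data, which is only a crude stand-in for the code/data separation problem discussed above.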

Lastly, a symbolic disassembler is more complex because it uses symbols to refer to locations. When programming, users normally use symbols (names) to refer to variables and functions. Compilers usually retain the names of functions (and sometimes of global variables) in the compiled binary file. However, symbols corresponding to local variables or locations are not retained. The disassembler therefore has to generate a new name for every referenced location whose name is not available in the binary file.
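A minimal sketch of this naming step follows, assuming the referenced addresses have already been collected and that the symbol table has been read into a dictionary from address to name. The function `assign_labels` and the `.L` prefix for generated names are hypothetical choices for illustration, not the scheme used by the disassembler.

```python
def assign_labels(targets, symtab):
    """Map each referenced address to a symbol name.

    targets: iterable of addresses referenced by the code
    symtab:  dict of address -> name recovered from the binary file
    """
    labels = {}
    for addr in sorted(set(targets)):
        # Prefer the name retained in the binary; otherwise generate one.
        labels[addr] = symtab.get(addr, f".L{addr:04x}")
    return labels
```

For example, `assign_labels([0x10, 0x40, 0x10], {0x40: "main"})` keeps the retained name `main` for address 0x40 and generates `.L0010` for address 0x10, which has no entry in the symbol table.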

In this chapter, we describe the algorithm used by the disassembler. The approach adopted is to point out what information is available in the input and how each piece contributes to the generation of the final output.
 



Nihal chand Jain (9711113)

Fri Jan 15 11:17:08 IST 1999