BinNavi Logo

The BinNavi Database Format

BinNavi uses an SQL database to store all disassembly data. The BinNavi database format is open and tries to be a standardized way to store disassembled code in SQL databases. This part of the manual describes all necessary tables and gives a few example queries that can be used to read data from the database.

The distinction between exporting data from IDA and converting the exported data into a BinNavi module is reflected in the database too. Each BinNavi database has tables that contain the exported data (the so called raw data) and tables that contain the converted data (the so called BinNavi data). For most users only the layout and content of the raw data tables is important because this is the simpler database format, it is less likely to change in the future, and you can always rely on BinNavi itself to turn raw data into BinNavi data. For this reason only the raw data database format is described in this manual.

For each disassembled module, the exporter creates 14 new tables in the database: ex_?_address_comments, ex_?_address_references, ex_?_basic_blocks, ex_?_callgraph, ex_?_control_flow_graphs, ex_?_data, ex_?_expression_substitutions, ex_?_expression_trees, ex_?_expression_nodes, ex_?_expression_tree_nodes, ex_?_functions, ex_?_instructions, ex_?_operands, and ex_?_sections. The question mark in those table names is a placeholder for a unique ID (like 1, 2, 123, ...) that identifies the module in the database. Furthermore there is one table called modules in the database.

The Database Scheme

The modules table

The modules table contains a list of all raw modules stored in the database.

modules
Column Name Column Type Column Description
id serial NOT NULL Unique identifier of the module. This is the same identifier used for the question mark placeholder in the table names of the module.
name text NOT NULL Name of the module. In most cases this equals the name of the original input file.
architecture character varying(32) NOT NULL Architecture string that identifies the target architecture of the original input file. Right now the strings x86-32, PPC-32, and ARM-32 are officially supported.
base_address bigint The address of the original input file in memory if no relocation operation is applied to the file when it is loaded.
exporter character varying(256) NOT NULL String that identifies the exporter that exported the module.
version integer NOT NULL Version string that identifies the version of the disassembly table schema. This value must always be 3.
md5 character(32) NOT NULL MD5 hash of the original input file of the module.
sha1 character(40) NOT NULL SHA1 hash of the original input file of the module.
comment text Arbitrary comment that can be used to describe the module.
import_time timestamp Timestamp that specifies when the exporter exported the module to the database.

Additional information:

The functions table

The table ex_?_functions contains information about all functions of a module.

ex_?_functions
Column Name Column Type Column Description
address bigint NOT NULL Start address of the function.
name text NOT NULL Name of the function.
has_real_name boolean NOT NULL Flag that indicates whether the function name was generated by the disassembler (false) or whether it is a known function name, for example from an imported library (true).
type integer NOT NULL DEFAULT 0 Identifies the type of the function. Valid values are 0 (Normal Function), 1 (Library Function), 2 (Imported Function), 3 (Thunk Function), 4 (Thunk Adjustor Function), and 5 (Unknown Function).
module_name text In case of imported functions this field contains the name of the dynamic library the function is imported from. In case of functions that are not imported this field is null.

Additional information:

The basic blocks table

The table ex_?_basic_blocks contains information about the basic blocks of the module. Please note that a basic block is not uniquely identified through its address because different functions can share the same basic block.

ex_?_basic_blocks
Column Name Column Type Column Description
id integer NOT NULL Identifier of the basic block.
parent_function bigint NOT NULL Function the basic block belongs to.
address bigint NOT NULL Start address of the basic block.

Additional information:

The control flow graph table

The table ex_?_control_flow_graphs contains information about the branches between the different blocks inside each function.

ex_?_control_flow_graphs
Column Name Column Type Column Description
id serial NOT NULL Identifier of the branch.
parent_function bigint NOT NULL Function the branch belongs to.
source integer NOT NULL Source block of the branch. Note that the basic block identified by this value must be in the same function as the branch itself.
destination integer NOT NULL Destination block of the branch. Note that the basic block identified by this value must be in the same function as the branch itself.
type integer NOT NULL DEFAULT 0 Describes the type of the branch. Valid values are 0 (conditional branch executed), 1 (conditional branch not executed), 2 (unconditional branch), and 3 (switch branch).

Additional information:

The instructions table

The table ex_?_instructions contains all disassembled instructions of the module and the basic blocks they belong to.

ex_?_instructions
Column Name Column Type Column Description
address bigint NOT NULL Address of the instruction.
mnemonic character varying(32) NOT NULL Mnemonic of the instruction.
data bytea NOT NULL Raw data bytes of the instruction.

Additional information:

The operand table

The table ex_?_operands describes all operands of an instruction.

ex_?_operands
Column Name Column Type Column Description
address bigint NOT NULL Identifies the instruction an operand belongs to.
expression_tree_id integer NOT NULL Identifies the expression tree of the operand.
position integer NOT NULL Gives the position of the operand. The value 0 means that the operand is the first operand of the instruction and so on.

Additional information:

The expression trees table

The table ex_?_expression_trees contains the unique identifiers of the expression trees that can appear as operands.

ex_?_expression_trees
Column Name Column Type Column Description
id serial NOT NULL Identifies an expression tree.

The expression nodes table

The table ex_?_expression_nodes contains the expression tree nodes used to build the operand trees for each operand.

ex_?_expression_nodes
Column Name Column Type Column Description
id serial NOT NULL Identifies an expression.
type integer NOT NULL DEFAULT 0 Describes the type of the expression. Valid values are 2 (Integer Literal), 3 (Float Literal), 4 (Operator), 5 (Register), 6 (Size Prefix), and 7 (Memory Dereference).
symbol haracter varying(256) Unless the expr_type value is an integer, this column contains the value of the operand.

Valid values for operators are all kinds of infix operators like '+', '-', and '*', as well as prefix operators like x86 selectors ('ss:', 'cs:', ...).
Valid values for size prefixes are b1 (byte), b2 (word), b4(dword), and b8 (qword).
The only valid value of memory dereferences is '['.
immediate bigint If the expr_type value is an integer literal, this column contains the integer value of the literal.
position integer Specifies the position of the expression in the operand. This is useful to sort members of the expression tree (for example in binary expressions like addition).
parent_id integer Parent expression of the expression or null if the expression is a root expression of the expression tree.

Additional information:

The expression tree nodes table

The table ex_?_expression_tree_nodes specifies what expression tree nodes belong to what expression tree.

ex_?_expression_tree_nodes
Column Name Column Type Column Description
expression_tree_id integer NOT NULL Identifies an expression tree.
expression_node_id integer NOT NULL Identifies an expression tree node.

Additional information:

The expression substitutions table

The table ex_?_expression_substitutions contains alternative strings for operand expressions. This is useful to display constants or variable names instead of raw expressions.

ex_?_expression_substitutions
Column Name Column Type Column Description
id serial NOT NULL Identifies the replacement operation.
address bigint NOT NULL Instruction that contains the expression to be replaced.
position integer NOT NULL Position of the operand that contains the expression to be replaced.
expression_node_id integer NOT NULL Expression to be replaced.
replacement text NOT NULL Alternative text for the expression.

Additional information:

The address references table

The table ex_?_address_references provides information about offsets of the module referenced by instruction operands. Instruction operands can either reference other instructions or arbitrary offsets of the module.

ex_?_address_references
Column Name Column Type Column Description
address bigint NOT NULL Identifies the instruction the operand belongs to.
position integer Identifies the operand that references another address.
expression_node_id integer Identifies the exact expression that references another address.
destination bigint NOT NULL The memory address referenced by the expression.
type integer NOT NULL DEFAULT 0 Describes the reference. Valid values are 0 (conditional branch executed), 1 (conditional branch not executed), 2 (unconditional branch), 3 (switch branch), 4 (direct call), 5 (indirect call), 6 (indirect virtual call), 7 (data), and 8 (data string).

Additional information:

The address comments table

The table ex_?_address_comments contains comments associated with addresses of the module.

ex_?_address_comments
Column Name Column Type Column Description
address bigint NOT NULL Address associated with the comment.
comment text NOT NULL The comment text.

Additional information:

 The callgraph table

The table ex_?_callgraph describes how the individual functions of a module call each other. Each row of the table describes a single function call.

ex_?_callgraph
Column Name Column Type Column Description
serial NOT NULL unsigned int not null Identifies the function call.
source bigint NOT NULL Start address of the function that contains the function call.
source_basic_block_id integer NOT NULL Basic block that contains the function call.
source_address bigint NOT NULL Function call instruction.
destination bigint NOT NULL Start address of the called function.

Additional information:

Example Queries

This section contains some sample queries for working with the database scheme. Some expectations about the disassembled module and how it is stored in the database lead to pitfalls that should be avoided.

Load all module names stored in the database

mysql> select id, name from modules;

+----+-----------------+
| id | name            |
+----+-----------------+
|  1 | NOTEPAD.EXE     |
|  2 | calc.exe        |
|  3 | kernel32.dll    |
+----+-----------------+

Load all function names of calc.exe

mysql> select hex(address), name from ex_2_functions; +--------------+------------------+ | hex(address) | name | +--------------+------------------+ | 1001000 | RegOpenKeyExA | | 1001004 | RegQueryValueExA | | 1001008 | RegCloseKey | | 1001010 | SetBkColor | | 1001014 | SetTextColor | | 1001028 | GetProcAddress | | 1001030 | GlobalAlloc | | 1001034 | GlobalFree | | 1001038 | GlobalReAlloc | | 1001040 | Sleep | ...

Count the number of basic blocks in calc.exe

mysql> select count(distinct(address)) as bbcount from ex_2_basic_blocks;
+---------+
| bbcount |
+---------+
|    2141 |
+---------+

Note that the distinct is necessary because one basic block can belong to more than one function. These blocks should not be counted more than once.

Load instructions of a function

It is possible to load all instructions of a function including all of their operands in just one SQL query.

select
	hex(ex_2_instructions.address),
	mnemonic,
	symbol,
	immediate,
	ex_2_operand_tuples.position,
	ex_2_expression_tree.id,
	ex_2_expression_tree.parent_id,
	ex_2_expression_tree.position
from
  ex_2_basic_blocks,
	ex_2_instructions,
	ex_2_operand_tuples,
	ex_2_operand_expressions,
	ex_2_expression_tree
where
  ex_2_basic_blocks.parent_function = 0x1011569 and
  ex_2_basic_blocks.id = ex_2_instructions.basic_block_id and
	ex_2_instructions.address = ex_2_operand_tuples.address and
	ex_2_operand_expressions.operand_Id = ex_2_operand_tuples.operand_id and
	ex_2_expression_tree.id = ex_2_operand_expressions.expr_id
order by
	ex_2_instructions.address,
	sequence,
	ex_2_operand_tuples.position,
	parent_id
	
+---------+----------+--------+-----------+----------+------+-----------+----------+
| address | mnemonic | symbol | immediate | position | id   | parent_id | position |
+---------+----------+--------+-----------+----------+------+-----------+----------+
| 1011569 | push     | b4     |      NULL |        0 |    1 |      NULL |        0 |
| 1011569 | push     | esi    |      NULL |        0 |   12 |         1 |        0 |
| 101156A | push     | b4     |      NULL |        0 |    1 |      NULL |        0 |
| 101156A | push     | ss:    |      NULL |        0 |    3 |         1 |        0 |
| 101156A | push     | [      |      NULL |        0 |    4 |         3 |        0 |
| 101156A | push     | +      |      NULL |        0 |    5 |         4 |        0 |
| 101156A | push     | esp    |      NULL |        0 |   59 |         5 |        0 |
| 101156A | push     | NULL   |        12 |        0 |   53 |         5 |        1 |
| 101156E | mov      | b4     |      NULL |        0 |    1 |      NULL |        0 |
| 101156E | mov      | esi    |      NULL |        0 |   12 |         1 |        0 |
| 101156E | mov      | b4     |      NULL |        1 |    1 |      NULL |        0 |
...