Initialisation

Finding the initialisation file

One of the first things BE does is to find and load the initialisation file, and this tells BE the layout of various file formats and the structures within them.

Under 32 bit OS/2, Windows and 32 bit DOS, BE finds the initialisation file by searching along the path for the .EXE file, and then looking for a .INI file with the same name.

BE for NetWare looks for a .INI in the same directory as the .NLM file.

Under UNIX, BE looks for ~/.berc, and failing that, it looks along the path for be and then appends .ini. If be is renamed to xx, then the files will be ~/.xxrc and xx.ini.

BE can be made to look elsewhere using the -i command line option.

By the time the initialisation file is processed, any symbol files specified on the command line will have been loaded, along with any data files. This means that initialisation files may make reference to symbols and also to the data itself.

Processing the initialisation file

As BE processes the initialisation file, it collects warnings (such as 'undefined symbol table symbol') and error messages into an internal buffer. If there are no errors, then this buffer is discarded. If there are errors, then all the warnings and errors are listed, and BE aborts. BE only generates warnings for the first 100 undefined symbols.

Conditional processing

In addition, $define, $undef, $ifdef, $ifndef, $else, $endif and $error are supported, as a form of pre-processing/conditional processing. The -D command line option may be used to pre-$define such conditional processing symbols.

It should be noted that $define, $undef, $ifdef and $ifndef can all be given a list of symbols (rather than just one). This causes $define or $undef to define or undefine all the symbols in the list. It causes $ifdef or $ifndef to check that all the symbols in the list are defined, or that they are all undefined.
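
For example, a minimal sketch of the list form :-

$define DEBUG VERBOSE     // defines both DEBUG and VERBOSE
$ifdef DEBUG VERBOSE
  // taken only if DEBUG and VERBOSE are both $defined
$endif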

De Morgan's law can be used to achieve OR combinations :-

$ifndef A B C
  // None of A, B or C is $defined
$else
  // This part is therefore if A, B or C is $defined
$endif

Platform specific $defines

The following pre-$defines allow you to write initialisation files with sensible defaults, relevant for the current environment :-

If you are running on        then this is pre-$defined
32 bit DOS                   DOS
32 bit OS/2                  OS2
NetWare                      NETWARE
Windows                      WIN32
any kind of UNIX             UNIX
Linux                        LINUX
AIX                          AIX
HP-UX                        HP
SunOS                        SUN
Cygwin                       CYGWIN
MacOSX                       MACOSX
a system where BE is 64 bit  BE64
a little-endian system       LE
a big-endian system          BE

BE is 64 bit on systems with C++ compilers supporting long long or __int64 data types. On such systems, BE's numbers and its address space are 64 bit. This is basically all platforms except OS/2.

Lexical elements

This initialisation file may contain C or C++ style comments.

Numbers may be given in binary, octal, decimal or hex, as the following examples, all of which represent 13 decimal :-

0b1101, 0o15, 13, 0x0d

Numbers may also be given in character form. Multiple characters may be given to form a number, and this is quite handy because sometimes files/datastructures use magic numbers which are formed out of characters so as to be eye-catching. Characters may be quoted, similar to traditional C/C++ style :-

'a'    = 0x61
'ab'   = 0x6162
'abc'  = 0x616263
'abcd' = 0x61626364
'\n'   = 10
'\x34' = 0x34           always 2 hex digits after \x
'\040' = 32             always 3 octal digits after \ (unlike C/C++)
'\0'                    isn't legal, must be 3 octal digits
'\000'                  isn't legal, 0 is the string terminator

Strings may be given in traditional C/C++ style too :-

"Hello World"
"One line\nAnother line"
"String with a tab\tin the middle"
"String with funny character at the end\x9a"
"String using octal \377 notation to get character 255 in it"
"String with \000 string terminator in it"                isn't legal
"String which starts on one line \
and continues on another"

Strings can be no more than 250 characters long.

Note that all strings used in the BE initialisation file must be 'clean', in that they can only contain the regular ASCII characters within the range ' ' to '~' (ie: 32 to 126 inclusive). Given this, you may ask why BE allows the escaped character notation: Well, strings can also be typed in by the user when interactively editing data, and it is very useful to allow a way to type non-ASCII characters.

When displaying strings, BE typically makes best possible use of the terminal in use, and may show the glyphs for unusual non-ASCII characters if it can. However, non-displayable characters are simply shown as '.'s.

Identifiers start with an underscore or a letter, and continue with more underscores, letters or digits. Some identifiers are actually reserved words in the BE initialisation file language.

The fact that NULs aren't allowed within BE strings is a rather irritating side effect of the way BE is implemented using traditional C/C++ NUL terminated strings. Perhaps one day I'll fix this.

BE has a way of allowing you to build up identifiers, strings and buffers in pieces, using the concat concatenation operator.

set ABCDEF 123
set ABC concat DEF 123   // is the same as the above

In addition, the id2str function allows a way of making a string from an identifier.

Numbers

Wherever the initialisation file calls for a number, the following variants may be used :-

number
Just a number, as given in the Lexical Elements section above.
addr "symbolinthesymboltable"
if a symbol table is loaded and the symbol can be found, then the result is the numeric value of the symbol. Otherwise a warning is generated, and the result is the value of the constant nosym, or if that isn't defined, ~0.
sizeof DEFN
this gives the size in bytes of the earlier defined definition called DEFN. If DEFN isn't already defined, then an error results.
offsetof DEFN "fieldname"
this gives the offset in bytes of the given field in the earlier defined DEFN. If DEFN isn't already defined, then an error results. If the field can't be found in the DEFN, then an error results.
valof "fieldname"
When displaying a list of fields within a definition, you can refer to the value of a field using this notation.
map MAPNAME "mapletstring"
this gives the numeric value that corresponds to the given string defined in the map definition, as explained below.
identifier
BE tries the following steps in order to find a value for the identifier :-
  1. If you are displaying a list of fields within a definition, then BE looks to see if the identifier matches a numeric field.
    ie: BE tries valof "identifier".
  2. Then BE scans its internal list of numeric constants (which can be defined with the set command).
  3. Then BE will then try to look up the name in the symbol table.
    ie: BE tries addr "identifier".
  4. After this, BE will scan all the map definitions to see if the identifier matches exactly one maplet name.
    ie: BE tries map M "identifier" for all M. This step isn't particularly quick, as there can be a very large number of mappings.
Obviously you will never be able to refer to a field, symbol or maplet with white space in its name by this shorthand mechanism. Use addr, valof or map to do this. Also, using the explicit forms is more efficient, as BE needn't look in all the possible places, as it does above. The reason BE looks in all the places when just the identifier is given is to reduce typing when using BE interactively.
` identifier expression `
The value of the identifier if defined, or the value of the expression if not.
. (dot)
When defining a DEFN, dot evaluates to the current offset. When prompted for an address in the @ command, it is the current address. When prompted for a delta value, dot is the current delta. When using the = to change a numeric value, dot is the current value. When specifying a value in a maplet, dot means the previous value plus one, or zero if this is the first maplet. When specifying a mask in a maplet, dot means the maplet value.
[ type bits attributes ; address ; defaultvalue ]
This tries to fetch a numeric datum of the given type (eg: n32), to take into account the given attributes (eg: signed be), from the given address. If nothing can be fetched from that address, then the result is the defaultvalue. If the defaultvalue is omitted, then the expression cannot be evaluated.
[[ buff ; e0 ; e1 ; e2 ; defaultvalue ]]
This loops through some addresses trying to match the pattern specified. The loop is basically of the form for ( a = e0; a != e1; a += e2 ) match(buff,a). Be careful using this: there is no way to abort the scan. If the search doesn't locate the pattern, then the result is the defaultvalue, unless it has been omitted, in which case the expression cannot be evaluated.
strlen(ADDR)
This loops starting at address ADDR fetching bytes until it fetches a 0 byte. It returns the number of bytes prior to the 0. Consider it analogous to the C/C++ strlen function.
sym_base(EXPR)
Looks up the expression in the symbol table. If the address can be converted into symbol+offset form, this returns the address of the symbol, without the offset. Otherwise the expression cannot be evaluated.
sym_offset(EXPR)
Looks up the expression in the symbol table. If the address can be converted into symbol+offset form, this returns the offset from the symbol. Otherwise the expression cannot be evaluated.
sum(type bits attributes , address , n , step , defaultvalue )
Computes the sum of the n numeric values, starting at the given address, stepping by step. If the step is omitted, it defaults to the size of the type (1 byte for n8, 2 for n16 etc.). If the defaultvalue is given and all the memory cannot be fetched, it is returned as the result; otherwise an error results. This is great for getting BE to calculate checksums.
xor(type bits attributes , address , n , step , defaultvalue )
Like sum, except it computes the xor of the values. This is great for getting BE to calculate LRC values. Sorry, no CRC or ECC checking features, yet.

Note that the semicolons in the [ and [[ expressions can be omitted, although this is not recommended. Consider the expression [n32 0xf000 -5]: this looks like it means 'the 32 bit word from address 0xf000, or -5 if it can't be fetched', but it actually means 'the 32 bit word at 0xeffb, with no default if it can't be fetched'. Writing [n32 0xf000 (-5)] would fix this problem, but using semicolons or commas makes the intention explicit.

It should be noted that when using the offsetof or map keywords, leading and trailing space is not significant in the "mapletstring" or "fieldname".

Expressions may be constructed by use of brackets and also the following operators, with usual C language meanings. Operators grouped together have equal precedence. Higher precedence operators listed first :-

+, -, ~, !    unary plus, unary minus, complement, not
*, /, %       multiply, divide, modulo
+, -          add (plus), subtract (minus)
<<, >>, >>>   shift left, shift right (signed), shift right (unsigned) [Note 1]
>, <, >=, <=  greater than, less than, greater than or equal, less than or equal
==, !=        equal, not equal
&             bitwise AND
^             bitwise exclusive OR
|             bitwise inclusive OR
&&            logical AND
^^            logical exclusive OR [Note 2]
||            logical inclusive OR
? :           conditional expression

Note 1: The >> is a signed shift right, and >>> is the unsigned shift right (much like Java). This distinction is necessary as all numbers in BE expressions are unsigned. (This affects the outcome of expressions like -2/2, which is 0xfffffffffffffffe/2, which is 0x7fffffffffffffff, rather than the -1 you might expect.)

Note 2: C/C++ does not have a logical exclusive OR, but BE does for symmetry.

Note also that the operator precedence now matches that of C++. Versions of BE prior to 1/7/99 had incorrect precedence for the shift operators. Luckily people tend to use brackets with these anyway.

When an identifier or valof identifier is used to refer to a field in a definition, which is an array, the [ array-subscript ] suffix may be used to pick which element.

When an identifier or valof identifier is used to refer to a field in a definition, which is a definition itself, the . sub-field suffix may be used to refer to a nested field.

When an identifier or valof identifier is used to refer to a field in a definition, which is a pointer to some other definition, the -> sub-field suffix may be used to refer to a field in the definition pointed to.

Such numeric expressions can also be used when BE prompts for a number, not just in the initialisation file.

Some example expressions :-

Expression Explanation
addr "tablebase" + 4 * sizeof RGB
symbol tablebase plus four times the size of the RGB definition
[ n32 be ; 0x70200+0x44 ] + 27
fetch big-endian 32 bit word from 0x70244, then add 27
[ n16 be bits 11:4 ; 0x1000 ]
get big-endian 16 bit word from 0x1000, extract bits 11 to 4 inclusive
if the word was 0x1234, this would give a result of 0x23
[[ "SIGNATURE" ; 0x1000 ; 0x2000 ; 4 ]]
locate "SIGNATURE" between 0x1000 and 0x2000, 4 byte aligned
sum(n32 be, 0x1000, 3)
sum of the 3 big-endian 32 bit words at 0x1000, 0x1004 and 0x1008
xor(n8, 0x2000, 3, 2, 0x55)
xor of the 3 bytes at 0x2000, 0x2002 and 0x2004
if a byte cannot be fetched, the result is 0x55
xor(n16 le bits 15:12, 0x3000, 2, 0x70)
xor top 4 bits of two words
[ n16 le bits 15:12 ; 0x3000 ] ^
[ n16 le bits 15:12 ; 0x3070 ]
xor top 4 bits of two words, ie: same as the above
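
A few more sketches, this time of the symbol helper functions; the symbol names used here are assumed purely for illustration :-

strlen(addr "version_string")
length of the NUL terminated string at symbol version_string
sym_base(addr "table" + 0x10)
the address of symbol table (assuming table is the nearest symbol, the +0x10 displacement is stripped)
sym_offset(addr "table" + 0x10)
0x10, the displacement from symbol table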

Set constant

BE maintains a smallish list of global numeric constants. eg:

set num_elements 14+5

Avoid using constant names which clash with other identifiers, such as map or structure definition names. Also, avoid clashing with reserved words in the initialisation file language.

The constant can be assigned any numeric expression, including referencing other constants.

This feature allows initialisation files with the following technique for managing multiple configurations of data :-

$ifdef BIG_DATA_FILE
set n_entries 100
$else
set n_entries 10
$endif

def DATA_RECORD
  {
  n_entries n32 buf 100 asc "names"
  n_entries n32 dec         "salaries"
  }

Attempting to set a constant which is already defined produces an error.

The unset command can be used to undefine a previous value. It is not an error to unset a constant which is not previously set to anything :-

set elems 100
unset elems
set elems 200

The -S command line flag can be used to set a constant before the initialisation file is processed. Because the constant is set before the initialisation file is processed, the expression the constant is set to can't refer to things within the initialisation file. Assuming the initialisation file debinfo.ini uses a constant called tabsize :-

be -i debinfo.ini -S tabsize=10   debug.dat              is fine
be -i debinfo.ini -S tabsize=10+4 debug.dat              is fine
be -i debinfo.ini -S "tabsize=sizeof STRUCT" debug.dat   is illegal

The value of a constant may be interactively set, changed or unset by the user using the $ keystroke.

The special constant nosym, if set, is returned when the addr "symbol" syntax is used in an expression, to try to determine the numeric value of a symbol which isn't defined. The usual use of this is in defining a value which is miles away from any sensible value.

The special constant disp_limit, if set, affects the way BE displays address values in symbol+offset form. If the offset (ie: the displacement) from the symbol exceeds the disp_limit value, then the address isn't displayed in symbol+offset form.
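
For example, a sketch in which both constants are set (the values chosen here are only illustrative) :-

set nosym      0xdeadbeef   // obviously-bogus value returned for undefined symbols
set disp_limit 0x1000       // don't show symbol+offset once the offset exceeds 0x1000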

Data display attributes

When the program starts parsing the initialisation file, the default data display attributes are le unsigned hex nomul abs nonull nocode nolj noglue noseg nozterm.

To change this default setting, just include one or more of the following keywords in the file :-
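
For instance, a minimal sketch which makes big-endian signed decimal the global default :-

be signed dec    // subsequent fields default to big-endian signed decimal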

Note that when multibyte numeric values are displayed in ASCII or EBCDIC, the ordering of the characters produced works like this :-

Type  Sample value        Displays in ASCII
n8    0x41                'A'
n16   0x4142              'AB'
n24   0x414243            'ABC'
n32   0x41424344          'ABCD'
n40   0x4142434445        'ABCDE'
n48   0x414243444546      'ABCDEF'
n56   0x41424344454647    'ABCDEFG'
n64   0x4142434445464748  'ABCDEFGH'

Support for >32 bit numbers is only present in certain operating system versions of BE.

This can have the side effect that when people design eye-catcher values as numbers to store into memory, they may appear reversed when displayed. In such cases, it might make more sense to decode the field as an N byte ASCII buffer, rather than a number. Alternatively, use the big-endian designation, as in n32 be etc.
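
For example, a sketch assuming the eye-catcher bytes appear in reading order ('L','I','S','T') in the file, on a little-endian system :-

n32 asc    "eyecatcher"   // fetched little-endian, displays as 'TSIL'
n32 be asc "eyecatcher"   // fetched big-endian, displays as 'LIST'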

Map definitions

Mappings are BE's equivalent to C enumerated types and bitfield support.

These define a mapping between symbolic names and numeric values. A typical mapping definition in the initialisation file might be :-

map compression_type
  {
  "uncompressed" 1
  "huffman"      2
  "lzw"          3
  }

If the numeric value on display matches the value given, then it can be converted to the textual description.

Mappings in which the values are one bigger than the previous one are quite common. So BE gives a shorthand, where . in the value means 0 for the first maplet given after the open curly brace, and one plus the previous value otherwise :-

map ordinals
  {
  "zero" .
  "one"  .
  "two"  .
  }
map larger_ordinals
  {
  "four" 4
  "five" .
  "six"  .
  }

Bitfields may be achieved in the following fashion :-

map pending_events
  {
  "reconfiguration" 0x0001 : 0x0001
  "flush_cache"     0x0002 : 0x0002
  "restart_io"      0x0004 : 0x0004
  }

The : symbol introduces an additional mask. The number to string conversion algorithm inside BE works like this :-

for each maplet in the map
  if ( value & maplet.mask ) == maplet.value then
    display the maplet.name
if some unexplained bits are left over then
    display the remaining value in hex

The case where the value and following mask are the same is much more common than the case where they are not. So BE provides a typing shortcut where . in the mask means 'the same as the value'. So the above example can be written :-

map pending_events
  {
  "reconfiguration" 0x0001 : .
  "flush_cache"     0x0002 : .
  "restart_io"      0x0004 : .
  }

It is possible to have multiple field decodes from a single value :-

map twobitfields
  {
  "green" 0x0001 : 0x000f
  "blue"  0x0002 : 0x000f
  "red"   0x0003 : 0x000f
  "small" 0x0100 : 0x0f00
  "large" 0x0200 : 0x0f00
  }

The value 0x0243 would be converted to red|large|0x40.

As has been alluded to above, when supplying numeric expressions, the map keyword may also be used. In the following example, the expression evaluates to 0x0105 :-

map twobitfields "small" + 5

In fact, if there is no constant or symbol with the same name, you can use the following shorthand for the above example :-

small + 5

Even sophisticated mappings like the following will work as expected :-

map attribute_byte
  {
  "colour" 0x10 : 0xf0
  "red"    0x13 : 0xff
  "green"  0x14 : 0xff
  "shape"  0x20 : 0xf0
  "round"  0x23 : 0xff
  "square" 0x24 : 0xff
  }

In this example the meaning of the bottom 4 bits is dependent on the value of the top 4 bits. The top 4 bits encode whether the attribute is encoding information about the colour or shape of something, and the bottom 4 bits encode which colour or shape. The value 0x23 is displayed as "shape|round".

Sometimes it can be convenient to add to the definition of a mapping. This can be done via the add keyword, as in the following example :-

map animals { "dog" 1 "cat" 2 }
map animals { "giraffe" 3 }        // Error, redefinition of map animals
map animals add { "zebra" 4 }      // Okay, extends map animals
map birds add { "pelican" 5 }      // Error, no map birds to extend

When displaying a maplet decoded value, the M key can be used to bring up a list of the maplets and whether they decode or not. Through this, the value can be edited.

You can use the suppress keyword to prevent BE using a maplet when converting a number to a string. This is not normally used, but can sometimes be handy to cut down screen clutter, as a number is normally displayed in less space. In the following example, 0xc3, bright flashing blue, is shown as "blue|0xc0". Maybe we are only interested in the colour :-

map obscure_mapping
  {
  "bright" suppress 0x80 : .
  "flash"  suppress 0x40 : .
  "red"             0x01 : 0x3f
  "green"           0x02 : 0x3f
  "blue"            0x03 : 0x3f
  }

Much more common is to interactively suppress maplets from the M maplet list using the @S and @N keys.

Structure definitions

Definitions are BE's equivalent to C structures and unions.

Definitions are a list of at OFFSET clauses, align ALIGNMENT clauses, attribute settings, and field definitions. When the structure definition is processed, the current-offset is initialised to 0. Also, all fields take on the default display attributes, unless the default is changed for the definition, or a field specifies its own attributes.

An at OFFSET clause moves the current-offset to the specified numeric value.

An align ALIGNMENT clause moves the current-offset to be the next integer multiple of the specified numeric value.

A field definition defines a field which lives at the current-offset into the structure. After definition of the field, the current-offset is moved to the end of the field, so that the next field will immediately follow it (unless another at OFFSET clause is used, or a union is being defined).

The size of the structure is the largest value that the current-offset ever attains. This is the value returned whenever sizeof DEFN is used as a number.

Duplicate definitions of the same named definition are not allowed.

A structure definition may have zero or more fields, align ALIGNMENT clauses and/or at OFFSET clauses.

A structure definition may behave like a C struct definition, in that each field follows on from the previous one in memory. Or it may behave like a C union definition, in that all fields overlay each other in memory, and the total size is the size of the largest field.

def A_STRUCTURE struct
  {
  n32 "first field, bytes 0 to 3"
  n32 "next field, bytes 4 to 7"
    // sizeof A_STRUCTURE is 8
  }
def A_UNION union
  {
  n32 "first field, bytes 0 to 3"
  n16 "second field, bytes 0 to 1"
    // sizeof A_UNION is 4
  }

The keyword struct is unnecessary, and may be omitted.

These may be combined, like in the following :-

def MY_COMPLICATED_STRUCTURE
  {
  n32 "first field, occupying bytes 0 to 3"
  union
    {
    n32 "second field, occupying bytes 4 to 7"
    struct
      {
      n16 "the bottom 16 bits of the second field, occupying bytes 4 to 5"
      n8  "the upper middle byte, occupying byte 6"
      n8  "the top byte, occupying byte 7"
      }
    }
  }

The at OFFSET clause also allows the same areas of a structure to be displayed in more than one way, thus also allowing the implementation of unions :-

def UNION_THE_HARD_WAY
  {
  n32 le  "first value, bytes 0 to 3"
  at 0 n8 "the lower byte, byte 0"
    // sizeof UNION_THE_HARD_WAY is 4
  }

Note: in the above style of example, you can't use the offsetof keyword to position a new field on top of an earlier field, because whilst you are defining a structure definition, it isn't actually fully defined yet, and so the offsetof keyword will not be able to find it.

Each clause can be terminated or separated with a ;, although normally this isn't necessary. One example of where it is required is :-

def WONT_BEHAVE_AS_EXPECTED
  {
  n8     "first"
  align 4          // #1
  +5 n16 "array"
  }

The lack of a ; at #1 causes BE to interpret this as align 9, followed by a single n16 field.
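
With the ; present, a sketch of the version which does behave as expected :-

def BEHAVES_AS_EXPECTED
  {
  n8     "first"
  align 4 ;        // the ; stops the 4 being combined with the +5
  +5 n16 "array"   // a 5 element array of n16
  }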

The attributes a field gets can come from the global default settings, the default settings within a brace delimited set of fields, or from the field itself. An example of how attributes work is given below :-

hex                 // set hex as global default
def SOMETHING
  {
  n16 "a"           // will be shown as hex
  n16 dec "b"       // will be shown as decimal
  oct               // octal becomes default, until } at #1
  n16 "c"           // will be shown in octal
    {
    n16 "d"         // will also be octal
    dec             // default is decimal, until } at #2
    n16 "e"         // will be decimal
    }               // #2
  n16 "f"           // reverts to octal
  }                 // #1

In BE dated 2014-01-16 onwards, a definition can be annotated with nocode. BE assumes the field offsets into the definition are addresses and will not disassemble instructions at those addresses. It will show the field instead :-

def main nocode
  {
  at addr "timer_ticks" n32 "thousands of a second since boot"
  }

Field definitions

Here are some examples of field definitions :-

n8 asc "initial"
buf 20 "surname"
n16 be unsigned dec "age"
3 pet "pet names"
3 n16 be unsigned dec "pet costs"
2 n32 le unsigned hex ptr person "2 pointers to parents"
2 n32 ptr person null "2 pointers, null legal"
person "a person"
n32 sym code "__main"
1024 n32 unsigned dec "memory as 32 bit words"
9 n16 map errorcodes "results"
buf 100 asc zterm "a C style string"
GENERIC_POINTER suppress "pointer"
n32 ptr FRED add -. "link"
n32 bits 31:28 "top 4 bits"
n32 bits 27:0  "bottom 28 bits (of another word)"
n32 sym code width 10 "function"
n32 time "last_update_time"

Each example is of the form :-

optional-count type optional-attrs name

The field describes count data items of the specified type. If count is not 1, then the field is initially displayed by just showing its type (eg: 10 n32 le unsigned hex "numbers"). When you select the field, you are presented with an element list, with count lines, from which you can select the element you are interested in. Effectively BE considers all fields to be arrays; it's just that most of them are arrays with a single element.

The type of the data is one of n8, n16, n24, n32, n40, n48, n56, n64, buf N or DEFN, where DEFN is the name of a previously defined definition. This type may be considered to be the way in which BE is told the size of the data item concerned.

n8, n16, n24, n32, n40, n48, n56, n64 mean an 8, 16, 24, 32, 40, 48, 56 or 64 bit numeric data item.

Support for >32 bit values is only present in certain operating system versions of BE.

buf N means a buffer of N bytes.

There is also a special expr E type, which defines a 'field' whose value is the result of the expression E. The expression E may be any expression and may even refer to other fields in the definition. The . symbol evaluates to the address of the field. Obviously you can't edit/change the value of an expression. So the following sort of thing becomes possible :-

def RECTANGLE
  {
  n8                  dec "width"
  n16 be              dec "height"
  expr "width*height" dec "area"
  }

With a bit more imagination, you can write things like the following, which not only displays the data, but checks the checksum too :-

def DATA_PACKET
  {
  expr "(sum(n8,.,100)&0xff)==checksum" "valid_data_packet"
    // Note, '.' in the expression is the current address
  buf 100                               "data"
  n8                                    "checksum"
  }

The field has the default data display attributes, unless data display attribute keywords (as defined above) are included in the field definition.

In addition to the data display attribute keywords given above is the map MAP attribute, which means display the numeric field by looking up a textual equivalent of the numeric value using the mapping MAP, which must have previously been defined.
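
For example, a sketch reusing the compression_type mapping defined earlier :-

n8 map compression_type "compression"   // a value of 2 is displayed as "huffman"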

If the field is one of n8, n16, n24, n32, n40, n48, n56, n64 or expr, the bits MS:LS designation can be used to say that only a subset of the bits fetched are to be displayed. Also, if you edit the field, only the subset of bits are changed. BE does a read-modify-write of the numeric field to achieve this. Despite only showing a subset of the bits, the field is still the same 'size', and the union mechanism must be used to decode multiple bit ranges in the same numeric field. eg:

union
  {
  n16 be bits 15:12 bin "top 4 bits"
  n16 be bits 11: 0 hex "bottom 12 bits"
  }

The bits notation is handy even when applied to expressions, despite the fact you can always write an expression which computes the equivalent value (after all, selecting certain bits for display is just an 'and' and an optional 'shift-right'). This is because when bits is used, BE knows how many bits will be in the result, and can use fewer characters to display it. Consider :-

def SOME_STRUCTURE
  {
  hex
  n8 "a"                          // if this is    0x12
  n16 "b"                         // and this is 0x3456
  expr "(a+b)&0xfff" "total"      // this shows as   0x00000468 (32 bit BE)
                                  // or even 0x0000000000000468 (64 bit BE)
  expr "(a+b)" bits 11:0 "total"  // but this shows as    0x468
  }

Array indexing and referring to nested fields is possible within expressions.

When an identifier or valof identifier is used to refer to a field in a definition, which is an array, the [ array-subscript ] suffix may be used to pick which element.

When an identifier or valof identifier is used to refer to a field in a definition, which is a definition itself, the . sub-field suffix may be used to refer to a nested field.

When an identifier or valof identifier is used to refer to a field in a definition, which is a pointer to some other definition, the -> sub-field suffix may be used to refer to a field in the definition pointed to.

def RECT
  {
  n16 "w"
  n16 "h"
  }

def OUTER
  {
  10 n32                 "values"
  expr "values[1+3]"     "the fifth value"
  RECT                   "rect"
  expr "rect.w"          "the width of rect"
  10 RECT                "rects"
  expr "rects[2].h"      "the height of the third rectangle"
  n32 ptr RECT           "rect_ptr"
  expr "rect_ptr->w"     "the width of the rectangle pointed to"
  }

The ptr DEFN attribute says that the numeric value is in fact a pointer to a definition of type DEFN. DEFN need not be defined yet in the initialisation file. The mul/nomul attribute described above specifies whether to multiply the pointer value by the size of the data item being pointed to. You can use mult MULT to multiply the pointer value by MULT (therefore mul is effectively the same as mult sizeof DEFN). The null/nonull attribute described above specifies whether this pointer may be followed if the numeric value is 0. The keyword add BASE may be used, and there is also an align ALIGNMENT keyword. ALIGNMENT can only be 1, 2, 4, 8, 16, 32 or 64 in the current implementation. Also, the rel/abs attribute described above specifies whether to add the address of the pointer itself to the numeric value. By using combinations of the pointer keywords, various effects may be achieved :-

n32 ptr DEFN abs
fetch pointer value, and decode DEFN at that address. This case is very common for file format decoding and memory dumps.
n32 ptr DEFN add 0x40000 abs
fetch pointer value, add 0x40000, and decode DEFN at that address. This case can be used to handle multiple memory space problems.
n32 ptr DEFN mul add addr "table" abs
fetch pointer value, multiply by the size of a DEFN, add the address of the table (as determined from the symbol table), and decode the DEFN at that address. This case is typical for when the pointer is in fact a table index.
n32 ptr DEFN rel
fetch pointer value, add address of the pointer itself, and decode the DEFN at that address. When a file consists of a list of variable length structures, where the first field is the size of the structure, this provides a handy way to skip past it to the next.
n32 ptr DEFN add 8 rel
fetch pointer value, add address of the pointer itself, add the numeric value 8 (this can be negative), and decode the DEFN at that address. This case is common for when one structure includes a field which identifies an amount of data to skip before the next structure is seen.
n8 ptr DEFN add 1 align 4 abs
fetch pointer value, add 1, and round up to the next 4 aligned address, before decoding DEFN at that address. Sometimes data items in files have length fields which need to be rounded up to a multiple of N (typically 2 or 4), before the next data field appears.

Clearly the expr mechanism described above can be used to similar effect.
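For instance, the BNF later in this document allows pointer attributes to follow an expr field, so a sketch such as the following (the ENTRY definition and table_base constant are assumed to exist) should be possible :-

n32                                       "index"
expr "table_base + index * sizeof ENTRY"  ptr ENTRY abs "entry"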

The procedure for following pointers is :-

  1. Fetch the pointer's numeric value.
  2. If nonull and pointer is 0, then don't follow the pointer.
  3. If mul, then multiply the pointer value by the size of the item being pointed to.
  4. If mult MULT, then multiply the pointer value by MULT.
  5. If add BASE, then add BASE to the pointer value.
  6. If rel, then add the address of the pointer itself.
  7. If seg, then mangle pointer address to account for the 16:16 segmented mode of x86 processors.
  8. If align ALIGNMENT, then round up pointer to the next multiple of ALIGNMENT.
  9. Decode and display data item at resultant address.

The seg keyword works by taking the top 16 bits of the pointer value as the segment, the bottom as the offset, and producing a new pointer value which is segment*16+offset. This feature may be of use for decoding large memory model program dumps which have been running on x86 processors running in real mode, or a 16:16 protected mode with a linear selector mapping. This feature is not recommended - it's much easier to use the new -g command line switch instead. Anyone with a sensible file format to decode, or a memory dump taken from the memory space of a processor of a sensible architecture, can ignore this feature.

The keyword open may be given and this has the effect of increasing the level of detail that is initially displayed. See the description of the level of detail of display feature later in this document. This feature has its problems (bugs), but can be used to ensure that small arrays and short definitions are displayed in full without the user having to manually increase the level of detail by hand.

The suppress field attribute may be given using the suppress keyword. Suppressed fields are omitted from display when showing a whole definition on one line (by expanding the level of display). Suppressed fields are shown in round brackets when viewing a definition with each field on a new line.

The tag attribute may be given. When this field is first displayed, the line will initially be tagged. Typically you might pre-tag one or two specific fields in a structure, if the structure were large, and certain fields were more important than others.
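
For example, a sketch (the field name here is made up) :-

n32 hex tag "important_count"   // this line starts off tagged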

The width WIDTH attribute may also be given. By default, field widths are 0, which means don't pad or truncate fields when they are displayed. When set non-0, each field (or each individual element of an array) is padded or truncated to be the given width. If a field is truncated, a > or < symbol is shown. The width can be changed interactively by the user.
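
For example, a sketch :-

buf 100 asc zterm width 20 "name"   // long names are truncated to 20 characters, and a > is shown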

A validity check expression may be associated with each field, via the valid V syntax. Any field which passes its validity check has ++ displayed next to it, and any which fails has -- displayed next to it. When a whole definition is shown on one line (by expanding the level of detail of display), those fields which fail their validity tests, are not shown. This provides a handy way of doing conditional decode of variant records.

map T_
  {
  "T_SHORT" 1
  "T_LONG"  2
  }

def VARIABLE
  {
  buf 20 asc zterm                "name"
  n8 map T_                       "type"
  union
    {
    n16 dec valid "type==T_SHORT" "value16"
    n32 dec valid "type==T_LONG"  "value32"
    }
  }

Sometimes validity checks can get quite long, so remember that backslash at the end of a line causes 'line continuation', as in :-

n8 dec                             "discriminator"
n16 dec valid "discriminator==1||\
               discriminator==2||\
               discriminator==3"   "conditional_field"

Aside: Beware of using the C/C++ pre-processor (or other macro pre-processors) on BE initialisation files - they may not handle things like 'line continuation' quite the same way as BE does. eg: In the example, BE ignores the white space preceding the word discriminator on the last two lines, but some (all?) C++ pre-processors include the white space in the final string!

Finally the name of the field must be given. You used to have to pad all field names of the same definition to be the same width with spaces, so that when displayed, everything lined up nicely. But now BE does this automatically for you.

A typical structure definition might look like :-

def FROGLISTELEM
  {
  n32 ptr FROGLISTELEM "next_frog_in_list"
  buf 100 asc          "name_of_this_frog"
  }

However, consider the case that BE is being used to edit a dump of a processor's memory space. In this case we also wish to be able to see all the global variables, whose addresses are determined by a symbol (rather than some fixed address). So it is typical to take advantage of the fact that fields can be placed at any offset into a structure (using at EXPR), and that expressions may refer to the symbol table (using addr "SYM"). You put such fields in a structure holding global variables, which would be decoded from address 0. You'd write something like :-

def GLOBAL_VARS
  {
  at addr "frog_list" n32 ptr FROG "frog_list"
  ...
  }

Now this can be a very common idiom, and you usually want the displayed field name to match the symbol name. So to avoid typing everything twice, BE provides a short-cut :-

def GLOBAL_VARS
  {
  n32 ptr FROG at "frog_list"
  ...
  }

When this feature was added to BE, and some real-world BE initialisation files were modified to take advantage of it, the files got 17% smaller.

Alignment declarations

Normally, when parsing a structure definition, each field is positioned immediately after the one before (unless the union, align, or at keywords are used).

When BE begins processing the initialisation file, it believes that all n8, n16, n24, n32, n40, n48, n56 and n64 variables should be aligned on a 1 byte boundary. In other words, no special alignment is to be automatically performed.

This is radically different from the way the high level languages such as C lay out the fields within their structures and unions. These languages enforce constraints such as '32 bit integers are aligned on 4 byte boundaries'. This is usually done because certain processor architectures either can't access certain sizes of data from odd alignments, or are slower doing so. This can be accounted for by manually adding padding to structure definitions :-

def ALIGNED_USING_MANUAL_PADDING
  {
  n8 "fred"
  buf 3 "padding to align bill on a 4 byte boundary"
  n32 "bill"
  }

Or alternatively, the align keyword could be used :-

def ALIGN_USING_align_KEYWORD
  {
  n8 "fred"
  align 4
  n32 "bill"
  }

It is possible to tell BE to automatically align n8, n16, n24, n32 or nested definition fields on specific byte (offset) boundaries by constructs such as the following (which corresponds to many 32 bit C compilers) :-

align n16 2
align n32 4
align def 4
align { 4
align } 4

def ALIGNED_AUTOMATICALLY
  {
  n8  "fred"
  n32 "bill"
  }

The align { directive specifies that nested definitions must start on the indicated boundary. The align } directive specifies that structure sizes get rounded up to a multiple of the alignment.

Clearly, this feature is more useful when BE is being used to probe memory spaces of running programs via a memory extension, or doing post-mortem examination of program memory dumps.

Most data file formats don't-need-to and/or don't-bother-to align their fields.

Include directives

The initialisation file can contain the following, as long as it is outside of any other definition :-

include "anotherfile.ini"

Be sure to notice that this is an initialisation language command, not a pre-processor directive like $ifdef. This is why it is not $include.

There is also a tryinclude variant, which tries to open the file specified, but does not get upset if it can't :-

tryinclude "extrastuff.ini"

Included files will be searched for by looking in the current directory, then along an internal include path, along the BEINCLUDE environment variable, and finally along the PATH environment variable. The internal include path is usually empty, but may be appended to by the use of the -I command line option.

Macros

The BE initialisation file language now includes a macro feature. A macro (with or without parameters) can be defined using the defm endm construct.

A simple example involving linked lists :-

defm defListOf(T)
def ListOf concat T
  {
  n32 ptr T "Head of the list"
  n32 ptr T "Tail of the list"
  }
endm

Note the use of the concat operator to glue the ListOf identifier to the T (whatever is passed in to the macro on invocation) so as to make a single identifier.

Once defined, the macro can be invoked as follows :-

defListOf(STRUCT1) // defines ListOfSTRUCT1
defListOf(STRUCT2) // defines ListOfSTRUCT2
  // repeat as required for each type of list/element we might have

You might think of the macro mechanism as a way of defining template types, giving a convenient way to instantiate them as required.

Another simpler example: Sometimes programs write C/C++ ints to files, and unfortunately the size of these changes from platform to platform. So we can replace the rather awful :-

def SOMETHING
  {
$ifdef INTS_ARE_64_BIT
  n64 "field"
$else
  n32 "field"
$endif
  // the above 5 lines repeated wherever the issue arises
  }

with the more manageable :-

defm int
$ifdef INTS_ARE_64_BIT
  n64
$else
  n32
$endif
endm

def SOMETHING
  {
  int "field"
  // int used wherever the issue arises
  }

This is better because the issue is addressed once at the top of the file.

Also, observe how this macro mechanism effectively provides a typedef like facility.

Just as the C/C++ pre-processor provides stringizing and token pasting operators, BE provides similar mechanisms too. It provides functions which convert one kind of token to another. These are :-

id2str(identifier)     id2str(SOMETHING) == "SOMETHING"
expr2str(expression)   expr2str(100+4)   == "104"
expr2id(expression)    expr2id(200+4)    == 204  // Ooops! See below

These are handy when writing macros, consider the following improved defListOf macro :-

defm defListOf(T)
def ListOf concat T
  {
  n32 ptr T "Head of the list of " concat id2str(T) concat "s"
  n32 ptr T "Tail of the list of " concat id2str(T) concat "s"
  }
endm

expr2id is a bit of a special case. Identifiers cannot start with a digit, and yet this appears to be the case in the example above! BE resolves this ambiguity by requiring that expr2id can't be used at the start of an identifier. Here is a legal example :-

defm defArrayOf(N,E)
def ArrayOf concat expr2id(N) concat E
  {
  N E "elements"
  }
endm

defArrayOf(100  ,STRUCT) // defines ArrayOf100STRUCT
defArrayOf(200+5,STRUCT) // defines ArrayOf205STRUCT

When BE encounters a macro invocation, it cannot simply delimit argument boundaries by looking for the comma separators. It must also take bracket nesting into account. Consider :-

defArrayOf( sum(n32 be,0x1000,3) , TABENT )
  //        --------arg1--------   -arg2-                 RIGHT

BE must consider the sum(n32 be,0x1000,3) as a single argument, despite it containing commas, to avoid the following :-

defArrayOf( sum(n32 be , 0x1000 , 3)     , TABENT )
  //        ---arg1---   -arg2-   -arg3-   -arg4-         WRONG

Macros should be used sparingly. They can slow down the processing of the initialisation file at startup.

Macros should not recursively invoke themselves (directly or indirectly). Also, if you start defining macros which when invoked define other macros, you're on your own.

Macros can be hard to debug (if you get them wrong), as when BE writes out error messages, it will tend to quote a line:column position a token or two beyond the point at which the macro was invoked, rather than the point within the macro definition.

Reserved words

The following are reserved words, and so should be avoided as names of constants in the initialisation file :-

abs add addr align asc at be bin bits buf code concat dec def defm
ebc endm expr expr2id expr2str glue hex id2str include le lj map mul mult
n8 n16 n24 n32 n40 n48 n56 n64 nocode noglue nolj nomul nonull noseg nozterm
null oct offsetof open ptr rel seg set signed sizeof struct sum suppress sym
sym_base sym_offset tag time tryinclude union unset unsigned valid valof
width xor zterm

A sample initialisation file

Here is a real initialisation file, which is intended for viewing the master boot record written on sector 0 of PC disks :-

//
// mbr.ini - BE initialisation file for decoding master boot records
//
// Under Linux, root can obtain the MBR via a command much like :-
//
//   # dd if=/dev/sda of=mbr.dat bs=512 count=1
//
// Then you'd invoke BE via :-
//
//   % be -i mbr.ini mbr.dat
//
// The file assumes the drive from which the MBR was obtained has
// 63 sectors per track and 255 heads. These assumptions are used in
// computations of LBAs given CHS information. If the disk geometry is
// actually different (as is likely for <8GB disks), you can override
// the assumptions via a command line much like :-
//
//   % be -Ssectors_per_track=32 -Sheads=127 -i mbr.ini mbr.dat
//
// Information obtained mainly from STORAGE.INF.
//

set nspt `sectors_per_track 63`
set nh   `heads 255`

map BOOTINDIC
  {
  "Not Active" 0x80 : 0x80
  "Active"     0x00 : 0x80
  }

map PARTOWNER
  {
  "Unused"                                           0x00
  "DOS, 12-bit FAT"                                  0x01
  "XENIX System"                                     0x02
  "XENIX User"                                       0x03
  "DOS, 16-bit FAT"                                  0x04
  "Extended"                                         0x05
  "DOS, >32MB support, <=64KB Allocation unit"       0x06
  "OS/2, >32MB partition support"                    0x07
  "Linux swap"                                       0x82
  "Linux native"                                     0x83
    // Note: lots missing for brevity of example
  }

def PARTCHS
  {
  at 1 n8 bits 7:6 hex width 3            suppress "CylinderHigh"
  at 2 n8 hex width 4                     suppress "CylinderLow"
  expr "(CylinderHigh<<8)+CylinderLow" dec width 4 "Cylinder"
  at 0 n8 dec width 3                              "Head"
  at 1 n8 bits 5:0 dec width 2                     "Sector"

  expr "(Cylinder*nh+Head)*nspt+Sector-1" width 8 dec
                                          suppress "lba"
  }

def PART
  {
  n8 map BOOTINDIC          "BootIndicator"
  PARTCHS open              "PartitionStart"
  n8 map PARTOWNER          "SystemIndicator"
  PARTCHS open              "PartitionEnd"
  n32 dec width 8           "OffsetFromStartOfDiskInSectors"
  n32 dec width 8           "PartitionLengthInSectors"

  // By adding these two, you can work out the LBA
  // immediately following the partition
  expr "OffsetFromStartOfDiskInSectors+PartitionLengthInSectors"
       dec width 8 suppress "next_lba"

  expr "OffsetFromStartOfDiskInSectors*512" hex ptr MBR
       valid "SystemIndicator==Extended" suppress "extended" 
  }

def MBR
  {
  buf 446 hex                          "MasterBootRecordProgram"
  4 PART                               "PartitionTable"
  n16 be hex valid "Signature==0x55aa" "Signature"
  }

def main
  {
  MBR "mbr"
  }

In the above example quite a few of the BE features are demonstrated.

The setting of nspt and nh shows the backquote expression syntax meaning "value of the identifier if defined, else a default value". These constants represent the geometry of the disk.

The map BOOTINDIC shows using map for decoding bits (in this case just one bit). This mapping decodes the per partition 'boot indicator' flag.

The map PARTOWNER shows using map for decoding enumerations. This mapping decodes the owner (or type) of the partition.

The def PARTCHS, which shows the cylinder, head and sector of either the start or end of a partition, shows the following BE features :-

  1. How to decode the bytes in the structure in an order other than first byte first, by using the at OFFSET construct.
  2. How to extract just some bits of the bytes/words using bits MS:LS and how to combine them together to make a meaningful value using expr "EXPRESSION".
  3. How to use width WIDTH so that the screen layout is nice.
  4. How to use suppress to suppress some fields, so that only the ones worthy of display are shown when the entire PARTCHS structure is shown expanded on one line.

In the def PART, which decodes an entire partition entry in the partition table in the master boot record, we can see the use of the two earlier mappings, and also the use of open so that the PARTCHS structures are shown 'ready expanded'. Because of the earlier use of suppress in the def PARTCHS explained above, you'll just see the decoded cylinder, head and sector. Of course when you select the PARTCHS, you'll see everything.

The use of expr "EXPRESSION" for computing next_lba is really useful when you use it in conjunction with the computed lba in the def PARTCHS above. Basically, the LBA beyond the end of a partition should be the start of the next partition, and so should be the OffsetFromStartOfDiskInSectors of another partition, and this should tally with the LBA computed from the CHS in the PartitionStart.

In the def MBR there is the use of a valid "EXPRESSION" validity check. It says the Signature field is valid, if it is 0x55aa. So if when you load BE to view an MBR, the Signature is shown with a -- indicator, you know the MBR isn't valid.

Clearly, with the above file, it is possible to do some rather low and dirty FDISK like things to MBR data, especially if you are using BE via a memory extension to directly access live disk sectors. Just because you can, it doesn't mean you should - be careful.

The supplied initialisation file

The supplied initialisation file contains enough definitions to enable you to examine the contents of many file formats.

Bitmap files supported include :-

Animation formats :-

Also, the following miscellaneous file formats :-

The definitions in the initialisation file are in no way complete, or intended to be a definitive statement of such files' contents, but are merely intended to aid in the browsing of the contents of such files.

Limitations of BE make it awkward to decode certain data structures in some files, so the attitude taken is typically 'display as best you can', and where data may be of variable length 'display the first few bytes worth...'.

If you are simply interested in looking at some of the file in raw form, you can use the DB, DW and DD definitions that come supplied in the default initialisation file. If you wanted to look at memory at 0x8000 as dwords, you could type :-

@ DD Enter 0x8000 Enter Enter

BNF definition

Here is a more formal specification of the BE initialisation file language. Actually, BE will accept variations on the following, but here we document the clearest/least-ambiguous use of the language. Where BE accepts variations on the following, typically it is in the ordering of independent attributes.

Some basics just before we start :-

<number> ::= a number in C/C++ style
             as in 0b1101, 0o15, 13, or 0x0d, or '\r' or similar
<id>     ::= a C/C++ style identifier
<string> ::= a C/C++ style double quoted string
             which is clean (characters between 32 and 126 only)
             as in "Hello World" etc.
<buffer> ::= a string or hexstring buffer
             as in "SIGNATURE" or @FF0022

When an identifier, string or buffer is required, they can be made in the following ways. Concatenation and type conversion functions tend to be handy when writing macros :-

<idx2> ::= <id>
         | 'expr2id' '(' <expr> ')'
<idx>  ::= <id> { 'concat' <idx2> }
<str2> ::= <string> 
         | <buffer>
         | 'id2str' '(' <idx> ')'
         | 'expr2str' '(' <expr> ')'
<str>  ::= <str2> { 'concat' <str2> }
<buf2> ::= <string> 
         | <buffer>
<buf>  ::= <buf2> { 'concat' <buf2> }

Numeric expressions :-

<sep>    ::= { ',' | ';' }
<vqual>  ::= [ '[' <expr> ']' ]
             [ ( '.' | '->' ) ( <idx> | 'valof' <str> ) [ <vqual> ] ]

<expr13> ::= <number>
           | '+' <expr13>
           | '-' <expr13>
           | '~' <expr13>
           | '!' <expr13>
           | 'addr' <str>
           | 'sizeof' <idx>
           | 'offsetof' <idx> <str>
           | 'valof' <str> [ <vqual> ]
           | 'map' <idx> <str>
           | '(' <expr> ')'
           | <idx> [ <vqual> ]
           | '`' <idx> <expr> '`'
           | '.'
           | '[' <n_value> <sep> <expr> [ <sep> <expr> ] ']'
           | '[[' <buf> <sep>
             <expr> <sep> <expr> <sep> <expr> [ <sep> <expr> ] ']]'
           | 'strlen' '(' <expr> ')'
           | 'sym_base' '(' <expr> ')'
           | 'sym_offset' '(' <expr> ')'
           | 'sum' '(' <n_value> ',' <expr> ',' <expr>
             [ ',' <expr> [ ',' <expr> ] ] ')'
           | 'xor' '(' <n_value> ',' <expr> ',' <expr>
             [ ',' <expr> [ ',' <expr> ] ] ')'
<expr12> ::= <expr13> { ( '*' | '/' | '%' ) <expr13> }
<expr11> ::= <expr12> { ( '+' | '-' ) <expr12> }
<expr10> ::= <expr11> { ( '<<' | '>>' | '>>>' ) <expr11> }
<expr9>  ::= <expr10> { ( '>' | '<' | '>=' | '<=' ) <expr10> }
<expr8>  ::= <expr9>  { ( '==' | '!=' ) <expr9> }
<expr7>  ::= <expr8>  { '&' <expr8> }
<expr6>  ::= <expr7>  { '^' <expr7> }
<expr5>  ::= <expr6>  { '|' <expr6> }
<expr4>  ::= <expr5>  { '&&' <expr5> }
<expr3>  ::= <expr4>  { '^^' <expr4> }
<expr2>  ::= <expr3>  { '||' <expr3> }
<expr>   ::= <expr2>  { '?' <expr2> ':' <expr2> }

The <vqual> is the way in which fields are further qualified. ie: how arrays are indexed, nested fields referred to, and fields in pointed-to definitions accessed.

Sometimes in expressions . (dot) is allowed. It usually refers to some default amount. Other times it isn't allowed.

A maplet is a mapping from a number to a string to display, and a map is zero or more maplets. Using . in the maplet value (first expression) means 0 or the previous value plus 1, and in the maplet mask (optional second expression) it means the same as the value.

<maplet> ::= <str> [ 'suppress' ] <expr> [ ':' <expr> ]
<map>    ::= 'map' <idx> [ 'add' ] '{' { <maplet> } '}'

Numeric fields. Where the value comes from, how to display it, how to use it as a pointer (if it is one), and putting it all together :-

<bits>          ::= 'bits' <expr> ':' <expr>
<n_value>       ::= ( 'n8'  | 'n16' | 'n24' | 'n32' |
                      'n40' | 'n48' | 'n56' | 'n64' )
                    [ 'le' | 'be' ]
                    [ <bits> ]
                    [ 'signed' | 'unsigned' ]
<expr_value>    ::= 'expr' <str> [ <bits> ]
<code_attrs>    ::= [ 'lj' | 'nolj' | 'glue' | 'noglue' ]
<numeric_attrs> ::= [ 'map' <idx> ]
                    [ 'asc' | 'ebc' | 'bin' | 'oct' |
                      'dec' | 'hex' | 'sym' | 'time' ]
                    [ 'code' <code_attrs> | 'nocode' ]
<pointer_attrs> ::= [ 'ptr' <idx>
                      [ 'null' | 'nonull' ]
                      [ 'rel' | 'abs' ]
                      [ 'mul' | 'nomul' ]
                      [ 'mult' <expr> ]
                      [ 'add' <expr> ]
                      [ 'align' <expr> ]
                      [ 'seg' | 'noseg' ]
                    ]
<numeric_field> ::= ( <n_value> | <expr_value> )
                    <numeric_attrs> <pointer_attrs>

The expr string is itself a numeric expression. You'll need to escape any quotes within it.

A buffer field. How big, how to show the data, and whether to stop at a NUL byte. Using . in the buffer size expression gives the current offset into the definition :-

<buffer_field> ::= 'buf' <expr>
                   [ 'hex' | 'asc' | 'ebc' ]
                   [ 'zterm' | 'nozterm' ]

A field may name a nested definition :-

<def_field> ::= <idx>

All fields share a set of general attributes, and have a name, so a complete field specification looks like :-

<field> ::= ( <numeric_field> | <buffer_field> | <def_field> )
            { 'open' }
            [ 'valid' <str> ]
            [ 'width' <expr> ]
            [ 'suppress' ]
            [ 'tag' ]
            [ 'at' ]
            <str>

The valid string is itself a numeric expression.

A set of attributes :-

<attr>    := 'asc' | 'ebc' | 'bin' | 'oct' | 'dec' | 'hex' | 'sym' | 'time'
           | 'signed' | 'unsigned'
           | 'be' | 'le'
           | 'rel' | 'abs'
           | 'mul' | 'nomul'
           | 'seg' | 'noseg'
           | 'null' | 'nonull'
           | 'code' | 'nocode'
           | 'lj' | 'nolj'
           | 'glue' | 'noglue'
           | 'zterm' | 'nozterm'

Fields are just one type of item which can be found within a definition; other items set the offset within the definition, specify alignment, introduce nested itemlists, or set attributes. Items in an itemlist follow one another (as in C/C++ structs), or overlay each other (as in C/C++ unions). Using . in the at expression gives the current offset into the definition. So definitions are specified as :-

<item>     ::= 'at' <expr>
             | 'align' <expr>
             | <itemlist>
             | <attr>
             | <field>
             | ';'
<itemlist> ::= [ 'struct' | 'union' ] '{' { <item> } '}'
<def>      ::= 'def' <idx> [ 'nocode' ] <itemlist>

File includes are specified :-

<include> := ( 'include' | 'tryinclude' ) <str>

The default attributes used, if not fully specified in the fields, or within an itemlist, can be specified globally :-

<align>   := ( 'align' ( 'n8'  | 'n16' | 'n24' | 'n32' |
                         'n40' | 'n48' | 'n56' | 'n64' |
                         'def' | '{' | '}' ) 
               <expr> )

Set and unsetting :-

<set>   ::= 'set' <idx> <expr>
<unset> ::= 'unset' <idx>

Macros may be defined :-

<macro> ::= 'defm' <idx> [ '(' <idx> { ',' <idx> } ')' ]
            ... body of the macro
            'endm'

So the total language is :-

<be> ::= <map>
       | <def>
       | <include>
       | <attr>
       | <align>
       | <set>
       | <unset>
       | <macro>