diff --git a/user_guide.adoc b/user_guide.adoc
index a7261c2..3a9d786 100644
--- a/user_guide.adoc
+++ b/user_guide.adoc
@@ -380,6 +380,21 @@ enums:
     17: udp
 ----
+Hexadecimal notation can also be used to define an enumeration:
+
+[source,yaml]
+----
+seq:
+  - id: key
+    type: u4
+    enum: keys
+enums:
+  keys:
+    0x77696474: width  # "widt"
+    0x68656967: height # "heig"
+    0x64657074: depth  # "dept"
+----
+
 There are two things that should be done to declare a enum:
 
 1. We add `enums` key on the type level (i.e. on the same level as
@@ -472,7 +487,25 @@ structure:
 [source,yaml]
 ----
-TODO
+seq:
+  - id: header
+    type: file_header
+  - id: metadata
+    type: metadata_section
+types:
+  file_header:
+    seq:
+      - id: version
+        type: u2
+  metadata_section:
+    seq:
+      - id: author
+        type: strz
+        encoding: UTF-8
+      - id: publisher
+        type: strz
+        encoding: UTF-8
+        if: _parent.header.version >= 2
 ----
 
 ==== `_root`
@@ -799,6 +832,39 @@ other value which was not listed explicitly.
     _: rec_type_unknown
 ----
+If an enumeration has already been defined, you can refer to its items
+in `cases` instead of repeating the integer values:
+
+[source,yaml]
+----
+seq:
+  - id: key
+    type: u4
+    enum: keys
+  - id: data
+    type:
+      switch-on: key
+      cases:
+        keys::width: data_field_width
+        keys::height: data_field_height
+        keys::depth: data_field_depth
+types:
+  data_field_width:
+    seq:
+      # ...
+  data_field_height:
+    seq:
+      # ...
+  data_field_depth:
+    seq:
+      # ...
+enums:
+  keys:
+    0x77696474: width  # "widt"
+    0x68656967: height # "heig"
+    0x64657074: depth  # "dept"
+----
+
 === Instances: data beyond the sequence
 
 So far we've done all the data specifications in `seq` - thus they'll
@@ -1024,7 +1090,117 @@ bytes sparsely.
 
 === Streams and substreams
 
-TODO
+==== Introduction and simple example
+
+A stream is a flow of data from an input file into a parser generated
+from a KS script. The parser can request one or more bytes of data
+from the stream at a time, but it cannot request the same data twice
+and cannot request data out of sequential order. A stream keeps track
+of two things: the maximum amount of data available to be requested by
+the parser, and the amount of data which has already been requested.
+
+When a file is first opened for parsing by a KS-generated parser, a
+root stream is created. This root stream can be accessed via
+`_root._io` at any time and in any place: `_root` refers to the
+top-level object defined in a script, and `_io` is a method which can
+be called on any object to return its associated stream. For the root
+stream, the maximum amount of data available to be requested is the
+file size of the input file being parsed; initially, 0 bytes of data
+have been requested.
+
+Below is an example script used to generate a parser, which is then
+used to parse an input file. Assume that this input file consists of
+a 32-bit unsigned integer with the value 1000, followed by 1000 bytes
+of payload data, for a total file size of 1004 bytes.
+
+[source,yaml]
+----
+meta:
+  id: example_file
+  endian: le
+seq:
+  - id: header
+    type: file_header
+  - id: body
+    type: file_body
+    size: header.body_size
+types:
+  file_header:
+    seq:
+      - id: body_size
+        type: u4
+  file_body:
+    seq:
+      - id: payload
+        size-eos: true
+----
+
+The parser generated from this script will first request 4 bytes of
+data from the root stream and copy them into `header.body_size`.
+After the stream has returned those 4 bytes to the parser, the stream
+knows that it has returned 4 of the 1004 bytes of data available, so
+the parser can now request at most 1000 bytes of additional data from
+it.
+
+The definition of the `body` object in the example script sets its
+size to the already-parsed value of `header.body_size`. Specifying a
+size for an object has an important consequence: the KS-generated
+parser creates a new substream specifically for parsing the `body`
+object.
+
+Like the root stream, the new substream tracks the maximum amount of
+data available to be requested and the amount of data already
+returned. In this example, the substream starts out with a maximum of
+1000 bytes that can be requested by the parser, of which 0 bytes have
+been provided so far.
+
+The parser then keeps requesting data from the new substream to copy
+into `file_body.payload`. As the substream receives these requests, it
+passes them on to the root stream. Unlike the root stream, a substream
+can only request data from the root stream or from another substream;
+substreams never read from the input file directly.
+
+Because `size-eos: true` is specified for `file_body.payload`, the
+parser keeps requesting data from the substream until the substream
+has provided all 1000 bytes it can offer. Once all 1000 bytes have
+been copied from the input file, via the root stream and then via the
+substream, into `file_body.payload`, the internal state of the two
+streams is:
+
+* root stream: the maximum amount of data available remains 1004
+  bytes, and 1004 bytes have already been requested
+* substream: the maximum amount of data available remains 1000 bytes,
+  and 1000 bytes have already been requested
+
+Alternatively, if `header.body_size` happened to be larger than the
+amount of data remaining in the input file, the root stream would be
+unable to fulfill the request, and the KS-generated parser would raise
+an exception for trying to read non-existent data beyond the end of
+the input file.
+
+The `_io` method can be used to access the stream associated with an
+object; the object itself can be referenced by its identifier, or via
+`_root` and `_parent`. Once a stream has been obtained with `_io`,
+several methods expose its internal state:
+
+* `size` returns the maximum amount of data which is available to be
+  requested from the stream
+* `pos` returns the amount of data which has already been requested
+  from the stream
+* `eof` returns `true` when `pos == size` (i.e. all data available via
+  the stream has already been requested) and `false` otherwise
+
+Substreams can be nested many layers deep by defining a `size` for
+each object in the nested tree.
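+
+For example, the following minimal sketch (the type and attribute
+names here are made up purely for illustration) nests two sized
+objects, creating a substream inside a substream, and uses value
+instances to inspect stream state via `_io`:
+
+[source,yaml]
+----
+seq:
+  - id: outer
+    type: outer_type
+    size: 100                 # creates a 100-byte substream
+types:
+  outer_type:
+    seq:
+      - id: inner
+        type: inner_type
+        size: 50              # creates a nested 50-byte substream
+  inner_type:
+    seq:
+      - id: first_half
+        size: 25
+    instances:
+      substream_size:
+        value: _io.size       # 50: the size of the inner substream
+      whole_file_size:
+        value: _root._io.size # total size of the input file
+----
+
+Each `size` adds one more nesting level: code inside `inner_type` sees
+only its own 50-byte window through `_io`, while `_root._io` still
+refers to the whole input file.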
+
+Related expressions which are useful when working with streams
+include:
+
+* `repeat: eos` - repeats an attribute until the end of the current
+  stream is reached
+* `size-eos: true` - makes an attribute consume all bytes remaining in
+  the current stream
 
 === Processing: dealing with compressed, obfuscated and encrypted data
@@ -1512,6 +1688,14 @@ a few pre-defined internal methods (they all start with an underscore
 |`size`
 |Integer
 |Number of elements in the array
+
+|`min`
+|Array base type
+|Gets the array element with the minimum value
+
+|`max`
+|Array base type
+|Gets the array element with the maximum value
 |===
 
 ==== Streams
@@ -1903,7 +2087,38 @@ beginner Kaitai Struct users.
 
 === Specifying size creates a substream
 
-TODO
+The following example script makes an erroneous attempt to parse an
+input file with a size of 2000 bytes:
+
+[source,yaml]
+----
+seq:
+  - id: body
+    type: some_body_type
+    size: 1000
+types:
+  some_body_type:
+    seq:
+      - id: payload
+        size: 999
+      - id: overflow
+        size: 2
+----
+
+The parser can successfully copy the required 999 bytes into
+`body.payload`, as the `body` substream has 1000 bytes available to be
+requested and the root stream has 2000 bytes available.
+
+An exception occurs when the parser attempts to copy data from the
+`body` substream into the `overflow` object. After the 999 bytes of
+`payload` have been copied, the `body` substream has only 1 byte of
+data left to offer, so the request for 2 more bytes exhausts it and
+raises an exception. The fact that the root stream still has 1001
+bytes available from the input file does not matter: the `body`
+substream never has the opportunity to request more than the first
+1000 bytes of the input file.
 
 === Applying `process` without a size