Byte Prediction with Gap and FF-16

Posted | Modified
Author

Building on the concept of the prediction table, the algorithm has been extended with a gap parameter.

Now, the algorithm predicts the most likely byte that comes after the current byte, allowing for a gap between them.

In practice, this means the algorithm can find frequent 16-bit patterns with gaps. The example below shows a gap of 3 bytes between two 00 bytes.

00 +(3) 00

The tool FF-16 (Find Frequent 16-bit) utilizes this algorithm.

FF-16 processes a file by splitting it into multiple 256-byte blocks, running the algorithm independently on each block. While the analysis occurs at the block level, the results are displayed at the bucket level, where a bucket consists of multiple blocks. This approach ensures that the user is not overwhelmed with excessive detail, making the output more practical to interpret.

Map View

When running FF-16 on notepad.exe the output looks like this.

BucketSize: 2816         0               1               2               3               4               5               6               7
Pattern     Ascii Blocks 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF01234567
00 00        |..| 542   |***********************************************************    ***** **** ********                       ********* * ***|
-               - 309   |                                                           ****                  *************************              |
CC CC        |..| 91    | ********* ******** ******                **  * ****** *                                                                |
48 8B        |H.| 68    |     ****  *** ****  * * *   **   ****     *   *  ******                                                                |
00 +(3) 00   |..| 42    |                                                              ****       *   *****                            * **  **  |
FF +(3) FF   |..| 42    |                                                                                                          ** *** ****   |
00 +(1) 00   |..| 41    |                                                          **  *  ** **      *   *                             *      *  |
00 +(7) 00   |..| 40    |                                                       ***      ****  **   ***** *                          *  *        |
00 +(11) 00  |..| 17    |                        *                                     *     *   **    *                                         |
00 +(15) 00  |..| 17    |                  *                                      *       ***  *     ***  *                                      |
02 00        |..| 15    |       ** * **   * ** *   *                                              *                                              |
01 00        |..| 12    |                  *      **     * *         *** *                                                                       |
FF +(11) FF  |..| 12    |                                                                                                                 **     |
00 +(19) 00  |..| 12    |                              *          *                      * *                                                     |
41 8B        |A.| 11    |                             * **      **                                                                               |
48 89        |H.| 10    |                              **        **                                                                              |
48 +(5) 00   |H.| 8     |              **    * * * **                                                                                            |
<LIST CUT HERE>

The map lists frequent patterns and marks buckets with * if they contain at least one block with a frequent pattern. The patterns occurring in more blocks are listed first. The map presents an overview of the file layout, showing the dominant patterns in different sections of the file.