SANS Digital Forensics and Incident Response Blog

Understanding EXT4 (Part 5): Large Extents

Hal Pomeranz, Deer Run Associates

I've received a lot of positive feedback from the forensics community about this series of articles, but what's really rewarding is when other forensics researchers teach me something I didn't know. I recently received an email from a colleague in Europe who was looking at the extent trees for a large file in his EXT4 file system and saw something he couldn't explain.

To replicate the finding, I created a large file about 4GB in size. Recall from our discussion in Part 1 of this series that there is a 16-bit field to store the size of an extent. However, the high bit in that field is reserved to mark a preallocated extent, so you can only have 32K blocks in an extent. Assuming a typical 4K block size, that means you can only have 128MB of data in a single extent. A 4GB file is therefore going to require at least 32 extents, and even that assumes you can find 32 runs of 32K contiguous blocks to use. More likely there will be more than 32 extents, some of which don't use the full 128MB length.
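The back-of-the-envelope arithmetic above can be checked with a small C helper. This is only a sketch: it assumes perfectly contiguous free space, and the function names min_extents() and max_extent_bytes() are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum number of extents needed to map a file of file_bytes bytes,
 * assuming perfectly contiguous free space, a block size of block_size
 * bytes, and at most max_len blocks per extent. Real files on a busy
 * file system usually need more, because free space is fragmented. */
static uint64_t min_extents(uint64_t file_bytes, uint32_t block_size,
                            uint32_t max_len)
{
    assert(block_size > 0 && max_len > 0);
    uint64_t blocks = (file_bytes + block_size - 1) / block_size; /* round up */
    return (blocks + max_len - 1) / max_len;                      /* round up */
}

/* Largest amount of data, in bytes, that a single extent can map. */
static uint64_t max_extent_bytes(uint32_t block_size, uint32_t max_len)
{
    return (uint64_t)block_size * max_len;
}
```

With 4K blocks and a cap of 32768 blocks per extent, one extent maps 128MB and a 4GB file needs at least 32 extents; lowering the cap by a single block, to 32767, pushes the minimum to 33.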

After creating my 4GB file, I used the techniques described in Part 3 to decode the extent tree structure for the file and find the data block that was holding the actual extents for the file:

Extent Structures from 4GB File

In fact, if you look at the number-of-extents field in the extent header (highlighted in yellow above), you can see that the file actually uses 52 (0x0034) extents. But what's really interesting is the second extent structure I've highlighted above. Decoding this structure, we have an extent that starts at logical offset 0x00003000 (block 12288 from the start of the file) and physical block 0x0000 01A4A000 (the high 16 bits and low 32 bits of the 48-bit block number, or block 27566080).
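Decoding these structures by hand gets tedious, so here is a minimal C sketch that pulls the interesting fields out of a raw 12-byte extent header and a 12-byte leaf extent. It assumes the standard little-endian EXT4 on-disk layout (header: magic, entries, max, depth, generation; extent: ee_block, ee_len, ee_start_hi, ee_start_lo); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* On-disk EXT4 fields are little-endian; read them byte by byte so the
 * code gives the same answer on any host byte order. */
static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* The 12-byte header at the start of every extent tree node. */
struct extent_header {
    uint16_t magic;    /* 0xF30A */
    uint16_t entries;  /* number of valid entries following the header */
    uint16_t max;      /* how many entries this node can hold */
    uint16_t depth;    /* 0 means a leaf node holding actual extents */
};

/* One decoded 12-byte leaf entry. */
struct extent {
    uint32_t logical;  /* first file block mapped by this extent */
    uint16_t raw_len;  /* ee_len exactly as stored, high bit included */
    uint64_t physical; /* 48-bit start block: 16 high bits + 32 low bits */
};

static struct extent_header parse_header(const uint8_t *p)
{
    /* Bytes 8..11 (eh_generation) are not needed here. */
    struct extent_header h = { le16(p), le16(p + 2), le16(p + 4), le16(p + 6) };
    return h;
}

static struct extent parse_extent(const uint8_t *p)
{
    struct extent e;
    assert(p != 0);
    e.logical  = le32(p);
    e.raw_len  = le16(p + 4);
    e.physical = ((uint64_t)le16(p + 6) << 32) | le32(p + 8);
    return e;
}
```

Feeding parse_extent() the bytes of the second extent, as reconstructed from the values quoted above (00 30 00 00 00 80 00 00 00 A0 A4 01), gives logical block 12288, a raw length of 0x8000, and physical block 27566080.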

The thing that really surprised my colleague, however, is the extent size: 0x8000. In binary, that's a 16-bit value with the high bit set and the lower 15 bits all zeros. Because EXT4 uses the high bit to mark a preallocated extent, that would mean a preallocated extent with zero blocks. And that makes no sense at all. So what's really going on here?

It's Easier When Somebody Else Does the Legwork

I received the initial email about this issue literally the day before I had to go to SANSFIRE to teach, so I wasn't able to do any research on the problem immediately. While I was dancing around in front of my students, however, my colleague in Europe was flexing his Google kung fu and found a couple of interesting links that seemed related to the behavior we were seeing.

The first was a short note in the EXT4 developers' conference call minutes:

Amit will first be merging in Andreas' patch to fallocate, which allows initialized extents to be the full 32768 blocks. Uninitialized extents are limited to 32767 blocks. Amit will also add comments to this, and have the update patches ready by tomorrow.

The second link was what appears to be the code/comments referenced in the note above, specifically:

-#define EXT_MAX_LEN ((1UL << 15) - 1)
+/*
+ * EXT_INIT_MAX_LEN is the maximum number of blocks we can have in an
+ * initialized extent. This is 2^15 and not (2^16 - 1), since we use the
+ * MSB of ee_len field in the extent datastructure to signify if this
+ * particular extent is an initialized extent or an uninitialized (i.e.
+ * preallocated).
+ * EXT_UNINIT_MAX_LEN is the maximum number of blocks we can have in an
+ * uninitialized extent.
+ * If ee_len is <= 0x8000, it is an initialized extent. Otherwise, it is an
+ * uninitialized one. In other words, if MSB of ee_len is set, it is an
+ * uninitialized extent with only one special scenario when ee_len = 0x8000.
+ * In this case we can not have an uninitialized extent of zero length and
+ * thus we make it as a special case of initialized extent with 0x8000 length.
+ * This way we get better extent-to-group alignment for initialized extents.
+ * Hence, the maximum number of blocks we can have in an *initialized*
+ * extent is 2^15 (32768) and in an *uninitialized* extent is 2^15-1 (32767).
+ */
+#define EXT_INIT_MAX_LEN (1UL << 15)

You'll be forgiven if you don't immediately understand what this comment is telling us. I had to ponder this for some time myself before realization dawned.

To understand what's going on here, it's helpful to review an aspect of EXT file systems that I haven't covered yet. In a traditional Unix file system like EXT, blocks and inodes are arranged in sequential block groups. The number of blocks in a block group is usually 8 times the block size in bytes. That's because at the beginning of each block group is a single block holding a file system metadata structure called the block bitmap, which tracks whether each block in the group is allocated. If each bit in the block bitmap tracks one block in the group, then the most blocks a single bitmap block can track is 8 x (block size), or 32K blocks in a typical file system with a 4K block size.

The block bitmap, the inode bitmap, the blocks reserved for inodes (EXT normally allocates one inode for every four blocks in the block group, btw), and copies of the superblock and other file system metadata are normally stored right before the data blocks in the block group. All of this file system metadata means that you'll never find more than 32K contiguous data blocks in an EXT file system, and even that is only possible if the block group in question is currently unused.
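The block group arithmetic and the one-bit-per-block bitmap described above can be sketched in a few lines of C. This assumes the default layout where a group holds 8 x (block size) blocks; the real value for a given file system is recorded in the superblock (s_blocks_per_group). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A single bitmap block has block_size * 8 bits, one per block in the
 * group, which caps the group size at 8 x (block size) blocks. */
static uint32_t blocks_per_group(uint32_t block_size)
{
    return block_size * 8;
}

/* Is block n (counted from the start of its group) marked allocated?
 * EXT bitmaps number bits little-endian within each byte:
 * block n lives in byte n / 8, bit n % 8. */
static bool block_in_use(const uint8_t *bitmap, uint32_t n)
{
    assert(bitmap != 0);
    return (bitmap[n / 8] >> (n % 8)) & 1;
}
```

With 4K blocks, blocks_per_group() gives 32768; with 1K blocks it gives 8192, so the same one-block bitmap covers a much smaller group.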

Now let's think about this in the context of the extent size field in EXT4. This field is a 16-bit value, but the high bit is reserved. That means an extent can only contain up to 2^15-1 blocks, or 32767 blocks, which is exactly one block less than the number of blocks in a single block group. This is wasteful: mapping one completely free block group would take two extents, one of 32767 blocks and a second covering the single leftover block.

Now why was the upper bit of the extent size reserved? So that the file system could mark certain extents as "uninitialized but reserved". This "preallocation" strategy allows EXT4 to keep other files from using certain blocks when it expects a file to need those blocks in the future, thus avoiding the need to fragment the growing file. It's like putting a hem on a boy's trousers so that they can be let down as the child grows up.
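Applications can also ask for this kind of preallocation explicitly. Below is a minimal POSIX C sketch using posix_fallocate(3), which glibc on Linux implements with the fallocate(2) system call when the underlying file system supports it. The helper name create_preallocated() is hypothetical, and the caller supplies the path template.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>      /* posix_fallocate */
#include <stdlib.h>     /* mkstemp */
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* close, unlink */

/* Create a new file from tmpl (which must end in "XXXXXX"; mkstemp replaces
 * those characters with a unique name) and ask the file system to reserve
 * `bytes` of space for it, starting at offset 0. Returns an open file
 * descriptor, or -1 on failure. The file is left on disk so that its
 * extents can be examined afterwards. */
static int create_preallocated(char *tmpl, off_t bytes)
{
    assert(tmpl != NULL && bytes > 0);
    int fd = mkstemp(tmpl);
    if (fd < 0)
        return -1;
    /* posix_fallocate() returns 0 or an error number; it does not set errno. */
    if (posix_fallocate(fd, 0, bytes) != 0) {
        close(fd);
        unlink(tmpl);
        return -1;
    }
    return fd;
}
```

If the template points at a directory on an EXT4 file system, the reserved space should show up as uninitialized extents, with the high bit of the extent size set, when the file is examined with the techniques from Part 3. On file systems without native fallocate support, glibc falls back to actually writing the blocks, so no uninitialized extents result there.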

But to allow extents to fill an entire block group, the EXT4 developers have pulled a dirty trick. An extent size of 0x8000 would normally mean "an uninitialized extent with zero blocks in it". But why would I preallocate zero blocks? There's no point. So the EXT4 developers have added a special case: the value 0x8000 means an allocated extent covering the full 32K blocks of a block group.

All other values with the high bit set mean a preallocated but uninitialized extent whose length is determined by the other fifteen bits in the extent size. But that means we're back to only being able to preallocate up to 2^15-1 blocks, or 32767 blocks, again one block less than the maximum number of blocks in a block group. If you read them carefully, that's what the comment and code I cited above are trying to tell us.

So the short answer is that an extent size value of 0x8000 means an allocated extent that's 32K blocks long. Any smaller value will also be an allocated extent, because the high bit will not be set. Any value above 0x8000 is a preallocated extent whose length is specified by the lower 15 bits in the value.
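The decoding rule above fits in a few lines of C. This is a minimal sketch modeled on the rule stated in the kernel comment quoted earlier; the two macro names are borrowed from that comment, but the decode_ee_len() helper is hypothetical and is not the kernel's own code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXT_INIT_MAX_LEN   (1U << 15)              /* 32768 */
#define EXT_UNINIT_MAX_LEN (EXT_INIT_MAX_LEN - 1)  /* 32767 */

/* Return the number of blocks an extent covers, and report through *uninit
 * whether it is a preallocated (uninitialized) extent. */
static uint32_t decode_ee_len(uint16_t ee_len, bool *uninit)
{
    assert(uninit != 0);
    if (ee_len <= EXT_INIT_MAX_LEN) {
        /* Up to and including 0x8000: an initialized extent. 0x8000 is the
         * special case meaning a full 32768-block extent. */
        *uninit = false;
        return ee_len;
    }
    /* 0x8001 through 0xFFFF: uninitialized; the low 15 bits give the length. */
    *uninit = true;
    return ee_len - EXT_INIT_MAX_LEN;
}
```

Under this rule, 0x8000 decodes to a 32768-block initialized extent, while 0xFFFF, the largest uninitialized value, maps only EXT_UNINIT_MAX_LEN (32767) blocks.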

A Quick Shout-Out

On the subject of EXT4 and community research projects, I wanted to mention the efforts of my MANDIANT colleague William Ballenthin, who's been working on a set of EXT4 patches for The Sleuth Kit. Hopefully these patches will become stable enough to roll into the main TSK release in the near future. Rock on, Willi!

Hal Pomeranz is an Independent Consultant specializing in Digital Forensics, a SANS Institute Faculty Fellow, and a GCFA. He's still trying to figure out how to turn all of this EXT4 research into a free trip to Europe. Hal will be teaching SANS For508: Advanced Computer Forensic Analysis and Incident Response in Baltimore, Oct 9-14, 2011.


Posted November 1, 2011 at 6:46 AM

Kurt H Hanssen

Thanks, Hal. I think I'm the colleague in Europe (Norway, to be more precise). Very satisfied this was cleared up. EXT4 is so far one of the most complicated file systems, so we look forward to btrfs.
Thank you very much for the prompt replies to my emails.
Norwegian Police University College
Police Superintendent K. Hanssen