SANS Digital Forensics and Incident Response Blog

Missed It By That Much!

Hal Pomeranz, Deer Run Associates

One primitive forensic technique I show my students in my SANS Sec506 class is the tried-and-true method of using grep to display the byte offsets of "strings of interest" found in a disk image. For example, I have my students go looking for "love" in the file system of the VMware image we use in class:

# grep -abi ' love ' /dev/sda6
452925733:# This is a comment. I love comments.
...

Once you have the byte offsets from grep, all you have to do is divide by the block size of the file system (hint: use fsstat) to get the number of the block that the string resides in. In the example, /dev/sda6 is a small file system that uses 1024-byte blocks, so the number of the block where love is hiding is 452925733 / 1024 = 442310 (integer division, discarding the remainder).
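
The division is easy to script. Here is a minimal sketch using shell arithmetic, which divides integers and throws away the remainder. The function name offset_to_block is made up for illustration, and the block size is assumed to be whatever fsstat reports for your file system:

```shell
#!/bin/sh
# Hypothetical helper (not part of any forensic toolkit): convert a byte
# offset reported by grep -b into a block number and an offset in that block.
# usage: offset_to_block BYTE_OFFSET BLOCK_SIZE
offset_to_block() {
    byte_offset=$1
    block_size=$2
    # Shell arithmetic is integer arithmetic, so the remainder is discarded.
    block=$((byte_offset / block_size))
    # The remainder is how far into the block the offset falls.
    within=$((byte_offset % block_size))
    echo "$block $within"
}

offset_to_block 452925733 1024
```

The first number printed is the block number; the second is the position of the offset inside that block.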

However, when we go to dump that block, something interesting occurs:

# dcat /dev/sda6 442310
###
### Begin Red Hat Mailcap
###

audio/mod; /usr/bin/mikmod %s
# play is apparently a security hole
#audio/*; /usr/bin/play %s

image/*; gthumb %s

application/msword; ooffice %s
application/pdf; evince %s
application/postscript ; evince %s

text/html; /usr/bin/htmlview %s ; copiousoutput

Apparently there's something wrong with our technique, because the string we matched with grep doesn't appear in the block whose number we calculated from the byte offset! Let's try the next block:

# dcat /dev/sda6 442311
# This is a comment. I love comments.

# This file controls what Internet media types are sent to the client for
...

OK, there's our string, right at the beginning of the next block. What the heck is going on here?

We've just encountered one of the curious side effects of using a line-oriented tool like grep on binary file system image data. It's easier to understand what's happening if you look at the hex dump output of the original block whose number we calculated from the byte offset:

# dcat -h /dev/sda6 442310
0 23232320 0a232323 20426567 696e2052 ### .### Beg in R
16 65642048 6174204d 61696c63 61700a23 ed H at M ailc ap.#
...
272 77202573 203b2063 6f70696f 75736f75 w %s ; c opio usou
288 74707574 0a000000 00000000 00000000 tput .... .... ....
304 00000000 00000000 00000000 00000000 .... .... .... ....
320 00000000 00000000 00000000 00000000 .... .... .... ....
...

What we've got here is a very small file that ends at byte 292 with a final trailing newline (hex 0x0a). The rest of the slack space in the block is filled with nulls.

However, grep displays the byte offset of the start of the matching line, not of the match itself. From its perspective the next "line" starts at byte 293 in our block, right after that trailing newline. Since there are no newlines in the rest of the block, that "line" continues into the next block and includes our "string of interest". Remember that grep is just treating its input as a stream of data; it knows nothing about block boundaries because the lower-level system call interfaces hide those details from grep.

If what I'm telling you is the truth, then the original byte offset that was displayed by grep (452925733) should equal 442310 blocks of 1024 bytes each, plus 293 bytes. You can do the calculation for yourself, but here's my result (presented in true Unix nerd fashion):
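
You can reproduce the effect on a small synthetic file. The sketch below assumes GNU grep, where adding -o to -b makes grep print the offset of the matched text itself instead of the offset of the start of the line. The helper name make_fake_image and the file contents are invented for the demonstration; they only mimic the layout of the two blocks shown above:

```shell
#!/bin/sh
# Hypothetical helper: write a fake two-"block" image to the path in $1.
make_fake_image() {
    {
        head -c 292 /dev/zero | tr '\0' 'x'   # 292 bytes of stand-in content
        printf '\n'                           # trailing newline at byte 292
        head -c 731 /dev/zero                 # null slack, bytes 293-1023
        printf '# This is a comment. I love comments.\n'   # starts block 1
    } > "$1"
}

img=$(mktemp)
make_fake_image "$img"

# -b alone: offset of the start of the matching "line".
grep -abi ' love ' "$img" | cut -d: -f1
# -b with -o (GNU grep): offset of the matched text itself.
grep -aboi ' love ' "$img" | cut -d: -f1

rm -f "$img"
```

On this file the first command should report 293, which falls in block 0, while the second should report 1046, which falls in block 1 where the string actually lives. If your grep supports the combination, dividing the -o offset by the block size avoids the off-by-one-block surprise.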

# expr 442310 \* 1024 + 293
452925733

Theory confirmed.

Now don't get me wrong: the "grep for strings of interest" technique is a useful arrow in your forensics quiver. You just need to be aware that little bumps in the road like this will occur whenever you're using tools like grep in ways that they were not originally intended to be used.