SANS Digital Forensics and Incident Response Blog

Computer Forensics: Identifying Disk Differences — Broken Mirrors

One Friday afternoon I was greeted by a large package from FedEx. Inside the giant box was supposed to be a hard disk drive on which I was to conduct digital forensic analysis. Opening the box and removing a few handfuls of packing peanuts revealed a bubble-wrapped Dell tower. Obviously the clients, like most non-computer folks, didn't know they could remove just the hard disk drive from the tower and send that my way.

After grabbing the paperwork for this job and filling out my own chain-of-custody documentation and evidence receipt, I cracked open the tower and saw the following inside:

Image 1: Double SATA, double fun

This job suddenly got more interesting, and possibly less profitable for me. I wondered whether the clients knew, when they hired me, that the system contained two disks. I disconnected the lower drive from the system, added it to my evidence inventory as D1, and connected it to my write-blocker and then to the SIFT Workstation.

After gathering the mmls output for the drive, I started the imaging process. The output from mmls looked something like this (actually, it looked nothing like this; I'm not going to expose details of the client's system, but for our purposes this works):


DOS Partition Table
Offset Sector: 0
Units are in 512-byte sectors

Slot Start End Length Description
00: Meta 0000000000 0000000000 0000000001 Primary Table (#0)
01: ----- 0000000000 0000000062 0000000063 Unallocated
02: 00:00 0000000063 0000210923 0000210861 Dell Utilities FAT (0xde)
03: Meta 0000210924 0007952615 0007741692 Win95 Extended (0x0F)
04: Meta 0000210924 0000210924 0000000001 Extended Table (#1)
05: ----- 0000210924 0000210986 0000000063 Unallocated
06: 01:00 0000210987 0007952615 0007741629 NTFS (0x07)

 
Based on the interviews with my clients and the information I had about the case, I knew that the NTFS partition would be the primary focus of my investigation. I started the disk imaging process and went out for a walk-about.

After dc3dd finished imaging and hashing D1, I returned it to the Dell tower, removed the second hard disk drive, added it to the evidence inventory as D2 and repeated the process above.

Running mmls on D2 revealed the following information:


DOS Partition Table
Offset Sector: 0
Units are in 512-byte sectors

Slot Start End Length Description
00: Meta 0000000000 0000000000 0000000001 Primary Table (#0)
01: ----- 0000000000 0000000062 0000000063 Unallocated
02: 00:00 0000000063 0000210923 0000210861 Dell Utilities FAT (0xde)
03: Meta 0000210924 0007952615 0007741692 Win95 Extended (0x0F)
04: Meta 0000210924 0000210924 0000000001 Extended Table (#1)
05: ----- 0000210924 0000210986 0000000063 Unallocated
06: 01:00 0000210987 0007952615 0007741629 NTFS (0x07)

 
Astute readers will note that D2's partition table matches D1's exactly. I breathed a sigh of relief, as it appeared that D1 and D2 were a mirrored pair, likely running in a RAID 1 configuration. This meant that I could limit my analysis to a single drive and probably finish the job on the original timeline I'd promised, without losing money on the gig.
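mmls is the right tool here because it reads both the primary table and the extended-partition chain. As a minimal Python sketch of just the first part of that check, assuming raw (dd-style) images with a classic MBR, the code below reads the four 16-byte primary entries that start at byte 446 of sector 0 and compares partition type, starting sector and length. The function names are hypothetical, and the sketch does not walk the extended partition, so the NTFS logical partition in the listings above is outside its reach.

```python
import struct

SECTOR_SIZE = 512   # mmls: "Units are in 512-byte sectors"
TABLE_OFFSET = 446  # the four primary entries start at byte 0x1BE of the MBR
ENTRY_SIZE = 16


def read_primary_entries(path):
    """Return the four primary MBR entries as (type, start_sector, num_sectors) tuples."""
    with open(path, "rb") as f:
        mbr = f.read(SECTOR_SIZE)
    if len(mbr) < SECTOR_SIZE or mbr[510:512] != b"\x55\xaa":
        raise ValueError(f"{path}: missing 0x55AA boot signature")
    entries = []
    for i in range(4):
        entry = mbr[TABLE_OFFSET + i * ENTRY_SIZE:TABLE_OFFSET + (i + 1) * ENTRY_SIZE]
        ptype = entry[4]                                  # partition type byte, e.g. 0xDE or 0x0F
        start, count = struct.unpack_from("<II", entry, 8)  # little-endian start LBA and sector count
        entries.append((ptype, start, count))
    return entries


def same_primary_table(path_a, path_b):
    """True if both images carry identical primary partition entries."""
    return read_primary_entries(path_a) == read_primary_entries(path_b)
```

For the sanitized listings above, both images would yield entries for the 0xDE utility partition (start 63, length 210861) and the 0x0F extended partition (start 210924, length 7741692), plus two empty slots.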

Rather than image D2 right away, I took an MD5 hash of it to verify that it matched D1. Sadly, it did not. If the drives were supposed to be a mirrored pair, the synchronization was apparently broken. At this point, I mounted D2's NTFS partition on the SIFT Workstation, viewed the contents of its root directory and compared them to the root directory of the NTFS image I'd already collected. Everything in the root directory was the same: file names, sizes, time stamps, subdirectory names and so on. My theory that this was an out-of-sync mirrored pair still had some life in it. I started imaging D2 and thought about ways of quickly isolating the differences between the two disks.
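Hashing a whole drive just means streaming it through MD5 in fixed-size chunks so a multi-gigabyte image never has to fit in memory. A minimal Python sketch, assuming the inputs are raw images or a device node read through a write-blocker (a hypothetical /dev/sdb, for example); the function names are illustrative, not part of any SIFT tool:

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB reads keep memory use flat regardless of image size


def md5_of_file(path, chunk_size=CHUNK_SIZE):
    """MD5 of a whole image (or device node), computed chunk by chunk."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def images_match(path_a, path_b):
    """True only if both inputs produce the same MD5 (bit-for-bit identical, for practical purposes)."""
    return md5_of_file(path_a) == md5_of_file(path_b)
```

A mismatch only says that something differs, not where; the rest of this post is about answering the where.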

My first thought was to run Jesse Kornblum's ssdeep against the disk images to see how similar they were. I tried this, but ssdeep complained, "Value too large for defined data type."

Next, I carved each partition out of both disk images and compared their MD5 hashes, one partition at a time. Every partition's hash matched except the NTFS partition's.
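Because mmls reports offsets in 512-byte sectors, carving a partition amounts to seeking to start × 512 and copying length × 512 bytes, which is what dd or dc3dd do with skip= and count= at bs=512. Here is a minimal Python sketch of the carve-and-compare step, assuming raw full-disk images and using the start and length values from the sanitized mmls listings above; PARTITIONS, carve and compare_partitions are hypothetical names:

```python
import hashlib

SECTOR_SIZE = 512  # mmls: "Units are in 512-byte sectors"

# (start sector, length in sectors) taken from the sanitized mmls output above
PARTITIONS = {
    "dell_utilities_fat": (63, 210861),
    "ntfs": (210987, 7741629),
}


def carve(image_path, start, length, out_path=None, chunk_sectors=2048):
    """Read `length` sectors beginning at sector `start`, optionally save them, and return their MD5."""
    h = hashlib.md5()
    remaining = length * SECTOR_SIZE
    with open(image_path, "rb") as img:
        img.seek(start * SECTOR_SIZE)
        out = open(out_path, "wb") if out_path else None
        try:
            while remaining > 0:
                data = img.read(min(remaining, chunk_sectors * SECTOR_SIZE))
                if not data:
                    raise ValueError("image ends before the partition does")
                h.update(data)
                if out is not None:
                    out.write(data)
                remaining -= len(data)
        finally:
            if out is not None:
                out.close()
    return h.hexdigest()


def compare_partitions(image_a, image_b, partitions=PARTITIONS):
    """Map each partition name to True if its carved MD5 matches across the two images."""
    return {
        name: carve(image_a, start, length) == carve(image_b, start, length)
        for name, (start, length) in partitions.items()
    }
```

Given the results described above, running compare_partitions on two hypothetical full-disk images would report True for the Dell utility partition and False for the NTFS partition.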

Knowing that dc3dd's hashwindow option can be used for piecewise hashing, I decided to use it to determine how similar the two NTFS partitions were. Here are the commands I used:


dc3dd if=D1_ntfs.img hash=md5 hashwindow=10M hashlog=D1.hashlog of=/dev/null
dc3dd if=D2_ntfs.img hash=md5 hashwindow=10M hashlog=D2.hashlog of=/dev/null

 
Note that the output file is /dev/null. I didn't need to create new images, since I'd already carved them out; all I wanted from this operation was to locate which portions of the two images differed. Running these two commands produced two text files containing an MD5 hash for each 10 MB section of the NTFS partitions. Here's a sample of the output:


md5 0- 10485760: fdfd6a607ebef09871c3c51140e9eb40
md5 10485760- 20971520: f1c9645dbc14efddc7d8a322685f26eb
md5 20971520- 31457280: f1c9645dbc14efddc7d8a322685f26eb
md5 31457280- 41943040: f1c9645dbc14efddc7d8a322685f26eb
md5 41943040- 52428800: f1c9645dbc14efddc7d8a322685f26eb
...
md5 482344960- 492830720: f1c9645dbc14efddc7d8a322685f26eb
md5 492830720- 503316480: 00254e8b9cf9c6d3a1f6ba8040cf4782
md5 503316480- 513802240: 348b1f2236220e4ab71e335385cb80fe
md5 513802240- 524288000: f1c9645dbc14efddc7d8a322685f26eb
...

 
The first column of the output shows the hashing algorithm used, followed by the starting and ending byte offsets and then the MD5 sum of the bytes in that range. Note that the first 10 MB section of the partition contained data, while the next 460 MB (from 10 MB up to 470 MB) contained only nulls, so every 10 MB section in that range produced the same MD5 sum. Finally, between 470 and 480 MB, the partition contained something other than nulls, and the MD5 sums started to vary again.
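Piecewise hashing is simple to reproduce outside dc3dd. The sketch below, assuming raw images, hashes consecutive fixed-size windows and writes lines in the same "md5 first- end: digest" form shown above, plus a TOTAL line like the one at the end of the diff output later in this post. piecewise_md5 and write_hashlog are hypothetical helpers, and the exact layout of dc3dd's own hashlog may differ.

```python
import hashlib

WINDOW = 10 * 1024 * 1024  # same 10 MB window as hashwindow=10M


def piecewise_md5(path, window=WINDOW, start=0, length=None, running=None):
    """Yield (first_byte, end_byte, md5hex) for consecutive `window`-byte pieces of a file.

    start/length restrict hashing to a byte range; `running`, if given, is a hashlib
    object updated with every byte read, so the caller can also get an overall hash.
    """
    with open(path, "rb") as f:
        f.seek(start)
        offset, remaining = start, length
        while remaining is None or remaining > 0:
            size = window if remaining is None else min(window, remaining)
            data = f.read(size)
            if not data:  # end of file
                break
            if running is not None:
                running.update(data)
            yield offset, offset + len(data), hashlib.md5(data).hexdigest()
            offset += len(data)
            if remaining is not None:
                remaining -= len(data)


def write_hashlog(image_path, log_path, window=WINDOW):
    """Write one 'md5 first- end: digest' line per window, then a TOTAL line."""
    total = hashlib.md5()
    with open(log_path, "w") as log:
        for first, end, digest in piecewise_md5(image_path, window, running=total):
            log.write(f"md5 {first}- {end}: {digest}\n")
        log.write(f"md5 TOTAL: {total.hexdigest()}\n")
```

Because every all-null window hashes to the same value, long runs of identical lines (like the repeated f1c9645dbc14efddc7d8a322685f26eb entries above) are a quick visual cue for empty space.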

I was curious to see how different the two hashlogs were, so I ran ssdeep against the first hashlog and saved the result, then ran ssdeep in matching mode (-m) to compare the second hashlog against it:


ssdeep D1.hashlog > D1.hashlog.ssdeep
ssdeep -m D1.hashlog.ssdeep D2.hashlog
/cases/20100808/D2.hashlog matches D1.hashlog.ssdeep:/cases/20100808/D1.hashlog (88)

 
ssdeep gave the hashlogs from these two partitions a match score of 88 on its 0-to-100 scale, a strong match. I now had more evidence that these two partitions were (at one point in time) a mirrored pair. The next step was to locate the differences between the two partitions. To do that, I used the diff command:

diff D1.hashlog D2.hashlog 
1c1
< md5 0- 10485760: fdfd6a607ebef09871c3c51140e9eb40
---
> md5 0- 10485760: ef9f993a60a6a77114aab999091597ce
48c48
< md5 492830720- 503316480: 00254e8b9cf9c6d3a1f6ba8040cf4782
---
> md5 492830720- 503316480: 547bd7c44930a1911cd6ce6f85b606df
51c51
< md5 524288000- 534773760: 705c8fc001d91cc32919d34d83127df6
---
> md5 524288000- 534773760: 64772837bbb0502f98af41261bb3743e
53c53
< md5 545259520- 555745280: 272594145001e58f0b1dfba6e7a36ce1
---
> md5 545259520- 555745280: 3497c9365449e8339c550b161ea98535
55,56c55,56
< md5 566231040- 576716800: b20266a7591cac2f2cfa9f8375a71761
< md5 576716800- 587202560: f9b18be13c774fa009717101ec495afc
---
> md5 566231040- 576716800: 58638effcded45e272d555def45351f8
> md5 576716800- 587202560: c366890edd98ed67c381adc7c294dfb5
58,61c58,61
< md5 597688320- 608174080: 9a30b16c50fdd1e6b46c621cabde0ecd
< md5 608174080- 618659840: 59f55c57bc15467e1734d8eab837b02c
< md5 618659840- 629145600: cdcb01a465f188cff6e08b5189413f2e
< md5 629145600- 639631360: 7981b144a85149fcac7fce2161d44278
---
> md5 597688320- 608174080: dedffa8e94b137914ae70ee64d02ec5b
> md5 608174080- 618659840: 9261d04130b4802a9ee6cfe50e5b3f2a
> md5 618659840- 629145600: 46f66e9d9711815a521a3429179f3e42
> md5 629145600- 639631360: 5696f83cead6b1fc80bf5b7819535f99
63c63
< md5 650117120- 660602880: 43aadd02600598fab034d091684c9dff
---
> md5 650117120- 660602880: 9d6db2c17acc1c321493bd054510c1d1
190c190
< md5 1981808640- 1992294400: c8a0dc3bcbedc485c3ebfd06087a34d8
---
> md5 1981808640- 1992294400: b6a289e4342258a016223eb4400f1c8c
380c380
< md5 TOTAL: a347712cc414e2f7ea23baedd929d620
---
> md5 TOTAL: 2ba46718f54169305073b0bc469bc1e9

 
If you're anything like me, that output may look mind-numbing, so let's review it a few lines at a time. First is the diff command itself, which simply compares D1.hashlog to D2.hashlog. The next line, "1c1", refers to line one in each file; the "c" means that line one of the first file was "changed" into line one of the second file. The next several entries follow this same format, and then we see:


55,56c55,56
< md5 566231040- 576716800: b20266a7591cac2f2cfa9f8375a71761
< md5 576716800- 587202560: f9b18be13c774fa009717101ec495afc
---
> md5 566231040- 576716800: 58638effcded45e272d555def45351f8
> md5 576716800- 587202560: c366890edd98ed67c381adc7c294dfb5

 
This means that lines 55 through 56 of the first file were changed into lines 55 through 56 of the second file. With that explanation, you can make sense of the rest of the output. Given diff's output, we've narrowed the differences between the two ~4 GB partition images down to twelve 10 MB sections, or 120 MB. We can narrow in on the differences even further by repeating the process with a smaller hashwindow, restricting dc3dd to the sections of the NTFS images where we know the differences reside.
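The repeat-with-a-smaller-window idea can be automated. In this minimal Python sketch, assuming two raw images of equal size, each pass re-hashes only the regions that differed in the previous pass, using the same 10 MB, 1 MB and 512-byte windows this post works through by hand. window_hashes and drill_down are hypothetical names.

```python
import hashlib
import os

MiB = 1024 * 1024


def window_hashes(path, window, start, length):
    """List (first_byte, end_byte, md5hex) for `window`-sized pieces of bytes [start, start+length)."""
    out = []
    with open(path, "rb") as f:
        f.seek(start)
        offset, remaining = start, length
        while remaining > 0:
            data = f.read(min(window, remaining))
            if not data:
                break
            out.append((offset, offset + len(data), hashlib.md5(data).hexdigest()))
            offset += len(data)
            remaining -= len(data)
    return out


def drill_down(path_a, path_b, windows=(10 * MiB, MiB, 512)):
    """Return the smallest-window byte ranges whose MD5 differs between two same-size images."""
    size = os.path.getsize(path_a)
    if size != os.path.getsize(path_b):
        raise ValueError("this sketch assumes both images are the same size")
    ranges = [(0, size)]  # (start, length) regions still known to differ
    for window in windows:
        narrowed = []
        for start, length in ranges:
            pieces_a = window_hashes(path_a, window, start, length)
            pieces_b = window_hashes(path_b, window, start, length)
            for (first, end, md5_a), (_, _, md5_b) in zip(pieces_a, pieces_b):
                if md5_a != md5_b:
                    narrowed.append((first, end - first))
        ranges = narrowed
    return [(start, start + length) for start, length in ranges]
```

On the partitions in this case, the first pass would flag the same twelve 10 MB windows (120 MB) that diff found, and each later pass would re-read only those regions instead of the whole ~4 GB image.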

The hashlog files contain byte offsets, but dc3dd's skip and count options work in blocks. By choosing a block size and dividing the byte offsets from diff's output by it, we can drill down into specific portions of each 10 MB section and locate the differences more precisely. Here's an example:


dc3dd bs=512 if=D1_ntfs.img of=/dev/null count=20480 hashwindow=1M hash=md5 hashlog=diff_1_D1.dd.hashlog
warning: sector size not probed, assuming 512
dc3dd 6.12.3 started at 2010-08-08 21:47:49 -0400
command line: dc3dd bs=512 if=D1_ntfs.img of=/dev/null count=20480 hashwindow=1M hash=md5 hashlog=diff_1_D1.dd.hashlog
compiled options: DEFAULT_BLOCKSIZE=32768
sector size: 512 (assumed)
md5 0- 1048576: d5c912a902d74371aa06aafefe21674a
md5 1048576- 2097152: b6d81b360a5672d80c27430f39153e2c
...

dc3dd bs=512 if=D2_ntfs.img of=/dev/null count=20480 hashwindow=1M hash=md5 hashlog=diff_1_D2.dd.hashlog
warning: sector size not probed, assuming 512
dc3dd 6.12.3 started at 2010-08-08 21:50:17 -0400
command line: dc3dd bs=512 if=D2_ntfs.img of=/dev/null count=20480 hashwindow=1M hash=md5 hashlog=diff_1_D2.dd.hashlog
compiled options: DEFAULT_BLOCKSIZE=32768
sector size: 512 (assumed)
md5 0- 1048576: 042a76d72aaf721c2d49246a40d974df
md5 1048576- 2097152: b6d81b360a5672d80c27430f39153e2c
...

 
I've abbreviated the output, but you can see that where previously we knew there was a difference somewhere in the first 10 MB, we now know the difference is actually in the first MB. Now we're getting somewhere. Let's zoom in on the first MB of each file and see how we can pinpoint the difference.


dc3dd bs=512 if=D1_ntfs.img of=/dev/null hash=md5 hashwindow=512 count=2048 hashlog=D1_1MB.hashlog
dc3dd bs=512 if=D2_ntfs.img of=/dev/null hash=md5 hashwindow=512 count=2048 hashlog=D2_1MB.hashlog

 

These dc3dd commands collect the MD5 hashes for every 512 bytes of data for the first 1 MB of each NTFS partition. Running diff on the two resulting hashlog files, we get the following result:

diff D1_1MB.hashlog D2_1MB.hashlog 
17c17
< md5 8192- 8704: 590693b0719f5a66787565fa3d795e05
---
> md5 8192- 8704: dc1196943e31869bbcf12fe86f7d896c

 

Now we know the difference in the first MB of the NTFS partitions is somewhere between byte offset 8192 and 8704. At this point, we can easily carve out this section of each file and compare them.


dc3dd if=D1_ntfs.img of=D1_8192-8704.img bs=512 skip=16 count=1 hash=md5
warning: sector size not probed, assuming 512
dc3dd 6.12.3 started at 2010-08-08 22:18:22 -0400
command line: dc3dd if=D1_ntfs.img of=D1_8192-8704.img bs=512 skip=16 count=1 hash=md5
compiled options: DEFAULT_BLOCKSIZE=32768
sector size: 512 (assumed)
md5 TOTAL: 590693b0719f5a66787565fa3d795e05
1+0 sectors in
1+0 sectors out
512 bytes (512) copied (100%), 0.00191684 s, 261 K/s
dc3dd completed at 2010-08-08 22:18:22 -0400

dc3dd if=D2_ntfs.img of=D2_8192-8704.img bs=512 skip=16 count=1 hash=md5
warning: sector size not probed, assuming 512
dc3dd 6.12.3 started at 2010-08-08 22:18:42 -0400
command line: dc3dd if=D2_ntfs.img of=D2_8192-8704.img bs=512 skip=16 count=1 hash=md5
compiled options: DEFAULT_BLOCKSIZE=32768
sector size: 512 (assumed)
md5 TOTAL: dc1196943e31869bbcf12fe86f7d896c
1+0 sectors in
1+0 sectors out
512 bytes (512) copied (100%), 0.00155415 s, 322 K/s
dc3dd completed at 2010-08-08 22:18:42 -0400

 
Let's review the dc3dd commands above. We specify our block size (bs) as 512 bytes; we want to skip to byte offset 8192 and collect a single block. Skip and count both take blocks as arguments, so we divide 8192 by 512 and get 16 for our skip value. Note also that the MD5 sums of the two 512-byte sections we've carved out match the sums for that range in the earlier hashlogs.
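The block arithmetic is easy to get wrong by hand, so a small helper can do it. A minimal Python sketch, assuming the byte ranges come straight from a hashlog and are multiples of the block size; dd_skip_count and carve_command are hypothetical helpers that only build the numbers and the command string, they do not run dc3dd:

```python
def dd_skip_count(first_byte, end_byte, bs=512):
    """Convert the byte range [first_byte, end_byte) into dd/dc3dd skip= and count= block values."""
    if first_byte % bs or end_byte % bs:
        raise ValueError(f"offsets must be multiples of bs={bs}")
    return first_byte // bs, (end_byte - first_byte) // bs


def carve_command(image, out, first_byte, end_byte, bs=512):
    """Build a dc3dd command line in the same form as the carving commands above."""
    skip, count = dd_skip_count(first_byte, end_byte, bs)
    return f"dc3dd if={image} of={out} bs={bs} skip={skip} count={count} hash=md5"


print(carve_command("D1_ntfs.img", "D1_8192-8704.img", 8192, 8704))
# dc3dd if=D1_ntfs.img of=D1_8192-8704.img bs=512 skip=16 count=1 hash=md5
```

The same arithmetic shows that covering the first 10 MB at bs=512 takes count=20480, and the first 1 MB takes count=2048.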

Now, to pinpoint the differences within these 512-byte sections, I run each through xxd to produce a hex dump, and then diff the two dumps:


xxd -g1 -u D1_8192-8704.img > D1_8192-8704.img.xxd
xxd -g1 -u D2_8192-8704.img > D2_8192-8704.img.xxd
diff D1_8192-8704.img.xxd D2_8192-8704.img.xxd
1c1
< 0000000: FF FF 00 07 00 00 00 00 7E 37 01 03 00 00 00 00 ........~7......
---
> 0000000: FF FF 00 07 00 00 00 00 FF 3F 00 00 00 00 00 00 .........?......
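If a hex-dump diff is more than you need, comparing the two carved sectors byte by byte gives the differing offsets directly. A minimal Python sketch, assuming two equal-length carved files; byte_diffs is a hypothetical name, and GNU cmp -l produces similar output (with 1-based byte numbers and values printed in octal):

```python
def byte_diffs(path_a, path_b, base_offset=0):
    """List (absolute_offset, byte_a, byte_b) for each position where two equal-length files differ.

    base_offset is where the carved pieces started inside the partition image (8192 here),
    so the reported offsets are partition-relative rather than file-relative.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    if len(a) != len(b):
        raise ValueError("files differ in length")
    return [(base_offset + i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
```

Run on the two carved sectors with base_offset=8192, it would report the four bytes at partition offsets 8200 through 8203, the only ones that differ in the xxd diff above.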

 

That's it: repeat this process for each region that differs and determine whether the differences between the two disks are relevant to the case. And of course, in true SANS fashion, now that you've seen the difficult way to do this, you should know that the SIFT Workstation includes vbindiff, which can solve this problem for you far more elegantly, though you may still run into files that are too large for vbindiff to handle. Here's a screen capture of vbindiff showing the above difference:

Image 2: vbindiff D1_ntfs.img D2_ntfs.img

Dave Hull is an incident responder and forensic investigator for a Fortune 10000 CIRT where he enjoys being the dumbest person in the room. Over the years he has worn many hats and still dons them from time to time. When he's not flipping bits, he enjoys teaching for the SANS Institute.

7 Comments

Posted August 12, 2010 at 1:00 PM

Andrew Hay

Wow, great post. We need more walkthroughs like this ;)

Posted August 12, 2010 at 4:02 PM

Ken Pryor

I agree with Andrew. What an excellent post! Thanks Dave!
KP

Posted August 12, 2010 at 6:48 PM

Chad Tilbury

Nice use of VBinDiff, Dave!

Posted August 13, 2010 at 2:58 AM

Grayson

Thanks for taking the time to write this up, great info in your post.
A lot of modern computers have a "write-back cache" capability on the RAID controller. I suspect this machine was not shut down cleanly and there is a chunk of data stuck on that controller. After your investigation it might be interesting to power it up, shut it down cleanly and test those images again.

Posted August 13, 2010 at 5:47 AM

Dave Hull

Very interesting, thanks for the insight. I would have liked to have tried that out, but avoided powering on the system altogether. Due to the nature of the case, I didn't care about the accuracy of the system clock or other BIOS related artifacts, so didn't power it up.

Posted April 20, 2014 at 7:10 PM

Adam

The linux command "cmp" might work for this as well.

Posted August 4, 2014 at 10:53 PM

Alan Harper

Great post. I also think that WinHex could have located these differences much faster. I use it all the time to locate differences in files. WinHex also has a neat feature to control how many bytes you compare and where to start looking. Since the hashes were different, there is no logical reason to start at any particular place on the image, so you might just as well start at the beginning. WinHex would have found this difference in seconds.
It is also interesting to note that the difference occurred starting at byte offset 8192 (sector 16), which contains the hex string FF FF 00 07 in both images, and then the differences occur in bytes 8 through 11. I think this may have something to do with the RAID controller. Perhaps it is a way for the controller to distinguish one disk from the other. I think there is more to this than meets the eye.