SANS Digital Forensics and Incident Response Blog

Atemporal time line analysis in digital forensics

As incident responders, we often find that attackers compromise one host in a network and then pivot to others. In digital forensic investigations involving intrusions, we can do our own pivoting from one piece of evidence to another. On October 19th, I had the good fortune to speak at SECTor about one way of doing this: "atemporal" time line analysis. A version of the slides is available online, but most of the talk was a live demo, so I recommend checking out the recorded version of the presentation. This post touches on some of the ideas from that talk.

In Q1 of 2011, I responded to an intrusion in a Fortune 10K corporation. The intrusion was discovered by an internal team performing daily log review (yes, Josh Corman, there are corporations discovering intrusions daily thanks to log review). In this case, the system in question was attempting to connect to an IRC server every two seconds.

In breach investigations, one common objective is to find the attacker's code. Once you've located the attacker's code, you can reverse it, determine its capabilities, its command and control channels, persistence mechanisms and so on. This information can help you find similarly compromised hosts in your environment.

After evidence acquisition, I created a file system time line using fls and mactime. The time line was over 600K lines long, and since I didn't have a good grasp of when the breach occurred, I decided to begin at the end of the time line and work backwards. Here's what I saw:

2011 03 18 Fri 14:43:02|80528|.a..|r/rrw-r--r--|0|0|708471|/etc/
2011 03 18 Fri 14:43:02|47|mac.|r/rrw-r--r--|0|0|709666|/etc/.services.swpx (deleted-realloc)
2011 03 18 Fri 14:43:02|47|mac.|r/rrw-r--r--|0|0|709666|/etc/mtab
2011 03 18 Fri 14:43:02|47|mac.|r/rrw-r--r--|0|0|709666|/etc/mtab.tmp (deleted-realloc)
2011 03 18 Fri 14:43:02|47|mac.|r/rrw-r--r--|0|0|709666|/etc/sysconfig/network-scripts/.ifcfg-eth1.swpx (deleted-realloc)
2011 03 18 Fri 14:43:02|47|mac.|r/rrw-r--r--|0|0|709666|/etc/sysconfig/network-scripts/ifcfg-eth1~ (deleted-realloc)
2011 03 18 Fri 14:43:02|0|mac.|-/rrw-r--r--|0|0|709692|/$OrphanFiles/OrphanFile-709692 (deleted)
2011 04 15 Fri 19:23:00|388262|m...|r/rrwxr-xr-x|1000|100|4572390|/usr/lib/popauth
2011 04 15 Fri 19:23:00|1092|m...|r/rrwxr-xr-x|1000|100|4572391|/usr/local/lib/
2011 04 15 Fri 19:23:00|351|m...|r/rrwxr-xr-x|1000|100|4572392|/etc/cron.daily/dnsquery

Notice anything interesting?

If you're thinking dsniff, yes, that is noteworthy, but take another look and focus on the dates.

Recall that this breach investigation occurred during the first quarter of 2011. How are there three files on this system that have modification times from Q2? Maybe we're dealing with the world's worst hacker.
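The giveaway here is a time stamp later than the investigation itself, and on a 600K-line time line it helps to have a filter surface such rows directly. Below is a minimal sketch, assuming the pipe-delimited layout shown above with the date in field 1 written as `YYYY MM DD Day HH:MM:SS`. The helper name `future_rows` and the cutoff (standing in for the acquisition time) are hypothetical, and the sample rows are adapted from the excerpt above.

```shell
#!/bin/sh
# future_rows FILE CUTOFF  (hypothetical helper)
# Print time line rows whose date (field 1, e.g. "2011 03 18 Fri 14:43:02")
# is later than CUTOFF, given as "YYYY MM DD HH:MM:SS". The weekday is cut
# out so the cutoff can be written without one; the remaining fixed-width
# string compares chronologically as plain text.
future_rows() {
  awk -F'|' -v cutoff="$2" '
    { when = substr($1, 1, 10) " " substr($1, 16) }
    when > cutoff { print }
  ' "$1"
}

tl=$(mktemp)
cat > "$tl" <<'EOF'
2011 03 18 Fri 14:43:02|47|mac.|r/rrw-r--r--|0|0|709666|/etc/mtab
2011 04 15 Fri 19:23:00|388262|m...|r/rrwxr-xr-x|1000|100|4572390|/usr/lib/popauth
2011 04 15 Fri 19:23:00|1092|m...|r/rrwxr-xr-x|1000|100|4572391|/usr/local/lib/
2011 04 15 Fri 19:23:00|351|m...|r/rrwxr-xr-x|1000|100|4572392|/etc/cron.daily/dnsquery
EOF

# Hypothetical acquisition time: the end of March 18, 2011.
future=$(future_rows "$tl" "2011 03 18 23:59:59")
printf '%s\n' "$future"
rm -f "$tl"
```

Only the three rows dated April 15, 2011 are printed. A plain string comparison is enough because every part of the date is zero-padded and fixed-width.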

You can check out the video of the talk to see the details on two of the three files. Suffice it to say, dnsquery was a script run by cron every day, and it called popauth. A quick look at popauth with strings showed that it contained some common IRC commands as well as references to dsniff. One might be tempted to remove popauth, dsniff and the dnsquery script and put the system back into production; after all, we knew we were looking for an IRC bot. In this case, that would have been a mistake.

Now that we had located some attacker code through traditional time line analysis, how could we use atemporal analysis to pivot from what we knew to something we didn't? To start, I grepped the time line file for the suspect file names and printed only the fields I wanted to focus on: the inode, the activity type (MACB) and the file name. Here's the command and the results:

egrep "popauth|dsniff|dnsquery" slash.timeline.csv | awk -F"|" '{print $7, $3, $NF}' | sort -g
670500 .a.. /usr/lib/popauth.#prelink#.Ah5LTd (deleted)
670500 .a.. /usr/lib/popauth.#prelink#.yuQfuE (deleted)
670500 m.c. /usr/lib/popauth.#prelink#.Ah5LTd (deleted)
670500 m.c. /usr/lib/popauth.#prelink#.yuQfuE (deleted)
4572390 .a.. /usr/lib/popauth
4572390 ..c. /usr/lib/popauth
4572390 m... /usr/lib/popauth
4572391 .a.. /usr/local/lib/
4572391 ..c. /usr/local/lib/
4572391 m... /usr/local/lib/
4572392 .a.. /etc/cron.daily/dnsquery
4572392 ..c. /etc/cron.daily/dnsquery
4572392 m... /etc/cron.daily/dnsquery

So what are these numbers at the start of each line? They are metadata addresses, which are inodes in Ext2/3/4 file systems. NTFS file systems have something similar, commonly referred to as MFT entries (Microsoft's more formal-sounding term is file record segment). In the industry, we typically refer to them as inodes, whether we're discussing NTFS or Ext2/3/4 file systems.
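The awk commands in this post pick columns by position, and the metadata address is the seventh pipe-delimited field of each time line row, which is why they print `$7`. A quick way to check which field is which is to label every field of a single row. The labels below are our reading of the excerpts in this post, not an official format definition.

```shell
#!/bin/sh
# Print each pipe-delimited field of one time line row next to its awk
# field number, so the $1/$3/$7/$NF choices in the commands are easy to check.
row='2011 04 15 Fri 19:23:00|351|m...|r/rrwxr-xr-x|1000|100|4572392|/etc/cron.daily/dnsquery'
labels='date size activity mode uid gid meta_addr name'

fields=$(printf '%s\n' "$row" | awk -F'|' -v labels="$labels" '
  BEGIN { split(labels, label, " ") }
  { for (i = 1; i <= NF; i++) printf "$%d %-9s %s\n", i, label[i], $i }
')
printf '%s\n' "$fields"
```

Because the field separator is `|`, the spaces inside the date and any spaces in a file name stay within their fields, so `$NF` always yields the whole name.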

So inodes are a metadata structure akin to a card (dating myself here) from a library's card catalog. Those cards contained the author, title, number of pages, location in the library and so on; inodes contain the owner, group, location on disk, size of the file and so on. In a library, the cards are arranged alphabetically by title, author or subject. In a file system, inodes are simply numbered sequentially and handed out first come, first served. In NTFS, inode 0 is always the $MFT itself; in Ext2/3/4, inode 2 is the root (/) directory.

Because inodes are handed out this way, new files written to disk in quick succession are likely to receive sequential or nearly sequential inode numbers, assuming a sequential run of free inodes is available. I need to hire a good illustrator to animate this concept.

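Think of it this way: as files are deleted from the system, their inodes are marked as unallocated and become available for reuse. If there are no unallocated inodes below the highest one in use, new files receive the next numbers above it, and so on. That's a simplification: NTFS grows the $MFT when it needs more entries, while Ext2/3/4 preallocate a fixed-size inode table in each block group and typically look for a free inode in the parent directory's block group first.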
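The allocation behavior described above can be sketched as a toy model: a fixed-size table in which each new file gets the lowest-numbered free inode. This is a deliberate simplification and not how ext or NTFS actually allocate (no block groups, no MFT growth); the table size and file counts are arbitrary.

```shell
#!/bin/sh
# Toy first-free inode allocator, written in awk for portability.
# Ten files are created, two are deleted, then four files arrive in a burst.
burst=$(awk -v total=12 '
  # Return the lowest free inode number and mark it used (0 if the table is full).
  function alloc(    i) {
    for (i = 1; i <= total; i++)
      if (!(i in used)) { used[i] = 1; return i }
    return 0
  }
  BEGIN {
    for (n = 1; n <= 10; n++) alloc()   # ten ordinary files get inodes 1-10
    delete used[4]                       # two of them are deleted later
    delete used[5]
    out = ""
    for (n = 1; n <= 4; n++) {           # four files written in quick succession
      sep = (n > 1) ? " " : ""
      out = out sep alloc()
    }
    print out
  }')
echo "burst inodes: $burst"
```

The two freed inodes are reused first, so the burst lands on 4, 5, 11 and 12: adjacent to one another in pairs, and interleaved with older files only where the freed inodes happened to be.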

So, how do we use this information to find attacker code? By grepping through the time line for inode values numerically close to those we already know about. Take a look:

awk -F"|" '{print $7, $3, $NF}' slash.timeline.csv | egrep "^670(49|50)|^45723(8|9)" | grep -v Orpha | grep -v delete | sort -g
670492 .a.. /usr/sbin/sshd
670492 ..c. /usr/sbin/sshd
670492 m... /usr/sbin/sshd
670493 mac. /usr/share/sshd.sync
670494 .a.. /usr/lib/httpd.log
670494 m.c. /usr/lib/httpd.log
670495 mac. /usr/include/shup.h
670496 .a.. /usr/include/glob2.h
670496 m.c. /usr/include/glob2.h
670497 .a.. /usr/bin/zap
670497 m.c. /usr/bin/zap
670498 .a.. /usr/bin/ssh
670498 ..c. /usr/bin/ssh
670498 m... /usr/bin/ssh
670499 .a.. /usr/bin/zmuie
670499 ..c. /usr/bin/zmuie
670499 m... /usr/bin/zmuie
4572390 .a.. /usr/lib/popauth
4572390 ..c. /usr/lib/popauth
4572390 m... /usr/lib/popauth
4572391 .a.. /usr/local/lib/
4572391 ..c. /usr/local/lib/
4572391 m... /usr/local/lib/
4572392 .a.. /etc/cron.daily/dnsquery
4572392 ..c. /etc/cron.daily/dnsquery
4572392 m... /etc/cron.daily/dnsquery
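The egrep patterns in the command above work on string prefixes, so on a file system with seven-digit inodes `^670(49|50)` would also match a hypothetical inode such as 6704951. A numeric window around each known inode avoids that. Below is a minimal sketch, assuming the same pipe-delimited layout; the helper name `near_inodes` and the window width are hypothetical choices, and the sample rows are adapted from the excerpt at the top of this post.

```shell
#!/bin/sh
# near_inodes FILE WIDTH SEED [SEED ...]  (hypothetical helper)
# Print "inode activity name" for every time line row whose metadata address
# (field 7) lies within WIDTH of any SEED inode, sorted numerically.
near_inodes() {
  file=$1 width=$2
  shift 2
  awk -F'|' -v w="$width" -v seeds="$*" '
    BEGIN { n = split(seeds, seed, " ") }
    $7 ~ /^[0-9]+$/ {
      for (i = 1; i <= n; i++)
        if ($7 >= seed[i] - w && $7 <= seed[i] + w) { print $7, $3, $NF; break }
    }
  ' "$file" | sort -n
}

tl=$(mktemp)
cat > "$tl" <<'EOF'
2011 03 18 Fri 14:43:02|47|mac.|r/rrw-r--r--|0|0|709666|/etc/mtab
2011 04 15 Fri 19:23:00|388262|m...|r/rrwxr-xr-x|1000|100|4572390|/usr/lib/popauth
2011 04 15 Fri 19:23:00|1092|m...|r/rrwxr-xr-x|1000|100|4572391|/usr/local/lib/
2011 04 15 Fri 19:23:00|351|m...|r/rrwxr-xr-x|1000|100|4572392|/etc/cron.daily/dnsquery
EOF

# Seeds from the pivot above; a width of 10 roughly covers the ranges
# 670490-670509 and 4572380-4572399 that the egrep patterns target.
hits=$(near_inodes "$tl" 10 670500 4572391)
printf '%s\n' "$hits"
rm -f "$tl"
```

The output can still be piped through `grep -v Orpha | grep -v delete` as in the original command. Widening the window trades more noise for a better chance of catching files written a little earlier or later.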

Every file in the list above was attacker code, and we found them simply by taking a known piece of information and pivoting on it. If we'd relied only on the temporal aspects of the time line, we could have missed these files. Why didn't these files show up at the end of the time line like the other three? Here's the same data with the time stamps put back in:

awk -F"|" '{print $7, $1, $3, $NF}' slash.timeline.csv | egrep "^670(49|50)|^45723(8|9)" | grep -v Orpha | grep -v delete | sort -g
670492 2007 08 08 Wed 08:47:33 m... /usr/sbin/sshd
670492 2011 01 27 Thu 03:02:32 ..c. /usr/sbin/sshd
670492 2011 03 05 Sat 03:02:20 .a.. /usr/sbin/sshd
670493 2011 01 22 Sat 05:37:22 mac. /usr/share/sshd.sync
670494 2011 03 18 Fri 03:02:05 m.c. /usr/lib/httpd.log
670494 2011 03 18 Fri 12:53:36 .a.. /usr/lib/httpd.log
670495 2011 01 22 Sat 05:37:22 mac. /usr/include/shup.h
670496 2011 02 01 Tue 12:03:09 .a.. /usr/include/glob2.h
670496 2011 03 18 Fri 12:46:00 m.c. /usr/include/glob2.h
670497 2011 01 22 Sat 05:37:22 m.c. /usr/bin/zap
670497 2011 03 05 Sat 03:02:35 .a.. /usr/bin/zap
670498 2011 01 22 Sat 05:37:22 m... /usr/bin/ssh
670498 2011 01 27 Thu 03:02:32 ..c. /usr/bin/ssh
670498 2011 03 18 Fri 14:11:26 .a.. /usr/bin/ssh
670499 2007 07 30 Mon 10:19:17 m... /usr/bin/zmuie
670499 2011 01 27 Thu 03:02:32 ..c. /usr/bin/zmuie
670499 2011 03 05 Sat 03:02:13 .a.. /usr/bin/zmuie
4572390 2011 01 22 Sat 05:37:22 ..c. /usr/lib/popauth
4572390 2011 03 18 Fri 03:02:05 .a.. /usr/lib/popauth
4572390 2011 04 15 Fri 19:23:00 m... /usr/lib/popauth
4572391 2011 01 22 Sat 05:37:22 ..c. /usr/local/lib/
4572391 2011 03 18 Fri 03:02:05 .a.. /usr/local/lib/
4572391 2011 04 15 Fri 19:23:00 m... /usr/local/lib/
4572392 2011 01 22 Sat 05:37:22 ..c. /etc/cron.daily/dnsquery
4572392 2011 03 18 Fri 03:02:05 .a.. /etc/cron.daily/dnsquery
4572392 2011 04 15 Fri 19:23:00 m... /etc/cron.daily/dnsquery

The other files don't appear at the end of the time line because their time stamps were backdated into the past, rather than the future, via the touch command. So maybe we're not dealing with the world's least sophisticated attacker after all; maybe the three future-dated files were a red herring. It's interesting to think about, but ultimately futile to try to understand the mind of the attacker.
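Backdating with touch has a limit worth knowing: touch can set a file's access and modification times to any value, but the kernel sets the inode change time (ctime) to the current time whenever those timestamps are changed. That may be why the ..c. entries above still carry January 2011 dates while /usr/sbin/sshd and /usr/bin/zmuie show modification times from 2007. The sketch below demonstrates this on a scratch file and assumes GNU coreutils (`touch -d`, `stat -c`, `date -d`), as found on most Linux systems; the 2007 date is borrowed from the sshd entry.

```shell
#!/bin/sh
# Backdate a scratch file with touch, then compare its mtime and ctime.
f=$(mktemp)
touch -d '2007-08-08 08:47:33' "$f"   # sets both atime and mtime to the past date

mtime=$(stat -c %Y "$f")   # modification time, seconds since the epoch
ctime=$(stat -c %Z "$f")   # inode change time, set by the kernel, not by touch
now=$(date +%s)

echo "mtime: $(date -d "@$mtime")"
echo "ctime: $(date -d "@$ctime")"
rm -f "$f"
```

The mtime reports 2007 while the ctime reports the moment touch ran. Other operations such as chmod, chown or rename also update ctime, so a ctime records the last metadata change rather than proving when a file was planted.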

There's at least one other noteworthy aspect of these inodes. I talked about it in my SECTor talk, so check out the recorded presentation when it becomes available, or stay tuned: I'll be blogging about it here soon.

Dave Hull is an incident responder, forensic investigator, reverser of malware, sometimes web application breaker and recovering code analysis guy. When he's not hunting on enterprise networks, you can likely find him hanging out with his family or attempting to learn piano.


Posted October 26, 2011 at 1:52 AM

Frank McClain

Great post, Dave! Very interesting, and good point about looking for correlating points of relevance.

Posted October 20, 2012 at 10:57 AM

Nick Klein

and love your command line kung-fu, Dave.