SANS Digital Forensics and Incident Response Blog

Digital Forensics: UID and GID distributions

On Unix and Linux systems, each file has a user ID (uid) and a group ID (gid) that identify the file's owner and group. On most *nix systems, files in system directories are owned by user root and group root, both of which are represented by the numeric value 0. See the sample listing below:

davehull@64n6:/bin$ ls -ln | head
total 9080
-rwxr-xr-x 1 0 0 950896 May 18 2011 bash
-rwxr-xr-x 3 0 0 31112 Dec 13 10:30 bunzip2
-rwxr-xr-x 1 0 0 1719048 Sep 1 12:02 busybox
-rwxr-xr-x 3 0 0 31112 Dec 13 10:30 bzcat
lrwxrwxrwx 1 0 0 6 Dec 13 10:30 bzcmp -> bzdiff
-rwxr-xr-x 1 0 0 2140 Dec 13 10:30 bzdiff
lrwxrwxrwx 1 0 0 6 Dec 13 10:30 bzegrep -> bzgrep
-rwxr-xr-x 1 0 0 4877 Dec 13 10:30 bzexe
lrwxrwxrwx 1 0 0 6 Dec 13 10:30 bzfgrep -> bzgrep

In the output above, treating whitespace as the column separator, columns three and four hold the uid and gid of each file. This listing is for the /bin directory, and you can see that everything here is owned by uid 0 (the root user) and assigned to group 0 (also root).
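The column reading described above can be sketched in a few lines of Python. This is only an illustration: the function name parse_ls_ln is made up for this example, and it assumes file names contain no whitespace, which real listings do not guarantee.

```python
def parse_ls_ln(listing):
    """Return (name, uid, gid) tuples from the text of an `ls -ln` listing."""
    entries = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip the "total NNN" header and anything too short to be an entry.
        if len(fields) < 9 or fields[0] == "total":
            continue
        # Columns three and four (indexes 2 and 3) are the numeric uid and gid.
        uid, gid = int(fields[2]), int(fields[3])
        # The name is the ninth field; symlinks append "-> target" after it.
        entries.append((fields[8], uid, gid))
    return entries


sample = """total 9080
-rwxr-xr-x 1 0 0 950896 May 18 2011 bash
lrwxrwxrwx 1 0 0 6 Dec 13 10:30 bzcmp -> bzdiff
"""

for name, uid, gid in parse_ls_ln(sample):
    print(name, uid, gid)
```

For live systems, os.lstat() returns the same values in st_uid and st_gid without any text parsing.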

When attackers compromise *nix systems, they commonly download tar archives (Windows users can think of tar files as zipped-up folders) containing malicious binaries that may be used to sniff traffic, plant backdoors and so on. These tar files record the uid and gid values from the systems where they were created. This can help investigators: when an archive is extracted on the target system with root privileges, which is when tar restores ownership by default, the uids and gids from the system of origin persist, even if they are invalid on the target system, meaning no user or group exists with those numeric values.
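To see that the ownership really travels inside the archive, here is a small sketch using Python's standard tarfile module. The member name tools/example and the uid/gid of 1000 are invented for the illustration. The code only reads archive metadata; nothing is extracted.

```python
import io
import tarfile


def build_archive():
    """Build an in-memory tar holding one file with a hypothetical uid/gid."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        payload = b"#!/bin/sh\n"
        info = tarfile.TarInfo(name="tools/example")  # made-up member name
        info.size = len(payload)
        info.uid, info.gid = 1000, 1000  # stand-ins for the system of origin
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


def list_owners(fileobj):
    """Return (name, uid, gid) for every member recorded in the archive."""
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        return [(m.name, m.uid, m.gid) for m in tar.getmembers()]


print(list_owners(build_archive()))
```

The numeric ids are stored in each member's header, independent of whether a matching account exists on the machine that later reads the archive.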

Fantastic. How is this useful for digital forensic analysts? Observant readers of my previous post on "outlier analysis" may have noticed that some of the malicious files uncovered by that technique had unusual uid and gid values in addition to unusual inode addresses that made them outliers.

As part of the ongoing development work I'm doing in my so-called "free time," finding statistical anomalies in file systems via fls bodyfiles, I've created a short Python script that prints the distribution of uids or gids (depending on how it's called) on a per-directory basis, for directories where there is variation. Running this utility on the fls bodyfile from my previous post gives the investigator some leads for finding malicious code on the system. Here is a redacted sample of the output:

./ --file sda1_bodyfile.txt --meta uid
[+] Checking command line arguments.
[+] sda1_bodyfile.txt may be a bodyfile.
[+] Discarded 0 files named .. or .
[+] Discarded 0 bad lines from sda1_bodyfile.txt.
[+] Added 20268 paths to meta.
Path: /etc/cron.daily
Count: 1 uid: 1000
Count: 9 uid: 0
Path: /usr/lib
Count: 1 uid: 10
Count: 1 uid: 37
Count: 1 uid: 1000
Count: 2082 uid: 0

The output shows that /etc/cron.daily contains nine files with a uid of 0 and one file with a uid of 1000, and that /usr/lib contains 2082 files with a uid of 0 and one file each with uids of 10, 37 and 1000. Odd uid values like these in system directories may be worth investigating. In this particular case, the files with uids of 10 and 1000 are part of the attacker's malicious files on the system.
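For readers who want to experiment, the following is a minimal sketch of the general idea, not the script used above. It assumes the pipe-delimited TSK 3.x bodyfile layout (MD5|name|inode|mode_as_string|UID|GID|size|atime|mtime|ctime|crtime) and file names that contain no "|" character. The function names and the sample lines are invented for the example.

```python
from collections import Counter, defaultdict
import posixpath

# Field positions in a TSK 3.x bodyfile line (assumed layout):
# MD5|name|inode|mode_as_string|UID|GID|size|atime|mtime|ctime|crtime
NAME, UID, GID = 1, 4, 5


def id_distributions(lines, meta="uid"):
    """Map each directory to a Counter of the uid (or gid) values seen in it."""
    column = UID if meta == "uid" else GID
    dist = defaultdict(Counter)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) != 11:
            continue  # not a well-formed bodyfile line
        # Drop a symlink's " -> target" suffix before splitting the path.
        path = fields[NAME].split(" -> ")[0]
        directory, base = posixpath.split(path)
        if base in (".", ".."):
            continue
        dist[directory][fields[column]] += 1
    return dist


def mixed_only(dist):
    """Keep only directories where more than one id value appears."""
    return {d: c for d, c in dist.items() if len(c) > 1}


# Made-up bodyfile lines for illustration only.
sample = [
    "0|/etc/cron.daily/logrotate|101|r/rrwxr-xr-x|0|0|89|0|0|0|0",
    "0|/etc/cron.daily/example|102|r/rrwxr-xr-x|1000|1000|89|0|0|0|0",
    "0|/bin/bash|103|r/rrwxr-xr-x|0|0|950896|0|0|0|0",
]

for directory, counts in mixed_only(id_distributions(sample)).items():
    print("Path:", directory)
    for value, count in sorted(counts.items(), key=lambda kv: kv[1]):
        print("Count:", count, "uid:", value)
```

Directories where every file shares one uid, such as /bin in the sample, are dropped, which is what shrinks the data set.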

As with body-outliers, the body-ugid-dist script won't be a sure-fire way of finding all the evil in *nix file systems. But suppose you're starting out with nothing more than "the system is compromised," with no idea of when, how or where the malicious code got there, and there are hundreds of thousands of files on the system. In cases like that, running this script against an fls bodyfile may reduce the data set to something more manageable and give you some leads. In my case, the bodyfile was reduced from more than 200K files to around 350, and focusing on standard system directories (e.g. /bin, /boot, /dev, /etc, /sbin, /usr, /var) reduces the data set even further.
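The narrowing step can be sketched as a filter over per-directory counts. This is one possible approach rather than a feature of the script: it keeps only the standard system directories listed above and reports every uid other than the most common one in each directory. The counts for /etc/cron.daily and /usr/lib come from the sample output earlier in this post; the /home/user entry and the function names are hypothetical.

```python
from collections import Counter

SYSTEM_PREFIXES = ("/bin", "/boot", "/dev", "/etc", "/sbin", "/usr", "/var")


def in_system_dir(path):
    """True if path is one of the system directories or lies beneath one."""
    return any(path == p or path.startswith(p + "/") for p in SYSTEM_PREFIXES)


def minority_owners(dist):
    """Return (directory, uid, count) for each non-dominant uid in system dirs."""
    leads = []
    for directory, counts in sorted(dist.items()):
        if not in_system_dir(directory) or len(counts) < 2:
            continue
        # The most frequent uid is treated as the expected owner; on a tie,
        # Counter.most_common() picks whichever value was counted first.
        dominant, _ = counts.most_common(1)[0]
        for uid, count in sorted(counts.items()):
            if uid != dominant:
                leads.append((directory, uid, count))
    return leads


dist = {
    "/etc/cron.daily": Counter({"0": 9, "1000": 1}),
    "/usr/lib": Counter({"0": 2082, "10": 1, "37": 1, "1000": 1}),
    "/home/user": Counter({"1000": 40, "0": 2}),  # hypothetical, filtered out
}

for directory, uid, count in minority_owners(dist):
    print(directory, uid, count)
```

A uid that is rare in a directory is only a lead, not a verdict. As the sample output shows, uid 37 in /usr/lib turned out not to be part of the attacker's files.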

This approach to forensics is something that students of SANS 508: Advanced Computer Forensic Analysis & Incident Response will have the knowledge to do when they leave the classroom, though it may not be something we teach directly. If you want to advance your understanding of file systems and take your forensics beyond point-and-click tools, I will be teaching 508 in Phoenix in February.

Dave Hull is a senior forensics team lead in a Fortune 500 incident response team. He is also a principal consultant for Trusted Signal, a boutique information security consultancy focusing on incident response and computer forensics.