SANS Digital Forensics and Incident Response Blog

Digital Forensic SIFTing: String Searching and File Carving using srch_strings_wrap

SIFT 2.12, the latest version, includes a few scripts I wrote, and Rob asked me to write a post for the blog going over their functionality. The scripts build on The Sleuth Kit's srch_strings to report additional information about string matches and to automatically carve out matching files or blocks. They are located in /usr/local/bin and are as follows:

  • srch_strings_blk
  • srch_strings_pipe
  • (the initial version of the perl script srch_strings_wrap)
  • srch_strings_wrap (note that in SIFT 2.12, this file's permissions need to be changed to 755; there are also a couple of bugs in the SIFT version, one affecting auto-carving and the other affecting grepping, so be sure to download the latest version)

While they ship with SIFT, you can also get them from my GitHub repository. Read on for more information on how to use these scripts.


I recently took SANS FOR508 with Rob Lee in Las Vegas. For those familiar with the class, one of the areas covered is string searching through an image by using srch_strings from the Sleuth Kit to obtain the byte offset of a matching string. This is done using the "-t d" option:

Then, after obtaining the block size of the filesystem using fsstat, we figure out which block each of these strings is in. For example, this is an image of a filesystem with 1024 byte blocks, so divide each byte offset by 1024, and drop the remainder to get the block:

Block  String
7 vmlinuz-2.2.14-5.0
7 module-info-2.2.14-5.0
256 lost+found
256 kernel.h
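
The division above is easy to script. The following is a minimal sketch, not the one-liner from class: it assumes input in the "srch_strings -t d" format (a decimal byte offset, whitespace, then the string). The function name offsets_to_blocks and the sample offsets are made up for illustration.

```shell
# offsets_to_blocks BLOCKSIZE
# Reads "offset string" lines (the srch_strings -t d format) on stdin and
# prints "block<TAB>string", where block = offset / blocksize, rounded down.
offsets_to_blocks() {
    awk -v bs="$1" '{
        off = $1
        # Strip the leading offset and the whitespace after it; keep the rest verbatim.
        sub(/^[ \t]*[0-9]+[ \t]+/, "")
        printf "%d\t%s\n", int(off / bs), $0
    }'
}

# Illustrative offsets only (not taken from a real image):
printf '%s\n' '   7272 vmlinuz-2.2.14-5.0' ' 262296 lost+found' | offsets_to_blocks 1024
```

With a 1024 byte block size, the two sample lines land in blocks 7 and 256, matching the table above.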

During class, I got tired of opening the calculator to figure out these blocks, so I came up with a little one-liner to do everything at once:

Eventually, I got tired of typing that out and turned it into a script when I got back home. That script turned into the three initial shell scripts I created, which I'll give a brief overview of below.

The Initial Scripts

If you're used to running srch_strings and just wanted to get the block for all the matches, you could use srch_strings_pipe:

The srch_strings_blk command is very similar, but instead runs against the output of srch_strings which has been previously dumped to a file. With this command, I built in a check for the blocksize so it doesn't need to be specified. Instead, you can point to the original image the strings dump is from:

The last original script is a rudimentary version of the current srch_strings_wrap. It accepts the -b argument to specify the blocksize. If -b is not given, it behaves exactly like srch_strings. If -b is given, it prints the block as the previous two scripts do:

These scripts write to temporary files and are far from the most efficient approach. They are available in my GitHub repository and in SIFT, and they are what I initially came up with and presented to Rob. He put me in touch with Hal Pomeranz, who had been talking about doing something similar. After a few emails about some additional functionality, I switched the code over to Perl and created what became the current srch_strings_wrap.

The Old Way

Before going into the examples with srch_strings_wrap, I wanted to briefly show all the manual steps it accomplishes with one command. In the image I'm working with, I will search for the string ADVISORY, which has two matches in the image. There are several commands needed to extract all the information we want and carve the file out:

  • fsstat to determine the block size
  • srch_strings with grep to find the matches
  • divide the byte offset by the block size to determine the block number
  • blkstat to check the allocation status of the block
  • ifind to find the inode using the block number
  • istat to check the allocation status of the inode
  • ffind to find the filename for the inode
  • icat to carve the file
  • list the carved file

These 8 commands (not counting the final ls) are combined into one by using srch_strings_wrap.
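
As a rough sketch, the manual chain could be strung together like this. It follows only the first match, takes the block size as an argument (read it from the fsstat output first), and does almost no error handling. The function name manual_carve and its argument order are hypothetical; the Sleuth Kit commands are the ones listed above.

```shell
# manual_carve IMAGE BLOCKSIZE TERM OUTFILE
# Walks the first match of TERM from byte offset to carved file.
manual_carve() {
    image=$1 bs=$2 term=$3 outfile=$4

    # srch_strings + grep: byte offset of the first matching string
    offset=$(srch_strings -a -t d "$image" | grep -e "$term" | head -n 1 | awk '{print $1}')
    [ -n "$offset" ] || return 1

    # byte offset -> block number (integer division drops the remainder)
    block=$((offset / bs))
    blkstat "$image" "$block"      # allocation status of the block

    # block -> inode
    inode=$(ifind -d "$block" "$image")
    istat "$image" "$inode"        # allocation status of the inode
    ffind "$image" "$inode"        # filename for the inode

    # carve the file contents and list the result
    icat "$image" "$inode" > "$outfile"
    ls -l "$outfile"
}

# Example call (needs The Sleuth Kit and a filesystem image):
#   manual_carve fs.img 1024 ADVISORY carved.out
```

If ifind cannot map the block to an inode, the later steps will fail, which is one of the cases srch_strings_wrap handles by falling back to the block.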

The New Way

By using -d (enable additional features and determine block size), -g (grep for ADVISORY), and -A (autocarve), we can accomplish the 8 steps above in one command. This is even better because the manual commands found the matches but carved out only one matching file. This srch_strings_wrap command prints the data and carves both matching files.
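
A sketch of what that single command might look like, wrapped in a small helper. The flags are the ones just described; the helper name search_and_carve, the image name fs.img, and the exact argument order are assumptions, so check "srch_strings_wrap -h" before relying on it.

```shell
# search_and_carve IMAGE TERM
# Hypothetical convenience wrapper around srch_strings_wrap:
#   -d  determine the block size and enable the additional columns
#   -g  grep for TERM
#   -A  autocarve every match
search_and_carve() {
    srch_strings_wrap -d -g "$2" -A "$1"
}

# Example call (needs srch_strings_wrap on the PATH):
#   search_and_carve fs.img ADVISORY
```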

srch_strings_wrap — Overview

Currently, the command line options include:

If no special options are given (such as "-d" or "-b <blocksize>"), srch_strings_wrap can be used instead of srch_strings and will output the same results.

The blocksize of the filesystem can be specified (-b) or automatically determined from the image (-d). Multiple filesystem images can be given as arguments, but only one full disk image can be specified. The output can be grouped by file/inode/block (-O) or printed out line by line (default). It supports custom delimiters (-F) and can output to CSV (-C). Output goes to standard out by default, can be written to a file (-w) or suppressed entirely (-N), and can include a header line (-H). Grep terms can be passed on the command line (-g) or in a dirty word file (-G), with case insensitivity (-i).

If full lookups to the filename layer are not needed, the level can be specified to decrease runtime: byte (-l0, no different from "srch_strings -t d"), block (-l1), inode (-l2), and filename (-l3, the default). There is an option to autocarve (-A) which will carve out all matching strings at the highest level available.

And if multiple grep searches will be conducted, "srch_strings -a -t d fs.img > output.asc" can be run on an image to capture all the strings and save the output to a file, then -P can be used to accept the output of that file piped in ("cat output.asc | srch_strings_wrap -P -g REGEX -I fs.img"). This way the initial time consuming dump of strings only has to be run once.
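
The dump-once, search-many workflow can be sketched as two small helpers. The srch_strings and srch_strings_wrap invocations are the ones quoted above; the function names dump_strings and search_dump and the file names in the example are hypothetical.

```shell
# dump_strings IMAGE OUTFILE
# One-time, slow step: save every string with its decimal byte offset.
dump_strings() {
    srch_strings -a -t d "$1" > "$2"
}

# search_dump IMAGE DUMPFILE REGEX...
# Fast, repeatable step: pipe the saved dump to srch_strings_wrap once per regex.
search_dump() {
    image=$1 dump=$2
    shift 2
    for regex in "$@"; do
        echo "== $regex =="
        cat "$dump" | srch_strings_wrap -P -g "$regex" -I "$image"
    done
}

# Example calls (need The Sleuth Kit and srch_strings_wrap):
#   dump_strings fs.img output.asc
#   search_dump fs.img output.asc ADVISORY password
```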

All of this information is printed in the help menu (-h).

srch_strings_wrap — Examples

In showing the new way to do searching above, I gave a basic version of using the srch_strings_wrap command. Now I'll give some more examples showing all the different command line options.

As I said in the overview, if you just supply the same command line options as you would to srch_strings, srch_strings_wrap will give the same output:

If you know the blocksize, you can specify it with the -b option; otherwise use -d and it will be determined from fsstat. These basic versions of the command will print all string matches with the additional information:

There are a few different output options. To write STDOUT to a file, use "-w file". To suppress STDOUT, use "-N". To print a header line, use "-H", which for the output above would be:

The default delimiter is the tab character, but it can be changed with "-F delim" where delim is 1 or more characters to use. Alternatively, "-C" can be used to print in CSV format, which will put quotes around the string and escape any quotes within the string.

The default output takes the srch_strings output and prepends the additional columns. Another option is to use "-O" to print in a more human readable format that will group all the hits within a single file or inode, if it was found, or the block if not.

All these commands are using the default "level" of 3 which tries to go all the way from the byte offset (level 0) to the block (level 1) to the inode (level 2) to the filename (level 3). The "-l #" option can be used to specify a custom level if going all the way to the filename layer is not needed. The output will be adjusted accordingly and the command should run faster at lower levels. Note that "level 0" is essentially the same as the basic srch_strings output. Here's an example of only going to level 1:

The -A option can be used to automatically carve any matches into a folder. The default folder is in the current directory and is called ssw_output, but it can be changed with the "-D path" option. Note that I used -N to suppress STDOUT.

The file named DIRECTORY_FILE is, in this case, the root directory on the filesystem. The directory structure within ssw_output is "image_name/partition_number/" followed by [root] for allocated files, [deleted] for deleted files and [filename_unknown] when the filename couldn't be found. If an unallocated block is carved, then the top level will either be [unallocated] for a data block or [metadata] for a metadata block.

All of these examples so far have assumed that the image is a partition or filesystem image. This is the default, but a full disk image can be given as well. The mmls command is used to pick the partitions and their offsets. The partition number will be prepended in the output and "00" will be used if it's just a filesystem image.

Autocarving the whole image may match on many files, so srch_strings_wrap accepts "-g regex" or "-G file" where regex is a grep regex or file contains a list of regexes to pass to "grep -f". Case insensitivity can be specified with "-i". The grep commands can be used with or without the "-A" option.
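
To illustrate what a dirty word file does, the sketch below runs plain "grep -i -f" over a few made-up lines in "srch_strings -t d" format. The file names, offsets, and terms are all hypothetical; with srch_strings_wrap the same word file would be passed with -G and case insensitivity with -i, as described above.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# A dirty word file: one grep regex per line.
printf '%s\n' 'advisory' 'passw[o0]rd' > "$workdir/dirty_words.txt"

# Made-up strings dump lines: byte offset, then the string.
printf '%s\n' \
    '   7272 SECURITY ADVISORY 2011-01' \
    '  91032 lost+found' \
    ' 262296 root passw0rd hint' > "$workdir/strings.asc"

# -f reads the patterns from the file, -i ignores case.
matches=$(grep -i -f "$workdir/dirty_words.txt" "$workdir/strings.asc")
printf '%s\n' "$matches"

rm -rf "$workdir"
```

Only the first and third lines match; the lost+found line is filtered out.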

The last option is "-P" which, rather than an image, accepts the output of a previously run srch_strings command. This would be useful if you wanted to run srch_strings on an entire image just once, then wanted to run multiple different "grep" searches on those results. The precomputed file is cat'd in via the pipeline. The "-I image" is also required where image is the image file that srch_strings was run against:

Additional Links

Here are some links if you're interested in keeping up with the latest on this tool:

If you have any ideas for future updates or find any bugs, let me know either on Twitter or send email to dave {at} superponible {dot} com.


— Dave

Dave Lassalle has over 8 years of experience in Information Security. He is currently a Senior Cyber Security Engineer for a government contractor in New Orleans, LA, focusing on SIEM analysis, forensics, and incident response. You can follow Dave on Twitter @superponible.

