SANS Digital Forensics and Incident Response Blog

Making Reviewing Files From Data Carving Easier: Documents

This is my second installment on dealing with files recovered through the use of data carving tools. As I said in my previous post on data carving, doing corporate forensics means I end up with mountains of files to go through after running data carvers like Foremost/Scalpel or PhotoRec. Most of the programs out there either can't handle the number of files or are very time-consuming to work with. Document files are among the worst to go through. You know the routine: double-click the file, load it into Word or whatever document reader applies, take a quick look at the pages, and then move on to the next one. The Docs-processor script does all of that for you. It turns anything OpenOffice can read into animated GIFs, which lets you review the files visually before going into further analysis of a document. And there is one more thing... You can add your own plugins, which are executed on each document, and the results are put into a web page for you to review.

I have a few more processors to release and hope to release them all by Christmas.

Doc Processor

Like the others, this script takes anything that OpenOffice can read and turns it into animated GIFs.
* Creates a series of web pages that contain a thumbnail of all readable docs
* Gathers details about the files such as Exif data
* Can gather whatever data you can think of due to plugins
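
To make the conversion idea concrete, here is a minimal Python sketch. It is not the script's actual code (the script itself is Perl). It assumes a two-step pipeline: unoconv renders the document to PDF, then ImageMagick's convert turns the PDF pages into frames of an animated GIF. The function names, the intermediate PDF, and the frame delay are assumptions made for illustration only.

```python
from pathlib import Path
from typing import List


def pdf_command(doc: Path, workdir: Path) -> List[str]:
    """Build a unoconv command that renders `doc` to a PDF inside `workdir`."""
    pdf = workdir / (doc.name + ".pdf")
    return ["unoconv", "-f", "pdf", "-o", str(pdf), str(doc)]


def gif_command(pdf: Path, gif: Path, size: int = 150, delay: int = 100) -> List[str]:
    """Build an ImageMagick command that makes one GIF frame per PDF page."""
    return [
        "convert",
        "-delay", str(delay),         # hundredths of a second per frame
        "-loop", "0",                 # 0 means loop forever
        str(pdf),
        "-resize", f"{size}x{size}",  # fit each frame inside a size x size box
        str(gif),
    ]
```

Each returned list can be handed to `subprocess.run(cmd)`. Building the commands separately from running them makes it easy to log exactly what was executed against each recovered file.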

File Types That Should Work With The Script
(Source: HTTP://

  • Microsoft Word 6.0/95/97/2000/XP (.doc and .dot)

  • Microsoft Word 2003 XML (.XML)

  • Microsoft Word 2007 XML (.docx, .docm, .dotx, .dotm)

  • Microsoft WinWord 5 (.doc)

  • WordPerfect Document (.wpd)

  • WPS 2000/Office 1.0 (.wps)

  • .rtf, .txt, and .csv

  • StarWriter formats (.sdw, .sgl, .vor)

  • DocBook (.xml)

  • Unified Office Format text (.uot, .uof)

  • Ichitaro 8/9/10/11 (.jtd and .jtt)

  • Hangul WP 97 (.hwp)

  • T602 Document (.602, .txt)

  • AportisDoc (Palm) (.pdb)

  • Pocket Word (.psw)

  • Microsoft Excel 97/2000/XP (.xls, .xlw, and .xlt)

  • Microsoft Excel 4.x—5.0/95 (.xls, .xlw, and .xlt)

  • Microsoft Excel 2003 XML (.xml)

  • Microsoft Excel 2007 XML (.xlsx, .xlsm, .xltx, .xltm)

  • Microsoft Excel 2007 binary (.xlsb)

  • Lotus 1-2-3 (.wk1, .wks, and .123)

  • Data Interchange Format (.dif)

  • Rich Text Format (.rtf)

  • Text CSV (.csv and .txt)

  • StarCalc formats (.sdc and .vor)

  • dBASE (.dbf)

  • SYLK (.slk)

  • Unified Office Format spreadsheet (.uos, .uof)

  • .htm and .html files, including Web page queries

  • Pocket Excel (.pxl)

  • Quattro Pro 6.0 (.wb2)

  • Microsoft PowerPoint 97/2000/XP (.ppt, .pps, and .pot)

  • Microsoft PowerPoint 2007 (.pptx, .pptm, .potx, .potm)

  • StarDraw and StarImpress (.sda, .sdd, .sdp, and .vor)

  • Unified Office Format presentation (.uop, .uof)

  • CGM — Computer Graphics Metafile (.cgm)

  • Portable Document Format (.pdf)

  • Oh, and any OpenOffice documents :)

Perl modules: Getopt::Long, Pod::Usage, File::Basename, Config::IniFiles, OLE::Storage, Unicode::Map, Startup, Image::ExifTool, Digest::MD5, Digest::SHA, OLE::PropertySet, Getopt::Std
Libraries and packages installed: ImageMagick, Ghostscript, unoconv

Unoconv can be obtained at:

Standard Plugins

* Uses Exif to dump whatever metadata it can find in the file.
* Calculates the MD5 hash for the file.
* Calculates the SHA-512 hash for the file.
* A Perl script written by Mr. Harlan Carvey for dumping metadata from Word documents.
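
For readers who want to see what the two hash plugins boil down to, here is a small Python sketch. It is an illustration only, not the plugins shipped with the script, and the function name `file_digests` is made up for this example. It reads the file in chunks so that large recovered files do not have to fit in memory.

```python
import hashlib
from pathlib import Path


def file_digests(path: Path, chunk_size: int = 1 << 20) -> dict:
    """Return the MD5 and SHA-512 hex digests of a file, read in chunks."""
    md5 = hashlib.md5()
    sha512 = hashlib.sha512()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha512.update(chunk)
    return {"md5": md5.hexdigest(), "sha512": sha512.hexdigest()}
```

Computing both digests in one pass means each carved file is read from disk only once.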


  1. Install OpenOffice
  2. Install the listed Perl modules
  3. Install the other binary requirements such as ImageMagick, Ghostscript, and unoconv. If you're running Fedora, all three can be installed via yum.
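
A quick way to confirm the binary requirements are in place is to check whether they are on the PATH. The Python sketch below does that. The executable names (`soffice`, `convert`, `gs`, `unoconv`) are assumptions: they are the usual names on Linux, but they vary by distribution and version, so adjust the list to your system.

```python
import shutil
from typing import Callable, Dict, Iterable, List, Optional

# Assumed executable names; they differ between distributions and versions.
REQUIRED_TOOLS = ("soffice", "convert", "gs", "unoconv")


def find_tools(
    names: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


def missing_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """Return the sorted names of the tools that could not be located."""
    return sorted(name for name, path in found.items() if path is None)
```

Calling `missing_tools(find_tools())` before a long run gives an early warning instead of a page full of conversion errors.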

INI File

The INI file (data_processor.ini) contains the user configurable options for each one of the data processor scripts.

Each line has a comment before the parameter. See the INI file for more details.
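
As an illustration of the comment-then-parameter layout, here is a Python sketch that parses such a file with the standard `configparser` module. The section name `docs` and the key names are hypothetical; they are borrowed from the command-line option names, and the real data_processor.ini may use different ones. The fallback values are the defaults listed for the command-line options.

```python
import configparser

# Hypothetical INI content; the real data_processor.ini may differ.
SAMPLE_INI = """\
[docs]
; Number of thumbnails per row
perrow = 6
; Size of the thumbnails in pixels
imagesize = 200
"""


def load_doc_options(ini_text: str, section: str = "docs") -> dict:
    """Read thumbnail options from INI text, falling back to the defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {
        "imagenum": parser.getint(section, "imagenum", fallback=2000),
        "perrow": parser.getint(section, "perrow", fallback=4),
        "imagesize": parser.getint(section, "imagesize", fallback=150),
        "quality": parser.getint(section, "quality", fallback=80),
    }
```

Lines starting with `;` or `#` are treated as comments by `configparser`, which matches the one-comment-per-parameter style described above.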


Here are the mandatory screenshots. :) Click on the image to bring up a larger version.

Web Page

Spreadsheet Details

Spreadsheet Details Part 2

Spreadsheet Details Part 3

Running The Program

Commandline Example: ./ -inputdir /export/data_carver_processors/doc_exam -output doc-index -plugindir /export/data_carver_processors/docs-plugins -ini /export/data_carver_processors/data_processor.ini

After the program has gone through the documents, bring up your favorite web browser and open the file you named with the -output option. In the above case, I would open doc-index.html in the directory where I ran the script.


-ini FILE            INI file (configuration)
-title TITLE         Head page with this title
-inputdir DIR        Input directory
-output FILE         Name the output file with this name instead of "index.html"
-plugindir DIR       Plugin directory
-imagenum NUMBER     Number of thumbnails per page; default is 2000
-perrow NUMBER       Number of thumbnails per row; default is 4
-imagesize NUMBER    Size of the thumbnails in pixels; default is 150 pixels
-quality 0..100      Quality of the thumbnails from 0 to 100; default is 80
-help or -man        Show this text and exit
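
To show how -imagenum and -perrow interact, here is a Python sketch that splits a list of thumbnails into pages and then into rows. It is an assumption about how the layout is derived from those two options, not code taken from the script, and the function names are made up for this example.

```python
from typing import List, Sequence


def chunk(items: Sequence, size: int) -> List[list]:
    """Split `items` into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def layout(thumbnails: Sequence[str], imagenum: int = 2000, perrow: int = 4) -> List[List[list]]:
    """Return pages of at most `imagenum` thumbnails, each split into rows of `perrow`."""
    return [chunk(page, perrow) for page in chunk(thumbnails, imagenum)]
```

With the defaults, a run that recovers 5,000 readable documents would produce three pages under this scheme: two full pages of 2,000 thumbnails (500 rows each) and one page of 1,000.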

Other Notes

Feedback: Please send me an email with any features/plug-ins you would like to see. If you find any errors with the scripts, let me know. I am also interested in any plug-ins you want to share. If you like the program, let me know, too. I don't mind positive feedback.

Errors: As the script runs over the files, you may see some errors in the output. The errors come from the programs running on the recovered files. Not all of the files that the data carvers recover are good files, hence the errors.
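
The "report the error and move on" behavior can be sketched in Python as follows. This is an illustration of the general pattern, not the script's code. The function name `run_on_each` is hypothetical, and `command` stands for any converter or plugin command line that takes the file name as its last argument.

```python
import subprocess
from typing import Iterable, List, Tuple


def run_on_each(command: List[str], files: Iterable[str]) -> Tuple[list, list]:
    """Run `command FILE` for every file; record failures and keep going."""
    succeeded, failed = [], []
    for name in files:
        try:
            result = subprocess.run(command + [name], capture_output=True, text=True)
        except OSError as exc:
            # The tool itself could not be started (for example, not installed).
            failed.append((name, str(exc)))
            continue
        if result.returncode == 0:
            succeeded.append((name, result.stdout))
        else:
            # A damaged carved file usually ends up here; note it and move on.
            failed.append((name, result.stderr.strip()))
    return succeeded, failed
```

Keeping the failures in a list, rather than only printing them, makes it possible to go back later and look at the files that no tool could read.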

License: GPL 2.0

Download at: data_carver_processors.tar.gz

Contact: cs[at]

Keven Murphy, GCFA Gold #24, is the Senior Forensics/Incident Handler to General Dynamics Land Systems.


Posted December 9, 2009 at 3:05 PM

Rob Zirnstein

I'm impressed with what you have put together here. What tool are you using to sort the recovered files before running these scripts on them? If they have missing or wrong file extensions, are you correcting the file extensions?

Posted December 9, 2009 at 10:50 PM

Jean-Francois Gingra

I really like the idea and will definitely use it the next time I deal with a large amount of evidence data. Very nice work.
Rob: For file type detection, I think OpenOffice has modules/services called TypeDetection (flat) and ExtendedTypeDetection (deep).

Posted December 10, 2009 at 1:06 AM

Keven Murphy

Thank you.
It should work with any data carver. I mostly use PhotoRec, Foremost, and Scalpel.
I am not correcting the extensions. I have not seen a data carver put a .jpg extension on a zip file. But I suppose it is possible. If the conversion program cannot deal with the file given to it, it will just error out and continue to the next file.
I have greater plans for the processors and some time during Christmas to work on them.
K Murphy

Posted January 11, 2011 at 9:41 AM


Does anybody know where I can get OpenOffice v2.0? I need this version to install unoconv, but I can't find it.
Thank you very much.

Posted January 14, 2011 at 2:46 PM

K Murphy

unoconv works with version 2.0 and up. I have been using the latest version of OpenOffice without any issues. If you are using Fedora, just install unoconv with yum.
K Murphy