Application Metadata of Nested Documents

by John McCash

A question on a certification practical exam I recently took got me thinking. The problem was presented as "find the specified text in the supplied disk image". However, the text turned out to be viewable in a JPEG file nested inside a Word document. Once I'd found the text, the question was essentially answered, but then I started thinking about extraction options and the origins of that JPEG file.

I recalled a tool I'd recently discovered thanks to traffic on the GCFA mailing list: hachoir-subfile. The original email thread was about using this tool to extract executable objects from PPS files, but it turns out to work equally well for extracting JPEG files. I had always assumed that when image files were incorporated into MS Office documents, they were somehow re-encoded,