SANS Digital Forensics and Incident Response Blog

How to Extract Flash Objects From Malicious MS Office Documents

Authors of malicious Microsoft Office documents can execute code on the victim's system using several techniques, including VB macros and exploits. Another approach, which has been growing in popularity, involves embedding Flash programs in the Office document. These Flash programs can download or directly incorporate additional malicious code without the victim's knowledge. This note demonstrates several steps for extracting malicious Flash objects from Microsoft Office document files, so you can analyze them. We take a brief look at using strings, Pyew, hachoir-subfile, xxxswf, and extract_swf for this purpose.

I will start by looking at the malicious Microsoft Word file "Iran's Oil and Nuclear Situation.doc", which exploited the CVE-2012-0754 vulnerability in Flash Player. To examine this file in your own properly configured lab, you can get this sample from the Contagio collection, which is where I obtained it.

Locating the Embedded Flash Object

You can notice the presence of the Flash object inside this Word document by looking at the strings embedded into the file:

ShockwaveFlash String

By default, the strings command searches for ASCII strings. It's often worthwhile to also search for Unicode-encoded strings by specifying the "--encoding=l" parameter, which selects 16-bit little-endian characters.
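If you prefer to script this step, here is a minimal Python sketch of the same idea: it pulls out runs of printable ASCII characters and runs of printable characters stored as 16-bit little-endian text, then looks for the Flash marker. The function name, the minimum length of 4, and the sample bytes are assumptions made for illustration; this is not how the strings command itself is implemented.

```python
import re


def extract_strings(data: bytes, min_len: int = 4):
    """Return (ascii_strings, utf16le_strings) found in raw bytes."""
    # Runs of printable ASCII bytes, roughly what strings reports by default.
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Printable ASCII characters each followed by a NUL byte (UTF-16LE text).
    wide_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    ascii_hits = [m.group().decode("ascii") for m in ascii_pat.finditer(data)]
    wide_hits = [m.group().decode("utf-16-le") for m in wide_pat.finditer(data)]
    return ascii_hits, wide_hits


# Synthetic bytes standing in for a document; not taken from the real sample.
sample = (
    b"\x00\x01ShockwaveFlash.ShockwaveFlash.10\x00\xff"
    + "http://example.test/a.swf".encode("utf-16-le")
    + b"\xff\xfe"
)

ascii_hits, wide_hits = extract_strings(sample)
flash_markers = [s for s in ascii_hits + wide_hits if "ShockwaveFlash" in s]
print(flash_markers)
```

To try it on a real file in your lab, read the document with open(path, "rb").read() and pass the bytes to extract_strings.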

Another way to notice the embedded Flash program is to load the Word document into Pyew:

pyew in action

Pyew parsed the Word document's OLE structure and identified object ID D27CDB6E-AE6D-11CF-96B8-444553540000, which is assigned to Flash.
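For a quick check without Pyew, you could scan the raw bytes for the Flash class ID. The sketch below assumes the ID is stored in the usual binary GUID layout, where the first three fields are little-endian; Python's uuid module exposes that layout as bytes_le. It is a plain byte search, not an OLE parser, so treat a hit as a hint rather than proof. The function name and sample bytes are hypothetical.

```python
import uuid

# Class ID that Pyew reported for the Flash object.
FLASH_CLSID = uuid.UUID("D27CDB6E-AE6D-11CF-96B8-444553540000")


def find_clsid(data: bytes, clsid: uuid.UUID = FLASH_CLSID):
    """Return every offset where the binary form of the CLSID occurs."""
    needle = clsid.bytes_le  # first three GUID fields stored little-endian
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


# Synthetic bytes: padding, the CLSID, more padding (not the real sample).
blob = b"\x00" * 10 + FLASH_CLSID.bytes_le + b"\x00" * 5
print(find_clsid(blob))
```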

Yet another way to locate the embedded Flash object is to use the hachoir-subfile utility, which is very handy for locating and identifying all sorts of embedded artifacts:

hachoir-subfile in action

Both Pyew and hachoir-subfile are installed on the REMnux Linux distro, which I am using for this example.

As we saw, the attacker embedded the Flash object inside the Word file. (Here is how a Flash object could be embedded manually inside an Office document.) When the Word file is opened, the embedded Flash object will be executed using Flash Player. To determine what the Flash object does, you need to extract it from the file.

Extracting the Embedded Flash Object

You could carve out the Flash object using a hex editor. A better approach might be to use the xxxswf tool by Alexander Hanel. (This tool is already installed on REMnux.)

xxxswf in action

The "-x" parameter tells the tool to extract any SWF objects embedded in the specified file. The tool saves the carved-out file as 128a66cc3efe6f424c3fedcc4b6235ac.swf, which matches the extracted Flash file's MD5 hash. The "-d" parameter directs the tool to decompress the extracted SWF file, which is then saved as 128a66cc3efe6f424c3fedcc4b6235ac.2.swf.
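To get a feel for what carving and decompressing an SWF involves, here is a rough Python sketch. It relies on the SWF header layout: a 3-byte signature ("FWS" for uncompressed, "CWS" for zlib-compressed), a 1-byte version, and a 4-byte little-endian length of the uncompressed file. This is not the tool's actual code; the function names, the version sanity bound, and the synthetic test data are assumptions for illustration.

```python
import hashlib
import struct
import zlib


def _rebuild(data: bytes, pos: int, sig: bytes, length: int):
    """Return the uncompressed SWF starting at pos, or None if it looks bogus."""
    if sig == b"FWS":
        body = data[pos:pos + length]
        return body if len(body) == length else None
    try:
        # decompressobj stops at the end of the zlib stream and ignores
        # whatever bytes follow it in the container file.
        body = zlib.decompressobj().decompress(data[pos + 8:])
    except zlib.error:
        return None
    if len(body) != length - 8:
        return None
    # Swap the signature to FWS and keep the version and length fields.
    return b"FWS" + data[pos + 3:pos + 8] + body


def carve_swf(data: bytes, max_version: int = 50):
    """Return a sorted list of (offset, uncompressed_swf) for plausible SWFs."""
    results = []
    for sig in (b"FWS", b"CWS"):
        pos = data.find(sig)
        while pos != -1:
            header = data[pos:pos + 8]
            if len(header) == 8:
                version = header[3]
                (length,) = struct.unpack("<I", header[4:8])
                # max_version is an arbitrary sanity bound (assumption).
                if 0 < version <= max_version and length > 8:
                    swf = _rebuild(data, pos, sig, length)
                    if swf is not None:
                        results.append((pos, swf))
            pos = data.find(sig, pos + 1)
    return sorted(results)


def md5_name(swf: bytes) -> str:
    """Name a carved file after its MD5 hash."""
    return hashlib.md5(swf).hexdigest() + ".swf"


# Synthetic compressed SWF wrapped in junk bytes; not the real sample.
payload = b"\x00" * 20
header_tail = bytes([10]) + struct.pack("<I", 8 + len(payload))
fws = b"FWS" + header_tail + payload
cws = b"CWS" + header_tail + zlib.compress(payload)
container = b"junk" + cws + b"trailing"

found = carve_swf(container)
for offset, swf in found:
    print(offset, md5_name(swf))
```

Signatures such as "FWS" can show up by chance in binary data, which is why the sketch checks the version and length fields before accepting a candidate.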

Analyzing the extracted Flash file is outside the scope of this post. However, you can examine its strings and locate embedded URLs by using "strings" and "grep".

Another tool that could extract the embedded SWF file is extract_swf by "noonat":

extract_swf in action

The tool extracts and decompresses the embedded SWF object into the out001.swf file. For more details regarding using extract_swf to analyze this sample, see its author's blog posting, which is written in Chinese. The posting shows how to analyze the extracted Flash object with the help of Adobe's SWF Investigator tool.

The extract_swf utility is not presently installed on REMnux. However, you can easily add it if you have Internet connectivity:

installing extract_swf

Executing External Flash Programs from Office Documents

In the example above, the attacker embedded the Flash object directly in the Microsoft Office document. Alternatively, the attacker could have stored the SWF file on an external website. This technique was used in the malicious "World Uyghur Congress Invitation.doc" file, which you can also obtain from the Contagio repository. This malicious document exploited the CVE-2012-0779 vulnerability in Flash Player.

Neither hachoir-subfile nor the other tools locate an embedded SWF object in this file. That's because the SWF object is not present within this Word document. However, extracting Unicode strings from the file shows embedded JavaScript that retrieves the malicious SWF object from a remote URL:

Looking at the ASCII strings embedded into the malicious Word file shows the presence of the "Microsoft Scriptlet Component" implemented as the ScriptBridge ActiveX control. This control provides attackers with another method of automatically executing scripts when the Microsoft Office document is opened. (If you have more details about this technique, please leave a comment.)

ScriptBridge String
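When no SWF is embedded, a quick triage script can look for the two clues described above: the ScriptBridge marker and URLs that point to an SWF file, stored either as ASCII or as 16-bit little-endian text. The sketch below is one way to do that in Python. The function names, the set of characters allowed in a URL, and the example.test address are assumptions for illustration; the real sample's URL is not reproduced here.

```python
import re

# Conservative set of characters allowed in a URL (assumption); it leaves
# out quotes so a match stops at the end of a JavaScript string literal.
_URL_CHARS = rb"[A-Za-z0-9._~:/?#@!$&*+,;=%\-]"
URL_ASCII = re.compile(rb"https?://" + _URL_CHARS + rb"{4,}")
URL_WIDE = re.compile(
    rb"h\x00t\x00t\x00p\x00(?:s\x00)?:\x00/\x00/\x00(?:" + _URL_CHARS + rb"\x00){4,}"
)


def find_urls(data: bytes):
    """Return URLs stored as ASCII or as UTF-16LE text."""
    urls = [m.group().decode("ascii") for m in URL_ASCII.finditer(data)]
    urls += [m.group().decode("utf-16-le") for m in URL_WIDE.finditer(data)]
    return urls


def triage(data: bytes):
    """Summarize hints that a document loads an external SWF via script."""
    return {
        "scriptbridge": b"ScriptBridge" in data,
        "swf_urls": [u for u in find_urls(data) if ".swf" in u.lower()],
    }


# Synthetic bytes: an ASCII marker plus JavaScript stored as UTF-16LE.
script = 'var u="http://example.test/evil.swf";'.encode("utf-16-le")
blob = b"\xd0\xcf\x11\xe0" + b"ScriptBridge.ScriptBridge.1\x00" + script

print(triage(blob))
```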

As you can see, attackers have been finding creative ways of using Flash programs as part of Microsoft Office documents to infect computer systems. Fortunately, the tools discussed above can help you locate and begin analyzing the malicious artifacts used in such attacks.

Lenny Zeltser teaches malware analysis at SANS Institute. At the "day job," Lenny focuses on safeguarding customers' IT operations at NCR Corp. He is active on Twitter and writes a security blog.


Posted May 30, 2012 at 6:54 AM


Very informative post Lenny. Thanks for sharing.

Posted November 2, 2012 at 9:58 AM


I recently published an extended version of xxxswf called pyxswf, which handles OLE streams in MS Office documents correctly, so that it works even if the OLE structure is fragmented (a known obfuscation technique). See for more info.