SANS Digital Forensics and Incident Response Blog

Detecting Shellcode Hidden in Malicious Files

A challenge that reverse engineers and automated sandboxes share is determining whether a particular file is malicious. This is especially difficult when the malicious behaviour is obfuscated and only triggered under very specific circumstances.

There are a number of techniques for identifying embedded shellcode, such as searching for known patterns (NOP sleds, GetEIP sequences, etc.). However, as attackers update their methods to evade these protections, it becomes harder to find the code without having the exact version of the targeted vulnerable software installed and allowing the exploit to execute successfully.
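To make the pattern-searching approach concrete, here is a minimal Python sketch. It is only an illustration: the function name `find_shellcode_hints` and the minimum sled length of 16 are assumptions, and the three byte patterns (a run of x86 NOPs, `call $+5` followed by a `pop`, and `fnstenv [esp-0xc]`) are common examples rather than a complete signature set.

```python
import re

# Illustrative x86 byte patterns; a real scanner would need many more.
NOP_SLED = re.compile(rb"\x90{16,}")                        # 16+ consecutive NOPs (threshold is an assumption)
CALL_POP = re.compile(rb"\xe8\x00\x00\x00\x00[\x58-\x5f]")  # call $+5 ; pop <reg>  (GetEIP)
FNSTENV = re.compile(rb"\xd9\x74\x24\xf4")                  # fnstenv [esp-0xc]     (GetEIP)

PATTERNS = (("nop-sled", NOP_SLED), ("call-pop", CALL_POP), ("fnstenv", FNSTENV))


def find_shellcode_hints(data: bytes):
    """Return a sorted list of (offset, label) pairs, one per pattern match."""
    hits = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(data):
            hits.append((match.start(), label))
    return sorted(hits)
```

An encoded or polymorphic payload can avoid every one of these byte sequences, which is exactly the weakness of signature matching.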

In this post, I will discuss a new technique I have been experimenting with that approaches this issue from a different perspective: forcing the execution of the exploit code, no matter what software you have installed. It is based on two core principles:

  1. If you try to execute something that isn't code (e.g. a text string), the program will likely crash, as the machine code interpretation of this data is unlikely to make much sense.
  2. If you begin executing code from the start (i.e. wherever the instruction pointer would have been set during the exploitation phase), it will run to completion - no matter how obfuscated the instructions are.

So here's my theory: if we attempt to "execute" the contents of a malicious file (such as a PDF) byte by byte, catching the exception each time the program crashes and then advancing the instruction pointer by one, we will eventually reach any malicious code contained in the file. That code will be triggered, run to completion, and reveal its malicious nature through behavioural analysis.

The experiment

In order to test this concept, I wrote a program which does the following:

  • Maps the requested file into memory (i.e. makes a full copy of it in memory).
  • Sets the instruction pointer to the first byte and allows it to run.
  • It will probably crash! (The instructions won't make sense!!)
  • Catches the error and uses the error handler to increase the instruction pointer by one.
  • Tries again, and again, and again...
  • If the file contains shellcode, you should eventually hit it, and it will run - hurrah!
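
The loop described in the steps above can be modelled safely without executing any real machine code. The Python sketch below is not the program used in this post. It is a toy simulation under stated assumptions: a made-up three-instruction "CPU" (0x90 = no-op, 0xE0 nn = emit byte nn as observable behaviour, 0xF4 = halt), a `Crash` exception standing in for the fault a real processor would raise, and a restart from the next start offset after each crash. All names (`Crash`, `run_from`, `hunt`) are hypothetical.

```python
class Crash(Exception):
    """Stands in for the fault a real CPU raises on bytes that are not valid code."""


def run_from(data: bytes, start: int) -> bytes:
    """Interpret the toy instruction set from `start`; return emitted bytes on a clean halt."""
    out = bytearray()
    ip = start
    while True:
        if ip >= len(data):
            raise Crash(f"ran off the end of the buffer at {ip}")
        op = data[ip]
        if op == 0x90:                    # toy NOP
            ip += 1
        elif op == 0xE0:                  # toy EMIT imm8: the observable "behaviour"
            if ip + 1 >= len(data):
                raise Crash("truncated EMIT")
            out.append(data[ip + 1])
            ip += 2
        elif op == 0xF4:                  # toy HALT: the code ran to completion
            return bytes(out)
        else:
            raise Crash(f"illegal opcode {op:#04x} at {ip}")


def hunt(data: bytes):
    """Try every start offset in turn, mirroring the catch-and-increment loop."""
    for offset in range(len(data)):
        try:
            behaviour = run_from(data, offset)
        except Crash:
            continue                      # the "error handler": move one byte on and retry
        if behaviour:                     # ran to completion and did something observable
            return offset, behaviour
    return None


# Usage: ASCII "document" bytes crash immediately; the embedded toy payload is found.
payload = bytes([0x90, 0x90, 0xE0, ord("h"), 0xE0, ord("i"), 0xF4])
document = b"%PDF-1.4 junk " + payload + b" trailer"
print(hunt(document))
```

In the toy model, ordinary text never decodes to a valid instruction, so every offset before the payload "crashes" and the search walks forward until it reaches the first byte of the payload.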

Demonstrating the concept

Step 1 - Generating a malicious document and starting the Metasploit reverse handler

We begin by generating a malicious PDF document containing the Metasploit reverse_tcp payload, and starting the handler to await incoming connections. The attacker is now waiting for the victim to open the file with a vulnerable PDF reader, at which point it will connect back to the attacker's machine.


Step 2 - Dealing with the malicious PDF

Now, let us imagine we are analysing this document (either manually or using an automated sandbox). The problem in this case is that we are unlikely to have the vulnerable version of the software installed, so the exploit won't work and we will be none the wiser that it exists! That doesn't mean the intended victim lacks the vulnerable version.

Let us try running the PDF through our proof-of-concept shellcode hunter...


Step 3 - Bingo - shellcode has been located and triggered

As we can see below, the shellcode in the document has been triggered and has established a connection back to the Metasploit listener! If we were conducting a behavioural analysis, we would be able to identify the suspicious activity and take appropriate action.


Video demo

Check out the video demo if you'd like to see this in action live:


Code sample

If you're interested in testing the concept or integrating it into your own software (anyone fancy writing a Cuckoo module?), the code I used was pretty simple and looked like this:


The proof of concept I wrote for this demo could definitely be made more advanced. For example, if the shellcode started with a JMP $-2 instruction (a jump back to itself), the hunter would get stuck in an infinite loop. This could potentially be overcome by using multi-threading to continue the search after the first code block has been found.
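
One way to picture the infinite-loop problem is the toy Python simulation below. It does not use multi-threading. Instead it swaps in a simpler idea, a per-offset step budget, so the search can record a suspected loop and carry on. Everything here is an assumption made for illustration: the made-up instruction set (0xEB rel8 = short jump, 0xE0 nn = emit byte, 0xF4 = halt), the names `run_from` and `hunt_all`, and the default budget of 1000 steps. The bytes EB FE mirror the real x86 encoding of a short jump to itself.

```python
class Crash(Exception):
    """Stand-in for a CPU fault on bytes that are not valid code."""


class BudgetExceeded(Exception):
    """Raised when a run uses more steps than allowed (a probable infinite loop)."""


def run_from(data: bytes, start: int, max_steps: int = 1000) -> bytes:
    """Interpret the toy instruction set from `start`, giving up after `max_steps`."""
    out = bytearray()
    ip = start
    for _ in range(max_steps):
        if not 0 <= ip < len(data):
            raise Crash("instruction pointer left the buffer")
        op = data[ip]
        if op in (0xEB, 0xE0) and ip + 1 >= len(data):
            raise Crash("truncated instruction")
        if op == 0xEB:                    # toy short jump, signed offset from next instruction
            rel = data[ip + 1]
            if rel >= 0x80:
                rel -= 0x100
            ip = ip + 2 + rel
        elif op == 0xE0:                  # toy EMIT imm8: observable behaviour
            out.append(data[ip + 1])
            ip += 2
        elif op == 0xF4:                  # toy HALT: ran to completion
            return bytes(out)
        else:
            raise Crash(f"illegal opcode {op:#04x}")
    raise BudgetExceeded(f"no halt within {max_steps} steps from offset {start}")


def hunt_all(data: bytes, max_steps: int = 1000):
    """Scan every offset, recording suspected loops instead of hanging on them."""
    findings = []
    for offset in range(len(data)):
        try:
            behaviour = run_from(data, offset, max_steps)
        except Crash:
            continue
        except BudgetExceeded:
            findings.append((offset, "loop"))
            continue
        if behaviour:
            findings.append((offset, behaviour))
    return findings
```

A budget keeps the scan moving, but it cannot tell a deliberate trap from a slow decoder loop, so flagged offsets would still need a closer look.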

You may have to play with your compiler settings to get this to work. I set Visual Studio to compile with the 'Debug' configuration and switched off some of the protections. If you need some help getting it working, send me a tweet.

I will be presenting this and a few other concepts during a session at SANS London in a couple of weeks. If you're attending the conference, it would be great to have you along!

Let me know what you think!

Follow me on Twitter: @CyberKramer


Posted June 29, 2015 at 9:48 PM

Dale "Chip" McGleenon

Really looking forward to the 610 and to seeing this presentation too! Well done Adam, keep up the great work for the community.

Posted July 1, 2015 at 6:32 AM


What if the malicious file dynamically decodes the shellcode (using JS code in a PDF stream, for example) before running it (as is usually the case with exploit kits)?

Posted July 1, 2015 at 2:49 PM


It's a game of cat and mouse and it seems malicious file makers always find a new way to make our lives a little harder. Thanks for sharing the code sample, it will help a lot.

Posted July 2, 2015 at 9:07 AM


Hello, does shellcode_hunter also execute embedded scripts that would, for example, dynamically decode the shellcode (for example an encoded js script in a PDF stream)?

Posted July 2, 2015 at 9:12 AM


Hello, what if the shellcode is dynamically built / decoded by an embedded script (as would be the case with an encoded/compressed JS script in a PDF stream)?

Posted July 6, 2015 at 6:27 AM


There is a libemu demo called sctest. It does similar things to your project. What's the difference between them?

Posted August 8, 2015 at 4:14 PM

Adam Kramer

Hi - thanks for the feedback / questions:
strobe - This wouldn't work for shellcode that is dynamically built (js/pdf); it's for raw shellcode which exploits vulnerabilities in the respective loader.
liuya - sctest emulates the execution of shellcode (much better than this!). The difference is that this is more focused on finding that shellcode in the first place.
Rowandro - Thanks for the feedback! Glad to hear it's useful!