SANS Digital Forensics and Incident Response Blog

ESE Databases are Dirty!

With the release of Internet Explorer 10, Microsoft made a radical departure from the way previous browser artifacts were stored. The perennial Index.dat records were replaced with a centralized metadata store for the browser using the proven "JET Blue" Extensible Storage Engine (ESE) database format. While many forensic examiners have remained blissfully unaware of the ESE format, Microsoft has used it increasingly across its products, including Exchange, NTDS.DIT, the Windows search database, Windows Live Messenger contacts, and now Internet Explorer (IE). With an enterprise-grade database now hosting network artifacts, it is time for every Windows investigator to understand how the database works and what data they may be missing.

Remember that even if a user never opens Internet Explorer, there may still be valuable records in their IE database, including files opened on the local system, network shares, and removable devices. It may also hold evidence of malicious activity, including HTTP connections initiated on behalf of malware or suspicious sites visited via links clicked in email clients. Internet Explorer and its supporting libraries are deeply tied to the Windows operating system, and WinINet API functions often interact with IE databases. Thus IE history, and the WebCache database in particular, continues to be a rich data source during many forensic examinations.

The Path to the WebCache

While the Internet Explorer implementation certainly doesn't require it, the ESE database format was built to handle a massive number of near-simultaneous transactions. Each transaction can require multiple changes to the database, making it infeasible to write changes directly to the database file on disk as you might see with older formats like Index.dat. Instead, ESE uses a write-ahead model, meaning new data is not immediately written into the database. Figure 1 shows the update process. Changes are written first into memory within the log file buffers, which typically range from 64KB to the low hundreds of kilobytes. Because the log buffer is small, data is quickly written to disk in the form of individual log files, each 512KB in size in the current version of Internet Explorer. ESE log files contain enough information to bring the database to a logically consistent state after a catastrophic event such as a power failure or application crash. Next, the database engine creates an in-memory cache and formats the log file data into database pages to be written into the database. These "dirty pages" in the memory cache can persist for hours or even days. Finally, the engine continuously monitors the number of dirty pages in the memory cache and prioritizes writes into the database file (WebCacheV*.dat in IE), aiming to preserve disk I/O performance. This seemingly drawn-out process explains why relevant data can exist outside of the ESE database and why we continue to find so many browser artifacts (including InPrivate browsing) in memory and the pagefile.

Figure 1: Intermediary Stages of Writing to the WebCacheV*.dat Database
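
For readers who think better in code, the stages in Figure 1 can be mimicked with a toy model. This is only a sketch of the write-ahead idea, not how ESE is implemented: the class name, the three-record buffer limit, and the use of Python dictionaries for pages are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWriteAheadStore:
    """Toy write-ahead store. Illustrative only; not the real ESE engine."""
    buffer_limit: int = 3                             # hypothetical flush threshold
    log_buffer: list = field(default_factory=list)    # in-memory log buffer
    log_files: list = field(default_factory=list)     # stand-in for V01*.log on disk
    dirty_pages: dict = field(default_factory=dict)   # in-memory page cache
    database: dict = field(default_factory=dict)      # stand-in for WebCacheV01.dat

    def write(self, key, value):
        # A change lands in the log buffer first, never directly in the database.
        self.log_buffer.append((key, value))
        if len(self.log_buffer) >= self.buffer_limit:
            self.flush_log_buffer()

    def flush_log_buffer(self):
        # The small buffer is written out as a log file, and the same changes
        # become dirty pages waiting in the memory cache.
        if not self.log_buffer:
            return
        self.log_files.append(list(self.log_buffer))
        self.dirty_pages.update(self.log_buffer)
        self.log_buffer.clear()

    def lazy_commit(self):
        # Happens later, whenever the engine decides disk I/O allows it.
        self.database.update(self.dirty_pages)
        self.dirty_pages.clear()

    def replay_logs(self):
        # Conceptually what recovery does: apply logged changes to the database.
        for log in self.log_files:
            self.database.update(log)
        self.dirty_pages.clear()


store = ToyWriteAheadStore()
for i in range(3):
    store.write(f"url{i}", f"visit {i}")
print("records in database before replay:", len(store.database))
store.replay_logs()
print("records in database after replay:", len(store.database))
```

The point of the demo is that an examiner who reads only `database` before the replay sees none of the three visits, even though every one of them is already sitting in a log file on disk.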

To give an idea of how much data may still be present outside of the ESE database, Figure 2 shows the WebCache directory for a typical IE instance. Note that the most recent log file was last written well over 24 hours after the last modification time of the WebCacheV01.dat file. Very few forensic tools currently have the ability to extract data from ESE log files, so in this example the investigator could easily have missed the last 24+ hours of available artifacts. Once recovered, the log files in this example held an additional 32 history entries, two cookies, and 3041 browser cache entries not yet written to the WebCache database.

Figure 2: Modification Times of ESE Log Files and Database
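
The comparison in Figure 2 is easy to script as a quick triage check. The sketch below is a hypothetical helper, not part of any forensic tool: it assumes the WebCache folder has been exported with its modification times preserved, and that the files follow the WebCacheV01.dat and V01*.log naming seen in this example.

```python
from pathlib import Path


def log_lag_report(webcache_dir, db_name="WebCacheV01.dat", log_glob="V01*.log"):
    """Report how far the newest log file is ahead of the database file.

    db_name and log_glob are assumptions based on the file names in this post.
    """
    folder = Path(webcache_dir)
    db_mtime = (folder / db_name).stat().st_mtime
    log_mtimes = [p.stat().st_mtime for p in folder.glob(log_glob)]
    newest_log = max(log_mtimes) if log_mtimes else None
    # A positive lag means the logs were written after the database was last touched.
    lag = max(0.0, newest_log - db_mtime) if newest_log is not None else 0.0
    return {
        "db_mtime": db_mtime,
        "newest_log_mtime": newest_log,
        "lag_hours": lag / 3600.0,
    }
```

Calling `log_lag_report()` on the exported folder returns the lag in hours. A large value is only a hint that the logs deserve attention; the header dump described in the next section is the authoritative check.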

Mining the Log Files

While ESE log files can be manually interpreted, it is a painstaking process best left as a last resort. A much easier option is to replay the dirty pages available in the log files into the WebCache database file. Luckily, Microsoft provides a tool to perform this action: ESENTUTL. Due to the wide range of ESE database implementations, ESENTUTL is a powerful and complex tool. Two of its built-in operations are particularly helpful in this case, the first being a database header dump. The ESE database header identifies the database state as well as which log files are needed to return the database to a clean, or up-to-date, state. In Figure 3 we see the WebCacheV01.dat file is dirty, meaning it is missing data that exists only in the log files. The "Log Required" field identifies which log files are necessary to return the database to a consistent state. These are the log files you should expect to find in the WebCache folder. In this example, the command to pull header information was:

esentutl /mh WebCacheV01.dat

Figure 3: WebCacheV01.dat Header Information
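
When triaging many systems, it can help to pull the two fields of interest out of saved header dumps automatically. The following is a rough sketch that assumes the dump contains lines labeled "State:" and "Log Required:" as discussed above. The sample text and its log numbers are made up for illustration and are not copied from Figure 3, and the exact spacing of real output may differ.

```python
import re

# Invented sample resembling the two header lines of interest.
SAMPLE_HEADER = """\
            State: Dirty Shutdown
     Log Required: 58-60 (0x3a-0x3c)
"""


def parse_header(text):
    """Extract the database state and required log range from a saved header dump."""
    state = re.search(r"^[ \t]*State:[ \t]*(.+?)[ \t\r]*$", text, re.MULTILINE)
    required = re.search(r"^[ \t]*Log Required:[ \t]*(\d+)-(\d+)", text, re.MULTILINE)
    state_text = state.group(1) if state else None
    return {
        "state": state_text,
        # Treat any state beginning with "Dirty" as needing log replay.
        "dirty": state_text is not None and state_text.lower().startswith("dirty"),
        "log_required": (int(required.group(1)), int(required.group(2))) if required else None,
    }


print(parse_header(SAMPLE_HEADER))
```

The input would typically be the output of the header dump command redirected to a text file, so the parsing can be done on any platform.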

Once the ESE database has been identified as dirty, our next step is to use the ESENTUTL utility to replay the log (.log) and checkpoint (.chk) information into the database. This is accomplished with the recovery command. A good practice is to export the entire WebCache folder from your forensic image or remote system and perform this operation on your forensic workstation. Internet Explorer's implementation of the ESE database nicely keeps log and checkpoint files in the same folder as the WebCache database file. If the necessary set of log files identified in the ESE header is present, the recovery operation should be quick and painless, leaving you with the final step of analyzing the database with your tool of choice. The recovery command used in this example was:

esentutl /r V01 /d

Figure 4: Successful ESE Database Recovery
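
The export-then-recover workflow can be wrapped in a few lines so that the original export is never touched. This is a minimal sketch under some assumptions: the function names and the "_working" folder suffix are invented here, esentutl is assumed to be on the PATH of a Windows workstation, and the recovery arguments are the same ones used in the command above.

```python
import shutil
import subprocess
from pathlib import Path


def make_working_copy(exported_dir, work_root):
    """Copy the exported WebCache folder so recovery runs only on the copy."""
    src = Path(exported_dir)
    dst = Path(work_root) / (src.name + "_working")
    # copytree uses shutil.copy2 by default, so file timestamps are carried over.
    shutil.copytree(src, dst)
    return dst


def recovery_command(log_base="V01"):
    # Same arguments as the recovery command shown in this post.
    return ["esentutl", "/r", log_base, "/d"]


def recover(working_dir, log_base="V01"):
    # Run from inside the working copy, since that is where the database,
    # log, and checkpoint files sit together.
    return subprocess.run(
        recovery_command(log_base),
        cwd=working_dir,
        capture_output=True,
        text=True,
    )
```

A typical sequence would be `work = make_working_copy(export, scratch)` followed by `result = recover(work)`, then a second header dump on the working copy to confirm the state is now clean.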

Final Precautions

  • This technique will make changes to the ESE database. Ensure you have a backup and perform it only on a working copy.
  • Database log files may contain both inserts and deletes! This means that recovering the log files into the database may actually remove some data. Examples of deletes include cache expiration and data from InPrivate browsing, which attempts to delete recorded entries at the conclusion of each session. In cases of extreme importance, consider analyzing both the dirty and clean versions of the database.
  • Beware of the "Repair" option in ESENTUTL. While it is sometimes necessary to force the database into a clean state so it can be analyzed with certain tools, "Repair" will not apply information from the log files and will delete entries identified as corrupt in the database.
  • Use the same operating system version (or higher) when performing recovery. Some older versions of the utility will not correctly recover newer database versions.
  • While I demonstrated recovery of an Internet Explorer database here, the same process is valid for other applications that employ the ESE database format.
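
Because log replay can delete as well as insert, comparing what your parsing tool extracts from the dirty copy with what it extracts from the recovered copy shows exactly what the replay changed. The sketch below assumes each record has already been exported as a tuple of strings by whatever tool you use. The function name, the record layout, and the sample rows are hypothetical.

```python
def compare_exports(dirty_rows, clean_rows):
    """Compare records parsed from the dirty database with those from the recovered one."""
    dirty, clean = set(dirty_rows), set(clean_rows)
    return {
        # Present before recovery but gone afterwards: deletes held in the logs.
        "removed_by_replay": sorted(dirty - clean),
        # Present only after recovery: inserts that lived only in the logs.
        "added_by_replay": sorted(clean - dirty),
    }


# Invented sample records in the form (URL, last access time).
dirty_rows = [
    ("http://example.com/a", "2015-05-30 10:00"),
    ("http://example.com/b", "2015-05-30 10:05"),
]
clean_rows = [
    ("http://example.com/b", "2015-05-30 10:05"),
    ("http://example.com/c", "2015-05-31 12:30"),
]
print(compare_exports(dirty_rows, clean_rows))
```

Anything listed under `removed_by_replay` exists only in the dirty copy, which is one reason to keep that copy and document it alongside the recovered database.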

Chad Tilbury, GCFA, serves as a Technical Director for CrowdStrike and has spent over fifteen years conducting computer crime investigations ranging from hacking to espionage to multi-million dollar fraud cases. He is a Senior Instructor and co-author of FOR408 Windows Forensics and FOR508 Advanced Computer Forensic Analysis and Incident Response at the SANS Institute. Find him on Twitter @chadtilbury.

1 Comment

Posted June 3, 2015 at 8:20 AM

Bridgey the Geek

And if you are working with ESE DB files, it's worth knowing that libesedb exists: https://github.com/libyal/libesedb