SANS Digital Forensics and Incident Response Blog

How math can help with forensics

Data mining, text mining and network association are all statistical tools that have come into their own as the sheer quantity of available computational power has increased. True, you do not need a strong grounding in math to use these programs, but math can help determine where they may be used.

Text data mining takes standard associative keyword-based search techniques and increases their effectiveness by mapping associations with other words and creating visual representations of the data. This allows an investigator to drill down into previously undetermined associations and to analyze immense amounts of data. One of the problems in the past has been how to represent this data.
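To make "mapping associations with other words" concrete, here is a minimal sketch in Python that counts how often pairs of words appear in the same message. It is only an illustration under simple assumptions (whitespace-style tokenising, a hand-picked stopword list); the function name `cooccurrence_counts` and the sample messages are hypothetical and are not taken from any case or from a particular text mining product.

```python
from collections import Counter
from itertools import combinations
import re


def cooccurrence_counts(messages, stopwords=frozenset()):
    """Count how often each unordered pair of words shares a message."""
    counts = Counter()
    for text in messages:
        # Lower-case, split into words, and drop duplicates and stopwords.
        words = set(re.findall(r"[a-z']+", text.lower())) - stopwords
        # Sorting gives each pair one canonical order, e.g. ("funds", "transfer").
        for pair in combinations(sorted(words), 2):
            counts[pair] += 1
    return counts


# Hypothetical messages, invented for illustration only.
messages = [
    "transfer the funds tonight",
    "funds transfer confirmed",
    "meet tonight at the usual place",
]
counts = cooccurrence_counts(messages, stopwords=frozenset({"the", "at"}))

# Word pairs that turn up together most often are candidate associations.
for pair, n in counts.most_common(3):
    print(pair, n)
```

Pairs with high counts are the associations an investigator would drill into first. A real tool would weight these counts statistically rather than rely on raw frequency.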

This is where visualisation technologies come into play. These allow the investigator to uncover previously hidden relationships in the data. More importantly, the visualisation techniques that are available today make reporting to a lay jury simpler.

In the visualisation network:

A dot represents a person and is also called a node.

A line connecting two dots represents an existing conversation and is also called an edge.
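The node and edge description can be sketched as a small data structure. The Python below builds an undirected network from a list of (person, person) conversation records. The record layout, the helper names `build_network` and `edges`, and the people listed are all assumptions made for illustration; this is not how GEOMI stores its data.

```python
from collections import defaultdict


def build_network(conversations):
    """Return {person: set of people they conversed with} (undirected)."""
    graph = defaultdict(set)
    for a, b in conversations:
        if a == b:
            continue  # ignore a record of someone talking to themselves
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def edges(graph):
    """List each edge once, as an alphabetically ordered pair."""
    return sorted({tuple(sorted((a, b))) for a, nbrs in graph.items() for b in nbrs})


# Hypothetical conversation records; a repeated pair still yields one edge.
conversations = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "bob"),
    ("carol", "dave"),
]
network = build_network(conversations)

print(sorted(network))  # the nodes (dots)
print(edges(network))   # the edges (lines)
```

Using a set per person means repeated conversations between the same two people collapse into a single line, which matches the "existing conversation" meaning of an edge given above.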

The GEOMI software developed at NICTA is an open source project that consists of a set of Java scripts developed at the Systems Biology Initiative, University of New South Wales (Ho et al. (2008) J Proteome Res., 7:104-12).

The benefit of this type of visualisation comes from the simplification of complex datasets (such as social networks, chats and logs) into an easily comprehensible 3-D map that a user can rotate, zoom and otherwise interact with.

My team has used this type of program in modelling chat logs. In the image above (the names have been altered to remove the details related to a case), the social networks are displayed with the tightly connected groups packed together and the "outsiders" to the conversations displayed further apart. This program has allowed for the display of social relationships between chat users. Additionally, it has been used to model changes to logs and to detect tampering with evidence.
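One simple way to put a number on "tightly connected" versus "outsider" is degree centrality: the share of other people in the network that a person is directly linked to. The sketch below is a hedged illustration, not the calculation used in the case. It assumes the investigator already knows from other evidence that certain people should be central, and flags any of them who sit on the fringe. The names, the `fringe_suspects` helper and the 0.5 threshold are all hypothetical.

```python
def degree_centrality(graph):
    """Fraction of the other nodes each person is directly connected to."""
    n = len(graph)
    if n < 2:
        return {person: 0.0 for person in graph}
    return {person: len(nbrs) / (n - 1) for person, nbrs in graph.items()}


def fringe_suspects(graph, expected_central, threshold=0.5):
    """Return expected-central people whose centrality is below threshold."""
    centrality = degree_centrality(graph)
    return sorted(p for p in expected_central if centrality.get(p, 0.0) < threshold)


# Hypothetical network: four closely linked people and one loosely attached.
graph = {
    "anna": {"ben", "cara", "dev"},
    "ben": {"anna", "cara", "dev"},
    "cara": {"anna", "ben", "dev"},
    "dev": {"anna", "ben", "cara", "eli"},
    "eli": {"dev"},
}

# Assumption: other evidence says both "anna" and "eli" were central figures.
flagged = fringe_suspects(graph, expected_central={"anna", "eli"})
print(flagged)
```

A flagged name is not proof of anything on its own. It marks a mismatch between the logs and what is otherwise known, which is a reason to look more closely at whether the logs are complete.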

The GEOMI program is developed by the Systems Biology Initiative, UNSW (Prof. Marc Wilkins, Director; Simone Li; and Edwin Ho). With their help, Ignatius and I shall be publishing a paper on the use of this and other visualisation programs in the coming months.

Craig Wright, GCFA Gold #0265, is an author, auditor and forensic analyst. He has nearly 30 GIAC certifications, several post-graduate degrees and is one of a very small number of people who have successfully completed the GSE exam.


Posted October 28, 2008 at 10:29 AM

Paul Bobby

How did the visualization of those chat logs help the investigation?
I have used Network Security Monitor (Intellitactics) since v3.3 and became frustrated with the visual analysis aspect of data once the depiction became too busy. Other than knowing that "something big" was going on, there was no other intelligence that could be gleaned from this type of "web" once it got busy.

Posted October 29, 2008 at 8:19 PM


The visualization supported the underlying calculations. The network in question involved knowledge that certain parties were connected. The low link rates (with these parties on the fringes) did not support the known facts.
Seeing a person on the fringe who is known to be a central figure gave us a simple means to report that the logs had been tampered with. There are other calculations to support this, and in fact we also used a classification scheme based on RF (random forests), but these are more difficult to explain to a jurist.
The visualization is in effect a means to support other work.