SANS Digital Forensics and Incident Response Blog

Simple Anti-Forensic and Signature Stamping Techniques Using Unicode

by Craig Wright

The introduction of Unicode characters (such as Persian, Cyrillic and Arabic characters) has brought with it both a simple means of fingerprinting intellectual property (signature stamping) and a very simple steganographic data-hiding technique.

The following is an extract from the Cyrillic Unicode character set [1].

Unicode #   Character   Name
0410        А           CYRILLIC CAPITAL LETTER A
0430        а           CYRILLIC SMALL LETTER A
0412        В           CYRILLIC CAPITAL LETTER VE
0415        Е           CYRILLIC CAPITAL LETTER IE
0435        е           CYRILLIC SMALL LETTER IE
041C        М           CYRILLIC CAPITAL LETTER EM
041E        О           CYRILLIC CAPITAL LETTER O
043E        о           CYRILLIC SMALL LETTER O
0420        Р           CYRILLIC CAPITAL LETTER ER
0440        р           CYRILLIC SMALL LETTER ER
0422        Т           CYRILLIC CAPITAL LETTER TE
0443        у           CYRILLIC SMALL LETTER U
0405        Ѕ           CYRILLIC CAPITAL LETTER DZE (the Old Cyrillic zelo; used in Macedonian)
0455        ѕ           CYRILLIC SMALL LETTER DZE
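To confirm that these characters really are distinct from their Latin twins, the table can be checked with Python's standard `unicodedata` module. This is a minimal sketch: the Latin letter paired with each Cyrillic code point is an assumption for illustration, and how alike each pair looks depends on the font.

```python
import unicodedata

# Cyrillic code points from the table above, each paired with the basic Latin
# letter it usually resembles. The pairings are an illustrative assumption:
# how close the glyphs look depends on the font.
CYRILLIC_LOOKALIKES = {
    0x0410: "A", 0x0430: "a", 0x0412: "B", 0x0415: "E", 0x0435: "e",
    0x041C: "M", 0x041E: "O", 0x043E: "o", 0x0420: "P", 0x0440: "p",
    0x0422: "T", 0x0443: "y", 0x0405: "S", 0x0455: "s",
}

def describe(cp: int, latin: str) -> str:
    """One line comparing a Cyrillic character with the Latin letter it imitates."""
    cyr = chr(cp)
    return (f"U+{cp:04X} {cyr} {unicodedata.name(cyr):<30} "
            f"vs U+{ord(latin):04X} {latin} {unicodedata.name(latin)}")

for cp, latin in CYRILLIC_LOOKALIKES.items():
    print(describe(cp, latin))
    # Same shape on screen, different code point in memory.
    assert chr(cp) != latin
```

Every line printed pairs two characters that a reader would take to be the same letter, yet a byte-level comparison treats them as unrelated.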

The basic Latin character table contains characters that look identical to these. The difference is that the underlying character (its code point) is not the same, even though what is displayed looks the same. An attacker can use this to carry out a phishing attack with a lookalike domain name, now that domain names containing Unicode characters can be registered. For instance, the following domains are distinctly different, but appear the same:

Microsoft.com

\x004D\x0069\x0063\x0072\x006F\x0073\x006F\x0066\x0074\x002E\x0063\x006F\x006D

and

?i?r???ft.com

\x041C\x0069\x0441\x0072\x043E\x0455\x043E\x0066\x0074\x002E\x0063\x006F\x006D
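The two domain names can be rebuilt from their code points to confirm that they compare as different strings. This sketch assumes the lookalike sequence given in the character-by-character table in this post (Cyrillic М, с, о, ѕ, о mixed with Latin letters); the helper names are hypothetical.

```python
# Code points for the genuine domain and for the lookalike, taken from the
# character-by-character breakdown in this post.
GENUINE = [0x004D, 0x0069, 0x0063, 0x0072, 0x006F, 0x0073, 0x006F,
           0x0066, 0x0074, 0x002E, 0x0063, 0x006F, 0x006D]
SPOOF = [0x041C, 0x0069, 0x0441, 0x0072, 0x043E, 0x0455, 0x043E,
         0x0066, 0x0074, 0x002E, 0x0063, 0x006F, 0x006D]

def from_code_points(cps):
    """Build a string from a list of integer code points."""
    return "".join(chr(cp) for cp in cps)

def differing_positions(a: str, b: str):
    """Return (index, char_a, char_b) for every position where the strings differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

genuine = from_code_points(GENUINE)
spoof = from_code_points(SPOOF)

print(genuine == spoof)  # → False
for i, x, y in differing_positions(genuine, spoof):
    print(f"position {i}: U+{ord(x):04X} vs U+{ord(y):04X}")
```

Only five of the thirteen positions differ, which is why a visual check of the address bar is not enough.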

Unicode mixed characters             Latin characters
041C М CYRILLIC CAPITAL LETTER EM    004D M LATIN CAPITAL LETTER M
0069 i LATIN SMALL LETTER I          0069 i LATIN SMALL LETTER I
0441 с CYRILLIC SMALL LETTER ES      0063 c LATIN SMALL LETTER C
0072 r LATIN SMALL LETTER R          0072 r LATIN SMALL LETTER R
043E о CYRILLIC SMALL LETTER O       006F o LATIN SMALL LETTER O
0455 ѕ CYRILLIC SMALL LETTER DZE     0073 s LATIN SMALL LETTER S
043E о CYRILLIC SMALL LETTER O       006F o LATIN SMALL LETTER O
0066 f LATIN SMALL LETTER F          0066 f LATIN SMALL LETTER F
0074 t LATIN SMALL LETTER T          0074 t LATIN SMALL LETTER T
002E . FULL STOP                     002E . FULL STOP
0063 c LATIN SMALL LETTER C          0063 c LATIN SMALL LETTER C
006F o LATIN SMALL LETTER O          006F o LATIN SMALL LETTER O
006D m LATIN SMALL LETTER M          006D m LATIN SMALL LETTER M
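One rough defensive check is to flag domain labels that mix letters from more than one script, and to look at the Punycode (`xn--`) form that is actually sent to DNS. The sketch below is a heuristic under stated assumptions, not a complete defence: it guesses each letter's script from the first word of its Unicode name, and it relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules.

```python
import unicodedata

def scripts_in(label: str) -> set:
    """Rough script guess for each letter: the first word of its Unicode name
    (e.g. LATIN, CYRILLIC, GREEK). A heuristic, not the Unicode Script
    property, which the Python standard library does not expose."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label if ch.isalpha()}

def is_mixed_script(domain: str) -> bool:
    """True if any dot-separated label mixes letters from more than one script."""
    return any(len(scripts_in(label)) > 1 for label in domain.split("."))

genuine = "Microsoft.com"
spoof = "\u041Ci\u0441r\u043E\u0455\u043Eft.com"  # renders like Microsoft.com

for name in (genuine, spoof):
    # The built-in "idna" codec shows the ASCII form a resolver would receive.
    print(name, is_mixed_script(name), name.encode("idna"))
```

A label written entirely in Cyrillic lookalikes would pass this mixed-script test, so a fuller check would also compare names against a confusables list such as the one published with Unicode Technical Standard #39.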

At the same time, there are positive uses for this type of technique. Word documents can be embedded with seemingly harmless information: for example, a few Latin letters in a phrase can be replaced with their Cyrillic lookalikes, producing a string that reads normally but is unique to that document. If the document is ever published on the web, that string can be searched for using an engine such as Google. It can also be added as a keyword for a standard forensic string search. Find the string and you have your document.
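A minimal sketch of how a document could be signature stamped this way. The scheme is hypothetical and not a description of any particular tool: each eligible letter carries one bit, a Latin letter meaning 0 and its Cyrillic lookalike meaning 1, so a 16-bit document ID needs 16 eligible letters. It assumes the original text is plain English containing no Cyrillic characters.

```python
# Illustrative subset of Latin -> Cyrillic lookalikes (an assumption; check
# how each pair renders in the fonts your documents actually use).
LATIN_TO_CYR = {"a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043E",
                "p": "\u0440", "x": "\u0445", "y": "\u0443"}
CYR_TO_LATIN = {v: k for k, v in LATIN_TO_CYR.items()}

def stamp(text: str, mark: int, bits: int = 16) -> str:
    """Hide `mark` in `text`. Each eligible letter carries one bit, most
    significant first: Latin letter = 0, Cyrillic lookalike = 1."""
    out, i = [], 0
    for ch in text:
        if i < bits and ch in LATIN_TO_CYR:
            bit = (mark >> (bits - 1 - i)) & 1
            out.append(LATIN_TO_CYR[ch] if bit else ch)
            i += 1
        else:
            out.append(ch)
    if i < bits:
        raise ValueError("not enough eligible letters to hold the mark")
    return "".join(out)

def read_stamp(text: str, bits: int = 16) -> int:
    """Recover the mark from the first `bits` eligible letters (either form)."""
    value, i = 0, 0
    for ch in text:
        if i == bits:
            break
        if ch in LATIN_TO_CYR or ch in CYR_TO_LATIN:
            value = (value << 1) | int(ch in CYR_TO_LATIN)
            i += 1
    return value

original = "Please keep a copy of the proposal for your records."
marked = stamp(original, 0xBEEF)
print(marked == original)       # → False, though both render alike
print(hex(read_stamp(marked)))  # → 0xbeef
```

The stamped phrase is then a unique sequence that a forensic string search can look for directly, for example as its UTF-8 or UTF-16LE bytes, depending on how the file stores text.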

Think of file names as well. Windows allows file names to be created using Unicode characters. Hence, if you are looking for a file called "cat.txt", a simple string search will miss a "саt.txt" defined using the following Unicode: \x0441\x0430\x0074\x002E\x0074\x0078\x0074. I have linked a site that does online Unicode conversions and display.
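A plain string search compares code points, so it will not match a Cyrillic "саt.txt" when searching for "cat.txt". One way around this is to fold known lookalikes back to Latin before comparing, a simplified form of the "skeleton" idea in Unicode Technical Standard #39. The mapping below is a small illustrative subset, assumed rather than complete, and the function names are hypothetical.

```python
import os

# Illustrative subset of Cyrillic -> Latin lookalikes (assumed, not exhaustive;
# the Unicode confusables data is far larger).
CYR_TO_LATIN = str.maketrans({
    "\u0430": "a", "\u0441": "c", "\u0435": "e", "\u043E": "o", "\u0440": "p",
    "\u0445": "x", "\u0443": "y", "\u0455": "s", "\u0456": "i",
    "\u0410": "A", "\u0412": "B", "\u0415": "E", "\u041C": "M", "\u041E": "O",
    "\u0420": "P", "\u0422": "T", "\u0405": "S", "\u0421": "C",
})

def skeleton(name: str) -> str:
    """Fold known lookalikes to Latin and case-fold, so names that render
    alike compare equal."""
    return name.translate(CYR_TO_LATIN).casefold()

def lookalike_matches(names, target: str):
    """Return names that look like `target` but are not identical to it."""
    key = skeleton(target)
    return [n for n in names if skeleton(n) == key and n != target]

def scan(root: str, target: str):
    """Walk a directory tree and yield paths of files impersonating `target`."""
    for dirpath, _dirs, files in os.walk(root):
        for n in lookalike_matches(files, target):
            yield os.path.join(dirpath, n)

names = ["cat.txt", "\u0441\u0430t.txt", "dog.txt"]
print([ascii(n) for n in lookalike_matches(names, "cat.txt")])
```

Case folding is included because Windows file name lookups are normally case-insensitive; drop `.casefold()` if that does not suit the investigation.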

An issue with trying to uncover all versions and possible combinations is that the number of lookalike spellings grows exponentially with the length of the string, so enumerating every one of them quickly becomes infeasible. There are more ways to hide data than there are simple string searches to find it. This means that we as forensic professionals need to use our greatest tool: our brain. Things are not always as they seem.
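The size of that search space is easy to estimate: if each character has some number of lookalike alternatives, the number of spellings that render the same is the product of (1 + alternatives) over all characters. The sketch below assumes one alternative per eligible letter, an illustrative simplification since real confusable tables list several for many letters, so every eligible letter doubles the count.

```python
from math import prod

# Number of alternative code points that render like each Latin letter.
# Illustrative assumption: one Cyrillic twin per listed letter
# (U+0456 for "i"); real confusable tables list more.
ALTERNATIVES = {"a": 1, "c": 1, "e": 1, "i": 1, "o": 1,
                "p": 1, "s": 1, "x": 1, "y": 1}

def variant_count(s: str) -> int:
    """Count spellings of `s` that look the same (including `s` itself):
    the product of (1 + alternatives) over every character."""
    return prod(1 + ALTERNATIVES.get(ch, 0) for ch in s.lower())

for word in ("cat.txt", "microsoft.com", "confidential project proposal"):
    print(f"{word!r}: {variant_count(word):,} lookalike spellings")
```

Under this assumption the count is 2 to the power of the number of eligible letters, so the 29-character phrase already has 65,536 variants, and allowing more than one alternative per letter makes the total grow faster still.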

[1] Unicode Character Table: Cyrillic. http://jrgraphix.net/research/unicode_blocks.php?block=8

Craig Wright is a Director with Information Defense in Australia. He holds both the GSE-Malware and GSE-Compliance certifications from GIAC. He is a perpetual student with numerous postgraduate degrees, including an LLM specializing in international commercial law and e-commerce law, and is working on his fourth IT-focused master's degree (Masters in System Development) at Charles Sturt University, where he is helping to launch a master's degree in digital forensics. He starts his second doctorate, a PhD on the quantification of information system risk at CSU, in April this year.

1 Comment

Posted June 13, 2012 at 4:03 PM

isomorphismes

I've also seen spammers steal content (text) from websites and then change m to