Hostile code analysis with JavaScript

Cd-MaN

Sep 21, 2005

An article on how to uncover the true intents of a JS script.

Download source files - 3.77 Kb

Introduction

Malicious code writers have many attack vectors. Here, I will introduce a JavaScript class that dissects encoded JavaScript. Using a real-life example, I will show how a script tries to hide its actions with some very common techniques, and how to bypass them to uncover the true intent of the code.

Background

JavaScript is a very flexible object-oriented scripting language, best known for running inside browsers and manipulating web pages on the client side. For more information, see the Wikipedia entries for JavaScript [1] and prototype-based OO programming languages [2]. Some authors even think that JavaScript will be the scripting language of the future [3]. This article assumes that the reader knows the basic constructs of JavaScript.

Most code-hiding techniques consist of two parts: an encrypted string and a decryptor, which unmangles the string and finally evaluates the resulting piece of code. This works because JavaScript (like most scripting languages) offers functions that take a string and evaluate it as code. The process is often repeated several times, so the "decrypted" string may itself contain another string to be decrypted. The main goal of this article is to show how to place hooks on these commonly used functions so that their input is redirected to a log window, where it can be conveniently inspected, instead of being executed.
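
To make the layering concrete, here is a small, harmless sketch. The payload and the wrap and peelOnce helpers are hypothetical and were made up for this illustration; real samples use their own decryptors. Note that this sketch peels the layers statically with a pattern match, which is a different technique from the hooking described in the rest of the article, and it only works for this trivial eval(unescape('...')) shape.

```javascript
// Hypothetical helper: wrap code in one eval(unescape('...')) layer.
function wrap(code) {
  return "eval(unescape('" + escape(code) + "'));";
}

// A harmless two-layer payload; it is never executed in this sketch.
const innermost = "var greeting = 'hello';";
const layer1 = wrap(innermost);
const layer0 = wrap(layer1);

// Hypothetical helper: recover the string one layer down without running it.
// Returns null when the code does not have the expected shape.
function peelOnce(code) {
  const match = /^eval\(unescape\('([^']*)'\)\);$/.exec(code);
  return match ? unescape(match[1]) : null;
}

console.log(peelOnce(layer0) === layer1);    // outer layer hides another decryptor call
console.log(peelOnce(layer1) === innermost); // second layer hides the real code
```

Each peel yields another piece of code that must be inspected in turn, which is exactly why a log of every evaluated string is useful.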

The hooked functions and the general idea

The functions most frequently used in these routines are document.write, document.writeln and eval (or its old, deprecated counterpart, Object.prototype.eval). Below you can see a fragment of such code:

<script language=javascript>
    document.write(unescape('%3C%73%63%72%69%70%74%2...
    dF('%286FVFULSW%2853odqjxdjh%28...
</script>

It is clear that the first line must somehow define the function dF, which is most probably the decryptor. Our goal is to hook document.write so that, instead of being executed, its output is redirected to a log window where we can analyze it. (A quick alternative would be to replace document.write with alert and observe the output. However, this has two drawbacks: to recreate the code one must type it back, since you can't copy and paste from the alert box, and the alert box limits the number of characters that can be displayed, which proved to be insufficient in this case.) Fortunately, hooking is very easy to do. One can simply write:

function someFunction() {
    //...
}
document.write = someFunction;

and all calls to document.write are now redirected to someFunction. Next we need a separate window where the output will be dumped. It can be opened with window.open, but it will most probably be blocked by popup blockers: to record every call, the window must appear at startup, without user intervention, because most decryption calls are made while the page is loading (their purpose is to present a decrypted version to the browser and the user). So we should provide an alternative method for opening the window, and remember what we would like to display until the window is open and we can dump the text there. We would also like to pollute the namespace as little as possible (namespace pollution means defining global functions or variables which may conflict with existing ones). To avoid this, we always declare local variables in functions and wrap the entire code in a class, whose name can be changed easily with any editor that offers search and replace.
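
The buffering idea can be sketched as follows. This is not the code from jsdebug.js; the names Interceptor, hook and attachSink are hypothetical, and fakeDocument is a stand-in for the browser's document object so that the sketch also runs outside a browser.

```javascript
// Everything lives inside one object, so only a single global name is used.
const Interceptor = (function () {
  const pending = []; // entries recorded before the log window exists
  let sink = null;    // function that displays one entry, set once the window is open

  function log(entry) {
    if (sink) {
      sink(entry);
    } else {
      pending.push(entry);
    }
  }

  return {
    // Replace target[name] with a function that only records its arguments.
    hook: function (target, name) {
      target[name] = function () {
        log({ name: name, args: Array.prototype.slice.call(arguments) });
      };
    },
    // Called when the log window becomes available: flush, then log directly.
    attachSink: function (display) {
      sink = display;
      while (pending.length > 0) {
        sink(pending.shift());
      }
    }
  };
})();

// Stand-in for `document` (assumption, for running outside a browser).
const fakeDocument = { write: function () { throw new Error("should not run"); } };
Interceptor.hook(fakeDocument, "write");

fakeDocument.write("<b>early</b>"); // buffered: no sink yet
const shown = [];
Interceptor.attachSink(function (entry) { shown.push(entry.args[0]); });
fakeDocument.write("<i>late</i>");  // delivered immediately
console.log(shown);
```

Calls made during page load are kept in order and only displayed once a sink exists, so nothing is lost if the popup blocker delays the log window.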

Remark: One could use Venkman [5], the very powerful JavaScript debugger for Mozilla / Firefox. However, this wonderful system doesn't cope well with self-modifying code (after all, which normal programmer would write such code?!).

Implementation details

The code is contained entirely in the file "jsdebug.js". It has three main parts: the declaration of the JsInterceptor class (which can be renamed if needed), the initial call which initializes the system, and the function which substitutes the default eval function (this was necessary, since eval behaves like a standalone function).

General notes: The implementation was done on Firefox 1.0.6 (the latest stable release at the time of writing), and while I've tried to be cross-browser compatible, I've never tested it on other browsers. Also, if you are going to analyze hostile code, I recommend Firefox, since it is a very secure browser and its flaws are patched very rapidly (and through the automated notification system you get to know about it very fast).

During initialization of the system the following things are accomplished:

we back up the original functions (because we need to invoke them in our code!),
we try to open the debug window,
if that fails, we register an event handler (JsInterceptor.SetupWindowOpener) which is invoked when the document finishes loading and which creates an element that opens the debug window when clicked (this way we avoid the popup blockers, since they detect that the window was the result of a click and allow its creation),
we override the old functions, also faking their toString() function (you can read about the reasons for this later on in the "Detecting our implementation" part, but basically it is to prevent a simple detection method).
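
The first and last steps above (backing up, overriding and faking toString) can be sketched like this. The helper installHook is hypothetical and is not taken from jsdebug.js, stubDocument stands in for document, and the window-opening steps are left out because they need a browser. The exact text a built-in prints differs between browsers, so the string used here is an assumption.

```javascript
// Hypothetical helper: hook target[name] and return the original function.
function installHook(target, name, replacement) {
  const original = target[name]; // back up the original

  // Make the replacement print like a built-in (browser-specific text, assumed).
  const nativeText = "function " + name + "() {\n    [native code]\n}";
  replacement.toString = function () { return nativeText; };

  target[name] = replacement;    // override
  return original;
}

const calls = [];
// Stand-in for `document` (assumption, for running outside a browser).
const stubDocument = {
  write: function write(text) { return "really written: " + text; }
};

const originalWrite = installHook(stubDocument, "write", function (text) {
  calls.push(text); // log instead of executing
});

stubDocument.write("<p>intercepted</p>");
console.log(calls);
console.log(String(stubDocument.write));
console.log(originalWrite.call(stubDocument, "x")); // the backup still works
```

Keeping the returned original around is what lets the log window later simulate the real call on request.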

Now, I will provide a short description of every method and any important implementation quirk it might contain:

CopyToClipboard - copies a given text to the clipboard. The script was taken from experts-exchange.com [13].
InterceptorWriteLog - this is the function which logs a given event. There are two possibilities: if the log window is open, the event is dumped there right away; otherwise it is stored in a temporary array for later display. It takes three parameters: the event description (a string), the event parameters (an array) and the function which simulates the execution of the event. If the window is open, it builds the HTML code to display the log and puts it in the log window. The entry consists of all the received data plus a link to simulate the execution of the function and a link to copy the parameters to the clipboard.
The custom functions which are called instead of the built-in ones: NewDocumentWrite, NewDocumentWriteLn and NewEval. One notable thing is that the first two don't simulate the behavior of their native counterparts with 100% accuracy. This is because a fully accurate simulation could result in overwriting the document, which we want to avoid. Instead, they simply add a div element at the end of the document, containing the text that was to be written.
AddEvent - a function taken from onlinetools.org [14] used to associate functions with events on elements.
SetupDebugWindow - invoked after the creation of the debug window to initialize its contents.
WindowOpener - invoked when the user clicks on the link to open the log window (in case the opening of the log window was blocked during the loading of the page).
SetupWindowOpener - this creates the element which hosts the link to the above function.
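
As a rough illustration of the three-parameter logging call described for InterceptorWriteLog, here is a sketch of how one log entry might be built. The names escapeHtml and makeLogEntry and the exact HTML are hypothetical; the real markup in jsdebug.js is not reproduced here. Escaping the logged text before it goes into the log window is an assumption of this sketch, made so that the hostile code is displayed rather than interpreted.

```javascript
// Hypothetical helper: make logged text safe to place inside HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical shape of one log entry: what happened, with which arguments,
// and a function that performs the real call if the analyst asks for it.
function makeLogEntry(description, params, simulate) {
  const shownParams = params.map(function (p) { return escapeHtml(p); });
  return {
    html: "<div><b>" + escapeHtml(description) + "</b><pre>" +
          shownParams.join("\n") + "</pre></div>",
    run: function () { return simulate.apply(null, params); }
  };
}

const entry = makeLogEntry(
  "document.write",
  ["<script>x()</script>"],
  function (text) { return "would write " + text.length + " characters"; }
);
console.log(entry.html);  // the script tag appears as text, not as markup
console.log(entry.run()); // only runs when explicitly requested
```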

One useful trick in the code is the use of escape / unescape when constructing functions from strings. Since these functions themselves need to contain strings (delimited by single or double quotation marks), the quotation marks inside the data had to be eliminated, as did the newline characters. Instead of writing a function with several replace calls one after the other, I found it easier to wrap / unwrap the strings with the above-mentioned functions.
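
A minimal sketch of that trick, assuming the function is built with the Function constructor (the article does not say which mechanism jsdebug.js uses); makeConstantFunction is a hypothetical name:

```javascript
// Hypothetical helper: build a function from source text that embeds `text`.
function makeConstantFunction(text) {
  // escape() output contains no quotes, backslashes or line breaks,
  // so it can sit inside a single-quoted literal without further replacing.
  const source = "return unescape('" + escape(text) + "');";
  return new Function(source);
}

const awkward = "it's a \"quoted\" string\nwith a second line and a \\ backslash";
const getText = makeConstantFunction(awkward);
console.log(getText() === awkward); // the text survives the round trip unchanged
```

The generated source never contains the problematic characters themselves, only their %XX forms, so no chain of replace calls is needed.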

A sample work session with the code

For obvious reasons I'm not redistributing the hostile code. However, I will describe the session during which it was analyzed, to illustrate the use of the provided class.

The first step is to inspect the code very carefully. Recommended tools: an editor which offers syntax highlighting (I personally use jEdit [6]) and Tidy [7]. First I pass the HTML code through Tidy to make it more readable, then open it with jEdit and look through it very carefully (preferably more than once). This is a very important step, since after this you will open the script in a live browser, and you can never be sure which exploitable flaws your browser contains (IE in particular is currently rated "Highly critical" in the Secunia database [8], while Firefox is rated "less critical" [9], but there are claims that there are some undisclosed exploits for it too [10]).

In this particular case we see the following code:

<script language=javascript>
     document.write(unescape('%3C%73%63%72%69%70%74%2...

After looking closer, we see that there are actually two lines that are written as one, most probably to further obfuscate the code. After indenting properly we end up with:

<script language=javascript>
    document.write(unescape('%3C%73%63%72%69%70%74%2...
    dF('%286FVFULSW%2853odqjxdjh%28...
</script>

After carefully looking at it we decide that the first line probably defines the "dF" function, and the second line calls it, most probably with the encrypted body of a function. We decide that now it's safe to put it in a browser, so we edit the HTML page to include the following line in the head:

<script type="text/javascript" language="javascript" src="jsdebug.js"></script>

We then fire it up in the browser. We get back the result of the first (and only) document.write call, and an error regarding the dF function (it is never defined, since we redirected document.write):

function dF(s) {
  var s1=unescape(s.substr(0,s.length-1));
  var t='';
  for(i=0;i<s1.length;i++)
      t+=String.fromCharCode(s1.charCodeAt(i)-s.substr(s.length-1,1));
  document.write(unescape(t));
}

(The original code was again on a single line; the indentation was added by me.) This is a simple routine, and its only interaction with the "environment" is through the document.write call, which we have hooked. We copy it into the original source code (before the call to dF is made) and refresh the page. Now we get the result of the call to dF (since it uses document.write to display it):

<SCRIPT language="JScript.Encode">#@~^fAAAAA==@#@&NG1Es+xDRS...

Wow, that looks strange. A little background info: in the year 2003 Microsoft created a little tool called Script Encoder [11]. It provides very weak encryption, which can be broken very quickly and only protects against a casual look (it is also not compatible with any standard or with other browsers!). I searched Google for "JScript.Encode decode" and used the following website: "Decode web pages containing 'jscript.encode' sections" (I'm not affiliated with this site). Through it, the final result appeared as:

document.write(
  '<OBJECT classid=XXXX-XXXX-XXX codebase=XXXXXX.cab></OBJECT>');

Searching Google for XXXXXX.cab resulted in the URL of the file. Downloading and unpacking it gave an install script (.inf) and a DLL. After submitting the DLL to VirusTotal [12] we concluded that this threat is detected by many antivirus engines, and now (at least in this case) we can rest.
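
As an aside on the dF routine recovered earlier in this session: it is a plain shift cipher wrapped in two rounds of unescape, with the shift taken from the last character of the argument. The sketch below repeats the same arithmetic but returns the text instead of writing it, so it can be tried outside a browser; decodeDF is a hypothetical name. The real argument is truncated in this article, so its trailing key character is not visible. The key 3 used here is an inference: it is the shift for which the visible prefix decodes to text matching the start of the output shown above.

```javascript
// Same arithmetic as dF, but returning the result instead of calling document.write.
function decodeDF(s) {
  const shift = Number(s.charAt(s.length - 1)); // key: the trailing digit
  const body = unescape(s.slice(0, -1));        // first unescape round
  let shifted = "";
  for (let i = 0; i < body.length; i++) {
    shifted += String.fromCharCode(body.charCodeAt(i) - shift);
  }
  return unescape(shifted);                     // second unescape round
}

// Visible prefix of the argument from the article, plus the inferred key "3".
const visiblePrefix = "%286FVFULSW%2853odqjxdjh";
console.log(decodeDF(visiblePrefix + "3")); // prints "<SCRIPT language"
```

Working through the first characters by hand: %28 unescapes to "(", which shifted down by 3 becomes "%"; "6F" becomes "3C"; and "%3C" finally unescapes to "<".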

Detecting our implementation

If this technique becomes widely used, future malware authors will try to detect its presence, much the same way as some current compiled malware tries to detect the presence of debuggers. The best defense against that is, of course, a very careful inspection of the code before you run it in the browser, and modifying it so that the detection code gets skipped. I will present four methods that can be used to detect the presence of this system and how they can be defeated. I welcome any other suggestions on how this system can be detected and how a particular detection method can be defeated, but I would also like to stress again that the best defense is a deep inspection of the code before running it:

Detecting the existence of the JsInterceptor variable with a construct like "if (JsInterceptor != null)". Very basic, very easy to spot. Possible countermeasure: using an editor with search and replace to give the variable a new name.
Calling toString on the target methods (example: document.write.toString()). Under normal circumstances this would return something like "function write() { [native code] }", but in our case it would return the source code of the implementation. We can get around this by swapping the toString function of the write object (yes, it is an object of type Function!), but then the attacker can call the toString of the toString function (like this: document.write.toString.toString()), and so on. With prior inspection this can be spotted and removed.
Depending on some implementation-specific detail of the system, like the order of evaluation. I don't know of any, but they may exist. Careful inspection of the code is recommended.
Overwriting the whole document. One could use the sequence document.open(), document.write(...), document.close() to effectively eliminate the debugger JavaScript code. Again, a deep inspection can reveal this.
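
The toString checks from the second item can be sketched as follows. All names are hypothetical, and the pattern for how a built-in prints is an assumption, since the exact text is engine-specific. The third check, which borrows Function.prototype.toString directly, is not mentioned in the list above; it is included as one more variant of the same idea.

```javascript
// Pattern for how built-ins usually print; the exact text is engine-specific (assumption).
const NATIVE = /^function\s+[\w$]*\s*\(\)\s*\{\s*\[native code\]\s*\}$/;

// Hypothetical checks a hostile script might run against a function.
function naiveCheck(fn) { return NATIVE.test(fn.toString()); }
function chainedCheck(fn) { return NATIVE.test(fn.toString.toString()); }
function borrowedCheck(fn) { return NATIVE.test(Function.prototype.toString.call(fn)); }

// A hook with a faked toString, as described in the second item.
function hookedWrite(text) { /* log instead of writing */ }
const fakedText = "function write() {\n    [native code]\n}";
hookedWrite.toString = function () { return fakedText; };

console.log(naiveCheck(hookedWrite));    // true: fooled by the fake
console.log(chainedCheck(hookedWrite));  // false: sees the source of the fake toString
console.log(borrowedCheck(hookedWrite)); // false: bypasses the fake entirely
```

Since each fake can be unmasked one level further, faking alone cannot win this race, which is why inspecting the hostile code for such checks before running it remains the more reliable defense.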

Conclusion

Today's scripting malware is going through the same evolutionary path as binary viruses: from proof of concept through polymorphism and ending up with metamorphism (no examples yet, but they will appear). To counter this, it is necessary to study the techniques which provide the fastest and most convenient analysis of the code, so that we can react quickly to new threats.

References

[1] JavaScript – Wikipedia
[2] Prototype based OO languages
[3] Scripting languages: Into the future
[4] David Flanagan: JavaScript Pocket Reference, 2nd Edition - O'Reilly 2002.
[5] Venkman homepage
[6] jEdit – Programmer’s text editor
[7] HTML Tidy
[8] Secunia (Microsoft Internet Explorer 6.x)
[9] Secunia (Mozilla Firefox 1.x)
[10] Firefox Remote Compromise Technical Details
[11] Script Encoder
[12] VIRUSTOTAL
[13] Experts-Exchange
[14] Unobtrusive JavaScript

Full disclosure: I’m a junior virus analyst for SOFTWIN, the makers of the BitDefender antivirus product. However, nothing that I've written in this article should be interpreted as an official statement of the company; it is merely a personal opinion.