PreEmptive Solutions Magazine

Secure Your .NET Code

Are you aware that you might be shipping your source code with your .NET dll or exe? A new tool included in Microsoft's Visual Studio .NET 2003 can help you make sure that does not happen.

The .NET platform realizes Microsoft's vision for the next paradigm in Windows computing: multiple programming languages interacting harmoniously, sharing an enriched object-based framework, and executed by a Common Language Runtime (CLR). This architecture provides an unprecedented degree of power and flexibility. Unfortunately, that flexible design inherently produces a problem for those wishing to hide their program's intellectual property. Programs in the .NET Framework are easy to reverse engineer. This is not in any way a fault in the design of .NET; it is simply a reality of modern, intermediate-compiled languages (Java suffers from this problem too). Both Java and .NET use expressive file syntax for delivery of executable code: bytecode in the case of Java, MSIL (Microsoft Intermediate Language) for .NET. Being much higher-level than binary machine code, the intermediate files are laden with identifiers and algorithms that are immediately observable and ultimately understandable. After all, it is obviously difficult to make something easy to understand, flexible, and extensible while simultaneously hiding its crucial details.

So anyone with a copy of ILDASM (or, better yet, one of the commercial .NET decompilers) can look at your assemblies and reverse engineer your source code. Suddenly, your software licensing code, copy protection mechanisms, and proprietary business logic are much more available for all to see, whether it's legal or not. Anyone can peruse the details of your software for whatever reason. They can search for security flaws to exploit, steal unique ideas, crack programs, etc. This should be enough to give you pause.

All of that said, this exposure need not be considered a showstopper. Organizations concerned about putting their intellectual property on the .NET Platform need to understand that there is a solution to help thwart reverse engineering. Obfuscation is a technique that provides for seamless renaming of symbols in assemblies, as well as other tricks to foil decompilers. Properly applied, obfuscation can increase the protection against decompilation by many orders of magnitude, while leaving the application intact. Obfuscation is commonly used in Java environments and for years has helped companies feel safe about protecting their intellectual property when they release their Java-based products.

As you'd expect, Microsoft isn't passively watching as this issue develops. As of Visual Studio .NET 2003, they're including a "lite" version of PreEmptive Solutions' Dotfuscator, accessible from the Tools menu. Microsoft is known for treating developers like important customers (which they are), and they're not missing the boat on this either. They are providing a solution right out of the box. This article delves into the world of .NET obfuscation. Along the way, you will develop an understanding of how obfuscation is successfully applied.

Obfuscation is the technology of shrouding the facts. It's not encryption, but in the context of .NET (or Java) code, it might be better. Early in Java's life, several companies produced encrypting class loaders to fully encrypt Java classes. Decryption was done just in time, prior to execution. Although it made classes completely unreadable, this methodology suffered from a classic encryption flaw; it needed to keep the decryption key with the encrypted data. Therefore, an automated utility could be created to decrypt the code and put it out to disk. Once that happens, the fully unencrypted, unobfuscated code is in plain view.

As another illustration, you could compare encryption to locking a six-item meal into a lockbox. Only the intended diner (i.e., the Common Language Runtime) has the key, and we don't want anyone else to know what he or she is going to eat. Unfortunately, if someone can pick the lock (or find the key hidden on the bottom of the box), the food is in plain view. Obfuscation works more like putting the six-item meal into a blender and sending it to the diner in a baggie. Sure, everyone can see the food in transit, but besides a lucky pea or some beef-colored goop, they don't know what the original meal is. The diner still gets the intended delivery and the meal still provides the same nutritional value it did before (luckily, CLRs aren't picky about taste). The trick of an obfuscator is to confuse observers, while still giving CLRs the same delivery.

Without argument, obfuscation (or even encryption) is not 100% protection. Even compiled C++ can be disassembled. If a hacker is persistent enough, he or she can find the meaning of your code. Also, humans write and employ decompilers to automate decompilation algorithms that are too challenging for the mind to follow. It is safe to say that any obfuscator that confuses a decompiler will pose even more of a deterrent to a less-capable human attempting the same undertaking. The goal of obfuscation is to form a barrier that knocks out as many would-be reverse engineers as possible by creating confusion.

As confusion builds, the ability of the human mind to comprehend multifaceted intellectual concepts deteriorates. Note that this precept says nothing about altering the forward (executable) logic, only representing it incomprehensibly. When an obfuscator goes to work on readable program instructions, a possible side effect is that the output will not only confuse a human interpreter, it will stop a decompiler. While the forward logic has been preserved, the reverse semantics have been rendered nondeterministic. As a result, any attempt to reverse engineer the instructions into a "programming dialect" like C# or VB will likely fail because the translation is ambiguous. Deep obfuscation creates a myriad of decompilation possibilities, some of which might produce incorrect logic if recompiled. The decompiler, as a computing machine, has no way of knowing which of the possibilities could be recompiled with valid semantics.

The obvious concern getting the most buzz in .NET developer circles is the threat of intellectual property theft. We hear this discussed at conferences and see it as a forum topic in online newsgroups. The developer community is concerned for good reason. They intend to produce commercial Windows software with .NET and this is a very competitive industry. The barriers to entry are low. Anyone with skill, hardware, and some basic tools can begin to create programs that have the potential to enter the competitive arena. For reasons just explained, .NET introduces the possibility that competitors can inspect your code. Even if they don't copy it outright, they can certainly glean algorithms and constructs useful to their own endeavors, leaving you holding the bag.

A less obvious effect of MSIL readability is the exhibition of confidential constructs such as your software licensing, copy protection, or encryption code. The problem here is more subtle, but equally perilous. By exposing your security logic to the public, you are giving them a roadmap to cracking your algorithms.

The third issue is that of code bloat. .NET is fully object oriented. The world has come to a place that accepts this as the programming paradigm of choice; no argument there. One of the benefits of OOP is the ability to use class libraries to quickly bypass the development of tedious "plumbing" code. Instead, developers inherit from a coordinated set of classes that have been tested and offer a rich palette of functionality. In fact, this set might be richer than we need for a given application. Where does all that extra functionality go when you compile? It goes right into your application code. As post-compilation tools, obfuscators are in the perfect position to help us with this bloat. High-end obfuscators are available that remove unused code as a by-product of their multipass analysis. This expands the role of obfuscator to include that of code-size reducer.

The Basic Solution
Today, some commercial obfuscators employ a renaming technique that applies trivial identifiers. Typically, these can be as short as a single character. As the obfuscator processes the code, it selects the next available trivial identifier for substitution. This seemingly simple renaming scheme has a huge advantage over hashing or character-set offset: it cannot be reversed. While the program logic is preserved, the names become nonsense, which hampers human understanding to a large degree. Faced with identifiers like a, ct, and 2s(e4), it is a stretch to recover the concepts they once named, such as invoiceID, address.print(), userName, and deposit(amount). Nevertheless, the program logic can still be reverse engineered.
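
To make the "next available trivial identifier" idea concrete, here is a minimal sketch in Python. It is a toy model, not Dotfuscator's implementation: the naming order (a, b, ..., z, aa, ab, ...) and the helper names trivial_names and rename are assumptions chosen for illustration.

```python
from itertools import count, product
from string import ascii_lowercase


def trivial_names():
    """Yield a, b, ..., z, aa, ab, ... (a hypothetical naming order)."""
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


def rename(identifiers):
    """Map each distinct identifier to the next unused trivial name."""
    names = trivial_names()
    mapping = {}
    for ident in identifiers:
        if ident not in mapping:
            mapping[ident] = next(names)
    return mapping


first = rename(["invoiceID", "userName", "deposit", "invoiceID"])
second = rename(["customerTotal", "taxRate", "refund"])

print(first)   # {'invoiceID': 'a', 'userName': 'b', 'deposit': 'c'}
print(second)  # {'customerTotal': 'a', 'taxRate': 'b', 'refund': 'c'}
```

Two unrelated sets of names collapse to the same a, b, c, so the output alone carries no information from which the originals could be computed. A hash or character offset, by contrast, is a function of the original name and can be attacked as such.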

A deeper form of obfuscation uses Overload Induction, a patented algorithm devised by PreEmptive Solutions, Inc. (this scheme is included in the Visual Studio version). Trivial renaming is used; however, a crafty twist is added. Method identifiers are maximally overloaded after an exhaustive scope analysis. Instead of substituting one new name for each old name, Overload Induction will rename as many methods as possible to the same name. After this deep obfuscation, the logic, while not destroyed, is beyond comprehension. See for yourself. The simple example shown in Listings 1 and 2 gives you some idea of the power of the Overload Induction technique.
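
Independently of the article's listings, the core idea of reusing one name wherever the parameter lists differ can be modeled in a few lines of Python. This is a deliberately simplified sketch. It treats a single class in isolation and ignores the inheritance and scope analysis a real tool must perform, and the function overload_induce and the sample method list are hypothetical.

```python
from collections import defaultdict
from itertools import count, product
from string import ascii_lowercase


def trivial_names():
    """Yield a, b, ..., z, aa, ab, ... (a hypothetical naming order)."""
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


def overload_induce(methods):
    """Give each (name, parameter_types) pair the shortest name that is
    not yet taken by another method with the SAME parameter list."""
    name_source = trivial_names()
    pool = []                      # trivial names generated so far
    next_slot = defaultdict(int)   # parameter list -> names already used
    renamed = {}
    for original, params in methods:
        slot = next_slot[params]
        next_slot[params] += 1
        while len(pool) <= slot:
            pool.append(next(name_source))
        renamed[(original, params)] = pool[slot]
    return renamed


methods = [
    ("getBalance", ()),
    ("deposit", ("decimal",)),
    ("withdraw", ("decimal",)),
    ("setOwner", ("string",)),
    ("transfer", ("Account", "decimal")),
]
renamed = overload_induce(methods)
for (original, params), new in renamed.items():
    args = ", ".join(params)
    print(f"{original}({args}) -> {new}({args})")
```

In this toy run, four of the five methods end up named a; only withdraw needs b, because it shares a parameter list with deposit. A reader of the renamed code has to resolve every call by its argument types before even knowing which method is meant.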

One of the things you probably noticed about the example is that the obfuscated code is more compact. A positive side effect of renaming is size reduction. For example, if you have a name that is 20 characters long, renaming it to a() saves a lot of space (specifically 19 characters). This also saves space by conserving string heap entries. Renaming everything to "a" means that "a" is stored only once, and each method or field renamed to "a" can point to it. Overload Induction enhances this effect because the shortest identifiers are continually reused. Typically, an Overload Induced project will have up to 35% of the methods renamed to a().
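
The arithmetic is easy to check with a back-of-the-envelope model. The sketch below assumes a string heap that stores each distinct name once, as UTF-8 with a one-byte terminator. The function heap_bytes and the three sample names are made up for illustration, and real metadata layouts carry other overhead that is ignored here.

```python
def heap_bytes(names):
    """Bytes to store each distinct name once, with a 1-byte terminator."""
    return sum(len(name.encode("utf-8")) + 1 for name in set(names))


original = ["computeInvoiceTotals", "validateLicenseKey", "formatPostalAddress"]
renamed = ["a", "a", "a"]  # all three methods overloaded onto one name

# A 20-character name replaced by a 1-character name saves 19 characters.
saved_on_first = len(original[0]) - len(renamed[0])

before = heap_bytes(original)
after = heap_bytes(renamed)
print(saved_on_first, before, after, before - after)
```

Because the renamed methods all point at the single stored "a", the cost of the names stops growing with the number of methods, which is why heavy overloading compounds the saving.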

Obfuscators remove debug information and nonessential metadata from an MSIL file as they process it. Aside from enhancing protection and security, this also contributes to the size reduction of MSIL files.

It is important to understand that obfuscation is a process that is applied to compiled MSIL code, not source code. Your development environment and tools will not change to accommodate renaming. Source code is never altered, or even read, in any way. Obfuscated MSIL code is functionally equivalent to traditional MSIL code and will execute on the CLR with identical results. (The reverse, however, is not true. Even if it were possible to decompile strongly obfuscated MSIL, it would have significant semantic disparities when compared to the original source code.) Figure 1 shows the flow of the classic obfuscation process.

Solution Enhancements
One of the more advanced obfuscation techniques available today is Control-Flow obfuscation. This process synthesizes branching, conditional, and iterative constructs that produce valid forward logic, but yield nondeterministic semantic results when decompilation is attempted. All of the admonishments you ever heard about maintaining spaghetti code are working in your favor when you try to protect your intellectual property using Control-Flow obfuscation. Consider trying to understand the code in Listings 3 and 4 before and after Control-Flow obfuscation. It should be obvious that after Control-Flow obfuscation the reverse engineered code is very ugly at worst and incorrect (not recompilable) at best.
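
As a rough illustration of what "valid forward logic, ugly reverse logic" means, the sketch below rewrites a simple loop as a state-driven dispatch loop. This is control-flow flattening, a generic and well-known transformation used here as a stand-in. It is not claimed to be the transformation any particular product applies, and real tools work on MSIL rather than on source.

```python
def sum_to(n):
    """Straightforward version: add the integers 1..n."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_to_flattened(n):
    """Same result, but the loop structure is hidden behind a state variable."""
    state, total, i = 3, 0, 1
    while True:
        if state == 3:        # loop test
            state = 1 if i <= n else 4
        elif state == 1:      # loop body
            total += i
            state = 2
        elif state == 2:      # loop increment
            i += 1
            state = 3
        elif state == 4:      # exit
            return total


print(sum_to(10), sum_to_flattened(10))
```

Both functions return the same value for every input, yet the second no longer contains anything that looks like a for loop. A decompiler facing the equivalent at the instruction level has to guess which high-level construct the jumps once represented.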

Another technique, string encryption, applies a simple encryption algorithm to any strings in your application that you desire. As mentioned before, any encryption (or specifically decryption) done at runtime is inherently insecure. That is, a smart hacker can eventually break it, but for strings present in customer code, it is worthwhile. Let's face it; if hackers want to get into your code, they don't blindly start searching renamed types. They probably do a search for "Invalid License Key", which points right to the code where license handling is performed. Searching on strings is incredibly easy. String encryption raises the bar for the casual hacker and deters that many more nonserious hackers. The algorithm typically incurs a tiny performance penalty at runtime, so make sure the option is fully configurable.
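
A minimal sketch of the idea follows. It uses a single-byte XOR purely for illustration. The key value, the helper names, and the choice of XOR are all assumptions, and a scheme this simple is exactly the kind a determined attacker will break.

```python
KEY = 0x5A  # hypothetical key; it has to ship with the program


def encrypt(text):
    """Applied once, at obfuscation time."""
    return bytes(b ^ KEY for b in text.encode("utf-8"))


def decrypt(blob):
    """Applied at runtime, each time the string is needed."""
    return bytes(b ^ KEY for b in blob).decode("utf-8")


# What ends up in the shipped binary instead of the readable literal.
SHIPPED = encrypt("Invalid License Key")


def license_error():
    return decrypt(SHIPPED)


# A plain-text search of the shipped bytes no longer finds the message...
print(b"Invalid License Key" in SHIPPED)
# ...but the running program still produces it.
print(license_error())
```

The runtime call to decrypt is where the small performance penalty comes from, and the key sitting next to the data is why this only deters casual searching.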

An advanced feature called incremental obfuscation is of particular interest to enterprise development teams maintaining an integrated application environment. By generating name-mapping records during an obfuscation run, obfuscated API names can be reapplied and preserved in successive runs. A partial build can be done with the full expectation that its access points will be renamed the same as a prior build. As a result, the distributed patch files integrate into the previously deployed system without a hitch.
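
The mechanism can be sketched as a rename pass that accepts the previous run's name map as input. The function obfuscate, the identifier strings, and the use of JSON as the map format are assumptions made for this example, not a description of any product's map file.

```python
import json
from itertools import count, product
from string import ascii_lowercase


def trivial_names():
    """Yield a, b, ..., z, aa, ab, ... (a hypothetical naming order)."""
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


def obfuscate(identifiers, previous_map=None):
    """Reuse names from a prior run; hand out fresh ones only for new symbols."""
    mapping = dict(previous_map or {})
    taken = set(mapping.values())
    fresh = (name for name in trivial_names() if name not in taken)
    for ident in identifiers:
        if ident not in mapping:
            mapping[ident] = next(fresh)
    return mapping


# Release 1: obfuscate and persist the name map.
release1 = obfuscate(["Invoice.Print", "Invoice.Total"])
saved = json.dumps(release1)

# Patch build: symbols arrive in a different order and one is new.
patch_symbols = ["Invoice.Total", "Invoice.Export", "Invoice.Print"]
with_map = obfuscate(patch_symbols, json.loads(saved))
without_map = obfuscate(patch_symbols)

print(with_map)     # existing names preserved, new symbol gets 'c'
print(without_map)  # names reshuffled; the patch would not line up
```

With the map, Invoice.Total keeps the name b that deployed callers already use. Without it, the same symbol would come out as a and the patch would break those callers.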

Last, obfuscators can accomplish size reduction by analyzing your application and removing code your program is not using. It seems odd that unused-code removal can actually do anything. Who writes code they don't use? Well, the answer is all of us. What's more, we all use libraries and types written by other people that were written to be reusable. Reusable code implies there is contingent code that handles many cases; however, in any given application you typically only use one or two of those many cases. An advanced obfuscator can figure that out and rip out all the unused code (from compiled MSIL, not the source). The result is that the output contains precisely the types and methods your application needs, nothing more. Amazing space reduction can be achieved, conserving computing resources and reducing instantiation times. This can be especially important for .NET Compact Framework or remotely deployed applications.
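
At its heart this kind of pruning is a reachability walk over the call graph, starting from the application's entry points. The sketch below shows that walk on a made-up graph. The method names and the dictionary representation are hypothetical, and a real tool must also account for things this model ignores, such as methods invoked only through reflection.

```python
def reachable(call_graph, entry_points):
    """Return every method reachable from the entry points."""
    seen = set()
    stack = list(entry_points)
    while stack:
        method = stack.pop()
        if method in seen:
            continue
        seen.add(method)
        stack.extend(call_graph.get(method, ()))
    return seen


def prune(call_graph, entry_points):
    """Keep only the methods the application can actually reach."""
    keep = reachable(call_graph, entry_points)
    return {m: callees for m, callees in call_graph.items() if m in keep}


# A reusable parser library that handles CSV and XML; the app uses only CSV.
call_graph = {
    "App.Main": ["Parser.ParseCsv"],
    "Parser.ParseCsv": ["Parser.ReadLine"],
    "Parser.ParseXml": ["Parser.ReadLine", "Xml.Validate"],
    "Parser.ReadLine": [],
    "Xml.Validate": [],
}

pruned = prune(call_graph, ["App.Main"])
print(sorted(pruned))
```

In this example the XML branch of the library is never reached from App.Main, so Parser.ParseXml and Xml.Validate drop out while the shared Parser.ReadLine stays.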

Microsoft's .NET Framework provides one of the best software development platforms available today. Expect all Windows developers (and even some non-Windows developers) to eventually make the switch to .NET. Given this reality, the next step is to address any concerns you might have about protecting your code from reverse engineering. As shown above, this need not be considered a risk or a showstopper; a practical solution exists. To get started using an obfuscator, consider downloading a free copy of Dotfuscator Community Edition, or use it right from the Tools menu of Microsoft's Visual Studio .NET 2003 (see Figure 2). Should you want more powerful obfuscation and size reduction, you can upgrade to PreEmptive's Dotfuscator Professional Edition. You may never know what an obfuscator is worth until you go without one!

More Stories By Gabriel Torok

Gabriel Torok is a founding principal at PreEmptive Solutions, Inc. He is a book author and active national conference speaker. He is directly involved in most aspects of the business, with a primary focus on product development, and sales and marketing. In addition to company management, he remains active in teaching Java, .NET and related technologies.
