This post is about our contributions to the .NET open-source community to help create a new and more flexible .NET Intermediate Language (IL) verifier. It explains what IL is, why you might actually want to modify it, and which options exist for verifying such IL. The whole story started with our efforts to improve the quality and stability of our product.
Introduction: Dynatrace and IL
About a year ago, we investigated ways to harden our .NET Agent against bugs that originated from invalid IL. Our Agent is based on manipulating IL at runtime. We have a neat framework for parsing and manipulating IL, which makes changing IL quite easy. However, it was still possible to emit invalid IL, and we could only find these bugs through extensive testing. Sometimes the effects were so subtle that we couldn’t even discover them in tests.
For obvious reasons, we want to avoid such bugs at all costs, so we started looking for ways to verify the IL we generate. We found ILVerify, a cross-platform, open-source tool by Microsoft, which seemed to satisfy all our requirements. However, we quickly realized that ILVerify was still an early-stage prototype. At that point, we decided to take things into our own hands and start contributing to the open-source project. But before we dive into ILVerify, let me explain what this whole “IL” deal is about.
What is IL?
Languages such as C# or Java are not compiled directly into machine instructions, but into an intermediate code. For C#, this intermediate code is called Common Intermediate Language (CIL, or just IL). Most Portable Executable (PE) files compiled and assembled from C#, whether .dll or .exe, are simply composed of IL and its corresponding metadata. At runtime, this IL is translated into native code by a just-in-time (JIT) compiler.
Why Manipulate IL?
As a normal software developer you might have written C# applications for decades, but never actually had to deal with or even think about the IL that was generated from it. Which is a good thing, since IL is a stack-based assembly language (a fancy term for “hard to read”).
For example, this very simple C# console app:
is compiled to the following IL:
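As an illustrative stand-in (an assumed hello-world program, not necessarily the listing originally shown here), the C# below is followed, in comments, by the IL that a debug build typically produces for its Main method. The exact output can vary with compiler version and build configuration.

```csharp
using System;

public static class Program
{
    public static void Main(string[] args)
    {
        Console.WriteLine("Hello World!");
    }
}

// Approximate IL for Main as displayed by ildasm (debug build; details may differ):
//
// .method public hidebysig static void Main(string[] args) cil managed
// {
//   .entrypoint
//   .maxstack 8
//   IL_0000: nop
//   IL_0001: ldstr      "Hello World!"
//   IL_0006: call       void [mscorlib]System.Console::WriteLine(string)
//   IL_000b: nop
//   IL_000c: ret
// }
```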
I think we can generally agree that the former is more readable than the latter. This is also the reason why most people consider manually reading or even manipulating IL “crazy”.
However, there are a lot of valid reasons why you would still want to do just that. The classes of the System.Reflection.Emit namespace in .NET even allow you to create new types dynamically at runtime by emitting IL instructions yourself. In other words, you can write code that writes code. Another common reason for manipulating IL directly is to instrument existing applications for debugging, profiling or aspect-oriented programming purposes. At Dynatrace, we use it for profiling .NET applications.
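To make “code that writes code” concrete, here is a minimal sketch using DynamicMethod and ILGenerator from System.Reflection.Emit. The class and method names (EmitDemo, BuildAdder) are made up for this illustration, and it is not how the Dynatrace Agent works, which manipulates existing IL rather than building new methods.

```csharp
using System;
using System.Reflection.Emit;

public static class EmitDemo
{
    // Builds a method equivalent to "int Add(int a, int b) => a + b" at runtime.
    public static Func<int, int, int> BuildAdder()
    {
        var method = new DynamicMethod("Add", typeof(int), new[] { typeof(int), typeof(int) });
        ILGenerator il = method.GetILGenerator();

        il.Emit(OpCodes.Ldarg_0); // push first argument  -> stack: a
        il.Emit(OpCodes.Ldarg_1); // push second argument -> stack: a, b
        il.Emit(OpCodes.Add);     // pop both, push sum   -> stack: a + b
        il.Emit(OpCodes.Ret);     // return the value on top of the stack

        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }

    public static void Main()
    {
        Func<int, int, int> add = BuildAdder();
        Console.WriteLine(add(2, 3)); // prints 5
    }
}
```

Note that nothing in this code checks that the four emitted instructions form a valid method body; that responsibility stays with whoever emits them.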
The Hassle of Manipulating IL
Besides being hard to read, IL is also fragile, which is another reason why you would generally try to avoid writing it yourself. Unlike with C#, there is no compiler that checks whether your code conforms to the rules of the language. IL is translated directly by the JIT compiler, so it is your own responsibility to emit valid IL.
But what happens if you emit invalid IL?
In the best case, an InvalidProgramException is thrown when you run your application. At this point you at least know that something is wrong with your code, but you still have to work out what exactly. This involves analyzing the emitted IL and tracking the stack state through the entire method by hand. On top of that, you will have to get familiar with the ECMA-335 standard if you want to emit your own IL in a serious way, since other documentation on this matter is rather sparse.
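Tracking the stack state by hand boils down to bookkeeping like the following. This is a deliberately tiny sketch under strong assumptions: it only handles straight-line code, knows a hand-written table of six instructions, and tracks stack depth only, whereas a real verifier also tracks the types on the stack and follows branches. All names in it are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class StackDepthDemo
{
    // Stack effect { values popped, values pushed } for a few instructions.
    // Hand-written for this sketch; "ret" is listed for a method that returns a value.
    private static readonly Dictionary<string, int[]> Effects = new Dictionary<string, int[]>
    {
        { "ldarg.0",  new[] { 0, 1 } },
        { "ldarg.1",  new[] { 0, 1 } },
        { "ldc.i4.1", new[] { 0, 1 } },
        { "add",      new[] { 2, 1 } },
        { "pop",      new[] { 1, 0 } },
        { "ret",      new[] { 1, 0 } },
    };

    // Returns null if the sequence is balanced, otherwise a description of the first problem.
    public static string Check(IEnumerable<string> instructions)
    {
        int depth = 0;
        int index = 0;
        foreach (string name in instructions)
        {
            int[] effect = Effects[name];
            if (depth < effect[0])
                return $"instruction {index} ({name}): needs {effect[0]} value(s), stack has {depth}";
            depth += effect[1] - effect[0];
            index++;
        }
        return depth == 0 ? null : $"{depth} value(s) left on the stack at the end";
    }

    public static void Main()
    {
        Console.WriteLine(Check(new[] { "ldarg.0", "ldarg.1", "add", "ret" }) ?? "ok");
        Console.WriteLine(Check(new[] { "ldarg.0", "add", "ret" }) ?? "ok");
    }
}
```

The first sequence is balanced; the second is reported as an underflow at the add instruction, because only one value is on the stack when two are needed.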
As if all of this wasn’t discouraging enough, this exception might only be thrown “just-in-time”, as the name of the JIT compiler suggests, meaning only once the invalid code is actually about to be executed. As a result, invalid code that sits behind a conditional branch, or is only reached under some race condition, may never be executed during your tests, but could then lead to a fatal crash in production. Perfect.
Considering the fragility of IL and its poor readability, it obviously makes sense to look for a robust and reliable way of verifying your own IL. So let’s examine and compare the currently available options.
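The following sketch shows how late such a failure surfaces. It emits a deliberately invalid method with System.Reflection.Emit (declared to return an int, but with nothing on the stack to return) and catches the resulting exception. The names are hypothetical, and whether the exception is raised when the delegate is created or when it is first invoked may depend on the runtime, so the sketch wraps both.

```csharp
using System;
using System.Reflection.Emit;

public static class InvalidIlDemo
{
    // Returns true if the runtime rejected the invalid method with InvalidProgramException.
    public static bool FailsAtRuntime()
    {
        var method = new DynamicMethod("Broken", typeof(int), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();

        // Invalid: the method must return an int, but the evaluation stack is empty.
        // Emitting this succeeds; nothing is checked at this point.
        il.Emit(OpCodes.Ret);

        try
        {
            // The JIT compiler only sees the body once the method is compiled,
            // which happens here or on the first call.
            var broken = (Func<int>)method.CreateDelegate(typeof(Func<int>));
            broken();
            return false;
        }
        catch (InvalidProgramException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(FailsAtRuntime()
            ? "Invalid IL was only detected at run time."
            : "No InvalidProgramException was thrown.");
    }
}
```

If the broken delegate were only created and invoked inside a rarely taken branch, a test run that never takes that branch would not notice the problem at all.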
#1 – The Old-School Way
Gather your PE-file, fire up ildasm, get a pen and paper and start taking note of the stack state, while trying to learn the ECMA spec by heart.
This is how everyone starts out. Even though fancy open-source tools like dnSpy (which I highly recommend) make investigating IL a lot easier, this is a time-consuming and, quite frankly, not very pleasant task. On top of that, it is error-prone in itself. Once snippets of IL finally start to haunt your dreams, it might be time to switch to one of the next solutions.
#2 – PEVerify
PEVerify is a tool developed by Microsoft for verifying the metadata and IL of .NET PE-files. Just like ilasm and ildasm, it is shipped with Visual Studio, and you can run it on any assembly from the Developer Command Prompt. PEVerify is a great and reliable tool that automates the process described above. However, it has some major limitations, such as not being compatible with .NET Core and not being able to verify mscorlib.dll, which was one of our requirements.
#3 – ILVerify
ILVerify is a cross-platform, open-source tool currently being developed as part of Microsoft’s CoreRT repository. The goal of ILVerify is to overcome PEVerify’s limitations, so that it can verify any assembly, including mscorlib and .NET Core assemblies, while being developed entirely in C#. Due to its open-source nature, anyone can browse its source code and contribute on GitHub.
Currently, ILVerify can be run as a console application, just like PEVerify, although a public API surface is planned. In order to verify an assembly, you also have to specify the location of all referenced assemblies. For example, in order to verify the assembly asm.exe, which references mscorlib.dll and System.dll, you would run:
ilverify.exe <path-to-asm.exe> -reference <path-to-mscorlib.dll> -reference <path-to-system.dll>
Alternatively, using the short form -r and a wildcard:
ilverify.exe <path-to-asm.exe> -r <path-to-libfolder-*.dll>
Additionally, you can specify a regular expression that selects the methods to be included or excluded with -include and -exclude, or just -i and -e. For assemblies that use a base library other than mscorlib, you can also define the base library with -system-module, or just -s.
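If you want to call ILVerify from your own tooling, the options above can be assembled programmatically. The sketch below only builds the command line as a string and uses nothing but the switches mentioned in this post; the helper name, the paths, the example pattern and the assumption that every value is passed as a quoted argument are hypothetical and should be checked against the ILVerify version you use.

```csharp
using System;
using System.Collections.Generic;

public static class IlVerifyCommandLine
{
    // Builds an ilverify.exe command line from the switches described above:
    // -r (reference), -i (include pattern), -s (system module).
    public static string Build(string assembly, IEnumerable<string> references,
                               string includePattern = null, string systemModule = null)
    {
        var args = new List<string> { Quote(assembly) };
        foreach (string reference in references)
        {
            args.Add("-r");
            args.Add(Quote(reference));
        }
        if (includePattern != null) { args.Add("-i"); args.Add(Quote(includePattern)); }
        if (systemModule != null) { args.Add("-s"); args.Add(Quote(systemModule)); }
        return "ilverify.exe " + string.Join(" ", args);
    }

    private static string Quote(string value)
    {
        return "\"" + value + "\"";
    }

    public static void Main()
    {
        // Hypothetical paths and pattern, for illustration only.
        Console.WriteLine(Build(
            @"C:\app\asm.exe",
            new[] { @"C:\refs\mscorlib.dll", @"C:\refs\System.dll" },
            includePattern: "MyNamespace"));
    }
}
```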
Contributing to ILVerify
When we came across ILVerify about six months ago, it was still at an early stage of development. The basic structure was there, but most IL instructions simply resulted in a NotImplementedException. Since investing in this tool had the potential not only to prevent bugs but also to improve the overall quality of our product, we decided to start contributing to it. During the last two months, I had the honor of doing so, and learned a lot not only about IL, but also about the open-source contribution process. Contributing to the CoreRT repository was hugely satisfying, not least thanks to the awesome work of the Microsoft employees in charge of it.
The Current State of ILVerify
At present, ILVerify is very close to a verification capability comparable to PEVerify. While some minor verification rules are still missing and some false negatives pop up, the project is now in a state that allowed us to fix errors in our code that we might never have found otherwise. In the end, it did exactly what we expected from it: it improved the overall quality and stability of our product.
Future plans for ILVerify include being used as a verifier for the Roslyn Compiler and implementing newly proposed IL verification rules.
Whenever you are in a situation where you need to generate your own IL, it is a good idea to verify that your code is actually valid. Not only does this prevent your application from crashing, but it can also improve the quality of your code. PEVerify has historically been the go-to tool for IL generators, but it has some major limitations. ILVerify is a cross-platform, open-source tool that is currently being developed as an alternative to PEVerify, avoiding its limitations and being kept up to date with the verification rules introduced in new standards.
I must thank Jan Kotas for his amazingly fast and professional reviews on GitHub, which made the process of contributing to ILVerify a very pleasant experience. I must also thank Christoph Neumüller for continuously pushing this project, and thereby making all of this possible in the first place, and Michael Mayr for his awesome support during the last months.