Leftshift’s Weblog

Techniques to improve your code

Mono Cecil, Visited and Observed

As I mentioned in a previous post, Mono Cecil is library that lets you load and browse the types of a .NET assembly. For a simple [but potentially useful] look at what you can do I’ll show you how you might go about listing all of the methods in an assembly. What will our test look like? Well, we will start with the assertion that the number of methods returned is what we expect:

[Test]
public void ShouldReturnNumberOfMethodsTest()
{
    Assert.AreEqual(expectedMethodCount, actualMethodCount);
}

Pretty simple so far, but we have a couple of design decisions to make to complete our test code. I’ll call our class that does the work AssemblyExaminer. It will need to expose a list of methods so I can get the count. It will also need to be passed an assembly as input. To avoid subtle state related bugs we will pass this in the constructor and make our class immutable. Looking at the Mono.Cecil namespace the AssemblyFactory.GetAssembly method has three overloads. One takes a filename, one a byte array and the other a stream. For our purposes a stream provides us with the best level of abstraction. With all that in mind here is our [almost] completed unit test:

[Test]
public void ShouldReturnNumberOfMethodsTest()
{
    const int expectedMethodCount = ?;
    using (Stream testAssembly = GetTestAssembly())
    {
        AssemblyExaminer examiner = new AssemblyExaminer(testAssembly);
        int actualMethodCount = examiner.Methods.Count;
        Assert.AreEqual(expectedMethodCount, actualMethodCount);
    }
}

Two things remain. One is simply replacing the ? with the number of methods I expect to find. The other is spinning up the stream containing the assembly. I have called the method that creates the stream GetTestAssembly. We’ll see how to implement that in a minute. Continue reading

18 July 2008 Posted by | .NET, Metrics | , | 2 Comments

Cecil

Cecil is a library to inspect and generate programs in the CIL format. CIL is the intermediate language format generated when compiling .NET programs.

The interesting thing about Cecil is that it gives you a model of your code, in a similar way to the System.Reflection namespace in .NET. The really important bit though IMO is not documented very well. You can inspect the code and the debug symbols tradionally stored in a .pdb file. The .pdb file contains information that links the IL instruction being examined back to the source code file. It contains the filename, and start and end points within the file.

One way this information is useful is if you want to calculate the number of lines of code within a method, type or assembly. The debugger defines a set of points of interest in the code known as Sequence Points. Cecil allows you to access this information. To do so download the latest version binaries [0.6]. Three DLLs are included in the download; the main Mono.Cecil.dll plus two libraries for accessing the debug symbols. One library targets standard .NET pdbs and the other targets Mono mdbs. We’re going to use the Mono.Cecil.Pdb.dll in this example. You’ll need to reference both the main and pdb libraries.

To load an assembly, get its main module and load the associated debug symbols you’ll need to do something like this [obviously ensuring that you write the unit tests first]:

string unit = Assembly.GetExecutingAssembly().Location;
AssemblyDefinition assemblyDef = AssemblyFactory.GetAssembly(unit);
ModuleDefinition modDef = assemblyDef.MainModule;
PdbFactory pdbFactory =
new PdbFactory();
ISymbolReader reader = pdbFactory.CreateReader(modDef, unit);
modDef.LoadSymbols(reader);

The first line mearly stores the location of the current executing assembly. Change this to point to whichever assembly you may be interested in. The next two lines get the Cecil Assembly definition and the main module. The rest of the code does the interesting part. It creates a symbol reader using the PdbFactory, and the symbols are loaded by the main module using this reader. From this point on you can get at interesting stuff. An alternative to loading the symbols for a whole module is to use the ISymbolReader.Read method to load the symbols for a MethodBody.

So how can you use this? To navigate the code from the Assembly down to the IL instruction level, Cecil provides the following hierachy

AssemblyDefinition > ModuleDefinitionCollection > TypeDefinitionCollection > MethodDefinitionCollection

A MethodDefinition contains a Body property which contains a list of IL Instructions that make up the code. Each instruction has a SequencePoint property. Note that for a lot of IL Instructions this property is null. This basically means that the debugger is not interested in that instruction. This is to be expected as one line of code in C# can generate many IL instructions. Where it is not null, the sequence point contains the path to the source code file and the start / end column and line within the file. When you debug a project using visual studio it uses the same information to highlight the code in the IDE.

You can use this information to help write metrics tools [such as NDepend]

30 April 2008 Posted by | .NET, Metrics | , , | 1 Comment