Leftshift’s Weblog

Techniques to improve your code

Mono Cecil, Visited and Observed

As I mentioned in a previous post, Mono Cecil is a library that lets you load and browse the types of a .NET assembly. For a simple [but potentially useful] look at what you can do, I’ll show you how you might go about listing all of the methods in an assembly. What will our test look like? Well, we will start with the assertion that the number of methods returned is what we expect:

[Test]
public void ShouldReturnNumberOfMethodsTest()
{
    Assert.AreEqual(expectedMethodCount, actualMethodCount);
}

Pretty simple so far, but we have a couple of design decisions to make to complete our test code. I’ll call our class that does the work AssemblyExaminer. It will need to expose a list of methods so I can get the count. It will also need to be passed an assembly as input. To avoid subtle state-related bugs we will pass this in the constructor and make our class immutable. In the Mono.Cecil namespace, the AssemblyFactory.GetAssembly method has three overloads: one takes a filename, one a byte array and the other a stream. For our purposes a stream provides us with the best level of abstraction. With all that in mind, here is our [almost] completed unit test:

[Test]
public void ShouldReturnNumberOfMethodsTest()
{
    const int expectedMethodCount = ?;
    using (Stream testAssembly = GetTestAssembly())
    {
        AssemblyExaminer examiner = new AssemblyExaminer(testAssembly);
        int actualMethodCount = examiner.Methods.Count;
        Assert.AreEqual(expectedMethodCount, actualMethodCount);
    }
}

Two things remain. One is simply replacing the ? with the number of methods I expect to find. The other is spinning up the stream containing the assembly. I have called the method that creates the stream GetTestAssembly. We’ll see how to implement that in a minute.

First things first: we need to know how many methods we expect. To do this I created an assembly to run our test against, which is a simple DLL with an interface and a couple of classes. The implementation is left blank. You can use one of your own existing libraries or download an open source project and use that. You will have to count the methods manually, however, as this number will be used in our unit test:

namespace SampleProject
{
    public interface ITransportControl
    {
        void Play();
        void Stop();
    }

    public class Video:ITransportControl
    {
        public void Play()
        {
        }

        public void Stop()
        {
        }

        private void HelperMethod()
        {
        }
    }

    public class Audio: ITransportControl
    {
        public void Play()
        {
        }

        public void Stop()
        {
        }

        public void ChangeSoundLevel(int amount)
        {
        }

        private void AnotherHelperMethod()
        {
        }
    }
}

With our assembly to test against, I can count the number of methods and replace the ? in the test code with the expected number of methods [9 in this case: two on the interface, three on Video and four on Audio.]
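
If counting by hand feels error-prone, the number can be cross-checked. The sketch below is my own addition and uses System.Reflection rather than Cecil; the MethodCounter class and its CountDeclaredMethods helper are hypothetical names, and the sample types are repeated so that the snippet stands alone. It only counts methods declared directly on each type, so constructors and inherited members are left out.

```csharp
using System;
using System.Linq;
using System.Reflection;

namespace SampleProject
{
    public interface ITransportControl { void Play(); void Stop(); }

    public class Video : ITransportControl
    {
        public void Play() { }
        public void Stop() { }
        private void HelperMethod() { }
    }

    public class Audio : ITransportControl
    {
        public void Play() { }
        public void Stop() { }
        public void ChangeSoundLevel(int amount) { }
        private void AnotherHelperMethod() { }
    }
}

public static class MethodCounter
{
    // Hypothetical helper: counts methods declared directly on each type,
    // public and private, excluding constructors and inherited members.
    public static int CountDeclaredMethods(params Type[] types)
    {
        const BindingFlags flags = BindingFlags.DeclaredOnly
            | BindingFlags.Public | BindingFlags.NonPublic
            | BindingFlags.Instance | BindingFlags.Static;
        return types.Sum(t => t.GetMethods(flags).Length);
    }

    public static void Main()
    {
        int count = CountDeclaredMethods(
            typeof(SampleProject.ITransportControl),
            typeof(SampleProject.Video),
            typeof(SampleProject.Audio));
        Console.WriteLine(count); // 2 + 3 + 4 = 9
    }
}
```

Reflection and Cecil will not necessarily agree for every assembly, so treat this as a sanity check for the sample rather than a replacement for the test.
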
As for the remaining piece of the puzzle, I’m going to add a new resource to the test project and name it AssemblyResource. Compile the sample code above and add the DLL file to AssemblyResource naming it SampleAssembly. The implementation of our GetTestAssembly() method is simply:

private Stream GetTestAssembly()
{
    return new MemoryStream(AssemblyResource.SampleAssembly);
}

The more eagle-eyed amongst you will notice that the AssemblyResource.SampleAssembly property returns a byte array. Hang on a minute; didn’t one of the overloads of the AssemblyFactory.GetAssembly method take a byte array? Yes it did, but the decision taken earlier was to use the stream overload as it provided a better level of abstraction. Looking at the decision now, I’m happy that it still holds up: I would expect most clients of the code to pass in a file stream, but any type of stream can be used. We are obeying the open-closed principle. Somebody with too much time on their hands could use the standard input stream to type in the raw bytes if they wanted. Anyway, I digress. We have our test. Unfortunately it won’t compile just yet.
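
The point about streams can be shown without Cecil at all. In this minimal sketch, StreamDemo and CountBytes are hypothetical stand-ins for a consumer such as AssemblyExaminer: the method only asks for a Stream, so an in-memory buffer and a file on disk are handled by exactly the same code.

```csharp
using System;
using System.IO;

public static class StreamDemo
{
    // Hypothetical stand-in for a consumer such as AssemblyExaminer:
    // it only needs something readable, not a particular source.
    public static long CountBytes(Stream input)
    {
        long total = 0;
        byte[] buffer = new byte[4096];
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            total += read;
        }
        return total;
    }

    public static void Main()
    {
        byte[] bytes = { 1, 2, 3, 4, 5 };

        // In-memory source, as in the test's GetTestAssembly().
        using (Stream memory = new MemoryStream(bytes))
        {
            Console.WriteLine(CountBytes(memory)); // 5
        }

        // File source, as a typical client might supply.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, bytes);
            using (Stream file = File.OpenRead(path))
            {
                Console.WriteLine(CountBytes(file)); // 5
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```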

To make it compile we need to write some code. Starting with the simplest code to make the thing compile we end up with:

public class AssemblyExaminer
{
    public AssemblyExaminer(Stream assemblyStream)
    {
        throw new System.NotImplementedException();
    }

    public List<string> Methods { get; private set; }
}

Compiling and running our test gives us the lovely satisfaction that it fails and our test has some meaning. However, we are no closer to implementing the code that counts the number of methods. Not yet anyway.

OK, how do we go about using Mono Cecil to get our list of methods? I’ll start by creating the AssemblyDefinition and our empty list of Methods:

    public AssemblyExaminer(Stream assemblyStream)
    {
        Methods = new List<string>();
        AssemblyDefinition assemblyDefinition = AssemblyFactory.GetAssembly(assemblyStream);
    }

Running our test again we still fail but with a more meaningful message:

TestCase 'AssemblyExaminerTest.ShouldReturnNumberOfMethodsTest' failed:
  Expected: 9
  But was:  0

What is the best way to populate our list with the methods and pass the test? We could use the MainModule property and enumerate the types and for each type enumerate the methods. Fortunately there is a better way as Mono Cecil implements the visitor pattern [some people consider the visitor pattern to be useless]. It allows us to add new operations to the existing object structure without modification. In our case the existing object structure is the Mono Cecil structure of the modules, types and methods that make up an assembly. Our new operation will simply add each method it finds to a list. So without further ado we will implement the IReflectionVisitor interface. [Note, you should really write the tests first, but for the purpose of this article I’ve omitted this step.]
Looking at the IReflectionVisitor interface, it contains 36 methods for us to implement! Most are grouped in pairs where one visits a collection and the other visits an item in the collection. This all seems very complicated and the arguments about uselessness seem to have some grounding in reality. All is not lost however. As I said before, all we are interested in is the MainModule, its types and the methods for each type. We can implement the following five methods and leave the others blank:

internal class MethodVisitor: IReflectionVisitor
{
    public void VisitModuleDefinition(ModuleDefinition module)
    {
    }

    public void VisitTypeDefinitionCollection
        (TypeDefinitionCollection types)
    {
    }

    public void VisitTypeDefinition(TypeDefinition type)
    {
    }

    public void VisitMethodDefinitionCollection
        (MethodDefinitionCollection methods)
    {
    }  

    public void VisitMethodDefinition(MethodDefinition method)
    {
    }

    // We are not interested in anything else, so the remaining methods are left empty

    public void TerminateModuleDefinition(ModuleDefinition module)
    {
    }

    //rest of methods...
}

We have our approach. But how will this be called from our AssemblyExaminer class? This turns out to be simple. We create an instance of our MethodVisitor and pass it to the Accept method of the main module like so:

    public AssemblyExaminer(Stream assemblyStream)
    {
        Methods = new List<string>();
        AssemblyDefinition assemblyDefinition =
                    AssemblyFactory.GetAssembly(assemblyStream);
        MethodVisitor methodVisitor = new MethodVisitor();
        assemblyDefinition.MainModule.Accept(methodVisitor);
    }

We now have two things left outstanding. Number one, our implementation of the IReflectionVisitor interface doesn’t actually do anything and number two, even if it did how do we go about populating the list of methods?

Taking the first problem we know that we want to visit all the types given a module:

    public void VisitModuleDefinition(ModuleDefinition module)
    {
        module.Types.Accept(this);
    }

For the set of types we want to visit each type:

    public void VisitTypeDefinitionCollection (TypeDefinitionCollection types)
    {
        foreach (TypeDefinition item in types)
        {
            item.Accept(this);
        }
    }

For each type we want to visit the methods:

    public void VisitTypeDefinition(TypeDefinition type)
    {
        type.Methods.Accept(this);
    }

For the set of methods we want to visit each one:

    public void VisitMethodDefinitionCollection (MethodDefinitionCollection methods)
    {
        foreach (MethodDefinition item in methods)
        {
            item.Accept(this);
        }
    }

For each method we want to record it in our AssemblyExaminer:

    public void VisitMethodDefinition(MethodDefinition method)
    {
        // ???
    }

We have now come up against our second problem. How can we populate the list of methods in our AssemblyExaminer? Enter the observer pattern. We want our AssemblyExaminer to know when our MethodVisitor is visiting a method. We need to create the observer interface which contains a single update method. This method gets called on all observers when the subject state changes. Our AssemblyExaminer is an observer. Our MethodVisitor is a subject.
The observer interface looks like this:

    public interface IMethodObserver
    {
        void Update(MethodDefinition methodDefinition);
    }

As you can see it provides a mechanism to tell all observers about state changes. The state in our case is the Mono Cecil MethodDefinition. We have taken a bit of a shortcut here – normally the observer would call a GetState method on the subject. Our implementation of the interface in AssemblyExaminer now looks like this:

public class AssemblyExaminer : IMethodObserver
{
    void IMethodObserver.Update(MethodDefinition methodDefinition)
    {
        Methods.Add(methodDefinition.Name);
    }

I’ve explicitly implemented the interface as most clients won’t use it. We have solved one half of our problem, the observer, so on to the second half, the subject.
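
To show what the explicit implementation buys us, here is a small self-contained sketch. INameObserver and NameCollector are hypothetical, simplified stand-ins for IMethodObserver and AssemblyExaminer that take a plain string instead of a Cecil MethodDefinition. The Update method can only be reached through the interface, so it stays out of the way of ordinary clients of the class.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical simplified observer: it receives a method name
// rather than a Cecil MethodDefinition.
public interface INameObserver
{
    void Update(string methodName);
}

public class NameCollector : INameObserver
{
    private readonly List<string> names = new List<string>();

    public IList<string> Names { get { return names; } }

    // Explicit implementation: reachable only through INameObserver.
    void INameObserver.Update(string methodName)
    {
        names.Add(methodName);
    }
}

public static class ExplicitInterfaceDemo
{
    public static void Main()
    {
        NameCollector collector = new NameCollector();

        // collector.Update("Play");   // would not compile: Update is hidden
        INameObserver observer = collector;
        observer.Update("Play");

        Console.WriteLine(collector.Names.Count); // 1
    }
}
```
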
For a subject to be observable, observers register themselves with the subject. The subject then calls the update method on all registered observers, passing the new state. To do this we’ll create an abstract base class called Method that contains this functionality:

public abstract class Method
{
    private readonly List<IMethodObserver> observers = new List<IMethodObserver>();

    public void Attach(IMethodObserver observer)
    {
        observers.Add(observer);
    }

    public void Detach(IMethodObserver observer)
    {
        observers.Remove(observer);
    }

    protected void Notify(MethodDefinition state)
    {
        foreach (var item in observers)
        {
            item.Update(state);
        }
    }
}

This class provides a public mechanism to attach and detach observers. For subtypes it provides a means to notify the observers that the state has changed. Our implementation of the VisitMethodDefinition method in MethodVisitor now becomes:

internal class MethodVisitor: Method, IReflectionVisitor
{
    public void VisitMethodDefinition(MethodDefinition method)
    {
        Notify(method);
    }

With this in place we just need to add a line of code to the AssemblyExaminer constructor:

    public AssemblyExaminer(Stream assemblyStream)
    {
        Methods = new List<string>();
        AssemblyDefinition assemblyDefinition =
            AssemblyFactory.GetAssembly(assemblyStream);
        MethodVisitor methodVisitor = new MethodVisitor();
        methodVisitor.Attach(this);
        assemblyDefinition.MainModule.Accept(methodVisitor);
    }

Hurray! We are now ready to run the tests. Doing so produces the following result:

TestCase 'AssemblyExaminerTest.ShouldReturnNumberOfMethodsTest' failed:
  Expected: 9
  But was:  36

What happened? This was not the result we were hoping for. We seem to have four times as many methods as we expected. Looking at the list in the debugger, we did indeed have four copies of each method name, although the names appeared in order for each class. There is nothing wrong with the observer implementation; Update really is being called 36 times. The problem lies in our MethodVisitor. Firing up Reflector for a bit of further analysis shows that Mono Cecil is making the extra calls. The implementation of ModuleDefinition.Accept already calls ModuleDefinition.Types.Accept. We are repeating this call in our implementation of the visitor and need to remove it. I’m not happy about this unexpected behaviour, so I create a unit test to reproduce the double call and leave a comment [gasp!] to prevent future developers making the same mistake. The same pattern is repeated for the TypeDefinition.Accept method: its implementation calls TypeDefinition.Methods.Accept, and once again the remedy is the same. Each duplicated call doubles the number of visits, so two of them give 2 × 2 × 9 = 36, which explains why we had four times the expected number of methods in the list. The updated MethodVisitor looks like this:

internal class MethodVisitor: Method, IReflectionVisitor
{
    public void VisitModuleDefinition(ModuleDefinition module)
    {
        //ModuleDefinition.Accept already calls
        //ModuleDefinition.Types.Accept, so do not call it again here
    }

    public void VisitTypeDefinitionCollection(TypeDefinitionCollection types)
    {
        foreach (TypeDefinition item in types)
        {
            item.Accept(this);
        }
    }

    public void VisitTypeDefinition(TypeDefinition type)
    {
        //TypeDefinition.Accept already calls
        //TypeDefinition.Methods.Accept, so do not call it again here
    }

    public void VisitMethodDefinitionCollection
        (MethodDefinitionCollection methods)
    {
        foreach (MethodDefinition item in methods)
        {
            item.Accept(this);
        }
    }  

    public void VisitMethodDefinition(MethodDefinition method)
    {
        Notify(method);
    }
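
To see where the factor of four comes from, here is a stand-alone model of the behaviour described above. ModuleNode, TypeNode, MethodNode and CountingVisitor are hypothetical stand-ins of my own, not Cecil's real code. The only assumption carried over is that the Accept methods on the module and on each type already go on to visit the child collection. The repeatTraversal flag switches between the first visitor, which repeats those calls, and the corrected one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for Cecil's object model; not Cecil's real code.
public class MethodNode
{
    public void Accept(CountingVisitor v) { v.VisitMethod(this); }
}

public class TypeNode
{
    public readonly List<MethodNode> Methods = new List<MethodNode>();

    public TypeNode(int methodCount)
    {
        for (int i = 0; i < methodCount; i++) Methods.Add(new MethodNode());
    }

    // Modelled on the behaviour described for TypeDefinition.Accept:
    // visits the type, then visits its methods without being asked.
    public void Accept(CountingVisitor v)
    {
        v.VisitType(this);
        v.VisitMethodCollection(Methods);
    }
}

public class ModuleNode
{
    public readonly List<TypeNode> Types = new List<TypeNode>();

    // Modelled on the behaviour described for ModuleDefinition.Accept.
    public void Accept(CountingVisitor v)
    {
        v.VisitModule(this);
        v.VisitTypeCollection(Types);
    }
}

public class CountingVisitor
{
    private readonly bool repeatTraversal;
    public int Count { get; private set; }

    public CountingVisitor(bool repeatTraversal)
    {
        this.repeatTraversal = repeatTraversal;
    }

    public void VisitModule(ModuleNode module)
    {
        // The first version of the visitor repeated the traversal here.
        if (repeatTraversal) VisitTypeCollection(module.Types);
    }

    public void VisitTypeCollection(List<TypeNode> types)
    {
        foreach (TypeNode t in types) t.Accept(this);
    }

    public void VisitType(TypeNode type)
    {
        // ...and again here.
        if (repeatTraversal) VisitMethodCollection(type.Methods);
    }

    public void VisitMethodCollection(List<MethodNode> methods)
    {
        foreach (MethodNode m in methods) m.Accept(this);
    }

    public void VisitMethod(MethodNode method) { Count++; }
}

public static class DoubleVisitDemo
{
    public static int CountMethods(bool repeatTraversal)
    {
        ModuleNode module = new ModuleNode();
        module.Types.Add(new TypeNode(2)); // ITransportControl
        module.Types.Add(new TypeNode(3)); // Video
        module.Types.Add(new TypeNode(4)); // Audio
        CountingVisitor visitor = new CountingVisitor(repeatTraversal);
        module.Accept(visitor);
        return visitor.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountMethods(true));  // 36
        Console.WriteLine(CountMethods(false)); // 9
    }
}
```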

Re-running the test gives a green light. Hopefully you have learnt a bit about Mono Cecil, and the visitor and observer design patterns.

Posted 18 July 2008 | .NET, Metrics

2 Comments »

  1. […] work just yet, but it would either examine the source itself or use static analysis, something like cecil and the approach I’ve talked about before. Does anybody else think this is a good idea? What should the filters […]

    Pingback by I’ve got a ticket to DRY « Leftshift’s Weblog | 10 December 2008 | Reply

  2. Hi thanks for confirming my own findings.
    I wholly agree that I don’t like the unexpected extra accepts in /some/ Cecil Accept implementations.

    To me, it is just plain broken, because the visitor can no longer decide what to visit and when. Of course, it is a decision of taste, but IMHO it should be the one or the other:

    Either accept will recursively visit all nodes in the tree, or it should just call the single method. Not some in-between lame unknowable mixture.

    To be brutally honest, I say it should always do the recursion itself, because otherwise, how is the visitor a visitor, if it effectively has to implement the traversal/iteration logic itself? And how is a visitor a visitor if it is to operate only on the Accepting node itself?

    visitor = new Visitor(businessClass);
    node.Accept(visitor);

    then becomes a very expensive and illegible way to write

    businessClass.method(node);

    Well the rant is misplaced of course; I just wanted to thank you for carefully documenting my pain, which prevents me from doing the analysis myself

    Comment by Seth | 6 October 2010 | Reply

