Bob Martin has been thinking about adding a new element to the agile manifesto around producing quality rather than quantity. He’s described this as ‘Craftmanship over Execution’. To back this up you can follow the instructions here and measure the amount of WTF’s per minute. A great idea for a metric, but hard to automate. Maybe an idea for a new startup; provide metrics for code in the same way third party companies perform penatration testing.
As part of a continuous integration cycle most people consider running unit and integration tests. Some even consider running automated acceptance tests. Fewer still focus on code quality tests. To ensure code is maintainable requires a certain amount of effort as the code changes. I think this is what the refactor stage of the TDD red, green, refactor cycle alludes to. As well as refactoring code to remove duplication, there are other considerations to be made with regards to maintainability. We use six indicators to give a finger in the air estimate of the maintainability of a code base. The indicators we use are as follows:
Unit Test Coverage High test coverage is a good indicator of whether a TDD approach is being followed, and if not an optimistic percentage of the chance of a bug being caught. Said another way, If a bug is introduced into the code the chance of it being caught is at best the percentage of code covered by tests. This very much depends on the quality of the tests, but if you only have fifty percent coverage and introduce a bug, it’s a coin toss whether it’s detected. If the tests are poor the real figure is much lower than fifty percent.
Percentage of large methods Fairly obvious this one, but large methods are harder to maintain because they contain more code. There is more scope for error, less accuracy for identifying the cause of any error [any unit test covers more code] and a greater chance that the method is breaking the single responsibility principle giving it more than one reason to change. What you consider a large method is up to you, but we have been using ten lines of code as our measure.
Class Cohesion For a class to be cohesive all methods should use all fields. We use the lack of cohesion of methods henderson-sellers formula to measure this one. If a class isn’t cohesive it’s an indicator that unrelated functionality could be split into it’s own class. In other words it has more than one reason to change and is therefore breaking the single responsibility principle.
Package Cohesion For a package or assembly to be cohesive the classes inside the package should be strongly related. This is a measure of the average number of type relationships within a package. Low cohesion suggests that the types can be split into seperate packages.
Class Coupling This is a measure of the number of types that depend on a particular type a.k.a. afferent coupling. If a high number of types depend on the class in question, making changes to it will be hard without breaking lots of client code. There are a number of reasons why this might occur. Responsibility for one aspect may be split among multiple classes, but more likely you don’t have a losely coupled design.
Package Coupling This is a measure of the number of types outside this package or assembly that depend upon types within this package. One possible reason for high coupling is a packaging problem – things that change together should stay together. Another reason is that the packages in question have many responsibilities.
I’d love to hear feedback on the way you measure the maintability of code.
Cecil is a library to inspect and generate programs in the CIL format. CIL is the intermediate language format generated when compiling .NET programs.
The interesting thing about Cecil is that it gives you a model of your code, in a similar way to the System.Reflection namespace in .NET. The really important bit though IMO is not documented very well. You can inspect the code and the debug symbols tradionally stored in a .pdb file. The .pdb file contains information that links the IL instruction being examined back to the source code file. It contains the filename, and start and end points within the file.
One way this information is useful is if you want to calculate the number of lines of code within a method, type or assembly. The debugger defines a set of points of interest in the code known as Sequence Points. Cecil allows you to access this information. To do so download the latest version binaries [0.6]. Three DLLs are included in the download; the main Mono.Cecil.dll plus two libraries for accessing the debug symbols. One library targets standard .NET pdbs and the other targets Mono mdbs. We’re going to use the Mono.Cecil.Pdb.dll in this example. You’ll need to reference both the main and pdb libraries.
To load an assembly, get its main module and load the associated debug symbols you’ll need to do something like this [obviously ensuring that you write the unit tests first]:
AssemblyDefinition assemblyDef = AssemblyFactory.GetAssembly(unit);
ModuleDefinition modDef = assemblyDef.MainModule;
PdbFactory pdbFactory = new PdbFactory();
ISymbolReader reader = pdbFactory.CreateReader(modDef, unit);
The first line mearly stores the location of the current executing assembly. Change this to point to whichever assembly you may be interested in. The next two lines get the Cecil Assembly definition and the main module. The rest of the code does the interesting part. It creates a symbol reader using the PdbFactory, and the symbols are loaded by the main module using this reader. From this point on you can get at interesting stuff. An alternative to loading the symbols for a whole module is to use the ISymbolReader.Read method to load the symbols for a MethodBody.
So how can you use this? To navigate the code from the Assembly down to the IL instruction level, Cecil provides the following hierachy
AssemblyDefinition > ModuleDefinitionCollection > TypeDefinitionCollection > MethodDefinitionCollection
A MethodDefinition contains a Body property which contains a list of IL Instructions that make up the code. Each instruction has a SequencePoint property. Note that for a lot of IL Instructions this property is null. This basically means that the debugger is not interested in that instruction. This is to be expected as one line of code in C# can generate many IL instructions. Where it is not null, the sequence point contains the path to the source code file and the start / end column and line within the file. When you debug a project using visual studio it uses the same information to highlight the code in the IDE.
You can use this information to help write metrics tools [such as NDepend]