Microsoft Parallel Extensions – Hard”core” code made easy!

Ok, so I must admit that I was a little disappointed last night. I was really looking forward to implementing a threaded Consumer / Producer to try and take advantage of our 16-core data warehouse server. Instead, I ended up changing one line of code – welcome to the world of Microsoft Parallel Extensions.

The Background (feel free to skip this)

At my work, it’s really important for us to be able to detect if people are sharing their accounts. We do this by storing a permanent cookie on the user’s machine with a “Machine ID” (a GUID we generate). When they log in from that machine, we log the ID so we know the user was on that particular machine. The more machines, the more likely it is that they are sharing their account.

This worked well, apart from the fact that people tend to clear their cookies, leading to an artificial rise in “Machine IDs” over time. Three years ago, I came up with an algorithm to work out which machine IDs were due to cookie clearing and which ones weren’t. So that I don’t bore you to death with the details, I will leave that for another post.

The Details

At the core of my problem was the following loop:

foreach (string user in userlist)
{
  // Do some stuff
  // Commit it to the database
}

This was going through around 60,000 users and 4 million+ records and running a pretty complex algorithm on them. It took around 10 minutes to run, but when I looked at the performance monitor, I noticed that the total CPU usage rarely got over 4% and that only one CPU spiked – hmm, a common sign that threads would help 🙂 One of my pet hates is having to write really complicated and unintelligible code just to “increase” performance, and threads, no matter how hard you try, always end up looking ugly!

The Solution

Some neuron at the back of my brain started firing, and I got this feeling that I knew a better way. F# – nope, that’s not it – would be helpful, but it would take me a long time to convert everything into functional programming. A quick Google search bore fruit: an article I had read about the Microsoft Parallel Extensions library back in 2007! Microsoft are now planning on releasing this as part of .NET 4.0, but a CTP for 3.5 still exists.

After installing the MSI and then adding the reference to System.Threading, all I needed to do was to make this simple change:

Parallel.ForEach(userlist, (username) =>
{
  // Same code as before
});

It was that simple! Voila, multi-threading code. If you want more control over the threads created, you can use the TaskManagerPolicy object – a simple way to do this is outlined here.

Caveat: Your code inside the loop has to be thread safe – since none of my code was modifying shared objects, this wasn’t a concern for me.
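To make the caveat concrete, here is a minimal sketch of a loop body that does need to record results somewhere shared. It assumes the .NET 4.0 flavour of the library (System.Threading.Tasks and System.Collections.Concurrent); the 3.5 CTP may differ. The names ParallelSketch, Analyse and Run are made up for illustration, and Analyse is only a stand-in for the real per-user algorithm.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelSketch
{
    // Hypothetical stand-in for the per-user work; it touches no shared state.
    public static int Analyse(string user)
    {
        return user.Length;
    }

    public static IDictionary<string, int> Run(IEnumerable<string> userlist)
    {
        // ConcurrentDictionary can be written to from several threads at once,
        // unlike a plain Dictionary, which would need a lock around every write.
        var results = new ConcurrentDictionary<string, int>();

        Parallel.ForEach(userlist, username =>
        {
            results[username] = Analyse(username);
        });

        return results;
    }
}
```

Calling ParallelSketch.Run(new[] { "alice", "bob" }) gives back one entry per user. If the loop body only reads shared data and writes to its own locals (or its own database rows), no extra synchronisation should be needed.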

The Result


Beautiful! The task now takes under a minute instead of 10.

MarketClose Twitterobot – Step 1: Yahoo!

So, I am a recovering stockoholic. I’ve been out of the markets for about 2 years now, but I am still fascinated by their daily fluctuations. I found myself hovering over the address bar and habitually typing to find out what was happening. But that only satiated my fix when I was in Firefox. So, I added the keyword “stocks” to SlickRun so I could quickly open it up from anywhere. Bah – too many keystrokes – I used WinKey to assign it to Win+C. But what to do before 10am? I started checking the futures markets before getting to work.

It was too much – I had to go cold turkey! I changed Win+C back to being a command prompt, and re-aliased stocks to take me to proggit. I decided that instead, I would write a simple C# program to send me the stock price at the end of the day. I’ve been meaning to play around with the Twitter API for a while now, so it finally gave me an excuse.

However, before the fun started, I first had to get the data.

Step 1 – Yahoo!

Yahoo have a nice little REST-based CSV generation system for stock quotes. Basically, all you need to do is request URLs in the following format:

The s query string parameter defines the tickers you want to retrieve (separated by spaces) and the f defines what fields you want back – you can find a listing of all those commands here.

So, for my needs, I just needed to request: ^IXIC+^DJI+^GSPC&f=sl1o

Fortunately, .NET makes REST requests a piece of cake:

using (WebClient wc = new WebClient())
{
    return wc.DownloadString(@"^IXIC+^DJI+^GSPC&f=sl1o");
}

So all that was left to do was to write something to quickly parse the csv (unit tests are your friend for things like this) and to move some things out to config files (such as the url)
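As a rough idea of what that parsing step might look like, here is a small sketch. It assumes each line comes back as a quoted symbol followed by two numbers (symbol, last trade, open, which is my reading of f=sl1o); the Quote and QuoteParser names and the sample values in the usage note are invented for illustration, not real output.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public sealed class Quote
{
    public string Symbol { get; set; }
    public decimal LastTrade { get; set; }
    public decimal Open { get; set; }
}

public static class QuoteParser
{
    // Parses lines shaped like: "^DJI",8000.50,7950.25
    // Assumed field order: symbol, last trade, open.
    public static List<Quote> Parse(string csv)
    {
        var quotes = new List<Quote>();
        string[] lines = csv.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            // Naive split: assumes the quoted symbol never contains a comma.
            string[] fields = line.Split(',');
            if (fields.Length != 3)
            {
                throw new FormatException("Expected 3 fields: " + line);
            }

            quotes.Add(new Quote
            {
                Symbol = fields[0].Trim().Trim('"'),
                // InvariantCulture so "8000.50" parses the same on every machine.
                LastTrade = decimal.Parse(fields[1], CultureInfo.InvariantCulture),
                Open = decimal.Parse(fields[2], CultureInfo.InvariantCulture)
            });
        }

        return quotes;
    }
}
```

Feeding it a made-up line such as "^DJI",8000.50,7950.25 yields a single Quote with Symbol ^DJI. Non-numeric placeholder values are not handled here and will throw a FormatException, which is exactly the kind of case the unit tests should pin down.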

Next Step – Twitter!

Extending yourself…

How many times have you found yourself doing this for enums:

public enum Role { /* ... */ }

public class RoleHelper
{
    public static string GetName(Role role)
    {
        return Enum.GetName(typeof(Role), role);
    }
}

Wouldn’t it be a lot nicer just to be able to write role.GetName()? As promised, I am now going to explain what extension methods are. Extension methods are a blessing 🙂 Basically, they allow you to add new methods to existing classes, so that they appear as if they were defined in the class. It’s almost as if you were adding the method inside the class definition (though see the caveat below).

The syntax is pretty straight forward:

public static class RoleHelper
{
    public static string GetName(this Enum e)
    {
        return Enum.GetName(e.GetType(), e);
    }
}

Notice the “this” in front of the first parameter’s type. That’s the secret sauce. The only other thing you need to remember is that extension methods must be declared inside a non-nested static class (make it public if other assemblies need to use them).

So now, all your enums will have a handy GetName method which returns the name of the enum value, instead of you having to write a helper method for each one!

Caveat: Even though it looks like you are just declaring another method for a class, you are not executing in the context of the class – this means you do not have access to the class’ private or protected members.
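Putting the pieces together, a minimal end-to-end sketch might look like the following. The Role members and the EnumExtensions and ExtensionDemo names are hypothetical, chosen only so the example compiles on its own.

```csharp
using System;

public enum Role
{
    // Hypothetical members, for illustration only.
    Guest,
    Editor,
    Administrator
}

public static class EnumExtensions
{
    // "this Enum e" makes GetName callable as if it were an instance method on any enum.
    public static string GetName(this Enum e)
    {
        return Enum.GetName(e.GetType(), e);
    }
}

public static class ExtensionDemo
{
    public static string Describe(Role role)
    {
        // The compiler turns this into a plain static call: EnumExtensions.GetName(role).
        return role.GetName();
    }
}
```

Because the call is really just a static method call in disguise, it only sees what any other outside caller would see, which is why private and protected members stay out of reach. The same method also works on enums you did not write, for example DayOfWeek.Monday.GetName().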

Friend of C#

In the world of Test Driven Development (TDD), the revival of the need for a “friend” access modifier is becoming more and more apparent. Typically, you want to have all your tests in a different assembly (project) to your actual code – for cleanliness, security and responsibility reasons. However, this presents a dilemma – you want to be able to write test cases for all those internal methods. By default, “internal” methods are only exposed to other classes in the same assembly.

So, the typical approach is to make the methods “public” and either (forget to) change them at go live, or have a cluttered API full of methods that are not meant to be called directly. The same goes for variables. C++ had the ever useful “friend” language embellishment to handle these types of scenarios, but C# is caught lacking.

Fortunately, as of .NET 2.0, there is the wonderfully named “InternalsVisibleToAttribute”. This is an assembly-wide (read: AssemblyInfo.cs) attribute that states that all items with the “internal” modifier are accessible to the assemblies identified with this attribute. So how do you identify them? Easy: open up the AssemblyInfo.cs of the assembly you want to expose the internals of (hmm, that almost sounds, well, awkward), and add the following line:

[assembly: InternalsVisibleToAttribute("TestListGenerator.Tests")]

Where TestListGenerator.Tests is the name of the library that you want to have access to this assembly’s internals. Of course, you’ll probably want to move to the world of strong naming to prevent people from getting too friendly with your internals. In which case, after you’ve signed all your projects (it’s easy: right click Project > Properties > Signing > new) you need to get the _full_ public key of the calling project. You can do this with Visual Studio’s sn.exe (included in the path when you use the VS Command Prompt):

sn.exe -Tp TestListGenerator.Tests.dll

NB: the case of the T matters. This will give you something like:

Public key is

Strip out the 0x, and copy the rest of the string. Update your InternalsVisibleTo attribute to be:

[assembly: InternalsVisibleTo("TestListGenerator.Tests, PublicKey=0024000004800000940000000602000000240000525341310004000001000100adfedd2329a0f8

Warning: If you use the public key token instead of the full public key, you’ll end up with an error like the following:

warning CS1700: Assembly reference ‘TestListGenerator.Tests, PublicKey=32ab4ba45e0a69a1’ is invalid and cannot be resolved

Lambda Expressions and Tests

One of the tests I am currently writing for TestListGenerator checks whether the function that gets a list of all the tests in a DLL returns the right number of category associations. The tests come back as a Dictionary<MethodInfo, HashSet<string>> collection. Counting the number of tests returned is easy:

Dictionary<MethodInfo, HashSet<string>> results = ReflectionHelper.GetTestsFromAssembly("TestListGenerator.Tests.TestSubject.dll");
int numberOfTests = results.Count;

However, counting the total number of HashSet entries is a little more difficult. In the past, I would have written something like this:

int count = 0;
foreach (HashSet<string> value in results.Values)
{
  count += value.Count;
}

It works, but it just adds to the mess that is already my test case logic. I am a great believer in chunking; however, coming from a background in Perl, I am fully aware that there is a point where chunking (writing code as succinctly as possible) becomes obfuscation. I think lambda expressions walk a narrow line in this regard (especially when it comes to understanding the method signatures!). Here’s the above code summed up in a lambda expression:

int count = results.Sum(f => f.Value.Count);

The first thing to note is the .Sum() function. You may have noticed that code completion on generic collection objects now lists a whole bunch of methods with a distinctive icon in front of them. These are extension methods – I’ll cover them in a later blog post. For now, just view them as extra methods.

For example, Sum will sum all the objects in the collection. But wait! It takes in this weird “Func<KeyValuePair<keyObject, valueObject>, int> selector” parameter. What is this? Well, it’s a delegate signature. “Func” is a generic delegate type, and what appears inside the <> is a list of type parameters: the input parameter types come first, and the last one is always what the function should return. So, in this case, you need to define a function that takes in a KeyValuePair and returns an int. Then the Sum function will do all the work of adding up all the integers you return.

Fortunately, .NET allows you to define delegates inline. So you can do the following: kvp => kvp.Value.Count. What this means is that kvp is your name for the KeyValuePair<keyObject, valueObject> parameter, and what follows the => is what you want to do with it. So, the Sum method calls kvp.Value.Count for each entry in the dictionary, and then sums up all those integers.
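To see that the lambda really is just a compact delegate, here is a small sketch that computes the same total three ways: with the explicit loop, with a named method assigned to a Func, and with the inline lambda. To keep it self-contained it uses string keys instead of MethodInfo, and the LambdaSketch, CountCategories and CountThreeWays names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LambdaSketch
{
    // A named method with the same shape as the lambda:
    // takes a KeyValuePair, returns an int.
    private static int CountCategories(KeyValuePair<string, HashSet<string>> kvp)
    {
        return kvp.Value.Count;
    }

    public static int[] CountThreeWays(Dictionary<string, HashSet<string>> results)
    {
        // 1. The explicit loop.
        int viaLoop = 0;
        foreach (HashSet<string> value in results.Values)
        {
            viaLoop += value.Count;
        }

        // 2. A named method wrapped in a Func delegate and passed to Sum.
        Func<KeyValuePair<string, HashSet<string>>, int> selector = CountCategories;
        int viaDelegate = results.Sum(selector);

        // 3. The same delegate written inline as a lambda.
        int viaLambda = results.Sum(kvp => kvp.Value.Count);

        return new[] { viaLoop, viaDelegate, viaLambda };
    }
}
```

All three values should always agree; the lambda version simply lets the compiler infer the parameter and return types that the second version spells out by hand.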

Alpha released

Woo hoo 🙂 So I spent some time in B Cup today finishing off a few coffees and some of the loose ends of my Test List Generator project. I am happy to say, that today I released the first version.

You can download the alpha from here:

The alpha allows you to specify a DLL you wish to run a category of tests from. You can then specify the list of categories you want to include, as well as a list of categories you wish to exclude. The code will then search your DLL for tests that match those criteria and generate an MSTest-compatible test list.

Parsing Command Line Arguments

If you’ve ever had to write a console application, you’ve probably always just quickly hacked together a command line parser. This is fine when you are the only person who is using the application, because you implicitly understand that the inputfile should be declared after the outputfile. Try explaining this wisdom to your colleague who just blew away the config file they had been working on for the last couple of hours because they got the arguments the wrong way around 🙂 Or, even worse, “YouWroteThisCode” syndrome, where you realise the incompetent programmer who put the parameters this weird way around was you several years ago.

Fortunately, CSharpOptParse comes to the rescue. If you’ve ever played with Perl (who hasn’t?) it’s based on GetOpt, one of the best command line processing libraries in existence. Basically, you define a really simple object to store all your parameters (very simple, comments axed for brevity on the web):

public class Arguments
{
    public string InputFile { get; set; }
    public string OutputFile { get; set; }
}

Then by using attributes, you define how to map the command line on to each of these parameters:

[Description("The input file to process.")]
public string InputFile { get; set; }

So pretty straight forward right? Then all you need to do is the following magic to parse the parameters:

Arguments arguments = new Arguments();
Parser p = ParserFactory.BuildParser(arguments);
// Parse the args
args = p.Parse(args);

The parser takes all kinds of other options, such as whether to use - (Unix) style or / (Windows) style for specifying parameters. It also supports populating items from the environment if they are not specified on the command line – cool 🙂 Oh, and if you want to support multiple options like I do in TestListGenerator, all you need to do is use a StringCollection, and tell the Parser about it via attributes:

[ShortOptionName('i')]
[OptDef(OptValType.MultValue, ValueType = typeof(string))]
[Description("A list of categories that should be included in the test list generated. NB: You can define multiple categories by repeating the parameter – see examples.")]
public StringCollection IncludedCategories = new StringCollection();

NB: Note the gotcha – you need to make sure the StringCollection is instantiated.

As a little bonus, there is also a really easy way to generate usage information (works like an XmlTextWriter):

UsageBuilder usage = new UsageBuilder();
usage.GroupOptionsByCategory =
“tlg.exe  – Test List Generator for Visual Studio”);

// Generate the list of arguments and descriptions automagically

So there you have it – a simple but elegant solution to your command line woes. Slowly but surely, your code will become less arcane – you’ll thank your past self later.


Do you quite often find yourself putting Console.ReadLine() at the end of each of your console programs so you can see the output before Visual Studio closes the window? And then you find yourself in the embarrassing situation of leaving the code in when it’s deployed, so that whenever you run the application from the command line, it ends up sitting there waiting for you to press a key? Well, next time, wrap it in the following:

if (System.Diagnostics.Debugger.IsAttached)
{
    Console.WriteLine("Press a key to quit...");
    Console.ReadKey(); // only pauses when a debugger is attached
}

Ahhh, Code Generation…

So the VSMDI config file is XML… That’s a good thing, because Visual Studio gives you a few tools for code generation when it comes to XML. Are you sick and tired of writing code to traverse an XML document, or sick of writing lines to create nodes and elements and insert them in the right order? Well, if you’ve already got a sample of the format you want, you can perform this useful shortcut:

  • Fire up Visual Studio, and add the xml file to your project
  • Xml > Create Schema
  • Add the new XSD file to your project
  • Open a command prompt and navigate to the XSD file
  • xsd.exe -c -l:c# -n:NAMESPACE nameofxsd.xsd

And voila, a class to interface with 🙂 Instantiate, manipulate, and then write it out with:

XmlSerializer ser = new XmlSerializer(typeof(TestLists));
using (StreamWriter writer = new StreamWriter("out.xml"))
{
    ser.Serialize(writer, testList);
}
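For a feel of the whole round trip without running xsd.exe first, here is a self-contained sketch. The TestList and TestLists classes below are hand-written, heavily simplified stand-ins for what the generator would produce (the real generated classes will have more members and different names for some of them), and XmlSketch is a made-up helper that serialises to a string rather than a file.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical, hand-written stand-ins for the classes xsd.exe would generate.
public class TestList
{
    [XmlAttribute("name")]
    public string Name { get; set; }
}

[XmlRoot("TestLists")]
public class TestLists
{
    [XmlElement("TestList")]
    public List<TestList> Items { get; set; }

    public TestLists()
    {
        Items = new List<TestList>();
    }
}

public static class XmlSketch
{
    // Instantiate, manipulate, then write the object graph out as XML.
    public static string ToXml(TestLists lists)
    {
        var ser = new XmlSerializer(typeof(TestLists));
        using (var writer = new StringWriter())
        {
            ser.Serialize(writer, lists);
            return writer.ToString();
        }
    }

    // The same serializer reads the XML back into objects.
    public static TestLists FromXml(string xml)
    {
        var ser = new XmlSerializer(typeof(TestLists));
        using (var reader = new StringReader(xml))
        {
            return (TestLists)ser.Deserialize(reader);
        }
    }
}
```

The attributes are what map property names onto element and attribute names, so the generated code is mostly a long list of these mappings derived from the schema.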

VSMDI / TestLists

Ok, so I had my first look at a test list today… It doesn’t look too scary:

<?xml version="1.0" encoding="UTF-8"?>
<TestLists xmlns="">
  <TestList name="Test" id="4965f8d9-7c75-44ec-a6a7-1ddc90be355c" parentListId="8c43106b-9dc1-4907-a29f-aa66a61bf5b6">
      <TestLink id="0c16c787-7ced-9874-280d-95763867baa5" name="OneCategory" storage="testlistgenerator.tests.testsubjectbindebugtestlistgenerator.tests.testsubject.dll" type="Microsoft.VisualStudio.TestTools.TestTypes.Unit.UnitTestElement, Microsoft.VisualStudio.QualityTools.Tips.UnitTest.ObjectModel,   PublicKeyToken=b03f5f7f11d50a3a" />
  <TestList name="Lists of Tests" id="8c43106b-9dc1-4907-a29f-aa66a61bf5b6">
    <RunConfiguration id="d0fd86fa-f02f-47a6-ac94-1e966ef3564e" name="Local Test Run" storage="localtestrun.testrunconfig" type="Microsoft.VisualStudio.TestTools.Common.TestRunConfiguration, Microsoft.VisualStudio.QualityTools.Common,   PublicKeyToken=b03f5f7f11d50a3a" />
  <TestList name="Test 2" id="d2c82856-b727-489a-97ee-21e164e205ed" parentListId="8c43106b-9dc1-4907-a29f-aa66a61bf5b6">
      <TestLink id="aebae68c-f6ca-a80b-2c43-c4ddd95e4109" name="RepeatedCategories" storage="testlistgenerator.tests.testsubjectbindebugtestlistgenerator.tests.testsubject.dll" type="Microsoft.VisualStudio.TestTools.TestTypes.Unit.UnitTestElement, Microsoft.VisualStudio.QualityTools.Tips.UnitTest.ObjectModel,   PublicKeyToken=b03f5f7f11d50a3a" />
      <TestLink id="7c44089f-5995-0c35-6e3b-21a742dd47a9" name="TestGetTestCategories_Class1" storage="testlistgenerator.testsbindebugtestlistgenerator.tests.dll" type="Microsoft.VisualStudio.TestTools.TestTypes.Unit.UnitTestElement, Microsoft.VisualStudio.QualityTools.Tips.UnitTest.ObjectModel,   PublicKeyToken=b03f5f7f11d50a3a" />

“Lists of Tests” is the default test list in Visual Studio. The other two are test lists that I made. From the looks of things, each TestList has a parent TestList as well as a list of tests (TestLinks) that are in the TestList. My concerns are the ids – what exactly are they, and how are they generated?

A quick search on google turns up this thread. Hmm, helpful – I even commented on it. So now I know how the ids are generated for TestLinks. However, I can’t seem to work out how they are generated for TestLists – this could be a problem 😦