Monthly Archives: May 2009

Microsoft Parallel Extensions – Hard“core” code made easy!

Ok, so I must admit that I was a little disappointed last night. I was really looking forward to implementing a threaded Consumer / Producer to try and take advantage of our 16-core data warehouse server. Instead, I ended up changing one line of code. Welcome to the world of Microsoft Parallel Extensions.

The Background (feel free to skip this)

At my work, it’s really important for us to be able to detect if people are sharing their accounts. We do this by storing a permanent cookie on the user’s machine with a “Machine ID” (a GUID we generate). When they log in from that machine, we log the ID so we know the user was on a particular machine. The more machines, the more likely they are sharing their account.

This worked well, apart from the fact that people tend to clear their cookies, leading to an artificial rise in “Machine IDs” over time. Three years ago, I came up with an algorithm to work out which machine IDs were due to cookie clearing and which weren’t. So that I don’t bore you to death, I will leave the details for another post.

The Details

At the core of my problem was the following loop:

foreach (string user in userlist)
{
  // Do some stuff
  // Commit it to the database
}

This was going through around 60,000 users and more than 4 million records, running a pretty complex algorithm on them. It took around 10 minutes to run, but when I looked at the performance monitor, I noticed that the total CPU usage rarely got over 4% and that only one CPU spiked. Hmm, a common sign that threads would help 🙂 One of my pet hates is having to write really complicated and unintelligible code just to “increase” performance, and threads, no matter how hard you try, always end up looking ugly!

The Solution

Some neuron at the back of my brain started firing, and I got this feeling that I knew a better way. F#? Nope, that’s not it; it would be helpful, but it would take me a long time to convert everything into functional programming. A quick Google search bore fruit: this article I read on the Microsoft Parallel Extensions library back in 2007! Microsoft are now planning on releasing this as part of .NET 4.0, but a CTP for 3.5 still exists.

After installing the MSI and then adding the reference to System.Threading, all I needed to do was to make this simple change:

Parallel.ForEach(userlist, (username) =>
{
 // Same code as before
});

It was that simple! Voila, multi-threaded code. If you want more control over the threads created, you can use the TaskManagerPolicy object; a simple way to do this is outlined here.
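
For anyone on the .NET 4.0 version rather than the 3.5 CTP, here is a minimal sketch of capping the number of concurrent workers. It assumes the System.Threading.Tasks API as it ships in .NET 4.0, where ParallelOptions.MaxDegreeOfParallelism does this job. ProcessUser, the sample user names and the limit of 4 are all made up for illustration; they are not from my real code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class Program
{
    // Hypothetical stand-in for the real per-user work
    // (run the algorithm, then commit to the database).
    static void ProcessUser(string user)
    {
        Console.WriteLine("Processed " + user);
    }

    static void Main()
    {
        // Hypothetical sample data; the real list held around 60,000 users.
        var userlist = new List<string> { "alice", "bob", "carol" };

        // Cap the loop at 4 concurrent workers (an arbitrary number for illustration).
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

        // Same loop body as before; the order in which users are processed is not guaranteed.
        Parallel.ForEach(userlist, options, user => ProcessUser(user));
    }
}
```

Leaving the options out lets the library decide how many threads to use, which is usually a sensible default on a many-core box.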

Caveat: Your code inside the loop has to be thread safe. Since none of my code was modifying shared objects, this wasn’t a concern for me.
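
If your loop does need to touch shared state, here is a rough sketch of one way to keep it safe. It assumes the .NET 4.0 types ConcurrentBag<T> (in System.Collections.Concurrent) and Interlocked.Increment; the user names and the "flag long names" rule are invented purely for the example and have nothing to do with my actual algorithm.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // Hypothetical sample data.
        var userlist = new List<string> { "alice", "bob", "carol", "dave" };

        // Shared state: a counter and a collection written to by many threads.
        int processed = 0;
        var flagged = new ConcurrentBag<string>();

        Parallel.ForEach(userlist, user =>
        {
            // Made-up rule for illustration: flag names longer than 3 characters.
            if (user.Length > 3)
            {
                flagged.Add(user); // ConcurrentBag is safe for concurrent adds
            }

            // A plain processed++ would be a race; Interlocked makes the increment atomic.
            Interlocked.Increment(ref processed);
        });

        Console.WriteLine("Processed: " + processed);   // Processed: 4
        Console.WriteLine("Flagged: " + flagged.Count); // Flagged: 3
    }
}
```

A plain List<string> and a bare counter in the same loop could lose updates or throw, which is exactly the kind of bug that only shows up once the code runs on many cores.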

The Result

[Image: TaskMonitory]

Beautiful! The task now takes under a minute instead of 10.
