Uncategorized

A simple refactoring to compile time interfaces

Recently I had an opportunity to do some refactoring in a raytracer that we use for demos at work.  I don’t how you feel about refactoring, but I personally find it an immensely calming and rewarding experience because it allows me to make improvements to quality and design in a very incremental and manner, even to old or large code bases.

I also really enjoy generic programming and I wanted to share this refactoring because it is a simple example of how using generic programming to rely on a compile time interface can reduce overall code. It also doesn’t rely on techniques that can make generic programming inaccessible like compile time polymorphism, typelists or the curiously recurring template pattern.

It’s borderline too simple to write down, but I’m doing it anyways so here it is…

Two similar classes a float3 and a color3

Before the refactoring, the raytracer had 2 classes that were used all over the place in the code, float3 and color.  Both of these are simple structs that had 3 data members.  Their definitions are incredibly similar and looked like this:

struct color3{
   float r;
   float g;
   float b;
   color3(){}
   color3(float r_, float g_, float b_):r(r_),g(g_),b(b_){}
};
struct float3{
   float x;
   float y;
   float z;
   float3(){}
   float3(float x_, float y_, float z_):x(x_),y(y_),z(z_){}
};

Sadly, 2 sets of methods to operate on the classes

Unfortunately, both of these classes also initially had strongly typed supporting functions like ‘add’ which were incredibly similar, only the float3 version used x,y and z while the color version used r, g, b:

color3 add(color3& c1, color3& c2)
{
   return color3(c1.r + c2.r, c1.g + c2.g, c1.b + c2.b);
}
float3 add(float3& v1, float3& v2)
{
   return color3(v1.x + v2.x, v1.y + v2.y, v1.z + v2.z);
}

Unfortunately while incredibly straightforward and seemingly simple, this isn’t very desirable. There were 4 or 5 functions like this that were essentially identical with the only difference being the types and member variable names.

These methods can be made generic

What I wanted was a simple way to write the add function once that added no additional size or instruction overheads to the structs, since both of these structs are used frequently inside a very tight loop, even small changes will have a noticable performance impact.  For example one of the refactorings I did removed a single call to the copy constructor and that increased the performance by about 50ms per frame ~12.5% (seriously) the scene I was looking at went from ~4 fps to ~5 fps on my quad core.

Here’s what I decided to do, I’ll explain in a moment, first I created a template function ‘add’:

template <class vec>
inline vec add(vec& v1, vec& v2)
{
    return vec(v1.first()  + v2.first(),
               v1.second() + v2.second(),
               v1.third()  + v2.third());
}

 
This template function works with two references to type ‘vec’; and relys on the member functions first, second and third to be present in vec because it uses them.
Implementing each of these methods is straightforward:

// implementation of first for float3
inline float first()
{
    return x;
}

// implementation of first for color3
inline float first()
{
    return r;
}

 
This may seem inefficient, but it turns out not to be. 
 
Because I’m relying on a compile time interface rather than a virtual function there is no additional overhead.  In this method the return value is a simple built in type, but even for larger classes when the value is constructed as part of the return statement a compiler may optimize away the copy construction, and most do.  This is known as the return value optimization and the C++ spec explicitly allows for this (section 12.8).
 

Did it make a difference?

As far as code reduction goes, I removed 7 member functions from one class and turned the other 7 into free functions and I added a total of six member functions for the first, second and third.   So this was a net minor change from a lines of code perspective, but there are no longer 2 sets of functions with similar names and functionality.
 
I also made it possible to provide mock classes to those free functions for testing, before they were strongly typed and this just wasn’t possible.  Now if I want I can test ‘add’ without relying on float3 or color3.
 
Additionally, if new functions are needed that work on these types, I no longer have to add them to both classes. I can simply add the function that relies on first, second and third and this can instantly be used with both types.
 
Finally, if I’m only looking at reading properties there is now no reason to expose publicly the member variables and this is another massive win because I can protect against potentially non-thread safe data writes.
 
-Rick

Uncategorized

Comments (0)

Permalink

Sleeping Barbers an unsafe design

Inspired by some pretty cool slides by Mike Acton that came through my twitter feed last week, I implemented a version of the Sleeping Barber problem.

For those not familiar with the problem, it’s a story that illustrates some of the perils associated inter process / inter thread synchronization, wikipedia describes it as:

The analogy is based upon a hypothetical barber shop with one barber, one barber chair, and a number of chairs for waiting customers. When there are no customers, the barber sits in his chair and sleeps. As soon as a customer arrives, he either awakens the barber or, if the barber is cutting someone else’s hair, sits down in one of the vacant chairs. If all of the chairs are occupied, the newly arrived customer simply leaves.

Wikipedia was also kind enough to provide pseudo-code for he boring three semaphore solution so an implementation was straightforward and only took a few minutes. 

Here’s the skeleton of the outline

Here’s the snippets of the outline, first 3 semaphores and a counter:

 cooperative_semaphore customers_semaphore(0,num_customers,L"C");
 cooperative_semaphore barber_semaphore(0,num_barbers,L"B");
 cooperative_semaphore seats_semaphore (1,num_seats,L"S");
 int free_seat_counter = num_seats;

Uncategorized
concurrency
concurrency runtime
parallelism

Comments (2)

Permalink

Notes from PDC 2008: Day 1

It’s been a very long but great day today at PDC. I talked earlier in the day on the Parallel Pattern Library, Asynchronous Agents Library and Concurrency Runtime and demonstrating the CTP of Visual Studio 2010.   Hazim Shafi also spoke on multi-threaded and parallel performance.  I spent the latter portion of the day talking to developers at the booth we’re sharing with the High Performance Computing team.

There will be several more talks from the Parallel Computing Platform team at PDC this week, our architect Niklas will be doing a deep dive on the Concurrency Runtime and discussing extensibility, Daniel Moth will be talking soon about the Task Parallel Library and PLINQ and there’s several Symposium sessions lined up.

While I’m here at the PDC I’ll post on my personal blog summarizing experiences and I’ll also try to get a few more examples posted either here or on our team blog which demonstrate the functionality included in the CTP.  That’s it for now though, it’s been a long day. 

-Rick

Technorati Tags:

Uncategorized

Comments (0)

Permalink

Get Adobe Flash playerPlugin by wpburn.com wordpress themes