concurrency and technology30 May 2008 09:58 am

I was reading Michael Feather’s blog the other day and he mentioned how he thinks lambdas may create a tension between clutter & expressiveness.  In the spirit of consensus building, I’ll tend to agree that like any tool they can be overused or misapplied.  But, there are many extremely useful ways in which they actually reduce clutter, improve readability and quality, and serve as tools to help us refactor and test our code.

Thinking about this on the plane

Last week I was in sunny & warm California talking to some customers and I picked up Game Programming Gems 6 from the library.  One of the most interesting articles to me was an article on using a variant of the Levenshtein string matching algorithm to help find close matches in library routines for maps and other developer / designer created content.

Loved the article, hated the code

Let’s be really, really clear about this. I loved this article; I’m really a geek about this sort of thing.  Unfortunately when I read the code example, like Harrison Ford in now two movies I can only say… "I have a bad feeling about this"

Here it is:

// This is an example of a template function that can be used to find
// a closest match in a std map, assuming the map uses strings as
// the primary index. Note that we must search through every string
// in the map to find the closest match, so it is a very
// slow function: O(n), which is linear time. It’s handy for
// debugging during development, but should not be used in
// release builds.
template<class _InIt>
inline _InIt closestMatch(_InIt _First, _InIt _Last, char const *val, float limit)
{
   float maxVal = limit;
   _InIt _max = _Last;
   for (; _First != _Last; ++_First){
     float newVal = stringMatch(_First->first.c_str(), val);
     if (newVal > maxVal)
    {
        maxVal = newVal; _max = _First;
     } 
   } 
   return (_max);
}

The first thing I noticed about this template function is that it the naming convention seemed a bit inconsistent.

Then I noticed something more problematic,  this function looks awfully familiar.

In fact it looks almost identical to std::max_element or std::accumulate depending on the mood your in; even the parameters are the same.  Only rather than reusing the existing STL function it really looks like accumulate was copied and pasted into the sample and then edited.  Here’s max_element from my VS installation, I hope it’s clear how similar this code is:

// TEMPLATE FUNCTION max_element WITH PRED
template<class _FwdIt, class _Pr>
inline _FwdIt _Max_element(_FwdIt _First, _FwdIt _Last, _Pr _Pred)

   // find largest element, using _Pred
   _FwdIt _Found = _First;
   if (_First != _Last)
      for (; ++_First != _Last; )
         if (_DEBUG_LT_PRED(_Pred, *_Found, *_First))
            _Found = _First; 
   return (_Found);
}


Let’s clean this up

I’d also like to take a moment to share where I’d like to take this, but showing my hand there’s an extraneous function call here and I think that we can get a lot closer to code that looks like this:

template<class InputIterator>
inline InputIterator newClosestMatch( InputIterator begin,
                                                             InputIterator end, 
                                                             std::string val, float limit
                                                             )

       return std::max_element(begin,end,[&](…){….});
}

But first we have to unravel it a bit.  Perhaps it isn’t so hard…  We need to note that we’re using the stringMatch function as our comparison function.  The key piece to unravel is the call to stringMatch to retrieve the comparison number and that this can be re-expressed in a comparison function as:

      stringMatch(leftString, val) < stringMatch(rightString,val)

The next important change is that we need to refactor slightly to work with the collection element (in this case it’s a hash_map) directly as follows:

typedef std::pair<std::string,int> StrPair;

template<class InputIterator>
inline InputIterator newClosestMatch( InputIterator begin,
                                                             InputIterator end, 
                                                             std::string val, float limit
                                                             ) {  
   return std::max_element( begin, end, 
          [&](const StrPair& left, const StrPair& right){  
             return ( stringMatch(left.first.c_str(),val.c_str()) < 
                          stringMatch(right.first.c_str(),val.c_str()) ); 
    };
}

Oh and by the way we’re done now and we basically have reduced several lines of user written code with declarative code that uses a single lambda to compute the expression.

I dunno about you, but this makes me very happy.

concurrency and technology21 Apr 2008 04:53 pm

OK, so I promised to add some more traces showing speedups of real user applications, i.e. not developer tools or benchmark suites.  The weekends over, but I thought that I would add an application that I’m confident a lot of people are using… Internet Explorer 7.

How does IE use 4-cores?

Well, if you remember that IE let’s you have multiple home pages in multiple tabs.  I use this feature to track the news sites I read and my web-mail, and it’s definitely using so here’s what it looks like on a quad core:

ie_4way

You should be able to see that it’s taking about 7.5 seconds to launch IE then the CPU to settle down.  More importantly, there’s a period of approximately 2 seconds between the 3.5 &  5.5 second tickmarks where the CPU utilization is at > 70%.

What about 8-cores?

I happen to be fortunate enough to run this same scenario on a dual-socket quad core and so I collected a trace on that as well.  Keep in mind that the clock speeds of these 2 are completely different so comparing benchmarks between them is quite inadvisable, but I’m going to do it anyways :).

You can see in the capture below that on the 8-way the time has been reduced to 5.5 seconds… it’s still using > 70% of the CPU but now it’s 70% of  8 cores instead of 4…

ie_8way

And that period of high CPU utilization (between 8.5 seconds & 9 seconds on this box) is now only about a second long.

Hey it didn’t get twice as fast, why is this good?

 

I could very easily have shown a paint.net application getting twice as fast as it did on my quad-core, but that’s actually not very interesting.  The point here is that we took a look at an existing application that was easy to find use that was already multi-threaded and was designed to take advantage of inherent latencies in I/O and showed that some decent speedups were available to be taken advantage of with a multi-core system.   This is really just an opportunistic performance win and real users,  using real applications will see some noticeable improvements without installing developer tools or running gadzillions of applications at once (and don’t knock that – I think I have about 40 windows open on my desktop right now).

Perhaps in a future post, I’ll post a couple ideas of how to take even better advantage of multiple cores using something like the Task Parallel Library or PLINQ, but for now that’s it.

-Rick

Uncategorized19 Apr 2008 09:42 am

Continued from "Is 4 cores really the end? (Part 1)".  I see Jeff has posted another blog in response and it looks like he noticed the application  I use all the time…   paint.net.  I use this a lot because it’s free, multi-threaded and it also happens to be open source.  As I mentioned, most of the filters are multi-threaded and here’s a trace from one, it’s got all 4 cores pegged at 100% for about 7 seconds. I didn’t try hard to find this at all, I just picked a blur filter that I use frequently and ran it…

paintdotnet 

 

I’m afraid that since it’s Saturday and I’ve a full schedule, this post is also going to be a bit short, but I will come back and add some more traces over the weekend.

« Previous PageNext Page »