I’ve still a few more XNA articles I’ve been planning to write. The next one was to involve some threading code, and in the process of creating the article I hit upon something that I found concerning.
The title of this article pretty much gives it away before I can casually introduce it: the Thread.CurrentThread property is slow on the Xbox. Specifically, it’s slow compared to running the same code on a Windows PC. I’d imagine the same applies to the other Compact Framework platforms too, the Zune and Windows Phone 7, but I can’t say for sure.
Bone-headed code
I stumbled upon this whilst trying to profile some of my code for a future threading article. I had two different techniques I was comparing, and I couldn’t for the life of me understand why one was an order of magnitude slower than the other. It just didn’t make any sense.
What I’d done in one of them was to assume that Thread.CurrentThread is cheap, and simply returns some kind of easy to grab variable. This isn’t the case. I can illustrate by showing you the exact piece of code that was slow:
for ( int i = 0; i < MAX_THREADS; i++ )
{
    if ( m_data[i].m_thread_id == Thread.CurrentThread )
    {
        return m_data[i].m_data;
    }
}
Then I changed my slower technique to ‘match’ the faster one, by prefetching the current thread:
Thread thread_id = Thread.CurrentThread;
for ( int i = 0; i < MAX_THREADS; i++ )
{
    if ( m_data[i].m_thread_id == thread_id )
    {
        return m_data[i].m_data;
    }
}
Yes, seems a no-brainer doesn’t it? In most cases it does make sense to prefetch variables like that, but I’d been a little silly and assumed there would be negligible difference from not doing so. I’d had some harebrained idea that the optimizing compiler and the JIT-er would effectively make these two bits of code the same. On my Windows PC build they actually do execute in almost the same amount of time; the gap is nowhere near as large as it is on Xbox.
It makes total sense that it doesn’t get optimized on either platform though. If you look at Thread.CurrentThread in .NET Reflector, it calls an external DLL function. The compiler (or JIT-er) can’t be expected to know that the return value of that function will be the same for each iteration of the loop.
So, how slow is it?
My first port of call after I realized where the big timesink was, was to directly compare the performance between the Windows PC build and the Xbox 360 build. My test code was as follows, a loop accessing Thread.CurrentThread a million times:
Thread thread = null;
for ( int i = 0; i < 1000000; i++ )
{
    thread = Thread.CurrentThread;
}
Here’s how long it took on each platform:
Platform | Time
Windows PC (Release build, not in debugger, JIT-ed) | 0.0088s
Xbox 360 (Release build, not in debugger, JIT-ed) | 4.95s
Calling Thread.CurrentThread on the Xbox 360 is over five hundred (500!) times slower than on PC. That’s not good! The PC I’m using is nothing special; it’s a three-year-old Core Duo ‘1’ laptop. So I’m not running some ridiculous PC rig to see these large differences in speed.
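For anyone wanting to repeat the measurement, here is a minimal sketch of a timing harness around the same loop. It assumes System.Diagnostics.Stopwatch is what does the timing, which may not match how the figures above were gathered, and the class and method names are made up for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper, not part of the original test code.
public static class CurrentThreadTiming
{
    // Reads Thread.CurrentThread 'iterations' times and returns the elapsed seconds.
    public static double TimeCurrentThread( int iterations )
    {
        Thread thread = null;
        Stopwatch watch = Stopwatch.StartNew();
        for ( int i = 0; i < iterations; i++ )
        {
            thread = Thread.CurrentThread;
        }
        watch.Stop();

        // Touch the result after the loop so the reads are not obviously dead code.
        GC.KeepAlive( thread );
        return watch.Elapsed.TotalSeconds;
    }
}
```

Calling CurrentThreadTiming.TimeCurrentThread( 1000000 ) from a Release build, outside the debugger, matches the conditions in the table. Run it a few times and ignore the first result, since that one includes the JIT cost.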
For the code I was working on, I modified it to call Thread.CurrentThread once at the start of my worker thread. I then pass this Thread object through to the other methods that I used, which are called often within that worker thread. So where these methods used to call CurrentThread multiple times themselves before, they no longer needed to. The Thread object for this worker thread was simply passed around instead. The code doesn’t look quite as nice of course, but it’s hardly a disaster.
The particular test I was doing with that worker thread code ran over one hundred times faster with that change made. This was still a bit of a micro-benchmark though, calling some small pieces of code a ridiculous number of times… Regardless, I was just trying to battle-test something that I’d want to use in practice. Using Thread.CurrentThread was an integral part of that.
What I ended up doing was adding the facility to optionally pass in a Thread object if the user has one. If they don’t, the code falls back to the slow Thread.CurrentThread instead. This seemed like an ideal solution.
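A rough sketch of that arrangement follows. The class name, the Register method and the value of MAX_THREADS are assumptions made up for this example; only the lookup loop mirrors the code shown earlier. It uses a pair of overloads rather than an optional parameter.

```csharp
using System;
using System.Threading;

// Hypothetical container, for illustration only.
public sealed class PerThreadData
{
    private const int MAX_THREADS = 4; // assumed size

    private struct Entry
    {
        public Thread m_thread_id;
        public object m_data;
    }

    private readonly Entry[] m_data = new Entry[MAX_THREADS];

    // Assumed to be called during setup, before the worker threads start looking things up.
    public void Register( int slot, Thread thread, object data )
    {
        m_data[slot].m_thread_id = thread;
        m_data[slot].m_data = data;
    }

    // Fast path: the caller already holds its Thread object and passes it in.
    public object Find( Thread thread_id )
    {
        for ( int i = 0; i < MAX_THREADS; i++ )
        {
            if ( m_data[i].m_thread_id == thread_id )
            {
                return m_data[i].m_data;
            }
        }
        return null;
    }

    // Slow path: no Thread supplied, so pay for one Thread.CurrentThread call.
    public object Find()
    {
        return Find( Thread.CurrentThread );
    }
}
```

A worker thread would read Thread.CurrentThread once at the top of its entry method and hand that object to Find( Thread ) from then on. The parameterless overload is there for callers that don’t have one to hand.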
CurrentThreadLite
When looking at the CurrentThread property accessor in .NET Reflector, I traced through to the DLL function which is ultimately called:
public static Thread CurrentThread
{
    get
    {
        return PAL.Threading_Thread_CurrentThread();
    }
}

[DllImport("mscoree", EntryPoint="#210")]
public static extern Thread Threading_Thread_CurrentThread();
Listed alongside this particular function was another, similarly named one, CurrentThreadLite:
[DllImport("mscoree", EntryPoint="#211")]
public static extern IntPtr Threading_Thread_CurrentThreadLite();
That’s interesting. It returns a different type, an IntPtr as opposed to a Thread object. But for my purposes I just wanted to uniquely identify a thread. If it’s faster, then it’s ideal for what I need. The suffix ‘lite’ sounds promising!
I tried to see what .NET code actually uses it. It’s these methods:
That’s at least within the .NET assembly that I was inspecting. There are probably other .NET library methods that make use of it. I’m not too experienced with Reflector, so I’m totally unsure of a good way of finding them.
Lite execution
So I was curious about whether this ‘lite’ version was faster than the ‘full’ version on Xbox 360. A Google search yielded an extremely small number of results for CurrentThreadLite, and variations of it. I figured that as part of this small article I may as well have a little go at seeing if it does perform better. It may help someone else out in future who stumbles upon this page.
The main problem though is that there’s no direct accessor to it. Nothing at all. The StringBuilder object makes use of it to maintain thread-safety on its mutable strings, which I’ve touched on in previous articles. Using the same reflection method I described in one of those articles, I can grab the private member that stores the ‘IntPtr’ thread.
This does mean that I’ll be adding the overhead of both reflection and, likely, some allocations within StringBuilder too. Given the limited set of methods I have to play with, I doubt I’ll end up with anything workable within a game. This really is purely a bit of an educational investigation now; unfortunately there’s not going to be anything that I’d consider usable.
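To give an idea of what grabbing that private member might look like, here is a hedged sketch. The helper name is made up, and because the actual field name inside the Compact Framework’s StringBuilder isn’t given here, the sketch simply searches for the first non-public instance field of type IntPtr rather than looking it up by name. SampleOwner is a stand-in class so that the example behaves the same on any runtime.

```csharp
using System;
using System.Reflection;

// Hypothetical helper, for illustration only.
public static class PrivateFieldReader
{
    // Looks for a non-public instance field of type IntPtr declared on the
    // target's own type, and reads it. Returns false if there is no such field.
    public static bool TryReadIntPtrField( object target, out IntPtr value )
    {
        value = IntPtr.Zero;
        if ( target == null )
        {
            return false;
        }

        FieldInfo[] fields = target.GetType().GetFields(
            BindingFlags.Instance | BindingFlags.NonPublic );

        foreach ( FieldInfo field in fields )
        {
            if ( field.FieldType == typeof( IntPtr ) )
            {
                // GetValue boxes the IntPtr, so this allocates on every call.
                value = (IntPtr)field.GetValue( target );
                return true;
            }
        }
        return false;
    }
}

// Stand-in for a type that privately stores an IntPtr (an assumption for the demo).
public sealed class SampleOwner
{
    private IntPtr m_handle = new IntPtr( 42 );
}
```

Passing a StringBuilder instead of SampleOwner is the idea being described. Whether it finds anything depends entirely on the runtime’s StringBuilder implementation, and the boxing in GetValue is exactly the kind of allocation that makes this unattractive inside a game loop.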
A crude performance test
I picked out one StringBuilder method which calls CurrentThreadLite: Append(String). You can see the source to this method here:
http://labs.developerfusion.co.uk/SourceViewer/view/SSCLI/System.Text/StringBuilder/
The .NET library code, though, calls a method with a different name from the one Reflector shows us. Its prototype is:
[MethodImplAttribute(MethodImplOptions.InternalCall)]
static internal extern IntPtr InternalGetCurrentThread();
I think I can safely assume this just calls through to CurrentThreadLite.
Anyway, if I pass ‘String.Empty’ as a parameter to Append() it’ll end up doing very little work. CurrentThreadLite will still be run though, so it should be a good indicator as to whether it’s faster to execute than the ‘full’ one.
Here’s the code I’m testing with, same one million-times loop as before:
StringBuilder string_builder = new StringBuilder();
for ( int i = 0; i < 1000000; i++ )
{
    string_builder.Append( String.Empty );
}
Here are the results:
Platform | Time
Xbox 360 (Release build, not in debugger, JIT-ed) | 0.407s
So it’s over ten times faster than the ‘full’ CurrentThread property accessor, the one I tested further up this article. That’s pretty cool! It makes sense that StringBuilder would use this over CurrentThread where it’s available. If it didn’t, any mutating operation on a StringBuilder would be ten times slower than it is now. That said, it still lags a lot behind PC, being around fifty times slower than that platform.
To reiterate though, unfortunately there’s no exposed direct access to CurrentThreadLite. As far as I can tell there’s no workable way of making use of it. I’d love someone to tell me otherwise though; it’d be awesome if I could use it.
StringBuilder though is still pretty slow on Xbox 360
In a previous article, I profiled some string manipulation code I’d written. Even though I was implementing alternatives to the methods StringBuilder provides, in the end I still called its Append(). So I’d be incurring the hit from calling CurrentThreadLite too. It probably explains why the Xbox was running over ten times slower than PC.
I wonder if I can use reflection to grab the raw char[] data, and the string length. Then as long as I keep my code thread-safe by other means, I could work directly on the char[] buffer and concatenate strings myself. This would effectively do away with the overhead of checking what the current thread is, which is within all the existing StringBuilder mutable methods. Even though CurrentThreadLite is seemingly no slouch, it's still not exactly cheap.
Something to think about… If it brings performance of StringBuilder in line with PC, then it certainly seems worth it. Whether it's worth it for an XNA game though is arguable. Still, I found it interesting looking into this corner of C#.
Comments
I really like these micro-coding-optimization articles.
Like you, I found no public alternative to CurrentThread.
Regarding the speed of StringBuilder, forgetting for a moment that it advantages Concat since it mutates an existing string with no garbage generation, I wonder which one is faster: StringBuilder or String.Concat?
-
Thanks!
Here’s a couple of articles which attempt to answer what you mention:
http://dotnetperls.com/stringbuilder-performance
http://www.heikniemi.net/hardcoded/2004/08/net-string-vs-stringbuilder-concatenation-performance/
Don’t forget that Reflector doesn’t always help with very low-level stuff, which can be implemented differently on the 360.