Threading, Parallelism, Multi-Core CPUs and Windows 7

As a software developer, I don't normally get the chance to stay up to date with all the latest hardware developments. I'm not into the overclocking scene either (I know enough to build a system but not enough to tweak frequency multipliers and voltage levels) and I usually only brush up on my knowledge whenever I put together a new rig (I had no idea the HTPC acronym even existed two weeks ago, and I was surprised to see mini-ITX boards and even some picos going mainstream).

Nevertheless, putting together a new general-purpose desktop PC got me thinking about what CPU I should invest in: a dual-core with a higher clock speed and more L2 cache per core, or a more expensive quad-core with a lower clock speed and less cache per core... The CPU market used to be much simpler (or more deceptive) five years ago: you'd just go for the highest clock speed you could get your hands on. Those in the know knew that clock speed was only part of the story and that L1/L2/L3 cache and bus speed also made a difference, but getting the highest values in all three would give you the performance winner.

Since manufacturers have run up against practical limits on clock speed (due to heat, size and miniaturization constraints), the recent focus has been on creating dual-core, quad-core and the upcoming octo-core CPUs. The more cores a CPU has, however, the more expensive it is, and typically each core has a lower clock speed and less cache. So the question is: are 4 slower cores better than 2 faster cores or 1 bed-side heater?

The typical answer is that it depends on the applications you'll be running. Gamers typically preferred to stick with highly clocked single cores when dual cores were coming out, and are now generally sticking with the faster dual-cores as quad-cores enter the market. The reason is that multiple slower cores are typically better when running multiple non-intensive applications (as the applications can be distributed across the cores). A processing-intensive game, on the other hand, is a single application which, depending on how it was programmed, may only be able to run on one core. So if you have a quad-core system and your game is only using one of the cores, the other three may be virtually idle. In this case it makes more sense to get a dual-core where each individual core is faster.

But what about general desktop usage? Word, Office, Internet, development, video editing, CD/DVD burning, etc... 

My laptop is equipped with a 1.2GHz dual-core ULV CPU and runs Windows XP Pro SP3. I chose XP over Vista thinking that, since it's an older OS, it should run better on a slow CPU. I recently installed Windows 7 RC on a second partition, however, and even with Aero enabled it's just as fast as my streamlined XP build, if not faster. This seems a little odd considering all the cool new effects, translucencies and whatnot. Sure, a lot of that has to do with the OS off-loading the new effects to the GPU, but could it also be because Windows 7 is optimized for multi-core architectures? This and this article seem to suggest so:

"Win32 was never designed for highly concurrent, asynchronous processing. Parallelism requires adjustments at every level of the stack [it involves] the repartioning of different tasks to different layers. . ."

As a software developer this is not surprising in the least and makes a lot of sense. I said before that a game, depending on how it was programmed, may only be able to run on one core. That's because to run on multiple cores, the program needs to be carefully multi-threaded. That is, the program must be able to execute multiple tasks concurrently (parallel processing). Developers typically hate multi-threading as it adds significant complexity and can be a nightmare to debug. Thus traditional software, especially games, has been mostly synchronous, as there was no good incentive to multi-thread. On a single-core CPU, threading adds no throughput benefit and can even slow things down a little (there are some small overheads involved in parallelism such as synchronisation, thread monitors, additional objects and event handlers, etc).

A typical Windows application running on a single-core CPU can usually get away with just two threads: one for updating the UI and responding to user input, and another for performing lengthy background operations (I've blogged about this before). The two threads are needed not because they make the application faster, but because they allow the user to see the progress of a long-running operation and perhaps to cancel or pause it. An example of an operation that is not threaded is clearing the temporary files and history in Internet Explorer: when you do so, IE hangs until the delete is complete.
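To make the two-thread idea concrete, here is a minimal sketch in Java. The names (`BackgroundTaskDemo`, `LongJob`, `percentDone`) are made up for illustration, the "work" is just a short sleep, and the main thread stands in for a real UI thread. It only aims to show the shape: one thread does the slow job while the other stays free to report progress or cancel.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class BackgroundTaskDemo {

    /** Hypothetical long-running job that reports progress and can be cancelled. */
    static class LongJob implements Runnable {
        private final int totalSteps;
        private final AtomicInteger completed = new AtomicInteger();
        private final AtomicBoolean cancelled = new AtomicBoolean();

        LongJob(int totalSteps) {
            this.totalSteps = totalSteps;
        }

        @Override
        public void run() {
            // Check the cancel flag before every step so a cancel takes effect quickly.
            for (int i = 0; i < totalSteps && !cancelled.get(); i++) {
                doOneStep();
                completed.incrementAndGet();
            }
        }

        // Placeholder for real work (deleting a file, encoding a frame, ...).
        private void doOneStep() {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                cancelled.set(true);
                Thread.currentThread().interrupt();
            }
        }

        void cancel() {
            cancelled.set(true);
        }

        int percentDone() {
            return completed.get() * 100 / totalSteps;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LongJob job = new LongJob(200);
        Thread worker = new Thread(job, "worker");
        worker.start();

        // The main thread plays the role of the UI thread: it never blocks for
        // long, so it could repaint a progress bar or call job.cancel() here.
        while (worker.isAlive()) {
            System.out.println(job.percentDone() + "% done");
            worker.join(50);
        }
        System.out.println("Finished at " + job.percentDone() + "%");
    }
}
```

Note that nothing here finishes sooner than a single-threaded version would; the second thread only buys responsiveness.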

Things are very different in the dual/quad/octa-core and beyond world however. Running multiple threads now splits the work across multiple cores, and thus increases total throughput. Think about it logically: if an application fully utilizes a 2.0GHz quad-core, then in the ideal case its aggregate throughput is comparable to a single core running at 8GHz, or potentially 16GHz on an equivalent octa-core CPU! In practice any part of the program that cannot be split across cores drags that figure down. Windows XP was designed before multi-core desktops were common, which may be why it can actually feel slower on a quad-core CPU: its context-switching, multi-tasking and internal processing were not tuned to spread work across several slower cores.
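The "8GHz" figure only holds if every bit of the work can be spread across the cores. A common back-of-the-envelope estimate for the less ideal case is Amdahl's law, which is not something the articles above mention but fits the argument: if a fraction p of the work can run in parallel on n cores, the speedup is 1 / ((1 - p) + p / n). A minimal sketch, with a made-up class name and example fractions chosen purely for illustration:

```java
class AmdahlEstimate {

    /**
     * Amdahl's law: estimated speedup when a fraction of the work
     * (0.0 to 1.0) can be spread evenly across the given number of cores.
     */
    static double speedup(double parallelFraction, int cores) {
        if (parallelFraction < 0.0 || parallelFraction > 1.0 || cores < 1) {
            throw new IllegalArgumentException("need 0 <= fraction <= 1 and cores >= 1");
        }
        return 1.0 / ((1.0 - parallelFraction) + parallelFraction / cores);
    }

    public static void main(String[] args) {
        double clockGHz = 2.0; // per-core clock of the quad-core in the example above
        int cores = 4;
        // Illustrative fractions only; real applications have to be measured.
        for (double p : new double[] {1.0, 0.9, 0.5}) {
            double s = speedup(p, cores);
            System.out.printf(
                "parallel fraction %.1f -> %.2fx speedup, roughly one %.1fGHz core%n",
                p, s, s * clockGHz);
        }
    }
}
```

With everything parallel the quad-core does behave like the ideal 8GHz chip, but a program that is only half parallel gets a 1.6x speedup, which is why a faster dual-core can still win for some workloads.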

Programmers now have to take multi-threading very seriously. We cannot afford to blindly write synchronous programs for anything processing-intensive. Doing so is a terrible waste of hardware and is actually a step backwards (individual cores are getting slower). Whilst in the past we could get away with two threads for our GUI apps, or one thread for console apps, we now need to code with parallelism in mind and break the work up into independent work-units that can run on separate threads. That is, a single thread should never be running an intensive nested for loop. Instead, the top-level loop should run in one thread and spawn/queue the sub-loop iterations as work-units for multiple sub-threads. This way, if the user is running a 64-core CPU five years from now, all cores will be fully utilized!
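As a rough illustration of the nested-loop advice, here is one way it could look in Java using the standard `java.util.concurrent` thread pool. The class name and the `sumRow` work-unit are hypothetical stand-ins for real work, and the sketch assumes the inner iterations are independent of each other (if they share mutable state, this simple approach does not apply as-is).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class NestedLoopDemo {

    /** Hypothetical work-unit: the inner loop for a single outer iteration. */
    static long sumRow(int row, int cols) {
        long sum = 0;
        for (int col = 0; col < cols; col++) {
            sum += (long) row * col;
        }
        return sum;
    }

    /** The single-threaded version: one thread runs the whole nested loop. */
    static long sequentialTotal(int rows, int cols) {
        long total = 0;
        for (int row = 0; row < rows; row++) {
            total += sumRow(row, cols);
        }
        return total;
    }

    /** The outer loop only queues work-units; the pool's threads run them. */
    static long parallelTotal(int rows, int cols) {
        // Size the pool to the machine, so the same code scales from 2 to 64 cores.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int row = 0; row < rows; row++) {
                final int r = row;
                Callable<Long> workUnit = () -> sumRow(r, cols);
                results.add(pool.submit(workUnit));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // waits for that work-unit to finish
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a work-unit failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + sequentialTotal(1000, 10000));
        System.out.println("parallel:   " + parallelTotal(1000, 10000));
    }
}
```

Queuing work-units on a fixed pool, rather than starting one raw thread per iteration, keeps the thread count tied to the number of cores. For very small work-units the queuing overhead can outweigh the gain, so it is worth measuring before and after.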

The problem of transparent parallelism is getting a lot of attention at the moment, and products such as Intel Parallel Studio and Threading Building Blocks are a testament to this. I suspect that in the very near future we'll be seeing many new design patterns and frameworks for making threading easier (or even completely transparent) to application programmers (does anyone know of a good Java implementation? I'm thinking of combining the command pattern with some abstract classes and work managers to create a threading library).

Getting back to the original question of whether I should buy a dual-core or a quad-core for a general-purpose desktop, the answer is still: it depends. Programmers are becoming more aware of parallelism but we're not there just yet. Many applications will continue to run on only a single core for many years to come. Many existing game engines are designed for single cores, and it will take some time for developers to dump these and start again from scratch. Windows 7, on the other hand, is already optimized for multiple cores, meaning the more you throw at it, the better it'll respond. So while a single application like a non-parallelised video encoder may run slower on a quad-core, Windows will respond faster and will let you do more with more applications at the same time (browse the net, burn a DVD, watch Live TV through a tuner card etc.).

If you're a heavy gamer and care about squeezing the most out of your rig for the next year or so, then a dual-core is your best bang-for-buck. If you're a CAD worker or 3D renderer, you need to consider the characteristics of the applications you use (have they been designed for parallelism? Is there a new version coming out soon?). If you're a general-purpose user/developer, I'd recommend a decent quad-core as a good investment for a three-year run.

