Hardware has reached a point where the bulk of performance increases are coming not via major innovation in the fundamental design of processing systems, but rather through parallelism. Instead of increasing clock speeds, decreasing die sizes, changing instruction set designs, or any number of other approaches to gaining processing speeds, we are duplicating the existing processor technologies and simply throwing two, four, twelve "cores" at the same exact software.
In the past, upgrading a processor every two or three years was sufficient to gain "free lunch" speed boosts in processing capabilities. There have certainly been cases where taking advantage of new instruction sets could provide even more dramatic speed benefits, albeit at the cost of rewriting core pieces of the software in question. (SIMD paradigms are the most notable example of this.)
However, the game is up. We are now looking at a future era where only nominal speed increases can come from hardware improvements, and software must finally take its turn at improving. The chief path for improving computational performance now is parallelism - doing multiple sets of work simultaneously.
The obvious player here is multicore CPU architecture. As cores proliferate, it becomes increasingly mandatory to do something with all that extra silicon, lest the investment in ever-newer hardware essentially go to waste. Another major contender is asymmetric multiprocessing, such as GPGPU programming, whereby heterogeneous processor types are harnessed in tandem for greater throughput.
None of this is news to anyone who has been paying attention to software development over the past ten years or so. In fact, it's already been said many times, by many people, and often with broader perspective or more detail than is available in the handful of preceding paragraphs.
I would like to focus not on the problem as stated, but on what I perceive to be a much deeper and more fundamental issue, and that has to do with the typically proposed "solution" to the problem above.
Parallelism and Concurrency
The first hurdle is the deep confusion between parallel processing, otherwise known as parallelism, and concurrency. Parallelism is the notion of doing multiple things simultaneously, using multiple pieces of execution hardware, to increase overall throughput. Concurrency is the notion of having multiple paths of code execution active at the same time. Concurrency need not utilize multiple hardware elements, although it is difficult to realize true performance gains without doing so.
Concurrency is only one method for achieving parallel processing, and it is far from a good method. Concurrent programs are not merely hard to reason about - they are truly non-deterministic, taking them well outside the realm of correctness proofs and other formal techniques for thinking about whether or not our software is robust.
In an ideal (deterministic) situation, we can look at a program, look at its inputs, and conclude that, if the outputs are correct, the program will always be correct, at least for those particular inputs. Concurrency destroys this. It is no longer accurate to conclude that a program is doing the right thing based on a single execution of the program itself; subsequent executions may have subtle timing differences, leading to different interleaving of the concurrent execution paths, and then suddenly after the five-hundredth execution, the program stops working for reasons which are incredibly difficult to pin-point.
Concurrency is a slave to chaos mechanics - sensitive dependence on initial conditions. A tiny change at ten decimal places out can cascade into wildly different results after even a few seconds of execution. Concurrent programs can go from fine to disastrously broken because of the difference of literal nanoseconds. It is even possible that the very quantum-mechanical interactions of particles within the CPU itself can manifest as differences in the execution flow of a concurrent program.
When your program's correctness is victim to one of the only systems that modern physics has described as truly random, you can never be sure of anything.
Revisiting Parallelism
The fascinating problem here is that, by and large, the world is obsessed with concurrent programming. Despite the inherent and inescapable flaws of the very notion of concurrency, we cling to it as if it were our only hope for advancing the state of computational performance in light of the hardware realities of the day. It is almost as if so many programmers are willing (even naively eager!) to embrace a substandard approach to parallel processing simply because it already exists.
This is a tragic mistake, and reflects a fundamental loss of the entrepreneurial, pioneering spirit that made computing so great in its early decades. It has become all too easy to fall into the trap of assuming that all the interesting work in computing has been done, and now we just need to play with the pieces that our predecessors handed us.
Innovation and advancement do not come from being willing to happily play with a limited toybox. Innovation and advancement come from dreaming of a toybox with bigger, better, more interesting toys. Innovation and advancement come only from a deep discomfort with the contemporary state of affairs, and a powerful drive to change things for the better.
That discomfort is appallingly lacking in most programmers today, and certainly the drive to improve upon concurrency as a model for parallel processing is virtually nonexistent in most circles.
Parallelism need not have anything to do with concurrency, and probably should not have anything to do with concurrency.
Threading Is Cancer
Concurrency is not just evil because it is non-deterministic; it is evil because it is parasitic. As soon as you introduce concurrency to one part of your code, it develops a malignant tendency to spread. Suddenly, synchronization becomes crucial in places that never had to worry when they were exclusively serial. Transactional semantics must be upheld at ever-increasing radii and boundary levels. Assumptions that could simplify and expedite the creation of the program in a single-threaded world can overnight turn into the worst mistakes in the architecture.
Threading is not content to live in a well-compartmentalized zone of a program. Even if we expend the considerable effort necessary to localize concurrency to a specific subsection of our software, the ramifications must by necessity propagate all over the place. The knock-on effects of introducing multithreading to a program can become nightmarish to keep under control. In many cases it is easier to apply blunt-force trauma to a code base (i.e. spread locks and transactions and other mechanisms all over the place) than to "simply" contain the concurrency to a particular module or feature set.
The practical upshot of this is that the tumor of multithreading will, by nature, spread its diseased assumptions as far throughout a code base as it can. Elements of a program that can be proven correct in a deterministic environment are suddenly robbed of their robustness guarantees, because they must now interface with non-deterministic modules. Even if the internals of a module are deterministic, as soon as the interactions with other modules involve non-determinism, any hope we have of reasoning about that module goes out the window.
A Better Future
I believe that there is a paradigm for parallel processing which avoids the traps of concurrency. To be clear, there will always be inherent dangers in parallelism, and always be a limit to the gains we can realize via parallelism while still holding on to ease of reasoning about the code itself.
However, it is my gut instinct that the solution exists, and consists of only a few simple elements:
- Concretely composable program fragments
- Impenetrable abstraction mechanisms
- Native language support
Composition of program elements is key. We need a way to build programs from increasingly sophisticated sets of operations, and we need ways to reason about how those sets are combined and how they interact.
Once we have this, we can start working on abstractions for parallelism that do not leak, in either direction. It is important that the reality of how parallelism is achieved be hidden from programs; as soon as that black box cracks, non-determinism will creep back in, and the entire charade is subject to the same cancerous problems as before. By the same token, it is important that the abstractions themselves cannot be trifled with by the higher-level program. They must be treated as fundamental and immutable principles of how the program is composed. As soon as the program can meddle in the mechanics of parallelism, it can recreate the problems we're trying to avoid, once again.
Finally, this all suggests that native language and execution environment support will be mandatory. It has been argued before that parallelism cannot be accomplished as a library; it must be a language feature from the beginning. Anything less creates the leaky abstractions that we have already seen to defeat our entire goal.
I think the only really hard part of this is education. Deriving the principles is not terribly difficult; building the abstractions should be very straightforward for anyone with experience in building parallelized systems; and creating a language and execution environment to support all of this is a tremendous amount of work, but not inherently hard.
The real challenge is going to be convincing the masses that concurrency is a terrible monster, and that the alternative solution is worth learning and investing in.
We have language support for massive parallelism (erlang) but the restrictions are too great for general purpose use.
We have abstractions for the hard and fast mechanisms to build off of, but they're either too low level to be practical, or too heavy-handed to do what we need.
I am skeptical that we're somehow [b]now [/b]going to come up with something revolutionary enough to overcome those historical roadblocks.