Threads are Cancer

posted in The Bag of Holding

Published August 23, 2011

Computing today faces a serious problem.

Hardware has reached a point where the bulk of performance increases are coming not via major innovation in the fundamental design of processing systems, but rather through parallelism. Instead of increasing clock speeds, decreasing die sizes, changing instruction set designs, or any number of other approaches to gaining processing speeds, we are duplicating the existing processor technologies and simply throwing two, four, twelve "cores" at the same exact software.

In the past, upgrading a processor every two or three years was sufficient to gain "free lunch" speed boosts in processing capabilities. There have certainly been cases where taking advantage of new instruction sets could provide even more dramatic speed benefits, albeit at the cost of rewriting core pieces of the software in question. (SIMD paradigms are the most notable example of this.)

However, the game is up. We are now looking at a future era where only nominal speed increases can come from hardware improvements, and software must finally take its turn at improving. The chief path for improving computational performance now is parallelism - doing multiple sets of work simultaneously.

The obvious player here is multicore CPU architecture. As cores proliferate, it becomes increasingly mandatory to do something with all that extra silicon, lest the investment in ever-newer hardware essentially go to waste. Another major contender is asymmetric multiprocessing, such as GPGPU programming, whereby heterogeneous processor types are harnessed in tandem for greater throughput.

None of this is news to anyone who has been paying attention to software development over the past ten years or so. In fact, it's already been said many times, by many people, and often with broader perspective or more detail than is available in the handful of preceding paragraphs.

I would like to focus not on the problem as stated, but on what I perceive to be a much deeper and more fundamental issue, and that has to do with the typically proposed "solution" to the problem above.

Parallelism and Concurrency
The first hurdle is the deep confusion between parallel processing, otherwise known as parallelism, and concurrency. Parallelism is the notion of doing multiple things simultaneously, using multiple pieces of execution hardware, to increase overall throughput. Concurrency is the notion of having multiple paths of code execution active at the same time. Concurrency need not utilize multiple hardware elements, although it is difficult to realize true performance gains without doing so.

Concurrency is only one method for achieving parallel processing, and it is far from a good method. Concurrent programs are not merely hard to reason about - they are truly non-deterministic, taking them well outside the realm of correctness proofs and other formal techniques for thinking about whether or not our software is robust.

In an ideal (deterministic) situation, we can look at a program, look at its inputs, and conclude that, if the outputs are correct, the program will always be correct, at least for those particular inputs. Concurrency destroys this. It is no longer accurate to conclude that a program is doing the right thing based on a single execution of the program itself; subsequent executions may have subtle timing differences, leading to different interleaving of the concurrent execution paths, and then suddenly after the five-hundredth execution, the program stops working for reasons which are incredibly difficult to pin-point.

Concurrency is a slave to chaos mechanics - sensitive dependence on initial conditions. A tiny change at ten decimal places out can cascade into wildly different results after even a few seconds of execution. Concurrent programs can go from fine to disastrously broken because of the difference of literal nanoseconds. It is even possible that the very quantum-mechanical interactions of particles within the CPU itself can manifest as differences in the execution flow of a concurrent program.

When your program's correctness is victim to one of the only systems that modern physics has described as truly random, you can never be sure of anything.

Revisiting Parallelism
The fascinating problem here is that, by and large, the world is obsessed with concurrent programming. Despite the inherent and inescapable flaws of the very notion of concurrency, we cling to it as if it were our only hope for advancing the state of computational performance in light of the hardware realities of the day. It is almost as if so many programmers are willing (even naively eager!) to embrace a substandard approach to parallel processing simply because it already exists.

This is a tragic mistake, and reflects a fundamental loss of the entrepreneurial, pioneering spirit that made computing so great in its early decades. It has become all too easy to fall into the trap of assuming that all the interesting work in computing has been done, and now we just need to play with the pieces that our predecessors handed us.

Innovation and advancement do not come from being willing to happily play with a limited toybox. Innovation and advancement come from dreaming of a toybox with bigger, better, more interesting toys. Innovation and advancement come only from a deep discomfort with the contemporary state of affairs, and a powerful drive to change things for the better.

That discomfort is appallingly lacking in most programmers today, and certainly the drive to improve upon concurrency as a model for parallel processing is virtually nonexistent in most circles.

Parallelism need not have anything to do with concurrency, and probably should not have anything to do with concurrency.

Threading Is Cancer
Concurrency is not just evil because it is non-deterministic; it is evil because it is parasitic. As soon as you introduce concurrency to one part of your code, it develops a malignant tendency to spread. Suddenly, synchronization becomes crucial in places that never had to worry when they were exclusively serial. Transactional semantics must be upheld at ever-increasing radii and boundary levels. Assumptions that could simplify and expedite the creation of the program in a single-threaded world can overnight turn into the worst mistakes in the architecture.

Threading is not content to live in a well-compartmentalized zone of a program. Even if we expend the considerable effort necessary to localize concurrency to a specific subsection of our software, the ramifications must by necessity propagate all over the place. The knock-on effects of introducing multithreading to a program can become nightmarish to keep under control. In many cases it is easier to apply blunt-force trauma to a code base (i.e. spread locks and transactions and other mechanisms all over the place) than to "simply" contain the concurrency to a particular module or feature set.

The practical upshot of this is that the tumor of multithreading will, by nature, spread its diseased assumptions as far throughout a code base as it can. Elements of a program that can be proven correct in a deterministic environment are suddenly robbed of their robustness guarantees, because they must now interface with non-deterministic modules. Even if the internals of a module are deterministic, as soon as the interactions with other modules involve non-determinism, any hope we have of reasoning about that module goes out the window.

A Better Future
I believe that there is a paradigm for parallel processing which avoids the traps of concurrency. To be clear, there will always be inherent dangers in parallelism, and always be a limit to the gains we can realize via parallelism while still holding on to ease of reasoning about the code itself.

However, it is my gut instinct that the solution exists, and consists of only a few simple elements:

Concretely composable program fragments
Impenetrable abstraction mechanisms
Native language support

Composition of program elements is key. We need a way to build programs from increasingly sophisticated sets of operations, and we need ways to reason about how those sets are combined and how they interact.

Once we have this, we can start working on abstractions for parallelism that do not leak, in either direction. It is important that the reality of how parallelism is achieved be hidden from programs; as soon as that black box cracks, non-determinism will creep back in, and the entire charade is subject to the same cancerous problems as before. By the same token, it is important that the abstractions themselves cannot be trifled with by the higher-level program. They must be treated as fundamental and immutable principles of how the program is composed. As soon as the program can meddle in the mechanics of parallelism, it can recreate the problems we're trying to avoid, once again.

Finally, this all suggests that native language and execution environment support will be mandatory. It has been argued before that parallelism cannot be accomplished as a library; it must be a language feature from the beginning. Anything less creates the leaky abstractions that we have already seen to defeat our entire goal.

I think the only really hard part of this is education. Deriving the principles is not terribly difficult; building the abstractions should be very straightforward for anyone with experience in building parallelized systems; and creating a language and execution environment to support all of this is a tremendous amount of work, but not inherently hard.

The real challenge is going to be convincing the masses that concurrency is a terrible monster, and that the alternative solution is worth learning and investing in.

Previous Entry Epoch Compiler Optimization: Recap

Next Entry Become a Good Programmer in Six Really Hard Steps

4 likes 25 comments

Comments

Telastyn

I agree with all of your problems, but disagree with many of your deductions. Not all parallel programs need to have that sort of cancerous spread (or at least not any more than the design requirements that you need to deal with in a multi-tasking operating system). Some of them will, sure. But the increased availability of threadsafe collections, user education, and common techniques makes isolation a practical option for skilled programmers.

We have language support for massive parallelism (erlang) but the restrictions are too great for general purpose use.
We have abstractions for the hard and fast mechanisms to build off of, but they're either too low level to be practical, or too heavy-handed to do what we need.

I am skeptical that we're somehow [b]now [/b]going to come up with something revolutionary enough to overcome those historical roadblocks.

August 23, 2011 06:15 PM

ApochPiQ

Skilled programmers can cope with this, yes. What I'm worried about is [i]everyone else[/i].

Parallelism is already hard; why restrict it to a subset of "skilled programmers" unnecessarily? Is there any good reason why we can't give less-than-stellar programmers tools to make efficient and [i]safe[/i] parallel programs? Yes, there will always be room for the really great coders to do even better by breaking the mold, but why shaft the majority because of it?

Remember, at one point in time, if you couldn't use a punch card, you weren't cut out to be a programmer. Times change, and I think it's virtually self-evident that advancement is worth pursuing.

Erlang's model of parallelism is a good [i]start[/i] and phenomenal for certain classes of problem. I'm looking for a more generalized solution that can sacrifice a little bit of optimality for a lot of reach.

I agree that the extant abstractions are at the wrong level of granularity or resolution for the most part; again, I'm not saying we need to use what we've already got. In fact, I specifically stated that we need to work on making non-leaky [i]new[/i] abstractions.

Remember, a lot of historical roadblocks are only cleared once sufficient amounts of effort have gone into clearing them. Nobody landed on the moon in 1493, but that's hardly a good argument that nobody could [i]ever[/i] land on the moon, because obviously we eventually pulled it off. The more time goes by, the closer we will tend to be to a point in history where collective knowledge and effort is sufficient to jump those hurdles.

August 24, 2011 07:45 PM

Telastyn

Oh I'm not saying it should be limited, but I think that there's bigger bang for your buck improvements that can be done to make non-skilled programmers do more better stuff.

Perhaps what I was getting at is that 'we need better ways to make parallel programs' isn't exactly a new problem. Many people far more skilled than I have looked at it for years and we haven't exactly gotten resounding success. I don't see impediments like processing power or programmer education or memory needs preventing something awesome (but impractical/academic) from catching on, so infer that the problem itself is just that hard.

Not to say the problem doesn't need addressed, just saying that providing language supported impenetrable abstractions seems like something someone already would've tried.

August 25, 2011 03:20 AM

swiftcoder

[quote name='Telastyn' timestamp='1314123355']
We have language support for massive parallelism (erlang) but the restrictions are too great for general purpose use.[/quote]
I have only used Erlang for relitavely small-scale experiments, so I haven't really had the chance to run across any of it's restrictions (apart from some interesting syntactic quirks and odd standard library decisions). Would you mind expanding on what some of the restrictions are?

August 25, 2011 04:11 AM

ApochPiQ

If anyone's tried it, I certainly have never had a chance to play with it :-)

I'm also curious what you would consider to be more economical investments in terms of helping programmers be more productive.

August 25, 2011 09:43 AM

EnlightenedOne

This comment was broken by being added after comments were added without spaces being properly formatted I think the forum is still under maintenance it seems quite broken on IE9. I reposted below.

August 25, 2011 03:16 PM

EnlightenedOne

First a minor complaint with one or two of your arguments:

1. You cannot assume timing will be consistent in a single threaded application on any multiprocessing OS because the priority of a thread is inherently determined outside of the program. You may assume that processes are sequential but not that their operation is fully deterministic to the user.

2. "This is a tragic mistake, and reflects a fundamental loss of the entrepreneurial, pioneering spirit that made computing so great in its early decades. It has become all too easy to fall into the trap of assuming that all the interesting work in computing has been done, and now we just need to play with the pieces that our predecessors handed us." - This is hugely subjective, what has driven pioneering programming in recent decades is closer to human languages and in any case there are constantly new languages (and old) being tampered with and rewritten, huge libraries to replace low level tasks and broaden compatibility (another stirling example of Boost).

I think the trouble with your concurrency argument is much like entanglement you want to see where your programs threads are and how fast they are going at the same time without having any influence on how they must behave.

Where I stand on threading:
The only real problem I have had with multi-threading (particularly when using the Boost library) is that I have to write my own barrier class where one thread forces all other threads to wait for it to complete a task before enabling them all to attempt to lock their own groups of (sometimes shared) data. So long as a program uses an ordered sequential list of mutex's to lock varying bits of data in a class live lock is never a problem.

A barrier class makes sense for speeding up programs where you use an engine paradigm where one task must be done and then several parallel tasks can be performed which may then have to all stop and wait for a single process to occur before being allowed to restart their operation. In fact the principle is similar to that of the "grand central station" thread que paradigm. I would say that your argument is moot on the ground that standardisation of how to interact with multiple threads safely is not simply quantifiable (with a lot of threads accessing unique volumes of data its unlikely ever to be), but concepts for simplifying concurrency are everywhere.

Reclaiming threads from the air of cancer, and clear limitations of compilers:
Threads are not cancerous, we cannot control cancer, we can control threads. Poor initial planning for processes that require concurrent behaviour and poor standardisation of coding practices are cancerous. Once you understand how to stop dead/live lock threats the problem is just one of efficiency and repeatability. Having development programs flag when source locks a mutex but never unlocks it elsewhere in the code (or in error handling branches of code) is the closest thing you can get to assisting development, but what if threads want to lock and hold a lock on data to interact with linked lists then unlock that data for other threads later? You can create code that tests if a user can establish a lock and if they are trying to access data through a function that requires a lock taken from elsewhere and flags it before safely exiting but you cannot infer deeper context as to how data is being used.

Conclusion, the cost of your abstraction or standardisation:
If I understand you fully, you want to create abstraction from efficient design to gain performance by reducing the risk of human error in threading, you will reduce development time at the cost of nullifying the inherent value of having parallel processes by abstracting in this way. This is like garbage collection, you could create a more extensive volatile key word for variables or classes which guaranteed them better than existing implementations of volatile do, but you would still have your garbage collector slowing down the entire process as an overhead. Even if your talking about bodies for threads like threaded classes (which are an excellent way to manage threading) your concept impose to much control over a free process. Your best bet is to work on a library that runs on top of a language using threads and establish higher level data structures which fragment threads and data as you think may help, you could be on to something if you created a template for a threaded application with multiple architectures of common threaded application design, but I doubt you would make any money out of it.

August 25, 2011 03:46 PM

cdoty

Web programmers are learning to deal with multi-threaded programming through technologies such as node.js. The irony is that node.js currently runs on a single thread, but the internal VM (Javascript V8) doesn't support concurrent execution, except through callbacks. The concept of threads are hidden under the language.

August 25, 2011 08:44 PM

ApochPiQ

[quote name='EnlightenedOne' timestamp='1314287207']
First a minor complaint with one or two of your arguments:

1. You cannot assume timing will be consistent in a single threaded application on any multiprocessing OS because the priority of a thread is inherently determined outside of the program. You may assume that processes are sequential but not that their operation is fully deterministic to the user.[/quote]

While [i]timing[/i] is non-deterministic, order of operations and the lack of interleaving of operations [i]is[/i] deterministic. You can prove a program to be correct in the presence of non-deterministic execution timing, but you cannot prove a program to be correct in the presence of non-deterministic order/interleaving of operations.

[quote]2. "This is a tragic mistake, and reflects a fundamental loss of the entrepreneurial, pioneering spirit that made computing so great in its early decades. It has become all too easy to fall into the trap of assuming that all the interesting work in computing has been done, and now we just need to play with the pieces that our predecessors handed us." - This is hugely subjective, what has driven pioneering programming in recent decades is closer to human languages and in any case there are constantly new languages (and old) being tampered with and rewritten, huge libraries to replace low level tasks and broaden compatibility (another stirling example of Boost).[/quote]

Be careful not to pull that quote out of context. I was talking about the popularity of concurrent programming and the lack of significant public efforts to supplant it. I don't mean to suggest that there is [i]no[/i] innovation or forward progress being made in computing these days - just that the [i]degree[/i] is substantially less than it once was, because so many people are content to just accept the state of the art instead of trying to advance it.

[quote]I think the trouble with your concurrency argument is much like entanglement you want to see where your programs threads are and how fast they are going at the same time without having any influence on how they must behave.[/quote]

Not sure I follow the analogy...

[quote]The only real problem I have had with multi-threading (particularly when using the Boost library) is that I have to write my own barrier class where one thread forces all other threads to wait for it to complete a task before enabling them all to attempt to lock their own groups of (sometimes shared) data. So long as a program uses an ordered sequential list of mutex's to lock varying bits of data in a class live lock is never a problem.[/quote]

I submit that either (1) you're doing very simple concurrency or (2) you have a lot of problems in your concurrent code that you don't know about yet. Getting scalable concurrency robust is extremely hard. Granted, I'm thinking in terms of dozens of threads on several hardware cores and many GB of RAM pumping thousands of simultaneous network connections, so I'm kind of on the extreme end here - it may well be that the majority of concurrent programs are actually not that problematic, I don't know. I can only speak from my current perspective.

[quote]A barrier class makes sense for speeding up programs where you use an engine paradigm where one task must be done and then several parallel tasks can be performed which may then have to all stop and wait for a single process to occur before being allowed to restart their operation. In fact the principle is similar to that of the "grand central station" thread que paradigm. I would say that your argument is moot on the ground that standardisation of how to interact with multiple threads safely is not simply quantifiable (with a lot of threads accessing unique volumes of data its unlikely ever to be), but concepts for simplifying concurrency are everywhere.[/quote]

You assume that my proposal is to fix concurrency. It is not. My proposal is to jettison concurrency entirely in favor of superior models of parallelization.

... Whoops, I am being informed that I'm supposed to be in a meeting :-D Will reply more later.

Thanks all for the feedback thus far!

August 25, 2011 10:58 PM

Washu

A few things:

We're already seeing a trend towards languages abstracting away the necessary components of asynchronous operations. Take a look at some of the new extensions to the C# language standard that are currently in development ([url="http://blogs.msdn.com/b/csharpfaq/archive/2010/10/28/async.aspx"]async/await keywords[/url]). These remove the need to explicitly thread anything, although you do have to be sure to handle shared state properly still.

Or the new std::async/futures/promise framework, which is not as nice and quite ugly to use extensively, but still removes the need to create or manage threads, and even helps to abstract a lot of the synchronization as well, through std::atomic.

The biggest issue I've actually encountered with novice programmers and threading is that they rely too much on shared state. Remember: you can create copies of things. So do so. If you find yourself with a large amount of synchronization taking place, take a step back and look it over. Chances are you'll find that you dont' need most of that synchronization by simply copying the appropriate data structures and establishing a singular synchronization point (if one is even necessary).

Also, remember me? Well, be sure to remember me when the F&A beta for GW2 comes up. Lest I keel you.

August 25, 2011 11:34 PM

Telastyn

[quote]
... restrictions on Erlang?
[/quote]

I've not done much at all with it, but the functional focus I see as a limitation, the specificness of its featureset and libraries...

[quote]
...more economical work for poor programmers?
[/quote]

Based on my perception of how much effort would go into making a parallel programming environment which is good and easy to use, almost everything?

Any sort of incremental improvement in libraries, syntax readability, tools, error messages... Making source control that isn't a complete pain to use. Making generated data access not suck. Making something other than JavaScript (even if it builds down to javascript) as a client side option. Making a good, standard templating/markup language for generating html (like MVC for .NET perhaps? but that's not there yet). Improving static analysis (or generating automated tests) to detect errors in algorithms/data structures. Improve auto-formatter integration with source control. Improve databases to integrate better with in-memory cache/nosql clusters...

August 25, 2011 11:39 PM

swiftcoder

[quote name='Telastyn' timestamp='1314315599']the functional focus I see as a limitation[/quote]
Not sure I see that as a particular limitation, but then again, i haven't tried building large-scale systems in a purely functional language. The big win I see is the complete lack of mutable state - this makes parallelism almost trivial, by comparison to traditional shared-state + threading.

[quote]the specificness of its featureset and libraries...[/quote]
Yes, this is definitely true, *but*, that is something that could be remedied, largely without touching the language itself.

***this editor hates me***

August 26, 2011 12:08 AM

ApochPiQ

[quote name='EnlightenedOne' timestamp='1314287207']
Reclaiming threads from the air of cancer, and clear limitations of compilers:
Threads are not cancerous, we cannot control cancer, we can control threads. Poor initial planning for processes that require concurrent behaviour and poor standardisation of coding practices are cancerous. Once you understand how to stop dead/live lock threats the problem is just one of efficiency and repeatability. Having development programs flag when source locks a mutex but never unlocks it elsewhere in the code (or in error handling branches of code) is the closest thing you can get to assisting development, but what if threads want to lock and hold a lock on data to interact with linked lists then unlock that data for other threads later? You can create code that tests if a user can establish a lock and if they are trying to access data through a function that requires a lock taken from elsewhere and flags it before safely exiting but you cannot infer deeper context as to how data is being used.[/quote]

I'm not sure I understand what you're arguing for here. The fact that things like nested lock orders and lock swapping even exist is strong evidence (to my mind at least) that concurrency causes threading-related concerns to proliferate throughout code. If you want to avoid deadlocks, you have to maintain lock orders, and that is generally not something that languages or thread libraries or even OSes will enforce for you - you have to do it by hand. If you want to do lock handoffs or nontrivial locking semantics, you have to enforce this by convention in the code - there is no support from the tools to help you. And this means that [i]all[/i] of your code that interacts with that locking pattern has to be aware of it, and therefore the threading considerations have leaked across boundaries that they shouldn't have to.

We have issues precisely because the usage pattern of shared state can't be introspected by the threading system.

[quote]Conclusion, the cost of your abstraction or standardisation:
If I understand you fully, you want to create abstraction from efficient design to gain performance by reducing the risk of human error in threading, you will reduce development time at the cost of nullifying the inherent value of having parallel processes by abstracting in this way. This is like garbage collection, you could create a more extensive volatile key word for variables or classes which guaranteed them better than existing implementations of volatile do, but you would still have your garbage collector slowing down the entire process as an overhead. Even if your talking about bodies for threads like threaded classes (which are an excellent way to manage threading) your concept impose to much control over a free process. Your best bet is to work on a library that runs on top of a language using threads and establish higher level data structures which fragment threads and data as you think may help, you could be on to something if you created a template for a threaded application with multiple architectures of common threaded application design, but I doubt you would make any money out of it.
[/quote]

I fundamentally disagree with your perspective on garbage collection, so I don't really agree with your conclusion by analogy with GC systems. I also strongly disagree that libraries are the right place for threads; much wiser programmers than myself have argued this case better than I can, so I won't risk making a half-baked attempt at duplicating their arguments.

And as for making money... I'm not in this to get rich :-)

August 26, 2011 12:08 AM

ApochPiQ

[quote name='Washu' timestamp='1314315244']
A few things:

We're already seeing a trend towards languages abstracting away the necessary components of asynchronous operations. Take a look at some of the new extensions to the C# language standard that are currently in development ([url="http://blogs.msdn.com/b/csharpfaq/archive/2010/10/28/async.aspx"]async/await keywords[/url]). These remove the need to explicitly thread anything, although you do have to be sure to handle shared state properly still.

Or the new std::async/futures/promise framework, which is not as nice and quite ugly to use extensively, but still removes the need to create or manage threads, and even helps to abstract a lot of the synchronization as well, through std::atomic.

The biggest issue I've actually encountered with novice programmers and threading is that they rely too much on shared state. Remember: you can create copies of things. So do so. If you find yourself with a large amount of synchronization taking place, take a step back and look it over. Chances are you'll find that you dont' need most of that synchronization by simply copying the appropriate data structures and establishing a singular synchronization point (if one is even necessary).

Also, remember me? Well, be sure to remember me when the F&A beta for GW2 comes up. Lest I keel you.
[/quote]

I agree that the direction of C++11 and C# is a good one. I'd like to see even more effort in this direction, though, even to the point of building a language around those constructs rather than building the constructs into a language. (Thankfully, I'm already working on that!)

Fundamentally I think that escaping shared state and even thread-style concurrency entirely should be possible, and I feel like languages are going to play a key role in enabling that transition.

August 26, 2011 12:11 AM

ApochPiQ

[quote name='Telastyn' timestamp='1314315599']
Based on my perception of how much effort would go into making a parallel programming environment which is good and easy to use, almost everything?

Any sort of incremental improvement in libraries, syntax readability, tools, error messages... Making source control that isn't a complete pain to use. Making generated data access not suck. Making something other than JavaScript (even if it builds down to javascript) as a client side option. Making a good, standard templating/markup language for generating html (like MVC for .NET perhaps? but that's not there yet). Improving static analysis (or generating automated tests) to detect errors in algorithms/data structures. Improve auto-formatter integration with source control. Improve databases to integrate better with in-memory cache/nosql clusters...
[/quote]

I guess for me this is just such an interesting problem that it's worth tackling despite the fact that it's hard. Or, maybe more accurately, I think I'm interested in the problem exactly [i]because[/i] it's so hard.

To each their own, though; and while I certainly agree that there's plenty of ground to be gained in the areas you listed, I don't think that invalidates the significance of searching for answers to the parallelism problem.

August 26, 2011 12:14 AM

maxgpgpu

I agree that multi-threading is very bad news. However, as with other aspects of software development, my experience has been the flaw in this area is letting someone else take responsibility for "making it work", and "making it robust". I say "trust no one", "never accept non-transparency". Which to me, means not wanting or adopting language mechanisms.

I'll tell you what my solution is. I'm not sure what name should be attached, but perhaps "cooperative parallelism" or "parallel subsystems". For example, in a game engine, let's say "collision detection" takes too many CPU cycles to put that process in the serial flow of processes the engine executes.

My solution is to hand that ENTIRE subsystem to another CPU-core. Think about it. In the serial implementation (which is made to run reliably first), work gets done, then collision detection is performed, then a series of other processes are performed, some of which depend upon the results of collision detection.

So my solution is to have the "master CPU core" get to the point where it starts collision detection, then look to see whether it has another "slave CPU core". If no other core exists (or is available), the master CPU core just executes the collision detection subsystem itself, then moves on to subsequent work. If another CPU core is available, the master write a value into memory to tell the other core everything is ready for collision detection to be performed, then wakes up that slave CPU core.

Then the master core executes other processes or subsystems that have nothing to do with the graphical objects and their vertices. When it finishes one (let's say "processing the message queue"), then the master CPU core checks to see whether the slave CPU core has written to a specific location that says "collision detection complete for frame n" (the current frame). If collision detection is not yet complete, the master CPU core executes another independent subsystem (let's say "updating the audio queue", or "sending network messages", or "computing object behavior", or "you name it" --- all of which are independent subsystems that do not need to access or change data relevant to collision detection.

This is "bomb proof". Sure, a bit of thought is required to survey all the information that is accessed or written by each subsystem, but in my experience, it is quite obvious that "processing the message queue" does not change "object vertices" or anything else accessed or modified by the "collision detection" subsystem.

I find this kind of "multi-processing" (if you will), extremely reliable. It isn't like a programmer could never make a mistake (some data accessed by two subsystems executing simultaneously), but... [i]that is a mistake, a clear bug[/i]. Note that both subsystems might access certain "stable variables", and that's perfectly okay. For example, both subsystems might access the current "frame number", and that's perfectly fine, since frame number is only incremented when the back-and-front buffers are swapped and the back-buffer is cleared.

What must never be allowed is for one subsystem to access data that another might be modifying. Sure, it is [i]possible[/i] to coordinate such things (multi-threaders do this crap all the time), but I don't. I write my subsystems to be clean, modular and very clearly contained to certain information... and as a result I [i]never[/i] run into this situation. When I do, I reorganize the program to make it cleaner, more modular, and avoid the problem. At worst, and this rarely, rarely happens, I am tempted to make one or two "alternate/duplicate variables" that get copied from their potentially dynamic cousins at a safe point (between subsystem execution), which assures the subsystem that accesses them never see dynamic, changing variables (the state of those variables is ALWAYS in a very known and stable state between known subsystems).

Anyway, this is much too long winded, but I totally disagree on this issue as I do about other issues. Do NOT depend on some language or library feature to HOPEFULLY let you "get away with something". That is not only a cop out and unreliable (since you don't understand what they're doing), but it changes from version to version of their tools/software/environment, from OS to OS, from language to language, and so forth. In contrast, if you take responsibility to make sure the very structure of execution is solid (which is easy when you do cooperative multi-processing this way), you will never get tripped up by what others are doing. I've been there, done that, and will never, ever trust the reliability of my code on anything provided by someone else. Forget that!

August 26, 2011 03:09 AM

EnlightenedOne

maxgpgpu, Well Boost provides cross platform implementations of threading for the C++ language and so I trust it to work or have a warning tag on anything that doesn't, you shouldn't use a library if you have no appreciation for how it is being a workhorse, but if it just provides the base constructs for threading that works for me. I know a couple of programmers who go the die hard build it all from scratch method, but I figure having code that is multi-platform is not a bad thing, that is very much a pro and con argument for elsewhere. For me implementations of threading outside libraries looks a mess and is while not difficult to use is certainly less neat. I can appreciate your comments ApocPiQ your threading is indeed on a scale outside what I have experimented with I have only ever written a program with 9 threads managed by myself. I had a lot of counter points but I sum mate my qualms with this document aside from wrongly using the word cancer to this final argument: "Parallelism need not have anything to do with concurrency, and probably should not have anything to do with concurrency." - Only at the cost of mass data copying/tackling issues of synchronisation between these cloned programs and basically making threads into what would appear as individual computers networked together to perform a task in almost every sense. A. Show me your parallelism diagram for parallel access to a simple class without concurrent control of that access in a manner that is stable and equally efficient to the human built concurrent alternative threaded design (with pseudo-code please). B. Show me the state diagram behind a compiler when it attempts to determine or inhibit the control over the context of data locking in a way that restores determinism to something that is in nature completely the counterpart of deterministic code design. I hear the idealism I don't see the schematic, I respect your perspective "I'm not in this to get rich :-)". If you do make that schematic then you will be at a vantage point where you can make a lot of money. To quote Spartans - "If".

August 26, 2011 09:43 AM

swiftcoder

[quote name='EnlightenedOne' timestamp='1314351805']
A. Show me your parallelism diagram for parallel access to a simple class without concurrent control of that access in a manner that is stable and equally efficient to the human built concurrent alternative threaded design (with pseudo-code please).
B. Show me the state diagram behind a compiler when it attempts to determine or inhibit the control over the context of data locking in a way that restores determinism to something that is in nature completely the counterpart of deterministic code design.[/quote]

You are thinking too small: you don't need concurrent access control to classes, if there are no classes. You don't need data locking if there is no mutable shared state...

Parallelism on a massive scale may require different paradigms, and different thought patterns. For example, it isn't at all clear that Object Orientation is a good fit for parallelism - several of the current crop of parallel languages (Erlang, Google's Go) don't support OOP. Several don't support shared mutable state either (Erlang, Mozilla's Rust).

August 26, 2011 04:59 PM

ddn3

[font=arial, verdana, tahoma, sans-serif][size=2]Sure multi-threaded code can be invasive in code bases which where single threaded before, but they are hardly a cancer. Most AAA games use threading in some form or another. Almost every Enterprise level application uses threading in some form, concurrency programming has been around for ages now. 4th Generational languages have low level concurrency models support built right in ie Java and C# and some experimental languages like Erlang are based around concurrent execution with many of the features you're looking for (non-shared state, state rollback, deterministic, event driven, native language support, etc..)

[/size][/font]
The problem with Erlang is that it's not as easy to program in and it doesn't fit well with game programming, which is still very serial in its structure. Its' found a following in high volume network applications like MMOs and match making servers, which fit well since its original purpose was for writing a high volume network server for handling telephony routing ( need to verify that )..

Mostly in games, mutli-threaded programming consists of a worker pools operating on parallizable tasks such as physics, graphics processing, animation, etc.. Most of the time the cores are idle since most game task are just not parallizable.. People have tried to parallize game logic but that's much more difficult task. For that you'd need a language which is multi-thread friendly, which C++ is not.. MMOs do it using Elang, Phython, C#?, Java? etc..

I think it's just an issue of using the right tools for the right job.

-ddn

August 26, 2011 11:14 PM

swiftcoder

[quote name='ddn3' timestamp='1314400481']
[font="arial, verdana, tahoma, sans-serif"][size="2"]experimental languages like Erlang[/quote][/size][/font]
[size="2"]Let's not bandy those negative labels too wildly, ok? Erlang was first standardised the same year as C++, and it's use in mission-critical systems predates widespread use of C++. [/size][size=2]Obscure it may be, but Erlang is far from 'experimental'.[/size]

August 26, 2011 11:58 PM

ApochPiQ

I'm kind of depressed that so many people totally missed the point that threading is a bad way to do parallelism to begin with.

It's not about making threading work better, it's about [b]killing it entirely[/b].

August 27, 2011 12:01 AM

swiftcoder

[quote name='ApochPiQ' timestamp='1314403294']It's not about making threading work better, it's about [b]killing it entirely[/b].[/quote]
Perhaps there is an issue of nomenclature as well? At the hardware/OS level, threads are what we have to work with, and that isn't going to change anytime soon. At the language level, most current attempts to get away from threads (i.e. Erlang, Scala, Stackless, Go) consist of layering coroutines and message passing over the underlying OS threading model.

My personal opinion is that this approach is highly successful, and that it is currently hindered not by flaws in the approach, but by a lack of understanding among mainstream programmers, and a lack of enthusiasm borne out of a lack of need - your average software project hasn't yet hit the 'multi-core crunch'.

Are you thinking somewhere along the same lines, or do you feel that the actor/coroutine model is also a failure, and we should be exploring radically different solutions?

August 27, 2011 02:30 AM

maxgpgpu

[quote name='ApochPiQ' timestamp='1314403294']
I'm kind of depressed that so many people totally missed the point that threading is a bad way to do parallelism to begin with.

It's not about making threading work better, it's about [b]killing it entirely[/b].
[/quote]
Now you seem to be hung up with terminology, not reality. Look, if an application has to execute four time-consuming subsystems that are utterly independent*, then ALL the problems with multi-threading (or multi-processing) vanish. Poof. Gone. The way I see it, the problem with multi-threading (or application architecture and design in general) is that everyone is trained to focus on fine-grain multi-threading, which is indeed a nasty cancer in most cases (a few very convenient cases do exist in which fine-grain is non-problematic too, but not enough to bother with).

In applications like games (and we are on a website devoted to games), there simply isn't enough CPU throughput to do everything we want to. In the end, we must give up features, details, number of objects, realism of physics, high-quality shadowing, or something (usually quite a bit).

One solution is optimize. I personally have no problem converting key code into [SIMD] assembly language, but that only goes so far. I totally agree that more attention should be paid to overall architecture and design, but that too has its limits. In the end, some applications are simply "up against the wall", including all game engines and most games. Given the death of Moore's Law and many core CPUs, there is very little choice - we must make those cores work for us as much as we can (without accepting cancer).

And the answer to that is simultaneous execution of subsystems on multiple cores (either with the thread subsystem or process subsystem, but one way or the other). *Any number of threads can read data that isn't changing during the multiple threads/processes cores, but none can alter shared data. The fact is, if designed carefully, the various subsystems can totally avoid even reading any variables that might possibly change. Which means, there is no cancer, not even a mild cough.

But sure, the multi-threading most people do is indeed horrific. I won't even go there.

August 27, 2011 05:01 AM

Washu

Well, you may noticed I tried to avoid mentioning threads. There's a couple of reasons for that.

Firstly, I try and break things down to tasks. Now my general definition of a task goes something like: A task is a synchronous block of code who exports synchronized channels for external accessors and subscribes to synchronized incoming channels for internal accessors.

This means then that a task could be in another thread, another process, on another server on the local network, or on a server on an entirely separate network. Really it doesn't matter from my viewpoint.

Now, on the language support front: I don't think a new language is ultimately the answer. The problem with new languages being platform support. The PS3, XBox360, PS4, Xbox720, Wii2, etc. aren't going to have native support for language X (even Erlang), which means you end up having to "build" the runtime for that platform, which removes it from the comfort zone of a nice supported language target to something you don't get any support for at all. Which is a problem in the enterprise world, and even in the game development world.

The other issue with new languages, or even less used languages like Erlang, is that of developer and even corporate memory. Yes, your advanced uber developers might have spent plenty of time exploring the alternatives, but your average developer is like your average other worker: they know, and only want to know, just enough to get their job done and get paid.

Another thing to notice is that things like std::async in C++11 and await/async in C# don't rely on "threads" at all. In fact, std::async can return a future that executes in the current thread, blocking the current execution path until it completes. You can use enumerations to set the rough type of asynchronous execution you desire, but that doesn't require the underlying implementation to do it in a specific manner, provided it meets the requirements of the standard.

The other problem that needs to be recognized is that no two threading solutions are the same, nor are two different applications of threads necessarily decomposible to the same set of high level primitives. This means that a language which abstracts away the functionality of threads completely, to asynchronous tasklets for example, will be unable to meet the needs of one class of applications (efficiently), while able to meet the needs of another. Furthermore, any language that ends up providing some form of native support for "some" of the other threading styles one might use will typically only due so in a manner that is not necessarily very performance or resource friendly.

August 27, 2011 07:32 AM

undead

I think ApochPiQ has some good points altough I'm not 100% comfortable with the terminology used. From my POV the difference between parallelism and concurrency isn't just about hardware vs software. You wrote: "Concurrency is the notion of having multiple paths of code execution active at the same time." and " Parallelism is the notion of doing multiple things simultaneously, using multiple pieces of execution hardware, to increase overall throughput". As an example, rasterizing a triangle on a CPU is an inherently parallel operation. Actually it's an embarrassingly parallel one. Of course the same code fragment could be executed on multiple threads/taks at the same time. Rasterizing the same triangle 100 times will result in the pixels (tasks) to be rendered (completed) in a different order (time). So yes, we have multiple paths of code execution active at the same time expecially if we need a piece of code waiting for the triangle to be fully drawn before submitting a new one. Still, it is hard to define that as "concurrency". From my point of view the problem lies in the dependencies between multiple code fragments being executed at the same time. As long as there's no strong dependency, we are just in parallelism realm. When multiple code fragments "team up" and "communicate" to solve a problem (sharing resources), we are still in parallelism realm, town of concurrencyville. As maxgpgpu noted, the solution is to avoid dependencies. But this leads us (or at least leads me) to the conclusion sometimes the difference between parallelism and concurrency lies in the nature of the problem we have to solve. In the dinining philosophers problem, the goal isn't to make them eat as much rice as possible but to cope with their behaviour avoiding starvation (deadlocks). If the goal was to make them eat as much rice as possible in a given timeslice then it would be easy. I can create a table for each hardware execution unit, launch N tasks, each of them making the philosophers eat in a deterministic (and optimized) order. Let's say each table has 4 philosophers, I can make 1-3 eat then 2-4 then 1-3, etc. This way I have "solved" the problem without concurrency by making each execution path be independent. But in general I agree when OP makes a (strong) statement against concurrency. I think for the average programmer concurrency is evil because: - it is described (and learned) through fictional problems dating back to the 70s - those problems usually start from the assumpion there's a behavioural anomaly we have to cope with - performance isn't even part of the equation (if I was a barber I would like to have as much customers as possible in 1 hour, not to have customers falling asleep then randomly waking up after a few minutes to check if they can skip the line and sit on my chair) - it makes people think it's fine to waste an huge amount of time building fail-proof code for coping with dumb code snippets I think in order to leave concurrencyville we don't need new languages, it would be much better to focus on smarter thinking and innovative engine designs. The key should be to keep things as simple as possible and for that a library like intel TBB looks more than enough for me. Adding complexity to our toolchain isn't the key... I'm scared we might be tempted into designing more complex components leading to even more complex problems. My only concern is in real world some problems just cannot be made independent and at the same time they are so time-consuming we won't have any other choice than going back to concurrencyville, finger crossed. My 0.02$.

August 29, 2011 09:39 AM

You must log in to join the conversation.

Don't have a GameDev.net account? Sign up!

ApochPiQ

Author

Threads are Cancer

Comments

ApochPiQ

Latest Entries

A Few Farewells

Code Reuse In Actual Practice

Source-Level Debugging For Epoch Programs

Using Poison to Reverse Engineer Code

Using Poison to Reverse Engineer Code

Debugging Information Success

Debugging Information Success

Debugging Epoch Programs

Debugging Epoch Programs

Epoch 64-bit compiler progress

Threads are Cancer

Comments

ApochPiQ

Latest Entries

A Few Farewells

Code Reuse In Actual Practice

Source-Level Debugging For Epoch Programs

Using Poison to Reverse Engineer Code

Using Poison to Reverse Engineer Code

Debugging Information Success

Debugging Information Success

Debugging Epoch Programs

Debugging Epoch Programs

Epoch 64-bit compiler progress

Reticulating splines