In other words, "write once, run everywhere. No, seriously - everywhere." Except Epoch doesn't suck like Java does [grin]
To provide this degree of flexibility, the Epoch VM obviously needs a serious arsenal of CPU-side parallelization tricks. It's no good to write solid, parallel code and then have it run in a single CPU thread.
A prime example is the "parallel for" concept, where a given set of calculations can be performed in parallel. In a traditional setting, you might see these calculations simply run in serial, in a single thread. The parallel-for construct allows you to split up that loop into chunks, and then feed each chunk to a worker thread to do the actual computations.
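To make the chunking idea concrete, here is a minimal C++ sketch of a parallel-for built on plain threads. It is not Epoch's implementation or syntax; the names `parallel_for` and `squares` are hypothetical, and the one-thread-per-chunk strategy is an assumption chosen for brevity (a real VM would more likely reuse a pool of workers).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper, not Epoch's implementation: calls body(i) for every
// i in [begin, end), splitting the range into contiguous chunks and giving
// each chunk to its own worker thread.
void parallel_for(std::size_t begin, std::size_t end, unsigned workers,
                  const std::function<void(std::size_t)>& body)
{
    assert(begin <= end);
    const std::size_t total = end - begin;
    if (total == 0) return;

    // Use at least one worker, and never more workers than iterations.
    const std::size_t count =
        std::max<std::size_t>(1, std::min<std::size_t>(workers, total));
    const std::size_t chunk = (total + count - 1) / count;  // ceiling division

    std::vector<std::thread> pool;
    pool.reserve(count);
    for (std::size_t start = begin; start < end; start += chunk) {
        const std::size_t stop = std::min(start + chunk, end);
        pool.emplace_back([start, stop, &body] {
            for (std::size_t i = start; i < stop; ++i) body(i);
        });
    }

    // Wait for every chunk to finish before returning to the caller.
    for (std::thread& t : pool) t.join();
}

// Example use: each iteration writes a distinct element, so no locking is needed.
std::vector<int> squares(std::size_t n, unsigned workers)
{
    std::vector<int> out(n);
    parallel_for(0, n, workers, [&out](std::size_t i) {
        out[i] = static_cast<int>(i * i);
    });
    return out;
}
```

The sketch only works because the loop iterations are independent of one another; a loop whose iterations share mutable state would need synchronization or a different decomposition.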
As I write this, I'm putting the finishing touches on Epoch's very own parallelfor loop. It's taken a couple of hours to really get all the semantics right, but the actual process of adding the control structure was surprisingly easy, albeit time-consuming. This gives me a lot of hope for future expansions to the Epoch parallelization repertoire.
Of course, with Epoch, the big news right now is Release 9; as I've mentioned before, I plan to debut R9 at GDC'10. (Don't worry, I'll post the release package on the project site the same day [smile])
That leaves me with precious few hours to finish up the release package. I'm down to evenings and potentially a small chunk of time on Saturday, and then Sunday afternoon I leave for San Francisco. Nothing like a little bit of pressure to keep you on your toes...
The only really significant chunk of work left is to add the CPU failover logic, so that when a suitable GPU is not present, the CUDA extension defers to standard CPU execution. This is slightly important because my demo machine (a.k.a. my notebook) doesn't have a CUDA-ready GPU. It'd kind of look bad to present the project and show it failing to work correctly [grin]
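For the curious, the general shape of that failover can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the code in the Epoch CUDA extension: `Dispatcher`, `Kernel`, `gpu_available`, and `double_on_cpu` are all hypothetical names. In a real CUDA build, the availability probe would typically wrap a device query such as `cudaGetDeviceCount`; here it is just a callback, so the sketch compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// All names below are hypothetical; none come from Epoch or the CUDA API.
using Kernel = std::function<std::vector<float>(const std::vector<float>&)>;

struct Dispatcher {
    std::function<bool()> gpu_available;  // probe for a usable GPU
    Kernel gpu_path;                      // GPU implementation, may be empty
    Kernel cpu_path;                      // CPU implementation, must be set

    // Runs the work on the GPU when one is usable, otherwise on the CPU.
    // Returns the result together with the name of the backend that ran it.
    std::pair<std::vector<float>, std::string>
    run(const std::vector<float>& input) const
    {
        assert(cpu_path && "a CPU fallback must always be provided");
        if (gpu_available && gpu_available() && gpu_path) {
            return {gpu_path(input), "gpu"};
        }
        return {cpu_path(input), "cpu"};
    }
};

// Stand-in workload for the CPU path: doubles every element.
std::vector<float> double_on_cpu(const std::vector<float>& in)
{
    std::vector<float> out;
    out.reserve(in.size());
    for (float v : in) out.push_back(v * 2.0f);
    return out;
}
```

The key property is that the caller never needs to know which backend ran: a machine without a CUDA-ready GPU produces the same results, just more slowly.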
After that, it's down to lots of small detail work; getting the release ready is a fairly involved process, as I'm doing my best not to release totally broken code. Unfortunately, many of these tasks are hard to predict and plan around, so I have no idea at this point if I'll be able to hit my desired R9 deadline.
But, hey, you can sleep when you're dead, right?
I've been tinkering on a little GPGPU library myself (though nowhere near as ambitious as Epoch's transparent facility), so I'm following your discoveries with great interest. You got me wondering if Epoch also needs to deal with GPU latency & sync issues. I'm just using the DX9 API (XNA actually) to do my GPU stuff, so it's entirely possible your CUDA-based code doesn't suffer from this. I sure hope for your sake it doesn't [smile]
Anyway, uploading inputs to the GPU and downloading results causes a pipeline stall for me, which seems to be the key limiting factor for performance. Do you expect this pitfall in Epoch too? If so, how will you handle it? If not, why not, pray tell?