Warp Speed Computing

By John Conway | October 21, 2007 4:13 pm

Here is one of the best ideas I’ve heard in a long time – thanks to Matt Searle for passing this on to me!

Computers often do the same thing over and over again. Microprocessors have become amazingly fast, but because they are general purpose, they cannot match dedicated circuits that perform just one operation and do it blazingly fast. Field-programmable gate arrays (FPGAs) have been used for over two decades for dedicated operations in high-speed electronics, and now Prof. Frank Vahid and his Ph.D. student Roman Lysecky at UC Riverside have married the FPGA to the microprocessor to create “warp speed” computing.

The idea, like many great ideas, is simple: when a computer program finds that it is executing the same instructions repeatedly, and these can be done faster in an FPGA, the program automatically moves that code section to an on-board FPGA, which will run that section up to 1000 times faster than the microprocessor.

Lysecky’s dissertation on warp computing won the 2006 “Dissertation of the Year” prize from the European Design and Automation Association.

This is so obviously a great idea, and will speed up computing in so many circumstances, that I expect we’ll see it in commercial systems very rapidly. This could be a huge breakthrough…

  • SW

    Using FPGAs to accelerate general-purpose computing is called reconfigurable computing (RC), and while it is extremely interesting, it is not new. This paper from 2001 gives an overview of RC going back to the early ’90s; the GARP project was attempting to use reconfigurable coprocessors to speed up loops ten years ago. Dr. Vahid is doing some neat stuff, working toward automatically detecting suitable loops and placing them in the FPGA coprocessor. However, he didn’t single-handedly invent the entire field.

  • hack

    This is what happens when you take press releases literally.

    Incidentally, the computer you’re using right now has something called a graphics coprocessor for accelerating certain repetitive operations “thousands of times” faster than your CPU can do.

  • http://carlbrannen.wordpress.com/ Carl Brannen

    I spent years working with FPGAs, mostly Xilinx. If you design programs for them by hand, then 1000s of times faster is a slight understatement.

    A Xilinx Virtex-5 FPGA has as many as 51,840 slices, which, if it was all devoted to arithmetic would give 207,360 single bit adders or 25,920 8-bit adders. They run at 550 MHz, so the theoretical maximum for 8-bit addition is 14,256 billion operations per second.
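
    The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the figure of four single-bit adders per slice is inferred from the comment's own numbers (207,360 / 51,840), not taken from a datasheet, and the function name is made up for illustration.

```python
# Back-of-the-envelope check of the peak 8-bit addition rate quoted above.
SLICES = 51_840            # slices in the largest Virtex-5, per the comment
ADDER_BITS_PER_SLICE = 4   # assumption inferred from 207,360 / 51,840
CLOCK_HZ = 550e6           # 550 MHz, per the comment


def peak_adds_per_second(adder_width_bits: int) -> float:
    """Theoretical peak additions per second if every slice does arithmetic."""
    one_bit_adders = SLICES * ADDER_BITS_PER_SLICE
    adders = one_bit_adders // adder_width_bits
    return adders * CLOCK_HZ


if __name__ == "__main__":
    print(SLICES * ADDER_BITS_PER_SLICE)      # 207360 single-bit adders
    print(peak_adds_per_second(8) / 1e9)      # 14256.0 billion adds per second
```

    This is an upper bound only; it ignores routing, I/O, and getting operands into the fabric.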

    Actual ratios depend on the operation. Generally, the CPU will have advantages at complicated operations like floating point, especially multiplication and addition, while the FPGA will be blindingly fast on simple 1-bit wide logic operations on as many as 4 variables. Such a logic operation could be, for example, ((A or B) and (C or D)) or ((not A) and (not B)), that is, any logic function of four variables.
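
    One common way an FPGA realizes "any logic function of four variables" is a four-input lookup table (LUT): the function's 16-entry truth table is stored, and the inputs simply select an entry. The Python below is a software model of that idea using the example function from the comment; `build_lut4` and `lut4_eval` are illustrative names, not any vendor's API.

```python
from itertools import product


def example_fn(a: bool, b: bool, c: bool, d: bool) -> bool:
    """The example function from the comment above."""
    return ((a or b) and (c or d)) or ((not a) and (not b))


def build_lut4(fn) -> int:
    """Pack the 16-row truth table of a 4-input function into a 16-bit integer."""
    table = 0
    for a, b, c, d in product([False, True], repeat=4):
        index = (a << 3) | (b << 2) | (c << 1) | int(d)
        if fn(a, b, c, d):
            table |= 1 << index
    return table


def lut4_eval(table: int, a: bool, b: bool, c: bool, d: bool) -> bool:
    """Evaluate the function by a single table lookup, as a LUT would."""
    index = (a << 3) | (b << 2) | (c << 1) | int(d)
    return bool((table >> index) & 1)


if __name__ == "__main__":
    lut = build_lut4(example_fn)
    print(hex(lut))
    print(lut4_eval(lut, True, False, False, False))
```

    Whatever the function, evaluation costs one lookup, which is why simple bit-level logic maps so well to the fabric.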

    FPGAs are widely used in experimental particle physics.

  • Torbjörn Larsson, OM

    I have always fancied the idea of reconfigurable processing, but the field has never taken off. Hopefully approaches such as these will work; they may be based on affordable and scalable realizations.

    Speaking of wet dreams, I’m just reading an article on cloud processing which, among other background, refers to the modular and minimal Xios/3 OS based on XML. Everything on a scalable internet, and the internet on everything modular. (And Google ogling the proceeds, which is both promising and scary.) Keep your fingers crossed for survival in the hard world of software.

  • Torbjörn Larsson, OM

    Oh, and I forgot to add that another driver for reconfigurable processing is its power efficiency, especially if one can convert synchronous processing in a local microprocessor to asynchronous processing in a more generalized net topology. So as the importance of mobility (and energy savings/prices) goes up, the threshold for using it should decrease, one may hope.

  • jick

    That is certainly an interesting idea, but what computer scientists (or rather, computer engineers) have found is that a single idea never solves all the performance issues. Rather, a good idea frequently exposes yet another bottleneck that was hidden before.

    ILP (instruction-level parallelism) is a related idea, and also an old one. Even in the PC world, the original Pentium processor introduced a “superscalar” architecture, which meant that it could occasionally execute two instructions in a single cycle. Still, today’s monster chips don’t have a hundred instructions running in parallel, and not because we don’t have enough chip space to do that. Program code has inherent interdependencies, which limit the number of things you can do at a time… unless you find really really clever tricks.

    There’s also the memory bottleneck issue. Modern DRAMs are usually hundreds of times slower than the CPU, which means the CPU often sits idle for hundreds of cycles just waiting for data. If the CPU suddenly becomes 1000x faster, the situation just becomes much worse (relatively speaking), unless we can redesign the code in a special way such that it does a LOT of computation with only a SMALL amount of data. (It could be feasible for some science applications, but impossible/impractical for most end-user applications.)
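
    The memory-bottleneck point can be made concrete with an Amdahl's-law-style estimate: if only the compute share of the run time is accelerated, the time spent waiting on memory bounds the overall speedup. The sketch below is illustrative; the 50% memory-stall fraction is an arbitrary assumption, not a measurement.

```python
def overall_speedup(memory_fraction: float, compute_speedup: float) -> float:
    """Overall speedup when only the non-memory share of run time gets faster.

    memory_fraction: share of original run time spent waiting on memory (0..1).
    compute_speedup: factor by which the remaining compute time is accelerated.
    """
    if not 0.0 <= memory_fraction <= 1.0:
        raise ValueError("memory_fraction must be between 0 and 1")
    if compute_speedup <= 0:
        raise ValueError("compute_speedup must be positive")
    compute_fraction = 1.0 - memory_fraction
    return 1.0 / (memory_fraction + compute_fraction / compute_speedup)


if __name__ == "__main__":
    # Assumed: half the run time is memory stalls, compute becomes 1000x faster.
    print(round(overall_speedup(0.5, 1000.0), 3))
```

    Under that assumption a 1000x faster compute engine yields slightly less than a 2x overall gain, which is the "twiddling its thumbs" effect in numbers.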

    Wow, I can’t believe I just wrote an informative reply to Cosmic Variance! *giggle*

  • http://www.secureconsulting.net/ Ben

    If memory serves, FPGAs were one of the unique aspects of Transmeta’s microchip program back in the late 90s. Their site doesn’t have much info on it any more (http://www.transmeta.com/tech/microip.html), but I believe this was the basis of their “software-based microprocessor.” At any rate, this does not seem to be a new idea, at least on the surface. Perhaps a new application of an existing idea. Transmeta’s chips were renowned for optimizing themselves over time as they adapted to the host OS that was installed.

  • Jason Dick

    I thought Transmeta focused on VLIW for their processors?

    In any case, yeah, this seems to be a new algorithm based on an old idea. I first heard about this sort of thing with Starbridge Systems about eight years ago or so:

    What seems interesting about this FPGA idea, to me, is that they appear to be attempting to use an FPGA to speed up general processing, without reprogramming. Now, this sort of thing rarely speeds up programs by a significant amount, but given how challenging it is to make new, faster processors, this sounds like an excellent idea to improve performance of general-purpose processors a few percent more.

  • http://arunsmusings.blogspot.com Arun

    To reiterate Jick’s point, unless your data set is small or the required fetches from memory are extremely predictable, your warp speed processor is going to be spending most of its time twiddling its thumbs.

  • Puppy

    I believe that Transmeta tried to use software translation to translate from, e.g., the x86 instruction set to their own native VLIW instruction set. In other words, the operating system and the CPU didn’t actually run the same instruction set.

    I think that this chip lives on as VIA EDEN, but I don’t know if it still features the software translation that was the main feature of the Transmeta chip. If I remember correctly, VIA EDEN is promoted for its low power consumption (and of course not much heat).

  • fh

    SW, but isn’t that the point though? FPGAs have been around for a long time, but the need to “handwrite” their code limits them to specialized fields.
    Automation would mean that it would run transparently to the application programmer, and a slew of current programs could benefit.

    In other words, its main achievement is to change the economics of speeding up programs using RC, because it shifts the weight of coding the reconfiguration from the application developer (who is human, and thus very expensive) to the CPU (which is silicon and very very cheap, albeit of course much less capable than the human).

    Or am I misreading this?

  • jt

    No FH, you’re right, if it all works.

    But such automatic compilation of designs has been the “holy grail” of not just the FPGA world, but the entire parallel/scientific computing field for 40 years. It’s just really hard to extract more than few-way parallelism from non-trivial problems that are stated serially, even without the memory bottleneck. And it’s hard as hades for most humans to express parallel algorithms as such.

    There has been a LOT of work done here, and every high-level or automatic design compiler I’ve seen makes a major tradeoff in flexibility, performance, or ease of use.

    Usually, you can only pick one (or at most two).
    (e.g. Mitrionics, which focuses on ease of use; Handel-C, which goes for flexibility; Viva, which is pretty good and trades flexibility for decent enough performance.)

    The trouble I see here is in the target audience: if you have a major scientific computation to perform that can be accelerated with FPGAs, why would you use this custom processor, when for a little more effort you can get an FPGA designer who can make a custom design which will probably outperform this guy? If you have a webserver with many trivially parallel tasks, unless they map to what the design compiler wants to see, you’ve wasted your silicon; it probably makes sense to buy another COTS box o’ processors.

    You’re spending a lot of silicon on a design tool that you’ll need once in a while– don’t get me wrong, it’s very cool to have it on a chip, and it’s a step towards “evolvable, self-programming hardware”, but I’m not sure of the practicality.

    Bioinformatics and network security are places where this might do very well. You have lots of string and pattern matching which avoids floating point (which currently kills FPGAs), very regular memory accesses and very simple operations (counters, comparators and address generators). But again, the simplicity of the app makes me question whether it isn’t worth the designer time vs. this automatic solution.


  • Haelfix

    How long does it take the main cpu to write the instruction code to the FPGA in general? I’d imagine that would be the huge bottleneck for mainstream apps.

  • jt

    In general (for traditional FPGAs) you’re talking milliseconds to seconds for full configuration. Reconfiguration can be much faster. For an architecture like this, though, the logic fabric is smallish, and it’s very tightly coupled to the CPU. You’d still be looking at a hit, but nowhere near as big.

    Given that before the FPGA gets programmed, enough statistics have to be collected by the profiling unit to determine if it’s worthwhile, it’ll only be reprogrammed if it’s worth it. This is (roughly) the scheme used for Just-In-Time compilation for CPU emulation, and it works there.
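
    The profile-then-reprogram decision described above resembles hot-spot detection in a JIT compiler. Here is a toy sketch of that idea: count loop iterations and move a loop to the fabric only once the time already lost to running it on the CPU exceeds the one-time reconfiguration cost. All names and numbers are hypothetical and are not taken from the Vahid and Lysecky design.

```python
from dataclasses import dataclass


@dataclass
class HotLoop:
    cpu_seconds_per_iter: float    # assumed cost of one iteration on the CPU
    fpga_seconds_per_iter: float   # assumed cost of one iteration on the fabric
    iterations_seen: int = 0
    offloaded: bool = False


def record_iteration(loop: HotLoop, reconfig_seconds: float) -> bool:
    """Count one iteration and decide whether the loop should be offloaded.

    Heuristic (an assumption of this sketch): a loop that has already run N
    times is expected to run at least N more, so offload once the saving over
    N iterations would repay the reconfiguration cost.
    """
    loop.iterations_seen += 1
    if not loop.offloaded:
        saved_per_iter = loop.cpu_seconds_per_iter - loop.fpga_seconds_per_iter
        projected_saving = saved_per_iter * loop.iterations_seen
        if projected_saving > reconfig_seconds:
            loop.offloaded = True
    return loop.offloaded


if __name__ == "__main__":
    loop = HotLoop(cpu_seconds_per_iter=1e-6, fpga_seconds_per_iter=1e-8)
    for _ in range(2000):
        record_iteration(loop, reconfig_seconds=1e-3)
    print(loop.offloaded, loop.iterations_seen)
```

    A loop that is no faster on the fabric never crosses the threshold, so it stays on the CPU and the reconfiguration cost is never paid.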


  • Torbjörn Larsson, OM

    “Rather, a good idea frequently exposes yet another bottleneck that was hidden before.”

    Yes, but that isn’t a matter of complaint as such.

    When you say that DRAMs are substantially slower, don’t you mean that large memory space and subsequent deep data channels are the problems? I’m not sure how to tackle that one though; it seems at least as hard a problem to break down as the serial-to-parallel challenge is.

  • Kaleberg

    Sussman and his people at MIT used to talk about the dynamicist’s workbench back in the 1980s. They were trying to raise the computer power to human reason ratio. They were quite interested in dynamic rewiring for special purpose computations. As a side project, Sussman and Wisdom built a digital orrery to study the orbit of Pluto. Basically, the idea was to wire up an old fashioned analog computer for planetary dynamics, but have it do digital computation, and be subject to digital error analysis. (They had to kludge the time interval so that numerical errors didn’t break energy conservation.)

    Some of the old school analog computation people were really into this idea. They visualized computations as wires and nodes. I remember one mechanical engineering professor, who was a big fan of nomograms. He wanted a set of networkable blocks so he could build a computer program on his desk top and have it run digitally. We could probably do this today, but few people are used to thinking about algorithms that way anymore, despite the inherent parallelism that such thinking requires. This was back in the early 1970s, so this idea has old roots.

  • http://myspace.com/spiritpsience John Quantum

    Technology like this, plus technology that is more quantum-field attuned, would bridge to each other pretty well. Someday, there can be file storage and some processing that is within the quantum field of a small computer that defines and processes data through the field around it. Just as we are processors and vehicles for the information that is stored in the quantum field around us. I envision a future of merging the divine with computers. This warp speed computing idea is pretty good. I foresee a program where someone can chat with higher dimensional beings with their computer or a whole array of applications. Since we live in an electrical universe, computers are electrical too.
