The detailed performance comparison done by Ars Technica between a 2.66 GHz, four-core Mac Pro running Intel's latest "Woodcrest" processors and an older dual G5 showed the new machine winning various tests by an average of less than 25% - despite having more than twice the cycles and running native code.
However, there's rather more to this story: Apple has long suffered from developers' refusal or inability to focus on using the PPC architecture effectively. The result, for example, was badly ported games that failed to exploit the Apple hardware and therefore made that hardware look slow to gamers.
Thus the code used to test Apple's Intel Macs reflects years of expertise in building Intel compilers and writing Intel optimised code - expertise that's generally been missing on the PPC side.
So what happens when you do optimise code for the G5? Well, despite (or perhaps because of) Apple's decision to terminate its IBM relationship, IBM has gone ahead and modified the gcc compiler suite to partially automate production of PPC-optimised code - creating something they call their "GCC Auto-Vectorisation Compiler", aimed at making better use of the PPC AltiVec facility.
The results are stunning. Here's how IBM describes the Telemark benchmark used in a recent test series:
EEMBC's TeleBench is a suite of benchmarks comprised of kernels that include autocorrelation, convolutional encoder, bit allocation, inverse fast Fourier transform, fast Fourier transform, and Viterbi decoder tests. The Telemark score is calculated by taking the geometric mean of these individual benchmark scores and dividing by a normalisation factor. Most of these benchmarks have features that benefit from autovectorization capabilities. For example, the Viterbi decoder computes the most probable transmitted sequence of a convolutional coded sequence. The most computationally intensive part of Viterbi performs a maximisation of a likelihood function through a sequence of add-compare-select (ACS) operations and can benefit significantly from SIMD execution.
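To make the add-compare-select idea concrete, here is a minimal sketch in C of the kind of loop an auto-vectoriser looks for. It is not taken from TeleBench or from IBM's compiler work: the function name acs_step, the flat array layout and the 16-bit metric type are all assumptions made for illustration. Each iteration is independent of the others, which is what lets a SIMD unit such as AltiVec process several trellis states at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One add-compare-select (ACS) step over n trellis states.
 * For each state i, two candidate path metrics are formed by adding a
 * branch metric to a predecessor path metric, and the larger one survives.
 * Hypothetical layout: the two predecessors of state i are pre-gathered
 * into pm_a[i] and pm_b[i]. Metrics are assumed small enough that the
 * 16-bit sums do not overflow. */
void acs_step(const int16_t *restrict pm_a, const int16_t *restrict bm_a,
              const int16_t *restrict pm_b, const int16_t *restrict bm_b,
              int16_t *restrict pm_out, uint8_t *restrict decision,
              size_t n)
{
    assert(pm_a && bm_a && pm_b && bm_b && pm_out && decision);

    for (size_t i = 0; i < n; i++) {
        int16_t cand_a = (int16_t)(pm_a[i] + bm_a[i]);   /* add     */
        int16_t cand_b = (int16_t)(pm_b[i] + bm_b[i]);   /* add     */
        uint8_t pick_b = (uint8_t)(cand_b > cand_a);     /* compare */
        decision[i] = pick_b;                            /* remember which path won */
        pm_out[i] = pick_b ? cand_b : cand_a;            /* select  */
    }
}
```

The restrict qualifiers tell the compiler the arrays do not overlap, which removes one common obstacle to vectorisation. With mainline GCC, a loop like this is a candidate for -ftree-vectorize (together with -maltivec on PowerPC targets); whether IBM's build used the same switches is not something the quoted description says.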
With the standard Green Hills 4.0.5 compilers the 2 GHz 970FX gets an out-of-the-box score of 56.1 on this benchmark. With the new GCC auto-vectorisation suite it gets 141.8 on the same test using the same source code - about 2.5 times the score, or roughly a 153% improvement on the figures quoted.
IBM hasn't reported on the other embedded benchmarks yet, and the results will vary depending on the degree of parallelisation applicable, but that performance gain came simply from making better software use of hardware facilities that have been built in and functional for years - and are now available on Microsoft's Xbox 360, Sony's forthcoming PlayStation 3, and IBM's own Cell and other PPC-based gear - but not on anyone's new Mac.
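For anyone wanting to check the arithmetic, the relative gain between two benchmark scores is a one-line calculation. A minimal sketch in C follows; the helper name relative_gain_percent is made up for illustration, and the intended inputs are the two Telemark scores quoted above.

```c
#include <assert.h>

/* Percentage improvement of new_score over old_score.
 * Hypothetical helper, not part of any benchmark suite. */
double relative_gain_percent(double old_score, double new_score)
{
    assert(old_score > 0.0);  /* a zero or negative baseline has no meaningful gain */
    return (new_score - old_score) / old_score * 100.0;
}
```

Calling relative_gain_percent(56.1, 141.8) gives the improvement implied by those two scores.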