Benchmarks: AMD's 45nm 'Shanghai' Opteron
Published: 20 Nov 2008
Conclusion
In Shanghai, AMD has a processor that can take on Intel's Core 2 architecture in all areas. For pure arithmetic performance, Shanghai still falls short of Intel's chips. But once clock speeds are taken into account, the gap is mostly reduced to a single-figure percentage. Only in single-precision arithmetic using SSE instructions do Shanghai systems show a 35 percent performance shortfall against a comparable Core 2 architecture Xeon system.
Arithmetic performance is highly significant if a server is used for rendering, video encoding or as part of a compute cluster. Typical server tasks, such as file, print, web and database serving, put a special load on memory throughput and memory latency. In these areas Shanghai systems are clearly ahead of their Intel Core 2 counterparts.

Thanks to the memory controller integrated into each Shanghai CPU, memory bandwidth scales with each additional processor in the system — an advantage that becomes particularly evident in four- and eight-processor machines. Intel's four-processor Dunnington system tries to counteract the memory bottleneck by adding large quantities of L3 cache, which succeeds in improving synthetic benchmarks. Buyers considering acquiring a 24-core system usually have to cope with large datasets that don't fit in cache and put a much greater load on the external memory subsystem.
Shanghai systems are efficient in terms of power consumption, but Intel also did its green homework with its 45nm processors, so it can no longer be assumed that AMD systems use less power than comparable Intel systems.
Buyers looking for a balance between power consumption, computational performance and memory throughput will find Shanghai processors extremely attractive. But in two-processor systems, Intel can offer clock speeds up to 3.40GHz, with commensurate increases in arithmetic performance. AMD's top Shanghai processor currently runs at 2.70GHz, but from February 2009 a more power-hungry 2.80GHz version will be available.
Processor development is forging ahead. Intel has already demonstrated what Nehalem can do on the desktop. As well as Nehalem's substantial increases in arithmetic performance, its integrated memory controller is particularly significant. Intel offers a triple-channel DDR3 controller that supports memory up to 1,066MHz, which corresponds roughly to the bandwidth of a hexa-channel DDR2 controller running at up to 533MHz.
If Intel releases the first two-processor Nehalem systems in the spring, it's reasonable to assume they will outstrip the present Shanghai CPUs in most areas. AMD will have to counter that threat with HyperTransport 3.0 and DDR3 support.
AMD's strength in four- and eight-processor systems seems set to continue. Intel is unlikely to offer Nehalem variants for these machines before the end of 2009. Intel's Dunnington chips cannot keep up with AMD's Shanghai because of their restricted main memory bandwidth. Consequently, the two chip-makers will continue to give each other good reason to improve performance and feature sets, and the power user will continue to reap the rewards.
Additional contributions by Rupert Goodwins
Translation by Toby Wolpe

















