Benchmarks: AMD's 45nm 'Shanghai' Opteron
Published: 20 Nov 2008
Expanded L3 cache
In Shanghai, AMD has stuck with the three-level cache architecture established with Barcelona. But while Barcelona's small L3 caches were unimpressive, Shanghai's 6MB shared L3 cache is a big step in the right direction: a cache miss in L2 cache is now much more likely to be compensated for by a hit in L3, avoiding a slow and expensive trip to main memory.
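To make the reasoning about the larger L3 concrete, here is a minimal sketch of what an L2 miss costs on average as the L3 hit rate changes. The function name and every latency and hit-rate figure below are hypothetical placeholders chosen for illustration, not measurements of Shanghai or Barcelona.

```python
def l2_miss_penalty_ns(l3_hit_rate, l3_latency_ns, dram_latency_ns):
    """Average cost of an L2 miss in a three-level hierarchy.

    Every L2 miss pays the L3 lookup; only the fraction that also
    misses in L3 pays the additional trip to main memory.
    """
    if not 0.0 <= l3_hit_rate <= 1.0:
        raise ValueError("l3_hit_rate must be between 0 and 1")
    return l3_latency_ns + (1.0 - l3_hit_rate) * dram_latency_ns


# Hypothetical placeholder values, NOT taken from the article or any datasheet.
L3_LATENCY_NS = 20.0
DRAM_LATENCY_NS = 100.0

small_l3 = l2_miss_penalty_ns(0.3, L3_LATENCY_NS, DRAM_LATENCY_NS)
large_l3 = l2_miss_penalty_ns(0.7, L3_LATENCY_NS, DRAM_LATENCY_NS)
print(f"small L3: {small_l3:.1f} ns, large L3: {large_l3:.1f} ns")
```

Whatever the real latencies are, the penalty falls as the L3 hit rate rises, which is the effect a larger shared L3 is meant to deliver.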
Shanghai CPUs have about the same amount of cache as Intel's first Nehalem (Core i7) CPUs. But while AMD allocates 512KB of exclusive L2 cache per core, Intel's Nehalem chips have only 256KB per core. On the other hand, Nehalem features an 8MB L3 cache compared to Shanghai's 6MB.
Intel's Xeon 5400-series server CPUs have 12MB of cache, and these processors can be viewed as direct competitors to the Shanghai two-processor models. Intel's four-processor models in the Xeon 7400 series have caches of between 14MB and 25MB, but they must access main system memory through a single external quad-channel DDR2 controller. The Nehalem architecture, which pairs an integrated memory controller with a point-to-point processor interconnect analogous to Shanghai's HyperTransport, is not yet available for servers.
The lack of support for DDR3 RAM is particularly significant for main memory. If one sets aside marketing statements and examines throughput as the product of bits per cycle, clock frequency and the number of memory channels, it's clear the crucial factor is the number of memory channels. With AMD's processors, unlike Intel's, the channel count scales as the number of CPUs increases, because each CPU brings its own memory controller.
A two-processor system with eight cores can use DDR2-800 modules at an aggregate effective memory speed of 3.2GHz (four channels at an effective 800MHz each). That's identical to the performance offered by the quad-channel FB-DIMM controller that Intel uses with its Xeon 5000 processors. But with current Intel server platforms every bit must travel over the frontside bus to the northbridge, which also handles the PCI-Express bus.
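The arithmetic behind these figures can be sketched as follows, using the throughput expression described earlier (bits per cycle, clock frequency, number of channels). The function names are illustrative, and two values are assumptions not stated in the article: two memory channels per Opteron socket and a 64-bit data path per channel.

```python
DDR2_800_TRANSFERS_PER_SEC = 800e6   # DDR2-800: 800 million transfers per second
CHANNEL_WIDTH_BITS = 64              # assumption: standard 64-bit DIMM data path
CHANNELS_PER_AMD_SOCKET = 2          # assumption: dual-channel controller per CPU
INTEL_FBDIMM_CHANNELS = 4            # from the article: one quad-channel controller


def aggregate_rate_ghz(channels, transfers_per_sec=DDR2_800_TRANSFERS_PER_SEC):
    """Sum of the effective transfer rates of all channels, in GHz."""
    return channels * transfers_per_sec / 1e9


def peak_throughput_gb_s(channels,
                         width_bits=CHANNEL_WIDTH_BITS,
                         transfers_per_sec=DDR2_800_TRANSFERS_PER_SEC):
    """Theoretical peak: bits per transfer x transfer rate x channels, in GB/s."""
    return width_bits / 8 * transfers_per_sec * channels / 1e9


def amd_channels(sockets):
    """With a memory controller in each CPU, channels grow with socket count."""
    return sockets * CHANNELS_PER_AMD_SOCKET


for sockets in (2, 4):
    amd = amd_channels(sockets)
    print(f"{sockets} sockets: AMD {aggregate_rate_ghz(amd):.1f} GHz aggregate "
          f"({peak_throughput_gb_s(amd):.1f} GB/s peak), "
          f"shared quad-channel controller "
          f"{aggregate_rate_ghz(INTEL_FBDIMM_CHANNELS):.1f} GHz aggregate")
```

For two sockets both designs come out at the 3.2GHz aggregate quoted above. Under these assumptions the per-socket figure doubles at four sockets, while a single shared controller stays where it was. These are theoretical peaks, not measured bandwidth.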
There's no question that AMD delivers more throughput, especially if the hypervisor, operating system and applications all support a genuine NUMA (Non-Uniform Memory Access) architecture. Intel does not yet offer a server platform with an integrated memory controller, and buyers will have to wait until next year for a two-processor system. Four-processor systems will only become available towards the end of 2009.