ZDNet UK



Inside Intel's Penryn

Rupert Goodwins ZDNet.co.uk

Published: 12 Nov 2007


Architecture
The basic architecture of Penryn is familiar to anyone who's followed processor design over the past ten years. Instructions are fetched from memory (hopefully from cache) in as big a chunk as possible, to minimise slow bus transactions. The first step once they're on-chip is pre-decode, where the chunk is broken down into individual instructions and the length of each is determined. The instructions are then pushed into a 64-byte deep queue, which acts as a buffer that absorbs temporary delays between fetching and decoding. In general, four instructions are decoded per clock cycle.

Penryn's basic architecture is unchanged from the previous 65nm processor generation.
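To make the pre-decode and queueing stages concrete, here's a minimal Python sketch. It assumes a made-up encoding in which the first byte of each instruction gives its length (real x86 length decoding is far messier), and the names `predecode` and `InstructionQueue` are hypothetical. The 64-byte queue depth and four-per-clock decode rate are the figures quoted above; everything else is simplified.

```python
from __future__ import annotations
from collections import deque

QUEUE_BYTES = 64   # queue depth quoted in the article
DECODE_WIDTH = 4   # instructions decoded per clock cycle, per the article


def predecode(chunk: bytes) -> list[bytes]:
    """Split a fetched chunk into instructions.

    Hypothetical toy encoding: the first byte of each instruction is its
    total length in bytes. Real x86 length decoding is far more complex.
    """
    instrs = []
    i = 0
    while i < len(chunk):
        length = chunk[i]
        if length == 0 or i + length > len(chunk):
            break  # malformed, or split across two fetches: stop here
        instrs.append(chunk[i:i + length])
        i += length
    return instrs


class InstructionQueue:
    """Byte-limited FIFO between pre-decode and decode (hypothetical model)."""

    def __init__(self, capacity: int = QUEUE_BYTES):
        self.capacity = capacity
        self.used = 0          # bytes currently held in the queue
        self.q = deque()

    def push(self, instr: bytes) -> bool:
        """Queue one instruction; False means the queue is full and fetch must stall."""
        if self.used + len(instr) > self.capacity:
            return False
        self.q.append(instr)
        self.used += len(instr)
        return True

    def decode_cycle(self) -> list[bytes]:
        """Remove up to DECODE_WIDTH instructions: one clock's worth of decoding."""
        out = []
        while self.q and len(out) < DECODE_WIDTH:
            instr = self.q.popleft()
            self.used -= len(instr)
            out.append(instr)
        return out
```

For example, a 10-byte chunk holding six instructions occupies 10 bytes of the queue and drains in two cycles: four instructions, then two. The queue absorbs the mismatch whenever fetch briefly runs ahead of, or behind, decode.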

The processor then works out how to allocate its internal resources to the instructions. It has a big pool of registers, many more than the standard register set that the programmer knows about, and allocates these so that, as far as possible, data doesn't need to be moved between them. It's often possible, for example, to spot a case where the program copies the contents of the EAX register to ECX and then overwrites EAX with other data. Rather than copying the data, it's quicker to relabel the internal register holding EAX's value as ECX, load the new data into a third, unnamed register, and then relabel that register as EAX: that saves one complete data move.
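The renaming trick can be modelled as a table that maps programmer-visible register names onto a larger pool of internal registers. This Python sketch is an illustration under simplifying assumptions: the `RenameTable` class, its method names and the 16-entry pool are hypothetical, and reclaiming internal registers that no longer have a name is left out.

```python
class RenameTable:
    """Toy register alias table: architectural register name -> internal register index."""

    def __init__(self, arch_regs, num_internal=16):
        self.values = [0] * num_internal         # the internal register file
        self.free = list(range(num_internal))    # internal registers not yet allocated
        self.map = {name: self.free.pop(0) for name in arch_regs}

    def mov(self, dst, src):
        """dst = src, done by renaming: dst now names src's internal register; no data is copied."""
        # Simplification: the internal register dst previously named is not reclaimed.
        self.map[dst] = self.map[src]

    def load(self, dst, value):
        """dst = value: put the new data in a fresh internal register, then give it dst's name."""
        reg = self.free.pop(0)
        self.values[reg] = value
        self.map[dst] = reg

    def read(self, name):
        return self.values[self.map[name]]


# The EAX/ECX case described above.
rt = RenameTable(["EAX", "ECX"])
rt.load("EAX", 42)
rt.mov("ECX", "EAX")   # ECX now shares EAX's internal register
rt.load("EAX", 7)      # new data goes into a third register, which becomes EAX
print(rt.read("ECX"), rt.read("EAX"))  # 42 7
```

The copy costs a table update rather than a data transfer; the price is that hardware must track when an internal register is no longer named by anything before it can be reused.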

That example also demonstrates out-of-order execution, where the processor spots instructions that don't depend on earlier results and runs them on otherwise unused internal resources. This is efficient, but it produces results in a different order from the one the program expects, so the results are held in a reorder buffer until they're ready to be returned. The retirement unit then works out which results are due in what order, and releases them in the sequence the program expects.
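The following hypothetical Python sketch shows why the reorder buffer is needed. Micro-ops start as soon as their inputs are ready (it assumes unlimited execution units, which real hardware doesn't have), so results appear out of order, while retirement only ever commits from the oldest entry, restoring program order. The `Uop` and `simulate` names and the latencies in the example are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Uop:
    name: str
    latency: int                   # cycles from issue to result (invented values)
    deps: tuple = ()               # indices of earlier uops whose results are needed
    done_at: Optional[int] = None  # cycle the result is ready; None means not issued yet


def simulate(uops, max_cycles=100):
    """Return (completion order, retirement order) as lists of uop names."""
    completed, retired = [], []
    head = 0  # reorder buffer head: oldest uop not yet retired
    for cycle in range(max_cycles):
        # Issue any uop whose inputs are ready (assumes unlimited execution units).
        for u in uops:
            if u.done_at is None and all(
                uops[d].done_at is not None and uops[d].done_at <= cycle for d in u.deps
            ):
                u.done_at = cycle + u.latency
        # Results finishing this cycle land in the reorder buffer, out of order.
        completed += [u.name for u in uops if u.done_at == cycle]
        # Retirement: commit finished uops strictly from the head, in program order.
        while head < len(uops) and uops[head].done_at is not None and uops[head].done_at <= cycle:
            retired.append(uops[head].name)
            head += 1
        if head == len(uops):
            break
    return completed, retired


program = [
    Uop("load r1", latency=5),                # slow load, e.g. a cache miss
    Uop("add r2, r1", latency=1, deps=(0,)),  # must wait for the load
    Uop("mov r3, 1", latency=1),              # independent of the load
    Uop("mul r4, r3", latency=3, deps=(2,)),
]
completed, retired = simulate(program)
```

Here the independent `mov` and `mul` finish while the load is still outstanding, but nothing retires until the load does, so the program sees its results in the original order.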

Another unit, the memory order buffer, holds data loads that may be related to other loads or stores, and checks what dependencies exist. If a load is going to memory-mapped I/O and depends on information being stored, it isn't dispatched until the store completes. If the load references the store's address, the buffer forwards the data directly from the store. If there are no dependencies, the load can be dispatched ahead of the store. That gives some large advantages, as it avoids the delay of going off-chip, which chip designers regard as expensive.
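The memory order buffer's three cases can be sketched as a single decision function. This is a simplified Python illustration, not Intel's logic: it assumes aligned, same-sized accesses compared by exact address, treats any pending store as blocking a memory-mapped I/O load (a conservative stand-in for "depends on information being stored"), and the MMIO address window and `resolve_load` name are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical memory-mapped I/O window, for illustration only.
MMIO_START, MMIO_END = 0xF000_0000, 0xFFFF_FFFF


@dataclass
class PendingStore:
    address: int
    value: int


def resolve_load(address, older_stores):
    """Decide how a load is handled, given stores that precede it in program order.

    Returns ("wait", None), ("forward", value) or ("dispatch", None).
    Assumes aligned accesses of one size, so an exact address match means a dependency.
    """
    # Memory-mapped I/O: conservatively, don't let the load overtake any pending store.
    if MMIO_START <= address <= MMIO_END and older_stores:
        return ("wait", None)
    # Store-to-load forwarding: take the value from the youngest older store to this address.
    for store in reversed(older_stores):
        if store.address == address:
            return ("forward", store.value)
    # No dependency: the load can be dispatched ahead of the stores.
    return ("dispatch", None)
```

With pending stores of 1 and then 3 to the same address, a load from that address is forwarded the value 3 without touching memory, while a load from an unrelated address goes straight ahead.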

Penryn also keeps a history table of how previous data loads behaved and whether they had a dependency the last time they were executed. This is part of the disambiguation unit, which decides ahead of time whether a particular load is likely to need an expensive memory access or not. This doesn't always work, and when it doesn't the effects can be quite dramatic: the entire internal pipeline has to be reloaded to a previous state, which Intel engineers call 'nuking the pipeline'. Even so, it compensates to some extent for the chip's reliance on a frontside bus and its lack of an on-board memory controller.
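The article doesn't describe the history table's format, so this Python sketch assumes a common textbook approach: a small saturating counter per load address, with a load allowed to run ahead of older stores only after it has repeatedly shown no dependency. The threshold, table size and class and function names are all hypothetical, and a misprediction is simply reported as a "nuke" rather than actually flushing anything.

```python
class DisambiguationPredictor:
    """Toy history table: load address -> confidence that the load has no store dependency."""

    def __init__(self, threshold=3, table_size=256):
        self.threshold = threshold
        self.table_size = table_size
        self.counters = [0] * table_size

    def _index(self, load_pc):
        return load_pc % self.table_size

    def may_speculate(self, load_pc):
        """True if this load has shown no dependency often enough to run it early."""
        return self.counters[self._index(load_pc)] >= self.threshold

    def update(self, load_pc, had_dependency):
        i = self._index(load_pc)
        if had_dependency:
            self.counters[i] = 0       # conflict seen: stop speculating on this load
        elif self.counters[i] < self.threshold:
            self.counters[i] += 1      # saturating count of dependency-free runs


def execute_load(pred, load_pc, had_dependency):
    """Return 'speculated', 'nuke' or 'waited' for one execution of a load."""
    if pred.may_speculate(load_pc):
        # Ran early; if it turned out to depend on a store, the pipeline is nuked.
        outcome = "nuke" if had_dependency else "speculated"
    else:
        outcome = "waited"  # conservative: wait until older store addresses are known
    pred.update(load_pc, had_dependency)
    return outcome
```

Running one load six times, with a dependency only on the fifth run, gives three cautious waits while confidence builds, one successful early execution, a nuke when the guess goes wrong, and then a return to waiting.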

 
