The Pentium Pro Processor Microarchitecture

2696 Words6 Pages

A Tour of the Pentium Pro Processor Microarchitecture

Introduction

One of the Pentium Pro processor's primary goals was to significantly exceed the performance of the 100MHz Pentium processor while being manufactured on the same semiconductor process. Using the same process as a volume production processor practically assured that the Pentium Pro processor would be manufacturable, but it meant that Intel had to focus on an improved microarchitecture for ALL of the performance gains. This guided tour describes how multiple architectural techniques - some proven in mainframe computers, some proposed in academia and some we innovated ourselves - were carefully interwoven, modified, enhanced, tuned and implemented to produce the Pentium Pro microprocessor. This unique combination of architectural features, which Intel describes as Dynamic
Execution, enabled the first Pentium Pro processor silicon to exceed the original performance goal.

Building from an already high platform

The Pentium processor set an impressive performance standard with its pipelined, superscalar microarchitecture. The Pentium processor's pipelined implementation uses five stages to extract high throughput from the silicon - the Pentium Pro processor moves to a decoupled, 12-stage, superpipelined implementation, trading less work per pipestage for more stages. The Pentium Pro processor reduced its pipestage time by 33 percent, compared with a Pentium processor, which means the
Pentium Pro processor can have a 33% higher clock speed than a Pentium processor and still be equally easy to produce from a semiconductor manufacturing process
(i.e., transistor speed) perspective.

The Pentium processor's superscalar microarchitecture, with its ability to execute two instructions per clock, would be difficult to exceed without a new approach. The new approach used by the Pentium Pro processor removes the constraint of linear instruction sequencing between the traditional "fetch" and
"execute" phases, and opens up a wide instruction window using an instruction pool. This approach allows the "execute" phase of the Pentium Pro processor to have much more visibility into the program's instruction stream so that better scheduling may take place. It requires the instruction "fetch/decode" phase of the Pentium Pro processor to be much more intelligent in terms of predicting program flow. Optimized scheduling requires the fundamental "execute" phase to be replaced by decoupled "dispatch/execute" and "retire" phases. This allows instructions to be started in any order but always be completed in the original program order. The Pentium Pro processor is implemented as three independent engines coupled with an instruction pool as shown in Figure 1 below.

What is the fundamental problem to solve?

Before starting our tour on how the Pentium Pro processor achieves its high performance it is important to note why this three- independent-engine approach was taken. A fundamental fact of today's microprocessor implementations must be

More about The Pentium Pro Processor Microarchitecture

Open Document