The Truth About Processor Performance (a.k.a AMD GHz vs. Intel GHz)




Posted by: nitestick

[b]THE LONG STORY[/b] (short story at bottom)

[I]For the longest time people have had difficulty comprehending the power of processors. It used to be assumed that the higher a processor's frequency (its number of MHz or GHz), the more powerful it was. That is no longer the case. While the frequency does matter, it is not the only factor that determines performance.[/I]

In the past five years AMD and Intel have taken two separate routes in developing their processors. For most of those five years Intel has chosen to manufacture high-frequency processors while AMD has manufactured lower-frequency ones (the highest clocked being 3.8GHz and 2.8GHz respectively). Many people wondered why AMD made “slower” processors. The truth is that they [b]did not[/b]. A few things influence how quickly a processor works, and I’ll try to stick to the more important and fundamental ones:

[b][i]Execution Units:[/i][/b] The number of execution units basically determines how many things a processor can do at once:

[quote] Execution unit
From Wikipedia, the free encyclopedia

In computer engineering, an execution unit is a part of a CPU that performs the operations and calculations called for by the program. It may have its own internal control sequence unit (not to be confused with the CPUs main control unit), some registers, and other internal units such as a sub-ALU or FPU, or some smaller, more specific components.
It is commonplace for modern CPUs to have multiple parallel execution units, referred to as scalar or superscalar design. The simplest arrangement is to use one, the bus manager, to manage the memory interface, and the others to perform calculations. Additionally, modern CPUs execution units are usually pipelined.[/quote]
[url=http://en.wikipedia.org/wiki/Execution_unit]Wikipedia[/url]

[b][i]Pipeline:[/i][/b] The length of the pipeline determines how many clock cycles pass before an instruction is processed completely and the processor's "decision" or output is finalised.

[i][b]Frequency:[/b][/i] This is the number of “clock cycles” per second. It sets the upper limit on how many instructions the processor can work through each second (given the way its pipeline and execution units work).

[b][i]Memory Access/Cache:[/i][/b] How quickly a processor can access its memory or the cache built into it will affect how fast it can perform certain tasks. Think of the memory as being a book. If the processor has enough memory and fast enough access to it, it would be like having every page of the book torn out and laid side by side. That would make it possible to see every page of the book in one instant. If the processor doesn’t have enough memory and/or can’t access it quickly enough, it’s like having to flick through every page.

[u]Now that is all good and well but it still doesn’t explain how a 1.8GHz Athlon 64 can perform just as well as a 3.0GHz Pentium 4, so let’s take a look at the Pentium 4 Prescott vs. the Athlon 64 Venice.[/u]

[COLOR=blue]Prescott:
90nm die process size
3 execution units
31 stage pipeline
3.0GHz
Memory controller on the motherboard’s “Northbridge”
[/COLOR]

[COLOR=blue]Venice:
90nm die process size
3 execution units
12 stage pipeline
1.8GHz
Memory controller integrated into processor.[/color]

Once again though, I’m sure that still doesn’t explain things so…………..

The length of the pipeline is probably the largest difference here. As you can see, the Prescott and the Venice have a 31 stage and a 12 stage pipeline respectively. What that means is that it takes the Pentium 4 from our example 31 clock cycles to complete a single instruction before it can start another! The Athlon, on the other hand, has the short 12 stage pipeline and will process more than 2 instructions on each execution unit in the time it takes the Pentium 4 to do one single instruction on each unit. So while the Athlon has a much lower frequency, it makes more efficient use of it.

This is where the naming system used by AMD comes in. The names such as 3000+ and 4000+ are rough indicators of performance. They are called P-Ratings and were originally derived by comparing the performance of the Athlon XP’s to the original Athlon Thunderbird: the rating is roughly the frequency in MHz of an Athlon Thunderbird that would equal it. It just so happens that the Athlon Thunderbird was about as efficient as the Pentium 4 Prescott, which makes a comparison easy. Essentially it means that an Athlon 64 3700+ is equivalent to or better than a 3700MHz (3.7GHz) Pentium 4 even though it only runs at 2.2GHz.
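To put a number on that, here is a minimal Python sketch using only the figures above (an Athlon 64 3700+ running at 2.2GHz). The function name is made up for illustration, and the ratio is only as meaningful as the rating itself, which is a rough indicator rather than a measurement.

```python
def rating_per_mhz(model_rating: int, actual_mhz: int) -> float:
    """Hypothetical helper: how many 'rated MHz' each real MHz is worth."""
    return model_rating / actual_mhz


# Figures from the post: an Athlon 64 3700+ is clocked at 2.2GHz (2200MHz).
ratio = rating_per_mhz(3700, 2200)
print(f"3700+ at 2200MHz is rated at {ratio:.2f}x its real clock speed")
```

In other words, the rating claims each Athlon 64 clock cycle is worth roughly one and two-thirds Pentium 4 cycles.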

The other factor here is the Athlon 64’s built-in memory controller. In the past, and with Intel’s current processors, the processor’s “FSB” (Front Side Bus) would be used to “talk” to the motherboard chipset's Northbridge, which contains the memory controller. The memory controller would then “talk” to the memory, and the process would run in reverse to “talk” to the processor again. After this game of Chinese whispers quite a lot of time has been wasted just fetching some information, and only THEN can the processor get to work. Having the memory controller on the processor allows much more direct access to the memory, so the processor can get straight to work.

There are other good reasons why it is better to have a processor that does more work at a lower frequency. The higher the frequency, the higher the power consumption and the amount of heat that the processor puts out. The other problem is that the higher the frequency, the greater the likelihood that the processor will make an error, which is how the Pentium 4 ended up with a 31 stage pipeline: the longer pipeline allowed it to run at a higher frequency, but at the cost of efficiency.
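The link between frequency and power can be sketched with the standard first-order approximation for CMOS switching power, P = C × V² × f. The capacitance and voltage figures below are hypothetical, picked only to show the trend (they are not measurements of any real processor), and the formula ignores leakage.

```python
def dynamic_power_watts(capacitance_farads: float,
                        voltage_volts: float,
                        frequency_hz: float) -> float:
    """First-order CMOS switching power: P = C * V^2 * f (leakage ignored)."""
    return capacitance_farads * voltage_volts ** 2 * frequency_hz


# Hypothetical effective switched capacitance, for illustration only.
C_EFF = 20e-9

# Hypothetical operating points: a lower-clocked chip at a lower voltage
# versus a higher-clocked chip that needs more voltage to stay stable.
low_clock = dynamic_power_watts(C_EFF, 1.2, 2.0e9)
high_clock = dynamic_power_watts(C_EFF, 1.4, 3.8e9)

print(f"2.0GHz at 1.2V: {low_clock:.1f} W")
print(f"3.8GHz at 1.4V: {high_clock:.1f} W")
```

Because higher clocks usually need a higher voltage as well, and voltage is squared, power tends to climb faster than the clock speed does.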

So the difference isn’t performance, it’s just a different way of getting things done. That brings us to the latest generation of processors: AMD’s socket AM2 and Intel’s Core 2.

[b]So what’s different?[/b]
AM2: Not a whole heck of a lot. Socket AM2 processors are, for all intents and purposes, exactly the same as the Athlon 64 we have already talked about, with just one difference: the memory controller has been modified to use DDR2 memory instead of the DDR the original Athlon 64’s used. The DDR2 memory only provides a marginal performance increase.

Core 2: Intel has repented. This is a complete departure from the design of the Pentium 4. The Core 2 processors are based on the Core processors, which in turn are based loosely on the Pentium M processors……which are based on the Pentium 3. Of course there are a lot of differences between the Pentium 3 and the Core 2, but they are relatives. The Core 2 is more efficient than the Athlon 64, as its pipeline has been shortened to 14 stages (almost the same as the Athlon 64) and an extra execution unit has been added. Aside from this, the Core 2 processors have shown amazing potential for being run at high frequencies despite their efficiency. Many have been seen to reach 3.6GHz, which has been almost impossible for the Athlon 64.

[u]So now let us compare the Core 2 Conroe and the Athlon 64 Toledo/Windsor cores.[/u]

[COLOR=blue]Core 2 Conroe E6600:
65nm die process size
4 execution units
14 stage pipeline
2.4GHz
Memory controller on motherboard's Northbridge

Windsor 5000+:
90nm die process size
3 execution units
12 stage pipeline
2.6GHz
Memory controller integrated into processor
[/color]

The Conroe core is Intel's new flagship. The addition of a 4th execution unit, improved L2 cache function and shortened pipeline have combined to form a monstrously powerful processor. As you can see, I am now comparing an Athlon 64 X2 5000+ with a clock speed of 2.6GHz to the E6600 at 2.4GHz. The newfound efficiency of the Conroe means that despite being 200MHz lower in frequency it is far more powerful. In fact, at stock speed the E6600 can outperform the FX-60, which is a 2.6GHz Toledo core (think of this as being around the power an X2 5200+ would be if it existed), and challenges the 2.8GHz Windsor core FX-62 (~5600+).

[SIZE=3][u][b]MYTHS[/b][/u][/SIZE]

A 3.8GHz P4 is better than a 2.8GHz Athlon 64: False. A 2.8GHz Athlon 64 is roughly equivalent to a 4.8-5GHz P4.

65nm will increase frequencies: False. While reducing the size of the parts in a processor does reduce the voltage required, and consequently the heat output by the processor, this does not mean it has a greater potential for overclocking. The smaller parts are more sensitive to heat, so once that is factored in there is no gain other than lower manufacturing cost and lower operating power.

Reverse Hyper Threading: This is complete fantasy as near as I can tell. Certain websites fabricated and perpetuated this myth. It suggests that the AM2 processors have something called Reverse Hyper Threading that enables the separate processor cores of a dual core to operate as one. It also suggested at a date that has been and gone the RHT technology would be enabled with a BIOS update……..still waiting :p

DDR2 memory is better than DDR: False. It’s just a different type of memory really. Similar to the difference between the Athlon and P4, DDR obtains its performance from efficiency and DDR2 from high frequency. Here are some comparisons between DDR and DDR2, though keep in mind that the results are mainly due to the Athlon's memory controller. Once we have some AM2 processors with DDR2 memory it will be a better comparison. Overall, while the bandwidth provided may be different, it does not make a particularly noticeable difference. [url=http://tech-forums.net/showthread.php?s=&threadid=120879]Gaara’s Bandwidth Comparison[/url]

AMD will have a quick response to Core 2: I hate to say it but this is false too. It’s really impossible for AMD to field any competition for the Core 2 Conroe core until at least midway through 2007.

Intel’s FSB is higher: This is more a misconception than a myth. Intel’s advertised FSB has always been the “effective FSB” rather than the true FSB clock. The later Athlon 64’s actually have a “2000MT/s” link to the chipset (HyperTransport, which takes the place of the FSB). That is a 1000MHz clock performing 2 transfers per clock cycle, or 2,000 Million Transfers/Second.
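The "effective" versus "true" distinction is just multiplication, as this small Python sketch shows. The function name is invented for illustration; the Pentium 4 example assumes the commonly advertised "800MHz" FSB, which is a 200MHz clock moving data four times per cycle, and the Athlon 64 example uses the 1000MHz link described above.

```python
def effective_rate_mt_s(clock_mhz: float, transfers_per_clock: int) -> float:
    """Effective transfer rate in millions of transfers per second (MT/s)."""
    return clock_mhz * transfers_per_clock


# Pentium 4 "800MHz" FSB: 200MHz true clock, 4 transfers per clock.
p4_fsb = effective_rate_mt_s(200, 4)

# Later Athlon 64 link: 1000MHz clock, 2 transfers per clock.
a64_link = effective_rate_mt_s(1000, 2)

print(f"Pentium 4 FSB:  {p4_fsb:.0f} MT/s from a 200MHz clock")
print(f"Athlon 64 link: {a64_link:.0f} MT/s from a 1000MHz clock")
```

Comparing the advertised numbers directly only makes sense once both are expressed the same way, either as true clocks or as transfers per second.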

Adding more cache automatically makes the processor faster: Not exactly. Additional L2 cache does help if designed correctly, but just adding extra cache does not necessarily help. A perfect example of this is the original P4 Extreme Edition (code name "Gallatin"). This was a P4 pretty much merged with a Xeon: it was given extra cache in the form of an L3 cache. Despite being higher clocked than the Northwood Pentium 4's of the time, it was actually beaten in many applications, though it took a significant lead in encoding and in some games.



[b]THE SHORT STORY[/b]
On its own, the frequency of a processor tells you very little. What matters is how efficiently it uses its clock cycles. The Athlon 64 is more efficient than the Pentium 4, so it can outperform it at a lower clock speed, and the Core 2 Duo is more efficient than the Athlon 64 and can outperform it at a lower clock speed.

Core 2>Athlon 64=Core>Pentium 4

Any corrections, additions or requests are welcome as I'm certain I've probably messed something up :p. This shall remain a work in progress.

Happy Reading,
Nitestick

[COLOR=teal]© Nitestick 2006[/COLOR]
______________________________________________
[url=http://www.HotFile.us/HF/The Truth About Processor Performance_TF Guide.pdf/1bf4e824]Download the *.PDF Guide Mirror #1[/url]



Posted by: idiotec

Very good info :D

One point I will comment on:

[QUOTE]DDR2 memory is better than DDR: False. It’s just a different type of memory really. Similar to the difference between the Athlon and P4, DDR obtains it’s performance from efficiency and DDR2 from high frequency. In fact it’s arguable that DDR is superior, see this thread Gaara’s Bandwidth Comparison[/QUOTE]

I am not going to argue that DDR2 is "better," but I think the argument here is somewhat misleading.

Within the thread you linked to, all comparisons are DDR on A64 vs DDR2 on Intel. In that comparison, the bigger factor is IMC vs chipset MC. When comparing DDR on 939 vs DDR2 on AM2, DDR2 does add a significant amount of bandwidth. Now, this does not translate into significant performance increases for AM2 systems, primarily because 939 systems weren't bandwidth starved to begin with.

Great thread though, should answer a lot of people's questions.
;)



Posted by: Lord AnthraX

Excellent Post.

Glad this is stickied.


This will answer a lot of people's questions :D



Posted by: nitestick

[QUOTE][i]Originally posted by idiotec [/i]
[B]Very good info :D

One point I will comment on:



I am not going to argue that DDR2 is "better," but I think the argument here is somewhat misleading.

Within the thread you linked to, all comparisons are DDR on A64 vs DDR2 on Intel. In that comparison, the bigger factor is IMC vs chipset MC. When comparing DDR on 939 vs DDR2 on AM2, DDR2 does add a significant amount of bandwidth. Now, this does not translate into significant performance increases for AM2 systems, primarily because 939 systems weren't bandwidth starved to begin with.

Great thread though, should answer a lot of people's questions.
;) [/B][/QUOTE]

:o very true, i think i have journalist blood in me as i kind of thought that in the back of my head yet wrote it still to prove a point. i'll rectify that

edit: have changed that information about DDR2, added some more myths and cleaned up the format a little. still plenty of work though i guess



Posted by: alexsabree

very nice, i am going to print out this thread and show all of my dumb classmates in my programming class how stupid they are. :D

I mean seriously, I'm like the only one who knows anything in that class
(and it's an elective)



Posted by: nitestick

i've added a *.PDF download link to the bottom of the guide. the file host i'm using is quite crap and has a 60 second delay before you can start the download. if anyone knows a decent free filehost could they PM me. i really only need to store the PDF and it's <100kb so i don't need much :p



Posted by: Lord AnthraX

[QUOTE][i]Originally posted by nitestick [/i]
[B]i've added a *.PDF download link to the bottom of the guide. the file host i'm using is quite crap and has a 60 second delay before you can start the download. if anyone knows a decent free filehost could they PM me. i really only need to store the PDF and it's <100kb so i don't need much :p [/B][/QUOTE]


:D
[url]http://rapidshare.de/files/34818326/The_Truth_About_Processor_Performance_TF_Guide.pdf.html[/url]



Posted by: hickfarm

This helped out a lot. I always wondered why AMD's GHz was so much smaller than Intel's. I knew that 4200+ meant that is how it compared to Intel.
Great Thread.



Posted by: RalliArt882

This helped me out a lot. I always knew that Core 2 Duos were better than AMD X2 and those were better than Pentiums, but never knew why until I read this.



Posted by: alexsabree

someone needed to clear this up.. wow, never knew so many computer geeks could be so stupid..

I'm just pi***d about my classmates thinking that I'm wrong all the time, when I absolutely know I'm right



Posted by: keyser09

hey but i thought ddr2 was better than ddr cause the guy at the shop said ddr2 is dual channel and will improve my performance by double...



Posted by: nitestick

well the guy at the shop was lying. DDR has used dual channel for about 4 years now (i think). dual channel does not double performance, but it does allow larger amounts of data to be transferred to memory at the same time because the memory bus is 128 bits wide rather than 64 bits. the only thing DDR2 beats DDR memory in is frequency. the latencies are poor by comparison. for example, the best latencies i have seen for DDR2 are 3-3-3-10 for DDR2-400, versus something like my memory, which has latencies of 2-2-2-5 at DDR-400 (incidentally it can run at over DDR500 with those as well). in my opinion DDR is superior to DDR2 as a platform, but unfortunately it has frequency limits, so AM2 can actually get higher bandwidths with top-end memory
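For anyone who wants to check the arithmetic, here is a rough Python sketch of the two numbers being argued about: CAS latency in nanoseconds and peak theoretical bandwidth. The function names are made up for illustration. It assumes the first timing number is the CAS latency counted in memory bus clock cycles, and that both DDR and DDR2 transfer data twice per bus clock, so a "-400" rating means a 200MHz bus clock. Real-world throughput will be lower than the peak figures.

```python
def cas_latency_ns(cas_cycles: int, data_rate_mt_s: float) -> float:
    """CAS latency in nanoseconds for double-data-rate memory."""
    bus_clock_mhz = data_rate_mt_s / 2  # two transfers per bus clock
    return cas_cycles * 1000 / bus_clock_mhz


def peak_bandwidth_mb_s(bus_width_bits: int, data_rate_mt_s: float) -> float:
    """Peak theoretical bandwidth in MB/s: bytes per transfer x transfers."""
    return bus_width_bits / 8 * data_rate_mt_s


# Timings quoted in the post above.
ddr_400_cl2 = cas_latency_ns(2, 400)
ddr2_400_cl3 = cas_latency_ns(3, 400)
print(f"DDR-400  CL2: {ddr_400_cl2:.1f} ns")
print(f"DDR2-400 CL3: {ddr2_400_cl3:.1f} ns")

# Single channel (64-bit) versus dual channel (128-bit) at the same speed.
single = peak_bandwidth_mb_s(64, 400)
dual = peak_bandwidth_mb_s(128, 400)
print(f"Single channel: {single:.0f} MB/s, dual channel: {dual:.0f} MB/s")
```

Dual channel doubles the peak figure on paper, which is not the same thing as doubling the performance of the whole system.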



Posted by: beluga

mann that really does help.. ive always been a bit confused with those ghz and mhz... with amd and intel... how bout crusoe processors? or G5s



Posted by: hmammen

great read......pretty much knew all of this previously but this cleared up some stuff........*favorited!*



Posted by: str8lazy

This has been a much-needed post for a long time. I know that you have seen the wars between the AMD and Intel fanboys (I'm one of them) and despite the recent turnover by Intel, I am going to support AMD to the fullest! Thanks once again, great read.



Posted by: TriEclipse

Nice and lengthy, lots of good info, nice work.

But...

You compared a 2.4GHz Conroe to a 2.6GHz A64 X2? :p Amusing.

And just a minor correction, but Core 2 Duo has a 12-stage pipeline, not 14.



Posted by: maroon1

I have some questions.

As far as I know, the Intel Northwood and Willamette have a 20-stage pipeline, while the Prescott and Cedar Mill have a 31-stage pipeline.

So, does this mean that Willamette and Northwood are faster than Prescott or Cedar Mill if they are running at the same frequency?



Also, I have another question. As far as I know, the Pentium 3 has a 10-stage pipeline. So, does this mean that the Pentium 3 is faster than the Pentium 4?



Posted by: TriEclipse

@ Maroon1;

Pipeline length does not automatically translate directly to the performance. The complexity of the stages also plays a major role.

For example, consider the similar performances of a 2.0GHz A64 and a 3.2GHz Pentium 4. The A64 has a 14-stage pipeline, and the P4 a 31-stage pipeline, but the Athlon 64 is not necessarily any faster than the P4.

The Athlon 64 accomplishes this through the use of complex stages and a short pipeline. The Pentium 4 accomplishes the same thing through the use of a longer pipeline, and simpler stages. The effect is that the Athlon 64 can do the same amount of work at slower clockspeeds while the Pentium 4 has to push for higher clocks to compete.

It is true that, at the same clockspeeds, a processor with shorter pipelines and more complex stages will beat out a processor with longer pipes and simpler stages. So, to use your example, if you have a Pentium III at 1.0GHz and a Pentium 4 at 1.0GHz, the Pentium III would indeed beat out the Pentium 4. However, the Pentium 4 was designed with longer pipes and simpler stages so that it could race ahead in clockspeed and beat out the older generation through the use of pure speed.
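As a rough illustration of that trade-off, here is a minimal Python sketch that treats performance as work per clock multiplied by clock speed. The per-clock figures are hypothetical, chosen only to show the shape of the argument, and are not measurements of either processor.

```python
def relative_performance(work_per_clock: float, clock_ghz: float) -> float:
    """Toy model: performance = work done per clock x clocks per second."""
    return work_per_clock * clock_ghz


def break_even_clock_ghz(work_per_clock_a: float, clock_a_ghz: float,
                         work_per_clock_b: float) -> float:
    """Clock speed chip B needs in order to match chip A in this toy model."""
    return work_per_clock_a * clock_a_ghz / work_per_clock_b


# Hypothetical per-clock efficiency figures, for illustration only.
P3_WORK_PER_CLOCK = 1.0
P4_WORK_PER_CLOCK = 0.7

p3_at_1ghz = relative_performance(P3_WORK_PER_CLOCK, 1.0)
p4_at_1ghz = relative_performance(P4_WORK_PER_CLOCK, 1.0)
p4_needed = break_even_clock_ghz(P3_WORK_PER_CLOCK, 1.0, P4_WORK_PER_CLOCK)

print(f"At 1.0GHz each: P3 = {p3_at_1ghz:.2f}, P4 = {p4_at_1ghz:.2f}")
print(f"P4 clock needed to match the 1.0GHz P3: {p4_needed:.2f}GHz")
```

Under these made-up numbers the longer-pipeline chip loses clock for clock but wins as soon as it can be clocked past the break-even point, which is the bet the Pentium 4 design made.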

I'm not sure about how the Willamette and Northwood compare to the Prescott, since there are many, many other things besides the simplicity or complexity of the stages that dictate the performance of a processor. But the premise behind the Prescott was to make the pipes even longer, and the stages even simpler, to achieve higher clocks than the older cores could provide. At this point, Intel still hadn't realized the brick wall of insulator leakage that would leave its processors crippled before even hitting 4GHz.

Hope that cleared things up a bit (or maybe just made it even more complicated, haha).



Posted by: nitestick

[quote]As far as I know, the Intel Northwood and Willamette have a 20-stage pipeline, while the Prescott and Cedar Mill have a 31-stage pipeline.[/quote]

it's not quite black & white. the "grey" area (i like this metaphor :D) is the scalability of performance. at the lower end of clock speeds when comparing say a Northwood and Prescott, perhaps at about 2.4GHz the Northwood would win out. the Northwood proved to not scale well with clock speed though and that is part of why they did not go much past 3.0GHz. while at about 2.8GHz a Northwood and a Prescott would be about even i think after around 3.5GHz Northwoods stopped providing significant improvements and the Prescott gained a sizeable performance lead.

yes the P3 is indeed faster than a P4 at the same clock speeds, it is well publicised. it wasn't until the Willamette hit about 1.7GHz that it could officially take the performance crown from the last P3s in all tests.

[quote]
And just a minor correction, but Core 2 Duo has a 12-stage pipeline, not 14.[/quote]

that is greatly appreciated :D. most of this information was originally typed at around 12:30am, [b]without[/b] research :p.

[size=3][color=red]NOTE: anything that doesn't directly pertain to understanding this thread/corrections etc will be split to another thread. spam of course will be deleted. i want this to stay clean and there is no need to post [quote]
well im a nubby here but i have a chance to build a computer the cpus witch a freind has but dont know witch one to take he said there both good . but need some help on witch one to take ? the both intels ones a pent D 805 and a pent 4 650 ht. which one should i take ? i love my games .[/quote] and the like. discuss away :D just don't hi-jack a moderator[/color][/size]



Posted by: hbbk131

That is a wonderful and most informational post…great job!

Cliffs Notes version of a wonderful post: if you have the time, read it; otherwise here is a shorter, less intellectual explanation to accompany the original post. Latency is the keyword here...AMD architecture has been lower in latency for a long time...this is why they have performed better. Think of it in terms of two straws trying to draw on the same liquid...one is .0625 in diameter and the other is .25 in diameter. In order to get both straws to draw the same amount of liquid, the .0625 will have to work harder than the .25. Same principle in the CPU architecture. Lower latency means better performance. I just hope AMD can make a comeback with the quad cores coming out soon…I really hate Intel.

-HBBK131 :D



Posted by: TriEclipse

Well AMD has its [i]competition[/i] for the Core 2 Quad in Q3 of '07, but only one quarter later, Intel has their 45nm version of the Core 2 series, including Quad Cores that work at upwards of 3GHz (up to 3.73GHz). Can't say things look too good for AMD for a while. There's a similar situation with AMD's K8L Dual Cores and Intel's 45nm Dual Cores.



Posted by: TriEclipse

Please post a separate thread about this so as to not litter the sticky.



Posted by: mizner07

Thanks for the info, it helped me out in determining what processor I should buy, because I'm building a new computer.



Posted by: maroon1

[QUOTE][i]Originally posted by TriEclipse [/i]
[B].

And just a minor correction, but Core 2 Duo has a 12-stage pipeline, not 14. [/B][/QUOTE]

I searched on Google, and I found that Core 2 Duo has a 14-stage pipeline



Posted by: mizner07

What does 14 stage pipeline mean?



Posted by: cvb724

^ Try reading nitestick's first post as a starter :)



Posted by: TriEclipse

[QUOTE][i]Originally posted by maroon1 [/i]
[B]I have searched by google, and I found that Core 2 Duo has a 14-stage pipeline [/B][/QUOTE]

My bad. It's the Athlon 64 that has a 12-stage pipeline, and not the Conroe. I knew it was one or the other. In any case, a correction is in order.



Posted by: GreenMachine

EXCELLENT POST. :)

AMD vs INTEL = popcorn and Tech-Forums!


I WAS WONDERING IF WE COULD ADD MORE BRAIN SOOTHING INFORMATION.

I COULD READ ON AND ON AND ON AND ON.


Fascination is empowerment of todays minds. ( dank quote by myself xD )



Posted by: nitestick

PEOPLE, PLEASE, PLEASE FOR THE LOVE OF GOD STOP JACKING THE GUIDE WITH UNRELATED QUESTIONS!!!!!!!



Posted by: nickm926

good info, my 3800+ is just sweeeet



Posted by: vernong1992

well this is no longer true with the intel core 2 duo processors



Posted by: Meithan

I disagree on some things about CPU performance, especially this about pipeline length:

[QUOTE][I]What that means is that it takes the Pentium 4 from our example 31 clock cycles to complete a single instruction before it can start another.[/I][/QUOTE]

This is very misleading. While it is true that it's not possible to push more than one pipeline stage per clock, you can't say pipeline length is equal to cycles per instruction. Modern desktop processors are based on superscalar, out-of-order designs. Determining how many clocks an instruction needs to complete is far more complex than just counting the number of pipeline stages.

Did you know that out of the 31 stages of the Prescott pipeline, 21 alone are dedicated to branch prediction? You could easily just group these 21 stages into a single stage labeled "Branch Prediction" and say the Prescott has only an 11-stage pipeline. Sure, you'd still need at least 21 cycles to complete the stage, but if that allows the processor to predict the flow of instructions with much more precision, then in the end you'll end up winning cycles.

A longer pipeline means you can split up work into smaller, easier-to-handle pieces (and thus achieve better logic control, such as branch prediction), but when an instruction stalls deep down the pipeline, you lose a lot of cycles. The main problem with long pipelines is that they require high clock frequencies to remain competitive, and this carries severe transistor leakage issues. This is why NetBurst was abandoned.
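To make the difference between pipeline depth and cycles per instruction concrete, here is a deliberately simplified Python sketch. It assumes a single-issue, in-order pipeline that retires one instruction per cycle once full, and it charges a full refill for every flush, as if a mispredicted branch were only caught at the last stage. Real superscalar, out-of-order chips are far more complicated, so treat the numbers as illustrative only; the function name and the flush count are invented for the example.

```python
def cycles_to_run(n_instructions: int, depth: int, flushes: int = 0) -> int:
    """Cycles for an idealised single-issue pipeline.

    The pipe takes (depth - 1) cycles to fill, then retires one instruction
    per cycle. Each flush (e.g. a mispredicted branch) costs another refill.
    """
    fill = depth - 1
    return fill + n_instructions + flushes * fill


N = 1000  # hypothetical instruction count

for name, depth in (("31-stage", 31), ("12-stage", 12)):
    naive = N * depth                              # "one at a time" reading
    smooth = cycles_to_run(N, depth)               # pipelined, no flushes
    bumpy = cycles_to_run(N, depth, flushes=50)    # pipelined, 50 flushes
    print(f"{name}: naive {naive}, pipelined {smooth}, with flushes {bumpy}")
```

With no flushes the two depths finish within a couple of percent of each other in cycle count, nowhere near the 31-to-12 ratio the naive reading suggests. The deep pipeline only falls well behind once flushes are added, which is the stall cost described above.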

The argument that the Conroe is faster than the Prescott [I]because[/I] of a shorter pipeline is wrong. A better (but still superficial) description would be that Conroes are better because their pipeline is [I]wider[/I].