문제

For a brief report I have to do, our class ran code on a cluster using both gcc -O0 and icc -O0. We found that gcc was about 2.5 times faster than icc without any optimizations? Why is this? Does gcc -O0 actually do some minor optimization or does it simply happen to work better for this system?

The code was an implementation of the naive string searching algorithm found here, written in c.

Thank you

도움이 되었습니까?

해결책

A few things to take into account:

  • The instruction set each compiler uses by default. For example if your GCC build produces i686 code by default, while ICC restricts itself to i586 opcodes, you would probably see a significant performance difference.

  • The actual CPUs in your cluster. If you are using AMD processors, instead of Intel CPUs, then ICC is at a disadvantage because it is, of course, targeted specifically to Intel processors.

  • You mentioned using a cluster. Does this speed difference exist on a single processor as well? If you used any parallelisation facilities provided by your compiler, there could be significant differences there.

  • Simplistically, when optimisations are disabled, the compiler uses pre-made "templates" for each code construct. Since these templates are intended to be optimised afterwards, they are constructed in a way that enables the optimisation passes to produce better code. The fact that they may be slower or faster with -O0 does not really mean anything - for example, more explicit initial code could be easier to optimise but far slower to execute.

That said, the only way to find out what is going on is to profile the execution of your code and, if necessary, have a look at the assembly of those parts of the code where the major differences lie.

다른 팁

Performance at -O0 is not interesting or indicative of anything. It explicitly says "I don't care about performance", and the compiler takes you up on that; it just does whatever happens to be simplest. By random luck, what is simplest for GCC is faster than what is simplest for ICC for one highly specific microbenchmark on your specific hardware configuration. If you ran 100 other microbenchmarks, you would probably find some where ICC is faster, too. Even if you didn't, that still wouldn't mean much. If you're going to compare performance across compilers, turn on optimizations, because that's what you do if you care about performance.

If you want to understand why one is faster, profile the execution. Where is the execution time being spent? Where are there stalls? Why do those stalls occur?

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top