CPU Optimizations


From a few different systems I've got access to:

real 0m30.470s
user 0m30.406s
sys 0m0.024s

Intel® Xeon® CPU X3360 @ 2.83GHz (4 Cores)
8GB Memory

real 0m22.688s
user 0m22.677s
sys 0m0.000s

Intel® Core™ i5-2320 CPU @ 3.00GHz (4 Cores)
16GB Memory

real 1m19.754s
user 1m19.653s
sys 0m0.004s

Intel® Atom™ CPU D2550 @ 1.86GHz (2 Cores)
cache size : 512 KB
4GB Memory


What's the difference between Funtoo and Gentoo?

Quote from funtoo.org:

Funtoo Linux features native UTF-8 support enabled by default, a git-based, distributed Portage Tree and funtoo overlay, an enhanced Portage with more compact mini-manifest tree, automated imports of new Gentoo changes every 12 hours, GPT/GUID boot support and streamlined boot configuration, enhanced network configuration, up-to-date stable and current Funtoo stages, all built using Funtoo's Metro build tool. We also offer Ubuntu Server, Debian, RHEL and Fedora-based kernels.

IOW: a Gentoo optimized "from scratch", with a git-based Portage tree; "current" is based on ~x86 and ~amd64, things like that.


Hi All,

I have been playing around with some compiler optimizations in the Linux kernel. I want to know what your results are if you run this in a terminal.

What's your hardware and what's your kernel version?

time echo "scale=5000; a(1)*4" | bc -l

Care to elaborate on why? You're basically benchmarking bc and how good your compiler is. The kernel (or OS in general) is going to have little to no effect on the result.
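For what it's worth, the timings posted in this thread are consistent with that: sys is a tiny fraction of real. A rough sketch of the arithmetic, using the first result above (real 0m30.470s, sys 0m0.024s); the helper names to_seconds and sys_share are made up for illustration:

```shell
# to_seconds: convert a `time` value such as "1m19.754s" to seconds.
to_seconds() {
    # Split on "m" and "s": field 1 is minutes, field 2 is seconds
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# sys_share REAL SYS: percentage of wall-clock time spent in the kernel.
sys_share() {
    real=$(to_seconds "$1")
    sys=$(to_seconds "$2")
    awk -v r="$real" -v s="$sys" 'BEGIN { printf "%.2f%%\n", 100 * s / r }'
}

sys_share 0m30.470s 0m0.024s
```

This prints 0.08%, i.e. nearly all of the run is user-space time spent inside bc.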

Hardware

  • Sun UltraSparc T1 (Niagara) CPU clocked at 1.0 GHz
  • 8 GB of RAM

The T1 is a very interesting CPU architecture (terrible single threaded performance, god awful floating point, but great integer throughput). How'd you get your hands on one?


I disagree. I have already shown that compiling the latest 3.9 kernels with CPU-specific optimizations yields better results than the 3.8 kernels. Also, in the early part of this thread the two different kernels were producing DIFFERENT number streams.
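A quick way to check the "different number streams" observation is to save the output under each kernel and compare the files byte for byte. A minimal sketch, assuming the output was saved to files first; compare_outputs and the file names in the comments are hypothetical:

```shell
# Save the digits once per kernel, for example:
#   echo "scale=5000; a(1)*4" | bc -l > pi-$(uname -r).txt

# compare_outputs A B: report whether two saved outputs are byte-identical.
compare_outputs() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
        # Show the first few differing lines to help locate the divergence
        diff "$1" "$2" | head -n 4
    fi
}
```

If the files differ, it is worth confirming that both machines run the same bc version before attributing the difference to the kernel.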


The problem with this conclusion is that your results come from many different systems, each with a different configuration. They may have different kernels, but the different hardware or execution environment could just as well be the cause of the difference.

To actually compare these results you would need to run your tests on the same hardware with the same configuration (OS, connected peripherals, etc.) and only vary the kernel version.
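Recording the environment next to each timing makes it easier to see what actually changed between two runs. A minimal sketch, assuming a Linux system (it reads /proc/cpuinfo) and GNU bc (for --version); env_record is a hypothetical helper name:

```shell
# env_record: print the details needed to tell two benchmark runs apart.
env_record() {
    echo "kernel: $(uname -r)"
    echo "arch: $(uname -m)"
    # /proc/cpuinfo is Linux-specific and some architectures do not
    # expose a "model name" field, so fall back to "unknown"
    cpu=$(grep -m1 'model name' /proc/cpuinfo 2>/dev/null | cut -d: -f2- | sed 's/^ *//')
    echo "cpu: ${cpu:-unknown}"
    # Left empty if bc is missing or does not understand --version
    echo "bc: $(bc --version 2>/dev/null | head -n 1)"
}

env_record
```

Pasting these four lines with every result would show at a glance whether only the kernel line differs.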

Run repeats to get more reliable timings, with both cold and warm starts, and only then could you start to draw any conclusions. At the moment all you can really say is that computers owned by different people produce different results when calculating Pi and take different lengths of time to do so.
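Repeated runs could be scripted along these lines. This is only a sketch: time_runs is a made-up helper, and it relies on GNU date for sub-second resolution (%N).

```shell
# time_runs N CMD...: run CMD N times and print one wall-clock time
# (in seconds) per line.
time_runs() {
    repeats=$1
    shift
    i=0
    while [ "$i" -lt "$repeats" ]; do
        start=$(date +%s.%N)    # %N (nanoseconds) is a GNU date extension
        "$@" >/dev/null 2>&1
        end=$(date +%s.%N)
        awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
        i=$((i + 1))
    done
}

# Example: one untimed warm-up run, then five timed runs, fastest first.
#   sh -c 'echo "scale=5000; a(1)*4" | bc -l' >/dev/null
#   time_runs 5 sh -c 'echo "scale=5000; a(1)*4" | bc -l' | sort -n
```

The spread between the fastest and slowest run gives a feel for the noise level, which any claimed kernel-to-kernel difference has to exceed.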

Also, what do you want to measure? If you are just measuring how long a single process takes to complete, your results could be skewed. For example, you may miss an overall decrease in system performance if, say, the different compile optimisations increase the code size and make the task scheduling code too large to fit in the cache. By running a single high-load application you may not hit this issue often enough to observe performance degradation. But in, say, a web server, multiple high-load processes may end up performing worse due to this "optimisation". So you will need to run different tests to see the impact of any optimisation.

Of course this all depends on the hardware you are running on too, so you will need to repeat the benchmark on each piece of hardware, using the same configurations you tested on the first.

In short, benchmarking performance is difficult. But, specifying what you mean by performance (latency, average completion time, total completion time, lateness, maximum time taken) will definitely help with benchmarking.
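Once there is a set of timings from one machine and one configuration, a small summary makes the chosen metric explicit. A sketch, assuming one timing in seconds per line on standard input; summarize is a hypothetical helper and the sample values are made up purely to show the output format:

```shell
# summarize: read one timing (seconds) per line and print the count,
# minimum, median, mean and maximum.
summarize() {
    sort -n | awk '
        { t[NR] = $1; sum += $1 }
        END {
            if (NR == 0) exit 1
            # Middle value for odd counts, mean of the two middle values otherwise
            median = (NR % 2) ? t[(NR + 1) / 2] : (t[NR / 2] + t[NR / 2 + 1]) / 2
            printf "n=%d min=%.3f median=%.3f mean=%.3f max=%.3f\n", NR, t[1], median, sum / NR, t[NR]
        }'
}

# Made-up sample values, not measurements from this thread
printf '%s\n' 4.0 1.0 2.0 | summarize
```

This prints n=3 min=1.000 median=2.000 mean=2.333 max=4.000. Which of these numbers matters depends on the definition of performance picked beforehand: the mean for average completion time, the max for worst-case behaviour.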

Various tests used to benchmark the kernel:

http://kernel-perf.sourceforge.net/about_tests.php

http://lbs.sourceforge.net/

https://wiki.archlinux.org/index.php/Benchmarking


Linux Mint 14.1 live USB key
Intel® Core™ i7-3770K CPU @ 3.50GHz, 4800 MHz

real 0m13.728s
user 0m13.713s
sys 0m0.000s

Same CPU (though running at 3465.49 MHz) and Linux, but running in VirtualBox:

real 0m17.286s
user 0m17.125s
sys 0m0.140s


The T1 is a very interesting CPU architecture (terrible single threaded performance, god awful floating point, but great integer throughput). How'd you get your hands on one?

You can buy Sun Fire T2000s on eBay for $400-800. While their single-threaded performance is truly awful (and apparently floating point as well), they run heavily multithreaded applications very well. I have been impressed with the Java and Qemu performance in particular. I can build i386, AMD64, and ARM packages faster in qemubuilder on that machine than natively on their respective architectures. (That is comparing the 1.0 GHz T1 to a 2.4 GHz Core 2 Quad for i386 and AMD64 builds. The ARM machine is slow enough that it's not really a fair comparison.)
