Intel's Speed Select Technology (SST) is a power management feature that lets users prioritize cores and adjust their frequencies according to the workload, with the aim of improving performance and efficiency.
However, an Intel engineer has observed that some high-performance benchmarks using SST spend more time in the kernel depending on which CPU package they run on, with scores varying by more than 10%. The message does not quantify the impact on real-world workloads, which may well be smaller, but the variation is still a cause for concern.
The engineer traces the delay to the standard Linux PCI interface used to map each CPU to its SST PCI device. That lookup performs a linear search through the PCI devices attached to the system, and these platforms have more than 100 of them. Such high device counts are expected here because SST is available only on Xeon server processors, not on the mainstream Core lineup.
With the root cause identified, the fix is fairly simple: cache the CPU-to-PCI-device mapping when the bus number for each CPU is first read, so the expensive search no longer runs in the fast path. The change is a Linux kernel patch rather than a firmware update, so it will reach users through a kernel release.
Here's what the full LKML message says:
It was observed that some of the high performance benchmarks are spending more time in kernel depending on which CPU package they are executing. The difference is significant and benchmark scores varies more than 10%. These benchmarks adjust class of service to improve thread performance which run in parallel. This class of service change causes access to MMIO region of Intel Speed Select PCI devices depending on the CPU package they are executing.
This mapping from CPU to PCI device instance uses a standard Linux PCI interface "pci_get_domain_bus_and_slot()". This function does a linear search to get to a PCI device. Since these platforms have 100+ PCI devices, this search can be expensive in fast path for benchmarks.
Since the device and function of PCI device is fixed for Intel Speed Select PCI devices, the CPU to PCI device information can be cached at the same time when bus number for the CPU is read. In this way during runtime the cached information can be used. This improves performance of these benchmarks significantly.
Intel launched SST in 2019 with its Cascade Lake Xeon CPUs. The technology is quite versatile, offering options such as core prioritization and base frequency adjustment. SST is implemented in firmware and carried out by the processor's Power Control Unit (PCU). For more information on SST, see Intel's official SST page.