A patch series that shortens boot times on many-core Linux systems is being prepared, and the initial performance numbers are impressive. The work isn't exactly new, though: it has been in the making since at least February of this year.
It speeds up booting on server and workstation processors with many cores and threads, such as AMD's EPYC and Ryzen Threadripper and Intel's Xeon. On a 96-thread Skylake system, the patch cut the CPU bring-up time (the time needed to wake up the cores) from 500ms to just 34ms, a speedup of roughly 15x.
Dusting off this patch series from February, in which we reduce the time taken for bringing up CPUs on a 96-way Skylake box from 500ms to about 34ms.
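The "factor of 15" is a rounded figure. A quick check in Python, using only the numbers quoted above (500ms, 34ms, 96 threads), shows where it comes from. The per-thread averages are a rough derivation that assumes the time is spread evenly across all threads, which the patch notes do not state.

```python
# Figures taken from the article: bring-up time before and after the patch.
before_ms = 500.0
after_ms = 34.0
threads = 96

# Ratio of old to new bring-up time.
factor = before_ms / after_ms

# Absolute time saved per boot.
saved_ms = before_ms - after_ms

# Rough per-thread averages (assumption: time is spread evenly over all threads).
per_thread_before_ms = before_ms / threads
per_thread_after_ms = after_ms / threads

print(f"speedup: {factor:.1f}x")
print(f"saved:   {saved_ms:.0f} ms")
print(f"average per thread: {per_thread_before_ms:.2f} ms -> {per_thread_after_ms:.2f} ms")
```

The exact ratio is about 14.7, which the author rounds up to 15.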
Apparently, more parallelism can still be extracted through further optimization of Time Stamp Counter (TSC) synchronization, which might be skipped entirely when booting via kexec, and of the INIT/SIPI (Startup Inter-Processor Interrupt) phase.
There is more parallelism to be had here, including a 1:many TSC sync (or just *no* TSC sync, in the kexec case), and letting the APs all run through their own states from CPUHP_BRINGUP_CPU to CPUHP_AP_ONLINE_IDLE in parallel too. But I'll take a mere factor of 15 for the time being.
We can also have a careful look at the remaining time spent in the initial INIT/SIPI phase and see what we can shave off it.
In case you're wondering, AP here stands for Application Processor. On x86, the Bootstrap Processor (BSP) handles initialization and the early boot phase, then wakes up the remaining cores, the APs.
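To see why overlapping the AP start-up matters, here is a toy timing model in Python. It is only a sketch, not the kernel's logic: the function names and the per-AP costs (200µs to send a wake-up, 5000µs for an AP to initialise) are made-up illustrative assumptions, and the real series still has serialised steps such as TSC sync.

```python
def serial_bringup_us(num_aps, wake_us, init_us):
    """BSP wakes one AP, waits for it to finish initialising, then moves on."""
    return num_aps * (wake_us + init_us)


def parallel_bringup_us(num_aps, wake_us, init_us):
    """BSP sends all wake-ups back to back; the APs initialise concurrently,
    so only the last AP's initialisation adds to the total."""
    return num_aps * wake_us + init_us


# Hypothetical numbers, chosen only for illustration (not measured values).
NUM_APS = 95      # a 96-thread system has one BSP and 95 APs
WAKE_US = 200     # assumed cost for the BSP to kick one AP
INIT_US = 5000    # assumed time an AP needs to initialise itself

serial = serial_bringup_us(NUM_APS, WAKE_US, INIT_US)
parallel = parallel_bringup_us(NUM_APS, WAKE_US, INIT_US)

print(f"serial:   {serial / 1000:.0f} ms")
print(f"parallel: {parallel / 1000:.0f} ms")
```

In this model the serial cost grows with the number of APs times the full per-AP cost, while the parallel cost grows only with the part the BSP must do itself for each AP.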
You can view the patch details here.