I'd like to know where he's getting that "no-HT=<8 threads" claim, too. I have Task Manager open in detail mode just to check it: Google Chrome (which is not a 64-bit browser) has three tabs open and eighteen processes running. That does NOT include the two crash handlers (Google ships both 64-bit and 32-bit crash handlers, which is rather odd for a 32-bit browser). And I'm not running an i-series CPU of any sort, let alone one with HT; this is a Q6600 (Kentsfield, the original Intel quad-core). Chrome has the most processes of anything; even svchost (Service Host) has just twelve, and that's a core part of the OS. (I picked svchost.exe because it has the second-most processes, behind Chrome itself. It's also a 32-bit executable. So we have one application with eighteen processes, each with at least one thread, and another with twelve, and neither is 64-bit? The OS is, but the executables are not.)
The number of processes in use is not a measure of code efficiency by any stretch of anyone's imagination. Despite Chrome's eighteen processes (and svchost's twelve), CPU utilization is still under thirty percent; in other words, despite the process count, the CPU is basically "loafing".
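For anyone who wants to see the "lots of threads, idle CPU" effect for themselves, here is a rough sketch in Python. The function names and the thread count are made up for illustration; it only assumes the standard `threading` and `time` modules. It starts a batch of threads that do nothing but sleep, then compares wall-clock time against the CPU time the whole process actually consumed.

```python
import threading
import time


def idle_worker(seconds):
    # A sleeping thread is blocked by the OS and uses almost no CPU time.
    time.sleep(seconds)


def measure_idle_threads(thread_count=50, seconds=0.2):
    """Return (wall_seconds, cpu_seconds) for a batch of sleeping threads."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of the whole process, all threads

    threads = [
        threading.Thread(target=idle_worker, args=(seconds,))
        for _ in range(thread_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = measure_idle_threads()
    print(f"wall clock: {wall:.3f}s, CPU time used: {cpu:.3f}s")
```

On a typical machine the CPU time should come out far below the wall-clock time, even with dozens of threads alive at once. The thread count says how the work is divided up, not how much work is being done.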
It appeared to me from his subsequent posts that he thought applications had to implement thread scheduling themselves, as opposed to scheduling and thread handling being an OS service, with preemption and interrupts supported in hardware. From there, I think he just assumed it would be very difficult to run more threads than cores without taking massive performance hits. As evidence, he used the example of a video encoder that uses only as many threads as there are physical cores.
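To make the point about scheduling being an OS service concrete, here is a minimal sketch, again in Python with hypothetical function names. It starts four times as many threads as the machine reports cores and contains no scheduling logic of its own; the OS decides which thread runs when. One caveat: CPython's global interpreter lock keeps these particular threads from running Python code in parallel, so this says nothing about speed. It only shows that the application does not have to schedule anything itself.

```python
import os
import threading


def busy_work(results, index, iterations):
    # Plain CPU-bound loop; each thread writes only to its own slot.
    total = 0
    for i in range(iterations):
        total += i * i
    results[index] = total


def run_oversubscribed(multiplier=4, iterations=50_000):
    """Start multiplier * core-count threads and wait for all of them."""
    cores = os.cpu_count() or 1  # cpu_count() can return None
    thread_count = cores * multiplier
    results = [None] * thread_count

    threads = [
        threading.Thread(target=busy_work, args=(results, i, iterations))
        for i in range(thread_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # no scheduling code anywhere: the OS handles it

    return cores, thread_count, results


if __name__ == "__main__":
    cores, thread_count, results = run_oversubscribed()
    finished = sum(r is not None for r in results)
    print(f"{cores} logical cores, {thread_count} threads, {finished} finished")
```

Every thread finishes even though there are more threads than cores. An encoder that caps its thread count at the number of physical cores is most likely making a tuning choice, not working around a limit.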
I don't think most people have the background to understand the nuances of processor utilization or scheduling; they just assume more cores is better, and that a machine with fewer cores could not possibly do an equivalent amount of work in the same amount of time or less.