There's a lot of misinformation out there and a lot of people who don't get it.
Real-world RAM performance on the order of 200GB/s. ~50% more bandwidth than the competition
It will be true that you can go directly, simultaneously to DRAM and ESRAM.
That equivalent on ESRAM would be 218GB/s. However, just like main memory, it's rare to be able to achieve that over long periods of time so typically an external memory interface you run at 70-80 per cent efficiency.
We've measured about 140-150GB/s for ESRAM. That's real code running. That's not some diagnostic or some simulation case or something like that. That is real code that is running at that bandwidth. You can add that to the external memory and say that that probably achieves in similar conditions 50-55GB/s and add those two together you're getting in the order of 200GB/s across the main memory and internally.
Digital Foundry: So 140-150GB/s is a realistic target and you can integrate DDR3 bandwidth simultaneously?
Nick Baker: Yes. That's been measured.
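To make the arithmetic in the quotes explicit, here is a minimal sketch that adds the measured ESRAM and DDR3 ranges and compares the measured ESRAM figure with its 218GB/s peak. Only the numbers come from the interview; the function and variable names are my own.

```python
# Figures quoted in the interview, in GB/s.
ESRAM_PEAK = 218.0               # theoretical ESRAM peak
ESRAM_MEASURED = (140.0, 150.0)  # measured on real code
DDR3_MEASURED = (50.0, 55.0)     # "probably achieves in similar conditions"

def combined_range(a, b):
    """Add two (low, high) ranges element-wise."""
    return (a[0] + b[0], a[1] + b[1])

def efficiency(measured, peak):
    """Express a measured (low, high) range as a fraction of peak."""
    return (measured[0] / peak, measured[1] / peak)

total = combined_range(ESRAM_MEASURED, DDR3_MEASURED)
esram_eff = efficiency(ESRAM_MEASURED, ESRAM_PEAK)

print(f"combined: {total[0]:.0f}-{total[1]:.0f} GB/s")
print(f"ESRAM efficiency: {esram_eff[0]:.0%}-{esram_eff[1]:.0%}")
```

The sum comes out at 190-205GB/s, which is where the "in the order of 200GB/s" figure comes from. The measured ESRAM range works out to roughly 64-69% of the 218GB/s peak.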
The biggest thing in terms of the number of compute units, that's been something that's been very easy to focus on. It's like, hey, let's count up the number of CUs, count up the gigaflops and declare the winner based on that. My take on it is that when you buy a graphics card, do you go by the specs or do you actually run some benchmarks? Firstly though, we don't have any games out. You can't see the games. When you see the games you'll be saying, "What is the performance difference between them?" The games are the benchmarks.
Explaining what balanced means in a system.
The goal of a 'balanced' system is by definition not to be consistently bottlenecked on any one area. In general with a balanced system there should rarely be a single bottleneck over the course of any given frame - parts of the frame can be fill-rate bound, other can be ALU bound, others can be fetch bound, others can be memory bound, others can be wave occupancy bound, others can be draw-setup bound, others can be state change bound, etc. To complicate matters further, the GPU bottlenecks can change within the course of a single draw call!
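A rough way to picture this: split a frame into passes, estimate what each pass costs on each GPU resource, and see which resource limits each pass. The sketch below does that with invented numbers. The pass names, the costs, and the simplification that a pass takes as long as its slowest resource are all my assumptions, not anything from the interview.

```python
# Hypothetical per-pass costs in milliseconds for each GPU resource.
# All numbers are invented for illustration.
passes = {
    "shadow maps": {"fill": 1.8, "alu": 0.4, "fetch": 0.6, "memory": 1.1},
    "g-buffer":    {"fill": 1.2, "alu": 1.0, "fetch": 2.1, "memory": 1.6},
    "lighting":    {"fill": 0.7, "alu": 3.0, "fetch": 1.2, "memory": 1.4},
    "post-fx":     {"fill": 0.9, "alu": 1.1, "fetch": 0.8, "memory": 2.2},
}

def bottleneck(costs):
    """Return (resource, time) for the slowest resource in one pass."""
    resource = max(costs, key=costs.get)
    return resource, costs[resource]

def frame_report(passes):
    """Map each pass to its limiting resource and sum the frame time."""
    limits = {name: bottleneck(costs) for name, costs in passes.items()}
    frame_ms = sum(t for _, t in limits.values())
    return limits, frame_ms

limits, frame_ms = frame_report(passes)
for name, (resource, t) in limits.items():
    print(f"{name}: {resource}-bound, {t:.1f} ms")
print(f"frame: {frame_ms:.1f} ms")
```

With these made-up costs every pass is limited by a different resource, which is the "balanced" case: no single unit is the bottleneck for the whole frame, so beefing up any one of them only shortens part of it.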
How important the CPU is to frame-rates, and why CPU offloading was a big part of the design
Another very important thing for us in terms of design on the system was to ensure that our game had smooth frame-rates. Interestingly, the biggest source of your frame-rate drops actually comes from the CPU, not the GPU. Adding the margin on the CPU... we actually had titles that were losing frames largely because they were CPU-bound in terms of their core threads. In providing what looks like a very little boost, it's actually a very significant win for us in making sure that we get the steady frame-rates on our console. And so that was a key design goal of ours - and we've got a lot of CPU offload going on.
The scaler sounds cool. Its parameters can be changed on a frame-by-frame basis to reduce frame drops.
We've done things on the GPU side as well with our hardware overlays to ensure more consistent frame-rates. We have two independent layers we can give to the titles where one can be 3D content, one can be the HUD. We have a higher quality scaler than we had on Xbox 360. What this does is that we actually allow you to change the scaler parameters on a frame-by-frame basis. I talked about CPU glitches causing frame glitches... GPU workloads tend to be more coherent frame to frame. There doesn't tend to be big spikes like you get on the CPU and so you can adapt to that.
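One way a title could use a per-frame scaler is to render the 3D layer at a lower resolution when the GPU runs over budget and let the scaler bring it back up to output size. Below is a minimal sketch of that kind of heuristic. It is not the console's actual API: `next_render_scale`, the 33.3 ms budget, the step size and the scale limits are all hypothetical.

```python
def next_render_scale(scale, gpu_ms, budget_ms=33.3,
                      min_scale=0.7, max_scale=1.0, step=0.05):
    """Pick the render scale for the next frame from the last GPU time."""
    if gpu_ms > budget_ms:
        scale -= step          # over budget: render fewer pixels
    elif gpu_ms < 0.9 * budget_ms:
        scale += step          # comfortable headroom: restore resolution
    # Clamp to the allowed range and tidy up floating-point drift.
    return round(min(max_scale, max(min_scale, scale)), 2)

def simulate(gpu_times_ms, start_scale=1.0):
    """Feed a list of measured GPU frame times through the heuristic."""
    scales = []
    scale = start_scale
    for gpu_ms in gpu_times_ms:
        scale = next_render_scale(scale, gpu_ms)
        scales.append(scale)
    return scales

# Hypothetical GPU frame times in milliseconds.
history = simulate([30.0, 35.0, 36.0, 31.0, 28.0, 27.0])
print(history)
```

Reacting to the previous frame's GPU time only works because, as the quote says, GPU workloads tend to be coherent from frame to frame. The same trick would not help much against the sudden CPU spikes mentioned earlier.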
About the function of the eMMC memory.
Digital Foundry: Another thing that came up from the Hot Chips presentation that was new information was the eMMC NAND which I hadn't seen any mention of. I'm told it's not available for titles. So what does it do?
Andrew Goossen: Sure. We use it as a cache system-side to improve system response and again not disturb system performance on the titles running underneath. So what it does is that it makes our boot times faster when you're not coming out of the sleep mode - if you're doing the cold boot. It caches the operating system on there. It also caches system data on there while you're actually running the titles and when you have the snap applications running concurrently. It's so that we're not going and hitting the hard disk at the same time that the title is. All the game data is on the HDD. We wanted to be moving that head around and not worrying about the system coming in and monkeying with the head at an inopportune time.
Digital Foundry: Can you talk us through how you arrived at the CPU and GPU increases that you did and did it have any effect on production yield?
Nick Baker: We knew we had headroom. We didn't know what we wanted to do with it until we had real titles to test on. How much do you increase the GPU by? How much do you increase the CPU by?
Lots more interesting details about the architecture.