AMD 2014 Roadmap Discussion (CPU/APU)


49 replies to this topic

#16 Athernar

Athernar

    ?

  • Joined: 15-December 04

Posted 14 January 2014 - 06:18

It is double the performance in well-optimized code. You can't necessarily pack instructions perfectly (or maybe at all in some cases) unless you are doing something simple like matrix multiplication, so in real-world applications it is less.

 

EDIT: speaking of FMA3/FMA4. Yeah, that business was a headache. Though, I don't think it was Intel's fault per se. I was under the impression that both Intel and AMD switched halfway through to the opposite. It looks like FMA3 is the standard going forward, so it should be settled.

 

EDIT2: Also, to be fair, that doesn't necessarily get rid of all issues. Some AMD cores used to share FMAC units (e.g. 1 unit per 2 cores), where you could either issue a single 256-bit-wide FMA instruction on a single core or two 128-bit-wide FMAs on adjacent cores. In practice you saw better performance by doing the latter. The point being that you don't end up with unified optimizations regardless.
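As a back-of-the-envelope illustration of why well-optimized code like matrix multiplication can see the full doubling, here is a minimal flop-accounting sketch (Python; the numbers are illustrative, not from the post):

```python
# Flop accounting for an n x n matrix multiply: every inner-product
# step is one multiply followed by one add, so the two pair perfectly
# into fused multiply-add (FMA) operations.
n = 64
mults = n ** 3              # one multiply per inner-product step
adds = n ** 3               # one add per inner-product step
total_flops = mults + adds  # the classic 2*n^3 count
fma_ops = n ** 3            # each mul+add pair fuses into a single FMA

print(total_flops, fma_ops)  # FMA halves the instruction count
```

Code with unpaired multiplies or adds can't fuse everything this cleanly, which is the "you can't necessarily pack instructions perfectly" point above.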

 

Yeah, unfortunate for people with bdver1 based chips, but bdver2+ supports FMA3.

 

As to the FMAC stuff, yeah, I hinted at as much in my initial post referencing improvements in Excavator/bdver4. (Presumably 2x 256-bit FMACs?)

 

What I don't understand though, is with the changes that are landing post-Bulldozer (Like Steamroller's additional decoder), aren't AMD essentially just edging slowly back to having a "standard" architecture? Seems rather odd to make that change just when their HSA stuff is reaching early maturity.




#17 +snaphat (Myles Landwehr)

snaphat (Myles Landwehr)

    Electrical & Computer Engineer

  • Tech Issues Solved: 29
  • Joined: 23-August 05
  • OS: Win/Lin/Bsd/Osx
  • Phone: dumb phone

Posted 14 January 2014 - 06:35

Yeah, unfortunate for people with bdver1 based chips, but bdver2+ supports FMA3.

 

As to the FMAC stuff, yeah, I hinted at as much in my initial post referencing improvements in Excavator/bdver4. (Presumably 2x 256-bit FMACs?)

 

What I don't understand though, is with the changes that are landing post-Bulldozer (Like Steamroller's additional decoder), aren't AMD essentially just edging slowly back to having a "standard" architecture? Seems rather odd to make that change just when their HSA stuff is reaching early maturity.

Do you mean why 2x 256-bit FMACs? Probably because it's easier to design and lay out, and uses less die space. I'm sure they aren't wasting any space in their design.

 

But yeah, they'll continue to move toward fewer and fewer shared parts of the pipeline in future designs. It's always going to be a balance of how much you can shove into the die at the end of the day, though. Isn't AMD still a process node behind Intel? If so, that'd probably be a constraint in continuing to share resources.



#18 Athernar

Athernar

    ?

  • Joined: 15-December 04

Posted 14 January 2014 - 06:46

Do you mean why 2x 256-bit FMACs? Probably because it's easier to design and lay out, and uses less die space. I'm sure they aren't wasting any space in their design.

 

But yeah, they'll continue to move toward fewer and fewer shared parts of the pipeline in future designs. It's always going to be a balance of how much you can shove into the die at the end of the day, though. Isn't AMD still a process node behind Intel? If so, that'd probably be a constraint in continuing to share resources.

 

The "presumably" part was in regard to what constitutes Excavator's "FPU Improvements"; I don't think they've been detailed yet. All that's known thus far is that it supports AVX2, amongst other things.

 

As far as process size goes, I had read that the issue was due to contractual obligations (soon to end) with the spun-off GloFo; the Radeon 7xx0 parts fabbed at TSMC are/were 28nm vs GloFo's 32nm.



#19 +snaphat (Myles Landwehr)

snaphat (Myles Landwehr)

    Electrical & Computer Engineer

  • Tech Issues Solved: 29
  • Joined: 23-August 05
  • OS: Win/Lin/Bsd/Osx
  • Phone: dumb phone

Posted 14 January 2014 - 06:53

The "presumably" part was in regard to what constitutes Excavator's "FPU Improvements"; I don't think they've been detailed yet. All that's known thus far is that it supports AVX2, amongst other things.

 

As far as process size goes, I had read that the issue was due to contractual obligations (soon to end) with the spun-off GloFo; the Radeon 7xx0 parts fabbed at TSMC are/were 28nm vs GloFo's 32nm.

Eh, they are still a few years behind Intel in that regard then (that's always going to be the case at this point, methinks). It's funny because Intel doesn't necessarily have to make great designs. They can just ride the benefits of a better process and do easy improvements. That's not to say that they don't do good designs, but they'd stay afloat even if they didn't.



#20 Luc2k

Luc2k

    Neowinian

  • Tech Issues Solved: 3
  • Joined: 16-May 09

Posted 14 January 2014 - 07:51

Eh, they are still a few years behind Intel in that regard then (that's always going to be the case at this point, methinks). It's funny because Intel doesn't necessarily have to make great designs. They can just ride the benefits of a better process and do easy improvements. That's not to say that they don't do good designs, but they'd stay afloat even if they didn't.

Or pay OEMs to ignore the competition when they can't match them.



#21 +snaphat (Myles Landwehr)

snaphat (Myles Landwehr)

    Electrical & Computer Engineer

  • Tech Issues Solved: 29
  • Joined: 23-August 05
  • OS: Win/Lin/Bsd/Osx
  • Phone: dumb phone

Posted 14 January 2014 - 15:29

Or pay OEMs to ignore the competition when they can't match them.

And then have the resulting lawsuit yield just a slap on the wrist :-D



#22 TheExperiment

TheExperiment

    Reality Bomb

  • Tech Issues Solved: 1
  • Joined: 11-October 03
  • Location: Everywhere
  • OS: 8.1 x64

Posted 14 January 2014 - 17:07

Well, it was on the roadmap, so here's a benchmark from TechReport for the A8-7600:

http://tinyurl.com/oboh9gf
No Mantle reviews yet, but it does damn nicely.



#23 Luc2k

Luc2k

    Neowinian

  • Tech Issues Solved: 3
  • Joined: 16-May 09

Posted 14 January 2014 - 17:25

I think the NDA lifts in 1-2 days (I forget the exact time). Hopefully there's some benches of the CPU side with a dGPU.



#24 +Zlip792

Zlip792

    Neowinian Senior

  • Tech Issues Solved: 9
  • Joined: 31-October 10
  • Location: Pakistan
  • OS: Windows 8.1 Pro 64-bit
  • Phone: Nokia C3-00 (8.70 firmware) It sucks!!!

Posted 14 January 2014 - 17:32

Reviews are out already:

http://www.anandtech...-7600-a10-7850k

http://techreport.co...cessor-reviewed

http://www.guru3d.co...u_review,1.html



#25 vcfan

vcfan

    Doing the Humpty Dance

  • Tech Issues Solved: 3
  • Joined: 12-June 11

Posted 14 January 2014 - 17:54

wow, disappointing. I was expecting better out of steamroller.



#26 TheExperiment

TheExperiment

    Reality Bomb

  • Tech Issues Solved: 1
  • Joined: 11-October 03
  • Location: Everywhere
  • OS: 8.1 x64

Posted 14 January 2014 - 18:03

wow, disappointing. I was expecting better out of steamroller.

It's about what I was expecting.  AMD finally has a great APU to recommend, especially with Mantle and TrueAudio around the corner. =)

 

Now to get my brother off his Llano laptop...it might be possible now that he's invested in Warframe.



#27 vcfan

vcfan

    Doing the Humpty Dance

  • Tech Issues Solved: 3
  • Joined: 12-June 11

Posted 14 January 2014 - 18:21

It's about what I was expecting.  AMD finally has a great APU to recommend, especially with Mantle and TrueAudio around the corner. =)

 

Now to get my brother off his Llano laptop...it might be possible now that he's invested in Warframe.

I think for the same price you can get better performance out of discrete components, but the advantage here, and where this APU will really shine, is if software begins to take advantage of stuff like Mantle, HSA, and TrueAudio. If the software support doesn't get there, then I don't know what AMD can do. Intel is so far ahead of them in single-core performance.



#28 Andre S.

Andre S.

    Asik

  • Tech Issues Solved: 10
  • Joined: 26-October 05

Posted 14 January 2014 - 18:27

@snaphat I'm not really advocating SIMD intrinsics in C# or other high-level languages; what I'd like to see is more natural ways to express parallel or vector operations that can then be easily compiled into vector/parallel code without the need for sophisticated reverse-engineering. Again, going back to simply adding two arrays: I should be able to just say

 

int[] c = a + b;

 

where a and b are arrays, and c is an array containing the sum of their respective elements. That is something that can be implemented sequentially, or with SIMD instructions, or that can even run on multiple threads if the arrays are sufficiently large.
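As a rough sketch of the semantics being asked for (in Python rather than C#, purely for illustration), here is the sequential fallback a compiler could always emit for that elementwise sum:

```python
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# Sequential fallback: one element at a time. A smarter runtime could
# instead emit SIMD instructions, or split the loop across threads for
# large arrays, since each element's sum is independent of the others.
c = [x + y for x, y in zip(a, b)]

print(c)  # [11, 22, 33, 44]
```

Array libraries such as NumPy already expose exactly this `c = a + b` spelling and dispatch to vectorized loops internally, which is the kind of freedom the declarative form gives the implementation.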

 

There is some of that already in functional constructs like LINQ and functional languages like F#, but needless to say the runtime support for actual vectorization/parallelisation isn't there. So there is work to do on runtimes, but there's also work at the language level. Functional languages, although favoring immutable data and therefore being naturally more amenable to parallelism, were not really designed with this purpose in mind. What if a language was built from the ground up for parallelism?

 

By switching your language to be inherently parallel, you are giving the compiler the task of automatically (auto-magically) mapping parallelism to limited resources, or more strictly put: the compiler is responsible for optimizing perfectly parallel code to fit on a machine with limited parallel resources (chunking things into tasks and SIMD instructions). This doesn't necessarily end well, because it is actually a fairly difficult problem from a compiler standpoint. Hand-tuned manual parallelism using control-flow languages tends to yield better results.

 

I wonder if one day optimizers won't beat humans at that game too, like they did for assembly code decades ago. In the meantime, you can mix declarative/functional constructs with manual parallelism: for instance, you insert the AsParallel() extension method where you think the granularity is optimal and everything else stays declarative, so that's already something.
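That division of labour (the programmer picks the granularity, the rest stays declarative) can be sketched in Python with the standard library; the chunk size of 250 here is an arbitrary choice standing in for the AsParallel()-style granularity decision:

```python
from concurrent.futures import ThreadPoolExecutor

def work(chunk):
    # Declarative body: a pure reduction over one chunk.
    return sum(x * x for x in chunk)

data = list(range(1000))

# Manual granularity: the programmer decides where to cut the work,
# rather than asking the compiler to infer the optimal chunking.
chunks = [data[i:i + 250] for i in range(0, len(data), 250)]

with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(work, chunks))

print(total)  # 332833500, the sum of squares below 1000
```

Everything inside `work` stays declarative; only the chunking and worker count are hand-tuned.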

 

When I say async changes the way programmers think, I mean they can look at a given piece of asynchronous code and reason about it the same way they do a sequential piece of code: the two look identical with the exception of a few additional keywords, and apart from the asynchrony the logic is the same. Previously you had to build a complicated state machine with callbacks, and while the end result was the same, the code looked nothing like the logic you were trying to implement; it was all about handling asynchrony rather than doing what you were actually trying to do. Here's a good example of that: http://mnajder.blogs...ment-using.html
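The contrast can be sketched in Python's asyncio (the thread is about C#'s async/await, but the idea is the same; `fetch` here is a made-up stand-in for real I/O):

```python
import asyncio

# Callback style: the continuation lives in a separate function, so the
# control flow no longer reads top-to-bottom.
def fetch_then(value, on_done):
    on_done(value * 2)

# async/await style: the same asynchrony, but the logic reads
# sequentially, with `await` as the only extra keyword.
async def fetch(value):
    await asyncio.sleep(0)  # stand-in for real asynchronous I/O
    return value * 2

async def main():
    doubled = await fetch(21)
    return doubled + 1

print(asyncio.run(main()))  # 43
```

In the callback version, any logic after the fetch has to be threaded through `on_done`; in the awaited version it simply follows on the next line.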

 

You seem to be making two arguments to me: you want more control in higher-level languages but less control in lower-level languages. Complex programs require complex languages, and you aren't going to get a silver-bullet, easy-to-use-with-lots-of-power-and-runs-well language. The fact is, no one has an answer for what the perfect language or runtime is: it's simply an open research topic. Currently, from what I've seen, people think the best alternative is a layered approach. At the low level you have languages that map well to your architecture, and at a higher level you have languages that are transformed into the lower-level ones. You expose architectural details in the low-level languages, but not in the high-level languages. On top of that, you expose some sort of hinting system at the high level to help with parallel optimizations (e.g. group these tasks). This is just a vague trend I'm alluding to that has been occurring in HPC.

 

I pretty much agree with the alternative you describe. I just think that the kind of high-level language we use now does not always lend itself well to being transformed into a low-level parallel form, because all these languages were designed at a time when parallelism wasn't a concern; all the "C" languages have the obsolete notion of a "currently executing statement", no notion of vector variables, no parallel constructs except for library extensions, shared global state everywhere, etc. Only Rust and perhaps Go, I think, are currently attempting to tackle these issues.



#29 Andre S.

Andre S.

    Asik

  • Tech Issues Solved: 10
  • Joined: 26-October 05

Posted 14 January 2014 - 18:39

On the topic of Kaveri, I think it'll only make sense in laptops, especially with Mantle, where it'll destroy anything Intel currently does in gaming performance. On desktops, for the price of the A10 you could get an X4 and an HD7750 and get way better gaming performance, so I really don't see the use case for it.

 

There are potentially large performance gains to be made from taking advantage of the unified memory for better CPU/GPU collaboration, but is any software available today taking advantage of that? With the low market share Kaveri will have, will any software take advantage of that? There's a bright potential future, but we need software to use it.



#30 Athernar

Athernar

    ?

  • Joined: 15-December 04

Posted 14 January 2014 - 18:40

Eh, they are still a few years behind Intel in that regard then (that's always going to be the case at this point, methinks). It's funny because Intel doesn't necessarily have to make great designs. They can just ride the benefits of a better process and do easy improvements. That's not to say that they don't do good designs, but they'd stay afloat even if they didn't.

 

TSMC is on track to offer 20nm in "early 2014", so they certainly seem to be ahead of GloFo. The question is if/when AMD can dump GloFo.

 

--

 

Interestingly, looking at the Anandtech review of the 7850K posted above, said article alleges that the GloFo 28nm process is optimised for density at the cost of clock rate - could we have our reason for no Steamroller FX, mayhaps?