Slashdot
PS3 Cell Processor 'Broken'?
Posted by Zonk on Mon Jun 05, 2006 07:27 AM
from the it's-drinking dept.
D-Fly writes "Charlie Demerjian at the Inquirer got a look at some insider specs on the PS3 and says Sony screwed up big time with the Cell processor: the memory read speed on the current devkits is something like 3 orders of magnitude slower than the write speed, and is unlikely to improve much before the ship date. The slide from Sony pictured in the article is priceless: 'Local Memory Read Speed ~16Mbps, No this isn't a Typo.' Demerjian says that when the PS3 comes out, a full year after the Xbox 360, it's still going to be inferior: 'Someone screwed up so badly it looks like it will relegate the console to second place behind the 360.'" This is the Inquirer, so take with a grain of salt. Just the same, doesn't sound too good for Sony or IBM.
Related Stories
News: Sony Pushes Back Release For Blu-Ray Players 262 comments
Sony has announced that their first model of Blu-Ray player will release in August, not later this month as originally announced. The BDP-SP1, retailing for $1000, will now ship on or about August 15th. Bad news for fans of the new format, and even worse news for the PS3. Since Sony's lackluster E3 showing, a string of bad news has seemed to conspire against the company's next-gen console. From the Gamers with Jobs article: "With the PS3's high-end model coming in at a whopping $400.00 less than a stand-alone Blu-Ray player, Sony needs to release these players as soon as possible. If they wait too long, the PS3 will begin looming on the horizon, causing even devout early adopters to question the intelligence of buying a stand-alone Blu-Ray unit. Sony also needs the largest possible installed base, come launch-time for the PS3. For the Blu-Ray player to be the PS3's version of the PS2's DVD player, casual technophiles need to be able to see the virtues of the Blu-Ray format. If there are few players, and few titles, this might not happen."
PS2 Vs PS3 (Score:5, Informative)
If you really want to dig into the details of the Cell processor, check out Sony's resources [scei.co.jp]. You have to agree to a bunch of things to get to the PDFs, but there's a lot of information [scei.co.jp] in them. Another place you can find information is IBM's resource site [ibm.com], which contains a lot of stuff including the programming handbook.
dev kits (Score:4, Insightful)
Re:dev kits (Score:4, Interesting)
Inquirer, yes, but... (Score:5, Insightful)
Re:Inquirer, yes, but... (Score:5, Insightful)
By the way, I'm not discounting that it could be real. It got me curious enough to spend the last 10 minutes looking on the web for some documentation to back up the claims in the story, but I couldn't find anything.
Anyone got any real documentation or anything to back up the claim?
Re:Inquirer, yes, but... (Score:5, Insightful)
The picture says that the read speed for the Cell from "Local Memory" is 16MB a second. Assuming it is true (I've got no reason to doubt it), it still doesn't matter.
The "Local Memory" is the RSX graphics memory. The Cell shouldn't need to read this. The PS3 would still work even if the Cell couldn't read this memory at all. This memory is where you store textures and other graphics data.
Re:Inquirer, yes, but... (Score:5, Informative)
The RSX can read the Cell's RAM at ridiculous speeds, which is all that matters. The RSX can render out of main memory, so you shouldn't ever be using the Cell to read from the RSX's RAM at all. The Cell will probably be manipulating vector data for the RSX, but 256MB for all executable code and vector data is still more than enough. The 256MB attached to the RSX would have been used primarily for textures even if the Cell could read from it at reasonable speeds.
Not the National Enquirer (Score:4, Insightful)
This isn't the online IT arm of the National Enquirer, you know.
The Inq isn't always right, but what they do tend to have is a lot of news-breaking stuff that they (well, Mike) are willing to publish regardless of the consequences when the corporate heads find out there's a leak. That's why Mike got eased out of The Register when it went more corporate, which led him to form the Inq in the first place.
Those who have been following it for a while will remember all the appearances of leaked memos from Compaq (ex-DEC) insiders who were willing to leak happily to someone of the old school who was interested in seeing how the whole fiasco was turning out. Compaq/HP even started internal witchhunts looking for the leakers.
Regardless, the only real problem people might have with the Inq is that they can't distinguish between an opinion piece and direct reporting, or can't accept that even when the information as presented is correct, that doesn't guarantee the interpretation built on it is.
Re:Inquirer, yes, but... (Score:5, Interesting)
The Register doesn't have this rep, yet they share common DNA and I've seen at least one case [boingboing.net] where they have actually had their integrity called into question.
As for TFA, we all heard many moons ago that the PS3 was a bitch to program for (the comparison I've seen most often on this very site is to the Saturn, which IIRC had two CPUs), and Sony aren't exactly filling the marketplace with confidence on this one. If slow Cell access to this "local memory" is irrelevant to any conceivable operation, as most people here seem to be saying, then why is it even mentioned on this slideshow?
Seems to me there's a good mix of Shooting the Messenger, Ignoring Inconvenient Facts from the TFA and maybe even just a hint of Fanboyism here.
Re:Inquirer, yes, but... (Score:5, Interesting)
I can share with you why I don't go to their site anymore. Check out this page:
http://www.theinquirer.net/?article=11159 [theinquirer.net]
This is back in 2003, not long after the Blaster worm hit. The Inquirer requested people send in photos of Windows not working in places such as airports. As a result, they took this photo and told the little story like this:
My beef with this? It's quite clear from this image that IE is reporting that it cannot find the page. This isn't an IE problem; it's a problem with either the network connection on that computer or the server feeding the page. In other words, neither Mozilla, Netscape, nor Opera would have rectified this difficulty. I sent them an email about it, but it went unanswered. (That wouldn't have surprised me, except that they had responded rather quickly to another enquiry I made that didn't point out their journalistic silliness...)
I don't know if this is a problem most people would care about. The way I understood it, they were trying to give Microsoft a hard time over serious quality issues of Microsoft's software. That, in and of itself, I don't have a problem with. But this little story basically told me that they weren't serious about being correct about the news they were reporting as long as it fit their agenda. It was then that I stopped bothering to visit their site.
In the interests of being fair, though, I should point out that this story is three years old, and a lot can happen in that time. It is not my intention to convince you that they are currently behaving this way. Rather I'm just answering your question about their negative rep.
Re:Inquirer, yes, but... (Score:5, Interesting)
They latch on to a fact and twist it. The Cell reads from the graphics card's memory at glacial speeds, so they run the headline "PS3 hardware slow and broken" and fail to point out the fact that you would almost never want to do this in a game.
A respectable article would have pointed out that this doesn't have any impact on games, but will affect applications. The 256MB of RAM connected to the video card is really only good for vertex data and textures, so you are left with only 256MB to run executables in. In practice, this means Linux will only be able to use 256MB of RAM. The RSX (graphics card) can render out of its own local memory or main memory (almost as fast as local memory), but anything that needs to be modified by the Cell must stay in main memory because of this bandwidth issue.
Luckily, games contain a lot of static models and static textures that will easily fill up the 256MB of local mem on the RSX; stuff that the Cell would never read from....
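A toy sketch of the split described in this comment, assuming a simple rule: anything the Cell must read or modify after upload lives in main memory, and static GPU-only assets go in RSX local memory. The `Pool`, `Buffer`, and `choose_pool` names and the buffer sizes are hypothetical illustrations, not part of any Sony SDK.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical placement rule for the PS3's split memory pools (not a Sony SDK API).
RSX_LOCAL_CAPACITY = 256 * 2**20  # the 256MB attached to the RSX


class Pool(Enum):
    MAIN = "main memory"      # Cell reads are fast; the RSX can also render from here
    RSX_LOCAL = "RSX local"   # RSX reads are fast; Cell reads are very slow


@dataclass
class Buffer:
    name: str
    size_bytes: int
    cell_reads_or_modifies: bool


def choose_pool(buf: Buffer) -> Pool:
    # Anything the Cell has to touch after upload stays where the Cell can read it.
    return Pool.MAIN if buf.cell_reads_or_modifies else Pool.RSX_LOCAL


buffers = [
    Buffer("static level textures", 180 * 2**20, cell_reads_or_modifies=False),
    Buffer("animated vertex data", 24 * 2**20, cell_reads_or_modifies=True),
    Buffer("static model geometry", 40 * 2**20, cell_reads_or_modifies=False),
]

# Static assets must still fit in the RSX's own memory.
local_total = sum(b.size_bytes for b in buffers if choose_pool(b) is Pool.RSX_LOCAL)
assert local_total <= RSX_LOCAL_CAPACITY, "static assets overflow RSX local memory"

for b in buffers:
    print(f"{b.name:22s} -> {choose_pool(b).value}")
```

The rule is deliberately crude; a real engine would also weigh how often the RSX reads each buffer, since main memory is only "almost as fast" for rendering.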
How much you want to bet (Score:4, Funny)
main memories read speed is 25GB/s (Score:5, Insightful)
I assume the local memory is not going to be used much for 'reading' and only main memory is going to be used.
Re:main memories read speed is 25GB/s (Score:5, Informative)
So it is perfectly normal for texture memory to be nearly write-only. As long as writing to it is extremely fast (which it is in this case according to the PP slide), that isn't a problem.
Re:main memories read speed is 25GB/s (Score:5, Insightful)
That one Simpsons episode (Score:4, Funny)
What was that called again?
Re:That one Simpsons episode (Score:4, Informative)
Does it really matter? (Score:5, Funny)
Why ./ is bashing Sony so much? (Score:5, Insightful)
Re:Why ./ is bashing Sony so much? (Score:5, Funny)
DevStation? (Score:5, Funny)
DeviStation
Article completely misses the point (Score:5, Insightful)
That the read performance for the Cell from this memory is dreadful is no surprise. This is exactly the same architecture that has traditionally been used in PCs. Reading graphics memory from the main processor is usually really, really slow.
This memory is where you store textures and other graphics data. The main processor will usually have little need to read from this memory. If it does, then, as apparently Sony says, you just get the RSX to write to main memory instead.
This is a non-story. People have dealt with this for PC games for a long time.
For goodness sake... (Score:5, Informative)
"Load and store operations (LS), 6 Clock cycles Latency". And that's the time it takes for the instruction to complete, not to be issued to memory.
(3.2GHz / 6 cycles) * 16 bytes ≈ 8.5GB/s, not 16MB/s
Personally, I'm gonna bet on IBM being right, seeing how they're the ones who made the bloody thing. I don't trust the Inquirer anyway, but if those figures are true, the most likely explanation is inefficiencies in their benchmarking programs (such as instruction starvation, a nasty side effect of using SPUs).
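The arithmetic above can be reproduced in a few lines. This sketch assumes, as the formula does, that each 16-byte load waits out the full 6-cycle latency before the next one issues (no pipelining), so it is a pessimistic bound rather than a measured figure.

```python
# Back-of-the-envelope SPU load bandwidth from the figures quoted above:
# 3.2 GHz clock, 6-cycle load/store latency, 16-byte (128-bit) loads.
# Assumption: loads are fully serialized (each waits for the previous to complete).

CLOCK_HZ = 3.2e9
LOAD_LATENCY_CYCLES = 6
BYTES_PER_LOAD = 16


def serialized_load_bandwidth(clock_hz: float, latency_cycles: int, bytes_per_load: int) -> float:
    """Bytes per second if a new load is issued only after the previous one finishes."""
    loads_per_second = clock_hz / latency_cycles
    return loads_per_second * bytes_per_load


bw = serialized_load_bandwidth(CLOCK_HZ, LOAD_LATENCY_CYCLES, BYTES_PER_LOAD)
print(f"{bw / 1e9:.2f} GB/s")                   # 8.53 GB/s
print(f"{bw / 16e6:.0f}x the 16 MB/s figure")   # 533x the 16 MB/s figure
```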
History Repeats Itself (Score:5, Insightful)
I think being too connected to the online debates about this stuff can make you lose sight of what the more average public thinks and bases their purchase decisions on. That's why the only real argument for the PS3's failure so far is the high price, not questions about performance or developer issues.
16MB/s = CPU reading GPU memory directly (Score:5, Informative)
Memory transfer bandwidth between each SPU and its SPU Local Memory is something more like 25GB/s (gigabytes per second); sustained actual bandwidth between all SPUs is greater than 100GB/s; peak theoretical is greater than 200GB/s (assuming all 8 SPUs present for simplicity).
If you had access to the full version of the presentation (part of the full Sony PS3 SDK and technotes), you'd realise that that slide is part of a presentation about the RSX (the PS3's GPU). As such, when it refers to "Local Memory", it means RSX's Local Memory (e.g., graphics memory, video memory, VRAM or whatever you call it in fanboy/ps3/360-is-teh-suck websites). To be understood outside that context, the columns would be better labelled "Main System Memory" and "GPU Local Memory".
The Inquirer article seems to suggest that this figure of 16MB/s (megabytes per second, by the way; what the fuck is it with journalists swapping bits for bytes? why don't they get their shift/capslock keys fixed?) is some kind of show stopper. No, it isn't. It simply means that the Cell processor has 16MB/s of bandwidth when reading directly from memory-mapped GPU address space. So what? Unless you're planning on calling memcpy() or some shit to bring your data back, it doesn't really matter.
On RSX-initiated transfers you have 20GB/s of bandwidth for the same transfer (from RSX local to main system memory). Cell read bandwidth from GPU memory might as well be 0MB/s (i.e., no connection at all) and it wouldn't matter a bit.
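To put the two paths in perspective, here is a rough timing sketch using the 16MB/s and 20GB/s figures above. The 1280x720 RGBA buffer is a hypothetical example, and the model assumes decimal units and ignores any per-transfer setup cost.

```python
# Rough comparison of the two ways to get data out of RSX local memory.
# Assumptions: decimal units (1 MB = 10**6 bytes), no fixed setup cost per transfer.

CELL_DIRECT_READ_BPS = 16e6   # Cell reading memory-mapped RSX local memory
RSX_COPY_BPS = 20e9           # RSX-initiated transfer into main system memory


def transfer_seconds(num_bytes: int, bytes_per_second: float) -> float:
    return num_bytes / bytes_per_second


# Hypothetical example: a 1280x720 render target at 4 bytes per pixel (RGBA8).
frame_bytes = 1280 * 720 * 4

slow = transfer_seconds(frame_bytes, CELL_DIRECT_READ_BPS)
fast = transfer_seconds(frame_bytes, RSX_COPY_BPS)
print(f"Cell direct read:   {slow * 1000:.1f} ms")   # 230.4 ms
print(f"RSX-initiated copy: {fast * 1000:.3f} ms")   # 0.184 ms
```

Even under these simplified assumptions, the direct read of one such buffer takes longer than a dozen 60 Hz frames (about 16.7 ms each), which is why the advice is to let the RSX do the copy.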
Re:16MB/s = CPU reading GPU memory directly (Score:4, Interesting)
This article takes the statement completely out of context, and the Slashdot reaction to it is just ridiculous.
Anybody who didn't know that reads from GPU memory are slow, turn in your geek card right now! On a PC, even with a 4GB/sec AGP connection, reading from the framebuffer can be as slow as 75MB/sec. This has been true for a very long time: GPUs don't like anybody else directly touching their framebuffer. That's why Microsoft took direct framebuffer access out of "DirectX". It's a performance killer on modern systems. Sony's "workaround" for the situation, using the GPU to handle texture uploads/downloads, isn't news; it's common knowledge to anybody who has done any graphics programming on modern hardware.
Yay! (Score:5, Interesting)
Everyone close to me in the industry said I was crazy and that this would all smooth out and Sony would easily retain its market share, if not grow it. I wasn't buying it and stuck to my guns, and I've been happy with my decision almost daily since day 1 of E3 this year.
I was against UMD from the beginning, yet everyone claimed that the sales were stellar. Looks like they weren't, and they are proprietary, expensive, unwieldy little discs that no one wants to deal with. The Cell processor was without a doubt my turning point. I have ZERO faith in it or the architecture, and it will not become the ubiquitous, omnipresent processor so many claim; even IBM has major problems with it, including designing compilers and dev software for its own product. Control schemes have been radically changed from initial proposals, and too quickly to be properly tested... that is a bomb yet to go off. System price and dev costs are just too high for our current economic situation, as well as for widespread adoption. There are more issues, but top it all off with a new, unproven media format that is also expensive and offers no real consumer advantages, and you have a high risk of a catastrophic failure that could hurt Sony and IBM even more than they are already hurting.
The best that can happen is that companies finally lose the DRM/proprietary/closed nature of their consumer electronics. Stop treating customers as criminals and start offering them affordable, accessible, convenient entertainment. I'd actually prefer consoles to standardize and become built into consumer electronics so that developers and consumers can really get to work on a stable and long-lasting platform. Imagine the possibilities. There is a lot to be said for standards.
Use the "Flame the Author" Link (Score:4, Informative)
My flame:
"I'm sure you'll get a lot of these messages, but hell, you deserve it.
The slow read speed you noted in the slide is for Cell reading from the RSX's local memory. Such accesses are expected to be very slow. If you look at this USENIX article from one of the Linux DRI folks, you can see this quite easily:
DRI article [usenix.org]
He shows how painfully slow it is to read from AGP or framebuffer memory (14 and 5 MB/sec, respectively) on a Rage 128 graphics card. For CPU reads from the framebuffer, which are the equivalent of what we're talking about here, the read speed is 1/40th the write speed. At 16MB/sec read and 4GB/sec write, the PS3 is actually right in line with what can be expected of modern GPU architectures.
Reading from the framebuffer is just slow unless you have a unified memory architecture. The CPU and the GPU aren't cache-coherent, which means every access to framebuffer memory (or even AGP memory, which is actually a chunk of system memory allocated to the GPU) must be an uncached access. Uncached accesses are just plain slow, on any architecture.
The way your article is written, it makes it seem like Cell reads its local storage at 16 MB/sec. That is, of course, bollocks, since IBM has shown benchmarks of the Cell local storage achieving 98% efficiency. If you had any journalistic integrity at all, you'd post a retraction on your site, and a clarification of the technical issues involved."
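The write-to-read ratios quoted in this flame can be checked directly. This sketch assumes decimal units and derives the Rage 128 write speed from the quoted "1/40th" figure rather than from the USENIX measurements themselves.

```python
# Write-to-read bandwidth ratios from the figures quoted above (decimal units assumed).

def write_to_read_ratio(write_bps: float, read_bps: float) -> float:
    return write_bps / read_bps


RAGE128_FB_READ = 5e6                     # 5 MB/s framebuffer read (quoted)
RAGE128_FB_WRITE = RAGE128_FB_READ * 40   # implied by "1/40th the write speed"
PS3_READ = 16e6                           # 16 MB/s Cell read of RSX local memory
PS3_WRITE = 4e9                           # 4 GB/s write (quoted)

print(write_to_read_ratio(RAGE128_FB_WRITE, RAGE128_FB_READ))   # 40.0
print(write_to_read_ratio(PS3_WRITE, PS3_READ))                 # 250.0
```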
Re:Go Sony, go! (Score:5, Funny)
Re:Go Sony, go! (Score:5, Funny)
Well, there's your problem: fitting 512 Men in Black into any console is going to cause heat problems...
Re:Go Sony, go! (Score:5, Funny)
Re:Go Sony, go! (Score:5, Informative)
I disagree. Those numbers are for Cell. (Score:4, Informative)
And the theoretical bandwidth numbers listed for CELL to main memory are those of the direct XDR interface. You'll note that the RSX has much lower numbers because it accesses main memory through a bridge bus (much like a graphics card on PCIe).
On the Cell, there is only one thing local memory can mean, and that is the local memory of each SPE.
NOTE: this can be a serious issue, because each SPE MUST read instructions and write results to the local memory. It is up to the main processor to load instructions into this memory from main memory, and to copy results from this local memory to main.
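The staging pattern described above (pull data into a small per-SPE local store, work on it there, write the results back out) can be modelled in plain Python. This is a sketch only: `dma_get` and `dma_put` are hypothetical stand-ins for real DMA transfers, the 16KB chunk size is an assumption, and the question of who issues the transfers is abstracted away.

```python
LOCAL_STORE_BYTES = 256 * 1024  # per-SPE local store, shared by code, stack, and data
CHUNK_BYTES = 16 * 1024         # assumed chunk size, leaving room for code and stack


def dma_get(main: bytearray, offset: int, size: int) -> bytearray:
    """Hypothetical stand-in for a DMA transfer from main memory into local store."""
    return bytearray(main[offset:offset + size])


def dma_put(main: bytearray, offset: int, local: bytearray) -> None:
    """Hypothetical stand-in for a DMA transfer from local store back to main memory."""
    main[offset:offset + len(local)] = local


def process_in_chunks(main: bytearray, kernel) -> int:
    """Run `kernel` over `main` one local-store-sized chunk at a time; return DMA count."""
    assert CHUNK_BYTES <= LOCAL_STORE_BYTES
    transfers = 0
    for offset in range(0, len(main), CHUNK_BYTES):
        size = min(CHUNK_BYTES, len(main) - offset)
        local = dma_get(main, offset, size)   # bring the chunk in
        kernel(local)                         # the SPE only ever touches `local`
        dma_put(main, offset, local)          # write the results back
        transfers += 2
    return transfers


def invert(buf: bytearray) -> None:
    for i in range(len(buf)):
        buf[i] ^= 0xFF


data = bytearray(range(256)) * 200    # 51,200 bytes of sample data
count = process_in_chunks(data, invert)
print(count, data[:4].hex())          # 8 fffefdfc
```

Real SPE code would typically double-buffer, starting the transfer for the next chunk while the kernel works on the current one; the sketch keeps everything sequential for clarity.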
IGNORE MY COMMENT (Score:5, Insightful)
Re:Go Sony, go! (Score:5, Interesting)
And it turned out to be one of the most successful consoles ever.
Re:Go Sony, go! (Score:4, Insightful)
That tends to happen when you're basically the ONLY console. Not discounting Nintendo, but it's targeted at a very different group of people than the PS3.
Re:Go Sony, go! (Score:5, Insightful)
Re:Go Sony, go! (Score:4, Insightful)
Secondly, of the reasons the PS2 was successful, its graphical performance isn't relevant. It's successful because:
1) When it came out, it had (basically) no competition. The Nintendo 64 was way past its prime, and the Dreamcast was pretty much already dead by that point. The PS2's launch was a death blow to the Dreamcast, and everyone knew it.
2) Because of backwards-compatibility, it had a huge selection of games at release.
The PS2's graphics performance *is* disappointing. It barely beats out the Dreamcast, and it can't hold a candle to the GameCube or Xbox. That has nothing to do with its success.
Re:Go Sony, go! (Score:5, Funny)
Not trying to flamebait here, but what is one of the OS's that will be running on PS3? Hint, it starts with L and ends with X.
Re:Go Sony, go! (Score:5, Funny)
Re:Go Sony, go! (Score:4, Funny)
Re:Go Sony, go! (Score:5, Informative)
IMO it's reasonable to have asynchronous communication with the graphics subsystem. The only stupid thing going on is calling the graphics card's memory "Local Memory". It suggests that the X-Box got it right by having one big chunk of memory that is read by both the CPU and GPU, even if most developers will make the same basic split anyway.
Re:Go Sony, go! (Score:5, Informative)
Presumably in the (unlikely?) event you did need the output from the RSX graphics chip for manipulation by the Cell processor gubbins, you could get it to render to main memory, let the processor do the appropriate data-diddling, then have the RSX read it back again?
The 'local memory' is presumably the RSX's private play area, and thus the RSX gets maximum-stupendous-speed priority, and the Cell gets occasional access at weekends. Which is a bonus, and not even necessary...
Re:Go Sony, go! (Score:5, Informative)
Re:Go Sony, go! (Score:5, Informative)
In the slide, the "Local Memory" refers to the RSX local memory, not the SPU local memory. The article says that the next slide is Sony telling devs to use the RSX to do the transfer instead, which only makes sense if it is talking about the RSX memory.
Your conclusion is right though, as this also is memory that the Cell doesn't need to read from.
Broken benchmark, perhaps? (Score:5, Informative)
Either that, or a broken benchmark. Each of the Cell's Synergistic Processing Elements (SPEs) [ibm.com] shares its instruction fetch port with its data memory port. The SPE can buffer up to 80 instructions at a time (2.5 fetch words), plus an additional 32 from a branch target. Fetch will stall if the memory system gets saturated with loads and stores. Properly written memory-intensive code includes explicit fetches to keep these buffers full; incorrectly written code will cause problems. Still, that doesn't explain a 3-orders-of-magnitude drop.
If you look at the slides on the page I linked to above, you'll see the SPEs are not connected into the global address space. They connect to a private single-ported memory, and to each other through two unidirectional rings. (The ring structure is not apparent from that diagram, but trust me, it's there.) These rings then connect to a DMA engine.
If you wade through this paper, [ibm.com] you'll see that the Cell compiler implements a software cache. (The same paper also explains the instruction fetch mechanism mentioned above, BTW.) That is, it emulates a cache in software, using the DMA to actually move memory around. Depending on the nature of the benchmark and how it was written, it could be that the read benchmark spends all its time allocating stuff into this cache and waiting for it to arrive. Writes would be faster because the cache can "write behind" without having to wait for the allocation to happen, if the compiler is smart enough to know that the previous data will be entirely overwritten. So, if the benchmark goofed, then the results are meaningless.
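A software-managed cache of the kind described here can be modelled in a few lines. This is a simplified, hypothetical direct-mapped, write-back design (128-byte lines, a dict standing in for the tag store) that only counts simulated DMA line fetches; it is not the Cell compiler's actual implementation. It shows why a read benchmark pays for a fetch on every new line, while a write that overwrites whole lines never needs to fetch.

```python
LINE = 128  # bytes per cache line (assumed for this sketch)


class SoftwareCache:
    """Toy direct-mapped, write-back cache in 'local store', backed by 'main memory'."""

    def __init__(self, num_lines: int, backing: bytearray):
        self.num_lines = num_lines
        self.backing = backing
        self.slots = {}    # slot index -> [tag, line data, dirty flag]
        self.fetches = 0   # simulated DMA gets the program would have to wait for

    def _locate(self, addr: int):
        tag = addr // LINE
        return tag % self.num_lines, tag

    def _evict(self, slot: int) -> None:
        entry = self.slots.pop(slot, None)
        if entry is not None and entry[2]:
            tag, data, _ = entry
            self.backing[tag * LINE:(tag + 1) * LINE] = data   # simulated DMA put

    def read(self, addr: int) -> int:
        slot, tag = self._locate(addr)
        entry = self.slots.get(slot)
        if entry is None or entry[0] != tag:
            self._evict(slot)
            data = bytearray(self.backing[tag * LINE:(tag + 1) * LINE])
            self.fetches += 1                                   # read miss: must fetch
            entry = [tag, data, False]
            self.slots[slot] = entry
        return entry[1][addr % LINE]

    def write_line(self, addr: int, data: bytes) -> None:
        """Overwrite a whole line. Nothing old survives, so no fetch is needed."""
        assert addr % LINE == 0 and len(data) == LINE
        slot, tag = self._locate(addr)
        entry = self.slots.get(slot)
        if entry is None or entry[0] != tag:
            self._evict(slot)
        self.slots[slot] = [tag, bytearray(data), True]

    def flush(self) -> None:
        for slot in list(self.slots):
            self._evict(slot)


mem = bytearray(64 * 1024)
cache = SoftwareCache(num_lines=32, backing=mem)

for addr in range(16 * 1024):                      # sequential read "benchmark"
    cache.read(addr)
read_fetches = cache.fetches

for addr in range(16 * 1024, 32 * 1024, LINE):     # full-line write "benchmark"
    cache.write_line(addr, bytes([1]) * LINE)
cache.flush()

print("fetches during reads: ", read_fetches)                  # 128
print("fetches during writes:", cache.fetches - read_fetches)  # 0
```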
Fact of the matter is that the SPEs are capable of reading 128 bits a cycle each (128 bytes / cycle across the 8 SPEs). Other benchmarks, such as the article recently posted to Slashdot [slashdot.org] about using Cell for scientific computation [berkeley.edu] confirm that this thing hauls--and these are bandwidth-intensive tasks. The quoted paper did run some numbers on real silicon and showed numbers similar to their simulation results.
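For scale, the peak figure quoted here works out as follows at the 3.2GHz clock mentioned elsewhere in the thread. This assumes all eight SPEs are active, as the comment does (the shipping PS3 enables seven), and ignores the fetch-port contention described above, so it is a theoretical ceiling, not a benchmark.

```python
# Theoretical peak local-store read bandwidth across the SPEs.
# Assumptions: 3.2 GHz clock, 16 bytes (128 bits) per cycle per SPE, 8 SPEs, no stalls.

CLOCK_HZ = 3.2e9
BYTES_PER_CYCLE_PER_SPE = 16
NUM_SPES = 8


def peak_local_store_bandwidth(clock_hz: float, bytes_per_cycle: int, num_spes: int) -> float:
    return clock_hz * bytes_per_cycle * num_spes


total = peak_local_store_bandwidth(CLOCK_HZ, BYTES_PER_CYCLE_PER_SPE, NUM_SPES)
print(f"per SPE:    {CLOCK_HZ * BYTES_PER_CYCLE_PER_SPE / 1e9:.1f} GB/s")  # 51.2 GB/s
print(f"all SPEs:   {total / 1e9:.1f} GB/s")                               # 409.6 GB/s
print(f"vs 16 MB/s: {total / 16e6:,.0f}x")                                 # 25,600x
```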
With all this in mind, I find it hard to believe that Cell is broken.
--Joe
Re:Nice post, but not relevant to the (FUD) articl (Score:4, Insightful)
Ahh, so this is the rate at which the Cell can read RSX's local memory? That I'll believe. And I will equally agree "BFD!" The Cell does its work and dumps everything to main memory or the RSX's memory. RSX does its work and if it needs to communicate anything major back to the Cell, it does so through main memory. Makes perfect sense then.
I thought something seemed awful fishy. I thought the slide was summarizing performance of the Cell SPE and RSX, not the Cell's and RSX's ability to communicate with the RSX's local memory. If your statement's true, then this paragraph in TFA is full of it: (Emphasis mine.)
It all begins to make a lot more sense, though, if this is about accesses from Cell or RSX to memory local to RSX. I admit ignorance on the RSX's architecture. I just know in my bones that those numbers aren't for a Cell SPE talking to its local memory.
--Joe
Re:DevKit (Score:4, Informative)
Re:This is the Inquirer, so take with a grain of s (Score:4, Insightful)
Re:D-Fly, you piece of shit: Mbps != MB/s (Score:5, Informative)
1) The poster had no clue
2) Zonk (and for that matter, the whole
3) This mistake happens _constantly_ on
4) Anyone with even a basic understanding of computers wouldn't make this mistake
Just more proof that "IT" != computer science
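The factor at stake here is eight: one megabit per second is one eighth of a megabyte per second. A minimal conversion sketch, assuming decimal prefixes on both sides:

```python
def mbps_to_mb_per_s(megabits_per_second: float) -> float:
    """Megabits per second -> megabytes per second (8 bits per byte)."""
    return megabits_per_second / 8


def mb_per_s_to_mbps(megabytes_per_second: float) -> float:
    """Megabytes per second -> megabits per second."""
    return megabytes_per_second * 8


print(mbps_to_mb_per_s(16))   # 2.0  ("16Mbps" would be only 2 MB/s)
print(mb_per_s_to_mbps(16))   # 128  (16 MB/s is 128 Mbps)
```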