Ars Technica's Hannibal on IBM's Cell

Posted by timothy on Wednesday February 09, 2005 @01:41AM from the different-approach dept.

endersdouble writes "Ars Technica's Jon "Hannibal" Stokes, known for his many articles on CPU technology, has posted a new article on IBM's new Cell processor. This one is the first part of a series, and covers the processor's approach to caching and control logic. Good read."

This discussion has been archived. No new comments can be posted.

Apple? (Score:4, Insightful)

by tinrobot ( 314936 ) writes: on Wednesday February 09, 2005 @01:45AM (#11615689)

Why do I have the sneaking suspicion that, if successful, this processor will eclipse the PowerPC on the Mac in the next few years?

- Re:Apple? (Score:5, Informative)
  
  by Tropaios ( 244000 ) writes: <`tropaios' `at' `yahoo.com'> on Wednesday February 09, 2005 @02:29AM (#11615889)
  
  From the article:
  
  The Cell and Apple
  
  Finally, before signing off, I should clarify my earlier remarks to the effect that I don't think that Apple will use this CPU. I originally based this assessment on the fact that I knew that the SPUs would not use VMX/Altivec. However, the PPC core does have a VMX unit. Nonetheless, I expect this VMX to be very simple, and roughly comparable to the Altivec unit of the first G4. Everything on this processor is stripped down to the bare minimum, so don't expect a ton of VMX performance out of it, and definitely not anything comparable to the G5. Furthermore, any Altivec code written for the new G4 or G5 would have to be completely reoptimized due to the in-order nature of the PPC core's issue.
  
  So the short answer is, Apple's use of this chip is within the realm of conceivability, but it's extremely unlikely in the short- and medium-term. Apple is just too heavily invested in Altivec, and this processor is going to be a relative weakling in that department. Sure, it'll pack a major SIMD punch, but that will not be a double-precision Altivec-type punch.
  
- - - Re:Apple? (Score:3, Informative)
      
      by sholden ( 12227 ) writes:
      
      My 7 year old PC (300mhz PII) runs everything I need on a daily basis pretty well.
      
      Firefox, wily, gcc, python, perl, MS office, gimp and so on.
      - Re:Apple? (Score:3, Funny)
        
        by the_2nd_coming ( 444906 ) writes:
        
        I have OS X running nicely on my Quadra.
        
        Re:Apple? (Score:4, Funny)
        
        by Anonymous Coward writes: on Wednesday February 09, 2005 @02:26AM (#11615871)
        
        Oh, yeah? Well I have Windows XP on my IBM 5150. Well, not so much Windows XP as MS-DOS. Version 2.0. But I have two floppy drives, so I don't even have to take out the system disk to play Oregon Trail!
        
        Re:Apple? (Score:2)
        
        by Nutria ( 679911 ) writes:
        
        I remember thinking I was pimp having two drives.
        
        You weren't even born yet, loser.
        
        It was a joke (Score:2)
        
        by dn15 ( 735502 ) writes:
        
        The parent (to your post) was joking. Quadras are Motorola 680x0 machines, not PowerPC. You'd have no more luck running OS X on one of those than you would Windows XP.
      - Re:Apple? (Score:2)
        
        by Suburbanpride ( 755823 ) writes:
        
        aha, my ibook originally shipped with OS 9 in the days before itunes and iphoto, and yet it still runs fine, even with the new ilife '05 software. That being said, I'm sure the apple programmers would do wonderful things with all the power that the cell has to offer, but I think they also deserve credit for writing software that will run on a wide range of computers.
    - Re:Apple? (Score:2)
      
      by Jerf ( 17166 ) writes:
      
      I just offered to upgrade my wife off of my Duron 800 to a Mac Mini and she declined, citing that exact argument.
      
      I don't *quite* know if that's 2000 vintage, but it's gotta be close.
    - Re:Apple? (Score:2, Informative)
      
      by Chemical ( 49694 ) writes:
      
      I've got a 400Mhz iMac that a friend gave me, and while it does run Panther, Safari, Quicktime, and iTunes, it struggles with all of them. Flash animations stutter, iTunes skips if you try and do anything else while using it. It is incapable of decoding a 640x480 Divx file fast enough to actually play it.
      For browsing simple websites or writing emails it works acceptably. For anything even remotely multimedia related, it is rendered useless.
      Meanwhile a 400Mhz PII running Windows 2K can play flash, mp3s,
    - Re:Apple? (Score:5, Informative)
      
      by prockcore ( 543967 ) writes: on Wednesday February 09, 2005 @02:57AM (#11615999)
      
      My old 600mhz g3 ibook runs panther, safari, quicktime, iphoto, itunes and everything else I need on a daily basis pretty well. Try saying that about a five year old PC.
      
      5 year old? Your 600mhz g3 ibook came out October 2001. That machine is just a few months older than 3 years old.
      
      In October of 2001, the P4 was at 2.0ghz, and the Athlon 2000+ was just coming out. Are you going to tell me that a 2ghz P4 isn't adequate for browsing the web, listening to mp3s and importing digital photos?!
      
      - Re:Mistake (Score:5, Interesting)
        
        by TheNetAvenger ( 624455 ) writes: on Wednesday February 09, 2005 @05:04AM (#11616379)
        
        A budget-class PC laptop of that time might have been about 900 MHz to 1.1 GHz. I wouldn't consider such a laptop anything near useable. They tended to have poor quality sound systems that bottlenecked the processor and atrociously short battery times. The ibook was legendary for its excellent battery performance
        
        Get off what you 'assume', assumption is just intuition for idiots.
        
        We have 200mhz test laptops with 80mb of RAM and 5gb hard drives, released in 1997, all running WindowsXP Professional (yes, even with the themes turned on), and they benchmark faster than they did when they shipped with Windows 95.
        
        Secondly, they can do full 30fps video as long as it is uncompressed AVI or even WMV 9. QuickTime (MPEG4), MPEG2, and Real stutter horribly on video playback unfortunately.
        
        As for battery, don't know, these laptops hold for 3hrs with a single charge, and yes techs are REQUIRED and have no problems using them daily in test scenarios.
        
        Now if you really want to compare laptops to laptops, why don't I show you our 900mhz AMD Compaq laptops, they have JBL sound systems in them, and there isn't a single feature they cannot perform with the exception of running a T&L based video game, as the integrated video doesn't handle it, oh wait, the 900mhz PowerBook video didn't support such features either. (BTW, This is not to say that there are not several 900-1000mhz class laptops that have upper end video features), I am just using what we have in our test labs for comparison.
        
        The 900mhz laptop has a DVD/CDRW, came out late 2000 early 2001 (trying to remember if we got them before holidays or not). They do full software DVD decoding with less than 20% CPU utilization and pretty much do anything fairly fast that we throw at them. We even have a beta version of Windows 2003 server running on one with 256mb of RAM. (Yes we are always pushing the limits, but it works as fast as the WindowsXP pro version of the machine sitting next to it.)
        
        Now off my rant... Macs truly are great, and the PowerBooks of the time were great, but that DOES NOT MEAN they were the BEST, WILL ALWAYS BE THE BEST, or you should be complacent listening to Apple tell you what you are getting is the best when it might not be. It is time for us as MAC users to stand up and DEMAND that technology becomes as much a part of what a MAC is as the EASE of USE in the Interface.
        
        The time is now, we need to STOP accepting what they tell us and give us and force them to truly give us the LATEST technological concepts, not just the above average concepts when compared to the PC world. These are Macs, they SHOULD BE BETTER. IT shouldn't even be subjected to a debate they should be so far advanced a debate should not be possible. PERIOD.
        
        Sadly, it just isn't true now, and has not been for many years. OS X has given the Mac world some credibility in OS technology, but now Apple needs to take Macs to the next level.
        
        Even if my comment inspires one Mac user to say hey Apple, we want better, then maybe we all can be the symbolic person with the hammer from their 1984 video and WAKE THEM UP this time.
        
        Re:Mistake (Score:2)
        
        by marcello_dl ( 667940 ) writes:
        
        Now off my rant... Macs truly are great, and the PowerBooks of the time were great, but that DOES NOT MEAN they were the BEST
        
        At the time there was no sony vaio, so the powerbook titanium was the smallest laptop around. It also had optional wireless and standard firewire and gigabit ethernet built in. Os 10.1 was a bit lacking but i'd take it over whatever windows version any day (i tried 98 2000 and xp home)
        
        I'd say it was the best.
My article on the new cell processor: (Score:3, Insightful)

by tod_miller ( 792541 ) writes: on Wednesday February 09, 2005 @01:46AM (#11615692) Journal

I want 2 of them, yesterday.

Aside from my own (competent) review of the cell processor, the article is possibly one of the most insightful and technically balanced articles posted on Slashdot in a long while!

I'll cover more of the Cell's basic architecture, including the mysterious 64-bit POWERPC core that forms the "brains" of this design.

Looking forward to that... I think that many people will be moving to Mac ... on cell... likely?

Part II is up now (Score:5, Informative)

by Anonymous Coward writes: on Wednesday February 09, 2005 @01:50AM (#11615715)

Part II is up [arstechnica.com] as well.

Like having a whole Beowulf Cluster on one chip... (Score:3, Funny)

by ABeowulfCluster ( 854634 ) writes: on Wednesday February 09, 2005 @01:52AM (#11615728)

.. made of risc components.

Workstation? (Score:5, Interesting)

by jericho4.0 ( 565125 ) writes: on Wednesday February 09, 2005 @01:56AM (#11615746)

From this site [itjungle.com] and others..
" Last fall, IBM and Sony said they were developing a workstation based on Cell chips, which is the first product IBM will ship based on Cell."
Regardless of whether this is the first product shipped, a workstation is coming. I can't see it running anything but linux. Given the mass market targeting of the cell, I hope Sony makes a strong go at grabbing the market with cheap hardware, rather than trying to milk the high-end content creation market first.

- Workstation?-Cell Wars. (Score:2, Informative)
  
  by Anonymous Coward writes:
  
  Linux on Intel: Think Dead Man Walking [technewsworld.com] and Grid vs. SMP: The Empire Tries Again [technewsworld.com] and Fast, Faster and IBM's PlayStation 3 Processor [technewsworld.com].
- Re:Workstation? (Score:3, Interesting)
  
  by node 3 ( 115640 ) writes:
  
  I can't see it running anything but linux.
  
  OS X is another strong possibility. Sony's President was recently on stage with Steve Jobs at the Macworld Expo, hinting at working with Apple in the future. A recent slashdot story linked to an article which states that 3 PC manufacturers have been begging Apple to license OS X to them. I'll bet Sony was one of them, and IBM would also be a logical suitor.
  
  Since OS X is essentially NEXTSTEP 6, and the Cell workstations would be great for science or 3d, OS X is a
  - Re:Workstation? (Score:4, Insightful)
    
    by TheRaven64 ( 641858 ) writes: on Wednesday February 09, 2005 @06:26AM (#11616629) Journal
    
    The last time Apple tried licensing the OS, it almost killed them. They licensed it completely indiscriminately and lost out at the low end because clones were built using cheaper components and at the high end because SMP clones were cheaper. Licensing to Sony or IBM remains a possibility if the licensing agreement contained some kind of non-competition clause - Apple primarily target the home user, and so would be happy to let IBM have the corporate market if it meant paying them a royalty on every sale and a whole load of free publicity for OS X.
    Apple at the moment is two companies. One is primarily a computer hardware company that makes software to drive hardware sales and sells the entire package as user experience. The other is a consumer electronics company. Last year, the profits made by both companies were about the same. Whether they wish to transition to being a software and consumer electronics company that also makes some niche hardware is a decision they will have to make.
    
- A real supercomputer chip that CELL copied (Score:2)
  
  by zymano ( 581466 ) writes:
  
  Stanford professor Dally's stream processor. [weblogs.com]
  
  It's almost just like Cell but has onchip memory to solve the bandwidth problem.
  
  Dally worked for Cray and mentioned that today's supercomputers are not efficient.
- - When hell freezes over (Score:2)
    
    by yem ( 170316 ) writes:
    
    Hey "never say never", but I don't see Microsoft (xbox2) porting/releasing ANY Windows technology on Sony (ps3) hardware any time soon. The Xbox2/PS3 showdown is going to be the biggest thing since, well, Xbox/PS2.
- - Re:Workstation? (Score:5, Insightful)
    
    by PurpleFloyd ( 149812 ) writes: <.zeno20. .at. .attbi.com.> on Wednesday February 09, 2005 @05:02AM (#11616376) Homepage
    
    The Cell workstation in question is not a home/office computer; not running Linux because it's hard to install or a scanner won't work is not an issue. The workstation is closer to a Sun or SGI system - very expensive, and faster than almost anything in the x86 world.
    The target market is not home users but rather scientists, animators, engineers, and others who need raw power and aren't concerned with the fact that Word won't work on it; many customers will probably have a cheap PC sitting next to it for office tasks, freeing up the workstation to do nothing but grind through computations. In this world, various Unices are the only serious choice; SGIs run IRIX or Linux, Suns run Solaris or Linux, and IBMs run AIX or Linux.
    Take into account IBM's commitment to Linux, and the fact that many of their customers already use it, and it's almost certain that Linux will be a major OS choice for Cell workstation customers, particularly those working in a mixed-architecture environment. While it's likely to run AIX and a Windows port is possible, it's almost certain that a majority of Cell workstations will be running Linux.
    
More info in these slides (Score:5, Interesting)

by Namarrgon ( 105036 ) writes: on Wednesday February 09, 2005 @01:58AM (#11615757) Homepage

Scroll down a bit here [impress.co.jp], there's some more tasty tidbits.
e.g. 234 M transistors [impress.co.jp] (!) That's why I don't think this will be replacing the G5 any time soon. The die size (at the current prototype's 90nm) is over 200 mm².
It'll have to get a fair bit smaller/cheaper before the PS3 can use it without major subsidies, and I don't know why they think general consumer devices will want it. God knows how much power it dissipates with all 8 SPEs clocking over at 4 GHz...

- Re:More info in these slides (Score:2)
  
  by aralin ( 107264 ) writes:
  
  What is the problem here? 200 mm² is a little over half an inch by half an inch.
  - Re:More info in these slides (Score:4, Informative)
    
    by WoTG ( 610710 ) writes: on Wednesday February 09, 2005 @05:20AM (#11616438) Homepage Journal
    
    In CPU sizes, 200 mm² is pretty big. IIRC, newer Athlons bump around 100 mm² depending on the cache size. P4s are somewhat larger than the Athlons. Bigger chips use more material and fab space, plus, the defect rate rises (it only takes a single error in a critical part of the chip to ruin it).
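    
    One simple way to see why area hurts is the textbook Poisson yield model, in which the fraction of good dies falls exponentially with die area A for a given defect density D. The defect density below is an assumed round number for illustration only, not a figure from IBM or any fab.
    
    ```latex
    Y = e^{-A D},
    \qquad
    D = 0.5\ \text{defects/cm}^2 \ \text{(assumed)}:
    \quad
    Y_{1\,\text{cm}^2} = e^{-0.5} \approx 0.61,
    \quad
    Y_{2\,\text{cm}^2} = e^{-1.0} \approx 0.37
    ```
    
    Under that assumption, going from 100 mm² to 200 mm² roughly halves the number of candidate dies per wafer and also drops the good fraction from about 61% to about 37%.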
    
- Re:More info in these slides (Score:2, Insightful)
  
  by Effugas ( 2378 ) * writes:
  
  No subsidies required. PS3 will sell enough to write its own ticket. No need to hope others pick up the slack.
  - Re:More info in these slides (Score:2)
    
    by Namarrgon ( 105036 ) writes:
    
    Not if its CPU costs twice as much to manufacture as e.g. a $300 Pentium 4 CPU. Would you pay $600+ for a PS3?
    It'll have to be shrunk to 65nm before it can hope to be competitive.
  - Not if the CPU is too expensive. (Score:3, Informative)
    
    by Sycraft-fu ( 314770 ) writes:
    
    New consoles are sold at a loss, but there's a limit to how much of a loss companies can take. If the CPU itself ends up costing Sony $300+, they'd be looking at a massive loss on the consoles, probably larger than they are willing to take. That was actually a noted problem with the X-box, the loss per unit was large so they had to sell quite a few games per unit to make it up. I'm not even sure if they made any money on it.
    
    Well, in MS's case, they can pull shit like that. Microsoft makes loads of cash off
- Re:More info in these slides (Score:3, Informative)
  
  by i41Overlord ( 829913 ) writes:
  
  The reason it has so many transistors is because of the amount of onboard memory. Memory uses a lot more transistors than the logic circuits do.
  
  A complicated CPU may have tens or hundreds of millions of transistors, but a single memory chip has billions.
  
  So when you bump up the cache size on a CPU, the transistor count goes up greatly.
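  
  To put rough numbers on this, here is a back-of-the-envelope sketch in C. It assumes a conventional six-transistor (6T) SRAM cell and ignores tags, decoders, and redundancy, so it gives a floor rather than a real die budget. `sram_transistors` is a hypothetical helper made up for this illustration.
  
  ```c
  #include <assert.h>
  #include <limits.h>
  
  /* Hypothetical helper: transistors needed to hold `bytes` of on-chip
     SRAM, assuming 6 transistors per bit (an assumption; real arrays
     also need tag, decode, and sense circuitry). */
  unsigned long long sram_transistors(unsigned long long bytes)
  {
      const unsigned long long bits_per_byte = 8;
      const unsigned long long transistors_per_bit = 6;
  
      /* Guard against overflow in the multiplication below. */
      assert(bytes <= ULLONG_MAX / (bits_per_byte * transistors_per_bit));
      return bytes * bits_per_byte * transistors_per_bit;
  }
  ```
  
  For 2.5 MiB of on-chip SRAM (for example, a 512 KB L2 plus eight 256 KB local stores, if the reported prototype figures hold), `sram_transistors(2621440ULL)` gives 125,829,120, or roughly 126 million of the 234 million transistors quoted above.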
- - Re:If Sony can, Apple can (Score:3, Insightful)
    
    by Namarrgon ( 105036 ) writes:
    
    If Sony can fit it in a console and sell a hundred million of them in a year, I'm sure Apple can...
    Sony may be able to do that with the 65nm final design, when it arrives some time in 2006. Then we'll see.
    Even then, there are other considerations that may make it a less-than-ideal fit for a general purpose computer - all those vector units are great for number crunching, but how much of that do you do each day? And when you're not, that's 3/4 of the cost of your chip sitting around idle. There are more
    - Re:If Sony can, Apple can (Score:4, Insightful)
      
      by TheRaven64 ( 641858 ) writes: on Wednesday February 09, 2005 @06:32AM (#11616647) Journal
      
      Hannibal believes the weak VMX implementation will be a major downside for Apple.
      I am not convinced by this argument. A lot of OS X code uses AltiVec, but very little actually uses it directly. Apple has spent a lot of effort producing libraries that people can use which wrap AltiVec into something higher level (e.g. QuickTime, vDSP). Most of these could potentially be ported to the SPEs. Things like CoreVideo could also make use of the SPEs.
      all those vector units are great for number crunching, but how much of that do you do each day? And when you're not, that's 3/4 of the cost of your chip sitting around idle.
      90% of the time, my 1.5GHz G4 is sitting at 20% utilisation or less. You could argue that 80% of the power of the chip is wasted. However, when I am doing things that tax it they are almost always things that would support a large degree of parallelism.
      
      - Re:If Sony can, Apple can (Score:3, Informative)
        
        by mrseigen ( 518390 ) writes:
        
        XCode 2.0 is actually supposed to automatically "vectorize" programs for better optimization with altivec (check the Tiger page for it).
Love those architectural articles (Score:5, Funny)

by hurtfultater ( 745421 ) writes: on Wednesday February 09, 2005 @02:02AM (#11615778)

Thank god. I've enjoyed his articles in the past, and if experience is any indication, I will have the false impression that I understand this stuff in a nontrivial way for up to three hours. This is not meant to rag on Hannibal, BTW.

Hannibal (Score:5, Funny)

by ndogg ( 158021 ) writes: <the@rhorn.gmail@com> on Wednesday February 09, 2005 @02:08AM (#11615801) Homepage Journal

With a name like that, I expect to see pictures of him eating those Cell processors, and describing how they taste.

- Re:Hannibal (Score:5, Funny)
  
  by NonSequor ( 230139 ) writes: on Wednesday February 09, 2005 @02:44AM (#11615941) Journal
  
  With a name like that, I expect to see him crossing the Alps on an elephant to invade Italy.
  
  - Re:Hannibal (Score:5, Funny)
    
    by Hannibal_Ars ( 227413 ) writes: on Wednesday February 09, 2005 @09:44AM (#11617250) Homepage
    
    Actually, I was crossing the Alps and got stranded, and had to eat my elephants to survive.
    
- Re:Hannibal (Score:2)
  
  by jd ( 1658 ) writes:
  
  Actually, there are people who do eat ground-up computers. There's one guy who has even eaten an entire USAF jet aircraft.
  - - Re:Hannibal (Score:3, Informative)
      
      by cooldev ( 204270 ) writes:
      
      What are you talking about? Obviously you can't eat a jet aircraft.
      
      Zen, your Google-fu is weak: http://en.wikipedia.org/wiki/Michel_Lotito [wikipedia.org] :)
      Lotito's performances are the consumption of metal, glass, rubber and so on in items such as bicycles, televisions, a Cessna 150, and smaller items which are disassembled, cut-up and swallowed. The aircraft took roughly two years to be 'eaten' from 1978 to 1980. He began eating unusual material while a child and has been performing publicly since 1966.
- Re:Hannibal (Score:2)
  
  by aztektum ( 170569 ) writes:
  
  I instantly thought The A-Team, whose "leader" was John "Hannibal" Smith, played by the late George Peppard.
- Hannibal was the greatest general of his era (Score:2)
  
  by Mongoose ( 8480 ) writes:
  
  If you read books, or at least watch the history channel... then you would know Hannibal was the man that brought elephants over the Alps and routed and slaughtered Romans by the tens of thousands using but swords and spears. The only thing people dis him about was his one mistake to not go ahead and take Rome -- instead of giving them the chance to surrender. Hell, if lack of perfection is your only flaw that's a hell of a compliment.
  
  He also was a great politician after the Tunic wars.
  - Re:Hannibal was the greatest general of his era (Score:2)
    
    by rxmd ( 205533 ) writes:
    
    He also was a great politician after the Tunic wars.
    
    Make that the Punic wars: Punic = Phoenician = Carthaginian, and Hannibal was from Carthage. Even for the Romans, a tunic was nothing to war over ;)
- Re:Hannibal (Score:2)
  
  by FleaPlus ( 6935 ) writes:
  
  He had its DSP core with some fava beans and a dab of Arctic Silver.
- Re:Hannibal (Score:2)
  
  by screwballicus ( 313964 ) writes:
  
  It depends on whether there's a current running through it, in my experience. Very distinctive savour, at peak usage.
iCell? (Score:2, Interesting)

by mpesce ( 146930 ) writes:

Although the article (which is quite clear) indicates that the AltiVec architecture is closer to G4 than G5, won't the speed increase of having 8 fully-parallel processors (9 if you count the main CPU) more than make up for the issues associated with the loss of the G5's advanced features? It seems to me that this is a natural for Apple - it will give them a 5x - 10x performance boost over anything that's on the drawing boards over at Intel.

Even so, I doubt we'd see Cell-based Macs until at least 2007 -
- depends on application (Score:2)
  
  by Dink Paisy ( 823325 ) writes:
  
  How well you can use 8 dsps really depends on your code. I'd guess in most cases the answer is no, you can't use the vector units to make up for the lost performance of the main core. If you got effective use of VMX, then you might be able to, because easily vectorizable calculations should be possible to port to the dsps more often than most code.
  With the low performance PPC cpu, I doubt Apple will want these things. Apple has too much interest in the general purpose computer market to care much about so
  - - Re:depends on application (Score:2)
      
      by mrchaotica ( 681592 ) writes:
      
      And even if it was too slow for general-purpose tasks, it would make one hell of a good co-processor.
      
      I wonder how the cell architecture compares to ATi and nVidia GPUs?
- Re:iCell? (Score:2)
  
  by Namarrgon ( 105036 ) writes:
  
  It seems to me that this is a natural for Apple - it will give them a 5x - 10x performance boost over anything that's on the drawing boards over at Intel.
  That's a theoretical performance boost. Few apps will be able to take full advantage of 9 simultaneous processors, even after being coded with specific support for it. Still, it'd give a nice speedup to a couple of specific Photoshop tasks, and that's all you need to feed the Jobs RDF. If a Pentium III can speed up the Internet, then why not?
  wouldn't i
  - - Re:iCell? (Score:2)
      
      by Namarrgon ( 105036 ) writes:
      
      The software might run without change on another Cell, but you still have to copy across all the state data - no getting around that, unless you want to manage shared access (across ethernet?). And migrating code to another Cell doesn't just happen automatically; it takes quite a lot of system support to migrate processes to different machines, not something you find often in a Mac or PC, let alone a console or a TV.
      The primary advantage of consoles (to developers) is that they are a uniform environment,
      - Re:iCell? (Score:2)
        
        by Namarrgon ( 105036 ) writes:
        
        Details, indeed... The idea of transparent process migration to entirely different machines is not new; some mainframe systems have been doing it for maybe 30 years. It's not so much a function of the hardware (though it helps a lot if the systems are identical), it's primarily the OS that manages this - and it involves moving not just code but all the code's data, its entire local state, across to the new machine. The process may be unaware of this, but it can still take a hefty amount of time and/or bandw
- - Re:iCell? (Score:2)
    
    by TheSunborn ( 68004 ) writes:
    
    The only flaw in that conclusion is that altivec does NOT support double-precision operations at all.
How do I code this thing?? (Score:4, Interesting)

by MagikSlinger ( 259969 ) writes: on Wednesday February 09, 2005 @02:14AM (#11615827) Homepage Journal

The one thing I don't understand is how I would code for this thing. As best as I understand it, I now have some instructions for controlling the cache (or LAM, whatever) which sounds cool, but are there any details yet of how I'd write code for this? I'm also disappointed that the article didn't explain how one would use their SIMD instructions if they aren't using any of the existing standards. So I load my vectors with the cache control and ask the processors to ever so kindly add them?

Anybody out there with experience on this architecture or even attended the presentation itself can give us mere coders details? Preferably a website.

- Re:How do I code this thing?? (Score:2)
  
  by gorim ( 700913 ) writes:
  
  Since Toshiba is part of the collaboration, it is quite possible that the Cell's vector units are based on, and are improved versions of, the PS2's vector units. Certainly the information I have seen so far hasn't led me to believe it was unlikely.
  
  Check out his earlier articles on the PS2 architecture to learn more about those vector units.
- Re:How do I code this thing?? (Score:5, Informative)
  
  by Space cowboy ( 13680 ) * writes: on Wednesday February 09, 2005 @02:49AM (#11615958) Journal
  
  The architecture of the Cell looks like a much-improved PS2 system, with the PS2's vu0 and vu1 (vector units 0 and 1) replaced by 8 SPEs. Also, the programmable DMA (with chaining ability, allowing it to sequence multiple DMA events one after the other etc.) looks very similar to the PS2's.
  
  If that turns out to be the case, then PS2 programming is a hint towards how it'll work. On the PS2, you generally configured the DMA controller to upload mini programs to the vector units, then DMA-chained data as streams from RAM through the just-uploaded program and onto the destination (usually the GS which rasterised the display).
  
  On the Cell, it looks as though you can DMA-chain code & data through multiple SPEs and ultimately back to RAM/the PPC core/whatever is memory mapped. This is cool - it's software pipelining :-)
  
  So, my guess is that the PPC acts as a (DMA, IO, etc.) controller (much like the mips chip did in the PS2), and the heavy lifting goes on in the vector units, with code and data being streamed in on demand.
  
  It's a different model to normal programming, and as far as I can see it encourages you to be closer to the metal (ie: it's harder, I normally expect my L1 cache to take care of itself...), but assuming they release/port gcc for the SPEs, it might not be too hard if you're used to event-driven highly-threaded programming. Let's just hope they release a Linux port and 'vcl' so we can do something useful with the vector units...
  
  Oh, and if the xbox was a target for a self-hosting linux solution, I think the Cell will be irresistible :-)
  
  Simon
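  
  A minimal C sketch of the streaming model guessed at above: data is cut into chunks, and each chunk is pushed through a chain of small kernels, the way DMA-chained data might pass through programs uploaded to successive vector units. Everything here is hypothetical (the `kernel_fn` type, `run_chain`, the two toy kernels, the `CHUNK` size). It runs on an ordinary CPU and only illustrates the dataflow, not any real Cell or PS2 API.
  
  ```c
  #include <assert.h>
  #include <stddef.h>
  
  #define CHUNK 4  /* hypothetical chunk size, standing in for one DMA packet */
  
  /* A "mini program": transforms n floats in place. */
  typedef void (*kernel_fn)(float *buf, size_t n);
  
  void scale_by_two(float *buf, size_t n)
  {
      for (size_t i = 0; i < n; i++)
          buf[i] *= 2.0f;
  }
  
  void add_one(float *buf, size_t n)
  {
      for (size_t i = 0; i < n; i++)
          buf[i] += 1.0f;
  }
  
  /* Stream `data` through the chain one chunk at a time. On real hardware
     each stage could sit on its own vector unit, so chunk k+1 could be in
     stage 0 while chunk k is in stage 1; here the stages run serially. */
  void run_chain(float *data, size_t n, const kernel_fn *chain, size_t stages)
  {
      assert(data != NULL && chain != NULL);
      for (size_t off = 0; off < n; off += CHUNK) {
          size_t len = (n - off < CHUNK) ? (n - off) : CHUNK;
          for (size_t s = 0; s < stages; s++)
              chain[s](data + off, len);
      }
  }
  ```
  
  With a chain of `{ scale_by_two, add_one }`, every input x comes out as 2x + 1, regardless of how the input is split into chunks.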
  
  - Re:How do I code this thing?? (Score:3, Insightful)
    
    by grammar fascist ( 239789 ) writes:
    
    If that turns out to be the case, then PS2 programming is a hint towards how it'll work. On the PS2, you generally configured the DMA controller to upload mini programs to the vector units, then DMA-chained data as streams from RAM through the just-uploaded program and onto the destination (usually the GS which rasterised the display).
    
    Sounds a lot like pixel/vertex shaders. Is this how we're going to get around all our bandwidth problems now? Slice up our programs into little independent fragments and upl
  - - Re:As a total Cell/PS2-coding n00b... (Score:4, Informative)
      
      by Herbmaster ( 1486 ) writes: on Wednesday February 09, 2005 @11:35AM (#11618225)
      
      [Re: any given for loop being parallelizable]
      
      A fair question, but no. Consider for example an iterative factorial algorithm:
      
      m = 1; for (i=1;i<=n;i++) { m = m * i; }
      
      Totally unparallelizable.
      This is a case where to execute the next step, you absolutely need the results of the previous step to be completed. There can be other kinds of reasons for this:
      
      for (i=0;i<n;i++) { i = f(i); }
      
      In this case you don't even know how many times the loop is going to execute in advance. Now, maybe if you're clever you can figure it out, but what if f() is return (rand() * i);? Ick.
      To make matters worse, C lets you use pointers and do whatever you want. So given some set of instructions, there could be side effects on i (or n) that are totally unpredictable without executing the program.
      What you're looking for - the problem I'm describing - is not a problem with gcc. It's a problem with the C language. If you want to get rid of side-effects and make parallelization easy, try using a pure functional language. But people don't like programming in pure functional languages (well, I don't), they like programming in C (or other procedural-style language).
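      
      To make the contrast concrete, here is a small C sketch (the function names are made up for illustration). `factorial` carries `m` from one iteration to the next, so its iterations cannot simply be handed out to different processors as written; a compiler would have to know that the multiplication is associative to restructure it as a reduction. `scale_range` touches each element independently, so the index range can be cut into pieces and given to separate workers.
      
      ```c
      #include <assert.h>
      #include <stddef.h>
      
      /* Loop-carried dependency: iteration i needs the m produced by i-1. */
      unsigned long factorial(unsigned int n)
      {
          assert(n <= 12);           /* 12! still fits in 32 bits */
          unsigned long m = 1;
          for (unsigned int i = 1; i <= n; i++)
              m = m * i;
          return m;
      }
      
      /* Independent iterations: dst[i] depends only on src[i], so any
         sub-range [lo, hi) can be computed on its own. */
      void scale_range(int *dst, const int *src, size_t lo, size_t hi, int k)
      {
          for (size_t i = lo; i < hi; i++)
              dst[i] = src[i] * k;
      }
      ```
      
      Calling `scale_range(dst, src, 0, n/2, k)` and `scale_range(dst, src, n/2, n, k)` in either order, or on two processors, fills `dst` with the same values as a single pass over the whole range.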
      
- Re:How do I code this thing?? (Score:5, Informative)
  
  by adam31 ( 817930 ) writes: <adam31NO@SPAMgmail.com> on Wednesday February 09, 2005 @03:13AM (#11616059)
  
  This is similar to the 'scratchpad' RAM that Sony used in the PS2 and PS1. It's 16 KB of on-chip (super-fast) memory that can be loaded and manipulated by the programmer, completely separate from the jurisdiction of the cache (which can cause big headaches - think cache writeback with stale data).
  We'd do our skeletal animation skinning with this. DMA a bunch of verts to scratchpad, transform and weight them on the VU, DMA back to a display list. The thing is, there's really no high-level language support for this... the onus is on the programmer to schedule and memory map everything, mostly in assembly.
  The design of the Cell - it's incredible. It's every game programmer's wet dream. I just don't see how it's going to be as useful in other areas though. It's going to be a compiler-writer's nightmare, and to get real performance from the SPEs is going to take a lot of assembly or a high-level language construct that I haven't seen yet.
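  
  The scratchpad pattern described above (copy a chunk in, work on it, copy it back out) can be sketched in plain C. This is only a model under stated assumptions: memcpy stands in for the DMA transfers, an ordinary static array stands in for the on-chip memory, and SCRATCH_FLOATS and transform_in_chunks are hypothetical names, not part of any Sony or IBM API.
  
  ```c
  #include <stddef.h>
  #include <string.h>
  
  #define SCRATCH_FLOATS 1024  /* hypothetical scratchpad capacity, in floats */
  
  /* Stand-in for the fast on-chip memory. On real hardware this would be
   * a fixed address range, not an ordinary array. */
  static float scratch[SCRATCH_FLOATS];
  
  /* Process n floats from src into dst, one scratchpad-sized chunk at a
   * time. The programmer, not a cache, decides what is resident when. */
  void transform_in_chunks(const float *src, float *dst, size_t n, float scale)
  {
      size_t done = 0;
      while (done < n) {
          size_t chunk = n - done;
          if (chunk > SCRATCH_FLOATS) {
              chunk = SCRATCH_FLOATS;
          }
  
          memcpy(scratch, src + done, chunk * sizeof(float));  /* "DMA in"  */
          for (size_t i = 0; i < chunk; i++) {                 /* compute   */
              scratch[i] *= scale;
          }
          memcpy(dst + done, scratch, chunk * sizeof(float));  /* "DMA out" */
  
          done += chunk;
      }
  }
  ```
  
  A real implementation would overlap the transfers with the compute step (double buffering), which is where most of the hand scheduling effort goes.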
  
- Re:How do I code this thing?? (Score:5, Interesting)
  
  by fuzzbrain ( 239898 ) writes: on Wednesday February 09, 2005 @05:21AM (#11616442)
  
  I don't have much experience or knowledge but there was an interesting article [www.gotw.ca] the other week about how the next revolution in programming languages will be a turn towards concurrency:
  
  "Starting today, the performance lunch isn't free any more. Sure, there will continue to be generally applicable performance gains that everyone can pick up, thanks mainly to cache size improvements. But if you want your application to benefit from the continued exponential throughput advances in new processors, it will need to be a well-written concurrent (usually multithreaded) application. And that's easier said than done, because not all problems are inherently parallelizable and because concurrent programming is hard."
  
  Obviously, it's not clear whether this is directly relevant to cell processors, but I think it's at least of passing interest. It's also worth considering whether concurrency-oriented languages like Erlang and Oz could become more important with these sorts of processors (not for games but possibly for scientific work).
  See also the discussion [lambda-the-ultimate.org] of this article on Lambda [lambda-the-ultimate.org].
  
- Re:How do I code this thing?? (Score:2)
  
  by gl4ss ( 559668 ) writes:
  
  with a lot of blood.
  
  anyhow.. you probably don't have to sweat over it.. it's not that likely to be that open for just anybody (it's going to be a nightmare though.. as they've pretty much hinted that it's up to the programmer to keep the 8 APUs occupied).
- Re:How do I code this thing?? (Score:3, Insightful)
  
  by TheRaven64 ( 641858 ) writes:
  
  First, you will use a language that supports a vector type. The languages used for GPU programming do, and there is a vector extension to C supported by GCC. You will write code that manipulates vectors instead of scalars. And that's about it. You try to keep your working set small, and your compiler will try to fit it in the local memory.
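  
  For reference, the GCC vector extension mentioned above looks roughly like this. The vector_size attribute is a GCC extension (Clang accepts it too), so this is not portable ISO C, and the names v4sf and madd4 are just illustrative.
  
  ```c
  /* GCC/Clang vector extension: four floats packed in one 16-byte value. */
  typedef float v4sf __attribute__((vector_size(16)));
  
  /* One multiply and one add, each applied to all four lanes at once.
   * The compiler maps this onto whatever SIMD unit the target has. */
  v4sf madd4(v4sf a, v4sf b, v4sf c)
  {
      return a * b + c;
  }
  ```
  
  On a target without a matching SIMD unit the compiler falls back to scalar code, so the source stays the same either way.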
The real value of the x86 (Score:5, Insightful)

by argoff ( 142580 ) writes: on Wednesday February 09, 2005 @02:16AM (#11615833)

Is that the 386 instruction set and architecture is so non-proprietary. What made it so popular certainly wasn't that it was better. If I had the dough, I could literally make one and my own fab without asking a single soul. A lot of times it seems companies try to gather into consortiums to mimic the same effect and gather market momentum, but these are doomed to failure because the more valuable the technology becomes, the greater the pressure to differentiate and fence off some "territory" for themselves. We saw this happen first hand with UNIX, where all the flavors would constantly try to group under these unified standards - and they made little progress until Linux came along. The CPU world needs something similar to protect people from patent harassment, for design, cores, and fabrication.

- Perhaps I don't quite understand (Score:2, Insightful)
  
  by Anonymous Coward writes:
  
  Who would conceivably have enough money to build microchip fabrication facilities but not enough money to license the powerpc architecture? [google.com]
  
  "Reverse engineered implementations exist" is not really much of a meaningful strength if you don't own one such reverse engineered implementation already. You say you can potentially build a 386 chip fab, but the thing is you aren't going to build a 386 chip fab, you're going to just keep on buying Intel and AMD chips, the only noteworthy people currently making x86 c
- Re:The real value of the x86 (Score:4, Insightful)
  
  by jd ( 1658 ) writes: <imipakNO@SPAMyahoo.com> on Wednesday February 09, 2005 @03:09AM (#11616039) Homepage Journal
  
  True, but at the time it came out, Intel did everything short of pay the US Govt. to take the clone manufacturers out with tac nukes.
  
  As I recall, at the time, there were lawsuits aplenty by Intel, claiming microcode copyright violations for the most part. The majority of clone makers, though, were making money off the maths co-processor, as Intel's 387 sucked. It was the slowest out there, expensive, with only eight entries on a linear stack.
  
  By moving the coprocessor into the main CPU, Intel tried to destroy clone makers. Anyone who made just 386 clones or 387 clones would be out of business, and those who made both would be years behind combining them on the same die.
  
  Well, history shows that far fewer clone makers existed in the 486 era. Wonder why. But even that wasn't apparently good enough, with Intel trying to claim the chip ID was trademarked. The courts threw that one out, which is why Intel switched to using names. You can't trademark a number.
  
  The Pentium also took some time to clone. No, not because of all the random bugs in the design, but because that's when Intel switched to a hybrid RISC/CISC design. Although it seems to have largely been a cosmetic change, to cash in on the massive publicity surrounding RISC designs at the time, it did put up a major challenge to clone makers, who - for the first time - couldn't just throw the chip together half-assedly and hope to be an order of magnitude faster than Intel.
  
  Intel DID do a few things, around this time, that were puzzling. Their 486DX-50 was never clock-doubled or clock-quadrupled, the way the DX-33 was. The DX-50 placed far higher demands on the surrounding components, true, but it also gave you higher real-term performance than the DX2-66, because the DX2 wasn't able to drive anything any faster than the DX-33. All it could do was run those instructions it had a little faster.
  
  Intel are still playing these numbers games, which is why their multi-gigahertz processors aren't noticeably any faster. The bottleneck isn't in the computing elements, so faster computing elements won't make for a faster chip.
  
  IBM's "cell" design seems to be working much more on the bottlenecks, which means that GHz-for-GHz, they should run faster than Intel's chips for the same tasks.
  
  I think IBM could go further with their design - I think they're being far more conservative than they need be. When you're working in a multi-core environment, you don't always want all parts of the CPU to be in lock-step. It's not efficient to force things to wait, not because of anything they are doing but because some totally unrelated component works at a certain speed and no faster.
  
  It would make sense, then, for the chip to be asynchronous, at least in places, so that nothing is needlessly held up.
  
  However, I can easily imagine that a hybrid synchronous/asynchronous chip that is already a hybrid multi-core DSP/CPU would be a much harder sell to industry, so I can see why they'd avoid that strategy. On the other hand, if they could have pulled that off, this could have been a far more amazing press release than it already is.
  
  - Re:The real value of the x86 (Score:2)
    
    by jmv ( 93421 ) writes:
    
    Well, history shows that far fewer clone makers existed in the 486 era.
    
    I don't quite remember how it was before, but for the 486, AMD had a pretty good clone that was (as far as I remember) both faster and cheaper.
- Sparc is open too (Score:2)
  
  by anpe ( 217106 ) writes:
  
  The SPARC V8 spec is open, there's also an open source implementation: the Leon [gaisler.com] and it's supported by Linux.
Future compatibility (Score:2)

by ndogg ( 158021 ) writes:

Pattnaik said that if IBM were to publish the detailed monitoring information for end users to access, then the company would feel obliged to maintain backwards compatibility in future iterations, and so they'd be limited in the changes they could make to the scheme.

If I were IBM, I'd publish such specs anyway, alongside letting the press know very loudly and clearly that developers should stick to the recommended API if they want any guarantee of future compatibility. OTOH, I do understand their reasoni
- Re:Future compatibility (Score:2)
  
  by ndogg ( 158021 ) writes:
  
  Forget it. I read the wrong article.
Export controls? (Score:2)

by utlemming ( 654269 ) writes:

This chip seems insanely powerful. With 8 APUs capable of doing DSP, you would think that some countries would impose export restrictions on the thing. If you remember, when the G4 came out Apple advertised that the military didn't want that thing leaving the country. But imagine a chip with the ability to do some serious SIMD operations. The CIA, NSA and others doing signal processing have to love this chip.
I understand (Score:2, Interesting)

by JeffTL ( 667728 ) writes:

that it runs at 30 watts, about like a Pentium M. And it's 64-bit. Can we say....

Dare I say....

Oh the Hell....

PowerBook G5!
- Not this year (Score:2)
  
  by Namarrgon ( 105036 ) writes:
  
  Actually, the quote was, "...it will run at 30 watts." Once it's been shrunk to 65nm, in 2006. Maybe.
  Right now, it has 4x as many transistors as a G5, runs at twice the clock speed, and likely puts out a hell of a lot more heat than a G5 does.
Power5 "lite"? (Score:2)

by ndogg ( 158021 ) writes:

As I fully expected, Pattnaik could not discuss a possible workstation-class derivative (read: Apple-oriented derivative) of the POWER5. He also made it clear that he is and has been focused on POWER5 servers only, and any hypothetical workstation-class derivative of the design would be for someone else to discuss.

I'm wondering about the feasibility of such a processor. This design seems to be rather heavily dependent upon the specific design of the OS (namely AIX in this case), and it seems to me that a
- Oops (Score:2)
  
  by ndogg ( 158021 ) writes:
  
  I commented on the wrong article.
Not useful for scientific computing (Score:5, Interesting)

by renoX ( 11677 ) writes: on Wednesday February 09, 2005 @02:44AM (#11615938)

What I find interesting is that the vector processors are restricted to single precision floating point calculations.
This isn't terribly useful for scientific computations (there is the same problem with the GPU): currently the IEEE is working on a standard for 128bit precision floating point calculations!

Of course for 3D, video and sound, 32bit precision is good enough and *if* programmers (a big if) manage to overcome the pain of 'parallel programming' then it could be a big success.

- Re:Not useful for scientific computing (Score:3, Informative)
  
  by marcoz76 ( 201101 ) writes:
  
  SPEs (CELL SIMD processors..) have double precision units! IBM will discuss DP units for CELL today or tomorrow at ISSCC.
- - Re:Not useful for scientific computing (Score:3, Insightful)
    
    by taniwha ( 70410 ) writes:
    
    The problem is that a multiplier's size is roughly proportional to the square of the width of the things being multiplied. Assuming the 64-bit FP mantissa is twice the size of the 32-bit one, it's going to take 4 times the area (or twice the area of a pair of them), and of course it will eat into your cycle time (both in gates and in wire delay).
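    
    The arithmetic behind that estimate, as a tiny C helper. It assumes the same rough model as the comment (area grows with the square of the operand width) and nothing about how the Cell's multipliers are actually built. The function name is made up.
    
    ```c
    /* Relative area of a wide multiplier versus a narrow one, under the
     * rough "area is proportional to width squared" model. */
    double relative_multiplier_area(unsigned bits_wide, unsigned bits_narrow)
    {
        double ratio = (double)bits_wide / (double)bits_narrow;
        return ratio * ratio;
    }
    ```
    
    With an exact doubling, relative_multiplier_area(48, 24) gives 4. Using the IEEE 754 significand widths (24 bits for single precision, 53 for double, hidden bit included), relative_multiplier_area(53, 24) comes out a little under 4.9, so the 4x figure is, if anything, slightly optimistic.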
Cellection? (Score:2)

by Doc Ruby ( 173196 ) writes:

But does it run gcc? Or even have a cross-compiler target module? Will gcc become smart enough to emulate some of the SIMD techniques in my regular C++ code, even when I write the same old patterns?
- Re:Cellection? (Score:3, Interesting)
  
  by Screaming Lunatic ( 526975 ) writes:
  
  Autovectorization is planned for GCC 4.0.
  gcc autovectorization page. [gnu.org]
similar technology... (Score:4, Informative)

by morcheeba ( 260908 ) writes: on Wednesday February 09, 2005 @02:55AM (#11615989) Journal

Cradle Semiconductor has been working for a while on a similar technology [cradle.com].

Of course, it's all a matter of scale - TI had a 4 DSP, 1 CPU [ti.com] processor a while ago, but it only made 100 MFLOPS. Cradle's first product has 8 DSPs and 6 CPUs - depending on if you can get your data to properly pipeline through the processors, you can achieve up to 3.6 GFLOPs peak with only a 230 MHz clock.

Golden opportunity for L4/Hurd (Score:3, Interesting)

by The_Dougster ( 308194 ) writes: on Wednesday February 09, 2005 @03:19AM (#11616078) Homepage

This arch is still a baby and this would be a great time for L4/Hurd to latch onto this processor. There is already an L4 PowerPC/64 port in some kind of development stage, and the very first platform is likely to be a PS/3 with somewhat fixed hardware specs. Marcus et al. were discussing something today and they mentioned that there is nobody working on the driver interface for L4/Hurd yet.
Hurd might be an interesting candidate for running on Cell because of the highly threaded design. Hurd servers might be able to swap in and out of cells as they require cycles. It seems a good match; i.e. L4 runs in the main core, and various translators and other processes run on the cells. If a cell could be programmed to run the filesystem, for instance, it would totally free up the core for other business.
Because the PS/3 will have a highly fixed hardware set, implementing a minimal driver set might be feasible given enough reverse-engineering effort.
I'm not saying that L4/Hurd will kick the nuts off of Linux on an Opteron, I'm just noting that it might be pretty cool to experiment with Hurd on Cell technology. The L4/Hurd team is real close to getting the last pieces in place to compile Mach based Hurd under L4, and if you ever tried Debian GNU/Hurd, you know it's pretty near feature-complete and a pretty neat system to run. The next task for L4/Hurd is a driver infrastructure, and it might be wise to look at what Cell is bringing to the table before it gets too far along. Know what I mean.

- Re:Golden opportunity for L4/Hurd (Score:2)
  
  by idiotnot ( 302133 ) writes:
  
  Marcus et al. were discussing something today and they mentioned that there is nobody working on the driver interface for L4/Hurd yet.
  
  Like everything else with the Hurd, it'll come in time. I'd do something with it, but I don't have a clue as to how I'd write a device driver, much less an interface for one.
  
  It seems a good match; i.e. L4 runs in the main core, and various translators and other processes run on the cells. If a cell could be programmed to run the filesystem, for instance, it would totally f
  - Re:Golden opportunity for L4/Hurd (Score:4, Informative)
    
    by The_Dougster ( 308194 ) writes: on Wednesday February 09, 2005 @07:05AM (#11616745) Homepage
    
    Like everything else with the Hurd, it'll come in time. I'd do something with it, but I don't have a clue as to how I'd write a device driver, much less an interface for one.
    Likewise. I'm in kind of a strange position as I am keenly interested in stuff like this, yet this really isn't my personal genre.
    The L4/Hurd guys are talking about "Deva" which is their vaporous specification for a driver interface. Since Hurd's drivers are all userland, this specification which nobody is working on is probably one of the most important things in the development of computer science right now. Hell, I should go back to university and take some classes so I could work on it. Talk about making history.
    Slashdotters constantly bitch and moan about how slow Hurd's progress has been, but all they have to do is send in a patch or write a doc or something. I personally ported GNU Pth to Hurd some years back, making me (in my mind) one of the first people to ever compile and run a pthread app on Hurd (slooooowww). Hehe, but I did make pseudo-history in the world of computer science because of that stupid couple of days I spent fiddling around with autoconf.
    L4/Hurd development is total anarchy. Work on whatever you feel like and send in patches. You don't have to "join GNU" or any such nonsense. In fact I have never ever seen RMS post to any Hurd developer list ever. He's more likely to post here.
    Slashdotters seem to think that Hurd is RMS's little empire, but in fact he has about nothing to do with it. Marcus Brinkmann right now is probably the unofficial leader of Hurd just because he has personally written most of the really hardcore stuff.
    
No CELL for Macintosh... (Score:2, Redundant)

by dtjohnson ( 102237 ) writes:

In part II, he writes:

"Finally, before signing off, I should clarify my earlier remarks to the effect that I don't think that Apple will use this CPU. I originally based this assessment on the fact that I knew that the SPUs would not use VMX/Altivec. However, the PPC core does have a VMX unit. Nonetheless, I expect this VMX to be very simple, and roughly comparable to the Altivec unit o the first G4. Everything on this processor is stripped down to the bare minimum, so don't expect a ton of VMX performance
Digital Rights Management (Score:5, Interesting)

by wakejagr ( 781977 ) writes: on Wednesday February 09, 2005 @03:31AM (#11616116) Journal

Another article on the Cell design at http://www.theregister.co.uk/2005/02/03/cell_analysis_part_two/ [theregister.co.uk] seems to indicate that there is some sort of DRM built in.

The Cell is designed to make sure media, or third party programs, stay exactly where the owner of the media or program thinks they should stay. While most microprocessor designers agonize about how to make memory accesses as fast as possible, the Cell designers have erected several (four, we count) barriers to ensure memory accesses are as slow and cumbersome as possible - if need be.

Hannibal doesn't say anything about this (that I noticed) - anyone have more info?

- Re:Digital Rights Management (Score:5, Interesting)
  
  by xenocide2 ( 231786 ) writes: on Wednesday February 09, 2005 @04:40AM (#11616310) Homepage
  
  Sounds like an enormous misinterpretation of the concept of caching. As a multimedia programmer on the Cell, it's likely you'll have sole jurisdiction over where stuff goes on your processor. Think of it like programmable cache management. Usually that's pretty stupid, because you want to write things back for longevity, but media is more transient - streams and whatnot. Barriers within that context would be cache levels.
  
  But perhaps they've got some technical details (enough that they can count distinct features) that I can't find with a basic google search on the subject. It would certainly be out of Sony's previous style, though I understand they recently pulled their heads out of their collective asses and discovered that they were selling a loose metaphor of cars and crowbars at the same time, and came out with a public apology for sucking.
  
- Re:Digital Rights Management (Score:2)
  
  by KontinMonet ( 737319 ) writes:
  
  I believe it is DRMd. Makes sense from Sony's point of view. Blachford makes a brief reference to it.
Eliminating Instruction Window (Score:3, Interesting)

by ndogg ( 158021 ) writes: <the@rhorn.gmail@com> on Wednesday February 09, 2005 @03:33AM (#11616122) Homepage Journal

This RAM functions in the role of the L1 cache, but the fact that it is under the explicit control of the programmer means that it can be simpler than an L1 cache. The burden of managing the cache has been moved into software, with the result that the cache design has been greatly simplified. There is no tag RAM to search on each access, no prefetch, and none of the other overhead that accompanies a normal L1 cache. The SPEs also move the burden of branch prediction and code scheduling into software, much like a VLIW design.

Why? The reason for the instruction window was to simplify software development.

Of course, I like to play devil's advocate with myself, so I'll answer that question.

The purpose of the Cell processor is to enhance home appliances, which have a greater reliance upon low latency than they do on precision, accuracy, and performance bandwidth. Thus, one can very safely say that the Cell processor will likely have little purpose in scientific calculations.

- Re:Eliminating Instruction Window (Score:4, Informative)
  
  by taniwha ( 70410 ) writes: on Wednesday February 09, 2005 @05:36AM (#11616493) Homepage Journal
  
  Read it more carefully - they don't eliminate the instruction window, they set it to 2. They can decode exactly 2 instructions/clock (provided they meet some simple dependency rules between the instructions), which makes for easy decode trees and fast cycle times.
  This isn't even a general purpose processor (no MMUs on the cells either, in the traditional sense), nor have they gone superscalar - they have enough registers to keep the thing busy, and software can figure that out. This isn't even that new an idea; a cell looks a lot like one of the media processors that was being sold 5-6 years ago.
  You're right, it's not designed to be a scientific processor - but then high precision scientific processing is a tiny market these days. Way more people want to pay for fast gaming platforms than want to do fluid dynamics or what have you.
  
A proposal for Apple (Score:4, Interesting)

by Anonymous Coward writes: on Wednesday February 09, 2005 @04:08AM (#11616221)

A proposal for Apple

I don't have an account, but this is an honest idea.

Why doesn't Apple include a Playstation 2 support card into their Macintosh line?

Problem: The OSX platform has almost no games. I own several macs, I love my macs, and I sincerely enjoy OSX. But it has no games, and that will never get better, especially as simpler games migrate to the web and the complex ones bail for the console market. The PC gaming market has essentially peaked.

Solution: Embed (or include as a BTO option) a PS2 chipset to a Macintosh. Run the generated display straight through to the graphical overlay plane. Done.

Everything works. The controllers are trivially converted to use USB. The DVD drive is already there. The display is already there. The USB and Firewire is already there. The harddrive is already there. The "memory cards" are already there.

Reason: The Macintosh game library explodes instantly to encompass something like 3,000 PS1 and PS2 games. With no need for emulation, the games are guaranteed to work out of the box and provide the Apple ease of use everyone loves. Sony increases their marketshare, Apple gets a viable expanding game library, and users get a vastly better gaming experience on OSX for maybe $40 of parts and engineering.

Why won't this work?

- - - Re:A proposal for Apple (Score:2)
      
      by bhima ( 46039 ) writes:
      
      It's a great idea. I think the best implementation would be in the form of a PCI or PCI-X card.
      I think it would sell well.
      I don't think Sony would go for it because they would rather sell the PS2. And because Sony makes the chipset without them nothing can happen.
      Maybe after PS3 comes out... but who would want it then?
Doomed until parallel programming is common (Score:4, Insightful)

by rufusdufus ( 450462 ) writes: on Wednesday February 09, 2005 @04:19AM (#11616253)

The difference is that instead of the compiler taking up the slack (as in RISC), a combination of the compiler, the programmer, some very smart scheduling software

Requiring programmers to learn how to write parallel code that makes good use of this processor seems pretty dicey to me. Few programmers have been trained to write parallel code (most struggle with threading). The fact that no popular programming language has a good parallel model is also a big stumbling block.

This problem seems to be looming for all the dual core processors, but I haven't seen a big effort to teach programmers how to adapt.

Top 7 Myths of the New Cell Processor: (Score:5, Informative)

by Modab ( 153378 ) writes: on Wednesday February 09, 2005 @05:56AM (#11616545)
There are so many people saying dumb things about the Cell and the upcoming PS3, I have to set some things straight. Here goes:
1. The Cell is just a PowerPC with some extra vector processing.
  Not quite. The Cell is 9 complete yet simple CPUs in one. Each handles its own tasks with its own memory. Imagine 9 computers each with a really fast network connection to the other 8. You could probably treat them as extra vector processors, but you'd then miss out on a lot of potential applications. For instance, the small processors can talk to each other rather than work with the PowerPC at all.
2. Sony will have to sell the PS3 at an incredible loss to make it competitive.
  Hardly. Sony is following the same game plan as they did with their Emotion Engine in the PS2. Everyone thought that they were losing 1-200 bucks per machine at launch, but financial records have shown that besides the initial R&D (the cost of which is hard to figure out), they were only selling the PS2 at a small loss initially, and were breaking even by the end of the first year. By fabbing their own units, they took a huge risk, but they reaped huge benefits. Their risk and reward is roughly the same now as it was then.
3. Apple is going to use this processor in their new machine.
  Doubtful. The problem is that though the main CPU is PowerPC-based like current Apple chips, it is stripped down, and the Altivec support will be much lower than in current G5s. Unoptimized, Apple code would run like a G4 on this hardware. They would have to commit to a lot of R&D for their OS to use the additional 8 processors on the chip, and redesign all their tweaked Altivec code. It would not be a simple port. A couple of years to complete, at least.
4. The parallel nature will make it impossible to program.
  This is half-true. While it will be hard, most game logic will be performed on the traditional PowerPC part of the Cell, and thus normal to program. The difficult part will be concentrated in specific algorithms, like a physics engine, or certain AI. The modular nature of this code will mean that you could buy a physics engine already designed to fit into the 128k limitation of the subprocessor, and add the hooks into your code. Easy as pie.
5. The Cell will do the graphics processing, leaving only rasterization to the video card.
  Most likely false. The high-end video cards coming out now can process the rendering chain as fast as the Cell can, looking at the raw specs of 256 GFLOPS from the Cell, as opposed to about 200 GFLOPS from video cards. In two years, video cards will be capable of much more, and they are already optimized for this, where the Cell is not, so video cards will perform closer to the theoretical limits.
6. The OS will handle the 8 additional vector processors so the programmer doesn't need to.
  Bwahahaha! No way. This is a delicate bit of coding that is going to need to be tweaked by highly-paid coders for every single game. Letting an OS predictively determine what code needs to get sent to what processor to run is insane in this case. The cost of switching out instructions is going to be very high, so any switch will need to be carefully considered by the designer, or the frame-rate will hit rock-bottom.
7. The Cell chip is too large to fab efficiently.
  This is one myth that could be correct. The Cell is huge (relatively), and given IBM's problems in the recent past with making large, fast PowerPC chips, it's a huge gamble on the part of all parties involved that they can fab enough of these things.
- Re:Top 7 Myths of the New Cell Processor: (Score:3, Informative)
  
  by fitten ( 521191 ) writes:
  
  Your points #4 and #6 almost conflict...
  
  "Easy as pie."
  
  and
  
  "This is a delicate bit of coding that is going to need to be tweaked by highly-paid coders for every single game."
  
  I know that you are talking, sort of, about two different things, but they are related. While it may be "easy as pie" to add the hooks into your code to call what is essentially a library, making sure that library is scheduled, running, running in the right place and on the right data, and synchronized with everything else in the rig
  - Re:Top 7 Myths of the New Cell Processor: (Score:3, Insightful)
    
    by Modab ( 153378 ) writes:
    
    You bring up a good point. I gloss over it because the Emotion Engine would have had a bit of the same problems, yet developers eventually figured out how to use it... it all depends on the tools Sony ships to work with the platform, and also on how you view this parallel code executing.
    
    Comparing it with trying to work with threads definitely brings up nightmare conditions. But I don't think it has to be a nightmare. We use mammoth parallelization all the time and with great success. We hand off all the re
Division of labor (Score:4, Insightful)

by chiph ( 523845 ) writes: on Wednesday February 09, 2005 @10:11AM (#11617402)

Reading the article, it reminds me of the typical mainframe architecture, where you have a central supervisory CPU, but most of the specialized work is done by the channel processors.

In the Cell, the main PPC CPU appears to identify a piece of work that needs to be done, schedules it to run on an SPE, uploads the code snippet to the SPE's LS via DMA transfer, and then goes off and does something else worthwhile while the SPE munches on it. I presume there's an interrupt mechanism to let the PPC know that an SPE has some results to return.

Compiler writers ought to be able to handle this new architecture well enough -- it's sort of like the current CPU/GPU split, where you've got the main program running on the system CPU, and specialized graphical transform programlets running on the GPU. There may need to be macros or code section identifiers in the source to let the compiler know which to target for that bit of code.

Obviously, this is just the first iteration of the Cell processor. I can see them widening the SPE from single precision to double precision (for the scientific market -- the game market probably doesn't need it), and going to a multi-core design to reduce the die size.

Chip H.

- Re:PLEASE HELP: Academic Research Survey (Score:2)
  
  by uberdave ( 526529 ) writes:
  
  If you really want help, try posting an actual link instead of merely quoting a URL. We're all too busy/lazy here to copy and paste that into our browsers.
- - In addition to the pornography... (Score:2, Informative)
    
    by Anonymous Coward writes:
    
    ...clicking on this link also attempts to install a trojan (SARC's name: ByteVerify). I agree: this link should be removed and the poster's IP should be reported to the relevant authorities.
    - - Re:In addition to the pornography... (Score:2, Informative)
        
        by mfreed ( 217310 ) writes:
        
        nyud.net refers to a semi-open, peer-to-peer content distribution network called CoralCDN that is essentially a distributed web cache. We serve > 10 M requests daily for 100,000s of clients. For more information about this research project, please see:
        
        http://www.coralcdn.org/
        
        Basically, when you see a URL like you reported, it means that the content is actually from (stripping out the .nyud.net:8090):
        
        http://minigirls.biz/
        
        Thus, if you think you've seen evidence of child abuse, you should get
- Re:Okay... (Score:2)
  
  by HAKdragon ( 193605 ) writes:
  
  I'm guessing it's because the Cell chip is going to be used in the Playstation 3.
- Upcoming Sony PS3... (Score:2)
  
  by quarkscat ( 697644 ) writes:
  
  will be using this "SoC".
  
  However, after having RTFA(s), the Cell processor
  would look like a very good candidate for a F/OSS
  VIDEO BOARD - fast multicore processors, a large
  local memory, simplified RISC with most control
  in software, and a 64-bit PPC "traffic cop".
  
  One additional area (at least) that I would
  expect the Cell processor to be incorporated
  into would be next generation radar and sonar
  systems, due to vector processing capabilities.
  
  I would love to see an IBM development system
  for this architecture
- Re:Interesting (Score:3, Interesting)
  
  by divisionbyzero ( 300681 ) writes:
  
  Well, it certainly might seem that he is being a hypocrite. See:
  
  "In another part of the article, Blachford claims that the cell processing units have no "cache." Instead, they each have a "local memory" that fetches data from main memory in 1024-bit blocks. Well, that's sort of like saying that an iMac doesn't have a "monitor," but it does have a surface on which visual output is displayed. In other words, the Cell "local memories," which are roughly analogous to the vector units' "scratchpad RAM" on the P

