News for nerds, stuff that matters

The Truth About Last Year's Xbox 360 Recall

Posted by kdawson on Tue Jun 10, 2008 06:03 PM
from the value-for-the-money dept.
chrplace forwards an article in which Gartner's Brian Lewis offers his perspective on what led to last year's Xbox 360 recall. Lewis says it happened because Microsoft wanted to avoid an ASIC vendor. "Microsoft designed the graphic chip on its own, cut a traditional ASIC vendor out of the process, and went straight to Taiwan Semiconductor Manufacturing Co. Ltd., he explained. But in the end, by going cheap — hoping to save tens of millions of dollars in ASIC design costs, Microsoft ended up paying more than $1 billion for its Xbox 360 recall. To fix the problem, Microsoft went back to an unnamed ASIC vendor based in the United States and redesigned the chip, Lewis added. (Based on a previous report, the ASIC vendor is most likely the former ATI Technologies, now part of AMD.)"

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • by ZiakII (829432) <halfwarr@gmail.cNETBSDom minus bsd> on Tuesday June 10 2008, @06:06PM (#23737903)
    Microsoft designed their own graphic chip and it crashed? I'm shocked... I tell you shocked!
  • by Anonymous Coward on Tuesday June 10 2008, @06:07PM (#23737923)
    it seems that every time some company tries to cut corners, it only ends up biting them in the a. my company does the same thing, and the kludgy results are nothing short of spectacular.
    • by boner (27505) on Tuesday June 10 2008, @06:13PM (#23738003)
      well, it's the difference between an MBA making a business call based on cost/profit analysis and an experienced chip designer looking at the actual risks involved....

      MBAs are good in cutting corners in traditional businesses, but generally have no understanding of technology risks....
      • by CyberLife (63954) on Tuesday June 10 2008, @07:11PM (#23738853)

        MBAs are good in cutting corners in traditional businesses, but generally have no understanding of technology risks....
        This sort of arrogance is so common it's not even funny. I once presented a GIS plot to such a person. You know, the kind of thing that crunches so much data it takes a cluster of machines upwards of several minutes just to produce a single frame? Well this guy argued if I needed so much computer power to make a simple picture I must be doing something wrong.
        • by MadKeithV (102058) on Wednesday June 11 2008, @02:14AM (#23743689)
          The most common form of it that I see is one of the business dudes telling me (the Software Development Consultant) that a particular piece of technology "will take about a week to develop". I've started replying with "so you will deliver to me next Thursday then?". But seriously, I think management and planning by wishful thinking are becoming a full-on religion around these parts.
      • Re: (Score:3, Insightful)

        well, it's the difference between an MBA making a business call based on cost/profit analysis
        All profit-seeking companies do this. This is not an inherently bad thing - you wouldn't have a job otherwise.

        MBAs are good in cutting corners in traditional businesses, but generally have no understanding of technology risks....
        So if you have business savvy you can't possibly understand technology risks? Oh please.
        • by Anonymous Coward on Tuesday June 10 2008, @07:58PM (#23739683)

          So if you have business savvy you can't possibly understand technology risks? Oh please.
          Strawman. The problem is that MBA degrees are churned out as "one size fits all" managers, suitable (pun intended) for any industry by virtue of having no specific training for any of them.

          You can have business savvy and technological expertise, but it's a roundabout path through today's educational system if you're not teaching yourself at least one. And I think we all know the proportion of people who are capable of serious self-education.

          • by Hal_Porter (817932) on Tuesday June 10 2008, @10:41PM (#23741895)
            I dunno, the problem I have with MBA types as managers is that it's easier to learn the business stuff yourself than the technology.

            And for balance, the problem I have with engineers as managers is that it's possible to learn the people skills stuff, but you have to understand why it's important and want to learn it. It's all too easy to stay in the comfort zone where you basically sit in a dark corner somewhere and write code, if that's what you enjoy, rather than forcing yourself to talk to people.
        • by Anonymous Coward on Tuesday June 10 2008, @08:23PM (#23740075)

          All profit-seeking companies do this. This is not an inherently bad thing - you wouldn't have a job otherwise.

          I don't think you're getting it. Cutting costs is one thing. Cutting corners is another. Cutting costs is fine, but cutting corners implies the product is worse off because of it. Few engineers would say "It'd be cheaper to roll our own graphics chip," because they realize the immense technical challenges involved. Few MBAs are likely to understand that, however.

          So if you have business savvy you can't possibly understand technology risks? Oh please.

          There's a big difference between what you just said and what the OP said. Nobody said MBAs can't be tech savvy. However, the fact of the matter is, most of them aren't.

          Also, just to be pedantic, having an MBA has little to do with having business savvy.

          • by Hal_Porter (817932) on Tuesday June 10 2008, @11:05PM (#23742151)

            I don't think you're getting it. Cutting costs is one thing. Cutting corners is another. Cutting costs is fine, but cutting corners implies the product is worse off because of it. Few engineers would say "It'd be cheaper to roll our own graphics chip," because they realize the immense technical challenges involved.
            They didn't "roll their own graphics chip" from what I can tell. They licensed the IP (the VHDL code or a synthesized core) from someone else. The plan from the start with the XBox360 was that they would do this and try to integrate it all eventually onto one chip. That's the reason they moved from x86 to PPC, because neither Intel nor AMD would license their IP and let Microsoft make their own chips. Actually this is the difference between RISC and x86 these days - x86 vendors don't license their IP but RISC vendors do. Since consoles are sold at a loss initially and subsidized by games, it's really important to reduce the build costs by doing this. Back in the XBox days most people thought that Microsoft lost out because they couldn't integrate the design into one chip in the way that Sony did with their console. And that was because they didn't own the IP for the processor.

            The mistake seemed to be to let Microsoft's in house group do this rather than outsourcing.

            But you've got to remember this is an article in EEtimes from an analyst with an agenda
            http://www.eetimes.com/news/latest/showArticle.jhtml;jsessionid=51TYZYXYRWUZUQSNDLSCKHA?articleID=208403010 [eetimes.com]
            "System OEMs have no business designing ASICs any longer," said Lewis. The reality is that system companies are finding it hard to do enough ASIC designs to keep in-house design teams employed.

            Basically he's trying to create business for ASIC design houses by telling people that putting a bunch of licensed IP onto a chip is rocket science and they shouldn't try to do it in house.

            Is it really? I honestly don't know. I suspect it depends a lot on the quality of the in house people and the quality of the ASIC design house.

            And it depends on what you're trying to do. In the embedded area lots of companies much smaller than Microsoft put a processor and a bunch of their own peripherals onto a chip and it works. I guess that console or PC graphics cores use a lot more power than that. But I don't know if "an ASIC design house" would have done a better job than Microsoft's ASIC group.

            Or more to the point, maybe a $1B recall is the price you pay for learning about this stuff. Microsoft can afford it obviously and it will influence how the successor to the XBox360 is done. Whether they hire more engineers and do it in house or outsource it is a business decision it seems. I guess the in house people and the design house will both try to argue for the best option from their point of view and some manager will decide.

            But if you're a cash rich company then the bias will be to try to do as much as possible in house, because that gives you more freedom to value engineer later.
            • by tftp (111690) on Wednesday June 11 2008, @02:50AM (#23743899) Homepage
              Basically he's trying to create business for ASIC design houses by telling people that putting a bunch of licensed IP onto a chip is rocket science and they shouldn't try to do it in house. Is it really? I honestly don't know. I suspect it depends a lot on the quality of the in house people and the quality of the ASIC design house.

              It is true. You should not unnecessarily muck with VHDL/Verilog and 3rd party cores even if you work with an FPGA. This will not kill you, but it will make you poorer. HDLs are notoriously kludgy, and it takes a lot of effort to do it right. Proprietary cores rarely work as documented, and you have no visibility into them. When multiple cores are used, it's one large fingerpointing game between vendors. And you need to have good, experienced HDL coders. And you need to have all the tools, they cost big bucks.

              But that's with mere FPGAs, where you can update your design whenever you wish. However here they are talking about ASICs - where all the wiring is done with masks when the IC is made. You'd have to be certifiably mad to even think about a casual design like this. ASIC designs are done by very competent teams, using "10% coding / 90% verification" time allocation, because you can't afford /any/ mistakes. And even then you make mistakes; but experienced teams with good tools make those mistakes smaller, and they call them "errata" - something that is not right but can be worked around. When you make the F0 0F bug, though, you trash the whole run.

              So Microsoft risked a lot when it went for an in-house design. I am not surprised that they failed. They should have counted all the successful 3D video companies on the market and asked themselves why there are so few, and why top gaming cards cost so much.

              But if you're a cash rich company then the bias will be to try to do as much as possible in house, because that gives you more freedom to value engineer later.

              I am not MS, but I don't really see much business value in rolling your own video controller. More likely the NIH syndrome kicked in, or some people were overly concerned about their job security.

            • by quanticle (843097) on Tuesday June 10 2008, @10:10PM (#23741527) Homepage

              That's true, but if they did go to an ASIC vendor they could have got a contract indemnifying them from taking losses when the chip turned out to be flawed. By doing the chip design themselves, they saved a little bit on costs, but also took on all the risks of having a bad design.

              That's what the parent poster is alluding to. A manager with experience in technology would have understood that, while designing your own chip might have been cheaper, it would have also introduced significant downside risk, which ought to have been factored into the equation. Farming the chip design out to a third party, while more expensive in the short term, would have entailed less long-term risk.
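
The trade-off above can be put as rough arithmetic. This is only a sketch: the $1 billion recall figure comes from the article summary, the $50M saving is an assumed value picked from the "tens of millions" range, and the idea that an outside vendor's contract would absorb the whole failure cost is a simplifying assumption, not something the article states.

```python
# Hypothetical figures: only "tens of millions" saved and "more than
# $1 billion" for the recall come from the article summary.
ASSUMED_SAVINGS = 50e6   # assumed in-house design saving, in dollars
RECALL_COST = 1e9        # recall cost quoted in the summary, in dollars


def break_even_failure_probability(design_savings: float, failure_cost: float) -> float:
    """Failure probability above which the cheaper option costs more on average."""
    if failure_cost <= 0:
        raise ValueError("failure_cost must be positive")
    return design_savings / failure_cost


def expected_cost(base_cost: float, failure_probability: float, failure_cost: float) -> float:
    """Up-front cost plus the probability-weighted cost of a failure."""
    return base_cost + failure_probability * failure_cost


threshold = break_even_failure_probability(ASSUMED_SAVINGS, RECALL_COST)

# In-house: no extra design fee, but the full failure cost is carried.
in_house = expected_cost(0.0, 0.10, RECALL_COST)
# Vendor: pay the extra design fee; failure cost assumed fully indemnified.
vendor = expected_cost(ASSUMED_SAVINGS, 0.0, RECALL_COST)
```

Under these assumed numbers the in-house route only pays off if the chance of a recall-scale failure is below about 5%; at an assumed 10% chance the expected cost is roughly double the vendor route.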

      • What exactly do they understand? From the decisions I've seen, "Master of Business Administration" is not a title I'd apply to most...
    • Consider: would you rather spend $10M on a platform that may flop and not make a dime

      OR

      Spend $1B on a platform that has made multi-billions.

          • by PJ1216 (1063738) * on Tuesday June 10 2008, @07:21PM (#23739057) Homepage
            I'm not sure of the numbers, but finally turning a profit one quarter does not mean you've finally made up all the money you lost all the past quarters from selling the systems at a loss. It just means they're no longer selling them at a loss. They had already dug a hole, but finally started to climb out instead of going deeper. It doesn't necessarily mean they are out of it though. They *could* be at this point, but that article says nothing about the platform making up the billions it had already lost. Eventually they will and maybe they have at this point in time, but that article is a red herring. It just means they stopped losing money.
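
To make the distinction concrete, here is a toy running total. The quarterly figures are invented purely for illustration and are not real Xbox results.

```python
from itertools import accumulate

# Hypothetical quarterly results in millions of dollars (made-up numbers).
quarterly_profit = [-400, -350, -300, -150, -50, 80]

# Running total: how deep the hole is after each quarter.
cumulative = list(accumulate(quarterly_profit))

# Index of the first quarter that was profitable on its own.
first_profitable_quarter = next(i for i, p in enumerate(quarterly_profit) if p > 0)

# Whether the earlier losses have been made up by the end of the series.
recovered = cumulative[-1] >= 0
```

In this made-up series the last quarter is profitable, yet the running total is still well below zero, which is the gap between "stopped losing money" and "made the money back".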
    • by HiVizDiver (640486) on Tuesday June 10 2008, @07:32PM (#23739263)
      Not so sure about that... I would argue that very often when something breaks, it is because they used a cheap vendor, but that the logic doesn't necessarily apply backwards - that using a cheap vendor means it WILL break. I bet there are loads of examples of people doing things on the cheap, where it DIDN'T fail. You just don't hear about those.
  • Bleh... (Score:4, Insightful)

    by Anonymous Coward on Tuesday June 10 2008, @06:08PM (#23737927)

    ...hoping to save tens of millions of dollars in ASIC design costs, Microsoft ended up paying more than $1 billion for its Xbox 360 recall.
    I'm glad that I am not wealthy enough to be able to afford to be that incompetent.
  • by Udo Schmitz (738216) on Tuesday June 10 2008, @06:09PM (#23737953) Journal
    Shaking fists at ATI, yelling: "I'll design my own chip! With blackjack! And hookers! ... In fact ..."
  • by Naughty Bob (1004174) * on Tuesday June 10 2008, @06:10PM (#23737961)
    I know /. does like to stick the boot into MSFT whenever possible, but in the last 2 hours there have been 3 front page stories, real stories, about the nasty behaviour of MSFT coming back to bite them in their fugly corporate ass.

    Or is it all just a hoax? [fugue.com]

    Hope not.
  • Another Talisman CF (Score:5, Interesting)

    by rimcrazy (146022) on Tuesday June 10 2008, @06:11PM (#23737965)
    I had the miss-pleasure of working on a graphics ASIC with MicroSquish back around the late 90's on a project called Talisman.

    Never, and I say NEVER let a bunch of software engineers try to design a hardware chip. This was the biggest CF I'd seen in all my years (30+) as a chip designer. That they did it again, and with such stupidity again is no friggin surprise.

    It is not that software engineers should not be involved, of course they should but when they drive the architecture in complete void of any practical chip design constraints..... and continually refuse to listen to any reason from the hardware designers..... well as they say, garbage in, garbage out.
    • by Dhar (19056) on Tuesday June 10 2008, @06:31PM (#23738257) Homepage

      Never, and I say NEVER let a bunch of software engineers try to design a hardware chip.
      I've worked with software written by a hardware company, and I can say the same thing from my side of the fence...never let a bunch of hardware guys write software!

      I suppose if we can all agree to stay out of the other guy's yard, we can get along. You do hardware, I'll do software. :)

      -g.
      • by hedronist (233240) * on Tuesday June 10 2008, @07:26PM (#23739165)
        > never let a bunch of hardware guys write software

        I testify, Brother, I TESTIFY!

        30 Years ago, I ended up in therapy (literally) after dealing with an assembly program written by a hardware guy. The program emulated a CDC communications protocol that was originally done in hardware. This was on a Cincinnati Milacron 2200B, a machine that had both variable instruction length and variable data length. The hardware guy had implemented the protocol state changes by putting a label *on the address portion* of jump statements (he did this in 50 different places in the program) and then in some other area of the code he would change where the jump branched to next time through. It bordered on an implementation of the mythical COME FROM instruction. Of course, there was zero documentation and almost zero comments.

        After one marathon debugging session I was so frustrated I was in tears. My manager came in and wanted to know what the problem was. I gave him the listing and left to walk around the building a few times. When I came back, he told me that it was, hands down, the worst piece of crap he had seen in 20 years. He had me rewrite it from scratch, which I did over a long weekend.

        The program's name was RIP/TIP (Receive Interrupt Processor/Transmit Interrupt Processor) and I was in therapy for most of a year. (There were a few other issues, but this was the bale of hay that made me snap.)

    • by Anonymous Coward on Tuesday June 10 2008, @06:49PM (#23738473)
      What makes you think that it was designed by only software engineers exactly?

      I can tell you first hand that a lot of the people on the Xbox hardware team are extremely talented HARDWARE specialists. The way you talk you would think MS locked a bunch of IE developers in a room and didn't let them out until they had designed the chip.

      And as for the argument of 'well if they are so talented, why is the chip such a POS?', it is not only software engineers that design shitty hardware. Look at AMD, with the TLB defect in the Phenom chips, is that the fault of the software engineers?

      This response may be overkill, but somehow you were modded +5 interesting, but you completely miss the point.
      • by Cassini2 (956052) on Tuesday June 10 2008, @07:12PM (#23738879)

        I am a person that designs both hardware and software, but not chips. At the risk of talking outside of my expertise, I will have a go at answering your question.

        Firstly, there are things that software people really like, but it is often better to not do them in hardware. This category contains things like Read/Write I/O registers. From a software point of view, they are nice, but they can double your gate count. They can also increase your capacitive bus loading. DAC and ADC designs can also be affected this way. A software person might use a proper ADC and expect proper ADC registered results. A hardware person might select a resistor, a capacitor, a voltage comparator, and a couple of spare I/O pins. The cheesy R/C approach may save the hardware design from a whole slew of problems including cost. A software person may opt for a synchronous logic approach with all registers clocked every clock cycle. The hardware designer may opt for a much more asynchronous approach that minimizes the number of clocked registers. This reduces power consumption, and potentially the number of registers too. Often the hardware designer will consider thermal, cost, and electrical layout issues as part of his design process. The software person will not be as familiar with how to design a good circuit board and chip in a cost-effective manner. A good software engineer can learn all of this material with time, but the hardware engineers will do them naturally.
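
For anyone who hasn't seen the R/C trick, a minimal sketch of the arithmetic behind one common variant follows. It assumes a capacitor charged from 0 V through a resistor from the supply rail, with the comparator tripping when the capacitor voltage reaches the unknown input; the post doesn't say which variant is meant, and the function names and component values here are illustrative only.

```python
import math


def charge_time(v_in: float, v_supply: float, r_ohms: float, c_farads: float) -> float:
    """Seconds for the capacitor to charge from 0 V up to v_in through r_ohms."""
    if not 0 <= v_in < v_supply:
        raise ValueError("v_in must be in [0, v_supply)")
    return -r_ohms * c_farads * math.log(1 - v_in / v_supply)


def voltage_from_time(t_seconds: float, v_supply: float, r_ohms: float, c_farads: float) -> float:
    """Recover the input voltage from the time at which the comparator tripped."""
    return v_supply * (1 - math.exp(-t_seconds / (r_ohms * c_farads)))


# Illustrative values: 3.3 V rail, 10 kOhm resistor, 100 nF capacitor (RC = 1 ms).
t = charge_time(1.65, 3.3, 10e3, 100e-9)
v = voltage_from_time(t, 3.3, 10e3, 100e-9)
```

The firmware only has to time a pin change, which is why a resistor, a capacitor, and a comparator can stand in for a dedicated ADC when accuracy and speed requirements are loose.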

        The second category of problems is tools. The modern chip designer is working with a fairly advanced set of tools that the software person is likely to be quite unfamiliar with. This starts with the IC design tools, which are quite specialized. It ends with the hardware engineering tools. Have you ever X-Rayed a circuit board to analyze the cracks in the Ball Grid Array where it bonds to the circuit board? Are you familiar with thermal issues, and thermal images? How about EMI test results? Modern IC package design limitations? A good team of engineers will be familiar with these tools, and know how to use them to get good results.

        The third category of problems is mistakes from inexperience, or lack of experience in the correct field. I work with industrial electronics. I think from an industrial point of view. What happens when someone attaches 600 (VAC) to the ground wire of the computer? What happens to the remote sensors when the plant gets hit by lightning? In IC design, there are some known gray areas too. Does the chip reset properly on power up? Do metastable, astable, or self-oscillating states exist in the IC design? Can the chip survive with no cooling? Does the chip have an overtemp shutdown function? What happens if someone starts the chip up in sub-zero weather? Do the analog electronics have sufficient electrical separation from the digital electronics, while avoiding nasty things like ESD latchup conditions?

        I've completed chip design courses before, but have never had to design a modern production gate array design. As a person that has done both software and hardware, I know that my skills are not good enough for the most modern IC design processes. My limit is FPGA work, and my preference is clever opto-isolation, power semiconductor, TTL and microprocessor based circuits. In analog, my expertise is industrial sensing and survivability. You have to know where your field of expertise is, and what your limits are.

          • by CastrTroy (595695) on Tuesday June 10 2008, @08:09PM (#23739871) Homepage
            Not really. I wouldn't have a mechanical engineer design a chip either. I also wouldn't have a hardware/mechanical engineer designing a software system. Let people do what they are good at, and stop trying to cut corners by substituting in people where they have no skills.
          • by vux984 (928602) on Tuesday June 10 2008, @11:27PM (#23742403)
            So... hardware design is a "real" engineering (deals with whole range of nastiness the physical reality slaps you with), unlike the hack that software "engineering" is... Is that what you're saying? :-)

            Well... there's "real" software engineering too...stuff involving resource deadlock, race conditions, critical section synchronization, in applications like virtual memory management, network protocols, time sync, file systems, security, fault tolerance, etc that are subject to all sorts of 'physical reality nastiness'.

            It's not all wizards and automatic code completion you know. :-)
  • Ridiculous (Score:5, Informative)

    by smackenzie (912024) on Tuesday June 10 2008, @06:17PM (#23738079)
    ATI and Microsoft developed this chip together over a period of two years. The XBOX 360 GPU has been known since conception as an ATI GPU.

    Furthermore, the recall was for overheating in general which -- though unquestionably affected by the GPU -- is a more comprehensive system design failure, not just a single component. (Look at the stability success they have had simply by reducing the size of the CPU.)

    I'm looking forward to "Jasper", the code name for the next XBOX 360 motherboard that will include a 65 nanometer graphics chip, smaller memory chips and HOPEFULLY a price reduction.
    • Vote parent up (Score:5, Insightful)

      by imsabbel (611519) on Tuesday June 10 2008, @06:39PM (#23738351)
      The article is COMPLETE, UTTER bullshit.

      Years before the xbox360 was released, ATI had already been announced as the system partner for the GPU. No "secret unnamed ASIC vendor" anywhere.
      The recall, again, was thermal problems.

      Do you really think a completely different GPU by a completely different company could have been designed in a year _and_ totally compatible with the original one?
  • What's going on..... (Score:5, Informative)

    by ryszards (451448) * on Tuesday June 10 2008, @06:25PM (#23738161) Homepage
    Microsoft didn't design the GPU, ATI did, and everyone knows ATI have always been fabless. TSMC are the manufacturer of the larger of the two dice that make up the Xenos/C1 design, and while that die has been revised since for a process node change, it doesn't even appear that the new revision has been used yet (despite it being finished by ATI a long time ago).

    Lewis seems to be just plain wrong, which is kind of upsetting for "chief researcher" at a firm like Gartner, especially when the correct information is freely available.

    While the cooling solution for the GPU is the likely cause of most of the failures, that's not necessarily the GPU's fault, or ATI's, especially for a fault so widespread.
    • by Ritchie70 (860516) on Tuesday June 10 2008, @10:18PM (#23741617) Journal
      Dunno why Lewis being wrong is upsetting.

      Everything I've ever heard as a "Gartner opinion" got one of two reactions from me:

      1. Well duh.
      2. No, that's obviously wrong.

      Looks like this is #2.
      • by afidel (530433) on Tuesday June 10 2008, @07:06PM (#23738757)
        Now a scaler chip makes a LOT more sense to me than the GPU. Everyone knows ATI was the partner for the GPU and there would be few people in the industry that would call a GPU an ASIC. A scaler chip is very much an ASIC and I can see where MS might decide to do their own scaler chip, but they had no chance of doing their own modern GPU without a partner.
  • What Recall? (Score:4, Insightful)

    by cjjjer (530715) <cjjjerNO@SPAMhotmail.com> on Tuesday June 10 2008, @06:48PM (#23738469)
    Funny I don't recall a recall only a 3 year warranty extension covering the RoD.

    True to /. form allowing an article to spread false truths...

    News for Nerds.... Stuff that may or may not be true...
  • by YesIAmAScript (886271) on Tuesday June 10 2008, @06:56PM (#23738591)
    Look at Bunnie Huang's analysis.

    The problem wasn't any chip at all. It wasn't even heat. The problem was the chips were not soldered to the board.

    http://www.bunniestudios.com/blog/?p=223 [bunniestudios.com]

    Doesn't matter who designed or made the chips. If they aren't soldered down, they won't work. And that's what the problem was. That's why X-clamps (mostly) work.

    Heat is semi-tangential. If the chip is soldered down, heat won't pop it off and if it isn't soldered, any kind of movement will break it loose, even when cold. This is how MS could ship you replacement units that were RRoD out of the box. They were fine before they were shipped and were broken loose during shipping.

    Most of the problem appears to be solderability problems, not a problem with chip design or manufacturing.
    • Re:More info please (Score:4, Informative)

      by Vectronic (1221470) on Tuesday June 10 2008, @07:11PM (#23738845)
      http://en.wikipedia.org/wiki/Xbox_360_technical_problems [wikipedia.org]

      When a Microsoft Xbox 360 console experiences a "general hardware" failure or "Core Digital" failure, three flashing red lights appear on the power switch's "Ring of Light" in the front of the console. This is commonly referred to as the "Red Ring of Death" ...

      The General Hardware Failure error could be caused by cold soldering. The added mass of the CSP chips (including the GPU and CPU) resists heat flow that allows proper soldering of the lead-free solders underneath the motherboard. ...

      Another General Hardware Failure is shown by the ring of light flashing one red light, and an error code E 74. This too renders the Xbox unusable. ...

      The Nyko Intercooler has also been reported to have caused a general hardware failure in a number of consoles, as well as scorching of the power AC input. ...

      An update patch released on November 1, 2006 was reported to "brick" consoles, rendering them useless. ...

      In June 2008, the EE Times reports the problems may have started in a graphic chip.
      The last one is what this article is (mostly) about...
    • Re:Some Facts... (Score:4, Interesting)

      by Renraku (518261) on Tuesday June 10 2008, @10:24PM (#23741695) Homepage
      The reason GTA4 runs at a lower resolution on the PS3 is because they can do all kinds of nifty effects with the card that aren't all geometry, textures, and shading. They can do a slight motion blur, for example, and have almost everything 100% bump-mapped. In reality, you don't notice that the resolution is slightly lower.

      The PS3 COULD run it in 360-resolution, but it might have to sacrifice some of those filters and special effects. I'd rather have a special effect laden game run at slightly lower resolution myself, as long as it's hard to notice.