
Whose Bug Is This Anyway?

Posted by Soulskill
from the it's-nobody's-fault-and-everybody's-angry dept.
An anonymous reader writes "Patrick Wyatt, one of the developers behind the original Warcraft and StarCraft games, as well as Diablo and Guild Wars, has a post about some of the bug hunting he's done throughout his career. He covers familiar topics — crunch time leading to stupid mistakes and finding bugs in compilers rather than game code — and shares a story about finding a way to diagnose hardware failure for players of Guild Wars. Quoting: '[Mike O'Brien] wrote a module ("OsStress") which would allocate a block of memory, perform calculations in that memory block, and then compare the results of the calculation to a table of known answers. He encoded this stress-test into the main game loop so that the computer would perform this verification step about 30-50 times per second. On a properly functioning computer this stress test should never fail, but surprisingly we discovered that on about 1% of the computers being used to play Guild Wars it did fail! One percent might not sound like a big deal, but when one million gamers play the game on any given day that means 10,000 would have at least one crash bug. Our programming team could spend weeks researching the bugs for just one day at that rate!'"
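The check described in the quote can be sketched roughly as follows. This is only an illustration of the idea (allocate a block, compute in it, compare against known answers), not the actual OsStress code: the block size, the seeds, the recurrence, and every function name are assumptions made for the example. A real tool would presumably ship an answer table computed on trusted hardware; here the table is built at startup only to keep the sketch self-contained.

```python
import zlib

BLOCK_SIZE = 4096      # hypothetical block size; the post does not give one
SEEDS = (1, 2, 3, 4)   # hypothetical inputs for the known-answer table


def compute_block(seed, size=BLOCK_SIZE):
    """Allocate a block and fill it with a deterministic integer recurrence."""
    block = bytearray(size)
    state = seed & 0xFFFFFFFF
    for i in range(size):
        # Linear congruential step, truncated to 32 bits.
        state = (state * 1664525 + 1013904223) & 0xFFFFFFFF
        block[i] = state >> 24  # top byte of the state, always 0..255
    return block


def build_answer_table(seeds=SEEDS):
    """Map each seed to the CRC32 of its block (assumed to run on good hardware)."""
    return {seed: zlib.crc32(compute_block(seed)) for seed in seeds}


def run_stress_check(table):
    """Recompute every block and return the seeds whose checksum does not match."""
    failures = []
    for seed, expected in table.items():
        if zlib.crc32(compute_block(seed)) != expected:
            failures.append(seed)
    return failures
```

Called once per frame, `run_stress_check(table)` should always return an empty list on a healthy machine. A non-empty result means the same deterministic calculation produced a different answer, which points at memory, CPU, power, or heat rather than at the game code.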
This discussion has been archived. No new comments can be posted.

  • by PaladinAlpha (645879) on Tuesday December 18, 2012 @09:49PM (#42332769)

    You don't have any idea what you're talking about, and that's why you don't understand what he's talking about.

  • by MtHuurne (602934) on Tuesday December 18, 2012 @09:52PM (#42332779) Homepage

    If you suspect the compiler is generating invalid machine code, try to make a minimal test case for it. If you succeed, file a bug report and add that test case; the compiler developers will appreciate it. If you don't succeed in finding a minimal test case that triggers the same issue, it's likely not a compiler bug but an issue in your program in some place where you weren't expecting it.

  • by Jeremi (14640) on Tuesday December 18, 2012 @10:06PM (#42332851) Homepage

    I think this is bull. I just don't believe 1% of computers give wrong answers

    1% of all computers? Probably not.

    1% of gamers' computers, in an era when PC gaming technology was progressing very quickly, and so gamers were often running overclocked (or otherwise poorly set up) hardware? Sounds plausible enough.

  • Re:OsStress (Score:2, Insightful)

    by DNS-and-BIND (461968) on Tuesday December 18, 2012 @10:06PM (#42332857) Homepage

    We all realize that when Intel bakes a bunch of processors, they come out all the same, and then Intel labels some as high-speed, some as middle, and some as low. They are then sold for different prices. However, they are the exact same CPU.

    Overclocking isn't the issue, because the CPUs are the same. The problem arises when aggressive overclocking is done by ignorant hobbyists or money-grubbing computer retailers. They overclock the computer to where it crashes, and then back off just a little bit. "There! Now I've got a real MEAN MACHINE," he thinks.

  • by MtHuurne (602934) on Tuesday December 18, 2012 @10:08PM (#42332865) Homepage

    He said 1% of computers that were used to play Guild Wars gave wrong answers. Gaming PCs are more likely to be overclocked too far, have under-dimensioned power supplies or overheating issues than the average PC. 1% doesn't sound unrealistically high to me.

  • Yep, seen it all (Score:5, Insightful)

    by russotto (537200) on Tuesday December 18, 2012 @10:26PM (#42332959) Journal

    I've had compilers miscompile my code, assemblers mis-assemble it, and even in a few cases CPUs mis-execute it consistently (look up CPU6 and msp430). Random crashes due to bad memory/CPU... yep. But on very rare occasions, I find that the bug is indeed in my own code, so I check there first.

  • Re:OsStress (Score:5, Insightful)

    by Anonymous Coward on Tuesday December 18, 2012 @10:48PM (#42333049)

    Bullshit. While Intel does occasionally bin processors into lower speeds to fulfill quotas and such, often times those processors are binned lower because they can't pass the QA process at their full speed. But they can pass the QA process when running at a lower speed. These processors were meant to be the same as the more expensive line, but due to minor defects can't run stably or reliably at the higher speed. Or at least not enough for Intel to sell them at full speed.

    Which is a large part of why some processors in the same batch can handle it when others can't.

    As much as I hate Intel, I think we could at least realize that they are often times doing this with good reason.

  • by disambiguated (1147551) on Tuesday December 18, 2012 @11:58PM (#42333499)

    You're a better programmer for assuming it's not a compiler bug and trying harder to figure out what you did wrong.

    I've been programming professionally for over 20 years, mostly in C/C++ (MSVC, GCC, and recently Clang, plus others back in the olden days). I've seen maybe two serious compiler bugs in the past 10 years. They used to be common.

    On the other hand, I can't count how many times I've seen coders insist there must be a compiler bug when after investigation, the compiler had done exactly what it should according to the standard (or according to the compiler vendor's documentation when the compiler intentionally deviated from the standard).

    By "serious", I mean the compiler itself doesn't crash, issues no warnings or errors, but generates incorrect code. Maybe I've just been lucky. (Or maybe QA just never found them ;-)

    Oh, and btw, yes I realize you were joking (and I found it funny.)

  • by Greyfox (87712) on Wednesday December 19, 2012 @12:47AM (#42333731) Homepage Journal

    Heh, back in the day when I was doing OS/2 phone support, I had a customer call up with a trap zero during install. Now I'd seen a lot of odd shit during an OS/2 install, but I'd never seen a trap zero. Turns out that was a divide by zero error. Fucker made me start filling out the paperwork to send him to level 2 before admitting that he was trying to overclock his processor. If memory serves me correctly (which it might not, nearly two decades later) he was trying to go from 8 MHz to 20 MHz, and was also getting a lot of crashes in DOS and DOS applications. I told him that was probably what his problem was and if I tried to send this on to level 2 it'd be rejected with a "Don't do that," so I was just going to save him some time and tell him "don't do that" now.

  • by afidel (530433) on Wednesday December 19, 2012 @01:29AM (#42333917)

    Nah, that's pretty typical. In fact RAM is the only component other than HDDs to have a statistically significant AFR in my datacenter. At the peak I had a bit over 200 servers and we'd have a DIMM go bad about once every other month (so say 6 of 1200 DIMMs per year). Heck, with my ProLiants the fans and PSUs were more reliable, as we've only lost a handful of each over the last 6 years.

  • by adolf (21054) <> on Wednesday December 19, 2012 @01:45AM (#42333991) Journal

    Especially the just-makes-it-past-warranty crap that's sold these days.

    I've been hearing this for the entirety of my worldly awareness (several decades), and the song remains the same.

    Eventually, I'd hoped that folks would realize that they were unlucky or were just buying garbage, instead of insipidly assuming that such-and-such widget was so perfectly constructed and planned that it failed within hours/days of the warranty expiring -- just as designed.

    The truth is that no matter what the nature of the item, or the term of the limited warranty: Given sufficient quantity, some of them are going to fail mere seconds after the warranty is gone.

    Such as it is.

    We all want everything we buy to work perfectly and last forever, but nothing ever does. It should be no surprise that this is not the result of any conspiracy, but just life. Things wear out. (Even DIMMs.)

  • by Paradise Pete (33184) on Wednesday December 19, 2012 @04:27AM (#42334659) Journal

    Your school has incompetent IT staff.

    Welcome to planet Earth. If your species expects competence in its dealings with humans you should have done more research before landing. Didn't you get those episodes of I Love Lucy we kept sending you?

  • by Impy the Impiuos Imp (442658) on Wednesday December 19, 2012 @08:02AM (#42335469) Journal

    Open computer and blow it out with a leaf blower every six months. Solves 80% of your boot problems, no need to reinstall or re-seat components.

  • by V for Vendetta (1204898) on Wednesday December 19, 2012 @12:29PM (#42337243)

    One wonders why you continue to use a mail reader that can't manage to send an email without mangling the subject header.

    He wrote a bug report, but it was ignored.
