


Whose Bug Is This Anyway?
An anonymous reader writes "Patrick Wyatt, one of the developers behind the original Warcraft and StarCraft games, as well as Diablo and Guild Wars, has a post about some of the bug hunting he's done throughout his career. He covers familiar topics — crunch time leading to stupid mistakes and finding bugs in compilers rather than game code — and shares a story about finding a way to diagnose hardware failure for players of Guild Wars. Quoting: '[Mike O'Brien] wrote a module ("OsStress") which would allocate a block of memory, perform calculations in that memory block, and then compare the results of the calculation to a table of known answers. He encoded this stress-test into the main game loop so that the computer would perform this verification step about 30-50 times per second. On a properly functioning computer this stress test should never fail, but surprisingly we discovered that on about 1% of the computers being used to play Guild Wars it did fail! One percent might not sound like a big deal, but when one million gamers play the game on any given day that means 10,000 would have at least one crash bug. Our programming team could spend weeks researching the bugs for just one day at that rate!'"
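For anyone wondering what a check like that might look like, here is a rough Python sketch of the idea in the quote: allocate a block, run a deterministic calculation in it, and compare against a table of known answers once per frame. This is not the actual OsStress code. The block size, the seeds, the LCG fill, the SHA-256 digest, and every function name are assumptions made up for illustration, and the table is built at startup rather than shipped precomputed from known-good hardware, so this version would only catch intermittent faults.

```python
import hashlib

BLOCK_SIZE = 4096       # hypothetical; the post does not give a size
SEEDS = (1, 2, 3, 4)    # hypothetical inputs, one per table entry


def run_calculation(seed, flip_bit_at=None):
    """Allocate a block, fill it deterministically, and return its digest.

    flip_bit_at simulates a hardware fault by flipping one bit in the block.
    """
    block = bytearray(BLOCK_SIZE)
    state = seed
    for i in range(BLOCK_SIZE):
        state = (state * 1664525 + 1013904223) & 0xFFFFFFFF  # 32-bit LCG step
        block[i] = state >> 24                                # top byte, 0..255
    if flip_bit_at is not None:
        block[flip_bit_at] ^= 0x01
    return hashlib.sha256(block).hexdigest()


# Stand-in for the table of known answers (assumption: built once at startup;
# a real table would be precomputed on hardware known to be good).
KNOWN_ANSWERS = {seed: run_calculation(seed) for seed in SEEDS}


def stress_check(seed, flip_bit_at=None):
    """Return True if the recomputed result matches the known answer."""
    return run_calculation(seed, flip_bit_at) == KNOWN_ANSWERS[seed]


def game_loop(frames):
    """Run one check per frame, cycling through the seeds; return failure count."""
    failures = 0
    for frame in range(frames):
        seed = SEEDS[frame % len(SEEDS)]
        if not stress_check(seed):
            failures += 1
    return failures
```

On a healthy machine `game_loop(8)` should report zero failures, while `stress_check(1, flip_bit_at=100)` fails because a single flipped bit changes the digest.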
Re:I don't believe 1% of computers give wrong answ (Score:5, Insightful)
You don't have any idea what you're talking about, and that's why you don't understand what he's talking about.
How to deal with compiler bugs (Score:5, Insightful)
If you suspect the compiler is generating invalid machine code, try to make a minimal test case for it. If you succeed, file a bug report and add that test case; the compiler developers will appreciate it. If you don't succeed in finding a minimal test case that triggers the same issue, it's likely not a compiler bug but an issue in your program in some place where you weren't expecting it.
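If cutting the test case down by hand is tedious, the reduction can be partly automated. Below is a minimal sketch of a greedy line-based reducer in Python, a simplified cousin of delta debugging rather than a full implementation of it. The names `minimize` and `still_fails` are made up here; in practice `still_fails` would be your own function that writes the candidate to a file, compiles it, runs it, and reports whether the wrong behaviour is still there.

```python
def minimize(lines, still_fails):
    """Greedily drop chunks of lines while the failure predicate still holds."""
    assert still_fails(lines), "the starting input must reproduce the problem"
    chunk = max(len(lines) // 2, 1)
    while chunk >= 1:
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and still_fails(candidate):
                lines = candidate   # smaller reproducer found; retry at same index
            else:
                i += chunk          # this chunk is needed; move past it
        chunk //= 2                 # try finer-grained removals next
    return lines


# Toy predicate standing in for "compile, run, check for the bad output".
source = ["a", "b", "BUG1", "c", "BUG2", "d"]
reduced = minimize(source, lambda ls: "BUG1" in ls and "BUG2" in ls)
```

With the toy predicate above, only the two lines that are both needed to trigger the failure survive. A real predicate should also reject candidates that fail for a different reason (for example, a syntax error), or the reducer will happily shrink toward the wrong bug.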
Re:I don't believe 1% of computers give wrong answ (Score:5, Insightful)
I think this is bull. I just don't believe 1% of computers give wrong answers.
1% of all computers? Probably not.
1% of gamers' computers, in an era when PC gaming technology was progressing very quickly, and so gamers were often running overclocked (or otherwise poorly set up) hardware? Sounds plausible enough.
Re:OsStress (Score:2, Insightful)
We all realize that when Intel bakes a bunch of processors, they come out all the same, and then Intel labels some as high-speed, some as mid-speed, and some as low-speed. They are then sold for different prices. However, they are the exact same CPU.
Overclocking isn't the issue, because the CPUs are the same. The problem arises when aggressive overclocking is done by ignorant hobbyists or money-grubbing computer retailers. They overclock the computer to where it crashes, and then back off just a little bit. "There! Now I've got a real MEAN MACHINE," he thinks.
Re:I don't believe 1% of computers give wrong answ (Score:5, Insightful)
He said 1% of computers that were used to play Guild Wars gave wrong answers. Gaming PCs are more likely to be overclocked too far, have under-dimensioned power supplies or overheating issues than the average PC. 1% doesn't sound unrealistically high to me.
Yep, seen it all (Score:5, Insightful)
I've had compilers miscompile my code, assemblers mis-assemble it, and even on a few cases CPUs mis-execute it consistently (look up CPU6 and msp430). Random crashes due to bad memory/cpu... yep. But on very rare occasions, I find that the bug is indeed in my own code, so I check there first.
Re:OsStress (Score:5, Insightful)
Bullshit. While Intel does occasionally bin processors into lower speeds to fulfill quotas and such, oftentimes those processors are binned lower because they can't pass the QA process at their full speed, even though they can pass it when running at a lower speed. These processors were meant to be the same as the more expensive line, but due to minor defects can't run stably or reliably at the higher speed. Or at least not reliably enough for Intel to sell them at full speed.
Which is a large part of why some processors in the same batch can handle it when others can't.
As much as I hate Intel, I think we could at least realize that they are oftentimes doing this with good reason.
Re:Wait its possible?! (Score:5, Insightful)
I've been programming professionally for over 20 years, mostly in C/C++ (MSVC, GCC, and recently Clang (and others back in the olden days)). I've seen maybe two serious compiler bugs in the past 10 years. They used to be common.
On the other hand, I can't count how many times I've seen coders insist there must be a compiler bug when after investigation, the compiler had done exactly what it should according to the standard (or according to the compiler vendor's documentation when the compiler intentionally deviated from the standard).
By "serious", I mean the compiler itself doesn't crash, issues no warnings or errors, but generates incorrect code. Maybe I've just been lucky. (Or maybe QA just never found them.)
Oh, and btw, yes I realize you were joking (and I found it funny.)
Re:The memory thing... (Score:4, Insightful)
Nah, that's pretty typical. In fact, RAM is the only component other than HDDs to have a statistically significant AFR in my datacenter. At the peak I had a bit over 200 servers and we'd have a DIMM go bad about once every other month (so say 6 of 1200 DIMMs per year). Heck, with my ProLiants the fans and PSUs were more reliable, as we've only lost a handful of each over the last 6 years.
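To put a number on that: taking the figures in the comment at face value (about 6 failed DIMMs a year out of roughly 1200), the annualized failure rate works out to about half a percent. A quick Python check, where the function name is just for illustration:

```python
def annualized_failure_rate(failures_per_year, population):
    """Fraction of the installed population that fails in one year."""
    return failures_per_year / population


# Figures taken from the comment: ~6 bad DIMMs per year out of ~1200 installed.
afr = annualized_failure_rate(6, 1200)
afr_text = f"{afr:.2%}"  # formatted as a percentage with two decimals
```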
Re:The memory thing... (Score:4, Insightful)
I've been hearing this for the entirety of my worldly awareness (several decades), and the song remains the same.
Eventually, I'd hoped that folks would realize that they were unlucky or were just buying garbage, instead of insipidly assuming that such-and-such widget was so perfectly constructed and planned that it failed within hours/days of the warranty expiring -- just as designed.
The truth is that no matter what the nature of the item, or the term of the limited warranty: Given sufficient quantity, some of them are going to fail mere seconds after the warranty is gone.
Such as it is.
We all want everything we buy to work perfectly and last forever, but nothing ever does. It should be no surprise that this is not the result of any conspiracy, but just life. Things wear out. (Even DIMMs.)
Re:Wait its possible?! (Score:3, Insightful)
Welcome to planet Earth. If your species expects competence in its dealings with humans you should have done more research before landing. Didn't you get those episodes of I Love Lucy we kept sending you?
Re:The memory thing... (Score:3, Insightful)
Open the computer and blow it out with a leaf blower every six months. Solves 80% of your boot problems, no need to reinstall or re-seat components.
Re:How to lose time and sanity (Score:4, Insightful)
He wrote a bug report, but it was ignored.