Stories
Slash Boxes
Comments

News for nerds, stuff that matters

Origin of Quake3's Fast InvSqrt()

Posted by Zonk on Fri Dec 01, 2006 02:20 PM
from the i-know-you-were-dying-inside-without-this dept.
geo writes "Beyond3D.com's Ryszard Sommefeldt dons his seersucker hunting jacket and meerschaum pipe to take on his secret identity as graphics code sleuth extraordinaire. In today's thrilling installment, the origins of one of the more famous snippets of graphics code in recent years is under the microscope — Quake3's Fast InvSqrt(), which has been known to cause strong geeks to go wobbly in the knees while contemplating its simple beauty and power."
This discussion has been archived. No new comments can be posted.
Display Options Threshold:
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • A famous quote (Score:5, Funny)

    by ArchieBunker (132337) on Friday December 01 2006, @02:23PM (#17070412)
    (http://www.naawp.org/)
    "English motherfucker, do you speak it?" Anyone care to explain what that function does?
    • Re:A famous quote (Score:5, Informative)

      by GigsVT (208848) on Friday December 01 2006, @02:26PM (#17070456)
      (Last Journal: Saturday June 30, @01:22AM)
      1/sqrt(x)
      [ Parent ]
    • Re:A famous quote by theelectron (Score:1) Friday December 01 2006, @02:28PM
    • Re:A famous quote by Anonymous Coward (Score:1) Friday December 01 2006, @02:29PM
      • Re:A famous quote (Score:5, Informative)

        by mike260 (224212) on Friday December 01 2006, @04:41PM (#17073056)
        Nope, the reason you want a fast 1/sqrt(x) in graphics is so you can normalise 3D vectors quickly (something that most 3D apps have to do a LOT).

        Slow:
            const float length = sqrt( v.x*v.x + v.y*v.y + v.z*v.z );
            v.x /= length;
            v.y /= length;
            v.z /= length;

        Fast:
            const float recip_length = InvSqrt( v.x*v.x + v.y*v.y + v.z*v.z );
            v.x *= recip_length;
            v.y *= recip_length;
            v.z *= recip_length;

        The 2nd version has no divides, and no call to sqrt, which makes it *loads* faster.
        [ Parent ]
    • Re:A famous quote (Score:4, Informative)

      by Raul654 (453029) on Friday December 01 2006, @02:34PM (#17070604)
      (http://en.wikipedia.org/wiki/User:Raul654)
      Given a number, that function calculates the inverse of the square root - which, according to TFA, is very common in graphics applications.
      [ Parent ]
    • Re:A famous quote by nicklott (Score:2) Friday December 01 2006, @02:48PM
    • Re: A better question: by Anonymous Coward (Score:2) Friday December 01 2006, @02:49PM
      • Re: A better question: (Score:5, Informative)

        by Jerry Coffin (824726) on Friday December 01 2006, @04:45PM (#17073126)
        How does this function compute 1/x^(1/2)?


        It starts by taking a guess at the right answer, and then improving the guess until it's accurate enough to use.

        The first step depends heavily on the fact that a floating point number on a computer is represented as a significand (aka mantissa) and an exponent (a power of two). For the moment, consider taking just the square root of X instead of its inverse. You could separate out the exponent part of the floating point number, divide it by two, and then put the result back together with the original significand, and have a reasonable starting point.

        From there, you could improve your guesses to get a better approximation. The simplest version of that would be like a high-low game -- you split the difference between the current guess and the previous guess, and then add or subtract that depending on whether your previous guess was high or low. Eventually, you'll get arbitrarily close to the correct answer.

        This can take quite a few iterations to get to the right answer though. To improve that, Newton-Raphson looks at the curve of the function you're working with, and projects a line tangent to the curve at the point of the current guess. Where that line crosses the origin gives you the next guess. That's probably a lot easier to understand from picture [sosmath.com].

        In this case, we're looking for the inverse square root, which changes the curve, but not the basic idea. As a general rule, the closer your first guess, the fewer iterations you need to get some particular level of accuracy. That's the point of the:

        i = 0x5f3759df - (i>>1);

        While the originator of this constant is unknown, and some of it is rather obscure, the basic idea of most of it is fairly simple: we start by shifting the original number right a bit. This divides both the mantissa and the exponent part by two, with the possibility that IF the exponent was odd, it shifts a bit from the exponent into the mantissa. The subtraction from the magic number then does a couple of things. For one thing, if a bit from the exponent was shifted into the mantissa, it removes it. The rest of the subtraction is trickier. If memory serves, it's based on the harmonic mean of the difference between sqrt(x) and (x/2) for every possible floating point number of the size you're using.

        This is where the fact that it's 1/sqrt(x) instead of sqrt(x) means a lot: 1/sqrt(x) is a curve, but it's a fairly flat curve -- much flatter than sqrt(x). The result is that we can approximate a point on the curve fairly accurately with a line. In this case, it's really two lines, which gets it a bit closer still.

        From there, the number has had a bit of extra tweaking done -- it doesn't actually give the most accurate first guess, but its errors are often enough in the opposite direction from those you get in the Newton-Raphson iteration steps that it gives slightly more accurate final results.
        [ Parent ]
    • Evil math--it approximates the inverse square root by Anonymous Coward (Score:2) Friday December 01 2006, @03:08PM
    • Re:A famous quote (Score:5, Informative)

      by 91degrees (207121) on Friday December 01 2006, @03:13PM (#17071420)
      (Last Journal: Friday June 11 2004, @11:15AM)
      Okay - At times you want a normalised vector. A lot of the time you will have vectors of arbitrary length For example, the light is at the origin, the point is at (12, 4, 3). So the vector from the point to the light is (-13, -4, -3). The length of this vector can be calculated easily using pythagoras's theorem (sqrt (12^2+4^2+3^2). It's 13 units in length. We want a unit vector (i.e. a vector 1 unit in length) So we divide by the length to get (12/13, 4/13, 3/13).

      This is great for a 3D rendering application, but in a game speed is critical. This pair of calculations involves a square root and a divide. Both of thse are at least an order of magnitude slower than multiplications and additions.

      So what this function does is provide a value you can multiply each component by to get a unit vector.

      Well, there's the what and why parts. As for the , I have no idea. I think it uses magic.
      [ Parent ]
    • Re:A famous quote by Lord Ender (Score:1) Friday December 01 2006, @03:29PM
    • Re:A famous quote by Gilmoure (Score:2) Friday December 01 2006, @04:37PM
    • Re:A famous quote (Score:4, Funny)

      by ebyrob (165903) on Friday December 01 2006, @06:15PM (#17074620)
      (http://slashdot.org/)
      If this doesn't make your head swim:

      int i = *(int*)&x;
      i = 0x5f3759df - (i >> 1);


      Then I'm afraid the whole article is going to be lost on you...

      We've got a floating point being operated on as an integer.
      We've got a mysterious constant.
      We've got a two's complement sign-flip combined with a bit-shift.

      The only thing missing from this party is hookers and beer.
      [ Parent ]
      • 1 reply beneath your current threshold.
    • Re:A famous quote by Codename46 (Score:1) Friday December 01 2006, @02:35PM
    • Re:A famous quote (Score:5, Informative)

      by Red Flayer (890720) on Friday December 01 2006, @02:35PM (#17070624)
      (Last Journal: Friday November 10 2006, @02:16PM)
      It's just John Carmack's way of telling everyone he knows mathematics and please worship the ground he walks on.
      RTFA.

      Carmack quite graciously denied the code was his and helped direct the author closer to the true source.
      [ Parent ]
    • nope by rhombic (Score:2) Friday December 01 2006, @02:36PM
    • Re:A famous quote by Broken scope (Score:2) Sunday December 03 2006, @08:34PM
    • 3 replies beneath your current threshold.
  • And so why do we care? (Score:2, Insightful)

    by Codename46 (889058) on Friday December 01 2006, @02:27PM (#17070484)
    From the words of John Carmack himself, if you discover and implement an algorithm by yourself, even though it may have already been discovered already, you deserve credit for finding it on your own.

    [Insert rant about software patents]
  • I have a truly marvelous proof of who wrote this code which this comment box is too narrow to contain.
  • by gukin (14148) on Friday December 01 2006, @02:33PM (#17070584)
    Anyone who has ever played a computer game should pay up.
  • The linked site seems to be down (gee, you think it might be slashdotted?), but this paper [purdue.edu] seems to be covering the same topic.
  • What's with use of Pointers? (Score:3, Interesting)

    by MankyD (567984) on Friday December 01 2006, @02:35PM (#17070630)
    (http://millionnumbers.com/)
    Why does the coder use
    int i = *(int*)
    instead of just
    int i = (int)x;


    Can someone enlighten me?
    • Re:What's with use of Pointers? by MankyD (Score:2) Friday December 01 2006, @02:38PM
      • Re:What's with use of Pointers? by chrisbtoo (Score:2) Friday December 01 2006, @02:46PM
      • Re:What's with use of Pointers? (Score:5, Informative)

        by eis271828 (842849) on Friday December 01 2006, @02:46PM (#17070866)
        (int) x would convert the floating point value to an integer (truncation, basically).
        *(int*) &x treats the bits as an integer, with no behind the scenes conversion to an actual int value.
        [ Parent ]
      • Re:What's with use of Pointers? (Score:5, Informative)

        by radarsat1 (786772) on Friday December 01 2006, @02:47PM (#17070878)
        (http://www.music.mcgill.ca/~sinclair)
        Because in C's own weird way, that's the only way of refering to a float as an int without changing the bits.

        If you do this:
        int i = (int)3.0f;

        You get i=3, like what you'd get from the floor() function.

        If you do this:
        float f = 3.0f;
        int i = *(int*)


        Then i contains a bit-for-bit copy of the IEEE floating-point representation of 3.0.

        It's because C knows how to cast a float to an int by applying the floor function. However, if you do it the second way, you aren't casting a float to an int, you are casting a pointer-to-float to a pointer-to-int and then dereferencing it.

        By the way, I just wanted to say... this is one of the most interesting things I've read on Slashdot in a while. Wow. That function is just amazing. I only wish I understood how it worked. I know nothing about what a "Newton-Raphson iteration" is.
        [ Parent ]
        • Re:What's with use of Pointers? (Score:5, Informative)

          by dsci (658278) on Friday December 01 2006, @03:12PM (#17071400)
          (http://www.dsbscience.com/)
          Newton-Raphson is a general algorithm for finding root of an equation f(x)=0.

          You start with some INITIAL GUESS (the real beauty of this algorithm) X(0), then apply:

          X(n+1) = X(n) - f(X(n)) / f'(X(n))

          where
          X(n+1) is the NEXT guess after the value you 'know',
          X(n) is that most recent value you know,
          f(X(n)) is the function evaluated at X(n) and
          f'(X(n)) is the first derivative of f(x) evaluated at X(n).

          It's not foolproof and a BOTH whether it converges at al AND how FAST it converges depends on the initial guess, X(0)

          The "Secant Method" is an improvement that makes it a little 'smarter,' at the expense of more computation (this is often a positive trade-off on numerical modeling codes, since the 'smarter' algorithm does tend to converge faster). There are other improvements as well, such as the Los Alamos Linear Feedback Solver (a slightly modified secant method that converges about 10-17% faster, at least for some types of problems) that I use in my own codes.

          Obligatory wikipediea followup: Newton's Method [wikipedia.org]
          [ Parent ]
        • Re:What's with use of Pointers? by The boojum (Score:3) Friday December 01 2006, @03:21PM
        • Re:What's with use of Pointers? by mrogers (Score:1) Friday December 01 2006, @03:30PM
        • Re:What's with use of Pointers? by fractalus (Score:2) Friday December 01 2006, @03:31PM
        • Re:What's with use of Pointers? by Abcd1234 (Score:2) Friday December 01 2006, @03:35PM
        • Re:What's with use of Pointers? (Score:5, Informative)

          by arniebuteft (1032530) <buteft&gmail,com> on Friday December 01 2006, @04:58PM (#17073326)
          The article at http://www.math.purdue.edu/~clomont/Math/Papers/20 03/InvSqrt.pdf [purdue.edu] really explains it better, but the point is to trick the code into performing integer operations on a float. You start the function with a float, which is 32 bits, arranged in a very specific sequence: first bit is sign (0 for positive, 1 for neg), next 8 bits are exponent (offset by 127 to allow positive and negative exponents), and the last 23 bits are the mantissa (normalized to assume the decimal point has a binary 1 to its left). So, the simple good old number 21 in float is the sequence 0 10000011 01010000000000000000000, for the sign, exponent, and mantissa (omit the spaces of course). That same number 21, stored as a 32-bit integer is 00000000 00000000 00000000 00010101 (again, omit the spaces).

          The trick of this function is to take the 32 bits of data that are really a float, but process it as if it's an integer. So you take that cumbersome number 21 as a float, then BAM! presto, turn it directly to an integer not through type conversion, but by simply treating those same 32 bits as if they were representing an integer all along.

          Let's use the number 21 as an example in the function call.

          The binary representation of 21 as a float is 01000001 10101000 00000000 00000000 (broken into 8-bit words for clarity). The function then goes to create a new integer i, whose value is also 01000001 10101000 00000000 00000000 (which happens to be 1101529088 in decimal). The magical line of the code, i = 0x5f3759df - (i>>1), takes that integer i, shifts its bits one to the right (turning our 01000001 10101000 00000000 00000000 into 00100000 11010100 00000000 00000000, or 550764544 in decimal), then subtracts it (still doing integer math here) from 0x5f3759df (which is 01011111 00110111 01011001 11011111 or 1597463007 in decimal), and winds up with 00111110 01100011 01011001 11011111 (or 1046698463 in decimal).

          Now, for its next trick, it takes that wonky integer 1046698463, and turns it back into a floating point number, by the same trick used above, i.e. simply by looking at those same 32 bits, and pretending they're a float, not an int. The binary representation of 1046698463, 00111110 01100011 01011001 11011111, is the same as 0.22202251851558685 in float.

          From here on out, it's all floating math. Apply the Newton-Rhapson method (thats the next line), we get x = 0.22202251851558685 * (1.5 - ( (21*.5) * 0.22202251851558685^2 )) = 0.218117811. We return this value at the closing of the function. As it turns out, the inverse square root of 21 is 0.21821789... (thanks Google calc). So, I have no idea WHY the Float to Int to Float trick works, but it works very well.

          Whew!

          [ Parent ]
        • Re:What's with use of Pointers? by fireboy1919 (Score:2) Friday December 01 2006, @05:05PM
        • Re:What's with use of Pointers? by HTH NE1 (Score:3) Friday December 01 2006, @05:12PM
          • 1 reply beneath your current threshold.
      • Re:What's with use of Pointers? by The boojum (Score:2) Friday December 01 2006, @02:48PM
      • Re:What's with use of Pointers? (Score:5, Informative)

        by ToxikFetus (925966) on Friday December 01 2006, @02:56PM (#17071070)
        Why does the coder use

        (1) int i = *(int*)

        instead of just

        (2) int i = (int)x;
        (some of my points added for emphasis)

        I'll take a swing at this one. It's because the author doesn't want the value of x, but the integer representation of the value at x's memory address.

        If x is 3.14159, (2) will result in i==3, whereas (1) will result in whatever the 4-byte IEEE-754 representation of 3.14159 is (0x40490FD0, if Google is correct). By using (1), the author is able to use integer bitwise opeartions (>>) to perform "free" floating point operations. When i is sent back into floating point form via:

        x = *(float*)

        x now contains the value of the integer operation:

        i = 0x5f3759df - (i >> 1);

        which was presumably faster than an identical floating point operation. It's a nifty little solution, especially with regard to the selection of the magic number.
        [ Parent ]
      • Re:What's with use of Pointers? by mooingyak (Score:2) Friday December 01 2006, @03:00PM
      • Re:What's with use of Pointers? by eneville (Score:1) Friday December 01 2006, @02:51PM
      • Re:What's with use of Pointers? by Cyberax (Score:2) Friday December 01 2006, @03:37PM
      • 1 reply beneath your current threshold.
    • Re:What's with use of Pointers? by Anonymous Coward (Score:1) Friday December 01 2006, @02:42PM
    • Re:What's with use of Pointers? by MeanMF (Score:3) Friday December 01 2006, @02:43PM
    • Re:What's with use of Pointers? by Second_Derivative (Score:1) Friday December 01 2006, @02:43PM
    • Re:What's with use of Pointers? (Score:5, Informative)

      by ewhac (5844) on Friday December 01 2006, @02:44PM (#17070820)
      (http://ewhac.best.vwh.net/ | Last Journal: Saturday August 18 2001, @10:28PM)
      C "understands" ints and floats. If you do the simple cast:

      int i = (int)x;

      Then C will simply convert the float value into an integer value (throwing away fractional part). But this isn't what we want. We want to operate on the bits of an IEEE floating point value directly, and integers are the best way to do that.

      So first, we lie to the compiler by telling it we have a pointer to an int:

      (int *) &f

      And then we deference the pointer to get it into an operable int:

      i = *(int *) &f

      Note what's important here is to keep the compiler from modifying any part of the original 32-bit value.

      Schwab

      [ Parent ]
    • Re:What's with use of Pointers? by uhoreg (Score:2) Friday December 01 2006, @02:47PM
    • Re:What's with use of Pointers? by ars (Score:2) Friday December 01 2006, @02:48PM
    • Because that's what he wanted to do. by mcg1969 (Score:2) Friday December 01 2006, @02:48PM
    • Re:What's with use of Pointers? by doshell (Score:2) Friday December 01 2006, @02:49PM
    • Re:What's with use of Pointers? by systemeng (Score:1) Friday December 01 2006, @02:54PM
    • Re:What's with use of Pointers? by 91degrees (Score:1) Friday December 01 2006, @03:19PM
    • Re:What's with use of Pointers? by gnasher719 (Score:2) Friday December 01 2006, @04:30PM
    • Re:What's with use of Pointers? by prockcore (Score:2) Friday December 01 2006, @06:52PM
    • 2 replies beneath your current threshold.
  • by Stavr0 (35032) on Friday December 01 2006, @02:42PM (#17070778)
    (http://slashdot.org/~Stavr0/journal/ | Last Journal: Thursday January 19 2006, @01:18PM)
    New hotness: Fast InvSqrt()

    Naah, just kidding. They both deserve a spot in the Clever Hacks Hall of Fame

  • Hmm... (Score:2, Funny)

    by porkmusket (954006) on Friday December 01 2006, @02:43PM (#17070798)
    (http://spiderweblabs.com/)
    Suddenly I'm very scared that Ballmer will want to InvSqrt me some pictures of his kids.
    • Re:Hmm... by gstoddart (Score:3) Friday December 01 2006, @02:54PM
  • Poor function name (Score:4, Insightful)

    by bloobloo (957543) on Friday December 01 2006, @02:43PM (#17070800)
    (http://paul-mclaughlin.com/)
    The inverse of the square root is the square. The reciprocal of the square root is what is being calculated. In the case of multiplication I guess it's a reasonable name but it's pretty poor form in my view.
  • It was fast (Score:4, Informative)

    by KalvinB (205500) on Friday December 01 2006, @02:43PM (#17070808)
    (http://www.icarusindie.com/)
    http://www.icarusindie.com/DoItYourSelf/rtsr/index .php?section=ffi&page=sqrt [icarusindie.com]

    That page compares the time it takes to calculate the sqrt various ways including Carmacks. Short version is that modern processors are significantly faster since it can be done in hardware. It may still be useful in cases where the processor doesn't have the sqrt function available.

    His version took 428 cycles compared to 107 cycles doing it in hardware on the same system.
    • Re:It was fast (Score:5, Insightful)

      by systemeng (998953) on Friday December 01 2006, @03:19PM (#17071500)
      First off, this function calculates 1.0/sqrt(x), not sqrt(x). InvSqrt is a particularily nasty function because both the divide and the square root stall the floating point pipeline on IA32 processors. As a result, instead of shooting out one result per cycle that the pipelining normally allows, the processor will stall for 32 cycles for the divide after it has stalled for the 43 cycles for the square root(P4). This is a big hit to realtime performance and it also prevents 76 multiplies from getting done while the pipeline is stalled. Secondly, IA32 processors are super scalar and have multiple integer units which can do portions of this calculation in parallel. This algorithm is brilliant because it uses the integer units for a portion of the most difficult part of the calculation and the remaining floating point multiplies only take about 6 clock cycles on the FPU. The difference in clock cycles you are counting is likely because the routine as written will be implemented as a function call and the stack push overhead will eat you alive. If this is implemented inline, it's about 6 times as good as simply calling the processor's assembly instructions for root and divide in sequence with the penalty that it isn't as accurate. It is virtually impossible to beat sqrt on IA-32 but 1.0/sqrt can be computed faster with newton raphson iteration in one fell swoop than by coposition of the operations. I've worked several years implementing similar optimizations in the reference implementation of ISO/IEC 18026, a standard for digital map conversion. Most of the routines that had optimizations like this added to them saw at least 30% speed improvements. This is a bit of a soft number because many things were reordered to make the pipeline fill better but in general, a complicated function especially of trig fucntions that can be computed in one iteration of well designed newton-raphson will be much faster than the coposition of the CPU's implementation of the component functions. In short, don't write off careful numerics they can provide great sped improvements, just don't use them in code that people will want to understand later if you don't document exactly what you did and why.
      [ Parent ]
      • Re:It was fast by imsabbel (Score:2) Friday December 01 2006, @03:41PM
        • Re:It was fast by systemeng (Score:1) Friday December 01 2006, @04:40PM
      • Re:It was fast (Score:5, Informative)

        by adam31 (817930) <[moc.liamg] [ta] [13mada]> on Friday December 01 2006, @04:07PM (#17072390)
        Okay, let's try x86...

        rsqrtss xmm1, xmm0
        about 5 cycles. And it can pipeline.

        Not a fan of x86? Maybe altivec...
        vrsqrtefp V2, V1
        depends, but 12 cycles probably and pipelined.

        On PS3's SPU it's rsqrte (6 cycles), on 3dNow it's pfrsqrt (8 cycles) both pipelined. Even PS2 had rsqrt (13 cycles). There's just no reason for software reciprocal square root. It's a cool trick, but it's not even useful anymore.

        [ Parent ]
        • Re:It was fast by Hamilton Lovecraft (Score:2) Friday December 01 2006, @04:42PM
        • Re:It was fast by SETIGuy (Score:2) Friday December 01 2006, @04:45PM
        • Re:It was fast by systemeng (Score:3) Friday December 01 2006, @04:48PM
    • Re:It was fast by 91degrees (Score:1) Friday December 01 2006, @03:34PM
    • Re:It was fast by NFNNMIDATA (Score:2) Friday December 01 2006, @03:55PM
    • Hardware not that much faster, if at all. by SETIGuy (Score:2) Friday December 01 2006, @04:32PM
    • Re:It was fast by julesh (Score:2) Saturday December 02 2006, @03:32PM
    • 1 reply beneath your current threshold.
  • Cool Journey (Score:1)

    by locotx (559059) on Friday December 01 2006, @02:50PM (#17070958)
    This was a great article. Especially for the script kiddies and wanna be HaXoRz. It's great to enlighten others about the art of true technology. Artcle included the following: * Mystery (who wrote this) * A Puzzle to solve (how did they come up with some a clevery solution) * Games (Quake, Doom, 3D graphics) * Math (Inverse Square-Root) * History (Who's who of 3D programming, VooDoo->3dFx->nVidia) * Lessons (Skill, Assembly Language, Research, OpenSource is good) Kinda brought back feelings of the "Revenge of the Geeks" series by PBS, when you get to place faces, names, ideas, feelings and thoughts to technical acheivements.
  • Mirrordot the airticle cut-and-paste (Score:5, Informative)

    by zoftie (195518) on Friday December 01 2006, @02:50PM (#17070964)
    (http://www.perlpimp.com/)
    Introduction
    Note!

    This article is a republishing of something I had up on my personal website a year or so ago before I joined Beyond3D, which is itself the culmination of an investigation started in April 2004. So if timeframes appear a little wonky, it's entirely on purpose! One for the geeks, enjoy.
    Origin of Quake3's Fast InvSqrt()

    To most folks the following bit of C code, found in a few places in the recently released Quake3 source code, won't mean much. To the Beyond3D crowd it might ring a bell or two. It might even make some sense.

    InvSqrt()

    Finding the inverse square root of a number has many applications in 3D graphics, not least of all the normalisation of 3D vectors. Without something like the nrm instruction in a modern fragment processor where you can get normalisation of an fp16 3-channel vector for free on certain NVIDIA hardware if you're (or the compiler is!) careful, or if you need to do it outside of a shader program for whatever reason, inverse square root is your friend. Most of you will know that you can calculate a square root using Newton-Raphson iteration and essentially that's what the code above does, but with a twist.
    How the code works

    The magic of the code, even if you can't follow it, stands out as the i = 0x5f3759df - (i>>1); line. Simplified, Newton-Raphson is an approximation that starts off with a guess and refines it with iteration. Taking advantage of the nature of 32-bit x86 processors, i, an integer, is initially set to the value of the floating point number you want to take the inverse square of, using an integer cast. i is then set to 0x5f3759df, minus itself shifted one bit to the right. The right shift drops the least significant bit of i, essentially halving it.

    Using the integer cast of the seeded value, i is reused and the initial guess for Newton is calculated using the magic seed value minus a free divide by 2 courtesy of the CPU.

    But why that constant to start the guessing game? Chris Lomont wrote a paper analysing it while at Purdue in 2003. He'd seen the code on the gamedev.net forums and that's probably also where DemoCoder saw it before commenting in the first NV40 Doom3 thread on B3D. Chris's analysis for his paper explains it for those interested in the base math behind the implementation. Suffice to say the constant used to start the Newton iteration is a very clever one. The paper's summary wonders who wrote it and whether they got there by guessing or derivation.
    So who did write it? John Carmack?

    While discussing NV40's render path in the Doom3 engine as mentioned previously, the code was brought up and attributed to John Carmack; and he's the obvious choice since it appears in the source for one of his engines. Michael Abrash was mooted as a possible author too. Michael stands up here as x86 assembly optimiser extraordinaire, author of the legendary Zen of Assembly Language and Zen of Graphics Programming tomes, and employee of id during Quake's development where he worked alongside Carmack on optimising Quake's software renderer for the CPUs around at the time.

    Asking John whether it was him or Michael returned a "not quite".

    -----Original Message-----
    From: John Carmack
    Sent: 26 April 2004 19:51
    Subject: Re: Origin of fast approximated inverse square root

    At 06:38 PM 4/26/2004 +0100, you wrote:

    >Hi John,
    >
    >There's a discussion on Beyond3D.com's forums about who the author of
    >the following is:
    >
    >float InvSqrt (float x){
    > float xhalf = 0.5f*x;
    > int i = *(int*)
    > i = 0x5f3759df - (i>>1);
    > x = *(float*)
    > x = x*(1.5f - xhalf*x*x);
    > return x;
    >}
    >
    >Is that something we can attribute to you? Analysis shows it to be
    >extremely clever in its method and supposedly from the Q3 source.
    >Most people say it's your work, a few say it's Michael Abrash's. Do
    >you know who's responsible, possibly with a history of sorts?

    Not me,
  • It Was Obviously... (Score:4, Funny)

    by eno2001 (527078) on Friday December 01 2006, @02:54PM (#17071046)
    (http://www.kickthebobo.com/erotech/index.html | Last Journal: Friday October 26, @11:51AM)
    ...the work of John Romero. Apparently he was going to call it the MakeYouMyBitch() function. ;P
  • Fast functions (Score:1)

    by LiquidCoooled (634315) on Friday December 01 2006, @02:55PM (#17071066)
    I haven't seen code like this since I stopped coding Mandelbrot sets in 68k assembler.
    As a young newt I looked at the code inside fractint with awe and discovered some similar marvellous optimisations.
    Building on those and converting to 68k I made what was the fastest mandelbrot calculation I could.

    Another blast from the past.
  • by Matchstick (94940) on Friday December 01 2006, @02:56PM (#17071068)
    Specifically, the compilation _Notation, Notation, Notation_, page 130.
  • It might be damn smart.. (Score:5, Insightful)

    by kan0r (805166) on Friday December 01 2006, @03:01PM (#17071186)
    (http://strace.org/kaners)
    But the first thing I thought when I saw this was: "Damn, that code is a mess!"
    Seriously, try looking away from the genius who obviously wrote it.
    • There is no single comment which would make reading and understanding what happens here much easier!
    • Introduction of a magic number with no explanation whatsoever
    • Magic pointer arithmetics without demystification
    • Portability? Abuse of a single processor architecture, without warning that this would not work on non-x86
    I know it is good code. But it is simply bad code!
  • Error! (Score:3, Funny)

    by jrmiller84 (927224) on Friday December 01 2006, @03:05PM (#17071264)
    (http://www.jamesoft.net/)
    float InvSqrt(float x) { float xhalf = 0.5f*x; int i = *(int*) i = 0x5f3759df - (i >> 1); x = *(float*) x = x*(1.5f - xhalf*x*x); return x; } Asking John whether it was him or Michael returned a "not quite". But it's supposed to return a float!
    • Re:Error! by jrmiller84 (Score:1) Friday December 01 2006, @03:19PM
      • Re:Error! by ralmeida (Score:2) Friday December 01 2006, @04:24PM
        • 1 reply beneath your current threshold.
    • Re:Error! by rmdir -r * (Score:2) Friday December 01 2006, @05:16PM
  • hakmem (Score:3, Interesting)

    by trb (8509) on Friday December 01 2006, @03:27PM (#17071680)
    The article barely mentions HAKMEM [wikipedia.org], but the invsqrt hack is reminiscent of the HAKMEM programming hacks [pipeline.com], which were published in 1972. Several of these hacks use bit fiddling with magic constants to perform tasks in straight-line code, that you would ordinarily think of doing with iteration.

    HAKMEM is classic bathroom reading for hackers. If you want to do it up old-school, print a copy from original scans [mit.edu], double-sided.

    • Re:hakmem by Hamilton Lovecraft (Score:1) Friday December 01 2006, @04:47PM
  • Really old news... (Score:2)

    by Jahz (831343) on Friday December 01 2006, @03:33PM (#17071788)
    (http://www.adkap.com/ | Last Journal: Thursday August 10 2006, @04:10PM)
    As the article says, this snippet has been around since the original Quake came about. Back then (a decade?) the hardware was so much slower than it is today that this approx function was absolutely needed to write the game. It's not new, nor mysterious. In fact I first encountered this code on a forum where somebody linked to the top10 most interesting code snippets (or something to that end)... I've even used it my own code here and there. The crazy hex number exploits IEEE floating point.
     
    A not that is not mentioned here: you can repeat the IEEE magic number line again for a more accurate result :)
  • O_o (Score:1)

    by Rankiri (1002633) on Friday December 01 2006, @03:38PM (#17071866)
    Look son, crazy people...
    • 1 reply beneath your current threshold.
  • Back around 1990, I wrote a game called Alpha Waves (Preview [the-underdogs.info]). Back then, the problem was not using integers to approximate floating point math, it was avoiding integer multiplications, because those were expensive compared to additions. For those of you with a curious mind, you can take a look at http://cc3d.free.fr/cube.s [cc3d.free.fr] for the original source code.

    So here is a quizz, the first one answering wins the admiration of the crowd:

    - How did that engine avoid multiplications?

    - How did it compute sine and cosines?

    - What low-level hardware trick did it use in two-player mode, and how could that be used for performance tuning?

    I believe Alpha Waves was the first video game on personal computers with full-screen "real" 3D (i.e. not scaled sprites), and I also believe it was the first 3D game with two simultaneous players sharing a screen. Would anybody have data to confirm or infirm that?

    --
    -- Physics for Dummies [cc3d.free.fr]
  • To all those wondering why John bothers to push out the source to id's game engines after the fact, the snippet of code at the very top of this article is a poster child for why. Not only do you get well-programmed and well-optimised 3D engines to modify and learn from, you get gems like the fast invsqrt function to show you that it's not all about the 3D hardware, and that software is arguably even more of a factor when analysing 3D performance
    When this software code it turned into a hardware circuit and patented then the magic number on this line,"i = 0x5f3759df - (i>>1);" will be the focus of the patent on the hardware circuit. It becomes very important to know who managed to skillfully approximate that mathematical technique with the code.

    Note that the final magic number isn't the one implicated in the paper on lamont.org though the paper does test it with two other possible candidates. Gary Tarolli mentions that he may have changed which particular magic number was used in the final code but that he didn't write the code itself. Likely, if this particular fast square root function is embedded as a circuit and the circuit patented, the legal documents will attribute the code to Gary since he made the most significant contribution of actually choosing which number was used in the code which was released.

    The question still remains, though: who wrote the routine?
  • Alternate mod type (Score:2, Funny)

    by JBHarris (890771) <bharris@[ ].com ['isf' in gap]> on Friday December 01 2006, @03:55PM (#17072184)
    I wish there was an option to mod some of these posts "+1 Whoa".
  • Future Crew? (Score:1)

    by SignalFreq (580297) on Friday December 01 2006, @03:56PM (#17072216)
    Perhaps this is from the old days of the Future Crew demo scene? They did some amazing things with limited CPU cycles back then (1988-1993).
  • This was done in the 60's and 70's (Score:1, Interesting)

    by Anonymous Coward on Friday December 01 2006, @04:09PM (#17072416)
    I know my Dad used this technique in the 60's or 70's for vector normalization. It was quite valuable on systems that had no hardware floating point divide (and nothing in that era had a hardware square-root).

    Being non-iterative it also has a deterministic speed, which is valuable in
    real-time applications.

    Integer adds/subtracts on floating-point representations were also sometimes used to approximate multiplication and division (by adding/subtracting the
    exponent)
  • by Ninja Programmer (145252) on Friday December 01 2006, @04:23PM (#17072692)
    (http://www.pobox.com/~qed/)
    Here's an old version of one of my webpages:

            http://web.archive.org/web/19990210111728/www.geoc ities.com/SiliconValley/9498/sqroot.html [archive.org]

    And here's an updated version of the same page:

            http://www.azillionmonkeys.com/qed/sqroot.html [azillionmonkeys.com]

    It isn't an exact rendering of the code in question, but it explains enough for any skilled hacker to 1) understand what's going on and 2) modify the code to create the resulting code that's in the Quake 3 source. Furthermore this web page has existed since about 1997 (archive.org doesn't go back that far for some reason.)

    Now *IF* in fact the code origin comes from someone who took ideas from my site, I should point out that *I* am not the originator of the idea either (though I did write the relevant code). Bruce Holloway (who I credit on the page) was the first person to point out this technique to me at around the 1997 timeframe (prior to this, I created my own method which is similar, but not really as fast). (Vesa Karvonen informed by of the technique (through a code snippet with no explanation) at roughly the same time as well.) It was a technique well known to hard core 3D accelerator and CPU implementors, and follows an intentional design idea from the IEEE-754 specification.

    Prof. William Kahan, one of the key people who specified the IEEE-754 standard (the standard for floating point the many CPUs use, starting with Intel's 8087 coprocessor) apparently presented this idea, and is the source for where Bruce Holloway got the idea. The IEEE-754 standard came out around the 1982 time frame. Though, its very likely that these ideas originate from even earlier in computing history.
  • Clever trick! (Score:5, Informative)

    by kent.dickey (685796) on Friday December 01 2006, @05:58PM (#17074370)
    To summarize, the article is about a piece of code to approximate 1/sqrtf(f):

    float InvSqrt (float x){
      float xhalf = 0.5f*x;
      int i = *(int*)&x;
      i = 0x5f3759df - (i>>1);
      x = *(float*)&i;
      x = x*(1.5f - xhalf*x*x);
      return x;
    }
    The trick is the "i = 0x5f3759df" line. It's certainly a magic number.

    The algorithm is simple Newton-Raphson -- make a good initial guess, then iterate making the guess better. I think Newton-Raphson on 1/sqrt picks up 5-6 bits each try in the line "x = x*(1.5f - xhalf*x*x)". It can be repeated to get a more accurate result each time it's repeated.

    The problem with Newton-Raphson is making a good first guess--otherwise, you need an extra iteration or two. And that's what the magic number is doing, making a good first guess.

    So let's work out what a good first guess would look like for 1/sqrt(f), to see where this code came from.

    Floating Point numbers are stored with a mantissa and an exponent: f = mantissa * (2 ^ exponent), where exponent is 8-bits wide and the mantissa is 23-bits wide.

    Let's take an example: 1/sqrt(16) would have f = 1.0 * (2 ^ 4). We want the result 0.25 which is f = 1.0 * ( 2 ^ -2).

    So our first guess should take our exponent, negate it, and cut it in half. (Try more examples to see that this works--it's basically the definition of 1/sqrt(f)). We'll ignore the mantissa--if we can just get within a factor of 2 of the answer in one step, we're doing pretty well.

    Unfortunately, the exponent is stored in FP numbers in an offset format. In memory,

    exp_field = (actual_exp + 127) << 23
    The mantissa is in the low 23 bits, and the most-significant bit is the sign (which will be 0 if we're taking roots). For now, let's just assume we have our exponents as 8-bit values, to work out what we need to do with the +127 offset.

    We want new_actual_exp = -(actual_exp)/2. But in memory, exp = (actual_exp + 127). Or, actual_exp = exp - 127.

    Substituting gives (new_exp - 127) = -(exp - 127)/2. Simplify this to: new_exp = 127 - (exp - 127)/2 => new_exp = 3*127/2 - (exp / 2).

    Now the exponent is shifted 23 places in memory, so let's write out our code (and ignore the mantissa completely for now...):

    i = ((3*127)/2) << 23) - (i >> 1);
    rewriting as hex:

    i = 0x5f400000 - (i >> 1);
    Well, first, it's arguable whether it should be 0x5f000000 or 0x5f400000 (The "4" is actually in the mantissa). I'm guessing resolving that dilemma led to the original author discovering that choosing a particular pattern of bits in the mantissa can help make the initial guess even more accurate, leading to the 0x5f3759df constant.

    I haven't worked it out, but Chris Lomont http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf [lomont.org] shows this first guess is accurate to about 4-5 bits of significance for all floating point values. That's a good result, considering that mucking with the exponents was just hoping to get us within 1-2 bits of significance.
    • Re:Clever trick! by fph il quozientatore (Score:1) Saturday December 02 2006, @07:34PM
  • not that accurate (Score:1)

    by quakehead3 (988738) on Friday December 01 2006, @06:36PM (#17074958)
    (Last Journal: Monday January 15 2007, @10:11PM)
    I compiled the function and tested it with few numbers...and compared the output with 1/sqrt(x)

    InvSart(4) = 0.499154
    1/sqrt(4) = 0.5

    InvSart(10) = 0.315686
    1/sqrt(10) = 0.316228

    InvSart(2) = 0.70693
    1/sqrt(2) = 0.707107

    InvSart(100) = 0.0998449
    1/sqrt(100) = 0.1

    so it's pretty accurate to around three figures...
    • 1 reply beneath your current threshold.
  • Similar solution for x^(-1/4) (Score:2, Interesting)

    by jothaxe (822206) on Friday December 01 2006, @07:44PM (#17075990)
    I think I have found a similar solution for the reciprocal of the FOURTH root. The code looks almost the same, but it requires a new constant and you shift the bits 2 places instead of just 1. float Inv4thrt(float x) { float x0 = x; int i = *(int*) // get bits for floating value i = 0x4F541EEF - (i>>2); // gives initial guess y0 x = *(float*) // convert bits back to float x = (x/4.0f)*(5 - x0 * x*x*x*x); // Newton step, repeating increases accuracy return x; } Note that the constant I have here is not very carefully tuned so the error may be slightly larger than with the sqrt. Pretty cool, I think.
  • lousy code (Score:2)

    by idlake (850372) on Friday December 01 2006, @08:00PM (#17076144)
    The code may or may not be a good way of solving the problem, but it's lousy code nonetheless:

    * it's badly named: "reciprocal square root" would be better (however, that's a problem with graphics lingo)
    * it lacks a comment as to where it came from
    * it lacks a comment as to how it works or how the constant was determined
    * it lacks a comment as to how well it works
    * it lacks a comment as to the range of arguments it's valid for
    * it lacks (optional) error checking
    * it lacks a comment as to what environment it works in
    * it lacks a straightforward base implementation
    * there don't seem to be any test cases either

    The code posted in the article is actually not exactly the Q3 source code. The Q3 source code actually contains two slightly different versions of the function, one called "Q_rsqrt" commented (with "// what the fuck") and a single iteration, and second one that's uncommented and named incorrectly "SquareRootFloat" and uses two iterations. Neither of them say what the function actually does, and the fact that there are two of them is also not so great.

    Yeah, that kind of code causes me "to go wobbly in the knees"--by annoying the hell out of me--because the programmer who wrote it is wasting my time. Pretty predictably, sooner or later, this code will break something, and it will first take time to figure out that this code is responsible, then it will take some time to figure out what it is supposed to do, and finally, it will be a pain to figure out how to fix it. Or, I'd just rewrite it.
    • Re:lousy code by WilliamSChips (Score:2) Saturday December 02 2006, @07:42PM
    • 1 reply beneath your current threshold.
  • by adam31 (817930) <[moc.liamg] [ta] [13mada]> on Friday December 01 2006, @09:50PM (#17077028)
    Probably no one cares, since log and pow aren't as useful as rsqrt, but I wrote a FastLog (base 2) that uses basically the same principle.

    inline float FastLog2( float x )
    {
    unsigned int fint = (*(unsigned int*)&x & 0x7FFFFFFF);
    float exp = (float)( fint >> 23) - 127.0f;
    fint &= ~0x40000000;
    unsigned int mant = fint | 0x3f800000;
    float fmant = *(float *)&mant - 1.0f;

    // cheaper, slightly faster approximation:
    // return exp + 1.3333f*fmant - 0.33333f*fmant*fmant;

    // better, slightly slower approximation:
    float t0 = 0.1566666f * fmant + .5746666f;
    return exp + t0*fmant + (1.0f-t0)*fmant*(2.0f-fmant);
    }
    I guess the useful piece is that you can do basically the same thing with exp2(x) and combine it with log2(x) to get a decent powf(x,y) approximation.

    inline float FastExp2( float x )
    {
    unsigned int exp = ((unsigned int)x + 127) << 23;
    float fexp = *(float *)&exp;
    float fmant = x - (float)(unsigned int)x;
    float t0 = (.695f - .08f*fmant);
    float curve = 1.0f + t0*fmant + (1.0f-t0)*fmant*fmant;

    return fexp * curve;
    }

    inline float FastPow( float x, float y )
    {
    float logx = FastLog2( x );
    float answer = FastExp2( y*logx );

    return answer;
    }
    Enjoy!
  • breaks on gcc (Score:2)

    by idlake (850372) on Friday December 01 2006, @10:59PM (#17077544)
    Apart from the unnecessary obscurity of the code, the code actually simply computes the wrong values with current versions of gcc and optimization turned on. See here:

    http://gcc.gnu.org/ml/gcc-bugs/2006-03/msg02943.ht ml [gnu.org]

    http://gcc.gnu.org/ml/gcc-bugs/2006-03/msg02957.ht ml [gnu.org]

    It also makes unstated assumptions about the values it gets called with; call it with something else and you get bad results.
  • In other words... (Score:1)

    by redblue (943665) on Friday December 01 2006, @11:02PM (#17077564)
    Java sucks, right?
  • by d_jedi (773213) on Friday December 01 2006, @11:13PM (#17077620)
    But it IS just an approximation. It doesn't give the right answers.

    Ex: InvSqrt(25) returns 0.19969 instead of the expected result of 0.2. (ie. sqrt(25) = 5, 1/5 = 0.2, for those wondering about the math).
  • by kazad (619012) on Friday December 01 2006, @11:16PM (#17077640)
    (http://instacalc.com/)
    The code is extremely clever. Net: It finds the inverse square root [1/sqrt(n)] using a great initial guess and one iteration of Newton's approximation method. It avoids excessive division, the square root operation, and multiplication, which are computationally expensive.

    I'm not an expert, but heres how I understand it:

    1. Background: Newton's method finds roots of any function
    ------

    What does factoring an equation have to do with finding 1/sqrt(n)? A lot. Give me a number n. I now make the function

    f(x) = 1/sqrt(x) - n

    Notice that when you find an x where f(x) = 0, it means x is the inverse square root of n:

    f(x) = 0
    1/sqrt(x) - n = 0
    1/sqrt(x) = n
    x = 1/sqrt(n)

    In other words, I need to find the root of that equation. Newton's method lets you do this by picking a starting value, seeing how far off you are, and getting closer and closer with each iteration. There's more info online. With Newton's method, call your initial guess "g". An better approximation for the root is

    guess_new = g - f(g)/f'(g)

    In our case, f(x) = 1/x^2 - i (where i is the initial value, as seen in the code). We use the power rule to see that f'(x) = -2x^-3, and plug it into the guess_new equation above:

    guess_new = x - (1/x^2 - i)/-2x^-3
    guess_new = x(1.5 - ix^2)

    which is exactly what the code above has. If you keep plugging "guess_new" back in the equation, you can get closer and closer to the answer.

    Here is a demo of multiple iterations to find inverse square: http://tinyurl.com/vh7hg/ [tinyurl.com] Try plugging in different initial guesses (.2, .4, .8) to see how it converges. With me so far? Newton's method finds roots, and finds them fast if given a good guess.

    2. Now our problem becomes: How can we make a good guess?
    ------

    If we had a lot of time, we could just pick a random number and keep iterating using the method above. But that would be slow - we want a *good* guess.

    Well, our best guess for the inverse square root is the inverse square root itself! What's a good way to get 1/sqrt(n)?

    This is the first level of magic. Assume you have a number in exponent form, like this:

    10^6 = 1 million

    If you want to find the regular square root of 1 million, just divide the exponent by 2.
    sqrt(10^6) = 10^(6/2) = 10^3 = 1 thousand.

    If you want the *inverse* square root, divide the exponent by -2 to flip the sign.

    invsqrt(10^6) = 10^(6/-2) = 10^-3 = 1/thousand

    Ok so far? Our goal is to divide the exponent of i (our number) by -2 to get a really awesome guess for Newton's approximation method.

    3. Floats are stored in mantissa-exponent form
    ------

    This is the key. Floating-point numbers have an explicit exponent and mantissa component. Theoretically, we could mask out the bits for the exponent and do division.

    But division is expensive; the code uses another clever hack. Shifting bits is the same as dividing by 2 (or 4, 16, or any other power; the remainder is truncated, which is OK for an approximation).

    So we can divide by 2 easily. And if we want a negative number, instead of multiplying by -1 (expensive), we can just subtract the number from "0" (cheap).

    The program converts the floating point into an integer (using the pointer tricks), shifts the bits by 1 to halve the exponent, and subtracts from "0" (the magic number - hold on) to negate it.

    4. Why the magic number 0x5f37...?
    ------

    We can't just subtract from zero, there's too much going on. First, by shifting the bits we mave move some of the exponent bits into the mantissa. Also, there are different cases of odd/even exponents. The paper goes into lots of special cases, I didn't really understand them all first time around. But the magic number tries to minimize errors, and there can be several magic numbers used.

    5. What's the result?
    ------

    The result is that you get a great initial value to
  • So, if you do

    fortune -m 'bits in'

    on any modern linux with fortune installed you will get the following,
    (along with usually at least one other tidbit)

    #define BITCOUNT(x)     (((BX_(x)+(BX_(x)>>4)) & 0x0F0F0F0F) % 255)
    #define  BX_(x)         ((x) - (((x)>>1)&0x77777777)                    \
                                 - (((x)>>2)&0x33333333)                    \
                                 - (((x)>>3)&0x11111111))

    It works, and I have used it to good advantage in circumstances where
    nothing else would _quite_ do...

    What, you ask, does it do?  It gives you the number of bits which are set in a
    given byte/word/dword...

    So BITCOUNT(0x81) == 2

    and so on.

  • by Bromskloss (750445) on Saturday December 02 2006, @07:29AM (#17079538)
    Or is it fast already? Does the hardware just decrease the exponent with one when it sees that the value to multiply x with is 0.5? Otherwise, it sounds like that's something you would want to do manually, right?
  • Don't know why (Score:2)

    by 12357bd (686909) on Saturday December 02 2006, @11:56AM (#17081064)

    Really, by the comments on this page, it's amazing how people use functions they don't understand! What's this? Faith?
    Programming should not be based on faith but on reasoning.
    The article even fails to mention that the function is just a rough approximation at 1/sqrt(), this function would be better labeled RoughInvSqrt(). And yes bitwise manipulation of float point values is cool, yes, that's programming.

  • by TomerM (918036) on Saturday December 02 2006, @02:01PM (#17082084)
    The original article that came out more than a year and a half ago, right after Quake's source code has been revealed, has been published at Code Maestro: http://www.codemaestro.com/reviews/review00000105. html [codemaestro.com]
    If someone doesn't have any good ideas of his own for writing articles, perhaps he should have picked another profession...
  • by gvwalsh (1034796) on Saturday December 02 2006, @05:56PM (#17084138)
    I can shed some more light on the 1/sqrt(x) topic. I coded a vectorized version of this function for the Ardent Titan "graphics supercomputer" in 1987. I did not originate the idea but got it from Cleve Moler, original founder of MathWorks and author of MatLab. I did some work with Gary Tarolli when he was at Stellar, a competitor of Ardent which eventually merged with it to form Stardent computer. He may have picked it up there. The function was widely used in the Ardent 3D code base. I also had a 1/cuberoot(x) version which worked in a similar but slightly more complex manner. It was not used in 3D graphics but did find application in some mechanical engineering analysis programs. - Greg Walsh
  • InvSquirt!?-Sperm. (Score:1, Funny)

    by Anonymous Coward on Friday December 01 2006, @02:53PM (#17071012)
    Nope. It's when it goes in instead of out.
    [ Parent ]
  • Re:x * x, right? (Score:3, Informative)

    by Eudial (590661) on Friday December 01 2006, @02:56PM (#17071086)
    The inverse square root is 1/(sqrt(x)), not x^2 (which is, I admit, the first thing I thought of, wondering why anyone would be so excited about a faster way of getting it).


    Well, you're sort of right.

    f^-1(x) = x^2 is the inverse of f(x) = sqrt(x) where x > 0

    [ Parent ]
    • 1 reply beneath your current threshold.

  • All floating point operations are approximations, writing good floating point code is all about understanding the approximations and tradeoffs you are making.

    and floading point operations can be one of the slowest parts of code

    i have to agree on the lack of comments in the source for games, it can make them very hard to penatrate (particularlly when combined with heavy use of macros)
    [ Parent ]
  • Well, I made a mess of that. Poor formatting and not anonymous. I tried.
    [ Parent ]
  • Re:Bad programmer, no cookie (Score:3, Interesting)

    by DeadCatX2 (950953) on Friday December 01 2006, @04:40PM (#17073016)
    (Last Journal: Tuesday September 19 2006, @01:23PM)

    It is probably a premature optimization that will slow computations down.
    I imagine they ran some profiling tools and found that they were spending a substantial amount of time in calls to sqrt(), and figured that the division was also time-consuming and for an operation that is performed on so many pixels, it was worthwhile to optimize this particular routine.

    This is John Carmack we're talking about. I imagine he knows how to use a code profiler.

    The approximations it involve are likely to cause bugs.
    Floating point always involves approximations. If you don't like them, use a better approximation (double precision). But if you're just trying to, say, figure out which pixel to color, and you approximate the pixel to a few decimal places...I think you're good to go.

    I mean, it's not like they're running scientific simulations with this. It's Quake3.

    Furthermore, converting floating-points to integers and then doing calculations on them is a trick that has been known for years, long before Quake3.
    They weren't converting floats to ints. They manipulated the bitwise representation of the float. I'm not sure if that's what you meant, though.
    [ Parent ]
    • Re:Bad programmer, no cookie (Score:5, Insightful)

      by Chris Burke (6130) on Friday December 01 2006, @08:50PM (#17076620)
      (http://slashdot.org/)
      I imagine they ran some profiling tools and found that they were spending a substantial amount of time in calls to sqrt(), and figured that the division was also time-consuming and for an operation that is performed on so many pixels, it was worthwhile to optimize this particular routine.

      I've written enough 3D graphics code -- including a textured polygon rasterizer that would probably cry and try to delete itself if it saw something like Quake 3 -- to know that they didn't have to run a profiler to know that they'd be spending too much time doing 1/sqrt(x) if they didn't have a highly optimized routine for it. It's an inherent operation in so many 3D calculations it isn't funny, and while by the time Quake 3 came out hardware floating point units were pretty fast, FP divides and FP square roots were very lengthy operations that more importantly couldn't be pipelined.

      But if you're just trying to, say, figure out which pixel to color, and you approximate the pixel to a few decimal places...I think you're good to go.

      Yeah, pretty much. Back when I wrote my code (pentium days) you had an FP unit but it wasn't very good, so I used fixed point math (using integer instructions) which had sufficient precision for a 320x200 display. Getting enough performance out of the core algorithms was still hard enough that I had to take a lot of shortcuts, like instead of doing the right thing by using a divide every pixel to calculate which texel to use, I used a divide every 8 pixels and linearly interpolated in between. I'm sure that Quake (the contemporary 3D engine of the day which would also make my code cry) contained many more clever optimizations and approximations, because it wouldn't have been possible on the hardware of that day without them.

      In fact, approximating FP values for 3D code is so common that the 3DNow and SSE instruction sets contain instructions that approximate the square root and inverse square root to about half of single-precision floating point. The non-approximation instruction uses a lookup table to get an initial guess, then uses a couple iterations of Newton's Method to refine the result. The short cut instructions simply return the value in the lookup table.

      So yeah, basically the AC OP has no idea what he is talking about, and from that basis of ignorance is denigrating what is in reality a very clever and extremely useful hack.
      [ Parent ]
  • 7 replies beneath your current threshold.