Quake First Person Shooters (Games)

Carmack On 3D Linux

Gaza wrote in to send us an essay written by John Carmack to help in making GL drivers for Linux. It's kinda techie stuff, but it's still interesting.
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward
    Xshm is the shared-memory extension and has been around for years. It allows you to pass X protocol stuff in a shared memory segment.

    Xdga allows for direct graphics access to a video card's frame buffer. This has been around
    since Xfree86 3.2 or so and is used for things like XawTV, etc. A write-up can be found
    here [ucsd.edu].
  • by Anonymous Coward
    Well, it depends:
    Office 97 will run happily on a P-100 with 64 MB of RAM. Office 2k, on the other hand, will want my K6-III 500 (I know you said "K6-2 450," but I have a K6-III 500).

    Games are almost always FIRST to stretch the hardware boundaries and drive all that advancement and innovation we love, but the rest of the industry eventually realises "hey, there are, like, a squillion people with massively powerful boxes, let's see if we can tax them as much as these games do!" Thus the proliferation of "features" (I'm pretty sure I don't know about half the features in Office97, and I'll likely never use them).

    Enough for now.
  • by Anonymous Coward
    ...that the programmer is writing code specifically for the TNT, not that the TNT is difficult to write for.

    Writing for the TNT may be easy, but one can't get away with just supporting the TNT.

    Maybe using built-in support for the TNT (and other well-supported cards) and using fbcon/svgalib/X for everyone else would work, though...
  • by Anonymous Coward
    Besides the tri-buffering, most everything makes complete sense... why tri-buffer indices??? I don't see why there would be a limitation??? Unless someone else can point it out for me.
  • by Anonymous Coward
    What PI are actually doing is GLX, not just direct rendering.

    GLX allows _non-local_ X clients to send OpenGL commands to the X server. I.e., a client running on
    somewhere.else.net can send OpenGL drawing instructions to your X server on my.host.net -
    thus allowing the HW-accelerated functions of the gfx card on my.host.net to be used by a program running on somewhere.else.net to display on my.host.net.
    Of course, if the server isn't running on a computer with hw-accelerated 3D, GLX can still be used. There's already a glx module for XFree86 that just passes all GLX stuff to Mesa. It worked fine for me.

    This is much wider reaching than "simply" hw-accelerating direct-rendering to a window.
    Incidentally, you can already get Mesa running on top of Glide to do this, by using a dodgy copy-3d-buffer-mem-to-window hack.

    PI are also laying down a standard implementation for adding HW 3D accel to X, for any 3D card, which is what you're talking about. However, a lot of the first-generation 3D cards don't "officially" support rendering to anything but full screen (thus necessitating dodgy hacks), whereas the X server/client paradigm really should support 3D-accelerated rendering to windows too. Of course, the DGA extension (analogous to, though not the same as, DirectX (which is _not_ Direct3D, which is Microsoft doing their utmost to kill OpenGL)) may well be built upon to allow fullscreen-mode 3D-accelerated rendering.

    VMware and UAE both need DGA for optimum performance.
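    For the curious, the client side of GLX looks roughly like this - my own minimal sketch (compile with -lGL -lX11; error handling omitted), not PI's code:

        /* The same client code works whether rendering ends up direct
           or gets encoded as GLX protocol to a remote X server. */
        #include <GL/glx.h>
        #include <X11/Xlib.h>

        int main(void)
        {
            Display *dpy = XOpenDisplay(NULL);  /* $DISPLAY may be remote */
            int attribs[] = { GLX_RGBA, GLX_DOUBLEBUFFER,
                              GLX_DEPTH_SIZE, 16, None };
            XVisualInfo *vi = glXChooseVisual(dpy, DefaultScreen(dpy), attribs);

            XSetWindowAttributes swa;
            swa.colormap = XCreateColormap(dpy, RootWindow(dpy, vi->screen),
                                           vi->visual, AllocNone);
            Window win = XCreateWindow(dpy, RootWindow(dpy, vi->screen),
                                       0, 0, 640, 480, 0, vi->depth,
                                       InputOutput, vi->visual,
                                       CWColormap, &swa);
            XMapWindow(dpy, win);

            GLXContext ctx = glXCreateContext(dpy, vi, NULL,
                                              True /* direct if possible */);
            glXMakeCurrent(dpy, win, ctx);

            glClearColor(0.2f, 0.2f, 0.2f, 1.0f);
            glClear(GL_COLOR_BUFFER_BIT);  /* becomes GLX protocol if not direct */
            glXSwapBuffers(dpy, win);

            glXDestroyContext(dpy, ctx);
            XCloseDisplay(dpy);
            return 0;
        }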


  • by Anonymous Coward
    triple buffering is very useful...

    Suppose an app wants to switch its display frames synchronously with the vertical retrace. Well, on the PC you won't have to suppose much, because we can only switch frames during the vertical retrace anyway.

    So suppose this app is using double buffering. It renders one buffer while the other one is being shown on screen. When it's done, you want to switch buffers. But the switch cannot occur before the vertical retrace - so you have one buffer that is completely rendered and another that you cannot use yet because it is still being shown on-screen.

    If your screen shows 60 frames per second but you can only render 59 frames per second, with double buffering you will only be able to show 30 frames per second - you take just a bit more than a frame delay to render a buffer, and then you have to wait till the start of the next frame. With triple buffering, you will be able to display your 59 frames per second.
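    To make the arithmetic concrete, here's a toy simulation of that frame pacing (my own sketch, not from the post; the 60Hz / 59fps numbers are just the ones above):

        #include <stdio.h>

        /* Returns displayed frames per second. buffers = 2 or 3. */
        static double fps(int buffers, double hz, double render, double dur)
        {
            double vs = 1.0 / hz;     /* retrace interval */
            double done = render;     /* when the frame being rendered completes */
            int queued = 0;           /* finished frames waiting for a retrace */
            int shown = 0;
            int nback = buffers - 1;  /* one buffer is always on screen */

            for (double t = vs; t <= dur; t += vs) {   /* each retrace */
                while (done <= t && queued < nback) {  /* frames finished */
                    queued++;
                    if (queued < nback)
                        done += render;  /* a buffer is free: keep rendering */
                }
                if (queued > 0) {        /* swap one frame at the retrace */
                    queued--;
                    shown++;
                    if (done <= t)       /* renderer was stalled: restart it */
                        done = t + render;
                }
            }
            return shown / dur;
        }

        int main(void)
        {
            printf("double buffered: %.1f fps\n", fps(2, 60.0, 1.0 / 59.0, 10.0));
            printf("triple buffered: %.1f fps\n", fps(3, 60.0, 1.0 / 59.0, 10.0));
            return 0;   /* prints roughly 30 and 59 */
        }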
  • by Anonymous Coward on Saturday May 01, 1999 @10:42AM (#1907893)
    I think Carmack makes an excellent point about X's rendering pipeline. Under the X Window System, the Xclients (any X program) update the display by writing data to the X Display Server via a socket.

    While this is great for doing things over a network, it limits performance, especially performance where huge chunks of data must be thrown around (like in higher resolutions/depth) on the local machine.

    UNIX (LOCAL) sockets speed this up by about 100% (or more or less, depending on the implementation), but it's still fundamentally flawed.

    Each write to the X Server must involve the kernel. This introduces more overhead than it probably ever should, especially if you want extremely high performance.

    It's a simple fact: while the X Server remains a separate process that can only be updated via a local socket, performance will be gated.

    I'm not an expert on the X Window System, so I really don't know what a viable solution would entail. I'm just throwing this out: what if we were to make an extension to Xlib that would ask the X Server if it would allow the Xclient to mmap part of the framebuffer (its client area), or just mmap the entire thing? All subsequent Xlib calls would write to the framebuffer.

    This introduces a whole mess of issues, but we need to do SOMETHING. Anyone want to open up discussion? or tell me how wrong I am? :)

    --Michael Bacarella
  • John Carmack did some experimenting with this,
    take a look at http://finger.planetquake.com/plan.asp?userid=johnc&id=11716
    (The entry from 9/10/98)

    I'm not saying that it would be impossible to do, but it is probably not that easy to do.

    oh.. and shared memory is probably faster than unix domain sockets.

    /Andreas
  • We can only access the framebuffer of supported video cards. Accessing the framebuffer really isn't that interesting since we can't send any 3d commands to the card via /dev/fb.

    /Andreas
  • The specs for the G400 aren't available yet without an NDA, though.

    If 3d support under Linux is a must, I'd wait on the G400 until I heard that the specs for the 3d part were released. (Or until Matrox releases a working OpenGL driver for XFree86, whichever comes first... :))

    Keep in mind that the specs for the G200 were released half a year after the card appeared in shops. In the 3d accelerator market half a year is a very long time... but on the other hand, Matrox did release the specs; let us hope that they continue to do that.

    /AE
  • The interface that /dev/fb uses has ioctls for changing the resolution on the fly. vesafb can't use it though, due to the complexity of using the VESA API from kernel space.
    If you are using, for example, matroxfb, you can change the resolution of the console at will.
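    For the record, a mode change from user space is just a couple of ioctls - a sketch only (needs a driver like matroxfb that implements mode switching; most error checking omitted):

        #include <fcntl.h>
        #include <stdio.h>
        #include <sys/ioctl.h>
        #include <unistd.h>
        #include <linux/fb.h>

        int main(void)
        {
            struct fb_var_screeninfo var;
            int fd = open("/dev/fb0", O_RDWR);
            if (fd < 0) { perror("/dev/fb0"); return 1; }

            ioctl(fd, FBIOGET_VSCREENINFO, &var);  /* read current mode */
            printf("current: %ux%u @ %u bpp\n",
                   var.xres, var.yres, var.bits_per_pixel);

            var.xres = var.xres_virtual = 640;     /* ask for 640x480 */
            var.yres = var.yres_virtual = 480;
            if (ioctl(fd, FBIOPUT_VSCREENINFO, &var) < 0)
                perror("mode switch refused");     /* e.g. on vesafb */

            close(fd);
            return 0;
        }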

    /Andreas
  • The only programmer writing code specifically for the TNT should be the programmer of the OpenGL library.
  • Is DGA something similar? I know I've seen that around, but I can't really remember what it is..

    Anyway, regarding X in general, I know that the XFree guys were asking for more people to help with their project in general -- I understand that 4.0 is pretty far behind where they'd like to be..

    If I knew more about graphics, I'd probably try and help, but it's not my strong suit..
  • Yeah, and a shared mmap(MAP_ANON) would be fast too. But there's a problem: memory sharing and IPC are not the same. If you REALLY want to get rid of memory-to-memory copies as much as possible (read: not use sockets), you can use shared memory plus some lightweight IPC for notification. The copy overhead would be removed, but the context switch overhead would remain the same.
  • While we're correcting:

    Berkeley is spelled "Berkeley", not "Berkely".
    It says "only two products". If you believe that, it's very funny!
  • Why is there a warning that there's an article on slashdot that "has some kind of techie stuff" included in it? I mean, I'm so used to the page that I probably wouldn't notice immediately, but the masthead does still read "News for Nerds", right?

    I mean, we have to take the occasional break from making fun of Bill "Richboy" Gates and beating up on Jesse Berst to still do geek-type stuff on occasion, don't we? I'd think so.

    Now that I'm done abusing Rob (for now), here's my constructive observation:

    I've always argued that games were vital to Linux because of the user base and notoriety that they bring, as well as enhancing my ability to kill time at the office. This article, however, points out another big advantage that I hadn't really considered before: Linux games spur technical enhancements.

    I feel stupid for not adding this to my list of "Why I like games" before, but it's up there next to number one right now. Hey, you could even argue that games are what drive us to get better and better computers: I mean, who needs a 450MHz K6-2 to run WordPerfect or that other word processor from those guys in Washington?

    ----

  • by cduffy ( 652 )
    The GGI project's KGIcon drivers can work as fbcon interfaces. Your S3-based card should certainly be supported.
  • The quality of the individual drivers (amount of acceleration and the like) varies wildly. Best advice? Read the thing; there's a point where it connects all the hooks to acceleration-specific functions, so you can find out how many are implemented.
  • I read all of the posts (so far), and I didn't have a clue what half of them were talking about. It's great. I love it. I spent a couple hours last night surfing around trying to figure this stuff out. Some of the posts are wrong, but they're still good because they got me thinking.

    We need more stories like this on /., instead of all the World Dominion fluff and mindless flaming.

    TedC

  • Well, that _is_ why MIT Shared memory and DGA are available in most recent X servers (including XFree86). Then you don't have to do X calls for doing all the work - just blit image data to a shared memory area (or into the framebuffer directly for DGA) and save yourself buttloads of time.
  • by nathanh ( 1214 ) on Saturday May 01, 1999 @11:58PM (#1907907) Homepage
    I'm not an expert on The X Window System, so I really don't know what a viable solution would entail. I'm just throwing this out: What if we were to make an extension to Xlib that would ask the X Server if it would allow the Xclient to mmap part of the framebuffer (it's client area) or just mmap the entire thing? All subsequent Xlib calls would write to the framebuffer.

    You're too late, it's already been done. It's called DGA (Direct Graphics Access) and has been part of modern X servers for a while now. With a DGA application you get *equal* performance when compared to FB or SVGALIB.

    The obvious examples of well written DGA apps include XMAME and SNES9X.

    DGA even works exactly like you suggested. It lets you mmap the framebuffer.

    The MITSHM solution others have pointed to is simply not good enough. You still get bogged down in the X socket, so there are a couple of wasted copies involved, and performance degrades fairly noticeably. DGA is the better solution.

    UNIX (LOCAL) sockets speed this up by about 100% (or more or less, depending on the implementation), but it's still fundamentally flawed.

    Please be careful to note that though sockets are "fundamentally flawed" (in that sockets are always going to reduce performance), the concept of X isn't fundamentally flawed. SGI uses mmap'd ring buffers for local clients, avoiding all the issues with system call overhead. SGI manages to retain the benefits of the X abstraction without sacrificing performance. They just used cleverer X-server code.

    Remember that all good systems will sacrifice some speed for a good abstraction. Even Linux is guilty of sacrificing that extra 10% performance to keep the nice UNIX abstraction. X is a really nice abstraction, so don't blame it for losing a few percentage points of performance.

    Also, the XFree86 team could really do with a lot more coders. X is easily as complicated as a UNIX kernel, if not more so, but they have a lot fewer people working on XFree86 than work on the Linux kernel! There are a lot of very cool ideas that X can do - stuff invented by SGI - that the XFree86 group would like to do, but without good coders these ideas will never be implemented. If you want a real project to get your teeth into then XFree86 is challenging: drivers aren't the only things XFree86 members work on! And, if you really like 3D stuff and OpenGL, then now is the right time to help work on XFree86!
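    To picture the SGI trick: the local transport is a ring buffer in a mapping shared between client and server, so the bytes move with no write() at all; a syscall is needed only to wake the server. This is purely my schematic illustration of the concept, not SGI's or XFree86's actual code:

        #include <stddef.h>

        /* One of these lives in a mapping shared by client and server. */
        struct ring {
            volatile size_t head;      /* written by the client */
            volatile size_t tail;      /* written by the server */
            char data[64 * 1024];
        };

        /* Client side: enqueue one encoded X request. */
        static int ring_put(struct ring *r, const char *req, size_t len)
        {
            size_t head = r->head;
            size_t used = head - r->tail;
            size_t i;

            if (len > sizeof(r->data) - used)
                return 0;              /* full: wake the server and wait */
            for (i = 0; i < len; i++)
                r->data[(head + i) % sizeof(r->data)] = req[i];
            r->head = head + len;      /* publish (real code needs a memory barrier) */
            return 1;
        }

        /* The server is woken once when the ring goes non-empty, so there
           is no kernel round trip per request -- only per wakeup. */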

  • Actually, on single-texturing the Banshee is somewhat faster, since its default clockspeed is higher.
  • The case where this isn't good is when you care about framerate on a local display, you're generating stuff on the fly in a method that video cards can do for you, and you're pumping out a lot of data.

    To me it sounds like this situation is one in which the user is going to be focused pretty heavily on what this program is generating. In this case, you don't really want other windows onscreen, so it's not just the overhead of X you don't want, it's also the functionality.
  • The 3DLabs Permedia2 is a _cheap_ OpenGL board -- performs about like a Voodoo1, and has excellent GL under Win95/98/NT -- fully compliant and everything. A decent choice for a business or low-end graphics workstation. I think you can get one for $40, maybe even $20.

    3DLabs has a variety of much newer, more powerful boards you can learn about at their website [3dlabs.com]. I'm sure some other people make GL boards -- Gloria comes to mind. All $$$$!

    Under Linux you can use GL on your Voodoo1 ($30) or Voodoo2/Banshee ($80), supported thru Mesa (not as fast as it could be, since it's Mesa on top of Glide). Under Windoze, you can use 3Dfx's OpenGL ICD, which is really only a subset of GL implemented for Quake (and therefore not really any good for graphix work), and is also slow since it's going thru M$ function calls (yuck). Many other boards have GL ICDs for Windoze, and nVidia is supposed to be writing one for the TNT for Linux.
  • Remember that the context switch is not needed per Xlib call, but only when the buffer fills up. In fact the percentage overhead of the context switch can be reduced to very small by not doing XFlush excessively (there are programs out there, especially ones written by people who are used to direct hardware access, that do XFlush and XSync all the time, resulting in horrid performance).

    There are huge wins besides networking with getting all the graphics into its own process, and it's easily worth the overhead.
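    In Xlib terms the difference is trivial to show (a sketch):

        #include <X11/Xlib.h>

        /* Xlib batches protocol into a write buffer; flushing after every
           primitive forces a context switch per line instead of per batch. */
        void draw_grid(Display *dpy, Window win, GC gc)
        {
            int i;
            for (i = 0; i < 100; i++) {
                XDrawLine(dpy, win, gc, 0, i * 4, 400, i * 4);
                /* XFlush(dpy);   <- the mistake: one switch per line */
            }
            XFlush(dpy);           /* one flush for the whole batch */
        }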

  • Your assumption that you cannot render while waiting for the buffer-swap is not entirely true. Rendering really consists of two parts. One is to take the incoming vertices and transform and clip them. The other part is letting the HW draw them. If you have a graphics chip that uses DMA buffers (well, pretty much every chip by now), you can still build a new DMA buffer while waiting for the buffer swap.
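    Schematically (all the driver calls here are made-up names, stubbed only so the sketch compiles - this is not any real chip's API):

        #include <stdio.h>

        struct dma_buf { unsigned cmds[4096]; int n; };

        /* Hypothetical driver calls, stubbed for illustration. */
        static void emit_triangles(struct dma_buf *b) { b->n = 3; }
        static void wait_for_swap(void)               { /* block on vsync */ }
        static void submit(struct dma_buf *b) { printf("submit %d cmds\n", b->n); }

        /* The CPU half of rendering (transform/clip into a DMA buffer)
           overlaps the wait for the pending swap; only submission blocks. */
        static void render_frame(struct dma_buf *buf, int swap_pending)
        {
            emit_triangles(buf);   /* CPU work happens during the wait */
            if (swap_pending)
                wait_for_swap();   /* block only now */
            submit(buf);           /* hardware DMAs the command buffer */
        }

        int main(void)
        {
            struct dma_buf b;
            render_frame(&b, 1);
            return 0;
        }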

    - Thomas
  • Yes, we have done shared memory transport with Accelerated-X for quite a while now. However, for 2D it just doesn't make a lot of sense anymore. If you have a good 2D core implementation you'll already get very close to the maximum without this type of transport (assuming that you use the MIT-SHM extension for images).

    For 3D that's a different story. OpenGL does support something called a direct rendering context. This is a GLX context that has semantics that allow libGL.so to be implemented in a way that it talks directly to hardware. In any case I feel that it would be foolish to expose an API that allows talking to hardware to a programmer. It's way too complex and gives too much opportunity to screw things up (not intentionally, but hey, show me a bug-free piece of HW). Having OpenGL there and letting libGL.so do the talking to the hardware makes way more sense.

    - Thomas
  • pm2 is a lot better than voodoo1.

    1. 8mb ram
    2. AGP
    3. Linux support (2D/3D) -- not to mention one of the fastest 2D X servers around.

    You can play q3a with it using the mlx/mesa modules, very cool. I got my card for 45 bucks. That's an accelgraphics permedia2 board.
    --
  • With all this talk of sockets and network operation, I'd like to remind everyone that that is now just *one* possibility for accessing the display. Let's not forget that under Linux 2.2 we can access the display directly via the /dev/fb* devices.
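    A minimal sketch of what that looks like (assumes a 32bpp mode for illustration; real code must check the var/fix screeninfo and error returns before poking pixels):

        #include <fcntl.h>
        #include <stdint.h>
        #include <sys/ioctl.h>
        #include <sys/mman.h>
        #include <unistd.h>
        #include <linux/fb.h>

        int main(void)
        {
            struct fb_var_screeninfo var;
            struct fb_fix_screeninfo fix;
            int fd = open("/dev/fb0", O_RDWR);

            ioctl(fd, FBIOGET_VSCREENINFO, &var);
            ioctl(fd, FBIOGET_FSCREENINFO, &fix);

            uint32_t *fb = mmap(NULL, fix.smem_len, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);

            /* paint a grey bar across the top (line_length is in bytes) */
            for (unsigned y = 0; y < 16; y++)
                for (unsigned x = 0; x < var.xres; x++)
                    fb[y * (fix.line_length / 4) + x] = 0x00808080;

            munmap(fb, fix.smem_len);
            close(fd);
            return 0;
        }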



    --
  • ...that the programmer is writing code specifically for the TNT, not that the TNT is difficult to write for.

    That is a valid point. Hopefully, however, we'll have a driver architecture flexible enough to work in client or server mode, meaning that the driver can work in the X server for places where you still want to talk over a socket (AF_LOCAL or AF_INET) but can also talk directly to the card for special programs like Q3.

    I don't think our graphics card architecture (XFree drivers, essentially) currently is even close to this, but I'm sure that someone will eventually make it work.

  • Most programs like things such as acceleration from video cards, at which point no sane programmer wants app->video card

    With modern video cards, this is completely reasonable. The TNT has channels that allow up to 127 programs to communicate directly with the hardware. IIRC, the privileged program (X server or kernel) creates a channel by specifying a clipping region and a 64k region of memory that the client can mmap(). The TNT (with help from the kernel - it needs an interrupt handler) manages the interaction between all these clients.

    There are papers out there that describe how good graphics cards should be designed. A good graphics card allows for app->video with accelerated (2d and 3d) features. It is nice to see that these cards are starting to show up for ~$100.

  • X over sockets is like this:
    app -> socket -> X -> video

    X with shm is like this:
    app -> X -> video

    Note that there's still a context switch. This is Not A Good Thing. We need this:

    app -> video

    Thus we have svgalib, ggi, SDL, and so forth.
  • Depending on who you ask, the spelling can also be
    "Berzerkeley"...
  • The transport mechanism between a local X client and the X server is not restricted to a Unix socket. The X server vendor is free to implement it as a shared memory queue or such (I believe Accelerated X servers do something of that sort), so this is not a flaw in X architecture.

    Your idea about mmapping the framebuffer and allowing Xlib to deal with it directly is completely flawed, OTOH. First off, it disables use of hardware acceleration, which means performance would drop much more than using a Unix socket for communication with the X server. And even if you had some sort of plugins for Xlib which provide acceleration for specific cards, there would have to be a single semaphore for serializing access to the card's accelerator between different X clients.

    People who designed X were not stupid and really, really experienced. X does have flaws, but not where you showed them.

    Anyway, all of the above applies to "classical", 2D operation mode of X. Direct 3D rendering for games is something absolutely different, and your ideas, transformed to some point, are correct :))

    Heck, Precision Insight had them half a year ago :))

    Cheers,
    -diskena
  • That sounds like it would solve some 2D problems, but not 3D, and not a more general class of 2D problems either.

    It wouldn't solve 3D problems because the cards need access to your data to do the raster rendering themselves. It sounds like you're advocating mainly framebuffer access, which isn't enough for this kind of thing.

    Also, a lot of programs would want access to the 2D acceleration features of a card, and this solution doesn't allow for that either.

    Maybe some kind of special shared-memory only version of the X protocol that allowed you to refer to client resources by shared memory offset so the server could access them directly. This would allow you to do anything with the shared memory area that you could normally do with the X protocol.

    I also may be talking out of my hat here because I don't know a huge amount about how X servers or graphics cards work.

  • Let's see... Your Voodoo2 card could cost around $150 or so... (It was almost $200 when I got my Banshee)..

    And that doesn't do 2D...

    And you're wasting another slot in your system..

    YET - My Banshee cost me $99, has excellent 2D, and I'm not wasting slots.

    I don't think anyone who bought a Banshee bought it because they thought it would trounce the V2..

  • Please note that app->video card only works in the few instances when the app is simply writing pixel data (such as image viewers, etc.) and doesn't need any functionality from the card aside from being able to display passed pixels.

    Most programs like things such as acceleration from video cards, at which point no sane programmer wants app->video card, that's just painful as hell as then you have to start supporting the acceleration of all different video cards, which is what X is around for in the first place.

    So while what you're talking about is useful for unaccelerated video playback, it doesn't have much applicability to anything else. Wouldn't be that bad an idea, though, except that it would be a bit hard to work out the security model for.
  • Remember that Glide is being written by Daryll Strauss in his spare time without being paid. The Voodoo people are basically using Daryll. Remember that in this case the "customer" is being given the drivers by Daryll; he's got no reason to do it besides the goodness of his heart.

    If you want drivers from someone to whom you are the customer, ask the makers of the Voodoo chipsets to write their own drivers and do it on a timely basis as a funded company. Daryll's doing an amazing job considering that he's not being paid for it.
  • It's called the MIT shared memory extension. Most programs that do intensive graphics stuff (TV players, the gimp, etc.) already use it. X and the program share a section of memory. The program writes to it, and then notifies X when it's done. Then X simply uses that area of memory for moving things to the card. The kernel isn't really involved at all except for setting up the shared memory area and for the pipe protocol to notify the X server. This also allows the X server to properly do clipping, and things like that.
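    In code, the client half of that dance looks roughly like this (a sketch; compile with -lX11 -lXext, and real code should check XShmQueryExtension() and all the error returns):

        #include <X11/Xlib.h>
        #include <X11/extensions/XShm.h>
        #include <sys/ipc.h>
        #include <sys/shm.h>

        /* Create an XImage whose pixel data lives in a SysV shared memory
           segment that the X server also attaches. */
        XImage *make_shared_image(Display *dpy, XShmSegmentInfo *shminfo,
                                  int w, int h)
        {
            int scr = DefaultScreen(dpy);
            XImage *img = XShmCreateImage(dpy, DefaultVisual(dpy, scr),
                                          DefaultDepth(dpy, scr), ZPixmap,
                                          NULL, shminfo, w, h);
            shminfo->shmid = shmget(IPC_PRIVATE,
                                    img->bytes_per_line * img->height,
                                    IPC_CREAT | 0600);
            shminfo->shmaddr = img->data = shmat(shminfo->shmid, NULL, 0);
            shminfo->readOnly = False;
            XShmAttach(dpy, shminfo);   /* server maps the same segment */
            return img;
        }

        /* Later, after scribbling pixels into img->data:
             XShmPutImage(dpy, win, gc, img, 0, 0, 0, 0, w, h, False);
           The server reads straight out of shared memory -- the socket only
           carries the small notification request, not the pixels. */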
  • Well Q3 test for linux just came out and I have a banshee, and glide for this beast (and V3) isn't done yet. Hm. Why not. It looks like either I go buy an outdated voodoo 2 or go ask for my voodoo 1 back that I gave away because it too is outdated. I wish the wheels of development would turn just a little faster on these sorts of things that the customer "wants". I don't want to dual boot with Micro~1 Window95 just to play QuakeN where N is {1,2,3}.
  • If we are talking about a higher level graphics API like OpenGL, going through X (using GLX) can sometimes increase speed.

    Consider the situation where the OpenGL driver for a card needs to do a bit of non-trivial preprocessing of the gl commands before sending them off to the graphics card (the extreme of this is software rendering). Now AFAIK gl is not thread safe, so this processing would occur in the same thread of execution as the program.

    Now with GLX and a multi processor machine, the render preprocessing would occur in a separate process, in effect giving your program a separate render thread that could run on the other processor.

    As for X connections being slow, the X protocol spec defines how information should be sent across the network. For local connections (eg :0.0), the data can be sent using the most efficient method to the X server. If you know of a faster method than unix domain sockets/pipes, the XFree86 team may be interested in your input.
  • They made a kludge where they copied directly out of the Voodoo's memory and drew that to a window. Suck, suck, suck, performance-wise.
  • I loaded the q3test and I get sound that re-occurs in a loop. I guess I'm getting 1 frame every 10 seconds.

    Hardware you ask?
    AMD 233, 64 Meg Ram, Voodoo Rush

    Software?
    Redhat Linux 5.2 out of the box install with one exception, Glide and modified SVGA server for Voodoo card.

    Oh yeah, in quake 2 I can't get mouse to act right. I turn on "freelook" and all I can see is the floor.
  • Please look at Precision Insight [slashdot.org]. This is funded by Red Hat, and gives a rendering architecture similar to SGI's on sensible hardware. This is going into XFree86 4.0, which is rumoured to be coming out in June sometime.
  • The Riva TNT is not currently supported in 3d under Linux. Supposedly, by the time Q3A or TNT2 is actually released, nVidia will have full drivers for Linux.

    doozy
  • I'm interested in running Q3Test, but I haven't tried yet. I have an STB Velocity 4400 AGP card, with the RIVA TNT chip. How would I get OpenGL rendering in Linux? I've heard about Mesa, but that doesn't work with the RIVA TNT, does it? Can someone point me to a good source of information on 3d graphics in Linux with the Riva TNT?
  • But running Q3 in a window will make the Banshee look better? Just wondering; I have heard Voodoo2's performance in a window is much lower.
  • It really depends on your needs, but a good one is the Riva TNT-based one from Creative. It's pretty much a bare-bones card, so it's really cheap nowadays.

    Are there actually TNT GL drivers for Linux now? I thought we had to wait until the TNT2 came out and use those drivers (backwards compatibility, right?)

    "Software is like sex- the best is for free"
  • NT: Not The...
    ...pause...
    ...blue screen


    Hmmm.... I dunno, I've been using NT4 for months now and haven't gotten a single blue screen... the performance SUCKS on my P2-350, but it hasn't crashed yet. Of course, maybe that's cuz I only use it for an hour or so at a time before I can't stand it anymore and reboot into good ol' Linux.

    "Software is like sex- the best is for free"
  • Hey, it's been a while, x86. How are ya?

    Anyway, I think that your idea sounds quite reasonable, granted that the developers of X and the developers of proggies that use the new X extensions can cooperate (additionally, authors of different X programs need to cooperate in case a user somehow invokes two programs which directly map the framebuffer at the same time :), as of course they have a history of doing.

    Making suggestions like this is a good first step to getting a formal proposal, and then to getting something /done/.

    Sounds like a good idea to me (and yes, of course I realise the implications of complication wrt other programs, but it CAN be worked out, and if Linux is to gain any marketshare amongst the gamers of the coming years as games get more and more complex, then this is something that would definitely be quite useful :)

    James
  • Sure the article's techie, and yes it's interesting. There's no need to apologise for posting technical articles here of all places - they're one of the reasons I (and presumably some others) like /. in the first place.

    (Bonus points for spotting the reference in the title)
  • I could not agree more.
    Why not create /.. and put all the nontechnical stuff there?
    I visit /. for technical news. Company mergers, personal feuds and marketing hype get plenty of coverage elsewhere.
  • RedHat is working with Precision Insight to add OpenGL direct rendering to XFree86. This will allow clients that are on the same machine as the server to render directly into the framebuffer. SGI already has this.

    XFree also supports the DGA and MIT-SHM X extensions that can be used to speed up local clients.
  • There's also matroxfb, which is an accelerated driver for Matrox cards. I use it with a Millennium I.

    Note that vesafb will work for any card with a VESA 2.0 BIOS and linear frame buffer.
  • > There are only two products to come out of Berkely: UNIX and LSD. We don't believe this is a coincidence.

    I've seen this so many times I can hardly believe it. You miss the joke! It should read "BSD and LSD". (As in BSD Unix; the original Unix did not come from Berkeley, but from AT&T.)
  • What's a good/cheap OpenGL card (AGP)...???
    --
    Alan L. * Webmaster of www.UnixPower.org
