Microsoft XBox (Games)

Microsoft Introduces Its DirectStorage API Which Promises To Reinvent Game Storage (pcgamer.com) 43

Microsoft has finally released its DirectStorage API to game developers. This means one of the most promising features of the Xbox Series X is coming to the PC. From a report: DirectStorage promises to bring faster loading times thanks to optimized NVMe SSD access. Previously, a game could only perform one in/out access at a time. This didn't present any issues in the days of hard drives, but now that most gaming PCs have SSDs that can transfer gigabytes per second with hundreds of thousands of input/output operations per second (IOPS), it's clear that a better method was needed. Enter DirectStorage. DirectStorage lets an NVMe SSD reach its full performance potential by allowing multiple I/O operations to run concurrently. It also allows assets to be transferred directly to the GPU, leading to better efficiency.
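
As a rough illustration (not taken from the article) of what "allowing multiple I/O operations to run concurrently" means, batching reads through a DirectStorage queue looks something like the sketch below. The struct fields and enum names follow the public dstorage.h header from Microsoft's DirectStorage SDK as best as can be recalled here; they vary between SDK versions and should be checked against the real header, and assets.pak is a made-up file name.

// Sketch only: enqueue many small reads and submit them as one batch, so the
// NVMe drive sees a deep queue instead of one request at a time. Names follow
// the public dstorage.h approximately; verify against the SDK before use.
#include <cstdint>
#include <d3d12.h>
#include <dstorage.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void BatchLoad(ID3D12Device* device, ID3D12Resource* destBuffer)
{
    ComPtr<IDStorageFactory> factory;
    DStorageGetFactory(IID_PPV_ARGS(&factory));

    ComPtr<IDStorageFile> file;
    factory->OpenFile(L"assets.pak", IID_PPV_ARGS(&file));     // hypothetical asset pack

    DSTORAGE_QUEUE_DESC queueDesc{};
    queueDesc.SourceType = DSTORAGE_REQUEST_SOURCE_FILE;
    queueDesc.Capacity   = DSTORAGE_MAX_QUEUE_CAPACITY;
    queueDesc.Priority   = DSTORAGE_PRIORITY_NORMAL;
    queueDesc.Device     = device;

    ComPtr<IDStorageQueue> queue;
    factory->CreateQueue(&queueDesc, IID_PPV_ARGS(&queue));

    const uint64_t chunk = 64 * 1024;
    for (uint64_t i = 0; i < 256; ++i) {
        DSTORAGE_REQUEST request{};
        request.Options.SourceType          = DSTORAGE_REQUEST_SOURCE_FILE;
        request.Options.DestinationType     = DSTORAGE_REQUEST_DESTINATION_BUFFER;
        request.Source.File.Source          = file.Get();
        request.Source.File.Offset          = i * chunk;
        request.Source.File.Size            = static_cast<uint32_t>(chunk);
        request.Destination.Buffer.Resource = destBuffer;       // GPU-visible destination
        request.Destination.Buffer.Offset   = i * chunk;
        request.Destination.Buffer.Size     = static_cast<uint32_t>(chunk);
        queue->EnqueueRequest(&request);                         // queued, not yet issued
    }
    queue->Submit();   // all 256 reads go out together; completion is signalled via a fence
}

In the traditional model each of those reads would typically be a separate request bouncing through system memory; here the whole batch is described up front and the runtime, driver, and drive can schedule it as they see fit.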
  • Is Windows storage really so fundamentally broken that "a game could only perform one in/out access at a time"?

    Admittedly, I have very limited experience developing anything on Windows, but this seems to be so bad that I have trouble believing even Windows is that defective ...

    • Windows storage wasn't, I think. I know I could program in multiple calls to happen simultaneously. Whether they actually happened that way would be a different question, as even SSDs are slower than the CPU and are still serial-access devices.

      I think that this is more a port type implementation:

      "Hey look X-Box devs, here's one less thing that you have to reprogram/develop/change if you want to port your game from XBox to Windows to get a wider audience!"

      That said, I still wish they'd change the menus more often. To

      • by gweihir ( 88907 )

        I think that this is more a port type implementation:

        "Hey look X-Box devs, here's one less thing that you have to reprogram/develop/change if you want to port your game from XBox to Windows to get a wider audience!"

        That makes a lot more sense, thanks.

        • I RTFA'd enough to follow the links down to the actual explanation [microsoft.com], here is the relevant snippet:

          Modern games load in much more data than older ones and are smarter about how they load this data. These data loading optimizations are necessary for this larger amount of data to fit into shared memory/GPU accessible memory. Instead of loading large chunks at a time with very few IO requests, games now break assets like textures down into smaller pieces, only loading in the pieces that are needed for the curren

          • by Rhipf ( 525263 )

            Which means that no, games were not restricted to one request at a time, but yes, Windows really is so pathetic that it bottlenecks your storage access when you use the normal APIs.

            Before saying "Windows really is so pathetic" how does this stack up against other OS drive access? Is Linux any better at accessing drives? Is macOS? There will always be bottlenecks in any software/hardware situation. I don't know that I would be so harsh on Windows bottlenecking on the old API since when it was programed there were no SSDs in circulation and it is only recently that this situation has changed. In 20 years it could very well turn out that this new API will bottleneck on the next storage m

            • They should be fixing this in the base OS without having to switch APIs just to get decent performance.

              • You want the base OS to be copying things directly from NVMe to GPU RAM as a default operation?

                Seems to me like this is exactly the sort of thing you want an API for.

                • You want the base OS to be copying things directly from NVMe to GPU RAM as a default operation?

                  I want the base OS to be copying things directly from NVMe to where I tell them as an operation. NVMe does this thing called DMA, you may have heard of it. I want it all to be controlled by the IOMMU. And I want the most efficient method to be used to get my data where I want it when I make the system call, without having to worry about the details, or use some alternate API.

                  • by batkiwi ( 137781 )

                    You could do 3 minutes of research as opposed to spouting off things.

                    This is an optimised API for read-only access to 4 KB compressed files, usually textures. It does this by changing how it pipelines the reads and by using the GPU to decompress and load automatically into GPU memory without having to go through the CPU.

                    The base Windows file APIs do what they should do for the 95+% of normal use cases, including concurrent reads etc.

                    From https://github.com/microsoft/D... [github.com]

                    "DirectStorage is a feature intended

            • Windows is bottlenecked in terms of IO performance to a ridiculous level. For starters, the traversal bit is ignored in modern Windows to try and fix the broken performance of NTFS ACL checks, as the system doesn’t have a fast enough algorithm for checking permissions. Then you have the MFT being cached in RAM, which costs 1kb per file/folder on-disk, leading to the invention of Windows Dynamic Disk Cache (complete with source code) being released by Microsoft as a workaround for OOM scenarios caused
              • by gweihir ( 88907 )

                You also can’t blame NTFS as Tuxera made an NTFS driver which outperformed all other Linux file systems.

                Interesting. So it is not NTFS, it is the FS implementation and the VFS layer that is just completely borked. Getting poor performance from an FS that can demonstrably perform well is something I would expect from some small hobbyist project, not from the (unfortunately still) prevalent desktop OS. Well, a typical effect of a quasi-monopoly: things get done shoddily and problems only get fixed if they become really bad.

            • by gweihir ( 88907 )

              Linux and xBSD (including Mac OS) VFS layers are all enterprise-grade and high-performing. They still have bottlenecks, but nothing like Windows, it seems. So far I had thought this was just the ancient NTFS slowing things down, but it seems the whole (V)FS stack on Windows is a barely performing old mess.

              Being harsh on MS here is perfectly adequate, because they tried and failed several times to do some renovations here. They obviously did not take things seriously or do not have actually competent FS/VFS expe

          • Actually, I've been working with disk reads on both Linux and Windows in the last few days. Not an expert at it, but what everybody is saying here doesn't sound quite right.

            I'm not sure what GP means by SSD being serial. Maybe that's a SATA/SAS limitation? NVMe is directly connected to the PCIe bus and behaves like anything else PCIe, meaning its memory (storage in this case) can be addressed directly. PCIe has multiple serial lanes, which can be accessed in parallel.

            Linux pretty much uses the standard

          • So in fact that summary is complete horse shit; the truth is that Windows' existing APIs for data access choke when you queue up the large numbers of IO requests that NVMe drives are capable of handling, and they have created this alternative API for making those requests as a result.

            Is it though? Windows has had explicit scatter/gather I/O for at least a decade, and before that something similar under the name overlapped I/O since, I dunno, Windows 2000 or even NT 4. Given its necessity in data-intensive applications there's no way it could have got by this long without it.
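
            For reference, the overlapped pattern being described looks roughly like this: several ReadFile calls are issued against one handle without waiting for each to complete, so the OS and drive can service them in parallel. A minimal sketch with error handling omitted and a made-up file name:

            // Minimal sketch of keeping several reads in flight at once with Win32
            // overlapped I/O. Error handling omitted for brevity.
            #include <windows.h>
            #include <vector>

            int main()
            {
                HANDLE h = CreateFileW(L"assets.pak", GENERIC_READ, FILE_SHARE_READ, nullptr,
                                       OPEN_EXISTING, FILE_FLAG_OVERLAPPED, nullptr);

                const DWORD chunk = 64 * 1024;
                const int   inFlight = 8;
                std::vector<std::vector<char>> buffers(inFlight, std::vector<char>(chunk));
                std::vector<OVERLAPPED> ovs(inFlight);
                std::vector<HANDLE> events(inFlight);

                for (int i = 0; i < inFlight; ++i) {
                    ZeroMemory(&ovs[i], sizeof(OVERLAPPED));
                    ovs[i].Offset = i * chunk;                               // each request reads a different region
                    events[i]     = CreateEventW(nullptr, TRUE, FALSE, nullptr);
                    ovs[i].hEvent = events[i];
                    ReadFile(h, buffers[i].data(), chunk, nullptr, &ovs[i]); // returns immediately, I/O is pending
                }

                // All eight requests are now queued with the OS/driver at the same time.
                WaitForMultipleObjects(inFlight, events.data(), TRUE, INFINITE);

                for (int i = 0; i < inFlight; ++i) CloseHandle(events[i]);
                CloseHandle(h);
                return 0;
            }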

    • "a game could only perform one in/out access at a time"?

      I started reading the article and that jumped out at me, too. Most of it looks like a tech reporter who is reciting what they read without really understanding what is important about it. The announcement is yet another API; the rest is fluff.

      I've been doing game development for approaching three decades. You never, ever do just one thing at once or block. IO must always be asynchronous because at the very least you must keep animating something on the screen. Better code relies on OS-specific functionality li

      • by SirSlud ( 67381 )

        "Most game assets are compressed which means the CPU needs to be accessed before loading the asset" is another WTF statement a few paragraphs later.

        What's WTF about that? Only the current gen consoles have added decompression in dedicated hardware, and in most games I've shipped (AAA console games) files are compressed (usually bzip or zip) on disk and decompressed when being put into RAM. I guess maybe you're taking issue with the words "before loading the asset" when "before placing the asset in its 'final

        • Yes, the wording is horrible and misleading as you noted. Pre-PS5/Xbox SE, the CPU needed to uncompress the compressed data.

          As you point out about current-gen consoles supporting compression, another nitpick is that the CPU doesn't even need to be touched. i.e. On the PS5, RAD Game Tools' Kraken [radgametools.com] compression is built in at the hardware level [gamingbolt.com] and is equivalent [eurogamer.net] to 9 Zen cores.

    • I'm not an expert by any means, but I think the problem was:
      1. load multiple gigs of compressed textures from disk
      2. uncompress in RAM
      3. shift to GPU texture memory

      I think what DirectStorage is meant to do is basically all of that in one go, at a low level, to get a lot of other ancillary crap out of the way: transfer directly from the SSD to GPU VRAM and uncompress there, saving a few bus clocks and a lot of bandwidth along the way.
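
      As a sketch of those three steps (with made-up helper functions, not real APIs), the classic path looks something like this, with every byte crossing the CPU and system RAM before it reaches VRAM:

      // Illustration of the classic texture-loading path described above.
      // ReadWholeFile, DecompressOnCpu and UploadToVram are hypothetical helpers,
      // not real APIs; the point is only where the data travels.
      #include <cstdint>
      #include <string>
      #include <vector>

      std::vector<uint8_t> ReadWholeFile(const std::string& path);                      // disk -> system RAM
      std::vector<uint8_t> DecompressOnCpu(const std::vector<uint8_t>& in);             // CPU burns cycles here
      void UploadToVram(uint64_t gpuTextureHandle, const std::vector<uint8_t>& pixels); // RAM -> VRAM over PCIe

      void LoadTextureClassic(uint64_t gpuTextureHandle)
      {
          auto compressed = ReadWholeFile("textures/terrain.ctex");  // 1. multi-gig read into RAM
          auto pixels     = DecompressOnCpu(compressed);             // 2. uncompress in RAM
          UploadToVram(gpuTextureHandle, pixels);                    // 3. copy to GPU texture memory
      }

      The pitch for DirectStorage, as described in the summary, is essentially to collapse that into one request whose destination is GPU memory, with decompression handled on the GPU.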

    • Is Windows storage really so fundamentally broken that "a game could only perform one in/out access at a time"?

      No. All systems are so fundamentally broken that things can only perform one in/out access at a time. Windows, Linux, Nintendo, fucking Sony as well. Sony was the first to address this on the PS5. Microsoft next on the Xbox and now Windows.

      There's nothing "broken" about using CPU resources to perform I/O, it's just now what you need when streaming textures to a graphics card.

      But as usual you are so desperate to take a dig at Windows asking the "important" questions Tucker Carlson style, but really all you h

    • by Dadoo ( 899435 )

      I would be surprised if even Microsoft only allowed one in/out access at a time, but I can tell you from experience: Windows disk I/O is definitely slow. I have both the Linux and Windows versions of many of my games, and loading time - especially for open-world type games - is noticeably faster on Linux. Sometimes twice as fast.

  • by JBMcB ( 73720 ) on Tuesday March 15, 2022 @01:49PM (#62359919)

    The explanation of what DirectStorage is is incorrect.

    It's pretty simple. Normally you load textures into your graphics card through the CPU. The CPU directs an NVMe transfer to a portion of system memory, then the graphics card grabs it from system memory.

    DirectStorage lets textures load straight from the NVMe drive to the graphics card using PCI bus mastering. This PCI bus mastering feature has been around forever, but not much uses it besides really high performance network and disk array controller cards.

    • Is this distinct from DMA? Isn't that essentially what DMA is? Is this just a wrapper for that?

      • by tlhIngan ( 30335 )

        Is this distinct from DMA? Isn't that essentially what DMA is? Is this just a wrapper for that?

        PCI bus mastering is a form of DMA, and it's used by everything.

        Anytime you want to transfer something over PCI, you could do it via the CPU, or via bus mastering, and practically everything uses bus mastering. You lock the block of memory with the buffer of data, get a translated address (system memory space and PCI memory space are different address spaces, so the OS kernel maintains a mapping between the two),

        • History is funny. PDPs and DECs were backplane systems. Calls were made to disk via each board and each needed to control the bus. IOW, they were bus mastering.
          Then motherboards came with the CPU handling everything in a sequential fashion (tape and then disk).
          Now, we are moving back to multiple boards accessing data directly over the bus, and avoiding the motherboard/main CPU/memory.
          I am waiting for us to move to a true backplane again.
    • This PCI bus mastering feature has been around forever

      Which hardware features exist is completely irrelevant if an API doesn't exist to expose them in a simple way. Games haven't directly accessed hardware for a long, long time.

      While PCI bus mastering has existed forever, it's an enabling technology for DirectStorage which is most definitely something "new" in that it doesn't exist for games currently.

    • Your description is correct for what it promises in the future.

      But today, if you are using NVMe, and you are using Win11 (it has new NVMe kernel IO method support), and you are using DirectStorage, then you will get... everything still loaded through main memory via the CPU.

      Now you will get a performance benefit: the data no longer has to be copied into user-mode memory space on the way to the GPU; it can stay in a kernel I/O buffer and be copied directly to the GPU, not via user-mode process memory space.
      You will also get
  • As a former game dev, if you're doing lots of separate file accesses to load a level you're doing it very wrong.

    The games I worked on loaded at whatever rate the media streamed in at. While I never had access to a console with an SSD (I've been out of games a while) there's no reason the code we used wouldn't work up to a few GB/sec even on the older consoles.

    But then again, nobody said XBox/Windows devs were smart.

    • As a former game dev, if you're doing lots of separate file accesses to load a level you're doing it very wrong. The games I worked on loaded at whatever rate the media streamed in at.

      That's outdated thinking; it made sense in the days when all I/O was essentially single-threaded, but not anymore. SSDs on a PCIe bus allow multiple concurrent file accesses, and NVMe has additional benefits. So you can treat files more like a regular data object sitting in "slow RAM." The gist of what MS is doing here is allowing the GPU direct access to the storage, bypassing the need to load assets into RAM before passing to the GPU. So you want a "one asset per file" scheme as opposed to older meth

      • I've worked on open world games and each zone was a separate file that streamed in with linear access.

        It takes a bunch more offline tooling to do it this way, but it's superior to the scatter/gather approach everywhere, although it would be nice not to go through main memory to load GPU resources; that's mainly a GPU driver thing.

      • Storage I/O hasn't been "essentially single-threaded" for at least 30 years now. The only thing single-threaded was application code that consigned I/O to a single thread using blocking I/O functions. Any code concerned about performance used either async I/O or multiple threads to take advantage of parallel operations and seek ordering by the hardware and drivers. The only place you had a bottleneck was in the physical I/O channel between the drive and the bus/chipset, and SSDs are subject to that same lim

    • by SirSlud ( 67381 )

      I'm in console games (20+ years) and things change. Most of the big titles today are to a greater degree open world - or at least less deterministic in terms of loading behavior - more streamed on demand and too dynamic to benefit from architecting around loading in huge concurrently stored chunks of data. Games still do build-time optimizations for data locality but every major game today is doing a huge number of IO requests, and not having to go through system RAM is a big win.

      Sony did this too. For some

    • But then again, nobody said XBox/Windows devs were smart.

      Oh fuck off arsehole. This problem you don't understand transcends consoles and ironically for your point Sony was the first to address it. I'm glad you're a "former" game dev. You clearly have no idea what the bottlenecks of modern games are. Leave the game development to professionals.

  • I've already spent too much time today thinking about what this might actually be, because neither the article nor the blurb on MS's Web site describes what this actually does.

    If I were designing this: My first thought would be a variation of the scatter-gather technique. However, instead of a sequential read from media to scattered buffers in RAM, I would add scattered file offsets. In other words, extend the iovec structure something like:

    struct iovec { off_t iov_srcoff; void * iov_destbase;
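
    Filling in the rest as a guess (renamed iovec_ext to avoid clashing with the real struct iovec; neither the structure nor the call below exists in any OS I know of), the idea would be something like:

    /* Purely illustrative: one call scatters many file regions into many buffers. */
    #include <stddef.h>
    #include <sys/types.h>

    struct iovec_ext {
        off_t  iov_srcoff;    /* offset within the file to read from            */
        void  *iov_destbase;  /* destination buffer (could be GPU-visible RAM)  */
        size_t iov_len;       /* bytes to transfer for this entry               */
    };

    /* Hypothetical syscall: submit the whole scatter list in one request so the
       driver can reorder and overlap the reads however it likes. */
    ssize_t preadv_scattered(int fd, const struct iovec_ext *iov, int iovcnt);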

    • I've already spent too much time today thinking about what this might actually be, because neither the article or the blurb on MS's Web site describe what this actually does.

      It will probably require you to open your wallet and buy a more expensive SSD. That seems to be the general trend in gaming. "How badly can we fuck up the code, so you need to go out and buy a gut-wrenchingly expensive GPU and a bigger, faster SSD to hold all the bloat?"

  • Is this possibly a sneak attack on Valve's Proton to prevent Windows games from running on Linux and the Steam Deck?
    • by bn-7bc ( 909819 )
      I'm not a game dev (or any other kind of dev for that matter), so please excuse me if this question is stupid, but isn't such a low-level detail as how you load data usually handled by the game engine? Wouldn't it be the game engine devs' problem to figure out which OS API to use on the current OS and just expose a high-level load function in the engine's API?
  • Problem is, the existing OS calls assume sequential access, first to a tape and then to a disk.
    However, new calls are needed to deal with parallelism.
