Using XML for storing game data

When making games, you find yourself dealing with lots of data of different sorts, everything from the movement speed and jump height of the player to level layouts and game art, and how you deal with it can sometimes be of great importance. Some things (especially the smaller, simpler ones) you might as well just store in the code itself, while other things are better stored in external files. This entry is about storing game data, and more specifically about using XML files for storage.

In the various online forums and message boards about game development, the question of how to store game data pops up from time to time. Usually, some people will reply by recommending their favourite format (be it a scripting language like Lua, or a data format like INI files or XML) and then keep defending their choice, touting it as the best in any given situation. These are people with enough experience to feel confident, but not nearly enough experience to be worth listening to. Give them a few years and a couple of large projects, and they’ll see the error of their ways :-)

The fact is that how you store data depends on your game, the data in question and (yes) your personal preferences.

Don’t think that you’ll be using the same format for all the data in a game, or that you will use the same formats for every game you make. Attempting to do so will waste a lot of time that could have been better spent.

If the data is large (like bitmaps, 3D models etc.), you must use custom binary formats. There’s no other way to get maximum load speed, and there’s no reason to make your players wait for your game to complete a bunch of text parsing during loading (and yes, even for small games, text-based formats make a significant impact on load times, even on fast computers). If you need the data to control flow or check conditions, you need a scripting language of some sort. But it’s usually not a good idea to use an out-of-the-box scripting language – you need a simpler, custom solution which will let you express your game scripts in the simplest way possible, to make them easier to write.

Sometimes you need to store small amounts of data which should be fairly easy to edit by hand, or you need an intermediate format for larger amounts of data (which will be converted to a custom binary format before release, like my 3D model file format), and in many cases XML is a good choice for this. Note, though, that it’s not the only one – always evaluate your options and make the choice that you feel comfortable with.

When making games using my own free, public domain game engine, I use XML for some things from time to time, and if used right, it’s a really nice way of storing data. I won’t go into too much detail about the XML format itself – there are plenty of online resources describing XML basics. There’s one thing I think I should mention though:

There are two different ways of reading and working with XML files, corresponding to two different classes of parsers: DOM (Document Object Model) and SAX (Simple API for XML).

With DOM, the game programmer tells the XML parser to read and parse an XML file. The parser will go through the file and create a hierarchical data structure in memory, mirroring the content of the XML file. The game programmer will then add code to walk this data structure, pulling out strings from it and using them to create his own in-game structures.
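As a rough illustration of the DOM approach, here is a minimal sketch in Python using the standard library's `xml.dom.minidom`. The XML layout (an `enemies` element holding `enemy` elements with `name` and `speed` attributes), the `Enemy` class and the `load_enemies_dom` function are all made up for this example; they are not taken from any real game or engine.

```python
from dataclasses import dataclass
from xml.dom import minidom

# Hypothetical game data; element and attribute names are invented for this example.
ENEMIES_XML = """<enemies>
    <enemy name="slime" speed="1.5"/>
    <enemy name="bat" speed="4.0"/>
</enemies>"""


@dataclass
class Enemy:
    name: str
    speed: float


def load_enemies_dom(xml_text):
    # Step 1: the parser reads everything and builds the whole tree in memory.
    document = minidom.parseString(xml_text)

    # Step 2: our own code walks that tree, pulls strings out of it and
    # converts them into in-game structures.
    enemies = []
    for node in document.getElementsByTagName("enemy"):
        enemies.append(Enemy(node.getAttribute("name"),
                             float(node.getAttribute("speed"))))
    return enemies


enemies = load_enemies_dom(ENEMIES_XML)
```

Note that the data passes through two representations: first the parser's tree, then the game's own `Enemy` objects.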

With SAX, the game programmer also tells the XML parser to read and parse an XML file. The parser will go through the file, and for each element or attribute it encounters, it will call a function implemented by the game programmer, passing along the data. The game programmer can do pretty much what he wants in this function, and will use this to create his own in-game structures.
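A callback-driven load might look roughly like the sketch below. It uses Python's standard `xml.parsers.expat` module, which is a binding to the Expat parser, purely for illustration. The XML layout, the `Enemy` class and the `load_enemies_sax` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass
import xml.parsers.expat

# Hypothetical game data; element and attribute names are invented for this example.
ENEMIES_XML = """<enemies>
    <enemy name="slime" speed="1.5"/>
    <enemy name="bat" speed="4.0"/>
</enemies>"""


@dataclass
class Enemy:
    name: str
    speed: float


def load_enemies_sax(xml_text):
    enemies = []

    def on_start_element(tag, attributes):
        # The parser calls this once per opening tag, passing the attributes
        # as a dict. We build the in-game object right away; no tree is kept.
        if tag == "enemy":
            enemies.append(Enemy(attributes["name"], float(attributes["speed"])))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start_element
    parser.Parse(xml_text, True)  # True marks this as the final chunk of input
    return enemies


enemies = load_enemies_sax(ENEMIES_XML)
```

Here the data goes straight from the parser's callback into the game's own objects, with no intermediate tree.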

Though they appear quite similar, there are a couple of significant differences between the SAX and DOM parsing philosophies. With DOM, there are a lot of small memory allocations going on, as the parser has to allocate string buffers and hierarchy nodes as it goes along. Memory allocations are very slow, and you definitely don’t want a lot of them going on during load time if you can avoid it. With SAX, on the other hand, there won’t be many allocations going on, except whatever ones the game programmer makes in the callback functions to store the in-game data. Also, using DOM, you will have to keep the entire contents of the XML file in memory while extracting the data – with SAX, you only hold bits of it at a time.

DOM parsers have their uses, especially if you need to load a hierarchy, change it, and save it back to file, but for games where you just want to read the data and store it in your own datasets, the obvious choice is a SAX parser.

So what do most games programmers use? Well, being games programmers, they use DOM parsers. Which doesn’t make much sense at all. I think the reason is that there’s a popular DOM parser called TinyXML, which a lot of programmers are recommending on forums and such. Not because it is better than other parsers, but because it is the only one they have used 😛 (or in some cases, heard that others have used). I think the "Tiny" part of the name appeals to a lot of programmers, and it is true that the source code is quite small, and the feature set is limited. But it’s still a DOM parser, so its memory footprint, number of allocations or general speed won’t be tiny.

A better choice is a little-known SAX parser called expat, which is also the one I’m using in my engine. It’s small, easy to use and free to use and distribute in any way.

Using a SAX-based parser is not difficult, and just requires you to register some callback functions with the parser, and keep track of where the data should go. However, as I’m using object-oriented C++ for most things, I’ve built a simple system on top of expat, which allows for even easier loading of XML data, and I will be writing more about this system in a later blog entry.
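To give an idea of what "keeping track of where the data should go" can mean in practice, here is a small sketch, again in Python with the standard `xml.parsers.expat` binding rather than C++. It is not the system described above; the `LevelLoader` class, the XML layout and all names in it are assumptions made up for this example. The loader remembers which enemy is currently open, so that a nested `position` element ends up on the right object.

```python
import xml.parsers.expat

# Hypothetical level file; element and attribute names are invented for this example.
LEVEL_XML = """<level name="cave">
    <enemy type="bat"><position x="10" y="2"/></enemy>
    <enemy type="slime"><position x="4" y="0"/></enemy>
</level>"""


class LevelLoader:
    """Hypothetical loader: collects the level name and a list of enemy dicts."""

    def __init__(self):
        self.level_name = None
        self.enemies = []
        self._current_enemy = None  # the enemy being filled in, if any

    def load(self, xml_text):
        parser = xml.parsers.expat.ParserCreate()
        parser.StartElementHandler = self._on_start
        parser.EndElementHandler = self._on_end
        parser.Parse(xml_text, True)  # True marks this as the final chunk
        return self

    def _on_start(self, tag, attributes):
        if tag == "level":
            self.level_name = attributes["name"]
        elif tag == "enemy":
            # Remember the open enemy so nested elements know where to go.
            self._current_enemy = {"type": attributes["type"], "position": None}
        elif tag == "position" and self._current_enemy is not None:
            self._current_enemy["position"] = (int(attributes["x"]),
                                               int(attributes["y"]))

    def _on_end(self, tag):
        if tag == "enemy":
            # The enemy is complete; store it and clear the tracking state.
            self.enemies.append(self._current_enemy)
            self._current_enemy = None


level = LevelLoader().load(LEVEL_XML)
```

The only state carried between callbacks is `_current_enemy`; deeper hierarchies would typically need a stack of open elements instead of a single slot.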

4 Responses to “Using XML for storing game data”

  1. whoa says:

    Of course SAX and DOM have different uses, but I have yet to see any great impact on loading speed due to the use of a DOM parser, even on quite large data sets. In my experience the time difference between them is not an issue that the end user will notice (or suffer from, if you will). Give us some solid evidence here! For a realistic dataset, will it differ so much that the end-gamer 😉 will squeal in agony over the time spent waiting?!?

  2. Mattias says:

    So, here we have an anonymous person making a confident post, questioning things he appears to not have a clue about :-)

    “I have yet to see any great impact on loading speed”… What games have you worked on? Which of those used XML? It would be interesting to know what you’re basing your conclusions on 😀 (The games I’ve worked on are listed here). My guess is that you haven’t made any games. That you have little real experience, and that finding out that the technology you use is inferior has somehow ticked you off, making you feel a need to defend it. But why? Why not just use the tech which is better suited, say “thanks for the advice”, change over to SAX and move on?

    The fact of the matter is that we games programmers have a responsibility to look out for even the small details. If we can reduce the loading time from 5 minutes to 1 minute, that’s great, and a service we should provide to our players. If all it takes to achieve it is to use the right piece of technology, which is just as easy to use, then why shouldn’t we?

    Nowadays, when so many people learn most things about making games from dodgy online forums and half-baked online tutorials, there’s a blind-leading-the-blind kind of situation, and a general feeling of always being in a hurry to get to the point where you hit “run” and “things sorta works”, ignoring the finer details of what’s going on under the hood. This shows in the AAA games industry too, with a lot of people being clueless about the fundamentals.

    Don’t hang on to the wrong solution to a problem just because “this really cool guy” on “this really cool forum” said it’s the best thing. Hey, don’t take my word for it either, do your own performance tests. But don’t make the mistake of using something out of habit when it no longer serves its purpose well.

  3. Thorbjørn Lindeijer says:

    I’d just like to point out that you can get the same (or even slightly better) performance as SAX for XML parsing with a stream reader. This allows you to still write code similar to traversing a DOM tree, which is generally much more straightforward than keeping your SAX parser state under control. Stream reader support was added to Java as of version 1.6 (XMLStreamReader), and it was added to Qt as of version 4.3 (QXmlStreamReader). I believe C# and libxml2 also have interfaces like that.

  4. Mattias says:

    True, it’s not necessary to use the formal SAX interface, with all it entails – using a stream reader is just as good, and can be easier – it’s just keeping a full DOM tree around that seems a bit wasteful.
