When making games, you find yourself dealing with lots of data of different sorts, everything from the movement speed and jump height of the player to the level layouts or game art, and it can sometimes be of great importance how you deal with it. Some things (especially the smaller, simpler things), you might as well just store in the code itself, while other things might be better to store in external files. This entry will be about storing game data, and more specifically about using XML files for storage.
In the various online forums and message boards about game development, the question of how to store game data pops up from time to time. Usually, some people will reply by recommending their favourite format (be it a scripting language like Lua, or a data format like INI files or XML) and then keep defending their choice, touting it as the best in any given situation. These are people with enough experience to feel confident, but not nearly enough experience to be worth listening to. Give them a few years and a couple of large projects, and they’ll see the error of their ways.
The fact is that how you store data depends on your game, the data in question and (yes) your personal preferences.
Don’t think that you’ll be using the same format for all the data in a game, or that you will use the same formats for every game you make. Attempting to do so will result in a lot of wasted time that could have been better spent.
If the data is large (like bitmaps, 3D models etc.), you must use custom binary formats. There’s no other way to get maximum load speed, and there’s no reason to make your players wait for your game to complete a bunch of text parsing during loading (and yes, even for small games, text-based formats make a significant impact on load times, even on fast computers). If you need the data to control flow or check conditions, you need a scripting language of some sort. But it’s usually not a good idea to use an out-of-the-box scripting language – you need a simpler, custom solution which will let you express your game scripts in the simplest way possible, to make them easier to write.
Sometimes you need to store small amounts of data that should be fairly easy to edit by hand, or you need an intermediate format for larger amounts of data (which will be converted to a custom binary format before release, like my 3D model file format), and in many cases XML is a good choice for this. Note, though, that it’s not the only one – always evaluate your options and make the choice that you feel comfortable with.
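To make the "intermediate format" idea concrete, here is a minimal sketch of such a conversion step. It is written in Python only because its standard library includes an XML parser and binary packing; the element names, the attribute names and the binary layout (a vertex count followed by three floats per vertex) are all made up for illustration and are not the actual model format mentioned above.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical hand-editable source data; element and attribute names are invented.
MESH_XML = """
<mesh>
  <vertex x="0.0" y="0.0" z="0.0"/>
  <vertex x="1.0" y="0.0" z="0.0"/>
  <vertex x="0.0" y="1.0" z="0.5"/>
</mesh>
"""

def xml_to_binary(xml_text):
    """Offline tool step: turn the XML into a flat little-endian blob."""
    root = ET.fromstring(xml_text)
    vertices = [(float(v.get("x")), float(v.get("y")), float(v.get("z")))
                for v in root.iter("vertex")]
    blob = struct.pack("<I", len(vertices))      # header: vertex count
    for vertex in vertices:
        blob += struct.pack("<3f", *vertex)      # three 32-bit floats per vertex
    return blob

def load_binary(blob):
    """Game-side step: no text parsing, just unpack fixed-size records."""
    (count,) = struct.unpack_from("<I", blob, 0)
    return [struct.unpack_from("<3f", blob, 4 + i * 12) for i in range(count)]

blob = xml_to_binary(MESH_XML)
vertices = load_binary(blob)
```

The point of the split is that the slow, forgiving text parsing happens once on the developer's machine, while the shipped game only ever reads the compact blob.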
When making games using my own free, public domain game engine, I use XML for some things from time to time, and if used right, it’s a really nice way of storing data. I won’t go into too much detail about the XML format itself – there are plenty of online resources describing XML basics. There’s one thing I think I should mention though:
There are two different ways of reading and working with XML files, and two corresponding classes of parsers: DOM (Document Object Model) and SAX (Simple API for XML).
With DOM, the game programmer tells the XML parser to read and parse an XML file. The parser will go through the file and create a hierarchical data structure in memory, mirroring the content of the XML file. The game programmer will then add code to walk this data structure, pulling out strings from it and using them to create his own in-game structures.
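As a rough illustration of the DOM workflow, the sketch below uses `xml.dom.minidom` from the Python standard library (chosen only because it needs no extra dependencies). The XML content and the `Enemy` structure are hypothetical. Note the two separate passes: first the parser builds the tree, then our own code walks it.

```python
from dataclasses import dataclass
from xml.dom import minidom

# Hypothetical game data, invented for this example.
ENEMIES_XML = """<enemies>
  <enemy name="bat" speed="3.5" health="10"/>
  <enemy name="slime" speed="1.0" health="25"/>
</enemies>"""

@dataclass
class Enemy:
    name: str
    speed: float
    health: int

def load_enemies_dom(xml_text):
    # Pass 1: the parser builds the whole document tree in memory.
    document = minidom.parseString(xml_text)
    # Pass 2: our code walks the tree, pulling out strings and converting them.
    enemies = []
    for node in document.getElementsByTagName("enemy"):
        enemies.append(Enemy(
            name=node.getAttribute("name"),
            speed=float(node.getAttribute("speed")),
            health=int(node.getAttribute("health")),
        ))
    document.unlink()  # release the tree once the data has been copied out
    return enemies

enemies = load_enemies_dom(ENEMIES_XML)
```

Until `unlink()` is called, both the tree and the in-game structures exist side by side, which is the duplication the DOM approach implies.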
With SAX, the game programmer also tells the XML parser to read and parse an XML file. The parser will go through the file, and for each element it encounters, it will call a function implemented by the game programmer, passing along the element’s name and attributes (text content and closing tags are reported through similar callbacks). The game programmer can do pretty much what he wants in this function, and will use this to create his own in-game structures.
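A comparable sketch of the SAX workflow, using `xml.sax` from the Python standard library, might look like this. Again, the XML content, the `Enemy` structure and the handler class are hypothetical. The parser never hands back a tree; it simply calls `startElement` as it reads, and the handler builds the in-game data directly.

```python
import xml.sax
from dataclasses import dataclass

# Hypothetical game data, invented for this example.
ENEMIES_XML = """<enemies>
  <enemy name="bat" speed="3.5" health="10"/>
  <enemy name="slime" speed="1.0" health="25"/>
</enemies>"""

@dataclass
class Enemy:
    name: str
    speed: float
    health: int

class EnemyHandler(xml.sax.ContentHandler):
    """Receives callbacks from the parser and stores only what the game needs."""

    def __init__(self):
        super().__init__()
        self.enemies = []

    def startElement(self, name, attrs):
        # Called once per opening tag, with that tag's attributes.
        if name == "enemy":
            self.enemies.append(Enemy(
                name=attrs["name"],
                speed=float(attrs["speed"]),
                health=int(attrs["health"]),
            ))

def load_enemies_sax(xml_text):
    handler = EnemyHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.enemies

enemies = load_enemies_sax(ENEMIES_XML)
```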
Though they appear quite similar, there are a couple of significant differences between the SAX and DOM parsing philosophies. With DOM, there are a lot of small memory allocations going on, as the parser has to allocate string buffers and hierarchy nodes as it goes along. Memory allocation is slow, and you definitely don’t want a lot of it going on during load time if you can avoid it. With SAX, on the other hand, there won’t be many allocations going on, except whatever ones the game programmer makes in the callback functions to store the in-game data. Also, using DOM, you will have to keep the entire contents of the XML file in memory while extracting the data; with SAX, you only hold bits of it at a time.
DOM parsers have their uses, especially if you need to load a hierarchy, change it, and save it back to file, but for games where you just want to read the data and store it in your own datasets, the obvious choice is a SAX parser.
So what do most games programmers use? Well, being games programmers, they use DOM parsers. Which doesn’t make much sense at all. I think the reason is that there’s a popular DOM parser called TinyXML, which a lot of programmers are recommending on forums and such. Not because it is better than other parsers, but because it is the only one they have used (or in some cases, heard that others have used). I think the "Tiny" part of the name appeals to a lot of programmers, and it is true that the source code is quite small, and the feature set is limited. But it’s still a DOM parser, so neither its memory footprint nor its number of allocations will be tiny, and it won’t be particularly fast either.
A better choice is a little known SAX-style parser called expat, which is also the one I’m using in my engine. It’s small, easy to use, and released under the permissive MIT license, so it’s free to use and distribute.
Using a SAX-based parser is not difficult, and just requires you to register some callback functions with the parser, and keep track of where the data should go. However, as I’m using object-oriented C++ for most things, I’ve built a simple system on top of expat, which allows for even easier loading of XML data, and I will be writing more about this system in a later blog entry.
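To give a feel for what "register some callbacks and keep track of where the data should go" means, here is a minimal sketch using the expat bindings in the Python standard library (`xml.parsers.expat`). It is not the system built on top of expat in my engine. The level format and the `LevelLoader` class are invented for illustration. In the C API the same registration is done with `XML_SetElementHandler`, `XML_SetCharacterDataHandler` and `XML_SetUserData`.

```python
import xml.parsers.expat

# Hypothetical level file, invented for this example.
LEVEL_XML = """<level name="cave">
  <player>
    <speed>4.5</speed>
    <jump_height>2.0</jump_height>
  </player>
  <enemy name="bat" x="10" y="4"/>
  <enemy name="slime" x="22" y="0"/>
</level>"""

class LevelLoader:
    def __init__(self):
        self.level = {"name": None, "player": {}, "enemies": []}
        self._path = []  # names of the elements currently open
        self._text = []  # character data collected for the current element

    def start(self, name, attrs):
        self._path.append(name)
        self._text = []
        if name == "level":
            self.level["name"] = attrs.get("name")
        elif name == "enemy":
            self.level["enemies"].append(
                {"name": attrs["name"], "x": int(attrs["x"]), "y": int(attrs["y"])})

    def text(self, data):
        # expat may deliver one element's text in several pieces.
        self._text.append(data)

    def end(self, name):
        # Direct children of <player> hold numeric settings as text.
        if self._path[-2:] == ["player", name]:
            self.level["player"][name] = float("".join(self._text))
        self._text = []
        self._path.pop()

def load_level(xml_text):
    loader = LevelLoader()
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = loader.start
    parser.EndElementHandler = loader.end
    parser.CharacterDataHandler = loader.text
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return loader.level

level = load_level(LEVEL_XML)
```

The small `_path` stack is the only bookkeeping needed here: it tells each callback which element it is inside, so the same handler can route data to the right place.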