WAD ("Where's All the Data") files, used by DOOM and various other games, are simple containers similar to zip and other archive formats, but without additional complexity (such as compression) and data-centric rather than file-centric. This article describes how to read the WAD files used by DOOM, DOOM II, Rise of the Triad and similar games of that era. Yes, I'm talking DOS and 1993, not the more modern reboots.
The article only covers reading a WAD and extracting its contents; it does not cover the format of the individual data within, given that the data is application dependent. With that said, I'll be covering the DOOM picture format in the next article.
In 2018 I looked into the MIX format used by the Command & Conquer games which is very similar to WAD but for reasons I don't recall I didn't end up writing a post about the format. Recently I finished reading Jimmy Maher's excellent series on DOOM and that reminded me I had wanted to look into WAD and other container formats for my own future use. As I have been completely unable to finish a single draft blog post I currently have, I decided something fresh and new (to me anyway!) was a good idea.
Although I don't normally plug other sites, Jimmy's blog The Digital Antiquarian is a fantastic blog of the games of yesteryear and I wish I could write half as well as him.
There are various formats of WAD file available, each building on the previous. This initial series of articles only covers the original version first introduced in DOOM. At the time of writing, I haven't looked at the other versions, but I plan to look at some of them in future articles.
I have tested the code presented in this article with WAD files from Shareware DOOM, ULTIMATE DOOM, DOOM II and Rise of the Triad: Dark War.
The format is simple enough. There is a 12 byte header which details the wad type, the number of lumps of data it contains, and an offset where the directory index is located.
|Bytes|Description|
|0-3|Either the string IWAD or PWAD|
|4-7|The number of entries in the directory|
|8-11|The location of the directory|
The directory index comprises (16 * number of lumps) bytes which describe the lumps. Each 16 byte entry details the lump's position in the file, its size and its name.
|Bytes|Description|
|0-3|The location of the lump|
|4-7|The size of the lump|
|8-15|The name of the lump, padded with NUL bytes|
As far as I know, the directory can be located anywhere in a WAD file, or at least anywhere after the header. All of the WADs I have examined have the directory at the end of the file which makes perfect sense from a serialisation standpoint, but there's no reason why it couldn't be elsewhere. The only rule is that all elements in the directory index must be contiguous.
All integer values are in little-endian format.
The first four bytes of the file header are either IWAD or
PWAD, and this denotes the type of the WAD. The I prefix
means this is an "internal" WAD, which is the main WAD for a
game. The P prefix denotes a "patch" WAD, which allows a WAD
to override the lumps from the main internal WAD, e.g. for
providing custom levels, skins or other data.
Reading the header is quite straightforward: first read the 12 bytes into a buffer and determine the WAD type based on the first byte. Next, we extract 32-bit integers from each set of 4 bytes in the remainder of the header, which contain the number of data lumps and then the start of the directory listing.
Note: In the interests of clarity, parameter and data validation have been omitted from the snippets in this article.
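The header read described above might look something like the following minimal sketch. The WadType, WadHeader and WadHeaderReader names are illustrative assumptions and not necessarily those used in the accompanying project.

```csharp
using System;
using System.IO;

public enum WadType { Internal, Patch }

public sealed class WadHeader
{
    public WadType Type;
    public int LumpCount;
    public int DirectoryOffset;
}

// Hypothetical helper class, named for illustration only
public static class WadHeaderReader
{
    public const int HeaderLength = 12;

    public static WadHeader Read(Stream stream)
    {
        byte[] buffer = new byte[HeaderLength];
        FillBuffer(stream, buffer);

        return new WadHeader
        {
            // 'I' marks an internal WAD; anything else is treated as a patch here
            Type = buffer[0] == (byte)'I' ? WadType.Internal : WadType.Patch,
            LumpCount = ToInt32LittleEndian(buffer, 4),
            DirectoryOffset = ToInt32LittleEndian(buffer, 8)
        };
    }

    // Stream.Read may return fewer bytes than requested, so loop until full
    private static void FillBuffer(Stream stream, byte[] buffer)
    {
        int total = 0;

        while (total < buffer.Length)
        {
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0)
            {
                throw new EndOfStreamException();
            }
            total += read;
        }
    }

    // Assembles the value from the lowest byte upwards, whatever the host order
    public static int ToInt32LittleEndian(byte[] buffer, int offset)
    {
        return buffer[offset]
             | (buffer[offset + 1] << 8)
             | (buffer[offset + 2] << 16)
             | (buffer[offset + 3] << 24);
    }
}
```

Only the first byte is inspected to decide the type, as described above; a stricter reader would compare all four bytes against IWAD and PWAD.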
You could use the BitConverter.ToInt32 method, but BitConverter interprets bytes in the host's native order, so if this code were run on a big-endian system the returned values would be very wrong. This set of articles therefore uses its own code, which ignores the endianness of the system and always reads and writes as little-endian.
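For comparison, here are two framework-based ways of reading a little-endian value regardless of the host. This is only a sketch and not the approach this series takes: LittleEndianReader is a made-up name, and BinaryPrimitives assumes .NET Core 2.1 or later (or the System.Memory package).

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper class, named for illustration only
public static class LittleEndianReader
{
    // BitConverter follows the host byte order, so on a big-endian machine
    // reverse a copy of the four bytes before converting.
    public static int ReadWithBitConverter(byte[] buffer, int offset)
    {
        if (BitConverter.IsLittleEndian)
        {
            return BitConverter.ToInt32(buffer, offset);
        }

        byte[] copy = new byte[4];
        Array.Copy(buffer, offset, copy, 0, 4);
        Array.Reverse(copy);

        return BitConverter.ToInt32(copy, 0);
    }

    // BinaryPrimitives always reads little-endian, whatever the host order
    public static int ReadWithBinaryPrimitives(byte[] buffer, int offset)
    {
        return BinaryPrimitives.ReadInt32LittleEndian(buffer.AsSpan(offset, 4));
    }
}
```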
Now that we know where the directory index is located in the
file, we can read out the individual lump details. As with the
WAD header, we declare a buffer big enough to hold a directory
entry, then read in the bytes. Using the same
method described earlier, we extract the size of the lump and
its position in the file.
Next, we find the real length of the lump name, by starting at
the end of the array and working back until we find a non-zero
value. Once we have this length we call
Encoding.ASCII.GetString to extract the name. Unfortunately,
if we called this API without defining the true length, the
returned string would include any
NUL padding bytes.
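A minimal sketch of reading one directory entry, including the backwards scan for the real name length, could look like this. WadLump and WadDirectoryReader are illustrative names of my choosing, and the stream is assumed to be positioned at the start of an entry.

```csharp
using System;
using System.IO;
using System.Text;

public sealed class WadLump
{
    public int Offset;
    public int Size;
    public string Name;
}

// Hypothetical helper class, named for illustration only
public static class WadDirectoryReader
{
    public const int EntryLength = 16;
    private const int NameOffset = 8;
    private const int NameLength = 8;

    public static WadLump ReadEntry(Stream stream)
    {
        byte[] buffer = new byte[EntryLength];
        int total = 0;

        // Stream.Read may return fewer bytes than requested
        while (total < buffer.Length)
        {
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0)
            {
                throw new EndOfStreamException();
            }
            total += read;
        }

        return new WadLump
        {
            Offset = ToInt32LittleEndian(buffer, 0),
            Size = ToInt32LittleEndian(buffer, 4),
            Name = Encoding.ASCII.GetString(buffer, NameOffset, GetNameLength(buffer))
        };
    }

    // Work back from the end of the name field until a non-zero byte is found
    public static int GetNameLength(byte[] buffer)
    {
        int length = NameLength;

        while (length > 0 && buffer[NameOffset + length - 1] == 0)
        {
            length--;
        }

        return length;
    }

    private static int ToInt32LittleEndian(byte[] buffer, int offset)
    {
        return buffer[offset]
             | (buffer[offset + 1] << 8)
             | (buffer[offset + 2] << 16)
             | (buffer[offset + 3] << 24);
    }
}
```

A name that uses all eight bytes has no padding at all, which is why the length is calculated rather than searching for a terminator.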
Lump names are not necessarily unique and can appear multiple
times. For example, every DOOM map that I've looked at so far
has a lump named THINGS, another named
LINEDEFS and several more.
As a result, DOOM seems to make use of uniquely named lumps
(e.g. E1M1) that serve no purpose other than to be a bookmark
to a contiguous set of lumps that make up a feature (and
sometimes another placeholder at the end if the lumps are
dynamic). For placeholders, the lump size is set to zero, and
the lump offset is set either to the offset of the next valid
lump or to zero. This also means that, depending on the
application using the WAD, lump order is important.
To read the actual data for a given lump, we would set the
Position of our backing
Stream to the lump offset and then
only read data up to the length of the lump.
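Done by hand, that read might look like the sketch below, which copies a lump into a byte array. LumpDataReader is an illustrative name, and the stream is assumed to be seekable.

```csharp
using System;
using System.IO;

// Hypothetical helper class, named for illustration only
public static class LumpDataReader
{
    public static byte[] ReadLumpData(Stream stream, int offset, int size)
    {
        byte[] data = new byte[size];

        // Jump to the start of the lump in the backing stream
        stream.Position = offset;

        // Read exactly size bytes and no further
        int total = 0;
        while (total < size)
        {
            int read = stream.Read(data, total, size - total);
            if (read == 0)
            {
                throw new EndOfStreamException();
            }
            total += read;
        }

        return data;
    }
}
```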
This sounds error prone and means you have to know this
information up front instead of being able to pass a Stream to
another method. So for this case, I created a stream class which
basically acts as a window into another stream, without being
able to read data it shouldn't or the caller needing to
explicitly know about source boundaries.
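As a rough idea of what such a windowing stream involves, here is a minimal read-only sketch. It is not the class from the accompanying project: WindowStream is a hypothetical name, the source stream is assumed to be seekable, and it deliberately leaves the source open when disposed.

```csharp
using System;
using System.IO;

// Hypothetical class, named for illustration only
public sealed class WindowStream : Stream
{
    private readonly Stream _source;
    private readonly long _start;
    private readonly long _length;
    private long _position;

    public WindowStream(Stream source, long start, long length)
    {
        _source = source;
        _start = start;
        _length = length;
    }

    public override bool CanRead => true;
    public override bool CanSeek => _source.CanSeek;
    public override bool CanWrite => false;
    public override long Length => _length;

    // Positions are relative to the start of the window, not the source
    public override long Position
    {
        get => _position;
        set
        {
            if (value < 0 || value > _length)
            {
                throw new ArgumentOutOfRangeException(nameof(value));
            }
            _position = value;
        }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        long remaining = _length - _position;
        if (remaining <= 0)
        {
            return 0; // end of the window, even if the source has more data
        }
        if (count > remaining)
        {
            count = (int)remaining;
        }

        _source.Position = _start + _position;
        int read = _source.Read(buffer, offset, count);
        _position += read;

        return read;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        switch (origin)
        {
            case SeekOrigin.Begin: Position = offset; break;
            case SeekOrigin.Current: Position = _position + offset; break;
            default: Position = _length + offset; break;
        }
        return _position;
    }

    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Setting the source position on every read keeps the window correct even if something else moves the underlying stream in between, at the cost of an extra seek.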
With this class in place, I can now get a Stream that only
provides access to a specific lump's data.
I can then dispose of this stream or pass it to another method
(e.g. ImageFile.FromStream) without needing to know or care
that it is part of something bigger, and without affecting the
underlying stream.
For this example, I created the
WadReader class, which is a
forward reading class for quickly enumerating the contents of a
WAD. I also added a
WadFile class which will load all the lump
metadata into a collection for further use.
WadReader is designed for quickly enumerating the contents
of a WAD. It maintains enough state to know where it is in the
WAD, but nothing else, expecting the consumer to take care of
storing whatever information is required. This would be useful,
for example, if you wanted to pull out one or more lumps for
load on demand.
The WadReader class exposes a Count property and a
GetNextLump method, which can be used to enumerate the lumps.
GetNextLump will return a valid object as long as there are
items remaining, and
null once it reaches the end of the file.
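To make the shape of such a forward-only reader concrete, here is a much-reduced sketch built around the same Count and GetNextLump members. It is not the WadReader from the project: SimpleWadReader and LumpEntry are hypothetical names, the stream is assumed to be seekable, and validation is omitted.

```csharp
using System;
using System.IO;
using System.Text;

public sealed class LumpEntry
{
    public int Offset;
    public int Size;
    public string Name;
}

// Hypothetical class, named for illustration only
public sealed class SimpleWadReader
{
    private readonly Stream _stream;
    private readonly int _directoryOffset;
    private int _index;

    public SimpleWadReader(Stream stream)
    {
        _stream = stream;

        byte[] header = ReadBytes(0, 12);
        Count = ToInt32(header, 4);
        _directoryOffset = ToInt32(header, 8);
    }

    public int Count { get; }

    // Returns the next directory entry, or null when none remain
    public LumpEntry GetNextLump()
    {
        if (_index >= Count)
        {
            return null;
        }

        // Seek each time so the caller may read lump data between calls
        byte[] entry = ReadBytes(_directoryOffset + (_index * 16L), 16);
        _index++;

        int nameLength = 8;
        while (nameLength > 0 && entry[8 + nameLength - 1] == 0)
        {
            nameLength--;
        }

        return new LumpEntry
        {
            Offset = ToInt32(entry, 0),
            Size = ToInt32(entry, 4),
            Name = Encoding.ASCII.GetString(entry, 8, nameLength)
        };
    }

    private byte[] ReadBytes(long position, int count)
    {
        byte[] buffer = new byte[count];
        _stream.Position = position;

        int total = 0;
        while (total < count)
        {
            int read = _stream.Read(buffer, total, count - total);
            if (read == 0)
            {
                throw new EndOfStreamException();
            }
            total += read;
        }

        return buffer;
    }

    private static int ToInt32(byte[] b, int i)
    {
        return b[i] | (b[i + 1] << 8) | (b[i + 2] << 16) | (b[i + 3] << 24);
    }
}
```

A consumer would typically call GetNextLump in a loop until it returns null, keeping whichever entries it needs.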
The WadFile class loads all the lumps (but not the actual
data) into a collection so that they are always available. You
can then pull out lump data at any point without having to
re-read the directory, and the class provides convenience
methods for more easily pulling out WAD data. It isn't as
efficient as WadReader but is easier to use. It also supports
write operations whilst WadReader does not.
There is no single download available for this sample. Rather than the simple demo I do for most blog posts, it is a slightly more complex solution covering reading, writing and various other features. The full project is available from our GitHub page.
The WAD format has no real features and so is simple to read and write. The linked GitHub page includes a demonstration program which allows WAD files to be opened and contents extracted.