Reading DOOM WAD Files

WAD ("Where's All the Data") files, used by DOOM and various other games, are simple containers similar to zip and other archive formats, but without additional complexity (such as compression), and they are data-centric rather than file-centric. This article describes how to read the WAD files used by DOOM, DOOM II, Rise of the Triad and similar games of that era. Yes, I'm talking DOS and 1993, not the more modern reboots.

This article only covers reading a WAD and extracting its contents. It does not cover the format of the individual data within, given that the data is application dependent. With that said, I'll be covering the DOOM picture format in the next article.

In 2018 I looked into the MIX format used by the Command & Conquer games which is very similar to WAD but for reasons I don't recall I didn't end up writing a post about the format. Recently I finished reading Jimmy Maher's excellent series on DOOM and that reminded me I had wanted to look into WAD and other container formats for my own future use. As I have been completely unable to finish a single draft blog post I currently have, I decided something fresh and new (to me anyway!) was a good idea.

Although I don't normally plug other sites, Jimmy's blog The Digital Antiquarian is a fantastic blog of the games of yesteryear and I wish I could write half as well as him.

An example of reading a WAD file and examining one of the contained data lumps

About WAD Formats

There are various formats of WAD file available, each building on the previous. This initial series of articles only covers the original version first introduced in DOOM. At the time of writing, I haven't looked at the other versions but I plan to look at some of them in future articles.

I have tested the code presented in this article with WAD files from Shareware DOOM, ULTIMATE DOOM, DOOM II and Rise of the Triad: Dark War.

The Format

The format is simple enough. There is a 12 byte header which details the WAD type, the number of lumps of data it contains, and the offset where the directory index is located.

Range	Description
`0` - `3`	Either the string `IWAD` or `PWAD`
`4` - `7`	The number of entries in the directory
`8` - `11`	The location of the directory

The directory index consists of (16 * number of lumps) bytes which describe the lumps. Each 16 byte directory header details the position of the lump in the file, its size and its name.

Range	Description
`0` - `3`	The location of the lump
`4` - `7`	The size of the lump
`8` - `15`	The name of the lump, padded with `NUL` bytes

As far as I know, the directory can be located anywhere in a WAD file, or at least anywhere after the header. All of the WADs I have examined have the directory at the end of the file which makes perfect sense from a serialisation standpoint, but there's no reason why it couldn't be elsewhere. The only rule is that all elements in the directory index must be contiguous.

All integer values are in little-endian format.
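
Because the directory can sit anywhere after the header, it can be worth checking that the count and offset read from the header actually fit inside the file before seeking to them. The following is a minimal sketch rather than part of the original code; the `WadDirectoryCheck` class and `IsDirectoryInBounds` method are hypothetical names, and the rule that the directory must start at or after byte 12 is an assumption based on the description above.

```csharp
using System;

public static class WadDirectoryCheck
{
  private const int HeaderLength = 12;
  private const int DirectoryEntryLength = 16;

  // Hypothetical helper: returns true when a directory of lumpCount entries
  // starting at directoryStart fits between the header and the end of the file.
  public static bool IsDirectoryInBounds(int lumpCount, int directoryStart, long fileLength)
  {
    // Assumption: the directory never overlaps the 12 byte header.
    if (lumpCount < 0 || directoryStart < HeaderLength)
    {
      return false;
    }

    // Use long arithmetic so a very large lump count cannot overflow.
    long directoryEnd = directoryStart + (long)lumpCount * DirectoryEntryLength;

    return directoryEnd <= fileLength;
  }
}
```

A caller could pass `stream.Length` as the file length and reject the file if the check fails.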

WAD Types

The first four bytes of the file header are either IWAD or PWAD, and this denotes the type of the WAD. The I prefix means this is an "internal" WAD, which is the main WAD for a game. The P prefix denotes a "patch" WAD, which allows a WAD to override the lumps from the main internal WAD, e.g. for providing custom levels, skins or other data.
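
If you want to reject files that are not WADs at all, one option is to compare all four signature bytes rather than only the first. This is a sketch under stated assumptions: the `WadSignature` class is a hypothetical name, and the `WadType` enum shown here is a stand-in for the one used elsewhere in this article so that the snippet is self-contained.

```csharp
using System;

// Stand-in for the WadType enum referenced by the article's snippets.
public enum WadType
{
  Internal,
  Patch
}

public static class WadSignature
{
  // Hypothetical helper: maps the first four header bytes to a WadType,
  // rejecting anything that is not exactly "IWAD" or "PWAD".
  public static WadType Parse(byte[] header)
  {
    if (header == null || header.Length < 4)
    {
      throw new ArgumentException("Header must contain at least 4 bytes.", nameof(header));
    }

    bool isWad = header[1] == (byte)'W' && header[2] == (byte)'A' && header[3] == (byte)'D';

    if (isWad && header[0] == (byte)'I')
    {
      return WadType.Internal;
    }

    if (isWad && header[0] == (byte)'P')
    {
      return WadType.Patch;
    }

    throw new FormatException("Not an IWAD or PWAD signature.");
  }
}
```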

Reading the Header

Reading the header is quite straightforward: first read the 12 bytes into a buffer and determine the WAD type from the first byte. Next, we extract a 32-bit integer from each of the two remaining sets of 4 bytes, which contain the number of data lumps and then the start of the directory listing.

Note: In the interests of clarity, parameter and data validation have been omitted from the snippets in this article.

csharp

private const byte _wadHeaderLength = 12;
private const byte _lumpCountOffset = 4;
private const byte _directoryStartOffset = 8;

private WadType _type;
private int _lumpCount;
private int _directoryStart;

private void ReadWadHeader(Stream stream)
{
  byte[] buffer;

  buffer = new byte[_wadHeaderLength];

  stream.Read(buffer, 0, _wadHeaderLength);

  _type = buffer[0] == 'I' ? WadType.Internal : WadType.Patch;

  _lumpCount = GetInt32Le(buffer, _lumpCountOffset);
  _directoryStart = GetInt32Le(buffer, _directoryStartOffset);
}

public static int GetInt32Le(byte[] buffer, int offset)
{
  return buffer[offset + 3] << 24 | buffer[offset + 2] << 16 | buffer[offset + 1] << 8 | buffer[offset];
}

You could use the BitConverter.ToInt32 method, but it interprets the bytes using the byte order of the host system. If this code were run on a big-endian system, BitConverter would read the four bytes in the opposite order and return values that would be very wrong. For that reason, this set of articles uses its own code, which ignores the endianness of the system and always reads and writes as little-endian.
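
On newer runtimes there is also a built-in way to read a little-endian value regardless of host byte order. The sketch below uses `BinaryPrimitives.ReadInt32LittleEndian` from `System.Buffers.Binary`, which is available in .NET Core 2.1 and later (or via the System.Memory package on older frameworks). The `LittleEndianReader` class name is hypothetical and this is an alternative, not the code used in the rest of the article.

```csharp
using System;
using System.Buffers.Binary;

public static class LittleEndianReader
{
  // Reads a 32-bit little-endian integer from buffer at offset,
  // independent of the byte order of the machine running the code.
  public static int GetInt32Le(byte[] buffer, int offset)
  {
    return BinaryPrimitives.ReadInt32LittleEndian(buffer.AsSpan(offset, 4));
  }
}
```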

Reading the Directory

Now that we know where the directory index is located in the file, we can read out the individual lump details. As with the WAD header, we declare a buffer big enough to hold a single directory header, then read in the bytes. Using the same GetInt32Le method described earlier, we extract the position of the lump in the file and its size.

Next, we find the real length of the lump name by starting at the end of the array and working back until we find a non-zero value. Once we have this length we call Encoding.ASCII.GetString to extract the name. Unfortunately, if we called this API without defining the true length, the returned string would include any NUL padding bytes.

private const byte _directoryHeaderLength = 16;
private const byte _lumpStartOffset = 0;
private const byte _lumpSizeOffset = 4;
private const byte _lumpNameOffset = 8;

private void LoadDirectory(Stream stream, int lumpCount, int directoryStart)
{
  byte[] buffer;

  stream.Seek(directoryStart, SeekOrigin.Begin);

  buffer = new byte[_directoryHeaderLength];

  for (int i = 0; i < lumpCount; i++)
  {
    int offset;
    int size;
    string name;

    stream.Read(buffer, 0, _directoryHeaderLength);

    offset = GetInt32Le(buffer, _lumpStartOffset);
    size = GetInt32Le(buffer, _lumpSizeOffset);
    name = this.GetSafeLumpName(buffer);

    // Do something with the 3 values
  }
}

private string GetSafeLumpName(byte[] buffer)
{
  int length;

  length = 0;

  // walk back from the end of the name field to the last non-NUL byte
  for (int i = _directoryHeaderLength; i > _lumpNameOffset; i--)
  {
    if (buffer[i - 1] != '\0')
    {
      length = i - _lumpNameOffset;
      break;
    }
  }

  return length > 0
      ? Encoding.ASCII.GetString(buffer, _lumpNameOffset, length)
      : null;
}
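
The backwards scan above keeps every byte up to the last non-zero one. If a WAD were to contain stray bytes after the first NUL in the name field, those bytes would end up in the returned name. The following is an alternative sketch that stops at the first NUL instead; `LumpNames` and `GetNameUpToFirstNul` are hypothetical names, and the method assumes it is handed a full 16 byte directory header.

```csharp
using System;
using System.Text;

public static class LumpNames
{
  private const int NameOffset = 8;
  private const int NameLength = 8;

  // Hypothetical alternative: treats the first NUL as the end of the name,
  // ignoring any bytes that follow it in the 8 byte name field.
  public static string GetNameUpToFirstNul(byte[] entry)
  {
    int terminator = Array.IndexOf(entry, (byte)0, NameOffset, NameLength);

    // No NUL found means the name uses all 8 bytes.
    int length = terminator < 0 ? NameLength : terminator - NameOffset;

    return length > 0
        ? Encoding.ASCII.GetString(entry, NameOffset, length)
        : null;
  }
}
```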

About Names and Empty Data

Lump names may not be unique and can appear multiple times. For example, every DOOM map that I've looked at so far has a lump named THINGS, another named LINEDEFS and several more.

As a result, DOOM seems to make use of uniquely named lumps (e.g. E1M1) that serve no purpose other than to act as a bookmark to a contiguous set of lumps that make up a feature (sometimes with another placeholder at the end if the set of lumps is dynamic). For placeholders, the lump size is set to zero, and the lump offset is set either to the offset of the next valid lump or to zero. This also means that, depending on the application using the WAD, lump order is important.
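
To illustrate why lump order matters, here is a rough sketch of collecting the lumps that follow a marker such as E1M1. It is not how DOOM itself does it. The `DirectoryEntry` and `MarkerLookup` types are hypothetical, and the code assumes that any zero-length entry is a marker, which may not hold for every WAD.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container for the three values read from a directory header.
public sealed class DirectoryEntry
{
  public DirectoryEntry(string name, int offset, int size)
  {
    Name = name;
    Offset = offset;
    Size = size;
  }

  public string Name { get; }
  public int Offset { get; }
  public int Size { get; }
}

public static class MarkerLookup
{
  // Returns the entries that follow the named marker, stopping at the next
  // zero-length entry or at the end of the directory.
  // Assumption: zero-length entries are always markers.
  public static List<DirectoryEntry> GetLumpsAfterMarker(IList<DirectoryEntry> directory, string marker)
  {
    List<DirectoryEntry> results = new List<DirectoryEntry>();
    bool found = false;

    foreach (DirectoryEntry entry in directory)
    {
      if (!found)
      {
        // Still searching for the marker itself.
        found = entry.Size == 0 && string.Equals(entry.Name, marker, StringComparison.OrdinalIgnoreCase);
      }
      else if (entry.Size == 0)
      {
        // Reached the next marker, so the group has ended.
        break;
      }
      else
      {
        results.Add(entry);
      }
    }

    return results;
  }
}
```

Because names such as THINGS repeat for every map, looking a lump up by name alone would not tell you which map it belongs to; walking forward from the marker does.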

Reading Lump Data

To read the actual data for a given lump, we would set the Position of our backing Stream to the lump offset and then only read data up to the length of the lump.

csharp

  using (Stream stream = File.OpenRead(fileName))
  {
    using (WadReader reader = new WadReader(stream))
    {
      WadLump lump;

      while ((lump = reader.GetNextLump()) != null)
      {
        byte[] buffer;

        buffer = new byte[lump.Size];

        stream.Position = lump.Offset;
        stream.Read(buffer, 0, buffer.Length);
      }
    }
  }
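
One detail the snippet above glosses over is that Stream.Read is allowed to return fewer bytes than requested, so a single call is not guaranteed to fill the buffer. The following is a minimal sketch of a helper that loops until the whole lump has been read; `StreamHelpers` and `ReadBlock` are hypothetical names and not part of the original code.

```csharp
using System;
using System.IO;

public static class StreamHelpers
{
  // Reads exactly count bytes starting at the given stream position,
  // looping because Stream.Read may return fewer bytes than requested.
  public static byte[] ReadBlock(Stream stream, long position, int count)
  {
    byte[] buffer = new byte[count];
    int total = 0;

    stream.Position = position;

    while (total < count)
    {
      int read = stream.Read(buffer, total, count - total);

      if (read == 0)
      {
        // The stream ended before the requested range was complete.
        throw new EndOfStreamException("Lump extends past the end of the stream.");
      }

      total += read;
    }

    return buffer;
  }
}
```

With a helper like this, the body of the loop could become a single call such as `StreamHelpers.ReadBlock(stream, lump.Offset, lump.Size)`.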

This is error prone and means the caller has to know the lump's offset and size up front, instead of being able to pass a Stream to another method. So for this case, I created an OffsetStream class which basically acts as a window into another stream, without being able to read data it shouldn't and without the caller needing to explicitly know about source boundaries.

csharp

internal sealed class OffsetStream : Stream
{
  private readonly int _length;
  private readonly int _start;
  private readonly Stream _stream;
  private long _position;

  public OffsetStream(Stream source, int start, int length)
  {
    _stream = source;
    _start = start;
    _length = length;
  }

  public override bool CanRead
  {
    get { return true; }
  }

  public override bool CanSeek
  {
    get { return true; }
  }

  public override bool CanWrite
  {
    get { return false; }
  }

  public override long Length
  {
    get { return _length; }
  }

  public override long Position
  {
    get { return _position; }
    set
    {
      if (value < 0 || value > _length)
      {
        throw new ArgumentOutOfRangeException(nameof(value), value, "Value outside of stream range.");
      }

      _position = value;
    }
  }

  public override int Read(byte[] buffer, int offset, int count)
  {
    int read;

    // never read past the end of the window
    if (_position + count > _length)
    {
      count = _length - (int)_position;
    }

    read = 0;

    if (count > 0)
    {
      _stream.Position = _start + _position;

      // the source may return fewer bytes than requested
      read = _stream.Read(buffer, offset, count);
      _position += read;
    }

    return read;
  }

  public override long Seek(long offset, SeekOrigin origin)
  {
    long value;

    switch (origin)
    {
      case SeekOrigin.Begin:
        value = offset;
        break;

      case SeekOrigin.Current:
        value = _position + offset;
        break;

      case SeekOrigin.End:
        // offsets relative to the end are zero or negative
        value = _length + offset;
        break;

      default:
        throw new ArgumentOutOfRangeException(nameof(origin), origin, "Invalid origin value.");
    }

    this.Position = value;

    return value;
  }

  // remaining abstract members of Stream; this stream is read-only

  public override void Flush()
  {
    // nothing to flush
  }

  public override void SetLength(long value)
  {
    throw new NotSupportedException();
  }

  public override void Write(byte[] buffer, int offset, int count)
  {
    throw new NotSupportedException();
  }
}

With this class in place, I can now get a Stream that only provides access to a specific lump's data with a call similar to the below.

csharp

public Stream GetInputStream()
{
  return new OffsetStream(_container, _offset, _size);
}

I can then dispose of this stream or pass it to another method (for example Image.FromStream) without needing to know or care that it is part of something bigger, or affecting the larger container.

csharp

while ((lump = reader.GetNextLump()) != null)
{
  Image image = Image.FromStream(lump.GetInputStream());
}

Putting it all together

For this example, I created the WadReader class, which is a forward reading class for quickly enumerating the contents of a WAD. I also added a WadFile class which will load all the lump metadata into a collection for further use.

Using the WadReader

The WadReader is designed for quickly enumerating the contents of a WAD. It maintains enough state to know where it is in the WAD, but nothing else, expecting the consumer to take care of storing whatever information is required. This would be useful, for example, if you wanted to pull out one or more lumps for load on demand.

The WadReader class exposes a Type and Count property, and a GetNextLump method which can be used to enumerate. GetNextLump will return a valid object as long as there are items remaining, and null once it reaches the end of the file.

csharp

private static void WriteWadInfo(string fileName)
{
  using (Stream stream = File.OpenRead(fileName))
  {
    using (WadReader reader = new WadReader(stream))
    {
      WadLump lump;

      Console.WriteLine("File: {0}", fileName);
      Console.WriteLine("Type: {0}", reader.Type);
      Console.WriteLine("Lump Count: {0}", reader.Count);

      while ((lump = reader.GetNextLump()) != null)
      {
        Console.WriteLine("{0}: Offset {1}, Size {2}", lump.Name, lump.Offset, lump.Size);

        // stream.Position is also automatically set to the
        // start of the lump data, allowing me to do
        // stream.Read if required, or call lump.GetInputStream()
        // to get a stream to pass to other methods
      }
    }
  }
}

Using the WadFile class

The WadFile class loads all the lumps (but not the actual data) into a collection so that they are always available. You can then pull out lump data at any point without having to re-read the directory, and the class provides convenience methods for more easily pulling out WAD data. It isn't as efficient as WadReader, but it is easier to use. It also supports write operations, whilst WadReader does not.

csharp

private void FillItems(string fileName)
{
  WadFile wadFile;

  wadFile = WadFile.LoadFrom(fileName);

  namesListBox.BeginUpdate();
  namesListBox.Items.Clear();

  namesListBox.Sorted = false;

  for (int i = 0; i < wadFile.Lumps.Count; i++)
  {
    namesListBox.Items.Add(wadFile.Lumps[i]);
  }

  if (_useNameSort)
  {
    namesListBox.Sorted = true;
  }

  namesListBox.EndUpdate();
}

Where's All The Source Code

There is no single download available for this sample because, rather than the simple demo I do for most blog posts, it is a slightly more complex solution covering reading, writing and various other features too. The full project is available from our GitHub page.

Wrapping Up

The WAD format has no real features and so is simple to read and write. The linked GitHub page includes a demonstration program which allows WAD files to be opened and contents extracted.


# Krapul

Oct 30, 2022 08:36

Hi ! First of all, thanks for the detailed def of a WAD. But... i was disappointed to find no tool using these data, for i'm no programmer. Is there a link i missed ? Or can you point to an existing tool that could dive informations like type (I/P wad), title, levels #s & titels, aso), like WinRAR displaying the content of an archive, but with specific infos. Many thanks again.


Richard Moss
