Using custom type converters with C# and YamlDotNet, part 2

Recently I discussed using type converters to perform custom serialization of types in YamlDotNet. In this post I'll concentrate on expanding the type converter to support deserialization as well.

I'll be reusing a lot of code and knowledge from the first part of this mini-series, so if you haven't read that yet it is a good place to start.

Even more so than with part 1, in this article I'm completely winging it. This code works in my demonstration program but I'm by no means confident it is error-free or the best way of reading YAML objects.

To deserialize data via a type converter, we need to implement the ReadYaml method of the IYamlTypeConverter interface. This method provides an object implementing IParser for reading the YAML, along with a type parameter describing the type of object the method should return. This latter parameter can be ignored unless your converter can handle multiple object types.

The IParser interface itself is very basic - a MoveNext method to advance the parser, and a Current property which returns the current ParsingEvent object (the same types of object we originally used to write the YAML).

YamlDotNet also adds a few extension methods to this interface which may be of use. Although this sample project only uses the base interface, I point out where you could use these extension methods instead, as you may find them more readable.
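To get a feel for what the parser hands you, it can help to dump the events for a small document before writing any converter code. The following is only a minimal sketch: the ParserEventDump class and its GetEventNames method are names made up for this illustration, and it assumes the Parser class from the YamlDotNet.Core namespace, constructed over a TextReader.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using YamlDotNet.Core;

// Hypothetical helper for illustration only, not part of YamlDotNet
public static class ParserEventDump
{
  // Returns the type name of every event the parser produces for the given YAML
  public static List<string> GetEventNames(string yaml)
  {
    List<string> names;
    IParser parser;

    names = new List<string>();
    parser = new Parser(new StringReader(yaml));

    // MoveNext returns false once there are no more events to read
    while (parser.MoveNext())
    {
      names.Add(parser.Current.GetType().Name);
    }

    return names;
  }
}
```

Calling GetEventNames("Name: Blog\nTopics:\n  - yaml\n") should produce a list that opens with StreamStart and DocumentStart before it reaches MappingStart, with the Scalar, SequenceStart and SequenceEnd events nested inside. By the time a type converter is called the parser is already sitting on the first event of the value, which is why the code in this article starts by checking for MappingStart.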

A key tip is to always advance the parser by calling MoveNext - if you don't, then YamlDotNet will call your converter again and again in an infinite loop. This is the very first issue I encountered when I wrote some placeholder code as below and then ran the demo program.

csharp

public object ReadYaml(IParser parser, Type type)
{
  // As we're not advancing the parser, we've just introduced an infinite loop
  return new ContentCategory();
}

You should probably consider having automated tests that run as you're writing the code, using a tool such as NCrunch. Just as with serializing, I found writing deserialization code using YamlDotNet to be non-intuitive and debugging counterproductive.

Reading property maps

To read a map, we first check that the current element is a MappingStart instance. Then we just keep reading and processing nodes until we get a corresponding MappingEnd object.

csharp

private static readonly Type _mappingStartType = typeof(MappingStart);
private static readonly Type _mappingEndType = typeof(MappingEnd);

public object ReadYaml(IParser parser, Type type)
{
  ContentCategory result;

  if (parser.Current.GetType() != _mappingStartType) // You could also use parser.Accept<MappingStart>()
  {
    throw new InvalidDataException("Invalid YAML content.");
  }

  parser.MoveNext(); // move on from the map start

  result = new ContentCategory();

  // Check before reading so that an empty map ({}) doesn't throw
  while (parser.Current.GetType() != _mappingEndType)
  {
    // do something with the current node

    parser.MoveNext();
  }

  parser.MoveNext(); // skip the mapping end (or crash)

  return result;
}

With the basics in place, we can now process the nodes inside our loop. As it is a mapping, each entry should start with a scalar name, which is often followed by a simple scalar value. For this reason I added a helper method to check if the current node is a Scalar and if so return its value (otherwise it throws an exception).

csharp

private string GetScalarValue(IParser parser)
{
  Scalar scalar;

  scalar = parser.Current as Scalar;

  if (scalar == null)
  {
    throw new InvalidDataException("Failed to retrieve scalar value.");
  }
  
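  // You could replace the above null check with parser.Expect<Scalar>(), which will throw its own exception.
  // Note that Expect also advances the parser, so the callers would no longer need to call MoveNext themselves.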
  
  return scalar.Value;
}
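If you would rather lean on the extension methods, the helper shrinks to a single line. This is a sketch under an assumption about versions: as far as I'm aware, recent YamlDotNet releases (version 8 onwards, I believe) name this method Consume, whereas older releases call it Expect. The ScalarReader class and ReadScalarValue method are names invented for this example.

```csharp
using YamlDotNet.Core;
using YamlDotNet.Core.Events;

// Hypothetical helper for illustration only, not part of YamlDotNet
public static class ScalarReader
{
  // Returns the value of the current scalar and moves the parser on to the next event.
  // Consume<T> throws its own exception if the current event is not of type T.
  public static string ReadScalarValue(IParser parser)
  {
    return parser.Consume<Scalar>().Value;
  }
}
```

The important difference from GetScalarValue is that this version advances the parser as a side effect. Any code calling it must therefore drop its own MoveNext call after reading a scalar, otherwise it will skip an event.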

Inside the main processing loop, I get the scalar value that represents the name of the property to process and advance the parser so it is ready to process the property value. I then check the property name and act accordingly, depending on whether it is a simple or complex type.

csharp

string value;

value = this.GetScalarValue(parser);
parser.MoveNext(); // skip the scalar property name

switch (value)
{
  case "Name":
    result.Name = this.GetScalarValue(parser);
    break;
  case "Title":
    result.Title = this.GetScalarValue(parser);
    break;
  case "Topics":
    this.ReadTopics(parser, result.Topics);
    break;
  case "Categories":
    this.ReadContentCategories(parser, result.Categories);
    break;
  default:
    throw new InvalidDataException("Unexpected scalar value '" + value + "'.");
}

For the sample Name and Title properties of my ContentCategory object, I use the GetScalarValue helper method above to just return the string value. The Topics and Categories properties however are collection objects, which leads us nicely to the next section.

Reading lists

Reading lists is fairly similar to reading maps, except this time we start by looking for SequenceStart and finish when we reach SequenceEnd. For example, in the demonstration project, the Topics property is a list of strings and so can be read by taking each scalar entry in the sequence in turn.

csharp

private static readonly Type _sequenceEndType = typeof(SequenceEnd);
private static readonly Type _sequenceStartType = typeof(SequenceStart);

private void ReadTopics(IParser parser, StringCollection topics)
{
  if (parser.Current.GetType() != _sequenceStartType)
  {
    throw new InvalidDataException("Invalid YAML content.");
  }

  parser.MoveNext(); // skip the sequence start

  // Check before reading so that an empty sequence ([]) doesn't throw
  while (parser.Current.GetType() != _sequenceEndType)
  {
    topics.Add(this.GetScalarValue(parser));
    parser.MoveNext();
  }
}

Sequences don't have to be lists of simple values; the entries can be complex objects in their own right. As our ContentCategory object can have children of the same type, another helper method repeatedly calls the converter's own ReadYaml method to construct child objects.

csharp

private void ReadContentCategories(IParser parser, ContentCategoryCollection categories)
{
  if (parser.Current.GetType() != _sequenceStartType)
  {
    throw new InvalidDataException("Invalid YAML content.");
  }

  parser.MoveNext(); // skip the sequence start

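  // ReadYaml leaves the parser on the event after each child's MappingEnd,
  // so there is no need to call MoveNext inside this loop
  while (parser.Current.GetType() != _sequenceEndType)
  {
    categories.Add((ContentCategory)this.ReadYaml(parser, typeof(ContentCategory)));
  }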
}

What I don't know how to do, however, is invoke the original parser logic for handling other types. Nor do I know how our custom type converters are supposed to make use of INamingConvention implementations. The demo project is using capitalisation, but the production code is using pure lowercase to avoid any ambiguity.
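One possible workaround for the naming convention question, offered as an untested idea rather than the way YamlDotNet intends it to be done, is to hand the converter an INamingConvention yourself and compare each key read from the YAML against the converted property name. The PropertyNameMatcher class below is a hypothetical helper written for this sketch; it relies only on the Apply method of INamingConvention.

```csharp
using System;
using YamlDotNet.Serialization;

// Hypothetical helper for illustration only, not part of YamlDotNet
public sealed class PropertyNameMatcher
{
  private readonly INamingConvention _namingConvention;

  public PropertyNameMatcher(INamingConvention namingConvention)
  {
    _namingConvention = namingConvention;
  }

  // Returns true if the key read from the YAML is the converted form of the .NET property name
  public bool IsMatch(string yamlKey, string propertyName)
  {
    return string.Equals(yamlKey, _namingConvention.Apply(propertyName), StringComparison.Ordinal);
  }
}
```

With a matcher built from CamelCaseNamingConvention.Instance (assuming a YamlDotNet version recent enough to expose that static instance), IsMatch("name", "Name") returns true. The switch statement in the converter would then become a chain of if statements calling IsMatch, and you would need to remember to pass the same convention to both the builder and the converter.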

Using the custom type converter

Just as we did with the SerializerBuilder in part 1, we use the WithTypeConverter method on a DeserializerBuilder instance to inform YamlDotNet of the existence of our converter.

csharp

Deserializer deserializer;

deserializer = new DeserializerBuilder()
  .WithTypeConverter(new ContentCategoryYamlTypeConverter())
  .Build();

It would be nice if I could decorate my types with a YamlDotNet version of the standard TypeConverter attribute and so avoid having to manually use WithTypeConverter but this doesn't seem to be a supported feature.

Closing

Custom YAML serialization and deserialization with YamlDotNet isn't as straightforward as it perhaps could be, but it isn't difficult to do. Even better, if you serialize valid YAML then it's entirely possible (as in my case, where I'm attempting to serialize fewer default values) that you don't need to write custom deserialization code at all, as YamlDotNet will handle it for you.