JSON C++ Serialization

Posted on: 2015-04-16

As usual with serialization, there are lots of edge cases and issues to get right. The JSON serialization/deserialization now works, and here's what some test output looks like:

  "serial": {
    "majorVersion": 1,
    "minorVersion": 0,
    "instances": [
        "type": "object",
        "name": "#CSTDerived0",
        "valueType": "CSTDerived",
        "values": {
          "x": 1435496195,
          "y": 0.029479,
          "string": "A string:1962775954",
          "anotherString": "",
          "half": -0.921875,
          "halfArray": [
          "array": [
            135, 141, 217, 246, 141, 79, 160, 92, 27, 7, 8, 215, 
            207, 76, 122, 111, 161, 49, 54, 199, 7, 118, 150, 96, 
            8, 115, 171
          "rasterFormat": "LUM8",
          "rasterLoadFlags": 0,
          "refString": "D4RD*A)",
          "refName": "",
          "dumbRefName": "054CH",
          "int64": 546897557,
          "int16": -20878,
          "int8": -60,
          "float64": 0.360291,
          "obj": "#CSTBase0",
          "enumFlag": 0,
          "enum": "LUM4"
        "type": "object",
        "name": "#CSTBase0",
        "valueType": "CSTBase",
        "values": {
          "x": 1464189610,
          "y": 0.61714,
          "string": "A string:115916702",
          "anotherString": "",
          "half": 7.17969,
          "halfArray": [
          "array": null,
          "rasterFormat": "PAL8",
          "rasterLoadFlags": "CAN_COMPRESS",
          "refString": "",
          "refName": "",
          "dumbRefName": "BX+1XVR2"

In the original XML serialization this would look like this...

<?xml version="1.0" encoding="UTF-8"?>
	<obj cls="CSTRWDerived"> value=10 
		<obj field="derived" cls="CSTDerived"> 
            string="A string:-1241196936" 
            anotherString="5" half=-2.17969 
			<array field="halfArray" type="f16">-0.359375, 1.39063, 1.89063</array>
			<array field="array" type="u8">68, 133, 93, 192, 98, 73, 206, 
                215, 29, 204, 2, 253, 160, 101, 89, 135, 12, 46, 126, 196, 225, 
                72, 227, 126, 21</array> 
			<obj field="obj" cls="CSTBase"> 
                string="A string:1372361558" 
				<array field="halfArray" type="f16">-1.54688</array>
				<array field="array" type="u8">240, 9, 116, 25, 25, 196, 86, 62, 10, 197, 
                    245, 224, 250, 192, 8, 142, 195, 70, 103, 230, 190, 104, 
                    165, 124, 35</array> 
				<string field="refName">&lt;S*</string> 
        data="A string:-1237320431" 

The XML is actually relatively close, and surprisingly not that much bigger - but that's mainly because it's not strict XML: I embed values in strings which have to be parsed specially, and which XML tools don't understand and so cannot manipulate. The following is the 'full XML' version of the same data.

<?xml version="1.0" encoding="ISO-8859-1"?>
	<obj cls="CSTRWDerived">
		<i32 field="value">10</i32>
		<obj field="derived" cls="CSTDerived">
			<i32 field="x">-1873942545</i32>
			<f32 field="y">0.898559</f32>
			<string field="string">A string:-1241196936</string>
			<string field="anotherString">5</string>
			<f16 field="half">-2.17969</f16>
			<array field="halfArray" type="f16">
			<array field="array" type="u8">
			<enum type="Graphics::PixelFormat" field="rasterFormat">ABGR_U8</enum>
			<enum type="Graphics::PixelLoadFlag" field="rasterLoadFlags">CAN_COMPRESS|CAN_DITHER</enum>
			<string field="refString">B69N</string>
			<string field="refName">@@DBBS</string>
			<string field="dumbRefName">NLBNUWR</string>
			<i64 field="int64">-674024748</i64>
			<i16 field="int16">20739</i16>
			<i8 field="int8">-44</i8>
			<f64 field="float64">0.387931</f64>
			<obj field="obj" cls="CSTBase">
				<i32 field="x">1986533458</i32>
				<f32 field="y">0.883077</f32>
				<string field="string">A string:1372361558</string>
				<string field="anotherString" id="Name#0"></string>
				<f16 field="half">-6.10156</f16>
				<array field="halfArray" type="f16">
				<array field="array" type="u8">
				<enum type="Graphics::PixelFormat" field="rasterFormat">ARGB_F32</enum>
				<enum type="Graphics::PixelLoadFlag" field="rasterLoadFlags">0</enum>
				<ref field="refString" idref="Name#0"/>
				<string field="refName">&lt;S*</string>
				<ref field="dumbRefName" idref="Name#0"/>
			<enum type="Graphics::PixelLoadFlag" field="enumFlag">CAN_LOSSY_COMPRESS|CAN_COMPRESS</enum>
			<enum type="Graphics::PixelFormat" field="enum">ARGB_U8</enum>
		<string field="data">A string:-1237320431</string>
		<f32 field="floatValue">0.464458</f32>
		<i32 field="intValue">-1096084940</i32>

In the examples above I don't write out the type definition, so this is more how configuration files look. I won't go into all the edge cases - some were about making the JSON output more readable and editable, and others are due to tricky little features of how to write out types appropriately.

There are 3 representations involved when reading in serialized data: the serial representation, the internal representation, and the C++ representation.

The serial representation can be identical to the internal representation - and that's the easiest and fastest scenario - as we already have precomputed remappers that will convert the C++ representation to and from the internal representation. Unfortunately it's not uncommon for them to be different.

So how would the serial representation be different from the internal representation? One way is some 'ordinary' change. Say a new field is added, the order of fields changes, or the type of a field is changed. You could write code to 'patch' the changes - but that would be very dull. You could also use the remapping code to generate a remapper from the serial to the internal representation. That would work, but would mean there are two remappings: one from serial to internal, and another from internal to the C++ representation.

We can remove one of the remappings if we allow remapping from the serialized representation directly to the C++ representation. There is already code to produce such remappers - it is used to create the pre-calculated remapper for the internal representation from the C++ representation.

So if we allow a version to remain the same when 'ordinary changes' are made - we need a way to work out quickly if we need to calculate a new remapper or can use the pre-calculated remapper. I do this with a hash of the type - if the version number of the two types is the same I can convert with a direct remapping between the serial representation and the C++ representation. If the hash is also the same, I can use the already pre-calculated remapper. If not, then for that type I will need to build a remapper that will map the serial representation to the C++ representation.
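As a rough sketch of that check (the names TypeInfo, RemapChoice and chooseRemapper are made up for illustration and are not from the actual code; what happens on a version mismatch is not covered here, so it is just reported to the caller):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of a type as seen by one side (the file or the build).
struct TypeInfo {
    std::uint32_t version;  // changed by hand when a change is not 'ordinary'
    std::uint64_t hash;     // hash of the type's layout (fields, order, types)
};

enum class RemapChoice {
    UsePrecalculated,  // same version, same hash: layouts are identical
    BuildDirect,       // same version, different hash: build serial -> C++ remapper
    VersionMismatch    // different version: a direct remap is not assumed to work
};

inline RemapChoice chooseRemapper(const TypeInfo& serial, const TypeInfo& native) {
    if (serial.version != native.version) return RemapChoice::VersionMismatch;
    if (serial.hash == native.hash) return RemapChoice::UsePrecalculated;
    return RemapChoice::BuildDirect;
}

// Usage sketch: a field was added but the version number was left alone.
inline void exampleChooseRemapper() {
    TypeInfo onDisk{1, 0xABCDu};
    TypeInfo inCode{1, 0x1234u};
    assert(chooseRemapper(onDisk, inCode) == RemapChoice::BuildDirect);
}
```

The point of the ordering is that the cheap comparisons happen first, and a remapper is only built for the types whose hash actually changed.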

Now a new slightly more egregious problem appears. Say I add a field to a type, and leave the version number the same. Everything will work - the version number will be the same, but the hash will be different, so the system will produce a new remapper that will not copy the newly added field from the serial representation (because it won't exist), and in not doing so, it will mean the field will be correctly defaulted. This is good.
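To make the 'field that isn't in the file is simply not copied' behaviour concrete, here is a minimal sketch of building a remapper by matching fields by name. Everything here (Field, CopyOp, buildRemapper, applyRemapper) is hypothetical, and it assumes a matched field has the same type on both sides, so it does no type conversion:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical flat description of one field of a type.
struct Field {
    std::string name;
    std::string type;    // e.g. "i32"
    std::size_t offset;  // byte offset inside an instance
    std::size_t size;    // size in bytes
};
using TypeDesc = std::vector<Field>;

struct CopyOp { std::size_t srcOffset, dstOffset, size; };
using Remapper = std::vector<CopyOp>;

// Emit one copy per destination field that also exists in the source.
// Destination fields with no match get no copy, so they keep their defaults.
Remapper buildRemapper(const TypeDesc& src, const TypeDesc& dst) {
    Remapper ops;
    for (const Field& d : dst) {
        for (const Field& s : src) {
            if (s.name == d.name && s.type == d.type) {
                ops.push_back(CopyOp{s.offset, d.offset, d.size});
                break;
            }
        }
    }
    return ops;
}

void applyRemapper(const Remapper& ops, const unsigned char* src, unsigned char* dst) {
    for (const CopyOp& op : ops) {
        std::memcpy(dst + op.dstOffset, src + op.srcOffset, op.size);
    }
}
```

Because the matching is by name, reordering fields falls out for free. A change of field type would need a converting copy, which this sketch leaves out.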

The problem comes when I now define the field in the serial representation - it may not be defined for all instances - and for those I don't know how to default it. I need to give it a reasonable default value because it will be copied over what is in the run-time representation. More mundanely, if you are using the serialized format for configuration, you don't want to have to define the values of all fields - you want sensible default values for the ones you don't define.

There are a few ways you could fix this. In the end the approach I used was to calculate a default representation for each type in the serial representation. If fields are not filled in then the default value will remain.
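The idea can be sketched as 'start from the defaults, then overwrite whatever the file supplies'. This is only an illustration: values are held as strings in a std::map for brevity, the names Values and overlayOnDefaults are invented, and ignoring fields the type doesn't know about is an assumption, not something the real reader is known to do.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical serial instance: field name -> value (as text, for simplicity).
using Values = std::map<std::string, std::string>;

// Start from the type's default representation and overwrite only the
// fields that were actually present in the parsed file.
Values overlayOnDefaults(const Values& defaults, const Values& parsed) {
    Values result = defaults;
    for (const auto& entry : parsed) {
        auto it = result.find(entry.first);
        if (it != result.end()) {
            it->second = entry.second;  // field defined in the file
        }
        // Assumption: fields unknown to the type are silently ignored.
    }
    return result;
}

// Usage sketch: the file only defines "x"; "rasterFormat" keeps its default.
inline void exampleOverlay() {
    Values defaults;
    defaults["x"] = "0";
    defaults["rasterFormat"] = "LUM8";
    Values parsed;
    parsed["x"] = "1435496195";
    Values merged = overlayOnDefaults(defaults, parsed);
    assert(merged["x"] == "1435496195");
    assert(merged["rasterFormat"] == "LUM8");
}
```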

Unfortunately I can't pre-calculate 'default instances', because if the type contains a pointer then that becomes a handle, and that handle's value is specific to the SerialState. So when a JSON file is read, I have to calculate the 'default values' on the fly. This involves creating an instance of the type and running the serial remapper forward to create a serial representation. In doing so it may add instances to the SerialState. That said, I could pre-calculate if I could determine the type had no handles... something to perhaps add.
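A possible shape for that 'does this type have any handles' test is a walk over the type description. This is a sketch under assumptions: TypeNode, Kind and containsHandle are invented names, and it assumes that pointers are the only thing that turns into a handle.

```cpp
#include <cassert>
#include <vector>

enum class Kind { Scalar, String, Pointer, Struct };

// Hypothetical type description: a Struct lists the types of its fields.
struct TypeNode {
    Kind kind;
    std::vector<const TypeNode*> fields;  // only used when kind == Struct
};

// True if serializing this type could produce a handle. A Pointer is not
// followed, so self-referencing types cannot cause endless recursion.
bool containsHandle(const TypeNode& type) {
    if (type.kind == Kind::Pointer) return true;
    for (const TypeNode* field : type.fields) {
        if (field != nullptr && containsHandle(*field)) return true;
    }
    return false;
}

// A default instance could be pre-calculated only for handle-free types.
inline bool canPrecalculateDefault(const TypeNode& type) {
    return !containsHandle(type);
}

// Usage sketch: a struct of scalars is fine, one holding a pointer is not.
inline void exampleContainsHandle() {
    TypeNode i32{Kind::Scalar, {}};
    TypeNode ptr{Kind::Pointer, {}};
    TypeNode plain{Kind::Struct, {&i32, &i32}};
    TypeNode withPtr{Kind::Struct, {&plain, &ptr}};
    assert(canPrecalculateDefault(plain));
    assert(!canPrecalculateDefault(withPtr));
}
```

The result only depends on the type, so it could be computed once per type and stored next to the pre-calculated remapper.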

In conclusion, this is getting close. I need to tidy up some of the default handling, and I haven't implemented the 'ordinary' remapping scenario, but that shouldn't be too hard.