ADR-005: JSON Serialization Format

authorrevision
Jure Kukovec1.1

This ADR documents our decision on serializing the Apalache internal representation (IR) as JSON. The purpose of introducing such a serialization is to expose the internal representation in a standardized format, which can be used for persistent storage, or for analysis by third-party tools in the future.

1. Serializable classes

The following classes are serializable:

  1. TLA+ expressions (see TlaEx) and subclasses thereof:

    • Named expressions NameEx
    • Literal values ValEx for the following literals:
      • Integers TlaInt
      • Strings TlaStr
      • Booleans TlaBool
      • Decimals TlaDecimal
    • Operator expressions OperEx
    • LET-IN expressions LetInEx
  2. TLA+ declarations (see TlaDecl) and subclasses thereof:

    • Variable declarations TlaVarDecl
    • Constant declarations TlaConstDecl
    • Operator declarations TlaOperDecl
    • Assumption declarations TlaAssumeDecl
    • Theorem declarations TlaTheoremDecl
  3. TLA+ modules, see TlaModule

2. Disambiguation field

Every serialization will contain a disambiguation field, named kind. This field holds the name of the class being serialized. For example, the serialization of a NameEx will have the shape

{
  "kind": "NameEx"
  ...
}

3. Serializing tagged entities

Serializations of entities annotated with a TypeTag will have an additional field named type, containing the type of the expression (see ADR-002, ADR-004 for a description of our type system and the syntax for types-as-string-annotations respectively), if the tag is Typed, or Untyped otherwise. For example, the integer literal 1 is represented by a ValEx, which has type Int and is serialized as follows:

{
  "kind": "ValEx",
  "type": "Int"
  ...
}

in the typed encoding, or

{
  "kind": "ValEx",
  "type": "Untyped"
  ...
}

in the untyped encoding.

4. Source information

Entities in the internal representation are usually annotated with source information, of the form {filename}:{startLine}:{startColumn}-{endLine}:{endColumn}, relating them to a file range in the provided specification (from which they may have been transformed as part of preprocessing). JSON encodings may, but are not required to, contain a source providing this information, of the following shape:

{
  "source": {
    "filename" : <FILENAME>,
    "from" : {
      "line" : <STARTLINE>,
      "column" : <STARTCOLUMN>
    },
    "to" : {
      "line" : <ENDLINE>,
      "column" : <ENDCOLUMN>
    }
  }
}

or

{
  "source": "UNKNOWN"
}

if no source information is available (e.g. for expressions generated purely by Apalache). Serializations generated by Apalache are always guaranteed to contain a source field entry.

Example:

{
  "kind" : "NameEx",
  "type" : "Int",
  "name" : "myName",
  "source": {
    "filename" : "MyModule.tla",
    "from" : {
      "line" : 3,
      "column" : 5
    },
    "to" : {
      "line" : 3,
      "column" : 10
    }
  }
}

5. Root wrapper

JSON serializations of one or more TlaModule objects are wrapped in a root object with two required fields:

  • version, the value of which is a string representation of the current JSON encoding version, shaped {major}.{minor}, and
  • modules, the value of which is an array containing the JSON encodings of zero or more TlaModule objects

It may optionally contain a field "name" : "ApalacheIR". This document defines JSON Version 1.0. If and when a different JSON version is defined, this document will be updated accordingly. Apalache may refuse to import, or trigger warnings for, JSON objects with obsolete versions of the encoding in the future. Example:

{
  "name": "ApalacheIR",
  "version": "1.0",
  "modules" = [
    {
      "kind": "TlaModule",
      "name": "MyModule"
      ...
    },
    ...]
}

6. General serialization rules

The goal of the serialization is for the JSON structure to mimic the internal representation as closely as possible, for ease of deserialization. Concretely, whenever a class declares a field fld: T, its serialization also contains a field named fld, containing the serialization of the field value. For example, if TlaConstDecl declares a name: String field, its JSON serialization will have a name field as well, containing the name string.

If a class field has the type Traversable[T], for some T, the corresponding JSON entry is a list containing serializations of the individual arguments. For example, OperEx is variadic and declares args: TlaEx*, so its serialization has an args field containing a (possibly empty) list.

As a running example, take the expression 1 + 1, represented with the correct type annotations as

OperEx( 
  oper = TlaArithOper.plus, 
  args = Seq( 
    ValEx( TlaInt( 1 ) )( typeTag = Typed( IntT1() ) ), 
    ValEx( TlaInt( 1 ) )( typeTag = Typed( IntT1() ) )
    ) 
  )( typeTag = Typed( IntT1() ) )

Since both sub-expressions, the literals 1, are identical, their serializations are equal as well:

{
  "kind": "ValEx",
  "type": "Int",
  "value": {
    "kind": "TlaInt",
    "value": 1
  }
}

Observe that we choose to serialize TlaValue as a JSON object, which is more verbose, but trivial to deserialize. It has the following shape

{
  "kind": <KIND> ,
  "value": <VALUE>
}

The value field depends on the kind of TlaValue:

  1. For TlaStr: a JSON string
  2. For TlaBool: a JSON Boolean
  3. For TlaInt(bigIntValue):
    1. If bigIntValue.isValidInt: a JSON number
    2. Otherwise: { "bigInt": bigIntValue.toString() }
  4. For TlaDecimal(decValue): a JSON string decValue.toString

The reason for the non-uniform treatment of integers is that Apalache encodes its TLA+ integers as BigInt, which means that it permits values for which .isValidInt does not hold.

While it might seem more natural to encode the entire TlaValue as a JSON primitive, without the added object layer we would have a much tougher time deserializing. We would need a) a sensible encoding of BigInt values, which are not valid integers, and b) a way to distinguish both variants of BigInt, as well as decimals, when deserializing (since JSON is not typed). We could encode all values as strings, but they would be similarly indistinguishable when deserializing. Importantly, the type field of the ValEx expression is not guaranteed to contain a hint, as it could be Untyped

Take jsonOf1 to be the serialization of ValEx( TlaInt( 1 ) )( typeTag = Typed( IntT1() ) ) shown above. The serialization of 1 + 1 is then equal to

{
  "kind": "OperEx",
  "type": "Int",
  "oper": "PLUS",
  "args": [jsonOf1, jsonOf1]
}

In general, for any given oper: TlaOper of OperEx, the value of the oper field in the serialization equals oper.name.

7. Implementation

The implementation of the serialization can be found in the class at.forsyte.apalache.io.json.TlaToJson of the module tla-import, see TlaToJson.