ADR-005: JSON Serialization Format
author | revision |
---|---|
Jure Kukovec | 1.1 |
This ADR documents our decision on serializing the Apalache internal representation (IR) as JSON. The purpose of introducing such a serialization is to expose the internal representation in a standardized format, which can be used for persistent storage, or for analysis by third-party tools in the future.
1. Serializable classes
The following classes are serializable:
-
TLA+ expressions (see TlaEx) and subclasses thereof:
- Named expressions
NameEx
- Literal values
ValEx
for the following literals:- Integers
TlaInt
- Strings
TlaStr
- Booleans
TlaBool
- Decimals
TlaDecimal
- Integers
- Operator expressions
OperEx
- LET-IN expressions
LetInEx
- Named expressions
-
TLA+ declarations (see TlaDecl) and subclasses thereof:
- Variable declarations
TlaVarDecl
- Constant declarations
TlaConstDecl
- Operator declarations
TlaOperDecl
- Assumption declarations
TlaAssumeDecl
- Theorem declarations
TlaTheoremDecl
- Variable declarations
-
TLA+ modules, see TlaModule
2. Disambiguation field
Every serialization will contain a disambiguation field, named kind
. This field holds the name of the class being serialized. For example, the serialization of a NameEx
will have the shape
{
"kind": "NameEx"
...
}
3. Serializing tagged entities
Serializations of entities annotated with a TypeTag
will have an additional field named type
, containing the type of the expression (see ADR-002, ADR-004 for a description of our type system and the syntax for types-as-string-annotations respectively), if the tag is Typed
, or Untyped
otherwise. For example, the integer literal 1
is represented by a ValEx
, which has type Int
and is serialized as follows:
{
"kind": "ValEx",
"type": "Int"
...
}
in the typed encoding, or
{
"kind": "ValEx",
"type": "Untyped"
...
}
in the untyped encoding.
4. Source information
Entities in the internal representation are usually annotated with source information, of the form {filename}:{startLine}:{startColumn}-{endLine}:{endColumn}
, relating them to a file range in the provided specification (from which they may have been transformed as part of preprocessing).
JSON encodings may, but are not required to, contain a source
providing this information, of the following shape:
{
"source": {
"filename" : <FILENAME>,
"from" : {
"line" : <STARTLINE>,
"column" : <STARTCOLUMN>
},
"to" : {
"line" : <ENDLINE>,
"column" : <ENDCOLUMN>
}
}
}
or
{
"source": "UNKNOWN"
}
if no source information is available (e.g. for expressions generated purely by Apalache).
Serializations generated by Apalache are always guaranteed to contain a source
field entry.
Example:
{
"kind" : "NameEx",
"type" : "Int",
"name" : "myName",
"source": {
"filename" : "MyModule.tla",
"from" : {
"line" : 3,
"column" : 5
},
"to" : {
"line" : 3,
"column" : 10
}
}
}
5. Root wrapper
JSON serializations of one or more TlaModule
objects are wrapped in a root object with two required fields:
version
, the value of which is a string representation of the current JSON encoding version, shaped{major}.{minor}
, andmodules
, the value of which is an array containing the JSON encodings of zero or moreTlaModule
objects
It may optionally contain a field "name" : "ApalacheIR"
.
This document defines JSON Version 1.0. If and when a different JSON version is defined, this document will be updated accordingly.
Apalache may refuse to import, or trigger warnings for, JSON objects with obsolete versions of the encoding in the future.
Example:
{
"name": "ApalacheIR",
"version": "1.0",
"modules" = [
{
"kind": "TlaModule",
"name": "MyModule"
...
},
...]
}
6. General serialization rules
The goal of the serialization is for the JSON structure to mimic the internal representation as closely as possible, for ease of deserialization.
Concretely, whenever a class declares a field fld: T
, its serialization also contains a field named fld
, containing the serialization of the field value.
For example, if TlaConstDecl
declares a name: String
field, its JSON serialization will have a name
field as well, containing the name string.
If a class field has the type Traversable[T]
, for some T
, the corresponding JSON entry is a list containing serializations of the individual arguments. For example, OperEx
is variadic and declares args: TlaEx*
, so its serialization has an args
field containing a (possibly empty) list.
As a running example, take the expression 1 + 1
, represented with the correct type annotations as
OperEx(
oper = TlaArithOper.plus,
args = Seq(
ValEx( TlaInt( 1 ) )( typeTag = Typed( IntT1() ) ),
ValEx( TlaInt( 1 ) )( typeTag = Typed( IntT1() ) )
)
)( typeTag = Typed( IntT1() ) )
Since both sub-expressions, the literals 1
, are identical, their serializations are equal as well:
{
"kind": "ValEx",
"type": "Int",
"value": {
"kind": "TlaInt",
"value": 1
}
}
Observe that we choose to serialize TlaValue
as a JSON object, which is more verbose, but trivial to deserialize. It has the following shape
{
"kind": <KIND> ,
"value": <VALUE>
}
The value
field depends on the kind of TlaValue
:
- For
TlaStr
: a JSON string - For
TlaBool
: a JSON Boolean - For
TlaInt(bigIntValue)
:- If
bigIntValue.isValidInt
: a JSON number - Otherwise:
{ "bigInt": bigIntValue.toString() }
- If
- For
TlaDecimal(decValue)
: a JSON stringdecValue.toString
The reason for the non-uniform treatment of integers is that Apalache encodes its TLA+ integers as BigInt
, which means that it permits values for which .isValidInt
does not hold.
While it might seem more natural to encode the entire TlaValue
as a JSON primitive, without the added object layer we would have a much tougher time deserializing.
We would need a) a sensible encoding of BigInt
values, which are not valid integers, and b) a way to distinguish both variants of BigInt
, as well as decimals, when deserializing (since JSON is not typed).
We could encode all values as strings, but they would be similarly indistinguishable when deserializing. Importantly, the type
field of the ValEx
expression is not guaranteed to contain a hint, as it could be Untyped
Take jsonOf1
to be the serialization of ValEx( TlaInt( 1 ) )( typeTag = Typed( IntT1() ) )
shown above. The serialization of 1 + 1
is then equal to
{
"kind": "OperEx",
"type": "Int",
"oper": "PLUS",
"args": [jsonOf1, jsonOf1]
}
In general, for any given oper: TlaOper
of OperEx
, the value of the oper
field in the serialization equals oper.name
.
7. Implementation
The implementation of the serialization can be found in the class
at.forsyte.apalache.io.json.TlaToJson
of the module tla-import
, see TlaToJson.