RFC 001: types and type annotations
Contributors (in alphabetical order): Shon Feder @shonfeder, Igor Konnov @konnov, Jure Kukovec @Kukovec, Markus Kuppe @lemmy, Andrey Kupriyanov @andrey-kuprianov, Leslie Lamport
This is an RFC that reviews a number of possibilities. A concrete proposal can be found in ADR-002.
It is good to have a number of different opinions here. We have three questions:
- How to write types in TLA+.
- How to write type annotations (as a user).
- How to display and use inferred types.
1. How to write types in TLA+
Everybody has a different opinion here. It would be great to use the native TLA+ constructs to express types.
1.1. TypeOK syntax
The only way to write types in the TypeOK
style is by set membership.
For instance:
x
is an integer:x \in Int
f
is a function from an integer to an integer:f \in [Int -> Int]
f
is a function from a set of integers to a set of integers:f \in [SUBSET Int -> SUBSET Int]
r
is a record that has the fieldsa
andb
, wherea
is an integer andb
is a string:r \in [a: Int, b: STRING]
f
is a set of functions from a tuple of integers to an integer:f \in SUBSET [Int \X Int -> Int]
Foo
is an operator of anInt
andSTRING
that returns anInt
:\A a \in Int: \A b \in STRING: Foo(a, b) \in Int
Bar
is a higher-order operator that takes an operator that takes anInt
andSTRING
and returns anInt
, and returns aBOOLEAN
.
Here is an approach to higher-order operators suggested by Leslie Lamport, where he uses a theorem:
THEOREM BarType ==
ASSUME NEW G(_,_),
\A x \in Int, y \in STRING : G(x,y) \in Int
PROVE Bar(G) \in BOOLEAN
Similar to that, we can write a theorem about the type of Foo
:
THEOREM FooType ==
\A a \in Int: \A b \in STRING: Foo(a, b) \in Int
1.2. Types as terms
A classical way of writing types is by using logical terms (or algebraic datatypes).
To this end, we can define a special module Types.tla
:
---- MODULE Types ----
\* Types as terms. The right-hand side of an operator does not play a role,
\* but we define it as the corresponding set of values.
\* Alternatively, we could just define tuples of strings in rhs.
\* a type annotation operator that erases the type
value <: type == value
\* the integer type
IntT == Int
\* the Boolean type
BoolT == BOOLEAN
\* the string type
StrT == STRING
\* a set type
SetT(elemT) == SUBSET elemT
\* a function type
FunT(fromT, toT) == [fromT -> toT]
\* a sequence type
SeqT(elemT) == Seq(elemT)
\* tuple types
Tup0T == {}
Tup1T(t1) == t1
Tup2T(t1, t2) == t1 \X t2
Tup3T(t1, t2, t3) == t1 \X t2 \X t3
\* and so on, e.g., Scala has 26 tuples. how many do we like to have?
\* Record types. We assume that field names are alphabetically ordered.
\* We cannot use record-set notation here,
\* as the field names are parameters. So I gave up here on giving corresponding sets.
Rec1T(f1, t1) == <<"Rec1", f1, t1>>
Rec2T(f1, t1, f2, t2) == <<"Rec2", f1, t1, f2, t2>>
Rec3T(f1, t1, f2, t2, f3, t3) == <<"Rec3", f1, t1, f2, t2, f3, t3>>
\* and so on
\* Operator types. No clear set semantics.
\* Note that the arguments can be operators as well!
\* So this approach gives us higher-order operators for free.
Oper0T(resT) == <<"Oper0", resT>>
Oper1T(arg1T, resT) == <<"Oper1", arg1T, res1T>>
Oper2T(arg1T, arg2T, resT) == <<"Oper2", arg1T, arg2T, res1T>>
\* and so on
======================
Assuming that we have some syntax for writing down that x
has type T
,
e.g., by writing x <: T
, we can write the above examples as follows:
-
x
is an integer:x <: IntT
-
f
is a function from an integer to an integer:f <: FunT(IntT, IntT)
-
f
is a function from a set of integers to a set of integers:f <: FunT(SetT(IntT), SetT(IntT))
-
r
is a record that has the fieldsa
andb
, wherea
is an integer andb
is a string:r <: Rec2T("a", IntT, "b", StrT)
-
f
is a set of functions from a tuple of integers to an integer:f <: SetT(FunT(Tup2T(IntT, IntT), IntT))
-
Foo
is an operator of anInt
andSTRING
that returns anInt
:\A a: \A b: Foo(a, b) <: Oper2(IntT, StrT, IntT)
.- Here it gets tricky, as the TLA+ syntax does not allow us to mention an operator by name without applying it.
-
Bar
is a higher-order operator that takes an operator that takes anInt
andSTRING
and returns anInt
, and returns aBOOLEAN
.\A a, b, c: Bar(LAMBDA a, b: c) <: Oper1(Oper2(IntT, StrT, IntT), BoolT)
.- Here we have to pull lambda operators, but at least it is possible to write down a type annotation.
1.3. Types as strings
Let us introduce the following grammar for types:
T ::= var | Bool | Int | Str | T -> T | Set(T) | Seq(T) |
<<T, ..., T>> | [h_1 |-> T, ..., h_k |-> T] | (T, ..., T) => T
In this grammar, var
stands for a type variable, which can be instantiated with
concrete variable names such as a
, b
, c
, etc., whereas h_1
,...,h_k
are
field names. The rule T -> T
defines a function, while the rule
(T, ..., T) => T
defines an operator.
Assuming that we have some syntax for writing down that x
has type T
,
e.g., by writing isType("x", "T")
, we can write the above examples as follows:
-
x
is an integer:isType("x", "Int")
. -
f
is a function from an integer to an integer:isType("f", "Int -> Int")
. -
f
is a function from a set of integers to a set of integers:isType("f", "Set(Int) -> Set(Int))"
. -
r
is a record that has the fieldsa
andb
, wherea
is an integer andb
is a string:isType("r", "[a |-> Int, b |-> Str])"
. -
f
is a set of functions from a tuple of integers to an integer:isType("f", "Set(<<Int, Int>> -> Int))"
. -
Foo
is an operator of anInt
andSTRING
that returns anInt
:isType("Foo", "(Int, Str) => Int")
. -
Bar
is a higher-order operator that takes an operator that takes anInt
andSTRING
and returns anInt
, and returns aBOOLEAN
:isType("Bar", "((Int, Str) => Int) => Bool")
.
Note: We have to pass names as strings, as it is impossible to pass operator
names, e.g., Foo
and Bar
in other operators, unless Foo
and Bar
are nullary operators and isType
is a higher-order operator.
2. How to write type annotations (as a user)
Note: This question is not a priority, as we do not expect the user to write type annotations. However, it would be good to have a solution, as sometimes users want to write types.
Again, we have plenty of options and opinions here:
- Write type annotations by calling a special operator like
<:
or|=
. - Write type annotations as assumptions.
- Write type annotations in comments.
- Write type annotations as operator definitions.
2.1. Type annotations with a special operator
This is the current approach in Apalache. One has to define an operator, e.g., <:
:
value <: type == value
Then an expression may be annotated with a type as follows:
VARIABLE S
Init ==
S = {} <: {Int}
Pros:
- Intutive notation, similar to programming languages.
Cons:
- This approach works well for expressions. However, it is not clear how to extend it to operators.
- This notation is more like type clarification, rather than a type annotation. Normally types are specified for names, that is, constants, variables, functions, operators, etc.
- Same expression may be annotated in a Boolean formula. What shall we do, if the
user writes:
x <: BOOLEAN \/ x <: Int
?
Note: The current approach has an issue. If one declares the operator <:
in
a module M
and then uses an unnamed instance INSTANCE M
in a module M2
,
then M
and M2
will clash on the operator <:
. We should define the operator
once in a special module Types
or Apalache
.
2.2. Type annotations as assumptions
One can use TLA+ syntax to write assumptions and assertions about the types. We are talking only about type assumptions here. The similar approach can be used to write theorems about types. Consider the following specification:
EXTENDS Sequences
CONSTANTS Range
VARIABLES list
Mem(e, es) ==
\E i \in DOMAIN es:
e = es[i]
In this example, the operator Mem
is polymorphic, whereas the types of Range
and list
are parameterized. If the user wants to
restrict the types of constants, variables, and operators, they could write (using the
TypeOK syntax):
ASSUME(Range \in SUBSET Int)
ASSUME(list \in Seq(Int))
ASSUME(\A e \in Int, \A es \in Seq(Int): Mem(e, es) \in BOOLEAN)
SANY parser only accepts the first assumption in the above example. The two other assumptions are rejected by the parser, as they refer to non-constant values.
Moreover, using the proof syntax of
TLA+ Version 2,
we can annotate the
types of variables introduced inside the operators. For instance, we could
label the name i
as follows:
Mem(e, es) ==
\E i \in DOMAIN es:
e = es[i_use :: i]
And then write:
ASSUME(\A e, es, i: Mem(e, es)!i_use(i) \in Int)
Pros:
- The assumptions syntax is quite appealing, when writing types of CONSTANTS, VARIABLES, and top-level operators.
Cons:
- The syntax gets verbose and hard to write, when writing types of LET-IN operators and bound variables.
- It is not clear how to extend this syntax to higher-order operators.
- One cannot write assumptions about state variables.
2.3. Type annotations in comments
This solution basically gives up on TLA+ syntax and introduces a special syntax à la javadoc for type annotations:
EXTENDS Sequences
CONSTANTS Range \*@ Range: Set(Int)
VARIABLES list \*@ list: Seq(Int)
Mem(e, es) ==
\*@ Mem: (Int, Seq(Int)) => Bool
\E i \in DOMAIN es:
\*@ i: Int
e = es[i]
We have not come up with a good syntax for these annotaions. The above example demonstrates one possible approach.
Pros:
- Non-verbose, simple syntax
- Type annotations do not stand in the way of the specification author
- Type annotations may be collapsed, removed, etc.
- If we have an annotation preprocessor, we can use it for other kinds of annotations
Cons:
- As we give up on the TLA+ syntax, TLA+ Toolbox will not help us (though it is not uncommon for IDEs to parse javadoc annotations, so there is some hope)
- The users have to learn new syntax for writing type annotations and types
- We have to write an annotation preprocessor
2.4. Type annotations as operator definitions
Operators definitions and LET-IN definitions can be written almost anywhere in TLA+. Instead of writing in-comment annotations, we can write annotations with operator definitions (assuming types as strings, but this is not necessary):
EXTENDS Sequences
CONSTANTS Range
Range_type == "set(z)"
VARIABLES list
list_type == "seq(z)"
Mem(e, es) ==
LET Mem_type == "<a, seq(a)> => Bool" IN
\E i \in DOMAIN es:
LET i_type == "Int" IN
e = es[i]
Init ==
LET Init_type == "<> => Bool" IN
list = <<>>
Next ==
LET Next_type == "<> => Bool" IN
\E e \in Range:
LET e_type == "set(z)" IN
list' = Append(list, e)
Pros:
- No need for a comment preprocessor, easy to extract annotations from the operator definitions
Cons:
- Fruitless operator definitions
- Looks like a hack
3. How to display and use inferred types
TBD
Basically, use Language Server Protocol and introduce THEOREMs in the spirit of types as TypeOK.