Idiom 0: Keep state variables to the minimum
In imperative programming, it is common to use mutable variable assignments liberally, but to exercise caution whenever mutable variables have a global scope. In TLA+, mutable variables are always global, so it is important to use them carefully and in a way that accurately reflects the global state of the system you are specifying.
Description
A good TLA+ specification minimizes the computation state and makes it visible.
TLA+ does not have a special syntax for variable assignment. For a good reason. The power of TLA+ is in writing constraints on variables rather than in writing detailed commands. If you have been writing in languages such as C, C++, Java, Python, your first reflex would be to define a variable to store the intermediate result of a complex computation.
In programming languages, we introduce temporary variables for several reasons:
- To avoid repetitive computations of the same expression,
- To break down a large expression into a series of smaller expressions,
- To make the code concise.
Point 1 is a non-issue in TLA+, as it is mostly executed in the reader's brain, and people are probably less efficient in caching expressions than computers. Points 2 and 3 can be nicely addressed with LET-definitions in TLA+. Hence, there is no need for auxiliary variables.
Usually, we should minimize the specification state, that is, the scope of the data
structures that are declared with VARIABLES
. It does not mean that one variable
is always better than two. It means that what is stored in VARIABLES
should be
absolutely necessary to describe the computations or the observed properties.
Advantages
By avoiding auxiliary state variables, we localize the updates to the state. This improves specification readability. It also helps the tools, as large parts of the specification become deterministic.
Disadvantages
Sometimes, we have to expose the internals of the computation. For instance, if we want to closely monitor the values of the computed expressions, when using the specification for model-based testing.
Sometimes, we have to break this idiom to make the specification more readable. Here is an example by Markus Kuppe. The specification of BlockingQueue that has one more variable is easier to read than the original specification with a minimal number of variables.
Example
Consider the following implementation of Bubble sort in Python:
my_list = [5, 4, 3, 8, 1]
finished = False
my_list_len = len(my_list) # cache the length
while not finished:
finished = True
if my_list_len > 0:
prev = my_list[0] # save the first element to use in the loop
for i in range(1, my_list_len):
current = my_list[i]
if prev <= current:
# save current for the next iteration
prev = current
else:
# swap the elements
my_list[i - 1] = current
my_list[i] = prev
finished = False
Notice that we have introduced three local variables to optimize the code:
my_list_len
to cache the length of the list,prev
to cache the previously accessed element of the list, in order to minimize the number of list accesses,current
to cache the iterated element of the list.
In TLA+, one usually does not introduce local variables for the intermediate
results of the computation, but rather introduces variables to represent the
essential part of the algorithm state. (While we have spent some time on code
optimization, we might have missed the fact that our sorting algorithm is not
as good as Quicksort.) In the above example, the essential variables are
finished
and my_list
.
Compare the above code to (a slightly more abstract) bubble sort in TLA+:
EXTENDS Integers, Sequences
in_list == <<5, 4, 3, 8, 1>>
VARIABLES my_list, finished
Init ==
/\ my_list = in_list
/\ finished = FALSE
IsSorted(lst) ==
\A i \in DOMAIN lst \ {1}:
lst[i - 1] <= lst[i]
WhenSorted ==
/\ IsSorted(my_list)
/\ finished' = TRUE
/\ UNCHANGED my_list
WhenUnsorted ==
/\ \E i \in DOMAIN my_list \ {1}:
/\ my_list[i - 1] > my_list[i]
/\ my_list' = [my_list EXCEPT ![i - 1] = my_list[i],
![i] = my_list[i - 1]]
/\ finished' = FALSE
Next ==
IF finished
THEN UNCHANGED <<my_list, finished>>
ELSE WhenSorted \/ WhenUnsorted
Our TLA+ code contains only two state variables: my_list
and finished
.
Other variables are introduced by quantifiers (e.g., \E i \in ...
).
The state variables are not updated in the sense of programming languages.
Rather, one writes constraints over unprimed and primed versions, e.g.:
...
/\ my_list' = [my_list EXCEPT ![i - 1] = my_list[i],
![i] = my_list[i - 1]]
Of course, one can introduce aliases for intermediate expressions, for instance, by using let-definitions:
...
LET prev == my_list[i - 1]
current == my_list[i]
IN
/\ prev > current
/\ my_list' = [my_list EXCEPT ![i - 1] = current, ![i] = prev]
However, the let-definitions are not variables, they are just aliases for more complex expressions. Importantly, one cannot update the value of an expression that is defined with a let-definition. In this sense, TLA+ is similar to functional languages, where side effects are carefully avoided and minimized.
In contrast to functional languages, the value of TLA+ is not in computing the result of a function application, but in producing sequences of states (called behaviors). Hence, some parts of a useful TLA+ specification should have side effects to record the states.