Intervals and their relations
Two years ago I had the luck to participate to DDDEurope and listen to the nice keynote by Eric Evans on Modelling Time. I really liked the idea of modelling time not necessarily as instants but more as intervals, and making explicit the possible relations which two intervals can have. These relations are described by the so called Allen’s interval algebra.
Some time ago I started working on a little Haskell pet project to compute the common time availability of several people (something like doodle.com but where everyone first declares his/her availability and only then a time is chosen from the intersection of the availabilities) and the key concept of the domain were exactly intervals.
A bit after that Taylor Fausak released a Haskell package called Rampart which implements exactly the relations of Allen’s interval algebra.
Finding myself with a little of free time, I didn’t resist the temptation of trying to link my own library, which was dealing with intervals, to Rampart, so that I could consider relations between intervals.
While I was doing that, I realised that interpreting Allen’s interval algebra relations without ambiguities was not trivial (see for example this Rampart issue) and in the end I decided to reimplement it in a more general and unambiguous way.
INTERVALS
Loosely speaking we can define an interval as the set of values included between two boundaries. Trying to model this idea using types, we could sketch something like
data Boundary a = Boundary { _value :: a }
data Interval a = Interval { _start :: Boundary a, _end :: Boundary a }
We would like to ensure that _start
is always smaller than _end
. To this end we hide the Interval
data constructor and we expose a function to build an interval from two boundaries
instance Ord a => Ord (Boundary a)
boundsInterval :: Ord a => Boundary a -> Boundary a -> Interval
At this point we have to make our first design decision. What should boundsInterval
return if the first boundary if bigger that the second one? One option could be returning Maybe Interval
to notify the caller that he provided invalid values. Another option is to refine a bit the vague definition of interval we gave above, declaring an interval to be the set of values which are greater than _start
and smaller than _end
. Doing so, it becomes clear that, if _start
is greater than _end
, the interval delimited by _start
and _end
should simply be empty. Currently though we don’t have an explicit way to represent the empty interval, so let’s add it
data Interval a
= Empty
| Interval { _start :: Boundary a, _end :: Boundary a }
RELATIONS
Now that we defined what an interval could be, let’s try to understand how two intervals could relate to each other. We can list all the 13 possible relations in the following table
Relation | Illustration | Interpretation |
---|---|---|
|
X takes place before Y | |
|
X meets Y (i stands for inverse) | |
|
X overlaps with Y | |
|
X starts Y | |
|
X during Y | |
|
X finishes Y | |
X is equal to Y |
and define the following Haskell data type to enumerate all the possibilities
data Relation
= Equal
| Starts
| Finishes
| During
| StartedBy
| FinishedBy
| Contains
| Before
| After
| Meets
| IsMet
| Overlaps
| OverlappedBy
We would like now to define a function
relate :: Ord a => Interval a -> Interval a -> Relation
to compute how two given intervals are related.
SINGLETONS
When we try concretely to implement relate
we realise that there are some situations which are a bit ambiguous and they all pertain the case where one of the two intervals is a singleton, i.e. the starting bound and the ending bound coincide.
Let’s consider for example the relation between two intervals a = boundsInterval x x
and b = boundsInterval x y
. Should it be that a
meets b
since they have a boundary in common and b
happens all after a
? Or should it be that a
starts b
? Or maybe a
overlaps b
? It is not immediately clear what the correct choice should be.
The Allen’s paper notices the issue and dismisses it considering only intervals with a start smaller than the end.
One possible solution comes from a more precise inspection of the domain and using a bit of maths to abstract the relevant information out of our particular case.
BOUNDARIES
The first ingredient to solve our issue with relate
is to be more precise in the description of our domain model.
Above, when we gave the definition of interval, we never really specified the semantic of the boundaries. When we write Interval x y
do we mean that x
and y
are actually contained in the interval, or do we mean that they are excluded? There is no correct choice here, so to remain agnostic of this detail, we provide the possibility to specify explicitly in a boundary whether it should be included or excluded
data IsIncluded
= Included
| Excluded
data Boundary a = Boundary
{ _boundaryValue :: a
, _isIncluded :: IsIncluded
}
Now we can be very explicit when we define our intervals
a = boundsInterval (Boundary 1 Included) (Boundary 3 Excluded)
-- 1 ≤ x < 3
b = boundsInterval (Boundary 2 Included) (Boundary 2 Included)
-- 2 ≤ x ≤ 2 => b = {2}
c = boundsInterval (Boundary 0 Excluded) (Boundary 0 Included)
-- 0 < x ≤ 0 => c is empty
RELATIONS RELOADED
Now that we introduced both included and excluded boundaries, the 13 cases in the Relation
data type reveal themselves as being not that well defined. For example, if we consider the two intervals
a = boundsInterval (Boundary 1 Excluded) (Boundary 2 Included)
b = boundsInterval (Boundary 2 Included) (Boundary 3 Excluded)
what should relate a b
be? Should it be Meets
or should it be Overlaps
? We clearly need a more precise definition to disambiguate such cases.
To do this, we need to consider the operations that we have available on intervals. Considering intervals as sets, we can derive for them operations to compute the intersection and the union of two intervals
intersection :: Ord a => Interval a -> Interval a -> Interval a
union :: Ord a
=> Interval a
-> Interval a
-> Either (Interval a) (Interval a, Interval a)
Notice how we are making explicit that the union of two intervals could be either one or two separate intervals.
Intersection and union induce a partial order ≤
on intervals defined by
-- a is contained in b
a ≤ b = a `union` b == b
= a `intersection` b == a
Using this partial order, we can identify four cases, when considering two intervals a
and b
:
-
a ≤ b
andb ≤ a
: this, by antisymmetry, means thata == b
; hence, it is clear that is must berelate a b = Equal
a ≤ b
andb ≰ a
: this means thata
is properly contained inb
. We can further dissect this possibility checking if the boundaries are equal:start a == start b
: in this caserelate a b = Starts
end a == end b
: in this caserelate a b = Finishes
start a != start b
andend a != end b
: in this caserelate a b = During
-
a ≰ b
andb ≤ a
: this is the symmetric case with respect to the previous one. Dissecting the possibilities as we did above we obtain theStartedBy
,FinishedBy
andContains
cases a ≰ b
andb ≰ a
: this means that neithera
is contained inb
, norb
is contained ina
The last case is still too broad, so we have to analyse it more precisely. We can check the intersection of a
and b
to split this further:
-
a `intersection` b != empty
: this means that the two intervals have some value in common, hence they overlap. Depending on the ordering ofstart a
andstart b
, we would get eitherrelate a b = Overlaps
orrelate a b = OverlappedBy
-
a `intersection` b == empty
: no value is in both intervals. At this point we inspect better the union ofa
andb
:a `union` b
is two separate intervals: in this case the two intervals are really disjoint, meaning that there exists a value which is bigger that one and smaller than the other. Depending on the order ofstart a
andstart b
we will get eitherrelate a b = Before
orrelate a b = After
a `union` b
is one single interval: the two intervals are disjoint, but there is not value between them, so they must be one right after the other. Depending on the order ofstart a
andstart b
we will get eitherrelate a b = Meets
orrelate a b = IsMet
This gives a precise algorithm to determine the relation occurring between any two intervals. We can see now that we have a precise way to distinguish what happens in cases where two intervals have a boundary with a common value
relate (boundsInterval (Boundary 0 Included) (Boundary 1 Excluded))
(boundsInterval (Boundary 1 Excluded) (Boundary 2 Included))
= Before -- not Meets
relate (boundsInterval (Boundary 0 Included) (Boundary 1 Included))
(boundsInterval (Boundary 1 Excluded) (Boundary 2 Included))
= Meets
relate (boundsInterval (Boundary 0 Included) (Boundary 1 Included))
(boundsInterval (Boundary 1 Included) (Boundary 2 Included))
= Overlaps -- not Meets
It disambiguates also the issues with singleton intervals
relate (boundsInterval (Boundary 0 Included) (Boundary 0 Included))
(boundsInterval (Boundary 0 Included) (Boundary 0 Included))
= Equals
relate (boundsInterval (Boundary 0 Included) (Boundary 0 Included))
(boundsInterval (Boundary 0 Included) (Boundary 2 Excluded))
= Starts
relate (boundsInterval (Boundary 1 Included) (Boundary 1 Included))
(boundsInterval (Boundary 0 Included) (Boundary 2 Included))
= During
CONCLUSION
We took a look at intervals and how they can relate with each other. We noticed how a naive model of the domain leads to subtle issues and ambiguities. With a better model, distinguishing between included and excluded interval boundaries, and using their algebraic operations on intervals as much as possible, we were able to disambiguate the problematic cases and recover a clear view of the situation.
In general I believe that trying to model a domain using mathematical abstractions is a very productive choice. It usually leads to more general and elegant solutions because it forces us to understand our domain to isolate the relevant aspects and forget about the noise given by context-over-specific information.
If you want to check out a working implementation of what we discussed above, you can find it in my availer project.