Intervals and their relations

Two years ago I had the luck to participate to DDDEurope and listen to the nice keynote by Eric Evans on Modelling Time. I really liked the idea of modelling time not necessarily as instants but more as intervals, and making explicit the possible relations which two intervals can have. These relations are described by the so called Allen’s interval algebra.

Some time ago I started working on a little Haskell pet project to compute the common time availability of several people (something like doodle.com but where everyone first declares his/her availability and only then a time is chosen from the intersection of the availabilities) and the key concept of the domain were exactly intervals.

A bit after that Taylor Fausak released a Haskell package called Rampart which implements exactly the relations of Allen’s interval algebra.

Finding myself with a little of free time, I didn’t resist the temptation of trying to link my own library, which was dealing with intervals, to Rampart, so that I could consider relations between intervals.

While I was doing that, I realised that interpreting Allen’s interval algebra relations without ambiguities was not trivial (see for example this Rampart issue) and in the end I decided to reimplement it in a more general and unambiguous way.

INTERVALS

Loosely speaking we can define an interval as the set of values included between two boundaries. Trying to model this idea using types, we could sketch something like

data Boundary a = Boundary { _value :: a }

data Interval a = Interval { _start :: Boundary a, _end :: Boundary a }

We would like to ensure that _start is always smaller than _end. To this end we hide the Interval data constructor and we expose a function to build an interval from two boundaries

instance Ord a => Ord (Boundary a)

boundsInterval :: Ord a => Boundary a -> Boundary a -> Interval

At this point we have to make our first design decision. What should boundsInterval return if the first boundary if bigger that the second one? One option could be returning Maybe Interval to notify the caller that he provided invalid values. Another option is to refine a bit the vague definition of interval we gave above, declaring an interval to be the set of values which are greater than _start and smaller than _end. Doing so, it becomes clear that, if _start is greater than _end, the interval delimited by _start and _end should simply be empty. Currently though we don’t have an explicit way to represent the empty interval, so let’s add it

data Interval a
  = Empty
  | Interval { _start :: Boundary a, _end :: Boundary a }

RELATIONS

Now that we defined what an interval could be, let’s try to understand how two intervals could relate to each other. We can list all the 13 possible relations in the following table

Relation	Illustration	Interpretation
$X\,{\mathrel }}\,Y$ $Y\,{\mathrel }}\,X$		X takes place before Y
$X\,{\mathrel }}\,Y$ $Y\,{\mathrel }}\,X$		X meets Y (i stands for inverse)
$X\,{\mathrel }}\,Y$ $Y\,{\mathrel }}\,X$		X overlaps with Y
$X\,{\mathrel }}\,Y$ $Y\,{\mathrel }}\,X$		X starts Y
$X\,{\mathrel }}\,Y$ $Y\,{\mathrel }}\,X$		X during Y
$X\,{\mathrel }}\,Y$ $Y\,{\mathrel }}\,X$		X finishes Y
$X\,{\mathrel }}\,Y$		X is equal to Y

and define the following Haskell data type to enumerate all the possibilities

data Relation
  = Equal
  | Starts
  | Finishes
  | During
  | StartedBy
  | FinishedBy
  | Contains
  | Before
  | After
  | Meets
  | IsMet
  | Overlaps
  | OverlappedBy

We would like now to define a function

relate :: Ord a => Interval a -> Interval a -> Relation

to compute how two given intervals are related.

SINGLETONS

When we try concretely to implement relate we realise that there are some situations which are a bit ambiguous and they all pertain the case where one of the two intervals is a singleton, i.e. the starting bound and the ending bound coincide.

Let’s consider for example the relation between two intervals a = boundsInterval x x and b = boundsInterval x y. Should it be that a meets b since they have a boundary in common and b happens all after a? Or should it be that a starts b? Or maybe a overlaps b? It is not immediately clear what the correct choice should be.

The Allen’s paper notices the issue and dismisses it considering only intervals with a start smaller than the end.

One possible solution comes from a more precise inspection of the domain and using a bit of maths to abstract the relevant information out of our particular case.

BOUNDARIES

The first ingredient to solve our issue with relate is to be more precise in the description of our domain model.

Above, when we gave the definition of interval, we never really specified the semantic of the boundaries. When we write Interval x y do we mean that x and y are actually contained in the interval, or do we mean that they are excluded? There is no correct choice here, so to remain agnostic of this detail, we provide the possibility to specify explicitly in a boundary whether it should be included or excluded

data IsIncluded
  = Included
  | Excluded

data Boundary a = Boundary
  { _boundaryValue :: a
  , _isIncluded :: IsIncluded
  }

Now we can be very explicit when we define our intervals

a = boundsInterval (Boundary 1 Included) (Boundary 3 Excluded)
-- 1 ≤ x < 3

b = boundsInterval (Boundary 2 Included) (Boundary 2 Included)
-- 2 ≤ x ≤ 2 => b = {2}

c = boundsInterval (Boundary 0 Excluded) (Boundary 0 Included)
-- 0 < x ≤ 0 => c is empty

RELATIONS RELOADED

Now that we introduced both included and excluded boundaries, the 13 cases in the Relation data type reveal themselves as being not that well defined. For example, if we consider the two intervals

a = boundsInterval (Boundary 1 Excluded) (Boundary 2 Included)
b = boundsInterval (Boundary 2 Included) (Boundary 3 Excluded)

what should relate a b be? Should it be Meets or should it be Overlaps? We clearly need a more precise definition to disambiguate such cases.

To do this, we need to consider the operations that we have available on intervals. Considering intervals as sets, we can derive for them operations to compute the intersection and the union of two intervals

intersection :: Ord a => Interval a -> Interval a -> Interval a

union :: Ord a
  => Interval a
  -> Interval a
  -> Either (Interval a) (Interval a, Interval a)

Notice how we are making explicit that the union of two intervals could be either one or two separate intervals.

Intersection and union induce a partial order ≤ on intervals defined by

-- a is contained in b
a ≤ b = a `union` b == b
      = a `intersection` b == a

Using this partial order, we can identify four cases, when considering two intervals a and b:

a ≤ b and b ≤ a: this, by antisymmetry, means that a == b; hence, it is clear that is must be relate a b = Equal
a ≤ b and b ≰ a: this means that a is properly contained in b. We can further dissect this possibility checking if the boundaries are equal:
- start a == start b: in this case relate a b = Starts
- end a == end b: in this case relate a b = Finishes
- start a != start b and end a != end b: in this case relate a b = During
a ≰ b and b ≤ a: this is the symmetric case with respect to the previous one. Dissecting the possibilities as we did above we obtain the StartedBy, FinishedBy and Contains cases
a ≰ b and b ≰ a: this means that neither a is contained in b, nor b is contained in a

The last case is still too broad, so we have to analyse it more precisely. We can check the intersection of a and b to split this further:

a `intersection` b != empty: this means that the two intervals have some value in common, hence they overlap. Depending on the ordering of start a and start b, we would get either relate a b = Overlaps or relate a b = OverlappedBy
a `intersection` b == empty: no value is in both intervals. At this point we inspect better the union of a and b:
- a `union` b is two separate intervals: in this case the two intervals are really disjoint, meaning that there exists a value which is bigger that one and smaller than the other. Depending on the order of start a and start b we will get either relate a b = Before or relate a b = After
- a `union` b is one single interval: the two intervals are disjoint, but there is not value between them, so they must be one right after the other. Depending on the order of start a and start b we will get either relate a b = Meets or relate a b = IsMet

This gives a precise algorithm to determine the relation occurring between any two intervals. We can see now that we have a precise way to distinguish what happens in cases where two intervals have a boundary with a common value

relate (boundsInterval (Boundary 0 Included) (Boundary 1 Excluded))
       (boundsInterval (Boundary 1 Excluded) (Boundary 2 Included))
  = Before -- not Meets

relate (boundsInterval (Boundary 0 Included) (Boundary 1 Included))
       (boundsInterval (Boundary 1 Excluded) (Boundary 2 Included))
  = Meets

relate (boundsInterval (Boundary 0 Included) (Boundary 1 Included))
       (boundsInterval (Boundary 1 Included) (Boundary 2 Included))
  = Overlaps -- not Meets

It disambiguates also the issues with singleton intervals

relate (boundsInterval (Boundary 0 Included) (Boundary 0 Included))
       (boundsInterval (Boundary 0 Included) (Boundary 0 Included))
  = Equals

relate (boundsInterval (Boundary 0 Included) (Boundary 0 Included))
       (boundsInterval (Boundary 0 Included) (Boundary 2 Excluded))
  = Starts

relate (boundsInterval (Boundary 1 Included) (Boundary 1 Included))
       (boundsInterval (Boundary 0 Included) (Boundary 2 Included))
  = During

CONCLUSION

We took a look at intervals and how they can relate with each other. We noticed how a naive model of the domain leads to subtle issues and ambiguities. With a better model, distinguishing between included and excluded interval boundaries, and using their algebraic operations on intervals as much as possible, we were able to disambiguate the problematic cases and recover a clear view of the situation.

In general I believe that trying to model a domain using mathematical abstractions is a very productive choice. It usually leads to more general and elegant solutions because it forces us to understand our domain to isolate the relevant aspects and forget about the noise given by context-over-specific information.

If you want to check out a working implementation of what we discussed above, you can find it in my availer project.