Either why or how

Either is one of the simplest data structures you can use to make your code safer and more composable. It is not particularly complex, but it can still be hard to grasp, especially for programmers used to a procedural or object-oriented approach.

In this post I’ll try to explain the benefits of using the Either data structure, from both a theoretical and a practical point of view. I’ll provide examples in Haskell and in PHP with Psalm, because these are the languages I know, but the same ideas can easily be adapted to any other programming language.

This post is basically a transcript of an explanation I gave about Either to my colleagues at Soisy, where we are starting to extensively use Either in our PHP codebase. I’d like to thank them all for the support and the precious feedback.

Let’s start from the why

The main reason to introduce Either, like many other data structures, in a codebase comes from the desire to make the code satisfy referential transparency, which brings a lot of goodies:

the code is simpler (not necessarily easier), because there are fewer moving parts and fewer communication channels
the code is easier to reason about, because it has clear semantics
the code is easier to test, because we can just check that the output is the expected one for a given input
the code is easier to refactor
the code is safer, because fewer things can happen under the hood
the code is more composable
sometimes the code can be made more performant, since we can cache basically any step of our computations

Unfortunately, some things, like errors, state management and interaction with the outside world, tend to break referential transparency, at the cost of losing all the above benefits.

Luckily, we can have our cake and eat it, too! The trick is to represent effects like errors, state and IO inside the language, bringing back referential transparency.

A motivating example

If we think about operations which can fail, like validations and parsing, we can see that they have an input, which is the datum we want to validate or parse, and an output, which can be either a success, containing the desired result, or a failure, containing some information about what failed.

It comes quite naturally to model this with a function which takes an input and returns either a success or a failure.

-- haskell
parse :: input -> Either failure success

// php
/**
 * @template Input
 * @template Failure
 * @template Success
 */
interface Parser
{
  /**
   * @param Input $input
   * @return Either<Failure, Success>
   */
  public function parse($input): Either;
}

Once we are able to represent in our language the concept that something is either this or that, we can completely describe a parsing operation, regaining our dear referential transparency.
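To make the signature above more concrete, here is a minimal Haskell sketch of a parser for a person's age. The names `parseAge` and `AgeError`, and the accepted range, are hypothetical and only serve as an illustration.

```haskell
import Text.Read (readMaybe)

-- Hypothetical failure type, introduced only for this example.
data AgeError
  = NotANumber String
  | OutOfRange Int
  deriving (Eq, Show)

-- A parser in the sense described above: the input is a String,
-- the failure is an AgeError and the success is an Int.
parseAge :: String -> Either AgeError Int
parseAge raw =
  case readMaybe raw of
    Nothing -> Left (NotANumber raw)
    Just n
      | n < 0 || n > 150 -> Left (OutOfRange n)
      | otherwise        -> Right n
```

The same input always produces the same Either value, and the failure is part of the returned value instead of travelling through a separate channel such as an exception.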

Modelling `Either`

Our task now is to model an Either data structure. To do this, we need to consider the requirements that such a structure needs to satisfy.

It might not be completely straightforward, but the requirements which characterize the concept of something which could be Either a or b are the following:

we need a way to construct Either a b starting from an a
we need a way to construct Either a b starting from a b
we need to do the above in the best possible way, i.e. as soon as we have a data type t which can be constructed from an a and can be constructed from a b, then we need a way to construct t from Either a b

We can represent them with a diagram

If you want to know more about why these requirements make sense, take a look at coproducts.

Another thing which could help with the intuition of these requirements is to think about them in terms of logic, interpreting a, b and t as propositions and functions as implications (as suggested by the Curry-Howard correspondence). Then the requirements we listed above for Either become the introduction and elimination rules for the logical disjunction (or) operator.

Translating into code

Translating the above requirements into code is fairly straightforward.

Haskell

Let’s try first in Haskell, where it’s easier.

We need to define a type Either a b which can be constructed either from an a or from a b. Hence we define a datatype with two constructors Left, which accepts an a, and Right, which accepts a b

data Either a b
  = Left a
  | Right b

This satisfies our first two requirements. For the third one we need a function which receives a function a -> t and a function b -> t and returns a function Either a b -> t. This is exactly what the either function does

either :: (a -> t) -> (b -> t) -> Either a b -> t
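One possible implementation of `either` is a plain pattern match on the two constructors. The sketch below hides the Prelude versions so that it compiles on its own; `describe` is a hypothetical consumer added only to show the function in use.

```haskell
import Prelude hiding (Either (..), either)

data Either a b
  = Left a
  | Right b

-- Third requirement: from a way to consume an a and a way to consume
-- a b, we obtain a way to consume an Either a b.
either :: (a -> t) -> (b -> t) -> Either a b -> t
either ifLeft _ (Left a) = ifLeft a
either _ ifRight (Right b) = ifRight b

-- Hypothetical consumer: turn both branches into a String.
describe :: Either Int Bool -> String
describe = either (\n -> "number " ++ show n) (\b -> "flag " ++ show b)
```

Here `describe (Left 3)` evaluates to `"number 3"` and `describe (Right True)` to `"flag True"`.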

PHP

Now let’s try in PHP, where the solution is a little more verbose, but not any more complicated. First we need the two constructors

/**
 * @template A
 * @template B
 */
class Either
{
  /**
   * @template C
   * @template D
   * @param C $value
   * @return self<C, D>
   */
  public static function left($value): self {...}

  /**
   * @template C
   * @template D
   * @param D $value
   * @return self<C, D>
   */
  public static function right($value): self {...}
}

For the third requirement, we need to define a method which receives as input two callables, one for consuming each of the branches, and uses them to process the Either itself and compute the result

/**
 * @template A
 * @template B
 */
class Either
{
  /**
   * @template T
   * @param callable(A): T $ifLeft
   * @param callable(B): T $ifRight
   * @return T
   */
  public function eval(callable $ifLeft, callable $ifRight) {...}
}

For a more complete implementation of Either in PHP, take a look at this or this.

What can we do with this?

Now it’s time to see some examples and some concrete benefits of such an approach. Let’s see some of the things which come quite easy and natural once we have the Either data structure in place.

I’ll try to provide some concrete examples all in the context of parsing/validation.

Forget (temporarily) that we are working with `Either`

Suppose you were able to parse a User from some raw data, obtaining an Either Error User. Your User datatype/class also exposes a function/method to retrieve her date of birth.

-- haskell
birthDate :: User -> Date

// php
class User
{
  public function birthDate(): DateTimeImmutable {...}
}

The function birthDate does not know anything about Either, so how can we use it when we only have an Either Error User? Luckily, Either allows us to transform a function which knows nothing about it into a function which works nicely with it. This is done using the so-called map combinator

-- haskell
fmap birthDate :: Either Error User -> Either Error Date

/**
 * @template A
 * @template B
 */
class Either
{
  /**
   * @template C
   * @param callable(B): C $f
   * @return self<A, C>
   */
  public function map(callable $f): self {...}
}

/** @var Either<Error, User> $eitherUser */
$eitherUser;

/** @var Either<Error, DateTimeImmutable> $eitherBirthDate */
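// birthDate is an instance method, so we wrap the call in a closure
$eitherBirthDate = $eitherUser->map(
  function (User $user): DateTimeImmutable { return $user->birthDate(); }
);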
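To see why such a combinator can be defined at all, note that it only needs to act on the Right branch and pass any Left through untouched. Here is a minimal Haskell sketch, where `mapRight` is a hypothetical name for what `fmap` does on Either, and the `User` record with its sample values is invented for the example.

```haskell
-- Hypothetical domain type for this example only.
data User = User
  { userName  :: String
  , birthYear :: Int
  }

-- What fmap does for Either: apply f on the Right branch,
-- pass a Left through unchanged.
mapRight :: (b -> c) -> Either a b -> Either a c
mapRight _ (Left a)  = Left a
mapRight f (Right b) = Right (f b)

parsedUser :: Either String User
parsedUser = Right (User "Ada" 1815)

failedUser :: Either String User
failedUser = Left "missing name"
```

With these definitions, `mapRight birthYear parsedUser` evaluates to `Right 1815`, while `mapRight birthYear failedUser` stays `Left "missing name"`.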

Join several results

Suppose you have a User datatype/class which can be constructed just from a name and a surname

-- haskell
data User = User Name Surname

// php
class User
{
  public static function user(Name $name, Surname $surname): self {...}
}

If you want to build a user, you need to first build a Name and a Surname. Suppose we already have a way to parse strings to Name and Surname

-- haskell
parseName :: String -> Either Error Name
parseSurname :: String -> Either Error Surname

// php
class Name
{
  /**
   * @return Either<Error, self>
   */
  public static function parse(string $s): Either {...}
}

class Surname
{
  /**
   * @return Either<Error, self>
   */
  public static function parse(string $s): Either {...}
}

What we would like to do now is pass the results of the parsing of Name and Surname to the constructor of User but, as in the previous paragraph, our arguments are wrapped in Either Error and therefore we can’t simply pass them to the constructor.

We’re still in luck, since Either allows us to lift a function of any arity (i.e. with any number of arguments) to its context

-- haskell
User <$> parseName "Marco" <*> parseSurname "Perone" :: Either Error User

// php
/**
 * @template A
 * @template B
 */
class Either
{
  public static function liftA(callable $f, self ...$args): self {...}
}

/** @var Either<Error, User> $eitherUser */
$eitherUser = Either::liftA(
  [User::class, 'user'],
  Name::parse('Marco'),
  Surname::parse('Perone')
);

The difference in the implementation between Haskell and PHP is due to the fact that in Haskell every function is curried by default, which makes it easier to use simpler operators.
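As a rough illustration of what happens behind `<$>` and `<*>`, the two-argument case can be written by hand. `lift2` is a hypothetical name (the standard library offers the same operation as `liftA2`), and the parsers below are simplified stand-ins that use String as the error type.

```haskell
-- Hand-written lifting of a two-argument function into Either e.
-- The first Left encountered, reading left to right, is returned.
lift2 :: (a -> b -> c) -> Either e a -> Either e b -> Either e c
lift2 _ (Left e) _          = Left e
lift2 _ _ (Left e)          = Left e
lift2 f (Right a) (Right b) = Right (f a b)

-- Hypothetical stand-ins for the Name and Surname parsers.
newtype Name = Name String deriving (Eq, Show)
newtype Surname = Surname String deriving (Eq, Show)
data User = User Name Surname deriving (Eq, Show)

parseName :: String -> Either String Name
parseName s
  | null s    = Left "empty name"
  | otherwise = Right (Name s)

parseSurname :: String -> Either String Surname
parseSurname s
  | null s    = Left "empty surname"
  | otherwise = Right (Surname s)
```

Here `lift2 User (parseName "Marco") (parseSurname "Perone")` gives `Right (User (Name "Marco") (Surname "Perone"))`, and it agrees with the operator version `User <$> parseName "Marco" <*> parseSurname "Perone"`.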

What happens to the errors?

The code above looks nice, but there is a subtlety. Think about the case where both the parsing of the name and the parsing of the surname fail. What should the final error be? Should we stop the computation as soon as one piece of the puzzle fails? Or should we collect and report all the errors?

There is no correct answer here; it really depends on the context and on the situation, so we need to make space for both options.

In Haskell, the Applicative instance of Either stops at the first error it finds and reports just that one. If you want to collect all the errors, you should use Validation instead. As a datatype, its definition is no different from that of Either. What changes is the Applicative instance, which allows us to collect errors. Notice that this instance requires a Semigroup constraint on Error, which is what lets us accumulate the errors themselves.
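A minimal sketch of such a Validation type follows. It is written from scratch for illustration and is not copied from any specific library; the validators and the choice of a list of strings as the error type are assumptions made for the example.

```haskell
-- Same shape as Either, with different constructor names.
data Validation e a
  = Failure e
  | Success a
  deriving (Eq, Show)

instance Functor (Validation e) where
  fmap _ (Failure e) = Failure e
  fmap f (Success a) = Success (f a)

-- The Semigroup constraint lets us combine two failures with (<>).
instance Semigroup e => Applicative (Validation e) where
  pure = Success
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _  = Failure e1
  Success _  <*> Failure e2 = Failure e2
  Success f  <*> Success a  = Success (f a)

-- Hypothetical validators collecting errors in a list.
validateName :: String -> Validation [String] String
validateName s
  | null s    = Failure ["empty name"]
  | otherwise = Success s

validateSurname :: String -> Validation [String] String
validateSurname s
  | null s    = Failure ["empty surname"]
  | otherwise = Success s
```

With these definitions, `(,) <$> validateName "" <*> validateSurname ""` evaluates to `Failure ["empty name","empty surname"]`, so both errors are reported.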

In PHP the most reasonable thing to do is to modify the liftA implementation, adding a parameter which specifies how to behave when we have more than one error

// php
/**
 * @template A
 * @template B
 */
class Either
{
  /**
   * @param callable(A, A): A $joinLefts
   */
  public static function liftA(callable $joinLefts, callable $f, self ...$args): self {...}
}

Sequencing operations which may fail

It’s often the case that a parsing operation does not happen in a single step; more often it is composed of several steps, each one depending on the result of the previous one.

Let’s consider the case when we want to parse a User from the body of an HTTP request containing some JSON. We could split the whole operation into three steps:

parse the request’s body into JSON
parse the JSON into some raw UserData
parse the raw UserData into the domain User datatype/object

-- haskell
parseJSON :: String -> Either Error JSON
parseUserData :: JSON -> Either Error UserData
parseUser :: UserData -> Either Error User

class JSON
{
  /**
   * @return Either<Error, self>
   */
  public static function fromString(string $s): Either {...}
}

class UserData
{
  /**
   * @return Either<Error, self>
   */
  public static function fromJSON(JSON $json): Either {...}
}

class User
{
  /**
   * @return Either<Error, self>
   */
  public static function fromUserData(UserData $data): Either {...}
}

We clearly need to run the three steps in sequence, every time passing the result of the previous step to the next one. As always, Either allows us to implement this behaviour in a very declarative manner

-- haskell
parseJSON >=> parseUserData >=> parseUser :: String -> Either Error User

// php
/** @var string $requestBody */
$requestBody;

JSON::fromString($requestBody)
  ->bind([UserData::class, 'fromJSON'])
  ->bind([User::class, 'fromUserData']);

The name bind is the classical one; it might be easier to read it as andThen.

The difference in the implementation here is due to object orientation. The signature of the >=> operator is

(>=>) :: (a -> Either e b) -> (b -> Either e c) -> a -> Either e c

As you can see, >=> has only functions as inputs. This prevents us from implementing it as a concrete method of a class. We can solve this by using a completely equivalent formulation of >=>, which is

bind :: Either e a -> (a -> Either e b) -> Either e b

Its first argument is a concrete datatype and therefore we can implement it as a method of the Either class.

/**
 * @template E
 * @template A
 */
class Either
{
  /**
   * @template B
   * @param callable(A): Either<E, B> $next
   * @return Either<E, B>
   */
  public function bind(callable $next): self {...}
}
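The relationship between the two formulations can be sketched in Haskell. `bindE` and `composeK` are hypothetical names for hand-written versions of bind and `>=>` specialised to Either, and the two steps are invented for the example.

```haskell
-- Hand-written bind for Either: stop at the first Left,
-- otherwise feed the Right value to the next step.
bindE :: Either e a -> (a -> Either e b) -> Either e b
bindE (Left e) _  = Left e
bindE (Right a) k = k a

-- Composition of two fallible steps, defined through bindE.
composeK :: (a -> Either e b) -> (b -> Either e c) -> a -> Either e c
composeK f g a = f a `bindE` g

-- Hypothetical steps for the example.
nonEmpty :: String -> Either String String
nonEmpty s
  | null s    = Left "empty input"
  | otherwise = Right s

firstIsUpper :: String -> Either String Char
firstIsUpper (c : _)
  | c `elem` ['A' .. 'Z'] = Right c
firstIsUpper _ = Left "not capitalised"
```

For instance, `composeK nonEmpty firstIsUpper "Marco"` evaluates to `Right 'M'`, while an empty string fails in the first step with `Left "empty input"` and the second step never runs.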

Conclusion

Either is a data structure which allows describing in a single datatype/class the fact that something could be either one of two things. This turns out to be very helpful also in domain modelling, when a single concept could have different instances.

Either particularly shines when it is used to represent the result of a computation which might fail. This allows us in the first place to gain all the benefits which referential transparency offers. Moreover, it allows the creation of a very expressive API which leads to a very declarative style of programming.

As a data structure, Either is much more widely used in functional languages, but it can be used to great advantage in any language which offers higher-order functions and type variables/generics.

If you haven’t already, I’d encourage you to try using it in your code base. Once you do, you’ll hardly want to go back.