Either why or how
Either is one of the simplest data structures you can use to make your code safer and more composable. It is not particularly complex, but it can still be hard to grasp, especially for programmers used to a procedural or object-oriented approach.
In this post I’ll try to explain the benefits of using the Either data structure, from both a theoretical and a practical point of view. I’ll provide examples in Haskell and PHP with Psalm, because these are the languages I know, but the same ideas are easily adaptable to any other programming language.
This post is basically a transcript of an explanation I gave about Either to my colleagues at Soisy, where we are starting to use Either extensively in our PHP codebase. I’d like to thank them all for their support and precious feedback.
Let’s start from the why
The main reason to introduce Either, like many other data structures, into a codebase is the desire to make the code referentially transparent, which brings a lot of goodies:
- the code is simpler (not necessarily easier), because there are fewer moving parts and fewer communication channels
- the code is easier to reason about, because it has clear semantics
- the code is easier to test, because we can just check the output is the expected one given a certain input
- the code is easier to refactor
- the code is safer, because fewer things can happen under the hood
- the code is more composable
- sometimes the code can be made more performant, since we can cache basically any step of our computations
Unfortunately, some things, like errors, state management and interaction with the outside world, tend to break referential transparency, at the cost of losing all the above benefits.
Luckily, we can have our cake and eat it, too! The trick is to represent effects like errors, state and IO as ordinary values inside the language, bringing back referential transparency.
A motivating example
If we think about operations which can fail, like validations and parsing, we can see that they have an input, which is the datum we want to validate or parse, and an output, which can be either a success, containing the desired result, or a failure, containing some information about what failed.
It feels quite natural to model this with a function which takes an input and returns either a success or a failure.
-- haskell
parse :: input -> Either failure success
// php
/**
* @template Input
* @template Failure
* @template Success
*/
interface Parser
{
/**
* @param Input $input
* @return Either<Failure, Success>
*/
public function parse($input): Either;
}
Once we are able to represent the concept that something is either this or that in our language, we can completely describe a parsing operation in its type, regaining our dear referential transparency.
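To make the signature above a bit more tangible, here is a minimal Haskell sketch of one such parser. The names parseAge and AgeError and the 0 to 150 range are my own assumptions for illustration, they do not come from any real codebase.

```haskell
import Text.Read (readMaybe)

-- Hypothetical failure type, introduced only for this sketch.
data AgeError
  = NotANumber String
  | OutOfRange Int
  deriving (Eq, Show)

-- Every possible outcome, success or failure, is visible in the return type,
-- so the same input always produces the same value.
parseAge :: String -> Either AgeError Int
parseAge raw =
  case readMaybe raw of
    Nothing -> Left (NotANumber raw)
    Just n
      | n < 0 || n > 150 -> Left (OutOfRange n)
      | otherwise        -> Right n
```

Nothing is thrown and nothing happens behind the scenes: a caller can only get at the Int by dealing with the failure case too.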
Modelling Either
Our task now is to model an Either data structure. To do this, we need to consider the requirements that such a structure must satisfy.
It might not be completely straightforward, but the requirements which characterize the concept of something which could be Either a or b are the following:
- we need a way to construct Either a b starting from an a
- we need a way to construct Either a b starting from a b
- we need to do the above in the best possible way, i.e. as soon as we have a data type t which can be constructed from an a and can be constructed from a b, then we need a way to construct t from Either a b
We can represent them with a diagram
If you want to know more about why these requirements make sense, take a look at coproducts.
Another thing which could help with the intuition behind these requirements is to think about them in terms of logic, interpreting a, b and t as propositions and functions as implications (as suggested by the Curry-Howard correspondence). Then the requirements we listed above for Either become the introduction and elimination rules for the or (disjunction) operator.
Translating into code
Translating the above requirements into code is fairly straightforward.
Haskell
Let’s try first in Haskell, where it’s easier.
We need to define a type Either a b which can be constructed either from an a or from a b. Hence we define a datatype with two constructors: Left, which accepts an a, and Right, which accepts a b
data Either a b
  = Left a
  | Right b
This satisfies our first two requirements. For the third one we need a function which receives a function a -> t and a function b -> t and returns a function Either a b -> t. This is exactly what the either function does
either :: (a -> t) -> (b -> t) -> Either a b -> t
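The signature alone does not show how little there is to implement. As a sketch, here is a hand-written version working on the Either from the Prelude; I called it eitherSketch only to avoid clashing with the real either, and describe is a hypothetical usage.

```haskell
-- Pattern match on the constructor and apply the matching function.
eitherSketch :: (a -> t) -> (b -> t) -> Either a b -> t
eitherSketch ifLeft _ (Left a)   = ifLeft a
eitherSketch _ ifRight (Right b) = ifRight b

-- Hypothetical usage: collapse both branches into a message.
describe :: Either String Int -> String
describe = eitherSketch ("failed: " ++) (\n -> "parsed " ++ show n)
```

With these definitions, describe (Left "boom") evaluates to "failed: boom" and describe (Right 3) to "parsed 3".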
PHP
Now let’s try in PHP, where the solution is a little more verbose, but not any more complicated. First we need the two constructors
/**
* @template A
* @template B
*/
class Either
{
/**
* @template C
* @template D
* @param C $value
* @return self<C, D>
*/
public static function left($value): self {...}
/**
* @template C
* @template D
* @param D $value
* @return self<C, D>
*/
public static function right($value): self {...}
}
For the third requirement, we need to define a method which receives two callables as input, one for each branch, and uses them to process the Either itself and compute the result
/**
* @template A
* @template B
*/
class Either
{
/**
* @template T
* @param callable(A): T $ifLeft
* @param callable(B): T $ifRight
* @return T
*/
public function eval(callable $ifLeft, callable $ifRight) {...}
}
For a more complete implementation of Either in PHP, take a look at this or this.
What can we do with this?
Now it’s time to see some examples and some concrete benefits of such an approach. Let’s see some of the things which come quite easily and naturally once we have the Either data structure in place.
I’ll try to provide some concrete examples all in the context of parsing/validation.
Forget (temporarily) that we are working with Either
Suppose you were able to parse a User from some raw data, obtaining an Either Error User. Your User datatype/class also exposes a function/method to retrieve the user’s date of birth.
-- haskell
birthDate :: User -> Date
// php
class User
{
public function birthDate(): DateTimeImmutable {...}
}
The function birthDate does not know anything about Either, so how can we use it when we only have an Either Error User? Luckily, Either allows us to transform a function which knows nothing about it into a function which works nicely with it. This is done using the so-called map combinator
-- haskell
fmap birthDate :: Either Error User -> Either Error Date
// php
/**
 * @template A
 * @template B
 */
class Either
{
    /**
     * @template C
     * @param callable(B): C $f
     * @return self<A, C>
     */
    public function map(callable $f): self {...}
}

/** @var Either<Error, User> $eitherUser */
$eitherUser;

// birthDate is an instance method, so we wrap the call in a closure
/** @var Either<Error, DateTimeImmutable> $eitherBirthDate */
$eitherBirthDate = $eitherUser->map(
    function (User $user): DateTimeImmutable {
        return $user->birthDate();
    }
);
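To see what map does on each branch, here is a small self-contained Haskell sketch. The User, Date and Error definitions are hypothetical stand-ins I made up so that the example compiles on its own.

```haskell
-- Hypothetical stand-ins for the types used in this section.
newtype Date = Date String deriving (Eq, Show)
newtype Error = Error String deriving (Eq, Show)

data User = User
  { userName  :: String
  , birthDate :: Date
  } deriving (Eq, Show)

-- fmap applies birthDate only inside a Right;
-- a Left is passed along untouched.
birthDateOf :: Either Error User -> Either Error Date
birthDateOf = fmap birthDate
```

So a Right containing a user becomes a Right containing her date of birth, while a Left (Error "...") comes out exactly as it went in. The function birthDate itself never has to know that a failure was possible.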
Join several results
Suppose you have a User datatype/class which can be constructed just from a name and a surname
-- haskell
data User = User Name Surname
// php
class User
{
public static function user(Name $name, Surname $surname): self {...}
}
If you want to build a user, you first need to build a Name and a Surname. Suppose we already have a way to parse strings into Name and Surname
-- haskell
parseName :: String -> Either Error Name
parseSurname :: String -> Either Error Surname
// php
class Name
{
/**
* @return Either<Error, self>
*/
public static function parse(string $s): Either {...}
}
class Surname
{
/**
* @return Either<Error, self>
*/
public static function parse(string $s): Either {...}
}
What we would like to do now is pass the results of parsing Name and Surname to the constructor of User but, as in the previous paragraph, our arguments are wrapped in Either Error and therefore we can’t simply pass them to the constructor.
We’re still in luck, since Either allows us to lift a function of any arity (i.e. with any number of arguments) to its context
-- haskell
User <$> parseName "Marco" <*> parseSurname "Perone" :: Either Error User
// php
/**
* @template A
* @template B
*/
class Either
{
public static function liftA(callable $f, self ...$args): self {..}
}
/** @var Either<Error, User> $eitherUser */
$eitherUser = Either::liftA(
[User::class, 'user'],
Name::parse('Marco'),
Surname::parse('Perone')
);
The difference in the implementation between Haskell and PHP is due to the fact that in Haskell every function is curried by default, which makes it easier to use simpler operators.
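For comparison, Haskell also offers a function-style version of the same lifting, which is closer in shape to the PHP liftA call. The sketch below assumes a made-up parsing rule (empty strings are rejected) and uses String as the error type, just to have something that compiles.

```haskell
import Control.Applicative (liftA2)

newtype Name = Name String deriving (Eq, Show)
newtype Surname = Surname String deriving (Eq, Show)
data User = User Name Surname deriving (Eq, Show)

-- Hypothetical parsing rule for this sketch: reject empty strings.
parseName :: String -> Either String Name
parseName s
  | null s    = Left "name is empty"
  | otherwise = Right (Name s)

parseSurname :: String -> Either String Surname
parseSurname s
  | null s    = Left "surname is empty"
  | otherwise = Right (Surname s)

-- Operator style, relying on User being curried.
userWithOperators :: String -> String -> Either String User
userWithOperators n s = User <$> parseName n <*> parseSurname s

-- Function style: liftA2 lifts a two-argument function in one go.
userWithLiftA2 :: String -> String -> Either String User
userWithLiftA2 n s = liftA2 User (parseName n) (parseSurname s)
```

The two functions are interchangeable. liftA2 only covers two arguments though, while the operator style keeps working for any arity by adding more <*> steps.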
What happens to the errors?
The code above looks nice, but there is a subtle thing which is not clear. Try to think about the case when both the parsing of the name and the parsing of the surname failed. What should be the final error? Should we stop the computation as soon as one piece of the puzzle fails? Or should we collect and report all the errors?
There is no single correct answer here; it really depends on the context and on the situation, so we need to make space for both options.
In Haskell, the Applicative instance of Either stops as soon as it finds one error and reports just that one. If you want to collect all the errors, you should use Validation instead. As a datatype, its definition is no different from that of Either. The main thing which changes is the Applicative instance, which allows us to collect errors. Notice that the instance requires a Semigroup constraint on Error, which allows us to accumulate the errors themselves.
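To show where the Semigroup constraint comes into play, here is a sketch of what such a datatype and its instances could look like. This is my own minimal version written for this post, not the code of any particular library; nonEmpty and fullName are hypothetical helpers.

```haskell
-- Same shape as Either, only the constructor names differ.
data Validation e a
  = Failure e
  | Success a
  deriving (Eq, Show)

instance Functor (Validation e) where
  fmap _ (Failure e) = Failure e
  fmap f (Success a) = Success (f a)

-- The Semigroup constraint lets us merge two failures with (<>).
instance Semigroup e => Applicative (Validation e) where
  pure = Success
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _  = Failure e1
  Success _  <*> Failure e2 = Failure e2
  Success f  <*> Success a  = Success (f a)

-- Hypothetical check, collecting error messages in a list.
nonEmpty :: String -> String -> Validation [String] String
nonEmpty label s
  | null s    = Failure [label ++ " is empty"]
  | otherwise = Success s

fullName :: String -> String -> Validation [String] (String, String)
fullName n s = (,) <$> nonEmpty "name" n <*> nonEmpty "surname" s
```

With these definitions, fullName "" "" evaluates to Failure ["name is empty","surname is empty"], so both problems are reported at once. The only line that differs in spirit from Either is the first clause of <*>, where the two errors are combined instead of keeping just the first one.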
In PHP the most reasonable thing to do is to modify the liftA implementation, adding a parameter which specifies how to behave when we have more than one error
// php
/**
* @template A
* @template B
*/
class Either
{
    /**
     * @param callable(A, A): A $joinLefts
     */
    public static function liftA(callable $joinLefts, callable $f, self ...$args): self {...}
}
Sequencing operations which may fail
It’s often the case that a parsing operation does not happen in a single step; more often it is composed of several steps, each one depending on the result of the previous one.
Let’s consider the case when we want to parse a User from the body of an HTTP request containing some JSON. We could split the whole operation into three steps:
- parse the request’s body into JSON
- parse the JSON into some raw UserData
- parse the raw UserData into the domain User datatype/object
-- haskell
parseJSON :: String -> Either Error JSON
parseUserData :: JSON -> Either Error UserData
parseUser :: UserData -> Either Error User
class JSON
{
/**
* @return Either<Error, self>
*/
public static function fromString(string $s): Either {...}
}
class UserData
{
/**
* @return Either<Error, self>
*/
public static function fromJSON(JSON $json): Either {...}
}
class User
{
/**
* @return Either<Error, self>
*/
public static function fromUserData(UserData $data): Either {...}
}
We clearly need to run the three steps in sequence, every time passing the result of the previous step to the next one. As always, Either allows us to implement this behaviour in a very declarative manner
-- haskell
parseJSON >=> parseUserData >=> parseUser :: String -> Either Error User
// php
/** @var string $requestBody */
$requestBody;
JSON::fromString($requestBody)
    ->bind([UserData::class, 'fromJSON'])
    ->bind([User::class, 'fromUserData']);
The name bind is the classical one; it might be easier to read it as andThen.
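If you want to play with the sequencing yourself, here is a runnable Haskell sketch of a three-step pipeline. To keep it self-contained I replaced JSON with a made-up "name,age" string format, so splitFields, readAge and mkUser are hypothetical stand-ins for the three steps above, not real parsers.

```haskell
import Control.Monad ((>=>))
import Text.Read (readMaybe)

type Error = String

data RawUser = RawUser String String deriving (Eq, Show)
data User = User String Int deriving (Eq, Show)

-- Step 1: split the raw input into its two fields.
splitFields :: String -> Either Error RawUser
splitFields s =
  case break (== ',') s of
    (name, ',' : age) -> Right (RawUser name age)
    _                 -> Left "expected name,age"

-- Step 2: turn the age field into a number.
readAge :: RawUser -> Either Error (String, Int)
readAge (RawUser name age) =
  case readMaybe age of
    Just n  -> Right (name, n)
    Nothing -> Left ("age is not a number: " ++ age)

-- Step 3: apply the domain rules.
mkUser :: (String, Int) -> Either Error User
mkUser (name, n)
  | null name = Left "name is empty"
  | n < 0     = Left "age is negative"
  | otherwise = Right (User name n)

-- Each step runs only if the previous one returned a Right.
parseUser :: String -> Either Error User
parseUser = splitFields >=> readAge >=> mkUser
```

Here parseUser "Marco,30" gives Right (User "Marco" 30), while parseUser "Marco,x" stops at the second step with Left "age is not a number: x" and never reaches mkUser.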
The difference in the implementation here is due to object orientation. The signature of the >=> operator is
(>=>) :: (a -> Either e b) -> (b -> Either e c) -> a -> Either e c
As you can see, >=> has only functions as inputs. This prevents us from implementing it as a concrete method of a class. We can solve this by using a completely equivalent formulation of >=>, which is
bind :: Either e a -> (a -> Either e b) -> Either e b
Its first argument is a concrete datatype and therefore we can implement it as a method of the Either class.
// php
/**
 * @template E
 * @template A
 */
class Either
{
    /**
     * @template B
     * @param callable(A): Either<E, B> $next
     * @return Either<E, B>
     */
    public function bind(callable $next): self {...}
}
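The claim that the two formulations are equivalent can be checked directly in Haskell. In the sketch below bind is written by hand for Either (it coincides with the standard >>= operator), while composeK, bindFromK and halve are names I made up for the example.

```haskell
-- bind for Either, written out by pattern matching.
bind :: Either e a -> (a -> Either e b) -> Either e b
bind (Left e)  _    = Left e
bind (Right a) next = next a

-- Kleisli composition recovered from bind; same signature as (>=>).
composeK :: (a -> Either e b) -> (b -> Either e c) -> a -> Either e c
composeK f g = \a -> f a `bind` g

-- And the other way round: bind recovered from Kleisli composition,
-- by treating the Either value as a function which ignores its input.
bindFromK :: Either e a -> (a -> Either e b) -> Either e b
bindFromK m next = composeK (const m) next ()

-- Hypothetical step used to try things out.
halve :: Int -> Either String Int
halve n
  | even n    = Right (n `div` 2)
  | otherwise = Left (show n ++ " is odd")
```

For instance, composeK halve halve 8 gives Right 2, and bindFromK (Right 8) halve gives the same Right 4 as bind (Right 8) halve. Since each one can be defined in terms of the other, choosing bind for the PHP class loses nothing.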
Conclusion
Either is a data structure which allows us to describe, in a single datatype/class, the fact that something could be either one of two things. This also turns out to be very helpful in domain modelling, when a single concept could have different instances.
Either particularly shines when it is used to represent the result of a computation which might fail. In the first place, this allows us to gain all the benefits which referential transparency offers. Moreover, it allows the creation of a very expressive API which leads to a very declarative style of programming.
As a data structure, Either is much more widely used in functional languages, but it could be used to great advantage in any language which offers higher-order functions and type variables/generics.
If you haven’t already, I’d encourage you to try it in your codebase. Once you do, you’ll hardly want to go back.