Benjamin Sago / ogham / cairnrefinery / etc…

Technical notes Check data easily with PostgreSQL domains

If I had to give only one tip to someone designing a PostgreSQL database schema from scratch, it would be this one: create a domain for non-empty strings, use it where appropriate (which will end up being something like 90% of all string columns), and never worry about entire classes of bugs again.

In PostgreSQL, a domain is a type definition that has one or more check constraints attached to it. (It looks like there is no equivalent feature in MySQL or MariaDB.) A check constraint is a simple mechanism for limiting what values may be placed in a table cell, such as “only allow values greater than zero”, or “only allow non-empty strings”. When added to a column, you can even use them to enforce intra-column constraints, such as “if the status is failed, then the failure reason must not be null”.

The biggest downside is not the performance cost of having the database check the values before it inserts them — the cost is negligible — but having to remember to spell out the exact check you need everywhere. Domains alleviate this problem by describing the constraint as part of the column’s type.

My favourite example

The vast majority of all string columns, both varchar and text, across all the databases I’ve ever seen, should never, ever have an empty string inserted into them. If the value can be looked up or referenced as an identifier, or displayed on the screen as content, it’s incredibly rare that the empty string should be considered an allowed value.

Like many such problems in software development, it persists because solving it completely involves getting the checking logic correct everywhere but skipping the check or getting it wrong somehow only once is enough to commit bad data to your database forever.

These two half-lines of code — which is basically one line of code if I hadn’t split it in half — have done me more good than probably any other, because I don’t need to remember to write out the entire check line in my table definition, I can just use a domain.

First, before I define anything else in my schema, I create a domain called content:

create domain content as text
    check (value != '');

Now, I would use this new domain as I would any other type. For example, a table for blog entries should ensure that each post has a non-empty slug, a non-empty title, and (almost certainly) a non-empty body, so I use the content type for these three columns:

create table posts (
    id_post         serial       primary key,
    post_title      content      not null,
    post_slug       content      not null,
    post_body       content      not null,
    published_date  timestamptz  not null

And just like that, if a sneaky piece of data manages to slip through the cracks, you’ll get an actionable database error rather than no error and bad data recorded to disk. A normal insert works as before:

# insert into posts
    values (default, 'My Post', 'my-post', 'This is my post', current_date);

But a post where the title, slug, or body is blank results in an exception being thrown:

# insert into posts
    values (default, '', 'my-post', 'This is my post', current_date);
ERROR:  23514: value for domain content violates check constraint "content_check"
SCHEMA NAME:  public
CONSTRAINT NAME:  content_check
LOCATION:  ExecEvalConstraintCheck, execExprInterp.c:3640

Of course, if this is a user-facing application, then you do still need to implement validation within the application itself. There’s no easy, reliable way to turn a database-level error (such as “value for domain content violates check constraint”) into a user-facing error (such as “post title cannot be empty”), so your application will still have to do its own empty-string checking. The domain promises that if this checking is somehow missed, or skipped, the user will get an incomprehensible error message thrown at them — but this is still better than the alternative!

A sidebar on static typing

For a long time, I never understood why anybody would engineer a system without the safety net of a type system that gets checked statically beforehand (usually with a compilation step). Yes, you have to have a build step, and yes, you have to spell out all the types within your code, but the worst-case scenario when dealing with a database is not that the user sees an error message because you’re accidentally adding a string to a number somewhere, but that you call the wrong function or use the wrong piece of data and the result gets written to the database, forever. (This is even worse when your database itself does not check types, like SQLite does)

My view on this changed when I worked on a project in duck-typed Ruby. Sure, we had NameError and NoMethodError crop up on the regular, but thanks to the PostgreSQL checks, no error was permanently devastating — and it meant we didn’t need to duplicate the type information between the front-end, back-end, and database. I still feel better with my static typing, but the situation really wasn’t as bad as I thought it would be.