Benjamin Sago / ogham / cairnrefinery / etc…

Technical notes Check nested relations with PostgreSQL triggers

I’ve previously looked at using domains and check constraints to quickly and easily validate data. In that article, I end with a comment on how if you move validation into the database engine itself, you can keep the code of your application itself simple.

If you continue down this path, the next thing you’ll want to start validating is not individual table rows and the values in the cells, but whether a row is valid given the table or schema as a whole.

For example, if you define Table A which references Table B, you can use foreign keys to make sure that a cell in Table A references an existing row in Table B, and you can use checks and domains to make sure that the values in Table A are within allowed ranges. But you can use neither foreign keys nor checks nor domains to ensure that a value inside a row of Table A is valid depending on one of the other values of Table B. (If that’s too abstract, there’ll be a better example later.)

For this, you’ll need to run some code at insert-time.

Consistency triggers

The best way I have found to do this in PostgreSQL is to create a trigger — a function that gets executed when an event occurs — and have it fire when a table is inserted into or updated.

Functions are written not in SQL but in PL/pgSQL, PostgreSQL’s built-in procedural programming language. PL/pgSQL is nobody’s favourite programming language, but fortunately we don't need to use too much of it in order to program the database with the logic we need, and it integrates very easily with traditional SQL statements.

Here’s the skeleton for a consistency trigger, where we define a function and associate it with insert and update events for a table:

create function <check_new_whatevers>() returns trigger as $$
begin
    if <condition> then
        raise exception 'Something wrong with %', new.<field>;
    end if;
    return new;
end;
$$ language plpgsql;

create trigger <whatevers_trigger>
    before insert or update on <file_executables>
    for each row execute procedure <check_new_whatevers>();

Believe it or not, the text between the $$ delimiters is a string. PostgreSQL allows many different string delimiters, including using dollars as quotes. The function’s body is read in as a string and parsed separately from the surrounding DDL.

Anyway, this works by defining a function that does something with a value called new, which is dynamically given the type of the table row that’s being inserted into and the values of each of the columns. We are then free to inspect this data and make queries on data in other tables, and then raise an exception if the data is not up to scratch, which aborts the transaction and rolls back the data that would have been inserted or updated; or, we can simply return new, which will insert the data as per usual.

A concrete example

The example I gave above was described in abstract terms, so let’s look at something more real-world. Last time, I used a table describing posts on a blog as the example, so I’m going to expand on that.

Let’s say that a blog has many posts, and that each post has an author, and that each author has a featured post:

create table authors (
    id_author    serial   primary key,
    author_name  content  not null
);

create table posts (
    id_post      serial   primary key,
    post_title   content  not null,
    public       boolean  not null,
    id_author    int      not null  references authors
);

create table featured_posts (
    id_author    int      not null  references authors,
    id_post      int      not null  references posts
);

create unique index on featured_posts (id_author);

There are two things we want to check here: that a post can only be featured if it is public, and that the featured post associated with an author should have that same author as the one that it’s linked to. This is what such a checking trigger would look like:

create function check_featured_posts() returns trigger as $$
declare
    res_public boolean;
    res_author int;
begin
    select into res_public, res_author
        p.public, p.id_author
    from posts p
    where p.id_post = new.id_post;

    if not res_public then
        raise exception 'Author %’s featured post % must be public!',
            new.id_author, new.id_post;
    end if;

    if res_author != new.id_author then
        raise exception 'Author %’s featured post % has a different author, %!',
            new.id_author, new.id_post, res_author;
    end if;

    return new;
end;
$$ language plpgsql;

create trigger check_featured_posts
    before insert or update on featured_posts
    for each row execute procedure check_featured_posts();

Inside the PL/pgSQL function, we use select into (rather than just select) to look up the post in the posts table based on the post ID in new, which holds the values being inserted. We put the data we need — the public flag and the author ID — into two local variables, which need to be declared in a declare block outside the function body, and then compare them against our expected values, throwing exceptions if they differ.

Oh, before we can try this trigger out, we just need to insert some test data:

insert into authors
    values (0, 'Ben'),
           (1, 'Evil Ben');

insert into posts
    values (0, 'How to use PostgreSQL triggers', true,  0),
           (1, 'How to delete all your files',   false, 1);

With all this in place, we get an error if we try to insert a featured post that’s not public:

# insert into featured_posts values (1, 1);
ERROR:  P0001: Author 1’s featured post 1 must be public!
CONTEXT:  PL/pgSQL function check_featured_posts() line 12 at RAISE
LOCATION:  exec_stmt_raise, pl_exec.c:3868

We also get an error if we try to give an author a featured post that has the ID of a different author:

# insert into featured_posts values (1, 0);
ERROR:  P0001: Author 1’s featured post 0 has a different author, 0!
CONTEXT:  PL/pgSQL function check_featured_posts() line 17 at RAISE
LOCATION:  exec_stmt_raise, pl_exec.c:3868

And, just to check that public, author-matched posts are unaffected:

# insert into featured_posts values (0, 0);
INSERT 0 1

The error strings we get from PostgreSQL leave a little to be desired — I’m not sure why they think that the file and line number of the C source code that handles this particular error case is worthy of inclusion — but, as before, these errors should not be seen by users to begin with.