order of columns in indexes with PostgreSQL

Wrong index column order causing trouble.

November 24, 2020



Imagine a specific PostgreSQL table called client:

This is an example query that frequently ran, sometimes a couple of times in a second:

SELECT "client"."id", "client"."name", "client"."uuid", "client"."ip_address", "client"."created_at", "client"."expires_at"
  FROM "client" 
  WHERE ("client"."expires_at" >= '2020-11-16T18:39:26.032414'::timestamp
    AND "client"."ip_address" = '127.0.0.1'::inet
    AND "client"."uuid" = '848d4113-759c-4021-8561-e87cff772e2a'::uuid)
  ORDER BY "client"."id" ASC
  LIMIT 1

Initially the following (inefficient) index was used:

CREATE INDEX client_index ON client(ip_address, uuid);

Within a very short time, quite unexpectedly but for some obvious reasons in hindsight (no pruning of obsolete entries etc.),
the number of rows in the table grew into the millions.

The database load increased slightly but not very much.
Still the idea was to optimize the index and include the column 'expires_at' - That one was simply forgotten in the first place.

The new index was created in the following way (This is where the error lies.):

DROP INDEX client_index;
CREATE INDEX client_index ON client(expires_at, ip_address, uuid);

This new index caused the database load on the CPU to hit 100%.
The service stopped working, everything went down, clients sarted to complain.

EXPLAIN revealed:

Limit  (cost=190704.99..190705.00 rows=1 width=45)
   ->  Sort  (cost=190704.99..190705.00 rows=1 width=45)
       Sort Key: id
         ->  Index Scan using client_index on client  (cost=0.56..190704.98 rows=1 width=45)
             Index Cond: ((expires_at >= '2020-11-16 18:39:26.032414'::timestamp without time zone) AND (ip_address = '127.0.0.1'::inet) AND (uuid = '848d4113-759c-4021-8561-e87cff772e2a'::uuid))

The Index Scan looks fine, but not so the Sort.

It turned out the order on the composite index was wrong.

(expires_at, ip_address, uuid)

The correct index looks like this, much more efficient:

DROP INDEX client_index;
CREATE INDEX client_index ON client(ip_address, uuid, expires_at);

The first column is the primary sort criteria. This should, in optimal cases, be the column with the most unique values.
In this case, in addition, it's also best to have columns that are compared by equality at first
when it comes to column order in the index. In the WHERE clause expires_at is used as a range (>=). - This should not be the primary index criteria.


As always there are several angles to look at problems.
The important thing is to know your data and how it is used/accessed/queried.