order of columns in indexes with PostgreSQL
Wrong index column order causing trouble.
November 24, 2020
Imagine a specific PostgreSQL table called client
:
This is an example query that frequently ran, sometimes a couple of times in a second:
SELECT "client"."id", "client"."name", "client"."uuid", "client"."ip_address", "client"."created_at", "client"."expires_at"
FROM "client"
WHERE ("client"."expires_at" >= '2020-11-16T18:39:26.032414'::timestamp
AND "client"."ip_address" = '127.0.0.1'::inet
AND "client"."uuid" = '848d4113-759c-4021-8561-e87cff772e2a'::uuid)
ORDER BY "client"."id" ASC
LIMIT 1
Initially the following (inefficient) index was used:
CREATE INDEX client_index ON client(ip_address, uuid);
Within a very short time, quite unexpectedly but for some obvious reasons in hindsight (no pruning of obsolete entries etc.),
the number of rows in the table grew into the millions.
The database load increased slightly but not very much.
Still the idea was to optimize the index and include the column 'expires_at' - That one was simply forgotten in the first place.
The new index was created in the following way (This is where the error lies.):
DROP INDEX client_index;
CREATE INDEX client_index ON client(expires_at, ip_address, uuid);
This new index caused the database load on the CPU to hit 100%.
The service stopped working, everything went down, clients sarted to complain.
EXPLAIN
revealed:
Limit (cost=190704.99..190705.00 rows=1 width=45)
-> Sort (cost=190704.99..190705.00 rows=1 width=45)
Sort Key: id
-> Index Scan using client_index on client (cost=0.56..190704.98 rows=1 width=45)
Index Cond: ((expires_at >= '2020-11-16 18:39:26.032414'::timestamp without time zone) AND (ip_address = '127.0.0.1'::inet) AND (uuid = '848d4113-759c-4021-8561-e87cff772e2a'::uuid))
The Index Scan
looks fine, but not so the Sort
.
It turned out the order on the composite index was wrong.
(expires_at, ip_address, uuid)
The correct index looks like this, much more efficient:
DROP INDEX client_index;
CREATE INDEX client_index ON client(ip_address, uuid, expires_at);
The first column is the primary sort criteria. This should, in optimal cases, be the column with the most unique values.
In this case, in addition, it's also best to have columns that are compared by equality at first
when it comes to column order in the index. In the WHERE
clause expires_at
is used as a range (>=
). - This should not be the primary index criteria.
As always there are several angles to look at problems.
The important thing is to know your data and how it is used/accessed/queried.