Published: 2024-07-29

Major PostgreSQL Upgrade Without Downtime

If you want to upgrade PostgreSQL from one major version to another, you cannot do it with standard physical replication, since that only works between identical major versions. You can do it offline with the pg_upgrade command, but that requires shutting the engine down completely, and depending on the data volume it can take hours, which is clearly not optimal in terms of service availability.

Note: I recommend using Patroni if you are running PostgreSQL with replication; it is the best open-source tool currently available for managing PostgreSQL clusters.

Here, we will see how to perform major PostgreSQL upgrades without downtime (though with a few small subtleties).

Prerequisites

  • A source PostgreSQL cluster/instance
  • A target PostgreSQL cluster/instance running the destination version
  • No schema modifications during the migration
  • The max_logical_replication_workers setting sized appropriately if you have many databases (plan for 2 workers per database)
  • The wal_level configuration set to logical on both clusters (note that a restart is required, not just a reload)

You can check the live configuration on a running instance with the following command:

sudo su - postgres -c 'psql postgres -c "show wal_level;"'
 wal_level
-----------
 logical
(1 row)
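
Similarly, you can check max_logical_replication_workers the same way (the value shown here is just the one used in the Patroni example below):

sudo su - postgres -c 'psql postgres -c "show max_logical_replication_workers;"'
 max_logical_replication_workers
---------------------------------
 8
(1 row)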

If you are using Patroni, you need to edit the configuration (patronictl -c /etc/patroni/config.yml edit-config) and restart each instance:

postgresql:
  parameters:
    ...
    wal_level: logical
    max_logical_replication_workers: 8
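
Since wal_level needs a restart and not just a reload, you can then restart the members through Patroni, for example (the cluster name is a placeholder for your own):

patronictl -c /etc/patroni/config.yml restart <cluster_name>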

Migration

Setting up logical replication

To set up logical replication, we need to replicate our schema (without the data) to the target cluster:

export PGPASSWORD="<redacted>"
pg_dumpall -h <legacy_pg_primary> -U postgres --schema-only | psql -h <new_pg> -U postgres
unset PGPASSWORD

If you are using this migration to change topology, you can replicate just one source database, for example:

export PGPASSWORD="<redacted>"
pg_dump -h <legacy_pg_primary> -U postgres --schema-only <database_name> | psql -h <new_pg> -U postgres <new_database_name>
unset PGPASSWORD

Now we just need to replicate the data, using PostgreSQL’s logical replication publish/subscribe (pub/sub) model.

On the old cluster, create a publication for all tables:

CREATE PUBLICATION zero_downtime_upgrade FOR ALL TABLES;
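
As a quick sanity check, you can list the tables covered by the publication:

SELECT * FROM pg_publication_tables WHERE pubname = 'zero_downtime_upgrade';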

Now, on the new cluster, subscribe to it, pointing the connection string at the old primary:

CREATE SUBSCRIPTION my_subscription CONNECTION 'dbname=<database_name> host=<legacy_pg_primary> user=postgres password=<redacted>' PUBLICATION zero_downtime_upgrade;

From now on, the target PostgreSQL will copy and then stream the data of all the tables in the publication. If the new PostgreSQL is version >= 16, you can follow the progress of the initial synchronization, table by table, in the logs.
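
Whatever the version, you can also query the catalog on the new cluster to see the state of each table in the subscription (i = initializing, d = copying data, s = synchronized, r = ready):

SELECT srrelid::regclass AS table, srsubstate FROM pg_subscription_rel;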

Logical replication generates a lot of WAL and therefore uses a lot of disk space. If you archive these WALs, archiving may not keep up with the rate at which they are produced, which could fill up the disk. What you can do is clean some of them up every minute during the initial synchronization; archiving does not matter much at this point in the process, it only becomes important once the initial synchronization is finished.

You can run the following loop in a tmux session to keep only the last 50 WAL segments on the file system:

cd /var/lib/postgresql/16/main/data   # the data directory of the new cluster
while true; do
    # keep the 50 most recent WAL segments, remove everything older
    pg_archivecleanup pg_wal "$(ls pg_wal/ | grep -E '^[0-9A-F]{24}$' | tail -n 50 | head -n 1)"
    sleep 60
done

If your target is a cluster rather than an isolated instance, validate beforehand that its replicas keep up with replication in real time and do not fall behind, which can happen if their CPU/disk performance is lower than the primary’s.
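
For example, on the new primary, a rough check (assuming standard streaming replication to the replicas) shows how far each one lags in bytes:

SELECT application_name, pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes FROM pg_stat_replication;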

You can check the progress of the replication by reading the offsets on both clusters:

  • On the old cluster, in the publications and replication slots:
select slot_name,confirmed_flush_lsn from pg_replication_slots;
select application_name,replay_lsn from pg_stat_replication;
  • On the new cluster, in the subscriptions:
select subname,received_lsn from pg_catalog.pg_stat_subscription;

Switching to the new cluster

Once the initial synchronization is complete, data keeps flowing from the old to the new cluster in real time. You can switch to the new cluster by changing your URLs/credentials whenever you want, but consider the following points:

  • If the database performs inserts, sequences advance over time. If your schema contains SERIAL or BIGSERIAL columns, the underlying sequences are not replicated by the pub/sub mechanism. This will cause insertion issues on your target cluster because its sequences still start from their initial values.

On Stack Overflow (source), someone cleverly suggests the following script to generate the queries to run on the new cluster to reset its sequences to the correct values:

#!/bin/bash
# Generate SETVAL statements from the current sequence values.
# Run it against the old cluster and apply the output on the new one.
port=$1
db=$2

query="select schemaname as schema,
       sequencename as sequence,
       start_value,
       last_value
from pg_sequences order by sequencename asc;
"

while read schema sequence start_value last_value
do
        # a sequence that was never used has a NULL last_value: fall back to start_value
        if [ -z "$last_value" ]
        then
                echo "SELECT pg_catalog.SETVAL('${schema}.\"${sequence}\"', $start_value, true);"
        else
                echo "SELECT pg_catalog.SETVAL('${schema}.\"${sequence}\"', $last_value, true);"
        fi
done < <(psql -t -A -F" " -p ${port} ${db} -c "${query}")
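For example, assuming you saved it as fix_sequences.sh on the old primary, you could apply its output directly on the new cluster:

./fix_sequences.sh 5432 <database_name> | psql -h <new_pg> -U postgres <new_database_name>
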
  • If you use replicas in a read pool and not just the primary, migrate the read-only PostgreSQL URIs first, then the primary; otherwise, writes would land on the new cluster while reads still come from the old one, which no longer receives them, so you would read a state older than the actual one.
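
At the moment of the final switch, if you can afford to briefly stop writes on the old primary, a minimal way to confirm that everything has been replicated is to compare LSNs on both sides (an illustrative check; adapt it to your setup):

-- on the old primary, once writes are stopped
SELECT pg_current_wal_lsn();

-- on the new cluster: received_lsn should reach the value above
SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;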

Final Considerations

This article gives a general idea of a “hot” migration to a new major PostgreSQL version, which will cover many common use cases. However, you must consider the application’s topology: does it separate reads and writes? Is latency important? Can you afford a micro-downtime to ensure everything has been replicated? These are points to address on a case-by-case basis, which a simple article on the internet cannot handle for you.