PostgreSQL
Basic setup
Let's create and explore a test cluster before using it for something important. So, how does this work? Well, let's create one with the following config. Click the annotation markers for more information.
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster # (1)
metadata:
  name: test
spec:
  postgresVersion: 16 # (2)
  instances:
  - name: instance1
    replicas: 2 # (3)
    dataVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 2Gi # (4)
      storageClassName: rook-block
    # (5)
    affinity: # (6)
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          podAffinityTerm:
            topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                postgres-operator.crunchydata.com/cluster: test
                postgres-operator.crunchydata.com/instance-set: instance1
  backups:
    pgbackrest: # (7)
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.47-2
      repos:
      - name: repo1
        volume:
          volumeClaimSpec:
            accessModes:
            - "ReadWriteOnce"
            resources:
              requests:
                storage: 2Gi
            storageClassName: rook-block
1. Remember, operators extend Kubernetes functionality; this is a new resource type.
2. We can set the PostgreSQL version here.
3. Define how many instances we want to run.
4. Define the disk size of each instance.
5. In theory, that's all we need to define. But let's go further for bonus points.
6. This section is what gives us actual high availability. It's not special to the PGO, but native Kubernetes functionality: it tells Kubernetes to spread the instances across different hostnames where possible, so that the loss of one Kubernetes node does not take down all of our instances.
7. pgBackRest is the only backup option, and it is a required setting.
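Assuming you've saved the manifest above as postgres.yaml (a filename chosen here purely for illustration), create the cluster with:

```shell
# Create (or update) the PostgresCluster resource in the current namespace
kubectl apply -f postgres.yaml
```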
Connect and look around the cluster
Check out what the above manifest has created first:
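A minimal check, assuming the cluster was created in the default namespace:

```shell
# List the pods the operator created for our cluster
kubectl get pods
```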
The output will be similar to the following:
NAME                     READY   STATUS    RESTARTS   AGE
test-backup-znmx-zzf84   1/1     Running   0          87s
test-instance1-f74j-0    4/4     Running   0          107s
test-instance1-pf8x-0    4/4     Running   0          106s
test-repo-host-0         2/2     Running   0          106s
We can see the two instances we requested, one backup repository and one active backup job already running.
So what else has been set up for us? We said this is highly available, and the PGO uses Patroni to manage that. So let's look at its status:
kubectl get pods -l postgres-operator.crunchydata.com/instance-set=instance1 -o=custom-columns='NAME:.metadata.name,ROLE:.metadata.labels.postgres-operator\.crunchydata\.com/role'
You should see something like this:

NAME                    ROLE
test-instance1-f74j-0   master
test-instance1-pf8x-0   replica

We can see one master and one replica, which seems sensible. If we had more instances, there would be more replicas.
What about connecting? Well, the PGO also creates an initial user and database for us.
List the cluster's secrets with kubectl get secrets, and you'll find the following:
NAME                        TYPE     DATA   AGE
test-cluster-cert           Opaque   3      26m
test-instance1-f74j-certs   Opaque   6      26m
test-instance1-pf8x-certs   Opaque   6      26m
test-pgbackrest             Opaque   5      26m
test-pguser-test            Opaque   8      26m
test-replication-cert       Opaque   3      26m
We can see that a number of native Kubernetes Secrets have been created. The one we need for connection and user info is test-pguser-test. Let's examine it:
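One way to do that, assuming the default namespace, is to dump the whole Secret as YAML:

```shell
# Show the full Secret, including the base64-encoded connection details
kubectl get secret test-pguser-test -o yaml
```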
and you will see the following, or similar:
apiVersion: v1
data:
  dbname: dGVzdA==
  host: dGVzdC1wcmltYXJ5LmRlZmF1bHQuc3Zj
  jdbc-uri: amRiYzpwb3N0Z3Jlc3FsOi8vdGVzdC1wcmltYXJ5LmRlZmF1bHQuc3ZjOjU0MzIvdGVzdD9wYXNzd29yZD1mJTVEOVBkLjVDSWVYZSU0MGFpJTI5UC5oVjZFeCU1RSZ1c2VyPXRlc3Q=
  password: Zl05UGQuNUNJZVhlQGFpKVAuaFY2RXhe
  port: NTQzMg==
  uri: cG9zdGdyZXNxbDovL3Rlc3Q6ZiU1RDlQZC41Q0llWGUlNDBhaSUyOVAuaFY2RXglNUVAdGVzdC1wcmltYXJ5LmRlZmF1bHQuc3ZjOjU0MzIvdGVzdA==
  user: dGVzdA==
  verifier: U0NSQU0tU0hBLTI1NiQ0MDk2OkdYMG0vRDNFVjRBbFIxa2NCRkN6QlE9PSRVcGpyTlpHekJYNHlsODBYWTZOYjdXZGZ0QjRaT0dSVEs3VGlaMnhpTmlBPTpIakFQdmg5MzA0R0VvY2RwZENURDA2NVJsamJpRTc2OWVoRFBzNjlaZjlNPQ==
kind: Secret
metadata:
  creationTimestamp: "2024-03-10T07:09:01Z"
  labels:
    postgres-operator.crunchydata.com/cluster: test
    postgres-operator.crunchydata.com/pguser: test
    postgres-operator.crunchydata.com/role: pguser
  name: test-pguser-test
  namespace: default
  ownerReferences:
  - apiVersion: postgres-operator.crunchydata.com/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: PostgresCluster
    name: test
    uid: 994f3f91-6aa3-4b2e-a716-3174233f2520
  resourceVersion: "43635550"
  uid: 765cce42-c1f7-4fea-b28b-bbd43fcebf52
type: Opaque
For our purposes, we want the uri value, which bundles the user, password, host, port, and database name into a single connection string.
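Every value in a Kubernetes Secret is base64-encoded. As a quick sketch, you can decode the uri value from the example Secret above on your workstation (your cluster's value will differ):

```shell
# Decode the example uri value; the result is a complete libpq connection string
echo 'cG9zdGdyZXNxbDovL3Rlc3Q6ZiU1RDlQZC41Q0llWGUlNDBhaSUyOVAuaFY2RXglNUVAdGVzdC1wcmltYXJ5LmRlZmF1bHQuc3ZjOjU0MzIvdGVzdA==' | base64 -d
# postgresql://test:f%5D9Pd.5CIeXe%40ai%29P.hV6Ex%5E@test-primary.default.svc:5432/test
```

Against a live cluster you would fetch it directly instead, e.g. kubectl get secret test-pguser-test -o go-template='{{.data.uri | base64decode}}'.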
To use this from outside of the cluster, we'll need to set up a port-forward and then use the URI string to connect:
PG_CLUSTER_PRIMARY_POD=$(kubectl get pod -o name \
-l postgres-operator.crunchydata.com/cluster=test,postgres-operator.crunchydata.com/role=master)
kubectl port-forward "${PG_CLUSTER_PRIMARY_POD}" 5432:5432
Now fetch the secrets and connect:
PG_CLUSTER_USER_SECRET_NAME=test-pguser-test
PGPASSWORD=$(kubectl get secrets "${PG_CLUSTER_USER_SECRET_NAME}" -o go-template='{{.data.password | base64decode}}')
PGUSER=$(kubectl get secrets "${PG_CLUSTER_USER_SECRET_NAME}" -o go-template='{{.data.user | base64decode}}')
PGDATABASE=$(kubectl get secrets "${PG_CLUSTER_USER_SECRET_NAME}" -o go-template='{{.data.dbname | base64decode}}')
PGPASSWORD=${PGPASSWORD} psql -h localhost -U ${PGUSER} ${PGDATABASE}
And you should finally be connected and see something similar to the following:
psql (16.2 (Homebrew), server 16.1)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.
test=>
Connection pooling
The Crunchy PGO can install and configure a PgBouncer instance for us. To do so, add the following stanza under spec in your PostgresCluster manifest:
proxy:
  pgBouncer:
    image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbouncer:ubi8-1.21-0
That's all you need to do, easy as. Once you've added that, you'll find some new connection parameters added to the secret we used in the previous step:
- pgbouncer-host
- pgbouncer-port
- pgbouncer-uri
- pgbouncer-jdbc-uri
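Because these keys contain hyphens, a go-template needs index rather than dot access (.data.pgbouncer-uri will not parse). A sketch, assuming the same test-pguser-test secret as before:

```shell
# Read the PgBouncer connection URI from the user secret
kubectl get secret test-pguser-test \
  -o go-template='{{index .data "pgbouncer-uri" | base64decode}}'
```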
Backup and disaster recovery
Configuration
Using pgBackRest, there are four different places where backups can be stored:
- Azure
- GCS
- S3
- Kubernetes volume
You can mix and match, and have as many different storage repositories as you would like. We've set up a Kubernetes volume in our configuration, and while the configuration is specific to the storage type, the operations are the same.
Backup operations and management
pgBackRest has three backup types available, aimed at keeping your recovery time (Recovery Time Objective, or RTO) as short as possible:
- full
- differential
- incremental
Selecting the appropriate strategy is outside the scope of this document, but let's show an example using PGO configuration.
Schedule
Schedules are cron-formatted strings that dictate the backup type, and time to execute.
Let's say that our backup policy is to take a full backup weekly on Sunday at 1am and take differential backups daily at 1am on every day except Sunday. We would want to add configuration to our spec that looks similar to:
spec:
  backups:
    pgbackrest:
      repos:
      - name: repo1
        schedules:
          full: "0 1 * * 0"
          differential: "0 1 * * 1-6"
The PGO will create the necessary Kubernetes CronJobs to perform the described backups.
Retention
PGO lets you set backup retention on full and differential backups. When a full backup expires, either through your retention policy or through manual expiration, pgBackRest will clean up any backup and WAL files associated with it. For example, if you have a full backup with four associated incremental backups, when the full backup expires, all of its incremental backups also expire.
There are two different types of backup retention you can set:
- count: This is based on the number of backups you want to keep. This is the default.
- time: This is based on the total number of days you would like to keep a backup.

Let's look at an example where we keep full backups for 14 days. The most convenient way to do this is through the spec.backups.pgbackrest.global section:
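A sketch of that retention policy; repo1-retention-full and repo1-retention-full-type are pgBackRest options passed through the global map (note the value is quoted as a string):

```yaml
spec:
  backups:
    pgbackrest:
      global:
        repo1-retention-full: "14"
        repo1-retention-full-type: time
```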
One-off manual backup
There are times where you may want to take a one-off backup, such as before major application changes or updates. This is not your typical declarative action -- in fact, a one-off backup is imperative in nature! -- but it is possible to take a one-off backup of your Postgres cluster with PGO.
First, you need to configure the spec.backups.pgbackrest.manual section to be able to take a one-off backup. This contains information about the type of backup you want to take and any other pgBackRest configuration options.
Let's configure the custom resource to take a one-off full backup:
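A sketch of that configuration, reusing repo1 from our manifest; repoName selects the target repository, and options passes flags straight through to pgBackRest:

```yaml
spec:
  backups:
    pgbackrest:
      manual:
        repoName: repo1
        options:
        - --type=full
```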
This does not yet trigger the one-off backup -- you have to do that by adding the postgres-operator.crunchydata.com/pgbackrest-backup annotation to your custom resource. The best way to set this annotation is with a timestamp, so you know when you initialized the backup.
For example, for our test cluster, we can run the following command to trigger the one-off backup:
kubectl annotate postgrescluster test \
postgres-operator.crunchydata.com/pgbackrest-backup="$(date)"
PGO will detect this annotation and create a new, one-off backup Job!
Next steps
See the Crunchy PGO documentation for more information such as high availability, monitoring, and managing your cluster.