Discover an efficient method to add a `stored, generated column` to very large PostgreSQL tables without locking issues, ensuring minimal downtime and optimal performance.
---
This video is based on the question https://stackoverflow.com/q/77852268/ asked by the user 'Alexi Theodore' ( https://stackoverflow.com/u/9819342/ ) and on the answer https://stackoverflow.com/a/77857071/ provided by the user 'pert5432' ( https://stackoverflow.com/u/23137182/ ) on the 'Stack Overflow' website. Thanks to these great users and the Stack Exchange community for their contributions.
Visit these links for the original content and further details, such as alternate solutions, comments, and revision history. The original title of the question was: How to add a stored, generated column to a very large table?
Also, content (except music) is licensed under CC BY-SA https://meta.stackexchange.com/help/l...
The original Question post is licensed under the 'CC BY-SA 4.0' ( https://creativecommons.org/licenses/... ) license, and the original Answer post is licensed under the 'CC BY-SA 4.0' ( https://creativecommons.org/licenses/... ) license.
If anything seems off to you, please feel free to write me at vlogize [AT] gmail [DOT] com.
---
How to Efficiently Add a Stored, Generated Column to a Large PostgreSQL Table
Managing large databases can often feel like a giant puzzle, especially when it involves modifying their structure. One common challenge is the need to add a new stored, generated column to a very large table, in this case one containing over 100 million rows. This requires not only careful planning but also an efficient execution strategy to avoid long periods of table locking.
The Problem
When you add a stored, generated column to an existing table, PostgreSQL must compute the value for every existing row, and the ALTER TABLE rewrites the entire table while holding an exclusive lock. On a table this size that means an unacceptably long window during which no other operations can run against it. There is currently no built-in option to add or validate such a column asynchronously, which makes the task particularly cumbersome.
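For context, the naive statement below is what triggers the full rewrite. The table and expression are illustrative stand-ins, not taken from the original question:
-- Naive approach: rewrites every row while holding an exclusive lock.
ALTER TABLE my_table
  ADD COLUMN column_c bigint
  GENERATED ALWAYS AS (column_a + column_b) STORED;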
As a result, database administrators often find themselves seeking alternative approaches that could mitigate downtime and allow for smoother upgrades.
Proposed Solution: A Step-by-Step Guide
The objective is to minimize locking time while ensuring that the new column is populated with the necessary values. Here's an effective approach you can take:
Step 1: Create a Nullable Column
Begin by adding the new column as a plain, NULLABLE column (not as a generated column). Adding a nullable column with no default is a quick, metadata-only change, so you can put the structure in place without computing values for every existing row.
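A minimal sketch, assuming the table is named my_table and the computed value fits in a bigint (the names and the type are illustrative):
-- Plain nullable column with no expression attached: no table rewrite, no long lock.
ALTER TABLE my_table ADD COLUMN column_c bigint;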
Step 2: Establish a Trigger
Next, create a trigger that automatically fills the new column when new rows are added or existing rows are updated. The trigger function will assign values to column_c by calling your desired function.
Trigger Type: BEFORE INSERT OR UPDATE
Functionality: Assign the result of your computation to NEW.column_c
Here’s a basic example:
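Here is one possible implementation, assuming column_c is derived from two existing columns, column_a and column_b; substitute whatever expression or function your real generated column would use (on PostgreSQL versions before 11, write EXECUTE PROCEDURE instead of EXECUTE FUNCTION):
CREATE OR REPLACE FUNCTION compute_column_c() RETURNS trigger AS $$
BEGIN
  -- Replace this expression with the computation the generated column would perform.
  NEW.column_c := NEW.column_a + NEW.column_b;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_fill_column_c
  BEFORE INSERT OR UPDATE ON my_table
  FOR EACH ROW
  EXECUTE FUNCTION compute_column_c();
From this point on, every inserted or updated row gets column_c filled in automatically.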
Step 3: Fill Existing Rows Asynchronously
With the column and trigger in place, you can backfill the generated values for the rows that already exist in the table. Do this incrementally to avoid putting strain on the database.
Rather than running one giant UPDATE, process the table in controlled batches (for example with a cursor or a key-range loop) so the backfill proceeds at a pace the database can absorb and the table stays responsive.
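A sketch of one batching approach, assuming the table has a primary key named id; the batch size is illustrative. Run the statement repeatedly (for example from a small script or a scheduled job, with a pause between runs) until it reports 0 rows updated:
-- Backfill one batch of rows whose column_c is still NULL.
-- The BEFORE UPDATE trigger computes the same value, so the explicit expression
-- here just keeps the statement self-describing.
UPDATE my_table
SET column_c = column_a + column_b
WHERE id IN (
    SELECT id
    FROM my_table
    WHERE column_c IS NULL
    LIMIT 10000
);
If finding the remaining NULL rows becomes slow on a table this large, a partial index (for example CREATE INDEX CONCURRENTLY idx_column_c_null ON my_table (id) WHERE column_c IS NULL) can speed up the batches; it can be dropped once the backfill is complete.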
Step 4: Make the Column NOT NULL
Once column_c has been populated for every existing row, alter the column to NOT NULL. This statement takes an exclusive lock and, by default, scans the table to confirm no NULLs remain, so schedule it for a quiet moment; it is still far cheaper than rewriting the entire table, and it enforces data integrity going forward.
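A sketch, using the same table and column names as above:
-- Takes an exclusive lock and scans the table to verify no NULLs are left.
ALTER TABLE my_table ALTER COLUMN column_c SET NOT NULL;
If that scan under the lock is still too disruptive, PostgreSQL 12 and later offer a gentler variant: add a CHECK (column_c IS NOT NULL) constraint with NOT VALID (a brief lock, no scan), run VALIDATE CONSTRAINT separately (this scans the table but allows concurrent reads and writes), and then issue SET NOT NULL, which can use the validated constraint to skip the scan; the check constraint can be dropped afterwards.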
Conclusion
By following this structured approach, you can efficiently add a stored, generated column to a large PostgreSQL table without suffering from extended locking periods. The trigger-maintained column simulates the behavior of a generated column while allowing for a phased population of values, keeping your database responsive throughout the process.
If you’re facing this challenge, give this strategy a try and reduce your downtime effectively!