Gitlab hero border pattern left svg Gitlab hero border pattern right svg

Blueprint: Postgres database bloat analysis

On this page


PostgreSQL maintains consistency across concurrent database sessions using multi-version concurrency control. Most modifications to records result in a new version of the record that obtains additional space. The old version is considered a dead tuple until the space is marked for re-use. This space can be re-used for new records (created by update or insert queries) and is generally referred to as bloat. All relevant data structures (heap, index, toast) can suffer from bloat which has an impact on query performance and space usage. We refer to this as index bloat or table bloat and similar.

A background process called autovacuum sweeps dead tuples and marks the space for re-use. Bloat is reduced, if the space is successfully re-used later. Over time, all data structures may accumulate bloat that cannot be removed with regular background autovacuuming. For example, index pages are only reclaimed if they became completely empty. That is, an index page is kept around also if only a few keys remain on that page. Here, additional strategies have to be implemented to reorganize data structures in order to remove bloat.

The PostgreSQL database for is 2.6 TB in size as of January 2019. The amount of bloat is measured to be at 18 % of the overall database size, ranging at up to ~450 GB of bloat (see Analysis).

The problem with bloat is that it may slow down scan operations on relevant data structures. These can be indexes (index scans) and heap (table scans). Generally, index bloat is considered to have higher impact than heap bloat (because indexes are likely more often used to retrieve records from the heap than sequentially scanning a full table).


In order to estimate bloat, we used a database instance from the backup restore pipeline (in January 2019). This comes with the exact same data as the production cluster but is otherwise detached and read/write. On this instance, we measured table, index and toast-table sizes separately before and after performing a VACUUM FULL on the database. The difference is what we consider bloat (see below table). The full vacuum run took nearly 24 hours to complete.

Data structure Before After Bloat in GB Bloat in % of current size (Before)
Table 1210 GB 1186 GB 24 GB 2 %
Index 843 GB 410 GB 433 GB 51 %
Toast 509 GB 502 GB 7 GB <2 %
Total (all) 2556 GB 2104 GB 452 GB 18 %

From this analysis, it is clear that overall table and toast bloat is negligible for This means that autovacuum performs really well and frees up space that is later reclaimed by new record versions. However, there are a few outlying tables that have elevated levels of bloat (see "Table Analysis").

Table Analysis

Here we examine table (heap) bloat more closely. The analysis is based on a table size comparison from before and after a VACUUM FULL (source).

Below we can see that most of the tables are either small or show low levels of bloat.

Table bloat

The encircled tables in red have a bloat level above 10% and more than 0.5 GB of bloat accumulated. Those tables are

Rewriting these tables with CLUSTER takes about 40-50s and removes 7.4 GB of table bloat (30% of overall table bloat). During this time, the table is locked exclusively and other queries are going to be blocked. Depending on the access patterns, it may be a viable approach to do for It has been pointed out though that services is a rather actively used table which may not tolerate being locked for up to a minute. For snippets however, we should be able to just lock it exclusively for a minute without much impact.

There are other tables (marked in blue) with a high bloat ratio that are relatively small in total size. For those, it may be interesting to tune autovacuum settings. Given their comparably small size, those can be rewritten with CLUSTER as well causing only very short lock situations.

For those two tables, we may want to dig deeper into why we see these amounts of bloat. We may be able to change the access pattern for the application in order to improve this, so that we don't have to deal with a lot of bloat in the first place.

Index analysis

Note that a different database state was used to perform the index analysis vs the table analysis. Both database snapshots were a few days apart only, but their total size differs.

Let's dissect where 51% index bloat is coming from. Below is a distribution of overall index size per table and their related bloat levels. There are quite a few bigger indexes with high levels of bloat - this is where we should target at.


Going one step further, we look at individual index sizes and their respective bloat. The below graph shows the distribution per index and blue color indicates the index is a primary key.


Based on these statistics, we arrive at this picture:

Data structure Actual size Bloat in GB % of overall index bloat
Primary key indexes 44.4 24.1 5.4
Other indexes 379.6 423.1 94.6
Overall 424 447.2 100

Indexes that do not serve a primary key constraint make up 95% of the overall index bloat. Rebuilding those indexes can be accomplished with normal DDL in a regular background migration.

If we rebuild regular indexes only, this would free up 423.1 GB of space and get us at 5.7% overall index bloat (down from 51%).

Zooming into the index dataset, we can examine the biggest offenders in terms of index bloat more closely. The export contains data for indexes that are above 70% of bloat with more than 1 GB worth of bloat space.

An interesting example are indexes in users on boolean columns ghost and admin and similar for the enum-varchar state. Let's look at the table and index definitions:

                                                      Table "public.users"
                    Column                    |            Type             |                     Modifiers
 id                                           | integer                     | not null default nextval('users_id_seq'::regclass)
 admin                                        | boolean                     | not null default false
 ghost                                        | boolean                     |
 state                                        | character varying           |

    "index_users_on_admin" btree (admin)
    "index_users_on_ghost" btree (ghost)
    "index_users_on_state" btree (state)

Their net size is 70 MB but they range at 2.2 GB each. Note there is generally only one entry with the ghost flag set. Similarly, there are only a few users with admin flag set compared to normal users. We may be able to redefine those indexes, for example by making them partial. This avoids bloat because the index contains only very little records and does not change that often.

We may want to conduct a more thorough analysis of indexes, their definition and also their actual usage on (but this is a separate effort). For example, we might be able to remove indexes that are not used at all.

Storage cost

The level of bloat is the same across all replicas in the cluster (physical replication).

We use SSD persistent disks for database storage in Google Cloud Engine (us-east1 region). With currently 8 replicas for production, the overall bloat ranges at roughly 3.5 TB which comes in at 595 USD per month. There is additional associated cost for increased size of backups and by other similar environments (e.g. staging).

Positives aspects of bloat

There are also positive aspects of (table) bloat: HOT updates can only take place if there's enough room on the page (plus other requirements). Hence, bloat space can have a positive impact on update performance as well - best if it is uniformly distributed across the table.

This is also why it makes sense to explicitly bloat a table while it is being populated. This is what fillfactor does: PostgreSQL reserves some space on each page for updates.

Strategies to remove bloat

Generally, database bloat can be reduced by rewriting relevant data structures (heap or index files). There are straight-forward ways of executing this as an offline operation: REINDEX, CLUSTER and VACUUM FULL are all sufficient to remove bloat from index and/or table structures, but require holding an exclusive lock for an extended period of time.

Since we require an online solution that does not interfere with ongoing traffic, we need to look at more elevated approaches.

Standard migration with DDL to rebuild and swap indexes

Here, we use a standard migration to rebuild indexes concurrently using standard DDL statements. This migration is similar to creating new indexes and as such can be done without impact on the live system.

The steps are:

  1. CREATE INDEX CONCURRENTLY - recreate an index on the sides
  2. DROP INDEX and ALTER INDEX ... RENAME - in a single transaction, drop the old index and rename the newly created one to the original name. As with all other approaches, we need an exclusive lock on the table in order to swap indexes (however, this is a quick operation).

Since the new index is freshly rebuilt, it contains no bloat.

This method is only applicable to cleaning up indexes that don't participate in a primary key constraint. As pointed out in the analysis, we can clean up 95% of all index bloat with this method arriving at 5.7% total index bloat.

After the migration, we'd arrive at

Data structure Actual size in GB Bloat in GB  
Primary key indexes (untouched) 44.4 24.1  
Other indexes (rebuilt) 379.6 0  
Tables (untouched) 1186 24  
Toast (untouched) 509 7  
Total 2119 55.1 2.5 % overall bloat

The benefits of this method are:

  1. Efficient: Cleans up vast majority of the overall bloat for
  2. Dogfooding: Ships easily with the product as a standard background migration
  3. Low technical risk: Safest method we have available (no need to include PostgreSQL extensions)
  4. Boring: Low implementation effort.

The effort to implement this method is fairly low: It can be delivered with the product (to self-hosted installations and using a standard background migration or a rake task without additional dependencies.

Use pg_repack for index and table maintenance

pg_repack is a PostgreSQL extension that implements repacking of table and index structures. It can be run on-line because it provides a trigger-based solution to rebuild the data structure on the sides and later swaps data files instead of logically swapping the index or table.

Since pg_repack is based on physical file swapping, it can be used to

As such it is more versatile than the standard index rebuild method above.

Benefits are:

  1. Versatility: Ability to repack tables and primary key indexes (not needed today)

Low lights are:

  1. Risk: Additional compiled extension that is included in PostgreSQL runtime
  2. Effort: Additional work required to ship to customers (omnibus) and apply for (chef/apt setup)

Author: Andreas Brandl {: .note}