Top 10 Configuration Parameters to Tune PostgreSQL for Better Performance

The conventional wisdom is that, if you do not know what you are doing, you are better off using the default configuration. Most databases perform well with their default parameter values, and premature performance optimisation often makes things worse. Postgres, however, is a different story. Its default configuration is meant to work everywhere, including tiny machines like a Raspberry Pi, so most of the default values are too conservative for typical workloads. It is worth tweaking the configuration so that it performs well for your use case: with PostgreSQL, a few small config changes often result in a huge performance improvement.

Having said that, Postgres has more than 200 configuration parameters, and tuning it can be confusing. In this post, I will present the top 10 parameters that you should know. By default, these parameters are set far too conservatively; making them more aggressive is likely to improve performance.

(1) max_connections

If you have a lot of concurrent users and connections are being rejected, the solution is not as simple as raising this value: if it is set too high, you will see high latency.

Postgres rejects connections beyond this value. The limit exists to prevent the performance degradation caused by too many connections; the default is 100. In Postgres, each connection creates a process and consumes memory, so the server suffers under too many connections, and latency in particular will increase.

If you have an application with many concurrent users, it is sometimes better to reject some connections outright than to degrade performance for everyone. If you genuinely need many connections, the solution is to put PgBouncer in front of the database for connection pooling, so the excess connections are queued rather than rejected. Check out this video (about 15 minutes in) explaining the max_connections parameter in detail.
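
As a rough illustration (the value 200 is an assumption, not a recommendation), you can check current connection usage and raise the limit like this:

    -- How many connections are in use versus the configured limit?
    SELECT count(*) AS current_connections FROM pg_stat_activity;
    SHOW max_connections;

    -- Raise the limit; this only takes effect after a server restart
    ALTER SYSTEM SET max_connections = 200;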

(2) shared_buffers

The default is too low for most modern hardware. Workloads with many concurrent queries running on many CPUs, such as web applications or OLAP, need more buffers.

The conventional wisdom is to set it to 1/4 of RAM and effective_cache_size to 3/4. However, 1/4 may be too low for a small machine and too high for a big one. Start with 1/4 of RAM and try different values from there.
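
A minimal sketch, assuming a dedicated server with 32GB of RAM (the sizes are illustrative, not recommendations):

    -- Roughly 1/4 of RAM for shared_buffers; this requires a server restart
    ALTER SYSTEM SET shared_buffers = '8GB';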

(3) effective_cache_size

This parameter tells the planner how much memory is available for caching data; it does not allocate memory itself. A larger value makes the planner more likely to choose index scans. It should be more than 50% of RAM; shared_buffers plus the amount of OS cache available is a good place to start.
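
Continuing the hypothetical 32GB machine from above, a sketch might look like this:

    -- Planner hint only; roughly shared_buffers plus the OS page cache
    ALTER SYSTEM SET effective_cache_size = '24GB';
    SELECT pg_reload_conf();   -- a reload is enough; no restart needed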

(4) work_mem

This can be changed by any user and does not require a database restart; it can also be set per role or per session. The value determines the maximum amount of memory a single operation, such as a sort or hash join, can use in a backend process. The optimal value differs enormously between OLTP, OLAP and data warehouse workloads, from small to very large: roughly 10MB for OLTP, 100MB for OLAP, and gigabytes for data-warehouse queries that run for hours.

Lower it if you have many connections (e.g. web applications) and raise it for large queries. If the server is swapping, work_mem is too high; if the logs show queries writing temporary files, work_mem is too low.
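
Because it can be set per session or per role, you can keep the global value low and raise it only where needed. A sketch, with made-up sizes and a hypothetical reporting_user role:

    -- Session-level value for one heavy analytical query
    SET work_mem = '256MB';

    -- Or give a reporting role a larger default than the server-wide setting
    ALTER ROLE reporting_user SET work_mem = '256MB';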

(5) maintenance_work_mem

This is the memory used for maintenance operations: vacuuming, index creation, and foreign-key checks. Generally speaking, it should be raised; for vacuuming, the faster it completes, the better.

Because it is also used for index creation, raise it if you are doing a lot of bulk loading.

Raise it to 256MB to 1GB for large databases.
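
For example, you might raise it just for the session that builds a large index (table and index names here are made up for illustration):

    -- Temporarily give index creation more memory in this session
    SET maintenance_work_mem = '1GB';
    CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders (created_at);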

(6) random_page_cost

The ratio of random_page_cost to seq_page_cost tells the planner how much more expensive a random read is compared to a sequential read from disk. The defaults are random_page_cost = 4.0 and seq_page_cost = 1.0, meaning a random read is assumed to be four times as expensive as a sequential read. This is not true for SSDs, so it is recommended to lower random_page_cost to 2 or 3 for SSD storage.
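
A minimal sketch, assuming the database lives on SSD storage:

    -- Lower the random-read cost estimate for SSDs (2.0 is an assumed value)
    ALTER SYSTEM SET random_page_cost = 2.0;
    SELECT pg_reload_conf();   -- no restart needed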

(7) {min, max}_wal_size

When max_wal_size is too low, it causes excessive checkpoints. The recommendation is to make max_wal_size large enough that it is rarely reached: set min_wal_size to at least 1GB and max_wal_size to several GB, up to 50 to 100 GB. Along with changing these values, you also need to increase checkpoint_timeout to get the benefit of the larger max_wal_size; raising both spreads checkpoints further apart. To understand checkpoints, I recommend reading the excellent blog post Basics of Tuning Checkpoints.
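
A sketch with illustrative values for a write-heavy server; the right numbers depend on how much WAL your workload actually generates between checkpoints:

    ALTER SYSTEM SET min_wal_size = '2GB';
    ALTER SYSTEM SET max_wal_size = '16GB';
    ALTER SYSTEM SET checkpoint_timeout = '15min';
    SELECT pg_reload_conf();   -- these take effect on reload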

(8) autovacuum_vacuum_scale_factor

Vacuuming is the process of clearing out dead tuples left behind by updates and deletes. PostgreSQL has a mechanism to vacuum tables automatically once the configured thresholds are exceeded.

Generally speaking, the default values that control autovacuum are too conservative, and you can make them more aggressive. Sometimes I hear people recommend turning off autovacuum and vacuuming manually; autovacuum is indeed not a great fit for batch writes (e.g. a data warehouse), where a manual vacuum after the bulk load works better. However, unless you fully understand the manual vacuuming process, it is better to rely on autovacuum. The rule of thumb is: do not turn off autovacuum unless you are confident about vacuuming by yourself.

autovacuum_vacuum_scale_factor determines what fraction of a table must consist of dead tuples before autovacuum runs on it. When it is set to 0.2, autovacuum kicks in once 20% of the table's rows are dead.

20% sounds fine for small to medium size tables, but it is far too much for a large table (e.g. 200GB of dead tuples in a 1TB table). The value can be set per table; if you only change it globally, small tables will be vacuumed more often than necessary, but that is still better than vacuuming large tables too rarely.

This value works in conjunction with autovacuum_vacuum_threshold, which sets a base number of dead tuples: autovacuum runs on a table once the number of dead tuples exceeds autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * number of rows. For example, autovacuum_vacuum_threshold = 1000 means at least 1000 rows must be dead before autovacuum considers the table.
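
A sketch of a per-table override for a hypothetical large, frequently updated table, so it is vacuumed after roughly 1% of its rows are dead instead of the global 20%:

    ALTER TABLE big_events SET (
        autovacuum_vacuum_scale_factor = 0.01,
        autovacuum_vacuum_threshold = 1000
    );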

To further understand auto vacuuming, you should read Autovacuum Tuning Basics.

(9) autovacuum_max_workers

Raise autovacuum_max_workers if you have many cores; the default of 3 is low for a big machine. Running several workers lets multiple tables be vacuumed concurrently, and the faster vacuuming finishes, the better the database performs. A reasonable guideline is to keep the number of workers at no more than about half of your CPU cores.

Among the other autovacuum tuning parameters, you can raise autovacuum_vacuum_cost_limit significantly and reduce autovacuum_vacuum_cost_delay significantly; their default values are noticeably conservative.
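
A sketch of what this could look like on a hypothetical busy many-core server; the numbers are assumptions to adapt, not recommendations:

    ALTER SYSTEM SET autovacuum_max_workers = 6;            -- requires a restart
    ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;   -- reload is enough
    ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';
    SELECT pg_reload_conf();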

(10) synchronous_commit

With the default setting (on), a commit does not return success to the client until the WAL record has been flushed to disk. Turning this parameter off lets the server acknowledge the commit before the flush happens, which lowers commit latency but means the most recent transactions can be lost if the server crashes (the data itself is not corrupted). If data integrity matters more to you than response time, leave it on; the general recommendation is to keep the default for safety.
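
Because it can be changed per session, one option is to relax durability only for non-critical work, as in this sketch (the use case is assumed):

    -- Relax durability for a session doing reloadable, non-critical writes
    SET synchronous_commit = off;
    -- ... bulk inserts that can be replayed if the server crashes ...
    SET synchronous_commit = on;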

REFERENCE

PostgreSQL Configuration for Humans
Tuning Your PostgreSQL Server
Five Steps to PostgreSQL Performance
Performance Tuning PostgreSQL
Basics of Tuning Checkpoints
Autovacuum Tuning Basics
