Tuning for very large shared_buffers, using Veritas VxFS

The debate about the optimum shared_buffers size in PostgreSQL is clearly far from over. However, I have not seen any results where the buffer cache and the FS cache were tuned in unison.

Because PostgreSQL knows what buffers are in use, and knows when to flush the dirty buffers it is ideal to use as much of it as possible. Using an secondary cache (the FS) as in a ‘traditional’ PostgreSQL configuration, just introduces more workload in the system. The secondary cache needs a process to wake up and flush the buffers (pdflush), and also has to manage it’s own LRU, and likely has many duplicate buffers sitting there wasting your precious RAM.

The point I am trying to stress is that the PostgreSQL cache *and* the FS cache must be tuned together. In this case, I remove the FS cache entirely using direct I/O, and then take over that space with the fastest cache in the stack; the PostgreSQL buffer cache.

So here is a bit of testing to show the results.

The testing was performed on a set of nice new HP DL185 G5′s with 14 72GB 15k SAS Drives and Quad Core Opteron processors and 32GB of main memory. The OS is SUSE Linux EnterpriseServer 10 SP1 (x86_64) 2.6.16.46-0.12-smp. The filesystem is Veritas VxFS file system. The version of PostgreSQL tested is 8.2.5.

The test itself is a set of custom pgbench scripts. These scripts were developed using actual data from production PostgreSQL log files, and pgfouine. The access pattern is about 70% read, with 15% update and 15% insert. There is no delete activity. This matches a very specific workload on a subset of our production databases. The workload is random, and the total data size is larger than our cache size by a very large amount, this guarantees a good workout of physical disk and caches in front of them. The WAL logs and data were on the same VxFS volume.

The test runs were gathered with pgd, and graphed. Between each test the filesystem was unmounted and remounted. I tested each run 3 times and averaged the tests.

Test#1:
VxFS buffered I/O
shared_buffers=500MB
disk controller cache=512MB write, 0MB read
500GB total data size
36 concurrent backends

Test#2:
VxFS buffered I/O
shared_buffers=2GB
disk controller cache=512MB write, 0MB read
500GB total data size
36 concurrent backends

Test#3:
VxFS with convosync=direct,mincache=direct
shared_buffers=20GB
disk controller cache=512MB write, 0MB read
500GB total data size
36 concurrent backends

Conclusion:
A modest gain can be had when using a very large (comparatively) shared_buffers setting when combining that change with direct I/O. The PostgreSQL cache does scale quite nicely up to at least a 20GB cache size when configured in this manner.

Further gains could likely be achieved by separating the WAL logs with specific VxFS tunables as well as tuning other PostgreSQL parameters due to the larger cache (bgwriter tunables, etc).

This entry was posted in Database Engineering, PostgreSQL and tagged , , , . Bookmark the permalink.

5 Responses to Tuning for very large shared_buffers, using Veritas VxFS

  1. kgorman says:

    Regarding your post on your blog at http://www.xaprb.com/blog/2008/12/16/a-comment-on-very-large-shared_buffers-benchmarks/#comment-15625. The bottom axis represents a sample per 10 second period. So the number 100 represents 100, 10 second sample periods. The graph represents performance over a given time period during normal running state of the database.

  2. kgorman says:

    Ok, I am back from vacation and the fires that cropped up during them.

    Xaprb,

    I wish I knew your name. I am sorry the results don’t help you. However, I do agree that there could be more cases tested to help make the case a bit more clear. This is not meant to be a scientific test as such, but more to get you thinking. Clearly that part has worked.

    I will see what I can drum up. I definitely see your point, thanks for the comments.

  3. kgorman says:

    it was just waiting to be approved ;-)

  4. Pingback: A comment on very large shared_buffers benchmarks at Xaprb

  5. Xaprb says:

    You need to run at least 6 tests:

    2 variables of I/O tuning: direct and non-direct
    3 variables of shared_buffers size: 500M, 2G, 20G

    Your results don’t show me anything about what happens on a 32GB machine with 20G of shared_buffers and not using directio. I assume the answer is “swapping” but you know what assume is :)

blog comments powered by Disqus