The love of Data, Database Engineering, Architecture, Entrepreneurship, and other assorted bits

pg_standby lag monitoring

24 June 2008

I have been using pg_standby quite a bit lately, and we needed some management tools around it.  One simple thing that was sorely missed was lag monitoring.  When you have hundreds of standbys, many of them on the bleeding edge of keeping up with the primary, lag monitoring is very important.  Heck, it’s even important if you have just one standby database.  If your standby is lagging when you need it, that can affect your ability to have an up-to-date database when you fail over to it. The problem is that you cannot simply query a standby for its lag, because the database is actively recovering.  So one must rely on the control file’s last checkpoint time as the most accurate marker of how up to date the standby actually is. Here is some code for implementing this:
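The full script is behind the link; as a minimal sketch of the idea, you can shell out to pg_controldata and parse the "Time of latest checkpoint" line. The timestamp format below assumes the C locale, which is why the sketch pins LC_ALL:

```python
import os
import re
import subprocess
from datetime import datetime

# pg_controldata prints the checkpoint time in ctime-like form under the
# C locale; adjust the format string if your build differs.
CHECKPOINT_RE = re.compile(r"Time of latest checkpoint:\s+(.+)")
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"

def parse_checkpoint_time(control_output):
    """Extract the last-checkpoint timestamp from pg_controldata output."""
    match = CHECKPOINT_RE.search(control_output)
    if match is None:
        raise ValueError("no checkpoint time in pg_controldata output")
    return datetime.strptime(match.group(1).strip(), TIME_FORMAT)

def standby_lag_seconds(data_dir):
    """Return seconds elapsed since the standby's last restored checkpoint."""
    out = subprocess.run(
        ["pg_controldata", data_dir],
        capture_output=True, text=True, check=True,
        env={**os.environ, "LC_ALL": "C"},  # keep the timestamp parseable
    ).stdout
    return (datetime.now() - parse_checkpoint_time(out)).total_seconds()
```

In practice you would run this from cron on each standby and alert when the lag crosses a threshold.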

Read More

Mapping Linux memory for PostgreSQL using fincore

18 June 2008

PostgreSQL, more so than other databases, generally relies on the OS filesystem cache to keep performance high. It is pretty common not to see performance gains for buffer cache sizes over about 2 GB; why this is true I am unsure, but perhaps the overhead of tracking buffers becomes too high. Anyway, since the OS buffer cache keeps file pages in memory as well, lots of database objects end up in the PostgreSQL cache, the OS cache, or both. We can debate <a href=>double buffering</a> later.
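The fincore tool itself is behind the link; as a rough, Linux-only sketch of what it does, you can mmap a file and call mincore(2) through ctypes to count how many of its pages are resident in the OS cache. Everything here is stdlib; the libc path and the ACCESS_COPY mapping are assumptions for the demo:

```python
import ctypes
import mmap
import os

libc = ctypes.CDLL("libc.so.6", use_errno=True)  # Linux-only assumption

def fincore(path):
    """Return (resident_pages, total_pages) for a file via mincore(2)."""
    page = os.sysconf("SC_PAGE_SIZE")
    size = os.path.getsize(path)
    if size == 0:
        return (0, 0)
    npages = (size + page - 1) // page
    with open(path, "rb") as f:
        # ACCESS_COPY gives a writable (private) buffer so ctypes can
        # take its address; no writes are ever made to it here.
        mm = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_COPY)
        try:
            vec = (ctypes.c_ubyte * npages)()  # one byte per page
            addr = ctypes.addressof(ctypes.c_char.from_buffer(mm))
            rc = libc.mincore(ctypes.c_void_p(addr),
                              ctypes.c_size_t(size), vec)
            if rc != 0:
                raise OSError(ctypes.get_errno(), "mincore failed")
            # Bit 0 of each vector byte means "resident in memory".
            resident = sum(b & 1 for b in vec)
        finally:
            mm.close()
    return (resident, npages)
```

Running this over a data directory gives a per-relation picture of what the OS is actually caching for you.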

Read More

Incremental Commits in PostgreSQL

30 April 2008

On large and/or busy databases it’s often not feasible to update an entire table at one time. For instance, suppose one has to perform a background DML task on a 300 GB table that sees 1,000 transactions per second, increasing every credit card limit by $1,000. Simply executing an update statement with no predicate would attempt to lock every row in the table and perform the update. Even if the lock were granted on such a busy table, you would block all other updates to those rows (they would enqueue) until you commit (or roll back), and you would also cause lots of reads of before images. This is bad in Oracle; <a href=>ORA-01555: snapshot too old</a>, and it also has <a href=>vacuum implications</a> in PostgreSQL. So in order to run our theoretical job above, we need some sort of procedure that runs in the background and commits the work every N rows. This is a pretty common technique in Oracle PL/SQL, and a simple construct exists:
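The PL/SQL construct is behind the link. Since a PL/pgSQL function runs inside a single transaction and cannot commit mid-loop, the PostgreSQL equivalent is usually driven from the client. Here is a minimal sketch of the batch-and-commit pattern using Python’s stdlib sqlite3 so it is self-contained; the table `accounts` and columns `id` and `card_limit` are hypothetical, and the same loop works against PostgreSQL with a driver such as psycopg2:

```python
import sqlite3

def incremental_update(conn, batch_size=1000):
    """Raise every card_limit by $1,000, committing every batch_size rows.

    Walking the primary key means each batch locks only the rows it
    touches, and each commit releases those locks before the next batch.
    """
    last_id = 0
    while True:
        cur = conn.execute(
            "SELECT id FROM accounts WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size))
        ids = [row[0] for row in cur]
        if not ids:
            break  # no rows left to process
        conn.execute(
            "UPDATE accounts SET card_limit = card_limit + 1000 "
            "WHERE id BETWEEN ? AND ?",
            (ids[0], ids[-1]))
        conn.commit()  # bound the transaction: N rows, then release
        last_id = ids[-1]
```

The trade-off is that the job is no longer atomic, so it must be written to be safely resumable if it dies between batches.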

Read More

int vs bigint

18 April 2008

So which one is a better use of space, given that an application uses 32-bit integers? In PostgreSQL, an int column takes only 4 bytes, and a bigint takes 8 bytes. So one would think there are significant space savings to be had by using int. In the real world, with disk space costs where they are, it just does not matter in a practical sense. Furthermore, if you are running on an <a href=>extent based filesystem</a> like ext3 or vxfs, you really don’t see any practical benefit. Why? Well, your extent size is likely to be much larger than (number of tuples * 4 bytes), so the difference in size is masked because the FS rounds up to the next extent size. Consider the example below:
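The actual example is behind the link; as a back-of-the-envelope sketch of the rounding argument, here is the arithmetic. The per-tuple overhead and the 8 MB extent size are illustrative assumptions, and the masking only holds while the 4-byte-per-tuple difference stays smaller than one extent:

```python
def allocated_size(raw_bytes, extent_bytes):
    """Round a logical size up to the next extent boundary."""
    extents = -(-raw_bytes // extent_bytes)  # ceiling division
    return extents * extent_bytes

TUPLES = 100_000
OVERHEAD = 28             # assumed per-tuple overhead, illustrative
EXTENT = 8 * 1024 * 1024  # assumed 8 MB extent, illustrative

int_bytes = TUPLES * (OVERHEAD + 4)     # 3,200,000 logical bytes
bigint_bytes = TUPLES * (OVERHEAD + 8)  # 3,600,000 logical bytes

# The 400 KB logical difference vanishes on disk: both tables fit in
# the same number of extents, so both occupy identical space.
same_on_disk = (allocated_size(int_bytes, EXTENT)
                == allocated_size(bigint_bytes, EXTENT))
```

Only the last, partially filled extent can ever differ, which is why the savings are negligible at these sizes.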

Read More

The next breed of DBAs?

10 April 2008

University of California, Berkeley and Carnegie Mellon University students now have what appears to be a <a href=>first-rate course</a> on database internals. What I like about this course is not only that it offers very practical experience, but also that it teaches internals. So much of how a database works has to do with how it’s built, and the syllabus suggests this course should turn out excellent talent, ready to develop on any database platform. However, how many of these students will go on to become DBAs? I hope many, but I am afraid it’s not likely. That’s too bad, because this is the kind of knowledge that makes excellent DBAs.

Read More