Database engineering, architecture, startups, and other assorted bits

Open source databases; scaling the application stack.

After reading Allan Packer’s excellent blog entry, Are Proprietary Databases Doomed?, a couple of additional thoughts came to mind. He mentions that open source databases (OSDBs) are becoming ‘good enough’, and explains that most users don’t need or use the vast majority of the features available in the proprietary databases, and thus OSDBs are becoming more popular. Coming from a proprietary background myself, scaling Oracle at PayPal and then eBay, I know that side of the business pretty well. In the last year, I have made the professional leap to OSDBs in a big way at Hi5, where we run quite a bit of workload on an OSDB: PostgreSQL. So Allan’s points hit home for me.

However, I think there is additional context to why the OSDB market is growing. The stack around OSDBs is changing in ways that let them gain market share faster than they otherwise would. Take any new large scale internet architecture: they are almost universally using OSDBs. Is this trend because OSDBs are getting so good, and copying the feature set of the proprietary databases? No, not really. It’s because the OSDB software has become *good enough*: good enough when we, as software engineers, code around many of the features the OSDBs lack (and proprietary vendors market as differentiators). In order to scale your company on the internet, you must scale the entire application stack, not just part of your application, hence the adoption of architectures incorporating splitting and sharding.
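To make the splitting and sharding idea concrete, here is a minimal sketch of application-level shard routing in Python. It is an illustration only, not the scheme any particular site uses: the shard list, the host names, and the `shard_for` and `dsn_for` helpers are all hypothetical, and hashing on a user id is just one common choice of shard key.

```python
import hashlib

# Hypothetical shard map: each entry stands for one small, independent
# PostgreSQL database. The host names are placeholders.
SHARD_DSNS = [
    "postgresql://db0.example.internal/app",
    "postgresql://db1.example.internal/app",
    "postgresql://db2.example.internal/app",
    "postgresql://db3.example.internal/app",
]


def shard_for(user_id: int, shard_count: int = len(SHARD_DSNS)) -> int:
    """Map a user id to a shard index with a stable hash.

    hashlib is used instead of the built-in hash() so the mapping is the
    same across processes and restarts.
    """
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def dsn_for(user_id: int) -> str:
    """Return the connection string of the database holding this user."""
    return SHARD_DSNS[shard_for(user_id)]


if __name__ == "__main__":
    for uid in (1, 2, 3, 1001):
        print(uid, "->", dsn_for(uid))
```

The application decides where a row lives before it ever opens a connection, so each database only sees ordinary tables. Plain modulo hashing is the simplest possible mapping; it makes adding shards later expensive, which is why real deployments often use a lookup table or a fixed set of logical shards instead.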

So where an OSDB may fall short on feature set or performance, we engineer the shortcoming out of the equation. This approach renders many of the new features of the proprietary databases, well, not so attractive. Take, for instance, partitioning: stable and robust in Oracle, and, well, not so good in PostgreSQL. Architectures like ours here at Hi5 (the .pdf is dated at this point) are designed such that we don’t need the database to handle some of these more complex configurations like partitioning. The application is partitioned natively, and all the database engine sees are regular tables, logically partitioned by the application. OK, fine, so what about result caching? Nope, we use memcached for that. How about stored outlines and plan stability? Not needed; we partition our databases so small and simple that plans rarely deviate. Parse storms? Sure, we see that problem, but heck, the tables are so small (in comparison) that the impact is fairly small. What about online redefinition? Forget it, we can ‘mark down’ databases, features, or tables in our application so that the customer does not see much impact. And what is great is that the open source model lends itself to this architecture: since it’s free in terms of license costs, we can split and make more databases as needed, and the near zero entry cost allows for rapid innovation and growth. Sure, my point is more directed at internet based database usage than at more traditional usage, but that’s the place where I play.
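Two of the workarounds above, caching results outside the database and ‘marking down’ a database in the application, can be sketched together in a few lines of Python. This is a hedged illustration under stated assumptions, not our production code: `DictCache` is an in-process stand-in for a memcached client, and `MARKED_DOWN`, `fetch_profile`, and the `query_db` callback are hypothetical names invented for the example.

```python
from typing import Callable, Dict, Optional, Set


class DictCache:
    """In-process stand-in for a memcached client (get/set only)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


# Hypothetical registry of shard ids that operators have taken out of service.
MARKED_DOWN: Set[int] = set()


def fetch_profile(
    user_id: int,
    shard_id: int,
    cache: DictCache,
    query_db: Callable[[int], str],
) -> Optional[str]:
    """Cache-aside read that degrades gracefully when a shard is marked down."""
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached  # cache hit: the database is never touched
    if shard_id in MARKED_DOWN:
        return None  # caller renders a degraded page instead of an error
    value = query_db(user_id)  # cache miss: read from the owning shard
    cache.set(key, value)
    return value


if __name__ == "__main__":
    cache = DictCache()
    print(fetch_profile(7, 0, cache, lambda uid: f"user-{uid}"))
    MARKED_DOWN.add(0)
    # Still served, because the first call populated the cache.
    print(fetch_profile(7, 0, cache, lambda uid: f"user-{uid}"))
```

Checking the cache before the mark-down flag means already-cached data keeps being served while a shard is offline for maintenance, which is part of why the customer does not see much impact.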

I think when you look at why OSDBs have gained popularity, you also have to consider how they are incorporated into a scalable, robust application stack. I think the idea of engineering your whole stack will continue to grow (at least in the internet space), and perhaps an even larger market share will come with it. This is one of the reasons I made the leap to the open source community, and I am glad I did.

2 Discussions on
“Open source databases; scaling the application stack.”
  • I think it comes down to two things: cost and (as Jignesh says) ease of use. As Postgres and MySQL (and Firebird, Ingres, etc.) get easier to use, more people are willing to play with them. As they play around, they can see exactly what they are capable of, so they start applying them.

    Eventually, they start making recommendations that smaller or non-mission critical apps be implemented with an OSDB.

    Startups are the perfect fit for an OSDB model as they rarely have the huge amounts of capital to start with a commercial database. An example is an Oracle Enterprise Edition database on a 4 socket computer with quad core CPUs. Without any discounts, that’s 4 * 4 * $47k = $752,000. That doesn’t include the hardware.

    But if you want reliability, scalability and manageability in a general purpose database, no one does that better than Oracle, and large enterprises are willing to pay for that. Oracle also discounts quite deeply for larger customers.

    Good post.


  • Kenny,

    I agree with you. It’s really the applications that need to be aware of what is available, what is not, and what needs to be worked around with Open Source Databases. Once that happens and the ecosystem matures, the differentiators of proprietary databases boil down to their value-add to this new ecosystem.

    Another big holdup, I believe, is Ease of Use. As things become easier and easier to do with OSDBs, the basic feature sets needed are met, and the deficiencies can be worked around, you will see more and more people using OSDBs.

