MongoSF 2011 slides: MongoDB Performance Tuning

Here are my slides from MongoSF 2011:

MongoDB Performance Tuning

[Read MongoSF 2011 slides: MongoDB Performance Tuning]


MongoSF 2011

I am very excited to speak @ MongoSF 2011. We have been doing quite a bit of performance tuning lately at Shutterfly as we deploy more and more MongoDB services. My hope is I can share some of what we have been doing in terms of performance tuning and performance management and it will be valuable to folks who may face performance challenges with MongoDB. I just wanted to put up some of the specific items I will be going over:

  • Utilizing the profiler, interpreting the data, and using it to make your application faster.  For instance, do you know how to see when your document updates cause a document to be read and re-written inside the datafile?
  • How to tune around the single db-wide lock in MongoDB, and how to minimize its impact.
  • How to monitor using mongostat.  What to look for, and what to do when you find something bad.  For instance, are you looking at ‘locked %’?  You should be!
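As a small taste of the profiler material, here is a sketch of scanning system.profile-style documents for updates that caused a document move inside the datafile. The helper name, threshold, and sample data are my own; with a live server you would feed it from db.system.profile instead:

```python
# Sketch: scan MongoDB profiler entries for updates that forced a
# document move (a rewrite elsewhere in the datafile) or that were
# simply slow. Field names follow the profiler output ("op", "millis",
# and the "moved" flag on relocating updates).

SLOW_MS = 100  # hypothetical threshold for a "slow" op, in milliseconds

def find_moved_updates(profile_docs, slow_ms=SLOW_MS):
    """Return profiler entries for updates that moved the document
    or exceeded the slow threshold."""
    flagged = []
    for doc in profile_docs:
        moved = doc.get("moved", False)
        slow = doc.get("millis", 0) >= slow_ms
        if doc.get("op") == "update" and (moved or slow):
            flagged.append(doc)
    return flagged

# Hypothetical sample documents, shaped like system.profile entries.
sample = [
    {"op": "update", "ns": "test.users", "millis": 5, "moved": True},
    {"op": "update", "ns": "test.users", "millis": 250},
    {"op": "query", "ns": "test.users", "millis": 500},
]
print(len(find_moved_updates(sample)))  # both update entries are flagged
```

On a running instance you would query the profile collection directly, e.g. `db.system.profile.find({"op": "update"})`, after enabling the profiler.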

[Read MongoSF 2011]


PGEast and MongoDB?

When I first started playing with MongoDB, I accidentally posted in a category that syndicated my posts to Planet PostgreSQL. I remember getting some pretty pissed-off emails and comments. Beyond my mistake of posting to the wrong aggregator, people seemed pretty angry about NoSQL in general. I remember thinking how closed-minded that was. These are different tools for different jobs, not a religion.

Well, it seems there is some hope! The PostgreSQL Conference Series folks have allocated some talks at PGEast for MongoDB. Nice job, guys. Let’s hope there are lots of good discussions; there are plenty of concepts MongoDB can pick up from a more mature RDBMS like PostgreSQL, and the reverse is true too!

[Read PGEast and MongoDB?]


Guy Harrison has a series of articles on real world NoSQL deployments over at GigaOM. The second installment was on MongoDB where he interviewed me on our deployment at Shutterfly. Check out Real World NoSQL: MongoDB at Shutterfly by Guy Harrison.

[Read GigaOM article – Real World NoSQL: MongoDB at Shutterfly]


Interview on NoSQLDatabases.com

The folks over at NoSQLDatabases.com have posted an interview they did with me on our implementation of MongoDB at Shutterfly. Good folks, great blog. Here is a link to the article. I talk a lot about what we have done at Shutterfly. In particular, one item I discuss is ORMs and the promise of not needing heavyweight mappers in a non-relational architecture. I also talk a bit about the challenges and benefits of modeling data as documents. I hope it’s helpful for folks thinking about using something like MongoDB.

[Read Interview on NoSQLDatabases.com]


The video of the presentation I gave at MongoSV, Sharing Life’s Joy using MongoDB: A Shutterfly Case Study, is now online. Nice job editing in the slides, 10gen!

Here are just the slides:
Sharing Life’s Joy with MongoDB

[Read Sharing Life’s Joy using MongoDB: A Shutterfly Case Study]


I am speaking at MongoSV 2010

I am excited to announce I will be speaking at the MongoSV conference Dec 3, 2010. My talk, Sharing Life’s Joy using MongoDB: A Shutterfly Case Study, will focus on how we have been using MongoDB here at Shutterfly over the last year. I plan to outline some of the specific cases where MongoDB has been a massive win, and some areas to be careful of if you are planning your own MongoDB application. This is a follow-on to my previous talk at MongoSF, with more technical depth: I will show code examples and various use cases outlining parts of our journey.

I hope to see you all there; it’s shaping up to be an amazing lineup. I think this conference will be great for anyone new to non-relational data stores, as well as for people who are already neck deep in a MongoDB implementation. So sign up NOW!

[Read I am speaking at MongoSV 2010]


Shutterfly is looking for: Staff Engineer

Shutterfly is looking for: Staff Engineer – Platform
http://jobvite.com/m?35y6Xfwq #job

[Read Shutterfly is looking for: Staff Engineer]


MongoDB: Lagged Replica with Replica Sets

In an enterprise database architecture, it’s very common to create a standby or replica database with a ‘lag’ in its state relative to the primary: operations applied to the primary are not seen on the replica for some pre-determined amount of time. The purpose of such an architecture is to protect yourself against an accidental deletion, code bug, corruption, table drop, etc. If something really bad happens to the primary, it may be replicated before anyone can step in and correct it. A lagged replica solves this problem by giving you a window in which to stop the replica from ingesting the change, allowing an operator to use the clean data to fix the primary or even roll back to an earlier image.

How long should you lag your replica? That’s up to you, but as a general rule of thumb, 8 hours leaves reasonable time to detect a data problem and take corrective action.

MongoDB now has this capability in the 1.7.x versions. For now, you will have to use the nightly builds to get it, but upon release it will be generally available. Here is how it works.

Set up a replica set as normal, but be sure to specify one slave with some amount of lag via slaveDelay (in seconds). It’s important to set priority=0 on this slave so it never automatically becomes master. Thus, it makes sense to always have at least 1 primary and 2 replicas in a lagged configuration: 1 primary, 1 replica for failover, and then the lagged replica to ensure data safety.

Here is the config, entered in the mongo shell (slaveDelay of 120 seconds is used here just for demonstration):

c={_id:"sfly",
         members:[
             {_id:0,host:"host_a:27017"},
             {_id:1,host:"host_b:27017"},
             {_id:2,host:"host_c:27017",priority:0,slaveDelay:120},
             {_id:3,host:"host_d:27017",arbiterOnly:true}]
}
>
{
	"_id" : "sfly",
	"members" : [
		{
			"_id" : 0,
			"host" : "host_a:27017"
		},
		{
			"_id" : 1,
			"host" : "host_b:27017"
		},
		{
			"_id" : 2,
			"host" : "host_c:27017",
			"priority" : 0,
			"slaveDelay" : 120
		},
		{
			"_id" : 3,
			"host" : "host_d:27017",
			"arbiterOnly" : true
		}
	]
}
> rs.initiate(c);
{
	"info" : "Config now saved locally.  Should come online in about a minute.",
	"ok" : 1
}
> rs.conf()
{
	"_id" : "sfly",
	"version" : 1,
	"members" : [
		{
			"_id" : 0,
			"host" : "host_a:27017"
		},
		{
			"_id" : 1,
			"host" : "host_b:27017"
		},
		{
			"_id" : 2,
			"host" : "host_c:27017",
			"priority" : 0,
			"slaveDelay" : 120
		},
		{
			"_id" : 3,
			"host" : "host_d:27017",
			"arbiterOnly" : true
		}
	]
}
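Once the set is running, you can sanity-check that the delay is actually being honored by comparing member optimes from rs.status(). A minimal sketch, with a hypothetical helper and sample numbers (real optimes are BSON timestamps; they are simplified to seconds here):

```python
# Sketch: check that a delayed member's replication lag is at least
# the configured slaveDelay, using rs.status()-style optimes.

def check_delay(primary_optime, member_optime, slave_delay):
    """Return the member's lag in seconds and whether it meets the
    configured delay."""
    lag = primary_optime - member_optime
    return {"lag_secs": lag, "honoring_delay": lag >= slave_delay}

# Hypothetical: primary has applied ops through t=1000, and the
# slaveDelay=120 member has applied ops through t=870.
status = check_delay(1000, 870, slave_delay=120)
print(status)  # lag of 130 seconds, at least the configured 120
```

With a live set you would pull the optimes from rs.status() in the shell, or from the replSetGetStatus command via pymongo.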

[Read MongoDB: Lagged Replica with Replica Sets]


OSX + Bit.ly

A couple of months ago I didn’t even know what bit.ly was; I was using tinyurl for everything. Sheez, how Web 1.0 of me. But after bit.ly started using MongoDB for its backend services, I started using it for URL shortening. I just love the idea of web services, and bit.ly was crying out for a nice OSX integration. I wanted full OSX compatibility instead of having to bring up a web browser each time I needed to shorten a URL. This Automator script turned that all around for me. Now I use bit.ly for almost every URL I copy/paste.

[Read OSX + Bit.ly]


Why Not Auto Increment in MongoDB

I came across [this blog post][1] with a nice pattern for auto-increment in MongoDB. It’s a great post, but there is something to think about beyond how to logically perform the operation: performance.

The idea presented in the blog is to utilize the MongoDB findAndModify command to pluck sequences from the DB using the atomic nature of the command.

# Atomically fetch and increment the counter document for "users"
# in the "seq" collection, then use the value as _id.
counter = db.command("findandmodify", "seq",
                     query={"_id": "users"},
                     update={"$inc": {"seq": 1}})
f = {"_id": counter['value']['seq'], "data": "somedata"}
c.insert(f)

When using this technique, each insert requires both the insert and the findAndModify command, which is itself a query plus an update. So you now perform 3 operations where there used to be one. Not only that, but there are 3 more logical I/Os due to the query, and those might become physical I/Os. This pattern is easily seen with the mongostat utility.

Maybe you still meet your performance goals. But then again maybe not.

I did some testing to play with the various options, comparing a complete insert cycle with a unique key. The test is a simple Python program that performs inserts using pymongo. The program is a single process, and I ran 3 concurrent processes just to simulate a bit of concurrency. The save uses safe_mode=False. I compared the findAndModify approach against the native BSON ObjectId approach and a Python UUID generation approach.

The results are:

Type                          Inserts/s
findAndModify auto-increment    3000
Native BSON ObjectIds          20000
Python UUID                     9000

So clearly, if the problem being solved can be achieved using the native BSON ObjectId type, it should be. This is the fastest way to save data into MongoDB in a concurrent application.

f = {"data": "somedata"}    # let MongoDB generate an ObjectId for _id
c.insert(f)

That said, what if an auto-increment / concurrent unique key generator is still required? One option is to use a relational store with a native sequence generation facility, like PostgreSQL. In my testing, PostgreSQL achieved 389,000 keys/sec when fetching from a single sequence using about 30 processes, so fetching sequences clearly outpaces MongoDB’s ability to insert them. Something like the following is possible:

# Fetch the next value from a PostgreSQL sequence, then use it as _id.
cur.execute("SELECT nextval('users_seq')")
s = cur.fetchone()
f = {"_id": s[0], "data": "somedata"}
c.insert(f)
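The Python UUID approach from the benchmark was not shown above; here is a minimal sketch (the document shape and helper name are my own):

```python
import uuid

# Sketch of the Python UUID approach: generate a unique _id
# client-side, so no extra database round-trip is needed. The key is
# larger and less index-friendly than a BSON ObjectId, which likely
# accounts for some of the throughput gap in the results above.

def make_doc(data):
    return {"_id": uuid.uuid4().hex, "data": data}

f = make_doc("somedata")
# c.insert(f)  # insert with pymongo as before
print(len(f["_id"]))  # 32 hex characters
```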

The stack used in this test is:
- Sun X2270 dual quad core AMD 2376, 24GB RAM, 2 100GB SATA Drives, software RAID.
- MongoDB 1.5.7
- PostgreSQL 8.4.2
- Python 2.6.4
- Pymongo 1.7
- Linux CentOS 2.6.18-128.el5 x86_64

[1]: http://shiflett.org/blog/2010/jul/auto-increment-with-mongodb

[Read Why Not Auto Increment in MongoDB]


I just uploaded a little utility that pulls performance data from MongoDB and loads it into rrdtool for trending and analysis.

Sure, there are options like Cacti out there, but this is just a simple, raw utility rather than something designed for a larger environment. Simple.

Feedback would be awesome.

http://github.com/kgorman/mongo_graph

[Read mongo_graph: a rrdtool graphing utility for MongoDB]