Hello ObjectRocket

I am very excited to say that it’s time to say hello to ObjectRocket! A few weeks ago I made the move from a great place to an even better opportunity. A couple of amazing guys and I founded ObjectRocket Inc., and it’s time to make ObjectRocket my full-time job.

I have been at Shutterfly for almost two and a half years now, and I am very proud of the team’s achievements in that time. We brought a completely new technology into Shutterfly, and while we had some kinks, I am very proud of where we ended up. Shutterfly is one of the larger and perhaps more complex MongoDB production installations in existence. I will continue to help Shutterfly through the transition for some time.

We created ObjectRocket with the idea that we can do something better than anyone else, even the ‘big guys’. It’s time for me to focus on making customers happy with fantastic engineering and technology. Our initial concept has some legs, so I am going to pour my energies into making this idea a success. Frankly, I have never been so excited about anything I have done in my life.

What is it we are doing? Well, we aren’t ready to announce that quite yet. But I can say that we plan to solve what we believe are serious problems in the MongoDB and NoSQL space. For now, it’s heads down on building something completely new and amazing. If you are curious, we will provide updates on Twitter, so just follow us.

Posted in Database Engineering, Mongodb | Leave a comment

Mongoperf

Starting with version 2.1, MongoDB has a new utility called Mongoperf. Mongoperf is a great little utility for quickly testing the I/O performance of your system. I thought I would go over it a little bit.

Mongoperf can be found in the bin directory of your 2.1+ MongoDB distribution. You invoke it with options describing a sample I/O run, and it reports the performance of your disk I/O subsystem by generating random I/O over a single file. One great aspect of this utility is that, when mmf is set to true, it accesses the I/O subsystem through the same memory-mapped files interface that MongoDB itself uses. Running the utility with --help lists the available options. The utility takes a JSON document as input. I typically like to store the options in a .js file and redirect that into the utility’s stdin, like so:

$>echo "{nThreads:2,w:true,r:false,mmf:true}" > perfsession1.js
$>
$>mongoperf < perfsession1.js

The most powerful use of this tool is to bypass the Linux page cache and perform direct I/O, which gives a true gauge of how fast your storage subsystem is. This is done via the {mmf:false} setting in the config document. When mmf is false, Mongoperf performs direct I/O via the O_DIRECT flag, so no system memory is used to cache file buffers. That said, anything below the Linux filesystem could still be caching, including controller and drive caches. For example:

$>echo "{nThreads:2,w:true,r:false,mmf:false,fileSizeMB:10000}" > perfsession1.js
$>
$>mongoperf < perfsession1.js

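The output shows the throughput as the threads ramp up to the nThreads limit you specify; after that it just keeps running at that level of concurrency until you stop it. The output is in disk operations per second. For example, here is a run with mmf:true and a 256 MB file, so the numbers reflect memory-mapped access that can be served from cache rather than raw disk speed: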

mongoperf < perfsession1.js
mongoperf
use -h for help
options:
{ nThreads: 2, w: true, r: false, mmf: true, fileSizeMB: 256 }
creating test file size:256MB ...
testing...
new thread, total running : 1
9922222 ops/sec 
11197085 ops/sec 
11130190 ops/sec 
11308340 ops/sec 
11068894 ops/sec 
11084711 ops/sec 
11447503 ops/sec 
11233556 ops/sec 
new thread, total running : 2
12925223 ops/sec 
12142161 ops/sec 
12290461 ops/sec 
12476609 ops/sec 
12392879 ops/sec 
11772720 ops/sec 
12308988 ops/sec
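
If you want to compare runs, it helps to boil output like this down to one number per thread count. Here is a rough sketch in Python that does so; it assumes the output keeps the "new thread, total running : N" and "N ops/sec" line shapes shown above, and the function name and the shortened sample are my own, not part of mongoperf.

```python
import re
from collections import defaultdict

# Line shapes taken from the sample output in this post.
THREAD_LINE = re.compile(r"new thread, total running : (\d+)")
OPS_LINE = re.compile(r"^\s*(\d+) ops/sec")


def average_ops_by_threads(output_text):
    """Map each thread count to the mean ops/sec reported while it was active."""
    samples = defaultdict(list)
    current_threads = None
    for line in output_text.splitlines():
        thread_match = THREAD_LINE.search(line)
        if thread_match:
            current_threads = int(thread_match.group(1))
            continue
        ops_match = OPS_LINE.match(line)
        # Ignore ops lines seen before the first "new thread" marker.
        if ops_match and current_threads is not None:
            samples[current_threads].append(int(ops_match.group(1)))
    return {threads: sum(values) / len(values) for threads, values in samples.items()}


# A shortened copy of the run above, used only to exercise the parser.
SAMPLE = """\
new thread, total running : 1
9922222 ops/sec
11197085 ops/sec
new thread, total running : 2
12925223 ops/sec
12142161 ops/sec
"""

if __name__ == "__main__":
    for threads, mean_ops in sorted(average_ops_by_threads(SAMPLE).items()):
        print(f"{threads} thread(s): {mean_ops:,.0f} ops/sec on average")
```

In practice you would feed it the captured stdout of a real run rather than the inline sample.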

There are a couple of things to note about this utility. For one, be sure to specify a reasonably large file for fileSizeMB, something like 10000. This ensures you are wider than a single stripe if using RAID, and that you have enough data on disk for random I/O to be truly random. Let the test run long enough for the system to normalize (caches get populated, etc.); 3-5 minutes or so is a good general guideline. Try different nThreads values to get a feel for the level of disk concurrency your system can tolerate. Another great test is to run the read and write options together and separately, to get a feel for how well your system does on each of these types of workload. For instance, you can test the write cache on a controller quite effectively using this approach.
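
To work through those combinations without hand-editing the config each time, something like the following Python sketch can generate one config document per run. It only uses the option names that appear in this post (nThreads, r, w, mmf, fileSizeMB); the function name, the labels, and the thread counts are illustrative assumptions.

```python
import itertools
import json


def config_matrix(thread_counts, file_size_mb=10000, mmf=False):
    """Yield (label, json_text) pairs covering read-only, write-only, and mixed runs."""
    # mode name -> (r, w) flags
    modes = {"read": (True, False), "write": (False, True), "mixed": (True, True)}
    for n_threads, (mode, (read, write)) in itertools.product(thread_counts, modes.items()):
        config = {
            "nThreads": n_threads,
            "r": read,
            "w": write,
            "mmf": mmf,
            "fileSizeMB": file_size_mb,
        }
        yield f"{mode}-{n_threads}threads", json.dumps(config)


if __name__ == "__main__":
    # Print each config; save one to a file and redirect it into mongoperf.
    for label, text in config_matrix([1, 2, 4, 8]):
        print(label, text)
```

Each printed document can be saved to its own .js file and redirected into mongoperf the same way as perfsession1.js above.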

In the future I think an option for running I/O against multiple files would be great and would allow for a more realistic workload. Perhaps an option to output the results in BSON/JSON would be cool too.

Posted in Mongodb | Leave a comment

Chevrolet Volt

I have had my Volt for about a month now, and it’s a pretty darn amazing piece of engineering. If you don’t know much about the Volt, here is a quickie intro.

If you really want to geek out, check out the deep dive series of videos on the amazing drivetrain in the car.

Posted in Random | Leave a comment

SOPA and PIPA explained

This is a nice explanation of the implications of SOPA and PIPA.

Posted in Random | Leave a comment

MongoSV follow up

What a madhouse!  The 2011 MongoSV conference was a blast, and it didn’t hurt that there were over 1000 attendees.  I really enjoyed some excellent talks and good discussions with colleagues and 10gen employees.  Plus, the opportunities in the evenings for a couple of cold beverages were fantastic.

I have a few takeaways from this year’s conference:

  • The new aggregation features of MongoDB are going to be great, I am excited to start playing with them.
  • 10gen is listening to us about the locking behavior of MongoDB, and the roadmap is promising.
  • Lots of people use AWS for MongoDB.  More than I would have thought, considering EBS is horrid.  Most everyone thought AWS and EBS were painful, but the pain was outweighed by the perceived benefits.
  • Everyone agrees schema design is key.  The money quote from the conference was:
    Schema free != Design free –@nathenharvey @nathanharvey

  • Some people are using MongoDB for what I would call non-optimal use cases.  I think one has to think hard about when to use MongoDB and when not to.  It’s an exciting product, but not a cure-all.  I listened to more than one talk where I expect the folks involved will have a hard time continuing to use MongoDB.
  • There is still a lot to be learned about how to operate MongoDB in a production environment.  Hrm, seems like a good subject for future talks.
  • Someone needs to do a talk on locking internals so folks can really understand that locks in MongoDB are not the same as locks in most RDBMSs.
  • The SJC airport is awesome.  I had to fly out Friday night to LA, and it was seamless.  Sorry I missed the after-party.

Here is the video from my session:


Watch live video from mongodb on Justin.tv

Posted in Mongodb | Leave a comment

Masters of MongoDB

The day before this year’s MongoSV conference, a few folks chosen by 10gen and labelled the Masters of MongoDB gathered to discuss the MongoDB platform, swap war stories, share gripes, and give the 10gen folks feedback on the direction and roadmap of MongoDB.

First of all, I am incredibly honored to be included in such a high-caliber group. But beyond that, I was struck by how committed 10gen is to getting the roadmap right and how closely they are listening to the community. With that kind of commitment, I feel the areas for improvement in MongoDB will get good attention and focus. I think this is a huge factor to consider when looking at MongoDB. The community is very, very strong.  I made lots of good contacts and was able to connect with friends.

Thanks 10gen for putting on such a good event, and above all, listening to your customers.

Posted in Mongodb | Leave a comment

MySQL/Hbase tech talk at FB

I was sorry to miss the event, as some really good stuff was being discussed. Paul makes some really great points about single server performance in the Q&A at the end.

Watch live streaming video from fbtechtalks at livestream.com
Posted in Database Engineering, MySQL | Leave a comment

I am speaking at the MongoSV 2011 conference

I had a great time presenting at MongoSV last year.  This year I am presenting a talk on MongoDB performance tuning, similar to what I presented at MongoSF but with some updated items around scalability, tuning, and hardware!  My colleague Luciano Resende is also speaking.  I hope to give some insight into some of the work we have done at Shutterfly around performance tuning and scalability.  I also hope to speak in more detail about our implementation of the Facebook Flashcache technology.  If you haven’t signed up, you can register here for the conference. Here is a video 10gen put out for the conference:

*changed the video to the updated version from 10gen

Posted in Mongodb | 3 Comments

Stellar use cases for MongoDB

MongoDB has a nice wide sweet spot where it’s a very useful persistence platform; however, it’s not for everything. I thought I would quickly enumerate a few great use cases that have come up in the last year and a half and why they are such a great fit for MongoDB.

1. Documents: Using MongoDB instead of an XML-based system.

MongoDB is a document-oriented data store. XML is a document language. By moving a traditional XML app to MongoDB one can gain a few key advantages. The typical pattern in XML is to fetch an entire document, work with it, and put it back to the server. This approach has many downsides, including the amount of data transmitted over the wire, collision detection/resolution, data set size, and server side overhead. In the MongoDB model, documents can be updated atomically, fetched by index, and even partially fetched. Applications are simpler, faster, and more robust.
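
To make the difference concrete, here is a small in-memory imitation in Python of a partial update and a partial fetch. It does not talk to MongoDB at all: apply_set and project are hypothetical helpers that mimic only the top-level behavior of a $set update and a field-limited fetch, and the sample article is made up.

```python
import json


def apply_set(document, update):
    """Apply a top-level {"$set": {...}} update to a copy of the document."""
    # Dotted paths and other operators are deliberately not handled here.
    updated = dict(document)
    updated.update(update["$set"])
    return updated


def project(document, fields):
    """Return only the requested top-level fields, like a partial fetch."""
    return {name: document[name] for name in fields if name in document}


# Made-up document with a large body, to show the size difference.
article = {"_id": 1, "title": "Hello", "body": "x" * 5000, "views": 10}
update = {"$set": {"views": 11}}

if __name__ == "__main__":
    # Fetch-modify-put moves the whole document; the update message is tiny.
    print("whole document bytes:", len(json.dumps(article)))
    print("update message bytes:", len(json.dumps(update)))
    print("partial fetch:", project(apply_set(article, update), ["title", "views"]))
```

The point is only the size gap: the fetch-modify-put pattern ships the whole document both ways, while the update and the partial fetch carry just the fields involved.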

2. Metadata storage systems.

Any system that stores metadata can be a great use case for MongoDB. Such systems typically have a pattern of adding attributes about some type of entity, and then needing to query/sort/filter based on those attributes. The prototypical use case for such a system is tags. The tag implementation is so superior in MongoDB that it almost single-handedly compels one to use MongoDB for any system needing tags. Simply put:

db.mymetadata.save({stuff:"some data here", thing:"/x/foo/bar.mpg", tags:['cats','beach','family']})
db.mymetadata.ensureIndex({"tags":-1})
db.mymetadata.find({tags:'cats'}).explain()
...
"indexBounds" : {
		"tags" : [
			[
				"cats",
				"cats"
			]
		]

In many metadata systems the schema may vary depending on the metadata itself. MongoDB’s schema-free documents allow a huge degree of flexibility in the data modeling of applications that store metadata. Imagine a media storage service that stores video and image metadata in the same collection, with different attributes for each type of media. No joins are needed on query, and the logical I/O path is minimized! MongoDB now supports sparse indexes, so indexes on attributes that are not in every document are kept at a minimum size.
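
For intuition about why tag lookups and sparse indexes work out so well, here is a toy in-memory model in Python. It is not how MongoDB implements its indexes; it only mimics two ideas from this section: an array field gets one index entry per element, and a sparse index holds no entry for documents that lack the field. The helper names and every sample value other than the bar.mpg document are hypothetical.

```python
from collections import defaultdict


def build_index(documents, field):
    """Build a value -> document positions map for one field."""
    index = defaultdict(list)
    for position, document in enumerate(documents):
        if field not in document:
            continue  # sparse behavior: no entry for documents without the field
        values = document[field]
        if not isinstance(values, list):
            values = [values]
        for value in values:  # multikey behavior: one entry per array element
            index[value].append(position)
    return index


def find_by_index(documents, index, value):
    """Look up documents through the index instead of scanning them all."""
    return [documents[position] for position in index.get(value, [])]


# Video and image metadata side by side, with different attributes.
MEDIA = [
    {"thing": "/x/foo/bar.mpg", "tags": ["cats", "beach", "family"], "duration_sec": 42},
    {"thing": "/x/foo/baz.jpg", "tags": ["cats"], "width": 800},
    {"thing": "/x/foo/qux.jpg"},
]

if __name__ == "__main__":
    tag_index = build_index(MEDIA, "tags")
    print([doc["thing"] for doc in find_by_index(MEDIA, tag_index, "cats")])
    print(dict(build_index(MEDIA, "duration_sec")))
```

Note that the duration_sec index ends up with a single entry even though the collection holds three documents, which is the size benefit a sparse index is after.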

3. Read intensive systems

Any system where the amount of change is low and the read volume is high is a nice sweet spot for MongoDB. MongoDB scales reads nicely using both replica sets (setting SLAVE_OK) and sharding. Combine this with the document model and the metadata storage capabilities, and one has an excellent system for, say, a Gawker clone. Reads can come off any one of N sharded nodes by, say, story_id, and reads can be geographically targeted to a nearby slave. Keep your data clustered by key for super fast I/O.
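
As a rough picture of that read path, here is a Python sketch that routes a story_id to a shard by key range and then rotates reads across that shard's members. This only illustrates the idea; in a real deployment mongos and the driver do the routing. The ShardRouter class, the split points, and the node names are all hypothetical.

```python
import bisect
import itertools


class ShardRouter:
    """Pick a shard by key range, then rotate reads over that shard's members."""

    def __init__(self, split_points, shards):
        # split_points[i] is the lowest key owned by shards[i + 1].
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = list(split_points)
        self.shards = shards
        self._read_cycles = [itertools.cycle(members) for members in shards]

    def shard_for(self, story_id):
        """Return the index of the shard whose key range contains story_id."""
        return bisect.bisect_right(self.split_points, story_id)

    def node_for_read(self, story_id):
        """Return the next member of the owning shard, round-robin."""
        return next(self._read_cycles[self.shard_for(story_id)])


if __name__ == "__main__":
    router = ShardRouter(
        split_points=[1000, 2000],
        shards=[["shard0-a", "shard0-b"], ["shard1-a", "shard1-b"], ["shard2-a", "shard2-b"]],
    )
    for story_id in (5, 5, 1500, 2500):
        print(story_id, "->", router.node_for_read(story_id))
```

Round-robin is just the simplest stand-in here; geographic targeting would replace the rotation with a choice of the member closest to the client.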

Posted in Data Architecture, Database Engineering, Mongodb | Leave a comment

MongoSF 2011 slides: MongoDB Performance Tuning

Here are my slides from MongoSF 2011:

MongoDB Performance Tuning

Posted in Mongodb | Leave a comment