<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>kennygorman.com</title>
	<atom:link href="http://www.kennygorman.com/wordpress/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.kennygorman.com/wordpress</link>
	<description>database engineering, architecture, and other assorted bits</description>
	<lastBuildDate>Tue, 24 Aug 2010 21:02:06 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>MongoDB: Lagged Replica with Replica Sets</title>
		<link>http://www.kennygorman.com/wordpress/?p=699</link>
		<comments>http://www.kennygorman.com/wordpress/?p=699#comments</comments>
		<pubDate>Tue, 24 Aug 2010 20:57:38 +0000</pubDate>
		<dc:creator>kgorman</dc:creator>
				<category><![CDATA[Data Architecture]]></category>
		<category><![CDATA[Database Engineering]]></category>
		<category><![CDATA[Mongodb]]></category>
		<category><![CDATA[replica sets]]></category>

		<guid isPermaLink="false">http://www.kennygorman.com/wordpress/?p=699</guid>
		<description><![CDATA[In an enterprise database architecture, it&#8217;s very common to create a standby or replica database with a &#8216;lag&#8217; in its state relative to the primary. Operations applied to the primary are not seen on the replica for some amount of &#8230; <a href="http://www.kennygorman.com/wordpress/?p=699">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>In an enterprise database architecture, it&#8217;s very common to create a standby or replica database with a &#8216;lag&#8217; in its state relative to the primary.  Operations applied to the primary are not seen on the replica until a predetermined amount of time has passed.  The purpose of such an architecture is to protect yourself against an accidental deletion, code bug, corruption, table drop, etc.  If something really bad happens to the primary, that horrible thing may be replicated before someone can step in and correct it.  A lagged replica solves this problem by giving you time to stop the replica from applying the change, so an operator can use the clean data to fix the primary or even roll back to an earlier image.</p>
<p>How long should you lag your replica?  That&#8217;s up to you, but as a general rule of thumb, 8 hours leaves reasonable time to detect a data problem and take corrective action.</p>
<p>MongoDB now has this capability in the 1.7.x versions.  For now, you have to use the nightly builds to get it, but it will be generally available once 1.7 is released.  Here is how it works.</p>
<p>Set up a replica set as usual, but configure one slave with some amount of lag.  It&#8217;s important to set priority=0 on this slave so it never automatically becomes master.  Because the lagged member can never take over, it makes sense to always have at least 1 primary and 2 replicas in a lagged replica configuration: 1 primary, 1 replica for failover, and a lagged replica to ensure data safety.<img src="http://www.gliffy.com/pubdoc/2214542/L.png"/></p>
<p>In the above example, here is the config:</p>

<div class="wp_syntax"><div class="code"><pre class="javascript"><span style="color: #66cc66;">&gt;</span> c=<span style="color: #66cc66;">&#123;</span>_id:<span style="color: #3366CC;">&quot;sfly&quot;</span>,
         members:<span style="color: #66cc66;">&#91;</span>
             <span style="color: #66cc66;">&#123;</span>_id:<span style="color: #CC0000;">0</span>,host:<span style="color: #3366CC;">&quot;host_a:27017&quot;</span><span style="color: #66cc66;">&#125;</span>,
             <span style="color: #66cc66;">&#123;</span>_id:<span style="color: #CC0000;">1</span>,host:<span style="color: #3366CC;">&quot;host_b:27017&quot;</span><span style="color: #66cc66;">&#125;</span>,
             <span style="color: #66cc66;">&#123;</span>_id:<span style="color: #CC0000;">2</span>,host:<span style="color: #3366CC;">&quot;host_c:27017&quot;</span>,priority:<span style="color: #CC0000;">0</span>,slaveDelay:<span style="color: #CC0000;">120</span><span style="color: #66cc66;">&#125;</span>,
             <span style="color: #66cc66;">&#123;</span>_id:<span style="color: #CC0000;">3</span>,host:<span style="color: #3366CC;">&quot;host_d:27017&quot;</span>,arbiterOnly:<span style="color: #003366; font-weight: bold;">true</span><span style="color: #66cc66;">&#125;</span><span style="color: #66cc66;">&#93;</span>
<span style="color: #66cc66;">&#125;</span>
<span style="color: #66cc66;">&gt;</span>
<span style="color: #66cc66;">&#123;</span>
	<span style="color: #3366CC;">&quot;_id&quot;</span> : <span style="color: #3366CC;">&quot;sfly&quot;</span>,
	<span style="color: #3366CC;">&quot;members&quot;</span> : <span style="color: #66cc66;">&#91;</span>
		<span style="color: #66cc66;">&#123;</span>
			<span style="color: #3366CC;">&quot;_id&quot;</span> : <span style="color: #CC0000;">0</span>,
			<span style="color: #3366CC;">&quot;host&quot;</span> : <span style="color: #3366CC;">&quot;host_a:27017&quot;</span>
		<span style="color: #66cc66;">&#125;</span>,
		<span style="color: #66cc66;">&#123;</span>
			<span style="color: #3366CC;">&quot;_id&quot;</span> : <span style="color: #CC0000;">1</span>,
			<span style="color: #3366CC;">&quot;host&quot;</span> : <span style="color: #3366CC;">&quot;host_b:27017&quot;</span>
		<span style="color: #66cc66;">&#125;</span>,
		<span style="color: #66cc66;">&#123;</span>
			<span style="color: #3366CC;">&quot;_id&quot;</span> : <span style="color: #CC0000;">2</span>,
			<span style="color: #3366CC;">&quot;host&quot;</span> : <span style="color: #3366CC;">&quot;host_c:27017&quot;</span>,
			<span style="color: #3366CC;">&quot;priority&quot;</span> : <span style="color: #CC0000;">0</span>,
			<span style="color: #3366CC;">&quot;slaveDelay&quot;</span> : <span style="color: #CC0000;">120</span>
		<span style="color: #66cc66;">&#125;</span>,
		<span style="color: #66cc66;">&#123;</span>
			<span style="color: #3366CC;">&quot;_id&quot;</span> : <span style="color: #CC0000;">3</span>,
			<span style="color: #3366CC;">&quot;host&quot;</span> : <span style="color: #3366CC;">&quot;host_d:27017&quot;</span>,
			<span style="color: #3366CC;">&quot;arbiterOnly&quot;</span> : <span style="color: #003366; font-weight: bold;">true</span>
		<span style="color: #66cc66;">&#125;</span>
	<span style="color: #66cc66;">&#93;</span>
<span style="color: #66cc66;">&#125;</span>
<span style="color: #66cc66;">&gt;</span> rs.<span style="color: #006600;">initiate</span><span style="color: #66cc66;">&#40;</span>c<span style="color: #66cc66;">&#41;</span>;
<span style="color: #66cc66;">&#123;</span>
	<span style="color: #3366CC;">&quot;info&quot;</span> : <span style="color: #3366CC;">&quot;Config now saved locally.  Should come online in about a minute.&quot;</span>,
	<span style="color: #3366CC;">&quot;ok&quot;</span> : <span style="color: #CC0000;">1</span>
<span style="color: #66cc66;">&#125;</span>
<span style="color: #66cc66;">&gt;</span> rs.<span style="color: #006600;">conf</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#123;</span>
	<span style="color: #3366CC;">&quot;_id&quot;</span> : <span style="color: #3366CC;">&quot;sfly&quot;</span>,
	<span style="color: #3366CC;">&quot;version&quot;</span> : <span style="color: #CC0000;">1</span>,
	<span style="color: #3366CC;">&quot;members&quot;</span> : <span style="color: #66cc66;">&#91;</span>
		<span style="color: #66cc66;">&#123;</span>
			<span style="color: #3366CC;">&quot;_id&quot;</span> : <span style="color: #CC0000;">0</span>,
			<span style="color: #3366CC;">&quot;host&quot;</span> : <span style="color: #3366CC;">&quot;host_a:27017&quot;</span>
		<span style="color: #66cc66;">&#125;</span>,
		<span style="color: #66cc66;">&#123;</span>
			<span style="color: #3366CC;">&quot;_id&quot;</span> : <span style="color: #CC0000;">1</span>,
			<span style="color: #3366CC;">&quot;host&quot;</span> : <span style="color: #3366CC;">&quot;host_b:27017&quot;</span>
		<span style="color: #66cc66;">&#125;</span>,
		<span style="color: #66cc66;">&#123;</span>
			<span style="color: #3366CC;">&quot;_id&quot;</span> : <span style="color: #CC0000;">2</span>,
			<span style="color: #3366CC;">&quot;host&quot;</span> : <span style="color: #3366CC;">&quot;host_c:27017&quot;</span>,
			<span style="color: #3366CC;">&quot;priority&quot;</span> : <span style="color: #CC0000;">0</span>,
			<span style="color: #3366CC;">&quot;slaveDelay&quot;</span> : <span style="color: #CC0000;">120</span>
		<span style="color: #66cc66;">&#125;</span>,
		<span style="color: #66cc66;">&#123;</span>
			<span style="color: #3366CC;">&quot;_id&quot;</span> : <span style="color: #CC0000;">3</span>,
			<span style="color: #3366CC;">&quot;host&quot;</span> : <span style="color: #3366CC;">&quot;host_d:27017&quot;</span>,
			<span style="color: #3366CC;">&quot;arbiterOnly&quot;</span> : <span style="color: #003366; font-weight: bold;">true</span>
		<span style="color: #66cc66;">&#125;</span>
	<span style="color: #66cc66;">&#93;</span>
<span style="color: #66cc66;">&#125;</span></pre></div></div>
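<p>The session above uses slaveDelay:120, which is only a two-minute lag (slaveDelay is in seconds).  If you would rather build the config from a script, here is a minimal Python sketch, assuming the same four hosts as the example and the 8-hour rule of thumb from earlier.  The helper name lagged_replica_config is made up for illustration; it is not part of MongoDB or pymongo.  It just prints a config document you can paste into the mongo shell and pass to rs.initiate().</p>

<div class="wp_syntax"><div class="code"><pre class="python">
import json

SECONDS_PER_HOUR = 3600


def lagged_replica_config(set_name, hosts, lagged_host, arbiter_host, lag_hours=8):
    """Build a replica set config with one delayed member and one arbiter.

    Hypothetical helper: the layout mirrors the shell example above.
    """
    # Regular, electable members: one becomes primary, the rest are failover.
    members = [{"_id": i, "host": h} for i, h in enumerate(hosts)]
    # Delayed member: priority 0 so it can never be elected primary;
    # slaveDelay is expressed in seconds.
    members.append({"_id": len(members), "host": lagged_host,
                    "priority": 0,
                    "slaveDelay": int(lag_hours * SECONDS_PER_HOUR)})
    # Arbiter: votes in elections but holds no data.
    members.append({"_id": len(members), "host": arbiter_host,
                    "arbiterOnly": True})
    return {"_id": set_name, "members": members}


if __name__ == "__main__":
    conf = lagged_replica_config("sfly", ["host_a:27017", "host_b:27017"],
                                 "host_c:27017", "host_d:27017")
    # JSON is valid shell syntax: paste it as c = {...} and run rs.initiate(c).
    print(json.dumps(conf, indent=2))
</pre></div></div>

<p>With the default lag_hours=8, host_c gets slaveDelay 28800.  Setting priority 0 inside the helper means you can&#8217;t forget it when you change the lag.</p>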

]]></content:encoded>
			<wfw:commentRss>http://www.kennygorman.com/wordpress/?feed=rss2&amp;p=699</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>OSX + Bit.ly</title>
		<link>http://www.kennygorman.com/wordpress/?p=694</link>
		<comments>http://www.kennygorman.com/wordpress/?p=694#comments</comments>
		<pubDate>Mon, 09 Aug 2010 21:00:32 +0000</pubDate>
		<dc:creator>kgorman</dc:creator>
				<category><![CDATA[Mongodb]]></category>
		<category><![CDATA[Random]]></category>

		<guid isPermaLink="false">http://www.kennygorman.com/wordpress/?p=694</guid>
		<description><![CDATA[A couple months ago I didn&#8217;t even know what bit.ly was; I was using tinyurl for everything. Sheesh, how web 1.0 of me. But after bit.ly started using MongoDB for its backend services, I started using it for URL shortening. &#8230; <a href="http://www.kennygorman.com/wordpress/?p=694">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><img alt="" src="http://static.betaworks.com/corpsite/images/bitly_logotype.png" class="alignleft" width="150"/>A couple months ago I didn&#8217;t even know what <a href="http://bit.ly/">bit.ly</a> was; I was using <a href="http://tinyurl.com/">tinyurl</a> for everything.  Sheesh, how web 1.0 of me.  But after bit.ly <a href="http://blip.tv/file/3704043">started using MongoDB</a> for its backend services, I started using it for URL shortening.  I just love the idea of web services, and bit.ly was crying out for a nice OSX implementation. I wanted full OSX compatibility instead of having to bring up a web browser each time I needed to shorten a URL.  <a href="http://davidrpoindexter.com/tutorial/bit-ly-url-shortening-with-mac-os-x-snow-leopard-services-and-applescript/">This Automator script turned that all around for me</a>.  Now I use bit.ly for almost every URL I copy and paste.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.kennygorman.com/wordpress/?feed=rss2&amp;p=694</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why Not Auto Increment in MongoDB</title>
		<link>http://www.kennygorman.com/wordpress/?p=661</link>
		<comments>http://www.kennygorman.com/wordpress/?p=661#comments</comments>
		<pubDate>Thu, 05 Aug 2010 00:56:48 +0000</pubDate>
		<dc:creator>kgorman</dc:creator>
				<category><![CDATA[Data Architecture]]></category>
		<category><![CDATA[Database Engineering]]></category>
		<category><![CDATA[Mongodb]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[auto-increment]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[sequence]]></category>

		<guid isPermaLink="false">http://www.kennygorman.com/wordpress/?p=661</guid>
		<description><![CDATA[I came across this blog post with a nice pattern for auto-increment in MongoDB. It&#8217;s a great post, but there is something to think about beyond how to logically perform the operation: performance. The idea presented in the blog is &#8230; <a href="http://www.kennygorman.com/wordpress/?p=661">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I came across <a href="http://shiflett.org/blog/2010/jul/auto-increment-with-mongodb">this blog post</a> with a nice pattern for auto-increment in MongoDB.  It&#8217;s a great post, but there is something to think about beyond how to logically perform the operation: performance.</p>
<p>The idea presented in the blog is to use the MongoDB findAndModify command to pull sequence values from the DB, relying on the atomic nature of the command.</p>

<div class="wp_syntax"><div class="code"><pre class="python">counter=db.<span style="color: black;">command</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;findandmodify&quot;</span>, <span style="color: #483d8b;">&quot;seq&quot;</span>, query=<span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;_id&quot;</span>:<span style="color: #483d8b;">&quot;users&quot;</span><span style="color: black;">&#125;</span>,update=<span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;$inc&quot;</span>:<span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;seq&quot;</span>:<span style="color: #ff4500;">1</span><span style="color: black;">&#125;</span><span style="color: black;">&#125;</span><span style="color: black;">&#41;</span>
f=<span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;_id&quot;</span>:counter<span style="color: black;">&#91;</span><span style="color: #483d8b;">'value'</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #483d8b;">'seq'</span><span style="color: black;">&#93;</span>,<span style="color: #483d8b;">&quot;data&quot;</span>:<span style="color: #483d8b;">&quot;somedata&quot;</span><span style="color: black;">&#125;</span>
c.<span style="color: black;">insert</span><span style="color: black;">&#40;</span>f<span style="color: black;">&#41;</span></pre></div></div>
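<p>What findAndModify buys you here is atomicity: the server reads and increments the counter document in one step, so concurrent clients never get the same value.  The sketch below only models that logic in plain Python, with no MongoDB involved: a threading.Lock stands in for the server&#8217;s per-document atomicity, and FakeCounters is a hypothetical name.  It also makes the cost visible: every insert has to pass through the same counter first.</p>

<div class="wp_syntax"><div class="code"><pre class="python">
import threading


class FakeCounters(object):
    """Hypothetical in-memory stand-in for the "seq" counters collection."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()  # plays the role of server-side atomicity

    def find_and_modify_inc(self, counter_id, field="seq"):
        """Atomically increment counter_id and return the new value.

        Mirrors findandmodify with update={"$inc": {field: 1}}, upsert=True
        and new=True; without new=True MongoDB returns the old document.
        """
        with self._lock:
            doc = self._docs.setdefault(counter_id, {"_id": counter_id, field: 0})
            doc[field] += 1
            return doc[field]


def worker(counters, n, out):
    """Simulate one client taking n keys, one counter call per key."""
    for _ in range(n):
        out.append(counters.find_and_modify_inc("users"))


if __name__ == "__main__":
    counters = FakeCounters()
    ids = []
    threads = [threading.Thread(target=worker, args=(counters, 1000, ids))
               for _ in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(ids), len(set(ids)))  # 3000 3000
</pre></div></div>

<p>Every key is unique, but every key also goes through one shared counter; against a real server that is an extra round trip per insert.</p>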

<p>With this technique, each insert requires both the insert itself and the findAndModify command, which is a query plus an update.  So now you perform 3 operations where it used to be one.  Not only that, but the query adds 3 more logical I/Os, and those might be physical I/Os. This pattern is easy to see with the mongostat utility.</p>
<p>Maybe you still meet your performance goals.  But then again maybe not.</p>
<p>I did some testing to compare the various options, timing a complete insert cycle with a unique key for each approach.  The test is a simple Python program that performs inserts using pymongo.  The program is a single process, and I ran 3 of them concurrently just to simulate a bit of concurrency.  The inserts run with safe mode off (safe=False). I compared the findAndModify approach against the native BSON ObjectId approach and a Python UUID generation approach.</p>
<p>The results are:</p>
<table>
<tr>
<th>Key generation method</th>
<th>Inserts/s</th>
</tr>
<tr>
<td>findAndModify auto-increment</td>
<td>3000</td>
</tr>
<tr>
<td><a href="http://www.mongodb.org/display/DOCSDE/Object+IDs">Native BSON ObjectIds</a></td>
<td>20000</td>
</tr>
<tr>
<td><a href="http://docs.python.org/library/uuid.html">Python UUID</a></td>
<td>9000</td>
</tr>
</table>
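<p>Put as ratios against the findAndModify baseline, using only the numbers in the table: ObjectIds came out roughly 6.7 times faster and UUIDs about 3 times faster in this test.  The arithmetic, as a small Python snippet:</p>

<div class="wp_syntax"><div class="code"><pre class="python">
# Inserts/s measured in the test above.
results = [
    ("findAndModify auto-increment", 3000),
    ("Native BSON ObjectIds", 20000),
    ("Python UUID", 9000),
]


def speedups(results, baseline_name):
    """Return each method's throughput relative to the baseline method."""
    baseline = dict(results)[baseline_name]
    return [(name, rate / float(baseline)) for name, rate in results]


for name, ratio in speedups(results, "findAndModify auto-increment"):
    print("%-30s %.1fx" % (name, ratio))
</pre></div></div>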
<p>So clearly, if the problem being solved can be handled with the native BSON ObjectId type, it should be.  In this test it was the fastest way to save data into MongoDB from a concurrent application.</p>

<div class="wp_syntax"><div class="code"><pre class="python">f=<span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;data&quot;</span>:<span style="color: #483d8b;">&quot;somedata&quot;</span><span style="color: black;">&#125;</span>    <span style="color: #808080; font-style: italic;"># no _id given: the driver adds an ObjectId before sending</span>
c.<span style="color: black;">insert</span><span style="color: black;">&#40;</span>f<span style="color: black;">&#41;</span></pre></div></div>
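<p>One reason ObjectIds are so cheap: the driver builds the _id on the client, so there is no extra round trip to the server.  The sketch below imitates the classic 12-byte ObjectId layout (4-byte timestamp, 3-byte machine id, 2-byte process id, 3-byte counter) with the standard library only.  object_id_like is a hypothetical name, and this is a simplified illustration, not pymongo&#8217;s bson.ObjectId.</p>

<div class="wp_syntax"><div class="code"><pre class="python">
import hashlib
import itertools
import os
import socket
import struct
import threading
import time

# 3 bytes derived from the host name, computed once per process.
_MACHINE = hashlib.md5(socket.gethostname().encode("utf-8")).digest()[:3]
# Per-process counter; the random start is a choice made here, not the spec.
_COUNTER = itertools.count(int.from_bytes(os.urandom(3), "big"))
_LOCK = threading.Lock()


def object_id_like():
    """Return a 24-hex-char id: timestamp | machine | pid | counter."""
    with _LOCK:
        inc = next(_COUNTER) % 0x1000000                  # keep it to 3 bytes
    return (struct.pack(">I", int(time.time()))          # 4-byte big-endian seconds
            + _MACHINE                                    # 3-byte machine id
            + struct.pack(">H", os.getpid() % 0x10000)    # 2-byte process id
            + struct.pack(">I", inc)[1:]                  # low 3 bytes of counter
            ).hex()


if __name__ == "__main__":
    print(object_id_like())
</pre></div></div>

<p>Note what is missing: no network call.  Compare that with the findAndModify version, where every key costs a trip to the server.</p>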

<p>That said, what if an auto-increment or concurrent unique key generator is still required?  One option would be to use a relational store with a native sequence facility, like PostgreSQL.  In my testing, PostgreSQL achieved 389,000 keys/sec when fetching from a single sequence using about 30 processes, so fetching sequence values clearly outpaces MongoDB&#8217;s ability to insert them. Something like the following is possible:</p>

<div class="wp_syntax"><div class="code"><pre class="python">cur.<span style="color: black;">execute</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;SELECT nextval('users_seq')&quot;</span><span style="color: black;">&#41;</span>
s=cur.<span style="color: black;">fetchone</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
f=<span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;_id&quot;</span>:s<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>,<span style="color: #483d8b;">&quot;data&quot;</span>:<span style="color: #483d8b;">&quot;somedata&quot;</span><span style="color: black;">&#125;</span>
c.<span style="color: black;">insert</span><span style="color: black;">&#40;</span>f<span style="color: black;">&#41;</span></pre></div></div>

<p>The stack used in this test is:<br />
- Sun X2270 dual quad core AMD 2376, 24GB RAM, 2 100GB SATA Drives, software RAID.<br />
- <a href="http://www.mongodb.org/">MongoDB 1.5.7</a><br />
- <a href="http://www.postgresql.org/">PostgreSQL 8.4.2</a><br />
- <a href="http://www.python.org/">Python 2.6.4</a><br />
- <a href="http://api.mongodb.org/python/1.7%2B/index.html">Pymongo 1.7</a><br />
- CentOS Linux, kernel 2.6.18-128.el5 x86_64</p>
]]></content:encoded>
			<wfw:commentRss>http://www.kennygorman.com/wordpress/?feed=rss2&amp;p=661</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>mongo_graph: a rrdtool graphing utility for MongoDB</title>
		<link>http://www.kennygorman.com/wordpress/?p=640</link>
		<comments>http://www.kennygorman.com/wordpress/?p=640#comments</comments>
		<pubDate>Fri, 23 Jul 2010 21:32:58 +0000</pubDate>
		<dc:creator>kgorman</dc:creator>
				<category><![CDATA[Mongodb]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[rrdtool]]></category>

		<guid isPermaLink="false">http://www.kennygorman.com/wordpress/?p=640</guid>
		<description><![CDATA[I just uploaded a little utility that pulls performance data from MongoDB and loads it into rrdtool for trending and analysis. Sure there are options like cacti out there, but this is just a simple raw utility vs something designed &#8230; <a href="http://www.kennygorman.com/wordpress/?p=640">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I just uploaded a little utility that pulls performance data from <a href="http://www.mongodb.org/">MongoDB</a> and loads it into <a href="http://oss.oetiker.ch/rrdtool/">rrdtool</a> for trending and analysis.</p>
<p>Sure, there are options like <a href="http://www.cacti.net/">cacti</a> out there, but this is just a simple, raw utility rather than something designed for a larger environment.  Simple.</p>
<p>Feedback would be awesome.</p>
<p><a href="http://github.com/kgorman/mongo_graph">http://github.com/kgorman/mongo_graph</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.kennygorman.com/wordpress/?feed=rss2&amp;p=640</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data clustering in MongoDB using embedded docs</title>
		<link>http://www.kennygorman.com/wordpress/?p=611</link>
		<comments>http://www.kennygorman.com/wordpress/?p=611#comments</comments>
		<pubDate>Fri, 25 Jun 2010 23:11:33 +0000</pubDate>
		<dc:creator>kgorman</dc:creator>
				<category><![CDATA[Data Architecture]]></category>
		<category><![CDATA[Database Engineering]]></category>
		<category><![CDATA[Mongodb]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://www.kennygorman.com/wordpress/?p=611</guid>
		<description><![CDATA[I wrote a while ago about how to cluster data to save cash. This post was geared towards relational stores. But in reality, the technique is applicable to any block store on disk. To recap, the premise is simple. When &#8230; <a href="http://www.kennygorman.com/wordpress/?p=611">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I wrote a while ago about how to <a href="http://www.kennygorman.com/wordpress/?p=334">cluster data to save cash</a>.  That post was geared toward relational stores, but in reality the technique applies to any block-based store on disk.  To recap, the premise is simple: when you run a query for some amount of data, you want to minimize I/O as much as possible.  Even if the result is in cache, you still want to reduce logical I/O.  See <a href="http://www.kennygorman.com/wordpress/?p=334">my post</a> for examples.</p>
<p>So how does one manage this technique in <a href="http://www.mongodb.org/">MongoDB</a>?  If you&#8217;re not familiar with it, MongoDB is a fairly new database that is non-relational and schema-less.  However, data density and clustering still matter: any time you can reduce the logical or physical I/O needed to return a query, that is a good thing.</p>
<p>With <a href="http://jira.mongodb.org/browse/SERVER-1054">SERVER-1054</a>, the MongoDB folks implemented one key feature that helps manage data clustering in MongoDB: the ability to show which file and offset a given document lives at.  This lets you inspect where documents are located and take action.  Think of it as a measure of how fragmented your data is across blocks.</p>

<div class="wp_syntax"><div class="code"><pre class="javascript"><span style="color: #66cc66;">&gt;</span> <span style="color: #003366; font-weight: bold;">var</span> arr=db.<span style="color: #006600;">photos</span>.<span style="color: #006600;">find</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#123;</span><span style="color: #66cc66;">&#125;</span>, <span style="color: #66cc66;">&#123;</span><span style="color: #3366CC;">'$diskLoc'</span>: <span style="color: #CC0000;">1</span><span style="color: #66cc66;">&#125;</span><span style="color: #66cc66;">&#41;</span>.<span style="color: #006600;">showDiskLoc</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&gt;</span> <span style="color: #000066; font-weight: bold;">for</span><span style="color: #66cc66;">&#40;</span><span style="color: #003366; font-weight: bold;">var</span> i=<span style="color: #CC0000;">0</span>; i<span style="color: #66cc66;">&lt;</span>arr.<span style="color: #006600;">length</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>; i++<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span>
    printjson<span style="color: #66cc66;">&#40;</span>arr<span style="color: #66cc66;">&#91;</span>i<span style="color: #66cc66;">&#93;</span>.$diskLoc.<span style="color: #006600;">offset</span><span style="color: #66cc66;">/</span><span style="color: #CC0000;">512</span><span style="color: #66cc66;">&#41;</span>
  <span style="color: #66cc66;">&#125;</span>
<span style="color: #CC0000;">3241650.34375</span>
<span style="color: #CC0000;">3241650.4453125</span>
<span style="color: #66cc66;">&gt;</span></pre></div></div>

<p>In the above example, dividing each offset by 512 places both of these documents in block 3241650, so the data is dense.</p>
<p>If a given set of documents is typically queried together, those documents should be stored as densely as possible.  It also helps if they sit in contiguous blocks.</p>
<p>In a traditional RDBMS data store there are various techniques to re-cluster the data by a given key so it is laid out densely, for instance CREATE TABLE AS SELECT * FROM foo ORDER BY mykey.  However, it&#8217;s mostly a one-time affair, because future inserts may not be dense.</p>
<p>In MongoDB, depending on the design, that may or may not be required.  A design pattern called <a href="http://www.mongodb.org/display/DOCS/Updating+Data+in+Mongo#UpdatingDatainMongo-EmbeddingDocumentsDirectlyinDocuments">embedding</a> can alleviate many of the typical problems associated with data clustering and AUTOMATICALLY keep your collection dense, which can make MongoDB seem much faster than a traditional RDBMS.</p>
<p>Let me give an example to illustrate.  Take the following relational data model:</p>
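<p>The block number comes from plain integer division of each offset by 512, the divisor used in the shell loop.  A minimal Python sketch (function names are hypothetical), with offsets reconstructed from the printed output (3241650.34375 &#215; 512 and 3241650.4453125 &#215; 512).  It only looks at the offset; $diskLoc also carries a file number, and documents in different data files would need that too.</p>

<div class="wp_syntax"><div class="code"><pre class="python">
BLOCK_SIZE = 512  # same divisor as the shell loop above


def block_of(offset, block_size=BLOCK_SIZE):
    """Block number that a byte offset falls in."""
    return offset // block_size


def distinct_blocks(offsets, block_size=BLOCK_SIZE):
    """Sorted list of blocks a set of documents touches (fewer is denser)."""
    return sorted(set(block_of(o, block_size) for o in offsets))


if __name__ == "__main__":
    offsets = [1659724976, 1659725028]  # 3241650.34375*512, 3241650.4453125*512
    print(distinct_blocks(offsets))     # [3241650]
</pre></div></div>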

<div class="wp_syntax"><div class="code"><pre class="sql">postgres<span style="color: #66cc66;">=</span><span style="color: #808080; font-style: italic;"># \d photos</span>
         <span style="color: #993333; font-weight: bold;">TABLE</span> <span style="color: #ff0000;">&quot;public.photos&quot;</span>
  <span style="color: #993333; font-weight: bold;">COLUMN</span>   <span style="color: #66cc66;">|</span>     Type      <span style="color: #66cc66;">|</span> Modifiers 
<span style="color: #808080; font-style: italic;">-----------+---------------+-----------</span>
 id        <span style="color: #66cc66;">|</span> integer       <span style="color: #66cc66;">|</span> 
 user_id   <span style="color: #66cc66;">|</span> integer       <span style="color: #66cc66;">|</span> 
 file_path <span style="color: #66cc66;">|</span> character<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">42</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">|</span></pre></div></div>

<p>And the typical access path is:</p>

<div class="wp_syntax"><div class="code"><pre class="sql">  <span style="color: #993333; font-weight: bold;">SELECT</span> * <span style="color: #993333; font-weight: bold;">FROM</span> photos <span style="color: #993333; font-weight: bold;">WHERE</span> user_id <span style="color: #66cc66;">=</span> <span style="color: #cc66cc;">10</span>;</pre></div></div>

<p>Then, in the worst case, every matching row lives in a different block, which means at least 3 I/O operations to return this query.  If the rows were dense, they would all be in one block.</p>

<div class="wp_syntax"><div class="code"><pre class="sql">postgres<span style="color: #66cc66;">=</span><span style="color: #808080; font-style: italic;"># select ctid, * from photos where user_id=10;</span>
 ctid  <span style="color: #66cc66;">|</span> id <span style="color: #66cc66;">|</span> user_id <span style="color: #66cc66;">|</span>                 file_path                  
<span style="color: #808080; font-style: italic;">-------+----+---------+--------------------------------------------</span>
 <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">0</span>,<span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">|</span>  <span style="color: #cc66cc;">1</span> <span style="color: #66cc66;">|</span>      <span style="color: #cc66cc;">10</span> <span style="color: #66cc66;">|</span> /home/foo/<span style="color: #cc66cc;">1</span>.jpg                           
 <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">22</span>,<span style="color: #cc66cc;">2</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">|</span> <span style="color: #cc66cc;">24</span> <span style="color: #66cc66;">|</span>      <span style="color: #cc66cc;">10</span> <span style="color: #66cc66;">|</span> /home/foo/<span style="color: #cc66cc;">2</span>.jpg                           
 <span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">334</span>,<span style="color: #cc66cc;">3</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">|</span> <span style="color: #cc66cc;">23</span> <span style="color: #66cc66;">|</span>      <span style="color: #cc66cc;">10</span> <span style="color: #66cc66;">|</span> /home/foo/<span style="color: #cc66cc;">3</span>.jpg</pre></div></div>
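<p>In PostgreSQL, ctid is a row&#8217;s physical location: (block number, item within the block).  Counting the distinct block numbers gives the minimum number of table blocks the query has to read, ignoring index blocks.  A small Python sketch with hypothetical helper names:</p>

<div class="wp_syntax"><div class="code"><pre class="python">
def ctid_block(ctid):
    """Block number from a ctid string such as "(22,2)"."""
    block, _item = ctid.strip("()").split(",")
    return int(block)


def blocks_touched(ctids):
    """Number of distinct heap blocks behind a set of rows."""
    return len(set(ctid_block(c) for c in ctids))


if __name__ == "__main__":
    print(blocks_touched(["(0,1)", "(22,2)", "(334,3)"]))  # 3
</pre></div></div>

<p>Three rows in three blocks is the worst case described above; the same three rows packed as (5,1), (5,2), (5,3) would need one.</p>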

<p>In MongoDB, the following model can be used to <em>always</em> keep the data dense and tightly clustered.</p>

<div class="wp_syntax"><div class="code"><pre class="javascript">photos=
<span style="color: #66cc66;">&#123;</span> <span style="color: #3366CC;">&quot;_id&quot;</span> : ObjectId<span style="color: #66cc66;">&#40;</span><span style="color: #3366CC;">&quot;4c252807164314895e44fb6d&quot;</span><span style="color: #66cc66;">&#41;</span>,
  <span style="color: #3366CC;">&quot;user_id&quot;</span> : <span style="color: #CC0000;">10</span>,
  <span style="color: #3366CC;">&quot;paths&quot;</span> : <span style="color: #66cc66;">&#91;</span><span style="color: #3366CC;">'/home/foo/1.jpg'</span>,<span style="color: #3366CC;">'/home/foo/2.jpg'</span>,<span style="color: #3366CC;">'/home/foo/3.jpg'</span><span style="color: #66cc66;">&#93;</span>
<span style="color: #66cc66;">&#125;</span></pre></div></div>

<p>And a query would be:</p>

<div class="wp_syntax"><div class="code"><pre class="javascript">db.<span style="color: #006600;">photos</span>.<span style="color: #006600;">find</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#123;</span><span style="color: #3366CC;">&quot;user_id&quot;</span>:<span style="color: #CC0000;">10</span><span style="color: #66cc66;">&#125;</span><span style="color: #66cc66;">&#41;</span></pre></div></div>

<p>The data payload is exactly 1 I/O.  As the embedded document grows past the block size, it will start to span multiple blocks, so this is an additional design consideration: keep embedded documents smaller than the block size, or you may not see the full benefit.</p>
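<p>Getting from the relational rows to the embedded model is a simple group-by.  The Python sketch below (names are hypothetical) also flags documents that might outgrow a block.  It uses the JSON length as a rough stand-in for the real BSON size, and block_size is a parameter you&#8217;d set for your own storage; 512 just matches the divisor in the showDiskLoc example.</p>

<div class="wp_syntax"><div class="code"><pre class="python">
import json
from collections import OrderedDict

# (id, user_id, file_path) rows, as in the photos table above.
ROWS = [
    (1, 10, "/home/foo/1.jpg"),
    (24, 10, "/home/foo/2.jpg"),
    (23, 10, "/home/foo/3.jpg"),
]


def embed_by_user(rows):
    """One document per user_id, with that user's paths embedded in order."""
    paths = OrderedDict()
    for _id, user_id, file_path in rows:
        paths.setdefault(user_id, []).append(file_path)
    return [{"user_id": uid, "paths": p} for uid, p in paths.items()]


def rough_size(doc):
    """Approximate document size; JSON length is only a proxy for BSON size."""
    return len(json.dumps(doc))


def oversized(docs, block_size=512):
    """user_ids whose embedded document probably spans more than one block."""
    return [d["user_id"] for d in docs if rough_size(d) > block_size]


if __name__ == "__main__":
    docs = embed_by_user(ROWS)
    print(docs)
    print(oversized(docs))  # []
</pre></div></div>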
<p>Embedding may not always be possible, but if you are aware of the potential I/O savings during design, it is one more data point toward building a smarter, faster data store.</p>
<p>MongoDB does not yet offer a simple way to rebuild a collection and re-order its data in a single operation, which is a fairly common technique on the RDBMS side and the one used in the examples in my previous post.  So design with that in mind.</p>
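<p>Because the data can&#8217;t easily be re-ordered after the fact, the clustering has to come from how documents are shaped when they are written.  As a sketch of that shaping (plain JavaScript, runnable with Node.js, no MongoDB server involved), the function below folds row-per-photo records, like the relational rows shown earlier, into one embedded document per user.  The <code>rows</code> data and the <code>toEmbedded</code> helper are hypothetical illustrations, not part of any MongoDB API.</p>

```javascript
// Hypothetical flat rows, one per photo, as they might come out of a
// relational "photos" table with (photo_id, user_id, path) columns.
const rows = [
  { photo_id: 21, user_id: 10, path: "/home/foo/1.jpg" },
  { photo_id: 22, user_id: 10, path: "/home/foo/2.jpg" },
  { photo_id: 23, user_id: 10, path: "/home/foo/3.jpg" },
  { photo_id: 24, user_id: 11, path: "/home/bar/1.jpg" },
];

// Fold the rows into one document per user, embedding every path in a
// single "paths" array so that all of a user's photos are stored together.
function toEmbedded(rows) {
  const byUser = new Map(); // Map keeps users in first-seen order
  for (const row of rows) {
    if (!byUser.has(row.user_id)) {
      byUser.set(row.user_id, { user_id: row.user_id, paths: [] });
    }
    byUser.get(row.user_id).paths.push(row.path);
  }
  return Array.from(byUser.values());
}

const docs = toEmbedded(rows);
console.log(JSON.stringify(docs[0]));
// {"user_id":10,"paths":["/home/foo/1.jpg","/home/foo/2.jpg","/home/foo/3.jpg"]}
```

<p>The design choice here is that the user, not the individual photo, becomes the unit of storage, which is what keeps one user&#8217;s paths physically together without any later rebuild step.</p>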
]]></content:encoded>
			<wfw:commentRss>http://www.kennygorman.com/wordpress/?feed=rss2&amp;p=611</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>mongostat</title>
		<link>http://www.kennygorman.com/wordpress/?p=606</link>
		<comments>http://www.kennygorman.com/wordpress/?p=606#comments</comments>
		<pubDate>Tue, 22 Jun 2010 02:46:39 +0000</pubDate>
		<dc:creator>kgorman</dc:creator>
				<category><![CDATA[Mongodb]]></category>
		<category><![CDATA[mongostat]]></category>

		<guid isPermaLink="false">http://www.kennygorman.com/wordpress/?p=606</guid>
		<description><![CDATA[The MongoDB command line performance monitoring utility named mongostat is now (well, since 1.3.3) part of the core distribution of MongoDB. The Python version hosted on my site is now deprecated in favor of the C++ version in the distribution.]]></description>
			<content:encoded><![CDATA[<p>The MongoDB command line performance monitoring utility named <a href="http://www.mongodb.org/display/DOCS/mongostat">mongostat</a> is now (well, since 1.3.3) part of the <a href="http://github.com/kgorman/mongo/blob/master/tools/stat.cpp">core distribution</a> of MongoDB.  The Python version hosted on my site is now deprecated in favor of the C++ version in the distribution.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.kennygorman.com/wordpress/?feed=rss2&amp;p=606</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WordPress 3.0</title>
		<link>http://www.kennygorman.com/wordpress/?p=581</link>
		<comments>http://www.kennygorman.com/wordpress/?p=581#comments</comments>
		<pubDate>Mon, 21 Jun 2010 18:39:41 +0000</pubDate>
		<dc:creator>kgorman</dc:creator>
				<category><![CDATA[Random]]></category>
		<category><![CDATA[Wordpress]]></category>

		<guid isPermaLink="false">http://www.kennygorman.com/wordpress/?p=581</guid>
		<description><![CDATA[On June 17th, WordPress 3.0 was launched. I decided to take the plunge and upgrade; there are so many compelling features that it&#8217;s hard not to. Part of the new release is the Twenty Ten theme with some pretty exciting &#8230; <a href="http://www.kennygorman.com/wordpress/?p=581">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>On June 17th, WordPress 3.0 <a href="http://wordpress.org/development/2010/06/thelonious/">was launched</a>.  I decided to take the plunge and upgrade; there are so many compelling features that it&#8217;s hard not to.  Part of the new release is the Twenty Ten theme, which has some pretty exciting features, including featured images and a new menu system that I hope to take advantage of.  I decided to go ahead and use it, hence the new look.</p>
<p>I try not to blog about blogging, but I couldn&#8217;t help it in this case.  The upgrade is pretty compelling, and I thought I would share my thoughts.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.kennygorman.com/wordpress/?feed=rss2&amp;p=581</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>MongoSF: Videos up</title>
		<link>http://www.kennygorman.com/wordpress/?p=570</link>
		<comments>http://www.kennygorman.com/wordpress/?p=570#comments</comments>
		<pubDate>Tue, 11 May 2010 17:41:30 +0000</pubDate>
		<dc:creator>kgorman</dc:creator>
				<category><![CDATA[Mongodb]]></category>
		<category><![CDATA[mongosf]]></category>

		<guid isPermaLink="false">http://www.kennygorman.com/wordpress/?p=570</guid>
		<description><![CDATA[The videos from MongoSF are now being posted on the 10gen site, along with the presentations. Here is my talk:]]></description>
			<content:encoded><![CDATA[<p>The videos from MongoSF are now being posted on the <a href="http://www.10gen.com/event_mongosf_10apr30">10gen site</a>, along with the presentations.  Here is my talk:</p>
<p><embed src="http://blip.tv/play/AYHcww0C" type="application/x-shockwave-flash" width="480" height="350" allowscriptaccess="always" allowfullscreen="true"></embed></p>
]]></content:encoded>
			<wfw:commentRss>http://www.kennygorman.com/wordpress/?feed=rss2&amp;p=570</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wayback Machine: snapshots are still a valid technique</title>
		<link>http://www.kennygorman.com/wordpress/?p=552</link>
		<comments>http://www.kennygorman.com/wordpress/?p=552#comments</comments>
		<pubDate>Mon, 10 May 2010 21:10:06 +0000</pubDate>
		<dc:creator>kgorman</dc:creator>
				<category><![CDATA[Database Engineering]]></category>
		<category><![CDATA[Mongodb]]></category>
		<category><![CDATA[Oracle]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[backups]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[snapshots]]></category>

		<guid isPermaLink="false">http://www.kennygorman.com/wordpress/?p=552</guid>
		<description><![CDATA[I came across this old article I wrote for the NOCOUG newsletter in 2002 about using OS snapshots for backups. The technique is still valid and widely used for performing backups. The idea is simple: - &#8230; <a href="http://www.kennygorman.com/wordpress/?p=552">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I came across this <a href="http://www.kennygorman.com/2002_os_snapshots_for_backup.pdf">old article</a> I wrote for the <a href="http://www.nocoug.org/">NOCOUG</a> newsletter in 2002 about using OS snapshots for backups.  The technique is still valid and widely used for performing backups.  The idea is simple:</p>
<p>- Stop I/O temporarily<br />
- Snapshot the filesystem (OS snapshot, rsync, whatever)<br />
- Release I/O<br />
- Back up any logs needed for point-in-time recovery</p>
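<p>The one ordering rule that really matters in these steps is that I/O must be released even if the snapshot fails.  Here is a minimal sketch of that control flow in plain JavaScript (Node.js).  <code>snapshotBackup</code>, <code>freezeIO</code>, <code>takeSnapshot</code>, <code>releaseIO</code>, and <code>archiveLogs</code> are hypothetical names for callbacks you would wire up to your own database and storage (for example, the PostgreSQL or MongoDB commands below); they are not functions from any real library.</p>

```javascript
// Hypothetical orchestration of a snapshot backup. Each step is an async
// callback supplied by the caller; none of these names come from a real library.
async function snapshotBackup({ freezeIO, takeSnapshot, releaseIO, archiveLogs }) {
  await freezeIO(); // e.g. enter backup mode or lock the database against writes
  let snapshotId;
  try {
    snapshotId = await takeSnapshot(); // keep this window as short as possible
  } finally {
    await releaseIO(); // always resume I/O, even if the snapshot step threw
  }
  await archiveLogs(); // logs needed for point-in-time recovery
  return snapshotId;
}

// Example wiring with stub steps that just record the order they ran in.
const calls = [];
const steps = {
  freezeIO: async () => { calls.push("freeze"); },
  takeSnapshot: async () => { calls.push("snapshot"); return "snap-001"; },
  releaseIO: async () => { calls.push("release"); },
  archiveLogs: async () => { calls.push("logs"); },
};

snapshotBackup(steps).then((id) => console.log(id, calls.join(" -> ")));
```

<p>If the snapshot step throws, <code>releaseIO</code> still runs and the error propagates to the caller, so the backup is reported as failed instead of leaving the database frozen.</p>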
<p>This technique works for many different data stores.  The article only covers Oracle, but many other databases offer the same backup capabilities.  Here are some examples:</p>
<p>PostgreSQL:</p>

<div class="wp_syntax"><div class="code"><pre class="sql"><span style="color: #993333; font-weight: bold;">SELECT</span> pg_start_backup<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'label'</span><span style="color: #66cc66;">&#41;</span>;
<span style="color: #808080; font-style: italic;">-- snapshot the DB here</span>
<span style="color: #993333; font-weight: bold;">SELECT</span> pg_stop_backup<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>;
<span style="color: #808080; font-style: italic;">-- back up the WAL files here</span></pre></div></div>

<p>You can find all the details of this kind of backup in the <a href="http://www.postgresql.org/docs/8.1/interactive/backup-online.html">PostgreSQL docs</a>.</p>
<p>MongoDB:</p>

<div class="wp_syntax"><div class="code"><pre class="java"> <span style="color: #66cc66;">&gt;</span> use admin
switched to db admin
<span style="color: #66cc66;">&gt;</span> db.<span style="color: #006600;">runCommand</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#123;</span>fsync:<span style="color: #cc66cc;">1</span>,lock:<span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#125;</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#123;</span>
	<span style="color: #ff0000;">&quot;info&quot;</span> : <span style="color: #ff0000;">&quot;now locked against writes&quot;</span>,
	<span style="color: #ff0000;">&quot;ok&quot;</span> : <span style="color: #cc66cc;">1</span>
<span style="color: #66cc66;">&#125;</span>
<span style="color: #808080; font-style: italic;">// snapshot the DB here</span>
<span style="color: #66cc66;">&gt;</span> db.$cmd.<span style="color: #006600;">sys</span>.<span style="color: #006600;">unlock</span>.<span style="color: #006600;">findOne</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">;</span>
<span style="color: #66cc66;">&#123;</span> <span style="color: #ff0000;">&quot;ok&quot;</span> : <span style="color: #cc66cc;">1</span>, <span style="color: #ff0000;">&quot;info&quot;</span> : <span style="color: #ff0000;">&quot;unlock requested&quot;</span> <span style="color: #66cc66;">&#125;</span></pre></div></div>

<p>You can find the docs on this procedure on the <a href="http://www.mongodb.org/display/DOCS/Backups">MongoDB site</a>.</p>
<p>I thought I would include the original article here even though it&#8217;s going on 8 years old!</p>
<p><a href="http://www.kennygorman.com/2002_os_snapshots_for_backup.pdf">OS Snapshots for Backup;<br />
Utilizing operating system snapshots for quick and painless Oracle database backup and restore.</a> from VOL. 16, No. 2 · MAY, 2002 of the NOCOUG Journal</p>
]]></content:encoded>
			<wfw:commentRss>http://www.kennygorman.com/wordpress/?feed=rss2&amp;p=552</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>MongoSF Slides</title>
		<link>http://www.kennygorman.com/wordpress/?p=546</link>
		<comments>http://www.kennygorman.com/wordpress/?p=546#comments</comments>
		<pubDate>Mon, 03 May 2010 17:48:49 +0000</pubDate>
		<dc:creator>kgorman</dc:creator>
				<category><![CDATA[Data Architecture]]></category>
		<category><![CDATA[Database Engineering]]></category>
		<category><![CDATA[Mongodb]]></category>
		<category><![CDATA[mongosf]]></category>

		<guid isPermaLink="false">http://www.kennygorman.com/wordpress/?p=546</guid>
		<description><![CDATA[I had a great time at the MongoSF Conference on Friday. There were a ton of great presentations, and lots and lots of excitement. A big thanks to 10gen for inviting me to speak. I really enjoyed it, and &#8230; <a href="http://www.kennygorman.com/wordpress/?p=546">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I had a great time at the <a href="http://mongosf.eventbrite.com/">MongoSF Conference</a> on Friday.  There were a ton of great presentations, and lots and lots of excitement.  A big thanks to <a href="http://www.10gen.com/">10gen</a> for inviting me to speak.  I really enjoyed it, and I hope everyone learned a lot from our experiences with MongoDB so far.  I especially liked Mike Dirolf&#8217;s discussion on Python and <a href="http://api.mongodb.org/python/">pymongo</a>.  There have been lots of changes lately, most of them fantastic!</p>
<p>Here are my slides from my presentation:</p>
<div style="width:425px" id="__ss_3925054"><strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/mongosf/implementing-mongodb-at-shutterfly-kenny-gorman" title="Implementing MongoDB at Shutterfly (Kenny Gorman)">Implementing MongoDB at Shutterfly (Kenny Gorman)</a></strong><object id="__sse3925054" width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=implementingmongodb-100430192617-phpapp01&#038;stripped_title=implementing-mongodb-at-shutterfly-kenny-gorman" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed name="__sse3925054" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=implementingmongodb-100430192617-phpapp01&#038;stripped_title=implementing-mongodb-at-shutterfly-kenny-gorman" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.kennygorman.com/wordpress/?feed=rss2&amp;p=546</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
