kennygorman.com

Database engineering, architecture, startups, and other assorted bits

Fusion-io SSD

I got the opportunity to test one of the new Fusion-io solid-state ioDrives, and I thought I would post some results.

Fusion-io has created an SSD product called the ioDrive that is built as a PCIe card rather than as an SSD replacement for SAS or SATA drives. This approach allows much lower latency because it uses the PCIe bus instead of traditional disk channels geared toward slow disks. The 320GB model I used in my test is made of Multi-Level Cell (MLC) NAND flash and is quoted by Fusion-io to achieve throughput somewhere in the 70k IOPS neighborhood.

For this test I used two identical Dell 2970 boxes, one using a six-disk RAID 10 array, and the other using a single Fusion-io 320GB NAND flash PCIe card. Here are the important configuration items:
- Dell 2970 2u
- (6) SAS disk RAID 10 with Perc6i controller or Fusion-io 320GB PCIe ioDrive.
- 32GB RAM
- Quad-Core AMD Opteron(tm) Processor 2347 HE, 1895 MHz, 512 KB cache
- SuSE Linux; 2.6.16.46-0.12-smp x86_64
- VxFS file system with 8k block size and cached I/O
- PostgreSQL 8.2.4 with 2GB buffer cache and fsync=on
- All data on the same mount point; /data

The test I used is a custom set of pgbench scripts that represent a real-world workload. The scripts are launched from a third host and are not run on the database host itself. The workload is about 80% reads and 20% writes. It performs no deletes, just SELECT, INSERT, and UPDATE. Typical queries are index range scans where multiple rows are fetched per result set.
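
For readers who have not used pgbench custom scripts, here is a minimal sketch of what such a workload might look like. The table and column names below are illustrative placeholders, not my actual test scripts, and the syntax shown is the 8.x-era \setrandom form. Passing multiple -f files makes pgbench pick one at random per transaction, so four read scripts to one write script approximates an 80/20 mix.

```shell
# Hypothetical pgbench custom scripts: an index range scan (read)
# and a single-row update (write). Names are illustrative only.
cat > range_scan.sql <<'EOF'
\setrandom aid 1 100000
SELECT abalance FROM accounts WHERE aid BETWEEN :aid AND :aid + 50;
EOF

cat > update.sql <<'EOF'
\setrandom aid 1 100000
UPDATE accounts SET abalance = abalance + 1 WHERE aid = :aid;
EOF

# Launched from a third host against the database server, e.g.:
# pgbench -h dbhost -c 25 -t 1000 \
#   -f range_scan.sql -f range_scan.sql -f range_scan.sql \
#   -f range_scan.sql -f update.sql bench
```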

Performance was measured using PostgreSQL's pg_stat statistics views.
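
One simple way to turn pg_stat counters into a TPS figure is to snapshot the commit counter twice and divide the delta by the interval. The sketch below hard-codes the two snapshot values for illustration; in practice they would come from something like `psql -Atc "SELECT xact_commit FROM pg_stat_database WHERE datname = 'bench'"` (database name assumed).

```shell
# Sketch: derive TPS from two pg_stat_database xact_commit snapshots
# taken INTERVAL seconds apart. Snapshot values are hard-coded here
# purely for illustration.
INTERVAL=60
COMMITS_BEFORE=1500000
COMMITS_AFTER=1620000

TPS=$(( (COMMITS_AFTER - COMMITS_BEFORE) / INTERVAL ))
echo "TPS: $TPS"   # (1620000 - 1500000) / 60 = 2000
```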

The test results are shown below. Some interesting things to note:
- A roughly 400% peak improvement in performance using SSD.
- At about 25 concurrent backends, the machine with SSD starts to degrade.
- At about 100 concurrent backends, the machine with disk starts to degrade.

[Figure: test results, Fusion-io SSD vs. SAS RAID 10]

When considering SSD there are some new things to think about versus traditional disk. In this test I used RAID 10 for the SAS drives and a single Fusion-io 320GB card. Unfair? Perhaps a bit, but one thing to consider is that SSD is more reliable than traditional disk even though it has a limited write lifetime. Another is that the machine with SSD does not need as much RAM because the storage is so fast. So comparing disk to SSD directly is not always a perfect comparison. In the real world I would run 8GB of RAM on the SSD machine, and perhaps a RAID 1 of two cards. Here is a white-paper outlining some of the differences. One other item to note: an SSD's lifetime is affected by the number of writes performed to the drive, so RAID 5, while economical, could cause a premature end of life (from writing all the parity).

The Fusion-io cards are simple to install and configure. The drivers are available on the Fusion-io support site. They have a simple setup guide, so I won't cover any of the details, but once installed the drive appears as any other block device. Installing the Veritas file system was a little more time consuming. Here is a quick cheat sheet:

vxddladm addforeign path=/dev/fioa
vxdisk scandisks
vxdg init dbdg fio=fiob cds=off
vxassist -g dbdg -p maxsize
vxassist -g dbdg make fusionA 313563136
mkfs.vxfs /dev/vx/rdsk/dbdg/fusionA -o bsize=4096,largefiles
mount -t vxfs /dev/vx/dsk/dbdg/fusionA /data

A note about random writes: random writes are SSD's Achilles' heel. My tests perform random writes because this is what our workload really does. Some tuning measures can speed them up. The Fusion-io architecture employs a background process that performs writes to the flash media. This process can become overwhelmed, and in a high-random-write environment it needs more scratch space to keep up. So formatting the card with less usable capacity and more reserve space may improve performance. This is done at format time with the fio-format tool, so test before you deploy to see what free-space percentage works well with your workload. If I can grab some more time I will do so and add the results to my initial testing.
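
The exact fio-format invocation is covered in Fusion-io's setup guide, but the capacity trade-off is easy to reason about. As a back-of-the-envelope sketch, formatting a 320GB card at, say, 80% usable capacity leaves 64GB of extra scratch space for the background write process (the 80% figure is an assumption for illustration, not a Fusion-io recommendation):

```shell
# Back-of-the-envelope: usable vs. reserve space when formatting
# a 320GB card at 80% capacity. The 80% figure is illustrative only;
# the right percentage depends on your random-write rate.
CARD_GB=320
USABLE_PCT=80

USABLE_GB=$(( CARD_GB * USABLE_PCT / 100 ))
RESERVE_GB=$(( CARD_GB - USABLE_GB ))
echo "usable: ${USABLE_GB}GB, reserve: ${RESERVE_GB}GB"
# usable: 256GB, reserve: 64GB
```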

In terms of price these units are fairly expensive, but coming down. If you consider TPS/$, SSD is fairly competitive, and when you add in the form-factor savings (more TPS per U) as well as power savings, now might be about the time to jump into some Fusion-io SSDs.

11 Discussions on “Fusion-io SSD”
  • Kgorman,

    I don’t think that white papers from the vendor prove much except that they thought about the issue, whether or not they did anything about it. Those are marketing pieces. I’d like to see some 3rd-party testing of the lifetime of SSDs under a WAL load on a really busy database.

    If you believed Dell’s whitepapers, for example, every server they ship would be perfect.

  • Just an update on Fusion-io and some comments on Greg's rant. I have not had time to post this, but have wanted to for some time.

    Fusion-io does not use any memory-based caching in front of the SSD. It also uses a capacitor to allow all writes to drain in the event of a cold power-off scenario. I personally tested various combinations of failure scenarios where the power was cut off during very high write workloads without any problems with data loss.

    Greg, in terms of ranting, that's all fine. Just go rant to the people you have a beef with, not me. I don't give a rat's ass about your issues with site design and how they force people to sign up. Also, if you're still interested in Fusion-io you need to do some more homework. I suggest getting one in your hands and actually using it; they are very nice about loaner hardware so you can make accurate judgments.

    The rest of you, thanks for taking the time to read and comment, it’s very much appreciated and enjoyed.

  • That’s not my blog, if it were I’d have tracked down and put in the full details there in the first place. I just linked to there because they had the only public commentary I was able to find about the write durability of the product. I keep hoping someone who has one of these drives will go through the diligence I’d like to see here and was curious how deep you’d gotten into that.

    The unfortunate situation here is that most vendors ship their drives in configurations that aren’t safe for database use, and unless specific steps are taken to correct for that, write tests against the drive are not giving real-world results in that context. Note that Peter and several others on that other blog had the same concern, I’m not the only one who’s stuck on this point. I really don’t care about devices that perform well but without good write guarantees, so the first question I ask is not “how fast is it?” but instead “if the drive loses power, can data that’s been written and returned a successful fsync be lost?” The information on that blog suggests the default Fusion IO configuration gives the wrong answer to that question, meaning databases put there can get corrupted after a power outage, but there’s a lot of handwaving rather than hard data there too. I’m not a very trusting person, which applies equally to vendors and to people being critical of products; I don’t know what’s the real story here yet. But when you mention lots of data being cached in RAM, that sure sounds scary for writes being able to survive an outage.

    Thanks for the clarification about their support site. From my perspective, making people go through registration hoops is still an odd decision that reflects on their business. A small company has enough hurdles to cross selling to enterprise customers already, they shouldn’t make life more difficult for buyers who want more technical information about the product. I suspect that’s driven by the fact that any access to the drivers is sitting behind a EULA agreement for legal reasons. I downloaded all their user guides, browsed the knowledgebase, and even poked through the driver source code briefly; wasn’t able to get any clarification on the write durability questions I still have about the device so far though.

  • Greg,

    The support site is free, there is no requirement to have bought anything. I have found Fusion IO to be an excellent vendor so far. Great support and help throughout our testing cycle. I think you are reading *way* too far into a company by looking at the (what you thought was closed) support site and making judgements. The Fusion I/O folks have been fantastic in terms of support.

    In terms of durability, the Fusion I/O has only metadata in DRAM and has capacitors for the actual data pages. Pages are written one at a time. I am not privy to the actual algorithms used. I spoke to the folks over there about this directly. You might want to give them a call to clarify your blog post.

    The Intel X25-E you mention in your blog is actually a SATA ‘drive’, where this is a card on the PCI-E bus. So the drivers don’t write down a storage chain with a faster ‘disk’. They write to the storage device directly. AKA: The Intel SSD is emulating a SATA drive.

  • Peter,

    My understanding is it’s simply NAND flash. The data is written to a temp space directly where the background process performs the final write.

  • Josh,

    This whitepaper addresses your concerns. I don’t have any reason to dispute it. The short answer is: likely longer than spinning disks. For the write duty cycle of this test, the whitepaper indicates about 5 years for MLC flash. But at our TPS rate, who knows really. The good news is Fusion-io would start to take chips out of rotation at about 5 years as they fail, and available free space would start to shrink. So it would not be a catastrophic event when they start to fail.

    Here is the link to the whitepaper:
    http://www.fusionio.com/PDFs/Whitepaper_Solidstatestorage2.pdf

  • Any idea whether these devices use write caches? Many other SSDs I have seen appear to run with write caches on, which risks data integrity and durability during a power outage.

  • Kenny,

    What is the limited lifetime like in database terms? Like, how long could you use an SSD for a high-volume database (or for the WAL) before you had to replace it?

  • The link to the Fusion drivers you’ve got there points to the fio tool. I tried to find them myself, only to discover that the Fusion is one of those dubious companies that won’t let you see any of their support site unless you’ve already bought something from them. That pushes them way down on the list of vendors I’d consider dealing with, because it’s funny how companies who do that sort of thing always have the most things to hide about their product…

    Did you leave the Fusion device in its default, non-durable state? See http://www.mysqlperformanceblog.com/2009/05/01/raid-vs-ssd-vs-fusionio/ for more about that if you didn’t fool with it yet. I find all these write-cached SSD numbers academically interesting, but not really helpful for database use where you need fsync to actually force a physical commit.
