
2TB Hitachi Drives and ZFS

I’ve built a new file server at work so we can start phasing out the old Ultra SCSI arrays we have.

Since performance (and price) is valued more than raw capacity, my initial quote with iXsystems was for 26 1TB drives (HUA72201), the assumption being that the lower platter density would provide better performance.

I was not particularly satisfied with the results once I actually set up the ZFS array; the best performance I got out of it was about 250MB/sec.

Around the same time, I got a new 1U iXsystems server with four 2TB drives (HUS724020) for one of our internal FTP servers. I set up a similar ZFS pool, and the difference in performance was shocking. With the 2TB drives configured with a 4K sector size, I easily got 440MB/sec in my standard dd tests, and 800MB/sec during a zpool scrub of random test data.

I contacted iXsystems, and they were willing to exchange the 1TB drives for 2TB drives (we of course paid the cost difference).

That is great customer service. They continue to impress me as a company that really works hard for its customers. It works out to our advantage: we are not a large company that can throw its weight around, so it’s fantastic when we get to build a relationship like this.

Now, the big difference between the 1TB Hitachi drives and the 2TB drives was the 4096-byte vs. 512-byte advanced sector size.

Hardware

From one of my favorite sysutils tools, sysinfo:
System information

Manufacturer: iXsystems
Product Name: iX4236-847E16

INFO: Run `dmidecode -t system` to see further information.

Base board information
Manufacturer: Supermicro
Product Name: H8SCM

INFO: Run `dmidecode -t baseboard` to see further information.

Graphic card information:
vendor='Matrox Graphics, Inc.'
device='MGA G200eW WPCM450'
INFO: Check pciconf(8) for more information.

PCI devices with no driver attached:
none0@pci0:0:20:0:  class=0x0c0500 card=0xba1115d9 chip=0x43851002 rev=0x3d hdr=0x00

BIOS information

Vendor: American Megatrends Inc.
Version: 3.0       
Release Date: 10/30/2012
BIOS Revision: 8.16

INFO: Run `dmidecode -t bios` to see further information.

CPU information

Machine class:  amd64
CPU Model:  AMD Opteron(tm) Processor 4386                 
No. of Cores:   8
Cores per CPU:  

CPU usage statistics:
CPU:  0.2% user,  0.0% nice,  2.5% system,  0.6% interrupt, 96.7% idle

RAM information

Memory information from dmidecode(8)
Maximum Capacity: 64 GB
Number Of Devices: 4

System memory summary
Total real memory available:    65498 MB
Logically used memory:          54143 MB
Logically available memory:     11355 MB

Swap information
Device          1K-blocks     Used    Avail Capacity
/dev/ada0p3       1905664     6.5M     1.8G     0%

Operating system information

Operating system release:   FreeBSD 9.1-RELEASE
OS architecture:            amd64
Currently booted kernel:    /boot/kernel/kernel

Currently loaded kernel modules (kldstat(8)):
aio.ko
if_lagg.ko
zfs.ko
opensolaris.ko

Bootloader settings
The /boot/loader.conf has the following contents:
vfs.zfs.txg.timeout="5"
vfs.zfs.arc_max="61140M"
vfs.zfs.prefetch_disable="0"
aio_load="YES"
vfs.zfs.write_limit_override="1073741824"
vfs.zfs.arc_meta_limit="32054968320"
hw.em.rxd=4096
hw.em.txd=4096

Storage information

Available hard drives:
    ada0: <InnoLite SATADOM D150QV 120319> ATA-8 SATA 2.x device
    ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
    ada0: Command Queueing enabled
    ada0: 30533MB (62533296 512 byte sectors: 16H 63S/T 16383C)
    da27: <ATA D2CSTK251M11-012 2.22> Fixed Direct Access SCSI-6 device 
    da27: 600.000MB/s transfers
    da27: Command Queueing enabled
    da27: 114473MB (234441648 512 byte sectors: 255H 63S/T 14593C)
    da26: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da26: 600.000MB/s transfers
    da26: Command Queueing enabled
    da26: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da25: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da25: 600.000MB/s transfers
    da25: Command Queueing enabled
    da25: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da24: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da24: 600.000MB/s transfers
    da24: Command Queueing enabled
    da24: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da23: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da23: 600.000MB/s transfers
    da23: Command Queueing enabled
    da23: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da22: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da22: 600.000MB/s transfers
    da22: Command Queueing enabled
    da22: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da21: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da21: 600.000MB/s transfers
    da21: Command Queueing enabled
    da21: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da20: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da20: 600.000MB/s transfers
    da20: Command Queueing enabled
    da20: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da19: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da19: 600.000MB/s transfers
    da19: Command Queueing enabled
    da19: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da18: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da18: 600.000MB/s transfers
    da18: Command Queueing enabled
    da18: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da17: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da17: 600.000MB/s transfers
    da17: Command Queueing enabled
    da17: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da16: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da16: 600.000MB/s transfers
    da16: Command Queueing enabled
    da16: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da15: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da15: 600.000MB/s transfers
    da15: Command Queueing enabled
    da15: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da14: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da14: 600.000MB/s transfers
    da14: Command Queueing enabled
    da14: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da13: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da13: 600.000MB/s transfers
    da13: Command Queueing enabled
    da13: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da12: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da12: 600.000MB/s transfers
    da12: Command Queueing enabled
    da12: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da11: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da11: 600.000MB/s transfers
    da11: Command Queueing enabled
    da11: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da10: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da10: 600.000MB/s transfers
    da10: Command Queueing enabled
    da10: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da9: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da9: 600.000MB/s transfers
    da9: Command Queueing enabled
    da9: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da8: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da8: 600.000MB/s transfers
    da8: Command Queueing enabled
    da8: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da7: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da7: 600.000MB/s transfers
    da7: Command Queueing enabled
    da7: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da6: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da6: 600.000MB/s transfers
    da6: Command Queueing enabled
    da6: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da5: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da5: 600.000MB/s transfers
    da5: Command Queueing enabled
    da5: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da4: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da4: 600.000MB/s transfers
    da4: Command Queueing enabled
    da4: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da3: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da3: 600.000MB/s transfers
    da3: Command Queueing enabled
    da3: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da2: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da2: 600.000MB/s transfers
    da2: Command Queueing enabled
    da2: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da1: <ATA Hitachi HUS72402 A3B0> Fixed Direct Access SCSI-6 device 
    da1: 600.000MB/s transfers
    da1: Command Queueing enabled
    da1: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
    da0: <LSI Logical Volume 3000> Fixed Direct Access SCSI-6 device 
    da0: 150.000MB/s transfers
    da0: Command Queueing enabled
    da0: 27656MB (56639488 512 byte sectors: 255H 63S/T 3525C)

Raid controllers:
    ahcich0:
    mps0:
    vendor='LSI Logic / Symbios Logic'
    device='SAS2004 PCI-Express Fusion-MPT SAS-2 [Spitfire]'

GEOM Setup

Using 4096-byte sectors is not a given on FreeBSD. You have to run a few geom-related commands to get the sector size right. Once that is done and the pool is created, you can verify it with `zdb` (an ashift of 12 means 4096-byte sectors).

Since I did this a few dozen times between the 1TB drives, the 2TB drives, and two separate servers, I wrote a quick-and-dirty bash script to create the partitions and the 4K gnop devices:

#!/usr/bin/env bash

i=1

while [ $i -le 26 ]
do
    echo "creating GPT table"
    gpart create -s gpt da$i
    echo "creating partition"
    gpart add -t freebsd-zfs -l disk$i da$i
    echo "4k sector optimization"
    gnop create -S 4096 /dev/gpt/disk$i
    ((i++))
done

I had 26 drives; two of them are reserved as spares for the pool.

ZFS Setup

zpool create data raidz /dev/gpt/disk1.nop /dev/gpt/disk2.nop /dev/gpt/disk3.nop /dev/gpt/disk4.nop
zpool add data raidz /dev/gpt/disk5.nop /dev/gpt/disk6.nop /dev/gpt/disk7.nop /dev/gpt/disk8.nop
zpool add data raidz /dev/gpt/disk9.nop /dev/gpt/disk10.nop /dev/gpt/disk11.nop /dev/gpt/disk12.nop
zpool add data raidz /dev/gpt/disk13.nop /dev/gpt/disk14.nop /dev/gpt/disk15.nop /dev/gpt/disk16.nop
zpool add data raidz /dev/gpt/disk17.nop /dev/gpt/disk18.nop /dev/gpt/disk19.nop /dev/gpt/disk20.nop
zpool add data raidz /dev/gpt/disk21.nop /dev/gpt/disk22.nop /dev/gpt/disk23.nop /dev/gpt/disk24.nop
zpool add data spare /dev/gpt/disk25.nop /dev/gpt/disk26.nop
zpool add data log /dev/gpt/log.nop
zpool add data cache /dev/gpt/cache.nop
zfs set checksum=fletcher4 data
zfs set compression=lzjb data
zfs set aclmode=passthrough data
zfs set aclinherit=passthrough data
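
With six single-parity raidz vdevs of four disks each, the usable capacity is easy to sketch. The numbers below are my own back-of-the-envelope math (nominal decimal 2TB per drive, ignoring ZFS metadata and reservation overhead), not something from the original build notes:

```shell
#!/bin/sh
# Rough usable capacity for the layout above:
# six raidz (single-parity) vdevs of four 2TB drives each.
vdevs=6
disks_per_vdev=4
parity_per_vdev=1   # raidz1 gives up one disk per vdev to parity
tb_per_disk=2       # nominal (decimal) capacity per drive

data_disks=$(( vdevs * (disks_per_vdev - parity_per_vdev) ))
usable_tb=$(( data_disks * tb_per_disk ))
echo "${data_disks} data disks, ~${usable_tb}TB usable before filesystem overhead"
```

Narrow 4-disk vdevs give up a full 25% of raw space to parity, but buy you more independent vdevs to stripe across, which fits the performance-over-capacity goal stated earlier.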

You may notice I have separate log and cache devices. We purchased this server with two 30GB SSDs configured as a hardware RAID1 for the ZIL, plus a 128GB SSD for the L2ARC.

This is a justified setup, in my opinion. We export this data over both NFS and CIFS, and we deal with hundreds of thousands of images accessed from SCADA-type systems as well as workstations. The point is, there are a lot of people looking at a lot of data, and sometimes it’s the same dataset, which is where the L2ARC comes in handy.

Here is a simple dd run:

root@zfs-1:/data # dd if=/dev/zero of=/data/test.dat bs=128k count=100000
100000+0 records in
100000+0 records out
13107200000 bytes transferred in 30.188338 secs (434180911 bytes/sec)

And here, for comparison, was the same test on the 1TB drives:

root@zfs-1:/data # dd if=/dev/zero of=/data/test.dat bs=128k count=100000
100000+0 records in
100000+0 records out
13107200000 bytes transferred in 66.972758 secs (195709425 bytes/sec)
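
To put the two runs side by side, here is the arithmetic: just the dd numbers above converted to decimal MB/sec with awk, nothing new measured:

```shell
#!/bin/sh
# Same byte count in both dd runs; only the elapsed time differs.
bytes=13107200000
secs_2tb=30.188338   # run on the 2TB-drive pool
secs_1tb=66.972758   # run on the 1TB-drive pool

mb_2tb=$(awk -v b="$bytes" -v s="$secs_2tb" 'BEGIN { printf "%.0f", b / s / 1000000 }')
mb_1tb=$(awk -v b="$bytes" -v s="$secs_1tb" 'BEGIN { printf "%.0f", b / s / 1000000 }')
ratio=$(awk -v a="$secs_1tb" -v b="$secs_2tb" 'BEGIN { printf "%.1f", a / b }')
echo "2TB pool: ${mb_2tb} MB/s, 1TB pool: ${mb_1tb} MB/s (${ratio}x faster)"
```

Roughly a 2.2x improvement on the same sequential write test, which matches the 250MB/sec vs. 440MB/sec figures mentioned at the top.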