On our SAN at work, we run ZFS from a single head node to provide storage virtualization, and export the result via NFS to the rest of the machines on our network. At first, performance was a dog, especially when writing many small files. This pointed to synchronous disk I/O, where every write has to reach the disk before the writing program can go on, instead of landing in a cache in RAM and being flushed later (aka asynchronous disk I/O). Sync’ed disk I/O can slow things down a lot, since software has to wait for each disk write to finish before it can continue, and the per-file overhead hurts far more with small files than with big ones.
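To get a feel for the difference on any given filesystem, here is a minimal Python sketch that writes the same batch of small files twice, once buffered and once with an fsync after every file. The function names, file count, and payload size are made up for illustration; this is not how NFS or ZFS implement sync writes, only a local way to see what waiting on every write costs.

```python
import os
import tempfile
import time


def write_small_files(directory, count, payload, sync):
    """Write `count` small files; force each one to disk when `sync` is True."""
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"file_{i:05d}.txt")
        with open(path, "wb") as f:
            f.write(payload)
            if sync:
                f.flush()             # push Python's buffer down to the OS
                os.fsync(f.fileno())  # block until the OS reports the data is on stable storage
    return time.perf_counter() - start


def compare(count=200, size=1024):
    """Return (buffered_seconds, synced_seconds) for the same workload."""
    payload = b"x" * size
    with tempfile.TemporaryDirectory() as buffered_dir, \
            tempfile.TemporaryDirectory() as synced_dir:
        buffered = write_small_files(buffered_dir, count, payload, sync=False)
        synced = write_small_files(synced_dir, count, payload, sync=True)
    return buffered, synced


if __name__ == "__main__":
    buffered, synced = compare()
    print(f"buffered: {buffered:.3f}s   fsync per file: {synced:.3f}s")
```

The gap between the two numbers depends heavily on the storage underneath. A controller with a battery-backed write cache can acknowledge an fsync almost immediately, while a bare spinning disk cannot.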
After some study, I set two ZFS module options in /etc/system, which gave more than a twofold increase in speed, especially when dealing with lots of small writes.
set zfs:zil_disable=1
set zfs:zfs_noforcecache=1
These two settings made untarring the Linux source code (dozens of megabytes of small files) go from over 15 minutes for just linux/Documentation to about 25 seconds. On the SAN head node itself the same operation takes under 15 seconds, so there’s still NFS overhead to worry about.
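For anyone who wants to repeat this kind of measurement, the following is a rough Python sketch of timing a tarball extraction into a chosen directory. The helper names and the synthetic tarball are assumptions for illustration; in practice you would point `time_extract` at a real kernel tarball and at a directory on the NFS mount, then again at a local directory on the head node.

```python
import io
import os
import tarfile
import tempfile
import time


def build_sample_tarball(path, file_count=100, size=512):
    """Create a gzipped tarball of many small files, standing in for a source tree."""
    with tarfile.open(path, "w:gz") as tar:
        for i in range(file_count):
            data = b"x" * size
            info = tarfile.TarInfo(name=f"sample/doc_{i:04d}.txt")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


def time_extract(tar_path, dest_dir):
    """Extract tar_path into dest_dir; return (elapsed_seconds, regular_file_count)."""
    os.makedirs(dest_dir, exist_ok=True)
    start = time.perf_counter()
    with tarfile.open(tar_path, "r:*") as tar:
        members = tar.getmembers()
        tar.extractall(dest_dir)
    elapsed = time.perf_counter() - start
    return elapsed, sum(1 for m in members if m.isfile())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        tarball = os.path.join(work, "sample.tar.gz")
        build_sample_tarball(tarball)
        # Swap this destination for a directory on the NFS mount to compare.
        seconds, files = time_extract(tarball, os.path.join(work, "out"))
        print(f"extracted {files} files in {seconds:.3f}s")
```

Running it once against the NFS mount and once locally on the head node separates the NFS overhead from the cost of the filesystem itself.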
I also know that using these settings really annoys the ZFS developers, as it rightly should, since it mucks about with internals and effectively neuters ZFS’s very ingenious and effective data protection schemes. One of them, the noforcecache option, is mitigated by our disk controller hardware having battery backup for its on-board disk cache. The zil_disable setting, however, is backed only by the SAN running off UPS systems. Hopefully the ZFS devs can make the ZIL (the ZFS Intent Log) work well with NFS in the future without essentially forcing sync’ed I/O.


2 Comments
These days, an unpacked kernel is hundreds of megabytes of small files. The 2.6.21.5 that comes with slack clocks in at 285MB. For a really interesting comparison, you should compare NFS’s behavior with SMB.
Hi again, peawee!
I just saw a demo of “the next days in storage”, from Isilon IQ clustered storage.
This is REALLY incredible, magic, and so on.
The keyword here is not ZFS but OneFS.
It handles everything.
By using InfiniBand plus several nodes, reads and writes really use the 4GB of RAM in each node as cache.
The minimum setup is 3 nodes, so 12GB of “disk cache”.
-----
As for the SAS expander problem with SATA drives:
I just read that the Seagate ES.2 1TB drive WILL have a SAS version.
So magic happens.