… otherwise known as when is a sync() not a sync()?
Recently I ran some performance tests on disk I/O, from both Java and C-based applications. The nature of the applications is such that they require transactional logging for reliability, and therefore need a guarantee that data has been written to disk. After running some simple write tests, I noticed an order of magnitude difference in performance between a couple of machines. This got me thinking about the impact of disk write synchronization, and what kind of differences would lead to this delta in performance.
Java makes explicit synching a relatively simple task. For the standard I/O library, you need to obtain a FileDescriptor instance, and from there you can invoke sync(). If you’re using NIO, FileChannel and MappedByteBuffer each expose a force() method. These methods map down to an operating system call (e.g. fsync()) which will force all outstanding I/O for the file to be written to disk.
Or so, based upon the documentation, you would be led to believe.
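To make the two approaches concrete, here is a minimal sketch of both. The class name, method names and the temporary log file are made up for illustration; only FileDescriptor.sync() and FileChannel.force() come from the Java API itself.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SyncExample {

    // Standard I/O: append a record, then sync via the stream's FileDescriptor.
    static void writeWithSync(Path file, byte[] record) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file.toFile(), true)) {
            out.write(record);
            // Throws SyncFailedException if the sync cannot be guaranteed.
            out.getFD().sync();
        }
    }

    // NIO: append a record through a FileChannel, then force it out.
    static void writeWithForce(Path file, byte[] record) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(record);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // true = also flush file metadata, not just the content.
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical transaction log; a temp file keeps the sketch self-contained.
        Path log = Files.createTempFile("txlog", ".dat");
        byte[] record = "commit 1\n".getBytes(StandardCharsets.UTF_8);

        writeWithSync(log, record);
        writeWithForce(log, record);

        System.out.println(Files.size(log) + " bytes written and synced");
        Files.delete(log);
    }
}
```

Both calls return once the operating system reports that the data has been handed to the device.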
However, there’s one little thing missing from this description. It is the fact that modern hard drives commonly have an on-disk write cache. This helpful little cache, when enabled, provides a considerable performance gain for disk writes. It also helps most hard drive manufacturers boast some pretty impressive performance figures, but I’ll leave that point to another discussion. The downside to this cache is that in the case of a system failure (power, OS crash, etc.), there’s a fair chance that there’ll be some data in the cache which is not on the disk. For a transaction processing system, this could be fatal: data that the application believes is safely on disk may have been lost. Now, I must mention that there are some drives that have a battery-backed write cache, and higher end RAID controllers also have equivalent battery-backed stores.
Back to the performance tests: What I haven’t mentioned so far is that there is a good chance that the write cache has been enabled by default. How do you find out if it is enabled? On Windows, it is as simple as opening up Device Manager, drilling down to your Hard Drive, bringing up its properties and selecting the ‘Disk Properties’ tab. You should see a checkbox indicating whether the write cache is enabled. If it is not enabled, you’ll get the following helpful message when you enable it:
“By enabling write caching, file system corruption and/or data loss could occur if the machine experiences a power, device or system failure and cannot be shutdown properly.”
Some good advice from Microsoft!
If you’re running Linux, the hdparm utility allows you to enable or disable write caching for IDE-based drives. Be very careful with hdparm, as it can do a lot of nasty stuff to your hard drive.
/sbin/hdparm -W 0 /dev/hda    (disable write caching)
/sbin/hdparm -W 1 /dev/hda    (enable write caching)
For my Linux tests, changing this setting instantly clarified the difference in performance. On one Linux system, configured with a 40GB 5400RPM IDE drive, the write cache had been enabled by default. This system had shown 10x the performance of the other system, which was configured with an 18GB 10000RPM SCSI drive. Disabling the write cache on the 40GB drive brought the performance back down below that of the 18GB drive, as expected.
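For anyone who wants to try a similar comparison, a simple write test along these lines might look like the sketch below. This is not the original test code; the class name, record size and record count are arbitrary choices for illustration. It writes a series of small records, forcing each one to disk, and reports the rate.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SyncWriteBenchmark {

    // Writes `count` records of `size` bytes, calling force() after each one.
    // Returns the elapsed time in nanoseconds.
    static long timeSyncedWrites(Path file, int count, int size) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(size); // zero-filled payload
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            long start = System.nanoTime();
            for (int i = 0; i < count; i++) {
                buf.clear(); // reset position so the whole buffer is written again
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                ch.force(false); // content only, no metadata
            }
            return System.nanoTime() - start;
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass a path on the drive under test; otherwise a temp file is used,
        // which may live on a different device (or in memory) on some systems.
        boolean temp = args.length == 0;
        Path file = temp ? Files.createTempFile("syncbench", ".dat") : Paths.get(args[0]);

        int count = 200; // arbitrary
        int size = 512;  // arbitrary
        long nanos = timeSyncedWrites(file, count, size);

        System.out.printf("%d synced writes in %.1f ms (%.0f writes/s)%n",
                count, nanos / 1e6, count / (nanos / 1e9));
        if (temp) {
            Files.delete(file);
        }
    }
}
```

As a rough rule of thumb, a 5400RPM drive completes 90 rotations per second, so a sustained rate of thousands of small synced writes per second is a hint that the writes are being acknowledged from a cache rather than from the platter.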
Are there any alternatives to write caching? There’s a concept called Tagged Command Queuing, which uses intelligent algorithms to map disk commands to the rotational and seeking characteristics of the drive. This is commonly supported across SCSI drives, and was introduced as part of the ATA-4 IDE spec. It is also supported within newer Serial-ATA devices. This requires support from the disk I/O drivers, and I’ve yet to investigate implementations of this feature.
What’s the key take-away from this post? Well, writing reliable code is only half the challenge. The hardware platform and its constituent devices need to be carefully tuned to ensure that integrity is maintained and optimal reliability is achieved.
Check out the following links for more reading on this topic:
Microsoft Support How To: Manually Enable/Disable Disk Write Caching
Apple: Technote discussing Write Cache Flushing
EXT3 and disk write back cache
Experiments on Disk Write Back Caches
IDE write ordering
Questions regarding Journalling-FSes and w-cache recording