by John Macy
Technical Solutions Expert, Healthcare Data Management

In 2010, Robin Harris wrote that “RAID 6 will stop working in 2019”. This doesn’t come as a huge surprise: Harris had previously commented that RAID 5 would stop working in 2009. And, sure enough, RAID 5 ceased to be recommended by storage vendors who, instead, suggested RAID 6 as a better alternative.

Harris’ theory on RAID 5 was that, at the then-current data growth rates and the resulting expansion in disk drive sizes, the risk of data loss was rising. He felt that RAID 5 simply wasn’t good enough. In short, with RAID 5 on a 12TB disk system, there is a 62% chance of losing data while rebuilding the array after a drive failure. A 62% chance of data loss is not protection, particularly for a hospital where patient care could be impacted.

This new threat level comes from the fact that drives have expanded in size to hold more data, while the error rates have held relatively steady: the same error rate, but multiplied by larger amounts of data, equals a higher risk that data will be lost. Think about the exponential growth of data within a hospital, such as the surge in medical images, the move towards paperless environments and the introduction of the EHR, and this threat becomes hard to ignore.
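
The arithmetic behind a figure like 62% can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the unrecoverable read error rate of one per 10^14 bits is a typical consumer-drive specification assumed here, not a number taken from this article, and the function name is made up for the example. It also treats every bit read as an independent trial, which real drives only approximate.

```python
import math


def rebuild_error_probability(data_read_tb: float, ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error while reading data_read_tb.

    Assumes decimal terabytes and independent bit errors; the default error
    rate (1 per 10^14 bits) is an assumed drive specification.
    """
    bits_read = data_read_tb * 1e12 * 8  # terabytes -> bits
    # 1 - (1 - p)^n, written with log1p/expm1 so the tiny p does not round away
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Same error rate, more data to read: the risk climbs with capacity.
for tb in (2, 12, 24):
    print(f"{tb:>2} TB read: {rebuild_error_probability(tb):.0%} chance of a read error")
```

Under these assumptions, reading 12TB gives roughly a 62% chance of hitting at least one read error, and the chance keeps rising as the amount of data to be read grows.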

By contrast, RAID 6 enables data stored in the disk array to survive a two-drive failure. So, if a second drive fails while the array is rebuilding after the first failure, you can still rebuild the data. But, by 2019, the amount of data being stored will raise the risk that a third drive fails during a rebuild, and with it the risk that data will be lost.

So, what’s next? Some, like Adam Leventhal, have called for triple-parity RAID. In his late 2009 article, Leventhal stated: “Perhaps even more ominously, in a few years, reconstruction will take so long as to effectively strip away a level of redundancy”, which in turn makes it impossible to keep adding new parity levels. At some point, the RAID approach itself will break down and stop working due to the large amounts of data under management.
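
To see why reconstruction time grows with drive size, here is a rough Python sketch. The 100 MB/s sustained rebuild rate and the function name are assumptions chosen for illustration, not measurements; real rebuilds also compete with production I/O and are often slower.

```python
def rebuild_hours(drive_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours needed to rewrite one replacement drive at a sustained rate.

    Uses decimal units (1 TB = 1,000,000 MB). The rate is an assumed value.
    """
    drive_mb = drive_tb * 1_000_000
    return drive_mb / rebuild_mb_per_s / 3600  # seconds -> hours


# Capacity grows while the assumed rebuild rate stays fixed.
for tb in (1, 4, 12):
    print(f"{tb:>2} TB drive at 100 MB/s: {rebuild_hours(tb, 100):.1f} hours")
```

At that assumed rate a 12TB drive takes well over a day to rebuild, and the array runs with reduced redundancy for the whole of that window.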

But none of this comes for free: RAID 6 already consumes more of the disk capacity that you thought you were buying to store your expanding healthcare data. RAID 6 also takes longer to rebuild and recover than RAID 5. Furthermore, if you think that data de-duplication makes this better, think again. De-duplication just ups the ante on the stored data. For example, if you are using a 12TB disk system and de-duping data at a 10:1 ratio, then your data at risk on the storage device is effectively 120TB rather than 12TB.
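
Both costs can be put into numbers with a short Python sketch. The eight-drive array is a hypothetical example and the function names are made up for illustration. The parity counts (one drive's worth for RAID 5, two for RAID 6) are standard, and the 12TB system and 10:1 ratio come from the example above.

```python
def usable_fraction(total_drives: int, parity_drives: int) -> float:
    """Fraction of raw capacity left for data after parity is set aside."""
    if total_drives <= parity_drives:
        raise ValueError("array needs more drives than parity drives")
    return (total_drives - parity_drives) / total_drives


def logical_tb_at_risk(physical_tb: float, dedupe_ratio: float) -> float:
    """Logical data that depends on the physical capacity after de-duplication."""
    return physical_tb * dedupe_ratio


drives = 8  # hypothetical array size
print(f"RAID 5 usable: {usable_fraction(drives, 1):.1%}")  # one drive of parity
print(f"RAID 6 usable: {usable_fraction(drives, 2):.1%}")  # two drives of parity
print(f"Data at risk: {logical_tb_at_risk(12, 10):.0f} TB")  # 12TB at 10:1
```

In this hypothetical eight-drive array, moving from RAID 5 to RAID 6 drops usable capacity from 87.5% to 75%, while the 10:1 de-duplication ratio means a loss on the 12TB system touches 120TB of logical data.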

So, what is health IT to do? Here are some thoughts about how to approach risk mitigation in relation to healthcare data management:

  1. RAID at any level is not a replacement for backup protection. Backup offers many benefits that RAID cannot provide, including: safeguarding against catastrophic failures of multiple drives; offering a comprehensive disaster recovery solution; and enabling long-term data retention. Well-designed backup systems also provide faster recovery of data following data loss, corruption or failures. When used in combination with archive technologies that remove stale and infrequently used data from storage systems, backup protection can help to relieve the pressure on disk storage systems, effectively extending their life by freeing up room for more data.
  2. Spread out your risk by spreading out your data among multiple storage types. The strategy of putting all of your eggs in one basket is never good. That’s as true for data as it is for other critical resources. Spreading your data among tiers of storage, including tier 1 high-performance storage, SATA drives and even tape, can significantly lower your risk of data loss. It is extremely unlikely that a single failure will affect three or more types of media at the same time.
  3. No one approach is a silver bullet. If your disk vendor suggests that you can solve your problem with more disk, then perhaps it is because he or she has disk to sell. Taking a balanced approach that combines strategies and media types is more likely to lower your risk than trying to base your solution on a single strategy or approach.