Blog  –  12 February 2014

What is a Root Cause of Failures in Hospital Backup and Recovery?

By Jamie Clifton

What is it about hospital applications that makes them hard to back up and recover? In truth, many applications, not just hospital applications, are difficult to manage. However, hospital applications are typically more critical because of the nature of the data they create, so protecting and managing that data becomes more of a challenge.

In this blog entry, we examine the structure of a generic application to understand how backup and recovery can be improved by matching that structure. We also identify the application component that is most often at the root of backup and recovery failures.

In summary, hospitals working with generic backup and recovery technologies will inevitably encounter limitations when their applications grow too large. In particular, application objects present unique challenges to hospital IT teams in designing effective data protection strategies. Read this blog entry for details on what is available, and the trade-offs of each option.

The Structure of a Typical Application

While applications perform a huge variety of tasks, they are generally architected in very similar ways. As the figure illustrates, an application typically found in a hospital consists of three core elements:

  1. The application services, e.g. the executables, software and infrastructure.
  2. A database, or a connection to a database (such as Microsoft SQL Server). The database files are generally highly volatile and typically require a high level of system performance.
  3. Objects – these can be files stored in a file system or other more defined containers, e.g. emails or DICOM (.DCM) files. They can be huge in volume and size, but tend to be static in nature.

Some applications will have larger database requirements, while others generate more objects. So, the make-up of applications found within a hospital may differ, but the base components remain the same.

Designing Application Protection to Match the Structure

Every organization needs to ensure their applications are sufficiently protected; in this regard they share the same disaster recovery (DR) requirements, namely:

  • Full recovery of vital applications if the entire facility were to fail.
  • Full recovery to a point in time of an application should that application, or its hardware, fail (the recovery point objective, RPO).
  • Partial recovery of objects, within the application, if a user or application has made an error.

In all of these cases, organizations want recovery to occur as seamlessly and quickly as possible (the recovery time objective, RTO).
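The two objectives can be made concrete with a small check. The following is a minimal sketch, not a feature of any particular backup product: the function names, the dates and the 24-hour and 4-hour targets are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def meets_rpo(last_good_backup: datetime, failure_time: datetime, rpo: timedelta) -> bool:
    """True if the data lost (time since the last usable backup) is within the RPO."""
    return failure_time - last_good_backup <= rpo


def meets_rto(restore_duration: timedelta, rto: timedelta) -> bool:
    """True if the time taken to restore service is within the RTO."""
    return restore_duration <= rto


# Hypothetical incident: failure at 09:00, last usable backup at 23:00 the night before.
failure = datetime(2014, 2, 12, 9, 0)
last_backup = datetime(2014, 2, 11, 23, 0)

rpo_ok = meets_rpo(last_backup, failure, rpo=timedelta(hours=24))   # 10 hours of data at risk
rto_ok = meets_rto(timedelta(hours=6), rto=timedelta(hours=4))      # restore took 6 hours

print("RPO met:", rpo_ok)
print("RTO met:", rto_ok)
```

In this made-up case the backup is recent enough to satisfy the RPO, but the restore is too slow to satisfy the RTO, which is the situation a hospital tends to find itself in when the volume of application objects grows.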

If a hospital is struggling to protect an application effectively, i.e. not meeting the requirements in the list above, then it’s most frequently because the application objects have grown to a size or number where traditional backup methods no longer work to best effect.

What are the Problems a Large Application Causes in DR Terms?

Of the three application components, the database and the application services tend not to cause undue problems in backup terms: both are likely to have well-defined backup interfaces and are generally smaller in physical size. That said, protecting them is still a critical task.

In most cases, the objects that an application creates are much more likely to cause DR to fail. This is primarily due to:

  1. Size: the overall size of the objects associated with the application, which is often measured in TB.
  2. Volume: the number of distinct objects that make up the application.
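Size and volume both add to backup time, and a rough estimate shows why. The sketch below assumes a deliberately simple model (transfer time plus a fixed overhead per object); the throughput, per-object overhead and 8-hour window are hypothetical figures, not measurements.

```python
def full_backup_hours(total_tb: float, object_count: int,
                      throughput_mb_s: float, per_object_overhead_ms: float) -> float:
    """Estimate the duration of a full backup in hours.

    Size drives the transfer time; volume (object count) drives the
    per-object overhead of scanning, opening and cataloguing each object.
    """
    transfer_s = total_tb * 1_000_000 / throughput_mb_s        # 1 TB = 1,000,000 MB (decimal)
    overhead_s = object_count * per_object_overhead_ms / 1000  # milliseconds to seconds
    return (transfer_s + overhead_s) / 3600


# Hypothetical application: 20 TB spread across 50 million objects.
hours = full_backup_hours(total_tb=20, object_count=50_000_000,
                          throughput_mb_s=200, per_object_overhead_ms=2)
backup_window_hours = 8  # assumed overnight window

print(f"Estimated full backup: {hours:.1f} h")
print("Fits in window:", hours <= backup_window_hours)
```

With these assumed figures, half of the elapsed time comes from the sheer number of objects rather than the amount of data, so faster storage alone would not bring the backup inside the window.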

Many applications found in hospitals, today, are sizeable and contain a large number of objects – and this creates challenges in terms of their protection. There are three main protection strategies that organizations rely on to safeguard their applications: 1) Image Level Protection; 2) Object Level Protection; and 3) Replication.

1) Image Level (Block Level) Protection

Image Level (or Block Level) Protection (not to be confused with protection of medical images) is where an area on a disk containing an application is protected block-by-block as opposed to file-by-file. To make this image usable on restore, i.e. application consistent, the protection must be ‘orchestrated’. During orchestration, the application is instructed to prepare for a backup before the backup is carried out. This ensures all data in use is flushed to disk, so that the image is a true and full representation of the application.

This type of protection gives a rapid restore of the entire application, again block-by-block.
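The orchestration sequence (prepare the application, take the image, release the application) can be sketched as follows. `DemoApplication`, `quiesce`, `resume` and `take_image` are hypothetical stand-ins invented for this example; real applications and backup tools expose their own interfaces for this.

```python
from contextlib import contextmanager


class DemoApplication:
    """Hypothetical application holding some writes in memory."""

    def __init__(self):
        self.pending_writes = ["row-1", "row-2"]  # data not yet on disk
        self.on_disk = []
        self.frozen = False

    def quiesce(self):
        # Flush in-memory data to disk and pause new writes.
        self.on_disk.extend(self.pending_writes)
        self.pending_writes.clear()
        self.frozen = True

    def resume(self):
        self.frozen = False


@contextmanager
def orchestrated(app):
    """Keep the application quiesced only for as long as the image is being taken."""
    app.quiesce()
    try:
        yield app
    finally:
        app.resume()  # always release the application, even if the backup fails


def take_image(app):
    # Stand-in for a block-level copy: captures whatever is on disk right now.
    return list(app.on_disk)


app = DemoApplication()
with orchestrated(app):
    image = take_image(app)

print(image)
```

Without the quiesce step, the image in this toy example would be empty, because the pending writes would never have reached the disk. The `finally` clause reflects the point made under Points Against below: the freeze affects the running application, so it should be as short as possible.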

Points For:

  • Efficient backup method for large disk volumes
  • Ideally suited for system level recovery

Points Against:

  • Poor for object level recovery, as individual objects are not recognized by the backup
  • Each image, containing the entire application, can be very large
  • To get a perfect image backup, the source data must be frozen at a point in time. This can affect the operation of the application.

2) Object Level Protection

The backup solution scans all of the individual objects associated with an application to decide what to back up. This means that recovering those objects, at a granular level, is more easily achieved. This protection method normally requires a system of full and incremental backup cycles.

Incremental backups are used with large applications where, due to time constraints on backup, it is only feasible to look for and protect new or changed objects, knowing that the other objects have been previously protected by a full backup.
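The selection step of an incremental backup can be sketched in a few lines. This assumes, purely for illustration, that each object carries a last-modified timestamp and that comparing it with the time of the previous backup is enough to detect change; the object names and timestamps are made up.

```python
def select_for_incremental(objects: dict, last_backup_time: float) -> list:
    """Return the names of objects created or modified since the last backup.

    `objects` maps an object name to its last-modified timestamp.
    """
    return sorted(name for name, modified in objects.items()
                  if modified > last_backup_time)


# Hypothetical catalogue: object name -> last-modified time (arbitrary units).
catalogue = {
    "study-001.dcm": 100,   # unchanged since the full backup
    "study-002.dcm": 150,   # unchanged since the full backup
    "study-003.dcm": 220,   # new since the last backup
    "report-17.pdf": 205,   # modified since the last backup
}

changed = select_for_incremental(catalogue, last_backup_time=200)
print(changed)
```

Note that every object still has to be examined to make this decision, so the scan itself grows with the number of objects even when very few of them have changed.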

Points For:

  • Good for object recovery
  • Good for smaller systems

Points Against:

  • Struggles as the application grows, mainly because the full backup starts to exceed the backup window
  • Restore can be a complex procedure of recovering full and incremental backups in the right order.
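The ordering problem in the last point can be illustrated with a short sketch. It assumes a simple scheme in which each incremental holds the changes since the previous backup of either kind; the `Backup` type and the day numbers are hypothetical.

```python
from typing import NamedTuple


class Backup(NamedTuple):
    taken_at: int   # e.g. day number
    kind: str       # "full" or "incremental"


def restore_chain(backups, target):
    """Return the backups to apply, in order, to recover the state at `target`."""
    candidates = [b for b in backups if b.taken_at <= target]
    fulls = [b for b in candidates if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup exists at or before the target time")
    base = max(fulls, key=lambda b: b.taken_at)  # most recent full backup
    incrementals = sorted(
        (b for b in candidates
         if b.kind == "incremental" and b.taken_at > base.taken_at),
        key=lambda b: b.taken_at,
    )
    return [base] + incrementals


history = [
    Backup(0, "full"), Backup(1, "incremental"), Backup(2, "incremental"),
    Backup(7, "full"), Backup(8, "incremental"), Backup(9, "incremental"),
]

chain = restore_chain(history, target=8)
print(chain)
```

Every backup in the chain must be available and intact for the restore to succeed, which is part of what makes the procedure fragile as the number of incrementals grows.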

3) Replication

Protection via replication is usually provided by a hardware solution and is generally used in hospital environments where high availability is required. For a number of reasons, this method, on its own, isn’t suitable for application recovery. Since the replicated data is normally only ‘crash consistent’, not ‘application consistent’, it would need repair before being usable by an application. Another flaw with replication is that erroneous data is copied as efficiently as good data. As a result, the usefulness of the replicated data, when it comes to recovering an application, is often reduced. As we will see later, it is only when a Point-in-Time, application consistent copy is created, and historical backup versions retained, that replication can be used successfully for application protection and recovery.
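A toy model shows why a replica alone does not protect against bad data, and why retained point-in-time copies do. Everything here (the dictionaries standing in for storage, the `write` and `take_point_in_time_copy` helpers, the key and values) is an assumption made for the example.

```python
import copy

primary = {}    # production storage
replica = {}    # mirrored storage
versions = []   # retained point-in-time copies


def write(key, value):
    """Every write to the primary is mirrored to the replica, good or bad."""
    primary[key] = value
    replica[key] = value


def take_point_in_time_copy():
    """Keep an independent historical version of the primary."""
    versions.append(copy.deepcopy(primary))


write("patient-42", "scan-v1")
take_point_in_time_copy()
write("patient-42", "CORRUPTED")   # an application or user error

print("replica:", replica["patient-42"])
print("last retained version:", versions[-1]["patient-42"])
```

The replica faithfully holds the corrupted value, while the earlier retained version still holds the good one.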

Points For:

  • Easy to implement, as it’s likely to be part of a hospital’s existing storage assets, which are used for high availability
  • Fast operation

Points Against:

  • Replicates errors
  • Does not orchestrate application backup
  • Does not manage the recovery process
  • Expensive – the hospital will be restricted to using hardware that can support it
  • Vendor lock-in
  • Disk only, no copy of last resort on separate media, e.g. tape.