By Tim Kaschinske

Unstructured data is data that is stored in files rather than databases.  It is unstructured in that it is generally not organized for easy access.  There may be an application database with links to the files on disk that allows easy access from within that application, but access from outside the application is much more difficult.

In healthcare environments, unstructured data is growing dramatically.  This trend began with the advent of picture archiving and communication systems (PACS), which store radiology images in DICOM files.  Even for small healthcare environments, the unstructured data stored by PACS often amounts to several terabytes.  For large healthcare environments it often amounts to more than a petabyte.

It didn’t stop with PACS, however.  Scanned documents now take up a growing portion of the storage maintained within healthcare.  Procedures that use video store that data in digital formats, and reports are moving from plain text to PDF because PDF can be viewed almost anywhere.

As a result, healthcare is experiencing an explosion of unstructured data, and it needs storage and data management that can cope with the sheer volume being produced.

Management Challenged by Multiple Data Silos

Unstructured data is not just randomly sitting out on storage devices.  In reality, it is created and managed by different applications within the healthcare environment.  Because these applications are often specific to different healthcare departments, each produces an independent silo of unstructured data.  While storage for this unstructured application data can often be consolidated, the silos themselves still cause problems.

As mentioned, the silo implementation of applications and their resulting unstructured data mirrors the physical world of different departments.  The applications and unstructured data are implemented to solve specific problems that exist within the department.  For example, the Radiology Information System (RIS) exists to solve problems related to the scheduling, fulfillment and reimbursement of radiology procedures.  The PACS exists to manage radiology images and eliminate the problems related to film in a radiology department.

These applications and resulting unstructured data are very good at solving those departmental problems, but they are not so good at solving problems that involve multiple departments.

Simply exchanging data between applications can be problematic unless both applications support the same well-defined exchange method.  The Hospital Information System (HIS) and Radiology Information System (RIS) are required to exchange patient information for the purpose of billing.  Both the HIS and RIS support the use of HL7 messaging to ensure that this information is exchanged properly.  Not all systems support HL7, however, and some that do support only the specific HL7 messages related to the work they do within the department.
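
As a rough illustration of why partial HL7 support matters, the sketch below parses a pipe-delimited HL7 v2 message using only the Python standard library and checks its message type against the set a receiving system accepts.  The sample message, the application names inside it, the `RIS_SUPPORTED` set, and both helper functions are hypothetical and are not taken from any particular HIS or RIS product.

```python
def parse_hl7_v2(message):
    """Split an HL7 v2 message into lists of fields, grouped by segment ID."""
    segments = {}
    # Segments are separated by carriage returns, fields by the pipe character.
    for line in message.strip().split("\r"):
        fields = line.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


def is_supported(message_type, supported_types):
    """Return True if the receiver accepts this message type."""
    return message_type in supported_types


# Hypothetical patient registration message sent from a HIS to a RIS.
SAMPLE = "\r".join([
    r"MSH|^~\&|HIS|HOSPITAL|RIS|RADIOLOGY|202401011200||ADT^A04|MSG0001|P|2.3",
    "PID|1||12345^^^HOSPITAL^MR||DOE^JANE||19700101|F",
])

# Hypothetical list of message types this RIS is assumed to accept.
RIS_SUPPORTED = {"ADT^A04", "ORM^O01"}

segments = parse_hl7_v2(SAMPLE)
msh = segments["MSH"][0]
pid = segments["PID"][0]

# In MSH the field separator itself counts as MSH-1, so MSH-9 is at index 8.
message_type = msh[8]
# PID-3 is the patient identifier list; its first component is the ID value.
patient_id = pid[3].split("^")[0]
# PID-5 is the patient name, with family and given name as components.
family_name, given_name = pid[5].split("^")[:2]

print(message_type, patient_id, family_name, given_name)
print("accepted" if is_supported(message_type, RIS_SUPPORTED) else "rejected")
```

A message type outside `RIS_SUPPORTED` would be rejected even though both systems nominally "support HL7", which is the gap described above.  Real interfaces would more likely rely on an interface engine or a dedicated HL7 library than on hand-written string splitting.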

A radiology PACS and a cardiology PACS can exchange images using DICOM, but that does not guarantee that a radiology PACS can display multi-frame cardiology images.  Applications that scan documents into images do not directly support DICOM.  Oncology often supports DICOM, but pathology typically does not.  Exchanging data between applications in this environment is problematic at best.

Searching for Patient Data Across Multiple Applications

Performing a search for patient data across multiple applications can also be problematic, if not impossible.  Imagine trying to find all of the data for a patient who has had a radiology procedure, multiple biopsies in pathology, and a cancer treatment plan in oncology.  A search like this often requires coordination between the applications.  Because that coordination does not exist in many healthcare environments, users must access each application individually to obtain the required data.  Users also often need to know in advance that the data exists within an application before they can search for it there.
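
One way to picture the missing coordination is a thin layer that sends the same patient query to every departmental application and merges the results.  The sketch below is a minimal illustration under strong assumptions: the three in-memory silos, the `Record` type, and the `search_all_silos` function are hypothetical, and every silo is assumed to use the same patient ID, which is often not true in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source: str       # departmental application that owns the record
    patient_id: str   # assumed to be shared across silos (often untrue)
    description: str


# Hypothetical stand-ins for three departmental applications.
SILOS = {
    "radiology": [
        Record("radiology", "12345", "CT chest"),
        Record("radiology", "67890", "MRI knee"),
    ],
    "pathology": [
        Record("pathology", "12345", "Biopsy 1"),
        Record("pathology", "12345", "Biopsy 2"),
    ],
    "oncology": [
        Record("oncology", "12345", "Treatment plan"),
    ],
}


def search_all_silos(patient_id, silos):
    """Query every silo for one patient and merge the hits into one list."""
    hits = []
    for records in silos.values():
        hits.extend(r for r in records if r.patient_id == patient_id)
    return hits


results = search_all_silos("12345", SILOS)
for record in results:
    print(f"{record.source}: {record.description}")
```

In a real environment each dictionary entry would be replaced by an interface to a live application, and building and maintaining those interfaces is where most of the effort lies.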

Similarly, performing analytics on data from multiple applications can be equally problematic.  Often, the system performing the analytics needs an interface to each source system in order to acquire the data.  Without this capability, tasks such as analyzing the effectiveness of a specific drug within a healthcare environment can be difficult at best.

Barriers to Patient Identification

Then there are the problems associated with patient identification in various applications.  When one healthcare provider organization acquires another, the applications at the acquired organization use different local patient IDs to identify patients.  Some organizations have implemented a Master Patient Index (MPI) to help resolve this problem, but lately there have been acquisitions where each organization has its own MPI.  Having multiple MPIs within an organization reduces the effectiveness of each MPI and further complicates the task of finding all the data for a patient across applications and organizations.
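
The effect of having two MPIs can be sketched as a lookup problem.  In the minimal example below, each MPI maps an application's local patient ID to that organization's enterprise ID, and a crosswalk links the two sets of enterprise IDs.  All table contents, application names, IDs, and the `resolve_enterprise_id` function are hypothetical, and real MPIs use probabilistic matching on demographics rather than fixed tables.

```python
# Hypothetical MPI tables: (application, local patient ID) -> enterprise ID.
MPI_ORG_A = {
    ("A-RIS", "R100"): "EA-1",
    ("A-PACS", "P778"): "EA-1",
}
MPI_ORG_B = {
    ("B-PATHOLOGY", "L42"): "EB-9",
    ("B-ONCOLOGY", "T7"): "EB-3",
}

# Hypothetical crosswalk built after the acquisition: it maps organization
# B's enterprise IDs to organization A's wherever a match has been confirmed.
CROSSWALK_B_TO_A = {"EB-9": "EA-1"}


def resolve_enterprise_id(application, local_id):
    """Return a single enterprise ID for a local ID, or None if unknown."""
    key = (application, local_id)
    if key in MPI_ORG_A:
        return MPI_ORG_A[key]
    if key in MPI_ORG_B:
        org_b_id = MPI_ORG_B[key]
        # Without a crosswalk entry the patient stays split across two IDs.
        return CROSSWALK_B_TO_A.get(org_b_id, org_b_id)
    return None


print(resolve_enterprise_id("A-RIS", "R100"))
print(resolve_enterprise_id("B-PATHOLOGY", "L42"))
print(resolve_enterprise_id("B-ONCOLOGY", "T7"))
```

The third lookup returns organization B's own ID because no crosswalk entry exists for it, which is how a patient's data ends up divided between two identities.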

All of this makes it very difficult to assemble what is referred to as the “Longitudinal Patient Record,” which is basically all of the patient data across all applications in a healthcare environment.  It’s also been defined as all of the patient data across all applications in all healthcare environments.  While that is a worthy goal, assembling the record within a single healthcare environment is a challenge that needs to be resolved before expanding to multiple healthcare environments.

Some solutions do exist.  Organizations such as IHE are working to define methods for applications to exchange data using standards that already exist within healthcare.  While this approach works well when all applications support the standards, the process of upgrading every application to support them can be difficult.  Portals also exist to organize data, but they require an interface to each application whose data the portal needs to access.

Vendor neutral archives (VNAs) have had success in consolidating the unstructured data stored in DICOM from multiple PACS, but they have struggled to grow beyond their roots in radiology and DICOM.  What is needed is a Common Clinical Repository that can be a single source for all unstructured data, providing data management and a single point of access for that data.