Design and implementation of a health document

Exchanging and integrating medical information in the healthcare domain is a challenge. Indeed, the diversity of databases and the different representations of information sources make this exchange very difficult. Various standards (e.g. HL7: Health Level Seven; DICOM: Digital Imaging and Communications in Medicine) have been created to enable this exchange and to make health information systems interoperable. However, applying standardization requires changing the structure of existing healthcare systems. Our main purpose is to create a health document for exchanging health information between heterogeneous systems without changing the internal structure of those systems. The document uses XML to allow a structured and flexible exchange of healthcare data, which makes the exchange of healthcare data among heterogeneous health information systems simpler and more efficient. This paper addresses the problem of interoperability between health information systems. It summarizes the standards used to support interoperability in the healthcare domain and proposes a health document that enables the exchange of medical information across heterogeneous and distributed health information systems without imposing requirements or adjustments on those systems.


INTRODUCTION
The exchange of patients' medical information in distributed Health Information Systems (HIS) [5] is of great importance for mastering the patient care process. Many standards have been created to enable this exchange, such as HL7 [1], openEHR [4] and DICOM [2]. However, applying standardization requires changing the structure of existing patient records.
Several countries have created and developed their own Electronic Health Record (EHR) [3], for example France ("DMP"), Canada ("Health Infoway"), Taiwan ("TMT"), the UK ("NHS"), and Australia ("NEHTA"), and they are working to standardize a national EHR. These efforts have not yet achieved this objective.
In this context, we propose a health document that serves as a mediator between distributed HIS. Our objective is to provide a mediating health document, encoded in XML, for exchanging patient health information across healthcare systems. The structure of the health document is simple and can be read by a non-expert. The document does not impose any training or any adjustment of the system architecture. Using this document reduces the number of transformations needed between HIS from n×(n-1)/2 (one per pair of systems) to n (one per system).
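The reduction is simple arithmetic. With direct point-to-point exchange, every pair of systems needs its own translator. With a shared document, each system needs only a single mapping to and from it. The short Python sketch below compares both counts. The function names are illustrative only.

```python
def pairwise_translators(n: int) -> int:
    """One bidirectional translator for every pair of systems: n(n-1)/2."""
    return n * (n - 1) // 2


def pivot_translators(n: int) -> int:
    """One translator per system, to and from the shared health document."""
    return n


if __name__ == "__main__":
    for n in (3, 4, 10, 50):
        print(f"{n:>3} systems: {pairwise_translators(n):>5} pairwise "
              f"vs {pivot_translators(n):>3} via the document")
```

Both counts are equal for three systems. From four systems upward, the pairwise count grows quadratically while the document-based count grows linearly: ten systems need 45 pairwise translators but only 10 document mappings.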

BACKGROUND
Many efforts have been made to integrate heterogeneous systems in the healthcare domain. Gaynor et al. [20] presented a framework for specifying and analyzing the interoperability of medical systems. Within this framework, an Interoperability Matrix and its associated Interoperability Flow Graph represent different types of interoperability between related applications. This representation gives a visual view of interoperability attributes. It also supports the use of graph-theory algorithms to analyze the resource requirements of the infrastructure that enables interoperability between related applications. Lopez et al. [15] proposed a framework for realizing semantic interoperability in Health Information Systems. The framework consists of a set of principles and guidelines, together with methodologies and techniques, and is based on the Rational Unified Process (RUP) and formal software process engineering methods. To achieve this objective, they analyzed approaches to information systems architecture and harmonized them within the framework. Sahay et al. [21] provide a semi-automatic ontology alignment to resolve the heterogeneity between different versions of HL7 ontologies. They proposed the PPEPR ontology-building methodology for the HL7 standard and tried three ontology matching systems for aligning different ontologized versions of HL7, but concluded that simple manual mapping worked better. These projects are limited to specific uses. There is therefore an urgent need for a new, general-purpose solution. Our main objective is to break the isolation between HIS and enable them to communicate through a simple solution that imposes no requirements or adjustments on them.

Standards
Standards for healthcare have been created by a variety of organizations for various types or categories of interoperability in HIS. These standards allow HIS to communicate in a uniform way across systems. Below is a summary of key standards at the syntactic and semantic levels.

Logical Observation Identifiers Names and Codes (LOINC) [6]
-is a universal standard for identifying individual laboratory results and clinical observations. It facilitates the exchange of test results for clinical care, healthcare management, and research.

International Statistical Classification of Diseases (ICDx) [7]
-is an international standard for epidemiology, health management, and clinical purposes, maintained by the World Health Organization (WHO). It is used to classify diseases, signs and symptoms, abnormal findings, complaints, and social circumstances, including for billing purposes.

Health Level 7 (HL7) [1]
-is a standard for the exchange of data between healthcare applications. There are two major HL7 versions, HL7 V2.x and HL7 V3. Version 3 introduces a new approach to clinical information exchange: the Clinical Document Architecture (CDA), the Reference Information Model (RIM), and the Clinical Context Object Workgroup (CCOW). However, despite these innovations, version 3 has seen slow adoption. In addition, version 2.x has interoperability problems because of the variety of implementations by healthcare providers.

National Council for Prescription Drug Programs (NCPDP) [9]
-is a standard for transmitting prescription requests and fulfillment from pharmacies to payers.

Clinical Context Object Workgroup (CCOW) [11]
-is a standard for providing a comprehensive view and single sign-on capability across systems without integrating databases. CCOW specifies technology-neutral architectures, component interfaces, and data definitions, as well as an array of interoperable technology-specific mappings of these architectures, interfaces, and definitions. It is a vendor-independent standard developed by the HL7 organization.

Continuity of Care Record (CCR) [14]
-is a patient health record standard containing various sections of information about the patient (such as diagnoses and problem list, insurance information, medications, patient demographics, etc.). It is used to transmit information between health professionals and can easily be created by a physician using electronic health record software.

August 06, 2015

OpenEHR [4]
-is an open international standard specification in health informatics describing the health data in EHRs.

Clinical Document Architecture (CDA) [13]
-is a document markup standard that specifies the structure and semantics of clinical documents.

Digital Imaging and Communications in Medicine (DICOM) [2]
-is an international standard for the communication of medical images in radiology, cardiology, dentistry, and pathology. It is developed by the DICOM Standards Committee under the umbrella of the National Electrical Manufacturers Association (NEMA).

Continuity of Care Document (CCD) [10]
-is a standard specifying the encoding, structure, and semantics of a patient summary clinical document for exchange. CCD allows physicians to send electronic medical information to other providers without loss of meaning, enabling improved patient care. CCD is a US-specific implementation of CDA.

Classification of healthcare standards
In the healthcare domain, a number of standards have been created to address interoperability requirements at both the semantic and syntactic layers. These standards are organized into six categories [12]. Table 1 provides a complete classification of these standards.


- Messaging standards outline the structure, content, and data requirements of electronic messages to enable the effective and accurate sharing of information.
- Terminology standards provide specific codes for terminologies and classifications of clinical concepts such as diseases, allergies, and medications. Terminology systems assign a unique code or value to a specific disease or entity.
- Document standards indicate the type of information included in a document and the location of that information.
- Conceptual standards allow the transmission of information between systems without losing meaning and context.
- Application standards determine the implementation rules that software systems follow to interact with each other. For example, application standards based on single sign-on allow users to log in once to multiple information systems within the same environment.
- Architecture standards define a generic model for health information systems. They support the integration of health information systems by providing guidance for planning and designing new systems and for integrating existing ones.

Harmonization between standards
Harmonization between standards is a positive step towards ensuring that health information systems are, or have the potential to be, interoperable with each other and able to share health information. It increases the potential for any given system to comply with a growing number of interoperability standards, which could reduce the need to procure new systems to facilitate interoperability. Current areas of standards harmonization include data types [12]; for example, HL7 CDA and openEHR both use the CEN 13606 reference model.

Canonical model
Our document is based on the canonical triplet [8] presented in Figure 1:
- Medical Information (MI): any information produced or observed in a health context and considered useful and potentially reusable for the patient, the hospital, or researchers;
- Post Production of Medical Information (PPMI): the combination of human resources and materials responsible for producing medical information;
- Pathological Case: the association between a patient and his or her historical records;
- Medical Activity: represents the care plan carried out by the PPMI.

Global view of the care process across the canonical model
To stay close to the canonical representation, the practitioner should view the care process as a tree. The root of the tree is the input activity, i.e. the reason why the patient is hospitalized; the other nodes represent the various activities included in the process. Figure 2 illustrates an example of a canonical representation in the form of a tree.

Fig 2: Example of a medical activity
We introduced the concept of time into our canonical representation to distinguish activities that require a chronological order from those that do not depend on time. Figure 3 shows how a tree is expressed with and without time order. In Figure 3A, activity "a" consists of activities "b" and "c". In Figure 3B, by contrast, activity "b" starts before activity "c". This helps the reader understand the succession of steps, which is an advantage of our canonical representation: expert or not, a user can understand exactly what the representation means.
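The distinction between Figure 3A and Figure 3B can be sketched in Python. The sketch assumes a hypothetical `Activity` node with an `ordered` flag and an optional start time. Under a time-ordered parent, sub-activities are read chronologically, whatever the order in which they were recorded. Otherwise, their order carries no meaning. The class and field names are illustrative and are not part of the model's class diagram.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Activity:
    """A node of the care-process tree (hypothetical field names)."""
    name: str
    start: Optional[datetime] = None  # start time; required only under a time-ordered parent
    ordered: bool = False             # True: sub-activities follow chronological order
    children: List["Activity"] = field(default_factory=list)

    def add(self, child: "Activity") -> "Activity":
        if self.ordered and child.start is None:
            raise ValueError(f"{child.name!r} needs a start time under time-ordered {self.name!r}")
        self.children.append(child)
        return child

    def sequence(self) -> List[str]:
        """Sub-activity names: chronological if ordered, else insertion order (no meaning)."""
        kids = sorted(self.children, key=lambda c: c.start) if self.ordered else self.children
        return [c.name for c in kids]


# Figure 3A style: "a" consists of "b" and "c", with no time constraint.
a_plain = Activity("a")
a_plain.add(Activity("b"))
a_plain.add(Activity("c"))

# Figure 3B style: "b" starts before "c", whatever order they are recorded in.
a_timed = Activity("a", ordered=True)
a_timed.add(Activity("c", start=datetime(2015, 1, 1, 10, 0)))
a_timed.add(Activity("b", start=datetime(2015, 1, 1, 9, 0)))
print(a_timed.sequence())  # ['b', 'c']
```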
The same structural concept of medical activity applies to the PPMI and the team. Figure 4 illustrates an example of a surgery team.

Fig 4: Example of a surgery team
The canonical representation marks the point of origin of each Medical Information (MI) by associating it with the care staff (team) that produced it and the material used. The traceability of medical information is therefore systematic: the further we descend in the tree, the more detail there is about the MI. Furthermore, the overall shape of the MI tree shows the care strategy used to treat the pathological case.
At the interoperability level, the global view of the care activity provided by the canonical model makes it possible to qualify all exchanged medical information. It indicates the degree of reliability of the data: we know who produced it and how.
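The traceability argument can be illustrated with a small sketch. Each piece of medical information points to the activity node that produced it, and each node records its team and material. The field names are hypothetical. Walking up the parent links yields the care path from the input activity, which answers the "who and how" questions above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ActivityNode:
    """Care-tree node; the team and material fields are illustrative assumptions."""
    name: str
    team: List[str]
    material: List[str]
    parent: Optional["ActivityNode"] = None

    def care_path(self) -> List[str]:
        """Activity names from the root (input activity) down to this node."""
        node: Optional[ActivityNode] = self
        path: List[str] = []
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path[::-1]


@dataclass
class MedicalInformation:
    value: str
    produced_by: ActivityNode

    def provenance(self) -> Dict[str, object]:
        """Who produced this MI, with what, and where it sits in the care process."""
        act = self.produced_by
        path = act.care_path()
        return {"who": act.team, "how": act.material, "care_path": path, "depth": len(path) - 1}


# Hypothetical care tree with one surgical sub-activity.
root = ActivityNode("Hospitalization for abdominal pain", ["Dr A"], [])
surgery = ActivityNode("Appendectomy", ["Dr B", "Nurse C"], ["Operating table"], parent=root)
mi = MedicalInformation("Inflamed appendix removed", surgery)
print(mi.provenance())
```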

Constraints of the canonical model
Database representation
Unlike with the traditional approach, producing a new model derived from the canonical model is not a problem. The difficulty arises, however, when the relationships between activities must be implemented in an existing operational model.
Elloub et al. [23] report that the best solution is to add an additional "parent" layer to the existing model. For example, for the tree in Figure 2, we associate the adjacency list representation shown in Table 2. The authors also present other external representations based on the nested set and path enumeration models.
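The adjacency-list idea adds a "parent" column to an existing activity table. It can be sketched with Python's built-in `sqlite3` module. The table layout and sample rows are hypothetical, since Table 2 is not reproduced here. The subtree is retrieved with a recursive common table expression (`WITH RECURSIVE`), which SQLite supports.

```python
import sqlite3

# Hypothetical care tree: (id, activity name, parent id); the root has no parent.
ROWS = [
    (1, "Hospitalization", None),
    (2, "Admission examination", 1),
    (3, "Appendectomy", 1),
    (4, "Anesthesia", 3),
    (5, "Incision", 3),
]


def build_adjacency_db(rows):
    """Activity table extended with a 'parent' column (adjacency list)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activity ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " parent_id INTEGER REFERENCES activity(id))"
    )
    conn.executemany("INSERT INTO activity (id, name, parent_id) VALUES (?, ?, ?)", rows)
    return conn


def subtree(conn, root_id):
    """All activities below root_id (inclusive) with their depth, via a recursive CTE."""
    query = """
        WITH RECURSIVE sub(id, name, depth) AS (
            SELECT id, name, 0 FROM activity WHERE id = ?
            UNION ALL
            SELECT a.id, a.name, sub.depth + 1
            FROM activity AS a JOIN sub ON a.parent_id = sub.id
        )
        SELECT id, name, depth FROM sub ORDER BY id
    """
    return conn.execute(query, (root_id,)).fetchall()


conn = build_adjacency_db(ROWS)
print(subtree(conn, 3))  # [(3, 'Appendectomy', 0), (4, 'Anesthesia', 1), (5, 'Incision', 1)]
```

Inserting a node only requires its parent id, but reading a whole subtree needs a recursive query. This is the usual trade-off of the adjacency list.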

Optimization aspect: model choice
In the medical context, some queries are more frequent than others. The task is then to find the model whose performance profile best matches the requirements of these queries.
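To illustrate how query patterns drive the model choice, here is a sketch of the path enumeration model mentioned above, again using `sqlite3` with hypothetical rows. Each activity stores its full path from the root, so descendants and ancestors are obtained with plain, non-recursive queries. The price is that moving a subtree requires rewriting the paths of all its descendants. This model would therefore suit workloads dominated by reads rather than restructuring.

```python
import sqlite3

# Same hypothetical tree, stored with a materialized path instead of a parent id.
ROWS = [
    (1, "Hospitalization", "/1/"),
    (2, "Admission examination", "/1/2/"),
    (3, "Appendectomy", "/1/3/"),
    (4, "Anesthesia", "/1/3/4/"),
    (5, "Incision", "/1/3/5/"),
]


def build_path_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activity (id INTEGER PRIMARY KEY, name TEXT NOT NULL, path TEXT NOT NULL)")
    conn.executemany("INSERT INTO activity (id, name, path) VALUES (?, ?, ?)", rows)
    return conn


def path_of(conn, node_id):
    (path,) = conn.execute("SELECT path FROM activity WHERE id = ?", (node_id,)).fetchone()
    return path


def descendants(conn, node_id):
    """One prefix query; the trailing '/' keeps /1/3/ from matching /1/30/."""
    rows = conn.execute(
        "SELECT name FROM activity WHERE path LIKE ? || '%' AND id <> ? ORDER BY path",
        (path_of(conn, node_id), node_id),
    ).fetchall()
    return [name for (name,) in rows]


def ancestors(conn, node_id):
    """Ancestor ids are read directly from the stored path, root first."""
    ids = [int(part) for part in path_of(conn, node_id).strip("/").split("/")][:-1]
    if not ids:
        return []
    marks = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT name FROM activity WHERE id IN ({marks}) ORDER BY length(path)", ids
    ).fetchall()
    return [name for (name,) in rows]


conn = build_path_db(ROWS)
print(descendants(conn, 3))  # ['Anesthesia', 'Incision']
print(ancestors(conn, 4))    # ['Hospitalization', 'Appendectomy']
```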

Visualization of results
Reading the graphical tree is clearly much simpler than reading the adjacency list. The problem is that queries on standard databases return their results in tabular form. Our team is also working on graphical interfaces for visualizing data trees and for manipulating (adding, deleting) nodes.
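A small step towards such interfaces is turning the tabular query result back into a tree. The sketch below assumes rows of the form `(id, name, parent_id)`, as in an adjacency-list table; the sample data is hypothetical. It prints an indented text tree with the root activity first.

```python
from collections import defaultdict

# Tabular result of a query on an adjacency-list table: (id, name, parent_id).
ROWS = [
    (1, "Hospitalization", None),
    (2, "Admission examination", 1),
    (3, "Appendectomy", 1),
    (4, "Anesthesia", 3),
    (5, "Incision", 3),
]


def render_tree(rows, indent="  "):
    """Turn (id, name, parent_id) rows into an indented text tree, roots first."""
    names = {}
    children = defaultdict(list)
    for node_id, name, parent_id in rows:
        names[node_id] = name
        children[parent_id].append(node_id)

    lines = []

    def walk(node_id, depth):
        lines.append(indent * depth + names[node_id])
        for child_id in children[node_id]:
            walk(child_id, depth + 1)

    for root_id in children[None]:  # nodes without a parent are roots
        walk(root_id, 0)
    return "\n".join(lines)


print(render_tree(ROWS))
```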

THE HEALTH DOCUMENT STRUCTURE
The structure of our health document is based on the canonical model. We have developed a class diagram based on this canonical model [8,19]. We extended this diagram with new classes (Document, ReferenceDocument, and Role); the resulting diagram is presented in Figure 5. Traditional information systems achieve functional interoperability but not semantic interoperability: the information arrives at its destination, but it is not understood. To reach semantic interoperability, terminological references are indispensable [16]. Our health document therefore includes terminological standards such as SNOMED CT [18] and ICDx. As a result, data encoded with distinct terminologies can be related to each other through semantic relationships, which gives better results. Our main objective is to develop a document for exchanging medical information between heterogeneous and distributed HIS. The classes of the diagram are:
- Reference Activity contains the description of execution methods and the site where the activity takes place;
- Medical Action is the simplest action within a Medical Activity, such as a question asked during a medical check-up;
- Document provides information about a Medical Activity, such as prescriptions, analyses, etc.;
- Reference Document contains templates for documents such as letters, reports, etc.;
- Patient is an individual awaiting or receiving medical care and treatment;
- Pathology contains the history of diseases related to the patient;
- Actor describes the medical personnel;
- Team is a group of Actors who intervene in a Medical Activity;
- Role describes the participation of an Actor in a Team;
- Material Post is the equipment used by a Team to carry out a Medical Activity.
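One possible rendering of these classes as Python dataclasses is sketched below. Attribute names and types are assumptions made for illustration, since Figure 5 is not reproduced here. Only the class names and their roles come from the description above. The ICD-10 code K35 (acute appendicitis) is used purely as sample data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Actor:
    name: str
    specialty: str


@dataclass
class Role:
    actor: Actor
    function: str  # participation of the actor in the team, e.g. "surgeon"


@dataclass
class Team:
    roles: List[Role] = field(default_factory=list)


@dataclass
class MaterialPost:
    equipment: List[str]


@dataclass
class Pathology:
    disease: str
    code: str  # terminology code (ICDx, SNOMED CT, ...)


@dataclass
class Patient:
    patient_id: str
    name: str
    history: List[Pathology] = field(default_factory=list)


@dataclass
class ReferenceActivity:
    method: str  # how the activity is carried out
    site: str    # where it takes place


@dataclass
class MedicalAction:
    description: str


@dataclass
class ReferenceDocument:
    template_name: str


@dataclass
class Document:
    kind: str  # e.g. "prescription", "analysis", "report"
    content: str
    template: Optional[ReferenceDocument] = None


@dataclass
class MedicalActivity:
    name: str
    patient: Patient
    reference: ReferenceActivity
    team: Team
    material: MaterialPost
    actions: List[MedicalAction] = field(default_factory=list)
    documents: List[Document] = field(default_factory=list)
    sub_activities: List["MedicalActivity"] = field(default_factory=list)


# Hypothetical instance: a surgical activity with a two-person team.
activity = MedicalActivity(
    name="Appendectomy",
    patient=Patient("P-001", "John Doe", [Pathology("Acute appendicitis", "K35")]),
    reference=ReferenceActivity(method="laparoscopy", site="operating room"),
    team=Team([Role(Actor("Dr B", "general surgery"), "surgeon"),
               Role(Actor("Dr D", "anaesthesiology"), "anaesthetist")]),
    material=MaterialPost(["laparoscope"]),
    actions=[MedicalAction("pre-operative check")],
    documents=[Document("report", "operative report", ReferenceDocument("operative report template"))],
)
print([role.function for role in activity.team.roles])
```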

THE PROCESS OF TRANSFORMATION
The transformation consists of converting the XML schema of the HIS into the health document. Figure 6 illustrates the process of transformation.

The Matcher
The Matcher finds semantic relationships between the input schema and the schema of our document using schema matching, which is the task of finding correspondences between the elements of two schemas [24]. The Transformer component then transforms the message into a common canonical format. This process requires auxiliary information, namely a thesaurus.
Two matching approaches are used in the Matcher configuration: a terminological approach and a structural approach [24].
- Terminological approach: measures the similarity of terms by comparing the lists of words of which the terms are composed. Our system supports two string matching algorithms, edit distance and n-gram, and a linguistic algorithm. An n-gram is a sequence of n consecutive characters extracted from a word. The edit distance is the minimum number of insertions, deletions, and substitutions required to transform one string into another. The linguistic algorithm computes similarity using a thesaurus. Our system supports two types of thesauri: general thesauri (WordNet [22]) and domain-specific thesauri (UMLS [17]). Biomedical thesauri yield better results.
- Structural approach: elements are similar if they have similar relationships or paths. We used a path-based algorithm.
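The three terminological matchers can be sketched as follows. The text does not specify the implementation details, so the following are assumptions:
- n-gram similarity is measured with the Dice coefficient over character trigrams;
- edit distance is the Levenshtein distance, normalized by the length of the longer term;
- the thesaurus is a tiny in-memory synonym dictionary standing in for WordNet or UMLS, which have their own access interfaces not used here.

This is an illustration rather than the system's implementation.

```python
def ngrams(term: str, n: int = 3) -> set:
    """Set of n consecutive characters of a lower-cased term (the term itself if shorter)."""
    t = term.lower()
    if len(t) < n:
        return {t}
    return {t[i:i + n] for i in range(len(t) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient of the two n-gram sets, in [0, 1]."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (free if equal)
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalized to a similarity in [0, 1]."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def linguistic_similarity(a: str, b: str, thesaurus: dict) -> float:
    """1.0 if the terms are identical or listed as synonyms, else 0.0."""
    a, b = a.lower(), b.lower()
    if a == b or b in thesaurus.get(a, set()) or a in thesaurus.get(b, set()):
        return 1.0
    return 0.0


# Tiny hypothetical thesaurus standing in for WordNet or UMLS.
THESAURUS = {"physician": {"doctor"}, "birthdate": {"dateofbirth", "dob"}}

for pair in [("BirthDate", "DateOfBirth"), ("Physician", "Doctor"), ("PatientName", "PatientNames")]:
    print(pair, round(ngram_similarity(*pair), 2), round(edit_similarity(*pair), 2),
          linguistic_similarity(*pair, THESAURUS))
```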
These matchers are used in combination. Each matcher calculates a similarity value between the elements of two schemas. The similarity value ranges from 0 to 1, where 1 indicates strong similarity and 0 strong dissimilarity. We used the weighted method to aggregate similarity values: it computes a weighted sum of the similarity values returned by the individual matchers. In our experiments we use the weights 0.38, 0.38, and 0.24 for n-gram, edit distance, and the linguistic algorithm, respectively. The terminological similarity is computed with the formula below:

$$sim(C_1, C_2) = \sum_{k=1}^{3} w_k \, sim_k(C_1, C_2), \qquad w_1 = 0.38,\; w_2 = 0.38,\; w_3 = 0.24$$

where $sim_1$, $sim_2$, and $sim_3$ are the n-gram, edit distance, and linguistic similarities. The final similarity is the average of the terminological and structural similarities:

$$sim_{final}(C_1, C_2) = \frac{sim(C_1, C_2) + sim_{struct}(C_1, C_2)}{2}$$
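The aggregation step can be sketched with the weights given above. The structural matcher here is an assumption: it uses the Jaccard overlap of the element names along two schema paths, since the path algorithm is not detailed. The matcher scores in the example are hypothetical.

```python
from typing import Dict

# Weights reported in the text for the three terminological matchers (they sum to 1).
WEIGHTS: Dict[str, float] = {"ngram": 0.38, "edit": 0.38, "linguistic": 0.24}


def terminological_similarity(scores: Dict[str, float]) -> float:
    """Weighted sum of the individual matcher scores, each in [0, 1]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def path_similarity(path_a: str, path_b: str) -> float:
    """Assumed structural matcher: Jaccard overlap of the element names on two paths."""
    a = {step.lower() for step in path_a.strip("/").split("/")}
    b = {step.lower() for step in path_b.strip("/").split("/")}
    return len(a & b) / len(a | b)


def final_similarity(scores: Dict[str, float], path_a: str, path_b: str) -> float:
    """Average of the terminological and structural similarities."""
    return (terminological_similarity(scores) + path_similarity(path_a, path_b)) / 2


# Hypothetical matcher outputs for one pair of schema elements.
scores = {"ngram": 0.5, "edit": 0.5, "linguistic": 0.0}
print(round(final_similarity(scores, "/Patient/Name", "/Patient/FamilyName"), 3))
```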

Fig 6: Process of transformation (HIS XSD, Matcher, Thesaurus, Transformer, health document)

The Transformer
The Matcher component generates a set of mapping elements. Based on these mappings, the Transformer translates the input schema into the health document using XSLT.
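One way to turn the Matcher's output into XSLT is to generate a stylesheet from the list of mappings. The sketch below uses Python's `xml.etree.ElementTree` to build an XSLT 1.0 stylesheet. Each mapping becomes an `xsl:value-of` placed under the corresponding target element. The source paths and target element names are hypothetical, and the paper's Transformer may generate its stylesheets differently.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL_NS)  # serialize XSLT elements with the usual xsl: prefix


def xsl(local_name: str) -> str:
    """Qualified ElementTree name for an XSLT element."""
    return f"{{{XSL_NS}}}{local_name}"


def build_stylesheet(mappings, root_name="HealthDocument"):
    """Build an XSLT 1.0 stylesheet from (source XPath, target path) mappings."""
    sheet = ET.Element(xsl("stylesheet"), version="1.0")
    ET.SubElement(sheet, xsl("output"), method="xml", indent="yes")
    template = ET.SubElement(sheet, xsl("template"), match="/")
    document = ET.SubElement(template, root_name)  # literal result element
    for source_xpath, target_path in mappings:
        parent = document
        for step in target_path.split("/"):
            child = parent.find(step)  # reuse target elements shared by several mappings
            if child is None:
                child = ET.SubElement(parent, step)
            parent = child
        ET.SubElement(parent, xsl("value-of"), select=source_xpath)
    return ET.tostring(sheet, encoding="unicode")


# Hypothetical mappings, as the Matcher might emit them.
MAPPINGS = [
    ("/PatientRecord/PatientName/FamilyName", "Patient/LastName"),
    ("/PatientRecord/PatientName/GivenName", "Patient/FirstName"),
    ("/PatientRecord/BirthDate", "Patient/BirthDate"),
    ("/PatientRecord/Diagnosis/Code", "Pathology/Code"),
]
print(build_stylesheet(MAPPINGS))
```

The generated stylesheet can then be applied with any XSLT 1.0 processor, for example `xsltproc` or a Java `javax.xml.transform` implementation.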

RESULTS AND DISCUSSION
To demonstrate the simplicity of our document, we chose to work with the HL7 standard, because HL7 is the most popular standard in the healthcare domain and its licensing permits its use. Figures 7A and 8A show an example HL7 message in version 2 and version 3, respectively; both messages contain the same data.

Fig 7: HL7 v2 message before and after transformation
Clearly, one must be an expert (in the medical domain) to understand an HL7 version 2 message, even though it is the most widely used standard in the world today.
Real HL7 version 2 messages are in EDI (Electronic Data Interchange) format, so they must first be converted to XML. To do this, we used HAPI [25], an open-source tool for converting HL7 version 2 messages into XML documents.
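The paper relies on the open-source tool cited as [25] for this conversion. Purely to show what the step does, here is a Python sketch that uses only the standard library. It splits a pipe-delimited HL7 v2 message into segments, fields (`|`), and components (`^`), and emits positional XML element names such as `PID.5`. It is not that tool and does not produce the official HL7 v2 XML encoding. Repetitions (`~`), subcomponents (`&`), and escape sequences are deliberately ignored, and the sample message is hypothetical.

```python
import xml.etree.ElementTree as ET


def hl7v2_to_xml(message: str) -> ET.Element:
    """Convert a pipe-delimited HL7 v2 message into positional XML (simplified)."""
    root = ET.Element("HL7Message")
    # Segments are separated by carriage returns; newlines are tolerated too.
    for segment in filter(None, message.replace("\n", "\r").split("\r")):
        fields = segment.split("|")
        name = fields[0]
        seg_el = ET.SubElement(root, name)
        if name == "MSH":
            # MSH-1 is the field separator itself, so MSH field numbers are shifted by one.
            ET.SubElement(seg_el, "MSH.1").text = "|"
            numbered = enumerate(fields[1:], start=2)
        else:
            numbered = enumerate(fields[1:], start=1)
        for pos, value in numbered:
            if not value:
                continue  # empty fields are omitted
            tag = f"{name}.{pos}"
            field_el = ET.SubElement(seg_el, tag)
            if "^" in value and tag != "MSH.2":  # MSH-2 holds the encoding characters
                for comp_pos, comp in enumerate(value.split("^"), start=1):
                    if comp:
                        ET.SubElement(field_el, f"{tag}.{comp_pos}").text = comp
            else:
                field_el.text = value
    return root


SAMPLE = (
    "MSH|^~\\&|SendingApp|SendingFac|ReceivingApp|ReceivingFac|20150101||ADT^A01|MSG00001|P|2.5\r"
    "PID|1||12345||DOE^JOHN||19700101|M"
)
print(ET.tostring(hl7v2_to_xml(SAMPLE), encoding="unicode"))
```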
As illustrated in Figure 7A, the element names are not comprehensible. We therefore used the code table in [26] to expand the elements into their detailed form. The matching process then begins: the terminological matchers are run first, followed by the structural matchers. The Matcher component, written in Java, calculates similarities between elements. Using the final mappings generated by the Matcher, the Transformer then uses XSLT to transform the input schema into the health document. Figures 7B and 8B show the messages after transformation.

Our health document enables standardized communication between HIS. Using XML to encode the document is an advantage because it helps standardize the document content. The health document model contains five kinds of data: medical activity data, patient data, patient history data, data requirements, and medical record data (e.g., prescriptions, analyses). It therefore covers the patient medical data required for an exchange. Our transformation architecture is simple, requires no changes or adjustments to the systems, and helps integrate heterogeneous HIS.
As a result, the messages can now easily be read by humans and their content is understandable. Using the health document therefore facilitates the exchange of medical data across healthcare systems.
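The name-expansion step that precedes matching (using the code table in [26]) amounts to renaming elements through a lookup table. In the sketch below, `CODE_TABLE` is a small hypothetical excerpt that maps positional HL7 v2 names to the field and component names used in HL7 v2.5. For example, PID-5 is Patient Name, and family name and given name are its first two components. The real table is much larger.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a code table mapping positional HL7 v2 names to readable ones.
CODE_TABLE = {
    "PID": "PatientIdentification",
    "PID.3": "PatientIdentifierList",
    "PID.5": "PatientName",
    "PID.5.1": "FamilyName",
    "PID.5.2": "GivenName",
    "PID.7": "DateTimeOfBirth",
    "PID.8": "AdministrativeSex",
}


def expand_names(root: ET.Element, table: dict) -> ET.Element:
    """Rename every element whose tag appears in the table; unknown tags are kept."""
    for element in root.iter():
        element.tag = table.get(element.tag, element.tag)
    return root


pid = ET.fromstring(
    "<PID><PID.3>12345</PID.3>"
    "<PID.5><PID.5.1>DOE</PID.5.1><PID.5.2>JOHN</PID.5.2></PID.5></PID>"
)
print(ET.tostring(expand_names(pid, CODE_TABLE), encoding="unicode"))
```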

CONCLUSION
In this paper, we briefly summarized the standards used to support interoperability in the healthcare domain. The number of standards in the health domain is clearly large. However, it is extremely difficult to expect the use of a single worldwide standard without further integration and development. In this context, we proposed a health document for exchanging meaningful medical data between heterogeneous and distributed health information systems.
The canonical representation makes it possible to visualize results clearly in the form of a tree. This in turn makes it possible to follow exactly the course of a medical activity (its beginning and its end).
The proposed health document is simple and intended for general use. It does not impose any requirement or adjustment on the system architecture. It is also open, flexible, and platform independent.