One health suRveillance Initiative On harmonization of data collection and interpretatioN (ORION)
WP3 – One Health Surveillance Harmonisation Infrastructure
Figure 1: Envisioned architecture of the One Health surveillance harmonisation tools to be delivered by ORION’s WP3. Continue reading for details.
The ORION project
The ORION project, launched in 2018, aims at establishing and strengthening inter-institutional collaboration and transdisciplinary knowledge transfer in the area of surveillance data integration and interpretation, along the One Health (OH) objective of improving health and well-being.
Through three main work packages (WP), ORION’s specific goals can be summarized as the delivery of three main resources:
- an “OH Surveillance Codex” (WP1) - a high-level framework for harmonised, cross-sectional description and categorisation of surveillance data covering all surveillance phases and all knowledge types;
- an “OHS Knowledge Hub” (WP2) - a cross-domain inventory of currently available data sources and methods / algorithms / tools that support OH surveillance data generation, data analysis, modelling and decision support; and
- “OHS Infrastructural Resources” (WP3) - practical, infrastructural resources forming the basis for successful harmonisation and integration of surveillance data and methods.
Developed solutions will be exemplified and validated during several One Health pilots, which will support the operationalization and implementation of OH surveillance solutions on a national level and provide crucial feedback for future development and dissemination actions.
Trainings and workshops will be offered (WP4) to support and integrate with other EJP projects in their data harmonisation efforts.
WP3 – Harmonisation resources
WP3 was planned to identify infrastructural resources which can support the harmonization and integration of existing methods regarding collection and analysis of surveillance data. During the first three months of work, we performed a quick inventory of the surveillance data flow between animal, human and food surveillance agencies within ORION partner countries, and of the practices of mandatory reporting to the European Centre for Disease Prevention and Control (ECDC) and the European Food Safety Authority (EFSA). It became apparent that surveillance data interoperability is very low, not only between agencies, but even within agencies, along the surveillance pathway. Surveillance data are usually stored in many different formats, across many different databases, which are not interoperable. As a result:
- A lot of human effort is needed, every year, for each agency to collate all the data available within the institution, and produce reports to send to ECDC/EFSA.
- After collation, these data need to be coded into standard data models, which also requires extra human effort.
- The two steps above (collation and coding) are prone to error, cause time delays in data sharing, and, most concerning, do not result in FAIR data (findable, accessible, interoperable and reusable). Data are standardised according to given terminologies, but they are still not machine-interpretable.
- The effort spent on these two steps does not result in direct benefits, in terms of data usability, to the institutions that own the data.
As a result, we decided as early as month 4 [1] to focus this work package on the specific issue of data interoperability. Interoperability is used here to mean “the ability of different information technology systems and software applications to communicate, exchange data, and use the information that has been exchanged” [2].
The chosen solution to achieve this goal was the development of an ontological framework for health surveillance. “An ontology defines a common vocabulary for researchers who need to share information in a domain. It includes machine-interpretable definitions of basic concepts in the domain and relations among them” [3].
Several data standards and terminologies already exist in health and epidemiology, including extensive catalogues of standards for reporting data to European Agencies. Ontologies can incorporate these existing resources and re-use all their knowledge. But we move beyond the listing of concepts and include also “relationships” between concepts (semantics), creating a knowledge model for a specific domain – here health surveillance. A machine-interpretable version of the domain knowledge offers several advantages, in particular:
- Use of automated reasoners to make inferences and detect errors, either in the ontology itself or in the data.
- Knowledge growth and updates. Growing and updating knowledge is easier, and can be automated, when the structure of concepts and the relationships between them are explicit.
- Reuse. Ontologies are meant to model specific pieces of knowledge, in a way that allows linking to complementary pieces. In such a multi-disciplinary field as epidemiology, knowledge reuse can be highly beneficial. There already exist, for instance, ontologies of anatomy, infectious diseases, and clinical signs. Creating an ontological framework for health surveillance will be, to a large extent, an exercise of reusing and integrating existing resources. This ensures that the knowledge is updated by a community of experts, rather than dependent on efforts from a specific project.
- Interoperability. Vocabularies allow humans to understand each other and agree on what things mean. Ontologies allow software to talk to each other. This interoperability means that we would be able to share tools, while data themselves continue to be private.
You can find an introductory text on ontologies in the dedicated chapter.
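To make the idea of machine-interpretable relationships a little more concrete, here is a minimal sketch in Python. It is not how AHSO or any real reasoner is implemented; the concept names and the `subclass_of` relation are hypothetical and only illustrate how explicit relationships allow simple inference and error detection.

```python
# Toy knowledge model stored as (subject, predicate, object) triples.
# All concept names are hypothetical and are NOT taken from AHSO.
TRIPLES = {
    ("Campylobacter_jejuni", "subclass_of", "Campylobacter"),
    ("Campylobacter", "subclass_of", "Bacterium"),
    ("Bacterium", "subclass_of", "Hazard"),
    ("Cattle", "subclass_of", "AnimalSpecies"),
}


def ancestors(concept, triples=TRIPLES):
    """Return every concept reachable by following subclass_of links."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in triples:
            if subj == current and pred == "subclass_of" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def is_a(concept, category, triples=TRIPLES):
    """Inference: a concept is the category itself or inherits from it."""
    return concept == category or category in ancestors(concept, triples)


def check_record(record):
    """Error detection: the 'hazard' field must hold a known Hazard."""
    errors = []
    if not is_a(record["hazard"], "Hazard"):
        errors.append(f"{record['hazard']!r} is not a known Hazard")
    return errors


# The model never states directly that Campylobacter_jejuni is a Hazard;
# that fact is inferred by chaining the explicit relationships.
print(is_a("Campylobacter_jejuni", "Hazard"))
print(check_record({"hazard": "Cattle"}))
```

A real ontology would be expressed in a dedicated language and checked with an existing reasoner; the sketch only shows why explicit relationships make such checks possible.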
Two quick examples can demonstrate the power of “semantics” and the result in terms of “interoperability”.
The first is the “knowledge graph” used by Google. Try googling a book you like, or a famous person, for example “Stephen Hawking”. Google’s knowledge graph on the bottom right of your screen will display things like their date of birth, family members, relevant work, etc. This is because “Stephen Hawking” is recognized not just as a string of text, which Google would search for over the entire web, but as an instance of a specific concept called “person”. A knowledge model exists in which the characteristics of a person have been modelled: they have professions, and they are connected to other persons through relationships such as spouse, parent, child. The machine “understands” that you are searching for a person, rather than just looking for the text string “Stephen Hawking” all over the web.
The second example is the data about air travel all around the world. Think about how “Expedia.com” is able to retrieve and compare data from a great number of flight providers. This is not backed by data sharing agreements, but rather by their adherence to the same schema [4]. It is also thanks to schema mark-up that your calendar can automatically recognize a flight event and add it to your diary once you get a confirmation email with your flight details.
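As an illustration of the flight example, the sketch below parses a made-up confirmation marked up with schema.org terms and turns it into a calendar entry. The `FlightReservation`, `Flight` and `Airport` types and their properties come from schema.org; the reservation data, the `to_calendar_event` function and the shape of the returned dictionary are assumptions for illustration, not the way any particular mail or calendar client works.

```python
import json
from datetime import datetime

# A made-up confirmation, marked up with schema.org types and properties.
CONFIRMATION = """
{
  "@context": "https://schema.org",
  "@type": "FlightReservation",
  "reservationNumber": "ABC123",
  "reservationFor": {
    "@type": "Flight",
    "flightNumber": "1234",
    "departureAirport": {"@type": "Airport", "iataCode": "ARN"},
    "arrivalAirport": {"@type": "Airport", "iataCode": "BER"},
    "departureTime": "2018-04-10T08:15:00+02:00",
    "arrivalTime": "2018-04-10T09:50:00+02:00"
  }
}
"""


def to_calendar_event(markup: str) -> dict:
    """Build a simple calendar entry from schema.org flight mark-up."""
    data = json.loads(markup)
    # The shared vocabulary tells the software what kind of thing this is.
    if data.get("@type") != "FlightReservation":
        raise ValueError("not a flight reservation")
    flight = data["reservationFor"]
    origin = flight["departureAirport"]["iataCode"]
    destination = flight["arrivalAirport"]["iataCode"]
    return {
        "title": f"Flight {flight['flightNumber']} {origin}-{destination}",
        "start": datetime.fromisoformat(flight["departureTime"]),
        "end": datetime.fromisoformat(flight["arrivalTime"]),
    }


event = to_calendar_event(CONFIRMATION)
print(event["title"], event["start"], event["end"])
```

The point is that the function needs no knowledge of the sender: any provider that uses the same vocabulary can be read by the same code.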
The examples above emphasize the main principles we want to address:
- Build an ontological framework for health surveillance that allows computers to understand and reason with current data standards in the same way that humans do, so that we can reduce error, delays, and resources needed to code data;
- Improve usability of data inside the institutions that own and/or use the data, as well as the potential for reuse by external stakeholders and for research and discovery.
Note that the issue of data sharing is not addressed by this project. The goal is to make data FAIR (findable, accessible, interoperable and reusable [5]) for those who have access to the data.
The adoption of this framework in practice is schematized in Figure 1. The work is planned over three years:
- YEAR 1: collate and develop ontological resources for health surveillance;
- YEAR 2: create the tools needed and implement the framework in practice through a “one health surveillance pilot”; and
- YEAR 3: make the framework available and documented for adoption by other countries.
ORION-WP3: PLAN FOR YEAR 1
The Swedish National Veterinary Institute (SVA) is leading this WP. The “One health surveillance pilot” that will test the application of the proposed framework in practice will be a joint effort of SVA, the Public Health Agency of Sweden (FoHM), EFSA and ECDC. SVA and FoHM have chosen the surveillance against Campylobacter spp. as the focus of the pilot. Other countries involved in ORION may choose to take part in the pilot.
This project will build on the existing AHSO initiative to develop an ontological framework for animal health surveillance, extending it into the “One Health” domain.
As for all WPs in ORION, year 1 will be dedicated to a “requirement analysis”. In WP3 this means an inventory of the requirements to develop the ontological framework, and the requirements in terms of technical infrastructure to adopt it in practice.
Several parallel tasks are being addressed in year 1:
- Ontological content development: We are defining which “content” should be in the ontology through three main parallel inventories:
  - Inventory of existing ontologies that can be reused.
  - Inventory of existing terminologies/vocabularies/data standards, in particular those with which we must remain compatible, such as the reporting guidelines from EFSA and ECDC.
  - Inventory of data structures and standards used within the participating institutions.
- Inventory of relevant projects, finished or ongoing, whose results can contribute to this framework.
- Inventory of harmonisation resources: a literature review on the subject of data interoperability.
- Mapping of the data flow along the surveillance chain within the countries involved in WP3 (Sweden, Denmark, The Netherlands, Germany, United Kingdom, Norway and Belgium), and from these countries to EFSA and ECDC.
- Requirement analysis for the technical development, which includes the following sub-tasks:
  - Mapping of the technical infrastructure currently used in participating institutions, such as the database systems the developed tools will have to be compatible with.
  - Inventory of available tools that can provide the required functionality, namely translating between existing databases and the desired output formats for reporting to EFSA and ECDC, using the ontology developed.
Our preliminary work with all the parallel tasks listed above has shown that “surveillance data” is usually handled at three levels: individual samples collected from animals or people; specific cases or observations, which may involve the collection of multiple samples; and data about the surveillance programme, that is, data about the activities performed by the institution, in a given year, against a particular hazard.
Figure 2 gives a very simplified overview of how these levels of information can be modelled within AHSO, exemplifying some catalogues and data models from EFSA which refer to similar concepts. AHSO will be extended to incorporate these catalogues, reusing their knowledge and being entirely compatible with them. Concepts on the right side of Figure 2 can be attributed to any level of information. For instance, a surveillance programme can be defined to target one specific animal species, or this information can be inferred from the animal species recorded in the samples that are part of that surveillance programme.
Figure 2. Simplified scheme of the three main levels of information for surveillance data: sample, observation and observation context. An observation context can contain several observations, and observations can contain several samples. Information can be passed up or down these levels – for instance the “hazard” can be declared for the observation context or recorded for each individual sample. The coloured boxes are examples of EFSA catalogues which AHSO will incorporate to model these levels of information.
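The three levels described in Figure 2 can be sketched as a small data structure. This is only an illustrative sketch in Python: the class and field names (`Sample`, `Observation`, `ObservationContext`, `hazard_for`, `species_targeted`) are hypothetical and do not reproduce AHSO or any EFSA data model. It shows how information can be passed down (a sample inherits the hazard declared for its context) or up (the target species of a programme is inferred from its samples).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Sample:
    sample_id: str
    hazard: Optional[str] = None          # may be recorded per sample
    animal_species: Optional[str] = None


@dataclass
class Observation:
    observation_id: str
    samples: List[Sample] = field(default_factory=list)


@dataclass
class ObservationContext:
    """For example, a surveillance programme run in a given year."""
    name: str
    hazard: Optional[str] = None          # may be declared for the whole context
    observations: List[Observation] = field(default_factory=list)

    def hazard_for(self, sample: Sample) -> Optional[str]:
        """Pass information down: fall back to the context's hazard."""
        return sample.hazard or self.hazard

    def species_targeted(self) -> Set[str]:
        """Pass information up: infer species from the samples collected."""
        return {
            s.animal_species
            for obs in self.observations
            for s in obs.samples
            if s.animal_species is not None
        }


s1 = Sample("s1", animal_species="Cattle")
programme = ObservationContext(
    "Hypothetical programme 2018",
    hazard="Campylobacter",
    observations=[Observation("o1", [s1])],
)
print(programme.hazard_for(s1), programme.species_targeted())
```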
The job of collating information to report to EFSA/ECDC seems to consist mainly of collating sample/observation data from specific databases, and then manually inputting data about the official surveillance programme.
SVA has been involved with the construction of a framework for documentation of surveillance activities in animal health, as part of the RISKSUR [6] project, which finished in 2015. It is currently being discussed whether that tool could be extended to “One Health surveillance”, so that it could be used to document all the information needed at the surveillance programme level. We are currently working with the assumption that such a tool would exist, so that two main sources of surveillance data would have to be considered in our framework:
- the automated extraction and collation of sample data from existing databases;
- plus data about the surveillance programmes from the documentation tool.
No changes would be required from surveillance agencies to the way they currently store data. The surveillance documentation tool would need to be used only if this information is not currently captured anywhere else before the time to report to EFSA/ECDC, as seems to be the case in the countries evaluated.
Individual “translation dictionaries” will be set up, within each involved agency, to mark up their existing data according to the structure provided by the surveillance ontology. This process would need to be done only once, and repeated only when the underlying data structure changes.
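A “translation dictionary” of this kind could look roughly like the following Python sketch. Everything in it is an assumption for illustration: the local field names, the local codes, the ontology concept names and the `translate` function are hypothetical, and the actual format of the dictionaries has not been defined here.

```python
# Hypothetical translation dictionary for one agency's database.
# Left side: names and codes used locally. Right side: concepts that a
# surveillance ontology might define. None of these come from AHSO.
FIELD_MAP = {
    "species_code": "animal_species",
    "agent": "hazard",
    "sample_dt": "sampling_date",
}
VALUE_MAP = {
    "animal_species": {"BOV": "Cattle", "POR": "Pig"},
    "hazard": {"CAMP": "Campylobacter"},
}


def translate(row: dict) -> dict:
    """Mark up one local database row with ontology concepts."""
    out = {}
    for local_field, value in row.items():
        concept = FIELD_MAP.get(local_field)
        if concept is None:
            continue  # field not covered by the dictionary; skipped in this sketch
        # Translate coded values where a mapping exists, else keep the value.
        out[concept] = VALUE_MAP.get(concept, {}).get(value, value)
    return out


local_row = {"species_code": "BOV", "agent": "CAMP", "sample_dt": "2018-03-01"}
print(translate(local_row))
```

Because the dictionary is data rather than code, the agency's database stays untouched, and only the mapping needs updating when the local structure changes.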
As shown in Figure 1, the framework developed by ORION-WP3, powered by a health surveillance ontology, will then use these translation rules to automatically extract, collate, and translate all needed data. Compared to the existing methods of data reporting, the main advantages would be:
- Automating the work of collating and translating data, reducing errors, delays, and human resources.
- Producing data that are compatible with current required data standards, but are also compatible with a knowledge model that makes those data machine-interpretable, and therefore more readily reusable for inference, reasoning, and knowledge discovery. These data are therefore highly reusable by the institution that produced them, as well as by the stakeholders receiving them.
Further work and discussions are needed to establish the exact workflow within the framework. For instance, since data will be collated and translated automatically, would countries continue to report to ECDC/EFSA yearly, or would we create an API where the stakeholders could query data at any time? Would data be actively pushed or always pulled by stakeholders? All these options are implementable in the framework shown in Figure 1, and through the One Health pilot we will be able to test improvements to the current flows.
ORION-WP3: How to get involved
1. ORION’s “requirement analysis workshop”, held in Berlin in April 2018.
2. HIMSS Dictionary of Healthcare Information Technology Terms, Acronyms and Organizations, 2nd Edition, 2010, Appendix B, p. 190.
3. Natalya F. Noy and Deborah L. McGuinness. 2001. Ontology Development 101: A Guide to Creating Your First Ontology. Available at http://protege.stanford.edu/publications/ontology_development/ontology101.pdf.
4. https://schema.org/.