Ontology-Enabled IRB-Free 5-Minute Retrospective Clinical Trials
Let’s say we are interested in the following research question:
- Do patients who have rosacea have a higher risk of obstructive sleep apnea (OSA) than the general population?
To answer this question, we need to know:
- A. Total number of patients who do not have rosacea
- B. Number of patients who do not have rosacea and have OSA
- C. Number of patients with rosacea
- D. Number of patients with rosacea and OSA
We then compare the OSA rate B/A with the OSA rate D/C using, for example, Pearson's chi-square test. (This is a real example: patients with rosacea had twice the rate of OSA.)
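The comparison of B/A with D/C amounts to a chi-square test on a 2x2 contingency table. The following is a minimal sketch in plain Python, not the authors' implementation. It uses the uncorrected Pearson statistic (no Yates correction), assumes every expected count is nonzero, and takes the function name `compare_osa_rates` and the counts in the usage line as hypothetical illustrations rather than data from the study.

```python
import math


def compare_osa_rates(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test of B/A versus D/C on a 2x2 table.

    Letters follow the text: A = patients without rosacea, B = those of A
    with OSA, C = patients with rosacea, D = those of C with OSA.
    Returns (chi-square statistic, p-value) with 1 degree of freedom and
    no Yates continuity correction. Assumes all expected counts are > 0.
    """
    # Rows: rosacea / no rosacea; columns: OSA / no OSA.
    observed = [[d, c - d],
                [b, a - b]]
    n = a + c
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if OSA were independent of rosacea status.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom chi2 = Z**2, so P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (NOT the study's data): 3% OSA without rosacea, 6% with.
chi2, p = compare_osa_rates(a=10_000, b=300, c=500, d=30)
print(f"rate without = {300 / 10_000:.3f}, rate with = {30 / 500:.3f}")
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

With these made-up counts the statistic is about 14.08, well above 3.84, the critical value for α = 0.05 at 1 degree of freedom. When some expected cell counts are small (a common rule of thumb is below 5), Fisher's exact test is usually preferred over the chi-square approximation.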
But how do we obtain these numbers, and how do we ensure that our sample of patients is representative of the relevant general population? Our strategy has three steps. First, we use ontology tagging to semantically enhance a given body of Electronic Health Record (EHR) data drawn from a large number of patients. In the enhanced data, a single ontology term is associated with every code that refers to the same phenomenon on the side of the patient. Second, we use these enhanced data to select our patient population by applying inclusion and exclusion criteria that are themselves specified using terms from the ontology. Third, we exploit the subsumption hierarchy of the ontology: for each term used in a query, we compute its reflexive transitive closure under subsumption (the term itself together with all of its subtypes), so that we retrieve as much of the relevant data as possible even when those data are coded in many different ways.
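The three steps can be sketched in a few lines of Python. This is an illustration under loudly stated assumptions, not the authors' system: the ontology fragment (`IS_A`), the code-to-term tagging table (`CODE_TO_TERM`), the source codes (`sysA:…`, `sysB:…`), the patient records, and all function names are hypothetical. Tagging is modeled as a plain dictionary lookup, and the closure is a simple graph traversal over `is_a` links.

```python
from collections import defaultdict

# Hypothetical ontology fragment: each term maps to its direct parents (is_a).
IS_A = {
    "rosacea": ["skin disorder"],
    "papulopustular rosacea": ["rosacea"],
    "ocular rosacea": ["rosacea"],
    "sleep apnea": ["sleep disorder"],
    "obstructive sleep apnea": ["sleep apnea"],
}

# Step 1 (hypothetical tagging): codes from two made-up coding systems
# that refer to the same phenomenon are mapped to one ontology term.
CODE_TO_TERM = {
    "sysA:101": "rosacea",
    "sysB:R-7": "rosacea",
    "sysA:102": "papulopustular rosacea",
    "sysB:R-9": "ocular rosacea",
    "sysA:350": "obstructive sleep apnea",
    "sysB:S-2": "obstructive sleep apnea",
}

# Hypothetical patient records: patient ID -> list of source codes.
SAMPLE_RECORDS = {
    "p1": ["sysA:101"],              # rosacea
    "p2": ["sysB:R-9", "sysB:S-2"],  # ocular rosacea + OSA
    "p3": ["sysA:350"],              # OSA only
    "p4": ["sysX:999"],              # code with no ontology tag
    "p5": ["sysA:102", "sysA:350"],  # papulopustular rosacea + OSA
}


def descendants_or_self(term, is_a):
    """Step 3: reflexive transitive closure of subsumption below `term`."""
    children = defaultdict(set)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].add(child)
    closure, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in closure:  # visited check also guards against cycles
                closure.add(child)
                stack.append(child)
    return closure


def patients_with(term, records):
    """IDs of patients with at least one code tagged with `term` or a subtype."""
    wanted = descendants_or_self(term, IS_A)
    return {pid for pid, codes in records.items()
            if any(CODE_TO_TERM.get(code) in wanted for code in codes)}


def counts(records, exposure, outcome, exclude=()):
    """Step 2: apply exclusion criteria, then return (A, B, C, D) as in the text."""
    population = set(records)
    for term in exclude:
        population -= patients_with(term, records)
    exposed = patients_with(exposure, records) & population
    has_outcome = patients_with(outcome, records) & population
    unexposed = population - exposed
    return (len(unexposed), len(unexposed & has_outcome),
            len(exposed), len(exposed & has_outcome))


print(counts(SAMPLE_RECORDS, "rosacea", "obstructive sleep apnea"))
```

On the sample records this prints `(2, 1, 3, 2)`: patients p2 and p5 are counted as having rosacea even though their codes name only subtypes, which is the point of querying with the closure rather than with the literal term. These four numbers are exactly what the B/A versus D/C comparison needs.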
By following this strategy, someone asking clinical research questions does not need to know all the subtypes of, for example, cardiovascular disorder or anti-arrhythmic drug in order to access the corresponding data. Because a single ontology term normalizes the many ways of saying the same thing, information retrieval becomes more reliable. Coupled with a human-friendly interface, these features allow subject-matter experts with no computer science background to pose arbitrarily complex questions against clinical data and get answers within seconds.