Abstract: Elkin

From NCOR Wiki

Revision as of 18:21, 26 June 2016

Ontology-Enabled IRB-Free 5-Minute Retrospective Clinical Trials

Let’s say we are interested in the following research question:

Do patients who have rosacea have a higher risk of obstructive sleep apnea (OSA) than the general population?

To answer this question we need to know:

A. Total number of patients who do not have rosacea
B. Number of patients who do not have rosacea and have OSA
C. Number of patients with rosacea
D. Number of patients with rosacea and OSA

We then compare the rate B/A with the rate D/C using, for example, Pearson's chi-squared test. (This is a real example: people with rosacea had twice the rate of OSA.)
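The comparison of B/A with D/C can be set up as a 2x2 contingency table. The rows are without/with rosacea and the columns are with/without OSA, so the cells are B, A-B, D and C-D.

What follows is a minimal sketch of Pearson's chi-squared test on that table. It assumes the plain statistic without Yates' continuity correction, and it computes the 1-degree-of-freedom p-value from the identity P(X > x) = erfc(sqrt(x/2)). The function name `compare_rates` and the counts in the usage line are hypothetical and for illustration only; they are not the study's data.

```python
import math


def compare_rates(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the counts
    A-D: a = patients without rosacea, b = of those, with OSA,
    c = patients with rosacea, d = of those, with OSA.
    Returns (B/A, D/C, chi-squared statistic, p-value with 1 df)."""
    # Rows: without / with rosacea; columns: with / without OSA.
    table = [[b, a - b], [d, c - d]]
    n = a + c
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # expected count under independence
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For a chi-squared variable with 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return b / a, d / c, chi2, p_value


# Hypothetical counts: rates 0.05 vs 0.10, chi-squared statistic about 4.41.
print(compare_rates(1000, 50, 100, 10))
```

In practice a statistics library would normally be used. The hand-rolled version is there only to make each step of the test explicit.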

But how do we get these numbers? And how do we ensure that our sample of patients is representative of the relevant 'general population'? Our strategy has three steps. First, we use ontology tagging to semantically enhance a body of Electronic Health Record (EHR) data drawn from large numbers of patients. In the enhanced data, a single ontology term is associated with multiple codes whenever those codes refer to the same phenomenon on the side of the patient. Second, we use the enhanced data to select our patient population on the basis of inclusion and exclusion criteria that are themselves specified using ontology terms. Third, for any term used in a query we exploit the ontology's subsumption hierarchy to compute the reflexive transitive closure of subsumption for that term, i.e. the term together with all of its subtypes. In this way we retrieve as much of the relevant data in the dataset as possible, even where that data has been coded in multiple different ways.
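The tagging and closure steps can be sketched in a few lines. Everything in this sketch is a labeled assumption:

* The ontology is a hypothetical toy hierarchy stored as child-to-parents `is_a` edges.
* The source codes (`SYS_A:101` and so on) are made-up placeholders, not codes from any real terminology.
* The helper names `subsumed_terms` and `codes_for` are hypothetical.

The closure is computed by a breadth-first walk down the hierarchy. It is reflexive because it starts from the query term itself, and transitive because it follows subtypes of subtypes.

```python
from collections import deque

# Hypothetical toy ontology: each term maps to its direct parent terms (is_a).
IS_A = {
    "rosacea": {"disorder of skin"},
    "ocular rosacea": {"rosacea"},
    "papulopustular rosacea": {"rosacea"},
    "obstructive sleep apnea": {"sleep apnea"},
    "sleep apnea": {"sleep disorder"},
}

# Hypothetical ontology tagging: several source codes, possibly from
# different coding systems, are tagged with the same ontology term.
CODE_TO_TERM = {
    "SYS_A:101": "rosacea",
    "SYS_B:9001": "rosacea",
    "SYS_B:9002": "ocular rosacea",
    "SYS_C:77": "papulopustular rosacea",
    "SYS_A:505": "obstructive sleep apnea",
    "SYS_B:42": "obstructive sleep apnea",
}


def subsumed_terms(term, is_a):
    """Return `term` plus every term it transitively subsumes, i.e. the
    reflexive transitive closure of is_a taken downward from `term`."""
    children = {}
    for child, parents in is_a.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    closure = {term}  # reflexive: every term subsumes itself
    queue = deque([term])
    while queue:  # breadth-first walk down the hierarchy (transitive)
        for child in children.get(queue.popleft(), set()):
            if child not in closure:
                closure.add(child)
                queue.append(child)
    return closure


def codes_for(term, is_a, code_to_term):
    """All source codes tagged with `term` or with any of its subtypes."""
    terms = subsumed_terms(term, is_a)
    return {code for code, tagged in code_to_term.items() if tagged in terms}


print(sorted(codes_for("rosacea", IS_A, CODE_TO_TERM)))
```

Querying "rosacea" here returns all four rosacea codes, including those tagged only with a subtype. A query for "sleep disorder" picks up both OSA codes, even though neither is tagged with that term directly.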

By following this strategy, someone asking clinical research questions does not need to know all the subtypes of, for example, cardiovascular disorder or anti-arrhythmic drug in order to access the corresponding data. Because a single ontology term normalizes the many ways of saying the same thing, information retrieval becomes more reliable. Coupled with a human-friendly interface, these features allow subject-matter experts with no computer science background to pose complex questions of clinical data and get answers within seconds.
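To show how the user never has to list subtypes, here is an end-to-end sketch that derives the counts A-D from the two top-level query terms alone. The following are assumptions for illustration, not the system described here:

* The patient records (`PATIENTS`) are hypothetical.
* The codes are hypothetical placeholders.
* The hierarchy is a toy one.
* The helpers `expand` and `count_abcd` are hypothetical names.

Each record is reduced to the set of source codes it contains. A patient counts as exposed if any code in the record falls in the expanded code set for the exposure term.

```python
# Hypothetical toy hierarchy (child -> direct parents) and code tags.
IS_A = {
    "ocular rosacea": {"rosacea"},
    "papulopustular rosacea": {"rosacea"},
}
CODE_TO_TERM = {
    "SYS_A:101": "rosacea",
    "SYS_B:9002": "ocular rosacea",
    "SYS_C:77": "papulopustular rosacea",
    "SYS_A:505": "obstructive sleep apnea",
    "SYS_B:42": "obstructive sleep apnea",
}

# Hypothetical patients: each is the set of source codes in their record.
PATIENTS = {
    "p1": {"SYS_A:101", "SYS_A:505"},  # rosacea, OSA
    "p2": {"SYS_B:9002"},              # ocular rosacea only
    "p3": {"SYS_A:505"},               # OSA only
    "p4": set(),                       # neither
    "p5": {"SYS_C:77", "SYS_B:42"},    # papulopustular rosacea, OSA (other code)
    "p6": set(),                       # neither
}


def expand(term):
    """Codes tagged with `term` or any subtype (reflexive transitive closure)."""
    terms, frontier = {term}, [term]
    while frontier:
        parent = frontier.pop()
        for child, parents in IS_A.items():
            if parent in parents and child not in terms:
                terms.add(child)
                frontier.append(child)
    return {code for code, t in CODE_TO_TERM.items() if t in terms}


def count_abcd(exposure_term, outcome_term, patients):
    """Return (A, B, C, D): unexposed, unexposed with outcome,
    exposed, exposed with outcome."""
    exposure_codes, outcome_codes = expand(exposure_term), expand(outcome_term)
    a = b = c = d = 0
    for codes in patients.values():
        has_outcome = not codes.isdisjoint(outcome_codes)
        if codes.isdisjoint(exposure_codes):  # no exposure code at any level
            a += 1
            b += int(has_outcome)
        else:
            c += 1
            d += int(has_outcome)
    return a, b, c, d


print(count_abcd("rosacea", "obstructive sleep apnea", PATIENTS))
```

Patients p2 and p5 are counted as having rosacea even though their records contain only subtype codes. This is the behaviour the paragraph above describes.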