CTS Ontology Workshop 2023: Difference between revisions

From NCOR Wiki
mNo edit summary
 
(79 intermediate revisions by 3 users not shown)
Line 1: Line 1:
<font size="+3">Ontologies, AI and Electronic Health Records</font>  
<font size="+2">Ontologies, AI and Electronic Health Records</font>  


More about the '''[http://ncorwiki.buffalo.edu/index.php/Clinical_and_Translational_Science_Ontology_Group Clinical and Translational Science Ontology Group (CTSOG)]''' and previous meetings.
<font size="+1">Feb 23 - 24, 2023 - Charleston, SC</font>
 
<font size="+2">Feb 23 - 24, 2023 - Charleston, SC</font>


[[File:Chs23sm.jpg|link=|right|alt=Charleston]]
[[File:Chs23sm.jpg|link=|right|alt=Charleston]]
Line 12: Line 10:


'''The Clinical and Translational Science Ontology Group (CTSOG)''' invites you to join us February 23-24, 2023, in Charleston, SC to discuss the role of ontologies in improving electronic health record (EHR) systems by advancing semantic interoperability, translational research, and artificial intelligence (AI). As health data increases in volume and complexity, and artificial intelligence applications gain momentum, the need for careful planning and interoperability becomes more critical. The purpose of this workshop is to explore new paradigms in EHRs, by examining successes and failures of ontologies and AI in different areas of biomedicine and their role in equitable healthcare.
'''The Clinical and Translational Science Ontology Group (CTSOG)''' invites you to join us February 23-24, 2023, in Charleston, SC to discuss the role of ontologies in improving electronic health record (EHR) systems by advancing semantic interoperability, translational research, and artificial intelligence (AI). As health data increases in volume and complexity, and artificial intelligence applications gain momentum, the need for careful planning and interoperability becomes more critical. The purpose of this workshop is to explore new paradigms in EHRs, by examining successes and failures of ontologies and AI in different areas of biomedicine and their role in equitable healthcare.
More about the '''[http://ncorwiki.buffalo.edu/index.php/Clinical_and_Translational_Science_Ontology_Group Clinical and Translational Science Ontology Group (CTSOG)]''' and previous meetings.


== '''Themes''' ==
== '''Themes''' ==
Line 17: Line 17:
* Improving the EHR with ontologies and with AI
* Improving the EHR with ontologies and with AI
* The functions of the EHR and other healthcare documents
* The functions of the EHR and other healthcare documents
<font size="+1">Special Focus Areas:</font>
* Social Determinants of Health
* Social Determinants of Health
* Mental health
* Mental health
* EHR across the lifespan
* ChatGPT


== '''Organizers''' ==
== '''Organizers''' ==


Workshop Co-organizers:
'''Workshop Co-organizers'''


Bill Hogan, Jihad Obeid, Barry Smith
Bill Hogan, Jihad Obeid, Barry Smith


<font size="+1">CTSOG Co-chairs:</font>
'''CTSOG Co-chairs'''


Bill Hogan (University of Florida College of Medicine, Gainesville, FL), hoganwr@ufl.edu
Bill Hogan (University of Florida College of Medicine, Gainesville, FL), hoganwr@ufl.edu
Line 42: Line 39:
[[File:Markiii-logo_160.png|link=https://www.markiiisys.com/|Mark III Systems]]    &nbsp; &nbsp; &nbsp;&nbsp; &nbsp; &nbsp;
[[File:Markiii-logo_160.png|link=https://www.markiiisys.com/|Mark III Systems]]    &nbsp; &nbsp; &nbsp;&nbsp; &nbsp; &nbsp;
[[File:Nvidia-logo_180.png|link=https://www.nvidia.com/en-us/|NVIDIA]]    &nbsp; &nbsp; &nbsp;&nbsp; &nbsp; &nbsp;
[[File:Nvidia-logo_180.png|link=https://www.nvidia.com/en-us/|NVIDIA]]    &nbsp; &nbsp; &nbsp;&nbsp; &nbsp; &nbsp;
[[File:TriNetX.png|link=http://trinetx.com/|TriNetX]]
[[File:TriNetX-Logo-Horizontal-White_180.jpg|link=http://trinetx.com/|TriNetX]]


== '''Venue'''==
== '''Venue'''==
Line 62: Line 59:
* App-based parking next to 22 WestEdge building (signage with instructions posted onsite)
* App-based parking next to 22 WestEdge building (signage with instructions posted onsite)


== '''Hotel'''==
=='''Registration'''==  
 
Registration is free.
 
However, registration is required for planning purposes. Registration is now closed.
 
== '''Agenda: Thursday Feb 23'''==
 
'''08:30a''': Breakfast and Networking
 
'''09:00a''': Introduction to the meeting
 
'''09:15-10:45a''' Working sessions/Presentations
 
*'''Barry Smith and Jobst Landgrebe:''' Discussion on AI and Medicine, with a special focus on ChatGPT
 
Slides: [https://buffalo.box.com/v/Smith-on-ChatGPT Smith], [https://buffalo.box.com/v/Landgrebe-ChatGPT Landgrebe]
 
[https://www.youtube.com/watch?v=2f5Dh56lOBY Video]
 
How are we to explain the peculiar tendency of ChatGPT to throw up what are called [https://buffalo.box.com/v/ChatGPT-Hallucinations ‘hallucinations’]? To answer this question we will draw on our book ''[https://buffalo.app.box.com/v/AI-Without-Fear Why Machines Will Never Rule the World]'', whose core thesis is that an artificial intelligence that could equal or exceed human intelligence—sometimes called "artificial general intelligence" (AGI)—is for mathematical reasons impossible. The argument for this thesis rests on the fact that (1.) human intelligence is a capability of a complex dynamic system—the human brain and central nervous system, and (2.) systems of this sort (like all organic systems) cannot be modelled mathematically in a way that would allow the models to operate inside a computer. We survey on this basis the potential of AI in the future of clinical and translational research.
 
'''--break--'''
 
'''11:00-12:30p''' Working sessions/Presentations
 
*'''Discussion and brief presentations.''' Topics to include:
:The Clinical and Translational Science Ontology Landscape in 2023
 
[https://buffalo.box.com/s/6nefh88tsdmo0e1cuirprei2mq189wzn Video]
 
'''12:30-01:30p'''  Networking and Working Lunch


'''Courtyard by Marriott Charleston Waterfront'''
'''01:30-3:00p''' Working sessions/Presentations


'''Address:''' 35 Lockwood Dr, Charleston, SC 29401
*'''Yonghui Wu:''' A large language model for electronic health records.


'''Phone:''' (843) 722-7229
[https://buffalo.box.com/v/Yongui-AI-and-EHR Slides]


Natural language processing (NLP) powered by pretrained language models is the key technology for medical AI systems utilizing clinical narratives. We develop from scratch a large clinical language model—GatorTron—using >90 billion words of text (including >82 billion words of de-identified clinical text). GatorTron models scale up the clinical language model from 110 million to 8.9 billion parameters and improve five clinical NLP tasks, which can be applied to medical AI systems to improve healthcare delivery.


'''--break--'''


[https://www.marriott.com/event-reservations/reservation-link.mi?id=1670450257241&key=GRP&app=resvlink  Feb. 22-23 reservation link]
'''3:15-5:00p''' Working sessions/Presentations


[https://www.marriott.com/event-reservations/reservation-link.mi?id=1670451475075&key=GRP&app=resvlink Feb. 24-25 reservation link] (use this link if you are extending your stay in Charleston through the weekend)
*'''Richard Ohrbach and Barry Smith:''' Defining 'Injury'


If you have any issues registering, please call the hotel directly at 843-722-7229 and mention you are visiting for the CTSA Ontology meeting.
[https://buffalo.box.com/v/Ohrbach-Defining-Injury Slides]


=='''Registration'''==
[https://buffalo.box.com/s/tqjac00xph64t41q3i5lz00f2jskufxs Video]


[https://redcap.musc.edu/surveys/?s=W7FNH78FFHCRNN9Y Registration is free].
This session is intended as a contribution to the ontology component of a broader investigation of injury and pain. SNOMED CT defines an injury as a 'disorder resulting from physical damage to the body'. The WHO defines an injury as a 'bodily lesion at the organic level, resulting from acute exposure to energy (mechanical, thermal, electrical, chemical or radiant), in amounts that exceed the threshold of physiological tolerance.' We will explore these and other definitions with a view to establishing a more coherent understanding of the ontology of injury and of related phenomena such as lesion, trauma, pain, and so forth. Further topics for possible discussion are listed <u>[[Injury | here]]</u>. For the ontology of pain, see in particular [https://philpapers.org/archive/SMITAO-12.pdf here].


However, '''[https://redcap.musc.edu/surveys/?s=W7FNH78FFHCRNN9Y registration is required]''' for planning purposes. Please register for the workshop using [https://redcap.musc.edu/surveys/?s=W7FNH78FFHCRNN9Y this form].
'''--break--'''


== '''Speakers'''==
'''5:30p''' Networking Reception


*'''Barry Smith and Jobst Landgrebe:''' AI and Medicine, with a special focus on ChatGPT
=='''Agenda: Friday Feb 24'''==


'''Abstract: '''How are we to explain the peculiar tendency of ChatGPT to throw up what are called ‘hallucinations’? To answer this question we will draw on our book ''[https://buffalo.app.box.com/v/AI-Without-Fear Why Machines Will Never Rule the World]'', whose core thesis is that an artificial intelligence that could equal or exceed human intelligence—sometimes called "artificial general intelligence" (AGI)—is for mathematical reasons impossible. The argument for this thesis rests on the fact that (1.) human intelligence is a capability of a complex dynamic system—the human brain and central nervous system, and (2.) systems of this sort (like all organic systems) cannot be modelled mathematically in a way that would allow the models to operate inside a computer. We survey on this basis the potential of AI in the future of clinical and translational research.
'''08:30-09:15a''' Breakfast and Networking


*'''Justin Reese:''' Unsupervised machine learning to define subtypes of long COVID using the Human Phenotype Ontology. [[Authors]]
'''09:15-10:15a''' Working sessions/Presentations


'''Abstract:''' Stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, long COVID is incompletely understood and characterized by a wide range of manifestations that are difficult to analyze computationally. Additionally, the generalisability of machine learning classification of COVID-19 clinical outcomes has rarely been tested.
*'''William Hogan:''' Semantic Representation of Occupations as Social Determinants of Health


We present a method for defining subtypes of long COVID by computationally modeling PASC phenotype data and applying unsupervised machine learning. We extracted long COVID phenotype data expressed as HPO (Human Phenotype Ontology) terms from electronic healthcare records (EHRs), and then used semantic similarity of phenotype data to calculate a matrix of pairwise patient similarity. This matrix was then used to cluster patients using unsupervised machine learning (k-means clustering).
[https://buffalo.box.com/v/Hogan-Occupations-Ontology Slides]


We identified six clusters of PASC patients, each with distinct profiles of phenotypic abnormalities, including clusters with distinct pulmonary, neuropsychiatric, and cardiovascular abnormalities, and a cluster associated with broad, severe manifestations and increased mortality. There was significant association of cluster membership with a range of pre-existing conditions and measures of severity during acute COVID-19. We assigned new patients from other healthcare centers to clusters by maximum semantic similarity to the original patients, and showed that the clusters were generalisable across different hospital systems. The increased mortality rate originally identified in one cluster was consistently observed in patients assigned to that cluster in other hospital systems.
[https://buffalo.box.com/s/tqjac00xph64t41q3i5lz00f2jskufxs Video]


Semantic phenotypic clustering provides a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC.
The ontological representation of occupation presents several challenges. For example, is it your occupation if you have not yet done it? Are you a nuclear engineer before you start your first job doing it? Also, if an occupation is a role (in the BFO sense), then how does it differ from a job role?  If I have an occupation of professor, and have had four job roles as professors at different organizations, then that is one occupation numerically but four job roles. How to handle non-employment situations such as "retired" and "homemaker"? What if you concurrently hold two jobs in different "occupations"? I will discuss these issues in the context of recent discussions at the OMRSE monthly meeting and present some options for ontological representation of occupation and related entities.


*'''William Hogan:''' Semantic Representation of Occupations as Social Determinants of Health.
'''--break--'''
*'''Yonghui Wu:''' A large language model for electronic health records.


'''Abstract:''' Natural language processing (NLP) powered by pretrained language models is the key technology for medical AI systems utilizing clinical narratives. We develop from scratch a large clinical language model—GatorTron—using >90 billion words of text (including >82 billion words of de-identified clinical text). GatorTron models scale up the clinical language model from 110 million to 8.9 billion parameters and improve five clinical NLP tasks, which can be applied to medical AI systems to improve healthcare delivery.
'''10:30-12:30p''' Working sessions/Presentations


*'''Richard Ohrbach and Barry Smith:''' Defining 'Injury'
*'''Justin Reese:''' Unsupervised machine learning to define subtypes of long COVID using the Human Phenotype Ontology. [[Authors | Co-authors]]


'''Abstract:''' This session is intended as a first contribution to the ontology component of the program project for investigating injury and pain response to the NIH [https://www.nidcr.nih.gov/grants-funding/funding-priorities/future-research-initiatives/tmd-collaborative-improving-patientcentered-translational-research-tmd-impact TMD IMPACT Collaborative for IMproving PAtient-Centered Translational Research]. SNOMED-CT defines an injury as a 'disorder resulting from physical damage to the body'. The WHO defines an injury as a 'bodily lesion at the organic level, resulting from acute exposure to energy (mechanical, thermal, electrical, chemical or radiant), in amounts that exceed the threshold of physiological tolerance.' We will explore these and other definitions with a view to establishing a more coherent understanding of the ontology of injury and of related phenomena such as lesion, trauma, pain
[https://docs.google.com/presentation/d/1WKvoNl6yqiA00zH5qHkRfDKTksboTX897zU7yJpT8yY/edit#slide=id.g1dd10f2c7e8_0_4793 Slides]


Part of the
[https://buffalo.box.com/s/it19e7j0o26g5tgy64o6snm66944eyw7 Video]


== '''Agenda'''==
Stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, long COVID is incompletely understood and characterized by a wide range of manifestations that are difficult to analyze computationally. Additionally, the generalisability of machine learning classification of COVID-19 clinical outcomes has rarely been tested.


'''Thursday Feb 23rd'''
We present a method for defining subtypes of long COVID by computationally modeling PASC phenotype data and applying unsupervised machine learning. We extracted long COVID phenotype data expressed as HPO (Human Phenotype Ontology) terms from electronic healthcare records (EHRs), and then used semantic similarity of phenotype data to calculate a matrix of pairwise patient similarity. This matrix was then used to cluster patients using unsupervised machine learning (k-means clustering).


'''09:00'''-09:30a Breakfast and Networking
We identified six clusters of PASC patients, each with distinct profiles of phenotypic abnormalities, including clusters with distinct pulmonary, neuropsychiatric, and cardiovascular abnormalities, and a cluster associated with broad, severe manifestations and increased mortality. There was significant association of cluster membership with a range of pre-existing conditions and measures of severity during acute COVID-19. We assigned new patients from other healthcare centers to clusters by maximum semantic similarity to the original patients, and showed that the clusters were generalisable across different hospital systems. The increased mortality rate originally identified in one cluster was consistently observed in patients assigned to that cluster in other hospital systems.


09:30-10:45a Working sessions/Presentations
Semantic phenotypic clustering provides a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC.


--break--
*'''Paul Heider:''' Beyond Structure:  Using Standardized Labels from a Clinical Section Ontology in Natural Language Processing.


11:00-12:30p  Working sessions/Presentations
[https://buffalo.box.com/v/Cllinical-Section-Ontology Slides]


12:30-01:00p  Networking and Working Lunch
[https://buffalo.box.com/s/w4n2drw17v13imdc026flj5y5gqize2l Video]


1:00-03:15p Working sessions/Presentations
The structured subdivision of clinical notes into sections is often treated as the primary and ultimate use for automatically tagging section boundaries. We discuss how section information (especially when normalized to a standard ontology) can be used to augment natural language processing algorithms, simplify information retrieval tasks, and speed up system development through disaggregated performance evaluation. Our examples emphasize a Clinical Section Ontology recently developed at MUSC but are applicable to any standardized section ontology. We also include a brief overview of the annotated datasets in this domain.


--break--
'''12:30-01:30p''' Networking and Working Lunch


03:30-'''5:00p''' Working sessions/Presentations
'''1:00-03:15p''' Working sessions/Presentations from industry partners


'''Friday Feb 24th'''
*'''Andy Lin: (VP Strategy, CTO | Mark III Systems)''' NVIDIA DGX H100 – The Premier AI Platform for Research and LLMs.


'''09:00'''-09:30a Breakfast and Networking
[https://buffalo.box.com/v/Lin-NVIDIA-DGX-H100 Slides]


09:30-10:45a Working sessions/Presentations
[https://buffalo.box.com/s/v9irq9k9c31yvtn97s91aenvbqfebfri Video]


--break--
With up to 4x training and up to 30x inferencing performance improvements for Large Language Models (LLMs) vs the already industry-leading A100 GPU, the recently launched H100 GPU (Hopper) and its purpose-built DGX H100 AI platform are transforming what’s possible with Research and LLMs. In this session, we’ll dive into the DGX H100 platform and how you can get started or accelerate your work with NVIDIA and Mark III.


11:00-12:30p  Working sessions/Presentations
*'''John Doole: (TriNetX, LLC)''' Medication Informatics at TriNetX: Powering Clinical Research with a Federated Real World Data Network.


12:30-01:00p  Networking and Working Lunch
[https://buffalo.box.com/v/Charleston-TriNetX Slides]


1:00-03:15p Working sessions/Presentations
[https://buffalo.box.com/v/Charleston-TriNetX Video]


--break--
In this talk we will highlight the importance of medication informatics in clinical research, particularly in the context of a federated network. The talk discusses the challenges of data standardization and interoperability and the benefits of using medication ontologies and classification systems to improve data quality and efficiency. Additionally, the talk presents TriNetX's medication framework and how natural language processing is used to extract medication details, showcasing examples of its implementation and a discussion of the advantages and disadvantages of using NLP in clinical research.


03:30-'''5:00p''' Working sessions/Presentations
'''03:15-3:30p''' Closing Remarks


== '''Participants''' ==
== '''Participants''' ==


* Barry Smith, co-organizer, University at Buffalo
* Alekseyenko, Alex, Medical University of South Carolina
* William Hogan, co-organizer, University of Florida
* Baker, Hamilton, Medical University of South Carolina
* Jihad Obeid, co-organizer, Medical University of South Carolina
* Chatterjee, Prosenjit, The Citadel
 
* Doole, John, TriNetX
* Hamilton Baker, Medical University of South Carolina
* Heider, Paul, Medical University of South Carolina
* Jobst Landgrebe, University at Buffalo
* Hogan, William, co-organizer, University of Florida
* Anna Maria Masci, National Institute of Environmental Health Sciences (NIEHS)
* Hutchinson, Tom, University of Pennsylvania, Institute for BioInformatics
* Justin Reese, Lawrence Berkeley National Laboratory
* Kim, Wonhee, Visiting Professor, Medical University of South Carolina
* Landgrebe, Jobst, Cognotekt, Cologne
* Lin, Andy, NVIDIA
* Masci, Anna Maria, National Institute of Environmental Health Sciences (NIEHS)
* Obeid, Jihad, co-organizer, Medical University of South Carolina
* Ohrbach, Richard, University at Buffalo
* Reese, Justin, Lawrence Berkeley National Laboratory
* Scheuermann, Richard, J. Craig Venter Institute
* Simpson, Kit, Medical University of South Carolina
* Skowronek, Matt, TriNetX
* Smith, Barry, co-organizer, University at Buffalo
* Topaloglu, Umit, Center for Biomedical Informatics and Information Technology (CBIIT) National Cancer Institute
* Wehbe, Ramsey, Medical University of South Carolina
* Yonghui Wu, University of Florida
* Yonghui Wu, University of Florida
* Zheng, Jie, University of Pennsylvania

Latest revision as of 19:53, 16 November 2023

Ontologies, AI and Electronic Health Records

Feb 23 - 24, 2023 - Charleston, SC


Background

The Clinical and Translational Science Ontology Group (CTSOG) invites you to join us February 23-24, 2023, in Charleston, SC to discuss the role of ontologies in improving electronic health record (EHR) systems by advancing semantic interoperability, translational research, and artificial intelligence (AI). As health data increases in volume and complexity, and artificial intelligence applications gain momentum, the need for careful planning and interoperability becomes more critical. The purpose of this workshop is to explore new paradigms in EHRs, by examining successes and failures of ontologies and AI in different areas of biomedicine and their role in equitable healthcare.

More about the Clinical and Translational Science Ontology Group (CTSOG) and previous meetings.

Themes

  • Improving the EHR with ontologies and with AI
  • The functions of the EHR and other healthcare documents
  • Social Determinants of Health
  • Mental health
  • ChatGPT

Organizers

Workshop Co-organizers

Bill Hogan, Jihad Obeid, Barry Smith

CTSOG Co-chairs

Bill Hogan (University of Florida College of Medicine, Gainesville, FL), hoganwr@ufl.edu

Barry Smith (University at Buffalo, Buffalo, NY), phismith@buffalo.edu

Sponsors

  • Medical University of South Carolina, the Biomedical Informatics Center and:

Mark III Systems, NVIDIA, TriNetX

Venue

Biomedical Informatics Center

Medical University of South Carolina

Address: 22 Westedge St, Charleston, SC 29403

Directions: From Courtyard Marriott on Lockwood

  • 10 minute walk, 0.4 miles, safe crosswalks along the route
  • Hotel shuttle – runs periodically, need to ask hotel desk for timing details.

Paid parking:

  • Parking deck at 10 WestEdge (building before 22 WestEdge, with Publix on bottom floor)
  • App-based parking next to 22 WestEdge building (signage with instructions posted onsite)

Registration

Registration is free.

However, registration is required for planning purposes. Registration is now closed.

Agenda: Thursday Feb 23

08:30a: Breakfast and Networking

09:00a: Introduction to the meeting

09:15-10:45a Working sessions/Presentations

  • Barry Smith and Jobst Landgrebe: Discussion on AI and Medicine, with a special focus on ChatGPT

Slides: Smith, Landgrebe

Video

How are we to explain the peculiar tendency of ChatGPT to throw up what are called ‘hallucinations’? To answer this question we will draw on our book Why Machines Will Never Rule the World, whose core thesis is that an artificial intelligence that could equal or exceed human intelligence—sometimes called "artificial general intelligence" (AGI)—is for mathematical reasons impossible. The argument for this thesis rests on the fact that (1.) human intelligence is a capability of a complex dynamic system—the human brain and central nervous system, and (2.) systems of this sort (like all organic systems) cannot be modelled mathematically in a way that would allow the models to operate inside a computer. We survey on this basis the potential of AI in the future of clinical and translational research.

--break--

11:00-12:30p Working sessions/Presentations

  • Discussion and brief presentations. Topics to include:
The Clinical and Translational Science Ontology Landscape in 2023

Video

12:30-01:30p Networking and Working Lunch

01:30-3:00p Working sessions/Presentations

  • Yonghui Wu: A large language model for electronic health records.

Slides

Natural language processing (NLP) powered by pretrained language models is the key technology for medical AI systems utilizing clinical narratives. We develop from scratch a large clinical language model—GatorTron—using >90 billion words of text (including >82 billion words of de-identified clinical text). GatorTron models scale up the clinical language model from 110 million to 8.9 billion parameters and improve five clinical NLP tasks, which can be applied to medical AI systems to improve healthcare delivery.

--break--

3:15-5:00p Working sessions/Presentations

  • Richard Ohrbach and Barry Smith: Defining 'Injury'

Slides

Video

This session is intended as a contribution to the ontology component of a broader investigation of injury and pain. SNOMED CT defines an injury as a 'disorder resulting from physical damage to the body'. The WHO defines an injury as a 'bodily lesion at the organic level, resulting from acute exposure to energy (mechanical, thermal, electrical, chemical or radiant), in amounts that exceed the threshold of physiological tolerance.' We will explore these and other definitions with a view to establishing a more coherent understanding of the ontology of injury and of related phenomena such as lesion, trauma, pain, and so forth. Further topics for possible discussion are listed here. For the ontology of pain, see in particular here.

--break--

5:30p Networking Reception

Agenda: Friday Feb 24

08:30-09:15a Breakfast and Networking

09:15-10:15a Working sessions/Presentations

  • William Hogan: Semantic Representation of Occupations as Social Determinants of Health

Slides

Video

The ontological representation of occupation presents several challenges. For example, is it your occupation if you have not yet done it? Are you a nuclear engineer before you start your first job doing it? Also, if an occupation is a role (in the BFO sense), then how does it differ from a job role? If I have an occupation of professor, and have had four job roles as professors at different organizations, then that is one occupation numerically but four job roles. How to handle non-employment situations such as "retired" and "homemaker"? What if you concurrently hold two jobs in different "occupations"? I will discuss these issues in the context of recent discussions at the OMRSE monthly meeting and present some options for ontological representation of occupation and related entities.
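The professor example in the abstract (one occupation, four job roles) can be made concrete with a small data-structure sketch. This is only one possible reading and not the speaker's model: the class names `Occupation`, `JobRole`, and `Person`, and the organization names, are hypothetical, and the sketch takes no position on the open questions the abstract raises (occupations not yet practiced, "retired", "homemaker").

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Occupation:
    """A kind of work, identified here only by its label (an assumption)."""
    label: str


@dataclass(frozen=True)
class JobRole:
    """A role held at one organization, linked to the occupation it realizes."""
    occupation: Occupation
    organization: str


@dataclass
class Person:
    name: str
    job_roles: list = field(default_factory=list)

    def occupations(self):
        # Distinct occupations across all job roles held so far.
        return {role.occupation for role in self.job_roles}


professor = Occupation("professor")
person = Person("example person")
for org in ["Org A", "Org B", "Org C", "Org D"]:  # hypothetical organizations
    person.job_roles.append(JobRole(professor, org))

print(len(person.job_roles), "job roles;", len(person.occupations()), "occupation")
```

Holding two concurrent jobs in different occupations fits the same structure: the set returned by `occupations()` simply has two members.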

--break--

10:30-12:30p Working sessions/Presentations

  • Justin Reese: Unsupervised machine learning to define subtypes of long COVID using the Human Phenotype Ontology. Co-authors

Slides

Video

Stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, long COVID is incompletely understood and characterized by a wide range of manifestations that are difficult to analyze computationally. Additionally, the generalisability of machine learning classification of COVID-19 clinical outcomes has rarely been tested.

We present a method for defining subtypes of long COVID by computationally modeling PASC phenotype data and applying unsupervised machine learning. We extracted long COVID phenotype data expressed as HPO (Human Phenotype Ontology) terms from electronic healthcare records (EHRs), and then used semantic similarity of phenotype data to calculate a matrix of pairwise patient similarity. This matrix was then used to cluster patients using unsupervised machine learning (k-means clustering).

We identified six clusters of PASC patients, each with distinct profiles of phenotypic abnormalities, including clusters with distinct pulmonary, neuropsychiatric, and cardiovascular abnormalities, and a cluster associated with broad, severe manifestations and increased mortality. There was significant association of cluster membership with a range of pre-existing conditions and measures of severity during acute COVID-19. We assigned new patients from other healthcare centers to clusters by maximum semantic similarity to the original patients, and showed that the clusters were generalisable across different hospital systems. The increased mortality rate originally identified in one cluster was consistently observed in patients assigned to that cluster in other hospital systems.

Semantic phenotypic clustering provides a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC.
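The pipeline described in this abstract (phenotype term sets, then a pairwise patient similarity matrix, then k-means, then assignment of new patients by maximum similarity) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: Jaccard overlap stands in for ontology-aware semantic similarity, k-means is run on the rows of the similarity matrix, and the patient IDs and term names are hypothetical placeholders rather than real HPO identifiers.

```python
import random


def jaccard(a, b):
    """Set overlap in [0, 1]; a simple stand-in for semantic similarity (assumption)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_matrix(patients):
    """Return sorted patient IDs and the matrix of pairwise similarities."""
    ids = sorted(patients)
    return ids, [[jaccard(patients[p], patients[q]) for q in ids] for p in ids]


def kmeans(rows, k, iterations=20, seed=0):
    """Plain k-means on the rows of the matrix; returns one cluster label per row."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assign each row to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centroids[c])))
            for row in rows
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def assign_new_patient(terms, patients, ids, labels):
    """Give a new patient the cluster of the most similar original patient."""
    best = max(ids, key=lambda p: jaccard(terms, patients[p]))
    return labels[ids.index(best)]


# Hypothetical toy data: two patients with respiratory terms, two with neurological terms.
patients = {
    "p1": {"cough", "dyspnea"},
    "p2": {"cough", "dyspnea", "fatigue"},
    "p3": {"headache", "anxiety"},
    "p4": {"headache", "anxiety", "brain_fog"},
}
ids, matrix = similarity_matrix(patients)
labels = kmeans(matrix, k=2)
print(dict(zip(ids, labels)))
print(assign_new_patient({"cough", "dyspnea", "chest_pain"}, patients, ids, labels))
```

A real analysis would replace `jaccard` with a similarity measure that uses the ontology's hierarchy, so that related but non-identical terms still count as similar.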

  • Paul Heider: Beyond Structure:  Using Standardized Labels from a Clinical Section Ontology in Natural Language Processing.

Slides

Video

The structured subdivision of clinical notes into sections is often treated as the primary and ultimate use for automatically tagging section boundaries. We discuss how section information (especially when normalized to a standard ontology) can be used to augment natural language processing algorithms, simplify information retrieval tasks, and speed up system development through disaggregated performance evaluation. Our examples emphasize a Clinical Section Ontology recently developed at MUSC but are applicable to any standardized section ontology. We also include a brief overview of the annotated datasets in this domain.
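As a rough illustration of disaggregated performance evaluation, the sketch below normalizes raw section headers to standard labels and then reports accuracy per section instead of one overall figure. The mapping `SECTION_MAP` and its labels are hypothetical and are not taken from the Clinical Section Ontology; the records are invented toy data.

```python
from collections import defaultdict

# Hypothetical mapping from raw header variants to standardized section labels.
SECTION_MAP = {
    "hpi": "history_of_present_illness",
    "history of present illness": "history_of_present_illness",
    "meds": "medications",
    "medications": "medications",
}


def normalize(header):
    """Map a raw section header to a standard label, or 'unknown' if unmapped."""
    return SECTION_MAP.get(header.strip().lower().rstrip(":"), "unknown")


def accuracy_by_section(records):
    """records: iterable of (raw_header, gold_label, predicted_label) tuples."""
    totals = defaultdict(lambda: [0, 0])  # section -> [correct, total]
    for header, gold, predicted in records:
        section = normalize(header)
        totals[section][0] += int(gold == predicted)
        totals[section][1] += 1
    return {section: correct / total for section, (correct, total) in totals.items()}


# Toy predictions from some downstream NLP task (invented for illustration).
records = [
    ("HPI:", "symptom", "symptom"),
    ("History of Present Illness", "symptom", "drug"),
    ("Meds", "drug", "drug"),
    ("Medications:", "drug", "drug"),
]
for section, accuracy in sorted(accuracy_by_section(records).items()):
    print(section, accuracy)
```

Breaking the score out this way shows at a glance which sections a system handles poorly, which a single pooled accuracy would hide.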

12:30-01:30p Networking and Working Lunch

1:00-03:15p Working sessions/Presentations from industry partners

  • Andy Lin: (VP Strategy, CTO | Mark III Systems) NVIDIA DGX H100 – The Premier AI Platform for Research and LLMs.

Slides

Video

With up to 4x training and up to 30x inferencing performance improvements for Large Language Models (LLMs) vs the already industry-leading A100 GPU, the recently launched H100 GPU (Hopper) and its purpose-built DGX H100 AI platform are transforming what’s possible with Research and LLMs. In this session, we’ll dive into the DGX H100 platform and how you can get started or accelerate your work with NVIDIA and Mark III.

  • John Doole: (TriNetX, LLC) Medication Informatics at TriNetX: Powering Clinical Research with a Federated Real World Data Network.

Slides

Video

In this talk we will highlight the importance of medication informatics in clinical research, particularly in the context of a federated network. The talk discusses the challenges of data standardization and interoperability and the benefits of using medication ontologies and classification systems to improve data quality and efficiency. Additionally, the talk presents TriNetX's medication framework and how natural language processing is used to extract medication details, showcasing examples of its implementation and a discussion of the advantages and disadvantages of using NLP in clinical research.

03:15-3:30p Closing Remarks

Participants

  • Alekseyenko, Alex, Medical University of South Carolina
  • Baker, Hamilton, Medical University of South Carolina
  • Chatterjee, Prosenjit, The Citadel
  • Doole, John, TriNetX
  • Heider, Paul, Medical University of South Carolina
  • Hogan, William, co-organizer, University of Florida
  • Hutchinson, Tom, University of Pennsylvania, Institute for BioInformatics
  • Kim, Wonhee, Visiting Professor, Medical University of South Carolina
  • Landgrebe, Jobst, Cognotekt, Cologne
  • Lin, Andy, NVIDIA
  • Masci, Anna Maria, National Institute of Environmental Health Sciences (NIEHS)
  • Obeid, Jihad, co-organizer, Medical University of South Carolina
  • Ohrbach, Richard, University at Buffalo
  • Reese, Justin, Lawrence Berkeley National Laboratory
  • Scheuermann, Richard, J. Craig Venter Institute
  • Simpson, Kit, Medical University of South Carolina
  • Skowronek, Matt, TriNetX
  • Smith, Barry, co-organizer, University at Buffalo
  • Topaloglu, Umit, Center for Biomedical Informatics and Information Technology (CBIIT) National Cancer Institute
  • Wehbe, Ramsey, Medical University of South Carolina
  • Wu, Yonghui, University of Florida
  • Zheng, Jie, University of Pennsylvania