Ontology for Data Science

From NCOR Wiki
Revision as of 19:32, 18 February 2018 by Phismith (talk | contribs)

Title: PHI 549 Introduction to Ontology for Data Science and Information Fusion, Spring 2019

Proposed Special Topics Course

Faculty: Barry Smith

Registration: Class# . Registration details for off-campus students are provided under Part Time/Graduate here.

Course Structure: This will be a one-credit-hour course intended for beginning PhD students, Master's students and advanced undergraduate students in all departments. Assessment will be in the form of class attendance (50%), together with ontology-building exercises using the Protégé ontology editing software, to be completed within 2 weeks of the class (50%). For those enrolled in the CMIF Masters Program, completion of this course will represent 1 credit hour in the 3-hour compulsory ontology class for this program.

Course Description: An ontology is a structured collection of terms used to tag data with the goal of making data deriving from heterogeneous sources more easily searchable, comparable, combinable, or analysable. Ontologies allow information to be shared across communities with different sorts of expertise. The course will provide an introduction to ontology for students of data science and information fusion. It is co-sponsored by the Department of Philosophy and the Center for Multi-Source Information Fusion.

Schedule: The class will be taught over two consecutive days in the week before the beginning of the Spring semester.


Day 1

8:30 Registration and Coffee

9:00 Introduction to Ontology, Data Science and Information Fusion

Data mining
Natural language processing
Explainable artificial intelligence

10:30 Coffee

10:45 Ontology Time-Line

1: 1970s: Strong AI, Robotics, PSL
2: 1990s: The Semantic Web, Linked Open Data
3: 2000s: Lessons from the Human Genome Project
4: 2019: Current examples of uses of ontology in data science and information fusion

11:00

12:00 Lunch

12:45 Use of Ontologies in Data Fusion and Data Analysis

Examples from biology
Examples from environmental sciences
Examples from medicine

14:00 Use of Ontologies in Military Domains

Joint Doctrine Ontology
Ontology for command and control
Ontology and military logistics
Space ontology

16:00 Close

Day 2

8:30 Registration and Coffee

9:00 Use of Ontologies in Intelligence Domains

Ontology and intelligence analysis
Ontology of terrorism
Ontology and information fusion

10:30 Coffee

10:45 Basic Formal Ontology

ISO 21838
Common Core Ontologies (CCO)
Industrial Ontologies Foundry (IOF)

12:00 Lunch

12:45 Ontology Technology

From HTML to the Web Ontology Language (OWL)
Ontology Repositories
The Protégé Ontology Editor
SPARQL Queries

14:00 Introduction to Ontology Building

Simple Guide to Building Ontologies with Protégé
Examples
Interactive Ontology Building Session

Feb 26: Simple Protege Introduction

Videos

  • When watching these videos, please bear in mind that the class has not so far introduced the specific terminology used by Protégé. Most importantly, 'class' in Sadawi's course is what we have been referring to as 'type' or 'universal', and 'property' is what we have been referring to as 'relation'. Each property has a domain and a range; for instance, the property teaches has the domain teacher and the range student. A guide (probably more than you need) is here, and there is also an introduction to the Semantic Web in the Appendix to the BFO book. If there is terminology used in Sadawi's lectures which you think needs explaining, please feel free to post a request to the class email list [1].
  • In addition to taking Sadawi's course, you should also download Protégé to your computer from here and experiment with creating simple ontologies of your own and posting them to the web. Include links on the Slack page to the ontologies you create.
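
The domain and range idea mentioned above can be sketched in a few lines of plain Python. This is an illustration only, not how Protégé or OWL work internally: the `Property` class, the `TYPES` table and the `check` function are hypothetical names invented for this sketch. Note also that an OWL reasoner uses domain and range declarations to infer the types of individuals rather than to reject statements, so the check below is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    """A relation with a declared domain (subject type) and range (object type)."""
    name: str
    domain: str
    range: str


# Hypothetical instance data: each individual is tagged with one type
# ('class' in Protege terminology).
TYPES = {"alice": "teacher", "bob": "student"}

# The example from the note above: teaches has domain teacher and range student.
teaches = Property(name="teaches", domain="teacher", range="student")


def check(prop: Property, subject: str, obj: str) -> bool:
    """Return True if subject and object have the types the property declares."""
    return TYPES.get(subject) == prop.domain and TYPES.get(obj) == prop.range


print(check(teaches, "alice", "bob"))  # a teacher teaching a student
print(check(teaches, "bob", "alice"))  # subject and object the wrong way round
```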

Mar 5: Ontology and Referent Tracking

Video
  • This video presents a set of rules and best practices for ontology creation, together with examples
  • The three videos below introduce the idea of referent tracking. Where ontologies are descriptions of types (universals and defined classes in reality), a referent tracking system provides a way of referring to and keeping track of the instances of such types. Ontologies and referent tracking systems are thus two sides of a single coin.
  • A referent tracking system (RTS) is designed not merely to keep track of what is the case in reality but also to allow us to capture what is believed to be the case in reality. It also allows us to keep track of how changes in the information system correspond to changes in the reality outside that system. We will provide an introduction to referent tracking and its implementations.
Basics of Referent Tracking (RT) Video
RT and Video Surveillance Video
RT and Data descriptions Video
  • Reading: How to track absolutely everything?. Note that each time you download this pdf file the copy you create is assigned a new referent tracking ID. This enables Dr Ceusters to keep track of the IP addresses of those who are downloading materials from his site and of which versions of these materials they are downloading at which times.
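
The two ideas in the bullets above (every instance gets its own identifier, and the system records what is believed about an instance, not only what is the case) can be sketched as follows. This is a minimal toy model, not the design of any actual referent tracking implementation: the class names, the use of UUIDs as identifiers, and the example data are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Assertion:
    """A recorded belief: someone asserted, at some time, that an instance has a type."""
    instance_id: str
    type_term: str
    asserted_by: str
    asserted_at: datetime


@dataclass
class ReferentTracker:
    """Toy tracker: hands out instance identifiers and stores assertions about them."""
    assertions: list = field(default_factory=list)

    def new_instance(self) -> str:
        # Every particular gets its own identifier, even when two particulars
        # are instances of the same type.
        return str(uuid.uuid4())

    def assert_type(self, instance_id: str, type_term: str, asserted_by: str) -> None:
        self.assertions.append(
            Assertion(instance_id, type_term, asserted_by, datetime.now(timezone.utc))
        )

    def beliefs_about(self, instance_id: str) -> list:
        return [a for a in self.assertions if a.instance_id == instance_id]


rts = ReferentTracker()

# Two downloaded copies of the same document are two distinct instances.
copy_1 = rts.new_instance()
copy_2 = rts.new_instance()
rts.assert_type(copy_1, "pdf file copy", asserted_by="download_server")
rts.assert_type(copy_2, "pdf file copy", asserted_by="download_server")

# Two observers may hold different beliefs about the same tracked instance.
tracked = rts.new_instance()
rts.assert_type(tracked, "pedestrian", asserted_by="analyst_a")
rts.assert_type(tracked, "cyclist", asserted_by="analyst_b")

for a in rts.beliefs_about(tracked):
    print(a.asserted_by, "believes the instance is a", a.type_term)
```

Keeping the assertions separate from the identifiers is what lets such a system record disagreement or later corrections without losing track of which entity is being talked about.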

Mar 12: Basic Formal Ontology

The video material for the period from March 12 through March 26 will cover Basic Formal Ontology as described in the book and in the more technical specification here.

BFO Part One: Overview of BFO
Slides


A shorter summary of the March 12-26 material is presented here:

BFO 2.0 Shorter Version Part One
BFO 2.0 Shorter Version Part Two

Mar 19: Spring Recess

Mar 26: Basic Formal Ontology (Continued)

Slides
BFO Part Two: Varieties of continuant entities
Boundaries, sites and spatial regions
Material entities occupy spatial regions
Temporal instants and temporal intervals


Apr 2: Basic Formal Ontology (Continued)

BFO Part Four: Granular Partitions
Slides
Manipulating partitions
Object partitions
Quality partitions
Color
Map layers
Process partitions
Map-based partitions of occurrent reality and the fiat entities they create
Weather
Napoleon's march to Moscow
From photography to film
Persistence in time
Partition sequences
Tossing a coin
Chess
Flying from Vienna to New York
Molecular pathways
Defining 'process profile'
Focusing on the cello part when you listen to a string quartet
Granular partitions and the Davidsonian theory of events
Quantities as Fiat Universals
Slides

Granular partitions are systems of cells which are projected on corresponding portions of reality. Maps of the territorial dioceses and archdioceses of the Catholic Church of the United States, for example, are granular partitions of a certain area of land. The former stands to the latter in the relation of refinement. Some granular partitions reflect bona fide divisions of reality, for example between mammals and reptiles. Others, for example, between normal and elevated blood pressure, represent reality in terms of fiat demarcations introduced for diagnostic or other purposes. This talk applies the theory of granular partitions to our understanding of quantities and of units of measure.
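
The refinement relation and the notion of a fiat demarcation described above can be illustrated with a small sketch. It models a partition very crudely as a mapping from cell labels to sets of atomic parcels, which is an assumption of this example rather than the formal theory; the function names, the parcel numbers and the cut-off values are likewise hypothetical, and the cut-offs are arbitrary placeholders rather than clinical thresholds.

```python
def is_refinement(fine: dict, coarse: dict) -> bool:
    """True if every cell of `fine` lies wholly inside some cell of `coarse`."""
    return all(
        any(cell <= big for big in coarse.values())
        for cell in fine.values()
    )


# Hypothetical parcels of land, numbered 1 to 6.
coarse = {"north": {1, 2, 3, 4}, "south": {5, 6}}
fine = {"north-west": {1, 2}, "north-east": {3, 4}, "south": {5, 6}}
crossing = {"middle": {4, 5}, "rest": {1, 2, 3, 6}}  # cuts across the coarse cells

print(is_refinement(fine, coarse))      # fine subdivides the coarse cells
print(is_refinement(crossing, coarse))  # 'middle' straddles north and south


def fiat_cell(systolic: float, cutoff: float) -> str:
    """Assign a reading to a cell of a two-cell fiat partition of a quantity."""
    return "elevated" if systolic >= cutoff else "normal"


# The same reading falls into different cells when the fiat boundary is moved.
print(fiat_cell(125, cutoff=130))
print(fiat_cell(125, cutoff=120))
```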

Apr 9: Environments and Emotions

Environments: Inside and Outside the Organism Slides
The talk begins with the question: What is an environment? Environments fall outside traditional philosophical classifications since they are neither things nor events. The Environment Ontology proposes a new approach to the understanding of environments grounded in the science of ecology, and considering environments within the same family as niches, habitats, and biomes. The talk concludes with a discussion of the relationship between the ontology of biological environments and the idea of environment underlying the ecological psychology of J. J. Gibson and the theory of behavior settings put forward by Roger Barker.
The Emotion Ontology Slides
The scientific study of emotions utilizes data of a wide range of different sorts, ranging from introspective and observational reports of individual emotional experiences to experimental data deriving from chemical, genetic, and neurological studies. Scientific ontologies provide a strategy for the integration of such heterogeneous data by providing formal definitions of the types of entities in the corresponding domains of reality and a controlled vocabulary in whose terms the different sorts of data can be consistently described. Heretofore, there has been little effort directed towards such formal representation for emotional phenomena, in part because of widespread debates within the affective science community on matters of definition and categorization. The Emotion Ontology is an attempt to rectify this shortfall. I will describe the ontology and show how it interoperates with ontologies in neighboring areas such as neurochemistry. I will also draw some general conclusions pertaining to classification in psychological and psychiatric domains, to the treatment of grief, and to the relation of all of the above to questions of philosophy.

Apr 16: The Ontology of Social Reality (continued)

Diagrams and Time Slides
A set of intermeshed diagrams called musical scores guides the complex series of human actions we call an orchestral performance. A set of intermeshed diagrams in a military field manual, similarly, guides the complex series of human actions which is a military operation. Musical scores and field manuals serve similarly as the basis for training of the users of such diagrams, which are able to perform their guidance functions only if their users have correspondingly intermeshed types of expertise.
Commanding and Other Social Acts Slides
We begin by distinguishing speech acts from document acts, where the latter include not only, for example, signing, stamping or filling in a paper document, but also the acts performed when, for instance, you complete and submit your tax forms using tax software. We refer to the latter as e-document acts. Planning, and especially military planning, is nowadays a matter of both paper document acts and e-document acts. Successful military planning requires that there be pre-defined types of actions which planners can incorporate into their plans. Planners must be confident that warfighters will be able to execute actions of these types in an effective way. We show how this confidence is achieved 1. through military doctrine -- which defines the relevant action types -- and 2. through military training -- which builds the warfighters who can execute them. Military plans, military doctrine and military training relate not only to the actions of individual warfighters, but also to team actions and to the sorts of team-of-teams actions that occur when entire armies are involved in military operations. It is the role of military command to make this possible -- we plan team actions by planning the individual actions of commanders at different levels in the military hierarchy. The speech acts and document acts we call military commands thus occur in the typical case as part of the execution of military plans. We conclude with a comparison of the planning, training, and commanding on the side of the military with the orchestration, rehearsal, and conducting that take place in the performance of symphonic music.
The Ontology of Terrorism Slides
Notoriously, intelligence agencies face the problem of Connecting the Dots. 'Connecting', here, means not only cross-identifying the individuals referred to in different sources, but also combining in useful ways all the data about such individuals. Ontologies allow analysts to harvest combinable information from messy inputs by providing consistent sets of terms for describing the entities involved. Suppose, for example, that ontology terms have been used to tag collections of heterogeneous source data about, say, persons in Baghdad. Analysts can then use the results to identify all available data regarding, say, persons who speak Armenian, or persons with expertise in Java programming; and they can do this independently of the type of data (text, images, audio) which served as inputs. To be effective, however, ontologies need to contain not just terms but also definitions. To illustrate how this works we will consider some simple examples of ontology building, concluding with an ontological approach to the definition of terrorism.
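
The tagging scenario in this abstract can be sketched roughly as follows. Everything in the example is invented for illustration: the `Tag` record, the relation names, the person identifiers and the source file names are hypothetical, and a real system would draw its terms from a published ontology rather than from free-text strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """One ontology-term annotation attached to a piece of source data."""
    subject: str   # identifier of the person the source is about
    relation: str  # how the subject relates to the term
    term: str      # the ontology term used as a tag
    source: str    # the input the tag was derived from
    media: str     # type of input: text, image or audio


# Hypothetical tags harvested from heterogeneous inputs.
TAGS = [
    Tag("person_017", "speaks", "Armenian language", "report_3.txt", "text"),
    Tag("person_017", "has_expertise_in", "Java programming", "cv_scan.png", "image"),
    Tag("person_042", "speaks", "Armenian language", "intercept_9.wav", "audio"),
    Tag("person_042", "speaks", "Arabic language", "report_3.txt", "text"),
    Tag("person_101", "has_expertise_in", "Java programming", "forum_post.txt", "text"),
]


def find(relation: str, term: str) -> list:
    """Identify persons by ontology term, regardless of the media type of the source."""
    return sorted({t.subject for t in TAGS if t.relation == relation and t.term == term})


def everything_about(subject: str) -> list:
    """Combine all tagged data about one cross-identified person."""
    return [t for t in TAGS if t.subject == subject]


print(find("speaks", "Armenian language"))
print(find("has_expertise_in", "Java programming"))
print([t.source for t in everything_about("person_017")])
```

Because the query matches on terms rather than on the raw inputs, a text report and an audio intercept contribute to the same result list.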

Apr 23: The Ontology of Social Reality (continued)

The Emotion Ontology
Deontology Ontology
Basic Formal Ontology provides no obvious category under which deontic entities such as claims, rights, obligations and permissions fall. The lecture provides a summary of how such entities may be treated in a way that is consistent with BFO, focusing on the case of obligations generated through acts of promising. A longer version of this material is presented here and here.
Ontology of the Organigram
An organigram is a graph-theoretic structure consisting of nodes and edges. The nodes standardly represent three sorts of entities: divisions within the organization, offices of the persons who head these divisions, and the current holders of such offices. The edges represent relations of sub- and superordination between the entities represented by the nodes. Where such a relation obtains the subordinate has obligations based upon his consent to perform certain duties as directed and controlled by the superordinate. We will evaluate the hypothesis that an organization is itself a graph-theoretic structure that is (or is capable of being) represented by an organigram.
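
As a rough illustration of the graph-theoretic reading, an organigram can be encoded as a set of typed nodes plus edges pointing from each subordinate to its superordinate. The node names, the `KIND` and `SUPERIOR` tables and the helper functions below are hypothetical, and the sketch assumes a tree-shaped hierarchy in which each node has at most one superordinate.

```python
# The three sorts of nodes named in the abstract: divisions, offices and office holders.
KIND = {
    "executive_office": "division",
    "engineering_division": "division",
    "qa_team": "division",
    "head_of_engineering": "office",
    "j_doe": "holder",
}

# Edges: subordinate -> superordinate (assumed to contain no cycles).
SUPERIOR = {
    "engineering_division": "executive_office",
    "qa_team": "engineering_division",
    "head_of_engineering": "executive_office",
    "j_doe": "head_of_engineering",
}


def chain_of_command(node: str) -> list:
    """Follow superordination edges upward from a node to the top of the organigram."""
    chain = []
    while node in SUPERIOR:
        node = SUPERIOR[node]
        chain.append(node)
    return chain


def subordinates(node: str) -> list:
    """Return the nodes directly subordinate to the given node."""
    return sorted(sub for sub, sup in SUPERIOR.items() if sup == node)


print(chain_of_command("qa_team"))
print(subordinates("executive_office"))
```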

Apr 30: Ontology of Medicine

The Ontology of Disease
A recent paper in the journal Healthcare Informatics Research identifies a paradigm shift - 'from concept representations to ontologies' - in the ways medical terminologies and vocabularies are used to describe medical data [1]. We will describe what this paradigm shift involves, what it means to talk about 'ontologies' in the medical context, and how such talk relates to the traditional concerns of philosophical ontologists. We shall conclude with an ontological definition of disease, and illustrations of how this definition can be applied to a range of clinical examples. [1] See http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3920035/
How to Build an Imaging Ontology
We will provide an introduction to the field of biomedical ontology with special reference to the field of pathology informatics. We will look at examples of existing ontologies especially the Ontology for Biomedical Investigations (OBI), the Ontology for Biological and Clinical Statistics (OBCS), and the Ontology for General Medical Science (OGMS). We will then draw lessons from these examples for an ontology of pathology imaging.
The Glory and Misery of Electronic Health Records
Starting from around 2005, national programs for the introduction of Electronic Health Records (EHRs) were launched with great enthusiasm in the US and UK. EHRs were seen as a means of increasing quality, safety and continuity of clinical care while at the same time reducing healthcare costs. I will survey the results of these, pointing out both achievements and failures. Specific topics to be addressed include: problems of data interoperability; ‘meaningful use’; the role of SNOMED CT, openEHR, and FHIR; and the prospects for secondary use of EHR data in information-driven clinical and translational research. While bioinformatics has witnessed enormous technological advances since the turn of the millennium, progress in the EHR field has been stymied by outdated approaches entrenched through ill-conceived government mandates. In the US, especially, the dominant EHR systems are expensive, difficult to use, fail to ensure even a minimal level of interoperability, and detract from patient care. I will conclude by sketching an evolutionary path towards the sort of EHR landscape that will be needed in the future, in which consistency with biomedical ontologies will play a central role.
Ethics, Informatics and Obamacare
Surveys a series of ethical, economic, clinical and also safety issues relating to the application of informatics to healthcare, focusing especially on the role of informatics in the Patient Protection and Affordable Care Act. Talk presented in the University at Buffalo Clinical/Research Ethics Seminar - Ethics, Informatics and Obamacare, November 20, 2012. Slides are available here: http://ontology.buffalo.edu/13/ethics-informatics-obamacare.pptx

May 7: Student video presentations

Background Materials

Example Ontologies

Information Artifact Ontology
Gene Ontology
OBO (Open Biomedical Ontologies) Foundry
The Environment Ontology
Ontology for General Medical Science

Text: Robert Arp, Barry Smith and Andrew Spear, Building Ontologies with Basic Formal Ontology, Cambridge, MA: MIT Press, August 2015

Further readings are provided here: http://ontology.buffalo.edu/smith/

Requirements: This course is open to all persons with an undergraduate degree and some relevant experience (for example as data scientists, information engineers, or terminology researchers).

Grading will be based on:

1. class participation (50%)
2. completion of Protégé ontology-building exercise (50%)

For policy regarding incompletes see here

For academic integrity policy see here


Student Learning Outcomes

Program Outcomes/Competencies | Instructional Method(s) | Assessment Method(s)
The student will acquire an introductory knowledge of current ontology methods in areas relating to data science and information fusion | Class lectures | Review of submitted online content and of participation in online discussion forum
The student will acquire experience in ontology development | Ontology-building exercise | Review of results in the form of Protégé file

Grading

Grading will be based on two factors: class participation and the Protégé ontology-building exercise. The former will be assessed on the basis of attendance lists, which will be circulated on both days at unannounced times.

Grades will be weighted according to the following breakdown:

Weighting Assignment

50% - class attendance
50% - completion of Protégé ontology-building exercise

Students whose presence is not recorded in both lists will receive 0% for attendance. Details of requirements for the ontology-building exercise will be provided in class.

Final Grades

Grade | Quality Points | Percentage

A  | 4.0  | 93.0% - 100.0%
A- | 3.67 | 90.0% - 92.9%
B+ | 3.33 | 87.0% - 89.9%
B  | 3.00 | 83.0% - 86.9%
B- | 2.67 | 80.0% - 82.9%
C+ | 2.33 | 77.0% - 79.9%
C  | 2.00 | 73.0% - 76.9%
C- | 1.67 | 70.0% - 72.9%
D+ | 1.33 | 67.0% - 69.9%
D  | 1.00 | 60.0% - 66.9%
F  | 0    | 59.9% or below
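
For students who want to check the arithmetic, the weighting and the grade table above can be combined as in the following sketch. It is an unofficial illustration: the function names are invented, treating the lower bound of each band as a threshold is an assumption, and so is the reading that a student recorded in both attendance lists receives the full attendance mark (the syllabus only states the 0% case).

```python
# Lower bound of each percentage band in the grade table, highest first.
GRADE_BANDS = [
    (93.0, "A"), (90.0, "A-"), (87.0, "B+"), (83.0, "B"), (80.0, "B-"),
    (77.0, "C+"), (73.0, "C"), (70.0, "C-"), (67.0, "D+"), (60.0, "D"),
]


def final_percentage(in_list_day1: bool, in_list_day2: bool, exercise_pct: float) -> float:
    """Combine attendance (50%) and the ontology-building exercise (50%)."""
    # Assumption: attendance is all-or-nothing; missing either list gives 0%.
    attendance_pct = 100.0 if (in_list_day1 and in_list_day2) else 0.0
    return 0.5 * attendance_pct + 0.5 * exercise_pct


def letter_grade(pct: float) -> str:
    """Map a final percentage to a letter grade using the band lower bounds."""
    for lower_bound, letter in GRADE_BANDS:
        if pct >= lower_bound:
            return letter
    return "F"


# Present on both days, 80% on the exercise.
print(letter_grade(final_percentage(True, True, 80.0)))
# Missed one attendance list, full marks on the exercise.
print(letter_grade(final_percentage(True, False, 100.0)))
```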

An interim grade of Incomplete (I) may be assigned if the student has not completed all requirements for the course. An interim grade of 'I' shall not be assigned to a student who did not attend the course. The default grade accompanying an interim grade of 'I' shall be 'U' and will be displayed on the UB record as 'IU.' The default Unsatisfactory (U) grade shall become the permanent course grade of record if the 'IU' is not changed through formal notice by the instructor upon the student's completion of the course.

Assignment of an interim 'IU' is at the discretion of the instructor. A grade of 'IU' can be assigned only if successful completion of unfulfilled course requirements can result in a final grade better than the default 'U' grade. The student should have a passing average in the requirements already completed. The instructor shall provide the student specification, in writing, of the requirements to be fulfilled.

The university’s Graduate Incomplete Policy can be found here.

Related Policies and Services

Academic integrity is a fundamental university value. Through the honest completion of academic work, students sustain the integrity of the university while facilitating the university's imperative for the transmission of knowledge and culture based upon the generation of new and innovative ideas. See http://grad.buffalo.edu/Academics/Policies-Procedures/Academic-Integrity.html.

Accessibility resources: If you have any disability which requires reasonable accommodations to enable you to participate in this course, please contact the Office of Accessibility Resources in 60 Capen Hall, 645-2608 and also the instructor of this course during the first week of class. The office will provide you with information and review appropriate arrangements for reasonable accommodations, which can be found on the web here.

Background Reading and Video Materials