Ontology for Data Science
Title: PHI 549 Introduction to Ontology for Data Science and Information Fusion, Spring 2019
Proposed Special Topics Course
Faculty: Barry Smith
Registration: Class# . Registration details for off-campus students are provided under Part Time/Graduate here.
Course Structure: This will be a one credit hour course intended for beginning PhD students, Masters students, and advanced undergraduate students in all departments. Assessment will be in the form of class attendance (50%), together with ontology-building exercises using the Protégé ontology editing software, to be completed within 2 weeks of the class (50%). For those enrolled in the CMIF Masters Program, completion of this course will represent 1 credit hour in the 3-hour compulsory ontology class for this program.
Course Description: An ontology is a structured collection of terms used to tag data with the goal of making data deriving from heterogeneous sources more easily searchable, comparable, combinable, or analysable. Ontologies allow information to be shared across communities with different sorts of expertise. The course will provide an introduction to ontology for students of data science and information fusion. It is co-sponsored by the Department of Philosophy and the Center for Multi-Source Information Fusion.
Schedule: The class will be taught over two consecutive days in the week before the beginning of the Spring semester.
Day 1
8:30 Registration and Coffee
9:00 Introduction to Ontology, Data Science and Information Fusion
- Data mining
- Natural language processing
- Explainable artificial intelligence
10:30 Coffee
10:45 Ontology Time-Line
- 1: 1970s: Strong AI, Robotics, PSL
- 2: 1990s: The Semantic Web, Linked Open Data
- 3: 2000s: Lessons from the Human Genome Project
- 4: 2019: Current examples of uses of ontology in data science and information fusion
11:00 Introduction to Ontology Building
- The Protégé ontology editor
- Simple examples; Toy Military Vehicle Ontology
- The Web Ontology Language (OWL)
12:00 Lunch
12:45 Ontology in Buffalo
- Basic Formal Ontology
- Common Core Ontologies (CCO)
- Industry Ontologies Foundry (IOF)
14:00 Ontology in Military and Intelligence Domains
- Joint Doctrine Ontology
- Ontology and military logistics
- Ontology and intelligence analysis
- Ontology of terrorism
- Ontology and information fusion
16:00 Close
Day 2
8:30 Coffee
9:00
10:30 Coffee
10:45
11:00
12:00 Lunch
12:45
14:00
- Functions and Capabilities (continued)
- Product Life Cycle (PLC) Ontology
- Defining ‘System’
2:00 Part Five: Services, Commodities and Infrastructure
3:00 Adjourn
- Building scientific ontologies which work together demands a common set of ontological relations
- Basic Formal Ontology: benefits of coordination
- Users of BFO
- Continuants, occurrents, realizables
- Specific dependence, generic dependence, information artifacts
- Dispositions, roles, functions
- Diseases and disorders: the Ontology of General Medical Science
- The Universal Core: Ontology and the US Federal Government Data Integration Initiative Video, Slides, Reading
- The DoD Net-Centric Data Strategy
- The Universal Core (UCore) Taxonomy and Semantic Layer
- Reasoning with OWL DL
- Managing extension ontologies
- Example: Command and Control
- Information entities
- The UCore change management process
- How UCore SL helps
Feb 26: Simple Protege Introduction
- When watching these videos please bear in mind that we have not so far introduced in class the specific terminology used by Protege. Most importantly, 'class' in Sadawi's course is what we have been referring to as 'type' or 'universal'. 'Property' is what we have been referring to as 'relation'. Each property has a domain and a range; for instance, the property teaches has the domain teacher and the range student. A guide (probably more than you need) is here and there is also an introduction to the Semantic Web in the Appendix to the BFO book. If there is terminology used in Sadawi's lectures which you think needs explaining, please feel free to post a request to the class email list [1].
- In addition to taking Sadawi's course you should also download Protege to your computer from here and experiment with creating simple ontologies of your own and posting them to the web. Include links on the slack page to the ontologies you create.
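The correspondence between 'property', 'domain' and 'range' can be illustrated with a minimal Python sketch. It is an illustration only and does not use Protege or any OWL library: the `Property` class, the `conforms` helper, and the instances `alice` and `bob` are hypothetical names introduced here. Note also that the sketch treats domain and range as checks, whereas an OWL reasoner uses them to infer the types of the subject and the object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    """A relation with a declared domain and range ('property' in Protege)."""
    name: str
    domain_type: str  # type ('class' in Protege) of the subject
    range_type: str   # type of the object


# Hypothetical instance-to-type assignments, for illustration only.
instance_types = {"alice": "Teacher", "bob": "Student"}

# The example from the text: teaches has domain Teacher and range Student.
teaches = Property(name="teaches", domain_type="Teacher", range_type="Student")


def conforms(prop: Property, subject: str, obj: str) -> bool:
    """Return True if subject and object have the types the property declares."""
    return (instance_types.get(subject) == prop.domain_type
            and instance_types.get(obj) == prop.range_type)


print(conforms(teaches, "alice", "bob"))  # True
print(conforms(teaches, "bob", "alice"))  # False
```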
Mar 5: Ontology and Referent Tracking
- This video presents a set of rules and best practices for ontology creation, together with examples
- The three videos below introduce the idea of referent tracking. Where ontologies are descriptions of types (universals and defined classes in reality), a referent tracking system provides a way of referring to and keeping track of the instances of such types. Ontologies and referent tracking systems are thus two sides of a single coin.
- A referent tracking system (RTS) is designed not merely to keep track of what is the case in reality but also to allow us to capture what is believed to be the case in reality. It also allows us to keep track of how changes in the information system correspond to changes in the reality outside that system. We will provide an introduction to referent tracking and its implementations.
- Basics of Referent Tracking (RT) Video
- RT and Video Surveillance Video
- RT and Data descriptions Video
- Reading: How to track absolutely everything? Note that each time you download this PDF file the copy you create is assigned a new referent tracking ID. This enables Dr Ceusters to keep track of the IP addresses of those who are downloading materials from his site and of which versions of these materials they are downloading at which times.
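The distinction between types and tracked instances can be sketched in a few lines of Python. This is a toy under stated assumptions, not the design of any actual referent tracking implementation: the `ReferentTracker` class, its method names, and the use of random UUIDs as instance identifiers are all hypothetical choices made for illustration. Each assertion records who made it, so that the store captures what is believed to be the case rather than only what is the case.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ReferentTracker:
    """Toy store of identifiers for particulars and assertions about them."""
    # Maps each instance identifier to a list of (type_term, author) pairs.
    assertions: dict = field(default_factory=dict)

    def register(self) -> str:
        """Assign a fresh unique identifier to one particular (instance)."""
        instance_id = str(uuid.uuid4())  # assumption: UUIDs serve as IDs
        self.assertions[instance_id] = []
        return instance_id

    def assert_instance_of(self, instance_id: str, type_term: str, author: str) -> None:
        """Record that `author` believes the particular instantiates `type_term`."""
        self.assertions[instance_id].append((type_term, author))

    def types_of(self, instance_id: str) -> set:
        """Return every type term asserted of the particular, by any author."""
        return {type_term for type_term, _ in self.assertions[instance_id]}


tracker = ReferentTracker()
patient = tracker.register()
other = tracker.register()
tracker.assert_instance_of(patient, "Person", author="clinician_1")
print(tracker.types_of(patient))  # {'Person'}
```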
Mar 12: Basic Formal Ontology
The video material for the period from March 12 through March 26 will cover Basic Formal Ontology as described in the book and in the more technical specification here.
A shorter summary of the March 12-26 material is presented here:
- Mar 19 Spring Recess
Mar 26 Basic Formal Ontology (Continued)
- BFO Part Two: Varieties of continuant entities
- Boundaries, sites and spatial regions
- Material entities occupy spatial regions
- Temporal instants and temporal intervals
April 2: Basic Formal Ontology (Continued)
- BFO Part Four: Granular Partitions
- Slides
- Manipulating partitions
- Object partitions
- Quality partitions
- Color
- Map layers
- Process partitions
- Map-based partitions of occurrent reality and the fiat entities they create
- Weather
- Napoleon's march to Moscow
- From photography to film
- Persistence in time
- Partition sequences
- Tossing a coin
- Chess
- Flying from Vienna to New York
- Molecular pathways
- Defining 'process profile'
- Focusing on the cello part when you listen to a string quartet
- Granular partitions and the Davidsonian theory of events
Granular partitions are systems of cells which are projected on corresponding portions of reality. Maps of the territorial dioceses and archdioceses of the Catholic Church of the United States, for example, are granular partitions of a certain area of land. The former stands to the latter in the relation of refinement. Some granular partitions reflect bona fide divisions of reality, for example between mammals and reptiles. Others, for example, between normal and elevated blood pressure, represent reality in terms of fiat demarcations introduced for diagnostic or other purposes. This talk applies the theory of granular partitions to our understanding of quantities and of units of measure.
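The refinement relation mentioned above can be sketched in Python under a strong simplification: cells are modeled as sets of land parcels, whereas the theory of granular partitions is mereological rather than set-theoretic. The parcel names, the two example maps, and the `refines` function are hypothetical and serve only to show the idea that every cell of the finer partition lies within some cell of the coarser one.

```python
def refines(fine, coarse):
    """Return True if every cell of `fine` lies inside some cell of `coarse`."""
    return all(any(cell <= big for big in coarse) for cell in fine)


# Hypothetical parcels of land "a".."d", partitioned at two granularities.
fine_map = [frozenset({"a"}), frozenset({"b"}), frozenset({"c", "d"})]
coarse_map = [frozenset({"a", "b"}), frozenset({"c", "d"})]

print(refines(fine_map, coarse_map))   # True
print(refines(coarse_map, fine_map))   # False
```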
Apr 9: Environments and Emotions
- Environments: Inside and Outside the Organism Slides
- The talk begins with the question: What is an environment? Environments fall outside traditional philosophical classifications since they are neither things nor events. The Environment Ontology proposes a new approach to the understanding of environments grounded in the science of ecology, and considering environments within the same family as niches, habitats, and biomes. The talk concludes with a discussion of the relationship between the ontology of biological environments and the idea of environment underlying the ecological psychology of J. J. Gibson and the theory of behavior settings put forward by Roger Barker.
- The Emotion Ontology Slides
- The scientific study of emotions utilizes data of a wide range of different sorts, ranging from introspective and observational reports of individual emotional experiences to experimental data deriving from chemical, genetic, and neurological studies. Scientific ontologies provide a strategy for the integration of such heterogeneous data by providing formal definitions of the types of entities in the corresponding domains of reality and a controlled vocabulary in whose terms the different sorts of data can be consistently described. Heretofore, there has been little effort directed towards such formal representation for emotional phenomena, in part because of widespread debates within the affective science community on matters of definition and categorization. The Emotion Ontology is an attempt to rectify this shortfall. I will describe the ontology and show how it interoperates with ontologies in neighboring areas such as neurochemistry. I will also draw some general conclusions pertaining to classification in psychological and psychiatric domains, to the treatment of grief, and to the relation of all of the above to questions of philosophy.
Apr 16: The Ontology of Social Reality (continued)
- A set of intermeshed diagrams called musical scores guides the complex series of human actions we call an orchestral performance. A set of intermeshed diagrams in a military field manual, similarly, guides the complex series of human actions which is a military operation. Musical scores and field manuals serve similarly as the basis for training of the users of such diagrams, which are able to perform their guidance functions only if their users have correspondingly intermeshed types of expertise.
- We begin by distinguishing speech acts from document acts, where the latter include not only, for example, signing, stamping, or filling in a paper document, but also the acts performed when, for instance, you complete and submit your tax forms using tax software. We refer to acts of the latter kind as e-document acts. Planning, and especially military planning, is nowadays a matter of both paper document acts and e-document acts. Successful military planning requires that there be pre-defined types of actions which planners can incorporate into their plans. Planners must be confident that warfighters will be able to execute actions of these types in an effective way. We show how this confidence is achieved 1. through military doctrine -- which defines the relevant action types -- and 2. through military training -- which builds the warfighters who can execute them. Military plans, military doctrine and military training relate not only to the actions of individual warfighters, but also to team actions and to the sorts of team-of-teams actions that occur when entire armies are involved in military operations. It is the role of military command to make this possible -- we plan team actions by planning the individual actions of commanders at different levels in the military hierarchy. The speech acts and document acts we call military commands thus occur in the typical case as part of the execution of military plans. We conclude with a comparison between the planning, training, and commanding on the side of the military and the orchestration, rehearsal, and conducting that take place in the performance of symphonic music.
- Notoriously, intelligence agencies face the problem of Connecting the Dots. 'Connecting', here, means not only cross-identifying the individuals referred to in different sources, but also combining in useful ways all the data about such individuals. Ontologies allow analysts to harvest combinable information from messy inputs by providing consistent sets of terms for describing the entities involved. Suppose, for example, that ontology terms have been used to tag collections of heterogeneous source data about, say, persons in Baghdad. Analysts can then use the results to identify all available data regarding, say, persons who speak Armenian, or persons with expertise in Java programming; and they can do this independently of the type of data (text, images, audio) which served as inputs. To be effective, however, ontologies need to contain not just terms but also definitions. To illustrate how this works we will consider some simple examples of ontology building, concluding with an ontological approach to the definition of terrorism.
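The tagging scenario in the last abstract can be sketched as follows. The sketch is a hypothetical illustration, not a description of any deployed system: the `TaggedItem` record, the source identifiers, and the term names such as `SpeaksArmenian` are invented for the example. The point it shows is that a query over ontology terms returns matching items regardless of whether the underlying source was text, image, or audio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedItem:
    """One piece of source data annotated with ontology terms."""
    source_id: str
    media_type: str   # "text", "image" or "audio"
    tags: frozenset   # ontology terms used to tag the item


# Hypothetical tagged source data, for illustration only.
items = [
    TaggedItem("report-17", "text", frozenset({"Person", "SpeaksArmenian"})),
    TaggedItem("photo-03", "image", frozenset({"Person", "JavaProgrammingExpertise"})),
    TaggedItem("call-42", "audio", frozenset({"Person", "SpeaksArmenian"})),
]


def find(tagged_items, required_terms):
    """Return the IDs of items carrying all required terms, whatever their media type."""
    required = frozenset(required_terms)
    return [item.source_id for item in tagged_items if required <= item.tags]


print(find(items, {"SpeaksArmenian"}))  # ['report-17', 'call-42']
```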
Apr 23: The Ontology of Social Reality (continued)
- Basic Formal Ontology provides no obvious category under which deontic entities such as claims, rights, obligations, and permissions fall. The lecture provides a summary of how such entities may be treated in a way that is consistent with BFO, focusing on the case of obligations generated through acts of promising. A longer version of this material is presented here and here.
- An organigram is a graph-theoretic structure consisting of nodes and edges. The nodes standardly represent three sorts of entities: divisions within the organization, offices of the persons who head these divisions, and the current holders of such offices. The edges represent relations of sub- and superordination between the entities represented by the nodes. Where such a relation obtains the subordinate has obligations based upon his consent to perform certain duties as directed and controlled by the superordinate. We will evaluate the hypothesis that an organization is itself a graph-theoretic structure that is (or is capable of being) represented by an organigram.
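The graph-theoretic reading of an organigram can be sketched in Python. The node names and the `chain_of_command` helper are hypothetical, and the sketch assumes the simplest case, in which each node has at most one superordinate and the graph contains no cycles.

```python
# Edges of a hypothetical organigram, stored as subordinate -> superordinate.
reports_to = {
    "Sales Division": "Executive Office",
    "Research Division": "Executive Office",
    "Field Sales Team": "Sales Division",
}


def chain_of_command(node):
    """Follow superordination edges upward from `node` to the top of the graph."""
    chain = []
    while node in reports_to:  # terminates because the graph is assumed acyclic
        node = reports_to[node]
        chain.append(node)
    return chain


print(chain_of_command("Field Sales Team"))  # ['Sales Division', 'Executive Office']
```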
Apr 30: Ontology of Medicine
- A recent paper in the journal Healthcare Informatics Research identifies a paradigm shift - 'from concept representations to ontologies' - in the ways medical terminologies and vocabularies are used to describe medical data [1]. We will describe what this paradigm shift involves, what it means to talk about 'ontologies' in the medical context, and how such talk relates to the traditional concerns of philosophical ontologists. We shall conclude with an ontological definition of disease, and illustrations of how this definition can be applied to a range of clinical examples. [1] See http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3920035/
- We will provide an introduction to the field of biomedical ontology with special reference to the field of pathology informatics. We will look at examples of existing ontologies especially the Ontology for Biomedical Investigations (OBI), the Ontology for Biological and Clinical Statistics (OBCS), and the Ontology for General Medical Science (OGMS). We will then draw lessons from these examples for an ontology of pathology imaging.
- Starting from around 2005, national programs for the introduction of Electronic Health Records (EHRs) were launched with great enthusiasm in the US and UK. EHRs were seen as a means of increasing quality, safety and continuity of clinical care while at the same time reducing healthcare costs. I will survey the results of these programs, pointing out both achievements and failures. Specific topics to be addressed include: problems of data interoperability; ‘meaningful use’; the role of SNOMED CT, openEHR, and FHIR; and the prospects for secondary use of EHR data in information-driven clinical and translational research. While bioinformatics has witnessed enormous technological advances since the turn of the millennium, progress in the EHR field has been stymied by outdated approaches entrenched through ill-conceived government mandates. In the US, especially, the dominant EHR systems are expensive, difficult to use, fail to ensure even a minimal level of interoperability, and detract from patient care. I will conclude by sketching an evolutionary path towards the sort of EHR landscape that will be needed in the future, in which consistency with biomedical ontologies will play a central role.
- Surveys a series of ethical, economic, clinical and also safety issues relating to the application of informatics to healthcare, focusing especially on the role of informatics in the Patient Protection and Affordable Care Act. Talk presented in the University at Buffalo Clinical/Research Ethics Seminar - Ethics, Informatics and Obamacare, November 20, 2012. Slides are available here: http://ontology.buffalo.edu/13/ethics-informatics-obamacare.pptx
May 7 Student video presentations
Background Materials
Example Ontologies
Text: Robert Arp, Barry Smith and Andrew Spear, Building Ontologies with Basic Formal Ontology, Cambridge, MA: MIT Press, August 2015
Further readings are provided here: http://ontology.buffalo.edu/smith/
Requirements: This course is open to all persons with an undergraduate degree and some relevant experience (for example as data scientists, information engineers, or terminology researchers). No prior knowledge of ontology is required. In order to receive a grade and course credit, students will be required to have reviewed in a timely manner all provided videos and any accompanying recommended reading. Grading will be on the basis of contributions to the on-line class discussion forum and on the quality and content of a 20-minute YouTube video (with accompanying essay and PowerPoint slide deck) on some topic in the field of applied ontology. Each student will be required to create one such video for presentation in the final class session on May 8. Examples of student videos created in comparable classes in the past are available here and here.
- Your video should be 20 minutes long; it will be graded on the basis of clarity and force of argument, interestingness of content, and quality of delivery.
- The video should be based on a PowerPoint presentation of approximately 20 slides. The slides should provide a minimal amount of text (using 30 point font or above), together with accompanying graphics, for example charts representing data. You should not read the slides -- rather, you should use the slides as summaries of the successive points you want to make, and present these points ex tempore.
- The video should be accompanied by an essay presenting the points you make and providing literature references, the whole amounting to at least 3000 words. A short draft of your essay should be submitted to Dr Smith by March 31 at the latest.
Class participants should communicate by email with Dr Smith to determine the topic and scope of their video presentation and accompanying materials.
Grading will be based on:
- 1. forum participation (25%)
- 2. 20-minute YouTube video (25%)
- 3. associated PowerPoint slides (25%)
- 4. associated essay (25%)
For policy regarding incompletes see here
For academic integrity policy see here