ClassificationByMarkerSpec
From NCOR Wiki
Latest revision as of 06:18, 12 September 2013
Goal
Write a protocol for recording of CYTOF experimental results and a method for determining cell type based on recorded markers.
Status
This is an early draft. In particular, since writing this I have come to the conclusion that we should be using more information in the determination of cell type, and that the information necessary for the determination should reside in both the Cell Ontology and the Ontology for Biomedical Investigations. See "Motivation for ontological representation of cell type identification assays for flow cytometry and CyTOF", presented at the ImmPort_Ontology_Conference.
Draft
Protocol for classifying and recording cell type/populations from CYTOF experiments
Alan Ruttenberg
Assumptions:
===========
1. Clustering/gating of CYTOF results will be done by the submitter. If
this assumption is incorrect we should describe the clustering in a
separate protocol (protocol-clustering)
2. Targeted proteins will be clearly identified. Where an antibody is
used as surrogate, and if the submitter wishes to use the antibody to
identify the protein, the antibody must be identified by an Immport
antibody registry identifier. If the antibody is not yet present in
the Immport antibody registry, it needs to be submitted and
curated according to (protocol-antibody)
3. Information specifying a protein target will in most cases be
unambiguous. However, it may be determined to be ambiguous, for
example when a protein is specified as CD8 (or an antibody's target is
so specified), which leaves open whether CD8a or CD8b is targeted, or
when a protein has isoforms and the isoform is not specified. In
such cases the submitter should be queried as to whether this was
intended. The submitter can either fix the ambiguity, explain how we
should handle it in the analysis (e.g. perform analysis for each
possibility, ignore differences in isoforms), or remove the protein
target from the submission.
How to read this document
=========================
(type:xx) informally indicates a data type, which may be referred to in
different parts of this protocol. The data types will be formalized in
a subsequent revision.
For each data type, elements are enumerated by a common letter and ascending number
p1.
p2.
...
g1.
...
Inputs:
======
CYTOF results data for which cell type needs to be determined or
validated. Note that not all CYTOF experiments are of this nature. Of
the experiments reported at http://www.cytobank.org/nolanlab/reports/,
for example, one characterized phases of the cell cycle using known
markers for the cell types probed, and one validated a surrogate
CYTOF-based marker for cell death. Such experiments are outside the
scope of this protocol.
Input data should consist of sets of proteins that were probed in the
experiment and which are grouped together as representative of a cell
type or state, analogous to gating information for flow cytometry
experiments.
--
type: cytof-experiment-protein-set
The first part of the input should be a complete listing of the
proteins used in the experiment
s1. Name or identifier of this protein set
--
type: cytof-experiment-protein-readout
For each of the proteins probed in the set named in s1 the following
information should be provided:
p1. Name and/or identifiers of the target protein. Where more than one
identifier or more than one name is known to the investigator as a
common way to identify the protein, please include these.
p2. If a modified form of the protein is targeted (e.g. phosphorylated
at a specific site) then include a description of the targeted
form. If the protein is known not to be modified in certain ways, then
make this knowledge explicit (e.g. report 'not phosphorylated at
S235')
p3. If the readout is of an antibody to the protein, then the clone
name, manufacturer (if commercial) or other source (if not), and
catalog number (if manufacturer) or other identifier.
p4. If the readout is of some other marker, information that specifies
that marker.
p5. If the researcher expects that the marker may be ambiguous
(e.g. it targets more than one isoform, or different gene products),
please include this information.
p6. A local synonym(handle) for the protein, for the purposes of easily
referring to it elsewhere during the protocol.
p7. In the case that a PRO identifier for the protein is known or has
been obtained through Immport resources, it can be substituted for p1
and p2.
p8. In the case that the antibody or marker has been registered and
curated into an Immport resource, its Immport-chosen identifier can
be used in place of p1-p7. However, inclusion of redundant information
can serve as a quality check and so the submitter is encouraged to
include all fields if feasible.
--
type: cytof-cluster-protein-markers
Each set of proteins that are grouped together as representative of a
cell type or state should have:
g1. a name for the group
g2. The name of the complete set of proteins probed as named in s1
g3. Whether or not proteins in s1 but not listed as members of this cluster
should be considered as not having detectable levels ('assume-absent')
or whether only the listed proteins should be considered part of the
group ('assumed-complete').
--
type: cytof-cluster-protein-marker-level
The following information for each protein in a
cytof-cluster-protein-markers should be provided.
ml1. The chosen local synonym/handle for the protein (p6), or PRO identifier (p7)
ml2. Whether the targeted protein was present or the amount was below
detectable levels. ('present'/'+' or 'absent'/'-')
ml3. If the targeted protein is present and quantitated, then the
quantity of the protein and the associated unit (give unit name). If
the measurement is relative to an arbitrary standard, enter the unit
as 'experiment unit'.
ml4. If available, a qualitative assessment of the level of the
protein, one of:
'high' (synonym: 'bright')
'low' (synonym: 'dim')
Question: Do we need 'mid', as in "CD38mid"?
If the submitter does not want to use qualitative level in the query, instead write
'ignore'
If the submitter wants the qualitative assessment computed, write
'computed' (Question: do we want to allow this?)
Notes:
If protein levels are quantitated in ml3, and ml4 is 'computed', then we
need to design (protocol-assign-qualitative-marker-level)
If ml4 is 'ignore' then only 'absent'/'present' information will be used
--
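The four informal data types above can be sketched as record types. This is a minimal illustration in Python, not a formalization: the class and attribute names are assumptions introduced here, and only the field codes in the comments (s1, p1-p8, g1-g3, ml1-ml4) come from this protocol.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProteinReadout:
    """cytof-experiment-protein-readout (fields p1-p8)."""
    names: List[str] = field(default_factory=list)  # p1: names and/or identifiers
    modified_form: Optional[str] = None   # p2: e.g. 'not phosphorylated at S235'
    antibody: Optional[str] = None        # p3: clone name, source, catalog number
    other_marker: Optional[str] = None    # p4: any non-antibody marker
    ambiguity_note: Optional[str] = None  # p5: expected ambiguity, if any
    handle: Optional[str] = None          # p6: local synonym
    pro_id: Optional[str] = None          # p7: may substitute for p1 and p2
    immport_id: Optional[str] = None      # p8: may be used in place of p1-p7


@dataclass
class ProteinSet:
    """cytof-experiment-protein-set."""
    name: str                             # s1
    readouts: List[ProteinReadout] = field(default_factory=list)


@dataclass
class MarkerLevel:
    """cytof-cluster-protein-marker-level (fields ml1-ml4)."""
    protein: str                          # ml1: handle (p6) or PRO identifier (p7)
    presence: str                         # ml2: 'present'/'+' or 'absent'/'-'
    quantity: Optional[float] = None      # ml3: measured amount
    unit: Optional[str] = None            # ml3: unit name, or 'experiment unit'
    qualitative: Optional[str] = None     # ml4: 'high', 'low', 'ignore', 'computed'


@dataclass
class ClusterMarkers:
    """cytof-cluster-protein-markers (fields g1-g3)."""
    name: str                             # g1
    protein_set: str                      # g2: the set name given in s1
    completeness: str                     # g3: 'assume-absent' or 'assumed-complete'
    levels: List[MarkerLevel] = field(default_factory=list)
```

For instance, ClusterMarkers(name="cluster-A", protein_set="panel-1", completeness="assumed-complete", levels=[MarkerLevel(protein="cd4", presence="present", qualitative="high")]) would describe a hypothetical cluster with a single bright marker.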
Validation
==========
Ensure required combinations of fields are present in input
Ensure that any antibodies specified in place of proteins are
registered in Immport antibody registry
Ensure that if any cytof-cluster-protein-marker-level ml4 is
'computed', then all cytof-cluster-protein-marker-level ml3 must be
quantitated
Ensure that all cytof-cluster-protein-marker-level ml3 units are either
'experiment unit' or all are units that are convertible with one
another.
Ensure that for any cytof-cluster-protein-marker-level where ml2 is
'absent', ml3 and ml4 are empty
Question: Are there more validations? (Alan guesses yes)
If any validation fails, return the submission to investigator with
report.
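Three of the checks above (the 'computed' rule, the unit rule, and the empty-when-absent rule) can be sketched as follows. This is an illustration under stated assumptions, not the validation procedure itself: each marker level is a plain dict keyed by the field codes, ml3 is a hypothetical (quantity, unit) pair, UNIT_DIMENSION is an invented stand-in for a real unit-conversion resource, and "all ml3 must be quantitated" is read as applying to markers that are present.

```python
from typing import Dict, List

# Hypothetical unit table: units sharing a dimension are treated as
# convertible with one another. Not part of the protocol.
UNIT_DIMENSION = {"pg": "mass", "ng": "mass", "ug": "mass",
                  "nM": "concentration", "uM": "concentration"}

ABSENT = {"absent", "-"}


def validate_levels(levels: List[Dict]) -> List[str]:
    """Return the problems found in one cluster's marker levels."""
    problems = []
    any_computed = any(lv.get("ml4") == "computed" for lv in levels)
    units = []
    for lv in levels:
        name, ml2 = lv.get("ml1"), lv.get("ml2")
        ml3, ml4 = lv.get("ml3"), lv.get("ml4")
        if ml2 in ABSENT:
            # Absent markers must not carry a quantity or qualitative level.
            if ml3 is not None or ml4 is not None:
                problems.append(f"{name}: ml2 is absent, so ml3 and ml4 must be empty")
            continue
        if any_computed and ml3 is None:
            problems.append(f"{name}: ml4 'computed' is in use, so ml3 must be quantitated")
        if ml3 is not None:
            units.append(ml3[1])
    # Units must be all 'experiment unit' or all of one convertible dimension.
    if units and not all(u == "experiment unit" for u in units):
        dimensions = {UNIT_DIMENSION.get(u) for u in units}
        if None in dimensions or len(dimensions) > 1:
            problems.append("ml3 units are neither all 'experiment unit' "
                            "nor convertible with one another")
    return problems
```

An empty list means these three checks passed; a non-empty list is the material for the report returned to the investigator. The remaining checks (required field combinations, antibody registration) need registry access and are not covered here.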
== TODO: Provide an example of valid input.==
Processing each protein target in cytof-experiment-protein-set
==============================================================
1. For proteins specified using an antibody in the Immport antibody
registry, retrieve the associated PRO ID.
2. For proteins that are not specified by a PRO ID, first
a. Attempt to look up the PRO entry given the information, or if that fails
b. Submit a term request for the protein to Alex Diehl for submission to the PRO team
3. If the result of 2 is that the PRO term is ambiguous, contact
the submitter for instructions on how to handle the ambiguity.
4. When all proteins have PRO IDs we can proceed to the processing of
the cytof-cluster-protein-markers
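The decision flow in steps 1-3 can be sketched as a single function. Everything named here is hypothetical: the dict keys, the antibody_registry mapping (antibody identifier to PRO ID), and the lookup_pro callable (returning candidate PRO IDs) stand in for the Immport antibody registry and a PRO lookup service, neither of which is specified in this draft.

```python
from typing import Callable, Dict, List, Optional, Tuple


def resolve_pro_id(
    readout: Dict,
    antibody_registry: Dict[str, str],
    lookup_pro: Callable[[Dict], List[str]],
) -> Tuple[Optional[str], str]:
    """Return (pro_id, action) for one protein readout."""
    if readout.get("pro_id"):
        return readout["pro_id"], "given"
    antibody = readout.get("immport_antibody_id")
    if antibody is not None and antibody in antibody_registry:
        # Step 1: the registry entry carries the associated PRO ID.
        return antibody_registry[antibody], "from-antibody-registry"
    candidates = lookup_pro(readout)            # step 2a
    if len(candidates) == 1:
        return candidates[0], "looked-up"
    if not candidates:
        return None, "request-new-term"         # step 2b
    return None, "ask-submitter"                # step 3: ambiguous result
```

Step 4 then amounts to checking that every readout resolved to a PRO ID before the clusters are processed.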
Processing of each cytof-cluster-protein-markers
================================================
type: cl-triple-store
Required: A triple store providing SPARQL query answering over the
fully reasoned CL.
sqa1: The SPARQL endpoint for the triple store
Assumes: All protein targets have PRO IDs
1. If any ml4 is 'computed', use protocol-assign-qualitative-marker-level to assign qualitative levels
2. Construct a class query based on all the cytof-cluster-protein-marker-level entries in the cytof-cluster-protein-markers. This query will be a conjunction of terms. Below we list the
forms of the terms using OWL2 functional syntax (http://www.w3.org/TR/owl2-syntax/)
The relations we will use are:
<lacks_part> <http://purl.obolibrary.org/obo/cl#lacks_part>
<has_high_plasma_membrane_amount> <http://purl.obolibrary.org/obo/cl#has_high_plasma_membrane_amount>
<has_low_plasma_membrane_amount> <http://purl.obolibrary.org/obo/cl#has_low_plasma_membrane_amount>
<has_plasma_membrane_part> <http://purl.obolibrary.org/obo/RO_0002104>
(note: some of these relations have legacy URLs - check/fix)
If ml2 is 'absent' or '-'
then the clause is ObjectSomeValuesFrom(<lacks_part> <PRO_ID>)
If ml2 is 'present' or '+' and ml4 is 'ignore'
then the clause is ObjectSomeValuesFrom(<has_plasma_membrane_part> <PRO_ID>)
If ml2 is 'present' or '+' and ml4 is 'high' or 'bright'
then the clause is ObjectSomeValuesFrom(<has_high_plasma_membrane_amount> <PRO_ID>)
If ml2 is 'present' or '+' and ml4 is 'low' or 'dim'
then the clause is ObjectSomeValuesFrom(<has_low_plasma_membrane_amount> <PRO_ID>)
If g3 is 'assume-absent' then for each protein (ap) identified in
cytof-experiment-protein-set that is not listed in a
cytof-cluster-protein-marker-level of this cluster, add the clause
ObjectSomeValuesFrom(<lacks_part> <PRO_ID of ap>)
Join the terms above using ObjectIntersectionOf, giving an expression E1.
The class (ObjectIntersectionOf <cell> <E1>) defines a class of cells
(C1) with the markers as specified.
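The clause rules above can be sketched as a small string builder that emits OWL2 functional syntax. This is an illustration only: the function names and the dict keys ("pro", "ml2", "ml4") are assumptions, a missing ml4 is treated like 'ignore' (the rules do not cover that case), and the cell class and clauses are placed in one flat intersection, which is logically equivalent to nesting E1 inside a second ObjectIntersectionOf.

```python
from typing import Dict, Iterable, List, Optional

OBO = "http://purl.obolibrary.org/obo/"
LACKS_PART = f"<{OBO}cl#lacks_part>"
HAS_HIGH = f"<{OBO}cl#has_high_plasma_membrane_amount>"
HAS_LOW = f"<{OBO}cl#has_low_plasma_membrane_amount>"
HAS_PART = f"<{OBO}RO_0002104>"
CELL = f"<{OBO}CL_0000000>"


def clause(pro_iri: str, ml2: str, ml4: Optional[str] = None) -> str:
    """Render one marker level as an ObjectSomeValuesFrom clause."""
    if ml2 in ("absent", "-"):
        relation = LACKS_PART
    elif ml4 in ("high", "bright"):
        relation = HAS_HIGH
    elif ml4 in ("low", "dim"):
        relation = HAS_LOW
    elif ml4 in ("ignore", None):   # None handled like 'ignore' (assumption)
        relation = HAS_PART
    else:
        # 'computed' should already have been replaced by a qualitative level.
        raise ValueError(f"unresolved ml4 value: {ml4!r}")
    return f"ObjectSomeValuesFrom({relation} <{pro_iri}>)"


def class_expression(levels: List[Dict], all_pro_iris: Iterable[str],
                     g3: str) -> str:
    """Build the expression for C1 from one cluster's marker levels."""
    clauses = [clause(lv["pro"], lv["ml2"], lv.get("ml4")) for lv in levels]
    if g3 == "assume-absent":
        listed = {lv["pro"] for lv in levels}
        # Proteins probed but not listed for this cluster are asserted absent.
        clauses += [clause(p, "absent") for p in all_pro_iris if p not in listed]
    return f"ObjectIntersectionOf({CELL} {' '.join(clauses)})"
```

Given a cluster in which CD4 (PR_P01730) is 'high' and a probed set that also contains CD8a (PR_000025405), calling class_expression with g3 set to 'assume-absent' would add a lacks_part clause for CD8a, while 'assumed-complete' would leave it out.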
3. Using the expression for C1 above we will construct 2 SPARQL queries.
- A query Q1 for the most specific of the more general class types - this will yield the immediate superclasses of C1
- A query Q2 for the most general of the more specific class types - this will yield the immediate subclasses of C1
To construct Q1, Q2, first render the expression for C1 as RDF triples
<R>. This can be accomplished with a call to the OWLAPI, or equivalent
(see below, LSW2 code)
Retrieve the node (S) that is the class defined by the expression.
For the subclasses query, construct the SPARQL query
SELECT ?subclasses where { ?subclasses rdfs:subClassOf S . <R> }
For the superclasses query, construct the SPARQL query
SELECT ?superclasses where { S rdfs:subClassOf ?superclasses . <R> }
Here is an example of this transformation using LSW2, followed by the SPARQL query that it renders.
(sparql-stringify ;; render a sparql query from sexp
 `(:select
   (?subclasses) ;; The variable for which we want solutions
   ()
   ,@(let ((p1 !<http://purl.obolibrary.org/obo/PR_P01730>) ;; CD4 human
           (p2 !<http://purl.obolibrary.org/obo/PR_000025405>) ;; CD8a human
           (has-high !<http://purl.obolibrary.org/obo/cl#has_high_plasma_membrane_amount>)
           (cell !<http://purl.obolibrary.org/obo/CL_0000000>)) ;; Cell
       (let ((translated
              (butlast
               (t-collect
                `(ontology (object-intersection-of
                            ,cell
                            (object-some-values-from ,has-high ,p1)
                            (object-some-values-from ,has-high ,p2)))))))
         (let ((class-defined (first (first translated)))) ;; The blank node representing the defined class
           `((?subclasses ,!rdfs:subClassOf ,class-defined) ;; The additional triple
             ,@translated)))))) ;; The rest of the RDF triples
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?subclasses
WHERE {
?subclasses rdfs:subClassOf _:b2 .
_:b2 owl:intersectionOf _:b3 .
_:b3 rdf:rest _:b4 .
_:b4 rdf:rest _:b6 .
_:b6 rdf:rest rdf:nil .
_:b6 rdf:first _:b7 .
_:b7 owl:someValuesFrom obo:PR_000025405 .
_:b7 owl:onProperty <http://purl.obolibrary.org/obo/cl#has_high_plasma_membrane_amount> .
_:b7 rdf:type owl:Restriction .
_:b4 rdf:first _:b5 .
_:b5 owl:someValuesFrom obo:PR_P01730 .
_:b5 owl:onProperty <http://purl.obolibrary.org/obo/cl#has_high_plasma_membrane_amount> .
_:b5 rdf:type owl:Restriction .
_:b3 rdf:first obo:CL_0000000 .
_:b2 rdf:type owl:Class . }
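For readers without LSW2, the same rendering can be sketched in Python: serialize an intersection of someValuesFrom restrictions as blank-node triple patterns (R), then wrap them in the subclass or superclass query around the root node (S). This is a hand-rolled illustration, not the OWLAPI or LSW2 translation; the function names are assumptions, the blank node labels will differ from the output above, and whether a given triple store answers such a pattern depends on how the reasoned CL was loaded.

```python
from itertools import count
from typing import Iterable, List, Tuple

OBO = "http://purl.obolibrary.org/obo/"
CELL = OBO + "CL_0000000"
HAS_HIGH = OBO + "cl#has_high_plasma_membrane_amount"

PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)


def intersection_triples(cell_iri: str,
                         restrictions: Iterable[Tuple[str, str]]
                         ) -> Tuple[str, List[str]]:
    """Return (S, R): the root blank node and the triple patterns for C1."""
    fresh = (f"_:b{i}" for i in count(1))
    root = next(fresh)
    triples = [f"{root} rdf:type owl:Class ."]
    members = [f"<{cell_iri}>"]
    for prop, filler in restrictions:
        node = next(fresh)
        members.append(node)
        triples += [f"{node} rdf:type owl:Restriction .",
                    f"{node} owl:onProperty <{prop}> .",
                    f"{node} owl:someValuesFrom <{filler}> ."]
    # The intersection operands are held in an RDF list (rdf:first/rdf:rest).
    cells = [next(fresh) for _ in members]
    triples.append(f"{root} owl:intersectionOf {cells[0]} .")
    for i, (list_node, member) in enumerate(zip(cells, members)):
        rest = cells[i + 1] if i + 1 < len(cells) else "rdf:nil"
        triples.append(f"{list_node} rdf:first {member} .")
        triples.append(f"{list_node} rdf:rest {rest} .")
    return root, triples


def build_query(direction: str, root: str, triples: List[str]) -> str:
    """direction is 'sub' (Q2, subclasses) or 'super' (Q1, superclasses)."""
    if direction == "sub":
        var, link = "?subclasses", f"?subclasses rdfs:subClassOf {root} ."
    else:
        var, link = "?superclasses", f"{root} rdfs:subClassOf ?superclasses ."
    body = "\n  ".join([link] + triples)
    return f"{PREFIXES}SELECT {var}\nWHERE {{\n  {body}\n}}"
```

Calling intersection_triples(CELL, [(HAS_HIGH, OBO + "PR_P01730"), (HAS_HIGH, OBO + "PR_000025405")]) and passing the result to build_query("sub", ...) gives a query with the same shape as the one above; build_query("super", ...) gives its Q1 counterpart.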
== TO BE DONE: Add clauses to get just the immediate subs and supers ==
Construct a report:
For each cytof-cluster-protein-markers give its name, g1
Show the logical definition of the class defined for the query of this cluster (C1)
List the immediate superclasses, with their URIs, their logical definitions, and their textual definitions.
List the same for the immediate subclasses
We can predict a number of cases, but at this point in development we
should present this report to the submitter for review and comment.
Some cases we can imagine:
Result: The immediate superclasses and subclasses yield a single class.
Interpretation: Direct hit - the population measured is precisely C1
Result: The superclass expression(s) only differ from C1 by including
clauses that reflect absences of protein.
Interpretation: Here a judgement may need to be made as to whether the
absences reflect knowledge missing at the time the CL term was
curated, or whether they might define a distinct population.
Result: The subclass expression(s) only differ from C1 by the presence
of clauses that reflect properties or capabilities not measured in
this experiment.
Interpretation: Here a judgement may need to be made as to whether
the extra conditions in the subclass expressions are also true of
the cell defined by the cluster, or whether they might be a distinct
population.
Result: The subclasses and superclasses look odd/incorrect.
Interpretation: The cluster might represent a population including different cell types.
================================================================
Our next step is review of several experiments and scrutiny of the
generated reports to gain experience with the protocol, with the aim
of removing or shortening the review step as much as possible.