JSON Export Format
The JSON export command is documented on the main Canto site.
A technical specification of the JSON export file (as a JSON Schema) is available at etc/export.schema.json on the Canto repository.
The output has this structure:
{
"curation_sessions" : {
"<session_key>": { see note below
"genes" : { see note below
"<organism_name_and_gene_uniquename>" : { see note below
"organism" : "Drosophila melanogaster",
"uniquename" : "FBgn0040505"
},
"<another_organism_name_and_gene_uniquename>" : {
...
}
},
"alleles" : { see note below
"<unique_allele_id>": { see note below
"allele_type" : "<allele_type>",
"gene" : "<organism_name_and_gene_uniquename>", see note below
"name" : "<gene_name>",
"primary_identifier" : "<unique_allele_id>",
"synonyms" : []
},
"<another_unique_allele_id>" : {
...
}
},
"genotypes" : { see note below
"<unique_genotype_id>": { see note below
"loci": [
[
{
"id" : "FBal0157358"
},
{
"id" : "FBal0157359"
}
]
],
"organism_taxonid" : 7227
},
"<another_unique_genotype_id>": {
...
}
},
"metagenotypes": { optional - see note below
"<a_unique_metagenotype_id>": {
"type": "pathogen-host",
"host_genotype": "<unique_genotype_id>",
"pathogen_genotype": "<another_unique_genotype_id>"
}
},
"annotations" : [ see note below
{
"conditions" : [],
"creation_date" : "2019-06-11",
"curator" : {
"community_curated" : false,
"email" : "[email protected]",
"name" : "Kim Rutherford"
},
"evidence_code" : "",
"extension" : [],
"genotype" : "<unique_genotype_id>",
"publication" : "<some_pubmed_id>",
"status" : "new",
"submitter_comment" : null,
"term" : "<some_term_id>",
"type" : "phenotype",
"with_gene_id" : null
},
{
...
}
],
"metadata": { see note below
"accepted_timestamp" : "2019-05-31 10:00:22",
"annotation_status" : "APPROVED",
"annotation_status_datestamp" : "2019-06-12 10:48:22",
"approval_in_progress_timestamp" : "2019-06-12 10:47:39",
"approved_timestamp" : "2019-06-12 10:48:22",
"approver_email" : "[email protected]",
"approver_name" : "Admin Person",
"canto_session" : "4e77f8cbed7cd6c6",
"curation_accepted_date" : "2019-05-31 10:00:22",
"curation_pub_id" : "PMID:17285636",
"curator_email" : "[email protected]",
"curator_name" : "Kim Rutherford",
"curator_role" : "FlyBase-test",
"first_approved_timestamp" : "2019-06-12 10:48:22",
"needs_approval_timestamp" : "2019-06-12 10:47:19",
"session_created_timestamp" : "2019-05-27 05:53:39",
"session_first_submitted_timestamp" : "2019-06-12 10:47:19",
"session_genes_count" : "4",
"session_term_suggestions_count" : "0",
"session_unknown_conditions_count" : "0",
"term_suggestion_count" : "0",
"unknown_conditions_count" : "0"
},
"organisms" : {
"7227" : {
"full_name" : "Drosophila melanogaster"
}
},
"publications": { see note below
"<some_pubmed_id>" : {}
}
},
"<some_other_session_key>": {
...
},
...
}
}
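As a minimal sketch of walking this structure in Python, the code below counts annotation types per session. The inline SAMPLE is a hypothetical, heavily trimmed export used only for illustration; the function name annotation_type_counts and the file name export.json are assumptions, not part of Canto.

```python
import json
import re
from collections import Counter

# Hypothetical, heavily trimmed export with one session (illustration only).
SAMPLE = """
{
  "curation_sessions": {
    "4e77f8cbed7cd6c6": {
      "annotations": [
        {"type": "phenotype"},
        {"type": "phenotype"},
        {"type": "molecular_function"}
      ]
    }
  }
}
"""

# Session keys are described as 16 character hexadecimal IDs.
SESSION_KEY_RE = re.compile(r"[0-9a-fA-F]{16}")


def annotation_type_counts(export):
    """Return {session_key: Counter of annotation types} for every session."""
    counts = {}
    for session_key, session in export["curation_sessions"].items():
        if not SESSION_KEY_RE.fullmatch(session_key):
            raise ValueError(f"unexpected session key: {session_key!r}")
        # Treat a session without an annotations list as having none.
        annotations = session.get("annotations", [])
        counts[session_key] = Counter(a["type"] for a in annotations)
    return counts


if __name__ == "__main__":
    # For a real export, load the file instead (hypothetical file name):
    # with open("export.json") as fh: export = json.load(fh)
    export = json.loads(SAMPLE)
    for key, type_counts in annotation_type_counts(export).items():
        print(key, dict(type_counts))
```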
The Canto session key is a unique 16-character hexadecimal ID for the session.
The <organism_name_and_gene_uniquename> keys are used in the alleles section to refer to genes in the genes section. The current format (organism name + gene uniquename) may change in future.
Each gene has two fields:
- organism: the genus + species (+ optional strain)
- uniquename: the gene primary ID
The <unique_allele_id> keys are used to refer uniquely to alleles in the alleles section from the genotypes section. If the allele was loaded from an external JSON file (see JSON import file), this ID will be the imported ID. Alleles that are added in a Canto session are assigned a unique ID for use when exporting.
Every allele will have an allele_type and a primary_identifier field.
- The primary_identifier is equivalent to the uniquename from Chado and will match the key of this allele (see unique_allele_id)
- Example allele types are "deletion", "wild_type" and "aberration"
- gene is the ID of the gene of this allele, unless the allele has type "aberration"
- name and description are optional, depending on the allele type
- notes is an optional map of notes attached to this allele
- synonyms is a list of synonyms that have been added in this session
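A minimal Python sketch of resolving an allele to its gene, assuming the genes and alleles sections have already been loaded as dicts. The gene key is treated as an opaque string and looked up in the genes section rather than parsed, since its format may change. The sample data (including the exact gene key string) and the helper name allele_gene are hypothetical.

```python
# Hypothetical sample data shaped like the "genes" and "alleles" sections.
# The gene key string is made up; the code never parses it.
GENES = {
    "Drosophila melanogaster FBgn0040505": {
        "organism": "Drosophila melanogaster",
        "uniquename": "FBgn0040505",
    },
}

ALLELES = {
    "allele-1": {
        "allele_type": "deletion",
        "gene": "Drosophila melanogaster FBgn0040505",
        "primary_identifier": "allele-1",
        "synonyms": [],
    },
    "aberration-1": {
        "allele_type": "aberration",
        "primary_identifier": "aberration-1",
        "synonyms": [],
    },
}


def allele_gene(alleles, genes, allele_id):
    """Return (organism, uniquename) for an allele, or None if it has no gene."""
    allele = alleles[allele_id]
    # primary_identifier is expected to match the key the allele is stored under.
    if allele["primary_identifier"] != allele_id:
        raise ValueError(f"primary_identifier does not match key {allele_id!r}")
    gene_key = allele.get("gene")
    if gene_key is None:
        # e.g. alleles of type "aberration" have no gene field
        return None
    # Look the key up instead of splitting it, since the key format may change.
    gene = genes[gene_key]
    return gene["organism"], gene["uniquename"]
```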
The <unique_genotype_id> is a unique ID created by Canto for each genotype. These IDs are used in the annotations section to refer to genotypes.
A genotype can have these fields:
- organism_taxonid: the NCBI taxon ID of the organism this genotype comes from
- loci: a list of loci
- comment: a genotype-specific comment
Each locus is a list of allele references, each with an allele id and an optional expression, e.g.:
[
{
"id" : "FBal0157358"
}
]
or:
[
{
"expression" : "Overexpression",
"id" : "SPBC28E12.06c:00e0a3ede15887bf-2"
}
]
A diploid locus will look like:
[
{
"id" : "FBal0157358"
},
{
"id" : "FBal0157359"
}
]
For a multi-locus genotype, the loci list will have two or more parts, e.g.:
"4e77f8cbed7cd6c6-genotype-26" : {
"loci" : [
[
{
"id" : "FBal0157358"
},
{
"id" : "FBal0157359"
}
],
[
{
"id" : "FBal0322737"
},
{
"id" : "FBal0322736"
}
]
],
"organism_taxonid" : 7227
},
Haploid and diploid loci can be mixed, e.g.:
"4e77f8cbed7cd6c6-genotype-20" : {
"loci" : [
[
{
"id" : "FBal0125507"
}
],
[
{
"id" : "FBal0288220"
}
],
[
{
"id" : "FBal0157358"
},
{
"id" : "FBal0157359"
}
]
],
"organism_taxonid" : 7227
},
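The loci structure can be summarised with a short Python sketch, shown here on the mixed haploid/diploid example above. It assumes a locus with one entry is haploid and one with two entries is diploid, and it rejects other sizes because this page does not describe them; the function name locus_summary is hypothetical.

```python
# The mixed haploid/diploid example above (4e77f8cbed7cd6c6-genotype-20).
GENOTYPE_20 = {
    "loci": [
        [{"id": "FBal0125507"}],
        [{"id": "FBal0288220"}],
        [{"id": "FBal0157358"}, {"id": "FBal0157359"}],
    ],
    "organism_taxonid": 7227,
}


def locus_summary(genotype):
    """Return one (ploidy, allele_ids, expressions) tuple per locus."""
    summary = []
    for locus in genotype["loci"]:
        # One allele entry means a haploid locus, two means diploid.
        if len(locus) == 1:
            ploidy = "haploid"
        elif len(locus) == 2:
            ploidy = "diploid"
        else:
            # Other sizes are not described for this format, so reject them.
            raise ValueError(f"unexpected locus with {len(locus)} alleles")
        allele_ids = [entry["id"] for entry in locus]
        # "expression" is optional; missing values become None.
        expressions = [entry.get("expression") for entry in locus]
        summary.append((ploidy, allele_ids, expressions))
    return summary
```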
The metagenotypes section contains two types of objects (distinguished by the type field):
- pathogen-host: represents the "metagenotype" of a host genotype and a pathogen genotype when Canto is used in pathogen-host mode
- interaction: a genetic interaction
A pathogen-host object requires all three of these fields:
- type - "pathogen-host"
- host_genotype - the ID (from the genotypes section) of a host genotype
- pathogen_genotype - a pathogen genotype ID
An interaction object is used to export genetic interactions and has these fields:
- type - "interaction"
- genotype_a - a genotype ID from the genotypes section
- genotype_b - a genotype ID
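A minimal Python sketch of checking a session's metagenotypes. It assumes that both genotype fields of each object type are required and that every referenced ID should exist in the same session's genotypes section (this page only states that explicitly for some fields). The names REQUIRED_FIELDS and check_metagenotypes are hypothetical.

```python
# Genotype fields expected for each metagenotype type (assumed required).
REQUIRED_FIELDS = {
    "pathogen-host": ("host_genotype", "pathogen_genotype"),
    "interaction": ("genotype_a", "genotype_b"),
}


def check_metagenotypes(metagenotypes, genotypes):
    """Return a list of problems found in a session's metagenotypes section."""
    problems = []
    for mg_id, mg in metagenotypes.items():
        mg_type = mg.get("type")
        if mg_type not in REQUIRED_FIELDS:
            problems.append(f"{mg_id}: unknown type {mg_type!r}")
            continue
        for field in REQUIRED_FIELDS[mg_type]:
            genotype_id = mg.get(field)
            if genotype_id is None:
                problems.append(f"{mg_id}: missing {field}")
            elif genotype_id not in genotypes:
                # Assumes referenced genotypes live in the same session.
                problems.append(f"{mg_id}: {field} {genotype_id!r} not in genotypes")
    return problems
```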
A list of annotations with these fields:
- type: the annotation type (e.g. "phenotype" or "molecular_function")
- creation_date: when the annotation was made
- curator: the curator who made the annotation, with the fields:
  - community_curated: true if the curator is a non-admin user
  - name
  - email
- evidence_code: e.g. "Inferred from Physical Interaction" or "Microscopy"
- publication: the publication/PubMed ID
- status: currently always "new"
- submitter_comment: a note from the curator for this annotation (if any)
- term: the ID of the term that the gene or genotype was annotated with
- extension: the extension for this annotation, if any (see Annotation extensions below)
- gene: the unique gene ID of the gene that was annotated
- with_gene_id: for GO IPI/IGI annotations, this is the value for GAF column 8
- interacting_genes: for interaction annotations, a list of the IDs of the interacting genes
- conditions: an optional list of IDs from an experimental condition ontology (e.g. PECO)
- genotype: the unique genotype ID of the genotype that was annotated
- genotype_interactions_no_phenotype: a list of genotype-to-genotype interactions associated with this phenotype annotation (see Genotype interactions below)
- genotype_interactions_with_phenotype: a list of genotype-to-genotype interactions including details about the single allele phenotype and extension (for example, the rescued phenotype)
Not all of these fields have a value for all annotation types. These fields will always be present:
- creation_date
- curator
- one of gene or genotype
- publication
- term (for ontology annotations) or interacting_genes (for interactions)
- type
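The always-present rule can be checked with a short Python sketch. It assumes that a field with a null value counts as missing and that "one of" means at least one of the pair is set; the names ALWAYS_PRESENT and missing_annotation_fields are hypothetical.

```python
ALWAYS_PRESENT = ("creation_date", "curator", "publication", "type")


def missing_annotation_fields(annotation):
    """Return the always-present annotation fields that are absent or null."""

    def has(field):
        # Treat JSON null (None) the same as an absent field.
        return annotation.get(field) is not None

    missing = [field for field in ALWAYS_PRESENT if not has(field)]
    # At least one of each pair must be set; which one depends on the type.
    if not (has("gene") or has("genotype")):
        missing.append("gene or genotype")
    if not (has("term") or has("interacting_genes")):
        missing.append("term or interacting_genes")
    return missing
```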
Double mutant phenotypes can have associated (inferred) genetic interactions. These are attached to the phenotype annotation in the fields:
- genotype_interactions_no_phenotype
- genotype_interactions_with_phenotype
These fields are required in all genotype-genotype interactions:
- genotype_a, genotype_b: each is the ID of a single-locus genotype
- interaction_type: see the main Canto documentation
The alleles from genotype_a and genotype_b are also the two alleles in the double mutant of the phenotype annotation.
If the interaction has details about the phenotype and extension of the single-locus phenotype that is rescued, these additional fields are required:
- genotype_a_phenotype_termid: a term ID (example: "FYPO:0000091")
- genotype_a_phenotype_extension: the extension for the term, in the same format as the extension field of an annotation; can be empty ([])
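A minimal Python sketch of checking the genotype interactions attached to a phenotype annotation. It assumes that the rescued-phenotype fields are the ones expected in genotype_interactions_with_phenotype, which is an interpretation of the field descriptions rather than something this page states outright; the function name check_genotype_interactions is hypothetical.

```python
BASE_FIELDS = ("genotype_a", "genotype_b", "interaction_type")
# Assumed to apply only to entries in genotype_interactions_with_phenotype.
PHENOTYPE_FIELDS = ("genotype_a_phenotype_termid", "genotype_a_phenotype_extension")


def check_genotype_interactions(annotation):
    """Return missing-field problems for an annotation's genotype interactions."""
    problems = []
    for key, extra in (
        ("genotype_interactions_no_phenotype", ()),
        ("genotype_interactions_with_phenotype", PHENOTYPE_FIELDS),
    ):
        # Either list may be absent for annotations without interactions.
        for i, interaction in enumerate(annotation.get(key, [])):
            for field in BASE_FIELDS + extra:
                # Presence only: an empty extension ([]) is allowed.
                if field not in interaction:
                    problems.append(f"{key}[{i}]: missing {field}")
    return problems
```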
The extension field of an annotation is a list of lists. Each part of the extension is a relation and a range.
GO annotations will generally use these relations: http://wiki.geneontology.org/index.php/Annotation_usage_examples_for_each_annotation_extension_relation
The possible range types (rangeType) constrain the rangeValue:
- "Ontology" - an ontology term ID
- "Gene" - a gene uniquename/ID
- "Metagenotype" - a metagenotype ID from the metagenotypes section
- "Text" - a text field for other cases
Each extension part will also have a rangeDisplayName when appropriate.
Examples for GO annotations:
{
"rangeDisplayName" : "rsd1",
"rangeType" : "Gene",
"rangeValue" : "PomBase:rsd1",
"relation" : "has_direct_input"
}
{
"rangeDisplayName" : "cellular response to nitrogen starvation",
"rangeType" : "Ontology",
"rangeValue" : "GO:0006995",
"relation" : "exists_during"
}
Example for a phenotype annotation:
{
"rangeDisplayName" : "high",
"rangeType" : "Ontology",
"rangeValue" : "FYPO_EXT:0000001",
"relation" : "has_expressivity"
}
The dependent and independent extension parts are written using a list-of-lists structure. The top-level list contains the independent extensions and the sub-lists hold the dependent parts.
The overview is:
"extension": [
[
{some_range_and_relation},
{another_range_and_relation}
],
[
{an_independent_range_and_relation},
...
],
...
]
The top-level list will be empty if the current annotation has no extension. The sub-lists (if any) must contain at least one element.
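The list-of-lists rules and the range types can be checked with a small Python sketch. It assumes every part carries relation, rangeType and rangeValue (as in the examples here) and that rangeType must be one of the four listed types; the function name check_extension is hypothetical.

```python
RANGE_TYPES = {"Ontology", "Gene", "Metagenotype", "Text"}
PART_FIELDS = ("relation", "rangeType", "rangeValue")


def check_extension(extension):
    """Return a list of structural problems in an annotation's extension."""
    if not isinstance(extension, list):
        return ["extension is not a list"]
    problems = []
    for i, group in enumerate(extension):
        # Each top-level entry is one independent extension and must be a
        # non-empty list of dependent parts.
        if not isinstance(group, list) or not group:
            problems.append(f"extension[{i}] must be a non-empty list")
            continue
        for j, part in enumerate(group):
            if not isinstance(part, dict):
                problems.append(f"extension[{i}][{j}] is not an object")
                continue
            for field in PART_FIELDS:
                if field not in part:
                    problems.append(f"extension[{i}][{j}]: missing {field}")
            range_type = part.get("rangeType")
            if range_type is not None and range_type not in RANGE_TYPES:
                problems.append(f"extension[{i}][{j}]: unknown rangeType {range_type!r}")
    return problems
```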
In the simple case where the extension has just one part, it looks like:
"extension": [
[
{
"rangeDisplayName" : "high",
"rangeType" : "Ontology",
"rangeValue" : "FYPO_EXT:0000001",
"relation" : "has_expressivity"
}
]
]
An extension with two dependent parts like:
has_substrate(PomBase:SPATRNAASP.01), happens_during(cellular response to nitrogen starvation)
is represented as:
"extension": [
[
{
"rangeValue": "PomBase:SPATRNAASP.01",
"rangeType": "Gene",
"relation": "has_substrate"
},
{
"relation": "happens_during",
"rangeValue": "GO:0006995",
"rangeType": "Ontology",
"rangeDisplayName": "cellular response to nitrogen starvation"
}
]
]
The nested list contains the two dependent parts.
To represent two independent extensions on the same annotation the top level list will contain multiple elements. For example:
has_substrate PomBase:SPATRNAASP.01 , happens_during cellular response to nitrogen starvation |
has_substrate PomBase:SPATRNAASP.02 , happens_during cellular response to nitrogen starvation
is written:
"extension": [
[
{
"rangeValue": "PomBase:SPATRNAASP.01",
"rangeType": "Gene",
"relation": "has_substrate"
},
{
"relation": "happens_during",
"rangeValue": "GO:0006995",
"rangeType": "Ontology",
"rangeDisplayName": "cellular response to nitrogen starvation"
}
],
[
{
"relation": "has_substrate",
"rangeValue": "PomBase:SPATRNAASP.02",
"rangeType": "Gene"
},
{
"rangeDisplayName": "cellular response to nitrogen starvation",
"rangeType": "Ontology",
"rangeValue": "GO:0006995",
"relation": "happens_during"
}
]
]
(See https://curation.pombase.org/pombe/curs/4a7f9665ed7386e8/ro for an example of this)
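To turn an extension back into the text notation used in the examples above, a Python sketch might join dependent parts with ", " and independent extensions with " | ". It shows rangeDisplayName when present, otherwise rangeValue. This rendering convention simply mirrors the examples on this page and is not taken from Canto's own code; render_part and render_extension are hypothetical names.

```python
def render_part(part):
    """Render one extension part as relation(range)."""
    # Prefer the human-readable name when present, otherwise the raw value.
    shown = part.get("rangeDisplayName") or part["rangeValue"]
    return f"{part['relation']}({shown})"


def render_extension(extension):
    """Render dependent parts joined by ", " and independent ones by " | "."""
    return " | ".join(
        ", ".join(render_part(part) for part in group) for group in extension
    )


# The two-dependent-part example from this page.
TWO_DEPENDENT = [[
    {"rangeValue": "PomBase:SPATRNAASP.01", "rangeType": "Gene",
     "relation": "has_substrate"},
    {"relation": "happens_during", "rangeValue": "GO:0006995",
     "rangeType": "Ontology",
     "rangeDisplayName": "cellular response to nitrogen starvation"},
]]
```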
The metadata section has these fields:
- accepted_timestamp: when the session was accepted by the curator
- annotation_status: the current annotation status; will always be "APPROVED" if --dump-approved was passed to the export script
- annotation_status_datestamp: when the status last changed
- approval_in_progress_timestamp: when the approval process started
- approved_timestamp: when the session was approved; may be different from first_approved_timestamp if the session went through the approval process more than once
- approver_email
- approver_name: who approved the session
- canto_session: the 16-character hexadecimal session ID
- curation_pub_id: the PubMed ID
- curator_email
- curator_name: who curated the session
- curator_role: "community" or the organisation name (e.g. "PomBase" or "FlyBase")
- first_approved_timestamp: when the session was first approved
- needs_approval_timestamp: when the session was submitted to the curators for approval; might be different from session_first_submitted_timestamp if the session was re-submitted
- session_created_timestamp: when the session was created, either by the admins or when the user entered a PMID on the front page
- session_first_submitted_timestamp
- session_genes_count: the number of genes in the session
- term_suggestion_count: the number of outstanding term suggestions; should be 0 for approved sessions
- unknown_conditions_count: the number of conditions in the session that haven't been assigned a condition ontology ID; should be 0 for approved sessions
- has_community_curation: true if and only if there are any annotations in this session made by a community curator
- annotation_curators: not exported unless the --export-curator-names flag is set. If it is set, this is an array of hashes with the keys:
  - name: the name of the curator (admin or community)
  - orcid: the ORCID of the curator, or null if not known
  - community_curator: true if this curator is a community curator
  - annotation_count: the number of annotations by this curator in this session
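A minimal Python sketch of sanity-checking the metadata of an approved session. It assumes timestamps use the "YYYY-MM-DD HH:MM:SS" layout seen in the example, and it converts the counts with int() because the example stores them as strings. The function name check_approved_metadata is hypothetical.

```python
from datetime import datetime

# Timestamp layout as seen in the example metadata, e.g. "2019-06-12 10:48:22".
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_ts(value):
    return datetime.strptime(value, TIMESTAMP_FORMAT)


def check_approved_metadata(metadata):
    """Return problems found in the metadata of an APPROVED session."""
    problems = []
    if metadata.get("annotation_status") != "APPROVED":
        return problems  # only approved sessions are checked here
    # Counts appear as strings in the example export, so convert them first.
    for field in ("term_suggestion_count", "unknown_conditions_count"):
        if int(metadata.get(field, "0")) != 0:
            problems.append(f"{field} should be 0 for an approved session")
    first = metadata.get("first_approved_timestamp")
    latest = metadata.get("approved_timestamp")
    if first and latest and parse_ts(first) > parse_ts(latest):
        problems.append("first_approved_timestamp is after approved_timestamp")
    return problems
```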