JSON Export Format

Exporting from Canto to a JSON file

The JSON export command is documented on the main Canto site.

A technical specification of the JSON export file (as a JSON Schema) is available at etc/export.schema.json in the Canto repository.

The output has this structure:

  
{
  "curation_sessions" : {
    "<session_key>": {  see note below

      "genes" : {  see note below
        "<organism_name_and_gene_uniquename>" : { see note below
          "organism" : "Drosophila melanogaster",
          "uniquename" : "FBgn0040505"
        },
        "<another_organism_name_and_gene_uniquename>" : {
          ...
        }
      },

      "alleles" : {   see note below
        "<unique_allele_id>": {    see note below
          "allele_type" : "<allele_type>",
          "gene" : "<organism_name_and_gene_uniquename>",  see note below
          "name" : "<gene_name>",
          "primary_identifier" : "<unique_allele_id>",
          "synonyms" : []
        },
        "<another_unique_allele_id>" : {
          ...
        }
      },

      "genotypes" : {   see note below
        "<unique_genotype_id>": {   see note below
          "loci": [
            [
              {
                "id" : "FBal0157358"
              },
              {
                "id" : "FBal0157359"
              }
            ]
          ],
          "organism_taxonid" : 7227
        },
        "<another_unique_genotype_id>": {
          ...
        }
      },

      "metagenotypes": {  optional - see note below
        "<a_unique_metagenotype_id>": {
          "type": "pathogen-host",
          "host_genotype": "<unique_genotype_id>",
          "pathogen_genotype": "<another_unique_genotype_id>"
        }
      },

      "annotations" : [   see note below
        {
          "conditions" : [],
          "creation_date" : "2019-06-11",
          "curator" : {
            "community_curated" : false,
            "email" : "<curator_email>",
            "name" : "Kim Rutherford"
          },
          "evidence_code" : "",
          "extension" : [],
          "genotype" : "<unique_genotype_id>",
          "publication" : "<some_pubmed_id>",
          "status" : "new",
          "submitter_comment" : null,
          "term" : "<some_term_id>",
          "type" : "phenotype",
          "with_gene_id" : null
        },
        {
          ...
        }
      ],
      "metadata": {   see note below
        "accepted_timestamp" : "2019-05-31 10:00:22",
        "annotation_status" : "APPROVED",
        "annotation_status_datestamp" : "2019-06-12 10:48:22",
        "approval_in_progress_timestamp" : "2019-06-12 10:47:39",
        "approved_timestamp" : "2019-06-12 10:48:22",
        "approver_email" : "<approver_email>",
        "approver_name" : "Admin Person",
        "canto_session" : "4e77f8cbed7cd6c6",
        "curation_accepted_date" : "2019-05-31 10:00:22",
        "curation_pub_id" : "PMID:17285636",
        "curator_email" : "<curator_email>",
        "curator_name" : "Kim Rutherford",
        "curator_role" : "FlyBase-test",
        "first_approved_timestamp" : "2019-06-12 10:48:22",
        "needs_approval_timestamp" : "2019-06-12 10:47:19",
        "session_created_timestamp" : "2019-05-27 05:53:39",
        "session_first_submitted_timestamp" : "2019-06-12 10:47:19",
        "session_genes_count" : "4",
        "session_term_suggestions_count" : "0",
        "session_unknown_conditions_count" : "0",
        "term_suggestion_count" : "0",
        "unknown_conditions_count" : "0"
      },
      "organisms" : {
        "7227" : {
          "full_name" : "Drosophila melanogaster"
        }
      },
      "publications": {   see note below
        "<some_pubmed_id>" : {}
      }
    },
    "<some_other_session_key>": {
      ...
    },
    ...
  }
}

`<session_key>`

The Canto session key is a unique 16 character hexadecimal ID for the session.
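
As a minimal sketch of reading this structure, the Python below walks the top-level curation_sessions map and checks that each key looks like a 16 character hexadecimal ID. The function name iter_sessions and the inline sample are illustrative and not part of Canto. Accepting upper and lower case hexadecimal digits is an assumption.

```python
import json
import re

# Assumption: keys are exactly 16 hexadecimal characters, in either case.
SESSION_KEY_RE = re.compile(r"[0-9a-fA-F]{16}")


def iter_sessions(export):
    """Yield (session_key, session) pairs from a parsed export file."""
    for key, session in export.get("curation_sessions", {}).items():
        if not SESSION_KEY_RE.fullmatch(key):
            raise ValueError(f"unexpected session key: {key!r}")
        yield key, session


# Tiny inline sample; a real export file would be parsed with json.load().
sample = json.loads(
    '{"curation_sessions": {"4e77f8cbed7cd6c6": {"genes": {}, "annotations": []}}}'
)
session_keys = [key for key, _session in iter_sessions(sample)]
```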

`<organism_name_and_gene_uniquename>`

These keys are used in the alleles section to refer to genes in the genes section. The current format of organism + gene_uniquename may change in future.

`genes` section

The genes have two fields:

organism: the genus + species (+ optional strain)
uniquename: the gene primary ID

`<unique_allele_id>`

Used to uniquely refer to alleles in the alleles section from the genotypes section. If the allele has been loaded from an external JSON file (see JSON import file) then this ID will be the imported ID. Alleles that are added in a Canto session get assigned a unique ID for use when exporting.

`alleles` section

Every allele will have an allele_type and a primary_identifier field.

The primary_identifier is equivalent to the uniquename from Chado and will match the key of this allele (see unique_allele_id)
Example allele types are "deletion", "wild_type" and "aberration"
gene is the ID of the gene of this allele unless the allele has type "aberration"
name and description are optional, depending on the allele type
notes is an optional map of notes attached to this allele
synonyms is a list of synonyms that have been added in this session
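
The link from an allele to its gene can be followed with a simple lookup. The sketch below is hedged: gene_for_allele is a hypothetical helper, the gene key "example-gene-key" is a made-up stand-in for the real organism + gene_uniquename key, and the pairing of allele and gene in the sample is invented for illustration.

```python
def gene_for_allele(session, allele_id):
    """Return the gene entry for an allele, or None if it has no gene."""
    allele = session["alleles"][allele_id]
    gene_key = allele.get("gene")  # not set for alleles of type "aberration"
    if gene_key is None:
        return None
    return session["genes"][gene_key]


# Hypothetical session fragment; keys and pairings are illustrative only.
session = {
    "genes": {
        "example-gene-key": {
            "organism": "Drosophila melanogaster",
            "uniquename": "FBgn0040505",
        }
    },
    "alleles": {
        "FBal0157358": {
            "allele_type": "deletion",
            "gene": "example-gene-key",
            "primary_identifier": "FBal0157358",
            "synonyms": [],
        },
        "example-aberration": {
            "allele_type": "aberration",
            "primary_identifier": "example-aberration",
            "synonyms": [],
        },
    },
}
gene = gene_for_allele(session, "FBal0157358")
```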

`<unique_genotype_id>`

A unique ID created for a genotype by Canto. These IDs are used in the annotations section to refer to genotypes.

`genotypes` section

organism_taxonid: the NCBI taxon ID of the organism this genotype comes from
loci: a list of loci
comment: a genotype specific comment

Each locus is a list of alleles, each given by its allele ID with an optional expression, eg.

  [
     {
        "id" : "FBal0157358"
     }
  ]

or:

  [
     {
        "expression" : "Overexpression",
        "id" : "SPBC28E12.06c:00e0a3ede15887bf-2"
     }
  ]

A diploid locus will look like:

  [
     {
        "id" : "FBal0157358"
     },
     {
        "id" : "FBal0157359"
     }
  ]

And for a multi-locus genotype the loci list will have 2 or more parts. eg.

  "4e77f8cbed7cd6c6-genotype-26" : {
     "loci" : [
        [
           {
              "id" : "FBal0157358"
           },
           {
              "id" : "FBal0157359"
           }
        ],
        [
           {
              "id" : "FBal0322737"
           },
           {
              "id" : "FBal0322736"
           }
        ]
     ],
     "organism_taxonid" : 7227
  },

Haploid and diploid loci can be mixed. eg.

  "4e77f8cbed7cd6c6-genotype-20" : {
     "loci" : [
        [
           {
              "id" : "FBal0125507"
           }
        ],
        [
           {
              "id" : "FBal0288220"
           }
        ],
        [
           {
              "id" : "FBal0157358"
           },
           {
              "id" : "FBal0157359"
           }
        ]
     ],
     "organism_taxonid" : 7227
  },
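
A consumer of the export might summarise the loci of a genotype as in the sketch below. The helper name describe_loci is hypothetical, and treating a locus with one allele as haploid and with two alleles as diploid is an assumption based on the examples above. Any other length is reported as "other".

```python
def describe_loci(genotype):
    """Return a (kind, allele_ids) tuple for each locus of a genotype."""
    described = []
    for locus in genotype["loci"]:
        allele_ids = [allele["id"] for allele in locus]
        # Assumption: one allele means haploid, two alleles means diploid.
        kind = {1: "haploid", 2: "diploid"}.get(len(allele_ids), "other")
        described.append((kind, allele_ids))
    return described


# The mixed haploid/diploid genotype from the example above.
genotype = {
    "loci": [
        [{"id": "FBal0125507"}],
        [{"id": "FBal0288220"}],
        [{"id": "FBal0157358"}, {"id": "FBal0157359"}],
    ],
    "organism_taxonid": 7227,
}
loci_kinds = [kind for kind, _ids in describe_loci(genotype)]
```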

`metagenotypes` section

This section contains two types of objects (differentiated with the type field):

pathogen-host to represent the "metagenotype" of a host genotype and a pathogen genotype when Canto is used in pathogen-host mode
interaction for a genetic interaction

`pathogen-host` type

All three fields are required:

type - "pathogen-host"
host_genotype - an ID (from the genotypes section) of a host genotype
pathogen_genotype - a pathogen genotype ID

`interaction` type

Used to export genetic interactions

type - "interaction"
genotype_a - a genotype ID from the genotypes section
genotype_b - a genotype ID
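
The two metagenotype types can be checked with a small table of expected fields, as in this sketch. The names REQUIRED_FIELDS and metagenotype_problems are hypothetical, and treating all three interaction fields as required is an assumption (the text above only states this for the pathogen-host type).

```python
# Expected fields for each metagenotype type (interaction: assumed required).
REQUIRED_FIELDS = {
    "pathogen-host": ("type", "host_genotype", "pathogen_genotype"),
    "interaction": ("type", "genotype_a", "genotype_b"),
}


def metagenotype_problems(metagenotype, genotypes):
    """Return a list of problems found in one metagenotype entry."""
    meta_type = metagenotype.get("type")
    if meta_type not in REQUIRED_FIELDS:
        return [f"unknown type: {meta_type!r}"]
    problems = []
    for field in REQUIRED_FIELDS[meta_type]:
        if field not in metagenotype:
            problems.append(f"missing field: {field}")
        elif field != "type" and metagenotype[field] not in genotypes:
            problems.append(f"{field} is not in the genotypes section")
    return problems


# Hypothetical genotype IDs, for illustration only.
genotypes = {"host-genotype-1": {}, "pathogen-genotype-1": {}}
good = {
    "type": "pathogen-host",
    "host_genotype": "host-genotype-1",
    "pathogen_genotype": "pathogen-genotype-1",
}
bad = {"type": "interaction", "genotype_a": "host-genotype-1"}
```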

`annotations` section

A list of annotations with these fields:

type: the annotation type (eg. "phenotype" or "molecular_function")
creation_date: when the annotation was made
curator:
- community_curated: true if the curator is a non-admin user
- name
- email
evidence_code: eg. "Inferred from Physical Interaction" or "Microscopy"
publication: the publication/PubMed ID
status: currently always "new"
submitter_comment: a note from the curator for this annotation (if any)
term: the ID for the term that the gene or genotype was annotated with
extension: the extension for this annotation, if any. See Annotation extensions below.
gene: the unique gene ID of the gene that was annotated

GO annotations only:

with_gene_id: for GO IPI/IGI annotations this is the value for GAF column 8

Physical Interaction annotations only:

interacting_genes: for interaction annotations, a list of the IDs of the interacting genes

Phenotype/genotype annotations only:

conditions: an optional list of IDs from an experimental condition ontology (eg. PECO)
genotype: the unique genotype ID of the genotype that was annotated
genotype_interactions_no_phenotype: a list of genotype to genotype interactions associated with this phenotype annotation (See Genotype interactions below)
genotype_interactions_with_phenotype: a list of genotype to genotype interactions including details about single allele phenotype and extension (for example the rescued phenotype)

Not all of these fields have a value for all annotation types. These fields will always be present:

creation_date
curator
one of gene or genotype
publication
term (for ontology annotations) or interacting_genes (for interactions)
type
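
The list of always-present fields translates directly into a check like the one sketched below. The function name missing_required_fields is hypothetical and the check covers only the rules listed above.

```python
def missing_required_fields(annotation):
    """Return the names of always-present fields that are absent."""
    missing = [
        field
        for field in ("creation_date", "curator", "publication", "type")
        if field not in annotation
    ]
    # Exactly which of each pair is present depends on the annotation type.
    if "gene" not in annotation and "genotype" not in annotation:
        missing.append("gene or genotype")
    if "term" not in annotation and "interacting_genes" not in annotation:
        missing.append("term or interacting_genes")
    return missing


# A phenotype annotation shaped like the example at the top of this page.
annotation = {
    "conditions": [],
    "creation_date": "2019-06-11",
    "curator": {"community_curated": False, "name": "Kim Rutherford"},
    "evidence_code": "",
    "extension": [],
    "genotype": "<unique_genotype_id>",
    "publication": "<some_pubmed_id>",
    "status": "new",
    "submitter_comment": None,
    "term": "<some_term_id>",
    "type": "phenotype",
    "with_gene_id": None,
}
missing = missing_required_fields(annotation)
```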

Genotype to genotype interactions

Double mutant phenotypes can have associated (inferred) genetic interactions. These are attached to the phenotype annotation in the fields:

genotype_interactions_no_phenotype
genotype_interactions_with_phenotype

Common fields:

These fields are required in all genotype-genotype interactions:

genotype_a, genotype_b: the IDs of a single locus genotype
interaction_type: see the main Canto documentation

The alleles from genotype_a and genotype_b are also the two alleles in the double mutant of the phenotype annotation.

Single phenotype fields

If the interaction has details about the phenotype and extension of the single locus phenotype that is rescued, these additional fields are required:

genotype_a_phenotype_termid: a term ID (example: "FYPO:0000091")
genotype_a_phenotype_extension: the extension for the term, in the same format as the extension field; can be empty ([])
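
The constraints on genotype-genotype interactions could be checked roughly as follows. This is a sketch under stated assumptions: check_interaction and allele_ids are hypothetical helpers, the genotype and allele IDs are invented, and the rule that the alleles of genotype_a and genotype_b must appear in the annotated double mutant is taken from the description above rather than from the Canto code.

```python
def allele_ids(genotype):
    """Collect the allele IDs from every locus of a genotype."""
    return {allele["id"] for locus in genotype["loci"] for allele in locus}


def check_interaction(session, annotation, interaction):
    """Return a list of problems for one genotype-genotype interaction."""
    problems = []
    genotypes = session["genotypes"]
    double_mutant = allele_ids(genotypes[annotation["genotype"]])
    for field in ("genotype_a", "genotype_b"):
        genotype = genotypes[interaction[field]]
        if len(genotype["loci"]) != 1:
            problems.append(f"{field} is not a single locus genotype")
        if not allele_ids(genotype) <= double_mutant:
            problems.append(f"{field} has alleles missing from the double mutant")
    if "interaction_type" not in interaction:
        problems.append("interaction_type is missing")
    return problems


# Invented IDs, for illustration only.
session = {
    "genotypes": {
        "g-a": {"loci": [[{"id": "allele-1"}]]},
        "g-b": {"loci": [[{"id": "allele-2"}]]},
        "g-ab": {"loci": [[{"id": "allele-1"}], [{"id": "allele-2"}]]},
    }
}
annotation = {"genotype": "g-ab"}
interaction = {
    "genotype_a": "g-a",
    "genotype_b": "g-b",
    "interaction_type": "<interaction_type>",
}
problems = check_interaction(session, annotation, interaction)
```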

Annotation extension

The extension field of an annotation is a list of lists. Each part of the extension is a relation and a range.

GO annotations will generally use these relations: http://wiki.geneontology.org/index.php/Annotation_usage_examples_for_each_annotation_extension_relation

The possible range types (rangeType) constrain the rangeValue:

"Ontology" - an ontology term ID
"Gene" - a gene uniquename/ID
"Metagenotype" - a metagenotype ID from the metagenotypes section
"Text" - a text field for other cases

Each extension part will also have a rangeDisplayName when appropriate.

Examples for GO annotations:

  {
     "rangeDisplayName" : "rsd1",
     "rangeType" : "Gene",
     "rangeValue" : "PomBase:rsd1",
     "relation" : "has_direct_input"
  }

  {
     "rangeDisplayName" : "cellular response to nitrogen starvation",
     "rangeType" : "Ontology",
     "rangeValue" : "GO:0006995",
     "relation" : "exists_during"
  }

Example for a phenotype annotation:

  {
     "rangeDisplayName" : "high",
     "rangeType" : "Ontology",
     "rangeValue" : "FYPO_EXT:0000001",
     "relation" : "has_expressivity"
  }

Extension field structure

The dependent and independent extension parts are written using a list-of-lists structure. The top level list contains independent extensions and the sub-lists hold the dependent parts.

The overview is:

"extension": [
  [
    {some_range_and_relation},
    {another_range_and_relation}
  ],
  [
    {an_independent_range_and_relation},
    ...
  ],
  ...
]

The top level list will be empty if the current annotation has no extension. The sub-lists (if any) must contain at least one element.

In the simple case where the extension has just one part, the field looks like:

"extension": [
  [
    {
     "rangeDisplayName" : "high",
     "rangeType" : "Ontology",
     "rangeValue" : "FYPO_EXT:0000001",
     "relation" : "has_expressivity"
    }
  ]
]

An extension with two dependent parts like:

has_substrate(PomBase:SPATRNAASP.01), happens_during(cellular response to nitrogen starvation)

is represented as:

  "extension": [
    [
      {
        "rangeValue": "PomBase:SPATRNAASP.01",
        "rangeType": "Gene",
        "relation": "has_substrate"
      },
      {
        "relation": "happens_during",
        "rangeValue": "GO:0006995",
        "rangeType": "Ontology",
        "rangeDisplayName": "cellular response to nitrogen starvation"
      }
    ]
  ]

The nested list contains the two dependent parts.

To represent two independent extensions on the same annotation the top level list will contain multiple elements. For example:

has_substrate PomBase:SPATRNAASP.01 , happens_during cellular response to nitrogen starvation |
has_substrate PomBase:SPATRNAASP.02 , happens_during cellular response to nitrogen starvation

is written:

  "extension": [
    [
      {
        "rangeValue": "PomBase:SPATRNAASP.01",
        "rangeType": "Gene",
        "relation": "has_substrate"
      },
      {
        "relation": "happens_during",
        "rangeValue": "GO:0006995",
        "rangeType": "Ontology",
        "rangeDisplayName": "cellular response to nitrogen starvation"
      }
    ],
    [
      {
        "relation": "has_substrate",
        "rangeValue": "PomBase:SPATRNAASP.02",
        "rangeType": "Gene"
      },
      {
        "rangeDisplayName": "cellular response to nitrogen starvation",
        "rangeType": "Ontology",
        "rangeValue": "GO:0006995",
        "relation": "happens_during"
      }
    ]
  ]

(See https://curation.pombase.org/pombe/curs/4a7f9665ed7386e8/ro for an example of this)
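
To make the list-of-lists structure concrete, the sketch below turns an extension back into a compact text form, joining dependent parts with "," and independent extensions with "|". The function name format_extension is hypothetical, and falling back to rangeValue when rangeDisplayName is missing is an assumption.

```python
def format_extension(extension):
    """Render an extension list-of-lists as compact text."""
    independent = []
    for dependent_parts in extension:
        rendered = []
        for part in dependent_parts:
            # Assumption: prefer the display name, else use the raw value.
            label = part.get("rangeDisplayName") or part["rangeValue"]
            rendered.append(f'{part["relation"]}({label})')
        independent.append(", ".join(rendered))
    return " | ".join(independent)


# The two dependent parts from the example above.
extension = [
    [
        {
            "rangeValue": "PomBase:SPATRNAASP.01",
            "rangeType": "Gene",
            "relation": "has_substrate",
        },
        {
            "relation": "happens_during",
            "rangeValue": "GO:0006995",
            "rangeType": "Ontology",
            "rangeDisplayName": "cellular response to nitrogen starvation",
        },
    ]
]
text = format_extension(extension)
```

An annotation with no extension has an empty top level list, which this sketch renders as an empty string.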

`metadata`

accepted_timestamp: when the session was accepted by the curator
annotation_status: the current annotation status, will always be "APPROVED" if the --dump-approved option was passed to the export script
annotation_status_datestamp: when the status last changed
approval_in_progress_timestamp: when the approval process started
approved_timestamp: when the session was approved, may be different from first_approved_timestamp if the session went through the approval process more than once
approver_email
approver_name: who approved the session
canto_session: the 16 character hexadecimal session ID
curation_pub_id: the PubMed ID
curator_email
curator_name: who curated the session
curator_role: "community" or the organisation name (eg. "PomBase" or "FlyBase")
first_approved_timestamp: when the session was first approved
needs_approval_timestamp: when the session was submitted to the curators for approval, might be different to session_first_submitted_timestamp if the session was re-submitted
session_created_timestamp: when the session was created, either by the admins or when the user enters a PMID on the front page
session_first_submitted_timestamp
session_genes_count: number of genes in the session
term_suggestion_count: the number of terms that have outstanding term suggestions, should be 0 for approved sessions
unknown_conditions_count: the number of conditions in the session that haven't been assigned a condition ontology ID, should be 0 for approved sessions
has_community_curation: true if and only if there are any annotations in this session made by a community curator
annotation_curators: if the flag --export-curator-names is off this won't be exported. If --export-curator-names is set, this is an array of hashes with the keys:
- name: the name of the curator (admin or community)
- orcid: the ORCID of the curator, or null if not known
- community_curator: true if this curator is a community curator
- annotation_count: the number of annotations by this curator in this session
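
In the example at the top of this page the timestamps are strings of the form "2019-06-12 10:48:22" and the counts are strings such as "4", so a consumer will probably want to convert them. The sketch below assumes that timestamp format holds for every export. The names parse_timestamp and was_reapproved are hypothetical.

```python
from datetime import datetime

# Assumption: all timestamps use the format seen in the example output.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_timestamp(value):
    """Convert a metadata timestamp string into a datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)


def was_reapproved(metadata):
    """True if the session was approved again after its first approval."""
    approved = parse_timestamp(metadata["approved_timestamp"])
    first_approved = parse_timestamp(metadata["first_approved_timestamp"])
    return approved > first_approved


# Values copied from the example metadata above.
metadata = {
    "approved_timestamp": "2019-06-12 10:48:22",
    "first_approved_timestamp": "2019-06-12 10:48:22",
    "session_genes_count": "4",
}
genes_count = int(metadata["session_genes_count"])  # counts are strings
reapproved = was_reapproved(metadata)
```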
