This is a glossary of fields used by the data resource descriptor file convention of the (Re)usable Data Project.
A template file is provided to assist with this process, and this documentation mirrors the description of these fields as they appear on reusabledata.org data source detail pages.
To make sure structured information is displayed properly, users may want to view this file in the original form on GitHub.
Unique ID of the resource. A biodbcore or NAR ID is preferred, but any unique internal ID is usable.
This is a short-form identifier (often an initialism) for the resource created by the curator. By convention, this is typically the same as the file prefix describing the resource. Dashes are used in lieu of spaces when needed.
civicdb, bgee, mgi, monarch, pmkb, ncbi-gene
A full description of the resource from the resource itself, if possible.
'Integrate, align, and re-distribute cross-species gene, genotype, variant, disease, and phenotype data. Provide a portal for exploration of phenotype-based similarity. Facilitate identification of animal models of human disease through phenotypic similarity. Enable quantitative comparison of cross-species phenotypes. Develop embeddable widgets for data exploration. Influence genotype and phenotype reporting standards. Improve ontologies to better curate genotype-phenotype data.'
(Optional) The ISO 8601 date on which the license was last reviewed by a (Re)usable Data Project curator.
2017-12-03
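A minimal sketch of how such a date could be checked before a descriptor is committed. The function name `is_iso_date` is hypothetical and not part of the project's tooling; it assumes the value is a plain `YYYY-MM-DD` calendar date, as in the example above.

```python
import re
from datetime import date

# Assumed shape: four-digit year, two-digit month, two-digit day.
_DATE_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

def is_iso_date(value: str) -> bool:
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    # Check the shape first, because newer Python versions also accept
    # other ISO 8601 spellings in date.fromisoformat.
    if not _DATE_SHAPE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        # Right shape but not a real date, e.g. month 13.
        return False
    return True
```

The shape check keeps the accepted form narrow, so values such as `03/12/2017` or `20171203` are rejected rather than silently reinterpreted.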
A full-length, human-readable name for the resource that differentiates it from other similar resources.
National Center for Biotechnology Information (Gene)
URL for the resource. (Also described as Location in the reusabledata.org source details view.)
http://www.civicdb.org
(Optional) How the resource relates to the data it contains. Used for gross description; it is naturally hard to categorize many resources.
A warehouse value is currently under discussion.
Current allowable entries are: unknown, repository, source, and integrator.
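The allowable entries above lend themselves to a simple membership check. The following is a sketch only: the names `ALLOWED_SOURCE_TYPES` and `check_source_type` are hypothetical, and the set should be kept in step with whatever the project's schema currently lists.

```python
# Entries copied from the list above; "warehouse" is left out because it
# is still under discussion.
ALLOWED_SOURCE_TYPES = {"unknown", "repository", "source", "integrator"}

def check_source_type(value: str) -> str:
    """Return value unchanged if it is an allowable entry, else raise ValueError."""
    if value not in ALLOWED_SOURCE_TYPES:
        allowed = ", ".join(sorted(ALLOWED_SOURCE_TYPES))
        raise ValueError(f"{value!r} is not one of: {allowed}")
    return value
```

The same pattern would apply to any other field in this glossary that has a closed list of entries.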
Whether or not annotation is complete on this resource. (Also described as Curation status in the reusabledata.org source details view.)
Current allowable entries are: complete, incomplete, and nonpublic.
The area of research for the resource, loosely determined by the curator. (Also described as Field in the reusabledata.org source details view.)
biomedical, biology, pharmacology
The type of data the resource contains. (Also described as Type in the reusabledata.org source details view.)
x-species, cross-species, ontology, MOD, genomic resource, pathway, sequence, human phenotype gene associations
(Optional) Free tags to describe the resource and its data. (Also described as Categories in the reusabledata.org source details view.)
cancer, precision medicine, variants, variant disease associations, food terminology, food ontology, food associations
(Optional) Links and metadata for the resource's data, given as a structured list.
YAML
- type: download
  location: https://civic.genome.wustl.edu/releases
- type: api
  location: https://griffithlab.github.io/civic-api-docs/
- type: api
  location: http://foo.bar/api-v1
  label: old api
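Once loaded, a list like the one above becomes a list of small mappings. The sketch below checks each entry for the keys seen in the example. It assumes that `type` and `location` are required and `label` is optional, which is inferred from the example only and not confirmed by the schema; the function name `check_access_entries` is hypothetical.

```python
def check_access_entries(entries):
    """Return a list of human-readable problems found in the entries."""
    problems = []
    for index, entry in enumerate(entries):
        # Assumption: every entry needs a non-empty type and location.
        for key in ("type", "location"):
            if not entry.get(key):
                problems.append(f"entry {index}: missing {key!r}")
    return problems

# The example list above, written as Python data.
entries = [
    {"type": "download", "location": "https://civic.genome.wustl.edu/releases"},
    {"type": "api", "location": "https://griffithlab.github.io/civic-api-docs/"},
    {"type": "api", "location": "http://foo.bar/api-v1", "label": "old api"},
]
```

Returning a list of problems, instead of raising on the first one, lets a curator see everything that needs fixing in one pass.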
The license that is used by the resource. We use SPDX identifiers where we can, or one of: inconsistent, public domain, unlicensed, all rights reserved, or custom.
See scripts/source.schema.yaml for the most current full list that we use (as well as how to update).
CC0-1.0, CC-BY-4.0, custom, all rights reserved
The type of license that is being used. This will be used to define compatible data pools in the future; we only use the grossest terms now. If the type is not known or is unclassifiable, "unknown" is used.
Current possible values are: unknown, copyleft, permissive, copyright, restrictive, private pool
(Optional) The link to the resource license. (Also described as License location in the reusabledata.org source details view.)
http://www.omim.org/help/agreement, http://www.orphadata.org/cgi-bin/inc/legal.inc.php
(Optional) Setting this flag to true indicates that the licensing was combinatorially complicated enough (as is the case in some commercial licenses) that the curator chose to wear a single "hat" during the process. From the site text:
"While we try to cover as much of the licensing possibilities of a data resource that we can, in a few cases we may choose a particular "hat" to wear while evaluating to prevent a combinatorial explosion, which may also reduce the clarity of our curations for the community. In these cases, we may take on the role of a (1) non-commercial (2) academic (3) group that is (4) based in the US and trying to (5) create an aggregating resource (integrator), noting that other entities may have different results in the license commentary."
(Also described as Focused curation in the reusabledata.org source details view.)
true, false
(Optional) Structured issues with the license.
For every issue discovered with a resource, there should be a corresponding item in the license-issues field that marks the /exact/ violation, along with any comments. Resources can use this field as a first step toward improvement, and curators can use it to clarify any surrounding circumstances.
Any issues or thoughts about a resource that do not slot into one of the criteria violations can go into the license-commentary field. They may cross reference.
(Also described as Issues in the reusabledata.org source details view.)
YAML
- criteria: A.1.1
  comment: looked under only rock, not there
- criteria: E.1.1
  comment: clause that effectively uniformly prevents smurfs from accessing the data
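Because each item is meant to mark the exact violation, a typo in a `criteria` value is worth catching early. The sketch below flags values that do not look like the examples above. The pattern (one capital letter followed by two dot-separated numbers) is an assumption drawn from `A.1.1` and `E.1.1` only, and `find_malformed_criteria` is a hypothetical name.

```python
import re

# Assumed shape of a criteria label, based only on the examples above.
CRITERIA_SHAPE = re.compile(r"[A-Z]\.[0-9]+\.[0-9]+")

def find_malformed_criteria(issues):
    """Return the criteria values that do not match the assumed shape."""
    malformed = []
    for issue in issues:
        criteria = str(issue.get("criteria", ""))
        if not CRITERIA_SHAPE.fullmatch(criteria):
            malformed.append(criteria)
    return malformed
```

This only checks the form of the label; whether the label names a real criterion would still need to be checked against the project's criteria list.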
(Optional) Further commentary on the license, possibly including the process of curation and things like locations of additional licenses. (Also described as Commentary in the reusabledata.org source details view.)
YAML
- "one thought"
- "another thought"
(Optional) Marker noting that there was some extended internal discussion or controversy about the evaluation of the licensing terms. If this is set to "true", the controversy, or a link to a permanent archive of the controversy, must be described in the "license-commentary" field in enough detail to reconstruct the issues. (Also described as Controversial in the reusabledata.org source details view.)
true, false
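The dependency between this marker and the license-commentary field can be sketched as a small check. The function and parameter names here are hypothetical. The check can only confirm that some commentary is present when the marker is true; whether the commentary is sufficient to reconstruct the issues remains a judgment for the curator.

```python
def controversy_is_documented(was_controversial, license_commentary):
    """Return False only when the marker is true and no commentary exists."""
    if not was_controversial:
        # Nothing to document.
        return True
    # Require at least one commentary entry that is not blank.
    return any(str(item).strip() for item in license_commentary)
```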
(Optional) In cases where there may not be the bandwidth for multiple (min. 2) people to review an evaluation, this piece of metadata allows desired fixes and new evaluations to start moving through the system while keeping track of them for additional scrutiny later on. The assumption is that things are not.
true, false
(Optional) List of contact information for the resource: a link, an email address, or whatever is public.
YAML
- https://civic.genome.wustl.edu/contact
- foo@bar.bib
(Optional) Semi-structured list of supporting grants.
YAML
- label: 'Rhea development and curation activities at the SIB are supported by the Swiss Federal Government through the State Secretariat for Education, Research and Innovation (SERI), and by the SystemsX.ch, The Swiss Initiative in Systems Biology.'
  url: http://foo.bar
- label: 'NIH Grant for Science 123'
All copyrightable materials on this site are © 2019 the (Re)usable Data Project under the CC-BY 4.0 license.
The (Re)usable Data Project is funded by the National Center for Advancing Translational Sciences (NCATS) OT3 TR002019 as part of the Biomedical Data Translator project and U24TR002306 as part of the CTSA Program National Center for Data to Health (CD2H).
The (Re)usable Data Project would like to acknowledge the assistance of many more people than can be listed here. Please visit the about page for the full list.