Guide#
Welcome to the Bionty guide! 👋
In the following we will outline the main concepts and terminology of Bionty.
Entities#
In many practical applications, a biological entity (e.g., Species) represents a variable that can take values from a vocabulary of terms.
1. There are different, roughly equivalent vocabularies for the same entity. For example, one can describe species with the vocabulary of scientific names, the vocabulary of common names, or the vocabulary of ontology IDs for the same species.
2. These vocabularies exist in different versions and at different levels of granularity. Typically, vocabularies are based on a given version of a public ontology, and may contain "custom" terms representing new knowledge that is not yet represented publicly.
Entity model#
We address 1. with a so-called Entity model: within Bionty, the primary representation for an entity is an Entity object, in which each column of the Entity table attribute corresponds to a vocabulary.
We address 2. through a user-setup process that consists of:
looking up a standard ontology, fixing a resolution/depth of terms in the ontology, and writing the result to the vocabulary.
adding user-defined terms to the ontology, or, if their relation within the ontology is not yet clear, directly to the vocabulary.
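The two setup steps can be sketched in plain Python. This is a minimal illustration, not Bionty's implementation: the toy ontology `PARENT`, the helpers `depth` and `build_vocabulary`, and the custom term are all hypothetical names invented for this example.

```python
# Toy ontology as a child -> parent mapping (hypothetical terms, not a real ontology).
PARENT = {
    "cell": None,
    "immune cell": "cell",
    "T cell": "immune cell",
    "CD8 T cell": "T cell",
    "B cell": "immune cell",
}


def depth(term, parent=PARENT):
    """Number of edges between a term and the ontology root."""
    d = 0
    while parent[term] is not None:
        term = parent[term]
        d += 1
    return d


def build_vocabulary(max_depth, custom_terms=()):
    """Keep ontology terms up to max_depth, then append user-defined terms."""
    vocab = [t for t in PARENT if depth(t) <= max_depth]
    # Custom terms whose place in the ontology is unclear go directly into the vocabulary.
    vocab += [t for t in custom_terms if t not in vocab]
    return vocab


vocabulary = build_vocabulary(max_depth=2, custom_terms=["my novel cell state"])
print(vocabulary)
```

Fixing `max_depth` plays the role of choosing a resolution: terms deeper than the cutoff (here "CD8 T cell") are left out of the vocabulary, while the custom term is kept even though it has no parent in the ontology.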
Example:
Species is an entity.
Consider one value that the entity can take: human is one choice of name (the common name) for the abstract entry/value/term homo sapiens.
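To make the example concrete, the following sketch builds a small species table with plain pandas. It assumes nothing about Bionty's internals: the column names (`ontology_id`, `scientific_name`, `common_name`) and the three records are illustrative assumptions, and NCBI Taxonomy IDs serve only as an example of an ontology-ID vocabulary.

```python
import pandas as pd

# Hypothetical entity table for species: each column is one vocabulary,
# each row (record) is one abstract term described in all vocabularies.
species = pd.DataFrame(
    {
        "ontology_id": ["NCBITaxon:9606", "NCBITaxon:10090", "NCBITaxon:7955"],
        "scientific_name": ["homo sapiens", "mus musculus", "danio rerio"],
        "common_name": ["human", "mouse", "zebrafish"],
    }
)

# Choosing a vocabulary as the primary reference amounts to indexing by that column.
by_common_name = species.set_index("common_name")

# "human" and "homo sapiens" are two names for the same record.
human = by_common_name.loc["human"]
print(human["scientific_name"])
print(human["ontology_id"])
```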
The Entity class#
The Entity class is the core class of Bionty and implements the Entity model introduced above.
It offers three primary functionalities (.df, .lookup, .curate) that are managed by a single parameter, id.
When instantiating an Entity, set the default id with, for example, bionty.Phenotype(id="id").
The id corresponds to the field name that constitutes the primary reference for every subsequent operation (.df, .lookup, .curate).
Accessing ontology DataFrames: The id parameter sets the default index of the Pandas DataFrame when it is accessed (.df). See Look up records of species, gene, protein, cell marker.
Looking up records: Entity offers a .lookup function to look up identifiers of Entity records. See Look up records of species, gene, protein, cell marker.
Curating ontologies: By default, .curate curates any specified column in the target Pandas DataFrame against the index as defined by the id of the Entity DataFrame. See Curate entity identifiers.
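The role of the index in curation can be illustrated without Bionty. In the minimal sketch below, `curate_column` and the `is_known` column are hypothetical and do not reproduce the signature or output of Bionty's .curate; the sketch only shows the idea of checking one column of a target DataFrame against the index of a reference table.

```python
import pandas as pd

# Hypothetical reference table, indexed by the vocabulary chosen as the default id.
reference = pd.DataFrame(
    {
        "ontology_id": ["NCBITaxon:9606", "NCBITaxon:10090"],
        "common_name": ["human", "mouse"],
    }
).set_index("common_name")

# Target data with a column that should only use terms from the reference index.
target = pd.DataFrame(
    {"species": ["human", "mouse", "hooman"], "n_cells": [10, 20, 5]}
)


def curate_column(df, column, reference):
    """Return a copy of df with a flag for values found in the reference index."""
    out = df.copy()
    out["is_known"] = out[column].isin(reference.index)
    return out


curated = curate_column(target, "species", reference)
print(curated)
```

Because the check runs against the reference index, switching the index to a different vocabulary (for example the ontology IDs) changes which values in the target column count as known.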
Glossary#
entity (lower case) refers to biological entities as described above.
Entity refers to the Entity class.
Entity table/reference table refers to a table where the columns are vocabularies, accessed via Entity.df.
Records refers to entries/rows in the Entity table.
Vocabularies are sets of terms that describe an entity.
Ontologies refer to sets of standardized terms that constitute a vocabulary.