DataJoint introduces a principled data model, which is described in detail in Yatsenko et al., 2018. This data model is a conceptual refinement of the Relational Data Model and also draws on the Entity-Relationship Model (ERM).
The Relational Data Model was inspired by the concept of relations in set theory. When the formal relational data model was formulated, it introduced additional terminology (e.g. relation, attribute, tuple, domain). Practical programming languages such as SQL do not precisely follow the relational data model and introduce other terms to approximate relational concepts (e.g. table, column, row, datatype). Subsequent data models (e.g. ERM) refined the relational data model and introduced their own terminology to describe analogous concepts (e.g. entity set, relationship set, attribute set). As a result, similar concepts may be described using different terminology, depending on the context and the speaker’s background.
For example, what is known as a relation in the formal relational model is called a table in SQL; the analogous concept in ERM and DataJoint is called an entity set.
The DataJoint documentation follows the terminology defined in Yatsenko et al., 2018, except that entity set is replaced with the more colloquial table or query result in most cases.
The table below summarizes the terms used for similar concepts across the related data models.
|Relational||ERM||SQL||DataJoint (formal)||This manual|
|relation||entity set||table||entity set||table|
|attribute value||attribute value||field value||attribute value||attribute value|
|primary key||primary key||primary key||primary key||primary key|
|foreign key||foreign key||foreign key||foreign key||foreign key|
|schema||schema||schema or database||schema||schema|
|relational expression||data query|| ||query expression||query expression|
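The correspondence between these terms can be made concrete with a small SQL example. The following is a minimal sketch using Python's standard `sqlite3` module rather than DataJoint itself; the table names `mouse` and `session` and their columns are made up for illustration.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# A relation (relational model) / entity set (ERM) / table (SQL).
conn.execute("""
    CREATE TABLE mouse (
        mouse_id INTEGER PRIMARY KEY,   -- primary key
        sex      TEXT NOT NULL          -- attribute (a column in SQL)
    )
""")
conn.execute("""
    CREATE TABLE session (
        mouse_id    INTEGER NOT NULL,
        session_idx INTEGER NOT NULL,
        PRIMARY KEY (mouse_id, session_idx),
        FOREIGN KEY (mouse_id) REFERENCES mouse (mouse_id)  -- foreign key
    )
""")

conn.execute("INSERT INTO mouse VALUES (1, 'F')")
conn.execute("INSERT INTO session VALUES (1, 1)")

# An attribute value (a field value in SQL) taken from one row.
sex = conn.execute("SELECT sex FROM mouse WHERE mouse_id = 1").fetchone()[0]

# The foreign key rejects a session that refers to a nonexistent mouse.
try:
    conn.execute("INSERT INTO session VALUES (99, 1)")
    foreign_key_enforced = False
except sqlite3.IntegrityError:
    foreign_key_enforced = True
```

Whichever vocabulary is used, the objects are the same: `mouse` is one relation, entity set, or table, and `mouse_id` serves as its primary key and as the foreign key in `session`.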
DataJoint: databases, schemas, packages, and modules
A database is a collection of tables on the database server. DataJoint users do not interact with it directly.
A DataJoint schema is
- a database on the database server containing tables with data and
- a collection of classes (in MATLAB or Python) associated with the database, one class for each table.
In MATLAB, the collection of classes is organized as a package, i.e. a file folder whose name starts with a + character.
In Python, the collection of classes is any set of classes decorated with the appropriate schema object.
Most commonly, the classes for the tables in one database are organized as a distinct Python module.
Thus, typical DataJoint projects have one module per database.
However, this organization is up to the user’s discretion.
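The idea of a schema as "one database plus one class per table" can be sketched in plain Python with a decorator that registers classes. This is only an illustration of the pattern, not DataJoint's implementation or API: the `Schema` class, the `Mouse` and `Session` classes, and the database name `lab_experiments` are all hypothetical.

```python
class Schema:
    """Hypothetical stand-in for a schema object: it names one database
    and collects the classes that represent the tables in it."""

    def __init__(self, database_name):
        self.database_name = database_name
        self.tables = {}  # class name -> class

    def __call__(self, cls):
        # Used as a decorator: record the class and note its database.
        cls.database_name = self.database_name
        self.tables[cls.__name__] = cls
        return cls


# One schema object per database; conventionally, one module per schema.
schema = Schema("lab_experiments")  # made-up database name


@schema
class Mouse:
    """Represents the table of mice."""


@schema
class Session:
    """Represents the table of recording sessions."""


print(schema.database_name, sorted(schema.tables))
```

Keeping all decorated classes for one database in a single module makes the module boundary coincide with the schema boundary, which is the convention described above.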
Base tables are tables stored in the database, and are often referred to simply as tables in DataJoint. Base tables are distinguished from derived tables, which result from relational operators.
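The distinction between base and derived tables can be illustrated with a short sketch using Python's standard `sqlite3` module in place of DataJoint; the `mouse` table and its contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Base table: defined and stored in the database.
conn.execute("CREATE TABLE mouse (mouse_id INTEGER PRIMARY KEY, sex TEXT NOT NULL)")
conn.executemany("INSERT INTO mouse VALUES (?, ?)", [(1, "F"), (2, "M"), (3, "F")])

# Derived table: the result of applying relational operators
# (here a restriction and a projection) to the base table.
females = conn.execute(
    "SELECT mouse_id FROM mouse WHERE sex = 'F' ORDER BY mouse_id"
).fetchall()

# Only the base table is stored; the derived result exists on the client side.
stored = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]
```

Here `females` holds the rows of a derived table, while `stored` lists only `mouse`, the single base table in the database.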
Relvars and relation values
Early versions of the DataJoint documentation referred to relation objects as relvars (https://en.wikipedia.org/wiki/Relvar). This term emphasizes the fact that relational variables and expressions do not contain actual data but are rather symbolic representations of data to be retrieved from the database. The specific value of a relvar would then be referred to as the relation value. The value of a relvar can change with changes in the state of the database.
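The difference between a symbolic expression and its value can be sketched as follows. This is a hedged illustration built on Python's standard `sqlite3` module, not DataJoint's own query machinery: the `Query` class, its `fetch` method, and the `mouse` table are hypothetical.

```python
import sqlite3


class Query:
    """Hypothetical symbolic query: it stores SQL text and holds no rows."""

    def __init__(self, conn, sql, params=()):
        self.conn = conn
        self.sql = sql
        self.params = params

    def fetch(self):
        # Data is retrieved only here, from the current state of the database.
        return self.conn.execute(self.sql, self.params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mouse (mouse_id INTEGER PRIMARY KEY, sex TEXT NOT NULL)")
conn.execute("INSERT INTO mouse VALUES (1, 'F')")

# Plays the role of the relvar: a description of data, not the data itself.
females = Query(conn, "SELECT mouse_id FROM mouse WHERE sex = ? ORDER BY mouse_id", ("F",))

before = females.fetch()  # relation value at this moment
conn.execute("INSERT INTO mouse VALUES (2, 'F')")
after = females.fetch()   # same expression, different value
```

The object `females` never changes, yet `before` and `after` differ because the state of the database changed between the two fetches.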
The more recent iteration of the documentation has grown less pedantic and more often uses the term table instead.
The vocabulary of DataJoint does not include the term metadata.
In data science, the term metadata commonly means “data about the data” rather than the data themselves. For example, metadata could include data sizes, timestamps, data types, indexes, and keywords.
In contrast, neuroscientists often use the term to refer to conditions and annotations about experiments. This distinction arose when such information was stored separately from experimental recordings, such as in physical notebooks. Such “metadata” are used to search and to classify the data and are in fact an integral part of the actual data.
In DataJoint, all data other than blobs can be used in searches and categorization. These fields may originate from manual annotations, preprocessing, or analyses just as easily as from recordings or behavioral performance. Since “metadata” in the neuroscience sense are not distinguished from any other data in a pipeline, DataJoint avoids the term entirely. Instead, DataJoint differentiates data into data tiers.