Foreign keys define relationships between different entities represented by the schema design.
Even if you already know about foreign keys in SQL, please still read this section carefully. DataJoint prescribes a more principled way for defining and using data dependencies than in other models and languages such as SQL.
What are foreign keys¶
The DataJoint pipeline can be visualized as a graph with nodes and edges. The diagram of such a graph is called the entity relationship diagram or ERD. The nodes of the graph are tables and the edges connecting them are foreign keys. The edges are directed and the overall graph is a directed acyclic graph, a graph with no loops.
For example, the ERD below is the pipeline for multipatching experiments
The graph defines the direction of the workflow. The tables at the top of the flow need to be populated first, followed by table on step below and so forth until the last table is populated at the bottom of the pipeline. The top of the pipeline tends to be dominated by lookup tables (gray stars) and manual tables (green squares). The middle has many imported tables (blue triangles) and the bottom has computed tables (red stars).
Defining a foreign key¶
Foreign keys are defined as arrows
-> in the Table Definition, pointing to another table.
It may be a bit confusing that in the table definitions the arrow points to the table upstream in the pipeline whereas the ERD depicts arrows pointing downstream. Perhaps we could allow the left-pointing arrow <- in future of revisions of DataJoint to reduce confusion. In either case, the foreign key always points to upstream upstream tables.
A foreign key may be defined as part of the Primary Key. In the ERD, foreign keys from the primary key are shown as solid lines. This means that the primary key of the referenced table becomes part of the primary key of the new table. A foreign key outside the primary key is indicated by dashed line in the ERD.
For example, the following definition for the table
mp.Slice has three foreign keys, including one within the primary key.
## brain slice -> mp.Subject slice_id : smallint # slice number within subject --- -> mp.BrainRegion -> mp.Plane slice_date : date # date of the slicing (not patching) thickness : smallint unsigned # slice thickness in microns experimenter : varchar(20) # person who performed this experiment
You can examine the resulting table heading in MATLAB as
or in Python as
The heading of
mp.Slice may look something like
subject_id : char(8) # experiment subject id slice_id : smallint # slice number within subject --- brain_region : varchar(12) # abbreviated name for brain region plane : varchar(12) # plane of section slice_date : date # date of the slicing (not patching) thickness : smallint unsigned # slice thickness in microns experimenter : varchar(20) # person who performed this experiment
This displayed heading reflects the actual columns in the table. The foreign keys have been replaced by the primary key attributes of the referenced tables, including their data type and comment.
How foreign keys work¶
The foreign key
-> A in the definition of table
B has the following effects:
- The primary key attributes of
Aare made part of
- A foreign key constraint is created in
Bwith reference to
- If one does not already exist, an index is created to speed up searches in
Bfor matches to
A. (The reverse search is already fast because it uses the primary key of
A foreign key constraint means that a tuple in
B cannot exist without a matching tuple in
A. Matching means attributes in
B that correspond to the primary key of
A must have the same values.
An attempt to insert a tuple into
B that does not have a matching counterpart in
A will fail.
Conversely, deleting a tuple from
A that has matching tuples in
B will result in the deletion of these matching tuples and so forth, recursively, downstream in the pipeline.
A with a foreign key, we often say that
B depends on
A. We will therefore call
B the dependent table and
A the referenced table with respect to the foreign key from
Note to those already familiar with the theory of relational databases: The usage of the words “depends” and “dependency” here should not be confused with the unrelated concept of functional dependencies that is used to define normal forms.
Foreign keys enforce the desired property of databases known as referential integrity. Referential integrity enforces the constraint that no entity exists in the database without all the other entities on which it depends. An entity in relation
B depends on an entity in relation
A when they belong to them or are computed from them.
Renamed foreign keys¶
In most cases, the foreign key includes the primary key attributes of the referenced table in the table definition without renaming them. In such a case, an entity in the dependent table depends on exactly one entity in the referenced table. DataJoint provides the following syntax rename the primary key attributes when they are included in the new table.
The foreign key
(new_attr) -> Table
renames the primary key attribute of
new_attr before integrating it into the table definition.
This works if there is no ambiguity which of the primary key attributes must be renamed. Such is the case if
Table has only one attribute in the primary key or it only has one attribute that has not yet been included in the dependent table’s definition.
For example, the table
Experiment, may depend on table
User but rename the foreign key attribute into
operator as follows
(operator) -> User
In some cases, it is not clear which attribute or attributes from the referenced table should be renamed. Such is the case when multiple attributes are renamed or when the referenced table has multiple attributes that have not yet included.
For example, a table for
Synapse may reference the table
Cell twice as
The table definition may appear as
## synapse between two cells (presynaptic) -> Cell(cell_id) (postsynaptic) -> Cell(cell_id) --- connection_strength : double # (pA) peak synaptic current
If the primary key of
Cell is (
cell_id), then the primary key of
Synapse resulting from the above definition will be (
The first foreign key was responsible for including the first three attributes and the second foreign key added the last. Note that the second foreign key could just as well have been
(postsynaptic) -> Cell with the same effect but it does not make the table definition any clearer.
Note that the design of the
Synapse table above imposes the constraint that the synapse can only be found between cells in the same animal and in the same slice. If we wished to allow to represent synapses between cells from different slices, then we would have to rename
slice_id as well:
## synapse between two cells (presynaptic_slice, presynaptic_cell) -> Cell(slice_id, cell_id) (postsynaptic_slice, postsynaptic_cell) -> Cell(slice_id, cell_id) --- connection_strength : double # (pA) peak synaptic current
In this case, the primary key of
Synapse will be (
postsynaptic_cell). This primary key still imposes the constraint that synapses can only form between cells within the same animal but may connect cells across different slices.
In the ERD, renamed foreign keys are shown as red lines with an additional dot node in the middle to indicate that a renaming took place.
Foreign key options¶
Foreign key options are currently in development.
Foreign keys allow the additional options
unique, which can be inserted in square brackets following the arrow.
For example, in the following table definition
rig_id : char(4) # experimental rig --- -> Person
each rig belongs to a person but it does not prevent one person owning multiple rigs.
nullable option, a rig may not belong to anyone when the foreign key attributes for
Person are set to
rig_id : char(4) # experimental rig --- -> [nullable] Person
With the unique option, a person may only appear once in the entire table, which means that no one person can own more than one rig.
rig_id : char(4) # experimental rig --- -> [unique] Person
Finally with both unique and nullable, a rig may or may not be owned by anyone and each person may own up to one rig.
rig_id : char(4) # experimental rig --- -> [unique, nullable] Person
Foreign keys made from the primary key cannot be nullable but may be unique.