# Operators¶

## Relational operators¶

[Data queries](Query basics) have the form of expressions using operators to derive the desired relation. The expressions themselves do not contain any data. They represent the desired data symbolically.

Once a query is formed, the [fetch](Fetching) methods are used to bring the data into the MATLAB or Python workspace. Since the expressions are only symbolic representations, repeated `fetch`

calls may yield different results as the state of the database is modified.

DataJoint implements a complete algebra of operators on relations:

operator | notation | meaning |
---|---|---|

restriction | `A & cond` |
The subset of relation `A` the meets condition `cond` |

difference | `A - cond` |
The subset of relation `A` the does not meet condition `cond` |

join | `A * B` |
Combines all matching information from `A` and `B` |

left join | `A ** B` (Python) `A .* B` (MATLAB) |
Combines matching information from `B` into `A` |

union | `A + B` |
All unique tuples from `A` and `B` |

projection | `A.proj(...)` |
Selects and renames attributes from `A` or computes new attributes |

aggregation | `A.aggr(B, ...)` |
Same as projection but allows computations based on matching information in `B` |

## Principles of relational algebra¶

DataJoint’s relational algebra improves upon the classical relational algebra and upon other query languages to simplify and enhance the construction and interpretation of precise and efficient data queries.

The clarity of DataJoint’s query expressions stems from the concept of [entity integrity](Entity integrity). Entity integrity states that every relation must have a well-defined *primary key*. In other systems, this concept applies to *base relations*, which store the data in the database. DataJoint extends entity integrity to *derived relations* too.

1. **Purely relational**: Data are represented and manipulated in the form of *relations*.
1. **Algebraic closure**: All relational operators operate on relations and yield relations. Thus relational expressions may be used as operands in other expressions or may be assigned to variables to be used in other expressions.
1. **Attributes are identified by names.** All attributes of relations have well-defined names. This includes derived relations resulting from relational operators. Relational operators use attribute names to determine how to perform the operation. The order of the attributes in relations is not significant.
1. **All relations have a primary key.** This includes derived relations resulting from relational operators, for which the primary key is properly derived from the primary keys of the operands in expressions. Relational operators use the information about the operands’ primary keys to define the query.

## Matching tuples¶

Binary relational operators in DataJoint are based on the concept of *matching tuples* and we will use this phrase throughout.

Two tuplesmatchwhen they have no common attributes or when their common attributes contain the same values.

Here *common attributes* are those that have the same names in both tuples. It is usually assumed that the common attributes are of compatible data types to allow equality comparisons.

Another way to phrase the same definition is

Two tuplesmatchwhen they have no common attributes whose values differ.

It may be conceptually convenient to imagine that all relations always have an additional invisible attribute, `omega`

whose domain comprises only one value, 1. Then the definition of matching tuples is simplified:

Two tuplesmatchwhen their common attributes contain the same values.

Matching tuples can be *merged* into a single tuple without any conflicts of attribute names and values.

## Join compatibility¶

All binary operators with other relations as its two operands require that their operands be *join-compatible*, which means that:

- All common attributes in both operands (attributes with the same name) must be part of the primary key or of a foreign key.
- All common attributes in the two relations must be of a compatible datatype for equality comparisons.

These restrictions are introduced both for performance reasons and for conceptual reasons. For performance, they encourage queries that rely on indexes. For conceptual reasons, they encourage database design in which entities in different relations are lated to each other by the use of primary keys and foreign keys.