Relational Algebra Part 1 Much of the material presented in these slides was developed by Dr. Ramon Lawrence at the University of Iowa Slideshow and powerpoint viewer: Relational Algebra A

Relational Algebra A query language is used to update and retrieve data that is stored in a data model. Relational algebra is a set of relational operations for retrieving data. Every relational operator takes as input one or more relations and produces a relation as output. Just like algebra with numbers, relational algebra consists of operands (which are relations) and a set of operators. Closure property - input is relations, output is relations Unary operations - operate on one relation Binary operations - have two relations as input A sequence of relational algebra operators is called a relational algebra expression, i.e., an expression may contain more than one relational algebra operator.

Selection Operation The selection operation is a unary operation that takes in a relation as input and returns a new relation as output that contains a subset of the tuples of the input relation. That is, the output relation has the same number of columns as the input relation, but may have less rows. To determine which tuples are in the output, the selection operation has a specified condition, called a predicate, that tuples must satisfy to be in the output. The predicate is similar to a condition in an if statement.

Selection Operation Formal Definition The selection operation on relation R with predicate F is denoted by σF(R). σF(R)={t | t R and F(t) is true} where R is a relation, t is a tuple variable F is a formula (predicate) consisting of operands that are constants or attributes comparison operators: <, >, =, ≠, ≤, ≥ logical operators: AND, OR, NOT

Selection Questions Write the relational algebra expression that: 1) Returns all employees working on project P2. 2) Returns all employees who are working as a manager on a project. 3) Returns all employees that have been working as a manager for more than 40 months. Show the resulting relation for each case.

Projection Operation The projection operation is a unary operation that takes in a relation as input and returns a new relation as output that contains a subset of the attributes of the input relation and all non-duplicate tuples. The output relation has the same number of tuples as the input relation unless removing the attributes caused duplicates to be present. Question: When are we guaranteed to never have duplicates when performing a projection operation? Besides the relation, the projection operation takes as input the names of the attributes that are to be in the output relation.

Projection Operation Formal Definition The projection operation on relation R with output attributes A1,…,Am is denoted by ΠA1,…,Am(R). ΠA1,…,Am(R)={t[A1,…, Am] | t R} where R is a relation, t is a tuple variable {A1,…,Am} is a subset of the attributes of R over which the projection will be performed. Order of A1,…, Am fixes their order in the result. Cardinality of ΠA1,…,Am(R) is not necessarily the same as R because of duplicate removal.

Projection Questions Write the relational algebra expression that: 1) Returns only attributes resp and dur. 2) Returns only eno. 3) Returns only pno. Show the resulting relation for each case.

Union Union is a binary operation that takes two relations R and S as input and produces an output relation that includes all tuples that are either in R or in S or in both R and S. Duplicate tuples are eliminated. General form: R S = {t | t R or t S} where R, S are relations, t is a tuple variable. R and S must be union-compatible. To be unioncompatible means that the relations must have the same number of attributes with the same domains.

Set Difference Set difference is a binary operation that takes two relations R and S as input and produces an output relation that contains all the tuples of R that are not in S. General form: R – S = {t | t R and t∉S} where R and S are relations, t is a tuple variable. Note that: R–S≠S–R R and S must be union compatible.

Intersection Intersection is a binary operation that takes two relations R and S as input and produces an output relation which contains all tuples that are in both R and S. General form: R ∩ S = {t | t R and t S} where R, S are relations, t is a tuple variable. R and S must be union-compatible. Note that R ∩ S = R – (R – S) = S – (S – R).

Summarize Operator and Aggregate Functions Aggregate functions take a collection of values (i.e., a relation?) and return a single value as a result. avg: average value min: minimum value max: maximum value sum: sum of values count: number of values The summarize operation in relational algebra G1 ,G2 ,,Gn F ( A ),F ( A ,,F ( A ) (E ) 1 1 2 2) n n E is any relational-algebra expression G , G …, G is a list of attributes on which to group (can be 1 2 n empty) Each F is an aggregate function i Each Ai is an attribute name

Aggregate Functions (Cont.) Result of aggregation does not have a name Can use rename operation to give it a name For convenience, we permit renaming as part of aggregate operation branch_name g sum(balance) as sum_balance (account)