Friday, August 3, 2012

Relational Algebra Operations

How important is it to understand the relational algebra?

In the first place, the relational algebra is a part of computer science and so it should be useful thing ;)

Secondly, although the relational algebra is a formal model, it's the basis for database query languages. Widely used SQL is based on it. The study of the relational algebra is supposed to make the comprehension of SQL easier.

Foundations

First of all the relational algebra is a procedural query language. The word "procedural" means that the language consists of operations. The operations are performed on relations (tables or parts of them).

Relational Algebra Operations:
  1. Fundamental Operations
  2. Additional Relational-Algebra Operations
  3. Extended Relational-Algebra Operations

Let's look at them more closely.

Select

The select operation selects tuples that satisfy a given predicate (condition). It's denoted by the letter sigma σ.

Suppose we have table (relation) with persons as it's represented in Figure 1. We want to select the tuples with persons from department "Comp. Sci.".

In the language of the relation algebra we write:

σdept_name="Comp. Sci."(Persons)

From the example we see that the condition appears as a subscript to σ.

Project

The project operation allows us to choose attributes of a relation (table) that we want to list. Projection is denoted by the letter pi Π.

Suppose we want to list all persons' ID, name, but don't want to list other fields.

We write the query in a such way:
ΠID, name(Persons)

Any duplicate tuples will be eliminated!

Union

The union statement allows us to get the union of two queries. It's denoted as U.

Suppose that persons from the Figure 1 can assign tasks for themselves. We have two tables: Tasks and PersonsTasks.

The table Tasks includes the list of tasks that persons can assign to themselves.

The table PersonsTasks includes the information about persons and their tasks.

Suppose we want to take person_id of the persons that assigned tasks to themselves in 6/3/2012 and in 5/1/2012

We write the query in a such way:

Πperson_iddate="6/3/2012"(PersonsTasks)) U
Πperson_iddate="5/1/2012"(PersonsTasks))
Set Difference

The set-difference operation allows to find tuples that are in one relation but are not in another. It's denoted by -.

Suppose we want to find names of persons that have salary over 50000, but they are not from Comp. Sci. department. We will select data form the table Persons that was shown in Figure 1.

Πnamesalary>50000(Persons)) -
Πnamedept_name="Comp. Sci."(Persons))
Cartesian Product

The Cartesian product allows us to get all combinations of rows of two relations.

We can write the Cartesian product of two relations in a such way:

ResultRelation = Tasks × PersonsTasks

For example, let's take tables that were represented in Figure 2 and Figure 3. The result of the Cartesian product of the tables Tasks(ID, description, type) and PersonsTasks(person_id, task_id, date, done) will be ResultRelation that includes fields of the relations Tasks and PersonsTasks.

ResultRelation(Tasks.ID, Tasks.description, Tasks.type, PersonsTasks.person_id, PersonsTasks.task_id, PersonsTasks.date, PersonsTasks.done)

The result relation for this operation appears in Figure 4.

The result is the relation that includes all combinations of tuples from Tasks and PersonsTasks. The Cartesian product operation associates every tuple of Tasks with every tuple of PersonsTasks.

Usually we use the Cartesian project operation with conditions, because in most cases we don't want to get all combinations of tuples. Figure 4 shows all combinations, but they are not useful for us.

For example we got combinations for the person with person_id=10105 with all tasks, and so we got some tasks that the person did't assign for himself. If we want to get the relevant information for each person, we should add an appropriate condition.

σTasks.ID = PersonsTasks.task_id(Tasks × PersonsTasks)

The result is represented on Figure 5.

Rename

Using the rename operation we can rename relations. It's denoted by rho ρ

ρnew_name(relation_name)

There is another way to rename a relation. We can specify attributes (column names) that we want to get in a new relation.

ρnew_name(column_name1, column_name2)

Let's find the highest salaries using the rename operator.

First of all, let's find all salaries except of the maximum salary. And then let's take all salaries from Persons and remove salaries that we found before. In a such way we will get the maximum salary of the persons from Figure 1.

Πsalary(Persons) - 
ΠPersons.salaryPersons.salary < p.salary(Persons × ρp(Persons))

We needed to use the table Persons twice in the Cartesian product and so we renamed one of the tables to differ between them.