Relational Data Structure

Relation
A relation is a table with columns and rows.
 
Attribute
An attribute is a named column of a relation.
 
Domain
A domain is a set of allowable values for one or more attributes.
 
Tuple
A tuple is a row of a relation.
 
Degree
The degree of a relation is the number of attributes it contains.
 
Cardinality
A cardinality of a relation is the number of tuples it contains.
 
Relational database
A collection of normalized relations with distinct relation names.
 

Alternative terminology for relational model
 Formal term   Alternative 1   Alternative 2 
Relation Table File
Tuple Row Record
Attribute Column Field

Relation schema
A named relation defined by a set of attributes and domain name pairs.
 

Example: Let A1, A2, ..., An be attributes with domains D1, D2, ..., Dn. Then the set S = {A1:D1, A2:D2, ..., An:Dn} is a relation schema. A relation R defined by the relational schema S is a set of mappings from the attribute names to their corresponding domains. Thus, the relation R is a set on n-tuples:
    (A1:d1, A2:d2, ..., An:dn) such that di belongs to Di
Relation database schema
A set of relation schemas, each with a distinct name.
 

Properties of a relation:

Relational Keys

Superkey
An attribute, or a set of attributes, that uniquely identifies a tuple within a relation.
 
Candidate key
A superkey such that there is no proper subset is a superkey in the relation.
 
A candidate key, K, for a relation R has two properties:
  • uniqueness - in each tuple in R, the values of K uniquely identify that tuple;
  • irreducibility - no proper subset of K has the uniqueness property.

    Primary key
    The candidate key that is selected to identify tuples uniquely within the relation.
     
    Foreign key
    An attribute, or set of attributes, within one relation that matches the candidate key of some (possibly the same) relation.
     

    Relational Integrity

    Every data model has a set of integrity rules, which ensures that the data is accurate. Since every attribute has an associated domain, there are constraints (called domain constraints) that form restrictions on the set of values allowed for the attributes of relations.

    In addition, there are two important integrity rules, which are constraints or restrictions that apply to all instances of the database. The two principal rules for the relational model are known as entity integrity and referential integrity.

    Null
    represents a value for an attribute that is currently unknown or is not applicable for this tuple.
     
    Entity integrity
    in a base relation, no attribute of a primary key can be Null.
     
    Referential integrity
    If a foreign key exists in a relation, either the foreign key value must match a candidate key value of some tuple in its home relation or the foreign key value must be wholly Null.
     
    Enterprise constraints
    Additional rules specified by the users or database administrators of a database.
     

    Entity-Relationship Model

    Introduction

    One of the most difficult aspects of a database design is the fact that designers, programmers, and end-users tend to view data and its use in different ways. Unfortunately, unless we gain a common understanding that reflects how the enterprise operates, the design we produce will fail to meet the users' requirements. To ensure that we get a precise understanding of the nature of the data and how it is used, we need to have a model for communication that is non-technical and free of ambiguities. The Entity-Relationship model is one such example.

    ER model is a top-down approach to database design that begins by identifying the important data called entities and relationships between the data that must be represented in the model.

    Entity type
    A group of objects with the same properties, which are identified by the enterprise as having an independent existence. (The basic concept of the ER model, which represents a group of "objects" in the "real world" with the same properties.)
     
    Entity occurrence
    A uniquely identifiable objects of an entity type.
      Example: Branch, Department, Staff, Skills
    Relationship type
    A set of meaningful associations among entity types (each relationship is given a name that describes its functions).
      Example: Branch Has Department
    Relationship occurrence
    A uniquely identifiable association, which includes one occurrence from each participating entity types.
     
    Degree of a relationship type
    The number of participating entity types in a relationship. (binary, ternary, quaternary).
      Example of a quaternary relation: A solicitor arranges a bid on behalf of a buyer supported by a financial institute.
    Recursive relationship
    A relationship type where the same entity participates more than once in different roles. (Relationships may be given role names to indicate the purpose that each entity type plays in a relationship.)
      Example:
    Staff (Supervisor) Supervises Staff (Supervisee).
    Attribute
    A property of an entity or a relationship.
     

    Attributes can be classified as:

  • simple or composite;
  • single-valued or multi-valued;
  • or derived.

    Attribute domain
    A set of available values for one or more attributes.
     
    Simple (or atomic) attribute
    An attribute composed of a single component with an independent existence. (Simple attribute can not be further subdivided into small components.)
      Example: Salary.
    Composite attribute
    An attribute composed of multiple components, each with different existence. (Can be further divided.)
      Example: Address can be subdivided into Street, City, State, ZipCode.
    Single-valued attribute
    An attribute that holds a single value for each occurrence of an entity type.
     
    Multi-valued attribute
    An attribute that holds multiple values for each occurrence of an entity type.
      Example: Phones (of an office), Authors (of a book), Skills (of an employee).
    Derived attribute
    An attribute that represents a value that is derivable from the values of a related attribute or a set of attributes, not necessarily in the same entity.
     

    Keys

    Candidate key
    The minimal set of attributes that uniquely identifies each occurrence of an entity type.
     
    Primary key
    The candidate key that is selected to uniquely identify each occurrence of an entity type.
     
    Composite key
    A candidate key that consists of more than one attribute.
     
    Note: Relationship also may have attributes.
    Example: Relationship: Parts SoldTo Client may have attributes When or/and HowMany.

    Structural constraints

    These constraints should reflect the restrictions on the relationships as perceived in the "real world". The main type of constraints on relationships is called multiplicity.

    Multiplicity
    The number (or range) of possible occurrences of an entity type that may relate to a single occurrence of an associated entity type through a particular relationship.
     
    Binary relationship are generally referred to as being:
  • one-to-one (1:1)
  • one-to-many (1:*)
  • many-to-many (*:*)
    Examples:
  • Department Has Manager (1:1)
  • Department Employees Staff (1:*)
  • Part SoldTo Client (*:*)

    Functional Dependencies

    One of the main concepts associated with normalization is functional dependency, which describes the relationship between attributes.

    Characteristics of Functional Dependencies

    Assume, that a relational schema has attributes (A, B, C, D, ..., Z) and that the whole database is described by a single universal relation called R = (A, B, C, D, ..., Z).
    Functional dependency
    Describes the relationship between attributes in a relation. For example, if A and B are attributes of relation R, B is functionally dependent on A (denote A->B), if each value of A is associated with exactly one value of B. (A and B may each consist of one or more attributes.)
     
    Determinant
    Refers to the attribute or a set of attributes on the left-hand side of the arrow of a functional dependency.
      See example.

    Inference Rules for Functional Dependencies

    Closure of X
    is the set of all functional dependencies that are implied by a given set of functional dependencies X (written X+).
     

    A set of inference rules, called Armstrong's axioms, specifies how new functional dependencies can be inferred from given ones.

    Name Explanation Example
    Reflexivity if B is a subset of A, then A->B sName,Position->Position
    Augmentation if A->B, then A,C->B,C bAddress->branchNo => bAddress,Position->branchNo,Position
    Transitivity
    If  A->B , then A->C
    and
    B->C
    staffNo->branchNo
    branchNo->bAddress
     => staffNo->bAddress
    Note that each of these three rules can be directly derived from the definition of functional dependency. The rules are complete in that given a set X of functional dependencies, all functional dependencies implied by X can be derived from X using these rules. In other words, the rules can be used to derived the closure of X+.
    Several further rules can be derived from the three given above that simplify practical task of computing X+.
    Self-determination A->A  
    Decomposition
    If A->B,C, then  A->B
    and
    A->C
    staffNo->Position,Salary  =>  staffNo->Position
    and
    staffNo->Salary
    Union
    If  A->B
    and
    A->C
    , then A->B,C
    staffNo->Position
    and
    staffNo->Salary
      => staffNo->Position,Salary
    Composition
    If  A->B
    and
    C->D
    , then A,C->B,D
    staffNo->sName
    and
    branchNo->bAddress
      => staffNo,branchNo->Position,bAddress

    Minimal Set of Functional Dependencies

    Definition 1:
    A set of functional dependencies Y is covered by a set of functional dependencies X, if every functional dependency in Y is also in X+.
     
    Definition 2:
    A set of functional dependencies X is set to be equivalent to a set of functional dependencies Y if X is covered by Y and Y is covered by X.
     
    Definition 3:
    A set of functional dependencies X is minimal if it satisfies the following conditions:
  • Every dependency in X has a single attribute on its right-hand side.
  • We can not replace any dependency A->B in X with dependency C->B, where C is a proper subset of A, and still have a set of dependencies that is equivalent to X.
  • We can not remove any dependency from X and still have a set of dependencies that is equivalent to X.
     
    Definition 4: Full Functional Dependency.
    Indicates that if A and B are attributes of a relation, B is fully functionally dependent on A if
  • B is functionally dependent on A,
  • but not on any proper subset of A
    Definition 5: Transitive dependency.
    A condition where A, B, and C are attributes of a relation such that A->B and B->C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
     

    Normalization

    Normalization is a formal technique for analyzing relations based on their primary key (or candidate key) and functional dependencies (Codd, 1972). The technique involves a series of rules that can be used to test individual relations so that a database can be normalized to any degree. When a requirement is not met, the relation violating the requirement must be decomposed into relations that individually meet the requirements of normalization.

     First Normal Form
       Second Normal Form  
       Third Normal Form  
       Boyce-Codd Normal Form  
       Fourth Normal Form  
       Fifth Normal Form  
     
     
     
     
     

    First Normal Form

    A relation is in First Normal Form if the intersection of of each row and column contains one and only one value.
    The domain of attributes must include only atomic (simple, indivisible) values and the value of any attribute in a tuple must be a single value from the domain of that attribute.
    Example.

    Second Normal Form

    A relation is in Second Normal Form if it is in 1NF and every non-primary-key attributes is fully functionally dependent on the primary key.
    Example

    Third Normal Form

    A relation is in Third Normal Form if it is in 2NF and no non-primary-key attribute is transitively dependent on the primary key.
    Example

    Boyce-Codd Normal Form

    A relation is in BCNF, if and only if, every determinant is a candidate key.
    Example

    Before moving to higher level normal forms (which are not used very often) let's review the normalization process.

    Fourth Normal Form

    Multi-valued dependency (MVD)
    Represents a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set values for C. However, the set of values for B and C are independent of each other.
     
    We represent a MVD between attributes A, B, and C using the following notation:
     A ->> B
     A ->> C
    
    A multi-valued dependency A ->>B can be trivial if
    (a) B is a subset A or
    (b) A union B = R
    and nontrivial otherwise.
    A relation is in 4NF if it is in BCNF and contains no nontrivial multi-valued dependencies.
    Example

    Fifth Normal Form

    Lossless-join dependency
    A property of decomposition, which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
     
    Join dependency
    Describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, a relation R satisfies a join dependency if, and only if, every legal value of R is equal to the join of its projections on A, B, ..., Z.
     

    We've seen a lot of examples where we decompose a relation into two relations. However, there are cases where we require to perform a lossless-join decompose of a relation into more than two relations. These cases are the focus of the lossless-join dependency and fifth normal form.

    A relation is in 5NF if it has no join dependency.
    Example