Notes On Data Dependency Terminologies

4.1 Data Dependency Terminologies

Because our overall goal is to discover data dependencies related to the sequences of operations performed by transactions, we first define a sequence in our context.

Definition 1: A sequence is an ordered list of read and/or write operations. We denote a sequence s by <o1(d1), o2(d2), …, on(dn)>, where each oi ∈ {r, w} is a read or write operation and each dk is a data item, 1 ≤ i, k ≤ n. D(s) represents the set of data items contained in the sequence, i.e., D(s) = {d1, d2, …, dn}. The support for a sequence is defined as the fraction of all transactions that contain this sequence.
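Definition 1 can be made concrete with a small sketch. It assumes that a transaction "contains" a sequence when the operations of the sequence occur in the transaction in the same order, with other operations allowed in between; the definition does not settle this point, so treat it as one possible reading. The names Op, contains, data_items and support, and the four-transaction log, are made up for illustration.

```python
from typing import List, Sequence, Set, Tuple

# One operation: ("r", item) for a read, ("w", item) for a write.
Op = Tuple[str, str]


def contains(transaction: Sequence[Op], seq: Sequence[Op]) -> bool:
    """True if seq occurs in transaction in order (gaps allowed; an assumption)."""
    remaining = iter(transaction)
    # Each membership test consumes the iterator, so order is enforced.
    return all(op in remaining for op in seq)


def data_items(seq: Sequence[Op]) -> Set[str]:
    """D(s): the set of data items that appear in the sequence."""
    return {item for _, item in seq}


def support(transactions: List[Sequence[Op]], seq: Sequence[Op]) -> float:
    """Fraction of all transactions that contain seq."""
    if not transactions:
        return 0.0
    return sum(contains(t, seq) for t in transactions) / len(transactions)


# Hypothetical log of four transactions, invented for this example.
log = [
    [("r", "a"), ("r", "b"), ("w", "x")],
    [("r", "a"), ("r", "c"), ("r", "b"), ("w", "x")],
    [("r", "b"), ("r", "a"), ("w", "x")],  # reads are in the wrong order
    [("r", "a"), ("w", "y")],
]
s = [("r", "a"), ("r", "b"), ("w", "x")]

print(sorted(data_items(s)))  # ['a', 'b', 'x']
print(support(log, s))        # 0.5
```

Only the first two transactions contain s under this reading, so its support is 2 out of 4.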

Read sequence and write sequence are employed to define read and write dependencies respectively.
Definition 2: The Read Sequence of data item x is a sequence of the form <r(d1), r(d2), …, r(dn), w(x)>, which represents that the transaction may need to read all data items d1, d2, …, dn in this order before it updates data item x. Note that each data item may have several read sequences of different lengths. Together, these sequences are called the Read Sequence Set of the data item.

The notation rs(x) is used to denote the read sequence set of data item x. For example, consider the following update statement in a transaction.
Update Table1 set x = a + b + c where d = 90;
In this statement, before updating x, the values of a, b, c and d must be read and then the new value of x is calculated. So <r(a), r(b), r(c), r(d), w(x)> ∈ rs(x).
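The way a read sequence set could be collected from a transaction's operations can be sketched as follows. This is an illustration only: it assumes the reads that belong to a write are exactly those seen since the previous write in the same transaction, and the function name read_sequences and the operation list txn are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# One operation: ("r", item) for a read, ("w", item) for a write.
Op = Tuple[str, str]


def read_sequences(transaction: Sequence[Op]) -> Dict[str, List[Tuple[Op, ...]]]:
    """Map each written item x to the read sequences found for it.

    Assumption: the reads attached to a write are those observed since
    the previous write in the same transaction.
    """
    result: Dict[str, List[Tuple[Op, ...]]] = {}
    pending_reads: List[Op] = []
    for kind, item in transaction:
        if kind == "r":
            pending_reads.append((kind, item))
        elif kind == "w":
            if pending_reads:
                seq = tuple(pending_reads) + (("w", item),)
                result.setdefault(item, []).append(seq)
            pending_reads = []  # start collecting reads for the next write
    return result


# Hypothetical operation list for the update statement on Table1.
txn = [("r", "a"), ("r", "b"), ("r", "c"), ("r", "d"), ("w", "x")]
rs = read_sequences(txn)
print(rs["x"])
# [(('r', 'a'), ('r', 'b'), ('r', 'c'), ('r', 'd'), ('w', 'x'))]
```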
It must be noted that the database log only contains before and after images of x instead of the mathematical operation used for calculating x, i.e., x = a + b + c. The above example is only for illustrating the concept of read sequence. The database log containing the above transaction ma...

... middle of paper ...

...A pre-set threshold is used to identify whether a dependency is weak or strong.

For example, suppose the predefined threshold for the weight of a data dependency is 40%. For the sequence <r(a), r(b), r(c), r(d), w(x)>, if the probability of reading {a, b, c, d} before x is updated is 75%, then r weight(x, {a, b, c, d}) is equal to 75%. Since this is larger than the threshold, we say that the dependency between x and {a, b, c, d} is strong.
Figure 1 illustrates an example data dependency. Data item x has read dependency relationships with {a, b, c, d}, {c, d}, and {x, e, f}. It also has write dependency relationships with {y, z} and {u, v}. Suppose the predefined threshold for the weight of a data dependency is 40%. Then, among the read dependencies, only {a, b, c, d} has a strong data dependency with x. Similarly, among the write dependencies, only {u, v} has a strong data dependency with x.
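The threshold test from the 40% and 75% example can be written as a short sketch. Two points are assumptions rather than part of the text: the weight is estimated here as a simple relative frequency over the transactions that update x, and a weight exactly equal to the threshold is treated as weak, since the text only covers the "larger than" case. The function names and the counts 3 and 4 are hypothetical.

```python
def estimate_read_weight(updates_preceded_by_reads: int, total_updates: int) -> float:
    """Estimate the probability that the reads precede an update of x.

    Assumption: a plain relative frequency over the transactions that update x.
    """
    if total_updates == 0:
        return 0.0
    return updates_preceded_by_reads / total_updates


def classify(weight: float, threshold: float = 0.40) -> str:
    """Strong when the weight is larger than the threshold, otherwise weak."""
    # Equality is classified as weak here; the text does not cover that case.
    return "strong" if weight > threshold else "weak"


# Hypothetical counts: 3 of 4 updates of x were preceded by reads of a, b, c, d.
w = estimate_read_weight(3, 4)
print(w, classify(w))  # 0.75 strong
```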
