Consequences Of Sequential Pattern Mining


Because our method relies on sequences of operations, i.e., which sequence of read operations must be performed before an update operation and which sequence of write operations must follow that same update operation, it is intuitively similar to the problem of sequential pattern mining. However, applying a sequential pattern mining algorithm to the database log alone yields only sequential patterns that mix read and write operations, and these patterns do not necessarily reflect the essential data correlations in a database system. Moreover, the mined sequences are hard to apply directly to detecting malicious transactions. After analyzing this problem carefully, we found that by adding a rule generation algorithm, the sequential pattern discovery algorithm can be used to generate the classification rules we need.
We divide the problem of discovering data dependencies into three phases: the sequential pattern discovery phase, the read and write sequence set generation phase, and the data dependency rule generation phase.

4.2.1 Sequential Pattern Discovery Phase

Consider the 10 example transactions shown in Table 1. Here r(x) and w(x) denote read and write operations on data item x, respectively; without loss of generality, each data item in the database is represented by an integer. With the minimum support set to 20%, i.e., a minimum support of 3 transactions, Table 2 lists the 13 desired sequential patterns that satisfy the support constraint. For example, one of these sequential patterns is supported by transactions 1, 4, 9, and 10. An example of a sequence that does not satisfy the minimum support is the sequence that is only supported by trans...
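To make the support constraint concrete, the following Python sketch counts how many transactions contain a candidate sequence as an ordered (not necessarily contiguous) subsequence, and grows frequent sequences one operation at a time in the style of a simple level-wise (Apriori/GSP-like) miner. This is an illustration, not the paper's algorithm, which this excerpt does not name. The transactions are hypothetical (Table 1 is not reproduced here), operations are encoded as tuples such as ("r", 3) for r(3), and the threshold is passed as an absolute transaction count, min_count=3. The sketch returns every frequent sequence; any further filtering used to arrive at the patterns of Table 2 is not shown.

```python
from itertools import chain

# An operation is a (kind, item) tuple: ("r", 3) stands for r(3), ("w", 5) for w(5).


def is_subsequence(pattern, transaction):
    """True if the ops of `pattern` occur in `transaction` in the same order (gaps allowed)."""
    it = iter(transaction)
    # `op in it` advances the shared iterator, so each op must appear after the previous one.
    return all(op in it for op in pattern)


def support(pattern, transactions):
    """Number of transactions that contain `pattern` as a subsequence."""
    return sum(1 for t in transactions if is_subsequence(pattern, t))


def mine_sequential_patterns(transactions, min_count):
    """Level-wise miner: returns {pattern: support} for every sequence with support >= min_count."""
    ops = sorted(set(chain.from_iterable(transactions)))
    frequent_ops = [op for op in ops if support((op,), transactions) >= min_count]
    level = [(op,) for op in frequent_ops]
    result = {}
    while level:
        for pattern in level:
            result[pattern] = support(pattern, transactions)
        # Every prefix of a frequent sequence is itself frequent, so every frequent
        # sequence of the next length is a one-op extension of the current level.
        candidates = [p + (op,) for p in level for op in frequent_ops]
        level = [c for c in candidates if support(c, transactions) >= min_count]
    return result


def fmt(pattern):
    return "<" + ", ".join(f"{kind}({item})" for kind, item in pattern) + ">"


# Hypothetical transactions (not Table 1 of the paper).
transactions = [
    [("r", 3), ("w", 5), ("w", 4)],
    [("r", 3), ("r", 6), ("w", 5), ("w", 4)],
    [("r", 6), ("w", 4)],
    [("r", 3), ("w", 5), ("w", 4)],
    [("r", 6), ("r", 3), ("w", 5)],
]

patterns = mine_sequential_patterns(transactions, min_count=3)
for pattern, count in patterns.items():
    if len(pattern) >= 2:  # print only multi-operation patterns
        print(fmt(pattern), count)
# <r(3), w(4)> 3
# <r(3), w(5)> 4
# <w(5), w(4)> 3
# <r(3), w(5), w(4)> 3
```

Here <r(6), w(4)> is rejected because only two of the five hypothetical transactions support it. The brute-force rescans keep the sketch short; practical miners such as GSP or PrefixSpan prune candidates far more aggressively.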

... middle of paper ...

...j1), $w(d_{j_2}), w(d_{j_3}), \ldots, w(d_{j_k})\rangle$ to the write sequence set of data item $d_i$, where $\{w(d_{j_1}), w(d_{j_2}), w(d_{j_3}), \ldots, w(d_{j_k})\}$ is the set of all write operations after $w(d_i)$.

Table 3 shows the read and write sequence sets generated by the above method from the sequential patterns mined in Table 2. For example, one sequence denotes that before data item 4 is updated, data item 6 should be read, while another denotes that before data item 4 is updated, data items 7 and 6 should be read, in that order. Which of these two sequences represents the more accurate dependency can be determined by analyzing rweight(3, {5}) and rweight(4, {6, 5}), as illustrated in the next subsection. The write sequence set contains only one sequence, which denotes that after data item 5 is updated, data item 4 should be updated.
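A minimal Python sketch of the read and write sequence set generation phase follows. Because the excerpt preserves only the tail of the write-set rule, the details are assumptions: for every write w(d_i) in a mined pattern, (a) the read operations preceding it, followed by w(d_i), form one entry in the read sequence set of d_i, and (b) w(d_i) followed by all write operations after it forms one entry in the write sequence set of d_i; entries with no preceding reads (or no later writes) are skipped. The function name generate_sequence_sets and the input patterns are hypothetical and are not the contents of Table 2.

```python
from collections import defaultdict

# Operations are (kind, item) tuples, e.g. ("r", 6) for r(6) and ("w", 4) for w(4).


def generate_sequence_sets(patterns):
    """Hypothetical helper: build read/write sequence sets keyed by the written data item.

    Assumed rules (the excerpt preserves only part of the write-set rule):
      * read sequence of d_i  = reads before w(d_i) in the pattern, then w(d_i)
      * write sequence of d_i = w(d_i), then all writes after it in the pattern
    """
    read_sets = defaultdict(set)
    write_sets = defaultdict(set)
    for pattern in patterns:
        for pos, (kind, item) in enumerate(pattern):
            if kind != "w":
                continue  # sequences are anchored on update (write) operations
            reads_before = tuple(op for op in pattern[:pos] if op[0] == "r")
            writes_after = tuple(op for op in pattern[pos + 1:] if op[0] == "w")
            if reads_before:
                read_sets[item].add(reads_before + (("w", item),))
            if writes_after:
                write_sets[item].add((("w", item),) + writes_after)
    return dict(read_sets), dict(write_sets)


def fmt(seq):
    return "<" + ", ".join(f"{kind}({item})" for kind, item in seq) + ">"


# Hypothetical mined sequential patterns (not the contents of Table 2).
patterns = [
    (("r", 3), ("w", 5), ("w", 4)),
    (("r", 6), ("w", 4)),
    (("r", 7), ("r", 6), ("w", 4)),
]

read_sets, write_sets = generate_sequence_sets(patterns)
for item in sorted(read_sets):
    print(f"read sequence set of {item}:", sorted(fmt(s) for s in read_sets[item]))
for item in sorted(write_sets):
    print(f"write sequence set of {item}:", sorted(fmt(s) for s in write_sets[item]))
# read sequence set of 4: ['<r(3), w(4)>', '<r(6), w(4)>', '<r(7), r(6), w(4)>']
# read sequence set of 5: ['<r(3), w(5)>']
# write sequence set of 5: ['<w(5), w(4)>']
```

Keying both maps by the written item mirrors the "read/write sequence set of data item d_i" wording, and storing sequences in sets drops duplicates when several mined patterns yield the same sequence. Choosing between competing entries such as <r(6), w(4)> and <r(7), r(6), w(4)> is left to the rule generation phase.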
