![]() |
![]() |
![]() |
DCI: A hybrid Algorithm for Frequent Set Counting | |
AbstractIn this pages we describe DCI, a hybrid, multi-strategy algorithm for solving the Frequent Set Counting problem. Similarly to Apriori, DCI employs a level-wise technique to identify frequent sets. At iteration k, candidate k-itemsets are determined, and their support discovered by counting itemset occurrences within the transaction database. DCI uses an innovative method for storing candidate itemsets and counting their support, and by exploiting effective pruning techniques which reduce the size of the dataset as execution progresses. Moreover, as soon as the pruned dataset becomes small enough to fit into the main memory, DCI builds on the fly a vertical transaction database, and starts using an efficient intersection-based technique to determine the support of larger itemsets. The most important innovation in DCI resides on a novel counting inference strategy, based on a previously known result by Basted et al. which introduced the concept of key-pattern. When possible, locality of data and pointer dereferencing were optimized due to their importance with respect to developments in computer architectures. We thus conducted in-depth experimental evaluations by taking into account not only execution time, but also virtual memory usage and I/O activity. The experimental results, conducted on a Pentium-based workstation with synthetically generated datasets as well as real-life datasets, show that DCI is more efficient than other algorithms, in most cases. We obtained very good results for datasets characterized by both short and long length frequent patterns. | |
|
For questions regarding this pages please contact paolo palmerini. ©High Performance Computing Lab. at ISTI-CNR., University of Venice, | |