ISBN: (print) 9781450346177
Conventional computing techniques diverge widely in large-scale computing. To store and operate on structured data, most data scientists suggest higher-dimensional arrays, especially for the linearization of higher-order data. However, as datasets grow, these structures become prone to performance degradation because they cannot keep up with the increased data velocity. In addition, they require data reallocation. The index array is a dynamic storage scheme that allows the array to extend as needed along the bounds of its dimensions. Every dimension of the index array scheme requires an index to maintain data velocity. In this paper, we propose a scalable storage scheme for the index array that illustrates the dynamic expansion nature of the array. This scheme requires only two indices for any number of dimensions. Hence, simple algorithms can be designed for array operations. The proposed scheme outperforms the conventional structure in terms of memory utilization, index cost and element access.
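To make the reallocation problem concrete, the following is a minimal sketch of conventional row-major linearization, not the scheme proposed in the paper. The function name `row_major_offset` is an assumption introduced for illustration. It shows that extending any dimension other than the first changes the flat offset of existing elements, which is why a conventionally linearized array has to be reallocated when it grows.

```python
from typing import Sequence


def row_major_offset(index: Sequence[int], shape: Sequence[int]) -> int:
    """Map a multi-dimensional index to a flat offset in row-major order.

    Hypothetical helper for illustration only; it models the conventional
    linearization, not the two-index scheme described in the abstract.
    """
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same number of dimensions")
    offset = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {i} is out of bounds for dimension of size {n}")
        # Horner-style accumulation: shift by the dimension size, then add the index.
        offset = offset * n + i
    return offset


# The same logical element (1, 2) before and after extending each dimension.
before = row_major_offset((1, 2), (3, 4))
after_last_dim_grows = row_major_offset((1, 2), (3, 5))
after_first_dim_grows = row_major_offset((1, 2), (4, 4))
```

In this sketch, growing the last dimension moves the element to a different offset, while growing the first dimension leaves it in place. A dynamic scheme aims to avoid moving existing data in either case.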
Introduction: Mena, an Ena/VASP protein family member, is a key actin regulatory protein. Mena is up-regulated in breast cancers and promotes invasion and motility of tumor cells. Mena has multiple splice variants, including Mena invasive (Mena(INV)) and Mena11a, which are expressed in invasive and non-invasive tumor cells, respectively. We developed a multiplex quantitative immunofluorescence (MQIF) approach to assess the fraction of Mena lacking the 11a sequence as a method to infer the presence of invasive tumor cells, represented as total Mena minus Mena11a (called Mena(calc)), and determined its association with metastasis in breast cancer. Methods: The MQIF method was applied to two independent primary breast cancer cohorts (Cohort 1 with 501 patients and Cohort 2 with 296 patients) using antibodies against Mena and its isoform, Mena11a. Mena(calc) was determined for each patient and assessed for association with risk of disease-specific death. Results: Neither total Mena nor Mena11a isoform expression showed a statistically significant association with outcome in either cohort. However, assessment of Mena(calc) showed that relatively high levels of this biomarker are associated with poor outcome in two independent breast cancer cohorts (log rank P = 0.0004 for Cohort 1 and 0.0321 for Cohort 2). Multivariate analysis on the combined cohorts revealed that high Mena(calc) is associated with poor outcome, independent of age, node status, receptor status and tumor size. Conclusions: High Mena(calc) levels identify a subgroup of breast cancer patients with poor disease-specific survival, suggesting that Mena(calc) may serve as a biomarker for metastasis.
Efficient algorithms for mining frequent itemsets are crucial for mining association rules as well as for many other data mining tasks. Methods for mining frequent itemsets have been implemented using a BitTable structure. BitTableFI is a recently proposed, efficient BitTable-based algorithm that exploits BitTable both horizontally and vertically. Although it makes use of efficient bitwise operations, BitTableFI may still suffer from the high cost of candidate generation and testing. To address this problem, a new algorithm, index-BitTableFI, is proposed. index-BitTableFI also uses BitTable horizontally and vertically. To make use of BitTable horizontally, the index array and the corresponding computing method are proposed. By computing the subsume index, the itemsets that co-occur with a representative item can be identified quickly in a single breadth-first search pass. Then, for the resulting itemsets generated through the index array, a depth-first search strategy is used to generate all other frequent itemsets. Thus, a hybrid search is implemented, and the search space is reduced greatly. The advantages of the proposed methods are as follows. On the one hand, redundant operations on the intersection of tidsets and on frequency checking can largely be avoided. On the other hand, it is proved that frequent itemsets that include the representative item and have the same support as the representative item can be identified directly by connecting the representative item with all the combinations of items in its subsume index. Thus, the cost of processing this kind of itemset is lowered, and the efficiency is improved. Experimental results show that the proposed algorithm is efficient, especially for dense datasets. (c) 2008 Elsevier B.V. All rights reserved.
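The idea of a subsume index can be sketched roughly as follows. This is an illustration under stated assumptions, not the published index-BitTableFI procedure: Python integers stand in for BitTable columns, the subsume index of an item is assumed to be the set of other items present in every transaction that contains it, and any item ordering the real algorithm may impose is omitted. The function names `build_bitmaps`, `subsume_index` and `same_support_itemsets` are hypothetical.

```python
from itertools import combinations


def build_bitmaps(transactions):
    """Vertical bitmaps: bit t of bitmaps[item] is set when transaction t contains item."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def subsume_index(bitmaps):
    """For each item, list the other items found in every transaction containing it.

    Assumed definition: `other` is subsumed by `item` when the tidset of `item`
    is a subset of the tidset of `other`, i.e. bits & other_bits == bits.
    """
    return {
        item: sorted(
            other
            for other, other_bits in bitmaps.items()
            if other != item and bits & other_bits == bits
        )
        for item, bits in bitmaps.items()
    }


def same_support_itemsets(item, subsumed):
    """Join the representative item with every non-empty combination of subsumed items.

    Each result has the same support as `item`, so no tidset intersection is needed.
    """
    return [
        (item,) + combo
        for size in range(1, len(subsumed) + 1)
        for combo in combinations(subsumed, size)
    ]


transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "c", "d"}]
bitmaps = build_bitmaps(transactions)
index = subsume_index(bitmaps)
```

Here item `d` occurs only in the last transaction, which also contains `a` and `c`, so `same_support_itemsets("d", index["d"])` yields three itemsets that all share the support of `d` without any further bitmap intersections.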
Because of the inherent computational complexity, mining the complete set of frequent itemsets in dense datasets remains a challenging task. Mining Maximal Frequent Itemsets (MFIs) is an alternative that addresses the problem. The Set-Enumeration Tree (SET) is a common data structure used in several MFI mining algorithms. For this kind of algorithm, the process of mining MFIs can also be viewed as the process of searching the set-enumeration tree. To reduce the search space, this paper proposes a new algorithm, index-MaxMiner, for mining MFIs by employing a hybrid search strategy that blends breadth-first and depth-first search. Firstly, the index array is proposed, and a bitmap-based algorithm for computing the index array is presented. By adding the subsume index to frequent items, index-MaxMiner discovers the candidate MFIs in a single breadth-first search pass, which avoids first-level nodes that would not participate in the answer set and drastically reduces the number of candidate itemsets. Then, for the candidate MFIs, a depth-first search strategy is used to generate all MFIs. Thus, a jumping search in the SET is implemented, and the search space is reduced greatly. The experimental results show that the proposed algorithm is efficient, especially for dense datasets.
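For orientation, a plain depth-first walk of a set-enumeration tree can be sketched as below. This is a baseline illustration only and does not include the subsume index or the breadth-first candidate step of index-MaxMiner. It assumes vertical bitmaps stored as Python integers and treats an itemset as maximal when no frequent proper superset exists. The names `frequent_itemsets_dfs` and `maximal_itemsets` are hypothetical.

```python
def popcount(bits):
    """Number of set bits, i.e. the support of a tidset bitmap."""
    return bin(bits).count("1")


def frequent_itemsets_dfs(bitmaps, min_support):
    """Enumerate frequent itemsets by depth-first search of the set-enumeration tree."""
    items = sorted(i for i, b in bitmaps.items() if popcount(b) >= min_support)
    found = []

    def expand(prefix, prefix_bits, start):
        # Children of a node extend the prefix only with items later in the order,
        # so every itemset is visited at most once.
        for k in range(start, len(items)):
            bits = prefix_bits & bitmaps[items[k]]
            if popcount(bits) >= min_support:
                itemset = prefix + (items[k],)
                found.append(itemset)
                expand(itemset, bits, k + 1)

    # -1 has all bits set in Python, so the root imposes no restriction.
    expand((), -1, 0)
    return found


def maximal_itemsets(frequent):
    """Keep only itemsets that have no frequent proper superset (quadratic filter)."""
    as_sets = [set(s) for s in frequent]
    return [s for s, ss in zip(frequent, as_sets) if not any(ss < other for other in as_sets)]


# Example bitmaps over four transactions: a in {0,1,3}, b in {0,2}, c in all, d in {3}.
example_bitmaps = {"a": 0b1011, "b": 0b0101, "c": 0b1111, "d": 0b1000}
frequent = frequent_itemsets_dfs(example_bitmaps, min_support=2)
maximal = maximal_itemsets(frequent)
```

The quadratic superset filter is chosen for readability. Algorithms such as the one in the abstract aim to prune the tree so that far fewer non-maximal candidates are generated in the first place.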
The set of frequent closed itemsets determines exactly the complete set of all frequent itemsets and is usually much smaller than the latter. This paper proposes an improved algorithm for mining frequent closed itemsets. Firstly, the index array is proposed, which is used for discovering items that always appear together. Secondly, a bitmap-based algorithm for computing the index array is presented. Thirdly, based on the heuristic information provided by the index array, frequent items that co-occur and share the same support are merged. Thus, the initial generators are calculated. Finally, based on the index array, a reduced pre-set and a reduced post-set are proposed. It is proved that the reduced pre-set and reduced post-set not only retain the function of the pre-set and post-set, but also have smaller sizes. Therefore, the redundant items in the pre-set and post-set are deleted, which saves a lot of work related to inclusion checks. The experimental results show that the proposed algorithm is efficient, especially on dense datasets.
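The merging step can be illustrated with a short sketch. It assumes that items which always appear together have identical transaction bitmaps, so grouping frequent items by bitmap merges exactly those items. This is a simplified reading of the abstract, not the paper's actual procedure, and the function name `merge_equivalent_items` is hypothetical.

```python
from collections import defaultdict


def merge_equivalent_items(bitmaps, min_support):
    """Group frequent items whose transaction bitmaps are identical.

    Items in one group occur in exactly the same transactions, so they
    co-occur and share the same support. Each group is a candidate
    initial generator under the assumption stated above.
    """
    groups = defaultdict(list)
    for item, bits in bitmaps.items():
        if bin(bits).count("1") >= min_support:
            groups[bits].append(item)
    return sorted(tuple(sorted(group)) for group in groups.values())


# Items a and b occur in the same three transactions; d is infrequent.
example = {"a": 0b0111, "b": 0b0111, "c": 0b1111, "d": 0b1000}
generators = merge_equivalent_items(example, min_support=2)
```

Using the bitmap itself as the dictionary key keeps the grouping to a single pass over the items.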