The performance of parallel distributed data management systems becomes increasingly important with the rise of Big Data. Parallel joins have been widely studied in both the parallel processing and the database communities. Nevertheless, most of the algorithms developed so far do not consider data skew, which naturally exists in various applications. State-of-the-art methods designed to handle this problem are based on extensions to either of the two prevalent conventional approaches to parallel joins: the hash-based and duplication-based frameworks. In this paper, we introduce a novel parallel join framework, query-based distributed join (QbDJ), for handling data skew on distributed architectures. Further, we present an efficient implementation of the method based on the asynchronous partitioned global address space (APGAS) parallel programming model. We evaluate the performance of our approach on a cluster of 192 cores (16 nodes) and datasets of 1 billion tuples with different skews. The results show that the method is scalable, and also runs faster with less network communication than the state-of-the-art PRPD approach in [1] under high data skew.
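To make the skew problem concrete, the following is a minimal sketch of why plain hash partitioning suffers under skewed keys. It is not the QbDJ method from the abstract; the function names, the modulo-based placement, and the toy key sets are assumptions chosen only for illustration.

```python
def hash_partition(keys, num_nodes):
    """Assign each tuple to a node by key modulo num_nodes.

    Returns the number of tuples each node receives. Integer keys and
    modulo placement are illustrative assumptions, not the paper's scheme.
    """
    loads = [0] * num_nodes
    for key in keys:
        loads[key % num_nodes] += 1
    return loads


def imbalance(loads):
    """Ratio of the busiest node's load to the mean load (1.0 = balanced)."""
    return max(loads) / (sum(loads) / len(loads))


# Uniform keys: every key appears exactly once.
uniform_keys = list(range(1000))
# Skewed keys: half of all tuples share the single key 7.
skewed_keys = [7] * 500 + list(range(500))

uniform_loads = hash_partition(uniform_keys, 4)
skewed_loads = hash_partition(skewed_keys, 4)

print(uniform_loads, imbalance(uniform_loads))
print(skewed_loads, imbalance(skewed_loads))
```

In this toy setting, all tuples with the hot key land on one node, so that node becomes the bottleneck of a hash-based join regardless of how many nodes are added.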
We present in this paper a security-driven solution for scheduling of N independent jobs on M parallel machines that minimizes three different objectives simultaneously, namely the failure probability, the total compl...
ISBN: 9783642396403 (print)
Performance and efficiency have recently become key requirements of computer architectures. Modern computers incorporate Graphics Processing Units (GPUs) for running data mining algorithms, as well as other general-purpose computations. In this paper, different parallelization methods are analyzed and compared in order to understand their applicability. From multi-threading on shared memory to using NVIDIA's GPU accelerators for increasing performance and efficiency in parallel computing, this work discusses the parallelization of data mining algorithms considering performance and efficiency issues. The performance is compared on both many-core systems and GPU accelerators on a distance measure algorithm using a relatively big data set. We optimize the way we deal with GPUs in heterogeneous systems to make them more suitable for big data mining applications with heavy distance calculations. Moreover, we focus on achieving a higher utilization of GPU resources and a better reuse of data. Our implementation of the content-based similarity algorithm SQFD on the GPU outperforms CPU counterparts by up to 50x, and multi-threaded CPU implementations by up to 15x.
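For readers unfamiliar with SQFD (Signature Quadratic Form Distance), the sketch below shows the commonly cited form of the distance in plain Python. It is not the paper's GPU implementation. A signature is assumed to be a list of (centroid, weight) pairs, and the Gaussian similarity function with parameter `alpha` is an illustrative choice; the abstract does not state which similarity function the authors used.

```python
import math


def similarity(c1, c2, alpha=1.0):
    """Gaussian similarity between two centroids (assumed choice)."""
    squared_dist = sum((a - b) ** 2 for a, b in zip(c1, c2))
    return math.exp(-alpha * squared_dist)


def sqfd(sig_q, sig_o, alpha=1.0):
    """Signature Quadratic Form Distance between two feature signatures.

    Concatenates the centroids of both signatures, negates the weights of
    the second one, and evaluates sqrt(w * A * w^T), where A[i][j] is the
    similarity between centroid i and centroid j.
    """
    centroids = [c for c, _ in sig_q] + [c for c, _ in sig_o]
    weights = [w for _, w in sig_q] + [-w for _, w in sig_o]
    total = 0.0
    for ci, wi in zip(centroids, weights):
        for cj, wj in zip(centroids, weights):
            total += wi * wj * similarity(ci, cj, alpha)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(total, 0.0))


sig_a = [((0.0, 0.0), 0.5), ((1.0, 1.0), 0.5)]
sig_b = [((0.0, 1.0), 0.7), ((2.0, 2.0), 0.3)]
print(sqfd(sig_a, sig_b))
```

The nested loop over all centroid pairs is quadratic in the combined signature size, and each term is independent of the others, which is the sort of "heavy distance calculation" that lends itself to data-parallel hardware.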
Timeliness, accuracy and effectiveness of manufacturing information in manufacturing and business process management have become important factors of constraint to business growth. Single RFID (Radio Frequency Identif...
ISBN: 9783642366079 (print)
The proceedings contain 23 papers. The topics discussed include: efficient parallel algorithms for XML filtering with structural and value constraints; a simulation-based method for eliciting requirements of online CIB systems; reducing latency and network load using location-aware memcache architectures; modeling capabilities as attribute-featured entities; governance policies for verification and validation of service choreographies; real-text dictionary for topic-specific web searching; evaluating cross-platform development approaches for mobile applications; information gathering tasks on the web: attempting to identify the user search behavior; web-based exploration of photos with time and geospace; mixed-initiative management of online calendars; knowledge discovery: data mining by self-organizing maps; and ranking location-dependent keywords to extract geographical characteristics from microblogs.
B&B algorithms are well-known techniques for the exact solving of combinatorial optimization problems (COP). They perform an implicit enumeration of the search space instead of an exhaustive one. Based on a pruning technique, they considerably reduce the computation time required to explore the whole search space. Nevertheless, these algorithms remain inefficient when dealing with large combinatorial optimization instances. They are time-intensive and require huge computing power to solve such instances optimally. Nowadays, multi-core processors and GPU accelerators are often coupled together to achieve impressive performance. However, classical B&B algorithms must be rethought to deal with their two divergent architectures. In this paper, we propose a new B&B approach exploiting both the multi-core aspect of current processors and GPU accelerators. The proposed approaches have been executed to solve FSP instances, which are well-known combinatorial optimization benchmarks. Real experiments have been carried out on an Intel Xeon 64-bit quad-core processor E5520 coupled to an Nvidia Tesla C2075 GPU device. The results show that our hybrid B&B approach speeds up execution by up to ×123 over the sequential single-core B&B algorithm.
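The idea of implicit enumeration with pruning can be sketched with a small sequential example. The code below assumes that FSP refers to the permutation flow-shop scheduling problem with a makespan objective. The simple machine-based lower bound, the function names, and the toy instance are illustrative assumptions, not the hybrid multi-core/GPU algorithm described in the abstract.

```python
from itertools import permutations


def machine_finish_times(order, proc):
    """Finish time of each machine after scheduling the jobs in `order`.

    proc[j][m] is the processing time of job j on machine m.
    """
    num_machines = len(proc[0])
    finish = [0] * num_machines
    for job in order:
        for m in range(num_machines):
            # A job starts on machine m once the machine is free and the
            # job has left machine m - 1.
            ready = finish[m - 1] if m > 0 else 0
            finish[m] = max(finish[m], ready) + proc[job][m]
    return finish


def makespan(order, proc):
    """Completion time of the last job on the last machine."""
    return machine_finish_times(order, proc)[-1]


def lower_bound(finish, remaining, proc):
    """Optimistic makespan estimate for any completion of a partial schedule."""
    if not remaining:
        return finish[-1]
    bound = 0
    for m in range(len(finish)):
        load = sum(proc[j][m] for j in remaining)
        tail = min(sum(proc[j][m + 1:]) for j in remaining)
        bound = max(bound, finish[m] + load + tail)
    return bound


def branch_and_bound(proc):
    """Return (best makespan, best job order) by depth-first B&B."""
    best = {"value": float("inf"), "order": None}

    def explore(order, remaining):
        if not remaining:
            value = makespan(order, proc)
            if value < best["value"]:
                best["value"], best["order"] = value, list(order)
            return
        for job in sorted(remaining):
            child = order + [job]
            rest = remaining - {job}
            finish = machine_finish_times(child, proc)
            # Prune subtrees that cannot beat the best solution found so far.
            if lower_bound(finish, rest, proc) < best["value"]:
                explore(child, rest)

    explore([], frozenset(range(len(proc))))
    return best["value"], best["order"]


def brute_force(proc):
    """Exhaustive enumeration, used only to cross-check small instances."""
    return min(makespan(p, proc) for p in permutations(range(len(proc))))


# Toy instance: 4 jobs on 3 machines (made-up processing times).
toy = [[3, 2, 4], [1, 5, 2], [4, 1, 3], [2, 3, 1]]
value, order = branch_and_bound(toy)
print(value, order, brute_force(toy))
```

Each call to `explore` handles one subproblem, and subtrees rooted at different partial schedules are independent apart from the shared best value, which is what makes the search a candidate for parallel exploration.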
With the development of microarray technology, it is now possible to study and measure the expression profiles of thousands of genes simultaneously, which can lead to identifying subgroups of a specific disease or extract hi...
ISBN: 9780889869431 (print)
The objective of this work is to benefit from advances in GPU technologies in a state-of-the-art software framework. We have analyzed the existing map-reduce (MR) framework and modified it for new GPU architectures. We have identified some significant possibilities for improvement. These improvements are mainly in the context of the different GPU architectures that were introduced after the development of the MR framework. Our experiments show an average 2.5x speedup of the MR framework on these architectures. Cache reconfiguration is also investigated in this work. We have achieved performance benefits ranging from 10% to 200% for various cache sizes. Based on the above analysis, three techniques have been developed to enhance the performance of the MR framework. First, we exploited the principle of locality by restructuring the code, saving over 32% of cache misses per thread. Second, we reduced the number of comparisons per thread in the group phase. Our optimized group phase gives an average speedup of 1.5x. Third, we performed delayed writing in the mapperCount function and made this function cache-sensitive. This significantly reduces cache misses and improves the execution time of this function by 10% to 25%.
This paper proposes an effective SoC hardware architecture implementing a VDP for Full HD TVs. The proposed architecture makes real-time video processing possible by supporting an efficient bus architecture and flexibl...
ISBN: 9780769550886 (print)
Scale-invariant feature transform (SIFT) based feature extraction is widely applied to extract features from images, and it is very attractive to accelerate SIFT-based algorithms on GPUs. In this paper, we present several parallel computing strategies, and implement and optimize the SIFT algorithm using the CUDA programming model on a GPU. Each stage of SIFT is analyzed in detail to choose the parallel strategy. On the basis of the elementary CUDA-SIFT and the CUDA architecture, we optimize the implementation in several respects to speed up CUDA-SIFT. Experimental results demonstrate that our implementation after optimization is 2.5 times faster than the previous optimization, and that our CUDA-based SIFT can run at 20 frames per second on most images with 1280x960 resolution in the test. Testing with a 1920x1440 image, we obtained an average speed of 11 frames per second, which is about 60 times faster than the CPU implementation of SIFT. In short, our implementation obtains appropriate accuracy and higher efficiency than CPU implementations and other GPU implementations, which is attributed to our dedicated optimization strategies.