ISBN: (print) 9783030050511; 9783030050504
As instruction-level parallelism (ILP) on CPUs has reached a rather advanced level, researchers are increasingly interested in whether many-core architectures are suitable for graph algorithms. However, because of irregular memory access and a low ratio of computation to memory access, graph algorithms have not yet performed well enough on many-core architectures. To obtain a substantial speedup on a many-core architecture, three questions must be answered first: (i) how to optimize memory access, (ii) how to minimize synchronization overhead, and (iii) how to exploit the parallelism in the algorithm. Prior work can hardly reach this goal when these questions are treated separately. In this paper, we aim to address these questions systematically and to provide a set of methods for optimizing graph algorithms on many-core architectures. The paper mainly discusses how to accelerate the Single Source Shortest Path (SSSP) problem on the Intel Many Integrated Core (MIC) architecture, for which we propose an asynchronous parallel Dijkstra's algorithm that aims to maximize parallelism and minimize synchronization overhead. Experimental results show that the MIC architecture can solve the SSSP problem efficiently, achieving a speedup of 9.2x over the DIMACS benchmark.
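For orientation, the sequential baseline that a parallel SSSP solver builds on can be sketched as follows. This is a minimal, textbook Dijkstra with a binary heap, not the asynchronous MIC implementation the abstract describes; the function name `dijkstra` and the adjacency-list layout are assumptions made for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Sequential Dijkstra (illustrative baseline only).

    graph: dict mapping node -> list of (neighbor, weight) pairs,
           with non-negative weights (assumed layout).
    Returns a dict of shortest distances for every reachable node.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        # Skip stale heap entries left behind by earlier relaxations.
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

The edge relaxation in the inner loop is the step a parallel variant would have to coordinate across threads, which is where the synchronization overhead mentioned in the abstract arises.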
ISBN: (digital) 9783030050573
ISBN: (print) 9783030050573; 9783030050566
Association-rule-based recommendation is widespread in big data applications that need quick responses to improve user experience. Spark is a widely used distributed computing platform that accelerates the processing of large-scale distributed data. Developing appropriate distributed algorithms for Spark is essential to decreasing the processing time of distributed recommendation. The existing FP-Growth in Spark is a popular parallel recommendation method, but it achieves its best performance only when the machines' memory can accommodate all intermediate Resilient Distributed Datasets (RDDs). However, the memory of many practical data centers is still not large enough for large data sets. Therefore, this paper proposes a caching-based parallel FP-Growth, which consists of an integer-based sorting and an RDD-caching strategy to improve efficiency. Experimental results show that the proposal decreases execution time by 32.37% on average compared with the existing parallel FP-Growth in Spark. Furthermore, the impact of several important parameters on the performance of the proposal is analyzed through numerous realistic experiments in Spark.
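The abstract does not spell out its integer-based sorting, so the following is only one plausible reading, written in plain Python rather than Spark: frequent items are replaced by integer ranks (0 for the most frequent), and each transaction is then sorted by rank, which is the ordering FP-Growth needs before building its tree. The function name `encode_transactions` and the tie-breaking rule are assumptions.

```python
from collections import Counter


def encode_transactions(transactions, min_support):
    """Map frequent items to integer ranks and sort each transaction.

    Hypothetical illustration of an integer-based ordering step;
    not the authors' implementation.
    """
    # Count each item once per transaction (support count).
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [item for item, c in counts.items() if c >= min_support]
    # Descending frequency; ties broken by item name for determinism.
    frequent.sort(key=lambda item: (-counts[item], item))
    rank = {item: i for i, item in enumerate(frequent)}
    # Drop infrequent items and sort the remaining integer codes.
    encoded = [sorted(rank[i] for i in set(t) if i in rank)
               for t in transactions]
    return rank, encoded
```

Comparing small integers is cheaper than comparing strings, which is one reason such an encoding could help; whether this matches the paper's design is not confirmed by the abstract.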
ISBN: (print) 9783030050511; 9783030050504
Measuring the SimRank similarity of all vertex pairs in a graph is an important research topic with a wide range of applications. However, computing SimRank is costly in both time and space, so traditional methods fail to handle graph data of ever-growing size. This paper proposes a parallel multi-level solution for all-pair SimRank similarity computation on large graphs. We first partition the target graph using the idea of modularity maximization and obtain a collapsed graph based on the blocks. Then we compute the similarities between vertices inside each block as well as the similarities between the blocks. Finally, we integrate these two types of similarity to calculate approximate SimRank similarities between all vertex pairs. The method is implemented on the Spark platform and improves time efficiency while maintaining effectiveness compared with SimRank.
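To make the cost concrete, the standard (non-partitioned) SimRank iteration can be sketched as below. This is the naive all-pairs baseline, not the multi-level method of the abstract; the function name `simrank`, the decay factor default `c=0.8`, and the input layout are assumptions.

```python
def simrank(in_neighbors, c=0.8, iterations=10):
    """Naive all-pairs SimRank (illustrative baseline only).

    in_neighbors: dict mapping every node -> list of its in-neighbors
                  (every in-neighbor must also appear as a key).
    Returns a nested dict sim[u][v].
    """
    nodes = list(in_neighbors)
    sim = {u: {v: 1.0 if u == v else 0.0 for v in nodes} for u in nodes}
    for _ in range(iterations):
        new = {u: {} for u in nodes}
        for u in nodes:
            for v in nodes:
                if u == v:
                    new[u][v] = 1.0
                    continue
                iu, iv = in_neighbors[u], in_neighbors[v]
                if not iu or not iv:
                    # A node without in-neighbors is similar only to itself.
                    new[u][v] = 0.0
                    continue
                total = sum(sim[a][b] for a in iu for b in iv)
                new[u][v] = c * total / (len(iu) * len(iv))
        sim = new
    return sim
```

Every iteration touches all vertex pairs and all pairs of their in-neighbors, which is the time and space cost that motivates partitioning the graph into blocks.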
ISBN: (print) 9781351124140; 9780815357605
Graphs play a major role in modeling many real-world problems. Because of the availability of huge amounts of data, graph processing in a serial environment becomes more complex. Thus, fast and efficient algorithms that make effective use of modern technologies are required. The maximal clique problem is a graph processing task that arises in many applications. The Bron-Kerbosch (BK) algorithm is the most widely used and accepted algorithm for listing every maximal clique in a graph. Here, a parallel version of the BK algorithm is proposed that reduces computation time substantially compared with its serial implementation. It uses a cluster computing strategy.
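For reference, the serial Bron-Kerbosch algorithm in its basic form (without pivoting) can be sketched as follows. The function name `bron_kerbosch` and the adjacency-set input are assumptions, and the parallel, cluster-based version proposed in the abstract is not shown.

```python
def bron_kerbosch(adj):
    """List all maximal cliques of an undirected graph (basic BK, no pivot).

    adj: dict mapping each vertex -> set of adjacent vertices.
    Returns a list of sets, one per maximal clique.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed vertices.
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques
```

Each top-level call to `expand` works on its own copies of the candidate and excluded sets, so one natural way to parallelize would be to hand these branches to different workers; whether the paper does so is not stated in the abstract.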
ISBN: (print) 9781728106465
This work describes the parallel methodology for a football tracking algorithm based on multipartite graphs using MPI and OpenMP. The proposed algorithm uses a producer-consumer scheme to overlap the computing time of the two main procedures of the tracking algorithm, segmentation and tracking, as well as a send-and-receive communication pattern to propagate the blob identities. We show how a hybrid system of data and task parallelization improves the execution time for 4K videos, achieving a speedup of 19.24 and a processing speed of 21.71 FPS with 128 threads.
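The producer-consumer overlap of segmentation and tracking can be illustrated with a small sketch. It uses Python threads and a bounded queue as a stand-in for the MPI and OpenMP machinery of the paper; `run_pipeline`, `segment`, `track`, and `buffer_size` are hypothetical names chosen for this example.

```python
import queue
import threading


def run_pipeline(frames, segment, track, buffer_size=4):
    """Overlap two stages with a bounded queue (illustrative sketch).

    segment: producer-side function applied to each frame.
    track:   consumer-side function applied to each segmented frame.
    Returns the tracking results in frame order.
    """
    q = queue.Queue(maxsize=buffer_size)
    results = []
    sentinel = object()  # marks the end of the stream

    def producer():
        for frame in frames:
            q.put(segment(frame))  # blocks when the buffer is full
        q.put(sentinel)

    def consumer():
        while True:
            item = q.get()  # blocks until a segmented frame is ready
            if item is sentinel:
                break
            results.append(track(item))

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a single producer and a single consumer, the queue preserves frame order, so the tracker sees segmented frames in sequence while the next frame is already being segmented.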
ISBN: (print) 9781450366113
Deep convolutional neural networks (CNNs) have proven their potential for many tasks related to object identification and classification. This study aims to show the performance of several CNN architectures applied to the diagnosis and screening of skin lesions in patients using different training techniques: random weight initialization, feature extraction, and model extension. A dataset of 1000 clinical images, confirmed by biopsy or by consensus among specialists, was used to train the various architectures end-to-end directly from images, using only pixels and disease labels as inputs. The models' predictions are intended to indicate whether the lesion can be treated by doctors with images only, in a teledermatology approach, or whether it is necessary to prescribe a biopsy or a referral to a face-to-face consultation. The model can also indicate the urgency of the case and the group of diseases to which the lesion belongs. The performance of deep neural networks in all proposed tasks demonstrated that artificial intelligence has the potential to screen skin lesions with a level of competence comparable to dermatologists. Smartphone subscriptions are projected to reach 6.3 billion by the year 2021 [38]. Therefore, deep neural networks incorporated in mobile devices can extend the reach of dermatologists beyond their offices, providing universal low-cost access to dermatological diagnostics.
ISBN: (print) 9781538649756
Today's data center networks depend on optical switching to overcome the scalability limitations of traditional architectures. All-optical networks most often use slotted Time Division Multiple Access (TDMA) operation; their buffers are located at the optical network edges, and their organization relies on effective scheduling of the TDMA frames to achieve efficient sharing of network resources and collision-free network operation. Scheduling decisions have to be taken in real time, a process that becomes computationally demanding as the network size increases. Accelerators provide a solution, and the present paper proposes a scheduler accelerator to accommodate a data center network divided into points of delivery (pods) of racks and exploiting hybrid electro-optical top-of-rack (ToR) switches that access an all-optical inter-rack network. The scheduler accelerator is a parallel, scalable architecture with application-specific processing engines. Case studies of 2-, 4-, 8-, and 16-processor configurations are presented for processing all the TDMA time-slot transfer requests for networks of 512 and 1024 ToR nodes. The architecture is realized on a Xilinx VC707 board to validate the results.
ISBN: (print) 9781728106465
At present, the size of data is growing rapidly. In such a situation, it is necessary to keep data processing fast, and it is equally important that data quality remains intact during processing. For this reason, data mining technology has become very important for prediction in the scientific, commercial, and environmental sectors, and the need for parallel processing grows accordingly. The aim of this paper is therefore to analyze and measure the computation times of different classification algorithms on many datasets using parallel profiling and computing techniques. Performance analysis depends on many factors, such as the nature of the dataset, the size and type of the class, and the diversity of the data in the dataset. Many researchers are working on optimizing classification algorithms that do not yet deliver results in line with processor (core) capacity. In this paper, we therefore present simulation results that discuss the processor's size, efficiency, and workload as well as the complexity of the input instructions in the group, which will help researchers optimize code for maximum use of the cores. At the end of the paper, we give a comparative study of the optimized algorithm based on a parallel approach and the tuned algorithm based on parallel performance.
ISBN: (print) 9781538649756
Nowadays, next-generation sequencing is moving closer to clinical application in oncology. Indeed, it allows the identification of tumor-specific mutations acquired during cancer development, progression, and resistance to therapy. In parallel with evolving sequencing technology, novel computational approaches are needed to cope with the requirement of rapidly processing sequencing data into a list of clinically relevant genomic variants. Since sequencing data from both tumors and their matched normal samples are not always available (unmatched data), there is a need for a computational pipeline that performs variant calling on unmatched data. Despite the existence of many accurate and precise variant calling algorithms, an efficient approach is still lacking. Here, we propose a parallel pipeline (ParallNormal) designed to efficiently identify genomic variants from whole-exome sequencing data in the absence of matched normal samples. ParallNormal integrates well-known algorithms such as BWA and GATK, a novel tool for duplicate removal (DuplicateRemove), and the FreeBayes variant calling algorithm. A re-engineered implementation of FreeBayes, optimized for execution on modern multi-core architectures, is also proposed. ParallNormal was applied to whole-exome sequencing data of pancreatic cancer samples without considering their matched normal samples. The robustness of ParallNormal was tested against results for the same dataset analyzed using matched normal samples, considering genes involved in pancreatic carcinogenesis. Our pipeline was able to confirm most of the variants identified using matched normal data.
ISBN: (digital) 9781728163543
ISBN: (print) 9781728163550
This paper presents an integrated 90-nm CMOS low-energy dual-channel multiplier based on the serial/parallel scheme. Moreover, a comparative study of different multipliers is carried out in 90-nm CMOS technology. Because the market tends to minimize device size for portability, a small implementation area is targeted by using the dual-channel multiplier. The new implementation of the multiplication process extends battery lifetime by decreasing power consumption by up to 93%. Furthermore, area is reduced by up to 67% compared with previous work, which makes the design more appropriate for low-power handheld devices. In addition, large energy savings are attained with the proposed scheme when performing complex multiplication tasks.
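As background on the serial/parallel scheme, a behavioral model of generic shift-and-add multiplication is sketched below: one operand is held in parallel while the other is consumed one bit per step. This models only the arithmetic, not the dual-channel circuit, its area, or its power; the function name `serial_parallel_multiply` and the `width` parameter are assumptions.

```python
def serial_parallel_multiply(a, b, width=8):
    """Shift-and-add multiplication of two unsigned integers.

    a:     operand held in parallel (all bits available at once).
    b:     operand consumed serially, least significant bit first.
    width: operand bit width (assumed parameter).
    """
    limit = 1 << width
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands must fit in the given width")
    acc = 0
    for i in range(width):
        bit = (b >> i) & 1  # next serial bit of b
        if bit:
            acc += a << i   # add the shifted partial product
    return acc
```

A hardware version needs only one adder and a shift register for `width` cycles, which is why serial/parallel designs are often chosen when area and power matter more than latency.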