ISBN: (Print) 9783030105495; 9783030105488
We herein describe the performance evaluation of a modular implementation of the MGRIT (MultiGrid-In-Time) algorithm within the PETSc (Portable, Extensible Toolkit for Scientific Computation) library. Our aim is to give PETSc users the opportunity to test the MGRIT parallel-in-time approach as an alternative to the time-stepping integrator (TS) when solving problems arising from the discretization of linear evolutionary models. To this end, we analyzed the performance parameters of the algorithm to highlight the relationship between configuration factors and problem characteristics, intentionally setting aside accuracy issues and spatial parallelism.
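The core MGRIT idea can be illustrated on the simplest linear model, the scalar test equation u' = λu. The following is a minimal Python sketch, not the PETSc implementation evaluated here. It assumes two levels, backward Euler on both grids, a coarsening factor m and F-relaxation only. In that setting two-level MGRIT is known to coincide with the parareal iteration. The function name `two_level_mgrit` and its parameters are hypothetical.

```python
def two_level_mgrit(lam, u0, T, n_fine, m, iterations):
    """Two-level MGRIT with F-relaxation for u' = lam * u, u(0) = u0.

    Backward Euler is used on the fine grid (step dt = T / n_fine) and on
    the coarse grid (step m * dt). Returns the solution at the coarse
    (C-) points t = 0, m*dt, 2*m*dt, ..., T.
    """
    if n_fine % m != 0:
        raise ValueError("n_fine must be divisible by the coarsening factor m")
    dt = T / n_fine
    phi_f = 1.0 / (1.0 - lam * dt)        # one fine backward Euler step
    phi_c = 1.0 / (1.0 - lam * m * dt)    # one coarse backward Euler step
    n_coarse = n_fine // m

    # Initial guess: a cheap sequential solve on the coarse grid.
    U = [u0]
    for _ in range(n_coarse):
        U.append(phi_c * U[-1])

    for _ in range(iterations):
        # F-relaxation: propagate each C-point value over its m fine steps.
        # These n_coarse propagations are independent, hence parallel in time.
        F = [phi_f ** m * U[n] for n in range(n_coarse)]
        # Coarse-grid correction: one sequential sweep in parareal form.
        new_U = [u0]
        for n in range(n_coarse):
            new_U.append(phi_c * new_U[n] + F[n] - phi_c * U[n])
        U = new_U
    return U
```

Each iteration makes one more C-point exact. After n_fine/m iterations the result therefore matches the sequential fine-grid solution. In practice MGRIT is stopped much earlier. The coarsening factor and the iteration count are examples of the configuration factors that such a study can relate to problem characteristics.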
The fog computing approach has emerged as a distributed mechanism for capturing data, processing it further, and allocating resources associated with the Internet of Things (IoT). The IoT services require sever...
Modern big data computing systems exemplified by Hadoop employ parallel processing based on distributed storage. The results produced by parallel tasks such as computing modules in scientific workflows or reducers in ...
ISBN: (Print) 9783319990101; 9783319990095
Nowadays, large amounts of high-resolution remote-sensing images are acquired daily, and satellite image classification is required for many applications such as modern city planning, agriculture, and environmental monitoring. Although many researchers have introduced and discussed this domain, a sufficient and optimal level of performance has not yet been reached. Hence, this article evaluates the available public remote-sensing datasets and the common techniques used for satellite image classification. Existing remote-sensing classification methods are grouped into four main categories according to the features they use: manual (handcrafted) feature-based methods, unsupervised feature learning methods, supervised feature learning methods, and object-based methods. In recent years, supervised deep learning methods have become extensively popular in various remote-sensing applications, such as geospatial object detection and land-use scene classification. Thus, the experiments in this article are carried out on one of the popular deep learning models, Convolutional Neural Networks (CNNs), specifically the AlexNet architecture, on a standard, well-known dataset, UC Merced Land Use. Finally, the results are compared with those of other techniques.
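The CNN experiments rely on how each AlexNet layer shrinks its input. The worked example below assumes the 227×227 input crop and the layer hyperparameters of the widely used single-stream AlexNet definition (for example, Caffe's reference model). Under those assumptions, the spatial size after each convolution or pooling layer follows out = floor((in - k + 2p) / s) + 1. The helper names are hypothetical, and the sketch computes sizes only. It is not the training setup used in the article.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window."""
    return (size - kernel + 2 * pad) // stride + 1

def alexnet_feature_sizes(input_size=227):
    """Feature-map side length after each AlexNet conv/pool layer."""
    layers = [            # (name, kernel, stride, padding)
        ("conv1", 11, 4, 0),
        ("pool1", 3, 2, 0),
        ("conv2", 5, 1, 2),
        ("pool2", 3, 2, 0),
        ("conv3", 3, 1, 1),
        ("conv4", 3, 1, 1),
        ("conv5", 3, 1, 1),
        ("pool5", 3, 2, 0),
    ]
    sizes = {}
    s = input_size
    for name, k, stride, pad in layers:
        s = conv_out(s, k, stride, pad)
        sizes[name] = s
    return sizes
```

The final 6×6 map has 256 channels, which gives the 9,216 features that feed AlexNet's first fully connected layer.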
ISBN: (Electronic) 9781728190747
ISBN: (Print) 9781728183824
Multiplication of a sparse matrix by a dense matrix (SpDM) is widely used in many areas, such as scientific computing and machine learning. However, existing work overlooks the performance optimization of SpDM on modern manycore architectures such as GPUs. Sparse storage formats keep sparse matrices in a memory-saving layout. However, the irregular data access of the sparse structure makes SpDM hard to optimize on modern GPUs, which leads to lower resource utilization and poorer performance. In this paper, we use the roofline performance model of GPUs to design an efficient SpDM algorithm called GCOOSpDM. It exploits coalesced global memory access, fast shared memory reuse, and more operations per byte of global memory traffic. Experiments are conducted on three Nvidia GPUs (GTX 980, GTX Titan X Pascal, and Tesla P100) using a large number of matrices, including a public dataset and randomly generated matrices. Experimental results show that GCOOSpDM achieves a 1.5-8x speedup over Nvidia's cuSPARSE library on many matrices.
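The operation itself is simple to state. The sketch below is a minimal sequential CPU reference for SpDM with the sparse operand in plain COO form (parallel row, column and value lists). It is not GCOOSpDM, whose GPU data layout and kernel are not described in the abstract. The function name is hypothetical.

```python
def coo_spdm(rows, cols, vals, n_rows, dense):
    """Compute C = A @ B with A sparse in COO form and B dense.

    rows, cols, vals: parallel lists giving the nonzeros A[r, c] = v.
    n_rows: number of rows of A (and of C).
    dense: B as a list of lists with shape (n_cols_of_A, k).
    """
    k = len(dense[0])
    C = [[0.0] * k for _ in range(n_rows)]
    for r, c, v in zip(rows, cols, vals):
        b_row = dense[c]   # each nonzero A[r, c] scales row c of B ...
        c_row = C[r]       # ... and accumulates the result into row r of C
        for j in range(k):
            c_row[j] += v * b_row[j]
    return C
```

Each nonzero costs 2k floating-point operations (one multiply and one add per column of B). That flop count is the numerator the roofline model divides by the bytes moved to obtain arithmetic intensity. The irregular `dense[c]` reads are the accesses a GPU kernel must arrange to be coalesced or served from shared memory.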
ISBN: (Print) 9781538681138
Due to the recent surge in the number of point-and-shoot devices, the amount of data generated in the form of raw video has also increased exponentially. On YouTube alone, 300 hours of video are added every minute. This creates a need for long-term solutions to tackle such enormous tasks. The solution should be flexible and robust, with the ability to scale up and down dynamically as required. Such systems are expensive, so it is also of utmost importance to consider cost and feasibility. Hadoop is one framework that met all our requirements. It makes it possible to distribute a large, complex task across low-end as well as high-end computers to achieve a common goal. Hadoop is a highly scalable storage platform because it can store and distribute very large data sets across hundreds of inexpensive servers that operate in parallel. Another advantage of a Hadoop cluster is fault tolerance: copies of the data are made and can be used if a node fails.
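The fault-tolerance argument can be illustrated with a toy model of block replication. HDFS stores each block on several DataNodes (three by default, controlled by `dfs.replication`). The sketch below is a hypothetical simplification: it places replicas round-robin instead of following HDFS's rack-aware placement policy. It then checks which blocks would become unavailable when given nodes fail.

```python
def place_blocks(n_blocks, n_nodes, replication=3):
    """Toy placement: block b goes to nodes b, b+1, ..., b+replication-1 (mod n)."""
    if replication > n_nodes:
        raise ValueError("replication factor exceeds the number of nodes")
    return {b: [(b + i) % n_nodes for i in range(replication)]
            for b in range(n_blocks)}

def lost_blocks(placement, failed_nodes):
    """Blocks whose every replica sits on a failed node."""
    failed = set(failed_nodes)
    return [b for b, nodes in placement.items()
            if all(node in failed for node in nodes)]
```

With three replicas, losing any single node leaves every block readable. With one replica, a single failure already loses data. For example, `lost_blocks(place_blocks(10, 5, 1), [0])` reports blocks 0 and 5.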
ISBN: (Print) 9783319936598; 9783319936581
When large-scale disasters occur, many regions are isolated from critical information exchange because communication and information infrastructures are damaged. In such serious disasters, a quick and flexible disaster recovery network and computing system is required to deliver disaster-related information. In this paper, mobile cloud computing for information exchange among isolated shelters is introduced. A vehicle carrying a mobile cloud server travels among the isolated shelters and exchanges information with them. It then returns to the disaster headquarters, which is connected to cloud computing through the Internet. A DTN (delay-tolerant networking) function is introduced to store, carry, and exchange messages. The vehicle thus acts as a message ferry among the shelters, even in challenged network environments where wired and wireless communication links are completely damaged. A prototype mobile vehicle computing system was constructed to evaluate the effects of the proposed system.
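The store-carry-forward behaviour of the ferry can be sketched as a single trip along a fixed route. At each stop the vehicle first hands over messages addressed to that stop, then stores everything queued there for later delivery. This is a hypothetical toy model, not the paper's prototype; stop names and the message format are illustrative.

```python
from collections import defaultdict

def ferry_round(route, outbox):
    """Simulate one store-carry-forward trip of a message ferry.

    route: ordered list of stops, e.g. ["S1", "S2", "S3", "HQ"].
    outbox: dict stop -> list of (destination, payload) queued at that stop.
    Returns (delivered, carried): delivered maps stop -> payloads handed
    over there; carried holds messages still on board after the trip.
    """
    carried = []                    # messages stored on the vehicle's server
    delivered = defaultdict(list)
    for stop in route:
        # Forward: hand over every carried message addressed to this stop.
        remaining = []
        for dest, payload in carried:
            if dest == stop:
                delivered[stop].append(payload)
            else:
                remaining.append((dest, payload))
        carried = remaining
        # Store: pick up everything queued at this stop.
        carried.extend(outbox.get(stop, []))
    return dict(delivered), carried
```

A message picked up after its destination has already been passed stays on board and would be delivered on the next trip. That is the latency cost DTN accepts in exchange for working without any end-to-end link.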
ISBN: (Print) 9783319935546; 9783319935539
In smart cities, more and more smart mobile devices (SMDs) have computation-intensive applications to process. Mobile cloud computing (MCC) is an effective technology that can help SMDs reduce their energy consumption and processing delay by offloading tasks to distributed cloudlets. However, the unstable wireless environment causes long transmission delays, so an SMD may leave the serving area before the cloudlet transmits its response to the user. Delay is therefore a crucial problem for MCC offloading. In this paper, we consider a multi-SMD MCC system in which each SMD has an application to offload and requests computation offloading to a cloudlet. To minimize the total delay of the SMDs in the system, we jointly consider offloading cloudlet selection, wireless access selection, and computation resource allocation. We formulate the total delay minimization problem as a mixed integer nonlinear programming (MINLP) problem, which is NP-hard. We then propose an improved genetic algorithm to obtain a locally optimal result. Simulation results demonstrate that our proposal can effectively reduce the system delay.
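The abstract does not give the delay model or the genetic operators. The following is therefore only a plain genetic algorithm under a hypothetical, simplified model. It covers cloudlet selection alone, without wireless access selection or explicit resource allocation. Each SMD i is charged an upload time d_i / r_ij plus a computing time w_i · n_j / f_j, where n_j SMDs share cloudlet j's capacity f_j equally. All names and parameters are illustrative, and this is not the authors' improved algorithm.

```python
import random

def total_delay(assign, data, work, rate, capacity):
    """Hypothetical total delay: upload time plus computing time per SMD.

    assign[i] is the cloudlet chosen by SMD i; a cloudlet's capacity is
    split equally among the SMDs assigned to it.
    """
    load = [0] * len(capacity)
    for j in assign:
        load[j] += 1
    return sum(data[i] / rate[i][j] + work[i] * load[j] / capacity[j]
               for i, j in enumerate(assign))

def genetic_offloading(data, work, rate, capacity,
                       pop_size=30, generations=100, p_mut=0.1, seed=0):
    """Plain GA over cloudlet assignments; returns (best_assignment, delay)."""
    rng = random.Random(seed)
    n, m = len(data), len(capacity)

    def cost(a):
        return total_delay(a, data, work, rate, capacity)

    def pick():
        # Binary tournament selection from the current population.
        a, b = rng.sample(pop, 2)
        return a if cost(a) <= cost(b) else b

    pop = [[rng.randrange(m) for _ in range(n)] for _ in range(pop_size)]
    best = min(pop, key=cost)
    for _ in range(generations):
        children = [best[:]]              # elitism: never lose the best so far
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            # Uniform crossover: each gene comes from either parent.
            child = [p1[i] if rng.random() < 0.5 else p2[i] for i in range(n)]
            # Mutation: occasionally reassign an SMD to a random cloudlet.
            for i in range(n):
                if rng.random() < p_mut:
                    child[i] = rng.randrange(m)
            children.append(child)
        pop = children
        best = min(pop, key=cost)
    return best, cost(best)
```

Because of elitism the best delay never increases between generations. Like any GA, the search offers no optimality guarantee, which matches the abstract's claim of a locally optimal result.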
ISBN: (Electronic) 9781728196688
ISBN: (Print) 9781728196695
Owing to its favorable isolation and security features, TrustZone technology has been widely used in mobile payment, cloud computing, the Internet of Things, and other areas. However, the cryptographic algorithms provided by trusted execution environments such as Open-TEE and OP-TEE are inadequate, which hinders the development of trusted applications. In addition, these systems mainly use software algorithms to provide cryptographic services and do not make full use of hardware cryptographic resources, so the cryptographic services are inefficient. To address these two issues, and targeting the features of domestic Phytium processors, this paper establishes a unified cryptographic service framework on the TrustZone platform to build a richer and more complete cryptographic algorithm library. Meanwhile, the framework makes full use of the hardware cryptographic resources of the processor platform to improve the performance of cryptographic services. Test results show that the proposed framework significantly eases the development of trusted applications. Compared with the original cryptographic services, the services provided by the new framework also achieve significant performance improvements.
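One way to picture a unified cryptographic service is a single entry point with priority-based dispatch. It looks up an algorithm and calls the highest-priority backend that is actually available, preferring a hardware engine and falling back to software. The Python sketch below is hypothetical throughout: the class, the method names and the hardware stub are illustrative, and only `hashlib.sha256` is a real library call. A real trusted application would be written in C against the TEE's own internal APIs.

```python
import hashlib

class CryptoService:
    """Single entry point that dispatches to the best available backend."""

    def __init__(self):
        self._backends = {}  # algorithm name -> [(priority, backend name, fn)]

    def register(self, algorithm, name, fn, priority):
        entries = self._backends.setdefault(algorithm, [])
        entries.append((priority, name, fn))
        entries.sort(key=lambda entry: entry[0], reverse=True)  # highest first

    def digest(self, algorithm, data):
        """Return (backend name, digest) from the first backend that works."""
        for _, name, fn in self._backends.get(algorithm, []):
            try:
                return name, fn(data)
            except NotImplementedError:
                continue  # backend not available on this platform: fall back
        raise ValueError(f"no backend provides {algorithm!r}")

def software_sha256(data):
    """Software fallback using Python's standard library."""
    return hashlib.sha256(data).digest()

def hardware_sha256_unavailable(data):
    """Hypothetical stand-in for a call into a hardware crypto engine."""
    raise NotImplementedError("hardware engine not present")
```

Applications call `digest` without knowing which backend serves them. Adding a hardware implementation is therefore a registration change rather than an application change, which is the convenience the abstract attributes to a unified framework.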
ISBN: (Electronic) 9781728165820
ISBN: (Print) 9781728165837
Sequence comparison is performed daily in several bioinformatics applications all over the world. Algorithms that retrieve the optimal result have quadratic time complexity, requiring a huge amount of computing power when the compared sequences are long. To reduce the execution time, many parallel solutions have been proposed in the literature. Nevertheless, depending on the sizes of the sequences, even these parallel solutions take hours or days to complete. Pruning techniques can significantly improve the performance of parallel solutions, and a few approaches have been proposed to provide pruning capabilities for sequence comparison applications. This paper proposes and evaluates a variant of the block pruning approach that runs on multiple GPUs in homogeneous or heterogeneous environments. Experimental results obtained with DNA sequences on two testbeds show that pruning yields significant performance gains over its non-pruning counterpart. With four GPUs it reaches 694.8 GCUPS (billions of cells updated per second).
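The quadratic cost and the GCUPS metric can be made concrete with a minimal, unpruned Smith-Waterman local alignment score in Python. It uses a linear gap penalty and fills one dynamic-programming cell per pair of residues. The scoring values and function names are hypothetical choices for illustration. The paper's block-pruning multi-GPU variant is not reproduced here.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score; fills len(a) * len(b) DP cells."""
    prev = [0] * (len(b) + 1)          # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

def gcups(len_a, len_b, seconds):
    """Billions of DP cells updated per second for one comparison."""
    return len_a * len_b / (seconds * 1e9)
```

Timing a call with `time.perf_counter()` and passing the elapsed seconds to `gcups` yields the metric reported in the abstract. At 694.8 GCUPS, roughly 6.9 × 10^11 cells are updated per second. Block pruning gains speed by skipping blocks of cells that provably cannot contribute to the best score.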