ISBN: (print) 9783030105495; 9783030105488
We herein describe the performance evaluation of a modular implementation of the MGRIT (MultiGrid-In-Time) algorithm within the PETSc (Portable, Extensible Toolkit for Scientific Computation) library. Our aim is to give PETSc users the opportunity to test the MGRIT parallel-in-time approach as an alternative to the time-stepping integrator (TS) when solving problems arising from the discretization of linear evolutionary models. To this end, we analyzed the algorithm's performance parameters to highlight the relationship between configuration factors and problem characteristics, intentionally setting aside accuracy issues and spatial parallelism.
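For readers new to MGRIT, the two-level cycle can be sketched on the scalar test problem u' = λu discretized with backward Euler. This is a toy illustration under those assumptions, not the PETSc implementation evaluated above and not PETSc's API; the function name `mgrit_two_level` and its parameters (fine step `h`, coarsening factor `m`) are hypothetical. F-relaxation propagates each coarse (C-) point across its interval with the fine propagator, and the resulting C-point residual is corrected by a short sequential solve with the coarse propagator of step m·h.

```python
def mgrit_two_level(lam, u0, h, n_fine, m, max_iter=20, tol=1e-12):
    """Two-level MGRIT with F-relaxation for u' = lam*u, backward Euler (toy sketch)."""
    if n_fine % m != 0:
        raise ValueError("n_fine must be a multiple of the coarsening factor m")
    n_coarse = n_fine // m
    phi = 1.0 / (1.0 - lam * h)        # fine propagator, step h
    phi_c = 1.0 / (1.0 - lam * m * h)  # coarse propagator, step m*h

    # Initial C-point guess: one cheap sequential sweep with the coarse propagator.
    uc = [u0]
    for _ in range(n_coarse):
        uc.append(phi_c * uc[-1])

    history = []
    for _ in range(max_iter):
        # F-relaxation + residual at C-points: r_k = Phi^m u_{k-1} - u_k.
        # For this scalar linear problem, m fine steps collapse to phi**m.
        # Each term touches one interval only, so this loop is parallel across time.
        r = [phi ** m * uc[k - 1] - uc[k] for k in range(1, n_coarse + 1)]
        history.append(max(abs(x) for x in r))
        if history[-1] < tol:
            break
        # Coarse-grid correction: solve e_k = Phi_c e_{k-1} + r_k, e_0 = 0, sequentially.
        e = 0.0
        for k in range(1, n_coarse + 1):
            e = phi_c * e + r[k - 1]
            uc[k] += e

    # Final F-relaxation: fill in the fine points of every interval.
    u = []
    for k in range(n_coarse):
        v = uc[k]
        for _ in range(m):
            u.append(v)
            v *= phi
    u.append(uc[n_coarse])
    return u, history
```

With F-relaxation only, two-level MGRIT is algebraically equivalent to Parareal, so it reproduces the sequential fine solution after at most as many iterations as there are coarse intervals; the speedup comes from running the F-relaxation intervals concurrently.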
ISBN: (print) 9788494919459
Simulation of skeletal muscle activation can help to interpret electromyographic measurements and infer the behavior of the muscle fibers. Existing models consider simplified geometries or a small number of muscle fibers to reduce computation time. We demonstrate how to simulate a finely resolved model of the biceps brachii with a typical number of 270,000 fibers. We used domain decomposition to run simulations on 27,000 cores of the supercomputer HazelHen at HLRS in Stuttgart, Germany. We present details of opendihu, our software framework, whose configurability, efficient data structures, and modular software architecture target usability, performance, and extensibility for future models. The simulations show good parallel weak scaling.
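The abstract does not say how opendihu partitions the work, so the snippet below assumes, purely for illustration, that whole fibers are distributed evenly over ranks in contiguous blocks; `block_partition` is a hypothetical helper. Under that assumption, 270,000 fibers on 27,000 cores gives 10 fibers per core, and in a weak-scaling study the number of fibers grows with the number of cores so the block size stays fixed.

```python
def block_partition(n_items, n_ranks):
    """Split indices 0..n_items-1 into contiguous, near-equal [start, end) blocks, one per rank."""
    base, extra = divmod(n_items, n_ranks)
    ranges = []
    start = 0
    for rank in range(n_ranks):
        # The first `extra` ranks take one additional item so every item is assigned.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


ranges = block_partition(270_000, 27_000)
print(ranges[0], ranges[-1])
```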
ISBN: (print) 9789811306952; 9789811306945
Simulating natural phenomena is one of the most important areas of computer graphics. Among such simulations, we focus on the simulation of fire flames. Previous fire flame simulations mainly focused on sequential simulation, and such methods are not well suited to modern computer graphics and massively parallel processing architectures. In this work, we present a prototype fire flame simulation system based on the Compute Unified Device Architecture (CUDA) and the Open Graphics Library (OpenGL). Our system executes these simulations efficiently enough to achieve real-time fire flame simulation.
Modern big data computing systems exemplified by Hadoop employ parallel processing based on distributed storage. The results produced by parallel tasks such as computing modules in scientific workflows or reducers in ...
ISBN: (electronic) 9781728190747
ISBN: (print) 9781728183824
Multiplication of a sparse matrix by a dense matrix (SpDM) is widely used in many areas, such as scientific computing and machine learning. However, existing work overlooks the performance optimization of SpDM on modern manycore architectures like GPUs. Sparse storage formats keep sparse matrices in a memory-saving form, but the irregular data access of the sparse structure makes SpDM difficult to optimize on modern GPUs, which results in lower resource utilization and poorer performance. In this paper, we use the roofline performance model of GPUs to design an efficient SpDM algorithm called GCOOSpDM, which exploits coalesced global memory access, fast shared memory reuse, and more operations per byte of global memory traffic. Experiments are conducted on three Nvidia GPUs (GTX 980, GTX Titan X Pascal, and Tesla P100) using a large number of matrices, including a public dataset and randomly generated matrices. Experimental results show that GCOOSpDM achieves a 1.5-8x speedup over Nvidia's cuSPARSE library on many matrices.
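As a reference for the operation itself, here is plain COO sparse-times-dense multiplication C = A·B. It is a sketch of SpDM in general, not the GCOOSpDM kernel, whose grouping and shared-memory scheme are not reproduced here; the function name `coo_spdm` is hypothetical. Each nonzero A[r, k] scales row k of B and accumulates it into row r of C. One way this maps to a GPU is to assign consecutive threads to consecutive columns j of those rows, so that B and C are read and written contiguously, which is the kind of coalesced access the abstract refers to.

```python
def coo_spdm(n_rows, rows, cols, vals, dense):
    """C = A @ B, with A (n_rows x k) in COO form and B a dense k x n list of lists."""
    n = len(dense[0])
    c = [[0.0] * n for _ in range(n_rows)]
    for r, k, v in zip(rows, cols, vals):
        b_row = dense[k]  # row k of B: consecutive j are adjacent in memory
        c_row = c[r]      # row r of C receives the scaled row
        for j in range(n):
            c_row[j] += v * b_row[j]
    return c


# A = [[1, 0, 2], [0, 0, 3]] stored as (row, col, value) triples.
C = coo_spdm(2, [0, 0, 1], [0, 2, 2], [1.0, 2.0, 3.0],
             [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(C)  # [[11.0, 14.0], [15.0, 18.0]]
```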
ISBN: (print) 9783319990101; 9783319990095
Nowadays, large amounts of high-resolution remote-sensing images are acquired daily, and satellite image classification is required by many applications, such as modern city planning, agriculture, and environmental monitoring. Although many researchers have studied this domain, satisfactory performance has not yet been reached. Hence, this article evaluates the available public remote-sensing datasets and the common techniques used for satellite image classification. Existing remote-sensing classification methods are divided into four main categories according to the features they use: manual feature-based methods, unsupervised feature learning methods, supervised feature learning methods, and object-based methods. In recent years, supervised deep learning methods have become very popular in remote-sensing applications such as geospatial object detection and land-use scene classification. Accordingly, the experiments in this article use a popular deep learning model, the convolutional neural network (CNN), specifically the AlexNet architecture, on a well-known standard dataset, UC Merced Land Use. Finally, the results are compared with those of other techniques.
ISBN: (print) 9781538681138
Owing to the recent surge in the number of point-and-shoot devices, the amount of raw video data has also grown exponentially. On YouTube alone, 300 hours of video are uploaded every minute. This creates a need for long-term solutions to such enormous tasks. The solution should be flexible and robust, with the ability to scale up and down dynamically as required. Such systems are costly, so cost and feasibility must also be considered. Hadoop is one framework that meets all our requirements. It makes it possible to distribute a large, complex task across both low-end and high-end computers working toward a common goal. Hadoop is a highly scalable storage platform because it can store and distribute very large data sets across hundreds of inexpensive servers operating in parallel. Another advantage of a Hadoop cluster is fault tolerance: data is replicated, so copies can be used if a node fails.
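The fault-tolerance argument rests on block replication; HDFS replicates each block three times by default (the `dfs.replication` setting). The sketch below illustrates why replication keeps data readable when nodes fail. The round-robin placement and the helper names `place_replicas` and `survives` are hypothetical simplifications, not HDFS's rack-aware placement policy.

```python
def place_replicas(n_blocks, nodes, replication=3):
    """Place each block on `replication` distinct nodes, round-robin (simplified, not HDFS policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    # (b + i) % len(nodes) for i < replication <= len(nodes) yields distinct nodes.
    return {
        b: [nodes[(b + i) % len(nodes)] for i in range(replication)]
        for b in range(n_blocks)
    }


def survives(placement, failed):
    """True if every block still has at least one replica on a live node."""
    return all(any(node not in failed for node in replicas)
               for replicas in placement.values())


placement = place_replicas(10, ["n0", "n1", "n2", "n3", "n4"])
print(survives(placement, {"n1", "n3"}))        # True: two failures < 3 replicas
print(survives(placement, {"n0", "n1", "n2"}))  # False: block 0 lived only on these
```

With replication factor r, any r - 1 simultaneous node failures leave every block readable; losing r nodes can make a block unavailable.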
ISBN: (print) 9783319935546; 9783319935539
In smart cities, more and more smart mobile devices (SMDs) run computation-intensive applications. Mobile cloud computing (MCC) is an effective technology that can help SMDs reduce their energy consumption and processing delay by offloading tasks to distributed cloudlets. However, because of the long transmission delay caused by the unstable wireless environment, an SMD may leave the serving area before the cloudlet returns its response to the user. Delay is therefore a crucial problem for MCC offloading. In this paper, we consider a multi-SMD MCC system in which each SMD has an application that it requests to offload to a cloudlet. To minimize the total delay of the SMDs in the system, we jointly consider offloading cloudlet selection, wireless access selection, and computation resource allocation. We formulate total delay minimization as a mixed integer nonlinear programming (MINLP) problem, which is NP-hard, and propose an improved genetic algorithm to obtain a locally optimal solution. Simulation results demonstrate that our proposal effectively reduces the system delay.
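To make the search concrete, here is a plain elitist genetic algorithm over cloudlet assignments, one gene per SMD. It is not the paper's improved GA, and it covers only cloudlet selection: the delay model (uplink time plus compute time, with each cloudlet's capacity shared equally among its assigned SMDs) and the names `total_delay` and `ga_offload` are hypothetical assumptions made for this sketch.

```python
import random


def total_delay(assign, data, cycles, rate, capacity):
    """Toy total delay: uplink time plus compute time on an equally shared cloudlet."""
    load = [0] * len(capacity)
    for j in assign:
        load[j] += 1
    delay = 0.0
    for i, j in enumerate(assign):
        tx = data[i] / rate[i][j]                  # bits / (bits per second)
        comp = cycles[i] * load[j] / capacity[j]   # cycles / (capacity / load)
        delay += tx + comp
    return delay


def ga_offload(data, cycles, rate, capacity, pop_size=30, generations=60,
               p_mut=0.1, seed=0):
    """Plain elitist GA; gene i is the cloudlet index chosen for SMD i."""
    rng = random.Random(seed)
    n, m = len(data), len(capacity)

    def cost(a):
        return total_delay(a, data, cycles, rate, capacity)

    def tournament(pop):
        a, b = rng.sample(pop, 2)                  # binary tournament selection
        return a if cost(a) <= cost(b) else b

    pop = [[rng.randrange(m) for _ in range(n)] for _ in range(pop_size)]
    best = min(pop, key=cost)[:]
    history = [cost(best)]
    for _ in range(generations):
        children = [best[:]]                       # elitism: the incumbent survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            # Uniform crossover, then per-gene random-reset mutation.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            child = [rng.randrange(m) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        cand = min(pop, key=cost)
        if cost(cand) < cost(best):
            best = cand[:]
        history.append(cost(best))
    return best, history
```

Because the best individual is always carried over, the recorded best delay never increases between generations; like the GA in the abstract, the result is a local optimum with no optimality guarantee.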
ISBN: (print) 9783319936598; 9783319936581
When large-scale disasters occur, many regions are cut off from critical information exchange because of damage to communication and information infrastructure. In such serious disasters, a quick and flexible disaster recovery network and computing system is required to deliver disaster-related information. In this paper, mobile cloud computing for information exchange among isolated shelters is introduced. A vehicle carrying a mobile cloud server travels among the isolated shelters, exchanges information with them, and returns to the disaster headquarters, which is connected to cloud computing through the Internet. A DTN (delay-tolerant networking) function is introduced so that the vehicle can store, carry, and exchange messages as a message ferry among the shelters, even in a challenged network environment where wired and wireless communication links are completely damaged. A prototype mobile vehicle computing system was constructed to evaluate the effectiveness of the proposed system.
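The message-ferry idea can be sketched as a single store-carry-forward tour: at each stop the vehicle first hands over buffered messages addressed to that stop, then loads everything waiting there. The function name `ferry_tour`, the stop labels, and the (destination, payload) message format are hypothetical, and the sketch ignores buffer limits, custody transfer, and the DTN Bundle Protocol itself.

```python
from collections import defaultdict


def ferry_tour(route, outboxes):
    """One store-carry-forward tour of a message ferry (hypothetical sketch).

    route:    ordered stops, e.g. ["HQ", "S1", "S2", "HQ"]
    outboxes: stop -> list of (destination, payload) waiting at that stop
    Returns (delivered, carried): payloads handed over per stop, and the
    (destination, payload) pairs still buffered in the vehicle.
    """
    pending = {stop: list(msgs) for stop, msgs in outboxes.items()}  # leave caller's data intact
    carried = []
    delivered = defaultdict(list)
    for stop in route:
        # Forward: hand over buffered messages addressed to this stop.
        still_carried = []
        for dst, payload in carried:
            if dst == stop:
                delivered[stop].append(payload)
            else:
                still_carried.append((dst, payload))
        carried = still_carried
        # Store and carry: load everything waiting here (each outbox is emptied once).
        carried.extend(pending.pop(stop, []))
    return dict(delivered), carried
```

A message for a shelter the vehicle has already passed on this tour stays in the buffer until the next tour, which is the latency cost of ferrying.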
ISBN: (electronic) 9781728196688
ISBN: (print) 9781728196695
Owing to its favorable isolation and security features, TrustZone technology has been widely used in mobile payment, cloud computing, the Internet of Things, and other fields. However, the cryptographic algorithms provided by trusted execution environments such as Open-TEE and OP-TEE are inadequate, which makes developing trusted applications inconvenient. In addition, these systems mainly provide cryptographic services through software algorithms and do not make full use of hardware cryptographic resources, which results in low cryptographic service efficiency. To address these two issues, this paper targets the features of domestic Phytium processors and establishes a unified cryptographic service framework on the TrustZone platform, building a richer and more complete cryptographic algorithm library. The framework also makes full use of the processor platform's hardware cryptographic resources to improve cryptographic service performance. Test results show that the proposed framework makes trusted application development considerably more convenient and that, compared with the original cryptographic services, it achieves a significant performance improvement.
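A unified service that prefers hardware and falls back to software could be organized as a small dispatch registry: trusted applications call one entry point, and the framework picks the highest-priority backend that is available for each algorithm. The Python below only illustrates that dispatch pattern under stated assumptions: the class `CryptoService`, the backend names, and `hw_engine_present()` are hypothetical, the "hardware" backend is a stand-in computed with `hashlib`, and nothing here reflects the OP-TEE, Open-TEE, or Phytium interfaces.

```python
import hashlib


class CryptoService:
    """Single entry point that dispatches each algorithm to its best available backend."""

    def __init__(self):
        self._backends = {}  # algorithm -> [(priority, name, fn, is_available), ...]

    def register(self, alg, name, fn, priority, is_available=lambda: True):
        entries = self._backends.setdefault(alg, [])
        entries.append((priority, name, fn, is_available))
        entries.sort(key=lambda e: e[0], reverse=True)  # highest priority first

    def digest(self, alg, data):
        for _, name, fn, is_available in self._backends.get(alg, []):
            if is_available():
                return name, fn(data)
        raise ValueError(f"no available backend for {alg!r}")


def sw_sha256(data):
    """Software backend: portable, uses only the CPU."""
    return hashlib.sha256(data).digest()


def hw_engine_present():
    """Hypothetical probe for a hardware hash engine; always False in this sketch."""
    return False


def hw_sha256(data):
    """Hypothetical hardware backend; computed in software so the sketch runs anywhere."""
    return hashlib.sha256(data).digest()


service = CryptoService()
service.register("sha256", "software", sw_sha256, priority=0)
service.register("sha256", "hardware", hw_sha256, priority=10,
                 is_available=hw_engine_present)
```

Here `service.digest("sha256", data)` falls back to the software backend because the probe reports no engine; registering more algorithms extends the library without changing the call site.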