Heterogeneous architectures have proved successful in achieving unprecedented performance and energy efficiency. However, taking advantage of these diverse processing elements is still hard. Programmers need to code throug...
Arithmetic coding (AC) is widely used for lossless data compression, and parallelization of arithmetic coding is relatively simple because all symbols can be encoded independently. On the other hand, parallel adaptive...
Modern Internet of Things (IoT) end nodes must support computationally intensive workloads within a limited power budget. Parallel ultra-low-power (PULP) architectures are a promising target for this scenario, and the availability of highly optimized software libraries is crucial to exploit parallelism and reduce software development costs. This letter proposes an efficient parallel design of the widely used short-time Fourier transform (STFT) and discrete wavelet transform (DWT) targeting ultra-low-power IoT devices. We address key performance challenges related to fine-grained synchronization and banking conflicts in shared memory. We achieve high throughput (50.95 samples/μs on average), good parallel speedup (up to 6.79×), and high energy efficiency (up to 172.55 GOp/s/W) on a cluster of eight RISC-V cores optimized for PULP operation.
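The letter's code is not given here. The kernel below is a minimal sketch, in C with OpenMP, of why one DWT level maps well onto a shared-memory cluster. It assumes the Haar wavelet, which the abstract does not specify, and uses OpenMP instead of the PULP runtime. `haar_dwt_level` is a hypothetical name. Each iteration writes a distinct output index, so a static schedule splits the work with no fine-grained synchronization. Memory-bank conflicts and multi-level transforms are not modelled.

```c
#include <stddef.h>

/* Hypothetical helper (not from the letter): one level of the orthonormal
 * Haar DWT, parallelized with OpenMP.
 *   in:      n input samples (n is assumed to be even)
 *   approx:  n/2 low-pass (approximation) coefficients
 *   detail:  n/2 high-pass (detail) coefficients
 * Iteration k reads in[2k], in[2k+1] and writes only approx[k], detail[k],
 * so iterations are independent and need no locks or barriers inside. */
void haar_dwt_level(const float *in, float *approx, float *detail, size_t n)
{
    const float s = 0.70710678f;          /* 1/sqrt(2) */
    const long half = (long)(n / 2);

    /* Static schedule: each core gets one contiguous block of outputs. */
    #pragma omp parallel for schedule(static)
    for (long k = 0; k < half; ++k) {
        float a = in[2 * k];
        float b = in[2 * k + 1];
        approx[k] = s * (a + b);
        detail[k] = s * (a - b);
    }
}
```

Applying the function again to `approx` yields the next decomposition level. Compiled without `-fopenmp`, the pragma is ignored and the loop runs serially with identical results.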
The ebb-and-flow irrigation system is an efficient closed-loop subirrigation system. In this study, a numerical model (EBMAN-HP) is presented for simulating all components (variations of water depth in the supply tank and in the concrete floor/tank) and all phases of flood-floor/bench ebb-and-flow subirrigation systems. The model benefits from a fine-tuned computational algorithm for its hysteresis module, and it can simulate both time-specified and sensor-based irrigation scheduling. Since an ebb-and-flow irrigation system incorporates numerous pots, Richards' equation must be solved for several pots to obtain sufficient understanding of the whole system. The proposed model therefore uses OpenMP parallel programming to reduce execution time. In addition, a novel parallel TDMA (tridiagonal matrix algorithm) solver is presented that accelerates computation by breaking a large system of equations into several portions that are solved simultaneously. The model has been validated and verified against several analytical, numerical, and experimental test cases. The results showed that the hysteresis module can completely remove the artificial pumping error in two critical test cases. The parallel TDMA solver reached a speedup of about 90%. Even in serial mode, the model ran faster than Hydrus-1D on coarser grids (about 52% faster on average over 8 test cases) and at a similar speed on dense grids (about 6% faster on average over 4 test cases), with perfect agreement (NSE between 0.999 and 1.000 and an average difference in MBE below 0.1% over 12 cases). The parallel model boosted performance to about 500% using 6 processors. Finally, a comprehensive illustrative example demonstrates almost all of the model's capabilities.
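Below is a minimal sketch of the per-pot parallelism. It assumes that each pot's implicit discretization of Richards' equation yields an independent tridiagonal system. The sketch applies the standard serial Thomas algorithm (TDMA) to each pot and distributes pots across threads with OpenMP. It is not the paper's novel parallel TDMA, which instead splits one large system into simultaneously solved portions. The function names and the contiguous storage layout are hypothetical. The systems are assumed to be diagonally dominant, so no pivoting is done.

```c
#include <stdlib.h>

/* Serial Thomas algorithm for one n-by-n tridiagonal system:
 *   a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]   (a[0], c[n-1] unused)
 * cp is scratch of length n. Assumes diagonal dominance (no pivoting). */
static void thomas_solve(const double *a, const double *b, const double *c,
                         const double *d, double *x, double *cp, int n)
{
    double *dp = x;                 /* reuse x for the modified RHS */
    cp[0] = c[0] / b[0];
    dp[0] = d[0] / b[0];
    for (int i = 1; i < n; ++i) {   /* forward elimination */
        double m = b[i] - a[i] * cp[i - 1];
        cp[i] = (i < n - 1) ? c[i] / m : 0.0;
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m;
    }
    /* Back substitution; x[n-1] == dp[n-1] already. */
    for (int i = n - 2; i >= 0; --i)
        x[i] = dp[i] - cp[i] * x[i + 1];
}

/* Hypothetical driver: solve npots independent systems of size n, with
 * system p stored at indices [p*n, (p+1)*n) of every array. Pots do not
 * interact within a solve, so the loop over pots runs in parallel; each
 * pot uses its own slice of the scratch buffer. Returns 0, or -1 on
 * allocation failure. */
int solve_pots(const double *a, const double *b, const double *c,
               const double *d, double *x, int npots, int n)
{
    double *cp = malloc((size_t)npots * (size_t)n * sizeof *cp);
    if (cp == NULL)
        return -1;

    #pragma omp parallel for schedule(static)
    for (int p = 0; p < npots; ++p) {
        size_t off = (size_t)p * (size_t)n;
        thomas_solve(a + off, b + off, c + off, d + off, x + off, cp + off, n);
    }

    free(cp);
    return 0;
}
```

Each solve is O(n), so the granularity comes from the number of pots. With few pots and very fine grids, splitting a single system, as the paper's parallel TDMA does, becomes the more relevant strategy.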
GPUs are massively parallel processors that make it possible to solve problems that are not feasible on traditional processors such as CPUs. However, implementing applications for GPUs is challenging to programmers as it requires para...
Accurately simulating the radiation damping of an infinite domain is crucial for reasonably and effectively evaluating the isolation effect of three-dimensional (3D) base-isolated nuclear structures when soil-structure interaction (SSI) is considered. The scaled boundary finite element method (SBFEM) has the notable advantage of precisely satisfying the radiation condition. In this study, the finite element method (FEM) and SBFEM were combined into a coupled FEM-SBFEM technique: the structure and the finite soil domain were discretized using FEM, and the infinite soil domain was modelled using SBFEM. A complete and efficient calculation procedure for SSI analysis was developed based on the coupled FEM-SBFEM, with parallel programming used to improve computational efficiency. The accuracy of the procedure was verified using a numerical example. The dynamic response of a large-scale, refined 3D base-isolated reactor building considering SSI was then analysed using the proposed method. The results demonstrate that 3D base isolation technology provides excellent isolation performance. Additionally, the responses obtained with the proposed coupled FEM-SBFEM were compared with those of the direct method. The proposed procedure has broad engineering applicability and can be used for seismic SSI analysis of large-scale nuclear structures while maintaining high computational accuracy and stability.
Nowadays, shared-memory parallel architectures have evolved, and new programming frameworks that exploit them have appeared: OpenMP, TBB, Cilk Plus, ArBB, and OpenCL. This article focuses on the most widely used of these frameworks in commercial and scientific areas, presenting a comparative study and an evaluation of them. The study covers several capabilities, such as task deployment, scheduling techniques, and programming language abstractions. The evaluation measures three dimensions: code development complexity, performance, and efficiency (measured as speedup per watt). For this evaluation, several parallel benchmarks were implemented with each framework, designed to cover scenarios such as regular memory access and irregular computation. The conclusions highlight that some frameworks (OpenMP, Cilk Plus) are better for quickly transforming sequential code, others (TBB) have a small footprint that is ideal for small problems, and others (OpenCL) suit heterogeneous architectures but require a very complex development process. The conclusions also show that, for problems where vectorization applies, vectorization support is more critical than multitasking for achieving efficiency.
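The point that OpenMP can transform sequential code quickly can be illustrated with a minimal sketch. The function names are hypothetical, and this is not one of the article's benchmarks. It shows a sequential dot product and the same loop parallelized by a single pragma, which also requests SIMD vectorization, the feature the conclusions single out.

```c
/* Sequential baseline (hypothetical example, not from the article). */
double dot_seq(const double *x, const double *y, long n)
{
    double s = 0.0;
    for (long i = 0; i < n; ++i)
        s += x[i] * y[i];
    return s;
}

/* Same loop with one OpenMP pragma (OpenMP 4.0+):
 *  - "parallel for" splits the iterations across threads;
 *  - "simd" asks the compiler to vectorize each thread's chunk;
 *  - "reduction(+:s)" gives each thread a private partial sum and
 *    combines them at the end, avoiding a data race on s.
 * Results may differ from dot_seq in the last bits, because the
 * floating-point additions are reassociated. */
double dot_omp(const double *x, const double *y, long n)
{
    double s = 0.0;
    #pragma omp parallel for simd reduction(+:s)
    for (long i = 0; i < n; ++i)
        s += x[i] * y[i];
    return s;
}
```

The loop body is unchanged. That small diff is what makes directive-based frameworks quick to adopt, whereas an OpenCL version would also need a separate kernel, buffers, and host-side setup.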
SAM is a parallel programming model in Haskell, suitable for manycore computing environments. It has been developed in two versions: SAMSoc adopting the socket communication and SAMSTM adopting the software transactio...
OpenMP is the predominant standard for shared memory systems in high-performance computing (HPC), offering a tasking paradigm for parallelism. However, existing OpenMP implementations, like GCC and LLVM, face computat...
The Air Traffic Management system is evolving to deal with efficiency, capacity, safety, and environmental challenges. Progress along these fronts requires the development of trajectory planning and prediction tools that can go beyond the current deterministic planning paradigm to deal with an uncertain meteorological and operational context. In this work, we introduce a novel flight planning methodology to generate weather-optimal 4D flight plans under uncertainty. By leveraging general-purpose computing on graphics processing units and combining continuous and discrete elements in an integrated fashion, we can simulate and evaluate multiple trajectory options under multiple scenarios in parallel, allowing quick iterations of a stochastic optimization algorithm. Our computational experiments show that the proposed approach can provide efficient solutions in seconds, as required in practical settings, while its simulation-based nature allows simple integration of future extensions.
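The abstract describes simulating every trajectory option under every weather scenario in parallel. Below is a minimal CPU sketch of that evaluation step. It uses OpenMP instead of a GPU and assumes equally likely scenarios and a user-supplied cost function. `cost_fn`, `best_option`, and `toy_cost` are hypothetical names. The actual methodology embeds this evaluation inside a stochastic optimizer; here the sketch only selects the option with the lowest mean cost.

```c
#include <stdlib.h>

/* Hypothetical cost model: cost (e.g. fuel or time) of trajectory option
 * `opt` under scenario `scen`. Must be thread-safe (no shared mutable state). */
typedef double (*cost_fn)(int opt, int scen);

/* Toy cost for illustration only: option 2 is best in every scenario. */
double toy_cost(int opt, int scen)
{
    return (double)((opt - 2) * (opt - 2)) + 0.1 * scen;
}

/* Evaluate all (option, scenario) pairs in parallel, then return the index
 * of the option with the lowest mean cost over equally likely scenarios.
 * Returns -1 for empty input or allocation failure. */
int best_option(cost_fn cost, int nopt, int nscen)
{
    if (nopt <= 0 || nscen <= 0)
        return -1;
    double *c = malloc((size_t)nopt * (size_t)nscen * sizeof *c);
    if (c == NULL)
        return -1;

    /* Every pair is an independent simulation: flatten into one loop.
     * Dynamic scheduling balances simulations of uneven duration. */
    #pragma omp parallel for schedule(dynamic)
    for (long k = 0; k < (long)nopt * nscen; ++k)
        c[k] = cost((int)(k / nscen), (int)(k % nscen));

    /* Serial reduction over scenarios (cheap compared to simulation). */
    int best = 0;
    double best_mean = 0.0;
    for (int o = 0; o < nopt; ++o) {
        double sum = 0.0;
        for (int s = 0; s < nscen; ++s)
            sum += c[(size_t)o * nscen + s];
        double mean = sum / nscen;
        if (o == 0 || mean < best_mean) {
            best = o;
            best_mean = mean;
        }
    }
    free(c);
    return best;
}
```

For example, `best_option(toy_cost, 5, 4)` selects option 2. Storing the full cost matrix, rather than reducing on the fly, keeps the parallel loop free of shared writes and would also allow risk-aware criteria (for example, worst-case cost) to be computed afterwards.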