Wireless communications are expected to take place in increasingly complicated scenarios, such as dense urban, forest, tunnel, and other cluttered environments. A key emerging challenge is to understand the physics and characteristics of wave propagation in these environments, which is critical for the analysis, design, and application of advanced mobile and wireless communication systems. In this paper, we present a full-wave field-based computational methodology for radio wave propagation in complex urban environments. Both transmitting/receiving antennas and propagation environments are modeled by first-principles calculations. A system-level, large scene analysis is enabled by the scalable, ultraparallel algorithms on the emerging high-performance computing platforms. The proposed computational framework is verified and validated with semianalytical models and representative measurements.
A novel parallel formulation of Hessenberg-triangular reduction of a regular matrix pair on distributed memory computers is presented. The formulation is based on a sequential cache-blocked algorithm by Kågström et al. [BIT, 48 (2008), pp. 563-584]. A static scheduling algorithm is proposed that addresses the problem of underutilized processes caused by two-sided updates of matrix pairs based on sequences of rotations. Experiments using up to 961 processes demonstrate that the new formulation is an improvement of the state of the art and also identify factors that limit its scalability.
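To make the "two-sided updates based on sequences of rotations" concrete, the following is a minimal sequential sketch of a single rotation pair on a 3 x 3 matrix pair. It is not the blocked or parallel algorithm from the abstract; the function names (`givens`, `rotate_rows`, `rotate_cols`, `hessenberg_triangular_step`) are hypothetical and chosen only for illustration.

```python
import math


def givens(a, b):
    """Return (c, s) such that c*a + s*b = hypot(a, b) and -s*a + c*b = 0."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r


def rotate_rows(m, i, k, c, s):
    """Apply the rotation from the left to rows i and k of m (in place)."""
    for j in range(len(m[0])):
        mi, mk = m[i][j], m[k][j]
        m[i][j] = c * mi + s * mk
        m[k][j] = -s * mi + c * mk


def rotate_cols(m, i, k, c, s):
    """Apply the rotation from the right to columns i and k of m (in place)."""
    for row in m:
        ri, rk = row[i], row[k]
        row[i] = c * ri + s * rk
        row[k] = -s * ri + c * rk


def hessenberg_triangular_step(a, b):
    """One step for a 3 x 3 pair (a general, b upper triangular).

    The left rotation zeroes a[2][0] but creates fill-in at b[2][1];
    the right rotation removes that fill-in without touching column 0.
    """
    c, s = givens(a[1][0], a[2][0])
    rotate_rows(a, 1, 2, c, s)
    rotate_rows(b, 1, 2, c, s)
    c, s = givens(b[2][2], b[2][1])
    rotate_cols(a, 2, 1, c, s)
    rotate_cols(b, 2, 1, c, s)
```

Each rotation must be applied to both matrices, once from the left and once from the right, which is the dependency pattern the abstract's scheduling algorithm is concerned with.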
In the current study, we demonstrate an automated optimization method based on the Levenberg-Marquardt algorithm for electron-optical systems, incorporating an adaptive merit function switching process that enhances minimization convergence. The algorithm is successfully applied to three energy spectrometer designs (a radial mirror analyzer, a parallel radial mirror analyzer, and a parallel magnetic sector analyzer) by first implementing practical modifications to the device geometries. We then optimize the key design parameters to yield good focusing optics. The robustness of the method with respect to the starting configuration is also demonstrated. The procedure can greatly enhance efficiency in the design process of electron-optical systems.
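For readers unfamiliar with the Levenberg-Marquardt algorithm, the sketch below shows a basic textbook variant: damped normal equations with a forward-difference Jacobian. It does not include the adaptive merit function switching described in the abstract, and the names `levenberg_marquardt` and `solve_linear`, as well as the default damping schedule, are assumptions made for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def levenberg_marquardt(residuals, params, damping=1e-3, steps=100, h=1e-6):
    """Minimize the sum of squared residuals(params); returns fitted params."""
    params = list(params)
    n = len(params)

    def cost(p):
        return sum(r * r for r in residuals(p))

    current = cost(params)
    for _ in range(steps):
        r0 = residuals(params)
        # Forward-difference Jacobian, stored column by column.
        cols = []
        for k in range(n):
            shifted = params[:]
            shifted[k] += h
            cols.append([(a - b) / h for a, b in zip(residuals(shifted), r0)])
        jtj = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(n)]
               for i in range(n)]
        jtr = [sum(u * v for u, v in zip(cols[i], r0)) for i in range(n)]
        # Damped normal equations: (J^T J + damping * diag(J^T J)) d = -J^T r.
        for i in range(n):
            jtj[i][i] += damping * (jtj[i][i] + 1e-12)
        delta = solve_linear(jtj, [-g for g in jtr])
        trial = [p + d for p, d in zip(params, delta)]
        trial_cost = cost(trial)
        if trial_cost < current:   # accept the step and trust the model more
            params, current = trial, trial_cost
            damping = max(damping / 10.0, 1e-12)
        else:                      # reject the step and damp more strongly
            damping = min(damping * 10.0, 1e12)
    return params
```

In an electron-optics setting the residual function would presumably wrap a field or ray-tracing simulation; here it is just any callable that returns a list of residuals.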
In this study, the convergence rates of the two one-node CMFDs were evaluated for the sequential and the parallel algorithm. The results from the Fourier analyses and the numerical simulations showed good agreement, e...
Frequent Itemsets Mining has been applied in many data processing applications with remarkable results. Recently, data stream processing has gained considerable attention due to its practical applications. Data in data streams are transmitted at high rates and cannot be stored for offline processing, making it impractical to apply traditional data mining approaches (such as Frequent Itemsets Mining) directly to data streams. In this paper, two single-pass parallel algorithms based on a tree data structure for Frequent Itemsets Mining on data streams are proposed. The presented algorithms employ the Landmark and Sliding Window Models for window handling. As in other reviewed papers, if the number of frequent items in the data stream is low, the proposed algorithms perform an exact mining process. Conversely, if the number of frequent patterns is large, the mining process is approximate, with no false positives produced. Experiments demonstrate that the presented algorithms achieve shorter processing times than the hardware architectures reported in the state of the art.
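The Sliding Window Model mentioned above can be illustrated with a small single-pass counter. This is a sequential sketch that uses a plain hash-based counter rather than the tree data structure and parallelism of the paper; the class name `SlidingWindowItemsets` and the `max_itemset_size` cap are assumptions introduced here to keep the example short.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowItemsets:
    """Exact itemset counts over the most recent `window_size` transactions."""

    def __init__(self, window_size, max_itemset_size=2):
        self.window_size = window_size
        self.max_itemset_size = max_itemset_size
        self.window = deque()    # itemsets of each transaction still in the window
        self.counts = Counter()  # support count of every itemset in the window

    def _itemsets(self, transaction):
        # Enumerate all itemsets up to the size cap as sorted tuples.
        items = sorted(set(transaction))
        return [
            combo
            for size in range(1, self.max_itemset_size + 1)
            for combo in combinations(items, size)
        ]

    def add(self, transaction):
        """Process one transaction; evict the oldest if the window is full."""
        itemsets = self._itemsets(transaction)
        self.window.append(itemsets)
        self.counts.update(itemsets)
        if len(self.window) > self.window_size:
            for itemset in self.window.popleft():
                self.counts[itemset] -= 1
                if self.counts[itemset] == 0:
                    del self.counts[itemset]

    def frequent(self, min_count):
        """Return itemsets that occur at least `min_count` times in the window."""
        return {s: c for s, c in self.counts.items() if c >= min_count}
```

Each transaction is seen once and only the current window is stored, which is the property that makes the approach usable on streams that cannot be kept for offline processing.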
Median filtering is a smoothing technique for noise removal in images. While there are various implementations of median filtering for a single-core CPU, there are few implementations for accelerators and multi-core systems. Many parallel implementations of median filtering use a sorting algorithm to rearrange the values within a filtering window and take the median of the sorted values. While using sorting algorithms allows for simple parallel implementations, the cost of the sorting becomes prohibitive as the filtering windows grow. This makes such algorithms, sequential and parallel alike, inefficient. In this work, we introduce the first software parallel median filtering that is non-sorting-based. The new algorithm uses efficient histogram-based operations. These reduce the computational requirements of the new algorithm while also accessing the image fewer times. We show an implementation of our algorithm for both the CPU and NVIDIA's CUDA-supported graphics processing unit (GPU). The new algorithm is compared with several other leading CPU and GPU implementations. The CPU implementation has near-perfect linear scaling with a 3.7x speedup on a quad-core system. The GPU implementation is several orders of magnitude faster than the other GPU implementations for mid-size median filters. For small kernels, 3 x 3 and 5 x 5, comparison-based approaches are preferable as fewer operations are required. Lastly, the new algorithm is open-source and can be found in the OpenCV library.
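The general idea of a histogram-based (non-sorting) median filter can be sketched as follows. This is a simple sequential sliding-histogram version for 8-bit grayscale images, not the parallel algorithm of the paper or the OpenCV implementation; the function names and the clamped border handling are assumptions made for this example.

```python
def median_from_histogram(hist, window_size):
    """Return the median by walking the cumulative histogram."""
    target = window_size // 2 + 1  # rank of the median among window_size values
    cumulative = 0
    for value, count in enumerate(hist):
        cumulative += count
        if cumulative >= target:
            return value
    raise ValueError("histogram holds fewer values than window_size")


def median_filter_histogram(image, radius):
    """Median-filter an 8-bit grayscale image given as a list of rows.

    The window is (2*radius+1) x (2*radius+1); coordinates outside the
    image are clamped to the nearest border pixel.
    """
    height, width = len(image), len(image[0])
    side = 2 * radius + 1
    window_size = side * side

    def clamp(v, upper):
        return max(0, min(v, upper))

    output = [[0] * width for _ in range(height)]
    for y in range(height):
        rows = [clamp(y + dy, height - 1) for dy in range(-radius, radius + 1)]
        # Build the histogram of the first window in this row from scratch.
        hist = [0] * 256
        for r in rows:
            for dx in range(-radius, radius + 1):
                hist[image[r][clamp(dx, width - 1)]] += 1
        output[y][0] = median_from_histogram(hist, window_size)
        # Slide right: remove the leaving column and add the entering one.
        for x in range(1, width):
            leaving = clamp(x - radius - 1, width - 1)
            entering = clamp(x + radius, width - 1)
            for r in rows:
                hist[image[r][leaving]] -= 1
                hist[image[r][entering]] += 1
            output[y][x] = median_from_histogram(hist, window_size)
    return output
```

Moving the window by one pixel costs one column of histogram updates plus a scan of at most 256 bins, independent of how the window contents are ordered, whereas a sorting-based filter re-sorts every window.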
Compressive sensing promises to enable bandwidth-efficient on-board compression of astronomical data by lifting the encoding complexity from the source to the receiver. The signal is recovered off-line, exploiting the parallel computation capabilities of graphics processing units (GPUs) to speed up the reconstruction process. However, inherent GPU hardware constraints limit the size of the recoverable signal and the speedup practically achievable. In this work, we design parallel algorithms that exploit the properties of circulant matrices for efficient GPU-accelerated sparse signal recovery. Our approach reduces the memory requirements, allowing us to recover very large signals with limited memory. In addition, it achieves a 10-fold signal recovery speedup, thanks to ad hoc parallelization of matrix-vector multiplications and matrix inversions. Finally, we practically demonstrate our algorithms in a typical application of circulant matrices: deblurring a sparse astronomical image in the compressed domain.
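The memory saving from circulant structure comes from the fact that a circulant matrix is fully described by its first column and that multiplying by it is a circular convolution. The CPU-only sketch below illustrates this with a small radix-2 FFT; it is not the GPU implementation from the paper, and the function names `fft` and `circulant_matvec` are assumptions made for illustration.

```python
import cmath


def fft(values, inverse=False):
    """Unnormalized radix-2 FFT; the length must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


def circulant_matvec(first_column, x):
    """Multiply the circulant matrix C[i][j] = first_column[(i - j) % n] by x.

    Only the first column is stored (n values instead of n * n), and the
    product is evaluated as a circular convolution in the Fourier domain.
    Both inputs are assumed to be real-valued.
    """
    n = len(x)
    if len(first_column) != n:
        raise ValueError("first_column and x must have the same length")
    spectrum = [a * b for a, b in zip(fft(first_column), fft(x))]
    return [v.real / n for v in fft(spectrum, inverse=True)]
```

The same diagonalization means that solving a system with an invertible circulant matrix reduces to elementwise division in the Fourier domain.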
The PageRank algorithm for determining the importance of Web pages has become a central technique in Web search. This algorithm uses the Power method to compute successive iterates that converge to the principal eigenvector of the Markov chain representing the Web link graph. In this work we present an effective heuristic Relaxed and Extrapolated algorithm based on the Power method that accelerates its convergence. A hybrid parallel implementation of this algorithm has been designed by combining several OpenMP threads for each MPI process, and several strategies for distributing data among nodes have been analyzed. The results show that the proposed algorithm can significantly reduce the convergence time with respect to the parallel Power algorithm. © 2016 Civil-Comp Ltd. and Elsevier Ltd. All rights reserved.
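As a baseline for what the abstract accelerates, the plain Power method for PageRank can be sketched as below. This is the standard sequential iteration, not the Relaxed and Extrapolated variant or its MPI/OpenMP implementation; the function name `pagerank_power`, the adjacency-list input format, and the damping factor of 0.85 are assumptions chosen for the example.

```python
def pagerank_power(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """Plain power iteration for PageRank.

    out_links[i] lists the pages that page i links to. Pages without
    out-links (dangling pages) spread their rank uniformly over all pages.
    """
    n = len(out_links)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        dangling = sum(rank[i] for i in range(n) if not out_links[i])
        # Teleportation term plus the uniformly redistributed dangling mass.
        new_rank = [(1.0 - damping) / n + damping * dangling / n] * n
        for i, targets in enumerate(out_links):
            if targets:
                share = damping * rank[i] / len(targets)
                for j in targets:
                    new_rank[j] += share
        change = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if change < tol:  # successive iterates agree in the 1-norm
            break
    return rank
```

Each iteration is a sparse matrix-vector product, which is the operation a parallel implementation would distribute among nodes.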
Thanks to the increasing success of virtualization technologies and processing capabilities of computing devices, the deployment of virtual network functions is evolving towards a unified approach that aims to concentrate a large number of such functions within a limited number of commodity servers. To keep pace with this trend, a key issue to address is the definition of a secure and efficient way to move data between the different virtualized environments hosting the functions and a centralized component that builds the function chains within a single server. This paper proposes an algorithm that realizes this vision and that, by exploiting the peculiarities of this application domain, is more efficient than classical solutions. The algorithm that manages the data exchanges is validated by performing a formal verification of its main safety and security properties, and an extensive functional and performance evaluation is presented. © 2017 Elsevier Inc. All rights reserved.
Real-time simulation is important for fuel cell online diagnostics and hardware-in-the-loop tests before industrial applications. However, it is hard to implement real-time multidimensional, multiphysical fuel cell models because of numerical stiffness in the models. In this paper, the numerical stiffness of a tubular solid oxide fuel cell real-time model is first analyzed to identify the perturbation ranges related to the fuel cell electrochemical, fluidic, and thermal domains. Some of the commonly used ordinary differential equation (ODE) solvers are then tested for real-time simulation. Finally, a novel two-stage third-order parallel stiff ODE solver is proposed to improve the stability and reduce the execution time of the multidimensional real-time fuel cell model. To verify the proposed model and the ODE solver, real-time simulation experiments are carried out on a common embedded real-time platform. The experimental results show that the execution speed satisfies the requirement of real-time simulation. The solver stability under strong stiffness and the high model accuracy are also validated. The proposed real-time fuel cell model and the stiff ODE solver can also help to design the online diagnostic control method.
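Why stiffness is a problem for fixed-step real-time solvers can be seen on the scalar test equation y' = -rate * y. The sketch below contrasts explicit and implicit Euler on that equation; it is a textbook illustration, not the two-stage third-order solver proposed in the paper, and the function names and numeric values are assumptions chosen for the example.

```python
def explicit_euler(rate, y0, step, steps):
    """Explicit Euler for y' = -rate * y: y_next = (1 - step * rate) * y."""
    y = y0
    for _ in range(steps):
        y = y + step * (-rate * y)
    return y


def implicit_euler(rate, y0, step, steps):
    """Implicit Euler for y' = -rate * y: y_next = y / (1 + step * rate)."""
    y = y0
    for _ in range(steps):
        y = y / (1.0 + step * rate)
    return y


if __name__ == "__main__":
    # With rate = 1000 and step = 0.01 the explicit amplification factor is
    # 1 - 10 = -9, so the iterate grows in magnitude at every step, while the
    # implicit factor 1 / 11 decays like the true solution.
    print(explicit_euler(1000.0, 1.0, 0.01, 20))
    print(implicit_euler(1000.0, 1.0, 0.01, 20))
```

An explicit method would need a step below 2 / rate to remain stable here, which is the kind of constraint that makes stiff models difficult to run at a fixed real-time step.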