ISBN (digital): 9798331524937
ISBN (print): 9798331524944
Multiple Sequence Alignment (MSA) is an important operation in Bioinformatics, used to simultaneously compare 3 or more sequences. The MSA problem was proven NP-Hard, so strategies have been proposed to reduce the search space and solve it in parallel. Recently, asymmetric multicore processors (AMPs) have become popular, combining performance cores and energy-efficient cores, such as Intel's P-cores and E-cores. However, parallel MSA applications have complex access patterns, and adapting them for AMPs can be challenging. In this paper, we propose PA-Star2, an asymmetric-aware strategy based on A-Star, which computes optimal MSAs taking asymmetry into account when distributing the search space among threads. Our experimental results show that the proposed optimizations can considerably reduce the average execution time of PA-Star2, achieving a speedup of up to 7.70×. We also show that the asymmetric-aware strategy can reduce the average execution time for one of the hardest sequence sets from the BAliBASE benchmark, when compared to the symmetric counterpart. Finally, we show that our approach is energy-efficient. PA-Star2 is open source and the code is publicly available at https://***/danielsundfeld/astar_msa
ISBN (digital): 9798331524937
ISBN (print): 9798331524944
Current parallel systems are increasingly heterogeneous, mixing devices of different types and computing capabilities. Exploiting multiple different devices for the same application continues to be a challenge that ranges from technical problems related to synchronizing and communicating diverse devices to problems of load distribution and flexibility to adjust the computation to the platform resources. In this work, we study the problem of using and extending a heterogeneous portability layer to program and adapt HSOpticalFlow to heterogeneous platforms. HSOpticalFlow is a streaming application to estimate the apparent movement of objects in a sequence of images. It is a simple but characteristic example of the structure of applications based on multilevel ILS (Iterative Loop Stencil), also known as multi-grid methods, applied to a sequence of inputs. Starting from the original CUDA reference code, we present a methodology and programming techniques based on the Controller programming model to implement it as a pipeline among multiple devices. We discuss a technique to determine a proper work partition and mapping for a set of devices. This allows for building very efficient parallel solutions, using similar devices or taking advantage of devices with lower computing power, to reduce the load and increase the productivity of more powerful ones. We present the results of an experimental study using several GPUs of different vendors, architectures, and generations, showing that this solution allows combinations of devices to be efficiently exploited to improve performance. Specifically, the results include speedups of 1.91x using two NVIDIA A100 GPUs and 1.21x using one NVIDIA V100 GPU and one AMD WX9100 GPU, which is about 3x slower than the NVIDIA GPU for this application.
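To put the reported speedups in context, the following Python sketch computes an idealized upper bound for a set of devices. It is not the paper's partitioning technique: it assumes perfectly divisible work, no communication or synchronization cost, and a speedup measured against the fastest single device. The function names `ideal_speedup` and `proportional_shares` are hypothetical, and the relative speed of 1/3 for the AMD GPU is taken from the abstract's "about 3x slower" figure.

```python
def ideal_speedup(relative_speeds):
    """Upper bound on speedup over the fastest device alone,
    assuming work splits perfectly and communication is free."""
    return sum(relative_speeds) / max(relative_speeds)


def proportional_shares(relative_speeds):
    """Fraction of the work each device gets when the load is
    split in proportion to device speed."""
    total = sum(relative_speeds)
    return [s / total for s in relative_speeds]


# Two identical A100 GPUs: the ideal bound is 2x, reported is 1.91x.
a100_pair = [1.0, 1.0]
a100_efficiency = 1.91 / ideal_speedup(a100_pair)

# One V100 plus one WX9100 that is about 3x slower (assumed 1/3 speed).
mixed_pair = [1.0, 1.0 / 3.0]
mixed_bound = ideal_speedup(mixed_pair)
mixed_efficiency = 1.21 / mixed_bound
mixed_shares = proportional_shares(mixed_pair)

print(f"A100 pair: bound {ideal_speedup(a100_pair):.2f}x, efficiency {a100_efficiency:.1%}")
print(f"V100 + WX9100: bound {mixed_bound:.2f}x, efficiency {mixed_efficiency:.1%}")
print(f"Proportional shares for the mixed pair: {mixed_shares}")
```

Under these assumptions the slower GPU can add at most about one third to the throughput of the V100, so a measured 1.21x is close to the roughly 1.33x ceiling of this simple model.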
Load imbalance is a critical problem that degrades the performance of parallelized applications in massively parallel processing. Although an MPI/OpenMP implementation is widely used for parallelization, users must ma...
ISBN (print): 9781665449663
Parallel multi-dimensional interpolation in the complex Fourier domain (also known as the non-uniform fast Fourier transform) encounters major challenges due to growing computational and space complexity. For instance, the contemporary graphics processing unit (GPU) is limited by its relatively small memory size and the increasing size of the interpolator. This issue makes multi-dimensional Fourier domain interpolation problematic, while finding an optimized configuration remains an unsolved challenge in industrial applications, e.g. magnetic resonance imaging (MRI) or computerized tomography. To enhance the performance of multi-dimensional interpolation on GPU, a new parallel hierarchical tensor products tree approach is proposed. The method combines the composite 1D interpolators under the limitation imposed by the memory size of the device. The resultant run-time performance on the GPU varies with different configurations. The best-tuned method is 2.52-4.98x faster than the compressed sparse row (CSR) on the discrete GPU and 4.16-9.59x faster than CSR on the integrated GPU. The hierarchical tensor product interpolation is used to compute the multi-dimensional non-uniform fast Fourier transform. An acceleration of 30x was achieved in 3D MRI reconstruction.
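The memory trade-off behind combining 1D interpolators can be illustrated with the Kronecker (tensor) product identity in two dimensions. The Python sketch below is only an illustration of that identity, not the paper's hierarchical tree or its GPU implementation; the matrices `A` and `B` are made-up linear-interpolation weights, and all helper names are hypothetical. It shows that applying two small 1D interpolators in sequence gives the same result as one large precomputed 2D interpolator.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(row) for row in zip(*m)]


def kron(a, b):
    """Kronecker product: the explicit multi-dimensional interpolator."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


# Hypothetical 1D interpolators (rows are interpolation weights).
A = [[0.5, 0.5, 0.0],
     [0.0, 0.25, 0.75]]          # 2 outputs from 3 samples along axis 0
B = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 1.0]]                 # 3 outputs from 2 samples along axis 1
X = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]                 # 3 x 2 grid of input samples

# Dense route: build the full interpolator and apply it to the flattened grid.
dense_operator = kron(A, B)
flat_x = [[v] for row in X for v in row]            # row-major column vector
dense_result = [r[0] for r in matmul(dense_operator, flat_x)]

# Separable route: apply the 1D interpolators one axis at a time.
separable = matmul(matmul(A, X), transpose(B))
separable_result = [v for row in separable for v in row]

dense_entries = len(dense_operator) * len(dense_operator[0])
factor_entries = len(A) * len(A[0]) + len(B) * len(B[0])
print(dense_result, separable_result)
print(f"stored entries: dense {dense_entries}, factors {factor_entries}")
```

Even in this tiny case the dense operator stores 36 entries against 12 for the two factors, and the gap grows quickly with grid size and dimension, which is why the choice of how to group the 1D factors matters under a fixed device memory budget.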
ISBN (print): 9781665435741
The image or video input from the camera is one of the important data sources for unmanned vehicles to perceive the environment. However, a 2D/3D bounding box can only provide a very coarse approximation, because one box often contains other targets and background. To address both precise target tracking and the computing limitations of edge devices, this paper proposes Polarmask-Tracker, a lightweight segmentation-based multi-object tracking network for vehicular edge devices. Polarmask-Tracker extends the lightweight Polarmask segmentation head with a tracking vector. The polar mask replaces traditional mask prediction with the regression of a group of fixed edge points in a polar coordinate system, which can greatly reduce the computational complexity and regression difficulty of the mask. With an additional tracking vector branch generated from the mask, the model can learn tracking tasks in an end-to-end manner. Finally, we further accelerate the entire model with TensorRT and achieve real-time tracking on a mobile edge computing platform. Unlike previous evaluations on the ImageNet and COCO datasets, this study uses the KITTI tracking dataset to extend the instance segmentation task to segmentation tracking, also called MOTS. In addition, the targets captured by an autonomous vehicle camera are usually smaller in scale, which brings additional challenges. Evaluations on the NVIDIA Jetson AGX show that the final Polarmask-Tracker achieves 122.55 FPS, 46.57 mAP for mask segmentation, and 56.418 HOTA for tracking.
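The polar mask representation described above can be sketched in a few lines of Python. This is an illustration of the general idea only, not the network's code: a mask is stored as a center point plus one distance per fixed, evenly spaced angle, and the contour is recovered by converting each ray back to Cartesian coordinates. The function names and the choice of 36 rays are assumptions made for the example.

```python
import math


def polar_to_contour(center, ray_lengths):
    """Convert a center and per-angle distances into contour points.
    Ray i points at angle 2*pi*i/n, so only the distances need regressing."""
    cx, cy = center
    n = len(ray_lengths)
    return [
        (cx + d * math.cos(2 * math.pi * i / n),
         cy + d * math.sin(2 * math.pi * i / n))
        for i, d in enumerate(ray_lengths)
    ]


def polygon_area(points):
    """Shoelace formula for the area enclosed by the contour."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# A roughly circular object of radius 10 described by 36 rays (assumed count).
num_rays = 36
contour = polar_to_contour(center=(50.0, 40.0), ray_lengths=[10.0] * num_rays)
area = polygon_area(contour)

print(f"{len(contour)} contour points, enclosed area {area:.2f}")
print(f"true circle area for comparison: {math.pi * 10.0 ** 2:.2f}")
```

The whole mask is thus a fixed-length vector of distances, which is much cheaper to predict than a per-pixel mask and still tighter than a bounding box.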
ISBN (print): 9783031081699; 9783031081682
To satisfy the need for analytical data in the development of digital services, many organizations use data warehouse and, more recently, data lake architectures. These architectures have traditionally been accompanied by centralized organizational models, where a single team or department has been responsible for gathering, transforming, and giving access to analytical data. However, such centralized models presuppose stability and are incompatible with agile software development, where applications and databases are continuously updated. To achieve more agile forms of data management, some organizations have therefore begun to experiment with distributed data management models such as "data meshes". Research on this topic is, however, limited. In this paper, we report findings from a case study of a public sector organization in Norway that has begun the transition from centralized to distributed data management, outlining both the benefits and challenges of a distributed approach.
In this demo, we implement a Partial Evaluation-based distributed RDF Graph system (PEG for short), which can implement partial evaluation without modifying the single-machine RDF graph system at each site. When a que...
ISBN (digital): 9798350365443
ISBN (print): 9798350365450
In OFB mode, the output of the cryptographic algorithm is fed back to the input of the cryptographic algorithm. OFB mode does not encrypt the plaintext directly with the cryptographic algorithm; instead, it generates ciphertext blocks by XORing plaintext blocks with the output of the cryptographic algorithm. In OFB encryption mode, messages are treated as bitstreams, and the output of the block encryption is combined with the message by XOR, so a bit error in the ciphertext does not propagate to other bits. The SM4 algorithm is a symmetric cryptographic technique developed by the Chinese National Cryptographic Administration and standardized as part of the State Encryption Standard in China. SM4 is widely adopted in various applications, including wireless communication systems and secure data transmissions. In this paper, a fast implementation method of SM4 in OFB mode is proposed. We identify the operations in OFB mode that can be parallelized, using the operational relationship between the SM4 algorithm round functions, and ensure the correctness of SM4-OFB encryption through a feedback compensation method.
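The feedback structure of OFB mode described above can be sketched in Python as follows. This is a minimal illustration of the mode only, not the paper's fast implementation and not SM4: the function `toy_block_encrypt` is a hypothetical stand-in built from SHA-256 so the example stays self-contained, and it must not be used as a real cipher. The key, IV, and message are made-up values.

```python
import hashlib

BLOCK = 16  # bytes; SM4 also uses a 128-bit block


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for the SM4 block function (NOT secure)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def ofb_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """OFB keystream generation plus XOR. Encryption and decryption
    are the same operation because XOR is its own inverse."""
    out = bytearray()
    feedback = iv
    for start in range(0, len(data), BLOCK):
        # The cipher output is fed back as the next cipher input.
        feedback = toy_block_encrypt(key, feedback)
        chunk = data[start:start + BLOCK]
        out.extend(d ^ k for d, k in zip(chunk, feedback))
    return bytes(out)


key = b"0123456789abcdef"
iv = b"fedcba9876543210"
message = b"OFB turns a block cipher into a stream cipher."

ciphertext = ofb_xor(key, iv, message)
recovered = ofb_xor(key, iv, ciphertext)

# Flip a single ciphertext bit and decrypt: only that bit is affected.
damaged = bytearray(ciphertext)
damaged[5] ^= 0x01
garbled = ofb_xor(key, iv, bytes(damaged))
diff_positions = [i for i, (a, b) in enumerate(zip(message, garbled)) if a != b]

print(recovered == message)
print(diff_positions)
```

Note that each keystream block depends on the previous one, so the loop is inherently sequential; this dependency is the obstacle that any parallel OFB implementation has to work around.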
ISBN (digital): 9798350375138
ISBN (print): 9798350375145
The large-scale penetration of distributed photovoltaic (PV) power generation systems has brought new challenges to the topology identification and detection of traditional distribution networks. This article studies the topology identification technology (TIT) of low-voltage (LV) distribution network lines with distributed PV, aiming to design a topology identification method that adapts to dynamic changes in the power grid, scales to large networks, and improves accuracy under the same conditions, while also improving data processing speed. The article first constructs a system model, including a node model, an edge model, and a parameter model. It represents the topology of the power grid mathematically using graph theory and designs a topology recognition algorithm based on optimization techniques and state estimation. The algorithm addresses the distributed nature of power grid topology recognition, the difficulty of data collection, the dynamic diversity of the grid structure, the uncertainty of equipment parameters, the high computational complexity of data processing, and communication constraints. It uses modern programming languages and parallel computing frameworks, making it easy to implement efficiently. The results on the simulation platform show that the highest recall rate across 22 test cases is 93.8%, and the response time for the test cases ranges from 425 ms to 980 ms, providing a fast response to the information space of the grid.
As Advanced Persistent Threats (APTs) proliferate and evolve, they constitute an increasingly formidable challenge to organizational cybersecurity frameworks. The imperative for innovative, multifaceted detection methodologies has never been more critical. This manuscript elucidates a groundbreaking framework that ingeniously synergizes Central Processing Unit (CPU) utilization metrics with the principles of Zero Trust architecture. Our approach scrutinizes the nuanced, idiosyncratic patterns of CPU utilization that are indicative of APT activities. Significantly, this method is adept at identifying hallmarks of APTs congruent with criteria delineated in the MITRE ATT&CK framework, specifically in stages antecedent to lateral movement tactics. In parallel, the framework assimilates the austere security policies typified by Zero Trust architecture, culminating in a holistic, dynamically adaptive defense mechanism. Rigorous experimental validations conducted in realistic operational environments substantiate the efficacy of our approach, which attained an unparalleled accuracy rate of 99.7% in the detection of APTs. The manifest advantages of this multifaceted strategy extend beyond mere detection efficacy, offering perspicacious insights into the operational modalities of APTs, thereby fostering the capability for preemptive cybersecurity initiatives. The contributions of this study are poised to significantly augment both academic discourse and practical applications in the persistent endeavor to fortify cybersecurity infrastructures against ever-escalating APT threats.