ISBN: (print) 9798400701405
Graph data are now common across many fields, and graph algorithms are increasingly widely used. However, properties of graph datasets such as sparsity make it very challenging to accelerate graph algorithms on traditional architectures. Among domain-specific architectures (DSAs), most accelerators are designed for one specific graph algorithm, because different graph algorithms have different characteristics. There is still a great need for a general graph processing architecture. In this work, we propose a parallel General Graph processing Architecture, GGPA. As a general graph computing architecture, GGPA supports multiple graph algorithms, realizes parallel processing, and fully exploits parallelism through a unique and effective subgraph partitioning method. GGPA implements flexibility at the execution paradigm level: during algorithm iteration, it dynamically selects the execution paradigm by analyzing vertex update conditions to achieve the best performance. We verify GGPA at the CPU level and the architecture simulator level, and experimental results show that GGPA achieves 1.01x to 5.86x speedup compared to other related state-of-the-art work.
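The idea of choosing an execution paradigm per iteration from the vertex update state can be illustrated with a small sketch. The abstract does not name the paradigms or the selection rule, so everything here is an assumption made for illustration: the sketch uses the familiar push/pull distinction, breadth-first search as the example algorithm, and a hypothetical `threshold` on the fraction of active vertices. It is not GGPA's actual mechanism.

```python
from typing import Dict, List


def bfs_hybrid(adj: Dict[int, List[int]], source: int,
               threshold: float = 0.25) -> Dict[int, int]:
    """BFS levels on an undirected graph, choosing push or pull per iteration.

    `threshold` is a hypothetical tuning knob: when the fraction of active
    (frontier) vertices is below it, the iteration runs in push mode,
    otherwise in pull mode.
    """
    n = len(adj)
    level = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        next_frontier = set()
        if len(frontier) / n < threshold:
            # Push: each active vertex updates its unvisited neighbours.
            for u in frontier:
                for v in adj[u]:
                    if v not in level:
                        level[v] = depth
                        next_frontier.add(v)
        else:
            # Pull: each unvisited vertex checks for an active neighbour.
            for v in adj:
                if v not in level and any(u in frontier for u in adj[v]):
                    level[v] = depth
                    next_frontier.add(v)
        frontier = next_frontier
    return level


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(bfs_hybrid(path, 0))
```

Both modes compute the same levels; only the amount of work per iteration differs, which is why a per-iteration choice can pay off.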
ISBN: (print) 9798350350319; 9798350350302
When Cable-Driven Parallel Robots (CDPRs) perform complex work, obstacles in the environment can interfere with the cables and the mobile platform. Avoiding this interference by planning the path of the CDPR is therefore worthwhile. This paper presents an optimal path planning strategy for a reconfigurable CDPR. A variant of the Rapidly-exploring Random Tree (RRT) gradually finds the optimal collision-avoidance path, while an Artificial Potential Field (APF) guides the generation of random tree nodes; this reduces the time needed to find a path, keeps a safe distance between the platform and obstacles, and yields the shortest collision-free path. A post-processing algorithm reduces the number of path points. The generated path is then checked for interference between obstacles and the cables. If there is interference, the robot's stiffness is optimized and cable tensions are minimized to determine the optimal configuration. Cable obstacle avoidance is achieved by adjusting the location of the cable connection points on the fixed frame. The simulation results demonstrate that the RRT*-APF hybrid path planning algorithm successfully finds a collision-free path for the mobile platform in an environment with obstacles, and that the reconfiguration algorithm finds the best collision-free configuration of the cables.
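How a potential field might guide tree growth can be sketched as follows. This is a minimal 2D illustration under stated assumptions, not the paper's algorithm: the function name `apf_guided_step`, the circular obstacle model, and the gains `k_att`, `k_rep` and `influence` are all hypothetical. The extension direction blends the usual RRT pull toward a random sample with an attractive term toward the goal and a repulsive term away from nearby obstacles.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Circle = Tuple[float, float, float]  # (centre x, centre y, radius)


def _unit(dx: float, dy: float) -> Point:
    """Return the unit vector of (dx, dy), or (0, 0) for a zero vector."""
    norm = math.hypot(dx, dy)
    return (0.0, 0.0) if norm == 0.0 else (dx / norm, dy / norm)


def apf_guided_step(nearest: Point, sample: Point, goal: Point,
                    obstacles: List[Circle], step: float = 0.5,
                    k_att: float = 1.0, k_rep: float = 1.0,
                    influence: float = 2.0) -> Point:
    """Extend the tree from `nearest` by one step of length `step`."""
    # Plain RRT component: head toward the random sample.
    dx, dy = _unit(sample[0] - nearest[0], sample[1] - nearest[1])
    # Attractive component: bias toward the goal.
    ax, ay = _unit(goal[0] - nearest[0], goal[1] - nearest[1])
    dx += k_att * ax
    dy += k_att * ay
    # Repulsive component: push away from obstacles within `influence`.
    for ox, oy, radius in obstacles:
        dist = math.hypot(nearest[0] - ox, nearest[1] - oy) - radius
        if 0.0 < dist < influence:
            rx, ry = _unit(nearest[0] - ox, nearest[1] - oy)
            gain = k_rep * (1.0 / dist - 1.0 / influence)
            dx += gain * rx
            dy += gain * ry
    ux, uy = _unit(dx, dy)
    return (nearest[0] + step * ux, nearest[1] + step * uy)


if __name__ == "__main__":
    print(apf_guided_step((0.0, 0.0), (0.0, 1.0), (0.0, 1.0), [(1.0, 0.0, 0.5)]))
```

A full planner would still need collision checking of each new edge and, for an RRT* variant, the rewiring step; both are omitted here.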
ISBN: (digital) 9789819708598
ISBN: (print) 9789819708581; 9789819708598
Adaptive Bitrate (ABR) algorithms have become increasingly important for delivering high-quality video content over fluctuating networks. Considering the complexity of video scenes, video chunks can be separated into two categories: those with intricate scenes and those with simple scenes. In practice, improving the quality of intricate chunks can lead to more significant improvements in Quality of Experience (QoE) than improving simple chunks. However, current schemes either assign equal priority to all chunks or optimize using a fixed linear-based reward function, making them inadequate for meeting real-world requirements. To tackle these limitations, this paper introduces a novel ABR approach that explicitly considers bitrate adaptation as the primary objective. The proposed approach, CAST (Complex-scene Aware bitrate algorithm via Self-play reinforcemenT learning), leverages the power of parallel computing with multiple agents to train a neural network, aiming to achieve superior video playback quality for intricate scenes while minimizing frequent freezing events. The extensive trace-driven evaluation and subjective test results demonstrate that CAST outperforms existing off-the-shelf schemes.
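For context, a fixed linear reward of the kind the abstract criticizes can be sketched as below. This follows the general form widely used in the ABR literature (quality minus a rebuffering penalty minus a smoothness penalty); the function name and the default penalty weights are assumptions chosen for illustration, and this is not CAST's reward. Note that every chunk contributes with the same weight regardless of scene complexity, which is the limitation the abstract points to.

```python
from typing import Sequence


def linear_qoe(bitrates_mbps: Sequence[float], rebuffer_s: Sequence[float],
               rebuf_penalty: float = 4.3, smooth_penalty: float = 1.0) -> float:
    """Fixed linear QoE over a session: quality - stalls - bitrate switches.

    The penalty weights are illustrative assumptions, not values from the
    paper. One bitrate and one rebuffering time are expected per chunk.
    """
    if len(bitrates_mbps) != len(rebuffer_s):
        raise ValueError("need one rebuffering time per chunk")
    quality = sum(bitrates_mbps)
    stall = rebuf_penalty * sum(rebuffer_s)
    switches = smooth_penalty * sum(
        abs(b - a) for a, b in zip(bitrates_mbps, bitrates_mbps[1:])
    )
    return quality - stall - switches


if __name__ == "__main__":
    print(linear_qoe([1.0, 2.0, 2.0], [0.0, 0.5, 0.0], rebuf_penalty=4.0))
```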
The active millimeter-wave scanner plays an increasingly pivotal role in public safety by employing a non-contact method to detect contraband concealed beneath human clothing. However, millimeter-wave images encounter...
This paper presents a hybrid approach to sentence alignment for the Kazakh-Turkish parallel corpus, addressing the challenges posed by linguistic and structural differences between the two languages. The system is div...
ISBN: (print) 9798350366495; 9798350366488
We consider the problem of cost-effectively mapping a swarm of soft real-time stream processing applications with moldable-parallel tasks to multicore resources in the device-edge-cloud continuum, consisting of mobile devices, edge resources and cloud resources. We leverage the flexibility offered by different parallelization degrees and frequency levels (DVFS) for the tasks, respecting application throughput constraints and communication bandwidth limitations while minimizing overall cost (including device/edge resource energy and cloud resource renting). We present two offline algorithmic solutions with a global view of the environment: an integer linear program (ILP) extending the crown scheduling approach for multi-layer distributed systems, and a greedy heuristic algorithm. Our experimental evaluation for several real-world and synthetic scenarios shows that the time required for solving the scheduling problem to cost-optimality by the ILP is feasible for nontrivial scenarios. The heuristic achieves about 12% worse cost efficiency on average, yet runs much faster (by 1-2 orders of magnitude), allowing it to scale to larger problem sizes than the ILP approach.
In the field of parallel computing, Coarse-Grained Reconfigurable Architecture (CGRA) is a promising technique for processing parallel applications. Application kernels are mapped on CGRA through the calculation of ma...
In the field of image processing and computer vision, negative imaging is vital for applications like medical imaging, art, and data enhancement. This paper introduces an efficient approach for negative image creation...
ISBN: (print) 9798400716409
This study focuses on adapting and enhancing the performance of the Geneformer algorithm on the Cambricon MLU290-M5 smart accelerator card, addressing the limitations posed by Nvidia CUDA dependency in biocomputing. The aim is to broaden the Geneformer model's application scope and elevate scientific productivity in fields like network biology via retraining. The research details the hardware and software platform configurations, including the MLU290-M5's technical specifications, software stack architecture, and adaptive modifications made to the PyTorch source code and transformers library. By leveraging the unique characteristics of the MLU hardware, the research team achieved efficient operation of the Geneformer model on the MLU290-M5. Experimental results demonstrate significant performance improvements, particularly in multi-card configurations, with linear scaling achieved. Full utilization of the MLU290 hardware was achieved through optimizations in data loading, model parallelism and communication protocols. Despite these achievements, challenges remain in further enhancing algorithm performance and hardware utilization efficiency, providing opportunities for future research, especially in improving MLU hardware drivers and related software packages. Overall, this study offers an innovative approach to adapting and optimizing the Geneformer algorithm for smart accelerator cards, paving the way for broader applications and improved scientific productivity in biocomputing.
In this study, the performance of the CNN architectures VGG19, ResNet50, and DenseNet201 is compared in the detection of diseases in potato leaves. Within the scope of this research, the Plant Village database (2152 i...