This special issue is dedicated to examining the rapidly evolving fields of artificial intelligence, mathematical modeling, and optimization, with particular emphasis on their growing importance in computational science. It features the most notable papers from the "Mathematical Modeling and Problem Solving" workshop at PDPTA'24, the 30th International Conference on Parallel and Distributed Processing Techniques and Applications. The issue showcases pioneering research in areas such as natural language processing, system optimization, and high-performance computing. The nine selected studies include novel AI-driven methods for chemical compound generation, historical text recognition, and music recommendation, along with advancements in hardware optimization through reconfigurable accelerators and vector register sharing. Additionally, evolutionary and hyper-heuristic algorithms are explored for sophisticated problem-solving in engineering design, and innovative techniques are introduced for high-speed numerical methods in large-scale systems. Collectively, these contributions demonstrate the significance of AI, supercomputing, and advanced algorithms in driving the next generation of scientific discovery.
ISBN: 9798400705977 (print)
QR decomposition is a numerical method used in many applications, from the High-Performance Computing (HPC) domain to embedded systems. This broad spectrum of applications has drawn academic and commercial attention to developing many software libraries and domain-specific hardware solutions. In the Internet of Things (IoT) domain, multicore Parallel Ultra-Low-Power (PULP) architectures are emerging as energy-efficient alternatives, outperforming conventional single-core devices by coupling parallel processing with near-threshold computing. To the best of the authors' knowledge, our study introduces the first parallelized and optimized implementation of three distinct QR decomposition methods (Givens rotations, Gram-Schmidt process, and Householder transformation) on GAP9, a commercial embodiment of the PULP architecture. Parallel execution on the 8-core cluster leads to a reduction in the total number of cycles by 241% for Givens rotations, 470% for Gram-Schmidt, and 567% for Householder, compared to the GAP9 1-core scenario, while consuming only 0.013 mJ, 0.012 mJ, and 0.216 mJ, respectively. Compared to traditional single-core ARM-based architectures, we achieve 8x, 24x, and 30x better performance and 36x, 35x, and 30x better energy efficiency, paving the way for broad adoption of complex linear algebra tasks in the IoT domain.
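To make the first of the three methods named above concrete, the following is a minimal single-threaded sketch of QR decomposition by Givens rotations. It is not the authors' GAP9 implementation; the function names `givens_qr` and `matmul` and the list-of-lists matrix layout are assumptions chosen for illustration only.

```python
import math


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def givens_qr(a):
    """Return (q, r) such that a = q @ r, with q orthogonal and r upper triangular.

    Hypothetical helper for illustration: each Givens rotation acts on two
    adjacent rows and zeroes one sub-diagonal entry of r.
    """
    m, n = len(a), len(a[0])
    r = [list(map(float, row)) for row in a]
    # q starts as the identity and accumulates the transposed rotations.
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Work upward from the bottom row so earlier zeros are preserved.
        for i in range(m - 1, j, -1):
            if r[i][j] == 0.0:
                continue
            h = math.hypot(r[i - 1][j], r[i][j])
            c, s = r[i - 1][j] / h, r[i][j] / h
            # Apply the rotation to rows i-1 and i of r.
            for k in range(n):
                top, bot = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * top + s * bot
                r[i][k] = -s * top + c * bot
            # Apply the transposed rotation to columns i-1 and i of q.
            for k in range(m):
                left, right = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * left + s * right
                q[k][i] = -s * left + c * right
    return q, r


example = [[12.0, -51.0, 4.0], [6.0, 167.0, -68.0], [-4.0, 24.0, -41.0]]
q_factor, r_factor = givens_qr(example)
```

Rotations that touch disjoint row pairs are independent of each other, which is one reason this method is commonly considered for multicore targets; how the work is actually scheduled on the 8-core cluster is not described in the abstract.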
Today, there are a large number of tasks that involve processing data in the form of arrays. These include, in particular, algorithms for fast orthogonal transformations and cryptographic protection of information, wh...
ISBN: 9798350376548 (print)
The proceedings contain 359 papers. The topics discussed include: parallel channel separate attention network for concealed object detection in millimeter-wave images; YOLOv8 detection head improvements for FPGA deployments; improving the performance of OPTICS on short text clustering by isokernel and UMAP; robust multi-object tracking with CVAE motion model for maritime vessels; molecular property prediction based on graph contrastive learning; a novel design and simulation of band-pass filter for the 5G system based on MEMS; transcending information cocoons: integrating TransR embeddings with tensor decomposition in recommender systems; topic analysis of Chinese documents based on key phrases and latent Dirichlet allocation model; and real-valued beamspace direct position determination for multi-station systems.
ISBN: 9798350350319; 9798350350302 (print)
This article focuses on a cable-driven parallel robot (CDPR) for aircraft spraying. Based on a multi-objective optimization model, the performance of the CDPR is optimized using the Prairie Dog Optimization algorithm. First, a static model of an 8-cable, 6-degree-of-freedom CDPR suitable for aircraft spraying is established, and evaluation indicators are designed for four performance measures: workspace, average stiffness, stiffness fluctuation, and flexibility. A multi-objective optimization model is then built from these performance indicators and solved using the Prairie Dog Optimization algorithm. The results indicate that the Prairie Dog Optimization algorithm, applied to this multi-objective design, is effective in optimizing CDPR performance and provides a useful reference for the design and optimization of aircraft spraying robots. The study offers new ideas and methods for optimizing robot performance and gives guidance for engineering practice.
Blockchain technology inherently necessitates redundant computation to achieve consensus among untrusted parties because of its fundamental threat model. This requirement, however, compromises system performance and i...
ISBN: 9798400709586 (print)
Efficient image processing architectures are consistently in demand across a multitude of applications, particularly those customized for resource-constrained systems-on-chip (SoC). The increasing need for high-performance image processing in various sectors has driven the development of specialized architectures. However, deploying such architectures on platforms with limited resources, such as SoCs, poses significant challenges. Furthermore, the implementation of complex algorithms to handle large datasets using software solutions often leads to slower response times, prompting exploration into hardware implementations. Field-Programmable Gate Arrays (FPGAs) are becoming popular for hardware implementations because of their attributes: low latency, connectivity, parallel computing capabilities, and flexibility. Consequently, the utilization of FPGA-based implementations has resulted in faster and more efficient performance of unique architectures tailored to specific requirements. This paper presents a novel hardware/software co-design approach to implement erosion, dilation, and neighborhood image processing operations on the FPGA development board, "Zedboard". In this approach, the FPGA is programmed by connecting it to a PC via USB, facilitating the transfer of an image pixel by pixel. The pixels are temporarily stored in on-chip DDR and accessed through DMA (Direct Memory Access) until they are requested by an interrupt signal from the Image Processing IP, at which point they are moved to line buffers for faster processing. Once processed, the image is transmitted back to the PC via UART, pixel by pixel, for verification, where it is compared with a reference image generated using Python. This comparison confirms a 99.22% match between the processed image and the reference image, with the discrepancy occurring at the image's edges due to initial padding. Additionally, the time required to process the entire image was measured and displayed
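The abstract does not include the Python reference script, so the following is only a sketch of how a 3x3 erosion/dilation reference and a pixel-match ratio could be computed. The names `morph3x3` and `match_ratio`, the square 3x3 neighborhood, and the constant padding value are all assumptions made for illustration.

```python
def morph3x3(img, op, pad=0):
    """Apply a 3x3 erosion ("erode") or dilation ("dilate") to a grayscale image.

    img is a list of rows of integer pixel values. The border is handled by
    padding one pixel on every side with the constant `pad` (an assumption;
    the padding used in the paper is not specified in the abstract).
    """
    h, w = len(img), len(img[0])
    blank = [pad] * (w + 2)
    padded = [blank] + [[pad] + list(row) + [pad] for row in img] + [blank]
    # Erosion keeps the neighborhood minimum, dilation the maximum.
    pick = min if op == "erode" else max
    return [
        [pick(padded[y + dy][x + dx] for dy in range(3) for dx in range(3))
         for x in range(w)]
        for y in range(h)
    ]


def match_ratio(a, b):
    """Fraction of pixel positions at which two equally sized images agree."""
    total = len(a) * len(a[0])
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


# A 5x5 image with a bright 3x3 square in the middle.
image = [[255 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)] for y in range(5)]
eroded = morph3x3(image, "erode")
dilated = morph3x3(image, "dilate")
```

Running the same operation with two different `pad` values and comparing the outputs with `match_ratio` shows mismatches only along the border, which is consistent with the edge discrepancy the abstract attributes to padding.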
ISBN: 9798400717499 (print)
Liver disease is one of the major health problems worldwide and usually leads to serious complications if not diagnosed accurately and in time. Effective detection and classification of liver pathology at early stages is crucial, and histopathologic examination of liver tissue plays a key role in it. However, manual analysis of histopathological images is easily affected by inter-observer variability. Recent advances in deep learning, on the other hand, have introduced methods to significantly improve the accuracy and efficiency of image-based diagnosis. This study focuses on the application of the You Only Look Once (YOLO) object detection model, specifically YOLOv4, v5, v7, v8, and v9, for automated detection of liver diseases from stained microscopic liver slices. We perform a comprehensive comparative analysis to evaluate the detection accuracy of these models across four common liver conditions: ballooning, fibrosis, inflammation, and steatosis. The results show that the latest versions, in particular YOLOv9, achieve significant improvements in accuracy and computational efficiency compared to other versions. The performance of each model is evaluated in detail, and our results emphasize the potential of the advanced YOLO architecture to enhance medical diagnostics by facilitating faster and more reliable detection of liver disease.
ISBN: 9798350377972 (print)
The proceedings contain 336 papers. The topics discussed include: unveiling the potential of natural language processing in collaborative robots (Cobots): a comprehensive survey; unleashing the power of machine learning for enhanced capabilities in consumer electronics drones; convolution driven vision transformer for the prediction of mild cognitive impairment to Alzheimer’s disease progression; enhancing the fairness and performance of edge cameras with explainable AI; theoretical analysis of serial/parallel variations of hash-mining for smaller variance of confirmation time; fault detection in 3D-printing with deep learning; development of a battery-less wireless sensor node for sediment disaster monitoring system; and evaluation of ensemble learning models for hardware-trojan identification at gate-level netlists.
ISBN: 9798400701405 (print)
The parallel execution of many graph algorithms is frequently dominated by data communication overheads between compute nodes. This bottleneck becomes even more pronounced in Near-Memory Processing (NMP) architectures with multiple memory cubes, as local memory accesses are less expensive. Existing near-memory architectures typically use graph partitioning methods with a fixed vertex assignment, which limits their potential to improve performance and reduce energy consumption. Here, we argue that an NMP-based graph processing system should also consider the distribution of vertices onto memory cubes. We propose SuperCut, a framework for near-memory architectures to effectively reduce communication overheads while maintaining computational balance. We evaluate SuperCut via architectural simulation with 6 real-world datasets and 4 representative applications. The results show that it provides up to 1.8x total energy reduction and 2.6x speedup relative to current state-of-the-art approaches.
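The trade-off described above, fewer cross-cube edges versus balanced load, can be illustrated with a small scoring sketch. This is not SuperCut's algorithm, which the abstract does not specify; `evaluate_assignment` and its two metrics (cut-edge count and maximum load relative to the ideal even load) are assumptions used only to show why the vertex-to-cube mapping matters.

```python
from collections import Counter


def evaluate_assignment(edges, cube_of, num_cubes):
    """Score a vertex-to-memory-cube assignment (hypothetical helper).

    edges     : iterable of (u, v) vertex pairs
    cube_of   : dict mapping each vertex to a cube index
    num_cubes : number of memory cubes available

    Returns (cut_edges, imbalance). cut_edges counts edges whose endpoints
    sit in different cubes, a proxy for inter-cube communication. imbalance
    is the largest per-cube vertex count divided by the ideal even share,
    so 1.0 means perfectly balanced.
    """
    cut_edges = sum(1 for u, v in edges if cube_of[u] != cube_of[v])
    load = Counter(cube_of.values())
    ideal = len(cube_of) / num_cubes
    imbalance = max(load.values()) / ideal
    return cut_edges, imbalance


# A 4-cycle placed on two cubes in two different ways.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
contiguous = {0: 0, 1: 0, 2: 1, 3: 1}
interleaved = {0: 0, 1: 1, 2: 0, 3: 1}
score_contiguous = evaluate_assignment(ring, contiguous, 2)
score_interleaved = evaluate_assignment(ring, interleaved, 2)
```

Both placements are equally balanced, yet the interleaved one cuts every edge while the contiguous one cuts only half of them, which is the kind of difference a fixed vertex assignment cannot exploit.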