ISBN:
(Print) 9783031488023; 9783031488030
High-density EEG is a non-invasive measurement method with millisecond temporal resolution that allows us to monitor how the human brain operates under different conditions. The large amount of data combined with complex algorithms results in unmanageable execution times. Large-scale GPU parallelism provides the means to drastically reduce the execution time of EEG analysis and bring the execution of large cohort studies (over a thousand subjects) within reach. This paper describes our effort to implement various EEG algorithms for multi-GPU pre-exascale supercomputers. Several challenges arise during this work, such as the high cost of data movement and synchronisation compared to computation. A performance-oriented end-to-end design approach is chosen to develop highly scalable, GPU-only implementations of full processing pipelines and modules. Work related to the parallel design of the family of Empirical Mode Decomposition algorithms is described in detail with preliminary performance results of single-GPU implementations. The research will continue with multi-GPU algorithm design and implementation, aiming to achieve scalability up to thousands of GPU cards.
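The core of every algorithm in the Empirical Mode Decomposition family mentioned above is the sifting step. As a rough, illustrative sketch only (not the paper's GPU implementation; real EMD uses cubic-spline envelopes, while this minimal version uses linear interpolation, and all names are ours):

```python
def sift_once(x):
    """One sifting pass: subtract the mean of the upper and lower
    envelopes (piecewise-linear interpolation between local extrema)."""
    n = len(x)
    maxima = [i for i in range(1, n - 1) if x[i - 1] < x[i] >= x[i + 1]]
    minima = [i for i in range(1, n - 1) if x[i - 1] > x[i] <= x[i + 1]]
    if len(maxima) < 2 or len(minima) < 2:
        return None  # too few extrema: signal is a residue, not an IMF candidate

    def envelope(idx):
        # Piecewise-linear envelope through the extrema, extended flat at the ends.
        env = [0.0] * n
        pts = [0] + idx + [n - 1]
        vals = [x[idx[0]]] + [x[i] for i in idx] + [x[idx[-1]]]
        for (i0, v0), (i1, v1) in zip(zip(pts, vals), zip(pts[1:], vals[1:])):
            for j in range(i0, i1 + 1):
                t = (j - i0) / (i1 - i0) if i1 > i0 else 0.0
                env[j] = v0 + t * (v1 - v0)
        return env

    upper, lower = envelope(maxima), envelope(minima)
    return [xi - (u + l) / 2.0 for xi, u, l in zip(x, upper, lower)]
```

Repeating this pass until the local mean vanishes yields one Intrinsic Mode Function; the data-parallel structure of the two envelope passes is what makes the family a natural GPU target.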
ISBN:
(Print) 9783030389901
The proceedings contain 102 papers. The special focus in this conference is on algorithms and architectures for parallel processing. The topics include: A New Robust and Reversible Watermarking Technique Based on Erasure Code; Exit-Less Hypercall: Asynchronous System Calls in Virtualized Processes; Automatic Optimization of Python Skeletal Parallel Programs; Impromptu Rendezvous Based Multi-threaded Algorithm for Shortest Lagrangian Path Problem on Road Networks; FANG: Fast and Efficient Successor-State Generation for Heuristic Optimization on GPUs; DETER: Streaming Graph Partitioning via Combined Degree and Cluster Information; Which Node Properties Identify the Propagation Source in Networks?; t/t-Diagnosability of BCube Network; Strark-H: A Strategy for Spatial Data Storage to Improve Query Efficiency Based on Spark; SWR: Using Windowed Reordering to Achieve Fast and Balanced Heuristic for Streaming Vertex-Cut Graph Partitioning; Multitask Assignment Algorithm Based on Decision Tree in Spatial Crowdsourcing Environment; TIMOM: A Novel Time Influence Multi-objective Optimization Cloud Data Storage Model for Business Process Management; RTEF-PP: A Robust Trust Evaluation Framework with Privacy Protection for Cloud Services Providers; A Privacy-Preserving Access Control Scheme with Verifiable and Outsourcing Capabilities in Fog-Cloud Computing; Utility-Aware Edge Server Deployment in Mobile Edge Computing; Predicting Hard Drive Failures for Cloud Storage Systems; Efficient Pattern Matching on CPU-GPU Heterogeneous Systems; Improving Performance of Batch Point-to-Point Communications by Active Contention Reduction through Congestion-Avoiding Message Scheduling; An Open Identity Authentication Scheme Based on Blockchain; RBAC-GL: A Role-Based Access Control Gasless Architecture of Consortium Blockchain; Flexible Data Flow Architecture for Embedded Hardware Accelerators; Developing Patrol Strategies for the Cooperative Opportunistic Criminals.
ISBN:
(Print) 9798350352368
The proceedings contain 39 papers. The topics discussed include: performance analysis of several CNN-based models for brain MRI in tumor classification; MRI-based lumbar sagittal alignment classification system; 3D mapping and landing zone identification in complex terrains using DSM and photogrammetry; vision language models for oil palm fresh fruit bunch ripeness classification; towards no shadow: region-based shadow compensation on low-altitude urban aerial images; comparative analysis of deep learning architectures for blood cancer classification; exploration of group and shuffle module for semantic segmentation of sea ice concentration; on handcrafted machine learning features for art authentication; and acoustic signature modelling of marine vessels in various environmental and operational conditions.
ISBN:
(Print) 9798400705977
QR decomposition is a numerical method used in many applications, from the High-Performance Computing (HPC) domain to embedded systems. This broad spectrum of applications has drawn academic and commercial attention to developing many software libraries and domain-specific hardware solutions. In the Internet of Things (IoT) domain, multicore Parallel Ultra-Low-Power (PULP) architectures are emerging as energy-efficient alternatives, outperforming conventional single-core devices by coupling parallel processing with near-threshold computing. To the best of the authors' knowledge, our study introduces the first parallelized and optimized implementation of three distinct QR decomposition methods (Givens rotations, Gram-Schmidt process, and Householder transformation) on GAP9, a commercial embodiment of the PULP architecture. Parallel execution on the 8-core cluster reduces the total number of cycles by 241% for Givens rotations, 470% for Gram-Schmidt, and 567% for Householder, compared to the GAP9 1-core scenario, while each of them consumes only 0.013 mJ, 0.012 mJ, and 0.216 mJ, respectively. Compared to traditional single-core architectures based on ARM architectures, we achieve 8x, 24x, and 30x better performance and 36x, 35x, and 30x better energy efficiency, paving the way for broad adoption of complex linear algebra tasks in the IoT domain.
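For reference, one of the three factorization methods compared above can be sketched as follows. This is a minimal, sequential, pure-Python sketch of QR via the modified Gram-Schmidt process for a square matrix; the paper's parallel GAP9 implementation distributes this work across the 8-core cluster, which is not reproduced here:

```python
def qr_mgs(A):
    """QR decomposition of a square matrix (list of rows) via
    modified Gram-Schmidt. Returns (Q, R) with A = Q * R."""
    n = len(A)
    # V[j] holds the j-th column of A, progressively orthogonalized.
    V = [[A[i][j] for i in range(n)] for j in range(n)]
    Qcols = [[0.0] * n for _ in range(n)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = sum(v * v for v in V[j]) ** 0.5
        q = [v / R[j][j] for v in V[j]]
        Qcols[j] = q
        # Immediately remove the q-component from all remaining columns
        # (this is what distinguishes *modified* from classical Gram-Schmidt).
        for k in range(j + 1, n):
            R[j][k] = sum(qi * vi for qi, vi in zip(q, V[k]))
            V[k] = [vi - R[j][k] * qi for vi, qi in zip(V[k], q)]
    # Transpose so Q[i][j] is entry (i, j).
    Q = [[Qcols[j][i] for j in range(n)] for i in range(n)]
    return Q, R
```

The inner update loop over the remaining columns is independent per column, which is exactly the parallelism a multicore cluster can exploit.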
ISBN:
(Digital) 9789819708598
ISBN:
(Print) 9789819708581; 9789819708598
Sharding-based protocols provide an efficient scaling solution for blockchain networks. However, sharded blockchains suffer from transaction verification inefficiency while facing the threat of attacks from malicious peers. To solve the aforementioned issues, this paper proposes a novel transaction processing model applicable to sharded blockchains. Specifically, a transaction admission control algorithm based on a queuing model is established against single-shard flooding attacks, to prevent large volumes of transactions from being injected in a short period of time. We then present a transaction verification mechanism based on threshold signatures, which is more efficient compared with the PBFT protocol. The results of simulation experiments show that our transaction processing model delivers better performance than other advanced sharding protocols. In the case of 16 shards, the model achieves a 3500 TPS throughput improvement and a 4 s transaction latency reduction, and it exhibits good robustness in the case of peer misbehavior.
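The abstract does not detail its queue-based admission control algorithm; purely as an illustrative sketch of the general idea of rate-limiting admission against a flooding burst, here is a token-bucket limiter. All names and parameters are hypothetical and not taken from the paper:

```python
import time

class TokenBucket:
    """Admit a transaction only if a token is available; tokens refill
    at `rate` per second up to `capacity`, so a burst of injected
    transactions beyond the budget is rejected rather than queued."""

    def __init__(self, rate, capacity):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def admit(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A per-shard limiter of this shape bounds how many transactions a single shard will accept in a short window, which is the property the paper's admission control targets.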
the transition to large-scale new energy systems has greatly increased the complexity, volume, and multidimensionality of power data in the cloud, making data management more challenging and potentially risking the sa...
Today, there are a large number of tasks that involve processing data in the form of arrays. these include, in particular, algorithms for fast orthogonal transformations and cryptographic protection of information, wh...
ISBN:
(Print) 9798350367164; 9798350367157
In fields like robotics and factory automation (FA), an ultra-low delay vision system has emerged as a critical tool. This system processes videos at an astonishing 1000 frames per second (fps), with each frame being processed within 1 millisecond (1 ms). Camera pose estimation plays a crucial role by providing real-time and accurate spatial information, enabling rapid response and precise control of production processes in such systems. Existing works mainly focus on general-purpose use (60 fps), with less focus on this specific use case (1000 fps). This paper proposes an ultra-low delay camera pose estimation method designed to meet the stringent requirements of modern FA systems. A spatially constrained, candidate-expanded parallel matching scheme is proposed to accelerate the 3D-2D matching process. The spatial constraint exploits the tiny movements between adjacent frames to shrink the search range and decrease the probability of matching conflicts, while candidate expansion increases the number of candidate matching points to guarantee matching quality between the 2D and 3D spaces, which increases parallelism and enhances the efficiency of the 3D-2D matching process. Experiments are conducted using a 1000 fps camera with a resolution of 640 x 360. The proposed method demonstrates robust performance across horizontal, vertical, and random camera movements, achieving an average Relative Pose Error (RPE) of 0.5 cm for translation and 0.4 degrees for rotation.
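The spatial-constraint idea above can be illustrated with a window-restricted nearest-neighbour search: because inter-frame motion at 1000 fps is tiny, each point only needs to be matched against candidates inside a small radius of its previous position. This is an illustrative sketch with hypothetical names, not the paper's implementation (which also expands to multiple candidates per point and runs the per-point loop in parallel):

```python
def match_in_window(prev_pts, curr_pts, radius):
    """For each previously tracked 2D point, search only within a small
    window around its last position (the spatial constraint) and keep
    the nearest candidate. Points with no candidate in range are dropped.
    Returns a list of (prev_index, curr_index) pairs."""
    r2 = radius * radius
    matches = []
    for i, (px, py) in enumerate(prev_pts):
        best, best_d2 = None, float("inf")
        for j, (cx, cy) in enumerate(curr_pts):
            d2 = (cx - px) ** 2 + (cy - py) ** 2
            # Candidate must lie inside the search window AND improve on
            # the best distance found so far.
            if d2 <= r2 and d2 < best_d2:
                best, best_d2 = j, d2
        if best is not None:
            matches.append((i, best))
    return matches
```

Shrinking `radius` cuts the candidate set per point, which both speeds up the search and reduces the chance that two tracked points contend for the same candidate.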
ISBN:
(Print) 9798350352368
The increased growth in blood cancer cases necessitates the development of efficient, cost-effective, timely, and accurate diagnosis. Traditional diagnostic methods are often invasive, expensive, and time-consuming. The rapid advancement of assistive artificial intelligence (AI) in digital healthcare permits the realization of effective solutions in this framework. Deep learning (DL) in particular seems promising for automated diagnosis. However, a critical gap remains in understanding which DL architecture performs better for blood cancer detection. To address this crucial need, this paper presents a comprehensive comparative analysis of key DL methods used for automated categorization of blood cancer. The considered DL architectures are MobileNetV2, DenseNet121, VGG16, ResNet50, and InceptionV3. Applicability is tested using two blood cancer datasets, namely the Acute Lymphoblastic Leukemia (ALL) dataset and the American Society of Hematology (ASH) image bank. Each model is meticulously trained and evaluated on the ALL dataset for binary classification and on the ASH image bank for multi-class classification. Categorization performance is evaluated based on accuracy, precision, recall, F1 score, and latency. Results show that MobileNetV2 outperforms the other DL architectures, with a mean accuracy of 91.26%, precision of 92.94%, recall of 91.27%, F1 score of 90.58%, and latency of 104.16 min for the ALL dataset, and 88.11% accuracy, 90.23% precision, 88.11% recall, 87.98% F1 score, and latency of 11.16 min for the ASH dataset.
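The evaluation metrics listed above are all derived from confusion counts. As a minimal sketch for the binary-classification case only (illustrative, not the paper's evaluation code; the multi-class ASH setting would average these per class):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    acc = (tp + tn) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0   # guard against no positive predictions
    rec = tp / (tp + fn) if tp + fn else 0.0    # guard against no positive labels
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1
```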
Graphics processing units (GPUs) have become ubiquitous in computing systems, from personal devices to data centres. Their massively parallel architecture delivers tremendous performance for graphics and general-purpo...