Search results - Inner Mongolia University Library

Efficient Method for Parallel Computation of Geodesic Transformation on CPU

IEEE Transactions on Parallel and Distributed Systems, 2020, vol. 31, no. 4, pp. 935-947

Authors: Zlaus, Danijel; Mongus, Domen (Univ Maribor, Fac Elect Engn & Comp Sci, Maribor 2000, Slovenia)

This article introduces a fast Central Processing Unit (CPU) implementation of geodesic morphological operations using stream processing. In contrast to the current state-of-the-art, that focuses on achieving insensitivity to the filter sizes with efficient data structures, the proposed approach achieves efficient computation of long chains of elementary 3 x 3 filters using multicore and Single Instruction Multiple Data (SIMD) processing. In comparison to the related methods, up to 100 times faster computation of common geodesic operators is achieved in this way, allowing for real-time processing (with over 30 FPS) of up to 1500 filters long chains, applied on 1024 x 1024 images. In addition, the proposed approach outperformed GPGPU, and proved to be more efficient than the comparable streaming method for the computation of morphological erosions and dilations with window sizes up to 183 x 183 in the case of using char and 27 x 27 when using double data types.

Keywords: Geodesic operators; mathematical morphology; SIMD; parallel processing; stream processing
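
The "long chains of elementary 3 x 3 filters" mentioned in the abstract can be illustrated with a minimal pure-Python sketch of geodesic dilation: each step applies a 3 x 3 dilation to a marker image and then clamps the result with a pointwise minimum against a mask. This is not the authors' multicore/SIMD implementation; the function names `dilate3x3` and `geodesic_dilation` and the tiny test image are assumptions made for illustration only.

```python
def dilate3x3(img):
    """Elementary 3 x 3 dilation: each pixel becomes the max of its neighbourhood."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = img[y][x]
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:  # ignore out-of-image neighbours
                        best = max(best, img[ny][nx])
            out[y][x] = best
    return out


def geodesic_dilation(marker, mask, steps):
    """Apply up to `steps` elementary geodesic dilations of `marker` under `mask`."""
    # Start from the marker clamped to the mask.
    cur = [[min(m, k) for m, k in zip(mrow, krow)] for mrow, krow in zip(marker, mask)]
    for _ in range(steps):
        dil = dilate3x3(cur)
        nxt = [[min(d, k) for d, k in zip(drow, krow)] for drow, krow in zip(dil, mask)]
        if nxt == cur:  # stable: further steps would change nothing
            break
        cur = nxt
    return cur


# Hypothetical 3 x 4 test image: a zero-valued column separates two bright regions.
mask = [
    [5, 5, 0, 3],
    [5, 5, 0, 3],
    [0, 0, 0, 3],
]
marker = [
    [4, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
result = geodesic_dilation(marker, mask, steps=10)
```

In this toy case the marker value spreads through the left region but cannot cross the zero-valued column of the mask, which is the behaviour that distinguishes geodesic dilation from plain dilation.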

A Message Passing Interface Library for High-Level Synthesis on Multi-FPGA Systems

IEEE International Symposium on Embedded Multicore Socs (MCSoC)

Authors: Kazuei Hironaka; Kensuke Iizuka; Hideharu Amano (Dept. of Information and Computer Science, Keio University, Yokohama, Japan)

ISBN: (print) 9781665465007

One obstacle to application development on multi-FPGA systems with high-level synthesis (HLS) is a lack of support for a programming interface. Implementing and debugging an application on multiple FPGA boards is difficult without a standard interface. Message Passing Interface (MPI) is a standard parallel programming interface commonly used in distributed memory systems. This paper presents a tool-independent MPI library called FiC-MPI that can be used in HLS for multi-FPGA systems in which each FPGA node is connected directly. By using FiC-MPI, various parallel software, including a general-purpose benchmark, can be easily implemented. FiC-MPI was implemented and evaluated on the M-KUBOS cluster consisting of Zynq MPSoC boards connected with a static time-division multiplexing network. By using the FiC-MPI simulator, parallel programs can be debugged before implementing on real machines. As a case study, the Himeno-BMT benchmark was implemented with FiC-MPI. It achieved 178.7 MFLOPS with a single node and scaled to 643.7 MFLOPS with four nodes, and 896.9 MFLOPS with six nodes of the M-KUBOS cluster. Through the implementation, the easiness of developing parallel programs with FiC-MPI on multi-FPGA systems was demonstrated.

Keywords: Multiplexing; Parallel programming; Multicore processing; Message passing; Benchmark testing; Libraries; Software
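
The scaling figures quoted in the abstract (178.7, 643.7 and 896.9 MFLOPS on one, four and six nodes) can be turned into speedup and parallel efficiency with a few lines of arithmetic. The sketch below uses only those three reported numbers; the helper name `scaling` is an assumption, and efficiency is taken here in the usual sense of speedup divided by node count.

```python
def scaling(mflops_by_nodes):
    """Return {nodes: (speedup, efficiency)} relative to the single-node result."""
    base = mflops_by_nodes[1]
    return {n: (perf / base, perf / base / n) for n, perf in mflops_by_nodes.items()}


# MFLOPS values as reported in the abstract for the Himeno-BMT benchmark.
reported = {1: 178.7, 4: 643.7, 6: 896.9}
table = scaling(reported)

for n, (speedup, eff) in table.items():
    print(f"{n} node(s): speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

On these numbers the four-node run reaches roughly 3.6x and the six-node run roughly 5.0x the single-node throughput, so efficiency falls from about 90% to about 84% as nodes are added.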

On the Complexity of Conditional DAG Scheduling in Multiprocessor Systems

34th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Marchetti-Spaccamela, Alberto; Megow, Nicole; Schloeter, Jens; Skutella, Martin; Stougie, Leen (Sapienza Univ Rome, Rome, Italy; INRIA Erable, Paris, France; Univ Bremen, Bremen, Germany; Tech Univ Berlin, Berlin, Germany; Vrije Univ Amsterdam, CWI Amsterdam, Amsterdam, Netherlands)

ISBN: (print) 9781728168760

As parallel processing became ubiquitous in modern computing systems, parallel task models have been proposed to describe the structure of parallel applications. The workflow scheduling problem has been studied extensively over past years, focusing on multiprocessor systems and distributed environments (e.g. grids, clusters). In workflow scheduling, applications are modeled as directed acyclic graphs (DAGs). DAGs have also been introduced in the real-time scheduling community to model the execution of multi-threaded programs on a multi-core architecture. The DAG model assumes, in most cases, a fixed DAG structure capturing only straight-line code. Only recently, more general models have been proposed. In particular, the conditional DAG model allows the presence of control structures such as conditional (if-then-else) constructs. While first algorithmic results have been presented for the conditional DAG model, the complexity of schedulability analysis remains wide open. We perform a thorough analysis on the worst-case makespan (latest completion time) of a conditional DAG task under list scheduling (a.k.a. fixed-priority scheduling). We show several hardness results concerning the complexity of the optimization problem on multiple processors, even if the conditional DAG has a well-nested structure. For general conditional DAG tasks, the problem is intractable even on a single processor. Complementing these negative results, we show that certain practice-relevant DAG structures are very well tractable.

Keywords: parallel processing; makespan; conditional DAG; complexity
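
To make the notion of "worst-case makespan of a conditional DAG task under list scheduling" concrete, here is a minimal brute-force sketch: it enumerates every combination of branch choices, runs non-preemptive list scheduling on each resulting fixed DAG, and keeps the largest completion time. It is not an algorithm from the paper; the data layout (`durations`, `preds`, `conditionals`), the function names and the toy task set are all assumptions, and the enumeration is exponential in the number of conditional constructs.

```python
import heapq
from itertools import product


def list_schedule(durations, preds, priority, m):
    """Makespan of non-preemptive list scheduling on m identical processors."""
    done, started = set(), set()
    running = []  # min-heap of (finish_time, task)
    t = 0
    while len(done) < len(durations):
        # Start ready tasks in fixed priority order while a processor is idle.
        for task in priority:
            if len(running) == m:
                break
            if task in durations and task not in started and preds.get(task, set()) <= done:
                started.add(task)
                heapq.heappush(running, (t + durations[task], task))
        # Advance time to the next completion (assumes the graph is acyclic).
        t, finished = heapq.heappop(running)
        done.add(finished)
        while running and running[0][0] == t:
            done.add(heapq.heappop(running)[1])
    return t


def worst_case_makespan(durations, preds, conditionals, priority, m):
    """Largest makespan over all branch choices.

    `conditionals` is a list of conditional constructs; each construct is a
    list of alternative branches, and each branch is a set of task names.
    """
    optional = set().union(*(b for cond in conditionals for b in cond))
    worst = 0
    for choice in product(*conditionals):
        chosen = set().union(*choice)
        active = {k for k in durations if k not in optional or k in chosen}
        d = {k: durations[k] for k in active}
        p = {k: preds.get(k, set()) & active for k in active}
        worst = max(worst, list_schedule(d, p, priority, m))
    return worst


# Hypothetical task: after `a`, either branch {b} or branch {c, d} executes.
durations = {"a": 1, "b": 4, "c": 3, "d": 4, "e": 2, "z": 1}
preds = {"b": {"a"}, "c": {"a"}, "d": {"a"}, "z": {"b", "c", "d"}}
conditionals = [[{"b"}, {"c", "d"}]]
priority = ["a", "b", "c", "d", "e", "z"]

worst = worst_case_makespan(durations, preds, conditionals, priority, m=2)
```

With two processors, the branch {b} finishes at time 6 and the branch {c, d} at time 7 in this toy instance, so the worst case is determined by the second branch.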

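Real-time demodulation of high-density DAS data based on heterogeneous parallel computing

Journal of Natural Science of Heilongjiang University (黑龙江大学自然科学学报), 2024, vol. 41, no. 1, pp. 90-98

Authors: Zhang Jian (张健); He Xiangge (何向阁); Guo Ying (郭莹); Zhang Min (张敏); Liu Shengchun (刘盛春) (College of Electronic Engineering, Heilongjiang University, Harbin 150080; School of Earth and Space Sciences, Peking University, Beijing 100871; Peking University Dongguan Institute of Opto-Electronics, Dongguan 523808)

To meet the need for real-time demodulation of high-density data in distributed optical fiber acoustic sensing (DAS) systems, a heterogeneous parallel computing architecture based on a central processing unit (CPU) and a graphics processing unit (GPU) is proposed. It demodulates the sensing data of a dual-channel heterodyne DAS system in real time and can handle simultaneous real-time demodulation of 5000 equivalent array elements across the two channels. The system has to process up to 400 MB of data per second. Compared with a computation time of 225.5 s when only the CPU is used, the heterogeneous parallel architecture reduces the computation time to 468.2 ms, a 482-fold speedup. The scheme still leaves substantial spare computing capacity, which can support further improvements in the overall real-time performance of DAS systems.

Keywords: heterogeneous parallel computing; distributed optical fiber acoustic sensing; high-density data; real-time demodulation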

The Influence of Wartime on distributed Team — Challenges, Leadership, Development: Ukraine Case

IEEE International Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications

Authors: Carsten Wolff; Olena Verenych; Kateryna Turchaninova (Dortmund University of Applied Science and Arts, Dortmund, Germany; Kyiv National University of Construction and Architecture, Kyiv, Ukraine)

The technological development of the last few years has made a contribution to the form of the work. The tendency to development of work environmental with features that are like those in real life has gone mainstream. The COVID-19 World Pandemic increased the rapidity of implementing these new environmental policies in businesses. Moreover, globalization and sustainability principles allow businesses to be more intercultural and more responsible for the environment. The approach to creating remote work is “it” thing in business now. There are enough researcher results that evaluate the impact of remote work on teamwork. They marked merits and demerits and lightened the main changes in competencies as a leader and a team member by offering some time management and psychological approaches for organization and supporting high productivity during work time. However, russian aggression Ukraine, which has been starting since 2014 and received the second active phase on February 24, 2022, influenced the remote work. The business routine got new challenges. This paper is the first step toward understanding how a war can influence a team member and how it can influence remote work: Ukraine case. The main research questions that were formulated for this research were: (1) how much influence the war has on remote work; (2) if there has been a change in the team development model; and (3) how much influence the war has had on leadership competencies. The research was based on literature reviews and the authors' own experience.

Optimizing Load Balance in a Parallel CFD Code for a Large-scale Turbine Simulation on a Vector Supercomputer

Supercomputing Frontiers and Innovations, 2021, vol. 8, no. 2, pp. 114-130

Authors: Watanabe, Osamu; Komatsu, Kazuhiko; Sato, Masayuki; Kobayashi, Hiroaki (NEC Corporation, Tokyo, Japan; Tohoku University, Sendai, Japan)

A turbine for power generation is one of the essential infrastructures in our society. A turbine’s failure causes severe social and economic impacts on our everyday life. Therefore, it is necessary to foresee such failures in advance. However, it is not easy to expect these failures from a real turbine. Hence, it is required to simulate various events occurring in the turbine by numerical simulations of the turbine. A multiphysics CFD code, "Numerical Turbine," has been developed on vector supercomputer systems for large-scale simulations of unsteady wet steam flows inside a turbine. To solve this problem, the Numerical Turbine code is a block structure code using MPI parallelization, and the calculation space consists of grid blocks of different sizes. Therefore, load imbalance occurs when executing the code in MPI parallelization. This paper creates an estimation model that finds the calculation time from each grid block’s calculation amount and calculation performance. It proposes an OpenMP parallelization method for the load balance of MPI applications. This proposed method reduces the load imbalance by considering the vector performance according to the calculation amount based on the model. Moreover, this proposed method recognizes the need to reduce the load imbalance without pre-execution. The performance evaluation shows that the proposed method improves the load balance from 24.4 % to 9.3 %. © 2021. The Authors. All Rights Reserved.

Keywords: Vectors
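
The abstract describes estimating each MPI process's calculation time and then using OpenMP threads to even out the load. A minimal sketch of that general idea, under strong simplifying assumptions, is a greedy allocation that hands each spare thread to whichever rank currently has the largest estimated time. Everything here is hypothetical: the names `assign_threads` and `imbalance`, the work values, the assumption that time scales as work divided by thread count (the paper's model also accounts for vector performance), and the imbalance metric `(max - mean) / max`, which is not necessarily the one behind the reported 24.4 % and 9.3 %.

```python
import heapq


def assign_threads(work, total_threads):
    """Greedy: one thread per rank, then each spare thread goes to the slowest rank."""
    threads = [1] * len(work)
    # Max-heap on estimated time, implemented with negated values.
    heap = [(-w, i) for i, w in enumerate(work)]
    heapq.heapify(heap)
    for _ in range(total_threads - len(work)):
        _, i = heapq.heappop(heap)
        threads[i] += 1
        # Assumption: perfect scaling, so estimated time = work / threads.
        heapq.heappush(heap, (-work[i] / threads[i], i))
    return threads


def imbalance(times):
    """Hypothetical metric: relative gap between the slowest rank and the mean."""
    mean = sum(times) / len(times)
    return (max(times) - mean) / max(times)


# Hypothetical per-rank work estimates (arbitrary units) and a budget of 8 threads.
work = [8.0, 4.0, 2.0, 2.0]
threads = assign_threads(work, total_threads=8)

before = imbalance(work)
after = imbalance([w / t for w, t in zip(work, threads)])
```

Because the allocation is driven purely by the estimates, it needs no trial run of the simulation, which mirrors the abstract's point about avoiding pre-execution.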

A Counterfactual Ultrasound Anti-Interference Self-Supervised Network for B-mode Ultrasound Tongue Extraction

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Yan Jia; Yuqing Cheng; Kele Xu; Yong Dou; Peng Qiao; Zhouyu He (National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha, China; College of Systems Engineering, National University of Defense Technology, Changsha, China)

ISBN: (digital) 9798350368741

ISBN: (print) 9798350368758

B-mode ultrasound tongue imaging is a non-invasive and real-time method for visualizing vocal tract deformation. However, accurately extracting the tongue’s surface contour remains a significant challenge due to the low signal-to-noise ratio (SNR) and prevalent speckle noise in ultrasound images. Traditional supervised learning models often require large labeled datasets, which are labor-intensive to produce and susceptible to noise interference. To address these limitations, we present a novel Counterfactual Ultrasound Anti-Interference Self-Supervised Network (CUAI-SSN), which integrates self-supervised learning (SSL) with counterfactual data augmentation, progressively disentangles confounding factors, ensuring that the model generalizes well across varied ultrasound conditions. Our approach leverages causal reasoning to decouple noise from relevant features, enabling the model to learn robust representations that focus on essential tongue structures. By generating counterfactual image-label pairs, our method introduces alternative, noise-independent scenarios that enhance model training. Furthermore, we introduce attention mechanisms to enhance the network’s ability to capture fine-grained details even in noisy conditions. Extensive experiments on real ultrasound tongue images demonstrate that CUAI-SSN outperforms existing methods, setting a new benchmark for automated contour extraction in ultrasound tongue imaging. Our code is publicly available at https://***/inexhaustible419/CounterfactualultrasoundAI.

Keywords: Training; Ultrasonic imaging; Tongue; Self-supervised learning; Data augmentation; Data models; Cognition; Data mining; Noise measurement; Signal to noise ratio

Exploiting Simultaneous Communications to Accelerate Data Parallel Distributed Deep Learning

40th IEEE Conference on Computer Communications (IEEE INFOCOM)

Authors: Shi, Shaohuai; Chu, Xiaowen; Li, Bo (Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China; Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Peoples R China)

ISBN: (print) 9780738112817

Synchronous stochastic gradient descent (S-SGD) with data parallelism is widely used for training deep learning (DL) models in distributed systems. A pipelined schedule of the computing and communication tasks of a DL training job is an effective scheme to hide some communication costs. In such pipelined S-SGD, tensor fusion (i.e., merging some consecutive layers' gradients for a single communication) is a key ingredient to improve communication efficiency. However, existing tensor fusion techniques schedule the communication tasks sequentially, which overlooks their independence nature. In this paper, we expand the design space of scheduling by exploiting simultaneous All-Reduce communications. Through theoretical analysis and experiments, we show that simultaneous All-Reduce communications can effectively improve the communication efficiency of small tensors. We formulate an optimization problem of minimizing the training iteration time, in which both tensor fusion and simultaneous communications are allowed. We develop an efficient optimal scheduling solution and implement the distributed training algorithm ASC-WFBP with Horovod and PyTorch. We conduct real-world experiments on an 8-node GPU cluster of 32 GPUs with 10Gbps Ethernet. Experimental results on four modern DNNs show that ASC-WFBP can achieve about 1.09x-2.48x speedup over the baseline without tensor fusion, and 1.15x-1.35x speedup over the state-of-the-art tensor fusion solution.

Keywords: Distributed Deep Learning; Communication-Efficient; Simultaneous Communications
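
Why tensor fusion helps small tensors can be seen with the common alpha-beta communication cost model, in which each All-Reduce call pays a fixed startup latency plus a per-byte transfer cost. The sketch below is only that textbook model with hypothetical parameters; it does not model simultaneous All-Reduce communications or the ASC-WFBP scheduler, and `ALPHA`, `BETA` and the gradient sizes are illustrative assumptions rather than measurements from the paper.

```python
def allreduce_time(num_bytes, alpha, beta):
    """Alpha-beta cost model: fixed startup latency plus a per-byte transfer cost."""
    return alpha + beta * num_bytes


# Hypothetical parameters, not measurements from the paper.
ALPHA = 1e-4  # startup latency per All-Reduce call, in seconds
BETA = 1e-9   # transfer cost, in seconds per byte

# Hypothetical gradient sizes (bytes) of three consecutive layers.
grad_bytes = [4096, 8192, 16384]

# One All-Reduce per layer: the startup latency is paid three times.
separate = sum(allreduce_time(b, ALPHA, BETA) for b in grad_bytes)

# Fused: the gradients are merged and sent with a single All-Reduce.
fused = allreduce_time(sum(grad_bytes), ALPHA, BETA)

# In this model the saving is exactly (number of tensors - 1) * ALPHA.
saved = separate - fused
```

For tensors this small the startup term dominates, so removing two of the three startup costs cuts most of the communication time in this model.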

Analyzing co-design of agroecology-oriented cropping systems: lessons to build design-support tools

Agronomy for Sustainable Development, 2022, vol. 42, no. 4, p. 72

Authors: Quinio, Maude; Jeuffroy, Marie-Helene; Guichard, Laurence; Salazar, Paola; Detienne, Francoise (Univ Paris Saclay, AgroParisTech, INRAE, UMR Agron, F-78850 Thiverval Grignon, France; Inst Polytech Paris, CNRS, Telecom Paris, I3, SES, Palaiseau, France)

If the challenges involved in agroecological transition are to be addressed, cropping systems (CS) need to be changed profoundly, which in turn requires innovative design adapted to local conditions. This is however by no means an easy task since such design activity requires extensive knowledge on objects and processes rarely studied until now, most of which is distributed among numerous stakeholders. Since the 2000s, research on design in agriculture has aimed at developing participatory methods to support on-farm design of new systems, but few studies have focused on the elaboration of design-support tools. With a view to defining the features of tools intended to support the design of agroecology-oriented cropping systems, ergonomists recommended an analysis of the activities of the future users of these tools in their real work situations. We started out by implementing a diagnosis of use situations, based on observations of real collective design activities. To this end, we took part in six design workshops, which differed in terms of goals and of designers participating (i.e., farmers, advisors, students, or scientists). We first identified the diversity of features of these design situations, and then analyzed three processes across the design workshops: (i) the reformulation of the design goal;(ii) the large exploration of candidate solutions;and (iii) the local adaptation of these solutions while anticipating the on-field implementation. Here, we show, for the first time, the type of reasonings and knowledge that designers and facilitators displayed and used throughout the agroecological cropping system design process. We identify the features that future design-support tools should have to guide co-designers of agroecological CS. Such tools should promote several types of design reasoning and allow the development of external representations of the object under design. Our results provide operational guidelines for the elaboration of new design-support to

Keywords: Use situation; Design process; Design workshop; Agroecology; Distributed knowledge; Design reasoning

Fast Parallel SVM based Arrhythmia Detection on Multiple GPU Clusters

10th IEEE International Conference on Communication Systems and Network Technologies, CSNT 2021

Authors: Latif, Ghazanfar; Alghazo, Jaafar; Butt, Mohsin; Kazimi, Zafar A. (Department of Computer Science, Prince Mohammad Bin Fahd University, Khobar, Saudi Arabia; Department of Computer Engineering, Prince Mohammad Bin Fahd University, Khobar, Saudi Arabia; Prep Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia; Department of Information Technology, Prince Mohammad Bin Fahd University, Khobar, Saudi Arabia)

ISBN: (print) 9780738105239

Regression analysis and classification can be done using a supervised learning technique called Support Vector Machine (SVM) which is one of many such methods. The method creates hyperplanes which are used to analyze data patterns and separate data into multiple classes. The computation complexity of the algorithm is very high for training and testing of large multidimensional datasets. In this work, we propose a scalable and cost-effective method to run SVM that reduces memory usage and computing power. The process uses distributed cloud GPU’s cluster nodes to run the algorithm in parallel on data which is divided into "n" parts. The results obtained from each of the cluster nodes are merged on a master node and the SVM algorithm is applied once more for classification. The study tackles the ECG classification using parallel SVM to investigate heartbeats and brain traces linked with different types of Arrhythmia and Seizure. Experiments performed on real ECG datasets (MIT BIH Diagnostic database and EEG Seizure database) resulted in a classification accuracy of 97.45%. The technique is proven both efficient by reducing training time and with high classification accuracy. The results achieved show that the proposed technique outperforms similar methods proposed in previous literature. © 2021 IEEE.

Keywords: Electrocardiograms
