This article introduces a fast Central Processing Unit (CPU) implementation of geodesic morphological operations using stream processing. In contrast to the current state of the art, which focuses on achieving insensitivity to filter size through efficient data structures, the proposed approach efficiently computes long chains of elementary 3 x 3 filters using multicore and Single Instruction Multiple Data (SIMD) processing. Compared with related methods, it computes common geodesic operators up to 100 times faster, allowing real-time processing (over 30 FPS) of chains up to 1500 filters long applied to 1024 x 1024 images. In addition, the proposed approach outperformed GPGPU and proved more efficient than the comparable streaming method for computing morphological erosions and dilations with window sizes up to 183 x 183 for char data and 27 x 27 for double data.
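To make the idea of a filter chain concrete, the following is a minimal pure-Python sketch of a geodesic dilation built from repeated elementary 3 x 3 dilations, each followed by a pointwise minimum with a mask image. It is not the article's implementation: it uses no SIMD, multicore, or streaming, and the function names `dilate3x3` and `geodesic_dilation` are hypothetical. It relies on the standard fact that a chain of n elementary 3 x 3 dilations is equivalent to a single dilation with a (2n+1) x (2n+1) square window.

```python
def dilate3x3(img):
    """Elementary 3x3 dilation: each pixel becomes the maximum of its
    3x3 neighbourhood (the window is clipped at the image border)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = max(
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            )
    return out


def geodesic_dilation(marker, mask, n):
    """Chain of n elementary geodesic dilations (illustrative sketch).

    Each step dilates the current image with a 3x3 window and then
    clamps it from above by the mask (pointwise minimum)."""
    cur = [[min(a, b) for a, b in zip(mr, kr)] for mr, kr in zip(marker, mask)]
    for _ in range(n):
        dil = dilate3x3(cur)
        cur = [[min(a, b) for a, b in zip(dr, kr)] for dr, kr in zip(dil, mask)]
    return cur


# The marker spreads through the mask but cannot cross the zero barrier.
mask = [[5, 5, 0, 5, 5]]
marker = [[5, 0, 0, 0, 0]]
result = geodesic_dilation(marker, mask, 4)
```

In this sketch every step is a full pass over the image, so the cost grows linearly with the chain length; the article's contribution is making such long chains fast, which this example does not attempt to reproduce.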
ISBN: 9781665465007 (print)
One obstacle to application development on multi-FPGA systems with high-level synthesis (HLS) is a lack of support for a programming interface. Implementing and debugging an application on multiple FPGA boards is difficult without a standard interface. Message Passing Interface (MPI) is a standard parallel programming interface commonly used in distributed memory systems. This paper presents a tool-independent MPI library called FiC-MPI that can be used in HLS for multi-FPGA systems in which each FPGA node is connected directly. With FiC-MPI, a variety of parallel software, including a general-purpose benchmark, can be implemented easily. FiC-MPI was implemented and evaluated on the M-KUBOS cluster, which consists of Zynq MPSoC boards connected by a static time-division multiplexing network. With the FiC-MPI simulator, parallel programs can be debugged before they are implemented on real machines. As a case study, the Himeno-BMT benchmark was implemented with FiC-MPI. It achieved 178.7 MFLOPS on a single node and scaled to 643.7 MFLOPS on four nodes and 896.9 MFLOPS on six nodes of the M-KUBOS cluster. The implementation demonstrated the ease of developing parallel programs with FiC-MPI on multi-FPGA systems.
ISBN: 9781728168760 (print)
As parallel processing became ubiquitous in modern computing systems, parallel task models have been proposed to describe the structure of parallel applications. The workflow scheduling problem has been studied extensively over past years, focusing on multiprocessor systems and distributed environments (e.g. grids, clusters). In workflow scheduling, applications are modeled as directed acyclic graphs (DAGs). DAGs have also been introduced in the real-time scheduling community to model the execution of multi-threaded programs on a multi-core architecture. The DAG model assumes, in most cases, a fixed DAG structure capturing only straight-line code. Only recently have more general models been proposed. In particular, the conditional DAG model allows the presence of control structures such as conditional (if-then-else) constructs. While first algorithmic results have been presented for the conditional DAG model, the complexity of schedulability analysis remains wide open. We perform a thorough analysis of the worst-case makespan (latest completion time) of a conditional DAG task under list scheduling (a.k.a. fixed-priority scheduling). We show several hardness results concerning the complexity of the optimization problem on multiple processors, even if the conditional DAG has a well-nested structure. For general conditional DAG tasks, the problem is intractable even on a single processor. Complementing these negative results, we show that certain practice-relevant DAG structures are tractable.
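The quantity analyzed above can be illustrated with a small brute-force sketch. It simulates non-preemptive list scheduling of a DAG on m processors and takes the maximum makespan over every combination of conditional branches. This is only an illustration under stated assumptions, not the paper's algorithm: the data layout (`wcet`, `preds`, `conditionals`), the function names, and the choice of non-preemptive execution are all hypothetical, and the enumeration is exponential in the number of conditionals.

```python
import heapq
from itertools import product


def list_schedule_makespan(wcet, preds, priority, m):
    """Simulate non-preemptive list scheduling on m processors.

    wcet: vertex -> execution time; preds: vertex -> list of predecessors;
    priority: vertices in decreasing priority. Returns the completion time."""
    remaining = {v: len(preds.get(v, ())) for v in wcet}
    succs = {v: [] for v in wcet}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)
    rank = {v: i for i, v in enumerate(priority)}
    ready = [rank[v] for v in wcet if remaining[v] == 0]
    heapq.heapify(ready)
    running = []  # heap of (finish_time, vertex)
    time, idle = 0, m
    while ready or running:
        # Start the highest-priority ready jobs on the idle processors.
        while ready and idle > 0:
            v = priority[heapq.heappop(ready)]
            heapq.heappush(running, (time + wcet[v], v))
            idle -= 1
        # Advance to the next completion and collect all jobs ending then.
        time, v = heapq.heappop(running)
        finished = [v]
        while running and running[0][0] == time:
            finished.append(heapq.heappop(running)[1])
        idle += len(finished)
        for v in finished:
            for s in succs[v]:
                remaining[s] -= 1
                if remaining[s] == 0:
                    heapq.heappush(ready, rank[s])
    return time


def worst_case_makespan(wcet, preds, priority, m, conditionals):
    """Maximum makespan over all branch choices (brute force).

    conditionals: list of conditionals, each a list of alternative vertex
    lists; exactly one alternative of each conditional executes."""
    worst = 0
    for choice in product(*[range(len(c)) for c in conditionals]):
        skipped = set()
        for cond, pick in zip(conditionals, choice):
            for i, alt in enumerate(cond):
                if i != pick:
                    skipped.update(alt)
        w = {v: c for v, c in wcet.items() if v not in skipped}
        p = {v: [u for u in preds.get(v, ()) if u not in skipped] for v in w}
        order = [v for v in priority if v not in skipped]
        worst = max(worst, list_schedule_makespan(w, p, order, m))
    return worst


# Hypothetical task: after "a", either "b1" or "b2" runs, in parallel with "c".
wcet = {"a": 1, "b1": 4, "b2": 1, "c": 3, "d": 1}
preds = {"b1": ["a"], "b2": ["a"], "c": ["a"], "d": ["b1", "b2", "c"]}
priority = ["a", "b1", "b2", "c", "d"]
conditionals = [[["b1"], ["b2"]]]
worst = worst_case_makespan(wcet, preds, priority, 2, conditionals)
```

For this small example the longer branch happens to give the worst makespan, but the sketch does not rely on that: it evaluates every branch combination, which is why it does not scale and why the complexity results above matter.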
Technological developments of the last few years have changed the form of work. The tendency to develop work environments with features that resemble those of real life has gone mainstream. The COVID-19 pandemic accelerated the implementation of these new work-environment policies in businesses. Moreover, globalization and sustainability principles allow businesses to be more intercultural and more environmentally responsible. Remote work is now the “it” thing in business. A sufficient body of research evaluates the impact of remote work on teamwork. These studies identify its merits and demerits and highlight the main changes in the competencies of leaders and team members, offering time management and psychological approaches for organizing work and supporting high productivity during working hours. However, Russian aggression against Ukraine, which began in 2014 and entered its second active phase on February 24, 2022, has affected remote work. Business routines face new challenges. This paper is a first step toward understanding how a war can influence a team member and remote work, using Ukraine as a case. The main research questions were: (1) how much influence the war has on remote work; (2) whether the team development model has changed; and (3) how much influence the war has had on leadership competencies. The research was based on literature reviews and the authors' own experience.
A turbine for power generation is one of the essential infrastructures in our society. A turbine’s failure causes severe social and economic impacts on our everyday life. Therefore, it is necessary to foresee such fa...
ISBN: 9798350368741 (digital)
ISBN: 9798350368758 (print)
B-mode ultrasound tongue imaging is a non-invasive and real-time method for visualizing vocal tract deformation. However, accurately extracting the tongue’s surface contour remains a significant challenge due to the low signal-to-noise ratio (SNR) and prevalent speckle noise in ultrasound images. Traditional supervised learning models often require large labeled datasets, which are labor-intensive to produce and susceptible to noise interference. To address these limitations, we present a novel Counterfactual Ultrasound Anti-Interference Self-Supervised Network (CUAI-SSN), which integrates self-supervised learning (SSL) with counterfactual data augmentation and progressively disentangles confounding factors so that the model generalizes well across varied ultrasound conditions. Our approach leverages causal reasoning to decouple noise from relevant features, enabling the model to learn robust representations that focus on essential tongue structures. By generating counterfactual image-label pairs, our method introduces alternative, noise-independent scenarios that enhance model training. Furthermore, we introduce attention mechanisms to enhance the network’s ability to capture fine-grained details even in noisy conditions. Extensive experiments on real ultrasound tongue images demonstrate that CUAI-SSN outperforms existing methods, setting a new benchmark for automated contour extraction in ultrasound tongue imaging. Our code is publicly available at https://***/inexhaustible419/CounterfactualultrasoundAI.
ISBN: 9780738112817 (print)
Synchronous stochastic gradient descent (S-SGD) with data parallelism is widely used for training deep learning (DL) models in distributed systems. A pipelined schedule of the computing and communication tasks of a DL training job is an effective scheme to hide some communication costs. In such pipelined S-SGD, tensor fusion (i.e., merging some consecutive layers' gradients for a single communication) is a key ingredient to improve communication efficiency. However, existing tensor fusion techniques schedule the communication tasks sequentially, which overlooks the fact that these tasks are independent. In this paper, we expand the design space of scheduling by exploiting simultaneous All-Reduce communications. Through theoretical analysis and experiments, we show that simultaneous All-Reduce communications can effectively improve the communication efficiency of small tensors. We formulate an optimization problem of minimizing the training iteration time, in which both tensor fusion and simultaneous communications are allowed. We develop an efficient optimal scheduling solution and implement the distributed training algorithm ASC-WFBP with Horovod and PyTorch. We conduct real-world experiments on an 8-node GPU cluster of 32 GPUs with 10 Gbps Ethernet. Experimental results on four modern DNNs show that ASC-WFBP achieves about 1.09x to 2.48x speedup over the baseline without tensor fusion, and 1.15x to 1.35x speedup over the state-of-the-art tensor fusion solution.
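Why tensor fusion helps small tensors can be sketched with a simple latency-plus-bandwidth cost model, in which one All-Reduce of a tensor of size s takes alpha + beta * s. This model, the parameter values, the greedy size-threshold fusion rule, and all function names are assumptions made for illustration; they are not taken from the paper, and the sketch ignores both the overlap with computation and the simultaneous communications that ASC-WFBP exploits.

```python
def allreduce_time(size, alpha, beta):
    """Assumed cost model: fixed startup latency alpha plus beta per unit of data."""
    return alpha + beta * size


def fuse_consecutive(sizes, threshold):
    """Greedily merge consecutive gradient tensors into buckets.

    A bucket is closed as soon as its total size reaches the threshold;
    any leftover tensors form a final, smaller bucket."""
    buckets, current = [], []
    for s in sizes:
        current.append(s)
        if sum(current) >= threshold:
            buckets.append(current)
            current = []
    if current:
        buckets.append(current)
    return buckets


def sequential_comm_time(buckets, alpha, beta):
    """Total time when each bucket is sent as one All-Reduce, one after another."""
    return sum(allreduce_time(sum(b), alpha, beta) for b in buckets)


# Hypothetical per-layer gradient sizes (in KB) and model parameters
# (alpha in microseconds, beta in microseconds per KB).
sizes = [4, 8, 2, 64, 1, 1]
alpha, beta = 100, 1

unfused = sequential_comm_time([[s] for s in sizes], alpha, beta)
fused = sequential_comm_time(fuse_consecutive(sizes, 10), alpha, beta)
```

Under this model the transferred volume is the same in both cases, so the saving comes entirely from paying the startup latency once per bucket instead of once per tensor. That is why the effect is largest when tensors are small relative to alpha / beta.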
If the challenges involved in agroecological transition are to be addressed, cropping systems (CS) need to be changed profoundly, which in turn requires innovative design adapted to local conditions. This is, however, by no means an easy task, since such design activity requires extensive knowledge of objects and processes rarely studied until now, most of which is distributed among numerous stakeholders. Since the 2000s, research on design in agriculture has aimed at developing participatory methods to support on-farm design of new systems, but few studies have focused on the elaboration of design-support tools. With a view to defining the features of tools intended to support the design of agroecology-oriented cropping systems, ergonomists recommended an analysis of the activities of the future users of these tools in their real work situations. We started by diagnosing use situations, based on observations of real collective design activities. To this end, we took part in six design workshops, which differed in their goals and in the designers participating (i.e., farmers, advisors, students, or scientists). We first identified the diversity of features of these design situations, and then analyzed three processes across the design workshops: (i) the reformulation of the design goal; (ii) the broad exploration of candidate solutions; and (iii) the local adaptation of these solutions while anticipating the on-field implementation. Here, we show, for the first time, the types of reasoning and knowledge that designers and facilitators displayed and used throughout the agroecological cropping system design process. We identify the features that future design-support tools should have to guide co-designers of agroecological CS. Such tools should promote several types of design reasoning and allow the development of external representations of the object under design. Our results provide operational guidelines for the elaboration of new design-support to
Regression analysis and classification can be done using a supervised learning technique called Support Vector Machine (SVM) which is one of many such methods. The method creates hyperplanes which are used to analyze ...