ISBN (print): 9798400708688
Parallel/distributed particle filters estimate the states of dynamic systems by using Bayesian inference and stochastic sampling techniques on multiple processing units (PUs). The sampling procedure and the resampling procedure execute alternately to estimate the states in particle filters. There are two basic types of resampling techniques used in parallel/distributed particle filters: centralized resampling and decentralized resampling. The heavy communication between PUs in centralized resampling lowers the speedup factor in parallel computing but improves the estimation accuracy, whereas decentralized resampling avoids this communication and improves performance. Some hybrid resampling techniques mainly execute decentralized resampling and invoke centralized resampling only at constant intervals to achieve good performance without losing estimation accuracy. However, constant intervals cannot guarantee that the centralized resamplings are invoked at the right time. In this study, we propose a hybrid resampling technique with adaptive intervals between centralized resamplings to overcome that issue. The experimental results indicate that the proposed hybrid resampling technique improves both the performance and the estimation accuracy.
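The abstract does not spell out the adaptive criterion; the Python sketch below illustrates one plausible reading, in which an effective-sample-size (ESS) test on the group-level weights decides when to invoke the centralized resampling instead of a fixed interval. All function names and the ESS trigger are illustrative assumptions, not the authors' algorithm.

import numpy as np

def ess(weights):
    """Effective sample size of a normalized weight vector."""
    return 1.0 / np.sum(weights ** 2)

def systematic_resample(particles, weights, rng):
    """Standard systematic resampling; returns equally weighted particles."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    return particles[idx], np.full(n, 1.0 / n)

def hybrid_resampling_step(groups, rng, ess_ratio=0.5):
    """One resampling step over several PU groups, each a (particles, weights) pair.

    Decentralized resampling always runs inside every group; a centralized
    (global) resampling is triggered adaptively when the ESS of the
    group-level weights drops below ess_ratio * number_of_groups,
    rather than at a fixed interval.
    """
    # Total (unnormalized) weight per PU group, then normalize across groups.
    group_w = np.array([w.sum() for _, w in groups])
    group_w = group_w / group_w.sum()

    # Decentralized step: resample locally within each group.
    groups = [systematic_resample(p, w / w.sum(), rng) for p, w in groups]

    # Adaptive trigger: strong imbalance between PUs -> centralized step.
    if ess(group_w) < ess_ratio * len(groups):
        pooled_p = np.concatenate([p for p, _ in groups])
        pooled_w = np.concatenate([np.full(len(p), gw / len(p))
                                   for (p, _), gw in zip(groups, group_w)])
        pooled_p, _ = systematic_resample(pooled_p, pooled_w, rng)
        n = len(pooled_p) // len(groups)
        groups = [(pooled_p[i * n:(i + 1) * n], np.full(n, 1.0 / n))
                  for i in range(len(groups))]
    return groups

# Example: 4 PU groups of 100 one-dimensional particles each.
rng = np.random.default_rng(0)
groups = [(rng.normal(size=100), rng.random(100)) for _ in range(4)]
groups = hybrid_resampling_step(groups, rng)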
ISBN (print): 9783030050627
The proceedings contain 191 papers. The special focus in this conference is on algorithms and architectures for parallel processing. The topics include: Incentivizing multimedia data acquisition for machine learning system; Toward performance prediction for multi-BSP programs in ML; Exploiting the table of energy and power leverages; A semantic web based intelligent IoT model; Accelerating CNNs using optimized scheduling strategy; Data analysis of blended learning in Python programming; APs deployment optimization for indoor fingerprint positioning with adaptive particle swarm algorithm; Deployment optimization of indoor positioning signal sources with fireworks algorithm; A study of sleep stages threshold based on multiscale fuzzy entropy; QoS-driven service matching algorithm based on user requirements; Blind estimation algorithm over fast-fading multipath OFDM channels; Facial shape and expression transfer via non-rigid image deformation; P-schedule: erasure coding schedule strategy in big data storage system; Answer aggregation of crowdsourcing employing an improved EM-based approach; A parallel fast Fourier transform algorithm for large-scale signal data using Apache Spark in cloud; Task offloading in edge-clouds with budget constraint; Motion trajectory sequence-based map matching assisted indoor autonomous mobile robot positioning; Towards the independent spanning trees in the line graphs of interconnection networks; POEM: pricing longer for edge computing in the device cloud; Mobility analysis and response for software-defined internet of things; Research on overload classification method for bus images based on image processing and SVM; DStore: a distributed cloud storage system based on smart contracts and blockchain; Towards an efficient and real-time scheduling platform for mobile charging vehicles; Streaming ETL in polystore era.
ISBN (digital): 9798350371369
ISBN (print): 9798350371376
The constantly evolving cloud ecosystems are promising environments for the execution of science-intensive applications within the framework of the new concept of Workflow-as-a-Service (WaaS). WaaS platforms, being so-called multitenant environments, provide efficient mechanisms for managing continuous and heterogeneous job-flows in cloud computing. However, in this scenario, along with the complexity of scheduling composite workflows, an additional problem arises of efficiently managing the cloud resources, that is, determining the start and shutdown times of virtual resources while taking the available economic policy into account. To handle this problem, we propose a modification of the critical jobs method, which has been the basis of the Systems Design course in the Department of Computing Technologies for over 15 years. The resulting solution combines several heuristics to optimize cloud resource management for WaaS platforms and largely determined the successful completion of the RSF project. Experiments with real-world workflows confirm the optimization efficiency of the proposed approach.
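The heuristics themselves are not detailed in the abstract; as one illustration of the kind of economic-policy-aware start/shutdown decision involved, the Python sketch below keeps an idle virtual machine alive until shortly before the end of its already-paid billing interval, a common rule under per-interval billing. The names and the billing rule are illustrative assumptions, not the authors' method.

from dataclasses import dataclass

@dataclass
class Vm:
    start_time: float   # when the VM was started
    busy_until: float   # completion time of the last job assigned to it

def should_shut_down(vm: Vm, now: float, billing_interval: float = 3600.0,
                     safety_margin: float = 60.0) -> bool:
    """Shut an idle VM down only just before the next paid billing boundary.

    Keeping the VM until the interval it has already been charged for runs
    out costs nothing extra and lets newly arriving workflow jobs reuse it.
    """
    if now < vm.busy_until:          # still executing jobs
        return False
    elapsed = now - vm.start_time
    remaining_in_interval = billing_interval - (elapsed % billing_interval)
    return remaining_in_interval <= safety_margin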
Recent advances in convolutional neural networks have shown promise for a wide range of engineering applications including production quality assessment. This has been enabled in part by the availability of open-sourc...
ISBN (print): 9783030816827; 9783030816810
In the past we have seen two major "walls" (memory and power) whose vanquishing required significant advances in architecture. This paper discusses evidence of a third wall dealing with data locality, which is prevalent in data-intensive applications where computation is dominated by memory access and movement, not flops. Such applications exhibit large sets of often persistent data, with little reuse during computation, no predictable regularity, significantly different scaling characteristics, and where streaming is becoming important. Further, as we move to highly parallel algorithms (as in running in the cloud), these issues will get even worse. Solving such problems will take a new set of innovations in architecture. In addition to data on the new wall, this paper looks at one possible technique, the concept of migrating threads, and gives evidence of its potential value based on several benchmarks that have scaling difficulties on conventional architectures.
ISBN (digital): 9798350350890
ISBN (print): 9798350350906
This study delves into the intersection of data science and machine learning methodologies, specifically examining the recognition of Artificial Intelligence (AI)-generated papers and images. We curate a comprehensive dataset comprising 200 papers and 855 images, meticulously segregating them into AI-generated and non-AI categories, and transform this raw data into structured formats suitable for rigorous analysis. For text classification, we employ a multicore Support Vector Machine (SVM) model, optimized through cross-validation and grid search, to accurately distinguish AI-authored papers from human-penned ones. For image classification, we develop an enhanced model that builds upon the strengths of the ResNet and DenseNet architectures, achieving high accuracy in discerning AI-generated images. Furthermore, we integrate these two classification systems within a weighted, top-level decision framework, offering a holistic approach to AI content recognition. The proposed methodology and findings offer a novel perspective on AI content detection, with potential applications in copyright protection, academic integrity assessment, and media content monitoring. This research contributes to advancing the state of the art in AI content identification and underscores the importance of robust tools for managing the proliferation of AI-generated content.
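As an illustration of the weighted top-level decision framework described above, the Python sketch below combines the two classifiers' probabilities with fixed weights; the weights, threshold, and labels are illustrative, not the paper's tuned values.

def fused_decision(p_text_ai: float, p_image_ai: float,
                   w_text: float = 0.6, w_image: float = 0.4,
                   threshold: float = 0.5) -> str:
    """Combine per-modality probabilities that a submission is AI-generated.

    p_text_ai  -- probability from the SVM text classifier
    p_image_ai -- probability from the ResNet/DenseNet-based image classifier
    The weights are illustrative; in practice they would be tuned on a
    validation split (e.g. by grid search, as done for the SVM).
    """
    score = w_text * p_text_ai + w_image * p_image_ai
    return "AI-generated" if score >= threshold else "human-made"

# Example: strong text signal, weak image signal.
print(fused_decision(0.92, 0.35))   # -> AI-generated (0.6*0.92 + 0.4*0.35 = 0.692)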
ISBN (digital): 9798350305463
ISBN (print): 9798350305470
Pulse compression algorithms play an important role in achieving better range resolution (RR) in automotive radar and ultrasonic sensors. The RR is the sensor's capacity to distinguish between two targets positioned in the same angular direction but at different distances. The digital signal processing architecture (DSPA) used in the pulse compressor is crucial for reaching high speeds or reducing hardware needs. This work presents two radar pulse compressor DSPAs for Barker-7 sequences that trade off latency against area. One proposed design uses the unfolding DSPA technique, which increases the sample rate and speed, while the other, the folding DSPA approach, reduces hardware utilization and power consumption. The proposed DSPA designs are implemented and verified with the Artix-7 FPGA kit. The unfolded Barker-7 pulse compressor increased the speed by 1.87 times compared to the standard broadcasting Barker-7 realization, as demonstrated by the hardware implementation results. The folded Barker-7 architecture consumed 89% less power than the existing Barker-7 broadcast architecture. The proposed DSPAs for the Barker-7 pulse compressor are well suited to ultrasonic, sonar, and radar applications that require high speed and low power.
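The folded and unfolded DSPAs are hardware designs, but the operation they implement, correlating the received echo with the Barker-7 code, can be sketched as a software reference model; the Python snippet below is such a reference, not the proposed architectures, and the example echo is made up for illustration.

import numpy as np

BARKER7 = np.array([1, 1, 1, -1, -1, 1, -1], dtype=float)

def pulse_compress(rx: np.ndarray) -> np.ndarray:
    """Matched-filter pulse compression: correlate the echo with Barker-7.

    The compressed output has a narrow mainlobe at each target delay, which
    is what yields the improved range resolution mentioned in the abstract.
    """
    return np.correlate(rx, BARKER7, mode="full")

# Two closely spaced targets (3 samples apart) plus noise.
rng = np.random.default_rng(0)
rx = np.zeros(64)
rx[10:17] += BARKER7          # target 1 at delay 10
rx[13:20] += 0.8 * BARKER7    # target 2 at delay 13
rx += 0.1 * rng.standard_normal(rx.size)
compressed = pulse_compress(rx)
print(int(np.argmax(compressed)))   # peak near target-1 delay + code length - 1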
ISBN (digital): 9798350355673
ISBN (print): 9798350355680
Owing to the strong demand for cryptographic primitives in resource-constrained environments, many lightweight cipher designs have sprung up over the last twenty years. PRINCE is a hardware-oriented block cipher that meets the demand for low-latency encryption and has a compact design. Many hardware implementations have been proposed; however, there are few software implementations optimized for PRINCE. In this paper, we show how to accelerate the software performance of the PRINCE block cipher using GPU programming. Our implementation covers the transfer of data between the CPU and the GPU and the parallel execution of the PRINCE kernel on the GPU. Several optimization techniques are adopted in our implementation, such as combining multiple lookup tables, function simplification, allocating suitable memories for different components and speeding up memory access, using unified memory management to speed up data transfer, and tuning the size of the thread block. With a GTX 1070 GPU card in a common laptop, the throughput of the whole data processing pipeline reaches about 64 Gbps, a speedup of about 4000 times over the original CPU implementation.
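The paper's GPU kernels are not reproduced here; as an illustration of the "combining multiple lookup tables" optimization it mentions, the Python sketch below merges a 4-bit S-box step and a per-nibble rotation into one precomputed byte table, so a single memory access replaces several. The component functions are illustrative placeholders, not PRINCE's actual round operations.

# Illustrative 4-bit S-box and a second per-nibble step (not PRINCE's round).
SBOX = [0xB, 0xF, 0x3, 0x2, 0xA, 0xC, 0x9, 0x1,
        0x6, 0x7, 0x8, 0x0, 0xE, 0x5, 0xD, 0x4]
ROTATE = [((x << 1) | (x >> 3)) & 0xF for x in range(16)]   # 4-bit left rotate

# Combined table: one lookup per byte applies both steps to both nibbles.
COMBINED = [
    (ROTATE[SBOX[b >> 4]] << 4) | ROTATE[SBOX[b & 0xF]]
    for b in range(256)
]

def round_step_two_lookups(state: bytes) -> bytes:
    """Reference: apply S-box then rotate, nibble by nibble (4 lookups per byte)."""
    out = bytearray()
    for b in state:
        hi, lo = ROTATE[SBOX[b >> 4]], ROTATE[SBOX[b & 0xF]]
        out.append((hi << 4) | lo)
    return bytes(out)

def round_step_combined(state: bytes) -> bytes:
    """Optimized: one table lookup per byte using the merged table."""
    return bytes(COMBINED[b] for b in state)

block = bytes(range(8))
assert round_step_two_lookups(block) == round_step_combined(block)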
ISBN (print): 9783030856656; 9783030856649
As the gap between compute and I/O performance tends to grow, modern High-Performance Computing (HPC) architectures include a new resource type: an intermediate persistent fast memory layer called burst buffers. This is just one of many kinds of renewable resources that are orthogonal to the processors themselves, such as network bandwidth or software licenses. Ignoring orthogonal resources while making scheduling decisions just for processors may lead to unplanned delays of jobs whose resource requirements cannot be immediately satisfied. We focus on a classic problem of makespan minimization for parallel-machine scheduling of independent sequential jobs with additional requirements on the amount of a single renewable orthogonal resource. We present an easily implementable log-linear algorithm that we prove to be a 25/6-approximation. In simulation experiments, we compare our algorithm to standard greedy list-scheduling heuristics and show that, compared to LPT, resource-based algorithms generate significantly shorter schedules.
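The 25/6-approximation algorithm itself is not reproduced here; the Python sketch below shows the kind of resource-aware greedy list scheduling the paper compares against, where jobs in LPT order are started only when both a machine and enough of the orthogonal resource (e.g. burst-buffer capacity) are free. Names and the example instance are illustrative.

import heapq

def resource_aware_lpt(jobs, machines, resource_capacity):
    """Greedy LPT-style schedule with one renewable orthogonal resource.

    jobs: list of (processing_time, resource_demand); each job occupies one
    machine and its resource amount for its whole execution. Assumes every
    demand <= resource_capacity and machines >= 1. Returns the makespan.
    """
    pending = sorted(jobs, key=lambda j: -j[0])   # longest processing time first
    running = []                                  # heap of (finish_time, resource_released)
    free_machines = machines
    free_resource = resource_capacity
    now = 0.0
    makespan = 0.0

    while pending or running:
        # Start every pending job that currently fits (machine + resource).
        still_pending = []
        for p, r in pending:
            if free_machines > 0 and r <= free_resource:
                heapq.heappush(running, (now + p, r))
                free_machines -= 1
                free_resource -= r
                makespan = max(makespan, now + p)
            else:
                still_pending.append((p, r))
        pending = still_pending
        if running:
            # Advance time to the next completion and release its resources.
            now, released = heapq.heappop(running)
            free_machines += 1
            free_resource += released
    return makespan

# 3 machines, resource capacity 10: the two big jobs cannot share the resource.
print(resource_aware_lpt([(5, 6), (4, 6), (3, 2), (2, 2)], 3, 10))   # -> 9.0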
ISBN (print): 9781665413343
Static learning is a learning algorithm for finding additional implicit implications between gates in a netlist. In automatic test pattern generation (ATPG), the learned implications help recognize conflicts and redundancies early and thus greatly improve the performance of ATPG. Though ATPG can further benefit from multiple runs of incremental or dynamic learning, this is only feasible when the learning process is fast enough. In this paper, we study speeding up static learning through parallelization on a heterogeneous computing platform that includes multi-core microprocessors (CPUs) and graphics processing units (GPUs). We discuss the advantages and limitations of each of these architectures. With their specific features in mind, we propose two different parallelization strategies tailored to multi-core CPUs and GPUs, respectively. The speedup and performance scalability of the two proposed parallel algorithms are analyzed. To the best of our knowledge, this is the first time that parallel static learning has been studied in the literature.
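As a rough illustration of what static learning computes (though not of the paper's CPU/GPU parallelization), the Python sketch below injects each value on each signal of a tiny AND-gate netlist, propagates direct implications, and records the contrapositive of every indirect consequence as a learned implication. The netlist and the simplified implication rules are illustrative assumptions; a parallel version would distribute the outer loop over signals across workers.

# Tiny netlist of 2-input AND gates: output -> (input1, input2). Illustrative only.
AND_GATES = {"d": ("a", "b"), "e": ("b", "c"), "f": ("d", "e")}

def imply(assignments):
    """Propagate direct AND-gate implications to a fixed point; None on conflict."""
    values = dict(assignments)
    changed = True
    while changed:
        changed = False
        for out, (i1, i2) in AND_GATES.items():
            v1, v2 = values.get(i1), values.get(i2)
            # Forward implications: both inputs 1 -> 1; any input 0 -> 0.
            implied = 1 if (v1 == 1 and v2 == 1) else (0 if 0 in (v1, v2) else None)
            if implied is not None and values.get(out) != implied:
                if values.get(out) is not None:
                    return None          # conflict
                values[out] = implied
                changed = True
            # Backward implication: output 1 forces both inputs to 1.
            if values.get(out) == 1:
                for i in (i1, i2):
                    if values.get(i) is None:
                        values[i] = 1
                        changed = True
                    elif values[i] == 0:
                        return None      # conflict
    return values

def static_learning(signals):
    """For every signal and value, record contrapositive (learned) implications."""
    learned = []
    for s in signals:
        for v in (0, 1):
            implied = imply({s: v})
            if implied is None:
                continue
            for t, w in implied.items():
                if t != s:
                    # (s = v) => (t = w), so learn (t = 1-w) => (s = 1-v).
                    learned.append(((t, 1 - w), (s, 1 - v)))
    return learned

for premise, conclusion in static_learning({"a", "b", "c", "d", "e", "f"}):
    print(premise, "=>", conclusion)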