The solution of over-determined equations plays a very important role in fields such as data fitting, signal processing, and machine learning. It is of great significance in predicting natural phenomena, optimizing en...
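Although the abstract is truncated here, the problem it refers to has a standard concrete form: an over-determined system Ax = b (more equations than unknowns) generally has no exact solution, so one minimizes ||Ax - b||. A minimal least-squares sketch in Python, illustrative rather than taken from the paper:

```python
import numpy as np

# Over-determined system: 100 equations, 3 unknowns, so Ax = b has no
# exact solution in general; least squares minimizes ||Ax - b||_2.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))
x_true = np.array([1.5, -2.0, 0.5])
b = A @ x_true + 0.01 * rng.normal(size=100)  # noisy observations

# NumPy solves the least-squares problem via the SVD.
x_hat, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x_hat)  # close to x_true
```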
Automatic Modulation Classification (AMC) is a technique for identifying signal modulations in applications such as IoT devices, cognitive radar, software-defined radio, and electronic warfare. As IoT devices see ever wider deployment, AMC algorithms must become compact enough for resource-constrained embedded devices while maintaining acceptable accuracy. Although current AMC algorithms deliver high accuracy, they require substantial computing power, making them unsuitable for IoT devices. This paper introduces the novel Chessboard-based Automatic Modulation Classification (CAMC) algorithm. Test results show that CAMC achieves 99% accuracy (in most runs) at an SNR of 3 dB and 100% above 5 dB. The algorithm is scalable, demands little computing power, and offers better accuracy than state-of-the-art AMC algorithms while classifying the modulations common in IoT devices: BPSK, QPSK, 8PSK, and 16QAM. Additionally, CAMC is hardware-friendly due to its inherent parallelism and scalability. The novelty of this paper is classifying four different modulations in a computationally light, hardware-friendly way while achieving accuracy above 99% (in most runs) at SNRs of 3 dB and higher.
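The abstract does not describe the mechanics of the chessboard scheme. One plausible reading, sketched below purely as an illustration (all names, grid sizes, and thresholds are hypothetical, not taken from the paper), is to bin received I/Q constellation points into a coarse 2D grid, the "chessboard", and match the occupancy pattern against the ideal pattern of each candidate modulation; the per-cell work is independent, which would be consistent with the claimed parallelism.

```python
import numpy as np

# Hypothetical sketch: bin I/Q samples into an n x n "chessboard" and
# match cell occupancy against ideal BPSK/QPSK/8PSK/16QAM patterns.
MODS = {
    "BPSK": np.array([1, -1], dtype=complex),
    "QPSK": np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4))),
    "8PSK": np.exp(1j * np.pi / 4 * np.arange(8)),
    "16QAM": np.array([x + 1j * y for x in (-3, -1, 1, 3)
                       for y in (-3, -1, 1, 3)]) / 3,
}

def occupancy(symbols, n=8, lim=1.5):
    """Mark which chessboard cells contain at least one symbol."""
    grid = np.zeros((n, n), dtype=bool)
    iq = symbols.view(float).reshape(-1, 2)          # (re, im) pairs
    ij = np.clip(((iq + lim) / (2 * lim) * n).astype(int), 0, n - 1)
    grid[ij[:, 0], ij[:, 1]] = True
    return grid

def classify(rx):
    """Pick the modulation whose ideal grid best matches the received one."""
    g = occupancy(rx)
    scores = {m: np.sum(g == occupancy(pts)) for m, pts in MODS.items()}
    return max(scores, key=scores.get)

rng = np.random.default_rng(1)
tx = rng.choice(MODS["QPSK"], size=2000)
noise = 0.05 * (rng.normal(size=2000) + 1j * rng.normal(size=2000))
print(classify(tx + noise))  # QPSK
```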
Deep learning's widespread adoption in various fields has made distributed training across multiple computing nodes essential. However, frequent communication between nodes can significantly slow training, creating a bottleneck in distributed training. To address this issue, researchers have focused on communication optimization algorithms for distributed deep learning systems. In this paper, we propose a standard, grounded in mathematical modeling, that systematically classifies communication optimization algorithms; existing surveys in the field do not provide such a classification. We categorize existing works into four categories according to their communication optimization strategy: communication masking, communication compression, communication frequency reduction, and hybrid optimization. Finally, we discuss potential future challenges and research directions in the field of communication optimization algorithms for distributed deep learning systems.
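As a concrete instance of the communication-compression category, top-k gradient sparsification transmits only the largest-magnitude gradient entries each step. A minimal sketch of that generic technique (an example of the category, not an algorithm from this survey):

```python
import numpy as np

def topk_compress(grad, k):
    """Keep the k largest-magnitude entries; send (indices, values).

    The message shrinks from len(grad) floats to k index/value pairs,
    trading a small accuracy loss for much less communication.
    """
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def topk_decompress(idx, vals, n):
    """Rebuild a (sparse) gradient on the receiving node."""
    out = np.zeros(n)
    out[idx] = vals
    return out

grad = np.random.default_rng(0).normal(size=1_000_000)
idx, vals = topk_compress(grad, k=10_000)        # send ~1% of entries
restored = topk_decompress(idx, vals, grad.size)
print(np.linalg.norm(grad - restored) / np.linalg.norm(grad))
```

In practice such schemes are usually paired with error feedback, accumulating the dropped entries locally so they are eventually transmitted.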
A parallel algorithm for enumerating parse trees of a given string according to a fixed context-free grammar is defined. The algorithm computes the number of parse trees of an input string; more generally, it applies t...
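Although the preview is cut off, counting parse trees is a classic dynamic program, a counting variant of CYK, whose level-by-level table fill is also the natural source of parallelism. A minimal serial sketch, assuming a grammar in Chomsky normal form:

```python
from collections import defaultdict

def count_parse_trees(binary_rules, terminal_rules, start, s):
    """Count parse trees of s via CYK-style dynamic programming.

    binary_rules:   (A, B, C) triples meaning A -> B C
    terminal_rules: (A, a) pairs meaning A -> 'a'
    Counts for spans of length L depend only on shorter spans, so the
    table can be filled one diagonal at a time (in parallel, per span).
    """
    n = len(s)
    # table[i][j][A] = number of parse trees of s[i:j] rooted at A
    table = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(s):
        for A, a in terminal_rules:
            if a == ch:
                table[i][i + 1][A] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):              # split point
                for A, B, C in binary_rules:
                    table[i][j][A] += table[i][k][B] * table[k][j][C]
    return table[0][n][start]

# Ambiguous grammar S -> S S | 'a': "aaaa" has Catalan(3) = 5 parse trees.
print(count_parse_trees([("S", "S", "S")], [("S", "a")], "S", "aaaa"))
```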
A fuzzy-model-based approach is developed to investigate reinforcement-learning-based optimization for nonlinear Markov jump singularly perturbed systems. As a first attempt, an offline parallel iteration learning algorithm is presented to solve the coupled algebraic Riccati equations with singular perturbation and jumping parameters. Furthermore, based on the integral reinforcement learning approach, a novel online parallel learning algorithm is proposed that employs slow and fast sampled data simultaneously, avoiding both the impact of stochastic jumping and ill-conditioned numerical problems. The convergence of the proposed learning algorithms is proved. Finally, a tunnel diode circuit model demonstrates the efficacy of the proposed methods.
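For intuition about such iteration schemes: in the simplest single-mode, non-perturbed case, the offline idea reduces to Kleinman-style policy iteration, which replaces the nonlinear algebraic Riccati equation with a sequence of linear Lyapunov equations. A minimal sketch of that textbook special case (the paper's coupled, jump-parameter version is substantially more involved):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def kleinman_are(A, B, Q, R, K0, iters=20):
    """Policy iteration for A'P + P A - P B R^{-1} B' P + Q = 0.

    K0 must stabilize A - B K0.  Each iteration solves one *linear*
    Lyapunov equation (policy evaluation), then improves the policy.
    """
    K = K0
    for _ in range(iters):
        Ak = A - B @ K
        # Evaluate the policy: Ak' P + P Ak + Q + K' R K = 0
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        # Improve the policy: K = R^{-1} B' P
        K = np.linalg.solve(R, B.T @ P)
    return P, K

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # already stable, so K0 = 0 works
B = np.array([[0.0], [1.0]])
P, K = kleinman_are(A, B, np.eye(2), np.eye(1), K0=np.zeros((1, 2)))
print(P)  # agrees with scipy.linalg.solve_continuous_are(A, B, Q, R)
```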
Given all pairwise weights (distances) among a set of objects, filtered graphs provide a sparse representation by only keeping an important subset of weights. Such graphs can be passed to graph clustering algorithms t...
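The preview is truncated, but one common instance of the generic idea, not necessarily the filtration this paper studies, is a k-nearest-neighbor filter: keep only each object's k smallest-distance edges. A small sketch:

```python
import numpy as np

def knn_filter(D, k):
    """Keep each node's k smallest-distance edges (symmetrized).

    D is a dense n x n distance matrix; the result is a sparse,
    undirected edge list suitable for a graph clustering algorithm.
    """
    n = D.shape[0]
    edges = set()
    for i in range(n):
        order = np.argsort(D[i])
        for j in [v for v in order if v != i][:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)

rng = np.random.default_rng(0)
pts = rng.normal(size=(6, 2))
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
print(knn_filter(D, k=2))  # at most 6 * 2 undirected edges survive
```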
The breadth-first search procedure is an algorithm that traverses the vertices of a graph, determining the distance from each vertex to the initial vertex; the distance is infinite for vertices not reachable from the starting vertex. Despite having an efficient serial version, this important algorithm is irregular, making an effective parallel implementation a daunting task. This paper presents the results of an OpenMP-based C++ implementation of the breadth-first search procedure using the bag data structure, reimplementing an existing proposal coded in the Cilk++ programming language. The experiments used 32 strongly connected graphs and 31 disconnected graphs, executed on two machines: the first with 28 cores and two threads per core, the second with 48 processing cores and hyperthreading disabled. Relative to the serial version, the parallel implementation yielded speedups of up to 20x when using 28 processing cores and up to 25x when using 56 threads on the machine with first-generation Intel(R) Xeon(R) Scalable processors. Furthermore, it yielded speedups of up to 45x when using 48 cores on the machine with second-generation Intel(R) Xeon(R) Scalable processors.
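At its core the bag-based algorithm is a level-synchronous BFS: the current frontier is expanded in parallel, and the bag structure lets threads split and merge partial frontiers cheaply. A serial Python sketch of the level-synchronous skeleton (the paper's C++/OpenMP version parallelizes the loop over the frontier):

```python
INF = float("inf")

def bfs_levels(adj, source):
    """Level-synchronous BFS over an adjacency list.

    dist[v] stays INF for vertices unreachable from source.  In the
    parallel version, the loop over `frontier` is divided among threads,
    each filling its own partial next-frontier ("bag"); the bags are
    then merged before the next level begins.
    """
    dist = [INF] * len(adj)
    dist[source] = 0
    frontier, level = [source], 0
    while frontier:
        level += 1
        nxt = []
        for u in frontier:            # the parallelized loop
            for v in adj[u]:
                if dist[v] == INF:    # in parallel this race is benign:
                    dist[v] = level   # every writer stores the same level
                    nxt.append(v)
        frontier = nxt
    return dist

adj = [[1, 2], [0, 3], [0], [1], []]  # vertex 4 is unreachable
print(bfs_levels(adj, 0))             # [0, 1, 1, 2, inf]
```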
ISBN (print): 9798400714436
Group testing is a widely used binary classification method that efficiently distinguishes between samples with and without a binary-classifiable attribute by pooling and testing subsets of a group. Bayesian Group Testing (BGT) is the state-of-the-art approach, which integrates prior risk information into a Bayesian Boolean lattice framework to minimize test counts and reduce false classifications. However, BGT, like other existing group testing techniques, struggles with multinomial group testing, where samples have multiple binary-classifiable attributes that can be individually distinguished simultaneously. We address this need by proposing Bayesian Multinomial Group Testing (BMGT), which includes a new Bayesian-based model and supporting theorems for an efficient and precise multinomial pooling strategy. We further design and develop SBMGT, a high-performance and scalable framework that tackles BMGT's computational challenges through three key innovations: 1) a parallel binary-encoded product lattice model with up to 99.8% efficiency; 2) the Bayesian Balanced Partitioning Algorithm (BBPA), a multinomial pooling strategy optimized for parallel computation with up to 97.7% scaling efficiency on 4096 cores; and 3) a scalable multinomial group testing analytics framework, demonstrated in a real-world disease surveillance case study using AIDS and STD datasets from Uganda, where SBMGT reduced tests by up to 54% and lowered false classification rates by 92% compared to BGT.
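For readers new to group testing itself, the basic saving comes from testing pools and splitting only the pools that test positive. A minimal (non-Bayesian, non-multinomial) adaptive sketch, illustrating the baseline idea rather than SBMGT's strategy:

```python
def binary_splitting(samples, test):
    """Adaptive group testing by recursive pool halving.

    `test(pool)` returns True iff the pool contains a positive sample.
    When positives are rare, the number of tests is far below testing
    each sample individually; BGT/BMGT go further by shaping the pools
    with prior risk information.
    """
    tests = 0

    def recurse(pool):
        nonlocal tests
        tests += 1
        if not test(pool):
            return set()
        if len(pool) == 1:
            return set(pool)
        mid = len(pool) // 2
        return recurse(pool[:mid]) | recurse(pool[mid:])

    return recurse(list(samples)), tests

positives = {13, 77}
found, n = binary_splitting(range(128),
                            lambda pool: any(s in positives for s in pool))
print(found, n)  # {13, 77} found with far fewer than 128 tests
```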
The development of high-precision surface modeling has always been an important research field in computer science. One way to solve this problem is to use differential geometry, which provides a mathematical framewor...
Running time is a key metric across the standard physical design flow stages. However, with the rapid growth in design sizes, routing has become the runtime bottleneck in the physical design flow. As a result, speeding up routing is a critical and pressing task for IC design automation. Besides running time, the quality of the global routing solution must also be evaluated, since a poor global routing engine degrades solution quality after the entire routing stage. This work considers both. We propose a global routing framework, called FastGR, with GPU-accelerated routing algorithms and a heterogeneous task graph scheduler, to accelerate the modern global router and improve its effectiveness. Its runtime-oriented version, FastGRL, achieves a 2.489x speedup over the state-of-the-art global router. Furthermore, the GPU-accelerated L-shape pattern routing algorithm used in FastGRL contributes a 9.324x speedup over the sequential algorithm on CPU. Its quality-oriented version, FastGRH, offers a 27.855% improvement in the number of shorts over the runtime-oriented version and remains 1.970x faster than the most advanced global router.
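L-shape pattern routing, the kernel FastGRL moves to the GPU, considers only the two single-bend routes between a pin pair and keeps the cheaper one on the congestion cost map; since every net's two candidates are evaluated independently, thousands of nets can be batched on a GPU. A minimal serial sketch of the per-net computation (illustrative, not FastGR's implementation):

```python
import numpy as np

def l_shape_route(cost, p, q):
    """Pick the cheaper of the two L-shaped (single-bend) routes p -> q.

    cost is a 2D congestion map.  Candidate 1 bends at (p_row, q_col),
    candidate 2 at (q_row, p_col); the shared corner cell is counted once.
    """
    (r0, c0), (r1, c1) = p, q

    def span(a, b):
        return range(min(a, b), max(a, b) + 1)

    cand1 = (sum(cost[r0, c] for c in span(c0, c1))
             + sum(cost[r, c1] for r in span(r0, r1)) - cost[r0, c1])
    cand2 = (sum(cost[r, c0] for r in span(r0, r1))
             + sum(cost[r1, c] for c in span(c0, c1)) - cost[r1, c0])
    bend = (r0, c1) if cand1 <= cand2 else (r1, c0)
    return min(cand1, cand2), bend

cost = np.ones((8, 8))
cost[0, 3:6] = 10.0                         # congested cells on row 0
print(l_shape_route(cost, (0, 0), (4, 7)))  # picks the bend avoiding them
```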