ISBN (digital): 9798350352719
ISBN (print): 9798350352726
This article provides a detailed overview of implementing an FIR digital filter on the TI TMS320 series DSP platform in the C programming language. It covers the working principle of FIR filters, design methods, DSP development environment setup, C code implementation, simulation verification, and system testing. Simulation confirms the frequency response characteristics of the designed FIR filter, and system testing shows that, under appropriate conditions, the filter achieves effective signal filtering with an accurate frequency response and a high signal-to-noise ratio. The work validates the strong signal-processing performance of FIR filters on the DSP platform and offers useful guidance for FIR filter design on DSP platforms.
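As a rough illustration of the design-then-verify flow this abstract describes, the sketch below designs a low-pass FIR filter by the window method and checks its frequency response. The sample rate, tap count, and cutoff are illustrative assumptions, not values from the paper; the per-sample loop at the end mirrors the multiply-accumulate kernel a C implementation on the DSP would run over a circular delay line.

```python
import numpy as np
from scipy import signal

# Illustrative parameters (not from the paper): 48 kHz sample rate,
# 101-tap low-pass filter with a 4 kHz cutoff, window-method design.
fs = 48_000
taps = signal.firwin(numtaps=101, cutoff=4_000, window="hamming", fs=fs)

# Simulation-style verification: inspect the frequency response.
w, h = signal.freqz(taps, worN=2048, fs=fs)
print(f"gain at  1 kHz: {np.interp(1_000, w, np.abs(h)):.3f}")   # passband, ~1.0
print(f"gain at 10 kHz: {np.interp(10_000, w, np.abs(h)):.5f}")  # stopband, ~0.0

# Per-sample step mirroring the delay-line multiply-accumulate that the
# C implementation would perform for each incoming sample.
def fir_step(delay_line, x_new, taps):
    delay_line = np.roll(delay_line, 1)   # shift the delay line
    delay_line[0] = x_new                 # insert the newest sample
    return delay_line, float(np.dot(taps, delay_line))
```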
From security to marketing to human-computer interaction, gender recognition from facial photographs is a basic task in many disciplines. This effort intends to develop an appropriate gender recognition model using a ...
ISBN (print): 9781450399166
In training of modern large natural language processing (NLP) models, it has become common practice to split models across multiple GPUs using 3D parallelism. This technique, however, suffers from high inter-node communication overhead. Compressing the communication can mitigate the overhead by reducing inter-node traffic volume; however, existing compression techniques have critical limitations when applied to NLP models with 3D parallelism: 1) only the data-parallel traffic is targeted, and 2) the existing compression schemes already harm model quality too much. In this paper, we present Optimus-CC, a fast and scalable distributed training framework for large NLP models with aggressive communication compression. Optimus-CC differs from existing communication compression frameworks in the following ways. First, we compress pipeline-parallel (inter-stage) traffic: specifically, we compress the inter-stage backpropagation and the embedding synchronization in addition to applying existing data-parallel traffic compression methods. Second, we propose techniques to avoid the model quality drop caused by compression, and we provide mathematical and empirical analyses showing that these techniques successfully suppress the compression error. Lastly, we analyze the pipeline and opt to selectively compress only the traffic lying on the critical path, which further helps reduce the compression error. We demonstrate our solution on a GPU cluster and achieve superior speedup over baseline state-of-the-art distributed training solutions without sacrificing model quality.
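The paper's specific error-suppression techniques are its own contribution and are not reproduced here. Purely as a generic illustration of the kind of communication compression being discussed, below is a minimal top-k sparsifier with error feedback, a standard trick for bounding accumulated compression error, assuming PyTorch tensors; the function names and the compression ratio are hypothetical.

```python
import torch

def compress(tensor, ratio=0.01, residual=None):
    """Top-k sparsification with error feedback: magnitude not transmitted
    this step is carried into the next step instead of being dropped."""
    if residual is not None:
        tensor = tensor + residual                # re-inject previous error
    flat = tensor.flatten()
    k = max(1, int(flat.numel() * ratio))
    _, idx = torch.topk(flat.abs(), k)            # largest-magnitude entries
    payload = (flat[idx], idx, tensor.shape)      # values + positions to send
    new_residual = flat.clone()
    new_residual[idx] = 0.0                       # the part we failed to send
    return payload, new_residual.view(tensor.shape)

def decompress(payload):
    values, idx, shape = payload
    out = torch.zeros(shape.numel(), dtype=values.dtype)
    out[idx] = values
    return out.view(shape)

# One step: send ~1% of a gradient tensor, keep the rest as residual.
grad = torch.randn(4, 1024)
payload, residual = compress(grad, ratio=0.01)
approx = decompress(payload)
```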
Deep learning has enabled a variety of detection methods, such as text identification and recognition, picture extraction, visual-action detection, and object movement detection, to be implemented in a single algorith...
Neural Radiance Fields (NeRF) have received widespread attention for their photo-realistic novel view synthesis quality. Current methods mainly represent the scene by point sampling along cast rays, ignoring how the observed area changes with distance. In addition, current sampling strategies focus entirely on the distribution of sample points along each ray, paying no attention to how the rays themselves are sampled. We find that the usual ray sampling strategy severely slows convergence in scenes where the camera moves forward. In this work, we extend the point representation to an area representation using relative positional encoding, and propose a ray sampling strategy suited to forward-moving camera trajectories. We validate the effectiveness of our method on multiple public datasets.
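For context, the standard NeRF frequency positional encoding that such schemes build on is sketched below; the paper's relative, area-aware variant modifies what gets encoded and is not reproduced here. The number of frequency bands is the common default, assumed for illustration.

```python
import numpy as np

def positional_encoding(x, num_freqs=10):
    """Standard NeRF frequency encoding gamma(x): each coordinate is mapped
    to sin/cos features at exponentially growing frequencies."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi        # 2^0*pi ... 2^(L-1)*pi
    angles = x[..., None] * freqs                        # (..., 3, L)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)                # (..., 3 * 2L)

# A 3-D sample point on a ray becomes a 60-dimensional feature (L = 10).
pt = np.array([0.1, -0.4, 0.7])
print(positional_encoding(pt).shape)   # (60,)
```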
ISBN (print): 9798400716713
Recently, AI and deep neural networks have found extensive applications in mobile devices, drones, carts, and more, creating a need to process large-scale data and provide DNN inference services with minimal latency. However, IoT devices, with their limited computing capabilities, are not well suited for AI inference, and the diverse requirements of different services must also be accommodated. To address these challenges, many previous studies have explored collaborative approaches between edge servers and cloud servers by partitioning DNN models. These methods, however, struggle to find optimal partitioning points for splitting DNN models and are heavily influenced by network bandwidth, since intermediate computation results must be transmitted to other devices. In this paper, we propose an adaptive block-based DNN inference framework. A large DNN model is broken down into block-level networks, which are trained with knowledge distillation so that inference can be performed through each block network alone. Block-level inference computations are then dynamically offloaded according to the computing capabilities of the edge cluster to produce inference results. Even when multiple devices are used, our method is not affected by network bandwidth, since only input images need to be transmitted. Experimental results demonstrate that our approach consistently reduces inference latency as the number of devices increases. Additionally, by controlling the trade-off between latency and accuracy, we can provide inference services tailored to various latency requirements.
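As a minimal sketch of the structure this abstract implies, the snippet below pairs a backbone block with a small exit head so the block can predict on its own, plus the standard soft-label distillation loss used to train such heads. The class name, layer sizes, and temperature are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class BlockWithExit(nn.Module):
    """A backbone block plus a small exit head, so the block can produce a
    standalone prediction (the property block-level offloading relies on)."""
    def __init__(self, in_ch, out_ch, num_classes):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(8),
        )
        self.exit_head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(out_ch * 8 * 8, num_classes),
        )

    def forward(self, x):
        feat = self.block(x)
        return feat, self.exit_head(feat)

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Soft-label KD: match the teacher's softened output distribution."""
    p_t = torch.softmax(teacher_logits / temperature, dim=-1)
    log_p_s = torch.log_softmax(student_logits / temperature, dim=-1)
    return -(p_t * log_p_s).sum(dim=-1).mean() * temperature ** 2
```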
ISBN (print): 9781713871088
Scaling up model sizes can lead to fundamentally new capabilities in many machine learning (ML) tasks. However, training big models requires strong distributed-systems expertise to carefully design model-parallel execution strategies that suit the model architecture and cluster setup. In this paper, we develop AMP, a framework that automatically derives such strategies. AMP identifies a valid space of model parallelism strategies and efficiently searches this space for high-performing strategies, leveraging a cost model designed to capture the heterogeneity of the model and cluster specifications. Unlike existing methods, AMP is specifically tailored to support complex models composed of uneven layers and cluster setups with heterogeneous accelerators and bandwidth. We evaluate AMP on popular models and cluster setups from public clouds and show that AMP returns parallel strategies that match expert-tuned strategies on typical cluster setups. On heterogeneous clusters and on models with heterogeneous architectures, AMP finds strategies with 1.54x and 1.77x higher throughput, respectively, than state-of-the-art model-parallel systems.
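To make the search concrete, here is a toy version of the enumerate-and-score loop such frameworks run: list every (data, pipeline, tensor) degree factorization of the GPU count, then rank them with a cost estimate. The cost function below is a deliberately crude stand-in, not AMP's cost model; all constants are illustrative.

```python
from itertools import product

def strategies(n_gpus):
    """Enumerate (data, pipeline, tensor) degrees that exactly use the cluster."""
    return [(d, p, t) for d, p, t in product(range(1, n_gpus + 1), repeat=3)
            if d * p * t == n_gpus]

def toy_cost(strategy, total_flops, inter_node_bw):
    """Crude stand-in for a real cost model: compute shrinks with total
    parallel degree, pipelining adds a bubble penalty, and data/tensor
    parallelism add bandwidth-bound communication terms."""
    d, p, t = strategy
    compute = total_flops / (d * p * t)
    bubble = compute * (p - 1) / 8.0
    comm = (d - 1 + 2 * (t - 1)) * 0.05 / inter_node_bw
    return compute + bubble + comm

best = min(strategies(8), key=lambda s: toy_cost(s, 24.0, 25.0))
print(best)   # whichever (d, p, t) split the toy model scores lowest
```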
ISBN (digital): 9781665485579
ISBN (print): 9781665485586
The medical routine of the future is strongly influenced by medical information technology. Image-based methods and the increase in computational power have raised the quality and efficiency of medicine. Parallel and distributed computing are attractive alternatives for improving processing performance in many areas of study, and this paper investigates such solutions for increasing throughput on large datasets in medical imaging. In this research, different edge detection techniques are used for image processing, as it is among the most common and challenging tasks in the medical domain. The results demonstrate the effectiveness of the two processing approaches, compared to a single CPU, for accelerating medical image processing. The parallel approach uses the CUDA processing platform with a hybrid streams-based programming model. The distributed system is a Linux-based cluster that uses message passing for communication and efficient load balancing between nodes for high performance. This paper highlights parallel computing as a very good solution for processing large image datasets in the medical domain; good results were also obtained with the distributed system.
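As a small high-level analogue of the throughput pattern being compared (not the paper's CUDA or MPI code), the sketch below runs Sobel edge detection over a batch of images with a worker pool, the same one-worker-per-image dispatch a stream pipeline or cluster node would perform. Image sizes and worker count are illustrative.

```python
import numpy as np
from multiprocessing import Pool

def sobel(img):
    """Plain Sobel gradient magnitude on a 2-D grayscale array."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    gx = np.zeros_like(img, dtype=np.float32)
    gy = np.zeros_like(img, dtype=np.float32)
    for i in range(3):                       # correlate with the 3x3 kernels
        for j in range(3):
            patch = pad[i:i + img.shape[0], j:j + img.shape[1]]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.hypot(gx, gy)

if __name__ == "__main__":
    # Throughput-style parallelism: dispatch one image per worker, as a
    # stream pipeline or cluster would dispatch batches of scans.
    images = [np.random.rand(256, 256) for _ in range(16)]
    with Pool(4) as pool:
        edges = pool.map(sobel, images)
    print(len(edges), edges[0].shape)
```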
Fuzzy images exist in all imaging processes, including computer vision, photography and medical imaging. Many methods based on deep learning can deblur the image and have a good output. Medical image is a typical proc...
ISBN (digital): 9798350360240
ISBN (print): 9798350384161
Keyword extraction is an important part of text mining. Keywords extracted from text reveal the theme of the whole document and can be further applied to text recommendation or text search. In the era of big data, traditional keyword extraction algorithms frequently train and test inefficiently on a single machine. To address this, this paper proposes a TF-IDF text classification algorithm built on the Hadoop distributed platform and describes its implementation in detail. A parallel TF-IDF text classification algorithm that also accounts for the position of words within a document is implemented with the MapReduce programming model and compared against the traditional serial algorithm, with experiments carried out in both single-machine and cluster modes. Experiments show that the proposed method significantly improves performance and enables fast, efficient information extraction from massive data.
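For reference, the serial TF-IDF core being parallelized looks like the sketch below; the paper's position weighting and its MapReduce implementation are not reproduced. In the distributed version, mappers would emit per-document term counts and reducers would aggregate document frequencies across the cluster, as noted in the comments.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Serial TF-IDF core: tf(t, d) * log(N / df(t)). In a MapReduce
    version, mappers emit per-document term counts and reducers
    aggregate document frequencies df(t) across the cluster."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))     # document frequency
    return [{t: (c / len(d)) * math.log(n / df[t])    # tf * idf per term
             for t, c in Counter(d).items()}
            for d in docs]

docs = [["hadoop", "mapreduce", "cluster"],
        ["keyword", "extraction", "text", "mining"],
        ["hadoop", "text", "keyword"]]
for row in tf_idf(docs):
    print({t: round(s, 3) for t, s in row.items()})
```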