ISBN (print): 9781665400916
The Internet of Things is a rapidly growing industry, and connected devices now outnumber humans. Countless IoT devices surround us, and their scope has expanded manifold in recent years across areas such as healthcare, unmanned vehicles, automatic monitoring systems, and smart homes. These connected devices produce large volumes of data that must be processed to achieve the goal of automation in daily life. The cloud provides many centralized solutions for processing data with powerful resources in real time, but increased network traffic can introduce latency. A distributed solution is therefore needed to meet the growing demand for processing data in parallel. In this paper, we propose a distributed shared memory [18] abstraction for the Internet of Things that uses a graph-aware partitioning algorithm and smart message-passing techniques to process data in a distributed and parallel fashion [19].
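The graph-aware partitioning idea above can be illustrated with a generic greedy heuristic: place each device (a graph node) in the partition that already holds most of its neighbours, subject to a capacity cap, so that communicating devices tend to be co-located. This is a hypothetical sketch, not the paper's actual algorithm.

```python
def greedy_partition(edges, num_nodes, k):
    """Assign nodes 0..num_nodes-1 to k partitions, preferring the
    partition that already holds the most of the node's neighbours.
    A toy stand-in for a graph-aware partitioner."""
    cap = -(-num_nodes // k)                 # ceiling division: per-partition cap
    neighbours = {i: set() for i in range(num_nodes)}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    assign, sizes = {}, [0] * k
    for node in range(num_nodes):
        # score each non-full partition by already-placed neighbours,
        # breaking ties toward the emptier partition
        scores = [
            (sum(1 for n in neighbours[node] if assign.get(n) == p), -sizes[p], p)
            for p in range(k) if sizes[p] < cap
        ]
        best = max(scores)[2]
        assign[node] = best
        sizes[best] += 1
    return assign
```

On a graph made of two disjoint triangles, the heuristic keeps each triangle inside one partition, which is the behaviour a message-passing layer wants: most traffic stays partition-local.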
For humans, the most natural way to express feelings and communicate is verbal speech. Communication with machines is done either by traditional input methods, i.e., keyboard and joystick, or with the help of speech r...
In recent years, large-scale pretrained natural language processing models such as BERT and GPT-3 have achieved good results on many tasks. In everyday applications, however, these large-scale language models suffer from large model sizes and long running times, which makes them inconvenient to port and deploy. To address this problem, we propose a lightweight summarization generation method based on knowledge distillation (LW-BERT-KD, abbreviated as the LBK model) to show that the knowledge of a complex neural network (taking the BERT model as an example) can be distilled into a lightweight language processing model (taking the BiLSTM model as an example), thus achieving both complex knowledge extraction and model compression. In this paper, BERT serves as the teacher model and BiLSTM as the student model. Knowledge distillation transfers knowledge from the teacher model to the student model, so that the resulting model is effective, lightweight, and easy to port. On a Chinese abstract generation dataset, the model trains significantly better than the basic Transformer baseline. The experimental results show that even with a roughly 100-fold reduction in parameters, the model achieves results comparable to complex pretrained language models.
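The teacher-to-student transfer described above is typically trained with a softened cross-entropy between teacher and student output distributions. The sketch below shows that standard distillation loss (temperature scaling with the conventional T² factor); it is a generic illustration, not the LBK model's exact objective.

```python
import numpy as np

def softmax(logits, T):
    """Temperature-softened softmax; higher T flattens the distribution."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of the student's softened distribution against the
    teacher's softened distribution, scaled by T^2 as is conventional
    so gradients stay comparable across temperatures."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return -float(T * T * np.sum(p_t * np.log(p_s + 1e-12)))
```

The loss is minimized when the student reproduces the teacher's softened distribution, which is what lets a small BiLSTM absorb a large model's behaviour.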
ISBN (digital): 9798350317152
ISBN (print): 9798350317169
The terabyte scale of data series has motivated recent efforts to design fully distributed techniques for supporting operations such as approximate kNN similarity search, which is a building-block operation in most analytics services on data series. Unfortunately, these techniques are heavily geared towards achieving scalability at the cost of sacrificing the results' accuracy. The state-of-the-art systems DPiSAX and TARDIS report accuracy below 10% and 40%, respectively, which is not practical for many real-world applications. In this paper, we investigate the root problems in these existing techniques that limit their ability to achieve a better trade-off between scalability and accuracy. Then, we propose a framework, called CLIMBER, that encompasses a novel feature extraction mechanism, indexing scheme, and query processing algorithms for supporting approximate similarity search in big data series. For CLIMBER, we propose a new loss-resistant dual representation composed of rank-sensitive and rank-insensitive signatures capturing data series objects. Based on this representation, we devise a distributed two-level index structure supported by an efficient data partitioning scheme. Our similarity metrics, tailored for this dual representation, enable meaningful comparison and distance evaluation between the rank-sensitive and rank-insensitive signatures. Finally, we propose two efficient query processing algorithms, CLIMBER-kNN and CLIMBER-kNN-Adaptive, for answering approximate kNN similarity queries. Our experimental study on real-world and benchmark datasets demonstrates that CLIMBER, unlike existing techniques, features results' accuracy above 80% while retaining the desired scalability to terabytes of data.
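The symbolic-signature idea underlying indexes in this family (the iSAX lineage of DPiSAX and TARDIS) can be sketched as PAA segmentation followed by breakpoint quantization: z-normalize the series, average it into equal-width segments, and map each segment mean to a symbol. CLIMBER's dual rank-sensitive/rank-insensitive representation is more elaborate; this is only the classic building block, with illustrative parameter defaults.

```python
import numpy as np

def sax_signature(series, segments=4, breakpoints=(-0.6745, 0.0, 0.6745)):
    """Z-normalize, average into `segments` equal chunks (PAA), then map
    each chunk mean to a symbol index via Gaussian breakpoints (the
    defaults are the quartiles of N(0, 1), giving a 4-symbol alphabet)."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)       # z-normalize
    chunks = np.array_split(x, segments)
    means = [c.mean() for c in chunks]
    return tuple(int(np.searchsorted(breakpoints, m)) for m in means)
```

Two series with similar shapes map to nearby signatures, so signatures can be grouped into index partitions and compared cheaply before touching raw data.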
Videos are a popular type of media that require analysis to extract the information underlying the data in a timely manner. Often due to the very large size of such data and the involvement of computationally expensive operations, performing the analysis can take a significant amount of time. This paper presents techniques to speed up deep learning-based analysis to perform tasks like tracking objects and filtering video data by applying parallel processing techniques. The proposed approach and techniques leverage parallel processing on two levels: by using GPUs for analyzing individual frames and by distributing the processing load over a fleet of Executor nodes. Experiments with Apache Spark and TensorFlow-based prototypes built for handling various video analysis use cases were conducted on an Amazon EC2 cloud for various combinations of system and workload parameters. Insights into system performance including the reduction in processing time that accrues from applying the proposed parallel processing technique in each scenario are reported in the paper.
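The executor-level parallelism described above can be sketched as fanning per-frame analysis out over a worker pool. The sketch below uses a local thread pool and a cheap stand-in for the model call; in the paper's setting the workers would be Spark executor nodes and the per-frame function a GPU-backed TensorFlow model, so everything here is illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_frame(frame):
    # Stand-in for a GPU-backed model call (object tracking, filtering, ...);
    # a mean-intensity feature keeps the sketch self-contained.
    return sum(frame) / len(frame)

def analyze_video(frames, workers=4):
    """Fan per-frame analysis out over a worker pool, mirroring the
    executor-level parallelism described above (Spark executors in the
    paper; a thread pool in this sketch). Results keep frame order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_frame, frames))
```

Because frames are independent, the map is embarrassingly parallel, which is what makes the two-level GPU-plus-executor design pay off.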
ISBN (digital): 9798350372120
ISBN (print): 9798350372137
Security in signal processing involves implementing robust encryption techniques and authentication measures to safeguard sensitive information from unauthorized access or manipulation, ensuring the integrity and confidentiality of processed data in various applications. This paper presents a pioneering VLSI architecture merging the Discrete Wavelet Transform (DWT) with robust encryption algorithms for fortified security in data processing. Aimed at embedded systems, it ensures data integrity and confidentiality. By combining the DWT's computational efficiency for analysis with the encryption strength of AES, it enables real-time processing while safeguarding sensitive information. The design harnesses parallel processing, augmenting throughput and reducing latency for extensive data volumes. Comparative results highlight superior performance in speed, security, and resource utilization over traditional systems. This adaptable framework finds application in secure communications, biomedical signal processing, multimedia encryption, and IoT devices, addressing the critical intersection of signal processing and data security. This VLSI integration of DWT and advanced encryption presents a potent solution for secure, efficient data processing in resource-limited environments. The proposed system exhibits an on-chip power consumption of 535mW, a memory capacity of 819.922 MB, and a gain of 519.445 dB, all while maintaining a stable junction temperature of 33°C.
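The DWT stage of such a pipeline can be illustrated with one level of the Haar wavelet, the simplest DWT: pairwise averages give the approximation coefficients and pairwise differences the detail coefficients. This is the unnormalized averaging variant (orthonormal Haar divides by √2 instead of 2), and the AES encryption stage that would follow in the proposed architecture is omitted.

```python
def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform: pairwise
    averages (approximation) and pairwise differences (detail).
    In the pipeline sketched above, the coefficients would then be
    fed to the AES core; encryption is omitted here."""
    assert len(signal) % 2 == 0, "even-length input expected"
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail
```

Each pair of input samples maps independently to one (approx, detail) pair, which is exactly the structure a parallel VLSI datapath exploits.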
Stencil computation is widely adopted in scientific applications as one of the most significant computation patterns. Although there are various optimizations proposed to accelerate the stencil computation, the low-or...
ISBN (digital): 9798331505745
ISBN (print): 9798331505752
Reinforcement learning (RL) offers a transformative approach to adaptive signal processing on low-power edge devices operating in dynamic environments. This work introduces an RL-driven framework utilizing Proximal Policy Optimization (PPO) in conjunction with Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks to autonomously enhance signal quality and reduce energy consumption. The model dynamically adjusts to fluctuating noise and interference by fine-tuning filtering techniques, optimizing compression, and adapting hardware configurations. Tests on platforms like NVIDIA Jetson Nano and Raspberry Pi show a 12% improvement in signal-to-noise ratio (SNR) and a 25% reduction in power consumption compared to baseline methods, showcasing RL's potential for real-time, energy-efficient edge processing. Future applications of this framework could extend to larger IoT networks, employing federated learning for distributed optimization across interconnected low-power devices.
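An RL agent balancing signal quality against energy use needs a reward that prices both objectives. The sketch below is one plausible shaping: reward SNR relative to a target and penalize power above a budget. The weights and both reference points are illustrative assumptions, not values from the paper, and a full PPO training loop is well beyond a sketch.

```python
def reward(snr_db, power_mw, snr_target=20.0, power_budget=500.0,
           alpha=1.0, beta=0.5):
    """Hypothetical reward for the RL agent: normalized SNR surplus over
    a target, minus a penalty proportional to power drawn beyond a
    budget. alpha/beta weight quality against energy."""
    quality = alpha * (snr_db - snr_target) / snr_target
    energy_penalty = beta * max(0.0, power_mw - power_budget) / power_budget
    return quality - energy_penalty
```

Under this shaping, a configuration that improves SNR while staying inside the power budget strictly dominates one that overshoots the budget for the same SNR, steering the policy toward the paper's joint goal.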
ISBN (print): 9781665435772
Peak performance metrics published by vendors often do not correspond to what can be achieved in practice. It is therefore of great interest to do extensive benchmarking on core applications and library routines. Since DGEMM is one of the most used routines in compute-intensive numerical codes, it is typically highly vendor-optimized and of great interest for empirical benchmarks. In this paper we show how to build a novel tool that autotunes the benchmarking process for the Roofline model. Our novel approach can efficiently and reliably find optimal configurations for any target hardware. Results of our tool on a range of hardware architectures and comparisons to theoretical peak performance are included. Our tool autotunes the benchmarks for the target architecture by deciding the optimal parameters through state space reductions and exhaustive search. Our core idea includes calculating the confidence interval using the variance and mean and comparing it against the current optimum solution. We can then terminate the evaluation process early if the confidence interval's maximum is lower than the current optimum solution. This dynamic approach yields a search time improvement of up to 116.33x for the DGEMM benchmarking process compared to a traditional fixed sample-size methodology. Our tool produces the same benchmarking result with an error of less than 2% for each of the optimization techniques we apply, while providing a great reduction in search time. We compare these results against hand-tuned benchmarking parameters. Results from the memory-intensive TRIAD benchmark, and some ideas for future directions, are also included.
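The early-termination rule described above can be sketched directly: keep sampling a configuration, and stop as soon as the upper bound of the confidence interval on its mean falls below the best result seen so far, since such a configuration can no longer become the optimum (assuming a higher-is-better metric such as GFLOPS). The sample counts and the z=1.96 level are illustrative; the tool's exact parameters may differ.

```python
import statistics

def evaluate_config(run_once, best_so_far, min_samples=5, max_samples=50, z=1.96):
    """Sample one benchmark configuration, stopping early once the
    CI's maximum (mean + z * standard error) drops below the current
    optimum. Returns (mean, samples_used, stopped_early)."""
    samples = []
    for _ in range(max_samples):
        samples.append(run_once())
        if len(samples) >= min_samples:
            mean = statistics.fmean(samples)
            sem = statistics.stdev(samples) / len(samples) ** 0.5
            if mean + z * sem < best_so_far:   # CI max cannot beat the optimum
                return mean, len(samples), True
    return statistics.fmean(samples), len(samples), False
```

A config far below the optimum is discarded after `min_samples` runs instead of `max_samples`, which is where the reported order-of-magnitude search-time savings come from.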
ISBN (digital): 9798331511470
ISBN (print): 9798331511487
This paper presents a novel adaptive indexing framework designed to optimize query performance in distributed systems using Azure Data Explorer (ADX). As organizations increasingly rely on cloud-based analytics for processing large-scale datasets, traditional static indexing approaches often fall short in handling dynamic query patterns and evolving data characteristics. The proposed framework leverages machine learning techniques to analyze historical and real-time query patterns, automatically adjusting indexing strategies to improve query efficiency. The architecture implements query-driven indexing, adaptive materialized views, and incremental indexing mechanisms, integrated seamlessly with ADX’s distributed environment. Experimental results, conducted on datasets ranging from 100GB to 1TB, demonstrate significant improvements in query performance metrics. Compared to traditional indexing methods, the adaptive framework achieved up to 54% reduction in query execution time, 27% decrease in CPU utilization, and 35% reduction in memory consumption. Statistical analysis confirms the significance of these improvements (p < 0.05). The framework’s effectiveness in optimizing resource utilization while maintaining query performance makes it particularly suitable for high-throughput, real-time analytics applications in cloud-based distributed systems.
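The query-driven indexing idea can be illustrated in miniature: mine a query log for frequently filtered columns and index the hottest ones. The framework described above uses learned models over richer signals; this frequency-count stand-in, with invented thresholds, only shows the shape of the feedback loop from workload to index choice.

```python
from collections import Counter

def choose_index_columns(query_log, top_n=2, min_hits=3):
    """Pick columns to index from a log of (query_id, filtered_columns)
    pairs: count how often each column appears in a filter predicate
    and keep the top_n columns seen at least min_hits times. A toy
    stand-in for a learned, query-driven indexing policy."""
    counts = Counter(col for _, cols in query_log for col in cols)
    hot = [col for col, n in counts.most_common() if n >= min_hits]
    return hot[:top_n]
```

Re-running the selection as the log grows is what makes the policy adaptive: columns that fall out of the workload eventually drop below `min_hits` and lose their index.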