ISBN: 9798350355543 (print)
Transformer neural networks (TNNs) are widely used across a diverse range of applications, including natural language processing (NLP), machine translation, and computer vision (CV). Their widespread adoption has been driven primarily by the exceptional performance of the multi-head self-attention block, which extracts key features from sequential data. The multi-head self-attention block is followed by feedforward neural networks, which introduce the non-linearity that helps the model learn complex patterns. Despite the popularity of TNNs, few hardware accelerators target these two critical blocks. Most prior work has concentrated on sparse architectures that are not flexible enough for popular TNN variants. This paper introduces ProTEA, a runtime-programmable accelerator tailored for the dense computations of most state-of-the-art transformer encoders. ProTEA is designed to reduce latency by maximizing parallelism. We introduce an efficient tiling of large matrices that distributes memory and computing resources across different hardware components within the FPGA. We provide runtime evaluations of ProTEA on a Xilinx Alveo U55C high-performance data center accelerator card. Experimental results demonstrate that ProTEA can host a wide range of popular transformer networks and achieves near-optimal performance with a tile size of 64 in the multi-head self-attention block and 6 in the feedforward network block when configured with 8 parallel attention heads, 12 layers, and an embedding dimension of 768 on the U55C. Comparative results show that ProTEA is 2.5× faster than an NVIDIA Titan XP GPU. Results also show a speedup of 1.3× to 2.8× over current state-of-the-art custom-designed FPGA accelerators.
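The idea of tiling a large matrix product can be illustrated in software. The sketch below is a generic blocked matrix multiplication in plain Python, not ProTEA's hardware design; the function name `matmul_tiled` and its `tile` parameter are assumptions made for illustration. Each (i0, j0, p0) block is the kind of small, independent unit of work that an accelerator could map onto its on-chip memory and compute units.

```python
def matmul_tiled(a, b, tile=64):
    """Multiply a (n x k) by b (k x m), visiting the operands tile by tile.

    Illustrative only: `tile` is a hypothetical software analogue of a
    hardware tile size, not a parameter taken from the paper.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # tile rows of the output
        for j0 in range(0, m, tile):      # tile columns of the output
            for p0 in range(0, k, tile):  # tiles along the shared dimension
                for i in range(i0, min(i0 + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(j0, min(j0 + tile, m)):
                            # accumulate the partial product for this tile
                            row_c[j] += a_ip * row_b[j]
    return c
```

Changing `tile` changes only the order in which partial products are accumulated, not which products are summed, so tile size here is a tuning choice about memory and parallelism rather than a change to the computation.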
Independent human living systems require smart, intelligent, and sustainable online monitoring so that an individual can be assisted *** from ambient assisted living, the task of monitoring human activities plays an important role in different fields including virtual reality, surveillance security, and human interaction with *** systems have been developed in the past with the use of various wearable inertial sensors and depth cameras to capture the human *** this paper, we propose multiple methods such as random occupancy pattern, spatio-temporal cloud, waypoint trajectory, Hilbert transform, Walsh-Hadamard transform, and bone pair descriptors to extract optimal features corresponding to different human *** feature sets are then normalized using min-max normalization and optimized using the Fuzzy optimization ***, the Masi entropy classifier is applied for action recognition and *** have been performed on three challenging datasets, namely, UTD-MHAD, 50 Salad, and *** experimental evaluation, the proposed novel approach of recognizing human actions has achieved an accuracy rate of 90.1% with the UTD-MHAD dataset, 90.6% with the 50 Salad dataset, and 89.5% with CMU-MMAC *** experimental results validated the proposed system.
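The min-max normalization step mentioned above rescales each feature to the range [0, 1]. A minimal sketch follows, assuming one feature is held as a plain list of numbers; the function name `min_max_normalize` and the choice to map a constant feature to all zeros are assumptions for illustration, not details from the paper.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly so min -> 0.0 and max -> 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention: a constant feature carries no information,
        # so map it to zeros instead of dividing by zero.
        return [0.0 for _ in values]
    span = hi - lo
    return [(v - lo) / span for v in values]
```

For instance, `min_max_normalize([2, 4, 6])` returns `[0.0, 0.5, 1.0]`.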
Nowadays, the IoT ecosystem is evolving rapidly, with multiple heterogeneous sources producing high volumes of data and processes transforming this data into meaningful or 'smart' information. These volumes of...
Generative Artificial Intelligence (GenAI) models such as LLMs, GPTs, and Diffusion Models have recently gained widespread attention from both the research and the industrial communities. This survey explores their ap...
When designing an imaging system to perform photoacoustic-guided hysterectomy, one approach to achieve direct illumination of an imaging target is to attach optical fibers to the surgical tool. However, light blockage...
The paper presents a scaled laboratory experimental model of ferroresonant circuit designed for detailed investigation of the ferroresonance phenomena. To enable accurate ferroresonance examination, the system is expa...
ISBN: 9798350352801 (digital), 9798350352818 (print)
Engineering outreach and introductory courses are essential for motivating and training the next generation of capable engineers. Accessibility and portability of the infrastructure for a STEM course are critical for spreading STEM education and motivation efforts. The ubiquity of student personal computers and laptops provides a convenient platform for introducing computer programming, but the relatively cumbersome equipment deemed necessary for a hands-on electronics learning environment contributes to the lack of electronic circuit hardware exposure for many K-12 students and university undergraduates. This paper presents the relatively low-cost, versatile electronics platform we have developed at MIT, upon which our new, hands-on introductory STEM electrical engineering course has been built and taught in recent years. The discussion focuses on the power supply and measurement systems we have produced, which allow students and instructors to engage in hands-on electric circuit exercises and projects without expensive laboratory infrastructure. Example demonstrations of introductory laboratory exercises using our platform are shown.
ISBN: 9798350364606 (digital), 9798350364613 (print)
Aside from pure intellectual interest, why do we teach our students parallel computing? Most people would agree that the primary goal is to produce greater application performance. Yet students frequently parallelize code only to discover that it runs disappointingly slower, because they don't understand performance. To be exploited effectively, parallelism must operate synergistically with a host of other techniques, including caching, vectorization, algorithms, bit tricks, loop unrolling, compiler switches, tailoring code to the architecture, exploiting sparsity, changing data representation, and metaprogramming. Software performance engineering, which encompasses these techniques, is the science and art of making code run fast or otherwise limiting its consumption of resources such as energy, memory footprint, network utilization, and response time. In this talk, I will argue that the end of Moore's Law makes software performance engineering a critical skill for our students to learn.
Regularized system identification has become the research frontier of system identification in the past *** related core subject is to study the convergence properties of various hyper-parameter estimators as the sample size goes to *** this paper, we consider one commonly used hyper-parameter estimator, the empirical Bayes (EB). Its convergence in distribution has been studied, and the explicit expression of the covariance matrix of its limiting distribution has been ***, what we are truly interested in are factors contained in the covariance matrix of the EB hyper-parameter estimator, and then, the convergence of its covariance matrix to that of its limiting distribution is *** general, the convergence in distribution of a sequence of random variables does not necessarily guarantee the convergence of its covariance ***, the derivation of such convergence is a necessary complement to our theoretical analysis about factors that influence the convergence properties of the EB hyper-parameter *** this paper, we consider the regularized finite impulse response (FIR) model estimation with deterministic inputs, and show that the covariance matrix of the EB hyper-parameter estimator converges to that of its limiting ***, we run numerical simulations to demonstrate the efficacy of our theoretical results.
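For orientation, the regularized FIR estimate and the EB hyper-parameter estimator are commonly written as below. This is a sketch of the standard formulation from the kernel-based regularization literature; the symbols $Y$, $\Phi$, $\theta$, $K(\eta)$, $\sigma^2$, and $Q(\eta)$ are assumed notation and may differ from the paper's.

```latex
% Assumed notation (not taken from the abstract):
%   Y      : N-vector of measured outputs
%   \Phi   : N x n regression matrix built from the deterministic inputs
%   \theta : n-vector of FIR coefficients
%   K(\eta): kernel (prior covariance) matrix with hyper-parameter \eta
\begin{align}
  Y &= \Phi\theta + V, \qquad V \sim \mathcal{N}(0, \sigma^2 I_N), \\
  \hat{\theta}(\eta)
    &= \arg\min_{\theta}\; \|Y - \Phi\theta\|_2^2
       + \sigma^2\, \theta^{\top} K(\eta)^{-1} \theta
     = K(\eta)\Phi^{\top} Q(\eta)^{-1} Y, \\
  Q(\eta) &= \Phi K(\eta) \Phi^{\top} + \sigma^2 I_N, \\
  \hat{\eta}_{\mathrm{EB}}
    &= \arg\min_{\eta}\; Y^{\top} Q(\eta)^{-1} Y + \log\det Q(\eta).
\end{align}
```

The last line is the negative log marginal likelihood of $Y$ (up to constants) under the Gaussian prior $\theta \sim \mathcal{N}(0, K(\eta))$, which is why the estimator is called empirical Bayes.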
In this work, we study the problem of deploying and operating correlated data-intensive vNF-SCs in inter-datacenter elastic optical networks. Requiring for a set of correlated data-intensive vNF-SCs, the service compl...