Today's deep learning models face an increasing demand to handle dynamic shape tensors and computation whose shape information remains unknown at compile time and varies in a nearly infinite range at runtime. This shape dynamism brings tremendous challenges for existing compilation pipelines designed for static models which optimize tensor programs relying on exact shape values. This paper presents TSCompiler, an end-to-end compilation framework for dynamic shape models. TSCompiler first proposes a symbolic shape propagation algorithm to recover symbolic shape information at compile time to enable subsequent optimizations. TSCompiler then partitions the shape-annotated computation graph into multiple subgraphs and fine-tunes the backbone operators from the subgraph within a hardware-aligned search space to find a collection of high-performance schedules. TSCompiler can propagate the explored backbone schedule to other fusion groups within the same subgraph to generate a set of parameterized tensor programs for fused cases based on dependence analysis. At runtime, TSCompiler utilizes an occupancy-targeted cost model to select from pre-compiled tensor programs for varied tensor shapes. Extensive evaluations show that TSCompiler can achieve state-of-the-art speedups for dynamic shape models. For example, we can improve kernel efficiency by up to 3.97× on NVIDIA RTX3090, and 10.30× on NVIDIA A100 and achieve up to five orders of magnitude speedups on end-to-end latency.
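The idea of recovering symbolic shapes at compile time can be illustrated with a minimal sketch. This is not TSCompiler's algorithm, which the abstract does not detail. It assumes each dimension is either a known integer or a named symbol, and it pushes those through a toy graph using per-operator rules. All names here (`propagate`, `infer_matmul`, `infer_elementwise`, `RULES`) are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Dim = Union[int, str]      # an int is a known size, a str is a symbolic size
Shape = Tuple[Dim, ...]


def infer_matmul(a: Shape, b: Shape) -> Shape:
    # (m, k) @ (k, n) -> (m, n); the inner dims must be the same value or symbol.
    if len(a) != 2 or len(b) != 2 or a[1] != b[0]:
        raise ValueError(f"incompatible matmul shapes: {a} and {b}")
    return (a[0], b[1])


def infer_elementwise(a: Shape, b: Shape) -> Shape:
    # Element-wise ops keep the shape; broadcasting is left out of this sketch.
    if a != b:
        raise ValueError(f"shape mismatch: {a} and {b}")
    return a


RULES = {"matmul": infer_matmul, "add": infer_elementwise}

# A node is (output name, operator name, (left input, right input)).
Node = Tuple[str, str, Tuple[str, str]]


def propagate(inputs: Dict[str, Shape], nodes: List[Node]) -> Dict[str, Shape]:
    # Walk the nodes in topological order and annotate every tensor with a shape.
    shapes = dict(inputs)
    for out, op, (lhs, rhs) in nodes:
        shapes[out] = RULES[op](shapes[lhs], shapes[rhs])
    return shapes


graph_inputs = {"x": ("batch", 768), "w": (768, 3072), "b": ("batch", 3072)}
graph_nodes = [("h", "matmul", ("x", "w")), ("y", "add", ("h", "b"))]
annotated = propagate(graph_inputs, graph_nodes)
print(annotated["y"])
```

The symbol `"batch"` survives to the output shape, so a later pass can still reason about which dimensions are dynamic without knowing their runtime values.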
ChatGPT, an advanced language model powered by artificial intelligence, has emerged as a transformative tool in the field of education. This article explores the potential of ChatGPT in revolutionizing learning and co...
Delay Tolerant Networks (DTNs) have the ability to make communication possible without end-to-end connectivity using the store-carry-forward technique. Efficient data dissemination in DTNs is a very challenging problem due ...
Internet of Vehicles (IoV) integrates with various heterogeneous nodes, such as connected vehicles, roadside units, etc., which establishes a distributed network. Vehicles are managed nodes providing all the services ...
Regular exercise is a crucial aspect of daily life, as it enables individuals to stay physically active, lowers the likelihood of developing illnesses, and enhances life expectancy. The recognition of workout actions in video streams holds significant importance in computer vision research, as it aims to enhance exercise adherence, enable instant recognition, advance fitness tracking technologies, and optimize fitness routines. However, existing action datasets often lack diversity and specificity for workout actions, hindering the development of accurate recognition models. To address this gap, the Workout Action Video dataset (WAVd) has been introduced as a significant contribution. WAVd comprises a diverse collection of labeled workout action videos, meticulously curated to encompass various exercises performed by numerous individuals in different settings. This research proposes an innovative framework based on the Attention driven Residual Deep Convolutional-Gated Recurrent Unit (ResDC-GRU) network for workout action recognition in video streams. Unlike image-based action recognition, videos contain spatio-temporal information, making the task more complex and challenging. While substantial progress has been made in this area, challenges persist in detecting subtle and complex actions, handling occlusions, and managing the computational demands of deep learning approaches. The proposed ResDC-GRU Attention model demonstrated exceptional classification performance with 95.81% accuracy in classifying workout action videos and also outperformed various state-of-the-art models. The method also yielded 81.6%, 97.2%, 95.6%, and 93.2% accuracy on established benchmark datasets, namely HMDB51, Youtube Actions, UCF50, and UCF101, respectively, showcasing its superiority and robustness in action recognition. The findings suggest practical implications in real-world scenarios where precise video action recognition is paramount, addressing the persisting challenges in the field. The WAVd datas
In this paper, a new approach for mining image association rules is presented, which involves the fine-tuned CNN model, as well as the proposed FIAR and OFIAR algorithms. Initially, the image transactional database is...
Effective task scheduling and resource allocation have become major problems as a result of the fast development of cloud computing as well as the rise of multi-cloud systems. To successfully handle these issues, we p...
Introducing minor alloying elements is an effective strategy to improve the corrosion and mechanical properties of zirconium alloys for nuclear *** in-reactor service, external environment and stress can affect the distribution of alloying elements, substantially changing the degradation process of zirconium *** date, there is a lack of in-depth understanding of the interaction between creep and microchemistry ***, we conducted systematic transmission electron microscopy (TEM) and atom probe tomography (APT) investigations to address creep-induced redistribution of alloying elements in CZ1 (Zr-Sn-Nb-Fe-Cr-Cu) zirconium alloy with different initial ***, Fe, Sn, and Cu are found to co-segregate at grain *** higher the intermediate annealing temperature, the larger the Gibbsian interfacial excesses of solute elements *** further demonstrate that creep can reduce the excess value of Fe at grain boundaries due to the coarsening of Zr-Fe-Cr second phase particles via grain boundary and dislocation pipe *** the same time, the excess value of Sn is increased by diffusing from the matrix to grain ***, Cu as a minor element in the concentration range of 0.05-0.3 wt.% is found to segregate at dislocations to form the Cottrell atmosphere and develop Cu-rich nanoclusters for suppressing dislocation *** new understanding of the segregation and clustering of minor alloying elements provides guidance for developing zirconium alloys with enhanced creep resistance.
In blockchain networks, transactions can be transmitted through channels. The existing transmission methods depend on their routing information. If a node randomly chooses a channel to transmit a transaction, the transmission may be aborted due to insufficient funds (also called balance) or a low transmission rate. To increase the success rate and reduce transmission delay across all transactions, this work proposes a transaction transmission model for blockchain channels based on non-cooperative game *** balance, channel states, and transmission probability are fully considered. This work then presents an optimized channel transaction transmission algorithm. First, channel balances are analyzed and suitable channels are selected if their balance is sufficient. Second, a Nash equilibrium point is found by using an iterative sub-gradient method and its related channels are then used to transmit transactions. The proposed method is compared with two state-of-the-art approaches: Silent Whispers and Speedy Murmurs. Experimental results show that the proposed method improves transmission success rate, reduces transmission delay, and effectively decreases transmission overhead in comparison with its two competitive peers.
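The first step of the algorithm, keeping only channels whose balance is sufficient, can be sketched as follows. This is an illustrative assumption about how such a filter might look, not the authors' implementation: the function name `select_channels` and the dictionary layout are hypothetical, and the game-theoretic second step is not shown because the abstract does not specify the utility functions.

```python
from typing import Dict, List


def select_channels(balances: Dict[str, float], amount: float) -> List[str]:
    # Keep the channels whose balance can cover the transaction amount,
    # preserving the order in which the channels were listed.
    return [channel for channel, balance in balances.items() if balance >= amount]


channel_balances = {"a": 5.0, "b": 1.0, "c": 3.0}
candidates = select_channels(channel_balances, amount=3.0)
print(candidates)
```

Only the surviving candidates would then enter the equilibrium computation, which avoids attempting a transmission that is certain to abort for lack of funds.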
The work proposes a methodology for five different classes of ECG signals. The methodology utilises a moving average filter and discrete wavelet transformation for the removal of baseline wandering and powerline interfer...