With the emergence of the Transformer architecture, the accuracy of deep learning within the domain of facial emotion recognition has seen further enhancement. However, Transformer comes with increased training comple...
Data race is one of the most important concurrent anomalies in multi-threaded programs. Recently, constraint-based techniques have been leveraged in race detection, and they are able to find all the races that can be found by any other sound race detector. However, this constraint-based approach has serious limitations in helping programmers analyze and understand data races. First, it may report a large number of false positives due to the unrecognized dataflow propagation of the program. Second, it recommends a wide range of thread context switches to schedule the reported race (including the false ones) whenever this race is exposed during the constraint-solving process. This ad hoc recommendation imposes too many context switches, which complicates the data race analysis. To address these two limitations in state-of-the-art constraint-based race detection, this paper proposes DFTracker, an improved constraint-based race detector that recommends each data race with minimal thread context switches. Specifically, we reduce the false positives by analyzing and tracking the dataflow in the program; by this means, DFTracker reduces the unnecessary analysis of false race schedules. We further propose a novel algorithm to recommend an effective race schedule with minimal thread context switches for each data race. The experimental results on real applications demonstrate that 1) without removing any true data race, DFTracker effectively prunes false positives by 68% in comparison with the state-of-the-art constraint-based race detector; and 2) DFTracker recommends as few as 2.6-8.3 (4.7 on average) thread context switches per data race in the real world, which is 81.6% fewer context switches per data race than the state-of-the-art constraint-based race detector. In summary, DFTracker can be used as an effective tool to help programmers understand data races.
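The minimal-context-switch idea above can be illustrated with a toy scheduler. This is a hedged sketch, not DFTracker's actual algorithm: `prefixes`, `minimal_switch_schedule`, and the event representation are invented for illustration. The trick is to keep each thread's prefix contiguous and to run the first racing thread's prefix last, so its racy access needs no extra switch.

```python
def count_context_switches(schedule):
    """Number of thread switches in an interleaving, given as a list of thread ids."""
    return sum(1 for a, b in zip(schedule, schedule[1:]) if a != b)

def minimal_switch_schedule(prefixes, racy):
    """Toy scheduler (illustrative only): run each thread's prefix as one
    contiguous block, ordering the first racing thread last, then append
    the two racy accesses back to back.

    prefixes: {thread_id: [events before its racy access]}
    racy: (thread of first racy access, thread of second racy access)
    """
    first, second = racy
    # Unrelated threads first, then the second racer, then the first racer,
    # so the first racy access continues its own thread without a switch.
    order = [t for t in prefixes if t not in racy] + [second, first]
    schedule = [t for t in order for _ in prefixes[t]]
    schedule += [first, second]  # the racy pair, scheduled adjacently
    return schedule
```

Here the total switch count is just (number of non-empty prefix blocks - 1) plus the single switch between the two racy accesses.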
Image inpainting aims to restore a realistic image from a damaged or incomplete version. Although Transformer-based methods have achieved impressive results by modeling long-range dependencies, the inherent quadratic ...
Edge-assisted mobile crowdsensing (EMCS) has gained significant attention as a data collection paradigm. However, existing incentive mechanisms in EMCS systems rely on centralized platforms, making them impractical for the decentralized nature of EMCS. To address this limitation, we propose CHASER, an incentive mechanism designed for blockchain-based EMCS (BEMCS) systems. CHASER can attract more participants by satisfying the incentive requirements of budget balance, double-side truthfulness, double-side individual rationality, and high social welfare. Moreover, the proposed BEMCS system with CHASER in smart contracts guarantees data confidentiality by utilizing an asymmetric encryption scheme, and the anonymity of participants by applying the zero-knowledge succinct non-interactive argument of knowledge (zk-SNARK). This also restrains the malicious behaviors of participants. Furthermore, most simulations show that the social welfare of CHASER exceeds that of the state-of-the-art baselines. CHASER achieves a competitive ratio of approximately 0.8 and a high task completion rate of over 0.8 in large-scale scenarios. These findings highlight the robustness and desirable performance of CHASER as an incentive mechanism within the BEMCS system.
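The incentive properties CHASER targets (double-side truthfulness, individual rationality, weak budget balance) are classically achieved by trade-reduction double auctions. The sketch below shows that standard textbook mechanism, not CHASER's own design; the function names are ours.

```python
def trade_reduction(bids, asks):
    """Trade-reduction double auction: sort buyers' bids descending and
    sellers' asks ascending, find the number k of breakeven pairs, then
    trade only the first k-1 pairs at prices set by the excluded k-th pair.
    Excluding one pair makes truthful reporting a dominant strategy and
    guarantees no deficit (buyer price >= seller price).

    Returns (number of trades, price each buyer pays, price each seller gets).
    """
    b = sorted(bids, reverse=True)  # buyers, highest bid first
    s = sorted(asks)                # sellers, lowest ask first
    k = 0
    while k < min(len(b), len(s)) and b[k] >= s[k]:
        k += 1                      # k = number of breakeven pairs
    if k < 2:
        return 0, None, None        # fewer than two pairs: nobody trades
    return k - 1, b[k - 1], s[k - 1]
```

For bids [10, 8, 6, 2] and asks [1, 3, 7, 9], two pairs break even, so one pair trades: the buyer pays 8 and the seller receives 3, leaving a non-negative surplus of 5.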
Hybrid memory systems composed of dynamic random access memory (DRAM) and non-volatile memory (NVM) often exploit page migration technologies to fully take advantage of the different memory media. However, previous proposals usually migrate data at a granularity of 4 KB pages, and thus waste memory bandwidth and DRAM space. In this paper, we propose Mocha, a non-hierarchical architecture that organizes DRAM and NVM in a flat address space physically, but manages them in a cache/memory hierarchy. Since the commercial NVM device, Intel Optane DC Persistent Memory Modules (DCPMM), actually accesses the physical media at a granularity of 256 bytes (an Optane block), we manage the DRAM cache at the 256-byte size to adapt to this feature of Optane. This design not only enables fine-grained data migration and management for the DRAM cache, but also avoids write amplification for Intel Optane. We also create an Indirect Address Cache (IAC) in the Hybrid Memory Controller (HMC) and propose a reverse address mapping table in the DRAM to speed up address translation and cache replacement. Moreover, we exploit a utility-based caching mechanism to filter cold blocks in the NVM, further improving the efficiency of the DRAM cache. We implement Mocha in an architectural simulator. Experimental results show that Mocha can improve application performance by 8.2% on average (up to 24.6%), and reduce energy consumption by 6.9% and data migration traffic by 25.9% on average, compared with a typical hybrid memory architecture, HSCC.
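Managing a DRAM cache at Optane's 256-byte block granularity amounts to indexing the cache by block rather than by 4 KB page. A minimal sketch, assuming a direct-mapped cache; the mapping and names below are illustrative, not Mocha's actual IAC layout.

```python
BLOCK = 256  # Optane's internal access granularity, in bytes

def cache_index(nvm_addr, num_sets):
    """Map an NVM physical address to a direct-mapped DRAM-cache slot at
    256-byte granularity: the block number selects the set, and the
    remaining high bits form the tag stored alongside the cached block."""
    block = nvm_addr // BLOCK
    return block % num_sets, block // num_sets  # (set index, tag)
```

With 4 KB pages the same address range would map sixteen 256-byte blocks to one slot, which is exactly the bandwidth and space waste that block-grained management avoids.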
Domain adaptive semantic segmentation enables robust pixel-wise understanding in real-world driving scenes. Source-free domain adaptation, as a more practical technique, addresses the concerns of data privacy and storage limitations in typical unsupervised domain adaptation methods, making it especially relevant in the context of intelligent vehicles. It utilizes a well-trained source model and unlabeled target data to achieve adaptation in the target domain. However, in the absence of source data and target labels, current solutions cannot sufficiently reduce the impact of domain shift and fully leverage the information from the target data. In this paper, we propose an end-to-end source-free domain adaptation semantic segmentation method via Importance-Aware and Prototype-Contrast (IAPC) learning. The proposed IAPC framework effectively extracts domain-invariant knowledge from the well-trained source model and learns domain-specific knowledge from the unlabeled target domain. Specifically, considering the problem of domain shift in the prediction of the target domain by the source model, we put forward an importance-aware mechanism for the biased target prediction probability distribution to extract domain-invariant knowledge from the source model. We further introduce a prototype-contrast strategy, which includes a prototype-symmetric cross-entropy loss and a prototype-enhanced cross-entropy loss, to learn target intra-domain knowledge without relying on labels. Comprehensive experiments on two domain adaptive semantic segmentation benchmarks demonstrate that the proposed end-to-end IAPC solution outperforms existing state-of-the-art methods. The source code is publicly available at https://***/yihong-97/Source-free-IAPC.
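The prototype-symmetric cross-entropy loss combines cross-entropy in both directions between the model's prediction and a prototype-derived soft label. A minimal sketch of the symmetric-CE computation, assuming both inputs are normalized distributions; the function names are ours, not the paper's.

```python
import math

def cross_entropy(p, q, eps=1e-8):
    """CE(p, q) = -sum_i p_i * log(q_i), with eps for numerical safety."""
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))

def prototype_symmetric_ce(pred, proto_label):
    """Symmetric cross-entropy CE(pred, proto) + CE(proto, pred).

    pred:        per-class softmax output of the segmentation model
    proto_label: soft label derived from class prototypes (illustrative)
    The symmetric form penalizes disagreement in both directions, which
    makes the loss more tolerant of noise in either distribution.
    """
    return cross_entropy(pred, proto_label) + cross_entropy(proto_label, pred)
```

Note the symmetry: swapping the two arguments leaves the loss unchanged, unlike plain cross-entropy.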
A large mode area multi-core orbital angular momentum (OAM) transmission fiber is designed and optimized by a neural network combined with optimization algorithms. A neural network model is first established to predict the optical properties of multi-core OAM transmission fibers with high accuracy and speed, including mode area, nonlinear coefficient, purity, dispersion, and effective index difference. Then the trained neural network model is combined with different particle swarm optimization (PSO) algorithms for automatic iterative optimization of multi-core structures. Owing to the structural advantages of multi-core fiber and the automatic optimization process, we designed a number of multi-core structures with high OAM mode purity (>95%) and an ultra-large mode area (>3000 µm²), which is more than an order of magnitude larger than that of conventional ring-core OAM transmission fibers.
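The optimization loop pairs the trained surrogate with PSO. Below is a plain PSO sketch in which `f` stands in for the neural-network predictor scoring a candidate fiber structure; the hyperparameters and names are illustrative, not the paper's configuration.

```python
import random

def pso_minimize(f, bounds, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5):
    """Minimize f over box bounds with a basic particle swarm.
    In the paper's setting, f would be the trained neural-network surrogate
    and each particle a vector of fiber structure parameters."""
    dim = len(bounds)
    pos = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # per-particle best positions
    pbest_f = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]     # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            fi = f(pos[i])
            if fi < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], fi
                if fi < gbest_f:
                    gbest, gbest_f = pos[i][:], fi
    return gbest, gbest_f
```

Because each evaluation of `f` is a fast surrogate call rather than a full-wave simulation, thousands of candidate structures can be screened per optimization run.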
The effectiveness of facial expression recognition (FER) algorithms hinges on the model's quality and the availability of a substantial amount of labeled expression data. However, labeling large datasets demands significant human, time, and financial resources. Although active learning methods have mitigated the dependency on extensive labeled data, a cold-start problem persists in small to medium-sized expression recognition datasets. This issue arises because the initial labeled data often fail to represent the full spectrum of facial expression categories. This paper introduces an active learning approach that integrates uncertainty estimation, aiming to improve the precision of facial expression recognition regardless of dataset scale. The method is divided into two primary phases. First, the model undergoes self-supervised pre-training using contrastive learning and uncertainty estimation to bolster its feature extraction capabilities. Second, the model is fine-tuned using the prior knowledge obtained from the pre-training phase to significantly improve recognition accuracy. In the pre-training phase, the model employs contrastive learning to extract fundamental feature representations from the complete unlabeled dataset. These features are then weighted through a self-attention mechanism with rank regularization. Subsequently, data from the low-weighted set are relabeled to further refine the model's feature extraction capability. The pre-trained model is then utilized in active learning to select and label information-rich samples more efficiently. Experimental results demonstrate that the proposed method significantly outperforms existing approaches, achieving improvements in recognition accuracy of 5.09% and 3.82% over the best existing active learning methods, Margin and Least Confidence, respectively, and a 1.61% improvement compared to the conventional segmented active learning method.
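The baselines mentioned (Margin, Least Confidence) rank unlabeled samples by predictive uncertainty and send the most uncertain ones to the annotator. A minimal sketch of least-confidence acquisition with hypothetical names; the paper's own pipeline adds contrastive pre-training on top of this kind of selection.

```python
def least_confidence(probs):
    """Uncertainty score per sample: 1 - (probability of the predicted class).
    probs: list of per-class probability rows, one row per unlabeled sample."""
    return [1.0 - max(row) for row in probs]

def select_for_labeling(probs, budget):
    """Return the indices of the `budget` most uncertain samples,
    i.e. those where the model's top prediction is least confident."""
    scores = least_confidence(probs)
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return ranked[:budget]
```

The Margin baseline differs only in the score: it uses the gap between the top two class probabilities instead of `1 - max`.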
Digital image processing (DIP) is the ability to manipulate digital photographs via algorithms for pattern detection, segmentation, enhancement, and noise reduction. In addition, the Internet of Things (IoT) acts as t...
Today's deep learning models face an increasing demand to handle dynamic shape tensors and computation whose shape information remains unknown at compile time and varies in a nearly infinite range at runtime. This shape dynamism brings tremendous challenges for existing compilation pipelines designed for static models which optimize tensor programs relying on exact shape values. This paper presents TSCompiler, an end-to-end compilation framework for dynamic shape models. TSCompiler first proposes a symbolic shape propagation algorithm to recover symbolic shape information at compile time to enable subsequent optimizations. TSCompiler then partitions the shape-annotated computation graph into multiple subgraphs and fine-tunes the backbone operators from the subgraph within a hardware-aligned search space to find a collection of high-performance schedules. TSCompiler can propagate the explored backbone schedule to other fusion groups within the same subgraph to generate a set of parameterized tensor programs for fused cases based on dependence analysis. At runtime, TSCompiler utilizes an occupancy-targeted cost model to select from pre-compiled tensor programs for varied tensor shapes. Extensive evaluations show that TSCompiler can achieve state-of-the-art speedups for dynamic shape models. For example, we can improve kernel efficiency by up to 3.97× on NVIDIA RTX3090, and 10.30× on NVIDIA A100 and achieve up to five orders of magnitude speedups on end-to-end latency.
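Symbolic shape propagation replaces exact dimension values with named symbols at compile time and checks consistency wherever values are known. A toy sketch for a single 2-D matmul, assuming dimensions are either ints or strings naming unknown symbols (e.g. 'batch'); TSCompiler's actual propagation covers whole computation graphs.

```python
def matmul_shape(a, b):
    """Propagate (possibly symbolic) shapes through a 2-D matmul A @ B.
    Concrete inner dimensions must match exactly; if either inner dim is a
    symbol, the check is deferred to runtime and propagation proceeds."""
    (m, k1), (k2, n) = a, b
    if isinstance(k1, int) and isinstance(k2, int) and k1 != k2:
        raise ValueError(f"shape mismatch: inner dims {k1} vs {k2}")
    return (m, n)
```

This is the property the paper's pipeline relies on: downstream passes can still reason about the output shape `(m, n)` even when `m` is only known symbolically.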