Today's deep learning models face an increasing demand to handle dynamic shape tensors and computation whose shape information remains unknown at compile time and varies in a nearly infinite range at runtime. This shape dynamism brings tremendous challenges for existing compilation pipelines designed for static models which optimize tensor programs relying on exact shape values. This paper presents TSCompiler, an end-to-end compilation framework for dynamic shape models. TSCompiler first proposes a symbolic shape propagation algorithm to recover symbolic shape information at compile time to enable subsequent optimizations. TSCompiler then partitions the shape-annotated computation graph into multiple subgraphs and fine-tunes the backbone operators from the subgraph within a hardware-aligned search space to find a collection of high-performance schedules. TSCompiler can propagate the explored backbone schedule to other fusion groups within the same subgraph to generate a set of parameterized tensor programs for fused cases based on dependence analysis. At runtime, TSCompiler utilizes an occupancy-targeted cost model to select from pre-compiled tensor programs for varied tensor shapes. Extensive evaluations show that TSCompiler can achieve state-of-the-art speedups for dynamic shape models. For example, we can improve kernel efficiency by up to 3.97× on NVIDIA RTX3090, and 10.30× on NVIDIA A100 and achieve up to five orders of magnitude speedups on end-to-end latency.
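The idea of recovering symbolic shapes at compile time can be illustrated with a small sketch. This is not TSCompiler's algorithm, which the abstract does not spell out; it only assumes that a dimension is either a known integer or a named symbol, and the helper names (`matmul_shape`, `elementwise_shape`, `bind`) are hypothetical.

```python
# Hypothetical sketch: a dimension is an int (static) or a str (symbolic).


def matmul_shape(a, b):
    """Shape rule for a 2-D matrix product: (m, k) x (k, n) -> (m, n)."""
    if len(a) != 2 or len(b) != 2:
        raise ValueError("this sketch only handles 2-D operands")
    # A fuller system would record a constraint such as k == 768 here;
    # this sketch simply requires the inner dimensions to be identical.
    if a[1] != b[0]:
        raise ValueError(f"inner dimensions differ: {a[1]!r} vs {b[0]!r}")
    return (a[0], b[1])


def elementwise_shape(a, b):
    """Shape rule for an elementwise op without broadcasting."""
    if a != b:
        raise ValueError(f"shapes differ: {a!r} vs {b!r}")
    return a


def bind(shape, env):
    """Substitute runtime values for the symbolic dimensions."""
    return tuple(env[d] if isinstance(d, str) else d for d in shape)


# Compile time: "n" is unknown, yet every intermediate shape is derived.
x = ("n", 768)
w = (768, 3072)
hidden = matmul_shape(x, w)
out = elementwise_shape(hidden, ("n", 3072))

# Runtime: the symbol is bound once the actual input arrives.
concrete = bind(out, {"n": 17})
```

With the symbolic result `("n", 3072)` available ahead of time, a compiler can prepare several schedules parameterized by `n` and choose among them once `n` is known at runtime.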
In blockchain-based unmanned aerial vehicle (UAV) communication systems, the length of a block affects the performance of the *** transmission performance of blocks in the form of finite character segments is also affected by the block ***, it is crucial to balance the transmission performance and blockchain performance of blockchain communication systems, especially in wireless environments involving *** paper investigates a secure transmission scheme for blocks in blockchain-based UAV communication systems to prevent the information contained in blocks from being completely eavesdropped during *** our scheme, using a friendly jamming UAV to emit jamming signals diminishes the quality of the eavesdropping channel, thus enhancing the communication security performance of the source *** the constraints of maneuverability and transmission power of the UAV, the joint design of UAV trajectories, transmission power, and block length is proposed to maximize the average minimum secrecy rate (AMSR). Since the optimization problem is non-convex and difficult to solve directly, we first decompose the optimization problem into subproblems of trajectory optimization, transmission power optimization, and block length ***, based on first-order approximation techniques, these subproblems are reformulated as convex optimization ***, we utilize an alternating iteration algorithm based on the successive convex approximation (SCA) technique to solve these subproblems *** simulation results demonstrate that our proposed scheme can achieve secure transmission for blocks while maintaining the performance of the blockchain.
The effectiveness of modeling contextual information has been empirically shown in numerous computer vision tasks. In this paper, we propose a simple yet efficient augmented fully convolutional network (AugFCN) by aggregating content- and position-based object contexts for semantic ***, motivated by the fact that each deep feature map is a global, class-wise representation of the input, we first propose an augmented nonlocal interaction (AugNI) to aggregate the global content-based contexts through all feature map interactions. Compared to classical position-wise approaches, AugNI is more efficient. Moreover, to eliminate permutation equivariance and maintain translation equivariance, a learnable, relative position embedding branch is then added to AugNI to capture the global position-based contexts. AugFCN is built on a fully convolutional network as the backbone by deploying AugNI before the segmentation head network. Experimental results on two challenging benchmarks verify that AugFCN can achieve a competitive 45.38% mIoU (standard mean intersection over union) and 81.9% mIoU on the ADE20K val set and Cityscapes test set, respectively, with little computational overhead. Additionally, the results of the joint implementation of AugNI and existing context modeling schemes show that AugFCN leads to continuous segmentation improvements in state-of-the-art context modeling. We finally achieve a top performance of 45.43% mIoU on the ADE20K val set and 83.0% mIoU on the Cityscapes test set.
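For readers unfamiliar with the metric quoted above, here is a minimal sketch of mean intersection over union on flattened label lists. The function name `mean_iou` and the choice to skip classes absent from both prediction and ground truth are assumptions for illustration; benchmark toolkits such as those for ADE20K and Cityscapes apply their own conventions (for example, ignored labels).

```python
def mean_iou(pred, target, num_classes):
    """Average the per-class IoU over classes that occur in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both lists.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one list.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # Skip classes that never appear (an assumption).
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Four "pixels", two classes.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.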
Because of their advantages of high energy and power density, low self-discharge rate, and long lifespan, lithium-ion batteries (LIBs) have been widely used in many applications such as electric vehicles, energy storage systems, smart grids, ***, lithium-ion battery systems (LIBSs) frequently malfunction because of complex working conditions, harsh operating environment, battery inconsistency, and inherent defects in battery ***, safety of LIBSs has become a prominent problem and has attracted wide ***, efficient and accurate fault diagnosis for LIBs is very *** paper provides a comprehensive review of the latest research progress in fault diagnosis for ***, the types of battery faults are comprehensively introduced and the characteristics of each fault are ***, the fault diagnosis methods are systematically elaborated, including model-based, data processing-based, machine learning-based and knowledge-based *** latest research is discussed and existing issues and challenges are presented, while future developments are also *** aim is to promote further research into efficient and advanced fault diagnosis methods for more reliable and safer LIBs.
Graph neural networks (GNNs) have gained increasing popularity, while usually suffering from unaffordable computations for real-world large-scale applications. Hence, pruning GNNs is much needed but largely unexplored. The recent work Unified GNN Sparsification (UGS) studies lottery ticket learning for GNNs, aiming to find a subset of model parameters and graph structures that can best maintain the GNN performance. However, it is tailored for the transductive setting and fails to generalize to unseen graphs, which are common in inductive tasks like graph classification. In this work, we propose a simple and effective learning paradigm, Inductive Co-Pruning of GNNs (ICPG), to endow graph lottery tickets with inductive pruning capacity. To prune the input graphs, we design a predictive model to generate importance scores for each edge based on the input. To prune the model parameters, we treat each weight’s magnitude as its importance score. Then we design an iterative co-pruning strategy to trim the graph edges and GNN weights based on their importance scores. Although it might be strikingly simple, ICPG surpasses the existing pruning method and can be universally applicable in both inductive and transductive learning settings. On 10 graph-classification and two node-classification benchmarks, ICPG achieves the same performance level with 14.26%–43.12% sparsity for graphs and 48.80%–91.41% sparsity for the GNN model.
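The score-based trimming step can be sketched as follows. This is an illustration under stated assumptions, not the ICPG implementation: `keep_mask` is a hypothetical helper, the edge scores are made-up numbers standing in for the output of the predictive model, and the weight scores are plain absolute values.

```python
def keep_mask(scores, prune_fraction):
    """Return a 0/1 mask that drops the lowest-scoring fraction of items."""
    if not 0.0 <= prune_fraction <= 1.0:
        raise ValueError("prune_fraction must lie in [0, 1]")
    n_prune = int(len(scores) * prune_fraction)
    # Indices ordered from least to most important.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    dropped = set(order[:n_prune])
    return [0 if i in dropped else 1 for i in range(len(scores))]


# Weights: importance is the magnitude of each weight.
weights = [0.5, -0.1, 2.0, 0.05]
weight_mask = keep_mask([abs(w) for w in weights], prune_fraction=0.5)
pruned_weights = [w * m for w, m in zip(weights, weight_mask)]

# Edges: importance would come from a learned scorer (values assumed here).
edge_scores = [0.9, 0.2, 0.7, 0.4, 0.1]
edge_mask = keep_mask(edge_scores, prune_fraction=0.4)
```

An iterative schedule would repeat this trimming over several rounds, retraining between rounds and raising the pruned fraction each time; the abstract does not give the exact schedule.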
Direct volume rendering (DVR) is a technique that emphasizes structures of interest (SOIs) within a volume visually, while simultaneously depicting adjacent regional information, e.g., the spatial location of a structure concerning its *** DVR, transfer function (TF) plays a key role by enabling accurate identification of SOIs interactively as well as ensuring appropriate visibility of *** generation typically involves non-intuitive trial-and-error optimization of rendering parameters, which is time-consuming and *** at mitigating this manual process have led to approaches that make use of a knowledge database consisting of pre-designed TFs by domain *** these approaches, a user navigates the knowledge database to find the most suitable pre-designed TF for their input volume to visualize the *** these approaches potentially reduce the workload to generate the TFs, they, however, require manual TF navigation of the knowledge database, as well as the likely fine tuning of the selected TF to suit the *** this work, we propose a TF design approach, CBR-TF, where we introduce a new content-based retrieval (CBR) method to automatically navigate the knowledge *** of pre-designed TFs, our knowledge database contains volumes with SOI *** an input volume, our CBR-TF approach retrieves relevant volumes (with SOI labels) from the knowledge database; the retrieved labels are then used to generate and optimize TFs of the *** approach largely reduces manual TF navigation and fine *** our CBR-TF approach, we introduce a novel volumetric image feature which includes both a local primitive intensity profile along the SOIs and regional spatial semantics available from the co-planar images to the *** the regional spatial semantics, we adopt a convolutional neural network to obtain high-level image feature *** the intensity profile, we extend the dynamic time warping technique to address subtle alignment
Bat Algorithm (BA) is a nature-inspired metaheuristic search algorithm designed to efficiently explore complex problem spaces and find near-optimal solutions. The algorithm is inspired by the echolocation behavior of ...
To address the matching problem caused by the significant differences in spatial features, spectrum and contrast between heterologous images, a heterologous image matching method based on salience region is proposed i...
Visual Place Recognition (VPR) technology aims to use visual information to judge the location of agents, which plays an irreplaceable role in tasks such as loop closure detection and *** is well known that previous VPR algorithms emphasize the extraction and integration of general image features, while ignoring the mining of salient features that play a key role in the discrimination of VPR *** this end, this paper proposes a Domain-invariant Information Extraction and Optimization Network (DIEONet) for *** core of the algorithm is a newly designed Domain-invariant Information Mining Module (DIMM) and a Multi-sample Joint Triplet Loss (MJT Loss). Specifically, DIMM incorporates the interdependence between different spatial regions of the feature map in the cascaded convolutional unit group, which enhances the model’s attention to the domain-invariant static object *** Loss introduces the “joint processing of multiple samples” mechanism into the original triplet loss, and adds a new distance constraint term for “positive and negative” samples, so that the model can avoid falling into local optimum during *** demonstrate the effectiveness of our algorithm by conducting extensive experiments on several authoritative *** particular, the proposed method achieves the best performance on the TokyoTM dataset with a Recall@1 metric of 92.89%.
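As background for the loss that MJT Loss builds on, the following is a minimal sketch of the standard single-triplet loss. It does not include the multi-sample processing or the additional positive/negative distance term described in the abstract, whose exact form is not given here; the margin value and the use of unsquared Euclidean distance are assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length descriptors."""
    if len(u) != len(v):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.1):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# The negative is far from the anchor, so the triplet is already satisfied.
easy = triplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0], margin=1.0)

# Positive and negative swapped: the loss penalizes the violation.
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0], margin=1.0)
```

The loss is zero once the negative is at least `margin` farther from the anchor than the positive, which is why only violating triplets contribute gradient during training.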
Session-based recommendation is a popular research topic that aims to predict users’ next possible interactive item by exploiting anonymous *** existing studies mainly focus on making predictions by considering users’ single interactive *** recent efforts have been made to exploit multiple interactive behaviors, but they generally ignore the influences of different interactive behaviors and the noise in interactive *** address these problems, we propose a behavior-aware graph neural network for session-based ***, different interactive sequences are modeled as directed ***, the item representations are learned via graph neural ***, a sparse self-attention module is designed to remove the noise in behavior ***, the representations of different behavior sequences are aggregated with the gating mechanism to obtain the session *** results on two public datasets show that our proposed method outperforms all competitive *** source code is available on GitHub.