Training activations of deep neural networks occupy a large share of GPU memory, especially for large-scale networks, and limited GPU memory hampers the further development of deep neural networks. Making the best possible use of GPU memory is therefore in high demand. Swapping and recomputation are commonly applied to make better use of GPU memory in deep learning. In this emerging domain, several problems remain: 1) the efficiency of recomputation is limited, and swapping between GPU and CPU incurs a severe time delay; 2) a dynamic runtime memory manager for tensor swapping and tensor recomputation is still lacking; 3) manual decisions about which activations to handle when training deep neural networks require expert prior knowledge and experience. To remedy these issues, we propose a novel memory manager named DELTA (Dynamic tEnsor offLoading and recompuTAtion). To the best of our knowledge, we are the first to propose a reasonable dynamic runtime manager that combines tensor swapping and tensor recomputation without user oversight. In DELTA, we first propose a filter algorithm to select the optimal tensors to be released from GPU memory, and then present a director algorithm to select a proper action for each of these tensors. Furthermore, prefetching and overlapping are deliberately considered to offset the time cost of swapping and recomputing tensors. Experimental results show that DELTA not only saves 40%-70% of GPU memory, surpassing the state-of-the-art method by a large margin, but also achieves convergence comparable to the baseline with an acceptable time delay. In addition, DELTA achieves a 2.04× larger maximum batch size when training ResNet-50 and 2.25× when training ResNet-101 compared with the baseline. Finally, comparisons between the swapping cost and the recomputation cost in our experiments demonstrate the importance of making a reasonable decision between tensor swapping and tensor recomputation, which refutes the arguments in
Traffic sign recognition systems have been applied to advanced driving assistance and automatic driving systems to help drivers obtain important road information accurately. The current mainstream detection methods achieve high accuracy in this task, but their models have many parameters and their detection speed is slow. Using YOLOv5s as the basic framework, this paper proposes YOLOv5s-A2, which can improve the detection speed and reduce the model size at the cost of reducing the detection accuracy. First, a data augmentation strategy that combines various operations is proposed to alleviate the problem of unbalanced class instances. Second, we propose a path aggregation module for the Feature Pyramid Network (FPN) that adds new horizontal connections. It enhances the multi-scale feature representation capability and compensates for the loss of feature information. Third, an attention detection head module is proposed to address the aliasing effect in cross-scale fusion and enhance the representation of predictive features. Experiments on the Tsinghua-Tencent 100K dataset (TT100K) show that our method achieves a larger performance improvement and faster inference speed than other advanced methods. Our method achieves 87.3% mean average precision (mAP), 7.9% higher than the original model, while the frames per second (FPS) value is maintained at 87.7. To show generality, we tested it on the German Traffic Sign Detection Benchmark (GTSDB) without tuning and obtained an average precision of 94.1%, with the FPS value maintained at about 105.3. In addition, YOLOv5s-A2 has about 7.9 M parameters.
ISBN: (print) 9783031751066; 9783031751073
Drone formation flights, as performed in the Intel Drone Shows, are a fascinating demonstration of the current state of technology. We revisit this idea using the paradigm of self-organization in the form of swarm behavior. Applying swarm behavior to formation flight promises high scalability, robustness, and flexibility. Swarm behavior allows for impressive patterns where centrally coordinated approaches might reach their limits. In this paper, we propose PROTEASE(2.0) as a next-level approach to parametrizable swarm behavior. Like its predecessor, PROTEASE(2.0) enables us to use a single generalized implementation to produce emergent effects by adjusting only the parameters of the swarm members. Further, we now facilitate novel formations that were previously unattainable. New formations include parallel swarms interacting with each other, single swarms using multiple reference points to enable surprising flight patterns, and hierarchical swarm structures that extend the possibilities even further. Our focus in this paper lies in the experimental evaluation of these concepts in simulated environments. In combination with successful pre-evaluations of swarm behavior using real drones, we confidently look towards future experiments that also apply PROTEASE(2.0) in the real world.
Colorectal polyps can evolve into colon cancer over time. Early screening or detection of colon polyps using computer-aided detection (CAD) techniques along with removal of the polyps can lower the risk of colon cance...
Fingerprinting is a practical technology for improving Wi-Fi-based positioning in complex indoor environments. However, the laborious and costly nature of site surveys hinders the creation of accurate fingerprints. In this article, we propose virtual feature maps and contrastive learning-enhanced indoor positioning (VF-CLIP), a novel method for indoor positioning based on received signal strength (RSS) that aims to reduce the repetitive site survey overhead caused by the dynamic provisioning of access points (APs). VF-CLIP uses a deep neural network fine-tuning technique to reconstruct the fingerprints by incorporating newly detected APs. The proposed method converts raw RSS indicator (RSSI) queries into multiple virtual feature maps (VFMs), which capture the differential similarities between the query vector and the fingerprints from virtual observational reference points (RPs). A depth-wise Transformer (DepTrans) is then employed to learn the directional spatial relations of these virtual features. Subsequently, contrastive learning is applied to compress the features into a latent space, where the feature distribution at an RP becomes compact. We evaluated VF-CLIP on four public datasets and a fingerprint dataset collected within Huazhong University of Science and Technology, comparing its performance with other state-of-the-art methods. The experimental results demonstrated the effectiveness of VF-CLIP in terms of positioning accuracy and its adaptability to varying AP configurations, suggesting its potential applicability in real-world environments.
The open and free nature of online platforms presents challenges for tracing malicious information. To address this, we propose a traceability model based on neighborhood similarity and multitype interaction. First, we propose a neighborhood similarity algorithm (D-NTC) to address the universality of malicious information dissemination. This algorithm evaluates the impact of user node importance on malicious information propagation by combining node degrees and the topological overlap of neighboring nodes. Second, we consider the interactive nature of multiple types of elements in the network and construct an interactive module based on user-path-malicious information. This module effectively captures the mutual influence relationships among diverse elements. Additionally, we employ representation learning to optimize the transition probability matrix between elements, leveraging hidden relationships to further characterize their interactive impact. Finally, we propose the NSMTI-Rank algorithm, which tackles the complexity of quantifying the influence of multiple types of elements. Drawing inspiration from mutual reinforcement effects, NSMTI-Rank comprehensively quantifies element influence through an iterative framework. Experimental results demonstrate the effectiveness of our approach in mining user node importance and capturing the interaction information among diverse elements in the network. Moreover, it enables the timely and effective identification of sources of malicious information dissemination.
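The abstract above does not give the D-NTC formula, so the following is only a minimal sketch of the general idea of combining node degree with the topological overlap of neighboring nodes. The Jaccard-style overlap, the `degree * (1 + mean_overlap)` combination, and the function names `neighborhood_overlap` and `node_scores` are all assumptions made for illustration, not the authors' method.

```python
def neighborhood_overlap(adj, u, v):
    """Jaccard overlap of the neighbor sets of u and v (assumed overlap measure).

    u and v themselves are excluded so that the edge (u, v) does not count
    as shared neighborhood.
    """
    nu = adj[u] - {v}
    nv = adj[v] - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def node_scores(adj):
    """Score every node from its degree and its mean overlap with neighbors.

    The combination degree * (1 + mean_overlap) is a hypothetical choice;
    the source only states that degree and topological overlap are combined.
    """
    scores = {}
    for u, nbrs in adj.items():
        degree = len(nbrs)
        if degree == 0:
            scores[u] = 0.0
            continue
        mean_overlap = sum(neighborhood_overlap(adj, u, v) for v in nbrs) / degree
        scores[u] = degree * (1.0 + mean_overlap)
    return scores
```

Here `adj` is an undirected graph stored as a dictionary that maps each node to the set of its neighbors. Under this assumed scoring, a node ranks higher when it has many neighbors and those neighbors share much of its neighborhood.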
In the software industry, Software Reliability Growth Models (SRGMs) with confidence intervals (C.I.) are frequently employed as valuable tools to assist managers in determining the optimal timing for software releases a...
To achieve a rational and economical shear wall layout, an improved algorithm was designed for an architectural design program. The improved algorithm is based on the basic framework of the genetic algorithm and the particle swarm optimization algorithm: it first adjusts the inertia weight and then introduces an elimination mechanism and mutation rate control. A shear wall design model was constructed using the improved algorithm and applied to determine the layout of shear walls in a 28-story high-rise building in a certain city. The example results show that when the designed shear wall design program is used for scheme design, the success rate reaches 100%, which is 38.47% higher than that of the original particle swarm optimization algorithm. The obtained optimization scheme has interlayer displacement angles of 1/2096 and 1/1800 in the vertical and horizontal directions, respectively, while the torsional displacement ratio in both directions is 1.0908 and the torsional period ratio is 0.7125. With the optimized algorithm, the length of shear wall material was reduced by 10.97%, effectively reducing material use. This not only lowers construction costs but also improves space utilization efficiency. The building design scheme obtained in this study not only meets national standards but also has a lower computational time cost. This study demonstrates the potential of this design algorithm for solving traditional architectural design problems. It not only provides new tools for the field of architectural design but also encourages more interdisciplinary cooperation, integrating computer science and artificial intelligence technology more closely with building engineering.
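The abstract does not say how the inertia weight is adjusted. As a minimal sketch of what such an adjustment can look like, the code below uses a linearly decreasing inertia weight, a common particle swarm optimization variant chosen here as an assumption. The elimination mechanism and mutation rate control are not modeled, and the names `linear_inertia` and `pso_minimize` and all default parameter values are hypothetical.

```python
import random


def linear_inertia(step, total_steps, w_max=0.9, w_min=0.4):
    """Inertia weight that decreases linearly from w_max to w_min (assumed schedule)."""
    frac = step / max(total_steps - 1, 1)
    return w_max - (w_max - w_min) * frac


def pso_minimize(objective, dim, bounds, n_particles=20, total_steps=100,
                 c1=2.0, c2=2.0, seed=0):
    """Plain PSO with the inertia schedule above; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [objective(p) for p in pos]      # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far

    for step in range(total_steps):
        w = linear_inertia(step, total_steps)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Clamp the new position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val
```

A large inertia weight early on favors exploration, and a small one later favors refinement around the best solution found. A call such as `pso_minimize(lambda x: sum(v * v for v in x), dim=2, bounds=(-5.0, 5.0))` minimizes a simple sphere function; a real shear wall layout objective would replace the lambda.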
Recently, the IEEE 802.15.4 time-slotted channel hopping (TSCH) has been considered one of the emerging medium access control (MAC) protocols for the low-power and highly reliable wireless powered sensor networks (WPS...
Text classification is a fundamental task in web content mining. Although the existing supervised contrastive learning (SCL) approach combined with pre-trained language models (PLMs) has achieved leading performance i...