The article discusses an approach to decomposing a spatially distributed monitoring system (MS) into hierarchical levels. An original method for parameterizing the MS model for subsequent use in machine learning is pr...
Indoor object detection and recognition are an active research area in computer vision and artificial intelligence. Various deep learning-based techniques can be applied to object detection problems, and the appearance of deep convolutional neural networks (DCNN) brought a major breakthrough for many applications. Indoor object detection is a primary task that can assist Blind and Visually Impaired (BVI) persons during navigation. However, building a reliable indoor object detection system for edge device implementations remains a serious challenge. To address this problem, we propose an indoor object detection system based on a DCNN. A cross-stage partial network (CSPNet) was used for the detection process, with a lightweight backbone based on EfficientNet v2. To ensure a lightweight implementation on FPGA devices, various optimization techniques were applied to compress the model size and reduce its computational complexity. The proposed indoor object detection system was implemented on a Xilinx ZCU 102 board. Training and testing experiments were conducted on the proposed indoor objects dataset, which contains 11,000 images covering 25 landmark classes, and on an indoor object detection dataset. The proposed work achieved 82.60 mAP at 28 FPS for the original version and 80.04 mAP at 35 FPS for the compressed version.
At present, most of the building components, technologies and frameworks of deep learning are based on convolutional networks. However, some deep learning studies on image processing have shown that the capsule network can be more representational because it can capture various 'posture' changes, including translation, rotation and scaling, and can remember the positional relationships between parts. Despite the intriguing nature of the capsule network and its potential to open up entirely new natural language processing architectures, little work has been done in this area. In this work, we use the capsule network to learn the content text of an item (such as the plot text of a movie or the description document of a product) to obtain a better representation of the item and help achieve more accurate recommendations. We propose 'leveraging the capsule network to learn content text for collaborative filtering (CCCF)'. This model combines the capsule network and neural matrix factorisation to effectively model text data and user-item ratings. Experiments conducted from different perspectives on two popular datasets show that CCCF achieves good performance in common recommendation tasks, which demonstrates the effectiveness of the capsule network in recommendation.
The integration of deep neural network (DNN) intelligence into embedded mobile devices is expanding rapidly, supporting a wide range of applications. DNN compression techniques, which adapt models to resource-constrained mobile environments, often force a trade-off between efficiency and accuracy. Distributed DNN inference, leveraging multiple mobile devices, emerges as a promising alternative to enhance inference efficiency without compromising accuracy. However, effectively decoupling DNN models into fine-grained components for optimal parallel acceleration presents significant challenges. Current partitioning methods, including layer-level and operator- or channel-level partitioning, provide only partial solutions and struggle with the heterogeneous nature of DNN compilation frameworks, complicating direct model offloading. In response, we introduce AdaKnife, an adaptive framework for accelerated inference across heterogeneous mobile devices. AdaKnife enables on-demand mixed-granularity DNN partitioning via computational graph analysis, facilitates efficient cross-framework model transitions with operator optimization for offloading, and improves the feasibility of parallel partitioning using a greedy operator parallelism algorithm. Our empirical studies show that AdaKnife achieves a 66.5% reduction in latency compared to baselines.
Recent research on emotion recognition suggests that deep network-based adversarial learning can help solve the cross-subject problem of emotion recognition. This study constructed a hearing-impaired electroencephalography (EEG) emotion dataset containing three emotions (positive, neutral, and negative) from 15 subjects. An emotional domain adversarial neural network (EDANN) was applied to identify hearing-impaired subjects' emotions by learning hidden emotion information shared between the labeled and unlabeled data. For the input data, we propose a spatial filter matrix to reduce overfitting to the training data. A feature extraction network, 3DLSTM-ConvNET, was used to extract comprehensive emotional information from the time, frequency, and spatial dimensions. Moreover, an emotion local domain discriminator and an emotion film group local domain discriminator were added to reduce the distribution distance between the same kinds of emotions and between different film groups, respectively. According to the experimental results, the average subject-dependent accuracy is 0.984 (STD: 0.011), and the average subject-independent accuracy is 0.679 (STD: 0.140). In addition, by analyzing the discrimination characteristics, we found that the brain regions involved in emotion recognition in the hearing-impaired are distributed over wider areas of the parietal and occipital lobes, which may be caused by visual processing.
ISBN: (print) 9798350311990
Graph convolutional networks (GCNs) are widely successful architectures for performing deep learning on graphs, but their well-known scalability challenges have led to increased interest in developing both improved algorithms and hardware accelerators. In this paper, we present and evaluate GraphSAGE-Sparse, a variant of the paradigmatic GraphSAGE GCN that replaces the original's spatial-based node convolution operation with a minibatch-aware sparse matrix multiply (SpMM) kernel. We find that this modification substantially reduces the per-batch memory cost for training and inference on a GPU accelerator, with the tradeoff of increased time and memory needed to preprocess the data structures used by the sparse kernel. Comparing both algorithms on datasets from the Open Graph Benchmark, we find that GraphSAGE-Sparse obtains more accurate predictions in less than half of the total training time, even with the additional preprocessing work.
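As a rough illustration of the idea of expressing neighbor aggregation as a sparse matrix multiply, the sketch below computes a GraphSAGE-style mean aggregation as D^-1 A H using a hand-written CSR product in plain Python. It is not the GraphSAGE-Sparse kernel from the paper (which is minibatch-aware and runs on a GPU); the function names, the (dst, src) edge convention, and the toy graph are assumptions made for this example only.

```python
from typing import List, Tuple


def build_row_normalized_csr(
    num_nodes: int, edges: List[Tuple[int, int]]
) -> Tuple[List[int], List[int], List[float]]:
    """Build CSR arrays (indptr, indices, data) for the matrix D^-1 A.

    Each edge is (dst, src): node `dst` aggregates features from node `src`.
    Row `dst` holds weight 1/deg(dst) for every neighbor, so multiplying by
    a feature matrix yields the mean of the neighbors' features.
    """
    neighbors: List[List[int]] = [[] for _ in range(num_nodes)]
    for dst, src in edges:
        neighbors[dst].append(src)

    indptr, indices, data = [0], [], []
    for nbrs in neighbors:
        for src in sorted(nbrs):
            indices.append(src)
            data.append(1.0 / len(nbrs))
        indptr.append(len(indices))
    return indptr, indices, data


def spmm(
    indptr: List[int],
    indices: List[int],
    data: List[float],
    dense: List[List[float]],
) -> List[List[float]]:
    """Multiply a CSR sparse matrix by a dense row-major matrix."""
    num_rows = len(indptr) - 1
    num_cols = len(dense[0])
    out = [[0.0] * num_cols for _ in range(num_rows)]
    for row in range(num_rows):
        # Only the stored non-zeros of this row are visited.
        for k in range(indptr[row], indptr[row + 1]):
            weight, src = data[k], indices[k]
            for col in range(num_cols):
                out[row][col] += weight * dense[src][col]
    return out


# Toy graph (hypothetical): node 0 aggregates from nodes 1 and 2,
# node 1 aggregates from node 0, node 2 has no neighbors.
toy_edges = [(0, 1), (0, 2), (1, 0)]
toy_features = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
toy_csr = build_row_normalized_csr(3, toy_edges)
toy_aggregated = spmm(*toy_csr, toy_features)
```

The CSR arrays are built once and reused for every multiply, which mirrors the trade-off described in the abstract: extra preprocessing in exchange for a cheaper aggregation step.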
High costs are the primary challenge for forest management in planning and executing the repair of forest roads. With budget limitations and inadequate oversight, monitoring the state of these roads has become essential. While smartphones have proven effective in detecting road defects on public roads, their application on forest roads is hindered by the absence of suitable indices and software infrastructure. Addressing this gap, this research focuses on the development of the Forest Road Pavement Condition Index (FRPCI) to facilitate smartphone-based monitoring. We collected and compared data from 4 kilometers of forest roads, employing two traditional harvesting methods alongside smartphone sensor data. We processed the collected data with deep learning methods, including a Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and CNN-LSTM. Signal processing using GPS data, coupled with wavelet transformation, demonstrated promising results, with accuracy and recall both exceeding 80%. The proposed system functions as a distributed information system, transitioning data from organizational mode to field mode. It measures damage, assesses forest road conditions, and leverages image processing and GPS technologies. This monitoring system offers capabilities for preparing, storing, updating, maintaining, and analyzing diverse information. Importantly, adopting this method can significantly reduce operating costs, making forest road monitoring for maintenance purposes more feasible.
Driven by the demands of deep learning, many hardware accelerators, including GPUs, have begun to include specialized tensor processing units to accelerate matrix operations. However, general-purpose GPU applications that have few or no large dense matrix operations cannot benefit from these tensor units. This article proposes Tensorox, a framework that exploits the half-precision tensor cores available on recent GPUs for approximable, non-deep-learning applications. In essence, a shallow neural network is trained based on the input-output mapping of the function to be approximated. The key innovation in our implementation is the use of the small and dimension-restricted tensor operations in Nvidia GPUs to run multiple instances of the approximation neural network in parallel. With the proper scaling and training methods, our approximation yielded an overall accuracy that is higher than naively running the original programs in half precision. Furthermore, Tensorox allows for runtime adjustment of the degree of approximation. For the 10 benchmarks we tested, we achieved speedups from 2x to 112x compared to the original in single-precision floating point, while keeping the error caused by the approximation below 10 percent in most applications.
ISBN: (print) 9781713899921
This paper studies delayed stochastic algorithms for weakly convex optimization in a distributed network with workers connected to a master node. Recently, Xu et al. (2022) showed that an inertial stochastic subgradient method converges at a rate of $O(\tau_{\max}/\sqrt{K})$, which depends on the maximum information delay $\tau_{\max}$. In this work, we show that the delayed stochastic subgradient method (DSGD) obtains a tighter convergence rate, which depends on the expected delay $\bar{\tau}$. Furthermore, for an important class of composition weakly convex problems, we develop a new delayed stochastic prox-linear (DSPL) method in which the delays only affect the high-order term in the complexity rate and hence are negligible after a certain number of DSPL iterations. In addition, we demonstrate the robustness of our proposed algorithms against arbitrary delays. By incorporating a simple safeguarding step in both methods, we achieve convergence rates that depend solely on the number of workers, eliminating the effect of the delay. Our numerical experiments further confirm the empirical superiority of our proposed methods.
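To make the notion of a delayed stochastic subgradient step concrete, here is a minimal single-process simulation on a toy weakly convex problem. It is only a sketch under stated assumptions, not the paper's DSGD or DSPL implementation: the objective f(x) = mean_i |x^2 - b_i|, the uniformly random delay model, the 1/sqrt(k+1) step-size schedule, and all function and parameter names are hypothetical choices for illustration. The update applied is x_{k+1} = x_k - alpha_k * g(x_{k - tau_k}), where the subgradient g is evaluated at a stale iterate.

```python
import random
from typing import List


def objective(x: float, samples: List[float]) -> float:
    """Toy weakly convex objective: mean of |x^2 - b| over the samples."""
    return sum(abs(x * x - b) for b in samples) / len(samples)


def delayed_subgradient(
    x0: float,
    samples: List[float],
    num_iters: int,
    max_delay: int,
    base_step: float,
    seed: int = 0,
) -> float:
    """Simulate subgradient steps computed at stale (delayed) iterates."""
    rng = random.Random(seed)
    history = [x0]  # history[k] is the iterate x_k
    x = x0
    for k in range(num_iters):
        # Assumed delay model: the worker's copy is 0..max_delay steps old.
        delay = rng.randint(0, min(max_delay, k))
        stale_x = history[k - delay]

        # Stochastic subgradient of |x^2 - b| at the stale iterate.
        b = rng.choice(samples)
        residual = stale_x * stale_x - b
        sign = (residual > 0) - (residual < 0)
        grad = sign * 2.0 * stale_x

        # Diminishing step size; the master applies the stale subgradient.
        x = x - base_step / (k + 1) ** 0.5 * grad
        history.append(x)
    return x


toy_samples = [3.5, 4.0, 4.5]  # the median is 4, so a minimizer is x = 2
x_final = delayed_subgradient(
    x0=3.0, samples=toy_samples, num_iters=5000, max_delay=3, base_step=0.05
)
```

Keeping the full iterate history is wasteful but makes the staleness explicit; a real master-worker system would only see whatever iterate each worker last received.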
ISBN: (print) 9783031697654; 9783031697661
Performing per-packet neural network (NN) inference on the network data plane is a promising approach for accurate and fast decision-making in computer networks. However, data plane architectures such as the Reconfigurable Match Tables (RMT) pipeline have limited support for NN computation. Previous efforts have utilized the Binary Neural Network (BNN) as a compromise, but the accuracy loss of BNN is high. Inspired by the accuracy gain of low-bit (2-bit and 4-bit) models, this paper proposes Athena, which can deploy sparse low-bit quantization models on RMT. Compared with the BNN-based state of the art, Athena achieves a new Pareto frontier in model accuracy and inference latency.
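For readers unfamiliar with low-bit models, the following sketch shows generic symmetric uniform quantization of a weight vector to 2 or 4 bits. It is not Athena's quantization or sparsification scheme, which the abstract does not describe; the function names, the per-tensor scale, and the example weights are assumptions chosen only to show why 4-bit codes lose less information than 2-bit ones.

```python
from typing import List, Tuple


def quantize_symmetric(weights: List[float], bits: int) -> Tuple[List[int], float]:
    """Map floats to signed integer codes in [-2^(bits-1), 2^(bits-1) - 1].

    A single per-tensor scale is chosen so that the largest magnitude
    maps to the largest positive code.
    """
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(qmin, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from integer codes."""
    return [q * scale for q in codes]


def max_abs_error(weights: List[float], bits: int) -> float:
    """Largest reconstruction error after a quantize/dequantize round trip."""
    codes, scale = quantize_symmetric(weights, bits)
    restored = dequantize(codes, scale)
    return max(abs(w - r) for w, r in zip(weights, restored))


example_weights = [0.9, -0.4, 0.1, -0.9]  # hypothetical layer weights
codes_2bit, scale_2bit = quantize_symmetric(example_weights, bits=2)
codes_4bit, scale_4bit = quantize_symmetric(example_weights, bits=4)
```

With this symmetric scheme the 2-bit codes can only take the values -2, -1, 0 and 1, so small weights collapse to zero, while the 4-bit codes retain more of the original spread.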