ISBN: 9781665405409 (print)
Recent progress on end-to-end neural diarization (EEND) has enabled overlap-aware speaker diarization with a single neural network. This paper proposes to enhance EEND by using multi-channel signals from distributed microphones. We replace the Transformer encoders in EEND with two types of encoders that process a multi-channel input: spatio-temporal and co-attention encoders. Both are independent of the number and geometry of microphones and are therefore suitable for distributed microphone settings. We also propose a model adaptation method that uses only single-channel recordings. With simulated and real-recorded datasets, we demonstrate that the proposed method outperforms conventional EEND when a multi-channel input is given while maintaining comparable performance with a single-channel input. We also show that the proposed method performs well even when the spatial information in multi-channel inputs is uninformative, such as in hybrid meetings in which the utterances of multiple remote participants are played back from the same loudspeaker.
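To illustrate what "independent of the number and geometry of microphones" can mean in practice, the following is a minimal sketch of attention applied across the channel axis and then averaged over channels. It is not the paper's spatio-temporal or co-attention encoder: there are no learned weights, and the names `cross_channel_attention` and `encode` are hypothetical. It only shows that such an operation accepts any channel count and does not depend on channel order.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_channel_attention(frame):
    """Attend across channels for one time frame, then average over channels.

    frame: list of C channel feature vectors, each a list of D floats.
    Returns a single D-dimensional vector, whatever the value of C.
    Assumption: queries, keys and values are the raw features (no learned
    projections), which keeps the sketch short.
    """
    dim = len(frame[0])
    scale = math.sqrt(dim)
    attended = []
    for query in frame:
        # Scaled dot-product scores of this channel against every channel.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in frame]
        weights = softmax(scores)
        attended.append(
            [sum(w * ch[d] for w, ch in zip(weights, frame)) for d in range(dim)]
        )
    # Mean over channels removes any dependence on channel order or count.
    return [sum(vec[d] for vec in attended) / len(attended) for d in range(dim)]


def encode(recording):
    """recording: list of T frames, each a list of C channel vectors."""
    return [cross_channel_attention(frame) for frame in recording]
```

The same function can be called with one, two, or four channels per frame; with a single channel it reduces to the identity, which loosely mirrors the abstract's point about keeping single-channel behaviour intact.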
ISBN: 9781713871088 (print)
The contextual linear bandit is a rich and theoretically important model with many practical applications. Recently, this setup has gained a lot of interest in wireless applications, where communication constraints can be a performance bottleneck, especially when the contexts come from a large d-dimensional space. In this paper, we consider a distributed memoryless contextual linear bandit learning problem, where the agents who observe the contexts and take actions are geographically separated from the learner, who performs the learning without seeing the contexts. We assume that contexts are generated from a distribution and propose a method that uses approximately 5d bits per context when the context distribution is unknown and 0 bits per context when it is known, while achieving nearly the same regret bound as if the contexts were directly observable. The former bound improves upon existing bounds by a log(T) factor, where T is the length of the horizon, while the latter is information-theoretically tight.
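For readers unfamiliar with the regret measure mentioned above, the standard definition for contextual linear bandits can be written as follows. The notation is an assumption and is not taken from the abstract: \(x_{t,a}\) is the d-dimensional context of action \(a\) at round \(t\), \(\theta^\star\) is the unknown parameter, \(a_t\) is the action taken, and \(\eta_t\) is zero-mean noise.

```latex
r_t = \langle x_{t,a_t}, \theta^\star \rangle + \eta_t, \qquad
R_T = \sum_{t=1}^{T} \Bigl( \max_{a \in \mathcal{A}} \langle x_{t,a}, \theta^\star \rangle
      - \langle x_{t,a_t}, \theta^\star \rangle \Bigr)
```

In the distributed setting described here, the learner must keep \(R_T\) small while receiving only a compressed description of each context, or none at all.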
The distributed dynamic network is vulnerable to scanning attacks due to the openness of wireless channels. Traditional defense systems tend to be passive and exhibit delayed responses. A moving target defense approac...
Brain-Computer Interface (BCI) systems create a bridge between the human brain and the outside world, potentially rendering traditional methods of information transmission obsolete in the not-so-distant future. One of the key research areas in BCI is the classification of brain activity in electroencephalographic (EEG) data. Meanwhile, new memory-augmented neural networks, such as the Neural Turing Machine (NTM) and the Differentiable Neural Computer (DNC), have demonstrated impressive abilities in solving complex tasks. It is therefore useful to evaluate whether memory-augmented neural networks can enhance the classification of brain activity in EEG signals. Previous methods have suffered from low accuracy and generalizability in classifying brain activities, primarily due to a lack of proper classification of motor imagery/execution brain activities, an inability to extract valuable information at different time steps in time-series data, and a failure to learn longer dependencies. This article introduces TDMANN (Time-distributed Memory Augmented Neural Network), a framework that leverages the principles of NTM and DNC for the binary classification of brain activities in EEG signals. The controller component of the memory-augmented neural network is enhanced with a time-distributed approach, which significantly improves the performance of the network in binary classification tasks involving motor imagery/execution brain activities by extracting valuable information at each time step. The benchmark datasets used in this study are EEGmmidb BCI2000 (Imagery/Execution), BCI IV 2B, and BCI IV 2A, all containing motor imagery/execution brain activity data in EEG format. The results demonstrate that the classification accuracy achieved by the proposed DNC@TDMANN method exhibits a maximum improvement of 23.03% over baseline research works. The NTM@TDMANN method shows a maximum accuracy improvement of 22.5%.
DSP holds significant potential for important applications in deep neural networks. However, there is currently a lack of research focused on shared-memory CPU-DSP heterogeneous chips. This paper proposes CD-Sched, an...
Image captioning is a challenging task in artificial intelligence that involves generating descriptive captions for images automatically. In this project, we propose a novel approach leveraging advanced technologies s...
Authors: Gan, Jiaqi; Xiao, Yueyu; Zhang, Andong
Shanghai Univ, Key Lab Specialty Fiber Opt Opt Access Networks Sh, Shanghai 200444, Peoples R China
Shanghai Univ, Joint Int Res Lab Specialty Fiber Opt & Adv Commun, Shanghai 200444, Peoples R China
Shanghai Univ, Inst Fiber Opt, Shanghai 200444, Peoples R China
Thanks to the development of artificial intelligence algorithms, event recognition in distributed optical fiber sensing systems has achieved high classification accuracy with many deep learning models. However, the large-scale samples required by deep learning networks are difficult to collect for optical fiber vibration sensing systems in real scenarios, and overfitting caused by insufficient training data reduces the classification accuracy. In this paper, we propose a fused feature extraction method suitable for the small datasets of Φ-OTDR systems. The high-dimensional features of signals in the frequency domain are extracted by a transfer learning method based on the VGGish framework. Combining the characteristics of 12 different spatial acquisition points captures the spatial distribution of the signal. After the spatial and temporal features are fused, they undergo a sample feature correction algorithm and are fed to an SVM classifier for event recognition. Experimental results show that VGGish, a pre-trained convolutional network for audio classification, can extract the knowledge features of Φ-OTDR vibration signals more efficiently. The recognition accuracy for six types of intrusion events reaches 95.0% with the corrected multi-domain features when only 960 samples are used as the training set. This accuracy is 17.7% higher than that of a single channel trained on VGGish without fine-tuning. Compared to other CNNs, such as ResNet, the proposed feature extraction method improves the accuracy by at least 4.9% on the same dataset. (c) 2024 Optica Publishing Group. All rights, including for text and data mining (TDM), Artificial Intelligence (AI) training, and similar technologies, are reserved.
ISBN: 9781665464970 (print)
Precise binary code vulnerability detection is a significant research topic in software security. Currently, the majority of software is released in binary form, so vulnerability detection approaches for binary code are needed. Existing deep learning-based detection techniques can detect binary code vulnerabilities but cannot precisely identify the types of vulnerabilities. This paper proposes a Binary code-based Hybrid Neural Network for Multiclass Vulnerability Detection, dubbed BHMVD. BHMVD generates binary slices according to the control dependence and data dependence of library/API function calls, and then extracts syntax features from the binary slices to generate type slices, which help identify vulnerability types. This paper uses a hybrid CNN-BLSTM neural network to extract vulnerability features from binary and type slices. The CNN extracts local features, while the BLSTM extracts global features. Experimental results on 19 types of vulnerabilities show that BHMVD is effective for binary code-based multiclass vulnerability detection and that using a hybrid neural network improves detection ability.
With the widespread application of deep learning (DL) technology in the modern Internet of Things (IoT) areas such as autonomous driving, smart cities and homes, embedded real-time systems are increasingly used at the...
In this paper, we present a perception-action-communication loop design using Vision-based Graph Aggregation and Inference (VGAI). This multi-agent decentralized learning-to-control framework maps raw visual observations to agent actions, aided by local communication among neighboring agents. Our framework is implemented by a cascade of a convolutional neural network (CNN) and a graph neural network (GNN), which address agent-level visual perception and feature learning, and swarm-level communication, local information aggregation and agent action inference, respectively. By jointly training the CNN and GNN, image features and communication messages are learned in conjunction to better address the specific task. We use imitation learning to train the VGAI controller in an offline phase, relying on a centralized expert controller. This results in a learned VGAI controller that can be deployed in a distributed manner for online execution. Additionally, the controller exhibits good scaling properties: it can be trained on smaller teams and applied to larger teams. Through a multi-agent flocking application, we demonstrate that VGAI yields performance comparable to or better than other decentralized controllers, using only the visual input modality and without accessing precise location or motion state information.
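The "local information aggregation" step can be sketched as repeated message exchange with neighbors. The code below is a simplified illustration, not the authors' VGAI implementation: the per-agent features would come from a CNN in the paper, whereas here they are plain lists, the aggregation is an unweighted sum with no learned GNN parameters, and the names `neighbors` and `aggregate` are hypothetical. Positions are used only by the simulated environment to decide who is within communication range; the agents themselves never read them.

```python
def neighbors(positions, radius):
    """Build adjacency lists: two agents can communicate if within `radius`.

    positions: list of (x, y) tuples, used only to simulate the radio range.
    """
    count = len(positions)
    adjacency = [[] for _ in range(count)]
    for i in range(count):
        for j in range(count):
            if i == j:
                continue
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            if dx * dx + dy * dy <= radius * radius:
                adjacency[i].append(j)
    return adjacency


def aggregate(features, adjacency, hops):
    """Return, per agent, its own features followed by 1..hops neighbor sums.

    Each hop uses only values received from direct neighbors, so the
    computation can run in a distributed manner.
    """
    current = [list(f) for f in features]
    history = [current]
    for _ in range(hops):
        following = []
        for i, nbrs in enumerate(adjacency):
            dim = len(current[i])
            # Sum what each direct neighbor held after the previous exchange.
            following.append([sum(current[j][d] for j in nbrs) for d in range(dim)])
        current = following
        history.append(current)
    # Concatenate the per-hop vectors for every agent.
    return [[v for step in history for v in step[i]] for i in range(len(features))]
```

Because each agent only ever touches its neighbors' messages, the same routine applies unchanged to teams of different sizes, which is one way to read the scaling property described in the abstract.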