The real-time fault diagnosis of High Voltage Gas Insulated Switches (HV-GIS) is critical for maintaining the safety and operational reliability of electrical grids. Conventional diagnostic techniques mainly depend on...
ISBN: 9781665451369 (print)
Beyond 5G networks will operate at high frequencies with wide bandwidths. This brings both opportunities and challenges. Opportunities include high-throughput connectivity with low latency. However, one of the main challenges in these networks is the high path loss at these operating frequencies, which requires the network to be deployed densely to provide coverage. Since these cells have a small inter-site distance (ISD), the dwell-time of the UEs in these cells is short; supporting mobility in such dense networks is therefore a challenge and requires frequent beam or cell reassignments. A proactive mobility management scheme that exploits historical trajectories can provide better prediction of cells and beams as UEs move through the coverage area. We propose an AI-based method using sequence-to-sequence modeling for the estimation of handover cells/beams along with dwell-time using the trajectory information of the UE. Results indicate that for a dense deployment, an accuracy of more than 90 percent can be achieved for handover cell estimation and very low mean absolute error (MAE) for dwell-time.
This paper proposes an any-to-many location-relative, sequence-to-sequence (seq2seq), non-parallel voice conversion approach, which utilizes text supervision during training. In this approach, we combine a bottle-neck feature extractor (BNE) with a seq2seq synthesis module. During the training stage, an encoder-decoder-based hybrid connectionist-temporal-classification-attention (CTC-attention) phoneme recognizer is trained, whose encoder has a bottle-neck layer. A BNE is obtained from the phoneme recognizer and is utilized to extract speaker-independent, dense and rich spoken content representations from spectral features. Then a multi-speaker location-relative attention based seq2seq synthesis model is trained to reconstruct spectral features from the bottle-neck features, conditioned on speaker representations for speaker identity control in the generated speech. To mitigate the difficulties of using seq2seq models to align long sequences, we down-sample the input spectral features along the temporal dimension and equip the synthesis model with a discretized mixture of logistic (MoL) attention mechanism. Since the phoneme recognizer is trained on a large speech recognition corpus, the proposed approach can conduct any-to-many voice conversion. Objective and subjective evaluations show that the proposed any-to-many approach has superior voice conversion performance in terms of both naturalness and speaker similarity. Ablation studies are conducted to confirm the effectiveness of the feature selection and model design strategies in the proposed approach. The proposed VC approach can readily be extended to support any-to-any VC (also known as one/few-shot VC), and achieves high performance according to objective and subjective evaluations.
ISBN: 9791095546344 (print)
We present Seq2SeqPy, a lightweight toolkit for sequence-to-sequence modeling that prioritizes simplicity and the ability to customize the standard architectures easily. The toolkit supports several well-known models such as Recurrent Neural Networks, Pointer Generator Networks, and the Transformer model. We evaluate the toolkit on two datasets and show that it performs similarly to, or even better than, a very widely used sequence-to-sequence toolkit.
Exploiting user trust information for developing a recommendation system has gained increasing research interest in recent years. Because opinions about items are exchanged over the social network, trust plays a crucial role in whether a user likes or dislikes an item. Graph Neural Networks (GNNs), which have the intrinsic power of integrating node information and topological structure, have a high potential to advance the field of trust-aware social recommendation. However, this area remains little explored, with most of the existing GNN-based models ignoring the trust propagation and trust composition properties. To address this issue, we propose a novel GNN-based framework that can capture such trust propagation and trust composition aspects by incorporating the concept of 'user-reliability.' Our proposed user-reliability-aware social recommendation framework, termed SoURA, generates the user-embedding and item-embedding taking the user-reliability values into account, which, in turn, helps in better evaluation of the user trust. Experimental evaluations on the benchmark Ciao and Epinion datasets demonstrate the effectiveness of incorporating user-reliability for finding the user-embedding and item-embedding in a social recommendation system. The proposed SoURA is found to show a minimum of 25% improvement over the state-of-the-art GNN-based recommendation algorithms.
The expression and perception of human emotions are not uniformly distributed over time. Therefore, tracking local changes of emotion within a segment can lead to better models for speech emotion recognition (SER), even when the task is to provide a sentence-level prediction of the emotional content. A challenge to exploring local emotional changes within a sentence is that most existing emotional corpora only provide sentence-level annotations (i.e., one label per sentence). This labeling approach is not appropriate for leveraging the dynamic emotional trends within a sentence. We propose a framework that splits a sentence into a fixed number of chunks, generating chunk-level emotional patterns. The approach relies on emotion rankers to unveil the emotional pattern within a sentence, creating continuous emotional curves. Our approach trains the sentence-level SER model with a sequence-to-sequence formulation by leveraging the retrieved emotional curves. The proposed method achieves the best concordance correlation coefficient (CCC) prediction performance for arousal (0.7120), valence (0.3125), and dominance (0.6324) on the MSP-Podcast corpus. In addition, we validate the approach with experiments on the IEMOCAP and MSP-IMPROV databases. We further compare the retrieved curves with time-continuous emotional traces. The evaluation demonstrates that these retrieved chunk-label curves can effectively capture emotional trends within a sentence, displaying a time-consistency property that is similar to time-continuous traces annotated by human listeners. The proposed SER model learns meaningful, complementary, local information that contributes to the improvement of sentence-level predictions of emotional attributes.
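The concordance correlation coefficient (CCC) reported in the abstract above can be computed as in the following minimal sketch. It uses the standard population-statistics definition, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²); the function name `concordance_cc` is hypothetical, and the abstract does not say which implementation the authors used.

```python
from statistics import fmean


def concordance_cc(pred, gold):
    """Concordance correlation coefficient between two equal-length sequences.

    Uses population (biased) variance and covariance, a common convention
    for this metric; this is an assumption, not the paper's stated choice.
    """
    if len(pred) != len(gold) or len(pred) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_p, mean_g = fmean(pred), fmean(gold)
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    var_g = fmean([(g - mean_g) ** 2 for g in gold])
    cov = fmean([(p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)])

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2 * cov / denom


if __name__ == "__main__":
    # A constant offset lowers CCC even though Pearson correlation stays 1.
    print(concordance_cc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

Unlike Pearson correlation, the extra squared mean-difference term in the denominator penalizes predictions that follow the right trend but are systematically shifted, which is why CCC is a common choice for scoring arousal, valence, and dominance.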
ISBN: 9798400702402 (print)
This paper presents an integrated methodology to forecast the direction and magnitude of movements of lending rates in security markets. We develop a sequence-to-sequence (seq2seq) modeling framework that integrates feature engineering, motif mining, and temporal prediction in a unified manner to perform forecasting at scale in real-time or near real-time. We have deployed this approach in a large custodial setting demonstrating scalability to a large number of equities as well as newly introduced IPO-based securities in highly volatile environments.
Trajectory prediction of vehicles in city-scale road networks is of great importance to various location-based applications such as vehicle navigation, traffic management, and location-based recommendations. Existing methods typically represent a trajectory as a sequence of grid cells, road segments, or intention sets. None of them is ideal, as the cell-based representation ignores the road network structure and the other two are less efficient in analyzing city-scale road networks. Moreover, previous models barely leverage spatial dependencies or only consider them at the grid cell level, ignoring the non-Euclidean spatial structure shaped by irregular road networks. To address these problems, we propose a network-based vehicle trajectory prediction model named NetTraj, which represents each trajectory as a sequence of intersections and associated movement directions, and then feeds them into an LSTM encoder-decoder network for future trajectory generation. Furthermore, we introduce a local graph attention mechanism to capture network-level spatial dependencies of trajectories, and a temporal attention mechanism with a sliding context window to capture both short- and long-term temporal dependencies in trajectory data. Extensive experiments based on two real-world large-scale taxi trajectory datasets show that NetTraj outperforms the existing state-of-the-art methods for vehicle trajectory prediction, validating the effectiveness of the proposed trajectory representation method and spatiotemporal attention mechanisms.
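To make the "sequence of intersections and associated movement directions" representation concrete, here is a rough sketch of one way such tokens could be built. Everything in it is an assumption for illustration: the helper names `direction_bin` and `encode_trajectory`, the use of planar coordinates, and the choice of eight compass bins are not taken from the NetTraj paper, whose abstract does not specify how directions are discretized.

```python
import math


def direction_bin(src, dst, n_bins=8):
    """Map the heading from src to dst onto one of n_bins compass sectors.

    Bin 0 is centred on east (+x) and bins increase counter-clockwise,
    so with 8 bins: 0 = E, 2 = N, 4 = W, 6 = S. (Illustrative convention.)
    """
    angle = math.atan2(dst[1] - src[1], dst[0] - src[0])  # in (-pi, pi]
    width = 2 * math.pi / n_bins
    # Shift by half a sector so each bin is centred on its compass heading.
    shifted = (angle + width / 2) % (2 * math.pi)
    return int(shifted // width) % n_bins


def encode_trajectory(node_ids, coords, n_bins=8):
    """Turn a path of intersection ids into (intersection, direction) tokens.

    node_ids: ordered intersection ids visited by the vehicle.
    coords:   dict mapping intersection id -> (x, y) planar position.
    """
    tokens = []
    for here, nxt in zip(node_ids, node_ids[1:]):
        tokens.append((here, direction_bin(coords[here], coords[nxt], n_bins)))
    return tokens


if __name__ == "__main__":
    coords = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}
    print(encode_trajectory(["A", "B", "C", "D"], coords))
```

A token sequence of this kind can then be embedded and fed to an encoder-decoder model; compared with raw grid cells, it keeps the vocabulary tied to the road graph itself.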
ISBN: 9781728176055 (print)
We present a novel approach to any-to-one (A2O) voice conversion (VC) in a sequence-to-sequence (seq2seq) framework. A2O VC aims to convert any speaker, including those unseen during training, to a fixed target speaker. We utilize vq-wav2vec (VQW2V), a discretized self-supervised speech representation learned from massive unlabeled data, which is assumed to be speaker-independent and to correspond well to the underlying linguistic content. Given a training dataset of the target speaker, we extract VQW2V and acoustic features to estimate a seq2seq mapping function from the former to the latter. With the help of a pretraining method and a newly designed postprocessing technique, our model can be generalized to only 5 min of data, even outperforming the same model trained with parallel data.
ISBN: 9781450393904 (print)
This paper describes the IVI Lab entry to the GENEA Challenge 2022. We formulate the gesture generation problem as a sequence-to-sequence conversion task with text, audio, and speaker identity as inputs and the body motion as the output. We use the Tacotron2 architecture as our backbone with a locality-constraint attention mechanism that guides the decoder to learn the dependencies from the neighboring latent features. The collective evaluation released by GENEA Challenge 2022 indicates that our two entries (FSH and USK) for the full-body and upper-body tracks statistically outperform the audio-driven and text-driven baselines on both subjective metrics. Remarkably, our full-body entry receives the highest speech appropriateness (60.5% matched) among all submitted entries. We also conduct an objective evaluation to compare our motion acceleration and jerk with two autoregressive baselines. The result indicates that the motion distribution of our generated gestures is much closer to the distribution of natural gestures.