Simultaneous sequence generation is a pivotal task for real-time scenarios, such as streaming speech recognition, simultaneous machine translation and simultaneous speech translation, where the target sequence is generated while the source sequence is still being received. The crux of achieving high-quality generation with low latency lies in identifying the optimal moments for generating, which is accomplished by learning a mapping between the source and target sequences. However, existing methods often rely on task-specific heuristics for different sequence types, limiting the model's capacity to adaptively learn the source-target mapping and hindering the exploration of multi-task learning for various simultaneous tasks. In this paper, we propose a unified segment-to-segment framework (Seg2Seg) for simultaneous sequence generation, which learns the mapping in an adaptive and unified manner. During simultaneous generation, the model alternates between waiting for a source segment and generating a target segment, so that the segment serves as the natural bridge between the source and target. To accomplish this, Seg2Seg introduces a latent segment as the pivot between source and target and explores all potential source-target mappings via the proposed expectation training, thereby learning the optimal moments for generating. Experiments on multiple simultaneous generation tasks demonstrate that Seg2Seg achieves state-of-the-art performance and exhibits better generality across various tasks. Code is available at: https://***/ictnlp/Seg2Seg.
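The alternation between waiting for a source segment and generating a target segment can be illustrated with a minimal Python sketch. This is not the Seg2Seg implementation: the function `simultaneous_generate`, the boundary rule `every_two`, and the toy generator `upper` are all hypothetical names introduced here, and the fixed two-token boundary rule stands in for the segment decisions that Seg2Seg learns through its latent segment and expectation training.

```python
from typing import Callable, Iterable, List, Tuple


def simultaneous_generate(
    source_stream: Iterable[str],
    is_segment_end: Callable[[List[str]], bool],
    generate_segment: Callable[[List[str], List[str]], List[str]],
) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Alternate between waiting for a source segment and emitting a target segment.

    Returns the target tokens plus a log of (source_tokens_read, target_token)
    pairs, which shows how much source had arrived when each token was emitted.
    """
    received: List[str] = []  # every source token read so far
    pending: List[str] = []   # tokens of the current, unfinished source segment
    target: List[str] = []
    log: List[Tuple[int, str]] = []

    def emit(segment: List[str]) -> None:
        # Generate the target segment for one completed source segment.
        for out in generate_segment(segment, target):
            target.append(out)
            log.append((len(received), out))

    for token in source_stream:
        # WAIT: keep reading until the (hypothetical) policy closes the segment.
        received.append(token)
        pending.append(token)
        if is_segment_end(pending):
            # GENERATE: the source segment is complete, so write its target segment.
            emit(pending)
            pending = []

    if pending:
        # The source ended mid-segment; flush what is left.
        emit(pending)
    return target, log


def every_two(segment: List[str]) -> bool:
    # Toy boundary rule (an assumption): close a segment after two source tokens.
    return len(segment) >= 2


def upper(segment: List[str], prefix: List[str]) -> List[str]:
    # Toy "model" (an assumption): upper-case each source token.
    return [tok.upper() for tok in segment]


target, log = simultaneous_generate("a b c d e".split(), every_two, upper)
```

The `log` makes the latency side of the trade-off visible: each target token is paired with the number of source tokens that had been read when it was emitted, so a policy that closes segments earlier yields smaller counts.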
Glass is ubiquitous in the real world, and its perception has many applications, including robot navigation and drone tracking. However, because glass is transparent, the interior of a glass region can show any surrounding scene or object, which poses challenges for computer vision. Humans rely heavily on boundary cues to judge where glass contours lie. Inspired by this, we propose a boundary cue guidance and contextual feature mining network (BCNet) to segment glass accurately and efficiently. Specifically, we first design a multi-branch boundary extraction module (MBEM) that learns accurate boundary cues from multi-level encoded features. Second, we propose a boundary cue guidance module (BCGM) that injects the boundary cues into representation learning and provides constraints with object structure semantics to guide feature extraction. In addition, we design a contextual feature mining module (CFMM) that dynamically captures contextual information from different receptive fields in order to detect glass of different sizes and shapes. Finally, we conduct extensive experiments on two benchmark glass datasets, GDD and GSD. The results demonstrate that BCNet achieves state-of-the-art segmentation performance compared with existing methods.
To improve the feature representation ability of the YOLOX algorithm and obtain better detection performance, an object detection algorithm based on a second-order pooling network and Gaussian mixture attention is propo...
Transformers have achieved excellent performance in the knowledge tracing (KT) task, but they are criticized for the manually selected input features for fusion and the defect of single global context modelling to direc...
Speed skating serves as a significant application domain for multiobject tracking (MOT), presenting unique challenges such as frequent occlusion, highly similar appearances, and motion blur. To address these challenge...
As the size and complexity of a multiprocess computer system grow, the likelihood of having faulty processors in the system increases. How to evaluate the impact of faulty processors on the entire system is what we...
Learning how to model global relationships and extract local details is crucial in improving the performance of multi-organ segmentation. Most existing U-shaped structure methods use feature fusion to address these tw...
To enhance the expression ability of deep features and improve the tracking performance of the fully convolutional siamese network (SiamFC) in the UAV scene, we propose a UAV visual tracking algorithm based on feature...
Network traffic prediction plays a significant role in network management. Previous network traffic prediction methods mainly focus on the temporal relationships in network traffic and use time series models to predict it, ignoring the spatial information contained in traffic data. As a result, their prediction accuracy is limited, especially in long-term prediction. To improve the long-term prediction accuracy for dynamic network traffic, we propose an Attention-based Spatial-Temporal Graph Network (ASTGN) model that better captures both the temporal and spatial relations in network traffic. Specifically, ASTGN uses an encoder-decoder architecture, in which the encoder encodes the input network traffic and the decoder outputs the predicted network traffic sequences, and the temporal and spatial information of the traffic data is integrated through the Spatio-Temporal Embedding module. The experimental results demonstrate the superiority of the proposed ASTGN in long-term prediction.
A single-layer, polarization-adjustable circular-polarization (CP) antenna with four arc-like slots has been designed for the GPS L2 band. The antenna uses four arc-like slots, etched on the patch with a specific size relationship, to tune the phase difference and produce circular polarization. By adjusting the radius of the arc-like slots, left-handed circular polarization (LHCP) and right-handed circular polarization (RHCP) can both be realized easily with a simple structure. Simulations and optimizations show that the constructed CP antenna has a good axial-ratio bandwidth of 10 MHz and impedance bandwidths of 40 MHz and 30 MHz for LHCP and RHCP applications, respectively.