We experimentally demonstrated line rate 300-Gb/s PAM4 and 400-Gb/s probabilistically shaped (PS)-PAM16 transmission using packaged thin-film LiNbO3 modulators with only 550-mVpp CMOS-class driving voltage requiring n...
We demonstrate a PLC splitter encoded by Bragg waveguide gratings with a wavelength interval of 4 nm and an adjustable reflectance of up to 40% using a femtosecond laser inscribing technique for passive optical networ...
We demonstrated an electrically driven non-volatile MZI based on low-loss Sb2Se3 sub-cell arrays. By constraining the material reflow in the sub-cells, an optical transmission contrast of > 5 dB was achieved with > 1...
We demonstrate a hybrid-integrated self-injection-locked laser on the Si3N4-on-SOI platform to generate an FMCW signal with 9-kHz linewidth, 2-MHz repetition rate, and high linearity, supporting FMCW-LiDAR with refresh r...
We present an amplitude-invariant phase shifter based on a PIN diode embedded in a Mach-Zehnder Interferometer for optical phased arrays. The dynamic power contrast is lower than 0.42 dB over a π phase shift. CLEO 20...
In this paper, we present the method proposed by our team for Track 2 of NLPCC 2023 Shared Task 7, which focuses on the extraction of paragraph-level and whole-essay topic sentences in middle school student ...
Based on a resistance-capacitance equalizer, we demonstrated a silicon thermo-optic phase shifter with a rise time of 2.3 µs and a modulation bandwidth of 250 kHz, offering a tenfold enhancement over the original ...
ISBN (print): 9798400708688
Contrastive learning-based models have shown impressive performance in text-image retrieval tasks. However, when applied to video retrieval, traditional contrastive learning strategies have struggled to achieve satisfactory results due to the redundancy of video content. We discern several potential reasons: (1) Current methodologies sometimes overlook the significant information imbalance between videos and query text, specifically neglecting the in-depth textual representation of the content within the videos. (2) Current video matching methodologies typically focus on cross-modal alignment at the level of general entity similarity, without specific consideration of how entity-pair preferences and similarity properties affect the task at hand. (3) Previous vectorized retrieval based on video content features has been somewhat flawed: it primarily aligned overall features without a video content tag feature for meaningful feature discrimination. To address these three shortcomings, we propose a retrieval model augmented with ontology semantic labels and introduce a method to integrate video ontology semantic labels into the contrastive learning framework. In particular, we develop ontology semantic descriptions of entities encompassing both human figures and textual elements within the videos. We then train and test on the CMIVQA dataset to assess the performance of our approach. The experimental results show that employing fine-grained ontology labels as sample pairs for contrastive learning leads to greater precision in video retrieval tasks.
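The objective sketched above, matched video-text pairs pulled together against in-batch negatives, can be illustrated with a standard symmetric InfoNCE loss. This is a minimal sketch of the generic contrastive setup only; the paper's model additionally builds fine-grained sample pairs from ontology semantic labels, which is not reproduced here, and all names and the temperature value are illustrative assumptions:

```python
import numpy as np

def info_nce_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired video/text embeddings.

    Row i of each matrix is assumed to be a matched pair (the positive);
    every other row in the batch serves as a negative.
    """
    # L2-normalize so the dot product is cosine similarity.
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature  # (batch, batch) similarity matrix

    def xent(l):
        # Cross-entropy with the diagonal (matched pairs) as targets.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the video->text and text->video retrieval directions.
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
pairs = rng.normal(size=(8, 64))
loss_matched = info_nce_loss(pairs, pairs)                    # identical pairs
loss_random = info_nce_loss(pairs, rng.normal(size=(8, 64)))  # unrelated pairs
```

As expected, perfectly matched pairs yield a loss near zero, while unrelated pairs score close to log(batch size).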
ISBN (print): 9798400708688
Diffusion models have demonstrated remarkable success in generating continuous data, such as images and audio. Previous studies on text generation employing continuous diffusion models have revealed the potential of the diffusion framework. However, challenges such as embedding collapse persist, limiting overall generation performance. In this paper, we introduce LDSeq, a latent diffusion framework employing a two-stage training procedure for sequence-to-sequence text generation. In the proposed framework, we first train a Variational Auto-Encoder (VAE) on downstream datasets to compress the target text of samples into a continuous latent space, and then train a conditional latent diffusion model in that fixed latent space, where the latent vectors are iteratively sampled conditioned on the input source text. The disjoint training stages prevent the collapse of the diffusion space. Experimental results on paraphrase generation and text summarization datasets show that LDSeq achieves comparable or superior performance relative to AR and NAR baselines while requiring a lower training cost. Furthermore, we discuss potential future directions for enhancing diffusion models in the text generation domain.
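The second training stage operates on frozen VAE latents. The closed-form forward (noising) process that such a latent diffusion model trains against can be sketched as follows; the linear schedule, step count, and latent dimensions are illustrative assumptions, not LDSeq's actual settings:

```python
import numpy as np

# Standard DDPM-style forward process, applied here to VAE latents.
T = 1000
betas = np.linspace(1e-4, 0.02, T)    # linear noise schedule (assumed)
alphas_bar = np.cumprod(1.0 - betas)  # cumulative signal retention per step

def q_sample(z0, t, rng):
    """Sample z_t ~ q(z_t | z_0) = N(sqrt(a_bar_t) * z_0, (1 - a_bar_t) * I)."""
    eps = rng.normal(size=z0.shape)
    return np.sqrt(alphas_bar[t]) * z0 + np.sqrt(1.0 - alphas_bar[t]) * eps

rng = np.random.default_rng(0)
z0 = rng.normal(size=(4096, 32))   # stand-in for VAE latents of target text
z_mid = q_sample(z0, T // 2, rng)  # partially noised latent
z_end = q_sample(z0, T - 1, rng)   # nearly pure noise

corr_mid = np.corrcoef(z0.ravel(), z_mid.ravel())[0, 1]
corr_end = np.corrcoef(z0.ravel(), z_end.ravel())[0, 1]
```

Mid-schedule latents still correlate with the clean latent, while at the final step the signal is essentially gone; the denoiser is trained to invert this process conditioned on the source text, and the closed form lets any step t be sampled directly without simulating the chain.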