Correspondence Autoencoders for Cross-Modal Retrieval

Authors: Feng, Fangxiang; Wang, Xiaojie; Li, Ruifan; Ahmad, Ibrar

Affiliations: Beijing Univ Posts & Telecommun, Sch Comp Sci, Beijing 100876, Peoples R China; Univ Peshawar, Dept Comp Sci, Peshawar, Pakistan

Published in: ACM Transactions on Multimedia Computing, Communications, and Applications

Year/Volume/Issue: 2015, Vol. 12, No. 1

Pages: 26-26

Subject classification: 0809 [Engineering - Electronic Science and Technology (degrees in Engineering or Science)]; 08 [Engineering]; 0835 [Engineering - Software Engineering]; 0812 [Engineering - Computer Science and Technology (degrees in Engineering or Science)]

Funding: National Natural Science Foundation of China; National High Technology Research and Development Program of China [2012AA011103]; discipline building plan in 111 base [B08004]; Fundamental Research Funds for the Central Universities [2013RC0304]; Engineering Research Center of Information Networks, Ministry of Education

Keywords: Algorithms; Design; Experimentation; Cross-modal retrieval; image and text; deep learning; autoencoder

Abstract: This article considers the problem of cross-modal retrieval, such as using a text query to search for images and vice versa. Several novel models, based on different autoencoders, are proposed here for solving this problem. These models are constructed by correlating the hidden representations of a pair of autoencoders. A novel optimization objective, which minimizes a linear combination of the representation learning errors for each modality and the correlation learning error between the hidden representations of the two modalities, is used to train each model as a whole. Minimizing the correlation learning error forces the model to learn hidden representations that capture only the information common to the two modalities, while minimizing the representation learning errors keeps the hidden representations good enough to reconstruct the inputs of each modality. To balance the two kinds of errors, induced by representation learning and correlation learning, we introduce a specific parameter in our models. Furthermore, the models are divided into two groups according to the modalities they attempt to reconstruct. One group, comprising three models, is named the multimodal reconstruction correspondence autoencoder, since it reconstructs both modalities. The other group, comprising two models, is named the unimodal reconstruction correspondence autoencoder, since it reconstructs a single modality. The proposed models are evaluated on three publicly available datasets. The experiments demonstrate that the proposed correspondence autoencoders perform significantly better than three canonical correlation analysis based models and two popular multimodal deep models on cross-modal retrieval tasks.
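To make the objective described in the abstract concrete, below is a minimal PyTorch sketch of a multimodal-reconstruction correspondence autoencoder under stated assumptions: the class and function names (CorrespondenceAutoencoder, cae_loss), the layer sizes, the sigmoid activations, mean-squared error as both the reconstruction and correlation-distance measure, and the alpha weighting are all illustrative choices, not the authors' actual architecture or parameter.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CorrespondenceAutoencoder(nn.Module):
    """Sketch of a multimodal-reconstruction correspondence autoencoder:
    one autoencoder per modality, with paired hidden codes tied together
    by a correlation (distance) penalty. Dimensions are assumptions."""

    def __init__(self, img_dim=4096, txt_dim=2000, hidden_dim=128):
        super().__init__()
        # Image-side autoencoder (single-layer encoder/decoder for brevity)
        self.img_enc = nn.Linear(img_dim, hidden_dim)
        self.img_dec = nn.Linear(hidden_dim, img_dim)
        # Text-side autoencoder
        self.txt_enc = nn.Linear(txt_dim, hidden_dim)
        self.txt_dec = nn.Linear(hidden_dim, txt_dim)

    def forward(self, img, txt):
        h_img = torch.sigmoid(self.img_enc(img))   # image hidden code
        h_txt = torch.sigmoid(self.txt_enc(txt))   # text hidden code
        return h_img, h_txt, self.img_dec(h_img), self.txt_dec(h_txt)

def cae_loss(model, img, txt, alpha=0.2):
    """Linear combination of the per-modality representation (reconstruction)
    errors and the correlation error between the two hidden codes; alpha is
    the balancing parameter mentioned in the abstract (value assumed here)."""
    h_img, h_txt, img_rec, txt_rec = model(img, txt)
    rec = F.mse_loss(img_rec, img) + F.mse_loss(txt_rec, txt)
    corr = F.mse_loss(h_img, h_txt)  # distance between paired hidden codes
    return (1 - alpha) * rec + alpha * corr
```

At retrieval time, under this sketch, one would encode a query from either modality into its hidden code and rank items of the other modality by distance between codes; the unimodal-reconstruction variants differ only in decoding a single modality from both codes.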
