Aligning Textual & Visual Data: Towards Scalable Multimedia Retrieval

Author: Pramod Sankar Kompalli

Affiliation: International Institute of Information Technology Hyderabad

Degree: Doctoral

Advisor: C. V. Jawahar

Year awarded: 2015

Abstract: The search and retrieval of relevant images and videos from large multimedia repositories is acknowledged as one of the hard challenges of computer science. With existing pattern recognition solutions, one cannot obtain a detailed, semantic description of a given multimedia document. Several limitations exist in feature extraction and classification schemes, along with the incompatibility of representations across domains. The situation will most likely remain so for several years to come. Towards addressing this challenge, we observe that several multimedia collections come with parallel information that is: i) semantic in nature, ii) weakly aligned with the multimedia, and iii) freely available. For example, the content of a news broadcast is also available in the form of newspaper articles. If a correspondence could be obtained between the videos and such parallel information, one could access one medium using the other, which opens up immense possibilities for information extraction and retrieval. However, it is challenging to find the mapping between the two sources of data, due to the unknown semantic hierarchy within each medium and the difficulty of matching information across the different modalities. In this thesis, we propose novel algorithms that address these challenges. Different pairs require different alignment techniques, depending on the granularity at which entities can be matched across them. We choose four pairs of multimedia and parallel information obtained in the text domain, such that the data is both challenging and available on a large scale. Specifically, our multimedia consists of movies, broadcast sports videos and document images, with the parallel text coming from scripts, commentaries and language resources. As we proceed from one pair to the next, we discover an increasing complexity of the problem, due to a relaxation of the temporal binding between the parallel information
