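Author affiliations: Univ Cambridge, Comp Lab, Cambridge CB2 3QG, England; Imense Ltd, Cambridge, England; Univ Birmingham, Sch Phys & Astron, Birmingham, W Midlands, England
Publication: ASLIB PROCEEDINGS (Aslib Proceedings: New Information Perspectives)
Year/Volume/Issue: 2010, Vol. 62, No. 4-5
Pages: 438-446
Core indexing:
Subject classification: 1205 [Management - Library, Information and Archives Management]; 081203 [Engineering - Computer Application Technology]; 08 [Engineering]; 0835 [Engineering - Software Engineering]; 0812 [Engineering - Computer Science and Technology (degrees in Engineering or Science may be conferred)]
Subjects: Data handling; Pattern recognition; Data analysis; Virtual work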
Abstract: Purpose - Content-based image retrieval (CBIR) technologies offer many advantages over purely text-based image search. However, one of the drawbacks associated with CBIR is the increased computational cost arising from tasks such as image processing, feature extraction, image classification, and object detection and recognition. Consequently, CBIR systems have suffered from a lack of scalability, which has greatly hampered their adoption for real-world public and commercial image search. At the same time, paradigms for large-scale heterogeneous distributed computing such as grid computing, cloud computing, and utility-based computing are gaining traction as a way of providing more scalable and efficient solutions to large-scale computing tasks. Design/methodology/approach - This paper presents an approach in which a large distributed processing grid has been used to apply a range of CBIR methods to a substantial number of images. By massively distributing the required computational task across thousands of grid nodes, very high throughput has been achieved with relatively low overhead. Findings - This has made it possible to analyse and index about 25 million high-resolution images thus far, while using just two servers for storage and job submission. The CBIR system was developed by Imense Ltd and is based on automated analysis and recognition of image content using a semantic ontology. It features a range of image-processing and analysis modules, including image segmentation, region classification, scene analysis, object detection, and face recognition methods. Originality/value - In the case of content-based image analysis, the primary performance criterion is the overall throughput achieved by the system in terms of the number of images that can be processed over a given time frame, irrespective of the time taken to process any given image.
As such, grid processing has great potential for massively parallel content-based image retrieval and other tasks with similar p