Authors: Grauman, K.; Betke, M.; Lombardi, J.; Gips, J.; Bradski, G. R.
Affiliations: Vision Interface Group, AI Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, United States; Computer Science Department, Boston University, 111 Cummington St, Boston, MA 02215, United States; EagleEyes, Computer Science Department, Boston College, Fulton Hall, Chestnut Hill, MA 02467, United States; Vision, Graphics and Pattern Recognition, Microcomputer Research Laboratory, Intel Corporation, SC12-303, 2200 Mission College Blvd, Santa Clara, CA 95054-1537, United States
Two video-based human-computer interaction tools are introduced that can activate a binary switch and issue a selection command. "BlinkLink," as the first tool is called, automatically detects a user's e...
ISBN:
(Print) 9781479982479
Images and videos taken in foggy weather often suffer from low visibility. Recent studies demonstrate the effectiveness of dark channel prior [3] and guided filter [4] based approaches for image dehazing. However, these methods incur a high computational cost, which makes them infeasible for real-time and embedded systems. In this paper, we propose the Edge-Guided Interpolated Filter (EGIF) for fast image and video dehazing. The main contributions are twofold. First, we develop the Guided Interpolated Filter (GIF) to significantly speed up the estimation of the transmission map, which is the most computationally expensive step in previous methods. Second, we use the edge map as the guidance image in GIF to enhance the fine details in dehazed images. Experimental results show that GIF can largely improve computational efficiency and achieve dehazing performance comparable to previous guided filter based methods. EGIF can further enhance the sharpness of the transmission map. Our method achieves real-time processing for images of size 1024 × 768 on a single CPU core (2 GHz).
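The transmission-map estimation that dominates the cost in such pipelines can be sketched with the standard dark channel prior. The NumPy sketch below illustrates that baseline step only; it is not the paper's GIF/EGIF, and the patch size and omega value are conventional defaults rather than settings taken from the paper.

```python
import numpy as np

def dark_channel(img, patch=15):
    """Dark channel prior: per-pixel minimum over color channels,
    followed by a local minimum filter over a patch x patch window."""
    min_rgb = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(min_rgb, pad, mode='edge')
    h, w = min_rgb.shape
    out = np.empty_like(min_rgb)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def estimate_transmission(img, atmosphere, omega=0.95, patch=15):
    """Coarse transmission map t = 1 - omega * dark_channel(I / A),
    the step the abstract identifies as the most expensive one."""
    normalized = img / atmosphere  # per-channel normalization by airlight A
    return 1.0 - omega * dark_channel(normalized, patch)
```

The nested-loop minimum filter is written for clarity; a production version would replace it with a faster filter, which is precisely the kind of speedup the paper targets.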
Existing cross-modal hashing still faces three challenges: (1) Most batch-based methods are unsuitable for processing large-scale and streaming data. (2) Current online methods often suffer from insufficient semantic ...
To help understand how genes are affected by different disease conditions in a biological system, clustering is typically performed to analyze gene expression data. In this paper, we propose to solve the clustering problem using a graph-theoretical approach and apply a novel graph partitioning model, isoperimetric graph partitioning (IGP), to group biological samples from gene expression data. The IGP algorithm has several advantages over the well-established spectral graph partitioning (SGP) model. First, IGP requires only the solution of a sparse system of linear equations instead of the eigenvalue problem in the SGP model. Second, IGP avoids the degenerate cases produced by the spectral approach and thus achieves a partition with higher accuracy. Moreover, we integrate unsupervised gene selection into the proposed approach through two-way ordering of the gene expression data, so that we can eliminate irrelevant or redundant genes and obtain an improved clustering result. We evaluate our approach on several well-known problems involving gene expression profiles of colon cancer and leukemia subtypes. Our experimental results demonstrate that IGP consistently outperforms SGP and produces a result closer to the original labeling of the sample sets provided by domain experts. Furthermore, the clustering accuracy improves significantly when IGP is integrated with unsupervised gene (feature) selection.
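The computational advantage cited above, a sparse linear solve in place of an eigenvalue problem, can be illustrated with a minimal isoperimetric-style bipartition. This dense NumPy sketch follows the general IGP recipe (ground one node, solve the reduced system L0 x0 = d0, threshold the potentials); the median threshold and the toy graph in the test are illustrative choices, not the paper's.

```python
import numpy as np

def isoperimetric_partition(W, ground=0):
    """Bipartition a graph given its symmetric affinity matrix W.
    Grounding one node makes the reduced Laplacian nonsingular, so the
    indicator potentials come from a single linear solve, not an eigenproblem."""
    d = W.sum(axis=1)                 # node degrees
    L = np.diag(d) - W                # graph Laplacian
    keep = np.array([i for i in range(len(W)) if i != ground])
    x = np.zeros(len(W))              # the grounded node keeps potential 0
    x[keep] = np.linalg.solve(L[np.ix_(keep, keep)], d[keep])
    return (x > np.median(x)).astype(int)  # median threshold: one common choice
```

On a toy graph with two dense clusters joined by one weak edge, the ground node's cluster receives low potentials and the far cluster high ones, so thresholding recovers the two groups.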
Retinal images have been increasingly important in clinical diagnostics of several eye and systemic diseases. To help the medical doctors in this work, automatic and semi-automatic diagnosis methods can be used to inc...
Few-shot font generation (FFG), which aims to generate font images from a few samples, has become an emerging topic in recent years owing to its academic and commercial value. Typically, FFG approaches follow the style-content disentanglement paradigm, which transfers target font styles to characters by combining the content representations of source characters with the style codes of reference samples. Most existing methods attempt to improve font generation by exploring powerful style representations, which may be a sub-optimal solution for the FFG task because it does not model the spatial transformation involved in transferring font styles. In this paper, we model font generation as a continuous transformation process from the source character image to the target font image via the creation and dissipation of font pixels, and we embed the corresponding transformations into a neural transformation field. Given the estimated transformation path, the neural transformation field generates a set of intermediate transformation results via a sampling process, and a font rendering formula is developed to accumulate them into the target font image. Extensive experiments show that our method achieves state-of-the-art performance on the few-shot font generation task, which demonstrates the effectiveness of the proposed model. Our implementation is available at: https://***/fubinfb/NTF.
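The accumulation step described above, in which sampled intermediate results are combined into the target image by a rendering formula, can be caricatured in a few lines. This toy Euler-style accumulator is only an analogue of the described process: `delta_field` stands in for the neural transformation field, which is not reproduced here, and the clipping rule is an assumption.

```python
import numpy as np

def render_along_path(src, delta_field, ts):
    """Accumulate sampled per-step pixel changes (creation is positive,
    dissipation is negative) along a transformation path into one image."""
    img = src.astype(float).copy()
    for t0, t1 in zip(ts[:-1], ts[1:]):
        img = img + (t1 - t0) * delta_field(img, t0)  # Euler-style accumulation
        img = np.clip(img, 0.0, 1.0)                  # keep a valid pixel range
    return img
```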
This paper presents a novel approach to compute DCT-I, DCT-III, and DCT-IV. By using a modular mapping and truncating, DCTs are approximated by linear sums of discrete moments computed fast only through additions. Thi...
Metal artifacts in computed tomography (CT) arise from a mismatch between physics of image formation and idealized assumptions during tomographic reconstruction. These artifacts are particularly strong around metal im...
Decision trees represent a simple and powerful method of induction from labeled examples. Univariate decision trees consider the value of a single attribute at each node, leading to splits that are parallel to the axes. In linear multivariate decision trees, all the attributes are used and the partition at each node is based on a linear discriminant (a hyperplane). Nonlinear multivariate decision trees can divide the input space arbitrarily based on higher-order parameterizations of the discriminant, though one should be aware of the increase in complexity and the decrease in the number of examples available as one moves further down the tree. In omnivariate decision trees, a decision node may be univariate, linear, or nonlinear. Such an architecture frees the designer from choosing the appropriate tree type for a given problem. In this paper, we propose to perform model selection at each decision node based on a novel classifiability measure when building omnivariate decision trees. The classifiability measure captures the possible sources of misclassification with relative ease and accurately reflects the complexity of the subproblem at each node. The proposed approach does not require time-consuming statistical tests at each node and therefore does not suffer from as high a computational burden as typical model selection algorithms. Our simulation results over several data sets indicate that our approach achieves at least as good classification accuracy as model selection algorithms based on statistical tests, but at much higher speed.
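The per-node model selection the abstract describes can be sketched with a toy stand-in: score a univariate split against a linear discriminant at a node and keep the better one. The scoring below uses plain training accuracy as a placeholder for the paper's classifiability measure, and the least-squares discriminant is an illustrative choice, not the paper's method.

```python
import numpy as np

def best_axis_split_acc(X, y):
    """Accuracy of the best single-attribute threshold split (univariate node)."""
    best = 0.0
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            pred = (X[:, j] > t).astype(int)
            # either side of the split may be the positive class
            best = max(best, (pred == y).mean(), (pred != y).mean())
    return best

def linear_node_acc(X, y):
    """Accuracy of a least-squares linear discriminant (multivariate node)."""
    A = np.c_[X, np.ones(len(X))]                       # append bias column
    w, *_ = np.linalg.lstsq(A, 2.0 * y - 1.0, rcond=None)
    pred = (A @ w > 0).astype(int)
    return (pred == y).mean()

def choose_node_type(X, y):
    """Pick the node type with the higher score; prefer the simpler
    univariate node on ties, in the spirit of omnivariate trees."""
    if best_axis_split_acc(X, y) >= linear_node_acc(X, y):
        return 'univariate'
    return 'linear'
```

On axis-aligned data the univariate node already suffices, while on a diagonally separable problem the linear discriminant wins, which is the kind of per-node decision the paper automates.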
It is a challenging task to learn rich and multi-scale spatiotemporal semantics from high-dimensional videos, due to large local redundancy and complex global dependency between video frames. The recent advances in th...