Search Results - Inner Mongolia University Library

32nd British Machine Vision Conference, BMVC 2021

Authors: Yu, Jiaqi; Yang, Jinhai; Yang, Hua; Zhai, Guangtao. Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, China; Shanghai Key Lab of Digital Media Processing and Transmission, Shanghai, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China

Detecting interaction groups is an essential task for understanding human behaviours and social activities. However, it remains challenging to identify social interactions and the resulting crowd groups from purely visual cues, especially in single images. Prior works either require additional statistics, such as interpersonal angles and kinaesthetic information, or simply deduce group memberships from the similarity of individual actions. In this paper, we present the Psychology-inspired Relation Network (PRN) to comprehensively understand static social scenes and effectively model the interaction relations between individuals. More concretely, inspired by recent advances in social psychology, we first predict keypoint heatmaps from an image with human bounding boxes as visual representations of the key factors that determine interaction groups: distance, orientation and postural openness. We then combine personal and mutual influences to compute the interaction strength matrix via self-attention, and finally use a perceptron to convert this matrix into dyadic interaction probabilities. Moreover, we devise two loss functions: the dyad loss, which optimises the dyadic interaction probability, and the group loss, which enhances the distinguishability among different social groups. To evaluate the performance of PRN, we introduce a novel dataset containing various scenes with different crowd densities, built by merging representative databases and relabelling the group labels. Our method achieves outstanding results on the proposed dataset. © 2021. The copyright of this document resides with its authors.

Keywords: Computer vision
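
The PRN abstract describes turning per-person representations into an interaction strength matrix with self-attention and then into dyadic interaction probabilities. The sketch below illustrates that general idea in plain Python under loud assumptions: the toy 2-D features, the reuse of the same vectors as queries and keys, the averaging of i-to-j and j-to-i strengths, and the single-unit logistic perceptron (`weight`, `bias`) are all hypothetical choices, not the published PRN architecture.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dyadic_probabilities(queries, keys, weight, bias):
    """Toy pairwise interaction probabilities (hypothetical, not PRN itself).

    queries, keys: one vector per person (already projected features).
    weight, bias: parameters of a single-unit perceptron with sigmoid output.
    """
    n, d = len(queries), len(queries[0])
    # Scaled dot-product attention scores between every pair of people.
    scores = [[sum(a * b for a, b in zip(queries[i], keys[j])) / math.sqrt(d)
               for j in range(n)] for i in range(n)]
    # Row-wise softmax turns the scores into an interaction strength matrix.
    strength = [softmax(row) for row in scores]
    # Average i->j and j->i so the interaction is treated as mutual.
    mutual = [[0.5 * (strength[i][j] + strength[j][i]) for j in range(n)]
              for i in range(n)]
    # Perceptron: logistic function of an affine map of each strength.
    return [[1.0 / (1.0 + math.exp(-(weight * s + bias))) for s in row]
            for row in mutual]


feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # 3 people, toy 2-D features
probs = dyadic_probabilities(feats, feats, weight=6.0, bias=-2.0)
for row in probs:
    print([round(p, 3) for p in row])
```

With these toy features, the two people with similar vectors (0 and 1) receive a higher pairwise probability than the dissimilar pair (0 and 2). Averaging the row-softmaxed strengths makes the output symmetric, which matches treating a dyad as unordered; the actual model may handle asymmetry differently.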

Spatial-Temporal Constrained Pseudo-labeling for Unsupervised Person Re-identification via GCN Inference

18th International Forum of Digital Multimedia Communication, IFTC 2021

Authors: Ling, Sen; Yang, Hua; Liu, Chuang; Chen, Lin; Zhao, Hongtian. The Institute of Image Communication and Network Engineering, Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Digital Media Processing and Transmission, Shanghai Jiao Tong University, Shanghai, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China

ISBN (print): 9789811922657

Most existing unsupervised person re-identification (Re-ID) methods depend primarily on the cluster distance and merely exploit the available labeled source data to assign pseudo labels to the unannotated data. However, the cluster distance usually fails to adapt to different datasets because of the domain gap. Moreover, learning exclusively from the source data cannot generate accurate pseudo labels, owing to the lack of information about the target data. To address these problems, we propose to exploit spatial-temporal constraints to facilitate the pseudo-label generation process. Specifically, graphs are constructed for the labeled source data, and a graph convolutional network (GCN) is used to learn graph embeddings. Based on these graph embeddings, the likelihood of linkages between graph nodes is estimated and used to assign pseudo labels to the unlabeled data. Then, with the pseudo labels, a smoothed spatial-temporal probability distribution model is generated to amend the likelihood of linkages between graph nodes and to correct the visual similarity scores for person Re-ID. Finally, we optimize the pseudo-label assignment, the feature extraction networks and the spatial-temporal model alternately and iteratively to improve person Re-ID performance. Comprehensive experiments demonstrate that the proposed method outperforms state-of-the-art methods. © 2022, Springer Nature Singapore Pte Ltd.

Keywords: Graph theory
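
The Re-ID abstract learns graph embeddings with a GCN and then scores linkages between graph nodes. As a rough illustration, the sketch below implements one layer of the widely used symmetric-normalised propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W) and uses cosine similarity of the resulting embeddings as a stand-in linkage score. Both the propagation rule and the cosine score are assumptions: the abstract does not state which GCN variant or linkage estimator the authors use, and the toy graph is hypothetical.

```python
import math


def gcn_layer(adj, feats, weight):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n adjacency matrix (lists), feats: n x f node features,
    weight: f x o weight matrix. Returns n x o node embeddings.
    """
    n = len(adj)
    # Add self-loops so every node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbour features (norm @ feats).
    agg = [[sum(norm[i][k] * feats[k][c] for k in range(n)) for c in range(len(feats[0]))]
           for i in range(n)]
    # Linear projection (agg @ weight) followed by ReLU.
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(len(weight))))
             for o in range(len(weight[0]))] for i in range(n)]


def cosine(u, v):
    """Cosine similarity, used here as a stand-in linkage score between two nodes."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Toy graph: samples 0 and 1 are linked, sample 2 is isolated.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
emb = gcn_layer(adj, feats, identity)
print(emb)  # [[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]]
print(cosine(emb[0], emb[1]) > cosine(emb[0], emb[2]))  # True
```

Linked samples 0 and 1 end up with identical embeddings after one round of neighbourhood averaging, so their linkage score is higher than that of the isolated sample 2.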

Decoding Sleep: Microphone-Based Snoring Analysis using Embedded Machine Learning for Obstructive Sleep Apnea Detection

International Conference on Biosignals, Images and Instrumentation (ICBSII)

Authors: Delpha Jacob; Priyanka Kokil; Subramanian S; Jayanthi Thiruvengadam. Department of Biomedical Engineering, College of Engineering and Technology, SRM Institute of Science and Technology, Chennai; Department of Electronics and Communication Engineering, Advanced Signal and Image Processing (ASIP) Lab, Indian Institute of Information Technology Design and Manufacturing, Chennai; Department of Respiratory Medicine, SRM Medical College Hospital and Research Centre, Kattankulathur, Tamil Nadu, India

ISBN (electronic): 9798350350951

ISBN (print): 9798350350968

Snoring, a recurring habit often disregarded within the Indian community, can signal a grave underlying condition: Obstructive Sleep Apnea (OSA). OSA is a severe sleep disorder characterized by recurrent interruptions in breathing lasting more than 10 seconds during sleep, typically due to partial or complete airway obstruction. Neglecting OSA can lead to a range of significant health risks, including an increased likelihood of occupational and motor vehicle accidents, heightened susceptibility to severe depression, cardiac and cerebrovascular diseases, and reduced life expectancy. The main objective of the study is to detect snoring during sleep and to classify it as normal snoring or OSA snoring. An Arduino Nano 33 BLE Sense, which houses a built-in MP34DT05 sensor, is used to capture the snore signal. The sensor has a signal-to-noise ratio of 64 dB and a sensitivity of -26 dBFS ± 3 dB. The captured sound signal is then processed to extract Mel-filter bank energy features, Mel Frequency Cepstral Coefficients and spectrogram features. These features are used to build a model, which is trained with Edge Impulse to classify the signal. The dataset is divided into training, testing and validation sets, with 80% of the data allocated to training, 20% to testing, and an additional 20% of the training data set aside for validation. For two-class classification (snoring and non-snoring), the spectrogram-based approach achieved an accuracy of 96.9%, while the other two methods each achieved 93.8%. The accuracy for three-class classification (normal, snoring and OSA snoring) using the Embedded Machine Learning (EML) approach is 88%. The proposed study demonstrates improved accuracy in identifying OSA from snoring compared with previous research. This autonomous system can facilitate the detection of OSA through the analysis of snoring patterns, subsequently alerting the subjec

Keywords: Training; Performance evaluation; Accuracy; Embedded systems; Training data; Machine learning; Feature extraction
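
The data split in the snoring abstract is easy to misread. Assuming the validation set is 20% of the 80% training portion (rather than a further 20% of the whole dataset), the effective proportions are 64% training, 16% validation and 20% testing, as the small helper below computes. The function name and the rounding scheme are illustrative assumptions.

```python
def split_sizes(n_samples, test_frac=0.2, val_frac_of_train=0.2):
    """Return (train, validation, test) counts for a nested hold-out split.

    The test set is carved out first; the validation set is then taken
    as a fraction of the remaining training portion.
    """
    n_test = round(n_samples * test_frac)
    n_train_full = n_samples - n_test
    n_val = round(n_train_full * val_frac_of_train)
    return n_train_full - n_val, n_val, n_test


print(split_sizes(1000))  # (640, 160, 200)
```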

Visual comfort aware-reinforcement learning for depth adjustment of stereoscopic 3D images

arXiv, 2021

Authors: Kim, Hak Gu; Park, Minho; Lee, Sangmin; Kim, Seongyeop; Ro, Yong Man. Image and Video Systems Lab., KAIST, Korea, Republic of; School of Computer and Communication Sciences, EPFL, Switzerland

Depth adjustment aims to enhance the visual experience of stereoscopic 3D (S3D) images by improving both visual comfort and depth perception. For a human expert, depth adjustment is a sequence of iterative decisions: the expert repeatedly adjusts the depth until satisfied with both the level of visual comfort and the perceived depth. In this work, we present a novel deep reinforcement learning (DRL)-based approach to depth adjustment named VCA-RL (Visual Comfort Aware Reinforcement Learning) that explicitly models human sequential decision making in depth editing operations. We formulate the depth adjustment process as a Markov decision process in which actions are defined as camera movement operations that control the distance between the left and right cameras. Our agent is trained under the guidance of an objective visual comfort assessment metric to learn the optimal sequence of camera movement actions in terms of the perceptual aspects of stereoscopic viewing. Through extensive experiments and user studies, we show the effectiveness of our VCA-RL model on three different S3D databases. Copyright © 2021, The Authors. All rights reserved.

Keywords: Deep learning

Towards a better understanding of VR sickness: Physical symptom prediction for VR contents

arXiv, 2021

Authors: Kim, Hak Gu; Lee, Sangmin; Kim, Seongyeop; Lim, Heoun-Taek; Ro, Yong Man. Image and Video Systems Lab., KAIST, Korea, Republic of; School of Computer and Communication Sciences, EPFL, Switzerland

We address the black-box issue of VR sickness assessment (VRSA) by evaluating the level of the physical symptoms of VR sickness. Even for VR contents that induce a similar level of VR sickness, the physical symptoms can vary depending on the characteristics of the contents. Most existing VRSA methods focus on assessing the overall VR sickness score. To better understand VR sickness, it is necessary to predict and report the levels of the major symptoms rather than only the overall degree of VR sickness. In this paper, we predict the degrees of the main physical symptoms that affect the overall degree of VR sickness, namely disorientation, nausea, and oculomotor symptoms. In addition, we introduce a new large-scale dataset for VRSA that includes 360-degree videos with various frame rates, physiological signals, and subjective scores. On the VRSA benchmark and our newly collected dataset, our approach shows the potential not only to achieve the highest correlation with subjective scores but also to better identify which symptoms are the main causes of VR sickness. Copyright © 2021, The Authors. All rights reserved.

Keywords: Diseases

Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models

arXiv, 2023

Authors: Sun, Wei; Wen, Wen; Min, Xiongkuo; Lan, Long; Zhai, Guangtao; Ma, Kede. The Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai 200240, China; The Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong; The Institute for Quantum Information, State Key Laboratory of High Performance Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China; The Institute of Image Communication and Information Processing, The MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai 200240, China; The Department of Computer Science, The Shenzhen Research Institute, City University of Hong Kong, Kowloon, Hong Kong

Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving end users' viewing experience in various real-world video-enabled media applications. As BVQA is an experimental field, improvements in BVQA models have been measured primarily on a few human-rated VQA datasets. Thus, it is crucial to gain a better understanding of existing VQA datasets in order to properly evaluate the current progress in BVQA. Towards this goal, we conduct a first-of-its-kind computational analysis of VQA datasets by designing minimalistic BVQA models. By minimalistic, we mean that our family of BVQA models is built only upon basic blocks: a video preprocessor (for aggressive spatiotemporal downsampling), a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor, all with the simplest possible instantiations. By comparing the quality prediction performance of different model variants on eight VQA datasets with realistic distortions, we find that nearly all datasets suffer from the easy dataset problem to varying degrees of severity, and some even admit blind image quality assessment (BIQA) solutions. We further support our claims by comparing our models' generalization capabilities on these VQA datasets and by ablating a dizzying set of BVQA design choices related to the basic building blocks. Our results cast doubt on the current progress in BVQA and, at the same time, shed light on good practices for constructing next-generation VQA datasets and models. Code is available at https://***/sunwei925/***. Copyright © 2023, The Authors. All rights reserved.

Keywords: Deep neural networks
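
The minimalistic BVQA models start with a video preprocessor that performs aggressive spatiotemporal downsampling. A minimal sketch of such a step is shown below, assuming grayscale frames stored as nested lists, temporal subsampling to one frame per second and non-overlapping average pooling in space. The function name and these choices are hypothetical; the abstract does not give the sampling rates or resizing method the authors use.

```python
def downsample_video(frames, fps, spatial_factor):
    """Aggressive spatiotemporal downsampling of a grayscale video (illustrative).

    frames: list of 2-D lists (height x width) of pixel values.
    fps: frames per second; one frame per second is kept.
    spatial_factor: side length of the square average-pooling window.
    """
    # Temporal step: keep the first frame of every one-second chunk.
    kept = frames[::fps]
    out = []
    for frame in kept:
        h = len(frame) // spatial_factor
        w = len(frame[0]) // spatial_factor
        # Spatial step: average each non-overlapping spatial_factor x spatial_factor block.
        pooled = [[sum(frame[r * spatial_factor + dr][c * spatial_factor + dc]
                       for dr in range(spatial_factor) for dc in range(spatial_factor))
                   / (spatial_factor ** 2)
                   for c in range(w)] for r in range(h)]
        out.append(pooled)
    return out


# 3 seconds of 4x4 video at 2 fps (6 frames); frame t is filled with the value t.
video = [[[float(t)] * 4 for _ in range(4)] for t in range(6)]
small = downsample_video(video, fps=2, spatial_factor=2)
print(len(small), len(small[0]), len(small[0][0]))  # 3 2 2
print([clip[0][0] for clip in small])  # [0.0, 2.0, 4.0]
```

Here the 6-frame 4 x 4 clip is reduced to three 2 x 2 frames; later blocks (the spatial quality analyzer and regressor) would then operate only on what survives this reduction.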

Mask-Guided Transformer for Human-Object Interaction Detection

IEEE Visual Communications and Image Processing (VCIP)

Authors: Daocheng Ying; Hua Yang; Jun Sun. Shanghai Key Lab of Digital Media Processing and Transmission, Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, China

ISBN (print): 9781665475938

Human-object interaction (HOI) detection is a meaningful research topic in human activity understanding. Recent works have made significant progress by focusing on efficient triplet matching and by leveraging image-wide features with an encoder-decoder architecture. However, previous methods have limited ability to gather relevant contextual information about humans, and they do not explicitly decouple the different sub-tasks in HOI detection. To this end, we propose a new transformer-based method for HOI detection, namely the Mask-Guided Transformer (MGT). Our model, which is composed of five parallel decoders with a shared encoder, not only emphasizes interactive regions by applying body features but also disentangles the prediction of instances and interactions. We achieve a favorable result of 63.3 mAP on the well-known HOI detection dataset V-COCO.

Keywords: Visual communication; Image processing; Focusing; Predictive models; Transformers; Decoding

Dual-Mode iterative denoiser: Tackling the weak label for anomaly detection

International Conference on Pattern Recognition

Authors: Shuheng Lin; Hua Yang. Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University; Shanghai Key Lab of Digital Media Processing and Transmission, Shanghai, China

Crowd anomaly detection suffers from limited training data under weak supervision. In this paper, we propose a dual-mode iterative denoiser to tackle the weak-label challenge in anomaly detection. First, we use a convolutional autoencoder (CAE) in image space as a clustering module that groups similar video clips, where the spatial-temporal similarity helps the clustering metric represent the reconstruction error. Then, we use a graph convolutional neural network (GCN) to explore the temporal correlation and feature similarity between video clips with different rough labels, so that the classifier can be continually updated during the label denoising process. Without specific image-level labels, our model can predict clip-level anomaly probabilities for videos. Extensive experimental results on two public datasets show that our approach performs favorably against state-of-the-art methods.

Keywords: Training; Convolution; Noise reduction; Neural networks; Training data; Predictive models; Pattern recognition

Input-Output Optics as a Causal Time Series Mapping: A Generative Machine Learning Solution

arXiv, 2024

Authors: Sen, Abhijit; Parida, Bikram Keshari; Jacobs, Kurt; Bondar, Denys I. Department of Physics and Engineering Physics, Tulane University, New Orleans, LA 70118, United States; Artificial Intelligence & Image Processing Lab., Sun Moon University, Asan-Si, Korea, Republic of; United States DEVCOM Army Research Laboratory, Adelphi, MD 20783, United States; Department of Physics, University of Massachusetts at Boston, Boston, MA 02125, United States

The response of many-body quantum systems to an optical pulse can be extremely challenging to model. Here we explore the use of neural networks, both traditional and generative, to learn and thus simulate the response of such a system from data. The quantum system can be viewed as performing a complex mapping from an input time series (the optical pulse) to an output time series (the system's response), which is often also an optical pulse. Using both the transverse and non-integrable Ising models as examples, we show not only that temporal convolutional networks can capture the input/output mapping generated by the system but also that they can be used to characterize the complexity of the mapping. This measure of complexity is given by the size of the smallest latent space that is able to accurately model the mapping. We further find that a generative model, in particular a variational auto-encoder, significantly outperforms traditional auto-encoders at learning the complex response of many-body quantum systems. For the example that generated the most complex mapping, the variational auto-encoder produces outputs with less than 10% error for more than 90% of inputs across our test data. Copyright © 2024, The Authors. All rights reserved.

Keywords: Time series
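
The optics abstract frames the system's response as a causal mapping from an input time series to an output time series and learns it with temporal convolutional networks. TCNs are typically built from dilated causal convolutions, in which the output at time t depends only on inputs at times up to t. The sketch below shows that building block alone; the function name, kernel values and dilation are arbitrary assumptions, and the abstract does not specify the layer sizes used in the paper.

```python
def causal_conv1d(signal, kernel, dilation=1):
    """Causal 1-D convolution: y[t] = sum_k kernel[k] * x[t - k * dilation].

    Inputs before t = 0 are treated as zero (left padding), so y[t]
    never depends on future samples x[t + 1], x[t + 2], ...
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * signal[idx]
        out.append(acc)
    return out


impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(causal_conv1d(impulse, [0.5, 0.3, 0.2]))  # [0.5, 0.3, 0.2, 0.0, 0.0]
print(causal_conv1d(impulse, [0.5, 0.3, 0.2], dilation=2))  # [0.5, 0.0, 0.3, 0.0, 0.2]
```

Feeding an impulse returns the (dilated) kernel itself, and changing a future input sample leaves all earlier outputs unchanged, which is the causality property an input-pulse-to-response mapping must respect.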

Crowd Counting Via Multi-Level Regression With Latent Gaussian Maps

IEEE International Conference on Acoustics, Speech and Signal Processing

Authors: Yukang Gao; Hua Yang. Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Lab of Digital Media Processing and Transmission, Shanghai, China

Crowd counting still confronts two primary challenges: limited ability to handle different density levels, caused by fixed density maps, and a lack of fine-grained or coarse-grained guidance for density estimation. In this paper, we propose a novel end-to-end crowd counting framework based on multi-level regression with latent Gaussian maps, which consists of GaussianNet, EstimateNet and a Discriminator. GaussianNet is composed of masked Gaussian convolutional blocks and vanilla convolutional layers and generates latent Gaussian maps adaptively for various density levels. The latent Gaussian maps are then treated as the ground-truth density maps for EstimateNet, which outputs density estimations and is trained adversarially against the Discriminator. Moreover, multi-level losses are combined to guide density map regression. Extensive experiments on the major public datasets show that the framework outperforms state-of-the-art methods, demonstrating its effectiveness.

Keywords: Training; Convolution; Conferences; Estimation; Acoustics; Bayes methods; Task analysis
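
The crowd counting abstract contrasts its adaptive latent Gaussian maps with fixed density maps. For reference, the conventional fixed-sigma ground truth can be sketched as below: one Gaussian per annotated head, each normalised to unit mass, so that summing the map recovers the count. The function name, the value of `sigma`, the grid size and the head coordinates are arbitrary assumptions, and this is the baseline the paper argues against, not GaussianNet.

```python
import math


def gaussian_density_map(points, height, width, sigma):
    """Fixed-sigma density map: one normalised Gaussian per annotated head.

    points: list of (row, col) head positions.
    Each Gaussian is renormalised over the image grid, so the map sums
    to the number of annotated people even near the image border.
    """
    density = [[0.0] * width for _ in range(height)]
    for pr, pc in points:
        kernel = [[math.exp(-((r - pr) ** 2 + (c - pc) ** 2) / (2 * sigma ** 2))
                   for c in range(width)] for r in range(height)]
        total = sum(sum(row) for row in kernel)
        for r in range(height):
            for c in range(width):
                density[r][c] += kernel[r][c] / total
    return density


heads = [(2, 3), (10, 12), (15, 1)]
dmap = gaussian_density_map(heads, height=16, width=16, sigma=2.0)
print(round(sum(sum(row) for row in dmap), 6))  # 3.0
```

Because every kernel is renormalised over the grid, heads near the border still contribute exactly one person each. Using a single fixed `sigma` for every head regardless of local crowd density is the limitation the paper targets with density-adaptive maps.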
