The development of automated methods capable of detecting and localizing actions is crucial for a variety of applications, ranging from surveillance and autonomous driving to content moderation. This thesis focuses on creating action detection methods that deliver robust performance. At the heart of these methods’ robustness lie two fundamental elements: the detection of atomic actions and the ability for compositional understanding. Atomic actions are those that are identifiable from a single image or a short video. In this research, we developed innovative methods to detect and localize such actions that achieve state-of-the-art performance. The key strength of these methods lies in their ability to refine visual features both spatially and semantically, enabling precise identification of action-specific regions. For scalability, we further developed a multi-branch network to recognize new compositions of objects and actions. Our design ensures that each branch learns decoupled features, allowing the network to transfer previously learned concepts to identify new compositions. This approach outperforms existing methods by a good margin, as our extensive experiments on benchmark datasets demonstrate. Further, correctly identifying the attributes of the objects participating in actions helps to detect unknown compositions. Therefore, we have created a network that uses spatially localized learning to correctly associate objects and attributes. This network achieves state-of-the-art performance in object-attribute association on cluttered scenes. The methods developed in this thesis perform robust action detection at scale and serve as a base for numerous future applications.
ISBN (print): 9798350376975; 9798350376968
Recent technological advances in Virtual Reality (VR) and Augmented Reality (AR) enable users to experience a high-quality virtual world. When VR is used to experience the virtual world, the user's entire view becomes the virtual world, and the user's physical movement is generally limited because the user cannot see the surrounding situation in the real world. When AR is used, special sensors such as LiDAR are generally needed to detect the real space and superimpose the virtual world on it. However, it is difficult for devices without such special sensors to detect real space and superimpose a virtual world at an appropriate position. This study proposes two methods for replacing the background: a method using depth estimation and a method using semantic segmentation. This study also confirmed that the system can be used with sufficient removal accuracy and response time when an image size appropriate to the environment is chosen, and that a safe and highly immersive virtual world experience can be achieved.
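The depth-estimation variant of background replacement can be illustrated with a minimal sketch. It assumes a per-pixel depth map has already been produced by some estimator, which the abstract does not specify. The function name `replace_background`, the threshold `max_depth_m`, and the array shapes are hypothetical and chosen only for illustration.

```python
import numpy as np

def replace_background(frame, depth, virtual, max_depth_m):
    """Keep pixels nearer than max_depth_m; show the virtual scene elsewhere.

    frame, virtual: (H, W, 3) uint8 images (assumed layout).
    depth: (H, W) float array of estimated depth in metres (assumed units).
    """
    if frame.shape != virtual.shape or frame.shape[:2] != depth.shape:
        raise ValueError("frame, virtual and depth must share height and width")
    foreground = depth < max_depth_m              # boolean (H, W) mask
    # Broadcast the mask over the colour channels and pick per pixel.
    return np.where(foreground[..., None], frame, virtual)

# Toy 2x2 example: the left column is near the camera, the right column is far.
frame = np.full((2, 2, 3), 200, dtype=np.uint8)
virtual = np.zeros((2, 2, 3), dtype=np.uint8)
depth = np.array([[0.5, 3.0],
                  [1.0, 4.0]])
out = replace_background(frame, depth, virtual, max_depth_m=2.0)
```

A semantic-segmentation variant would differ only in how the `foreground` mask is obtained; the compositing step would stay the same.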
ISBN (print): 9781665485296
Because the wiper arm has an uneven, curved surface, researchers have had difficulty illuminating it evenly for appearance defect detection using machine vision. As a result, some defects, especially those located at the edge of the region of interest (ROI), were missed. In this paper, the ROI was widened by stitching two sequential images together using Laplacian pyramids. A genetic algorithm was then used to enhance the important features of the defects using the best fitness value, parent mating, crossover and mutation. The algorithm was able to reduce the effect of uneven illumination through repeated regeneration. The resulting image was converted to binary for defect identification, and defects were localized according to their contours. Experimental results showed 90.5% accuracy.
ISBN (digital): 9798331506520
ISBN (print): 9798331506537
Recent studies point to an accuracy gap between humans and Artificial Neural Network (ANN) models when classifying blurred images, with humans outperforming ANNs. To bridge this gap, we introduce a spectral channel-based range-constrained entropy merit function, from which we devise a zero-phase, circularly symmetric blind deblurring method. We apply it as a pre-processing step for image classification and test it using pre-trained classification models and images blurred by Gaussian kernels. We compare our method to state-of-the-art restoration methods and show that it outperforms them, effectively bridging the machine-human gap for most models and blur levels. Our results also rank higher than the competitors in no-reference and full-reference image quality metrics. Notwithstanding the limitation to zero-phase blur, this work shows that, for image pre-processing aimed at visual tasks, it may be advantageous to use merit functions based on vision science and information theory, rather than on the expected error relative to the latent image.
A system for determining the distance from the robot to the scene is useful for object tracking, and 3-D reconstruction may be desired for many manufacturing and robotic tasks. While the robot is processing material...
The traffic density on roads has been increasing rapidly for the past few decades, which has in turn been reflected in the increase in traffic violations and accidents. Official reports from various governments and pr...
ISBN (print): 9783031288159
The proceedings contain 14 papers. The special focus in this conference is on Context-Aware Systems and Applications. The topics include: Prediction of Chaotic Time Series Based on LSTM, Autoencoder and Chaos Theory; An Approach to Selecting Students Taking Provincial and National Excellent Student Exams; Safe Interaction Between Human and Robot Using Vision Technique; Application of the Image Processing Technique for Powerline Robot; Collaborative Recommendation with Energy Distance Correlation; Blockchain Model in Industrial Pangasius Farming; Multiple-Criteria Rating Recommendation with Ordered Weighted Averaging Aggregation Operators; A Survey of On-Chip Hybrid Interconnect for Multicore Architectures; A Framework for Brain-Computer Interfaces Closed-Loop Communication Systems; Identification of Abnormal Cucumber Leaves Image Based on Recurrent Residual U-Net and Support Vector Machine Techniques; Lung Lesion Images Classification Based on Deep Learning Model and Adaboost Techniques; Balltree Similarity: A Novel Space Partition Approach for Collaborative Recommender Systems.
ISBN (print): 9783031189098; 9783031189104
Successful applications of deep learning often depend on large amounts of training data. However, in practical image recognition tasks, available training data are often limited or imbalanced across classes, causing over-fitting or prediction bias during model training. In this paper, prior knowledge about the relationships between image classes, drawn from word embedding models developed in natural language processing, is used to help train more generalizable classifiers when training data are limited or class-imbalanced. Such inter-class relational knowledge is captured in the word embedding vectors for the textual names of image classes. Using these word embedding vectors as soft labels for the corresponding image classes, the feature extractor of a deep learning model can be guided to extract visual features that contain both class-specific and class-shared information. Experiments on multiple image classification datasets confirm that the proposed learning framework helps improve model performance when training data are limited or class-imbalanced.
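The idea of using class-name word vectors as soft labels can be sketched as follows. This is an illustrative assumption, not the paper's exact formulation: the abstract does not state the loss, so cosine distance is used here, and the function names, the three-dimensional toy vectors, and the class names in the comments are all hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def embedding_loss(features, class_embeddings, labels):
    """Mean cosine distance between each feature and its class's word vector."""
    f = l2_normalize(features)
    e = l2_normalize(class_embeddings)[labels]   # target vector per sample
    return float(np.mean(1.0 - np.sum(f * e, axis=1)))

def predict(features, class_embeddings):
    """Assign each feature to the class whose word vector is most similar."""
    sims = l2_normalize(features) @ l2_normalize(class_embeddings).T
    return sims.argmax(axis=1)

# Toy word vectors: the first two classes are deliberately close to each other,
# mimicking related class names; the third is unrelated.
class_embeddings = np.array([[1.0, 0.0, 0.2],    # e.g. "cat" (hypothetical)
                             [0.9, 0.1, 0.3],    # e.g. "dog" (hypothetical)
                             [0.0, 1.0, 0.0]])   # e.g. "truck" (hypothetical)
features = np.array([[2.0, 0.0, 0.4],            # parallel to class 0
                     [0.0, 3.0, 0.0]])           # parallel to class 2
labels = np.array([0, 2])

loss = embedding_loss(features, class_embeddings, labels)
preds = predict(features, class_embeddings)
```

Because related class names have nearby word vectors, a feature pulled toward one class's vector is also pulled partly toward its neighbours, which is one way class-shared information could enter the learned features.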
ISBN (digital): 9783031064272
ISBN (print): 9783031064272; 9783031064265
With the wide application of machine learning algorithms, machine learning security has become a significant issue. Most machine learning algorithms, including cutting-edge deep neural networks, are vulnerable to adversarial perturbations. Standard defence techniques based on adversarial training need to generate adversarial examples during the training process, which incurs a high computational cost. This paper proposes a novel defence method using self-adaptive logit balancing and Gaussian noise boost training. This method can improve the robustness of deep neural networks without high computational cost and achieves competitive results compared with adversarial training methods. In addition, this defence method gives deep learning systems both proactive and reactive defence during operation. A sub-classifier is trained to determine whether the system is under attack and to detect attack algorithms via the patterns of the Log-Softmax values. It achieves high accuracy in distinguishing clean inputs from adversarial examples created by seven attack methods.
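Two generic building blocks named in this abstract, Gaussian noise augmentation and Log-Softmax values, can be sketched as below. This is only a sketch under stated assumptions: the abstract gives no details of the self-adaptive logit balancing or the sub-classifier, so neither is implemented here, and the function names, the noise level `sigma`, and the [0, 1] pixel range are hypothetical.

```python
import numpy as np

def add_gaussian_noise(images, sigma, rng):
    """Perturb inputs with zero-mean Gaussian noise, keeping pixels in [0, 1]."""
    noisy = images + rng.normal(0.0, sigma, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

rng = np.random.default_rng(0)
images = rng.random((4, 8, 8))                 # toy batch, values in [0, 1)
noisy = add_gaussian_noise(images, sigma=0.1, rng=rng)

logits = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 1000.0, 1000.0]])  # large values test stability
log_probs = log_softmax(logits)
```

In a setup like the one described, vectors such as `log_probs` would be the input features of a separate attack detector; how that detector is trained is not specified in the abstract.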
ISBN (print): 9783031243660; 9783031243677
In the realm of computer vision, the term "autonomous driving" has become a buzzword. The main goal of autonomous driving is to reduce human effort while driving. However, measuring distance raises numerous obstacles, both in terms of equipment and approach. The use of cameras to measure the distance of an object is practical and popular for obstacle avoidance and navigation. This work focuses on measuring the distance from a vehicle to traffic signs and cars, which is a critical task in the image processing domain. In this research, the suggested system employs two cameras installed at the front of the host vehicle to obtain the data and estimate distance. The proposed pipeline starts with the YOLOv3 and YOLOv2 algorithms for detecting traffic signs and cars in the video frames. The distances of the detected objects are measured using the triangle similarity approach. In the final phase, lane segmentation and grid marking are added along with these results. As a result, the system will assist drivers in making decisions before reaching signs, potentially resulting in improved safety.
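The triangle similarity approach mentioned above is commonly written as a two-step calculation: calibrate a focal length in pixels from one reference image, then invert the same relation for new detections. The sketch below assumes the real-world width of the object is known and that the detector's bounding-box width is used as the apparent width; the function names and the numeric values are hypothetical and not taken from the paper.

```python
def focal_length_px(known_distance_m, known_width_m, width_px):
    """Calibrate the focal length (in pixels) from a reference image.

    Similar triangles give: width_px / focal = known_width_m / known_distance_m.
    """
    if known_width_m <= 0 or width_px <= 0:
        raise ValueError("widths must be positive")
    return width_px * known_distance_m / known_width_m

def distance_m(focal_px, known_width_m, width_px):
    """Estimate distance to an object of known real width from its pixel width."""
    if width_px <= 0:
        raise ValueError("width_px must be positive")
    return known_width_m * focal_px / width_px

# Calibration: a car assumed to be 1.8 m wide appears 144 px wide at 10 m.
focal = focal_length_px(known_distance_m=10.0, known_width_m=1.8, width_px=144.0)

# Later frame: the same kind of car is detected with a 72 px wide bounding box.
# Half the pixel width implies roughly double the distance.
estimated = distance_m(focal, known_width_m=1.8, width_px=72.0)
```

The estimate is only as good as the assumed object width and the tightness of the bounding box, so in practice a per-class width table (for example, one value for cars and one for each sign type) would be needed.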