ISBN (digital): 9798350317060
ISBN (print): 9798350317077
One of the most interesting and challenging research focuses in pattern recognition and image processing to emerge in recent years is writing in the air. In many different applications, it can improve the interface between a machine and a human and offers a substantial contribution to the development of automated operations. In the field of computer vision, object tracking is considered a key challenge. The method of analyzing a video usually consists of three primary steps: recognizing the object, tracking its movement from frame to frame, and finally evaluating its behavior. Choosing an adequate object representation, selecting tracking features, identifying objects, and tracking them are the four problems taken into consideration for object tracking. Object tracking algorithms are widely used in many real-world applications, including autonomous surveillance, video indexing, and vehicle navigation. This work exploits this gap by developing a motion-to-text converter that may be used as software for wearable intelligent devices that allow writing in the air. The proposed work acts as a recorder of rare gestures. Computer vision is used to track the finger's path, and with the generated text, messages, emails, and other kinds of correspondence can be sent. It will enable effective communication for the deaf. Keywords: object, emojis, image color, camera
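The abstract does not specify how the finger's path is tracked, so the following is only a minimal sketch of one common approach: segment a colored fingertip marker with an HSV mask and accumulate its trajectory on a drawing canvas. The HSV range, camera index, and minimum blob radius are illustrative assumptions, not values from the paper.

```python
import cv2
import numpy as np

LOWER_HSV = np.array([100, 150, 50])    # assumed blue marker range
UPPER_HSV = np.array([140, 255, 255])

cap = cv2.VideoCapture(0)
canvas, points = None, []

while True:
    ok, frame = cap.read()
    if not ok:
        break
    if canvas is None:
        canvas = np.zeros_like(frame)

    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER_HSV, UPPER_HSV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    if contours:
        c = max(contours, key=cv2.contourArea)        # largest blob = fingertip marker
        (x, y), r = cv2.minEnclosingCircle(c)
        if r > 5:                                     # ignore tiny noise blobs
            points.append((int(x), int(y)))
            if len(points) > 1:
                cv2.line(canvas, points[-2], points[-1], (255, 255, 255), 3)

    cv2.imshow("air-writing", cv2.add(frame, canvas))
    if cv2.waitKey(1) & 0xFF == ord("q"):             # press q to stop
        break

cap.release()
cv2.destroyAllWindows()
```

The accumulated point list would then be passed to a character-recognition stage to produce the text described in the abstract.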
Because of the subtle differences between the sub-categories of common visual categories such as bird species, fine-grained classification has been seen as a challenging task for many years. Most previous works focus on features in a single discriminative region in isolation, while neglecting the connections between the different discriminative regions in the whole image. However, the relationships between discriminative regions contain rich posture information, and by adding posture information the model can learn the behavior of the object, which helps improve classification performance. In this paper, we propose a novel fine-grained framework named PMRC (posture mining and reverse cross-entropy), which can be combined with different backbones to good effect. In PMRC, we use the Deep Navigator to generate discriminative regions from the images and then use them to construct a graph. We aggregate the graph by message passing and obtain the classification results. Specifically, in order to force PMRC to learn how to mine posture information, we design a novel training paradigm in which the Deep Navigator and message passing communicate and are trained together. In addition, we propose the reverse cross-entropy (RCE) and demonstrate that, compared to the cross-entropy (CE), RCE not only improves the accuracy of our model but also generalizes to improve the accuracy of other kinds of fine-grained classification models. Experimental results on benchmark datasets confirm that PMRC achieves state-of-the-art performance.
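The abstract does not give the exact formulation of its reverse cross-entropy, so the sketch below follows the common definition in which the roles of the prediction and the label distribution are swapped, with the log of the zero entries of the one-hot label clipped to a constant A. The clipping constant and this formulation are assumptions, not necessarily the paper's RCE.

```python
import torch
import torch.nn.functional as F

def reverse_cross_entropy(logits: torch.Tensor, targets: torch.Tensor, A: float = -4.0) -> torch.Tensor:
    """RCE = -sum_k p(k) * log q_label(k), with log 0 on the one-hot label clipped to A."""
    pred = F.softmax(logits, dim=1)                             # model distribution p(k)
    label = F.one_hot(targets, logits.size(1)).float()          # one-hot label distribution q_label
    log_label = torch.where(label > 0,
                            torch.zeros_like(label),            # log 1 = 0 on the true class
                            torch.full_like(label, A))          # clipped log 0 elsewhere
    return -(pred * log_label).sum(dim=1).mean()

# Toy usage: 4 samples, 10 classes.
logits = torch.randn(4, 10)
targets = torch.tensor([1, 3, 5, 7])
loss = reverse_cross_entropy(logits, targets)
```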
ISBN (print): 9789811979811
The proceedings contain 56 papers. The special focus in this conference is on Mobile Radio Communications and 5G Networks. The topics include: Predictive Analysis of Air Pollutants Using Machine Learning; Machine Learning Techniques Applied to Land Use—Land Cover (LULC) Image Classification: Research Avenues, Challenges with Issues; Crime Analysis Using Computer Vision Approach with Machine Learning; Natural Language Processing Implementation for Sentiment Analysis on Tweets; VLSI Implementation of BCH Encoder with Triple DES Encryption for Baseband Transceiver; Design and Implementation of Image De-hazing Using Histogram Equalization; Improved Hybrid Unified Power Flow Controller Using Fractional Order PID Controlled Systems; Optimized Activation Function-Based SAR Ship Detection; Framework for Implementation of Smart Driver Assistance System Using Augmented Reality; Power-Efficient Hardware Design of ECC Algorithm on High Performance FPGA; A Creative Domain of Blockchain Application: NFTs; Sign Language Recognition Using Machine Learning; An IoT-Based Health Monitoring System for Stress Detection in Human Beings; IoT-Based Driving Pattern Analysis and Engine Sensor Damage Prediction Using Onboard Diagnostics; Solving the Element Detecting Problem in Graphs via Quantum Walk Search Algorithm (QWSA); Critical Analysis of Secure Strategies Against Threats on Cloud Platform; Pareto Optimal Solution for Fully Fuzzy Bi-criteria Multi-index Bulk Transportation Problem; Automatic Candidature Selection by Artificial Natural Language Processing; Elimination and Restoring Deduplicated Storage for Multilevel Integrated Approach with Cost Estimation; Vulnerability Assessment of Cryptocurrency Wallet and Exchange Websites; Greedy Theory Using Improved Performance Prim's Algorithm, Big Bang Speedup of the Bellman–Ford Algorithm; Performance Analysis of High-Speed Optical Communication Systems Under the Impact of Four Wave Mixing.
Computer vision [1], [2], [3], [4], [5] studies the properties of machine vision, its semantic understanding, and general manipulations by Intelligent Mathematics (IM) [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17]. Computer vision has been studied from various aspects such as algorithmic methods, analysis methods, pattern recognition, and neural-network-regression (AI) technologies [2], [3]. However, there is a lack of fundamental theories for enabling autonomous image recognition and processing by machines. Basic research on contemporary IM has revealed that formal manipulations of visual objects by intelligent machines may be rigorously implemented by Image Frame Algebra (IFA) [8], [18] in the front end and Visual Semantic Algebra (VSA) [19] in the back end. IFA formally manipulates visual images as general 2D matrices by a set of algebraic operators such as modeling, analysis, synthesis, feature elicitation, and pattern recognition [4], [5], [18]. Its counterpart, VSA, then transforms the geometric relations of visual objects into their semantic interpretations by algebraic analyses and compositions. The coherent theory of IFA and VSA provides a formal methodology for machine-enabled image processing and comprehension. This keynote presents a theoretical framework of machine vision underpinned by IFA and VSA for the structural denotation of visual objects and the functional manipulation of visual mechanisms [3], [8], [9]. It demonstrates how the persistent challenges to machine vision may be rigorously and efficiently solved by the IFA/VSA methodology. Case studies on applying IFA/VSA for rigorous visual pattern detection, recognition, analysis, and composition in the real world will be demonstrated [5], [18], [20]. As two coherent paradigms of IM, among others [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], IFA and VSA have been applied not only in robot visual and spatial reasoning, but also in computational intelligence and AI for rigorously
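The IFA/VSA formalism itself is defined in the cited references; the toy sketch below only illustrates the general idea of treating image frames as 2D matrices and acting on them with simple algebraic operators. The operator names and their implementations here are illustrative assumptions, not the IFA operators.

```python
import numpy as np

def synthesize(frame_a: np.ndarray, frame_b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Toy 'synthesis': blend two equally sized grayscale frames as a weighted matrix sum."""
    return np.clip(alpha * frame_a + (1 - alpha) * frame_b, 0, 255).astype(np.uint8)

def elicit_edges(frame: np.ndarray) -> np.ndarray:
    """Toy 'feature elicitation': horizontal gradient magnitude of the frame."""
    grad = np.abs(np.diff(frame.astype(np.int16), axis=1))
    return np.pad(grad, ((0, 0), (0, 1)), mode="edge").astype(np.uint8)

a = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
b = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
edges = elicit_edges(synthesize(a, b))
```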
Data augmentation is crucial to solve few-sample issues in industrial inspection based on deep learning. However, current industrial data augmentation methods have not yet demonstrated on-par ability in the synthesis of complex defects with pixel-level annotations. This paper proposes a new defect synthesis framework to fill the gap. Firstly, DCDGANc (Diversified and multi-class Controllable Defect Generation Adversarial Networks based on constant source images) is proposed, which employs class labels to construct source inputs that control the category and random codes to generate diversified styles of defects. DCDGANc can generate defect content images with pure backgrounds, which avoids the influence of non-defect information and makes it easy to obtain binary masks by segmentation. Secondly, Poisson blending is improved to avoid content loss when blending generated defect contents into normal backgrounds. Finally, the complete defect samples and accurate pixel-level annotations are obtained by fine image processing. Experiments are conducted to verify the effectiveness of our work on wood, fabric, metal, and marble. The results show that our methods yield significant improvement in the segmentation performance of industrial products. Moreover, our work enables zero-shot inspection by facilitating defect transfer between datasets with different backgrounds but similar defects, which can greatly reduce the cost of data collection in industrial inspection.
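For orientation, the sketch below shows only the baseline step the paper improves on: blending a generated defect patch into a defect-free background with standard Poisson (seamless) blending via OpenCV. The file names and the placement at the image centre are placeholders; the paper's improved blending is not reproduced here.

```python
import cv2
import numpy as np

background = cv2.imread("normal_surface.png")            # defect-free sample (placeholder path)
defect = cv2.imread("generated_defect.png")              # defect content on a pure background
mask = cv2.imread("defect_mask.png", cv2.IMREAD_GRAYSCALE)

# Binarize the mask obtained from the pure-background defect image.
_, mask_bin = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)

# Place the defect at the centre of the background (arbitrary choice for illustration).
center = (background.shape[1] // 2, background.shape[0] // 2)
blended = cv2.seamlessClone(defect, background, mask_bin, center, cv2.NORMAL_CLONE)

cv2.imwrite("synthetic_defect_sample.png", blended)
```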
In today's world, machine learning, artificial intelligence, IoT, deep learning and several other techniques have become the need of the moment. One such division of artificial intelligence is computer vision. The main goal of computer vision development is to create paradigms for extracting data and information from images. It has various applications in the fields of industry, agriculture, automation, healthcare, e-commerce and much more. The study examines the most recent developments and conceptual frameworks governing the progress of computer vision, with a focus on pattern recognition and image processing, using a variety of applications from the field. This article attempts to discuss the most current results and applications in computer vision.
In this paper, we propose a novel transfer-based targeted attack method that optimizes the adversarial perturbations without any extra training effort for auxiliary networks on training data. Our new attack method is based on the observation that highly universal adversarial perturbations tend to be more transferable for targeted attacks. Therefore, we propose to make the perturbation agnostic to different local regions within one image, which we call self-universality. Instead of optimizing the perturbations on different images, optimizing on different regions to achieve self-universality eliminates the need for extra data. Specifically, we introduce a feature similarity loss that encourages the learned perturbations to be universal by maximizing the feature similarity between adversarially perturbed global images and randomly cropped local regions. With the feature similarity loss, our method makes the features from adversarial perturbations more dominant than those of benign images, hence improving targeted transferability. We name the proposed attack method the Self-Universality (SU) attack. Extensive experiments demonstrate that SU can achieve high success rates for transfer-based targeted attacks. On the ImageNet-compatible dataset, SU yields an improvement of 12% compared with existing state-of-the-art methods. Code is available at https://***/zhipengwei/Self-Universality.
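The abstract points to the released code for exact details; the fragment below is only a hedged sketch of the feature similarity term: maximize cosine similarity between intermediate features of the globally perturbed image and a randomly cropped, resized local region. The feature layer, crop scale, and loss sign convention are assumptions for illustration.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms as T

def feature_similarity_loss(model_features, adv_image: torch.Tensor) -> torch.Tensor:
    """model_features: callable mapping an image batch to intermediate features."""
    crop = T.RandomResizedCrop(adv_image.shape[-1], scale=(0.1, 0.7))  # assumed crop scale
    local = crop(adv_image)                       # random local region, resized back to full size
    f_global = model_features(adv_image).flatten(1)
    f_local = model_features(local).flatten(1)
    # Negative cosine similarity: minimizing this loss maximizes the similarity.
    return -F.cosine_similarity(f_global, f_local, dim=1).mean()
```

In practice this term would be added to the usual targeted-attack objective while iteratively updating the perturbation.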
The missing modality issue is critical but non-trivial to be solved by multi-modal models. Current methods aiming to handle the missing modality problem in multi-modal tasks either deal with missing modalities only during evaluation or train separate models to handle specific missing modality settings. In addition, these models are designed for specific tasks, so, for example, classification models are not easily adapted to segmentation tasks and vice versa. In this paper, we propose the Shared-Specific Feature Modelling (ShaSpec) method that is considerably simpler and more effective than competing approaches that address the issues above. ShaSpec is designed to take advantage of all available input modalities during training and evaluation by learning shared and specific features to better represent the input data. This is achieved with a strategy that relies on auxiliary tasks based on distribution alignment and domain classification, in addition to a residual feature fusion procedure. Also, the design simplicity of ShaSpec enables its easy adaptation to multiple tasks, such as classification and segmentation. Experiments are conducted on both medical image segmentation and computer vision classification, with results indicating that ShaSpec outperforms competing methods by a large margin. For instance, on BraTS2018, ShaSpec improves the SOTA by more than 3% for enhancing tumour, 5% for tumour core and 3% for whole tumour. This work received funding from the Australian Government through the Medical Research Futures Fund: Primary Health Care Research Data Infrastructure Grant 2020 and from Endometriosis Australia. G.C. was supported by the Australian Research Council through grant FT190100525.
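As a rough sketch of the shared/specific idea described above, each available modality can be passed through a modality-specific encoder plus a shared encoder, with the two combined by a residual fusion layer. The dimensions, the simple averaging across modalities, and the way a missing modality is skipped are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SharedSpecificBlock(nn.Module):
    def __init__(self, in_dim: int = 256, feat_dim: int = 128, n_modalities: int = 2):
        super().__init__()
        self.shared = nn.Linear(in_dim, feat_dim)                     # shared across modalities
        self.specific = nn.ModuleList([nn.Linear(in_dim, feat_dim)    # one branch per modality
                                       for _ in range(n_modalities)])
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)                 # residual fusion layer

    def forward(self, inputs):
        """inputs: list with one tensor per modality, or None when that modality is missing."""
        fused = []
        for m, x in enumerate(inputs):
            if x is None:
                continue                                              # missing modality: skip its branch
            sh, sp = self.shared(x), self.specific[m](x)
            fused.append(sh + self.fuse(torch.cat([sh, sp], dim=-1)))  # residual fusion
        return torch.stack(fused, dim=0).mean(dim=0)                  # aggregate available modalities

block = SharedSpecificBlock()
out = block([torch.randn(4, 256), None])   # toy usage: second modality missing
```

The auxiliary distribution-alignment and domain-classification losses mentioned in the abstract would be applied on top of these shared and specific features.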
Existing studies indicate that deep neural networks (DNNs) can eventually memorize the label noise. We observe that the memorization strength of DNNs towards each instance is different and can be represented by the confidence value, which becomes larger and larger during the training process. Based on this, we propose a Dynamic Instance-specific Selection and Correction method (DISC) for learning from noisy labels (LNL). We first use a two-view-based backbone for image classification, obtaining confidence for each image from two views. Then we propose a dynamic threshold strategy for each instance, based on the momentum of each instance's memorization strength in previous epochs, to select and correct noisy labeled data. Benefiting from the dynamic threshold strategy and two-view learning, we can effectively group each instance into one of three subsets (i.e., clean, hard, and purified) based on the prediction consistency and discrepancy between the two views at each epoch. Finally, we employ different regularization strategies to conquer subsets with different degrees of label noise, improving the whole network's robustness. Comprehensive evaluations on three controllable and four real-world LNL benchmarks show that our method outperforms the state-of-the-art (SOTA) methods in leveraging useful information in noisy data while alleviating the pollution of label noise. Code is available at https://***/JackYFL/DISC.
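The released code carries the full method; the fragment below is only a hedged sketch of the per-instance dynamic threshold: keep an exponential moving average of each sample's confidence (its memorization strength) and treat a sample as clean when its current confidence exceeds that moving average. The momentum value and the selection rule are assumptions.

```python
import torch

class DynamicThreshold:
    def __init__(self, num_samples: int, momentum: float = 0.9):
        self.momentum = momentum
        self.ema_conf = torch.zeros(num_samples)   # per-instance moving average of confidence

    def update_and_select(self, indices: torch.Tensor, confidences: torch.Tensor) -> torch.Tensor:
        """Return a boolean mask marking samples currently considered clean."""
        clean_mask = confidences > self.ema_conf[indices]
        self.ema_conf[indices] = (self.momentum * self.ema_conf[indices]
                                  + (1 - self.momentum) * confidences)
        return clean_mask

# Toy usage: a mini-batch of 4 samples from a dataset of 1000.
selector = DynamicThreshold(num_samples=1000)
mask = selector.update_and_select(torch.tensor([3, 17, 42, 99]),
                                  torch.tensor([0.9, 0.2, 0.7, 0.5]))
```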
Many companies still rely on manual data entry methods for managing their invoices. Some of these companies deal with a high volume of invoices in various formats daily, resulting in time-consuming processes and resource wastage. To address this issue, a proposal is made to implement an efficient automated invoice processing system using deep learning. This system aims to reduce workload and enhance productivity for companies. In addition, a comprehensive review and comparison of existing techniques and similar systems has been conducted to identify the most suitable solution for this scenario. The proposed work utilizes advanced deep learning computer vision techniques: a simple Convolutional Neural Network (CNN) based on an RPN and LeNet-5 is used to detect and classify text objects on invoice documents. This paper utilized scanned invoices to assess the system's performance, using a dataset of 1000 scanned English invoices from the Scanned Receipts OCR and Information Extraction (SROIE) dataset. The system predicts and extracts specific regions such as invoice number, date, payer information, and total amount from the invoices. However, it has been observed that low-resolution and unclear invoices can negatively impact the accuracy of OCR (Optical Character Recognition) pattern-matching methods. To mitigate this issue, an image pre-processing method has been incorporated, which reduces image noise and corrects page skew to achieve better performance.
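A minimal sketch of the pre-processing step described above, assuming non-local-means denoising and a minAreaRect-based skew estimate; these are common choices, not necessarily the exact pipeline used in the paper, and the file path is a placeholder.

```python
import cv2
import numpy as np

def preprocess_invoice(path: str) -> np.ndarray:
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    denoised = cv2.fastNlMeansDenoising(gray, h=10)            # reduce scan noise

    # Estimate skew from the minimum-area rectangle around the text pixels.
    _, binary = cv2.threshold(denoised, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    coords = np.column_stack(np.where(binary > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    if angle > 45:
        angle -= 90                                             # map OpenCV's angle range to a small skew

    h, w = gray.shape
    rot = cv2.getRotationMatrix2D((w // 2, h // 2), angle, 1.0)
    return cv2.warpAffine(denoised, rot, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)

deskewed = preprocess_invoice("invoice_scan.png")
```

The cleaned, deskewed image would then be passed to the detection and OCR stages of the proposed system.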