ISBN: 9781665464680 (print)
In the grabbing and issuing process of an unattended system for fine-particle dry bulk materials, the three-dimensional distribution of the materials is continuous, extremely irregular, and rapidly changing. The selection of the grabbing area affects both grabbing safety and grabbing effect. In extreme cases of three-dimensional distribution, improper selection of the grabbing area may lead to accidents such as the grab overturning or the closing rope leaving its groove, as well as a poor grabbing effect or unsatisfactory grab efficiency. Based on the distribution characteristics of dry bulk materials, this paper proposes a sliding window detection algorithm based on three-dimensional feature distribution. Before grabbing, the material distribution is detected and analyzed, and the features that determine whether a distribution is relatively safe and effective, such as central depressions and superelevation slopes, are identified. A Digital Elevation Model (DEM) three-dimensional feature distribution model of the whole area is established, and the sliding window is then used to detect safe and efficient grab areas. The simulation results show that the sliding window detection algorithm based on DEM three-dimensional feature distribution can predict a more efficient and safer grab area according to the distribution of the stacked materials, which provides a new detection method for unattended bridge crane grab engineering applications.
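The sliding-window idea over a DEM can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give its scoring rules, so the relief threshold `max_relief`, the use of mean height as the score, and the function name `best_grab_window` are all assumptions made for illustration.

```python
from statistics import mean

def best_grab_window(dem, win, max_relief):
    """Return (row, col, mean_height) of the highest acceptable window, or None.

    dem        : 2D list of surface heights (a toy Digital Elevation Model)
    win        : side length of the square sliding window, in cells
    max_relief : largest allowed (max - min) height inside a window; a
                 hypothetical stand-in for the paper's safety criteria
    """
    rows, cols = len(dem), len(dem[0])
    best = None
    for r in range(rows - win + 1):
        for c in range(cols - win + 1):
            cells = [dem[i][j] for i in range(r, r + win) for j in range(c, c + win)]
            relief = max(cells) - min(cells)
            if relief > max_relief:
                continue  # too uneven (steep slope or depression): treat as unsafe
            score = mean(cells)  # assumed proxy for how much material is available
            if best is None or score > best[2]:
                best = (r, c, score)
    return best

# Toy 4x4 DEM: a steep ridge on the right, a flat high plateau at the bottom left.
dem = [
    [1.0, 1.0, 1.2, 3.0],
    [1.0, 1.1, 1.3, 3.5],
    [2.0, 2.1, 2.2, 0.5],
    [2.0, 2.2, 2.1, 0.4],
]
result = best_grab_window(dem, win=2, max_relief=0.5)
print(result)
```

In this toy grid the windows touching the ridge are rejected for excessive relief, and the search settles on the flat, high region in the bottom rows. A real system would replace the single relief test with the feature checks the abstract mentions.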
Photoadaptive synaptic devices enable in-sensor processing of complex illumination scenes, while second-order adaptive synaptic plasticity improves learning efficiency by modifying the learning rate in a given environment. Integrating both adaptations in one phototransistor device will provide opportunities for developing highly efficient machine vision systems. Here, a dually adaptable organic heterojunction transistor is reported as a working unit in such a system; it facilitates precise contrast enhancement and improves the convergence rate under harsh lighting conditions. The photoadaptive threshold sliding originates from the bidirectional photoconductivity caused by the light intensity-dependent photogating effect. Metaplasticity is successfully implemented owing to the combination of ambipolar behavior and the charge trapping effect. By utilizing the transistor array in a machine vision system, the details and edges can be highlighted in 0.4% low-contrast images, and a high recognition accuracy of 93.8% is achieved, with the convergence rate improved by about 5 times. These results open a strategy to fully implement metaplasticity in optoelectronic devices and suggest their vision processing applications in complex lighting scenes. Organic heterojunction transistors are designed to integrate light intensity-adaptive threshold sliding and second-order adaptive metaplasticity. The unique dual adaptability enables the highlighting of 0.4% low-contrast images, and efficient recognition is achieved thanks to the learning rate changes in the backpropagation process.
This study presents a novel approach for pH estimation in buffer solutions using images of solutions prepared with Hibiscus sabdariffa L. as a natural pH indicator. The images of the solutions, each displaying distinctive colours indicative of their pH levels, were transformed into standardized 200x200-pixel images through the application of image processing techniques. Following this, a pH prediction model was constructed using the Adaptive Boosting regressor algorithm. The pH values of the training data were distributed irregularly between 0 and 14. The models were trained with 94 pictures and 1880 experimental values. In addition, a reliable pre-processing stage based on image processing techniques has been built into the model, allowing test data to be obtained in any desired environment. Noise parameters that negatively affect the prediction results were removed from the training and test data. A smartphone application based on the model has been developed and made available to everyone. This innovative methodology bridges the gap between traditional pH measurement techniques and computer vision, offering a more accessible and eco-friendly means of pH assessment. The practical applications of this research extend to various fields, including environmental monitoring, agriculture, and educational settings.
ISBN: 9798400701245 (print)
Multimodal human understanding and analysis is an emerging research area that cuts across several disciplines, such as Computer Vision (CV), Natural Language Processing (NLP), Speech Processing, Human-Computer Interaction (HCI), and Multimedia. Several multimodal learning techniques have recently shown the benefit of combining multiple modalities in image-text, audio-visual and video representation learning and various downstream multimodal tasks. At the core, these methods focus on modelling the modalities and their complex interactions by using large amounts of data, different loss functions and deep neural network architectures. However, for many Web and Social media applications, there is a need to model the human, including the understanding of human behaviour and perception. For this, it becomes important to consider interdisciplinary approaches, including social sciences, semiotics and psychology. The core is understanding various cross-modal relations, quantifying bias such as social biases, and the applicability of models to real-world problems. Interdisciplinary theories such as semiotics or gestalt psychology can provide additional insights and analysis on perceptual understanding through signs and symbols via multiple modalities. In general, these theories provide a compelling view of multimodality and perception that can further expand computational research and multimedia applications on the Web and Social media. The theme of the MUWS workshop, multimodal human understanding, includes various interdisciplinary challenges related to social bias analyses, multimodal representation learning, detection of human impressions or sentiment, hate speech, sarcasm in multimodal data, multimodal rhetoric and semantics, and related topics. The MUWS workshop will be an interactive event and include keynotes by relevant experts, poster and demo sessions, research presentations and discussion.
Multilevel thresholding plays a crucial role in image processing, with extensive applications in object detection, machine vision, medical imaging, and traffic control systems. It entails the partitioning of an image ...
ISBN: 9781665462839 (print)
State-of-the-art approaches in computer vision heavily rely on sufficiently large training datasets. For real-world applications, obtaining such a dataset is usually a tedious task. In this paper, we present a fully automated pipeline to generate a synthetic dataset for instance segmentation in four steps. In contrast to existing work, our pipeline covers every step from data acquisition to the final dataset. We first scrape images for the objects of interest from popular image search engines, and since we rely only on text-based queries, the resulting data comprises a wide variety of images. Hence, image selection is necessary as a second step. This approach of image scraping and selection relaxes the need for a real-world domain-specific dataset that must be either publicly available or created for this purpose. We employ an object-agnostic background removal model and compare three different methods for image selection: object-agnostic pre-processing, manual image selection and CNN-based image selection. In the third step, we generate random arrangements of the object of interest and distractors on arbitrary backgrounds. Finally, the composition of the images is done by pasting the objects using four different blending methods. We present a case study for our dataset generation approach by considering parcel segmentation. For the evaluation, we created a dataset of parcel photos that were annotated automatically. We find that (1) our dataset generation pipeline allows a successful transfer to real test images (Mask AP 86.2), (2) a very accurate image selection process - in contrast to human intuition - is not crucial and a broader category definition can help to bridge the domain gap, and (3) the usage of blending methods is beneficial compared to simple copy-and-paste. We made our full code for scraping, image composition and training publicly available at https://***/parcel2d.
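The composition step, pasting a cut-out object onto a background while recording its instance mask, can be sketched as follows. The abstract does not name its four blending methods, so plain alpha blending is used here as an assumed stand-in, and `paste_object` with all of its parameters is a hypothetical helper, not the authors' code.

```python
def paste_object(background, obj, alpha, top, left, instance_id, mask):
    """Alpha-blend `obj` into `background` in place and label it in `mask`.

    background : 2D list of grayscale pixel values (modified in place)
    obj, alpha : 2D lists of equal shape; alpha is in [0, 1] per pixel
    top, left  : paste position of the object's upper-left corner
    instance_id: integer label written to `mask` where the object dominates
    mask       : 2D list of instance labels, same shape as background
    """
    height, width = len(background), len(background[0])
    for i, (obj_row, alpha_row) in enumerate(zip(obj, alpha)):
        for j, (value, a) in enumerate(zip(obj_row, alpha_row)):
            y, x = top + i, left + j
            if not (0 <= y < height and 0 <= x < width):
                continue  # the part of the object outside the canvas is dropped
            background[y][x] = a * value + (1.0 - a) * background[y][x]
            if a > 0.5:  # assumed cut-off for counting a pixel as the object
                mask[y][x] = instance_id

# Toy example: a 2x2 object pasted into the lower-right of a 3x3 background.
composite = [[10.0] * 3 for _ in range(3)]
mask = [[0] * 3 for _ in range(3)]
obj = [[100.0, 100.0], [100.0, 100.0]]
alpha = [[1.0, 0.5], [0.0, 1.0]]
paste_object(composite, obj, alpha, top=1, left=1, instance_id=1, mask=mask)
print(composite)
print(mask)
```

Because the mask is written during pasting, the annotation comes for free with each synthetic image, which is the property that lets such a pipeline produce labelled data without manual annotation.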
In this study, an ad-hoc image processing pipeline has been developed and proposed for the purpose of semantically segmenting wheat kernel data acquired through near-infrared hyperspectral imaging (HSI). The Gaussian Mixture Model (GMM), characterized as a soft clustering method, has been employed for this task, yielding noteworthy results in both kernel and germ segmentation. A comparative analysis was conducted, wherein GMM was compared with two hard clustering methods, hierarchical clustering and k-means, as well as other common clustering algorithms prevalent in food HSI applications. Notably, GMM exhibited the highest accuracy, with a Jaccard index of 0.745, surpassing hierarchical clustering at 0.698 and k-means at 0.652. Furthermore, the spectral variations observed in wheat kernel topology can be used for semantic image segmentation, especially in the context of selecting the germ portion within the wheat kernels. These findings carry practical significance for professionals in the fields of hyperspectral imaging (HSI) and machine vision, particularly for food product quality assessment and real-time inspection.
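For reference, the Jaccard index used to compare the clustering methods is the ratio of the intersection to the union of the predicted and reference regions. A minimal sketch for binary masks follows; the toy masks and the function name `jaccard_index` are illustrative assumptions and are unrelated to the study's data.

```python
def jaccard_index(pred, truth):
    """Jaccard index |A ∩ B| / |A ∪ B| of two binary masks given as 2D lists."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly; this convention is an assumption.
    return intersection / union if union else 1.0

pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
score = jaccard_index(pred, truth)
print(score)  # 2 shared pixels out of 4 in the union -> 0.5
```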
ISBN: 9798331506520 (digital)
ISBN: 9798331506537 (print)
This research introduces "Jaddah," an innovative AI-based system for the automated detection of road infrastructure defects using advanced computer vision and machine learning techniques. The system addresses the limitations of traditional road inspection methods, which are often slow and prone to human error. Jaddah develops a mobile application that efficiently detects, classifies, and segments road defects at the pixel level. By utilizing a comprehensive dataset of high-resolution images, the model training process is significantly enhanced. The YOLOv8-seg model is implemented to achieve precise defect localization and segmentation, ensuring high accuracy in identifying and categorizing road defects. Performance metrics show an impressive 87% mAP50, demonstrating reliable defect detection. These results contribute to improved infrastructure maintenance, enhanced road safety, and greater operational efficiency.
ISBN: 9781713871088 (print)
Vision-language (VL) pre-training has recently received considerable attention. However, most existing end-to-end pre-training approaches either only aim to tackle VL tasks such as image-text retrieval, visual question answering (VQA) and image captioning that test high-level understanding of images, or only target region-level understanding for tasks such as phrase grounding and object detection. We present FIBER (Fusion-In-the-Backbone-based transformER), a new VL model architecture that can seamlessly handle both these types of tasks. Instead of having dedicated transformer layers for fusion after the uni-modal backbones, FIBER pushes multimodal fusion deep into the model by inserting cross-attention into the image and text backbones, bringing gains in terms of memory and performance. In addition, unlike previous work that is either only pre-trained on image-text data or on fine-grained data with box-level annotations, we present a two-stage pre-training strategy that uses both these kinds of data efficiently: (i) coarse-grained pre-training based on image-text data; followed by (ii) fine-grained pre-training based on image-text-box data. We conduct comprehensive experiments on a wide range of VL tasks, ranging from VQA, image captioning, and retrieval, to phrase grounding, referring expression comprehension, and object detection. Using deep multimodal fusion coupled with the two-stage pre-training, FIBER provides consistent performance improvements over strong baselines across all tasks, often outperforming methods using magnitudes more data. Code is available at https://***/microsoft/FIBER.