An autonomous driving system requires efficient image recognition to interpret the environment, detect obstacles, and make real-time decisions. This study compares Convolutional Neural Networks (CNNs) and vision Trans...
Image-based detection of human actions has recently emerged as a hot research area in the fields of computer vision and pattern recognition. It is concerned with detecting a person's actions or behavior from a sta...
Digit recognition is foundational in pattern recognition and machine learning, with applications in document processing and optical character recognition. Current research often targets English digits, overlooking la...
ISBN (digital): 9781665490627
ISBN (print): 9781665490627
Depth estimation is a long-standing problem in computer vision. Accurate depth information from cameras is essential for implementing autopilot systems. Most early studies relied on multiple observation points. Monocular depth estimation was proposed to overcome this restriction by estimating the depth of every pixel from a single image. In this paper, we propose a semi-supervised multi-task model that predicts a depth map and a semantic segmentation, exploiting the implicit relation between the two tasks by sharing parameters. Moreover, the depth maps predicted by our model compare favorably with the state of the art on the KITTI benchmark. The code we used to train and evaluate our models is available at: https://***/fyhfly/Multi-task-Depth-Estimation.
ISBN (print): 9781665448994
Training GANs in low-data regimes remains a challenge, as overfitting often leads to memorization or training divergence. In this work, we introduce One-Shot GAN, which can learn to generate samples from a training set as small as one image or one video. We propose a two-branch discriminator, with content and layout branches designed to judge the internal content separately from the scene layout realism. This allows synthesis of visually plausible, novel compositions of a scene, with varying content and layout, while preserving the context of the original sample. Compared to previous single-image GAN models, One-Shot GAN achieves higher diversity and quality of synthesis. It is also not restricted to the single-image setting, successfully learning in the introduced setting of a single video.
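The idea of judging content separately from layout can be illustrated with a small sketch. This is not the One-Shot GAN discriminator: it assumes, purely for illustration, that a "content" summary averages a feature map over spatial positions and a "layout" summary averages it over channels. The function names and the toy feature map are hypothetical.

```python
import random


def content_descriptor(features):
    """Average each channel over all spatial positions (layout is discarded)."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]


def layout_map(features):
    """Average over channels at each position (per-channel content is discarded)."""
    n_channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    return [
        [
            sum(features[c][y][x] for c in range(n_channels)) / n_channels
            for x in range(width)
        ]
        for y in range(height)
    ]


def flip_horizontally(features):
    """Mirror every channel left to right: same content, different layout."""
    return [[row[::-1] for row in channel] for channel in features]


# Toy feature map with 2 channels, 3 rows, and 4 columns (hypothetical values).
rng = random.Random(0)
features = [[[rng.random() for _ in range(4)] for _ in range(3)] for _ in range(2)]
flipped = flip_horizontally(features)

content_a = content_descriptor(features)
content_b = content_descriptor(flipped)
layout_a = layout_map(features)
layout_b = layout_map(flipped)
```

Under these assumptions, mirroring the feature map leaves the content summary unchanged up to rounding while the layout summary is mirrored, which is the kind of separation a two-branch design aims for.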
This paper proposes a deep-learning computer vision algorithm to estimate hand roll angles for metric-based assessment of surgical suturing skills. The number of rolls metric, previously calculated directly from IMU d...
ISBN (digital): 9781665490627
ISBN (print): 9781665490627
Computer vision tasks, such as image classification, semantic segmentation, and super resolution, are widely used in many applications. Recent studies revealed that machine learning-based models for computer vision tasks are vulnerable to adversarial attacks. Since adversarial attacks can disrupt computer vision models in real-world systems, many countermeasures have been proposed, such as denoising, resizing, and using a machine learning-based super resolution model as a preprocessing step. Recently, prior work demonstrated that a super resolution model used as preprocessing can be vulnerable to an adversarial attack targeted at the preprocessing itself, only when the perturbation is inactive before the preprocessing. However, we also found that a perturbation applied before the preprocessing can be another serious threat if the super resolution model is used to mitigate adversarial attacks. In this paper, we propose Layered Adversary Generation (LAG), which generates an adversarial example by recursively injecting noise into a clean image in a white-box environment. We then show that LAG is effective in attacking a semantic segmentation model even when super resolution models, with or without two auxiliary countermeasures (resizing and denoising), are adopted to mitigate adversarial attacks. Furthermore, we demonstrate that LAG is transferable across other super resolution models. Lastly, we discuss our attack method in gray-box and black-box environments and suggest a mitigation for robust preprocessing.
ISBN (print): 9781665445092
Existing image generator networks rely heavily on spatial convolutions and, optionally, self-attention blocks in order to gradually synthesize images in a coarse-to-fine manner. Here, we present a new architecture for image generators, where the color value at each pixel is computed independently given the value of a random latent vector and the coordinate of that pixel. No spatial convolutions or similar operations that propagate information across pixels are involved during the synthesis. We analyze the modeling capabilities of such generators when trained in an adversarial fashion, and observe the new generators to achieve similar generation quality to state-of-the-art convolutional generators. We also investigate several interesting properties unique to the new architecture.
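A minimal sketch of per-pixel synthesis may make the idea concrete. It is not the architecture from the paper: the two-layer network, its random untrained weights, and the names `make_generator`, `pixel_color`, and `render` are all hypothetical. The only property it is meant to show is that each pixel's color depends on the latent vector and that pixel's coordinate, with no information passed between pixels.

```python
import math
import random


def make_generator(latent_dim, hidden_dim, seed=0):
    """Build a tiny random (untrained) network mapping (z, x, y) to an RGB color."""
    rng = random.Random(seed)
    in_dim = latent_dim + 2  # latent vector plus the (x, y) coordinate
    w1 = [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.gauss(0, 1) for _ in range(hidden_dim)] for _ in range(3)]

    def pixel_color(z, x, y):
        inputs = list(z) + [x, y]
        hidden = [math.tanh(sum(w * v for w, v in zip(row, inputs))) for row in w1]
        # A sigmoid keeps each color channel strictly between 0 and 1.
        return tuple(
            1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
            for row in w2
        )

    return pixel_color


def render(pixel_color, z, height, width):
    """Evaluate every pixel independently; assumes height >= 2 and width >= 2."""
    return [
        [pixel_color(z, x / (width - 1), y / (height - 1)) for x in range(width)]
        for y in range(height)
    ]


generator = make_generator(latent_dim=4, hidden_dim=8)
z = [0.1, -0.3, 0.7, 0.2]  # one latent vector shared by all pixels
image = render(generator, z, height=4, width=5)
```

Because no pixel reads another pixel's value, any single pixel can be recomputed on its own from `z` and its coordinate, and the same generator can be evaluated on a grid of any size.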
ISBN (print): 9781665457019
Past studies have illustrated the prevalence of UI dark patterns, or user interfaces that can lead end-users toward (unknowingly) taking actions that they may not have intended. Such deceptive UI designs can be either intentional (to benefit an online service) or unintentional (through complicit design practices) and can result in adverse effects on end users, such as oversharing personal information or financial loss. While significant research progress has been made toward the development of dark pattern taxonomies across different software domains, developers and users currently lack guidance to help recognize, avoid, and navigate these often subtle design motifs. However, automated recognition of dark patterns is a challenging task, as the instantiation of a single type of pattern can take many forms, leading to significant variability. In this paper, we take the first step toward understanding the extent to which common UI dark patterns can be automatically recognized in modern software applications. To do this, we introduce AIDUI, a novel automated approach that uses computer vision and natural language processing techniques to recognize a set of visual and textual cues in application screenshots that signify the presence of ten unique UI dark patterns, allowing for their detection, classification, and localization. To evaluate our approach, we have constructed CONTEXTDP, the current largest dataset of fully-localized UI dark patterns that spans 175 mobile and 83 web UI screenshots containing 301 dark pattern instances. The results of our evaluation illustrate that AIDUI achieves an overall precision of 0.66, recall of 0.67, F1-score of 0.65 in detecting dark pattern instances, reports few false positives, and is able to localize detected patterns with an IoU score of 0.84. Furthermore, a significant subset of our studied dark patterns can be detected quite reliably (F1 score of over 0.82), and future research directions may allow for improved detection of add
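For readers unfamiliar with the metrics quoted above, the sketch below gives the standard definitions of IoU for two axis-aligned boxes and of F1 as the harmonic mean of precision and recall. The function names and the `(x1, y1, x2, y2)` box convention are assumptions made for illustration; the abstract does not state how the overall scores are aggregated across pattern types, so this shows the basic definitions only.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap along each axis, clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# A predicted box that covers half of a 10x10 ground-truth box.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
```

In this example the boxes share an area of 50 out of a combined area of 150, so `overlap` is one third.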
ISBN (print): 9781665445092
Jigsaw puzzle solving, the problem of constructing a coherent whole from a set of non-overlapping unordered fragments, is fundamental to numerous applications, and yet most of the literature has focused thus far on less realistic puzzles whose pieces are identical squares. Here we formalize a new type of jigsaw puzzle where the pieces are general convex polygons generated by cutting through a global polygonal shape with an arbitrary number of straight cuts. We analyze the theoretical properties of such puzzles, including the inherent challenges in solving them once pieces are contaminated with geometrical noise. To cope with such difficulties and obtain tractable solutions, we abstract the problem as a multi-body spring-mass dynamical system endowed with hierarchical loop constraints and a layered reconstruction process that is guided by the pictorial content of the pieces. We define evaluation metrics and present experimental results on both apictorial and pictorial puzzles to indicate that they are solvable completely automatically.
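The spring-mass abstraction can be illustrated in its simplest form. The sketch below is not the authors' solver and omits the hierarchical loop constraints and the layered reconstruction. It assumes two unit point masses in 2D, standing in for two vertices that should coincide once their pieces are mated, joined by a damped spring of zero rest length. The function name and all parameter values are hypothetical.

```python
def relax_spring(p, q, stiffness=1.0, damping=2.0, dt=0.05, steps=2000):
    """Pull two unit point masses together with a damped zero-length spring."""
    p, q = list(p), list(q)
    vp, vq = [0.0, 0.0], [0.0, 0.0]
    for _ in range(steps):
        # Semi-implicit Euler: update velocities from forces, then positions.
        for i in range(2):
            stretch = q[i] - p[i]
            force_p = stiffness * stretch - damping * vp[i]
            force_q = -stiffness * stretch - damping * vq[i]
            vp[i] += dt * force_p
            vq[i] += dt * force_q
        for i in range(2):
            p[i] += dt * vp[i]
            q[i] += dt * vq[i]
    return p, q


# Two vertices that start apart and should end up at a common point.
p_final, q_final = relax_spring((0.0, 0.0), (2.0, 4.0))
```

With equal masses and zero initial velocity, the two points settle at their midpoint, here approximately (1, 2). In a full puzzle, many such springs would act on rigid pieces at once.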