ISBN: 9781665448994 (print)
Virtually all of the deep learning literature relies on the assumption that large amounts of training data are available. Indeed, even the majority of few-shot learning methods rely on a large set of "base classes" for pre-training. This assumption, however, does not always hold. For some tasks, annotating a large number of classes can be infeasible, and even collecting the images themselves can be a challenge in some scenarios. In this paper, we study this problem and call it the "Small Data" setting, in contrast to "Big Data." To unlock the full potential of small data, we propose to augment the models with annotations for other related tasks, thus increasing their generalization abilities. In particular, we use the richly annotated scene parsing dataset ADE20K to construct our realistic Long-tail Recognition with Diverse Supervision (LRDS) benchmark by splitting the object categories into head and tail based on their distribution. Following the standard few-shot learning protocol, we use the head classes for representation learning and the tail classes for evaluation. Moreover, we further subsample the head categories and images to generate two novel settings, which we call "Scarce-Class" and "Scarce-Image," corresponding respectively to a shortage of training classes and of training images. Finally, we analyze the effect of applying various additional supervision sources under the proposed settings. Our experiments demonstrate that densely labeling a small set of images can indeed largely remedy the small data constraints. Our code and benchmark are available at https://***/BinahHu/ADE-FewShot.
ISBN: 9781665445092 (print)
Humans possess a unique social cognition capability [43, 20]; nonverbal communication can convey rich social information among agents. In contrast, such crucial social characteristics are mostly missing from the existing scene understanding literature. In this paper, we incorporate different nonverbal communication cues (e.g., gaze, human poses, and gestures) to represent, model, learn, and infer agents' mental states from pure visual inputs. Crucially, such a mental representation takes the agent's belief into account: it represents the true world state and infers the beliefs in each agent's mental state, which may differ from the true world state. By aggregating different beliefs and true world states, our model essentially forms "five minds" during the interactions between two agents. This "five minds" model differs from prior works that infer beliefs in an infinite recursion: instead, agents' beliefs converge into a "common mind" [31, 47]. Based on this representation, we further devise a hierarchical energy-based model that jointly tracks and predicts all five minds. From this new perspective, a social event is interpreted as a series of nonverbal communication and belief dynamics, which transcends the classic keyframe video summary. In the experiments, we demonstrate that using such a social account provides a better video summary on videos with rich social interactions than state-of-the-art keyframe video summary methods.
ISBN: 9781510636385 (print)
To address difficult target matching and low matching efficiency in binocular measurement, this paper proposes a real-time target feature matching algorithm based on binocular stereo vision and absolute window error minimization (CAEW, Calculate the Absolute Error Window) to improve the speed and accuracy of measurement. First, the cameras are calibrated using Zhang's calibration method, and the Bouguet algorithm is applied to the final calibration data for binocular stereo rectification. Then, the AdaBoost iterative algorithm is used to train the target detector for target recognition. The CAEW algorithm is compared with the commonly used SURF (Speeded-Up Robust Features) algorithm. The experimental evaluation shows that the CAEW algorithm achieves an evaluation score of more than 90%, a significant improvement over the SURF algorithm, and meets the needs of real-time binocular target matching.
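The abstract does not spell out how CAEW computes its window error, so the following is only a minimal sketch of the general idea of matching by minimizing an absolute error over a window. It assumes a rectified grayscale pair stored as nested lists, so that a point in the left image and its match in the right image lie on the same row. The function names `window_abs_error` and `best_disparity`, the window half-size `half`, and the search range `max_disp` are all hypothetical and are not taken from the paper.

```python
def window_abs_error(left, right, row, col_left, col_right, half):
    """Sum of absolute differences between two square windows.

    The windows are centred at (row, col_left) in the left image and
    (row, col_right) in the right image, with side length 2 * half + 1.
    """
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[row + dy][col_left + dx]
                         - right[row + dy][col_right + dx])
    return total


def best_disparity(left, right, row, col, half=1, max_disp=4):
    """Return the disparity d in [0, max_disp] with the smallest window error.

    Assumes rectified images, so the match for left pixel (row, col)
    is searched at (row, col - d) in the right image.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        col_right = col - d
        if col_right - half < 0:
            break  # the window would leave the image
        cost = window_abs_error(left, right, row, col, col_right, half)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Restricting the search to one row is what makes this kind of matching cheap; it only holds after rectification, which is why calibration and rectification come first in the pipeline described above.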
ISBN: 9781510636385 (print)
To achieve efficient, automatic, and accurate measurement, the paper takes the two-dimensional measurement of industrial glass under the experimental *** main contents of this paper include: analyzing the structure and hardware performance parameters of the system; building a measuring platform comprising a computer, a charge-coupled device (CCD) image sensor, a lens, etc.; capturing images of the glass with a high-precision camera; preprocessing the glass image data; and acquiring the edge information of the glass. The system uses a second filtering method to filter the image and the Canny operator to extract the edges of the industrial glass, transforms the computer coordinate system into the world coordinate system through a coordinate transformation, and finally calculates the two-dimensional size information of the industrial *** system measures the two-dimensional length and width of polygonal glass. The experimental results show that the measurement method in this paper meets the accuracy requirements of general industrial measurement and that the detection system is feasible.
High-speed three-dimensional (3D) measurement is increasingly important in many fields. Phase measurement profilometry (PMP) based on the binary defocusing technique has been applied to high-speed 3D measurement because of its higher measurement resolution and precision and because it breaks the speed limitations of the projector. However, because PMP needs three phase-shifting (3-PS) patterns, motion error is inevitable when measuring dynamic objects. In this research, we construct a complete high-speed 3-PS PMP system and re-derive two motion error models that are clearer than those in Weise's research [Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2007), pp. 1]. Then, we theoretically analyze the effects of the truncation error on the model accuracy, especially when the motion error is higher. To this end, a polynomial-based motion error model, obtained by fitting the coefficient matrix of a pre-simulation, is proposed. Meanwhile, its corresponding error compensation method, based on local domain estimation with the Nelder-Mead algorithm, is developed. Finally, simulations together with quantitative and qualitative experiments verify the accuracy and effectiveness of the proposed method and demonstrate that it improves on Weise's research. (c) 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
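To make the origin of the motion error concrete, here is a minimal single-pixel sketch of textbook three-step phase shifting. It assumes the three patterns are shifted by -2π/3, 0, and +2π/3, and it models motion as a constant extra phase change `drift` between consecutive captures. The intensity model, the names `capture` and `wrapped_phase`, and the parameters `a`, `b`, and `drift` are illustrative assumptions, not the paper's error models or its compensation method.

```python
import math

SHIFT = 2.0 * math.pi / 3.0  # assumed phase step between the three patterns


def capture(phi, a=0.5, b=0.4, drift=0.0):
    """Simulate the three fringe intensities seen at one pixel.

    phi   : true phase at the moment of the middle capture
    a, b  : background intensity and fringe modulation (assumed values)
    drift : extra phase change between consecutive captures caused by motion
    """
    return tuple(a + b * math.cos(phi + k * (SHIFT + drift)) for k in (-1, 0, 1))


def wrapped_phase(i1, i2, i3):
    """Standard three-step formula; exact only if the scene is static."""
    return math.atan2(math.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)


static_estimate = wrapped_phase(*capture(0.7))
moving_estimate = wrapped_phase(*capture(0.7, drift=0.05))
print(static_estimate - 0.7, moving_estimate - 0.7)
```

With `drift=0.0` the formula recovers the phase up to rounding, because the background and modulation terms cancel. With a nonzero `drift` the effective phase step is no longer 2π/3, the cancellation is imperfect, and the recovered phase is biased, which is the kind of error a compensation method has to model.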
ISBN: 9781510636385 (print)
To meet the requirements of 3D reconstruction in accuracy, reconstruction speed, and algorithm applicability, this paper proposes a Delaunay growth algorithm based on point cloud curvature smoothing. The algorithm first projects the 3D discrete point cloud onto a 2D plane and applies a 2D Delaunay triangulation, which is performed using the empty circle criterion and the max-min angle criterion. Principal component analysis (PCA) is used to estimate the normals of the 3D point cloud, and the normals are oriented to the same side to avoid disordered point cloud normals. Combined with the curvature of the corresponding 3D point cloud, this step removes the invalid normals caused by invalid points while preserving as much of the point cloud as possible. Finally, the Delaunay constraint criterion and an evaluation function filter the set of candidate points to ensure that each reconstructed triangle approximates a Delaunay triangle. The experimental results show that the proposed reconstruction algorithm performs much better than the traditional greedy projection triangulation algorithm and the Poisson algorithm, and that the reconstruction speed is increased by 20%.
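The empty circle criterion mentioned above can be stated as a small predicate: a triangle is acceptable only if no other point lies strictly inside its circumcircle. The sketch below uses the standard determinant form of that test in plain floating point. The function names are hypothetical, and the paper's growth strategy, curvature handling, and evaluation function are not reproduced here.

```python
def orientation(a, b, c):
    """Twice the signed area of triangle abc; positive if counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of triangle abc."""
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The sign convention assumes counter-clockwise order; flip it otherwise.
    if orientation(a, b, c) < 0:
        det = -det
    return det > 0


def satisfies_empty_circle(triangle, points):
    """Check the empty circle criterion for one triangle against a point set."""
    a, b, c = triangle
    return not any(in_circumcircle(a, b, c, p)
                   for p in points if p not in (a, b, c))
```

Floating-point evaluation of the determinant can misclassify points that lie almost exactly on the circle; robust triangulation codes typically use exact or adaptive-precision predicates for that case.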
ISBN: 9781510636361 (print)
A scale space-variant filter (SVF) is proposed on the basis of the Harris operator; it smooths out noise efficiently while preserving the edge information of the image. A comparison of SVF with a Gaussian filter on a step signal and on the initial image input indicates that SVF performs better than the Gaussian filter. When SVF is used to detect the feature points of an image, the experiment shows that feature points detected from the SVF output contain more edge information. Using a 2D space limitation, a Euclidean distance limitation, and an angle limitation, we can eliminate redundant feature points so that the useful feature points are distributed evenly over all regions of the image. From the results on noisy images, we conclude that the new robust feature point detector locates feature points more accurately and that the distribution of the points is more rational than without those limitations.
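As an illustration of how a Euclidean distance limitation can thin out redundant feature points, the sketch below applies a greedy rule: visit points from strongest to weakest and keep a point only if it is at least `min_dist` away from every point already kept. The `(x, y, score)` tuple layout, the function name, and the greedy ordering are assumptions made for this example; the abstract does not describe the paper's exact rule, and the 2D space and angle limitations are not covered.

```python
import math


def suppress_by_distance(points, min_dist):
    """Greedy distance-based thinning of feature points.

    points   : iterable of (x, y, score) tuples, higher score = stronger corner
    min_dist : minimum allowed Euclidean distance between two kept points
    """
    kept = []
    for x, y, score in sorted(points, key=lambda p: p[2], reverse=True):
        if all(math.hypot(x - kx, y - ky) >= min_dist for kx, ky, _ in kept):
            kept.append((x, y, score))
    return kept
```

The pairwise check is quadratic in the number of points, which is fine for a few hundred corners; a grid or k-d tree lookup would be the usual replacement for larger sets.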
Residential real estate price is one of the key components of our economic developments and has also been a major concern of the public, bank industry, government, and investors. The accurate estimation of the sale pr...
The generative adversarial network (GAN) was first proposed in 2014. This kind of network model is a machine learning system that can learn to measure a given distribution of data, and one of its most important applications is style ***transfer is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output ***-GAN is a classic GAN model, which has a wide range of scenarios in style *** its unsupervised learning characteristics, the mapping is easy to learn between an input image and an output ***, it is difficult for CYCLE-GAN to converge and generate high-quality *** order to solve this problem, spectral normalization is introduced into each convolutional kernel of the *** convolutional kernel satisfies the Lipschitz stability constraint once spectral normalization is added, and the value of the convolutional kernel is limited to [0, 1], which promotes the training process of the proposed ***, we use a pretrained model (VGG16) to control the loss of image content in the position of l1 *** avoid overfitting, an l1 regularization term and an l2 regularization term are both used in the objective loss *** terms of Fréchet Inception Distance (FID) score evaluation, our proposed model achieves outstanding performance and preserves more discriminative *** results show that the proposed model converges faster and achieves better FID scores than the state of the art.
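Spectral normalization divides a weight matrix by its largest singular value, which is usually estimated by power iteration. The sketch below shows that computation in plain Python for a small dense matrix; a convolutional kernel would first be reshaped into a 2D matrix. The helper names, the fixed all-ones starting vector, and the iteration count are assumptions for illustration, and this is not the paper's training code.

```python
import math


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def normalize(v, eps=1e-12):
    """Scale v to unit Euclidean length (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def spectral_norm(w, n_iters=50):
    """Estimate the largest singular value of w by power iteration."""
    w_t = [list(col) for col in zip(*w)]
    # Fixed start for reproducibility; practical code starts from a random vector.
    v = normalize([1.0] * len(w[0]))
    for _ in range(n_iters):
        u = normalize(matvec(w, v))
        v = normalize(matvec(w_t, u))
    return math.sqrt(sum(x * x for x in matvec(w, v)))


def spectrally_normalize(w, n_iters=50):
    """Return w divided by its estimated largest singular value."""
    sigma = spectral_norm(w, n_iters)
    return [[x / sigma for x in row] for row in w]
```

After normalization the largest singular value of the matrix is approximately 1, so the corresponding linear map cannot stretch any input vector by more than that factor. Training frameworks typically run a single power-iteration step per update and carry `u` over between steps instead of iterating to convergence each time.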
ISBN: 9781510636361 (print)
Vehicle identification is widely used in route planning, safety supervision, and military reconnaissance, and it is one of the research hotspots in space-based remote sensing applications. Traditional hand-designed features, such as HOG, Gabor features, and the Hough transform, are not suitable for modern city satellite data analysis. With the rapid development of CNNs, object detection has made remarkable progress in accuracy and speed. However, in satellite map analysis, many targets are small and dense, so detection accuracy for them is often half that for large targets, or even lower. Small targets have lower resolution, appear blurred, and carry very little information, so it is difficult to extract effective information after multiple convolution layers. In the satellite map data set we produced, the target vehicles are not only small but also very dense, and high detection accuracy cannot be achieved by training YOLO directly. To solve this problem, we propose a multi-feature fusion target detection method, which combines the satellite image and the electronic image to fuse the target vehicle with the surrounding semantic information. We conducted a comparative experiment to demonstrate the applicability of multi-feature fusion methods in different detection models such as YOLO and R-CNN. Compared with the traditional target detection model, the proposed method achieves higher detection accuracy.