A new method is proposed to detect abnormal behaviors in human group activities. This approach effectively models group activities based on social behavior analysis. Different from previous work that uses independent ...
Existing pedestrian attribute recognition (PAR) algorithms are mainly developed for static images. However, their performance is unreliable for images with challenging factors such as heavy occlusion and motion blur. In this work, we propose to recognize human attributes from video frames, which makes full use of temporal information. Specifically, we formulate video-based PAR as a vision-language fusion problem and adopt the pre-trained large model CLIP to extract feature embeddings of the given video frames. To better utilize semantic information, we take the attribute list as an additional input and transform each attribute word or phrase into a corresponding sentence via split, expand, and prompt operations. The text encoder of CLIP is then used for language embedding. The averaged visual tokens and the text tokens are concatenated and fed into a fusion Transformer for multi-modal interactive learning, and the enhanced tokens are passed to a classification head for pedestrian attribute prediction. Extensive experiments on a large-scale video-based PAR dataset validate the effectiveness of the proposed framework. Both the source code and pre-trained models will be released at https://***/Event–AHU/VTF_PAR.
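The token-fusion input described above can be sketched in plain Python. This is a minimal illustration, assuming that "averaged visual tokens" means position-wise averaging of each frame's CLIP visual tokens across frames and that visual and text tokens share one embedding dimension. The function names are hypothetical, and the CLIP encoders, the fusion Transformer, and the classification head are not modeled.

```python
from typing import List

Vector = List[float]

def average_visual_tokens(frames: List[List[Vector]]) -> List[Vector]:
    """Average visual tokens position-wise across video frames.

    frames[t][i] is the i-th visual token (a d-dimensional vector) of frame t.
    Returns one averaged token per position, i.e. shape [num_tokens][d].
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    num_tokens = len(frames[0])
    dim = len(frames[0][0])
    averaged: List[Vector] = []
    for i in range(num_tokens):
        acc = [0.0] * dim
        for frame in frames:
            for k, value in enumerate(frame[i]):
                acc[k] += value
        averaged.append([v / num_frames for v in acc])
    return averaged

def build_fusion_input(frames: List[List[Vector]], text_tokens: List[Vector]) -> List[Vector]:
    """Concatenate averaged visual tokens and text tokens along the sequence axis."""
    visual = average_visual_tokens(frames)
    if text_tokens and len(text_tokens[0]) != len(visual[0]):
        raise ValueError("visual and text tokens must share the embedding dimension")
    return visual + text_tokens
```

For two frames of two 3-dimensional tokens and one text token, the fused sequence holds the two averaged visual tokens followed by the text token, ready to be passed to a fusion encoder.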
We propose an algorithm for determining three-dimensional shape and perspective based on the response of the human visual system to changes in visual texture. Current computer vision algorithms are computationally intensive and have inherent difficulty integrating additional cues for shape determination, such as shading, contour, or motion. To develop a fast, simple mechanism that is less constrained in integrating other cues, we incorporated aspects of the physiological properties of cortical cells in V1 into a network model. We provide psychophysical evidence that the local spatial frequency spectrum is represented by the spatially averaged peak frequency (APF). After normalization, the APF measures texture compression and yields estimates of 3D shape and depth. Simulations of the model agree well with human responses to a range of textured images.
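The APF idea can be illustrated with a simplified one-dimensional sketch. It assumes a plain discrete Fourier transform in place of the model's V1-like cortical filters, takes the peak frequency of each local patch (ignoring the DC term), and averages those peaks over patches. The normalization step is assumed here to be a ratio against a reference region, and all function names are hypothetical.

```python
import math
from typing import Sequence

def peak_frequency(patch: Sequence[float]) -> float:
    """Frequency (cycles per sample) with the largest DFT magnitude, DC excluded."""
    n = len(patch)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(patch))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(patch))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k / n

def averaged_peak_frequency(signal: Sequence[float], patch_size: int, step: int) -> float:
    """Spatially average the local peak frequencies over sliding patches."""
    peaks = [peak_frequency(signal[s:s + patch_size])
             for s in range(0, len(signal) - patch_size + 1, step)]
    return sum(peaks) / len(peaks)

def compression_ratio(region: Sequence[float], reference: Sequence[float],
                      patch_size: int = 32, step: int = 16) -> float:
    """Hypothetical normalization: APF of a region relative to a reference region."""
    return (averaged_peak_frequency(region, patch_size, step)
            / averaged_peak_frequency(reference, patch_size, step))
```

A texture whose stripes are twice as dense as the reference yields a ratio of 2, which this sketch reads as twofold texture compression.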
This paper considers the problem of modeling and extracting arbitrary deformable contours from noisy images. We propose a global contour model based on a stable and regenerative shape matrix, which is invariant and unique under rigid motions. Combined with a Markov random field that models local deformations, this yields a prior distribution that exerts influence over a global model while allowing for deformations. We then cast contour extraction as posterior estimation and show its equivalence to energy minimization of a generalized active contour model. We discuss pertinent issues in shape training, minimax regularization, and initialization by the generalized Hough transform. Finally, we present experimental results and compare the model's performance with rigid template matching.
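As a point of reference for the energy-minimization view, the sketch below implements the classical discrete active contour (snake) energy, with elasticity and bending terms plus an external image term, and a greedy minimization step that moves each control point within its 3x3 neighborhood. It is not the paper's generalized model: the shape-matrix prior and the Markov random field are omitted, and the weights `alpha` and `beta` and the `external` callback are illustrative assumptions.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
External = Callable[[float, float], float]

def snake_energy(points: List[Point], external: External,
                 alpha: float = 1.0, beta: float = 1.0) -> float:
    """Energy of a closed contour: elasticity + bending + external term per point."""
    n = len(points)
    total = 0.0
    for i in range(n):
        px, py = points[i - 1]          # previous point (wraps around at i == 0)
        cx, cy = points[i]
        nx, ny = points[(i + 1) % n]    # next point (wraps around at the end)
        elastic = (cx - px) ** 2 + (cy - py) ** 2
        bend = (px - 2 * cx + nx) ** 2 + (py - 2 * cy + ny) ** 2
        total += alpha * elastic + beta * bend + external(cx, cy)
    return total

def greedy_step(points: List[Point], external: External,
                alpha: float = 1.0, beta: float = 1.0) -> List[Point]:
    """Move each point to the 3x3 neighbor position that lowers the total energy most."""
    pts = list(points)
    for i in range(len(pts)):
        best, best_e = pts[i], snake_energy(pts, external, alpha, beta)
        x, y = pts[i]
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                trial = pts[:i] + [(x + dx, y + dy)] + pts[i + 1:]
                e = snake_energy(trial, external, alpha, beta)
                if e < best_e:
                    best, best_e = (x + dx, y + dy), e
        pts[i] = best
    return pts
```

In practice `external` would typically return a negative image-gradient magnitude so that contour points are attracted to edges; because a move is accepted only when it lowers the energy, each greedy step never increases the total.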
ISBN: 9798350365474 (electronic)
ISBN: 9798350365481 (print)
In this paper, we introduce T-DEED, a Temporal-Discriminability Enhancer Encoder-Decoder for Precise Event Spotting in sports videos. T-DEED addresses multiple challenges in the task, including the need for discriminability among frame representations, high output temporal resolution to maintain prediction precision, and the necessity to capture information at different temporal scales to handle events with varying dynamics. It tackles these challenges through its specifically designed architecture, featuring an encoder-decoder for leveraging multiple temporal scales and achieving high output temporal resolution, along with temporal modules designed to increase token discriminability. Leveraging these characteristics, T-DEED achieves state-of-the-art (SOTA) performance on the FigureSkating and FineDiving datasets. Code is available at https://***/arturxe2/T-DEED.
The paper proposes a scalable wavelength-switched optical NoC, named SWS-ONoC. The proposed architecture is built upon a novel all-optical router which passively routes optical data streams based on their wavelengt...
A process is described to determine the shot accuracy of an automatic robotic pool playing system. The system comprises a ceiling-mounted gantry robot, a special purpose cue end-effector, a ceiling-mounted camera, and...
Image segmentation is an important research topic in the image processing and computer vision community. In this paper, a new unsupervised method for MR brain image segmentation is proposed based on fuzzy c-means (FCM) an...
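Because the abstract above is truncated after naming FCM, the sketch below shows only the standard fuzzy c-means iteration on scalar intensities, not the proposed method's extensions. It alternates the usual center update (a mean weighted by memberships raised to the fuzzifier `m`) with the membership update u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)). The function name `fcm_1d` and its parameter defaults are assumptions.

```python
import random
from typing import List, Tuple

def fcm_1d(data: List[float], c: int = 2, m: float = 2.0, max_iter: int = 100,
           tol: float = 1e-6, seed: int = 0) -> Tuple[List[float], List[List[float]]]:
    """Standard fuzzy c-means on scalar intensities; returns (centers, memberships)."""
    rng = random.Random(seed)
    # Random initial memberships; each row sums to 1.
    u: List[List[float]] = []
    for _ in data:
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    centers = [0.0] * c
    power = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Center update: mean of the data weighted by u_ij^m.
        for j in range(c):
            weights = [row[j] ** m for row in u]
            centers[j] = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        # Membership update from the distances to the new centers.
        max_change = 0.0
        for i, x in enumerate(data):
            dist = [abs(x - cj) for cj in centers]
            if 0.0 in dist:
                # A point exactly on a center belongs to it (shared equally on ties).
                row = [1.0 if d == 0.0 else 0.0 for d in dist]
            else:
                row = [1.0 / sum((dist[j] / dist[k]) ** power for k in range(c))
                       for j in range(c)]
            total = sum(row)
            row = [r / total for r in row]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(row, u[i])))
            u[i] = row
        if max_change < tol:
            break
    return centers, u
```

A hard segmentation label for each voxel can then be taken as the index of its largest membership, for example `max(range(c), key=row.__getitem__)` for each membership row.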
This paper describes an automated process for the dynamic creation of a pattern-recognizing computer program consisting of initially unknown detectors, an initially unknown iterative calculation incorporating the as-yet-uncreated detectors, and an initially unspecified final calculation incorporating the results of the as-yet-uncreated iteration. The program's goal is to classify a given protein segment as either a transmembrane domain or a non-transmembrane area. The recognizing program is evolved using the recently developed genetic programming paradigm. Genetic programming starts with a primordial ooze of randomly generated computer programs composed of the available programmatic ingredients and then genetically breeds the population using the Darwinian principle of survival of the fittest and the genetic crossover (sexual recombination) operation. Automatic function definition enables genetic programming to dynamically create subroutines (detectors). When cross-validated, the best genetically evolved recognizer achieves an out-of-sample correlation of 0.968 and an out-of-sample error rate of 1.6%. This error rate is better than those recently reported for five other methods.
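The genetic operations described above can be sketched with programs represented as nested lists. This minimal illustration assumes a toy arithmetic function set; it grows random programs for an initial population and performs subtree crossover by grafting a randomly chosen subtree of one parent onto a random point of the other. Fitness-based selection, cross-validation, and automatic function definition (the detectors) are not modeled, and all names are hypothetical.

```python
import random
from typing import Dict, Iterator, List, Tuple, Union

# A program is a terminal (variable name or constant) or a list [op, left, right].
Tree = Union[str, float, List]
Path = Tuple[int, ...]

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
TERMINALS: List[Union[str, float]] = ["x", 1.0, 2.0]

def random_tree(rng: random.Random, depth: int) -> Tree:
    """Grow a random program: one member of the initial 'primordial ooze'."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(OPS))
    return [op, random_tree(rng, depth - 1), random_tree(rng, depth - 1)]

def evaluate(tree: Tree, env: Dict[str, float]) -> float:
    """Recursively evaluate a program for the given variable bindings."""
    if isinstance(tree, list):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]
    return tree

def node_paths(tree: Tree, path: Path = ()) -> Iterator[Path]:
    """Yield the index path of every node; the root has path ()."""
    yield path
    if isinstance(tree, list):
        for i in (1, 2):
            yield from node_paths(tree[i], path + (i,))

def get_subtree(tree: Tree, path: Path) -> Tree:
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree: Tree, path: Path, new: Tree) -> Tree:
    """Return a new tree with the node at path replaced; the input is not mutated."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return node

def crossover(mum: Tree, dad: Tree, rng: random.Random) -> Tree:
    """Subtree crossover: graft a random subtree of dad onto a random point of mum."""
    cut = rng.choice(list(node_paths(mum)))
    graft = get_subtree(dad, rng.choice(list(node_paths(dad))))
    return replace_subtree(mum, cut, graft)
```

Because `replace_subtree` builds new lists only along the cut path, the parents are left unchanged and can be reused when breeding later generations.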