ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Recently, Transformer-based methods have achieved impressive performance in many computer vision tasks (e.g., image super-resolution (SR)) due to their long-range modeling capability. However, their computational cost renders these methods unsuitable for resource-constrained devices, especially for image SR tasks involving high-resolution images. In this paper, we propose a concise and effective Gated Convolutional Attention Unit (GCAU) that uses cheap convolutional operations. Specifically, GCAU consists of Convolutional Transposed Attention (CTA) and Locally-enhanced Gating (LeG) in parallel. The former efficiently models global relational interactions by computing the cross-covariance across the channel dimension, while the latter controls the information flow from the former, directing the network to focus on more refined image attributes. Without bells and whistles, we present a simple SR Transformer, GCAT, built by cascading GCAUs. Extensive experimental results demonstrate that our GCAT achieves state-of-the-art performance among existing efficient SR methods with significantly less complexity. Notably, GCAT is on average 5× faster than SwinIR-light with comparable performance.
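The key idea behind "transposed" attention is to compute the attention map over channels rather than spatial positions, so its cost scales with C² instead of (HW)². As a rough sketch (not the authors' GCAU implementation; the projection matrices wq, wk, wv are hypothetical stand-ins for learned layers), channel-wise attention on a feature map flattened to shape (C, HW) might look like:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transposed_attention(x, wq, wk, wv):
    """Channel-wise ("transposed") attention: the attention map is a
    C x C cross-covariance over channels, so the cost grows with C^2
    rather than with the square of the number of pixels (HW)^2."""
    c, hw = x.shape
    q, k, v = wq @ x, wk @ x, wv @ x           # each (C, HW)
    attn = softmax((q @ k.T) / np.sqrt(hw))    # (C, C) channel interactions
    return attn @ v                            # back to (C, HW)
```

For a 64×64 feature map with 64 channels, the attention map here is 64×64 instead of 4096×4096, which is the source of the efficiency claim.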
ISBN (digital): 9798331518097
ISBN (print): 9798331518103
High-performance computing (HPC) has become an essential tool for improving the efficiency and scalability of transaction processing systems, especially as data volumes continue to grow in fields like finance, e-commerce, and blockchain. This paper presents a comparative study of parallel and sequential transaction processing methods using an SQLite database. Specifically, the research investigates the impact of HPC techniques on transaction throughput, processing speed, and efficiency by simulating 1,000 user transactions. The study employs Python's multiprocessing module to simulate parallel execution, contrasting it with traditional sequential execution. Key performance indicators, including execution time, transaction success rate, and system efficiency, were analyzed to determine the advantages of parallel processing in a transaction-heavy environment. Our results reveal that parallel execution significantly reduces processing time, boosts throughput, and increases overall system efficiency compared to sequential processing. Additionally, the study discusses how parallel processing techniques can address common bottlenecks in transaction-heavy applications and provide solutions for improving the performance of large-scale, data-intensive systems. The findings demonstrate the potential for using HPC to optimize database operations, particularly in systems where high-volume transaction processing is a critical requirement. Future work will explore advanced parallelization strategies, fault tolerance mechanisms, and integrations with distributed databases and blockchain systems. This research contributes to the growing body of knowledge on optimizing transaction processing in high-performance computing environments, with potential applications across various sectors, including financial services, e-commerce, and blockchain technology.
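The sequential-versus-parallel comparison described above can be sketched in a few lines. This is not the paper's benchmark: the transaction schema and the checksum "validation" work are hypothetical, and for portability this sketch uses the thread-backed Pool from the multiprocessing module (same `map` API as a process pool) while a single connection commits the results, since SQLite serializes writes anyway.

```python
import sqlite3
from multiprocessing.pool import ThreadPool  # same Pool API, thread-backed

def validate(txn):
    """Hypothetical per-transaction work: check fields, compute a checksum."""
    uid, amount = txn
    assert amount >= 0
    return (uid, amount, int(amount * 100) % 97)

def run_sequential(txns):
    return [validate(t) for t in txns]

def run_parallel(txns, workers=4):
    with ThreadPool(workers) as pool:
        return pool.map(validate, txns)

# Simulate 1,000 user transactions, mirroring the study's setup.
txns = [(i, 10.0 + i) for i in range(1000)]
seq = run_sequential(txns)
par = run_parallel(txns)

# A single writer commits the validated results to SQLite.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE txns (uid INTEGER, amount REAL, chk INTEGER)")
con.executemany("INSERT INTO txns VALUES (?, ?, ?)", par)
```

Timing each strategy with `time.perf_counter()` around the two `run_*` calls would yield the execution-time comparison the paper reports.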
City events are becoming more popular and attract large numbers of people. This growth creates a need for methods and tools that provide stakeholders with crowd size information for crowd management purposes. Previous works proposed a large number of methods to count crowds using different data in various contexts, but none of them use social media images in city events, and no datasets exist to evaluate the effectiveness of such methods. In this study we investigate how social media images can be used to estimate the crowd size in city events. We construct a social media dataset, compare the effectiveness of face recognition, object recognition, and cascaded methods for crowd size estimation, and investigate the impact of image characteristics on the performance of the selected methods. Results show that object-recognition-based methods reach the highest accuracy in estimating the crowd size from social media images in city events. We also found that face recognition and object recognition methods are more suitable for estimating the crowd size in social media images taken in parallel view, with selfies covering people in full face and in which the persons in the background are at the same distance to the camera. However, cascaded methods are more suitable for images taken from a top view with gatherings distributed in a gradient. The created social media dataset is essential for selecting image characteristics and evaluating the accuracy of people counting methods in an urban event context.
ISBN (digital): 9798350355413
ISBN (print): 9798350355420
In response to the demand for rapid acquisition of three-dimensional profiles in industrial online inspection, a real-time 3D image acquisition system has been designed and researched. This system is based on the principles of optical projection and binocular vision fusion, utilizing a DLP projector and industrial cameras to construct a hardware platform, along with the development of an efficient 3D reconstruction algorithm. A three-frequency phase-shifting coding scheme is employed in the stripe pattern design, and a phase unwrapping algorithm guided by a quality map is proposed to enhance reconstruction reliability. The system achieves parallel processing through a CUDA heterogeneous computing platform and establishes a four-stage pipeline structure to optimize real-time performance. Experimental results show that the system meets the expected goals in key metrics such as spatial resolution, measurement accuracy, and real-time performance. In practical applications on the mobile phone casing inspection production line, the system demonstrates superior detection efficiency and reliability compared to traditional methods. The research results provide a complete solution for rapid 3D inspection, holding significant engineering application value in fields like industrial quality inspection and reverse engineering.
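At the core of any N-step phase-shifting scheme is the recovery of the wrapped phase from the captured fringe images I_n = A + B·cos(φ + 2πn/N). As a minimal sketch of that step only (the three-frequency heterodyne combination and the quality-map-guided unwrapping described above are omitted):

```python
import numpy as np

def wrapped_phase(frames):
    """Recover the wrapped phase from N >= 3 equally phase-shifted fringe
    images I_n = A + B*cos(phi + 2*pi*n/N). The DC term A cancels because
    the sines/cosines of the shift angles sum to zero."""
    n = len(frames)
    deltas = 2 * np.pi * np.arange(n) / n
    num = -sum(f * np.sin(d) for f, d in zip(frames, deltas))
    den = sum(f * np.cos(d) for f, d in zip(frames, deltas))
    return np.arctan2(num, den)  # wrapped into (-pi, pi]
```

Feeding in four synthetic fringe images generated from a known phase map returns that phase map exactly (up to 2π wrapping), which is the standard sanity check for a phase-shifting implementation.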
Imaging quality has always been the most critical issue during the analog to digital development of the X-ray non-destructive testing (NDT). Due to the complicated structure of evaluated objects, it is still difficult...
ISBN (print): 9781713871088
In distributed or federated optimization and learning, communication between the different computing units is often the bottleneck, and gradient compression is widely used to reduce the number of bits sent within each communication round of iterative methods. There are two classes of compression operators and separate algorithms making use of them. In the case of unbiased random compressors with bounded variance (e.g., rand-k), the DIANA algorithm of Mishchenko et al. (2019), which implements a variance reduction technique for handling the variance introduced by compression, is the current state of the art. In the case of biased and contractive compressors (e.g., top-k), the EF21 algorithm of Richtarik et al. (2021), which instead implements an error-feedback mechanism, is the current state of the art. These two classes of compression schemes and algorithms are distinct, with different analyses and proof techniques. In this paper, we unify them into a single framework and propose a new algorithm, recovering DIANA and EF21 as particular cases. Our general approach works with a new, larger class of compressors, which has two parameters, the bias and the variance, and includes unbiased and biased compressors as particular cases. This allows us to inherit the best of both worlds: like EF21 and unlike DIANA, biased compressors, like top-k, whose good performance in practice is recognized, can be used. And like DIANA and unlike EF21, independent randomness at the compressors makes it possible to mitigate the effects of compression, with the convergence rate improving when the number of parallel workers is large. This is the first algorithm proposed with all of these features. We prove its linear convergence under certain conditions. Our approach takes a step towards a better understanding of two so-far distinct worlds of communication-efficient distributed learning.
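The two compressor classes named above are easy to make concrete. As a sketch (standard textbook definitions, not code from the paper): rand-k keeps k uniformly random coordinates and rescales by d/k so the compressor is unbiased, while top-k keeps the k largest-magnitude coordinates, which is biased but contractive.

```python
import numpy as np

def rand_k(x, k, rng):
    """Unbiased rand-k: keep k random coordinates, scale by d/k so that
    E[C(x)] = x (each coordinate survives with probability k/d)."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)
    return out

def top_k(x, k):
    """Biased, contractive top-k: keep the k largest-magnitude coordinates,
    no rescaling; satisfies ||C(x) - x|| <= sqrt(1 - k/d) * ||x||."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out
```

Averaging many independent rand-k compressions of the same vector recovers the vector (unbiasedness), which is exactly the property DIANA exploits across parallel workers; top-k has no such property, which is why EF21 needs error feedback instead.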
ISBN (digital): 9781665487399
ISBN (print): 9781665487399
This paper reports on the NTIRE 2022 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2022. This challenge is held to address the emerging challenge of IQA for perceptual image processing algorithms. The output images of these algorithms have completely different characteristics from traditional distortions and are included in the PIPAL dataset used in this challenge. The challenge is divided into two tracks: a full-reference IQA track similar to the previous NTIRE IQA challenge, and a new track that focuses on no-reference IQA methods. The challenge has 192 and 179 registered participants for the two tracks. In the final testing stage, 7 and 8 participating teams submitted their models and fact sheets. Almost all of them achieved better results than existing IQA methods, and the winning method demonstrates state-of-the-art performance.
ISBN (digital): 9798331524937
ISBN (print): 9798331524944
Current parallel systems are increasingly heterogeneous, mixing devices of different types and computing capabilities. Exploiting multiple different devices for the same application continues to be a challenge that ranges from technical problems related to synchronizing and communicating diverse devices to problems of load distribution and flexibility to adjust the computation to the platform resources. In this work, we study the problem of using and extending a heterogeneous portability layer to program and adapt HSOpticalFlow to heterogeneous platforms. HSOpticalFlow is a streaming application to estimate the apparent movement of objects in a sequence of images. It is a simple but characteristic example of the structure of applications based on multilevel ILS (Iterative Loop Stencil), also known as multi-grid methods, applied to a sequence of inputs. Starting from the original CUDA reference code, we present a methodology and programming techniques based on the Controller programming model to implement it as a pipeline among multiple devices. We discuss a technique to determine a proper work partition and mapping for a set of devices. This allows for building very efficient parallel solutions, using similar devices or taking advantage of devices with lower computing power, to reduce the load and increase the productivity of more powerful ones. We present the results of an experimental study using several GPUs of different vendors, architectures, and generations, showing that this solution allows combinations of devices to be efficiently exploited to improve performance. Specifically, the results include speedups of 1.91× using two NVIDIA A100 GPUs and 1.21× using one NVIDIA V100 GPU and one AMD WX9100 GPU, which is about 3× slower than the NVIDIA GPU for this application.
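The work-partition idea sketched above, splitting a stencil domain among devices of unequal speed so that all finish at roughly the same time, reduces to a proportional split. As a minimal sketch (the speed figures and the row-based decomposition are illustrative assumptions, not the Controller model's actual API):

```python
def partition_rows(total_rows, speeds):
    """Split a stencil domain's rows among devices proportionally to their
    measured relative speeds, so each device finishes at roughly the same
    time. Leftover rows from integer truncation go to the fastest devices."""
    raw = [total_rows * s / sum(speeds) for s in speeds]
    shares = [int(r) for r in raw]
    leftover = total_rows - sum(shares)
    fastest_first = sorted(range(len(speeds)), key=lambda i: speeds[i], reverse=True)
    for i in fastest_first[:leftover]:
        shares[i] += 1
    return shares
```

For instance, pairing a GPU that is about 3× faster than its partner (as with the V100/WX9100 combination reported above) yields roughly a 3:1 split of the rows.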
With the continued increase in the size and complexity of contemporary digital systems, there is a growing need for models of large size and high complexity, as well as for methods of analyzing such models. This paper present...
Text coherence analysis is an important and challenging task that is essential for subtasks such as automatic summarisation, viewpoint extraction and machine translation in natural language processing (NLP). A large body of previous work has used linguistic features such as lexical, syntactic and entity features to capture relatively shallow coherence features, while ignoring deeper, logically relevant features hidden in the text. We propose an end-to-end English text coherence analysis model (RGCM for short) that incorporates Rhetorical Structure Theory (RST) and graph convolutional networks (GCNs). The intra- and inter-sentence logical relations of the input text are first mapped onto a discourse relationship tree, and then transformed into an RST-dependent context graph by certain pruning strategies. Subsequently, we propose a GCN-based text coherence assessment framework that captures intra- and inter-sentence interactions to assess text coherence. We conducted experiments on two different coherence assessment datasets and achieved accuracy rates of 95.5% and 97.8%, respectively.
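The GCN component named above aggregates each node's features from its neighbours in the RST-dependent context graph. As a sketch of one layer of the standard Kipf-and-Welling propagation rule (not the RGCM model itself; the adjacency matrix, feature matrix and weight matrix here are illustrative):

```python
import numpy as np

def gcn_layer(adj, h, w):
    """One graph-convolution layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W).
    Self-loops are added so each node keeps its own features, and the
    symmetric normalization keeps high-degree nodes from dominating."""
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w, 0.0)           # ReLU activation
```

Stacking a few such layers over the pruned context graph lets information flow along the discourse tree's edges, which is how intra- and inter-sentence interactions are captured.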