Many crowdsourcing platforms are emerging, leveraging the resources of recruited workers to execute various outsourcing tasks, mainly for those computing-intensive video analytics with high quality requirements. Altho...
ISBN: (digital) 9798331509712
ISBN: (print) 9798331509729
The self-attention mechanism is the core component of the Transformer and gives it a powerful ability to understand sequence context. However, self-attention also involves a large amount of redundant computation. Model sparsification can effectively reduce the computational load, but the irregular distribution of non-zeros that it introduces significantly decreases hardware efficiency. This paper proposes Funnel, an accelerator that dynamically predicts sparse attention patterns and efficiently processes unstructured sparse data. First, we adopt a fast lookup-table-based quantization method to minimize the cost of predicting sparse patterns. Second, we propose the Funnel Computing Unit (FCU), a hardware architecture that efficiently handles sparse attention through multi-dataflow fusion. Sampled Dense-Dense Matrix Multiplication (SDDMM) and Sparse-Dense Matrix Multiplication (SpMM) are the core operations of the sparse attention mechanism. FCU unifies the inner-product and row-wise-product dataflows so that it supports both SDDMM and SpMM, which greatly reduces the storage and movement overhead of intermediate results. Lastly, we devise a lightweight buffer and a data tiling strategy tailored to the proposed accelerator to enhance data reuse. Experiments demonstrate that our accelerator achieves 0.10-0.25 sparsity with small accuracy loss. When computing the self-attention layer, it attains hardware efficiency ranging from 60% to 85%. Compared to CPU and GPU, it achieves 5.60x and 8.20x speedup, respectively. Compared to the state-of-the-art attention accelerators A³, SpAtten, FTRANS, and Sanger, it achieves 7.37x, 4.52x, 9.58x, and 3.08x speedup, respectively.
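To make the roles of SDDMM and SpMM in sparse attention concrete, the following is a minimal software sketch, not the Funnel hardware design. It assumes the sparsity mask is already given (the abstract's prediction step is not modeled), uses the standard 1/sqrt(d) score scaling, and all function names are hypothetical. SDDMM is computed with inner products and SpMM with a row-wise product, mirroring the two dataflows named in the abstract.

```python
import math

def sddmm(Q, K, mask):
    """Sampled dense-dense matmul: evaluate Q @ K^T only where mask is True.

    Each score is an inner product of one query row and one key row.
    Returns, per query row, a list of (key_index, scaled_score) pairs.
    """
    scale = 1.0 / math.sqrt(len(Q[0]))
    rows = []
    for i, q in enumerate(Q):
        row = []
        for j, k in enumerate(K):
            if mask[i][j]:
                row.append((j, scale * sum(a * b for a, b in zip(q, k))))
        rows.append(row)
    return rows

def sparse_softmax(rows):
    """Softmax over the kept entries of each row only."""
    out = []
    for row in rows:
        if not row:
            out.append([])
            continue
        m = max(s for _, s in row)  # subtract the max for numerical stability
        exps = [(j, math.exp(s - m)) for j, s in row]
        z = sum(e for _, e in exps)
        out.append([(j, e / z) for j, e in exps])
    return out

def spmm(rows, V):
    """Sparse-dense matmul via row-wise product.

    Each output row accumulates weight * V[j] over the kept key indices j.
    """
    dv = len(V[0])
    out = []
    for row in rows:
        acc = [0.0] * dv
        for j, w in row:
            for c in range(dv):
                acc[c] += w * V[j][c]
        out.append(acc)
    return out

def sparse_attention(Q, K, V, mask):
    """Masked attention as SDDMM -> sparse softmax -> SpMM."""
    return spmm(sparse_softmax(sddmm(Q, K, mask)), V)
```

In this sketch the sparse score rows are handed directly from `sddmm` to `spmm`, so the dense score matrix is never materialized. The intermediate-result savings claimed for FCU are a hardware property that this code does not measure.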
This study undertakes a comprehensive analysis of second-order Ordinary Differential Equations (ODEs) to examine animal avoidance behaviors, with emphasis on analytical and computational aspects. Using the Picard-Lindelöf and fixed-point theorems, we prove the existence of unique solutions and examine their stability according to the Ulam-Hyers criterion. We also investigate the effect of external forces and the system's sensitivity to initial conditions. The investigation applies the Euler and fourth-order Runge-Kutta (RK4) methods to a mass-spring-damper system for numerical approximation. A detailed analysis of the numerical approaches, including a rigorous evaluation of both absolute and relative errors, demonstrates the efficacy of these techniques compared to the exact solutions. This examination strengthens the theoretical foundations and practical use of such ODEs in understanding complex behavioral patterns, showcasing the connection between theoretical understanding and real-world applications.
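The comparison described above can be sketched as follows for the unforced mass-spring-damper equation m x'' + c x' + k x = 0. This is an illustrative sketch only: the parameter values, step count, and function names are assumptions rather than values from the study, and the exact solution used is the standard closed form for the underdamped case (c² < 4mk).

```python
import math

def deriv(state, m, c, k):
    """Right-hand side of the first-order system x' = v, v' = -(c v + k x) / m."""
    x, v = state
    return (v, -(c * v + k * x) / m)

def euler_step(state, h, m, c, k):
    """One explicit Euler step of size h."""
    dx, dv = deriv(state, m, c, k)
    return (state[0] + h * dx, state[1] + h * dv)

def rk4_step(state, h, m, c, k):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(s, d, f):
        return (s[0] + f * d[0], s[1] + f * d[1])
    k1 = deriv(state, m, c, k)
    k2 = deriv(shifted(state, k1, h / 2), m, c, k)
    k3 = deriv(shifted(state, k2, h / 2), m, c, k)
    k4 = deriv(shifted(state, k3, h), m, c, k)
    return tuple(
        s + h / 6 * (a + 2 * b + 2 * g + d)
        for s, a, b, g, d in zip(state, k1, k2, k3, k4)
    )

def integrate(step, x0, v0, t_end, n, m, c, k):
    """Advance from t = 0 to t_end in n equal steps; return the final position."""
    state, h = (x0, v0), t_end / n
    for _ in range(n):
        state = step(state, h, m, c, k)
    return state[0]

def exact_x(t, x0, v0, m, c, k):
    """Closed-form position for the underdamped case (c*c < 4*m*k)."""
    gamma = c / (2 * m)
    omega_d = math.sqrt(k / m - gamma * gamma)
    return math.exp(-gamma * t) * (
        x0 * math.cos(omega_d * t)
        + (v0 + gamma * x0) / omega_d * math.sin(omega_d * t)
    )

def errors(approx, exact):
    """Return (absolute error, relative error)."""
    abs_err = abs(approx - exact)
    return abs_err, abs_err / abs(exact)

if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    m, c, k, x0, v0, t_end, n = 1.0, 0.5, 4.0, 1.0, 0.0, 5.0, 500
    ref = exact_x(t_end, x0, v0, m, c, k)
    for name, step in (("Euler", euler_step), ("RK4", rk4_step)):
        abs_err, rel_err = errors(integrate(step, x0, v0, t_end, n, m, c, k), ref)
        print(f"{name}: abs={abs_err:.3e} rel={rel_err:.3e}")
```

With the same step size, RK4's global error scales as h⁴ while Euler's scales as h, so the RK4 errors should be several orders of magnitude smaller for these settings.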
The longitudinal nonreciprocal charge transport (NCT) in crystalline materials is a highly nontrivial phenomenon, motivating the design of next-generation two-terminal rectification devices (e.g., semiconductor diodes beyond PN junctions). The practical application of such devices relies on crystalline materials whose longitudinal NCT occurs at room temperature and under a low magnetic field. However, materials of this type are rare and elusive, and a theory to guide their discovery is lacking. Here, we develop such a theory within the framework of semiclassical Boltzmann transport theory. By symmetry analysis, we classify all 122 magnetic point groups with respect to the longitudinal NCT phenomenon. The symmetry-adapted Hamiltonian analysis further uncovers a previously overlooked mechanism for this phenomenon. Our theory guides the first-principles prediction of longitudinal NCT in the multiferroic ε-Fe2O3 semiconductor, which possibly occurs at room temperature without the application of an external magnetic field. These findings advance our fundamental understanding of longitudinal NCT in crystalline materials and aid the discovery of the corresponding materials.
Network Function Virtualization (NFV) technology can tie together a set of Virtual Network Functions (VNFs) as a Service Function Chain (SFC). Although NFV is a promising technology for the placement of user service requests, SFCs often suffer from high delay and low throughput. This paper develops an efficient approach based on Deep Reinforcement Learning (DRL) that aims to deploy VNFs with low delay under heterogeneous bandwidth demands in Data Center Networks (DCNs). We design an architecture that exploits the dependencies among VNFs to parallelize them and that preserves more resources for processing future requests by extracting the distribution of initialized VNFs. Our algorithm is expected to maximize the long-term reward and improve the computational acceleration in provisioning user service requests. Our evaluations show that the proposed algorithm reduces the service delay by 12% through parallelized VNFs. (c) 2023 The Author(s). Published by Elsevier B.V. on behalf of The Korean Institute of Communications and Information Sciences. This is an open access article under the CC BY license (http://***/licenses/by/4.0/).
Extracting medical knowledge from healthcare texts enhances downstream tasks like medical knowledge graph construction and clinical decision-making. However, the construction and application of knowledge extraction mo...
With the growing number of Web services, classifying Web services accurately and efficiently has become a challenging problem. Effective service classification is conducive to improving the quality of service discover...
Software developers can obtain only a very small amount of information from individual failure-causing inputs, which makes debugging difficult. Therefore, it is necessary to explore additional failure-causing inputs (failure regions) using the known failure-causing inputs. To identify the failure region accurately and efficiently, we propose a novel two-stage search algorithm, TS-FRI. In the initial exploration stage, a round-robin search identifies several boundary failure-causing points, and the failure region's centroid is estimated. During the main search stage, the boundary failure-causing points are identified through iterative division of the input domain with an equally sized partitioning strategy. This results in the boundary points being as dispersed as possible around the failure-region boundary, with the polytope formed by the points approximating the failure region (e.g., a polygon in two dimensions). The proposed algorithm is validated through simulation and empirical analysis: the experimental results show that the accuracy of TS-FRI is at least comparable to the best accuracy of the three compared algorithms, and can be ten times better. In addition, TS-FRI takes only a quarter of the computation time and half the failure-validation cost of the other algorithms.
Topology optimization of thermal-fluid coupling problems has received widespread *** article proposes a novel topology optimization method for laminar two-fluid heat exchanger *** proposed method utilizes an artificial density field to create two permeability interpolation functions that exhibit opposing trends, ensuring separation between the two fluid ***, a Gaussian function is employed to construct an interpolation function for the thermal conductivity ***, a computational program has been developed on the OpenFOAM platform for the topology optimization of two-fluid heat *** program leverages parallel computing, significantly reducing the time required for the topology optimization *** enhance computational speed and reduce the number of constraint conditions, we replaced the conventional pressure drop constraint condition in the optimization problem with a pressure inlet/outlet boundary *** 3D optimization results demonstrate the characteristic features of a surface structure, providing valuable guidance for designing heat exchangers that achieve high heat exchange efficiency while minimizing excessive pressure *** the same time, a new structure appears in large-scale topology optimization, which demonstrates the effectiveness and stability of the topology optimization program developed in this paper for large-scale computation.
Fake news in social networks has disastrous effects on the real world, yet effectively detecting newly emerged fake news remains difficult. This problem is particularly pronounced when the testing samples (target domain) come from different topics, events, platforms, or time periods than the training dataset (source domains). Although prior efforts have focused on learning domain-invariant features (DIF) across multiple source domains to transfer universal knowledge from the source domains to the target domain, they ignore the complexity that arises as the number of source domains increases, which results in unreliable DIF. In this paper, we first point out two challenges in learning DIF for fake news detection: (1) high intra-domain correlations, which arise because the similarity between news samples in the same domain but different categories can be higher than that between samples in different domains but the same category, and (2) complex inter-domain correlations, which stem from the fact that news samples in different domains are semantically related. To tackle these challenges, we propose two modules, center-aware feature alignment and likelihood gain-based feature disentanglement, to strengthen the alignment of multiple domains while keeping the two categories separated, and to disentangle the domain-specific features under adversarial supervision. By combining these modules, we construct a label-irrelevant multi-domain feature alignment (LIMFA) framework. Our experiments show that LIMFA can be deployed with various base models and that it outperforms the state-of-the-art baselines in 4 cross-domain scenarios. Our source code will be made available upon acceptance of this manuscript.