Benchmarking, as a yardstick for system design and evaluation, has developed over a long period and plays a pivotal role in many domains, such as database systems and high-performance computing. Through prolonged and unremit...
The current popular systems Hadoop and Spark cannot achieve satisfactory performance when running iterative big data applications because of inefficient overlapping of computation and communication. The pipeline of computing, data movement, and data management plays a key role in current distributed data computing systems. In this paper, we first analyze the overhead of the shuffle operation in Hadoop and Spark when running the PageRank workload, and then propose an event-driven pipeline and in-memory shuffle design with better overlapping of computation and communication, implemented as DataMPI-Iteration, an MPI-based library for iterative big data computing. Our performance evaluation shows that DataMPI-Iteration achieves a 9X-21X speedup over Apache Hadoop and a 2X-3X speedup over Apache Spark for PageRank and K-means.
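The overlap of computation and communication that this abstract describes can be sketched with a bounded producer/consumer pipeline. This is a minimal illustration, not DataMPI-Iteration's actual design: compute_partition and the in-thread "send" list are hypothetical stand-ins for per-iteration computation and asynchronous network transfer.

```python
import queue
import threading

def compute_partition(i):
    # Hypothetical stand-in for one block of per-iteration computation.
    return [x * x for x in range(i, i + 4)]

def run_pipelined(num_partitions):
    """Overlap 'computation' and 'communication' via a bounded queue,
    so the next partition is computed while the previous one is sent."""
    sent = []
    q = queue.Queue(maxsize=2)  # small buffer, akin to in-memory staging

    def communicator():
        while True:
            item = q.get()
            if item is None:  # sentinel: no more partitions
                break
            sent.append(item)  # stand-in for an asynchronous network send

    t = threading.Thread(target=communicator)
    t.start()
    for i in range(num_partitions):
        q.put(compute_partition(i))  # blocks only when the buffer is full
    q.put(None)
    t.join()
    return sent

print(len(run_pipelined(8)))  # 8 partitions delivered in order
```

Because the queue is bounded, the compute loop stalls only when the sender falls behind, which is the overlapping behavior the event-driven pipeline aims for.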
The subset sum problem is a combinatorial optimization problem whose complexity belongs to the nondeterministic polynomial time complete (NP-Complete) class. The problem is widely used in encryption, planning or scheduling, and integer partitioning. No exact search algorithm with polynomial time complexity has been found, which makes the problem challenging to solve on classical computers. To solve it effectively, we translate it into the quantum Ising model and solve it with a variational quantum optimization method based on conditional values at risk. The proposed model needs only n qubits to encode a 2^n-dimensional search space, which effectively saves encoding qubits. The model inherits the advantages of variational quantum algorithms: it obtains good performance at shallow circuit depths while being robust to noise, and it is convenient to deploy on Noisy Intermediate-Scale Quantum devices. We investigate the effects of scalability, the variational ansatz type, the variational depth, and noise on the model. Moreover, we discuss the performance of the model under different conditional values at risk. In computer simulation, the scale can reach more than nine qubits. By selecting the noise type, we construct simulators with different quantum volumes (QVs) and study the performance of the model on them. In addition, we deploy the model on a superconducting quantum computer of the Origin Quantum Technology Company and successfully solve the subset sum problem. Our model provides a new perspective for solving the subset sum problem.
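For contrast with the quantum encoding above (n qubits spanning a 2^n-dimensional search space of subsets), a classical exponential-space baseline makes the problem statement concrete. This sketch is a standard dynamic-programming enumeration of reachable sums, not the paper's method:

```python
def subset_sum(weights, target):
    """Return a subset of weights summing to target, or None.
    Classical baseline: enumerate reachable sums (up to 2^n subsets)."""
    reachable = {0: []}  # reachable sum -> indices of items used
    for i, w in enumerate(weights):
        # Snapshot the items so each weight is used at most once.
        for s, idx in list(reachable.items()):
            if s + w not in reachable:
                reachable[s + w] = idx + [i]
    if target in reachable:
        return [weights[i] for i in reachable[target]]
    return None

print(subset_sum([3, 5, 7, 11], 12))  # [5, 7]
```

The dictionary of reachable sums can grow exponentially with n, which is exactly the search space the variational quantum model compresses into n qubits.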
In distributed quantum computing (DQC), quantum hardware design mainly focuses on providing as many high-quality inter-chip connections as possible, while quantum software tries its best to reduce the required number of remote quantum gates between chips. However, this "hardware first, software follows" methodology may not fully exploit the potential of DQC. Inspired by classical software-hardware co-design, this paper explores the design space of application-specific DQC architectures. More specifically, we propose AutoArch, an automated quantum chip network (QCN) structure design tool. With qubit grouping followed by a customized QCN design, AutoArch can generate a near-optimal DQC architecture suitable for target quantum algorithms. Experimental results show that the DQC architecture generated by AutoArch can outperform other general QCN architectures when executing target quantum algorithms.
With the advent of virtualization techniques and software-defined networking (SDN), network function virtualization (NFV) shifts network functions (NFs) from hardware implementations to software appliances, between which there exists a performance gap. How to narrow this gap is an essential issue for current NFV research. However, the cumbersomeness of deployment, the water-pipe effect of virtual network function (VNF) chains, and the complexity of the system software stack together make it tough to figure out the cause of low performance in an NFV system. To pinpoint NFV system performance, we propose NfvInsight, a framework for automatically deploying and benchmarking VNF chains. The framework tackles the challenges of NFV performance analysis. Its components include chain graph generation, automatic deployment, and fine-granularity measurement, and the design and implementation of each component has its own merits. To the best of our knowledge, we make the first attempt to collect rules forming a knowledge base for generating reasonable chain graphs. NfvInsight deploys the generated chain graphs automatically, which frees network operators from executing at least 391 lines of bash commands for a single deployment. To diagnose performance bottlenecks, NfvInsight collects metrics from multiple layers of the software stack; in particular, we collect the network stack latency distribution ingeniously, introducing less than 2.2% overhead. We showcase the convenience and usability of NfvInsight in finding bottlenecks for both VNF chains and the underlying system. Using our framework, we find several design flaws of the network stack that are unsuitable for packet forwarding inside a single server under the NFV scenario. Our optimization of these flaws gains at most a 3x performance improvement.
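The rule-based chain graph generation described above can be sketched as filtering VNF orderings against a small knowledge base of precedence rules. The rule set and VNF names here are hypothetical examples, not NfvInsight's actual knowledge base:

```python
from itertools import permutations

# Hypothetical mini knowledge base: (x, y) means x must precede y.
RULES = {("firewall", "nat"), ("nat", "load_balancer")}

def valid_chains(vnfs, rules=RULES):
    """Enumerate VNF chain orderings consistent with the rule base."""
    def ok(chain):
        pos = {v: i for i, v in enumerate(chain)}
        return all(pos[x] < pos[y] for x, y in rules
                   if x in pos and y in pos)
    return [c for c in permutations(vnfs) if ok(c)]

print(valid_chains(["nat", "firewall", "load_balancer"]))
# only ('firewall', 'nat', 'load_balancer') survives the rules
```

A deployment component would then translate each surviving ordering into the concrete setup commands that the framework otherwise saves operators from writing by hand.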
Dataflow architecture has shown its advantages in many high-performance computing cases. In dataflow computing, large amounts of data are frequently transferred among processing elements through the network-on-chip (NoC). Thus, the router design has a significant impact on the performance of dataflow architecture. Common routers are designed for control-flow multi-core architectures, and we find they are not suitable for dataflow architecture. In this work, we analyze and extract the features of data transfers in NoCs of dataflow architecture: multiple destinations, high injection rate, and performance sensitivity to delay. Based on these three features, we propose a novel and efficient NoC router for dataflow architecture. The proposed router supports multi-destination transfers; thus it can deliver data to multiple destinations in a single transfer. Moreover, the router adopts output buffering to maximize throughput and non-flit packets to minimize transfer delay. Experimental results show that the proposed router can improve the performance of dataflow architecture by 3.6x over a state-of-the-art router.
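The benefit of multi-destination support can be quantified with an idealized injection count at the source network interface: a unicast-only router injects one packet per destination, while a multi-destination router injects each payload once. This is a back-of-the-envelope model, not the router's microarchitecture:

```python
def unicast_transfers(packets):
    """Injections needed when each destination gets its own copy."""
    return sum(len(dests) for _, dests in packets)

def multidest_transfers(packets):
    """Injections with multi-destination support: one per payload,
    with replication handled inside the network (idealized)."""
    return len(packets)

# Three payloads, each fanned out to several processing elements.
packets = [("a", [1, 2, 3]), ("b", [4, 5]), ("c", [1, 6, 7, 8])]
print(unicast_transfers(packets), multidest_transfers(packets))  # 9 3
```

At the high injection rates the abstract identifies, cutting injections from 9 to 3 directly relieves the bottleneck that motivates the multi-destination design.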
Analytics based on big data computing can benefit today's banking and financial organizations in many aspects, and provide much valuable information for organizations to achieve more intelligent trading, which can...
As semiconductor technology advances, there will be billions of transistors on a single chip. Chip many-core processors are emerging to take advantage of these greater transistor densities to deliver greater performance. Effective fault tolerance techniques are essential to improve the yield of such complex chips. In this paper, a core-level redundancy scheme called N+M is proposed to improve N-core processors' yield by providing M spare cores. In such an architecture, topology is an important factor because it greatly affects the processors' performance. The concepts of logical topology and a topology reconfiguration problem are introduced, which make it possible to transparently provide the target topology with the lowest performance degradation in the presence of faulty cores on-chip. A row rippling and column stealing (RRCS) algorithm is also proposed. Results show that RRCS can give solutions with an average of 13.8% degradation in negligible computing time.
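The reconfiguration idea can be illustrated with a toy per-row version of row rippling: each row carries a spare core at the end, and a fault makes the row's logical cores ripple over to steal the spare, preserving the logical grid. This is a heavily simplified sketch under stated assumptions (one spare per row, no cross-row stealing); the actual RRCS algorithm is more general.

```python
def rrcs_map(rows, cols, faulty, spares_per_row=1):
    """Map a logical rows x cols grid onto physical cores, skipping
    faulty ones within each row by rippling toward the row's spare(s)."""
    phys_cols = cols + spares_per_row
    mapping = {}  # logical (r, c) -> physical (r, c)
    for r in range(rows):
        healthy = [c for c in range(phys_cols) if (r, c) not in faulty]
        if len(healthy) < cols:
            return None  # more faults in this row than spares can absorb
        for c in range(cols):
            mapping[(r, c)] = (r, healthy[c])
    return mapping

m = rrcs_map(2, 3, faulty={(0, 1)})
# Row 0 ripples past the faulty core into its spare column:
# logical (0,1) -> physical (0,2), logical (0,2) -> physical (0,3).
print(m[(0, 1)], m[(0, 2)])
```

Performance degradation then comes from rippled cores sitting one hop away from their logical position, which is what the reconfiguration problem minimizes.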
In propositional normal default logic, given a default theory (?, D) and a well-defined ordering of D, there is a method to construct an extension of (?, D) without any injury. To construct a strong extension of (?, D) given a well-defined ordering of D, there may be finitely many injuries for a default δ ∈ D. With the approximation deduction ?s in propositional logic, we show that to construct an extension of (?, D) under a given well-defined ordering of D, there may be infinitely many injuries for some default δ ∈ D.
Genomic sequence comparison algorithms represent the basic toolbox for processing large volumes of DNA or protein sequences. They are involved both in the systematic scan of databases, mostly for detecting similarities...