ISBN:
(digital) 9781728165820
ISBN:
(print) 9781728165837
Sequence comparison is a task performed daily in several Bioinformatics applications all over the world. Algorithms that retrieve the optimal result have quadratic time complexity, requiring a huge amount of computing power when the sequences compared are long. To reduce the execution time, many parallel solutions have been proposed in the literature. Nevertheless, depending on the sizes of the sequences, even those parallel solutions take hours or days to complete. Pruning techniques can significantly improve the performance of the parallel solutions, and a few approaches have been proposed to provide pruning capabilities for sequence comparison applications. This paper proposes and evaluates a variant of the block pruning approach that runs on multiple GPUs, in homogeneous or heterogeneous environments. Experimental results obtained with DNA sequences in two testbeds show that significant performance gains are obtained with pruning, compared to its non-pruning counterpart, achieving 694.8 GCUPS (Billions of Cells Updated per Second) with four GPUs.
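The quadratic cost and the GCUPS metric mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the paper's multi-GPU block-pruning method: it assumes a plain Smith-Waterman local alignment with a linear gap penalty, and the scoring values and the names `sw_score` and `gcups` are illustrative assumptions.

```python
def sw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (Smith-Waterman, linear gaps).

    Each of the len(a) * len(b) matrix cells is updated exactly once,
    which is the quadratic cost the abstract refers to.
    Scoring values are assumptions chosen for illustration.
    """
    prev = [0] * (len(b) + 1)  # previous matrix row
    best = 0
    for ca in a:
        curr = [0]  # column 0 of the current row
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def gcups(len_a: int, len_b: int, seconds: float) -> float:
    """Billions of matrix cells updated per second for one comparison."""
    return (len_a * len_b) / (seconds * 1e9)
```

As a rough sense of scale, a matrix of 10^12 cells processed at the reported 694.8 GCUPS corresponds to about 1.4 seconds of compute.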
ISBN:
(print) 9781728159799
The proceedings contain 6 papers. The topics discussed include: evaluation of programming models to address load imbalance on distributed multi-core CPUs: a case study with block low-rank factorization; a UPC++ actor library and its evaluation on a shallow water proxy application; Pygion: flexible, scalable task-based parallelism with Python; enabling low-overhead communication in multi-threaded OpenSHMEM applications using contexts; exploring the use of novel programming models in land surface models; and designing, implementing, and evaluating the upcoming OpenSHMEM teams API.
ISBN:
(digital) 9781728187778
ISBN:
(print) 9781728187785
The proliferation of the semantic web in the form of Resource Description Framework (RDF) demands efficient, scalable, and distributed storage along with a highly available and fault-tolerant parallel processing strategy. More precisely, the rapid growth of RDF data raises the need for an efficient partitioning strategy over distributed data management systems to improve SPARQL query performance regardless of query pattern shape, with minimized pre-processing time. In this context, we propose a new relational partitioning scheme for RDF data called Property Table Partitioning (PTP), which further partitions the existing Property Table into multiple tables based on distinct properties (each comprising all subjects with non-null values for those distinct properties) in order to minimize input data and join operations. In this paper, we introduce a distributed RDF data management system called S3QLRDF, which is built on top of Spark and uses SQL to execute SPARQL queries over the PTP schema. We perform an extensive experimental evaluation of preprocessing costs and query performance, using the Lehigh University Benchmark (LUBM) and Waterloo SPARQL Diversity Test Suite (WatDiv) datasets with up to 1.4 billion triples. Our results demonstrate that S3QLRDF outperforms state-of-the-art distributed RDF management systems.
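One possible reading of the partitioning idea in this abstract is sketched below in plain Python rather than Spark. The function names are hypothetical, and two points are assumptions rather than details confirmed by the abstract: each sub-table keeps the full property-table row of every qualifying subject, and each subject has at most one value per property.

```python
from collections import defaultdict


def build_property_table(triples):
    """Map each subject to a {property: value} row; a missing key means null.

    Simplification (assumption): one value per (subject, property) pair.
    """
    table = defaultdict(dict)
    for subject, prop, obj in triples:
        table[subject][prop] = obj
    return dict(table)


def partition_by_property(table):
    """Build one sub-table per distinct property.

    Each sub-table holds only the subjects with a non-null value for that
    property. Keeping the full row is an assumption made for this sketch.
    """
    parts = defaultdict(dict)
    for subject, row in table.items():
        for prop, value in row.items():
            if value is not None:
                parts[prop][subject] = row
    return dict(parts)


triples = [
    ("s1", "name", "Alice"),
    ("s1", "advisor", "s2"),
    ("s2", "name", "Bob"),
]
parts = partition_by_property(build_property_table(triples))
```

Under these assumptions, a query that touches only `advisor` would scan the single-row `advisor` sub-table instead of the whole property table, which is the input reduction the abstract describes.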
ISBN:
(digital) 9781728166773
ISBN:
(print) 9781728166780
Parallel filesystems (PFSs) are one of the most critical high-availability components of High Performance Computing (HPC) systems. Most HPC workloads depend on the availability of a POSIX-compliant parallel filesystem that provides a globally consistent view of data to all compute nodes of an HPC system. Because of this central role, failure or performance degradation events in the PFS can impact every user of an HPC resource. There is typically insufficient information available to users, and even to many HPC staff, to identify the causes of these PFS events, impeding the implementation of timely and targeted remedies to PFS issues. The relevant information is distributed across PFS servers; however, access to these servers is highly restricted due to the sensitive role they play in the operations of an HPC system. Additionally, the information is challenging to aggregate and interpret, relegating diagnosis and treatment of PFS issues to a select few experts with privileged system access. To democratize this information, we are developing an open-source and user-facing Parallel FileSystem TRacing and Analysis SErvice (PFSTRASE) that analyzes the requisite data to establish causal relationships between PFS activity and events detrimental to stability and performance. We are implementing the service for the open-source Lustre filesystem, which is the most commonly used PFS at large-scale HPC sites. Server loads for specific PFS I/O operations (IOPs) will be measured and aggregated by the service to automatically estimate an effective load generated by every client, job, and user. The infrastructure provides a real-time, user-accessible text-based interface and a publicly accessible web interface displaying both real-time and historical data.
ISBN:
(digital) 9781728198859
ISBN:
(print) 9781728198866
Binary addition is a commonly used operation in computational arithmetic. Adders are the basic building blocks of various computational structures, with wide applications in Digital Signal Processing, arithmetic and logic units, microprocessors, and microcontrollers. Research on adders with optimal specifications is ongoing. The delay of an adder depends on how quickly the carry bit reaches the next bit position for addition. In this paper, we introduce and discuss a fast 64-bit parallel prefix adder design. The proposed novel design uses the Ling adder formulation to suppress the area requirement and increase the computation speed compared to the existing algorithms. With a moderate increase in area and power, the adder achieves a critical path delay of 21.9 ns. The structure is coded in Verilog HDL, simulated, and implemented with the Xilinx Vivado tool.
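The carry-propagation idea behind parallel prefix adders can be sketched as a short software model in Python. This is a generic Kogge-Stone style prefix computation, swapped in for illustration; it is not the Ling-based design or the Verilog implementation described in the abstract, and the name `prefix_add64` is hypothetical.

```python
MASK64 = (1 << 64) - 1


def prefix_add64(a: int, b: int) -> int:
    """64-bit addition (mod 2**64) using a Kogge-Stone style prefix network.

    All carries are resolved in log2(64) = 6 combining stages instead of
    rippling through 64 bit positions one at a time.
    """
    a &= MASK64
    b &= MASK64
    generate = a & b    # bit i produces a carry by itself
    propagate = a ^ b   # bit i passes an incoming carry along
    half_sum = propagate
    distance = 1
    while distance < 64:
        # Combine each group with the group `distance` positions below it.
        generate |= propagate & (generate << distance)
        propagate &= propagate << distance
        distance <<= 1
    carries = (generate << 1) & MASK64  # carry into bit i comes from bit i-1
    return half_sum ^ carries
```

Each loop iteration corresponds to one level of the prefix tree, so the number of levels grows logarithmically with the word width; in hardware, that depth is what bounds the critical path.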
ISBN:
(print) 9781728167398
Recent advances in the Internet of Things (IoT) and, more generally, distributed computing research have given rise to various edge-oriented computing paradigms including Edge-, Fog-, and Mobile Cloud-Computing. Systems adhering to these paradigms exhibit a number of characteristics such as high levels of heterogeneity and dynamicity, leading to high complexity, that exacerbate the problems associated with traditional cyber defence. These problems are further exacerbated when edge-oriented systems operate in adversarial environments, e.g. the Internet of Battle Things (IoBT). Current generations of cyber defence approaches straddling attack prevention, detection, response and tolerance are arguably insufficient for edge-oriented systems operating in IoBT environments, especially given the constant increase in cyber attack sophistication. In this paper, we propose proactive antifragility as a new paradigm for the next generation of cyber defence approaches capable of taking into account the aforementioned system characteristics and environmental challenges. We propose a conceptualization of proactive antifragility, and then outline associated challenges and research directions indicating which existing approaches can be re-used or advanced/developed further in order for proactive antifragility to be achieved "at the edge".
ISBN:
(digital) 9781728141428
ISBN:
(print) 9781728141435
Object detection is not only shaping how computers see and analyze things; it also helps in understanding how an object reacts to changes in its environment. The main application of object detection sensors or software is to find the location of an object in space or to track its movement. Object detection has many use cases, and in this paper we introduce an application that supports the safety of users who are stuck in a disaster and need to be evacuated. In such cases, the main problems to address are camera noise, saturation, and image compression. Our solution is to establish a connection between the person stuck in a disaster and fire safety personnel. It works over a convolutional network that detects vulnerable people and objects inside a room that need to be rescued, and can also indicate whether any explosive is present in the room. Our model uses Faster R-CNN pretrained on the COCO dataset, which allows real-time object detection and classification on our network. Using this, we were able to detect an object or a person and support the rescue by providing the shortest way out of that place. Our object detection model achieved an accuracy of more than 75%.
ISBN:
(digital) 9781728165820
ISBN:
(print) 9781728165837
Computational fluid dynamics (CFD) can serve as a complementary approach to conventional wind tunnel testing to assess the wind flow around tall buildings. Being a clear High Performance Computing (HPC) task, CFD simulations conventionally run on supercomputers and compute clusters using specialized software such as OpenFOAM. The limited availability and high maintenance costs of supercomputers and clusters force small and medium companies to search for cost-efficient infrastructure that can run their simulations with appropriate performance. The on-demand compute capacity offered by cloud service providers is well suited to this task. However, engineers and researchers require extensive expertise and experience in working with cloud computing in order to benefit from running CFD simulations on a *** contribution of the paper to the outlined problem is two-fold: 1) a unique Automated Parallel Processing Application (APPA) tool that hides the cloud management details from the wind engineer and provides an intuitive user interface; 2) the estimation of the optimal number of cores (vCPUs) for virtual machine instances provided by AWS and Google Cloud, based on average run time and total cost metrics for a given number of cells of a CFD simulation. The n1-highcpu-96 Google Cloud VM met both goals: low cost and low runtime per timestep. For fewer than 16 vCPUs, the c4.8xlarge AWS VM type has the lowest runtime per timestep in all cases. Google Cloud instances with high vCPU counts are recommended for running the simulations if budget is a major concern.
ISBN:
(print) 9781450366328
The proceedings contain 15 papers. The topics discussed include: on the portability of GPU-accelerated applications via automated source-to-source translation; distributed and parallel programming paradigms on the K computer and a cluster; multi-accelerator extension in OpenMP based on PGAS model; an extended roofline model with communication-awareness for distributed-memory HPC systems; a memory saving communication method using remote atomic operations; scalable communication performance prediction using auto-generated pseudo MPI event trace; acceleration of symmetric sparse matrix-vector product using improved hierarchical diagonal blocking format; and cache-efficient implementation and batching of tridiagonalization on manycore CPUs.
ISBN:
(digital) 9781728175904
ISBN:
(print) 9781728175911
To enhance the accessibility and reliability of a distributed generation system (DGS), a grid-tied photovoltaic (PV) generation system based on multiple parallel-connected PV inverters is developed for microgrid application in this work. The microgrid is designed to operate in grid-tied mode (GTM) as well as standalone mode (SAM), and to switch between the two. Here, grid-forming inverter (GFI) technology is used for a system with high penetration of renewables. The single-stage PV-battery interfaced GFI, in parallel with inverters supported by a PV array, is considered for operation in both GTM and SAM. In SAM, the grid-forming inverter maintains constant voltage and frequency across the load. The control strategies of the units are independent and use multi-loop controllers with switching logic. In GTM, each PV panel in the DGS is interfaced with an inverter with independent maximum power point tracking (MPPT) to harvest maximum power. The battery supports the PV units during peak load periods and maintains the power balance in the microgrid. A controlled charging and discharging profile of the battery is achieved by a DC-DC bidirectional converter (DBC). The control strategy of the microgrid is verified in detail with simulated results.