ISBN (print): 9781605585123
RDF is a data model for representing labeled directed graphs, and it is an important building block of the Semantic Web. Due to its flexibility and applicability, RDF has been used in applications such as the Semantic Web, bioinformatics, and social networks. In these applications, large-scale graph datasets are very common; however, existing techniques do not manage them effectively. In this paper, we present a scalable, efficient query processing system for RDF data, named SPIDER, based on the well-known parallel/distributed computing framework Hadoop. SPIDER consists of two major modules: (1) the graph data loader and (2) the graph query processor. The loader analyzes and dissects the RDF data and places parts of the data on multiple servers. The query processor parses the user query and distributes subqueries to cluster nodes; the results of the subqueries from multiple servers are then gathered (and refined if necessary) and delivered to the user. Both modules use the MapReduce framework of Hadoop. In addition, our system supports some features of the SPARQL query language. This prototype will serve as a foundation for developing real applications with large-scale RDF graph data.
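The loader/query-processor split described above can be illustrated with a small single-process sketch. It is not SPIDER's implementation. It assumes hash partitioning of triples by subject, which is a common choice but is not confirmed by the abstract. It emulates the map phase (per-partition matching of one triple pattern) and the reduce phase (joining partial results on shared variables) in plain Python. The names `partition_triples`, `match_pattern`, `join`, and `run_query` are hypothetical.

```python
from collections import defaultdict
from zlib import crc32

def partition_triples(triples, num_servers):
    """Loader sketch: place each (subject, predicate, object) triple on a
    server chosen by hashing its subject (an assumed partitioning rule)."""
    parts = defaultdict(list)
    for s, p, o in triples:
        # crc32 is stable across runs, unlike Python's salted str hash().
        parts[crc32(s.encode()) % num_servers].append((s, p, o))
    return parts

def match_pattern(partition, pattern):
    """Map-side sketch: bindings for one triple pattern on one partition.
    Terms beginning with '?' are variables; other terms must match exactly."""
    bindings = []
    for triple in partition:
        binding, ok = {}, True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value every time.
                if binding.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            bindings.append(binding)
    return bindings

def join(left, right):
    """Reduce-side sketch: natural join of two binding lists on shared variables."""
    return [
        {**a, **b}
        for a in left
        for b in right
        if all(a[k] == b[k] for k in a.keys() & b.keys())
    ]

def run_query(parts, patterns):
    """Query-processor sketch: send each pattern (subquery) to every partition,
    gather the partial bindings, and join them pattern by pattern."""
    result = [{}]
    for pattern in patterns:
        gathered = [b for part in parts.values() for b in match_pattern(part, pattern)]
        result = join(result, gathered)
    return result

triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "age", "30"),
]
parts = partition_triples(triples, num_servers=2)
rows = run_query(parts, [("?x", "knows", "?y"), ("?y", "knows", "?z")])
print(rows)  # [{'?x': 'alice', '?y': 'bob', '?z': 'carol'}]
```

Subject hashing keeps all triples about one resource on one server, so star-shaped patterns could be answered locally. Path-shaped patterns like the two-hop example still need the gather-and-join step.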
ISBN (print): 9780975840078
The computer industry has evolved from single-core to many-core architectures to keep offering increasing processing power. Parallel programming is not new, but in practice much remains to be done to take advantage of these many-core architectures. Most software is still designed for serial execution, and parallel execution is often limited to running the user interface and the engine in separate threads to keep an application responsive. This paper focuses on task parallelization in an environmental modelling framework, the Invisible Modelling Environmental framework (TIME). The case study arises from a project requiring the calibration of rainfall-runoff models in numerous unimpaired gauged catchments located in Northern Australia. The hydrologic model structure is spatially distributed, so each catchment model can have hundreds of input time series at a daily time step. These models are run on multi-core computers in a cluster. Their cumulative memory requirements can exceed the available memory, slowing computation to unacceptable levels due to virtual memory swapping. We therefore tailor the number of catchment calibration tasks per compute node so that the cumulative memory footprint fits in the random access memory of that node. To still maximize the use of processing power on these multi-core nodes, we need to be able to parallelize these calibration tasks. Three parallelization strategies are considered, characterized mainly by the granularity of the tasks considered for parallelization: (1) parallelizing the calibration algorithm, (2) parallelizing the model along the spatial dimension, and (3) parallelizing the model along the spatial and temporal dimensions. We assessed each strategy against several criteria, notably runtime performance gain, required technological know-how, the amount of code change, and architectural impact. On that basis, we prefer solution (2) with multi-threading as the best compromise between these criteria. The architectural ch…
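The memory constraint described above reduces to simple arithmetic. A node can host at most floor(RAM / per-task footprint) calibration tasks, and the cores left over are what intra-task (spatial) multi-threading must use. The sketch below makes several assumptions. The figures are hypothetical. The toy linear-reservoir `run_subcatchment` is an illustrative stand-in, not a real hydrologic model. `ThreadPoolExecutor` is used only to illustrate strategy (2), parallelizing the model over spatial units; this is not TIME's code. In CPython, threads running pure-Python code do not execute in parallel because of the global interpreter lock. The sketch therefore shows the structure of strategy (2), not its speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def tasks_per_node(ram_gb, task_footprint_gb, cores):
    """Largest number of calibration tasks whose combined memory fits in RAM,
    capped at the core count and never fewer than one."""
    return max(1, min(cores, math.floor(ram_gb / task_footprint_gb)))

def threads_per_task(cores, tasks):
    """Cores available to each task for spatial multi-threading."""
    return max(1, cores // tasks)

def run_subcatchment(rainfall):
    """Hypothetical stand-in for one spatial unit: a toy linear reservoir
    that releases 10% of its storage as runoff each day."""
    storage, runoff = 0.0, []
    for r in rainfall:
        storage += r
        q = 0.1 * storage
        storage -= q
        runoff.append(q)
    return runoff

def run_catchment(subcatchment_inputs, n_threads):
    """Strategy (2) sketch: evaluate spatial units concurrently, then sum
    their daily runoff into the catchment outflow."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        per_unit = list(pool.map(run_subcatchment, subcatchment_inputs))
    return [sum(day) for day in zip(*per_unit)]

n_tasks = tasks_per_node(ram_gb=32, task_footprint_gb=10, cores=8)  # 3 tasks fit
n_threads = threads_per_task(cores=8, tasks=n_tasks)                # 2 threads each
runoff = run_catchment([[10.0, 0.0], [0.0, 10.0]], n_threads)
print(n_tasks, n_threads, runoff)
```

Each spatial unit here is independent, which is what makes strategy (2) straightforward. A real distributed model with routing between subcatchments would have to respect upstream-to-downstream order.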
Performance analysis and functionality testing are major parts of developing distributed software systems. Since the number of communicating software instances heavily influences the behavior of distributed applications and communication protocols, evaluation scenarios have to consider a large number of nodes. Network emulation provides an infrastructure for running these experiments using real prototype implementations in a controllable and realistic environment. Large-scale experiments, however, consume considerable resources, often exceeding the available physical testbed resources. Time dilation reduces the resource demands of a scenario at the expense of the experiment's runtime. However, current approaches only consider a constant time dilation factor, which wastes substantial resources in scenarios with varying load. We propose a framework for adaptive time virtualization that significantly reduces the runtime of experiments by improving resource utilization in network emulation testbeds. In this framework, resource demands are monitored and the time dilation factor is dynamically adapted to the required level. Our evaluation shows that adaptive virtual time, combined with our lightweight node virtualization architecture, increases the possible scenario sizes by more than an order of magnitude while ensuring unbiased emulation results. This represents an important contribution toward making network emulation systems highly scalable.
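The benefit of adapting the time dilation factor (TDF) rather than fixing it can be shown with a minimal sketch. It assumes that the controller measures resource demand as a fraction of physical capacity at TDF = 1. It also assumes that the controller picks the smallest TDF keeping that demand below a target utilization. The class `AdaptiveVirtualClock`, the `target_util` parameter, and the phase-based load model are hypothetical; they are not the framework's interface or control law.

```python
class AdaptiveVirtualClock:
    """Virtual clock whose time dilation factor (TDF) follows measured load.

    With TDF = k, one second of virtual time spans k seconds of real time,
    so a scenario needing k times the physical capacity still sees the
    resources it expects, at the cost of running k times longer.
    """

    def __init__(self, target_util=0.8, min_tdf=1.0):
        self.target_util = target_util  # hypothetical headroom parameter
        self.min_tdf = min_tdf
        self.tdf = min_tdf
        self.virtual_time = 0.0
        self.real_time = 0.0

    def adapt(self, demand):
        """demand: load at TDF = 1 as a fraction of physical capacity."""
        self.tdf = max(self.min_tdf, demand / self.target_util)
        return self.tdf

    def advance(self, real_seconds):
        """Advance real time; virtual time progresses TDF times more slowly."""
        self.real_time += real_seconds
        self.virtual_time += real_seconds / self.tdf
        return self.virtual_time


def adaptive_runtime(demands, phase_virtual_s, clock):
    """Real seconds needed to emulate consecutive phases, each lasting
    phase_virtual_s virtual seconds, re-adapting the TDF per phase."""
    for demand in demands:
        tdf = clock.adapt(demand)
        clock.advance(phase_virtual_s * tdf)
    return clock.real_time


demands = [0.4, 4.0, 0.4]  # hypothetical: quiet, busy, quiet phases
adaptive = adaptive_runtime(demands, 10.0, AdaptiveVirtualClock())
constant = len(demands) * 10.0 * max(1.0, max(demands) / 0.8)  # fixed worst-case TDF
print(adaptive, constant)
```

Both runs emulate 30 virtual seconds with these made-up numbers. The adaptive clock needs roughly 70 real seconds and the constant worst-case factor about 150, because the quiet phases no longer pay for the busy one.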
The proceedings contain 84 papers. The topics discussed include: impact of NUMA effects on high-speed networking with multi-Opteron machines; an ensemble method using hybrid real-coded genetic algorithm with pruning (HRGA/P R); P2P video broadcast based on per-peer transcoding and its evaluation on PlanetLab; dynamic modification of wireless multihop transmission route in message-by-message manner for shorter transmission delay; an efficient partitioning and scheduling algorithm for streaming applications on FPGA with resource constraint; associativity-based adaptive weighted clustering for large-scale mobile ad hoc networks; directional node-disjoint multipath routing in wireless ad hoc network; toward more parallel frequent itemset mining algorithms; schedulability of aperiodic tasks in hybrid process model; GRID-enabled ensemble subsurface modeling; and parallel double divide and conquer and its evaluation on a supercomputer.
ISBN (print): 9780889867741
Large-scale parallel and distributed computing environments face several problems concerning power consumption, temperature rise, and installation space. In particular, power consumption and temperature rise are two serious problems that must be addressed when a system is used for a long time. To solve these problems, we previously proposed a CPU power control and scheduling technique (PCST) for real-time parallel and distributed computing systems whose processors can change their frequencies and voltages. In this study, we evaluate the PCST using a tracking program. The evaluation results show that with the PCST, power consumption is 60%-80% of that under commercial power control. Compared with no power control, it is only 20%-40%.
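Techniques like this rest on the fact that dynamic CMOS power scales roughly as P = C·V²·f. A real-time scheduler can therefore save power by running each task at the slowest frequency/voltage pair that still meets its deadline. The sketch below illustrates that generic dynamic voltage and frequency scaling (DVFS) idea only. It is not the PCST algorithm. The `OperatingPoint` values are hypothetical, and so is the single-task, single-deadline model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OperatingPoint:
    freq_ghz: float  # clock frequency
    volt: float      # supply voltage paired with that frequency

def dynamic_power(op, capacitance=1.0):
    """Classic CMOS dynamic-power model P = C * V^2 * f (relative units)."""
    return capacitance * op.volt ** 2 * op.freq_ghz

def pick_point(points, cycles_g, deadline_s):
    """Lowest-power operating point that finishes cycles_g (billions of
    cycles) within deadline_s seconds; None if no point is fast enough."""
    feasible = [p for p in points if cycles_g / p.freq_ghz <= deadline_s]
    return min(feasible, key=dynamic_power, default=None)

points = [  # hypothetical frequency/voltage pairs
    OperatingPoint(1.0, 0.9),
    OperatingPoint(2.0, 1.1),
    OperatingPoint(3.0, 1.3),
]
chosen = pick_point(points, cycles_g=1.5, deadline_s=1.0)
print(chosen, dynamic_power(chosen) / dynamic_power(points[-1]))
```

Because voltage enters the model squared, dropping to a lower operating point that still meets the deadline cuts power by more than the frequency ratio alone.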
ISBN (print): 9780889867741
Path planning is one of the most computationally expensive tasks in mobile robotics, especially in dynamically changing environments. It is difficult to meet real-time requirements with serial path planning algorithms, because doing so would require a high-speed processor. Particularly in small autonomous robot systems, this is inefficient due to energy consumption and space requirements. Instead, we propose a parallel path planning approach based on Marching Pixels, a new Organic Computing principle. It can be used as a coarse global path planner in dynamically changing environments because the algorithm is very fast and requires only few resources.
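The reason pixel-level approaches parallelize well is that each cell updates only from its immediate neighbours. The sketch below is a swapped-in illustration of that property. It is a generic cellular-automaton wavefront planner, not the Marching Pixels algorithm, whose rules are not given here. Each generation computes every cell's distance-to-goal from the previous generation only, so all cells of a step could run in parallel, for example one processing element per pixel. The grid encoding ('#' obstacle, '.' free) and the function names are hypothetical.

```python
INF = float("inf")
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-neighbourhood

def wavefront(grid, goal):
    """Synchronous cellular-automaton distance propagation from the goal.
    grid: list of strings, '#' = obstacle, '.' = free."""
    rows, cols = len(grid), len(grid[0])
    dist = [[INF] * cols for _ in range(rows)]
    dist[goal[0]][goal[1]] = 0
    changed = True
    while changed:
        changed = False
        prev = [row[:] for row in dist]  # previous generation (read-only)
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == "#":
                    continue
                best = prev[r][c]
                for dr, dc in STEPS:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        best = min(best, prev[nr][nc] + 1)
                if best < dist[r][c]:
                    dist[r][c] = best
                    changed = True
    return dist

def extract_path(dist, start):
    """Follow strictly decreasing distances from start to the goal (distance 0)."""
    r, c = start
    if dist[r][c] == INF:
        return None  # goal unreachable from start
    path = [(r, c)]
    while dist[r][c] > 0:
        r, c = min(
            ((r + dr, c + dc) for dr, dc in STEPS
             if 0 <= r + dr < len(dist) and 0 <= c + dc < len(dist[0])),
            key=lambda p: dist[p[0]][p[1]],
        )
        path.append((r, c))
    return path

grid = [
    "....",
    ".##.",
    "....",
]
dist = wavefront(grid, goal=(0, 3))
path = extract_path(dist, start=(2, 0))
print(path)
```

When obstacles move, the distance field can simply be recomputed. This is cheap when every cell update runs in parallel, which matches the coarse global-planner role described in the abstract.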
ISBN (print): 9780889867741
We study the problem of scheduling tasks in a distributed system where the data (and code) for a program may reside on a processor different from the one where it will be executed. Scheduling the tasks is complex because one must balance execution and communication times. We present an off-line polynomial-time approximation algorithm for the case in which the processors can be split into storage (client) and processing (server) nodes. Our algorithm is the first constant-ratio approximation algorithm for this problem. We then discuss generalizations of the problem as well as its on-line version.
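The trade-off between communication and execution time in the storage/processing split can be made concrete with a simple baseline. The sketch below is a greedy earliest-finish-time list scheduler. It is not the paper's approximation algorithm and carries no approximation guarantee. It assumes that shipping a task's data and code from its storage node to a server takes `comm[storage][server]` time. It also assumes that each server handles its transfers and executions one task at a time. The data layout and names are hypothetical.

```python
def greedy_schedule(tasks, comm, exec_time):
    """Earliest-finish-time list scheduling (simple heuristic, no guarantee).

    tasks:     list of (task_id, storage_node), scheduled in the given order
    comm:      comm[storage][server] = time to ship the task's data and code
    exec_time: exec_time[task_id][server] = processing time on that server
    Returns (assignment dict, makespan).
    """
    servers = list(next(iter(comm.values())).keys())
    ready = {s: 0.0 for s in servers}  # time at which each server becomes free
    assignment = {}
    for task_id, store in tasks:
        # Pick the server on which this task would finish earliest.
        best = min(
            servers,
            key=lambda s: ready[s] + comm[store][s] + exec_time[task_id][s],
        )
        ready[best] += comm[store][best] + exec_time[task_id][best]
        assignment[task_id] = best
    return assignment, max(ready.values())

# Hypothetical instance: two storage nodes, two servers.
comm = {"S1": {"A": 1, "B": 4}, "S2": {"A": 4, "B": 1}}
exec_time = {"t1": {"A": 2, "B": 2}, "t2": {"A": 2, "B": 2}, "t3": {"A": 3, "B": 3}}
tasks = [("t1", "S1"), ("t2", "S2"), ("t3", "S1")]
assignment, makespan = greedy_schedule(tasks, comm, exec_time)
print(assignment, makespan)
```

Greedy rules like this can be led astray by task order, which is one reason a provable constant-ratio bound, as claimed for the paper's algorithm, is valuable.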
ISBN (print): 9780889867741
Current visual text mining platforms still focus on small or medium-scale datasets and sequential algorithms. However, as document collections grow in size and complexity, more computing resources are required to achieve the expected interactive experience. To address this scalability problem, this paper proposes and evaluates parallel implementations of three critical visual text mining algorithms. Experiments with the parallel solutions were conducted for varying dataset sizes and different numbers of processors. The results show good speedup for the proposed solutions. They also indicate the potential benefits of exploiting task parallelism in critical algorithms to improve the scalability of an interactive visual text mining platform.
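The abstract does not name the three algorithms. As a hypothetical example of the kind of kernel that benefits from task parallelism, the sketch below computes a document-similarity matrix (bag-of-words cosine similarity) with its rows split into independent tasks. It also shows how speedup S = T1/Tp and efficiency S/p are computed from timings. The choice of kernel, the function names, and the timings are illustrative assumptions. CPython threads do not speed up pure-Python arithmetic because of the global interpreter lock. Real gains would need `ProcessPoolExecutor` or native code, but the task decomposition is the same.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def tf_vector(text):
    """Bag-of-words term frequencies (a deliberately simple stand-in for TF-IDF)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similarity_rows(vectors, rows):
    """One task: compute the similarity-matrix rows assigned to this worker."""
    return [(i, [cosine(vectors[i], v) for v in vectors]) for i in rows]

def parallel_similarity(texts, workers=4):
    """Split matrix rows round-robin into independent tasks and run them concurrently."""
    vectors = [tf_vector(t) for t in texts]
    n = len(vectors)
    chunks = [range(w, n, workers) for w in range(workers)]
    matrix = [None] * n
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(lambda rows: similarity_rows(vectors, rows), chunks):
            for i, row in part:
                matrix[i] = row
    return matrix

def speedup_and_efficiency(t_serial, t_parallel, processors):
    """Speedup S = T1 / Tp and parallel efficiency S / p."""
    s = t_serial / t_parallel
    return s, s / processors

docs = ["parallel text mining", "parallel mining of text", "interactive visualization"]
print(parallel_similarity(docs, workers=2))
print(speedup_and_efficiency(120.0, 40.0, 4))  # hypothetical timings in seconds
```

The rows of the matrix are independent of one another, so they can be handed out as separate tasks with no communication until the results are gathered.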