The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM's BlueGene/L, which can accommodate as many as 128K processors. One of the challenges in designing and deploying these systems in a production setting is the need to take failure occurrences, whether in the hardware or in the software, into account. Earlier work has shown that conventional runtime fault-tolerance techniques such as periodic checkpointing are not effective for these emerging systems. Instead, the ability to predict failure occurrences can help develop more effective checkpointing strategies. Failure prediction has long been regarded as a challenging research problem, mainly due to the lack of realistic failure data from actual production systems. In this study, we collected RAS event logs from BlueGene/L over a period of more than 100 days. We investigated the characteristics of fatal failure events, as well as the correlation between fatal events and non-fatal events. Based on these observations, we developed three simple yet effective failure prediction methods, which can predict around 80% of the memory and network failures, and 47% of the application I/O failures.
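To make the precursor idea concrete, here is a minimal sketch of one plausible window-and-threshold predictor built on the abstract's observation that non-fatal events correlate with later fatal ones. The event schema, subsystem names, window length, and threshold are illustrative assumptions, not a restatement of the authors' three methods.

```python
# Hedged sketch of a precursor-based failure predictor: flag a subsystem when
# too many non-fatal events accumulate in a sliding time window.
from collections import deque
from dataclasses import dataclass

@dataclass
class Event:
    timestamp: float   # seconds since epoch
    severity: str      # e.g. "FATAL", "WARNING", "INFO"
    subsystem: str     # e.g. "MEMORY", "NETWORK", "APP_IO"

def predict_failures(events, window_s=3600.0, threshold=5):
    """Warn of a likely fatal failure in a subsystem when the count of
    non-fatal events in that subsystem over the last `window_s` seconds
    reaches `threshold`. (A real predictor would debounce repeat warnings.)"""
    recent = {}    # subsystem -> deque of non-fatal event timestamps
    warnings = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        q = recent.setdefault(ev.subsystem, deque())
        # Evict events that have fallen out of the sliding window.
        while q and ev.timestamp - q[0] > window_s:
            q.popleft()
        if ev.severity != "FATAL":
            q.append(ev.timestamp)
            if len(q) >= threshold:
                warnings.append((ev.timestamp, ev.subsystem))
    return warnings
```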
The need for mass-produced, inexpensive wireless devices operating under strict energy constraints poses new challenges in system design methodology. This paper presents a methodology for designing wireless nodes in which a low-cost, reliable antenna is realized by printed circuit traces. We show how to combine the analysis from 2.5D and 3D EM simulators with the PCB design tools to create predictable nodes with printed antennas that meet stringent power and data transmission range goals. The presented approach is applied to the design of an IEEE 802.15.4 wireless node deployed in several indoor environments. Copyright 2005 ACM.
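As a rough illustration of how a power and transmission-range goal can be sanity-checked, the sketch below estimates free-space range with the standard Friis link budget. The parameter values (TX power, antenna gains, receiver sensitivity) are assumptions for a 2.4 GHz IEEE 802.15.4 node, not figures from the paper.

```python
# Free-space range estimate from a link budget via the Friis equation:
# P_rx = P_tx + G_tx + G_rx + 20*log10(lambda / (4*pi*d))  [all in dB/dBm]
import math

def friis_max_range_m(p_tx_dbm, g_tx_dbi, g_rx_dbi, sens_dbm, freq_hz=2.405e9):
    """Largest distance d (meters) at which received power stays above
    the receiver sensitivity, assuming ideal free-space propagation."""
    wavelength = 3e8 / freq_hz
    max_path_loss_db = p_tx_dbm + g_tx_dbi + g_rx_dbi - sens_dbm
    return wavelength / (4 * math.pi) * 10 ** (max_path_loss_db / 20)

# E.g. 0 dBm TX, 1 dBi printed antenna on each side, -95 dBm sensitivity:
print(f"{friis_max_range_m(0, 1.0, 1.0, -95):.0f} m free-space range")
```

Indoor environments like those in the paper would see far shorter ranges than this free-space bound, which is why the combined EM/PCB simulation flow matters.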
Focuses on a study that determined the geometric meaning of the maxima of the CDT mathematical subproblem's dual function. Properties of the trust region subproblem; approximation of the CDT feasible region; relations between the CDT problem and the trust region problem; illustration of the geometric meaning of the jump parameter.
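For readers unfamiliar with the problem, the CDT (Celis-Dennis-Tapia) subproblem is conventionally stated as a quadratic objective minimized over the intersection of two norm constraints; the formulation below is the standard one from the literature, not a restatement of this paper's notation.

```latex
\min_{x \in \mathbb{R}^n} \; q(x) = g^{\top} x + \tfrac{1}{2}\, x^{\top} H x
\quad \text{subject to} \quad
\lVert x \rVert_2 \le \Delta, \qquad \lVert A x + c \rVert_2 \le \xi .
```

Dropping the second constraint recovers the classical trust region subproblem, which is the relation between the two problems that the study examines.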
We present results of providing database support to biomedicine via federation of SDB Cooperation/Integration based upon the KEGG GUI for molecular biology. The federation provides a common link to three molecular biology databases. The added value of the federation is freedom from consulting multiple references to ascertain the full set of enzymatic reactions in a metabolic pathway, and the option of selecting multiple queries to submit to the federated SDBs. Each of the SDBs is extensive, but incomplete. The union of the SDBs, implemented transparently by the federation, is more complete. Each SDB provides a different approach to the options available for data presentation and a different set of Web server tools for data analysis. Thus, an important part of the added value of the federation is the cross-fertilization available in the union of the molecular biological content, the presentation of data, and the tools available for analysis.
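A minimal sketch of the transparent-union behaviour the abstract describes: fan a single query out to every member database and return the union of the partial answers, so the caller never consults the incomplete sources separately. The SDB wrapper class and its query interface are hypothetical stand-ins, not the federation's actual API.

```python
# Sketch of federated query dispatch with transparent union of results.
from concurrent.futures import ThreadPoolExecutor

class SDB:
    """Hypothetical wrapper around one member source database."""
    def __init__(self, name, records):
        self.name = name
        self._records = records  # e.g. enzymatic reactions keyed by EC number

    def query(self, ec_number):
        return self._records.get(ec_number, set())

def federated_query(sdbs, ec_number):
    """Query every member SDB concurrently and union the answers, hiding
    the incompleteness of any single source from the caller."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda db: db.query(ec_number), sdbs))
    result = set()
    for part in partials:
        result |= part
    return result
```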
The authors address the data access and analysis issues faced by interdisciplinary Earth scientists and graduate students as a prototypical domain community that will be accessing large data sets in Earth system science in the coming decades. They present a working prototype developed at George Mason University to serve broad user needs, termed the Virtual Domain Application Data Center (VDADC). The VDADC prototype provides tools, data products, and services tailored to users and can be extended to other domain communities. The VDADC operates in a distributed environment, the World Wide Web, in close association with federated data centers. Moreover, the information technology implementation is driven by science scenarios and can apply to a variety of domain users; it reduces network traffic to the data centers by performing intelligent data searching, or "content-based browsing," prior to data ordering, thus more effectively addressing user needs.
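As a hedged illustration of "content-based browsing," the sketch below screens granule metadata against a user's science scenario before any bulk data are ordered, so only matching granules cross the network. All field names and the catalog structure are assumptions for illustration, not the VDADC's actual interfaces.

```python
# Screen a metadata catalog first; order only the granules that match.
def content_based_browse(catalog, region, variable, min_coverage=0.8):
    """Return identifiers of granules worth ordering: right variable,
    overlapping region, and enough valid (e.g. cloud-free) coverage."""
    selected = []
    for granule in catalog:                      # metadata only, no bulk data
        if granule["variable"] != variable:
            continue
        if not overlaps(granule["bbox"], region):
            continue
        if granule["valid_fraction"] < min_coverage:
            continue
        selected.append(granule["id"])
    return selected

def overlaps(a, b):
    """Axis-aligned bounding-box intersection; boxes are (lon0, lat0, lon1, lat1)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]
```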
ISBN (digital): 9783540448631
ISBN (print): 9783540401964
Some of the most challenging problems in science and engineering are being addressed by the integration of computation and science, a research field known as computational science. Computational science plays a vital role in fundamental advances in biology, physics, chemistry, astronomy, and a host of other disciplines. This is through the coordination of computation, data management, access to instrumentation, knowledge synthesis, and the use of new devices. It has an impact on researchers and practitioners in the sciences and beyond. The sheer size of many challenges in computational science dictates the use of supercomputing, parallel and distributed processing, grid-based processing, advanced visualization and sophisticated algorithms. At the dawn of the 21st century the series of International Conferences on Computational Science (ICCS) was initiated with a first meeting in May 2001 in San Francisco. The success of that meeting motivated the organization of the second meeting held in Amsterdam April 21–24, 2002, where over 500 participants pushed the research field further. The International Conference on Computational Science 2003 (ICCS 2003) is the follow-up to these earlier conferences. ICCS 2003 is unique, in that it was a single event held at two different sites almost opposite each other on the globe: Melbourne, Australia and St. Petersburg, Russian Federation. The conference ran on the same dates at both locations and all the presented work was published in a single set of proceedings, which you hold in your hands right now.
Microservice architectures are increasingly used to modularize IoT applications and deploy them in distributed and heterogeneous edge computing environments. Over time, these microservice-based IoT applications are susceptible to performance anomalies caused by resource hogging (e.g., CPU or memory), resource contention, etc., which can negatively impact their Quality of Service and violate their Service Level Agreements. Existing research on performance anomaly detection for edge computing environments focuses on model training approaches that either achieve high accuracy at the expense of a time-consuming and resource-intensive training process or prioritize training efficiency at the cost of lower accuracy. To address this gap, while considering the resource constraints and the large number of devices in modern edge platforms, we propose two clustering-based model training approaches: (1) intra-cluster parameter transfer learning-based model training (ICPTL) and (2) cluster-level model training (CM). These approaches aim to find a trade-off between the training efficiency of anomaly detection models and their accuracy. We compared the models trained under ICPTL and CM to models trained for specific devices (most accurate, least efficient) and a single general model trained for all devices (least accurate, most efficient). Our findings show that ICPTL’s model accuracy is comparable to that of the model per device approach while requiring only 40% of the training time. In addition, CM further improves training efficiency by requiring 23% less training time and reducing the number of trained models by approximately 66% compared to ICPTL, yet achieving a higher accuracy than a single general model.
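A rough sketch of the two training regimes as the abstract describes them: cluster devices by their resource-usage profiles, then either train one model per cluster on pooled data (CM) or train per-device models where each model in a cluster is warm-started from a previously trained cluster peer (ICPTL). The clustering features, the reconstruction-based detector, and the warm-start transfer step are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch of CM and ICPTL-style training for anomaly detectors.
from copy import deepcopy
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor

def cluster_devices(profiles, n_clusters=4):
    """Group devices by mean resource-usage profile (CPU, memory, ...)."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return km.fit_predict(profiles)

def train_detector(X, warm_start_from=None):
    """Reconstruction-based detector: train to reproduce normal behaviour."""
    model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=300, random_state=0)
    if warm_start_from is not None:
        # ICPTL-style parameter transfer: fine-tune a copy of a peer's model.
        model = deepcopy(warm_start_from)
        model.set_params(warm_start=True, max_iter=100)
    model.fit(X, X)
    return model

def anomaly_score(model, X):
    """Mean squared reconstruction error per sample; high error => anomaly."""
    return ((model.predict(X) - X) ** 2).mean(axis=1)

def cm_training(device_data, labels):
    """CM: one model per cluster, trained on the cluster's pooled data."""
    return {c: train_detector(np.vstack([d for d, l in zip(device_data, labels)
                                         if l == c]))
            for c in set(labels)}

def icptl_training(device_data, labels):
    """ICPTL: per-device models; within a cluster, each model after the
    first is fine-tuned from the previously trained one."""
    models, last_in_cluster = [], {}
    for X, c in zip(device_data, labels):
        m = train_detector(X, warm_start_from=last_in_cluster.get(c))
        models.append(m)
        last_in_cluster[c] = m
    return models
```

Under this reading, CM trades per-device fidelity for far fewer models, while ICPTL keeps a model per device but shortens each fit via the warm start, matching the efficiency/accuracy trade-off the abstract reports.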