ISBN (print): 9781467347532
The Charles River Analytics solution to the VAST 2012 Mini-Challenge 1 quickly gives a user insight into the overall health of a large computer network across the entire globe while simultaneously highlighting anomalies in the data that could indicate security threats. The interface is designed to convey geospatial and more detailed information without cognitive overload and to allow the user to quickly delve into details of anomalies as they discover them. Using an in-house application framework, we designed a single, integrated display that uses innovative geospatial visualization and data analysis tools to allow users to quickly and accurately understand overall system health and anomalies with minimal cognitive load.
ISBN (print): 9781479949182
The US Integrated Ocean Observing System (IOOS®) is a collaboration between Federal, State, Local, Academic, and Commercial partners to manage and/or provide access to a wide range of ocean observing assets and data feeds, including in-situ buoys, drifters, gliders, radar, satellite data, and numerical models, and to meet the needs of the ocean data community. This paper discusses the evolution of Data Management and Communications (DMAC) within IOOS, shows how the evolved DMAC will de-centralize ocean observing and enable the Regional Associations (RAs) to establish operational observing systems and create new forecast products supporting ocean, coastal, and estuarine interests, and provides an update on the status of the current system.
ISBN (print): 0769519148
Global access to storage is a common theme of Grid Computing, with access mechanisms often enforcing a major restriction on the distribution of significant applications across a computational grid. The established approach is to distribute the data with the jobs, sometimes requiring lengthy delays on job completion and the necessity for significant resource discovery to establish local data capabilities. For applications that are truly data intensive, this may render them either highly inefficient, or even incapable of using grid computing environments. In this paper we describe a different approach, where the opportunity to design a grid environment from scratch is used to build a tightly-coupled, data-oriented infrastructure that leverages deep investment in leading-edge technology to provide very high-speed, widespread access to large data storage. Results from a geographically distributed Grid established for the Supercomputing 2002 conference, using preliminary TeraGrid infrastructure, are included and show encouraging performance, including data transfer rates of over 700 MB/s using eight 1 Gb/s links from a Storage Area Network to a 10 Gb/s Wide Area Network.
ISBN (print): 9780780394858
Advances in computational science, combined with increasingly interdisciplinary and geographically distributed research teams, have led to a need to support multi-tiered, data- and meta-data-rich collaboration infrastructures. Our research addresses the interactive, remote tasks undertaken in such collaborations, which require a flexible software infrastructure able to dynamically deploy services where and when needed, and to provide data to clients in the forms they require with suitable levels of end-to-end performance. The concept of service augmentation advanced in this paper seeks to continuously adjust the differences or degrees of incompatibility between the data received and the data displayed or stored by clients. Difference adjustments occur anywhere on the paths between data providers and clients, and compatibility computations leverage all of the resources that may be brought to bear, including CPUs and GPUs on servers and additional data manipulations on server, overlay, and client nodes. A formal structure and experimental evaluations of this concept are presented with the SmartPointer scientific visualization and annotation framework, for which we show that data-driven SLAs provide improved client flexibility and the ability to maintain application-specific notions of quality of service.
ISBN (print): 9781450369213
The National Academies recommend academic institutions foster a basic understanding of data science in all undergraduates. However, data science education is not currently a graduation requirement at most colleges and universities. As a result, many graduates lack even basic knowledge of data science. To address the shortfall, academic institutions should incorporate introductory data science into general education courses. A general education IT course provides a unique opportunity to integrate data science education. Modules covering databases, spreadsheets, and presentation software, already present in many survey IT courses, teach concepts and skills needed for data science. As a result, a survey IT course can provide comprehensive introductory data science education by adding a data science module focused on modeling and evaluation, two key steps in the data science process. The module should use data science software for application, avoiding the complexities of programming and advanced math, while enabling an emphasis on conceptual understanding. We implemented a course built around these ideas and found that the course helps develop data savvy in students.
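As a rough illustration of the module's two steps, the short Python sketch below fits a model and evaluates it on held-out data with scikit-learn; it is a generic example on a bundled dataset, not material from the course described above, and in the course itself these steps would be carried out in data science software rather than code.

```python
# Hypothetical illustration of the two module steps: modeling and evaluation.
# Uses scikit-learn's bundled iris data; not taken from the course described above.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Hold out a test set so evaluation measures generalization, not memorization.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Modeling: fit a simple, interpretable classifier.
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)

# Evaluation: score the model on unseen data.
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```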
ISBN (print): 9783030325206; 9783030325190
Due to the development of the internet and intensive social-network communication, the amount of data in our society grows exponentially. In response, we need tools to discover structures in multidimensional data. In that context, dimensionality reduction techniques are useful because they make it possible to visualize high-dimensional phenomena in a low-dimensional space. Space-filling curves are an alternative to standard techniques such as principal component analysis (PCA). One interesting aspect of this alternative is the computing time required (less than half a second where PCA takes seconds). Moreover, the proposed algorithms provide results comparable with PCA in terms of data visualization. Intensive experiments are conducted to characterize this new alternative on several datasets covering complex data behaviors.
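The paper's exact algorithm is not reproduced here; the Python sketch below only illustrates the general idea under the assumption that a Morton (Z-order) curve is used: each high-dimensional point is quantized and mapped to a single key along the curve, giving a one-dimensional, locality-preserving layout that is cheap to compute.

```python
# A minimal sketch of space-filling-curve dimensionality reduction using a
# Morton (Z-order) curve; the paper's specific curve and mapping may differ.
import numpy as np

def morton_index(points, bits=8):
    """Map each row of `points` (n_samples, n_dims) to a scalar Z-order key."""
    points = np.asarray(points, dtype=float)
    mins, maxs = points.min(axis=0), points.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)
    # Quantize every dimension onto the integer grid [0, 2**bits - 1].
    grid = np.round((points - mins) / span * (2 ** bits - 1)).astype(int)
    n_dims = points.shape[1]
    keys = []
    for row in grid:
        key = 0
        # Interleave the bits of all dimensions: bit b of dimension d goes to
        # position b * n_dims + d of the Morton key.
        for b in range(bits):
            for d in range(n_dims):
                key |= ((int(row[d]) >> b) & 1) << (b * n_dims + d)
        keys.append(key)
    return np.array(keys, dtype=object)    # plain Python ints avoid 64-bit overflow

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))             # toy high-dimensional data
order = np.argsort(morton_index(X))        # 1-D ordering that preserves locality
print(order[:10])
```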
ISBN (print): 0769520278
Gaining higher-level evolutionary information about large software systems is key to validating past and adjusting future development processes. In this paper, we analyze the proximity of software features based on modification and problem report data that capture the system's evolution history. Features are instrumented and tracked, the relationships of modification and problem reports to these features are established, and the tracked features are visualized to illustrate their otherwise hidden dependencies. Our approach uncovers these hidden relationships between features via problem report analysis and presents them in an easy-to-evaluate visual form. Particular feature dependencies can then be selected to assess the feature evolution by zooming in to an arbitrary level of detail. Such visualization of interwoven features can therefore indicate locations of design erosion in the architectural evolution of a software system. Our approach has been validated on the large open-source software project Mozilla and its bug reporting system Bugzilla.
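The specific proximity measure is not given in the abstract; the hypothetical Python sketch below shows one common proxy, counting how often two features are touched by the same problem report, with toy report data standing in for a Bugzilla extract.

```python
# A hypothetical sketch of one way to derive feature proximity from problem
# reports: count how often two features are touched by the same report.
# The paper's actual instrumentation and measure may differ.
from collections import Counter
from itertools import combinations

# report id -> features whose instrumented code the report's fix touched (toy data)
report_features = {
    "bug-101": {"bookmarks", "history"},
    "bug-102": {"bookmarks", "tabs"},
    "bug-103": {"bookmarks", "history", "tabs"},
}

proximity = Counter()
for features in report_features.values():
    for a, b in combinations(sorted(features), 2):
        proximity[(a, b)] += 1          # shared reports hint at hidden coupling

for (a, b), count in proximity.most_common():
    print(f"{a} <-> {b}: co-modified in {count} report(s)")
```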
ISBN (print): 9781728172835
This paper presents a study examining the statistical correlation between wildfire and weather by mining historical spatial and temporal wildfire and climate data. Large wildfires have recently become more frequent, intense, and destructive in the western United States. The occurrence of wildfires can be determined by many human and natural factors, such as the availability of fuels, physical settings, and weather conditions, among which weather is of great interest and importance for wildfire forecasting. The availability of landscape fire data sets and weather data sets now enables the analysis of the correlation between wildfire and weather, which indicates the likelihood of wildfire for given weather conditions in a region. This paper investigates the relation between wildfire and drought conditions in California and visualizes the results using geographic information system (GIS) computing technology. Our data analysis findings show a high correlation between the normalized number of wildfires per forest unit area and drought severity, illustrating the potential of forecasting wildfire using weather data.
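As a hedged sketch of the correlation step, the Python snippet below relates a normalized wildfire count to a drought-severity index; the file names and column names are hypothetical placeholders rather than the paper's actual data sets.

```python
# A hedged sketch of the correlation step: relate the normalized number of
# wildfires per forest unit area to a drought-severity index.  File names and
# column names are hypothetical, not the paper's data sets.
import pandas as pd

fires = pd.read_csv("california_fires.csv")   # columns: region, year, n_fires, forest_area_km2
drought = pd.read_csv("drought_index.csv")    # columns: region, year, drought_severity

# Normalize fire counts by forest unit area before comparing regions.
fires["fires_per_km2"] = fires["n_fires"] / fires["forest_area_km2"]
merged = fires.merge(drought, on=["region", "year"])

# Pearson correlation between normalized fire counts and drought severity.
r = merged["fires_per_km2"].corr(merged["drought_severity"])
print(f"correlation between fire density and drought severity: {r:.2f}")
```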
ISBN (digital): 9781728127828
ISBN (print): 9781728127828
Subarachnoid hemorrhage (SAH) is a devastating neurological injury that can lead to many downstream complications, including epilepsy. Predicting who will develop epilepsy, in order to find ways to prevent it and to stratify patients for future interventions, is a major challenge given the large number of variables related not only to the injury itself but also to what happens after the injury. Extensive multimodal data are generated during the process of SAH patient care. In parallel, preclinical models are under development that attempt to imitate the variables observed in patients. Computational tools that consider all variables from both human data and animal models are lacking and demand an integrated, time-dependent platform where researchers can aggregate, store, visualize, analyze, and share the extensive integrated multimodal information. We developed a multi-tier web-based application that is secure, extensible, and adaptable to all available data modalities using the Flask micro web framework, Python, and a PostgreSQL database. The system supports data visualization, data sharing, and downloading for offline processing. The system is currently hosted inside the institutional private network and holds approximately 14 TB of data from 164 patients and 71 rodents.
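A minimal sketch of the kind of Flask-plus-PostgreSQL tier described above is shown below; the endpoint, table, and connection details are hypothetical placeholders, not the deployed system's API.

```python
# A minimal sketch of a Flask endpoint backed by PostgreSQL, in the spirit of
# the multi-tier application described above.  The table name, columns, and
# connection settings are hypothetical placeholders.
from flask import Flask, jsonify
import psycopg2

app = Flask(__name__)

def get_connection():
    # Credentials would normally come from configuration, not source code.
    return psycopg2.connect(host="localhost", dbname="sah_registry",
                            user="app", password="secret")

@app.route("/patients/<int:patient_id>/recordings")
def list_recordings(patient_id):
    """Return the multimodal recordings registered for one patient."""
    with get_connection() as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT modality, recorded_at, file_path "
            "FROM recordings WHERE patient_id = %s ORDER BY recorded_at",
            (patient_id,))
        rows = cur.fetchall()
    return jsonify([
        {"modality": m, "recorded_at": t.isoformat(), "file_path": p}
        for m, t, p in rows
    ])

if __name__ == "__main__":
    app.run(debug=True)
```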
ISBN (print): 9781509033980
Recently, the Operation and Asset Management Department (OAMD) has paid increasing attention to the accumulation and analysis of operation data. To support the OAMD in decision making and to help asset management transition from "experience dependence" to "data support", this paper analyzes defect data for circuit breakers with voltage levels above 110 kV from the perspectives of voltage level, operating department, manufacturer, service time, and mechanism type. Through data visualization, the distribution of circuit-breaker defect rates is first obtained. After that, the common defect types of different manufacturers are summarized, and a statistical analysis of replaced circuit-breaker components is made to assist the OAMD with spare parts. In the end, a replacement strategy for circuit breakers is studied, and it is suggested that the most efficient scheme is to replace 1% of circuit breakers older than 17 years.
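A hedged Python sketch of the defect-rate breakdown follows; the CSV layout and column names are hypothetical placeholders rather than the utility's actual records.

```python
# A hedged sketch of the defect-rate breakdown described above; the files and
# columns are hypothetical placeholders, not the utility's real data.
import pandas as pd
import matplotlib.pyplot as plt

# One row per installed breaker: breaker_id, manufacturer, voltage_kv,
# service_years, mechanism_type (hypothetical layout).
breakers = pd.read_csv("circuit_breakers.csv")
# One row per recorded defect: breaker_id, defect_type (hypothetical layout).
defects = pd.read_csv("defect_records.csv")

# Defect rate per manufacturer = recorded defects / installed breakers.
installed = breakers.groupby("manufacturer").size()
recorded = defects.merge(breakers, on="breaker_id").groupby("manufacturer").size()
defect_rate = (recorded / installed).fillna(0).sort_values(ascending=False)

defect_rate.plot(kind="bar")
plt.ylabel("defects per installed breaker")
plt.tight_layout()
plt.savefig("defect_rate_by_manufacturer.png")
```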