The ability to store, organize, process and distribute experimental data effectively, efficiently and securely is particularly important for large user facilities like the Advanced Photon Source (APS). In this article, the deployment of the APS Data Management System (DM) at the 1-ID and 6-BM beamlines of the APS is described. These two beamlines support a wide range of experimental techniques and generate data at relatively high rates, making them ideal candidates for illustrating the deployment and customization of the DM system and its tools. Several usage examples from these beamlines illustrate the various capabilities of the DM system.
Multi-OMICS approaches aim at the integration of quantitative data obtained for different biological molecules in order to understand their interrelation and the functioning of larger systems. This paper deals with several data integration and data processing issues that frequently occur within this context. To this end, the data processing workflow within the PROFILE project is presented, a multi-OMICS project that aims at the identification of novel biomarkers and the development of new therapeutic targets for seven important liver diseases. Furthermore, a software tool called CrossPlatformCommander is sketched, which facilitates several steps of the proposed workflow in a semi-automatic manner. Application of the software is presented for the detection of novel biomarkers, their ranking and their annotation with existing knowledge, using the example of corresponding Transcriptomics and Proteomics data sets obtained from patients suffering from hepatocellular carcinoma. Additionally, a linear regression analysis of Transcriptomics vs. Proteomics data is presented and its performance assessed. It was shown that, for capturing profound relations between Transcriptomics and Proteomics data, a simple linear regression analysis is not sufficient, and the implementation and evaluation of alternative statistical approaches are needed. Additionally, the integration of multivariate variable selection and classification approaches is intended for further development of the software. Although this paper focuses only on the combination of data obtained from quantitative Proteomics and Transcriptomics experiments, several approaches and data integration steps are also applicable to other OMICS technologies. Keeping specific restrictions in mind, the suggested workflow (or at least parts of it) may be used as a template for similar projects that make use of different high-throughput techniques. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era.
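As a simple illustration of the kind of gene-wise linear regression referred to above, the sketch below fits log-scale protein abundance against matched mRNA abundance and reports the coefficient of determination; the data are randomly generated stand-ins and the analysis is far simpler than what CrossPlatformCommander performs.

```python
# Minimal sketch of a Transcriptomics-vs-Proteomics linear regression of the
# kind discussed above; the data and the coupling strength are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Matched log2 abundances for the same genes (illustrative example data):
# mRNA levels from Transcriptomics, protein levels from Proteomics.
log_mrna = rng.normal(8.0, 2.0, size=500)
log_protein = 0.6 * log_mrna + rng.normal(0.0, 1.5, size=500)  # weak coupling

# Ordinary least-squares fit: protein ~ slope * mRNA + intercept
slope, intercept = np.polyfit(log_mrna, log_protein, deg=1)
predicted = slope * log_mrna + intercept

# Coefficient of determination as a simple performance measure
ss_res = np.sum((log_protein - predicted) ** 2)
ss_tot = np.sum((log_protein - log_protein.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.2f}")
```

A low R-squared in such a fit is exactly the situation the abstract describes: simple linear regression does not capture the deeper relations between transcript and protein levels.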
Due to the increasing volume of data to be analyzed and the need for global collaborations, many scientific applications have been deployed in a geo-distributed manner. Scientific workflows provide a good model for running and managing geo-distributed scientific data analytics. However, due to the multi-level data privacy requirements in geo-distributed data centers (DCs), as well as the costly and heterogeneous inter-DC network performance, executing scientific workflows efficiently in such a geo-distributed environment is not easy. In this paper, we propose a privacy-preserving workflow scheduling algorithm named PPPS, which aims at minimizing the inter-DC data transfer time for workflows while satisfying data privacy requirements. We compare PPPS with five state-of-the-art workflow scheduling algorithms using Windows Azure cloud performance traces and real scientific workflows. Experimental results show that PPPS can reduce workflow execution time by up to 93% compared to the other algorithms while satisfying complicated data privacy constraints. (C) 2021 Elsevier B.V. All rights reserved.
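PPPS itself is not reproduced here; the sketch below only illustrates the general idea of privacy-constrained, transfer-aware task placement, with hypothetical data-center bandwidths, task input sizes and privacy-allowed DC sets.

```python
# Illustrative greedy placement in the spirit of privacy-constrained,
# transfer-aware workflow scheduling; this is NOT the PPPS algorithm itself,
# and all sizes, bandwidths and privacy labels below are hypothetical.

# Inter-DC bandwidth in MB/s (symmetric, hypothetical measurements)
bandwidth = {("dc1", "dc2"): 40.0, ("dc2", "dc1"): 40.0,
             ("dc1", "dc3"): 15.0, ("dc3", "dc1"): 15.0,
             ("dc2", "dc3"): 25.0, ("dc3", "dc2"): 25.0}

def transfer_time(src, dst, size_mb):
    """Time to move a dataset between two data centers (0 if co-located)."""
    return 0.0 if src == dst else size_mb / bandwidth[(src, dst)]

# Each task: input datasets (location, size in MB) and the set of DCs that
# satisfy its data-privacy requirements.
tasks = {
    "align":   {"inputs": [("dc1", 800.0)], "allowed": {"dc1", "dc2"}},
    "filter":  {"inputs": [("dc2", 300.0)], "allowed": {"dc2", "dc3"}},
    "combine": {"inputs": [("dc1", 800.0), ("dc2", 300.0)], "allowed": {"dc2"}},
}

placement = {}
for name, task in tasks.items():
    # Pick the allowed DC that minimizes total inter-DC input transfer time.
    best_dc = min(task["allowed"],
                  key=lambda dc: sum(transfer_time(src, dc, size)
                                     for src, size in task["inputs"]))
    placement[name] = best_dc

print(placement)
```

The point of the sketch is the interplay of the two constraints: the privacy-allowed set prunes candidate DCs first, and only then is the transfer-time objective minimized over what remains.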
Aim: Palaeoecological data are crucial for comprehending large-scale biodiversity patterns and the natural and anthropogenic drivers that influence them over time. Over the last decade, the availability of open-access research databases of palaeoecological proxies has substantially increased. These databases open the door to research questions needing advanced numerical analyses and modelling based on big-data compilations. However, compiling and analysing palaeoecological data pose unique challenges that require a guide for producing standardized and reproducible compilations. Innovation: We present a step-by-step guide for processing fossil pollen data into a standardized dataset compilation ready for macroecological and palaeoecological analyses. We describe successive criteria that will enhance the quality of the compilations. Though these criteria are project- and research-question-dependent, we discuss the most important assumptions that should be considered and adjusted accordingly. Our guide is accompanied by an R workflow, called FOSSILPOL, and a corresponding R package, called R-Fossilpol, that provide a detailed protocol ready for interdisciplinary users. We illustrate the workflow by sourcing and processing Scandinavian fossil pollen datasets and show the reproducibility of continental-scale data processing. Main Conclusions: The study of biodiversity and macroecological patterns through time and space requires large-scale syntheses of palaeoecological datasets. The data preparation for such syntheses must be transparent and reproducible. With our FOSSILPOL workflow and R package, we provide a protocol for optimal handling of large compilations of fossil pollen datasets and for workflow reproducibility. Our workflow is also relevant for the compilation and synthesis of other palaeoecological proxies and as such offers a guide for synthetic and cross-disciplinary analyses with macroecological, biogeographical and palaeoecological perspectives. However, we emphasize that ...
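FOSSILPOL and R-Fossilpol are R tools; to keep the code examples in this compilation in a single language, the sketch below illustrates in Python/pandas the kind of project-dependent, record-level filtering criteria (age limits, minimum pollen counts, chronology-control requirements) such a compilation step applies. All column names and thresholds are hypothetical stand-ins, not the package's actual defaults.

```python
# Illustrative sketch of record-level filtering criteria a fossil pollen
# compilation might apply before analysis; columns and thresholds are made up.
import pandas as pd

samples = pd.DataFrame({
    "dataset_id":     [101, 101, 102, 103, 103],
    "age_cal_bp":     [150, 9500, 21000, 300, 4800],   # calibrated years BP
    "pollen_count":   [420, 180, 90, 510, 260],        # grains per sample
    "n_chron_points": [6, 6, 2, 8, 8],                 # chronology controls
})

# Project-dependent criteria (to be adjusted per research question):
MAX_AGE      = 12000   # keep Holocene-aged samples only
MIN_COUNT    = 150     # minimum pollen sum per sample
MIN_CONTROLS = 3       # minimum chronology control points per dataset

filtered = samples[
    (samples["age_cal_bp"] <= MAX_AGE)
    & (samples["pollen_count"] >= MIN_COUNT)
    & (samples["n_chron_points"] >= MIN_CONTROLS)
]
print(filtered)
```

Keeping such thresholds explicit and version-controlled is what makes a continental-scale compilation reproducible rather than an ad hoc selection.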
ISBN: (Print) 9783031210464; 9783031210471
In business applications, data integration is typically implemented as a data warehouse architecture. In this architecture, heterogeneous and distributed data sources are accessed and integrated by means of Extract-Transform-Load (ETL) processes. Designing these processes is challenging due to the heterogeneity of data models and formats, data errors and missing values, and multiple data pieces representing the same real-world objects. As a consequence, ETL processes are very complex, which results in high development and maintenance costs as well as long runtimes. To ease the development of ETL processes, various research and technological solutions were developed. They include, among others: (1) ETL design methods, (2) data cleaning pipelines, (3) data deduplication pipelines, and (4) performance optimization techniques. In spite of the fact that these solutions were included in commercial (and some open-license) ETL design environments and ETL engines, there still exist multiple open issues and the existing solutions still need to advance. In this paper (and its accompanying talk), I will provoke a discussion on what problems one can encounter while implementing ETL pipelines in real business (industrial) projects. The presented findings are based on my experience from research and commercial data integration projects in the financial, healthcare, and software development sectors. In particular, I will focus on a few particular issues, namely: (1) performance optimization of ETL processes, (2) cleaning and deduplicating large row-like data sets, and (3) integrating medical data.
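The sketch below is a deliberately minimal extract-clean-deduplicate-load pipeline meant only to illustrate the cleaning and deduplication issues mentioned above; the record layout and the name-based matching rule are hypothetical simplifications of what production ETL tools do.

```python
# Minimal extract-clean-deduplicate-load sketch; sources, schema and the
# matching key are hypothetical simplifications.
from typing import Iterable

def extract() -> Iterable[dict]:
    """Stand-in for reading from heterogeneous sources."""
    return [
        {"name": " ACME Corp ", "city": "Boston", "revenue": "120"},
        {"name": "acme corp",   "city": "Boston", "revenue": None},
        {"name": "Globex",      "city": None,     "revenue": "300"},
    ]

def transform(rows: Iterable[dict]) -> list[dict]:
    """Clean values and collapse records describing the same real-world object."""
    deduped: dict[str, dict] = {}
    for row in rows:
        key = row["name"].strip().lower()          # simple matching key
        clean = {
            "name": row["name"].strip(),
            "city": row["city"] or "UNKNOWN",      # handle missing values
            "revenue": float(row["revenue"]) if row["revenue"] else 0.0,
        }
        # Keep the record with the more complete revenue figure.
        if key not in deduped or deduped[key]["revenue"] == 0.0:
            deduped[key] = clean
    return list(deduped.values())

def load(rows: list[dict]) -> None:
    """Stand-in for writing into the data warehouse."""
    for row in rows:
        print(row)

load(transform(extract()))
```

Even this toy version shows where the costs come from: every rule about trimming, defaulting and record matching has to be encoded, tested and maintained, and on large row-like data sets each of these steps also becomes a performance concern.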
Simple light isotope metabolic labeling (SLIM labeling) is an innovative method to quantify variations in the proteome based on an original in vivo labeling strategy. Heterotrophic cells grown in U-[C-12] as the sole source of carbon synthesize U-[C-12]-amino acids, which are incorporated into proteins, giving rise to U-[C-12]-proteins. This results in a large increase in the intensity of the monoisotope ion of peptides and proteins, thus allowing higher identification scores and protein sequence coverage in mass spectrometry experiments. This method, initially developed for signal processing and quantification of the incorporation rate of C-12 into peptides, was based on a multistep process that was difficult for many laboratories to implement. To overcome these limitations, we developed a new theoretical background to analyze bottom-up proteomics data using SLIM labeling (bSLIM) and established simple procedures based on open-source software, using dedicated OpenMS modules and embedded R scripts, to process the bSLIM experimental data. These new tools allow computation of both the C-12 abundance in peptides, to follow the kinetics of protein labeling, and the molar fraction of unlabeled and C-12-labeled peptides in multiplexing experiments, to determine the relative abundance of proteins extracted under different biological conditions. They also make it possible to consider incomplete C-12 labeling, such as that observed in cells with nutritional requirements for nonlabeled amino acids. These tools were validated on an experimental dataset produced using various strains of the yeast Saccharomyces cerevisiae and various growth conditions. The workflows are built on the implementation of appropriate calculation modules in a KNIME working environment. These new integrated tools provide a convenient framework for the wider use of the SLIM-labeling strategy.
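The arithmetic behind interpreting the enlarged monoisotopic peak can be sketched with a simple carbon-only binomial model: the monoisotopic fraction of a peptide with N carbons is (1 - p13)^N, and the molar fraction of labeled peptide in a two-component mixture follows from the observed value. The residual 13C abundance, the carbon count and the model itself are illustrative assumptions; the published bSLIM workflow relies on OpenMS modules and embedded R scripts within KNIME rather than on this script.

```python
# Back-of-the-envelope sketch of the isotope arithmetic behind SLIM labeling;
# the simple binomial (carbon-only) model and the numbers are illustrative.

NATURAL_13C = 0.0107    # natural 13C abundance
LABELED_13C = 0.0010    # residual 13C in U-[12C]-grown cells (assumed value)

def mono_fraction(n_carbons: int, p13: float) -> float:
    """Probability that a peptide with n_carbons carries no 13C atom,
    i.e. the share of signal in the monoisotopic peak (carbon-only model)."""
    return (1.0 - p13) ** n_carbons

def labeled_molar_fraction(observed_m0: float, n_carbons: int) -> float:
    """Solve observed_m0 = x * M0_labeled + (1 - x) * M0_natural for x,
    the molar fraction of the 12C-labeled peptide in a two-component mix."""
    m0_nat = mono_fraction(n_carbons, NATURAL_13C)
    m0_lab = mono_fraction(n_carbons, LABELED_13C)
    return (observed_m0 - m0_nat) / (m0_lab - m0_nat)

# Example: a tryptic peptide with 60 carbon atoms
print(f"natural M0: {mono_fraction(60, NATURAL_13C):.3f}")   # ~0.52
print(f"labeled M0: {mono_fraction(60, LABELED_13C):.3f}")   # ~0.94
print(f"mix at observed M0=0.80 -> labeled fraction "
      f"{labeled_molar_fraction(0.80, 60):.2f}")
```

The jump from roughly half of the signal in the monoisotopic peak to almost all of it is what drives the higher identification scores; allowing a nonzero residual 13C value is the simple way the model accommodates incomplete labeling.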
During recent years, we have observed a widespread emergence of new data sources, especially all types of social media and IoT devices, which produce huge data volumes whose content ranges from fully structured to totally unstructured ...
In mass spectrometry-based proteomics, peptides are typically identified from tandem mass spectra using spectrum comparison. A sequence search engine compares experimentally obtained spectra with those predicted from protein sequences, applying enzyme cleavage and fragmentation rules. There are two main alternatives to this approach: spectral libraries and de novo sequencing. The former compares measured spectra with a collection of previously acquired and identified spectra in a library. De novo sequencing attempts to derive peptide sequences from the tandem mass spectra alone. Here we present a theoretical framework and a data processing workflow for visualizing and comparing the results of these different types of algorithms. The method considers the three search strategies as different dimensions, identifies distinct agreement classes and visualizes the complementarity of the search strategies. We have included X! Tandem, SpectraST and PepNovo, as they are in common use and representative of algorithms of each type. Our method allows advanced investigation of how the three search methods perform relative to each other and shows the impact of the currently used decoy sequences for evaluating false discovery rates.
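A minimal way to picture the agreement-class idea is to bin each spectrum by which engines identified it and whether their peptide calls coincide; the sketch below does exactly that with made-up spectra and peptides, and is much cruder than the three-dimensional comparison described in the paper.

```python
# Sketch of binning per-spectrum results from three search strategies into
# agreement classes; engine names follow the paper, but the spectrum IDs and
# peptide sequences are made up.

# Peptide assigned to each spectrum by each strategy (None = no identification)
xtandem   = {"s1": "PEPTIDEK", "s2": "LMNSTR", "s3": None,      "s4": "AGHK"}
spectrast = {"s1": "PEPTIDEK", "s2": "LMNSTR", "s3": "QWERTYK", "s4": None}
pepnovo   = {"s1": "PEPTIDEK", "s2": "LMNPTR", "s3": "QWERTYK", "s4": None}

def agreement_class(spectrum: str) -> str:
    calls = {"X!Tandem": xtandem.get(spectrum),
             "SpectraST": spectrast.get(spectrum),
             "PepNovo": pepnovo.get(spectrum)}
    made = {engine: pep for engine, pep in calls.items() if pep is not None}
    if not made:
        return "unidentified"
    if len(made) == 1:
        return f"identified by {next(iter(made))} only"
    if len(set(made.values())) == 1:
        return "agreement: " + " + ".join(sorted(made))
    return "conflicting identifications"

for spectrum in ("s1", "s2", "s3", "s4"):
    print(spectrum, "->", agreement_class(spectrum))
```

Counting spectra per class across a whole dataset is what reveals the complementarity of the strategies, e.g. how many peptides only de novo sequencing recovers.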
Data-intensive and long-lasting applications running in the form of workflows are being increasingly dispatched to cloud computing systems. Current scheduling approaches for graphs of dependencies fail to deliver high resource efficiency while keeping computation costs low, especially for continuous data processing workflows, where the scheduler does not perform any reasoning about the impact new input data may have on the workflow's final output. To face this challenge, we introduce a new scheduling criterion, Quality-of-Data (QoD), which describes the requirements on the data that make the triggering of tasks in workflows worthwhile. Based on the QoD notion, we propose a novel service-oriented scheduler planner for continuous data processing workflows that is capable of enforcing QoD constraints and guiding the scheduling to attain resource efficiency, overall controlled performance and task prioritization. To contrast the advantages of our scheduling model against others, we developed WaaS (Workflow-as-a-Service), a workflow coordinator system for the Cloud in which data is shared among tasks via a cloud columnar database.
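The Quality-of-Data notion can be pictured as a guard that lets a downstream task fire only once enough new or sufficiently different input has accumulated, or once results have grown too stale; the threshold dimensions and the guard class below are illustrative assumptions, not the WaaS implementation.

```python
# Sketch of QoD-style triggering: a downstream task only fires when newly
# arrived input is "worth" reprocessing. The three dimensions used here
# (volume, value change, staleness) are illustrative choices.
import time

class QoDGuard:
    def __init__(self, min_new_rows: int, min_delta: float, max_age_s: float):
        self.min_new_rows = min_new_rows   # how much new data must accumulate
        self.min_delta = min_delta         # how much aggregate values must move
        self.max_age_s = max_age_s         # never let results grow too stale
        self.pending_rows = 0
        self.pending_delta = 0.0
        self.last_fire = time.monotonic()

    def observe(self, n_rows: int, delta: float) -> None:
        """Record newly written input for the guarded task."""
        self.pending_rows += n_rows
        self.pending_delta += abs(delta)

    def should_trigger(self) -> bool:
        stale = time.monotonic() - self.last_fire > self.max_age_s
        return (self.pending_rows >= self.min_new_rows
                or self.pending_delta >= self.min_delta
                or stale)

    def fired(self) -> None:
        """Reset the accumulated slack after the task has been scheduled."""
        self.pending_rows, self.pending_delta = 0, 0.0
        self.last_fire = time.monotonic()

guard = QoDGuard(min_new_rows=1000, min_delta=50.0, max_age_s=300.0)
guard.observe(n_rows=400, delta=12.5)
print(guard.should_trigger())   # False: not enough new data yet
guard.observe(n_rows=700, delta=3.0)
print(guard.should_trigger())   # True: row threshold crossed
```

Relaxing or tightening such thresholds is the lever that trades result freshness for resource efficiency, which is precisely the trade-off the QoD scheduler exposes.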