With the development of mobile Internet technology, the Internet of Things, cloud computing, and other information technologies, big data is increasingly widely used in everyday life. Big data refers to the analysis and processing of all available data rather than of samples. Guided by big data's four V characteristics, Volume (large scale), Velocity (high speed), Variety (diverse types), and Value (low value density), this paper explores its role in data processing and analysis and probes as-yet-unexplored areas of the field. At present there is no mature application of big data analysis methods in chemical data processing and analysis; a concrete operational workflow therefore needs to be derived from existing application examples combined with the characteristics of analytical chemistry data. The emphasis of this paper is to survey the current applications and characteristics of big data analysis across various fields and to explore the concrete workflow and effects of applying big data analysis to the processing and analysis of analytical chemistry data.
Ensuring that formwork systems are properly installed is essential for construction safety and quality. These systems must comply with specific design requirements and meet strict tolerances for the installation of their individual members. Current quality control during installation relies largely on manual measuring tools and human inspection, which can lead to inconsistent and inaccurate results. This study proposes a way to automate the inspection process and presents a framework for measuring the spacing of the different members of a formwork system using 3D point cloud data. The point cloud data are preprocessed, processed, and analyzed with various techniques, including filtering, downsampling, transformation, fitting, and clustering. The novelty lies not only in the integration of these techniques but also in the detection and measurement of key members of the formwork system with limited human intervention. The proposed framework was tested on a real construction site, where five cases were investigated to compare the proposed approach with the traditional manual one. The results indicate that this approach is a promising solution and could be an effective alternative to manual inspection for quality control during formwork installation.
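The abstract names the processing stages but not an implementation. A minimal sketch of such a filter, downsample, plane-fit, and cluster pipeline, built on the open-source Open3D library (the file name, voxel size, thresholds, and the choice of measuring spacing along the x-axis are illustrative assumptions, not the authors' settings):

    import numpy as np
    import open3d as o3d  # assumed tooling; the paper does not name its libraries

    # Load a site scan (file name is a placeholder).
    pcd = o3d.io.read_point_cloud("formwork_scan.ply")

    # Downsample to a workable density and remove statistical outliers.
    pcd = pcd.voxel_down_sample(voxel_size=0.02)  # 2 cm voxels (illustrative)
    pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

    # Fit and strip the dominant plane (e.g., the deck surface) via RANSAC.
    _, inliers = pcd.segment_plane(distance_threshold=0.01,
                                   ransac_n=3, num_iterations=1000)
    members = pcd.select_by_index(inliers, invert=True)

    # Cluster the remaining points into candidate members with DBSCAN.
    labels = np.array(members.cluster_dbscan(eps=0.05, min_points=30))

    # Estimate member spacing from cluster centroids along one axis.
    pts = np.asarray(members.points)
    centroids = [pts[labels == k].mean(axis=0) for k in range(labels.max() + 1)]
    xs = sorted(c[0] for c in centroids)
    print("estimated member spacings (m):", np.round(np.diff(xs), 3))

On a real scan, the plane-segmentation and clustering parameters would need tuning to the member dimensions and scanner noise.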
Clinical proteomics studies aiming to develop markers of clinical outcome or disease typically involve distinct discovery and validation stages, neither of which focuses on the clinical applicability of the candidate markers studied. Our clinically useful selection of proteins (CUSP) protocol proposes a rational approach, with statistical and non-statistical components, to identify the proteins entering the validation phase that could be the most effective markers of disease or clinical outcome. Additionally, the protocol considers commercially available analysis methods for each selected protein so that a prospective marker can be readily translated into clinical practice. Significance: when developing proteomic markers of clinical outcomes, there is currently no consideration at the validation stage of how such markers would be implemented in a clinical setting. Several studies have identified this as a limitation to the progression of research findings from proteomics studies. When integrated into a proteomic workflow, the CUSP protocol allows for a strategically designed validation study that improves researchers' ability to translate discovery-based proteomics findings into clinical practice.
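The abstract describes the protocol only at a high level. Purely as an illustration of combining a statistical filter with a clinical-availability check, a selection step might look like the following (column names, thresholds, and the availability flags are hypothetical, not part of CUSP):

    import numpy as np
    import pandas as pd

    # Hypothetical discovery-stage results: one row per candidate protein.
    candidates = pd.DataFrame({
        "protein":              ["P1", "P2", "P3", "P4"],
        "adj_p_value":          [0.001, 0.20, 0.004, 0.03],
        "fold_change":          [2.5, 1.1, 0.4, 1.9],
        "has_commercial_assay": [True, True, False, True],  # e.g., ELISA kit
    })

    # Statistical component: significant and substantially changed proteins.
    stat_pass = (candidates["adj_p_value"] < 0.05) & \
                (np.abs(np.log2(candidates["fold_change"])) > 1)

    # Non-statistical component: require a clinically deployable assay.
    selected = candidates[stat_pass & candidates["has_commercial_assay"]]
    print(selected["protein"].tolist())  # -> ['P1']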
The human gut microbiome plays a vital role in preserving individual health and is intricately involved in essential functions. Imbalances, or dysbiosis, within the microbiome can significantly impact human health and are associated with many diseases. Several metaproteomics platforms are currently available for studying microbial proteins within complex microbial communities. In this study, we aimed to develop an integrated pipeline providing deeper insights into both the taxonomic and functional aspects of cultivated human gut microbiomes derived from clinical colon biopsies. We combined a rapid peptide search by MSFragger against the Unified Human Gastrointestinal Protein database with taxonomic and functional analyses using Unipept Desktop and MetaLab-MAG. Across seven samples, we identified nearly 36,000 unique peptides and matched them to approximately 300 species and 11 phyla. Unipept Desktop provided gene ontology, InterPro entries, and enzyme commission number annotations, facilitating the identification of relevant metabolic pathways. MetaLab-MAG contributed functional annotations through Clusters of Orthologous Genes and Non-supervised Orthologous Groups categories. These results unveiled functional similarities and differences among the samples. This integrated pipeline holds the potential to provide deeper insights into the taxonomy and functions of the human gut microbiome for interrogating the intricate connections between microbiome balance and disease.
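Unipept Desktop is a GUI tool, but the same peptide-to-taxon mapping can be illustrated against Unipept's public REST API; the endpoint and field names below follow the Unipept API documentation and should be treated as assumptions, and the peptides are merely examples:

    import requests

    # Example tryptic peptides, standing in for the MSFragger output.
    peptides = ["AALESTLAETETR", "MDGTEYIIVK"]

    # pept2lca maps each peptide to the lowest common ancestor (LCA) of
    # all organisms known to contain it.
    resp = requests.get(
        "https://api.unipept.ugent.be/api/v2/pept2lca.json",
        params={"input[]": peptides, "equate_il": "true"},
        timeout=30,
    )
    resp.raise_for_status()

    for hit in resp.json():
        print(hit["peptide"], "->", hit["taxon_name"], f"({hit['taxon_rank']})")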
Machine learning (ML) and deep learning (DL) models for peptide property prediction, such as Prosit, have enabled the creation of high-quality in silico reference libraries. These libraries are used in various applications, ranging from data-independent acquisition (DIA) data analysis to data-driven rescoring of search engine results. Here, we present Oktoberfest, an open-source Python package of our spectral library generation and rescoring pipeline, originally available online only via ProteomicsDB. Oktoberfest is largely search engine agnostic and provides access to online peptide property predictions, promoting the adoption of state-of-the-art ML/DL models in proteomics analysis pipelines. We demonstrate its ability to reproduce and even improve our results from previously published rescoring analyses on two distinct use cases. Oktoberfest is freely available on GitHub () and can be installed locally through the cross-platform Python package on PyPI.
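Rescoring pipelines of this kind score the agreement between predicted and observed fragment spectra; one metric widely used with Prosit-style predictions is the normalized spectral contrast angle. A self-contained sketch of that metric (this is not Oktoberfest's API, and matching fragment ions to vector positions is assumed to happen upstream):

    import numpy as np

    def spectral_angle(observed: np.ndarray, predicted: np.ndarray) -> float:
        # Normalized spectral contrast angle between two aligned intensity
        # vectors: 1.0 means identical, 0.0 means orthogonal.
        a = observed / np.linalg.norm(observed)
        b = predicted / np.linalg.norm(predicted)
        cos_sim = np.clip(np.dot(a, b), -1.0, 1.0)
        return 1.0 - 2.0 * np.arccos(cos_sim) / np.pi

    # Illustrative intensities for the same set of annotated fragment ions.
    obs = np.array([0.0, 0.5, 1.0, 0.2])
    pred = np.array([0.1, 0.4, 0.9, 0.3])
    print(round(spectral_angle(obs, pred), 3))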
Owing to uncertainty in the operation of the sintering process, a single-model prediction of the drum index is prone to unpredictable errors, which undermines the reliability of its results. Accurate and reliable prediction of the drum index can help improve it. In this paper, a prediction interval estimation method for the drum index based on a light gradient boosting machine (LightGBM) and kernel density estimation (KDE) is proposed. LightGBM provides accurate point predictions of the drum index, and the KDE method is then used to estimate a prediction interval around them. Comparison with other methods shows that LightGBM has high prediction performance and that KDE quantifies the prediction error of the drum index well, which verifies the effectiveness of the combined LightGBM and KDE interval estimation method and provides more reliable decision-making information for the optimisation of sintering process parameters.
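A minimal sketch of this two-stage scheme on synthetic data follows; the hyperparameters, synthetic features, and 95% coverage level are illustrative assumptions, not the paper's configuration:

    import numpy as np
    import lightgbm as lgb
    from scipy.stats import gaussian_kde
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for sintering-process features and drum index targets.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 8))
    y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.3, size=2000)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    # 1) Point prediction with LightGBM.
    model = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05)
    model.fit(X_tr, y_tr)

    # 2) Fit a KDE to validation residuals to model the error distribution.
    residuals = y_val - model.predict(X_val)
    kde = gaussian_kde(residuals)

    # 3) Prediction interval: point prediction plus residual quantiles,
    #    obtained here by sampling from the fitted KDE.
    samples = kde.resample(10000, seed=1).ravel()
    lo, hi = np.percentile(samples, [2.5, 97.5])
    y_hat = model.predict(X_val[:1])[0]
    print(f"95% interval: [{y_hat + lo:.2f}, {y_hat + hi:.2f}] around {y_hat:.2f}")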
Reproducibility is at the heart of science. However, most published results lack the information necessary to be independently reproduced. Worse still, most authors would be unable to reproduce their own results from a few years ago, for want of a gapless record of every processing and analysis step, including all parameters involved. There is only one way to overcome this problem: developing robust tools for data analysis that, while maintaining maximum flexibility in their application, allow the user to perform advanced processing steps in a scientifically sound way. At the same time, the only viable approach for reproducible and traceable analysis is to relieve the user of the responsibility for logging all processing steps and their parameters. This can only be achieved by using a system that takes care of these crucial though often neglected tasks. Here, we present a solution to this problem: a framework for the analysis of spectroscopic data (ASpecD), written in the Python programming language, that can be used without any actual programming. The framework is available open source and free of charge and focuses on usability, small footprint, and modularity while ensuring reproducibility and good scientific practice. Furthermore, we present a set of best practices and design rules for scientific software development and data analysis. Together, this empowers scientists to focus on their research, minimising the need to implement complex software tools while ensuring full reproducibility. We anticipate this to have a major impact on reproducibility and good scientific practice, as we raise awareness of their importance, summarise proven best practices, and present a working, user-friendly software solution.
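The core mechanism the abstract describes, automatically recording every processing step together with its parameters inside the dataset, can be sketched generically; this is an illustrative pattern, not ASpecD's actual API:

    import datetime
    from dataclasses import dataclass, field

    @dataclass
    class HistoryRecord:
        step: str
        parameters: dict
        timestamp: str

    @dataclass
    class Dataset:
        data: list
        history: list = field(default_factory=list)

        def process(self, step_name, func, **params):
            # Apply a processing step and log it with all parameters so the
            # full analysis can later be replayed from the history alone.
            self.data = func(self.data, **params)
            self.history.append(HistoryRecord(
                step=step_name,
                parameters=params,
                timestamp=datetime.datetime.now().isoformat(),
            ))

    ds = Dataset(data=[1.0, 2.0, 3.0])
    ds.process("scale", lambda d, factor: [x * factor for x in d], factor=2.0)
    for record in ds.history:
        print(record.step, record.parameters)  # -> scale {'factor': 2.0}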
Traditional Chinese medicine (TCM) is a clinically oriented discipline in which real-world clinical practice plays a significant role in both the development of clinical therapy and theoretical research. The large-scale clinical data generated during the daily clinical operations of TCM provide a highly valuable knowledge source for clinical decision making. Secondary analysis of these data is a vital task for TCM clinical studies before randomised controlled trials are conducted. In this article, we discuss the challenges and issues in the processing and analysis of real-world TCM clinical data, such as structured data curation, data preprocessing and quality, large-scale data management, and complex data analysis requirements. Furthermore, we discuss related state-of-the-art research and solutions in China. We show that a clinical data warehouse based on the collection of structured electronic medical record data and clinical terminology is a promising approach for generating clinical hypotheses and aiding the discovery of clinical knowledge from large-scale real-world TCM clinical data. Copyright (c) 2011 John Wiley & Sons, Ltd.
Publishing supporting data imposes a significant burden on researchers' productivity, especially in experiments requiring extensive tracking of data, processing steps, parameters, and outputs. A managed workflow environment, combined with RO-Crates, addresses these data management challenges. Workflows provide an alternative for handling complex data analyses by orchestrating various processing tools. The RO-Crate format, a community-driven proposal for packaging data, provenance, and workflows, facilitates publishing and reproducibility. The Galaxy workflow management system integrates workflows and RO-Crates, enabling the export of analyses that can be shared and restored by other users. Using Galaxy, we demonstrate how to improve support for reproducibility. We tested our approach by designing an experiment using diverse supporting data from selected papers. In the experiment, we identified specific FAIRness and completeness issues hindering the reproduction of results, even when authors made significant efforts to document and publish their supporting data. In comparison, the proposed approach supports reproducibility by packaging datasets in RO-Crate format, streamlining the process. The Galaxy RO-Crates, published as supporting materials, enhance data sharing, transparency, and reproducibility, thus supporting the advancement of FAIR research practices in catalysis research.
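For readers outside Galaxy, the packaging step can be illustrated with the ro-crate-py library; the file names and metadata below are placeholders, and Galaxy performs the equivalent packaging automatically on export:

    from rocrate.rocrate import ROCrate  # pip install rocrate

    # Bundle result files plus machine-readable metadata into one crate.
    crate = ROCrate()
    crate.add_file("results/analysis_output.csv", properties={
        "name": "Analysis output table",
        "encodingFormat": "text/csv",
    })
    crate.add_file("workflow/pipeline.ga", properties={
        "name": "Galaxy workflow definition",
    })
    # Writes the data files alongside ro-crate-metadata.json.
    crate.write("supporting_data_crate")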
The advancements in Artificial Intelligence (AI), notably OpenAI's ChatGPT, introduce novel research perspectives and applications to textile science and industry. This study encompasses two domains: academic research and industrial applications. Within textile science, using textiles and carbon microspheres as examples, we employ ChatGPT to translate natural-language requirements into code, exploring its potential for data processing and visualization; in combination with Stable Diffusion's text-to-image technology, we visualize concepts in textile design; by integrating the Segment Anything Model (SAM)'s image segmentation technology, ChatGPT achieves precise detection of textile defects; and we also explore the integration of ChatGPT with finite element modeling software, proposing a more efficient and accurate strategy for composite material modeling. In the textile industry context, the application of ChatGPT offers continuous process optimization and spurs the adoption of innovative techniques and methodologies, thereby advancing sustainable innovation within the sector. This paper presents a thorough survey of ChatGPT, aiming to highlight the transformative capabilities of this AI model and to suggest a path towards a more innovative and sustainable future for textile science and the textile industry.
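The requirements-to-code step can be illustrated with a call to a chat-completion API; the model name and prompt below are assumptions for illustration, as the study used ChatGPT interactively:

    from openai import OpenAI  # pip install openai; key via OPENAI_API_KEY

    client = OpenAI()

    # A natural-language data-processing requirement (illustrative).
    requirement = (
        "Write Python code that reads fiber_diameters.csv (one column, 'um'), "
        "computes the mean and standard deviation, and plots a histogram."
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # model name is an assumption, not from the paper
        messages=[
            {"role": "system", "content": "You assist with textile data analysis."},
            {"role": "user", "content": requirement},
        ],
    )

    # The generated code is then reviewed and run by the researcher.
    print(response.choices[0].message.content)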