ISBN (Print): 9781479965137
Big data are extremely large-scale data in terms of quantity, complexity, semantics, distribution, and processing costs in computer science, cognitive informatics, web-based computing, cloud computing, and computational intelligence. Censuses and elections are a typical paradigm of big data engineering in modern digital democracy and social networks. This paper analyzes the mechanisms of voting systems and collective opinions using big data analysis technologies. A set of numerical and fuzzy models for collective opinion analysis is presented for applications in social networks, online voting, and general elections. A fundamental insight into the collective opinion equilibrium is revealed among electoral distributions and voting systems. Fuzzy analysis methods for collective opinions are rigorously developed and applied to poll data mining, collective opinion determination, and quantitative electoral data processing.
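The abstract names numerical and fuzzy models for collective opinion analysis without detailing them. The sketch below is only a minimal illustration of the general idea, assuming graded [0, 1] opinions, an optional respondent weighting, and a piecewise-linear membership function; none of these choices are taken from the paper.

```python
# Illustrative sketch only: the paper's formal models are not given in the
# abstract, so the aggregation and membership function below are assumptions.
# Poll responses are assumed to be graded on a [0, 1] scale (0 = oppose,
# 1 = support), optionally with per-respondent weights.

from typing import Optional, Sequence


def collective_opinion(grades: Sequence[float],
                       weights: Optional[Sequence[float]] = None) -> float:
    """Aggregate graded opinions into one collective-opinion value (weighted mean)."""
    if weights is None:
        weights = [1.0] * len(grades)
    return sum(g * w for g, w in zip(grades, weights)) / sum(weights)


def fuzzy_support(grade: float, threshold: float = 0.5, spread: float = 0.2) -> float:
    """Fuzzy membership of a graded opinion in the set 'supports the motion'.

    A simple piecewise-linear ramp is used purely for illustration.
    """
    if grade <= threshold - spread:
        return 0.0
    if grade >= threshold + spread:
        return 1.0
    return (grade - (threshold - spread)) / (2 * spread)


if __name__ == "__main__":
    poll = [0.9, 0.4, 0.7, 0.2, 0.8]
    print("collective opinion:", collective_opinion(poll))   # 0.6
    print("fuzzy support of 0.55:", fuzzy_support(0.55))     # 0.625
```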
Context: MapReduce is a processing model used in big data to facilitate the analysis of large datasets under a distributed architecture. Objective: The aim of this study is to identify and categorize the state of the art of software testing in MapReduce applications, determining trends and gaps. Method: A systematic mapping study discusses and classifies, according to international standards, 54 relevant studies with respect to reasons for testing, types of testing, quality characteristics, test activities, tools, roles, processes, test levels, and research validations. Results: The principal reasons for testing MapReduce applications are performance issues, potential failures, issues related to the data, or the need to satisfy agreements with efficient use of resources. The efforts are focused on performance and, to a lesser degree, on functionality. Performance testing is carried out through simulation and evaluation, whereas functional testing considers program characteristics such as specification and structure. Regardless of the type of testing, the majority of efforts focus on the unit and integration test levels of specific MapReduce functions without considering other parts of the technology stack. Conclusions: Researchers have both opportunities and challenges in performance and functional testing, and there is room to improve their research through the use of mature and standard validation methods.
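As an illustration of the unit- and integration-level testing of MapReduce functions discussed in the mapping study, a minimal sketch is given below; the word-count job, function names, and pytest-style tests are hypothetical and not taken from any of the surveyed studies.

```python
# Minimal sketch of unit-level testing of MapReduce functions, assuming a
# hypothetical word-count job; not taken from any of the surveyed studies.

from collections import defaultdict


def wc_map(line):
    """Map phase: emit (word, 1) for every word in a line of text."""
    return [(word.lower(), 1) for word in line.split()]


def wc_reduce(word, counts):
    """Reduce phase: sum the partial counts emitted for one word."""
    return word, sum(counts)


def test_map_emits_one_pair_per_word():
    assert wc_map("Big data Big") == [("big", 1), ("data", 1), ("big", 1)]


def test_reduce_sums_counts():
    assert wc_reduce("big", [1, 1, 1]) == ("big", 3)


def test_map_reduce_pipeline():
    # Integration-style check: group the mapped pairs by key, then reduce.
    grouped = defaultdict(list)
    for key, value in wc_map("to be or not to be"):
        grouped[key].append(value)
    result = dict(wc_reduce(k, v) for k, v in grouped.items())
    assert result == {"to": 2, "be": 2, "or": 1, "not": 1}
```

Tests of this kind exercise the map and reduce callables in isolation from the surrounding MapReduce runtime, which is precisely the unit/integration scope the study reports as dominant.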
In this study, we delve into efficient big data engineering and Extract, Transform, Load (ETL) processes within the healthcare sector, leveraging the robust foundation provided by the MIMIC-III Clinical Database. Our investigation entails a comprehensive exploration of various methodologies aimed at enhancing the efficiency of ETL processes, with a primary emphasis on optimizing time and resource utilization. Through careful experimentation on a representative dataset, we shed light on the advantages of incorporating PySpark and Docker containerized applications. Our research demonstrates significant advancements in time efficiency, process streamlining, and resource optimization attained through the use of PySpark for distributed computing within big data engineering workflows. Additionally, we underscore the strategic integration of Docker containers, delineating their pivotal role in augmenting scalability and reproducibility within the ETL pipeline. This paper summarizes the key insights from our experiments, accentuating the practical implications and benefits of adopting PySpark and Docker. By streamlining big data engineering and ETL processes in the context of clinical big data, our study contributes to the ongoing discourse on optimizing data processing efficiency in healthcare applications. The source code is available on request.
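A minimal sketch of the kind of PySpark-based ETL step described above is shown here. The file paths, table, columns, and transformation are hypothetical and do not reproduce the authors' MIMIC-III pipeline; in their setting such a script would typically run inside a Docker container that bundles Spark and its dependencies.

```python
# Minimal PySpark ETL sketch for a clinical table; illustrative only.
# Paths, column names, and the derived metric are assumptions, not the
# authors' actual MIMIC-III pipeline.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("mimic-etl-sketch")
    .getOrCreate()
)

# Extract: read a raw CSV export (e.g., an admissions-like table).
raw = spark.read.csv("data/raw/admissions.csv", header=True, inferSchema=True)

# Transform: drop rows missing key identifiers and derive a length-of-stay column.
clean = (
    raw.dropna(subset=["subject_id", "admittime", "dischtime"])
       .withColumn(
           "los_days",
           (F.col("dischtime").cast("timestamp").cast("long")
            - F.col("admittime").cast("timestamp").cast("long")) / 86400.0,
       )
)

# Load: write the curated table as Parquet for downstream analysis.
clean.write.mode("overwrite").parquet("data/curated/admissions")

spark.stop()
```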
Big data are pervasively generated by human cognitive processes, formal inferences, and system quantifications. This paper presents the cognitive foundations of big data systems towards big data science. The key perceptual model of big data systems is the recursively typed hyperstructure (RTHS). The RTHS model reveals the inherent complexity and unprecedented difficulty of big data engineering. This finding leads to a set of mathematical and computational models for efficiently processing big data systems. The cognitive relationship between data, information, knowledge, and intelligence is formally described.
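The abstract does not give the formal definition of the recursively typed hyperstructure. Purely as a rough illustration of a recursively typed, nested data structure, one might sketch the following, where the class, field names, and size metric are all assumptions rather than the paper's model.

```python
# Rough illustration only: the abstract does not define RTHS formally, so this
# sketch merely shows a recursively typed, nested structure in which a node is
# either an atomic datum or a typed collection of further nodes.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Hypernode:
    """A node carrying a type tag and either a value or nested hypernodes."""
    type_tag: str
    value: object = None
    children: List["Hypernode"] = field(default_factory=list)

    def size(self) -> int:
        """Recursively count the atomic data items reachable from this node."""
        if not self.children:
            return 1
        return sum(child.size() for child in self.children)


if __name__ == "__main__":
    record = Hypernode("record", children=[
        Hypernode("int", 42),
        Hypernode("list", children=[Hypernode("str", "a"), Hypernode("str", "b")]),
    ])
    print(record.size())  # 3 atomic items
```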
Big data are products of human collective intelligence that are exponentially increasing in all facets of quantity, complexity, semantics, distribution, and processing costs in computer science, cognitive informatics, web-based computing, cloud computing, and computational intelligence. This paper presents fundamental big data analysis and mining technologies in the domain of social networks as a typical paradigm of big data engineering. A key principle of computational sociology, known as the characteristic opinion equilibrium, is revealed in social networks and electoral systems. A set of numerical and fuzzy models for collective opinion analysis is formally presented. Fuzzy data mining methodologies are rigorously described for collective opinion elicitation and benchmarking in order to enhance conventional counting and statistical methodologies for big data analytics.
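To illustrate how a fuzzy elicitation can refine conventional counting, a toy contrast is sketched below; the confidence-weighted tally is an assumed stand-in, not the paper's methodology.

```python
# Toy contrast between crisp vote counting and a confidence-weighted (fuzzy)
# tally; the weighting scheme is an assumption, not the paper's methodology.

from collections import Counter, defaultdict

# Each response is (chosen option, degree of conviction in [0, 1]).
responses = [("A", 0.9), ("B", 0.6), ("A", 0.2), ("B", 0.8), ("A", 0.4)]

# Conventional counting: every response contributes exactly one vote.
crisp = Counter(option for option, _ in responses)

# Fuzzy elicitation: each response contributes its degree of conviction.
fuzzy = defaultdict(float)
for option, conviction in responses:
    fuzzy[option] += conviction

print("crisp counts:", dict(crisp))   # {'A': 3, 'B': 2}
print("fuzzy totals:", dict(fuzzy))   # {'A': 1.5, 'B': 1.4} -> a much closer contest
```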
Big data have become an integral part of various research fields due to the rapid advancements in the digital technologies available for dealing with data. The construction industry is no exception and has seen a spike in the data being generated due to the introduction of various disruptive digital technologies. However, despite the availability of data and the introduction of such technologies, the construction industry is lagging in harnessing big data. This paper critically explores literature published since 2010 to identify data trends and how the construction industry can benefit from big data. The presence of tools such as computer-aided drawing (CAD) and building information modelling (BIM) provides a great opportunity for researchers in the construction industry to further improve how infrastructure can be developed, monitored, or improved in the future. The gaps in the existing research data have been explored, and a detailed analysis was carried out to identify the different ways in which big data analysis and storage work in relation to the construction industry. Big data engineering (BDE) and statistics are among the most crucial steps for integrating big data technology in construction. The results of this study suggest that while existing research has set the stage for improving big data research, the integration of the associated digital technologies into the construction industry is not yet clear. Among the future opportunities, big data research into construction safety, site management, heritage conservation, project waste minimization, and quality improvement are key areas.