Search Results - Inner Mongolia University Library

An Empirical Study on the Effects of Jayvee, a Domain-Specific Language for Data Engineering, on Understanding Data Pipeline Architectures

SOFTWARE-PRACTICE & EXPERIENCE, 2025, vol. 55, no. 6, pp. 1086-1105

Authors: Heltweg, Philip; Schwarz, Georg-Daniel; Riehle, Dirk; Quast, Felix (Friedrich Alexander Univ Erlangen Nurnberg, Open Source Software, Erlangen, Germany)

A large part of data science projects is spent on data engineering. Especially in open data contexts, data quality issues are prevalent and are often tackled by non-professional programmers. We introduce and evaluate Jayvee, a domain-specific language for data engineering aimed at reducing barriers to building data pipelines. We show that a structured DSL can have positive effects on speed, ease of use, and quality for data engineering by non-professional developers. For this, we present an empirical quantitative study, in which we compare the performance of students as proxies for non-professional programmers using Jayvee with Python and Pandas. We search for reasons for the empirical findings using a follow-up interview study on how using a DSL changes how non-professional programmers build data pipelines. Participants solve a subset of tasks faster, more easily, and with higher quality when using Jayvee compared to Python. Interviewees describe tradeoffs regarding the DSL's more limited features, stricter code structure, and explicit descriptions. Jayvee is found to be more approachable, which leads to a more guided development flow. New data engineering languages should provide good tooling and documentation, plan how to visualize intermediate data and consider new development workflows involving tools like ChatGPT to find adoption.

Keywords: data engineering; domain-specific language; empirical study; evaluation; open data; programming language

Distributed caching strategy for hot news propagation based on data engineering processing

INTERNET TECHNOLOGY LETTERS, 2025, vol. 8, no. 1

Authors: Duan, Lanlan (Henan Light Ind Vocat Coll, Zhengzhou, Peoples R China)

To improve the collection and propagation of hot news, we address the opinion collection and propagation bottleneck by analyzing the demands and framework of current hot news support systems. We then propose a corresponding optimization framework and strategy for opinion collection and propagation, and design and implement a distributed hotspot caching system. On the basis of the existing distributed search engine, a cache server is added between the overall hotspot query processing server and each cluster sub-query processing server, which greatly improves caching and searching efficiency. The experimental results show that, with the proposed distributed caching system and strategy for hot news propagation, the processing capacity of the hot news system is greatly enhanced.

Keywords: data engineering; distributed cache; hot news collection; news propagation; searching engine

Development of Digital Telecommunication to Analyse and Store Complex Media Data: A Way to Data Engineering

1st International Conference on Advances in Computer Science, Electrical, Electronics, and Communication Technologies, CE2CT 2025

Authors: Pachigolla, Yesu Ratnam (Pflugerville, TX, United States)

ISBN: (print) 9798331518578

In the digital era, data flows are saturated, which drives the development of universal multifunctional systems to solve problems and optimise computing resources. The information system is highly loaded with modern data and a large number of resources. User requests and the heterogeneity of the incoming streams can be evaluated in terms of the different types of multimedia services, their requirements for computing resources, and their performance over the entire data. The heterogeneity of the incoming data flow is considered the distinctive feature of requests in a modern information system that supports different types of multimedia services on a single platform. Large volumes of data and data heterogeneity create numerous problems related to data storage security and the speed of the digital system. To address these challenges, artificial intelligence technology can be used to implement a digital telecommunication complex for processing and storing dynamic data flows in multiple formats. Prospects and trends can be identified to develop these models based on their prospective characteristics. The work develops digital communication with a multi-object analytic system for storing and analysing complex data with data engineering. A fuzzy-based model is used for data processing, with an enhanced accuracy of 98%. © 2025 IEEE.

Keywords: data engineering; data flow; data storage; digital telecommunication; Fuzzy model processing

Song recognition and analysis method based on data engineering and Low-Cost microphone sensor

INTERNET TECHNOLOGY LETTERS, 2025, vol. 8, no. 1

Authors: Li, Ping; Wei, Lingshuang (Anshan Normal Univ, Liaoning, China, Anshan, Peoples R China; Anshan Normal Univ, Liaoning, China, 43 Ping St, Anshan, Liaoning, Peoples R China)

Song recognition refers to automatically recognizing the corresponding song name for an input audio clip. Because of its friendly interactive form and convenience, song recognition has become a hot topic in music retrieval research. However, most existing song recognition methods assume that the collected audio is clean. Unfortunately, in practical applications, they often face problems such as low-priced acquisition equipment and serious noise pollution of the collected audio data, resulting in poor recognition accuracy. To solve these problems in data engineering and low-cost microphone scenarios, this paper proposes a deep learning based two-stage song recognition framework. Specifically, a Denoising Auto-Encoder network is first used for speech enhancement to obtain clean audio data. Then, the Con-LSTM network is proposed for clean song recognition. More specifically, the Con-LSTM network integrates the advantages of the convolutional neural network (CNN) and the recurrent neural network (RNN), and thus has stronger recognition ability. The final experimental results show that the proposed song recognition framework can effectively identify songs collected by low-cost microphones. As such, the proposed framework can be embedded in web of things (WoT) systems to help improve speech recognition tasks, which are essential in many advanced WoT systems.

Keywords: data engineering; data mining; low-cost microphone; noise pollution; speech enhancement; WoT based song recognition

Stream Processing and Real Time Analytics in the Era of Data Engineering

Advances in Computer Science, Electrical, Electronics, and Communication Technologies (CE2CT), International Conference on

Authors: Sathishkumar Chintala (Independent researcher, Plano, Texas)

ISBN: (digital) 9798331518578

ISBN: (print) 9798331518585

The growing prevalence of streaming data is revolutionising organisations' approach to stream processing and real time analytics by providing actionable and immediate insights from the generated data streams. Advancements in data engineering drive stream processing and real time analytics. The present study explores the transformative impact of real time analytics by examining the major components of stream processing and real time analytics, such as visualisation tools, in-memory storage, stream processing and data ingestion. It enables data-driven decision-making, optimised operations and personalised user experience. The study examines multinational retail corporations and video streaming services for data engineering. The major advantages are competitive and operational efficiencies, enhanced customer experience and rapid insights from stream processing. The tremendous potential for implementing real time analytics and stream processing is highlighted, together with integration complexities, skill gaps, privacy and security concerns, systems scalability, enhanced data quality and management of data volume. A stream processing engine is used for higher throughput and lower processing latency based on workload.

Keywords: Data analysis; Data integrity; Decision making; Throughput; Data engineering; Big Data applications; Real-time systems; User experience; Engines; Systematic literature review

Development of Digital Telecommunication to Analyse and Store Complex Media Data: A Way to Data Engineering

Advances in Computer Science, Electrical, Electronics, and Communication Technologies (CE2CT), International Conference on

Authors: Yesu Ratnam Pachigolla (Pflugerville, Texas)

ISBN: (digital) 9798331518578

ISBN: (print) 9798331518585

Keywords: Memory; Multimedia computing; Streaming media; Media; Data engineering; Data processing; Communications technology; Artificial intelligence; Streams; Information systems

On the Design of a Data Engineering Learning Platform Using Web Technologies and LLMs

IEEE World Engineering Education Conference (EDUNINE)

Authors: Jia Yi Venus Lim; Ganesh Neelakanta Iyer (Department of Computer Science, School of Computing, National University of Singapore, Singapore, Singapore)

ISBN: (digital) 9798331542788

ISBN: (print) 9798331542795

As data surge, the demand for skilled data engineers significantly increases, underscoring the importance of data engineering. However, learning data engineering skills can be daunting due to the complexity of setting up multiple platforms, often unnecessary as companies typically employ other professionals to handle infrastructure. Additionally, data engineering is rarely taught in traditional educational settings, leaving interested students at a disadvantage. To address this, this project aims to develop a web-based platform that simplifies data engineering learning, providing hands-on experience for free without complex setups for users from different backgrounds. The platform includes a Large Language Model (LLM)-powered chatbot for real-time guidance, creating an interactive learning environment. With access to our platform, users can instantly access the necessary tools and resources. Typically, a web page will have everything required for a course, streamlining the virtual learning process and reducing setup time.

Keywords: Electronic learning; Large language models; Web pages; Companies; Data engineering; Real-time systems; Complexity theory; Surges; Engineering education; Software engineering

DATA ENGINEERING AND INFORMATION-SYSTEMS

COMPUTER, 1986, vol. 19, no. 1, pp. 18-30

Authors: SHUEY, R; WIEDERHOLD, G (STANFORD UNIV, MED & COMP SCI RES, STANFORD, CA 94305)

Keywords: Data engineering; Information systems; Distributed computing; Humans; Computer displays; Data communication; Postal services; Protocols; Application software; Memory

Data Engineering for HPC with Python

9th Workshop on Python for High-Performance and Scientific Computing (PYHPC)

Authors: Abeykoon, Vibhatha; Perera, Niranda; Widanage, Chathura; Kamburugamuve, Supun; Kanewalat, Thejaka Amila; Maithree, Hasara; Wickramasinghe, Pulasthi; Uyar, Ahmet; Fox, Geoffrey (Luddy Sch Informat Comp & Engn, Bloomington, IN 47408 USA; Digital Sci Ctr, Bloomington, IN 47408 USA; Indiana Univ Alumni, Bloomington, IN 47408 USA; Univ Moratuwa, Dept Comp Sci & Engn, Moratuwa, Sri Lanka)

ISBN: (print) 9780738110868

Data engineering is becoming an increasingly important part of scientific discoveries with the adoption of deep learning and machine learning. Data engineering deals with a variety of data formats, storage, data extraction, transformation, and data movements. One goal of data engineering is to transform data from original data to vector/matrix/tensor formats accepted by deep learning and machine learning applications. There are many structures such as tables, graphs, and trees to represent data in these data engineering phases. Among them, tables are a versatile and commonly used format to load and process data. In this paper, we present a distributed Python API based on table abstraction for representing and processing data. Unlike existing state-of-the-art data engineering tools written purely in Python, our solution adopts high performance compute kernels in C++, with an in-memory table representation with Cython-based Python bindings. In the core system, we use MPI for distributed memory computations with a data-parallel approach for processing large datasets in HPC clusters.

Keywords: Python; MPI; HPC; data engineering

Data engineering case-study in digitalized manufacturing

19th IEEE World Symposium on Applied Machine Intelligence and Informatics (SAMI)

Authors: Poloskei, Istvan (Adesso Hungary Kft, Infopk Setany 1, H-1117 Budapest, Hungary)

ISBN: (print) 9781728180533

The combination of big data and machine learning appears frequently in the manufacturing context. In a modern factory, data is collected everywhere. It is a challenge for companies to find ways to use the produced data. The model's quality is strongly dependent on the quality of the training dataset; the data engineer is responsible for the infrastructure, such as providing context and quality input data for machine learning algorithms. In the discussed case study, a data pipeline is introduced as a potential solution. It proposes a strategy through the organization, from the shop floor to decision-makers.

Keywords: data manufacturing data engineering
