ISBN:
(Print) 9781665448901
We envision PDS2, a decentralized data marketplace in which consumers submit their tasks to be run within the platform on the data of willing providers. The goal of PDS2 is to ensure that users maintain full control over their data and do not compromise their privacy, while being rewarded for the value that their data generates. In order to achieve this, our marketplace architecture employs blockchain technology, privacy-preserving computation, and decentralized machine learning. We then compare different potential solutions and identify the Ethereum blockchain, trusted execution environments, and gossip learning as the most suitable for the implementation of PDS2. We also discuss the main open challenges that are left to tackle and possible directions for future work.
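Gossip learning, the decentralized learning scheme identified above, in its simplest form lets every peer train a model on its own data and periodically average its parameters with those of a randomly chosen neighbour, so no raw data ever leaves a provider. The minimal Python sketch below illustrates that general idea only; all names (Peer, local_step, gossip_round) are illustrative and not taken from the PDS2 design.

```python
# Illustrative sketch of gossip learning: each peer trains on its own data and
# periodically averages model parameters with a randomly chosen neighbour.
# All names here are made up for illustration, not taken from the paper.
import random
import numpy as np

class Peer:
    def __init__(self, data, labels, dim):
        self.data, self.labels = data, labels
        self.w = np.zeros(dim)          # local linear-model weights

    def local_step(self, lr=0.1):
        # one SGD step of logistic regression on the peer's private data
        i = random.randrange(len(self.data))
        x, y = self.data[i], self.labels[i]
        p = 1.0 / (1.0 + np.exp(-self.w @ x))
        self.w -= lr * (p - y) * x

    def merge(self, other_w):
        # gossip step: average the local model with a model received from a neighbour
        self.w = (self.w + other_w) / 2.0

def gossip_round(peers):
    # one round: every peer trains locally, then merges with a random neighbour
    for peer in peers:
        peer.local_step()
        neighbour = random.choice([p for p in peers if p is not peer])
        peer.merge(neighbour.w)
```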
ISBN:
(Print) 9781450327565
Most application code evolves incrementally, and especially so when being maintained after the applications have been deployed. Yet, most data-flow analyses do not take advantage of this fact. Instead, they require clients to recompute the entire analysis even if little code has changed, a time-consuming undertaking, especially with large libraries or when running static analyses often, e.g., on a continuous-integration server. In this work, we present REVISER, a novel approach for automatically and efficiently updating inter-procedural data-flow analysis results in response to incremental program changes. REVISER follows a clear-and-propagate philosophy, aiming at clearing and recomputing analysis information only where required, thereby greatly reducing the required computational effort. The REVISER algorithm is formulated as an extension to the IDE framework for Inter-procedural Distributive Environment problems and automatically updates arbitrary IDE-based analyses. We have implemented REVISER as an open-source extension to the Heros IFDS/IDE solver and the Soot program-analysis framework. An evaluation of REVISER on various client analyses and target programs shows performance gains of up to 80% in comparison to a full recomputation. The experiments also show that REVISER computes the same results as a full recomputation on all instances tested.
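The clear-and-propagate idea can be illustrated with a small sketch that invalidates cached analysis facts only for changed program elements and their transitive dependents, and then recomputes just what was cleared. This is a loose illustration under assumed names (dependents, facts, analyze), not the actual REVISER/IDE algorithm.

```python
# Hedged sketch of clear-and-propagate: cached analysis facts are cleared only
# for changed nodes and their transitive dependents, then recomputed there.
# 'dependents', 'facts', and 'analyze' are hypothetical names for this sketch.
from collections import deque

def clear_and_propagate(changed, dependents, facts):
    """Invalidate cached facts for changed nodes and everything depending on them."""
    worklist = deque(changed)
    cleared = set()
    while worklist:
        node = worklist.popleft()
        if node in cleared:
            continue
        cleared.add(node)
        facts.pop(node, None)                 # drop the stale analysis result
        worklist.extend(dependents.get(node, ()))
    return cleared

def recompute(cleared, facts, analyze):
    # re-run the underlying analysis only where information was cleared
    for node in cleared:
        facts[node] = analyze(node)
```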
ISBN:
(Print) 9781605580210
This position paper applies a real-option-theory perspective to agile software development. We complement real-option thinking with the use of measurements to support midcourse decision-making from the viewpoint of the client. Our position is motivated by empirical data gathered from secondary sources.
ISBN:
(Digital) 9781665479561
ISBN:
(Print) 9781665479561
Data analysis is an exploratory, interactive, and often collaborative process. Computational notebooks have become a popular tool to support this process, not least because of their ability to interleave code, narrative text, and results. In practice, however, notebooks are often criticized as hard to maintain and of low code quality, with problems such as unused or duplicated code and out-of-order code execution. Data scientists can benefit from better tool support when maintaining and evolving notebooks. We argue that central to such tool support is identifying the structure of notebooks. We present a lightweight and accurate approach to extract notebook structure and outline several ways such structure can be used to improve maintenance tooling for notebooks, including navigation and finding alternatives.
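As an illustration of what extracting notebook structure can involve, the sketch below reads the cells of an .ipynb file (plain JSON) and records which top-level names each code cell defines, one simple signal such tooling can build on. The paper's approach is richer; this sketch only conveys the general idea and introduces its own helper name, notebook_structure.

```python
# Minimal sketch: walk the cells of a Jupyter notebook and record which
# top-level names each code cell defines. Illustrative only, not the paper's
# extraction approach.
import ast
import json

def notebook_structure(path):
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    structure = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = "".join(cell.get("source", []))
        try:
            tree = ast.parse(source)
        except SyntaxError:
            structure.append({"cell": index, "defines": [], "parse_error": True})
            continue
        # names defined at the top level: functions, classes, simple assignments
        defined = [n.name for n in tree.body
                   if isinstance(n, (ast.FunctionDef, ast.ClassDef))]
        defined += [t.id for n in tree.body if isinstance(n, ast.Assign)
                    for t in n.targets if isinstance(t, ast.Name)]
        structure.append({"cell": index, "defines": defined, "parse_error": False})
    return structure
```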
ISBN:
(Print) 9781665478366
The satellite data transmission scheduling problem is, given a set of imaging satellites, ground stations, and observation targets in a scheduling scene, to resolve satellite-related conflicts, allocate resources, and formulate reasonable satellite observation and receiving plans under limited satellite and ground-station resource capacities. The problem involves complex constraints, is very difficult to solve, and has been proven to be NP-hard. A satellite data transmission scheduling optimization model was established after studying the advantages and disadvantages of existing models of the problem. In line with the characteristics of this model, tasks were assigned to individual satellites and a first-observe-then-transmit classification of constraints was adopted to reduce the interactions between constraints and the difficulty of solving the problem. An improved local search algorithm with three pruning strategies was proposed. In order to verify the correctness and effectiveness of the model and algorithm, a data transmission scheduling scene and test examples were constructed. The experimental results show that the improved local search algorithm combined with the pruning strategies solves the problem better and provides guidance for practical engineering applications.
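To make the combination of local search and pruning concrete, the toy sketch below repeatedly flips one task in or out of the current plan and prunes any neighbour that would exceed a single resource capacity before its objective is evaluated. It is a deliberately simplified illustration with made-up names (local_search, value, load), not the authors' algorithm or constraint model.

```python
# Toy local search with one pruning rule: infeasible neighbours (capacity
# exceeded) are discarded before the objective is computed. Illustrative only.
import random

def local_search(tasks, capacity, value, load, iterations=1000, seed=0):
    """tasks: list of task ids; value/load: per-task dicts; returns best subset."""
    rng = random.Random(seed)
    current = set()
    best, best_value = set(), 0.0
    for _ in range(iterations):
        task = rng.choice(tasks)
        neighbour = set(current)
        neighbour.symmetric_difference_update({task})   # flip one task in/out
        # pruning: skip neighbours that violate the capacity constraint
        if sum(load[t] for t in neighbour) > capacity:
            continue
        neighbour_value = sum(value[t] for t in neighbour)
        if neighbour_value >= sum(value[t] for t in current):
            current = neighbour
            if neighbour_value > best_value:
                best, best_value = set(neighbour), neighbour_value
    return best, best_value
```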
Data has quickly been becoming the fuel, the new oil, of the growth and prosperity of companies in the modern age. With useful data and sufficient tools, companies have the ability to enhance their current products, pr...
This tutorial provides an overview of the role of ethnography in software engineering research. It describes the use of ethnographic methods as a means to provide an in-depth understanding of the socio-technological r...
ISBN:
(Print) 9783642018527
Assessing the effectiveness of a development methodology is difficult and requires an extensive empirical investigation. Moreover, the design of such investigations is complex since they involve several stakeholders, and their validity can be questioned if they are not replicated in similar and different contexts. Agilists are aware that data collection is important, and the problem of designing and executing meaningful experiments is common. This workshop aims at creating a critical mass for the development of new and extensive investigations in the Agile world.
ISBN:
(Print) 9798350376975; 9798350376968
Deep learning and large-scale language models have significantly advanced data engineering, particularly in natural language processing. However, deep learning has been pointed out to be a black box, raising concerns about its interpretability. To address this, methods have been proposed for extracting the bases of decisions and the training data that influence them. This paper focuses on a method for extracting training data that have a large impact on a decision, and proposes a method for improving training data by identifying and excluding training data that lead to incorrect decisions. The method divides the training data into subsets (the training data within the training data, the verification data within the training data, and the test data within the training data) and uses them to train and test the task. It then identifies and excludes training data that cause incorrect classifications. We evaluate this approach on document classification tasks using BERT, a large-scale language model, demonstrating its effectiveness. We also evaluate our method on popular corpora, highlighting both its strengths and limitations.
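The exclude-and-retrain workflow can be sketched as follows: split the training set, train a baseline model, locate training examples suspected of causing validation mistakes, drop them, and retrain. In this sketch the suspects are found with a crude nearest-neighbour heuristic rather than the influence-based extraction the paper builds on, and all names (exclude_and_retrain, suspects) are illustrative.

```python
# Hedged sketch of exclude-and-retrain with a crude nearest-neighbour proxy for
# "training data that lead to incorrect decisions". Illustrative only; the
# paper relies on an influence-based extraction method instead.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def exclude_and_retrain(X, y, seed=0):
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2,
                                                random_state=seed)
    baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    mistakes = X_val[baseline.predict(X_val) != y_val]
    # mark, for each validation mistake, its closest training example as a suspect
    suspects = set()
    for x in mistakes:
        dists = np.linalg.norm(X_tr - x, axis=1)
        suspects.add(int(np.argmin(dists)))
    keep = np.array([i not in suspects for i in range(len(X_tr))])
    cleaned = LogisticRegression(max_iter=1000).fit(X_tr[keep], y_tr[keep])
    return baseline, cleaned
```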
This research focuses on the study of the social networks of Thai researchers who conducted research in the computer science and information technology fields and submitted their work to two conferences, the In...