With the development of the digital economy, the number and capacity of data centers have been increasing sharply. CFD is currently widely used to obtain the thermal fields inside data centers while a full simulatio...
ISBN: (digital) 9781728129037
ISBN: (print) 9781728129044
Distributed machine learning (ML) has triggered tremendous research interest in recent years. Stochastic gradient descent (SGD) is one of the most popular algorithms for training ML models, and has been implemented in almost all distributed ML systems, such as Spark MLlib, Petuum, MXNet, and TensorFlow. However, current implementations often incur huge communication and memory overheads for large models. One important reason for this inefficiency is the row-oriented scheme (RowSGD) that existing systems use to partition the training data, which forces them to adopt a centralized model management strategy that leads to a vast amount of data exchange over the network. We propose a novel, column-oriented scheme (ColumnSGD) that partitions training data by columns rather than by rows. As a result, the ML model can be partitioned by columns as well, leading to a distributed configuration where individual data and model partitions can be collocated on the same machine. Following this locality property, we develop a simple yet powerful computation framework that significantly reduces communication overheads and memory footprints compared to RowSGD, for large-scale ML models such as generalized linear models (GLMs) and factorization machines (FMs). We implement ColumnSGD on top of Apache Spark, and study its performance both analytically and experimentally. Experimental results on both public and real-world datasets show that ColumnSGD is up to 930× faster than MLlib, 63× faster than Petuum, and 14× faster than MXNet.
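The column-oriented idea can be illustrated with a minimal single-process sketch. It is not the authors' Spark implementation: it assumes logistic regression as the GLM, simulates workers as plain Python lists, and every name in it (`partial_dots`, `column_sgd_step`, `data_parts`, `model_parts`) is hypothetical. The point it shows is that only one partial dot product per example in the mini-batch crosses the "network", while each model slice stays with its data slice.

```python
import math

def partial_dots(x_cols, w_cols, batch):
    """Worker side: dot products over this worker's columns only."""
    return [sum(x * w for x, w in zip(x_cols[i], w_cols)) for i in batch]

def column_sgd_step(data_parts, model_parts, labels, batch, lr):
    """One mini-batch step of logistic regression on column-partitioned data.

    data_parts[p][i] holds the column slice of example i stored on worker p;
    model_parts[p] holds the matching slice of the weight vector.
    """
    # 1. Every worker computes partial dot products for the batch.
    partials = [partial_dots(x, w, batch)
                for x, w in zip(data_parts, model_parts)]
    # 2. The master sums them: one scalar per example in the batch.
    dots = [sum(p) for p in zip(*partials)]
    # 3. Logistic-loss residuals, which would be broadcast back to the workers.
    residuals = [1.0 / (1.0 + math.exp(-z)) - labels[i]
                 for z, i in zip(dots, batch)]
    # 4. Each worker updates its own model slice from its local columns.
    new_parts = []
    for x_cols, w_cols in zip(data_parts, model_parts):
        grad = [sum(r * x_cols[i][j] for r, i in zip(residuals, batch)) / len(batch)
                for j in range(len(w_cols))]
        new_parts.append([w - lr * g for w, g in zip(w_cols, grad)])
    return new_parts
```

In this sketch the traffic per step grows with the batch size rather than with the model dimension, which is the locality property the abstract describes; with a single partition the function reduces to an ordinary mini-batch SGD step.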
In the relentless pursuit of scientific advancement, comprehending the profound impact and innovative nature inherent in funded research projects assumes paramount significance. To illuminate this matter, I delve into...
ISBN: (print) 9781424449934
Top-k queries in uncertain databases are popular and useful because of their wide range of applications. However, compared to top-k queries in traditional databases, queries over uncertain databases are more complicated because the number of possible worlds is exponential. A top-k aggregate query ranks groups of tuples by their aggregate values, for example sum or average, and returns the k groups with the highest aggregate values. Global top-k, a powerful top-k semantics, returns the k highest-ranked tuples according to their probabilities of being in the top-k answers in possible worlds. We propose a dynamic-programming-based method to process global top-k aggregate queries in uncertain databases, which minimizes the number of retrieved tuples and the group states generated on these tuples. Experimental results show that our algorithm is effective.
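The global top-k semantics can be made concrete with a small sketch. It covers only the simpler tuple-level case, not the aggregate algorithm proposed in the paper, and it assumes that tuples exist independently and have distinct scores. The function names and the `(name, score, probability)` tuple layout are hypothetical. The dynamic program avoids enumerating possible worlds by tracking, for each tuple, the probability that exactly j higher-scored tuples exist.

```python
def topk_probabilities(tuples, k):
    """Probability that each tuple is among the top k across possible worlds.

    tuples: list of (name, score, probability); existence is assumed
    independent and scores are assumed distinct.
    """
    ranked = sorted(tuples, key=lambda t: t[1], reverse=True)
    # counts[j] = probability that exactly j higher-scored tuples exist (j < k).
    counts = [1.0] + [0.0] * (k - 1)
    result = {}
    for name, _score, p in ranked:
        # The tuple is in the top k iff it exists and fewer than k
        # higher-scored tuples exist.
        result[name] = p * sum(counts)
        # Fold this tuple into the counts, highest index first so that
        # counts[j - 1] still holds its value from before this tuple.
        for j in range(k - 1, 0, -1):
            counts[j] = counts[j] * (1.0 - p) + counts[j - 1] * p
        counts[0] *= 1.0 - p
    return result

def global_topk(tuples, k):
    """Names of the k tuples with the highest top-k probability."""
    probs = topk_probabilities(tuples, k)
    return sorted(probs, key=probs.get, reverse=True)[:k]
```

For n tuples this runs in O(nk) time after sorting, compared with the 2^n possible worlds a naive evaluation would enumerate.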
Nowadays, micro-video sharing platforms have become popular tools for creating and viewing micro-videos in daily life. The micro-video recommendation task has attracted significant attention from researchers, recen...
A survey of the information needs of elderly people can identify the information required to address the needs of the aged in a community. Based on data collected from 600 elderly people through a field investigation with a questionnaire in a rural community in central China, the results show that the majority of aged people prefer audio and/or visual information products, especially audio products. Most of the aged people stated that they needed health and medical non-educational audio information products. The survey may lead to improved and expanded information services for respondents who lack such services, including public broadcasting services, an extended audiovisual collection, audiovisual loans, religious audiovisuals, and other means of providing the information they need. In summary, this paper assembles views on the help that elderly people currently need from both practitioners and researchers in the elderly services domain.
This study presents a novel heuristic algorithm called the "Minimal Positive Negative Product Strategy" to guide the CDCL algorithm in solving the Boolean satisfiability problem. It provides a mathematical e...
Web sites have become the main targets of many attackers. Signature-based detection needs to maintain a large signature database, and honeypot-based methods are not efficient. Because attackers always try to make malicious code in Web pages hard for browser users to notice, their concealment methods can be classified into various fingerprints. Various samples of malicious code were analyzed to identify 6 types of fingerprints. The system uses a spider integrated with script interpretation to fetch target Web pages, extracts specific tags by HTML parsing, and matches them against the fingerprints to detect malicious code. This method needs fewer fingerprints than traditional detection methods and is more efficient. Results for 60 websites show that the system has a false negative rate of 2.63% and a false positive rate of 1.99%.
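The tag-extraction and fingerprint-matching step might be sketched as follows. The abstract does not list its 6 fingerprint types, so the two fingerprints here (`hidden_iframe`, `obfuscated_script`) are hypothetical stand-ins chosen for illustration, as are all class and function names. The sketch also skips page fetching and script interpretation and works on an HTML string that has already been retrieved.

```python
import re
from html.parser import HTMLParser

class TagExtractor(HTMLParser):
    """Collects iframe and script tags with their attributes and inline text."""

    def __init__(self):
        super().__init__()
        self.records = []          # each record: [tag, attrs dict, inline text]
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag in ("iframe", "script"):
            # Valueless attributes arrive as None; normalise them to "".
            self.records.append([tag, {k: v or "" for k, v in attrs}, ""])
            self._in_script = tag == "script"

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script and self.records:
            self.records[-1][2] += data

def hidden_iframe(tag, attrs, text):
    # Hypothetical fingerprint: an iframe sized or styled to be invisible.
    style = attrs.get("style", "").replace(" ", "").lower()
    tiny = attrs.get("width") in ("0", "1") or attrs.get("height") in ("0", "1")
    return tag == "iframe" and (tiny or "display:none" in style)

def obfuscated_script(tag, attrs, text):
    # Hypothetical fingerprint: inline script that decodes and evaluates strings.
    return tag == "script" and re.search(r"\b(eval|unescape)\s*\(", text) is not None

FINGERPRINTS = {"hidden_iframe": hidden_iframe,
                "obfuscated_script": obfuscated_script}

def scan(html):
    """Return the sorted names of all fingerprints matched in the page."""
    parser = TagExtractor()
    parser.feed(html)
    parser.close()
    return sorted({name
                   for tag, attrs, text in parser.records
                   for name, matches in FINGERPRINTS.items()
                   if matches(tag, attrs, text)})
```

Keeping each fingerprint as a small predicate over an extracted tag reflects the abstract's claim that a handful of behavioural fingerprints can replace a large signature database; the predicates themselves would need to be derived from real samples.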