ISBN (digital): 9798331509712
ISBN (print): 9798331509729
Apache Hadoop has been a major component of the big data ecosystem for more than a decade. It relies on the Hadoop Distributed File System (HDFS) to store large datasets and on MapReduce to process them. HDFS manages the metadata of all its files through a server known as the Namenode. To achieve high availability (HA), Hadoop clusters typically deploy two Namenodes: one active and one standby. This architecture enables Hadoop to store and process massive files with good reliability. However, HDFS often suffers significant performance degradation when managing a large number of small files. Scanning all files through the Namenode’s service to locate the small ones is time-consuming and places an extra burden on the Namenodes. Little research has addressed how to identify small-file hotspots in HDFS in a time-efficient manner without querying the Namenode. In this paper, we design a big data system that identifies small files in HDFS by parsing the File System Image (FSImage), which is generated periodically on the Namenode. The system uses the standby Namenode to parse the FSImage and send the file information to an Apache Kafka topic. A Doris Routine Load job then consumes the topic and loads the information into a table that users can query in real time. This approach allows Hadoop cluster administrators to identify small files using the standby Namenode without affecting the normal operation of the Hadoop cluster. It can also serve as a tool for locating small-file hotspot tables in a Hive data warehouse.
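The hotspot query described in this abstract can be illustrated with a minimal sketch. It assumes the FSImage has already been exported to tab-separated text with a header row, that the rows describe files only, and that the columns are named `Path` and `FileSize`; these names, the 1 MiB threshold, and the function `small_file_hotspots` are hypothetical and not taken from the paper. The in-memory counting stands in for the query a user would run against the Doris table.

```python
import csv
import io
from collections import Counter
from pathlib import PurePosixPath

# Assumed threshold: files below 1 MiB count as "small" (not from the paper).
SMALL_FILE_BYTES = 1024 * 1024


def small_file_hotspots(delimited_text, threshold=SMALL_FILE_BYTES, top=10):
    """Count small files per parent directory and return the busiest directories.

    `delimited_text` is tab-separated text with a header row that contains
    the (assumed) columns `Path` and `FileSize`; every row describes a file.
    """
    reader = csv.DictReader(io.StringIO(delimited_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        if int(row["FileSize"]) < threshold:
            # Group by the directory that directly contains the file.
            parent = str(PurePosixPath(row["Path"]).parent)
            counts[parent] += 1
    return counts.most_common(top)


SAMPLE = (
    "Path\tFileSize\n"
    "/warehouse/t1/part-0\t1024\n"
    "/warehouse/t1/part-1\t2048\n"
    "/warehouse/t2/part-0\t268435456\n"
    "/warehouse/t2/part-1\t512\n"
)

if __name__ == "__main__":
    for directory, n in small_file_hotspots(SAMPLE):
        print(directory, n)
```

Grouping by parent directory is only one possible definition of a hotspot; grouping by Hive table location would follow the same pattern with a different key.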
ISBN (print): 7563504028
A virtual channel increases network throughput, but it also complicates the implementation and thus increases the setup latency. In this paper, the PAM routing algorithm, which uses no virtual channels yet performs well, is introduced for k-ary n-mesh networks.
ISBN (digital): 9798331509712
ISBN (print): 9798331509729
With the continuous emergence of large-scale low-Earth orbit satellite constellations and the service-oriented functionality of satellites, we have designed a dynamic service substitution approach to enhance space-based missions’ reliability and completion rate. Based on our formal description model for satellite services, this approach generates one-to-one direct substitution or one-to-many composite substitution schemes when satellite service failures are detected, enabling task migration between satellites through real-time monitoring of satellite status. Additionally, when it is impossible to generate service substitution schemes that strictly meet function and resource requirements, this approach produces suboptimal solutions with relaxed constraints. These solutions are evaluated using our proposed evaluation model. Testing the prototype system demonstrated that our dynamic service substitution approach improves satellite missions’ reliability and completion rate compared to existing methods.
Today's supercomputers are moving towards deploying many-core processors such as the Intel Xeon Phi Knights Landing (KNL) to deliver high compute and memory capacity. Applications executing on such many-core platforms with improved vectorization require high memory bandwidth. To improve performance, architectures like Knights Landing include an in-package high bandwidth memory (HBM) with high bandwidth but low capacity, in addition to the high-capacity but low-bandwidth DDR4. Other architectures such as Nvidia's Pascal GPU expose similar stacked DRAM. In architectures with heterogeneous memory types within a node, efficient allocation and data movement can improve performance and save energy in future systems if all data requests are served from the high bandwidth memory. In this paper, we propose a memory-heterogeneity-aware runtime system that guides data prefetch and eviction so that data can be accessed at high bandwidth by applications whose entire working set does not fit within the high bandwidth memory and whose data must therefore be moved among different memory types. We implement a data movement mechanism, managed by the runtime system, that allows applications to run efficiently on architectures with a heterogeneous memory hierarchy with only trivial code changes. We show up to 2x improvement in execution time for Stencil3D and Matrix Multiplication, which are important HPC kernels.
The modeling and simulation of evacuation has recently become a topic of great interest. We present an agent-based model of crowd evacuation for emergency response in an area affected by an explosion. In contrast to traditional models, which treat the population as identical individuals and omit the interactions between them, we design various types of agents and account for their interactions. Different cases are considered to test the effectiveness of our model through iterative simulations. Finally, extensive simulation results suggest several effective ways to minimize the harmful consequences of such life-threatening events.
ISBN (digital): 9798331509712
ISBN (print): 9798331509729
In modern datacenter networks (DCNs), booming online data-intensive applications generate mix-flows with and without deadlines. Balancing these heterogeneous flows across parallel equal-cost paths to meet tight deadlines is crucial. However, because they are unaware of deadlines, existing load balancing mechanisms cannot choose suitable (re)routing paths for mix-flows to meet their respective stringent requirements. In this paper, we propose a deadline-aware rerouting scheme called DAR, which applies different routing strategies to mix-flows. Specifically, DAR first identifies the deadline flows, then categorizes them by the urgency of their deadlines, and employs different (re)routing strategies to ensure that flows with more urgent deadlines complete earlier. NS-3 simulation results show that DAR effectively balances mix-flows. For example, compared to state-of-the-art load balancing schemes, DAR reduces the deadline miss rate and the average flow completion time (AFCT) by up to 38% and 35.5%, respectively.
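The abstract does not spell out DAR's classification rule or path-selection policy, so the following is only an illustrative sketch of the general idea of urgency-based path choice. Everything in it is an assumption: the slack threshold `URGENT_SLACK_S`, the three class names, the slack formula, and the policy of sending urgent flows to the least-loaded path while hashing all other flows.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical threshold: a deadline flow with less slack than this is "urgent".
URGENT_SLACK_S = 0.005


@dataclass
class Flow:
    flow_id: int
    remaining_bytes: int
    deadline_s: Optional[float]  # time left until the deadline; None = no deadline


def classify(flow: Flow, link_rate_bps: float) -> str:
    """Label a flow by comparing its time budget with its minimum transfer time."""
    if flow.deadline_s is None:
        return "no-deadline"
    min_transfer_s = flow.remaining_bytes * 8 / link_rate_bps
    slack_s = flow.deadline_s - min_transfer_s
    return "urgent" if slack_s < URGENT_SLACK_S else "relaxed"


def pick_path(flow: Flow, path_load_bytes: List[int], link_rate_bps: float) -> int:
    """Return a path index: least-loaded for urgent flows, hash-based otherwise."""
    if classify(flow, link_rate_bps) == "urgent":
        return min(range(len(path_load_bytes)), key=path_load_bytes.__getitem__)
    return flow.flow_id % len(path_load_bytes)


if __name__ == "__main__":
    loads = [500, 100, 300]  # queued bytes on three equal-cost paths (made-up values)
    urgent = Flow(flow_id=7, remaining_bytes=1_000_000, deadline_s=0.002)
    print(classify(urgent, 10e9), pick_path(urgent, loads, 10e9))
```

A real scheme would also need to decide when to reroute an in-flight flow and how to avoid packet reordering; this sketch covers only the initial path choice.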
This paper shows that testing open object-based distributed systems is, in principle, feasible by applying the ISO/ITU-T conformance testing methodology framework. In this context, a gap in the (semi-)automatic test development and execution process has been identified and bridged. We have developed an approach to applying the standardized test notation TTCN to testing ODL-based implementations. The equivalents of TTCN features in formal TINA specifications have been identified. Furthermore, a TTCN-based test system implementation has been integrated into the CORBA environment. Test specification and execution are discussed with respect to the TINA retailer reference point.
There has been considerable interest in multicast communication during this decade, as it is an essential part of many network applications such as video-on-demand. In this paper, we model flow rate allocation for application overlays as a utility-based optimization problem constrained by the capacity limitations of physical links and by overlay constraints. The optimization flow control presented here addresses not only concave utility functions, which are suitable for applications with elastic traffic, but also special forms of non-concave utilities, which are used to model applications with inelastic traffic that may impose hard delay and rate requirements. We then propose an iterative algorithm as the solution to the optimization flow control problem and investigate the special forms of non-concave utilities that this model supports. Simulation results show that the iterative algorithm can handle sigmoidal-like utilities, which are useful for modeling real-time applications such as live streaming.
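For readers unfamiliar with optimization flow control, the concave case can be illustrated with the standard price-based (dual gradient) iteration. This is a textbook sketch, not the paper's algorithm: it assumes a single shared link, logarithmic utilities U_i(x_i) = w_i log(x_i), and a hypothetical function name, step size, and iteration count. It does not cover overlay constraints or the non-concave utilities the abstract mentions.

```python
def dual_gradient_rates(weights, capacity, step=0.01, iters=2000):
    """Allocate rates on one shared link by iterating on the link price.

    Each flow maximizes w * log(x) - price * x, which gives x = w / price.
    The link raises its price when demand exceeds capacity and lowers it
    otherwise (projected gradient ascent on the dual problem).
    """
    price = 1.0
    for _ in range(iters):
        rates = [w / price for w in weights]
        # Keep the price strictly positive so the rate formula stays defined.
        price = max(1e-9, price + step * (sum(rates) - capacity))
    return [w / price for w in weights], price


if __name__ == "__main__":
    rates, price = dual_gradient_rates([1.0, 3.0], capacity=10.0)
    print([round(r, 3) for r in rates], round(price, 3))
```

With log utilities the fixed point is the weighted proportionally fair allocation x_i = w_i * capacity / sum(w), so in this example the two flows share the link in a 1:3 ratio.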
ISBN (print): 0769510892
Requirements for interoperability and reusability motivate the use of object-oriented middleware such as the Common Object Request Broker Architecture (CORBA). However, unless CORBA can be implemented efficiently, it will not be widely used in real-time and other latency-sensitive distributed applications. The paper presents three performance enhancement techniques for CORBA-based middleware. Two of these exploit limited heterogeneity in systems. In such a system, a standard CORBA protocol is used when the interacting clients and servers are implemented using different programming languages and/or operating systems. However, when a similar client-server pair built using the same technology communicates, a number of CORBA operations are bypassed, thus reducing the communication overhead. Based on a commercial middleware product and measurements made on a performance prototype running on a network of workstations, this research demonstrates a strong potential for significant performance improvement from incorporating these techniques into the middleware.