Mitigating YARN Container Overhead with Input Splits

Wonbae Kim; Young-ri Choi; Beomseok Nam

doi:10.1109/CCGRID.2017.106

2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)

Year: 2017, Pages: 204-207

Abstract

We analyze YARN container overhead and present early results on reducing it by dynamically adjusting the input split size. YARN is designed as a generic resource manager that decouples programming models from resource management infrastructures. We demonstrate that YARN's generic design incurs significant overhead because each container must perform various initialization steps, including authentication. To reduce container overhead without significantly changing the existing YARN framework, we propose leveraging the input split, which is the logical representation of physical HDFS blocks. With input splits, we can combine multiple HDFS blocks and increase the input size of each container, thereby enabling a single map wave and reducing the number of containers and their initialization overhead. Experimental results show that we can avoid recurring container overhead by selecting the right size for input splits and reducing the number of containers.
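
The arithmetic behind "a single map wave" can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes one map task per input split, one container per map task, and a fixed number of concurrently available containers. The function names and the example cluster sizes are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def single_wave_split_size(input_bytes: int, block_bytes: int, container_slots: int) -> int:
    """Return a split size, as a whole multiple of the HDFS block size,
    such that the number of splits does not exceed the number of
    concurrently available containers (assumption: one map task per split)."""
    if input_bytes <= 0 or block_bytes <= 0 or container_slots <= 0:
        raise ValueError("all arguments must be positive")
    total_blocks = ceil_div(input_bytes, block_bytes)
    blocks_per_split = ceil_div(total_blocks, container_slots)
    return blocks_per_split * block_bytes


def map_waves(input_bytes: int, split_bytes: int, container_slots: int) -> int:
    """Number of map waves needed when each split gets its own container."""
    tasks = ceil_div(input_bytes, split_bytes)
    return ceil_div(tasks, container_slots)


# Hypothetical cluster: 64 GiB of input, 128 MiB blocks, 64 container slots.
MIB = 1024 * 1024
GIB = 1024 * MIB
input_size, block_size, slots = 64 * GIB, 128 * MIB, 64

# One split per block: 512 map tasks, so containers are launched in 8 waves.
waves_per_block = map_waves(input_size, block_size, slots)

# Coalesced splits: 8 blocks (1 GiB) per split, 64 map tasks, a single wave.
split_size = single_wave_split_size(input_size, block_size, slots)
waves_coalesced = map_waves(input_size, split_size, slots)
```

Under these assumptions, coalescing eight blocks into each split cuts the number of containers from 512 to 64, so the per-container initialization cost is paid once per slot rather than once per block.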

