2017 IEEE Symposium on Computers and Communications (ISCC)

Abstract

Big data analytics frameworks are moving towards higher degrees of parallelism and shorter task durations to achieve lower latency. Consequently, millions of scheduling decisions must be made per second, which poses a major challenge for today's centralized schedulers. Many researchers and enterprises have therefore turned to distributed scheduling approaches to avoid the throughput limitation of centralized designs. To our knowledge, Omega, Apollo and Sparrow are three well-known early approaches to distributed scheduling, but each has shortcomings and none of them adopts a peer-to-peer architecture. We therefore propose a new scheduling approach called Piper, which adapts the peer-to-peer idea to distributed scheduling and provides near-optimal performance. We have implemented Piper using Apache Thrift, and the results show that Piper reduces job response times by more than 1.5× compared with Sparrow (we select Sparrow for comparison because it is a leading design and is open source). In addition, we use trace-driven simulations to evaluate Piper when scaling to large clusters; these further show that Piper performs better than Sparrow.
