Abstract
Emerging scientific simulations on leadership-class systems are generating huge amounts of data. However, the increasing gap between computation and disk I/O speeds makes traditional data analytics pipelines based on post-processing cost-prohibitive and often infeasible. In this paper, we investigate an alternative approach that aims to bring the analytics closer to the data using data staging and the in-situ execution of data analysis operations. Specifically, we present the design, implementation, and evaluation of a framework that supports in-situ feature-based object tracking on distributed scientific datasets. Central to this framework is the scalable decentralized and online clustering (DOC) and cluster-tracking algorithm, which executes in-situ (on different cores) and in parallel with the simulation processes, and retrieves data from the simulations directly via on-chip shared memory. The results of our experimental evaluation demonstrate that the in-situ approach significantly reduces the cost of data movement, that the presented framework can support scalable feature-based object tracking, and that it can be used effectively for in-situ analytics for large-scale simulations.