Abstract
Wide-area memory transfers between on-going computations and remote steering, analysis, and visualization sites are employed in several High-Performance Computing (HPC) scenarios. Dedicated network connections with high capacity, low loss rates, and little competing traffic are typically provisioned over current HPC infrastructures to support such transfers. To gain insights into these transfers, we collected throughput measurements for different versions of TCP between dedicated multi-core servers over emulated 10 Gbps connections with round trip times (rtt) in the range 0-366 ms. Existing TCP models and measurements over shared links are well known to exhibit monotonically decreasing, convex throughput profiles as rtt increases. In sharp contrast, our measurements show two distinct regimes: a concave profile at lower rtts and a convex profile at higher rtts. We present analytical results that explain these regimes: (a) at lower rtt, the rapid throughput increase during slow start leads to the concave profile, and (b) at higher rtt, the congestion avoidance phase, with its slower dynamics, dominates. In both cases, however, we analytically show that throughput decreases with rtt, albeit at different rates, as confirmed by the measurements. These results indicate that suitably configured TCP offers a practical solution for such transfers without the additional hardware required by Infiniband or the additional software required by UDP-based solutions.
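To make the rtt dependence concrete, the following is a minimal back-of-envelope sketch, not the paper's analytical model: it assumes a fixed-size transfer over a clean dedicated link, an exponential slow-start ramp until the window fills the bandwidth-delay product, and line-rate delivery afterwards. The transfer size, initial window, and MSS values are illustrative assumptions.

```python
def transfer_time(size_bytes, rtt_s, capacity_bps, mss=1500, init_cwnd=10):
    """Rough transfer time on a dedicated, loss-free link:
    doubling slow-start rounds until the window reaches the
    bandwidth-delay product, then the remainder at line rate.
    (Toy model for illustration; assumed parameters, not measured.)"""
    bdp_pkts = capacity_bps * rtt_s / (8 * mss)   # packets in flight at line rate
    cwnd, sent, t = init_cwnd, 0.0, 0.0
    while sent < size_bytes and cwnd < bdp_pkts:
        sent += cwnd * mss                        # one slow-start round
        t += rtt_s
        cwnd *= 2
    if sent < size_bytes:
        t += (size_bytes - sent) * 8 / capacity_bps  # remainder at line rate
    return t

size = 10 * 2**30          # hypothetical 10 GiB transfer
cap = 10e9                 # 10 Gbps emulated connection
for rtt_ms in (0.4, 1, 5, 11, 22, 45, 90, 180, 366):
    thr = size * 8 / transfer_time(size, rtt_ms / 1e3, cap) / 1e9
    print(f"rtt {rtt_ms:6.1f} ms -> ~{thr:5.2f} Gbps")
```

Even this simplified model shows throughput near line rate at small rtt and a growing penalty from slow-start ramp time as rtt increases; the paper's analysis additionally accounts for the congestion avoidance dynamics that dominate at large rtt.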