Guillaume Rosinosky - StreamBed: capacity planning for stream processing

October 30, 2023

12:45-13:45

Free

Louvain-la-Neuve

Shannon Room, Maxwell Building a.105

Guillaume Rosinosky, a post-doctoral researcher, will present "StreamBed: capacity planning for stream processing". Distributed Stream Processing (DSP) supports long-lived processing jobs over continuous data.

DSP engines scale out, processing incoming data in parallel across as many resources as a job requires. Determining these resources and their configuration in advance for long-running, large-scale jobs is desirable but difficult, because such jobs exhibit non-trivial scaling behaviors and non-trivial responses to configuration changes.

StreamBed is a capacity planning system for stream processing. It predicts, ahead of any production deployment, the resources that a job will require to process an incoming data rate sustainably, and the appropriate configuration of these resources. StreamBed builds a capacity planning model by piloting a series of runs of the target query in a small-scale, controlled testbed.
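To make the idea of extrapolating from small pilot runs concrete, here is a deliberately naive sketch. It is not StreamBed's model. It assumes that the highest input rate a job sustains grows linearly with its core count, which the abstract notes real jobs often do not do. It fits that line to a few small-scale runs and solves for the core count needed at a target rate. The function names `fit_linear` and `cores_for_rate` are hypothetical, and the pilot numbers are synthetic values chosen for illustration, not measurements.

```python
import math


def fit_linear(cores, throughput):
    """Ordinary least-squares fit of throughput = slope * cores + intercept."""
    n = len(cores)
    if n < 2:
        raise ValueError("need at least two pilot runs")
    mean_x = sum(cores) / n
    mean_y = sum(throughput) / n
    sxx = sum((x - mean_x) ** 2 for x in cores)
    if sxx == 0:
        raise ValueError("pilot runs must use different core counts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cores, throughput))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def cores_for_rate(target_rate, slope, intercept):
    """Smallest core count whose predicted sustainable rate is >= target_rate."""
    if slope <= 0:
        raise ValueError("model does not predict throughput growing with cores")
    return max(1, math.ceil((target_rate - intercept) / slope))


# Synthetic pilot runs on a small testbed (hypothetical values, assuming
# perfectly linear scaling): cores used, and highest input rate (events/s)
# each run sustained.
pilot_cores = [4, 8, 12, 16]
pilot_rates = [40_000, 80_000, 120_000, 160_000]

slope, intercept = fit_linear(pilot_cores, pilot_rates)
print(cores_for_rate(10_000_000, slope, intercept))  # prints 1000
```

A linear fit is the simplest possible baseline. The non-trivial scaling behaviors mentioned above are exactly why a real capacity planner needs a richer model and must also choose the configuration of the resources, not just their count.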

We implement StreamBed for the popular Apache Flink DSP engine. Our evaluation with large-scale queries of the Nexmark benchmark demonstrates that StreamBed can effectively and accurately predict capacity requirements for jobs spanning more than 1,000 cores using a testbed of only 48 cores.
