Globally Optimized and Automated Data/ML Pipelines on the Cloud

Efficiently managing the infrastructure of thousands of data pipelines in multi-tenant environments is a daunting task for any enterprise. Because the infrastructure search space is intractable to explore manually, poor application tuning, resource allocation, and scheduling can lead to exorbitant cloud costs, sluggish performance, and failed jobs.

We present the automatic globally optimized resource allocation (AGORA) scheduler, which jointly selects Apache Spark configurations, cloud hardware infrastructure, and Airflow schedules to obtain the mathematically optimal infrastructure.

We experimentally demonstrate on AWS up to 78% cost savings or 31% performance acceleration of a Spark and Airflow DAG. Simulations of our solution on a multi-day Alibaba cloud trace demonstrate a 62% reduction in cost and runtime for its batch workloads.
