I have started reading about alternatives to Airflow. Lately, Prefect and Dagster have come up quite a bit, so I decided to do a short comparison of these technologies for my use case, where I usually run up to 100 short-lived data movement tasks (Lambdas, Azure Functions, AWS Batch jobs, Container Instances) per day. After that, the data transformation is done using dbt.

| | Airflow | Prefect | Dagster |
|---|---|---|---|
| Managed service available | MWAA, Astronomer, Google Cloud Composer | You have to manage your own cluster for the workflows to run on | Dagster Cloud is in preview. It still seems to expect you to have your own agents to run the workflows on. |
| Local testing convenience (scale 1 to 5) | 2 | 5 | 5 |
| Connectors | A lot of them | You have to write your own | You have to write your own |
| Coupled with data business logic | No, but a lot of users still couple it | More coupled | Most coupled |
| Performance | Scheduler is the bottleneck | First impressions have been good | First impressions have been good |
MANAGED SERVICES AVAILABLE
For me this is super important, as I want to manage as little as possible, and it is my biggest issue with Prefect. Dagster Cloud is still in preview, so it is hard to comment on, but they seem to be going with the same approach as Prefect. Airflow does have some managed options, but they start from 500 dollars a month, which is annoyingly expensive for me.
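To put that price in perspective for my workload, here is a rough back-of-the-envelope calculation. It assumes a 30-day month and the upper bound of 100 tasks a day from the introduction. The 500 dollar figure is the entry price mentioned above, not a quote from any particular vendor.

```python
# Rough cost per task run for a managed Airflow service.
# Assumptions: 30-day month, 100 task runs per day (my upper bound).
MONTHLY_PRICE_USD = 500
TASKS_PER_DAY = 100
DAYS_PER_MONTH = 30


def cost_per_task(monthly_price: float, tasks_per_day: int, days: int = DAYS_PER_MONTH) -> float:
    """Return the managed-service cost attributed to a single task run."""
    return monthly_price / (tasks_per_day * days)


if __name__ == "__main__":
    # 500 / (100 * 30) is roughly 0.17 dollars per task run.
    print(f"${cost_per_task(MONTHLY_PRICE_USD, TASKS_PER_DAY):.2f} per task run")
```

On days with fewer than 100 tasks the cost per run is higher, which is why the entry price feels steep for short-lived jobs.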
LOCAL TESTING
Although the official Airflow Docker image has made it simpler, Airflow is by far the most inconvenient here, as you can't run the Python scripts directly.
Dagster and Prefect work the best, as you can run them as regular Python functions.
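A minimal sketch of what I mean by testing a task as a regular Python function. The function `move_rows` and its sample data are hypothetical. In Prefect or Dagster the same function would additionally carry the tool's decorator, which I have left out so that the snippet runs with the standard library alone.

```python
from typing import Dict, Iterable, List


def move_rows(rows: Iterable[Dict], required_key: str = "id") -> List[Dict]:
    """Hypothetical data movement step: keep only rows that have the required key."""
    return [row for row in rows if required_key in row]


if __name__ == "__main__":
    sample = [{"id": 1, "name": "a"}, {"name": "missing id"}]
    # A plain call is the whole test: no scheduler or container is needed.
    assert move_rows(sample) == [{"id": 1, "name": "a"}]
    print("ok")
```

The point is only that the feedback loop is a normal `python my_task.py` run, not that this is how either tool must be used.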
CONNECTORS
I want to run my business logic separately from my scheduler, and the easier that is, the better. So far I have not seen a very simple way to connect Dagster or Prefect with services like Lambdas or Azure Functions. Both tools seem to expect you to run the workflows under their management. Airflow, on the other hand, provides a lot of connectors.
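The kind of thin connector I would have to write myself could look roughly like this. It is only a sketch: it assumes the function is reachable over HTTP (for example an Azure Function with an HTTP trigger), it ignores authentication, and the name `trigger_http_function` and the URL in the usage note are hypothetical. The `opener` parameter exists only so the call can be tested without a network.

```python
import json
import urllib.request
from typing import Any, Callable, Dict


def trigger_http_function(
    url: str,
    payload: Dict[str, Any],
    opener: Callable[..., Any] = urllib.request.urlopen,
    timeout: float = 30.0,
) -> Dict[str, Any]:
    """POST a JSON payload to an HTTP-triggered function and return its JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # With the default opener, a 4xx/5xx response raises urllib.error.HTTPError,
    # so a failed function call also fails the scheduler task that wraps it.
    with opener(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A scheduler task would then only call something like `trigger_http_function("https://example.invalid/api/move-data", {"table": "orders"})`, which keeps the business logic in the function itself and out of the scheduler.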
COUPLED SCHEDULING AND BUSINESS LOGIC
PERFORMANCE
From what I have read, Prefect seems to be the best here, as it solves a lot of the issues that Airflow had with scheduling delays.
REFERENCES
Benefits of Dagster compared to Airflow https://dagster.io/blog/dagster-airflow