Based on a real-life case study of the Airflow platform at Digital Harvest Technology in Shanghai, we present the practice of Airflow application, operation and maintenance, and custom development in complex scenarios.
Challenges of complex scenarios：
- how to guarantee high availability for cross-cloud distributed deployments.
- How to effectively support multiple types of scheduling scenarios.
- How to guarantee high availability for ETL operations.
- how scheduling governance is carried out.
- how to achieve maximum automation.
Also for some business requirements：
- Data analysts have a large number of scheduling needs, and it is difficult to get started with DAG Python script development
- DAGs belonging to departments or individuals do not want to be edited, viewed and manually scheduled by other departmental staff?
- How to improve efficiency and avoid some non-standardized operations?
- How can the messaging system trigger job run approval?
Share the corresponding optimization solutions：
- DAG configuration visualization: interface configuration of DAG parameters, background automatic generation of DAG files.
- DAG authority control: sub-departmental DAG empowerment, distinguishing between read, write and execution.
- Job standardization monitoring: configure detection rules to monitor whether the job conforms to the rules and execute the corresponding prompts.
- Event triggering plug-in: receive various messages such as Sensor jobs and AMQP to trigger the execution of the corresponding jobs.”
Wu Lian: Big data development engineer at Shanghai DataSeed Information Technology with 2 years of experience in using, maintaining and developing airflow, and a deep understanding of airflow, I hope my experience and understanding can contribute to the airflow open source community.