About this project
Parameterised Azure Data Factory pipeline framework for ELT orchestration across on-premises SQL, cloud data stores, and REST APIs. Includes retry logic, metadata-driven loading, and alerting integration.
Background
The data platform at Accent Group needs to bring together data from on-premises SQL servers (point-of-sale history, inventory), cloud APIs (e-commerce, supplier integrations), and various line-of-business systems, and land it in a consistent, governed form for analytics. Azure Data Factory is the orchestration layer, but without a framework around it, you end up with hundreds of individual pipelines that each hard-code their source connection details and loading logic.
The metadata-driven approach solves this by separating the orchestration logic from the configuration. A control table defines the source, destination, watermark column, and load strategy for each dataset. The master pipeline reads that table and, for each row, dispatches to a generic loading pipeline with those values as parameters. Adding a new data source means adding a row to the control table, not creating a new pipeline. That design reduced the time to onboard a new source from days to hours.
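The control-table dispatch described above can be sketched in plain Python. This is an illustration of the pattern only, not the pipeline definition itself: the ControlRow type, the parameter names, the load strategy values, and the sample table and path names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ControlRow:
    """One row of the control table (hypothetical shape)."""
    source: str
    destination: str
    watermark_column: str
    load_strategy: str  # assumed values: "full" or "incremental"


def build_parameters(row: ControlRow) -> dict:
    """Turn a control row into the parameters handed to the generic loader."""
    return {
        "source": row.source,
        "destination": row.destination,
        "watermarkColumn": row.watermark_column,
        "loadStrategy": row.load_strategy,
    }


def run_master(rows: Iterable[ControlRow], dispatch: Callable[[dict], None]) -> int:
    """Dispatch every control row to the generic loader; return how many ran."""
    count = 0
    for row in rows:
        dispatch(build_parameters(row))
        count += 1
    return count


# Hypothetical entries: onboarding a new source is one more row here,
# not a new pipeline.
CONTROL_TABLE = [
    ControlRow("pos.SalesHistory", "raw/pos/sales_history", "ModifiedAt", "incremental"),
    ControlRow("inv.StockLevels", "raw/inventory/stock_levels", "", "full"),
]

if __name__ == "__main__":
    # Stand-in for invoking the generic loading pipeline.
    run_master(CONTROL_TABLE, dispatch=print)
```

The dispatch callable is injected so the same loop can be exercised locally with print or wired to whatever actually triggers the generic pipeline.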
Watermark-based change detection handles incremental loading: each run records the highest value seen in the watermark column (usually a timestamp or sequence number), and the next run loads only rows above that value. Retry logic with exponential backoff handles transient network failures against on-premises sources. Failure alerts via Logic Apps give the data team visibility into pipeline health without requiring them to watch the ADF monitoring console.
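A minimal Python sketch of the two mechanisms, under stated assumptions: rows are in-memory dictionaries, the watermark is compared with a strict greater-than, and the function names, attempt count, and base delay are hypothetical. It shows the logic only, not how the pipeline activities are configured.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_new_rows(rows: Sequence[dict], column: str, last_watermark: Optional[str]) -> list:
    """Return rows whose watermark value is strictly above the stored one."""
    if last_watermark is None:
        return list(rows)  # first run: nothing recorded yet, load everything
    return [r for r in rows if r[column] > last_watermark]


def next_watermark(loaded: Sequence[dict], column: str, last_watermark: Optional[str]) -> Optional[str]:
    """Highest value seen in this run, or the old watermark if nothing loaded."""
    if not loaded:
        return last_watermark
    return max(r[column] for r in loaded)


def with_retry(
    operation: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry transient failures, doubling the wait after each failed attempt."""
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure for alerting
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


if __name__ == "__main__":
    source = [
        {"id": 1, "ModifiedAt": "2024-01-01T00:00:00"},
        {"id": 2, "ModifiedAt": "2024-01-03T00:00:00"},
    ]
    new = with_retry(lambda: select_new_rows(source, "ModifiedAt", "2024-01-01T00:00:00"))
    print(new, next_watermark(new, "ModifiedAt", "2024-01-01T00:00:00"))
```

Keeping the old watermark when a run loads nothing means an empty run never moves the starting point backwards, and re-raising after the final attempt leaves the failure visible to whatever alerting sits downstream.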
Highlights
- Metadata-driven pipeline configuration: new sources are added without pipeline changes
- Parameterised linked services and datasets for environment portability
- Incremental load patterns with watermark-based change detection
- Failure alerting via Logic Apps email and Teams notifications
- Deployed and configured as code via Bicep and ADF REST API