
Data Science labs blog

Free Azure Data Factory hands-on labs from Microsoft




Abstract and learning objectives

In this workshop, you will deploy an end-to-end Azure ELT solution. The workshop uses Azure Data Factory (and Mapping Dataflows) to perform Extract, Load, Transform (ELT) with Azure Blob storage and Azure SQL DB. It uses Azure DevOps repositories for source control over ADF pipelines, and Azure DevOps pipelines to deploy across multiple environments, including Dev, Test and Production.

By attending this workshop, you will be better able to build a complete Azure Data Factory ELT pipeline. In addition, you will learn to:

  • Deploy Azure Data Factory including an Integration Runtime.

  • Build Mapping Dataflows in ADF.

  • Create Blob Storage and Azure SQLDB Linked Services.

  • Create Azure Key Vault and Linked Services in ADF.

  • Create a parameterized ADF pipeline.

  • Install Azure Data Factory self-hosted integration runtime to ingest from on-premises data systems.

  • (In progress) Perform code-free Spark ELT using Azure Data Factory Mapping Dataflows.

  • (To do) Source control ADF pipelines.

  • (To do) CI/CD ADF pipelines and your ELT code.
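To give a feel for the "parameterized pipeline" objective above: ADF stores each pipeline as a JSON document, declares parameters under `properties.parameters`, and reads them with expressions such as `@pipeline().parameters.<name>`. The sketch below builds such a document as a Python dictionary. The pipeline, activity, dataset and parameter names (`CopyBlobToSql`, `CopyFileToSql`, `BlobCsvDataset`, `SqlTableDataset`, `fileName`) are assumptions made up for illustration and are not taken from the lab guides.

```python
import json


def build_copy_pipeline(name, source_dataset, sink_dataset):
    """Return a dict shaped like an ADF pipeline definition with one parameter.

    All names passed in or hard-coded here are hypothetical examples.
    """
    return {
        "name": name,
        "properties": {
            # Pipeline-level parameter, supplied at trigger or debug time.
            "parameters": {
                "fileName": {"type": "String", "defaultValue": "customers.csv"},
            },
            "activities": [
                {
                    "name": "CopyFileToSql",
                    "type": "Copy",
                    "inputs": [
                        {
                            "referenceName": source_dataset,
                            "type": "DatasetReference",
                            # Pass the pipeline parameter down to the dataset.
                            "parameters": {
                                "fileName": {
                                    "value": "@pipeline().parameters.fileName",
                                    "type": "Expression",
                                }
                            },
                        }
                    ],
                    "outputs": [
                        {"referenceName": sink_dataset, "type": "DatasetReference"}
                    ],
                    "typeProperties": {
                        "source": {"type": "DelimitedTextSource"},
                        "sink": {"type": "AzureSqlSink"},
                    },
                }
            ],
        },
    }


pipeline = build_copy_pipeline("CopyBlobToSql", "BlobCsvDataset", "SqlTableDataset")
print(json.dumps(pipeline, indent=2))
```

In the labs you build the equivalent definition through the ADF authoring UI; the JSON view there should show a similar overall shape, though the exact properties depend on the datasets you create.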

This hands-on lab is designed to provide exposure to many of Microsoft’s transformative line of business applications built using Microsoft data and advanced analytics. The goal is to show an end-to-end solution, leveraging many of these technologies, but not necessarily doing work in every component possible. The lab architecture is below and includes:

  • Azure Data Factory (ADF)

  • Azure Storage

  • Azure Data Factory Mapping Dataflows

  • Azure SQL Database

  • Azure Key Vault

  • (optional) Azure DevOps

Overview

WideWorldImporters (WWI) imports a wide range of products, which it then resells to retailers and directly to the public. In an increasingly crowded market, they are always looking for ways to differentiate themselves and provide added value to their customers.

They are looking to pilot a data warehouse to provide additional information useful to their internal sales and marketing agents. They want to enable their agents to perform AS-IS and AS-WAS analysis in order to price the items more accurately and predict the product demand at different times during the year.

To extend its physical presence, WWI recently acquired a medium-sized supermarket business called SmartFoods, whose differentiating factor is its emphasis on providing comprehensive information on food nutrients so that customers can make health-wise decisions. SmartFoods runs its own loyalty program, in which customers accumulate points on their purchases. The WWI CIO is hoping to use the loyalty program information and SmartFoods' food nutrients database to provide customers with a HealthSmart portal. The portal will show aggregated information on each customer's important food nutrients (carbs, saturated fats, etc.) to promote healthy and SmartFood shopping.
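The aggregation behind the HealthSmart portal can be pictured as a join followed by a group-by: purchase lines are matched to the nutrient content of each food, then summed per customer and nutrient. The Python sketch below shows only that logic on a few made-up rows. In the labs this kind of transformation is built code-free with Mapping Dataflows, and the column names used here (`customer_id`, `food_id`, `quantity`, `nutrient`, `amount`) are assumptions, not the real schema of the SmartFoods data.

```python
from collections import defaultdict

# Hypothetical sample rows; the real lab data may use different columns.
order_lines = [
    {"customer_id": "C1", "food_id": "F1", "quantity": 2},
    {"customer_id": "C1", "food_id": "F2", "quantity": 1},
    {"customer_id": "C2", "food_id": "F1", "quantity": 3},
]

# Assumed meaning: amount of each nutrient per purchased unit of a food.
food_nutrition = [
    {"food_id": "F1", "nutrient": "Carbs", "amount": 10.0},
    {"food_id": "F1", "nutrient": "Saturated fat", "amount": 1.5},
    {"food_id": "F2", "nutrient": "Carbs", "amount": 4.0},
]


def nutrients_per_customer(order_lines, food_nutrition):
    """Join order lines to nutrient rows on food_id, then sum per customer."""
    # Index nutrient rows by food so the join is a dictionary lookup.
    by_food = defaultdict(list)
    for row in food_nutrition:
        by_food[row["food_id"]].append(row)

    totals = defaultdict(float)
    for line in order_lines:
        # Foods with no nutrient rows contribute nothing.
        for nutrient_row in by_food.get(line["food_id"], []):
            key = (line["customer_id"], nutrient_row["nutrient"])
            totals[key] += line["quantity"] * nutrient_row["amount"]
    return dict(totals)


totals = nutrients_per_customer(order_lines, food_nutrition)
for (customer, nutrient), amount in sorted(totals.items()):
    print(customer, nutrient, amount)
```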

In this hands-on lab, attendees will build an end-to-end data warehouse solution using a data lake methodology.

Solution architecture

Below is a diagram of the solution architecture you will build in this lab. Please study this carefully so you understand the solution as a whole before building the various components.

Data sources:

  1. SmartFoods REST API

     Type: REST API
     Authentication: OAuth2
     Data Endpoints:
       1. Order line Transactions (CSV)
       2. Customers (JSON)
       3. Auth Token (JSON)
     Frequency: Daily
     Documentation: https://github.com/Mmodarre/retailDataGeneratorAzureFunction

  2. SmartFoods Items

     Type: On-premises local file system
     Authentication: NA
     Data Endpoints:
       1. Food (CSV)
       2. Food-Nutrition (CSV)
       3. Nutrition (CSV)
     Frequency: NA (one-off)
     Documentation: (none listed)

  3. WWI OLTP

     Type: SFTP
     Authentication: Username/Password
     Data Endpoints:
       1. Orderline Transactions (Parquet)
       2. Orders Transactions (Parquet)
       3. Customers (Parquet)
     Frequency: Daily
     Documentation: (none listed)
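One way to keep the three data sources above in view while working through the labs is to record them as a small catalog and derive the list of endpoints that need a daily load. The sketch below is an illustration only: the `DataSource` class and `daily_endpoints` helper are hypothetical and are not part of the lab material, although a similar metadata-driven idea underlies the Lookup and ForEach lab.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """One row of the data source list (hypothetical helper type)."""
    name: str
    source_type: str
    authentication: str
    frequency: str
    endpoints: tuple  # (endpoint name, file format) pairs


SOURCES = (
    DataSource(
        "SmartFoods REST API", "REST API", "OAuth2", "Daily",
        (("Order line Transactions", "CSV"),
         ("Customers", "JSON"),
         ("Auth Token", "JSON")),
    ),
    DataSource(
        "SmartFoods Items", "On-premises local file system", "NA", "One-off",
        (("Food", "CSV"), ("Food-Nutrition", "CSV"), ("Nutrition", "CSV")),
    ),
    DataSource(
        "WWI OLTP", "SFTP", "Username/Password", "Daily",
        (("Orderline Transactions", "Parquet"),
         ("Orders Transactions", "Parquet"),
         ("Customers", "Parquet")),
    ),
)


def daily_endpoints(sources):
    """List (source, endpoint, format) for every endpoint refreshed daily."""
    return [
        (source.name, endpoint, file_format)
        for source in sources
        if source.frequency == "Daily"
        for endpoint, file_format in source.endpoints
    ]


daily = daily_endpoints(SOURCES)
for source_name, endpoint, file_format in daily:
    print(f"{source_name}: {endpoint} ({file_format})")
```

The one-off SmartFoods Items files are left out of the daily list; in the labs they are ingested once through the self-hosted integration runtime.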

Requirements

  1. Microsoft Azure subscription: Free Trial, pay-as-you-go (credit card), or MSDN subscription.

  2. MS Windows development environment (only required for the Azure self-hosted IR. If you are using a Linux or macOS workstation, you can achieve the same by running a Windows VM locally or in Azure.)

  3. Azure Storage Explorer

Getting Started

The hands-on lab documents are located under the Lab-guide directory. Here is the list of labs available:

Azure Data Factory:

  1. Before_the_hands-on_lab_(Prepare_the_environment)

  2. Linked_Services_Datasets_and_Integration_Runtimes

  3. Copy_Activity_Parameters_Debug_and_Publishing

  4. Lookup_activity_ForEach_loop_and_Execute_Pipeline_activity

  5. Get_Metadata_activity_filter_activity_and_complex_expressions

  6. Self-hosted_Integration_Runtime__decompress_files_and_Delete_activity

Azure Data Factory Mapping Data Flows:

  1. SmartFoodsCustomerELT

  2. ELT_with_Mapping_Dataflows–Practice_excercises
