This GitHub repository houses Terraform scripts that deploy persistent Azure infrastructure to support Big Data operations for Data Engineering with Azure Databricks (Spark).
- Azure Resource Group - The logical grouping of application-specific resources.
- Azure Data Lake Storage - Stores data in the form of Files and Blobs.
- Azure Databricks - The computation layer that provides Spark for Data Engineering.
- Azure Virtual Network - The network restrictions that restrict open internet access for the Data Engineering infrastructure.
- Azure Subnet - The resource- or application-level restrictions that restrict traffic between infrastructure components.
- Azure Network Security Group (NSG) - The security group that houses rules to restrict ingress and egress network traffic.
- Azure Network security rules - The rules inside NSGs that restrict application-specific ingress/egress traffic.
- Azure NSG Subnet association - The association between Azure Subnet and Azure NSGs that applies specific rules to specific applications.
- Azure SQL Server - The SQL server that houses SQL databases on Azure.
- Azure Synapse - The data warehousing layer on SQL Server that stores and processes large amounts of data on Azure.
- Azure Virtual machine - The Linux virtual machines that support Data Engineering needs and visualization.
- Azure Key Vault - The vault that stores Keys, Secrets and Certificates on Azure instead of hard-coding them.
- The repository also houses
  - `Dockerfile` to support a Jenkins slave for DevOps automation.
  - PowerShell scripts to provision access using `az` CLI commands and modify resource-level configuration.
  - `Jenkinsfile` to support automated DevOps deployment integrated with GitHub.
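
As a rough illustration of how the networking resources listed above relate to each other, the following is a minimal Terraform sketch, not the repository's actual code. It assumes the `hashicorp/azurerm` provider (3.x); every resource name, the region, the address ranges and the single deny rule are hypothetical placeholders chosen for the example.

```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # assumed provider version
    }
  }
}

provider "azurerm" {
  features {}
}

# Resource Group: logical wrapper for all resources below (hypothetical name/region).
resource "azurerm_resource_group" "example" {
  name     = "rg-data-engineering"
  location = "West Europe"
}

# Virtual Network: the private address space for the infrastructure.
resource "azurerm_virtual_network" "example" {
  name                = "vnet-data-engineering"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
}

# Subnet: a slice of the virtual network for one application or component.
resource "azurerm_subnet" "example" {
  name                 = "snet-data-engineering"
  resource_group_name  = azurerm_resource_group.example.name
  virtual_network_name = azurerm_virtual_network.example.name
  address_prefixes     = ["10.0.1.0/24"]
}

# Network Security Group: container for ingress/egress rules.
resource "azurerm_network_security_group" "example" {
  name                = "nsg-data-engineering"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
}

# Security rule: illustrative rule that denies inbound traffic from the internet.
resource "azurerm_network_security_rule" "deny_internet_inbound" {
  name                        = "deny-internet-inbound"
  priority                    = 4000
  direction                   = "Inbound"
  access                      = "Deny"
  protocol                    = "*"
  source_port_range           = "*"
  destination_port_range      = "*"
  source_address_prefix       = "Internet"
  destination_address_prefix  = "*"
  resource_group_name         = azurerm_resource_group.example.name
  network_security_group_name = azurerm_network_security_group.example.name
}

# NSG-Subnet association: applies the NSG's rules to the subnet.
resource "azurerm_subnet_network_security_group_association" "example" {
  subnet_id                 = azurerm_subnet.example.id
  network_security_group_id = azurerm_network_security_group.example.id
}
```

The association is a separate resource here so that the subnet and the NSG can be managed independently and paired as needed, which matches the "specific rules to specific applications" idea described above.
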
The latest enhancements are merged into the `master` branch for release.
- `master` branch
  - Merging needs Pull Request review/approval.
  - Once changes are reviewed and merged into `develop`, raise a Pull Request against `master` for the enhancement to be released.
- `develop` branch
  - Create enhancement-wise branches out of it.
  - Contribute enhancement by enhancement.
  - Push the latest code with Pull Requests and get it reviewed.
  - Merging needs Pull Request review/approval.