Azure Cloud Data Engineers
Summary

An Azure cloud engineer from PeoplActive will be responsible for managing, maintaining, monitoring, and securing (including data security) all servers, including installations, upgrades, patches, and documentation. The engineer is accountable for meeting deliverable commitments and quality compliance.

  • A Data Platform Solution Architect, presently associated with Confidential Corporation, with a strong consulting and presales background and 15 years of hands-on experience in Big Data, Data Science, Cloud, and Enterprise applications.
  • Implemented large Lambda architectures using Azure data platform capabilities like Azure Data Lake, Azure Data Factory, HDInsight, Azure SQL Server, Azure ML, and Power BI.
  • Demonstrated expert-level technical capabilities in Azure Batch and Interactive solutions, Azure Machine Learning solutions, and operationalizing end-to-end Azure Cloud Analytics solutions.
  • Strong experience in Azure Cloud platforms like
  • Credited with 4 individual (sole inventor) patents in Big Data, analytics, mobility, and cellular technology areas, and was rated PDF level 5 at IBM Business Consulting Services (BCS), which is the highest level of technical competence at IBM Business Consulting Services United States.
  • Designed end-to-end scalable architectures to solve business problems using various Azure components like HDInsight, Data Factory, Data Lake, Storage, and Machine Learning Studio.
  • Strong experience leading multiple Azure Big Data and data transformation implementations in the Banking and Financial Services, High Tech, and Utilities industries.
  • Possesses hands-on experience in Cloudera Hadoop, Hortonworks Hadoop, various ETL tools, Cassandra, and various Confidential IaaS/PaaS services.
  • Owned multiple end-to-end transformations of customer business analytics challenges, decomposing them into a combination of suitable infrastructure (IaaS/PaaS/Hybrid) and software (MapReduce) paradigms, and then used machine learning algorithms to derive effective intelligence from data lakes.
  • Sized and designed scalable Big Data landscapes with central Hadoop processing platforms and associated technologies like ETL tools and NoSQL databases, on both Cloud and On-Prem hardware, to enable end-to-end business use cases.
  • Rated Top Data Platform Architect in Confidential’s Connect (Appraisal Program).
Technical Skills
Server: 16 two-socket Confidential Xeon servers based on the E5-2600 family of processors
DataNodes: 26 TB HDFS raw storage per node with 96 GB of RAM per node
Network Adapters: 10 GbE converged network adapters
Switches: 2 x 48-port 10 GbE

Professional Experience
Azure Data Engineer/ Design
Confidential
  • Designed and architected scalable data processing and analytics solutions, covering technical feasibility, integration, and development for Big Data storage, processing, and consumption on Azure. The solutions spanned big data (Hadoop, Spark), business intelligence (Reporting Services, Power BI), NoSQL, HDInsight, Stream Analytics, Data Factory, Event Hubs, and Notification Hubs.
  • Designed and built a Data Discovery Platform for a large system integrator using Azure HDInsight components. Used Azure Data Factory and Data Catalog to ingest and maintain data sources. Security on HDInsight was enabled using Azure Active Directory and
  • Designed an end-to-end Azure cloud-based analytics dashboard for a state government to show real-time updates for the 2016 state assembly elections. The solution utilized Power BI, Enterprise Gateway, and Azure SQL Server.
  • Designed and built a machine learning model to predict airline ticket prices for a system integrator, enabling them to reduce their employee travel expenditure by 26%. The model used multiple regression models to predict how airline seat prices and hotel reservations varied across the year given various parameters.
  • Owned the Azure technical customer engagement, including architectural design sessions and implementation of projects using big data use cases, Hadoop-based design patterns, and real-time/stream analytics.
  • Designed machine learning models for a major systems integrator to predict datacentre failure rates.
  • Conducted numerous training and demonstration sessions on Big Data for various government and private-sector customers, ramping them up on Azure Big Data solutions.
  • Built numerous technology demonstrators using the Confidential Edison Arduino shield with Azure Event Hubs and Stream Analytics, integrated with Power BI and Azure ML, to demonstrate the capabilities of Azure Stream Analytics.
  • Designed machine learning models (classifiers) for the state of Andhra Pradesh to predict dropouts in their primary and secondary schooling system. These models ingested data on population demographics, historical student performance and dropout rates, and family economics to arrive at a prediction and probability for the dropout rate.
  • Designed an Azure Event Hubs and Stream Analytics architecture fulfilling the analytics and data storage requirements for a Smart Grid project. The project envisaged installing around 70,000 electric consumption smart meters at residences and small-scale industries, all controlled by a smart grid.
Senior Architect
Confidential
  • Conceptualized and built a Big Data analytics platform at Confidential with open source tools to enable a wide range of customers to run proofs of concept at very low price points. This initiative has helped the sales and account mining process, with a significant percentage of customers re-engaging after a successful proof of concept, leading to the generation of active Big Data projects.
  • Architected multiple projects using Hadoop both in fresh implementation and production support cases.
  • Senior architect responsible for anchoring the transformation journey to replace the customer’s existing EDW with a Lambda data processing architecture, with the MapR Apache Hadoop distribution as the core engine. The total data size is approx. 7.5 petabytes.
  • Working with customer architects to create roadmaps for 3 separate business tracks migrating to the new data processing architecture.
  • Decomposing requirements from individual business tracks into small use cases which could then be realized using time-bound proof of concepts.
  • Providing hands-on expertise in the realization of small PoCs to gain business confidence, leading to a commitment to the overall transformation.
  • Customer Support Area: Enabling the loading of SFDC data into Hive tables using Sqoop for querying.
  • Designing logical zones for data in MapR PoC clusters for loading data.
  • Studying incoming file structures and the ETL transformations applied to those files, translating them into appropriate Hive jobs, and trying out Spark SQL as an alternative.
  • Working with the customer chief architect to define his vision and help him in tool selection.
  • Lead architect in a Big Data implementation for an Australian Financial Services customer enabling Advice Compliance Monitoring using Cloudera Hadoop CDH 5.2.1 and associated Big Data technologies.
  • Architected the Big Data landscape components on AWS cloud, which included a Linux staging server, Informatica Big Data Edition, Cloudera Hadoop (CDH 5.2.1), Amazon Redshift, and QlikView.
  • Provided hands-on installation leadership and guidance on setting up and managing the Cloudera Hadoop Cluster with data at rest and data in motion encryption.
  • Designed Hadoop cluster security using various security protocols, which included:
  • Conducting data security workshops to explain data security aspects and gain insight into customer requirements
  • Enabling cluster perimeter security using AWS VPC
  • Conducting EC2 port audits to keep open only those ports that Cloudera Manager and Hadoop need to operate
  • Controlling user access through coarse-grained security based on Linux access protocols and integration with LDAP, so that users can seamlessly access HUE
  • Creating data silos with Apache Sentry to limit visibility into data, so that data is displayed on a need-to-know basis
  • Configured data in motion encryption to encrypt shuffle and sort phases of MapReduce
  • Provided insight into Data at rest encryption using Navigator Encrypt utility.
  • Used Cloudera Manager and Cloudera Director to install separate Hadoop Clusters for Development and Testing. Performed benchmark testing on the Hadoop Clusters to ensure. Worked with Cloudera professional services to ensure the cluster was certified for production jobs.
  • Worked with the ETL Team to load the Financial Statements of Advice (SoA) into the HDFS layer.
  • Translated the requirements (comparison of scope and strategy, comparison of scope of advice and recommended products, cost comparison, lost benefits, and insurance ownership) into effective MapReduce algorithms, which were then run on the Financial Statement of Advice documents.
  • Loaded results in Hive and then extracted them with the Informatica ODBC connectors to an AWS Redshift Cluster for R-based analysis by the customer’s data scientists.
  • Transitioned the cluster to production support and managed production support activities like cluster space management and maintenance of critical services such as MapReduce, Hive, and Impala until handover to managed services.
Azure Data Engineer/ Design
Confidential
  • Installed a Hortonworks Hadoop cluster on Confidential Azure cloud in the UK region to satisfy the customer’s data locality needs.
  • Created Data ingestion methods using SAP BODS to load data from SAP CRM and SAP ECC into HDFS
  • Used Oozie workflows to query the data and extract tables into MongoDB.
  • Mongo view was used as the operational data store to view the data and generate reports.
  • Built a Hortonworks Cluster on Confidential Azure to extract actionable insights for data collected from IoT sensors installed in excavators.
  • Loaded machine data collected via installed sensors and GPS equipment from an MSSQL data dump into the HDP cluster using Sqoop.
  • Conducted multiple workshops with the business to understand the data and determine which insights would bring the most immediate value.
  • Processed the data in HDP using Hive and provided simple analytics using Highcharts on a Meteor UI platform.
  • Applied a General Linear Model using MS Big ML to provide insights into the risk of failure of these excavators. The analysis collated sensor and maintenance data from multiple sources, such as engine oil temperature, pressure, maintenance history, battery level, and fuel gradient.
  • The project is in flight, with work in progress to enable multiple insights for the owner, dealer, and manufacturer.
Big Data Cloud Architect
Confidential
  • Successfully strategized and executed multiple enterprise cloud transformation programs in Big Data and Enterprise Applications, in the following areas:
  • Designed and built multiple Elastic MapReduce (EMR) clusters on AWS cloud for Hadoop MapReduce applications to enable multiple proofs of concept.
  • Architected Big Data clusters on the cloud to ingest and write data from and to Amazon S3 storage and Redshift to keep the cost of data storage to a minimum.
  • Enabled CloudWatch and Ganglia on large AWS-based clusters for effective cluster monitoring.
  • Built mid to large clusters on AWS cloud using multiple instances of the Amazon EC2 cloud. This was to enable use cases that used distributions of Cloudera Hadoop or Hortonworks Hadoop.
  • Conceptualized the migration of SAP Upstream systems to AWS to reduce TCO.
  • Enabled multiple Big Data use cases for customers ranging from Security as a service to Customer sentiment analysis
  • Enabled Big Data analytics using Hadoop to create a predictive engine in SAP applications like SAP CRM
  • Designed Big Data landscapes with the Hortonworks Hadoop stack, Flume, and Jasper charts to enable analytics models for security and intrusion detection for a Swiss customer.
  • In process of ramping up on Confidential Azure to enable the creation and management of Big Data architectures using MS Azure.
Azure Data Engineer/ Design
Confidential
  • Enterprise architect for Sales and Marketing IT organization specializing in areas of Enterprise Architecture for Big Data, Hadoop, and Analytics.
  • Involved in the conceptualization and design of a Confidential Big Data Platform to extract structured and unstructured data from enterprise transactional systems and then apply Big Data analysis techniques and Apache Mahout machine learning techniques to determine customer sentiment.
  • Designed the Hadoop platform for high performance and low cost when compared with existing in-house data warehousing systems.
  • Architected the congregation of 3 key components of the Big Data platform
  • On-Prem third party Massively Parallel Platform (MPP) hardware
  • Chose Confidential’s distribution of Apache Hadoop after extensive performance evaluations against two other Hadoop vendors, based on the following criteria:
  • Fitment into the massively parallel platform, security, high availability, and multitenancy support.
  • Ease of administration, operation, and upgrades.
  • Software architecture including support for ingestion tools, machine learning, and statistical analysis.
  • Implemented a “Start Small” strategy to quickly stand up a 16-server, 192-core on-premises Hadoop platform in 5 weeks from conceptualization to operationalization.
  • Orchestrated cross-organizational partnership for activities like evaluation of Apache Hadoop distribution, cluster design, hardware selection, and wiring, security and access management, integration of ingestion and extraction tools with existing enterprise landscape.
  • The final cluster design in the data center was two racks with total HDFS storage of 300 TB (effectively 100 TB usable with 3-way replication), running MapReduce Version 1.
  • Evaluated and signed off on final cluster architecture which consisted of the below components
  • Designed the cluster for processing enterprise structured data from data warehouses like Teradata and SAP HANA and unstructured data from weblogs, XML, Social Media, and flat files.
  • Enabled multiple data ingestion techniques like Flume, Sqoop, Enterprise ETL tools, and simple PUT-based HDFS data insertion to enable ingestion of business data into the cluster through multiple ways.
  • Enabled the ‘Capacity Scheduler’ to balance Hadoop cluster resources, enabling multiple MapReduce batch jobs to run in parallel on the cluster.
  • Set up data processing tools like Hive and Pig. Enabled Nagios and Ganglia for visualizing the cluster resources.
  • Designed a multi-layered security architecture by creating a system for project isolation within the cluster, preventing users from accidentally viewing or deleting data from another project’s center. Integrated the system with Confidential’s Enterprise Access Management tool (EAM) and Active Directory for seamless authentication into the system.
  • Developed a predictive analytics engine to deliver information and insights using real-time ongoing predictive analytics.
  • Led multiple POCs with the aforementioned Big Data platform demonstrating value to business groups.
  • Led the conceptualization of an integrated analytics hub (IAH) at Confidential as a central Hadoop data store for various sales and marketing businesses to load and extract data. The centralized data strategy also helps Confidential get more insight into the data dependencies and drive more actionable intelligence.
  • Successfully delivered on the below big data use cases at Confidential resulting in a total estimated savings of $10 M a year
  • Incident predictability: Proactively understand and monitor customer incidents and automate root cause analysis and incident prevention. A proof of concept demonstrated a reduction of about 30% of the 4,000 incidents per week.
  • Recommendation Engine: Apache Mahout Algorithms were utilized to classify and cluster incidents to build a scalable recommendation engine that could be used by various businesses.
  • Customer Insight: Data gathered from websites was combined with consumption data from the Teradata warehouse to enable better product availability and appropriate inventory levels at various manufacturing sites.
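The cluster sizing quoted in this role (300 TB of raw HDFS storage giving roughly 100 TB usable under 3-way replication) follows from dividing raw capacity by the replication factor. A minimal sketch of that arithmetic is below. The function name and the reserved_fraction parameter are hypothetical illustrations, not part of any Hadoop API, and the sketch ignores real-world overheads such as temporary and intermediate data.

```python
def usable_hdfs_capacity_tb(raw_tb: float,
                            replication_factor: int = 3,
                            reserved_fraction: float = 0.0) -> float:
    """Estimate usable HDFS capacity in TB.

    raw_tb: total raw disk capacity across all DataNodes.
    replication_factor: number of copies HDFS keeps of each block.
    reserved_fraction: hypothetical share of raw space kept for
        non-HDFS use; assumed to be 0 unless stated otherwise.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if not 0.0 <= reserved_fraction < 1.0:
        raise ValueError("reserved_fraction must be in [0, 1)")
    return raw_tb * (1.0 - reserved_fraction) / replication_factor


if __name__ == "__main__":
    # Figures from the cluster design in this role: 300 TB raw, 3 replicas.
    print(usable_hdfs_capacity_tb(300, 3))
```

With the figures from this cluster, the function returns 100.0, matching the usable capacity stated in the design.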
SAP Business Analyst
Confidential
  • Architected an SAP HANA landscape to transform the existing data warehousing landscape from SAP BW into SAP BW and SAP BO on HANA.
  • Designed the process to move an older data warehousing architecture from an Ab Initio and Teradata-based system to a BODS, SLT, and SAP HANA-based system.
  • Created methodologies to extract and load data from flat files (Sales Commissions and Geo Discounts) into SAP HANA using SAP BODS
  • Identified tables extracted from SAP ECC, SAP SCM, and SAP CRM and designed methodologies to load these tables into HANA using SAP Landscape Transformation (SLT) service
  • Improved the transformation logic in BODS to reduce errors in master data and transaction data, which in turn reduced job failures in SAP BW.
  • Generated EOL plans for Ab Initio and Teradata based systems to ensure smooth rollover into SAP HANA
  • Managed and influenced multiple business stakeholders from various geographies by way of explaining their use cases and how they will be impacted, benefits of HANA in their use case, and critical timelines like quarter close reporting for commissions payout.
  • Currently working on enabling a Hadoop cluster to enable the extraction and processing of Customer data from Confidential web interaction sites.
  • Comprehensive understanding of large-scale reporting architectures involving SAP BOBI, Information Design Tool (IDT), SAP Lumira, and Xcelsius tools in the following areas:
    - Reporting from SAP BO BI from Teradata or SAP HANA
    - Creation of a BO BI universe and reporting using SAP Web intelligence
    - Creation of simple reports from Xcelsius
    - Creation of databases in MongoDB and creating Java applications