IN_Senior Associate_Data Engineer_Emerging Businesses_Advisory_Bangalore

PwC • 4h ago

Bengaluru Millenia • Onsite • Full-time • Senior Level • 6+ yrs exp

Top focus

Data Engineer • Senior Data Engineer • VP Data • Data Warehouse Engineer • Data Analyst

Line of Service: Advisory
Industry/Sector: Not Applicable
Specialism: Operations
Management Level: Senior Associate

Job Description & Summary

At PwC, our people in software and product innovation focus on developing cutting-edge software solutions and driving product innovation to meet the evolving needs of clients.

These individuals combine technical experience with creative thinking to deliver innovative software products and solutions. In emerging technology at PwC, you will focus on exploring and implementing cutting-edge technologies to drive innovation and transformation for clients.

You will work in areas such as artificial intelligence, blockchain, and the internet of things (IoT).

Why PwC

At PwC, you will be part of a vibrant community of solvers that leads with trust and creates distinctive outcomes for our clients and communities.

This purpose-led and values-driven work, powered by technology in an environment that drives innovation, will enable you to make a tangible impact in the real world. We reward your contributions, support your wellbeing, and offer inclusive benefits, flexibility programmes and mentorship that will help you thrive in work and life.

Together, we grow, learn, care, collaborate, and create a future of infinite experiences for each other. Learn more about us.

At PwC, we believe in providing equal employment opportunities, without any discrimination on the grounds of gender, ethnic background, age, disability, marital status, sexual orientation, pregnancy, gender identity or expression, religion or other beliefs, perceived differences and status protected by law.

We strive to create an environment where each one of our people can bring their true selves and contribute to their personal growth and the firm’s growth. To enable this, we have zero tolerance for any discrimination and harassment based on the above considerations.

Job Description & Summary: A career within ..............................

Responsibilities

Design, build, and maintain robust ETL/ELT pipelines on Azure using Apache Spark (PySpark/Scala) on Databricks/HDInsight
Orchestrate complex data workflows with Azure Data Factory (pipelines, triggers, integration runtimes) and Databricks Jobs/Workflows
Develop and manage data lakes on Azure Blob Storage/ADLS Gen2, including naming, partitioning, lifecycle policies, and schema evolution
Implement curated layers (raw, staged, curated), leveraging file formats (Parquet/ORC/Avro) and table formats (Delta Lake); manage metastore/Unity Catalog
Optimize Spark jobs and cluster configurations for performance and cost (autoscaling, spot VMs, Photon, adaptive query execution, caching, partition tuning)
Operationalize jobs with monitoring, logging, and alerting via Azure Monitor, Log Analytics, and Databricks metrics; build runbooks and dashboards
Implement data quality, testing, and observability for pipelines (unit/integration tests, Great Expectations, SLAs, lineage)
Collaborate with Analytics, Data Science, and Product to deliver modeled, trustworthy datasets for BI, ML, and applications
Enforce security and governance best practices (AAD RBAC, Managed Identities, ACLs, Key Vault, Private Endpoints, VNet integration, encryption with CMK/SSE)
Contribute to infrastructure as code and CI/CD (Azure DevOps or GitHub Actions) including Databricks objects deployment
Participate in on-call rotations, incident response, and postmortems; drive continuous improvement and documentation

Mandatory skill sets:

6+ years as a Data Engineer (or similar) with a strong focus on Azure data services
Expert-level experience with Apache Spark (PySpark and/or Scala) and distributed data processing
Hands-on experience with Databricks and Azure HDInsight for large-scale batch processing
Proficient in Azure Data Factory for orchestration (pipelines, data flows, triggers)
Strong Python skills and solid SQL (window functions, performance tuning, optimization)
Practical experience with Azure Blob Storage/ADLS Gen2, Azure Key Vault, Azure Monitor/Log Analytics
Understanding of Hadoop ecosystem fundamentals (HDFS, YARN, Hive/Metastore)
Strong grasp of data modeling, file formats (Parquet/ORC/Avro), partitioning, and performance best practices
Experience building production-grade pipelines with testing, monitoring, and alerting
Version control with Git and collaborative development practices
Excellent communication and cross-functional collaboration skills

Preferred skill sets:

Stream processing and event-driven architectures (Kafka, Azure Event Hubs, Azure Functions)
Lakehouse technologies (Delta Lake, Unity Catalog) and query engines (Synapse Serverless, Databricks SQL)
Governance and lineage tools (Microsoft Purview, OpenLineage)
Cost optimization on Azure (cluster policies, Photon, spot VMs, storage tiers hot/cool/archive)
Infrastructure as code (Terraform/Bicep) and CI/CD for data workflows (Azure DevOps/GitHub Actions)
Containerization and orchestration (Docker, AKS) and packaging for Databricks (dbx, DABs)
Experience integrating with warehouses (Synapse Dedicated SQL Pools, Snowflake) and BI tools (Power BI)
Security/compliance exposure (PII handling, least-privilege, network isolation)

Years of experience required: 5-7 years

Education qualification: BE/B.Tech/MBA/MCA

Education (if blank, degree and/or field of study not specified)
Degrees/Field of Study required: Bachelor of Engineering, MBA (Master of Business Administration)
Degrees/Field of Study preferred:

Certifications (if blank, certifications not specified)

Required Skills: Data Engineering

Optional Skills: Accepting Feedback, Active Listening, Analytical Thinking, Artificial Intelligence, Business Planning and Simulation (BW-BPS), Communication, Competitive Advantage, Conducting Research, Creativity, Digital Transformation, Embracing Change, Emotional Regulation, Empathy, Implementing Technology, Inclusion, Innovation Processes, Intellectual Curiosity, Internet of Things (IoT), Learning Agility, Optimism, Product Development, Product Testing, Prototyping, Quality Assurance Process Management {+ 10 more}

Desired Languages (If blank, desired languages not specified)

Travel Requirements:
Available for Work Visa Sponsorship?
Government Clearance Required?
Job Posting End Date: July 6, 2026

Required skills

Azure • Apache Spark • PySpark • Scala • Databricks • Azure Data Factory • Python • SQL • Azure Blob Storage • ADLS Gen2 • Azure Key Vault • Azure Monitor • Log Analytics • Git • Terraform