Sr Manager, Software Engineering

Lowes•14h ago

Bengaluru • Onsite • Full-time • Manager Level • 12+ yrs exp

Top focus

Engineering Manager • VP Engineering • Senior Engineering Manager • Software Engineer • Software Engineer II

Innovate in Bengaluru

This position is based at our on-site office in Bengaluru. Lowe's offers an ultramodern work environment, complete with cutting-edge technology, collaborative workspaces, an on-site gym and clinic, and other perks to enhance your work experience.

About Lowe’s

Lowe’s is a FORTUNE® 100 home improvement company serving approximately 16 million customer transactions a week in the United States. With total fiscal year 2024 sales of more than $83 billion, Lowe’s operates over 1,700 home improvement stores and employs approximately 300,000 associates.

Based in Mooresville, N.C., Lowe’s supports the communities it serves through programs focused on creating safe, affordable housing, improving community spaces, helping to develop the next generation of skilled trade experts and providing disaster relief to communities in need.

For more information, visit Lowes.com

Lowe’s India, the Global Capability Center of Lowe’s Companies Inc., is a hub for driving our technology, business, analytics, and shared services strategy. Based in Bengaluru with over 4,500 associates, it powers innovations across omnichannel retail, AI/ML, enterprise architecture, supply chain, and customer experience.

From supporting and launching homegrown solutions to fostering innovation through its Catalyze platform, Lowe’s India plays a pivotal role in transforming home improvement retail while upholding a strong commitment to social impact and sustainability.

For more information, visit Lowes India

Job Summary:

As a Senior Engineering Manager, Reliability Engineering, you will lead multiple Reliability Engineering teams responsible for ensuring the reliability, performance, scalability, and operational readiness of Lowe's critical Supply Chain platforms, including Inventory Management, Distribution Centers, Transportation Management, Warehouse Management Systems (WMS), Order Fulfillment, Replenishment, Merchandising, and Buy-Move applications.

You will drive the adoption of reliability engineering principles, champion observability and automation, lead incident management and resiliency initiatives, and foster a culture of continuous improvement across the organization.

Roles & Responsibilities:

Core Responsibilities:

Lead, mentor, and develop high-performing Site Reliability Engineering teams while fostering a culture of ownership, innovation, operational excellence, and continuous learning.

Establish and drive SRE practices including Service Level Indicators (SLIs), Service Level Objectives (SLOs), Error Budgets, capacity planning, resiliency engineering, and reliability governance. Partner with Supply Chain Product and Engineering leaders to improve system reliability, scalability, availability, and performance.

Lead major incident management processes, post-incident reviews, root cause analysis, and continuous improvement initiatives to reduce operational risk and prevent recurrence. Drive observability strategy across monitoring, logging, tracing, alerting, and analytics platforms to provide actionable insights into system health and customer experience.

Champion automation of operational processes, incident response, infrastructure management, deployment workflows, and reliability controls to reduce manual effort and improve operational efficiency. Collaborate with Cloud Engineering, Platform Engineering, Security, and Application Development teams to design and operate resilient cloud-native architectures.

Establish reliability engineering standards and operational readiness reviews for new services and platform capabilities. Oversee capacity planning, performance engineering, disaster recovery testing, and business continuity initiatives for critical supply chain platforms.

Develop and track reliability KPIs, operational metrics, and executive-level reporting to communicate service health and organizational risk. Develop SRE platforms. Drive adoption of DevSecOps, Infrastructure as Code (IaC), CI/CD automation, and cloud reliability best practices.

Manage vendor relationships and evaluate tools and technologies that improve observability, incident response, automation, and service reliability. Support budget planning, workforce planning, and strategic roadmap development for the reliability engineering organization.

Partner with global teams across the United States and India to provide 24x7 operational excellence and support critical business events.

Years of Experience:

12+ years of experience in Software Engineering, Site Reliability Engineering, Production Engineering, DevOps, Cloud Engineering, or related disciplines. 3+ years of people leadership experience managing engineering teams.

Experience leading geographically distributed teams across multiple time zones. Experience leading both application engineering and operational support organizations. Experience in implementing distributed systems, microservices architecture, event-driven systems, and API ecosystems.

Education Qualification & Certifications

Required Minimum Qualifications:

Bachelor’s degree in computer science, computer information systems (CIS), or a related field, or equivalent years of experience in lieu of the education requirement, if applicable.

Skill Set Required

Deep expertise in SRE principles, reliability engineering methodologies, and operational excellence frameworks.

Experience with public cloud platforms such as Google Cloud Platform (GCP), AWS, or Azure. Experience with Kubernetes, container orchestration, service mesh, and cloud-native technologies. Strong knowledge of observability platforms such as Datadog, Dynatrace, Splunk, Grafana, Prometheus, OpenTelemetry, or equivalent.

Experience implementing and managing SLIs, SLOs, Error Budgets, and reliability scorecards. Strong understanding of CI/CD pipelines, Infrastructure as Code (Terraform), GitOps, and automation frameworks. Experience leading incident command, crisis management, and large-scale outage response.

Experience driving reliability transformation initiatives within large retail. Experience supporting large-scale enterprise platforms and mission-critical Supply Chain systems. Experience managing large-scale production environments with strict availability and performance requirements.

Lowe's is an equal opportunity employer and administers all personnel practices without regard to race, color, religious creed, sex, gender, age, ancestry, national origin, mental or physical disability or medical condition, sexual orientation, gender identity or expression, marital status, military or veteran status, genetic information, or any other category protected under federal, state, or local law.

Required skills

SRE • AWS • Azure • GCP • Kubernetes • Terraform • CI/CD • DevOps • observability • microservices • API