Senior Site Reliability Engineer
Mastercard • 12h ago
Pune, India • Hybrid • Full-time • Senior Level • 5+ yrs exp
Top focus
SRE
Our Purpose
Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we’re helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.
Title and Summary
Senior Site Reliability Engineer
Overview
The Mastercard Authentication Program owns how consumer authentication works across both in-store and e-commerce use cases. This role contributes to building and operating best-in-class authentication products such as ID Check, Token Authentication Service, and Token Authentication Framework, driving adoption, scale, and revenue growth. This position sits at the intersection of Site Reliability Engineering, platform engineering, and release engineering, ensuring our platforms are scalable, resilient, secure, and production-ready across hybrid environments (Kubernetes, cloud, and on-prem).
Role Summary
As a Senior Software Engineer (SRE), you will own the reliability, performance, and operational excellence of Kubernetes-based workloads and distributed systems. You will drive automation, observability, and continuous delivery practices, ensuring systems are highly available and scalable, enabling fast and reliable product delivery.
Key Responsibilities
Platform Reliability & Engineering
- Ensure availability, performance, and resilience of workloads deployed on Kubernetes, on-premises, PCF, or cloud platforms, aligned to SLO/SLI objectives
- Design, build, and operate scalable, secure, and highly available containerized platforms and services
- Optimize capacity, performance, and cost efficiency across distributed systems
Automation, CI/CD & Release Engineering
- Build and enhance CI/CD pipelines to enable fast, reliable, and repeatable deployments
- Automate infrastructure and operations using Infrastructure as Code and GitOps practices
- Plan and execute software releases across environments (Kubernetes, AWS, Azure, PCF, VMs)
- Drive continuous improvement in release processes, deployment strategies, and development workflows
Observability, Performance & Operations
- Implement strong monitoring, logging, and tracing to enable proactive issue detection
- Conduct performance testing and tuning to ensure scalability and reliability
- Lead incident management, root cause analysis (RCA), and postmortems to drive continuous improvement
Security & Governance
- Ensure adherence to security, compliance, and enterprise standards across deployments
- Implement workload security, secrets management, and runtime protections
- Contribute to operational governance and platform standardization
Collaboration & Engineering Excellence
- Partner with engineering, product, and operations teams to improve production readiness and system design
- Support troubleshooting, debugging, and issue resolution across environments
- Maintain clear documentation for release processes, pipelines, and operational procedures
- Drive adoption of modern engineering practices to improve reliability, maintainability, and scalability
AI & Innovation
- Leverage AI-driven tools and automation to enhance observability, incident response, release efficiency, and capacity planning
- Identify and build AI-enabled use cases to improve operational efficiency and reliability outcomes
All About You
Experience & Skills
- Strong experience in SRE / Release Engineering / Platform Engineering roles
- Proven experience managing and operating workloads deployed on Kubernetes, on-premises (bare metal / VMs), and PCF; exposure to public cloud platforms (AWS, Azure) is good to have
- Hands-on expertise in containerization (Docker) and Kubernetes orchestration
- Strong experience with CI/CD tools, scripting (Shell/Python/Groovy), and automation frameworks
- Solid knowledge of monitoring, observability, performance testing, and scaling of distributed systems (e.g., Dynatrace, Splunk, JMeter, LoadRunner)
- Strong programming experience (preferably Java / Spring Boot)
Engineering Practices
- Strong understanding of SDLC, Git-based workflows (Gitflow), and peer review practices
- Expertise in troubleshooting and debugging (thread dumps, heap dumps, performance bottlenecks)
- Familiarity with security principles and deployment of secure systems
- Exposure to Infrastructure as Code, DevOps, and GitOps practices
Personal Attributes
- Strong problem-solving and analytical skills
- Excellent communication and collaboration abilities
- Ability to work in a fast-paced, high-impact environment and manage multiple priorities
- Self-driven with ability to work both independently and as part of a team
Our Teams & Values
- Work in small, collaborative teams of engineers and product partners
- Put customer success at the center of everything we do
- Foster a diverse, inclusive, and high-performing culture
- Believe in doing well by doing good, supporting ethical and responsible innovation
Corporate Security Responsibility
All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization and, therefore, it is expected that every person working for, or on behalf of, Mastercard is responsible for information security and must:
- Abide by Mastercard’s security policies and practices
- Ensure the confidentiality and integrity of the information being accessed
- Report any suspected information security violation or breach
- Complete all periodic mandatory security trainings in accordance with Mastercard’s guidelines
Required skills
Kubernetes, AWS, Azure, Docker, CI/CD, Shell, Python, Groovy, Dynatrace, Splunk, JMeter, LoadRunner, Java, Spring Boot