Engineer II - Site Reliability (Hybrid, IND)

CrowdStrike • 1d ago

India - Bangalore | Hybrid | Full-time | Mid Level | 3+ yrs exp

Top focus

SRE

As a global leader in cybersecurity, CrowdStrike protects the people, processes and technologies that drive modern organizations. Since 2011, our mission hasn’t changed — we’re here to stop breaches, and we’ve redefined modern security with the world’s most advanced AI-native platform.

Our customers span all industries, and they count on CrowdStrike to keep their businesses running, their communities safe and their lives moving forward. We’re also a mission-driven company. We cultivate a culture that gives every CrowdStriker both the flexibility and autonomy to own their careers.

We’re always looking to add talented CrowdStrikers to the team who have limitless passion, a relentless focus on innovation and a fanatical commitment to our customers, our community and each other. Ready to join a mission that matters? The future of cybersecurity starts with you.

About This Role: CrowdStrike's engineering organization depends on shared infrastructure platforms that power critical product capabilities. The Temporal Platform team owns a production workflow orchestration system that serves engineering teams across the organization.

You'll help operate and evolve our internal Temporal infrastructure, a stateful, distributed system running on Kubernetes across multiple regions. The work spans day-to-day operations, automation, performance tuning and capacity planning. You'll learn how to run complex infrastructure at scale while working alongside experienced platform engineers who will help you grow into broader ownership over time.

This is a growth-oriented role. We're looking for someone early in their platform engineering journey who's ready to build operational depth, develop automation skills and understand what it takes to run production infrastructure that teams depend on.

What You'll Do

Operate Temporal infrastructure in production - deploy updates, monitor cluster health, respond to alerts, and maintain availability across multiple environments using Helm, Kubernetes and FluxCD
Automate operational work - write scripts and workflows that make deployments, upgrades, scaling operations, and troubleshooting repeatable and safe; reduce manual toil over time
Support capacity planning and performance tuning - track resource utilization, identify bottlenecks, tune configuration for better performance and contribute to capacity forecasts under guidance
Build observability - instrument services with metrics and logs, improve dashboards, and refine alerting so the team can catch problems before they impact users
Contribute to the on-call rotation - participate in incident response, learn how to triage and escalate issues effectively, and write runbooks that help the next person on call
Learn GitOps workflows - work with FluxCD to manage infrastructure-as-code, submit pull requests for configuration changes, and understand how declarative deployment pipelines work
Troubleshoot operational issues - investigate deployment failures, connectivity problems and performance degradations, and work with teammates to determine root cause and preventive fixes
Partner with consuming teams - help internal engineers onboard to Temporal, answer questions, debug integration issues, and contribute to documentation that makes adoption easier
Grow your infrastructure skills - work with PostgreSQL, AWS/GCP, Kubernetes networking, Helm chart management, certificate rotation, secret management and distributed systems operations under mentorship

What You'll Need:

3+ years in DevOps, SRE, platform engineering or infrastructure roles - you've worked on production systems and understand the basics of running services reliably
Kubernetes fundamentals - you've deployed services to Kubernetes, understand pods/deployments/services, and can debug basic cluster issues; you don't need deep expertise but should be comfortable navigating kubectl and reviewing YAML
Helm experience - you've used Helm to deploy applications, understand charts and values files, and can troubleshoot failed releases
Some infrastructure-as-code experience - you've used tools like Terraform, Ansible, or GitOps workflows (FluxCD, ArgoCD) to manage infrastructure declaratively rather than clicking in consoles
Cloud platform exposure - you've worked with AWS or GCP in some capacity; you understand basic compute, networking, and storage primitives but don't need to be an expert
Scripting ability - you can write scripts (Bash, Python, Go) to automate repetitive tasks and build simple tooling
Basic understanding of stateful systems - you've worked with databases (PostgreSQL preferred) or other persistent services and understand backups, schema management, and connection handling at a foundational level
Willingness to learn and ask for help - you're comfortable saying "I don't know" and diving into unfamiliar territory with support from teammates

What Success Looks Like:

In your first few months:
You can deploy Temporal upgrades across environments with confidence
You've automated at least one recurring operational task
You respond to on-call pages effectively and write clear incident summaries
You've contributed meaningful improvements to dashboards or runbooks
Internal teams reach out to you directly for help with Temporal questions

Over your first year:
You own end-to-end operations for specific Temporal components or environments
You proactively identify performance issues and propose tuning strategies
You're contributing to capacity planning and cost optimization discussions
You're helping onboard new engineers to the team's operational practices

Bonus Points:

Experience operating workflow orchestration platforms (Temporal, Airflow, Prefect, Cadence)
Experience with FluxCD or ArgoCD in production
Exposure to distributed tracing or observability platforms
Go experience (our services and many consuming applications are written in Go)
Previous work on internal platform teams or DevOps infrastructure roles
Understanding of PostgreSQL performance tuning and operational best practices
Familiarity with multi-region infrastructure deployment and failover patterns

Benefits of Working at CrowdStrike:

Market leader in compensation and equity awards
Comprehensive physical and mental wellness programs
Competitive vacation and holidays for recharge
Paid parental and adoption leaves
Professional development opportunities for all employees regardless of level or role
Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections
Vibrant office culture with world-class amenities
Great Place to Work Certified™ across the globe

CrowdStrike is proud to be an equal opportunity employer.
We are committed to fostering a culture of belonging where everyone is valued for who they are and empowered to succeed.
We support veterans and individuals with disabilities through our affirmative action program.
CrowdStrike is committed to providing equal employment opportunity for all employees and applicants for employment.
The Company does not discriminate in employment opportunities or practices on the basis of race, color, creed, ethnicity, religion, sex (including pregnancy or pregnancy-related medical conditions), sexual orientation, gender identity, marital or family status, veteran status, age, national origin, ancestry, physical disability (including HIV and AIDS), mental disability, medical condition, genetic information, membership or activity in a local human rights commission, status with regard to public assistance, or any other characteristic protected by law.
We base all employment decisions--including recruitment, selection, training, compensation, benefits, discipline, promotions, transfers, lay-offs, return from lay-off, terminations and social/recreational programs--on valid job requirements.
If you need assistance accessing or reviewing the information on this website or need help submitting an application for employment or requesting an accommodation, please contact us at recruiting@crowdstrike.com for further assistance.

Required skills

Kubernetes, Helm, AWS, GCP, Terraform, Ansible, GitOps, Bash, Python, Go, PostgreSQL