Staff SRE, Ads

Reddit | 4h ago
United Kingdom | Remote | Full-time | Staff Level | 8+ yrs exp

Top focus

SRE
  • Reddit is a community of communities. It’s built on shared interests and passion, and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote, and comment on the topics they care most about. With 100,000+ active communities and approximately 126 million daily active unique visitors, Reddit is one of the internet’s largest sources of information. For more information, visit www.redditinc.com.
  • Location: Reddit has a flexible-first workforce. Don't live near our office? No worries: you can work remotely from anywhere in the UK, the Netherlands, or Ireland.
  • The Ads organization powers Reddit's advertising platform, enabling advertisers to reach highly engaged communities while helping Reddit grow its business. The reliability of our Ads systems directly impacts advertiser success, revenue generation, and user experience.
  • The Ads Reliability team partners closely with Ads Engineering teams to improve reliability, scalability, operational excellence, and developer productivity across Reddit's advertising ecosystem.
  • We're looking for a Staff Site Reliability Engineer who will provide technical leadership for reliability initiatives across the Ads organization and help shape the future of Ads infrastructure at Reddit.

What You’ll Do

  • Lead reliability initiatives across multiple Ads domains including ad serving, auctions, targeting, reporting, measurement, and billing.
  • Partner with engineering leadership to improve reliability, scalability, operational excellence, and engineering efficiency across the Ads organization.
  • Drive architecture reviews and influence technical decisions impacting critical revenue-generating systems.
  • Design and build platforms, tooling, and automation that improve reliability and developer productivity at scale.
  • Participate in on-call rotations, lead complex incident investigations, and coordinate cross-functional response efforts during major production events.
  • Identify systemic reliability risks and drive long-term solutions that improve platform resilience.
  • Establish reliability metrics around advertiser-critical user journeys such as campaign creation, ad delivery, auction participation, reporting, attribution, and billing.
  • Mentor engineers and provide technical leadership across multiple teams.
  • Influence roadmap planning and ensure reliability considerations are incorporated into product and infrastructure investments.

What We’re Looking For

  • 8+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or related roles operating large-scale distributed systems.
  • Strong experience supporting high-traffic, user-facing production environments.
  • Deep understanding of distributed systems, networking, Linux systems, and cloud-native architectures.
  • Experience designing highly available systems with strong operational and reliability practices.
  • Strong understanding of observability systems including metrics, logging, tracing, and alerting.
  • Good programming skills in languages such as Go, Python, or similar.
  • Experience improving reliability through SLOs, automation, incident management, and performance optimization.
  • Demonstrated ability to troubleshoot complex issues across a modern distributed system stack.
  • Strong collaboration and communication skills, with the ability to influence technical direction across teams.

Nice To Have

  • Experience supporting advertising technology platforms or other large-scale revenue-critical systems.
  • Deep understanding of reliability challenges associated with ad-serving, real-time auctions, budget pacing, campaign delivery, measurement, attribution, or billing systems.
  • Experience operating high-QPS, low-latency services where latency directly impacts business outcomes.
  • Experience establishing reliability programs that deliver meaningful, measurable business outcomes.
  • Experience with Kubernetes, cloud infrastructure, and large-scale distributed systems.
  • Familiarity with Kafka, ClickHouse, Spark, Flink, BigQuery, or similar large-scale data platforms.
  • Experience partnering with Product, Data Science, and Ads Engineering organizations.
  • Experience supporting machine learning inference or recommendation systems at scale.

Benefits

  • Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
  • Family Planning Support
  • Gender-Affirming Care
  • Mental Health & Coaching Benefits
  • Group Personal Pension Scheme with Employer match
  • Private Medical and Dental Scheme
  • Income Replacement Programs
  • Bike to Work scheme
  • Flexible Vacation & Paid Volunteer Time Off
  • Generous Paid Parental Leave
  • In select roles and locations, the interviews will be recorded, transcribed and summarized by artificial intelligence (AI). You will have the opportunity to opt out of recording, transcription and summarization prior to any scheduled interviews.
  • During the interview, we will collect the following categories of personal information: Identifiers, Professional and Employment-Related Information, Sensory Information (audio/video recording), and any other categories of personal information you choose to share with us. We will use this information to evaluate your application for employment or an independent contractor role, as applicable. We will not sell your personal information or disclose it to any third party for their marketing purposes. We will delete any recording of your interview promptly after making a hiring decision. For more information about how we will handle your personal information, including our retention of it, please refer to our Candidate Privacy Policy for Potential Employees and Contractors.
  • Reddit is proud to be an equal opportunity employer and is committed to building a workforce representative of the diverse communities we serve. Reddit is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If, due to a disability, you need an accommodation during the interview process, please let your recruiter know.

Required skills

Site Reliability Engineering, Infrastructure Engineering, distributed systems, Linux, cloud native architectures, observability, Go, Python, Kubernetes, Kafka, ClickHouse, Spark, Flink, BigQuery