Senior Manager Software Development, AWS Systems Manager
Amazon Development Center U.S., Inc. • 3h ago
United States · Onsite · Full-time · Senior Level · 10+ yrs exp
H-1B verified · 2310 LCAs
Top focus
Senior Software Engineer · Software Engineer · Systems Engineer · Software Engineer II
AI agents can diagnose infrastructure problems in seconds, but they still stop at "here's what you should do" because nobody trusts them to actually press the button. Your teams will build the execution layer that changes that. This is a Sr. SDM role leading three engineering teams in AWS Systems Manager's Automation suite, making it safe for AI agents and human operators to take real actions on cloud infrastructure at scale.

Systems Manager Automation already runs 175M+ steps per week across 668K active accounts. The next chapter is turning it into the standard execution interface for autonomous operations, where AI agents (AWS Frontier Agents, third-party tools, customer-built agents) can safely execute runbooks with pre-execution impact analysis, blast radius scoping, and automatic rollback. Your teams will ship the capabilities that make customers say "yes, I trust this to run without me watching."

What your teams will build:
- Pre-execution impact analysis: LLM-powered previews that show customers exactly what a runbook will change before anything executes. Catches risky actions (restarting during peak, deleting resources with active dependencies) in real time.
- Enterprise-scale document sharing: seamless runbook sharing across hundreds of accounts and organizational units, solving the top customer pain point for enterprises operating at scale
- The safety framework that unlocks AI agent adoption: granular runtime permissions, approval workflows, and a break-glass execution model where agents execute through runbooks without needing direct infrastructure write permissions
- Intelligent orchestration: context-aware runbook recommendations that surface the right action at the right time based on operational signals

Why this is a rare opportunity:
- You're building at the intersection of AI and infrastructure safety, the problem that determines whether autonomous operations stays a demo or becomes production reality
- Your customers are the largest enterprises in the world, and your decisions about what's "safe enough" directly shape how the industry approaches AI-driven operations
- The flywheel is real: better safety → more trust → more content → more integrations → more adoption. You're early enough to influence the shape of it.
- Three teams is the right size: enough leverage to ship meaningful capabilities every quarter, small enough to know the architecture and the engineers

What you'll do:
- Lead three SDMs and their teams through a multi-year roadmap from foundations (2026) through intelligent orchestration (2028)
- Partner with Frontier Agents, Pulsar (observability), and Binder (security) teams as the execution layer they build on top of
- Make hard calls about safety vs. speed: how much verification is enough before letting an agent execute a production change?
- Own operational excellence for services handling 175M+ weekly executions across all AWS regions
- Grow engineers and managers who can independently design safety-critical distributed systems

You're a great fit if you've led multiple engineering teams, operated services at scale, and are energized by the problem of building trust in autonomous systems. Experience with AI/ML integration, workflow engines, or safety-critical systems is a plus, but strong engineering leadership fundamentals matter more than domain expertise.

Key job responsibilities
You'll own three engineering teams end-to-end: their roadmaps, their operational health, and their people. Specifically:
- Set technical direction for the platform in partnership with Principal Engineers and product managers. You decide what to build, what to defer, and what to kill.
- Hire, develop, and retain three SDMs and their engineering teams. You understand the systems deeply enough to challenge technical decisions and are actively applying generative AI in all of your daily actions.
- Own service availability and operational excellence for services processing 175M+ automated actions per week across all AWS regions. When something breaks at 2am, your oncall teams handle it, because you built the mechanisms (runbooks, alarms, escalation paths) that make that possible.
- Drive cross-team partnerships with AI agent teams, observability, and security services who depend on your platform as their execution layer.
- Represent your teams in business reviews with crisp, data-driven narratives. You'll review your teams' progress monthly with multiple directors, showing roadmap progress, operational health, and customer adoption metrics.
- Make prioritization trade-offs between new features, tech debt, security compliance, and operational burden reduction.

A day in the life
Mornings usually start with operational signals: checking deployment health, scanning overnight tickets, reviewing what your oncall teams handled. Mid-morning might be a roadmap review with your PM counterpart, followed by a 1:1 with one of your SDMs where you're coaching them through a hard prioritization call. Afternoons shift between design reviews (your teams are building safety-critical systems; the details matter), cross-team syncs with AI agent partner teams, and unblocking work. You'll context-switch between people problems and systems problems daily, and enjoy both.

About the team
AWS Systems Manager helps customers operate their infrastructure safely at scale, from a handful of servers to millions of managed nodes across AWS, on-premises, and multi-cloud environments. Our team builds the execution engine: the runbooks, the orchestration, and the safety mechanisms that let customers (and increasingly, AI agents) take action on their infrastructure with confidence. We're also pushing the boundary on how engineering teams themselves work, using AI agents in our own development workflows for operational reviews, code quality, incident investigation, and decision support. We build AI-powered products and we're practitioners of AI-assisted engineering.
- 10+ years of engineering experience
- 5+ years of engineering team management experience
- 10+ years of planning, designing, developing and delivering consumer software experience
- Experience partnering with product or program management teams
- Experience managing multiple concurrent programs, projects and development teams in an Agile environment
- Experience partnering with product and program management teams
- Experience designing and developing large scale, high-traffic applications

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits.

USA, WA, Seattle - 220,100.00 - 297,700.00 USD annually
Required skills
AWS · AI · ML · engineering leadership · safety-critical systems