Software Development Engineer II - Amazon MSK, Managed Streaming Kafka (MSK), MSK Infrastructure Management

Amazon Development Center U.S., Inc. · 3d ago
United States · Onsite · Full-time · Mid Level · 3+ yrs exp
H-1B verified · 2310 LCAs

Top focus

Software Engineer · Infrastructure Engineer · Software Engineer II · ML Infra Engineer · Senior Software Engineer

Come keep one of the world's largest Apache Kafka fleets healthy, secure, and always on. On Amazon MSK, you will build the automation that maintains hundreds of thousands of streaming hosts, so customers never have to think about the infrastructure underneath their applications.

This is infrastructure engineering at a scale where every change has to be safe by design. MSK is stateful: every host holds customer data that must stay replicated and in sync, so routine maintenance is never as simple as rebooting or replacing a host. Patching, repairing, and replacing nodes across a fleet this size means coordinating each action against the availability guarantees customers depend on, so that maintenance stays invisible to them. You will design and build the automation that does this, turning hard operational problems into systems that run themselves.

If you enjoy distributed systems, large-scale automation, and work whose reliability countless streaming applications quietly depend on, this is a rare place to do it.

Key job responsibilities

  • Design, build, and operate automation that patches and maintains hundreds of thousands of stateful hosts, keeping fleet maintenance invisible to customers.
  • Build systems that automatically detect unhealthy hosts and remediate them, balancing fast recovery against avoiding needless disruption.
  • Develop rollout and rollback mechanisms that keep the blast radius of any change small at fleet scale, and that let changes be tested before they reach customers and reversed if something goes wrong.
  • Own your services end to end: take part in on-call, debug production issues, and continually reduce the manual effort needed to operate the fleet.
  • Write design documents, collaborate with engineers across MSK, and raise the engineering bar through design and code reviews.

A day in the life

No two days look the same, but most blend building with operating. You might spend the morning designing a safer way to roll out a fleet-wide change, pairing with a teammate on the review, and writing the automation that carries it out. In the afternoon you might dig into a signal that a small set of hosts is unhealthy, trace it to root cause, and improve the automation so it handles that case on its own next time. You will write design documents, review your teammates' code, and steadily push routine operational work out of human hands and into systems that run themselves. When you are on call, you keep the fleet healthy and feed every manual step you take back into automation, so the same issue does not page anyone twice.

Our team keeps Amazon MSK's managed Kafka fleet healthy, secure, and available, so customers never have to think about the infrastructure beneath their streaming applications. People choose MSK because they want Apache Kafka without the burden of operating it. Our mission is to make that promise real at very large scale: keeping the fleet patched and secure, automatically recovering hosts that become unhealthy, and renewing the certificates that protect data in transit, all without interrupting the applications running on top.

We believe operations should be automated, not heroic. We invest in systems that detect and resolve problems on their own, so the team spends its time building durable automation instead of reacting to pages, and we are increasingly using generative AI to strengthen those mechanisms. You will build alongside teammates who jointly own these systems, and you will partner with the wider MSK engineering, product, and operations groups that depend on the fleet staying healthy as the service grows. Your work shapes the reliability every MSK customer feels, and you will have real ownership of meaningful problems from design through operation.

About the team

AWS values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying.

Why AWS?

Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating. That’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses.

Inclusive Team Culture

Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empowers us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon conferences, inspire us to never stop embracing our uniqueness.

Mentorship & Career Growth

We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional.

Work/Life Balance

We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud.

  • 3+ years of non-internship professional software development experience
  • 2+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience
  • Experience programming with at least one software programming language
  • 3+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience
  • Bachelor's degree in computer science or equivalent

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

The base salary range for this position is listed below. Your Amazon package will include sign-on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave. Learn more about our benefits at https://amazon.jobs/en/benefits .

USA, WA, Seattle - 143,700.00 - 194,400.00 USD annually

Required skills

Apache Kafka · automation · distributed systems · design patterns · reliability · scaling · programming
Posted on JobRush — the end-to-end AI job-search platform.