Data Scientist, Network Fabric Engineering

Amazon Support Services Pty Ltd • 7h ago

AU, NSW, Sydney • Onsite • Full-time • Entry Level • 1+ yrs exp

Top focus

Data Scientist • VP Engineering • VP Data

AWS Networking operates one of the largest and most complex networks on the planet. The team you'd join is responsible for the availability of that network: measuring how it performs for customers, predicting where it is most likely to degrade, and reshaping how we operate it as the workload grows.

We are in the middle of a significant change in how network operations are run. Lessons from our recent work on automation, AI, and ML, including agentic systems that triage and mitigate incidents alongside engineers, are feeding into a broader rethink of where humans focus, where automation takes over, and how we measure whether either is working.

We are looking for a Data Scientist to join the team in Sydney as the data science partner behind that change. You will provide the evidence the team uses to make those decisions, and then measure whether they delivered the outcomes we expected. Concretely, that means you will work closely with software engineers, network engineers, and other scientists to build risk and reliability models against telemetry from millions of network devices, surface the patterns that drive customer-impact incidents, and turn that analysis into the dashboards and metrics our leaders use to set priorities. It also means designing the evaluations that tell us when a new piece of automation, including the agents we are rolling out to support engineers on the front line, is actually moving the needle on availability, and not just adding noise.

If you are an early-career scientist who wants to do data science where every analysis you do feeds a decision and every decision you help make is measured against its outcome, at a scale no academic lab or startup can match, this is the team for you.

Key job responsibilities
- Develop predictive risk and reliability models for network availability, using historical device failures, alarm telemetry, ticket data, and traffic signals to identify the devices, fabrics, and event types most likely to escalate
- Provide the evidence base behind program decisions: surface where availability is at risk, where automation is ready to expand, and where human engineering effort has the highest leverage
- Build operational analytics and dashboards (in Amazon QuickSight, Amazon CloudWatch, and Python) that our leaders use to track network health and the impact of the operational changes we are making
- Design and run experiments to evaluate the automation we are rolling out, including agentic systems that support engineers on incidents, comparing automated decisions against runbooks and human engineers, and measuring whether each rollout improved availability
- Improve the data quality and classification underlying our availability program, from event categorisation to root-cause attribution, so the metrics we report and the decisions we make rest on solid ground
- Contribute to event-driven scoring and monitoring pipelines (Python, SQL, AWS Lambda, Amazon S3, Amazon Athena) that keep the decide / measure / improve loop running continuously
- Translate findings into clear insights and recommendations for technical and business stakeholders

A day in the life
You might start the morning pulling together the evidence the team needs for an upcoming decision: which fabrics to prioritise this quarter, which event types our automation is reliable enough to take on next, or whether a class of devices is showing the early warning signs of a wider issue. You'd build a quick analysis in Python, pull data from internal sources via Amazon Athena, and bring back a recommendation grounded in the numbers. By mid-morning you're in a working session with a network engineer and the agent team, turning that recommendation into a plan. The decision gets made; your job from this point is to make sure we measure whether it worked.

In the afternoon you switch over to outcome measurement. You refine the evaluation pipeline that tracks how the automation has performed since last week's rollout, update the CloudWatch dashboard the launch team uses to gate the next expansion, and flag a regression you caught in mitigation accuracy on a specific event type. The work you ship today directly shapes both the next decision the team makes and how we know whether the last one delivered.

About the team
We sit inside AWS Networking with a strong Sydney presence and a remit that spans network availability, the data and analytics that support it, and the automation we are building to change how operations are done. You'd partner closely with the team in Sydney and the broader network engineering organisation across Seattle and Dublin. Small team, high autonomy, and a roadmap with real production impact rather than research demos.

Qualifications
- 1+ years of experience with data querying languages (e.g. SQL), scripting languages (e.g. Python), or statistical/mathematical software (e.g. R, SAS, Matlab)
- 2+ years of experience as a data/research scientist, statistician, or quantitative analyst at an internet-based company with complex, large-scale data sources
- Knowledge of statistical packages and business intelligence tools such as SPSS, SAS, S-PLUS, or R
- Knowledge of machine learning concepts and their application to reasoning and problem-solving
- Experience with clustered data processing (e.g., Hadoop, Spark, Map-reduce, or Hive)
- Experience working with or evaluating AI systems
- Experience applying quantitative analysis to solve business problems and making data-driven business decisions
- Master's degree or equivalent in Science, Technology, Engineering, or Mathematics (STEM)

Acknowledgement of country: In the spirit of reconciliation Amazon acknowledges the Traditional Custodians of country throughout Australia and their connections to land, sea and community. We pay our respect to their elders past and present and extend that respect to all Aboriginal and Torres Strait Islander peoples today.

IDE statement: Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status. Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

Required skills

Python • SQL • AWS Lambda • Amazon S3 • Amazon Athena • Amazon QuickSight • Amazon CloudWatch • Hadoop • Spark • Machine Learning