Senior Software Engineer, Performance Tooling and Infrastructure
Nuro
United States · Hybrid · Full-time · Senior Level · 5+ yrs exp
Visa-friendly
- Who We Are
- Nuro believes self-driving vehicles are the most immediate and profound opportunity for AI to drive positive change in the physical world. Safer streets, more time for what matters, and easier access to the world around us: that’s why we’re building a universal autonomy platform, self-driving for all roads and all rides.
- Founded in 2016, Nuro is a physical AI company developing Level 4 autonomous driving technology for a wide range of vehicles, use cases, and markets. Powered by the Nuro Driver™, our universal autonomy platform enables the global mobility ecosystem to deploy autonomy at scale, from robotaxis and logistics fleets to personal vehicles.
- With years of real-world deployment experience and a flexible, partner-led business model, Nuro is working toward a future where millions of autonomous vehicles powered by our technology help make everyday life safer, easier, and more connected.
- Nuro has raised over $2B in capital from Uber, NVIDIA, Google, SoftBank, Fidelity, T. Rowe Price, and other leading investors.
- About the Role
- Nuro leverages many different bench-top systems to evaluate and regression test different aspects of the software and hardware integration layer. This performance simulation platform includes systems
- At Nuro, every autonomy code change, from ML model updates to the radius of the map around the robot to the number of evaluated trajectories, must be validated for real-time performance on actual robot compute hardware before it reaches the road. You will own the infrastructure that makes this possible.
- Our Performance Simulation Platform is a hybrid benchmarking system: physical bench-top rigs running production robot compute (NVIDIA Thor platform), orchestrated by cloud-native infrastructure (Kubernetes, GCP), with automated data pipelines feeding performance metrics into BigQuery and Grafana, pre- and post-simulation magic, custom tracing and profiling tools, and much more.
- Engineers across the company rely on this platform daily to answer questions like:
- How will my new ML model affect contention on the GPU?
- How does a new data format impact onboard logging rate or network contention as more data flows through the system?
- How much memory should be allocated for this new module, and how does it fit into the overall system budget?
- You'll be responsible for the development, integration, and evolution of this platform, from the bare-metal OS and networking layer through job orchestration and CI/CD integration up to the data analysis and visualization layer. This is a high-ownership, high-autonomy role on a small team where your work directly gates the release velocity of the entire autonomy stack. You'll be the technical DRI for the platform: setting the roadmap, making architectural calls, representing the platform's needs to the leadership team, and ensuring the system scales through multiple hardware generations.
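As a minimal illustration of the memory-budget question above, the sketch below sizes a module's allocation from peak-memory samples collected on bench runs and checks the total against a fixed system budget. The nearest-rank percentile, the 20% headroom default, and all function and module names are hypothetical choices for illustration, not Nuro's actual tooling.

```python
"""Hypothetical sketch: size a module's memory allocation from bench-run
peak samples and check it against an overall system budget."""
import math


def percentile(samples, q):
    """Nearest-rank percentile (q in (0, 100]) of a non-empty list."""
    ordered = sorted(samples)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def proposed_allocation_mb(peak_samples_mb, q=99.0, headroom=0.2):
    """Allocate the q-th percentile of observed peaks plus fractional headroom."""
    return percentile(peak_samples_mb, q) * (1 + headroom)


def fits_budget(allocations_mb, total_budget_mb):
    """Return (fits, remaining_mb) for a dict mapping module -> allocation in MB."""
    used = sum(allocations_mb.values())
    return used <= total_budget_mb, total_budget_mb - used
```

For example, `fits_budget({"planner": proposed_allocation_mb(samples), "perception": 4000.0}, 8192.0)` (hypothetical module names and budget) reports whether the new module still fits and how much headroom the system has left.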
- About the Work
- Benchmarking Infrastructure: Develop and maintain the job orchestration layer that schedules, executes, and validates autonomy performance benchmarks across a fleet of physical bench-top systems, integrated into CI/CD pipelines as merge-blocking and release-blocking quality gates.
- Platform Reliability & Observability: Build monitoring, alerting, and self-healing automation for the bench fleet. Proactively identify systemic risks (capacity bottlenecks, hardware degradation patterns, infrastructure single points of failure) before they become outages. Track utilization, failure rates, and capacity trends to ensure the platform scales ahead of organizational demand.
- Performance Data Pipelines: Design and build end-to-end data pipelines that capture fine-grained performance metrics (CPU/GPU utilization, memory bandwidth, E2E latency, scheduling jitter) from bench-top runs, process them at scale, and surface actionable insights through dashboards and automated regression detection.
- Statistical Analysis & Experimentation: Work with Data Science to develop rigorous experimentation methodology for performance results from non-deterministic autonomy workloads, including variance analysis, significance testing, and regression detection.
- Bare-Metal & OS Platform: Guide the SRE team through the OS and system-level configuration of bench hardware, including Linux kernel tuning, boot infrastructure, networking, and hardware bring-up, ensuring the platform faithfully reproduces production robot compute behavior.
- Drive Platform & Allocation Strategy: Own the planning lifecycle for the benchmarking fleet across hardware generations. Partner with engineering and program leadership to negotiate hardware allocation, model utilization scenarios under real-world constraints, and present data-backed trade-off recommendations that balance testing coverage, user throughput, cost, and SLA commitments against finite physical resources.
- Cross-Functional Collaboration: Partner with Hardware Engineering, NPI (New Product Introduction), SRE (Site Reliability Engineering), Perception, Behavior, and Data Science teams to translate their performance analysis needs into robust, self-service infrastructure.
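One common way to implement the significance testing and regression detection described above is a permutation test, which makes no normality assumption about run-to-run noise in non-deterministic workloads. The sketch below is a hypothetical merge gate, assuming latency samples (in ms) from repeated baseline and candidate bench runs; it blocks only when a slowdown is both statistically significant and larger than a practical-effect threshold. The function names, the `alpha` and `min_effect` defaults, and the choice of mean latency are illustrative assumptions, not Nuro's method; a real gate might compare tail percentiles instead.

```python
import random
import statistics


def permutation_pvalue(baseline, candidate, n_resamples=10_000, seed=0):
    """One-sided permutation test for mean(candidate) > mean(baseline)."""
    rng = random.Random(seed)  # fixed seed: same inputs give the same verdict
    observed = statistics.fmean(candidate) - statistics.fmean(baseline)
    pooled = list(baseline) + list(candidate)
    n_base = len(baseline)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly relabel runs as baseline/candidate
        diff = statistics.fmean(pooled[n_base:]) - statistics.fmean(pooled[:n_base])
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)


def regression_gate(baseline_ms, candidate_ms, alpha=0.01, min_effect=0.02):
    """Return True (pass) unless the candidate is significantly slower
    AND its mean latency grew by more than `min_effect` (fraction of baseline)."""
    base_mean = statistics.fmean(baseline_ms)
    rel_change = (statistics.fmean(candidate_ms) - base_mean) / base_mean
    p_value = permutation_pvalue(baseline_ms, candidate_ms)
    return not (p_value < alpha and rel_change > min_effect)
```

Requiring both significance and a minimum effect size keeps tiny but consistent slowdowns from blocking merges while still catching real regressions; the fixed seed makes the gate's verdict reproducible for identical inputs.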
- About You
- Experience: 5+ years of industry software engineering experience.
- Software Engineering: Strong proficiency in Python and working proficiency in C++. You write clean, testable, well-documented code and care about long-term maintainability.
- Data Engineering: Experience building data pipelines, including ingestion, transformation, storage, and visualization. Familiarity with SQL and analytical workflows.
- Systems & Infrastructure: Deep comfort with Linux systems: you've configured kernels, debugged boot issues, written systemd units, and managed bare-metal infrastructure. You understand networking, storage, and compute at a level beyond "it just works."
- Technical Leadership: Experience setting technical vision and roadmap for a project or platform, driving alignment across multiple stakeholders. You've independently identified the cross-functional partners needed to unblock and deliver, and you've briefed senior engineering leadership on trade-offs and recommendations.
- AI-Native: You treat AI as a core part of your engineering workflow, not an occasional shortcut — you use agentic tooling (e.g., Claude Code) across the development lifecycle and you understand the boundaries of when AI output demands extra scrutiny versus when it accelerates you.
- Bias for Action: Comfortable operating in ambiguous, fast-moving environments where you need to balance long-term architecture with short-term delivery.
- Bonus Points:
- Experience with performance engineering, especially around tooling integration (perf, Perfetto, pprof, eBPF, NVIDIA Nsight Systems, NVIDIA CUPTI).
- Experience in robotics or AV, particularly with the NVIDIA DriveOS stack.
- At Nuro, your base pay is one part of your total compensation package. For this position, the reasonably expected base pay range is between $183,000 and $275,000 for the level at which this job has been scoped. Your base pay will depend on several factors, including your experience, qualifications, education, location, and skills. In the event that you are considered for a different level, a higher or lower pay range would apply. This position is also eligible for an annual performance bonus, equity, and a competitive benefits package.
- At Nuro, we celebrate differences and are committed to a diverse workplace that fosters inclusion and psychological safety for all employees. Nuro is proud to be an equal opportunity employer and expressly prohibits any form of workplace discrimination based on race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, veteran status, or any other legally protected characteristics.
Required skills
Python, C++, Kubernetes, GCP, BigQuery, Grafana