Senior Software Engineer, Applied AI
Top focus
We are looking for a Senior Software Engineer, Applied AI Systems, to build production AI/ML and agentic solutions. We need a hands-on senior engineer who can turn ambiguous technical problems into durable software and AI-enabled systems: agents, workflow services, APIs, data pipelines, tool integrations, evaluation and benchmarking harnesses, reference architectures, and operational tooling.
We work at the intersection of applied AI, agentic workflows, software engineering, distributed systems, performance engineering, accelerated computing, and data infrastructure. In this role, you will build AI systems as real software systems: write and review high-quality code, make architecture tradeoffs, benchmark behavior and performance, and own outcomes from prototype through validation, hardening, deployment, and ongoing support.
This is an opportunity to shape how production applied AI systems are built, measured, and reused inside NVIDIA! We partner across global teams and time zones for design reviews, planning, debugging, support for critical issues, and technical decision-making.
We need an engineer who turns complex requirements into clear technical plans, keeps the focus on reusable software capability rather than one-off delivery, and drives execution across teams.
What you will be doing:
Build and own production-grade applied AI systems for NVIDIA’s technical and solution development use cases, including agentic solutions where they materially improve the systems and software.
Design and build agentic workflows and the software around them: workflow services, APIs, retrieval, MCP/A2A-style tool integrations, agent harnesses, automation, telemetry, operational controls, and human oversight.
Design reliable services, APIs, workflow state, event-driven execution, and observability using systems such as Kafka, ClickHouse, and OTel-style patterns.
Translate complex technical and operational requirements into clear system designs, plans, interfaces, measurable outcomes, and pragmatic technical decisions through design reviews, code reviews, and clear communication.
Develop production software in Python and other relevant languages, with strong testing, observability, CI/CD, documentation, and operational practices.
Build performance and benchmarking workflows for existing production solutions or products, including validation harnesses, regression tests, tracing, metrics, and failure analysis, covering latency, throughput, reliability, resource usage, and AI/inference behavior where relevant.
Improve standard solution patterns alongside larger applied AI systems, working with NVIDIA engineering and solution teams to codify repeated patterns, product gaps, and field lessons into APIs, services, reference architectures, playbooks, test harnesses, and shared engineering building blocks.
Debug and support production solutions across software, infrastructure, AI models, data pipelines, inference services, and GPU-accelerated environments, turning recurring support patterns into product or platform improvements.
What we need to see:
BS, MS, or PhD in Computer Science, Engineering, AI/ML, or equivalent experience, with 5+ years of professional software engineering experience owning production systems or meaningful platform components.
Hands-on experience with LLMs, generative AI, RAG, agentic AI, MCP, or related AI technologies beyond simple prompting or notebooks, including tool use, retrieval, evaluation, guardrails, orchestration, or human-in-the-loop control.
Strong Python engineering skills and practical experience with at least one additional production programming language such as C++, Go, Rust, or TypeScript.
Demonstrated ability to design and build distributed systems, backend services, data pipelines, workflow orchestration, APIs, or developer platforms in production environments, using technologies such as Kafka, ClickHouse, PostgreSQL, Redis, object storage, Kubernetes, or similar.
Strong system design and operational judgment, including reliability, latency, cost, security, privacy, scalability, debuggability, maintainability, performance analysis, benchmarking, profiling, or capacity evaluation.
Excellent debugging and problem-solving skills across software, infrastructure, AI systems, and performance bottlenecks.
Proven ownership of ambiguous, cross-team engineering work, with the ability to collaborate with distributed teams spanning US Pacific, EMEA, and APAC time zones.
Required
- Strong written and verbal communication skills in English.
Ways to stand out from the crowd:
- Experience building real-world AI implementations, agent tools, MCP-compatible modules, A2A-style bridges, agent frameworks, evaluation frameworks, or RAG systems used by real users.
- Familiarity with NVIDIA GPU and AI software technologies such as NVIDIA NIM, NeMo Agent Toolkit, and CUDA, and with agentic AI development frameworks.
- Open-source contributions, technical papers, patents, conference talks, engineering blogs, or major internal engineering artifacts.
We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, sex, gender, gender expression, sexual orientation, age, marital status, veteran status, or disability status. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.
Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family: www.nvidiabenefits.com/