Senior Deep Learning Research Engineer – Multimedia GPU team
NVIDIA is a global leader in AI, high-performance computing, and visualization, with GPU technology powering everything from modern computers to robots, autonomous systems, and the metaverse. As the pioneer of AI computing, NVIDIA is shaping the future of next-generation multimedia, generative AI, and spatial computing.
We are looking for a visionary Senior Deep Learning Research Engineer to join our team. In this role, you will bridge the gap between deep learning research and production, pioneering the next generation of audio foundation models. If you are passionate about solving highly complex, multi-modal AI problems that will redefine industry standards for entertainment, communication, and digital twins, come join us.
What you’ll be doing:
Pioneer Generative AI Audio Architecture: Lead the research, design, and development of state-of-the-art deep learning models that solve complex problems in the audio and speech domains.
Drive Future Technology Roadmaps: Define the technical vision and R&D strategy for NVIDIA’s future audio, speech, and multimedia deep learning algorithms, ensuring alignment with hardware advancements.
Scale Massive Foundation Models: Train and optimize large-scale generative models (Diffusion, Transformers, Autoregressive models) using distributed training across massive GPU clusters.
Audio-Speech-Visual Fusion: Develop advanced deep learning algorithms for speech transformation, spatial audio, audio enhancement, and real-time video/audio enhancement.
Cross-Functional Leadership: Collaborate closely with NVIDIA Research, hardware architecture teams, and product groups (such as NeMo, AI4Media, and Broadcast) to productize breakthrough technologies.
Productize and Deploy Models on the Edge: Optimize inference models and productize them as SDKs and microservices on NVIDIA GPUs and NVIDIA RTX Spark platforms.
Technical Mentorship: Mentor senior engineers and scientists, foster a culture of technical excellence, and maintain high standards via code and design reviews.
What we need to see:
PhD in Computer Science, Artificial Intelligence, Applied Mathematics, or a related quantitative field.
10+ years of industry or post-doc experience directly developing advanced deep learning models for audio, image, and video processing.
Deep Mastery of Generative AI: Proven track record of working with Diffusion models, GANs, Transformers, VAEs, Neural Radiance Fields (NeRFs) / 3D Gaussian Splatting, GRUs, etc.
Strong Programming & Software Architecture: Elite Python coding skills with a solid foundation in production-grade software design, scalability, and data structures.
Framework Proficiency: Expert-level hands-on experience with PyTorch and deep familiarity with multi-modal data processing pipelines (video decoding, audio DSP, spectrogram analysis).
Proven Impact: A strong portfolio of shipped high-impact commercial AI products or a stellar publication record at top-tier AI conferences (CVPR, ICCV, SIGGRAPH, NeurIPS, ICASSP, Interspeech).
Ways to stand out from the crowd:
Deep understanding of generative audio/speech architectures and algorithms.
Experience with large-scale distributed training frameworks (e.g., Megatron-LM, DeepSpeed, PyTorch FSDP) on cluster architectures.
High proficiency in C++ and low-level GPU optimization tools like CUDA, cuDNN, Triton, or TensorRT.
Experience in building World Models or physics-informed neural networks for video synthesis is a plus.
NVIDIA is committed to fostering a diverse work environment and is proud to be an equal opportunity employer. We do not discriminate based on race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.