Senior Member of Technical Staff: ML Systems and Infrastructure

DevRev
Bangalore, India · Onsite · Full-time · Senior Level · 5+ yrs exp

About DevRev

At DevRev, we're building the future of work with Computer – your AI teammate. Unlike traditional tools, Computer unifies all your data sources, tools, and workflows into a single AI-ready platform, giving employees real-time insights, proactive suggestions, and powerful agentic actions.

It extends your existing software with AI-native apps and agents that work alongside your teams and customers – updating workflows, coordinating across teams, and eliminating repetitive work. We call this Team Intelligence: human-AI collaboration that breaks down silos, brings people back together, and frees you to solve bigger problems.

Backed by Khosla Ventures and Mayfield with $150M+ raised, DevRev is trusted by global companies across industries.

What You’ll Do

  • Architect the Future of AI Infrastructure: You will design, build, and own the end-to-end platform that supports the entire lifecycle of our ML models, from massive-scale distributed training to ultra-low-latency, highly available inference.
  • Optimize and Serve Cutting-Edge Models: You'll implement and scale sophisticated inference stacks for LLMs using frameworks like vLLM, TensorRT-LLM, and SGLang. You’ll solve complex challenges in throughput, latency, token streaming, and automated scaling to deliver a seamless user experience.
  • Empower AI Innovation: You will act as a strategic partner to our AI Research and Data Science teams. You’ll create a seamless developer experience that accelerates their ability to experiment, fine-tune, and deploy groundbreaking models with velocity and confidence.
  • Automate Everything: You'll develop robust CI/CD/CT (Continuous Training) pipelines using tools like Argo Workflows, ArgoCD, and GitHub Actions to automate model validation, deployment, and lifecycle management, ensuring our systems are both agile and rock-solid.

What We’re Looking For

  • Experience: 5+ years in infrastructure or software engineering, with at least 2 years laser-focused on MLOps or ML infrastructure for large-scale distributed systems.
  • Education: A Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • Kubernetes & Cloud Native Expertise: Deep, hands-on expertise with Kubernetes in production. You are fluent in the cloud-native ecosystem, including Helm, ArgoCD, and Argo Workflows.
  • GPU & Cloud Mastery: Experience optimizing platform performance and scalability, accounting for factors such as GPU resource utilization, data ingestion, model training, and deployment.
  • Modern LLM Serving Experience: Hands-on experience with modern LLM inference serving frameworks (e.g., vLLM, SGLang, Triton Inference Server, Ray Serve). You understand the unique challenges of serving generative models.
  • Strong Coder: Strong programming proficiency in Python or Go, with experience using ML frameworks such as PyTorch, JAX, or TensorFlow.
  • Observability Mindset: A passion for building observable and resilient systems using modern monitoring tools (e.g., Prometheus, Grafana, OpenTelemetry).

We Would Love to See

  • Deep performance optimization skills, including writing custom inference kernels in CUDA or Triton to accelerate model performance beyond what off-the-shelf frameworks provide.
  • Experience with model optimization techniques like quantization, distillation, and speculative decoding.
  • Exposure to training and serving multi-modal models (e.g., text-to-image, vision-language).
  • Knowledge of AI safety and evaluation frameworks for monitoring models for issues such as bias, toxicity, and hallucinations.

As part of our hiring process, shortlisted candidates will undergo a Background Verification (BGV). By applying, you consent to sharing personal information required for this process. Any offer made will be subject to successful completion of the BGV.

DevRev is an equal opportunity employer and does not discriminate on the basis of race, gender, sexual orientation, gender identity/expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition, or any other basis protected by law.

Required skills

Kubernetes, Helm, ArgoCD, Argo Workflows, Python, Go, PyTorch, JAX, TensorFlow, vLLM, SGLang, Triton Inference Server, Ray Serve, Prometheus, Grafana
Posted on JobRush — the end-to-end AI job-search platform.