Staff AI Inference and Acceleration Engineer

Figure AI | 5h ago
United States | Onsite | $180K–$275K | Full-time | Staff Level | 8+ yrs exp

Figure is an AI robotics company developing autonomous general-purpose humanoid robots. The company's goal is to ship humanoid robots with human-level intelligence. Its robots are engineered to perform a variety of tasks in the home and commercial markets.

Figure is headquartered in San Jose, CA. We are looking for a Staff AI Inference & Acceleration Engineer to join the Platform Software team and own the on-board inference architecture for Figure’s humanoid robots. You will be the technical authority on how AI workloads are mapped, optimized, and executed across the robot’s compute hardware, driving down power consumption and cost while meeting the strict latency and reliability demands of a real-time autonomous system.

Responsibilities

  • Own the on-board inference architecture — mapping models to available accelerators (NPU, GPU, DSP, CPU) based on latency, power, and memory budgets.
  • Partition inference workloads across heterogeneous compute resources, balancing real-time performance with power and thermal constraints.
  • Define and maintain a system-level compute budget across all inference tasks running on the robot.
  • Evaluate next-generation acceleration hardware and contribute to the definition of future compute platform requirements.
  • Optimize inference toolchains end-to-end — from model export through runtime execution — for target hardware.
  • Apply quantization (INT8, INT4, mixed-precision), pruning, operator fusion, and other compression techniques to reduce compute, memory, and power footprint.
  • Profile inference pipelines to identify and eliminate bottlenecks in latency, memory bandwidth, and power consumption.
  • Optimize kernel scheduling, memory layout, and data movement across the compute hierarchy.
  • Partner closely with the AI/ML team to define model architecture constraints that are hardware-friendly from the outset.
  • Work with the Platform Software team on runtime integration, scheduling, and power management.
  • Engage with silicon vendors and research teams to track the accelerator landscape and influence hardware roadmaps.

Requirements

  • M.S. or Ph.D. in Computer Engineering, Electrical Engineering, Computer Science, or a related field — or equivalent industry experience.
  • At least 8 years of industry experience in hardware acceleration, ML systems, or compute architecture.
  • Deep understanding of AI/ML inference — model formats (ONNX, TFLite, etc.), inference runtimes, and deployment pipelines.
  • Hands-on experience optimizing models for edge or embedded hardware using quantization, pruning, and operator-level tuning.
  • Strong understanding of computer architecture — memory hierarchies, data movement, and heterogeneous compute.
  • Experience profiling and benchmarking inference workloads across CPU, GPU, NPU, and DSP.
  • Familiarity with low-level toolchains and compilation frameworks (e.g. TVM, MLIR, TensorRT, Torch, SNPE/QNN, JAX, CUDA, ROCm).
  • Solid software engineering skills in C++ and Python.
  • Strong cross-functional communication skills — able to work effectively across hardware, software, and AI/ML teams.

Bonus Qualifications

  • Knowledge of real-time operating constraints and their impact on inference scheduling.
  • Track record of co-designing model architectures with ML teams to meet hardware constraints.

The US base salary range for this full-time position is $180,000 to $275,000 annually. The pay offered for this position may vary based on several individual factors, including job-related knowledge, skills, and experience. The total compensation package may also include additional components/benefits depending on the specific role. This information will be shared if an employment offer is extended.

Required skills

AI, ML, hardware acceleration, compute architecture, model optimization, quantization, pruning, C++, Python, ONNX, TFLite, TVM, MLIR, TensorRT, Torch
Posted on JobRush — the end-to-end AI job-search platform.