Staff Software Engineer - Forecast Engine

ServiceNow · 4h ago
United States · Onsite · Full-time · Staff Level · 7+ yrs exp
H-1B verified · 143 LCAs

Top focus

Software Engineer · Staff Engineer · Software Engineer II

It all started when engineer Fred Luddy wrote code that automated a tedious task for his coworker, Phyllis. She cried tears of joy. That moment inspired Fred to build a company that could do that for everyone—freeing people from busywork so they could focus on meaningful work.

Today, ServiceNow is the AI control tower for business reinvention. Our ServiceNow AI platform brings together any AI, any data, and any workflow, helping 85% of the Fortune 500® work smarter, faster, and better. We're building an AI-native culture where technology and talent are unstoppable together.

And we're just getting started. Join us to put AI to work for people.

Employees can work remotely

Job Description

Team

Join the Global Cloud Services organization's FinOps Tools team, which is building ServiceNow's next-generation analytics and financial governance platform.

Our team owns the full modern data stack: Trino for distributed queries, dbt for transformations, Iceberg for lakehouse architecture, Lightdash for business intelligence, and Argo Workflows for orchestration. You will own the Forecast Engine, the system that turns ServiceNow's cloud capacity and cost actuals into forward-looking forecasts, then automatically tracks those forecasts against plan and budget and alerts the right people when reality diverges.

The Forecast Engine also feeds directly into our Future Capacity Reservation (FCR) automation: its forecast of fleet growth and workload migration timing is the signal that drives how much hyperscaler capacity to reserve, in which providers and regions, and when, against the lead-time windows FinOps and Cloud Operations plan around.

Role

The Forecast Engine is the simulation and automation core behind FinOps capacity and cost planning. It reads forecasting actuals from the lakehouse and runs a deterministic multi-period simulation of fleet growth, workload migration, placement, and sizing.

It validates each result against hard invariants and publishes forecasts that data scientists, analysts, and FinOps engineers consume in Lightdash. Today it is a fast, single-binary Rust core with a streaming Trino read and an Iceberg publish path.

The next chapter is to turn that engine into an automated, always-on forecasting service. As our Staff Software Engineer for the Forecast Engine, you will design and build the automation layer around the engine: scheduled forecast runs, variance and budget tracking against plan, anomaly and threshold alerting, first-class integration with planning systems, Splunk, and the broader observability stack, and the handoff that turns forecasts into Future Capacity Reservation (FCR) recommendations.

You will make the forecast a living signal: recomputed on a cadence, reconciled against actuals, and translated into the capacity reservations that keep hyperscaler supply ahead of demand. This role demands speed and high velocity. You will take a proven simulation core and rapidly make it a dependable, observable, self-monitoring product that the organization plans against, shipping working increments fast and iterating in tight loops.

The automation layer around the engine is greenfield: you will build it from the ground up. We operate like a small startup, and that is the operating mode of both the role and the department: we move quickly, deliver early, keep process light, and keep momentum.

What You'll Do

Core Responsibilities

  • Design and develop scalable, maintainable, and reusable software components with a strong emphasis on performance, determinism, and reliability.
  • Collaborate with product managers and FinOps partners to translate planning and budgeting requirements into well-architected solutions, owning features from design through delivery.
  • Build intuitive and extensible interfaces for forecast consumption (Lightdash models, alert payloads, and APIs), ensuring flexibility for finance and capacity-planning use cases.
  • Contribute to the design and implementation of new Forecast Engine capabilities while enhancing existing simulation, validation, and publish paths.
  • Integrate automated testing into development workflows to ensure consistent quality across releases, including determinism (byte-identical output) and forecast-accuracy regression checks.
  • Participate in design and code reviews, ensuring best practices in performance, maintainability, and testability.
  • Develop comprehensive test strategies covering functional, regression, integration, and accuracy aspects (period-over-period identity, backtest grading against real actuals).
  • Foster a culture of continuous learning and improvement by sharing best practices in engineering and quality.
  • Promote a culture of engineering craftsmanship, knowledge-sharing, and thoughtful quality practices across the team.

Technical Leadership & Architecture

  • Own the architecture of the Forecast Engine and the automation layer around it: scheduled runs, variance/budget tracking, and alerting.
  • Lead technical decision-making on forecast cadence, reconciliation against actuals, alert routing, and the contract between the simulation core and downstream consumers.
  • Establish best practices for forecast automation: idempotent scheduled runs, deterministic reproducibility, fail-loud data contracts, and no silent fallbacks.
  • Define how forecast signals (variance, budget breach, capacity headroom, migration drift) are computed, thresholded, and surfaced.
  • Drive innovation in forecasting and planning automation, including the responsible use of AI/ML tooling to accelerate development and analysis.

Hands-On Development

  • Build the automation that runs the Forecast Engine on a schedule via Argo Workflows, with retries, alerting on failure, and run-to-run reproducibility.
  • Develop variance and budget tracking: reconcile each forecast against plan and against the latest actuals, compute deltas at the grains that matter (provider, region, pod, workload), and persist a queryable variance history.
  • Implement alerting that fires on budget breach, forecast drift, capacity thresholds, and pipeline health, routed to Splunk and the team's notification channels.
  • Integrate with planning systems so plan/budget targets flow into the engine and forecast outputs flow back out to the planning surface.
  • Drive the Future Capacity Reservation (FCR) handoff: translate the forecast of fleet growth and migration timing into reservation recommendations (how much capacity, in which providers/regions/pods, and by when), aligned to hyperscaler procurement lead-time windows and reconciled with Cloud Operations so the same capacity is never reserved twice.
  • Build and extend the Rust simulation core (period loop, growth, migration, routing, packing, sizing, validation) and its streaming Trino read and Iceberg publish paths.
  • Create and maintain the Lightdash forecast and variance marts (standard dbt models on the published tables) that finance and capacity partners consume.

Platform Foundation

  • Design the forecast data contract (the upstream view the engine reads) so data-quality problems halt loudly and are fixed at the source, never papered over downstream.
  • Implement scheduled, observable forecast runs with full run lineage: inputs, seed, config, output location, and metrics for every run.
  • Build observability and monitoring for the Forecast Engine: run success rates, forecast latency, memory ceilings, accuracy drift, and alert-delivery health, emitted to Splunk and the observability stack.
  • Establish an automation foundation that scales from a handful of scheduled scenarios to a broad, multi-scenario forecasting program.

Forecast Automation & Alerting

  • Create scheduled, parameterized forecast scenarios with opinionated structure: pinned config, deterministic seeds, validated inputs, and published outputs.
  • Build tooling for one-command scenario runs and for promoting a scenario from ad-hoc to scheduled with minimal manual intervention.
  • Establish guardrails: input data contracts, resource/memory ceilings, and loud halts that surface real problems instead of producing wrong-but-quiet numbers.
  • Collaborate closely with FinOps analysts and capacity planners to rapidly iterate on variance definitions, alert thresholds, and the signals that matter, without over-engineering.
  • Prioritize forecast reliability, accuracy tracking, and clear alerting over feature breadth.

AI-Augmented Development

  • Use modern AI development tools (e.g., Claude Code, Cursor, GitHub Copilot) to accelerate development and testing, and help the team adopt effective, well-validated AI-assisted practices.

Collaboration & Integration

  • Work autonomously with guidance from Engineering and FinOps leadership.
  • Collaborate with DevOps and platform teams on scheduling infrastructure, CI/CD pipelines, and Splunk/observability integration.
  • Partner with FinOps Tools team members working on Trino, dbt, Lightdash, and Iceberg to ensure seamless integrations.
  • Partner with finance and capacity-planning stakeholders to ensure forecasts, variance, and alerts map to how they actually plan and budget.
Required Experience

  • Experience in leveraging or critically thinking about how to integrate AI into work processes, decision-making, or problem-solving. This may include using AI-powered tools, automating workflows, analyzing AI-driven insights, or exploring AI's potential impact on the function or industry.
  • 8+ years of experience in software engineering with a Bachelor's degree, or 6 years with a Master's degree, or 3 years with a PhD, in Computer Science, Engineering, or a related technical field, or equivalent experience; with a track record of delivering high-quality products and deep expertise in backend systems and cloud-native, data-intensive architecture.
  • Strong skills in a systems or backend language (Rust, Go, Java, C++, or similar) and in Python for data tooling, automation, and analysis.
  • Proven track record building automated, scheduled data or forecasting pipelines that run reliably in production.
  • Demonstrated ability to deliver at high velocity: shipping production-quality software fast, in tight iteration loops, without sacrificing reliability.
  • Proven track record of greenfield development and building from scratch in environments with evolving requirements. We operate like a small startup, and this role thrives on that: short paths from idea to shipped, minimal process, and high ownership.
  • Hands-on experience building variance/anomaly detection, budget or SLA tracking, or alerting systems at scale.
  • Experience integrating with observability and logging platforms (Splunk, Datadog, Prometheus/Grafana, or similar).
  • Experience with workflow orchestration systems (Argo, Airflow, or similar) and with the modern data stack.
  • Strong knowledge of data structures, algorithms, object-oriented and data-oriented design, design patterns, and performance optimization.
  • Familiarity with automated testing frameworks and integrating tests into CI/CD pipelines.
  • Understanding of software quality principles including reliability, determinism, observability, and production readiness.
  • Ability to troubleshoot complex systems and optimize performance and memory across the stack.
  • Experience validating data correctness: reconciling pipeline outputs against ground-truth actuals and catching silent regressions.
  • Comfort with development tools such as IDEs, debuggers, profilers, source control, and Unix-based systems.
  • Full professional proficiency in English.

Technical Expertise

  • Forecasting & simulation: time-series or simulation-based forecasting, scenario modeling, and reconciliation of forecasts against actuals.
  • Variance & alerting: budget vs. actual tracking, anomaly/threshold detection, alert routing, and noise control (deduplication, suppression, severity).
  • Observability: Splunk (search, dashboards, alerts) and metrics/logging integration for pipeline and forecast health.
  • Orchestration: Argo Workflows or similar: scheduled runs, retries, idempotency, failure alerting.
  • Modern data stack: Trino, dbt, Iceberg, Lightdash, or similar lakehouse and BI technologies.
  • Systems engineering: streaming/bounded-memory data processing, deterministic and reproducible computation, and config-driven design (no hardcoded business constants).
  • Data contracts & quality: fail-loud ingestion, upstream contract views, and correctness invariants enforced in code.
  • API & integration design: RESTful services, authentication (OAuth/SAML), and webhook/notification integrations.

For positions in this location, we offer a base pay of $166,500 - $291,400, plus equity (when applicable), variable/incentive compensation and benefits. Sales positions generally offer a competitive On Target Earnings (OTE) incentive compensation structure. Please note that the base pay shown is a guideline, and individual total compensation will vary based on factors such as qualifications, skill level, competencies, and work location. We also offer health plans, including flexible spending accounts, a 401(k) Plan with company match, ESPP, matching donations, a flexible time away plan and family leave programs. Compensation is based on the geographic location in which the role is located and is subject to change based on work location.
Work Personas

We approach our distributed world of work with flexibility and trust. Work personas (flexible, remote, or required in office) are categories that are assigned to ServiceNow employees depending on the nature of their work and their assigned work location. Learn more here. To determine eligibility for a work persona, ServiceNow may confirm the distance between your primary residence and the closest ServiceNow office using a third-party service.

Equal Opportunity Employer

ServiceNow is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, national origin, age, disability, gender identity, veteran status, or any other category protected by law. In addition, all qualified applicants with arrest or conviction records will be considered for employment in accordance with legal requirements.

Accommodations

We strive to create an accessible and inclusive experience for all candidates. If you require a reasonable accommodation to complete any part of the application process, or are unable to use this online application and need an alternative method to apply, please contact globaltalentss@servicenow.com for assistance.

Export Control Regulations

For positions requiring access to controlled technology subject to export control regulations, including the U.S. Export Administration Regulations (EAR), ServiceNow may be required to obtain export control approval from government authorities for certain individuals. All employment is contingent upon ServiceNow obtaining any export license or other approval that may be required by relevant export control authorities.

From Fortune. ©2026 Fortune Media IP Limited. All rights reserved. Used under license.

Required skills

Python · Go · Rust · Java · dbt · Airflow · CI/CD · Prometheus · Grafana · Datadog