Debug Validation Engineer — Multiple Levels
Graphcore•4h ago
United Kingdom•Onsite•Full-time
- About us
- At Graphcore, we’re building the future of AI compute.
- We’re a team of semiconductor, software and AI experts, with deep experience in creating the complete AI compute stack - from silicon and software to infrastructure at datacenter scale.
- As part of the SoftBank Group, backed by significant long-term investment, we are delivering key technology into the fast-growing SoftBank AI ecosystem.
- To meet the vast and exciting AI opportunity, Graphcore is expanding its teams around the world.
- We are bringing together the brightest minds to solve the toughest problems, in a place where everyone has the opportunity to make an impact on the company, our products and the future of artificial intelligence.
- Job Summary
- Reporting to the Senior Director of Post Silicon Validation, the Debug Validation Engineer will drive post-silicon debug and validation activities for next-generation AI compute silicon and systems. You will lead teams passionate about identifying, reproducing, analyzing, and resolving complex silicon, firmware, and system-level issues during bring-up, characterization, and product readiness. This role combines deep technical debugging expertise with strong cross-functional collaboration across multiple engineering fields.
- The Team
- The Post-Silicon Debug and Validation team manages bring-up, fault diagnosis, and validation of Graphcore silicon and systems. Our team participates throughout the entire product lifecycle, supporting initial silicon bring-up, subsystem validation, system integration, and production readiness tasks. We coordinate closely with hardware, firmware, software, and systems teams to examine complex failures, develop debug strategies, and advance validation infrastructure.
- Responsibilities and Duties
- Lead post-silicon debugging and validation efforts for AI compute silicon and platform technologies
- Contribute to debug and validation activities across multiple projects and milestones
- Analyze and address intricate silicon, firmware, software, and system-level problems during bring-up and validation
- Develop structured debug methodologies and failure analysis processes to improve issue resolution efficiency
- Work in close partnership with architecture, RTL, firmware, software, and systems engineering groups to determine root causes and carry out corrective measures
- Drive debug of CPU, memory, interconnect, and high-speed I/O subsystems under functional, stress, and workload conditions
- Develop and improve automated debug, regression, and validation infrastructure using Python and related technologies
- Analyze logs, traces, telemetry, and hardware data to isolate and characterize system failures and performance issues
- Support development of validation tests, debug tooling, and custom diagnostics to improve coverage and observability
- Define validation metrics, debug workflows, and reporting standards to ensure consistent and repeatable analysis
- Communicate technical risks, status, and recommendations clearly to engineering leadership and cross-functional collaborators
- Support silicon readiness reviews and contribute to product quality and release decisions
- Contribute to continuous improvement of debug methodologies, validation infrastructure, and engineering workflows
- Candidate Profile
- Essential:
- Strong experience in bare metal environments
- Strong understanding of SoC and platform architectures
- Expertise in debug infrastructure and post-silicon debug methodologies
- Strong programming skills in Python, C, or debug scripting languages such as CMM, or equivalent experience
- Highly motivated self-starter with a collaborative and team-oriented approach
- Ability to collaborate across teams and programming languages to uncover root causes of deep and complex issues
- Experience of the post-silicon validation process applied in digital ASIC environments
- Strong Linux and Python experience
- Outstanding communication skills and the ability to collaborate effectively to solve complex problems
- Excellent problem-solving, analytical, and diagnostic skills
- Deep knowledge of scan, DFT, JTAG, and trace infrastructure
- Strong debug skills including fault tree analysis, failure isolation, fishbone methodologies, and system-level debug techniques
- Capability to operate autonomously on technically intricate debug and validation tasks spanning hardware, firmware, and software areas
- Desirable:
- Understanding of DFT flows from insertion through post-silicon validation
- Experience developing tooling for parsing and analyzing debug data, including scan dump parsing
- Driver-level experience with one or more of the following technologies: PCIe; Ethernet; memory technologies including LPDDR, DDR, and HBM; peripheral interfaces such as I2C, I3C, and SPI
- Experience using CoreSight and similar debug infrastructure including CTI, ETx, DSTREAM, J-Link, Lauterbach, ATB, and STM, or equivalent experience
- Strong understanding of mixed-signal components like PLLs, high-speed PHYs, and IC control/communication protocols
- Experience with Arm CPU architectures, system IP, and associated debug tooling
- Experience with AMBA protocols
- Understanding of ML applications and associated workloads
- Experience in characterization, failure analysis, test development, statistical analysis, and customer support
- Benefits
- In addition to a competitive salary, Graphcore offers flexible working, a generous annual leave policy, private medical insurance and health cash plan, a dental plan, pension (matched up to 5%), life assurance and income protection. We have a generous parental leave policy and an employee assistance programme (which includes health, mental wellbeing, and bereavement support). We offer a range of healthy food and snacks at our central Bristol office and have our own barista bar!
- We welcome people of different backgrounds and experiences, and we’re committed to building an inclusive work environment that makes Graphcore a great home for everyone. We offer an equal opportunity process and understand that there are visible and invisible differences in all of us. We can provide a flexible approach to interview and encourage you to chat to us if you require any reasonable adjustments.
Required skills
Python, C, Linux, debug scripting, SoC architecture, post-silicon validation, debug infrastructure, fault tree analysis, failure isolation, system-level debug, PCIe, Ethernet, memory technologies, I2C, JTAG