
Mr. Terrence TAN

Senior Director, Silicon Manufacturing and Packaging Engineering, Microsoft

Terrence Tan is a Senior Director and Manufacturing Technologist at Microsoft, based in Penang, Malaysia. He leads high‑volume manufacturing and system execution for advanced compute platforms, spanning silicon, packaging, system integration, and rack deployment. Terrence focuses on practical manufacturing strategies to improve quality, reliability, and total cost of ownership at scale, working closely with OSATs, system integrators, and cross‑functional engineering teams across the global semiconductor ecosystem.

Presentation Title

From Silicon to Rack: Building an End‑to‑End Debug and Health Framework for Hyperscale Systems

As semiconductor products evolve toward custom silicon, advanced packaging, and disaggregated system architectures, manufacturing quality problems increasingly emerge only at system and rack scale rather than at individual process steps. Failures driven by production workloads, power delivery, thermal interactions, and interconnect behavior are often missed by siloed debug approaches limited to wafer sort, package test, or standalone system validation. The result is extended debug cycles, noisy RMA flows, and slow reliability learning.

This paper presents a practical, end‑to‑end manufacturing health and debug framework spanning silicon, system, and rack levels, designed for high‑volume, scale‑up and scale‑out manufacturing environments. The approach combines deterministic manufacturing health indicators—such as silicon telemetry, test coverage, and system validation hooks—with structured data correlation across production test results, workload context, and rack‑level observability (power, thermal, and interconnect signals). Lightweight AI‑assisted analytics are applied to cluster failure signatures and prioritize debug paths, while preserving engineering interpretability and manufacturability.
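To make "clustering failure signatures and prioritizing debug paths" concrete, here is a minimal sketch under stated assumptions. It is not the framework's AI-assisted analytics. It swaps in the simplest possible technique: exact-match bucketing of failure records on a signature built from test, error, workload, and a coarse thermal bin. The record fields (`FailureRecord`, `inlet_temp_c`, and so on), the 5 °C bin width, and the ranking heuristic (clusters spanning more racks first, then larger clusters) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureRecord:
    # Hypothetical fields; real records would join production test results,
    # workload context, and rack-level telemetry.
    unit_id: str
    rack_id: str
    failing_test: str
    error_code: str
    workload: str
    inlet_temp_c: float


def signature(rec: FailureRecord) -> tuple:
    # Bucket inlet temperature into 5 degree C bins (assumed width) so that
    # failures under near-identical thermal conditions share a signature.
    temp_bin = int(rec.inlet_temp_c // 5) * 5
    return (rec.failing_test, rec.error_code, rec.workload, temp_bin)


def cluster_and_rank(records):
    # Group records by exact signature match.
    clusters = defaultdict(list)
    for rec in records:
        clusters[signature(rec)].append(rec)

    # Hypothetical priority: a signature seen on more distinct racks hints at a
    # systemic (silicon or design) issue rather than one bad unit; ties are
    # broken by cluster size.
    def priority(item):
        _sig, recs = item
        return (len({r.rack_id for r in recs}), len(recs))

    return sorted(clusters.items(), key=priority, reverse=True)


if __name__ == "__main__":
    records = [
        FailureRecord("u1", "r1", "mem_stress", "E42", "training", 31.0),
        FailureRecord("u2", "r2", "mem_stress", "E42", "training", 33.5),
        FailureRecord("u3", "r3", "pcie_link", "E07", "inference", 28.0),
    ]
    for sig, recs in cluster_and_rank(records):
        print(sig, [r.unit_id for r in recs])
```

Because the grouping key is explicit, every cluster stays directly readable by an engineer, which fits the stated goal of preserving interpretability. A learned clustering method would replace `signature` with a similarity measure over richer telemetry.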

Deployed in production, the framework improves RMA signal quality, shortens failure isolation cycle time, and enables earlier detection of latent silicon‑system marginalities. The paper concludes with lessons learned from scaling this approach across high‑volume manufacturing and its applicability to advanced SoCs, chiplet‑based designs, and AI‑optimized datacenter platforms.

Back to AI for Manufacturing Forum