Member of Technical Staff - XPU Architecture
Architect · Palo Alto · Posted 16 April 2026
Job Description
ABOUT ARCHITECT
Architect is a frontier AI lab for chip design. We build AI models and tools for on-demand custom ASICs at scale. Our goal is to co-design custom ASICs alongside evolving ML workloads and enable a new era of domain-specific chips that unlock capabilities impossible with current hardware paradigms. Born out of Stanford Research, our team blends AI with silicon, with a founding team from Anthropic, Google DeepMind, Meta SuperIntelligence, xAI, Apple, and Intel.
WHAT YOU'LL DO
As a Founding Member of Technical Staff on the Architecture team at Architect, you'll own the microarchitecture of our AI cores, discovered and explored using AI. You'll drive HW/SW co-design with the compiler and systems teams and carry the architecture from spec through RTL handoff to silicon bring-up.
- Define and own the microarchitecture of the AI core — PE/MAC arrays, scratchpad memory hierarchy, on-chip SRAM banking and arbitration, and datapath design — targeting best-in-class performance-per-watt.
- Build and maintain cycle-accurate architectural models (C++/SystemC) to evaluate PPA trade-offs across compute density, memory bandwidth, numeric precision, and power before RTL commitment.
- Drive HW/SW co-design with compiler and systems teams by defining ISA-level abstractions, instruction scheduling constraints, and data movement patterns that map efficiently to target ML workloads (convolution, attention, elementwise).
- Deliver complete microarchitectural specifications with interface definitions, modular cut-lines, and architectural validation models for handoff to RTL and DV teams.
- Work with DV and DMA teams to define interface contracts (AXI/AXI-Stream), architectural checkpoints, and validation-ready reference models.
WHAT WE'D LIKE TO SEE
Qualifications & Skills:
- Degree: Bachelor's, Master's, or PhD in Electrical Engineering, Computer Engineering, or a closely related field.
- Tapeout Experience: 5+ years (10+ preferred) in advanced-node tapeouts at top chip companies or fast-moving silicon startups.
- Domain Background: Deep expertise in NPU and ML accelerator architecture, ideally with experience on Apple Neural Engine, Qualcomm Hexagon NPU, Google TPU, AMD XDNA, Samsung NPU, MediaTek APU, or accelerators at Groq, d-Matrix, Cerebras, MatX, or similar.
- Compute Datapaths: Hands-on experience designing systolic arrays, MAC units, vector/SIMD engines, or VLIW execution pipelines.
- Memory Hierarchy: Experience with on-chip SRAM banking, scratchpad management, data reuse strategies, and bandwidth balancing against compute throughput.
- Modeling: Strong architectural modeling skills in C++, SystemC, or equivalent. Proficiency in SystemVerilog and Python.
- End-to-End Ownership: Proven track record taking an architecture from specification through RTL handoff to silicon bring-up and validation.
Bonus:
- ISA design or programmable accelerator architecture experience.
- Understanding of model quantization, mixed-precision inference, and numeric format trade-offs (INT, BF16, NV, MX types).
- Advanced power-optimization techniques for edge silicon: clock gating, power gating, and voltage scaling.
- NoC design, on-chip interconnect fabrics, or AMBA protocols (AXI, AHB, APB).
- Hands-on FPGA prototyping for architecture validation (ideally Xilinx).
WHAT WE OFFER
- Competitive salary and meaningful equity stake
- Fast-paced startup with autonomy and visible impact
- Cutting-edge challenges at the intersection of AI and silicon design
Apply Now