TT-Forge™

TT-Forge™ is Tenstorrent’s MLIR-based compiler, designed to work with ML frameworks ranging from domain-specific compilers to custom kernel generators. TT-Forge™ is natively integrated with Tenstorrent’s existing AI software ecosystem, so you can build with ease.

MLIR: A Unified Approach to AI Compilation

The TT-MLIR compiler serves as the bridge from high-level models and HPC workloads to execution on Tenstorrent hardware. It integrates seamlessly with open-source frameworks such as PyTorch, OpenXLA, and JAX, enabling a structured approach to compilation. As the MLIR ecosystem expands, so does the opportunity for future integrations.

Engineered for Innovation

Built with open source in mind, TT-Forge™ integrates with key technologies, including OpenXLA (JAX, Shardy), LLVM's MLIR and torch-mlir, ONNX, TVM, PyTorch, and TensorFlow. TT-Forge™ provides a flexible and extensible foundation for AI workloads, bridging high-level frameworks to low-level execution with open standards and community-driven development. For hardware execution, TT-Forge™ leverages Tenstorrent's AI software stack, optimizing workloads for Tenstorrent hardware.
Expanding Generality with Multi-Framework Frontend Support

tt-torch

tt-torch is an MLIR-native, open-source front end based on PyTorch 2.x and torch-mlir. It provides StableHLO (SHLO) graphs to tt-mlir.

It supports the ingestion of PyTorch models via PyTorch 2.x compile and of ONNX models via torch-mlir (ONNX → SHLO).

It also enables breaking down PyTorch graphs into individual operations, facilitating parallelized discovery of bugs and missing operations.
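
As an illustration, here is a minimal sketch of the PyTorch 2.x compile path, assuming tt-torch exposes a Dynamo backend importable as tt_torch.dynamo.backend (the import path and the toy model are assumptions, not a verified API):

    # Sketch of tt-torch ingestion via torch.compile.
    # Assumption: tt-torch exposes a Dynamo backend at this import path.
    import torch
    from tt_torch.dynamo.backend import backend  # assumed entry point

    class TinyModel(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.linear = torch.nn.Linear(32, 32)

        def forward(self, x):
            return torch.relu(self.linear(x))

    model = TinyModel()
    # torch.compile captures the graph with Dynamo; the backend lowers it
    # to StableHLO and hands the SHLO module to tt-mlir.
    compiled = torch.compile(model, backend=backend)
    out = compiled(torch.randn(4, 32))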

tt-forge-fe

TT-Forge-FE is a graph compiler that optimizes and transforms computational graphs for deep learning models, improving their performance and efficiency.

It supports the ingestion of PyTorch, ONNX, TensorFlow, and similar ML frameworks via TVM (tt-tvm).

Based on TVM IR, it enables breaking down graphs from different frameworks into individual operations, making the model bring-up effort data-driven.
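
As a rough sketch, a PyTorch model might be compiled through TT-Forge-FE as follows, assuming the front end is importable as the forge Python package with a forge.compile entry point driven by sample inputs (these names follow the project's public examples but are assumptions here):

    # Sketch of TT-Forge-FE ingestion.
    # Assumption: the package is importable as `forge` and compilation
    # is driven by sample inputs.
    import torch
    import forge  # assumed tt-forge-fe Python package

    model = torch.nn.Sequential(
        torch.nn.Linear(32, 64),
        torch.nn.ReLU(),
        torch.nn.Linear(64, 10),
    )
    sample_inputs = [torch.rand(1, 32)]
    # forge.compile traces the model via TVM (tt-tvm) and lowers the
    # resulting graph for Tenstorrent hardware.
    compiled_model = forge.compile(model, sample_inputs=sample_inputs)
    output = compiled_model(*sample_inputs)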

tt-xla

tt-xla leverages the PJRT interface to integrate JAX (and, in the future, other frameworks), tt-mlir, and Tenstorrent hardware.

It supports the ingestion of JAX models via jit compilation, providing a StableHLO (SHLO) graph to the tt-mlir compiler.

The tt-xla plugin is loaded natively in JAX to compile and run JAX models with the tt-mlir compiler and runtime.
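
A minimal sketch of the JAX flow, assuming the installed tt-xla plugin registers a "tt" PJRT platform (the platform name is an assumption; JAX discovers installed PJRT plugins automatically):

    # Sketch of running a jit-compiled JAX function through the tt-xla
    # PJRT plugin. Assumption: the plugin registers the platform as "tt".
    import jax
    import jax.numpy as jnp

    tt_device = jax.devices("tt")[0]  # assumed platform name

    @jax.jit  # jit tracing yields the StableHLO graph passed to tt-mlir
    def scaled_add(a, b):
        return 2.0 * a + b

    a = jax.device_put(jnp.ones((4, 4)), tt_device)
    b = jax.device_put(jnp.ones((4, 4)), tt_device)
    result = scaled_add(a, b)  # compiled and executed on the tt device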

Features

Performance
Optimized compilation and custom dialects (TTIR, TTNN, TTKernel) enable efficient execution, maximizing inference and training performance on Tenstorrent hardware. tt-explorer simplifies performance optimization.

Generality
TT-Forge™ supports multiple ML frameworks (PyTorch, JAX, TensorFlow, ONNX) and MLIR dialects, ensuring broad compatibility and flexibility across diverse AI workloads, with the ability to expand to future frameworks.

Tools
Tenstorrent's toolchain streamlines ML model compilation, optimization, and execution. From MLIR-based compilation to runtime inspection, these tools enable efficient development, debugging, and performance tuning on Tenstorrent hardware.

Want to learn more about TT-Forge?