Llama-3.1 Announcement

We are happy to announce that we have brought up support for Llama-3.1-70B inference on Tenstorrent’s 8-chip systems, the TT-QuietBox and the TT-LoudBox.


The source code for Llama-3.1-70B and our other supported models is on our GitHub. We have also merged support for Llama-3.1-8B, which runs on our single-chip n150 card.

Implementation highlights:

Fractured with 8-way tensor parallelism
Uses FlashAttention and FlashDecode
Uses mixed BF16, BFP8, and BFP4 precision
Performance was measured in eager mode with tracing disabled
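
To give a feel for the first highlight, here is a minimal sketch of what an 8-way tensor-parallel split of one linear layer looks like. It is an illustration in plain Python, not the code from our repository: the function names (`matvec`, `split_columns`, `tensor_parallel_matvec`) and the toy sizes are made up for this example, and it assumes a column-wise split, which is only one of the ways a layer can be fractured. Each of the 8 shards produces its own slice of the output, and the slices are concatenated at the end.

```python
# Illustrative sketch of 8-way tensor parallelism (column-parallel linear layer).
# All names are hypothetical; this is not the implementation in the repository.

NUM_DEVICES = 8  # matches the 8-way split named in the highlights


def matvec(x, w):
    """Multiply a row vector x (length k) by a matrix w (k rows, n columns)."""
    cols = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(cols)]


def split_columns(w, parts):
    """Split w column-wise into `parts` equal shards, one per device."""
    cols = len(w[0])
    assert cols % parts == 0, "column count must divide evenly across devices"
    step = cols // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]


def tensor_parallel_matvec(x, w, parts=NUM_DEVICES):
    """Each shard computes its slice of the output; slices are concatenated.

    Here the shards run one after another in a loop. On multi-chip hardware
    they would run concurrently, and the concatenation would typically be a
    gather over the chip-to-chip links (an assumption of this sketch).
    """
    out = []
    for shard in split_columns(w, parts):
        out.extend(matvec(x, shard))
    return out


# Toy sizes: input length 4, output length 16, so 2 output columns per device.
x = [1, 2, 3, 4]
w = [[(i + 1) * (j + 1) for j in range(16)] for i in range(4)]

# The sharded result matches the unsharded one exactly.
assert tensor_parallel_matvec(x, w) == matvec(x, w)
```

The point of the split is that each chip only has to hold and multiply one eighth of the layer's weights, which is what makes a 70B-parameter model fit across an 8-chip system.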

We are working on optimizations to reach our target of 20 tokens/second/user. Buy our 8-chip systems (TT-QuietBox and TT-LoudBox) to try Llama-3.1-70B at home on Tenstorrent hardware!
