hazyresearch.stanford.edu

GPUs Go Brrr

An overview of recent advances in making AI compute more efficient, with a deep dive into ThunderKittens and related projects such as Based and FlashFFTConv.

Explore links in this article

www.semianalysis.com

GPT-4 Architecture, Infrastructure, Training Dataset, Costs, Vision, MoE

Demystifying GPT-4: The engineering tradeoffs that led OpenAI to their architecture.

https://semianalysis.com

www.instagram.com

AI uses an awful lot of compute.

https://instagram.com

www.semianalysis.com

Google Gemini Eats The World – Gemini Smashes GPT-4 By 5X, The GPU-Poors

Compute Resources That Make Everyone Look GPU-Poor

https://semianalysis.com

arxiv.org

AI uses an awful lot of compute.

https://arxiv.org

www.together.ai

BASED: Simple linear attention language models balance the recall-throughput tradeoff

Making AI use less compute with Based.

https://together.ai

hazyresearch.stanford.edu

Monarchs and Butterflies: Towards Sub-Quadratic Scaling in Model Dimension

A summary of our work building towards sub-quadratic models.

https://stanford.edu

hazyresearch.stanford.edu

H3: Language Modeling with State Space Models and (Almost) No Attention

Replacing attention with SSMs in language modeling.

https://stanford.edu

hazyresearch.stanford.edu

The Safari of Deep Signal Processing: Hyena and Beyond

Hyena is a large language model that uses long convolutions and gating to reach attention quality with lower time complexity.

https://stanford.edu

hazyresearch.stanford.edu

Simplifying S4

Explaining S4 from the first principles of signal processing.

https://stanford.edu

hazyresearch.stanford.edu

FlashAttention: Fast Transformer Training with Long Sequences

https://stanford.edu

Breakdown

This article from Hazy Research highlights how much compute AI systems require and discusses recent efforts to reduce that demand while improving efficiency.

Key points

NVIDIA's H100 GPU has immense compute power, but reaching its full performance requires carefully managing hardware resources and constraints such as tensor cores, shared memory, address generation, and occupancy.
The article introduces ThunderKittens, an embedded DSL developed by Hazy Research, as a tool for writing high-speed AI kernels more quickly.
The authors argue for reorienting AI models and system design around the constraints and capabilities of modern accelerator hardware.
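
To make the occupancy point above concrete, here is a minimal Python sketch of how per-block register and shared-memory use can cap the number of warps resident on one streaming multiprocessor (SM). It is not taken from the article or from ThunderKittens. The names `SMLimits` and `estimate_occupancy` are hypothetical, the default limits are assumed, illustrative values for an H100-class SM, and the model ignores hardware allocation granularity, so treat the result as a rough estimate only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-SM resource limits (assumed, illustrative defaults; check vendor docs)."""
    max_warps: int = 64                 # resident warps per SM
    max_blocks: int = 32                # resident thread blocks per SM
    registers: int = 65536              # 32-bit registers per SM
    shared_mem_bytes: int = 227 * 1024  # shared memory usable by blocks
    warp_size: int = 32


def estimate_occupancy(threads_per_block: int,
                       regs_per_thread: int,
                       smem_per_block: int,
                       limits: SMLimits = SMLimits()) -> float:
    """Return the fraction of the SM's warp slots that would be occupied."""
    # Round the block size up to a whole number of warps.
    warps_per_block = -(-threads_per_block // limits.warp_size)

    # Each resource independently bounds how many blocks fit on the SM.
    by_warps = limits.max_warps // warps_per_block
    regs_per_block = regs_per_thread * warps_per_block * limits.warp_size
    by_regs = limits.registers // regs_per_block if regs_per_block else limits.max_blocks
    by_smem = limits.shared_mem_bytes // smem_per_block if smem_per_block else limits.max_blocks

    # The tightest bound determines the number of resident blocks.
    blocks = min(limits.max_blocks, by_warps, by_regs, by_smem)
    return blocks * warps_per_block / limits.max_warps


if __name__ == "__main__":
    # A light kernel fills every warp slot; a register-heavy one does not.
    print(estimate_occupancy(256, regs_per_thread=32, smem_per_block=0))
    print(estimate_occupancy(256, regs_per_thread=128, smem_per_block=0))
```

Under these assumed limits, raising register use per thread or shared memory per block reduces how many blocks fit on an SM at once, which is one reason kernels have to balance these resources rather than maximize any single one.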

