The evolution of large language models, from OpenAI's GPT-3 and ChatGPT to Meta's LLaMA. Explore how the transformer architecture and the models built on it have revolutionized AI research and technological advancement.


Breakdown

Essential reading from Simon Willison's talk on Large Language Models, covering what they are, how they work, how to use them, and, very importantly, personal AI ethics – how should we feel about using AI, and how can we use it responsibly?

The talk provides a holistic overview of LLMs like GPT-3 and ChatGPT, tracing their development. It covers how LLMs work by predicting the next word, their training on vast datasets (including copyrighted material), emerging generative techniques, and code generation.
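The "predicting the next word" idea can be sketched with a toy frequency model. This is a minimal illustration of next-word prediction, not how a transformer actually computes probabilities; the tiny corpus and the `predict_next` helper are invented for the example.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the vast training data a real LLM sees.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows each word -- a bigram model, a drastically
# simplified stand-in for a transformer's learned distribution.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" follows "the" most often in the corpus
```

A real LLM does the same thing at a vastly larger scale: given the text so far, it outputs a probability for every possible next token and samples from that distribution, which is where its fluency comes from.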

Main Arguments:

  • LLMs like GPT-3 and ChatGPT predict next words with fluency by training on vast datasets

  • LLMs raise personal ethical dilemmas around blindly publishing AI-generated content vs. understanding the outputs

  • LLM training data includes copyrighted books, raising legal concerns shown by lawsuits against OpenAI and Meta

  • Openly available models like LLaMA and LLaMA 2 (released under Meta's community license) accelerate open innovation