At WWDC, Apple unveiled a transformative new feature powered by a Transformer language model that is set to enhance predictive text recommendations in upcoming iOS and macOS versions.


Breakdown

Jack Cook highlights this as Apple's foray into the more uncertain world of large language models (LLMs), a departure from their usual focus on polish and perfection. The feature raised questions about the underlying model, its architecture, and the training data used, with few details available, prompting further inquiry.

Key points:

  • Apple's adoption of a Transformer language model for predictive text recommendations in iOS and macOS reflects a strategic shift toward incorporating advanced language technologies.

  • The feature suggests completions for individual words as users type, with occasional multi-word suggestions, demonstrating an evolving integration of predictive text functionality (see the sketch after this list).

  • Uncertainty lingers around the specifics of the model's architecture, its training data sources, and the extent of Apple's integration of Transformer-based technology.
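To make the suggestion mechanism concrete, here is a minimal sketch of how a Transformer language model can surface single-word completions as a user types. It uses the openly available GPT-2 model from the Hugging Face transformers library purely as a stand-in; Apple's actual model, tokenizer, and on-device APIs are not public, so every name and parameter below is illustrative only.

```python
# Minimal sketch: an off-the-shelf GPT-2 model standing in for Apple's
# (undisclosed) on-device Transformer. Illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def suggest(prefix: str, max_new_tokens: int = 3) -> str:
    """Greedily extend the typed prefix and return the first suggested word."""
    inputs = tokenizer(prefix, return_tensors="pt")
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,                      # deterministic, highest-probability continuation
            pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
        )
    continuation = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:])
    # Keep only the first whole word of the continuation, mimicking the
    # single-word suggestions described above.
    return continuation.strip().split(" ")[0]

print(suggest("I'll see you at the"))  # output depends on the model's training data
```

A production keyboard would of course run a much smaller, quantized model on device and rank several candidate words rather than returning one greedy continuation, but the core loop, feeding the typed prefix to the model and decoding a short continuation, is the same idea.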

Highlights

Apple hasn't deployed many language models of their own, despite most of their competitors going all-in on large language models over the last couple years. I see this as a result of Apple generally priding themselves on polish and perfection, while language models are fairly unpolished and imperfect.