Projected Language Models: A Large Model Pre-Segmented Into Smaller Ones

This paper has been accepted at the Foundation Models in the Wild workshop at ICML 2024.
Large language models are versatile tools, but they are too costly for small inference budgets. Small models are cheaper at inference time, but their lower capacity means they perform well only when their scope is limited to a specialized domain. This paper explores how to obtain a small language model with good specialized accuracy, even when the specialization data is unknown during pretraining. We propose a novel architecture, projected networks (PN). A PN is a high-capacity network whose parameters can be linearly projected into a small network for fine-tuning. We assess the empirical effectiveness of our solution compared to small model training, distillation, and hard mixture of experts.
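To make the idea of linearly projecting a large parameter set into a small network concrete, the following is a minimal sketch under stated assumptions, not the paper's formulation. It assumes the large network stores `K` slices of one weight matrix and that a domain is described by a hypothetical vector of `K` projection coefficients; the names `project_parameters`, `small_forward`, `large_params`, and `domain_projection` are illustrative only.

```python
import random


def project_parameters(large_params, projection):
    """Linearly combine K large-parameter slices into one small weight matrix.

    large_params: list of K matrices, each rows x cols (assumed layout).
    projection:   list of K coefficients (hypothetical per-domain projection).
    """
    if len(projection) != len(large_params):
        raise ValueError("projection needs one coefficient per slice")
    rows, cols = len(large_params[0]), len(large_params[0][0])
    small = [[0.0] * cols for _ in range(rows)]
    for coeff, matrix in zip(projection, large_params):
        for i in range(rows):
            for j in range(cols):
                small[i][j] += coeff * matrix[i][j]
    return small


def small_forward(weights, x):
    """Apply the projected small layer to an input vector (no bias, no nonlinearity)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


rng = random.Random(0)
K, ROWS, COLS = 4, 2, 3

# Stand-in for pretrained large-network parameters: K slices of a ROWS x COLS matrix.
large_params = [
    [[rng.gauss(0.0, 1.0) for _ in range(COLS)] for _ in range(ROWS)]
    for _ in range(K)
]

# Hypothetical projection for one domain: average the first two slices.
domain_projection = [0.5, 0.5, 0.0, 0.0]

small_weights = project_parameters(large_params, domain_projection)
output = small_forward(small_weights, [1.0, 0.0, -1.0])
```

In this sketch the small layer has `ROWS * COLS` parameters regardless of `K`, so inference cost depends only on the projected network, while the large parameter set holds the extra capacity. The projected weights could then serve as the starting point for fine-tuning on the specialized data.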
