Size Isn't Everything - Why This Big Brain AI Needs to Go Back to School

Making Big Brain AI Smarter, Not Harder

Bigger isn't always better when it comes to training AI models.

Turns out cramming in billions of extra parameters without enough training data can leave them dumber, not smarter!

Let's chat about why sweat and study trump simple scale for language AIs.

AI Gets a Massive Brain - But Could It Use More Books?

Lately, AI language models have been packing insane numbers of parameters - like billions of "brain cells."

Gopher has 280 billion!

Megatron-Turing NLG is even bigger at 530 billion!

But are massive models overkill?

Turns out, for the same training compute, smaller brains fed more data do better on language tasks.

Should we send these hulking AIs back to school?

Cramming vs Consistent Study: Which AI Will Pass the Test?

Remember that student who crams the night before an exam vs the one who studies consistently?

More data over time trumps brute force cramming.

DeepMind tested this.

Their smaller 70-billion-parameter model Chinchilla, trained on more than four times as much data as Gopher, smoked those brainy behemoths on language benchmarks!

So maybe sheer size has just been compensating for a lack of learning - those giant models were undertrained.

With more study time, a "medium" brain could do just as well or better.
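If you like numbers, here's a rough back-of-the-envelope sketch of that idea in Python. It leans on the approximate "20 training tokens per parameter" rule of thumb popularized by the Chinchilla paper, and the parameter and token counts are ballpark figures from the published papers - so treat it as an illustration, not exact accounting.

```python
# Back-of-the-envelope check: did these models get enough "study time"?
# Rough rule of thumb from DeepMind's Chinchilla work: a compute-optimal
# model wants roughly 20 training tokens per parameter.
TOKENS_PER_PARAM = 20  # approximate rule of thumb, not an exact law

# Ballpark published figures: (parameters, training tokens actually seen).
models = {
    "Gopher (280B)":    (280e9, 300e9),   # ~300 billion tokens
    "MT-NLG (530B)":    (530e9, 270e9),   # ~270 billion tokens
    "Chinchilla (70B)": (70e9,  1.4e12),  # ~1.4 trillion tokens
}

for name, (params, tokens_seen) in models.items():
    tokens_wanted = TOKENS_PER_PARAM * params
    print(f"{name}: saw {tokens_seen / 1e12:.2f}T tokens, "
          f"'wants' about {tokens_wanted / 1e12:.2f}T "
          f"({tokens_seen / tokens_wanted:.0%} of a full study plan)")
```

Run it and Gopher and MT-NLG come out as the crammers - seeing only a few percent of the data the rule of thumb says they'd want - while Chinchilla lands right around 100%.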

The Tortoise Edges Past the Hare - Slow and Steady Training Wins the Race

DeepMind's research suggests smaller, diligently trained models squeeze more language understanding out of the same compute than massive-parameter AIs.

Google's 540-billion-parameter model PaLM did top Chinchilla in some areas - but mainly by throwing far more compute at it.

In the future, a tortoise-like model trained slowly but thoroughly may outpace hares that rely on raw size alone.

The Takeaway?

Sweat and Study Beat Sheer Scale.

At the end of the day, training data still trumps raw parameter count for advanced language tasks.

As AI progresses, striking the right balance between parameters and training data will be key.
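For the curious, here's one more hedged sketch of what that balancing act could look like. It leans on two common approximations from the scaling-law literature - training compute of roughly 6 × parameters × tokens, and roughly 20 tokens per parameter - and the compute budget plugged in below is just an illustrative round number, so take the output as a ballpark.

```python
import math

def compute_optimal_split(flops_budget, tokens_per_param=20):
    """Rough compute-optimal model and dataset size for a FLOP budget.

    Uses two common approximations: training compute C ~= 6 * N * D,
    and D ~= 20 * N, so C ~= 120 * N**2 and N ~= sqrt(C / 120).
    """
    params = math.sqrt(flops_budget / (6 * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens

# Example budget: very roughly the compute spent on Gopher (~5e23 FLOPs).
n, d = compute_optimal_split(5e23)
print(f"~{n / 1e9:.0f}B parameters trained on ~{d / 1e12:.1f}T tokens")
```

Plug in a Gopher-sized budget and out pops something very close to Chinchilla's actual recipe (70 billion parameters, about 1.4 trillion tokens) - which is pretty much the paper's whole point.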

Rather than just building the biggest brain possible, we need to ensure AI also gets the education it needs to reach its full potential.