Artificial intelligence: Performance on knowledge tests vs. training computation

Performance on knowledge tests is measured with the MMLU benchmark, here with 5-shot learning, which gauges a model’s accuracy after it has been shown only five examples for each task. Training computation is measured in total petaFLOP; one petaFLOP is 10¹⁵ floating-point operations.
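To make the two axes concrete, here is a minimal Python sketch of the arithmetic behind them. It assumes accuracy is simply the fraction of questions answered correctly; the function names (`flop_to_petaflop`, `accuracy`) and all sample values are hypothetical and are not taken from the chart's data or tooling.

```python
# One petaFLOP is 10^15 floating-point operations (the unit named in the text).
FLOP_PER_PETAFLOP = 10**15


def flop_to_petaflop(total_flop):
    """Convert a raw count of floating-point operations to petaFLOP."""
    return total_flop / FLOP_PER_PETAFLOP


def accuracy(predictions, answers):
    """Fraction of questions answered correctly (assumed scoring rule)."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


if __name__ == "__main__":
    # Made-up illustration values, not real model results.
    print(flop_to_petaflop(2 * 10**15))
    print(accuracy(["A", "B", "C", "D"], ["A", "B", "D", "D"]))
```

In the 5-shot setting, each question's prompt would additionally be preceded by five worked examples from the same task; that prompt construction is left out of this sketch.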
