Calculating GPT-2's Inference Speedups (www.njkumar.com)
2 points by njkumarr 2 hours ago | 1 comment

Good post, thank you!

> On an A100 80GB we get 312 teraflops per second of float16 compute and 1.5 TB/s of memory bandwidth, and this ratio comes out to roughly 208 tokens.

A few thoughts:

1. One token != one byte (rough sketch of the conversion below).

2. Your prompt ("Edgar Allan Poe is a") is short (<<300 tokens).

3. Both the FLOPs and memory-bandwidth figures for the A100 are theoretical peaks. Achieved numbers are usually well below them and workload-dependent.
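
To put a number on point 1: 312 TFLOPs/s divided by 1.5 TB/s is ~208 FLOPs per byte of memory traffic, and that only turns into a token count once you assume something about per-token FLOPs and per-parameter bytes. A rough sketch of the conversion (the 2 FLOPs/param/token and 2 bytes/param figures are my assumptions, not from the post):

    # Rough sketch of the unit conversion (assumptions are mine, not the post's):
    # fp16 weights take 2 bytes/param; a forward pass costs ~2 FLOPs/param/token.
    peak_flops = 312e12              # A100 fp16 tensor-core peak, FLOPs/s
    peak_bw = 1.5e12                 # bandwidth figure used in the post, bytes/s

    ratio = peak_flops / peak_bw     # ~208 FLOPs per byte of memory traffic

    flops_per_param_per_token = 2.0  # one multiply + one add per weight
    bytes_per_param = 2.0            # fp16

    # batch of tokens at which compute time catches up with weight-loading time
    break_even_tokens = ratio * bytes_per_param / flops_per_param_per_token
    print(f"{ratio:.0f} FLOPs/byte, ~{break_even_tokens:.0f} tokens to break even")

Under those fp16 assumptions the two numbers happen to coincide at ~208, but that's the per-token conversion doing the work, not the ratio itself being a token count.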




