Calculating GPT-2's Inference Speedups (www.njkumar.com)
2 points by njkumarr 2 hours ago | 1 comment

Good post, thank you!

> On an A100 80GB we get 312 teraflops per second of float16 compute and 1.5 TB/s of memory bandwidth, and this ratio comes out to roughly 208 tokens.

A few thoughts:

1. One token != one byte (rough sketch of the conversion below).

2. Your prompt ("Edgar Allan Poe is a") is short (<<300 tokens).

3. Both the FLOPs and memory-bandwidth figures for the A100 are theoretical peaks. Achieved numbers are usually well below them and workload-dependent.
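
To put a number on point 1: 312 TFLOPs/s divided by 1.5 TB/s is ~208 FLOPs per byte of memory traffic, and that only turns into a token count once you assume something about per-token FLOPs and per-parameter bytes. A rough sketch of the conversion (the 2 FLOPs/param/token and 2 bytes/param figures are my assumptions, not from the post):

    # Rough sketch of the unit conversion (assumptions are mine, not the post's):
    # fp16 weights take 2 bytes/param; a forward pass costs ~2 FLOPs/param/token.
    peak_flops = 312e12              # A100 fp16 tensor-core peak, FLOPs/s
    peak_bw = 1.5e12                 # bandwidth figure used in the post, bytes/s

    ratio = peak_flops / peak_bw     # ~208 FLOPs per byte of memory traffic

    flops_per_param_per_token = 2.0  # one multiply + one add per weight
    bytes_per_param = 2.0            # fp16

    # batch of tokens at which compute time catches up with weight-loading time
    break_even_tokens = ratio * bytes_per_param / flops_per_param_per_token
    print(f"{ratio:.0f} FLOPs/byte, ~{break_even_tokens:.0f} tokens to break even")

Under those fp16 assumptions the two numbers happen to coincide at ~208, but that's the per-token conversion doing the work, not the ratio itself being a token count.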




