On an A100 80GB we get 312 teraflops per second of float16 compute and 1.5 TB/s of memory bandwidth, and this ratio comes out to roughly 208 tokens.
A few thoughts:
1. One token != one byte
2. Your prompt ("Edgar Allan Poe is a") is short (well under 300 tokens)
3. Both the FLOPS and memory bandwidth figures for the A100 are theoretical maximums. Real-world throughput is usually well below them and depends on the workload.
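To make the units concrete, here is a rough sketch of the arithmetic behind the quoted figure. The hardware numbers are the ones from the quote; everything else (the function names, the 2 bytes per fp16 parameter, the roughly 2 FLOPs per parameter per token, and the example efficiency factors) is my own assumption for illustration, not something the article necessarily uses.

```python
# Peak numbers taken from the quoted sentence (datasheet maximums, not measured).
PEAK_FLOPS = 312e12       # FLOP/s, float16
PEAK_BANDWIDTH = 1.5e12   # bytes/s


def flops_per_byte(peak_flops: float, peak_bandwidth: float) -> float:
    """Hardware ratio: how many FLOPs fit in the time it takes to move one byte."""
    return peak_flops / peak_bandwidth


def break_even_tokens(
    peak_flops: float,
    peak_bandwidth: float,
    bytes_per_param: float = 2.0,            # assumption: fp16 weights
    flops_per_param_per_token: float = 2.0,  # assumption: one multiply-add per weight
) -> float:
    """Tokens per forward pass at which compute time equals weight-loading time.

    Weight-loading time = params * bytes_per_param / bandwidth
    Compute time        = tokens * params * flops_per_param_per_token / flops
    Setting them equal, the parameter count cancels.
    """
    ratio = flops_per_byte(peak_flops, peak_bandwidth)
    return ratio * bytes_per_param / flops_per_param_per_token


def derated(value: float, efficiency: float) -> float:
    """Scale a peak figure by an assumed achieved fraction (0 < efficiency <= 1)."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return value * efficiency


if __name__ == "__main__":
    print("FLOPs per byte:", flops_per_byte(PEAK_FLOPS, PEAK_BANDWIDTH))
    print("Break-even tokens (fp16):", break_even_tokens(PEAK_FLOPS, PEAK_BANDWIDTH))
    print(
        "Break-even tokens (1 byte/param):",
        break_even_tokens(PEAK_FLOPS, PEAK_BANDWIDTH, bytes_per_param=1.0),
    )
    # Hypothetical efficiencies, only to show how sensitive the figure is.
    print(
        "Break-even tokens (50% of peak FLOPS, 80% of peak bandwidth):",
        break_even_tokens(derated(PEAK_FLOPS, 0.5), derated(PEAK_BANDWIDTH, 0.8)),
    )
```

The raw ratio is in FLOPs per byte. It only reads as "tokens" when the bytes per parameter and the FLOPs per parameter per token happen to cancel, and it moves again once you plug in achieved rather than peak numbers.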