Might be worth explicitly stating popular/known datapoints you were unable to include or evaluate yet.
For example, Cerebras, the fastest inference provider by multiples as of this writing, is missing. It's popular (https://news.ycombinator.com/item?id=42178761), so I'm surprised it was missed, and it makes me wonder whether other providers are missing too.
I hope this is useful. It was created using Sonnet 3.5 + o1 + Cursor.
Let me know if you have any feedback! Thanks.
PS:
It's hard to compare providers' quality because they use different precision at inference. Also, some labs cherry-pick which benchmarks they report for their models. My medium-term goal is to run the evals myself.
See also a similar (commercial AFAIK) project: https://artificialanalysis.ai/