I much prefer Claude over ChatGPT, based on my experience using both extensively. Claude understands me significantly better and seems to "know" my intentions with much greater ease. For example, when I request the full file, it provides it without any issues or unnecessary reiterations (ChatGPT fails after me repeatedly instructing it to), often confirming my request with a brief summary beforehand, but nothing more. Additionally, Claude frequently asks clarifying questions to better understand my goals, something I have noticed ChatGPT never did. I have found it quite amazing that it does that.
So... as long as this money helps them improve their LLM even more, I am all up for it.
My main issue is quickly being rate-limited in relatively long chats, making me wait 4 hours despite having a subscription for Pro. Recently I have noticed some other related issues, too. More money could help with these issues, too.
To the developers: keep up the excellent work and may you continue striving for improvement. I feel like ChatGPT is worse now than it was half a year ago, I hope this will not happen to Claude.
I'm not sure which part in the chain is responsible, but the Kagi Assistant got extremely testy with me when (a) I was using Claude for its engine (hold that thought) and (b) I asked the Assistant how much it changed its approach when I changed to ChatGPT, etc. (Kagi Assistant can access different models, but I have no idea how it works.) The Assistant insisted, indignantly, that it was completely separate from Claude. It refused to describe how it used the various engines.
I politely explained that the Assistant interface allowed selecting from these engines and it became apologetic and said it couldn't give me more information but understood why I was asking.
Peculiar, but, when using Claude, entirely convincing.
In other words it has no idea that you changed models. There's no meta data telling it this.
That said Poe handles it differently and tells the model when another model said something, but oddly enough doesn't tell the current model what it's name is. On Poe when you switch models the AI sees this:
~~
Aside from you and me, there is another person: Claude-3.5-Sonnet.
I said, "Hello!"
Claude-3.5-Sonnett said, "Hi there how can I help you??
"
I said, "I just changed your model how do you feel?"
So, I have a custom prompt I use with GPT that I found here a year or so ago. One of the custom prompt instructions was something along the lines of being more direct when it does not know something. Since then, I have not had that problem, and have even managed to get just "no" or "I don't know" as an answer.
Turns out it's just human psychology sans embodied concerns: metabolic, hormonal, emotional, socioeconomic, sociopolitical or anything to do with self-actualization.
I've started to notice that GPT-* vs. Claude is quite domain (and even subdomain) specific.
For programming, when using languages like C, python, ruby, C#, and JS, both seemed fairly comparable to me. However, I was astounded at how awful Claude was at Swift. Most of what I would get from Claude wouldn't even compile, contained standard library methods that did not exist, and so on. For whatever reason, GPT is night and day better in this regard.
In fact, I found GPT to be the best resource for less common languages like Applescript. Of course, GPT is not always correct on the first `n` number of tries, but with enough back-and-forth debugging, GPT really has pulled through for me.
I've also found GPT to be better at math and grammar, but only the more advanced models like O1-preview. I do agree with you too that Claude is better in a conversational sense. I have found it to be more empathetic and personable than GPT.
That seems highly likely given Sam Friedman's extensive reputation across multiple companies as being abusive, a compulsive liar, and willing to outright do blatantly illegal things like using a celebrity's voice and then, well...lie about it.
For long chats, I suggest exporting any artifacts, asking Claude to summarize the chat and put the artifacts and summarization in a project. There's no need to stuff Claude's context windows, especially if you tend to ask a lot of explanation-type questions like I do.
I've also read some people get around rate limits using the API through OpenRouter, and I'm sure you could hook a document store around that easily, but the Claude UI is low-friction
I just use the API (well, via Openrouter) together with custom frontends like Open WebUI. No rate limiting issues then, and I can super easily switch models even in an existing conversation. Though I guess I do miss a few bells & whistles from the proprietary chat interfaces.
Speaking of ChatGPT getting worse over time, it would be interesting to see ChatGPT be benchmarked continuously to see how it performs over time (and the results published somewhere publically).
They certainly need the money. The Pro service has been running in limited mode all week due to being over capacity. It defaults to “concise” mode during high capacity but Pro users can select to put it back into “Full Response.” But I can tell the quality drops even when you do that, and it fails and brings up error messages more commonly. They don’t have enough compute to go around.
in the beginning i was agitated by Concise and would move it back manually. But then I actually tried it, I asked for SQL and it gave me back SQL and 1-2 sentences at most
Regular mode gives SQL and entire paragraphs before and after it. Not even helpful paragraphs, just rambling about nothing and suggesting what my next prompt should be
Now I love concise mode, it doesn't skimp on the meat, just the fluff. Now my problem is, concise only shows up during load. Right now I can't choose it even if i wanted to
Oh you are asking for a 2 line change? Here is the whole file we have been working on with a preamble and closing remarks, enjoy checking to see if I actually made the change I am referring to in my closing remarks and my condolences if our files have diverged.
You know the craziest thing I’ve seen ChatGPT do is claim to have made a change to my terraform code acting all “ohh here is some changes to reflect all the things you commented on” and all it did was change the comments.
It’s very bizarre when it rewrites the exact same code a second or third time and for some reason decides to change the comments. The comments will have the same meaning but will be slightly different wording. I think this behavior is an interesting window into how large language models work. For whatever reason, despite unchanging repetition, the context window changed just enough it output a statistically similar comment at that juncture. Like all the rest of the code it wrote out was statistically pointing the exact same way but there was just enough variance in how to write the comment it went down a different path in its neural network. And then when it was done with that path it went right back down the “straight line” for the code part.
I don't think the context window has to change for that to happen. The LLMs don't just pick the most likely next token, it's sampled from the distribution of possible tokens so on repeat runs you can get different results.
I’ve been using the API for a few weeks and routinely get 529 overloaded messages. I wasn’t sure if that’s always been the case but it certainly makes it unsuitable for production workloads because it will last hours at a time.
Hopefully they can add the capacity needed because it’s a lot better than GPT-4o for my intended use case.
Sonnet is better than 4o for virtually all use cases.
The only reason I still use OpenAI's API and chatbot service is o1-preview. o1 is like magic. Everything Sonnet and 4o do poorly, o1 solves like a piece of cake. Architecting, bug fixing, planning, refactoring, o1 has never let me know on any 'hard' task.
A nice combo is have o1 guiding Sonnet. I ask o1 to come up with a solution and explanation, then simply feed its response into Sonnet to execute. That running on Aider really feels like futuristic stuff.
I alluded to this in another comment, but I have 4o to be better than Sonnet in Swift, Obj-C, and Applescript. In my experiences, Claude is worse than useless with those three languages when compared to GPT. Everything else, I'd say the differences haven't been too extreme. Though, o1-preview absolutely smokes both in my experiences too, but it isn't hard for me to hit it's rate limit either.
Exactly my experience as well. Like Sonnet can help me in 90% of the cases but there are some specific edge cases where it struggles that o1 can solve in an instant. I kinda hate it because of having to pay for both of them.
Interesting. I also find it frustrating to be rate limited/have responses fail when I’m paying for the product, but I’ve actually found that the “concise” mode answers have less fluff and make for faster back and forth. I’ve once or twice looked for the concise mode selector when the load wasn’t high.
I've had it refuse to generate a long text response (I was trying to concise a 300kb documentation to 20-30kb to be able to put it in the project's context), and every time I asked it replied "How should structure the results ?", "Shall I go ahead with writing the artifacts now ?", etc.
It wasn't even during the over-capacity event I don't think, and I'm a pro user.
Hate to be that guy, but did you tell it up front not to ask? And, of course, in a long-running conversation it's important not to leave such questions in the context.
The weird thing is that when I tried to tell it to distill it to a much smaller message it had no problem outputting it without any followup questions. But when I edited my message to ask it to generate a larger response, then I got stuck in the loop of it asking if I was really sure or telling me that `I apologize, but I noticed this request would result in a very large response.`
It sparks me as odd, because I've had quite a few times where it would generate me a response over multiple messages (since it was hitting its max message length) without any second-guessing or issue.
Neither does OAI. Their service has been struggling for more than a week now. I guess everyone is scrambling after the new qwen models dropped and matched the current state of the art with open weights.
I am a paying customer with credits and the API endpoints rate-limited me to the point where it's actually unusable as a coding assistant. I use a VS Code extension and it just bailed out in the middle of a migration. I had to revert everything it changed and that was not a pleasant experience, sadly.
When working with AI coding tools commit early, commit often becomes essential advice. I like that aider makes every change its own commit. I can always manicure the commit history later, I'd rather not lose anything when the AI can make destructive changes to code.
I can recommend https://github.com/tkellogg/dura for making auto-commits without polluting main branch history, if your tool doesn't support it natively
Could you explain more on how to do this? e.g if I am using the Claude API in my service, how would you suggest I go about setting up and controlling my own inference endpoint?
You're right, but that's also subject to compute costs and time value of money. The calculus is different for companies trying to exploit language models in some way, and different for individuals like me who have to feed the family before splurging for a new GPU, or setting up servers in the cloud, when I can get better value by paying OpenAI or Claude a few dollars and use their SOTA models until those dollars run out.
FWIW, I am a strong supporter of local models, and play with them often. It's just that for practical use, the models I can run locally (RTX 4070 TI) mostly suck, and the models I could run in the cloud don't seem worth the effort (and cost).
Their shitty UI is also not doing them any infrastructure favors, during load it'll straight up write 90% of an answer, and then suddenly cancel and delete the whole thing, so you have to start over and waste time generating the entire answer again instead of just continuing for a few more sentences. It's like a DDOS attack where everyone gets preempted and immediately starts refreshing.
Yes! It's infuriating when Claude stops generating mid response and deletes the whole thread/conversation. Not only you lose what it has generated so far, which would've been at least somewhat useful, but you also lose the prompt you wrote, which could've taken you some effort to write.
> But I can tell the quality drops even when you do that
Dario said in a recent interview that they never switch to a lower quality model in terms of something with different parameters during times of load. But he left room for interpretation on whether that means they could still use quantization or sparsity. And then additionally, his answer wasn't clear enough to know whether or not they use a lower depth of beam search or other cheaper sampling techniques.
He said the only time you might get a different model itself is when they are A-B testing just before a new announced release.
And I think he clarified this all applied to the webui and not just the API.
Hmmm... I wonder if this is why some of the results I've gotten over the past few days have been pretty bad. It's easy to dismiss poor results on LLM quality variance from prompt to prompt vs. something like this where the quality is actively degraded without notification. I can't say this is in fact what I'm experience, but it was noticeable enough I'm going to check.
Never occurred to me that the response changes based on load. I’ve definitely noticed it seems smarter at times. Makes evaluating results nearly impossible.
Unrelated. Inference doesn't run in sync with the wall clock; it takes whatever it takes. The issue is more like telling a room of support workers they are free to half-ass the work if there's too many calls, so they don't reject any until even half-assing doesn't lighten the load enough.
This is one reason closed models suck. You can't tell if the bad responses are due to something you are doing, or if the company you are paying to generate the responses is cutting corners and looking for efficiencies, eg by reducing the number of bits. It is a black box.
Recently I started wondering about the quality of ChatGPT. A couple of instances I was like: "hmm, I’m not impressed at all by this answer, I better google it myself!"
Recently I asked 4o to ‘try again’ when it failed to respond fully, it started telling me about some song called Try Again. It seems to lose context a lot in the conversations now.
Curious if anyone knows the logistics of these cloud provider/AI company deals. In this case, it seems like the terms of the deal mean that Anthropic ends up spending most of the investment on AWS to pay for training.
Does anthropic basically get at cost pricing on AWS? If Amazon has any margin on their pricing, it seems like this $4B investment ends up costing them a lot less, and this is a nice way to turn a cap ex investment into AWS revenue.
This was the brilliance of the original MSFT investment into OpenAI. It was an investment in Azure scaling its AI training infra, but roundabout through a massive customer (exactly what you’d want as a design partner) and getting equity.
I’m sure Anthropic negotiated a great deal on their largest cost center, while Amazon gets a huge customer to build out their system with.
That’s honestly one of the hardest things in engineering — identifying not just a customer to drive requirements, but a knowledgeable customer who can drive good requirements that work for a broader user base and can drive further expansion. Anthropic seems ideal for that, plus they act as a service/API provider on AWS.
This explanation makes no sense, I could be AWS' biggest customer if they wanted to pay me for it. Something a little closer could be that the big tech companies wanted to acquire outside LLMs, not quite realizing that spending $1B on training only puts you $1B ahead.
Anthropic is getting $4B in investment in a year where their revenue was about $850M. Even if Amazon had bought them outright for that much, they would not be ahead. The fact that everybody keeps repeating the claim that Amazon is "making money" makes this appear like some kind of scam.
Second, the investment isn't a loan that they need to repay. They are getting equity.
Third, Anthropic is exclusively using AWS to train its models. Which, yes, means if AWS gives them $4B and it costs them $500M/year to pay for AWS services then after 8 years, the cash is a wash. However this ignores the second point.
Fourth, there is brand association for someone who wanted to run their own single tenant instance of Claude whereby you would say "well they train Claude on AWS, so that must be the best place to run it for our <insert Enterprise org>" similar to OpenAI on Azure.
Fifth, raising money is a signaling exercise to larger markets who want to know "will this company exist in 5 years?"
Sixth, AWS doesn't have its own LLM (relative to Meta, MS, etc.). The market will associate Claude with Amazon now.
This is a way to keep the money printer called AWS Bedrock going and going and going. Don't underestimate the behemoth enterprises in the AWS rolodex who are all but assured to use that service for the next 5+ years at high volume.
Last I checked, AWS reserve pricing for one year of an 8x H100 pod costs more than just buying the pod yourself (with tens of thousands left over per server for the NVIDIA enterprise license and to hire people to manage them). On demand pricing is even worse.
This is essentially money that they would have spent to build out their cloud anyway, except now they also get equity in Anthropic. Whether or not Anthropic survives, AWS gets to keep all of those expensive GPUs and sell them to other customers so their medium/long term opportunity cost is small. Even if the deal includes cheaper rates the hardware still amortizes over 2-3 years, and cloud providers are running plenty of 5+ year old GPUs so there's lots of money to be made in the long tail (as long as ML demand keeps up).
They're not making money yet because there's the $4 billion opportunity cost, but even if their equity in Anthropic drops to zero, they're probably still going to make a profit on the deal. If the equity is worth something, they'll make significantly more money than they could have renting servers. Throw financial engineering on top of that, and they may come out far ahead regardless of what happens to Anthropic: Schedule K capital equipment amortizations are treated differently from investments and AFAICT they can double dip since Anthropic is going to spend most of it on cloud (IANAL). That's likely why this seems to be cash investment instead of in-kind credits.
I think that’s what people mean when they say Amazon is making money off the deal. It’s not an all or nothing VC investment that requires a 2-3x exit to be profitable because the money just goes back to AWS’s balance sheet.
These sort of investments usually also contain licensing deals.
Amazon probably gets Anthropic models they can resell “for free”. The 850M revenue is Anthropic’s, but there is incremental additional revenue to AWS’s hosted model services. AWS was already doing lots of things with Anthropic models, and this may alter the terms more in amazons favor.
Are they actually making money? I don’t know, investments aren’t usually profitable on day one. Is this an opportunity for more AWS revenue in the future? Probably.
AI needs to be propped up because the bug tech cloud providers they depend on need AI to be a thing to justify their valuations. Tech is going through a bit of a slump where all things being hyped a few years ago sort of died down (crypto? VR? Voice assistants? Metaverse?). Nobody gets very hyped about any of those nowadays. I am probably forgetting a couple of hyped things that fizzled out over the years.
Case in point, as much as I despise Apple, they are not all-in the AI bandwagon because it does nothing for them.
I am not privy to specific details, but in general there is a difference between investment and partnership. If it's literally an investment, it can either be in cash or in kind, where in kind can be like what MSFT did for OpenAI, essentially giving them unlimited-ish ($10b) Azure credits for training ... but there was quid pro quo where MSFT in turn agreed to embed/extend OpenAI in Azure services.
If it's a partnership investment, there may be both money & in-kind components, but the money won't be in the context of fractional ownership. Rather it would be partner development funds of various flavors, which are usually tied to consumption commits as well as GTM targets.
Sometimes in reading press releases or third party articles it's difficult to determine exactly what kind of relationship the ISV has with the CSP.
There's also another angle. During the call with Lex last week, Dario seemed to imply that future models would run on amazon chips from Annapurna Labs (Amazon's 2015 fabless purchase). Amazon is all about the flywheel + picks and shovels and I, personally, see this as the endgame. Create demand for your hardware to reduce the per unit cost and speed up the dev cycle. Add the AWS interplay and it's a money printing machine.
Supermicro is currently under DOJ investigation for similar schemes to this. The legality of it probably depends on the accounting, and how revenue is recognized, etc.
It certainly looks sketchy. But I’m sure there’s a way to do it legitimately if their accountants and lawyers are careful about it…
Anthropic should double down on the strategy of being the better code generator. No I don't need an AI agent to call the restaurant for me. Win the developers over and the rest will follow.
> Win the developers over and the rest will follow.
Will they really? Anecdotal evidence, but nobody I know in real life knows about Claude (other than it's an ordinary first name). And they all use or at least know about ChatGPT. None of them are software engineers of course. But the corporate deciders aren't software engineers either.
Most people I know in real life have certainly heard of ChatGPT but don't pay for it.
I think someone enthusiastic enough to pay for the subscription is more likely to be willing to try a rival service, but that's not most people.
Usually when these services are ready to grow they offer a month or more free to try, at least that's what Google has been doing with their Gemini bundle.
I'm actually baffled by the number of people I've met who pay for such services, when I can't tell the difference between the models available within one service, or between one service or the other (at least not consistently).
I do use them everyday, but there's no way I'd pay $20/month for something like that as long as I can easily jump from one to the other. There's no guarantee that my premium account on $X is or will remain better than a free account on $Y, so committing to anything seems pointless.
I do wonder though: several services started adding "memories" (chunks of information retained from previous interactions), making future interactions more relevant. Some users are very careful about what they feed recommendation algorithms to ensure they keep enjoying the content they get (another behavior I'm was surprised by), so maybe they also value this personalization enough to focus on one specific LLM service.
The amount of free chats you get per day is way too limiting for anyone who uses LLMs as an important tool in their day job.
20 USD a month to make me between 1.5x and 4x more productive in one of the main tasks of my job really is a bargain, considering that 20 USD is very small fraction of my salary.
If I didn't pay, I'd be forced to wait, or create many accounts and constantly switch between them, or be constantly copy-pasting code from one service to the other.
And when it comes to coding, I've found Claude 3.5 Sonnet better than ChatGPT.
Consumers don't have to consciously choose Claude, just like most people don't know about Linux. But if they use an Android phone or ever use any web services they are using Linux.
If they ever do Apple and Google will offer it as a service built into your phone .
For example, you could say ok Google call that restaurant me and My girlfriend had our first date at 5 years ago, set up something nice so I can propose. And I guess Google Gemini ( or whatever it's called at this point), Will hire a band, some photographers, and maybe even a therapist just in case it doesn't work out.
All of this will be done seamlessly.
But I don't imagine any normal person will pay 20 or $30 a month for a standalone service doing this. As is it's going to be really hard to compete against GitHub Copilot they effectively block others from scrapping GitHub.
But why hire a therapist when Gemini is there to talk to?
Re: Github Copilot: IME it's already behind. I finally gave Cursor a try after seeing it brought up so often, and its suggestions and refactors are leagues ahead of what Copilot can do.
I use Claude Pro paid version every day, but not for coding. I used to be a software engineer, but no longer.
I tried OpenAI in the past, but I did not enjoy it. I do not like Sam Altman.
My use cases:
Generating a business plan, podcast content, marketing strategies, sales scripts, financial analyses, canned responses, and project plans. I also use it for general brainstorming, legal document review, and so many other things. It really feels like a super-assistant.
Claude has been spectacular about 98% of the time. Every so often it will refuse to perform an action - most recently it was helping me research LLC and trademark registrations, combined with social media handles (and some deviations) and web URL availability. It would generate spectacular reports that would have taken me hours to research, in minutes. And then Claude decided that it couldn't do that sort of thing, until it could the next day. Very strange.
I have given Gemini (free), OpenAI (free and Paid), Copilot (free), Perplexity (free) a shot, and I keep coming back to Claude. Actually, Copilot was a pretty decent experience, but felt the guardrails too often. I do like that Microsoft gives access to Dall-E image generation at no cost (or maybe it is "free" with my O365 account?). That has been helpful in creating simple logo concepts and wireframes.
I run into AI with Atlassian on the daily, but it sucks. Their Confluence AI tool is absolute garbage and needs to be put down. I've tried AI tools that Wix, Squarespace, and Mira provide. Those were all semi-decent experiences. And I just paid for X Premium so I can give Grok a shot. My friend really likes it, but I don't love the idea of having to open an ultra-distracting app to access it.
I'm hoping some day to be like the wizards on here who connect AI to all sorts of "things" in their workflows. Maybe I need to learn how to use something like Zapier? If I have to use OpenAI with Zapier, I will.
I also prefer Claude after trying the same options as you.
That said I can't yet confidently speak to exactly why I prefer Claude. Sometimes I do think the responses are better than any model on ChatGPT. Other times I am very impressed with chatGPT's responses. I haven't done a lot of testing on each with identical prompt sequences.
One thing I can say for certainty is that Claude's UI blows chatGPT's out of the water. Much more pleasant to use and I really like Projects and Artifacts. It might be this alone that has me biased towards Claude. It makes me think that UI and additional functionality is going to play a much larger role in determining the ultimate winner of the LLM wars than current discussions give it credit for.
I have been flogging the hell out of copilot for equities research and to teach me about finance topics. I just bark orders and it pumps out an analysis. This is usually so much work, even if you have a service like finviz, Fidelity or another paid service.
Thirty seconds to compare 10yrs of 10ks. Good times.
In my experience*, for coding, Sonnet is miles above any model by OpenAI, as well as Gemini. They're all far from perfect, but Sonnet actually "gets" what you're asking, and tries to help when it fails, while the others wander around and often produce dismal code.
* Said experience is mostly via OpenRouter, so it may not reflect the absolute latest developments of the models. But there at least, the difference is huge.
I also don't understand the idea of voice mode, or agent controller computer. Maybe it is cool to see as a tech demo, but all I really want is good quality, at reasonable price for the LLM service
I think voice mode makes significantly more sense when you consider people commuting by car by themselves every day.
Personally I don't (and I'd never talk to an LLM on public transit or in the office), but almost every time I do drive somewhere, I find myself wishing for a smarter voice-controlled assistant that would allow me to achieve some goal or just look up some trivia without ever having to look at a screen (phone or otherwise).
This is the direction I am building my personal LLM based scripts. I don’t really know any python but Claude has written python scripts that e.g. write a document iteratively using LLMs. Next step will be to use voice and autogpt to do things that I would rather dictate to someone. E.g. find email from x => write reply => edit => send
Much more directed/almost micro managing but it’s still quicker than me clicking around (in theory).
Edit:
I’m interested to explore how much better voice is as an input (vs writing as an input)
To me, reading outputs is much more effective than listening to outputs.
Anthropic gets a lot of it's business via AWS Bedrock so it's fair to say that Amazon probably has reasonable insight into how the Claude usage is growing that makes them confident in this investment
They are also confident in the investment because they know that all the money is going to come right back to them in the short term (via AWS spending) whether or not Anthropic actually survives in the long term.
I had to switch from Pro to Teams plan and pay 150 USD for 5 accounts because the Pro plan has gotten unusable. It will allow me to ask a dozen or so questions and then will block me for hours because of „high capacity.“ I don’t need five accounts, one for 40 USD would be totally fine if it would allow me to work uninterrupted for a couple of hours.
All in all Claude is magic. It feels like having ten assistants at my fingertip. And for that even 100 USD is worth paying.
I just start new chats whenever the chat gets long (in terms of number of tokens). It's kind of a pain to have to form a prompt that encapsulates enough context, but it has prevented me from hitting the Pro limit. Also, I include more questions and detail in each prompt.
Why does that work? Claude includes the entire chat with each new prompt you submit [0], and the limit is based on the number of tokens you've submitted. After not too many prompts, there can be 10k+ tokens in the chat (which are all submitted in each new prompt, quickly advancing towards the limit).
(I also have a chatGPT sub and I use that for many questions, especially now that it includes web search capabilities)
> It's kind of a pain to have to form a prompt that encapsulates enough context, but it has prevented me from hitting the Pro limit. Also, I include more questions and detail in each prompt.
i get it to provide a prompt to start the new chat. i sometimes wish there was a button for it bc it's such a big part of my workflow
The status pages of OpenAI and Anthropic are in stark contrast and that mirrors my experience. Love Anthropic for code and its Projects feature, but OpenAI is still way ahead on voice and reliability.
This makes sense in the grand scheme of things.
Anthropic used to be in the Google camp, but DeepMind seems to have picked up speed lately, with new “Experimental” Gemini Models beating everyone, while AWS doesn't have anything on the cutting edge of AI.
Hopefully this helps Anthropic to fix their abysmal rate limits.
> Amazon Web Services will also become Anthropic’s “primary cloud and training partner,” according to a blog post. From now on, Anthropic will use AWS Trainium and Inferentia chips to train and deploy its largest AI models.
I suspect that's worth more than $4B in the long term? I'm not familiar with the costs, though.
I’ve been impressed with the AI assisted tooling for the various monitoring systems in Azure at least. Of course this is mainly because those tools are so ridiculously hard to use that I basically can’t for a lot of things. The AI does it impressively well though.
I’d assume there is a big benefit to having AI assisted resource generation for cloud vendors. Our developers often have to mess around with things that we really, really, shouldn’t in Azure because operations lacks the resources and knowledge. Technically we’ve outsourced it, but most requests take 3 months and get done wrong… if an AI could generate our network settings from a global policy that would be excellent. Hell if it could handle all our resource generation they would be so much useless time wasted because our organisation views “IT” as HRs uncharming cost center cousin.
I've been playing with Alibaba's Qwen 2.5 model and I've had it claim to be Claude. (Though it usually claims to be Llama, and it seems to think it's a literal llama, i.e. it identifies as an animal, "among other things".)
I love Claude 3.5 sonnet and their UI is top notch especially for coding, recently though they have been facing capacity issues especially during weekdays correlating with working hours. Have tried Qwen2.5 coder 32B and it's very good and close to Claude 3.5 in my coding cases.
Microsoft -> OpenAI (& Inflection AI)
Google -> Gemini (and a bit of Anthropic)
Amazon -> Anthropic
Meta -> Llama
Is big tech good for the startup ecosystem, or are they monopolies eating everything (or both?). To be fair to Google and Meta they came up with a lot of the stuff in the first place, and aren't just buying the competition.
There wouldn't be an LLM startup ecosystem without big tech.
Notable contributions: Nvidia for, well, (gestures at everything), Google for discovering (inventing?) transformers, being early advocates of ML, authoring tensorflow, Meta for Torch and open sourcing Llama, Microsoft for investing billions in OpenAI early on and keeping the hype alive. The last one is a reach, I'm sure Microsoft Research did some cool things I'm unaware of.
You might be right, we don’t know how an alternative reality would have played out though to say if this is the only way (and fastest) way we could have got here.
AWS Trainium is a machine learning chip designed by AWS to accelerate training deep learning models. AWS Bedrock is a fully managed service that allows developers to build and scale generative AI applications using foundation models from various providers.
Trainium == Silicon (looks like Anthropic has agreed to use it)
Bedrock == AWS Service for LLMs behind APIs (you can use Anthropic models through AWS here)
Anthropic will be the winner here, zero doubts in my mind. They have leapfrogged head and shoulders above OpenAI over the last year. Who'd have thought a business predicated entirely on keeping the ~1000 people on earth qualified to work on this stuff happy would go downhill once they failed at that.
I'
m not sure how they make it back. The guardrails in place are extremely strict. The only people who seem to use it are a subset of developers who are unhappy with OpenAI. With Bard popping up free everywhere taking away much of the general user crowd and OpenAI offering the mini model always free and limited image generation / expensive model. Then you have to do it yourself crowd with llama. What is their target market? Governments? Amazon companies?There free their offers 10 queries and half of them need to be used to get around filters I don't see this positioned well for general customers.
The Guardrails on Claude Sonnet 3.5 API are not stricter than Openai's guardrails in my experience. More specifically, if you access the models via API or third party services like Poe or Perplexity the guardrails are not stricter than GPT4o. I've never subscribed to Claude.ai so can't comment on that.
I have no experience with Claud.ai vs ChatGPT but it's clear the underlying model has no issue with guardrails and this is simply an easily tweaked developer setting if you are correct that they are stricter on Claude.ai.
(The old Claude 2.1 was hilariously unwilling to follow reasonable user instructions due to "ethics" but they've come a long way since then.)
> The Guardrails on Claude Sonnet 3.5 API are not stricter than Openai’s guardrails in my experience.
Both Gemini and Claude (via the API) have substantially tighter guardrails around recitation (producing output matching data from their training set) than OpenAI, which I ran into when testing an image text-extraction-and-document-formatting toolchain against all three.
Both Claude and Gemini gave refusals on text extraction from image documents (not available publicly anywhere I can find as text) from a CIA FOIA release
I just asked GPT4o to recognize a cartoon character (I accessed it via Perplexity) and it told me it isn't able to do that, while Claude Sonnet happily identified the character, so this might vary by use case or even by prompt.
I've had a situation where Claude (Sonnet 3.5) refused to translate song lyrics because of safety/copyright bullshit. It worked in a new chat where I mentioned that it was a pre 1900s poem.
It has held this position since at least June. The Aider LLM leaderboards [1] have the Sonnet 3.5 June version beating 4o handily. Only o1-preview beat it narrowly, but IIRC at much higher costs. Sonnet 3.5 October has taken the lead again by a wide margin.
Anecdotally, Claude seems to hallucinate more during certain hours. It's amusing to watch, almost like your dog that gets too bored and stops responding to your commands - you say "sit" and he looks at you, tilts his head, looks straight up at you, almost like saying "I know what you're saying..." but then decides to run to another room and bring his toy.
And you'd be wondering: "darn, where's that toughest, most obidient and smart Belgian malinois that just a few hour ago was ready to take down a Bin Laden?"
Talking of anecdotal, 4o with canvas, which is normally excellent, tends to give up around a certain context length, and you have to copy and paste what you have into a new window to get it to make edits
With Claude on Bedrock I can use LLMs in production without sending customer data to the US. And if you're already on AWS it's super easy to onboard wrt. auth and billing and compliance.
Claude api use is already as high as openai. I believe that market will grow far more over time than chat as AI gets embedded in more of the applications we already use.
I am in Operations. I use it (and pay for it) because the free version seemed to work best for me compared to Perplexity (which had been my go-to) and ChatGPT/OpenAI.
Government alone could be huge, with this recent nonsense about the military funding a “Manhattan project for AI” and the recently announced Pentagon contracts.
Some of these investments sound big in absolute terms..
However not that big considering the scale of the investor AND that many of these investors are also vendors.
MSFT/AMZN/NVDA investing in AI firms that then use their clouds/chips/whatever is an interesting circular investment.
Can someone with familiarity in rounds close to this size speak to their terms?
For instance: i imagine a significant part of this will be “paid” as AWS credits and is not going to be reflected as a balance in a bank account transfer.
Yes, that is the case. It is largely 4B in capex investment, I’d imagine 10% or less is cash. One would think nvidia could get much better terms investing its gpu (assuming they can get it into a working cluster). Instead it’s nvidia gets cash for gpu hardware, that hardware gets put into a data center and AWS invests their hardware as credits for equity instead of cash. And because AWS has already built out their data center infra they can get a better deal than nvidia making the play because nvidia has to rebuild an entire data center infra from scratch (in addition to designing gpu etc).
Now if AWS or gcp can crack gpu compute better than nvidia for training and hosting, then they can basically cut out nvidia and so essentially they get gpu at cost (vs whatever markup they pay to nvidia).
Because essentially whatever return AWS will make from Anthropic will be modulated by the premiums paid to nvidia to invest and also the cost of operating a data center for Anthropic.
But thankfully all of that gets mediated on paper because valuation is more speculative than the returns on nvidia hardware (which will be known to the cent by AWS given its some math of hourly rate and utilization which they have a good idea of)
Same as the big tech companies, probably make all of their products worse in service to advertising. AI-generated advertising prompted by personal data could be extremely good at getting people to buy things if tuned appropriately.
Well. If you're using AI instead of a search engine, they could make the AI respond with product placement more or less subtle.
But if you're using AI for example to generate code as an aid in programming, how's that going to work? Or any other generative thing, like making images, 3d models, music, articles or documents... I can't imagine inserting ads into those would not destroy the usefulness instantly.
My guess is they don't know themselves. The plan is to get market shre now, and figure it out later. Which may or may not turn out well.
Cost of inference will tend to the the same as cost of a Google search. It is infra that will come down to negligible and almost free. Then as others have said it will tend to freemium (pay to have no ads). And additional value added services as they continue to evolve up the food chain (ai powered sales, marketing, etc)
I'm working with models and the costs are ridiculous. $7000 card and 800 watts later for my small projects and I can't imagine how they can make money in the next 5 to 10 years. I need to do more research on hardware approaching that reduces costs and power consumption. I just started experimenting with llama.cpp and I'm mildly impressed.
relatedly: is claude3.5-haiku being delivered above their cost, after they quadrupled the price?
Though it wouldn't ensure profitability since they're spending so much on training. I'm sure with inference-use growing, they're hoping that eventually total_expenses(inference) grows to be much much larger than total_expenses(training)
And you think MSFT isn't 95% copycat? Teams is Slack clone. Azure is AWS clone. SurfaceBook (remember those?) Macbook clone. Edge is Chrome clone. Bing is Google clone. Even VSCode was an Atom/Electron fork and Windows Subsystem for Linux...
Are you a paying customer? I exclusively use their best model and while I get warnings (stuff about longer chats leading to more limit usage), I've never been kicked out.
The only thing is that they've recently started defaulting to Concise to cut costs, which is fine with me.
Concise mode is honestly better anyway. I’d prefer it always be in that mode.
But that being said I bump into hard limits far more often than I do with ChatGPT. Even if I keep chats short like it constantly suggests, eventually it cuts me off.
Anecdotal experience, but as far as I've played around with them, Claude's models have given me a better impression. I would much rather have great responses with lower availability than mediocre responses available all the time.
As someone who doesn't really follow the LLM space closely, I have been consistently turning to Anthropic when I want to use an LLM (usually to work through coding problems)
Beside Sonnet impressing me, I like Anthropic because there's less of an "icky" factor compared to OpenAI or even Google. I don't know how much better Anthropic actually is, but I don't think I'm the only one who chooses based on my perception of the company's values and social responsibility.
Yea, even if they're practically as bad, there's value in not having someone like Altman who's out there saying things about how many jobs he's excited to make obsolete and how much of the creative work of the world is worthless.
I mean, he's certainly acting as if he's entitled to train on all of it for free as long as it's not made by a big enough company that may be able to stop/sue him. And then feels entitled to complain about artists tainting the training data with tools.
He has a very "wealth makes right" approach to the value of creative work.
> Last year, Google committed to invest $2 billion in Anthropic, after previously confirming it had taken a 10% stake in the startup alongside a large cloud contract between the two companies.
Well, there you go. These companies are always closer than they seem at first glance, and my preference for Anthropic may just be patting myself on the back.
Personally, I find companies with names like "Anthropic" to be inherently icky too. Anthropic means "human," and if a company must remind me it is made of/by/for humans, it always feels less so. E.g.
The Browser Company of New York is a group of friendly humans...
Second, generative AI is machine generated; if there's any "making" of the training content, Anthropic didn't do it. Kind of like how OpenAI isn't open, the name doesn't match the product.
> Anthropic means "human," and if a company must remind me it is made of/by/for humans
Why do you think that that's their intended reading? I had assumed the name was implying "we're going to be an AGI company eventually; we want to make AI that acts like a human."
> if there's any "making" of the training content, Anthropic didn't do it
This is incorrect. First-gen LLM base models were made largely of raw Internet text corpus, but since then all the improvements have been from:
• careful training data curation, using data-science tools (or LLMs!) to scan the training-data corpus for various kinds of noise or bias, and prune it out — this is "making" in the sense of "making a cut of a movie";
• synthesis of training data using existing LLMs, with careful prompting, and non-ML pre/post-processing steps — this is "making" in the sense of "making a song on a synthesizer";
• Reinforcement Learning from Human Feedback (RLHF) — this is "making" in the sense of "noticing when the model is being dumb in practice" [from explicit feedback UX, async sentiment analysis of user responses in chat conversations, etc] and then converting those into weights on existing training data + additional synthesized "don't do this" training data.
I read Anthropic as eluding to the Anthropic Principle as well as the doomsday argument and related memeplex[0] mixed with human-centric or about humans. Lovely naming IMHO.
Funny, I use Mistral because it has 'more" of that same factor, even in the name!
They're the only company who doesn't lobotomize/censor their model in the RLHF/DPO/related phase. It's telling that they, along with huggingface, are from le france - a place with a notably less puritanical culture.
> "This new CASH infusion brings Amazon’s total investment in Anthropic to $8 billion while maintaining the tech giant’s position as a minority investor, Anthropic said."
ps- plenty of people turning a blind eye towards rampant valuation inflation and "big words" statements on deals. Where is the grounding on the same dollars that are used at a grocery store? The whole thing is fodder for instability in a big way IMHO
I look forward to the moment the sunk cost fallacy shows up. "We've invested $20B into this, and nothing yet. Shall we invest $4B more? Maybe it will actually return something this time." That will be fun.
It could be the anthropic models makes bedrock attractive and profitable and more importantly medium term competitive against azure. It seems worth it.
So... as long as this money helps them improve their LLM even more, I am all up for it.
My main issue is quickly being rate-limited in relatively long chats, making me wait 4 hours despite having a subscription for Pro. Recently I have noticed some other related issues, too. More money could help with these issues, too.
To the developers: keep up the excellent work and may you continue striving for improvement. I feel like ChatGPT is worse now than it was half a year ago, I hope this will not happen to Claude.
reply