Alibaba releases an 'open' challenger to OpenAI's O1 reasoning model (techcrunch.com)
44 points by bn-l 2 hours ago | 32 comments





32B is a good size choice, as it allows running on a 24GB consumer card (RTX 3090/4090) at ~4 bpw while using most of the VRAM. Unlike Llama 3.1, which came in 8B, 70B (much too big to fit), and 405B.
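As a back-of-the-envelope check of that claim (a rough sketch; it assumes the quantized weights dominate VRAM and ignores KV cache and activation overhead):

```python
# Rough VRAM math for a 32B model quantized to ~4 bits per weight (bpw).
params = 32e9                          # 32B parameters
bpw = 4.0                              # bits per weight after quantization
weights_gb = params * bpw / 8 / 1e9    # bits -> bytes -> gigabytes
print(f"32B at 4 bpw: ~{weights_gb:.0f} GB")          # ~16 GB of a 24 GB card
print(f"70B at 4 bpw: ~{70e9 * 4 / 8 / 1e9:.0f} GB")  # ~35 GB: too big for 24 GB
```

The remaining ~8 GB on a 24 GB card is what holds the KV cache, so longer contexts still fit.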

So Western controls on training hardware (hello, NVIDIA) seem to have failed. I wonder if there will be any repercussions here.

Or they could be training the models in the States? It's hard to say, since Alibaba does R&D in Bellevue as well as Hangzhou.

Most of the papers in machine learning are coming from China. The vast majority.

Most of the open source models on GitHub, too.

China is dominating this field.


Are they making breakthroughs or are they taking what exists and copying/marginally improving?

Bigger discussion from two days ago: https://news.ycombinator.com/item?id=42259184

Full list of Chinese o1 clones:

- QwQ https://huggingface.co/collections/Qwen/qwq-674762b79b75eac0...

- Marco-o1 by AIDC, Alibaba (yes, this is a different, lesser-known team, also from Alibaba) https://huggingface.co/AIDC-AI/Marco-o1

- Skywork-o1 by Kunlun Tech https://huggingface.co/collections/Skywork/skywork-o1-open-6...

- DeepSeek-R1-Lite-Preview https://chat.deepseek.com

- InternThinker preview by Shanghai AI Lab https://sso.openxlab.org.cn/login?redirect=https://internlm-...

- k0-math by Moonshot AI https://kimi.moonshot.cn

https://x.com/adinayakup/status/1861908631807017007?s=46

The main ones to watch are QwQ and R1.


Is Alibaba's LLM the "Chinese LLM"?

It would appear to have been a U.S.-only game until now. As Eric Schmidt said in the YouTube lecture (that keeps getting pulled down), LLMs have been a rich-companies game.


you only think that because you haven’t been paying close attention

Qwen, DeepSeek, Yi - there have been a number of high-quality, open Chinese competitors


Thanks. You're right, I am a layman. I may also have been focusing on the "open" LLMs, since they seem to be the most talked about on HN.

And AI21, which is Israeli

it’s a good model, but not in the same class as the ones i just named

And UAE's TII with Falcon

that model is undertrained and kinda sucks

DeepMind is in the UK and Mistral is in France?

Alibaba has been pumping out a bunch of useful models for a long time.


What lecture is this?

No, it is the opposite. China has had versions of LLMs since before they were widely public. See the LLaMA family history chart for one example.

We are lucky that Alibaba, Meta, and Mistral see some strategic value in public releases. If it were just one of them, it would be a fragile situation for downstream startups. And they're even situated in three different countries.


Is their repo / model free of any undisclosed telemetry, i.e. is it purely weights?

Is it even possible to embed telemetry into a model itself, as opposed to the runtime environment / program (e.g. Ollama)?

I would be disinclined to believe that to be possible, but if anyone knows otherwise, please share.


That's literally why the safetensors format exists. The previous pickle (ckpt) format allowed for arbitrary code execution.
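To illustrate the difference (the payload here is hypothetical; the point is that unpickling executes code, while safetensors loading does not):

```python
import pickle

# Why pickle-based checkpoints are dangerous: unpickling runs arbitrary code.
class Malicious:
    def __reduce__(self):
        import os
        # Executed at load time -- a stand-in for any attacker-chosen command.
        return (os.system, ("echo pwned",))

payload = pickle.dumps(Malicious())
pickle.loads(payload)  # prints "pwned": code ran just by loading the "checkpoint"

# safetensors, by contrast, is raw tensor bytes plus a JSON header -- there is
# no code path to execute on load:
#   from safetensors.torch import load_file
#   state_dict = load_file("model.safetensors")  # returns plain tensors only
```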

I can't wait for Ebay to release theirs

Not Amazon?

I'm playing with it right now, running it locally using Ollama. It's a 19GB download and it runs nicely on an NVIDIA A100 GPU.

https://ollama.com/library/qwq
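For anyone who wants to poke at it the same way, a minimal sketch against Ollama's local REST API (assumes `ollama pull qwq` has already been run and the server is listening on its default port):

```python
import json
import urllib.request

# One-shot (non-streaming) generation request to a local Ollama server.
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "qwq",
        "prompt": "How many r's are in the word strawberry?",
        "stream": False,  # return a single JSON object instead of a token stream
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```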


A100 geez. The privileged few.

Runs nicely on my AMD 7900XTX too.

Is o1 even that good? It doesn't even rank first on LMArena.

Good at what? It's great at breaking down complex problems into small, logical steps. Claude Sonnet 3.5 is still the best for coding. They can be leveraged together by using Aider's architect mode: it gives your request to the "architect" model first, which returns a list of steps to implement your idea but doesn't write any code at that point. You then approve the list and it's sent to the coding model to actually write the code. This technique produces better quality code than any one model by itself (the pattern is sketched below). In Aider, you can assign any model you want as the architect and any other model as the coder.

It's really great, and I'm looking forward to the AI coding extensions for VS Code doing the same thing, since I prefer working in VS Code to the command line, as Aider requires.

My only real problem with o1 is that it's ridiculously expensive, to the point that it makes no sense to use it for actual code. In architect mode, however, you can keep the costs under control as there are far fewer input/output tokens.
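The two-stage pattern itself is simple to sketch (model names and prompts below are placeholders, not Aider's actual internals; assumes an OpenAI-compatible endpoint for both models):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def architect_then_code(request: str) -> str:
    # Stage 1: the "architect" (a reasoning model) plans, but writes no code.
    plan = client.chat.completions.create(
        model="o1-preview",  # placeholder architect model
        messages=[{"role": "user",
                   "content": f"Write a numbered implementation plan, "
                              f"no code:\n{request}"}],
    ).choices[0].message.content

    # (In Aider, the user reviews and approves the plan at this point.)

    # Stage 2: the "coder" turns the approved plan into actual code.
    return client.chat.completions.create(
        model="claude-3-5-sonnet",  # placeholder; e.g. served via a proxy
        messages=[{"role": "user",
                   "content": f"Implement exactly this plan:\n{plan}"}],
    ).choices[0].message.content
```

Fewer tokens hit the expensive model this way: only the request and the plan, never the full code.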


I haven’t been super impressed with it, and haven’t encountered any practical tasks I wanted to solve with an LLM where o1 worked any better than prompting 4o or Sonnet to use more extensive CoT.

There might be some narrow band of practical problems in between what other LLMs can do and what o1 can’t, but I don’t think that really matters for most use cases, especially given how much slower it is.

Day to day, you just don’t really want to prompt a model near the limits of its capabilities, because success quickly becomes a coin flip. So if a model needs five times as long to work, it needs to dramatically expand the range of problems that can be solved reliably.


I think the true edge of CoT models will come from layman usability. While I can easily prompt Claude for examples and then manually modify the code to fill in the gaps, general domain knowledge and technical understanding are absolutely required from the human sitting in front of the screen. With o1, a layman can sit in front of the computer and ask "I want a website for tracking deliveries for my webshop, and make it pretty", and the model will do it.

So it's not so much about increased capability, but removing the expert human in the loop.


yes, it’s extremely good.

don’t overindex on the lmsys arena, the median evaluator is kinda mid


My understanding was that the metric for LMArena is that one answer is “better” than another, for a deliberately 100% subjective definition of better.

My experience has been that typical LLMs will have more “preamble” to what they say, easing the reader (and priming themselves autoregressively) into answers with some relevant introduction of the subject, sometimes justifying the rationale and implications behind things. But for o1, that transient period and the underlying reasoning behind things are part of OpenAI’s special sauce, and they deliberately and aggressively take steps to hide it from users.

o1 will get correct answers to hard problems more often than other models (look at the math/coding/hard subsections of the leaderboard, where, anecdotal experiences aside, it is #1), and there’s a strong correlation between correctness and a high score in those domains, because getting code or math “right” matters more than the justification or explanation. But in more general domains where there isn’t necessarily an objective right or wrong, I know the vibe matters a lot more to me, and that’s something o1 struggles with.
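For context on what that leaderboard score actually is: to my understanding, LMArena's published methodology fits a Bradley-Terry-style model to the pairwise votes, so each model $i$ gets a strength $\theta_i$ chosen to match the observed human preferences:

```latex
P(\text{model } i \text{ beats } j) = \frac{e^{\theta_i}}{e^{\theta_i} + e^{\theta_j}}
```

Nothing in that objective rewards being right, only being preferred, which is the gap described above.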


For very specific tasks, yeah it's good. For my usual coding tasks...no way.


