How much memory do you need in 2024 to run 1M concurrent tasks? (hez2010.github.io)
87 points by neonsunset 2 hours ago | 38 comments





I feel this benchmark compares apples to oranges in some cases.

For example, for node, the author puts a million promises into the runtime event loop and uses `Promise.all` to wait for them all.

This is very different from, say, the Go version, where the author creates a million goroutines and defers a `waitgroup.Done` call in each.
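For reference, a minimal Go sketch of the pattern being described (my reconstruction, not necessarily the author's exact code):

    package main

    import (
        "os"
        "strconv"
        "sync"
        "time"
    )

    func main() {
        numTasks, _ := strconv.Atoi(os.Args[1]) // task count from the command line
        var wg sync.WaitGroup
        wg.Add(numTasks)
        for i := 0; i < numTasks; i++ {
            go func() {
                defer wg.Done() // signal completion when the goroutine exits
                time.Sleep(10 * time.Second)
            }()
        }
        wg.Wait() // block until every goroutine has called Done
    }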

While this might be the idiomatic way of doing concurrency in the respective languages, it does not account for how goroutines are fundamentally different from promises, and how the runtimes do things differently. For JS, there's a single event loop; even counting the JS execution thread, the event loop thread, and whatever else the runtime uses for async I/O, the execution model is fundamentally different from Go's. Go (unless you lower `GOMAXPROCS`) spawns an OS thread for every logical CPU your machine has, then uses a userspace scheduler to distribute goroutines across those threads. It may spawn more OS threads to account for threads sleeping in syscalls, although I don't think the runtime will spawn extra threads in this case.

It also depends on what the "concurrent tasks" (I know, concurrency != parallelism) are. Tasks such as reading a file or doing a network call are better done with something like promises, but CPU-bound tasks are better done with goroutines or Node worker_threads. It would be interesting to see how the memory usage changes when doing async I/O vs CPU-bound tasks concurrently in different languages.


Actually, I think this benchmark did the right thing, something I wish more benchmarks would do. I'm much less interested in what the differences between compilers are than in what the actual output will be if I ask a professional Go or Node.js dev to solve the same task. (TBF, it would've been better if the benchmarked task were something useful, e.g. handling an HTTP request.)

Go heavily encourages a certain kind of programming; JavaScript heavily encourages a different kind; and the article does a great job at showing what the consequences are.


As far as I know there is no way to do Promise-like async in Go; you HAVE to create a goroutine for each concurrent async task. If this is really the case, then I believe the submission is valid.

But I do think that spawning a goroutine just to do a non-blocking task and get its return value is kinda wasteful.


The requirement is to run 1 million concurrent tasks.

Of course each language will have a different way of achieving this, each with its own pros and cons. That's why we have these different languages to begin with.


This depends a lot on how you define "concurrent tasks", but the article provides a definition:

> Let's launch N concurrent tasks, where each task waits for 10 seconds and then the program exits after all tasks finish. The number of tasks is controlled by the command line argument.

Leaving aside semantics like "since the tasks aren't specified as doing anything with side effects, the compiler can remove them as dead code", all you really need here is a timer and a continuation for each "task" -- i.e. 24 bytes on most platforms. Allowing for allocation overhead and a data structure to manage all the timers efficiently, you might use as much as double that; with some tricks (e.g. function pointer compression) you could get it down to half that.
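To illustrate that lower bound, here's a purely hypothetical Go sketch (obviously not what any of the benchmarked runtimes do), where each "task" is just a deadline plus a continuation in one shared min-heap:

    package main

    import (
        "container/heap"
        "fmt"
        "time"
    )

    // All the state a "sleep, then finish" task strictly needs:
    // a wake-up deadline and a continuation to invoke afterwards.
    type task struct {
        deadline int64  // wake-up time in ns (8 bytes)
        cont     func() // continuation (one word; two if it carries context)
    }

    type timerHeap []task

    func (h timerHeap) Len() int           { return len(h) }
    func (h timerHeap) Less(i, j int) bool { return h[i].deadline < h[j].deadline }
    func (h timerHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
    func (h *timerHeap) Push(x any)        { *h = append(*h, x.(task)) }
    func (h *timerHeap) Pop() any {
        old := *h
        t := old[len(old)-1]
        *h = old[:len(old)-1]
        return t
    }

    func main() {
        const n = 1_000_000
        remaining := n
        done := func() { remaining-- } // one shared closure, not a million
        wake := time.Now().Add(10 * time.Second).UnixNano()
        h := make(timerHeap, 0, n)
        for i := 0; i < n; i++ {
            h = append(h, task{deadline: wake, cont: done})
        }
        heap.Init(&h)
        for h.Len() > 0 {
            t := heap.Pop(&h).(task)
            time.Sleep(time.Until(time.Unix(0, t.deadline))) // no-op once past due
            t.cont()
        }
        fmt.Println("remaining:", remaining)
    }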

Eyeballing the graph, it looks like the winner is around 200MB for 1M concurrent tasks, so about 4x worse than a reasonably efficient but not heavily optimized implementation would be.

I have no idea what Go is doing to get 2500 bytes per task.


> I have no idea what Go is doing to get 2500 bytes per task.

TFA creates a goroutine (green thread) for each task (using a waitgroup to synchronise them). IIRC goroutines default to 2k stacks, so that’s about right.

One could argue it's not fair and it should be timers, which would be much lighter. There's no "efficient wait" for them, but that's essentially the same as the appendix Rust program.
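For comparison, a rough sketch of the timer idea in Go, using time.AfterFunc so that no goroutine (and no 2k stack) exists while a timer is pending; note the callback still runs on a fresh goroutine once each timer fires:

    package main

    import (
        "sync"
        "time"
    )

    func main() {
        const numTasks = 1_000_000
        var wg sync.WaitGroup
        wg.Add(numTasks)
        for i := 0; i < numTasks; i++ {
            // A pending timer is a small heap object managed by the runtime;
            // wg.Done runs on a short-lived goroutine when the timer fires.
            time.AfterFunc(10*time.Second, wg.Done)
        }
        wg.Wait()
    }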



It would be nice if the author also compared different runtimes (e.g. Node.js vs Deno, or CPython vs PyPy) and core language engines (e.g. V8 vs SpiderMonkey vs JavaScriptCore).

> high number of concurrent tasks can consume a significant amount of memory

Note the absolute numbers here: in the worst case, 1M tasks consumed 2.7 GB of RAM, with ~2700 bytes of overhead per task. That'd still fit in the cheapest server with room to spare.

My conclusion would be the opposite: as long as per-task data is more than a few KB, the memory overhead of the task scheduler is negligible.


I write (async) Rust regularly, and I don't understand how the version in the appendix doesn't take 10x1,000,000 seconds to complete. In other words, I'd have expected no concurrency to take place.

Am I wrong?

UPDATE: From the replies below, it looks like I was right that "no concurrency takes place", but I was wrong about how long it takes, because `tokio::time::sleep()` keeps track of when the future was created (i.e. when `sleep()` was called) instead of when the future is first `.await`ed (which was my unstated assumption).


The implementation of `sleep` [1] decides the wake-up time based on when `sleep` is called, rather than when its future is polled. So the first task waits the full ten seconds, then the remaining tasks see that they have already passed their wake-up time and return instantly.

[1]: https://docs.rs/tokio/latest/tokio/time/fn.sleep.html


This makes total sense!

tokio::sleep is async

I think the points people made in other replies make sense, but "tokio::sleep is async" by itself is not enough of an explanation. If `tokio::sleep()` tracked the moment `.await` was called as its start time, I believe it would indeed take 10x1,000,000 seconds, _even if it's async_.

Yeah, I think you're wrong. It should only take ~10s. tokio::time::sleep records the time it was called before returning the future [1]. So, all 1 million tasks should be stamped with +/- the same time (within a few milliseconds).

[1]: https://docs.rs/tokio/1.41.1/src/tokio/time/sleep.rs.html#12...


This makes total sense!

The C# version will copy the list into an array during Task.WhenAll; it may save some memory to use an array directly.

Source: https://github.com/microsoft/referencesource/blob/master/msc...


The referencesource repository is only relevant if you are using the legacy .NET Framework. Modern .NET has a special case for passing a List<Task> and avoids the allocation:

https://github.com/dotnet/runtime/blob/1f01bee2a41e0df97089f...


It doesn't take that much space, and not all languages have a way to easily map an initial range onto an iterator that produces tasks. Most are dominated by the size of their state machines/virtual threads.

Please note that the link above leads to old code from .NET Framework.

The up-to-date .NET implementation lives here: https://github.com/dotnet/runtime/blob/main/src/libraries/Sy...


This benchmark is nonsense. Apart from the fact that Go has an average goroutine overhead of a 4 kB stack (meaning an average usage of 3.9 GB for 1M tasks), the code is also written in a closure and schedules a 2nd goroutine in the wg.Done(), so unlike some of the others it had at least 2M function calls on the event loop stack in addition to at least 1M closure references. So yeah, it's a great example of bad code in any language.

Here's an implementation in C# that more faithfully matches what you have to do in Go:

    var count = int.Parse(args[0]);
    var countdown = new CountdownEvent(count);
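    // Fire-and-forget: each task is started without being awaited;
    // the CountdownEvent blocks Wait() until all tasks have signalled.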
    for (var i = 0; i < count; i++) {
        async Task Execute() {
            await Task.Delay(TimeSpan.FromSeconds(10));
            countdown.Signal();
        }
        _ = Execute();
    }
    countdown.Wait();
It ends up consuming roughly 264.5 MB on ARM64 macOS 15.1.1 (compiled with NativeAOT).

Why is C with pthreads missing from this benchmark?

I don't think 1M POSIX threads is a thing. 1K is no big deal though.

~100k is a thing on Linux.

Be sure to read the appendix; Rust's state-machine async implementation is indeed very efficient.

NodeJS is better at memory than Go?

I'd expect that, because Promises are small JavaScript objects while goroutines each get a stack that grows from at least 2 KB.

Otoh Go actually supports concurrency.

Well, they are all concurrent. I think what you mean is that Go is also parallel. As are C#, Rust, and Java in this bench.

No baseline against UNIX processes?

A million processes?

I've seen 100k. What happens at a million? How many is unworkable?

NodeJS does what it was designed to do well.

I wonder how compiled JS (using Deno or others) would perform.

depends on the tasks.

Yet again Node.js surpasses my pre-read expectations. Third best (generalized) for a million? Wow.

I must be missing something - isn’t Go supposed to be memory efficient? Perhaps promises and goroutines aren’t comparable?


I'm not sure what "memory efficient" means. But Go sprang up as a competitor to Java (portability, language stability, corporate language support/development) and C++ (faster compile times). You can't beat C++ in terms of memory management (performance, guys, not safety) by much. But you can fare well against the JVM, I'm guessing.

In this benchmark, actually no, Go doesn't fare well. There is actually higher static overhead per goroutine than per JVM VirtualThread. I presume this is because of a larger initial stack size, though.

This probably doesn't matter in the real world, as you will actually use the tasks to do some real work, which should dwarf the static overhead in almost all cases.


To add a data point for Elixir: https://gist.github.com/neon-sunset/8fcc31d6853ebcde3b45dc7a...

Note 1: The gist is in Ukrainian, and the blog post by Steve does a much better job, but hopefully you will find this useful. Feel free to replicate the results and post them.

Note 2: The absolute numbers do not necessarily imply good/bad. Both Go and BEAM focus on userspace scheduling and its fairness. Stackful coroutines have their own advantages. I think where the blog post's data is most relevant is in understanding the advantages of stackless coroutines when it comes to "highly granular" concurrency - dispatching concurrent requests, fanning out to process many small items at once, etc. In any case, I did not expect sibling comments to go on to praise Node.js; is it really that surprising for event-loop-based concurrency? :)

Also, if you are an Elixir aficionado and were impressed by C#'s numbers - know that they translate ~1:1 to F# now that it has task CE, just sayin'.



