> - Writing to a fresh file is slower than writing to an existing file
> mold can link a 4 GiB LLVM/clang executable in ~1.8 seconds on my
> machine if the linker reuses an existing file and overwrites it.
> However, the speed decreases to ~2.8 seconds if the output file does
> not exist and mold needs to create a fresh file. I tried using
> fallocate(2) to preallocate disk blocks, but it didn't help. While 4
> GiB is not small, should creating a file really take almost a second?
1s difference here is insane these days. There must be something weird going on, even if the physical disk is one of those ancient spinning things.
He doesn't specify what file system he's using, but offhand, you would assume that what actually takes time isn't creating the file itself, but rather allocating all the blocks. A good first step would be to reproduce the issue and take a profile of both cases.
xfs, ext4 and btrfs all have delayed allocation, so they should only synchronously allocate the blocks if there is memory pressure or if they're triggering those well-meant (but in this case counterproductive) auto_da_alloc heuristics.
It must be stressful to write to LKML knowing that Phoronix is hiding behind a corner, waiting to sensationalize every little thing you write that sounds like it could give them free clicks, even before any sensible discussion of what you wrote has a chance of happening.
Presumably the existing file is already backed by pages in the page cache, while the new one still has to be allocated (+ whatever the IO subsystem is doing).
I'm interested in knowing what kind of workload this is targeting, with multi-GB executables being built at such a pace that a 0.2 second wait between them is unacceptable.
Performance optimization is not always just low-hanging fruit. When you start trying to optimize something, there's often large bottlenecks to clean up -- "we can add a hashmap here to turn this O(n^2) function into O(n) and improve performance by 10x" kind of thing. But once you've got all the easy stuff, what's left is the little things -- a 1% improvement here, a 0.1% improvement there. If you can find 10 of those 1% improvements, now you've got a 10% performance improvement. 0.2 seconds on its own isn't that much, but the reason mold is so fast is because the author has found a lot of 0.2 second improvements.
And even disregarding that, the linked LKML post
mentions LLVM/clang as a case of building a 4 GB executable. If you've ever built the LLVM project from source, there are about 50ish (?) binaries that need to be linked at the end of the build process -- it's not just clang, but all sorts of other tools, intermediate libraries and debugging utilities. So that is an example of a workload with "multi-GB executables being built at such a pace" -- saving 0.2 seconds per executable saves something like 10 seconds on the build.
I'm well aware of the joys of optimization, I just haven't come across someone building multi-GB executables at a pace where milliseconds spent linking mattered.
To me that's an exotic workload which sounds interesting, hence why I'm curious.
Well, keep in mind that the full linking step has to be done at the end of an incremental build. So if you're a developer actively working on a project with a 4GB executable, that linking time is part of your edit-compile-test cycle, and you have to wait for it every time you change a line of code.
The benchmarks on mold's README show that GNU gold takes 33 seconds to link clang, whereas mold takes 1.3 seconds. If you're a developer working on Clang, that's a pretty serious productivity improvement.
It's more about sending a message, and I support the idea very much.
It always starts with "it's only a fraction of a second" or "just 100kb more Javascript to load", and suddenly every website pulls in 25MB JS at least, and starting Windows calculator shows a splash screen (on a modern machine), because it takes that long to start up.
Sure, as I mentioned I'm just genuinely curious what the use-case is.
For example, running a compiler test suite I could understand, that would be quite impacted. But those tests wouldn't be multi-GB builds for the most part.
And I could understand being annoyed by it, but the author took steps to work around it, which is why I said they found it unacceptable.