Optimising the PureScript Compiler
2024-12-29: This post is from a former employer’s blog. I’m particularly proud of this contribution to PureScript, and the blog is no longer around, so here is the post.
I work on the SlamData front-end, which is completely written in PureScript. I work with 2 other PureScript engineers and we’ve noticed some slow compilation while working on the UI. If we have to modify one of the core modules, it could take up to 2.5 minutes to compile the application. Not a great situation when we want to iterate on our UI changes.
I have some experience with profiling Haskell applications and so had a look. I kept notes and wrote this blog post up to help people optimise Haskell code and the PureScript compiler in the future.
I use NixOS for pretty much everything. Here’s the Nix expression I wrote (with thanks to the cabal2nix docs) to make a PureScript installation which had profiling enabled:
with import { };
let
packages =
{
haskell.packages.ghc7101.override overrides = self: super: {
mkDerivation = args: super.mkDerivation (args // {
enableLibraryProfiling = true;
});
};
};
in
[
haskell.lib.appendConfigureFlag packages.purescript "–enable-executable-profiling"
"–ghc-option=-fprof-auto"
]
The -fprof-auto
flag enables automatic cost centre creation. Alternatively I could have started with -fprof-auto-top
and -fprof-auto-exported
to get a general look first.
Haskell programs provide an +RTS
option to tell the runtime what to do. The -p
flag tells the runtime to profile and dump out a file:
psc-make -o dist/node_modules \
bower_components/purescript-*/src/**/*.purs \
src/**/*.purs \
+RTS -p
Profiling of course slows things down a bit. Compilation of the UI went up to over 3.5 minutes when a core module was touched.
But it gave a nice psc-make.prof
file. Here’s what the top cost centres were:
COST CENTRE MODULE %time %alloc
typeCheckAll.go Language.PureScript.TypeChecker 20.7 11.3
== Language.PureScript.Names 17.9 0.0
fmap Language.PureScript.TypeChecker.Monad 10.0 26.4
entails.filterModule Language.PureScript.TypeChecker.Entailment 8.0 0.0
compare Language.PureScript.Names 5.5 0.0
typeCheckAll.go.instances.\\ Language.PureScript.TypeChecker 5.5 0.0
everywhereOnTypesTopDownM.go Language.PureScript.Types 4.4 17.4
fmap Control.Monad.Unify 3.6 11.1
profiteur is a great tool which can load .prof files and visualise where time is spent. After clicking a few times I saw something like this:
Which helped confirm that we were spending a lot of time running the typeCheckAll.go
clause. I went down to the “cost-centre stack” and looked for that identifier:
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
typeCheckAll.go Language.PureScript.TypeChecker 32777 0 20.7 11.3 96.3 85.3
typeCheckAll.go.instances.\\ Language.PureScript.TypeChecker 36234 476948645 5.5 0.0 17.0 0.0
== Language.PureScript.Names 36238 476948645 10.1 0.0 10.1 0.0
typeCheckAll.go.instances..mn Language.PureScript.TypeChecker 36237 476948645 0.0 0.0 0.0 0.0
typeCheckAll.go.instances..(...) Language.PureScript.TypeChecker 36235 476948645 1.5 0.0 1.5 0.0
tcdName Language.PureScript.TypeClassDictionaries 36236 476948645 0.0 0.0 0.0 0.0
And saw that 96.3% of the total time was under that clause, not good. The typeCheckAll.go.instances.\
entry referred to a lambda called 476,948,645 times! And names were checked for equality each time that lambda was called, adding to 10.1% of the total time running the program. Removing name equality tests seemed like something which could be fixed.
Here’s what the part of the function looked like:
@(ImportDeclaration importedModule _ _) : rest) = do
go (d<- getTypeClassDictionaries
tcds let instances = filter (\tcd -> let Qualified (Just mn) _ = tcdName tcd in importedModule == mn) tcds
= Qualified (Just moduleName) ident, tcdType = TCDAlias (canonicalizeDictionary tcd) }
addTypeClassDictionaries [ tcd { tcdName | tcd <- instances
, tcdExported tcdlet (Qualified _ ident) = tcdName tcd
,
]<- go rest
ds return $ d : ds
Each time we imported a module, we found all type-class dictionaries and filtered down the ones which were in that module. We just needed a way to index to dictionaries by modules.
Turns out the dictionaries were already stored in a Map:
typeClassDictionaries ::
M.Map (Qualified Ident, Maybe ModuleName) TypeClassDictionaryInScope
But indexed with the identifier. I took a while to see if the compiler actually did a lookup by the name, but I couldn’t easily see it. Instead of spending time looking for where, I decided to just delete the identifier part of the key and see what broke. Turns out that after preserving the “add this dictionary” and “return all dictionaries” functions, everything compiled. We never actually did a look up by the keys, we just used the Map for uniqueness.
We could make the lookup efficient and guarantee uniqueness by changing the type to:
typeClassDictionaries ::
M.Map (Maybe ModuleName) (M.Map (Qualified Ident) TypeClassDictionaryInScope)
Easy.
Now I just needed to verify that I actually improved speed. Haskell has great tooling for this too, with the Criterion library.
Here is a program which runs the previous compiler and the updated compiler, updating a file to trigger rebuilding of certain modules:
import Criterion.Main
import System.FilePath.Glob
import System.Posix.Files
import System.Process
= "./psc-make-old"
unpatched = "./psc-make-new"
patched
make :: String -> IO String
= do
make c <- glob "bower\_components/purescript-_/src/**/\*.purs"
bowerFiles <- glob "src/**/_.purs" touchFile "src/EffectTypes.purs"
srcFiles "-o", "dist/node\_modules"] ++ bowerFiles ++ srcFiles) ""
readProcess c ([
main :: IO ()
=
main
defaultMain ["unpatched" $ nfIO (make unpatched),
bench "patched" $ nfIO (make patched)
bench ]
This took a long time to run, since a single run of the original code took 2.5 minutes. An hour or so later, I got these results:
benchmarking unpatched time 142.9 s (133.9 s .. 149.8 s) 1.000 R² (0.998 R² .. 1.000 R²) mean 145.8 s (144.0 s .. 146.9 s) std dev 1.699 s (0.0 s .. 1.938 s) variance introduced by outliers: 19% (moderately inflated)
benchmarking patched time 56.14 s (41.02 s .. 68.60 s) 0.992 R² (0.969 R² .. 1.000 R²) mean 58.99 s (56.64 s .. 60.73 s) std dev 2.642 s (0.0 s .. 3.008 s) variance introduced by outliers: 19% (moderately inflated)
We got similar variance but ~86s taken off each run. Around a 2.5x speedup over the original.
I hope this shows how easy it can be to accidentally quadratic but also how easy it can be to profile and optimise Haskell programs. There’s much more we can do to make the PureScript compiler even quicker and thankfully we have access to Haskell’s great tooling.