BAM Weblog

Optimising the PureScript Compiler

Brian McKenna — 2015-06-11

2024-12-29: This post is from a former employer’s blog. I’m particularly proud of this contribution to PureScript, and the blog is no longer around, so here is the post.

I work on the SlamData front-end, which is completely written in PureScript. I work with two other PureScript engineers, and we've noticed some slow compilation while working on the UI. If we modify one of the core modules, it can take up to 2.5 minutes to recompile the application. Not a great situation when we want to iterate on UI changes.

I have some experience profiling Haskell applications, so I had a look. I kept notes and wrote them up as this blog post, to help people optimise Haskell code and the PureScript compiler in the future.

I use NixOS for pretty much everything. Here's the Nix expression I wrote (with thanks to the cabal2nix docs) to build a PureScript installation with profiling enabled:

with import <nixpkgs> { };

let
  packages =
    haskell.packages.ghc7101.override {
      overrides = self: super: {
        mkDerivation = args: super.mkDerivation (args // {
          enableLibraryProfiling = true;
        });
      };
    };
in
  haskell.lib.appendConfigureFlag packages.purescript [
    "–enable-executable-profiling"
    "–ghc-option=-fprof-auto"
  ]
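
Assuming the expression above is saved as psc-profiling.nix (a filename of my choosing), the profiled compiler can then be built with:

nix-build psc-profiling.nix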

The -fprof-auto flag adds a cost centre to every binding automatically. Alternatively, I could have started with -fprof-auto-top or -fprof-auto-exported to get a more general look first.
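
If automatic cost centres turn out too noisy, GHC also supports placing cost centres by hand with SCC pragmas. A minimal sketch, with a made-up function:

-- An SCC pragma attaches a named cost centre to a single
-- expression, giving it its own row in the .prof report.
sumDoubled :: [Int] -> Int
sumDoubled xs = {-# SCC "sumDoubled" #-} sum (map (* 2) xs)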

Haskell programs built by GHC accept +RTS flags that tell the runtime system what to do. With a profiling build, the -p flag tells the runtime to profile the run and dump out a report file:

psc-make -o dist/node_modules \
  bower_components/purescript-*/src/**/*.purs \
  src/**/*.purs \
  +RTS -p

Profiling of course slows things down a bit. Compilation of the UI went up to over 3.5 minutes when a core module was touched.

But it gave a nice psc-make.prof file. Here’s what the top cost centres were:

COST CENTRE                   MODULE                                      %time   %alloc

typeCheckAll.go               Language.PureScript.TypeChecker             20.7    11.3
==                            Language.PureScript.Names                   17.9     0.0
fmap                          Language.PureScript.TypeChecker.Monad       10.0    26.4
entails.filterModule          Language.PureScript.TypeChecker.Entailment   8.0     0.0
compare                       Language.PureScript.Names                    5.5     0.0
typeCheckAll.go.instances.\   Language.PureScript.TypeChecker              5.5     0.0
everywhereOnTypesTopDownM.go  Language.PureScript.Types                    4.4    17.4
fmap                          Control.Monad.Unify                          3.6    11.1

profiteur is a great tool which loads .prof files and visualises where the time is spent as a zoomable treemap.
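
Pointing it at the profile produces an HTML visualisation to open in a browser; it should write psc-make.prof.html next to the input:

profiteur psc-make.prof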

Clicking around in the treemap for a few moments confirmed that we were spending a lot of time running the typeCheckAll.go clause. I went down to the “cost-centre stack” section of the .prof file and looked for that identifier:

                                                                                                individual   inherited
COST CENTRE                       MODULE                                     no.    entries    %time %alloc %time %alloc

typeCheckAll.go                   Language.PureScript.TypeChecker            32777          0   20.7   11.3  96.3   85.3
typeCheckAll.go.instances.\       Language.PureScript.TypeChecker            36234  476948645    5.5    0.0  17.0    0.0
==                                Language.PureScript.Names                  36238  476948645   10.1    0.0  10.1    0.0
typeCheckAll.go.instances.\.mn    Language.PureScript.TypeChecker            36237  476948645    0.0    0.0   0.0    0.0
typeCheckAll.go.instances.\.(...) Language.PureScript.TypeChecker            36235  476948645    1.5    0.0   1.5    0.0
tcdName                           Language.PureScript.TypeClassDictionaries  36236  476948645    0.0    0.0   0.0    0.0

This showed that 96.3% of the total time was spent under that clause. Not good. The typeCheckAll.go.instances.\ entry referred to a lambda called 476,948,645 times! Names were checked for equality on every one of those calls, adding up to 10.1% of the program's total running time. Getting rid of those name equality tests looked like a promising fix.

Here’s what the part of the function looked like:

go (d@(ImportDeclaration importedModule _ _) : rest) = do
  tcds <- getTypeClassDictionaries
  let instances = filter (\tcd -> let Qualified (Just mn) _ = tcdName tcd in importedModule == mn) tcds
  addTypeClassDictionaries [ tcd { tcdName = Qualified (Just moduleName) ident, tcdType = TCDAlias (canonicalizeDictionary tcd) }
                           | tcd <- instances
                           , tcdExported tcd
                           , let (Qualified _ ident) = tcdName tcd
                           ]
  ds <- go rest
  return $ d : ds

Each time a module was imported, we fetched every type-class dictionary in scope and filtered down to the ones belonging to that module. Every import scanning every dictionary means the work grows as imports × dictionaries, which is exactly where those hundreds of millions of calls came from. What we needed was a way to index the dictionaries by module.

Turns out the dictionaries were already stored in a Map:

typeClassDictionaries ::
  M.Map (Qualified Ident, Maybe ModuleName) TypeClassDictionaryInScope

But it was keyed by the identifier as well. I took a while trying to find out whether the compiler ever looked a dictionary up by that name, but I couldn't easily see it. Instead of spending more time searching, I decided to just delete the identifier part of the key and see what broke. It turns out that after preserving the “add this dictionary” and “return all dictionaries” functions, everything compiled. We never actually did a lookup by those keys; we just used the Map for uniqueness.

We could make the lookup efficient and guarantee uniqueness by changing the type to:

typeClassDictionaries ::
  M.Map (Maybe ModuleName) (M.Map (Qualified Ident) TypeClassDictionaryInScope)

Easy.
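
To make the win concrete, here's a rough sketch (not the compiler's actual code) of what insertion and the per-module lookup can look like with the new type. The helper names are invented; the types come from the compiler modules seen in the profile:

import qualified Data.Map as M

import Language.PureScript.Names (Ident, ModuleName, Qualified)
import Language.PureScript.TypeClassDictionaries (TypeClassDictionaryInScope)

type Dicts =
  M.Map (Maybe ModuleName) (M.Map (Qualified Ident) TypeClassDictionaryInScope)

-- Insert one dictionary under its module. The inner key still
-- guarantees uniqueness; insertWith M.union is left-biased, so
-- re-adding a dictionary replaces the old entry.
addDict :: Maybe ModuleName -> Qualified Ident
        -> TypeClassDictionaryInScope -> Dicts -> Dicts
addDict mn name tcd = M.insertWith M.union mn (M.singleton name tcd)

-- All dictionaries for one module: a single O(log n) lookup
-- instead of a filter over every dictionary in scope.
dictsFor :: Maybe ModuleName -> Dicts -> [TypeClassDictionaryInScope]
dictsFor mn = maybe [] M.elems . M.lookup mn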

Now I just needed to verify that the change actually improved speed. Haskell has great tooling for this too: the Criterion library.

Here's a program which benchmarks both the previous compiler and the updated one, touching a file to trigger a rebuild of the modules that depend on it:

import Criterion.Main
import System.FilePath.Glob
import System.Posix.Files
import System.Process

unpatched = "./psc-make-old"
patched = "./psc-make-new"

make :: String -> IO String
make c = do
  bowerFiles <- glob "bower_components/purescript-*/src/**/*.purs"
  srcFiles <- glob "src/**/*.purs"
  -- Touch a core module so everything that depends on it is rebuilt.
  touchFile "src/EffectTypes.purs"
  readProcess c (["-o", "dist/node_modules"] ++ bowerFiles ++ srcFiles) ""

main :: IO ()
main =
  defaultMain [
    bench "unpatched" $ nfIO (make unpatched),
    bench "patched" $ nfIO (make patched) 
  ]
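
Assuming the program is saved as Bench.hs (a filename of my choosing) and the criterion, Glob, process and unix packages are installed, it builds and runs like any other Haskell program:

ghc -O2 Bench.hs -o bench
./bench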

This took a long time to run, since a single run of the original code took 2.5 minutes. An hour or so later, I got these results:

benchmarking unpatched
time                 142.9 s    (133.9 s .. 149.8 s)
                     1.000 R²   (0.998 R² .. 1.000 R²)
mean                 145.8 s    (144.0 s .. 146.9 s)
std dev              1.699 s    (0.0 s .. 1.938 s)
variance introduced by outliers: 19% (moderately inflated)

benchmarking patched
time                 56.14 s    (41.02 s .. 68.60 s)
                     0.992 R²   (0.969 R² .. 1.000 R²)
mean                 58.99 s    (56.64 s .. 60.73 s)
std dev              2.642 s    (0.0 s .. 3.008 s)
variance introduced by outliers: 19% (moderately inflated)

We got similar variance, but roughly 86 seconds came off each run: the mean dropped from 145.8 s to 58.99 s, around a 2.5x speedup over the original.

I hope this shows how easy it is to write accidentally quadratic code, but also how easy it can be to profile and optimise Haskell programs. There's much more we can do to make the PureScript compiler even quicker, and thankfully we have access to Haskell's great tooling.
