Optimizing Run Time for Large Universe
22 Apr 2015
To install the Systematic Investor Toolbox (SIT), please visit the About page.
Recently, I was asked whether the Systematic Investor Toolbox (SIT)
back-testing framework is suitable for running large-universe back-tests.
Below, I present the steps I took to test and optimize the running time.
I will base this experiment on the Volatility Quantiles
post I wrote back in 2012.
First, let’s generate 10 years of daily data for 25,000 securities.
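As an illustration only (this is not the original code from the post), a scaled-down synthetic universe could be generated in base R roughly as follows. The variable names, the normal return model, and the 100-security size are assumptions made for the sketch.

```r
# Hypothetical scaled-down universe; the post uses 25,000 securities.
set.seed(1)
n.days   <- 10 * 252   # roughly 10 years of trading days
n.assets <- 100        # raise towards 25000 only if memory allows

# Daily returns: one row per day, one column per security.
ret <- matrix(rnorm(n.days * n.assets, mean = 0, sd = 0.01),
              nrow = n.days, ncol = n.assets)
colnames(ret) <- paste0("A", seq_len(n.assets))

# Turn returns into price paths that start near 100.
prices <- 100 * apply(1 + ret, 2, cumprod)

dim(prices)
```

Keeping the data in plain numeric matrices (days in rows, securities in columns) is a choice made here so that later column-wise operations stay simple.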
Please note that this code requires lots of memory to store results;
hence, you might want to add
--max-mem-size=8000M to your R command-line options.
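To see why memory matters at this scale, here is a rough back-of-the-envelope estimate. It assumes 252 trading days per year and dense double-precision matrices; the exact shapes used in the post may differ.

```r
n.days   <- 10 * 252
n.assets <- 25000

# A double takes 8 bytes, so one dense numeric matrix of this shape needs:
bytes <- n.days * n.assets * 8
mb    <- bytes / 2^20   # same Mb unit that gc() reports
mb
```

That is roughly 480 Mb per matrix, and a back-test typically keeps several of them alive at once (prices, returns, volatility, weights), so peak usage can reach a few gigabytes.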
Next, I tried to run the original code from the Volatility Quantiles
post and eventually ran out of memory.
Below, I modified the original code to make it work.
Elapsed time is 48.12 seconds
           used  (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   561806  30.1     984024   52.6    984024   52.6
Vcells 81877454 624.7  199500757 1522.1 199499815 1522.1
Elapsed time is 10.63 seconds
Elapsed time is 50.27 seconds
Elapsed time is 109.02 seconds
Looking at the timing, there are two bottlenecks:
- computation of historical one-year volatility
- creating the back-test for each quantile
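For readers who want to reproduce this kind of stage-by-stage timing without SIT, here is a minimal base R sketch. The timeit helper is hypothetical (it is not a SIT function), and the naive rolling standard deviation is only meant to show why the volatility step is expensive; the data is a small toy matrix.

```r
# Hypothetical helper (not part of SIT): evaluate an expression, report elapsed time.
timeit <- function(label, expr) {
  elapsed <- system.time(expr)[["elapsed"]]
  cat(sprintf("%s: elapsed time is %.2f seconds\n", label, elapsed))
  invisible(elapsed)
}

set.seed(1)
n <- 60  # short window for the toy data; the post uses one year
ret <- matrix(rnorm(500 * 50, sd = 0.01), nrow = 500, ncol = 50)

# Naive rolling standard deviation: one sd() call per day and per security.
t.vol <- timeit("rolling volatility", {
  vol <- apply(ret, 2, function(x) {
    out <- rep(NA_real_, length(x))
    for (i in n:length(x)) out[i] <- sd(x[(i - n + 1):i])
    out
  })
})
```

The number of sd() calls grows with days times securities, which is why this step dominates once the universe reaches tens of thousands of columns.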
To address the running time of the historical one-year volatility computation, I tried
Elapsed time is 19.56 seconds
Elapsed time is 17.43 seconds
Elapsed time is 0.54 seconds
Elapsed time is 37.67 seconds
Not much faster: it takes almost 19 seconds just to move the large
ret matrix around.
Next, I looked at RcppParallel for help.
Please save the above code in the
runsd.cpp file, or download runsd.cpp.
Elapsed time is 0.98 seconds
The Rcpp helper function took less than a second. This is the improvement I was looking for.
Next, let’s address the run time for creating the back-test for each quantile. Please notice that
instead of lagging and back-filling weights for each quantile, we can do it once to save time.
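The idea of lagging once can be sketched on toy data as follows. This is an assumption-laden illustration in base R rather than the SIT code: quantile labels are computed daily from a random stand-in for volatility, portfolios are equal-weighted, and back-filling between rebalance dates is not shown because the toy labels change every day.

```r
set.seed(7)
n.days <- 8; n.assets <- 10; n.quantiles <- 5
vol <- matrix(runif(n.days * n.assets), n.days, n.assets)            # stand-in volatility
ret <- matrix(rnorm(n.days * n.assets, sd = 0.01), n.days, n.assets)

# Quantile label (1..n.quantiles) of each asset on each day, by volatility rank.
quantile.id <- t(apply(vol, 1, function(v)
  ceiling(n.quantiles * rank(v, ties.method = "first") / length(v))))

# Lag the labels ONCE: positions held on day t use labels known on day t-1.
lagged.id <- rbind(NA, quantile.id[-n.days, , drop = FALSE])

# Each quantile's equal-weight return then needs only a comparison, no further lagging.
quantile.ret <- sapply(seq_len(n.quantiles), function(q) {
  member <- lagged.id == q
  member[is.na(member)] <- FALSE             # no positions on the first day
  rowSums(ret * member) / pmax(rowSums(member), 1)
})

dim(quantile.ret)
```

The large matrix is shifted a single time, and each quantile only pays for a cheap logical comparison against the already-lagged labels.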
The following is the final code, which runs in about 28 seconds.
Elapsed time is 6.85 seconds
Elapsed time is 0.86 seconds
Elapsed time is 11.4 seconds
            used   (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells    637966   34.1    1073225   57.4   1012438   54.1
Vcells 208010169 1587.0  518657040 3957.1 518656088 3957.1

            used   (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells    638113   34.1    1073225   57.4   1012438   54.1
Vcells 154037802 1175.3  414925632 3165.7 518656088 3957.1
Elapsed time is 9.17 seconds
Elapsed time is 0.03 seconds
Elapsed time is 28.31 seconds
Aside: Rcpp helper function testing:
bt.apply.matrix(ret, runSD, 252)
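A check of this kind compares a fast implementation against a slow but obvious reference. In the sketch below, a base R cumulative-sum version stands in for the Rcpp helper (whose code is not reproduced here), and a plain loop over sd() stands in for the reference call above. Both function names are hypothetical.

```r
# Reference: rolling SD of one column, one window at a time (slow but obvious).
roll.sd.ref <- function(x, n) {
  out <- rep(NA_real_, length(x))
  if (length(x) >= n)
    for (i in n:length(x)) out[i] <- sd(x[(i - n + 1):i])
  out
}

# Candidate: all columns at once via cumulative sums (stand-in for the Rcpp helper).
roll.sd.cumsum <- function(m, n) {
  stopifnot(is.matrix(m), n >= 2, nrow(m) >= n)
  zero <- matrix(0, 1, ncol(m))
  s1 <- rbind(zero, apply(m, 2, cumsum))     # running sum of x
  s2 <- rbind(zero, apply(m^2, 2, cumsum))   # running sum of x^2
  idx <- n:nrow(m)
  sum1 <- s1[idx + 1, , drop = FALSE] - s1[idx - n + 1, , drop = FALSE]
  sum2 <- s2[idx + 1, , drop = FALSE] - s2[idx - n + 1, , drop = FALSE]
  v <- (sum2 - sum1^2 / n) / (n - 1)         # sample variance per window
  out <- matrix(NA_real_, nrow(m), ncol(m))
  out[idx, ] <- sqrt(pmax(v, 0))             # clamp tiny negative round-off
  out
}

set.seed(42)
ret  <- matrix(rnorm(300 * 5, sd = 0.01), 300, 5)
ref  <- apply(ret, 2, roll.sd.ref, n = 20)
fast <- roll.sd.cumsum(ret, 20)
isTRUE(all.equal(ref, fast, tolerance = 1e-8))
```

Testing on a small matrix first keeps the reference run cheap; the same comparison can then be repeated on a few columns of the full data set.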
(this report was produced on: 2015-05-01)