# Optimizing Run Time for Large Universe

22 Apr 2015

To install Systematic Investor Toolbox (SIT), please visit the About page.

Recently, I was asked whether the Systematic Investor Toolbox (SIT) back-testing framework is suitable for running back-tests on a large universe.

Below, I will present the steps I took to test and optimize the running time. I will base this experiment on the Volatility Quantiles post I wrote back in 2012.

First, let’s generate 10 years of daily data for 25,000 securities.

Please note that this code requires a lot of memory to store results; hence, you might want to add `--max-mem-size=8000M` to your `R` parameters.
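As a rough illustration of this step, here is a minimal sketch that simulates a price matrix in base R. The dimensions are deliberately tiny, and the i.i.d. normal returns and the names `ret`, `prices`, `n_days` and `n_sec` are assumptions made for the example, not the original data-generation code.

```r
# Sketch: simulate a price matrix (rows = dates, columns = securities).
# Dimensions are kept small; the post uses 10 years x 25,000 securities.
set.seed(1)
n_days <- 252 * 2   # assumed: two years of trading days
n_sec  <- 100       # assumed: 100 securities

# Daily returns: i.i.d. normal, an assumption made purely for illustration
ret <- matrix(rnorm(n_days * n_sec, mean = 0, sd = 0.01),
              nrow = n_days, ncol = n_sec)
colnames(ret) <- paste0("S", seq_len(n_sec))

# Prices start at 100 and compound the simulated returns column by column
prices <- 100 * apply(1 + ret, 2, cumprod)

# Approximate size, in Mb, of ONE double matrix at full scale
# (8 bytes per value, 252 * 10 days, 25,000 securities)
full_scale_mb <- 8 * (252 * 10) * 25000 / 1024^2
```

At full scale a single matrix of doubles is already several hundred megabytes, and a back-test keeps several such matrices alive at once (prices, returns, volatility, weights), which is why the memory limit matters here.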

Next, I tried to run the original code from the Volatility Quantiles post and eventually ran out of memory.

Below, I modified the original code to make it work.

Elapsed time is 48.12 seconds

```
           used  (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells   561806  30.1     984024   52.6    984024   52.6
Vcells 81877454 624.7  199500757 1522.1 199499815 1522.1
```

Elapsed time is 10.63 seconds

Elapsed time is 50.27 seconds

Elapsed time is 109.02 seconds

Looking at the timings, there are two bottlenecks:

- computation of historical one year volatility
- creating back-test for each quantile
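Per-step timings like the ones above can be collected with a small wrapper around `proc.time()`. The sketch below is a base R stand-in; `time_step` is a hypothetical helper introduced for illustration, not a SIT function.

```r
# Hypothetical helper: evaluate an expression and report elapsed seconds
time_step <- function(label, expr) {
  start <- proc.time()[["elapsed"]]
  value <- force(expr)   # expr is lazy, so it is evaluated here, after start
  elapsed <- proc.time()[["elapsed"]] - start
  cat(sprintf("%s: elapsed time is %.2f seconds\n", label, elapsed))
  invisible(list(value = value, elapsed = elapsed))
}

# Usage: time a column-wise standard deviation on a small random matrix
x <- matrix(rnorm(1e5), ncol = 100)
res <- time_step("column sd", apply(x, 2, sd))
```

Timing each stage separately, rather than the whole script, is what makes it possible to single out the two slow steps.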

To address the running time for the computation of historical one year volatility, I tried the following approaches:

Elapsed time is 19.56 seconds

Elapsed time is 17.43 seconds

Elapsed time is 0.54 seconds

Elapsed time is 37.67 seconds

Not much faster; it takes almost 19 seconds just to move the large `ret` matrix around.
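For reference, one generic way to avoid calling `sd` on every window is to derive rolling sums from cumulative sums and work on the whole matrix at once. The sketch below is not necessarily one of the approaches timed above; `run_sd_cumsum` is a hypothetical name, and the method trades some numerical precision for speed because it subtracts large running totals.

```r
# Hypothetical helper: rolling sample standard deviation over n rows,
# computed for every column at once from cumulative sums.
run_sd_cumsum <- function(x, n) {
  x <- as.matrix(x)
  nr <- nrow(x)
  stopifnot(n >= 2, n <= nr)

  # Prepend a zero row so each window sum is a simple difference
  cs1 <- rbind(0, apply(x, 2, cumsum))     # running sum of x
  cs2 <- rbind(0, apply(x^2, 2, cumsum))   # running sum of x^2

  idx <- n:nr                              # rows with a full window
  s1 <- cs1[idx + 1, , drop = FALSE] - cs1[idx - n + 1, , drop = FALSE]
  s2 <- cs2[idx + 1, , drop = FALSE] - cs2[idx - n + 1, , drop = FALSE]

  # Sample variance: (sum(x^2) - sum(x)^2 / n) / (n - 1)
  v <- (s2 - s1^2 / n) / (n - 1)

  out <- matrix(NA_real_, nr, ncol(x), dimnames = dimnames(x))
  out[idx, ] <- sqrt(pmax(v, 0))           # clamp tiny negative round-off
  out
}

# Usage on a small random return matrix with a 20-day window
set.seed(42)
x <- matrix(rnorm(300 * 4, sd = 0.01), 300, 4)
sd20 <- run_sd_cumsum(x, 20)
```

The first `n - 1` rows are left as `NA` because they do not have a full window.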

Next, I looked at RcppParallel for help.

Please save the above code in the `runsd.cpp` file, or download runsd.cpp.

Elapsed time is 0.98 seconds

The Rcpp helper function took less than a second. This is the improvement I was looking for.

Next, let’s address the run time for creating the back-test for each quantile. Notice that instead of lagging and back-filling weights separately for each quantile, we can do it once to save time.
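The idea can be sketched in base R as follows. This is an illustration under stated assumptions, not the SIT code: `signal` is a random stand-in for one year volatility, the portfolio rebalances daily with equal weights, and the names `quantile_id`, `held` and `quantile_ret` are hypothetical. Because the sketch rebalances every day, only the one-day lag is shown; any filling of missing weights would be applied to the same single matrix.

```r
set.seed(7)
n_days <- 60; n_sec <- 50; n_q <- 5
ret <- matrix(rnorm(n_days * n_sec, sd = 0.01), n_days, n_sec)

# Hypothetical ranking signal (stands in for one year volatility)
signal <- matrix(runif(n_days * n_sec), n_days, n_sec)

# Quantile membership per day: an integer 1..n_q for each security
quantile_id <- t(apply(signal, 1, function(s) {
  ceiling(n_q * rank(s, ties.method = "first") / length(s))
}))

# Lag ONCE: positions held on day t come from the signal on day t - 1
held <- rbind(NA, quantile_id[-n_days, , drop = FALSE])

# Equal-weight return of each quantile, reusing the single lagged matrix
quantile_ret <- sapply(seq_len(n_q), function(q) {
  w <- (held == q) * 1            # 1 if held in quantile q, NA on day 1
  rowSums(w * ret) / rowSums(w)
})
```

The per-quantile loop now only does a comparison and two row sums; the lag over the full matrix, which is the expensive part on a large universe, happens a single time.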

The following is the final code, which runs in about 28 seconds.

Elapsed time is 6.85 seconds

Elapsed time is 0.86 seconds

Elapsed time is 11.4 seconds

```
            used   (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells    637966   34.1    1073225   57.4   1012438   54.1
Vcells 208010169 1587.0  518657040 3957.1 518656088 3957.1
```

```
            used   (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells    638113   34.1    1073225   57.4   1012438   54.1
Vcells 154037802 1175.3  414925632 3165.7 518656088 3957.1
```

Elapsed time is 9.17 seconds

Elapsed time is 0.03 seconds

Elapsed time is 28.31 seconds

Aside: Rcpp helper function testing:

item1 | item2 | equal |
---|---|---|
r.sd252 | cp.sd252 | TRUE |

rownames(x) | test | replications | elapsed | relative |
---|---|---|---|---|
1 | cp_run_sd(ret, 252) | 20 | 0.16 | 1.000 |
2 | bt.apply.matrix(ret, runSD, 252) | 20 | 0.94 | 5.875 |
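A check of this kind, first confirming that two implementations agree and then timing both over a number of replications, can be sketched in base R. `compare_impl` is a hypothetical helper, and the two column-wise standard deviation functions are stand-ins chosen so the example runs without Rcpp or SIT.

```r
# Hypothetical helper: verify two implementations agree, then time both
compare_impl <- function(f_ref, f_new, x, replications = 20) {
  equal <- isTRUE(all.equal(f_ref(x), f_new(x)))
  time_of <- function(f) {
    system.time(for (i in seq_len(replications)) f(x))[["elapsed"]]
  }
  data.frame(test = c("reference", "candidate"),
             replications = replications,
             elapsed = c(time_of(f_ref), time_of(f_new)),
             equal = equal)
}

# Stand-in implementations: column-wise sample standard deviation
sd_apply <- function(m) apply(m, 2, sd)
sd_vectorized <- function(m) {
  centered <- sweep(m, 2, colMeans(m))   # subtract each column's mean
  sqrt(colSums(centered^2) / (nrow(m) - 1))
}

x <- matrix(rnorm(2000 * 50), 2000, 50)
res <- compare_impl(sd_apply, sd_vectorized, x)
```

Checking equality before timing matters: a faster helper is only useful if it reproduces the reference results within numerical tolerance.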

*(this report was produced on: 2015-05-01)*