Testing a strategy

The purpose of testing is to determine a strategy's profit expectancy in live trading. It is not possible to calculate this with an algorithm or formula; the only way to find out is to really trade it for a couple of years. A proxy for this is the backtest: simulating the strategy with a couple of years of historical price data. The problem is that a simple backtest will normally not produce a result that is representative of live trading. Backtest results are tainted by various sorts of bias (see Backtest theory) that must all be taken into account and eliminated from the result. Thus, testing a strategy is surprisingly complex.

Testing a script with Zorro

For quickly testing a strategy, just select the script and click [Test]. Depending on the strategy, a test can be finished in a second, or it can run for several minutes on complex portfolio strategies. If the test requires strategy parameters, capital allocation factors, or trade rules, they must be generated beforehand in a [Train] run. Otherwise Zorro will complain with an error message that the parameter, factor, or rule file was not found.

If the script needs to be recompiled, or if it contains any global or static variables that must be reset, click [Edit] before [Test]. This also resets the sliders to their default values. Otherwise the static and global variables and the sliders keep their settings from the previous test run.

If the test needs historical price data that is not available, a dialog will pop up and propose to download the missing data set from the broker. Downloaded price data is stored in the History folder in the way described under Data Import. If the broker does not offer price data for all years in the test period, substitute the missing years manually with files of length 0 to prevent the download dialog. No trades then take place during those years.

All test results - performance report, log file, trade list, chart - are stored in the Log folder. If the folder is cluttered with too many old log files and chart images, every 60 days a dialog will pop up and propose to automatically delete files that are older than 30 days. The minimum number of days can be set up in the Zorro.ini configuration file.

The test simulates a real broker account with a given leverage, spread, rollover, commission, and other asset parameters. By default, a microlot account with 100:1 leverage is simulated. If your account is very different, download your actual account data from the broker as described under data import. Simulated slippage, spread, rollover, commission, or pip cost can alternatively be set up in the script. If the NFA flag is set, either directly in the script or through the NFA parameter of the selected account, the simulation runs in NFA mode.

The test is not a real historical simulation. It rather simulates trades as if they were entered today, but with a historical price curve. For a real historical simulation, the spread, pip costs, rollovers and other asset and account parameters would have to be changed during the simulation according to their historical values. This can indeed be done by script, but is normally not recommended for strategy testing because it would add artifacts to the results.

The test runs through one or many sample cycles, either with the full historical price data set, or with out-of-sample subsets. It generates a number of equity curves that are then used for a Monte Carlo Analysis of the strategy. The optional Walk Forward Analysis applies different parameter sets so that the test always stays out-of-sample. Several switches (see Mode) affect test and data selection. During the test, the progress bar indicates the current position within the test period. The lengths of its green and red areas display the current gross win and gross loss of the strategy. The result window below shows some information about the current profit situation, in the following form:

 3:   +8192 +256  214/299  

3: Current oversampling cycle (if any).
+8192 Current balance.
+256 Current value of open trades.
214 Number of winning trades so far.
/299 Number of losing trades so far.
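
The fields of the result line can be read back programmatically, for instance when collecting results from a screenshot or a copied message. The following is a minimal sketch in plain C, not part of Zorro; the `ResultLine` type and both function names are hypothetical, and the sketch assumes that the oversampling cycle prefix is present.

```c
#include <stdio.h>

/* Hypothetical container for the fields of one result-window line. */
typedef struct {
    int cycle;          /* current oversampling cycle */
    double balance;     /* current balance */
    double open_value;  /* current value of open trades */
    int wins;           /* number of winning trades so far */
    int losses;         /* number of losing trades so far */
} ResultLine;

/* Parse a line such as " 3:   +8192 +256  214/299".
   Returns 1 on success, 0 if the line does not match this layout. */
int parse_result_line(const char *line, ResultLine *out)
{
    int n = sscanf(line, " %d: %lf %lf %d/%d",
                   &out->cycle, &out->balance, &out->open_value,
                   &out->wins, &out->losses);
    return n == 5;
}

/* Share of winning trades in percent; 0 if no trade has closed yet. */
double win_percentage(const ResultLine *r)
{
    int total = r->wins + r->losses;
    return total > 0 ? 100.0 * r->wins / total : 0.0;
}
```

For the example line above, 214 winning and 299 losing trades correspond to a win rate of roughly 41.7%.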

After the test, a performance report and - if the LOGFILE flag was set - a log file are generated in the Log folder. The result window displays the annual return (if any) in percent of the required capital, and the annual gain/loss in pips. Note that due to different pip costs of the assets, a portfolio strategy can end with a negative pip value even when the annual return is positive, or vice versa.

After the test, the info window displays the annual return in percent and in pips, and the message window contains a short version of the performance report. The content of the message window can be copied to the clipboard by double clicking on it. The following performance figures are displayed (for details see performance report):

Median AR Annual return by Monte Carlo analysis at 50% confidence level.
Profit Total profit of the system in units of the account currency.
MI Average monthly income of the system in units of the account currency.
DD Maximum balance-equity drawdown.
Capital Required initial capital.
Trades Number of trades in the backtest period.
Win Percentage of winning trades.
Avg Average profit/loss of a trade in pips.
Bars Average number of bars of a trade. Fewer bars mean less exposure to risk.
PF Profit factor, gross win divided by gross loss.
SR Sharpe ratio. Should be > 1 for good strategies.
UI Ulcer index, the average drawdown percentage. Should be < 10% for ulcer prevention.
R2 Determination coefficient, the equity curve linearity. Should be close to 1 for good strategies.
AR Annual return of the simulation, for non-reinvesting systems.
CAGR Compound annual growth rate, for reinvesting systems.
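
Some of these figures follow directly from the list of trade results. The sketch below, in plain C, shows how the profit factor, the win percentage, and a non-reinvesting annual return could be computed from the definitions given above. It is an illustration under simple assumptions (trade results in account currency, zero-result trades counted as losses) and not Zorro's actual implementation; all function names are hypothetical.

```c
#include <stddef.h>

/* Profit factor: gross win divided by gross loss.
   Returns 0 when there is no gross loss, since the ratio is undefined then. */
double profit_factor(const double *trade_results, size_t n)
{
    double gross_win = 0.0, gross_loss = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (trade_results[i] > 0.0)
            gross_win += trade_results[i];
        else
            gross_loss -= trade_results[i];  /* accumulate losses as positive */
    }
    return gross_loss > 0.0 ? gross_win / gross_loss : 0.0;
}

/* Percentage of trades that ended with a profit. */
double win_rate_percent(const double *trade_results, size_t n)
{
    size_t wins = 0;
    for (size_t i = 0; i < n; i++)
        if (trade_results[i] > 0.0)
            wins++;
    return n > 0 ? 100.0 * (double)wins / (double)n : 0.0;
}

/* Annual return of a non-reinvesting system, in percent of required capital. */
double annual_return_percent(double total_profit, double years, double capital)
{
    if (years <= 0.0 || capital <= 0.0)
        return 0.0;
    return 100.0 * (total_profit / years) / capital;
}
```

With the trade results 100, -50, 200, -100, the gross win is 300 and the gross loss 150, which gives a profit factor of 2 and a win percentage of 50.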

If [Result] is clicked after the test, the performance sheet and the trades & equity chart are displayed (see performance).

Single step mode

For debugging the trade behavior, a test can be run in single step mode by setting the STEPWISE flag. In this mode execution pauses after every bar, and the buttons change their behavior. [Step] moves one bar forward, [Skip] moves to the next bar at which a trade opens or closes. An HTML browser window will pop up and display the current chart and open trade status on every step. The window is refreshed every two seconds.

Single Step Debugging

The stepwise change of variables and indicators can be made visible either in the message window with a watch statement, in the browser window with print(TO_HTML, ...), or on the chart with plot. Setting PlotBars to a negative value displays only the last part of the chart, f.i. -300 displays the last 300 bars. For debugging single loops or function calls, watch ("!...", ...) statements can be used.

Stepwise debugging normally begins at the end of the LookBack period. For beginning at a certain date or bar number, set the STEPWISE flag dependent on a condition, f.i. if(date() >= 20150401) set(STEPWISE);.

Scalping and HFT strategies

The required price data resolution depends on the strategy to be tested. For long-term systems, such as options trading or portfolio rebalancing, daily data as provided by Yahoo™ or Quandl™ is normally sufficient. For day trading strategies, use historical data with 1-minute price ticks (M1 data, .t6 files) that is freely available from most broker APIs. For scalping strategies that open or close trades in minutes, you'll need quote based price ticks (T1 data, .t1 files). And for backtesting high frequency trading systems that must react in microseconds, you'll need order book data with exchange time stamps, as available from some data vendors in NxCore tape format. For testing with T1 or NxCore data, Zorro S is required.

The .t6 and .t1 file formats are described in the import chapter. In T1 data, every tick represents a price change by a new price quote. In M1 data, a tick represents one minute. Because many price quotes can arrive in a single second, T1 data contains a lot more price ticks than M1 data. Using T1 data affects the backtest, especially when the TICKS flag is set. Trade management functions (TMFs) are called more often, the tick function is executed more often, and trade entry and exit conditions are simulated with higher precision.
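
The relation between quote based ticks and one-minute ticks can be illustrated by collapsing a stream of quotes into minute candles. The plain C sketch below is only an illustration: the `Quote` and `Candle` types are hypothetical in-memory structures and not the .t1 or .t6 file layouts, time stamps are assumed to be non-negative whole seconds in ascending order, and candles are keyed by a simple minute index.

```c
#include <stddef.h>

/* Hypothetical in-memory types; not the .t1 or .t6 file layouts. */
typedef struct { long time_s; double price; } Quote;   /* one price quote */
typedef struct { long minute; double open, high, low, close; } Candle;

/* Collapse time-ordered quotes into one candle per minute.
   Writes at most max_out candles and returns how many were written. */
size_t quotes_to_m1(const Quote *q, size_t n, Candle *out, size_t max_out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        long minute = q[i].time_s / 60;  /* minute index of this quote */
        if (count > 0 && out[count - 1].minute == minute) {
            /* Quote falls into the current candle: update its range. */
            Candle *c = &out[count - 1];
            if (q[i].price > c->high) c->high = q[i].price;
            if (q[i].price < c->low)  c->low  = q[i].price;
            c->close = q[i].price;
        } else {
            /* Quote opens a new minute: start a new candle. */
            if (count == max_out)
                break;                   /* output buffer is full */
            out[count].minute = minute;
            out[count].open = q[i].price;
            out[count].high = q[i].price;
            out[count].low = q[i].price;
            out[count].close = q[i].price;
            count++;
        }
    }
    return count;
}
```

Any number of quotes within the same minute ends up in a single candle, which is why intra-minute price movements are visible to the strategy only in T1 data.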

For using T1 historical price data, just set the History string in the script accordingly, f.i. with History = ".t1". Zorro will then load its price history from .t1 files. Make sure that you have downloaded the required files beforehand, either with the Download script, or from the Zorro download page. Special streaming data formats such as NxCore should be directly read by script with the priceQuote function. An example can be found under Fill mode.

A backtest with T1 data in TICKS mode takes a long time and needs a lot of memory. Due to the high memory requirement, not more than one or two years can be backtested with T1 data. Even with non-scalping strategies T1 backtests can produce different results than M1 backtests. Differences arise from the different composition of M1 ticks on the broker's price server, and from different trade entry and exit prices. For instance, a trade at 15:00:01 would be entered at the first price quote after 15:00:01 in T1 mode, but at the close of the 15:00:00 M1 tick (or the open of the 15:01:00 tick) in M1 mode. Therefore M1 entry/exit prices can be off by one minute compared to T1 prices. If this difference is relevant for your strategy, test it with T1 data. Alternatively, the TickFix variable can shift M1 ticks forward or backward in time and compensate differences.
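
The timing difference in the 15:00:01 example can be made explicit with a little arithmetic. The plain C sketch below computes the minute boundaries before and after a given time of day; the function names are hypothetical, and times are assumed to be non-negative seconds since midnight.

```c
/* Convert a time of day to seconds since midnight. */
long hms_to_seconds(int h, int m, int s)
{
    return 3600L * h + 60L * m + s;
}

/* Last minute boundary at or before time t (seconds since midnight). */
long m1_boundary_before(long t)
{
    return t - t % 60;
}

/* First minute boundary strictly after time t (seconds since midnight). */
long m1_boundary_after(long t)
{
    return t - t % 60 + 60;
}
```

For a trade signal at 15:00:01, the boundary before is 15:00:00 and the boundary after is 15:01:00, so the M1 price used for the trade can stem from a moment up to one minute away from the quote that a T1 backtest would use.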

Backtest realism

Zorro backtests are as close to real trading as possible, especially when the TICKS flag is set, which causes trade functions and stop, profit, or entry limits to be evaluated at every price tick in the historical data. Theoretically a backtest should generate precisely the same trades and return the same profit or loss as live trading the script during the same time period. This is largely the case, but the following effects can cause differences:

The likelihood that the strategy exploits real inefficiencies depends on how it was developed and optimized. Many factors can cause bias in the test result:

Curve fitting bias affects all strategies that use the same price data set for test and training. It generates far too optimistic results and the illusion that a strategy is profitable when it isn't.
Peeking bias is caused by letting knowledge of the future affect the trade algorithm. An example is calculating trade volume with capital allocation factors (OptimalF) that are generated from the whole test data set (always test with fixed lot sizes first!).
Data mining bias (or selection bias) is caused not only by data mining, but already by the mere act of developing a strategy with historical price data, since you will be selecting the most profitable algorithm or asset dependent on test results.
Trend bias affects all 'asymmetric' strategies that use different algorithms, parameters, or capital allocation factors for long and short trades. For preventing this, detrend the trade signals or the trade results.
Granularity bias is a consequence of different price data resolution in test and in real trading. For reducing it, use the TICKS flag, especially when a trade management function is used.
Sample size bias is the effect of the test period length on the results. Values derived from maxima and minima - such as drawdown - are usually proportional to the square root of the number of trades. This generates more pessimistic results on large test periods.
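
The square-root relation stated for sample size bias can be written as a rough scaling rule. This is only a sketch of the proportionality mentioned above, with D(N) denoting the drawdown expected after N trades; it is not an exact formula, and the symbols are introduced here for illustration.

```latex
D(N) \propto \sqrt{N}
\quad\Longrightarrow\quad
\frac{D(N_2)}{D(N_1)} \approx \sqrt{\frac{N_2}{N_1}}
```

For example, a test with 400 trades can be expected to show about twice the drawdown of a test with 100 trades of the same strategy, since the square root of 400/100 is 2.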

See also:

training, trading, mode, performance, troubleshooting

