Machine learning algorithms can be used for market prediction with Zorro's
advise functions.
Due to the low signal-to-noise ratio and to ever-changing market conditions,
analyzing price series is an ambitious task for machine
learning. But since the price curves are not completely random, even simple
machine learning methods, such as in the **DeepLearn** script,
can predict the next price movement with a better than 50% success rate. Whether
the success rate is high enough to overcome transaction costs is another
question.
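Whether a given success rate survives the costs can be estimated with a quick expectancy calculation in R (all numbers below are hypothetical, for illustration only):

```r
# Hypothetical example: expected profit per trade from the success rate.
hitrate <- 0.55        # fraction of winning trades
win <- 10; loss <- 10  # average win and loss per trade, in account currency
cost <- 0.6            # round-trip transaction cost (spread + commission)

expectancy <- hitrate*win - (1-hitrate)*loss - cost
print(expectancy)      # 0.55*10 - 0.45*10 - 0.6 = 0.4
```

With equal average wins and losses, a 55% success rate leaves only a thin margin, so even modest transaction costs can turn the strategy unprofitable.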

Compared with other machine learning algorithms, such as Random Forests or Support Vector Machines, deep learning systems often achieve a high success rate with relatively little programming effort. A linear neural network with 8 inputs driven by indicators and 4 outputs for buying and selling has a structure like this:

*(figure: network diagram)*

Deep learning uses linear or specialized neural network structures (convolution layers, LSTM) with a large number of neurons and hidden layers. Some parameters common to most neural networks:

- **Hidden layers** of a linear network are usually defined with a vector; for instance **c(50,100,50)** defines 3 hidden layers, the first with 50, the second with 100, and the third with 50 neurons.
- The **Activation** function converts the sum of neuron input values into the neuron output. Most often used are a **Rectifier** (ReLU = rectified linear unit), which is zero for negative inputs and increases linearly for positive inputs, a **Sigmoid** that saturates to 0 or 1, a **Tanh** that saturates to -1 or +1, or **SoftMax** that approximates the highest input.
- An **Epoch** is a training iteration over the entire data set. Training stops once the given number of epochs is reached. More epochs mean better prediction, but longer training.
- The **Learning rate** controls the step size of the gradient descent in training; a lower rate means finer steps and possibly more precise prediction, but longer training time. The **Learning rate scale** is a multiplication factor that changes the learning rate after each iteration. **Momentum** adds a fraction of the previous step to the current one; it prevents the gradient descent from getting stuck at a tiny local minimum or saddle point.
- The **Batch size** is the number of random samples – a **mini batch** – taken out of the data set for a single training run. Splitting the data into mini batches speeds up training, since the weight gradient is then calculated from fewer samples. The higher the batch size, the better the training, but the more time it takes. **Dropout** is a number of randomly selected neurons that are disabled during a mini batch. This way the net learns with only a part of its neurons, which can effectively reduce overfitting.
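The effect of the learning rate and momentum parameters can be illustrated with a tiny gradient descent in plain R (a toy example minimizing f(w) = (w-3)², not part of any of the packages below):

```r
# Toy gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3.
# Illustrates learning rate, momentum, and epochs on a single weight.
f.grad <- function(w) 2*(w - 3)  # derivative of (w - 3)^2

w <- 0           # initial weight
step <- 0        # previous update, reused by the momentum term
rate <- 0.1      # learning rate: step size of the descent
momentum <- 0.5  # fraction of the previous step added to the current one

for(epoch in 1:50) {
  step <- momentum*step - rate*f.grad(w)
  w <- w + step
}
print(w)  # converges toward 3
```

A higher learning rate reaches the minimum in fewer epochs but can overshoot; the momentum term carries the descent over small bumps in the error surface.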

Here's a short description of the installation and usage of four popular R-based deep learning packages, each with an example of a (not really deep) linear neural net with one hidden layer.
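All four scripts implement the same set of functions, which Zorro calls through its R bridge during training and prediction. A package-independent skeleton (the empty bodies are placeholders to be filled in as in the examples below):

```r
# Skeleton of the interface shared by all four package examples.
# The bodies of neural.train and neural.predict are placeholders.

neural.init = function()           # called once at startup
{
  Models <<- vector("list")        # global list holding the trained models
}

neural.train = function(model,XY)  # train model number 'model' with data XY;
{                                  # the last column of XY is the target
}

neural.predict = function(model,X) # return the prediction of model number
{                                  # 'model' for a sample vector or matrix X
}

neural.save = function(name)       # store all trained models in file 'name'
{
  save(Models,file=name)
}
```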

The Deepnet R script:

```r
library('deepnet')

neural.train = function(model,XY)
{
  XY <- as.matrix(XY)
  X <- XY[,-ncol(XY)]
  Y <- XY[,ncol(XY)]
  Y <- ifelse(Y > 0,1,0)
  Models[[model]] <<- sae.dnn.train(X,Y,
      hidden = c(30),
      learningrate = 0.5,
      momentum = 0.5,
      learningrate_scale = 1.0,
      output = "sigm",
      sae_output = "linear",
      numepochs = 100,
      batchsize = 100)
}

neural.predict = function(model,X)
{
  if(is.vector(X)) X <- t(X)
  return(nn.predict(Models[[model]],X))
}

neural.save = function(name)
{
  save(Models,file=name)
}

neural.init = function()
{
  set.seed(365)
  Models <<- vector("list")
}
```

The H2O R script:

```r
library('h2o')

neural.train = function(model,XY)
{
  XY <- as.h2o(XY)
  Models[[model]] <<- h2o.deeplearning(
      -ncol(XY), ncol(XY), XY,
      hidden = c(30),
      seed = 365)
}

neural.predict = function(model,X)
{
  if(is.vector(X)) X <- as.h2o(as.data.frame(t(X)))
  else X <- as.h2o(X)
  Y <- h2o.predict(Models[[model]],X)
  return(as.vector(Y))
}

neural.save = function(name)
{
  save(Models,file=name)
}

neural.init = function()
{
  h2o.init()
  Models <<- vector("list")
}
```

Keras is available as an R library, but installing it with TensorFlow also requires a Python environment. First install Anaconda from www.anaconda.com. Open the Anaconda Navigator and install the Keras and TensorFlow packages.
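Alternatively, the setup can be done from within R using the keras package's own installer, which creates a Python environment with Keras and TensorFlow (assuming a working Python installation such as Anaconda is already present):

```r
# One-time setup from within R; install_keras() sets up a
# Python environment containing Keras and TensorFlow.
install.packages('keras')
library('keras')
install_keras()
```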

The Keras R script:

```r
library('keras')

neural.train = function(model,XY)
{
  X <- data.matrix(XY[,-ncol(XY)])
  Y <- XY[,ncol(XY)]
  Y <- ifelse(Y > 0,1,0)
  Model <- keras_model_sequential()
  Model %>%
    layer_dense(units = 30, activation = 'relu', input_shape = c(ncol(X))) %>%
    layer_dropout(rate = 0.2) %>%
    layer_dense(units = 1, activation = 'sigmoid')
  Model %>% compile(
    loss = 'binary_crossentropy',
    optimizer = optimizer_rmsprop(),
    metrics = c('accuracy'))
  Model %>% fit(X, Y,
    epochs = 20, batch_size = 20,
    validation_split = 0, shuffle = FALSE)
  Models[[model]] <<- Model
}

neural.predict = function(model,X)
{
  if(is.vector(X)) X <- t(X)
  X <- as.matrix(X)
  Y <- Models[[model]] %>% predict_proba(X)
  return(ifelse(Y > 0.5,1,0))
}

neural.save = function(name)
{
  for(i in c(1:length(Models)))
    Models[[i]] <<- serialize_model(Models[[i]])
  save(Models,file=name)
}

neural.load <- function(name)
{
  load(name,.GlobalEnv)
  for(i in c(1:length(Models)))
    Models[[i]] <<- unserialize_model(Models[[i]])
}

neural.init = function()
{
  set.seed(365)
  Models <<- vector("list")
}
```

MxNet is installed from its own package repository:

```r
cran <- getOption("repos")
cran["dmlc"] <- "https://s3-us-west-2.amazonaws.com/apache-mxnet/R/CRAN/"
options(repos = cran)
install.packages('mxnet')
```

The MxNet R script:

```r
library('mxnet')

neural.train = function(model,XY)
{
  X <- data.matrix(XY[,-ncol(XY)])
  Y <- XY[,ncol(XY)]
  Y <- ifelse(Y > 0,1,0)
  Models[[model]] <<- mx.mlp(X,Y,
      hidden_node = c(30),
      out_node = 2,
      activation = "sigmoid",
      out_activation = "softmax",
      num.round = 20,
      array.batch.size = 20,
      learning.rate = 0.05,
      momentum = 0.9,
      eval.metric = mx.metric.accuracy)
}

neural.predict = function(model,X)
{
  if(is.vector(X)) X <- t(X)
  X <- data.matrix(X)
  Y <- predict(Models[[model]],X)
  return(ifelse(Y[1,] > Y[2,],0,1))
}

neural.save = function(name)
{
  save(Models,file=name)
}

neural.init = function()
{
  mx.set.seed(365)
  Models <<- vector("list")
}
```