Dataset handling

The following functions can be used for downloading and parsing historical or recent data records from various sources on the Internet. Every record begins with a time stamp. The rest of the data can have arbitrary content, such as option chains, order book content, reports, interest rates, or any other CSV formatted data.

dataDownload (string Code, int Mode, int Period): int

Downloads the dataset with the given Code from Quandl™ or Yahoo™ and stores it in CSV format in the History folder. Returns the number of data records. The data is only downloaded when the previously downloaded file is older than the given Period in minutes; with a Period of 0 the data is always downloaded. The Quandl Bridge or Zorro S is required for loading Quandl datasets.
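The name of the stored file follows the rule given in the Code parameter description: "/" is replaced by "-", "1" is added when only the most recent record was downloaded, and ".csv" is appended. A minimal sketch of that rule in standard C (not lite-C); dataset_file_name is a hypothetical helper and not a Zorro function:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Zorro: builds the CSV file name
   described for the Code parameter. '/' in the code becomes '-',
   "1" is added when only the most recent record was requested,
   and ".csv" is appended.
   Returns 0 on success, -1 if the buffer is too small. */
int dataset_file_name(const char *code, int latest_only, char *out, size_t size)
{
    assert(code != NULL && out != NULL);
    int n = snprintf(out, size, "%s%s.csv", code, latest_only ? "1" : "");
    if (n < 0 || (size_t)n >= size)
        return -1;                     /* name would have been truncated */
    for (char *p = out; *p; p++)       /* replace every '/' by '-' */
        if (*p == '/') *p = '-';
    return 0;
}
```

Under these assumptions, the code "WIKI/AAPL" gives "WIKI-AAPL.csv" for a full download and "WIKI-AAPL1.csv" for the most recent record only.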

dataParse (int Handle, string Format, string FileName): int

Parses data records from the CSV file FileName and adds them at the beginning of the dataset with the given Handle number. Records can have time/date, floating point, integer, and text fields. CSV headers are skipped. Several CSV files can be parsed into the same dataset when their record format is identical. The CSV file can be in ascending or descending chronological order, but the resulting dataset should always be in descending order, i.e. the newest records are at the beginning. Every record begins with the time stamp field in wdate format, followed by the other fields in the order determined by the Format string. Each record thus has a size of 8 bytes for the time stamp plus 4 bytes for every f, i, or s placeholder in the Format string. The function returns the number of records read, or 0 when the file cannot be read or has a wrong format.
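The record size rule can be checked for a given format string. The following sketch in standard C (not lite-C) counts the f, i, s placeholders and applies the 8 + 4 * n formula from the text. format_record_size is a hypothetical helper; it counts a numbered placeholder such as f3 as one field and ignores the field number, which is an assumption that may not hold when field numbers leave gaps.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch only: estimates the record size implied by a dataParse-style
   format string, using the rule "8 bytes for the time stamp plus
   4 bytes per f, i, s placeholder". Not a Zorro function. */
size_t format_record_size(const char *format)
{
    assert(format != NULL);
    size_t placeholders = 0;
    const char *p = format;

    /* skip the optional '+' and '-' flags at the beginning */
    while (*p == '+' || *p == '-') p++;

    while (*p) {
        /* p points at the start of a field: classify it by its first character */
        if (*p == 'f' || *p == 'i' || *p == 's')
            placeholders++;
        /* date fields start with '%', empty fields start with a separator */
        while (*p && *p != ',' && *p != ';') p++;   /* skip to the separator */
        if (*p) p++;                                /* step over the separator */
    }
    return 8 + 4 * placeholders;
}
```

For the iVolatility format string "+,,%m/%d/%y,,,i,f,s,s,f,f,f,,f" used in the example, this gives 8 placeholders and therefore 40 bytes per record.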

dataSort (int Handle)

Sorts the dataset with the given Handle in descending time stamp order.
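What descending time stamp order means can be illustrated in standard C (not lite-C) with qsort. The Record type is a simplified stand-in with a single data field, and sort_descending is a hypothetical name; how dataSort works internally is not stated here.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for a dataset record; only the time stamp matters here. */
typedef struct {
    double time;    /* time stamp in wdate format */
    float  value;   /* one example data field */
} Record;

/* qsort comparator: newer (larger) time stamps come first */
static int by_time_descending(const void *a, const void *b)
{
    double ta = ((const Record *)a)->time;
    double tb = ((const Record *)b)->time;
    return (ta < tb) - (ta > tb);
}

/* Sorts the records in place so that the newest record is at index 0. */
void sort_descending(Record *records, size_t count)
{
    assert(records != NULL);
    qsort(records, count, sizeof(Record), by_time_descending);
}
```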

dataSave (int Handle, string FileName, int Start, int Num)

Stores a part of the dataset with the given Handle number in a binary file in the History folder for faster access. Num records are stored, beginning with the record Start. If both parameters are omitted or zero, the whole dataset is stored.

dataLoad (int Handle, string FileName, int Fields): int

Reads a dataset from a binary file. Fields is the number of fields per record, including the date/time field at the beginning of every record. Returns the number of records read, or 0 when the file cannot be read or has a wrong size.

dataNew (int Handle, int Records, int Fields): void*

Deletes the given dataset and creates a new empty dataset with the given number of Records and Fields. If they are 0, the dataset is just deleted and the memory freed. Returns a pointer to the beginning of the first record, or 0 when no new dataset was created.

dataMerge (int Handle1, int Handle2): int

Merges the dataset Handle2 into Handle1 so that it contains all records in descending time stamp order. The Handle1 dataset must be either empty or have the same number of columns as Handle2. The number of rows may be different. Returns the total number of records, or 0 when the datasets could not be merged.
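Merging two datasets that are both in descending order can be pictured as a standard two-way merge. The sketch below is standard C (not lite-C) with a simplified Record type and a hypothetical merge_descending function. How dataMerge treats records with equal time stamps is not stated above; the sketch simply keeps both.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a dataset record. */
typedef struct {
    double time;   /* time stamp in wdate format */
    float  value;  /* one example data field */
} Record;

/* Merges two arrays that are each sorted newest-first into out,
   which must have room for na + nb records. Returns the total count. */
size_t merge_descending(const Record *a, size_t na,
                        const Record *b, size_t nb, Record *out)
{
    assert(out != NULL);
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)             /* take the newer head each time */
        out[k++] = (a[i].time >= b[j].time) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];    /* copy the rest of a */
    while (j < nb) out[k++] = b[j++];    /* copy the rest of b */
    return k;
}
```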

dataFind (int Handle, var Date): int

Returns the number of the first record at or before the given Date in wdate format. Returns -1 when no matching record was found or when no data array with this handle exists.
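Because the dataset is in descending order, "the first record at or before the given Date" is the lowest row number whose time stamp is not newer than Date. A minimal sketch of such a lookup in standard C (not lite-C), using a binary search over a plain array of time stamps; find_at_or_before is a hypothetical name, and whether dataFind uses a binary search is an assumption:

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first time stamp that is at or before date
   in an array sorted newest-first, or -1 if every record is newer
   or the array is empty. */
int find_at_or_before(const double *times, int count, double date)
{
    assert(times != NULL || count == 0);
    int lo = 0, hi = count;              /* search window [lo, hi) */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (times[mid] > date)
            lo = mid + 1;                /* still newer than date: go to older rows */
        else
            hi = mid;                    /* at or before date: try lower row numbers */
    }
    return (lo < count) ? lo : -1;
}
```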

dataVar (int Handle, int Row, int Column): var

Returns the value of the floating point field Column from the record Row. If Column is 0, the time stamp of the record is returned in wdate format. If Row is negative, the record is taken from the end of the dataset, i.e. Row = -1 accesses the oldest record.
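The wdate format is the Windows DATE type: the number of days since 30 December 1899, with the time of day as the fractional part. For comparing time stamps with data from other sources, a conversion from and to Unix seconds can be sketched in standard C (not lite-C) as follows. The function names are hypothetical, and time zone handling is not covered.

```c
#include <assert.h>

/* Windows DATE value of 1 January 1970, 00:00 (the Unix epoch) */
#define WDATE_UNIX_EPOCH 25569.0
#define SECONDS_PER_DAY  86400.0

/* Converts seconds since the Unix epoch to a Windows DATE value. */
double unix_to_wdate(long long unix_seconds)
{
    return (double)unix_seconds / SECONDS_PER_DAY + WDATE_UNIX_EPOCH;
}

/* Converts a Windows DATE value to seconds since the Unix epoch,
   rounded to the nearest second. */
long long wdate_to_unix(double wdate)
{
    assert(wdate >= WDATE_UNIX_EPOCH);   /* the sketch covers dates from 1970 on */
    return (long long)((wdate - WDATE_UNIX_EPOCH) * SECONDS_PER_DAY + 0.5);
}
```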

dataInt (int Handle, int Row, int Column): int

Returns the value of the integer field Column from the record Row.

dataStr (int Handle, int Row, int Column): string

Returns a string of up to 3 characters from the text field Column of the record Row. If Column is 0, it returns a pointer to the time stamp field, i.e. the start of the record. For getting a pointer to the first record, call dataStr(Handle,0,0). If Row or Column exceeds the number of records or fields, 0 is returned.
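The three access functions read the same 4-byte fields in different ways. The sketch below, in standard C (not lite-C), assumes the record layout described for dataParse: an 8-byte time stamp followed by 4-byte fields that hold a float, a 32-bit integer, or up to 3 characters plus a terminating 0. The Dataset type and the get_ functions are hypothetical and only illustrate the addressing, including negative rows.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of a dataset in memory. */
typedef struct {
    const unsigned char *bytes;  /* raw record storage */
    int rows;                    /* number of records */
    int fields;                  /* fields per record, including the time stamp */
} Dataset;

/* Returns a pointer to field col of record row, or NULL if out of range.
   A negative row counts from the end: -1 is the last (oldest) record. */
static const unsigned char *field_ptr(const Dataset *d, int row, int col)
{
    assert(d != NULL && d->fields >= 1);
    if (row < 0) row += d->rows;
    if (row < 0 || row >= d->rows || col < 0 || col >= d->fields)
        return NULL;
    size_t record_size = 8 + 4 * (size_t)(d->fields - 1);
    size_t offset = (col == 0) ? 0 : 8 + 4 * (size_t)(col - 1);
    return d->bytes + (size_t)row * record_size + offset;
}

/* Like dataVar: column 0 yields the time stamp, other columns a float. */
double get_var(const Dataset *d, int row, int col)
{
    const unsigned char *p = field_ptr(d, row, col);
    if (!p) return 0.0;
    if (col == 0) { double t; memcpy(&t, p, sizeof t); return t; }
    float f; memcpy(&f, p, sizeof f); return f;
}

/* Like dataInt: reads the field as a 32-bit integer. */
int32_t get_int(const Dataset *d, int row, int col)
{
    const unsigned char *p = field_ptr(d, row, col);
    int32_t n = 0;
    if (p) memcpy(&n, p, sizeof n);
    return n;
}

/* Like dataStr: returns the text field, or NULL when out of range. */
const char *get_str(const Dataset *d, int row, int col)
{
    return (const char *)field_ptr(d, row, col);
}
```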

dataSet (int Handle, int Row, int Column, var Value)

dataSet (int Handle, int Row, int Column, int Value)

Stores the Value in the floating point or integer field Column of the record Row. Can be used for modifying datasets, f.i. for removing outliers or adding parameters. When modifying the time stamp field of the record (Column = 0), make sure to keep the descending order of dates in the dataset.

dataFromQuandl (int Handle, string Format, string Code, int Column): var

Helper function for generating an indicator based on a Quandl™ dataset. Works in live trading as well as in backtest mode, and returns the content of the field Column from the dataset Code in the given Format. Source code in options.c, which must be included for using this function. Quandl Bridge or Zorro S required.
 
 
 

Parameters:

Code The Yahoo asset name or Quandl database/dataset code, f.i. "WIKI/AAPL". The name of the stored file is composed from Code with "/" replaced by "-", plus "1" when only the most recent record was downloaded, plus ".csv".
Mode FROM_YAHOO for downloading the dataset from Yahoo™, FROM_QUANDL for downloading it from Quandl™, FROM_QUANDL|1 for downloading only the most recent record for live trading.
Period Minimum time in minutes to keep the last downloaded file until a newer file is downloaded, or 0 for always downloading the file.
Handle A number from 1...800 that identifies the dataset. Handles above 800 are internally used for Zorro's pre-defined indicators.
FileName Name of the file. If no path is included, the file is expected in the History folder. If the name has no extension, ".csv" is added.
Format Format string with placeholders, similar to the printf and wdatef format, for parsing CSV records into a dataset. Fields are separated with commas or semicolons (not mixed). Any field can be either empty or contain a placeholder that determines the field content. Empty fields are skipped. The following placeholders are supported:

+ at the beginning of the format string - ascending date order in the .csv file, and appending the file to the end of the dataset.
- at the beginning of the format string - the .csv file contains no header. Otherwise the first line is assumed to be a header and is skipped.
f - for a floating point field, f.i. 123.456.
i - for an integer field. Nonnumerical characters are skipped, f.i. "07/21/16 13:57" is parsed as 721161357.
s - for a text field. Only the first 3 characters are stored.
% - for a date/time field, given with wdatef style format codes (f.i. %Y-%m-%d) and stored in wdate format. A record must contain at least one date/time field. If there are more, f.i. separate fields for date and time, they are added together.

The f, i, s placeholders can be followed by a field number in the destination dataset. Example: "+%Y%m%d %H%M%S,f3,f1,f2,f4,f6" parses Histdata™ CSV files into a dataset in T6 format (more examples below). If the number is omitted, the fields are parsed in ascending order. The number of fields in the format string can differ from the number of fields in the CSV record. The remaining fields are then filled with 0.

Records Number of records in the dataset.
Fields Number of fields per record, including the date field.
Date Timestamp in Windows DATE (wdate) format.
Start, Num The first record and the number of records to be stored.
Row, Column The record and field number, starting with 0. The date is always the first field of the record. If Row is negative, the record is taken from the end of the dataset, i.e. Row = -1 accesses the oldest record.
Value New value of the addressed field.

Example:

// parse iVolatility historical option chain data and store the resulting array 
void main()
{
  string Format = "+,,%m/%d/%y,,,i,f,s,s,f,f,f,,f";
  int records = dataParse(1,Format,"iVolatility_SPY_2014_1.csv");
  records += dataParse(1,Format,"iVolatility_SPY_2014_2.csv");
  records += dataParse(1,Format,"iVolatility_SPY_2015_1.csv");
  records += dataParse(1,Format,"iVolatility_SPY_2015_2.csv");
  records += dataParse(1,Format,"iVolatility_SPY_2016_1.csv");
  printf("\n%d records parsed",records);
  dataSave(1,"SPY_Options.t8");
}
 
// dataFromQuandl source code
var dataFromQuandl(int Handle,string Format,string Code,int Column)
{
  string Filename = strxc(Code,'/','-');
  if(dataFind(Handle,0) < 0) { // data array not yet loaded
    dataDownload(Code,FROM_QUANDL,12*60);
    dataParse(Handle,Format,Filename);
  }
  if(is(TRADEMODE) && !is(LOOKBACK)) {
    strcat(Filename,"1");
    int Rows = dataDownload(Code,FROM_QUANDL+1,60);
    if(Rows) dataParse(Handle,Format,Filename); // add the new record at the beginning
    return dataVar(Handle,0,Column);
  } else {
    int Row = dataFind(Handle,wdate());
    return dataVar(Handle,Row,Column); 
  }
}
 
// US treasury 3-months interest rate
var DTB3() { 
  return dataFromQuandl(801,"%Y-%m-%d,f","FRED/DTB3",1); 
}
 
// COT report for S&P500
var CFTC_SP(int Column) { 
  return dataFromQuandl(802,"%Y-%m-%d,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f,f","CFTC/TIFF_CME_SP_ALL",Column); 
}

// more format examples
string Format = "%Y-%m-%d,f3,f1,f2,f4,,,f6,f5"; // Quandl futures data to .t6, f.i. "CHRIS/CME_CL1"
string Format = "%Y-%m-%d,f3,f1,f2,f4,f6,f5"; // Yahoo data to unadjusted .t6, with adjusted close stored in fVal

See also:

file, strvar, data import, options
