- Analyzes the top 200 ETFs by volume according to etfdb.com
- Uses the machine learning algorithms of the WEKA package from the University of Waikato, New Zealand, in particular the *RandomForest* and *IBk* classifiers.
- Uses the following technical analysis tools from the TA-Lib libraries by TicTacTec LLC to create various attributes based on optimized and correlated values of other ETFs for the same period:
  - MACD - moving average convergence/divergence
  - ATR - average true range
  - Moving average - SMA or EMA
  - Stochastic Momentum (not in the TA-Lib package)
  - TSF - time series forecast
  - DMI - directional movement index
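As an illustration of how one of these attributes is derived, here is a minimal plain-Python sketch of MACD: the difference between a 12-period and a 26-period EMA of the closing price, with a 9-period EMA of that difference as the signal line. The 12/26/9 periods are the conventional defaults, not necessarily the optimized values mentioned above, and the function names `ema` and `macd` are hypothetical. TA-Lib may seed its EMAs differently, so the earliest values can differ from TA-Lib's output.

```python
def ema(values, period):
    """Exponential moving average, seeded with the simple average of the
    first `period` values. out[i] lines up with values[period - 1 + i]."""
    if len(values) < period:
        return []
    alpha = 2.0 / (period + 1)
    out = [sum(values[:period]) / period]
    for v in values[period:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(closes, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) for a list of closing prices."""
    fast_ema = ema(closes, fast)
    slow_ema = ema(closes, slow)
    # Skip the first (slow - fast) fast-EMA values so both series cover the same days.
    macd_line = [f - s for f, s in zip(fast_ema[slow - fast:], slow_ema)]
    signal_line = ema(macd_line, signal)
    # The signal line starts (signal - 1) entries after the MACD line.
    histogram = [m - s for m, s in zip(macd_line[signal - 1:], signal_line)]
    return macd_line, signal_line, histogram
```

For a steadily rising price series the fast EMA lags less than the slow one, so the MACD line stays positive; for a flat series all three outputs are zero.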

- Uses the following classifications:
  - Very Strong Sell - value of 0
  - Strong Sell - value of 1
  - Sell - value of 2
  - Maybe Sell - value of 3
  - Weak Sell - value of 4
  - Weak Buy - value of 5
  - Maybe Buy - value of 6
  - Buy - value of 7
  - Strong Buy - value of 8
  - Very Strong Buy - value of 9
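Because the classes form an ordered 0-9 scale, they map naturally onto an integer enumeration, which is what allows calls to be averaged numerically. The sketch below is a hypothetical encoding (the names `Call`, `label`, and `is_buy` are illustrative, not taken from the original code); `is_buy` simply reflects that values 5 through 9 carry "buy" names in the list above.

```python
from enum import IntEnum


class Call(IntEnum):
    """Hypothetical encoding of the ten classification values."""
    VERY_STRONG_SELL = 0
    STRONG_SELL = 1
    SELL = 2
    MAYBE_SELL = 3
    WEAK_SELL = 4
    WEAK_BUY = 5
    MAYBE_BUY = 6
    BUY = 7
    STRONG_BUY = 8
    VERY_STRONG_BUY = 9

    @property
    def label(self) -> str:
        # "MAYBE_BUY" -> "Maybe Buy"
        return self.name.replace("_", " ").title()

    @property
    def is_buy(self) -> bool:
        # Values 5-9 are the buy side of the scale.
        return self >= Call.WEAK_BUY
```

Since `IntEnum` members are plain integers, `sum(calls) / len(calls)` works directly on a list of `Call` values.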

- Takes the weighted average of all calls, then averages again over 5-day periods (days 1-5, days 6-10, and so on). Calls closer to the end of a period receive more weight.
- The report only shows:
  - Strong Buy when the average > 7.25
  - Strong Sell when the average < 1.75
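The averaging and reporting rules can be sketched as follows. The per-call weights and the period weights are not specified in the text, so this sketch assumes equal per-call weights by default and linear weights 1, 2, 3, 4, 5 within each 5-day period, so the last day counts five times as much as the first. The function names are hypothetical; only the 7.25 and 1.75 thresholds come from the text.

```python
def daily_score(calls, weights=None):
    """Weighted average of one day's calls (0-9). Assumption: equal weights by default."""
    if weights is None:
        weights = [1.0] * len(calls)
    return sum(c * w for c, w in zip(calls, weights)) / sum(weights)


def period_averages(daily_scores, period=5):
    """Average consecutive, non-overlapping blocks of `period` days (days 1-5, 6-10, ...).

    Assumption: linear weights 1..period, so later days in a block weigh more.
    A trailing partial block is dropped.
    """
    weights = list(range(1, period + 1))
    out = []
    for start in range(0, len(daily_scores) - period + 1, period):
        window = daily_scores[start:start + period]
        out.append(sum(s * w for s, w in zip(window, weights)) / sum(weights))
    return out


def report_label(average):
    """Only the two extreme outcomes appear in the report; anything else returns None."""
    if average > 7.25:
        return "Strong Buy"
    if average < 1.75:
        return "Strong Sell"
    return None
```

With these weights, a period whose days score 0, 0, 0, 0, 10 averages 50 / 15, about 3.33, rather than the unweighted 2.0, which shows how strongly the end of the period dominates.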

- As of 2022-02-27, I rewrote the instance datasets to make them more balanced. Why? Most ETF and stock prices have a positive tendency: there are more up days than down days. For a simple example, look at the chart for $QQQ over the past ten years; its trend is up, with more up days than down days. This kind of trend can be a problem for the A.I. program: when it is trained on data that contains more up days than down days, its predictions tend to have a positive, or bullish, bias. To overcome this bias, my rewrite splits the instance data into two separate, balanced groups. The splitting logic tries to give each group an equal number of positive and negative instances. For example, if an instance dataset had 150 positive instances and 100 negative instances, the new process would create two datasets. The first dataset holds the first 100 positive instances plus all 100 negative instances. The second dataset holds the last 100 positive instances plus all 100 negative instances, so the two datasets share 50 positive instances (the 51st through the 100th). One benefit of splitting the instance data is that the models can be run twice, providing a second set of outcomes.
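The split described above can be sketched in a few lines. This assumes the instances in each list are already in chronological order (so "first" and "last" are meaningful) and generalizes slightly by letting either class be the majority; `balanced_splits` is a hypothetical name, not the original implementation.

```python
def balanced_splits(positives, negatives):
    """Split imbalanced instances into two balanced datasets.

    Each dataset pairs n majority-class instances with all n minority-class
    instances: the first takes the earliest n majority instances, the second
    the latest n. When the majority has fewer than 2n instances, the two
    datasets share some majority instances.
    """
    if len(positives) >= len(negatives):
        majority, minority = positives, negatives
    else:
        majority, minority = negatives, positives
    n = len(minority)
    first = majority[:n] + minority
    # Slice by explicit start index: majority[-0:] would return the whole list.
    second = majority[len(majority) - n:] + minority
    return first, second
```

With the 150/100 example from the text, each dataset holds 200 instances, and the positive instances overlap on positions 51 through 100.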

See my video on this topic: *Getting A More Precise Buy/Sell Prediction Classifier*.

Last updated 2020-07-01

©2017 - 2020 McVerry Report LLC Raleigh NC USA