Investment Opportunities Forecasting: Extending the Grammar of a GP-based Tool

In this paper we present a new version of a GP financial forecasting tool, called EDDIE 8. The novelty of this version is that it allows the GP to search in the space of indicators, instead of using pre-specified ones. We compare EDDIE 8 with its predecessor, EDDIE 7, and find that new and improved solutions can be found. Analysis also shows that, on average, EDDIE 8's best tree performs better than the one of EDDIE 7. The above allows us to characterize EDDIE 8 as a valuable forecasting tool.


Introduction
The forecasting of time series is an important area in computational finance. There are numerous works that attempt to forecast the future price movements of a stock; several examples can be found in the literature [1,2]. Some more recent works on time series prediction are Refs. [3-7], which describe applications in both low and high frequency data. Furthermore, several different methods have been used for financial forecasting. Some examples of such methods are Support Vector Machines [8-10], Artificial Neural Networks [11-14], and Genetic Programming [15,16]. Some of these methods have also been combined to produce hybrid systems. For instance, Huang et al. [17] combined support vector machines with neural networks to investigate credit rating, and Kim [18] combined neural networks with evolutionary strategies for financial forecasting.
EDDIE (Evolutionary Dynamic Data Investment Evaluator) [19-21] is a decision support tool that uses Genetic Programming (GP) [22,23] for financial forecasting. In this paper we present EDDIE 8, which is the newest version. The novelty of this algorithm is in its rich, extended grammar. Instead of using a fixed number of pre-specified indicators from technical analysis [24], as the previous versions do, EDDIE 8 allows the GP to search in the space of these technical indicators and use the ones that it considers to be optimal. Thanks to its extended grammar, EDDIE 8 has the potential, through the learning process, to discover better solutions that its predecessors cannot, and we therefore consider it an improvement. A similar approach to ours, which also attempts to address the problem of a fixed number of pre-specified strategies, can be found in Refs. [25,26], where Grammatical Evolution was used in place of traditional GP.
In a previous work [27], in order to present the value of EDDIE 8, we compared it with EDDIE 7, which is a re-implementation of Jin Li's EDDIE 4 (a.k.a. FGP-2) [20,28], with the addition of some indicators that Martinez-Jaramillo [29] found helpful and used in his own version of EDDIE. Those experiments took place under an artificial dataset framework. This work serves as an important extension, because we test the performance of EDDIE 7 and EDDIE 8 under 10 empirical datasets and then compare these results with the existing ones from the artificial framework. The rest of this paper is organized as follows: Section 2 explains how EDDIE functions; it also presents the two versions discussed in this paper, EDDIE 7 and EDDIE 8. Section 3 presents the experimental parameters for our tests, and Sect. 4 discusses the results of the comparison of the two versions on 10 different empirical datasets. Section 5 then briefly discusses the performance results and conclusions for the tests that took place under the artificial datasets [27]. We then extend the conclusions drawn from the artificial datasets experiments to the empirical datasets experiments, in Sect. 6. Finally, Sect. 7 concludes this paper and discusses future work.

How EDDIE works
In this section we present the two versions, EDDIE 7 and EDDIE 8, and explain their differences. We start by presenting EDDIE 7 and the way it works.

EDDIE 7
EDDIE is a forecasting tool, which learns and extracts knowledge from a set of data. As we said in the previous section, EDDIE 7 is a re-implementation of Jin Li's FGP-2, with the only difference being that EDDIE 7 uses some additional indicators that Martinez-Jaramillo used in his version of EDDIE [29]. The way that EDDIE 7, and all versions of EDDIE, work is as follows. The user first feeds the system with a set of past data; EDDIE then uses this data and, through a GP process, produces and evolves Genetic Decision Trees (GDTs), which make recommendations to buy (1) or not-to-buy (0). It then evaluates the performance of these GDTs, on a training set, for each generation. The GDT with the highest fitness at the last generation is finally applied to a testing set.
The set of data EDDIE uses comprises three parts: daily closing price of a stock, attributes, and signals. Stocks' daily closing prices can be obtained online at websites such as http://finance.yahoo.com and also from financial statistics databases like Datastream. The attributes are indicators commonly used in technical analysis [24]. The choice of indicators depends on the user and his belief in their relevance to the prediction. Table 1 presents the technical indicators that EDDIE 7 uses.¹

Table 1. Technical Indicators used by EDDIE 7. Each indicator uses 2 different periods, 12 and 50, in order to take into account a short-term and a long-term period. For completeness, we provide formulas of our interpretation for these indicators in the Appendix.

The signals are calculated by looking ahead of the closing price for a time horizon of n days, trying to detect if there is an increase of the price by r% [19]. Thus, if such an increase occurs, we denote it by 1; otherwise, by 0. A positive signal (1) means that there is a buy opportunity in the market, because the price is going up. Therefore, if someone could predict this, he would make a profit. The more opportunities EDDIE can correctly predict, the more successful it is. The values of n and r are discussed later in this paper, in Sect. 3.
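The signal definition above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `buy_signals` is hypothetical, an "increase by r%" is taken to mean that any close within the next n days reaches at least r% above the current close, and days without a full n-day look-ahead window are simply left unlabelled.

```python
from typing import List


def buy_signals(closes: List[float], n: int, r: float) -> List[int]:
    """Label day t with 1 if some close in days t+1..t+n is at least r% above
    the close of day t, otherwise 0.

    Assumption: days that lack a full n-day look-ahead window are dropped.
    """
    signals = []
    for t in range(len(closes) - n):
        target = closes[t] * (1.0 + r / 100.0)
        window = closes[t + 1 : t + n + 1]
        signals.append(1 if max(window) >= target else 0)
    return signals


# Toy closing prices (made up for illustration only).
prices = [100.0, 101.0, 99.0, 105.0, 104.0, 103.0]
print(buy_signals(prices, n=2, r=4.0))
```

Only the third day of the toy series is labelled 1, because the close of 105 two days later exceeds 99 by more than 4%.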
After we feed the data to the system, EDDIE creates and evolves a population of GDTs. Figure 1 presents the Backus Naur Form (BNF) [31] (grammar) of EDDIE 7. As we can see, the root of the tree is an If-Then-Else statement. The first branch is either a Boolean (testing whether a technical indicator is greater than/less than/equal to a value), or a logic operator (and, or, not), which can hold multiple Boolean conditions. The 'Then' and 'Else' branches can be a new Genetic Decision Tree (GDT), or a decision, to buy or not-to-buy (denoted by 1 and 0).
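To make the tree structure described above concrete, the following sketch evaluates a GDT on one day of indicator values. The nested-tuple encoding and the indicator names (`MA_12`, `Vol_50`, `MA_50`) are assumptions made for illustration; they are not the representation used inside EDDIE.

```python
from typing import Dict, Union

# A tree is either a decision (0 or 1) or ("if", condition, then_tree, else_tree).
Tree = Union[int, tuple]


def eval_condition(cond: tuple, row: Dict[str, float]) -> bool:
    """Evaluate a Boolean test or a logic operator over Boolean tests."""
    op = cond[0]
    if op == "and":
        return eval_condition(cond[1], row) and eval_condition(cond[2], row)
    if op == "or":
        return eval_condition(cond[1], row) or eval_condition(cond[2], row)
    if op == "not":
        return not eval_condition(cond[1], row)
    # Otherwise: (comparison, indicator_name, threshold)
    _, indicator, threshold = cond
    value = row[indicator]
    if op == ">":
        return value > threshold
    if op == "<":
        return value < threshold
    if op == "=":
        return value == threshold
    raise ValueError(f"unknown operator: {op}")


def eval_gdt(tree: Tree, row: Dict[str, float]) -> int:
    """Return 1 (buy) or 0 (not-to-buy) for one day of indicator values."""
    if isinstance(tree, int):
        return tree
    _, cond, then_tree, else_tree = tree
    return eval_gdt(then_tree if eval_condition(cond, row) else else_tree, row)


# Hypothetical tree: buy if (MA_12 > 1.0 and Vol_50 < 0.2), else buy if MA_50 < 0.9.
gdt = ("if", ("and", (">", "MA_12", 1.0), ("<", "Vol_50", 0.2)),
       1,
       ("if", ("<", "MA_50", 0.9), 1, 0))

print(eval_gdt(gdt, {"MA_12": 1.1, "Vol_50": 0.1, "MA_50": 1.0}))  # condition holds
print(eval_gdt(gdt, {"MA_12": 0.9, "Vol_50": 0.1, "MA_50": 1.0}))  # falls through
```

The recursion mirrors the grammar: every 'Then' or 'Else' branch is either another If-Then-Else node or a terminal decision.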
We would also like to draw the reader's attention to the Variable symbol of Fig. 1; these are the 12 indicators, mentioned earlier in Table 1, that EDDIE 7 uses. They are pre-specified and should thus be considered as constants of the system. As we will see later, EDDIE 8 does not use these constants, but a function instead.
Each GDT's performance is evaluated by a fitness function, presented here. If the prediction of the GDT is positive (buy, 1), and the signal in the data for this specific entry is also positive (buy), then this is classified as True Positive (TP). If the prediction is positive, but the signal is negative (not-buy), then this is False Positive (FP). On the other hand, if the prediction is negative and the signal is positive, then this is False Negative (FN), and if the prediction of the GDT is negative and the signal is also negative, then this is classified as True Negative (TN). These four together give the familiar confusion matrix, which is presented in Table 2.
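The four cases above can be counted directly from a list of predictions and a list of signals. The sketch below is illustrative; the function name `confusion_counts` and the toy data are assumptions.

```python
from typing import Dict, Sequence


def confusion_counts(predictions: Sequence[int], signals: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, FN and TN for 0/1 predictions against 0/1 signals."""
    if len(predictions) != len(signals):
        raise ValueError("predictions and signals must have the same length")
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for predicted, actual in zip(predictions, signals):
        if predicted == 1 and actual == 1:
            counts["TP"] += 1
        elif predicted == 1 and actual == 0:
            counts["FP"] += 1
        elif predicted == 0 and actual == 1:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Toy data: five days of GDT predictions and actual signals.
print(confusion_counts([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))
```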
As a result, we can use the metrics presented in (1), (2) and (3).

Rate of Correctness

\[ RC = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1) \]

Rate of Missing Chances
We can then combine the above metrics and define the following fitness function, which is presented in (4), where $w_1$, $w_2$ and $w_3$ are the weights for RC, RMC and RF respectively. Li [28] states that these weights are given in order to reflect the preferences of investors. For instance, a conservative investor would want to avoid failure; thus a higher weight for RF should be used. However, Li also states that tuning these parameters does not seem to affect the performance of the GP. For our experiments, we chose to include strategies that mainly focus on correctness and reduced failure. Thus these weights have been set to 0. During the evolutionary procedure, we allow three operators: crossover, mutation and reproduction. After reaching the last generation, the best-so-far GDT, in terms of fitness, is applied to the testing data.
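The metrics and the weighted fitness can be sketched as follows. Only the expression for RC is taken from (1). The expressions used here for RMC and RF, and the form of the fitness as RC weighted positively with RMC and RF weighted negatively, are assumptions based on how these quantities are commonly defined in the FGP-2/EDDIE literature. The weights in the example are illustrative and are not the values used in the experiments.

```python
from typing import Tuple


def rates(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (RC, RMC, RF) from the confusion matrix counts.

    RC follows Eq. (1). RMC and RF are ASSUMED to be FN/(FN+TP) and FP/(FP+TP).
    Empty denominators yield 0.0 (also an assumption).
    """
    total = tp + tn + fp + fn
    rc = (tp + tn) / total if total else 0.0
    rmc = fn / (fn + tp) if (fn + tp) else 0.0
    rf = fp / (fp + tp) if (fp + tp) else 0.0
    return rc, rmc, rf


def fitness(tp: int, tn: int, fp: int, fn: int, w1: float, w2: float, w3: float) -> float:
    """ASSUMED form of the fitness: reward correctness, penalise missed chances and failures."""
    rc, rmc, rf = rates(tp, tn, fp, fn)
    return w1 * rc - w2 * rmc - w3 * rf


rc, rmc, rf = rates(tp=2, tn=1, fp=1, fn=1)
print(rc, rmc, rf)
# Illustrative weights only; a more conservative investor would raise w3.
print(fitness(2, 1, 1, 1, w1=0.5, w2=0.2, w3=0.3))
```

Raising the weight on RF makes trees that issue many wrong buy recommendations less fit, which is the conservative-investor preference described above.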
Figure 2 summarizes what we have said so far, by presenting the pseudo code that the EDDIE algorithms use for their experiments.
This concludes this short presentation of EDDIE 7. However, EDDIE 7 and its previous versions are considered to have a drawback: nobody can guarantee that the periods chosen for the indicators are the appropriate ones. Why is a 12-day MA the right choice for a short-term period, and not 10 or 14? As we mentioned earlier, choosing an indicator and, as a consequence, a period for this indicator, depends on the user of EDDIE and his belief in how helpful this specific indicator can be for the prediction. However, it can be argued that this is subjective and different experts could pick a different period for their indicators. In addition, this choice of indicators limits the patterns that EDDIE 7 can discover. This is hence the focus of our research. We believe that allowing EDDIE to search in the space of the periods of the indicators would be advantageous and eliminate any possible weaknesses of the human decision process. For these purposes, we implemented a new version, EDDIE 8, which allows the GP to search in the space of the periods of the indicators. The following section explains how EDDIE 8 manages this.

EDDIE 8
Let us consider a function y = f(x), where y is the output and x is the input. In our case, the input is the indicators and the output is the prediction made by our GP. The function f is unknown to the user and is the GDTs that the algorithm generates, in order to make its prediction. As we just said in the previous section, the input is fixed in EDDIE 7; it uses 6 indicators, with 2 different pre-specified periods (12 and 50 days). This limits EDDIE 7's capability to find patterns that cannot be expressed in its vocabulary. EDDIE 8 uses another function y = f(g(z)), where x = g(z); in other words, g is a function that generates indicators and periods for EDDIE to use. EDDIE 8 is not only searching in the space of GDTs, but also in the space of indicators. It can thus return Genetic Decision Trees (GDTs) that are using any period within a range that is defined by the user.
As we can see from the new syntax at Fig. 3, there is no such pre-specified Variable symbol; the indicators are instead generated by a function, with the period taken from the range defined by the user. As a result, EDDIE 8 can return decision trees with indicators like 15 days Moving Average, 17 days Volatility, and so on. The period is not an issue anymore, and it is up to EDDIE 8, and as a consequence up to the GP and the evolutionary process, to decide which lengths are more valuable to the prediction.
The immediate consequence of this is that now EDDIE 8 is not restricted only to the 12 indicators that EDDIE 7 uses (which are still part of EDDIE 8's search space); on the contrary, it now has many more options available, thanks to this new grammar.
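The difference between the two vocabularies can be illustrated by enumerating them. In this sketch the indicator labels are placeholders (only Moving Average and Volatility are named in the text above), and the period range 2 to 65 is a hypothetical user choice, not a value taken from the experiments.

```python
from typing import List, Sequence, Tuple

# Placeholder labels for the 6 indicator types (hypothetical names).
INDICATORS = ["MA", "Volatility", "Ind3", "Ind4", "Ind5", "Ind6"]


def eddie7_vocabulary(indicators: Sequence[str],
                      periods: Sequence[int] = (12, 50)) -> List[Tuple[str, int]]:
    """Fixed vocabulary: every indicator with each pre-specified period."""
    return [(name, p) for name in indicators for p in periods]


def eddie8_vocabulary(indicators: Sequence[str],
                      min_period: int, max_period: int) -> List[Tuple[str, int]]:
    """Extended vocabulary: every indicator with every period in a user-defined range."""
    return [(name, p) for name in indicators for p in range(min_period, max_period + 1)]


v7 = eddie7_vocabulary(INDICATORS)
v8 = eddie8_vocabulary(INDICATORS, min_period=2, max_period=65)  # hypothetical range
print(len(v7), len(v8))
print(set(v7) <= set(v8))  # EDDIE 7's indicators remain part of the larger space
```

With these assumed settings the vocabulary grows from 12 to 384 terminals, which gives a feel for how much larger the search space becomes once periods are free.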

Experimental Parameters
As we said in Sect. 2, the data we feed to EDDIE consist of daily closing prices. These closing prices are from 10 arbitrary stocks from FTSE100. These stocks are: British American Tobacco (BAT), British Petroleum (BP), Cadbury, Carnival, Hammerson, Imperial Tobacco, Next, Schroders, Tesco, and Unilever. The training period is 1000 days and the testing period 300. The GP parameters are presented in Table 3. The values of these parameters are the ones used by Koza [22]. The results seem to be insensitive to these parameters. For statistical purposes, we run the GP 50 times for both EDDIE 7 and EDDIE 8.
Thus, the process is as follows. We create a population of 500 GDTs, which are evolved for 50 generations, over a training period of 1000 days. At the last generation, the best performing GDT in terms of fitness is saved and applied to the testing period. As we have already said, this procedure is done for 50 individual runs.
In addition, we should emphasize that we want the datasets to have a satisfactory number of actual positive signals. By this we mean that we are neither interested in datasets with a very low number of signals, nor those with an extremely high one. Such cases would be categorized as chance discovery, where people are interested in predicting rare events such as a stock market crash. Clearly this is not the case in our current work, where we use EDDIE for investment opportunities forecasting. We are thus interested in datasets that have opportunities around 50-70% (i.e. 50-70% of actual positive signals). Therefore, we need to calibrate the values of r and n accordingly, so that we can obtain the above percentage from our data. For our experiments, the value of n is set to 20 days. The value of r varies, depending on the dataset. This is because one dataset might reach a percentage of 50-70% with r = 4%, whereas another one might need a higher or lower r value. Accordingly, we need to calibrate the value of the R constraint, so that EDDIE produces GDTs that forecast positive signals in a range which includes the percentage of the actual positive signals of the dataset we are experimenting with. R thus takes values in the range of [−5%, +5%] of the number of positive signals that the dataset has. For instance, if under r = 4% and n = 20 days, a dataset has 60% of actual positive signals, then R would be set to [55,65].
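The calibration described above amounts to two small checks, sketched here. The helper names (`positive_share`, `is_usable`, `r_constraint`) are hypothetical, and the toy signal list is made up; the 50-70% window and the ±5 margin are the values stated in the text.

```python
from typing import Sequence, Tuple


def positive_share(signals: Sequence[int]) -> float:
    """Percentage of actual positive signals in a dataset."""
    return 100.0 * sum(signals) / len(signals)


def is_usable(signals: Sequence[int], low: float = 50.0, high: float = 70.0) -> bool:
    """True if the share of positive signals lies in the 50-70% window."""
    return low <= positive_share(signals) <= high


def r_constraint(positive_pct: float, margin: float = 5.0) -> Tuple[float, float]:
    """Range allowed for the percentage of positive forecasts: actual share +/- margin."""
    return (positive_pct - margin, positive_pct + margin)


toy_signals = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]  # 60% positive
print(positive_share(toy_signals), is_usable(toy_signals))
print(r_constraint(positive_share(toy_signals)))
```

In practice one would recompute the signals for several candidate values of r, with n fixed at 20, and keep a value for which `is_usable` holds.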
Finally, we should mention that a single run of either version does not last for more than a few minutes. EDDIE 8 is, of course, slightly slower than EDDIE 7, due to its larger search space, but this does not seem to significantly affect its runtime.

Test Results
This section presents the experimental results after having tested the 10 datasets under EDDIE 7 and EDDIE 8. We start by observing how EDDIE 8 affects the fitness of the population during the training period. We are interested in seeing whether the extended grammar is giving EDDIE 8 an advantage, and if this is the case, how fast this happens during the evolutionary procedure. We then continue by presenting a summary statistics comparison between the two versions, under the data of the testing period. At this point we should mention that all fitness results have been normalized to a scale of [0,1]. The other measures (RC, RMC, RF) are already in this scale and thus no normalization took place.

Training performance comparison
In this section, we compare the training fitness of the two algorithms. As we have said, we are interested in examining the behaviour of the GP, now that it searches in a much bigger search space. Does it find very good solutions from the beginning of the evolutionary procedure, because now it has more options to look into? Or does it start with low performance due to these many options and later manage to focus on the promising ones? These are just two examples of behavioral questions we could be asking.
We conduct our analysis in two different parts. Firstly, we compare the training fitness in terms of the whole population. To do that, we calculate the average fitness for the whole population of GDTs; this process is done for each generation. Let us call this average $Avg_{Fit}$. Thus, we can observe how the GDTs' $Avg_{Fit}$ changes over the 50 generations of a single run. We then repeat this procedure for each one of the 50 runs. Finally, we calculate the average, over these 50 runs, of $Avg_{Fit}$. Figure 4 presents these results. Each line in the graph denotes the average $Avg_{Fit}$ for a different dataset. As we can see, the population of EDDIE 7 starts at generation 1 with an average fitness between 0.1-0.2, for all stocks. This quickly rises to 0.4-0.5 and stabilizes around 0.6, with half of the stocks slightly exceeding this level. On the other hand, EDDIE 8's population average fitness for all stocks starts from a much higher point, around 0.3. Fitness here also rises quickly to 0.5-0.6 and stabilizes between 0.6 and 0.7. As we can see, the average training fitness of EDDIE 8's population is somewhat higher than EDDIE 7's. This indicates that EDDIE 8's grammar has allowed it to come up with better individuals in the first generation, and thus start with a population that has higher fitness.
For the second part of our analysis, we compare the fitness of the best individual (i.e. the GDT with the highest fitness) per generation; this fitness is called $Best_{Fit}$. So now, instead of calculating the average fitness of the whole population for each generation, we just obtain the highest fitness. We can thus present how the highest fitness changes over the 50 generations of a single run. We then repeat this procedure for each one of the 50 runs. Finally, we find the average, over these 50 runs, of $Best_{Fit}$. Figure 5 presents these results. In order to get a clearer idea of these results, we have divided them into two graphs per algorithm. The first column presents the graphs for EDDIE 7, and the second one for EDDIE 8. The graphs at the top are for the first 5 stocks (in alphabetical order) and the bottom graphs are for the remaining 5 stocks. We can see that results vary per stock for both algorithms, although they seem to follow the same pattern. The $Best_{Fit}$ values for EDDIE 7 start from a range of [0.58,0.66], at generation 1, and reach up to a range of [0.64,0.74], at the last generation. The datasets for EDDIE 8 seem to follow a very similar behavior: the $Best_{Fit}$ values start in the range of [0.58,0.67] and end up in the range of [0.65,0.75].
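Both training curves follow the same recipe: reduce each generation's population to one number (the mean for the population average, the maximum for the best individual), then average that number over the runs. A minimal sketch, assuming the fitness values are stored as nested lists indexed by run, generation and individual (the storage layout and the function name are assumptions):

```python
from statistics import mean
from typing import Callable, List, Sequence


def curve_over_runs(
    fitness: Sequence[Sequence[Sequence[float]]],
    per_generation: Callable[[Sequence[float]], float],
) -> List[float]:
    """fitness[run][generation] holds the fitness values of that population.

    Returns, for each generation, the average over runs of per_generation(population).
    """
    n_generations = len(fitness[0])
    curve = []
    for g in range(n_generations):
        per_run = [per_generation(run[g]) for run in fitness]
        curve.append(mean(per_run))
    return curve


# Toy data: 2 runs, 2 generations, 2 individuals per population.
toy = [
    [[0.1, 0.3], [0.5, 0.7]],
    [[0.2, 0.4], [0.6, 0.6]],
]
avg_fit = curve_over_runs(toy, mean)  # average of the population averages
best_fit = curve_over_runs(toy, max)  # average of the best individuals
print(avg_fit, best_fit)
```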
Table 4 also presents the average $Best_{Fit}$ values for the first and the last generation. Each stock has 4 values, 2 for EDDIE 7 and 2 for EDDIE 8. The top value represents the average $Best_{Fit}$ at generation 1, and the bottom value represents the average $Best_{Fit}$ at generation 50. EDDIE 8's $Best_{Fit}$ starts with higher fitness for 7 stocks. This means that there are 3 stocks for which EDDIE 7 has better initial values: Carnival (0.6298), Hammerson (0.6121), Schroders (0.5935). In addition, at the end of the evolutionary procedure (generation 50), there are 2 stocks for which EDDIE 7's $Best_{Fit}$ is higher than EDDIE 8's: BAT (0.7320), and Hammerson (0.6894). However, these differences from EDDIE 7 are relatively small (below 1%).
As we can see, there can be times where EDDIE 7 outperforms EDDIE 8, although this is only to a small degree. Nonetheless, this is quite interesting, because it indicates that there can be cases where EDDIE 8 might not be able to outperform its predecessor. Of course, at this moment this is only an indication that comes from results during the training period and this is why more analysis needs to be conducted.

Summary Results for Testing Period
In this section we present summary results for the two algorithms, after the GDTs were applied to the testing period. The first part presents the averages of the metrics we used and the second part presents the improvements and diminutions caused by the best GDT evolved by EDDIE 8.

Average Results
We start with the average results for Fitness. In this way, we can have a general view of how the two algorithms have performed. We then move to the performance measures (RC, RMC and RF).
Figure 6 presents the average fitness results over the 50 runs for EDDIE 7 and EDDIE 8. As mentioned at the beginning of this section, the results have been normalized and are in the scale of [0,1].

Figure 4. Average of the average fitness of the population of the GDTs for EDDIE 7 and EDDIE 8. This means that we first obtain the average fitness of the whole population, per generation. Then we find the average of this number over the 50 runs.

Figure 5. Average $Best_{Fit}$. We first obtain the best GDT's fitness per generation, for each one of the 50 runs. This happens for both algorithms. We then calculate the average of these fitness values (over the 50 runs) and present them in this figure. For the convenience of the reader, we have split the stocks into 5 per graph (by alphabetical order). The graphs in the first column are for EDDIE 7 and the others for EDDIE 8.

In order to test for the statistical significance of these results, we use the Kolmogorov-Smirnov test (K-S). We find that EDDIE 8 is better in only 3 stocks (BP, Carnival, Hammerson) and worse in 4 (Cadbury, Next, Schroders, Unilever), at 5% significance level. We get a similar picture for the rest of the summary statistics results, namely RC, RMC and RF. Regarding the average RC (Fig. 7), EDDIE 8 is significantly better in 2 stocks only (Carnival, Hammerson), whereas it performs worse in 5 (Cadbury, Imperial Tobacco, Next, Schroders, Unilever). Figure 8 shows that EDDIE 8 is better in only 1 stock (Hammerson), in terms of average RMC, whereas EDDIE 7 performs better in 5 (BAT, BP, Next, Tesco, Unilever). Finally, Fig. 9 informs us that EDDIE 8 is better in 3 stocks (BAT, BP, Carnival, Tesco), in terms of RF, and worse in 5 (Cadbury, Imperial Tobacco, Next, Schroders, Unilever). The reader should bear in mind when reading the figures that we are interested in maximizing the values of Fitness and RC, and minimizing the values of RMC and RF. So when we say that EDDIE 8 performs better in terms of Fitness and RC, it means that these values have increased; on the other hand, when we say that EDDIE 8 performs better in terms of RMC and RF, this means that these values have decreased. Finally, we should again mention that all of the results reported here have been tested by the K-S test and were found to be significant at 5% significance level.
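For readers unfamiliar with the two-sample K-S test, the sketch below computes the K-S statistic for two sets of per-run results and compares it with a large-sample critical value at the 5% level. It is an illustration only: the function names are hypothetical, the constant 1.358 is the usual asymptotic coefficient for a two-sided test at 5%, and the exact test procedure used for the results above may differ. A library routine such as SciPy's `ks_2samp` would normally be preferred when available.

```python
import math
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute gap between the two empirical distribution functions."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Move both pointers past every observation <= x, then compare the ECDFs.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_critical_value(n: int, m: int, c_alpha: float = 1.358) -> float:
    """Large-sample critical value; c_alpha = 1.358 corresponds to the 5% level."""
    return c_alpha * math.sqrt((n + m) / (n * m))


def differ_at_5_percent(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if the two samples differ significantly under the approximation above."""
    return ks_statistic(a, b) > ks_critical_value(len(a), len(b))


print(ks_statistic([1, 2, 3], [4, 5, 6]))
print(ks_critical_value(50, 50))
```

With 50 runs per algorithm, two fitness samples would be declared different under this approximation when their empirical distributions are more than about 0.27 apart at some point.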

Best GDTs
In this section, we investigate the improvements and diminutions caused by the best GDT that was evolved by EDDIE 8. From now on, we will be referring to this GDT as Best-8. Best-8 is essentially the GDT with the highest fitness at the end of the training period, among all 50 runs. It is thus the best solution that EDDIE 8 could come up with, after these 50 individual runs. After obtaining Best-8, we apply it to the testing period. Likewise, we obtain the best GDT evolved by EDDIE 7, named Best-7, and also apply it to the testing period.
The reason for choosing to compare the best GDTs is straightforward. If an investor was using EDDIE to assist him with his investments, he would run the algorithm many times, and then pick the best GDT that was produced during training. Thus, by comparing Best-7 and Best-8, we can get insight into which EDDIE version would be more effective for an investor's predictions.
Table 5 presents the improvements and diminutions caused by Best-8, after having calculated the differences between Best-7 and Best-8, for each metric. Thus, an entry with a positive sign indicates that Best-8 has improved the results in that metric by the respective percentage. Likewise, an entry with a negative sign indicates that Best-8's results for that metric have declined by the respective percentage.
In addition, the last two rows of Table 5 present the mean of the above improvements and diminutions. Therefore, when we want to calculate the mean of improvements for Fitness, we sum up the values where Fitness is positive; we hence sum up the Fitness values for BAT (7.31), BP (1.05), Carnival (10.15), Tesco (3.27), Unilever (9.72) and then divide them by 5 (since that is the number of stocks with positive sign). Hence, when we want to calculate the mean of improvements for a metric, we calculate the mean for those values that have positive sign. On the other hand, when we want to calculate the mean of diminutions, we calculate the mean for those values that have negative sign. The same process stands for all metrics of the table.
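The calculation can be checked with a few lines of code. The five Fitness improvements are the values quoted above; the function name is hypothetical, and returning `None` when a group is empty is a choice made for this sketch.

```python
from typing import Optional, Sequence, Tuple


def mean_improvement_and_diminution(
    differences: Sequence[float],
) -> Tuple[Optional[float], Optional[float]]:
    """Mean of the positive entries and mean of the negative entries (None if none exist)."""
    gains = [d for d in differences if d > 0]
    losses = [d for d in differences if d < 0]
    mean_gain = sum(gains) / len(gains) if gains else None
    mean_loss = sum(losses) / len(losses) if losses else None
    return mean_gain, mean_loss


# Fitness improvements of Best-8 for BAT, BP, Carnival, Tesco and Unilever (in %).
fitness_gains = [7.31, 1.05, 10.15, 3.27, 9.72]
mean_gain, mean_loss = mean_improvement_and_diminution(fitness_gains)
print(round(mean_gain, 2), mean_loss)
```

The five values sum to 31.50, so the mean improvement for Fitness is 6.30%.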
Finally, apart from Fitness and the three metrics presented earlier in Sect. 2, Table 5 uses two additional metrics: Average Annualized Rate of Return (AARR), and Rate of Positive Return (RPR). Since the EDDIE application lies in finance, we consider that it would be beneficial to an investor to use as a reference performance criteria that are related to investment return. Obviously, the higher these metrics are, the higher the return for the investor. The formulas for these two additional metrics are presented in the Appendix. We should emphasize that these two metrics are given here only as reference, and are not part of the fitness function that EDDIE 7 and EDDIE 8 use.
What we can observe from Table 5 is that Best-8 does better than Best-7 for 5 stocks in terms of Fitness (BAT, BP, Carnival, Tesco, Unilever), for 4 stocks in terms of RC (BAT, Carnival, Tesco, Unilever), for 4 stocks in terms of RMC (BAT, Carnival, Imp. Tobacco, Schroders), for 6 stocks in terms of RF (BAT, BP, Carnival, Hammerson, Tesco, Unilever), for 5 stocks in terms of AARR (BAT, BP, Imp. Tobacco, Tesco, Unilever), and for 5 stocks in terms of RPR (BAT, BP, Carnival, Tesco, Unilever). The differences in the values of the metrics are often quite big; for instance, EDDIE 8 has improved the Fitness of BAT and Carnival by 7.31% and 10.15%, respectively. Even more remarkable are the differences in AARR: 31.03% for Tesco, and 48.81% for Unilever. Similar extremes can be observed for the diminutions. However, it seems that the improvements of Best-8 have a greater impact than its diminutions.
To make this clearer, let us move our focus to the last two rows of the table, where the mean of Best-8's improvements and diminutions in all metrics is presented.As we can see, improvements have on average had a greater effect than diminutions (6.30% vs -5.22% [Fitness], 8.00% vs -3.83% [RC], 11.64% vs -3.70% [RMC], 7.10% vs -6.96% [RF], 25.55% vs -16.61% [AARR], 8.62% vs -6.82% [RPR]).This is a very important result, because it indicates that an investor using EDDIE 8's best GDT would on average gain more than if he was using EDDIE 7's best tree.

Discussion on the summary statistics results
So far we have presented summary statistics for EDDIE 7 and EDDIE 8. From what we saw in the previous sections, EDDIE 7 outperforms EDDIE 8 in more stocks, in terms of all average statistics (Fitness, RC, RMC and RF). On the other hand, EDDIE 8 outperforms EDDIE 7 in terms of the average results of the best GDT.
An interesting observation from the above is that although EDDIE 8's best GDT can on average perform better than the one of EDDIE 7, this superiority is not reflected in the mean values of Fitness, RC, RMC, and RF. EDDIE 8 is able to come up with very good GDTs, sometimes even better than EDDIE 7's. However, the problem is that it does not come up with such trees often enough. Figure 10 illustrates this problem. It presents the relationship between performance (i.e. fitness) (x-axis) and precision (y-axis). It is divided into two parts. The top graph (Fig. 10a) presents the performance-precision values for stocks where EDDIE 8's average fitness is lower than EDDIE 7's. Let us denote these two fitness values by $ED8_{Fit}$ and $ED7_{Fit}$, respectively. The bottom graph (Fig. 10b) presents the performance-precision relationship for stocks where $ED8_{Fit} > ED7_{Fit}$.
What we can observe from Fig. 10 is that EDDIE 8 always has lower precision than EDDIE 7 for stocks where $ED8_{Fit} < ED7_{Fit}$. This indicates that EDDIE 8's GDTs are spread in a bigger fitness range than the ones of EDDIE 7. It seems that there is something preventing EDDIE 8 from having results with high fitness more often. The picture is exactly the opposite in Fig. 10b, where $ED8_{Fit} > ED7_{Fit}$. We can see that here EDDIE 8 is not having difficulties finding good solutions, with precision at least as good as EDDIE 7's.
To summarize, the conclusions we can draw are the following:
• EDDIE 8 can perform better than EDDIE 7
• However, there are stocks where $ED8_{Fit} < ED7_{Fit}$
• EDDIE 8's best GDT does on average better than EDDIE 7's best GDT
• EDDIE 8's precision is lower than EDDIE 7's, for stocks where $ED8_{Fit} < ED7_{Fit}$. This does not happen for stocks where $ED8_{Fit} > ED7_{Fit}$
• Therefore, there is something which prevents EDDIE 8 from returning high fitness GDTs more often. This unknown factor reduces EDDIE 8's precision and only happens when $ED8_{Fit} < ED7_{Fit}$

Hence, our next goal is to identify the reason that EDDIE 8 cannot return high fitness GDTs more often, for the stocks where $ED8_{Fit} < ED7_{Fit}$. One explanation could be that there is something special in the nature of the patterns of these stocks. We therefore need to deepen our analysis and try to explain when and why EDDIE 8 outperforms EDDIE 7.

Artificial Datasets
So far, the experiments were conducted on 10 empirical datasets. As we saw, results cannot be considered conclusive, since it is not yet clear why EDDIE 8 cannot always outperform EDDIE 7. This section attempts to provide an answer to this question, by presenting some previously derived results [27], where we used artificial datasets. The reason for using artificial datasets was twofold. First of all, a potential drawback of experimental work with real data is that we cannot be sure that there are always patterns in the data. As a result, a failure of an algorithm to find patterns could also be attributed to this fact. Of course, someone could argue that in our current work both EDDIE 7 and EDDIE 8 have managed to find patterns and that EDDIE 7 just happens to be better in more cases. Nonetheless, using our own artificial dataset can reassure us of the existence of such patterns. At the same time, artificial datasets can guarantee the absence of any noise. The second reason we used artificial datasets was that we could have control over the nature of the patterns. This was very important, because it enabled us to study the weaknesses and strengths of the algorithms, i.e. with what kind of data EDDIE 7 or EDDIE 8 would perform better.
In order to study how different patterns can affect the results, we created two different datasets, one with patterns from EDDIE 7's vocabulary only, and one with patterns from the extended vocabulary of EDDIE 8. It was then found that when patterns came from EDDIE 7's vocabulary, EDDIE 7 would perform significantly better than EDDIE 8. This was an interesting observation, because although EDDIE 7's patterns were included in EDDIE 8's search space, the latter seemed to have difficulties in finding those patterns. On the other hand, when we tested the two versions under the dataset that had patterns from EDDIE 8, EDDIE 7 was outperformed by EDDIE 8, which was of course something we anticipated.
Thus, what we concluded from the experiments under artificial datasets was that results were indeed affected by the patterns in the dataset. More importantly, it seems that there is a trade-off between 'searching in a bigger space' and 'search effectiveness'. Hence, when patterns come from EDDIE 7's limited vocabulary, EDDIE 8 has difficulties in searching effectively in such a small part of its search space. The solutions are indeed in its search space, but because they come from a very small area of it, EDDIE 8 cannot focus its search only on this area. The search space has increased exponentially and there is an obvious trade-off between the more expressive language that EDDIE 8 provides and the search efficiency of EDDIE 7.

Extending the Artificial Datasets' Results
So far, we have made two valuable observations in Sects. 4 and 5:
• EDDIE 8 has lower precision than EDDIE 7, for stocks where $ED8_{Fit} < ED7_{Fit}$
• EDDIE 8 performs better than EDDIE 7 (on artificial datasets), when patterns come from EDDIE 8's vocabulary. If, on the other hand, patterns come from EDDIE 7's vocabulary, then EDDIE 8 is having difficulties discovering them, and thus ends up with lower performance

We also said at the end of Sect. 4 that a plausible explanation for EDDIE 8's lower precision is that the nature of the patterns in the data prevents EDDIE 8 from performing well more often. Now, after having the insight from the artificial datasets' results, we want to see if we can apply our conclusions to the empirical datasets. We shall therefore move our focus to the indicators that EDDIE 8's GDTs use and examine their relation with EDDIE 7's vocabulary. We saw earlier that if patterns in the hidden function come from EDDIE 7's vocabulary, then EDDIE 8 is having difficulties discovering them. This is what we are going to investigate now with the empirical datasets. Our aim is to show that when $ED8_{Fit} < ED7_{Fit}$, it is because the GDTs of EDDIE 8 contain a high percentage of indicators that come from the vocabulary of EDDIE 7, or indicators very close to it. If this happens, it means that EDDIE 8 needs to look for patterns in a very small search space, and thus faces difficulties in doing so.
Note that here there are no hidden functions for EDDIE 8 to discover. When dealing with empirical datasets, we instead have "solutions". A solution is the GDT that had the highest fitness at the end of the training period and was then applied to the testing period. This GDT was the best solution EDDIE 8 could come up with for that specific run and that specific dataset.
Let us now look at the components of the best solution of EDDIE 8, which, as we said in Sect. 4.2.2, is called Best-8. As a reminder, Best-8 is obtained by first taking the "best GDT" (solution) of each individual run. We thus have 50 best GDTs, which presumably have high fitness. We finally pick the best one among them. Best-8 is therefore the best tree that EDDIE 8 could find among a total of 50 runs.³ We want to examine the components of Best-8 and calculate the percentage of indicators that come from the vocabulary of EDDIE 7. Figures 11 and 12 present these results for stocks where ED8 Fit < ED7 Fit (Fig. 11) and stocks where ED8 Fit > ED7 Fit (Fig. 12). The x-axis shows the number of days by which an indicator of EDDIE 8 is away from the pre-specified indicators of EDDIE 7. For instance, "+/-1" means that EDDIE 8's indicator is at a distance of +1 or -1 day from one of EDDIE 7's indicators. Thus, since EDDIE 7's indicators have lengths of 12 and 50 days, EDDIE 8's indicators in this example could have lengths of 11, 12, 13, 49, 50 and 51 days. The y-axis presents the percentage of EDDIE 8's indicators that fall within that distance of EDDIE 7's vocabulary.
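The distance measure described above can be sketched as follows. This is a minimal illustration rather than the code used in the experiments: the function names are our own, and the list of example periods is hypothetical, not taken from any Best-8 tree. Only the EDDIE 7 period lengths (12 and 50 days) come from the text.

```python
EDDIE7_PERIODS = (12, 50)  # period lengths of EDDIE 7's indicators (from the text)


def distance_to_eddie7(period, eddie7_periods=EDDIE7_PERIODS):
    """Smallest absolute gap, in days, between a period and EDDIE 7's periods."""
    return min(abs(period - p) for p in eddie7_periods)


def periods_within(max_distance, eddie7_periods=EDDIE7_PERIODS):
    """All period lengths lying within +/- max_distance days of EDDIE 7's periods."""
    return sorted({p + d
                   for p in eddie7_periods
                   for d in range(-max_distance, max_distance + 1)})


def share_within(periods, max_distance, eddie7_periods=EDDIE7_PERIODS):
    """Percentage of the given periods within +/- max_distance days of EDDIE 7's."""
    if not periods:
        return 0.0
    close = sum(1 for p in periods
                if distance_to_eddie7(p, eddie7_periods) <= max_distance)
    return 100.0 * close / len(periods)


# Hypothetical periods read off the indicators of one tree (illustrative only).
example_periods = [9, 14, 27, 33, 46, 53, 61, 48, 16, 38]

if __name__ == "__main__":
    print("+/-1 day covers:", periods_within(1))
    for k in (0, 4, 6):
        print(f"+/-{k} days: {share_within(example_periods, k):.0f}%")
```

Here `periods_within(1)` reproduces the "+/-1" example from the text (11, 12, 13, 49, 50 and 51 days), and `share_within` gives the cumulative percentage for a range such as [-4,+4] days.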
As we can see from Fig. 11, even though none of the 4 stocks' Best-8 trees uses any indicator from the vocabulary of EDDIE 7 (all stocks have 0% at +/-0 days), they use indicators in a very close range. To be more specific, 50-60% of the Best-8 indicators for these 4 stocks are close to indicators from EDDIE 7's grammar, within a range of [-4,+4] days; this percentage increases to 50-80% for a range of [-6,+6] days.
On the other hand, for stocks where EDDIE 7 is outperformed by EDDIE 8 (Fig. 12), this percentage is much lower. For the range of [-4,+4] days, Best-8 for all 3 stocks has a percentage of 18-30%. For the range of [-6,+6] days, this percentage increases only a little, to 18-44%, which is clearly much lower than the percentages we observed in Fig. 11.
Our theory hence seems to be verified. EDDIE 8's performance is indeed affected by the nature of the patterns in the GDTs. When these patterns come from EDDIE 8's broader vocabulary, EDDIE 8 has no problem finding these GDTs. On the other hand, when solutions come from a very small space (in our case a search space around that of EDDIE 7), EDDIE 8 has difficulty focusing there. This, as a consequence, affects EDDIE 8's performance results, which become poorer than those of EDDIE 7.
In this paper we presented two algorithms for forecasting investment opportunities, EDDIE 7 and EDDIE 8. EDDIE 8 is an extension of EDDIE 7, because it extends its grammar. Traditionally, EDDIE 7 and other similar GP algorithms use predefined indicators with a pre-specified period length to forecast the future movements of the price. In our approach, we suggested that it should be left to the GP to decide the optimal period length. EDDIE 8 is thus an improvement on the previous algorithm, because it has a richer grammar and because it can come up with solutions that EDDIE 7 could never discover. In addition, the improvements introduced by the best GDT evolved by EDDIE 8, called Best-8, have on average a greater impact than its diminutions. This is quite an important finding, because it indicates that an investor using EDDIE 8's best GDT would on average gain more than one using EDDIE 7's best tree. More specifically, we found that EDDIE 8's best tree was, on average, able to outperform EDDIE 7's best tree in terms of all performance measures. In addition, Best-8 had a significantly higher average annual return (AARR) than Best-7, which means that, on an annual basis, an investor would make more profit by using EDDIE 8. The above allows us to characterize EDDIE 8 as a successful extension of its predecessor and a valuable forecasting tool.
However, there seems to be a trade-off between 'discovering new solutions' and 'effective search'. Results from 10 empirical datasets from the FTSE100 showed that EDDIE 8 cannot always outperform EDDIE 7. In order to further understand this behaviour, we used previously derived results on artificial datasets. 27 Those results suggested that EDDIE 8 can outperform EDDIE 7, as long as the solutions come from its own vocabulary. If they come from EDDIE 7's, then EDDIE 8 has difficulty finding these solutions, because it has to look in EDDIE 7's narrow search space. These results were also verified by our empirical datasets.
We can thus conclude that the current version of EDDIE 8 has its limitations. Nevertheless, EDDIE 8 is still a very valuable tool, because it can guarantee significantly higher profits than its predecessor, as already explained.
Future research will focus on improving EDDIE 8's search effectiveness. There are different ways to do this. A promising way that we have already investigated is hyper-heuristics, [32][33][34][35] for which we created a framework for financial forecasting. 36 The results were promising, because hyper-heuristics led to a significant decrease in the RMC; another promising result was that the search became more effective, since more areas of the search space were visited. We believe that more sophisticated hyper-heuristic frameworks, which will include more heuristics than the ones used in Ref. 36, can lead to even better results. We therefore intend to focus on that direction. Another direction of our research could be to produce some new search operators or to create a new constrained fitness function.

Figure 1 The Backus Naur Form of EDDIE 7.

Figure 3 The Backus Naur Form of EDDIE 8

Figure 6 Summary results over 50 runs for fitness for EDDIE 7 and EDDIE 8. Results are normalized to [0,1] scale.

Figure 7 Summary results over 50 runs for RC for EDDIE 7 and EDDIE 8. Results are on the scale of [0,1].

Figure 8 Summary results over 50 runs for RMC for EDDIE 7 and EDDIE 8. Results are on the scale of [0,1].

Figure 9 Summary results over 50 runs for RF for EDDIE 7 and EDDIE 8. Results are on the scale of [0,1].

Figure 10 Performance-precision relationship. The x-axis presents the performance (average fitness) and the y-axis the average precision over 50 runs. The top graph (a) presents the stocks where ED8 Fit < ED7 Fit. The bottom graph (b) presents the stocks where ED8 Fit > ED7 Fit.

Figure 11 Percentage of EDDIE 8's indicators that are close to EDDIE 7's vocabulary, for the stocks where ED8 Fit < ED7 Fit. The y-axis shows this percentage. The x-axis shows the number of days by which an EDDIE 8 indicator is away from the pre-specified indicators of EDDIE 7.
Figure 12 The same percentage for the stocks where ED8 Fit > ED7 Fit.

Table 2 Confusion Matrix
thing as a Variable symbol in EDDIE 8. Instead, there is the VarConstructor function, which takes two children. The first one is the indicator, and the second one is the Period. Period is an integer within the parameterized range [MinP, MaxP] that the user specifies.

Table 4 Average Best Fit at generations 1 and 50, for EDDIE 7 and EDDIE 8, over the 10 stocks. Each stock has 4 values, 2 for EDDIE 7 and 2 for EDDIE 8. The top value represents the average Best Fit at generation 1, and the bottom value represents the average Best Fit at generation 50.

Table 5 Improvements and diminutions of Best-8 in Fitness and the other metrics, for all 10 stocks. The data presented in the first ten rows is the difference between the metric values of EDDIE 8 and EDDIE 7. The last two rows of the table present the mean of the improvements and of the diminutions of Best-8 for each metric.