Volatility Forecasting
A comparative study of different forecasting models.

Emil Sturesson, Anton Wennström
Bachelor’s Thesis in Financial Economics, 15HP
Supervisor: Marcin Zamojski
University of Gothenburg
Sweden
Spring Term 2023

Abstract

This study evaluates the out-of-sample forecasting performance of different volatility models. When applied to XACT OMXS30, we use GARCH(1,1), EGARCH(1,1), and t-GAS(1,1) to forecast squared daily returns while Realized GARCH(1,1) and HAR-RV are used to forecast Realized Variance. We forecast both measures with open-close as well as close-close data. One-day-ahead forecasts are computed using a five-year moving window. The performance is measured with two different loss functions, MSE and QLIKE. The Diebold-Mariano test is then used to test significance. Our findings indicate that EGARCH(1,1) is superior when forecasting squared daily returns and that HAR-RV is superior when forecasting Realized Variance. Comparing EGARCH and HAR-RV, we find that the latter is more accurate for a symmetrical loss function while EGARCH is superior using the QLIKE loss function. We find no evidence indicating that Student’s t-distribution for the conditional volatility improves forecasting accuracy. Finally, we conclude that open-close data generates smaller forecast errors than close-close data.

Keywords: Volatility, GARCH, EGARCH, t-GAS, HAR-RV, Realized GARCH, Volatility Forecasting, Volatility Modelling

Acknowledgements

We would like to express our gratitude to our supervisor, Marcin Zamojski, for his exceptional guidance, support, and expertise throughout the completion of our thesis. Further, we would like to extend our appreciation to the teachers at the University of Gothenburg for their knowledge and engaging teaching methods that have provided a solid foundation for the research conducted in this thesis. Finally, we would like to thank our opponents for valuable feedback and discussions.
Contents

1 Introduction
2 Literature Review
  2.1 Volatility
  2.2 Previous Comparisons
  2.3 Performance Evaluation
3 Methods
  3.1 GARCH-models
    3.1.1 ARCH(q)
    3.1.2 GARCH(1,1)
    3.1.3 E-GARCH(1,1)
  3.2 GAS(1,1)
    3.2.1 t-GAS(1,1)
  3.3 Realized Variance
    3.3.1 HAR-RV
    3.3.2 Realized GARCH(1,1)
  3.4 Performance Evaluation
    3.4.1 Benchmark
    3.4.2 Loss Functions
    3.4.3 Diebold-Mariano Test
4 Data
  4.1 Daily Returns
  4.2 Realized Returns
5 Results
  5.1 Squared Daily Returns
    5.1.1 Open-Close
    5.1.2 Close-Close
  5.2 Realized Variance
    5.2.1 Open-Close
    5.2.2 Close-Close
  5.3 EGARCH vs. HAR-RV
  5.4 Volatility Shocks
6 Conclusions

1 Introduction

The financial environment is and has always been influenced by a forward-looking perspective where expectations about future events are reflected in the current price of financial assets. To gain an advantage, or simply to limit risk, people constantly try to predict what will happen in the future based on available information. One of the areas where forecasting has been applied and studied to a great extent is volatility. Assets with high volatility are more risky, and investors demand higher returns from them to justify the risk. Consequently, modeling the conditional variances of assets is a significant research area in finance, which this study focuses on. Specifically, we evaluate and compare the performance of different volatility models, including GARCH(1,1), EGARCH(1,1), t-GAS(1,1), HAR-RV, and Realized GARCH(1,1), on the OMXS30 index over a period of 10 years. Our primary contribution is to provide empirical evidence on the out-of-sample performance of these models. Given the varying strengths and weaknesses of different forecasting models, this study aims to compare their performance to identify the most accurate model in regard to squared daily returns and realized measures.

The Autoregressive Conditional Heteroskedasticity (ARCH) model introduced by Engle (1982) allows the conditional variance to change over time in order to capture volatility clustering. Bollerslev (1986) proposes a Generalized ARCH (GARCH) model that is able to capture the clustering effect using fewer lags than the ARCH.
Different GARCH models have been proposed to deal with different distributions and properties: Nelson (1991) introduces the Exponential GARCH (EGARCH) model to deal with the fact that volatility seems to be larger following negative returns than positive returns. Creal et al. (2013) propose a score-driven model that encompasses the Gaussian GARCH(1,1) as well as many other models. Since the model is score-driven it can provide more detailed information about the probability density function than just the mean and higher moments, which makes it more suitable to model conditionally t-distributed returns (t-GAS) than a GARCH model would be.

Much of the recent development in the field of volatility modelling is focused on realized measures. Andersen et al. (2003) propose Realized Variance (RV) as an unbiased estimator of the true volatility. Other, more complex realized estimators have been proposed to deal with microstructure noise (see Barndorff-Nielsen et al., 2008). Hansen et al. (2012) propose a GARCH model that uses realized measures together with daily returns to estimate volatility. HAR-RV by Corsi (2009) is another powerful model that utilizes realized measures. While standard GARCH models show no scaling behaviour, that is, they do not incorporate long-memory processes, the HAR-RV is beneficial to include in this study as it is able to capture volatility persistence over a longer time horizon.

When modelling and forecasting volatility it is necessary to consider whether a model is observation-driven or parameter-driven (Creal et al., 2013). The models that we have discussed so far are all observation-driven. These models are commonly used as they only use past observations to predict the parameters, which is computationally simpler than for parameter-driven models. Parameter-driven models assume stochastic parameters that are at least partly independent and therefore the parameters require more complex estimation methods.
One of the most used parameter-driven models is the stochastic volatility model (SV). In the interest of time, we choose to focus on observation-driven models, to which the GARCH models belong.

In this paper, we study two sets of return series, open-close and close-close. The latter is expected to be more volatile and it is, therefore, of interest to analyse the two sets of data separately. When estimating the models, we utilize intraday data for the models incorporating realized variance. The return series is re-sampled at different frequencies in order to find the optimal frequency with the least microstructure noise. Further, we check the observations for autocorrelation and examine their unconditional distribution. Once we have shown that historical squared daily returns and realized variance provide sufficient information to predict future volatility, the models are estimated and used for one-step-ahead forecasting using a rolling window.

To evaluate the forecasts we use two different loss functions: mean squared error (MSE) and QLIKE. These are two robust loss functions that are able to handle imperfections in the proxy of the true volatility. Further, we use the Diebold-Mariano test to see whether the differences in the loss functions are significant or not. This is necessary for us to be able to draw any conclusions about which model provides the best volatility forecast.

Our findings when evaluating the forecasting performance of the volatility models are that the EGARCH model outperforms GARCH and t-GAS for both the MSE and the QLIKE loss function. For models using realized measures, we conclude that the HAR-RV model significantly outperforms the Realized GARCH model for both the MSE and the QLIKE loss function.
Comparing EGARCH and HAR-RV, we see that the superior model depends on which loss function is being used: EGARCH is superior when evaluated by the QLIKE loss function, while HAR-RV is significantly better when applying MSE.

The rest of the paper is structured as follows. In Section 2, we discuss the previous literature in the area. In Section 3, we explicate the econometric methodology, incorporating a detailed explanation of the model specifications and performance measures employed in our empirical analysis. In Section 4, we discuss data collecting and processing. In Section 5, we present the results of our analysis. Finally, in Section 6, we present our conclusions.

2 Literature Review

2.1 Volatility

When forecasting an unobserved variable, a common problem is that one has to choose a proxy that serves as an unbiased estimator. The choice of the proxy has a significant impact when forecasting the variable of interest. If the forecast does not depend on the proxy, it means that the proxy variable is not a useful predictor of volatility, which results in a worse-performing model. While many of the GARCH models utilize squared daily returns as a proxy for the true level of volatility, many studies indicate that the realized variance is a much more informative and precise estimator for the current level of volatility (Andersen & Bollerslev, 1998; Hansen et al., 2003).

Bandi and Russell (2008) discuss a trade-off between bias and variance of the estimator that has to be considered when estimating realized variance. They state that higher sampling frequency results in more precise estimates when the true price process is observable. But in reality, the true price process is not observable due to microstructure frictions. Increasing the sampling frequency will in this case provide a higher degree of information about the variance; however, it will also include a higher degree of noise (bid-ask spread, etc.).
Patton (2011) shows that volatility proxies with less noise can significantly improve forecasting ability. However, even though realized variance is seen as a superior volatility proxy compared to squared daily returns, L. Y. Liu et al. (2015) show that it can still incorporate a relatively large degree of distortion and, therefore, leads to a trade-off between better accuracy with higher frequency and significant effects of microstructure noise. Patton (2011) shows how this proxy gets more efficient as the observation frequency increases. Using 5-min returns when analysing a stock index return significantly reduces the noise compared to when using half-hour returns.

Bollerslev et al. (2006) identify another significant advantage of using intra-day data as it can provide a more accurate assessment of the two key factors that drive the asymmetric relationship between volatility and past returns. Specifically, the leverage effect that explains why negative returns tend to result in higher volatility and the volatility feedback effect that describes how higher volatility levels can lead to negative returns. In lower frequency data, e.g., daily observations, these causal relationships may appear immediately and can be indistinguishable from one another. Therefore, by using high-frequency data, it is possible to differentiate between the leverage effect and the volatility feedback effect more clearly and describe the relationship between past returns and volatility.

In a comprehensive study of over 400 estimators of asset price variation across various asset classes, L. Y. Liu et al. (2015) conclude that realized variance is not inferior to any other estimator. The study uses five-minute realized variance as the benchmark realized measure and finds little evidence suggesting its inferiority to other estimators.
Moreover, adopting a five-minute sampling frequency for realized variance yields superior results for less liquid assets, while the advantages of utilizing more advanced estimators are more noticeable for liquid assets.

Engle and Patton (2001) classify volatility models into two main categories. The first category, known as observation-driven models, involves formulating conditional variance as a direct function of observable variables. The second category, referred to as latent volatility models, i.e., parameter-driven models, is based on variables that are not solely observable, which makes it more difficult to forecast the future volatility compared to when utilizing observation-driven models. An example of a parameter-driven model is the Stochastic Volatility model (SV). Further, the authors highlight some stylized facts about asset price and volatility that they believe have to be incorporated by volatility models to provide accurate results. The four main empirical properties discussed are as follows:

1. Clustering, periods of large or small changes tend to come in clusters. Today’s volatility shocks will have a lasting impact on the anticipated volatility for many future periods.
2. Leverage effect, volatility increases more after a negative price shock than after positive returns of the same size.
3. Mean reversion, there is a normal level of volatility that the volatility eventually returns to. Long-run forecasts should converge to the normal level of volatility.
4. Heavy tails, the unconditional distribution of returns has fat tails.

2.2 Previous Comparisons

When forecasting conditional variances there are multiple GARCH-type models to use, each with different modifications and adjustments to fit the observed data. Hansen and Lunde (2005) compare forecasts generated by 330 different GARCH models to determine whether there are any models that prove to be better at forecasting the conditional variance than the most commonly used, GARCH(1,1) model.
While they show that GARCH(1,1) is not outperformed when forecasting exchange rate data, it seems to perform worse than many other models when forecasting stock returns, more specifically GARCH models that incorporate the leverage effect, e.g., EGARCH.

Bollerslev (1987) studies the distributional properties of stock returns and their possible implications for the performance of volatility forecasting models. When comparing the performance of the normal GARCH(1,1) model to the t-distributed GARCH(1,1) model he concludes that the latter performs relatively well as it can capture the non-Gaussian properties of the return series, i.e., heavy tails.

H.-C. Liu and Hung (2010) investigate the effectiveness of various GARCH models with different distributional assumptions as well as their ability to incorporate the leverage effect. According to the study, while GARCH models that assume various probability distributions are not very effective in improving forecasting performance in the presence of fat-tailed distributions, asymmetric models such as EGARCH and GJR-GARCH show better results in predicting stock market volatility. Moreover, modeling an asymmetric component is more important than adjusting the error term distribution, and using a Gaussian distribution is recommended when using a GARCH model.

Christoffersen et al. (2014) further analyse whether utilizing realized measures in volatility models may not only improve the forecasting accuracy but also bring economic gains. They create a model that combines two volatility components, namely daily returns and realized measures. Their findings demonstrate that the inclusion of realized measures leads to reduced prediction errors across key economic benchmarks, including moneyness, maturity, and volatility. This suggests that utilizing realized measures in volatility models not only improves forecasting accuracy but also holds potential economic benefits.

Koopman et al. (2016) study the ability of observation-driven models and parameter-driven models to predict time-varying parameters. When comparing models from the two classes, it is shown that an observation-driven model with a score function, e.g., t-GAS, performs as well as correctly specified parameter-driven counterparts. Score-driven models are a class of observation-driven models that consider all relevant features of the observation density function and provide a general way of updating parameters. These properties make score-driven models commonly used for the purpose of volatility forecasting (Artemova et al., 2022).

2.3 Performance Evaluation

Since true volatility cannot be observed, a proxy for the true volatility has to be used when forecasting volatility. The estimation error of the proxy itself is likely to distort the evaluation of the models. Many studies have shown that the superior model depends on the choice of loss function, see Hansen and Lunde (2005) and Patton (2011). The latter also concludes that the two loss functions that are robust enough to handle imperfections in the proxy for volatility are Mean Squared Error (MSE) and Quasi-Likelihood (QLIKE). The QLIKE function is an asymmetrical loss function that penalizes under-prediction more heavily than over-prediction, while the MSE penalizes symmetrically. This implies that if you are comparing two forecast procedures, and one consistently produces positively biased forecasts, while the other produces forecasts that are negatively biased by the same magnitude, then the QLIKE function can significantly favor the positively biased forecast. Penalizing under-prediction more heavily than over-prediction is preferred as the former is normally more costly and, therefore, of importance when considering activities such as risk management.
Sharma and Vipul (2016) show that the Realized GARCH outperforms the EGARCH when using the QLIKE loss function; when using MSE, however, the EGARCH model is superior.

The Diebold-Mariano test (Diebold & Mariano, 1995) is useful when testing and comparing the accuracy of forecasts for two different models. However, when Diebold (2015) looks back on his earlier work he concludes that the DM test is to a large extent applied with an improper purpose or intention. Instead of comparing forecast performance, much literature uses the DM test to compare the models themselves. Keeping this distinction clear is necessary to avoid drawing false conclusions from the results of a study. When the DM test is applied with consideration to its actual intentions, it can be very useful thanks to its simplicity and wide applicability.

Harvey et al. (1997) analyse the behaviour of the DM test further and conclude that, although it is easily computed, it generally performs better for large samples while it tends to be oversized for smaller sample sizes. When applied on larger samples, the DM test performs well even in the case of autocorrelated forecast errors, and fat-tailed as well as Gaussian distributed errors.

3 Methods

We calculate daily open-close and close-close log returns as follows:

$$r_{OC,t} = \ln(P_{close,t}) - \ln(P_{open,t}), \tag{1}$$

$$r_{CC,t} = \ln(P_{close,t}) - \ln(P_{close,t-1}). \tag{2}$$

Using log returns is preferred when working with time series of stock prices, as in this case, since it allows us to add up periods of returns and say something about the total return of that period. For example, if we have the log returns for two consecutive days we can simply add these two to get the total return over the two days.

Since volatility is not an observable variable it is necessary to make use of a signal as an approximation of the true volatility. In this study, the daily volatility is approximated by:

$$\sigma_t^2 = r_t^2. \tag{3}$$

The one-day-ahead forecast $\sigma^2_{T+1|T}$ is generated from the data $t = 1, \dots, T$.
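The return definitions in equations (1) and (2) and the squared-return proxy in equation (3) can be illustrated with a minimal Python sketch. The function names and the price values below are hypothetical and only serve to show the arithmetic, including the additivity of log returns over consecutive days; this is not the code used in the study.

```python
import math


def log_returns(opens, closes):
    """Daily open-close and close-close log returns, eqs. (1) and (2)."""
    r_oc = [math.log(c) - math.log(o) for o, c in zip(opens, closes)]
    # A close-close return needs the previous close, so the series starts on day 2.
    r_cc = [math.log(closes[t]) - math.log(closes[t - 1])
            for t in range(1, len(closes))]
    return r_oc, r_cc


def squared_return_proxy(returns):
    """Daily variance proxy from eq. (3): the squared log return."""
    return [r * r for r in returns]


# Hypothetical prices for three trading days (illustration only).
opens = [100.0, 101.5, 100.8]
closes = [101.0, 100.5, 102.0]

r_oc, r_cc = log_returns(opens, closes)
proxy_cc = squared_return_proxy(r_cc)

# Log returns add up: the two daily close-close returns sum to the
# log return from the first close to the last close.
total = math.log(closes[-1]) - math.log(closes[0])
```

Here `sum(r_cc)` equals `total` up to floating-point error, which is the additivity property referred to above.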
We use a rolling window so that the following one-day-ahead forecast is given by $t = 2, \dots, T+1$. The length of the rolling window $T$ is set to 1260, which is approximately five years’ worth of trading days. Using a rolling window has several advantages as it allows for a more dynamic and adaptive analysis of the return series. The approach can help capture temporal patterns and trends in the data over time. Another reason for using a rolling window is that it may provide more precise estimates as we continuously re-estimate the parameters with regard to new information as we move forward in time. A drawback of rolling window estimates is the sensitivity to window size. Too small a window might not capture the underlying level of volatility, and outliers could have a significant impact on the estimates, while too large a window could make the model slow at catching up to changes in the underlying volatility.

3.1 GARCH-models

3.1.1 ARCH(q)

The ARCH(q) model, first introduced by Engle (1982), exploits the fact that variance appears to change over time and that periods of small and large returns tend to be clustered. The model allows for lagged past values of residuals to influence the current level of volatility. The ARCH(q) model is defined as:

$$\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2, \tag{4}$$

where $q$ is the number of lags, $\sigma_t^2$ is the conditional variance at time $t$, $\varepsilon_t$ is the residual term at time $t$, and $\omega$ and $\alpha_i$ are constants estimated by maximum likelihood. To avoid negative variance, restrictions have to be imposed on these constants. More specifically, they have to be positive. If $\sum_{i=1}^{q} \alpha_i < 1$ then $\sigma_t^2$ is stationary. One of the problems with the ARCH(q) is that many lags are needed for the model to perform well in practice, which is not desirable as it requires the user to estimate many parameters.
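The ARCH(q) recursion in equation (4) and its stationarity condition can be sketched as follows. This is a minimal illustration with hypothetical parameter values and residuals; in the study the parameters are estimated by maximum likelihood, which is not shown here. The function names are assumptions introduced for the example.

```python
def arch_variance(residuals, omega, alphas):
    """Conditional variances implied by eq. (4).

    Element k of the result is the variance for (0-based) period q + k,
    so the last element is the one-step-ahead forecast for the period
    right after the sample.
    """
    # The text requires positive constants; zero alphas are tolerated
    # here, which is an assumption of this sketch.
    if omega <= 0 or any(a < 0 for a in alphas):
        raise ValueError("omega must be positive and alphas non-negative")
    q = len(alphas)
    variances = []
    for t in range(q, len(residuals) + 1):
        # sigma^2_t = omega + sum_i alpha_i * eps_{t-i}^2
        lagged = sum(alphas[i] * residuals[t - 1 - i] ** 2 for i in range(q))
        variances.append(omega + lagged)
    return variances


def is_stationary(alphas):
    """Stationarity condition stated in the text: sum of alphas below one."""
    return sum(alphas) < 1.0


# Hypothetical ARCH(2) parameters and residuals (illustration only).
eps = [0.01, -0.02, 0.015]
sigma2 = arch_variance(eps, omega=1e-5, alphas=[0.3, 0.2])
```

With these inputs the sketch returns two variances: one for the third observation and one forecast for the period after the sample.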
3.1.2 GARCH(1,1)

Bollerslev (1986) proposes a generalization of the original ARCH-process which allows for past conditional variances in the current conditional variance equation. This extension of the ARCH(q) model allows for a much more flexible lag structure by including past volatility as a describing factor for the current volatility. The most commonly used specification for GARCH(p,q) is GARCH(1,1), which is defined as:

$$\sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2, \tag{5}$$

where, in addition to the ARCH-parameters, $\beta_1$ is another constant estimated by maximum likelihood. In addition to the restrictions imposed on ARCH(q), $\alpha_1 + \beta_1 < 1$ and $\beta_1 > 0$ should hold for GARCH(1,1) to ensure that $\sigma_t^2$ is stationary. It can be shown that GARCH(1,1), by repeated substitution, can be rewritten as an ARCH(∞), which shows that GARCH captures the clustering with fewer lags.

3.1.3 E-GARCH(1,1)

A limitation of both ARCH(q) and GARCH(1,1) is that they fail to capture the fact that the market reacts more strongly to negative shocks than positive shocks (leverage effect). Nelson (1991) introduces the EGARCH(p,q) model to deal with this. The model allows for an asymmetric response to positive and negative shocks by including a leverage term in the equation for the conditional variance. The leverage effect is, as described in Section 2.1, also confirmed as a stylized fact by Engle and Patton (2001). In this paper, we use the EGARCH(1,1)-specification which follows:

$$\log(\sigma_t^2) = \omega + \alpha \left[ \gamma \frac{\varepsilon_{t-1}}{\sigma_{t-1}} + \frac{|\varepsilon_{t-1}|}{\sigma_{t-1}} - \sqrt{2/\pi} \right] + \beta \log(\sigma_{t-1}^2), \tag{6}$$

where $\gamma$ is the leverage coefficient, which captures the leverage effect if $\gamma < 0$. If $|\beta| < 1$ the EGARCH(1,1) is stationary. Using the logarithmic form of $\sigma_t^2$ allows the parameters to be negative while keeping the conditional variance positive.

3.2 GAS(1,1)

Creal et al. (2013) propose a new class of observation-driven models, Generalized Autoregressive Score (GAS) models.
Since the GAS-framework is score-driven it is very flexible and it is possible to obtain many other observation-driven models within the GAS-framework. The simplest specification of the model is the GAS(1,1):

$$\sigma_t^2 = \omega + A_1 s_{t-1} + B_1 \sigma_{t-1}^2, \tag{7}$$

$$s_t = S_t \cdot \nabla_t, \tag{8}$$

where $\nabla_t$ refers to the score and $S_t$ is the scaling matrix. $S_t$ can be specified in many different ways, allowing for flexibility in the GAS-filters. In this paper, we derive the optimal filter by specifying $S_t$ as the inverse of the Fisher information. For the Gaussian distribution, the filter equation is:

$$\sigma_t^2 = \omega + A_1 (\varepsilon_{t-1}^2 - \sigma_{t-1}^2) + B_1 \sigma_{t-1}^2, \tag{9}$$

which is equivalent to GARCH(1,1). Note that the coefficients differ, but since $\beta_1 = B_1 - A_1$ and $\alpha_1 = A_1$, equations (5) and (9) are equivalent. If $A_1 = B_1$ the model is instead reduced to ARCH(1).

3.2.1 t-GAS(1,1)

Since the GAS models utilize the optimal updates (Blasques et al., 2015), we see that GARCH(1,1) is optimal for a Gaussian distribution. However, it is of interest to test if forecasting performance can improve by assuming a different conditional distribution. If we instead assume Student’s t-distribution, we obtain t-GAS(1,1), which has the following filter equation:

$$\sigma_t^2 = \omega + A_1 \left( \frac{v+3}{v} \right) \left[ \left( 1 + \frac{\varepsilon_{t-1}^2}{(v-2)\sigma_{t-1}^2} \right)^{-1} \left( \frac{v+1}{v-2} \right) \varepsilon_{t-1}^2 - \sigma_{t-1}^2 \right] + B_1 \sigma_{t-1}^2, \tag{10}$$

where $v$ is the degrees of freedom. As we can see, assuming Student’s t-distribution, GAS(1,1) is not equivalent to GARCH(1,1). By including t-GAS(1,1) we allow for heavier tails in the conditional distribution. This also makes the model more robust to outliers than its Gaussian counterpart.

3.3 Realized Variance

The measure we use for realized variation is the Realized Variance (RV) proposed by Andersen and Bollerslev (1998). The RV for a single day $t$ is given by:

$$RV_t = \sum_{i=1}^{M} (r_{t,i})^2, \tag{11}$$

where $r_{t,i}$ is the $i$th intraday return on day $t$ and $M$ is the number of intraday returns. Realized volatility at time $t$ is approximated as $\sqrt{RV_t}$.
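As a small worked illustration of equation (11), the sketch below computes the RV and the corresponding realized volatility from a list of intraday prices for one day. The function name and the price values are hypothetical, and the sketch covers only the intraday (open-close) part of the measure.

```python
import math


def realized_variance(intraday_prices):
    """RV_t from eq. (11): the sum of squared intraday log returns."""
    if len(intraday_prices) < 2:
        raise ValueError("at least two prices are needed for one return")
    returns = [math.log(later) - math.log(earlier)
               for earlier, later in zip(intraday_prices, intraday_prices[1:])]
    return sum(r * r for r in returns)


# Hypothetical intraday prices on an equally spaced grid (illustration only).
prices = [100.0, 101.0, 100.0, 102.0]

rv = realized_variance(prices)        # M = 3 intraday returns
realized_vol = math.sqrt(rv)          # realized volatility, sqrt(RV_t)
```

Passing a coarser grid, for example `prices[::2]`, gives the RV at a lower sampling frequency, which is the kind of comparison used when choosing the sampling frequency.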
In contrast to the open-close RV ($RV_{OC,t}$), the close-close RV ($RV_{CC,t}$) includes the squared return between 17:24 at time $t-1$ and 09:00 at time $t$.

3.3.1 HAR-RV

Corsi (2009) proposes a Heterogeneous Autoregressive model of Realized Volatility (HAR-RV). The model concentrates on heterogeneity originating from investors’ difference in time horizons. Some investors trade at a very short, intra-daily frequency while others might trade less frequently, such as once a month. The idea of the model is that agents with different types of trading horizons perceive, react to, and cause different types of volatility components. In his model, Corsi identifies three types of components: short-term (daily), mid-term (weekly), and long-term (monthly). The HAR-RV model follows:

$$RV_t = \omega + \beta^{(d)} RV_{t-1}^{(d)} + \beta^{(w)} RV_{t-1}^{(w)} + \beta^{(m)} RV_{t-1}^{(m)}, \tag{12}$$

where $RV_{t-1}^{(w)} = \frac{1}{5} \sum_{k=1}^{5} RV_{t-k}^{(d)}$ and $RV_{t-1}^{(m)} = \frac{1}{22} \sum_{k=1}^{22} RV_{t-k}^{(d)}$. In other words, the weekly RV is the average of the last 5 days’ RV and the monthly RV is the average of the last 22 days’ RV. An empirical fact of RV is that it tends to exhibit high serial correlation over many lags, which is something that is captured by both the weekly and monthly terms in HAR-RV. In contrast to the other models in this study, for HAR-RV the coefficients are estimated using linear regression, which is equivalent to maximum likelihood when the errors are Gaussian.

3.3.2 Realized GARCH(1,1)

Hansen et al. (2012) propose a GARCH model that utilizes realized measures of volatility. This model, commonly referred to as Realized GARCH, is based on an autoregressive moving average (ARMA) structure for both the realized measure and the conditional variance. Unlike traditional GARCH models that rely solely on past volatility to forecast future volatility, Realized GARCH models incorporate the realized measures of volatility, which are derived from high-frequency intraday data. Andersen et al. (2003) justify the use of realized measures ahead of daily data as the latter is slow to react to changes in the volatility. Since normal GARCH models make use of daily data they can only gradually adjust to volatility changes, while the Realized GARCH model that utilizes realized measures is relatively fast to adapt. Therefore, the use of realized measures in this framework is intended to provide more accurate and more timely predictions of future volatility. The Realized GARCH(1,1) follows:

$$\log(\sigma_t^2) = \omega + \beta \log(\sigma_{t-1}^2) + \gamma \log(x_{t-1}), \tag{13}$$

$$\log(x_t) = \xi + \varphi \log(\sigma_t^2) + \delta(z_t) + u_t, \tag{14}$$

where $x_t$ is a realized measure of volatility; in this study we use $RV_t$. $\delta(z_t)$ is the leverage function of the equation and the component that incorporates the leverage effect, i.e., the effect of returns on future volatility. Hansen et al. (2012) propose $\delta(z) = \delta_1 z + \delta_2 (z^2 - 1)$ as a simple specification of the leverage function. Given that $x_t$ is a realized measure based on intraday data and that $\sigma_t^2$ is the conditional variance of daily returns, $\varphi$ provides information about how much of the daily volatility occurs during trading hours.

3.4 Performance Evaluation

3.4.1 Benchmark

To evaluate the performance of the GARCH(1,1), EGARCH(1,1), and t-GAS(1,1) we use the squared daily log returns as a proxy for the true daily volatility. For the HAR-RV and Realized GARCH(1,1) we use the RV as described in Section 3.3.

3.4.2 Loss Functions

To evaluate the accuracy of the models we compare the forecasts with the benchmark using two loss functions. The reason why we use two different loss functions is that they penalize prediction errors differently, which allows for a more extensive comparison.

The first loss function we use is Mean Squared Error (MSE), a loss function that penalizes errors symmetrically. Since it squares the errors, MSE penalizes outliers more heavily than a loss function based on absolute values would.
MSE is defined as:

$$MSE = \frac{1}{T} \sum_{t=1}^{T} (\sigma_t^2 - \hat{\sigma}_t^2)^2, \tag{15}$$

where $\hat{\sigma}_t^2$ is the predicted signal for the true volatility and $T$ is the number of observations.

We also use Quasi-likelihood (QLIKE), a loss function that penalizes under-predictions more heavily than over-predictions, something that is preferable in areas such as risk management. Under-predicting volatility presents investors with a forecast that appears less risky than the actual risk, potentially leading to a false sense of security and too risky investments. Conversely, over-predicting volatility may make investors overly cautious, as it exaggerates the expected risk level. Due to investors’ tendency to be risk averse, smaller gains are generally preferred over significant losses. The QLIKE metric is the loss function implied by a Gaussian likelihood. While the MSE depends solely on the forecast error, the QLIKE loss function is instead based on the standardized forecast error. The QLIKE function is specified as follows:

$$QLIKE = \frac{1}{T} \sum_{t=1}^{T} \left( \log(\hat{\sigma}_t^2) + \frac{\sigma_t^2}{\hat{\sigma}_t^2} \right). \tag{16}$$

3.4.3 Diebold-Mariano Test

To compare the forecasts we use the Diebold-Mariano (DM) test (Diebold & Mariano, 1995). The DM test compares the forecast errors from the two different forecasts and tests the null hypothesis of equal forecast accuracy. There are several benefits of using the DM test when comparing volatility forecasts. First, it is easy to implement and does not require any extensive calculations. Another benefit of the DM test is that it is robust to various distributional assumptions, which makes it possible to compare the GARCH model with the t-GAS model. Different loss functions can be used in the DM test, which allows for a more extensive comparison. The loss differential $d_t$ at time $t$ is given by:

$$d_t = e_{it}^2 - e_{jt}^2, \tag{17}$$

where $e_{it}$ and $e_{jt}$ are the forecast errors of the two competing forecasts $i$ and $j$. We test the following hypothesis:

$$H_0: E(d_t) = 0, \qquad H_A: E(d_t) \neq 0, \tag{18}$$

and the DM test statistic is given by:

$$DM = \frac{\bar{d}}{\sqrt{\widehat{Var}(d_t) / T}}, \tag{19}$$

where $\bar{d}$ is the sample mean of the loss differential, $\widehat{Var}(d_t)$ is an estimate of its variance, and $T$ is the number of forecasts. We test at a significance level of $\alpha = 0.05$.
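The loss functions in equations (15) and (16) and the test statistic in equation (19) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the variance of the loss differential is estimated with the plain sample variance (no autocorrelation correction), which the text does not specify, and all numbers are hypothetical. The function names are introduced for the example only.

```python
import math


def mse(proxy, forecast):
    """Mean squared error, eq. (15)."""
    return sum((p - f) ** 2 for p, f in zip(proxy, forecast)) / len(proxy)


def qlike(proxy, forecast):
    """QLIKE loss, eq. (16); forecasts must be strictly positive."""
    return sum(math.log(f) + p / f for p, f in zip(proxy, forecast)) / len(proxy)


def dm_statistic(loss_i, loss_j):
    """DM statistic, eq. (19), from two series of per-period losses.

    Assumption of this sketch: the variance of the loss differential is
    the ordinary sample variance, without any autocorrelation correction.
    """
    d = [a - b for a, b in zip(loss_i, loss_j)]
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((x - d_bar) ** 2 for x in d) / (n - 1)
    return d_bar / math.sqrt(var_d / n)


# Asymmetry of QLIKE: with a true variance proxy of 1.0, forecasts of 0.5
# and 1.5 miss by the same amount, so MSE treats them equally, while
# QLIKE assigns the larger loss to the under-prediction.
under = qlike([1.0], [0.5])
over = qlike([1.0], [1.5])

# Hypothetical per-period losses for two competing forecasts.
dm = dm_statistic([1.0, 2.0, 3.0, 4.0], [0.5, 1.0, 2.5, 2.0])
```

Under the null hypothesis the DM statistic is asymptotically standard normal, so at the 5% level a two-sided test rejects when its absolute value exceeds roughly 1.96.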
If we are able to reject the null hypothesis, we conclude that the forecast with the lower loss is the more accurate of the two. If we fail to reject the null hypothesis, we cannot draw any conclusion about their relative accuracy.

4 Data

In this section, we describe the sample data and the modifications made before the volatility forecasting models are applied. We study XACT OMXS30, an ETF on the OMXS30. We choose XACT OMXS30 because it reflects the index well and is liquid enough for the study of high-frequency data. The ETF is weighted by market value and is re-balanced twice a year. The sample ranges from 2012-02-01 to 2022-01-31, a period of ten years consisting of 2450 trading days. A sample of this length makes it possible to examine both financially turbulent periods and more stable times. Notable financial events during this period are the UK’s referendum on leaving the European Union in 2016 (Brexit) and the Covid-19 pandemic in 2020, which had a large impact on financial markets.

The data is gathered from the Swedish House of Finance and consists of the traded price of XACT OMXS30. The data is initially sampled at a 1-minute frequency. We then re-sample it at different frequencies to find the optimal frequency with the least market microstructure noise. The frequencies that we compare are 1, 5, 10, 15, 20, and 30 minutes. For observations without trading activity, we assume that the price is equal to the price of the most recent observation with trading activity. Further, the data set is cleared of any time stamps referring to non-continuous trading periods, e.g., opening and closing auctions.
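The forward-filling and re-sampling steps described above can be sketched as follows. This is a simplified illustration rather than the processing code of this study: prices are assumed to be stored in a dictionary keyed by minutes since midnight, and the function names are hypothetical.

```python
def forward_fill(minute_prices, first_minute, last_minute):
    """Build a 1-minute grid; minutes without trades inherit the last traded price."""
    filled = {}
    last = None
    for minute in range(first_minute, last_minute + 1):
        if minute in minute_prices:
            last = minute_prices[minute]
        if last is not None:  # minutes before the first trade stay empty
            filled[minute] = last
    return filled


def resample(filled, first_minute, last_minute, step):
    """Keep every `step`-th minute of the filled grid, starting at `first_minute`."""
    return [filled[m] for m in range(first_minute, last_minute + 1, step) if m in filled]


# Toy example: trades at 09:00, 09:01, 09:05 and 09:10 (minutes since midnight).
trades = {540: 100.0, 541: 100.5, 545: 101.0, 550: 100.8}
grid = forward_fill(trades, 540, 550)
print(resample(grid, 540, 550, 5))
```

Restricting the grid to a fixed first and last minute also drops any observation outside the continuous trading window.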
Observations that occur before 09:00 or after 17:24 are referred to as pre-trade and post-trade and are not included in the data set.

In this study, we examine both open-close returns and close-close returns. While open-close returns cover trades during the actual day of interest, 09:00 to 17:24, the close-close returns incorporate the overnight effect as they span from 17:24 the day before to 17:24 of the day of interest. We include two different return series because the volatility may differ: the close-close data captures the effect of financial news and events that happen during non-trading hours.

4.1 Daily Returns

[Figure 1: Absolute Daily Log Returns for XACT OMXS30. Panels: (a) Open-Close, (b) Close-Close. x-axis: 2013 to 2022; y-axis: Absolute Log Return, 0 to 0.12.]

Figure 1 shows that the absolute log returns do not exhibit a clear trend. Rather, the fluctuations in returns have varying amplitudes over time, with the periods of greatest volatility coinciding with the two major financial events, namely Brexit (2016) and the Covid-19 pandemic (2020). After both of the largest shocks, the periods that follow show relatively low volatility. These findings suggest an alternating pattern of volatility over time, with periods of higher volatility appearing as clusters, which is one of the four empirical facts discussed in Section 2.1. Further, the absolute log returns in Figure 1 include a few extreme values, i.e., outliers, that could potentially make a t-distributed volatility model better suited than one with a Gaussian distribution. Finally, the absolute log returns for the close-close data seem to be slightly more volatile than those for the open-close data.
This is expected since the open-close data captures a shorter time frame: it only covers the actual trading hours and does not incorporate the effects of events during non-trading hours.

[Figure 2: Autocorrelations for Daily and Squared Daily Log Returns. Panels: (a) Open-Close Returns, (b) Close-Close Returns, (c) Open-Close Squared Returns, (d) Close-Close Squared Returns.]

We test the autocorrelation of the returns and volatility to determine whether the observations are stationary or non-stationary. The correlation coefficient between a time series and a lagged version of itself is plotted against the lag. If the time series is stationary, the autocorrelation should decrease quickly and stay close to zero for all lags. If the time series is non-stationary, the autocorrelation may remain high even for large lags, indicating the presence of a trend or other non-stationary behavior. Figure 2 shows that both the open-close and close-close returns have autocorrelations close to zero, which means that it is not possible to predict future returns based on past observations. Conversely, the autocorrelation plots for the volatility, that is, the squared returns, show clear signs of non-stationary trends as the autocorrelation is significant even for larger lags. This is welcome since it allows us to use squared returns to forecast future volatility.

[Figure 3: Log Returns Plotted Against Gaussian Distribution and Student’s t-distribution. Panels: (a) Open-Close, Gaussian, (b) Open-Close, Student’s t, (c) Close-Close, Gaussian, (d) Close-Close, Student’s t.]

To further analyse the properties of the data, we plot the open-close and close-close log returns in histograms together with both a Gaussian distribution and a Student’s t-distribution, see Figure 3. It is clear that the Student’s t-distribution fits the data better than the Gaussian distribution.
While it does not perfectly capture the high, thin peak around the mean, it is still an improvement over the Gaussian distribution. This confirms the fourth empirical property of Engle and Patton (2001) discussed in Section 2.1: the unconditional distribution of returns has fat tails. When comparing the two sets of panels it is further clear that the close-close returns include more extreme values, i.e., outliers, and therefore produce a histogram with fatter tails.

Table 1: Descriptive Statistics - Daily Returns and Squared Daily Returns

| Series | Data | Mean | Standard Deviation | Sample Size | Skewness | Kurtosis |
|---|---|---|---|---|---|---|
| Daily Returns | Open-Close | 0.00003 | 0.0088 | 2450 | -0.25 | 3.16 |
| Daily Returns | Close-Close | 0.00045 | 0.0109 | 2449 | -0.90 | 8.68 |
| Squared Daily Returns | Open-Close | 0.00008 | 0.0002 | 2450 | 8.99 | 132.20 |
| Squared Daily Returns | Close-Close | 0.00012 | 0.0004 | 2449 | 19.01 | 536.07 |

The kurtosis estimates in Table 1 show that the distribution of daily returns has positive excess kurtosis. The daily close-close returns have a relatively high kurtosis, which is a sign of leptokurtosis and means that the distribution has heavier tails than a Gaussian distribution. However, the daily open-close returns have a kurtosis close to 3, which is the value for a Gaussian distribution. For squared daily returns, both open-close and close-close show clear signs of leptokurtosis. Further, we see that both the daily open-close returns and the close-close returns are negatively skewed. Conversely, the open-close and close-close squared daily returns are positively skewed. Skewness is a measure that indicates whether the distribution of the data set is positively or negatively skewed about its mean. Negative skewness means that the mass of the distribution is concentrated to the right, with a longer tail to the left, so that one may expect a few extreme values to the left.

4.2 Realized Returns

To determine the optimal sampling frequency for the RV, we compare the mean RV at different frequencies. We compute RV for 1, 5, 10, 15, 20 and 30 minutes.
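The comparison across sampling frequencies can be sketched as below. This is a minimal illustration, assuming that RV is the sum of squared intraday log returns (the standard definition; the definition used in this study is given in Section 3.3) and that each day is stored as a list of 1-minute prices. The function names are hypothetical.

```python
import math


def realized_variance(prices, step):
    """Sum of squared log returns, sampling every `step`-th 1-minute price."""
    sampled = prices[::step]
    return sum(math.log(b / a) ** 2 for a, b in zip(sampled, sampled[1:]))


def mean_rv_by_frequency(days, steps=(1, 5, 10, 15, 20, 30)):
    """Average daily RV across all days, for each sampling interval in minutes."""
    return {k: sum(realized_variance(p, k) for p in days) / len(days) for k in steps}


# Toy example: a price that bounces between two levels every minute,
# mimicking bid-ask bounce around an unchanged underlying value.
bouncing_day = [100.0, 101.0] * 30 + [100.0]
print(mean_rv_by_frequency([bouncing_day], steps=(1, 2)))
```

In the toy example the 1-minute RV is strictly positive while the 2-minute RV is zero, which is the kind of upward bias at high frequencies that microstructure noise is expected to produce.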
[Figure 4: Average Daily RV for Sampling Frequencies Ranging from 1 to 30 Minutes. Panels: (a) Open-Close, (b) Close-Close.]

From Figure 4 we see that the slopes in both graphs increase significantly for sampling frequencies higher than 10 minutes, implying that RV sampled more often than that is biased upward. The average realized variance for each frequency is computed as an average across all days in the sample. Lowering the frequency from 1 to 10 minutes seems to be optimal, as this is where the line flattens out and the bias, i.e., microstructure noise, is presumably relatively low. To allow for an informative comparison between the open-close and close-close data in realized measures, we use the same sampling frequency for both. While the graphs do not share the same y-axis and are therefore not directly comparable, they both show how the average RV decreases when the frequency is lowered. The dotted line shows the average RV for the selected frequency (10 minutes).

Table 2 provides descriptive statistics for the realized variance with a 10-minute sampling frequency.

Table 2: Descriptive Statistics - Realized Variance

| Data | Mean | Standard Deviation | Sample Size | Skewness | Kurtosis |
|---|---|---|---|---|---|
| Open-Close | 0.00008 | 0.0001 | 2450 | 12.42 | 239.16 |
| Close-Close | 0.00013 | 0.0003 | 2449 | 10.89 | 149.95 |

Compared to the statistics in Table 1, it is clear that realized variance has both a lower mean and a lower standard deviation than the daily returns. Compared to the squared daily returns, however, the values are similar. The skewness and kurtosis are also substantially higher than for the daily returns, which is expected since the RV only takes positive values.

[Figure 5: Autocorrelations for Realized Variance. Panels: (a) Open-Close, (b) Close-Close.]

Next, we check for autocorrelation in the open-close and close-close RV. As Figure 5 shows, the autocorrelation is significant for both sets of RV, which is welcome as it allows us to predict future variance using historical data.
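A check of this kind can be sketched as follows. The example computes sample autocorrelations and compares them with an approximate 5 percent significance band of ±1.96/√n, which is the usual large-sample band under the assumption of no autocorrelation. Whether the plots in this study use exactly this band is an assumption, and the function names are hypothetical.

```python
import math


def autocorrelation(x, lag):
    """Sample autocorrelation of the series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def significant_lags(x, max_lag=20):
    """Lags whose autocorrelation lies outside the approximate 5 percent band."""
    band = 1.96 / math.sqrt(len(x))
    return [k for k in range(1, max_lag + 1) if abs(autocorrelation(x, k)) > band]


# Toy example: a slowly decaying series standing in for a persistent RV series.
series = [0.9 ** t for t in range(200)]
print(significant_lags(series, max_lag=5))
```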
The plots show the 5 percent significance level for 20 lags. Even for higher lags, the autocorrelation is significant, which suggests that past RV is a good predictor of future RV.

5 Results

Table 3: Average Estimated Coefficients

Table 3 shows the average estimated values for each coefficient across all windows. Restrictions are applied when estimating the coefficients, in line with Section 3, where descriptions of the coefficients can be found. Further, the degrees of freedom, v, are constrained to be above 7. All values are significant for a t-test at α = 0.05.

Columns in the order reported: Model, ω, A1, B1, γ, v, B^d, B^w, B^m, ξ, φ, δ1, δ2. Each row lists only the coefficients that apply to the model.

Open-Close
GARCH: 2.33E-06, 0.11, 0.86
EGARCH: -0.36, 0.17, 0.96, -0.11
t-GAS: -0.28, 0.07, 0.97, 9.56
HAR-RV: 1.11E-05, 0.26, 0.07, 0.01
rGARCH: 3.12E-05, 0.29, 0.60, 0.09, 0.90, 0.13, 0.10

Close-Close
GARCH: 3.11E-06, 0.12, 0.85
EGARCH: -0.30, 0.15, 0.97, -0.13
t-GAS: -0.35, 0.09, 0.96, 9.38
HAR-RV: 2.94E-05, 0.07, 0.08, 0.01
rGARCH: 4.94E-05, 0.29, 0.60, 0.09, 0.91, 0.13, 0.10

We re-estimate the coefficients of our models for each window, and the average values of all coefficients are presented in Table 3. First, we see that the averages for the degrees of freedom v are slightly above 9 for both open-close and close-close, which indicates that the estimated t-GAS model indeed treats the conditional distribution of returns as heavy-tailed. Further, the averages for the leverage term γ in EGARCH are negative which, as mentioned in Section 3.1.3, means that it captures the leverage effect. Focusing on HAR-RV, we see some notable differences between the average coefficients in the open-close and close-close estimates. In the open-close estimate, HAR-RV puts more weight on the RV of the day before than in the close-close estimate, while the close-close estimate instead has a higher value for the intercept, ω.
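To make the role of the HAR-RV coefficients concrete, the sketch below computes a one-day-ahead forecast from a given RV history. It is an illustration under assumptions, not the estimation code of this study: the weekly and monthly components are taken to be 5-day and 22-day averages as in Corsi (2009), the open-close averages in Table 3 are assumed to map to the intercept, daily, weekly and monthly coefficients in that order, and the function names are hypothetical.

```python
def har_components(rv, t):
    """Daily value and 5-day and 22-day averages of RV up to and including day t."""
    if t < 21:
        raise ValueError("need at least 22 observations up to day t")
    daily = rv[t]
    weekly = sum(rv[t - 4:t + 1]) / 5
    monthly = sum(rv[t - 21:t + 1]) / 22
    return daily, weekly, monthly


def har_forecast(rv, t, omega, b_d, b_w, b_m):
    """One-day-ahead HAR-RV forecast of the RV on day t + 1."""
    daily, weekly, monthly = har_components(rv, t)
    return omega + b_d * daily + b_w * weekly + b_m * monthly


# Toy example: a flat RV history at the open-close sample mean from Table 2,
# combined with the average open-close coefficients from Table 3 (assumed order).
history = [0.00008] * 22
print(har_forecast(history, 21, omega=1.11e-05, b_d=0.26, b_w=0.07, b_m=0.01))
```

In a rolling-window setting the coefficients would be re-estimated for each window before the forecast is computed; the sketch only covers the forecasting step.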
5.1 Squared Daily Returns

5.1.1 Open-Close

[Figure 6: Rolling Window Open-Close Volatility Forecasts]

Figure 6 illustrates the forecasts produced by three models: GARCH, EGARCH, and t-GAS. It is clear that the EGARCH model deviates from the other two models by producing a forecast with more extreme highs and lows. The GARCH model and the t-GAS model provide relatively similar forecasts. A notable difference between the two is that GARCH forecasts more extreme highs on a few occasions, e.g., at the beginning of 2020. The t-GAS model does not predict peaks as high as GARCH because it treats extreme values as outliers. These values therefore have a smaller impact on the t-GAS forecast, which results in a smoother forecast.

Table 4: Performance Evaluation - Squared Daily Returns Open-Close

| Model | MSE | QLIKE |
|---|---|---|
| GARCH | 3.304E-05 | -4.104 |
| EGARCH | 3.154E-05 | -4.116 |
| t-GAS | 3.265E-05 | -4.102 |

Table 4 shows that the difference between the three models is relatively small with regard to both MSE and QLIKE. The forecast generated by EGARCH has the lowest prediction error for both loss functions. We also see that while t-GAS has a lower prediction error than GARCH for MSE, GARCH has a lower error for QLIKE. To be able to conclude that the EGARCH forecast is superior, we test the significance of the loss differentials between the models using the DM test.

Table 5: Diebold-Mariano Test - Open-Close

| Loss function | Comparison | Loss diff | DM | p-value |
|---|---|---|---|---|
| MSE | GARCH vs. EGARCH | 1.501E-06 | 2.756 | 0.01 |
| MSE | GARCH vs. t-GAS | 3.931E-07 | 0.706 | 0.31 |
| MSE | EGARCH vs. t-GAS | -1.108E-06 | -1.762 | 0.08 |
| QLIKE | GARCH vs. EGARCH | 0.012 | 4.604 | 0.00 |
| QLIKE | GARCH vs. t-GAS | -0.002 | -1.227 | 0.18 |
| QLIKE | EGARCH vs. t-GAS | -0.014 | -4.488 | 0.00 |

As Table 5 shows, the EGARCH forecast significantly outperforms the regular GARCH forecast using MSE as the loss function. However, for the comparisons between GARCH and t-GAS, as well as between EGARCH and t-GAS, using MSE as the loss function, we fail to reject the null hypothesis.
When evaluating the models using the QLIKE loss function, we reject the null hypothesis for both the GARCH vs. EGARCH and EGARCH vs. t-GAS comparisons. This implies that EGARCH outperforms both models when forecasting performance is evaluated with the QLIKE loss function. However, for the GARCH vs. t-GAS comparison, we cannot reject the null hypothesis and are therefore not able to draw any conclusions about the relative performance of the two forecasts.

5.1.2 Close-Close

[Figure 7: Rolling Window Close-Close Volatility Forecasts]

Figure 7 shows the volatility forecasts of GARCH, EGARCH, and t-GAS for close-close returns. While the models provide relatively similar forecasts most of the time, the t-GAS model does not produce as high a peak as GARCH and EGARCH at the beginning of 2020, which marks the most volatile period. As mentioned earlier, this is because the t-GAS model treats extreme values as outliers. Except for this spike, the EGARCH model stands out with more notable highs and lows, which is in line with what was shown for the open-close returns. Comparing the open-close and close-close forecasts, we see that the close-close forecasts are higher than the open-close forecasts. This is expected since close-close volatility is in general higher than open-close volatility.

Table 6: Performance Evaluation - Squared Daily Returns Close-Close

| Model | MSE | QLIKE |
|---|---|---|
| GARCH | 6.207E-05 | -3.893 |
| EGARCH | 5.611E-05 | -3.911 |
| t-GAS | 6.013E-05 | -3.892 |

Table 6 shows the values of the two loss functions for the close-close data. Similar to the results in Table 4, EGARCH outperforms both GARCH and t-GAS when the forecasts are evaluated using MSE and QLIKE. Once again we see that t-GAS has a lower error than GARCH for MSE while GARCH has a lower error for QLIKE. Both MSE and QLIKE show higher forecast errors for the close-close data compared to the open-close data.
The fact that EGARCH is the model with the smallest increase in forecast error is a sign that the leverage function incorporated in the model performs well when applied to a more volatile set of observations.

Table 7: Diebold-Mariano Test - Close-Close

| Loss function | Comparison | Loss diff | DM | p-value |
|---|---|---|---|---|
| MSE | GARCH vs. EGARCH | 5.957E-06 | 4.650 | 0.00 |
| MSE | GARCH vs. t-GAS | 1.948E-07 | 1.375 | 0.16 |
| MSE | EGARCH vs. t-GAS | -4.009E-06 | -2.582 | 0.01 |
| QLIKE | GARCH vs. EGARCH | 0.018 | 5.671 | 0.00 |
| QLIKE | GARCH vs. t-GAS | -0.001 | -0.373 | 0.37 |
| QLIKE | EGARCH vs. t-GAS | -0.019 | -5.734 | 0.00 |

Following the same procedure as for Table 5, we compare the different forecasts for the close-close data by using the DM test and computing the p-values. As shown in Table 7, EGARCH significantly outperforms both GARCH and t-GAS using MSE as the loss function. We cannot reject the null hypothesis of equal performance when comparing GARCH and t-GAS with MSE. For the QLIKE loss function, we reject the null hypothesis for both tests that include EGARCH and conclude that EGARCH provides a significantly better forecast than the two other models. Further, we cannot reject the null hypothesis for GARCH and t-GAS using QLIKE as the loss function. We also see that the errors for both loss functions are smaller for the open-close data than for the close-close data. The model rankings, however, are the same for both loss functions, indicating that the accuracy of all models is lower for the more volatile time series.

Overall, we find clear evidence that EGARCH provides the most accurate forecasts when using squared daily returns as a proxy for volatility. These results are in line with what H.-C. Liu and Hung (2010) conclude when they state the importance of modelling an asymmetric component when working with data that has a fat-tailed distribution. In Section 4.1 we confirm that the returns have a leptokurtic distribution, hence EGARCH is a model well suited for providing an accurate volatility forecast.
Our results do not show any significant gains from using a t-distributed model, while Hansen and Lunde (2005), amongst others, have found that GARCH models with a t-distribution outperform a Gaussian GARCH(1,1) when forecasting stock returns.

5.2 Realized Variance

5.2.1 Open-Close

[Figure 8: Rolling Window Open-Close Volatility Forecasts]

Figure 8 shows the volatility forecasts using the Realized GARCH and HAR-RV models. While it is clear that the Realized GARCH predicts higher volatility for most periods, both models seem to follow similar patterns where the spikes occur simultaneously. Notably, the HAR-RV model forecasts a peak at the beginning of 2020 that is as high as that of the Realized GARCH model, despite predicting lower volatility for most other periods.

Table 8: Performance Evaluation - Realized Variance Open-Close

| Model | MSE | QLIKE |
|---|---|---|
| rGARCH | 1.523E-05 | -3.862 |
| HAR-RV | 8.674E-06 | -4.844 |

As Table 8 shows, HAR-RV generates more accurate forecasts for the open-close data than the Realized GARCH, both with MSE and QLIKE as loss functions. Somewhat surprisingly, Realized GARCH does not seem to benefit from forecasting higher levels of volatility in the open-close setting when we use QLIKE as the loss function, despite the loss function penalizing under-prediction more heavily than over-prediction. To test which of the two models generates superior forecasts, we proceed with the DM test.

Table 9: Diebold-Mariano Test RV - Open-Close

| Loss function | Comparison | Loss diff | DM | p-value |
|---|---|---|---|---|
| MSE | rGARCH vs. HAR-RV | 6.569E-06 | 8.495 | 0.00 |
| QLIKE | rGARCH vs. HAR-RV | 0.98 | 131.502 | 0.00 |

The results from the DM test are shown in Table 9. It is clear that the forecast provided by the HAR-RV model is significantly superior to the one provided by the Realized GARCH model. This is the case for both loss functions, which allows us to conclude that the HAR-RV model provides a more accurate volatility forecast.
5.2.2 Close-Close

[Figure 9: Rolling Window Close-Close Volatility Forecasts]

Figure 9 shows results similar to those in Figure 8, that is, the Realized GARCH model predicts higher levels of volatility than the HAR-RV model. It is worth noting that the forecasts for close-close volatility are noticeably higher than the open-close forecasts. Unlike the open-close forecasts, the close-close forecasts differ markedly between the two models during the most volatile periods. The HAR-RV forecast does not match the Realized GARCH forecast during either of the extreme peaks in 2020.

Table 10: Performance Evaluation - Realized Variance Close-Close

| Model | MSE | QLIKE |
|---|---|---|
| rGARCH | 3.905E-05 | -3.680 |
| HAR-RV | 2.808E-05 | -3.719 |

From Table 10 we see that when utilizing close-close data, Realized GARCH does not benefit from forecasting higher levels of volatility than HAR-RV when using QLIKE as the loss function, at least not enough to outperform HAR-RV. Similar results are shown for the MSE loss function, where HAR-RV outperforms Realized GARCH.

Table 11: Diebold-Mariano Test RV - Close-Close

| Loss function | Comparison | Loss diff | DM | p-value |
|---|---|---|---|---|
| MSE | rGARCH vs. HAR-RV | 1.097E-05 | 4.460 | 0.00 |
| QLIKE | rGARCH vs. HAR-RV | 0.039 | 13.373 | 0.00 |

Similar to the results presented in Table 9 for open-close volatility, Table 11 confirms that we can reject the null hypothesis for both loss functions when we compare the close-close forecasting performance of Realized GARCH and HAR-RV. We conclude that HAR-RV generates a more accurate forecast than Realized GARCH in this setting.

5.3 EGARCH vs. HAR-RV

Next, we compare the forecast performance of EGARCH and HAR-RV. It should be noted that we now compare one model that forecasts squared daily returns with one that forecasts realized variance, i.e., the models forecast different measures. The comparisons are therefore not as reliable and informative as the previous ones.
However, since each forecast is evaluated against the measure that it forecasts, we make the comparison, as it could indicate whether incorporating realized measures improves forecasting ability.

Table 12: Diebold-Mariano Test

| Data | Comparison | Loss diff | DM | p-value |
|---|---|---|---|---|
| Open-Close | HAR-RV vs. EGARCH - MSE | -2.314E-05 | -9.521 | 0.00 |
| Open-Close | HAR-RV vs. EGARCH - QLIKE | 0.191 | 10.859 | 0.00 |
| Close-Close | HAR-RV vs. EGARCH - MSE | -7.286E-05 | -7.652 | 0.00 |
| Close-Close | HAR-RV vs. EGARCH - QLIKE | 0.753 | 38.026 | 0.00 |

The DM tests shown in Table 12 show that HAR-RV is significantly better than EGARCH when evaluated by MSE. Conversely, when using the QLIKE loss function, EGARCH is shown to be superior to HAR-RV. While the results depend on which loss function is used, all tests are significant, and we are therefore able to tell which of the two volatility forecasting models performs best under MSE and under QLIKE. The results are the same for both open-close and close-close forecasts.

5.4 Volatility Shocks

To further visualize the difference between assuming a t-distribution and a Gaussian distribution for the conditional distribution of returns, we plot the close-close volatilities for GARCH, EGARCH, and t-GAS during the two largest volatility shocks, Brexit in 2016 and COVID-19 in 2020. Note that we previously focused on the out-of-sample, i.e., forecasted, volatility from the models, while we now switch focus to the in-sample volatility.

[Figure 10: Close-Close GARCH Volatilities during Brexit (2016)]

Figure 10 makes the difference between a Gaussian and a more heavy-tailed distribution for conditional volatility clearly visible. The figure highlights the volatility modelled by the different GARCH models during Brexit in 2016, which, if we recall Figure 1(b), was an extremely volatile period. It is clear that GARCH is heavily affected by the shock while the t-GAS model does not react in the same way.
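The different reactions of the two models to a large shock can be illustrated with a small numerical sketch. It is not the code behind Figure 10. The GARCH(1,1) update is the standard recursion, while the t-GAS part only shows the weight that a Student’s t score puts on the squared return in the variance parameterization of Creal et al. (2013); the exact specification in Section 3.2.1 may differ. The function names are hypothetical, and the coefficient values are the close-close averages from Table 3 under the assumption that A1 and B1 correspond to the ARCH and GARCH terms.

```python
def garch_update(sigma2, ret, omega, alpha, beta):
    """Next-period variance from a Gaussian GARCH(1,1) recursion."""
    return omega + alpha * ret ** 2 + beta * sigma2


def t_score_weight(sigma2, ret, nu):
    """Weight on the squared return implied by a Student's t score.

    The weight tends to 1 as nu grows (Gaussian case) and falls below 1
    for returns that are large relative to the current variance.
    """
    return (nu + 1) / (nu - 2 + ret ** 2 / sigma2)


# Toy example: current daily variance of 1e-4 (1 percent volatility)
# hit by a return of 5 percent, i.e. a five standard deviation shock.
sigma2, shock = 1.0e-4, 0.05
print(garch_update(sigma2, shock, omega=3.11e-06, alpha=0.12, beta=0.85))
print(t_score_weight(sigma2, shock, nu=9.38))
```

In this example the Gaussian recursion feeds the full squared return into the next variance, whereas the t-based weight scales it down to roughly a third, which is consistent with the smoother t-GAS path.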
The t-GAS reaction is muted because the Student’s t-distribution reduces the impact of outliers, which results in a smoother pattern with lower peaks. Further, as it was a negative shock, we see that EGARCH also shows a very high level of volatility. However, the return to the normal level of volatility, i.e., mean reversion, is faster than for GARCH.

[Figure 11: Close-Close GARCH Volatilities during the COVID-19 crisis (2020)]

Figure 11 shows another very volatile period, the COVID-19 crisis that occurred in 2020. The patterns are similar to those in Figure 10: the GARCH model is heavily influenced by the volatility shock and shows high levels of volatility for many periods after the initial shock, while t-GAS provides a smoother path. As EGARCH initially shows high levels of volatility but in the following periods shows lower volatility than both t-GAS and GARCH, we understand that the initial shock is negative and that the ETF seems to recover after that.

6 Conclusions

We find that forecasts generated by EGARCH are more accurate than forecasts generated by GARCH and t-GAS, allowing us to conclude that the inclusion of a leverage term in the forecasting models improves the forecasting accuracy. As for the models that utilize realized measures, we find clear evidence that HAR-RV provides more accurate forecasts than Realized GARCH. Considering the high level of autocorrelation that RV exhibits, it is not surprising that HAR-RV performs so well. Comparing HAR-RV and EGARCH, we see that HAR-RV provides more accurate forecasts for a symmetrical loss function, whereas EGARCH provides more accurate forecasts with a loss function that penalizes under-prediction more heavily than over-prediction, indicating that EGARCH might be more suitable for risk-management purposes. When comparing open-close and close-close forecasts, we conclude that the former have smaller prediction errors for all forecasts.
Looking solely at the values of MSE and QLIKE, we see no difference in the relative performance of the models when comparing open-close and close-close forecasts. However, the DM test shows that we can reject the null hypothesis of equal performance more often when we use close-close data than open-close data. We find no significant evidence indicating that the use of Student’s t-distribution for the conditional volatility improves forecasting performance. However, we present occasions where the benefits are clearly visible. During both Brexit and COVID-19 we see that t-GAS is more robust to outliers while these shocks have a larger impact on the Gaussian GARCH.

References

Andersen, T. G., & Bollerslev, T. (1998). Answering the Skeptics: Yes, Standard Volatility Models do Provide Accurate Forecasts. International Economic Review, 39(4), 885–905.
Andersen, T. G., Bollerslev, T., Diebold, F. X., & Labys, P. (2003). Modeling and Forecasting Realized Volatility. Econometrica, 71(2), 579–625.
Artemova, M., Blasques, F., Brummelen, J. V., & Koopman, S. J. (2022). Score-Driven Models: Methodology and Theory. Oxford Research Encyclopedia of Economics and Finance.
Bandi, F. M., & Russell, J. R. (2008). Microstructure Noise, Realized Variance, and Optimal Sampling. The Review of Economic Studies, 75(2), 339–369.
Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., & Shephard, N. (2008). Designing Realized Kernels to Measure the ex post Variation of Equity Prices in the Presence of Noise. Econometrica, 76(6), 1481–1536.
Blasques, F., Koopman, S. J., & Lucas, A. (2015). Information-theoretic Optimality of Observation-driven Time Series Models for Continuous Responses. Biometrika, 102(2), 325–342.
Bollerslev, T. (1987). A Conditionally Heteroskedastic Time Series Model for Speculative Prices and Rates of Return. The Review of Economics and Statistics, 69(3), 542–547.
Bollerslev, T. (1986). Generalized Autoregressive Conditional Heteroskedasticity.
Journal of Econometrics, 31(1), 307–327.
Bollerslev, T., Litvinova, J., & Tauchen, G. (2006). Leverage and Volatility Feedback Effects in High-Frequency Data. Journal of Financial Econometrics, 4(3), 353–384.
Christoffersen, P., Feunou, B., Jacobs, K., & Meddahi, N. (2014). The Economic Value of Realized Volatility: Using High-Frequency Returns for Option Valuation. Journal of Financial and Quantitative Analysis, 49(3), 663–697.
Corsi, F. (2009). A Simple Approximate Long-Memory Model of Realized Volatility. Journal of Financial Econometrics, 7(2), 174–196.
Creal, D., Koopman, S. J., & Lucas, A. (2013). Generalized Autoregressive Score Models With Applications. Journal of Applied Econometrics, 28(1), 777–795.
Diebold, F. X. (2015). Comparing Predictive Accuracy, Twenty Years Later: A Personal Perspective on the Use and Abuse of Diebold-Mariano Tests. Journal of Business & Economic Statistics, 33(1), 1–9.
Diebold, F. X., & Mariano, R. S. (1995). Comparing Predictive Accuracy. Journal of Business & Economic Statistics, 13(3), 253–263.
Engle, R. F. (1982). Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of United Kingdom Inflation. Econometrica, 50(4), 987–1007.
Engle, R. F., & Patton, A. J. (2001). What good is a volatility model? Quantitative Finance, 1(1), 237–245.
Hansen, P. R., Huang, Z., & Shek, H. H. (2012). Realized Garch: A Joint Model for Returns and Realized Measures of Volatility. Journal of Applied Econometrics, 27(6), 877–906.
Hansen, P. R., & Lunde, A. (2005). A forecast comparison of volatility models: Does anything beat a GARCH(1,1). Journal of Applied Econometrics, 20, 873–889.
Hansen, P. R., Lunde, A., & Nason, J. M. (2003). Choosing the Best Volatility Models: The Model Confidence Set Approach. Oxford Bulletin of Economics and Statistics, 65(1), 839–861.
Harvey, D., Leybourne, S., & Newbold, P. (1997). Testing the Equality of Prediction Mean Squared Errors.
International Journal of Forecasting, 13(1), 281–291.
Koopman, S. J., Lucas, A., & Scharth, M. (2016). Predicting time-varying parameters with parameter-driven and observation-driven models. The Review of Economics and Statistics, 98(1), 97–110.
Liu, H.-C., & Hung, J.-C. (2010). Forecasting S&P-100 stock index volatility: The role of volatility asymmetry and distributional assumption in GARCH models. Expert Systems with Applications, 37(7), 4928–4934.
Liu, L. Y., Patton, A. J., & Sheppard, K. (2015). Does anything beat 5-minute RV? A comparison of realized measures across multiple asset classes. Journal of Econometrics, 187(1), 293–311.
Nelson, D. (1991). Conditional Heteroskedasticity in Asset Returns: A New Approach. Econometrica, 59(2), 347–370.
Patton, A. J. (2011). Volatility Forecast Comparison Using Imperfect Volatility Proxies. Journal of Econometrics, 160(1), 246–256.
Sharma, P., & Vipul. (2016). Forecasting Stock Market Volatility using Realized GARCH model: International Evidence. The Quarterly Review of Economics and Finance, 59(1), 222–230.