Volatility Forecasting
A comparative study of different forecasting models.
Emil Sturesson, Anton Wennström
Bachelor’s Thesis in Financial Economics, 15HP
Supervisor: Marcin Zamojski
University of Gothenburg
Sweden
Spring Term 2023
Abstract
This study evaluates the out-of-sample forecasting performance of different volatility mod-
els. When applied to XACT OMXS30, we use GARCH(1,1), EGARCH(1,1), and t-
GAS(1,1) to forecast squared daily returns while Realized GARCH(1,1) and HAR-RV
are used to forecast Realized Variance. We forecast both measures with open-close as
well as close-close data. One-day-ahead forecasts are computed using a five year mov-
ing window. The performance is measured with two different loss functions, MSE and
QLIKE. The Diebold-Mariano test is then used to test significance. Our findings indicate
that EGARCH(1,1) is superior when forecasting squared daily returns and that HAR-RV
is superior when forecasting Realized Variance. Comparing EGARCH and HAR-RV, we
find that the latter is more accurate for a symmetrical loss function while EGARCH is
superior using the QLIKE loss function. We find no evidence indicating that Student’s
t-distribution for the conditional volatility improves forecasting accuracy. Finally, we con-
clude that open-close data generates smaller forecast errors than close-close data.
Keywords: Volatility, GARCH, EGARCH, t-GAS, HAR-RV, Realized GARCH, Volatil-
ity Forecasting, Volatility Modelling
Acknowledgements
We would like to express our gratitude to our supervisor, Marcin Zamojski, for his excep-
tional guidance, support, and expertise throughout the completion of our thesis. Further,
we would like to extend our appreciation to the teachers at the University of Gothenburg
for their knowledge and engaging teaching methods that have provided a solid foundation
for the research conducted in this thesis. Finally, we would like to thank our opponents
for valuable feedback and discussions.
Contents
1 Introduction
2 Literature Review
2.1 Volatility
2.2 Previous Comparisons
2.3 Performance Evaluation
3 Methods
3.1 GARCH-models
3.1.1 ARCH(q)
3.1.2 GARCH(1,1)
3.1.3 E-GARCH(1,1)
3.2 GAS(1,1)
3.2.1 t-GAS(1,1)
3.3 Realized Variance
3.3.1 HAR-RV
3.3.2 Realized GARCH(1,1)
3.4 Performance Evaluation
3.4.1 Benchmark
3.4.2 Loss Functions
3.4.3 Diebold-Mariano Test
4 Data
4.1 Daily Returns
4.2 Realized Returns
5 Results
5.1 Squared Daily Returns
5.1.1 Open-Close
5.1.2 Close-Close
5.2 Realized Variance
5.2.1 Open-Close
5.2.2 Close-Close
5.3 EGARCH vs. HAR-RV
5.4 Volatility Shocks
6 Conclusions
1 Introduction
The financial environment is and has always been influenced by a forward-looking perspec-
tive where expectations about future events are reflected in the current price of financial
assets. To gain an advantage, or simply to limit risk, people constantly try to predict
what will happen in the future based on available information. One of the areas where
forecasting has been applied and studied to a great extent is volatility. Assets with high
volatility are more risky, and investors demand higher returns from them to justify the risk.
Consequently, modeling the conditional variances of assets is a significant research area
in finance, which this study focuses on. Specifically, we evaluate and compare the perfor-
mance of different volatility models, including GARCH(1,1), EGARCH(1,1), t-GAS(1,1),
HAR-RV, and Realized GARCH(1,1), on the OMXS30 index over a period of 10 years. Our
primary contribution is to provide empirical evidence on the out-of-sample performance
of these models. Given the varying strengths and weaknesses of different forecasting mod-
els, this study aims to compare their performance to identify the most accurate model in
regard to squared daily returns and realized measures.
The Autoregressive Conditional Heteroskedasticity (ARCH) model introduced by Engle
(1982) allows the conditional variance to change over time in order to capture volatility
clustering. Bollerslev (1986) proposes a Generalized ARCH: GARCH, that is able to cap-
ture the clustering effect using fewer lags than the ARCH. Different GARCH models have
been proposed to deal with different distributions and properties: Nelson (1991) introduces
the Exponential GARCH (EGARCH) model to deal with the fact that volatility seems to
be larger following negative returns than positive returns. Creal et al. (2013) propose a
score-driven model, that encompasses the Gaussian GARCH(1,1) as well as many other
models. Since the model is score-driven it can provide more detailed information about
the probability density function than just the mean and higher moments, which makes it
more suitable to model conditionally t-distributed returns (t-GAS) than a GARCH model
would be.
Much of the recent development in the field of volatility modelling is focused on realized
measures. Andersen et al. (2003) propose Realized Variance (RV) as an unbiased estimator
of the true volatility. Other complex realized estimators have been proposed to deal with
microstructure noise (see Barndorff-Nielsen et al., 2008). Hansen et al. (2012) propose a
GARCH model that uses realized measures together with daily returns to estimate volatil-
ity. HAR-RV by Corsi (2009) is another powerful model that utilizes realized measures.
Standard GARCH models do not incorporate long-memory behaviour, so the HAR-RV
is beneficial to include in this study as it is able to
capture volatility persistence over a longer time horizon.
When modelling and forecasting volatility it is necessary to consider whether the time
series are defined as observation-driven models or parameter-driven models (Creal et al.,
2013). The models that we so far have discussed are all observation-driven. These models
are commonly used as they only use past observations to predict the parameters, which
is computationally simpler than the parameter-driven models. Parameter-driven mod-
els assume stochastic parameters that are at least partly independent and therefore the
parameters require more complex estimation methods. One of the most used parameter-
driven models is the stochastic volatility model (SV). In the interest of time, we choose to
focus on observation-driven models, to which the GARCH models belong.
In this paper, we study two sets of return series, open-close and close-close. The latter
is expected to be more volatile and it is, therefore, of interest to analyse the two sets of
data separately. When estimating the models, we utilize intraday data for the models
incorporating realized variance. The return series is re-sampled at different frequencies in
order to find the optimal frequency with the least microstructure noise. Further, the ob-
servations are checked for autocorrelation as well as the unconditional distribution. Once
shown that historical squared daily returns and realized variance provide sufficient infor-
mation to predict future volatility, the models are estimated and used for one-step-ahead
forecasting using a rolling window.
To evaluate the forecasts we use two different loss functions: mean squared error and
QLIKE. Both loss functions are robust to imperfections
in the proxy for the true volatility. Further, we use the Diebold-Mariano test to see
whether the differences in the loss functions are significant or not. This is necessary for us
to be able to draw any conclusions about which model provides the best volatility forecast.
Our findings when evaluating the forecasting performance of the volatility models are that
the EGARCH model outperforms GARCH and t-GAS for both Mean Squared Error and
QLIKE loss function. For models using realized measures, we conclude that the HAR-RV
model significantly outperforms the Realized GARCH model for both mean squared error
and the QLIKE loss function. Comparing EGARCH and HAR-RV, we see that the supe-
rior model depends on which loss function is being used where EGARCH is superior when
evaluating by the QLIKE loss function and HAR-RV significantly better when applying
Mean Squared Error.
The rest of the paper is structured as follows. In Section 2, we discuss the previous liter-
ature in the area. In Section 3, we explicate the econometric methodology, incorporating
a detailed explanation of the model specifications and performance measures employed
in our empirical analysis. In Section 4, we discuss data collecting and processing. In
Section 5, we present the results of our analysis. Finally, in Section 6, we present our
conclusions.
2 Literature Review
2.1 Volatility
When forecasting an unobserved variable, a common problem is that one has to
rely on a proxy, ideally an unbiased estimator, for the variable of interest. The choice
of proxy has a significant impact on the forecast. If the forecast does not depend on the
proxy, the proxy is not a useful predictor of volatility, which
results in a poorer-performing model. While many of the GARCH models utilize squared
daily returns as a proxy for the true level of volatility, many studies indicate that the
realized variance is a much more informative and precise estimator for the current level of
volatility (Andersen & Bollerslev, 1998; Hansen et al., 2003).
Bandi and Russell (2008) discuss a trade-off between bias and variance of the estimator
that has to be considered when estimating realized variance. They state that higher sam-
pling frequency results in more precise estimates when the true price process is observable.
But in reality, the true price process is not observable due to microstructure frictions. In-
creasing sampling frequency will in this case provide a higher degree of information about
the variance, however, it will also include a higher degree of noise (bid-ask spread, etc.).
Patton (2011) shows that volatility proxies with less noise can significantly improve fore-
casting ability. However, even though realized variance is seen as a superior volatility
proxy compared to squared daily returns, L. Y. Liu et al. (2015) show that it still can
incorporate a relatively large degree of distortion and, therefore, it leads to a trade-off
between better accuracy with higher frequency and significant effects of microstructure
noise. Patton (2011) shows how this proxy gets more efficient as the observation fre-
quency increases. Using 5-min returns when analysing a stock index return significantly
reduces the noise compared to when using half-hour returns.
Bollerslev et al. (2006) identify another significant advantage of using intra-day data as it
can provide a more accurate assessment of the two key factors that drive the asymmetric
relationship between volatility and past returns. Specifically, the leverage effect that ex-
plains why negative returns tend to result in higher volatility and the volatility feedback
effect that describes how higher volatility levels can lead to negative returns. In lower fre-
quency data, e.g., daily observations, these causal relationships may appear immediately
and can be indistinguishable from one another. Therefore, by using high-frequency data,
it is possible to differentiate between the leverage effect and the volatility feedback effect
more clearly and describe the relationship between past returns and volatility.
In a comprehensive study of over 400 estimators of asset price variation across various asset
classes, L. Y. Liu et al. (2015) conclude that realized variance is not inferior to any other
estimator. Using five-minute realized variance as the benchmark realized
measure, the study finds little evidence that it is outperformed by other estimators.
Moreover, adopting a five-minute sampling frequency for realized variance yields superior
results for less liquid assets, while the advantages of utilizing more advanced estimators
are more noticeable for liquid assets.
Engle and Patton (2001) classify volatility models into two main categories. The first
category, known as observation-driven models, involves formulating conditional variance
as a direct function of observable variables. The second category, referred to as latent
volatility models, i.e., parameter-driven models, are based on variables that are not solely
observable which makes it more difficult to forecast the future volatility compared to
when utilizing observation-driven models. An example of a parameter-driven model is the
Stochastic Volatility model (SV). Further, the authors highlight some stylized facts about
asset price and volatility that they believe have to be incorporated by volatility models to
provide accurate results. The four main empirical properties discussed are as follows:
1. Clustering, periods of large or small changes tend to come in clusters. Today’s
volatility shocks will have a lasting impact on the anticipated volatility for many
future periods.
2. Leverage effect, volatility increases more after a negative price shock than after
positive returns of the same size.
3. Mean reversion, there is a normal level of volatility that the volatility eventually
returns to. Long-run forecasts should converge to the normal level of volatility.
4. Heavy tails, the unconditional distribution of returns has fat tails.
2.2 Previous Comparisons
When forecasting conditional variances there are multiple GARCH-type models to use,
each with different modifications and adjustments to fit the observed data. Hansen and
Lunde (2005) compare forecasts generated by 330 different GARCH models to determine
whether there are any models that prove to be better at forecasting the conditional variance
than the most commonly used, GARCH(1,1) model. While they show that GARCH(1,1)
proves to be no worse in forecasting exchange rate data, it seems to perform worse than
many other models when forecasting stock returns, more specifically other GARCH mod-
els that incorporate the leverage effect, e.g., EGARCH.
Bollerslev (1987) studies the distributional properties of stock returns and their possible
implications for the performance of volatility forecasting models. When comparing the
performance of the normal GARCH(1,1) model to the t-distributed GARCH(1,1) model
he concludes that the latter performs relatively well as it can capture the non-Gaussian
properties of the return series, i.e., heavy tails.
H.-C. Liu and Hung (2010) investigate the effectiveness of various GARCH models with
different distributional assumptions as well as their ability to incorporate the leverage
effect. According to the study, while GARCH models that assume various probability
distributions are not very effective in improving forecasting performance in the presence
of fat-tailed distributions, asymmetric models such as EGARCH and GJR-GARCH show
better results in predicting stock market volatility. Moreover, modeling an asymmetric
component is more important than adjusting the error term distribution, and using a
Gaussian distribution is recommended when using a GARCH model.
Christoffersen et al. (2014) further analyse whether utilizing realized measures in volatility
models may not only improve the forecasting accuracy but also bring economic gains. They
create a model that incorporates two models with different volatility components, namely
daily returns and realized measures. Their findings demonstrated that the inclusion of
realized measures leads to reduced prediction errors across key economic benchmarks, in-
cluding moneyness, maturity, and volatility. This suggests that utilizing realized measures
in volatility models not only improves forecasting accuracy but also holds potential eco-
nomic benefits.
Koopman et al. (2016) study the ability of observation-driven models and parameter-
driven models to predict time-varying parameters. When comparing models from the two
classes, it is shown that an observation-driven model with a score function, e.g., t-GAS,
performs equally well as correctly specified parameter-driven counterparts. Score-driven
models are a class of observation-driven models that consider all relevant features of the
observation density function and provide a general way of updating parameters. These
empirical properties make score-driven models a commonly used model for the purpose of
volatility forecasting (Artemova et al., 2022).
2.3 Performance Evaluation
Since true volatility cannot be observed, a proxy for the true volatility has to be used
when forecasting volatility. The estimation error of the proxy itself is likely to distort the
evaluation of the models. Many studies have shown that the superior model depends on
the choice of loss function, see Hansen and Lunde (2005) and Patton (2011). The latter
also concludes that the two loss functions that are robust enough to handle imperfections
in the proxy for volatility are Mean Squared Error (MSE) and Quasi-Likelihood (QLIKE).
The QLIKE function is an asymmetrical loss function that penalizes under-prediction
more heavily than over-prediction, while the MSE penalizes symmetrically. This implies that
if you are comparing two forecast procedures, and one consistently produces positively
biased forecasts, while the other produces forecasts that are negatively biased by the same
magnitude, then the QLIKE function can significantly favor the positively biased forecast.
Penalizing under-prediction more heavily than over-prediction is preferred as the former is
normally more costly and, therefore, of importance when considering activities such as risk
management. Studies have shown that the Realized GARCH outperforms the EGARCH
when using the QLIKE loss function, however, when using MSE the EGARCH model is
superior (Sharma & Vipul, 2016).
The Diebold-Mariano test (Diebold & Mariano, 1995) is useful when testing and comparing
the accuracy of forecasts for two different models. However, when Diebold (2015) looks
back on his earlier work he concludes that the DM test is to a large extent applied with an
improper purpose or intention. Instead of comparing forecast performance, much literature
uses the DM test to compare the models themselves. This clear distinction is necessary to
avoid drawing false conclusions from the results of a study. When the DM test is applied
with consideration to its actual intentions, it can be very useful thanks to its simplicity
and wide applicability. Harvey et al. (1997) analyse the behaviour of the DM test further
and conclude that, although it is easily computed, it generally performs better for large
samples while it tends to be oversized for smaller sample sizes. When applied on larger
samples, the DM test performs well even in the case of autocorrelated forecast errors, and
fat-tailed as well as Gaussian distributed errors.
3 Methods
We calculate daily open-close and close-close log returns as follows:
\[ r_{OC,t} = \ln(P_{\text{close},t}) - \ln(P_{\text{open},t}), \tag{1} \]
\[ r_{CC,t} = \ln(P_{\text{close},t}) - \ln(P_{\text{close},t-1}), \tag{2} \]
Using log returns is preferred when working with time series of stock prices as in this case
since it allows us to add up periods of returns and say something about the total return
of that period. For example, if we have the log return for two consecutive days we can
simply add these two to get the total return over the two days.
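The additivity of log returns can be checked with a minimal sketch. The prices below are hypothetical illustrative values, not data from this study.

```python
import math

# Hypothetical closing prices for three consecutive days (illustrative values only).
prices = [100.0, 102.0, 99.0]

# Daily close-close log returns, as in equation (2).
log_returns = [
    math.log(prices[t]) - math.log(prices[t - 1]) for t in range(1, len(prices))
]

# The two-day log return computed directly from the first and last price.
two_day_return = math.log(prices[-1]) - math.log(prices[0])

# Summing the daily log returns gives the same two-day return.
total = sum(log_returns)
```

The same identity does not hold for simple percentage returns, which compound multiplicatively.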
Since volatility is not an observable variable it is necessary to make use of a signal as an
approximation of the true volatility. In this study, the daily volatility is approximated
by:
\[ \sigma_t^2 = r_t^2, \tag{3} \]
The one-day-ahead forecast $\sigma^2_{T+1|T}$ is generated by the data t = 1, ..., T. We use a rolling
window so that the following one-day-ahead forecast is given by t = 2, ..., T+1. The length
of the rolling window T is set to 1260, which is approximately five years' worth of trading
days. Using a rolling window has several advantages as it allows for a more dynamic and
adaptive analysis of the return series. The approach can help capture temporal patterns
and trends in the data over time. Another reason for using a rolling window is that it may
provide more precise estimates as we continuously re-estimate the parameters with regard
to new information as we move forward in time. A drawback of rolling window estimates
is the sensitivity to window size. With too small a window, the estimates might not capture
the underlying level of volatility and outliers could have a significant impact on them, while
too large a window could make the model slow at catching up to changes in the underlying
volatility.
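The rolling-window scheme can be sketched as follows. This is an illustration under stated assumptions rather than the code used in this study: the function names are hypothetical, and the placeholder model (the mean of squared returns in the window) stands in for whichever volatility model is re-estimated at each step.

```python
def rolling_forecasts(returns, window, fit_and_forecast):
    """One-day-ahead forecasts from a rolling estimation window.

    The forecast for position `end` uses only the `window` observations
    immediately before it, so the sample moves forward one day at a time.
    """
    forecasts = []
    for end in range(window, len(returns)):
        sample = returns[end - window:end]  # most recent `window` observations
        forecasts.append(fit_and_forecast(sample))
    return forecasts


def mean_squared_return(sample):
    # Placeholder "model" (an assumption for illustration): the forecast is
    # the average squared return in the estimation window.
    return sum(r * r for r in sample) / len(sample)


# Hypothetical daily returns and a window of 3 instead of 1260.
returns = [0.01, -0.02, 0.015, -0.005, 0.03, -0.01]
forecasts = rolling_forecasts(returns, window=3, fit_and_forecast=mean_squared_return)
```

With six observations and a window of three, the sketch produces three forecasts, one for each of the last three days.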
3.1 GARCH-models
3.1.1 ARCH(q)
The ARCH(q) model first introduced by Engle (1982), exploits the fact that variance
appears to change over time and that periods of small and large returns tend to be clus-
tered. The model allows for lagged past values of residuals to influence the current level
of volatility. The ARCH(q) model is defined as:
\[ \sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2, \tag{4} \]
where q is the number of lags, $\sigma_t^2$ is the conditional variance at time t, $\varepsilon_t$ is the residual
term at time t and $\omega$ and $\alpha_i$ are constants estimated by maximum likelihood. To avoid
negative variance, restrictions have to be imposed on these constants. More specifically,
they have to be positive. If $\sum_{i=1}^{q} \alpha_i < 1$ then $\sigma_t^2$ is stationary. One of the problems with
the ARCH(q) is that many lags are needed for the model to perform well in practice,
which is not desirable as it requires the user to estimate many parameters.
3.1.2 GARCH(1,1)
Bollerslev (1986) proposes a generalization of the original ARCH-process which allows for
past conditional variances in the current conditional variance equation. This extension of
the ARCH(q) model allows for a much more flexible lag structure by including past volatil-
ity as a describing factor for the current volatility. The most commonly used specification
for GARCH(p,q) is GARCH(1,1) which is defined as:
\[ \sigma_t^2 = \omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2, \tag{5} \]
where, in addition to the ARCH-parameters, $\beta_1$ is another constant estimated by maximum
likelihood. In addition to the restrictions imposed on ARCH(q), $\alpha_1 + \beta_1 < 1$ and $\beta_1 > 0$ should
hold for GARCH(1,1) to ensure that $\sigma_t^2$ is stationary. It can be shown that GARCH(1,1),
by repeated substitution, can be rewritten as an ARCH(∞) which shows that GARCH
captures the clustering with fewer lags.
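The GARCH(1,1) recursion in equation (5) can be sketched in a few lines. The parameter values and residuals below are hypothetical, and starting the recursion at the unconditional variance ω/(1 − α1 − β1) is our assumption for illustration; other initialisations are possible.

```python
def garch11_filter(residuals, omega, alpha, beta):
    """Run the GARCH(1,1) recursion over a series of residuals.

    Returns a list with one more element than `residuals`; the last element
    is the one-day-ahead variance forecast.
    """
    # Assumed starting value: the unconditional variance, which exists
    # only when alpha + beta < 1.
    sigma2 = [omega / (1.0 - alpha - beta)]
    for eps in residuals:
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2


# Hypothetical residuals and parameters (illustrative values only).
residuals = [0.01, -0.03, 0.005]
sigma2 = garch11_filter(residuals, omega=1e-6, alpha=0.1, beta=0.85)
one_day_ahead = sigma2[-1]
```

Because each variance feeds into the next through β1, a single large residual raises the forecast for many subsequent days, which is the clustering behaviour described above.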
3.1.3 E-GARCH(1,1)
A limitation of both ARCH(1) and GARCH(1,1) is that they fail to capture the fact that
the market reacts more strongly to negative shocks than positive shocks (leverage effect).
Nelson (1991) introduces the EGARCH(p,q) model to deal with this. The model allows
for an asymmetric response to positive and negative shocks by including a leverage term
in the equation for the conditional variance. The leverage effect is, as described in Section
2.1, also confirmed as a stylized fact by Engle and Patton (2001). In this paper, we use
the EGARCH(1,1)-specification which follows:
\[ \log(\sigma_t^2) = \omega + \alpha \left[ \gamma \frac{\varepsilon_{t-1}}{\sigma_{t-1}} + \frac{|\varepsilon_{t-1}|}{\sigma_{t-1}} - \sqrt{2/\pi} \right] + \beta \log(\sigma_{t-1}^2), \tag{6} \]
where $\gamma$ is the leverage coefficient and it captures the leverage effect if $\gamma < 0$. If $\beta < 1$
the EGARCH(1,1) is stationary. Using the logarithmic form of $\sigma_t^2$ allows the parameters
to be negative while keeping the conditional variance positive.
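The asymmetric response can be illustrated with a single EGARCH(1,1) update step. The function name and all parameter values are hypothetical and chosen only to make the effect visible.

```python
import math


def egarch11_update(eps_prev, sigma2_prev, omega, alpha, gamma, beta):
    """One step of the EGARCH(1,1) recursion in equation (6)."""
    z = eps_prev / math.sqrt(sigma2_prev)  # standardized shock
    log_sigma2 = (
        omega
        + alpha * (gamma * z + abs(z) - math.sqrt(2.0 / math.pi))
        + beta * math.log(sigma2_prev)
    )
    # Exponentiating guarantees a positive variance for any parameter values.
    return math.exp(log_sigma2)


# Hypothetical parameters with a negative leverage coefficient.
params = dict(omega=-0.2, alpha=0.1, gamma=-0.5, beta=0.98)

# Shocks of equal size and opposite sign, starting from the same variance.
after_negative = egarch11_update(-0.02, 1e-4, **params)
after_positive = egarch11_update(0.02, 1e-4, **params)
```

With γ < 0, the negative shock produces the larger next-day variance even though both shocks have the same magnitude.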
3.2 GAS(1,1)
Creal et al. (2013) propose a new class of observation-driven models, Generalized Au-
toregressive Score (GAS) models. Since the GAS-framework is score-driven it is very
flexible and it is possible to obtain many other observation-driven models within the
GAS-framework. The most simple specification of the model is the GAS(1,1):
\[ \sigma_t^2 = \omega + A_1 s_{t-1} + B_1 \sigma_{t-1}^2, \tag{7} \]
\[ s_t = S_t \cdot \nabla_t, \tag{8} \]
where, ∇t refers to the score and St is the scaling matrix. St can be specified in many
different ways, allowing for flexibility in the GAS-filters. In this paper, we derive the opti-
mal filter by specifying St as the inverse of Fisher information. For Gaussian distribution,
the filter equation is:
\[ \sigma_t^2 = \omega + A_1 (\varepsilon_{t-1}^2 - \sigma_{t-1}^2) + B_1 \sigma_{t-1}^2, \tag{9} \]
which is equivalent to GARCH(1,1). Note that the coefficients differ, but since $\beta_1 = B_1 - A_1$
and $\alpha_1 = A_1$, equations 5 and 9 are equivalent. If $A_1 = B_1$ the model is instead
reduced to ARCH(1).
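The equivalence follows from expanding the Gaussian GAS filter and collecting the terms in the lagged variance:

\[ \sigma_t^2 = \omega + A_1 (\varepsilon_{t-1}^2 - \sigma_{t-1}^2) + B_1 \sigma_{t-1}^2 = \omega + A_1 \varepsilon_{t-1}^2 + (B_1 - A_1) \sigma_{t-1}^2. \]

Matching coefficients with the GARCH(1,1) equation gives $\alpha_1 = A_1$ and $\beta_1 = B_1 - A_1$. When $A_1 = B_1$ the lagged variance term vanishes, leaving $\sigma_t^2 = \omega + A_1 \varepsilon_{t-1}^2$, which is ARCH(1).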
3.2.1 t-GAS(1,1)
Since the GAS models utilize the optimal updates (Blasques et al., 2015), we see that
GARCH(1,1) is optimal for a Gaussian distribution. However, it is of interest to test if
forecasting performance can improve by assuming a different conditional distribution. If
we instead assume Student’s t-distribution, we obtain t-GAS(1,1) which has the following
filter equation:
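\[ \sigma_t^2 = \omega + A_1 \left( \frac{v+3}{v} \right) \left[ \left( 1 + \frac{\varepsilon_{t-1}^2}{(v-2)\sigma_{t-1}^2} \right)^{-1} \left( \frac{v+1}{v-2} \right) \varepsilon_{t-1}^2 - \sigma_{t-1}^2 \right] + B_1 \sigma_{t-1}^2, \tag{10} \]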
where, v is the degrees of freedom. As we can see, assuming Student’s t-distribution,
GAS(1,1) is not equivalent to GARCH(1,1). By including t-GAS(1,1) we allow for heavier
tails in the conditional distribution. This also makes the model more robust to outliers
than its Gaussian counterpart.
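The robustness to outliers can be illustrated by comparing one update step of the two filters after an extreme residual. The sketch follows the t-GAS volatility update of Creal et al. (2013) as we read equation (10); the function names and parameter values are hypothetical.

```python
def t_gas_update(eps_prev, sigma2_prev, omega, a1, b1, v):
    """One step of the t-GAS(1,1) variance filter with v degrees of freedom."""
    # The weight shrinks towards zero for large standardized residuals,
    # which is what dampens the impact of outliers.
    weight = 1.0 / (1.0 + eps_prev ** 2 / ((v - 2.0) * sigma2_prev))
    scaled_score = weight * (v + 1.0) / (v - 2.0) * eps_prev ** 2 - sigma2_prev
    return omega + a1 * (v + 3.0) / v * scaled_score + b1 * sigma2_prev


def gaussian_gas_update(eps_prev, sigma2_prev, omega, a1, b1):
    """One step of the Gaussian GAS(1,1) filter, i.e. GARCH(1,1)."""
    return omega + a1 * (eps_prev ** 2 - sigma2_prev) + b1 * sigma2_prev


# Hypothetical setting: daily volatility of 1% and a residual of 10%.
sigma2_prev, outlier = 1e-4, 0.10
t_next = t_gas_update(outlier, sigma2_prev, omega=1e-6, a1=0.1, b1=0.95, v=5.0)
gaussian_next = gaussian_gas_update(outlier, sigma2_prev, omega=1e-6, a1=0.1, b1=0.95)

# For very large v the t-GAS update approaches the Gaussian one.
t_next_large_v = t_gas_update(outlier, sigma2_prev, omega=1e-6, a1=0.1, b1=0.95, v=1e9)
```

In this example the Gaussian filter reacts to the outlier much more strongly than the t-GAS filter, and the two updates coincide as the degrees of freedom grow.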
3.3 Realized Variance
The measure we use for realized variation is the Realized Variance (RV) proposed by
Andersen and Bollerslev (1998). The RV for a single day t is given by:
\[ RV_t = \sum_{i=1}^{M} (r_{t,i})^2, \tag{11} \]
where $r_{t,i}$ is the ith intraday return observation on day t and M is the number of intraday
observations. Realized volatility at time t is approximated as $\sqrt{RV_t}$. In contrast to the
open-close RV ($RV_{OC,t}$), the close-close RV ($RV_{CC,t}$) includes
the squared return between 17:24 at time t − 1 and 09:00 at time t.
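A minimal sketch of the RV calculation is given below. The intraday prices and the overnight return are hypothetical values, and the function name is our own.

```python
import math


def realized_variance(intraday_prices, overnight_return=None):
    """Sum of squared intraday log returns for one day.

    If `overnight_return` is given, its square is added, which turns the
    open-close measure into the close-close measure.
    """
    returns = [
        math.log(intraday_prices[i]) - math.log(intraday_prices[i - 1])
        for i in range(1, len(intraday_prices))
    ]
    rv = sum(r * r for r in returns)
    if overnight_return is not None:
        rv += overnight_return ** 2
    return rv


# Hypothetical intraday prices sampled at a fixed frequency.
prices = [100.0, 100.5, 100.2, 100.8]
rv_oc = realized_variance(prices)
rv_cc = realized_variance(prices, overnight_return=0.004)
realized_volatility = math.sqrt(rv_oc)
```

The close-close measure is never smaller than the open-close measure, since it adds one non-negative term.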
3.3.1 HAR-RV
Corsi (2009) proposes a Heterogeneous Autoregressive model of Realized Volatility (HAR-
RV). The model concentrates on heterogeneity originating from investors’ difference in
time horizons. Some investors have a very short, intra-daily frequency while others might
trade less frequently such as once a month. The idea of the model is that agents with
different types of trading horizons perceive, react to, and cause different types of volatility
components. In his model, Corsi identifies three types of components, short-term (daily),
mid-term (weekly), and long-term (monthly). The HAR-RV model follows:
\[ RV_t = \omega + \beta^{(d)} RV_{t-1}^{(d)} + \beta^{(w)} RV_{t-1}^{(w)} + \beta^{(m)} RV_{t-1}^{(m)}, \tag{12} \]
where $RV_{t-1}^{(w)} = \frac{1}{5} \sum_{k=1}^{5} RV_{t-k}^{(d)}$ and $RV_{t-1}^{(m)} = \frac{1}{22} \sum_{k=1}^{22} RV_{t-k}^{(d)}$. In other words, the weekly
RV is the average of the last 5 days' RV and the monthly RV is the average of the last 22 days' RV.
An empirical fact of RV is that it tends to exhibit high serial correlation over many lags,
which is something that is captured by both the weekly and monthly terms in HAR-RV.
In contrast to the other models in this study, the coefficients of HAR-RV are estimated
by linear regression, which is equivalent to maximum likelihood when the errors are assumed Gaussian.
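The three HAR-RV components and the resulting forecast can be sketched as follows. The weekly and monthly components are computed as averages of the last 5 and 22 daily values, which is how we read the 1/5 and 1/22 scaling in the definitions. The RV series and the coefficient values are hypothetical; in practice the coefficients come from the regression.

```python
def har_components(rv, t):
    """Daily, weekly and monthly regressors for forecasting rv at index t.

    Only values up to index t - 1 are used, so t must be at least 22.
    """
    daily = rv[t - 1]
    weekly = sum(rv[t - 5:t]) / 5.0
    monthly = sum(rv[t - 22:t]) / 22.0
    return daily, weekly, monthly


def har_forecast(rv, t, omega, beta_d, beta_w, beta_m):
    """One-day-ahead HAR-RV forecast given already estimated coefficients."""
    daily, weekly, monthly = har_components(rv, t)
    return omega + beta_d * daily + beta_w * weekly + beta_m * monthly


# Hypothetical RV series of 30 days cycling through five levels.
rv = [1e-4 * (1 + i % 5) for i in range(30)]

# Regressors and forecast for the day after the sample ends (index 30).
daily, weekly, monthly = har_components(rv, t=30)
forecast = har_forecast(rv, t=30, omega=1e-5, beta_d=0.4, beta_w=0.3, beta_m=0.2)
```

Because the weekly and monthly terms are slow-moving averages, the forecast responds to the most recent day while still carrying information from the past month.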
3.3.2 Realized GARCH(1,1)
Hansen et al. (2012) propose a GARCH model that utilizes realized measures of volatility.
This model, commonly referred to as Realized GARCH, is based on an autoregressive
moving average (ARMA) structure for both the realized measure and the conditional
variance. Unlike traditional GARCH models that rely solely on past volatility to forecast
future volatility, Realized GARCH models incorporate the realized measures of volatility,
which are derived from high-frequency intraday data. Andersen et al. (2003) justify the
use of realized measures ahead of daily data as the latter is slow to react to changes
in the volatility. Since normal GARCH models make use of daily data they can only
gradually adjust to volatility changes while the Realized GARCH model that utilizes
realized measures is relatively fast to adapt. Therefore, the use of realized measures in
this framework is intended to provide more accurate and more timely predictions of future
volatility. The Realized GARCH(1,1) follows:
\[ \log(\sigma_t^2) = \omega + \beta \log(\sigma_{t-1}^2) + \gamma \log(x_{t-1}), \tag{13} \]
\[ \log(x_t) = \xi + \varphi \log(\sigma_t^2) + \delta(z_t) + u_t, \tag{14} \]
where $x_t$ is a realized measure of volatility; in this study we use $RV_t$. $\delta(z_t)$ is the leverage
function of the equation and the component that incorporates the leverage effect, i.e., the
effect of returns on future volatility. Hansen et al. (2012) propose $\delta(z) = \delta_1 z + \delta_2 (z^2 - 1)$
as a simple specification of the leverage function. Given that $x_t$ is a realized measure based
on intraday data and that $\sigma_t^2$ is the conditional variance of daily returns, then $\varphi$ provides information about
how much of the daily volatility occurs during trading hours.
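The variance update in equation (13) and the leverage function can be sketched as below. The function names and all parameter values are hypothetical and serve only to show how the realized measure drives the next day's variance.

```python
import math


def leverage(z, delta1, delta2):
    """Leverage function delta(z) = delta1 * z + delta2 * (z^2 - 1)."""
    return delta1 * z + delta2 * (z ** 2 - 1.0)


def realized_garch_update(sigma2_prev, x_prev, omega, beta, gamma):
    """One step of the log-variance recursion in equation (13)."""
    log_sigma2 = omega + beta * math.log(sigma2_prev) + gamma * math.log(x_prev)
    return math.exp(log_sigma2)


# Hypothetical parameters (illustrative values only).
params = dict(omega=-0.5, beta=0.6, gamma=0.35)

# Same lagged variance, but a low versus a high realized measure yesterday.
calm = realized_garch_update(1e-4, 0.8e-4, **params)
turbulent = realized_garch_update(1e-4, 4e-4, **params)

# With delta1 < 0, a negative standardized return raises the realized measure.
effect_of_negative_return = leverage(-1.0, delta1=-0.1, delta2=0.05)
effect_of_positive_return = leverage(1.0, delta1=-0.1, delta2=0.05)
```

A high realized measure feeds directly into the next day's variance, which is why the model can adapt to volatility changes faster than a filter driven only by daily returns.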
3.4 Performance Evaluation
3.4.1 Benchmark
To evaluate the performance of the GARCH(1,1), EGARCH(1,1), and t-GAS(1,1) we use
the squared daily log returns as a proxy for the true daily volatility. For the HAR-RV and
Realized GARCH(1,1) we use the RV as described in Section 3.3.
3.4.2 Loss Functions
To evaluate the accuracy of the models we compare the forecasts with the benchmark
using two loss functions. The reason why we use two different loss functions is that they
penalize prediction errors differently, which allows for a more extensive comparison.
The first loss function we use is Mean Squared Error (MSE), a loss function that penalizes
errors symmetrically. Since it squares the errors, MSE penalizes outliers more heavily than a
loss function based on absolute values would. MSE is defined as:
\[ MSE = \frac{1}{T} \sum_{t=1}^{T} (\sigma_t^2 - \hat{\sigma}_t^2)^2, \tag{15} \]
where $\hat{\sigma}_t^2$ is the predicted signal for the true volatility and T is the number of observations.
We also use Quasi-likelihood (QLIKE), a loss function that penalizes under-predictions
more heavily than over-predictions, something that is preferable in areas such as risk
management. Under-predicting volatility presents investors with a forecast that appears less risky
than the actual risk, potentially leading to a false sense of security and too risky invest-
ments. Conversely, over-predicting volatility may make investors overly cautious, as it
exaggerates the expected risk level. Due to investors’ tendency to be risk averse, smaller
gains are generally preferred over significant losses. The QLIKE metric is the loss function
implied by a Gaussian likelihood. While the MSE depends solely on the forecast error,
the QLIKE loss function is instead based on the standardized forecast error. The QLIKE
function is specified as follows:
\[ QLIKE = \frac{1}{T} \sum_{t=1}^{T} \left( \log(\hat{\sigma}_t^2) + \frac{\sigma_t^2}{\hat{\sigma}_t^2} \right), \tag{16} \]
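The difference between the two loss functions can be seen in a small sketch. The proxy and forecast values are hypothetical: one forecast under-predicts and the other over-predicts the proxy by the same amount.

```python
import math


def mse(proxy, forecast):
    """Mean squared error between the volatility proxy and the forecast."""
    return sum((p - f) ** 2 for p, f in zip(proxy, forecast)) / len(proxy)


def qlike(proxy, forecast):
    """QLIKE loss: mean of log(forecast) + proxy / forecast."""
    return sum(math.log(f) + p / f for p, f in zip(proxy, forecast)) / len(proxy)


# Hypothetical proxy values and two forecasts with errors of equal size.
proxy = [1.0, 1.0, 1.0]
under = [0.5, 0.5, 0.5]  # under-predicts by 0.5
over = [1.5, 1.5, 1.5]   # over-predicts by 0.5

mse_under, mse_over = mse(proxy, under), mse(proxy, over)
qlike_under, qlike_over = qlike(proxy, under), qlike(proxy, over)
```

MSE assigns the two forecasts the same loss, while QLIKE assigns the under-prediction the higher loss.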
3.4.3 Diebold-Mariano Test
To compare the forecasts we use the Diebold-Mariano (DM) test (Diebold & Mariano,
1995). The DM test compares the forecast errors from the two different forecasts and
tests the null hypothesis of equal forecast accuracy. There are several benefits of using the
DM test when comparing volatility forecasts. First, it is easy to implement and does not
require any extensive calculations. Another benefit of the DM test is that it is robust to
various distributional assumptions which makes it possible to compare the GARCH model
with the t-GAS model. Different loss functions can be used in the DM test which allows
for a more extensive comparison. The loss differential dt at time t is given by:
d_t = e_{it}^2 - e_{jt}^2, (17)
We test the following hypothesis:
H_0 : E(d_t) = \bar{d} = 0, \quad H_A : E(d_t) = \bar{d} \neq 0, (18)
And the DM test statistic is given by:
DM = \frac{\bar{d}}{\sqrt{Var(\bar{d}) / T}}, (19)
We test at a significance level of α = 0.05. If we are able to reject the null hypothesis,
we can conclude that the forecast with the lower error of the two compared forecasts is
more accurate. However, if we do not reject the null hypothesis, we cannot draw any
conclusions based on our results.
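The test procedure can be sketched in a few lines of Python. The sketch rests on assumptions that are not stated in the text: the variance is estimated by the sample variance of the loss differential without autocovariance terms, and the p-value comes from a two-sided standard normal approximation. The function name and the loss values are hypothetical.

```python
import math

def dm_test(loss_i, loss_j):
    """Return (mean loss differential, DM statistic, two-sided p-value)."""
    d = [a - b for a, b in zip(loss_i, loss_j)]  # loss differential d_t
    T = len(d)
    d_bar = sum(d) / T
    # Assumption: sample variance of d_t, no autocovariance correction
    var_d = sum((x - d_bar) ** 2 for x in d) / (T - 1)
    dm = d_bar / math.sqrt(var_d / T)
    # Assumption: standard normal reference distribution
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))
    return d_bar, dm, p_value

# Hypothetical squared forecast errors of two models i and j
loss_i = [4.0, 1.0, 9.0, 4.0, 1.0, 9.0]
loss_j = [1.0, 1.0, 4.0, 1.0, 0.0, 4.0]

d_bar, dm, p_value = dm_test(loss_i, loss_j)
```

A positive mean differential that is significant at α = 0.05 would indicate that model j has the lower loss. Other loss functions, such as QLIKE, can be passed in the same way by supplying per-observation losses.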
4 Data
In this section, we provide information about the sample data and the modifications that
are made before we apply the volatility forecasting models to the data.
This study examines XACT OMXS30, an ETF that tracks the OMXS30 index. We choose
XACT OMXS30 because it reflects the index well and is liquid enough to study with
high-frequency data. The ETF is weighted by market value and is re-balanced twice a year.
The time frame studied in this paper ranges from 2012-02-01 to 2022-01-31, a period of
ten years that consists of 2450 trading days. A period of this length provides an
opportunity to examine both financially tumultuous periods and more stable times. Notable
financial events during this time are the UK’s referendum on leaving the European Union
in 2016 (Brexit) and the Covid-19 pandemic in 2020, which had a large impact on financial
markets.
The data is gathered from the Swedish House of Finance and consists of the traded price of
XACT OMXS30. Initially, the data is sampled at a 1-minute frequency. We then re-sample
the data at different frequencies to find the frequency with the least market
microstructure noise. The frequencies that we compare are 1, 5, 10, 15, 20, and 30
minutes. For any observation without trading activity, we assume that the price is equal
to the price of the most recent observation with trading activity. Further, the data set
is cleared of any time stamps referring to non-continuous trading periods, e.g., opening
and closing auctions. Observations that occur before 09:00 or after 17:24 are referred to
as pre-trade and post-trade, respectively, and are not included in the data set.
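The treatment of minutes without trading activity and the re-sampling step can be sketched as follows. The function names and the prices are hypothetical; missing observations are represented by None, which is an assumption about the data layout.

```python
def forward_fill(prices):
    """Replace missing observations (None) with the most recent traded price."""
    filled, last = [], None
    for p in prices:
        if p is not None:
            last = p
        filled.append(last)  # stays None if no trade has been observed yet
    return filled

def resample(prices, step):
    """Keep every step-th observation of a 1-minute price series."""
    return prices[::step]

# Hypothetical 1-minute prices with three minutes without trades
one_minute = [100.0, None, 100.2, None, None, 100.1, 100.3]
filled = forward_fill(one_minute)
five_minute = resample(filled, 5)
```

Here the 5-minute series keeps the observations at minutes 0 and 5 of the filled series.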
In this study, we examine both open-close returns and close-close returns. While open-
close returns include trades during the actual day of interest, 09:00 to 17:24, the close-close
returns incorporate the overnight effect as it spans from 17:24 the day before to 17:24 of
the day of interest. The reason for including two different sets of return series is that the
volatility may differ since the close-close data captures the effect of financial news and
events that happen during non-trading hours.
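The two return definitions can be sketched as below, assuming log returns computed from the first and last price of each continuous trading session. The function names and the prices are hypothetical.

```python
import math

def open_close_returns(opens, closes):
    """Log return from the open to the close of the same day."""
    return [math.log(c / o) for o, c in zip(opens, closes)]

def close_close_returns(closes):
    """Log return from the previous day's close to today's close."""
    return [math.log(closes[t] / closes[t - 1]) for t in range(1, len(closes))]

# Hypothetical prices for three trading days
opens = [100.0, 101.0, 100.5]
closes = [100.5, 100.8, 101.2]

oc = open_close_returns(opens, closes)
cc = close_close_returns(closes)
```

The close-close series loses its first observation because no previous close is available, which would be consistent with the sample sizes of 2450 and 2449 reported in Table 1.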
4.1 Daily Returns
[Figure 1: two panels, (a) Open-Close and (b) Close-Close; y-axis: Absolute Log Return, 0 to 0.12; x-axis: 2013 to 2022]
Figure 1: Absolute Daily Log Returns for XACT OMXS30
Based on Figure 1, it is clear that the absolute log returns do not exhibit a clear trend.
Rather, the fluctuations in returns have varying amplitudes over time, with the periods
of greatest volatility coinciding with the two major financial events, namely the time of
Brexit (2016) and the Covid-19 pandemic (2020). As for both of the largest shocks, the
periods that follow maintain relatively low volatility. These findings suggest an alternating
pattern of volatility over time with periods of higher volatility appearing as clusters which
is one of four empirical facts discussed in Section 2.1. Further, it is clear that the absolute
log returns in Figure 1 include a few extreme values, i.e., outliers, that could potentially
make a t-distributed volatility model better suited than one with a Gaussian distribution.
Further, the absolute log returns for the close-close data seem to be slightly more volatile
compared to the open-close data. This is expected since the open-close data captures a
shorter time frame as it only covers the actual trading hours and does not incorporate the
effects of events during non-trading hours.
(a) Open-Close Returns
(b) Close-Close Returns
(c) Open-Close Squared Returns
(d) Close-Close Squared Returns
Figure 2: Autocorrelations for Daily and Squared Daily Log Returns
We test the autocorrelation of the returns and volatility to determine whether the obser-
vations are stationary or non-stationary. The correlation coefficient between a time series
and a lagged version of itself is plotted against the lag. If the time series is stationary, the
autocorrelation should decrease quickly and stay close to zero for all lags. If the time series
is non-stationary, the autocorrelation may remain high even for large lags, indicating the
presence of a trend or other non-stationary behavior. Figure 2 shows that both the
open-close and close-close returns have autocorrelations close to zero, which means that
it is not possible to predict future returns from past returns. Conversely, the
autocorrelation plots for the volatility, that is, the squared returns, show clear
persistence, as the autocorrelation is significant even for larger lags. This is welcome
since it allows us to use squared returns to forecast future volatility.
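The contrast between returns and squared returns can be reproduced with a short sketch of the sample autocorrelation. The return series is a hypothetical, deterministic construction with alternating calm and turbulent blocks; it is not our data.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of the series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom

# Hypothetical returns: sign pattern + + - - and blocks of low and high magnitude
signs = [1, 1, -1, -1] * 8
magnitudes = ([0.002] * 8 + [0.02] * 8) * 2
returns = [s * m for s, m in zip(signs, magnitudes)]
squared = [r * r for r in returns]

acf_returns = autocorrelation(returns, 1)
acf_squared = autocorrelation(squared, 1)
```

In this constructed series the lag-1 autocorrelation of the returns is small, while that of the squared returns is large, which mirrors the clustering pattern described above.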
(a) Open-Close, Gaussian (b) Open-Close, Student’s t
(c) Close-Close, Gaussian (d) Close-Close, Student’s t
Figure 3: Log Returns Plotted Against Gaussian Distribution and Student’s t-distribution.
To further analyse the properties of the data, we plot the open-close and close-close log
returns in histograms overlaid with both a Gaussian distribution and a Student’s
t-distribution, see Figure 3. It is clear that the Student’s t-distribution fits the data
better than the Gaussian distribution. While it does not perfectly capture the high
thin peak around the mean, it is still an improvement over the Gaussian distribution. This
confirms the fourth empirical property of Engle and Patton (2001) discussed in Section 2.1:
the unconditional distribution of returns has fat tails. When comparing the open-close and
close-close panels, it is further clear that the close-close returns include more extreme
values, i.e., outliers, and therefore produce a histogram with fatter tails.
Table 1: Descriptive Statistics - Daily Returns and Squared Daily Returns
                        Mean      Standard Deviation   Sample Size   Skewness   Kurtosis
Daily Returns
  Open-Close            0.00003   0.0088               2450          -0.25      3.16
  Close-Close           0.00045   0.0109               2449          -0.90      8.68
Squared Daily Returns
  Open-Close            0.00008   0.0002               2450          8.99       132.20
  Close-Close           0.00012   0.0004               2449          19.01      536.07
The kurtosis estimates in Table 1 show that the distributions of the daily returns have
positive excess kurtosis. The daily close-close returns have a relatively high kurtosis,
which is a sign of leptokurtosis and means that the distribution has heavier tails than a
Gaussian distribution. However, the daily open-close returns have a kurtosis close to 3,
which is the value for a Gaussian distribution. For squared daily returns, both open-close
and close-close show clear signs of leptokurtosis. Further, we see that both the daily
open-close returns and the close-close returns are negatively skewed. Conversely, the
open-close and close-close squared daily returns are positively skewed. Skewness is a
measure that indicates whether the distribution of the data set is positively or
negatively skewed about its mean. Negative skewness means that the mass of the
distribution is concentrated to the right and that one may expect a few extreme values to
the left.
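The two shape measures can be sketched as follows, assuming the moment-based definitions in which a Gaussian distribution has a kurtosis of 3. The function name and the sample values are hypothetical and serve only to show the signs discussed above.

```python
def skewness_kurtosis(x):
    """Moment-based skewness and (non-excess) kurtosis of a sample."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Hypothetical returns with one large negative observation
returns = [-0.04, -0.01, 0.0, 0.0, 0.01, 0.01, 0.01, 0.02]
squared = [r * r for r in returns]

skew_r, kurt_r = skewness_kurtosis(returns)
skew_sq, kurt_sq = skewness_kurtosis(squared)
```

The single large loss makes the return sample negatively skewed, whereas the squared returns, which are bounded below by zero, are positively skewed.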
4.2 Realized Returns
To determine the optimal sampling frequency for the RV, we compare the mean RV at
different frequencies. We compute RV for 1, 5, 10, 15, 20 and 30 minutes.
(a) Open-Close (b) Close-Close
Figure 4: Average Daily RV for Sampling Frequencies Ranging from 1 to 30 Minutes
From Figure 4 we see that the slopes in both graphs increase significantly for frequencies
higher than 10 minutes, i.e., for sampling intervals shorter than 10 minutes, implying that
the RV at any such frequency is upward biased. The average realized variance for each
frequency is computed as an average across all days in the sample. Lowering the frequency
from 1 to 10 minutes seems to be optimal, as this is where the line flattens out and the
bias, i.e., microstructure noise, is presumably relatively low. To allow for an informative
comparison between the open-close and close-close data in realized measures, we use the
same sampling frequency for both. While the graphs are not directly comparable because
they do not share the same y-axis, they both show how the average RV decreases when the
frequency is lowered. The dotted line shows the average RV for the selected frequency
(10 minutes).
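The upward bias at high frequencies can be illustrated with a stylised sketch, assuming that RV is the sum of squared intraday log returns at the chosen sampling interval. The price path is hypothetical: it bounces between two levels every minute, so that all variation is noise.

```python
import math

def realized_variance(prices, step):
    """Sum of squared log returns of a 1-minute series sampled every `step` minutes."""
    sampled = prices[::step]
    return sum(math.log(b / a) ** 2 for a, b in zip(sampled, sampled[1:]))

# Hypothetical 1-minute prices that alternate between two quotes (pure noise)
prices = [100.05 if i % 2 == 0 else 99.95 for i in range(31)]

rv = {step: realized_variance(prices, step) for step in (1, 2)}
```

In this example the 1-minute RV is positive although the underlying price level never changes, while sampling every second minute removes the bounce entirely. Real data contain both noise and genuine price variation, so the bias fades gradually rather than vanishing.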
Table 2 provides descriptive statistics for the realized variance with a 10-minute
sampling frequency. Compared to the statistics in Table 1, it is clear that realized
variance has a lower standard deviation than the daily returns and, for the close-close
data, also a lower mean. Compared to the squared daily returns, however, the values are
similar. The skewness and kurtosis are also substantially higher than for the daily
returns, which is expected since the RV only takes positive values.
Table 2: Descriptive Statistics - Realized Variance
              Mean      Standard Deviation   Sample Size   Skewness   Kurtosis
Open-Close    0.00008   0.0001               2450          12.42      239.16
Close-Close   0.00013   0.0003               2449          10.89      149.95
(a) Open-Close
(b) Close-Close
Figure 5: Autocorrelations for Realized Variance
Next, we check for autocorrelation in the open-close and close-close RV. As Figure 5
shows, the autocorrelation is significant for both sets of RV, which is welcome as it
allows us to predict future variance using historical data. The plots show the 5 percent
significance level and use 20 lags. Even for higher lags the autocorrelation is
significant, which suggests that past RV is a good predictor of future RV.
5 Results
Table 3: Average Estimated Coefficients
Table 3 shows the average estimated values for each coefficient across all windows. Restric-
tions are applied when estimating the coefficients, in line with what is described in Section
3 where descriptions of the coefficients can be found. Further, the degrees of freedom, v,
are constrained to be above 7. All values are significant for a t-test at α = 0.05.
Model ω A d w m1 B1 γ v B B B ξ φ δ1 δ2
Open-Close
GARCH 2.33E-06 0.11 0.86
EGARCH -0.36 0.17 0.96 -0.11
t-GAS -0.28 0.07 0.97 9.56
HARRV 1.11E-05 0.26 0.07 0.01
rGARCH 3.12E-05 0.29 0.60 0.09 0.90 0.13 0.10
Close-Close
GARCH 3.11E-06 0.12 0.85
EGARCH -0.30 0.15 0.97 -0.13
t-GAS -0.35 0.09 0.96 9.38
HARRV 2.94E-05 0.07 0.08 0.01
rGARCH 4.94E-05 0.29 0.60 0.09 0.91 0.13 0.10
We re-estimate the coefficients of our models for each window, and the average values
of all coefficients are presented in Table 3. First, we see that the averages for
the degrees of freedom v are slightly above 9 for both open-close and close-close, which
indicates that the t-GAS model indeed treats the conditional distribution of returns as
heavy-tailed. Further, the averages for the leverage term γ in EGARCH are negative
which, as mentioned in Section 3.1.3, means that it captures the leverage effect. Focusing
on HAR-RV, we see some notable differences between the average coefficients in the
open-close and close-close estimates. The open-close estimate puts more weight on the RV
of the previous day than the close-close estimate does, while the close-close estimate
instead has a higher value for the intercept, ω.
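As a rough plausibility check of the GARCH averages, the sketch below assumes the standard GARCH(1,1) recursion, in which the next variance equals ω plus α times the last squared return plus β times the last variance. It also assumes that the three open-close GARCH values in Table 3 are ω, α, and β in that order. Both are assumptions made for illustration, and the last return and last variance are hypothetical.

```python
def garch_forecast(omega, alpha, beta, last_return, last_variance):
    """One-day-ahead variance forecast from the standard GARCH(1,1) recursion."""
    return omega + alpha * last_return ** 2 + beta * last_variance

def unconditional_variance(omega, alpha, beta):
    """Long-run variance implied by a GARCH(1,1) with alpha + beta < 1."""
    return omega / (1.0 - alpha - beta)

# Average open-close GARCH values from Table 3 (column mapping is an assumption)
omega, alpha, beta = 2.33e-6, 0.11, 0.86

long_run = unconditional_variance(omega, alpha, beta)
# Hypothetical state: a 2 percent return after a day with variance 8e-5
forecast = garch_forecast(omega, alpha, beta, last_return=0.02, last_variance=8e-5)
```

Under these assumptions the implied long-run variance is about 7.8E-05, which is close to the mean squared open-close return of 0.00008 in Table 1, and a large return pushes the forecast above the long-run level.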
5.1 Squared Daily Returns
5.1.1 Open-Close
Figure 6: Rolling Window Open-Close Volatility Forecasts
Figure 6 illustrates the forecasts produced by three models: GARCH, EGARCH, and
t-GAS. It is clear that the EGARCH model deviates from the other two models by pro-
ducing a forecast with more extreme highs and lows. As for the GARCH model and the
t-GAS model, they provide relatively similar forecasts. A notable difference between the
two models is that GARCH forecasts more extreme highs on a few occasions, e.g., during
the beginning of 2020. The reason why t-GAS is not predicting as high peaks as GARCH
is that it treats extreme values as outliers and therefore these values do not have as big
an impact on the t-GAS forecast which results in a smoother forecast.
Table 4: Performance Evaluation - Squared Daily Returns
Open-Close MSE QLIKE
GARCH 3.304E-05 -4.104
EGARCH 3.154E-05 -4.116
t-GAS 3.265E-05 -4.102
Table 4 shows that the difference between the three models is relatively small with regard
to both MSE and QLIKE. The forecast generated by EGARCH has the lowest prediction
error for both loss functions. We also see that while t-GAS has a lower prediction error
than GARCH for MSE, GARCH has a lower error for QLIKE. To be able to conclude that
the EGARCH forecast is superior, we test the significance of the loss differentials between
the models using the DM test.
Table 5: Diebold-Mariano Test - Open-Close
MSE Loss diff DM p-value
GARCH vs. EGARCH 1.501E-06 2.756 0.01
GARCH vs. t-GAS 3.931E-07 0.706 0.31
EGARCH vs. t-GAS -1.108E-06 -1.762 0.08
QLIKE
GARCH vs. EGARCH 0.012 4.604 0.00
GARCH vs. t-GAS -0.002 -1.227 0.18
EGARCH vs. t-GAS -0.014 -4.488 0.00
As Table 5 shows, the EGARCH forecast significantly outperforms the regular GARCH
forecast using MSE as the loss function. However, for the comparisons between GARCH
and t-GAS, as well as between EGARCH and t-GAS, using MSE as the loss function,
we fail to reject the null hypothesis. When evaluating the models using the QLIKE loss
function, we reject the null hypothesis for both the GARCH vs. EGARCH and EGARCH
vs. t-GAS comparisons. This implies that EGARCH outperforms both models when
evaluating the forecasting performance by the QLIKE loss function. However, for the GARCH
vs. t-GAS comparison, we cannot reject the null hypothesis and are therefore not able to
draw any conclusions about the relative performance of the two forecasts.
5.1.2 Close-Close
Figure 7: Rolling Window Close-Close Volatility Forecasts
The graph in Figure 7 shows the volatility forecasts of GARCH, EGARCH, and t-GAS for
close-close returns. While the models provide relatively similar forecasts most of the
time, the t-GAS model does not produce as high a peak as GARCH and EGARCH at the
beginning of 2020, which marks the most volatile period. As mentioned earlier, this is due
to the fact that the t-GAS model treats extreme values as outliers. Except for this spike,
the EGARCH model stands out with more notable highs and lows, which is in line with
what was shown for the open-close returns. Comparing the open-close and close-close
forecasts, we see that the close-close forecasts are higher than the open-close forecasts,
which we expect since close-close volatility in general is higher than open-close
volatility.
Table 6: Performance Evaluation - Squared Daily Returns
Close-Close MSE QLIKE
GARCH 6.207E-05 -3.893
EGARCH 5.611E-05 -3.911
t-GAS 6.013E-05 -3.892
Table 6 reports the two loss functions for the close-close data. Similar to
the results in Table 4, EGARCH outperforms both GARCH and t-GAS when evaluating
the forecasts using MSE and QLIKE. Once again we see that t-GAS has a lower error than
GARCH for MSE while GARCH has a lower error for QLIKE. Both MSE and QLIKE
show higher forecast errors for the close-close data compared to the open-close data. The
fact that EGARCH is the model with the smallest increase in forecast error is a sign that
the leverage term incorporated in the model performs well when applied to a more
volatile set of observations.
Table 7: Diebold-Mariano Test - Close-Close
MSE Loss diff DM p-value
GARCH vs. EGARCH 5.957E-06 4.650 0.00
GARCH vs. t-GAS 1.948E-07 1.375 0.16
EGARCH vs. t-GAS -4.009E-06 -2.582 0.01
QLIKE
GARCH vs. EGARCH 0.018 5.671 0.00
GARCH vs. t-GAS -0.001 -0.373 0.37
EGARCH vs. t-GAS -0.019 -5.734 0.00
Following the same procedure as for Table 5, we compare the different forecasts for the
close-close data by using the DM test and computing the p-values. As shown in Table 7,
EGARCH significantly outperforms both GARCH and t-GAS using MSE as the loss function.
We cannot reject the null hypothesis of equal performance when comparing GARCH
and t-GAS with MSE. For the QLIKE loss function, we reject the null hypothesis for both
tests that include EGARCH and conclude that EGARCH provides a significantly better
forecast than the two other models. Further, we cannot reject the null hypothesis
for GARCH and t-GAS using QLIKE as the loss function. We also see that the errors for
both loss functions are smaller for open-close data than for close-close data. The rankings
for both loss functions are nevertheless the same, indicating that the accuracy of all
models is lower for time series with higher volatility.
Overall, we find clear evidence that EGARCH provides the most accurate forecasts when
using squared daily returns as a proxy for volatility. These results are in line with
H.-C. Liu and Hung (2010), who stress the importance of modelling an
asymmetric component when working with data that has a fat-tailed distribution. In
Section 4.1 we confirm that the returns have a leptokurtic distribution, hence EGARCH is
a model well suited for providing an accurate volatility forecast. Our results do not show
any significant gains from using a t-distributed model, whereas Hansen and Lunde (2005),
amongst others, have found that GARCH models with a t-distribution outperform a
Gaussian GARCH(1,1) when forecasting stock returns.
5.2 Realized Variance
5.2.1 Open-Close
Figure 8: Rolling Window Open-Close Volatility Forecasts
Figure 8 shows the volatility forecasts using the Realized GARCH and HAR-RV models.
While it is clear that the Realized GARCH predicts higher volatility for most periods, both
models seem to follow similar patterns where the spikes occur simultaneously. Notably,
the HAR-RV model forecasts a peak at the beginning of 2020 equally high as the Realized
GARCH model, despite predicting lower volatility for most other periods.
Table 8: Performance Evaluation - Realized Variance
Open-Close MSE QLIKE
rGARCH 1.523E-05 -3.862
HAR-RV 8.674E-06 -4.844
As Table 8 shows, HAR-RV generates more accurate forecasts for the open-close data
than the Realized GARCH, both with MSE and QLIKE as loss functions. Somewhat
surprisingly, Realized GARCH does not seem to benefit from forecasting higher levels of
volatility in the open-close setting when we use QLIKE as the loss function, despite the
loss function penalizing under-prediction more heavily than over-prediction. To further
validate which of the two models generates superior forecasts, we proceed with the DM test.
Table 9: Diebold-Mariano Test RV - Open-Close
MSE Loss diff DM p-value
rGARCH vs. HAR-RV 6.569E-06 8.495 0.00
QLIKE Loss diff DM p-value
rGARCH vs. HAR-RV 0.98 131.502 0.00
The results from the DM test are shown in Table 9. It is clear that the forecast pro-
vided by the HAR-RV model is significantly superior to the one provided by the Realized
GARCH model. This is the case for both loss functions which allows us to conclude that
the HAR-RV model provides a more accurate volatility forecast.
5.2.2 Close-Close
Figure 9: Rolling Window Close-Close Volatility Forecasts
Figure 9 shows results similar to those in Figure 8, that is, the Realized GARCH model
predicts higher levels of volatility than the HAR-RV model. It is worth noting that the
forecasts for close-close volatility are noticeably higher than the open-close forecasts.
Compared to the forecasts for open-close volatility, the close-close forecasts differ
markedly between the two models during the most volatile periods. The HAR-RV forecast does
not match the Realized GARCH forecast during either of the extreme peaks in 2020.
Table 10: Performance Evaluation - Realized Variance
Close-Close MSE QLIKE
rGARCH 3.905E-05 -3.680
HAR-RV 2.808E-05 -3.719
From Table 10 we see that when utilizing close-close data, Realized GARCH does not
benefit from forecasting higher levels of volatility than HAR-RV when using QLIKE as
the loss function, at least not enough to outperform HAR-RV. Similar results are shown
for the MSE loss function, where HAR-RV also outperforms Realized GARCH.
Table 11: Diebold-Mariano Test RV - Close-Close
MSE Loss diff DM p-value
rGARCH vs. HAR-RV 1.097E-05 4.460 0.00
QLIKE Loss diff DM p-value
rGARCH vs. HAR-RV 0.039 13.373 0.00
Similar to the results presented in Table 9 for open-close volatility, Table 11 confirms that
we can reject the null hypothesis for both loss functions when we compare the close-close
forecasting performance of Realized GARCH and HAR-RV. We conclude that HAR-RV
generates a more accurate forecast than Realized GARCH in this setting.
5.3 EGARCH vs. HAR-RV
Next, we compare the forecast performance of EGARCH and HAR-RV. It should be noted
that we now compare one model that forecasts squared daily returns with one that
forecasts realized variance, i.e., the models forecast different measures. The comparisons
are therefore not as reliable and informative as the previous ones. However, since each
forecast is evaluated against the measure that it forecasts, we make the comparisons as
they could indicate whether incorporating realized measures improves forecasting ability.
Table 12: Diebold-Mariano Test
Open-Close Loss diff DM p-value
HAR-RV vs. EGARCH - MSE -2.314E-05 -9.521 0.00
HAR-RV vs. EGARCH - QLIKE 0.191 10.859 0.00
Close-Close
HAR-RV vs. EGARCH - MSE -7.286E-05 -7.652 0.00
HAR-RV vs. EGARCH - QLIKE 0.753 38.026 0.00
The DM tests in Table 12 show that HAR-RV is significantly better than EGARCH
when evaluated by MSE. Conversely, when using the QLIKE loss function, EGARCH
is shown to be superior to HAR-RV. While the results depend on which loss function is
used, all tests are significant, and we are therefore able to tell which of the two
volatility forecasting models performs best under each of MSE and QLIKE. The results are
the same for both open-close and close-close forecasts.
5.4 Volatility Shocks
To further visualize the difference between assuming a t-distribution and a Gaussian
distribution for the conditional distribution of returns, we plot the close-close
volatilities for GARCH, EGARCH, and t-GAS during the two largest volatility shocks, Brexit
in 2016 and COVID-19 in 2020. Note that we previously focused on the out-of-sample,
i.e., forecasted, volatility from the models, while we now switch focus to the in-sample
volatility.
Figure 10: Close-Close GARCH Volatilities during Brexit (2016)
Looking at Figure 10, the difference between a Gaussian and a more heavy-tailed
distribution for conditional volatility is clearly visible. The figure highlights the
volatility modelled by the different GARCH models during Brexit 2016, which, if we recall
Figure 1(b), was an extremely volatile period. It is clear that GARCH is heavily affected
by the shock while the t-GAS model does not react in the same way. This is because the
Student’s t-distribution reduces the impact of outliers for the t-GAS model and therefore
provides a smoother pattern with lower peaks. Further, as it was a negative shock, we
see that EGARCH also shows a very high level of volatility. However, its return to the
normal level of volatility, i.e., mean reversion, is faster than for GARCH.
Figure 11: Close-Close GARCH Volatilities during the COVID-19 crisis (2020)
Figure 11 shows another very volatile period, the COVID-19 crisis that occurred in 2020.
Similar patterns to Figure 10 are found where the GARCH model is heavily influenced by
the volatility shock and shows high levels of volatility for many periods after the initial
shock while t-GAS provides a smoother path. As EGARCH initially shows high levels of
volatility but then in the following periods shows lower volatility than both t-GAS and
GARCH, we understand that the initial shock is negative and that the ETF after that
seems to recover.
6 Conclusions
We find that forecasts generated by EGARCH are more accurate than forecasts generated
by GARCH and t-GAS, allowing us to conclude that the inclusion of a leverage term in
the forecasting models improves forecasting accuracy. As for the models that
utilize realized measures, we find clear evidence that HAR-RV provides more accurate
forecasts than Realized GARCH. Considering the high level of autocorrelation that RV
exhibits, it is not surprising that HAR-RV performs so well. Comparing HAR-RV
and EGARCH, we see that HAR-RV provides more accurate forecasts for a symmetrical
loss function, whereas EGARCH provides more accurate forecasts with a loss function
that penalizes under-prediction more heavily than over-prediction. This indicates that
EGARCH might be more suitable for risk-management purposes.
When comparing open-close and close-close forecasts, we conclude that the former have
smaller prediction errors for all forecasts. Looking solely at the values of MSE and QLIKE,
we see no difference in the ranking of the models between open-close and
close-close forecasts. However, the DM test shows that we can reject the null hypothesis
of equal performance more often when we use close-close data than open-close data.
We find no significant evidence indicating that the use of Student’s t-distribution for the
conditional volatility improves forecasting performance. However, we present occasions
where its benefits are clearly visible: during both Brexit and COVID-19, t-GAS is more
robust to outliers, while these shocks have a larger impact on the Gaussian GARCH.
References
Andersen, T. G., & Bollerslev, T. (1998). Answering the Skeptics: Yes, Standard Volatility
Models do Provide Accurate Forecasts. International Economic Review, 39 (4), 885–
905.
Andersen, T. G., Bollerslev, T., Diebold, F. X., & Labys, P. (2003). Modeling and Fore-
casting Realized Volatility. Econometrica, 71 (2), 579–625.
Artemova, M., Blasques, F., Brummelen, J. V., & Koopman, S. J. (2022). Score-Driven
Models: Methodology and Theory. Oxford Research Encyclopedia of Economics and
Finance.
Bandi, F. M., & Russell, J. R. (2008). Microstructure Noise, Realized Variance, and Op-
timal Sampling. The Review of Economic Studies, 75 (2), 339–369.
Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., & Shephard, N. (2008). Designing
Realized Kernels to Measure the ex post Variation of Equity Prices in the Presence
of Noise. Econometrica, 76 (6), 1481–1536.
Blasques, F., Koopman, S. J., & Lucas, A. (2015). Information-theoretic Optimality of
Observation-driven Time Series Models for Continuous Responses. Biometrika,
102 (2), 325–342.
Bollerslev, T. (1987). A Conditionally Heteroskedastic Time Series Model for Speculative
Prices and Rates of Return. The Review of Economics and Statistics, 69 (3), 542–
547.
Bollerslev, T. (1986). Generalized Autoregressive Conditional Heteroskedasticity. Journal
of Econometrics, 31 (1), 307–327.
Bollerslev, T., Litvinova, J., & Tauchen, G. (2006). Leverage and Volatility Feedback
Effects in High-Frequency Data. Journal of Financial Econometrics, 4 (3), 353–
384.
Christoffersen, P., Feunou, B., Jacobs, K., & Meddahi, N. (2014). The Economic Value of
Realized Volatility: Using High-Frequency Returns for Option Valuation. Journal
of Financial and Quantitative Analysis, 49 (3), 663–697.
Corsi, F. (2009). A Simple Approximate Long-Memory Model of Realized Volatility. Jour-
nal of Financial Econometrics, 7 (2), 174–196.
Creal, D., Koopman, S. J., & Lucas, A. (2013). Generalized Autoregressive Score Models
With Applications. Journal of Applied Econometrics, 28 (1), 777–795.
Diebold, F. X. (2015). Comparing Predictive Accuracy, Twenty Years Later: A Personal
Perspective on the Use and Abuse of Diebold-Mariano Tests. Journal of Business
& Economic Statistics, 33 (1), 1–9.
Diebold, F. X., & Mariano, R. S. (1995). Comparing Predictive Accuracy. Journal of
Business & Economic Statistics, 13 (3), 253–263.
Engle, R. F. (1982). Autoregressive Conditional Heteroskedasticity with Estimates of the
Variance of United Kingdom Inflation. Econometrica, 50 (4), 987–1007.
Engle, R. F., & Patton, A. J. (2001). What good is a volatility model? Quantitative
Finance, 1 (1), 237–245.
Hansen, P. R., Huang, Z., & Shek, H. H. (2012). Realized Garch: A Joint Model for Returns
and Realized Measures of Volatility. Journal of Applied Econometrics, 27 (6), 877–
906.
Hansen, P. R., & Lunde, A. (2005). A forecast comparison of volatility models: Does
anything beat a GARCH(1,1). Journal of Applied Econometrics, 20, 873–889.
Hansen, P. R., Lunde, A., & Nason, J. M. (2003). Choosing the Best Volatility Models:
The Model Confidence Set Approach. Oxford Bulletin of Economics and Statistics,
65 (1), 839–861.
Harvey, D., Leybourne, S., & Newbold, P. (1997). Testing the Equality of Prediction Mean
Squared Errors. International Journal of Forecasting, 13 (1), 281–291.
Koopman, S. J., Lucas, A., & Scharth, M. (2016). Predicting time-varying parameters
with parameter-driven and observation-driven models. The Review of Economics
and Statistics, 98 (1), 97–110.
Liu, H.-C., & Hung, J.-C. (2010). Forecasting S&P-100 stock index volatility: The role
of volatility asymmetry and distributional assumption in GARCH models. Expert
Systems with Applications, 37 (7), 4928–4934.
Liu, L. Y., Patton, A. J., & Sheppard, K. (2015). Does anything beat 5-minute RV? A com-
parison of realized measures across multiple asset classes. Journal of Econometrics,
187 (1), 293–311.
Nelson, D. (1991). Conditional Heteroskedasticity in Asset Returns: A New Approach.
Econometrica, 59 (2), 347–370.
Patton, A. J. (2011). Volatility Forecast Comparison Using Imperfect Volatility Proxies.
Journal of Econometrics, 160 (1), 246–256.
Sharma, P., & Vipul. (2016). Forecasting Stock Market Volatility using Realized GARCH
model: International Evidence. The Quarterly Review of Economics and Finance,
59 (1), 222–230.