Market efficiency and index fund flow An empirical study of the relationship between passive investment and broad-market efficiency Authors: Erik Larsson Jacob Wergeland Supervisor: Taylan Mavruk University of Gothenburg Department of Economics Centre for Finance Graduate School Master Thesis in Finance Gothenburg June 2020 iAbstract An observable rise in the popularity of index funds have caused the index funds to, in 2017, capture 20% of total fund assets globally. A cornerstone of such passive investment is a belief in an efficiently priced security market. This paper aims to relate index fund flows with market efficiency during the period 2000-2019. Using S&P500 returns we estimate a market efficiency measurement called the Hurst exponent, using two accredited methods: the rescaled range analysis (RS) and the detrended fluctuation analysis (DFA). We find similar estimations as previous studies, wherein the S&P500 index have exhibited a slight mean-reverting return process, close to theoretical market efficiency. We further relate this time-varying market efficiency measurement of S&P500 to its index fund flows. Using a correlation filtering method to find index funds in the US targeting the S&P500 index, and aggregating these mutual funds individual flow, we obtain aggregate index fund flow. Conducting a Granger causality test on both fractional flow and dollar flow, we find a causality that market efficiency Granger cause index fund flow. We further estimate that a lesser degree of market efficiency have a negative impact on flow: the more long-term memory the index experience, the smaller level of flow. These results hold stronger for dollar flow rather than fractional flow. Suggested keywords: market efficiency, Hurst exponent, mutual fund flow, passive investments ii Acknowledgement The classic pirate saying goes: It is not the treasure chest at the X-marked spot that is most valuable; the real treasure is the friends acquired on the journey. When we now find ourselves at the crossroads between academic school years and a professional career, we say this hold with absolute truth. Whether this friendship is characterized by knowledge, companionship, or everlasting-friends, does not matter; every student should cherish every moment. In such dire times as the present with COVID-19 ravaging the world, where a smile might be rare, we would like to start this paper with a small and playful joke. Question: What is the difference between a government bond and a man? Answer: The government bond matures! Before we let the reader dive into the text, we would like to direct our incomparable thanks towards our supervisor Taylan Mavruk for his valuable and helpful guidance. Af- ter all, the mason stands powerless without the forge. With that, we wish you a pleasant reading experience and hope this paper provides both interesting and educational aspects. Best regards, Erik & Jacob CONTENTS iii Contents Abstract i Acknowledgement ii 1 Introduction 1 2 Literature review 3 2.1 Market efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.2 Fund flows and passive investments . . . . . . . . . . . . . . . . . . . . . . 4 2.2.1 Fees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3 Data 6 3.1 Data management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.1.1 Total net assets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.1.2 Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 3.2 Handling of fees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 4 Methodology 12 4.1 General method setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 4.2 Long-term dependency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 4.3 The Hurst exponent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 4.3.1 GARCH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 4.3.2 Rescaled range analysis . . . . . . . . . . . . . . . . . . . . . . . . . 16 4.3.3 Detrended fluctuation analysis . . . . . . . . . . . . . . . . . . . . . 18 4.4 Endogeneity concerns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 4.5 Estimation and control variables . . . . . . . . . . . . . . . . . . . . . . . . 23 5 Results 25 6 Conclusion 33 7 References 34 A Appendix 37 A.1 Data adjustments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 A.2 Hurst histogram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 A.3 Regressions - naked data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 LIST OF TABLES iv List of Figures 1 Histogram of returns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2 Flow time series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 3 Evolution of Fees over time . . . . . . . . . . . . . . . . . . . . . . . . . . 12 4 Hurst exponent results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 5 Hurst exponent and CI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 6 Hurst histogram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 List of Tables 1 Descriptive statistics for returns . . . . . . . . . . . . . . . . . . . . . . . . 7 2 Flow descriptive statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3 AR(1)-GARCH(1,1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 4 VAR specification - fractional flow . . . . . . . . . . . . . . . . . . . . . . . 20 5 VAR specification - dollar flow . . . . . . . . . . . . . . . . . . . . . . . . . 21 6 Granger Causality test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 7 Annual statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 8 Regressions - fractional flow . . . . . . . . . . . . . . . . . . . . . . . . . . 28 9 Regressions - dollar flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 10 Regressions (lagged Hurst) - dollar flow . . . . . . . . . . . . . . . . . . . . 30 11 Standardized Hurst coefficients . . . . . . . . . . . . . . . . . . . . . . . . 32 12 Data adjustments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 13 Regressions, naked data- fractional flow . . . . . . . . . . . . . . . . . . . . 38 14 Regressions, naked data - dollar flow . . . . . . . . . . . . . . . . . . . . . 39 1 INTRODUCTION 1 1 Introduction Over the last decade, in the wake of the great financial crisis, the financial markets have experienced an unprecedented rise in index funds as a popular investment vehicle; a rise supported by their cost-efficiency (Malkiel, 2003; Sirri & Tufano, 1998; Weissensteiner, 2019). Passive investments, typically referring to a broad market index investment sce- nario, 1 did in june 2017 grow up to 20% of total fund assets globally, compared to 8% in 2007 (Sushko & Turner, 2018). Although there may be considerable amounts of moti- vations for utilizing passive investing, one cornerstone, which should not be diminished, is that the market must be considered efficient for passive investing to perform (Wer- mers, 2000).2 If the market would not be efficient, prowess investors would not choose passive investing (with its corresponding investment into something incorrectly priced) but instead change strategy and actively invest, improving the performance of their in- vesting (Garleanu & Heje Pederson, 2019). As such, in the case of inefficient markets, some investors would deviate from this passive investing and would therefore change their allocation from the market distribution; assuming informed decisions and sufficient and adequate market-moving power, the efficiency of the market would thus improve. Fama (1970, p.383) famously described market efficiency as a market where ”security prices always ’fully reflect’ all available information”. Correctly priced securities thus bring a no-arbitrage trading or investing environment, wherein no parties suffer informa- tional disadvantages. One important aspect of this no-arbitrage system of an efficient market is that no investor can be expected to persistently and systematically realize ab- normal returns; and, as such, in an efficient market, only random (or rather unpredictable) fluctuations of the returns are possible (Kristoufek & Vosvrda, 2013). As such, it seems that the key motivational tool for a passive investor is a belief in an efficient market; where upon such a realization of an efficient market, inducing an inflow of passive money. Fund inflow and outflows have been a thoroughly researched topic within the finance lit- 1Explicitly, passive investments are investments with a purpose of replicating or obtaining the return of a market benchmark or index. Such an investment could be an ETF (exchange-traded fund) or mutual fund targeting an index. Most mutual funds replicate target index return by holding the index composite assets weighted by their index proportion (traditional passive investing), while other funds use derivative instruments (e.g. a synthetic ETF) to artificially replicate the return (synthetic passive investing). Important to note here is the implication for market efficiency and how the all-encompassing passive investing relates to it. As the synthetic passive investment vehicles or instruments have no market moving-impact on the underlying index, it should not be equalized with traditional passive investments, which posses such powers, with regard to market efficiency. It is self explanatory that a derivative transaction should have no price effect on the underlying asset, while a transaction of the underlying asset should affect the price of the underlying asset. Finally is the distinction between passive investment and passive management (an investment with no intent of actual monitoring), where the latter is often categorized as dumb money. Although passive investment infers a non-monitoring scenario as well, it is not identical to passive management and should not be confused as such. 2Such motivations could be time-constraints and similar hobby-related causes. For such investors, passive investing might prove easiest and most well-performing choice, regardless of the efficiency of the market. 1 INTRODUCTION 2 erature. Understanding catalysts for fund flows have been essential for both practitioners and researchers. Previous studies has mainly focused on the impact of fees (Huang, Wei, & Yan, 2008; Sirri & Tufano, 1998), fund returns (Edelen & Warner, 2001) and volatility (Cao, Chang, & Wang, 2008). So how do we start by examining this causality between market efficiency and index fund flows? And maybe more importantly, what is the direction of the causality? Several event studies have examined the phenomenon of abnormal returns realized by index com- position changes (see, for example, Belasco, Finke, & Nanigian (2012) or Petajisto (2011)), clearly outlining an impact on prices, compared to non-constituents, due to index fund flows. Such impact suggests predictability of abnormal returns of single constituents, but does it indicate direction of the index or broad market efficiency? Intuitively, the overall index return cannot be predicted from abnormal returns realized by index constituents addition or deletion. As such, we have to utilize another type of measurement to deter- mine broad market efficiency and how it relates to index fund flows. There exists various such ways to measure and test broad market efficiency: Approximate Entropy [ApEn] (Pincus, 1991; Pincus, 2008) to measure the irregularity and unpredictability of fluctua- tions in time-series data; Variance Ratio (Lo & MacKinlay, 1988); the Efficiency Index (Kristoufek & Vosvrda, 2013); or the Hurst exponent (Bariviera, 2011; Eom et al., 2008) for measuring long-term memory. Notably, as Belasco, Finke, & Nanigian (2012) suggest, the abnormal single constituents returns are in fact the liquidity premium corresponding to index inclusion. Maybe this hints that it is in fact liquidity that is related to market efficiency (it may be reasonable to assume that liquidity should be higher for an efficient market, i.e. unpredictable returns), but Bariviera (2011) find only a partial relationship between liquidity and market efficiency. The majority of these previous studies have focused on testing, measuring, and rank- ing the efficiency of large indices of various securities; but, in light of the above discussion about the rise of passive investing, we will try to expand the existing body of litera- ture by investigating the relationship between market efficiency and index fund flows, utilizing the Hurst exponent as a measurement for market efficiency. The expansion is two-fold: extending the econophysics literature by relating market efficiency to important market characteristics, building upon the liquidity-linkage by Bariviera (2011) and the predictability-linkage by Eom et al. (2008); and, further developing the fund flow litera- ture by widening the studies from Huang, Wei, & Yan (2007), and Cao, Chang, & Wang (2008), and partly Sirri & Tufano (1998). The vast majority of research within both market efficiency and flow have been con- ducted for the US-market, due to being among the most active and liquid markets. Further reasons to use the US-markets is the possibility to compare our findings to similar previous studies. We aim to examine whether the S&P500 have exhibited market efficiency over last 2 LITERATURE REVIEW 3 two decades. We expect to find similar degree of efficiency for S&P500 as other developed markets.3 We further aim to explain if, and subsequently how, index fund flow relates to market efficiency: the direction of the causality and the following impact. We hypothesize the causal direction that the degree of market efficiency affects the level of index fund flows, with the magnitude that a less efficient market would induce less flow. We identify 633 index funds targeting S&P500 between 2000 and 2019. Using monthly index fund return and total net asset, we compute monthly aggregate flow values, both fractional flow and dollar flow. We estimate the Hurst exponent for S&P500, acting as a market efficiency measurement proxy. Overall, the S&P500 experienced a slight mean- reverting return process, close to market efficiency during our sample period. Identifying a causal relationship wherein a lesser degree of market efficiency negatively affects aggre- gate index fund flow. This relationship is characterized by a magnitude of one standard deviation change in the market efficiency measurement indicating a change in dollar flow by approximately 15% of the standard deviation in dollar flow. 2 Literature review 2.1 Market efficiency In 1970 (Fama) famously postulated the efficient market hypothesis (EMH) which has been the dominant view of market functionality. Ever since creation, the EMH has been challenged. Grossman & Stiglitz (1980) examined the aspects of information accessibility and how this affects the EMH. Costless information is a requirement for efficient markets, and, for a functioning marketplace, this is per se impossible (Grossman & Stiglitz, 1980). With respect to the rise of passive investing in the recent decade, Fama (1991) posited that efficient markets gave rise to passive investing, since markets were theoretically already fully efficient. More recently, Weissensteiner (2019) instead theorized a reverse relation- ship wherein market-wide forecast errors were reduced by passive investing and efficiency improved. Kristoufek & Vosvrda (2013) found that, by estimating their Efficiency Index, the most efficient markets were Japan, Denmark, and Germany and the least efficient markets were Peru, Sri Lanka, and Slovakia. Suggesting that maybe the more globally in- tegrated markets exhibit a larger degree of efficiency; nevertheless, the more local and less developed markets often exhibit more long term memory (i.e. a Hurst exponent greater than 0.5) while the US, UK, and other similar global markets were characterized by a reverse condition wherein they experienced anti-persistence (i.e. a Hurst slightly lower 3The expectation of a similar degree of efficiency for S&P500 as other developed market is based on previous studies findings, wherein such a similarity is found. Furthermore, Eom et al. (2008) conclude that the degree of efficiency is highly related to predictability (average hit-rate); thus, global markets should intuitively not exhibit easily available predictability. 2 LITERATURE REVIEW 4 than 0.5)(Kristoufek & Vosvrda, 2013)4. Bariviera (2011) found a similar result, wherein the Thai stock market nearly consequently experienced positive memory from 1975 until 2005. Cajueiro & Tabak (2004) elaborated further on emerging markets and computed Hurst coefficients for a wide range of market and compared them to the US market. They found over time mainly positive long term memory for all the examined markets, hence the US was closest to a Hurst of 0.5, concluding it to be the most efficient, whereas the emergent markets were less efficient. Likewise, Eom et al. (2008) found a difference in market efficiency between emerging and more established markets. They utilized the Hurst coefficient and find that many of the less develop market places to be above market efficiency level of 0.5, hence a positive correlation. This implies that emerging markets exhibit more long-term memory than their developed counterparts. Their result is in general in line with previous studies, implying less efficiency in emerging markets than more developed markets; suggesting that a higher Hurst exponent corresponds to a lower degree of efficicency, rather than a Hurst exponent disconnected from 0.5. However, the Hurst levels estimated by Eom et al. (2008) for emerging markets were, in general, lower than other studies (see for example Cajueiro & Tabak (2004)). An explanation for this discrepancy could be due to differing underlying computational statistics between the papers,5 which will be covered in this thesis method section. 2.2 Fund flows and passive investments Numerous studies have examined factors affecting mutual funds flow. Generally, the mu- tual fund flows are affected by previous returns. While all mutual funds flow are sensitive to recent returns (Sirri & Tufano, 1998; Sapp & Tiwari, 2004), the sensitivity increases with lower participation costs (Huang, Wei, & Yan, 2007). Similarly, Warther (1995) found that market-wide aggregate inflows are strongly affected by simultaneous aggregate price movements. Cao, Chang, & Wang (2008) find that high frequency market volatil- ity is negatively affected by aggregate flow, entailing evidence that a positive (negative) shock in flow decreases (increases) market volatility. Sirri & Tufano (1998) found that the flow into high-performing funds were disproportionately larger than the outflow from poor-performing funds. The large body of mutual fund flow studies mainly deal with panel data and intra-competition between the funds, and are often comprised entirely of 4The long term memory can be resembled by a momentum-factor (increases are likely to be followed increases), and anti-persistence by a more negative correlation than randomness prescribes (increases are more likely to be followed by decreases). The Hurst exponent is a method stemming from engineering, to find repeating patterns within a data. A Hurst of 0.5 indicates no memory whereas a higher or lower signals memory, either positive or negative memory. 5Eom et al (2008) uses detrended fluctuations analysis (DFA) to compute their Hurst coefficient whereas Cajeiro & Tabak (2004), Bariviera (2011) and Kristoufek & Vosvrda (2012) utilizes rescaled range statistics (RS). DFA is excluding short term memory in data at a greater level compared to the RS. Yielding a difference in results. This will be further explained in the method section 2 LITERATURE REVIEW 5 actively managed funds. Index funds flow studies widely concerns miss-pricing due to the flow, rather than trying to explain the flow. An inclusion into the substantial main indices may distort prices due to the large share of passive capital inflow caused by the relatively unconscious passive investing. Abnormal returns, both negative with the dele- tion from index and positive with the addition to index, affects securities bordering an index; between 1990 and 2005, the average excess (abnormal) return was from an index addition 8.8% and from index deletion -15.1% (Belasco, Finke, & Nanigian, 2012). Nonetheless, Hortacsu & Syverson (2004) identify the importance of non-portfolio attributes in attracting investors for index funds. They further note that the low partic- ipation costs of index funds cause the fund with the lowest fee not to capture the entire homogeneous index fund market. On the same note, passive investing have shown to be the most cost efficient and optimal choice for investors irregardless of market condition due to the inability of fund managers to over time outperform their benchmark index (Malkiel, 2003). This introduces an important aspect of index fund flows. There is a fundamental difference between studying fund flows on a micro-level (fund specific flow) and on a macro-level (aggregate flow)(Cao, Chang, & Wang, 2008). Micro-level studies on mutual funds may examine differences between fund categories, often categorized the funds having different objectives and returns, where the outflow from one fund might be offset by inflow into another fund. Macro-level studies disregard specific fund flow, thus only examine market flows (Warther, 1995). Hortac¸su & Syverson (2004) found that index fund intra-competition were mainly driven by non-portfolio attributes, suggesting that micro-level studies on index funds relationship to some index measurement makes less sense since, generally, the reason of one index funds outflow should not be a another index funds’ better return (after all index funds targeting the same index should have similar returns). The low sensitivity of fees for index funds combined with their cost efficiency suggest that index funds do not, in a performance setting, compete intravenous, like actively managed fund do. To conclude, when examining funds consisting of index holdings a macro-level approach is more suitable (Warther, 1995). 2.2.1 Fees The subject of fees and its impact on flow has been revisited in the literature, Sirri & Tufano (1998) and Huang, Wei, & Yan (2007) are among the more noticed. The many properties of the fee component is not in the main scope for this thesis, although it is essential to keep in mind when discussing funds. The general effect found in previous literature indicate that fees has a negative impact on the fund flows, higher fees results in lower inflow Sirri &Tufano, (1998). This relationship comes with exceptions, the most prominent is the participation cost, including both search and marketing costs (Sirri &Tufano, 1998). Funds pay to gain investors and therefore raise the fees, however still attracting flow. 3 DATA 6 The fee structure is usually complex and includes several different fees, where the most common are the front and rear load fees, management fees and 12b fees.6 For index funds the main aspect of differentiating is how to streamline operations in regards to fees, since the assets are bound to be index replicating. Cash management of the inflows and outflows are crucial when evaluating the overall performance (Elton, Grauber & Busse, 2004). 3 Data From CRSP (WRDS), we obtain monthly data of total net assets (TNA) and fund return for 43 939 funds during the period 2000-2019, as well as both daily and monthly data for S&P500 for the period 1996-20197. The daily data is used for the calculation of Hurst exponent and the monthly S&P500 data is utilized to generate returns for index funds, when specific fund returns are missing. To identify index funds tracking the S&P500 amongst all the acquired data we use a method suggested by WRDS (the provider of CRSP). By checking whether the funds have a return with a correlation of at least 99.5% (ρ ≥ 0.995) to the S&P500, we isolate all index funds tracking the S&P500.8 By utilizing this filtering method, we reduce the number of mutual funds down to 633; in other words, we identify 633 index funds which at some period between 2000 and 2019 target S&P500.9 We present introductory summary statistics for the identified index funds and S&P500 in Table 1. In figure 1 we show histograms over the S&P500 and identified index funds returns. We observe similar means and medians, while the index funds exhibit a wider distribution with more a larger outliers, resulting in a higher kurtosis and fatter tails. The lowest reported TNA from CRSP is 0.10, i.e. $100 000, and is often reported the first period the index fund became active: why both the minimum and 1% value are the same. 3.1 Data management Regarding missing values in both the return and total net assets (TNA) data of these identified index funds, numerous actions has been undertaken. As the remaining funds now target S&P500, we simply replace missing return values with the relevant return for S&P500. Although we noted previously, and as can be observed in Table 1, the index 612b fees are costs associated with distribution and marketing. In general this fee combined with the management fee is acknowledge as the expense ratio. 7The return data rt and the total net assets data TNAt indicate the return and total net assets for period t as of month-end. 8The index fund flag indicator provided by CRSP is deemed insufficient for isolating index funds targeting specific indices, why we instead use the suggested correlation filter approach. 9For some of these identified index funds we had ”inadequate” number of monthly returns, down to as few as only 2 monthly returns. Although these sparse monthly returns produced a sufficient correlation against S&P500, for robustness, we did a name check and confirmed index funds targeting S&P500. 3 DATA 7 Summary statistics Panel A: returns Panel B: total net assets S&P 500 (D) S&P 500 (M) Index funds (M) Index funds (M) N 240 5031 62111 70158 Max 11.55% 10.77% 34.06% 319624.1 99% 9.42% 3.43% 9.73% 70468.7 75% 2.97% 0.56% 3.22% 667.9 50% 0.055% 0.96% 1.11% 148.8 25% -1.78% 0.47% -1.70% 23.3 1% -10.99% -3.32% -10.91% 0.1 Min -16.94% -9.03% -35.33% 0.1 Mean 0.023% 0.42% 0.56% 2847.8 Std 1.129% 4.18% 4.46% 15304.9 Skew -3.83 -59.9 -60.70 10.4 Kurt 11.8169 4.1068 5.1379 136.3 Table 1: Descriptive statistics for S&P500 and identified index funds from 2000-01-01 until 2019-12-31. Index funds were identified by filtering with a correlation of at least 99.5% towards S&P500. The TNA are in millions of USD. (D) and (M) represents daily and monthly data, respectively. The difference in the number of observations between the returns and TNA is due to that the missing values do not always align for the two variables. funds and S&P500 does not exhibit identical distributions, the high correlation threshold should be sufficient to not cause structural deficiencies. Below we outline procedures undertaken for the TNA and flow data.10 3.1.1 Total net assets Several funds had reported a value of 0 for total net asset (TNA) for some months, which is unreasonable, as that would pertain to a non-existing fund; hence these values were treated as missing values. The missing TNA values were filled by three techniques: back- ward filling, forward filling and linear interpolation11. We utilize the Linear Interpolation method, and the other two methods for robustness check. Gaps of missing values ranging from 1 month up to 12 months were filled. Gaps of more than 12 months were treated as missing values and were not filled. To ensure this would not cause inconsistencies in the data, the number of gaps that would be filled from 13 months up to 24 months were one gap of 16 months and four gaps of 17 months. Limiting the fill gap method to 12 months does not greatly inhibit the data and suggest a removal of the possibility of an eventual 10In appendix A.1 we outline these procedures as well, but in a table format instead. 11For clarification, the linear interpolation fill method is simply a equal step-by-step increase to reach the end of the gap from the beginning of the gap. Explicitly, if Ystart is the value before the gap and Yend is the value after the gap, and n is the number of missing values between these two data points, then the ni:th missing value Yni is given by Yni = Ystart + ni × Yend−Ystart n+1 . The backward and forward filling techniques are simply the next value after the gap carry backward and the previous value before the gap carry forward, respectively. 3 DATA 8 (a) S&P500 (b) Index funds Figure 1: Histogram over the returns of S&P500 and identified index funds from 2000-01-01 until 2019-12-31. The return data depicted are in percentage and can be seen in Table 1. structural bias in the data set. This procedure increases the number of total TNA data points by 2782, from 67 341 to 70 158. 3.1.2 Flow In an attempt to not reinvent the wheel, and thus by convention of previous research (Sirri & Tufano, 1998; Huang, Wei, & Yan, 2007), we utilize a standard measurement variable for mutual funds flow. Explicitly, this can be defined as FLOWi,t = TNAi,t − TNAi,t−1 × (1 + ri,t) TNAi,t−1 (3.1) where TNAi,t is the total net assets for the fund i for time period t, and ri,t is the return for fund i for the time period t. This standard procedure of estimating fund flows removes the intrinsic returns of the funds constituents, thus cleaning the total assets change from general price movements. This procedure for the funds flow computation is normally done for estimating cross-sectional relationships. We base our aggregate index fund flow computation formula on equation 3.1, but, as expected on an aggregate level, we modify the formula to estimate flow on a macro-scale. We adopt a similar method as Cao, Chang & Wang (2008) for calculating aggregate flow. The computation becomes fairly simple, where, for each time period, we estimate the dollar value flow for each fund and obtain a aggregate dollar value flow for each period. We subsequently adjust the aggregate dollar value flow by dividing by previous periods aggregate total net assets, creating a fractional 3 DATA 9 flow measurement. Mathematically, this procedure is equivalent to FLOWAGGt = Nt∑ i=1 FLOWi,t × TNAi,t−1 × Nt−1∑ j=1 1 TNAj,t−1 (3.2) where Nt is the number of index funds for period t. Explicitly, the dollar value flow is thus given by FLOWAGGt = Nt∑ i=1 FLOWi,t × TNAi,t−1 (3.3) As these flow equations rely on a difference operator, the resulting number of FLOWAGGt values is one less than the number of ∑Nt i=1 TNAi,t values. As such, we compute two different flow measurements: fractional flow using equation 3.2, and dollar flow using equation 3.3. The reason why we utilize equation 3.2 and 3.3 for calculating FLOWAGGt values is due to that some adjustments need to be done on individual fund flow level. Due to this nature of the flow data, we obtain a total of 70 255 final individual flow values, after all adjustments described below. Before any such adjustments to the flow data, we have 69 516 data points (i.e. naked individual flow data).12 As the data exhibits irregularities regarding when different funds start and stop reporting data (we explain these as the inception and death of the fund); following Sirri & Tufano, 1998, we manually insert a 100% flow value for the month of the first reported TNA value (the Inception) and a - 100% flow value for the month succeeding the last reported TNA value (the Death). This Inception/Death procedure adds 739 new flow values, raising total data points to 70 255. Likewise, some funds have long time periods of non-reported data in between data points; we treat funds with such data structure as dead if the gap is longer than 12 months. To reduce the impact of large outliers in the flow data, especially occurring in the second period of a funds reported data, wherein the first period TNA is typically a very low value and then a huge influx of money in the second time period’s TNA value, causing an enormous flow value. We have winsorized the flow values in the first and the ninety ninth percentile. We winsorize the entire data set, and not per fund or per month, replacing 1405 number of Flow values.13 Finally, we utilize equation 3.2 and 3.3 to compute our FLOWAGG values. In Table 2 we present descriptive statistic for our calculated FLOWAGG values, pro- viding us with several insights, and in figure 2 we show the evolution of FLOWAGG over time. The fractional flow data experience a clear positive mean of 0.35% (4.28% annu- alized) very similar to the median, possibly due to the much larger number of inflow 12In appendix A.3 we present regressions with the naked aggregate flow data. We advise the reader to read section 4 and 5 before engaging the naked data regressions. 13It is obvious that the ordering of the data adjustments matter. Naturally, we winsorize last as to not cause any ’damage’ to the dataset. 3 DATA 10 Flow descriptive stats Panel A: Aggregate mutual fractional fund flow N Max 75% 50% 25% Min Mean Std Skew Kurt Flow 239 155 49 32 15 -70 35 33 0.78 4.76 Inflow 212 155 51 36 22 00 41 29 1.43 4.45 Outflow 27 -1 -6 -12 -19 -70 -15 14 -2.45 9.87 Panel B: Aggregate mutual dollar fund flow N Max 75% 50% 25% Min Mean Std Skew Kurt Flow 239 8.48 3.81 2.21 0.89 -1.73 2.47 2.06 0.49 2.82 Inflow 212 8.48 3.99 2.54 1.46 0.00 2.86 1.85 0.72 2.96 Outflow 27 -0.02 -0.30 -0.44 -0.57 -1.73 -0.56 0.40 -1.51 4.78 Panel C: partial autocorrelations of aggregate mutual fund flow Lag 1 2 3 4 5 Fractional flow 0.1105 0.0817 0.1525 -0.0501 0.1205 Dollar flow 0.5527 0.2733 0.2244 0.0609 0.1138 Table 2: Descriptive statistics for monthly aggregated flow values from Feb 2000 until Dec 2019. The values of fractional flow are in basis points and the values for dollar flow are in billions of dollar. Flow represent all flow data, while Inflow and Outflow represent the flow data corresponding to positive and negative flow, respectively. observations. We observe that only (approximately) 11.3% of the flow data are outflows, and the last outflow occurs on Oct 2011. The extreme values are clearly observed for inflow, for both the fractional and dollar measurement. The fractional flow data seems to exhibit leptokurtic properties, meaning it contains fat tails and are more prone to outliers in comparison with a standard normal distribution, whereas the dollar fund flow show signs of the opposite, platykurtic, implying fewer and lower extreme values. We also ob- serve a positive skewness for flow (more so for fractional flow than dollar flow), somewhat contradicting the daily S&P500 flow values computed by Cao, Chang & Wang (2008) over the period Feb 1998 to Dec 2003, who obtained a mean of -0.20 basis points with more outflows than inflows. They attribute their negative skewness to the large amounts of outflows that occurred during the dot-com bubble. Nevertheless, the evolution of our monthly aggregate fractional flows appear similar to Cao, Chang & Wang (2008)(the evo- lution of our fractional flow is showed in figure 2), suggesting that for an increasing fund sector the volatility in mutual fund flows occur more early than late for a given time span; why we also compute and test dollar flow. We contribute this decreasing fractional flow to the somewhat stable dollar flow. The large returns attributed to S&P500 during 2010- 2019 is causing a size effect for fractional flow, wherein the total net assets is increasing more than dollar flow, resulting in a smaller relative flow. 3 DATA 11 Figure 2: Estimated monthly aggregate Flow values from Feb 2000 until Dec 2019. Fractional flow (left hand axis) are presented in basis points, and dollar value Flow (right hand axis) are presented in billion of dollars. 3.2 Handling of fees Data of different fees were obtained, including expense ratio, management fees and 12b fees. The return data obtained from CRSP handles the management and 12b fees, as they are included in the net asset value (NAV) of which the return is calculated (Center for research in security prices (CRSP), 2019). Equation 3.4 is used by CRSP to calculate the return for a specified fund. rt = NAVt ∗ Cumfact NAVt−1 − 1 (3.4) where Rt is the return for time t, Cumfact is a factor consisting of various distributions, such as dividends and splits that occur in the holdings, and NAVt is the net asset value as of period-end t. This procedure is estimated to capture the effect of these events and yield a fair fund return. In conclusion, we are left with the expense ratio paid by investors, which will be utilized as the fee variable throughout this thesis. Hortac¸su & Syverson (2004) find that index fund intra-competition were mainly related to other factors than fund fees. This suggests that the aggregating the index fund fee data does not cause harmful inconsistencies in the data, but rather provide us with a useful tool estimate the aggregate index fund flows to the general index fund fee level. As such, we collect yearly fee data for the identified index funds. Fee values are collected as percentage points and in the report fees are on average for all fund, i.e non weighted to capture the entire market of index fund and not suffer from large impact of leading index funds. The fee data experienced missing values in connection to inception and liquidation. These missing values were either filled by the previous years reported fee, or, if that value does no exist, the next years reported fee. Since the fees reported are yearly fees, they 4 METHODOLOGY 12 Figure 3: The evolution of average index fund yearly fee for the period 2000-01-01 to 2019-12- 31. The Fee data are presented in basis points. were divided by 12 to represent the cost for each month of holding the fund. Thus, we obtain piece wise monthly fee data, readjusting on a yearly basis. Finally we calculate the mean of each month to obtain an aggregated fee per month for all the index funds. Figure 3 displays a clear pattern in fee reduction over the past 20 years. The average cost of a S&P500 index fund has dropped roughly 30 basis points. Interestingly, a small upwards rebound is observed during periods of crisis (in 2002 and 2008-2010). 4 Methodology Following the recent conventional papers on the evolution of market efficiency (i.e. time indexed/varying), we will utilize the Hurst exponent as a proxy for market efficiency (Bariviera, 2011; Eom et al, 2008; Kristoufek & Vosvrda, 2012). Explicitly, this thesis thus aims to examine the relationship between aggregate index funds flow (estimated on aggregated fund level) and the markets Hurst exponent (estimated on market level, i.e. the target index). This section is organized as follows: we will first setup up the general methodology as a martingale sequence for testing EMH, then introduce the concept of long-term dependency, and finally we will conduct a Granger causality test and specify our estimation model. 4 METHODOLOGY 13 4.1 General method setup As is custom (see for example Kristoufek & Vosvrda (2013) or Huang, Wei, & Yan, (2007)), we can define an efficient market in accordance to security prices conforming to a martingale sequence. Let C be the securities market expressed by the probability space (Ω,F , P ), where Ω is the sample space, F the event space (a subset of Ω). Further, let P be the real probability measure and the conditional probability as P[Xt|Ft]. As such, A securities market C = (Ω,F , P ) is efficient if there exists P such that the time-series of prices S = (St)t≥1 is a martingale process; i.e., P[|St|] <∞ P[St+1|Ft] = St t ≥ 0 (4.1) From here we can redefine the martingale price process to incorporate a random error term. Let (t)t≥1 be random IID. variables with mean zero, that is P[t] = 0. This incorporates innovations with no autocorrelation. Thus, we can reformulate the security price series into P[St+1|Ft] = St + P[t+1|Ft] (4.2) Such a security market, where the security prices are described by the above, would be efficient as the market does not exhibit any pattern or memory to exploit (McCauley, Bassler, & Gunaratne, 2008): there exists no systematic way to beat the market. By utilizing this martingale feature of security prices, we obtain a robust model and avoid the random walk assumption of homoskedasticity (Kristoufek & Vosvrda, 2013). Converting price data to return data, and let the 1-period return be rt = St−St−1 St−1 and µ be the drift of the return process, we obtain P[rt+1|Ft] = µ+ φrt−1 + P[t|Ft] (4.3) Modelling security returns in this manner (according to equation 4.3), the prices would adhere to a random walk, suggesting a weak form of EMH is present. Subsequently, such a return series residuals can be estimated to follow a white noise process by modelling the long-term dependency measurement the Hurst exponent, where a Hurst exponent equal to 0.5 corresponds to a random walk (Hsieh, 1993). This produces a variable effective as a test of the martingale sequence and long-term dependency. Important to note, is that a martingale sequence can exhibit memory in the sense we know the previous values and that the expected value is just this previous value (McCaule, Bassler, & Gunaratne, 2008). When we, in this paper, talk about a no-memory scenario, we rather mean that the innovation defining a step in the sequence cannot be predicted (i.e. it should be random). 4 METHODOLOGY 14 4.2 Long-term dependency The Hurst exponent is a popular estimate for assessing long-term memory of a time series. The method was originally invented by a hydrologist named Hurst who struggled to pre- vent the Nile River Dam of overflowing. The method was utilized within finance roughly a century after its first appearance and is today widely used to detect long term memory in data (Peters, 1991). The Hurst exponent generally has a support of H ∈ (1, 0) with three general outcome estimations: H = 0.50 indicates a random and uncorrelated series with no memory, H < 0.50 indicates that the series exhibit anti-persistence or anti-memory, and H > 0.50 indicates long-term memory in the series Peters (1991).14 Unfortunately, no known distribution exists for Hurst exponent, but these general outcomes have been asymptotically proven. The computation of Hurst exponent can be done by many different statistical techniques, the two most commonly used within finance and therefore utilized in this paper are; the rescheduled range (RS) and detrended fluctuations analysis (DFA). Both techniques are more thoroughly explained later in this chapter. One drawback of the RS Hurst exponent is the estimation of existing long-term memory due to short-term dependency in the data series (Grau-Carles (2000); Di Matteo, (2007)). The two main methods to avoid this issue is (i) to filter the return data through a GARCH-process, and (ii) to use the Detrended Fluctuation Analysis (DFA) method. Bariviera (2011) and Cajueiro & Tabak (2004) [among others] bypass this issue by filtering the return series through an AR(1)-GARCH(1,1) process. This procedure re- moves the short-term dependencies (memory) found in the time-series (by construction of the autoregressive element), which, if left undisturbed, may instill long-term dependency (where none actually exist) in the Hurst exponent (Tabak & Cajueiro, 2004; Di Matteo, 2007). The lagged component of the main equation in the GARCH leaves the residuals as the ”true and random” returns for which we aim to estimate long-term memory. Bariv- iera (2011) and Grau-Carles (2000) estimate the Hurst exponent using the DFA method, wherein the residuals of a locally detrended integrated time-series are used to compute the global fluctuations and repeated over multiple window lengths.15 Di Matteo (2007) further criticize the conventional Hurst RS method for its sensitivity to outliers and pres- ence of heteroskedasticity in the returns time series. In this paper, to avoid this spurious detection of long-term memory, we will estimate the Hurst exponent using both the RS 14The reason why the Hurst exponent, with its support of H ∈ (0, 1), can act as a proxy for market efficiency is due to the measurement of the average fluctuation and how it relates to time periods. Unfortunately When the average fluctuation for a long window sizes is roughly equal to the average fluctuation for short window sizes, the return data is exhibiting frequent sign-changes (+ and−). Likewise, when the average fluctuation is larger for longer window sizes, the return data, by necessity, exhibit sign- trends wherein subsequent returns change sign less frequently. A white noise process have been empirically asymptotically discovered to equal a Hurst exponent of 0.5 (Anis & Lloyd, 1976). 15We here use the word integrated in the sense of the original paper from Peng et al. (1995), wherein a detailed description of the DFA method can be found, meaning a mean adjusted cumulative sum; i.e. a cumulative deviation from the mean. 4 METHODOLOGY 15 approach (with filtered returns) and the DFA method. Furthermore, previous studies consistently estimate a higher Hurst exponent using the RS method than by using the DFA method (see for example Bariviera (2011)). 4.3 The Hurst exponent The Hurst exponent is formally defined as a power law function (for reference, see for example Peters (1991)) of the type Φn = Cn H as n −→∞ (4.4) where H is the Hurst exponent, C is a constant, n is the number of observations in a partial time series, and ΦRSn = E(RS)n for RS (rescaled range analysis), or Φ DFA n = Fn for DFA (detrended fluctuation analysis). As such, distinguishing these methods is the computation of the left hand side (both of which will in detail be described in the following sections). To obtain the Hurst exponent H, a log-transformation is utilized16, converting equation 4.4 into log(Φn) = log(C) +H × log(n) (4.5) where, once Φn is computed, a simple linear regression can be run to estimate H. Let’s define this inverse function for running equation 4.5 and estimating H, by inputting Φn on the left hand side, as Θ(Φn). Explicitly, Φn is thus a vector with equal size as n. When estimating a time varying Hurst exponent, reasonable assumptions regarding window size are of importance. Eom et al. (2008) used a window size of 60 months with 12 months rolling. In accordance with similar studies (see Cajuerio & Tabak, 2004; Bariviera, 2011), we are arguing for a Hurst window size corresponding to political cycles. As such, we utilize a window size of N = 1008 observations (252 trading days multiplied by four years)(let’s call this the global window length[GWL]).17 Calculating the Hurst exponent Ht thus requires N = GWL − 1 of previous observations including the observation on period t. As such, to compute the first Hurst exponent for Jan 2000, data is needed from Jan 1996; why we download a longer data set for S&P500. 4.3.1 GARCH The Generalized Autoregressive Conditional Heteroskedasticity model, or GARCH, is a process to estimate the conditional variance of a time-series which exhibits heteroskedas- ticity. The GARCH-model is constituted by two equations: the mean equation, which 16By standard of convention, the log-operator is defined as the natural logarithm. 17The Hurst window size profoundly affects the estimated Hurst exponent. Thus, choosing the correct window length is a sensitive issue, why we follow previous studies. As with any econometric computation, more observations generally produce a more robust estimation; nonetheless, when estimating current day market efficiency, far historical returns should have a diminishing effect. 4 METHODOLOGY 16 makes assumptions and models the underlying time-series, and the variance equation, which models the conditional variance of the underlying time-series. As previously dis- cussed, we filter the return data through an AR(1)-GARCH(1,1) model for a better input variable into the RS analysis. Thus, our model is specified as an AR(1)-GARCH(1,1) process rt = µ+ Φrt−1 + t (4.6) t = σtzt, zt ∼ N(0, 1) (4.7) σ2t = ω + α 2 t−1 + βσ 2 t−1 (4.8) where equation 4.6 is the mean equation with the specific AR(1)-term, and equation 4.8 is the variance equation, wherein the rt is the return for time t and σ2t is the conditional variance of the returns for time t, and µ, ω, α, and β are unknown model parameters that need to be estimated. For a reasonable behaving GARCH-process, some of the parameters require some restrictions: ω > 0, α > 0, and β > 0 to ensure positivity of the conditional variance, and α + β < 1 to ensure stationarity. In Table 3, we present our estimation of the AR(1)-GARCH(1,1) process. We observe a clear significant autoregressive process in the S&P500 returns. AR(1)-GARCH(1,1) estimation Value SE T-stat P-value AR(1) Constant 0.0008 0.0001 7.4025 0.00 AR(1) -0.0494 0.0139 -3.5559 0.0004 GARCH(1,1) Constant 0.00 0.00 6.3747 0.00 GARCH(1) 0.8745 0.0061 142.34 0.00 ARCH(1) 0.1050 0.0049 21.311 0.00 Table 3: Model estimation for AR(1)-GARCH(1,1), estimated using daily S&P500 returns from Jan 4th 1999 to Dec 31 2019; where the AR-value is the coefficient for the lagged return term in the mean equation, the ARCH-value is the coefficient for the lagged error term in the variance equation, and the GARCH-value is the coefficient for the lagged variance term in the variance equation. 4.3.2 Rescaled range analysis The computation of the rescaled range analysis follows several steps, outlined below, and is performed for multiple partial series of the original full length series. That is, we first divide the full time series of length N into j = 1, 2, ..., k non-overlapping partial time series with lengths ni. There exists several ways to conduct this division: for example, ni = N,N/2, N/4..., or ni equal to the factors of N , where i = 1, 2, ..., v. For our 4 METHODOLOGY 17 calculations we utilize the latter approach, due to its computational simplicity (following Weron (2002). We then execute the following steps for each ni 1. Calculate the mean: E[Xnij ] = 1 ni ∑ni t=1X ni j,t 2. Calculate mean-adjusted series: Yj,t = X ni j,t − E[X ni j ] 3. Calculate cumulative deviations: Znij,t = ∑ni t=1 Y ni j,t 4. Calculate the range: Rnij = max(Z ni j )−min(Z ni j ) 5. Calculate the variances: (σnij ) 2 = E[(Xnij,t) 2]− E[Xnij,t] 2 6. Compute the rescaled range and the average (expectation) for all partial time-series: R ni j σ ni j and subsequently E(RS)n = E [ Rnj σnj ] = 1k ∑k j=1 Rnj σnj After obtaining E(RS)n, it is now possible to estimate the Hurst exponent in Θ(Φn). However, such an estimation will have a significant deviation from its theoretical value for small window sizes ni (Weron, 2002). To correct for this, we subtract the window sizes theoretical white noise approximation (this modification to the RS analysis was introduced by Anis & Lloyd (1976) with some slight modifications by Peters (1991)). As such, we obtain the true deviations from the white noise slope; the Hurst exponent can thus be calculated as 0.5 plus this deviation. The theoretical white noise approximation is given by (keeping the original left hand side notation from Anis & Lloyd (1976)) E(R∗∗n ) =    ( n− 12 n ) Γ{ 12 (n−1)}√ piΓ{ 12n} ∑n−1 i=1 √ n−i i for n ≤ 340 ( n− 12 n ) 1√ 1 2pin ∑n−1 i=1 √ n−i i for n > 340 (4.9) where Γ{λ} is the gamma function evaluated at λ, and n are the window sizes ni. Ex- plicitly, the corrected version of the RS Hurst is thus calculated as18 HRS = 0.5 + Θ (E[RS]n)−Θ(E[R∗∗n ]) (4.10) Following Cajueiro & Tabak (2004) and Bariviera (2011), before calculating the Hurst exponent, we employ the AR(1)-GARCH(1,1) process to filter the returns for short-term dependency (see Table 3); the estimated residuals from the mean equation 4.6 are then divided by the conditional standard deviation from the variance equation 4.8. The result- ing fraction is our filtered returns and is thus used to complete the Hurst calculation. 18The attentive and enlightened reader might here realize that E(R∗∗n ) is not time-varying, but only a function of window sizes n. As such, in equation 4.10, we are subtracting a constant number 0.5 − Θ(E[R∗∗n ]) from all estimated RS Hurst exponents. For a global window size of 1008 returns, and local window sizes n = divisor(1008), this constant adds up to approximately -0.0688. This means that for our local window sizes n, Θ (E[RS]n) consistently overestimate the Hurst exponent by this value. 4 METHODOLOGY 18 Formally we can define the filtered returns explicitly as: Ωt = t √ σ2t (4.11) where Ωt are the filtered returns, t are the residuals from the AR(1)-GARCH(1,1) mean equation, and σ2t are the conditional variances from the AR(1)-GARCH(1,1) variance equation. That is, we conduct the RS approach on the filtered returns Ωt. 4.3.3 Detrended fluctuation analysis Detrended fluctuation analysis (DFA) is a method to detect long term memory in data similar to the previously introduced RS approach, but differing in some aspects, which makes DFA a useful variable along the rescaled range analysis for robustness purposes. DFA was introduced by Peng et al. (1995) and foremost used within the medical field to find long term correlations in heart rate and DNA data. DFA was popularized within financial data by Kantelhardt et al. (2001) and is frequently used to examine long range dependencies. When detecting long range memory it is essential to filter out disturbances possibly causing spurious memory. In the RS Hurst approach short term memory was removed from the data by the filtering process. DFA instead identifies trends within each local window size, which can cause false dependencies, both long and short term (Eom et al, 2008) caused by externalities (Kantelhardt et al, 2001). The computation of the detrended fluctuation analysis is in principal similar to the rescaled range analysis, but differs in some methodological aspects. Exactly alike, we first divide the full time series of length N into k non-overlapping partial time series (windows) with equal lengths ni (utilizing the divisors approach here as well). For each ni, we do the following. For each partial time series j = 1, 2, ..., k, we compute the mean x¯j = E[xj] and subsequently the integrated series of Xj,t (a cumulative mean-centered sum) Xj,t = t∑ i=1 [xj,i − x¯j] (4.12) Withing each integrated series Xj,t, a fitted straight line is located to find the trend for each window. Let Yˆj,t be this straight line fit, obtained from Xj,t = α + bτ , where τ = 1, 2, ..., ni. Then, the fluctuation Fn is computed as follows for each window length ni Fn = √ √ √ √ 1 N N∑ t=1 (Xj,t − Yˆj,t)2 (4.13) Thus, we obtain an average fluctuation Fn for each integrated and detrended time se- ries of length ni, and we can now estimate the Hurst exponent H as Θ(Fn) (equation 4 METHODOLOGY 19 4.5).19 In contrast with the rescaled range analysis, the Hurst exponent estimated from the detrended fluctuation analysis have a support of (0 < d ≤ ∞). Similarly, a Hurst coefficient of H < 0.5 indicates negative memory or anti-persistence and H > 0.5 positive memory, and d ≈ 0.5 is a sign of no memory. Although normal behaving return series still lie within H ∈ (0, 1), a Geometric Brownian Motion process (a cumulative sum of the returns) would theoretically produce a Hurst exponent equal to 1.5. 4.4 Endogeneity concerns The causal relationship between passive investing and market efficiency exhibits uncer- tainty regarding direction of causality. This duality of whether passive investing exists because of efficient markets or does passive investing affect market efficiency, raises con- cerns about the dependency in the structural regression. To examine whether a variable has a causal relationship with another variable we choose to conduct a Granger causality test. The test procures a measurement of how well the dependent variable exhibits the same pattern as the key variable. By lagging the key variable, where p is the minimum and q the maximum lag of significance, we observe how well they can describe or predict the dependent variable. The model is thus given by a VAR(p,q)-model, specified explicitly as   Ft Ht   =   b1 b2  +   ψ11,p ψ12,p ψ21,p ψ22,p     Ft−p Ht−p  + ...+   ψ11,q ψ12,q ψ21,q ψ22,q     Ft−q Ht−q  +   1,t 2,t   (4.14) where, Ft is the flow variable, Ht is the Hurst exponent, i,t is a standard error term i,t ∼ N(0,Σ) where Σ is a 2×2 covariance matrix. The off-diagonal elements in equation 4.14 represent the cross-impact of flow and Hurst, while the diagonal elements are the autoregressive terms. We produce VAR-models for dependent variables fractional flow and dollar flow, with dependent variables RS Hurst as well as DFA Hurst. To determine 19 We believe it is here appropriate to clarify the DFA-method with an example. Lets say the integrated series Xt have a length of 252 (let us also define this as the global window size). From this we choose a our window lengths n to be from 12 up to 1008 in increments of 1. That is we first divide Xt into non- overlapping windows of length 5, giving us b 2525 c = 50 windows (let us call these windows local windows). For each local window we locate the straight line fit Yˆt. Next, for the calculation of F5 we detrend Xt by its corresponding Yˆt and complete the computation. We then repeat this process for F6,F7,...,F252 with local window sizes of 6, 7, ..., 252. For a financial time series it is reasonable to set the minimum local window size to 5, corresponding to 1 trading week. This DFA technique is quite computational intensive as it estimates a vast number of regression. Explicitly, lets denote the total length of our required series to T , the global window length as GWL, and the maximum local window length as LWLN , then the total number of estimates Ψ are equal to Ψ = (T −GWL+1)×GWL× (∑LWLN n=LWL1 n LWLN ) . To compute daily DFA in the preiod 2000-2019 with GWL = 1008 and LWLn = 5, 6, ..., 252, we need T=6038 return data, from 8th Jan 1996 to 31th Dec 2019, resulting in approximately 641 million estimations. A more simple approach established by Weron (2002), which we utilize, is to choose LWL equal to the factors of the GWL, as well as limiting minimum LWL to 8. This is far less computational intensive and avoids the problem of unregular LWL causing last window to have fewer observations. 4 METHODOLOGY 20 VAR specification - fractional flow DFA-VAR(3) RS-VAR(1) Dependent → Flow Hurst Flow Hurst Response ↓ Constant 0.0055** 0.0127** 0.0089** 0.0161* (2.1027) (2.1772) (2.2137) (1.8786) AR(1) Flow 0.0982 0.0812 0.1056 0.0994 (1.5474) (0.5761) (1.6418) (0.721) Hurst 0.0003 1.4905*** -0.0106 0.9699*** (0.0104) (23.35) (-1.4457) (61.646) AR(2) Flow 0.0488 0.0954 - - (0.7768) (0.6841) Hurst 0.0453 -0.7232*** - - (0.9483) (-6.8149) AR(3) Flow 0.1437** 0.0326 - - (2.2949) (-0.2343) Hurst -0.0528* 0.2021*** - - (-1.8322) (3.1607) NumParam, k 14 6 LogLikelihood 1860.43 1868.38 AIC -3692.86 -3724.75 Table 4: VAR model specification for fractional flow and Hurst exponent calculated using both rescaled range (RS) and detrended flucuation analysis (DFA). The best fit for DFA was a VAR(3) model and for RS a VAR(1) model. Flow was computed as specified in equation 3.2. Daily DFA and RS Hurst exponents was computed accordingly to section 4.3.2 and 4.3.3, respectively, and subsequently averaged on monthly basis. The values in parenthesis represent the above coefficients t-statistic. ***, **, and * indicates significance at 1%, 5%, and 10%, respectively. the length of which to lag the key variable in the Granger Causality test (that is, to find optimal q), we conduct an Akaike Information Criterion-test (AIC). The AIC determines ”information loss” given a specific model, and the best fit is the model which minimizes the AIC. The AIC is given by AIC = −2(logL) + 2k (4.15) where logL is the loglikelihood value of the estimated VAR(p,q)-model and k is the number of estimated parameters for the model. We run the AIC estimation for p = 1 and q = 1, 2, ..., 12. In Table 4 and 5 we present the VAR models, and their corresponding LogLikelihood and AIC values, for fractional flow and dollar flow, respectively. The best fit (lowest AIC value) obtained for fractional flow is for a DFA specification q = 3 and for a RS specification q = 1; and, the best fit (lowest AIC value) obtained for dollar flow is for a DFA specification q = 3 and for a RS specification q = 2. The off- diagonal elements in equation 4.14 is our main interest of study, where, for example, ψ12,1 4 METHODOLOGY 21 VAR specification - dollar flow DFA-VAR(3) RS-VAR(2) Dependent → Flow Hurst Flow Hurst Response ↓ Constant 6.4417*** 0.0213*** 7.0607*** 0.0172** (3.9386) (2.8912) (3.5449) (1.9896) AR(1) Flow 0.2875*** 0.00 0.3591*** 0.00 (4.5409) (0.3181) (5.734) (0.0446) Hurst -11.296 1.4818*** -27.307* 0.9613*** (-0.7989) (23.27) (-1.8522) (14.809) AR(2) Flow 0.1414** -0.0003 0.2309*** -0.0002 (2.1943) (-0.8633) (3.7402) (-0.587) Hurst 27.11 -0.7083*** 14.573 0.0034 (1.1539) (-6.693) (0.9791) (0.0515) AR(3) Flow 0.1761*** -0.0004 - - (2.8273) (-1.3749) Hurst -28.433** 0.1803*** - - (-2.0007) (2.817) NumParam, k 14 10 LogLikelihood 399.08 391.15 AIC -770.16 -762.30 Table 5: VAR model specification for dollar flow and Hurst exponent calculated using both rescaled range (RS) and detrended flucuation analysis (DFA). The best fit for DFA was a VAR(3) model and for RS a VAR(1) model. Flow was computed as specified in equation 3.3. Daily DFA and RS Hurst exponents was computed accordingly to section 4.3.2 and 4.3.3, respectively, and subsequently averaged on monthly basis. The values in parenthesis represent the above coefficients t-statistic. Dollar value flow are in billions of dollar. ***, **, and * indicates significance at 1%, 5%, and 10%, respectively. correspond to dependent variable flow and independent response AR(1) Hurst in their respective tables.20 We note that HDFAt−3 have a significant impact on Flowt (although only partially for fractional flow), possibly causing the best fit to be a VAR(3)-model for both fractional and dollar flow. Nonetheless, HRSt−1 is not significant on fractional flow, it is partially significant at 10% on dollar flow. Moreover, all of the AR-terms on Hurst are highly significant. This is most likely due to that both DFA and RS are stationary processes. Economically this is intuitive, as one state of market efficiency highly depend on previous states of market efficiency. Next, we run Granger Causality tests with the specified VAR models presented in Table 4 and 5. A Granger causality test estimates if either of the series Granger cause the 20To ensure the causality we also conduct a BIC-test, similar to the AIC. The BIC penalizes more for complex models than the AIC, and is given by BIC = −2(logL) + k ∗ ln(n), where n is the number of observations and otherwise the same notation as in equation 4.15 applies. The BIC-test yields similar results and confirm the direction of the causality. 4 METHODOLOGY 22 Granger Causality test Null hypothesis Statistic P-value Causal direction Panel A: fractional Flow DFA does not Granger cause FLOW 1.6538 0.1778 DFA = FLOW FLOW does not Granger cause DFA 0.2892 0.8332 RS does not Granger cause FLOW 2.0638 0.1522 RS = FLOW FLOW does not Granger cause RS 0.5133 0.4744 Panel B: dollar value Flow DFA does not Granger cause FLOW 4.6735 0.0034 DFA → FLOW FLOW does not Granger cause DFA 1.3794 0.2498 RS does not Granger cause FLOW 6.1413 0.0025 RS → FLOW FLOW does not Granger cause RS 0.20499 0.8148 Table 6: Granger causality test results for the VAR specified models. Panel A is conducted using VAR specification in Table 4 and Panel B is conducted using VAR specification in Table 5. The null hypothesis tests whether the VAR coefficients are jointly difference from zero, see equations 4.16 and 4.17 for explicit specifications. A significant p-value indicates rejection of the null hypothesis. The test statistics are computed using a F-test. . other one; i.e. if predictive power exists. The null hypothesis are stated as the off-diagonal elements (in equation 4.14) provide no joint significance.21 Explicitly, this is HF0 : ψ12,p = ψ12,p+1 = ... = ψ12,q = 0 (4.16) and HH0 : ψ21,p = ψ21,p+1 = ... = ψ21,q = 0 (4.17) where HF0 is interpreted as HURST does not Granger cause FLOW, and H H 0 as FLOW does not Granger cause HURST. In Table 6 we present test statistics for our Granger Causilty test. We find no predictive power for either fractional flow on RS or DFA, and neither for RS or DFA on fractional flow. Nevertheless, both RS and DFA seem to Granger cause dollar value flow, indicating a a causality running from market efficiency towards index fund flow. Furthermore, no observable simultaneity endogeneity is indicated, sug- gesting a one-direction causality relationship. This forecasting/predictive power dictate the direction of the relationship we want to study. We will as such continue this paper by examining the effect market efficiency have on aggregate index fund flow. A possible 21Another approach to test for Granger causality is the null hypothesis that the summed off-diagonal coefficients are equal to zero, ∑ ψi,j = 0 for j 6= i. Such testing procedure examine if the predictor variable have any overall impact on the effect variable; in contrast with our current method which examines if at some lag p to q, the other series can be predicted. Although this particular sum-test approach could be of interest, we consider it out this papers scope, and to therein not conduct one. For the interested reader, this sum-procedure is performed by Kadapakkam, Krause, & Tse (2015) on ETFs. 4 METHODOLOGY 23 explanation for why market efficiency Granger cause dollar flow, and not fractional flow, could be due to the size effect of fractional flow discussed in section 3.1.2. 4.5 Estimation and control variables We base our estimation model on the causality found in the Granger causality test. Thus, to examine the relationship between passive investments and market efficiency, we regress Hurstt on FLOWt, while controlling for variables that might affect Flow levels. Several variables beside flow and Hurst are included in the regression to increase explanatory power. The variables are generally consistent with the regression of Sirri & Tufano (1998), and Huang, Wei, & Yan (2007) as well as Edelen & Warner (2001), but adapted for index funds. Hence, variables related to specific fund returns (typically a ranking based on performance of the mutual funds) are omitted since they are very similar. We instead use market returns of the index, as it is reasonable to conclude that it is the index return which base an investment choice rather than the specific index fund return. When an investor analyzes whether to invest in an index fund, it is sensible to assume that the investor inquire the general index return and then buys the index fund available from an institution where they are existing customers. As such, we regress flow on the following variables: • Hurst - is included to examine the impact market efficiency have on index fund flow. Both contemporanous and lagged Hurst exponents are estimated. • Fees - lagged expense ratio is included in the regression on an aggregated level, to capture effects of fee levels differing for index funds over the time scope of the study. The fee value is equally weighted among all the index funds. • Log(TNA) - lagged logged aggregate total net assets is included to capture changes in the total market size of index funds. • Mkret - is included to capture effects of previous performance of the index fund market. We include contemporaneous as well as both 1-period and 2-period lagged market returns. • CumV ol - lagged 12-month cumulative standard deviation is included to cover the effect of overall total riskiness of the market. This variable is created as a 12-month rolling window computation of the standard deviation. Our specified final estimation thus arrive at FLOWt = α + β1Hurstt + β2Mkrett + β3log(TNAt−1) + β4Feet−1 + β5CumV olt−1 + t (4.18) We conduct two series of regressions: the first with fractional flow as dependent variable, and the second with dollar value flow as dependent variable. Several regressions are 4 METHODOLOGY 24 estimated per flow variable with varying regressors. These regressions are presented in section 5. Annual statistics Year Flow (frac) Flow (dollar) TNA Return Fee Hurst(RS ) Hurst(DFA) 2000 0.58% 1.8764 319.25 -0.7674% 6.29 0.4471 0.3965 2001 0.56% 1.7750 294.99 -0.9904% 6.41 0.4760 0.4348 2002 0.39% 1.7325 268.86 -2.0061% 6.34 0.4884 0.4452 2003 0.59% 0.8870 286.08 2.0593% 6.19 0.4889 0.4344 2004 0.48% 2.0230 359.74 0.7786% 5.71 0.5202 0.4691 2005 0.39% 1.4814 406.18 0.3123% 5.31 0.5014 0.4747 2006 0.08% 1.4125 458.34 1.1246% 5.17 0.4814 0.4393 2007 0.31% 0.6684 532.33 0.3641% 5.31 0.4519 0.4072 2008 0.51% 1.6435 480.24 -3.7423% 5.48 0.4362 0.4283 2009 0.23% 2.3668 415.74 2.0805% 5.48 0.4739 0.4797 2010 0.22% 0.8026 501.18 1.2312% 5.25 0.4998 0.4802 2011 0.22% 1.2997 596.47 0.1488% 5.13 0.5089 0.4817 2012 0.27% 1.0359 670.56 1.1846% 4.84 0.5075 0.4545 2013 0.41% 2.1999 870.87 2.3160% 4.56 0.4614 0.4107 2014 0.35% 3.4926 1094.3 0.9964% 4.47 0.4468 0.4062 2015 0.35% 3.9248 1264.3 0.0592% 4.10 0.4321 0.3703 2016 0.38% 4.2773 1423.1 0.8755% 3.95 0.4349 0.4086 2017 0.26% 5.3130 1833.1 1.5807% 3.55 0.4381 0.3935 2018 0.22% 4.4228 2138.6 -0.4070% 3.62 0.4752 0.4014 2019 0.18% 4.5541 2435.4 2.2948% 3.65 0.4916 0.4194 Table 7: Annual statistics of regression inputs. The values showcased are averaged per month on a yearly basis. Reported TNA and Flow (dollar)-values are in billions of US-dollars. The returns are averaged using S&P500 returns. The fees are in basis points and based on monthly holdings. In Table 7, we present a summary of the main data used in the regressions averaged yearly. We observe a clear decreasing trend in the flow data. Looking at the TNA values, which instead portray an apparent increase, it seems reasonable to assume that the declining Flow values stem from the growing TNA, considering flow is a relative value. Cao, Chang & Wang (2008), find similar pattern in their flow data, for multiple fund categories between 1998 and 2003. The time period in this thesis suffers two majors set backs, return wise. Firstly, the dot-com crash in the early 2000s, and secondly, the great financial crisis in 2008-2009. The averaged monthly return of -3.7423% in 2008 would yield an annualized loss of almost 37%. Fees, as already mentioned, have declined in the last 20 years and are substantially lower in 2019. As argued by Huang, Wei, & Yan (2007) and Sirri & Tufano (1998) there is a connection between fund flow and the charged fee. The Hurst RS estimations vary around 0.45-0.5, while the Hurst DFA estimations vary around 0.4-0.45.22 This discrepancy between RS and DFA values are in line with findings of Bariviera (2011). 22In appendix A.2 we preset histograms over the daily estimated Hurst exponents. 5 RESULTS 25 5 Results Before we begin discussing the relationship between broad market efficiency and index fund flow, we will conduct a small examination of our estimated proxy for market efficiency (the Hurst exponent). In figure 4 we show graphically the evolution of the Hurst expo- nent during the period 2000-2019. As we can observe, HRS is nearly consistently higher than HDFA, where HRS is hovering around 0.45-0.5, and HDFA between 0.35 and 0.5. Explicitly, this means that S&P500 returns exhibits signs of unpredictable returns and market efficiency according to HRS, while HDFA rather indicates a slight mean-reverting mechanism of the returns. This procures a strange task to analyze such measurements impact. Notably, the discrepancy between the RS and DFA values mirror those of previ- ous research (see for example Bariviera (2011)). Furthermore, these result are consistent with the theory behind both of these market efficiency measurements (see Grau-Carles, 2000). Figure 4: Hurst exponents and the underlying fluctuations (Fn) used in the computations. The Hurst exponents, RS and DFA, are computed accordingly to sections 4.3.2 and 4.3.3, respectively. Daily Hurst exponents are estimated, which then are averaged monthly. The second and third plots show the average fluctuations Fn and E(RS)n used in the computations. The 24 lines represent the various window sizes, where lines higher up correspond to larger window sizes (see footnote 19). Interestingly, in boom periods, the Hurst exponent seems to be trending downwards, 5 RESULTS 26 and vice versa, in periods of recession, the Hurst exponent is rising. Mathematically this makes sense, since a steady growing equity market exhibits small average fluctuations Fn (especially for larger window sizes), resulting in a lesser increase in average fluctuation for increasing window sizes. Likewise, for periods of high volatility and turbulence, the average fluctuation naturally increases causing an upward trend in the Hurst exponent. Economically, this response in the market efficiency is less intuitive. The asymptotically proven white noise series of a Hurst exponent equal to 0.5 seemingly appears arbitrary in an economic sense. Intuitively, it is reasonable that periods of high volatility should induce non-market predictability in returns, which we observe for the increasing of HDFA closer to 0.5 during such periods. Conversely, such equity market volatility normally arises from intervals of steady market conditions followed by subsequent intervals of sharp downturns (e.g. the famous great financial crisis of 2008). Sufficiently long periods of downturns are thus periods of same-sign returns, exhibiting economic predictability and long-term dependence. Such reasoning further purports the notion that developing mar- kets experience a lower degree of efficiency (Bariviera, 2011; Cajueiro & Tabak, 2004; Eom et al., 2008) given a historically higher volatility (Harvey, 1995; Umutlu, Akdeniz, & Altay-Salih, 2010). In Figure 5 we show the estimated Hurst coefficients as well as the asymptotic em- pirical 95% confidence interval.23 As such, Hurst exponents lying inside the confidence interval corresponds to an empirical estimation of market efficiency. We thus observe that the market efficiency measured by RS are for multiple periods non-distinguishable from efficiency, while market efficiency measured by DFA are consistently below market efficiency except for a brief period after the great financial crisis. To examine the relationship between index fund flow and market efficiency we esti- mate three different set of regressions. In Table 8 we present our estimations on fractional flow and in Table 9 we present our estimations on dollar flow.24 We also run a series of regressions with lagged Hurst exponents; these are presented in Table 10. The reasoning behind the lagged Hurst regressions is to provide robustness. We observe the same neg- ative significance for the lagged Hurst as for the contemporaneous Hurst. This suggests that there exists no reverse causality relationship between flow and Hurst. For fractional flow we observe that both market efficiency measurements coefficient are most significant in regressions (B) and (F), whereas for dollar flow, all market efficiency measurement (both contemporaneous and lagged) are significant. Noteworthy, we find a negative relationship between flow and contemporaneous market return. This is not in line with Edelen & Warner (2001), where daily fund flow are 23There exists no known distribution for the Hurst exponent, but Weron (2002) estimated an empirical confidence interval for both RS and DFA around a Hurst exponent eqaul to 0.5. 24In appendix A.3 we also present regressions run on naked aggregate flow data, wherein no adjust- ments on either retunr, TNA, or flow data has been performed. As we observe, we lose significance probably due to the large outliers in the naked flow data. 5 RESULTS 27 Figure 5: The evolution of the Hurst exponent over time, calculated using both DFA and RS. The red dotted lines represent a 95% empirical asymptotic confidence interval around Hurst = 0.5, estimated by Weron (2002). positively correlated with concurrent market returns. They further conclude that it is not an attribute of simultaneous feedback trading, but rather a causality running from flow to returns within a day. Notably is the weak positive significance for Feest−1; a contra-intuitive relationship wherein flow increase with increasing fees. We suggest this is due to the fee variable acting as a time-proxy as we observe a clear negative trend in both fractional flow (see figure 2) and fees (see figure 3).25 The same significant impact cannot be found in the dollar flow regressions. This is in contrast with studies done by Sirri & Tufano (1998) and Huang, Wei, & Yan (2007) who find a general negative relationship between fund flows and fees. A clear distinction from their work is that our thesis only analyze index funds flow whereas Sirri & Tufano (1998) and Huang, Wei, & Yan (2007) studied a broader spectrum of funds, which possibly are more sensitive to fees. Moreover, the logged and lagged TNA values are consistently positively significant in the regressions run on dollar flows, and partly negatively significant in the fractional flow regression. This observed difference could stem from the previous mentioned time factor as fractional flow is slightly decreasing over time whereas the dollar flow is steadily increasing. Overall we find no evidence that past year market volatility impacts current flow values. 25Including a trend variable in these regressions cause Feest−1 to become a negative non-significance. 5 R E SU LT S 28 Linear regression 1: fractional flow Panel A: DFA Panel B: RS N = 239 (A) (B) (C) (D) (E) (F) (G) (H) Intercept 0.0045** 0.02754*** -0.0251 -0.0211 0.0064** 0.0251*** -0.0354 -0.0327 (2.17) (3.43) (-0.79) (-0.67) (2.17) (3.86) (-1.36) (-1.26) Hurstt -0.0023 -0.0147** -0.0109 -0.0122 -0.0061 -0.0135** -0.0113 -0.0126* (-0.47) (-2.12) (-1.31) (-1.47) (-0.97) (-2.08) (-1.64) (-1.85) Mkrett - -0.0124** -0.0133** -0.0132** - -0.0126** -0.0131** -0.0130** (-1.99) (-2.12) (-2.11) (-2.01) (-2.10) (-2.09) Mkrett−1 - - - 0.0037 - - - 0.0037 (0.56) (0.56) Mkrett−2 - - - 0.0024 - - - 0.0026 (0.40) (0.44) Log(TNAt−1) - -0.0013*** 0.0016 0.0014 - -0.0011*** 0.0023 0.0022 (-3.15) (0.95) (0.83) (-3.32) (1.60) (1.50) CumV olt−1 - - .0048 0.0055 - - -0.0009 -0.0009 (0.28) (0.32) (-0.06) (-0.06) Feest−1 - - 22.0462* 20.9689 - - 26.6578** 26.1407** (1.66) (1.58) (2.23) (2.19) Adjusted R2 -0.38% 8.24% 9.45% 8.99% -0.13% 7.66% 9.54% 9.09% F-statistic 0.22 6.24*** 4.72*** 3.91*** 0.95 7.58*** 5.38*** 4.69*** Table 8: Linear regressions estimated on the dependent variable fractional FLOW aggt . FLOW agg t is calculated as the sum of all index funds dollar value flow for month t adjusted by the previous month sum of all index funds total net assets (see equation 3.2). Mkrett is the S&P500 return from period t. CumV olt are the cumulative standard deviation from t− 11 up to t using monthly S&P500 returns. The Log(TNAt), is the logged value of aggregated total net assets for all the index funds for month t. Feest are the average fee level (expense ratio) of the index funds for month t. The Hurst exponents in panel A are calculated using the DFA-method and the Hurst exponents in Panel B the RS-method, both with a rolling window size of 4 years (1008 observations). The Hurst exponents are computed on a daily basis and then averaged over each month to obtain monthly estimates. The regression are estimated on a monthly basis and the numbers in parenthesis represent the t-statistic for the estimated coefficient above. ***, **, and * indicates significance at 1%, 5%, and 10%, respectively. 5 R E SU LT S 29 Linear regression 2: dollar flow Panel A: DFA Panel B: RS N = 239 (A) (B) (C) (D) (E) (F) (G) (H) Intercept 13.7845*** -16.0782*** -31.4213** -31.2826** 14.9674*** -15.7781*** -37.4288*** -37.1942*** (10.31) (-4.83) (-2.09) (-2.08) (7.84) (-4.64) (-3.02) (-2.99) Hurstt -26.1927*** -9.1362*** -8.3138** -8.3271** -26.4069*** -11.1122 ** -10.3066*** -10.3651*** (-8.77) (-3.04) (-2.30) (-2.26) (-6.64) (-3.11) (-2.81) (-2.75) Mkrett - -9.7575*** -10.0412*** -10.1331*** - -9.5229*** -9.7557*** -9.8287*** (-3.83) (-3.94) (-3.96) (-3.68) (-3.78) (-3.78) Mkrett−1 - - - 0.8797 - - - 0.9735 (0.37) (0.41) Mkrett−2 - - - -.8349 - - - -.5472 (-0.37) (-0.24) Log(TNAt−1) - 1.6843*** 2.5533*** 2.5455*** - 1.7598*** 2.9901*** 2.9771*** (9.02) (3.05) (3.04) (9.98) (4.31) (4.28) CumV olt−1 - - 3.4860 3.4207 - - -0.3548 -0.4249 (0.48) (0.48) (-0.05) (-0.06) Feest−1 - - 6392.223 6341.537 - - 9550.218* 9487.402* (1.08) (1.07) (1.88) (1.87) Adjusted R2 19.73% 43.42% 43.51% 43.08% 13.54% 43.77% 44.18% 43.75% F-statistic 76.91*** 52.91*** 31.88*** 23.26*** 44.04*** 53.82*** 32.99*** 23.66*** Table 9: Linear regressions estimated on the dependent variable dollar FLOW aggt . FLOW agg t is calculated as the sum of all index funds dollar value flow for month t (see equation 3.3). Mkrett is the S&500 return for period t. CumV olt are the cumulative standard deviation from t− 11 up to t using monthly S&P500 returns. The Log(TNAt), is the logged value of aggregated total net assets, expressed in $ billions, for all the index funds for month t. Feest are the average fee level (expense ratio) of the index funds for month t. The Hurst exponents in panel A are calculated using the DFA-method and the Hurst exponents in Panel B the RS-method, both with a rolling window size of 4 years (1008 observations). The Hurst exponents are computed on a daily basis and then averaged over each month to obtain monthly estimates. The regression are estimated on a monthly basis and the numbers in parenthesis represent the t-statistic for the estimated coefficient above. ***, **, and * indicates significance at 1%, 5%, and 10%, respectively. 5 R E SU LT S 30 Linear regression 3: dollar flow, lagged Hurst Panel A: DFA Panel B: RS N = 239 (A) (B) (C) (D) (E) (F) (G) (H) Intercept 13.9142*** -15.5915*** -29.1311* -28.7852* 15.7408*** -14.5046*** -35.3008*** -34.8036*** (10.07) (-4.72) (-1.93) (-1.91) (8.70) (-4.41) (-2.92) (-2.88) Hurstt−1 -26.4940*** -9.8331*** -9.0674** -9.1521** -28.0499*** -13.0412*** -12.1971*** -12.4051*** (-8.60) (-3.25) (-2.48) (-2.46) (-7.44) (-3.82) (-3.48) (-3.46) Mkrett - -9.6418*** -9.9188*** -10.0051*** - -9.1592*** -9.3951*** -9.4662*** (-3.80) (-3.91) (-3.93) (-3.68) (-3.66) (-3.66) Mkrett−1 - - - 1.0370 - - - 1.3526 (0.44) (0.57) Mkrett−2 - - - -0.7389 - - - -0.523741 (-0.33) (-0.24) Log(TNAt−1) - 1.6703*** 2.4362*** 2.4173*** - 1.7326*** 2.9127*** 2.8873*** (9.03) (2.91) (2.88) (9.99) (4.28) (4.24) CumV olt−1 - - 3.7741 3.4207 - - -0.5538 -0.6273 (0.53) (0.53) (-0.08) (-0.10) Feest−1 - - 5579.195 5469.133 - - 9167.343* 9050.544* (0.94) (0.92) (1.83) (1.82) Adjusted R2 20.55% 43.74% 43.74% 43.31% 15.73% 44.59% 44.92% 44.53% F-statistic 73.89*** 62.67*** 38.00*** 26.98*** 55.40*** 64.84*** 39.83*** 28.30*** Table 10: Linear regressions estimated on the dependent variable FLOW aggt . FLOW agg t is calculated as the sum of all index funds dollar value flow for month t (see equation 3.3). Mkrett is the S&500 return for period t. CumV olt are the cumulative standard deviation from t−11 up to t using monthly S&P500 returns. The Log(TNAt), is the logged value of aggregated total net assets, expressed in $ billions, for all the index funds for month t. Feest are the average fee level (expense ratio) of the index funds for month t. The Hurst exponents in panel A are calculated using the DFA-method and the Hurst exponents in Panel B the RS-method, both with a rolling window size of 4 years (1008 observations). The Hurst exponents are computed on a daily basis and then averaged over each month to obtain monthly estimates. The regression are estimated on a monthly basis and the numbers in parenthesis represent the t-statistic for the estimated coefficient above. ***, **, and * indicates significance at 1%, 5%, and 10%, respectively. 5 RESULTS 31 Although the RS coefficients are on average more significant than the DFA coefficients, we find very similar coefficients for both, suggesting that choice of method does not significantly alter the market efficiency’s estimated effect on flow.26 As such, we find that a lesser degree of market efficiency negatively impacts flow. But how do we further examine this phenomena? This market efficiency measurement is both stationary and continuous; flow and market efficiency are ultimately two completely different variables in terms of structure. Market efficiency should be seen as a current state of the market, a continuous condition of how inter-dependent the returns are. Flow, on the other hand, is an event, a one period happening. This touches upon an important issue; the levels of index fund flow should be affected by whether the market returns show signs of long-term dependency or short-term anti memory. Both states should indicate an inefficient market, but are otherwise completely different in nature. In the regressions in Table 9 and 10, we observe that the higher the Hurst exponent is, the lower Flow becomes. This is reasonable, as the Hurst exponent generally increases when estimated over periods of high volatility with a justifiable simultaneous liquidation of index fund assets. This reasoning seems strengthened when concurrent market returns are included [(B) and (F)], which reduces the market efficiency coefficient. Nonetheless, the economic interpretation is more difficult. Although, that irregardless of the current state of the market efficiency, it exhibits a negative impact on flow.27 Important here is the findings of Eom et al. (2008), that markets with higher Hurst exponent tends to be more predictable and exhibit a lower degree of efficiency. We thus argue that our findings, of a negative impact on Flow from a higher Hurst exponent, reinforce this phenomena. The more inefficient the market is (corresponding to a higher Hurst exponent), the lower the index fund Flow is. For clarification, our estimation of the degree of market efficiency for S&P500 is in line with other studies estimation of the degree of market efficiency in developed markets (Bariviera, 2011; Cajueiro & Tabak, 2004; Eom et al., 2008), wherein these developed markets generally obtain a Hurst estimation of around 0.40 and 0.50 (that is, a slight mean reversing process of returns).28 Previous studies further find that 26Regarding the discrepancy between the significance of the coefficients between the DFA and RS approach, wherein RS coefficients overall are more significant, the conclusion should not be that the RS Hurst is a better measurement for market efficiency. Nonetheless, it seems that the RS market efficiency measurement more accurately describes concurrent index fund flows. 27Unfortunately, transforming the Hurst exponent in various ways in accordance to the 95% confidence interval, to try and isolate whether significant long-term memory or anti-persistence yield different effects on flow, produce non-significant results. Such non-significance in the current position of the market efficiency in respect to true efficiency (i.e. whether the Hurst exponent is below, inside, or above the confidence interval), strengthens our argument of a lesser degree of efficiency above than below the confidence interval. 28As of the writing of this paper, we have not found any other studies that show an equity market consistently exhibiting a Hurst exponent less than 0.4. This suggests that, while the theoretical support of the Hurst exponent is H ∈ (0, 1), the empirical Hurst exponent support in finance applications is different. Maybe such a strong mean-reverting return process indicated by sub-0.30 Hurst exponent is not a plausible security market phenomena. Rather, the slight mean-reversing return process characterized by H ≈ 0.4 appear standard for the current developed and considered most efficient markets. 5 RESULTS 32 Standardized coefficients Regression (A) (B) (C) (D) (E) (F) (G) (H) Table 8 -0.0243 -0.1559** -0.1152 -0.1289 -0.0537 -0.1183** -0.0987 -0.1101* Table 9 -0.45*** -0.16*** -0.14** -0.14** -0.37*** -0.16** -0.15*** -0.15*** Table 10 -0.45*** -0.17*** -0.16** -0.16** -0.40*** -0.18*** -0.17*** -0.18*** Standard deviations RS DFA Flow (dollar) Flow (fractional) Std 0.0287 0.0348 2.06 33 Table 11: Standardized coefficients of the Hurst exponent in the estimated regressions for each table. The standardized coefficients are obtained by standardizing all variables in each regression to have a mean 0 and standard deviation of 1. These coefficients thus measure a one standard deviations change in the dependent variable with a one standard deviation change in the dependent variable multiplied by the coefficient. For reference, we also present the standard deviation of each measurement.The standard deviation of fractional flow is in basis points and the standard deviation for dollar flow is in billions of dollar. ***, **, and * indicates significance at 1%, 5%, and 10%, respectively. developing markets display a higher Hurst exponent (lesser degree of efficiency). This touches upon state vs. event problematic, and further purports the notion that, irregard- less of where a security market lies on the Hurst support scale, an increase in the Hurst exponent indicates a movement towards inefficiency; and, as of the findings of this paper, a reduction in index fund flow. To examine the magnitude of the impact of market efficiency on Flow we utilize stan- dardized coefficients (also called beta coefficients). In Table 11 we present the transformed Hurst coefficients into standardized coefficients. We observe that the standardized coef- ficients for Table 9 and 10 are similar with only small differences between the regressions (except for regressions (A) and (E)). Examining the more significant coefficients against dollar flow indicates that an increase in the Hurst exponent of one standard deviation would result in an approximately decrease in dollar flow by 0.15 to 0.17% multiplied by one standard deviation in dollar flow. With a HRS standard deviation of 0.0287 and dollar flow standard deviation of $2.06 billion, an increase in HRS by one standard deviation roughly translates into an index fund outflow of $309 million. To put into perspective, the maximum one month difference in HRS was an increase by 0.0247 that occurred in September 2009, corresponding to a simultaneous outflow of $997 million, the fifth largest outflow in our sample period. The significant lagged Hurst exponents provide evidence that lead index fund flow can be predicted from current state of market efficiency. Ceteris paribus, a movement towards market efficiency, corresponding to a reduction in the Hurst exponent, indicates that aggregate index fund flow should be larger (in comparison with current month) in the impending month. 6 CONCLUSION 33 6 Conclusion We find evidence that the S&P500 market index have exhibited signs a slight mean- reverting process (anti-persistence, or negative memory), close to market efficiency, be- tween 2000 and 2019; characterized by HurstRS ≈ 0.5 and a HurstDFA ≈ 0.4. These findings are similar to previous studies of time-varying Hurst exponents. We further find a causality direction between market efficiency and index fund flows, where the degree of market efficiency negatively affect the level of flow. We find no indication of causality in the reverse direction. Our findings suggests that equity markets which are characterized by a higher Hurst exponent, a lower degree of efficiency, experience lower levels of aggre- gate index fund flow. For dollar flow, this relationship is characterized by a one standard deviation change in the market efficiency measurement corresponding to a change in dol- lar flow of (roughly) 15% of the standard deviation in dollar flow. The same relationship exists for fractional flow, albeit less significant, wherein the effect is (roughly) 12% of the standard deviation in fractional flow. Moreover, we find a similar significant relationship for previous month’s Hurst exponent, indicating predictability for impending index fund flow from current month market efficiency. These results generally hold stronger for dollar flow than fractional flow. Our findings suggests that dollar flow provides greater insight than fractional flow into the dynamics of macro-scale mutual fund flows. This discrepancy is not found in previous mutual fund flow studies, and highlights the size-effect present in a bull-markets, wherein the return dwarfs the dollar flow growth. Nonetheless, our estimated causality expands the possible significant control variables when evaluating mutual fund flows. The estimated market efficiency’s effect on index fund flow bridge the theoretical gap between passive investing and degree of efficiency, suggesting that a random walk market (weak form EMH) induce larger flow. These implications are powerful for further studies regarding both fund flows and market efficiency, but also for practitioners trying to model mutual fund flows. Generally, we encourage further studies examining the same relationship in other secu- rity markets in order to determine if our findings is a general index fund flow mechanism, or if it holds specifically for S&P500. As this paper studied broad-market efficiency, a natural continuation is to study the index fund flow and the efficiency of specific index constituents; i.e., the market impact of index fund flows on single company equity. Rea- sonably, large institutional actors providing index funds need to adjust their holdings to track the index (say once a month), and therein cause abnormal price movements. Such a study would prove useful for traders, highlighting arbitrage opportunities. Furthermore, as has been noted by Tiwari, Albulescu, & Yoon (2017), the stock market seems to exhibit signs of a multi-fractal nature. We therein propose further studies of multi-fractal Hurst exponents (MF-DFA) and its relation to index funds. 7 REFERENCES 34 7 References Anis, A.A. & Lloyd, E.H., (1976). The Expected Value of the Adjusted Rescaled Hurst Range of Independent Normal Summands. Biometrika, 63(1), pp.111–116. Bariviera, A.F., (2011). The influence of liquidity on informational efficiency: The case of the Thai Stock Market. Physica A: Statistical Mechanics and its Applications, 390(23-24), pp.4426–4432. Belasco, E., Finke, M. & Nanigian, D., (2012). The impact of passive investing on corporate valuations. Managerial Finance, 38(11), pp.1067–1084. Cajueiro, D.O. & Tabak, B.M., (2004). The Hurst exponent over time: testing the assertion that emerging markets are becoming more efficient. Physica A: Statistical Mechanics and its Applications, 336(3-4), pp.521–537. Cao, C., Chang, E.C. & Wang, Y., (2008). An empirical analysis of the dynamic rela- tionship between mutual fund flow and market return volatility. Journal of Banking and Finance, 32(10), pp.2111–2123. Center for research in security prices (CRSP), (2019). Survivor Bias Free US Mutual Fund Guide, https://wrds-www.wharton.upenn.edu/documents/1303/MFDB˙Guide.pdf?˙ga= 2.213683028.1760127731.1585562190-838547392.1581322778 Di Matteo, T., (2007). Multi-scaling in finance. Quantitative Finance, 7(1), pp.21–36. Edelen, R.M. & Warner, J., (2001). Aggregate price effects of institutional trading: a study of mutual fund flow and market returns. Journal Of Financial Economics, 59(2), pp.195–220. Elton, E., Gruber, M. & Busse, J., (2004). Are investors rational? Choices among index funds. Journal Of Finance, 59(1), pp.261–288. Eom, C. et al., (2008). Hurst exponent and prediction based on weak-form efficient market hypothesis of stock markets. Physica A: Statistical Mechanics and its Ap- plications, 387(18), pp.4630–4636. Fama, E.F., (1970). Efficient Capital Markets: A Review Of Theory And Empirical Work. Journal of Finance, 25(2), pp.383–417. Fama, E.F., (1991). Efficient Capital Markets: II. Journal of Finance, 46(5), pp.1575–1617. Grau-Carles, P., (2000). Empirical evidence of long-range correlations in stock returns. Physica A: Statistical Mechanics and its Applications, 287(3-4), pp.396–404. Grossman, S. & Stiglitz, J., (1980). On the impossibility of informationally efficient markets. American economic review, 70(3), pp.393–408. Harvey, C., (1995). Predictable risk and returns in emerging markets. The Review of Financial Studies, 8(3), pp.773–816. Hortac¸su, A. & Syverson, C., (2004). Product Differentiation, Search Costs, and Com- petition in the Mutual Fund Industry: A Case Study of S&P 500 Index Funds. The 7 REFERENCES 35 Quarterly Journal of Economics, 119(2), pp.403–456. Hsieh, D.A., (1993). Chaos and Order in the Capital Markets: A New View of Cy- cles, Prices, and Market Volatility (Book Review).The Journal of Finance, 48(5), pp.2041–2044. Huang, J., Wei, K.D. & Yan, H., (2007). Participation Costs and the Sensitivity of Fund Flows to Past Performance. Journal of Finance, 62(3), pp.1273–1311. Kantelhardt, J.W. et al., (2001). Detecting long-range correlations with detrended fluc- tuation analysis. Physica A: Statistical Mechanics and its Applications, 295(3-4), pp.441–454. Kadapakkam, P.-R., Krause, T. & Tse, Y., (2015) Exchange traded funds, size-based portfolios, and market efficiency. Review of Quantitative Finance and Accounting, 45(1), pp.89–110. Kristoufek, L. & Vosvrda, M., (2013). Measuring capital market efficiency: Global and local correlations structure. Physica A: Statistical Mechanics and its Applications, 392(1), pp.184–193. Malkiel, B.G., (2003). Passive Investment Strategies and Efficient Markets. European Financial Management, 9(1), pp.1–10. McCauley, J.L., Bassler, K.E. & Gunaratne, G.H., (2008). Martingales, detrending data, and the efficient market hypothesis. Physica A: Statistical Mechanics and its Applications, 387(1), pp.202–216. Peng, C. et al., (1995). Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time series. Chaos: An Interdisciplinary Journal of Nonlinear Science, 5(1), pp.82–87. Petajisto, A., (2011). The index premium and its hidden cost for index funds. Journal of Empirical Finance, 18(2), pp.271–288. Pincus, S.M., (1991). Approximate Entropy as a Measure of System Complexity. Pro- ceedings of the National Academy of Sciences of the United States of America, 88(6), pp.2297–2301. Pincus, S.M., (2008). Approximate Entropy as an Irregularity Measure for Financial Data. Econometric Reviews, 27(4-6), pp.329–362. Sapp, T. & Tiwari, A., (2004). Does Stock Return Momentum Explain the “Smart Money” Effect? Journal of Finance, 59(6), pp.2605–2622. Sirri, E.R. & Tufano, P., (1998). Costly Search and Mutual Fund Flows. Journal of Finance, 53(5), pp.1589–1622. Sushko, V. & Turner, G., (2018). The implications of passive investing for securities markets. BIS quarterly review : international banking and financial market devel- opments, pp.113–131. Tiwari, A.K., Albulescu, C.T. & Yoon, S.-M., (2017). A multifractal detrended fluctua- 7 REFERENCES 36 tion analysis of financial market efficiency: Comparison using Dow Jones sector ETF indices. Physica A: Statistical Mechanics and its Applications, 483, pp.182–192. Umutlu, M., Akdeniz, L. & Altay-Salih, A., (2010). The degree of financial liberalization and aggregated stock-return volatility in emerging markets. Journal of Banking and Finance, 34(3), pp.509–521. Warther, V.A., (1995). Aggregate mutual fund flows and security returns. Journal of Financial Economics, 39(2), pp.209–235. Wharton Research Data Services (WRDS), (2020). How to identify index funds. https://wrds-www.wharton.upenn.edu/pages/support/support-articles/crsp/mutual-fund/how- identify-index-funds/?˙ga=2.191050090.193790233.1585126468-1241545974.1580722954 Weissenteiner, A., (2019). Correlated noise: Why passive investment might improve mar- ket efficiency. Journal of Economic Behavior and Organization, 158, pp.158–172. Weron, R., (2002). Estimating long-range dependence: finite sample properties and con- fidence intervals. Physica A: Statistical Mechanics and its Applications, 312(1-2), pp.285–299. Printed references Peters, E.E., (1991). Chaos and order in the capital markets : a new view of cycles, prices, and market volatility, New York: Wiley. A APPENDIX 37 A Appendix A.1 Data adjustments Variable Step Procedure Effect Return 0) Naked data 62 111 TNA 0) Naked data 67 341 1) Fill 67 341 → 70 158 Flow 0) Naked data 69 516 1) Inception & Death 69 516 → 70 225 2) Winsorize (1 and 99%) 70 255 Table 12: Data adjustments for downloaded mutual fund data. Data for a total of 43 938 funds were obtained from CRSP. These steps outline our data management procedure explained in section 3.1. A.2 Hurst histogram Figure 6: Histogram over daily Hurst estimates computed using a window size of 1008 during the period 2000-2019. A.3 Regressions - naked data A A P P E N D IX 38 Linear regression, naked data: fractional flow Panel A: DFA Panel B: RS N = 239 (A) (B) (C) (D) (E) (F) (G) (H) Intercept 0.0143* 0.0187 -0.2024** -0.2333** 0.0216* 0.0252 -0.1720* -0.1915* (1.75) (0.83) (-1.98) (-2.31) (1.79) (0.92) (-1.73) (-1.97) Hurstt -0.0230 -0.0260 -0.0017 0.0092 -0.0365 -0.0399 -0.0298 -0.0196 (-1.23) (-1.19) (-0.07) (0.39) (-1.46) (-1.36) (-0.96) (-0.69) Mkrett - 0.0149 0.0125 0.0122 - 0.0164 0.0151 0.0147 (0.66) (0.56) (0.53) (0.73) (0.69) (0.66) Mkrett−1 - - - -0.0378 - - - -0.0358 (-1.57) (-1.50) Mkrett−2 - - - -0.0181 - - - -0.0156 (-0.78) (-0.69) Log(TNAt−1) - -0.0002 0.0121** 0.0137** - -0.0002 0.0120** 0.0120** (-0.20) (2.08) (2.40) (-0.13) (1.97) (2.18) CumV olt−1 - - -0.0496 -0.0554 - - -0.0410 -0.0407 (-0.82) (-0.93) (-0.72) (-0.71) Feest−1 - - 95.66** 103.72** - - 89.72** 93.10** (2.21) (2.38) (2.12) (2.17) Adjusted R2 0.09% -0.46% 0.85% 2.44% 0.46% -0.03% 1.35% 2.61% F-statistic 1.50 0.68 1.42 1.57 2.13 1.24 1.68 1.49 Table 13: Linear regressions estimated on the dependent variable fractional FLOW aggt . FLOW agg t is calculated as the sum of all index funds dollar value flow for month t adjusted by the previous month sum of all index funds total net assets (see equation 3.2). No adjustments are done on the return, TNA, or flow data. Mkrett is the S&P500 return from period t. CumV olt are the cumulative standard deviation from t − 11 up to t using monthly S&P500 returns. The Log(TNAt), is the logged value of aggregated total net assets for all the index funds for month t. Feest are the average fee level (expense ratio) of the index funds for month t. The Hurst exponents in panel A are calculated using the DFA-method and the Hurst exponents in Panel B the RS-method, both with a rolling window size of 4 years (1008 observations). The Hurst exponents are computed on a daily basis and then averaged over each month to obtain monthly estimates. The regression are estimated on a monthly basis and the numbers in parenthesis represent the t-statistic for the estimated coefficient above. ***, **, and * indicates significance at 1%, 5%, and 10%, respectively. A A P P E N D IX 39 Linear regression, naked data: dollar flow Panel A: DFA Panel B: RS N = 239 (A) (B) (C) (D) (E) (F) (G) (H) Intercept 28.2045*** -42.2071** -173.7096** -196.0168** 29.2642*** -41.7072 -164.0102* -178.5299* (3.33) (-2.04) (-2.19) (-2.20) (3.67) (-1.39) (-1.95) (-1.95) Hurstt -56.7264*** -17.4416 -3.4558 4.3331 -54.0161*** -20.9468 -14.7685 -7.2979 (-3.09) (-1.42) (-0.26) (0.29) (-3.31) (-0.95) (-0.64) (-0.28) Mkrett - 2.3508 0.8832 1.1187 - 2.7874 1.9706 2.1526 (0.16) (0.06) (0.07) (0.19) (0.13) (0.14) Mkrett−1 - - - -31.5603 - - - -30.7215 (-1.23) (-1.17) Mkrett−2 - - - -8.8723 - - - -7.8688 (-0.80) (-0.71) Log(TNAt−1) - 3.9991*** 11.3171** 12.5377** - 4.1395** 11.0443** 11.7807** (2.69) (2.39) (2.37) (2.46) (2.30 (2.27) CumV olt−1 - - -26.3973 -30.0667 - - -24.4713 -23.7961 (-0.81) (-0.91) (-0.81) (-0.77) Feest−1 - - 56850.13** 62667.51** - - 55589.42** 58118.02** (2.00) (2.02) (1.99) (1.97) Adjusted R2 2.98% 7.30% 7.32% 8.13% 1.68% 7.30% 7.44% 8.15% F-statistic 9.58*** 3.67** 2.27** 2.61** 10.98*** 6.52*** 4.03*** 3.86*** Table 14: Linear regressions estimated on the dependent variable dollar FLOW aggt . FLOW agg t is calculated as the sum of all index funds dollar value flow for month t (see equation 3.3). No adjustments are done on the return, TNA, or flow data. Mkrett is the S&500 return for period t. CumV olt are the cumulative standard deviation from t− 11 up to t using monthly S&P500 returns. The Log(TNAt), is the logged value of aggregated total net assets, expressed in $ billions, for all the index funds for month t. Feest are the average fee level (expense ratio) of the index funds for month t. The Hurst exponents in panel A are calculated using the DFA-method and the Hurst exponents in Panel B the RS-method, both with a rolling window size of 4 years (1008 observations). The Hurst exponents are computed on a daily basis and then averaged over each month to obtain monthly estimates. The regression are estimated on a monthly basis and the numbers in parenthesis represent the t-statistic for the estimated coefficient above. ***, **, and * indicates significance at 1%, 5%, and 10%, respectively.