Stochastic Partial Differential Equations with Multiplicative Noise
Numerical simulations of strong and weak approximation errors

Thesis for the degree of Master of Science
Andreas Petersson
Mathematical Sciences
Chalmers University of Technology
University of Gothenburg
Gothenburg, Sweden, 15 May 2015

Abstract

A finite element Galerkin spatial discretization together with a backward Euler scheme is implemented to simulate strong error rates of the homogeneous stochastic heat equation with multiplicative trace class noise in one dimension. For the noise, two different operators displaying different degrees of regularity are considered, one of which is of Nemytskii type. The discretization scheme is extended to include discretization of the covariance operator of the Q-Wiener process driving the equation. The results agree with the theory. Furthermore, for exploratory reasons, weak error rates are also simulated using the classical Monte Carlo method and the multilevel Monte Carlo method.

Acknowledgements

First of all I would like to thank my supervisor Associate Professor Annika Lang for a tremendous amount of support and advice during the entirety of this project, as well as for getting me interested in this fascinating field from the get-go. It has been an exciting and entertaining, if at times challenging, experience. Thank you also to all who contributed to the course material of Numerical Analysis of Stochastic Partial Differential Equations at ETH Zürich; it was very helpful for getting me started. I would also like to thank Professor Stig Larsson for granting me access to the computational resources at Chalmers Centre for Computational Science and Engineering (C3SE) provided by the Swedish National Infrastructure for Computing (SNIC). Thank you also to Johan Alvbring at C3SE for installing MATLAB Distributed Computing Server on the C3SE resources and providing support and documentation to make my simulations run on them.
Finally I would like to thank my partner Andrea and the rest of my family for their moral support and admirable amount of patience with me during the work.

Contents

1 Introduction
  1.1 Outline of the thesis
  1.2 On notation
2 Stochastic calculus in Hilbert spaces
  2.1 Hilbert–Schmidt spaces and trace class operators
  2.2 Semigroups and fractional powers of operators
  2.3 Random variables
  2.4 Q-Wiener processes
  2.5 Stochastic integrals
3 Semilinear stochastic partial differential equations
  3.1 Setting and assumptions
  3.2 Strong and mild solutions
  3.3 Existence and uniqueness of the mild solution
4 Discretization methods for SPDE
  4.1 Galerkin finite element methods
  4.2 The implicit Euler scheme
  4.3 A fully discrete strong approximation of the SPDE
  4.4 Noise approximation
5 Monte Carlo methods
  5.1 Strong and weak errors
  5.2 The Monte Carlo method
  5.3 The Multilevel Monte Carlo method
6 Simulations
  6.1 Geometric Brownian motion in infinite dimensions
  6.2 The heat equation with multiplicative Nemytskii-type noise
  6.3 Simulation setting
  6.4 Results: Strong convergence rates
  6.5 Results: Weak convergence rates
  6.6 Results: Multilevel Monte Carlo estimations
  6.7 Concluding discussion
A Appendix
B Source code
  B.1 HSHE strong errors
  B.2 multilevel estimate G2
References

List of symbols

Symbol  Description
SPDE  Stochastic partial differential equation(s)
FEM  Finite element method(s)
N  Set of all positive integers
N_0  Set of all non-negative integers
R  Set of all real numbers
T  A positive finite real number denoting the end of some time interval [0, T]
U, H  Real separable Hilbert spaces
⟨·,·⟩_H  Inner product of a given Hilbert space H
H_0  Hilbert space Q^{1/2}(H)
B, B_1, B_2  Real Banach spaces
dom(A)  Domain of the operator A
C(B_1; B_2)  Space of all continuous mappings from B_1 to B_2
(Ω, F, (F_t)_{t∈[0,T]}, P)  A filtered probability space with a normal filtration
E[X]  Expectation of X
B(B)  Borel σ-algebra of B
P_T  σ-algebra of predictable stochastic processes
Ḣ^r  Ḣ^r = dom(A^{r/2})
L^p(Ω, B)  Banach space of all p-fold integrable mappings from Ω to B
L(B_1, B_2)  Banach space of all bounded linear operators from B_1 to B_2
L_HS(U; H)  Hilbert space of all Hilbert–Schmidt operators from U to H
||·||_B  Norm of a given Banach space B
||·||_r  Norm of the Hilbert space Ḣ^r
I  Identity operator
1_D  Indicator function of a given measurable set D
I_h  Interpolation operator
R_h  Ritz projector
P_h  Orthogonal projector
E(t)  Semigroup generated by −A
tr(Q)  Trace of Q
β  A real-valued Brownian motion
W  Q-Wiener process
E_N[Y]  Monte Carlo estimator
E_L[Y_L]  Multilevel Monte Carlo estimator

1 Introduction

In this thesis, we are concerned with the implementation of numerical approximation schemes of stochastic partial differential equations of evolutionary type, driven by multiplicative noise. These are partial differential equations where we have introduced a random noise term so that the solutions become stochastic processes taking values in some function space. Such equations are interesting for a number of reasons (see e.g. [13] or [16] for examples of applications) and we analyze them by considering the abstract problem of finding a solution X : [0, T] × Ω → H to

dX(t) + AX(t) dt = G(X(t)) dW(t), for 0 ≤ t ≤ T,
X(0) = X_0.

In the main part of this thesis, we will take A = −∆, where ∆ is the Laplacian, H = L²([0, 1], R) and W is a Q-Wiener process where Q is of finite trace. As the operator G controlling how the noise affects X also depends on X, we say that this equation, which we refer to as the homogeneous stochastic heat equation, has multiplicative trace class noise. It holds that under sufficient constraints on G and X_0 the equation admits a so called mild solution

X(t) = E(t)X_0 + ∫_0^t E(t − s)G(X(s)) dW(s)

which we want to approximate by some other process X̂_ℓ that we can compute. Here E is the so called C_0-semigroup generated by −A and the integrals are of Bochner and Itô type respectively. We are interested in the strong error

||X(T) − X̂_ℓ||_{L²(Ω;H)}

and the weak error

|E[φ(X(T))] − E[φ(X̂_ℓ)]|

where φ : H → R is some smooth functional. When we want to implement this theory in a computer program, we have to be able to represent the solution X(T) as an approximation X̂_ℓ on finite partitions in space and time. For this we implement a spatio-temporal discretization scheme described in [11], which is the main source of the theory used in this thesis.
The particular discretizations in this case are a Galerkin finite element method when discretizing with respect to H and a backward Euler–Maruyama scheme with respect to [0, T]. We are not aware of previously published simulation results on SPDE with multiplicative noise using an implementation of this particular spatio-temporal scheme. The theory of Galerkin finite element methods is well-established and they do not require knowledge of the eigenvalues and eigenvectors of the operator A. We extend the discretization scheme slightly by proving results on a discretization of the covariance operator Q of the underlying Q-Wiener process, using an approach that is similar to the one used in [4]. The error estimate of this scheme given in [11] is in the form of strong errors. Since we are not aware of any papers providing results on the weak convergence rates when we consider SPDE with multiplicative noise with a discretization in both time and space, we simply note that we have weak convergence of this scheme, since the weak error is bounded by the strong error:

|E[φ(X(T))] − E[φ(X̂_ℓ)]| ≤ C ||X(T) − X̂_ℓ||_{L²(Ω;H)}

for sufficiently smooth (in particular Lipschitz continuous) φ, with a constant C depending on φ. However, many authors (see in particular [1] and [7] and the references therein for a setting similar to ours) have considered weak convergence in a semidiscrete setting (with respect to either space or time) and in anticipation of a combination of these we do an exploratory simulation on weak convergence rates of this spatio-temporal discretization scheme in the particular setting of the heat equation. There is a common rule of thumb [11, page 9] that in many situations the weak rate of convergence is almost twice the strong rate of convergence, and we wanted to see if we could find an indication of whether this holds in this case as well.
To actually simulate the expectations involved in the weak and strong errors above, we have to use estimators such as the classical Monte Carlo estimator and the so called multilevel Monte Carlo estimator. The multilevel Monte Carlo estimator can often reduce the computational work compared to the classical, or singlelevel, Monte Carlo estimator, and its application to SPDE has been the subject of a number of recent papers (e.g. [3], [4]). We give proofs of the L²-convergence of the obtained estimates to the true strong and weak error rates. The computations involved in the simulations of these error rates are often very expensive. Fortunately, we were granted access to the cluster Glenn of the Chalmers Centre for Computational Science and Engineering (C3SE), and so we were able to also consider quite costly simulations. In the end, our simulations of the strong error rates are consistent with the theory. The simulations of the weak error rates are also to some extent consistent with the beforementioned rule of thumb, which can be of interest for future research. Furthermore, we are able to achieve similar results in the simulations of the weak error when using multilevel estimators, at a smaller computational cost compared to singlelevel estimators.

1.1 Outline of the thesis

Chapter 2 is intended as an introduction to some notions needed from functional analysis that may be new to some readers. In particular we focus on Hilbert–Schmidt operators, trace class operators and fractional operators, along with selected properties of them. We also give a short introduction to parts of stochastic calculus in infinite dimensions, especially the Q-Wiener process and the stochastic integral driven by it. In Chapter 3 we present a semilinear SPDE of evolutionary type, different notions of solutions of it, as well as the main assumptions we make on the parameters involved. We also recapitulate an existence and uniqueness result on the mild solution.
Chapter 4 contains a brief summary of the spatio-temporal discretization scheme of [11] along with strong error estimates for it. In the last part of this chapter we prove a strong error estimate with respect to discretization of the covariance operator of the Q-Wiener process. In Chapter 5 we describe the Monte Carlo method and prove results on its application to simulation of strong and weak rates of convergence. We also describe the multilevel Monte Carlo method and its application to estimating weak convergence rates. Chapter 6 contains our implementation of the theory of the previous chapters. We estimate strong convergence rates and compare single- and multilevel Monte Carlo results on the estimation of the weak convergence rate.

1.2 On notation

In this thesis we mostly follow the notation of [11], with one notable exception in the form of Hilbert–Schmidt spaces. These we denote by L_HS(·;·). We also mention that we use generic constants denoted by C. These may vary from line to line in, for example, an equation and are always assumed not to depend on the spatial and temporal step sizes. Finally, we note that when we for real variables x, y write x ≃ y we mean that there exist strictly positive constants C_1 and C_2 such that C_1 x ≤ y ≤ C_2 x.

2 Stochastic calculus in Hilbert spaces

The purpose of this chapter is to introduce some basic concepts needed for the definition of a stochastic partial differential equation (SPDE). It is assumed that the reader has some basic familiarity with measure theory and functional analysis. For an introduction to the material which presupposes less familiarity, [13] is an excellent resource. However, for details on the construction of the Q-Wiener process and the Itô integral, we follow the slightly different approach of [16], which is also a good introductory text. In this whole chapter, let 0 < T < ∞ and let (Ω, F, (F_t)_{t∈[0,T]}, P) be a filtered probability space where (F_t)_{t∈[0,T]} is a so called normal filtration, i.e.
(i) F_0 contains all null sets of F, and
(ii) F_t = ⋂_{s>t} F_s for all t ∈ [0, T].

Furthermore, let H be a real separable Hilbert space with inner product ⟨·,·⟩_H, endowed with the Borel σ-algebra B(H). Let {e_i}_{i∈N} be an orthonormal basis (ONB) of H.

2.1 Hilbert–Schmidt spaces and trace class operators

This subsection serves to introduce concepts from functional analysis that may be new to some readers. We start with the definition of compactness of operators.

Definition 2.1. Given two Banach spaces B_1 and B_2 with G : B_1 → B_2 being linear, we say that G is compact if whenever a sequence (x_i)_{i∈N} is bounded in B_1 then (Gx_i)_{i∈N} has a convergent subsequence in B_2.

We now introduce so called Hilbert–Schmidt operators, which are a subset of the linear bounded operators. They will play a very important role throughout the thesis.

Definition 2.2. Let U be another real separable Hilbert space with ONB (f_i)_{i∈N} and let G ∈ L(U; H). Then we refer to G as a Hilbert–Schmidt operator if

∑_{i=1}^∞ ||G f_i||²_H < ∞.

The collection of all such operators is denoted by L_HS(U; H), or L_HS(H) if U = H. It holds that L_HS(U; H) is a separable Hilbert space when it is equipped with the inner product

⟨G, G̃⟩_{L_HS(U;H)} := ∑_{i=1}^∞ ⟨G f_i, G̃ f_i⟩_H.

Next, we prove that the norm defined by this inner product is an upper bound of the operator norm.

Lemma 2.3. Let A ∈ L_HS(U; H). Then ||A||_{L(U;H)} ≤ ||A||_{L_HS(U;H)}.

Proof. Take any x ∈ U such that ||x||_U = 1. Then, by the Cauchy–Schwarz inequality in the sequence space ℓ²,

||Ax||_H = ||∑_{i∈N} ⟨x, f_i⟩_U A f_i||_H ≤ (∑_{i∈N} ⟨x, f_i⟩²_U)^{1/2} (∑_{i∈N} ||A f_i||²_H)^{1/2} = ||A||_{L_HS(U;H)},

which implies the inequality by the definition of the operator norm.

Another important notion is that of the trace of an operator:

Definition 2.4. Let Q ∈ L(U) be self-adjoint and positive definite. We define the trace of Q by

tr(Q) := ∑_{i∈N} ⟨Q f_i, f_i⟩_U.

Whenever this quantity exists it is independent of the choice of the orthonormal basis, see e.g. [9, page 18].
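In finite dimensions, where every operator is a matrix and the Hilbert–Schmidt norm is the Frobenius norm, Lemma 2.3 and the basis independence of tr(Q) can be checked directly. The matrices below are arbitrary stand-ins, a sketch rather than anything from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))  # finite-rank stand-in for an operator in L(U; H)

# Hilbert-Schmidt norm: sum of ||A f_i||^2 over an ONB (here the standard basis),
# which for matrices is exactly the Frobenius norm
hs = np.sqrt(sum(np.linalg.norm(A @ f) ** 2 for f in np.eye(n)))
op = np.linalg.norm(A, 2)        # operator norm = largest singular value
assert op <= hs + 1e-12          # Lemma 2.3: ||A||_L <= ||A||_HS

# tr(Q) = sum <Q f_i, f_i> is independent of the chosen ONB (Definition 2.4)
Q = A @ A.T + np.eye(n)          # self-adjoint and positive definite
U, _ = np.linalg.qr(rng.standard_normal((n, n)))  # a second, random ONB
tr_std = np.trace(Q)
tr_onb = sum(U[:, i] @ Q @ U[:, i] for i in range(n))
assert abs(tr_std - tr_onb) < 1e-9
```

The same computation with a truncated diagonal Q = diag(µ_1, µ_2, ...) also illustrates the relation tr(Q) = Σ_i µ_i used below.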
In this case we refer to Q as an operator of finite trace or a trace class operator. Reasoning as in [11, page 12], from [6, Prop. C.3] it follows that such operators are compact. Therefore, by the Spectral Theorem [14, Theorem 4.24], Q diagonalizes with respect to an ONB (f_i)_{i∈N} of U, i.e.

Q f_i = µ_i f_i     (1)

for all i ∈ N with µ_i ∈ R_+. We therefore have tr(Q) = ∑_{i=1}^∞ µ_i. Similarly, given the eigenbasis (f_i)_{i∈N}, we can construct a trace class operator by choosing a positive real sequence (µ_i)_{i∈N} of eigenvalues such that ∑_{i=1}^∞ µ_i < ∞. We will also use the following result.

Proposition 2.5. [11, Proposition 2.6] Let Q ∈ L(U) be positive definite and self-adjoint. Then there exists a unique self-adjoint and positive definite operator Q^{1/2} ∈ L(U) such that Q^{1/2} ∘ Q^{1/2} = Q.

Given the relation (1), it is easy to see that

Q^{1/2} f_i = µ_i^{1/2} f_i     (2)

for the ONB (f_i)_{i∈N} of U. Hence, we have the following relationship between this operator and the trace of Q:

||Q^{1/2}||²_{L_HS(U)} = tr(Q).     (3)

2.2 Semigroups and fractional powers of operators

In this section, we will consider densely defined, linear, self-adjoint, positive definite operators A with compact inverse which are not necessarily bounded. An example is −∆ when H = L²([0, 1]; R). Here ∆ is the Laplace operator, which will play a vital part in the SPDE considered in later parts of the thesis. We first define the semigroups that such operators generate. They can be thought of as extensions of the exponential operator. The definitions come from [11, Appendix B.1].

Definition 2.6. Consider a Banach space B. A family (E(t))_{t∈[0,∞)} with E(t) ∈ L(B) for all t ∈ [0, ∞) is called a strongly continuous semigroup or a C_0-semigroup if

(i) E(0) = I, the identity operator,
(ii) E(t + s) = E(t)E(s) for all t, s ≥ 0 and
(iii) lim_{t↘0} E(t)b = b for all b ∈ B.

If in addition,

(iv) ||E(t)||_{L(B)} ≤ 1 for all t > 0,

then E is called a semigroup of contractions.

Definition 2.7. Let (E(t))_{t∈[0,∞)} and B be as in the previous definition.
The linear operator −A defined by

−Ab = lim_{h↘0} (E(h)b − b)/h

with domain

dom(−A) = { b ∈ B : lim_{h↘0} (E(h)b − b)/h exists in B }

is called the infinitesimal generator of the semigroup (E(t))_{t∈[0,∞)}.

For our choice of A (again, think of the Laplace operator) the following two results on fractional operators hold. Here we take B = H, the Hilbert space considered in the beginning of this chapter.

Proposition 2.8. [11, Appendix B.2] Let A : dom(A) ⊆ H → H be a densely defined, linear, self-adjoint and positive definite operator with compact inverse A^{-1}. Then A diagonalizes with respect to an eigenbasis (e_i)_{i∈N} of H with an increasing sequence of eigenvalues (λ_i)_{i∈N}. Furthermore, for r ≥ 0, the fractional operators A^{r/2} : dom(A^{r/2}) ⊆ H → H are defined by

A^{r/2} x := ∑_{i=1}^∞ λ_i^{r/2} ⟨x, e_i⟩_H e_i

for x ∈ dom(A^{r/2}). It also holds that

Ḣ^r := dom(A^{r/2}) = { x ∈ H : ||x||²_r := ∑_{i=1}^∞ λ_i^r ⟨x, e_i⟩²_H < ∞ }

are separable Hilbert spaces when equipped with the inner product ⟨·,·⟩_r := ⟨A^{r/2}·, A^{r/2}·⟩_H.

The operator −A generates a C_0-semigroup of contractions, which is explicitly expressed in the next corollary that finishes this section.

Corollary 2.9. [13, Lemma 3.21] Let A : dom(A) ⊆ H → H be a densely defined, linear, self-adjoint and positive definite operator with compact inverse A^{-1}. Then the family (E(t))_{t∈[0,∞)} with E(t) ∈ L(H) defined by

E(t)h := ∑_{i=1}^∞ e^{−λ_i t} ⟨h, e_i⟩_H e_i

is a C_0-semigroup of contractions, generated by −A.

2.3 Random variables

In this section, we generalize some common notions from real-valued probability theory to our setting. The solution X to the SPDE considered later will have to take values in a general Hilbert space; therefore the common definition of a real-valued random variable must be extended to a more general notion.

Definition 2.10. Let (B, ||·||_B) be any Banach space. An F–B(B) measurable function X : Ω → B is called a B-valued random variable. If B = R then we refer to it simply as a random variable.

Definition 2.11.
Let (E_i)_{i∈I} be a (possibly uncountable) family of sub-σ-algebras of F. These are said to be independent if for any finite subset J ⊆ I and every family (E_j)_{j∈J} with E_j ∈ E_j we have

P(⋂_{j∈J} E_j) = ∏_{j∈J} P(E_j).

A family of B-valued random variables (X_i)_{i∈I} is called independent if the corresponding family of generated σ-algebras (σ(X_i))_{i∈I} is independent.

To define the expectation of X, one needs the so called Bochner integral, an extension of the Lebesgue integral to functions taking values in any Banach space. For the construction of it, we refer to [8, pages 156 and 179].

Definition 2.12. The expectation of a B-valued random variable X is given by

E[X] := ∫_Ω X(ω) dP(ω)

whenever E[||X||_B] < ∞.

We note one important property of the Bochner integral:

||E[X]||_B ≤ E[||X||_B].     (4)

One can go on and define a covariance that takes values in Hilbert spaces, but for our purposes we will incorporate this in the definition of a Gaussian H-valued random variable. There are several equivalent definitions of Gaussian law in Hilbert spaces; here we follow that of [16].

Definition 2.13. A probability measure µ on (H, B(H)) is called Gaussian if for each h ∈ H, the bounded linear mapping ⟨·, h⟩_H has a Gaussian law, i.e. there exist real numbers m_h and σ_h ≥ 0 such that if σ_h > 0,

µ({u ∈ H : ⟨u, h⟩_H ∈ D}) = (2πσ²_h)^{-1/2} ∫_D e^{−(x − m_h)²/(2σ²_h)} dx

for all D ∈ B(R), and if σ_h = 0,

µ({u ∈ H : ⟨u, h⟩_H ∈ D}) = 1_D(m_h)

for all D ∈ B(R), where 1_D is the indicator function of D.

Theorem 2.14. [16, Theorem 2.1.2] A probability measure µ on (H, B(H)) is Gaussian if and only if its characteristic function satisfies

µ̂(h) := ∫_H e^{i⟨u,h⟩_H} µ(du) = e^{i⟨h,m⟩_H − ⟨Qh,h⟩_H/2}     (5)

for all h ∈ H, where m ∈ H and Q ∈ L(H) is of trace class. Conversely, we have:

Theorem 2.15. [16, Corollary 2.1.7] Let Q ∈ L(H) be of trace class and let m ∈ H. Then there exists a Gaussian measure µ fulfilling (5).

Definition 2.16. Let X be an H-valued random variable.
X is called a Gaussian H-valued random variable if its image measure P ∘ X^{-1} is a Gaussian probability measure. In this case, Q in Theorem 2.14 is called the covariance (operator) of X, and we write X ∼ N(m, Q). In connection to this, we also mention that by [16, Proposition 2.16], E[X] = m.

2.4 Q-Wiener processes

In this section, we define an infinite-dimensional analogue of the Wiener process. First we need to introduce stochastic processes.

Definition 2.17. Given a Banach space B, a family of B-valued random variables (X(t))_{t∈[0,T]} is called a B-valued stochastic process. It is said to be adapted if X(t) is F_t-measurable for all t ∈ [0, T].

We can equally well think of a stochastic process as a function X : [0, T] × Ω → B and we will mostly use this notation. The next definition follows the lines of [16].

Definition 2.18. Let Q ∈ L(H) be a trace class operator. A stochastic process W : [0, T] × Ω → H on (Ω, F, (F_t)_{t∈[0,T]}, P) is called a (standard) Q-Wiener process if

- W(0) = 0,
- W has P-a.s. continuous trajectories,
- W has independent increments and
- for all 0 ≤ s < t ≤ T the increment W(t) − W(s) ∼ N(0, (t − s)Q).

If also the following holds,

- W is adapted to (F_t)_{t∈[0,T]} and
- W(t) − W(s) is independent of F_s for all 0 ≤ s < t ≤ T,

then W is called a Q-Wiener process with respect to the filtration (F_t)_{t∈[0,T]}.

When H = R we allow for Q = I. In this case we call W a real-valued (standard) Wiener process or a Brownian motion and we denote it by β. Using this process, we mention another representation of the general Q-Wiener process. This is called the Karhunen–Loève expansion.

Theorem 2.19. [13, Theorem 10.7] Let Q be as above with eigenvectors (e_i)_{i∈N} and eigenvalues (µ_i)_{i∈N}. Then W : [0, T] × Ω → H is a Q-Wiener process if and only if

W(t) = ∑_{i=1}^∞ µ_i^{1/2} β_i(t) e_i,  P-a.s.,

where (β_i)_{i=1}^∞ is a sequence of independent identically distributed real-valued Wiener processes on (Ω, F, (F_t)_{t∈[0,T]}, P).
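Truncating this expansion after finitely many terms (as is done in Section 4.4) gives a process that is easy to sample on a space-time grid. The sketch below uses the eigenfunctions e_i(x) = √2 sin(iπx) that appear later in the thesis, while the eigenvalues µ_i = i^{-2} are merely one admissible summable choice:

```python
import numpy as np

rng = np.random.default_rng(42)
kappa, M, N, T = 100, 200, 50, 1.0   # KL modes, spatial points, time steps, final time
x = np.linspace(0.0, 1.0, M)
k = T / N                            # time step
i = np.arange(1, kappa + 1)
mu = i.astype(float) ** -2.0         # assumed eigenvalues of Q (any summable decay works)
e = np.sqrt(2.0) * np.sin(np.outer(i, np.pi * x))  # e_i(x_m), shape (kappa, M)

# independent Brownian motions beta_i on the time grid, via cumulative N(0, k) increments
incr = np.sqrt(k) * rng.standard_normal((N, kappa))
beta = np.vstack([np.zeros((1, kappa)), incr]).cumsum(axis=0)  # shape (N + 1, kappa)

# W(t_n, x_m) = sum_i sqrt(mu_i) * beta_i(t_n) * e_i(x_m)
W = beta @ (np.sqrt(mu)[:, None] * e)  # shape (N + 1, M)
assert np.allclose(W[0], 0.0)          # W(0) = 0
```

With a fixed truncation level the increments W(t_n) − W(t_{n−1}) are exactly N(0, kQ^κ)-distributed, which is what the fully discrete scheme of Chapter 4 consumes.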
The series converges in L²(Ω, H) and even in L²(Ω, C([0, T], H)).

2.5 Stochastic integrals

Before defining the stochastic Itô integral which takes values in Hilbert spaces, it is useful to, as in [11], introduce the separable Hilbert space H_0 := Q^{1/2}(H) together with the inner product

⟨·,·⟩_{H_0} := ⟨Q^{-1/2}·, Q^{-1/2}·⟩_H.     (6)

If Q is not one-to-one, Q^{-1/2} denotes the pseudoinverse of Q^{1/2}. Now note that if H is a Hilbert space, then so is L(H) when equipped with the operator norm [14, Proposition 2.3]. This allows us to consider Bochner integrals with respect to L(H)-valued stochastic processes. We denote the H-valued stochastic Itô integral of a stochastic process Φ : [0, T] × Ω → L(H) with respect to the Q-Wiener process W by ∫_0^T Φ(s) dW(s). As stated in [11, page 17], this is a well defined H-valued random variable if Φ is integrable, that is, if

Φ ∈ L²([0, T] × Ω, P_T, dt ⊗ P; L_HS(H_0, H)),

where P_T is the σ-algebra of predictable stochastic processes,

P_T := σ({(s, t] × F_s | 0 ≤ s < t ≤ T, F_s ∈ F_s} ∪ {{0} × F_0 | F_0 ∈ F_0}).

We will not go into the construction of it here but refer to [16] for this. We will, however, mention two key properties of it, from [11, Chapter 2.2].

Theorem 2.20 (Itô isometry). For all integrable stochastic processes Φ : [0, T] × Ω → L(H) the following holds:

E[||∫_0^t Φ(s) dW(s)||²_H] = ∫_0^t E[||Φ(s)||²_{L_HS(H_0;H)}] ds for t ∈ [0, T].

Theorem 2.21 (Burkholder–Davis–Gundy-type inequality). For any p ∈ [2, ∞), 0 ≤ t_1 < t_2 ≤ T and any predictable process Φ : [0, T] × Ω → L_HS(H_0; H) satisfying E[(∫_{t_1}^{t_2} ||Φ(s)||²_{L_HS(H_0;H)} ds)^{p/2}] < ∞, there exists a constant C > 0 depending only on p such that

E[||∫_{t_1}^{t_2} Φ(s) dW(s)||^p_H] ≤ C E[(∫_{t_1}^{t_2} ||Φ(s)||²_{L_HS(H_0;H)} ds)^{p/2}].

We end by proving the following upper bound on ||Φ(s)||_{L_HS(H_0;H)}.

Lemma 2.22. Let Φ : [0, T] × Ω → L(H) be an integrable stochastic process. Then

||Φ(s)||_{L_HS(H_0;H)} ≤ tr(Q)^{1/2} ||Φ(s)||_{L(H)}.     (7)

Proof.
By (6), we have that

||Φ(s)||²_{L_HS(H_0;H)} = ||Φ(s)Q^{1/2}||²_{L_HS(H)} = ∑_{i∈N} ||Φ(s)Q^{1/2} e_i||²_H.

Since Φ(s) ∈ L(H),

∑_{i∈N} ||Φ(s)Q^{1/2} e_i||²_H ≤ ||Φ(s)||²_{L(H)} ∑_{i∈N} ||Q^{1/2} e_i||²_H = ||Φ(s)||²_{L(H)} tr(Q),

where the last equality follows by equation (3).

3 Semilinear stochastic partial differential equations

In this chapter, we introduce the stochastic partial differential equation treated in the remainder of this thesis. We consider a simplified version of the setting used in [11], which will be outlined in the next section.

3.1 Setting and assumptions

From now on, we consider the separable Hilbert space H = L²([0, 1]; R). For the probability space, the same assumptions as in Chapter 2 apply. We consider the equation

dX(t) + [AX(t) + F(X(t))] dt = G(X(t)) dW(t), for 0 ≤ t ≤ T,
X(0) = X_0.     (8)

This is to be understood as the integral equation

X(t) = X_0 − ∫_0^t [AX(s) + F(X(s))] ds + ∫_0^t G(X(s)) dW(s),

where the first integral is of Bochner type while the second is an Itô integral, so that X(t) is H-valued for all t ∈ [0, T]. We will return to what we mean by a solution to (8) in Section 3.2, but first we will describe our assumptions on the terms of the equation. We refer to r below as the regularity parameter.

Assumption 3.1.

(i) W is a Q-Wiener process adapted to the filtration (F_t)_{t∈[0,T]}. Given the ONB (e_i)_{i∈N} with e_i(x) = √2 sin(iπx), the trace class operator Q on H is defined through the relation Q e_i = µ_i e_i where µ_i = C_µ i^{-η} for some constants C_µ > 0 and η > 1.
(ii) The linear operator −A : dom(A) → H is the Laplacian with zero boundary conditions.
(iii) Only the homogeneous case is considered, i.e. F = 0.
(iv) Fix a parameter r ∈ [0, 1). The mapping G : H → L_HS(H_0; H) satisfies, for a constant C > 0,
(a) G(h) ∈ L_HS(H_0, Ḣ^r) for all h ∈ Ḣ^r,
(b) ||A^{r/2} G(h)||_{L_HS(H_0;H)} ≤ C(1 + ||h||_r) for all h ∈ Ḣ^r,
(c) ||G(h_1) − G(h_2)||_{L_HS(H_0;H)} ≤ C ||h_1 − h_2||_H for all h_1, h_2 ∈ H and
(d) ||G(h)e_i||_H ≤ C ||h||_H for all basis vectors e_i and h ∈ H.
(v) For r ∈ [0, 1) we assume that X_0 ∈ Ḣ^{1+r} is the deterministic initial value of the SPDE.

Regarding the choice of the linear operator A, we note that it is known (see e.g. [13, Example 1.90]) that Proposition 2.8 holds for A with the eigenbasis {e_i}_{i∈N} and eigenvalues λ_i = i²π².

3.2 Strong and mild solutions

There are several notions of solutions to (8), two of which we will list here: the strong solution and the mild solution. In general, we expect that a strong solution is also mild but not vice versa [13, page 449]. The definition of the strong solution comes from [6].

Definition 3.2 (Strong solution). A predictable H-valued process X : [0, T] × Ω → H is called a strong solution of (8) if for all t ∈ [0, T]:

(i) X(t) ∈ Ḣ² P_T-almost surely,
(ii) P(∫_0^T ||X(s)||_H + ||AX(s)||_H ds < ∞) = 1,
(iii) P(∫_0^T ||G(X(s))||²_{L_HS(H_0;H)} ds < ∞) = 1 and
(iv) X(t) = X_0 − ∫_0^t [AX(s) + F(X(s))] ds + ∫_0^t G(X(s)) dW(s).

Under Assumption 3.1(ii), −A is the generator of the semigroup E of Corollary 2.9. Now, for the mild solution, we follow the definition in [11].

Definition 3.3 (Mild solution). A predictable H-valued process X : [0, T] × Ω → H is called a p-fold integrable mild solution of (8) if

sup_{t∈[0,T]} ||X(t)||_{L^p(Ω;H)} < ∞

and for all t ∈ [0, T] we have that P-a.s.

X(t) = E(t)X_0 − ∫_0^t E(t − s)F(X(s)) ds + ∫_0^t E(t − s)G(X(s)) dW(s).     (9)

This last definition is the one we will consider in this thesis, and in the next section we cite an existence and uniqueness result.

3.3 Existence and uniqueness of the mild solution

Assumption 3.1 is stronger than Assumptions 2.13 to 2.17 of [11, Chapter 2] and hence we can use the corresponding result on existence and uniqueness of the mild solution.

Theorem 3.4. [11, Theorem 2.25] Let Assumption 3.1 hold.
Then there exists a unique (up to a modification) integrable mild solution X : [0, T] × Ω → H to (8) such that for every t ∈ [0, T] and every s ∈ [0, 1) it holds that P(X(t) ∈ Ḣ^s) = 1 with

sup_{t∈[0,T]} ||X(t)||_{L²(Ω;Ḣ^s)} < ∞.     (10)

Furthermore, for every δ ∈ (0, 1/2) there exists a constant C > 0 such that

||X(t_1) − X(t_2)||_{L²(Ω;H)} ≤ C |t_1 − t_2|^δ     (11)

for all t_1, t_2 ∈ [0, T].

We also mention that due to the stronger assumptions made here, Assumptions 2.19 and 2.20 of [11, Chapter 2] are also satisfied, and so the temporal regularity in Theorem 3.4 also holds for δ = 1/2, by Theorem 2.31 of [11]. Uniqueness is understood in the sense that if X and Y are both mild solutions, then X is a modification of Y.

4 Discretization methods for SPDE

In this chapter, we show how one can discretize the solution of (8) so that it can be simulated on a computer. From now on, we assume the conditions of Assumption 3.1 and consider approximations of the mild solution. Throughout Sections 4.1 and 4.2 we follow closely the approach of [11], but after that we leave this context and consider how the covariance operator Q can be discretized.

4.1 Galerkin finite element methods

In this section, we briefly describe the Galerkin finite element method, which is our first step in the discretization of (8). Here finite dimensional subspaces of H are considered, and so we speak of spatial discretizations. Let (V_h)_{h∈(0,1]} be a family of finite dimensional subspaces such that V_h ⊂ Ḣ¹ ⊂ H. For these spaces, we follow the notation of [11] and consider two orthogonal projections: the usual P_h : H → V_h and the Ritz projection R_h : Ḣ¹ → V_h. These are defined by the relations

⟨P_h x, y_h⟩_H = ⟨x, y_h⟩_H for all x ∈ H, y_h ∈ V_h

and

⟨R_h x, y_h⟩_1 = ⟨x, y_h⟩_1 for all x ∈ Ḣ¹, y_h ∈ V_h.

As in [11, Chapter 3.2], we make the following assumptions on these projections:

Assumption 4.1.
For the given family of subspaces (V_h)_{h∈(0,1]} and all h ∈ (0, 1] there exists a constant C such that

(i) ||P_h x||_1 ≤ C ||x||_1 for all x ∈ Ḣ¹ and
(ii) ||R_h x − x||_H ≤ C h^s ||x||_s for all x ∈ Ḣ^s with s ∈ {1, 2}.

We will consider an explicit choice of (V_h)_{h∈(0,1]} later on. Next we introduce the discrete version of the operator A, A_h : V_h → V_h. For each x_h ∈ V_h we define A_h x_h to be the unique element of V_h such that

⟨A x_h, y_h⟩_H = ⟨x_h, y_h⟩_1 = ⟨A_h x_h, y_h⟩_H for all y_h ∈ V_h.

By using this relation with the properties of the inner product ⟨·,·⟩_1 one sees that A_h also is self-adjoint and positive definite on V_h. Therefore, as before, it is the generator of an analytic semigroup of contractions which we denote by E_h(t), and one can show (see e.g. [11, Section 3.4]) that there exists a unique stochastic process X_h : [0, T] × Ω → V_h which is the mild solution to the stochastic equation

dX_h(t) + A_h X_h(t) dt = P_h G(X_h(t)) dW(t), for 0 ≤ t ≤ T,
X_h(0) = P_h X_0.

This is called the semidiscrete approximation of the solution to (8), but we will not consider it in detail in this thesis. The interested reader is referred to [11] for this. Instead, we will focus on the fully discrete approximation in which we also consider a discretization with respect to time.

4.2 The implicit Euler scheme

In this section, we mirror the approach of [11, Section 3.5], which in turn draws from [19, Chapter 7]. We refer to these sources for more details and generalizations of our informal introduction to the implicit (or backward) Euler–Maruyama scheme. Let again (V_h)_{h∈(0,1]} be a family of finite dimensional subspaces such that for all h ∈ (0, 1], V_h ⊂ Ḣ¹ ⊂ H. Consider the homogeneous equation du(t) + A_h u(t) dt = 0 with initial value u(0) = u_0 ∈ V_h for t > 0 and some fixed h ∈ (0, 1]. One can then show (see e.g. [19]) that the solution to this is given by the semigroup generated by −A_h, namely E_h(t).
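In an implementation, A_h is represented through the mass and stiffness matrices of a concrete basis of V_h. The sketch below assumes piecewise linear ("hat") functions on a uniform grid of [0, 1] with zero boundary conditions, a standard choice (the specific V_h used in the simulations is fixed later in the thesis), and checks that the smallest eigenvalue of A_h approximates λ_1 = π²:

```python
import numpy as np

m = 99                 # interior nodes, mesh width h = 1/(m + 1)
h = 1.0 / (m + 1)

# stiffness matrix S_ij = <phi_i, phi_j>_1 and mass matrix M_ij = <phi_i, phi_j>_H
# for the P1 hat functions phi_i (tridiagonal, by direct integration)
S = (2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h
Mass = (4 * np.eye(m) + np.eye(m, k=1) + np.eye(m, k=-1)) * h / 6

# A_h in the nodal basis: <A_h x_h, y_h>_H = <x_h, y_h>_1 becomes Mass @ Ah = S
Ah = np.linalg.solve(Mass, S)

lam = np.sort(np.linalg.eigvals(Ah).real)
assert abs(lam[0] - np.pi ** 2) < 0.05  # lambda_1 = pi^2 for A = -Laplacian on [0, 1]
```

In practice one never forms Mass⁻¹S explicitly; the backward Euler step below only requires solving sparse linear systems with the matrix Mass + kS.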
We may approximate this equation by defining the recursion

$$\hat u^j - \hat u^{j-1} + k A_h \hat u^j = 0, \quad j \in \mathbb{N},$$

for some fixed time step $k \in (0,1]$, where $\hat u^j$ denotes the approximation of $u(t_j)$ with $t_j := jk$. A closed form of this is given by

$$\hat u^j = (I + k A_h)^{-j} u_0, \quad j \in \mathbb{N}_0.$$

Now, following the notation of [11, Section 3.5], we write

$$E_{k,h}(t) := (I + k A_h)^{-j} \quad \text{if } t \in [t_{j-1}, t_j) \text{ for } j \in \mathbb{N},$$

and we call this operator the rational approximation of the semigroup $E(t)$ generated by $-A$. We end this section by citing the following smoothing property of the scheme, from [11, page 67]:

$$\|A_h^{\rho} E_{k,h}(t) x_h\|_H \le C t_j^{-\rho} \|x_h\|_H, \qquad (12)$$

which holds for any $t \in [t_{j-1}, t_j)$, $\rho \in [0,1]$ and $x_h \in V_h$.

4.3 A fully discrete strong approximation of the SPDE

In this section, we combine the Galerkin method with the linearly implicit Euler–Maruyama scheme and cite a convergence rate of the fully discrete approximation $X_h^j$ of $X(t_j)$, where $X$ is the mild solution of (8). For this, consider the same family of subspaces $(V_h)_{h \in (0,1]}$ as before and let $T > 0$ be the fixed final time. Define a uniform time grid with time step $k \in (0,1]$ by $t_j = jk$, $j = 0, 1, \dots, N_k$, with $N_k k = T$. The recursion scheme that approximates $X$ is

$$X_h^j - X_h^{j-1} + k A_h X_h^j = P_h G(X_h^{j-1}) \Delta W^j \quad \text{for } j = 1, \dots, N_k, \qquad X_h^0 = P_h X_0, \qquad (13)$$

where $\Delta W^j$ are the Wiener increments $W(t_j) - W(t_{j-1})$. In terms of the operator $E_{k,h}(t)$ one may equally well express this as

$$X_h^j = E_{k,h}(t_{j-1}) P_h X_0 + \int_0^{t_j} E_{k,h}(t_j - s) P_h G_h(s)\,dW(s), \qquad (14)$$

where

$$G_h(s) := \begin{cases} G(X_h^{j-1}) & \text{if } s \in (t_{j-1}, t_j], \\ G(P_h X_0) & \text{if } s = 0. \end{cases}$$

The following key theorem on the convergence of the fully discrete approximation from [11] holds.

Theorem 4.2. [11, Theorem 3.14] Under Assumption 3.1 with $r \in [0,1)$ and Assumption 4.1, for all $p \in [2,\infty)$ there exists a constant $C$, independent of $k, h \in (0,1]$, such that

$$\|X_h^j - X(t_j)\|_{L^p(\Omega;H)} \le C \big( h^{1+r} + k^{\frac12} \big). \qquad (15)$$
4.4 Noise approximation

An issue remaining when one wants to simulate a realisation of $X(t)$ for some $t \in [0,T]$ is how to simulate the $Q$-Wiener process $W$. We know that $W$ can be expressed as an infinite sum of Brownian motions (see Theorem 2.19, the Karhunen–Loève expansion), but we cannot simulate an infinite number of Brownian motions on the computer. Therefore, if one wants to use this expansion, one needs to truncate it at some point $\kappa \in \mathbb{N}$. We then end up with a new Wiener process

$$W^\kappa(t) = \sum_{j=1}^{\kappa} \mu_j^{\frac12} \beta_j(t) e_j$$

with the corresponding covariance operator $Q^\kappa$ defined by the relation $Q^\kappa e_j = 1_{\{j \le \kappa\}} \mu_j e_j$. In the same way as before, we have a mild solution of (8) with truncated noise, and as in (14) it can be represented by

$$X_{\kappa,h}^j = E_{k,h}(t_{j-1}) P_h X_0 + \int_0^{t_j} E_{k,h}(t_j - s) P_h G_{\kappa,h}(s)\,dW^\kappa(s), \qquad (16)$$

where

$$G_{\kappa,h}(s) := \begin{cases} G(X_{\kappa,h}^{j-1}) & \text{if } s \in (t_{j-1}, t_j], \\ G(P_h X_0) & \text{if } s = 0. \end{cases}$$

We also introduce

$$W_\kappa^c(t) := W(t) - W^\kappa(t) = \sum_{j=\kappa+1}^{\infty} \mu_j^{\frac12} \beta_j(t) e_j, \qquad (17)$$

which is also a Wiener process, with covariance operator $Q_\kappa^c = Q - Q^\kappa$. For the stochastic integral it can be seen that

$$\int_0^t \phi(s)\,dW(s) - \int_0^t \phi(s)\,dW^\kappa(s) = \int_0^t \phi(s)\,dW_\kappa^c(s).$$

In the following theorem, we use these notions to give an error bound for $X_{\kappa,h}^j$, cf. Theorem 4.2, when $\kappa \in \mathbb{N}$ is chosen appropriately, reflecting the decay rate $\eta$ of the eigenvalues $\mu_j$ of $Q$ (see Assumption 3.1(i)). For this we take an approach very similar to the one found in [2].

Theorem 4.3. Assume that Assumption 3.1 with $r \in [0,1)$ and Assumption 4.1 hold, and that $\kappa \simeq h^{-\beta}$ for some $\beta > 0$. If furthermore $h^{1+r} \simeq k^{\frac12}$ and $\beta(\eta - 1) = 2(1+r)$, then for all $p \in [2,\infty)$ there exists a constant $C > 0$ such that

$$\|X(t_j) - X_{\kappa,h}^j\|_{L^p(\Omega;H)} \le C h^{1+r}.$$

Proof. Throughout this proof, we will use $C$ to refer to any constant.
First we split the error, using Lemma A.2:

$$\|X(t_j) - X_{\kappa,h}^j\|_{L^p(\Omega;H)}^2 \le 2\big(\|X(t_j) - X_h^j\|_{L^p(\Omega;H)}^2 + \|X_h^j - X_{\kappa,h}^j\|_{L^p(\Omega;H)}^2\big) =: 2(\mathrm{I} + \mathrm{II}).$$

For the first term, it holds by Theorem 4.2 that

$$\mathrm{I} \le C \big( h^{1+r} + k^{\frac12} \big)^2 \simeq C h^{2(1+r)}. \qquad (18)$$

By Lemma A.2 and the representations (14) of the fully discrete approximation and (16) of its truncated version, we have for II:

$$\mathrm{II} = \Big\| \int_0^{t_j} E_{k,h}(t_j - s) P_h G_h(s)\,dW(s) - \int_0^{t_j} E_{k,h}(t_j - s) P_h G_{\kappa,h}(s)\,dW^\kappa(s) \Big\|_{L^p(\Omega;H)}^2$$
$$\le 2 \Big\| \int_0^{t_j} E_{k,h}(t_j - s) P_h \big( G_h(s) - G_{\kappa,h}(s) \big)\,dW(s) \Big\|_{L^p(\Omega;H)}^2$$
$$\quad + 2 \Big\| \int_0^{t_j} E_{k,h}(t_j - s) P_h G_{\kappa,h}(s)\,dW(s) - \int_0^{t_j} E_{k,h}(t_j - s) P_h G_{\kappa,h}(s)\,dW^\kappa(s) \Big\|_{L^p(\Omega;H)}^2 =: 2\,\mathrm{II}_a + 2\,\mathrm{II}_b.$$

Now, by Theorem 2.21,

$$\mathrm{II}_a = \mathbb{E}\Big[ \Big\| \int_0^{t_j} E_{k,h}(t_j - s) P_h \big( G_h(s) - G_{\kappa,h}(s) \big)\,dW(s) \Big\|_H^p \Big]^{\frac2p}$$
$$\le C\, \mathbb{E}\Big[ \Big( \int_0^{t_j} \big\| E_{k,h}(t_j - s) P_h \big( G_h(s) - G_{\kappa,h}(s) \big) \big\|_{L_{\mathrm{HS}}(H_0;H)}^2\,ds \Big)^{\frac p2} \Big]^{\frac2p}$$
$$= C\, \mathbb{E}\Big[ \Big( \int_0^{t_j} \sum_{i \in \mathbb{N}} \big\| E_{k,h}(t_j - s) P_h \big( G_h(s) - G_{\kappa,h}(s) \big) Q^{\frac12} e_i \big\|_H^2\,ds \Big)^{\frac p2} \Big]^{\frac2p}$$
$$\le C\, \mathbb{E}\Big[ \Big( \int_0^{t_j} \sum_{i \in \mathbb{N}} \big\| \big( G_h(s) - G_{\kappa,h}(s) \big) Q^{\frac12} e_i \big\|_H^2\,ds \Big)^{\frac p2} \Big]^{\frac2p}$$
$$= C\, \mathbb{E}\Big[ \Big( \int_0^{t_j} \big\| G_h(s) - G_{\kappa,h}(s) \big\|_{L_{\mathrm{HS}}(H_0;H)}^2\,ds \Big)^{\frac p2} \Big]^{\frac2p}$$
$$\le C\, \mathbb{E}\Big[ \Big( k \sum_{n=1}^{j} \| X_h^{n-1} - X_{\kappa,h}^{n-1} \|_H^2 \Big)^{\frac p2} \Big]^{\frac2p}$$
$$\le C k \sum_{n=1}^{j} \Big\| \| X_h^{n-1} - X_{\kappa,h}^{n-1} \|_H^2 \Big\|_{L^{\frac p2}(\Omega;\mathbb{R})} = C k \sum_{n=1}^{j} \| X_h^{n-1} - X_{\kappa,h}^{n-1} \|_{L^p(\Omega;H)}^2,$$

where the second inequality follows from the smoothing result (12) with $\rho = 0$, the third from Assumption 3.1(iv)(c), and the fourth is the triangle inequality in $L^{\frac p2}(\Omega;\mathbb{R})$. For the other term, by the discussion preceding this theorem and the representation (2) of $(Q_\kappa^c)^{\frac12}$:

$$\mathrm{II}_b = \Big\| \int_0^{t_j} E_{k,h}(t_j - s) P_h G_{\kappa,h}(s)\,dW_\kappa^c(s) \Big\|_{L^p(\Omega;H)}^2$$
$$\le C\, \mathbb{E}\Big[ \Big( \int_0^{t_j} \big\| E_{k,h}(t_j - s) P_h G_{\kappa,h}(s) \big\|_{L_{\mathrm{HS}}((Q_\kappa^c)^{\frac12}(H);H)}^2\,ds \Big)^{\frac p2} \Big]^{\frac2p}$$
$$= C\, \mathbb{E}\Big[ \Big( \int_0^{t_j} \sum_{i=\kappa+1}^{\infty} \mu_i \big\| E_{k,h}(t_j - s) P_h G_{\kappa,h}(s) e_i \big\|_H^2\,ds \Big)^{\frac p2} \Big]^{\frac2p}$$
$$\le C\, \mathbb{E}\Big[ \Big( \int_0^{t_j} \sum_{i=\kappa+1}^{\infty} \mu_i \big\| G_{\kappa,h}(s) e_i \big\|_H^2\,ds \Big)^{\frac p2} \Big]^{\frac2p},$$

where we have used Theorem 2.21 and (12) with $\rho = 0$ again.
Now we note that, using Assumption 3.1(i), we have

$$\sum_{i=\kappa+1}^{\infty} \mu_i = C_\mu \sum_{i=1}^{\infty} (i + \kappa)^{-\eta} \le C_\mu \int_0^{\infty} (x + \kappa)^{-\eta}\,dx \le C h^{\beta(\eta-1)}, \qquad (19)$$

where we have used the fact that $\kappa \simeq h^{-\beta}$. We now use this observation along with the fact that, due to Assumption 3.1(iv)(d),

$$\|G_{\kappa,h}(s) e_i\|_H \le C \|X_{\kappa,h}^{j-1}\|_H \quad \text{if } s \in (t_{j-1}, t_j],$$

to see that

$$\mathbb{E}\Big[ \Big( \int_0^{t_j} \sum_{i=\kappa+1}^{\infty} \mu_i \| G_{\kappa,h}(s) e_i \|_H^2\,ds \Big)^{\frac p2} \Big]^{\frac2p} \le \mathbb{E}\Big[ \Big( \Big( \sum_{i=\kappa+1}^{\infty} \mu_i \Big) k \sum_{n=1}^{j} \| X_{\kappa,h}^{n-1} \|_H^2 \Big)^{\frac p2} \Big]^{\frac2p}$$
$$\le \Big( \sum_{i=\kappa+1}^{\infty} \mu_i \Big) k \sum_{n=1}^{j} \| X_{\kappa,h}^{n-1} \|_{L^p(\Omega;H)}^2 \le C h^{\beta(\eta-1)} k \sum_{n=1}^{j} \| X_{\kappa,h}^{n-1} \|_{L^p(\Omega;H)}^2$$
$$\le C h^{\beta(\eta-1)} k \sum_{n=1}^{j} \Big( \| X_{\kappa,h}^{n-1} - X_h^{n-1} \|_{L^p(\Omega;H)}^2 + \| X_h^{n-1} - X(t_{n-1}) \|_{L^p(\Omega;H)}^2 + \| X(t_{n-1}) \|_{L^p(\Omega;H)}^2 \Big)$$
$$\le C h^{\beta(\eta-1)} \Big( k \sum_{n=1}^{j} \| X_{\kappa,h}^{n-1} - X_h^{n-1} \|_{L^p(\Omega;H)}^2 + \big( h^{1+r} + k^{\frac12} \big)^2 + 1 \Big),$$

where the second inequality is the triangle inequality for $L^{\frac p2}(\Omega;\mathbb{R})$, the third follows from (19), the fourth from Lemma A.2 and the fifth from Theorem 4.2 and (10), noting that $jk \le T$. Using the bounds on $\mathrm{II}_a$ and $\mathrm{II}_b$, we get

$$\mathrm{II} \le C k \big( 1 + h^{\beta(\eta-1)} \big) \sum_{n=1}^{j} \| X_{\kappa,h}^{n-1} - X_h^{n-1} \|_{L^p(\Omega;H)}^2 + C h^{\beta(\eta-1)} \Big( \big( h^{1+r} + k^{\frac12} \big)^2 + 1 \Big).$$

Now we can use the discrete Grönwall inequality, Theorem A.1, with $a_n = \| X_{\kappa,h}^{n} - X_h^{n} \|_{L^p(\Omega;H)}^2$ to get

$$\mathrm{II} \le C h^{\beta(\eta-1)} \Big( \big( h^{1+r} + k^{\frac12} \big)^2 + 1 \Big) \big( 1 + C k (1 + h^{\beta(\eta-1)}) \big)^j \le C h^{\beta(\eta-1)} \Big( \big( h^{1+r} + k^{\frac12} \big)^2 + 1 \Big) e^{C k j (1 + h^{\beta(\eta-1)})}$$
$$\le C h^{\beta(\eta-1)} \Big( \big( h^{1+r} + k^{\frac12} \big)^2 + 1 \Big) e^{2CT} = C h^{\beta(\eta-1)} \Big( \big( h^{1+r} + k^{\frac12} \big)^2 + 1 \Big) \le C h^{2(1+r)},$$

where we have used that $h^\alpha \le 1$ for $\alpha > 0$ and also the assumption $\beta(\eta - 1) = 2(1+r)$. Taken together with (18), we have the result.

5 Monte Carlo methods

In this chapter, we describe how one can estimate quantities involving $X(t)$. We start by defining the two types of errors we will analyse. Throughout this chapter, we use the notation $X_{\kappa,h}^j$ to refer to the truncated fully discrete approximation defined in (16).

5.1 Strong and weak errors

We refer to the error $\|X(t_j) - X_{\kappa,h}^j\|_{L^2(\Omega;H)}$ of Theorem 4.3 as the strong error of the truncated fully discrete approximation $X_{\kappa,h}^j$.
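As an aside, the balancing condition $\beta(\eta - 1) = 2(1+r)$ of Theorem 4.3 gives a concrete rule for choosing the truncation level of the noise expansion. A minimal Python sketch of this rule (the function name is illustrative and not taken from the thesis code, which is written in MATLAB):

```python
import math

def truncation_level(h, eta, r=0.0):
    """Smallest integer kappa >= h^(-beta) with beta * (eta - 1) = 2 * (1 + r),
    so that the noise truncation error does not dominate the bound h^(1 + r)."""
    beta = 2.0 * (1.0 + r) / (eta - 1.0)
    return math.ceil(h ** (-beta))
```

For example, with $\eta = 5$ and $r = 0$ (the setting later used in the simulations of Chapter 6) this gives $\beta = \tfrac12$, so a mesh with $h = 1/64$ requires only $\kappa = 8$ eigenfunctions.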
Often, one is interested not in the paths of the solution of our SPDE (8) but rather in the expected value of some functional of its value at the final time $T$. One is then interested in the weak error

$$\big| \mathbb{E}[\varphi(X(T))] - \mathbb{E}[\varphi(X_{\kappa,h}^{N_k})] \big|, \qquad (20)$$

where $\varphi : H \to \mathbb{R}$ can be any sufficiently smooth test function. In our case, we set $\varphi := \|\cdot\|_H^2$ and refer to the expression

$$\big| \mathbb{E}\big[\|X(T)\|_H^2\big] - \mathbb{E}\big[\|X_{\kappa,h}^{N_k}\|_H^2\big] \big| \qquad (21)$$

as the weak error of our truncated fully discrete approximation of $X(T)$. Before we continue, we briefly recall the definition of a Fréchet differentiable operator.

Definition 5.1. Let $B_1$ and $B_2$ be Banach spaces and let $U \subseteq B_1$ be an open set. A function $\varphi : U \to B_2$ is called Fréchet differentiable at $x \in U$ if there exists $\varphi'(x) \in L(B_1;B_2)$ such that

$$\lim_{h \to 0} \frac{\|\varphi(x + h) - \varphi(x) - \varphi'(x) h\|_{B_2}}{\|h\|_{B_1}} = 0.$$

Then $\varphi'(x)$ is referred to as the Fréchet derivative of $\varphi$ at $x \in U$.

The weak error is weaker than the strong error in the sense that (as is mentioned in e.g. [11, page 3])

$$\big| \mathbb{E}[\varphi(X(T))] - \mathbb{E}[\varphi(X_{\kappa,h}^{N_k})] \big| \le C \|X(T) - X_{\kappa,h}^{N_k}\|_{L^2(\Omega;H)}. \qquad (22)$$

This holds true when $\varphi$ is Fréchet differentiable and $\|\varphi'(x)\|_{L(H;\mathbb{R})} \le C(1 + \|x\|_H^{p-1})$. Our choice of $\varphi$ indeed fulfils this condition for all $p \ge 2$, as

$$\|\varphi'(x)\|_{L(H;\mathbb{R})} = \|2\langle \cdot, x\rangle_H\|_{L(H;\mathbb{R})} \le 2\|x\|_H$$

by the Cauchy–Schwarz inequality. Therefore, every strongly convergent approximation is also weakly convergent.

5.2 The Monte Carlo method

We first briefly review what the (ordinary) Monte Carlo method entails. Let $(\hat Y_i)_{i \in \mathbb{N}}$ be a sequence of independent, identically distributed (i.i.d.) copies of a $U$-valued random variable $Y$, where $U$ may be any Hilbert space. Then, for large enough $N \in \mathbb{N}$, one would, as in the real-valued case, expect that

$$E_N(Y) := \frac{1}{N} \sum_{i=1}^{N} \hat Y_i \approx \mathbb{E}[Y].$$

That this is true is made clear by the following simple form of the law of large numbers, which holds in general Hilbert spaces.

Lemma 5.2. [4, Lemma 4.1] For $N \in \mathbb{N}$ and $Y \in L^2(\Omega;U)$ it holds that

$$\|\mathbb{E}[Y] - E_N(Y)\|_{L^2(\Omega;U)} \le \frac{1}{\sqrt N} \|Y\|_{L^2(\Omega;U)}.$$
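The $1/\sqrt N$ rate of Lemma 5.2 is easy to observe numerically. A Python sketch with a real-valued toy variable $Y \sim U(0,1)$, whose mean $\tfrac12$ and variance $\tfrac1{12}$ are known (the function names are ours, not from the thesis code):

```python
import random
import statistics

def mc_mean(n, rng):
    """Singlelevel Monte Carlo estimator E_N(Y) for Y ~ U(0, 1)."""
    return sum(rng.random() for _ in range(n)) / n

def rms_error(n, reps=200, seed=0):
    """Root-mean-square error of E_N(Y) over independent repetitions;
    Lemma 5.2 predicts an error of at most sqrt(1/12) / sqrt(N) here."""
    rng = random.Random(seed)
    return statistics.mean((mc_mean(n, rng) - 0.5) ** 2 for _ in range(reps)) ** 0.5
```

Increasing $N$ by a factor of $100$ divides the root-mean-square error by roughly $10$, which is also why so many samples are needed for an accurate estimate.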
Using this lemma, we can estimate the additional error made when estimating the strong ($L^2$-) error.

Proposition 5.3. Let the assumptions of Theorem 4.3 be fulfilled. Then the Monte Carlo estimator with $N \in \mathbb{N}$ samples of $\|X(t_j) - X_{\kappa,h}^j\|_{L^2(\Omega;H)}$ satisfies

$$\Big\| E_N\big[\|X(t_j) - X_{\kappa,h}^j\|_H^2\big]^{\frac12} - \|X(t_j) - X_{\kappa,h}^j\|_{L^2(\Omega;H)} \Big\|_{L^2(\Omega;\mathbb{R})} \le \frac{1}{N^{\frac14}} \|X(t_j) - X_{\kappa,h}^j\|_{L^4(\Omega;H)}.$$

Proof. We have that

$$\Big\| E_N\big[\|X(t_j) - X_{\kappa,h}^j\|_H^2\big]^{\frac12} - \|X(t_j) - X_{\kappa,h}^j\|_{L^2(\Omega;H)} \Big\|_{L^2(\Omega;\mathbb{R})}^2$$
$$= \Big\| E_N\big[\|X(t_j) - X_{\kappa,h}^j\|_H^2\big]^{\frac12} - \mathbb{E}\big[\|X(t_j) - X_{\kappa,h}^j\|_H^2\big]^{\frac12} \Big\|_{L^2(\Omega;\mathbb{R})}^2$$
$$\le \Big\| \big| E_N\big[\|X(t_j) - X_{\kappa,h}^j\|_H^2\big] - \mathbb{E}\big[\|X(t_j) - X_{\kappa,h}^j\|_H^2\big] \big|^{\frac12} \Big\|_{L^2(\Omega;\mathbb{R})}^2$$
$$= \Big\| E_N\big[\|X(t_j) - X_{\kappa,h}^j\|_H^2\big] - \mathbb{E}\big[\|X(t_j) - X_{\kappa,h}^j\|_H^2\big] \Big\|_{L^1(\Omega;\mathbb{R})}$$
$$\le \Big\| E_N\big[\|X(t_j) - X_{\kappa,h}^j\|_H^2\big] - \mathbb{E}\big[\|X(t_j) - X_{\kappa,h}^j\|_H^2\big] \Big\|_{L^2(\Omega;\mathbb{R})}$$
$$\le \frac{1}{\sqrt N} \mathbb{E}\big[\|X(t_j) - X_{\kappa,h}^j\|_H^4\big]^{\frac12} = \frac{1}{\sqrt N} \|X(t_j) - X_{\kappa,h}^j\|_{L^4(\Omega;H)}^2,$$

where the first inequality follows from the fact that $|\sqrt a - \sqrt b| \le \sqrt{|a - b|}$ for $a, b \ge 0$, the second is the Hölder inequality and the third follows from Lemma 5.2.

When it comes to the weak error, there are (at least) two ways of approximating it with a Monte Carlo method, namely

$$\big| \mathbb{E}\big[\|X(T)\|_H^2\big] - E_N\big[\|X_{\kappa,h}^{N_k}\|_H^2\big] \big| \qquad (23)$$

and

$$\big| E_N\big[\|X(T)\|_H^2 - \|X_{\kappa,h}^{N_k}\|_H^2\big] \big|. \qquad (24)$$

In practice, neither $\mathbb{E}[\|X(T)\|_H^2]$ nor $\|X(T)\|_H^2$ will be known exactly, so one has to estimate them. However, there is an important distinction. The quantity $\mathbb{E}[\|X(T)\|_H^2]$ is a real number that can be estimated independently of $E_N[\|X_{\kappa,h}^{N_k}\|_H^2]$, while $\|X(T)\|_H^2$ is a real-valued random variable that must be simulated using the same realisation of the $Q$-Wiener process as $\|X_{\kappa,h}^{N_k}\|_H^2$. We will return to this in more detail in later sections. For now, we prove the following result, analogously to Proposition 5.3.

Proposition 5.4. Let the assumptions of Theorem 4.3 be fulfilled.
Then the Monte Carlo estimators (23) and (24) with $N \in \mathbb{N}$ samples of $|\mathbb{E}[\|X(T)\|_H^2] - \mathbb{E}[\|X_{\kappa,h}^{N_k}\|_H^2]|$ satisfy

$$\Big\| \big| \mathbb{E}\big[\|X(T)\|_H^2\big] - E_N\big[\|X_{\kappa,h}^{N_k}\|_H^2\big] \big| - \big| \mathbb{E}\big[\|X(T)\|_H^2 - \|X_{\kappa,h}^{N_k}\|_H^2\big] \big| \Big\|_{L^2(\Omega;\mathbb{R})} \le \frac{C}{\sqrt N} \qquad (25)$$

and

$$\Big\| \big| E_N\big[\|X(T)\|_H^2 - \|X_{\kappa,h}^{N_k}\|_H^2\big] \big| - \big| \mathbb{E}\big[\|X(T)\|_H^2 - \|X_{\kappa,h}^{N_k}\|_H^2\big] \big| \Big\|_{L^2(\Omega;\mathbb{R})} \le \frac{C}{\sqrt N} \big\| X(T) - X_{\kappa,h}^{N_k} \big\|_{L^2(\Omega;H)}. \qquad (26)$$

Proof. By the reverse triangle inequality, the left hand side of (25) is bounded by $\| \mathbb{E}[\|X_{\kappa,h}^{N_k}\|_H^2] - E_N[\|X_{\kappa,h}^{N_k}\|_H^2] \|_{L^2(\Omega;\mathbb{R})}$. The inequality now follows from Lemma 5.2 and the fact that $\|X_{\kappa,h}^{N_k}\|_{L^4(\Omega;H)} \le C < \infty$ for some $C > 0$, which in turn is a consequence of Theorem 4.3 and (10). Next, we again use the reverse triangle inequality to see that the left hand side of (26) is bounded by

$$\Big\| E_N\big[\|X(T)\|_H^2 - \|X_{\kappa,h}^{N_k}\|_H^2\big] - \mathbb{E}\big[\|X(T)\|_H^2 - \|X_{\kappa,h}^{N_k}\|_H^2\big] \Big\|_{L^2(\Omega;\mathbb{R})} \le \frac{1}{\sqrt N} \Big\| \|X(T)\|_H^2 - \|X_{\kappa,h}^{N_k}\|_H^2 \Big\|_{L^2(\Omega;\mathbb{R})}$$
$$= \frac{1}{\sqrt N} \Big\| \big\langle X(T) + X_{\kappa,h}^{N_k},\, X(T) - X_{\kappa,h}^{N_k} \big\rangle_H \Big\|_{L^2(\Omega;\mathbb{R})} \le \frac{C}{\sqrt N} \big\| X(T) - X_{\kappa,h}^{N_k} \big\|_{L^2(\Omega;H)}.$$

Here, the first inequality follows from Lemma 5.2, while the second follows from the Cauchy–Schwarz inequality along with the fact that $\|X_{\kappa,h}^{N_k}\|_{L^4(\Omega;H)} \le C < \infty$ and $\|X(T)\|_{L^4(\Omega;H)} \le C < \infty$ for some $C > 0$.

5.3 The Multilevel Monte Carlo method

One problem with the Monte Carlo estimators of the strong and weak errors described in the previous section is the large number of samples needed for a good estimate. This is less of a problem when the approximation in time and space is coarse, since each sample is then computationally relatively cheap; however, such an estimator has a high bias.
A compromise is to not compute all samples on the same discretization level: we generate a large number of samples on a coarse grid and fewer samples on a fine grid, and then combine them to get an estimator that has both low variance and low bias and is computationally cheaper than the estimator $E_N(Y)$, which from now on we refer to as the singlelevel Monte Carlo estimator. In this section, we introduce the multilevel Monte Carlo method in a general framework, similar to that of [3]. Assume that $(Y_\ell)_{\ell \in \mathbb{N}_0}$ is a sequence of approximations of the $U$-valued random variable $Y$, where we denote the set of non-negative integers by $\mathbb{N}_0$, as opposed to the set of positive integers, which we denote by $\mathbb{N}$. For any $L \in \mathbb{N}$ it holds that

$$Y_L = Y_0 + \sum_{\ell=1}^{L} (Y_\ell - Y_{\ell-1}),$$

and by the linearity of the expectation operator,

$$\mathbb{E}[Y_L] = \mathbb{E}[Y_0] + \sum_{\ell=1}^{L} \mathbb{E}[Y_\ell - Y_{\ell-1}]. \qquad (27)$$

This motivates the multilevel Monte Carlo estimator

$$E^L[Y_L] := E_{N_0}[Y_0] + \sum_{\ell=1}^{L} E_{N_\ell}[Y_\ell - Y_{\ell-1}],$$

where $E_{N_\ell}[Y_\ell - Y_{\ell-1}]$ is the singlelevel Monte Carlo estimator with a number of independent samples $N_\ell$ depending on the level $\ell$. We prove the following slightly modified version of [3, Lemma 2.2].

Lemma 5.5. Let $(Y_\ell)_{\ell \in \mathbb{N}_0}$ be a sequence of approximations of $Y \in L^2(\Omega;\mathbb{R})$ and assume further that $Y_\ell \in L^2(\Omega;\mathbb{R})$ for all $\ell \in \mathbb{N}_0$. Then it holds that

$$\|\mathbb{E}[Y] - E^L[Y_L]\|_{L^2(\Omega;\mathbb{R})} \le |\mathbb{E}[Y - Y_L]| + \Big( \frac{1}{N_0} \|Y_0\|_{L^2(\Omega;\mathbb{R})}^2 + \sum_{\ell=1}^{L} \frac{1}{N_\ell} \|Y_\ell - Y_{\ell-1}\|_{L^2(\Omega;\mathbb{R})}^2 \Big)^{\frac12}.$$

Proof. By the triangle inequality,

$$\|\mathbb{E}[Y] - E^L[Y_L]\|_{L^2(\Omega;\mathbb{R})} \le \|\mathbb{E}[Y] - \mathbb{E}[Y_L]\|_{L^2(\Omega;\mathbb{R})} + \|\mathbb{E}[Y_L] - E^L[Y_L]\|_{L^2(\Omega;\mathbb{R})} = |\mathbb{E}[Y - Y_L]| + \|\mathbb{E}[Y_L] - E^L[Y_L]\|_{L^2(\Omega;\mathbb{R})}.$$

For the second term, using the telescoping sum on the right hand side of (27), we have

$$\|\mathbb{E}[Y_L] - E^L[Y_L]\|_{L^2(\Omega;\mathbb{R})}^2 = \Big\| \mathbb{E}[Y_0] - E_{N_0}[Y_0] + \sum_{\ell=1}^{L} \big( \mathbb{E}[Y_\ell - Y_{\ell-1}] - E_{N_\ell}[Y_\ell - Y_{\ell-1}] \big) \Big\|_{L^2(\Omega;\mathbb{R})}^2$$
$$= \|\mathbb{E}[Y_0] - E_{N_0}[Y_0]\|_{L^2(\Omega;\mathbb{R})}^2 + \sum_{\ell=1}^{L} \big\| \mathbb{E}[Y_\ell - Y_{\ell-1}] - E_{N_\ell}[Y_\ell - Y_{\ell-1}] \big\|_{L^2(\Omega;\mathbb{R})}^2,$$

where the last equality holds because the summands are independent real-valued random variables with zero mean, so the variance of their sum equals the sum of their variances.
The result now follows from Lemma 5.2.

We now make an explicit application of this multilevel Monte Carlo estimator. Recalling the notation and setting of Theorem 4.3, we set $h_\ell = h_0 2^{-\ell}$ for $\ell \in \mathbb{N}_0$ with some real $h_0 > 0$, and let $\hat X_\ell := X_{\kappa_\ell, h_\ell}^{N_{k_\ell}}$, where $k_\ell = h_\ell^2$ and $\kappa_\ell = h_\ell^{-2/(\eta-1)}$. Then we get a sequence of $H$-valued random variables such that $\hat X_\ell \in L^2(\Omega;H)$, and by Theorem 4.3, for all $p \ge 2$ there exists a constant $C > 0$ independent of $h_\ell$, $\kappa_\ell$ and $k_\ell$ such that

$$\|\hat X_\ell - X(T)\|_{L^2(\Omega;H)} \le C h_\ell = C h_0 2^{-\ell}. \qquad (28)$$

We now apply Lemma 5.5 to this sequence of $H$-valued random variables.

Proposition 5.6. Let $\hat X_\ell := X_{\kappa_\ell, h_\ell}^{N_{k_\ell}}$, where for some real $h_0 > 0$ we set $h_\ell = h_0 2^{-\ell}$, $k_\ell = h_\ell^2$ and $\kappa_\ell = h_\ell^{-2/(\eta-1)}$. Let $\delta > 0$ and, for $\ell \le L$ with $L, \ell \in \mathbb{N}_0$, let the sample numbers $N_\ell$ of the multilevel estimator $E^L$ fulfil

$$N_\ell \simeq h_0^{2(1-\alpha)}\, \ell^{1+\delta}\, 2^{2(\alpha L - \ell)} \quad \text{for } \ell \ge 1, \qquad N_0 \simeq h_0^{-2\alpha}\, 2^{2\alpha L}.$$

Assuming that there exist constants $C_1, \alpha > 0$ such that for all $\ell \in \mathbb{N}_0$ the weak error satisfies

$$\big| \mathbb{E}\big[ \|X(T)\|_H^2 - \|\hat X_\ell\|_H^2 \big] \big| \le C_1 h_0^{\alpha} 2^{-\alpha\ell},$$

then there exists a constant $C_2$ not depending on $L$ such that

$$\big\| \mathbb{E}\big[\|X(T)\|_H^2\big] - E^L\big[\|\hat X_L\|_H^2\big] \big\|_{L^2(\Omega;\mathbb{R})} \le C_2 h_0^{\alpha} 2^{-\alpha L}.$$

Proof. By Lemma 5.5 we have

$$\big\| \mathbb{E}\big[\|X(T)\|_H^2\big] - E^L\big[\|\hat X_L\|_H^2\big] \big\|_{L^2(\Omega;\mathbb{R})} \le \big| \mathbb{E}\big[ \|X(T)\|_H^2 - \|\hat X_L\|_H^2 \big] \big| + C \Big( \frac{1}{N_0} \|\hat X_0\|_{L^4(\Omega;H)}^4 + \sum_{\ell=1}^{L} \frac{1}{N_\ell} \big\| \|\hat X_\ell\|_H^2 - \|\hat X_{\ell-1}\|_H^2 \big\|_{L^2(\Omega;\mathbb{R})}^2 \Big)^{\frac12}.$$

Furthermore,

$$\big\| \|\hat X_\ell\|_H^2 - \|\hat X_{\ell-1}\|_H^2 \big\|_{L^2(\Omega;\mathbb{R})}^2 = \big\| \big\langle \hat X_\ell + \hat X_{\ell-1},\, \hat X_\ell - \hat X_{\ell-1} \big\rangle_H \big\|_{L^2(\Omega;\mathbb{R})}^2 \le C \big\| \hat X_\ell - \hat X_{\ell-1} \big\|_{L^2(\Omega;H)}^2$$
$$\le 2C \Big( \big\| X(T) - \hat X_\ell \big\|_{L^2(\Omega;H)}^2 + \big\| X(T) - \hat X_{\ell-1} \big\|_{L^2(\Omega;H)}^2 \Big) \le C h_0^2 \big( 2^{-2\ell} + 2^{-2(\ell-1)} \big) = C h_0^2 \big( 2^{-2\ell} + 4 \cdot 2^{-2\ell} \big) \le C h_0^2 2^{-2\ell},$$

where the first inequality follows from the Cauchy–Schwarz inequality and the fact that the truncated fully discrete approximations are bounded (cf. the proof of Proposition 5.4), the second is Lemma A.2 and the third is the strong convergence rate (28).
Therefore, using once more the fact that the truncated fully discrete approximations are bounded,

$$\frac{1}{N_0} \|\hat X_0\|_{L^4(\Omega;H)}^4 + \sum_{\ell=1}^{L} \frac{1}{N_\ell} \big\| \|\hat X_\ell\|_H^2 - \|\hat X_{\ell-1}\|_H^2 \big\|_{L^2(\Omega;\mathbb{R})}^2 \le C \Big( \frac{1}{N_0} + \sum_{\ell=1}^{L} \frac{1}{N_\ell} h_0^2 2^{-2\ell} \Big)$$
$$\le C h_0^{2\alpha} \Big( 2^{-2\alpha L} + 2^{-2\alpha L} \sum_{\ell=1}^{L} \ell^{-(1+\delta)} \Big) \le C h_0^{2\alpha} \big( 1 + \zeta(1 + \delta) \big) 2^{-2\alpha L} \le C h_0^{2\alpha} 2^{-2\alpha L},$$

where $\zeta$ denotes the Riemann zeta function. Summing up, using the assumption on the weak convergence, we get our result:

$$\big\| \mathbb{E}\big[\|X(T)\|_H^2\big] - E^L\big[\|\hat X_L\|_H^2\big] \big\|_{L^2(\Omega;\mathbb{R})} \le C \Big( h_0^{\alpha} 2^{-\alpha L} + \big( h_0^{2\alpha} 2^{-2\alpha L} \big)^{\frac12} \Big) \le C h_0^{\alpha} 2^{-\alpha L}.$$

We remark that the assumption on the weak convergence rate in Proposition 5.6 holds true at least for $\alpha = 1$, since the weak error is bounded by the strong error.

6 Simulations

In this chapter, some numerical experiments connected to the results of the previous chapters are described. We focus on the simulation of the strong and weak approximation errors. However, we start by defining two noise operators (the term $G$ in (8)), $G_1$ and $G_2$, which, we recall, are functions on $H = L^2([0,1];\mathbb{R})$ taking values in $L_{\mathrm{HS}}(H_0;H)$.

6.1 Geometric Brownian motion in infinite dimensions

The first operator $G_1$ is taken from [11, Section 6.4]. For $h \in H$ and $h_0 \in H_0$ it is defined by

$$G_1(h) h_0 := \sum_{j=1}^{\infty} \langle h, e_j \rangle_H \langle h_0, e_j \rangle_H e_j.$$

We have to check that the conditions of Assumption 3.1(iv) hold. To see that $G_1(h) \in L_{\mathrm{HS}}(H_0;\dot H^r)$ for all $h \in H$, note that

$$\|G_1(h)\|_{L_{\mathrm{HS}}(H_0;\dot H^r)} = \|A^{\frac r2} G_1(h)\|_{L_{\mathrm{HS}}(H_0;H)} \le \operatorname{tr}(Q)^{\frac12} \|A^{\frac r2} G_1(h)\|_{L(H)}$$

by Lemma 2.22, and that by Lemma 2.3

$$\|A^{\frac r2} G_1(h)\|_{L(H)}^2 \le \|A^{\frac r2} G_1(h)\|_{L_{\mathrm{HS}}(H)}^2 = \sum_{i=1}^{\infty} \big\| A^{\frac r2} \langle h, e_i \rangle_H e_i \big\|_H^2 = \sum_{i=1}^{\infty} \big\| \lambda_i^{\frac r2} \langle h, e_i \rangle_H e_i \big\|_H^2 = \sum_{i=1}^{\infty} \lambda_i^r \langle h, e_i \rangle_H^2 = \|h\|_r^2,$$

using Parseval's identity and Proposition 2.8.
Therefore, for all $r \ge 0$ we have $\|G_1(h)\|_{L_{\mathrm{HS}}(H_0;\dot H^r)} \le \operatorname{tr}(Q)^{\frac12} \|h\|_r$, which shows the first two conditions of Assumption 3.1(iv). The third also follows from this, since $G_1$ is linear: for $h_1, h_2 \in H$,

$$\|G_1(h_1) - G_1(h_2)\|_{L_{\mathrm{HS}}(H_0;H)} = \|G_1(h_1 - h_2)\|_{L_{\mathrm{HS}}(H_0;H)} \le \operatorname{tr}(Q)^{\frac12} \|h_1 - h_2\|_H.$$

Finally, the fourth condition follows from the fact that the ONB of $H$ (see Assumption 3.1(i)) is uniformly bounded. This choice of $G$ admits an analytical solution of (8), as is shown in [11, Section 6.4]: for $t \in [0,T]$ we get

$$X(t) = \sum_{i=1}^{\infty} \langle X_0, e_i \rangle_H \exp\Big( -\Big(\lambda_i + \frac{\mu_i}{2}\Big) t + \mu_i^{\frac12} \beta_i(t) \Big) e_i. \qquad (29)$$

On each basis function of $H$ the process thus follows a geometric Brownian motion, which is the reason for the name of this section and of this process. Since $\beta_i(T) \sim N(0,T)$, it is easy to show that $\mathbb{E}\big[\exp\big(2 \mu_i^{\frac12} \beta_i(T)\big)\big] = \exp(2 \mu_i T)$, and so by Parseval's identity we have

$$\mathbb{E}\big[\|X(T)\|_H^2\big] = \sum_{i=1}^{\infty} \langle X_0, e_i \rangle_H^2 \exp\big( (\mu_i - 2\lambda_i) T \big). \qquad (30)$$

6.2 The heat equation with multiplicative Nemytskii-type noise

The next operator, $G_2$, is analysed in detail in [10]. For this, we let $\gamma : \mathbb{R} \to \mathbb{R}$ be a Lipschitz continuous function and define $G_2 : H \to L_{\mathrm{HS}}(H_0;H)$ for $h \in H$, $h_0 \in H_0$ and $x \in [0,1]$ by

$$(G_2(h) h_0)[x] := \gamma(h(x))\, h_0(x).$$

As noted in [11, Example 2.23], the analysis of [10, Section 4] shows that $G_2$ is globally Lipschitz and that there furthermore exists a constant $C > 0$ such that for $h \in H$

$$\|G_2(h)\|_{L_{\mathrm{HS}}(H_0;H)} \le C \operatorname{tr}(Q) (1 + \|h\|_H),$$

and, under the additional assumption

$$\sum_{i=1}^{\infty} \mu_i \sup_{x \in [0,1]} |e_i'(x)|^2 < \infty, \qquad (31)$$

there exists a constant $C > 0$ such that

$$\|A^{\frac r2} G_2(h)\|_{L_{\mathrm{HS}}(H_0;H)} \le C (1 + \|h\|_r)$$

for all $r \in [0, \frac12)$ and $h \in \dot H^r$. Since $e_i'(x) = \sqrt 2\, i \pi \cos(i \pi x)$, condition (31) holds if $\eta > 3$. This shows the first three conditions of Assumption 3.1(iv) for any $r \in [0, \frac12)$, while the last one again follows from the uniform boundedness of the ONB of $H$. In the remainder of the thesis we take $\gamma(x)$ to be $\sin(x)$.
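Returning to the operator $G_1$ of Section 6.1: since (29) gives the solution mode by mode, its second moment can be checked numerically. For each mode, the Gaussian moment $\mathbb{E}[\exp(2\mu_i^{1/2}\beta_i(T))]$ can be evaluated by quadrature against the $N(0,T)$ density and compared with the closed form $\exp(2\mu_i T)$. A Python sketch (function and variable names are ours, not from the thesis code):

```python
import math

def mode_second_moment(c, lam, mu, T=1.0):
    """E[(c * exp(-(lam + mu/2)*T + sqrt(mu)*beta(T)))^2] for beta(T) ~ N(0, T),
    with the Gaussian moment E[exp(2*sqrt(mu)*beta(T))] evaluated by midpoint
    quadrature instead of its closed form exp(2*mu*T)."""
    s = math.sqrt(T)
    n, lo, hi = 20000, -12.0 * s, 12.0 * s
    dx = (hi - lo) / n
    # midpoint rule for the integral of exp(2*sqrt(mu)*x) * N(0, T)-density
    mgf = sum(
        math.exp(2.0 * math.sqrt(mu) * x - x * x / (2.0 * T))
        for x in (lo + (i + 0.5) * dx for i in range(n))
    ) * dx / math.sqrt(2.0 * math.pi * T)
    return c * c * math.exp(-(2.0 * lam + mu) * T) * mgf
```

The two exponential factors combine to $c^2 \exp((\mu - 2\lambda) T)$ for a mode with coefficient $c$, eigenvalue $\lambda$ and noise intensity $\mu$; summing over the modes with $c = \langle X_0, e_i\rangle_H$ gives the second moment of $X(T)$.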
6.3 Simulation setting

In the next few sections, we use the singlelevel Monte Carlo method to simulate both strong and weak error rates, and the multilevel Monte Carlo method to simulate weak error rates, of equation (8). The computations are done in MATLAB, in part on a desktop computer and in part on a computer cluster. We now consider equidistant partitions in space on the domain $[0,1]$ with $h = \frac{1}{N_h}$ and $x_j := jh$, $j = 0, 1, 2, \dots, N_h$. For each such partition we let $V_h$ be the set of all continuous functions on $[0,1]$ that are piecewise linear on the intervals $[x_j, x_{j+1}]$ for $j = 0, 1, 2, \dots, N_h - 1$ and zero at the boundary of the domain. From [11, Example 3.6] we see that this choice of $V_h$ fulfils Assumption 4.1. The sequence of hat functions $(\Phi_j)_{j=1}^{N_h - 1}$, defined by their nodal values

$$\Phi_j(x_i) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j, \end{cases}$$

forms a basis of $V_h$. We compute the numerical approximations $X_{\kappa,h}^{N_k}$ by recursively solving the numerical equation

$$X_{\kappa,h}^j - X_{\kappa,h}^{j-1} + k A_h X_{\kappa,h}^j = G_i^h(X_{\kappa,h}^{j-1}) \Delta W^{\kappa,j} \quad \text{for } j = 1, \dots, N_k, \qquad X_{\kappa,h}^0 = I_h X_0, \qquad (32)$$

where $\Delta W^{\kappa,j}$ are the Wiener increments $W^\kappa(t_j) - W^\kappa(t_{j-1})$, the interpolation operator $I_h : H \to V_h$ is defined by

$$I_h f(x) = \sum_{j=1}^{N_h - 1} f(x_j) \Phi_j(x),$$

and for $G_i^h : V_h \to L_{\mathrm{HS}}(H_0;V_h)$, $i \in \{1, 2\}$, we set

$$G_1^h(v)[h_0] := \sum_{i=1}^{\kappa} \langle v, I_h e_i \rangle_H \langle I_h h_0, I_h e_i \rangle_H I_h e_i$$

and

$$G_2^h(v)[h_0] := I_h \big( G_2(v)\, I_h h_0 \big).$$

We do not know exactly how, or whether, these replacements of the projection $P_h$ by different constellations of interpolation operators $I_h$ affect the order of convergence. However, this kind of replacement of projections seems to be quite common in practice, and it makes the computation easier. We leave this choice unjustified, subject to future research. We now fix $\eta = 5$, $T = 1$ and for $\ell \in \mathbb{N}_0$ we set $h_\ell = 2^{-\ell}$, $k_\ell = h_\ell^2$ and $\kappa_\ell = h_\ell^{-1}$ to get a sequence of solutions of (32), which we denote by $\hat X_\ell := X_{\kappa_\ell, h_\ell}^{N_{k_\ell}}$.
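The deterministic backbone of the recursion (32) is a linear solve with the tridiagonal matrices of the hat function basis: writing $M$ for the mass matrix and $S$ for the stiffness matrix, each time step solves $(M + kS)u^j = M u^{j-1}$ plus the sampled noise term. The following Python sketch implements just this deterministic part (the function names and the Thomas solver are ours; the thesis code itself is MATLAB and is listed in Section B):

```python
import math

def solve_tridiag(sub, diag, sup, rhs):
    """Thomas algorithm for a tridiagonal system; sub[0] and sup[-1] unused."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = sup[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - sub[i] * c[i - 1]
        c[i] = sup[i] / m if i < n - 1 else 0.0
        d[i] = (rhs[i] - sub[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def heat_backward_euler(n_h=64, k=1e-3, T=0.1):
    """Deterministic part of scheme (32): (M + k*S) u^j = M u^{j-1}, with
    piecewise linear hat functions on [0, 1] and zero boundary values."""
    h = 1.0 / n_h
    n = n_h - 1                                   # number of interior nodes
    u = [math.sin(math.pi * (j + 1) * h) for j in range(n)]
    off = [h / 6.0 - k / h] * n                   # off-diagonal of M + k*S
    diag = [2.0 * h / 3.0 + 2.0 * k / h] * n      # diagonal of M + k*S
    for _ in range(round(T / k)):
        rhs = [(h / 6.0) * ((u[j - 1] if j > 0 else 0.0)
                            + 4.0 * u[j]
                            + (u[j + 1] if j < n - 1 else 0.0))
               for j in range(n)]                 # mass matrix times u^{j-1}
        u = solve_tridiag(off, diag, off, rhs)
    return u
```

Starting from $u^0 = \sin(\pi x)$, the computed nodal values decay approximately like the exact first heat mode $e^{-\pi^2 t}\sin(\pi x)$; adding the sampled increment term $G_i^h(X_{\kappa,h}^{j-1})\Delta W^{\kappa,j}$ to the right hand side would give the full stochastic scheme.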
We note that by our choice of $\eta$ it would have been sufficient to set $\kappa_\ell = h_\ell^{-1/2}$ and still have, by Theorem 4.3 with $r = 0$,

$$\|X(T) - \hat X_\ell\|_{L^2(\Omega;H)} \le C\, 2^{-\ell}, \qquad (33)$$

but for computational reasons we abstain from this, noting that (33) still holds with our choice. We also set $X_0(x) := x - x^2$ and note that this satisfies Assumption 3.1(v). We end this section by remarking that in practice, given a finite element space $V_h$, we estimate the norm in $H$ by $\|f\|_H \approx \big( h \sum_{j=1}^{N_h - 1} |f(x_j)|^2 \big)^{1/2}$.

6.4 Results: Strong convergence rates

We start by using the singlelevel Monte Carlo estimator to estimate the strong error rate. When we say that the strong error converges with rate $\alpha$, we mean in this context that there exists a constant $C > 0$ such that

$$\|X(T) - \hat X_\ell\|_{L^2(\Omega;H)} \le C h_\ell^{\alpha} = C\, 2^{-\alpha\ell}. \qquad (34)$$

In general, when simulating the quantity on the left hand side of this expression, we do not have access to the exact value of $X(T)$ for a given realisation of $X$. This is true even when, in the case $G = G_1$, we can use the expression (29) at $t = T$, since we cannot generate an infinite number of Brownian motions. Instead, we use a so-called reference solution; that is, we replace the expression $\|X(T) - \hat X_\ell\|_{L^2(\Omega;H)}$ by $\|\tilde X_L - \hat X_\ell\|_{L^2(\Omega;H)}$, where we let $\tilde X_L = \hat X_L$ for $L > \ell$ when we consider $G = G_2$, while we let $\tilde X_L$ be given by the expression (29) truncated at $\kappa_L = 2^L$ when we consider the case $G = G_1$. We now investigate the strong rate of convergence for different choices of $C_\mu$, which influences the overall level of variance in the realisations of $X$ (recall that the eigenvalues of $Q$ are $\mu_i = C_\mu i^{-5}$). For $M = 2 \cdot 10^4$, $\ell = 1, 2, 3, 4, 5$ and $L = 6$ we compute the quantities

$$E_M\big[ \|\tilde X_L - \hat X_\ell\|_H^2 \big]^{\frac12}$$

for $G_1$ and $G_2$. Informed by Proposition 5.3, we hope that these approximate the strong error well. In Figure 1 we plot these values against $2^\ell$ for $\ell = 1, 2, 3, 4, 5$, using a logarithmic scale on both axes: a so-called log-log plot.
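Instead of reading rates off a log-log plot by eye, the observed order can also be computed as the least-squares slope of $\log(\text{error})$ against $\log(h)$. A small Python helper (ours, not part of the thesis code):

```python
import math

def fitted_rate(hs, errors):
    """Least-squares slope of log(error) versus log(h): for errors ~ C * h^alpha
    this returns approximately alpha (equivalently, the log-log line has slope
    -alpha when the errors are plotted against the number of spatial steps 1/h)."""
    xs = [math.log(h) for h in hs]
    ys = [math.log(e) for e in errors]
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return num / sum((x - xbar) ** 2 for x in xs)
```

Feeding in the simulated error sequence for the levels $\ell = 1, \dots, 5$ gives a single number to compare against the theoretical rates 1 and 2 discussed below.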
By [18], a convergence of order $\alpha$ corresponds in such a graph to a line with slope $-\alpha$. It should be noted that, for a fixed $C_\mu$, all observations of $\tilde X_L$ and $\hat X_\ell$ are computed on the same $M$ instances of the $Q$-Wiener process $W^{\kappa_L}$. Therefore, the variance is much lower than if we had generated $\hat X_\ell$ independently of the other levels for each level $\ell$. Also, the same set of $Q$-Wiener processes was used for the strong error corresponding to $G_1$ and the strong error corresponding to $G_2$, so in Figure 1 the paths are independent of one another with respect to different values of $C_\mu$, but for the same value of $C_\mu$ a path in the lower picture is not independent of the corresponding path in the upper picture. The theoretical results of Chapters 3 and 4 lead us to expect a convergence rate of order 1, as noted in (34). Furthermore, when $C_\mu = 0$ the equation reduces to the deterministic case, for which we expect convergence of order 2 from [12, Chapter 10]. The results of Figure 1 seem to be more or less in line with this. For small values of $C_\mu$ we appear to get a rate of order 2, but asymptotically it seems reasonable to expect a convergence of order 1 in these cases as well. When $C_\mu > 10$ the variance is so great that the curves appear very erratic, despite the use of a single set of $Q$-Wiener processes for all levels $\ell$. Next, we look more closely at how the strong convergence behaves when $C_\mu = 10$, taking levels up to $\ell = 7$ with $L = 8$ and setting $M = 3 \cdot 10^3$. Despite the relatively low number of samples, this computation would take a very long time on a typical desktop computer. Fortunately, we have access to the Glenn cluster at Chalmers Centre for Computational Science and Engineering (C3SE). The system consists of 379 compute nodes with a total of 6080 cores [17]. With $\ell$ and $M$ as above, the total computation for all levels takes 10 hours and 32 seconds using 8 computing nodes with 128 cores. The result is shown in Figure 2.
From this we can see that the trend of a convergence of order 1 seems to hold, even with this somewhat low number of samples.

[Figure 1 here: two log-log plots of the strong errors against the number of steps in the spatial partition, with reference lines of order 1 and order 2, for the deterministic equation and for $C_\mu \in \{1, 3, 10, 20, 30, 40\}$.]

Figure 1: Approximations of the strong error rate for $\ell = 1, 2, 3, 4, 5$ and different values of $C_\mu$. The upper image shows the case $G = G_1$ and the lower image $G = G_2$. For each $C_\mu$, all levels of error approximations have been computed using the same set of $M = 2 \cdot 10^4$ realisations of the $Q$-Wiener processes.

To utilize the resources of the cluster in an optimal way, it is crucial to make sure that the code can run in parallel. In our case, this essentially amounts to changing the for loop of our Monte Carlo computation to a so-called parfor loop, a feature of the MATLAB Distributed Computing Server installed on the cluster. In the user manual of the Parallel Computing Toolbox [15] we can read:

"A parfor-loop is useful in situations where you need many loop iterations of a simple calculation, such as a Monte Carlo simulation. parfor divides the loop iterations into groups so that each worker executes some portion of the total number of iterations. parfor-loops are also useful when you have loop iterations that take a long time to execute, because the workers can execute iterations simultaneously."

From this manual we also note that each worker is assigned a unique random number stream, so we can be sure that we are getting $M$ independent samples of the $Q$-Wiener process. The code used to produce the results of Figures 1 and 2 can be found in Section B.

6.5 Results: Weak convergence rates

Next, we compare the strong convergence rates to the weak rates.
These are defined completely analogously to the strong rate in (34). To our knowledge, no simulations of the weak convergence rates of fully discrete approximations (using the FEM) of SPDE with multiplicative noise have been published, so this could be of interest for future research. A common "rule of thumb" within this field (see e.g. the introduction of [11]) is that the weak rate of convergence is twice the strong rate. In particular, we therefore investigate whether one can achieve a rate of order 2 in the same context as the previous simulation, that is, for $\ell$ from 1 to 7, $L = 8$ and $M = 3 \cdot 10^3$. As we recall from Chapter 5, there are (at least) two ways of approximating the weak error with a Monte Carlo method, namely

$$\big| \mathbb{E}\big[\|\tilde X_L\|_H^2\big] - E_M\big[\|\hat X_\ell\|_H^2\big] \big|$$

and

$$\big| E_M\big[\|\tilde X_L\|_H^2 - \|\hat X_\ell\|_H^2\big] \big|,$$

which we refer to as the weak error of type I and type II, respectively.

[Figure 2 here: log-log plot of the strong errors for $G_1$, $G_2$ and the deterministic solution against the number of steps in the spatial partition, with reference lines of order 1 and order 2.]

Figure 2: Approximations of the strong error rate for $\ell = 1, 2, 3, 4, 5, 6, 7$ for $C_\mu = 10$. All levels of error approximations have been computed using the same set of $M = 2 \cdot 10^4$ realizations of the $Q$-Wiener processes with a reference solution at level $L = 8$.

For the weak error of type I in the case $G = G_1$, we choose to estimate the quantity $\mathbb{E}[\|\tilde X_L\|_H^2]$ by the deterministic quantity

$$\sum_{i=1}^{10^6} \langle X_0, e_i \rangle_H^2 \exp\big( \mu_i - 2\lambda_i \big), \qquad (35)$$

since we have access to the analytical solution as expressed in (30) (recall that $T = 1$). For the case $G = G_2$ we do not have this, so we instead estimate $\mathbb{E}[\|\tilde X_L\|_H^2]$ by $E_M[\|\hat X_L\|_H^2]$. In this case we estimate $\mathbb{E}[\|\tilde X_L\|_H^2]$ on a different set of $Q$-Wiener processes than that used to generate $E_M[\|\hat X_\ell\|_H^2]$.
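The practical difference between the type I and type II estimators is one of variance: in the type II estimator the two arguments share the same noise realisations, so their difference has a far smaller variance than the difference of two independently generated sample means. A toy Python illustration with $Y \sim N(0,1)$ and a perturbed "approximation" $Y_h = 0.99\,Y$ (all names are ours; this is a sketch of the common-random-numbers effect, not of the thesis code):

```python
import random
import statistics

def estimator_spreads(n=2000, reps=100, seed=1):
    """Standard deviations of a coupled (type II) and an uncoupled (type I)
    Monte Carlo estimator of E[Y^2] - E[Yh^2], where Yh = 0.99 * Y."""
    rng = random.Random(seed)
    coupled, uncoupled = [], []
    for _ in range(reps):
        zs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ws = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent second batch
        coupled.append(sum(z * z - (0.99 * z) ** 2 for z in zs) / n)
        uncoupled.append(sum(z * z for z in zs) / n
                         - sum((0.99 * w) ** 2 for w in ws) / n)
    return statistics.pstdev(coupled), statistics.pstdev(uncoupled)
```

Both estimators target the same small quantity ($1 - 0.99^2 \approx 0.02$ here), but the coupled version attains it with a much smaller sample variance, mirroring the behaviour of the two weak error types in the figures below.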
For the weak error of type II we generate both $\|\tilde X_L\|_H^2$ and $\|\hat X_\ell\|_H^2$ on the same set of $Q$-Wiener processes, where in the case $G = G_1$ we again make use of (29) truncated at $\kappa = h_L^{-1}$ and $h_\ell^{-1}$, respectively. The resulting simulation is shown in Figure 3. In this case we see that the error is much smaller than when we simulated the strong error in Figure 2, which is what we expect. However, it is hard to gauge any particular rate of convergence, since the variance seems to dominate the weak error rate. We also note that we appear to have no rate of convergence at all when we consider the case $G = G_1$ with an independent (in this case deterministic) reference solution, an issue that we will return to shortly. For $G = G_2$, the use of an independent estimate of the reference solution seems to have little to no influence on the behaviour of the simulated weak rate of convergence. The behaviour of the last two points could be due to the fact that we consider a reference solution at the next level ($L = 8$) instead of taking the exact solution. The computation needed to create this picture takes 19 hours, 59 minutes and 32 seconds using 8 computing nodes with 128 cores. In Figure 4 we repeat these simulations for $C_\mu = 5$. We note that we get clearer indications of a rate of convergence, somewhere between 1 and 2. We now also include the case of the type I error when all of the approximations of $E_M[\|\hat X_\ell\|_H^2]$ are independent of one another. In this case we have increased $M$ to $10^5$, but despite this the noise is so great that it is impossible to say anything about the order of convergence. The computation times for these are presented in Table 1.

Remark 6.1. The complete lack of convergence in the case of a deterministic reference solution in Figure 3 needs to be addressed.
To do this, we can use the fact that we have an analytical expression for $X(T)$ to investigate the behaviour of $E_M[\|\hat{X}_L\|_H^2]$ when $M$ is large and $L = 8$.

[Figure 3: Approximations of the weak error rate for $\ell = 1, 2, \dots, 7$ and $C_\mu = 10$, plotted against the number of steps in the spatial partition together with a reference slope of order 2 ($G_2$ weak errors with and without an independent reference solution, $G_1$ weak errors with and without an independent reference solution, and weak errors of the deterministic solution). In one set of cases the reference solution has been generated on an independent set of Q-Wiener processes, and in one set it has been generated on the same set as the observations for the levels $\ell = 1, 2, \dots, 7$.]

Level $\ell$   Time for $G = G_1$   Time for $G = G_2$
2              00:01:28             00:00:49
3              00:01:09             00:00:38
4              00:00:51             00:00:51
5              00:02:31             00:02:14
6              00:21:09             00:16:28
7              07:13:20             04:57:38

Table 1: Time needed to compute the independent weak error estimates in Figure 4 using 8 computing nodes with 128 cores. The time for $\ell = 1$ was not recorded. $10^5$ samples were taken at each level.

[Figure 4: Approximations of the weak error rate for $\ell = 1, 2, \dots, 7$ and $C_\mu = 5$. For the lines: in one set of cases the reference solution has been generated on an independent set of Q-Wiener processes, and in one set it has been generated on the same set as the observations for the levels $\ell = 1, 2, \dots, 7$. For the dots: all weak error estimates have been generated on independent sets of $10^5$ Q-Wiener processes, with a reference solution generated at $L = 8$ using $10^4$ Q-Wiener processes.]
Using (29) and the fact that for all $i \in \mathbb{N}$, $\beta_i(T) \sim N(0, 1)$, we estimate $\|\hat{X}_L\|_H^2$ by

\[ \Big\| \sum_{i=1}^{N_{h_L}} \langle X_0, e_i \rangle_H \exp\Big( -\big(\lambda_i + \tfrac{\mu_i}{2}\big) + \mu_i^{1/2} Z_i \Big) I_{h_L} e_i \Big\|_H^2, \]

where $Z_i \sim N(0, 1)$ i.i.d., and we use this to compute $E_M[\|\hat{X}_L\|_H^2]$, which we plot against $M$ ranging from 1 to $10^6$ in Figure 5 (note that for an increase in $M$ we just add another observation of $\|\hat{X}_L\|_H^2$, as opposed to generating a fresh set of $M + 1$ observations). We see that for the final value of $M$, the value of $E_M[\|\hat{X}_L\|_H^2]$ is almost entirely attributable to one single observation of $\|\hat{X}_L\|_H^2$. This indicates that the distribution is highly skewed, so that a large number of observations will be very close to zero but the sample mean will be much larger due to the presence of very large outliers. Another (not entirely rigorous) way to think about this is to realize that since $\langle X_0, e_i \rangle_H^2$ decreases rapidly with increasing values of $i$ and $\lambda_i + \frac{\mu_i}{2}$ is increasing in $i$, we should often have

\[ \Big\| \sum_{i=1}^{N_{h_L}} \langle X_0, e_i \rangle_H \exp\Big( -\big(\lambda_i + \tfrac{\mu_i}{2}\big) + \mu_i^{1/2} Z_i \Big) I_{h_L} e_i \Big\|_H^2 \approx \Big\| \langle X_0, e_1 \rangle_H \exp\Big( -\big(\lambda_1 + \tfrac{\mu_1}{2}\big) + \mu_1^{1/2} Z_1 \Big) I_{h_L} e_1 \Big\|_H^2, \]

which has a log-normal distribution. A well-known fact of the log-normal distribution is that it is highly skewed to the right when the variance of the underlying normally distributed variable (in this case $\mu_1^{1/2} Z_1$) is big, so we should expect the presence of a small number of very large outliers, which in turn means that in the majority of cases, for reasonable values of $M$, $E_M[\|\hat{X}_L\|_H^2]$ will be relatively far from the true mean.

6.6 Results: Multilevel Monte Carlo estimations

We end our numerical exploration by implementing the multilevel Monte Carlo estimator for the weak error. As in Figure 4 we let $C_\mu = 5$ and for $\ell = 0, 1, 2, \dots$ we set $h_\ell = h_0 2^{-\ell}$, $k_\ell = h_\ell^2$ and $\kappa_\ell = h_\ell^{-1}$ to get a series of solutions to (32), which we denote by $\hat{X}_\ell := X^{N_{k_\ell}}_{\kappa_\ell, h_\ell}$. For computational reasons we let $h_0 = 2^{-1}$.
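Before presenting the results, the structure of the estimator can be illustrated on a toy problem. The Python sketch below (the thesis implementation in Appendix B is MATLAB; the scalar model `Y` here is a hypothetical stand-in for $\|\hat{X}_\ell\|_H^2$, with bias decaying like $2^{-2\ell}$) shows the telescoping sum over coupled level differences with level-dependent sample sizes, which is the essence of the multilevel construction:

```python
import random

random.seed(2)
L = 5

def Y(level, z):
    # Coupled toy approximations: bias decays like 2^(-2*level), and the
    # noise is shared through z so level differences have small variance
    return 1.0 + 2.0 ** (-2 * level) + 0.1 * 2.0 ** (-level) * z

def mc(level, M):
    # Monte Carlo estimate of E[Y_0] (level 0) or of the coupled
    # difference E[Y_level - Y_{level-1}] (level >= 1)
    total = 0.0
    for _ in range(M):
        z = random.gauss(0.0, 1.0)
        if level == 0:
            total += Y(0, z)
        else:
            total += Y(level, z) - Y(level - 1, z)
    return total / M

# Level-dependent sample sizes: many cheap coarse samples, few fine ones
M_list = [4 ** (L - ell) * 100 for ell in range(L + 1)]

# Telescoping multilevel estimate of E[Y_L]
mlmc = sum(mc(ell, M_list[ell]) for ell in range(L + 1))

exact = 1.0 + 2.0 ** (-2 * L)
print(abs(mlmc - exact))
```

Because the variance of the coupled differences decays with the level, the vast majority of samples can be taken at the cheap coarse levels; this is the source of the cost reduction observed below.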
[Figure 5: A plot of $E_M[\|\hat{X}_8\|_H^2]$ for $M$ ranging from 1 to $10^6$.]

Let us make the bold assumption that the weak error rate actually is twice that of the strong error rate. Then we have

\[ \left| \mathbb{E}\left[ \|X(T)\|_H^2 - \|\hat{X}_\ell\|_H^2 \right] \right| \leq C_1 h_0^2 2^{-2\ell} \]

and so, from Proposition 5.6, we get

\[ \left\| \mathbb{E}\left[ \|X(T)\|_H^2 \right] - E^L[\|\hat{X}_L\|_H^2] \right\|_{L^2(\Omega, \mathbb{R})} \leq C_2 h_0^2 2^{-2L}. \]

Let us now test this for $L = 1, 2, 3, 4, 5$. As in Section 6.5, we replace $\mathbb{E}[\|X(T)\|_H^2]$ by (35) in the case of $G = G_1$, and in the case of $G = G_2$ we replace it by a reference solution $E_M[\|\hat{X}_7\|_H^2]$, where we let $M = 10^4$. We generate all quantities independently of one another, but when computing a single multilevel estimate $E^L[\|\hat{X}_L\|_H^2]$ it is vital to let the differences $\|\hat{X}_\ell\|_H^2 - \|\hat{X}_{\ell-1}\|_H^2$ be computed on the same Q-Wiener process. In the case of $G = G_1$, the total computation time for all levels was 10 minutes and 39 seconds, and in the case of $G = G_2$ it was 9 minutes. The computation time is thus reduced by more than a factor of two when compared to taking $M = 10^5$ in the singlelevel estimator. The errors

\[ \left| \mathbb{E}\left[ \|X(T)\|_H^2 \right] - E^L[\|\hat{X}_L\|_H^2] \right| \]

are shown in the upper part of Figure 6. For comparison purposes, we have also included the independent weak errors from Figure 4. We note that for both noise operators, we are able to achieve similar results at a smaller computational cost using the multilevel estimator. In the lower part of Figure 6 we take the $L^2$-average of 100 realizations of the multilevel algorithm, that is, we plot

\[ E_{100}\left[ \left| \mathbb{E}\left[ \|X(T)\|_H^2 \right] - E^L[\|\hat{X}_L\|_H^2] \right|^2 \right]^{1/2}, \]

and we see that we get an order of convergence which is very close to 2.

6.7 Concluding discussion

In the numerical experiments outlined above, we first tried to simulate the strong rate of convergence predicted by the theory of the previous chapters of this thesis.
The results seem to be consistent with this theory, although the noise makes it hard to say anything for certain. In the case of weak convergence, we noted that when we tried to estimate the weak errors using the same set of Q-Wiener processes, there were indications that the rule of thumb stating that the weak rate often is twice the strong rate seemed to hold. However, especially in the analytical case, when comparing the estimates to a reference solution which had been computed independently of these, the pattern was not very clear. When we finally had all the estimates computed independently of one another, there were close to no indications of convergence, despite the relatively expensive computations. This shows how, even in these relatively simple cases, it can be hard to actually estimate quantities of the solution in practice, due to their variance and perhaps due to properties of their distribution as well. However, in the case of the multilevel estimates, we were able to get a rate of convergence that seems to be close to 2, despite having the estimate at each level be independent of the others. The pattern was much clearer, though, when an average of multiple runs of the multilevel algorithm was taken.

[Figure 6: Upper picture: A comparison between multilevel estimates computed at $\ell = 1, 2, 3, 4, 5$ and the corresponding (single level) weak errors of Figure 4, where each level is simulated independently of the others. Lower picture: Two $L^2$-averages of 100 realisations of the multilevel schemes of the upper picture.]
We stress, though, that the application of the multilevel algorithm assumes that the weak rate of convergence actually is 2, something that we have not provided theory for. The result nevertheless shows the practical value of the multilevel algorithm and, together with the results on the singlelevel weak error rates, may indicate some interesting directions for future research.

A Appendix

This appendix contains two useful inequalities that do not fit in well elsewhere. We start with the following discrete Grönwall inequality from [5].

Lemma A.1. [5, page 280] Let $(a_n)_{n \in \mathbb{N}}$, $(b_n)_{n \in \mathbb{N}}$ and $(c_n)_{n \in \mathbb{N}}$ be non-negative sequences such that

\[ a_n \leq b_n + \sum_{k=0}^{n-1} c_k a_k \quad \text{for } n \geq 0. \]

Then

\[ a_n \leq \max_{\theta \in \{0, 1, \dots, n\}} b_\theta \prod_{k=0}^{n-1} (1 + c_k). \]

The second result is an arithmetic inequality which we use in several places throughout the thesis and prove below.

Lemma A.2. Let $n \in \mathbb{N}$, let $p \geq 1$ and let $a_i$, $i = 1, 2, \dots, n$, be non-negative numbers. Then

\[ \left( \sum_{i=1}^{n} a_i \right)^p \leq n^{p-1} \sum_{i=1}^{n} a_i^p. \]

Proof. Consider the measure space $(\tilde{\Omega}, \mathcal{A}, \mu_c)$ with $\tilde{\Omega} = \{1, 2, \dots, n\}$, $\mathcal{A} = \mathcal{P}(\tilde{\Omega})$, the power set of $\tilde{\Omega}$, and $\mu_c$ being the counting measure. Let $f, g : \tilde{\Omega} \to \mathbb{R}$ be defined by $f(k) = 1$ and $g(k) = a_k$. Then, by Hölder's inequality,

\[ \sum_{i=1}^{n} a_i = \|fg\|_1 \leq \|f\|_q \|g\|_p = n^{1/q} \left( \sum_{i=1}^{n} a_i^p \right)^{1/p}, \]

where $p^{-1} + q^{-1} = 1$. The result now follows by raising each side of the inequality to the $p$-th power, since $p/q = p - 1$.

B Source code

B.1 HSHE strong errors

This function is used to produce the results of Figures 1 and 2.

function [strong_errors_G1, ...
    strong_errors_G2] = HSHE_strong_errors(maximum_ell, M, eigenconstant, ...
    eigendecay)
% Produces the strong error approximations for the homogeneous stochastic
% heat equation of Figure X.
% Uses M realizations of the Q-Wiener process with
% Qe_j = eigenconstant*j^(-eigendecay) to compute approximations for h
% ranging from 2^-1 to 2^-maximum_ell.
% A reference level of maximum_ell+1 is used to compute the reference
% solution which is exact for G_1 and approximated for G_2.
% M samples of the error in L2-norm squared
error_samples_G1 = zeros(M, maximum_ell);
error_samples_G2 = zeros(M, maximum_ell);
ref_N_h = 2^(maximum_ell+1); % Inverse space step size for reference level
rng('shuffle');
parfor m = 1:M
    % Generate a Wiener process - each row is a realisation of \beta_j
    W = cumsum([zeros(ref_N_h-1,1), randn(ref_N_h-1,ref_N_h^2)],2)./ref_N_h;
    % Generate the 'exact' reference solution for G = G_1
    ref_X_G1 = zeros(ref_N_h+1,1);
    start = ((-1).^((1:ref_N_h-1)+1)+1)'.*(1:ref_N_h-1).^(-3)'; % Initial value is x-x^2
    start = sqrt(8)/pi^3*start;
    xgrid = linspace(0,1,ref_N_h+1);
    E = sqrt(2)*sin(pi*xgrid(2:end-1)'*(1:(ref_N_h-1))); % E_ij = e_j(x_i), symmetric
    ref_X_G1(2:end-1) = E*(start.*exp(-pi^2*(1:(ref_N_h-1))'.^2 ...
        - eigenconstant/2*(1:(ref_N_h-1))'.^(-eigendecay) ...
        + (sqrt(eigenconstant)*(1:(ref_N_h-1))'.^(-eigendecay/2)).*W(:,end)));
    % Generate the reference solution for G = G_2
    [~, ref_X_G2] = solve_HSHE(ref_N_h, ref_N_h^2, eigenconstant, ...
        eigendecay, W, ref_N_h^2);
    % For each level, calculate the fully discrete approximation and record
    % the sample of the errors
    for level = 1:maximum_ell
        [Xpath_G1, Xpath_G2] = solve_HSHE(2^level, 2^(2*level), ...
            eigenconstant, eigendecay, W, ref_N_h^2);
        error_samples_G1(m,level) = sum((interp1( ...
            linspace(0,1,2^level+1), Xpath_G1(:,end), ...
            linspace(0,1,ref_N_h+1))' - ref_X_G1(:)).^2)/ref_N_h;
        error_samples_G2(m,level) = sum((interp1( ...
            linspace(0,1,2^level+1), Xpath_G2(:,end), ...
            linspace(0,1,ref_N_h+1))' - ref_X_G2(:,end)).^2)/ref_N_h;
    end
end
% Take the square root of the mean value of the errors to get the final estimate
strong_errors_G1 = sqrt(mean(error_samples_G1));
strong_errors_G2 = sqrt(mean(error_samples_G2));
end

function [Xpath_G1, Xpath_G2] = solve_HSHE(spacesteps, coarsetimesteps, ...
    eigenconstant, eigendecay, Wfine, finetimesteps)
% Given a realisation of W, solves HSHE for G1 and G2 by implementing the
% backward Euler scheme.
h = 1/coarsetimesteps; % time step size
xgrid = linspace(0,1,spacesteps+1);
% Initialize memory for output variables
Xpath_G1 = zeros(spacesteps+1, coarsetimesteps+1);
Xpath_G2 = zeros(spacesteps+1, coarsetimesteps+1);
% Translate the Brownian motions to coarsetimesteps, truncate the
% Karhunen-Loeve expansion
Wcoarse = zeros(spacesteps-1, coarsetimesteps+1);
for j = 1:(spacesteps-1)
    Wcoarse(j,:) = interp1(linspace(0,1,finetimesteps+1), Wfine(j,:), ...
        linspace(0,1,coarsetimesteps+1));
end
% Generate matrices
A = spacesteps*(2*diag(ones(spacesteps-1,1)) - ...
    diag(ones(spacesteps-2,1),1) - diag(ones(spacesteps-2,1),-1)); % stiffness
M = (6*spacesteps)^-1*(4*diag(ones(spacesteps-1,1)) + ...
    diag(ones(spacesteps-2,1),1) + diag(ones(spacesteps-2,1),-1)); % mass
E = sqrt(2)*sin(pi*xgrid(2:end-1)'*(1:(spacesteps-1))); % E_ij = e_j(x_i), symmetric
D = sqrt(eigenconstant)*diag((1:(spacesteps-1)).^(-eigendecay/2)); % eigenvalues of Q
P = E*D;
O = E*M; % for efficiency
% Evaluate initial condition
Xpath_G1(:,1) = initial(xgrid)';
Xpath_G2(:,1) = initial(xgrid)';
% Compute the approximate path implementing the backward Euler scheme
for j = 1:coarsetimesteps
    Xpath_G1(2:end-1,j+1) = (M + h*A) \ (M*(Xpath_G1(2:end-1,j) + ...
        P*diag(Wcoarse(:,j+1)-Wcoarse(:,j))*O*Xpath_G1(2:end-1,j)));
    Xpath_G2(2:end-1,j+1) = (M + h*A) \ (M*(Xpath_G2(2:end-1,j) + ...
        sin(Xpath_G2(2:end-1,j)).* ...
        (P*diag(Wcoarse(:,j+1)-Wcoarse(:,j))*ones(spacesteps-1,1))));
end
end

function init = initial(xgrid)
% Returns X(0) on a given xgrid
init = xgrid - xgrid.^2;
end

B.2 multilevel_estimate_G2

This function computes the multilevel estimate used in Section 6.6 in the case of G = G_2. The code for the case G = G_1 is entirely analogous and is therefore not provided here.

function [MLMC] = multilevel_estimate_G2(L, eigenconstant, eigendecay)
delta = 0;
h0 = 1/2; % space
space_list = (h0)^(-1)*2.^((0:L)); % List of space steps, h_0 = 2^{-1}
time_list = space_list.^2; % List of time steps
M_list = round([space_list(end)^4, ...
    space_list(2:end).^(-2).*space_list(end)^4.*(1:L).^(1*(1+delta))]); % Monte Carlo iterations
MC = 0; % MC estimate at each level
tic;
% X_0:
parfor m = 1:M_list(1)
    % Generate a Wiener process - each row is a realisation of \beta_j
    W = cumsum([zeros(space_list(1)-1,1), ...
        randn(space_list(1)-1,time_list(1))],2)./space_list(1);
    X = solve_HSHE(space_list(1), time_list(1), eigenconstant, ...
        eigendecay, W);
    L2norm = sum(X(:,end).^2)/space_list(1);
    MC = MC + L2norm/M_list(1);
end
MLMC = MC;
% X_1 to X_L
for k = 2:length(space_list)
    MC = 0;
    finetimesteps = time_list(k);
    coarsetimesteps = time_list(k-1);
    finespacesteps = space_list(k);
    coarsespacesteps = space_list(k-1);
    M = M_list(k);
    parfor m = 1:M
        % Generate a Wiener process - each row is a realisation of \beta_j
        fineW = cumsum([zeros(finespacesteps-1,1), ...
            randn(finespacesteps-1,finetimesteps)],2)./finespacesteps;
        % Interpolate Wiener process to the coarser time grid
        coarseW = zeros(coarsespacesteps-1, coarsetimesteps+1);
        for j = 1:(coarsespacesteps-1)
            coarseW(j,:) = interp1(linspace(0,1,finetimesteps+1), ...
                fineW(j,:), linspace(0,1,coarsetimesteps+1));
        end
        X = solve_HSHE(finespacesteps, finetimesteps, eigenconstant, ...
            eigendecay, fineW);
        Y = solve_HSHE(coarsespacesteps, coarsetimesteps, eigenconstant, ...
            eigendecay, coarseW);
        L2normX = sum(X(:,end).^2)/finespacesteps;
        L2normY = sum(Y(:,end).^2)/coarsespacesteps;
        L2norm = L2normX - L2normY;
        MC = MC + L2norm/M; % Singlelevel estimate of difference
    end
    MLMC = MLMC + MC; % Update multilevel estimate
end
toc
end

function [Xpath] = solve_HSHE(spacesteps, timesteps, eigenconstant, ...
    eigendecay, W)
% Given a realisation of W, solves HSHE for G2 by implementing the
% backward Euler scheme.
h = 1/timesteps; % time step size
xgrid = linspace(0,1,spacesteps+1);
% Initialize memory for output variable
Xpath = zeros(spacesteps+1, timesteps+1);
% Generate matrices
A = spacesteps*(2*diag(ones(spacesteps-1,1)) - ...
    diag(ones(spacesteps-2,1),1) - diag(ones(spacesteps-2,1),-1)); % stiffness
M = (6*spacesteps)^-1*(4*diag(ones(spacesteps-1,1)) + ...
    diag(ones(spacesteps-2,1),1) + diag(ones(spacesteps-2,1),-1)); % mass
E = sqrt(2)*sin(pi*xgrid(2:end-1)'*(1:(spacesteps-1))); % E_ij = e_j(x_i), symmetric
D = sqrt(eigenconstant)*diag((1:(spacesteps-1)).^(-eigendecay/2)); % eigenvalues of Q
P = E*D;
% Evaluate initial condition
Xpath(:,1) = initial(xgrid)';
% Compute the approximate path implementing the backward Euler scheme
for j = 1:timesteps
    Xpath(2:end-1,j+1) = (M + h*A) \ (M*(Xpath(2:end-1,j) + ...
        sin(Xpath(2:end-1,j)).* ...
        (P*diag(W(:,j+1)-W(:,j))*ones(spacesteps-1,1))));
end
end

function init = initial(xgrid)
% Returns X(0) on a given xgrid (as in B.1; required by solve_HSHE above)
init = xgrid - xgrid.^2;
end

References

[1] A. Andersson and S. Larsson. Weak convergence for a spatial approximation of the nonlinear stochastic heat equation, 2012. arXiv:1212.5564.
[2] A. Barth and A. Lang. Milstein approximation for advection-diffusion equations driven by multiplicative noncontinuous martingale noises. Appl. Math. Optim., 66(3):387–413, 2012.
[3] A. Barth and A. Lang. Multilevel Monte Carlo method with applications to stochastic partial differential equations. Int. J. Comput. Math., 89(18):2479–2498, 2012.
[4] A. Barth, A. Lang, and C. Schwab. Multilevel Monte Carlo method for parabolic stochastic partial differential equations. BIT, 53(1):3–27, 2013.
[5] D. S. Clark. Short proof of a discrete Gronwall inequality. Discrete Appl. Math., 16(3):279–281, 1987.
[6] G. Da Prato and J. Zabczyk. Stochastic equations in infinite dimensions, volume 44 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 1992.
[7] A. Debussche. Weak approximation of stochastic partial differential equations: the nonlinear case. Math. Comp., 80(273):89–117, 2011.
[8] G. B. Folland. Real analysis. Modern techniques and their applications. Wiley, New York, 2nd edition, 1999.
[9] L. Gawarecki and V. Mandrekar.
Stochastic differential equations in infinite dimensions with applications to stochastic partial differential equations. Probability and its Applications (New York). Springer, Heidelberg, 2011.
[10] A. Jentzen and M. Röckner. Regularity analysis for stochastic partial differential equations with nonlinear multiplicative trace class noise. J. Differential Equations, 252(1):114–136, 2012.
[11] R. Kruse. Strong and weak approximation of semilinear stochastic evolution equations, volume 2093 of Lecture Notes in Mathematics. Springer, Cham, 2014.
[12] S. Larsson and V. Thomée. Partial differential equations with numerical methods, volume 45 of Texts in Applied Mathematics. Springer-Verlag, Berlin, 2009. Paperback reprint of the 2003 edition.
[13] G. J. Lord, C. E. Powell, and T. Shardlow. An introduction to computational stochastic PDEs. Cambridge Texts in Applied Mathematics. Cambridge University Press, New York, 2014.
[14] B. D. MacCluer. Elementary functional analysis, volume 253 of Graduate Texts in Mathematics. Springer, New York, 2009.
[15] Mathworks. Parallel computing toolbox: User's guide. http://se.mathworks.com/help/pdf_doc/distcomp/distcomp.pdf, March 2015. Accessed 3 May 2015.
[16] C. Prévôt and M. Röckner. A concise course on stochastic partial differential equations, volume 1905 of Lecture Notes in Mathematics. Springer, Berlin, 2007.
[17] Chalmers Centre for Computational Science and Engineering. Hardware Glenn. http://www.c3se.chalmers.se/index.php/Hardware_Glenn, September 2012. Accessed 10 May 2015.
[18] O. Runborg. Lecture notes in Numeriska metoder grundkurs II. http://www.csc.kth.se/utbildning/kth/kurser/DN1240/numfcl12/Lecture6.pdf, 2012. Accessed 25 April 2015.
[19] V. Thomée. Galerkin finite element methods for parabolic problems, volume 25 of Springer Series in Computational Mathematics. Springer-Verlag, Berlin, second edition, 2006.