The effect on cosmological parameter estimation of a parameter dependent covariance matrix

Cosmological large-scale structure analyses based on two-point correlation functions often assume a Gaussian likelihood function with a fixed covariance matrix. We study the impact on cosmological parameter estimation of ignoring the parameter dependence of this covariance matrix, focusing on the particular case of joint weak-lensing and galaxy clustering analyses. Using a Fisher matrix formalism (calibrated against exact likelihood evaluation in particular simple cases), we quantify the effect of using a parameter dependent covariance matrix on both the bias and variance of the parameters. We confirm that the approximation of a parameter-independent covariance matrix is exceptionally good in all realistic scenarios. The information content in the covariance matrix (in comparison with the two point functions themselves) does not change with the fractional sky coverage. Therefore the increase in information due to the parameter dependent covariance matrix becomes negligible as the number of modes increases. Even for surveys covering less than $1\%$ of the sky, this effect only causes a bias of up to ${\cal O}(10\%)$ of the statistical uncertainties, with a misestimation of the parameter uncertainties at the same level or lower. The effect will only be smaller with future large-area surveys. Thus for most analyses the effect of a parameter-dependent covariance matrix can be ignored both in terms of the accuracy and precision of the recovered cosmological constraints.

Cosmological large-scale structure analyses based on two-point correlation functions often assume a Gaussian likelihood function with a fixed covariance matrix. We study the impact on cosmological parameter estimation of ignoring the parameter dependence of this covariance matrix, focusing on the particular case of joint weak-lensing and galaxy clustering analyses. Using a Fisher matrix formalism (calibrated against exact likelihood evaluation in particular simple cases), we quantify the effect of using a parameter dependent covariance matrix on both the bias and variance of the parameters. We confirm that the approximation of a parameter-independent covariance matrix is exceptionally good in all realistic scenarios. The information content in the covariance matrix (in comparison with the two point functions themselves) does not change with the fractional sky coverage. Therefore the increase in information due to the parameter dependent covariance matrix becomes negligible as the number of modes increases. Even for surveys covering less than 1% of the sky, this effect only causes a bias of up to O(10%) of the statistical uncertainties, with a misestimation of the parameter uncertainties at the same level or lower. The effect will only be smaller with future large-area surveys. Thus for most analyses the effect of a parameter-dependent covariance matrix can be ignored both in terms of the accuracy and precision of the recovered cosmological constraints.

I. INTRODUCTION
We are entering a phase in which cosmological galaxy surveys will have remarkable constraining power. This arises from the fact that they will cover large areas of the sky, to large depths and, in consequence, with a high number density [1 -6]. As a result, the statistical power of these surveys will be astounding. To access their scientific potential, it will be necessary to control any systematic effects with exquisite accuracy. While there has been much focus on the instrumental and astrophysical limitations of any particular survey, care is now being given to the analysis pipeline and the approximations which are currently being used. In this paper we focus on one such aspect: the covariance matrix in the likelihood analysis used for constraining cosmological parameters.
Given a set of cosmological observations, for example in the form of a map, a galaxy catalogue, a correlation function or power spectrum, the process for estimating cosmological parameters is straightforward. One feeds the data into a likelihood function which assesses the likelihood of a given theory, given the data. With a judicious use of Bayes theorem, one explores a range of theoretical parameters (that can be cosmological but also can characterise underdetermined properties of the survey or astrophysical effects that may be present) to find the subspace of parameters which is best compatible with the data. In doing so, one also derives uncertainties on these parameters. It is therefore equally important for * Electronic address: darsh.kodwani@physics.ox.ac.uk † Electronic address: david.alonso@physics.ox.ac.uk ‡ Electronic address: pedro.ferreira@physics.ox.ac.uk this procedure to produce unbiased parameter estimates and precise error bars.
There are a number of effects that can plague cosmological parameter estimation. For a start there are errors due to the inaccurate modelling of simulated data, x(p) and the corresponding, covariance matrix Σ estimated from them. These can come from not understanding the underlying model (e.g. baryonic effects in the matter power spectrum [7][8][9][10][11], the precise nature of the galaxymatter connection [12,13] or the impact of intrinsic alignments [14,15]). Then there are errors due to incorrect assumptions about the statistics of the data and covariance matrix. For example there may be non-Gaussian corrections to Σ (i.e. corrections associated to the non-Gaussian nature of the fields being correlated). Then there are errors due to an incorrect model of the likelihood (e.g. assuming a Gaussian function of the data vector by default). Finally, even within the Gaussian approximation, there are errors that may arise from ignoring the parameter dependence of the covariance matrix. It is on this last source of errors that we will focus in this paper.
There is a growing literature on the accuracy of likelihood functions and its impact on parameter estimation.
A key focus has been on how well one needs to know the covariance matrix and how Gaussian the likelihood function is. Covariance matrices are generally estimated from a large number of simulations; there is now a clear understanding of the errors associated with such a process and how to correct for them [16][17][18][19][20][21][22][23][24] . A recent proposal in [25] provides a novel data compression algorithm that reduces the number of simulations needed by a factor of 1000 to estimate the covariance matrix for Gaussianly distributed data. There have also been attempts at constructing better models, or approximations, for the likelihood function that include the fact that it may be non-Gaussian [26,27].
There have been attempts to analytically calculate the non-Gaussian contributions to the covariance matrix. The authors of [28] use a perturbation theory approach to compute the non-Gaussian contribution of the matter power spectrum to 1-loop and compare the results to simulations. A linear response approach to calculating the non-Gaussian contribution of the covariance matrix is presented in [29,30]. In the case of large-scale structure observables like the cosmic shear power spectrum, it has been pointed out that that the dominant contribution to the non-Gaussian covariance is the term that comes from super-sample covariance (SSC) [31]. Separate-universe simulations have been used to evaluate the response functions for various observables such as weak lensing and the matter power spectrum [32,33].
In this paper, our aim is to quantify the impact on parameter estimation of including (or not) the parameter dependence of the covariance matrix for upcoming photometric redshift surveys. Previous work on this topic has focused on the Fisher information of a parameterdependent covariance in the two-point likelihood of Gaussian random fields [34], and on the overall parameter dependence of the cosmic shear two-point covariance [35], where the authors quantified the effect on the likelihood contours of σ 8 and Ω m , using both analytic and ray tracing simulations. Here we will focus on galaxy clustering and galaxy shear for a tomographic survey, and we will systematically analyse the effect of a parameterdependent covariance matrix (including the effects of super-sample covariance) in terms of information content, parameter biases and final uncertainties.
The paper is structured as follows. In Section II we present the various types of likelihoods which we will be studying and show how we can use the Fisher forecast formalism to find an estimate of the bias and errors of the model parameters; we consider a few simplified models to get an idea of what one should expect in the more general, realistic case. In Section III we look at a realistic survey scenario, involving a combination of weak lensing and galaxy clustering and calculate the bias and compare the errors depending on the choice of likelihood. In Section IV we discuss the implications of our results while in Appendix A we present a few, key, technical aspects of our calculation of the covariance matrix.

II. APPROXIMATING THE LIKELIHOOD
The aim of this section we quantify the impact of a parameter-dependent covariance on the bias and variance of parameter estimates. The impact of biases on the data vector has been quantified in e.g. [36,37], and the parameter dependence of the covariance has been explored in [35,38]. In particular we calculate the bias in inferred parameters explicitly due to the parameter dependence of the covariance matrix, which is a key result that hasn't been calculated before as far as we know.
In general, for a Gaussian data vector d, the likelihood is given by where both the mean t and the covariance matrix Σ may in principle depend on the parameters θ.
To begin with, we will start with the simple scenario, well known in the literature, in which the data is a single Gaussian sky map. It will be instructive to review some basic points about Gaussian likelihoods and the role covariance matrices play in them. In doing so, we will identify two approximations to the true likelihood function (for parameter-dependent and independent covariances) and how one might assess the difference in the parameter estimates they lead to.
A. Likelihoods and covariances: the case of a 2-dimensional Gaussian field Consider the harmonic coefficients a m of a single fullsky map. Under the assumption that they are statistically isotropic, their covariance matrix is diagonal, and given by the angular power spectrum C : If we further assume that the mean of the map is zero, and that it is Gaussianly distributed, its likelihood is completely determined by the power spectrum, and takes the form: where C e ≡ m=− |a m | 2 /(2 + 1) is the estimated power spectrum. For simplicity, let us focus on a single multipole order , and we will also use the goodness of fit G = −2 log p. Assuming flat priors on all parameters, the posterior distribution for θ is simply given by this likelihood, and therefore It is often desirable to compress the data into a quadratic summary statistic, such as C e . If we consider C e to be the data, then the likelihood is described by the Wishart distribution [39] given by where we have used the label "exact" to distinguish this distribution from the two approximations below. The extra term in this expression is a Jacobian/volume term that comes about due to the change of variables from a e m to C e . This expression effectively tells us that z = (2 + 1)C e /C obeys a χ 2 2 +1 distribution. For any distribution G, the Fisher information matrix is given by where we have used the shorthand ∂ µ = ∂/∂θ µ ≡ ∂/∂ θ.
For the exact likelihood in Eq. 5 we find Until now, all our calculations have been exact in the limit of Gaussian, full-sky maps. Let us now consider a simplification that can be made if we assume that C e itself is Gaussianly distributed. This is a valid approximation in the large-limit, since the number of independent modes contributing to a given C grows like , and the Central Limit Theorem (CLT) eventually applies. In this approximation, and under the assumption that the covariance is parameter-independent, the likelihood takes the form where the label PI identifies this distribution with the case of a "parameter-independent" covariance matrix, and where we have defined the power spectrum covari- Wick's theorem can be used to relate Σ f to the underlying fiducial power spectrum C f , finding the well-known result where C f is a fiducial power spectrum which is independent of the cosmological parameters. One can show that this approximation can be obtained by Taylor expanding and assuming C e C f . This approximate likelihood is similar to the original exact likelihood in two ways. First, their Fisher matrices coincide if C f is the true underlying power spectrum Secondly, in the simplest case, where our only free parameter is the amplitude of the power spectrum itself (i.e. θ ≡ C ), both likelihoods yield unbiased maximum likelihood estimators for this parameter: Consider now a different approach and take the large limit of Eq (5). Applying the CLT we have that ξ obeys a Normal distribution, N [0, + 1/2], i.e.: where now the covariance matrix depends on θ (and hence the label "PD"). Accounting for this parameter dependence, the Fisher matrix for this distribution is and the maximum likelihood estimator for the power spectra isĈ where in the second line we have kept only the first-order term of the large-expansion of the first line. Therefore, G PD reduces to G exact in the large-limit.
We thus see that in this particular case, while assuming a parameter independent covariance matrix may lead to unbiased parameter estimates, it necessarily leads to a mis-estimate of the parameter uncertainties (unless the chosen fiducial covariance Σ f is the underlying true one). A parameter dependent covariance matrix (assuming a Gaussian approximation for the likelihood) leads to both biased parameter estimates and a misestimate of the uncertainties but there is a well defined limit in which it recovers both correctly and this limit is set by the number of modes being considered in the analysis. This point was made by [34], who also show that using a parameterdependent covariance when approximating the two-point likelihood of Gaussian random fields is formally incorrect.
The question then arises: how important, in practice, is the parameter dependence in the process of parameter estimation from current and future data sets? In this paper, we go beyond the simple analysis of 2D full-sky Gaussian fields presented here to consider the case of tomographic analyses with non-Gaussian contributions to the covariance matrix.

B. Likelihoods and covariances: the general case
The aim of this section is to derive approximate expressions for the parameter uncertainties and biases associated to the parameter dependence of the covariance matrix. Using the identity log detM = Tr log M, let us start by writing the goodness of fit for the generic multivariate Gaussian distribution (Eq. 1) as Let us now define three sets of parameters 1. θ T : the true parameters that generate the data.
2. θ PD : the maximum-likelihood parameters found by minimizing G PD , the version of G gen in which the covariance matrix depends on θ.
3. θ PI : the maximum-likelihood parameters found by minimizing G PI , the version of G gen in which the covariance matrix does not depend on θ.
θ T generate the data in the sense that d = t( θ T ) 1 and We will also assume that the PI likelihood uses the true covariance as the fiducial one, i.e. Σ f = Σ( θ T ) 2 . This will allow us to isolate the impact of the parameter dependence from the systematics effects associated with using an inaccurate covariance matrix. For brevity, we will often use the shorthand Σ T ≡ Σ( θ T ). Note that, at this stage, we have not made any statements about the validity of either G PI or G PD 3 , and we will focus only on the comparison of their associated parameter uncertainties and on their relative bias.
By definition Writing θ PD = θ PI + ∆ θ and G PD = G PI + ∆G we can expand the equation above to linear order, to find where ∂/∂ θ and ∂ 2 /∂ θ∂ θ is shorthand for the parameter gradient and Hessian matrix respctively. Taking the expectation value of the equation above, the parameter bias can be estimated in this approximation as where F PI is the Fisher matrix for the parameterindependent case, given simply by To evaluate ∆ θ for the distribution in Eq. 19, let us start by writing ∆G to first order in ∆Σ Differentiating with respect to θ we obtain where in the second line we have used the fact that ∂ µ ∆Σ ≡ ∂ µ Σ. Before continuing, it is important to note that, according to Eq. 23, we must evaluate this expression at the parameter-independent best fit θ PI . This best fit satisfies Now let us define δd ≡ d − t( θ T ) and δ θ ≡ θ PI − θ T . To linear order in δ θ, the equation above reads and we can solve for δ θ as or, equivalently Substituting this in Eq. 26, making use of the fact that δd = 0 and δd δd T ≡ Σ T , and after a little bit of algebra, we obtain With index notation then, the parameter bias is This is a key result of our paper. The effect of the parameter-dependent covariance on the final parameter uncertainties can be taken into account simply by accounting for this parameter dependence when deriving the Fisher matrix. Such a calculation yields [40] The parameter uncertainties can then be estimated by inverting the Fisher matrix in either case.

C. Large-scale structure likelihoods
Let us now specialise the discussion in the previous section to the case of a data vector made up of the collection of auto-and cross-power spectra between different sky maps, each labelled by a roman index (e.g. a i m labels the harmonic coefficients of the i-th). In general, each sky map will correspond to an arbitrary projected astrophysical field, and the discussion below covers this general case. However, here we will only consider maps from two types of tracers, the galaxy number overdensity δ g and the cosmic shear γ, each measured in a given tomographic redshift bin. The cross-power spectrum between two fields is C ij can be related in general to the matter power spectrum in the Limber approximation [41] through [42] where the window functions for δ g and γ are Here b(χ) is the linear galaxy bias, dn i /dz is the redshift distribution of lens or source galaxies respectively in the i-th redshift bin, normalised to 1 when integrated over the full redshift range, and χ H is the distance to the horizon. Since our data vector is an ordered list of cross-power spectra between different pairs of maps (ij) at different scales , the covariance matrix depends on 6 indices In order to account for the non-Gaussian nature of the late-times large-scale structure, we estimate the covariance matrix as a sum of both the Gaussian and supersample covariance (SSC) contributions [28,43] We neglect other non-Gaussian corrections from the connected trispectrum, which are known to be subdominant with respect to the SSC contribution, at least for cosmic shear studies [31]. Note that it is important to account for non-Gaussian contributions coupling different scales in order to address the relevance of the parameter dependence of the covariance matrix. As discussed in Section II A, for a single Gaussian map, the relative bias between the PI and PD likelihoods drops like ∼ 1/ for a single multipole order (see Eq. 18), and therefore as ∼ 1/ 2 max for a maximum multipole max , becoming negligible for a sufficiently large number of modes. This will in general also be true for an arbitrary number of maps, since the same arguments hold for each of the independent eigenmaps that diagonalize C ij . Non-Gaussian contributions to Σ will couple different scales, effectively reducing the number of independent modes, and therefore may enhance the impact of the parameter-dependent covariance.
The Gaussian contribution to the covariance matrix can be estimated as [44] ( Note that we account for an incomplete sky coverage by scaling the full-sky covariance by 1/f sky to account for the reduced number of available modes. This is not correct in detail, since measurements on a cut sky induce mode correlations, but it is a good enough approximation for forecasting [45]. It is also important to note that the power spectra entering Eq. 40 must contain both signal and noise contributions. The former are given by Eq. 35, while the latter are where n i g is the mean angular number density of objects in the i-th redshift bin, and σ γ = 0.28 is the intrinsic ellipticity dispersion per component. We compute the SSC contribution to the covariance matrix as described in appendix A. However, ignoring this contribution for the moment, we can use the simple dependence of the Gaussian covariance with f sky to study the importance of a parameter-dependent covariance matrix as a function survey properties. Simple inspection of Eq. 23 shows that, since F PI ∝ f sky , the parameter bias scales like ∝ f −1 sky . On the other hand, the effect on parameter uncertainties is  1: Left: distribution of maximum-likelihood estimates of an overall power spectrum amplitude parameter in the case of parameter-dependent (red) and parameter-independent (blue) covariances. The parameter dependence of the covariance matrix causes a downwards relative bias with respect to the parameter-independent case, as well as a mild shrinkage of the parameter uncertainties. Results are shown for a small maximum multipole ( max = 10 in order to highlight the effects of the parameter dependence. Right: relative bias (red) and shift in uncertainties (blue) due to the parameter dependence of the covariance matrices as a function of the maximum multipole max (note that both quantities are normalized by the parameter-independent error bars). The red circles show the values found by directly evaluating the exact likelihood for 10 6 simulations, while the solid lines show the values found in our Fisher matrix approximation. The Fisher approach is able to reproduce the exact result with good accuracy for both quantities, slightly over-estimating the bias for small max.
where ∆F is the second term in Eq. 33, and in the last line we have expanded to first order in this parameter. Since ∆F does not scale with f sky , the correction to the parameter errors (given by the second term above) is ∝ f −3/2 sky . Thus, in general the relevance of a parameterdependent covariance matrix will decrease with the surveyed area. This results holds also in the presence of non-Gaussian contributions to the covariance. The SSC term arises from the non-linear coupling induced between different modes by the presence of super-survey, longwavelength modes. Its impact will therefore increase for smaller sky areas, and therefore we can expect it to induce a slightly steeper dependence on f sky than the purely Gaussian case.
The expected behaviour with the smallest scale included in the analysis max is also similar: increasing the number of modes reduces the relative impact of the parameter-dependent covariance. The non-Gaussian terms can somewhat modify this behaviour, due to mode coupling. However, the relative impact of the SSC contribution will also decrease for larger survey areas, or for more noise-dominated datasets.

D. Analytic example: the power spectrum amplitude
Before we study the importance of the parameter dependence in the covariance matrix in detail for the case of tomographic large-scale structure datasets, let us exam-ine a particularly simpler scenario: a single sky map, and a single parameter quantifying the overall amplitude of the power spectrum (for instance, this would correspond to the case of A s or σ 8 for linear power spectra). In this case, we can calculate the impact of the parameterdependent covariance exactly, and therefore it allows us to quantify the validity of the Fisher approximations derived in the previous section. Let us label the amplitude parameter A, such that our model for the measured angular power spectrum is related to the a fiducial one as We will assume a fiducial value of A = 1, and consider a purely Gaussian covariance with no noise where n = + 1/2. On the one hand, the parameter-dependent and independent likelihoods are (up to irrelevant constants) given by where r = C d /C f , and C d is the measured power spectrum. The maximum-likelihood solutions for A for a given realisation of the data then arê where S n ≡ n r n . On the other hand, our Fisher predictions for the parameter bias and the parameterindependent and parameter-dependent errors (Eqs. 32 and 33) are where max is the maximum multipole included in the analysis.
To validate these results, we generate 10 6 random Gaussian realisations of r with standard deviation σ = n −1 , and computeÂ PD andÂ PI for each of them. We then evaluate the mean and standard deviation of both quantities and compare them with the approximate results in Eqs. 51, 52 and 53. The results of this validation are shown in Figure 1. The left panel shows an example of the distributions ofÂ PI (blue) andÂ PD (red) for the case of max = 10. As discussed above, the small number of modes in this case highlights the relevance of the parameter-dependent covariance, which produces a noticeable relative bias on A and a decrease in its uncertainty [39]. The right panel then shows the relative parameter bias and shift in uncertainties normalised by σ PI as a function of max for the simulated likelihoods (circles) and for our Fisher predictions (solid lines). The Fisher approximation is remarkably good, and remains accurate at the 10% level even for low values of max ∼ 30.

III. FORECASTS
We apply the results discussed in the previous section to the case of imaging surveys targeting a joint analysis of galaxy clustering and weak lensing. We start by describing the assumptions we use to quantify the expected signal and noise of these surveys and then present our results regarding the relevance of the parameter dependence of the covariance matrix.

A. Survey specifications
We assume an LSST-like survey. For simplicity we assume the same redshift distribution for the clustering and lensing samples with a redshift dependence [46] dN dz and a total number density n g = 27 arcmin −2 . We split the total sample into 10 photometric redshift bins with approximately equal number of galaxies in each bin. The redshift distributions for each bin are calculated assuming Gaussian photometric redshift uncertainties with a standard deviation σ z = 0.05 (1 + z). These are shown in Figure 2. We produce forecasts for a set of 4 cosmological parameters: the fractional matter density Ω M , the matter power spectrum amplitude σ 8 and equation of state parameters w 0 and w a . The forecasts presented below are marginalized over 6 nuisance parameters corresponding to the values of the linear galaxy bias in 6 redshift nodes z = (0.25, 0.75, 1.25, 1.85, 2.60, 3.25). We assume a linear dependence with redshift between these nodes, and our fiducial bias values correspond to b(z) = 1 + 0.84 z [47]. Since the aim of this work is to explore the impact of parameter-dependent covariances, and not to produce forecast of the expected cosmological constraints, we do not consider any other sources of systematic uncertainty, such as intrinsic alignments, multiplicative shape biases or photometric redshift systematics.
Finally, our fiducial forecasts assume a sky fraction f sky = 0.4, as expected for LSST, and a constant scale cut max = 3000 for all redshift bins. A more realistic choice of scale cuts would be motivated by modelling uncertainties, by removing all scales smaller than the physical scale of non-linearities k NL , necessarily in a redshift-dependent way. By not removing these scales at low redshifts ( NL (z = 0.5) k NL χ(z = 0.5) 400 for k NL = 0.3 h Mpc −1 ), our fiducial choice emphasizes the role of super-sample covariance, potentially highlighting the effects of parameter dependence in the covariance. At the same time, we have seen that these effects become less relevants as we increase the number of independent modes, and therefore we will also present results as a function of max and f sky .

B. Results
We compute the Fisher matrices for parameterdependent and independent covariances as well as the bias due to the parameter dependence as described in Section II B, using Eqs. 24, 33 and 32. Here t is the vector of all possible cross-power spectra between two tracers (δ g or γ) in any pair of redshift bins, calculated using Eq. 1 (with noise power spectra given by Eq. 41), and Σ is the covariance matrix of this data vector, calculated as in Eq. 39. To compute all angular power spectra we use the Core Cosmology Library 4 [48], which we also modified to provide estimates of the super-sample covariance as described in Appendix A. Note that, we do not compute power spectra and covariance matrices for all integer values of . Instead, we use 15 logarithmicallyspaced bandpowers between = 20 and = 3000.
We report our results in terms of the relative parameter biases and the relative error differences where σ PI and σ PD are computed from the inverse of the corresponding Fisher matrices. Results are reported for the 4 cosmological parameters (Ω M , σ 8 , w 0 , w a ). Since the aim of this paper is to study the relevance of the parameter dependence in the covariance matrix, and not to produce cosmological forecasts for future surveys, we do not report absolute errors on these parameters. Figure 3 shows the relative difference in the statistical uncertainties as a function of sky area for a fiducial max = 3000. Results are shown for galaxy clustering, weak lensing and for the combination of both. We also show the impact of the super-sample contribution to the covariance matrix by displaying results with (solid) and without it (dashed). In all cases, the information contained in the covariance matrix is very small, and would only lead to a ∼ 20% suppression of the uncertainties for the smallest sky areas (∼ 0.5% of the sky), corresponding to e.g. CFHTLens [49] or the first data release of HSC [6]. The effect becomes even less important for larger sky areas, as more independent modes become available and concentrate most of the information on the power spectrum. For LSST-like areas (f sky 0.4) the effect is, at most, of the order of 1% of the statistical uncertainties.
The same results are shown in Figure 4 for the relative parameter biases, and similar conclusions hold. The effects are always smaller than ∼ 0.3σ on small survey areas, and get suppressed to a percent fraction of the statistical uncertainties for larger areas. While these effects are small, it is interesting to note that they are enhanced by considering a more accurate model of the covariance matrix that includes the SSC term. Thus we see once again the effect of the super-sample covariance inducing a statistical coupling between modes which then reduces the effective number of independent degrees of freedom and and enhances the relative information content of the covariance matrix.
For a fixed LSST-like sky fraction (f sky = 0.4), Figure  5 shows the impact of the parameter dependence of the covariance matrix as a function of the maximum multipole max included in the analysis. Results are shown for the error difference (left panel) and biases (right panel), and we also show the impact of the super-sample covariance term. Again, as expected based on our discussion in Section II, the information content of the covariance matrix decreases as more independent modes are included in the analysis. The impact, both on the uncertainties and on the parameter biases, is smaller than ∼ 20% of the overall statistical error budget (and this only for max = 20). For more realistic scale cuts ( max ∼ 1000), the effect is suppressed to the level of O(1%).

IV. SUMMARY
In the era of precision cosmology it is becoming increasingly important to understand the systematic and statistical errors that occur during parameter estimation. In this paper we have analysed the effect of using a likelihood with a parameter dependent covariance matrix on the inference of cosmological parameters. The computation of covariance matrices for large-scale structure surveys is a numerically complex problem for which multiple approaches have been proposed in the literature. Therefore, assessing the need to estimate the covariance at every point of a given likelihood evaluation, rather than estimating it only once for a set of parameters sufficiently close to the maximum likelihood, is important for future surveys, where covariance estimation will become more computationally demanding.
We have focused our analysis on the two main aspects of parameter inference: final statistical uncertainties and parameter biases. For multivariate Gaussian likelihoods, we have quantified the impact on the statistical errors using a standard Fisher matrix approach that accounts for the parameter dependence of both the mean and covariance (Eq. 33). We have also derived an expression to estimate the expected parameter bias by expanding the likelihood around its maximum. The resulting expression (Eq. 32) is easy to calculate and has not been presented before to our knowledge. We have evaluated the accuracy of this approximation by comparing its predictions with the analytical solutions available for a simplified case involving a single amplitude parameter and including the super-sample covariance (solid lines). As argued in Section II, the importance of the parameter dependence grows towards smaller f sky , as the number of modes available decreases. Nevertheless, the effect is never larger than ∼ 20% of the statistical errors for the smallest sky fractions (f sky = 0.005), and is below 1% for LSST-like areas (f sky 0.4). a single sky map (Section II D). This exercise shows that our approximate estimates are indeed accurate as long as the true parameter bias is small, and that, if anything, our approximations will slightly overestimate this bias. The methods used here are therefore perfectly applicable to the case we study, and a more computationally expensive approach involving a full evaluation of the likelihood through Monte-Carlo methods is unlikely to yield different results.
We have then evaluated the parameter shifts and error differences for the particular case of a large-scale structure experiment targeting the joint measurement of galaxy clustering and cosmic shear. The Fisher information for the parameter dependent covariance matrix, in contrast to the parameter independent case, does not increase with the survey area. This can be seen in Eq 33 where all factors of f sky in the covariance matrix can-cel out for the second term. The associated parameter bias, on the other hand, is roughly inversely proportional to f sky (see Section II C). This means there will always be a regime in which the parameter dependence becomes important, the question is whether it will be important for any practical value of f sky . Our study of simple single-map cases (Sections II A and II D) have shown that the impact of a parameter-dependent covariance decreases with multipole order (e.g. Eqs. 51 and 53), and therefore the generic message is that the relative information content in a parameter-dependent covariance decreases as more independent modes are included in the survey. To account for mode-coupling induced by nonlinear evolution of the matter overdensities, we have included the super-sample contribution to the covariance matrix, which has been determined to be the most relevant contribution in the range of scales considered here Left: relative error difference due to a parameter-dependent covariance for the four cosmological parameters studied here as a function of the maximum multipole max included in the analysis for a fiducial sky fraction f sky = 0.4. Right: same as the left panel for the relative parameter bias. As expected, the impact of the parameter dependence grows as we reduce the number of independent modes used in the analysis. In all cases, the effect is smaller than ∼ 20% of the statistical uncertainties, even for max ∼ 20, and becomes percent-level for realistic values ( max ∼ 1000). [28]. We note that we have not considered additional nuisance parameters such as shifts in the photometric redshifts, the impact of baryons or intrinsic alignments. We have also only taken into account the super-sample covariance contribution to the Gaussian covariance matrix, neglecting all other parts of the connected trispectrum. While this has been determined to be sufficiently accurate for lensing observables [31] large-scale structure studies exploring the deeply non-linear regime will require a more careful treatment. Nevertheless, we don not expect any of these effects to change our results significantly. Furthermore it is conceivable that in situations where the parameter space is extended, to include more exotic models of dark energy for example, the likelihood may become Non-Gaussian and then the parameter dependence of the covariance may need to be taken into account. Our findings, summarised in Figures 3, 4 and 5 show that, for any current and future surveys, the parameter dependence of the covariance matrix can be safely ignored, since it only leads to changes in the statistical errors and maximum-likelihood parameters that are 1% of the statistical uncertainties. For surveys targeting very small sky fractions (f sky < 1%) or very large scales ( 20), the impact of the parameter dependence becomes more important, but the impact is at most at the ∼ 0.2σ level, both in biases and uncertainties. Note that the these effects are sufficiently small that it strongly justifies the use of the Fisher formalism to undertake this estimate, but a more thorough analysis of these effects in cases where information content of the covariance is significant would require the use of a full likelihood exploration. Since we can expect systematic and numerical uncertainties to be at least as important, it is safe to say that the parameter dependence of the covariance matrix can be ignored in all practical cases. k ≡ ( + 1/2)/χ, z is the redshift at the comoving radial distance χ and P (k, z) is the matter power spectrum. R(k, z) is the response of the matter power spectrum to a long-wavelength density perturbation δ b and σ b is the projected variance of the density field within the footprint of your survey. For simplicity we assume a connected disc with an area corresponding to the sky fraction covered, in which case σ b can be estimated as σ b (f sky , z) = dk 2 ⊥ P L (k ⊥ , z) where P L is the linear matter power spectrum, J 1 is the order-1 cylindrical Bessel function and R s ≡ χ(z) θ s , θ s ≡ arccos(1 − 2f sky ).
The expressions above reproduce the results of previous work by [50], in the simplified scenario of linear galaxy bias. We refer the reader to these works for further details.
A standard approach to compute the power spectrum response R is to make use of so-called "Separate-Universe" simulations [51,52] in which the effects of a long-wavelength mode are modelled as modifying the effective background cosmology. Since our results will not depend strongly on the details of this calculation, we proceed as in [50] and estimate R using a halo model approach.
As a brief review and to introduce the notation, we provide the details of the Halo model below [51]. In the Halo model the matter power spectrum can approximated as P (k) = P 1h (k) + P 2h (k) where P 1h (k) represents the contribution from correlations within a single halo and P 2h (k) is the contribution from separate halos correlated by the linear power spectrum. Explicitly these are defined in terms of Halo model kernels I α β (k).
where dn/dM is the Halo mass function and b(M ) is the halo bias. Both quantities were estimated using the fits from [53]. u(k|M ) is the Fourier transform of the halo density profile for a halo of mass M , which we model assuming a NFW profile with a concentration-to-mass relation given by [51]. In terms of these quantities, the response of the power spectrum to the background density is given by [51] R(k) = R BC + R HSV + R LD (A7) R BC (k) = 68 21 A few comments on these terms are in order.
• The beat coupling (BC) term is the one that has been studied extensively in the literature. It represents the fact that a short wavelength mode will grow more in the presence of a larger background density (i.e the large scale mode outside the window).
• The halo sample variance (HSV) term shows the change in the number of Halos in the presence of a large scale density mode.
• A large scale density mode will change the local scale factor and thus change the physical size of a mode that would otherwise be smaller. This is known as the linear dilation (LD) effect.
These different contributions at z = 0 are shown in Figure 6, and the mild redshift evolution of the overall response is presented in Figure 7.
Using this response we can compute the SSC matrix as shown in Figure 8. It is worth noting that, when evaluating the parameter dependence of the covariance matrix, we vary all terms entering Eq. A1 except for R, which we keep fixed.