The impact of signal-to-noise, redshift, and angular range on the bias of weak lensing 2-point functions

Weak lensing data follow a naturally skewed distribution, implying the data vector most likely yielded from a survey will systematically fall below its mean. Although this effect is qualitatively known from CMB-analyses, correctly accounting for it in weak lensing is challenging, as a direct transfer of the CMB results is quantitatively incorrect. While a previous study (Sellentin et al. 2018) focused on the magnitude of this bias, we here focus on the frequency of this bias, its scaling with redshift, and its impact on the signal-to-noise of a survey. Filtering weak lensing data with COSEBIs, we show that weak lensing likelihoods are skewed up until $\ell \approx 100$, whereas CMB-likelihoods Gaussianize already at $\ell \approx 20$. While COSEBI-compressed data on KiDS- and DES-like redshift- and angular ranges follow Gaussian distributions, we detect skewness at 6$\sigma$ significance for half of a Euclid- or LSST-like data set, caused by the wider coverage and deeper reach of these surveys. Computing the signal-to-noise ratio per data point, we show that precisely the data points of highest signal-to-noise are the most biased. Over all redshifts, this bias affects at least 10% of a survey's total signal-to-noise, at high redshifts up to 25%. The bias is accordingly expected to impact parameter inference. The bias can be handled by developing non-Gaussian likelihoods. Otherwise, it could be reduced by removing the data points of highest signal-to-noise.


INTRODUCTION
Weak lensing is a central observable for contemporary cosmology. Arising from the bending of light rays in gravitational fields, it allows not only to study gravity beyond its Newtonian limit, but also to map dark and luminous matter alike. When jointly analyzed with galaxy clustering, it can constrain the gravitational slip parameter (Daniel et al. 2008) and thereby discriminate between different types of gravitational theories. Considerable observational efforts are thus invested into weak lensing, including the Kilo Degree Survey (Hildebrandt et al. 2020), Dark Energy Survey (Abbott et al. 2019) and the Hyper Suprime-Cam (Mandelbaum et al. 2018).
Weak lensing is sensitive to the cosmological parameter σ 8 , which describes the 'clumpiness' of matter in the Universe in the sense that σ 8 sets the amplitude of the matter power spectrum. Measurements of σ 8 derived from weak lensing experiments are mildly discrepant with measurements of σ 8 from the cosmic microwave background (CMB), see Planck Collaboration (2014,2016,2018). Since many years, weak lensing yields lower values of σ 8 , and it has been repeatedly proposed that this might be indicative of an unsolved bias in contemporary weak lensing constraints.
The term 'bias' usually carries a negative connotation. However, in a previous study  had shown that although the summary statistics derived from weak lensing observations are indeed 'biased', this bias has here to be understood in the literal sense. An estimator is said to be biased if the value it yields most likely differs from the one it yields on average. An experiment's average outcome then deviates from its most likely outcome. This occurs if the estimator is drawn from a skewed distribution.
sellentin@strw.leidenuniv.nl  had shown that all weak lensing 2-point function estimators will be affected to various degrees by such a literal bias, as they follow a skewed distribution. In consequence, a sound weak lensing data vector will likely display an amplitude that is below average.
If such weak lensing data were analyzed with the matching skewed likelihood, then parameter inference could proceed self-consistently. However, the current standard weak lensing analysis employs a symmetric Gaussian likelihood instead. When used to analyze a data vector that is biased low, the Gaussian likelihood will force the fit to center on this low amplitude. As σ 8 scales the amplitude of weak lensing observables, it can thus be expected that a systematically lowered value for σ 8 is found in the standard analysis.
Quantifying by how much σ 8 will be biased lowand removing this bias -is a highly non-trivial challenge: solving both questions implies the correct weak lensing likelihood must be found, rather than a heuristic non-Gaussian substitute.
Accordingly, a skewed baseline likelihood to handle weak lensing observations alongside galaxy clustering and their cross-correlation has been presented in Manrique-Yus & Sellentin (2020). It takes the form of a Bayesian Hierarchical Model and at its core lies the same Wishart distribution that also describes the skewed distribution of the low multipoles of the cosmic microwave background (CMB), see Hamimeche & Lewis (2008, 2009); Upham et al. (2020). In comparison to CMB data, weak lensing data follow a decidedly more skewed distribution, as was revealed by simulations (Sellentin & Heavens 2018;Harnois-Déraps & van Waerbeke 2015) and subsequently by Diaz Rivero & Dvorkin (2020). This additional skewness can be implemented in the skewed baseline likeli-hood through a free parameter g eff , but to date g eff can only be measured from simulations, and computing it from first principles is still unsolved.
In this paper, we thus return to the origin of skewness in weak lensing likelihoods. In particular, we study whether the problematic skewness can be suppressed without losing precision. Although the skewness could be suppressed by invoking the Central Limit Theorem, this is highly suboptimal: inconsiderate averaging will remove information. We therefore opt for the more refined approach by filtering, and study whether skewness can be filtered out. Should such a suppression succeed, then developing non-Gaussian likelihoods could be put aside. We here opt for a set of filters known as COSEBIs, as these are constructed to remove further weak lensing systematics. If also able to suppress skewness, a COSEBIanalysis would thus present a highly controllable weak lensing analysis.
The setup of this paper is as follows. In Sect. 2 we introduce weak lensing. In Sect. 3 we describe why it can be hoped that filtering may suppress skewness and ensuing biases. In Sect. 4 we introduce the essentials of COSEBI filters, and in Sect. 5 we create 819 COSEBI samples to estimate their sampling distribution. Our strategy to detect deviations from a Gaussian distribution is described in Sect. 6. From Sect. 7 we present the main results of this paper: Sect. 7.1 describes how sampling distribution skewness evolves with redshift, and Sect. 7.2 describes the excess skewness of weak lensing measurements in comparison to the CMB case. Sect. 7.3 contrasts Euclid-and LSST-like angular ranges with KiDS-and DES-like angular ranges. Sect. 8 then studies the excess probability of finding a 'low' weak lensing outcome, if a Gaussian likelihood is insisted upon. Our conclusions are presented in Sect. 9, and the appendix describes the construction of COSEBIs and answers questions concerning gravity-induced skewness often encountered.

WEAK LENSING SIGNAL AND NOISE
Matter between the observer and far-away sources distorts the image of the sources via gravitational lensing. As light from neighbouring sources propagates through similar foreground fields, weak lensing causes the images of neighbouring sources to have a correlated alignment. This correlation can be extracted from observations, see Bartelmann & Schneider (2001); Kilbinger (2015).
Such weak lensing signals carry two types of noise, 'shape noise' due to the diversity of galaxy shapes and orientations, and 'cosmic variance' due to the stochastic nature of the matter fields transversed by the propagating light.
A weak lensing analysis begins from a weak lensing galaxy catalogue that stores for N galaxies their celestial positions x together with the two measured ellipticity components 1 and 2 and the redshift. From such a catalogue, the standard analysis of a weak lensing data set computes binned 2-point statistics, such as the correlation functions ξ ± . If θ denotes the angular separation on the sky, then the weak lensing correlation functions ξ ± can be extracted via the estimator where w i,j are the weights arising from the shape measurement process on realistic galaxies. We use hats to denote noisy quantities. The angular separation θ and the positions x of galaxies in the sky are still two-dimensional only. As our Universe evolves our time, it is advantageous to split a galaxy catalogue radially into subsets of galaxies. This splitting is known as redshift tomography. Denoting the redshift by z, the ith redshift bin is n i (z), which is the normalized distribution of the galaxies assigned to the bin i.
These redshift bins are then re-expressed as a function of the comoving distance χ, known as the weak lensing kernel corresponding to the ith galaxy population. The light emitted from galaxies per bin will propagate through foreground fields towards the observer. Of these, cosmological theory can predict the matter power spectrum P δ . The continuous propagation of light through these foreground fields causes that the weak lensing power spectrum P ij γ ( ) integrates over the matter power spectrum, weighted by the lensing kernels where the Limber approximation has been used. Furthermore, is a spherical harmonic multipole, H 0 is the Hubble constant, Ω m is the matter density parameter, c is the speed of light. Both the matter power spectrum P δ and the comoving distance χ(z) as a function of redshift depend on a broad range of cosmological parameters, which can accordingly be inferred from weak lensing data.

FILTERING TO SUPPRESS SKEWNESS?
Comparing Eq. (1) and Eq. (3), an inherent disparity between weak lensing observations and its theoretical predictions becomes apparent: while the weak lensing signal is easiest to extract in real space θ, the theoretical predictions are easier to compute in harmonic space . The translation from harmonic space to real space occurs via a filter F ( θ) such that a general weak lensing 2-point function can be written as where the filter F is to be specified. The correlation function ξ + (θ) employs the filter J 0 ( θ) and ξ − (θ) employs J 4 ( θ), which are the zeroth and fourth order Bessel functions respectively. Within the context of determining whether a Gaussian likelihood function is adequate, the filter in Eq. (4) bears a special significance: as shown by , it is the low -modes of the transversed matter fields that skew the distribution from which weak lensing data are drawn. Thus, if we succeed in constructing a filter F ( θ) that downweighs the contribution of lowmodes to a weak lensing observable, then this filter will Gaussianize the distribution. n ( ) filters for logarithmic COSEBIs, as a function of mode index n = 1 and multipole . The angular range [θ min , θmax] is indicated in arc-minutes. It is known that lowmodes cause non-Gaussianities in 2-point function likelihoods. As low n-modes assign large weight to low -modes, it can be expected that COSEBIs with low n-mode will require non-Gaussian likelihoods.
The downside to filtering out non-Gaussianity is that if it is done naively (e.g. by setting the filter to zero in the low -regime), then constraining power is lost. We thus have to find a filter that is both efficient in retaining cosmological information, and in suppressing non-Gaussianity. In this context, the so-called COSEBI filters proposed by Schneider & Kilbinger (2007);Schneider et al. (2010) are of interest. Originally developed to split weak lensing E-modes from B-modes, and applied as such to the data of the CFHTLenS (Asgari et al. 2017), KiDS and DES survey (Asgari et al. 2019), COSEBIs are not only efficient in retaining cosmological information, but might as well suppress non-Gaussianity.

ESSENTIALS OF LOGARITHMIC COSEBIS
Complete Orthogonal Sets of E-/B-mode Integrals, or COSEBIs for short, were originally introduced to split weak lensing E-modes from B-modes (Schneider & Kilbinger 2007;Schneider et al. 2010). The physics of weak lensing is expected to produce E-modes only, and the detection of B-modes would thus directly indicate systematic effects. By discarding B-mode COSEBIs and only analyzing their E-mode counterparts, a clean data vector can be gained and used for parameter inference. However, this clean E-mode data vector still requires the correct likelihood to be known, otherwise the use of the incorrect likelihood becomes a systematic itself.
With the aim of determining the likelihood shape of COSEBIs, we thus study whether the COSEBI filter W log n ( ) succeeds in suppressing non-Gaussianity that skews the likelihood for parameter inference.
We detail the derivation of COSEBIs in appendix A. For the study of the COSEBI likelihood shape, it is sufficient to recognize that the COSEBI filters will act according to Eq. (4): given a power spectrum, the filter lets certain -modes pass, while others are suppressed, depending on the amplitude that the filter assigns to these modes.
As it is known that the subclass of logarithmic COSEBIs is more efficient in retaining cosmological information, we here specialize to logarithmic COSEBIs, which we denote as E n . COSEBIs are discrete modes, enumerated by their index n, and we expand until n = 7, which has been shown to retain the majority of cosmological information (Asgari et al. 2012).
In case of logarithmic COSEBIs, the filter F from Eq. (4) is usually referred to as W log n ( ). Per index n, the filter will output the corresponding COSEBI E n , implying the filter's shape will change when varying n.
As COSEBIs are constructed to only use data on given angular ranges (see appendix A), the filters W log n ( ) implicitly also depend on θ min and θ max , the minimum and maximum angular range of data that the filter lets pass.
The exclusion of angular ranges becomes more evident when transforming the W log n ( ) filters to real space, where they filter ξ ± (θ) instead of the shear power spectrum P ij γ ( ). The real space analogues of the W log n ( ) filters are here denoted as T log +,n (θ) and T log −,n (θ), depending on whether they filter ξ + or ξ − .
We plot harmonic space representations W log n ( ) of selected COSEBI filters in Fig. 1. Fig. 2 displays examples of T ± filters for n ∈ [1, 7] and an angular range of 0.5 to 100 arc-minutes. The angular ranges [θ min , θ max ] that each COSEBI filter integrates over are indicated in the legends of our plots.
From the harmonic space representation W log n ( ) in Fig. 1 it is evident that COSEBI filters let a selected range of -modes pass. As  identified low -modes to be the cause of likelihood skewness, COSEBIs might thus be able to exclude these problematic modes. In practice, the filter functions displayed in Fig. 2 are applied to correlation functions ξ ± estimated from observed galaxy catalogues. In Sect. 5, we will use the T ± filters on correlation functions estimated from simulations, with the aim to gain a representative sample of simulated COSEBIs to study the shape of their sampling distribution.

CREATING COSEBI SAMPLES
In order to study whether COSEBIs suppress non-Gaussianity in a weak lensing data set, we have to estimate the distribution from which such COSEBIs are drawn. This distribution is called the 'sampling distribution' of the data. In parameter inference the sampling distribution reoccurs as the likelihood, when it will be evaluated conditionally on the parameters. To estimate the sampling distribution, we compute 819 simulated COSEBI data vectors from the SLICS weak lensing simulations (Harnois-Déraps & van Waerbeke 2015) in a Euclid-or LSST-like setup of redshift binning.
The SLICS simulations provide square celestial patches of the cosmic large scale structure of size A sky = 100 deg 2 at a mass resolution of 2.88 · 10 9 M /h. The simulation suit contains 930 independent realizations of a gravitationally evolved dark matter field projected onto lens planes in order to compute the light deflection and therefrom the weak lensing signal per simulation.
Of these simulations, 819 were post-processed to imitate a 10-bin tomographic weak lensing survey, with n = 2.6/arcmin 2 galaxies per bin. This corresponds to a total of 30 galaxies per square-arcminute as expected for a Euclid-like survey.
To compute COSEBIs, we first run the TreeCorr implementation (Jarvis et al. 2004) of theξ ± -estimator given in Eq. (1) on the SLICS weak lensing catalogues. When computing accurate COSEBIs an unusually high angular resolution is of the essence. Following the recommendation of Asgari et al. (2017), we thus use 10000 linear θ-bins, of width ∆θ = 0.00995 . We then filter the resultingξ ± (θ) estimates with the T (n) ± (θ) filters according to Eq. (A3). This results in 819 estimates of E-and B-mode COSEBIs E n and B n . We retain the E-modes for further statistical analysis, but discard the B-modes as these would not contain physical information in an analysis of real data.
In our computation of simulated data vectorsξ ± , we observe the known loss of power inξ ± as reported in Fig. 6 of Harnois-Déraps et al. (2018). This loss of power in turn affects the amplitude to the COSEBIs computed fromξ ± . Fortunately, this is a deterministic rather than a stochastic effect, and is caused by the finite mass resolution and box size of the simulations. As Harnois-Déraps et al. (2018), we hence rescale the extractedξ ± such that they average to the best-fitting parameters of the original simulation run.
To study shape noise and cosmic variance individually, we estimate pure cosmic-varianceξ ± (θ) correlation functions from the SLICS simulations. As shape noise is understood to be Gaussianly distributed, we process the pure cosmic-varianceξ ± functions further by adding shape noise directly toξ ± rather than per galaxy.
The shape noise variance per galaxy splits as σ 2 = σ 2 1 + σ 2 2 where the indices 1, 2 indicate the two components of the estimated ellipticities. We use the typical standard deviation per component of σ i = 0.29. Due toξ ± being computed in angular bins, the shape noise variance C s ofξ ± is given by where A sky = 100 deg 2 is the area of the simulated celestial patch andn the number density of about 2.6 galaxies per arcmin per tomographic bin. The correlation functions with shape noise then follow to beξ where the random variablesŝ are drawn from Gaussian distributions of mean zero and variance C s (θ), Note, that Eq. (6) and Eq. (7) imply that shape noise does not change the signal, but only enhances the scatter around the signal: the quantityŝ(θ) is a Gaussian random variable, taking with equal probability values above and below its mean. Both the correlation functions with and without shape noise are then filtered according to Eq. (A3) in order to yield 819 E-mode COSEBIs for each n and each redshift bin combination. With 10 redshift bins, there will be 55 redshift bin combinations. As we compute 7 COSEBIs per redshift bin combination, we thus create 819 samples of a 385 dimensional data vector.
As an example, Fig. 3 depicts all two-dimensional marginal distributions of the first seven COSEBI Emodes of the redshift bin 8 correlated with itself. This ]. Per marginal, the two En modes are indicated on the axes. The black circles mark the peak of the marginals which are often off-center, due to the asymmetry of the distributions; the pink circles mark the means. When Cholesky-whitening the left column, the right column ensues, where each d i is a linear superposition of the En modes. Cholesky-whitening destroys all Gaussian correlations between data points. This is easily seen close to the diagonal where the strongly ellipsoidal (i.e. correlated) COSEBI distributions were transformed to more isotropic, roundish distributions. Non-Gaussian dependencies between random variables survive Cholesky-whitening, which is why the right column still displays asymmetric non-Gaussian features. These residual non-Gaussianities are quantified in Sect. 6. figure can be repeated for the remaining 54 redshift bin combinations, which provides the basis for the non-Gaussianity detection in Sect. 6. From Fig. 3, we can already deduce by eye that the assumption of a Gaussian distribution for COSE-BIs there used will be approximate. This is evidenced by the marginals deviating from two-dimensional Gaussian distributions. It can be seen that the distributions' peaks are often off-center, which affects in particular the COSEBIs E 2 and E 3 . Additionally, the marginals including the COSEBI E 6 display a sharp drop in intensity, causing an edge in the depicted histograms. These first visual impressions of non-Gaussianity will be quantified in Sect. 6.

NON-GAUSSIANITY DETECTION STRATEGY
The purpose of this section is to quantify the level of non-Gaussianity in the high-dimensional data set of seven COSEBIs per 55 redshift bin combinations of a Euclid-or LSST-like survey. For detected non-Gaussianities, a follow-up study of the ensuing biases is presented in Sect. 8.
A covariance matrix is sufficient if pairs of data points correlate Gaussianly only. The components of the covariance matrix are estimated via the usual sample estimator which uses the sample mean If the likelihood truly is Gaussian, then computing an accurate covariance matrix is the final stepping stone for an unbiased analysis. In contrast, if the likelihood is non-Gaussian, then even an arbitrarily precise covariance matrix is insufficient to parameterize the actual likelihood. The purpose of a trans-covariance matrix, as developed by Sellentin & Heavens (2018), is to flag where in a data set computing a covariance matrix is insufficient. If the likelihood truly is Gaussian, then the trans-covariance matrix will vanish. Otherwise, if non-Gaussian dependencies affect the data, then the trans-covariance matrix will flag pairs of data points for which more than their covariance must be understood to yield the correct likelihood.
The logic of a trans-covariance matrix is to first destroy the Gaussian correlations in pairs of data points, and then measure whether any further non-Gaussian dependencies between the two data points remain. If d = (d a , d b ) is a pair of two data points taken from a usually much larger data vector x, then the covariance matrix of this pair is given by .
This covariance matrix is Cholesky decomposed as C = LL T , and the pair d is whitened viã Should the covariance matrix of Eq. (10) have fully captured the statistical dependence between d a and d b , then the whitened vectord will now consist of two independent normally distributed random variables. This is testable. If the test is failed, non-Gaussian dependencies between the pair exist. The test for non-Gaussian dependencies must be repeated for every pair of data points. This repetition builds up a matrix: just like the covariance matrix computes the covariance between a pair, the trans-covariance matrix flags the non-Gaussian dependencies in a pair. Plotting the two matrices side by side directly reveals for which pairs the covariance matrix is an all-encompassing statistical description, and which pairs require an advanced statistical treatment.
Our test for non-Gaussian dependencies in a pair proceeds as follows. Assumed a ,d b indeed were whitened normal distributed variables. Theñ would hold. A direct consequence of this assumption is that the sum ofd a ,d b follows If there exist non-Gaussian dependencies between d a and d b , then the assumption thatd a ,d b are independently drawn from the two normal distributions in Eq. (12) breaks down, and Eq. (13) will be incorrect as a consequence. From our simulated COSEBIs, we can create a histogram of the sum in Eq. (13). If this histogram tends to a Gaussian with variance two, then the pair d a and d b is fully described by its covariance matrix. If the histogram deviates from a Gaussian of variance two, then d a and d b depend on each other in ways not captured by a Gaussian likelihood. Denoting the histogram as H(d a +d b ), its deviance from G(0, 2) is computed by integrating over the difference between the histogram and the Gaussian. If the histogram consists of W rectangular bins, then this integral is simply given by the sum where H w is the wth histogram bin. The sum S ab provides the (a, b) component of the trans-covariance matrix. The corresponding components of the covariance matrix are given in Eq. (8).
Finally, we have to mitigate the impact of having only 819 simulations, which will cause noise in the transcovariance matrices. We thus compute trans-covariance matrices for 819 Gaussian data sets, and measure the mean and standard deviation of their matrix elements. We then mean-subtract the trans-covariance matrices of the actual COSEBI simulations, and divide by the standard deviation. This yields a trans-covariance matrix in units of a standard deviation, such that we can speak of a '5σ' detection, etc. The units σ are thus assigned to the colour bars of all plotted trans-covariance matrices.

RESULTS: BIASES AND SKEWNESS IN COSEBIS
In this section we present the main findings of this paper, which include the scaling of biases from skewness with redshift, -mode, and angular range θ min , θ max in a weak lensing data set. 7.1. Increase of skewness with redshift In Fig. 4 we depict the correlation matrices and the trans-covariance matrices for a Euclid-or LSST-like survey with 10 tomographic redshift bins. The redshift bins 1-4 of this simulated survey fall below a redshift of unity. The redshift bin 5 marks the transition to a redshift of unity, and the redshift bins 6-10 finally include galaxies above a unit redshift. For a contemporary survey such as KiDS and DES, the vast majority of their observed galaxies used for weak lensing fall below a unit redshift, see e.g. van Uitert et al. (2018); Hildebrandt et al. (2020); Abbott et al. (2019). Accordingly, these surveys can roughly be recognized in the first four redshift bins of Fig. 4. Fig. 4 displays the COSEBI modes E 1 to E 7 per redshift bin combination. The angular range of these COSE-BIs is θ min = 1 and θ max = 300 .
From the correlation matrices with and without shape noise we recognize the familiar trend that weak lensing observations in different redshift bins are strongly correlated. The trans-covariance matrix without shape noise (lower left) reveals a 14σ to 21σ detection of a non-Gaussian sampling distribution for COSEBIs at low redshift. In the case at hand, this non-Gaussianity takes the form of a skewed sampling distribution, as is expected for any 2-point function, see Sellentin  The correlation-and trans-covariance matrices of the seven En COSEBIs for all 55 redshift bin combinations of a 10-bin survey. These COSEBIs were computed by filteringξ ± (θ) binned in 250 angular bins. The axis labels indicate the redshift bins; within each redshift bin the mode index n increases from 1 (left) to 7 (right). The left column singles out cosmic variance; the right column also adds shape noise. The angular ranges [θ min , θmax] of the COSEBIs are indicated. Interpretation: While all COSEBIs are strongly correlated as seen in the upper row, only selected COSEBIs show additional non-Gaussian dependencies. Without shape noise, non-Gaussianity is detected in the low redshift bins with up to 21σ significance. For visibility, the colour bar is cut at 14σ. With shape noise, a redshift trend for non-Gaussianity emerges: While the redshift bins 1-4 are unaffected, higher bins show a 6σ detection of non-Gaussianity. The redshift bins 1-4 all fall below a redshift of unity, bin 5 marks the transition, and from bin 6 onwards the binned galaxies reside at redshifts z > 1. While the bulk of galaxies of contemporary surveys is located below a redshift of unity, half of the galaxies in a Euclid-or LSST-like survey will reside at redshifts beyond unity. This implies the next generation of surveys will be affected by the found skewness. ences therein. This shape-noise-free panel is included to show the upper bound on non-Gaussianity, as it would occur in a survey of many more galaxies than detectable by Euclid or LSST.
The lower right trans-covariance matrix adds shape noise for a Euclid-or LSST-like survey. Shape noise follows a Gaussian distribution and convolves the originally strongly skewed distributions that gave rise to the 14 to 21σ detection on the left. The convolution results in a suppression of non-Gaussianity, and accordingly the trans-covariance matrix with shape noise flags a 6σ detection of residual skewness, which is still considerable. All pairs of data points that are flagged as subject to non-Gaussian dependencies will require a non-Gaussian likelihood for bias-free inference.
The perhaps most important result of Fig. 4 is the evidence that a COSEBI-based KiDS and DES analysis on the angular scale [1 , 300 ] is free from non-Gaussian biases. This conclusion is evidenced by the first four redshift bins not displaying any prominent non-Gaussianity detection. In contrast, for a Euclid-or LSST-like survey, half of the galaxies reside at redshifts above unity, and will thus be affected by the skewness of sampling distribution and likelihood here detected. The computation of Fig. 4 was only feasible by binning theξ ± (θ) functions in 250 angular bins before applying the COSEBI filters T ± (θ). While 250 angular bins inξ ± are insufficient to yield a percent-level accuracy of the mean COSEBI signal (see Asgari et al. (2017)), we find that the COSEBI trans-covariance matrix converges much more rapidly than the mean signal. To evidence that 250 angular bins were sufficient, we display  Fig. 1, the [1',300']-filter amplifies -modes around ≈ 100 for which non-Gaussian sampling distributions are identified with 6σ significance (left). The [1',360']-filter emphasizes slightly lower -modes causing a non-Gaussianity detection significance of more than 7σ (middle). The [70',100']-filter changes sign, thereby balancing positive and negative skewness against each other; this results in a suppression of non-Gaussianity that is consequently only detected at slightly more than 3σ (right).
in Fig. 5 a recomputation of the trans-covariance matrix of redshift bin 8, but now with 10000 angular bins. This yields precisely the same detection significance of the found non-Gaussianities. Further recomputations of the first three redshift bins at angular resolution of 10000 angular bins are presented in Sect. 7.3. 7.2. -ranges: Excess skewness in comparison to the CMB The sampling distribution (and in consequence the likelihood) of any 2-point function is a priori expected to be skewed. This arises due to 2-point functions being quadratic in the underlying field values that they correlate, and in consequence even affects the power spectrum estimates C of the cosmic microwave background (CMB), although the microwave background itself is compatible with a Gaussian random field (Hamimeche & Lewis 2008, 2009. Known from analyses of the CMB is that the sampling distribution of C is skewed until ≈ 20, see e.g. the discussion in the Planck likelihood of Planck Collaboration et al. (2019). As COSEBI filters allow us to single out certain -ranges with more precision than could be achieved with a ξ ± (θ) analysis, we are able to compare likelihood skewness in weak lensing surveys to likelihood skewness in CMB analysis. This study is depicted in Fig. 5.
All three panels of Fig. 5 depict the trans-covariance matrix of the COSEBI modes E 1 to E 7 for redshift bin 8. These COSEBIs were computed by filtering ξ ± binned in 10000 angular bins, as described in appendix A. From left to right, the angular range of the COSEBIs was varied, to the effect of singling out different -ranges. The left trans-covariance matrix reveals a 6σ detection of skewness on the angular range [1 , 300 ]. The corresponding W log n ( ) filter has been displayed in blue in Fig. 1. Evidently, this filter emphasizes -modes in the range ≈ 80 − 150, and continues to assign significant amplitude to -modes at ≈ 200. Weak lensing is therefore subject to likelihood skewness over an -range roughly ten times larger than for the CMB.
The middle panel of Fig. 5 depicts the trans-covariance matrix for θ min = 1 and θ max = 360 . The corresponding W log n ( ) filter is depicted in yellow in the lower panel of Fig. 1. This filter emphasizes -modes slightly below those of the COSEBIs in the left panel. Accordingly, non-Gaussianities are now detected with more than 7σ significance, rather than 6σ as in the case of the previously discussed [1 , 300 ] COSEBIs.
The right-most trans-covariance matrix of Fig. 5 employs the angular range [70 , 360 ], whose W log n ( ) filter is depicted in green in Fig. 1. This filter also assigns a large amplitude on -modes around ≈ 100, but additionally shows a reversal of sign. As neighbouring P γ ( )estimates are highly correlated in weak lensing, this filter will thus sum over nearly identical P γ ( ) values which it weights with positive and negative signs. As the skewness of a distribution is reversed when the sign of the random variable is reversed, this leads to a cancellation of the skewness during the summation. Accordingly, the right-most trans-covariance matrix of Fig. 5 only displays a 3.2σ detection of residual skewness.

Scaling with angular range
Euclid and LSST will not only compile galaxy catalogues that reach deeper into the cosmic past than those of KiDS and DES, but will additionally cover a larger celestial area. Accordingly, it is natural that KiDS and DES limit their analysis of weak lensing correlation functions to smaller angular ranges than we discussed in Sect. 7.1.
In Fig. 6 we therefore depict trans-covariance matrices for KiDS-and DES-like redshift ranges and angular ranges. The depicted COSEBIs are E 1 to E 7 , computed for the first three redshift bins of Fig. 4. The upper limit of these redshift bins is z ≈ 1. We depict three sets of COSEBIs, with those of the left column spanning the angular range [1 , 20 ], the middle column using [0.5 , 70 ] and the right-most column [70 , 100 ]. All these COSE-BIs were computed fromξ ± (θ) binned in 10000 angular bins.
The upper row of Fig. 6 reveals that these low redshifts are subject to strong non-Gaussianity when cosmic variance alone is regarded. The skewness of the sampling The top row depicts E-modes without shape noise, the lower row includes it. The comparison reveals that although the KiDS-and DES-like redshift ranges are affected by non-Gaussianity from cosmic structure formation, it is the addition of shape noise that renders this skewness negligible. This is evidenced by the up to 18σ detections of skewness without shape noise (middle top panel), which is suppressed as soon as shape noise is added. Accordingly, the lower row shows no significant detection of skewness throughout the depicted redshift ranges. distribution of COSEBIs with θ min = 1 and θ max = 20 is detected with more than 12σ significance. For the angular range [0.5 , 70 ] the skewness of the COSEBI distribution is even detected with 21σ significance. On angular ranges [70 , 100 ] small clusters of 3.5σ significance appear, but otherwise the sampling distribution of these COSEBIs is compatible with a Gaussian.
The lower row of Fig. 6 reveals that the addition of shape noise suppresses the skewness due to cosmic variance efficiently: with shape noise, none of of the COSE-BIs display any significant detection of skewness anymore.

EXCESS PROBABILITY OF FINDING 'LOW' WEAK LENSING MEASUREMENTS
As known from , data points for ξ + (θ) on angular scales of 100 arcminutes and beyond are expected to be biased low by 0.3 standard deviations in the low redshift bins of a Euclid-or LSST-like survey. This bias was shown to exacerbate with redshift. ξ − is similarly biased low, but to a lesser degree.
COSEBIs form linear combinations of ξ + (θ) and ξ − (θ). If the COSEBI filter is positive everywhere, any COSEBI computed from low ξ ± will in turn be biased low itself. Only when the COSEBI filter changes sign can it be expected that biases are suppressed, see the discussion of Fig. 5 in Sect. 7.2. With Euclid and LSST set up to target large angular scales, we therefore investigate the biases of COSEBIs on the [1',300'] scale, and set these biases into perspective with the signal-to-noise contributed by each COSEBI mode.
In Fig. 7 we plot the excess probability ratio of COSEBI modes taking values below their mean. This 'excess' is rated in comparison to a Gaussian. If COSEBI data points were drawn from a Gaussian distribution, they are equally likely to take values below or above their mean.
To quantify the excess probability ratio of COSEBIs falling below the mean, we analyze all our COSEBI samples directly, without binning in a histogram. For each COSEBI mode E n , we compute the fraction of samples that falls below and above the meanĒ n . This estimates the two probabilities P(E n <Ē n ) and P(E n >Ē n ). For a Gaussian, the ratio of these probabilities will equal unity, with some small scatter due to estimating the probability from 819 samples.
We define the ratio of these probabilities as -Excess probability of drawing a 'low' weak lensing data vector in a 10-bin tomographic survey when insisting on a Gaussian likelihood. All COSEBIs here used span the angular range [1',300'], and include cosmic variance and shape noise. Plotted is the probability ratio of a COSEBI mode falling below or above its mean. For a symmetric Gaussian distribution, this ratio equals unity. The grey and white bands mark the different redshift bin combinations as indicated on the x-axis. The colourful points per band mark the probability ratio of the individual En modes per bin. As can be seen, in particular E 1 often has an excess probability of 20-25% of falling below its mean which is unexpected by a Gaussian likelihood. Moreover, the marker size of each mode is proportional to its signal-to-noise ratio (SNR), revealing that the COSEBIs contributing the highest SNR are most prominently biased low. The blue (purple) bars indicate the SNR-weighted (unweighted) average probability ratio of all modes within one redshift bin being biased low. This probability of being biased low increases whenever at least one redshift bin lies at large redshift, causing the visible sinusoidal pattern. Typical unweighted probabilities of being biased low amount to 10%. Weighting by the SNR, it can be seen that in 20-25% of all cases, the signal will be systematically lower than expected from a Gaussian likelihood. Roughly a quarter of a Euclid-or LSST-sized data set will thus be affected.
which is the 'excess' of COSEBIs taking values below their mean. Fig. 7 reveals that this excess probability for the COSEBIs deviates from unity. We observe that in particular the low COSEBI modes E 1 to E 4 are at least 10% more likely to fall below the mean than expected from a Gaussian. However, in particular for high redshifts, this excess can amount to 20-25%, in rare cases even exceeding 30%.
Assuming all COSEBI modes contributed with equal weight to the signal in a redshift bin, we compute the arithmetic mean per redshift bin This average excess probability ratio of all COSEBIs per bin falling below their mean is indicated as purple horizontal bars in Fig. 7. A sinusoidal pattern of this averaged excess can be seen, which originates from the ordering of redshift bin combinations: combinations with at least one bin at high redshift are systematically more likely to yield low values for COSEBIs. In comparison to a Gaussian, Fig. 7 indicates typically a 10% excess of yielding a low weak lensing data vector, if all seven COSEBIs are assigned equal weights while averaging. In reality however, the different COSEBI modes contribute a different signal-to-noise to the entire measurement. Computing an unweighted mean therefore neglects that certain COSEBIs are more influential for the final inference than others. To refine the study, we therefore proceed by computing the signal-to-noise per COSEBI, and then compute the signal-to-noise weighted expecta-tion for low COSEBI amplitudes. Fig. 8 depicts the signal-to-noise ratio for the different COSEBI modes in a 10-bin tomographic survey. In Fig. 7, the marker size of each COSEBI mode is proportional to its signal-to-noise. We define the signal-to-noise ratio as whereĒ n is the mean of a COSEBI mode, and σ(E n ) its standard deviation. In Fig. 8, the seven modes used in this study are indicated along the horizontal axes, whereas the redshift bin combinations are indicated on the vertical axis. As weak lensing integrates along the line of sight, the signal-to-noise increases with redshift. Also visible is that the information content of COSEBIs saturates with increasing n, implying that also within a single redshift bin combination the different COSEBI modes will have unequal impact on the total signal per bin.
To account for this inequality of modes, we compute the weighted average excess probabilitȳ which weighs modes according to their signal-to-noise. These weighted excess probability ratios for the weak lensing signal per bin to fall below its mean are marked by blue bars in Fig. 7. As the modes with highest signal to noise are precisely those that are the most biased, the weighted average for all COSEBIs taking values below their mean is larger than its unweighted equivalent: Fig. 7 reveals a 15-20% excess probability ratio of all The signal-to-noise that each En COSEBI contributes in a 10-bin tomographic survey with shape noise. Indicated along the horizontal axis are the redshift bin combinations, along the vertical axis the index n of the COSEBI E-modes. As expected from weak lensing being an integral effect, the highest signal-to-noise is carried by the highest redshift bin combinations. With increasing n, the COSEBI modes contribute ever less signal, motivating the truncation at n = 7. modes taking 'surprisingly low' values, if regarded with a Gaussian expectation. This trend exacerbates for high redshifts.
In total, the implications for a Euclid-or LSST-like survey are clear: given Fig. 7 there exists the choice between removing the modes of highest signal-to-noise from the data set in order to debias -or a non-Gaussian likelihood is required for a fully bias-free inference. As both surveys are conceived to yield a precision measurement of weak lensing, the removal of the highest signal-to-noise modes does not seem to be the natural solution.

CONCLUSIONS
The sampling distribution -and in consequence the likelihood -of any 2-point function is a priori expected to be skewed. This has been implemented in likelihoods of the CMB, see e.g. Planck Collaboration et al. (2019), and while Manrique-Yus & Sellentin (2020) developed a skewed core-likelihood for weak lensing, galaxy clustering and their cross-correlation, the standard in weak lensing analyses still employs a symmetric Gaussian likelihood. Skewness in a sampling distribution implies the measurement gained most likely will systematically differ from the average measurement. As already shown in , this affects weak lensing, and both the magnitude and direction of this bias are similar to the lowness of σ 8 as in comparison to Planck's measurements.
From a somewhat misplaced appeal to the Central Limit Theorem, one could hope that data compression reduces the bias arising from skewness. The hope therein is that large and small values of data points cancel during compression, thereby yielding a compressed data vector that scatters more tightly around its mean. This hope is somewhat misplaced, as the Central Limit Theorem implies that (potentially weighted) sums over random variables will tend towards Gaussianly distributed variables. This is still consistent with the statement that if all random variables happen to be 'low', then their sum will equally be 'low'.
Cosmological data compression often takes the form of filtering, where the scalar product with the filter causes the weighted averaging. Accordingly, if the original signal is low, the filtered signal will also be low, unless the filter has specifically been constructed to cancel the low values. This argument holds for all filter-based compression techniques (Heavens et al. 2000(Heavens et al. , 2017(Heavens et al. , 2020Alsing & Wandelt 2018) and in weak lensing also for COSEBI filters (Schneider & Kilbinger 2007;Schneider et al. 2010).
COSEBIs were originally designed to split weak lensing E-and B-modes. A priori, they are thus not made to suppress biases from skewness, but could coincidentally do so. Accordingly, this paper studied whether COSEBIs suppress biases from sampling distribution skewness.
In Sect. 7.1 we focussed on a Euclid-or LSST-like 10 bin tomoraphic weak lensing survey and showed that the skewness of a given COSEBI mode's sampling distribution increases with redshift. Up to a unit redshift, which comprises the majority of all galaxies observed by KiDS and DES, no significant skewness is detected, implying a COSEBI analysis of the current surveys may use a Gaussian likelihood. In contrast, roughly half of the galaxies observed with LSST or Euclid will fall into redshift ranges where the skewness of the COSEBI distribution is detected at 6σ significance.
In Sect. 7.2 we provide evidence that weak lensing observables are drawn from a much more skewed distribution than CMB observations of the same multipole. While the CMB likelihood for spherical harmonics power spectra C Gaussianizes at ≈ 20, the weak lensing likelihood remains skewed until ≈ 100.
The skewness of 2-point function sampling distributions implies that if the data are analyzed with a symmetric Gaussian likelihood, then one will often find 'surprisingly low' amplitudes. This arises since the Gaussian misestimates how frequently the data will take values below their mean. In Sect. 8 we showed that if a Gaussian likelihood is insisted upon for a Euclid-or LSST-like survey, then in 10% of all cases the data points will appear to be systematically low. Particularly affected are those COSEBI modes which contribute the highest signal-tonoise, such that in the high signal-to-noise regime systematically low amplitudes are to be expected in up to 25% of the cases. In total, we conclude that the next generation of surveys has the choice between lowering the signal-to-noise by discarding the modes which are affected by skewness-induced biases, or to retain these modes and develop a non-Gaussian likelihood for their analysis.
The skewness here discussed is known from analyses of the cosmic microwave background; as such, it is fundamentally different from gravity-induced non-Gaussianity.
The interested reader may find a further discussion of gravitationally induced non-Gaussianity in weak lensing in appendix B.
COSEBIs were initially introduced in order to split E-and B-modes in a weak lensing survey. Their filters are polynomials that are linear or logarithmic in the angular scale θ. As linear COSEBIs have been shown to be significantly less efficient in retaining cosmological information (Schneider et al. 2010), we directly used logarithmic COSEBIs in this paper.
Unlike a correlation function that is a continuous function of angular separation θ, COSEBIs are set of discrete modes, labelled by an index n. This index is accordingly indicated in all relevant figures. We denote modes that only contain E-type power as E n , and COSEBI B-modes by B n . If a cosmology of cold dark matter (CDM), and a non-standard dark energy equation of state w is to be studied, then expanding COSEBI modes until n = 7 is sufficient (Asgari et al. 2012).
To derive COSEBIs via their property of splitting E-and B-modes, we introduce an E-mode power spectrum P ij E ( ), and a B-mode power spectrum P ij B ( ), where ij determines the tomographic bin combination. Weak lensing cannot induce a B-mode power spectrum, and it is likewise hoped that no systematics contribute to the E-mode power spectrum. In this case it then follows that P E ( ) equals Eq. (3).
In terms of both P ij E ( ) and a putative P ij B ( ), the weak lensing correlations functions ξ ± are given by and implying they will mix E-and B-modes, if B-modes are present (Schneider & Kilbinger 2007). To create a linear filter that splits E-from B-modes, the ansatz is chosen. The filter functions T (n) ± (θ) thus depend on the index n which determines the order of the COSEBI mode. To construct a filter which splits E-from B-modes, we transform the filters T This allows us to read off the defining property of a filter that splits E-and B-modes, namely the equality W (n) such that the angular brackets in Eq. (A5) vanish, thereby enforcing the split.
Fortunately, infinitely many solutions for the constraint Eq. (A6) exist. This enables the specification of further advantageous side constraints on the filter. COSEBIs use this freedom to impose additionally that only a finite range in θ be used. The integrals in Eq. (A3) will then only run over the interval [θ min , θ max ] rather than from zero to infinity. By setting [θ min , θ max ] to within the bounds of the observation, we can thus enforce that only observed ranges are used.
To exclude angular ranges, the filter T (n) + (θ) ≡ 0 outside [θ min , θ max ], implying the filter's zero amplitude does not let pass data from outside these angular ranges. As T (A7) such that also the T (n) − does not let pass data outside the chosen angular range. The two constraints Eq. (A6) and Eq. (A7) are of direct importance, as they enforce the E/B-mode split and the usage of observed angular ranges only. Schneider et al. (2010) furthermore impose the constraint that the sequence of filters be orthonormal, which is convenient, but not necessary. Importantly however, Schneider et al. (2010) show that cosmological information is better retained if the T-filters are logarithmic in θ, thus motivating the last transformation T (n) + (θ) = t (n) + (z), z = ln(θ/θ min ). (A8) At this stage, Eq. (A6) till Eq. (A8) fully specified the scientific aims to be met with COSEBIs. The constraints following therefrom can now be met, implying the COSEBI-filters can now be explicitly constructed. To do so, we follow Eq. (28) -Eq. (39) of Schneider et al. (2010), such that we compute our polynomial COSEBIs via their roots. This yields the filters shown in Fig. 1 and Fig. 2 which are used to gain the results on skewness in this paper.

DISCUSSION OF GRAVITATIONAL SKEWNESS
This paper has focused on a quantification of skewness in the sampling distribution and likelihood of 2-point functions derived from weak lensing observations. This quantification is required to understand biases during parameter inference. Nonetheless, it transpired that much interest exists to also understand the role of gravity in this skewness.
This interest is in particular caused by the excess skewness of weak lensing distributions (until ≈ 100), in comparison to the likelihood of the CMB power spectrum, which is only noticeably skewed until ≈ 20. While the impact of shape noise and cosmic variance on the skewness has already been factored apart in , gravity is often put forward as a potential additional cause, and we shall thus discuss it here.
We here observed the trend that excess skewness increases with redshift, which is also consistent with previous studies on gravitationally caused non-Gaussianity: Schmit et al. (2019) show that the gravitational bispectrum is dominated by triangles of the squeezed and elongated type, consistent with our observed redshift trend, and Munshi et al. (2020) show that the weak lensing bispectrum is also dominated by squeezed configurations. The amplitude of the weak lensing bispectrum equally grows with redshift, again consistent with the findings of this paper. We limit ourselves to describing these as 'consistencies' and refrain from attributing the excess skewness to gravity alone.