Sheer shear: weak lensing with one mode

3D data compression techniques can be used to determine the natural basis of radial eigenmodes that encode the maximum amount of information in a tomographic large-scale structure survey. We explore the potential of the Karhunen-Lo\`eve decomposition in reducing the dimensionality of the data vector for cosmic shear measurements, and apply it to the final data from the \cfh survey. We find that practically all of the cosmological information can be encoded in one single radial eigenmode, from which we are able to reproduce compatible constraints with those found in the fiducial tomographic analysis (done with 7 redshift bins) with a factor of ~30 fewer datapoints. This simplifies the problem of computing the two-point function covariance matrix from mock catalogues by the same factor, or by a factor of ~800 for an analytical covariance. The resulting set of radial eigenfunctions is close to ell-independent, and therefore they can be used as redshift-dependent galaxy weights. This simplifies the application of the Karhunen-Lo\`eve decomposition to real-space and Fourier-space data, and allows one to explore the effective radial window function of the principal eigenmodes as well as the associated shear maps in order to identify potential systematics. We also apply the method to extended parameter spaces and verify that additional information may be gained by including a second mode to break parameter degeneracies. The data and analysis code are publicly available at https://github.com/emiliobellini/kl_sample.


INTRODUCTION
Imaging galaxy surveys are one of the most promising paths towards understanding some of the main open questions in cosmology (Hu 1999;Amara and Réfrégier 2007), such as the precise nature of the energy components dominating the late-time expansion (Linder 2003), the level of non-Gaussianity in the cosmic inhomogeneities induced by early-time physics (Dalal et al. 2008;Alonso and Ferreira 2015), or the mass of neutrinos (Hu et al. 1998). To do so, these experiments combine the constraining power of several cosmological probes, such as Type-Ia supernovae, strong lenses, the distribution of galaxy clusters, the clustering of galaxies, and the weak gravitational lensing effect through the correlated distortion of galaxy shapes.
Current and near-future photometric surveys, such as the Dark Energy Survey (DES Collaboration et al. 2017), the Kilo-Degree Survey (Hildebrandt et al. 2018), the Hyper Suprime-Cam (Hikage et al. 2018), Euclid (Laureijs et al. 2011) or the Large Synoptic Survey Telescope (LSST) (LSST Science Collaboration et al. 2009) face a number of observational, statistical and computational challenges in order to extract the maximum amount of cosmological information from them without incurring significant systematic biases. Arguably the most critical challenge for these experiments is the correct treatment of the photometric redshifts (Newman et al. 2015;). An object by object analysis would be expensive and the broad nature of the lensing kernel dilute the information contained by each galaxy. In addition, we lack perfectly accurate posterior redshift prob- * emilio.bellini@physics.ox.ac.uk ability distributions for each object. This is why a socalled tomographic approach (Hu 1999) has become the standard analysis path. In this case, the full galaxy sample is divided into photometric redshift (photo-z) slices, and cosmological constraints are extracted from the different cross-correlations between the 2D galaxy positions and shapes in all pairs of redshift bins. In this approach, the problem of obtaining and interpreting accurate posterior redshift distributions is replaced by the potentially simpler problem of estimating the redshift distribution of all objects in each redshift bin. The number and distribution of the different redshift bins used in the analysis is then determined as a compromise between the needs to maximize the amount of information recovered and to minimize the size of the final data vector (in the form of a collection of all auto-cross-bin two-point functions).
In Alonso (2018), we proposed a method to determine the optimal slicing for a photometric survey that recovers the maximum amount of information in the minimum number of radial modes. The method is based on the Karhunen-Loève transform, and follows a formalism similar to other 3D data compression methods 1 proposed in the literature (e.g. Heavens 2003;Kitching et al. 2007;McEwen and Leistedt 2013;Khalid et al. 2014;Leistedt et al. 2015;Asgari and Schneider 2015). The resulting modes are in general scale-and redshift-dependent, and in the case of spectroscopic galaxy clustering can be related to the standard harmonic-Bessel transform that leads to the standard 3D power spectrum analysis, see a.g. Alonso (2018). In the case of cosmic shear, due to the radially cumulative nature of gravitational lensing, the Karhunen-Loève transform can produce a significant reduction in the dimensionality of the final data vector, which has a direct impact on the computational complexity of estimating its covariance matrix (beyond guaranteeing an optimal recovery of cosmological information).
Two main approaches have been used in the literature to attack the problem of covariance matrices for largescale structure and weak lensing data. On the one hand, the sample covariance matrix can be computed by running a large number of simulations resembling the data, estimating the data vector in each of them, and estimating their sample variance. In order to recover a converged covariance that can be safely inverted, this requires running a number of simulations several times larger than the size of the data vector (Dodelson and Schneider 2013;Petri et al. 2016) (O(10) times larger is often a good rule of thumb). Since the size of a complete set of tomographic cross-correlations scales as ∼ N 2 bin with the number of tomographic bins, the computational complexity of estimating the sample covariance can grow very fast if a large number of bins is required in order to recover the cosmological information. On the other hand, it is also possible to compute analytical estimates of the covariance matrix. These analytical covariances (Krause and Eifler 2017) are often faster to compute than the simulated ones, but they can also be computationally challenging, since it is often necessary to accurately account for the mode-coupling caused by the survey mask (e.g. see García-García et al. 2019), and real-space covariances can require expensive double Hankel transforms. Since all elements of the covariance matrix must be estimated individually, the complexity for analytical covariances scales with the square of the size of the data vector, or ∼ N 4 bin . The problem of estimating covariances can therefore benefit from efficient compression methods .
As a proof of concept for the potential of this method, in this paper we apply it to the analysis of cosmic shear data from the Canada-France-Hawaii Telescope Lensing Survey (hereafter refered to as CFHTLenS, . Even if more up-to-date cosmic shear data are available, the choice of CFHTLenS has been made mainly because other 3D data compression methods have been applied in the past (Kitching et al. 2014), which makes possible a comparison between different methods. We build upon the tomographic analysis of  and present the performance of our method in terms of optimal radial eigenmodes, compression factor, final constraints and goodness of fit. The paper is structured as follows: in Section 2 we provide the theoretical background for weak lensing tomographic analyses and the Karhunen-Loève transform as a data compression method. Section 3 then introduces the data used in our analysis, as well as the different data products we derive from it. Our main results are presented in Section 4, which we discuss and summarize in Section 5.

Cosmic shear 2-point functions
The cosmic shear field for galaxies observed in a given tomographic redshift bin α characterized by a redshift distribution n α (z) is given by where Φ(χn) is the gravitational potential in the lightcone towards a sky position with spherical coordinates (θ, ϕ) defined by the unit vectorn, χ H is the comoving radial distance to the horizon. The lensing kernel q α in flat space becomes Here, z is redshift, and ð is the spin-raising differential operator, which acts on a spin-s quantity defined on the sphere s f (n) as 2 The harmonic decomposition of a spin-2 quantity like γ gives rise, in general, to both E-and B-modes (e.g. see Zaldarriaga and Seljak 1997). Since gravitational lensing only gives rise to negligible B-modes, we will only concern ourselves with the E-mode component here (although our data analysis pipeline computes the B-mode component as a test of systematics).
Since the gravitational lensing that shears the observed shape of a galaxy at redshift z is caused by the collective effect of all the matter inhomogeneities along the line of sight to that redshift (manifested by the cumulative nature of the lensing kernels q α in Eq. 2), the weak lensing signal in different redshift bins is usually strongly correlated. The effective number of degrees of freedom in a tomographic analysis therefore increases very slowly with the number of redshift bins, which motivates the use of radial data compression, as described in Section 2.2, to determine the optimal radial modes that saturate the information content of a given weak lensing survey.
The two-point correlator of the weak lensing E-mode signal in two redshift bins α and β at multipole can be related to the power spectrum of matter fluctuations P (k, z) as (Kilbinger et al. 2017) where S stands for Signal, see Eq. 5 below, and a(χ) is the scale factor at the comoving distance χ, H 0 is the Hubble constant and Ω M is the matter abundance at z = 0. In deriving Eq. 4 we have used the Limber approximation 3 (Limber 1953; Afshordi et al. 2004), and assumed a flat cosmology. Under the assumption that the observed shear is a combination of the true underlying signal and stochastic noise, the total power spectrum is also a sum of the signal and noise contributions: We assume the noise power spectrum to be uncorrelated across bins (i.e. N αβ ∝ δ αβ ), and we estimate it as described in Section 3. The shear angular power spectrum can then be related to the real-space correlation functions ξ ± as where J ν (x) is the cylindrical Bessel function of order ν, and we have used the flat-sky approximation, which is appropriate for the small angular footprint covered by the CFHTLenS survey, i.e. ∼ 154 deg 2 .
To estimate the theoretical predictions for all two-point functions used in this work we used the Core Cosmology Library (CCL) (Chisari et al. 2018), with the matter power spectrum computed by CLASS (Lesgourgues 2011) using the version of Halofit by Takahashi et al. (2012).
2.2. The Karhunen-Loève transform and radial data compression Suppose we have a data vector x with dimension N s , that we assume Gaussianly distributed with zero mean, and whose covariance C ≡ xx † depends on a particular parameter θ we want to measure. It is possible to find a linear transformation E such that the elements of the transformed data vector y ≡ E † x have unit variance and are uncorrelated (i.e. y p y * q = δ pq ), and such that the first m < N s elements contain most of the information about θ (Tegmark et al. 1997;Bond 1995;Vogeley and Szalay 1996). The columns of E, which we label e p , are the eigenvectors of the following generalized eigenvalue problem: where ∂ θ denotes the partial derivative with respect to θ. The resulting linear transformation is the so-called Karhunen-Loève (KL) transform, which can be used to simplify the analysis for one particular parameter. In a more general situation, we would like to measure several parameters simultaneously. Although under certain circumstances it may be possible to find a set of KL eigenmodes that maximize the overall information content, as described in Tegmark et al. (1997), a simpler and popular approach is to instead maximize the information on the overall amplitude of the signal. For concreteness, let us decompose our data into signal and noise contributions x = s+n, and let us introduce a fictitious amplitude α with fiducial value 1 such that in general x = αs + n. Under the assumption that signal and noise are uncorrelated, and redefining 2/(2 − λ p ) → λ p , we can rewrite the generalized eigenvalue problem in Eq. 7 as where C = S+N is the covariance matrix of x as a sum of the signal and noise covariances. Using the Cholesky decomposition of the noise covariance N = LL † , and definingẽ p ≡ L † e p , the previous equation can be rewritten as a standard eigenvalue problem forẽ p : Solving forẽ p , the new data vector can then be computed as y p =ẽ † p L −1 x. The covariance of the new data vector is given by the eigenvalues y p y * p = λ p δ pp , and the transformed noise covariance is simply the identity.
As described in Alonso (2018), this formalism can be used to determine the optimal set of radial modes in a given photometric redshift survey. In this case, our data vector x is the vector of E-mode harmonic coefficients of the cosmic shear field in different redshift bins for a fixed ( , m), and S, N and C are the signal, noise and total power spectra described in the previous section at the corresponding value of . The KL eigenmodes e α p, are therefore scale-dependent functions in general. However, as shown in Alonso (2018), this scale dependence is very mild for cosmic shear data, and in this case the eigenmodes can be thought of as redshift-dependent galaxy weights that can be used to generate the set of most informative shear maps. We will explore this in detail in Section 4. In this case, each of the KL modes can be interpreted as a shear map associated with an effective redshift distribution given byñ p (z) = e α p n α (z).

DATA
3.1. The CFHTLenS lensing sample We use data from the CFHTLenS survey 4 . Their analysis performed weak lensing data processing with THELI , shear measurement with lensfit , and photometric redshift measurement with PSF-matched photometry . A full systematic error analysis of the shear measurements in combination with the photometric redshifts is presented in , with additional error analyses of the photometric redshift measurements presented in .
CFHTLenS is a deep multi-colour survey that spans 154 deg 2 of the sky divided in four fields -labelled W1, W2, W3 and W4. The full CFHTLenS catalogue contains ∼ 2 × 10 7 objects (precisely 22,652,728) that have been classified as "galaxy", "star" or "bad fit object" by the lensfit algorithm (see Miller et al. 2013 for details). Furthermore, some of the objects are in masked regions of the sky due to observational systematics, or their position is in MegaCam fields that did not pass the lensing residual systematics effects, as described in . Thus, in any shear analysis that uses CFHTLenS data, the catalogue has to be cleaned from objects that are classified as stars or lie on unreliable sky regions. In detail, the filters we used in order to obtain a clean catalogue are: • MASK= 0: to select only unmasked objects; • weight> 0: objects that have been measured with strictly positive weight; • star flag= 0: remove stars and bad fits, and keep only galaxies;   Table 1 Redshift bins used in this analysis along with the effective density n eff per bin and field. The last column shows the total n eff per bin.
• remove objects with positions in the 1 deg 2 camera MegaCam fields that do not pass the lensing residual systematics effects. For a list of those fields see .
After using these filters, the clean catalogue reduces to 4,180,003 objects, which we use to estimate the lensing two-point statistics.
The ellipticities of each source, as well as their additive (c i ) and multiplicative (m i ) correction biases and an approximately optimal inverse variance weight (w i ) are obtained using the Bayesian model fitting software lensfit . Photometric redshifts are derived from the Bayesian photometric redshift code BPZ (Benítez 2000), which returns a full redshift probability distribution p(z) for each object. As in , we split the sample into 7 tomographic bins, using the peak of the redshift distribution of each galaxy (z B ) as a redshift bin label. Table 1 shows the redshift bins chosen along with the effective density of galaxies in each bin, defined as Here the effective area (A eff ) per field is [41.0, 11.6, 24.7, 12.2] deg −2 for W1-4 respectively. The redshift distribution for each tomographic bin was obtained, as in , by stacking the photo-z p(z) of all unmasked objects in that bin. The resulting normalized redshift distributions are shown in Figure 1.

Masks and maps
In this section we describe in detail the method used to generate masks and maps of shear measurements, which are then used to compute the power spectra. As mentioned, the relative small area covered by CFHTLenS allows to use a flat-sky approximation. In this case, masks and maps are 2D arrays of pixels (one array for each field). In each pixel, the masks quantify the quality of the CFHTLenS measurements in a given sky area, while the maps store the value of the two shear components estimated by averaging over the ellipticities of all sources in the pixel.
To define our sky mask, we start from a set of individual sky masks for each of the 4 CFHTLenS fields. These have a resolution of 1 arcsecond and allow us to remove problematic areas, such as those outside the CFHTLenS footprint, around stars and stellar halos, significant object overdensities, flagged pixels, and others. The full description of the masked areas can be found in Table B2 of . While in  the authors mention that it is possible to consider as unmasked areas those that lie in large masks around stars and stellar haloes (mask value of 1), here we adopt a more conservative approach, and consider only areas with mask value of 0. We also remove by hand all pixels falling inside any of the pointings identified by Heymans et al. (2012) as being systematics dominated (corresponding to a total of ∼ 25% of the available area). Once all sources in masked pixels are removed from the data, we downgrade the resolution of this binary mask to a pixel size of 2 arcminutes, while preserving the information in the finer map in terms of the fractional area masked in each larger pixel. We chose this resolution to guarantee that the majority of the unmasked pixels will be populated by, on average, four galaxies in each redshift bin. Finally, to optimally weight each pixel when computing the angular power spectra, for each field and redshift bin we define the weights map used to compute the power spectrum (see Section 3.3) to be proportional to the sum of the lensfit weights of all sources in each pixel. Figure 2 shows the mask and weights map used in this analysis for a particular field (W3). The top panel shows the 2 arcminutes mask obtained by downgrading the original 1 arcsec binary mask. This is the mask that has been used to calculate the area of each field used in Eq. 10. The bottom panel represents the weights map described above, used by the power spectrum estimator. It is worth noting that the weights map is therefore different for each redshift bin, and the associated modecoupling matrix (described in the next section) must be computed independently for each pair of bins.
To compute power spectra we also need to produce maps of the two shear components γ 1 (n) and γ 2 (n). A first estimate of these maps is made as a weighted average of the ellipticity components of all sources falling within each pixel corrected for their measured additive and multiplicative shape measurement bias c i,g and m p where g stands for galaxy and p for pixel. While the additive correction can be applied on a object by object basis (noting that c 1 = 0 in the CFHTLenS analysis), m must be estimated by averaging over a large number of sources to avoid a large bias in the final maps ). Thus, we make a map of m p for each field and redshift bin by averaging, for a given pixel p, over the multiplicative bias parameter of all sources lying within a square of size 10 arcmin centred around p. In Figure  3 we show the multiplicative correction map for the W3 field and the sixth redshift bin (the bin with the highest shear signal-to-noise ratio). It is possible to notice that this gives up to a ∼ 10% correction to the shear measurements. Finally, Figure 4 shows the shear maps used in this analysis for the W3 field and the sixth redshift bin, with γ 1 and γ 2 shown in the top and bottom panels respectively.
3.3. Two-point functions We estimate all EE and BB cross-power spectra between the 7 different redshift bins using a pseudo-C estimator as implemented in NaMaster . We describe the estimator briefly here and refer the reader to the original paper for further details.
The pseudo-C estimator attempts to make an unbiased measurement of the power spectrum of a masked field by analytically computing the coupling between modes induced by that mask. Mathematically, let f v (n) = v(n) f (n) be the observation of an underlying true field f through a multiplicative weights map v. Minimally, this weights map will be a binary mask removing  regions where the field has not been observed (v = 0) leaving only the observed pixels (v = 1). More optimally, the weights map should be proportional to the local inverse noise variance, downweighting or upweighting different regions depending on their noise properties. In our analysis, the weights map is given by the sum of lensfit weights in each pixel. Due to the convolution theorem, a direct estimate of the power spectrum from the masked fieldsC f g ignoring 10 −10 10 −9 10 2 10 3 C (7, 7) Figure 5. Measurements of the E-mode power spectra estimated from CFHTLenS data. Red points and red solid error bars indicate positive C , while blue triangles and dashed error bars refer to negative C . Solid black lines show the theoretical predictions for the best fit cosmology of . Gray shaded areas represent -bandpowers that have been discarded in this analysis.
No. 1310 -2300 7 2300 -5100 Table 2 We show the -bandpowers used in this analysis. The first column indicates the number of the bandpower, the second one the -range that each bandpower covers and the last column indicates which bandpowers were actually used to get cosmological constraints.
their masks, will in general yield a mode-coupled version of the underlying true power spectrum where the mode-coupling matrix M vw depends on the masks v and w, and can be estimated analytically. The mode-coupling matrix is usually ill-conditioned and cannot be directly inverted to yield an unbiased estimate of C f g . The usual approach is then to bin the pseudo-C into bandpowers: For this work we use the same bins chosen by Köhlinger et al. (2016) 5 and summarized in Table 2. This choice was motivated by the need to capture the most relevant power spectrum features while staying within the mildly non-linear regime. For sufficiently wide bandpowers, the corresponding binned coupling matrix becomes invertible, and the final estimator takes the form whereÑ f g is an estimate of the noise pseudo-power spectrum (the so-called "noise bias"). This estimator, described here for scalar fields on the sphere, can be easily generalized to spin-2 quantities and flat skies, which we use in this work. We refer the reader to  for further details. Finally, as in Hikage et al. (2018), we estimate the noise bias in Eq. 15 by averaging over the power spectrum measured in 1000 noise realizations, generated by randomly rotating the ellipticities of all sources in the catalogue.  10 2 10 3 10 6 C BB (7, 7) Figure 6. Same as Fig. 5 for the B-mode power spectra. Here, the grey areas represent the -bandpowers that were excluded in the analysis, where a detection of B-modes should not be taken into account. We do not find statistically significant B-modes, with a PTE of ∼ 10% for the measurements shown. Gray shaded areas represent -bandpowers that have been discarded in this analysis.
This procedure allows us to estimate the power spectra in each of the 4 CFHTLenS fields. The combined power spectra are then computed as an inverse-variance sum of the per-field power spectra: Here C f is the vector of all power spectra computed in field f , Σ f is the power spectrum covariance matrix in field f , described in the next section, and C tot and Σ tot are the combined power spectra and covariance matrix. The resulting measurements of the E-mode power spectra estimated from the CFHTLenS data are shown in Figure 5, where the error bars are computed from the covariance matrix described in Section 3.4. We find good agreement with the best-fit cosmological model of  (solid lines). It is worth noting that this is the first time that a pseudo-C analysis of these data has been carried out.  used real-space correlations measured on these same redshift bins, while Köhlinger et al. (2016) used an optimal quadratic estimator on two redshift bins that would be computationally impractical for a larger number of subsamples. After implementing the scale cuts shown in Figure 5, the final data vector has 140 elements.
We use three sanity checks to validate the power spectrum estimation. First, running our pipeline on the set of 2000 Gaussian simulations used to calculate the covariance matrix (described in Section 3.4), we verify that we are able to recover the input power spectra with no appreciable bias (this was also thoroughly verified in . Secondly, we estimate the Bmode power spectrum of all pairs of redshift bins in the CFHTLenS data. Since the cosmological signal is not expected to yield a detectable B-mode signal, the presence of B-modes can be used as a diagnostic for unknown systematics. Within the range of scales used in this analysis we do not detect significant B-modes, with a probability to exceed (PTE 6 ) associated with the measurements shown in Figure 6 of ∼ 10%. Lastly, with our pipeline we replicated the analysis done in Köhlinger et al. (2016), and the resulting E-B-modes match perfectly. Other Bmode analyses of the CFHTLenS data have been carried out in Kitching et al. (2014),  and Alsing et al. (2017).
Although our main analysis uses Fourier-space observables, we also present results for the measurements of the correlation function made by , in order to make a direct comparison with those results. The measurements correspond to the 7 tomographic bins from z = 0.15 − 1.3 and 7 angular bins from θ = 1 − 120 arcmin, which post-masking results in a data vector of 280 elements. The correlation functions ξ ± were computed for each pair of redshift bins in each of the four fields using the athena software 7 (Kilbinger et al. 2014). These measurements include the same additive shear correction as in the Fourier space analysis, and moreover include a weighted ensemble average multiplicative shear correc- The bin pairs are shown in row-major order (i.e. ((1, 1), (1, 2), ..., (1, 7), (2, 2), (2, 3), ..., (6, 7), (7, 7))).
tion following  and . The patch-dependent measurements were subsequently averaged by the number of galaxy pairs returned by athena.

Covariance matrix
To estimate the covariance matrix of the power spectrum measurements we make use of 2000 Gaussian simulations of the 4 fields. Each simulation is generated in two steps: 1. We generate a Gaussian realization of the expected cosmic shear signal γ. To do so, we draw Gaussian realizations of the Fourier-space E-modes with a variance given by the E-mode power spectrum predicted by the best-fit cosmological parameters of , and no B-modes. These are then transformed into real space to produce a shear map. Note that the Gaussian Fourier coefficients are not drawn independently for each redshift bin, but fully preserve the expected correlation between them.
2. We then interpolate the Gaussian shear field to the position of each source in the original catalogue. The total ellipticity of each source is then calculated by adding, to the interpolated shear value, the galaxy's true ellipticity rotated by a random angle. This last steps allows us to include shape noise in a realistic way (under the assumption that the ellipticity of a single object is dominated by noise, and not by the lensing-induced shear).
Each simulated catalogue is then put through the same power spectrum pipeline used for the real data, including masking, power spectrum estimation and field coaddition. We use the power spectra measured in each simulation to estimate the sample covariance matrix as: where N s is the number of simulations used to estimate the covariance, C i is the vector of power spectra measured in the i-th simulation and · denotes averaging over simulations. When using this covariance to compute Gaussian likelihoods, we correct the inverse covariance by the multiplicative 'Hartlap factor' introduced by Hartlap et al. (2007) to account for the finite number of simulations used. As shown by Sellentin and Heavens (2016), this approximation is sufficiently accurate given the large number of simulations used in our analysis. This was explicitly confirmed in . Note that, as described in the next section, we use a maximum of N s = 2000 simulations when analysing the full set of power spectra, but we scale this number down with the size of the final data vector when exploring cases with different numbers of KL modes. This is to showcase the usage of data compression to reduce the number of mock realizations needed to estimate the covariance matrix while retaining the final constraining power. In detail, the number of simulations used for each case is chosen to keep the 'Hartlap factor' constant.
We neglect all non-Gaussian contributions to the covariance matrix. The only relevant contribution for weak-lensing measurements, as quantified by Barreira et al. (2018) is the so-called super-sample covariance, which becomes less important for low-density samples where the measurements on high-s are noise-dominated. This fact, together with the acceptable value of the χ 2 we obtain both for the fiducial and best-fit cosmological parameters (see Section 4) imply that the covariance matrix estimated with this method is appropriate for our analysis. It is worth noting that our objective is not to provide state-of-the-art constraints on cosmological parameters from these data, but rather to showcase the performance of the KL decomposition in terms of data compression. The resulting E-mode power spectrum covariance is shown in Figure 7. For a given auto-or crossspectrum, the covariance is strongly dominated by the diagonal.
In order to compare our results with those of Joudaki et al. (2017), we follow their estimation of the covariance matrix for the real-space correlation functions as closely as possible. In this case, the covariance matrix is estimated from a large suite of 497 distinct Scinet LIght Cone Simulations (Harnois-Déraps and van Waerbeke 2015) based on a WMAP9+BAO+SN cosmology. Following 2 × 2 sub-division of the simulation boxes, the resulting set of 1988 pseudo-independent mock CFHTLenS shear catalogues correspond to a 'Hartlap factor' of 0.86, and have the correct effective source density, shape noise, photometric redshift scatter, small-scale masking, and other properties that match the real survey. Given the different effective areas of the four CFHTLenS regions, the final covariance was obtained through area-weighted averaging of the patch-dependent inverse covariances. We refer the reader to  for further details. The number of simulations used for the analysis of different sets of KL modes was scaled with the size of the data vector as described in the case of the Fourier-space measurements. Before presenting the main result of this paper: i.e. the level of data compression that can be achieved through the KL transform, it is useful to review the key steps of the method, as described in Section 2.2, and how we apply them in practice to the data. Figure 8 shows the diagonal part of the angular E-mode power spectra, C αα , as computed by CCL, together with the expected noise N αα as a function of the multipole for the best fit parameters of , which we use here as our fiducial cosmology. The fact that the noise dominates over the signal for a wide range of scales, and that the off-diagonal signal terms (not shown here) have amplitudes that are similar to the diagonal (given the radially cumulative nature of weak lensing), indicates that different modes are tightly correlated and therefore that the effective number of signal-dominated independent radial modes should be small (i.e. the information carried out by the angular power spectrum is inefficiently spread out all over the redshift bins).

The Karhunen-Loève transform
As described in Section 2.2, the two advantages of working with KL-transformed power spectra, is that ideally some combination of the redshift bins is constructed to have uncorrelated radial modes and that the information is compressed in the first few modes. This second feature can be seen in Figure 9, where we show thedependence of the eigenvalues of Eq. 9, which represent the KL-transformed power spectra (labelled Λ α here). This quantity measures the amount of signal to noise present in each mode, where the KL-transformed noise power spectrum is simply unity (δ αβ ) for all s by construction. It is clear that the majority of the information is concentrated in the first two bins, while the information we lose by considering only the first three bins is almost negligible and the remaining bins are noise dominated.
A further simplification comes by assuming our KL transform to be -independent, as proposed in Alonso (2018). This is necessary in real space, where multipoles are replaced by angles θ 8 , but they can be safely used even in Fourier space. This is shown in Figure 10, where we plot the first three principal eigenvectors of Eq. 9 as a function of redshift. Each colour represents a single eigenvector, while different shades refer to different s: lighter colours for low-s and darker ones for high-s. Excluding the first few s (where cosmic variance dominates the uncertainties and little information can be recovered), all eigenvectors for the same α clearly have the same shape. Moreover, the blue lines, which represent the first eigenmode and dominate the information content, are the least scattered even at low-. Then, an averaged version of the KL transform e α p, can be written asē α p = (2 + 1) e α p, where the 2 + 1 factors are included to roughly account for the statistical weight of each harmonic multipole. When performing the real-space analysis, we used this -independent KL transform. In Fourier space we tested both options, and we will show that dropping the scale dependence of the eigenvectors makes little or no difference. The measured power spectra for the first three KL eigenmodes using a scale-dependent KL transform are shown in Figure 11, together with the theoretical prediction corresponding to the fiducial cosmological model used in the analysis. This illustrates how the method concentrates the bulk of the signal-to-noise into the first few modes.
Finally, it is worth emphasizing that, since a critical use of data compression is to reduce the number of simulations needed to estimate the covariance matrix, when analyzing different compression schemes in the next section, we used a covariance matrix computed from a reduced set of simulations, scaled from the full set by the size of the data vector. This allows us to show that achieving a compression factor f will allow us to carry out the analysis using f times fewer simulations.

4.2.
Observational constraints from the KL decomposition In order to study the efficiency of different data compression schemes, we explore the posterior distribution of cosmological parameters using an MCMC sampler 9 . We assume that the two-point functions (both in realand Fourier-space) follow a Gaussian likelihood. This is known to be formally incorrect (Sellentin and Heavens 2018;Hamimeche and Lewis 2008), and can give rise to parameter shifts below the 1σ level. For the purposes of this analysis (i.e. to show the performance of the KL transform in terms of data compression) this effect can be ignored, particularly given the fact that most of the constraining power is obtained from the smaller scales, where the large number of available modes makes the Gaussian approximation acceptable due to the central limit theorem. As in , we use a standard ΛCDM model and vary five cosmological parameters: the Hubble parameter h, the physical baryon density parameter Ω b h 2 , the physical cold dark matter density parameter Ω c h 2 , the primordial comoving curvature power spectrum amplitude A s and the spectral in-  dex n s . The priors used in this analysis are uniform and reported in Table 3. As investigated in , the impact of priors is non-negligible in determining the allowed parameter space. However, since the scope of this work is just to show the goodness of the KL transform in reducing the data vector without losing any significant information, we choose to keep these priors fixed to those used in previous analyses for all our runs.
In Figure 12, we show the posterior distribution for the parameters Ω m ≡ Ω c + Ω b and S 8 ≡ σ 8 Ω m /0.3 obtained with the analysis in Fourier space and with an -dependent KL-transform. The derived quantity S 8 is of particular interest in weak lensing analyses since cosmic shear has the most constraining power on this parameter and it has proven to be less dependent on the specific choice of priors. We show a few representative compression schemes: (i) full is the case where no KLcompression is used, (ii) in 7 diag we consider all 7 radial KL modes but only using their auto-correlations, (iii) in 2 off we use the first 2 radial modes but we consider their auto-and cross-correlations, (iv) 2 diag is the same as 2 off but using only auto-correlations, and (v) 1 is the maximum compression case, where we consider only the first KL mode. Contours for the same parameters are shown in Figures 13 and 14 for the Fourier analysis with -independent KL-transform and the real-space analysis respectively, where the latter also compares the real-space and Fourier-space results directly. Several conclusions can be drawn from these results: • We are able to place similar constraints on the most relevant cosmological parameters in terms of final uncertainties in all cases.
• We observe small shifts in the maximumlikelihood parameter values for different compression schemes. These are well within the 1σ level, and are to be expected, since different compression schemes effectively use different sectors of the data vector. A further source of random shifts is the use of different sets of simulations to compute the covariance matrix for the different data compression schemes. We checked using different simulations to compute the covariance matrix that these shifts can be considered as statistical fluctuations.
• We obtain qualitatively equivalent results in terms of data compression efficiency using -dependent and -independent KL modes in a Fourier-space analysis, as well as using a real-space pipeline.
In order to get a quantitative sense of the compression efficiency of the KL transform, in Table 4 Figure 13. The same as in Figure 12, but the KL-transform is assumed to be scale-independent (see Eq. 18).
the 2D Ω m -S 8 plane. It is possible to notice a substantial difference between the real-space and the Fourier-space analyses. This is to be expected. We believe that most of these differences can be attributed to the different methods used to estimate the covariance matrices in both cases. 10 In addition, it is never possible to match both sets of scale cuts, giving rise to different levels of con-10 It is interesting to note that both analyses also obtain substantially different values for the best-fit parameter χ 2 with χ 2 /ndof = 1.47 for the real-space analysis and χ 2 /ndof = 0.98 for the Fourierspace one. straining power. More interestingly, for the purposes of data compression, for a given analysis choice we observe only insignificant variations in the FoM as we reduce the number of KL modes left in the data vector. This clearly indicates that the information contained in the full case is essentially the same as in the most compressed one. Also, by comparing the results for the two different Fourier-space analyses, i.e. the -dependent (SD) and the -independent (SI) KL transform, we can conclude that there is no significant discrepancy between the two if we take into account cross-correlations between different radial modes. When neglecting the cross-correlations of the power spectra, we find a small (∼ 10%) degradation in the FoM for the -independent case. This indicates that, while considering an -dependent KL transform may retain marginally more information especially when we maximally compress our data vector, the loss of information is small. It is interesting to notice that the FoM sometimes increases when we further compress the data vector. By construction, it is impossible to gain constraining power by compressing the data. This indicates that the apparent increasing of the FoM can be interpreted only as a statistical fluctuation.
In addition, since S 8 is the best constrained parameter with current weak lensing surveys, we can estimate the goodness of the KL compression by looking at its 1D posterior distribution. We compared the absolute values of the relative shift in the means (|∆S 8 /σ(S 8 )|), and of the relative difference of the standard deviations (|∆σ(S 8 )/σ(S 8 )|) between the full and the 1 cases. These two have been chosen as representative of the cases with none and maximum data compression respectively. For the shift in the means we find ∼ 0.38 for both the Fourier-space analyses (SD and SI) and ∼ 0.25 for the real-space analysis. Moreover, the relative difference of the standard deviations is ∼ 0.04 for the Fourier-space SD analysis, ∼ 0.08 for the Fourier-space SI analysis and ∼ 0.01 for the real-space analysis. This indicates not only that the amount of information lost with the KL-transform is practically negligible, but also that the shifts in the posterior distributions are compatible with expected statistical fluctuations caused by ignoring part of the signal.
In the best case, comparing all the data points used in the real-space analysis of  with the maximum data compression of the Fourier-space pipeline, we are able to achieve a compression factor of 30, which greatly simplifies the problem of computing the covariance matrix from both mock catalogues and with analytic methods.

Relaxing the assumptions
The constraints obtained in the previous section show that even with the maximum KL compression the loss of information is negligible. However, they were obtained using a number of assumptions that can be relaxed in order to test the robustness of our conclusion.
As the KL transform has to be calculated assuming a fiducial model, the loss of information is minimal if the fiducial model is a good fit to the data. This is the main reason to choose the best fit of  as our fiducial model. To explore the impact of the fiducial cosmology on the performance of this method, we have repeated our analysis using a fiducial model that is signif-   Table 4 We show the size of the data vector along with the Figure of Merit (FoM) for each case considered here. The first column indicates the KL-compression used. The second column shows the size of the data vector for each method, where s θ = 5 in Fourier space (5 bandpowers) and s θ = 10 in Real space (6 angles θ for ξ + and 4 for ξ − ). The other columns show the FoM for each case, calculated as the Area −1 of the 1σ contour in Ωm-S 8 . "Fourier SI" refers to the analysis in Fourier Space with -independent KL-transform, "Fourier SD" refers to the -dependent KL-transform, while "Real" shows the KL-transformed results in real space.
icantly far away from the best fit. In particular we use a flat ΛCDM model with (h, Ω b h 2 , Ω c h 2 , ln(10 10 A s ), n s ) = (0.8, 0.02, 0.16, 3.2, 0.8), which yields a χ 2 for the CFHTLenS data that is twice as large as that of the best fit. The result is shown in Figure 15, which shows the posterior distribution for the parameters Ω m and S 8 . We find contours that are very similar to the ones shown in Figure 13 and, as in the previous section, we find that the shift in the means and the relative difference of the 1-σ errors for S 8 are ∼ 0.32 and ∼ 0.06 respectively. The performance of the KL transform is therefore only mildly affected by the choice of fiducial model. Moreover, the results of our main analysis are in principle only valid for standard flat ΛCDM models described by the parameters in Table 3. An extended parame-  Figure 15. The same as in Figure 12 but the KL transform has been calculated using a bad fit to the data, i.e. χ 2 ∼ 2 × χ 2 bestfit .
ter space could in principle benefit from including additional KL modes in order to break parameter degeneracies. To explore this possibility, we have repeated our analysis for a "wCDM" model, in which we parametrize the evolution of the equation of state of dark energy as w(a) = w 0 + (1 − a)w a (Chevallier and Polarski 2001;Linder 2003), where a is the scale factor, and (w 0 , w a ) are two new free parameters of the model 11 . The results are shown in Figure 16 both for the Ω M -S 8 plane (top panel) and the w 0 -w a plane (bottom panel). As expected, the parameter contours for Ω M -S 8 are broader than in the ΛCDM case. The corresponding parameter shift and degradation in the final error on S 8 when comparing the full and 1 cases are ∆S 8 /σ 8 = 0.58, ∆σ(S 8 )/σ(S 8 ) = 0.12. These are noticeably larger than in the ΛCDM case, reflecting the fact that additional information is encoded in the remaining KL modes. This is more evident in the w 0 -w a plane, where the uncertainties on both parameters degrade noticeably if only a single mode is used. However, we can see that, by including both the first and second KL modes and considering their cross-correlation (dashed violet contours), we can recover the same constraints found when using the full data vector (which is also true for the Ω M -S 8 plane). Therefore, the compression efficiency degrades in this case from a factor ∼ 30 to a factor ∼ 10, which is still significant. This result should come as no surprise, since figures 9 and 11 show that, even if the S/N ratio is dominated by the first KL mode, some information can still be extracted from the second one. In the context of the wCDM model, this can be interpreted as follows: although the overall amplitude of the matter fluctuations can be well captured by a single, high signal-to-noise measurement of the weak lensing power spectrum, corresponding to the first KL mode, information about their growth as a function of time is vital in order to distinguish between This extended parameter space evidences the benefits of including a second KL mode in the analysis. different dark energy models. As shown by the red lines in Fig. 10, the second KL mode weights the high-redshift and low-redshift data differently, and is therefore able to partially recover the growth history. This was also shown in Alonso (2018), where it was found that an LSST-like experiment will require at least 3 KL modes in order to recover optimal constraints in an evolving dark energy scenario with massive neutrinos. It is also worth emphasizing the fact that, as shown by this exercise, even though the KL modes are uncorrelated in the fiducial cosmology, additional information can be obtained by exploring the parameter dependence of their crosscorrelation.
This result shows that the performance of the KL decomposition depends upon the parameter space on which it is used, and therefore a preliminary forecast is required for any experiment before deciding upon a given com-pression scheme. Note that this is also a required step before defining the redshift binning to be used in a standard tomographic analysis. It is also worth noting that our particular implementation of the KL transform was aimed to maximize the overall S/N of the data. The method is however more general than that, and can be tuned to optimize the final constraints on any parameter. Therefore, the results shown here could potentially be improved if we fine-tuned our KL decomposition for e.g. w 0 and/or w a .

Comparison with other analyses
We compare here the performance of the method described in this paper to other similar analyses carried out on the CFHTLenS data.
The closest study of a radial decomposition of weak lensing data is that of Kitching et al. (2014), which uses spherical Bessel functions as a radial basis. This analysis makes use of 50 radial modes, compared to our 1 or 2 KL eigenmodes. The fact that their basis is orthogonal by construction also implies that the cross-correlations between different modes cannot be neglected (see e.g. their Figure 4). One advantage of the modes used by Kitching et al. (2014) is that, for accurate enough photometric redshifts, each of them can be associated with a physical wavenumber k, and therefore the effect of baryons and non-linearities in the matter power spectrum can be kept under control by using a conservative cut on k. It is also worth noting that Kitching et al. (2014) use the maplevel modes as their data, instead of summarising them into power spectra. The total number of modes used at the likelihood level is therefore very large, but the use of map-level quantities simplifies the problem of computing 2-point function covariances.
Another interesting analysis is that of , in which the authors make use of a compressed set of Complete Orthogonal Sets of E/B-mode Integrals (COSEBIs) calculated from the CFHTLenS data. Their analysis proposes the use of a set of 20 COSEBIs for these data, which corresponds to a data reduction factor of ∼ 7 with respect to their raw initial data vector. Although this is not checked explicitly in this analysis, the resulting compressed COSEBIs should preserve the constraining power on a basic set of 5 cosmological parameters.
Finally, Simon et al. (2015) have made use of a KL decomposition to reduce the dimensionality of third-order cosmic shear statistics in CFHTLenS . The compression factor achieved here is modest (∼ 4), but the use of this type of analyses for these datasets is extremely relevant, given the large number of possible three-point function combinations.

DISCUSSION
The main goal of this paper has been to show the potential of the Karhunen-Loève transform in reducing the dimensionality of the data vector for cosmic shear measurements. We applied our method to weak lensing data from the CFHTLenS survey. The strong correlation between the weak lensing signal at different redshifts, makes cosmic shear particularly well suited for this type of compression. The fact that other 3D data decomposition frameworks have been applied to the CFHTLenS data in the past additionally allows us to make a comparison between different strategies.
The KL transform is designed to determine the linear combinations of the data that maximize the amount of information about a given parameter. Our main analysis of the angular two-point functions has been performed in Fourier space, both considering an -dependent and an -independent KL transform that maximizes the S/N of the data. In this context, the resulting set of radial eigenfunctions can be likened to a Fourier decomposition along the redshift coordinate (i.e. the KL transform is a generalization of the standard 3D power spectrum analysis to weak lensing data). For comparison with previous literature, we have also applied our method to realspace correlation functions, and show how the compression works in the different cases. In Fourier space we used the CFHTLenS catalogue to generate masks and maps of the observed area of the sky. We then used these maps to estimate the weak lensing power spectra. For the real-space analysis, we directly used the correlation functions and covariance matrix constructed in . In all cases, theoretical predictions for the power spectra and correlation functions given the cosmological parameters were obtained using CCL. In order to calculate the KL transform, our fiducial point was the best fit of . Choosing cosmological parameters close to the best fit ensures a nearly optimal compression, while random parameters would potentially degrade the efficacy of the method. We then applied the KL transform to our two-point functions and explored the posterior parameter distribution using MCMC methods.
The main result of this paper is that, both in real and Fourier spaces, the KL compression works exceptionally well. Indeed, it is possible to compress the original data vector, with 280 and 140 elements in real and Fourier space respectively, down to a vector with 10 and 5 elements without any appreciable loss of information or parameter biases. In other words, we achieve a compression factor of ∼ 30 for CFHTLenS, which simplifies the problem of computing covariance matrices from mock catalogues by the same factor, or by a factor ∼ 800 using analytic methods. This is shown in Table 4, where only insignificant deviations in the FoM are found as we reduce the number of KL modes considered. Additionally, we observe that considering an -independent KL transform does not significantly degrade the constraints, which simplifies the application of this method to realspace analyses.
As shown in Section 4.3, almost the same result holds after relaxing some of our assumptions. First, we have shown that the performance of the method is robust against errors in the fiducial cosmology chosen to define the KL eigenmodes. To show this we repeated our analysis for a fiducial cosmology that is substantially disfavoured by the CFHTLenS data, finding no significant difference in the final constraints. We also extended the parameter space by replacing the cosmological constant with a fluid with time dependent equation of state, using the standard (w 0 , w a ) parametrization. In this case, we find that, even though the final constraints are affected by the prior on these extended parameters, additional information can indeed be gained by including a second KL mode. This can be expected, given the non-zero signal contained in this mode, and can be interpreted as the second mode being able to recover growth information.
The compression factor in this case degrades by a factor 3 with respect to the ΛCDM analysis, although this could potentially be further improved by using a KL transform that optimizes the recovery of specific parameters (e.g. w 0 and/or w a ).
We must note a number of caveats in our analysis. Most prominently, we have not marginalized over possible photometric redshift systematics, including catastrophic outliers. Standard tomographic approaches have traditionally encapsulated these as additional nuisance parameters per redshift bin (e.g. overall shifts in the redshift distribution) that are marginalized over when evaluating the likelihood. This is not necessarily appropriate for the KL decomposition used here. The analogous approach would be to marginalize over the redshift distributions associated to each KL mode used (ñ p (z) = e α p n α (z), using the notation in Section 2). Intrinsic alignments (Troxel and Ishak 2015) is another systematic that may affect the performance of the method, given its non-local nature. Given the low significance of this effect, compared to the expected lensing signal, we expect it to have a negligible impact on the set of optimal eigenmodes, which are defined purely on the basis of signal-to-noise ratio. However, it could degrade significantly the compression efficiency in terms of final parameter constraints. We leave this study for future work. We have also ignored systematics associated to shear calibration, source clustering or baryonic effects on the matter power spectrum. Current and future imaging surveys draw most of their cosmological constraining power from the joint analysis of galaxy clustering and weak lensing (van Uitert et al. 2018, Joudaki et al. 2018, Abbott et al. 2018. It would therefore be interesting to explore the different ways in which the KL transform could be applied to this combined analysis. Although a direct application of the method would yield KL modes that mix clustering and shear, an alternative would be to combine the KL modes of both probes taken individually, in order to separate the different systematic uncertainties that affect them. We leave these investigations for future work. Finally, it is worth noting that a number of alternative data compression methods have been proposed in the literature that can achieve significantly larger compression factors (e.g. Reichardt et al. 2001;), down to a single data point per free parameter of the model. Although some aspects of these methods depend on having a sufficiently large sample of simulations, they are some of the most promising avenues to alleviate the complexity of computing covariance matrices (or even to forgo likelihood computations altogether). One possible advantage of the KL transform as presented here with respect to these methods is the availability of intermediate data products (KL mode maps and power spectra) that can be inspected to identify systematics in the analysis pipeline or the theory model. Data compression techniques of any form will be of tremendous use in future surveys such as LSST or Euclid, simplifying computationally demanding tasks, such as the computation of large covariance matrices.