Source Distributions of Cosmic Shear Surveys in Efficiency Space

We show that the lensing efficiency of cosmic shear generically has a simple shape, even in the case of a tomographic survey with badly behaved photometric redshifts. We argue that source distributions for cosmic shear can therefore be more effectively parametrised in "efficiency space". Using realistic simulations, we find that the true lensing efficiency of a current cosmic shear survey without disconnected outliers in the redshift distributions can be described to per cent accuracy with only two parameters, and the approach straightforwardly generalises to other parametric forms and surveys. The cosmic shear signal is thus largely insensitive to the details of the source distributions, and the features that matter can be summarised by a small number of suitable efficiency parameters. For the simulated survey, we show that prior knowledge at the ten per cent level, which is attainable e.g. from photometric redshifts, is enough to marginalise over the efficiency parameters without severely affecting the constraints on the cosmology parameters $\Omega_m$ and $\sigma_8$.


INTRODUCTION
Measurements of cosmic shear obtained via the weak lensing effect on individual galaxy shapes are one of the best available probes of the late Universe where Dark Energy dominates. The large numbers of galaxies necessary to reduce the statistical noise on cosmic shear two-point functions require that current (Troxel et al. 2018; Hikage et al. 2019; Hildebrandt et al. 2020) and future (The LSST Dark Energy Science Collaboration et al. 2018; Amendola et al. 2018) surveys rely on photometric detections of sources only. This means that the second crucial piece of information necessary for cosmic shear Cosmology, namely the distances to the galaxies for which shapes are measured, typically comes with the large uncertainties inherent in photometric redshifts. Considerable effort is expended on increasing the precision and accuracy of such uncertain redshift estimates (see e.g. Schmidt et al. 2020, for a review of a number of methods).
The fiducial approach for current surveys is to parametrise the uncertainty as a potential shift of the mean of the redshift distribution, n(z − ∆z), with ∆z in each tomographic bin marginalised over under a Gaussian prior. In addition to this statistical error, the systematic uncertainty stemming from the methodological differences in how the initial n(z) is formed has also been argued to dominate over the statistical uncertainty of current surveys (Joudaki et al. 2019). A well-motivated and principled way of accounting for a much wider range of statistical uncertainties than simply the shift in mean involves marginalising over the heights of histogram bins for the number count of weak lensing source galaxies as a function of redshift (e.g. Leistedt et al. 2016; Sánchez & Bernstein 2019), but this necessarily creates a large number of new nuisance parameters which cannot feasibly be included in a typical analysis.
In this paper, we argue that attention should instead be focused directly on the lensing efficiency q of the source distribution, and that this is where the constraining power of data can be most effectively expended. The argument stems from the fact that the source distribution, in the form of the source density n(x) per comoving distance x, only enters the cosmic shear signal through the lensing efficiency (for a good short summary, see Lemos et al. 2017),
$$q(x) = \int_x^\infty \frac{x' - x}{x'} \, n(x') \, dx' . \tag{1}$$
However, this integral operator smooths out almost all details of the source distribution, so much so that even sharp features in n(x) can have no appreciable effect on q(x), as shown in Fig. 1. This smoothing is a generic feature of the lensing efficiency, which can be understood as follows. Instead of the integral (1), the lensing efficiency is equivalently characterised by a second-order differential equation, obtained by differentiating twice,
$$q''(x) = \frac{n(x)}{x} , \tag{2}$$
with initial value q(0) = 1 due to the normalisation of n(x), and initial slope given by the mean inverse comoving distance η,
$$q'(0) = -\int_0^\infty \frac{n(x)}{x} \, dx \equiv -\eta . \tag{3}$$
Hence, for fixed η, different densities n(x) only lead to different accelerations along the curve. However, the integrated lensing efficiency is also constrained by the mean comoving distance µ,
$$\int_0^\infty q(x) \, dx = \frac{1}{2} \int_0^\infty x \, n(x) \, dx \equiv \frac{\mu}{2} . \tag{4}$$
For fixed initial conditions but no acceleration (i.e. $q(0) = 1$, $q'(0) = -\eta$, $q''(0) = 0$), the efficiency would fall linearly to zero at $x = \eta^{-1}$, and the integrated efficiency would be $\eta^{-1}/2$, which is less than µ/2 by Jensen's inequality (cf. Fig. 1). The mean µ of the source distribution therefore describes the tail of the curve, while the inverse mean η describes its behaviour near the origin. Overall, this essentially fixes the shape of q(x).
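The three constraints on the efficiency discussed above (unit initial value, initial slope −η, and integrated efficiency µ/2) can be checked numerically. The following is a minimal sketch, assuming the standard form q(x) = ∫ (1 − x/x') n(x') dx' over x' > x for the lensing efficiency. The density used here, a broad bump with a deliberately sharp spike in arbitrary distance units, is purely hypothetical and not taken from any catalogue.

```python
import math

def point_masses(density, x_max, n_cells):
    """Discretise a density on (0, x_max) into normalised point masses at cell midpoints."""
    dx = x_max / n_cells
    xs = [(k + 0.5) * dx for k in range(n_cells)]
    ws = [density(x) * dx for x in xs]
    total = sum(ws)
    return xs, [w / total for w in ws]

def efficiency(xs, ws, x):
    """q(x) = sum_k w_k * max(0, 1 - x / x_k) for point masses w_k at distances x_k."""
    return sum(w * (1.0 - x / xk) for xk, w in zip(xs, ws) if xk > x)

def density(x):
    """Hypothetical density (arbitrary units): broad bump plus a sharp spike at x = 1.5."""
    bump = x**2 * math.exp(-x)
    spike = 0.5 * math.exp(-0.5 * ((x - 1.5) / 0.02) ** 2)
    return bump + spike

X_MAX = 30.0
xs, ws = point_masses(density, X_MAX, 3000)
mu = sum(w * x for x, w in zip(xs, ws))    # mean distance
eta = sum(w / x for x, w in zip(xs, ws))   # mean inverse distance

# Initial value and one-sided slope at the origin (step smaller than the first midpoint).
h = 1e-4
q0 = efficiency(xs, ws, 0.0)
slope0 = (efficiency(xs, ws, h) - q0) / h

# Trapezoidal integral of q over [0, X_MAX].
grid = [X_MAX * k / 600 for k in range(601)]
qs = [efficiency(xs, ws, x) for x in grid]
area = sum(0.5 * (qs[k] + qs[k + 1]) * (grid[k + 1] - grid[k]) for k in range(600))

print(f"q(0) = {q0:.6f}, q'(0) = {slope0:.6f}, -eta = {-eta:.6f}")
print(f"integral of q = {area:.6f}, mu/2 = {mu / 2:.6f}, mu*eta = {mu * eta:.4f}")
```

Despite the spike in the density, the efficiency remains a smooth, monotonically decreasing curve whose endpoints are pinned by η and µ, which is the behaviour the argument relies on.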
In the following, we develop this idea into a simple method for cosmic shear in efficiency space instead of redshift space. In Section 2, we fix a parametric form for the lensing efficiency, based on the preceding argument, and show that it can describe a current cosmic shear survey to very good accuracy. The parametric efficiency is then used in Section 3 for inference of the cosmological parameters. A brief summary and conclusion is given in Section 4.

PARAMETRIC LENSING EFFICIENCY
Since we expect the efficiency q(x) to depend little on the details of the source density n(x), we can derive a parametric form for q(x) by introducing a convenient parametric form for n(x) and computing its efficiency via the integral (1). In particular, we want n(x) to be positively supported, and q(x) to be of sufficiently elementary form for easy analytic and numeric evaluation. A natural choice is thus the gamma distribution with shape parameter α > 0 and scale parameter β > 0, whose normalisation involves the gamma function Γ(α). The resulting lensing efficiency is expressed through the regularised gamma function Q(α, x). This makes both the density (5) and the efficiency (6) straightforward to work with and quick to compute.
Figure 2. Two-dimensional distribution of intrinsic redshifts $z_{\rm intr}$ and the photometric redshifts $z_{\rm tomo}$ used for tomographic binning. Source selection into tomographic bins corresponds to horizontal cuts in the plane. Since the distribution is not purely diagonal, this creates complicated intrinsic redshift distributions when using photometric tomography.
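The closed form of a gamma-density efficiency can be checked against direct numerical integration. The sketch below is written under an assumption: it uses the textbook convention n(x) = x^(α−1) exp(−x/β) / (β^α Γ(α)), for which the efficiency works out to Q(α, x/β) − x/((α−1)β) · Q(α−1, x/β) and requires α > 1 for a finite inverse mean. The convention of the paper's own expressions (5) and (6) may differ by a shift in α. The helper names and the values α = 3, β = 400 are illustrative only.

```python
import math

def gammainc_upper(a, x, terms=500):
    """Regularised upper incomplete gamma function Q(a, x) for a > 0 and x >= 0."""
    if x <= 0.0:
        return 1.0
    prefactor = math.exp(-x + a * math.log(x) - math.lgamma(a))
    if x < a + 1.0:
        # Power series for the lower function P(a, x); return Q = 1 - P.
        n, term = a, 1.0 / a
        total = term
        for _ in range(terms):
            n += 1.0
            term *= x / n
            total += term
            if abs(term) < abs(total) * 1e-16:
                break
        return 1.0 - total * prefactor
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, terms):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-16:
            break
    return h * prefactor

def gamma_density(x, alpha, beta):
    """Assumed convention: standard gamma density with shape alpha and scale beta."""
    return x ** (alpha - 1.0) * math.exp(-x / beta) / (beta**alpha * math.gamma(alpha))

def gamma_efficiency(x, alpha, beta):
    """Closed-form efficiency of the density above (needs alpha > 1)."""
    t = x / beta
    return gammainc_upper(alpha, t) - t / (alpha - 1.0) * gammainc_upper(alpha - 1.0, t)

def efficiency_numeric(x, alpha, beta, x_max, n_cells=20000):
    """Midpoint-rule integral of (1 - x/x') n(x') over x' in (x, x_max)."""
    dx = (x_max - x) / n_cells
    total = 0.0
    for k in range(n_cells):
        xp = x + (k + 0.5) * dx
        total += (1.0 - x / xp) * gamma_density(xp, alpha, beta) * dx
    return total

ALPHA, BETA = 3.0, 400.0  # illustrative values only
for x in (0.0, 500.0, 1500.0):
    closed = gamma_efficiency(x, ALPHA, BETA)
    numeric = efficiency_numeric(x, ALPHA, BETA, x_max=50.0 * BETA)
    print(f"x = {x:7.1f}  closed form = {closed:.6f}  numeric = {numeric:.6f}")
```

Where SciPy is available, `scipy.special.gammaincc` provides the same regularised function and would replace the hand-written helper.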
From our initial discussion, we expect the mean µ and inverse mean η to be important descriptors for the shape of the lensing efficiency. Computing both for the density (5), we can readily invert the relations to parametrise the efficiency in terms of mean µ and inverse mean η. While µ and η should be good generic descriptors for the shape of the lensing efficiency, the parameters α and β only belong to the specific parametric form (6) for the efficiency. Positive values α and β always fulfil the strict constraint that µη ≥ 1 due to Jensen's inequality.
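As a worked illustration of this inversion, the relations can be written out explicitly for one particular convention. The block below assumes the textbook gamma density $n(x) = x^{\alpha-1} e^{-x/\beta} / (\beta^\alpha \Gamma(\alpha))$, which need not coincide with the convention of density (5); in this convention the inverse mean is finite only for $\alpha > 1$.

```latex
\begin{align}
  \mu  &= \int_0^\infty x \, n(x) \, dx = \alpha \beta , \\
  \eta &= \int_0^\infty \frac{n(x)}{x} \, dx = \frac{1}{(\alpha - 1) \, \beta}
          \qquad (\alpha > 1) , \\
  \alpha &= \frac{\mu \eta}{\mu \eta - 1} , \qquad
  \beta = \mu - \frac{1}{\eta} , \\
  \mu \eta &= \frac{\alpha}{\alpha - 1} > 1 .
\end{align}
```

The last line shows the Jensen bound directly: every admissible pair of shape and scale maps to a point with µη > 1, and the bound is approached only as the distribution becomes infinitely narrow (α → ∞ at fixed µ).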
To demonstrate that the lensing efficiency can be parametrised by µ and η, we apply this description to the Buzzard synthetic sky catalogue (DeRose et al. 2019), which simulates the Dark Energy Survey Year 1 (DES Y1) observations with realistic uncertainties. In particular, the catalogue contains both intrinsic source redshifts and photometric redshifts obtained from simulated observations with the BPZ algorithm (Benítez 2000), in the same way that photometric redshifts are estimated from actual DES Y1 data (Hoyle et al. 2018). In total, we use three different types of redshifts from the catalogue: i) the intrinsic redshifts $z_{\rm intr}$ that correspond to the comoving distance x of sources, ii) the photometric redshifts $z_{\rm phot}$ used to create the photometric source distributions, obtained by a random draw from the BPZ p(z) posterior probability, and iii) the photometric redshifts $z_{\rm tomo}$ used for tomographic redshift binning, given by the mean of the BPZ p(z) posterior probability, as in DES Y1. Fig. 2 shows the two-dimensional distribution of the intrinsic redshift $z_{\rm intr}$ and the associated photometric redshift $z_{\rm tomo}$ for tomographic binning. Degeneracies in features of spectral energy distributions, such as the Lyman and Balmer breaks, combine with measurement error-induced scatter to result in some regions of $z_{\rm intr}$ separated by ∆z ∼ 1 being indistinguishable from each other. The effect is mitigated by the use of prior distributions on p(z), which in turn can be highly sensitive to selection effects and the misidentification of sources in the samples used to form the priors (e.g. Hartley et al. 2020). The resulting joint distribution of $z_{\rm intr}$ and $z_{\rm tomo}$ contains diffuse tails away from the diagonal, leading to tomographic source distributions in which the intrinsic redshifts may fall significantly outside the nominal bin edges.
This effect of broadening and overlap of the intrinsic redshift distributions is clearly visible in Fig. 3, which shows the tomographic source distributions of the catalogue. Here and in the following, we always assume the DES Y1 tomographic redshift bins with bin edges of 0.2, 0.43, 0.63, 0.90, and 1.30. While a point estimate $z_{\rm tomo}$ is used for binning, the redshift sample $z_{\rm phot}$ from the full posterior is used to create the shown photometric redshift distributions. The stacking procedure often leads to biases when using photometric redshift distributions (Schmidt et al. 2020), which we will shortly see via the lensing efficiency.
To convert redshifts to comoving distances, we use the flat ΛCDM cosmology of the Buzzard simulations with $\Omega_m = 0.286$ (DeRose et al. 2019). We work in units of Mpc/h to remove the dependency of the comoving distances on the Hubble parameter h. The exact lensing efficiency $q_{\rm samp}$ for a sample of sources at distances $x_1, x_2, \ldots$ with weights $w_1, w_2, \ldots$ can be computed as the weighted average of the single-source efficiencies, which are $1 - x/x_i$ for $x < x_i$ and zero otherwise. The exact computation is free from a choice of binning for the number count histograms, which are only used for illustration. The resulting lensing efficiencies for photometric and intrinsic redshifts are shown in Fig. 4. The absolute error ∆q of the photometric efficiency is at the 5 per cent level for the two lower tomographic bins.
Figure 6. Top: Best-fit densities (solid) and densities for the parametric efficiencies (dotted). Middle: Lensing efficiencies for the best-fit densities (solid) and intrinsic efficiencies (dotted). Bottom: Absolute difference ∆q(x) between efficiencies for the best-fit densities and intrinsic efficiencies.
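The binning-free sample efficiency can be sketched in a few lines. This is a minimal illustration, assuming that each source at distance x_i contributes max(0, 1 − x/x_i) to the efficiency, as follows from treating the sample as a sum of point sources. The function names and the three toy distances are hypothetical, not values from the catalogue.

```python
def sample_efficiency(x, distances, weights=None):
    """Weighted average of max(0, 1 - x / x_i) over sources at distances x_i."""
    if weights is None:
        weights = [1.0] * len(distances)
    contributions = sum(w * (1.0 - x / xi) for xi, w in zip(distances, weights) if xi > x)
    return contributions / sum(weights)

def sample_moments(distances, weights=None):
    """Weighted mean distance mu and mean inverse distance eta of the sample."""
    if weights is None:
        weights = [1.0] * len(distances)
    total = sum(weights)
    mu = sum(w * xi for xi, w in zip(distances, weights)) / total
    eta = sum(w / xi for xi, w in zip(distances, weights)) / total
    return mu, eta

# Toy sample: three sources with comoving distances in Mpc/h and shape weights.
toy_distances = [1000.0, 2000.0, 4000.0]
toy_weights = [1.0, 1.0, 2.0]

for x in (0.0, 1000.0, 2000.0, 4000.0):
    print(x, sample_efficiency(x, toy_distances, toy_weights))
print(sample_moments(toy_distances, toy_weights))
```

Because the result is an exact piecewise-linear function of x, it can be evaluated at any distance without first building a histogram of the sources.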
As expected, the lensing efficiencies are of the characteristic simple shape even for a realistic galaxy catalogue and photometric source selection into tomographic bins. To show how closely the intrinsic efficiencies match the parametric form (6), we perform a continuous least squares fit by minimising the integrated square error, $\arg\min_{\mu, \eta} \int_0^\infty [q(x; \mu, \eta) - q_{\rm samp}(x)]^2 \, dx$. The resulting parametric efficiencies are shown in Fig. 5. We find that our simple parametrisation reproduces the intrinsic efficiencies to per cent accuracy across the entire distance range and all tomographic bins. The best-fit efficiency parameters µ and η are given in Table 1. They are in good agreement with the parameters obtained directly from the intrinsic source distributions. Overall, we find that the efficiency parameters µ and η suffice to describe a DES Y1-like cosmic shear survey. We can furthermore recover the lensing efficiency through a simple parametric form (6), to far better accuracy than e.g. the raw, uncalibrated photometric distributions. This is not merely due to any similarities between the assumed parametric form (5) and the source distributions in the catalogue: Fig. 6 shows that a direct least squares fit of the density n(x) to the intrinsic distributions produces a better match to the densities but yields significantly degraded efficiencies that are accurate only to a level comparable to photometric redshifts. The specific parametric form (6) for the efficiency is nevertheless only a convenient choice, while the generic observation is that the lensing efficiency is almost featureless and easily parametrised by a suitable density function. This could be e.g. a Gaussian, suitably clipped to the positive reals, or a generalised gamma distribution, in which case the agreement with the intrinsic efficiencies of the Buzzard catalogue improves by at least a factor of two.
Parametric lensing efficiencies obtained in this way, i.e. by integrating a chosen n(x), describe catalogues with outliers that are connected to the bulk of the tomographic source distributions. It is well known that outliers can strongly bias the cosmic shear signal (Amara & Réfrégier 2007), and this effect can be seen in terms of the efficiency parameters, since even a small fraction of outliers can severely impact the mean and inverse mean. Such outliers are expected to comprise populations of physically related galaxies, which appear similar in photometric observations of low spectral resolution, but are actually separated in redshift. If this separation is large enough, tomographic selection by photometric redshifts may result in multiple disjoint populations in the source distributions. The efficiency of the total composite distribution of such a sample has a modified shape that is the superposition of the simple shapes of the individual populations. This is shown schematically in Fig. 7, in which the differing contributions to the total lensing efficiency from two disjoint populations can be seen. For surveys where disjoint outlier populations are significant, it is therefore possible to use a mixture of parametric efficiencies to describe each component individually. However, as seen above, this is not necessary for the Buzzard catalogue, where the total efficiency is well described by a single component.
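The superposition of component efficiencies can be illustrated with a small sketch. For brevity it assumes a deliberately simple component shape that is not the paper's parametric form: the density n(x) = x exp(−x/β) / β², whose efficiency is exactly exp(−x/β), with mean 2β and inverse mean 1/β. The population fractions and scales below are hypothetical.

```python
import math

def component_efficiency(x, beta):
    """Efficiency exp(-x / beta) of the assumed density n(x) = x exp(-x / beta) / beta**2."""
    return math.exp(-x / beta)

def mixture_efficiency(x, components):
    """Total efficiency of populations given as (fraction, beta) pairs."""
    total = sum(f for f, _ in components)
    return sum(f * component_efficiency(x, b) for f, b in components) / total

def mixture_moments(components):
    """Mean and inverse mean of the mixture; both are linear in the fractions."""
    total = sum(f for f, _ in components)
    mu = sum(f * 2.0 * b for f, b in components) / total
    eta = sum(f / b for f, b in components) / total
    return mu, eta

# Hypothetical survey bin: 90% bulk population plus 10% high-distance outliers (Mpc/h).
components = [(0.9, 500.0), (0.1, 1500.0)]
mu, eta = mixture_moments(components)

print(f"mu = {mu:.1f}, eta = {eta:.6f}, mu*eta = {mu * eta:.3f}")
for x in (0.0, 1000.0, 3000.0):
    print(x, mixture_efficiency(x, components))
```

Since both the efficiency and the two moments are linear in the population fractions, a modest outlier fraction shifts µ noticeably while barely changing η, which is one way to see why outliers at large distances mainly alter the tail of the total efficiency.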

COSMOLOGICAL PARAMETER INFERENCE
We now use the parametric efficiencies to infer the cosmological parameters $\Omega_m$ and $\sigma_8$ from measurements of cosmic shear.
In a first step, we simulate a shear-only two-point function data vector, mimicking the corresponding DES Y1 data product (Krause et al. 2017) [...] Halofit (Takahashi et al. 2012; Smith et al. 2003) to obtain the matter power spectrum, which is projected using Limber's approximation to produce the tomographic shear power spectra (for details, see e.g. Abbott et al. 2018), and transformed to the shear two-point functions (using the method of Kilbinger et al. 2009). These two-point functions are then stored as our simulated data vector, together with a Gaussian covariance matrix matching the effective number densities and shape noise of DES Y1 (Troxel et al. 2018). The second step is the analysis of the synthetic data vector using the same pipeline, but with $\Omega_m$ and $\sigma_8$ left as free parameters to be constrained by the data. The Hubble parameter h is fixed to the true value; this does not affect the results since the efficiency parameters µ and η are given in Mpc/h. The analysis is performed i) in redshift space using either intrinsic or photometric redshifts, and ii) in efficiency space using the parametric efficiencies fixed to the best-fit values.
The resulting two-sigma contours for $\Omega_m$ and $\sigma_8$ are shown in Fig. 8. Using a parametric efficiency affects the degeneracy of the cosmological parameters, because the cosmology is not used to convert the redshift distributions to comoving distance. This leads to a tilt in the contours between the efficiency space and redshift space analysis. The marginal distributions are largely unaffected, with a mildly wider marginal posterior for $\sigma_8$, and no change in the marginal posterior for $\Omega_m$. Neither cosmological parameter is biased by the parametric efficiency with respect to the intrinsic redshifts.
The above shows that for known source distributions, working in efficiency space yields roughly the same results as working in redshift space. The real advantage of a parametric efficiency lies in the opposite direction, where it becomes possible to do cosmic shear Cosmology, in the extreme case, without any information about the source distributions. Leaving the efficiency parameters as free parameters allows sampling of the posterior while exploring all possible source distributions that are covered by the efficiency model (such as, in the case we are considering, those without disjoint outlier populations). Any information about the sources that is available can then be incorporated into the analysis through a prior on the efficiency parameters, as appropriate for a Bayesian analysis. This cleanly separates data, which only enters through the observed shear two-point function, and theoretical predictions, where the observed redshifts are no longer used, in contrast to current analyses where data in the form of the observed n(z) is often used as part of the theoretical model.
To understand how imperfect knowledge about the efficiency parameters affects the cosmology, we repeat the same analysis, but instead of fixing the efficiency parameters to the best-fit values, we equip µ and $\eta^{-1}$ with a uniform prior of varying width. We use the reciprocal $\eta^{-1}$ under the principle that a distance, not an inverse distance, should be uniformly distributed. We set the uniform prior range of each efficiency parameter $P_i$ to a fraction f of its respective true value, where the parameter P is either µ or $\eta^{-1}$, and i is the tomographic bin index. In every instance, the efficiency parameter space is also naturally bounded by the strict condition that $\mu_i \eta_i \geq 1$, which the sampling takes into account. We sample the posterior distribution at uniform prior widths of 2%, 5%, 10%, 20%, and 100%, so that we can obtain the posteriors at intermediate widths by resampling without unduly reducing the number of effective samples. The resulting cosmological constraints are shown in Fig. 9. The mean and standard deviation of the marginal distributions of $\Omega_m$ and $\sigma_8$ are relatively stable up to uniform prior widths of ∼10%, from which point on the marginal means develop a slight bias, which grows up to ∼1 standard deviation at 100% prior width. Interestingly, the width of the $\Omega_m$ constraint remains constant over the entire range, while the constraint for $\sigma_8$ widens by a factor of ∼1.5. To quantify how broadening the efficiency parameter priors affects the joint constraining power for the cosmological parameters, we define a figure of merit as the inverse of the area of the covariance ellipse (Albrecht et al. 2006). Here, we find again that a uniform prior width below ∼10% does not affect the posterior. For larger prior widths, the figure of merit falls by a factor ∼10 at 100%.
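A figure of merit of this kind can be estimated directly from posterior samples. The sketch below assumes the one-sigma covariance ellipse, whose area is π √(det C), so that the figure of merit is 1 / (π √(det C)); the normalisation used in the paper may differ by a constant factor, which does not affect ratios between prior widths. The samples are synthetic draws around hypothetical central values, not results from the analysis.

```python
import math
import random

def covariance_2d(xs, ys):
    """Unbiased sample covariance entries (cxx, cxy, cyy) of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    cyy = sum((y - my) ** 2 for y in ys) / (n - 1)
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return cxx, cxy, cyy

def figure_of_merit(xs, ys):
    """Inverse area of the one-sigma covariance ellipse (assumed normalisation)."""
    cxx, cxy, cyy = covariance_2d(xs, ys)
    return 1.0 / (math.pi * math.sqrt(cxx * cyy - cxy**2))

# Synthetic stand-in for posterior samples of (Omega_m, sigma_8); values are hypothetical.
random.seed(1)
omega_m = [random.gauss(0.286, 0.03) for _ in range(5000)]
sigma_8 = [random.gauss(0.8, 0.04) for _ in range(5000)]

fom = figure_of_merit(omega_m, sigma_8)
# Doubling the scatter in one parameter doubles the ellipse area and halves the figure of merit.
fom_wide = figure_of_merit(omega_m, [2.0 * s for s in sigma_8])
print(f"FoM = {fom:.1f}, FoM with doubled sigma_8 scatter = {fom_wide:.1f}")
```

Because the figure of merit depends on the determinant of the covariance, it responds to changes in the parameter degeneracy as well as to changes in the marginal widths.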
This, together with the results for the marginal distributions, implies that the joint contours for $\Omega_m$ and $\sigma_8$ become rounder as prior information about the source distributions is removed, but not significantly wider along either marginal axis. This qualitative analysis demonstrates the feasibility of obtaining real cosmology constraints with the efficiency space approach when the efficiency parameters are not perfectly known. With a look at Table 1, we find that the photometric values of µ and η lie within the ∼10% range at which the analysis starts to be adversely affected by the lack of information about the sources. Furthermore, uniform priors for µ and η could readily be improved through targeted modelling and/or higher-dimensional and hierarchical sampling methods for the distance distributions. We also emphasise that these results are meant to show the sensitivity of cosmological parameters to the efficiency parameters, and should not be taken as a proposal for, or compared to, full analyses. Our message in this paper is to demonstrate the principle of modelling distance distributions in efficiency space.

CONCLUSION
We have shown that the distribution of distances to sources in a cosmic shear analysis can be modelled directly in the space of the lensing efficiency, rather than in the space of the source redshifts. We have argued that the approach is motivated by the form of the lensing efficiency transformation, which smooths out many features of source redshift distributions. This means that expending parameters on modelling such features is unnecessary when the goal of the analysis is cosmic shear Cosmology. By modelling the source distance distribution in efficiency space, we are modelling only the information that is necessary for Cosmology. The behaviour of the lensing efficiency in (3) and (4) suggests that only the parameters η and µ are necessary to describe the weak lensing action of a single population of sources, and Fig. 7 shows how this readily and simply extends to outlier populations.
We have chosen a parametrised form for the lensing efficiency (6) and shown in Fig. 5 that for a representative cosmic shear survey (DES Y1 as modelled by the Buzzard simulation) two free parameters per tomographic bin allow us to model the true efficiency to within one per cent accuracy. The further analysis of Section 3 then shows the effect of this on the constraints for $\Omega_m$ and $\sigma_8$, the cosmological parameters best constrained by cosmic shear surveys, where we find little loss in constraining power even when marginalising over efficiency parameters which are uncertain at the level of up to ten per cent.
More sophisticated methods are necessary for cosmological parameter estimation in real data, but here we have argued that modelling of that data can be done most parsimoniously by modelling the lensing efficiency directly, rather than the redshift number density distribution.