LIMBERJACK.JL: AUTO-DIFFERENTIABLE METHODS FOR ANGULAR POWER SPECTRA ANALYSES



INTRODUCTION
Cosmology is currently experiencing an unprecedented increase in the quantity and quality of data. The Dark Energy Spectroscopic Instrument (DESI; DESI Collaboration et al. 2016) has already recorded more galaxy spectra in its first two years of operations than had previously been gathered in the whole history of humanity. Moreover, next-generation surveys such as the Legacy Survey of Space and Time (LSST; Ivezić et al. 2019), Euclid (Laureijs et al. 2011), the Nancy Grace Roman space telescope (Spergel et al. 2015), the Simons Observatory (Ade et al. 2019), CMB-HD (Sehgal et al. 2019) and CMB-S4 (Abazajian et al. 2016) promise to further accelerate this trend in the next decade.
In order to match the quality of the data, physicists are starting to incorporate into their theoretical predictions more of the physical, observational and instrumental effects which, until now, could be overlooked. In practice, this translates into a dramatic increase in the number of parameters that future analyses will have to consider. This combination of large data sets with complex models will (and in many cases already does) overwhelm the inference methods we currently use to constrain the values of these parameters.
In a Bayesian framework, the statistical distribution of a set of parameters, θ, given some data, d, is given by Bayes' theorem:

P(θ|d) ∝ L(d|θ) Π(θ) ,   (1)

where P(θ|d) is the posterior distribution of the parameters given the data, L(d|θ) is the likelihood of the data for a given set of parameters, and Π(θ) encodes the prior beliefs on the distribution of the parameters. Current cosmological analyses explore their parameter spaces by mapping out P(θ|d) according to some stochastic process by which the direction of exploration is chosen. Despite the success of this methodology, relying on stochastic methods becomes inefficient in high dimensions, as the chance of randomly finding parameters that describe the observed data well decreases with the volume of the parameter space being probed. This effect is known as the "curse of dimensionality". One of the most effective ways of overcoming the curse of dimensionality is using the gradient of the posterior, ∇P(θ|d) ∝ ∇(L(d|θ) Π(θ)), to guide exploration towards regions of interest in parameter space. Algorithms that use the gradient of the likelihood to explore parameter space are known as gradient-based samplers (MacKay 2002; Hoffman & Gelman 2011; Zhang et al. 2021; Robnik et al. 2022). Unfortunately, traditional methods of finding numerical gradients, such as the finite-difference method, scale poorly with the number of dimensions. Moreover, they can be prone to numerical instabilities. This can render the use of gradient information counter-productive.

* jaime.ruiz-zapatero@physics.ox.ac.uk
Thankfully, in the last decades a family of algorithms known as auto-differentiation (AD) (Griewank & Walther 2008; Margossian 2018; Harrison 2021) has grown in popularity. Given a generic computer program that maps a series of inputs to a series of outputs (i.e. a function inside a computer), AD is a family of algorithms designed to produce a symbolic representation of the program such that the chain rule can be systematically applied to produce a second program for the gradient of the original function. Unlike finite differences, AD is not subject to truncation errors, and its computational cost scales much more favourably with the dimensionality of the original function (Griewank & Walther 2008). Thus, the goal of this paper is to provide an AD cosmological inference framework.
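The contrast between the two approaches can be made concrete with a toy example. The following sketch implements forward-mode AD with dual numbers, the mechanism underlying libraries such as ForwardDiff.jl (an illustrative Python snippet; the class and function names are our own, not part of any library):

```python
class Dual:
    """Minimal dual number: value + eps * derivative, with eps**2 = 0.
    Propagating the derivative slot through arithmetic is forward-mode AD."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other)
    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__
    def __mul__(self, other):
        o = self._lift(other)
        return Dual(self.val * o.val,
                    self.val * o.der + self.der * o.val)  # product rule
    __rmul__ = __mul__

def f(x):
    # generic code: works on floats and Duals alike
    return x * x * x + 2.0 * x

# Forward-mode AD: seed the derivative slot with 1 and read it back out
ad_grad = f(Dual(1.5, 1.0)).der            # exact: 3 * 1.5**2 + 2 = 8.75

# Central finite difference: carries an O(h^2) truncation error
h = 1e-3
fd_grad = (f(1.5 + h) - f(1.5 - h)) / (2.0 * h)
```

The AD result is exact to machine precision, while the finite-difference estimate carries an irreducible truncation error for any finite step h.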
We present LimberJack.jl, an auto-differentiable angular power spectra analysis code fully written in Julia. LimberJack.jl is designed after the Core Cosmology Library (CCL; Chisari et al. 2019) developed by the LSST Dark Energy Science Collaboration, and aims to fulfil similar scientific goals. LimberJack.jl allows the user to easily compute ΛCDM model predictions for the angular power spectra of weak lensing, CMB lensing and clustering surveys using the Limber approximation (Limber 1953; LoVerde & Afshordi 2008). Most importantly, LimberJack.jl can also provide the user with accurate gradients of these predictions in a computationally efficient way, thanks to its compatibility with Julia's AD libraries ForwardDiff.jl and ReverseDiff.jl. While LimberJack.jl is currently more limited than CCL, its modular design means that it can easily be extended by the community. This will allow the community to add new features, currently not present in LimberJack.jl, that will be necessary for the analysis of future data, such as more precise prescriptions of the non-linear corrections to the matter power spectrum or going beyond the Limber approximation (Leonard et al. 2023).
A number of auto-differentiable cosmological codes have been presented in the literature. The most notable precedents are JAX-COSMO (Campagne et al. 2023) and CosmoPower-JAX (Piras & Spurio Mancini 2023). JAX-COSMO's functionalities and goals are very similar to those of LimberJack.jl, both basing their design on CCL. CosmoPower-JAX is a neural-network emulation framework for the matter power spectrum, originally developed in TensorFlow (Spurio Mancini et al. 2022) and later ported to JAX to be paired with JAX-COSMO. The main difference between JAX-COSMO, CosmoPower-JAX and LimberJack.jl is the AD ecosystem they make use of. Both JAX-COSMO and CosmoPower-JAX are written in JAX, a scripting framework which interfaces with a lower-level language, XLA, a just-in-time (JIT) compiled language with AD capabilities. JAX's main strengths are its powerful parallelisation schemes on both CPUs and GPUs, its performant AD methods, and the fact that it shares its API with the ubiquitous Python library NumPy. On the other hand, LimberJack.jl is fully written in Julia, a general-purpose, JIT-compiled programming language with native AD capabilities. Julia's main advantage is the lack of a lower-level programming language, as is the case for other popular AD environments such as TensorFlow interfacing with C++ and JAX with XLA. This transparency makes Julia an excellent language in which to develop complex libraries with customised methods. A recent practical example where this feature has played a key role is the development of the first AD Boltzmann solver, Bolt.jl (Li & Sullivan, in prep.), a challenging task for JAX and TensorFlow which was nonetheless possible in Julia.
The structure of this paper is as follows: in Sect. 2 we describe the theoretical predictions computed by LimberJack.jl, how LimberJack.jl computes these predictions in an auto-differentiable way, and how AD can be used to speed up statistical inference. In Sect. 3, we use LimberJack.jl to reproduce the DES Y1 3x2-pt analysis and to perform a Gaussian process reconstruction of the growth factor across redshift. Finally, in Sect. 4, we summarise our work and interpret our results. In addition, we present the full derivatives of the most relevant theoretical predictions of LimberJack.jl in App. A and a tutorial on how to use LimberJack.jl in App. B.

METHODS

2.1. Angular Power Spectra
The main goal of LimberJack.jl is to compute the ΛCDM model predictions for the angular power spectra of any two tracers using the Limber approximation, as well as their gradients. In the Limber approximation (Limber 1953; LoVerde & Afshordi 2008), the angular power spectrum, C_ℓ^{uv}, between two projected fields u and v is related to the three-dimensional power spectrum, P_{UV}(k, z), of their associated quantities U and V via:

C_ℓ^{uv} = ∫ dχ [q_u(χ) q_v(χ) / χ²] P_{UV}(k = (ℓ + 1/2)/χ, z(χ)) ,   (2)

where q_u and q_v are the radial kernels of the u and v fields respectively.
Currently, LimberJack.jl can compute the auto- and cross-correlations of three different fields: cosmic shear (i.e. the distortion in the shape of galaxies caused by weak gravitational lensing), CMB lensing convergence (i.e. the apparent magnification or reduction of CMB anisotropies due to gravitational lensing by the matter distribution along the line of sight) and galaxy clustering.
The radial kernels of cosmic shear (q_γ), CMB lensing convergence (q_κ) and galaxy clustering (q_g) are given by:

q_γ(χ) = (3 H₀² Ω_m / 2c²) (χ / a(χ)) ∫_{z(χ)}^∞ dz′ p(z′) [χ(z′) − χ]/χ(z′) ,   (3)

q_κ(χ) = (3 H₀² Ω_m / 2c²) (χ / a(χ)) (χ* − χ)/χ* ,   (4)

q_g(χ) = b_g p(z) H(z)/c ,   (5)

where c is the speed of light, H(z) is the Hubble expansion rate, H₀ ≡ H(z = 0), χ is the comoving distance, a(χ) is the expansion factor, Ω_m is the matter density parameter today, b_g is the galaxy-matter linear bias parameter, χ* is the comoving distance to the surface of last scattering and p(z) is the redshift distribution of the galaxies. The estimation of the galaxy clustering, shear and convergence fields is subject to a variety of systematic biases that must be accounted for in their modelling. In galaxy clustering analyses, the level of detail to which the relation between the observed galaxy overdensity and the predicted matter overdensity is modelled limits the range of scales that can be analysed. We consider a single-parameter linear bias model, b_g, that multiplies the clustering kernel. This restricts our analysis of galaxy clustering data to relatively large scales. Cosmic shear analyses must account for possible biases in galaxy shape estimation. Similarly, we consider a single multiplicative bias parameter, m. In addition, galaxy shapes are not only correlated by cosmic shear but also by intrinsic alignments (IAs) in the orientation of galaxies due to local interactions. Within the so-called Non-Linear Alignment model (NLA, Hirata & Seljak 2004) this can be accounted for by adding an extra contribution to the final shear kernel given by:

q_IA(χ) = A_IA(z) p(z) H(z)/c ,   (6)

where A_IA(z) is:

A_IA(z) = A_IA,0 [(1 + z)/(1 + z₀)]^{η_IA} (0.0139 Ω_m / D(z)) ,   (7)

where A_IA,0 and η_IA are two free parameters, z₀ is a redshift pivot (which we fix to z₀ = 0.62 as in Abbott et al. (2018); Troxel et al. (2018)), and D(z) is the linear growth factor. When the shear and convergence kernels enter Eq.
2 they must be multiplied by an ℓ-dependent correction to account for the differences between angular and three-dimensional derivatives of the matter density field. These corrections are given by:

G_ℓ^γ = √[(ℓ+2)!/(ℓ−2)!] / (ℓ + 1/2)² ,  G_ℓ^κ = ℓ(ℓ+1)/(ℓ + 1/2)² ,   (8)

for the shear and convergence kernels respectively. Since all the fields considered are ultimately projections of the three-dimensional matter density field, we only need to consider the matter auto-correlation power spectrum, P(k, z) ≡ P_mm(k, z), when computing Eq. 2. Assuming that the evolution of the matter power spectrum is scale-independent, P(k, z) can be obtained by first computing the value of the linear matter power spectrum at z = 0 on an array of scales. As we will discuss in Sect. 2.2, this can be done either using fitting formulas or emulating the solutions of Boltzmann codes. The linear matter power spectrum can then be evolved backwards in time as

P_L(k, z) = D²(z) P_0(k) ,   (9)

where D(z) is the linear growth factor found by solving the Jeans equation:

d/da [a³ H(a) dD/da] = (3/2) Ω_m H₀² D / (a² H(a)) .   (10)

However, in order to fit the small scales it is necessary to compute the non-linear evolution of the matter power spectrum, P_NL(k, z). We do so by applying the Halofit fitting function of Smith et al. (2003), with the revisions of Takahashi et al. (2012), to P_L(k, z), although Halofit may not be accurate enough for analyses of data from the latest surveys.

2.2. Auto-Differentiable Implementation

The implementation of the expressions discussed in Sect. 2.1 within LimberJack.jl is thus constrained by the demands of the AD methods used to obtain their gradients. The computation of Eq. 2 fundamentally involves two different types of quantities: the radial kernels and the matter power spectrum. On the one hand, computing the radial kernels boils down to evaluating the expansion history of the Universe, H(z), and its integral over time, i.e.
the comoving radial distance, ().Calculating the expansion history, as given by the ΛCDM model, amounts to evaluating ) on a grid of redshifts where Ω m and Ω  are the cosmological matter and radiation density respectively.The radial comoving distance is then obtained integrating over the grid using Simpson numerical integration which is generically compatible with AD methods.In the left and centre columns of Fig. 1 we show a comparison between the LimberJack.jl and CCL predictions for the expansion history and comoving distance between 0 <  < 1100 (top panels).Each panel contains a subpanel where the relative difference between the CCL and LimberJack.jlpredictions is shown.We find that the relative difference between the predictions of CCL and LimberJack.jl are smaller than 10 −4 at all redshift.Moreover we also show a comparison for the derivative of said quan-tities with respect to Ω m when computed by LimberJack.jl using AD and finite differences for the same redshift range (bottom panels).Similarly, each panel contains a subpanel where the relative difference between the AD and numerical gradient is shown.We find that the relative difference between the two methods is smaller than 10 −4 at all redshift.
On the other hand, computing the matter power spectrum in an AD-compatible way is a much more difficult problem. The first and most challenging obstacle is obtaining the linear matter power spectrum. As described in Sect. 2.1, computing the matter power spectrum involves solving a coupled system of linear differential equations with time- and scale-dependent coefficients. This task has proven a major challenge for most AD environments (Campagne et al. 2023), as it requires tweaking the lower-level programming languages. This is, however, not a problem for the Julia AD ecosystem, which has a (currently under development) full Boltzmann-Einstein solver, Bolt.jl (Li & Sullivan, in prep.). While still in its early days, Bolt.jl can provide users with full numerical solutions for the linear power spectrum and its derivatives with respect to the ΛCDM parameters. LimberJack.jl can then interface with Bolt.jl to obtain said predictions.
However, solving the full Boltzmann-Einstein equations is very computationally expensive (being the bottleneck of most cosmological analyses), even in fast programming languages such as C++ (CLASS), Fortran (CAMB) or Julia. Therefore, it is common to look for ways to bypass solving the Boltzmann-Einstein equations in cosmological analyses where speed is important. When the evolution of the matter power spectrum can be assumed to be scale-independent, P(k, z) can be computed by constructing a fitting formula that approximates the true value of the matter power spectrum at z = 0, P_0(k), on an array of scales, and then evolving P_0(k) into the past using the linear growth factor. The first commonly applied fitting formula for P_0(k) was the Bardeen-Bond-Kaiser-Szalay formula (BBKS, Bardeen et al. 1986), which modifies the Harrison-Zeldovich power spectrum (Harrison 1970; Peebles & Yu 1970; Zeldovich 1972) by a transfer function:

P_0(k) = A_s (k/k_p)^{n_s} T²(k) ,   (12)

where A_s is the amplitude of the matter power spectrum, k_p = 0.05 Mpc⁻¹ is a pivot scale, n_s is the spectral index and T(k) is the transfer function for a given set of fixed cosmological parameters. The most popular fitting formula is that of Eisenstein & Hu (1998, 1999, E&H from now on), which follows the same strategy as the BBKS formula of approximating the transfer function, but with a more complex expression that includes baryonic effects such as baryon acoustic oscillations and small-scale power suppression. Thanks to these inclusions, the E&H formula is accurate enough to return unbiased cosmological constraints for the 3x2-pt DES Y1 analysis, as shown in Campagne et al. (2023). However, it is important to keep in mind that the E&H formula will not be accurate enough to analyse future data sets, as well as some current ones.
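Because fitting formulas of this kind are purely algebraic, AD passes through them trivially. A minimal Python sketch of the zero-baryon BBKS transfer function and the resulting (unnormalised) linear power spectrum shape illustrates the idea (names and conventions here are our own; LimberJack.jl itself ships the more accurate E&H formula):

```python
import numpy as np

def bbks_transfer(k, Om, h):
    """Zero-baryon BBKS (Bardeen et al. 1986) transfer function.
    k in Mpc^-1; q rescales k by the shape parameter Om * h^2."""
    q = k / (Om * h * h)
    poly = 1.0 + 3.89 * q + (16.1 * q)**2 + (5.46 * q)**3 + (6.71 * q)**4
    return np.log(1.0 + 2.34 * q) / (2.34 * q) * poly**-0.25

def linear_pk_shape(k, ns, Om, h, kp=0.05):
    """Unnormalised z = 0 linear power spectrum shape:
    tilted primordial spectrum times the squared transfer function."""
    return (k / kp)**ns * bbks_transfer(k, Om, h)**2
```

Every operation is elementary, so derivatives with respect to Ω_m, h or n_s come for free from any AD framework.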
In recent years, a new family of fitting formulae, known as emulators, has grown in popularity. Emulators are computer models (such as neural networks or Gaussian processes) whose weights are optimised to reproduce a target function over a certain domain. The main advantage of emulators over traditional fitting formulae is that they automate the majority of the trial-and-error process of building an accurate fitting formula. Moreover, they tend to require less knowledge of the physical problem to be constructed. However, emulators require vast amounts of training data, and the resulting models tend to be far larger than traditional fitting formulae. Nonetheless, emulators have found great success in astrophysics in recent years, offering extremely accurate yet affordable approximations to different functions (Donald-McCann et al. 2022; Bonici et al. 2023), including the matter power spectrum (Spurio Mancini et al. 2022; Mootoovaloo et al. 2022).
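The mechanics of a Gaussian-process emulator can be sketched in a few lines: condition a GP on training evaluations of an expensive target function, then evaluate the cheap, smooth posterior mean in its place. The toy below emulates a one-dimensional function; real emulators such as EmuPk do the same for power-spectrum outputs over cosmological parameters (all names and kernel choices here are illustrative):

```python
import numpy as np

def rbf(xa, xb, ell=0.3):
    """Squared-exponential (RBF) kernel matrix between two point sets."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / ell)**2)

# "Expensive" target, evaluated once on a training grid
target = lambda x: np.sin(3.0 * x)
x_train = np.linspace(0.0, 2.0, 15)
y_train = target(x_train)

# Precompute the GP weights alpha = (K + jitter)^-1 y
K = rbf(x_train, x_train) + 1e-8 * np.eye(x_train.size)
alpha = np.linalg.solve(K, y_train)

def emulator(x_new):
    """GP posterior mean: a cheap, differentiable stand-in for target()."""
    return rbf(np.atleast_1d(np.asarray(x_new, dtype=float)), x_train) @ alpha
```

Once trained, the emulator is just a kernel-weighted sum, i.e. an algebraic expression through which AD differentiates as easily as through a fitting formula.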
The great advantage of these fitting formulae is that they effectively amount to algebraic expressions through which AD algorithms can easily differentiate, as shown in Piras & Spurio Mancini (2023) and Bonici et al. (2023). Combined with their speed, this often makes fitting formulae the preferred way to obtain estimates of the primordial power spectrum in gradient-based analyses. For these reasons, LimberJack.jl is equipped with both a native Julia implementation of the E&H formula and the recently developed Gaussian-process-based emulator EmuPk (Mootoovaloo et al. 2022) for the linear matter power spectrum.
The next challenge is to compute the growth factor, D(z), needed to evolve the linear power spectrum. As discussed in Sect. 2.1, the growth factor is obtained by solving the Jeans equation (see Eq. 10), which is an inhomogeneous ordinary differential equation. Differentiating through the solution of a differential equation can be done by writing a recursive numerical scheme to solve it. These schemes amount to a series of linear operations that update mutating variables. While mutating variables can pose challenges for certain backwards modes of AD, the numerical schemes can otherwise be differentiated through to yield gradients of the solution of a differential equation. Thus, LimberJack.jl solves the Jeans equation using a second-order Runge-Kutta solver. Returning to Fig. 1, in the right column of this figure we show a comparison between the LimberJack.jl and CCL predictions for the linear growth factor (top panels). We also show a comparison between the derivatives of LimberJack.jl's D(z) with respect to Ω_m when computed using AD and finite differences (bottom panels) for the redshift range 0 < z < 3. Again, each panel contains a subpanel where the relative difference between the two compared quantities is shown. Concerning the growth factor itself, we find that the relative difference between the CCL and LimberJack.jl predictions is smaller than 10⁻⁴ over the whole redshift window. Concerning the derivatives of the growth factor, the relative differences between AD and finite differences are again smaller than 10⁻⁴ over the whole redshift window.
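A minimal sketch of such a scheme: rewriting the Jeans equation (Eq. 10) as a first-order system in (D, F), with F ≡ a³E(a) dD/da and E = H/H₀, and stepping it with a midpoint (second-order Runge-Kutta) integrator yields a sequence of purely algebraic updates that forward-mode AD can trace through. The Python below is illustrative; the grid choices and initial conditions are our own, and radiation is neglected:

```python
import numpy as np

def E(a, Om):
    """Flat LCDM dimensionless expansion rate H(a)/H0 (radiation neglected)."""
    return np.sqrt(Om / a**3 + (1.0 - Om))

def growth_factor(Om, a_ini=0.01, a_end=1.0, n=2000):
    """Integrate d/da (a^3 E dD/da) = 1.5 Om D / (a^2 E) with midpoint RK2.
    State: y = (D, F), F = a^3 E dD/da; D ~ a deep in matter domination."""
    def rhs(a, D, F):
        return F / (a**3 * E(a, Om)), 1.5 * Om * D / (a**2 * E(a, Om))
    da = (a_end - a_ini) / n
    a, D, F = a_ini, a_ini, a_ini**3 * E(a_ini, Om)  # D = a, dD/da = 1
    for _ in range(n):
        dD1, dF1 = rhs(a, D, F)
        dD2, dF2 = rhs(a + 0.5 * da, D + 0.5 * da * dD1, F + 0.5 * da * dF1)
        D, F = D + da * dD2, F + da * dF2
        a += da
    return D
```

For an Einstein-de Sitter universe (Ω_m = 1) the solution is D(a) = a exactly, which provides a simple correctness check; for Ω_m < 1 the late-time growth is suppressed by dark energy.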
To compute the necessary non-linear corrections at small scales, LimberJack.jl is equipped with an auto-differentiable implementation of Halofit as given by Smith et al. (2003) with the revisions of Takahashi et al. (2012). The biggest obstacle to accomplishing this is differentiating through the root-finding process that occurs within Halofit. Similarly to solving differential equations, root finding can be differentiated through by writing a recursive numerical scheme of linear operations with mutable variables (Margossian & Betancourt 2021). The Halofit implementation within LimberJack.jl uses the secant method for its root finding, through which most AD methods can differentiate. Putting all of this together, LimberJack.jl can perform a fully auto-differentiable computation of the non-linear matter power spectrum at any redshift or scale.

Fig. 1.— Left column: the top panel compares the LimberJack.jl (solid green) and CCL (dashed red) computations of the expansion history for 0 < z < 1100; the bottom panel compares the derivative of the expansion history with respect to Ω_m computed using AD (solid blue) and finite differences (dashed orange) over the same redshift range. Middle column: the same comparisons for the comoving distance for 0 < z < 1100. Right column: the same comparisons for the linear growth factor for 0 < z < 3. Each panel contains a subpanel showing the relative difference between the two compared quantities.

In Fig. 2 we show a comparison between the predictions for the non-linear matter power spectrum from LimberJack.jl and CCL for z = [0.0, 0.5, 1.0, 2.0] and −4 < log₁₀(k) < 2 (top panels). Both the CCL and LimberJack.jl predictions are obtained using the E&H linear power spectrum; Halofit was then used to obtain the non-linear corrections. Moreover, we also show a comparison between the derivatives of the LimberJack.jl power spectra with respect to Ω_m when computed using AD and finite differences (bottom panels). Finally, each panel contains a subpanel where the relative difference between the two compared quantities is shown. Concerning the power spectra, we find that the relative difference between the CCL and LimberJack.jl predictions is smaller than 10⁻³ over the whole range. However, we observe that the Halofit implementation of LimberJack.jl systematically over-predicts the non-linear matter power spectrum at higher redshifts. As we will see, this bias manifests when considering angular power spectra involving the CMB lensing tracer later on, but it nonetheless does not prevent us from achieving our accuracy goals. In the future, we aim to implement more accurate prescriptions of the non-linear matter power spectrum such as HMCode (Mead et al. 2021) or baccoemu (Angulo et al. 2021). Concerning the derivatives of the same power spectra, the relative differences between AD and finite differences are again smaller than 10⁻³ for all redshifts.
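The root finds inside Halofit (e.g. solving for the scale at which the smoothed density variance crosses unity) reduce to a secant iteration, a short sequence of algebraic updates that AD can trace through. A generic sketch follows (illustrative, not the exact LimberJack.jl implementation):

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Secant root finding:
    x_{k+1} = x_k - f(x_k) (x_k - x_{k-1}) / (f(x_k) - f(x_{k-1})).
    Each update is algebraic, so AD frameworks can differentiate through it."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:                      # converged to machine precision
            break
        x0, f0, x1 = x1, f1, x1 - f1 * (x1 - x0) / (f1 - f0)
        f1 = f(x1)
        if abs(x1 - x0) < tol:
            break
    return x1
```

An alternative to unrolling the iteration is the implicit function theorem, which gives the derivative of the root directly from the derivatives of f at the solution (Margossian & Betancourt 2021); unrolling is simply the most AD-transparent option.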
The only step left to compute angular power spectra is to bring the radial kernels and the matter power spectrum together and perform the Limber integral (see Eq. 2). Similarly to the computation of comoving distances, this is done within LimberJack.jl by evaluating all the quantities within the integrand on a regular array of logarithmic scales, log₁₀(k), and then performing the numerical integral for the desired multipoles, ℓ, using Simpson numerical integration. One small challenge in doing so is finding the redshift associated with the comoving distance given by the scale and the multipole, in order to evaluate the radial kernels. This inconvenience is a result of LimberJack.jl defining the radial kernels as functions of redshift instead of comoving distance. Normally, finding the redshift at a given comoving distance would involve a costly root-finding process. However, LimberJack.jl handles this by building an interpolator between redshift and comoving distance and then inverting the interpolator to establish a straightforward comoving-distance-to-redshift map for a given set of cosmological values.

Fig. 2.— Top panels show a comparison between the LimberJack.jl (solid green) and CCL (dashed red) computations of the non-linear matter power spectrum for z = [0.0, 0.5, 1.0, 2.0] and −4 < log₁₀(k) < 2. Both LimberJack.jl and CCL used the E&H formula to compute the linear matter power spectrum; Halofit was then used to obtain the non-linear corrections in both cases. Bottom panels show a comparison between the derivatives of the LimberJack.jl matter power spectra with respect to Ω_m computed using AD (solid blue) and finite differences (dashed orange). Each panel contains a subpanel where the relative difference between the two compared quantities is shown.
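Schematically, the Limber integral of Eq. 2 then reduces to another Simpson sum over the χ grid, with the kernels evaluated through the inverted χ(z) map. A toy Python version (the kernels, power spectrum and χ-to-z map passed in below are placeholders, not LimberJack.jl's):

```python
import numpy as np

def limber_cl(ells, chi, z_of_chi, q_u, q_v, pk):
    """C_l = int dchi q_u(chi) q_v(chi) / chi^2 * P(k = (l + 1/2)/chi, z(chi)),
    as a composite Simpson sum over a regular chi grid of odd length."""
    dchi = chi[1] - chi[0]
    w = np.ones(chi.size)
    w[1:-1:2], w[2:-1:2] = 4.0, 2.0          # Simpson weights
    z = z_of_chi(chi)                        # inverted distance-redshift map
    cls = []
    for ell in ells:
        k = (ell + 0.5) / chi                # Limber wavenumber per chi node
        integrand = q_u(chi, z) * q_v(chi, z) / chi**2 * pk(k, z)
        cls.append(dchi / 3.0 * np.sum(w * integrand))
    return np.array(cls)
```

With unit kernels and a constant power spectrum the integral has the closed form 1/χ_min − 1/χ_max, which makes the quadrature easy to validate.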
In Fig. 3 we show a comparison between the predictions of LimberJack.jl and CCL for the angular power spectra of different types of tracers for 10 < ℓ < 1000. We consider the auto- and cross-correlations of galaxy clustering, cosmic shear and CMB lensing convergence. In all cases the E&H formula was paired with Halofit to obtain the matter power spectrum. We also assume a Gaussian redshift distribution centred at z = 0.5 with a standard deviation σ_z = 0.05, sampled at 1000 evenly-spaced intervals in the range 0 < z < 2. We observe that the discrepancy between LimberJack.jl and CCL is smaller than 10⁻³ for all angular power spectra except for the auto-correlation of CMB lensing, where the discrepancy is larger but nonetheless stays below 10⁻². This is due to the discrepancy in the evolution of the Halofit non-linear matter power spectrum between LimberJack.jl and CCL shown in Fig. 2. Similarly, in Fig. 4 we show a comparison for the derivatives of the same quantities with respect to Ω_m when computed using AD and finite differences. The derivative of the clustering tracer proved to be extremely sensitive to the resolution of the numerical integration scheme used to normalise the galaxy distributions, leading to the observed oscillatory behaviour. Despite this, we nonetheless observe sub-percent-level agreement between the two approaches in all cases. It is also worth remembering that disagreements between AD and finite differences do not necessarily imply a mistake in the AD. Indeed, finite differences are more likely to differ from the underlying true derivative, given that they are subject to truncation errors.
In this section we have only displayed derivatives of the theoretical predictions of LimberJack.jl with respect to Ω_m. For completeness, in App. A we show the derivatives of the non-linear matter power spectrum, as well as of the auto- and cross-correlation power spectra of galaxy clustering, weak lensing and CMB lensing, with respect to the five ΛCDM parameters.

2.3. Gradient-based Samplers
In the previous sections we described how the computation of angular power spectra (Sect. 2.1) is made compatible with AD methods within LimberJack.jl (Sect. 2.2). In this section, we describe how the resulting gradients can be used to significantly speed up the statistical inference of cosmological parameters, the goal of this paper.
As described in Sect. 1, stochastic inference methods become prohibitively expensive as the number of parameters (dimensions) increases. Gradient-based samplers provide a solution to this problem by using the gradient of the posterior to guide the exploration of the parameter space towards the regions where the probability density of the likelihood is highest.
Hamiltonian Monte Carlo (HMC, MacKay 2002) is a sampling algorithm that simulates Hamiltonian trajectories in parameter space to guide the exploration. In order to do so, HMC duplicates the parameter space by introducing a set of auxiliary momentum variables, p, independent of the likelihood and sampled from a d-dimensional unit-variance Gaussian distribution. The joint distribution of the position (i.e. the original parameters of interest, θ) and momentum variables is then defined by a Hamiltonian function which governs the dynamics of the system:

H(θ, p) = −log P(θ|d) + (1/2) pᵀ M⁻¹ p ,   (13)

where M is the mass matrix, a positive-definite matrix that controls the mixing of the momentum variables.

Fig. 3.— Comparison between the LimberJack.jl (solid green) and CCL (dashed red) computations of the auto- and cross-correlation angular power spectra of galaxy clustering, weak lensing and CMB lensing tracers for the range of multipoles 1 < log₁₀(ℓ) < 3. Both LimberJack.jl and CCL used the E&H formula to compute the linear matter power spectrum; Halofit was then used to obtain the non-linear corrections in both cases.
In each iteration of HMC, the Hamiltonian dynamics are simulated using a numerical integration scheme to generate the next sample. This is done using a leapfrog scheme, chosen for its symplectic properties, which preserve the volume of phase space along the trajectory. The sample is then accepted or rejected using a Metropolis adjustment based on its Hamiltonian energy. This ensures that the target posterior distribution is recovered upon marginalising over the momentum variables, i.e.:

P(θ|d) ∝ ∫ dp e^{−H(θ, p)} .   (14)

While HMC can dramatically speed up the exploration of large parameter spaces, it also introduces a series of challenges. Namely, the user must find the ideal step size for the numerical solver of the Hamiltonian trajectories, the ideal length of the trajectories and an expression for the mass matrix. The No-U-Turn Sampler (NUTS) (Hoffman & Gelman 2011) is an HMC algorithm that dynamically tunes these quantities based on only two user-provided quantities: the Target Acceptance Probability (TAP) of the Metropolis adjustment and the number of adaptation steps. The fundamental idea behind NUTS is to prevent the sampler from returning to previously explored regions by avoiding U-turns in phase space. This is done by evolving the Hamiltonian trajectories both backwards and forwards in time (known as branching) and stopping when the momenta start to turn on themselves on either branch. This establishes a principled criterion for the length of the Hamiltonian trajectories that maximises the statistical independence of the samples. Moreover, during the adaptation phase the step size of the integrator is tuned such that the TAP is achieved. Finally, NUTS tunes the mass matrix using the variance of the adaptation-phase samples, such that the covariance matrix of the momentum variables is as close as possible to the identity.

Fig. 4.— Comparison between the derivatives of the auto- and cross-correlation angular power spectra of galaxy clustering, weak lensing and CMB lensing tracers with respect to Ω_m, computed using AD (solid blue) and finite differences (dashed orange) for the range of multipoles 1 < log₁₀(ℓ) < 3. The E&H formula was used to compute the linear matter power spectrum; Halofit was then used to obtain the non-linear corrections.

Fig. 5.— Marginalised posterior distributions from the analysis of DES Y1 3x2-pt data for the cosmological parameters Ω_m, σ_8 and S_8. The lower triangle compares the LimberJack.jl (solid blue) and CCL (dashed red) analyses when using the E&H formula. The upper triangle compares the LimberJack.jl (solid purple) and CCL (dashed green) analyses when using EmuPk and CLASS respectively. Cobaya contours were obtained using the MH sampler, while LimberJack.jl contours were obtained using the NUTS sampler.
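A single HMC transition, leapfrog integration of the Hamiltonian followed by the Metropolis adjustment, can be sketched as follows (the standard algorithm with a unit mass matrix; an illustration, not the implementation used in this work):

```python
import numpy as np

def hmc_step(logp, grad_logp, q, step, n_leap, rng):
    """One HMC transition targeting exp(logp(q)), with M = identity."""
    p = rng.standard_normal(q.shape)              # resample momenta
    H0 = -logp(q) + 0.5 * p @ p                   # initial Hamiltonian energy
    q_new, p_new = q.copy(), p.copy()
    p_new += 0.5 * step * grad_logp(q_new)        # initial half kick
    for _ in range(n_leap - 1):
        q_new += step * p_new                     # drift
        p_new += step * grad_logp(q_new)          # full kick
    q_new += step * p_new                         # final drift
    p_new += 0.5 * step * grad_logp(q_new)        # final half kick
    H1 = -logp(q_new) + 0.5 * p_new @ p_new
    # Metropolis adjustment on the leapfrog energy error
    if rng.random() < np.exp(min(0.0, H0 - H1)):
        return q_new
    return q
```

Note that every step requires `grad_logp`, which is exactly where AD gradients of the likelihood enter; NUTS wraps this same transition with automatic tuning of `step`, the trajectory length and the mass matrix.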
RESULTS

In order to validate the constraints obtained using LimberJack.jl, we first replicate two different analyses of DES Y1 3x2-pt data (The Dark Energy Survey Collaboration 2005; Abbott et al. 2018) performed using the well-established Bayesian inference framework Cobaya, which employs a Metropolis-Hastings (MH) sampler. In the first analysis, both LimberJack.jl and Cobaya used the E&H formula to obtain the matter power spectrum. In the second analysis, Cobaya used the CLASS code to obtain the matter power spectrum while LimberJack.jl used the EmuPk emulator (Mootoovaloo et al. 2022), which was trained on CLASS. We use the García-García et al. (2021) (CGG21 from now on) angular power spectra and covariance matrix of the DES Y1 3x2-pt analysis, based on the DES Y1 cosmological catalogue (Abbott et al. 2018) (see Sects. 3.1, 4.1 and 4.2 of CGG21). The same scale cuts as in CGG21 (see Tab. 4 of CGG21) were applied to the data vector.
With this aim, we make use of the Julia library for statistical inference Turing.jl (Ge et al. 2018). Turing.jl is a probabilistic programming language (PPL) that allows the user to create statistical models by explicitly writing the relationship between the sampled parameters and the theoretical predictions for the observed data. Turing.jl uses this information to build a likelihood density function compatible with the Julia AD infrastructure, as long as the internal computations of the model are also compatible with AD. The user can then condition this function on the observations and sample it using a series of sampling back-ends. In this work we make use of the NUTS sampler as implemented in the Julia library AdvancedHMC.jl within Turing.jl. Moreover, we choose the AD library ForwardDiff.jl to provide NUTS with the gradient of the likelihood. Note that using forwards AD for statistical inference is sub-optimal, since the likelihood is a function that maps a high-dimensional space to a single scalar. In the future, we aim to implement efficient backwards AD in LimberJack.jl.
The same priors were used in the two cases for both the Cobaya and LimberJack.jl analyses. A summary of the priors can be found in Tab. 1. We show the resulting posteriors in Fig. 5. We observe an excellent agreement between the LimberJack.jl and Cobaya pipelines regardless of whether the E&H formula or CLASS/EmuPk is used to obtain the primordial matter power spectrum.
We compare the performance of the two samplers by looking at the number of effective samples (i.e. the number of statistically independent samples in the Markov chain; Gelman et al. 2013) per likelihood call. This metric is independent of the hardware used to run the analysis and of the time taken by LimberJack.jl and Cobaya to evaluate the likelihood. Thus, it allows us to isolate the improvement purely brought about by using gradient-based inference methods. Analysing the Markov chains of the different samplers, we find that NUTS is approximately 1.5 times more efficient than Cobaya at generating effective samples. However, in order to fairly compare the performance of the two samplers, we must take into account that NUTS computes the gradient of the likelihood at every step. Given the 25 parameters of the DES Y1 3x2-pt analysis, evaluating the gradient of the likelihood using ForwardDiff.jl is roughly 5 to 6 times more expensive than evaluating the likelihood itself, which is an order of magnitude faster than if we had used finite differences. In order to compensate for this extra cost, the efficiency improvement of NUTS over MH should be equal to or larger than the relative cost of computing the gradient of the likelihood. Therefore, the efficiency improvement of NUTS is not enough to compensate for the cost of computing the gradient in this particular application.
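For concreteness, the effective sample size can be estimated from the chain autocorrelation. The sketch below (illustrative Python, using a simple truncation at the first non-positive autocorrelation rather than whichever estimator Cobaya or Turing.jl implements internally) computes the quantity we then divide by the number of likelihood calls:

```python
import numpy as np

def effective_sample_size(chain):
    """ESS = N / (1 + 2 * sum(rho_t)), truncating at the first non-positive rho_t."""
    x = np.asarray(chain, dtype=float)
    n = len(x)
    x = x - x.mean()
    # Normalised autocorrelation function (O(N^2) direct sum, fine for a sketch).
    acf = np.array([np.dot(x[:n - t], x[t:]) / np.dot(x, x) for t in range(n)])
    tau = 1.0
    for t in range(1, n):
        if acf[t] <= 0.0:
            break
        tau += 2.0 * acf[t]
    return n / tau

rng = np.random.default_rng(0)
iid = rng.normal(size=4000)              # independent draws: ESS close to N
walk = np.cumsum(rng.normal(size=4000))  # random walk: strongly correlated, tiny ESS
ess_iid = effective_sample_size(iid)
ess_walk = effective_sample_size(walk)
# To compare samplers one would divide each ESS by the total number of
# likelihood evaluations, counting the extra calls NUTS spends on gradients.
```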
There exist two avenues to tilt the balance in favour of NUTS: on the one hand, increasing the efficiency of the sampler; on the other, reducing the cost of the likelihood gradient. However, the current bottleneck is the efficiency of the sampler, as even more specialised AD methods (such as Zygote.jl) would take at least twice the time of a likelihood evaluation to obtain its gradient. Future works could explore initialising the NUTS mass matrix using the posterior covariance estimate from variational inference algorithms such as PathFinder.jl (Zhang et al. 2021). Therefore, in order to showcase the strength of the methods developed in this work, we need an application for which the efficiency of gradient-based samplers truly outpaces that of traditional inference methods.

Growth Factor Reconstruction
In the previous section we showed the reliability of LimberJack.jl by reproducing the official DES Y1 3x2-pt analysis.
In this section we showcase how LimberJack.jl can be used to perform statistical inference on models outside the reach of traditional inference methods. In order to do so, we perform a model-independent reconstruction of the growth factor using a Gaussian process (GP) with more than a hundred parameters.
A GP is a collection of random variables (nodes), each of them sampled from a multivariate Gaussian distribution with a non-diagonal covariance (Rasmussen & Williams 2006; Seikel et al. 2012). Thus a GP f(x), where x is an arbitrary vector representing the positions of the nodes, is fully specified by a mean function, μ(x) ≡ E[f(x)], where E[· · ·] is the expectation value over the ensemble, and a covariance function, C(x, x′) ≡ E[(f(x) − μ(x))(f(x′) − μ(x′))]. In combination, the mean and covariance functions determine the statistical properties of the random variables that define the family of shapes the GP can take.
In this work we use a GP composed of 101 nodes equally spaced through the redshift window 0 ≤ z ≤ 3. The mean of the GP is given by the best-fit ΛCDM Planck 2018 (Planck Collaboration et al. 2020, P18 from now on) prediction for D(z). While this choice of mean will bias the reconstruction towards the P18 prediction outside of the data domain, its impact in the data-dominated regions is small (see e.g. Ruiz-Zapatero et al. 2022b,a for similar examples of this GP behaviour in the literature). Regarding the covariance matrix, we choose a squared-exponential covariance function, defined as: C(z, z′) = η² exp(−d(z, z′)/2ℓ²), (14) where η is the amplitude of the oscillations around the mean, ℓ is the correlation length between the GP realizations, and d(z, z′) is the distance matrix of z and z′ given by d_ij = |z_i − z′_j|². This decision was made based on the fact that the squared exponential is a computationally inexpensive and infinitely differentiable kernel, appropriate to model smooth fluctuations around the mean of the GP.
Therefore, the growth factor is given by: D(z) = D_P18(z) + L_C(C(z, z)) α, (15) where α is a vector of the GP nodes sampled from a unit-variance, diagonal, multivariate normal distribution, z denotes the redshift array at which said nodes sit, D_P18(z) is the P18 growth factor evaluated at z, and L_C(C(z, z)) is the lower-triangular matrix obtained from the Cholesky decomposition of the covariance matrix. A way of interpreting Eq. (15) is as a rotation given by L_C acting on a vector of white noise α, which imposes the correlations of the GP kernel. This model is similar to the one considered in CGG21, but using a GP to reconstruct the growth factor instead of splines. The reasoning behind the choice of GPs over splines for this work is threefold. First, GPs offer a well-defined measure of the uncertainty in their predictions, which makes assessing the statistical significance of their results straightforward. Second, GPs are not subject to the strict assumptions that can bias spline reconstructions, such as the choice of linear or cubic interpolation. However, it is important to bear in mind that GPs are far from assumption-free, as their structure is constrained by the properties of the chosen covariance kernel. Nonetheless, as we will show, these assumptions become negligible when the data is constraining enough. Third, GPs are as differentiable as their covariance kernel (Rasmussen & Williams 2006). The derivative of a GP f(z) ∼ N(μ(z), C(z, z′)) is another GP given by f′(z) ∼ N(∂_z μ(z), ∂_z ∂_z′ C(z, z′)). This is a very desirable feature that will allow us to use growth-rate measurements (i.e. measurements of the gradient of the growth factor) to further constrain the reconstructed growth. As will be shown, the growth-rate measurements highly restrict the evolution of the growth.
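Eq. (15) amounts to colouring a vector of white noise with the Cholesky factor of the kernel. A minimal sketch (illustrative Python with NumPy; the paper's implementation is in Julia, and the 1/(1+z) mean below is a hypothetical placeholder, not the real P18 growth factor):

```python
import numpy as np

def sq_exp_cov(z, eta, ell):
    """Squared-exponential kernel C_ij = eta^2 exp(-|z_i - z_j|^2 / (2 ell^2))."""
    d = (z[:, None] - z[None, :]) ** 2
    return eta**2 * np.exp(-d / (2.0 * ell**2))

rng = np.random.default_rng(42)
z = np.linspace(0.0, 3.0, 101)          # 101 equally spaced nodes, 0 <= z <= 3
C = sq_exp_cov(z, eta=0.1, ell=0.5)     # eta, ell fixed (illustrative values)
L = np.linalg.cholesky(C + 1e-10 * np.eye(z.size))  # jitter for numerical stability

D_mean = 1.0 / (1.0 + z)                # placeholder mean, NOT the real P18 D(z)
alpha = rng.normal(size=z.size)         # unit-variance white noise (the GP nodes)
D = D_mean + L @ alpha                  # Eq. (15): mean plus correlated fluctuations
```

The jitter term added before the Cholesky decomposition is a standard numerical safeguard for the nearly singular squared-exponential kernel; the actual values of η and ℓ used in the analysis are those listed in Tab. 5.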
In order to constrain this model we use the south data collection described in CGG21, composed of the 3x2-pt DES Y1 data, the auto-correlation of eBOSS DR16 quasars, and the cross-correlations of CMB lensing data with eBOSS DR16 quasars, DES Y1 clustering and DES Y1 weak lensing data. Therefore, our analysis combines a total of 7 two-point statistics, which we will refer to as "7×2-pt" hereafter. These particular cross- and auto-correlations correspond to the physical overlap of the different surveys in the sky shown in Fig. 7. We explicitly list the considered auto- and cross-correlations in Tab. 2. In this table we can see that the DES Y1 galaxy clustering (GC) data is divided into 5 redshift tomographic bins. The DES Y1 weak lensing (WL) data is divided into 4 redshift tomographic bins. Similarly, the eBOSS DR16 quasar data is divided into 2 redshift tomographic bins. We used the Planck 2018 lensing convergence map. We process all these data following the CGG21 angular power spectra analysis described in Sects. 3.1, 3.2, 3.3, 4.1 and 4.2. Thus, we consider a total of 42 different angular power spectra, which amount to 665 different data points. The associated covariance matrix of these data was computed using Cosmoteka (García-García et al., in prep.) and is shown in Fig. 6. The scale cuts considered for each angular power spectrum are listed in the triangle of Tab. 2. For a detailed description of these data we refer the reader to Sect. 3 of CGG21. Besides the aforementioned data, we also consider a collection of redshift-space distortion (RSD) measurements by the BOSS DR12 (Alam et al. 2017), eBOSS DR16 (Alam et al. 2021), WiggleZ (Blake et al. 2012), 6dF (Beutler et al. 2012) and VIPERS (Pezzotta et al. 2017) surveys. This analysis constitutes the first combination of all these different data, dramatically improving the constraints by constraining the evolution of the growth factor. We summarise these data in Tab. 3 and plot them in Fig. 8. For a full description of these data we refer the reader to Ruiz-Zapatero et al. (2022b).
These data allow us to constrain the growth factor and its derivative across redshift. In order to do so, we relate fσ8(z) with the reconstructed D(z) by fσ8(z) = −σ8^P18 (1 + z) dD(z)/dz, (16) where σ8^P18 is the σ8 of the ΛCDM P18 cosmology and D(z) is given by Eq. 15. The main difference in Eq. 16 with respect to the ΛCDM model is that σ8 is kept fixed. This is because in our ΛCDM analysis we defined D(z = 0) = 1.0, such that the amplitude of the growth factor varies through σ8. Unlike in ΛCDM, in our GP model the amplitude of the growth factor is not fixed, but varies with each GP realisation. Therefore, the role of σ8 would be completely degenerate with the amplitude of the z = 0 GP node and hence is kept fixed.

TABLE 2. Shows the auto- and cross-correlations that form the 7×2-pt data vector. The first row below the title shows the names of the data sets, namely DES Y1 galaxy clustering (GC) and weak lensing (WL), eBOSS DR16 quasars (eBOSS) and the Planck 2018 CMB lensing convergence map (Planck CMB). The row below shows the tomographic bins of each survey. The triangle table shows the scale cuts in angular scales (ℓ) applied to each angular power spectrum. In total, the 7×2-pt data vector adds up to 665 data points.

7×2-pt Data Vector
Once we have an expression for fσ8(z) in our GP model, the next question is how to evaluate it. While this might seem straightforward, the gradient of the growth factor is extremely sensitive to numerical errors, which can lead to non-physical predictions for fσ8(z). One solution to this problem would be to compute the gradient analytically using the properties of GPs discussed above. However, in practice, evaluating the derivative of the GP analytically involves treating said derivative as a second GP with a cross-covariance matrix with the original GP. Computing and inverting the cross-covariance matrix between the original GP and its derivative at every step of the Markov chain, for the number of nodes used in this analysis, proved too computationally expensive.
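For reference, for the squared-exponential kernel the required cross-covariance is available in closed form: Cov[f(z), f′(z′)] = ∂C/∂z′ = ((z − z′)/ℓ²) C(z, z′). The sketch below (illustrative Python, not LimberJack.jl code) checks this analytic derivative against a central finite difference:

```python
import numpy as np

def sq_exp(z1, z2, eta=1.0, ell=0.5):
    """Squared-exponential kernel C(z1, z2)."""
    return eta**2 * np.exp(-((z1 - z2) ** 2) / (2.0 * ell**2))

def dC_dz2(z1, z2, eta=1.0, ell=0.5):
    """Analytic cross-covariance Cov[f(z1), f'(z2)] = dC/dz2 for the SE kernel."""
    return (z1 - z2) / ell**2 * sq_exp(z1, z2, eta, ell)

# Sanity check against a central finite difference.
z1, z2, h = 0.7, 1.1, 1e-6
fd = (sq_exp(z1, z2 + h) - sq_exp(z1, z2 - h)) / (2.0 * h)
```

Building the full cross-covariance block is therefore cheap per element; the expensive part, as noted above, is inverting the resulting enlarged matrix at every step of the chain.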
The solution found was to use the noise properties of the GP to interpolate the GP to a finer grid, such that a finite difference can be taken to obtain the gradient with enough precision. Effectively, this is done by applying a Wiener filter to the original GP, where the kernel used for the filtering is the covariance matrix of the GP. In this scenario, the Wiener filter is simply given by W(z∗, z) = C(z∗, z) C(z, z)⁻¹, where z∗ is the finer array of node redshifts. The resampled growth factor is then given by D(z∗) = W(z∗, z) D(z). While this approach also involves inverting a matrix, this is just the covariance matrix of the GP, which is a much cheaper operation.
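The resampling scheme can be sketched as follows (illustrative Python; the 1/(1+z) mean and the σ8 value are hypothetical placeholders rather than the P18 quantities, and the filter is applied to the fluctuations around the mean):

```python
import numpy as np

def sq_exp_cov(za, zb, eta=0.1, ell=0.5):
    """Squared-exponential kernel between two redshift arrays."""
    return eta**2 * np.exp(-((za[:, None] - zb[None, :]) ** 2) / (2.0 * ell**2))

rng = np.random.default_rng(1)
z = np.linspace(0.0, 3.0, 101)                       # coarse GP nodes
zs = np.linspace(0.0, 3.0, 1001)                     # finer resampling grid
mean = 1.0 / (1.0 + z)                               # placeholder mean, not P18 D(z)
mean_s = 1.0 / (1.0 + zs)

C = sq_exp_cov(z, z) + 1e-10 * np.eye(z.size)        # jitter for numerical stability
D = mean + np.linalg.cholesky(C) @ rng.normal(size=z.size)

# Wiener filter W(z*, z) = C(z*, z) C(z, z)^-1, applied to the fluctuations.
W = sq_exp_cov(zs, z) @ np.linalg.inv(C)
D_s = mean_s + W @ (D - mean)

# Finite-difference gradient on the fine grid, then fsigma8 as in Eq. (16).
fsigma8 = -0.81 * (1.0 + zs) * np.gradient(D_s, zs)  # 0.81: illustrative sigma8 only
```

At the coarse node positions the filter reduces to the identity, so the resampled curve passes through the original nodes; in between, it interpolates consistently with the kernel, which is what makes the subsequent finite difference stable.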
Having discussed our model, it is time to discuss how theory and observations are brought together to constrain it. We built a Gaussian likelihood assuming the RSD and angular power spectra measurements are completely uncorrelated, such that the final likelihood is the product of the likelihoods of the individual data types. This is a fair assumption given that the RSD data is mostly located in the northern hemisphere, whereas the angular power spectra are located further south, resulting in a small overlap between the surveys. We then derived constraints for the GP and cosmological parameters by applying Bayes theorem using the priors displayed in Tab. 5.
Note that the GP hyperparameters η and ℓ were kept fixed. On the one hand, fixing the hyperparameters avoids the volume effects on the posteriors expected from including these parameters. On the other hand, fixing the hyperparameters to a particular set of values will introduce certain biases in the final reconstruction of the growth factor, especially in the regions where data is sparse. For example, fixing the correlation length is expected to induce spurious oscillations in the reconstructed function outside of the data range. Nonetheless, the decision to keep both parameters fixed was made as freeing them introduces severe non-Gaussian degeneracies in the posteriors, which pose a challenge even to gradient-based methods such as HMC or NUTS. This is due to their hierarchical relationship with the GP nodes. Future studies that wish to undertake a completely model-independent reconstruction will need to explore inference methods specifically tailored to deal with such non-Gaussianity, such as Riemannian HMC (Betancourt 2012).
The number of parameters listed in Tab. 5 adds up to 128, a dimensionality which requires gradient-based samplers. For this reason, we employed a Gibbs sampling set-up in which the GP parameters were sampled by one NUTS sampler, keeping all other parameters fixed, and then the rest of the parameters (cosmology and nuisances) were sampled by their own NUTS sampler, keeping the GP parameters fixed to their last sample. Moreover, the chains were started at the DES Y1 ΛCDM best-fit cosmology, with the GP nodes starting from zero. The Gibbs scheme combined with this starting point proved to greatly increase the efficiency of the NUTS adaptation phase in this high-dimensional space.
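The alternating-block scheme can be illustrated on a toy posterior (illustrative Python; a random-walk Metropolis update stands in for the two NUTS samplers used in the actual analysis, since only the blocking structure matters here):

```python
import numpy as np

rng = np.random.default_rng(0)

def logpost(a, b):
    """Toy log-posterior: two coupled Gaussian blocks standing in for
    the GP nodes (a) and the cosmology + nuisance parameters (b)."""
    return -0.5 * (a @ a + b @ b + 0.5 * a.sum() * b.sum())

def update_block(x, logp, step, rng):
    """One random-walk Metropolis update of a single parameter block."""
    prop = x + step * rng.normal(size=x.shape)
    if np.log(rng.uniform()) < logp(prop) - logp(x):
        return prop
    return x

a = np.zeros(3)   # block 1: e.g. GP node amplitudes, started at zero
b = np.ones(3)    # block 2: e.g. cosmology + nuisances, started at a "best fit"
chain = []
for _ in range(2000):
    a = update_block(a, lambda x: logpost(x, b), 0.5, rng)  # block 1, block 2 frozen
    b = update_block(b, lambda x: logpost(a, x), 0.5, rng)  # block 2, block 1 frozen
    chain.append(np.concatenate([a, b]))
chain = np.array(chain)
```

Conditioning each block on the last sample of the other keeps every individual update low-dimensional, which is what eased the NUTS mass-matrix adaptation in the 128-parameter analysis.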
In total, four analyses were run. First, we ran two ΛCDM analyses, one of the 7×2-pt data and another of the 7×2-pt + RSD data. In these analyses, we used the ΛCDM model to predict the growth factor, as opposed to performing a GP reconstruction. Then the same two analyses were run, but performing the GP reconstruction. In addition to these, we also re-ran the CGG21 analysis of the 7×2-pt data using the priors listed in Tab. 5, in order to be able to compare our results against the Cobaya pipeline.
We also study the sampling efficiency in each of these four cases. A summary of our analysis can be found in Tab. 4. Starting with ΛCDM, we find that the effective sample size per likelihood call of the NUTS algorithm with the 7×2-pt data is once again approximately 1.5 times larger than when using the MH algorithm. Remarkably, we found that computing the gradient of the likelihood in this analysis using AD was roughly 5 times more expensive than computing the likelihood itself, the same relative cost as for the DES Y1 likelihood gradient. Adding RSD data to the ΛCDM analysis results in virtually the same effective sample size per likelihood call when using the NUTS sampler. This is to be expected given that the posteriors remain Gaussian and the number of dimensions is unchanged. Looking at the GP analyses, the addition of 101 extra parameters renders the MH algorithm computationally unfeasible. Thus, we cannot directly compare the effective sample sizes per likelihood call of the two samplers. However, we can still draw comparisons with the previous analyses. We find that the efficiency of the NUTS sampler when performing a GP reconstruction based on the 7×2-pt data is around half the efficiency of the ΛCDM analysis of the same data using the MH algorithm. Moreover, if we add RSD data to the analysis, we find that the efficiency of the sampler becomes virtually identical to that of the ΛCDM analysis of the 7×2-pt data using the MH algorithm. This is due to the RSD data constraining the derivative of the growth and, therefore, putting stronger constraints on the GP nodes. Taking into account the cost of the likelihood gradient, we find that adding an extra 101 free parameters increases the relative cost of the gradient of the likelihood to a factor of ∼30 when using AD. However, using AD to obtain the gradient of the likelihood remains an order of magnitude cheaper than using finite differences, despite the extra 101 free parameters. This showcases the favourable scaling of AD methods with the dimensionality of the problem. Hypothetically, an even more favourable scaling could be achievable by implementing an efficient reverse-mode AD algorithm to obtain the gradient. In combination, these two achievements managed to produce converged constraints for our GP analyses in O(10²) CPU hours. For reference, our ΛCDM analysis of 7×2-pt data using the MH sampler took 24 CPU hours to converge. Thus, our methods make analyses with O(100) parameters computationally feasible while leaving ample room for speed-ups.

ΛCDM Results
We start by analysing our data using a traditional ΛCDM model, where the growth factor is obtained from the ΛCDM parameters by solving the Jeans equation for the matter anisotropies. This allows us to establish a frame of reference against which to compare the results of the GP reconstruction. Moreover, it also allows us to ground our analysis by comparing it with previous analyses of the same combination of data in the literature, namely CGG21.
In Fig. 9 we show the Ωm, σ8 and S8 posteriors obtained when performing a ΛCDM analysis of the 7×2-pt data set with and without including additional RSD data (green and red contours, respectively). We also show the contours obtained by the CGG21 pipeline when analysing the 7×2-pt data using black dashed lines. Our ΛCDM analysis of the 7×2-pt data found Ωm = 0.287 ± 0.027, σ8 = 0.809 ± 0.045 and S8 = 0.789 ± 0.016.

GP Results
We are now in a position to start discussing the GP reconstruction of the growth factor. We start by considering how introducing the GP affects the parameter constraints previously discussed in Sect. 3.2.1. In order to establish a comparison, we derive constraints for σ8 ≡ σ8(z = 0) and S8 ≡ S8(z = 0) from the GP reconstruction of the growth factor. From Eq. 16, we can see that within the GP model: σ8(z) = σ8^P18 D(z), where D(z) is given by Eq. 15.
In Fig. 10 we show four different sets of posteriors for the parameters Ωm, σ8 and S8. In the lower triangle we show the contours obtained when using the ΛCDM model (red) and the GP model (blue) to analyse the 7×2-pt data. Similarly, in the upper triangle we show the contours obtained when using the ΛCDM model (red) and the GP model (blue) to analyse the 7×2-pt data in addition to RSD data. The associated numerical constraints for the GP reconstructions are Ωm = 0.289 ± 0.026, σ8 = 0.839 ± 0.067 and S8 = 0.839 ± 0.067 when 7×2-pt alone is considered, and Ωm = 0.286 ± 0.023, σ8 = 0.815 ± 0.039 and S8 = 0.793 ± 0.017 when RSD data is included. The full posteriors can be found in the last two columns of Tab. 6. Performing the GP reconstruction of the growth factor greatly reduces the constraining power of the 7×2-pt data. We observe ∼30% wider constraints on average across the three parameters when using the GP model to analyse the data. It is important to note that the overwhelming majority of the impact occurs in the parameters explicitly related to the growth factor, such as σ8 and S8, and the linear bias parameters of the different probes. Other parameters, such as h, remain virtually unchanged. The σ8 constraint is also centred at a significantly higher value than when performing a ΛCDM analysis. This is due to the lack of data at z = 0 to constrain the GP, which is reflected in the large error bar of the constraint. When RSDs are included, the degradation in constraining power when performing the GP reconstruction is much smaller, with constraints only being ∼10% wider. Moreover, the S8 constraint becomes centred at exactly the same value as when performing the ΛCDM analysis of the same data. Nonetheless, due to the larger error bar, the tension with the P18 ΛCDM value drops to 1.7σ.
In order to understand how the inclusion of RSD data leads to these changes, we have to look at the reconstructed growth factor. In each of the rows of Fig. 11 we show the constraints obtained for D(z), σ8(z) and fσ8(z) (in this order) as functions of redshift. In each panel we compare the constraints obtained when using the ΛCDM model against the GP reconstruction. The left-column panels show analyses where only the 7×2-pt data was used, while the right column shows the respective analyses when RSD data were included. Moreover, each panel has a subpanel showing the evolution of the 1σ confidence interval of each function over redshift.
Let us begin the discussion by focusing on the top panels of Fig. 11, showing the evolution of the growth factor. In these panels we can see how introducing a GP to reconstruct the growth factor from the data increases the error bars in the predictions of D(z) by one to two orders of magnitude when compared to the ΛCDM prediction based on the same data. Including RSD data significantly contributes to constraining D(z) regardless of the model considered. In the case of the ΛCDM analyses, we observe a 10% reduction in the standard deviation of D(z) across redshift. The impact of RSD data is even bigger when we instead use a GP to reconstruct D(z). In this case we observe a 20% to 30% improvement in the constraints depending on the redshift window. Nonetheless, we observe that in all four cases the obtained growth factor is compatible with the P18 prediction at all redshifts.
It is important to note that the uncertainty of the D(z) ΛCDM prediction actually falls to zero at z = 0, since in this model the growth factor is re-scaled to be precisely one at this redshift. Therefore, by looking at the ΛCDM prediction of D(z) we are omitting a large contribution to the uncertainty of the growth factor in this model, its amplitude. It is thus useful to consider quantities such as σ8(z) that do incorporate the uncertainty in the amplitude of the growth factor in the ΛCDM model, encapsulated in the parameter σ8. In the second row of panels of Fig. 11 we show the associated σ8(z) constraints for the four scenarios considered. In these panels we can see that the uncertainty in σ8(z) of the GP reconstruction at low redshift is actually comparable to that of the ΛCDM model once the uncertainty in the amplitude is taken into account. However, the uncertainty of the GP reconstruction quickly increases once the data becomes sparse, as seen in the D(z) panels.
It is also interesting to note that, as expected, fixing the correlation length of the GP induces oscillations in the reconstructed growth factor and σ8(z). This is visible in the GP reconstruction of D(z) based on 7×2-pt data. However, including RSD data nullifies these oscillations. In order to understand this behaviour we need to look at the third row of panels of Fig. 11, where we show the associated fσ8(z) predictions as a function of redshift. In the left panel we can see how the oscillations in the growth factor induced by our assumptions on the GP hyperparameters result in non-physical predictions for fσ8(z), despite the growth factor itself being perfectly compatible with the P18 ΛCDM prediction. Including RSD data solves this problem by directly constraining the possible values of fσ8(z) given the current data. These constraints on fσ8(z) translate into strict demands on the evolution of the gradient of the growth factor. Therefore, when RSD data are included, the oscillatory behaviour present in the reconstruction from only 7×2-pt data disappears. This means that, when the data is constraining enough, the GP reconstruction remains unbiased by the assumptions introduced by fixing the hyperparameters. Finally, it is relevant to compare the growth factor reconstruction performed in this work using GPs against the reconstruction performed by CGG21 from the same data using splines. In Fig. 12 we can observe that our GP reconstruction stands in good statistical agreement with the four constraints on σ8(z) at z = [0.24, 0.53, 0.83, 1.5] obtained by CGG21. This reinforces the notion that these different model-agnostic approaches are data-driven, and not biased by their own individual assumptions.

CONCLUSIONS
In this work we have presented LimberJack.jl, an auto-differentiable cosmological code to compute angular power spectra fully written in Julia. The goal of LimberJack.jl is to enable the use of gradient-based inference methods in cosmological analyses. LimberJack.jl's core strengths are:
• Auto-differentiability: Every step between the input cosmological parameters and the output theoretical prediction in LimberJack.jl is compatible with Julia's auto-differentiation (AD) libraries ForwardDiff.jl and ReverseDiff.jl. These methods result in gradients up to an order of magnitude faster than when using finite differences. This is the key feature that makes the use of gradient-based inference methods possible with LimberJack.jl.
• Speed: LimberJack.jl is equipped with a native implementation of the matter power spectrum emulator EmuPk (Mootoovaloo et al. 2022), which makes it orders of magnitude faster than CLASS.
• Accuracy: The LimberJack.jl theoretical predictions have been thoroughly tested against those of the well-established cosmological code CCL, and the quality of its AD gradients has been tested against the more costly finite-difference results.
• Interoperability: Thanks to its modular structure, LimberJack.jl can be easily interfaced with other Julia libraries to increase its capabilities. For example, LimberJack.jl can be interfaced with the Bolt.jl library (Li & Sullivan, in prep.) to gain access to the first auto-differentiable Boltzmann code.

FIG. 11. In each panel we overplot the prediction of the ΛCDM model for the given data, as well as the result of the GP reconstruction. Moreover, we also overplot the P18 ΛCDM prediction using a black dashed line for reference. Finally, each panel includes a subpanel where we show the evolution of the 1σ confidence intervals of the plotted functions across redshift.
Furthermore, we presented two examples of how LimberJack.jl can be employed to perform present and future cosmological analyses using the gradient-based inference algorithm NUTS (Hoffman & Gelman 2011), a self-tuning formulation of the Hamiltonian Monte Carlo algorithm. In the first example, we reproduced two analyses of DES Y1 3x2-pt data performed using the well-established Cobaya library to ensure the reliability of LimberJack.jl. LimberJack.jl's constraints were found to be virtually identical to those obtained with Cobaya, regardless of whether the matter power spectrum was computed using the Eisenstein and Hu (Eisenstein & Hu 1998, 1999) formula or EmuPk (Mootoovaloo et al. 2022) to emulate CLASS (Lesgourgues 2011). Moreover, the NUTS sampler proved to be 1.5 times more efficient (measured as effective samples per likelihood call) than the MH sampler used by Cobaya. However, this improvement in efficiency proved not to be enough to compensate for the cost of computing the gradient of the likelihood, despite using AD methods. Further work is necessary to determine the point at which gradient-based inference methods outweigh the cost of computing the likelihood gradient in angular power spectra analyses.
FIG. 12. Shows a comparison between the growth factor reconstructed in this work using a GP (blue) and the CGG21 reconstruction based on splines (black dots) of the same 7×2-pt data. We also overplot the P18 ΛCDM prediction for the growth factor (black dashed line) for reference.

In the second example, we showcased the unique capabilities of LimberJack.jl by performing a Gaussian process (GP) reconstruction of the growth factor across redshift, adding up to a total of 128 parameters. In order to constrain this model, we employed a combination of DES Y1 galaxy clustering and weak lensing data, eBOSS QSOs and CMB lensing (referred to as 7×2-pt in the text), as well as a collection of the latest RSD measurements of fσ8. We started by considering a ΛCDM analysis of the aforementioned data to establish a baseline for the GP reconstruction. Our ΛCDM analysis of the 7×2-pt data found S8 = 0.789 ± 0.016. Adding the RSD data yielded S8 = 0.793 ± 0.015. Regardless of whether or not RSD data are included, our results are in 2σ disagreement with the Planck 2018 results, which found S8 = 0.832 ± 0.013 (see Tab. 1 of Planck Collaboration et al. 2020). Performing the GP reconstruction instead yielded S8 = 0.839 ± 0.067 when 7×2-pt alone is considered and S8 = 0.793 ± 0.017 when RSD data is included. Looking at the reconstructed growth factor, we observed a reasonable agreement between the GP and the Planck 2018 ΛCDM prediction regardless of the data combination used. However, including RSD data significantly smoothed the reconstruction of the growth factor, disfavouring large oscillations. Moreover, it improved the constraints on the reconstructed growth factor by 20% on average across redshift. This stresses the importance of including RSD data in future cosmological analyses, especially given the upcoming DESI survey (DESI Collaboration et al. 2016). In terms of sampling efficiency, our GP analysis of 7×2-pt + RSD data using the gradient-based NUTS sampler managed to achieve the same sampling efficiency as our reference ΛCDM analysis of 7×2-pt data. Moreover, LimberJack.jl's AD methods reduced the cost of the likelihood's gradient by an order of magnitude with respect to finite differences. In combination, these two achievements made a previously unfeasible analysis computationally possible, taking O(10²) CPU hours to reach convergence.
Auto-differentiable and gradient-based inference methods will play a crucial role in speeding up future cosmological analyses, as well as enabling entirely new science. For instance, analyses of multiple auto- and cross-correlations between stage-IV surveys may contain up to a hundred free parameters. This will be the case even when performing traditional ΛCDM analyses with minimalistic modelling of systematics, simply due to the large number of tomographic bins that will be involved. Future surveys will, however, provide unprecedented measurements of small scales. In order to fit these scales and further our understanding of non-linear cosmology, more complex modelling of baryonic effects will have to be included. Similarly, the constraining power provided by the new data will enable analyses with beyond-ΛCDM physics as well as model-agnostic reconstructions such as the one presented in CGG21 and this work, which will have a similar effect on the number of free parameters. In addition to this, gradient-based inference methods are already indispensable to undertake field-level inference cosmology (Lavaux et al. 2019; Bayer et al. 2022; Dai & Seljak 2023), and they will become more so in the future.
Finally, while LimberJack.jl is already a fully functional tool, there are several avenues for future improvement:
1. Improved predictions: the methods currently implemented in LimberJack.jl provide enough accuracy to analyse DES Y1 data. However, these methods will need to be improved in order to analyse DES Y3 data or future data sets such as LSST. Here we present a non-exhaustive list of possible extensions:
• Non-linear corrections to the matter power spectrum beyond the Halofit formula.
• Small-scale baryonic effects on the galaxy and matter power spectra.
• Angular power spectra beyond the Limber approximation (Leonard et al. 2023).
• Scale-dependent growth of structure.
2. Parallelisation: currently, the threading parallelisation of LimberJack.jl is suboptimal. This is because the default Julia threading parallelisation scheme does not handle the memory shared between threads efficiently enough. At the moment, computing resources are best spent running different instances of LimberJack.jl in parallel, as opposed to parallelising one instance. Future works could study alternative parallelisation schemes or manually manage the memory shared between threads to improve the multi-core performance of LimberJack.jl.

In Fig. 13 we show the derivatives of the non-linear matter power spectrum, where the linear matter power spectrum was computed using the E&H formula. In each panel we overplot the value of the corresponding derivative at z = [0.0, 0.5, 1.0, 2.0], where light colours correspond to lower redshifts. As we can see, the derivatives of P(k, z) with respect to Ωm and Ωb are highly oscillatory. On the other hand, the rest of the derivatives are smoother. Moreover, we can also observe that the value of the derivatives decreases for all five parameters as the redshift at which they are computed increases.
In Fig. 14 we show the derivatives of the auto- and cross-correlation angular power spectra of galaxy clustering, weak lensing and CMB lensing with respect to the five ΛCDM cosmological parameters. In each panel, we overplot the five derivatives for a given angular power spectrum. In all cases we employed the E&H formula to compute the linear matter power spectrum. We can see that different cosmological parameters have vastly different impacts on the computation of the angular power spectra. This can be seen not only in the different degrees of smoothness, but also in the signs of the derivatives. We also note that the derivatives with respect to h are jagged, especially when galaxy clustering is involved. This is caused by the numerical scheme used to integrate the redshift distribution.

B. TUTORIAL
In this section we present a small demonstration of how to use LimberJack.jl. We begin with a quick overview of the structure of LimberJack.jl, which will help us understand how to use the code later on. As described in Sect. 2.2, LimberJack.jl is a modular code. A summary of the relationship between the different modules can be found in Fig. 15. The core of LimberJack.jl is the Cosmology structure and its homonymous constructor. Cosmology's role is to contain all the information on how to compute the theoretical predictions. When the Cosmology structure is initiated, it computes theoretical predictions for the expansion history, the comoving distance, the growth factor and the matter power spectrum according to the provided prescriptions. The computation of background quantities occurs within core.jl itself. However, the computation of the matter power spectrum takes place across three different modules, boltzmann.jl, growth.jl and halofit.jl, which compute the primordial matter power spectrum, the linear growth factor and the non-linear corrections, respectively. The different predictions are evaluated on a grid of values, which are then used to build interpolators for each of the respective quantities. The interpolators are then stored inside the Cosmology structure. This allows the user to quickly compute the theoretical predictions at arbitrary values using a series of public functions. These functions take the Cosmology structure and the necessary inputs; inside them, the corresponding interpolator is evaluated at the provided inputs to return the prediction to the user.
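The evaluate-on-a-grid-then-interpolate pattern described above is not specific to Julia. The following minimal sketch illustrates the same idea in Python; the toy expansion rate, the grid choices and the parameter values are ours for illustration and are not LimberJack.jl's defaults.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light [km/s]

def hubble(z, h=0.67, om=0.3):
    """Toy flat-LCDM expansion rate H(z) [km/s/Mpc]; illustrative values only."""
    return 100.0 * h * np.sqrt(om * (1.0 + z) ** 3 + (1.0 - om))

# 1) Evaluate the integrand once on a fixed grid, at "construction" time.
z_grid = np.linspace(0.0, 3.0, 512)
integrand = C_KM_S / hubble(z_grid)

# 2) Cumulative trapezoid rule -> comoving distance chi(z) on the grid [Mpc].
chi_grid = np.concatenate(
    ([0.0], np.cumsum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(z_grid)))
)

# 3) A cheap interpolator, stored for later queries at arbitrary redshift.
def comoving_distance(z):
    return np.interp(z, z_grid, chi_grid)

print(comoving_distance(1.0))
```

Once the grid is paid for, every subsequent query is a fast interpolation rather than a fresh integral, which is the reason the Cosmology constructor front-loads this work.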
Let us briefly showcase how this is done in code 1 by computing some relatively straightforward predictions:

Listing 1. Shows basic use of LimberJack.jl

Computing the matter power spectrum is just as easy (see code 2). However, the computation is subject to a series of options that the user can alter. By default, Cosmology will use the E&H formula to compute the linear matter power spectrum, and it will not apply non-linear corrections. These settings can be changed by specifying the keyword arguments tk_mode and Pk_mode. In terms of transfer functions, LimberJack.jl offers two possibilities, tk_mode = :EisHu (default) / :EmuPk, which correspond to using the Eisenstein and Hu formula or EmuPk, respectively. Similarly, Pk_mode = :linear / :Halofit determines whether or not non-linear corrections are applied using Halofit. LimberJack.jl offers two distinct public functions to evaluate either the linear or the non-linear matter power spectrum, regardless of the choice of Pk_mode. However, if Pk_mode = :linear, the two functions will return the linear matter power spectrum.

Listing 2.
Shows how to compute the matter power spectrum in LimberJack.jl

Computing angular power spectra is a slightly more involved process. An example of how this is done in LimberJack.jl can be found in code 3. In this code we can see that, first, a Cosmology structure must be initiated. The Cosmology structure automatically computes the matter power spectrum given the user's specifications. Then, the user must compute the radial kernels of the relevant tracers by providing the corresponding LimberJack.jl public functions with the Cosmology structure and the distribution of sources. Moreover, as described in Sect. 2.1, different tracers can be impacted by different systematic effects. These systematics are accounted for by incorporating a series of nuisance parameters that can also be provided to the tracer public functions of LimberJack.jl. The output of the tracer functions (see lines 15-18 of code 3) is a Tracer structure that hosts an interpolator for the corresponding radial kernel and the corrections described in Eqns. 8 and 9 (set to 1 in the case of clustering). The angular power spectra are then computed by providing the AngularCls public function with the aforementioned Cosmology and Tracer objects, as well as the desired multipoles.

Fig. 14.-. Shows the derivatives of the auto- and cross-correlation angular power spectra of galaxy clustering, weak lensing and CMB lensing with respect to the five ΛCDM cosmological parameters Ω_m, Ω_b, h, σ_8 and n_s. The linear matter power spectrum was computed using the E&H formula. In each panel, we overplot the five derivatives for a given angular power spectrum.
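For reference, under the Limber approximation each of these angular power spectra reduces to a one-dimensional integral over comoving distance, C_ℓ ≈ ∫ dχ W_a(χ) W_b(χ) χ⁻² P(k = (ℓ + 1/2)/χ). The following minimal numerical sketch in Python illustrates that structure; the radial kernel and the power spectrum below are toy functions of our own, not LimberJack.jl's models.

```python
import numpy as np

def limber_cl(ell, chi, kern_a, kern_b, pk_of_k):
    """Limber integral: C_ell = ∫ dchi W_a W_b / chi^2 * P(k=(ell+1/2)/chi).

    chi, kern_a and kern_b are arrays sampled on the same chi grid;
    pk_of_k is a callable returning a (toy) matter power spectrum.
    """
    k = (ell + 0.5) / chi
    integrand = kern_a * kern_b / chi**2 * pk_of_k(k)
    return np.trapz(integrand, chi)

# Toy ingredients, purely illustrative.
chi = np.linspace(100.0, 4000.0, 2000)                  # comoving distance grid [Mpc]
kernel = np.exp(-0.5 * ((chi - 2000.0) / 500.0) ** 2)   # Gaussian radial kernel
toy_pk = lambda k: 1e4 * k / (1.0 + (k / 0.02) ** 3)    # toy P(k) shape

cls = [limber_cl(ell, chi, kernel, kernel, toy_pk) for ell in (10, 100, 1000)]
```

In LimberJack.jl the kernels come from the Tracer structures and P(k) from the Cosmology structure, but the integral evaluated per multipole has this same one-dimensional form.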
Listing 3. Shows how to compute angular power spectra in LimberJack.jl

Finally, in code 4 we show how to use Turing.jl in unison with LimberJack.jl to build and sample a statistical model for the DES Y1 3x2-pt analysis. The first step is to load the data. For this purpose we will use the libraries YAML and sacc, ubiquitous in astrophysics. Julia has a native implementation of YAML, but not of sacc. However, calling Python libraries from Julia is extremely simple thanks to the PythonCall.jl library. PythonCall.jl allows us to import sacc.py as a Julia module and read files entirely within Julia. Note that, in order to do this, we must first install sacc in the Python environment of Julia, or point PythonCall to our local Python installation. Instructions on how to do this can be found in the LimberJack.jl GitHub. Once the data are loaded, they must be passed to the LimberJack.jl public function make_data, which turns the files into Julia structures that LimberJack.jl can easily manage.
After that, the user can use Turing.jl's @model macro to define a statistical model. Inside the model, the user must define the priors for the parameters of the model using the Distributions.jl API. Note that, while the cosmological parameters can be passed directly to the Cosmology structure constructor, the nuisance parameters must be stored inside a Julia dictionary. The names of these parameters inside the dictionary must follow a strict convention, combining the tracer name, the bin number and the nuisance parameter name.
In order to obtain the theoretical prediction for the DES Y1 3x2-pt data vector, the user must provide the newly initiated Cosmology structure, together with the meta and files structures generated by make_data, to the public function Theory. Moreover, the user must also provide the nuisance parameter dictionary through the Nuisances keyword argument. The Theory function orchestrates the computation of the theory vector, computing the necessary radial kernels and evaluating the angular power spectra in the correct order.
Once a theoretical prediction has been obtained, the user must define how the data are distributed with respect to the theory prediction. In the case of a Gaussian likelihood, this corresponds to a multivariate Gaussian distribution whose mean is the theory prediction and whose covariance is the data covariance matrix. All that is left to do is to use Turing.jl to condition the model on the observations (see line 80 of code 4), define the sampler we wish to use, and sample the model. For a more thorough explanation of how to use Turing.jl and its different options, please see Turing.jl's documentation. The results can be found plotted in Fig.
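For concreteness, the Gaussian log-likelihood described above is, up to an additive constant, ln L = −(1/2)(d − t)ᵀ C⁻¹ (d − t). A minimal numerical sketch follows (in Python; the data vector, theory vector and covariance below are made up for illustration and bear no relation to the DES Y1 data):

```python
import numpy as np

def gaussian_loglike(data, theory, cov):
    """ln L = -0.5 * (d - t)^T C^{-1} (d - t), dropping the constant term.

    Uses a Cholesky solve rather than an explicit inverse for numerical stability.
    """
    resid = data - theory
    chol = np.linalg.cholesky(cov)
    # Solve C x = resid via the Cholesky factor L: first L y = resid, then L^T x = y.
    x = np.linalg.solve(chol.T, np.linalg.solve(chol, resid))
    return -0.5 * resid @ x

# Illustrative 3-point "data vector" with a diagonal covariance (sigma = 0.1).
data = np.array([1.0, 2.0, 3.0])
theory = np.array([1.1, 1.9, 3.2])
cov = np.diag([0.1**2, 0.1**2, 0.1**2])

print(round(gaussian_loglike(data, theory, cov), 2))  # → -3.0, i.e. chi^2 = 6
```

Inside a Turing.jl model this same quantity is expressed by declaring the data as drawn from a multivariate normal centred on the Theory output, rather than coded by hand.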
Fig. 1.-. Left column: top panel shows a comparison between the LimberJack.jl (solid green) and CCL (dashed red) computations of the expansion history for 0 < z < 1100. Bottom panel shows a comparison between the derivative of the expansion history with respect to Ω_m computed using AD (solid blue) and finite differences (dashed orange) for the same redshift range. Middle column: top panel shows a comparison between the LimberJack.jl (solid green) and CCL (dashed red) computations of the comoving distance for 0 < z < 1100. Bottom panel shows a comparison between the derivative of the comoving distance with respect to Ω_m computed using AD (solid blue) and finite differences (dashed orange) for the same redshift range. Right column: top panel shows a comparison between the LimberJack.jl (solid green) and CCL (dashed red) computations of the linear growth factor for 0 < z < 3. Bottom panel shows a comparison between the derivative of the linear growth factor with respect to Ω_m computed using AD (solid blue) and finite differences (dashed orange) for the same redshift range.
Fig. 2.-. Top panels show a comparison between the LimberJack.jl (solid green) and CCL (dashed red) computations of the non-linear matter power spectrum for z = [0.0, 0.5, 1.0, 2.0] and −4 < log10(k) < 2. Both LimberJack.jl and CCL used the E&H formula to compute the linear matter power spectrum; Halofit was then used to obtain the non-linear corrections in both cases. Bottom panels show a comparison between the derivatives of the LimberJack.jl matter power spectra with respect to Ω_m computed using AD (solid blue) and finite differences (dashed orange). Each panel contains a subpanel where the relative difference between the two compared quantities is shown.
Fig. 4.-. Shows a comparison between the derivatives of the auto- and cross-correlation angular power spectra of galaxy clustering, weak lensing and CMB lensing tracers with respect to Ω_m computed using AD (solid blue) and finite differences (dashed orange) for the range of multipoles 1 < log(ℓ) < 3. In order to compute the gradients of the angular power spectra, the E&H formula was used to compute the linear matter power spectrum; Halofit was then used to obtain the non-linear corrections in both cases.
Fig. 5.-. Shows the marginalised posterior distributions from the analysis of DES Y1 3x2-pt data for the cosmological parameters Ω_m, σ_8 and S_8. The lower triangle compares the LimberJack.jl (solid blue) and CCL (dashed red) analyses when using the E&H formula. The upper triangle compares the LimberJack.jl (solid purple) and CCL (dashed green) analyses when using EmuPk and CLASS, respectively. Cobaya contours were obtained using the MH sampler, while LimberJack.jl contours were obtained using the NUTS sampler.

Fig. 6.-. Shows the logarithm of the correlation matrix of the 7×2-pt data set described in Tab. 2.

TABLE 3.
Fig. 8.-. Shows the fσ_8 data points from the different surveys used in this work across redshift. Numerical values can be found in Tab. 3.

Fig. 9.-. Shows the 2D and 1D marginal distributions for the parameters Ω_m, σ_8 and S_8 of the ΛCDM analysis of 7×2-pt data (red) and 7×2-pt plus RSD data (green). Black dashed contours show the reanalysis of CGG21 using Tab. 5's priors.
Fig. 11.-. Shows the evolution across redshift of the reconstructed growth factor (first row), σ_8(z) (second row) and fσ_8(z) (third row). Results based on 7×2-pt data are shown in the left column, while results combining 7×2-pt and RSD data are shown in the right column. In each panel we overplot the prediction of the ΛCDM model for the given data, as well as the result of the GP reconstruction. Moreover, we also overplot the P18 ΛCDM prediction using a black dashed line for reference. Finally, each panel has a subpanel where we show the evolution of the 1σ confidence intervals of the plotted functions across redshift.
Fig. 12.-. Shows a comparison between the growth factor reconstructed in this work using a GP (blue) and the CGG21 reconstruction based on splines (black dots) of the same 7×2-pt data. We also overplot the P18 ΛCDM prediction for the growth factor (black dashed line) for reference.

Fig. 13.-. Shows the derivatives of the non-linear matter power spectrum P(k, z) with respect to the five ΛCDM parameters Ω_m, Ω_b, h, σ_8 and n_s. The linear matter power spectrum was computed using the E&H formula. The non-linear corrections were computed using the Halofit formula. In each panel we overplot the value of the corresponding derivative at z = [0.0, 0.5, 1.0, 2.0], where lighter colours correspond to lower redshifts. All the derivatives were evaluated at Ω_m = 0.30, Ω_b = 0.05, h = 0.67, σ_8 = 0.81 and n_s = 0.95.

A feature of LimberJack.jl which sets it apart from similar, more extensively tested codes (such as CCL, Chisari et al. 2019) is its AD capabilities. AD methods can be classified into two groups, forward and backward. In order to understand the difference, let us consider a complicated function of an independent variable, f(x). One can represent the function as a composition of simpler functions, f(x) = f_n(f_{n-1}(... f_1(f_0))), where f_0 = x. Thus, in order to obtain the gradient of f(x), one can either find the gradient of the simpler functions with respect to the independent variable or, alternatively, find the gradient of the original function with respect to the simpler functions. The first of these strategies to accumulate terms in the chain rule corresponds to forward AD, while the second corresponds to backward AD. Whether to use forward or backward AD greatly depends on the nature of the problem. Generally, given a function f : R^N → R^M, forward AD will be more efficient at computing ∇f if N < M. If, on the other hand, M < N, backward AD is preferred.

LimberJack.jl is capable of forward and backward AD through the Julia libraries ForwardDiff.jl and ReverseDiff.jl, respectively. However, only forward AD is efficiently implemented. Both ForwardDiff.jl and ReverseDiff.jl perform AD by pushing special types of numbers through the original computer program. On the one hand, ForwardDiff.jl uses dual numbers. Dual numbers are expressions of the form a + bε such that ε² = 0 but ε ≠ 0. It can then be shown that, for a generic function, f(a + ε) = f(a) + ε f′(a) (see Sect. 4 of Hoffmann 2014 for a proof), such that the gradient of the original function can easily be obtained. This means that ForwardDiff.jl imposes little to no constraints on the programs it differentiates through. On the other hand, ReverseDiff.jl uses taping numbers, a special type of number that records all the operations it undergoes, to generate a trace of the basic operations that compose a generic computer program (van Merriënboer et al. 2018). Given this record, ReverseDiff.jl can then generate an expression for the gradient of the program. Thus, ReverseDiff.jl requires two passes through the original program: first, a forward pass generates the program's operation trace; then, a backward pass computes the partial derivatives and accumulates their values as the input is backpropagated. Because of the greater complexity of the algorithm, ReverseDiff.jl imposes strong demands on the computer programs it acts upon. Commonplace computations such as control flow or variable mutation are examples of operations that must be handled carefully.
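The dual-number mechanism behind forward AD is simple enough to reproduce in a few lines. The following sketch (in Python for illustration; ForwardDiff.jl's actual implementation is, of course, far more general) shows how seeding the ε component with 1 yields the derivative alongside the value in a single forward pass:

```python
import math

class Dual:
    """Number of the form a + b*eps with eps**2 = 0; b carries the derivative."""
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.eps + o.eps)
    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # The product rule falls out of (a + b*eps)(c + d*eps) with eps^2 = 0.
        return Dual(self.val * o.val, self.val * o.eps + self.eps * o.val)
    __rmul__ = __mul__

def sin(x):
    # Each primitive propagates f(a + b*eps) = f(a) + b*f'(a)*eps.
    if isinstance(x, Dual):
        return Dual(math.sin(x.val), x.eps * math.cos(x.val))
    return math.sin(x)

def f(x):
    # A generic user function, differentiated without any changes to its code.
    return x * x + 3.0 * sin(x)

# Seed eps = 1 to obtain df/dx alongside f(x) in a single forward pass.
out = f(Dual(2.0, 1.0))
print(out.val, out.eps)  # f(2) = 4 + 3 sin 2 and f'(2) = 4 + 3 cos 2
```

This is why forward AD imposes so few constraints on the program: any code written generically enough to accept a Dual in place of a float is differentiated for free, one input direction at a time.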

TABLE 4 .
Effective sample size per likelihood call for the MH and NUTS samplers when analysing 7×2-pt and 7×2-pt + RSD data, using both the ΛCDM model and the GP reconstruction.

TABLE 5 .
Prior distributions for the parameters considered in the Growth Factor Reconstruction. We sample the cosmological parameters keeping σ_8 fixed to avoid degeneracies with the Gaussian process. For the same reason, the Gaussian process hyperparameters are also kept fixed.

S_8 = 0.789 ± 0.016. The reanalysis of CGG21 using the priors shown in Tab. 5 found Ω_m = 0.296 ± 0.028, σ_8 = 0.794 ± 0.043 and S_8 = 0.786 ± 0.015. Thus, we can observe that the constraints produced by the LimberJack.jl pipeline presented in this work are completely consistent with the results of the Cobaya pipeline of CGG21. Combining the 7×2-pt with the RSD data we found Ω_m = 0.277 ± 0.021, σ_8 = 0.827 ± 0.034 and S_8 = 0.793 ± 0.015. The addition of RSD data improved the constraints on Ω_m and σ_8 by 20%. However, the constraints on S_8 were largely unaffected. Regardless of whether or not RSD data are included, our results are in 2σ disagreement with the P18 results, which found S_8 = 0.832 ± 0.013 (see Tab. 1 of Planck Collaboration et al. 2020). The full numerical posteriors of the LimberJack.jl ΛCDM analyses of 7×2-pt and 7×2-pt plus RSD data can be found in the first two columns of Tab. 6.
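The quoted ≈2σ disagreement with P18 can be checked with the standard Gaussian tension metric, |ΔS_8| / √(σ₁² + σ₂²), a simplified estimate that treats the two measurements as independent Gaussians and ignores parameter correlations:

```python
import math

def gaussian_tension(m1, s1, m2, s2):
    """Number of sigmas separating two independent Gaussian measurements."""
    return abs(m1 - m2) / math.sqrt(s1**2 + s2**2)

# S_8 from the 7x2-pt + RSD analysis vs. the P18 value quoted in the text.
t = gaussian_tension(0.793, 0.015, 0.832, 0.013)
print(round(t, 2))  # → 1.96, i.e. the ~2 sigma disagreement quoted above
```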

TABLE 6 .
Posterior distributions of the Growth Factor Reconstruction analyses. Error bars denote the 1σ confidence intervals.
3. GPUs: LimberJack.jl currently cannot run on GPUs, which are known to significantly speed up cosmological inference. This is due to LimberJack.jl performing scalar indexing operations at several points of the computation of angular power spectra. Future work could study how to bypass these operations and make LimberJack.jl compatible with Julia's GPU libraries, such as CUDA.jl.
4. Backward AD: currently, LimberJack.jl's preferred AD mode is forward AD. However, the AD mode preferred for statistical inference is backward AD, especially as the number of parameters increases. Future works could look into making LimberJack.jl compatible with the latest Julia AD libraries, such as Zygote.jl or Enzyme.jl, to implement efficient backward AD.