A Differentiable Model of the Assembly of Individual and Populations of Dark Matter Halos

We present a new empirical model for the mass assembly of dark matter halos. We approximate the growth of individual halos as a simple power-law function of time, where the power-law index smoothly decreases as the halo transitions from the fast-accretion regime at early times, to the slow-accretion regime at late times. Using large samples of halo merger trees taken from high-resolution cosmological simulations, we demonstrate that our 3-parameter model, Diffmah, can approximate halo growth with a typical accuracy of 0.1 dex for t>1 Gyr for all halos of present-day mass greater than 10^11Msun, including subhalos and host halos in gravity-only simulations, as well as in the TNG hydrodynamical simulation. We additionally present a new model for the assembly of halo populations, DiffmahPop, which not only reproduces average mass growth across time, but also faithfully captures the diversity with which halos assemble their mass. Our python implementation is based on the autodiff library JAX, and so our model self-consistently captures the mean and variance of halo mass accretion rate across cosmic time. We show that the connection between halo assembly and the large-scale density field, known as halo assembly bias, is accurately captured by Diffmah, and that residual errors in our approximations to halo assembly history exhibit a negligible residual correlation with the density field. Our publicly available source code can be used to generate Monte Carlo realizations of cosmologically representative halo histories; our differentiable implementation facilitates the incorporation of our model into existing analytical halo model frameworks.


INTRODUCTION
In the standard cosmological model, the matter content of the Universe is dominated by Cold Dark Matter (CDM), and gravitationally self-bound objects referred to as dark matter halos are the fundamental building blocks of structure formation (see Mo et al. 2010, for a comprehensive review). Dark matter halos are the natural sites of galaxy formation (White & Rees 1978;Blumenthal et al. 1984), and so a detailed understanding of the buildup and evolution of halos is a key ingredient of any theory of structure growth.
The basic physical picture of halo evolution is now relatively well understood. At early times, halo assembly is characterized by a "fast-accretion phase" in which the growth of halo mass is rapid and major mergers are common; mass accretion rates diminish considerably at later times in a "slow-accretion phase", during which time major mergers are comparatively rare (Bullock et al. 2001;Wechsler et al. 2002;Tasitsiomi et al. 2004). The mass density profile of dark matter halos is well-described by a double power law known as the NFW profile (Navarro et al. 1997), an approximation that remains reasonably accurate across most of cosmic time (although see Ludlow et al. 2013, for shortcomings of this approximation). Although the concentration parameter that defines the NFW profile exhibits a well-known dependence upon total mass (e.g., Diemer & Kravtsov 2015;Child et al. 2018), all halos in the fast-accretion phase tend to have similar values of concentration c ≈ 3 − 4 (Zhao et al. 2003a), and as halos transition to the slow-accretion E-mail:ahearin@anl.gov phase, their concentrations steadily increase as they predominantly pile up mass onto their outskirts (see Wang et al. 2020, for a detailed physical picture of the relationship between halo assembly and NFW concentration).
Many additional characteristics of the internal structure of dark matter halos are closely related to their assembly histories, including substructure abundance (Gao et al. 2004;Wechsler et al. 2006;Giocoli et al. 2010), the "splashback" feature in the outer profile (Diemer & Kravtsov 2014), and ellipsoidal shape (Chen et al. 2020a;Lau et al. 2021). The growth history of a dark matter halo is also tightly connected to the larger-scale environment in which it evolves, a phenomenon known as halo assembly bias (e.g., Gao et al. 2005;Mao et al. 2018;Mansfield & Kravtsov 2020).
The mass assembly history of dark matter halos is widely used in theoretical models of galaxy formation as a fundamental quantity regulating the star formation rate of the halo's resident galaxy, including early formulations of traditional semi-analytic modeling (Kauffmann et al. 1993;Baugh et al. 1996;Avila-Reese et al. 1998, SAMs, hereafter), practically all contemporary SAMs (e.g., Bower et al. 2006;Somerville et al. 2008;Henriques et al. 2015;Lagos et al. 2018), simple empirical models (Watson et al. 2015;Behroozi & Silk 2015), and recent empirical approaches (Becker 2015;Moster et al. 2018;Behroozi et al. 2019). Strong motivation for treating halo assembly as the backbone of galaxy formation models also comes directly from hydrodynamical simulations, which show that the accretion rate of baryonic mass closely tracks that of dark matter (Wetzel & Nagai 2015).
The assembly of a halo is fully characterized by its merger tree, which describes the mass and accretion time of the "progenitors" that merged into the main halo over time, as well as the progenitors of those progenitors, etc. The most common theoretical technique for modeling halo merger trees is the (extended) Press-Schechter formalism (Press & Schechter 1974;Bond et al. 1991;Bower 1991;Lacey & Cole 1993, EPS, hereafter). The EPS framework yields considerable insight into many aspects of the structure and buildup of dark matter halos (e.g., Somerville & Kolatt 1999;Nusser & Sheth 1999;Cole et al. 2008), and has been shown to yield reasonably accurate predictions for the average growth of halos over time (van den Bosch 2002), the mass function of the progenitor halos , and the statistical distribution of halo formation times (Li et al. 2007;Giocoli et al. 2012). Merger trees predicted by EPS furthermore enable suitably implemented SAMs to generate predictions for galaxy evolution without the need for high-resolution N-body simulations (e.g., Somerville et al. 2008;Benson 2012). We refer the reader to Zentner (2007) for a pedagogical overview of EPS, and to  for a comparison of modern implementations of EPS predictions of merger trees.
Empirically formulated fitting functions calibrated against cosmological simulations offer a simpler route to describing the evolutionary history of halo mass. The one-parameter exponential model developed in Wechsler et al. (2002) is one of the first of such models, and a number of improvements have followed this early empirical approach that capture richer phenomena with greater accuracy. For example, the fitting formula in Tasitsiomi et al. (2004) is a two-parameter generalization of the exponential growth model with improved performance for a greater diversity of cluster-mass halos. In Fakhouri et al. (2010), the authors present a simple and accurate fitting formula for dM halo (t)/dt|M 0 , the average mass accretion rate of halos as a function of cosmic time and present-day mass M 0 , and they furthermore find that they can use their model to accurately predict M halo (t)|M 0 by numerically integrating their parameterized mass accretion rate histories. In McBride et al. (2009), the authors introduce a different two-parameter fitting function for individual halo mass growth, and also include additional modeling ingredients for the dependence of how the fitting parameters depend on presentday halo mass.
The program to develop empirical approximations to dark matter halo assembly is faced with a number of challenges. Each of the efforts summarized above were conducted at fixed cosmology, and also for gravity-only N-body simulations, and the robustness of these findings to effects of cosmology and hydrodynamics remains relatively coarsely characterized (although see , discussed further in §5 below). Efforts to improve flexibility beyond the one-parameter exponential model also face a significant difficulty of ensuring physically consistent behavior across the full range of mass and redshift; for example, as pointed out in Appendix H of Behroozi et al. (2013a), predictions of the model presented in McBride et al. (2009) exhibit unphysical behavior at high redshift due to the crossing of average assembly histories of halos of different present-day mass. Of special concern to the present paper, empirical fitting functions historically only reproduce average halo growth across time, but do not enjoy capability to predict the diversity of halo assembly, a shortcoming made especially glaring through comparison to EPS-based techniques.
In this paper, we seek to improve upon this situation by developing an empirical approach to modeling the growth of both individual halos, as well as the assembly of cosmologically representative populations. In §3, we will introduce a new parameterization for individual halo growth, Diffmah, validating that our formulation is sufficiently flexible to capture the assembly of halos in both the gravity-only and full-physics hydro simulations described in §2. In §4, we outline DiffmahPop, our statistical model for the assembly of halo populations, relegating exposition on mathematical and computational details to the appendices. We discuss our results and potential applications in §5, and conclude in §6 with a brief summary of our primary results.

SIMULATIONS AND MERGER TREES
We have built our model for the mass assembly history (MAH hereafter) of dark matter halos to approximate results taken from the merger trees of halos identified in both gravity-only and hydrodynamical simulations. For our gravity-only simulations, we use a combination of the Bolshoi Planck simulation (BPL, Klypin et al. 2011) and MultiDark Planck 2 (MDPL2, Klypin et al. 2016). The BPL simulation was carried out using the ART code (Kravtsov et al. 1997) by evolving 2048 3 dark-matter particles of mass m p = 1.55 × 10 8 M on a simulation box of 250 Mpc on a side; the MDPL2 simulation was carried out using the L-GADGET-2 code (Springel 2005) with 3840 3 dark-matter particles of mass m p = 1.51 × 10 9 M on a simulation box of 1000 Mpc on a side; both simulations were run under cosmological parameters closely matching Planck Collaboration et al. (2014). For both BPL and MDPL2, we used publicly available 1 merger trees that were identified with Rockstar and Consistent-Trees (Behroozi et al. 2013b,c;Rodríguez-Puebla et al. 2016).
We additionally studied the assembly of halos in the IllustrisTNG hydrodynamical simulations. IllustrisTNG is a suite of hydro simulations that models the joint evolution of dark matter, gas, stars, and supermassive black holes, and includes treatment of radiative gas cooling, star formation, galactic winds, and AGN feedback (Weinberger et al. 2017;Pillepich et al. 2018). We use publicly available data from the largest hydrodynamical simulation of the suite, TNG300-1 ). The TNG300-1 simulation was carried out using the movingmesh code Arepo (Springel 2010) 2500 3 with gas tracers together with the same number of dark matter particles in a simulation box of 205 Mpc on a side under a cosmology very similar to Planck Collaboration et al. (2014). For TNG300-1 the corresponding mass resolution is 4.1 × 10 7 M and 7.7 × 10 6 M for dark matter and gas, respectively. Halos and subhalos in IllustrisTNG were identified with the SUBFIND algorithm (Springel et al. 2001), and the merger trees we use were constructed with SUBLINK (Rodriguez- Gomez et al. 2015).
For some of the results in the paper, we calculate crosscorrelation functions between simulated halos and the density field, using a random downsampling of particles from the appropriate snapshot. For BPL we use a random downsampling of dark matter particles provided by halotools (Hearin et al. 2017). For IllustrisTNG, we use a random downsampling of TNG300-1 particles at z = 0 kindly provided by Benedikt Diemer. All results in the main body of the paper pertain to the assembly histories of present-day host halos (i.e., upid=-1 for Rockstar, and the "main halo" for SUBFIND).
Comparable results for the case of subhalos can be found in Appendix D. Throughout the paper, including the present section, values of mass and distance are quoted assuming h = 1. For example, when writing M peak = 10 12 M , we suppress the M /h notation and write the units as M .

DIFFMAH
In this section, we present results for Diffmah, our model of the assembly of individual dark matter halos. In §3.1 we describe the functional form of the model and its differentiable implementation, and in §3.2 we show the accuracy with which the model can approximate the formation history of simulated halos.

Diffmah Model formulation
The Diffmah model for the growth of individual halos is designed to capture the evolution of cumulative peak halo mass, M peak (t), defined as the largest mass the main progenitor halo has ever attained up until the time t. Thus as a matter of definition, M peak (t) is a non-decreasing function for all t, so that M peak (t) will remain constant if a halo experiences a period of mass loss.
We model M peak (t) with a power-law function of cosmic time with a rolling index, where t 0 is the present-day age of the universe, M 0 ≡ M peak (t 0 ), and α(t) is given by a sigmoid function defined as follows: .
( 2) The parameters α early and α late define the asymptotic value of the power-law index at early and late times, respectively, and τ c is the transition time between the early-and late-time indices. The parameter k controls the speed of the transition between the two regimes; as described in Appendix A, for all results in the paper we hold k fixed to a constant value of 3.5.
In Figure 1, we give a visual illustration of the physical interpretation of the three free parameters of our model: α early , α late , and τ c . Each of the six panels shows the assembly history of halos with present-day mass M 0 = 10 12 M , with variations of the MAH parameters color-coded according to the legend. For each of the three MAH parameters, α early , α late , and τ c , smaller values of the parameter corresponds to a halo that was accreting mass more rapidly at early times and more slowly today; thus a halo with a smaller value of a MAH parameter has an earlier formation time relative to a halo with a larger value of that parameter. For curves labeled "fiducial model", we have (α early , α late , τ c ) = (2.5, 0.3, 1.25[Gyr]); for parameters set to their smaller values, we have (α early , α late , τ c ) = (1. 25,0.05,0.6[Gyr]); for larger values, we have (α early , α late , τ c ) = (5.0, 0.6, 2.5[Gyr]). The top row of panels shows the history of cumulative mass, M peak (t), which is parameterized directly according to Eq. 1. The bottom row of panels shows dM peak /dt, the history of mass accretion rate. While dM peak /dt is analytically calculable by symbolically differentiating Eq. 1, we have computed each curve in the bottom panels of Figure 1 via automatic differentiation (autodiff, see Baydin et al. 2015, for a recent review), as our source code is implemented based on the JAX autodiff library (Bradbury et al. 2018).

Diffmah Model performance
In the previous section, we illustrated the basic features of the Diffmah model for dark matter halo assembly. Here we show the accuracy with which simulated halo histories can be approximated by Diffmah. When fitting the mass growth of an individual dark matter halo with the fitting function in Equation 1, we allow the parameters α early , α late , and τ c to vary freely, and use gradient descent to minimize the logarithmic difference between the simulated and predicted values of M peak (t), with k held fixed to a constant value of 3.5 for all halos, and M 0 held fixed to M peak (t 0 ). We refer the reader to Appendix A for a detailed description of our optimization technique. Figure 2 shows a typical example of the ability of Diffmah to approximate individual dark matter halo growth. Both panels show the assembly history of the same dark matter halo taken from the IllustrisTNG simulation; the top panel shows the history of the halo's mass accretion rate, and the bottom panel shows the history of its cumulative peak mass. The blue curve in each panel shows the assembly history of the halo as taken directly from the simulated merger tree, and the orange curve shows the differentiable approximation. As described in Appendix A, our differentiable approximation comes from fitting M peak (t) of the simulated halo, and our approximation for dM peak /dt is a derived quantity.

Capturing individual halo growth
Using the optimization techniques detailed in Appendix A, we have identified a set of best-fitting parameters for every halo in the samples described in §2. Figure 3 displays the quality of these fits for both gravityonly N-body simulations and IllustrisTNG. Each panel shows the logarithmic difference between the simulated and best-fitting MAH plotted as a function of cosmic time; results for samples of halos of different present-day mass are shown in different panels as indicated by the inpanel annotation. The black curves in each panel show the average residual difference, and the shaded band shows the 1σ scatter; residuals for gravity-only simulations are shown with the solid black curve and purple shaded region; residuals for IllustrisTNG are shown with the dashed black curve and green shaded region. Figure 3 shows that the rolling power-law model defined by Eq. 1 gives an unbiased approximation to halo mass assembly with a typical error of 0.1 dex for t 1 Gyr; residual errors are slightly larger at higher halo Each panel shows the assembly history of a dark matter halo with present-day mass M 0 = 10 12 M ; the top row of panels shows the history of cumulative peak mass, M peak (t); the bottom row of panels shows the history of mass accretion rate, dM peak /dt. Our model approximates halo assembly as a power-law function of time with rolling index, M peak (t) ∝ t α(t) , as in Eq. 1. The left columns show the influence of α early , the power-law index at early times; the middle columns show the influence of α late , the power-law index at late times; the right columns show τc, which controls the transition time between the fast-and slow-accretion regimes. Smaller values of each parameter correspond to halos with earlier formation times. mass due to the increased prominence of major mergers. In Appendix D, we show that analogous results hold for the case of present-day host halos that have experienced a flyby event at some point in their past history ( Figure D13), for present-day subhalos ( Figure D14), and also for subhalos that have either merged or become disrupted prior to z = 0 ( Figure D15). In Figure 3, we have restricted attention to t > 1 Gyr for the sake of ensuring good mass resolution across the mass range of all our samples; this cut in cosmic time is driven by our focus on the redshifts z 5 that are most relevant to cosmological surveys, and by the lack of a publicly-available, higher-resolution simulation with homogeneously processed merger trees (see §A for further discussion).

Capturing halo assembly bias
Even though Figure 3 shows that simulated MAHs are well approximated by the Diffmah model, of course the agreement is not perfect, and a question that naturally arises is whether the residual errors are correlated with the cosmic density field. As mentioned in §1, at fixed present-day mass, the large-scale density of halos exhibits a significant dependence upon assembly history, a phenomenon referred to as halo assembly bias. This effect is commonly quantified in terms of the two-point cross-correlation between halos and dark matter particles, ξ δm (r). For example, for a sample of dark matter halos with the same value of M 0 ≡ M peak (t 0 ), the quantity ξ δm (r) varies with the formation time of the halo, t form , defined as the first time the halo attains some specified fraction of M 0 . The non-zero residual errors shown in Fig. 3 indicate that each halo's approximate and actual values of t form will differ, and it is plausible that the difference correlates with the density field in a way that alters the t form -dependence of ξ δm (r). For example, halos with a merger-rich history tend to reside in denser regions relative to halos with a more quiescent assembly, and the prevalence of mergers may be correlated with the residual errors of our best-fitting approximation.
To investigate this question, we select a sample of host halos in the BPL simulation with 10 11.9 M ≤ M 0 ≤ 10 12.1 M , divide the sample in half according to the median value of t form for the sample, and use halotools (Hearin et al. 2017) to compute ξ δm (r) for each subsample. We have repeated this exercise using the value of t form computed directly from each halo's simulated merger tree, and compared the clustering to the case when t form is defined instead based on the differentiable approximation to each individual halo's MAH. We display our results in Figure 4; the top panel shows t form = t 4% , and the bottom panel shows t form = t 50% . We find a similar level of agreement for halos of different mass in both gravity-only simulations, and Figure E17 in Appendix E shows the analogous result for the case of IllustrisTNG. We have also verified that we obtain a similar level of agreement between simulation-and modelbased clustering when dividing our halo samples based on quartiles of formation time rather than medians.
In summary, the results of §3 and Appendices D-E demonstrate that the 3-parameter fitting function defined by Eq. 1 is sufficiently flexible to approximate M peak (t) with an accuracy of better than 0.1 dex across nearly all of cosmic time; that this level of accuracy is ro- -Example fit to individual halo growth. The blue curve in each panel shows the assembly history of the same dark matter halo taken directly from the merger trees of the Illus-trisTNG simulation. The top panel shows the history of the halo's mass accretion rate, and the bottom panel shows the history of its cumulative peak mass. In each panel, the orange curve shows the approximate halo history based on our differentiable model. bust to variations in halo growth due to hydrodynamics and baryonic feedback, and applies equally well to subhalos and splashback centrals; and that the small residual errors of the approximation exhibit an essentially negligible correlation with the large-scale density field.

MODELING THE ASSEMBLY OF HALO POPULATIONS WITH DIFFMAHPOP
In the previous section, we presented Diffmah, a model for the assembly history of individual dark matter halos defined by Eq. 1. Our model for M peak (t) is characterized by three parameters: τ c , α early , and α late . In this section, we present DiffmahPop, a model for P (M peak (t)|M 0 ), the statistical distribution of assembly histories for halos of present-day peak mass, M 0 . Having shown that individual simulated MAHs can be accurately approximated by our parametric fitting function, our approach to modeling P (M peak (t)|M 0 ) is to construct a suitably accurate statistical model for P (α early , α late , τ c |M 0 ), the distribution of our MAH model parameters as a function of M 0 .
In order to motivate the form of our model for P (α early , α late , τ c |M 0 ), in Figure 5 we show two different cross sections of this multi-dimensional distribution, focusing on BPL halos with M 0 = 10 12 M . The top panel shows a histogram of P (τ c |M 0 ), which shows a roughly log-normal shape with a modest bimodality in which two distinct sub-populations emerge: earlierforming halos with τ c 2 Gyr, and later-forming ha- Using on a large collection of fits to the assembly histories of simulated halos, each panel shows the logarithmic difference between the simulated and approximate values of cumulative peak mass, M peak (t), as a function of time; results for samples of halos of different present-day mass are shown in different panels as indicated by the in-panel annotation. The black curve in each panel shows the average residual difference, and the shaded band shows the 1σ scatter; residuals for gravity-only simulations are shown with the solid black curve and purple shaded region; residuals for Illus-trisTNG are shown with the dashed black curve and green shaded region. The figure shows that the rolling power-law model defined by Eq. 1 gives an unbiased approximation to halo mass assembly with a typical error of 0.1 dex for t 1 Gyr. Using a sample of host halos in the BPL simulation with M peak (t 0 ) ≈ 10 12 M , we divide the sample in half according to t form , the median value of halo formation time for the sample, and for each subsample we compute ξ δm (r), the cross-correlation between halos and dark matter particles. Red curves show ξ δm (r) for early-forming halos, blue curves for lateforming halos; solid curves show results for the case where t form is computed directly from each halo's simulated merger tree; dashed curves show results for the case where t form is defined by the differentiable approximation to each halo's assembly history. The top and bottom panels show results for different definitions of t form as indicated in the legend. The figure demonstrates that the correlation between halo formation time and the density field is retained when simulated merger trees are approximated with our differentiable model. los with τ c 2 Gyr. In the bottom panel of Figure 5, we show a scatter plot of the two-dimensional distribution of P (β e , β |M 0 ), where we define β = U (α late ), the variable β e = U (α early − α late ), and the function U (s) ≡ ln(exp(s) − 1). As described in Appendix A, the variables β e and β are the actual quantities that we programmatically vary when seeking best-fitting approximations to the growth of individual halos, as this enforces physical constraints in the approximation to each halo's assembly. In the bottom panel of Figure 5, points with τ c > 2 Gyr are shown in purple, and points with τ c < 2 Gyr are shown in orange. The color-coding in the bottom panel brings the bimodal structure of P (α early , α late , τ c |M 0 ) into sharper relief; while the twodimensional marginal distribution P (α early , α late |M 0 ) is rather complex, the same two-dimensional distribution becomes significantly simpler when transforming to the The figure shows the distribution of best-fitting parameters of our model for individual halo growth for a sample of host halos in the BPL simulation with M peak (t 0 ) ≈ 10 12 M . The top panel shows the distribution of τc, which presents a modest bimodality of early-forming halos (τc < 2 Gyr) and late-forming halos (τc > 2 Gyr). The bottom panel shows a scatter plot of U (α early ), and U (α early − α late ), where U (s) ≡ ln(exp(s) − 1), with points colorcoded according to their value of τc. The two panels together highlight the bimodal structure of the three-dimensional probability distribution, P (τc, α early , α late |M 0 ). We leverage this bimodality when building our model for the growth of halo populations.
variables β e and β , and conditioning the distribution upon whether τ c < 2 Gyr. In the remainder of this section, we will outline how we have leveraged this apparent bimodality to build a simple two-population model of halo assembly; we refer the reader to Appendix B for a more expansive discussion of the full three-dimensional distribution, P (α early , α late , τ c |M 0 ).
We model P (α early , α late , τ c |M 0 ) as a two-population normal distribution of three variables: log 10 τ c , β = U (α late ), and β e = U (α early − α late ). For BPL, MDPL2, and IllustrisTNG, we find that relative abundance of the two populations is roughly 50%, but exhibits a shallow mass-dependence. Each sub-population is described by an independently-defined mean, µ i , and covariance matrix, Σ i , each of which also have a shallow, monotonic mass-dependence, so that µ i = µ i (M 0 ) and Σ i = Σ i (M 0 ). In calibrating our model parameters for this mass dependence, our principal goal is to faithfully describe P (M peak (t)|M 0 ) and P (dM peak /dt|M 0 ). Thus when optimizing our model parameters, for our target data we use measurements of the first two moments of P (M peak (t)|M 0 ) and P (dM peak /dt|M 0 ), rather than attempting to directly recover the distribution P (α early , α late , τ c |M 0 ). Briefly, we tabulate P (M peak (t)|M 0 ) and P (dM peak /dt|M 0 ) using several narrow bins of halo mass, trimming 3σ outliers before computing mean and variance; we optimize the parameters of our model for µ i (M 0 ) and Σ i (M 0 ) by minimizing the logarithmic difference between the mean and variance of simulated and predicted values for P (M peak (t)|M 0 ) and P (dM peak /dt|M 0 ). Our differentiable formulation in JAX enables us to optimize our parameters with the gradient-based BFGS algorithm implemented in scipy (Broyden 1970;Fletcher 1970;Goldfarb 1970;Shanno 1970). We refer the reader to Appendix C for a detailed description of our application of these techniques.
In Figure 6, we compare the average assembly of halos in gravity-only simulations to our best-fitting model; the top panel shows the average cumulative peak mass as a function of cosmic time, and the bottom panel shows average accretion rates. Assembly histories for halos of different present-day mass are color-coded as shown in the legend, with dashed curves showing the model, and solid curves showing the corresponding quantity in the simulations; we use BPL for halos with M peak ≤ 10 13.5 M , and MDPL2 for more massive halos.
Several of the basic physical principles of CDM structure formation are on plain display in Figure 6. In the top panels, we can see that for halos of all mass, the slope of M peak (t) is steeper at early times relative to late times; the flattening of the slope of M peak (t) is a manifestation of the evolution of halos from the fast-accretion regime to the slow-accretion regime. Structure growth in CDM is said to be "hierarchical", such that the formation of smaller halos precedes that of more massive halos. The hierarchical nature of halo assembly is visible in both panels, as we can see that the slope of the bluer curves reaches an inflection point at earlier times relative to the redder curves. In the bottom panel we can see the influence of dark energy on halo mass accretion rates: at low redshift when the cosmic acceleration becomes significant, the value of dM peak /dt is declining for halos of all mass, with lower-mass halos becoming subject to this trend at earlier epochs.
In Figure 7, we show that the DiffmahPop model for the assembly of halo populations captures both the average growth of halos, as well as the diversity of halo assembly. In the left column of panels of Fig. 7, we plot M peak (t)|M 0 , and in the right column of panels we plot dM peak /dt|M 0 ; in all panels, solid black curves show the quantity taken from the simulation, gray shaded bands show the 1σ standard deviation of the simulated quantity, and the dashed green curves show the results of our best-fitting model. Each row of panels corresponds to the history of halos with different present-day mass, as indicated by the in-panel annotation.
As mentioned above, the plotted data in Figure 7 are the first and second moments of P (M peak (t)|M 0 ), which we used directly as target data to calibrate our model parameters. In Figure 8, we show comparisons between simulations and our model predictions for the full distribution, P (M peak (t)|M 0 ). In the top panel of Fig. 8, we focus on a sample of host halos in MDPL2 with present-day mass of M 0 = 10 14 M , and show the distribution of halo masses at several different redshifts as indicated in the legend. In the bottom panel of Fig. 8, we focus on host halos in BPL with present-day mass of M 0 = 10 12 M , and show distributions of halo formation time, t form , defined as the first time the halo attained some specified fraction of its present-day mass; different shaded histograms show different choices for the definition of t form as indicated in the legend. In both panels, each shaded histogram is accompanied by a dashed black curve showing the predictions of our differentiable model of halo populations. Fig. 8 does not display the same level of agreement between simulations and model as illustrated in Figure 7, because in Fig. 8 we show the full probability distributions taken directly from the simulated merger trees without trimming 3σ outliers. We refer the reader to §C.2 for a detailed description of our outlier exclusion algorithm, which has the effect of excluding 2-3% of halos at each mass that have recently experienced an unusu- In the left column of panels, we plot the history of cumulative mass of simulated halos, M peak (t), showing its average value with the solid black curve, and the 1σ standard deviation with the gray band. The right hand column of panels shows the mean and standard deviation of halo mass accretion rates. The history of halos of different present-day mass are shown in different rows of panels as indicated by the in-panel annotation. In all panels, the dashed green curves show our model for the assembly of halo populations.
ally large major merger. Thus the shaded histograms in Fig. 8 include contributions from short-term fluctuations associated with recent mergers, whereas we optimized our model for halo populations using a sample that excludes such contributions. This difference can be seen in two different manifestations in Fig. 8. First, the PDFs shown in the shaded histograms have slightly broader support relative to our model, reflecting the increased variance associated with the inclusion of outliers. Second, the shaded histograms tend to be shifted towards slightly earlier redshifts relative to the model; this second effect is due to our election to model the evolution of peak halo mass, M peak (t), whose associated formation times are asymmetrically shifted to earlier epochs by major mergers. Both of these effects are visible in Fig. 8. The full distributions for Milky Way mass halos shown in the bottom panel display a somewhat better level of agreement relative to the full distributions for clustermass halos shown in the top panel; this is sensible since the assembly of more massive halos tends to be more rich in mergers.

DISCUSSION & FUTURE WORK
We have presented a new model for the assembly history of individual dark matter halos, Diffmah. In our model, halo mass is a power-law function of cosmic time with a rolling index, M peak (t) ∝ t α(t) , where α(t) is a sigmoid function that smoothly transitions halo growth from the early-time, fast-accretion regime, to the latetime, slow-accretion regime (see Eqs. 1-2). We note that we have chosen to model peak halo mass, M peak (t), the largest mass attained by the halo up until the time t, and so our model is not intended to capture the instantaneous halo mass, M halo (t), which is subject to additional physical effects whose treatment is beyond the scope of the present paper.
The results in §3 in the main body of the paper demonstrate that for host halos identified with Rockstar over a wide range of present-day mass, M 0 ≡ M peak (z = 0) 10 11 M , the Diffmah model faithfully describes the mass assembly of individual halos over nearly all of cosmic time, t 1 Gyr, with a typical accuracy of ∼ 0.1 dex or better. In Appendix D, we show that our model for M peak (t) enjoys the same level of accuracy regardless of whether the host halo experiences a splashback event in its past history, or whether the halo eventually becomes a subhalo of a more massive host, including cases where the subhalo is disrupted or merges with its host. In Appendix E (together with Fig. 3), we show that our model achieves the same level of accuracy in its description of M peak (t) for both host halos and subhalos in IllustrisTNG, a full-physics hydrodynamical simulation whose halos and merger trees were identified with entirely different algorithms from those used in the analysis of the gravity-only simulations we consider.
We have additionally presented a new model for the assembly of halo populations, DiffmahPop. In typical empirical approaches to modeling halo assembly, average halo growth is directly parameterized with a fitting function, and variance in halo assembly is modeled simply as random scatter. Our alternative approach is formulated by using the Diffmah model for individual halo growth as its fundamental basis, and by building a model for the statistical distribution of the Diffmah parameters that regulate individual halo assembly. As a result, DiffmahPop not only reproduces M peak (t)|M 0 , but also a cosmologically realistic diversity of individual halo trajectories across time, P (M peak (t)|M 0 ). The DiffmahPop model for the growth of halo populations is differentiable courtesy of its implementation in the autodiff library JAX, and accurately captures the average mass accretion rate across cosmic time, dM peak /dt|M 0 , as well as the diversity of accretion rates P (dM peak /dt|M 0 ). As a convenience, our publicly available python code includes functionality to generate a Monte Carlo realization of the diversity of halos with the same M 0 .
The accuracy of our approximation may not be surprising as previous results have shown that halo mass is predominantly assembled via a combination of smooth accretion and mergers with a very small mass ratio (e.g., Genel et al. 2010). However, our model could naturally be used as the basis of empirical techniques to predict merger rates and/or the subhalo mass function at the time of accretion (also referred to as the unevolved subhalo mass function, or USHMF hereafter). For example, in Yang et al. (2011), the authors developed an analytical model that accurately predicts the USHMF; as the basis of their model, they used the fitting function of Zhao et al. (2003b) for the median assembly history of dark matter halos, supplementing this with a simple lognormal distribution to describe random scatter in the main-branch mass as a function of time. The techniques in the present work could be used to improve upon such an approach, as our model provides an accurate description of P (M peak (t)|M 0 ), as shown in Figure 7. Such an extension could also prove fruitful when used in concert with subhalo orbital evolution codes (e.g., Zentner et al. 2005;Jiang et al. 2021). Recent progress in modeling subhalo orbital evolution has demonstrated the importance of accounting for the evolution of the host halo potential (Ogiya et al. 2021), and so deploying our model in this context could be leveraged to capture physically realistic accretion-rate correlations in the halo-to-halo variance of substructure abundance and orbits across cosmic time (see, e.g., Jiang & van den Bosch 2017). Along similar lines, it has recently been shown that scatter in the thermal Sunyaev-Zel'dovich effect is largely driven by variance in the assembly histories of cluster-mass halos (Green et al. 2020), and so our model may also help improve the predictive power of models for the massobservable relation of galaxy clusters.
While previous work has demonstrated that the cosmology-dependence of halo assembly is relatively simple (e.g., Zhao et al. 2009), all results in the present work are limited to a fixed set of cosmological parameters that closely match those in Planck Collaboration et al. (2014). In van den , it was shown that halo assembly histories have a universal form controlled entirely by the cosmology-dependence of cosmic time, and so it may be relatively straightforward to extend our model for P (M peak (t)|M 0 ) to capture the influence of cosmological parameters on halo growth. Such an extension could be directly incorporated into frameworks that model cosmology-dependence via physicallymotivated rescalings (e.g., Angulo & White 2010;Renneby et al. 2018;Aricò et al. 2020). In principle, our model could also be used in applications with emulators built upon cosmological suites of simulations (e.g., Heitmann et al. 2016;DeRose et al. 2019;Nishimichi et al. 2019), although this would require suites with high massresolution that include merger trees, which remains uncommon (although see Contreras et al. 2020, for recent progress).
Our restriction to cosmological parameters similar to Planck Collaboration et al. (2014) is just one example of how in the present work, we have not attempted to provide a comprehensive study of P (M peak (t)|M 0 ). As another example, even though we have provided separate calibrations of our model for halos in gravity-only simulations and in IllustrisTNG, we have not attempted a detailed characterization of the differences between the assembly histories of halos in the two simulations. For such a purpose, a simulation suite such as CAMELS (Villaescusa-Navarro et al. 2020) that spans a range of baryonic feedback parameters would enable a more rigorous study than an uncontrolled comparison between IllustrisTNG and N-body halos. Furthermore, when calibrating our models for P (M peak (t)|M 0 ), we have focused on the mass range 10 11.5 M ≤ M peak ≤ 10 14.5 M as this range of halo masses contains thousands of halos that are resolved by over 2000 particles in the N-body simulations we use; however, a series of nested-resolution boxes would be considerably more effective towards the goal of extending the mass range and conducting rigorous resolution tests. As discussed in §A, we have similarly focused on the range of cosmic time t > 1Gyr to avoid potential resolution issues in our simulations, but our work could be extended to redshifts z > 5 with a simulation suite that would permit such resolution tests. Improvements such as these would require a dedicated effort along the lines of what has been done to calibrate models of the halo mass function (e.g., Jenkins et al. 2001;Tinker et al. 2008;McClintock et al. 2019; Bocquet et al. 2020); we thus consider these improvements beyond our present scope, as here we merely demonstrate that a modeling approach such as ours has capability to deliver a precision tool.
When fitting the growth history of large samples of simulated halos, we find that the distribution of best-fitting parameters exhibits a clear bimodality, with earlier-forming and later-forming halos occupying distinct regions of the three-dimensional parameter space of α early , α late , and τ c . Bimodal halo growth has been reported previously in Shi et al. (2018) in terms of the distribution of halo formation times, although this previous finding applied only to infall halos that eventually became subhalos of some more massive host. The bimodality we find pertains to host halos at z = 0, applies to halos of all mass, persists even when excluding halos that have previously experienced a flyby event, and is present with comparable strength in both gravity-only simulations and IllustrisTNG. We have leveraged this bimodal structure in building our analytical model for P (M peak (t)|M 0 ), as capturing this feature considerably simplifies the task of achieving the level of agreement with simulations shown in Fig. 7, particularly for the case of the second moment.
The two-point clustering of dark matter halos of the same mass has well-known dependence upon halo formation time, t form , a phenomenon known as halo assembly bias. In Figure 4, we showed that the t form -dependence of halo clustering remains the same whether one uses the actual formation time taken directly from simulated merger trees, or the value of t form taken from our bestfitting differentiable model. This result implies that our model can approximate halo growth in such a way that residual errors in the fit are uncorrelated with the density field on large scales. This finding is especially interesting in light of the relationship between assembly bias and halo concentration (Wechsler et al. 2006), the latter of which is closely connected to assembly history (Wechsler et al. 2002;Chen et al. 2020b), and is known to be strongly influenced by mergers (e.g., Wang et al. 2020). We will investigate this issue systematically in follow-up work to the present paper.
As discussed in §1, halo assembly history plays a fundamental role in contemporary semi-analytic models of galaxy formation (SAMs), since the star formation rate of a main sequence galaxy is commonly assumed to be proportional to the mass accretion rate of its parent halo. We have formulated our model for the assembly of halo populations with numerous applications to SAMs squarely in mind. For example, our model has potential to considerably accelerate efforts to explore the portions of SAM parameter space that regulate eff , the efficiency with which galaxies transform accreted mass into stars, since running SAMs on main-branch histories alleviates the need to solve galaxy formation ODEs along every branch of the merger tree. On the one hand, this computational speedup comes at the cost of neglecting the role of mergers on SAM-predicted galaxy properties. On the other hand, the capability of our model to generate a physically realistic diversity of smooth, individual halo assembly histories enables a rather comprehensive investigation of the specific role of mergers in shaping the properties of the galaxy population. We intend to pursue these directions in future work based on the Galacticus semi-analytic model (Benson 2012); Galacticus already includes an independent implementation of our model in its publicly available source code 2 , which may be useful in its own right for researchers more accustomed to programming in Fortran rather than Python.
As the principal aim of our future work, we are using our model for halo assembly as the foundation of a new approach to forward modeling cosmological structure formation, Surrogate Modeling the Baryonic Universe (SBU). The basis of SBU is a minimally flexible parameterized family of solutions to the differential equations of galaxy formation; we make predictions for galaxy populations by building a statistical model for a cosmologically realistic diversity of individual galaxy histories (Alarcon et al. 2021). The approach taken in the present work to modeling halo populations essentially provides a template for this framework.
This forward modeling effort is already well underway. In Chaves-Montero & Hearin (2020), we studied the influence of star formation history (SFH) on the broadband photometry of galaxies and found that, despite the complexity of the processes that impact optical color, physically-motivated variations in SFH correspond to changes in a unique direction in color-color space. In a follow-up paper (Chaves-Montero & Hearin 2021), we studied how the presence of unresolved SFH fluctuations impact model predictions for broad-band photometry, finding that the statistical distributions of the broadband colors of a galaxy population are quite insensitive to short-term variability. These results taken together indicate that the influence of halo assembly upon galaxy photometry is likely to be simple, and that the absence of an explicit treatment of halo mergers in our model may not pose a problem for the ability of SBU models to make accurate predictions for the color distributions observed by imaging surveys such as the Dark Energy Survey 3 (DES, Dark Energy Survey Collaboration 2016) and the Legacy Survey of Space and Time 4 (LSST, Ivezić et al. 2019;LSST Science Collaboration et al. 2009). Furthermore, the results shown in Figure 4 indicate that SBU predictions for color-dependent clustering are also likely to be unbiased by the smooth nature of our model for halo assembly.
We expect that the differentiable formulation of our halo assembly model based on the JAX autodiff library will play a critical role in this forward-modeling program. Modern analyses of large-scale structure rely upon a considerable number of "nuisance parameters" in order to ensure that cosmological inference is robust to systematic uncertainty. Conventional Bayesian techniques such as MCMC scale poorly with the dimension of the model parameter space, and since the statistical precision of cosmological surveys will dramatically improve in the 2020s, the parameter space of models used to analyze largescale structure measurements will only continue to increase. The capability to make differentiable predictions addresses this growing problem, since gradient-based optimization techniques such as Adam are extremely efficient even in very high dimensions (Kingma & Ba 2014), as are gradient-based inference methods such as Hamiltonian Monte Carlo (Hoffman & Gelman 2014). In the present work, we have shown that not only is our model for individual halo growth differentiable, but using the weighted sampling techniques described in Appendix C, so are one-point function predictions based on our model for the assembly of halo populations. In a closely related paper to the present work ), we will introduce a new theoretical framework for making differentiable predictions for two-point functions such as galaxy clustering and lensing, using abundance matching ) as a toy model for the galaxy-halo connection. The halo assembly model presented here will provide a core ingredient to our program to make differentiable, multi-wavelength predictions for large-scale structure that are physically self-consistent across cosmic time.

SUMMARY
We conclude by summarizing our primary results: 1. We have introduced Diffmah, a new fitting function for M peak (t), the evolution of cumulative peak mass of individual dark matter halos; our model approximates halo assembly as a power-law function of cosmic time with rolling index, M peak (t) ∝ t α(t) (see Eq. 1). Using both gravity-only simulations and IllustrisTNG, we have demonstrated that the Diffmah model can approximate halo growth with an accuracy of 0.1 dex over the range t 1 Gyr for halos of present-day mass M 0 10 11 M .
2. We find that the connection between halo assembly and the large-scale density field, known as halo assembly bias, is entirely captured by Diffmah, and that residual errors of our differentiable approximation to halo growth exhibit a negligible correlation with the density field on large scales.
3. We have introduced a new model for the growth of halo populations, DiffmahPop, and shown that our model faithfully reproduces the evolution of average halo mass, M peak (t)|M 0 , and average mass accretion rate, dM peak /dt|M 0 , for cosmic times t 1 Gyr. The DiffmahPop model additionally captures the diversity of halo mass assembly, P (M peak (t)|M 0 ), as well as the diversity of accretion rates, P (dM peak /dt|M 0 ). 4. Our python code, diffmah, is publicly available and can be installed with pip or conda. The repository for our source code is on GitHub, https:// github.com/ArgonneCPAC/diffmah, and includes Jupyter notebooks providing demonstrations of typical use cases. A parallelized script in the diffmah repository can be used to fit the assembly histories of individual simulated halos with the Diffmah parameters. The diffmah code provides a differentiable description of P (M peak (t)|M 0 ) and P (dM peak /dt|M 0 ) for both gravity-only simulations and IllustrisTNG, and also includes a convenience function that can be used to generate Monte Carlo realizations of cosmologically realistic populations of halo assembly histories. Precomputed fits for hundreds of thousands of halos in the BPL, MDPL2, and IllustrisTNG simulations are available at https://portal.nersc. gov/project/hacc/aphearin/diffmah_data/. In this appendix, we give a detailed description of Diffmah, our parametric model for the mass assembly of individual halos. As outlined in §3, the Diffmah model for individual halo growth is based on Eq. 1, which describes a power-law scaling between peak halo mass and cosmic time, M peak = M 0 (t/t 0 ) α(t) , where the powerlaw index α(t) smoothly rolls between asymptotic values α early and α late according to the sigmoid function in Eq. 2. Thus there are a total of six numbers that fully characterize our model for individual halo growth: the normalization, M 0 , the present-day cosmic time, t 0 , and the four sigmoid parameters, α early , α late , k, and τ c . For all halos in each simulation, we hold t 0 fixed to the present-day age of the universe in the simulation, and M 0 fixed to the value M peak (t 0 ) in the simulation. All results in the paper also hold k fixed to 3.5; to determine this particular value, we ran the optimization algorithm described below while allowing k to be a free parameter, and then observed no appreciable changes in the quality of the fits when holding k fixed to any value in its typical best-fitting range, 2 k 5. Thus the Diffmah model for individual halo growth has a total of 3 free parameters: α early , α late , and τ c .
When evaluating the sigmoid behavior of the powerlaw index, we implement this scaling relation in logarithmic time, x ≡ log 10 t, rather than linear time. Thus in  Fig. 1, only here we show the behavior of the autodiff-computed gradients that we use to determine the bestfitting parameters describing the assembly of each simulated halo.
practice, we calculate the power-law index α(t) via: In order to identify an optimal set of parameters for each halo, we searched our three-dimensional model parameter space, θ, for the combination of α early , α late , τ c that minimizes the quantity L MSE , defined as where w and v are the base-10 logarithm of the predicted and simulated values of M peak evaluated at a set of N control points, t i . For the control points of each halo, we use the collection of snapshots in the halo's merger tree after a time t min , defined as where we set t cut = 1 Gyr, and the quantity t thresh is the most-recent simulated snapshot where the halo mass falls below a simulation-dependent threshold mass, M thresh ; for the MDPL2 simulation, we use M thresh = 10 11.25 M , and for the Bolshoi-Planck and TNG simulations we use M thresh = 10 10 M . In Eq. A3, the quantity t δm refers to the most-recent snapshot where the halo mass falls below some fraction f δM of its present-day mass; for all simulations, we set f δM = 10 2.5 . In each of the three simulations we studied, for a surprisingly large number of halos, we find that at early times, t 3Gyr, the behavior of M peak (t) suddenly drops like a stone by an order of magnitude or more from one snapshot to the next, even for halos that should nominally be resolved by hundreds of particles. We found through experimentation that the inclusion of the term t δm in Eq. A3 is reasonably effective at limiting the influence of this phenomenon on the best-fitting parameters describing the halo MAH. We suspect that this behavior is a reflection of the difficulty of halo-finding when the high-redshift density field resembles a sea of shallow peaks, however, the root cause of this phenomenon is unclear, and other origins of this phenomenon are certainly plausible. For example, recent work on structure formation has uncovered surprisingly stringent resolution demands of N-body simulations for science applications that require robust tracking of the evolution of halo substructure Ogiya et al. 2019). It could be the case that analogous results apply to the simulation requirements associated with reliable tracking of halo mass accretion history in the first Gyr of cosmic time; alternatively, halo assembly at high redshift may simply require an even more flexible fitting function than the smoothly rolling power-law model adopted here. As discussed further in §5, the most reliable way to address this and related issues is through a dedicated resolution-requirement study that uses homogeneously-processed, nested simulation boxes spanning a wide range of resolution. Since such a simulation suite is not currently publicly available, and since our own science applications are focused on the redshift range z 5 that is most relevant to present-day and near-future cosmological surveys, we have opted to relegate further study of this issue to future work.
To minimize L MSE for each halo, we use the JAX implementation of the Adam algorithm (Kingma & Ba 2014), which is a gradient descent technique with an adaptive learning rate. Our use of Adam requires calculating the gradients ∂L MSE /∂θ, which is facilitated by our JAX implementation, allowing us to efficiently compute these gradients to machine precision without reliance upon finite differencing methods. We show example gradients of our model predictions for M peak (t) in Figure A9. To tune the performance of the optimization beyond the default settings recommended in Kingma & Ba (2014), we use 2 successive burn-in cycles with a relatively large step-size parameter of 0.25 for ∼ 50 updates, followed by a final cycle for an additional ∼ 200 updates using a step-size parameter of 0.01.
A.1. Imposing physical constraints on individual halo growth When minimizing L MSE , the parameters we actually vary in the gradient descent are β e , β , and x 0 , where the relationships between these parameters and the quantities that directly appear in Eq. 1, α early , α late , and τ c , are defined as follows: The function S(z) is the softplus function, which is strictly positive across the real line, exhibits approximately linear behavior for s 1, and asymptotically approaches zero as s → −∞. For any positively-valued z, the inverse of the softplus function is well-defined and computed as S −1 (z) ≡ ln(exp(z) − 1), which we have written as U (z) in the main body of the paper and plotted on the axes of Figure 5, as well as on the axes of the figures in Appendix C. Figure A10 gives a visual illustration of the behavior of the softplus function we use to enforce these constraints for the case of α late .
Through our use of these variable transformations, we ensure that the best-fitting parameters determined by our application of gradient descent will always respect 0 < α late < α early . Mathematically, this guarantees that The figure shows the behavior of the softplus function that controls the relationship between α late and β , the actual variable that we vary in the gradient descent algorithm used to fit the assembly history of each individual halo. We use a similar technique to ensure 0 < α late < α early , and a simple logarithmic transformation to enforce τc > 0. These transformations guarantee that the growth history of each model halo will always increase monotonically and transition from a fast-accretion phase to a slow-accretion phase, even for outlier halos with unusual merger trees. See Eq. A4 and associated discussion for details.
in the best-fitting approximation to each individual halo, for all t > 0 the quantity M peak (t) will increase monotonically, and dM peak /dt will strictly slow down as M peak (t) eventually attains the value M 0 at the time t 0 . These mathematical constraints embody the physical expectation that dark matter halo growth begins with a fastaccretion phase at early times, and transitions to a slowaccretion phase at late times. One can imagine employing a more flexible fitting function such as one based directly on a machine-learning algorithm that would not hard-wire this expectation into the approximation; such an approach has potential to capture short-term fluctuations that our model does not. We note, however, that even when these constraints are imposed on the fits as described above, dark matter halo growth can be approximated with a typical accuracy of 0.1dex or better for all t 1 Gyr.
In carrying out this investigation, we studied a wide range of alternative implementations in which alternative variables were used in the optimization. For example, rather than using β e and β , we have also explored the use of logarithmic variables, and an alternative formulation in which the varied parameters were restricted to finite bounds via a sigmoid function. The specific choice for which parameters are varied in the optimization amounts, in effect, to a choice for a prior distribution on the parameters. We found through considerable experimentation that this choice can have a significant impact on the distribution of best-fitting parameters, even though the quality of the individual fits themselves was typically unaffected by this choice. Since one of our primary goals was to build a simple model for the distribution of best-fitting parameters, then the choice for the effective priors warranted special attention. Ultimately, our decision on the parameterization was driven by two considerations: first, that the differentiable approximation to each simulated halo respects the physical constraints defined above, and second, that the distribution of best-fitting parameters admits a reasonably simple em-pirical description.
Amongst all the variations we explored for the effective priors, as well as amongst several variations in the actual functional form used to fit the growth of individual halos, we found a clear bimodality in the halo population. The manifestation of this bimodality of course changed from variation to variation, but in all cases, we consistently found two separate sub-populations within the distribution of best-fitting parameters, roughly corresponding to early-forming and late-forming regions of parameter space. While these two sub-populations are easily identified in terms of their clustering within the distribution of best-fitting parameters, the distribution of the actual values of the formation time of the halos, t form , shows heavy overlap between the two groupings. There is also no strong dependence of the NFW concentration, c, on sub-population membership, even though c is strongly correlated with t form . Moreover, when splitting a sample of halos of the same M 0 according to sub-population membership, the two-point clustering of two samples hardly differs, even for halo masses where the actual t form -dependence of ξ mδ (r) is strong. Thus even though this bimodality considerably simplified our task of modeling the distribution of best-fitting parameters (as described in detail in Appendix B), the subpopulation membership of a halo may not have special physical significance.

POPULATIONS WITH DIFFMAHPOP
In this appendix, we describe our model for the diversity of assembly histories of halo populations. The goal of our halo population model is to capture P (M peak (t)|M 0 ) and P (dM peak /dt|M 0 ). Our approach to this problem is to use our model for individual halo growth as a basis, and to build a model of the statistical connection between present-day halo mass and the distribution of parameters describing individual halo MAHs, P (α early , α late , τ c |M 0 ).
As outlined in §4, we model the distribution P (α early , α late , τ c |M 0 ) as a two-population normal distribution of 3 parameters. We model this distribution in terms of the same parameter transformations used in Appendix A to define the variables β ≡ U (α late ), β e ≡ U (α early − α late ), and x 0 ≡ log 10 τ c , where U (s) ≡ ln(exp(s) − 1) is the inverse of the softplus function (see Eq. A4). Thus each sub-population is described by an independently-defined mean, µ i ≡ (β e , β , x 0 ), and covariance matrix, Σ i , defined as By defining our halo population model in terms of the transformed variables β e , β , and x 0 , we ensure that all values of α early , α late , and τ c in the support of P (α early , α late , τ c |M 0 ) will respect the physical constraints described in §A.1, even in the tails of the distribution.
The relative weight of the two populations, F late , has shallow, monotonic mass-dependence, so that F late = F late (M 0 ). Each of the three components of µ i , and each of the six components of Σ i , also have a shallow, monotonic mass-dependence, so that µ i = µ i (M 0 ), and Σ i = Σ i (M 0 ). For the remainder of this appendix, we will describe our models for F late (M 0 ), µ i (M 0 ) and Σ i (M 0 ) in turn.
We model the mass-dependence of F late (M 0 ) as a power-law function of present-day mass with rolling index. We capture this behavior using the same sigmoid functional form defined in Eq. 2, so that F late has sigmoid-type dependence upon log 10 M 0 . We calibrate the specific parameters of the sigmoid using the techniques described in detail in Appendix C.
We model the mass-dependence of each component of µ i in the same way we modeled F late , i.e., as a power-law function of M 0 with rolling index, implemented based on a sigmoid function of log 10 M 0 . Again we relegate discussion of our determination of the specific parameters of these sigmoid functions to Appendix C.
In order to model the components of each Σ i , we make use of the Cholesky matrix decomposition. Briefly, for any real-valued, symmetric, positive-definite matrix, Σ, there exists a unique, real-valued, lower-triangular matrix, L, that defines the Cholesky decomposition, written as Σ = L · L T . The diagonal entries of L are strictly positive with a product equal to the determinant of Σ. We use JAX to calculate L so that our computations will be differentiable with autodiff, but many modern linear algebra libraries have efficient implementations of the Cholesky decomposition (for a modern review, see Higham 2009).
In modeling each of the two 3 × 3 matrices, Σ i , the quantities we parameterize are the 6 entries of the associated Cholesky matrix: In fitting our model for P (M peak (t)|M 0 ) as described in Appendix C, we formulate our parameterization in terms of log 10 a i , log 10 b i , and log 10 c i to ensure that Σ = L · L T will always be positive definite, but the parameters d i , e i , and f i can take on any value on the real line and still produce a valid covariance matrix, and so we use linear variables for our parameterization of the off-diagonal entries of L.

C. FITTING THE PARAMETERS OF DIFFMAHPOP
Our goal in §4 is to identify a point in the parameter space of our model that can be used to generate a realistic distribution of individual, smooth trajectories of halo assembly, and at the same time results in an accurate reproduction of P (M peak (t)|M 0 ) and P (dM peak /dt|M 0 ). To achieve these goals, our model for P (M peak (t)|M 0 ) has numerous free parameters that required tuning in order to achieve the level of agreement shown in Figures 7 & 8. In C.1, we review the general techniques we used for making differentiable predictions for the mean and variance of P (M peak (t)|M 0 ) and P (dM peak /dt|M 0 ), and in C.2 we describe how we optimized the parameters of our best-fitting models.
C.1. Differentiable predictions for halo population assembly The first moment of our model prediction for P (M peak (t)|M 0 ) is given by where the PDF in the integrand is given by 5 P (α early , α late , τ c ) = (1 − F late ) · P early (α early , α late , τ c ) + F late · P late (α early , α late , τ c ), (C2) where P early and P late are the two, separately-defined three-dimensional Gaussian distributions described in Appendix B. The calculation of the second moment of P (M peak (t)|M 0 ) is similar to Eq. C1, only rather than M peak (t) appearing in the integrand, the PDF-weighted There are a number of approaches one could take to computing these nested integrals. First, we note that our model for P (α early , α late , τ c |M 0 ) is completely analytic and smooth, without any complex oscillatory behavior or sharp turns in the parameter space, and so commonlyused integration routines (e.g., Romberg 1955) would have no trouble giving a high-accuracy result. Alternatively, each of the component ingredients of our model for the PDF in the integrand of Eq. C1 have well-established numerical algorithms for generating stochastic realizations of the distributions, and so it would be straightforward to use a Monte Carlo approach to compute these integrals via moments of randomly generated samples. Either calculation could also be conveniently conducted in python using implementations in the scipy library. However, we have instead opted for a third method based on weighted sampling that simplifies gradient calculations with autodiff.
Our autodiff-friendly method for calculating the integrals in Eq. C1 can be easily understood in terms of a simple toy example of a generic PDF convolution: where we will think of the integrand as some smooth target function, f (x), weighted by a probability distribution, P (x), and where in the second equation we have replaced the continuous integration with a finite sampling of points spanning the domain of support, D. In other words, to calculate the integral in Eq. C3, rather than using an iterative algorithm such as Romberg (1955) that is challenging to implement in an autodiff library, instead one can use simple Gaussian integration with a sufficiently dense sampling grid. This toy problem is of the same form as the problem at hand in the evaluation of Eq. C1 for cases where P (x) = P (x|θ) and f (x) = f (x|θ), only with this formulation, we can readily see how simple it is to compute gradients of y with respect to model parameters, θ : By implementing the model components described in Appendix B in an autodiff library such as JAX, evaluating expressions such as C4 is entirely straightforward and does not require laboriously working out the analytical result for each contribution to the summations. One need only ensure that the integration domain D is sufficient to span the range of non-negligible support for P (α early , α late , τ c |M 0 ). We found this flexibility especially helpful during development phases of our model, when the need to conduct a range of modeling experiments required many variations of the underlying functional forms. In the following section, we describe how we carried out these calculations for the particular case of the PDF convolutions that arise from optimizing our model predictions for the first and second moments of P (M peak (t)|M 0 ) and P (dM peak /dt|M 0 ).
C.2. Optimizing DiffmahPop predictions for halo population assembly In this section, we describe how we optimized the parameters of our model predictions for P (M peak (t)|M 0 ) and P (dM peak /dt|M 0 ), where the target distributions come from the merger trees of halos in gravity-only Nbody simulations. To generate our target data for a particular bin of present-day halo mass, we begin by selecting host halos in bin of M peak (z = 0) with width 0.1dex. One issue that arises in making accurate predictions for the variance of halo histories is that in our model, the trajectory of each halo has exactly the same final mass, M 0 , by construction, whereas the bin of simulated halos will have a source of unwanted variance due to the finite width of the bin. To account for this effect, we rescale the history of each simulated halo according to its actual value of M peak (z = 0) in the simulation before computing the target mean and variance.
As mentioned in §4, when defining the target data for a particular bin of present-day halo mass, M 0 , we limit the impact of wild outliers in the merger trees by trimming 3σ outliers from the halo sample. We conduct this outlier exclusion procedure as follows. For each snapshot of the simulation, t, we compute the trimmed mean, µ p (t|M 0 ), and trimmed variance, σ p (t|M 0 ), using scipy.stats.mstats.trimmed mean and scipy.stats.mstats.trimmed variance, respectively; within the scipy library, these quantities are computed by rank-ordering the objects in the sample, computing the percentile of each object, p, excluding the objects lying outside the range (p, 100 − p), and computing the mean and variance from the resulting population. At each snapshot, and for each bin of halo mass, we use µ p and σ p to compute the z-score, z p (t|M 0 ) ≡ log 10 M peak (t|M 0 ) − µ p (t|M 0 ) σ p (t|M 0 ) .
We use p = 0.01 to define each snapshot's trimmed mean and variance, and at each snapshot over the redshift range 0 < z < 3, we flag any halo with |z p (t|M 0 )| > 3. We exclude any flagged halo from the sample whose mean and variance define the target data we use to optimize our model for halo populations; this outlier exclusion procedure typically removes 2-3% of the halos in each mass bin; by construction, this excludes halos that experience an unusually large major merger after z < 3; the function implementing this algorithm can be found in the measure mahs.py module of the publicly available diffmah source code. To compute the target data defined in this fashion, we use the BPL simulation for bins of present-day halo mass centered at M 0 ≤ 10 13.5 M , and MDPL2 for larger halo masses, choosing bin centers spanning the range log 10 M 0 ∈ [11.75, 14.5] , with 0.25dex separation. In building our model for P (M peak (t)|M 0 ), our goal is to faithfully recover the diversity of smooth assembly histories for a population of halos with the same presentday mass, M 0 . As our model for individual halo assembly does not explicitly account for contributions from mergers, for our target data we use each individual halo's bestfitting M peak (t) and dM peak /dt to define the target mean and variance of P (M peak (t)|M 0 ) and P (dM peak /dt|M 0 ). If we were to instead use the directly simulated histories to define the target one-point functions, then our model for P (α early , α late , τ c |M 0 ) would instead converge to a result with unrealistically large variance in the smooth trajectories traced by real halos. Using the directly simulated values of the MAHs as target data would require building an additional model component describing the variance from mergers about the smooth trajectories, which is beyond our intended scope.
The predictions of our model are controlled by the parametrized behavior of F late (M 0 ), µ i (M 0 ) and Σ i (M 0 ), and optimizing these functions requires beginning with an initial guess for the parameters. In order to determine this initial guess, we directly inspect the distribution of best-fitting parameters describing the individual trajectories of the halos in each target bin, M 0 . We use the Gaussian mixture model implemented in scikit-learn to decompose the binned halos into two weighted sub-populations defined by β e , β , and x 0 ; we thus use scikit-learn to supply an estimate for the fraction of later-forming halos in the bin,F late (M 0 ), and the mean and covariance of each sub-population,μ i (M 0 ) andΣ i (M 0 ), respectively. We proceed in this fashion and record each result for each bin of M 0 . Figure C11 shows an example comparison between initial guess for the two-component model and the true distribution of parameters for halos with M 0 = 10 12 M .
Using the collection ofF late (M 0 ),μ i (M 0 ) andΣ i (M 0 ) for each of our target bins of present-day halo mass, we find that each of these quantities exhibits a shallow, monotonic mass-dependence that is well-approximated by the 4-parameter sigmoid function defined in Eq. 2. To define our initial guess for the best-fitting parameters of our halo population model, the parameters of each sigmoid can simply be hand-tuned to give a reasonable approximation to each quantity, and in all cases we found acceptable approximations by holding the transitionmass parameter fixed to x 0 = 13.5, and the transitionspeed parameter fixed to k = 0.5.
Using the procedure described above to define the initial guess, we use the L-BFGS-B algorithm implemented in scipy to optimize our model parameters. The loss For halos of present-day peak mass M 0 = 10 12 M , the figure shows cross-sections of P (α early , α late , τc|M 0 ). For each individual BPL halo in the mass bin, we approximate the assembly history of the halo with the rolling power-law model described in §3, and plot the distribution of parameters U (α early ), U (α early − α late ), and log 10 τc, where U (s) ≡ ln(exp(s) − 1). The orange ellipsoids show the earlier-forming sub-population (τc 2 Gyr) of the two-component Gaussian distributions identified by scikit-learn, and the purple ellipses show the later-forming sub-population (τc 2 Gyr). The colored ellipses are defined byμ i andΣ i that we use to define the initial guess for the parameters regulating our model prediction for P (M peak (t)|M 0 ) and P (dM peak /dt|M 0 ). See text in C.2 for details. The axes in each row of panels are the same as those described in Figure C11. Different colored ellipses correspond to normal distributions of halos with different present-day mass as indicated in the legend. The top row (bottom row) of panels shows distributions for the earlyforming (later-forming) population of halos, where the mean and variance of each population was determined according to the optimization procedure described in C.2. function we minimize in this case is analogous to L MSE used in Appendix A (see Eq. A2), only here we minimize the sum of the logarithmic differences between the collection of means and variances of P (M peak (t)|M 0 ) and P (dM peak /dt|M 0 ) tabulated for each of our target mass bins. In searching for the best-fitting point, we do not vary all four parameters of each sigmoid function of each component, but only the two parameters controlling the two asymptotic bounds on each sigmoid, holding the k and x 0 parameters fixed to their hand-tuned values. Moreover, we do not allow the parameters to vary arbitrarily, but rather we only permit the L-BFGS-B algorithm to search within a relatively narrow range of ∼ 30% of the initial guess. We refer the reader to our publicly available source code for further quotidian details on the implementation of this optimization procedure. Figure C12 displays the results for the mass-dependence of the two-component normal distribution whose predictions for P (M peak (t)|M 0 ) and P (dM peak /dt|M 0 ) are shown in the main body of the paper. We refer the reader to the rockstar pdf model.py of our source code for the specific best-fitting values that result from this optimization exercise.

D. ASSEMBLY HISTORIES OF SUBHALOS, ORPHANS, AND SPLASHBACK-CENTRALS
In the publicly available Rockstar catalogs described in §2, the column a first infall indicates the scale factor of the first snapshot where the (sub)halo was contained within the virial radius of some more mas-sive halo. For a present-day host halo (as defined by upid=-1), the value of a first infall will be smaller than unity if and only if the halo experienced a flyby event at some point in its past history. For the results in the main body of the paper, we defined our host halo samples by requiring upid=-1, without regard for the a first infall column. Figure D13 in this appendix shows the residual errors of our model for individual growth for samples of flyby host halos defined by upid=-1 and a first infall<1, using narrow bins of present-day peak halo mass as indicated by the in-panel annotation. Figure D14 in this appendix shows the analogous results for subhalos in the publicly available Rockstar catalogs in which upid = −1. Finally, Figure D15 shows the analogous results for samples of subhalos that were either merged or disrupted at some point prior to z = 0. These subhalos do not appear in the publicly available Rockstar catalogs, but their histories can be extracted from the publicly available merger trees using the extract orphan info.c program in the UniverseMachine source code 6 .

E. ASSEMBLY HISTORIES OF HALOS IN IllustrisTNG
In this appendix, we illustrate the results of calculations described in the main body of the paper, only here shown for halos in IllustrisTNG. When analyzing the IllustrisTNG merger trees, we compute M peak (t) using the total halo mass defined by the sum of dark matter, stars and gas. We remind the reader that halos and merger trees in IllustrisTNG were identified by SUBFIND and SUBLINK, respectively, which are entirely different algorithms than the Rockstar and Consistent Trees codes used on the gravity-only simulations studied in this paper. Figure E16 is analogous to Fig. 3 in the main body of the paper, but for the case of present-day subhalos in IllustrisTNG. Figure E17 demonstrates that the residual errors in approximating halo growth are uncorrelated with the large-scale density field in IllustrisTNG. For IllustrisTNG, we have also repeated the halo population model calibration exercise described in detail in Appendix C. Making no changes to the form of any model components, and varying the same set of parameters, we simply optimized the model according to a different set of target data. Figures E18-E19 show that our model for halo population growth achieves a comparable level of accuracy for IllustrisTNG as for gravity-only simulations.
This paper was built using the Open Journal of Astrophysics L A T E X template. The OJA is a journal which provides fast and easy peer review for new papers in the astro-ph section of the arXiv, making the reviewing process simpler for authors and referees alike. Learn more at http://astro.theoj.org.