The local vertical density distribution of ultracool dwarfs M7 to L2.5 and their luminosity function

We investigate the form of the local vertical density profile of the stars in the Galactic disk, close to the Galactic plane. We use a homogeneous sample of 34000 ultracool dwarfs M7 to L2.5 that all lie within 350 pc of the plane. We fit a profile of the form sech$^\alpha$, where $\alpha=2$ is the theoretically preferred isothermal profile and $\alpha=0$ is the exponential function. Larger values of $\alpha$ correspond to greater flattening of the profile towards the plane. We employ a likelihood analysis that accounts in a direct way for unresolved binaries in the sample, as well as for the spread in absolute magnitude $M_J$ within each spectral sub-type (Malmquist bias). We measure $\alpha=0.29^{+0.12}_{-0.13}$. The $\alpha=1$ (sech) and flatter profiles are ruled out at high confidence for this sample, while $\alpha=0$ (exponential) is included in the 95% credible interval. Any flattening relative to exponential is modest, and is confined to within 50 pc of the plane. The measured value of $\alpha$ is consistent with the results of the recent analysis by Xiang et al. Our value for $\alpha$ is also similar to that determined for nearby spiral galaxies by de Grijs et al., measured from photometry of galaxies viewed edge on. The measured profile allows an accurate determination of the local space density of ultracool dwarfs M7 to L2.5, and we use this to make a new determination of the luminosity function at the bottom of the main sequence. Our results for the luminosity function are a factor two to three lower than the recent measurement by Bardalez Gagliuffi et al., that uses stars in the local 25 pc radius bubble, but agree well with the older study by Cruz et al.


INTRODUCTION
The variation of the space density of stars in the disk of the Milky Way, in the vertical direction, i.e. perpendicular to the plane of the disk, and at the solar radius, approximates to an exponential distribution (Gilmore and Reid 1983) up to heights of 1 kpc. What happens close to the plane? Is there a sharp density peak, or does the exponential soften? We do not have a clear answer to this question for the Milky Way, but the density profile is often modelled by a sech 2 distribution (e.g. Gould et al. 1996;Siegel et al. 2002;Ferguson et al. 2017;Bennett and Bovy 2019), which softens by a factor four relative to an exponential. A self-gravitating isothermal sheet has this equilibrium solution (Spitzer 1942;Camm 1950;van der Kruit and Searle 1981), and this may be why the sech 2 distribution is popular, even though it is well known that the velocity dispersion of the stars in the disk depends on age.
In considering this question a useful flexible functional form for the density distribution as a function of height z from the plane, is the generalised sech distribution proposed by Van der Kruit (1988): ρ(z) = 2 −2/n ρ e sech 2/n (nz/2z e ) , where z e is a scale height. With this parameterisation, the exponential, sech, and sech 2 distributions correspond to n = ∞, 2, and 1 respectively. For data analysis this representation is unsatisfactory, because we want to constrain the value of the parameter n, but it has an infinite range, causing difficulty in defining the prior. For this reason we prefer the form used by Dobbie and Warren (2020), who substitute α = 2/n, so the function becomes: ρ(z) = 2 −α ρ e sech α (z/αz e ) , and now the exponential, sech, and sech 2 distributions correspond to α = 0, 1, 2. In this form, with different values of α, the functions all have the same density at large values of |z| where they each asymptote to the exponential distribution with scale height z e . The term 2 −α is therefore the degree of softening in the centre relative to the exponential distribution. This shows that the sech 2 distribution softens by a factor four, as quoted above. Example functions, with ρ e = 1, are plotted in Fig. 1 for values of α = 0, 0.5, 1, 2, (top to bottom, respectively) and for a scale height z e = 300 pc, which is the canonical value for the Milky Way (e.g. Gilmore and Reid 1983;Jurić et al. 2008;Bochanski et al. 2010;Chang et al. 2011).
Beyond a few scale heights there is an excess in the tail, requiring a second population, of larger scale height. However in the current paper we are only concerned with the density distribution close to the plane, at heights |z| < 350 pc, so we will assume a single population.
In the remainder of this paper we will use the following equivalent parameterisation of the density function: ρ(z) = ρ 0 sech α (z/αz e ) = ρ 0 sech α ((z + z )/αz e ) . (3) Here ρ 0 is the density in the Galactic plane, z is the height of the Sun above the Galactic plane, and z = z − z is the vertical height measured from the Sun. Because our analysis is confined to distances < 350 pc we ignore any effect of variation of the stellar density with Galactic radius, because the scale length for this is so large, ∼ 3 kpc. -Density profiles, according to equation 2, with ρe = 1.0 and ze = 300 pc. From top to bottom, the curves are: exponential, sech 0.5 , sech, sech 2 , which are α = 0, 0.5, 1, 2 respectively. Dobbie and Warren (2020) summarise the status of measurements of α in external galaxies and in the Milky Way. The best study of α in external galaxies is the analysis by de Grijs et al. (1997) who measured the surface brightness profiles of edge-on spiral galaxies in the K band, to minimise the effects of extinction. From a sample of 24 galaxies they found a distribution of values of α = 0.5±0.2 (corrected for seeing and extinction), and they argue that the true value may be even lower due to a bias because the galaxies are not viewed perfectly edge on.
In the Milky Way itself there have been very few quantitative studies that contribute to this question. Using observations in the near-infrared Hammersley et al. (1999) state that "Analysis of one relatively isolated cut through an arm near longitude 65 degrees categorically precludes any possibility of a sech 2 stellar density function for the disc." In a footnote Jurić et al. (2008) state that the exponential profile provides a better fit than sech 2 close to the plane. Using Gaia DR1 Bovy (2017) draws a different conclusion. He measured the vertical density distribution separately for different spectral types A to K, and states "All vertical profiles are well represented by sech 2 profiles, with scale heights ranging from ∼ 50 pc for A stars to ∼ 150 pc for G and K dwarfs and giants". However, he does not fit α as a free parameter and measure the uncertainty, so it is unclear what 'well represented' here means. Furthermore for the latertype stars the profile fits are not compelling. It is noteworthy that the measured scale heights are much smaller than the canonical value for the thin disk of 300 pc, even for the later-type stars for which one might expect agreement.
The most detailed information on the vertical density distribution is provided by the recent study by Xiang et al. (2018) using LAMOST spectroscopic observations. They are able to divide stars into several age bins. The measured values of α = 2/n 1 (their Table 3b) show sig-nificant differences between age bins, both up and down, but averaging over several bins one can see that the younger populations, ages < 8 Gyr, have smaller scale heights and average α ∼ 1, while the older populations, ages > 8 Gyr, have larger scale heights and α ∼ 0. Combining all ages together the best fit value is also α ∼ 0. The selection function for this survey is exceedingly complex, and the estimation of α was not a primary aim of the project. The significant variations in α between different age bins may indicate that the uncertainties have been underestimated, which is why we have not quoted uncertainties here. Nevertheless the overall trend of α decreasing with age seems clear and this is the first time that this has been shown.
In their own study, Dobbie and Warren (2020) used the large samples of K and M stars from SDSS collated by Ferguson et al. (2017) to study the problem. They found that there is moderate evidence (specifically meaning 2 < ln B < 5, where B is the Bayes factor) for the exponential and sech models over the sech 2 model, but concluded that a sample that reaches closer to the Galactic plane is needed. This is in fact the problem with the majority of samples of the vertical structure of the Galactic disk, that they sample a conical volume, with the Sun at the apex, so the space density at small heights |z| < 300 pc is not sufficiently well sampled, if at all. This may be compounded by the problem that the images of nearby stars are saturated.
The results on α of Xiang et al. (2018) and Dobbie and Warren (2020) are not in agreement with the finding of Bovy (2017) that α ∼ 2. As with the study of Xiang et al. (2018), the selection function for the sample of Bovy (2017) is rather complex. This makes it very difficult to investigate the origin of the disagreement. These results motivate a new measurement of α using a survey with good sampling of the local volume, distances < 500 pc, with a simple selection function, and ideally selected at near-infrared wavelengths to minimise extinction. In this paper we analyse such a sample: we combine 32 942 M7 to M9.5 dwarfs from Ahmed and Warren (2019) with 1 016 L0 to L2.5 dwarfs from Skrzypek et al. (2016), all selected from UKIDSS, to measure an accurate value of α. This in turn provides a measurement of the space density of each spectral type in the plane of the disk, which can be transformed to the luminosity function at the bottom of the main sequence.
The layout of the remainder of the paper is as follows. In Sect. 2 we describe the samples used in the analysis. In Sect. 3 we analyse the vertical density distribution using firstly a binned estimate, and then a maximumlikelihood fit using every star individually. We provide a discussion of these results on the vertical density distribution in Sect. 4. In Sect. 5 with use the results to determine the stellar luminosity function over the spectral range M7 to L2.5. We provide a summary of the main points in Sect. 6. All magnitudes in this paper are on the Vega system. The J band refers to the MKO J passband (Tokunaga et al. 2002), unless specifically stated otherwise.

SAMPLES
By combining the samples of Ahmed and Warren (2019) and Skrzypek et al. (2016) we create a large homogeneous sample of 33 958 M7 to L2.5 dwarfs 13.0 < -Distribution of distances for the sample of 33 985 M7 to L2.5 dwarfs, where for this plot sources are treated as single (unresolved binaries are ignored). Objects have been plotted at a random horizontal coordinate within each spectral-type box. For each spectral type the upper and lower distance limits correspond to the sample magnitude limits 13.0 < J < 17.5. The sample is complete over all spectral types between distances d min (M7) and dmax(L2.5), which is the range 41.3 < d < 109.5 pc. The thin horizontal line is drawn at a distance of 40 pc, to make clear the relative contributions of the different spectral sub-types to measuring the space density at very small distances. J < 17.5, at distances < 350 pc (except for unresolved binaries as explained below). The hydrogen burning limit is believed to be reached after spectral type L2.5 (Dieterich et al. 2014), meaning that L3 dwarfs and later are brown dwarfs, while L2.5 dwarfs and earlier are predominantly main sequence stars, but can also include young brown dwarfs. In the field the proportion of young brown dwarfs is small, and therefore the sample of M7 to L2.5 dwarfs is representative of the bottom of the main sequence. The numbers of dwarfs of different spectral types are listed in Table 1. As detailed in the above catalogue papers these samples are well suited to measuring the density profile and the luminosity function. The samples are highly complete, and the spectral classifications are unbiased except for rare peculiar blue or red sources, comprising an estimated ∼ 1% of the sample.
The two samples cover the same area of sky and were selected by essentially identical methods using the phototype method of Skrzypek et al. (2015). The method classifies objects using multicolour photometry. For the L dwarfs the bands izY JHKW 1W 2 were used, while for the M dwarfs the W1 and W2 bands were omitted. They add no significant useful information for these types. The accuracy of the spectral types is competitive with spectroscopy. For the M dwarfs the classification is accurate to better than 0.5 sub-types rms, and is tied to the optical spectroscopy of the BOSS sample of Schmidt et al. (2015). For the L dwarfs the classification is accurate to one sub-type rms, and is anchored to the optical system of Kirkpatrick et al. (1999). The sample is presented in Fig. 2, where each star has been plotted at the distance computed assuming that the object is single (ignoring unresolved binaries). Distances are computed based on the absolute magnitudes listed for each spectral type in Table 1. The selected absolute magnitudes are discussed extensively below, in this section. The sample is plotted in a different way in Fig. 3, as a histogram showing heights from the Galactic plane, assuming the Sun lies at a height of 10 pc above the plane. This plot illustrates the fact that the sample includes a large number of objects at heights |z| < 100 pc, and therefore is well suited to investigating the softening close to the Galactic plane. The survey covers an effective area of 3 031 deg 2 (7.3% of the sky), and the solid angle as a function of Galactic latitude Ω(b) is provided in Table 1 of Ahmed and Warren (2019), in wedges of angular extent 1 • . The sample defined in this way has excluded from the slightly larger total sample two small areas of larger reddening, totalling 39 deg 2 .
The photometry for this sample is very precise and extinction is very low. In the J band the median photometric uncertainty is 0.016, and the 90% quantile is 0.028. We quantify the extinction using the results from Green et al. (2018), for a distance of 400 pc. This is an overestimate of the effects of dust on the sample, but the method of Green et al. (2018) becomes too inaccurate at smaller distances to be useful. On this basis in the J band the median extinction is < 0.02 mag. and for 90% of sources the extinction is < 0.06 mag. For only 0.3% of the sources is the extinction at 400 pc greater than 0.13 mag. Therefore we make no corrections for extinction.
Laithwaite and Warren (2020) have made a detailed study of the unresolved binaries in the Ahmed and Warren (2019) sample of M7 to M9.5 dwarfs. They find that unresolved binaries comprise 16.2% of all the systems, and that the binaries are almost exclusively equal mass systems. This means that we can assume that per unit volume 16.2% of the sources are twice as bright as single stars. We will assume that the same properties apply to the L0 to L2.5 dwarfs. In this respect we note that Reid et al. (2008) find a similar fraction of unresolved binaries in their sample of L dwarfs.
Laithwaite and Warren (2020) also redetermined values of the absolute magnitude M J for spectral types M7 to M9.5, based on Gaia DR2 parallaxes for 2706 systems, with typical parallax/error of 30. 1 The results therefore are very accurate, but the values are on average some 0.5 mag. brighter than those of Dupuy and Liu (2012) for these spectral types (although the two relations converge at L0). The new values are listed in Table 1. The distances quoted in Ahmed and Warren (2019) used the absolute magnitudes of Dupuy and Liu (2012) and are therefore wrong.
The large difference in M J for spectral types M7 to M9.5 compared to Dupuy and Liu (2012) requires further comment. The matter was considered in detail by Laithwaite and Warren (2020), in their Section 6.1. The reason for the discrepancy is not due to incorrect parallaxes in Dupuy and Liu (2012) but is apparently due to differences in spectral classifications between the two samples. The sample of Ahmed and Warren (2019) is homogeneous and the classifications are accurately cal-1 We have checked these results using the more recent Gaia EDR3 release, finding essentially identical values.
ibrated to the classifications of the BOSS spectroscopic sample of Schmidt et al. (2015), which itself is homogeneous and was subject to careful checks for systematics. Laithwaite and Warren (2020) found 10 stars in common between Schmidt et al. (2015) and Dupuy and Liu (2012), and the classifications of Schmidt et al. (2015) are on average 0.7 spectral types later.
Some additional evidence that the Schmidt et al. (2015) M7 to M9 classifications are later than older classifications comes from the recent CARMENES paper by Cifuentes et al. (2020). Although there are no CARMENES M7 to M9 stars in common with Schmidt et al. (2015), it is evident that over this spectral range the G − J 2M ASS colours in Cifuentes et al. (2020) are substantially redder than the G−J M KO colours in Laithwaite and Warren (2020). Over this spectral range a star has J 2M ASS − J MKO ∼ 0.1 mag. (Stephens and Leggett 2004). Using this, their average colours for M7 and M8 are G − J M KO = 4.15, 4.42 respectively. Laithwaite and Warren (2020) measure G − J M KO = 4.15, 4.47 for spectral types M8 and M9, respectively, indicating an offset by approximately one spectral type. The absolute magnitude scale of Cifuentes et al. (2020) is also quite close to that of Dupuy and Liu (2012).
To summarise, there is now good evidence that the differences in absolute magnitude between the relation of Laithwaite and Warren (2020) and the other two samples are due to differences in spectral classifications between the BOSS sample of Schmidt et al. (2015) and older samples. The advantage of using the M7 to M9.5 sample of Ahmed and Warren (2019) is that it is large and homogeneous, and the relation between M J and spectral type has been accurately calibrated with Gaia with a large sample of 2706 systems. It would of course be incorrect to apply the relation between M J and spectral type derived from a different sample to the sample of Ahmed and Warren (2019). The sample of Schmidt et al. (2015), to which the spectral types of Ahmed and Warren (2019) are calibrated, has become the de facto standard in this field, so this apparent offset in spectral classifications compared to older samples needs to be borne in mind when comparing science results derived from different samples.
For the L0 to L2.5 stars the absolute magnitudes in Table 1 were calculated using the polynomial relation between M J and spectral type of Dupuy and Liu (2012), derived from ground-based parallaxes. We checked these values by first matching to GAIA DR2 all the L0 to L3 dwarfs in the sample of Skrzypek et al. (2016), then limiting to sources with parallax/error> 10. We then fit a linear relation between absolute magnitude and spectral type, allowing for binaries by fitting a double Gaussian profile to the distribution of absolute magnitudes at fixed spectral type. The method is very similar to that employed by Laithwaite and Warren (2020) for the M7 to M9.5 stars, except they used G − J colour rather than spectral type. The result for single stars is the linear relation M J = 0.359 SpT+7.882 where SpT denotes spectral type and L0, L3 are 10, 13 2 . Comparing against the values in Table 1 we find agreement at the level |∆M J | < 0.1 for all sub-types L0 to L2.5, confirming that the absolute magnitudes of Dupuy and Liu (2012) are reliable over this spectral range.
A final point to consider is the possibility of bias in the distances, due to the application of spectroscopic parallaxes i.e. a single absolute magnitude for all the stars of a particular spectral type. This could lead to biases in the derived structural parameters i.e. the scale height z e and the value of α. We present an analysis of this point in the Appendix, finding that any systematic errors introduced are significantly smaller than the random errors on parameters.

FITTING THE VERTICAL DENSITY DISTRIBUTION
It is usual to measure the density distribution by first binning the data and then fitting to the counts. This is useful because the binned counts give a visual impression of the shape of the variation in density. There is an important drawback to this approach however, in that it is unclear how to deal with the unresolved binaries: in Fig. 2 the binaries (which cannot be identified individually) should be plotted at a distance a factor √ 2 larger. Here we first present a binned analysis of the M7 and M7.5 stars only, which includes a large fraction of all the stars, and we then provide results of an optimal method that fits to all the data points simultaneously, without binning, and correctly accounts for binaries.

Binned analysis
In this section we simply ignore the fact that a fraction of the sources are unresolved binaries, and treat all the sources as single. The results are illustrative and used as a guide to the more complete analysis presented in the next subsection. Referring to Fig. 2 the lower and upper distance limits for each spectral type correspond to the magnitude limits of the survey J = 13.0 and 17.5, given the absolute magnitude for any particular spectral type. Therefore we can form a volume-complete sample of M7 and M7.5 stars by using distance limits d min (M7), and d max (M7.5), which are 41.3 pc and 287.1 pc respectively. The sample comprises 20 849 stars and is plotted in Fig. 4 using polar coordinates. The blue histogram plots the solid angle of the survey at each value of b, in 1 • wedges. Therefore to compute the space density we sum the number of sources in each slice, of height 10 pc, and sum the volume contributed by each wedge along the slice, accounting for the solid angle as it varies with b. For this calculation we assume the Sun lies at a height z = 10 pc above the Galactic plane.
The results of this calculation are plotted in Fig. 5. The blue points are the binned estimates of space density at heights above the plane, and the red points are the same for below the plane. The uncertainty on each point is plotted as a fractional uncertainty of 1/ √ N , where N is the number of points in the slice, and we have only used bins with > 20 points. The grey points are the blue points reflected about the Galactic plane, and allow a comparison of the variation in space density above and below the plane. There are no strong differences between the two curves, indicating consistency. It is well known that at larger distances from the plane differences are seen when comparing measurements above and below the plane (e.g. Widrow et al. 2012;Ferguson et al. 2017). It is possible that density fluctuations exist at a similar level in our data but they would be relatively less important   close to the Galactic plane where the space densities are higher. Given the good agreement between the red and the grey points we are justified in averaging the results for above and below the plane. The averaged results are presented in Fig. 6.
In fitting a model to the data, we assume Gaussian errors for each bin i.e. we approximate the Poisson distribution as Gaussian. Then if the model predicts m i points in bin i, and the observed number is n i , the logarithm of the likelihood is given by: which is Eqn 8 in Dobbie and Warren (2020). Because the likelihood is strongly peaked in the space of the parameters, the posterior is not sensitive to the form of the priors. We adopt broad uniform priors, which for the shape parameter covers the range 0 < α < 3. The function sech α (z/αz e ) can be awkward 3 to evaluate for small values of α, so we employ the identity and ensure that when α = 0 the function is set equal to e −x . Fitting Eqn 3, we measure α = 0.23 ± 0.13, and z e = 222 +8 −10 pc. This quantifies that the density profile is consistent with exponential all the way to the Galactic plane and that any flattening is modest. 4 The best-fit profile is plotted in Fig. 6, where it is compared to the best-fit exponential, which has a scale height z e = 227 +7 −6 pc. This indicates that any softening is confined to within 50 pc of the plane. The data are inconsistent with not only the sech 2 profile, but also the sech profile. Interestingly fitting a sech 2 profile, which is a bad fit, yields a scale height of 126 pc which makes no sense when compared to the fiducial scale height of the thin disk of 300 pc. This calculation shows that force-fitting the wrong profile, one that flattens too much in the centre, leads to an underestimate of the correct scale height. The problem of the sech 2 profile may be illustrated in another way. Because we have not accounted for binaries the binned analysis underestimates the true scale height somewhat. If we suppose that the scale height, uncorrected for binaries, could be as small as z e = 200 pc, and fit a sech 2 profile with this scale height the result is the very flattened and clearly incorrect profile plotted as the dotted line in Fig.  6.
This binned analysis indicates that the profile is peaky near the centre, but the values of the fitted parameters are not correct because we have ignored the presence of unresolved binaries. It is well know that the presence of unresolved binaries in a sample causes the scale height to be underestimated, and conceivably it could affect the estimate of α as well. The effect of unresolved binaries is usually quantified by modelling (e.g. Covey et al. 2008;Jurić et al. 2008;Bochanski et al. 2010). For example in the current case a synthetic catalogue would be created, one that includes unresolved binaries and that matches the selection criteria of the actual catalogue. The catalogue would be created with a scale height that is a guess for the true scale height and would be analysed in the same way as the actual sample. The catalogue scale height would then be adjusted until the measured (biased) scale height matches the measured value in the actual catalogue. In this way the true scale height is recovered.
In the next subsection we describe a likelihood analysis that instead accounts for unresolved binaries in a direct way, and makes additional improvements over the binned analysis presented in this section.

Method
We now implement four improvements compared to the above binned analysis. These are: 1. Using stars individually without binning.
2. Accounting correctly for the presence of unresolved binaries.
3. Accounting for the intrinsic spread in absolute magnitude M J of each spectral type, which in a magnitude-limited sample means that intrinsically brighter sources are over-represented -the Malmquist bias.
4. Including all stars in the sample from M7 to L2.5 over the full distance range of each type (displayed in Fig. 2).
We wish to calculate the likelihood L of observing the sample in question. In deriving this we will assume a Poisson point process, and follow a similar procedure to that presented by Marshall et al. (1983). Consider firstly a single spectral sub-type t. The data comprise the list of sources of observed Galactic latitude b and apparent magnitude J. Ignoring binaries, the expected number of sources µ in an infinitesimal element dbdJ is: Here, and throughout this section, d(J) refers to the distance of a single star computed from J using the absolute magnitude for the particular sub-type, listed in Table 1. The height above the plane is calculated as The angle b is in degrees, since Ω(b) is the solid angle defined in wedges of angular size 1 • (see Sect. 2). The term ρ(z(d(J), b, z )) should be understood to include the dependence on the model parameters ρ 0 (t), z e and α.
The probability of finding one source in the element is µe −µ , and of finding none is e −µ . Therefore the likelihood is the product of the probabilities of observing the N sources with their particular b, J, and of observing no sources in all the other elements. Consequently the likelihood for this type t may be written: where the first product is over elements containing sources, and the second product is over all the other elements within the volume surveyed. Taking the logarithm and dropping terms that are independent of the model parameters we obtain: and the volume integral is the expected number of sources in the survey, given the density function ρ, which depends on ρ 0 (t), z e , α and z . The likelihood as defined above uses all the stars of a particular sub-type individually, and deals with the first item above. We now consider the treatment of binaries. It is relatively easy to treat the binaries correctly through the likelihood because we can assume that each binary is exactly twice as bright as a single, based on the detailed study of binaries in this sample by Laithwaite and Warren (2020). This means that the survey is in fact two surveys, one for singles, and a second for binaries, where the distance limits for the binary survey are √ 2 larger than for the singles survey. For a total number of systems comprising a fraction f b of unresolved equal-mass binaries, and so a fraction 1 − f b singles, if the space density of stars at any point is ρ, the space density of single stars is ρ(1 − f b )/(1 + f b ) = ρr s , and the space density of binary systems ρf b /(1 + f b ) = ρr b . The binary systems are unresolved sources that are twice as bright as single stars, and we assume f b = 0.162 (Laithwaite and Warren 2020).
Returning now to equation 6, in computing the expected number of sources in an element dbdJ we must include the expected number of binary systems, and the equation becomes: where the term 2 √ 2 in front of r b derives from the larger volume element at the larger distance (from the terms d and dd/dJ). Propagating through to the logarithm of the likelihood, we obtain the final expression where the first triple integral is over the volume occupied by singles, given the sample magnitude limits, and the second volume integral is the same for the binary systems, for which all distances are √ 2 larger. In this way binaries are correctly included in the calculation of the likelihood.
We now consider how to treat Malmquist bias. Our sample is magnitude limited 13.0 < J < 17.5. But (single or binary) stars of a particular spectral type have a spread in absolute magnitude (due to e.g. variations in age and/or metallicity). Therefore the more luminous sources are detected to larger distances, and are overrepresented in the sample, and vice versa for less luminous sources. If stars of a particular sub-type are treated as having a unique absolute magnitude the measured parameters of the density distribution will be biased, and this is what we mean when using the term Malmquist bias in this paper. Laithwaite and Warren (2020) found a Gaussian distribution of absolute magnitude of dispersion σ M = 0.21 mag. at fixed G − J colour. Over the colour spread of half a spectral sub-type the dispersion increases to σ M = 0.24 mag. which is the value we adopt. The additional dispersion comes from the relation between absolute magnitude and colour.
Eqn 10 shows us how to deal with the spread in the absolute magnitudes M J of each spectral type. In Eqn 10 we are dealing with two populations, singles and binaries, where the binaries are twice as bright and occupy a different volume to the singles, where all distances are √ 2 larger. In the same way each of these two populations comprises a set of sub-populations of different absolute magnitude, the more luminous sources occupying a volume where all distances are multiplied by a factor f > 1, compared to the average, and the less luminous sources occupying a volume where all distances are multiplied by their own factor f < 1. To implement this we divide each population (singles or binaries) into a small number of sub-populations, i.e. we model the Gaussian distribution of M J as a coarse histogram. Each sub-population of absolute magnitude M J − ∆M J has a distance correction f = 10 0.2∆M J and a volume correction f 3 , analogous to the √ 2 and 2 √ 2 terms in the second term in Eqn 10. For a set of subpopulations defined by weights w j ( j w j = 1) and distance corrections f j , then, for ex- There is a similar sum for the second term, and then a set of volume integrals for all the sub-populations, single as well as binary, over the relevant volume occupied by each subpopulation.
In principle photometric errors can have an effect that is similar to the effect of the spread in absolute magnitudes, but this can be safely ignored for this dataset as the photometric errors in the J band (Sect. 2) are considerably smaller than the dispersion in absolute magnitude.
The final improvement we make ensures that all the stars in the full sample are used, over the full distance range of each spectral sub-type (see Fig. 2), rather than limiting to the distance range in common, as we did in the binned analysis for the M7 and M7.5s. This can be achieved straightforwardly by assuming that the density function has the same form for each spectral sub-type, meaning that the parameters z e , α, z are in common, but the normalisations are different, i.e. the central space density of each spectral type is a free parameter ρ 0 (t). Then the likelihood is the sum of the individual likelhoods for each sub-type, ln L = t ln L t , where the individual likelihoods are computed over the full sample and full volume for that sub-type. This means that the total number of free parameters is 15: the 12 ρ 0 (t), and z e , α, z . To be completely clear: ρ 0 (t) is the summed number of stars (not systems) per unit volume for a particular sub-type.
We adopt broad uniform priors on the parameters. Again, because the likelihoods are sharply peaked, the results are insensitive to the priors.

Results
We have fit the function sech α , as well as the simpler exponential function. We used the MCMC package emcee (Foreman-Mackey et al. 2013) to maximise the likelihood, and to measure the uncertainties. The results for both functions are summarised in Table 2, in each case accounting for binaries and Malmquist bias in the analysis. The uncertainties quoted correspond to the 16, 50, 84% quantiles in the marginilised distributions, and for all the ρ 0 (t) the fractional uncertainties are larger than 1/ √ N . The uncertainties on the ρ 0 (t) parameters are considerably larger for the sech α function compared to the exponential function because they include the uncertainty in the flattening towards the plane. We use the sech α results when comparing against local measurements of space densities, and in calculating the luminosity function. The exponential function fit is included because of its simplicity, and it can be used for comparison against other surveys except close to the plane |z| < 50 pc.
The uncertainties on the ρ 0 (t) parameters are highly correlated. Therefore when we perform arithmetic on the space densities (e.g. the summed space density of spectral types M7-M9.5 listed in Table 2, and later the calculation of the luminosity function), to measure the uncertainties we first perform the arithmetic on the MCMC chains and then measure the dispersion in the resulting values.
A corner plot for the parameters ρ 0 for the M7s, z e , z , and α is presented in Fig. 7, produced using the GetDist package (Lewis 2019). We have included only one ρ 0 (t) in this plot as the correlations for ρ 0 (t) of the other spectral sub-types have a similar form.
The most interesting result, and the principal result of this paper, is the distribution for α. As listed in Table 2 we measure α = 0.29 +0.12 −0.13 . Although the α distribution is strongly peaked near 0.3, there is a shoulder to the distribution that extends to α = 0. This shoulder reflects the fact that the data have almost no constraining power over the range 0 < α < 0.1 because within this range the density distribution varies only very close to the plane |z| < 20 pc, where there are very few sources, so the posterior is quite flat over this range. Quantifying the credible interval by the integrated probability within a range between equal probability densities, we find the 95% and 99% credible intervals are 0 < α < 0.50 and 0 < α < 0.59, meaning that the sech profile, α = 1, is firmly excluded. We wish to quantify at what level the credible interval includes the exponential model. This is ambiguous because the posterior density rises slightly as α approaches zero (Fig. 7). A useful measure is to state the credible interval at which the range becomes one sided, i.e. once the full range of α from the peak down to α = 0 is included, and this is the 95% interval. This means there is moderate evidence against the exponential model continuing all the way to the Galactic plane, or equivalently moderate evidence for some degree of flattening close to the plane. We compare the exponential and sech α fits in Fig. 8, plotting the summed density for the full spectral range M7 to L2.5. The two curves are essentially identical except at heights |z| < 50 pc. Any softening of the exponential profile is rather slight.
We can quantify the effects of the different improvements implemented in the full likelihood analysis. For the binned data we had the pair of results for the scale height z e and the shape parameter α of [222, 0.23]. Implementing successively a) treating all points individually rather than binned, and over the full distance ranges for each sub-type, b) accounting for binaries, c) accounting  . We see that accounting correctly for binaries has a substantial effect. Without allowance for binaries the scale height is underestimated by 10%. The effect of Malmquist bias is considerably smaller. If not accounted for, the scale height is underestimated by 3%. We believe this is the first time that the corrections for binaries and for Malmquist bias have been made in this direct way, as opposed to using mock catalogues.
The sign and the size of these biases are in very good agreement with the results of Jurić et al. (2008) for the SDSS, computed using mock catalogues. For example they found (we find) that for a binary fraction f b = 0.25 (0.162) the scale height is underestimated by 15% (10%). For Malmquist bias they found (we find) that for σ M = 0.30 (0.24) the scale height is underestimated by ∼ 5% (3%).
Using each source individually rather than binning the data ensures that the data are used optimally. However, this has a large cost in terms of the computational time required for the fit, and the gain is actually probably rather modest. For much larger sample sizes than used here the computational cost could be prohibitive. In such cases a compromise is possible. The key would be to bin the data in J and b, rather than in z, for each spectral sub-type. Then it would still be possible to implement a likelihood treatment for binaries and Malmquist bias analogous to the one employed here.

DISCUSSION OF THE MEASURED DENSITY PROFILE
The measured value of α = 0.29 +0.12 −0.13 for this population of stars corroborates the finding of Xiang et al. (2018) that α ∼ 0 for the thin disk, summing all ages together. The result is also in good agreement with the measurement by de Grijs et al. (1997) of the distribution of α in nearby edge-on spiral galaxies of α = 0.5 ± 0.2. Since these latter measurements were made in the K band the light would be dominated by cooler stars. 5 Bovy (2017) found that the vertical profiles of A to K stars are 'well represented by sech 2 profiles', and he measured scale heights increasing from ∼ 50 pc for (younger) A stars to ∼ 150 pc for (older) G and K dwarfs. The latter value is much smaller than the canonical value for the thin disk of 300 pc which is hard to understand. However in contrast to the A stars, the vertical profiles of G and K dwarfs are not well sampled by the GAIA DR1 data. It may be possible to reconcile all these results in the following way. The results of Xiang et al. (2018) indicate that α is larger for young populations, so the profile for A stars might be satisfactorily fit by a sech 2 profile. However the G and K samples will be dominated by older stars so for these populations one would expect a value of α similar to our measurement of 0.29. If this is true, fitting the sech 2 profile to this steeper profile will result in a substantial underestimate of the scale height, as we found in Sect. 3.1 (the same effect is also visible as the anti-correlation between z e and α in Fig. 7). This might help explain the small scale heights measured by Bovy (2017) for G and K dwarfs.
The measured value of α is interesting from a theoretical perspective because it is inconsistent with the value α = 2 predicted for an isothermal distribution. Banerjee and Jog (2007) have argued that a steeper value would be expected as a consequence of the constraining effect of the mass in the thin gaseous disk.
The measured scale height z e = 259 pc for the sech α profile, or z e = 269 pc for the exponential profile, is broadly in line with previous measurements for the thin  ski et al. (2010) who measured a scale height of 300±15 pc from a large sample of early and mid M dwarfs, accounting for binaries and Malmquist bias. This is satisfactory agreement given the much larger distances sampled by their survey, the different analysis method, and the uncertainty in the photometric parallaxes used by them.

disk. A useful comparison is against the result of Bochan
The measured height of the Sun above the plane of 10.9 +1.7 −1.6 pc also deserves comment. There have been several measurements of this quantity. One of the most recent and most detailed is the study of Bennett and Bovy (2019) who find a height 20.8 ± 0.3 pc. They emphasise the influence of asymmetries in the vertical density distribution, and they define the Galactic plane as the centre of the symmetric part of the density profile. The main asymmetry manifests itself at heights of 500 pc, beyond the limit of our survey. Since our data look symmetric we might expect the two results to agree somewhat better. Nevertheless as shown in Sec. 3.1, a difference at this level has only a small effect on the estimate of α.

THE LUMINOSITY FUNCTION
The sech α fits produce an estimated space density at the Galactic plane for each spectral type, Table 2. We can now compare against measurements of the local space density, in particular the measurements by Bardalez Gagliuffi et al. (2019) who have made a comprehensive census of ultracool dwarfs at distances < 25 pc. Referring to Fig. 8, with the Sun located 10 pc above the plane, the space density in the local bubble of radius 25 pc will be almost identical to the value at the mid plane.
We compare our measurements of the local space density with those of Bardalez Gagliuffi et al. (2019) in Table  3 and Fig. 9. Because Bardalez Gagliuffi et al. (2019) use a full spectral sub-type, for e.g. M7 we have combined our results for M7 and M7.5. 6 There is mostly fair  agreement between the two determinations, although the points for M7, L1, and L2 are not in statistical agreement (outside 2σ). For the M7 to M9 dwarfs, recall (Sect. 2) that Laithwaite and Warren (2020) noted an apparent discrepancy between spectral classifications in the homogeneous BOSS sample of Schmidt et al. (2015) and classifications collected from older literature. The difference is in the sense that older measurements found earlier spectral types than measured by Schmidt et al. (2015). If the classifications in Bardalez Gagliuffi et al. (2019) are systematically different to the BOSS classifications for M7 to M9 this would translate to differences in their measured space densities compared to ours.
The differences in space density at L1 and L2, where we measure values a factor two smaller than Bardalez Gagliuffi et al. (2019), are harder to understand. The uncertainties plotted on our sample are very small compared to theirs. Any systematic differences in the spectral classifications between samples would not necessarily translate into differences in the measured luminosity function, as type bin effectively covers M6.5 to M7.5. Adding two half sub-type bins effectively covers M6.75 to M7.75. long as absolute magnitudes have been determined correctly for each sample (this would not be the case if the same relation between spectral type and absolute magnitude were used for both samples). To calculate the luminosity function we consider the bins of size 0.5 mag listed in Table 4 and compute the space density in each bin. 7 It is important to allow correctly for the spread in absolute magnitude of each spectral sub-type, to include all subtypes that contribute to a bin (given the spread), and to ignore bins where sub-types not considered here would contribute significantly. The last is true for example for the bin 9.75 − 10.25, where M6.5 stars would contribute. We assume the absolute magnitudes of each of the twelve spectral sub-types are centred on the values listed in Table 1 and are Gaussian distributed with σ M = 0.24 (Sect. 3.2.1). We then integrate the space densities in the MCMC chains into the relevant absolute magnitude bins to compute the luminosity function and uncertainties. For the highest luminosity bin 10.25 − 10.75 any contribution from spectral type M6.5 will be negligible. For the lowest luminosity bin 11.75−12.25 we have added in a small contribution from L3 dwarfs of 0.02 pc −3 , estimated by assuming the space density of L3 dwarfs is the same as that of L2.5 dwarfs.
The results are listed in Table 4, where they are compared against the luminosity function results of Bardalez Gagliuffi et al. (2019). Again we find that the uncertainties quoted by Bardalez Gagliuffi et al. (2019) are smaller than a fractional uncertainty of 1/ √ N , and so the values quoted in Table 4 use 1/ √ N . Our values for the luminosity function are a factor of two to three lower than those of Bardalez Gagliuffi et al. (2019). Substantially lower space densities in each bin could have been anticipated, since the estimated space densities are lower for most spectral types, Fig. 9, and in addition the range of absolute magnitudes for what we call M7 to M9 is larger than their range, so the space density per 0.5 mag. bin is further lowered for our sample. The two estimates of the luminosity function are plotted in Fig. 10. Our results use the MKO J band whereas they use the 2MASS J band. As noted earlier, over this spectral range a star has J 2M ASS − J MKO ∼ 0.1 mag. (Stephens and Leggett 2004). Therefore we have shifted their points 0.1 mag to the left in Fig. 10 to represent their results on the MKO system.
In Fig. 9 we also plot the older results on the luminosity function from Cruz et al. (2007) (their Table 11

SUMMARY
The main points in this paper are the following: 1. We have used a homogeneous sample of 34 000 ultracool dwarfs of spectral type M7 to L2.5, all at distances < 350 pc, to measure the local vertical density distribution of stars in the disk of the Milky Way. The sample was selected in the J band and benefits from high photometric precision and low extinction. We have developed a likelihood analysis that uses all the stars in the sample optimally, accounting directly for the proportion of unresolved binaries in the sample, and treating Malmquist bias.
2. Fitting the function sech α to the density distribution as a function of height from the Galatic plane, we measure α = 0.29 +0.12 −0.13 . The exponential profile α = 0 is contained within the 95% credible interval. Any softening of the density distribution towards the plane relative to an exponential profile is modest. The flatter sech and sech 2 profiles are ruled out at high confidence.
3. Because of the good sampling of the peak of the density distribution the sample is useful for measuring the location of the Galactic plane for this population, and we find the Sun lies at a height 10.9 +1.7 −1.6 pc above the plane.
4. We have used the results of the fit of the density profile to measure the stellar luminosity function at the bottom of the main sequence over the absolute magnitude interval 10.25 < M J < 12.25. Our results for the luminosity function are a factor two to three lower than the measurements by Bardalez Gagliuffi et al. (2019) that uses stars in the local 25 pc radius bubble, but agree well with the older study of Cruz et al. (2007).
volume complete sample of stars with GAIA DR2 parallaxes, where the upper and lower distance limits were a function G−J colour. These distance limits ensured that the sample was representative of the multiplicity of the population. Then the relation between M J (for single stars) and G − J was determined by fitting the mode of the distribution, assuming a Gaussian distribution in M J at fixed colour, and a population of equal-mass binaries offset by 0.75 mag., with the same dispersion. Finally M J for each spectral type was selected as the value on this relation corresponding to the median colour for that spectral type, and the dispersion of M J for the population was also measured. The question then arises of how well this model represents the population or whether it could lead to systematic errors in distances because it is too simple. We can answer this question directly using the original dataset from which the absolute magnitudes were measured. Specifically there are 1737 sources classified M7 in the volume-complete sample of Laithwaite and Warren (2020). Using the double Gaussian model of singles and equal-mass binaries we can assign a probability any source is single or binary, and we select as singles the 1498 sources for which p(single) > 0.5. Each source has an accurate parallax, with typical parallax/error of 30. We characterise the (true) distribution of distances of this sample by the mean, dispersion, and skewness, measuring respectively µ = 123.8 pc, σ = 23.9 pc, and γ 1 = −0.15. Now we ask how accurately do the spectroscopic parallaxes represent this distribution. Estimating the distances based on the apparent magnitude J and the adopted M J , we measure µ = 126.0 pc, σ = 28.5 pc, and γ 1 = 0.05. The mean distance estimated using spectroscopic parallaxes is 1.7% larger than the correct value. This fractional difference is significantly smaller than the random uncertainty on the measured disk scale height (Table 2) and so may be disregarded.
The skewness is very small in both measurements. The standard deviation is larger for the estimated distances compared to the true distances. This is expected, because the estimated distances have a substantial uncertainty of 11%. We can check this by convolving the distribution of true distances with this Gaussian error distribution, and then we measure σ = 27.9 pc, in very good agreement. We conclude that the adopted value of M J and the Gaussian error distribution provide an accurate model for the distances of the population. The uncertainty in the distances smears out the true distribution of distances, but this smearing is fully accounted for in the fitting, and corrected for, by the procedure used to correct for Malmquist bias, explained in Sect. 3.2.1.