Large-scale power loss in ground-based CMB mapmaking

CMB mapmaking relies on a data model to solve for the sky map, and this process is vulnerable to bias if the data model cannot capture the full behavior of the signal. We demonstrate that this bias is not limited to small-scale effects in high-contrast regions of the sky, but can manifest as $\mathcal{O}(1)$ power loss on large scales in the map under conditions and assumptions realistic for ground-based CMB telescopes. This bias is invisible to simulation-based tests that do not explicitly model these errors, making it easy to miss. We identify two different mechanisms that both cause suppression of long-wavelength modes: sub-pixel errors and detector gain calibration mismatch. We show that the specific case of subpixel bias can be eliminated using bilinear pointing matrices, but also provide simple methods for testing for the presence of large-scale model error bias in general.


INTRODUCTION
CMB telescopes observe the sky by scanning their detectors across it while continuously reading off a series of samples from the detectors. Typically the signal-to-noise ratio of each sample is small, but by combining a large number of samples with knowledge of which direction the telescope was pointing at any time, it is possible to reconstruct an image of the sky. There are several ways of doing this, with the most common being maximum-likelihood, filter+bin and destriping. These all start by modeling the telescope data as (Tegmark 1997)

$d = Pm + n$ (1)

where d is the set of samples read off from the detectors (often called the time-ordered data), m is the set of pixels of the sky image we want to reconstruct, n is the noise in each sample (usually with significant correlations), and P is a pointing matrix that encodes how each sample responds to the pixels in the image. Given this data model, it is possible to either directly solve for an unbiased map (as in maximum-likelihood mapmaking or destriping), or to measure and correct for the bias in a biased estimator (as in filter+bin mapmaking). In the case of maximum-likelihood mapmaking, we solve for the least-squares solution

$m = (P^T N^{-1} P)^{-1} P^T N^{-1} d$ (2)

Linear regression like this is optimal and unbiased as long as only the dependent variable (d in this case) is noisy, but it is relatively well known to be subject to attenuation bias if the independent variables (P in this case) are also noisy (Draper & Smith 1998).

Fig. 1.- Preview of the model error bias we will discuss in the following sections. Despite the standard expectation that maximum-likelihood mapmaking is optimal and unbiased, the maximum-likelihood solution (right) for a simple toy example is strongly power-deficient on large scales compared to the input signal (left). As we shall see this bias is not unique to maximum-likelihood methods, and can be triggered by several subtle types of model error.

This is effectively what
happens when we have model errors: the P we assume deviates from the true behavior of the instrument $P_{true}$ by some perturbation which effectively acts as a noise source. Since P enters into the denominator $(P^T N^{-1} P)$ in the second power, this noise term is squared, resulting in a noise bias there. We therefore end up dividing by too much, leading to m being underestimated.
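Attenuation bias can be seen in a few lines of ordinary linear regression with a noisy design matrix. The following is an illustrative sketch with arbitrary assumed noise levels, not the mapmaking setup itself:

```python
import numpy as np

rng = np.random.default_rng(0)
nsamp = 100_000
slope = 2.0
x_true = rng.standard_normal(nsamp)                     # true independent variable
d = slope * x_true + 0.1 * rng.standard_normal(nsamp)   # noisy measurements

# Regressing against the true design matrix recovers the slope
fit_clean = (x_true @ d) / (x_true @ x_true)

# Regressing against a noisy design matrix biases the slope low, because
# the design-matrix noise is squared in the denominator x^T x
x_noisy = x_true + 0.5 * rng.standard_normal(nsamp)
fit_noisy = (x_noisy @ d) / (x_noisy @ x_noisy)
# Expect fit_clean ~ 2.0 but fit_noisy ~ 2.0 / (1 + 0.5**2) = 1.6
```

The same mechanism operates in equation 2, with the pointing matrix playing the role of the design matrix.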
Similar things happen in destriping and filter+bin mapmaking for much the same reason: at some point, whether in the map-making or power spectrum estimation steps, one needs to divide out the bias introduced by filtering, and if there's a noise bias in what's being divided, then the result will be biased low.
In the context of CMB mapmaking, this bias has, to the extent that one has been aware of it at all, been thought of as mainly a small-scale effect relevant only in regions of the sky with very high contrast, leading to artifacts like thin stripes or bowling around bright sources (Piazzo 2017; Naess 2019), or as a tiny correction on intermediate scales (Poutanen et al. (2006) saw a bias of 0.6% peaking at ℓ = 800 for simulations of the Planck space telescope). The goal of this paper is to demonstrate the unintuitive and surprising result that these effects can lead to $\mathcal{O}(1)$ bias on large scales everywhere in the map. The full scope of these errors appears to be largely unappreciated, and we fear that some published results may suffer from uncorrected bias at low multipoles in total intensity due to these effects. As we shall see, it is mainly ground-based telescopes that are vulnerable to this bias, which usually manifests as a power deficit at large scales, as illustrated in figure 1.
In the following we will go through two common cases of model error in detail: first subpixel errors, which arise when the smooth sky is approximated as a mosaic of pixels; and then detector gain mismatch, which is becoming increasingly important with today's $\mathcal{O}(10^5)$-detector telescopes. We will demonstrate the effects with 1D and 2D toy examples, derive analytic expressions for the bias, and test various mitigation methods such as reducing the filtering of correlated noise.

SUBPIXEL ERRORS
Subpixel errors may be both the most common and most important class of model errors in CMB mapmaking. For efficiency reasons P is almost always¹ chosen to use simple nearest-neighbor interpolation, where the value of a sample is simply given by the value of the pixel nearest to it. This means that P can be implemented by simply reading off one pixel value per sample, and its transpose P^T consists of simply summing the values of the samples that hit each pixel. However, this comes at the cost of a discontinuous jump in values as one scans from one pixel to the next, as illustrated in figure 2. Hence, the closest the model can get to a smooth curve is a staircase-like function that hugs it, leaving a residual full of discontinuous jumps (the blue curve in the figure).
Discontinuous residuals are not necessarily problematic. The trouble arises when this is coupled with a likelihood² where some modes have much less weight than others. Ground-based CMB telescopes suffer from atmospheric emission that acts as a large source of noise on long timescales. This leads to time-ordered data noise power spectra similar to the one sketched in figure 3, with a white noise floor at short timescales (high frequency) transitioning to a steep rise of several orders of magnitude as one moves to longer timescales (low frequency). In this case long-wavelength modes have orders of magnitude lower weight in the likelihood than short-wavelength modes. Put another way, they are orders of magnitude cheaper to sacrifice when the model can't fully fit the data.

¹ While methods with non-discontinuous pointing matrices exist, e.g. Keihänen & Reinecke (2012), these have had very limited adoption. As far as we are aware, every CMB telescope project has used nearest-neighbor mapmaking for its official maps. This includes destripers like MADAM (Keihänen et al. 2010) and SRoll (Planck Collaboration 2020) used in Planck, the filter+bin mapmakers used in SPT (Schaffer et al. 2011; Dutcher et al. 2021) and BICEP (BICEP2 Collaboration 2014), and the maximum-likelihood mapmakers used in ACT (Aiola et al. 2020) and QUIET (Ruud et al. 2015).

² The equivalent for filter+bin is a filter that impacts some modes more than others (which is the whole point of a filter), and for destriping it's the baseline length and any amplitude priors.

1D toy example
To illustrate the interaction between a nearest-neighbor pointing matrix's subpixel errors and a likelihood where large scales have low weight, let us consider a simple 1D case where 100 samples scan uniformly across 10 pixels³:

npix  = 10
nsamp = 100
pix   = np.arange(nsamp).astype(float) * npix / nsamp

A standard nearest-neighbor pointing matrix for this is a 100×10 matrix with a single 1 per row, selecting the pixel closest to each sample. We assume a typical Fourier-diagonal "1/f" noise model, F. With this in place, we can now define our map estimators.
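The pointing matrix and noise weights referenced above can be sketched as follows; the knee frequency and spectral slope of the inverse power spectrum are assumptions chosen to mimic figure 3:

```python
import numpy as np

npix, nsamp = 10, 100
pix = np.arange(nsamp).astype(float) * npix / nsamp

# Nearest-neighbor pointing matrix: sample i reads off pixel floor(pix[i])
P = np.zeros((nsamp, npix))
P[np.arange(nsamp), np.floor(pix).astype(int)] = 1

# Fourier-diagonal 1/f + white noise model. F stands for N^-1: multiplication
# by the inverse power spectrum in the Fourier domain, built as a dense matrix.
freq = np.fft.rfftfreq(nsamp)
freq[0] = 0.5 * freq[1]                      # regularize the DC bin
ips = 1 / (1 + (freq / 0.05) ** -3.5)        # assumed inverse power spectrum
F = np.fft.irfft(ips[:, None] * np.fft.rfft(np.eye(nsamp), axis=0),
                 axis=0, n=nsamp)
```

Since the spectrum is real and positive, F is a symmetric positive-definite circulant matrix, so it can stand in for $N^{-1}$ in all the estimators below.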

Binning
The binned map is simply the mean value of the samples in each pixel, with no weighting.

Fig. 2.- a: In the nearest-neighbor model, the value associated with each sample is simply that of the closest pixel, regardless of where inside that pixel it is. b: Example detector signal (red) for the same path. The closest matching model (green) leaves a jagged residual (blue) that has power on all length scales despite the signal itself being very smooth. For comparison, if our model were a constant zero, then the residual would just be the signal itself (red), and hence smooth. If smooth residuals are much cheaper in the likelihood than jagged ones, then a zero model will be preferred to one that hugs the signal as tightly as possible like the green curve.

Fig. 3.- The noise model/inverse weights/inverse filter used in the subpixel bias demonstration in figures 4 and 5. It is a simple Fourier-diagonal 1/f + white noise spectrum typical for ground-based CMB observations. The frequency axis is in dimensionless units in this toy example, but for real telescopes the transition from white noise is typically a few Hz, corresponding to multipoles of hundreds on the sky. The power axis is dimensionless here, but for a real-world case could have units like µK²/Hz.
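In code, the binned map is just the per-pixel average of the samples. A sketch assuming the pointing matrix from the 1D setup and a smooth test signal (the sinusoid is an assumed form):

```python
import numpy as np

npix, nsamp = 10, 100
pix = np.arange(nsamp).astype(float) * npix / nsamp
P = np.zeros((nsamp, npix))
P[np.arange(nsamp), np.floor(pix).astype(int)] = 1

signal = np.sin(2 * np.pi * pix / npix)      # smooth long-wavelength mode

# Binned map: solve (P^T P) m = P^T d, i.e. the unweighted per-pixel mean
map_bin = np.linalg.solve(P.T @ P, P.T @ signal)
```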

Maximum-likelihood
The maximum-likelihood solution of equation 1 for the sky image m is

$m = (P^T N^{-1} P)^{-1} P^T N^{-1} d$

where N is the covariance matrix of the noise n. This is the generalized least-squares solution for the map. Unlike binned mapmaking, the matrix $P^T N^{-1} P$ is not diagonal except for the special case of uncorrelated noise. Solving for the maximum-likelihood map can therefore be hundreds of times slower, and requires iterative methods for large maps.⁶ In our toy example, $N^{-1} = F$, so the maximum-likelihood map is

map_ml = np.linalg.solve(P.T.dot(F).dot(P), P.T.dot(F.dot(signal)))

Filter+bin
As the name suggests, filter+bin consists of filtering the time-ordered data, and then making a binned map. We'll use F as our filter, so the filter+bin map is

map_fb = np.linalg.solve(P.T.dot(P), P.T.dot(F).dot(signal))

The filter+bin map is biased by design, so to interpret or debias it one needs to characterize this bias. There are two common approaches to doing this: observation matrices and simulations.
Observation matrix - The observation matrix approach recognizes that the whole chain of operations (observe, filter, map) together makes up a linear system, and can therefore be represented as a matrix, called the observation matrix (BICEP2 and Keck Array Collaboration 2016). Building this matrix is heavy, but doable for some surveys. Under the standard assumption that the observation step is given by equation 1, the observation matrix is given by

obsmat = np.linalg.inv(P.T.dot(P)).dot(P.T.dot(F).dot(P))

and using it, we can define a debiased filter+bin map

map_fb_deobs = np.linalg.solve(obsmat, map_fb)

Simulations - Alternatively, and more commonly, one can characterize the bias by simulating the observation of a set of random skies, passing them through the filter+bin process, and comparing the properties of the input and output images (e.g. Sayre et al. 2020). The standard way of doing this is by assuming that equation 1 describes the observation process, and that the bias can be described by a transfer function: a simple independent scaling of each Fourier mode. Under these assumptions, we can measure the transfer function as the ratio of the mean output to mean input power spectra of the simulations, and divide it out from the data power spectrum.

Destriping

Destriping splits the noise into a correlated and uncorrelated part, and models the correlated noise as a series of slowly changing degrees of freedom to be solved for jointly with the sky image itself (Sutton et al. 2010; Tristram et al. 2011). The data is modeled as

$d = Pm + Qa + n_w$

where $n_w$ is the white noise with diagonal covariance matrix $N_w$, and Q describes how each correlated noise degree of freedom a maps onto the time-ordered data, typically in the form of seconds-long (ground) to minutes-long (space) baselines. Given this, the maximum-likelihood solutions for a and m are

$a = (Q^T N_w^{-1} Z Q + C_a^{-1})^{-1} Q^T N_w^{-1} Z d$, with $Z = I - P (P^T N_w^{-1} P)^{-1} P^T N_w^{-1}$
$m = (P^T N_w^{-1} P)^{-1} P^T N_w^{-1} (d - Qa)$

where $C_a$ is one's prior knowledge of the covariance of a.
Similarly to binned and filter+bin mapmaking, destriping derives its speed from the diagonality of $P^T N_w^{-1} P$, allowing for instant inversion. On the other hand, the equation for the baseline amplitudes a is not diagonal, and destriping allows a speed/optimality tradeoff in the choice of baseline length and prior. It approaches the maximum-likelihood map when the baseline length is a single sample and $C_a + N_w = N$. We implement this limit below, but explore other choices in section 2.2. In our toy example $N_w = I$, so $C_a = F^{-1} - I$.

Results

Fig. 4.- Demonstration of large loss of power in long-wavelength modes caused by the poor subpixel treatment in the standard nearest-neighbor pointing matrix. The vertical axis is dimensionless in this toy example, but could have units like µK or Jy/sr for a real-world case. Figure 3 shows the noise model/inverse weights/inverse filter used in the various methods. signal: The input signal, a smooth long-wavelength mode, sampled at 10 samples per output pixel. binned: Simple binned map (the unweighted average per pixel). Very suboptimal in the presence of correlated noise, but unbiased. ML: Maximum-likelihood map. 2/3 of the signal amplitude is lost despite the naive expectation of unbiasedness for this estimator. FB deobs: Filter+bin map debiased using an observation matrix. Identical to ML. FB detrans: Filter+bin map debiased by deconvolving a transfer function measured from simulations. Even more biased than the others due to ignoring mode coupling. destripe: Destriper in the maximum-likelihood limit (1-sample baselines with optimal baseline prior). Identical to ML.

Figure 4 compares the recovered 1D sky "images" for the different mapmaking methods for this toy example. All methods are expected to have a small loss of power at small scales (called the "pixel window") due to averaging
the signal within each pixel, but this effect is well-known, easy to model, and not our focus here. We deconvolve it using the following function before plotting.
def dewin(x):
    return np.fft.irfft(np.fft.rfft(x) / np.sinc(freq), n=len(x)).real

After pixel window deconvolution the binned map (green) closely matches the input signal (red). The same cannot be said for the other estimators. Maximum likelihood, filter+bin with observation matrix debiasing, and destriping (which are all equivalent in the limit we consider here) are strongly biased, with the signal only being recovered at 1/3 of its real amplitude. The situation is even worse for filter+bin with simulation-based debiasing, as this suffers from an additional bias due to assuming that the Fourier modes are independent.⁷

Explanation
To see why inaccuracies in modeling the signal at subpixel scales can bias the largest scales in the map, let us consider the example in figure 2, where a detector measures a smooth, large-scale signal while moving across a few pixels. With a nearest-neighbor pointing matrix it is impossible to model this smooth signal: the model for each sample is simply that of the closest pixel, regardless of where inside that pixel it is. The model therefore looks like a staircase-like function in the time domain.
Given that we can't exactly match the signal, what is the best approximation? Let us consider two very different alternatives. Model A: The value in each pixel is the average of the samples that hit it, making the model curve trace the smooth signal as closely as it can. This is the green curve in figure 2, and has the sawtooth-like residual shown with the blue curve. It is probably the model most people would choose if asked to draw one manually. Model B: The value in each pixel is zero, and the residual is simply the signal itself. Model B seems like a terrible fit to the data, but under a reasonable noise model for a ground-based telescope, like the one shown in figure 3, it will actually have a higher likelihood (lower $\chi^2 = r^T N^{-1} r$, where r is the residual) than model A. The reason is that while model B has a much larger residual than model A, model B's residual is smooth and hence has most of its power at low frequency in the time domain, where $N^{-1}$ is very small. Meanwhile, model A's residual extends to all frequencies due to its jagged nature, including the costly high frequencies. The actual maximum-likelihood solution will be intermediate between these two cases, sacrificing some but not all of the large-scale power in the model to make the residual smoother.
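The preference for model B can be checked numerically: under a steep 1/f inverse-noise weighting (the knee and slope below are assumptions), the zero model achieves a lower χ² than the staircase that hugs the signal:

```python
import numpy as np

npix, nsamp = 10, 100
i = np.arange(nsamp)
signal = np.sin(2 * np.pi * i / nsamp)       # smooth long-wavelength signal

freq = np.fft.rfftfreq(nsamp)
freq[0] = 0.5 * freq[1]
ips = 1 / (1 + (freq / 0.05) ** -3.5)        # assumed inverse noise spectrum

def chisq(r):
    # chi^2 = r^T N^-1 r for a Fourier-diagonal noise model (up to normalization)
    return float(np.sum(ips * np.abs(np.fft.rfft(r)) ** 2))

# Model A: per-pixel mean, the staircase hugging the signal
model_a = np.repeat(signal.reshape(npix, -1).mean(1), nsamp // npix)
# Model B: identically zero, leaving the smooth signal itself as the residual
model_b = np.zeros(nsamp)
# Model B wins despite being a "terrible fit" in the usual sense
```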
To demonstrate that the bias really is caused by subpixel errors, we repeated the simulation with a signal that follows the same nearest-neighbor behavior as the data model, thus eliminating subpixel errors. The result is shown in figure 5. The bias has disappeared in all methods except filter+bin with transfer-function based debiasing, which has an additional source of bias due to its assumption of a Fourier-diagonal filter response.

2D toy example
The 1D toy example is useful for understanding the origin of the bias, but its unrealistic observing pattern makes it insufficient for exploring optimality/bias tradeoffs in the different methods. We therefore made a larger toy example where a single detector scans at constant speed across a square patch, sampling $N_{scan} = 400$ equi-spaced rows with $N_{scan}$ equi-spaced samples per row, followed by a column-wise scan of the same patch, leading to a total cross-linked data set $N_{samp} = 2 N_{scan}^2 = 320{,}000$ samples long. This equi-spaced grid of samples was chosen to make it easy to simulate a signal directly onto the samples without needing the complication of pixel-to-sample projection. These samples are then mapped onto a square grid of pixels with a side length of $N_{side} = 100$ using the different mapmaking methods.
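The crosslinked sampling pattern described above can be sketched as follows; the row-major, pixel-unit coordinate convention is an assumption:

```python
import numpy as np

nscan = 400          # equi-spaced rows/samples per scan direction
nside = 100          # output map side length in pixels

# Horizontal pass (row by row), then a vertical pass (column by column)
y, x = np.mgrid[0:nscan, 0:nscan] * (nside / nscan)
pos_h = np.column_stack([y.ravel(), x.ravel()])
pos_v = np.column_stack([x.ravel(), y.ravel()])
pos = np.concatenate([pos_h, pos_v])   # all sample positions, in pixel units
```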
For the signal we draw realizations from a CMB-like $1/\ell^2$ power spectrum with a Gaussian beam with standard deviation σ = 3 pixels. Here $\ell$ is the pixel-space wavenumber, which we evaluate in the higher-resolution sample space, taking into account that the simulated scanning pattern covers the field twice, first horizontally and then vertically.

Cases
We will investigate six classes of maps:

1. bin: Simple binned map, which we expect to be unbiased but very noisy.

2. ML: Maximum-likelihood map. Ideally unbiased and optimal, but will deviate from this due to model errors.
3. ML wX: Maximum-likelihood maps with a "whitened" noise model, which overestimates the white noise power by a factor $10^X$, X ∈ {1, 2, 3}, reducing the overall correlatedness of the noise model. We expect the bias to be proportional to the overall dynamic range of the noise model, so making the noise model whiter should reduce bias. The cost will be suboptimal noise weighting, leading to a noisier map, but it might be worth it.
4. DS X: Destriping map with a baseline of X ∈ {4, 16, 64} samples and no amplitude prior. This is probably the most common type of destriping. The baseline lengths can be compared with the noise knee wavelength of 6.7 pixels ≈ 27 samples. This is the wavelength where the correlated and white noise powers are equal.

5. DS+ X: Like DS X, but uses the correlated part of the maximum-likelihood noise model as an amplitude prior ($C_a$ in equation 5).

6. FB: Filter+bin mapmaking where a transfer function is measured using simulations based on the data model, and this is deconvolved when measuring the power spectrum. This is the standard approach for filter+bin mapmaking. In this case only the power spectrum is debiased, not the map, so we don't show an example map for this case.

Fig. 6.- The top left map is the input signal, which is directly evaluated at 4x higher resolution than what is used for reconstructing the output maps. All output maps were built from the same data and assume a nearest-neighbor pointing matrix, except for the last row, where a bilinear pointing matrix was used. The color range is the same for all panels. The binned map (top right) is bias-free but has uselessly high noise. Maximum likelihood (ML) and destriping with short baseline lengths (with (DS+) and without (DS) a baseline amplitude prior, which only matters for the shortest baselines) are all low-noise but biased on large scales. This bias goes away as the noise model is artificially whitened (ML wX) or the baseline length is increased (for destriping), but this comes at a cost of increased noise on all scales. Bilinear mapmaking (last row) instead avoids the bias by eliminating most of the model error. It can be difficult to judge how significant the bias and noise levels are from these images. See figures 7 and 8 for easier-to-interpret comparisons of the signal and noise power spectra respectively.

Fig. 7.- Comparison of the subpixel bias of the different mapmaking methods described in section 2.2.1 and discussed in section 2.2.2. Standard maximum likelihood (thick blue) is strongly biased, as are all but the (very noisy) longest destriping baselines. This bias disappears (ML) or is greatly reduced (DS) when switching to bilinear interpolation in the pointing matrix. See figure 8 for the corresponding noise spectra. The wavenumber l is dimensionless, with a Nyquist frequency of 0.5. We interpret the upturn at l > 0.4 as aliasing, which is expected and not relevant for the biases we consider here.

Fig. 8.- Comparison of the optimality of the methods defined in section 2.2.1, measured as the debiased noise power spectrum (using the bias measurements from figure 7) for each method relative to that of bilinear maximum-likelihood mapmaking. Note the logarithmic vertical axis, and that the binned case was divided by 10³ to bring it partially inside the plot bounds. The wavenumber l is dimensionless, with a Nyquist frequency of 0.5. It's clear that simple bias mitigation methods like artificially whitening the noise model or increasing the baseline length are too costly to be practical.
Since all the mapmaking methods are linear, the signal and noise can be mapped separately. We make 400 signal-only and noise-only data realizations, map them using each method, and compute the mean signal and noise power spectra.
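The averaging step can be sketched as follows, with an identity placeholder standing in for the actual mapmaker and only 10 realizations instead of the 400 used in the text:

```python
import numpy as np

rng = np.random.default_rng(1)
nside = 100

def powspec2d(m):
    # Azimuthally averaged 2D power spectrum, binned by integer wavenumber
    f2 = np.abs(np.fft.fft2(m)) ** 2 / m.size
    ky, kx = np.meshgrid(np.fft.fftfreq(nside), np.fft.fftfreq(nside),
                         indexing="ij")
    bins = (np.hypot(ky, kx) * nside).astype(int)
    counts = np.bincount(bins.ravel())
    return np.bincount(bins.ravel(), f2.ravel()) / np.maximum(counts, 1)

mapmake = lambda sim: sim   # placeholder for any of the linear estimators

# Mean noise spectrum over realizations; the signal spectrum is done the same way
spec_noise = np.mean([powspec2d(mapmake(rng.standard_normal((nside, nside))))
                      for _ in range(10)], axis=0)
```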

Results
Example maps are shown in figure 6, while the bias and noise are quantified in figures 7 and 8 respectively. As expected the binned map is unbiased but extremely noisy. The maximum-likelihood map is low-noise, but measurably biased on all scales, with a power deficit of a few percent at the smallest scales which grows to almost 100% on the largest scales. The deficit appears to fall proportionally with the whitening, with ML w1 and w2 being respectively 10x and 100x as close to unbiased. Sadly this comes at the cost of 40% and 350% higher noise power respectively. These numbers may differ for real-world cases, but this still seems like a very expensive bias mitigation method. All but the longest-baseline destriped maps are also strongly biased, with the shortest baseline being considerably worse than ML on almost all scales. Much like we saw with the ML variants, the less biased destriping versions come at a high cost in noise. Finally, the filter+bin map is biased even after simulation-based debiasing, due to the simulations not capturing the subpixel behavior of the real data. Both the bias and noise levels are the same as maximum likelihood in this example.
To test whether the observed biases are truly caused by subpixel errors, we repeat the simulations with only one sample per pixel ($N_{scan} = N_{side}$). As expected this results in an unbiased power spectrum for all methods.⁸

Effective mitigation of subpixel errors
There will be no subpixel errors in the limit of infinitely small pixels, so it's tempting to simply reduce the pixel size to solve the problem. This does work, but since subpixel errors are proportional to $|\nabla s \cdot \vec\Delta|$, where s is the true, smooth signal on the sky and $\vec\Delta = [\Delta_x, \Delta_y]$ is the pixel shape, the improvement is only first order in the pixel side length. A much more feasible solution is to instead improve the subpixel handling in the pointing matrix. Going from nearest-neighbor to bilinear interpolation is enough to practically eliminate subpixel errors without reducing pixel size. Since each sample in bilinear mapmaking gets contributions from the four closest pixels, the pointing matrix is about four times slower than the standard nearest-neighbor version. However, as shown in figures 7 and 8 this cost is worth it, with bilinear maximum-likelihood mapmaking being bias-free and even lower-noise than standard ML. The bias is also greatly reduced for destriping, but not eliminated.
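A sketch of what a 2D bilinear pointing matrix can look like (`bilinear_pointing_matrix` is a hypothetical helper; it is dense for clarity, while a real implementation would be sparse or applied on the fly). Each sample distributes weight among its four neighboring pixels according to its fractional position:

```python
import numpy as np

def bilinear_pointing_matrix(pos, nside):
    """Dense bilinear pointing matrix for sample positions pos (n, 2),
    in pixel units, on an nside x nside map."""
    n = len(pos)
    P = np.zeros((n, nside * nside))
    i0 = np.floor(pos).astype(int)
    fy, fx = (pos - i0).T
    # The four neighbors and their bilinear weights
    for dy, dx, w in [(0, 0, (1 - fy) * (1 - fx)), (0, 1, (1 - fy) * fx),
                      (1, 0, fy * (1 - fx)),       (1, 1, fy * fx)]:
        y = np.clip(i0[:, 0] + dy, 0, nside - 1)
        x = np.clip(i0[:, 1] + dx, 0, nside - 1)
        np.add.at(P, (np.arange(n), y * nside + x), w)
    return P
```

A sample that lands exactly on a pixel center reduces to the nearest-neighbor case, with all its weight on one pixel.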
Note that bilinear mapmaking (and in general any mapmaking method where each sample affects multiple pixels) breaks the diagonality of P T P which methods like binning, filter+bin and destriping rely on for their speed.While we have implemented bilinear variants of them here in order to investigate the effect on the bias, bilinear filter+bin and bilinear destriping are too slow to be useful in practice.This is in contrast to bilinear maximum-likelihood mapmaking, which only becomes a few times slower.

GAIN ERRORS
Surprisingly, gain errors can also lead to a scale-dependent power loss. This happens when the noise model correlates observations with inconsistent gain, for example multiple miscalibrated detectors with correlated noise. We will illustrate this using a minimal 1D toy example with two correlated detectors, as well as a more realistic 2D example.

1D toy example
Consider two miscalibrated detectors scanning across a line of pixels. Both detectors point in the same direction, and take one sample per pixel. We can model this as

$d = GPm + n$ (6)

where d is the vector of samples, m is the true signal in the pixels, n is Gaussian noise with covariance N, P is the pointing matrix and G is the (diagonal) gain error. Written out in terms of detectors and samples, we can express this as

$d_{di} = g_d m_i + n_{di}$

with d and i being the detector and sample index respectively. This simply expresses that both detectors see the same signal at the same time. Let us assume that the noise covariance is diagonal in Fourier space,

$N_{df,d'f'} = \delta_{ff'} A_f \begin{pmatrix} 1 & \alpha_f \\ \alpha_f & 1 \end{pmatrix}_{dd'}$

where f and f′ are frequency indices, $A_f$ is the noise power spectrum, and $\alpha_f$ is the detector correlation coefficient. The inverse noise model follows by inverting this 2×2 matrix frequency by frequency. The maximum-likelihood solution of equation 6 is

$\hat m = (P^T G N^{-1} G P)^{-1} P^T G N^{-1} d$

which has expectation value $\langle\hat m\rangle = m$: it is unbiased. However, given that G is supposed to be a gain error, we are presumably not aware of it, and so our data model will instead be

$d = Pm + Gn$

The reason why G is still present in front of n in the model is that the noise properties are almost always measured from the data, so our noise model would automatically absorb G. For example, if a detector were to be calibrated too high, it would also appear noisier, and would therefore be downweighted more under inverse variance weighting. Given this model, our maximum-likelihood solution for m is

$\hat m = (P^T (GNG)^{-1} P)^{-1} P^T (GNG)^{-1} d$

Inserting the actual data behavior from equation 6, we get

$\langle\hat m\rangle = (P^T (GNG)^{-1} P)^{-1} P^T (GNG)^{-1} GPm$

Since N is diagonal in Fourier space and both G and P are constant in time, the whole equation becomes Fourier-diagonal. Inserting the noise model above, this simplifies to ($A_f$ cancels)

$\langle\hat m_f\rangle = \frac{g_1 g_2 (g_1+g_2)(1-\alpha_f)}{g_1^2 + g_2^2 - 2\alpha_f g_1 g_2}\, m_f$

So the effect of the gain miscalibration is an overall scaling of the result as one would expect, $g_1 g_2 (g_1+g_2)/(g_1^2+g_2^2)$, times a transfer function

$T_f = \frac{(1-\alpha_f)(g_1^2+g_2^2)}{g_1^2+g_2^2-2\alpha_f g_1 g_2}$

This reduces to 1 both when the detectors are uncorrelated ($\alpha_f = 0$) and when there are no relative gain errors ($g_1 = g_2$).
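The two-detector result can be spot-checked numerically by building the per-frequency 2×2 system directly; the gain values and correlation below are arbitrary assumptions:

```python
import numpy as np

def bias_factor(g1, g2, alpha):
    """<m>/m for two co-pointing detectors with gains g1, g2 and noise
    correlation alpha, computed by direct linear algebra per frequency."""
    G = np.diag([g1, g2])
    A = np.array([[1.0, alpha], [alpha, 1.0]])  # per-frequency noise covariance
    iN = np.linalg.inv(G @ A @ G)               # noise model absorbs the gains
    P = np.ones((2, 1))                         # both detectors see the same pixel
    # ML estimate of a unit sky signal that was observed through the true gains
    return np.linalg.solve(P.T @ iN @ P, P.T @ iN @ (G @ P)).item()

g1, g2, alpha = 1.1, 0.9, 0.99
analytic = g1 * g2 * (g1 + g2) * (1 - alpha) / (g1**2 + g2**2 - 2 * alpha * g1 * g2)
# For strongly correlated noise, even a 10%-level gain mismatch
# suppresses the recovered signal to a fraction of its true amplitude
```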
To motivate a frequency-dependent correlation coefficient, let's consider the common case of atmospheric 1/f noise plus white instrumental noise. Our detectors are co-pointing, meaning they see the same atmosphere, so this noise component is 100% correlated between detectors, but we assume their instrumental noise to be uncorrelated.
The result is a frequency-dependent correlation coefficient that changes smoothly from 100% correlated for atmosphere-dominated scales (low frequencies) to 0% correlated for detector-noise dominated scales (high frequencies). The transfer function for this is plotted in figure 10 for the case of a 10% gain error and average weather (though note that this 1D example does not capture the benefits of crosslinking).

Fig. 10.- Signal loss for a 1D toy example with noise properties similar to a ground-based CMB telescope in average weather (per-detector $\ell_{knee} = 1000$) and a 10% gain error. The solid red curve is based on the analytic transfer function. The dashed blue curve is a simulation (gain/gain_model_error_toy_1d_sim.py). Unlike the other examples, the horizontal axis uses multipoles instead of Nyquist units to make it easier to compare to CMB observations.

2D toy example
We have also demonstrated the effect of gain errors in a somewhat more realistic 2D simulation, where an array of four detectors scans first horizontally and then vertically across a grid of 200 × 200 pixels, with one sample per pixel and each sample hitting the pixel center to avoid subpixel errors.

Fig. 11.- Power loss for a 2D toy example with multiple detectors with inconsistent but constant gain errors. Plain unweighted binning (red) is unbiased, but both maximum-likelihood (yellow) and filter+bin (pink) lose much of the signal at large scales (low wavenumber). The light blue curves are maximum-likelihood with 10/100/1000 times as much white noise in the noise model, reducing the overall correlatedness of the noise. This results in less bias, but also much more noise, as shown in figure 12.
The detectors are offset by (0,0), (0,1), (1,0) and (1,1) pixels from the first so that the array instantaneously observes more than just a single pixel, but still only a fraction of the whole image, much like a real detector array would.Each detector has a (constant) gain error drawn independently from a normal distribution with a standard deviation of 0.1, and both the data and noise model reflect this.
The noise model consists of white noise plus a common mode with a power-law spectrum with spectral index −3.5 and $\ell_{knee} = 0.125$ (corresponding to 1/50th of the image side length). This common mode makes the noise model strongly correlated on large scales, like the atmosphere does for real ground-based observations. The bias requires inconsistent observations to be correlated, so the common mode is crucial for this demonstration.
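Such a noise power spectrum can be sketched on the 2D Fourier grid as follows; the normalization and exact form are assumptions:

```python
import numpy as np

nside = 200
ky, kx = np.meshgrid(np.fft.fftfreq(nside), np.fft.fftfreq(nside), indexing="ij")
l = np.hypot(ky, kx)
l[0, 0] = l[0, 1]                 # regularize the DC mode

l_knee, index = 0.125, -3.5
N_white  = np.ones_like(l)        # white noise floor per detector
N_common = (l / l_knee) ** index  # steep common mode, shared between detectors
N_total  = N_white + N_common     # strongly correlated on large scales
```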
We solve for binned, maximum-likelihood and filter+bin maps using a data model that is unaware of the gain errors (as one would be in reality, or they wouldn't be gain errors).This is implemented in the program gain_mode_error_2d_toy.py (see section 7).
As expected from the 1D toy example, the gain inconsistency results in a large loss of power on large scales, as shown in figure 11. Filter+bin mapmaking is no less vulnerable to this bias than maximum likelihood. Figure 12 shows the corresponding relative noise spectra. And as we saw for subpixel errors, attempting to mitigate the bias by artificially reducing the correlations in the noise model is much too costly noise-wise to be a practical mitigation strategy.⁹ Detectors with strongly correlated noise on large scales and $\mathcal{O}(10\%)$ gain error inconsistency are not unrealistic for a ground-based CMB telescope, so it is likely that many ground-based telescopes suffer from this bias to some extent.

Fig. 12.- How noisy the different mapmaking methods are compared to maximum-likelihood for the 2D gain error toy example. Each graph is the ratio of the transfer-function-deconvolved noise spectrum for binned mapmaking (red), whitened maximum-likelihood (yellow) or filter+bin (pink) divided by the corresponding spectrum for standard maximum-likelihood. The whitened curves correspond to a noise model with a 10/100/1000 times exaggerated white noise floor for the bottom/middle/top light blue curve. Both these and binned mapmaking are too costly to be realistic solutions to the power loss problem. Note the logarithmic y axis and that the binned curve has been divided by 10⁴ to fit in the graph.
Time-dependent gain errors

Gain errors may also be time-dependent. Can these also cause large-scale power loss? As we shall see the answer is "yes, but only in unrealistic circumstances".
Consider a single detector that makes a series of passes over a line of pixels, such that

$d_{ix} = g_{ix} m_x + n_{ix}$

where $d_{ix}$ is the measurement in pixel/sample x of pass i, $m_x$ is the true signal in pixel x, $g_{ix}$ is the unmodeled gain and $n_{ix}$ is the noise. We assume that the gain error is ignored in the data model but captured by the noise model, and solve for the maximum-likelihood map as before. We further assume that:

1. Both the noise and the gain errors are independent between passes.

2. The inverse gain perturbation $\gamma_{ix} = g_{ix}^{-1} - 1$ has stationary covariance, which is therefore diagonal in Fourier space, with diagonal $\Gamma_f$ (its power spectrum).
3. Similarly the covariance is Fourier-diagonal with power spectrum N Under these assumptions it can be shown (see appendix B) that when the number of passes is much greater than one the Fourier transform of the maximum likelihood solution becomes where * is the convolution operator and S is the number of pixels, and we've assumed the standard discrete Fourier transform normalization.We see that in the limit of uncorrelated gain errors, Γ f = const, the bias becomes 1/(1 + const • N ).For atmospheric noise this gives a strong suppression of the signal on large scales, just like we saw for detector errors.On the other limit where the gain error changes slowly and is approximately constant over a noise correlation length, we have , and so the bias just becomes a constant scaling.So to get scale-dependent power loss, the gain error needs to vary on time-scales over which the noise is strongly correlated.However, all of this rests on the assumption that the gain errors are captured by the noise model.For the previous case of constant per-detector gain errors this is realistic -after all detectors typically don't all have the same noise level, so they need to be weighted differently, and the noise model ends up absorbing the gain error when being measured from the data itself.The situation is different from time-dependent gains.As we saw above, the noise model must be modulated by gain errors that change quickly relative to the atmosphere if we are to get scale-dependent bias, and for a typical ground-based CMB telescope this works out to be ∼ second time-scales.It is very uncommon to build noise models that track the noise properties with a time resolution this high.
In contrast to detector-dependent gain errors, we therefore do not expect time-dependent gain errors to be a significant source of large-scale power loss.
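The scale-dependent suppression described above is straightforward to reproduce numerically. The following sketch (illustrative parameters, not code from the paper's repository) simulates a single detector making many passes over a periodic row of pixels, assuming the data model d_i = G_i(m + n_i) with per-pass noise model G_i N G_i, i.e. a noise model that has absorbed the gain errors. Setting the instrument noise to zero isolates the bias, which is deterministic:

```python
import numpy as np

rng = np.random.default_rng(0)
S, M, sigma = 64, 5000, 0.1   # pixels per pass, number of passes, gain error std

# Circulant inverse noise matrix from an atmosphere-like 1/f spectrum
freq   = np.fft.fftfreq(S)
inv_ps = 1/(1+(np.maximum(np.abs(freq), np.abs(freq[1])/2)/0.1)**-3.5)
Ninv   = np.fft.ifft(inv_ps[:,None]*np.fft.fft(np.eye(S), axis=0), axis=0).real

# Signal with one long-wavelength and one short-wavelength mode
x = np.arange(S)
m = np.sin(2*np.pi*x/S) + np.sin(2*np.pi*16*x/S)

# Accumulate the maximum-likelihood normal equations over M passes. Each pass
# gets its own uncorrelated inverse-gain perturbation gamma_i, and since the
# noise model is modulated by the gains, each pass is weighted by
# G_i^-1 N^-1 G_i^-1. Instrument noise is omitted: it averages away, leaving
# only the gain-induced bias.
A = np.zeros((S, S))
b = np.zeros(S)
for i in range(M):
    ginv = 1 + sigma*rng.standard_normal(S)   # G_i^-1 = I + gamma_i
    A   += ginv[:,None]*Ninv*ginv[None,:]     # G_i^-1 N^-1 G_i^-1
    b   += ginv*(Ninv @ m)                    # weighted data, with d_i = G_i m
mhat = np.linalg.solve(A, b)

# Per-mode transfer: strong suppression at k=1, almost none at k=16
ratio = np.abs(np.fft.fft(mhat))/np.maximum(np.abs(np.fft.fft(m)), 1e-12)
print("transfer at k=1: %.2f, at k=16: %.2f" % (ratio[1], ratio[16]))
```

With these (arbitrary) numbers the long-wavelength mode loses most of its amplitude while the short-wavelength mode is nearly untouched, as the convolution formula predicts.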

EASILY MISSED IN SIMULATIONS
What makes model error bias especially insidious is that it is completely invisible to any simulation that does not explicitly include that particular type of model error. For example, a simulation where the CMB is read off from a simulated map using the same pointing matrix as is used in the mapmaking itself would be blind to subpixel errors. And simulations that don't inject gain errors into both the data and the noise model will be blind to power loss from gain errors. Given the many possible ways the real data might deviate from one's model of it, it is hard to be sure that one has included all the relevant types of model error in the simulations. It is therefore very easy to trick oneself into believing that one has an unbiased analysis pipeline while there are in fact O(1) biases remaining.
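This blindness is easy to demonstrate in a 1D toy (a sketch with illustrative parameters, not code from the paper's repository): the same maximum-likelihood mapmaker recovers the input perfectly when the simulated signal is read off through its own nearest-neighbor pointing matrix, but loses much of the amplitude when the signal is evaluated at the true subpixel positions:

```python
import numpy as np

npix, nsub = 64, 10                  # output pixels, samples per pixel
nsamp = npix*nsub
pix   = np.arange(nsamp)/nsub        # fractional pixel position of each sample

# Inverse noise matrix F with atmosphere-like 1/f behavior
freq   = np.fft.rfftfreq(nsamp)
inv_ps = 1/(1+(np.maximum(freq, freq[1]/2)/0.03)**-3.5)
I = np.eye(nsamp)
F = np.zeros((nsamp, nsamp))
for i in range(nsamp):
    F[:,i] = np.fft.irfft(inv_ps*np.fft.rfft(I[i]), n=nsamp)

# Nearest-neighbor pointing matrix
P = np.zeros((nsamp, npix))
P[np.arange(nsamp), np.floor(pix).astype(int)] = 1

def ml_map(d):  # noiseless maximum-likelihood map
    return np.linalg.solve(P.T @ F @ P, P.T @ F @ d)

m_in = np.sin(2*np.pi*np.arange(npix)/npix)

# Simulation A: signal injected through the mapmaker's own pointing matrix.
# The data lie exactly in the model's column space, so recovery is exact and
# the simulation sees no bias.
amp_A = 2*np.abs(np.fft.fft(ml_map(P @ m_in))[1])/npix

# Simulation B: signal evaluated at the true subpixel positions. The jagged
# subpixel residual is expensive under F, so the smooth mode is suppressed.
amp_B = 2*np.abs(np.fft.fft(ml_map(np.sin(2*np.pi*pix/npix)))[1])/npix
print(amp_A, amp_B)
```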

DETECTION STRATEGIES
The unintuitive large-scale effects of model errors originate from the interaction between model errors and nonlocal weighting/filtering. A noise model (or filter) with a large dynamic range, such as one capturing the huge ratio between the long-wavelength and short-wavelength noise power typical of ground-based CMB telescopes, is therefore much more vulnerable to large-scale power loss than one appropriate for space-based telescopes, which have almost flat noise power spectra. This suggests the following tests for large-scale model error bias:

1. Compare power spectra with those from a space-based telescope. This is reliable, but comes at the cost of no longer being an independent measurement.
2. Split the data into subsets with different ratios of correlated noise to white noise and check their consistency. This could for example be a split by the level of precipitable water vapor (PWV) in the atmosphere. High-PWV data would be expected to have a higher dynamic range and therefore be more vulnerable.
3. Map the same data both using the standard noise model/filter and a less optimal one with an artificially reduced dynamic range, e.g. one that underestimates the amount of correlated noise or ignores correlations between detectors. The latter should result in a less biased but noisier map, as we saw in figures 7 and 8. If the maps are consistent, then large-scale model error bias is probably not an issue. Alternatively, one can check for consistency when varying the pixel size. We haven't tested the effectiveness of this method, but model error bias should be proportional to the pixel size with nearest neighbor mapmaking.
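For test 3, one simple way to construct the reduced-dynamic-range model is to cap how much the inverse noise spectrum is allowed to vary, sketched here for the 1/f-style noise model used in the toy examples (the cap of 100 is an arbitrary illustrative choice):

```python
import numpy as np

freq   = np.fft.rfftfreq(10000)
inv_ps = 1/(1+(np.maximum(freq, freq[1]/2)/0.03)**-3.5)  # standard 1/f noise model

# Capped version: no mode is down-weighted by more than a factor of 100
# relative to the white noise floor
inv_ps_test = np.maximum(inv_ps, inv_ps.max()/100)
```

Maps made with inv_ps and inv_ps_test should agree within noise if model error bias is negligible; a significant large-scale difference is a warning sign.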

WHO IS AT RISK?
While model error bias can be expected to always be present at a low level, it only becomes important when dealing with very large amounts of correlated noise (i.e. much more noise power on some scales than others; see equation 19). Therefore, the telescopes at risk are those that fulfill the following criteria.
1. Observe from the ground.
2. Care about total intensity, since the atmosphere is mostly unpolarized.
3. Observe at frequencies with high water vapor emission, since water vapor is the main source of clumpiness in the atmospheric emission. At CMB-relevant frequencies, the lower the frequency the safer: frequencies < 60 GHz should be relatively safe, while > 200 GHz are particularly vulnerable.
4. Have sensitive detectors, since lower white noise means a larger ratio between the correlated noise from the atmosphere and the white noise.
5. Have a large number of detectors with a low angular separation on the sky, since this increases their correlatedness.
There aren't that many telescopes that fulfill these criteria, as ground-based telescopes have focused mostly on polarization or lower frequencies. So to summarize, why hasn't this been noticed before? Ground-based CMB surveys have mostly focused on polarization, where this effect is orders of magnitude smaller. Those that do publish total intensity power spectra have only reached high enough sensitivity and detector counts for it to become noticeable in the last decade. Given how frequently ground-based CMB surveys cut ℓ ≲ 500−1000, it is tempting to speculate that model error bias may have caused null tests to fail at low ℓ without being identified as the culprit.

SOURCE CODE
The source code and data files behind these examples are available at https://github.com/amaurea/model_error2.
The 1D and 2D subpixel simulations are available in the subpix/ directory, while the gain error example is in gain/gain_model_error_toy_2d.py.

4. The size of the bias is proportional to the dynamic range of the filter/noise model. Hence it is important for ground-based measurements of the unpolarized CMB due to the presence of large-scale atmospheric noise there, but is much less important for polarization measurements or space-based telescopes.
5. Simulations are blind to these biases unless specifically designed to target them. There is a large risk of ending up thinking one has an unbiased pipeline despite there being large biases in the actual CMB maps.
6. An effective way of testing for this bias is to also map the data using a (much) lower dynamic range noise model/filter and check whether this leads to consistent power spectra.
While it's unclear how large a role model-error bias has played in ground-based CMB results published so far, it will be an increasingly important effect as sensitivity increases. We hope upcoming projects like Simons Observatory and CMB-S4 will be on the lookout for large-scale power loss in total intensity, and will be aware that simulations alone cannot prove that their analysis pipeline is bias free.

where $v^{\rm left}_a$ and $v^{\rm right}_a$ are the values of the pixels to the left and right of where sample $a$ points, and $x_a$ is its subpixel position. This linear system can be written as the matrix equation $d = Pv$. $P$'s form will depend on the exact scanning pattern, but the final pixel window only depends on each pixel being hit equally often at all subpixel locations. For simplicity, we will assume a simple scanning pattern that scans a single time across a row of pixels from left to right, with a constant scanning speed and infinite sample rate. This results in the pointing matrix $P$ taking the form shown in figure 15.
Given the set of samples $d$, the least squares estimator for the pixel values $v$ under uniform weighting is

$v = (P^T P)^{-1} P^T d$. (A4)

The matrix $P^T P$ consists of two types of non-zero entries: diagonal entries $(P^T P)_{ii}$, which for our dense uniform scan average to $\int_0^1 [(1-x)^2 + x^2]\,dx = 2/3$ per unit sample density, and nearest-neighbor entries $(P^T P)_{i,i\pm1}$, which average to $\int_0^1 x(1-x)\,dx = 1/6$.

In the limit $M \rightarrow \infty$ the odd powers of the random variable $\gamma$ disappear, so we are left with computing $\psi$. Since both $N$ and $\Gamma$ are diagonal in Fourier space, it makes sense to express $\psi$ in Fourier space. Denoting Fourier-space quantities with a tilde, and defining the Fourier transform operators

Fig. 1.- Preview of the model error bias we will discuss in the following sections. Despite the standard expectation that maximum-likelihood mapmaking is optimal and unbiased, the maximum-likelihood solution (right) for a simple toy example is strongly power-deficient on large scales compared to the input signal (left). As we shall see, this bias is not unique to maximum-likelihood methods, and can be triggered by several subtle types of model error.
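The circulant structure of P^T P and the resulting linear pixel window can be checked numerically. The sketch below (illustrative, not from the paper's repository) builds the 1D linear-interpolation pointing matrix for a dense uniform scan over a periodic row of pixels. The diagonal of P^T P tends to 2/3 per unit sample density and the nearest-neighbor entries to 1/6, giving Fourier eigenvalues (2 + cos(2πl))/3 and hence a binned-map response of sinc(l)²/((2 + cos(2πl))/3) for a pure mode:

```python
import numpy as np

npix, nsub = 32, 64                    # pixels in the periodic row, samples per pixel
pos   = np.arange(npix*nsub)/nsub      # dense uniform sample positions
ileft = np.floor(pos).astype(int)
x     = pos - ileft                    # subpixel coordinate in [0, 1)

# 1D linear-interpolation pointing matrix with wrapping boundary conditions
P = np.zeros((npix*nsub, npix))
rows = np.arange(npix*nsub)
P[rows, ileft] = 1 - x
P[rows, (ileft+1) % npix] = x

PtP = P.T @ P / nsub                   # normalized per unit sample density
print(PtP[0,0], PtP[0,1])              # ~2/3 on the diagonal, ~1/6 next to it

# Pixel window of the binned (unweighted) estimator for a pure Fourier mode
k = 5
f = k/npix                             # frequency in cycles per pixel
v = np.linalg.solve(P.T @ P, P.T @ np.cos(2*np.pi*f*pos))
W_num = 2*np.abs(np.fft.fft(v)[k])/npix
W_ana = np.sinc(f)**2/((2+np.cos(2*np.pi*f))/3)
print(W_num, W_ana)
```

Note that np.sinc uses the normalized convention sin(πx)/(πx), matching the convention used for the pixel windows here.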
with f_knee = 0.03 and α = −3.5, and build the inverse noise matrix/filter/baseline-prior F by projecting the inverse noise spectrum N⁻¹ into pixel space:

freq = np.fft.rfftfreq(nsamp)
inv_ps = 1/(1+(np.maximum(freq, freq[1]/2)/0.03)**-3.5)
F = np.zeros((nsamp, nsamp))
I = np.eye(nsamp)
for i in range(nsamp):
    F[:,i] = np.fft.irfft(inv_ps*np.fft.rfft(I[i]), n=nsamp)

The signal itself consists of just a long-wavelength sine wave:

signal = np.sin(2*np.pi*pix/npix)

Fig. 2.- a: Example path (red) of a detector across a few pixels. The area closest to each pixel center (black dots) is shown with dotted lines. In the nearest neighbor model, the value associated with each sample is simply that of the closest pixel, regardless of where inside that pixel it is. b: Example detector signal (red) for the same path. The closest matching model (green) leaves a jagged residual (blue) that has power on all length scales despite the signal itself being very smooth. For comparison, if our model were a constant zero, then the residual would just be the signal itself (red), and hence smooth. If smooth residuals are much cheaper in the likelihood than jagged ones, then a zero model will be preferred to one that hugs the signal as tightly as possible like the green curve.
iCa = np.linalg.inv(np.linalg.inv(F) - I)
Z = I - P.dot(np.linalg.solve(P.T.dot(P), P.T))
a = np.linalg.solve(Z + iCa, Z.dot(signal))
map_ds = np.linalg.solve(P.T.dot(P), P.T.dot(signal - a))

Fig. 4.- Demonstration of the large loss of power in a long-wavelength mode caused by the poor subpixel treatment in the standard nearest-neighbor pointing matrix. The vertical axis is dimensionless in this toy example, but could have units like µK or Jy/sr in a real-world case. Figure 3 shows the noise model/inverse weights/inverse filter used in the various methods. signal: The input signal, a smooth long-wavelength mode, sampled at 10 samples per output pixel. binned: Simple binned map (the unweighted average per pixel). Very suboptimal in the presence of correlated noise, but unbiased. ML: Maximum-likelihood map. 2/3 of the signal amplitude is lost despite the naive expectation of biaslessness for this estimator. FB deobs: Filter+bin map debiased using an observation matrix. Identical to ML. FB detrans: Filter+bin map debiased by deconvolving a transfer function measured from simulations. Even more biased than the others due to ignoring mode coupling. destripe: Destriper in the maximum-likelihood limit (1-sample baselines with optimal baseline prior). Identical to ML.
Fig. 5.- Like figure 4, but with the input signal having the same nearest-neighbor pixelization as the models. In this case all models except FB detrans are unbiased.
ly = np.fft.fftfreq(N_scan)[:,None]*N_side/N_scan
lx = np.fft.fftfreq(N_scan)[None,:]*N_side/N_scan
l = (ly**2+lx**2)**0.5

With this we can define the signal power spectrum C_l = l⁻² and beam B_l = exp(−l²σ²/2), and draw signal realizations as

signal_map = np.fft.ifft2(np.fft.fft2(np.random.standard_normal((N_scan, N_scan)))*Cl**0.5*Bl).real
signal_tod = np.concatenate([signal_map.reshape(-1), signal_map.T.reshape(-1)])
pix_pat1 = (np.mgrid[:N_scan,:N_scan]*N_side/N_scan).reshape(2,-1)
pix_pat2 = pix_pat1[::-1]
pix = np.concatenate([pix_pat1, pix_pat2], 1)

Fig. 6.- Example signal and noise maps for the different mapmaking methods. The top left map is the input signal, which is directly evaluated at 4x higher resolution than what is used for reconstructing the output maps. All output maps were built from the same data and assume a nearest neighbor pointing matrix, except for the last row, where a bilinear pointing matrix was used. The color range is the same for all panels. The binned map (top right) is bias-free but has uselessly high noise. Maximum likelihood (ML) and destriping with short baselines (with (DS+) and without (DS) baseline amplitude prior, which only matters for the shortest baselines) are all low-noise but biased on large scales. This bias goes away as the noise model is artificially whitened (ML wX) or the baseline length is increased (for destriping), but this comes at the cost of increased noise on all scales. Bilinear mapmaking (last row) instead avoids the bias by eliminating most of the model error. It can be difficult to judge how significant the bias and noise levels are from these images. See figures 7 and 8 for easier to interpret comparisons of the signal and noise power spectra respectively.
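The 2D toys then need a pointing matrix mapping the flattened map to these samples. A minimal nearest-neighbor version (a sketch; the function name and wrapping convention are our own) is:

```python
import numpy as np
import scipy.sparse

def build_nn_P(pix, N_side):
    """Nearest-neighbor pointing matrix for samples at fractional pixel
    coordinates pix = (2, N_samp) on a periodic (N_side, N_side) map."""
    iy = np.floor(pix[0]).astype(int) % N_side
    ix = np.floor(pix[1]).astype(int) % N_side
    n  = ix.size
    return scipy.sparse.csr_array(
        (np.ones(n), (np.arange(n), iy*N_side + ix)), shape=(n, N_side**2))
```

Applied to a flattened map, P @ m.reshape(-1) reads off the value of the pixel each sample falls in, matching the nearest neighbor model of figure 2.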
Fig. 7.- Comparison of the subpixel bias of the different mapmaking methods described in section 2.2.1 and discussed in section 2.2.2. Standard maximum-likelihood (thick blue) is strongly biased, as are all but the (very noisy) longest destriping baselines. This bias disappears (ML) or is greatly reduced (DS) when switching to bilinear interpolation in the pointing matrix. See figure 8 for the corresponding noise spectra. The wavenumber l is dimensionless, with a Nyquist frequency of 0.5. We interpret the upturn at l > 0.4 as aliasing, which is expected and not relevant for the biases we consider here.
Fig. 8.- Comparison of the optimality of the methods defined in section 2.2.1, measured as the debiased noise power spectrum (using the bias measurements from figure 7) for each method relative to that of bilinear maximum-likelihood mapmaking. Note the logarithmic vertical axis, and that the binned case was divided by 10³ to bring it partially inside the plot bounds. The wavenumber l is dimensionless, with a Nyquist frequency of 0.5. It's clear that simple bias mitigation methods like artificially whitening the noise model or increasing the baseline length are too costly to be practical.
Fig. 9.- Comparison of the 1D pixel window for nearest neighbor mapmaking (red, sinc(l)) and linear mapmaking (blue). The 2D pixel window is the outer product of the 1D one along each axis. The pixel windows model the response of the Fourier coefficients to binned (unweighted) mapmaking. Square it to get the effect on the power spectrum.

inds = np.concatenate([iy1*N_side+ix1, iy1*N_side+ix2, iy2*N_side+ix1, iy2*N_side+ix2])
P_lin = scipy.sparse.csr_array((weights, (samps, inds)), shape=(N_samp, N_pix))

Bilinear mapmaking results in a different pixel window than the standard pixwin_nn = sinc(ly)[:,None]*sinc(lx)[None,:] of nearest neighbor mapmaking. We derive this in appendix A, and the result is pixwin_lin = linwin1d(ly)[:,None]*linwin1d(lx)[None,:], with linwin1d(l) = sinc(l)**2/(1/3*(2+cos(2*pi*l))). This is plotted in figure 9. Since each sample in bilinear mapmaking gets contributions from the four closest pixels, the pointing matrix is about four times slower than the standard nearest neighbor version. However, as shown in figures 7 and 8 this cost is worth it, with bilinear maximum-likelihood mapmaking being bias-free and even lower noise than standard ML. The bias is also greatly reduced for destriping, but not eliminated. Note that bilinear mapmaking (and in general any mapmaking method where each sample affects multiple pixels) breaks the diagonality of P^T P, which methods like binning, filter+bin and destriping rely on for their speed. While we have implemented bilinear variants of them here in order to investigate the effect on the bias, bilinear filter+bin and bilinear destriping are too slow to be useful in practice. This is in contrast to bilinear maximum-likelihood mapmaking, which only becomes a few times slower.
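The fragment above leaves the corner indices and interpolation weights implicit. A self-contained sketch of the full construction (function name and conventions ours; assumes periodic wrapping like the toy examples):

```python
import numpy as np
import scipy.sparse

def build_bilinear_P(pix, N_side):
    """Bilinear pointing matrix for samples at fractional pixel coordinates
    pix = (2, N_samp) on a periodic (N_side, N_side) map."""
    iy1 = np.floor(pix[0]).astype(int)
    ix1 = np.floor(pix[1]).astype(int)
    fy, fx = pix[0]-iy1, pix[1]-ix1          # subpixel offsets in [0, 1)
    iy1 %= N_side; ix1 %= N_side
    iy2, ix2 = (iy1+1) % N_side, (ix1+1) % N_side
    n = pix.shape[1]
    samps   = np.tile(np.arange(n), 4)
    inds    = np.concatenate([iy1*N_side+ix1, iy1*N_side+ix2,
                              iy2*N_side+ix1, iy2*N_side+ix2])
    weights = np.concatenate([(1-fy)*(1-fx), (1-fy)*fx, fy*(1-fx), fy*fx])
    return scipy.sparse.csr_array((weights, (samps, inds)), shape=(n, N_side**2))
```

Each row holds the four interpolation weights, which sum to one, so a constant map is reproduced exactly.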
Fig. 10.- Signal loss for a 1D toy example with noise properties similar to a ground-based CMB telescope in average weather (per-detector ℓ_knee = 1000) and a 10% gain error. The solid red curve is based on equation 19. The dashed blue curve is a simulation (gain/gain_model_error_toy_1d_sim.py). Unlike the other examples, the horizontal axis uses multipoles instead of Nyquist units to make it easier to compare to CMB observations.
Fig. 11.- Power loss for a 2D toy example with multiple detectors with inconsistent but constant gain errors. Plain unweighted binning (red) is unbiased, but both maximum-likelihood (yellow) and filter+bin (pink) lose much of the signal at large scales (low wavenumber). The light blue curves are maximum-likelihood with 10/100/1000 times as much white noise in the noise model, reducing the overall correlatedness of the noise. This results in less bias, but also much more noise, as shown in figure 12.
Fig. 12.- How noisy the different mapmaking methods are compared to maximum-likelihood for the 2D gain error toy example. Each graph is the ratio of the transfer-function-deconvolved noise spectrum for binned mapmaking (red), whitened maximum-likelihood (yellow) or filter+bin (pink) divided by the corresponding spectrum for standard maximum-likelihood. The whitened curves correspond to a noise model with a 10/100/1000 times exaggerated white noise floor for the bottom/middle/top light blue curve. Both these and binned mapmaking are too costly to be realistic solutions to the power loss problem. Note the logarithmic y axis, and that the binned curve has been divided by 10⁴ to fit in the graph.
After reviewing LAMBDA's list of CMB experiments, we found four > 60 GHz ground-based CMB telescope projects started after the year 2000 with published TT spectra for ℓ < 2000: ACBAR, ACT, BICEP and SPT. Of these, ACBAR, ACT (Choi et al. 2020) and SPT-3G (Balkenhol et al. 2022) cut their low-ℓ data, indicating problems recovering these scales. Meanwhile, BICEP (BICEP1 Collaboration 2013; BICEP2 and Keck Array Collaboration 2018) and SPTpol (Henning et al. 2018) publish TT spectra down to ℓ < 100 with no signs of problems. The only case where we could see clear evidence for large-scale power loss is in the multipoles just above the multipole cutoff in ACT, as seen in figure 13 (reproduced from Choi et al. (2020)).

Fig. 13.- Large-scale power loss in ACT DR4. The ACT TT power spectrum (blue) and the ACT-Planck cross-spectrum (orange) are both low at ℓ ≲ 800 compared to the Planck power spectrum (green) in the two ACT patches shown. This is especially clear in the third panel, which shows the difference between ACT and the ACT-Planck cross-spectrum. Reproduced from Choi et al. (2020).

APPENDIX A.
LINEAR PIXEL WINDOW

Let us start by considering the 1D case. Given a set of pixels {i} with values {v_i}, we define the linear interpolation value d at some pixel coordinate i + x, where x ∈ [0, 1), as

d = (1 − x)v_i + x v_{i+1}. (A1)

This is illustrated in figure 14.
Fig. 14.- Definition of the quantities involved in linear interpolation. Given values {v_i} sampled at locations {i}, we define the linear interpolation value d at some location i + x, where x ∈ [0, 1), as d = (1 − x)v_i + x v_{i+1}. We can generalize this to a set of samples d_a with d_a = (1 − x_a)v^left_a + x_a v^right_a.

Fig.15.-Structure of a linear-interpolation pointing matrix P and its transpose P T for the case of a set of samples making a single scan across a row of pixels in 1D, assuming wrapping boundary conditions.x ∈ [0, 1) is the subpixel coordinate of each sample.
Fig. 16.- Comparison of the 1D pixel windows for nearest neighbor mapmaking (W_nn) and linear mapmaking (W_lin), plotted up to the Nyquist frequency, which is 1/2 in these units.