Accredited official statistics

Background Quality Report: Income Tax statistics and distributions

Published 27 June 2024

1. Contact

  • Organisation unit - Knowledge, Analysis and Intelligence (KAI)

  • Name – M Brunning

  • Function - Statistician, Personal Taxes

  • Mail address – 100 Parliament Street, London, SW1A 2BQ

  • Email - [email protected]

2. Statistical presentation

2.1 Data description

This publication provides detailed statistics about individuals liable to UK Income Tax and their incomes for the tax year with the most recent outturn data, alongside projection estimates for the 3 subsequent tax years. These are estimated using the Survey of Personal Incomes (SPI), and include data on the number of individual Income Tax payers, Income Tax liabilities, and average rates of Income Tax.

2.2 Classification system

The published estimates are based on HMRC’s Survey of Personal Incomes (SPI) statistics. The SPI is carried out annually and is based on information held by HMRC on the income assessable for Income Tax for individuals who could be liable to UK Income Tax in a given tax year. Published breakdowns of the number of taxpayers, income, tax liabilities, allowances and deductions are determined based on data submitted by the individual in their Self Assessment Return or by their employer in the PAYE data.

A unique Income Tax payer reference assigned to each individual is used to aggregate the data. The data is subset by the following factors:

  • age (under or over 65, and over state pension age)

  • sex (male or female)

  • Government Office Region (the North East, North West, Yorkshire and Humberside, East Midlands, West Midlands, East of England, South East, London, South West, England, Scotland, Wales and Northern Ireland)

  • Marginal rate of tax (the rate of Income Tax paid on the next £1 of income)

  • Income percentile group

  • Income range

  • Rate of Income Tax (basic, higher, additional) paid on different types of income (earnings, savings, dividends)

For more information on data sampling and methodology, see Annex B of the Supporting Documentation for the relevant year. 

2.3 Sector coverage

Income Tax is an annual tax paid on most sources of income including pay from employment, profits from self-employment, private and occupational pensions, retirement annuities, state retirement pensions, foreign income, income from property, taxable social security income, savings income, income from shares (dividends) and income from trusts. Employees who receive non-cash benefits from their employers such as company cars, fuel, medical insurance, living accommodation or loans also pay Income Tax on these benefits.

Adding all these sources together will give an individual’s total income assessable for tax, an aggregate that appears in several tables in this publication. Some sources of income are not liable for Income Tax including certain social security benefits, Child Tax Credit and Working Tax Credit, and income from tax exempt savings accounts (such as Individual Savings Accounts (ISAs) and some National Savings & Investment products). Most people in the UK get a Personal Allowance which is the amount of income on which no tax will be charged. Some people are also eligible for tax reliefs.

2.4 Statistical concepts and definitions

Once tax-free allowances have been taken into account, Income Tax due is calculated using different Income Tax rates for specific types of income across a series of Income Tax bands. There are 3 different sources of income for Income Tax purposes:

  • earnings, or income other than savings and dividends, also known as non-savings non-dividends (NSND) income (for more information please see the Glossary of the Supporting Documentation for the relevant year)

  • savings income (such as bank and building society interest)

  • dividends (such as income from shares in UK companies)

Income Tax powers over earned income have been devolved to Scotland and Wales. In the rest of the UK (rUK), earned income is taxed within three main bands of Income Tax rates: the basic rate, the higher rate and the additional rate. In Wales, earned income is taxed within the same bands as the rest of the UK but the rates can be different. In Scotland, earned income is taxed within 6 main bands of rates: the starting rate, basic rate, intermediate rate, higher rate, advanced rate and top rate. Savings and dividend income are taxed on the same basis across the whole of the UK, with the bands aligned to the rUK bands for earned income, although the rates for dividends can be different. Some basic rate taxpayers are also eligible for a starting rate for savings.

Income Tax typically works on a ‘stack’ basis. This means that earnings are generally taxed first, then savings income and finally dividend income. For example, if an individual has earnings after allowances sufficient to completely fill the basic rate Income Tax band, all savings or dividend income would be charged at the higher or additional rates of tax.
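The stacking order can be illustrated with a minimal sketch. The band limit below is an illustrative placeholder rather than the parameter for any particular tax year, the function name is hypothetical, and features such as the Personal Savings Allowance, the dividend allowance and the starting rate for savings are ignored.

```python
# Illustrative only: an assumed band width, not a statement of actual tax-year parameters.
BASIC_BAND_LIMIT = 37_700  # assumed width of the basic rate band (taxable income)

def stack_income(earnings, savings, dividends, band_limit=BASIC_BAND_LIMIT):
    """Split each income type into the part falling in the basic rate band
    and the part above it, stacking earnings, then savings, then dividends.

    All amounts are taxable income after allowances.
    """
    result = {}
    used = 0  # income already stacked by earlier income types
    for name, amount in (("earnings", earnings), ("savings", savings), ("dividends", dividends)):
        # Only the unused part of the band is available to this income type.
        in_band = max(0, min(amount, band_limit - used))
        result[name] = {"basic": in_band, "above_basic": amount - in_band}
        used += amount
    return result
```

Under these assumptions, an individual whose earnings alone exceed the band limit has every pound of savings and dividend income allocated above the basic rate band.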

Further detail on how Income Tax liabilities are calculated is provided in Annex A of the Supporting Documentation, and a full list of definitions of terms used in this publication can be found in the Glossary of Terms section in the Supporting Documentation for the relevant year. 

2.5 Statistical unit

The unit in the statistics release is the individual Income Tax payer in the UK.

2.6 Statistical population

All individuals liable to pay Income Tax in the UK. Individuals who may have income but are not liable for Income Tax are excluded; this may occur if the individual has no Income Tax liability due to their deductions, reliefs and Personal Allowances exceeding their total income or if their income is below their Personal Allowance.

2.7 Reference area

The geographic region covered by the data is the United Kingdom (UK).

2.8 Time coverage

The statistics primarily focus on the new outturn data from the most recent SPI and the projected estimates for the three subsequent tax years (up to the current tax year at time of publication). Some tables present statistics from tax year 1990 to 1991 onwards. Note that data for 2008 to 2009 is not available.

3. Statistical processing

3.1 Source data

The published estimates of the number of individuals subject to UK Income Tax with positive Income Tax liabilities (hereafter referred to as Income Tax payers) and the magnitude of those liabilities are based on HMRC’s Survey of Personal Incomes (SPI) statistics. The SPI is carried out annually and is based on information held by HMRC on the income assessable for Income Tax for individuals who could be liable to UK Income Tax in a given tax year.

For each sample individual the SPI includes information on incomes assessable for Income Tax along with basic information on individual characteristics such as age and sex. The survey data is used to estimate Income Tax liabilities arising on incomes in a given tax year for each individual in the SPI sample; these amounts are summarised in Tables 2.1 to 2.6. The data in this release is based on the latest SPI tax year and earlier, and includes projections of this data for the three subsequent tax years.

The SPI data is currently sampled from two HMRC operational computer systems:

  • the National Insurance and PAYE Service (NPS) system covers all employees and occupational pension recipients with a Pay As You Earn (PAYE) record

  • The Computerised Environment for Self Assessment (CESA) system covers people with self-employment, rental or untaxed investment income. It also covers those with higher incomes and other people with complex tax affairs. Where people have both NPS and CESA records, their CESA record is selected because it provides a more complete picture of their taxable income.

The SPI previously also sampled from another HMRC operational computer system (but this is no longer the case from the tax year 2019 to 2020):

  • the Claims system includes individuals without NPS or CESA records that have had too much Income Tax deducted at source and claim a repayment. For operational efficiencies, R40 Claim forms were migrated onto NPS. The last survey for which samples were drawn from the Claims system was the 2015 to 2016 SPI.

Additional data about the sample cases are also collected from other HMRC administrative systems, as follows:

  • The PAYE Real Time Information (RTI) system covers submissions from employers and pension providers to report Income Tax and National Insurance contributions before they pay wages or pensions to employees and pensioners. This has been used as the source of ‘net pay’ pension contributions in the SPI.

3.2 Frequency of data collection

The SPI dataset is compiled on an annual basis and is usually available around 23 months after the end of the tax year. The raw data is drawn from HMRC’s systems approximately one year after the tax year end and it takes about a year to process, analyse and produce the SPI publication.

3.3 Data collection

Separate samples are drawn from each of the administrative systems and different sampling strategies are used for each, which reflect the skewed distribution of Income Tax liabilities. The samples are structured as follows:

  • the PAYE population from NPS is stratified by age, sex and the sum of pay plus occupational pension income for the previous tax year. Where the previous year’s income is not available, cases are stratified by sex and by whether they are a higher rate or additional rate taxpayer for the current tax year based on information available at the time the sample was drawn. Approximately 400,000 individuals in the SPI sample are selected from NPS. See the Supporting Documentation for the relevant year for further information on the sample counts and the associated sampling rates.

  • for the Self Assessment population from CESA, the main source of income (self-employment or employment/occupational pension) and ranges of income and tax are used to stratify the sample. Approximately 450,000 individuals in the SPI sample are selected from CESA. See the Supporting Documentation for the relevant year for further information on the sample counts and the associated sampling rates.

  • from the tax year 2019 to 2020 onwards, a separate sample for the claims population is no longer included. The majority of claims cases were non-taxpayers and were therefore excluded from the statistical tables in this publication.

Some individuals with a PAYE record are also in the SA system. These individuals are excluded from the PAYE population prior to sampling, as their SA record provides a more complete picture of their taxable income.
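As a minimal sketch of this exclusion step, the PAYE sampling frame can be thought of as the PAYE references with the Self Assessment references removed. The function name and the use of plain reference strings are assumptions made for illustration.

```python
def paye_sampling_frame(paye_refs, sa_refs):
    """Return the PAYE references that do not also appear in Self Assessment.

    Both arguments are iterables of taxpayer references (hypothetical format).
    The order of the PAYE references is preserved.
    """
    sa = set(sa_refs)  # set membership keeps the filter fast for large populations
    return [ref for ref in paye_refs if ref not in sa]
```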

Once data are collected for the constituent parts of the sample, the datasets are joined together.

The sampling strategies described above intentionally yield large sub-samples of SPI cases with very high incomes, which account for a large proportion of total Income Tax liabilities. This increases the precision of estimates of liabilities and taxable incomes drawn from the SPI. After allowing for non-response and for records that failed data validation tests, the SPI contains 800,000 to 900,000 records, representing approximately 1.5% of individuals in contact with HMRC. For exact population counts, please refer to the Supporting Documentation for the relevant year.
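The idea of sampling high-income strata at higher rates can be sketched as follows. The strata boundaries and sampling rates are hypothetical and are not the SPI design, which stratifies on several variables; the sketch only shows how a higher sampling rate for a stratum goes together with a smaller weight per sampled case.

```python
import random

# Hypothetical strata: (lower income bound, sampling rate). Not the SPI design.
STRATA = [(0, 0.01), (50_000, 0.10), (150_000, 1.0)]

def sampling_rate(income):
    """Return the sampling rate of the stratum that the income falls into."""
    rate = STRATA[0][1]
    for lower, stratum_rate in STRATA:
        if income >= lower:
            rate = stratum_rate
    return rate

def draw_sample(population, seed=0):
    """Select each record with its stratum's probability and attach a
    weight equal to the inverse of that probability."""
    rng = random.Random(seed)
    sample = []
    for record in population:
        rate = sampling_rate(record["income"])
        if rng.random() < rate:
            sample.append({**record, "weight": 1 / rate})
    return sample
```

In this sketch every case in the top stratum is selected with a weight of 1, while cases in the lowest stratum each stand for 100 individuals.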

3.4 Data validation

Checks carried out on the SPI include:

  • Automated checks take place when loading the data into the analysis database.

  • Analysts check that the number of records loaded into the analysis database is as expected.

  • Data validation checks are performed on date of birth, sex, income and postcode. Data are checked against other internal systems and in some cases are validated manually.

  • Analysts check any outliers in the data which are then examined on a case-by-case basis. Outlier checks sometimes result in adjustments to the dataset as required to improve accuracy or prevent skewing.

Checks carried out when projecting the SPI to future years include:

  • Analysts check that the calibration targets for pay, savings and dividends have been met.

  • Year-on-year growth for the component parts of pay, net interest rates and dividends is measured to ensure it is in line with trends in the OBR’s economic assumptions.

  • The time series of all projected years is reviewed to check the growth in Income Tax liabilities and total Income Tax payers is as expected.

Any large or unexpected changes in the number of Income Tax payers, amount of income or Income Tax liability from one statistical release to the next are investigated.

3.5 Data compilation

Imputation of characteristics

HMRC does not have complete information on the age and sex of Income Tax payers. Where no information is held, estimated values are imputed.

Imputation of savings income

The coverage of savings income for the sample drawn from NPS prior to 2018 to 2019 was incomplete. This is because most Income Tax payers with savings income do not report it to HMRC:

  • prior to April 2016, banks and building societies deducted tax at a basic rate on interest and paid this to HMRC. Only individuals below the Personal Allowance or above the higher rate threshold needed to report interest to HMRC to ensure that the correct tax was paid

  • post April 2016, most interest income is covered by a combination of the Personal Savings Allowance, the Personal Allowance, and the starting rate for savings and therefore is not liable to Income Tax. Those that do need to pay Income Tax on their savings income do so by contacting HMRC to report their savings income, where this information has not already been provided through Self Assessment

HMRC collects data on savings income directly from banks and building societies. From 2018 to 2019 onwards, this income data feeds into the NPS system so that HMRC can collect the appropriate tax without Income Tax payers needing to contact HMRC directly. From the 2018 to 2019 SPI onwards, this data has replaced the previous method, which estimated savings interest by imputation.

Previous savings imputations (prior to the 2018 to 2019 SPI) followed a similar method to the dividend imputation outlined below; however, the targets for the total number of individuals and the total interest received were taken from data provided by the UK banks.

Between the 2016 to 2017 and 2018 to 2019 releases, the claims component was imputed by selecting the claims cases that had no employments from the tax year 2015 to 2016 survey (the last year data was available). The income for these cases was projected using Office for Budget Responsibility determinants in order to estimate the level of income and the tax due in the respective tax year.

Imputation of dividend income

In order to create a full picture of total income for the SPI, it is necessary to impute values of dividend income to some sample cases. For the dividends imputation, the amount for each SPI case:

  • is known for cases in Self Assessment from the amount declared on the Self Assessment Return

  • can be inferred or estimated reasonably for NPS cases where there is an adjustment to the tax code for taxpayers

  • is unknown for NPS cases where there is no coding adjustment

Where no information at case level is available from HMRC administrative systems, estimated values are imputed to cases so that the population as a whole has amounts consistent with evidence from other sources.

Starting from control totals at UK level for the number of cases and total amount of dividends, the Self Assessment and NPS cases with coding adjustments are deducted to leave targets for the remainder of the taxpayer population. These targets are at UK level – no attempt is made to control the targets to sub-UK geographical units. For dividend income, the number of non-Self Assessment cases with dividend income and the distribution of imputed amounts were inferred from Family Resources Survey data for the relevant tax year. The cases to which amounts are attached by the imputation process and the amounts attached are determined by probabilistic methods with just the UK targets and distributions in mind.
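The two steps described above (deriving a residual target, then attaching amounts probabilistically) can be sketched as follows. This is an illustration under stated assumptions and not HMRC's method: the function names are hypothetical, and an exponential distribution stands in for the distribution of amounts that the source infers from the Family Resources Survey.

```python
import random

def remaining_target(control_total, self_assessment_total, nps_coded_total):
    """Target left for cases with no case-level information."""
    return control_total - self_assessment_total - nps_coded_total

def impute_amounts(weights, target_cases, target_amount, seed=0):
    """Randomly flag cases as recipients, then scale the drawn amounts so that
    the weighted total matches the target amount.

    weights: grossing factor of each case with unknown dividend income.
    Returns a list of imputed amounts (0.0 for cases not selected).
    """
    rng = random.Random(seed)
    # Selection probability chosen so the expected weighted count hits the target.
    share = min(1.0, target_cases / sum(weights))
    # Assumed exponential shape for amounts; the real distribution would differ.
    draws = [rng.expovariate(1.0) if rng.random() < share else 0.0 for _ in weights]
    weighted_total = sum(w * d for w, d in zip(weights, draws))
    if weighted_total == 0:
        return draws  # nothing selected, so nothing to scale
    scale = target_amount / weighted_total
    return [d * scale for d in draws]
```

The sketch controls only the UK-level amount exactly; the number of recipient cases matches the target in expectation rather than exactly.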

Imputation of pension income

As with dividend income, HMRC does not have complete information about superannuation or personal pension contributions. Pension contributions can be made under 3 types of arrangement: a net pay scheme, a relief at source scheme or a salary sacrifice scheme.

HMRC holds information on the value of employee pension contributions paid under “net pay arrangements” in Real Time Information (RTI) submissions by their employer. This data has been used to match SPI cases to “net pay” pension contributions. Pension schemes operating a net pay scheme are occupational pension schemes. However, as some employers operate relief at source or salary sacrifice schemes, contributions to those schemes are not included in the “net pay” figures and thus the “net pay” figures do not include all “occupational” individual contributions.

The RPSCOM100(Z) is an annual return completed by pension scheme administrators which provides HMRC with information about individual contributions made to relief at source pension accounts in the tax year. It is used to match relief at source pension contributions to cases in the SPI, including basic rate tax relief. Where applicable, information submitted via RTI and Self Assessment returns has been used. Individual contributions recorded in the different data sources have been adjusted, where necessary, to include basic rate tax relief. Some adjustments are made to align total contribution values with the APSS106. The APSS106 is a form used by pension scheme administrators to make annual claims for the recovery of tax deducted from individuals.

A change was made to the classification of pension contributions in the tax year 2017 to 2018, to better reflect their treatment in the tax system. As a result of the methodological improvement, the new pension contributions statistics (net pay and relief at source) are not comparable with statistics under the previous classifications (occupational and personal).

The methodology for estimating contributions to relief at source pensions was revised in the 2021 to 2022 tax year so it better aligns with the methodology used in HMRC’s Private Pension statistics. In previous tax years, only information from Self Assessment returns was used to match Self Assessment cases.

In addition, in the 2020 to 2021 tax year only, relief at source pension contributions matched to PAYE cases in the SPI were net of basic rate tax relief, whereas in previous tax years and the 2021 to 2022 tax year, contributions were gross of basic rate tax relief. It is estimated that including basic rate tax relief would increase total individual relief at source contributions in the 2020 to 2021 SPI by around £1.5 billion, to £12.9 billion.
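The relationship between a contribution net of basic rate tax relief and the same contribution gross of relief can be written as a short calculation. The sketch assumes a basic rate of 20% and uses hypothetical function names; it illustrates the arithmetic only and is not the adjustment applied in the SPI.

```python
BASIC_RATE = 0.20  # assumed basic rate of Income Tax

def gross_of_basic_rate_relief(net_contribution, basic_rate=BASIC_RATE):
    """Convert a relief at source contribution paid net of basic rate
    relief into the gross amount including that relief."""
    return net_contribution / (1 - basic_rate)

def basic_rate_relief(net_contribution, basic_rate=BASIC_RATE):
    """Relief added on top of the net contribution."""
    return gross_of_basic_rate_relief(net_contribution, basic_rate) - net_contribution
```

Under this assumption, a net contribution of £80 corresponds to a gross contribution of about £100, of which about £20 is basic rate relief.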

Additionally, the SPI includes contributions made to retirement annuity contracts and contributions made to employer’s schemes not deducted at source.

Employers, individuals and scheme providers are not required to report individual contributions made using salary sacrifice to HMRC. These contributions are deducted from an individual’s gross earnings and added to the contributions made by their employer. Individual contributions made using salary sacrifice arrangements are not included in this publication.

The estimated values for “relief at source” and “net pay” contributions have been combined with other pension reliefs and included in these statistics. For more information on these pensions data sources, please refer to the latest methodology document for the Private pension statistics release.

Imputation of Marriage Allowance

HMRC collects data regarding claimants (receivers and transferers) of Marriage Allowance through coding adjustments for those in NPS or via Self Assessment returns. The latest available administrative data is matched to the SPI sample data allowing for the calculation of tax liabilities adjusting for Marriage Allowance.

The SPI sample is not stratified around any subsets of populations including Marriage Allowance claimants, and therefore when grossed up and subset for just Marriage Allowance claimants it does not exactly match the population of claimants separately estimated and published using the collected administrative data. To calibrate to published Marriage Allowance claimants, estimated values are imputed to cases so that the population as a whole has amounts consistent with the evidence from these other sources.

Starting from published estimates at UK level for the number of cases, the Self Assessment and NPS cases with coding adjustments are deducted to leave targets for the remainder of the claimant population. These targets are at UK level – no attempt is made to control the targets to sub-UK geographical units. For Marriage Allowance, the number of eligible claimant cases was inferred from Family Resources Survey data. The imputation process attaches received or transferred claims to cases so that the totals align with the published estimates of take-up.

Grossing

The sample is drawn from records held on HMRC transactional systems and the available information reflects what is known about the cases approximately one year after the tax year to which the survey relates. Allowance is made for Self Assessment cases yet to file a return and the overlap between the Self Assessment and PAYE systems when estimating the likely final grossed population for the tax year. The SPI data reflects the information held on HMRC systems at the time the sample was drawn; therefore, values associated with some cases, particularly in Self Assessment, could continue to evolve after the survey is completed.

Each SPI sample case has a grossing factor associated with it and these are used to create estimates of overall numbers of Income Tax payers, total income and total Income Tax liabilities for the entire UK population. Grossing factors vary depending on different factors, for example where the sample case data was sourced from (PAYE or Self Assessment), income type, and where in the income distribution the sample individual sits.
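As a minimal sketch of how grossing factors turn sample cases into population estimates, each case can be counted once per unit of its grossing factor. The record keys used here are hypothetical, and the sketch treats only cases with a positive liability as Income Tax payers, in line with the definition used in this report.

```python
def grossed_estimates(sample):
    """Weighted population estimates from sample cases.

    Each case is a dict with hypothetical keys:
    'grossing_factor', 'income' and 'liability'.
    Only cases with a positive liability count as Income Tax payers.
    """
    payers = [c for c in sample if c["liability"] > 0]
    return {
        "taxpayers": sum(c["grossing_factor"] for c in payers),
        "total_income": sum(c["grossing_factor"] * c["income"] for c in payers),
        "total_liability": sum(c["grossing_factor"] * c["liability"] for c in payers),
    }
```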

When projecting the SPI to future years, the grossing factors are changed in line with meeting targets for pay, dividends and net interest rates.

Modelling Income Tax liabilities with the Personal Tax Model

Data on the number of Income Tax payers, total Income Tax liabilities, and the distribution of Income Tax liabilities presented in Tables 2.1 to 2.6 are estimated using HMRC’s Personal Tax Model (PTM).

The PTM is a micro simulation model of the UK Income Tax system. ‘Micro simulation’ refers to modelling with individual level data, in this case using the SPI dataset. For each SPI sample case, the PTM models Income Tax liabilities in a given tax year based on incomes assessable for Income Tax and the main features and parameters of the Income Tax system for that year.
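The general shape of a micro simulation can be sketched as applying one year's tax parameters to every sample case and grossing up the results. This is not the PTM: the single-band parameter set, the class and function names, and the input layout are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaxParameters:
    """Hypothetical single-band parameter set; real systems have more bands."""
    personal_allowance: float
    basic_rate: float

def case_liability(income, params):
    """Liability for one case: tax on income above the allowance."""
    taxable = max(0.0, income - params.personal_allowance)
    return taxable * params.basic_rate

def simulate(sample, params):
    """Apply the tax rules to every (income, grossing_factor) pair and
    return the grossed total liability."""
    return sum(gf * case_liability(income, params) for income, gf in sample)
```

Running `simulate` on the same sample with two different `TaxParameters` objects gives a simple comparison of total liabilities under two sets of rules.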

An overview of the PTM modelling process applied to each SPI sample case is provided in Annex B of the Supporting Documentation for the relevant year.

Projection estimates for tax years beyond the SPI

This publication includes projection estimates up to the current tax year to provide a more up-to-date assessment of the distributions for Income Tax payers and Income Tax liabilities. While the projection methods aim to capture the most important influences on Income Tax payer numbers and Income Tax liabilities, the projection of the base SPI data to later years means that data for these years is subject to greater uncertainties and larger error margins than the outturn data. Projections beyond the current tax year are not provided because Income Tax rates, allowances and thresholds impacting on the statistics are not known until announced by the Government.

The economic series used in the projection processes are consistent with the OBR’s March 2024 Economic and Fiscal Outlook for the UK economy. Outturns and OBR forecasts for key series including employment, earnings, prices and interest rates are found in Table 2.9 ‘Determinants of the fiscal forecast’. Population projections used in this release are published by the ONS.

Estimated Income Tax payer numbers in the projection years are calculated by rescaling the base year grossing factors for individual SPI sample cases, according to a high-level partition of the SPI sample by each case’s main income source. Nominal income amounts recorded for each SPI sample case are projected at the UK level using OBR’s most recently published forecasts for the macroeconomic data series relevant to the income sources recorded in the SPI. For each income source, this uprating is generally uniform across all sample cases; however, in the case of pay, the projection factors vary across the pay distribution according to the recent trends revealed in RTI data.

No distinction is made for sex or any factor other than income. Since the RTI and SPI samples are different, the resulting mean earnings growth across all SPI cases would differ from the OBR forecast. Therefore, a further rescaling is applied to all sample cases to ensure that mean earnings growth does align with the OBR forecast.
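The two-step uprating of pay described above (growth that varies across the distribution, followed by a uniform rescaling to a forecast) can be sketched as follows. The function name and inputs are hypothetical, and the sketch holds grossing factors fixed, whereas the projection process described in this report also rescales them.

```python
def project_pay(pay, weights, band_growth, forecast_growth):
    """Uprate pay with case-specific growth factors, then rescale all cases
    so that weighted mean growth equals the forecast.

    pay, weights, band_growth: equal-length lists, one entry per sample case.
    band_growth: growth factor for each case (for example 1.05 for 5%).
    forecast_growth: target growth factor for weighted mean pay.
    """
    base_total = sum(w * p for w, p in zip(weights, pay))
    uprated = [p * g for p, g in zip(pay, band_growth)]
    uprated_total = sum(w * u for w, u in zip(weights, uprated))
    # One common factor brings the weighted total (and so the mean) onto target.
    rescale = forecast_growth * base_total / uprated_total
    return [u * rescale for u in uprated]
```

The final rescaling preserves the relative differences in growth across the pay distribution while ensuring that the overall mean matches the forecast.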

For more detail on the projection methodology please see Annex B of the Supporting Documentation for the relevant year.

Dividend Income Tax adjustments

Policy changes to dividend tax rates and allowances in 2016 to 2017 resulted in behavioural changes by some individuals with this type of income. Adjustments have been made to the SPI data and projected estimates to account for this behavioural response; for more details please see Annex B of the Supporting Documentation for the relevant year.

4. Quality Management

4.1 Quality assurance

All official statistics produced by KAI must meet the standards in the Code of Practice for Statistics produced by the UK Statistics Authority and all analysts adhere to best practice as set out in the ‘Quality’ pillar.

Analytical quality assurance (QA) describes the arrangements and procedures put in place to ensure analytical outputs are error free and fit-for-purpose. It is an essential part of KAI’s way of working as the complexity of our work and the speed at which we are asked to provide advice mean there is a high risk of error, which can have serious consequences for KAI’s and HMRC’s reputation, decisions, and in turn people’s lives.

Every piece of analysis is unique, and as a result there is no single QA checklist that contains all the QA tasks needed for every project. Nonetheless, analysts in KAI use a checklist that summarises the key QA tasks and is used as a starting point for teams when they are considering what QA actions to undertake.

Teams amend and adapt it as they see fit to take account of the level of risk associated with their analysis and the different QA tasks that are relevant to the work.

At the start of a project, during the planning stage, analysts and managers make a risk-based decision on what level of QA is required.

Analysts and managers construct a plan for all the QA tasks that will need to be completed, along with documentation on how each of those tasks is to be carried out, and turn this list into a QA checklist specific to the project.

Analysts carry out the QA tasks, update the checklist, and pass onto the Senior Responsible Officer for review and eventual sign off.

4.2 Quality assessment

The QA for this project adhered to the framework described in ‘4.1 Quality assurance’ and the specific procedures undertaken were as follows:

Stage 1 – Specifying the question

Up to date documentation was agreed with stakeholders setting out outputs needed and by when; how the outputs will be used; and all the parameters required for the analysis.

Stage 2 – Developing the methodology

Methodology was agreed and developed in collaboration with stakeholders and others with relevant expertise, ensuring it was fit for purpose and would deliver the required outputs.

Stage 3 – Building and populating a model/piece of code

  • analysis was produced using the most appropriate software and in line with good practice guidance

  • data inputs were checked to ensure they were fit-for-purpose by reviewing available documentation and, where possible, through direct contact with data suppliers

  • QA of the input data was carried out

  • the analysis was audited by someone other than the lead analyst – checking code and methodology

Stage 4 – Running and testing the model/code

  • results were compared with those produced in previous years and differences understood and determined to be genuine

  • results were compared with comparable independent estimates, and differences understood

  • results were determined to be explainable and in line with expectations

Stage 5 – Drafting the final output

  • checks were completed to ensure internal consistency (e.g. totals equal the sum of the components)

  • the final outputs were independently proof read and checked
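An internal consistency check of the kind listed under Stage 5 can be sketched as a comparison between a reported total and the sum of its components. The function name and tolerance are assumptions; published tables are rounded, so in practice a larger tolerance may be needed.

```python
import math

def components_match_total(components, total, rel_tol=1e-9):
    """Return True if the components sum to the reported total,
    allowing for floating-point rounding."""
    return math.isclose(math.fsum(components), total, rel_tol=rel_tol)
```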

5. Relevance

5.1 User needs

The statistics in this publication are used by a variety of organisations mainly concerned with Government decision making about Income Tax policy, both in a policy making and policy monitoring context. The United Kingdom Statistics Authority Monitoring Brief 6/2010 The Use Made of Official Statistics provides a generic framework for classes of use of Official Statistics.

The projection estimates form the basis for HMRC’s detailed assessments of the Exchequer costs and impacts on individuals of potential changes to the Income Tax system. This informs the Government’s Income Tax policy decisions, and they are used by other Government departments for similar purposes. They are also used by Parliament, Government departments such as HM Treasury, some private organisations including policy ‘think tanks’, and the media and other commentators to monitor Income Tax trends and distributions. They inform, for example, users’ assessments of the impacts of past Income Tax policy changes or the sustainability of the UK public finances. For some users, such as the OBR, the statistics are used explicitly for economic and Income Tax forecasting, informing assessments of recent trends or used as specific inputs to the forecasting process.

The statistics are also used by HMRC and other organisations in assessments of the operation of the UK Income Tax system and its impact on individuals.

5.2 User satisfaction

Formal investigations into user satisfaction have not been undertaken; however, feedback from users following the release has been received, and KAI are always open to ideas for new analysis to meet changing user requirements.

The recent user consultation, published in 2022, included seeking views on reducing the coverage of this publication. As a result, Section B presenting data from Table 2.7 as a complement to the SPI-based statistics has been discontinued, but the legacy data is still available on the publication page. HMRC has data suggesting that over two-thirds of tax credit claimants have already moved to Universal Credit. As the impact of tax credits was the main differentiating feature of the selection of example individual and couple scenarios and earnings levels in Table 2.7, it was felt this represents a high enough proportion no longer claiming tax credits to support the discontinuation of its production. Details are set out in the consultation document, responses and summary of changes.

The previous user engagement exercise ran from November 2017 to July 2018, and as part of this the frequency of the publication was permanently changed to once a year. Only a very limited number of responses were received without any objection to the change in frequency (more details were set out in the June 2019 edition of this publication).

User comments are reviewed regularly, and results of most surveys and consultations are published, including a previous survey of users of HMRC Income Tax statistics. In 2020, an independent review of HMRC’s official statistics was undertaken and recommended a reduction in the ‘number or size’ of published statistics to ensure resources are being used effectively. For this publication, it was suggested to remove projection years from the published tables. In response to the review, HMRC conducted a consultation which concluded that for this publication the projection years would continue to be published alongside the new outturn data.

We are committed to providing impartial quality statistics that meet our users’ needs. While HMRC has regular contact with key users of the Income Tax Liabilities Statistics publication within Government, we would like to improve our knowledge of the use made of this publication, particularly by private sector organisations and individuals. We encourage users to provide feedback on their use of these statistics including their specific requirements, any improvements they would like to see or gaps they have identified, and any decisions they may inform.

5.3 Completeness

It is a legal requirement that all individuals who are liable to Income Tax pay the tax due either through PAYE or through Self Assessment. Penalties exist for non-compliance.

It is likely that some Self Assessment cases will not yet have filed a return when the data are drawn from transactional systems. However, allowances are made for late-filed returns when estimating the likely final grossed population for the tax year. The statistics contained in this report can therefore be considered as complete. More information on the approach taken can be found in the Supporting Documentation for the Personal Income statistics.

6. Accuracy and reliability

6.1 Overall accuracy

These statistics and analyses are based on administrative data and use a sample database that is designed to represent the UK Income Tax paying population. Accuracy is addressed by eliminating non-sampling errors as much as possible through adherence to the quality assurance framework. Moreover, the SPI sampling methodology is constantly reviewed and refined to improve the accuracy and reliability of the sample, and to reduce sampling error.

The key potential sources of error are:

  • individuals entering incorrect information on their Self Assessment return or organisations entering incorrect information when submitting PAYE information

  • individuals not completing their Self Assessment return by the required date

  • the stratified sampling process used for the SPI, which reflects the skewed distribution of Income Tax liabilities across the UK population. This is described in sections 4 and 10 of the Supporting Documentation for the relevant year

  • the imputation process for missing age, sex and dividend data, described in detail in Annex B: Coverage of the SPI and missing data in the Supporting Documentation (https://www.gov.uk/government/collections/income-tax-statistics-and-distributions) for the relevant year

  • the grossing factors which are used to scale the SPI data to the UK Income Tax paying population (see Annex B: Grossing factors in the Supporting Documentation for the relevant year)

  • the production of projected estimates for tax years beyond the current SPI, which are based on forecasts and economic assumptions (though RTI and other data are incorporated where possible). The methodology for projected estimates used in these statistics is described in Annex B: Projected estimates for tax years beyond the SPI in the Supporting Documentation for the relevant year

  • mistakes in the programming code used to analyse the data and produce the statistics

6.2 Sampling error

These statistics are produced from the annual SPI, the purpose of which is to create a dataset that is representative of the UK Income Tax paying population that can be used to infer the size of that population and the estimated liabilities of all Income Tax payers. As the SPI is a sample and does not include the whole population of Income Tax payers, estimates drawn from the SPI are subject to sampling variation and will differ from the actual figures purely by chance. A stratified random sample is drawn across the NPS and CESA transactional systems. Cases are categorised by income band and other characteristics. Categories involving higher incomes tend to be sampled more intensively to improve the precision in estimates of total income. Confidence Intervals are published for sub-UK estimates.
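The sampling and grossing ideas described above can be illustrated with a minimal sketch. It is not the SPI methodology: the stratum names, sampling fractions and the `stratified_sample` function are hypothetical, chosen only to show how sampling higher-income strata more intensively interacts with grossing factors.

```python
import random

def stratified_sample(population, sampling_fractions, seed=0):
    """Draw a stratified random sample and attach a grossing factor to each case.

    population: list of dicts with a 'stratum' key (hypothetical income bands).
    sampling_fractions: dict mapping stratum -> fraction of cases to sample.
    """
    rng = random.Random(seed)
    strata = {}
    for case in population:
        strata.setdefault(case["stratum"], []).append(case)

    sample = []
    for stratum, cases in strata.items():
        n = max(1, round(len(cases) * sampling_fractions[stratum]))
        n = min(n, len(cases))
        # Each sampled case represents this many cases in the population.
        grossing_factor = len(cases) / n
        for case in rng.sample(cases, n):
            sample.append({**case, "grossing_factor": grossing_factor})
    return sample

# Illustrative population: many lower-income cases, few higher-income ones.
population = (
    [{"stratum": "low", "income": 20_000} for _ in range(1000)]
    + [{"stratum": "high", "income": 500_000} for _ in range(20)]
)
# Assumed fractions: the higher-income stratum is sampled more intensively.
fractions = {"low": 0.01, "high": 0.5}

sample = stratified_sample(population, fractions)
estimated_population = sum(c["grossing_factor"] for c in sample)
estimated_total_income = sum(c["income"] * c["grossing_factor"] for c in sample)
```

Summing the grossing factors recovers the population size, and weighting incomes by them gives an estimate of total income. In this toy example incomes are identical within each stratum, so the estimate matches the true total exactly; with real data it would vary from sample to sample.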

To quantify the sampling error associated with the statistics presented in this publication, 95% confidence intervals were calculated. A confidence interval is a range of values within which there is reasonable certainty that the true value lies. A 95% confidence interval means that if the population were sampled repeatedly and an interval calculated from each sample, around 95% of those intervals would be expected to contain the true value. The 95% confidence intervals are based on standard error calculations; the standard error is the standard deviation of the sampling distribution of an estimate and is a measure of the precision of that estimate.
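As a rough illustration of how a 95% confidence interval follows from a standard error, the sketch below uses the normal approximation (a multiplier of 1.96) on a simple random sample. This is a simplifying assumption: the SPI is a stratified sample, so the published intervals would need to account for strata and grossing factors. The income values and the function name `confidence_interval_95` are made up for the example.

```python
import math
import statistics

def confidence_interval_95(values):
    """Return (mean, lower, upper) for a 95% confidence interval of the mean.

    Assumes a simple random sample and a normal approximation (z = 1.96).
    """
    n = len(values)
    mean = statistics.fmean(values)
    # Standard error of the mean: sample standard deviation over sqrt(n).
    standard_error = statistics.stdev(values) / math.sqrt(n)
    margin = 1.96 * standard_error
    return mean, mean - margin, mean + margin

# Hypothetical sample of incomes, for illustration only.
incomes = [18_000, 22_500, 25_000, 31_000, 40_000, 27_500, 35_000, 29_000]
mean, lower, upper = confidence_interval_95(incomes)
print(f"mean = {mean:.0f}, 95% CI = ({lower:.0f}, {upper:.0f})")
```

A larger sample or a less variable stratum gives a smaller standard error and therefore a narrower interval.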

6.3 Non-sampling error

Coverage error

The SPI is representative of UK Income Tax payers.

For the sub-sample of individuals drawn from PAYE, a number of data items are not recorded in the administrative tax records. These data items are not collected because they are not needed for the operation of the Income Tax system. These missing data items include:

  • HMRC does not have complete information on age and sex of Income Tax payers. Where no information is held, estimated values are imputed

  • the coverage of investment income for the sample drawn from NPS is incomplete. In order to create a full picture of total income for this survey, it is necessary to impute values of dividends to some sample cases. Where no information at case level is available from HMRC administrative systems, estimated values are imputed to cases so that the population as a whole has amounts consistent with evidence from other sources

  • HMRC does not have complete information about pension contributions. To compile complete estimates for relief at source pensions and total income for the SPI, a significant proportion of the amount of relief at source pension contributions has been estimated using data from external data sources. The estimated value for this and for net pay contributions has been combined with other pension reliefs and included in these statistics

Model errors

Income Tax liabilities in this publication are estimated at case level with the base SPI data using the PTM. The Income Tax modelling process attempts to capture all significant features of the UK Income Tax system, but inevitably this involves certain simplifications and omissions.

The modelling outputs are regularly benchmarked at case level against the Income Tax liabilities that are recorded as due in HMRC’s Self Assessment system. Differences between the outputs and the SPI sub-population Self Assessment data arise for known and specific reasons and only in a small minority of sample cases. The impact of these simplifications is judged to be small for key aggregates at UK level, and for most UK Income Tax payer sub-populations.

Projection errors

There are simplifications and potential errors in both the projection process and the economic assumptions applied to the projection processes, which are likely to induce larger errors in the projection estimates compared with outturn statistics for the most recent SPI and earlier tax years.

The projection methods are described in Annex B: Projected estimates for tax years beyond the SPI in the Supporting Documentation for the relevant tax year. It should be noted that the projection methods are suited to analysis of Income Tax liabilities at UK level. Projections of potential Income Tax payer numbers and incomes by income source are based on UK economic assumptions, which are applied in a broadly uniform manner to all individuals in the SPI sample. They take no account of local divergences in economic trends since the SPI year within the UK or across other dimensions such as industrial sector. Published breakdowns of projected Income Tax payer numbers by country and region (Table 2.2) are therefore indicative, and there is some evidence that they may be subject to potentially large error margins. HMRC is reviewing the evidence and will consider whether regional projections are suitable for continued publication.

In addition, the projections will not capture potentially important shifts in the distribution of incomes occurring after the most recent tax year with outturn data. The projected shares of total income and Income Tax across Income Tax payer income groupings (Table 2.4) are therefore indicative, but do allow for differential growth in earnings across the pay distribution consistent with past trends and possible responses of individuals with high income to changes in the Income Tax policy regime.

Summary statistics describing actual, rather than forecast, errors across key aggregates for projections released following Spring Budgets/Statements since 2001 are provided in Annex C: Projection errors in the Supporting Documentation for the relevant tax year, as well as a measure of the uncertainty in the projections.

Measurement error

The SPI data are based on information from HMRC systems that are used to administer the Income Tax system, which ensures that the most accurate picture of declared personal incomes is used for the production of these statistics.

Incorrectly entered data may include abnormally small or large incomes or other factors that may skew the distribution of the data and the overall statistics. To mitigate this, checks are conducted on the SPI database before the statistics are produced and any incorrectly small or large values detected are altered.

Non-response error

Case level non-response arises primarily because of late filing of tax returns. This is dealt with by estimating the likely size of the final population.

Item level non-response arises when a sample record has incomplete information for some characteristics. Age, sex and postcode are key variables for published statistics and if these items are missing from sample records, other sources are examined.

For most cases in the PAYE system, it is not necessary to know about dividends or certain pension contributions etc. To create a complete estimate of total income for such cases, an imputation process allocates amounts randomly to cases so that population estimates follow pre-determined distributions.
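The random allocation described above can be sketched as follows. This is a simplified illustration, not the SPI imputation procedure: the target distribution, the field names and the `impute_dividends` function are hypothetical.

```python
import random

def impute_dividends(cases, target_distribution, seed=0):
    """Fill missing 'dividends' values by random draws from a target distribution.

    target_distribution: list of (amount, share) pairs whose shares sum to 1.
    Cases that already hold a value are left unchanged.
    """
    rng = random.Random(seed)
    amounts = [amount for amount, _ in target_distribution]
    shares = [share for _, share in target_distribution]
    imputed = []
    for case in cases:
        if case.get("dividends") is None:
            # Draw one amount with probability proportional to its share.
            value = rng.choices(amounts, weights=shares, k=1)[0]
            imputed.append({**case, "dividends": value, "imputed": True})
        else:
            imputed.append({**case, "imputed": False})
    return imputed

# Five cases with no dividend information and one with a recorded value.
cases = [{"id": i, "dividends": None} for i in range(5)] + [{"id": 5, "dividends": 1200}]
# Assumed distribution: 70% of cases have no dividends, 20% have 500, 10% have 5,000.
target = [(0, 0.7), (500, 0.2), (5_000, 0.1)]
result = impute_dividends(cases, target)
```

Individual imputed values are not meaningful for any one case; the aim is only that, across many cases, the imputed amounts follow the chosen distribution.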

Processing error

It is possible that errors exist in the programming code used to analyse the data and produce the statistics. This risk is reduced through developing a good understanding of the complexities of Income Tax, and regularly reviewing and testing the programs that are used.

6.4 Data revision

Data revision – policy

As per the United Kingdom Statistics Authority Code of Practice for Official Statistics, HMRC has published a policy on revisions.

A summary of HMRC’s policy for different types of revisions is outlined below:

  • planned revisions, which usually take place after receipt of expected information or data. Outputs that are subject to scheduled revisions will include an explanation of how these are dealt with

  • unplanned revisions, which can occur when data suppliers missed the original deadline, data was submitted incorrectly, or when errors were made during analysis or processing. In these cases a judgement will normally be made by the Head of Profession for Statistics as to whether the change is significant enough to publish a “revised” statistical release

  • revisions that occur as a result of changes to data source systems or methodology would be planned and where possible would be conducted in consultation with users

Data revision – practice

These statistics are published annually, usually in May or June. Each scheduled release contains new SPI outturn data, revised projections for the two subsequent tax years, and a new projection for the current tax year. Revisions to projection years are normally small and reflect the most recent economic forecast from the Office for Budget Responsibility (OBR), as well as any new RTI data.

6.5 Seasonal adjustment

Seasonal adjustment is not applicable for this analysis.

7. Timeliness and punctuality

7.1 Timeliness

The reference period for the Personal Income statistics is the Income Tax year ending on 5 April. Statistics for the tax year will normally be published around 23 months after the end of the tax year. The information is drawn from the transactional systems approximately a year after the reference period. This is to allow time for individuals to file their Self Assessment returns and for PAYE reconciliation. It takes approximately a year to turn the raw dataset into information and commentary ready for publication. This is due in part to the time required to complete the data validation, complex analysis and quality assurance.

The projections used in this publication are created once the latest SPI data is finalised and are consistent with the OBR forecast from the Spring fiscal event. It takes a further 3 months to complete the projections process, analysis and quality assurance of these statistics.

7.2 Punctuality

In accordance with the Code of Practice for official statistics, the exact date of publication will be given no less than one calendar month before publication on both the Schedule of updates for HMRC’s statistics and the Research and statistics calendar of GOV.UK.

The full publication calendar or any delays to publication dates can be found on both the Schedule of updates for HMRC’s statistics and the Research and statistics calendar of GOV.UK.

8. Coherence and comparability

8.1 Geographical comparability

Breakdowns of Income Tax payers are available for the UK, country and region. The statistics also detail Income Tax liabilities for the UK. Figures are comparable between the geographic areas.

8.2 Comparability over time

Comparability is to some extent determined by the scope of Income Tax and allowable reliefs, which drive the information available from HMRC administrative systems. The supporting documentation highlights any changes made in methodologies, such as the calculation of figures that are presented in the data, and any data issues.

8.3 Coherence – cross domain

Estimates from the Income Tax Liability Statistics publication may be compared with some higher level figures from the Personal Incomes statistics tables.

Coherence – sub-annual and annual statistics

All statistics are presented as annual outputs. No coherence issues exist.

Coherence – national accounts

This publication shows income and Income Tax liabilities for each tax year. Income Tax liabilities are amounts of Income Tax due on incomes arising in a given tax year, whereas receipts are amounts of Income Tax paid and collected in a given year. The breakdowns of Income Tax liabilities provided in this publication are not available on a receipts basis.

8.4 Coherence – internal

Rounding of numbers may cause some minor internal coherence issues as the figures within a table may not sum to the total displayed. Effort has been made to ensure totals between tables remain constant where appropriate.
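A short sketch shows why independently rounded figures may not sum to a rounded total. The rounding base of 1,000 and the values are assumptions for illustration, not the rounding convention used in the published tables.

```python
def round_to_nearest(value, base=1000):
    """Round a value to the nearest multiple of base."""
    return base * round(value / base)

# Three hypothetical table cells and their total.
components = [1_400, 1_400, 1_400]
rounded_components = [round_to_nearest(v) for v in components]
rounded_total = round_to_nearest(sum(components))

print(sum(rounded_components), rounded_total)  # prints "3000 4000"
```

Each cell rounds down to 1,000, while the unrounded total of 4,200 rounds to 4,000, so the displayed cells sum to less than the displayed total.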

9. Accessibility and clarity

9.1 News release

There haven’t been any press releases linked to this data over the past year.

9.2 Publication

The tables and associated commentary are published on the Income Tax statistics and distributions webpage of GOV.UK.

Tables are published in the OpenDocument format, and the associated commentary as an accessible HTML webpage.

Both documents comply with the accessibility regulations set out in the Public Sector Bodies (Websites and Mobile Applications) (No. 2) Accessibility Regulations 2018. 

Further information can be found in HMRC’s accessible documents policy.

9.3 Online databases

This analysis is not used in any online databases.

9.4 Micro-data access

SPI Public Use Tape microdata are available to approved researchers on the UK Data Service website.

9.5 Other

There aren’t any other dissemination formats available for this analysis.

9.6 Documentation on methodology

Supporting documentation for each annual statistics release is publicly available to users.

9.7 Quality documentation

All official statistics produced by KAI must meet the standards in the Code of Practice for Statistics produced by the UK Statistics Authority, and all analysts adhere to best practice as set out in the ‘Quality’ pillar.

Information about quality procedures for this analysis can be found in section 4 of this document.

10. Cost and burden

Because all necessary data for these statistics is obtained from administrative data sources (NPS and CESA) there is no additional burden on individuals or HMRC tax inspectors to provide information.

It is estimated to take about a year to produce the annual analysis and publication, with input from a small number of analysts across different teams.

11. Confidentiality

11.1 Confidentiality – policy

HMRC has a legal duty to maintain the confidentiality of taxpayer information.

Section 18(1) of the Commissioners for Revenue and Customs Act 2005 (CRCA) sets out our duty of confidentiality.

This analysis complies with this requirement.

11.2 Confidentiality – data treatment

The statistics in these tables are presented at an aggregate level so identification of individuals is not possible.

To make sure no individual taxpayers can be identified, statistical disclosure control (SDC) is applied to cells within tables. SDC is the application of methods to ensure confidential data is not disclosed to parties who don’t have authority to access it.

SDC modifies data so that the risk of data subjects being identified is within acceptable limits while making the data as useful as possible.

Disclosure in this analysis is avoided by applying rules that prevent categories of data containing:

  • small numbers of contributors

  • small numbers of contributors that are very dominant

If a cell within a table is determined to be disclosive, its contents are suppressed either by removing the data or combining categories.
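The two rules above (a minimum number of contributors and a dominance check) can be sketched as below. The thresholds of 10 contributors and an 80% share, and the function names, are hypothetical; they are not HMRC's actual disclosure control parameters.

```python
def is_disclosive(contributions, min_contributors=10, dominance_share=0.8):
    """Flag a table cell as disclosive under two illustrative rules.

    contributions: list of each contributor's amount in the cell.
    The default thresholds are hypothetical values for illustration.
    """
    if len(contributions) < min_contributors:
        return True  # threshold rule: too few contributors
    total = sum(contributions)
    if total > 0 and max(contributions) / total > dominance_share:
        return True  # dominance rule: one contributor accounts for most of the cell
    return False

def suppress(table):
    """Return cell totals, with disclosive cells replaced by None (suppressed)."""
    return {
        cell: (None if is_disclosive(values) else sum(values))
        for cell, values in table.items()
    }

# Hypothetical cells: one safe, one with few contributors, one dominated.
table = {
    "region A": [30_000] * 50,
    "region B": [30_000] * 3,
    "region C": [5_000_000] + [10_000] * 20,
}
published = suppress(table)
```

Here only "region A" is published. This sketch suppresses by removing the value; combining categories, the other option mentioned above, would instead merge a disclosive cell with a neighbouring one before re-checking the rules.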

Further information on anonymisation and data confidentiality best practice can be found on the Government Statistical Service’s website.

Our statistical practice is regulated by the Office for Statistics Regulation (OSR). OSR sets the standards of trustworthiness, quality and value in the Code of Practice for Statistics that all producers of official statistics should adhere to. You are welcome to contact us directly with any comments about how we meet these standards by emailing [email protected]. Alternatively, you can contact OSR by emailing [email protected] or via the OSR website.

The Income Tax Liabilities Statistics were independently reviewed by the Office for Statistics Regulation in July 2013. They comply with the standards of trustworthiness, quality and value in the Code of Practice for Statistics and should be labelled ‘accredited official statistics’. Accredited official statistics are called National Statistics in the Statistics and Registration Service Act 2007.

National Statistics are accredited official statistics.