Official Statistics

DCMS Economic Estimates 2020 (provisional): Earnings – Technical and quality assurance report

Published 26 November 2020

1. Introduction

This document sets out the data sources, methodology, definitions, and quality assurance processes underlying the DCMS Economic Estimates: Earnings 2020 publication.

This publication provides estimates of the earnings of employees in DCMS Sectors in the UK for 2019 and 2020, and estimates of the Gender Pay Gap (GPG) in DCMS Sectors.

The sectors for which DCMS has responsibility are:

  • Civil Society
  • Creative Industries
  • Culture
  • Digital
  • Gambling
  • Sport
  • Telecoms
  • Tourism

1.1 What is reported?

These statistics cover the following areas:

DCMS sector earnings - this looks at the breakdown of median weekly earnings by employment status (e.g. full time and part time), age, sub-sector and place of work (English regions, Scotland, Wales and Northern Ireland).

Gender pay gap - this looks at the percentage difference between men’s and women’s earnings in DCMS sectors (based on hourly pay excluding overtime).
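As a rough illustration of the gender pay gap calculation, the sketch below follows the ONS convention of expressing the difference between men's and women's median hourly pay (excluding overtime) as a percentage of men's pay. The function name and the pay values are hypothetical and are not figures from this release. Survey weights are ignored here for brevity.

```python
from statistics import median


def gender_pay_gap(mens_hourly_pay, womens_hourly_pay):
    """Return the pay gap as a percentage of men's median hourly pay."""
    men = median(mens_hourly_pay)
    women = median(womens_hourly_pay)
    return (men - women) / men * 100


# Hypothetical hourly pay in GBP, excluding overtime (not ASHE data).
men = [12.0, 15.0, 20.0, 25.0, 40.0]
women = [11.0, 14.0, 17.0, 22.0, 30.0]

gap = gender_pay_gap(men, women)
print(f"Gender pay gap: {gap:.1f}%")
```

A positive value means men's median hourly pay is higher than women's. A negative value means the reverse.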

Some data have not been reported because at high levels of granularity, sample sizes are generally very small and likely to give misleading estimates.

The accompanying data tables also provide the same breakdowns of annual gross pay.

Finalised figures are disseminated as Excel tables and a written report (which includes written text and graphs) published on GOV.UK at DCMS Economic Estimates.

1.2 Code of Practice for Statistics

These statistics were produced in accordance with the requirements of the Code of Practice for Statistics.

These statistics are classed as experimental as it is the first time DCMS has introduced analysis on earnings in all DCMS sectors. DCMS plans to widen this analysis in the future to include other sources of earnings data, in particular to enable analysis by other demographics, including disability and ethnicity. This work is a response to customer interest after the publication of Digital Sector earnings estimates for 2018, and remains a priority.

1.3 Data and sources

This release presents analysis on median weekly earnings for DCMS sector employees based on the Annual Survey of Hours and Earnings (ASHE) dataset. This dataset is provided by the Office for National Statistics (ONS), and is the most detailed and comprehensive source of earnings information in the UK.

Provisional (p) results for 2020 are available and are subject to change once finalised in late 2021. Data for 2019 are final. The estimates in the publication are consistent with national (UK) estimates, published by ONS.

1.4 Data collection

The ASHE dataset includes information about the levels, distribution and make-up of earnings and hours paid for employees in all industries and occupations across the UK.

Businesses are surveyed in April of each year. The survey uses a random sample of 1% of all employee jobs from HM Revenue and Customs’ (HMRC’s) Pay As You Earn (PAYE) system, taken in January of the reference year. The sample is drawn in such a way that many of the same individuals are included from year to year, thereby allowing longitudinal analysis of the data.

Since ASHE is a survey of employee jobs, it does not cover the self-employed or any jobs within the armed forces. Given the survey reference date in April, the survey does not fully cover certain types of seasonal work, for example, employees taken on for only summer or winter work.

Validation is carried out on returned data that is regarded as incomplete or potentially inaccurate, based on automatic comparisons with data for similar jobs or against data for the same job in previous years. In these cases, respondents may be re-contacted by ONS in order to verify the information that has been provided.

1.5 ASHE 2020

In 2020, the ASHE survey covered the pay period including 22 April, during the COVID-19 pandemic. This meant that:

  • The sample size achieved was around 25% smaller than usual, at 136,000 jobs compared to 180,000 in other years. This means that estimates of earnings for 2020 have greater uncertainty than usual.
  • At the time the survey was carried out approximately 8.8 million employees (in the whole UK economy) were furloughed under the Coronavirus Job Retention Scheme (CJRS).
  • Pay estimates include furloughed employees and are based on actual payments made to the employee from company payrolls and the hours on which this pay was calculated, which in the case of furloughed employees are their usual hours.
  • The CJRS funded 80% of normal pay, to a maximum of £2,500 per month, and employers were permitted to top up pay to normal levels but were not required to.
  • Due to challenges in matching some jobs to the ASHE sample, the furlough estimates in ASHE represent an undercount of approximately 20% against administrative sources. The Office for National Statistics considers that the information is useful for earnings comparisons but not for counting furloughed employees.
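The CJRS rule described above (80% of normal pay, capped at £2,500 per month) can be sketched as a small function. This is an illustration of the stated rule only. The function name and the example salaries are hypothetical, and details of the scheme not mentioned in this report are left out.

```python
def cjrs_grant(normal_monthly_pay, rate=0.80, cap=2500.0):
    """Monthly pay funded by the scheme: 80% of normal pay, up to the cap."""
    return min(normal_monthly_pay * rate, cap)


# Hypothetical normal monthly pay values in GBP.
for pay in (1500.0, 3000.0, 4000.0):
    grant = cjrs_grant(pay)
    shortfall = pay - grant  # amount an employer could choose to top up
    print(f"normal pay {pay:,.0f} -> CJRS {grant:,.0f}, possible top-up {shortfall:,.0f}")
```

Because employers were not required to top up pay, a furloughed employee's recorded pay could fall anywhere between the CJRS amount and their normal pay.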

Further information can be found in the ASHE methodology document.

1.6 Summary of data sources

In summary, the data presented in this report on earnings:

Are based on official statistics data sources.

Are based on internationally-harmonised codes, meaning the estimates are:

  • Comparable at both a national and international level.
  • Comparable over time, allowing trends to be measured and monitored.
  • Subject to limitations of the underlying classifications of the make-up of the UK economy. For example, the standard industrial classification (SIC) codes were developed in 2007 and have not been revised since. Emerging sectors, such as Artificial Intelligence, are therefore hard to capture and may be excluded or mis-coded.

Are based on survey data (Annual Survey of Hours and Earnings) and, as with all data from surveys, there will be an associated error margin surrounding these estimates.

Sampling error is the error caused by observing a sample (as in a survey) instead of the whole population (as in a census). While each sample is designed to produce the “best” estimate of the true population value, a number of equal-sized samples covering the population would generally produce varying population estimates.

This means we cannot say that an estimate calculated from a sample, for example 20%, is exactly accurate for the whole population. Our best estimate from the survey sample suggests that the figure is 20%, but because of sampling error, the true population figure could perhaps be 18% or 22%.

This is not an issue with the quality of the data or analysis; rather it is an inherent principle when using survey data to inform estimates.

The following data sources were used in the production of earnings estimates for DCMS sectors:

Annual Survey of Hours and Earnings (ASHE) (published 3 November 2020)

Latest Tourism Satellite Account (published 4 November 2020 by the ONS Tourism Intelligence Unit)

The following data sources were used to provide additional context and information:

Business Impact of Covid-19 Survey (BICS) (published 21 May 2020)

The ONS BICS survey is an experimental statistics publication based on responses from a new voluntary fortnightly business survey. For the period 20 April to 3 May 2020, the survey was sent to around 18,500 UK businesses and achieved a response rate of 33.5% (6,196).

Estimates for this period were unweighted and should be treated with caution. They have been used here to provide an indication of how the proportion of furloughed employees varied by industry.

1.7 Method and definitions

The definitions of the DCMS sectors and occupations are consistent with DCMS definitions of these sectors. These are discussed in the section of this report headed DCMS sector definitions.

The ONS’s definition of earnings is the payment received by employees in return for employment. Most analyses of earnings consider only gross earnings, that is, earnings before any deductions are made for taxes (including National Insurance contributions) and benefits. Further information is available from the ONS publication: A guide to sources of data on income and earnings.

The data tables accompanying the earnings report use three different measures of earnings. The filters used are consistent with ONS analysis:

  • Weekly pay is used for most of the analyses in the report. The weekly filter is employees on adult rates whose earnings for the pay period were not affected by absence. Additionally, employees who do not have a valid work region or who are less than 16 years old are filtered out because the age and region variables are required for weighting.
  • Hourly pay excluding overtime is used to calculate the GPG, and uses the same filters as weekly pay.
  • Annual gross pay is provided as data tables. The annual filter is employees on adult rates who have been in the same job for more than one year. Additionally, employees who do not have a valid work region or who are less than 16 years old are filtered out. Employees with missing or zero annual gross salaries are also removed.
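The filters listed above can be sketched as two predicate functions. This is a minimal illustration, not the DCMS or ONS implementation. All field names (`adult_rate`, `pay_affected_by_absence`, `same_job_over_one_year` and so on) are hypothetical and do not correspond to actual ASHE variable names.

```python
def passes_weekly_filter(job):
    """Weekly and hourly pay: adult rate, pay not affected by absence."""
    return bool(
        job["adult_rate"]
        and not job["pay_affected_by_absence"]
        and job["work_region"] is not None
        and job["age"] is not None
        and job["age"] >= 16
    )


def passes_annual_filter(job):
    """Annual gross pay: adult rate, same job for more than one year."""
    return bool(
        job["adult_rate"]
        and job["same_job_over_one_year"]
        and job["work_region"] is not None
        and job["age"] is not None
        and job["age"] >= 16
        and job["annual_gross_pay"]  # drops missing (None) and zero pay
    )


# Hypothetical records, not ASHE data.
jobs = [
    {"id": 1, "adult_rate": True, "pay_affected_by_absence": False,
     "work_region": "Wales", "age": 34,
     "same_job_over_one_year": True, "annual_gross_pay": 28000},
    {"id": 2, "adult_rate": True, "pay_affected_by_absence": True,
     "work_region": "London", "age": 41,
     "same_job_over_one_year": True, "annual_gross_pay": 0},
    {"id": 3, "adult_rate": True, "pay_affected_by_absence": False,
     "work_region": None, "age": 29,
     "same_job_over_one_year": True, "annual_gross_pay": 31000},
    {"id": 4, "adult_rate": True, "pay_affected_by_absence": False,
     "work_region": "Scotland", "age": 22,
     "same_job_over_one_year": False, "annual_gross_pay": 19000},
]

weekly_ids = [j["id"] for j in jobs if passes_weekly_filter(j)]
annual_ids = [j["id"] for j in jobs if passes_annual_filter(j)]
print("weekly:", weekly_ids, "annual:", annual_ids)
```

Keeping the two filters as separate functions makes it easier to check that the annual analysis is not accidentally run on the weekly population, or the reverse.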

1.8 Mean vs median

The headline statistics for ASHE are based on the median rather than the mean. The median is the value below which 50% of employees fall. It is ONS’s preferred measure of average earnings as it is less affected by a relatively small number of very high earners and the skewed distribution of earnings. Thus, the median provides a better indication of typical pay than the mean.
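The effect of a few very high earners on the two measures can be seen in a short example. The weekly pay values below are invented for illustration and are not taken from ASHE.

```python
from statistics import mean, median

# Hypothetical weekly pay in GBP for ten employees, one of whom is a very high earner.
weekly_pay = [350, 420, 480, 510, 560, 600, 650, 720, 800, 5000]

mean_pay = mean(weekly_pay)
median_pay = median(weekly_pay)

print(f"mean:   {mean_pay}")
print(f"median: {median_pay}")
```

In this made-up sample the single high earner pulls the mean well above what nine of the ten employees are paid, while the median stays close to the middle of the group.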

1.9 Strengths and limitations of the data

The ASHE data used for this analysis are robust and have a number of strengths:

  • Size and coverage - the ASHE dataset contains information on approximately 180,000 jobs in all industries, occupations and regions, making it the most comprehensive source of earnings information in the UK and enabling a vast range of analyses.
  • Quality - alternative sources of earnings information such as the Labour Force Survey (LFS) rely on self-report or proxy data, which are known to be less reliable than information from employers’ administrative systems.
  • Uniqueness - for many uses, ASHE is the main data source and for some uses it is the only data source.

But there are some limitations of which users should be aware:

  • Due to data collection difficulties during the 2020 COVID-19 pandemic, the sample achieved in the 2020 ASHE was about 25% smaller than usual, at 136,000 jobs.
  • Analyses presented here have been calculated on a consistent basis in DCMS. Due to minor differences in the methodology and analysis used to calculate the median, results in this report may not match the ONS published results, in particular when looking at further breakdowns of some data e.g. by region or age. These differences are small, but comparisons with ONS figures should be treated with caution.
  • Lack of personal demographic information - characteristics such as ethnicity, religion, education, disability and pregnancy are not recorded in the ASHE dataset.
  • The quality of estimates at low levels of disaggregation can be poor.
  • The dataset does not cover those who are self-employed.
  • Definitions of DCMS sectors have their own limitations that are explained more fully in the ‘DCMS sector definitions’ section of this report.

A fuller description of the strengths and limitations of the Annual Survey of Hours and Earnings (ASHE) can be found in the Quality and Methodology Information report and the Guide to sources of data on earnings and income.

1.10 DCMS sector definitions

DCMS uses a range of definitions based on internal or UK agreed definitions. More details of the definition used for each sector are set out below. With the exception of civil society, all definitions are based on the Standard Industrial Classification 2007 (SIC) codes.

This means nationally consistent sources of data can be used and enables international comparisons. The use of SIC codes also allows DCMS to estimate which parts of the economy are included in multiple sectors and avoid double counting (see methodology note Table 2.2 for SIC allocations to DCMS sectors).

This approach of developing individual sector definitions as the policy areas were added to the departmental portfolio has meant that there is overlap between DCMS sectors.

For example, the Cultural Sector is defined using SIC codes that are all within the Creative Industries except for 4 SIC codes (manufacture of musical instruments, retail sale of music and video recording in specialised stores, reproduction of recorded media, and operation of historical sites and buildings and similar visitor attractions), whilst the Telecoms Sector is completely within the Digital Sector.
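One way to picture how SIC codes help avoid double counting is to treat each sector as a set of codes and count each code once when sectors are combined. The sketch below is illustrative only. The code lists are a small subset chosen for the example rather than the allocations in Table 2.2 of the methodology note, and the job counts are hypothetical.

```python
# Illustrative subset of SIC codes per sector (not the full DCMS definitions).
sectors = {
    "Telecoms": {"61.10", "61.20"},
    "Digital": {"61.10", "61.20", "62.01"},
    "Creative Industries": {"62.01", "90.01"},
    "Culture": {"90.01", "91.03"},
}

# Hypothetical number of jobs per SIC code.
jobs_by_sic = {"61.10": 100, "61.20": 50, "62.01": 200, "90.01": 30, "91.03": 20}


def sector_total(name):
    """Jobs in a single sector."""
    return sum(jobs_by_sic[code] for code in sectors[name])


# Adding sector totals counts overlapping codes more than once.
naive_total = sum(sector_total(name) for name in sectors)

# Taking the union of codes first counts each code exactly once.
all_codes = set().union(*sectors.values())
combined_total = sum(jobs_by_sic[code] for code in all_codes)

print("sum of sector totals:", naive_total)
print("total without double counting:", combined_total)
print("Telecoms within Digital:", sectors["Telecoms"] <= sectors["Digital"])
```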

Civil Society

In July 2016, DCMS took on responsibility for the Office for Civil Society, which is responsible for charities, voluntary organisations or trusts, social enterprises, mutuals and community interest companies. The Civil Society sector is not like a traditional industry and therefore data are not readily available in the usual data sources. Where possible, data are provided from official sources.

Earnings estimates for Civil Society capture employees of those businesses with a legal status of Not for Profit, across all industries.

Creative Industries

See the Creative Industries methodology note for more details on limitations, including specifics on crafts, music, fashion and computer games.

Culture

There are significant limitations to the DCMS definition of the Cultural sector due to the limited granularity of the standard industrial classifications. There are many cases where culture forms a small part of a different industry classification and therefore cannot be separately identified and assigned as culture using standard data sources. DCMS consulted on the definition of Culture and published a response in April 2017.

The Heritage sector is defined in our estimates by one SIC code “91.03 Operation of historical sites and buildings and similar visitor attractions”. As the balance and make-up of the economy changes, the international SIC codes used here are less able to provide the detail for important elements of the UK economy related to DCMS sectors. It is therefore recognised that the published estimates are likely to be an underestimate for the Heritage sector.

Digital

The DCMS definition used for the Digital Sector is based on the internationally comparable OECD definition. UK government, in an effort to better define the Digital Economy, has expanded on the OECD definition, using additional SIC codes. However, it does not allow consideration of the value added of “digital” to the wider economy e.g. in health care or construction. DCMS policy responsibility is for digital across the economy and therefore this is a significant weakness in the current approach.

Gambling

The definition of gambling used in the DCMS Sectors Economic Estimates is consistent with the internationally agreed definition, SIC 92, Gambling and betting activities.

Sport

The definition of Sport used in the earnings report includes only SIC codes which are predominantly sport.

DCMS also publishes estimates of sport based on the EU agreed Vilnius definition. The Vilnius definition is a more comprehensive measure of sport which considers the contribution of sport across a range of industries, for example sport advertising, and sport related construction.

The DCMS Sport Satellite Account is based on an EU agreed methodology. However, due to the time lag with the sport satellite account and further development required to make the sport satellite account replicable on an annual basis, the statistical definition is being used in this publication of estimates for DCMS sectors to allow the contribution of sport to be considered in a way which is consistent with the other sectors.

Telecoms

The definition of telecoms used in the DCMS Sectors Economic Estimates is consistent with the internationally agreed definition, SIC 61, Telecommunications. Please note that as well as appearing as a sector on its own, Telecoms is also entirely included within the Digital Sector as one of the sub-sectors.

Tourism

Tourism is defined by the characteristics of the consumer in terms of whether they are a tourist or resident. This, therefore, differs from “traditional” industries such as gambling or telecoms which are defined by the goods and services produced themselves, and means that a different approach to defining the industry must be used.

To estimate earnings in the Tourism sector, ratios calculated from the Tourism Satellite Account (TSA) are applied to the ASHE weightings. This allows an estimate of the earnings of those directly employed in the Tourism industry.
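A minimal sketch of this approach is shown below, assuming that each job's ASHE weight is multiplied by the tourism ratio for its industry before a weighted median is taken. The ratios, weights and pay values are hypothetical, and the simple weighted median used here (the first value at which cumulative weight reaches half the total) may differ from the method used by DCMS or ONS.

```python
def weighted_median(values, weights):
    """Smallest value at which cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("no observations")


# Hypothetical tourism ratios by SIC code (share of activity attributed to tourism).
tourism_ratio = {"55.10": 0.25, "56.10": 0.75}

# Hypothetical jobs: (SIC code, weekly pay in GBP, ASHE weight).
jobs = [
    ("55.10", 300, 100),
    ("55.10", 350, 120),
    ("56.10", 500, 100),
    ("56.10", 600, 100),
]

pay = [weekly_pay for _, weekly_pay, _ in jobs]
ashe_weights = [weight for _, _, weight in jobs]
tourism_weights = [weight * tourism_ratio[sic] for sic, _, weight in jobs]

all_jobs_median = weighted_median(pay, ashe_weights)
tourism_median = weighted_median(pay, tourism_weights)
print("median using ASHE weights:", all_jobs_median)
print("median using tourism-adjusted weights:", tourism_median)
```

In this made-up example the industry with the higher tourism ratio carries more weight after adjustment, so the tourism median moves towards pay in that industry.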

1.11 External Data Sources

It is recognised that there are always different ways to define sectors, but their relevance depends on what they are needed for. Government generally favours classification systems which are:

  • rigorously measured,
  • internationally comparable,
  • nationally consistent, and
  • ideally applicable to specific policy interventions.

These are the main reasons for DCMS constructing sector classifications from Standard Industrial Classification (SIC) codes.

However, DCMS accepts that there are limitations with this approach (outlined above) and alternative definitions can be useful where a policy-relevant grouping of businesses crosses existing Standard Industrial Classification (SIC) codes. DCMS is aware of other estimates of DCMS Sectors.

These estimates use various methods and data sources, and can be useful for serving several purposes, e.g. monitoring progress under specific policy themes such as community health or the environment, or measuring activities subsumed across a range of SICs.

1.12 Quality assurance processes

This chapter summarises the quality assurance processes applied during the production of the DCMS Economic Estimates 2020: Earnings. This includes a detailed account of the quality assurance processes and the data checks carried out by our data providers (Office for National Statistics, ONS) as well as by DCMS.

Quality Assurance Processes at ONS

The data underpinning this release are taken from the Office for National Statistics (ONS) Annual Survey of Hours and Earnings (ASHE). ASHE provides information about the levels, distribution and make-up of earnings and paid hours worked for employees in all industries and occupations.

Quality assurance at ONS takes place at a number of stages. The various stages and the processes in place to ensure quality at each stage are outlined below. This information is taken from the ASHE quality information report and should be credited to the ONS.

Sampling and data collection

ASHE is based on a 1% sample of employee jobs taken from HM Revenue and Customs (HMRC) Pay As You Earn (PAYE) records. The sample is matched against the ONS’ Inter-Departmental Business Register (IDBR) in order to obtain contact and address details for the employers. Information on the hours paid and earnings of employees is obtained from employers and treated confidentially.

The sample is drawn in such a way that many of the same individuals are included from year to year, thereby allowing longitudinal analysis of the data. Please note that ASHE does not cover the self-employed, nor does it cover employees not paid during the reference period.

A specific date in April is chosen so that all respondents refer to the same point in time. This reference date is not the same every year. Given the survey reference date in April, the survey does not fully cover certain types of seasonal work, for example, employees taken on for only summer or winter work.

A copy of the ASHE questionnaire is available and includes detailed instructions on how to complete (and return) it.

Response rate and imputation

The ASHE dataset contains information on approximately 180,000 jobs in all industries, occupations and regions, making it the most comprehensive source of earnings information in the UK and enabling a vast range of analyses. See the ‘ASHE 2020’ section of this report for information on how the survey differed in 2020.

Sometimes respondents return forms with only some of the variables completed. This means that different variables have different levels of response within the returned dataset. If left, this would mean that the proportions of each variable returned for each job type would be different and therefore the weights required for each variable would be different.

Instead of using different weights for different variables, ONS imputes the missing values so that the same weights can be used for all the variables.

The imputation method used for ASHE is donor imputation. To impute for a missing value in a record, this method looks for another record with similar characteristics and uses the value from the donor record for the missing variable. This ensures that the distribution of each variable within the imputed data set is similar to the distribution in the un-imputed dataset.
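The idea of donor imputation can be sketched as follows. This is a simplified illustration and not the ONS procedure. The choice of matching on occupation and region, the use of paid hours to pick the closest donor, and all field names and values are assumptions made for the example.

```python
def impute_from_donors(records, target, match_keys, distance_key):
    """Fill missing `target` values using the closest complete record."""
    # Only records with an observed value may act as donors.
    donors = [r for r in records if r[target] is not None]
    completed = []
    for record in records:
        record = dict(record)  # copy so the input is left unchanged
        if record[target] is None:
            candidates = [
                d for d in donors
                if all(d[key] == record[key] for key in match_keys)
            ]
            if candidates:
                donor = min(
                    candidates,
                    key=lambda d: abs(d[distance_key] - record[distance_key]),
                )
                record[target] = donor[target]
        completed.append(record)
    return completed


# Hypothetical returns; None marks a variable the respondent left blank.
records = [
    {"occupation": "engineer", "region": "London", "hours": 37.5, "weekly_pay": 900},
    {"occupation": "engineer", "region": "London", "hours": 20.0, "weekly_pay": 450},
    {"occupation": "engineer", "region": "London", "hours": 36.0, "weekly_pay": None},
    {"occupation": "curator", "region": "Wales", "hours": 35.0, "weekly_pay": None},
]

completed = impute_from_donors(
    records, "weekly_pay", ["occupation", "region"], "hours"
)
for row in completed:
    print(row)
```

In this sketch a record with no matching donor is left missing rather than filled with a guess.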

Weighting

Weighting is used to compensate for types of job that are under-represented in the ASHE dataset due to poor response. Because some types of respondents are more or less likely to respond than others, the data is not always representative of the general population. Therefore different weights are applied to different types of respondents, so that when the weights are added together, each type will have a proportion consistent with their proportions in the general population.

For example, if respondents of type A are generally poor responders, each record of type A will need a relatively large weight so that they can collectively represent all the type A people in the population.

The general population proportions used by ASHE to calculate its weights come from the Labour Force Survey (LFS) and the types are determined by classifying people by age group, sex, occupation and a regional split.
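The weighting principle can be illustrated with a small example in which each respondent type receives a weight equal to its population count divided by its number of respondents. This is a simplified sketch and not the ASHE weighting procedure. The types, population counts and respondent numbers are hypothetical.

```python
from collections import Counter


def calibration_weights(respondent_types, population_counts):
    """Weight per type = population count / number of respondents of that type."""
    respondents = Counter(respondent_types)
    return {t: population_counts[t] / n for t, n in respondents.items()}


# Hypothetical sample: type A responds poorly (5 returns), type B responds well (20).
respondent_types = ["A"] * 5 + ["B"] * 20
population_counts = {"A": 1000, "B": 1000}

weights = calibration_weights(respondent_types, population_counts)

# Summing the weights by type recovers the population counts.
weighted_totals = Counter()
for t in respondent_types:
    weighted_totals[t] += weights[t]

print("weights:", weights)
print("weighted totals:", dict(weighted_totals))
```

Each type A record carries four times the weight of a type B record here, which is how a small number of returns can still represent its share of the population.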

Disclosure

Statistical disclosure control is applied to all outputs produced from ASHE. This ensures that information attributable to an individual or individual organisation is not identifiable in any published outputs. The Code of Practice for Statistics and, specifically, the pillar on trustworthiness, sets out principles for how we protect the identities of respondents from being disclosed.

Firstly, to protect individual earnings data, a frequency count is taken and all cells that are based on a count of fewer than three individuals are suppressed. Secondly, to protect employers’ pay information, a dominance rule is applied within each cell, which uses the contribution from the largest employer and the overall standard error of the estimate to deduce whether information about the employer can be derived with a reasonable degree of certainty.
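The first of these rules, suppressing cells based on fewer than three individuals, can be sketched as below. The dominance rule is not covered because its exact form is not set out in this report. The cell labels, counts and estimates are hypothetical.

```python
MIN_CELL_COUNT = 3  # threshold stated in the text


def apply_frequency_suppression(cells, min_count=MIN_CELL_COUNT):
    """Return publishable estimates, with None for suppressed cells."""
    published = {}
    for label, cell in cells.items():
        if cell["count"] < min_count:
            published[label] = None  # too few individuals to publish
        else:
            published[label] = cell["median_weekly_pay"]
    return published


# Hypothetical cells keyed by (sector, region).
cells = {
    ("Gambling", "Wales"): {"count": 2, "median_weekly_pay": 410.0},
    ("Gambling", "London"): {"count": 57, "median_weekly_pay": 520.0},
}

published = apply_frequency_suppression(cells)
for label, value in published.items():
    print(label, "suppressed" if value is None else value)
```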

Given the nature and complexity of ASHE outputs it is not possible to use a practical method to check for issues of secondary suppression. Instead, ASHE applies a policy where no sample counts are released, only weighted sample counts rounded to the nearest 1,000. This gives users enough information about the sample size for a cell for them to make quality inferences, without giving sufficient information to derive data by difference with any degree of certainty. Although in some circumstances a figure can be derived by difference, it would be impossible to tell how many individuals contributed to the figure.

Sampling errors

Sampling error occurs because estimates are based on a sample rather than a census. ASHE estimates this error through coefficients of variation (cv) which are published alongside all ASHE outputs. The cv is the ratio of the standard error (se) of an estimate to the estimate itself, expressed as a percentage. Generally, if all other factors are constant, the smaller the cv the higher the quality of the estimate.

It should be noted that at low levels of disaggregation, high coefficients of variation imply estimates of low quality. For example, for an estimate of £400 with a cv of 10%, the true value is likely to lie between £321.60 and £478.40. This range is given by the estimate plus or minus 1.96 multiplied by the se. Where these ranges for different estimates overlap, interpretation of differences between the relevant domains becomes more difficult.
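The arithmetic in the example above can be reproduced in a few lines. The function name is hypothetical. The calculation follows the definitions given in the text: the se is the estimate multiplied by the cv, and the range is the estimate plus or minus 1.96 multiplied by the se.

```python
def interval_from_cv(estimate, cv_percent, z=1.96):
    """Range given by the estimate plus or minus z standard errors."""
    se = estimate * cv_percent / 100  # cv is the se as a percentage of the estimate
    return estimate - z * se, estimate + z * se


low, high = interval_from_cv(400, 10)
print(f"se = {400 * 10 / 100:.2f}")
print(f"range: {low:.2f} to {high:.2f}")
```

For an estimate of £400 with a cv of 10%, the se is £40, and 1.96 multiplied by £40 is £78.40, which gives the range £321.60 to £478.40 quoted above.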

Non-sampling errors

ASHE statistics are also subject to non-sampling errors. For example, there are known differences between the coverage of the ASHE sample and the target population (that is, all employee jobs). Jobs that are not registered on Pay As You Earn (PAYE) schemes are not surveyed. These jobs are known to be different from the PAYE population in the sense that they typically have low levels of pay.

Consequently, ASHE estimates of average pay are likely to be biased upwards with respect to the actual average pay of the employee population.

Non-response bias may also affect ASHE estimates. This may happen if the jobs for which respondents do not provide information are different from the jobs for which respondents do provide information. For ASHE, this is likely to be a downward bias on earnings estimates since non-response is known to affect high-paying occupations more than low-paying occupations.

Finally, ASHE results tables do not account for differences in the composition of different “slices” of the employee workforce. For example, figures for the public and private sectors include all jobs in those sectors and are not adjusted to account for differences in the age, qualifications or seniority of the employees or the nature of their jobs, all factors that may affect how much employees earn.

Returns

Various procedures are in place to minimise errors in returned data. Returns undergo a range of checks that include validation against previous returns and expected values, selective editing (a technique for prioritising suspicious values for follow-up based on their impact on published results) and re-contacting businesses for verification. Similar checks are also made at the aggregate level for main results.

1.13 Quality Assurance Processes at DCMS

The majority of quality assurance of the data underpinning the DCMS Economic Estimates: Earnings release takes place at ONS, through the processes described above. Once ONS have completed all their in-house data checks, the data required by DCMS are sent via secure transfer. Further quality assurance checks are then carried out within DCMS.

Production of the analysis and report is typically carried out by one member of staff, whilst quality assurance is completed by at least one other, to ensure an independent evaluation of the work.

Data requirements

The ONS do not publish the ASHE data at the granularity required to calculate the median annual pay for employees in DCMS sectors. DCMS discussed data requirements with ONS and these are formalised as a Data Access Agreement (DAA).

The DAA covers which data are required, the purpose for accessing the data, and the conditions under which ONS provide the data. Discussions of requirements and purpose with ONS improved the understanding of the data at DCMS, helping us to ensure we receive the correct data and use it appropriately.

Checking of the data delivery

The data is delivered to DCMS as an SPSS file once the ONS have published their latest earnings release. For this particular release we check that:

  • We have received all data at the 4 digit SIC code level, which is required for us to aggregate up to produce estimates for our sectors and sub-sectors.
  • Data at the 4 digit SIC code has not been rounded unexpectedly. This would cause rounding errors when aggregating up to produce estimates for our sectors and sub-sectors.
  • Data for the correct year has been included.
  • The number of rows in the data seems sensible compared to the previous year’s data.
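Some of the delivery checks above lend themselves to automation. The sketch below is an illustration only and does not represent the DCMS process. It assumes the delivered data have been read into a list of rows with hypothetical `sic` and `year` fields, and the 30% tolerance on row count is an arbitrary choice for the example. The check for unexpected rounding is not covered.

```python
import re


def check_delivery(rows, expected_year, previous_row_count, tolerance=0.3):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []

    # SIC codes should be supplied at the 4 digit level.
    bad_sic = [r["sic"] for r in rows if not re.fullmatch(r"\d{4}", str(r["sic"]))]
    if bad_sic:
        problems.append(f"non 4-digit SIC codes: {sorted(set(bad_sic))}")

    # Every row should belong to the expected reference year.
    wrong_years = {r["year"] for r in rows if r["year"] != expected_year}
    if wrong_years:
        problems.append(f"unexpected years: {sorted(wrong_years)}")

    # Row count should be broadly similar to the previous delivery.
    change = abs(len(rows) - previous_row_count) / previous_row_count
    if change > tolerance:
        problems.append(f"row count changed by {change:.0%}")

    return problems


# Hypothetical deliveries.
good_rows = [
    {"sic": "6110", "year": 2020},
    {"sic": "6201", "year": 2020},
    {"sic": "9200", "year": 2020},
]
bad_rows = [{"sic": "61", "year": 2019}]

good_result = check_delivery(good_rows, expected_year=2020, previous_row_count=3)
bad_result = check_delivery(bad_rows, expected_year=2020, previous_row_count=3)
print("good delivery:", good_result)
print("bad delivery:", bad_result)
```

Returning a list of problems rather than stopping at the first failure lets the reviewer see every issue with a delivery in one pass.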

Data analysis

At the analysis stage, data are aggregated to produce information about the DCMS sectors and sub-sectors, as well as the overall UK economy. The lead statistician builds in the following checks at this stage:

  • Checking that the analysis for the overall UK economy matches that of the ONS’s published outputs. This includes analysis by various demographics (such as age, gender, work region, gender pay gap).

  • “Sense checks” of the data, for example, do the median earnings in the DCMS sectors and sub-sectors look similar to last year?

  • Making sure it is not possible to derive sensitive data from the figures that will be published, especially at lower aggregations e.g. earnings by gender and by work region.

Quality assurance of data analysis

Once analysis is complete, the producer hands over to the quality assurers to carry out further checks of the analytical work completed.

The lead statistician documents the checks needed on the R syntax, tables and report and the quality assurers document the outcome of the checks and any other feedback. After the publication, the quality assurance processes are reviewed to ensure they are relevant and comprehensive. The checks cover:

  • Ensuring the correct data have been used for the analysis e.g. has the 2018 data been used to derive the 2018 figures, or has the 2017 data been used by mistake?
  • Checking that the correct SIC codes and SOC codes (used for occupations) are all accounted for and no codes are missing or included by accident.
  • Checking the syntax picks up the correct variables e.g. checking the annual pay variable is used and not the weekly pay variable for analysis on gross annual earnings.
  • Sense check of percentage change figures to the previous year – do they look sensible?
  • Cross checking report figures with the published tables.
  • Checking that the charts link to the correct data.
  • Making sure any statements made about the figures (e.g. regarding trends) are correct according to the analysis.

1.14 Further information

For enquiries on this release, please email [email protected].

For general enquiries contact:

Department for Digital, Culture, Media and Sport
100 Parliament Street London
SW1A 2BQ

Telephone: 020 7211 6000

DCMS statisticians can be followed on Twitter via @DCMSInsight.

The Economic Estimates of DCMS Sectors release is an Experimental Official Statistics publication and has been produced to the standards set out in the Code of Practice for Statistics.