Quality and methodology information (QMI) for BDUK's performance report
Published 13 December 2024
1. About this report
This quality and methodology information report contains information on the quality characteristics of the data as well as the methods used to create it.
The information in this report will help you to:
- understand the strengths and limitations of the data
- understand the methods used to create the data
- decide suitable uses for the data
- reduce the risk of misusing data
2. Contact
Organisation unit – Building Digital UK (BDUK)’s Analysis and Evaluation Function
Name – M McMahon, Head of Reporting and Statistics
Please contact us at [email protected] with any questions or suggestions for improvements.
3. Data and methods information
3.1 Data description
The data is used to produce BDUK’s performance reporting, which presents how many premises in the United Kingdom have received a gigabit-capable broadband connection as a result of BDUK subsidy.
3.2 Source data
We use a range of source data to produce these statistics. Further detail on management information sources for delivery data can be found in “Data collection”.
BDUK uses geographic data from the Office for National Statistics (ONS), Northern Ireland Statistics and Research Agency (NISRA), and Scottish Government. Further detail on these can be found in “Classification systems”.
BDUK uses premises- and postcode-level data from Ofcom to estimate the number of premises that we have subsidised that are in Ofcom’s premises base. We use the most recent Connected Nations dataset available.
A list of the source data versions can be found in each release’s bulletin (for 2024 onwards).
3.3 Statistical unit
Unique Property Reference Number (UPRN) – a unique identifier for an addressable location, used interchangeably with “premises”. This unique number does not change through time.
3.4 Statistical population
BDUK has its own list of premises that are deemed to be in scope for subsidy. These are based on Ordnance Survey (OS) AddressBase Premium classifications. The categories are designed to include only those locations where a gigabit-capable connection would provide benefit (e.g. dwelling or business). For further details on the premises base, BDUK intends to publish a UPRN-level dataset in winter 2024/2025 that will identify which premises are in scope and in BDUK contracts. Alternatively, you may contact the statistics production team at [email protected] for information on the classifications used.
BDUK’s premises base is different from that used by Ofcom. Most notably, we include “child” premises (e.g. apartments in houses of multiple occupancy) whereas Ofcom does not. It is also distinct from the premises bases used by other providers of UK coverage estimates, such as ThinkBroadband. As a result, coverage estimates cannot be directly compared.
3.5 Reference area
The geographic region covered by the data is the United Kingdom (UK).
3.6 Time coverage
Data covers the period from 1 April 2012 to 31 March 2024. Delivery of gigabit-capable connectivity commenced under the Superfast programme in 2012, but we have condensed all data for premises connected before 31 March 2021 into one category.
3.7 Data collection
Management information for the Gigabit Infrastructure Subsidy (GIS) intervention, referred to here as “Gigabit contracts”, is submitted monthly by suppliers to BDUK in Stage 2 progress reports as spreadsheets through the Atamis contract management system. BDUK conducts a series of data quality checks on these reports and only reports that pass these checks are used for reporting.
Management information for the hubs intervention comes from different sources. For the Local Full Fibre Network (LFFN) and Rural Gigabit Connectivity (RGC) interventions, data is now sourced from two static and previously assured datasets, as the interventions have completed delivery and fully closed down. For the hubs intervention, data is updated monthly by the project delivery team in a delivery tracker spreadsheet, which is manually checked by analysts.
Management information for the vouchers intervention comes through two sources. BDUK holds management information on which premises have been issued vouchers and where vouchers have been connected. We also hold information on vouchers projects, including the agreed list of expected premises passed. We additionally receive quarterly data from active vouchers suppliers (Ready For Service (RFS) data); this includes premises that have been passed as a result of the vouchers project intervention, but where occupiers have not claimed a voucher. We are aware of data quality issues with vouchers RFS data, and only use data submissions that meet our quality checks for reporting purposes.
Management information for the Superfast programme is provided by suppliers on a quarterly basis through one of two routes: either to the local body with whom the Superfast contract is held, or directly to BDUK, where there is an arrangement to do so. These reports are cumulative and contain all delivery within that contract to date.
3.8 Data validation
For all interventions, each UPRN (if supplied) provided by suppliers is checked to ensure that it is a valid UK UPRN, has a valid passed date (the date the supplier provided delivery to that premises) and that the premises is eligible for BDUK subsidy and in the relevant contract.
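The checks described above could be sketched as follows. This is an illustrative outline only, not BDUK's actual implementation: the `DeliveryRecord` fields, the three lookup sets and the rule that a passed date must not fall after the reporting date are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DeliveryRecord:
    uprn: Optional[int]          # None when the supplier did not provide a UPRN
    passed_date: Optional[date]  # date the supplier provided delivery


def validation_failures(record, valid_uprns, eligible_uprns, contract_uprns, report_date):
    """Return the list of failed checks; an empty list means the record passes."""
    failures = []
    # Assumed rule: a passed date must exist and must not be after the reporting date
    if record.passed_date is None or record.passed_date > report_date:
        failures.append("invalid passed date")
    if record.uprn is None:
        # UPRN-based checks apply only where a UPRN was supplied
        return failures
    if record.uprn not in valid_uprns:
        failures.append("not a valid UK UPRN")
    if record.uprn not in eligible_uprns:
        failures.append("not eligible for subsidy")
    if record.uprn not in contract_uprns:
        failures.append("not in the relevant contract")
    return failures
```

Returning every failed check, rather than stopping at the first, makes it easier to feed a full list of issues back to a supplier in one pass.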
3.9 Classification systems
We use multiple classification systems in our tables.
For tables 2 to 5, we classify premises by geography using the June 2024 version of the Office for National Statistics’ (ONS) National Statistics UPRN Lookup (NSUL) and the National Statistics Postcode Lookup (NSPL) in conjunction with the Ordnance Survey’s AddressBase product. We first attempt a match on UPRN, then postcode and finally string extraction from the address. Where none of these have been successful, the premises geography is defined as “Unknown”.
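The matching order (UPRN, then postcode, then the address string) can be illustrated with a small sketch. The lookup dictionaries stand in for the NSUL and NSPL, the postcode pattern is deliberately simplified, and treating "string extraction" as pulling a postcode out of the address text is an assumption; the source does not say what is extracted.

```python
import re

# Simplified UK postcode pattern, for illustration only (an assumption)
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b", re.IGNORECASE)


def normalise_postcode(text):
    """Find a postcode in the text and return it as 'OUTWARD INWARD', else None."""
    match = POSTCODE_RE.search(text)
    if match is None:
        return None
    return f"{match.group(1).upper()} {match.group(2).upper()}"


def assign_geography(uprn, postcode, address, uprn_lookup, postcode_lookup):
    """Try UPRN, then postcode, then the address string; fall back to 'Unknown'."""
    if uprn is not None and uprn in uprn_lookup:
        return uprn_lookup[uprn]
    # Second and third attempts both resolve through the postcode lookup
    for text in (postcode, address):
        if text:
            key = normalise_postcode(text)
            if key in postcode_lookup:
                return postcode_lookup[key]
    return "Unknown"
```

For example, a record with an unmatched UPRN and no postcode field, but an address ending in a recognisable postcode, would still be assigned a geography at the third step.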
For tables 6a to 6c, we classify premises by urban/rural classification using the NSPL 2011 ONS urban/rural classifications for England and Wales, the Scottish Government Urban Rural Classification 2020 for Scotland, and the Northern Ireland Statistics and Research Agency (NISRA) Settlement Development Limits 2015 for Northern Ireland. The following table shows these classifications.
Area | Urban | Rural |
---|---|---|
England and Wales | A1 to C2 | D1 to F2 |
Scotland | 1 to 5 | 6 to 8 |
Northern Ireland | A to E | F to H |
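As a rough illustration, the table can be turned into a lookup function. The function name, the area labels used as keys and the "Unknown" fallback for unrecognised codes are assumptions for this sketch; only the code ranges come from the table.

```python
def classify_urban_rural(area, code):
    """Map a source classification code to 'Urban' or 'Rural' using the table's ranges."""
    code = str(code).strip().upper()
    if not code:
        return "Unknown"
    if area == "England and Wales":
        # Codes A1 to C2 are urban, D1 to F2 are rural: the leading letter decides
        if code[0] in {"A", "B", "C"}:
            return "Urban"
        if code[0] in {"D", "E", "F"}:
            return "Rural"
    elif area == "Scotland":
        if code.isdigit():
            number = int(code)
            if 1 <= number <= 5:
                return "Urban"
            if 6 <= number <= 8:
                return "Rural"
    elif area == "Northern Ireland":
        if code in {"A", "B", "C", "D", "E"}:
            return "Urban"
        if code in {"F", "G", "H"}:
            return "Rural"
    # Assumed fallback for unrecognised areas or codes
    return "Unknown"
```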
For table 7, we classify by whether the premises had Superfast broadband (30 Mbps) before receiving a gigabit-capable connection. For Superfast contracts, we assume that all contracted premises were previously sub-Superfast, as that was the specific intention of the programme; we conservatively assume that overspill premises (see Statistical processing and data compilation for more information on overspill) were not sub-Superfast before receiving a gigabit-capable connection. For Gigabit contracts, the Superfast status of the premises at time of contract is recorded and is used in this categorisation.
For vouchers and hubs premises, we use Ofcom’s Connected Nations data, which is only available from January 2019 onwards and can only be assessed for premises in Ofcom’s premises base. We use the most recent available broadband speed before the date the premises was connected. For vouchers where we use a multiplier approach (see Statistical processing and data compilation for more information on the multiplier approach) and where the Ofcom speed is not available, we assume that the paid-for voucher premises is representative of the additional premises passed, and we use the speed reported by the voucher supplier prior to the upgrade. For any other vouchers or hubs premises where the Ofcom data is not available, we conservatively assume the premises were not sub-Superfast before receiving a gigabit-capable connection.
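The decision rules for table 7 can be summarised in a short sketch. The function and parameter names are hypothetical, and treating "sub-Superfast" as a speed strictly below 30 Mbps is an assumption; the branching follows the description above.

```python
SUPERFAST_MBPS = 30  # Superfast threshold stated in the text


def was_sub_superfast(
    intervention,
    overspill=False,
    sub_superfast_at_contract=None,
    ofcom_speed_mbps=None,
    uses_multiplier=False,
    supplier_speed_mbps=None,
):
    """Return True if the premises is treated as sub-Superfast before delivery."""
    if intervention == "superfast":
        # Contracted premises are assumed sub-Superfast; overspill conservatively is not
        return not overspill
    if intervention == "gigabit":
        # Status recorded at time of contract (hypothetical boolean field)
        return bool(sub_superfast_at_contract)
    if intervention in ("vouchers", "hubs"):
        if ofcom_speed_mbps is not None:
            # Most recent Ofcom speed before the connection date
            return ofcom_speed_mbps < SUPERFAST_MBPS
        if intervention == "vouchers" and uses_multiplier and supplier_speed_mbps is not None:
            # Multiplier vouchers fall back to the supplier-reported speed
            return supplier_speed_mbps < SUPERFAST_MBPS
        # Conservative default when no speed data is available
        return False
    raise ValueError(f"unknown intervention: {intervention}")
```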
For table 8, we classify by BDUK intervention. Premises will only be counted in one intervention. Premises are removed from our counts where the premises has benefitted from more than one intervention, keeping the record with the earliest delivery date. The majority of these premises are related to overspill or indirect subsidy rather than BDUK directly paying to provide a connection to a particular premises more than once.
For table 9, we use the Ofcom premises base. For transparency, BDUK has estimated how many of the premises passed by BDUK interventions also form part of the latest Ofcom Connected Nations ‘premises base’. This is the list of premises that Ofcom collect data on and the basis of their reporting of how many UK premises have access to a gigabit-capable service. Ofcom publishes details on their methodology for identifying their premises base.
BDUK interventions use different eligibility criteria from Ofcom’s premises base. Most notably, we include in scope of our programmes ‘child’ premises (e.g. apartments in houses of multiple occupancy). All other figures in the performance report reflect the full scope of our programmes rather than the Ofcom Connected Nations ‘premises base’. See the Statistical processing and data compilation section for more details on data quality and processing for table 9.
3.10 Statistical concepts and definitions
Premises – a UPRN that is eligible for subsidy
Premises passed – premises are classified as passed by gigabit-capable broadband if it is possible to access a gigabit-capable service for the supplier’s standard price and be connected in the supplier’s standard timescale
Overspill – premises that were not directly funded but received connections as a result of nearby BDUK-funded projects
3.11 Statistical processing and data compilation
The data processing methodology for counting premises involves identifying and reporting premises that have received a gigabit-capable connection due to BDUK interventions. This includes both directly funded premises and those classified as overspill (see Statistical concepts and definitions).
The criteria for counting these overspill premises are:
- there must be intentional overspill delivery for the intervention
- BDUK must hold sufficient quality data to measure overspill
- premises must not have received a gigabit-capable connection without BDUK involvement at the time of contract award
All of BDUK’s interventions have measured intentional overspill delivery; this effect is largest for the vouchers and hubs interventions, which were specifically designed to encourage additional build beyond the premises we directly subsidise.
For Superfast, we know that suppliers built to premises en-route to the subsidised premises that they would not have otherwise built. These premises had a subsidy control status of Superfast grey but gigabit white at the time of contract award. For contracts where we receive data from suppliers on the gigabit status of these Superfast grey premises, we count the grey premises as BDUK-delivered where they would not have received a gigabit-capable intervention without the contract. For contracts awarded under phase 1 and phase 2 of the Superfast programme (2012-2015), we count all grey premises as BDUK-delivered because gigabit-capable coverage in the UK was under 2.5%, so it is unlikely that any premises at that time would have a gigabit-capable connection. For phase 3 contracts, we only count premises where our own cost model suggests that the premises was uncommercial due to high costs (roughly the most expensive 18% of premises in the UK) and where Virgin Media O2 did not have a footprint that would later be switched to gigabit-capable technology.
For vouchers, the intervention is designed to subsidise uncommercial build through demand aggregation. The additional premises are built into the project and suppliers are expected to build to all the premises in the project, but we only pay directly for a subset of these where the occupier takes a gigabit-capable connection and claims a voucher. Suppliers report to BDUK against a list of premises in each project (expected premises passed), which includes both directly subsidised and non-subsidised premises. We check that the UPRNs returned to BDUK by suppliers meet data validation standards, and only count the premises that our own cost model suggests are uncommercial due to high costs. We additionally apply a 5% attrition rate, derived from analysis of Ofcom data, to check that the premises have indeed been built. As of the 2023/24 performance report, 62% of all vouchers premises are produced using this counting methodology.
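A minimal sketch of this counting approach is shown below. The function name and the use of plain sets for validated and uncommercial UPRNs are hypothetical, and applying the 5% attrition rate as a simple scaling of the count is an assumption about how the rate is used.

```python
def count_expected_premises(returned_uprns, validated, uncommercial, attrition_rate=0.05):
    """Estimate premises passed from supplier returns for one vouchers project."""
    # Keep only UPRNs that pass validation and are modelled as uncommercial
    kept = {u for u in returned_uprns if u in validated and u in uncommercial}
    # Assumed use of the attrition rate: scale the count down by 5%
    return len(kept) * (1 - attrition_rate)
```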
For vouchers, where we do not receive expected premises passed data from suppliers, or where this data is of poor quality, we apply a multiplier approach. The multiplier approach is based on completed vouchers projects where we have good quality data. From analysis, we are able to estimate how many additional premises are built for each paid voucher. An attrition rate of 18% is built into the multipliers. We last conducted this analysis in summer 2023 and we have not refreshed the multiplier analysis for 2023/2024 because only around 500 premises per year are now estimated using the multiplier approach. The multiplier values are applied based on when the first voucher was issued: before financial year 2019 to 2020 – 2.1; financial year 2019 to 2020 – 2.2; financial year 2020 to 2021 – 2.1; financial year 2021 to 2022 – 1.8; financial year 2022 to 2023 and onwards – 1.7. As of the 2023/24 performance report, 38% of vouchers premises are estimated using this multiplier methodology; most of these are in older financial years when BDUK did not receive high quality data for vouchers projects from suppliers. As more data is submitted to BDUK, the methodology split may change as calculation methods for each project can switch.
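The multiplier lookup can be sketched as below. The multiplier values and financial-year bands are taken from the paragraph above; the function names are hypothetical, the financial year is assumed to start on 1 April, and treating the estimate as paid vouchers multiplied by the multiplier is an assumption (the source does not state whether the result includes the paid-for premises themselves).

```python
from datetime import date


def financial_year_start(d):
    """Return the calendar year in which the date's financial year began (1 April start)."""
    return d.year if d >= date(d.year, 4, 1) else d.year - 1


def voucher_multiplier(first_voucher_issued):
    """Look up the multiplier from the financial year of the first voucher issued."""
    fy = financial_year_start(first_voucher_issued)
    if fy < 2019:
        return 2.1  # before financial year 2019 to 2020
    if fy == 2019:
        return 2.2
    if fy == 2020:
        return 2.1
    if fy == 2021:
        return 1.8
    return 1.7  # financial year 2022 to 2023 and onwards


def estimate_premises(paid_vouchers, first_voucher_issued):
    """Assumed calculation: paid vouchers multiplied by the project's multiplier."""
    return paid_vouchers * voucher_multiplier(first_voucher_issued)
```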
For hubs, we currently report the number of hubs that were directly subsidised but we intend to revise using a multiplier approach once our research evaluation into the effectiveness of hubs additionality (the number of additional premises built that would not have been built without the hubs intervention) is published.
For Gigabit contracts, we currently report all contracted premises that have been delivered.
We deduplicate UPRNs within and across interventions so any UPRN is only reported once, keeping the record with the earliest date of delivery. Approximately 8,000 premises have been removed from our counts where the premises has benefitted from more than one intervention. The majority of these (5,200) are related to indirect subsidy (that is, where we count a premises as overspill) rather than BDUK directly subsidising the premises more than once.
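Deduplication of this kind could look like the sketch below. The tuple layout `(uprn, intervention, delivery_date)` is hypothetical, and keeping the first record seen when two delivery dates are equal is an assumption, as the source does not describe tie-breaking.

```python
from datetime import date


def deduplicate(records):
    """Keep one record per UPRN: the one with the earliest delivery date.

    Each record is a (uprn, intervention, delivery_date) tuple.
    """
    earliest = {}
    for record in records:
        uprn, _intervention, delivered = record
        # Replace only when strictly earlier, so ties keep the first record seen
        if uprn not in earliest or delivered < earliest[uprn][2]:
            earliest[uprn] = record
    return list(earliest.values())
```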
For table 9, we are unable to report exactly the number of premises passed by BDUK that form part of the Ofcom premises base due to data gaps at UPRN level. To estimate the total number of Ofcom premises passed by BDUK, we assume that this group of unknown premises has the same percentage split between Ofcom and wider premises as the group where it is known, grouped by product and delivery year. We therefore estimate the premises passed within Ofcom’s premises base by calculating the percentage of verified premises passed UPRNs which are in the Ofcom premises base, stratified by product and delivery year, and then multiplying this percentage by the total premises passed for that product and delivery year. These estimates are then summed to give the total estimated delivery in the Ofcom premises base.
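The stratified estimate can be worked through in a few lines. The dictionary keys are hypothetical names for the quantities described above, and skipping strata with no verified UPRNs is an assumption, since the source does not say how such strata are handled.

```python
def estimate_ofcom_premises(strata):
    """Sum, over product and delivery-year strata, share-in-Ofcom times total passed.

    Each stratum is a dict with keys:
      'verified'          - premises passed with a verified UPRN
      'verified_in_ofcom' - of those, how many are in the Ofcom premises base
      'total_passed'      - all premises passed in the stratum
    """
    total = 0.0
    for stratum in strata:
        if stratum["verified"] == 0:
            # Assumed handling: no known split, so the stratum contributes nothing
            continue
        share = stratum["verified_in_ofcom"] / stratum["verified"]
        total += share * stratum["total_passed"]
    return total
```

For instance, a stratum with 80 verified UPRNs, 60 of them in the Ofcom premises base, and 100 premises passed in total would contribute 0.75 × 100 = 75 premises to the estimate.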
4. Quality characteristics
4.1 Quality management
BDUK’s quality assurance strategy sets out minimum standards for quality assurance for analytical products and documentation, as well as roles and responsibilities across the organisation. The quality assurance strategy meets the requirements of the Government Functional Standard for Analysis.
4.2 Relevance
We classify our users into the main categories of industry bodies, interested local groups, and the general public. The statistics provided here are developed to meet the key requirements of all of these users. We recognise that not all BDUK data that our users require should be published as official statistics, and we are working to release key datasets at UPRN level to meet the needs of more technical users interested in unaggregated data.
Shortcomings of our data content include the timeliness of our data (that it takes months after a premises is built for us to report that it has been built) and that we are unable to precisely compare our premises base to Ofcom’s. We also recognise that we are unable to precisely determine when a premises first received a gigabit-capable broadband connection, resulting in a likely overcount of premises where an unsubsidised supplier built to a premises commercially before BDUK’s subsidised supplier built.
4.3 Confidentiality
Information published here is not confidential, except for vouchers issued as those relate to individuals taking up a gigabit-capable connection.
Statistical disclosure control is applied to confidential data in line with the Code of Practice for Statistics and relevant legislation.
4.4 Accessibility and clarity
Data is released through data tables (for professional users) and the text of the annual report and accounts (for both professional and occasional users). These data releases are freely available on GOV.UK and are designed to meet the Web Content Accessibility Guidelines (WCAG) 2.2 standard for web accessibility.
4.5 Accuracy and reliability
BDUK is aware of some accuracy issues with its data. These fall into two main categories: supplier error and matching where there is incomplete information.
Supplier error
BDUK is aware of supplier errors, including missing and inaccurate information, in returns to BDUK. We believe these errors are infrequent but are working with our data teams to improve data assurance as our Gigabit contracts deliver more premises. These data errors generally affect data fields that do not directly drive subsidy payments but may be used in reporting (such as the date of build or the build plans for the infrastructure network). We are also aware of issues where suppliers have believed that a subcontractor delivered a premises but later became aware that this was not true, and subsequently amended their returns data to BDUK.
Incomplete information
BDUK’s data sometimes has incomplete information, especially for its older delivery programmes such as Superfast. For Superfast, there are approximately 7,000 premises where the local body has validated Superfast delivery but the supplier report does not hold the date the connection was provided. These premises passed have been attributed to delivery before 31 March 2021 because they were all delivered through Phase 1 and Phase 2 Superfast contracts.
We do not always have access to full UPRN data for our delivery, mainly affecting legacy reporting, and so have to rely on postcode or other data to provide geographic information. In many cases we are able to match to higher levels of geography (e.g. region) but are unable to match to lower levels of geography (e.g. parliamentary constituency). A manual exercise was performed in 2023 to match null UPRNs to specific locations, but gaps remain where there is incomplete or invalid UPRN and postcode information. Where we do not have UPRN-level information and rely on postcodes, we apply a postcode cap. That is, the maximum number of premises we count in a postcode is the number of premises in BDUK’s premises base in that postcode.
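The postcode cap amounts to taking the smaller of two counts for each postcode, as in this sketch. The function name and inputs are hypothetical, and counting zero premises for a postcode that is absent from the premises base is an assumption.

```python
from collections import Counter


def apply_postcode_cap(reported_postcodes, base_counts):
    """Cap the premises counted in each postcode at the size of the premises base there.

    reported_postcodes: one postcode per reported premises lacking a UPRN
    base_counts: postcode -> number of premises in the premises base
    """
    reported = Counter(reported_postcodes)
    # Assumed: postcodes missing from the premises base are capped at zero
    return {pc: min(n, base_counts.get(pc, 0)) for pc, n in reported.items()}
```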
BDUK has modelled data for the commerciality of a premises; that is, whether a premises is likely to receive a gigabit-capable connection as part of a commercial build project. As outlined in the Statistical processing and data compilation section, we use our own cost model to determine the relative commerciality of premises, which treats 18% of premises as uncommercial due to high costs. We also have incomplete data on which supplier has first built to a premises, making it challenging for us to determine whether the BDUK-subsidised supplier was the first to build to a premises. This is an area of active analysis and research and we are planning to refine our estimates in future; this is likely to result in a small downwards revision to our figures because we may find that a non-BDUK-subsidised supplier built to a premises before the BDUK-subsidised supplier.
For the 2023/24 performance report, BDUK is awaiting adequate quality data from some Superfast suppliers. As a result, we expect to revise the Superfast data for quarter 4 (January to March 2024) upwards (by approximately 500 premises) when we receive complete data.
As we receive more RFS data, which in some cases relates to historical financial years, we are likely to have an increased number of premises with unknown speeds before intervention. This is because with older vouchers data, for which we use the multiplier approach, we apply the speed of the connected premises to the additional multiplied premises. With RFS data, we need to match to Ofcom speed data for each individual premises, which we do not have for every RFS premises.
4.6 Timeliness
BDUK has a time lag for its external reporting due to the time taken to receive data from suppliers, and the statistical production time. Superfast data has the longest time lag; the data arrives 20 working days after the quarter ends, and there are frequently data issues that require manual cleansing or supplier clarification for issues such as rounded or invalid UPRNs and future delivery dates. Final data is expected 40 working days after quarter end.
BDUK is looking to improve the time lag in future by publishing quarterly. We aim to reduce our overall time lag to six weeks after the end of the quarter.
4.7 Coherence and comparability
BDUK is unable to directly compare its data with Ofcom data for two main reasons:
- BDUK’s premises base differs from Ofcom’s premises base; the primary distinction is that our base includes “child” premises (e.g. apartments in houses of multiple occupancy) whereas Ofcom’s does not. As such, it is not possible to calculate BDUK’s contribution towards UK gigabit connectivity by taking the number of premises subsidised by BDUK and dividing by the size of Ofcom’s premises base.
- BDUK does not have access to UPRN-level data from Ofcom’s collections, meaning that we can only reliably look at postcode-level data from Ofcom to compare to our UPRN-level data.
BDUK’s data also cannot be compared directly with ThinkBroadband’s coverage estimates because their raw premises-level coverage data is not available, and their premises base is not publicly defined.
4.8 Cost and burden
We estimate that producing the performance report takes about 50 days full time equivalent. We are seeking to improve efficiency through automating data pipelines and improving the clarity of requirements for supplier data submissions.
4.9 Data revision
Revisions will be conducted under the Department for Science, Innovation and Technology (DSIT) revisions policy with the advice of the Head of Profession for Statistics at DSIT.