Data quality report: national flu and COVID-19 surveillance report
Updated 4 November 2024
Applies to England
About this report
This report explains the data sources used for the national flu and COVID-19 surveillance official statistics reports published by the UK Health Security Agency (UKHSA).
This data quality report assesses the quality of the data used to produce these statistics, in line with the 6 data quality dimensions in the Government Data Quality Framework.
About the statistics
The national flu and COVID-19 surveillance reports contain information from surveillance systems which are used to monitor COVID-19 caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), influenza, and diseases caused by seasonal respiratory viruses in England.
Geographical coverage: England
Publication frequency: Weekly during winter reporting period, fortnightly during summer.
Contact
Lead analysts: Respiratory virus team
Contact information: [email protected]
Laboratory surveillance
Laboratory surveillance comprises 2 main systems: notifiable test reporting through the Second Generation Surveillance System (SGSS) and the Respiratory DataMart sentinel system.
Data sources
Second Generation Surveillance System (SGSS)
SGSS stores and manages laboratory test result information for notifiable infectious diseases and antimicrobial resistance from diagnostic laboratories in England. Notifiable diseases are infections that may present a risk to human health, which include those deemed clinically significant or of epidemiological relevance, and those related to ongoing public health emergencies. SGSS data is used for COVID-19 case reporting, as well as the calculation of COVID-19 and influenza positivity.
COVID-19 case reporting
COVID-19 cases in England are monitored using an episode-based definition to include possible reinfections. Each infection episode, beginning with the earliest positive test date, is counted separately if there are at least 91 days between positive test results (polymerase chain reaction (PCR) or lateral flow device (LFD)).
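The episode rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the production method: the function name is hypothetical, and it assumes the 91-day gap is measured from the person's previous positive test rather than from the start of the current episode.

```python
from datetime import date

EPISODE_GAP_DAYS = 91  # from the text: at least 91 days between positive results


def episode_start_dates(positive_dates, gap_days=EPISODE_GAP_DAYS):
    """Return the start date of each infection episode for one person.

    Assumption: the gap is measured from the previous positive test,
    not from the first test of the current episode.
    """
    starts = []
    previous = None
    for current in sorted(set(positive_dates)):
        if previous is None or (current - previous).days >= gap_days:
            starts.append(current)  # earliest positive test of a new episode
        previous = current
    return starts


# Hypothetical positive test dates (PCR or LFD) for a single person.
tests = [date(2024, 1, 1), date(2024, 1, 20), date(2024, 4, 20), date(2024, 8, 1)]
episodes = episode_start_dates(tests)
```

In this example the test on 20 April 2024 falls exactly 91 days after the previous positive on 20 January 2024, so it opens a second episode.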
The national flu and COVID-19 surveillance report includes the number and rate of confirmed COVID-19 cases where swab testing was carried out in NHS hospitals (largely covering healthcare workers and those in hospital with clinical need).
Further data on COVID-19 cases is available on the UKHSA dashboard.
Influenza positivity
Positivity reports on the proportion of individuals who test positive for influenza among all individuals tested for the respective organisms. Positivity is presented as positivity by PCR testing only. In reports from week 16 2023 onwards, this is presented as a 7-day rolling average, with the number of individuals testing positive during the preceding 7 days divided by the number of individuals tested during the preceding 7 days through PCR testing.
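As a minimal sketch of the 7-day rolling calculation described above, the following Python function sums individuals tested and individuals testing positive over a 7-day window. The function name and input layout are hypothetical, and it assumes the window runs up to and including the reporting day.

```python
from datetime import date, timedelta


def rolling_positivity(daily_counts, end_day, window_days=7):
    """Percentage positive over the window ending on end_day (inclusive).

    daily_counts maps a date to (individuals_tested, individuals_positive).
    Returns None when nobody was tested in the window.
    """
    tested = 0
    positive = 0
    for offset in range(window_days):
        day = end_day - timedelta(days=offset)
        day_tested, day_positive = daily_counts.get(day, (0, 0))
        tested += day_tested
        positive += day_positive
    if tested == 0:
        return None
    return 100.0 * positive / tested


# Hypothetical PCR counts: 100 tested and 10 positive on each of 7 days.
end = date(2024, 11, 3)
counts = {end - timedelta(days=i): (100, 10) for i in range(7)}
counts[end - timedelta(days=7)] = (1000, 500)  # outside the window, ignored
latest = rolling_positivity(counts, end)
```

Summing numerators and denominators before dividing, rather than averaging daily percentages, stops days with few tests from carrying the same weight as days with many tests.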
To ensure accuracy in the positivity metric, we exclude laboratory data from trusts that have not reported negative tests in any week within the last year. This ensures that data is included only from trusts that have consistently and accurately reported test samples, and avoids inflation of positivity rates.
The Unified Sample Dataset (USD)
USD stores all SARS-CoV-2 positive, negative and void test results reported to SGSS, Respiratory DataMart, and UKHSA laboratories. Data from the USD is used for the calculation of COVID-19 positivity.
Positivity reports on the proportion of individuals who test positive for SARS-CoV-2 among all individuals tested for the respective organisms, as described previously for influenza positivity. COVID-19 positivity follows the same exclusion criteria of NHS trusts as influenza positivity; additionally, we only include PCR testing carried out in NHS hospitals (largely covering healthcare workers and those in hospital with clinical need).
Changes to testing policies over time may affect positivity rates and incidence rates, and these figures should be interpreted accordingly. From 1 April 2022, the government ended provision of widespread community COVID-19 testing in England, as outlined in the plan for living with COVID-19. Routine asymptomatic testing through NHS settings was paused from 31 August 2022. Further changes in COVID-19 testing policy have been in effect since 1 April 2023; these ended all PCR testing outside of NHS settings and routine symptomatic testing for staff and residents in social care settings and other settings including prisons, homelessness and refuge settings, and asylum settings.
Relevance
Laboratory surveillance through SGSS and USD provides an overall summary of laboratory-confirmed COVID-19 and influenza activity in the population. Samples are mainly derived from laboratories in secondary care settings and are therefore indicative of activity among populations at risk. Demographic information collated on sample forms and further data linkage allow for activity to be monitored across different populations (for example, by age, sex, ethnicity and deprivation status), localities (by region) and time periods of interest. This informs public health planning and actions, particularly during the winter months.
Strengths and limitations of the data
Laboratory surveillance provides timely detection and accurate data on the activity of respiratory viruses. Data collected in SGSS can also be linked to demographic information such as age, sex, ethnicity, and region. This means we are able to better monitor activity in populations at risk.
Laboratory data follows a slight lag due to collection and reporting requirements. However, as the lag is about 2 to 3 days, the data can be considered near real-time. Additionally, samples are currently only being collected in secondary care settings, which may represent more at-risk populations and may not fully reflect activity in the community.
Data quality
Accuracy
Laboratory surveillance is based on well-validated PCR testing occurring in NHS trusts.
Completeness
Completeness of laboratory surveillance depends on the completeness of laboratory reporting forms and completeness of linkage between laboratory data and electronic health record data. Missing data on date of test, onset date, or inability to link to electronic health record data can affect the completeness of positivity indicators or composite indicators that provide a proxy of incidence.
Uniqueness
COVID-19 laboratory data used in counts and incidence rates are deduplicated at the episode-level, where each infection episode, beginning with the earliest positive test date, is counted separately if there are at least 91 days between positive test results. Infectious organism, specimen date, and patient details including NHS number, forename, surname, hospital number, date of birth, sex, and postcode are used for deduplication.
Influenza and COVID-19 laboratory data used in positivity calculations are deduplicated at the person-day level, where each record indicates how many tests are attributed to a person each day and how many of these tests were positive. NHS number, forename, surname, date of birth, and date of test are used for deduplication.
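The person-day deduplication described above could be sketched as follows. This is an illustration under stated assumptions rather than the production logic: the record layout and function name are hypothetical, and the trimming and lower-casing of names is an assumed normalisation step that the text does not describe.

```python
from collections import defaultdict
from datetime import date


def collapse_to_person_days(records):
    """Collapse test records into one row per person per day.

    Each row holds how many tests the person had that day and how many
    of those tests were positive.
    """
    person_days = defaultdict(lambda: {"tests": 0, "positives": 0})
    for rec in records:
        key = (
            rec["nhs_number"],
            rec["forename"].strip().lower(),  # assumed normalisation
            rec["surname"].strip().lower(),   # assumed normalisation
            rec["date_of_birth"],
            rec["test_date"],
        )
        person_days[key]["tests"] += 1
        if rec["result"] == "positive":
            person_days[key]["positives"] += 1
    return dict(person_days)


# Hypothetical records: two tests on one day and one test on the next day.
person = {"nhs_number": "0000000000", "forename": "Alex", "surname": "Smith",
          "date_of_birth": date(1980, 5, 1)}
records = [
    {**person, "test_date": date(2024, 11, 4), "result": "negative"},
    {**person, "forename": " alex ", "test_date": date(2024, 11, 4), "result": "positive"},
    {**person, "test_date": date(2024, 11, 5), "result": "negative"},
]
rows = collapse_to_person_days(records)
```

A person-day can then be counted once in the denominator and, if any of its tests were positive, once in the numerator.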
Consistency
Laboratory data in SGSS and USD collected in hospital settings are linked to demographic data from the NHS, which is also used in other healthcare IT systems across England, ensuring consistent patient data.
Timeliness
Laboratory data are reported on a daily basis but follow a lag as samples need to be transported to the relevant trust laboratory, processed, and reported. Typically, there is a lag of around 2 to 3 days between the date a sample is taken and the date the result is reported.
Validity
Validity checks are undertaken, for example, the sample date reported should be before the laboratory received date. Laboratory surveillance is based on well-validated PCR testing occurring in NHS trusts.
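As a brief illustration of the date-order check mentioned above, the sketch below flags records whose sample date falls after the laboratory received date. The field names are hypothetical, and it assumes that a sample received on the day it was taken is valid.

```python
from datetime import date


def invalid_date_order(records):
    """Return records whose sample date is later than the received date."""
    return [r for r in records if r["sample_date"] > r["received_date"]]


# Hypothetical records; the second has its dates the wrong way round.
records = [
    {"id": 1, "sample_date": date(2024, 11, 1), "received_date": date(2024, 11, 2)},
    {"id": 2, "sample_date": date(2024, 11, 5), "received_date": date(2024, 11, 3)},
    {"id": 3, "sample_date": date(2024, 11, 4), "received_date": date(2024, 11, 4)},
]
flagged = invalid_date_order(records)
```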
Respiratory DataMart
The Respiratory DataMart sentinel system was initially set up in 2009 to automate the collection of all influenza A(H1N1)pdm09 laboratory testing information in England. It is now an important sentinel laboratory surveillance tool, monitoring all major respiratory viruses in England, with 17 laboratories contributing data at present through weekly automatic electronic outputs. This includes the national reference laboratory (Respiratory Virus Unit (RVU), UKHSA Colindale), regional public health laboratories based on hospital sites, and NHS partner laboratories.
Participating laboratories test swabs for respiratory viruses using real-time polymerase chain reaction (RT-PCR), though not all laboratories test for or report all viruses. Tests that DataMart records results for include:
- SARS-CoV-2
- influenza
- rhinovirus
- parainfluenza
- adenovirus
- human metapneumovirus
- respiratory syncytial virus (RSV)
Samples are predominantly from hospital patients, rather than community or primary care, particularly for those who have been admitted or are in the process of being admitted.
Relevance
As denominator data is available (the total number of patients tested for each virus), we can examine trends in the proportion of samples positive for each virus on a weekly basis. Importantly, this functions as a pseudoprevalence indicator that is relatively stable over time (week to week and year to year). It is less affected than indicators based on counts or rates by changes in the use of testing, which can cause apparent changes in disease prevalence or incidence.
Most testing reported in Respiratory DataMart is done in patients with a clinical indication for a respiratory virus test.
Note that as the Respiratory DataMart system is based on a sample of sentinel laboratories, positivity figures may differ from those obtained through other surveillance systems.
Strengths and limitations of the data
DataMart is a sentinel system rather than comprehensive.
As is the case across the health system, viral typing and subtyping is incomplete.
Historically, many SARS-CoV-2 tests have been done as part of admissions screening. For those periods, this caused a relative suppression of positivity (by increasing the denominator among patients less likely to test positive) compared to other viruses for which testing is generally only done for clinical diagnostic reasons (for example, in patients with acute respiratory infections), notwithstanding the impact of pandemic control measures on viral transmission in general.
Data quality
Respiratory DataMart has a range of measures in place to strengthen data validity; however it is a surveillance system operating in near real time on clinical data for the purposes of public health monitoring; quality aspects such as timeliness and meeting information needs are as important as validity.
DataMart reported testing is by PCR, which has high specificity for each pathogen and high sensitivity for detection of pathogen nucleic acid.
There is some deduplication within patient episodes to avoid over-reporting where a patient has multiple tests within the same clinical episode. Deduplication can additionally be performed between datasets such as DataMart and SGSS for uses such as vaccine effectiveness analysis by UKHSA scientists.
Data are received daily, then aggregated and reported weekly. As this is live clinical data, retrospective updates are common and positivity may commonly change slightly in the weeks immediately following initial reporting.
Community surveillance
Data sources
Acute respiratory infection (ARI) incidents
Information on acute respiratory infection (ARI) incidents is based on situations reported to UKHSA health protection teams (HPTs) and entered onto the Case and Incident Management System (CIMS).
These include confirmed outbreaks of acute respiratory infections (2 or more laboratory-confirmed cases of SARS-CoV-2, influenza or other respiratory pathogens) linked to a particular setting, as well as situations where an outbreak is suspected. All suspected outbreaks are further investigated by the HPT in liaison with local partners. Respiratory sampling to identify the virus involved is encouraged.
Relevance
ARI outbreak data provides an insight into the presence of ARIs in various institutional settings, including care homes, hospitals, educational settings, and prisons. It provides an indication of the number of confirmed and suspected outbreaks across these settings, as well as the proportion of outbreaks caused by various pathogens, including influenza and SARS-CoV-2.
In the national flu and COVID-19 surveillance reports, we present:
- the number of ARI incidents by principal setting
- the number of ARI incidents in all settings by virus type
Data is presented for the previous 52 weeks up to the current reporting week.
It should be noted that the incidents captured on CIMS represent a subset of all ongoing ARI outbreaks and clusters in England, rather than an exhaustive listing.
Strengths and limitations of the data
Accuracy of ARI incident data is dependent on the quality of reports in CIMS, and we understand that reporting practices vary significantly between HPTs.
Several caveats should be considered when interpreting ARI incident data:
- the incidents captured on CIMS represent a subset of all ongoing ARI clusters and outbreaks in England, rather than an exhaustive listing
- the denominators for different settings vary significantly (that is, there are fewer hospitals than workplaces) as does the propensity to report incidents to UKHSA. As such, comparisons between settings on the basis of this data are not advised
- prior to July 2024, ARI incidents were recorded in HPZone, a previous case and incident management system. From 2 July 2024, HPTs began to transition to the new CIMS system, with the last HPT expected to complete the transition to CIMS in September 2024. Any interpretation of seasonal and temporal trends since 2 July 2024 (week 27 2024) should consider the likelihood of differences in reporting of ARI incidents by HPTs due to this change
- in addition, SARS-CoV-2 testing policies and public health guidance for different settings have changed over time (for example, from 1 April 2023, changes to COVID-19 testing came into effect) and any interpretation of seasonal and temporal trends since March 2020 should take this into account
Data quality
Accuracy
See strengths and limitations above. The virological result, primary context and data week data are manually reviewed during the data cleaning process to ensure data are as accurate as possible. Free text fields are reviewed where possible to confirm accuracy of other fields.
Completeness
The incidents captured on CIMS represent a subset of all ongoing ARI clusters and outbreaks in England, rather than an exhaustive listing. Propensity to report incidents to UKHSA varies significantly between settings. Therefore, comparisons of ARI outbreaks between settings using this data are not advised.
Incident reports are manually reviewed during the data cleaning process, and any missing fields are completed where they can be confirmed using information and context from elsewhere in the record.
For the purposes of this report, incidents are assigned to a specific pathogen only if confirmation of a positive virological test can be identified. ARI incident data typically shows a high proportion (around 50%) of incidents where the infectious agent has been categorised as ‘not available / not tested’. This occurs in instances where:
- virological testing has not been performed or has been performed but has not returned a positive result for any pathogen
- virological testing has returned a positive test, but this has not been recorded in CIMS, or has been recorded in a field that could not be extracted for the purposes of this report
Uniqueness
ARI incident records are manually reviewed each week, and any outbreaks reported with the same postcode in the same week are investigated further, with the aim of ensuring each outbreak is included in the data only once. This process ensures minimal duplication of records.
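The first step of this review, finding outbreaks reported with the same postcode in the same week, could be sketched as below. The record fields and function name are hypothetical, weeks are assumed to be ISO weeks, and the postcode clean-up is an assumed normalisation. The sketch only lists candidates for manual investigation and does not remove anything.

```python
from collections import defaultdict
from datetime import date


def duplicate_candidates(incidents):
    """Group incident IDs by (postcode, ISO year, ISO week).

    Returns only the groups with more than one incident, which would
    then be reviewed by hand.
    """
    groups = defaultdict(list)
    for incident in incidents:
        iso = incident["reported"].isocalendar()
        postcode = incident["postcode"].replace(" ", "").upper()
        groups[(postcode, iso[0], iso[1])].append(incident["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical incidents: the first two share a postcode and reporting week.
incidents = [
    {"id": 1, "postcode": "SW1A 1AA", "reported": date(2024, 11, 4)},
    {"id": 2, "postcode": "sw1a1aa", "reported": date(2024, 11, 6)},
    {"id": 3, "postcode": "SW1A 1AA", "reported": date(2024, 11, 12)},
]
candidates = duplicate_candidates(incidents)
```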
Consistency
There are no further data sources available against which ARI incident data can be checked. However, each week’s data is compared to the previous week to ensure consistency in ARI data across the season.
Timeliness
Data are received from CIMS on Monday mornings and include incidents entered into CIMS over the previous 7 days. The report will usually be published 3 days after the data was extracted.
In the majority of incidents, the recorded outbreak start date is the same as the date the record was entered into CIMS.
Validity
Validity checks are undertaken during the data cleaning process (for example, all dates converted to dd/mm/yyyy date format and confirmed to fall within the expected range).
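A date check of this kind might look like the following sketch. The function name and the choice of range are hypothetical; the sketch parses a dd/mm/yyyy string and treats unparseable or out-of-range values as invalid.

```python
from datetime import date, datetime


def parse_and_check(raw, earliest, latest):
    """Parse a dd/mm/yyyy string and confirm it falls within a range.

    Returns the parsed date, or None if the value cannot be parsed or
    lies outside [earliest, latest].
    """
    try:
        parsed = datetime.strptime(raw.strip(), "%d/%m/%Y").date()
    except ValueError:
        return None
    if earliest <= parsed <= latest:
        return parsed
    return None


# Hypothetical expected range: the 52 weeks up to a reporting date.
window_start = date(2023, 11, 6)
window_end = date(2024, 11, 3)
valid = parse_and_check("01/10/2024", window_start, window_end)
impossible = parse_and_check("31/02/2024", window_start, window_end)
out_of_range = parse_and_check("01/10/2034", window_start, window_end)
```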
FluSurvey
FluSurvey is an online surveillance system based on the Influenzanet platform, a Europe-wide initiative that includes 11 other European countries. It was developed to monitor self-reported respiratory symptoms, social contact patterns and health service use in the UK general population in near-real time through a weekly survey of registered participants. It was originally set up by the London School of Hygiene and Tropical Medicine (LSHTM) during the 2009 H1N1 pandemic and is now managed by UKHSA.
FluSurvey typically operates during the autumn and winter periods when seasonal respiratory infections are expected to be most common. However, participants supported year-round surveillance during the COVID-19 pandemic.
As a voluntary participatory surveillance system, FluSurvey is open to adult residents aged 18 years and over who are willing to report the presence or absence of symptoms related to respiratory infections. To encourage participation, there is no specific population sampling.
In addition to reporting demographic, geographic, socioeconomic, and health data at registration, participants are sent weekly reminders via email to report any symptoms relating to flu or other respiratory illnesses that they may have experienced. Those reporting symptoms can also report any health service use as a result of their symptoms. During the 2023 to 2024 surveillance period, there were over 2,500 registered participants. To register or find out more, visit FluSurvey.
The indicators presented in the 2024 to 2025 weekly report include participants residing in England only. These indicators are calculated using the weekly symptoms data:
1) Influenza-like-illness (ILI), as an indicator of symptoms related to influenza in the community.
Counts and percentages of weekly ILI cases are calculated in each week using the European Centre for Disease Prevention and Control (ECDC) ILI case definition of a sudden onset of symptoms with at least one of fever (chills), malaise, headache, muscle pain and at least one of cough, sore throat, shortness of breath. This indicator will also reflect other respiratory infections with compatible presentations.
Counts are additionally transformed into rates per 1,000 participants by dividing the number of participants meeting the ILI case definition by the number of participants completing the weekly symptoms questionnaire and multiplying by 1,000.
2) Fever or cough, as an indicator of broader respiratory virus activity (compared to ILI)
Counts and percentages of participants reporting fever or cough are calculated in each week.
Counts are additionally transformed into rates per 1,000 participants by dividing the number of participants reporting fever or cough by the number of participants completing the weekly symptoms questionnaire and multiplying by 1,000.
3) Health service use and type among participants meeting the ILI case definition, as an indicator of the proportion of participants that are subsequently seen in a healthcare setting.
Counts and percentages of participants reporting any health service use are calculated in each week among participants meeting the ILI case definition. The type of healthcare use (phoning 111, phoning GP, visiting GP, visiting hospital) is additionally reported. Where participants report contact with multiple health services, secondary care will be indicated over primary care use and physical attendance to primary care will be indicated over the use of remote services (for example, telephoning their GP or 111).
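The three indicators above can be illustrated with a short Python sketch. It is not the FluSurvey analysis code: the function names, symptom labels and response layout are hypothetical. It also makes two assumptions: fever and chills are treated as alternatives within the first symptom group, and phoning a GP is ranked above phoning 111, an ordering the text does not specify.

```python
SYSTEMIC = frozenset({"fever", "chills", "malaise", "headache", "muscle pain"})
RESPIRATORY = frozenset({"cough", "sore throat", "shortness of breath"})

# Highest priority first: secondary care over primary care, and physical
# attendance over remote contact. The order of the two phone options is
# an assumption.
SERVICE_PRIORITY = ["visiting hospital", "visiting GP", "phoning GP", "phoning 111"]


def meets_ili_definition(sudden_onset, symptoms):
    """Sudden onset plus at least one symptom from each group."""
    reported = set(symptoms)
    return bool(sudden_onset and reported & SYSTEMIC and reported & RESPIRATORY)


def rate_per_1000(count, respondents):
    """Rate per 1,000 participants completing the weekly questionnaire."""
    if respondents == 0:
        return None
    return 1000.0 * count / respondents


def highest_priority_service(services):
    """Return the single service type reported for a participant."""
    for service in SERVICE_PRIORITY:
        if service in services:
            return service
    return None


# Hypothetical weekly responses from 4 participants.
responses = [
    {"sudden_onset": True, "symptoms": ["fever", "cough"],
     "services": ["phoning 111", "visiting GP"]},
    {"sudden_onset": True, "symptoms": ["headache"], "services": []},
    {"sudden_onset": False, "symptoms": ["fever", "cough"], "services": []},
    {"sudden_onset": True, "symptoms": ["malaise", "sore throat"], "services": []},
]
ili_cases = [r for r in responses
             if meets_ili_definition(r["sudden_onset"], r["symptoms"])]
ili_rate = rate_per_1000(len(ili_cases), len(responses))
service_used = highest_priority_service(ili_cases[0]["services"])
```

Here 2 of the 4 respondents meet the ILI case definition, and the first is reported under visiting GP because physical attendance is indicated over the 111 call.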
Relevance
FluSurvey data are provided by members of the general population without limiting to those seeking healthcare, enabling us to understand respiratory disease activity in the community.
Strengths and limitations of the data
FluSurvey is a fast, near real-time surveillance system, providing an early indicator of respiratory virus activity in the community. It is an important tool for surveillance as not all people experiencing symptoms of respiratory illnesses visit healthcare services. However, we note that ILI is a broad symptomatic indicator and does not exclude other respiratory infections such as COVID-19. In addition, users may not be representative of the general population. The number of participants completing weekly symptoms surveys may also vary and could be affected by onset of a recent illness.
Data quality
Accuracy
The accuracy of the data collected is reliant on participants’ recollection of their symptoms in the reporting week and hence could be affected by recall bias. However, as participants are asked to enter symptoms experienced in the previous 7 days, the length of time between symptom onset and reporting may be reduced, minimising this effect.
Completeness
The indicators included in the report are based on self-reported symptoms and participants can select no symptoms where relevant. All participants responding to the symptoms survey in that week are included in analyses. The number of participants reporting in each week may vary based on new registrations and the regularity of participation over the season. During the 2023 to 2024 season, the average weekly participation was 62.8% among all registered participants.
Uniqueness
Deduplication occurs as part of the weekly data analysis to ensure that participant symptom data are only included once in a reporting week using the unique user ID assigned upon registration.
Consistency
Weekly data does not contain personal identifiers and consistency cannot be checked against external data sources. However, the survey was designed to methodically measure symptoms of respiratory illnesses and subsequent use of healthcare services and consistency between related questions can be checked.
Timeliness
As a community surveillance system, FluSurvey can provide one of the earliest indicators of respiratory illness activity in the population. In addition, weekly symptoms data are extracted and analysed in the subsequent week allowing for timely insights on activity in the community.
Validity
As the indicators are based on participants’ selection of self-reported symptoms, which cannot be compared to other data sources, no validity checks can be performed at the individual level.
FluDetector
UKHSA’s predecessor organisation, Public Health England, worked with University College London (UCL) to assess the use of internet-based search queries as a surveillance method for ILI in England. Combining natural language processing and machine learning techniques, a non-linear Gaussian process model was developed by UCL to produce real-time estimates of ILI. This work on early-warning surveillance systems for influenza was developed through the Engineering and Physical Sciences Research Council (EPSRC) Interdisciplinary Research Collaboration (IRC) project i-sense.
The supervised model was trained on historical national-level data from the Royal College of General Practitioners (RCGP) sentinel surveillance scheme, covering the 2005 to 2006 season through to the 2016 to 2017 season. It produces daily ILI estimates based on the proportion of ILI-related search queries within a 10 to 15 percent sample of all queries issued; this search data is extracted daily from Google’s Health Trends API.
Further information on the FluDetector model is available online.
Syndromic surveillance systems
Syndromic surveillance is the process of collecting, analysing and interpreting health-related data to provide an early warning of human or veterinary public health threats that require public health action.
UKHSA’s real-time syndromic surveillance team (ReSST) coordinates several national syndromic systems by collecting and analysing anonymised health data from several sources, looking for trends indicating higher-than-usual levels of illness and publishing bulletins to keep public health professionals up to date.
Primary care surveillance
Data sources
Primary care surveillance is run by the Oxford RCGP (Royal College of General Practitioners) Research and Surveillance Centre (Oxford-RCGP RSC) in collaboration with UKHSA. The RSC is a national sentinel surveillance system of around 2,000 GP practices covering over 19 million registered patients of all ages across England. Two components of primary care surveillance are included in the report and are outlined below:
1) Syndromic surveillance
This is used to monitor the rate of clinical presentations to primary care of a possible acute respiratory tract infection (ARI). The ARI diagnosis definition is subclassified into ILI, exacerbation of chronic respiratory disease (ECLD), lower respiratory tract infection (LRTI) and upper respiratory tract infection (URTI). Coded electronic health record data are provided by the EMIS and TPP SystmOne clinical systems.
GP consultation rates per 100,000 population are calculated as the number of consultations coded with the condition (for example, ILI or URTI) in a week, divided by the number of patients registered in the included practices that week, and multiplied by 100,000.
To aid interpretation of the rates of influenza like illness and comparison with previous years, the UK has adopted a standardised method of reporting influenza activity, the Moving Epidemic Method (MEM), also used by the European Centre for Disease Prevention and Control.
MEM uses historical data to evaluate the timing and duration of an influenza epidemic through a series of cut points: baseline, low, medium, high and very high thresholds. The initial baseline threshold, once breached, typically denotes the start of influenza activity or circulation, with the breach of subsequent thresholds denoting the intensity of influenza activity in a particular season.
This method allows for comparability between the countries of the UK and with other European countries as we know there are differences in the sensitivity of these reporting systems, due for example to differences in patient consulting behaviour. It is also important to note that since the COVID-19 pandemic, healthcare-seeking behaviours have changed, affecting patient propensity to consult their GP.
The current MEM thresholds used in the weekly national influenza and COVID-19 surveillance report are based on data from the 2016 to 2017 to the 2023 to 2024 seasons. Data from 2020 to 2021 and 2021 to 2022 seasons have been excluded from these calculations due to low activity. The MEM thresholds for this year are:
- baseline: <8.54
- low: 8.54 to <16.27
- medium: 16.27 to <38.66
- high: 38.66 to <56.68
- very high: 56.58+
2) Virological surveillance
A subset of Oxford-RCGP RSC network practices (around 300) take part in weekly virology surveillance. Practices collect nasopharyngeal samples from patients presenting to their GP with symptoms of any ARI with an onset date within the last 10 days. This may include samples taken in a clinical setting or self-samples submitted through Take a Test UK. The samples are sent to the UKHSA Reference Laboratory and tested for seasonal coronaviruses (NL63, 229E, OC43 and HKU1), SARS-CoV-2, Influenza (A H1N1, A H3N2 and B), RSV (A and B), human metapneumovirus (hMPV), adenovirus, human rhinovirus, and enterovirus.
Samples where influenza, SARS-CoV-2, adenovirus or RSV is detected may be analysed further. A subset undergoes genetic characterisation by whole genome sequencing using Illumina technology to investigate the breadth of circulating viral strains. If complete genomes are generated, the information is shared with the international surveillance network by uploading sequence data directly onto GISAID (for influenza and RSV), while SARS-CoV-2 sequences are deposited onto CLIMB and ENA. Genetic data for influenza is further analysed and screened for mutations in the virus neuraminidase (NA) and cap-dependent endonuclease (PA) genes known to confer neuraminidase inhibitor or baloxavir resistance, respectively.
Furthermore, influenza and RSV viruses are cultured from suitable clinical materials. Influenza viruses that can be cultivated in sufficient quantity undergo haemagglutination inhibition (HI) assays (antigenic analysis). This data is used to compare how similar the currently circulating influenza viruses are to the strains included in seasonal influenza vaccines, and to monitor for changes in circulating influenza viruses. The interpretation of genetic and antigenic data sources is complex due to a number of factors, for example, not all viruses can be cultivated in sufficient quantity for antigenic characterisation, so that viruses with sequence information may not be able to be antigenically characterised as well.
Positivity is calculated as the number of individuals testing positive divided by the number of individuals tested during a week.
Linkage of the clinical data and the virological data may be used to run composite indicators that provide a proxy for the incidence of primary care attendances due to specific pathogens.
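The two weekly calculations described in this section, the GP consultation rate per 100,000 population and virological positivity, can be written out as a minimal sketch. The function names and example figures are hypothetical and are not taken from the surveillance data.

```python
def rate_per_100k(coded_consultations, registered_patients):
    """Weekly consultations for a coded condition per 100,000 registered patients."""
    if registered_patients <= 0:
        return None
    return 100_000 * coded_consultations / registered_patients


def weekly_positivity(individuals_positive, individuals_tested):
    """Percentage of individuals tested in a week who tested positive."""
    if individuals_tested == 0:
        return None
    return 100.0 * individuals_positive / individuals_tested


# Hypothetical week: 1,900 ILI consultations among 19 million registered
# patients, and 30 individuals positive out of 200 tested.
ili_rate = rate_per_100k(1_900, 19_000_000)
positivity = weekly_positivity(30, 200)
```

Both functions return None rather than raising an error when the denominator is empty, which can happen in weeks where practices are excluded or no samples are received.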
Relevance
Primary care syndromic and virological surveillance provides an early indication of respiratory virus activity in the population. It also provides an indication of the different respiratory pathogens responsible for different clinical syndromes presenting to primary care at different time periods and in different populations. Indicators such as the incidence of primary care ARI attendances due to influenza may be used to inform NHS winter planning and interventions such as vaccination or use of antivirals.
Strengths and limitations of the data
Primary care syndromic surveillance can provide one of the earliest indicators of a change in respiratory virus activity as patients typically present earlier to primary care than to secondary care and it does not rely on laboratory testing which may follow a lag. Nevertheless, it is a non-specific indicator and may not be able to differentiate presentations due to one pathogen from another or presentations due to non-infectious causes.
Virological surveillance follows more of a lag but is pathogen specific. Furthermore, all samples are tested for a panel of respiratory viruses at the national reference laboratory. This allows the relative burden of different respiratory viruses to be assessed. It is the only dataset where all influenza, RSV and adenovirus positive samples are typed and, if appropriate, subtyped. Furthermore, genetic and antigenic characterisation is used to assess how closely circulating respiratory viruses match those included in seasonal vaccination programmes and provides additional information on sensitivity to antiviral drugs.
Data quality
Accuracy
Accuracy of the syndromic data depends on the quality of clinical coding. Where information in the clinical record is written in free text and not coded or miscoded this may affect the accuracy of syndromic indicators. First episodes and new episodes of illness are identified using an episode typing algorithm, as episode type is not always recorded in the clinical data or may be unreliable due to defaults in the configuration of clinical systems. RCGP-RSC and UKHSA run regular training sessions with practices in the network to ensure the accuracy of coding.
The virological surveillance is based on test results obtained using in-house developed, validated and verified PCR platforms and sequencing algorithms undertaken at the national reference laboratory, which holds ISO 15189:2022 accreditation status.
Completeness
Completeness of the syndromic data depends on the completeness of clinical coding. Where information in the clinical record is written in free text and not coded this will not be reflected in the syndromic indicators. RCGP-RSC and UKHSA run regular training sessions with practices in the network to improve completeness of coding.
Valuesets for identifying ARI are updated periodically to reflect major SNOMED CT updates.
Completeness in the RCGP RSC data may also be affected by extraction issues and/or delays in extracting the data from GP practice clinical systems. For each GP practice, the completeness and quality of data received for a particular ISO week is assessed and practices are excluded from calculations of incidence rates in any weeks where data quality thresholds have not been met. These weekly practice level checks take account of known import errors and involve identifying days with missing data, the number of patients for whom data has been received, and the total number of monitored events. Threshold limits are used to identify practices where the total monitored event rate was too low or too high, which are then excluded from the extract.
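The weekly practice level checks can be illustrated with a minimal sketch. The report does not publish the threshold values or the exact rule for missing days, so the function name, the parameters and the rule that every expected day must be present are all assumptions made for illustration.

```python
def practice_included(days_with_data: int, expected_days: int,
                      patients: int, events: int,
                      min_rate: float, max_rate: float) -> bool:
    """Return True if a practice's extract for one ISO week passes all checks.

    All thresholds are hypothetical inputs; the report does not state them.
    """
    # Assumption: a week fails if any expected day has no data.
    if days_with_data < expected_days:
        return False
    # No patient denominator received, so no rate can be calculated.
    if patients <= 0:
        return False
    # Exclude practices whose monitored event rate is too low or too high.
    event_rate = events / patients
    return min_rate <= event_rate <= max_rate
```

Practices for which this returns False would be left out of the incidence rate calculation for that week only, and can re-enter in later weeks.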
Completeness of virological surveillance depends on the completeness of laboratory reporting forms and completeness of linkage between laboratory data and electronic health record data. Missing data on onset date or inability to link to electronic health record data can affect the completeness of positivity indicators or composite indicators that provide a proxy of incidence.
Uniqueness
Deduplication occurs as part of the weekly data analysis to ensure that sample tests are only counted once within a week, so there should be minimal duplication of records. Surname, forename, date of birth, NHS number, sex, week number, year and sample results are used for deduplication.
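A minimal sketch of deduplication on these fields is shown below. It assumes each sample record is held as a dictionary; the field names and the `deduplicate` helper are hypothetical and are not taken from the production analysis.

```python
from typing import Dict, Iterable, List

# Fields the report lists as the deduplication key (field names are assumed).
DEDUP_KEY = ("surname", "forename", "date_of_birth", "nhs_number",
             "sex", "week_number", "year", "sample_result")


def deduplicate(records: Iterable[Dict]) -> List[Dict]:
    """Keep the first record seen for each combination of key fields."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record.get(field) for field in DEDUP_KEY)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Because week number and year are part of the key, the same patient with the same result in a later week is retained as a separate record.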
Patients are included in the aggregate clinical data if they have a regular registration at the GP practice and are registered at the practice at the start of the relevant ISO week. Patients registered as temporary patients are excluded.
Consistency
Electronic health record data is anonymised and consistency cannot be checked against other data sources.
The presentation data reported each week may change over time as data for previous ISO weeks are recalculated each week. This allows historic data for new practices that have joined the network to be included and also for practices whose data extraction was delayed and patients whose data was recorded on a later date to be included in subsequent extracts. Patients who opt out of data sharing will cease to be included in the extract once their opt out status has been received.
Data in the RCGP RSC database might also have changed slightly since the data were first reported due to edits/corrections made to medical records or corrections to patient month/year of birth.
Timeliness
Syndromic data are reported on a weekly basis. Data are received on Tuesday afternoon and Wednesday morning and include data up to the preceding Sunday. The report will usually be published about 2 days after the data was extracted. There are typically lags of 3 to 7 days between an individual developing symptoms and the date they present to primary care.
Virological data are also reported weekly but follow a greater lag as the sample needs to be transported to the national reference laboratory, tested and reported. Typically, there is a lag of around 2 to 4 days between the date a sample is taken and the date the result is received.
Validity
Validity checks are undertaken. For example, the sample date reported should be before the laboratory received date and after the symptom onset date.
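The date ordering check can be sketched as follows. This is an illustration only: the function name is hypothetical, and accepting same-day values and skipping the onset comparison when onset date is missing are assumptions rather than documented rules.

```python
from datetime import date
from typing import Optional


def dates_are_valid(onset: Optional[date], sample: date, received: date) -> bool:
    """Return True if onset <= sample <= received.

    Assumptions: same-day values are accepted, and a missing onset date
    does not by itself fail the check.
    """
    if sample > received:
        return False
    if onset is not None and sample < onset:
        return False
    return True
```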
Test results are not reported until technical and clinical validation is complete. Failed runs are rejected, and samples re-tested from original material. Molecular typing/subtyping results must be in concordance with sequencing results.
Clinical presentation data are assigned to an ISO week according to the event date recorded in the primary care records, rather than the date it was recorded in the clinical system.
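As an illustration of assigning an event to an ISO week, the sketch below uses the Python standard library. The function name is hypothetical; the point is that the event date, not the date of entry into the clinical system, determines the week.

```python
from datetime import date
from typing import Tuple


def iso_week_of_event(event_date: date) -> Tuple[int, int]:
    """Return (ISO year, ISO week) for the date the clinical event occurred."""
    iso = event_date.isocalendar()
    return (iso[0], iso[1])
```

An event dated Monday 30 December 2024 falls in ISO week 1 of 2025, so the ISO year should be taken from the same calculation rather than from the calendar year of the date.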
Secondary care surveillance (SARI Watch)
Data sources
The severe acute respiratory infection (SARI) Watch surveillance system was established in 2020 to report the number of laboratory-confirmed influenza, COVID-19 and RSV cases admitted to hospital and critical care units (ICU or HDU) in acute NHS trusts in England. The number of extracorporeal membrane oxygenation (ECMO) admissions is also collected through enhanced surveillance. SARI Watch integrated and replaced surveillance systems used in previous seasons – the UK Severe Influenza Surveillance System (USISS) and the COVID-19 Hospitalisations in England Surveillance System (CHESS). SARI Watch operates mandatory and sentinel collections. The mandatory collections require all acute NHS trusts in England to report data. The sentinel collections are voluntary with data reported by a subset of NHS trusts.
Aggregate level data are submitted weekly by acute NHS trusts in England. A week is based on the ISO week system running from Monday to Sunday. Counts are summed and converted to rates per 100,000 by linking to catchment populations, derived by the Office for Health Improvement and Disparities (OHID), for participating trusts in that week. Overall national (all ages), age-specific and region-specific rates are calculated.
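The rate calculation can be sketched as below. The function and argument names are hypothetical, and the sketch assumes the denominator is the summed catchment population of only those trusts that submitted data in the week.

```python
from typing import Dict


def rate_per_100k(counts_by_trust: Dict[str, int],
                  catchment_by_trust: Dict[str, float]) -> float:
    """Weekly admission rate per 100,000 catchment population.

    Only trusts that submitted a count for the week contribute to the
    numerator and the denominator. Zero counts (nil returns) are included.
    """
    reporting = [t for t in counts_by_trust if t in catchment_by_trust]
    total_cases = sum(counts_by_trust[t] for t in reporting)
    total_population = sum(catchment_by_trust[t] for t in reporting)
    if total_population == 0:
        raise ValueError("no catchment population for reporting trusts")
    return 100_000 * total_cases / total_population
```

The same function could be applied to counts and catchment populations restricted to one age group or region to obtain the stratified rates.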
The following collections are used for the national weekly reports.
COVID-19
Collection: new weekly COVID-19 admissions admitted to hospital (all levels of care) and ICU-HDU:
- counts of test confirmed SARS-CoV-2 cases admitted to hospital (all levels of care inclusive of ICU-HDU) by age group
- counts of test confirmed SARS-CoV-2 cases admitted to ICU or HDU by age group
This is a mandatory collection, launched in March 2020. Currently, reporting is operating all year round. COVID-19 admission activity has not settled into a typical winter seasonal pattern and the year-round vigilance remains important for this relatively new pathogen.
Individuals admitted with a pneumonia, acute respiratory infection or influenza-like-illness with test-confirmed SARS-CoV-2 infection can be included (PCR, molecular point of care or lateral flow device are acceptable for case confirmation).
Influenza
The weekly numbers by influenza subtype and type are presented alongside rates to provide further insight into dominant influenza subtypes affecting admissions.
MEM thresholds are applied to influenza rates for each collection, based on data from historic seasons. The thresholds indicate the impact and intensity of influenza activity in secondary care (baseline, low, medium, high and very high). The current MEM thresholds used in the weekly national influenza and COVID-19 surveillance report are based on data from the 2016 to 2017 to the 2023 to 2024 seasons. Data from 2020 to 2021 and 2021 to 2022 seasons have been excluded from these calculations due to low activity.
Collection: new weekly influenza cases admitted to ICU-HDU:
- counts of test confirmed influenza cases admitted to ICU-HDU by age group and influenza subtype/type
This is a mandatory collection, launched in the 2011 to 2012 season. Reporting operates from week 40 (approximately October) to week 20 (late May) in the following year.
Individuals admitted with a pneumonia, acute respiratory infection or influenza-like-illness and with test-confirmed influenza infection can be included (PCR, molecular point of care or antigen tests are acceptable for case confirmation).
The MEM thresholds for this year are: baseline: <0.10; low: 0.10 to <0.25; medium: 0.25 to <0.62; high: 0.62 to <0.93 and very high: 0.93 and over.
Collection: new weekly influenza cases admitted to hospital (all levels of care):
- counts of test confirmed influenza cases admitted to hospital (all levels of care inclusive of ICU/HDU), by age group and influenza subtype/type
This is a voluntary collection, launched in the 2011 to 2012 season, with sentinel trusts initially selected using stratified random sampling. There are approximately 25 to 30 participating trusts, varying between seasons. Reporting operates from week 40 (approximately October) to week 20 (late May) in the following year.
Individuals admitted with a pneumonia, acute respiratory infection or influenza-like-illness and with test-confirmed influenza infection can be included (PCR, molecular point of care or antigen tests are acceptable for case confirmation).
The MEM thresholds for this year are: baseline: <1.77; low: 1.77 to <4.29; medium: 4.29 to <11.92; high: 11.92 to <18.74 and very high: 18.74 and over.
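The way a weekly rate maps to an intensity band can be sketched as follows, using the threshold values quoted for the two influenza collections. The function name and dictionary keys are hypothetical, and the sketch assumes the thresholds are on the same scale as the published weekly rates.

```python
from bisect import bisect_right

LEVELS = ["baseline", "low", "medium", "high", "very high"]

# Lower bounds of the low, medium, high and very high bands, as quoted in
# this report for the current season. The keys are illustrative labels.
THRESHOLDS = {
    "icu_hdu": [0.10, 0.25, 0.62, 0.93],
    "hospital": [1.77, 4.29, 11.92, 18.74],
}


def mem_level(rate: float, collection: str) -> str:
    """Map a weekly influenza admission rate to its MEM intensity band."""
    # bisect_right counts how many lower bounds are <= rate, so a rate equal
    # to a bound falls in the higher band (for example 0.10 is "low").
    return LEVELS[bisect_right(THRESHOLDS[collection], rate)]
```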
RSV
Collection: new weekly RSV admissions by level of care
- counts of test confirmed RSV cases admitted to hospital (excluding ICU-HDU) by age group
- counts of test confirmed RSV cases admitted to ICU-HDU by age group
This is a voluntary collection, launched in the 2017 to 2018 season. Sentinel trusts already participating in the sentinel aggregate influenza hospitalisations collection were invited to participate in RSV surveillance. There are approximately 20 to 25 participating trusts, varying between seasons. In the 2024 to 2025 season, reporting will operate from September to April in the following year (previously this was from October to May).
Individuals admitted with a pneumonia, acute respiratory infection or influenza-like-illness and with test-confirmed RSV infection can be included (PCR or molecular point of care tests are acceptable for case confirmation).
ECMO
Collection: new adult admissions to commissioned severe respiratory failure centres (SRF) offering ECMO
ECMO provides temporary life support to individuals with life threatening conditions affecting the heart and lungs. This collection was launched as an enhanced surveillance in the 2014 to 2015 season and includes data on patients admitted to 7 SRF centres in the UK (6 in England and 1 in Scotland). Reporting is all year round with centres asked to input data as soon as possible following admission.
SRFs report all cause admissions collected under four categories: test confirmed ARI, non-infection (such as cardiac, asthma or trauma), suspected ARI and sepsis of non-respiratory origin. This severe cohort typically comprises small numbers.
Relevance
All SARI Watch indicators represent indicators of severe disease due to influenza, COVID-19 and RSV requiring hospital admission. The near real time reporting provides timely indications of the epidemiology of respiratory viruses and their impact on hospital activity. The data may additionally be used to contribute to monitoring on the impact of countermeasures (such as vaccination), assist monitoring of winter pressures on acute health services at a national level, and to inform policy at all levels.
Strengths and limitations of the data
One of the strengths of the SARI Watch surveillance system is that it uses near real time data, with granular information such as counts by age group and influenza type/subtype. This allows the data to be rapidly scrutinised to assess the timing and intensity of these respiratory viruses on hospital admission activity during the winter period. Early signals of changing epidemiology trigger clinical and public health actions to minimise impact on patients and NHS hospitals.
For the mandatory SARI Watch collections (influenza ICU-HDU and COVID-19 admissions), although complete trust coverage is not achieved, participation is high enough to provide an indicative epidemiological picture. For instance, in the 2023 to 2024 season, roughly 75 percent of trusts reported to the mandatory collections. To increase coverage, non-participating trusts have been contacted and asked to improve their engagement in the 2024 to 2025 season.
Although SARI Watch’s influenza surveillance systems are unique in collecting influenza subtype information, the proportion of influenza A that is not subtyped has been increasing in recent years. This caveat should be considered when interpreting influenza A subtype distributions, as these are based on a subset of the data.
The sentinel influenza and RSV collections are based on a subset of trusts. Participating trusts are distributed across England, but some regions may be under-represented (West Midlands and South East in the 2023 to 2024 season). Trust participation may additionally vary by week and season with the weekly number of participating trusts indicated in the surveillance report. However, these sentinel collections can still provide an indication of activity that is likely to be experienced more widely in NHS hospitals.
Trends in hospital and critical care admission rates need to be interpreted in the context of changes in testing recommendations and practices. Therefore, comparisons with historic SARI Watch seasons should be interpreted with this in mind. In recent years there has also been wider implementation of rapid molecular point of care tests for influenza in hospital settings. From a public health surveillance perspective, it is important to consider a step change in influenza case ascertainment in more recent years.
Data quality
Accuracy
All trusts are given a detailed user guide that includes standard definitions and additional reporting instructions. Accuracy of SARI Watch data depends on the application of these definitions and local data pipelines within trusts. Data quality checks are additionally performed to identify any implausible values. These include checks for large differences in the context of previous weeks for that trust and checks to ensure data were inputted in the correct field. Discrepancies are queried directly with trusts prior to weekly analyses and trusts are requested to correct these if errors have occurred. If an error is detected and not corrected in time, the data from that trust for that week is excluded from analysis until corrected. Once corrected, the data will be retrospectively updated for that week.
Completeness
Trusts are only able to submit complete data for each SARI Watch module described using data available to them at the time of submission. Zero counts can be entered to denote a nil return which are included in analyses.
Trusts are able to include retrospective updates for the following reasons: if additional cases were identified after submission; if there is belated participation; or if corrections are needed.
Retrospective updates usually occur for the most recent week but preceding weeks may also be subject to change. For that reason, the most recent weeks are considered provisional.
Uniqueness
Guidance is provided to sites on how to submit data to ensure counts represent a single patient in a reporting week for that collection.
Consistency
The SARI Watch rates for influenza, RSV and COVID-19 are compared with metrics from other data sources for these viruses. This allows us to assess compatibility in the progression of the winter epidemic.
Timeliness
Data is received from trusts by Monday or Tuesday afternoon and includes data up to the preceding Sunday. The report will usually be published approximately 2 days after the data was extracted. As indicated above, SARI Watch data is provisional and subject to retrospective updates.
Validity
Validity checks are built in to ensure the data submitted is of the correct data format.
Microbiology surveillance
Data sources
SARS-CoV-2-variants
Whole genome sequencing in England
UKHSA conducts genomic surveillance of SARS-CoV-2 lineages, providing an overview of new and current circulating lineages in England, and the prevalence of lineages amongst sequenced episodes of SARS-CoV-2 infections. Information on whole genome sequencing (WGS) coverage is also published.
WGS is a laboratory procedure used to analyse entire genomes from PCR-tested SARS-CoV-2 samples taken from infected individuals and has been used to identify SARS-CoV-2 lineages.
The UK established the COVID-19 Genomics UK Consortium (COG-UK) in April 2020 to provide large-scale, rapid WGS for SARS-CoV-2 through a network of academic, hospital, and public health laboratories. When testing was widely available to the general population up to April 2022, sampling for WGS mainly covered hospitalised cases and hospital staff, cases linked to international travellers, national core priority studies, and region-based sampling within the community. After the end of community-wide testing, the sequencing strategy prioritised hospitalised cases, patients receiving antiviral therapy, and national core priority studies. As of June 2024, samples for sequencing are procured only from sentinel NHS pathology sites.
WGS data is uploaded onto the centralised Cloud Infrastructure for Microbial Bioinformatics (CLIMB) database, and is made open access internationally through release to the Global Initiative on Sharing All Influenza Data (GISAID).
Lineage assignment
Sequences are assigned a particular SARS-CoV-2 lineage using pangolin assignment, a computational tool that has been developed to assign the most likely lineage to a given SARS-CoV-2 genome sequence.
During the process of defining lineages of SARS-CoV-2, a new lineage may be designated based on a small number of nucleotide changes in the sequence that are common to a group of sequences. There have been over 4,500 lineages of SARS-CoV-2 defined by pangolin v.1.30.
As a lineage grows and spreads it may be defined into further sub-lineages, not all of which will reach high prevalence in the UK. The continued creation of sub-lineages of sub-lineages has led to an aliasing system to try to simplify the naming conventions of the lineages.
To allow for the effective tracking and visualisation of the lineages UKHSA has defined a lineage grouping system, which includes the following to define a lineage group:
- all lineages that have been previously declared as significant variants from the UK have their own lineage group. This includes lineages that are no longer circulating in the UK but would be identified as their own group in the sequencing data
- lineages that reach a threshold of 10% prevalence within the last month of the sequencing data have their own lineage group defined for data for 2024 onwards
- lineages that do not reach the 10% prevalence threshold would be part of their parent or ancestor lineage group depending on which reached the required prevalence threshold
- all parent recombinants would have their own lineage group unless sub-lineages of recombinants have been previously declared as a significant variant
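The grouping rules above can be sketched as a walk up the lineage tree. This is an illustration under stated assumptions and not the UKHSA implementation: the function name, the `parent_of` mapping, the prevalence inputs and the fallback to the root of the tree (which loosely stands in for the parent recombinant rule) are all hypothetical.

```python
from typing import Dict, Optional, Set


def lineage_group(lineage: str,
                  parent_of: Dict[str, Optional[str]],
                  prevalence: Dict[str, float],
                  designated: Set[str],
                  threshold: float = 0.10) -> str:
    """Return the lineage group for a lineage.

    Walks from the lineage up through its ancestors and returns the first
    one that was previously declared a significant variant or that meets
    the prevalence threshold (as a proportion) in the last month.
    """
    current: Optional[str] = lineage
    last = lineage
    while current is not None:
        if current in designated or prevalence.get(current, 0.0) >= threshold:
            return current
        last = current
        current = parent_of.get(current)
    # Assumption: if nothing qualifies, group under the root of the tree.
    return last
```

With invented inputs, a sub-lineage at 2% prevalence whose parent is at 25% would be reported under the parent's group, and would move to its own group once it reached 10% or was declared a significant variant.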
Linking genomics and test data
Data on SARS-CoV-2 PCR-confirmed tests is collected in SGSS and deduplicated into 90-day episodes as described in the laboratory surveillance section. Cases with positive PCR samples that have been sequenced are linked to WGS data procured from the CLIMB database that have been assigned lineages via pangolin, processed and quality-assessed by the UKHSA Genomics Public Health Analysis (GPHA) team. This data is used to report on lineage prevalence, as relative percentages of lineages detected among cases that have had a sequenced positive sample.
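Lineage prevalence as a relative percentage can be sketched as below. The function name is hypothetical and the input is assumed to be one lineage group label per sequenced episode, after linkage and deduplication.

```python
from collections import Counter
from typing import Dict, Iterable


def lineage_prevalence(lineage_groups: Iterable[str]) -> Dict[str, float]:
    """Percentage of sequenced episodes assigned to each lineage group."""
    counts = Counter(lineage_groups)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100 * n / total for name, n in counts.items()}
```

The denominator is sequenced episodes only, so the percentages describe the mix of lineages among sequenced cases rather than among all COVID-19 cases.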
Relevance
Surveillance of SARS-CoV-2 lineages provides a better understanding of how the virus is evolving, and which lineages may be responsible for changes in COVID-19 incidence, transmission, and severity. Data on lineages is used to inform public health policy and interventions such as vaccination and testing programmes.
Strengths and limitations of the data
Accurate WGS and assignment of lineages allows for high quality data on which lineages of the virus are circulating within the infected population. Samples are mainly derived from laboratories in secondary care settings and are therefore indicative of activity among populations at risk; however, this may not fully reflect activity in the community. Trends in lineage prevalence will be delayed as sequenced data follows a lag of 2 to 3 weeks due to collection, sequencing, and reporting requirements.
Data quality
Accuracy
Lineage surveillance is based on well-validated PCR testing occurring in select sentinel NHS pathology sites, and high-quality sequencing occurring in UKHSA laboratories. Lineage assignment done through pangolin has been validated globally by researchers and public health agencies and continues to be quality assessed by the UKHSA GPHA team on a regular basis.
Completeness
Completeness of lineage sequencing depends on the quality of the samples collected for sequencing. Lower quality samples will yield incomplete and low-quality sequences that are unlikely to be assigned a lineage. Inability to link sequence data to sample and patient data will also affect completeness of demographic data used to inform lineage prevalence among COVID-19 cases, as well as granular trends in lineage prevalence, such as by age or region.
Uniqueness
Positive PCR samples are deduplicated to the episode-level as described in the laboratory surveillance section, ensuring that data from any sequenced samples is retained per episode.
Consistency
Only high-quality sequenced SARS-CoV-2 samples are included in lineage surveillance, reducing the risk of inconsistent lineage assignment between samples for the same individual in the same period of time.
Timeliness
Sequenced data also follows a lag of 2 to 3 weeks due to collection, sequencing, and reporting requirements; hence, reported data can only be updated fortnightly and there will be a systematic delay in lineage trends.
Validity
Only high-quality sequenced SARS-CoV-2 samples are included in lineage surveillance, reducing the risk of samples being assigned historic lineages incorrectly. Additional sample date checks are in place to minimize the risk of incorrect lineage assignment.
Vaccine uptake
Data sources
Influenza vaccine uptake
Influenza vaccine uptake data is collated throughout the vaccination programme and is reported from November through to March. Vaccinations occur from 1 September (for children, pregnant women and a small number of exceptions, for example those due to commence chemotherapy) or 3 October (for older adults, clinical risk groups and frontline healthcare workers) through to 31 March. Publications on a weekly or monthly basis during this period typically include data on vaccine uptake in:
- GP patients for those aged 65 years and over, those in clinical risk groups, pregnant women, and children aged 2 and 3 years (reported weekly (October to January) and monthly (November to March))
- frontline healthcare workers (HCWs) working in direct patient care (reported on a monthly basis from November to March)
- school-aged children, in the cohorts eligible each season (reported on a monthly basis from November to February)
Cohorts eligible for vaccination each winter are set out in the annual flu letter.
Further details on the methodology (for example, data sources, data validation, data limitations) used for seasonal influenza vaccine uptake for all GP patients are outlined in the previous season's annual report. Further details are also given in the background information of that report.
Further details on the methodology used for seasonal influenza vaccine uptake for school-aged children are outlined in the previous season's annual report. Further details are also given in the background information of that report.
Further details on the methodology used for seasonal influenza vaccine uptake for frontline healthcare workers are outlined in the previous season's annual report. Further details are also given in the background information of that report. There is an explainer document that summarises the differences between the UKHSA and NHSE data collections for healthcare workers.
COVID-19 vaccine uptake from the immunisation information system (IIS)
COVID-19 vaccine uptake statistics are derived from the immunisation information system (IIS), formerly known as the National Immunisations Management System (NIMS), which itself derives data from the Data Provisioning Service (DPS) at NHS England (NHSE). This information is published in the weekly national influenza and COVID-19 surveillance report. IIS is an individual-level vaccine and eligible population record system containing around 68 million primary care registered individuals in England since 2020. Unlike similar systems, IIS offers improved update frequency (for example, daily) and depth of sociodemographic information (for example, sex, ethnicity and regional deprivation). As a result, analysis can be conducted in near real time or at the time of interest, and at high granularity (for example, LSOA or NHS commissioning region).
Data on COVID-19 vaccine in frontline HCWs is published separately in the COVID-19 vaccine uptake in frontline healthcare workers monthly reports, as is the NHSE/UKHSA explainer on HCWs. For comparable COVID-19 and influenza vaccine uptake, please see the Seasonal flu and COVID-19 vaccine uptake in frontline healthcare workers monthly data reports.
Relevance
The COVID-19 vaccine uptake surveillance system provides an indication of COVID-19 vaccine coverage across England, stratified by populations of interest. It is used for the assessment of vaccine catch-up campaigns, planning and intervention by both clinicians and policy makers.
Strengths and limitations of the data
As previously mentioned, IIS receives high granularity patient identifiable information (PII) on a daily basis. This allows changes in vaccine coverage to be monitored in real time, so that decision makers can rapidly adapt to unexpected situations or exploit opportunities, for example in vaccine catch-up campaigns where vaccination in low coverage areas is not proceeding as expected. In addition, high granularity PII allows for the identification and targeting of highly specific and poorly vaccinated groups for intervention, reducing the risk of outbreaks from these groups and improving herd immunity.
However, in contrast to its strengths, IIS has several limitations. Firstly, IIS only contains primary care data (that is, NHS number assigned individuals). This means that those outside of the healthcare system, which incidentally are those most likely to be unvaccinated, are not assessed for their vaccine status. The total number of individuals not in primary care is difficult to quantify. However, for children in England aged between 0 and 4 years, it has been estimated to be fewer than 1%.
While the following is not necessarily either a strength or a limitation, it is worth noting that stratification by geography and by ethnicity depends on a valid postcode of residence from the ONS National Statistics Postcode Lookup (NSPL) and on ethnicity codes respectively. Data completeness is therefore an important part of IIS’s strengths, and fortunately general completeness is substantial.
Data quality
Accuracy
Accuracy of the COVID-19 vaccine uptake data depends on the quality of data entry (for example, clinical coding), data specifications for access to the correct data, as well as the ability of data engineers to successfully transfer the data between NHSE’s centralised systems and the UKHSA’s data warehouse. Both NHSE and UKHSA conduct regular data checks of our vaccine and eligible population records, and UKHSA has well-documented and well-run Extract, Transform, and Load (ETL) processes for loading the data into our data warehouse. In addition, peer-reviewed publications have shown the extent of the reliability and accuracy of the COVID-19 data in IIS.
Completeness
COVID-19 vaccine uptake data within IIS has an excellent level of completeness, especially for those fields required for linkage such as NHS Number. This is mainly due to the preprocessing done by the DPS at NHSE before the data is loaded into IIS. Where data is missing, this mainly includes fields such as ethnicity and postcode of residency. This may be because the individual is unwilling to provide this data. We are unable to distinguish this from missing data caused by data manipulation or transfer issues, which means incompleteness may not be solely caused by data quality issues.
Uniqueness
IIS’s eligible population dataset is unique and based on an individual NHS number. Vaccine record tables may contain multiple records for the same NHS number, as each record indicates a unique vaccine dose administered.
Consistency
Data within IIS is strictly controlled for consistency. Vaccine and eligible population records are kept in separate datasets. However, some fields may contain the same information on the same individual in both for analytical and logistic convenience. For example, personal information dependent on time or situation such as postcode of residency is included in both datasets which may appear to be unnecessary and sometimes contradictory. However, this information in the vaccine records dataset should be considered to be correct at the time of vaccination (in the past), whereas this information in the eligible population dataset should be considered to be correct at the time of analysis (currently correct). The separation of these same fields into both datasets allows for questions to be answered using the data from a retrospective, current, and prospective viewpoint.
Where data should remain the same regardless of time or situation, such as date of birth, inclusion in both datasets is done because the eligible population dataset is dynamic. It means that when an individual is no longer eligible for primary care in England their record is removed from the eligible population dataset. If the date of birth was only stored in the eligible population dataset and an individual became ineligible for primary care in England, their data at this point would be lost, meaning that no age requiring analysis of this individual could be conducted from that point onwards. Hence it is also stored in the vaccine records dataset and should never be contradictory.
Timeliness
IIS data is added, deleted, and updated daily. The data includes a 24 hour lag period, meaning that data from day 1, for example, will be uploaded on the morning of day 3. This frequency applies to both vaccine and eligible population records.
Validity
Data within IIS is regularly checked for completeness and for inconsistencies such as invalid GP practice codes and dates of birth. All data are stored within a relational database management system located on UKHSA premises and accessed using standard tools such as T-SQL (Microsoft). For the purposes of data access permissions and to simplify frequently requested perspectives of the data, SQL views are often used which may differ slightly from the original tables but are valid for their intended use.
COSECCO
Data sources
The preceding/co-/secondary infections with COVID-19 (COSECCO) project at UKHSA provides routine, robust and rapid detection and surveillance of bacterial, viral and fungal preceding, co- and secondary infections in persons with COVID-19 or influenza to detect outbreaks and monitor disease burden in the population by producing linked data outputs. Surveillance covers England for the period of Jan 2020 to 28 days prior to the report date.
Datasets linked together in the COSECCO project include COVID-19 episodes, SGSS, UKHSA healthcare associated infections data, Respiratory Datamart, and bespoke pathogen expert datasets: invasive pneumococcal disease (IPD) and Haemophilus influenzae (Hi). This data is combined using 2 methodologies: (1) COVID as base infection, and (2) Influenza A or B as base infection.
Preceding, co- and secondary infections are split into groups by specimen type (blood, lower respiratory tract, or Clostridioides difficile infection) and reported as case counts. They are also reported as a percentage of total COVID-19 or Influenza A and B episodes.
Relevance
The COSECCO project provides early detection of changing count and rate of preceding, co- and secondary infections within COVID-19 or Influenza A and B episodes. This provides timely indications of changing infections within the population of England and can be used to inform NHS winter planning and interventions such as vaccination and antimicrobial usage and stockpiling.
Strengths and limitations of the data
Infection episode linkage between databases at UKHSA provides a unique insight into population level infections in England and an early indication of change in these patterns. However, this relies on laboratory testing and, in some cases, further verification of clinical relevance, which adds a delay to reporting and subsequent linkage within this dataset. Clinical relevance is assured for the pathogen expert data and blood stream infections. The linkage of preceding, co- and secondary infections does not guarantee clinical significance in all linked infection episodes; this requires subsequent interpretation by the user of the data.
Data quality
Accuracy
Accuracy of testing data is reliant on data input to reporting systems and the data given by patients (for example, address). UKHSA runs regular training sessions with staff inputting data to improve data quality.
Pathogen expert datasets include only cases where clinical relevance has been verified, whereas the original data includes any positive specimen. Therefore, non-relevant positivity has been minimised in data extracts.
Completeness
Completeness of the dataset relies on reporting of the infection data to the datasets the COSECCO project pulls from. The Respiratory DataMart receives testing data from 17 laboratories in England (as mentioned above). Comparisons of mandatory (HCAI DCS) versus voluntary (SGSS) reporting from labs estimate voluntary reporting of tests to SGSS at 98% (see mandatory healthcare associated infection surveillance). Therefore, using the subset of key healthcare associated infections as an example, 98% of all positive blood cultures for key pathogens are reported in SGSS, indicating that the dataset provided by SGSS is likely highly complete. However, it is worth noting that there will be patients with infections who do not have positive microbiology and these patients will be missed. As such, even though we assume a complete microbiology dataset, these data will be an underestimate of all preceding, co- and secondary infections.
The linkage process has been designed with a matching system which would allow for 1 or 2 differences in personal information and still correctly identify the match. This minimises the possibility of missing a ‘true’ match between records and increases confidence in the matches that are made. However, missing date of specimen collection would prevent linkage with a base infection. Additionally, if multiple personal details were missing or inaccurate (including NHS number, date of birth, sex and postcode) then that could prevent linkage.
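As an illustration only, the tolerance described above could be sketched as follows. This is not the COSECCO linkage code: the field names, the choice of 4 identifiers, and the `max_differences` parameter are assumptions made for the example.

```python
# Identifiers compared when matching. The text names NHS number, date of
# birth, sex and postcode; treating exactly these 4 as the match key is an
# assumption for this sketch.
MATCH_FIELDS = ("nhs_number", "date_of_birth", "sex", "postcode")


def count_differences(a: dict, b: dict) -> int:
    """Count identifiers that are missing on either side or do not agree."""
    differences = 0
    for field in MATCH_FIELDS:
        left, right = a.get(field), b.get(field)
        if not left or not right or left != right:
            differences += 1
    return differences


def is_match(a: dict, b: dict, max_differences: int = 2) -> bool:
    """Accept a pair if at most `max_differences` identifiers disagree.

    A missing specimen date always blocks linkage, as stated in the text.
    """
    if not a.get("specimen_date") or not b.get("specimen_date"):
        return False
    return count_differences(a, b) <= max_differences
```

With these assumptions, a pair that differs only in postcode is linked, while a pair with 3 missing or conflicting identifiers, or with no specimen date, is not.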
Uniqueness
Deduplication occurs as part of the weekly data analysis to ensure person-level linkage occurs between infections datasets and these are counted once within each report. Outputs should have minimal duplication of records. Surname, forename, date of birth, NHS number, sex, species and specimen date are used for deduplication (specimen date deduplicated within episode duration).
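A minimal sketch of this kind of deduplication is shown below. The field names and the `episode_days` value are hypothetical; the episode duration used in the COSECCO analysis is not stated here.

```python
from datetime import timedelta

# Fields listed in the text, other than specimen date, form the grouping key.
KEY_FIELDS = ("surname", "forename", "date_of_birth", "nhs_number", "sex", "species")


def deduplicate(records, episode_days=14):
    """Keep the earliest specimen per person and species in each episode window.

    `episode_days` is an assumed, illustrative episode duration.
    """
    kept = []
    episode_start = {}  # key -> specimen date that opened the current episode
    for record in sorted(records, key=lambda r: r["specimen_date"]):
        key = tuple(record[f] for f in KEY_FIELDS)
        start = episode_start.get(key)
        if start is None or record["specimen_date"] - start > timedelta(days=episode_days):
            kept.append(record)
            episode_start[key] = record["specimen_date"]
    return kept
```

Records are sorted by specimen date first so that the record retained for each episode is always the earliest one.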
Consistency
Electronic health record data is processed to preserve the same value where a single person has multiple records within the data. The data cannot be checked against other people’s data to evaluate consistency due to the available data and data protection.
Timeliness
COSECCO linkage is completed on a weekly basis for the full time period of the report. Data for the base infection of each report is extracted to 28 days prior to the report date to allow for the full follow-up period of secondary infections. Linking data is extracted for the full period of the report generated, unless specified below. This ensures the most up-to-date version of testing data is used in each report. The timeliness of data used in each linkage is reliant on the lag time for each data source. Lag times are the minimum possible with current reporting and surveillance capabilities.
Pathogen expert data (IPD and Hi): data provided monthly on a one-month lag (for example, data for January is provided in the first week of February).
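The 28-day extraction cut-off for base infections described above amounts to simple date arithmetic, sketched here. The function names, the inclusive boundaries, and the use of 1 January 2020 as the start of the surveillance period are assumptions for illustration.

```python
from datetime import date, timedelta

FOLLOW_UP_DAYS = 28  # follow-up period stated in the text


def base_infection_cutoff(report_date: date) -> date:
    """Latest base infection date included in a report."""
    return report_date - timedelta(days=FOLLOW_UP_DAYS)


def in_base_extract(specimen_date: date, report_date: date,
                    period_start: date = date(2020, 1, 1)) -> bool:
    """True if a base infection falls inside the extraction window.

    `period_start` and the inclusive comparison are assumptions.
    """
    return period_start <= specimen_date <= base_infection_cutoff(report_date)
```

For a report dated 4 November 2024, for example, `base_infection_cutoff` returns 7 October 2024.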
Validity
Validity checks are undertaken including adding limits for maximum age and a minimum date of birth.
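Checks of this kind might look like the following sketch. The limits shown (`MAX_AGE_YEARS`, `MIN_DATE_OF_BIRTH`) are hypothetical placeholders, not the values used by UKHSA, and the rule rejecting a date of birth after the specimen date is an added assumption.

```python
from datetime import date

MAX_AGE_YEARS = 120                    # hypothetical limit
MIN_DATE_OF_BIRTH = date(1900, 1, 1)   # hypothetical limit


def age_in_years(date_of_birth: date, on_date: date) -> int:
    """Completed years of age on `on_date`."""
    years = on_date.year - date_of_birth.year
    if (on_date.month, on_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def is_valid_person(date_of_birth: date, specimen_date: date) -> bool:
    """Reject implausible dates of birth and ages."""
    if date_of_birth < MIN_DATE_OF_BIRTH or date_of_birth > specimen_date:
        return False
    return age_in_years(date_of_birth, specimen_date) <= MAX_AGE_YEARS
```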
Hospital-based acute respiratory infection sentinel surveillance (HARISS) system (additional surveillance system)
Data sources
HARISS was developed by UKHSA in collaboration with the University of Nottingham and NHS hospital trusts in 2023 to 2024 as a pilot across 7 sentinel sites in England. HARISS focuses on capturing data from sentinel NHS hospital sites for patients aged 65 years or older admitted to hospital with acute respiratory infection. In the 2024 to 2025 season there are plans to expand to up to 15 sites. There are 3 components of the HARISS system:
1) Laboratory reporting of respiratory virus testing via Respiratory DataMart
Trusts in the HARISS network are advised to test all patients 65 years or older admitted to hospital (for 24 hours or over) for influenza, RSV and SARS-CoV-2 where the main reason for admission was a symptomatic ARI. The symptomatic ARI may cause any of:
- pneumonia, or pneumonitis
- non-pneumonia lower respiratory infection, or acute bronchitis
- exacerbation of chronic lung disease, for example, chronic obstructive pulmonary disease (COPD)
- exacerbation of chronic heart disease, for example, heart failure/angina
- exacerbation of frailty, or poor mobility, for example, a fall
- symptomatic ARI with another reason for admission
The results of testing for these respiratory pathogens are reported via UKHSA’s established laboratory surveillance system Respiratory DataMart.
2) Clinical data collection
For patients aged 65 years or older tested for influenza, SARS-CoV-2 and RSV (including those negative for these pathogens), a short questionnaire is completed from clinical notes by surveillance nurses at hospital sites. The questionnaire focuses on symptoms, reason for admission and severity outcomes.
3) Virological surveillance
Residual sample material from RSV positive samples is collected and stored at HARISS sites. Samples undergo virological characterisation at the RVU, Colindale.
Relevance
The HARISS system aims to strengthen understanding of the burden of RSV, influenza and COVID-19 associated illness requiring hospital admission in patients aged 65 years or older. The surveillance system will contribute to the monitoring of immunisation programmes for respiratory infections and provide data to estimate vaccine effectiveness. It provides a source of residual sample material from hospitalised patients for virological surveillance of RSV.
Strengths and limitations of the data
The HARISS system enhances our understanding of the burden of respiratory illness requiring hospital admission. It provides information direct from NHS hospital sites on clinical characteristics, reason for admission and severity outcomes of older adults admitted to hospital with acute respiratory infection. This enables an assessment of whether the hospital admission is directly attributed to the infection or if the infection is an incidental finding.
The clinical data collection is conducted at least 30 days after the patient is admitted to hospital resulting in a lag between admission due to acute respiratory infection and reporting. The RSV residual samples collected for virological surveillance provide a large amount of material for sequencing work. They are analysed and sequenced after the winter season.
Data quality
Accuracy
The accuracy of HARISS data relies on consistency of testing for respiratory pathogens in patients meeting acute respiratory infection case definitions. Further, accurate extraction of data from clinical notes into the clinical questionnaire by research teams is paramount. UKHSA runs a training session with each new sentinel site’s data collection team to aid this process. UKHSA also corresponds with sites to discuss any queries from data collection teams. The virological surveillance is based on well-validated PCR testing undertaken at the national reference laboratory.
Completeness
Completeness of the clinical data relies on consistent testing and reporting of respiratory pathogen results in patients who meet the case definition. This is advised and encouraged at NHS sites. Completeness of data also depends on the selection of appropriate cases for clinical data collection, and the full completion of the clinical data questionnaire. UKHSA works with NHS sites to select appropriate cases and Standard Operating Procedures and training sessions have been developed. Data entry rules also help to improve completeness of the clinical data questionnaire, for example, the inability to leave a question unanswered.
Uniqueness
Deduplication occurs during the analysis of HARISS clinical data to ensure that each admission is only included once in the dataset. The fields used to perform this deduplication are surname, forename, date of birth, NHS number, admission date and sample date.
Consistency
Data captured by the clinical data questionnaire is checked for consistency against other data sources including the Respiratory DataMart dataset and immunisation information system dataset. If inconsistencies are identified these are discussed with the clinical data collection teams at NHS sites.
Timeliness
In the pilot 2023 to 2024 year of HARISS, clinical data were collected retrospectively. Collection is planned to be more prospective in 2024 to 2025 to improve timeliness. Residual sample material for virological surveillance is transferred to the RVU from sentinel sites for testing at the end of the winter season.
Validity
Validity checks, for example date validation rules, are included within the HARISS clinical data questionnaire. Further checks are carried out when this data is added to the dataset to ensure it is in an appropriate format, for example, that the NHS number is of the correct length and that fields are not left blank. The virological surveillance is based on well-validated PCR testing undertaken at the national reference laboratory.
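The format checks mentioned above could be sketched as follows. The list of required fields and the function name are hypothetical; the only rule taken as fact is that an NHS number has 10 digits.

```python
# Hypothetical set of fields that must not be blank.
REQUIRED_FIELDS = ("nhs_number", "date_of_birth", "admission_date", "sample_date")


def format_errors(record: dict) -> list:
    """Return a list of format problems found in one questionnaire record."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            errors.append(f"{field}: blank")
    # NHS numbers are 10 digits; spaces used for display are ignored.
    nhs_number = str(record.get("nhs_number") or "").replace(" ", "")
    if nhs_number and not (nhs_number.isdigit() and len(nhs_number) == 10):
        errors.append("nhs_number: expected 10 digits")
    return errors
```

Returning a list of problems, rather than a single pass or fail flag, would make it easier to feed specific queries back to the data collection teams at NHS sites.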