Guidance

UKHSA HIV and STI data publication guidelines

Updated 15 August 2024

Main points

The main points of this guidance are as follows:

  • the small cell size guidance referenced in this document must be followed for all HIV and sexually transmitted infection (STI) testing or diagnosis data that will be made publicly available
  • for example, all small cells (values ranging from 1 to 4 inclusive) must be masked when the underlying population denominator is less than 10,000 people
  • any reproduction or analysis undertaken using aggregate data provided or published by UK Health Security Agency (UKHSA) must acknowledge the data source

1. Introduction

This guidance describes how the Blood Safety, Hepatitis, Sexually Transmitted Infections (STI) and HIV (BSHSH) Division of the UK Health Security Agency (UKHSA) publishes and shares data obtained through routine HIV and STI surveillance. The principles and guidelines included are designed to minimise the risk of deductive disclosure.

This guidance should be followed by all UKHSA staff and all external users of aggregate UKHSA HIV and STI surveillance data; it is in line with the Caldicott Principles and will be regularly reviewed and updated as required. ‘External users’ refers to staff at local authorities, service providers and anyone requesting data which are not routinely published. All users of HIV and STI data are expected to follow the same guidelines outlined in this document. ‘The data’ refers to data from the surveillance systems held at the BSHSH Division (Appendix 1).

The UKHSA sexual health and HIV privacy notice provides further information on the de-personalised surveillance data BSHSH collects, how it is used, and how this information is protected.

2. Small cell sizes

The following guidelines for small cells (values ranging from 1 to 4 inclusive) must be followed for all data that will be published or provided to those without direct responsibility for commissioning or provision of care to the relevant population. Where data tables include multiple detailed breakdowns, and counts in individual cells are small, the risk of identification (deductive disclosure) increases and protection is needed (see reference 1). HIV and STI data is considered to be sensitive, and the impact of re-identification would be great.

2.1 Deductive disclosure

Deductive disclosure is where it may be possible to identify a person within a set of aggregate data tables using a combination of their characteristics and small cell sizes (values ranging from 1 to 4 inclusive). For example, a person may be identifiable where there are small cell sizes for characteristics (such as gender or ethnicity) that are uncommon in a particular area – such that the associated population denominator is less than 10,000.

2.2 Geographical presentations

This guidance covers the publication and sharing of HIV and STI surveillance data for various geographical areas. The publication and sharing rules differ by geographical level (see section 4). Data may be published or shared as numbers or rates by geographical area of patient residence or by area of service provider (clinic or laboratory).

Data may only be published or shared following a thorough risk assessment to determine whether masking is required to prevent deductive disclosure (see section 2.4).

2.3 Disclosure by difference

Disclosure by difference can occur when small differences in geographic boundaries create areas with a small underlying population, thereby increasing the risk of deductive disclosure. Areas at risk of such disclosure will be identified by regional staff at UKHSA and the data masked as appropriate (see section 2.4).

2.4 Masking cells that contain small numbers

Primary masking

All cells with values ranging from 1 to 4 (inclusive) are considered unsafe (small cell sizes). When presenting HIV or STI test or diagnosis data based on underlying denominators of less than 10,000 population, cells with values ranging from 1 to 4 must be masked and replaced with ‘under 5’ (see Example 1). Masking of cells with values ranging from 1 to 4 is also recommended when the population size is unknown or, in the case of particularly sensitive data, irrespective of the population size (see ‘Sensitive masking’ below). Additional care should also be taken when there are many cells with 0 (zero) values (across rows or columns) (see reference 2).

Secondary masking

Where the primary masked cell can be deduced from other cells within the data presented (in the same row, column or overall total), the next smallest cell size must also be masked and replaced with ‘under x’ – such that ‘x’ is the value of the cell rounded up to the nearest multiple of 10 (see Example 1).

No masking

Data can only be published without masking if the denominator is greater than or equal to 10,000 population (see Example 2). For this reason, it is unlikely that publicly available data can be cross tabulated by multiple demographic factors at small geographic levels.

Sensitive masking

There may be situations where an additional risk assessment is warranted if data is particularly sensitive, which may result in greater masking restrictions than the standard masking requirements described above. Some examples include:

  • publication of data on STIs in children (given the higher degree of sensitivity)
  • publication of data on sex workers (because the population size is unknown)
  • cell sizes with a value of 1 irrespective of the underlying population size (because of the risk of self-identification)
  • where data masking requirements have changed over time which may allow masked data to be deduced by comparing data from current publications with data from past publications (see Appendix 2)

Alternatives to masking

There may also be some instances where an alternative approach may be applied if it is unfeasible to apply primary and secondary masking rules. For example:

  • all values may be rounded up to the nearest multiple of 5
  • data for areas with populations less than 10,000 may be combined to increase the size of the denominator
  • special attention should be paid when publishing data that is broken down by multiple characteristics such as age group, gender, ethnicity, and sexual orientation
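The first two alternatives listed above can be sketched in Python. This is an illustrative sketch only, not a UKHSA implementation: the function names `round_up_to_5` and `combine_areas`, and the representation of each area as a (count, population) pair, are assumptions made for the example.

```python
def round_up_to_5(value):
    """Round a non-negative count up to the nearest multiple of 5 (0 stays 0)."""
    return -(-value // 5) * 5


def combine_areas(areas):
    """Combine several areas into one by summing counts and populations.

    areas is a list of (count, population) pairs (a hypothetical layout).
    """
    total_count = sum(count for count, _ in areas)
    total_population = sum(population for _, population in areas)
    return total_count, total_population
```

For instance, `round_up_to_5(3)` gives 5 and `round_up_to_5(7)` gives 10, while combining two hypothetical areas with `combine_areas([(2, 6000), (3, 7500)])` gives a count of 5 against a denominator of 13,500.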

Example 1. Data where primary and secondary masking is required

Local authority | Male | Female | Total
Population denominator | 11,000 | 8,500 | 19,500
Number of syphilis cases | 7 [note 2] | 3 [note 1] | 10
Masking (publish as) | Under 10 [note 2] | Under 5 [note 1] | 10

Example 2. Data where no masking is required

Local authority | Male | Female | Total
Population denominator | 29,107 | 29,892 | 58,999
Number of syphilis cases | 3 [note 3] | 2 [note 3] | 5
Masking (publish as) | 3 [note 3] | 2 [note 3] | 5

Note 1: Primary masking for values from 1 to 4 with a population less than 10,000.

Note 2: Secondary masking is required to prevent primary masking from being deduced from other data (such as totals).

Note 3: No masking for values from 1 to 4 with a population of 10,000 or more.
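The primary and secondary masking rules in section 2.4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not UKHSA's implementation: the names `THRESHOLD`, `is_small` and `mask_row` are hypothetical, the row total is assumed to be published alongside the cells, and secondary masking is applied only when exactly one cell in the row has been primary-masked. Cases the guidance does not spell out (such as a secondary cell that is zero or an exact multiple of 10) and sensitive masking are not handled.

```python
from math import ceil

THRESHOLD = 10_000  # denominators below this require small cells to be masked


def is_small(count):
    """A small cell is a value from 1 to 4 inclusive."""
    return 1 <= count <= 4


def mask_row(counts, denominators):
    """Return publishable labels for one row of counts.

    counts and denominators are parallel lists, one entry per column.
    The row total is assumed to be published unmasked.
    """
    labels = [str(count) for count in counts]

    # Primary masking: small cells whose denominator is under the threshold.
    primary = [i for i, (count, denom) in enumerate(zip(counts, denominators))
               if is_small(count) and denom < THRESHOLD]
    for i in primary:
        labels[i] = "Under 5"

    # Secondary masking (simplified): a single masked cell could be recovered as
    # total minus the other cells, so the next smallest cell is masked as
    # 'Under x', where x is its value rounded up to the nearest multiple of 10.
    if len(primary) == 1:
        others = [i for i in range(len(counts)) if i not in primary]
        if others:
            j = min(others, key=lambda i: counts[i])
            labels[j] = f"Under {ceil(counts[j] / 10) * 10}"
    return labels
```

Applied to the figures in Example 1, `mask_row([7, 3], [11000, 8500])` gives `['Under 10', 'Under 5']`. Applied to Example 2, `mask_row([3, 2], [29107, 29892])` leaves the values unmasked as `['3', '2']`.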

3. Levels of access to HIV and STI surveillance data

The level of data access required will be determined by the purpose for which the data is required and the risk of deductive disclosure (and the need to apply the small cell size guidelines). The different levels of access are outlined below (see Table 1).

3.1 High-level

National or regional level data

These can be published subject to a risk assessment and appropriate masking of small cells (where necessary).

3.2 Medium-level

Upper or lower tier local authority level data

These can be published subject to a risk assessment and appropriate masking of small cells (where necessary).

3.3 Low-level

Middle layer Super Output Area (MSOA) or Lower layer Super Output Area (LSOA) level data

These can be shared in confidence with appropriate stakeholders using clear handling instructions. Data can only be published as maps where rates are grouped (for example, 1 to 5) – see section 4.3.

3.4 Individual-level

Patient level (pseudonymised and de-personalised) and service provider (clinic or laboratory) level

These data must not be published in any format. Access to this level of data is restricted to the minimum necessary number of UKHSA staff (including those with an honorary contract) working on the collation, analysis and reporting of these data. Individual level data can also be shared in confidence with appropriate stakeholders using clear handling instructions.

Table 1. Levels of access to HIV and STI data

Data level | Data access level [note 4] | Data user: UKHSA | Data user: service provider (stakeholder) [note 5] | Data user: service commissioners (stakeholder) [note 6] | Data user: other users [note 7]
National | High | Yes | Yes | Yes | Yes
Regional | High | Yes | Yes | Yes | Yes
Local authority and upper tier local authority | Medium | Yes | Yes | Yes | Yes
LSOA and MSOA | Low | Yes | Yes | Yes | No
Service provider [note 5] | Individual | Yes | Yes, own data only [note 8] | Yes, own data only [note 8] | No
Patient | Individual | Yes | Yes, own data only | Yes | No

Note 4: See section 4 for guidelines on how data can be shared or published for each level of data.

Note 5: NHS sexual health services (HIV, STI, sexual and reproductive health (SRH)) and laboratories reporting surveillance data.

Note 6: Commissioners of NHS sexual health services (HIV, STI and SRH) – such as local authorities and upper tier local authorities.

Note 7: Including, but not limited to, government organisations, specialist sexual health organisations (HIV, STI and SRH), academic organisations, media organisations, charities and the general public.

Note 8: UKHSA may share additional data in confidence when appropriate.

3.5 Data requests

Data requests should be sent to the contact person for each surveillance system as listed in Appendix 1. Requesters will be required to complete a data request form detailing the data required and the purpose for which the data will be used.

All Freedom of Information (FOI) requests for HIV and STI data should be sent via the FOI office at UKHSA ([email protected]). The small cell size guidelines referenced in this document must also be followed for all FOI requests.

Press enquiries to BSHSH staff should be directed to the national UKHSA press office (UKHSA[email protected]). Regional UKHSA staff should direct press queries either to the UKHSA regional or national press office.

4. Data publication and sharing guidelines

Every data publication should undergo a thorough risk assessment to determine whether masking is required to prevent deductive disclosure and balance the public health benefits and risks to individuals or UKHSA. If there is any doubt whether masking is required, the data should be reviewed by the appropriate information asset owner in consultation with an Associate Caldicott Guardian. The guidelines for publishing and sharing HIV and STI data at the different levels are detailed in the sections below.

4.1 Publication and sharing of high-level data

National and regional numbers and rates may be published in public facing reports in the format of aggregated tables, graphs and maps – following a risk assessment and appropriate masking (where necessary).

Please note:

  • data masking is not usually required at high-level – due to large population denominators (greater than 10,000 people)
  • data masking may be required for particularly sensitive data, even where there is a large population denominator (greater than 10,000 people) – see section 2.4

4.2 Publication and sharing of medium-level data

Upper tier local authority and local authority numbers and rates may be published in public facing reports in the format of aggregated tables, graphs and maps – following a risk assessment and appropriate masking (where necessary).

Please note:

  • data masking is often required at medium-level – as some upper tier local authorities and local authorities have small population denominators (less than 10,000 people)
  • data masking may be required for particularly sensitive data, even where there is a large population denominator (greater than 10,000 people) – see section 2.4

4.3 Publication and sharing of low-level data

MSOA and LSOA numbers and rates cannot be published in public facing reports. However, grouped rates (not numbers) can be shared, in confidence, with appropriate stakeholders in the format of aggregated tables and maps – following a risk assessment and the application of appropriate categories for grouping rates (defined below).

Please note:

  • MSOA and LSOA grouped numbers should not be shared (only grouped rates)
  • MSOA and LSOA rates must be grouped into ranges that each span at least 5 values – the smallest permitted categories are ‘1 to 5’, ‘6 to 10’ (and so on)
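The grouping of rates into categories can be sketched in Python. This is an illustration only: the function name `rate_band` is hypothetical, and the handling of a zero rate and of fractional rates that fall between two categories (assigned here to the higher category) are assumptions, because the guidance does not specify them.

```python
from math import ceil


def rate_band(rate, width=5):
    """Return the category label for a rate, such as '1 to 5' or '6 to 10'."""
    if width < 5:
        raise ValueError("categories must span at least 5")
    if rate <= 0:
        return "0"  # assumption: zero rates are shown as their own category
    upper = ceil(rate / width) * width
    return f"{upper - width + 1} to {upper}"
```

With the default width, `rate_band(3)` gives `'1 to 5'` and `rate_band(6)` gives `'6 to 10'`. A wider category can be requested, for example `rate_band(12, width=10)` gives `'11 to 20'`.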

4.4 Access to individual-level data

Individual-level data includes patient level and service provider (clinic or laboratory) data.

Patient level

Data cannot be published in public facing reports in any format.

Patient level data can only be accessed by authorised UKHSA staff (including those with an honorary contract) and the appropriate service provider – patient level data cannot be accessed by any other party (including service commissioners).

Service provider level

Data cannot be published in public facing reports in any format – unless prior consent has been obtained from clinical leads at the appropriate service provider and the information asset owners at UKHSA.

Service provider level data can only be accessed by authorised UKHSA staff (including those with an honorary contract) and the appropriate service provider and their service commissioner.

Requests to access individual-level data (patient or service provider) from internal UKHSA or external parties will be reviewed on a case-by-case basis by the UKHSA information asset owner and the Office for Data Release and Acquisition (ODRA) – where necessary. Upon approval, additional privacy controls will be applied to the data provided, such as:

  • only the minimum data (variables) required for analysis will be provided
  • individual-specific pseudonyms, such as patient identifiers (IDs) or soundex codes, will be replaced with bespoke unique identifiers
  • individual-specific characteristics, such as date of birth and postcode, will be replaced by broader characteristics, such as age group and local authority (respectively)
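The privacy controls listed above can be sketched in Python. This is a hedged illustration rather than the process UKHSA or ODRA actually uses: the field names (`patient_id`, `date_of_birth`, `postcode`), the function names, the 10-year age bands and the `postcode_to_la` lookup are all hypothetical.

```python
import secrets
from datetime import date


def age_group(date_of_birth, on_date, width=10):
    """Replace a date of birth with a band such as '30 to 39' (band width is an assumption)."""
    had_birthday = (on_date.month, on_date.day) >= (date_of_birth.month, date_of_birth.day)
    age = on_date.year - date_of_birth.year - (0 if had_birthday else 1)
    lower = (age // width) * width
    return f"{lower} to {lower + width - 1}"


def depersonalise(records, keep, on_date, postcode_to_la):
    """Apply the three controls to a list of record dictionaries.

    keep lists the minimum variables needed for the analysis.
    postcode_to_la is a hypothetical postcode to local authority lookup.
    """
    release_ids = {}
    output = []
    for record in records:
        original_id = record["patient_id"]
        # Replace the existing pseudonym with a bespoke identifier for this release.
        if original_id not in release_ids:
            release_ids[original_id] = secrets.token_hex(8)
        # Keep only the requested variables.
        row = {field: record[field] for field in keep}
        row["release_id"] = release_ids[original_id]
        # Replace specific characteristics with broader ones.
        row["age_group"] = age_group(record["date_of_birth"], on_date)
        row["local_authority"] = postcode_to_la[record["postcode"]]
        output.append(row)
    return output
```

Records that share the same original identifier receive the same release identifier, so they can still be linked within the extract, while the original identifier, date of birth and postcode are left out of the output.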

Users granted access to individual-level data (patient or service provider) must ensure the following:

  • data must be stored in a secure restricted location within the network of their place of work – it must never be stored on a computer hard drive, on a personal hard drive or on an external transportable storage device (such as a USB memory stick)
    • users with a UKHSA honorary contract must only store the data on the UKHSA network – data must not be transferred to the network of any other place of work
  • only authorised staff can access the data
  • the maximum retention period for data is one year (data must not be kept indefinitely) – data should be deleted immediately after the purpose for which it was provided has been completed
    • if access is required beyond one year, additional permission must be requested from the appropriate information asset owner at UKHSA

4.5 Acknowledging publication data sources

Any publication including HIV and STI data obtained from UKHSA (either published as provided or following additional analyses) must include the following UKHSA acknowledgement:

‘Data from xxx surveillance system, UK Health Security Agency’ (where ‘xxx’ is CTAD, GUMCAD, GRASP or HARS).

5. Responsibilities for adhering to the guidelines

This guidance permits the use of ‘the data’ by the ‘data requester’ within their organisation. Any data published by the requester should acknowledge UKHSA (see section 4.5).

Any breaches of this guidance should be reported immediately to the relevant information asset owner and Associate Caldicott Guardians at UKHSA. This includes any loss of individual-level data due to storage on non-permitted media such as memory sticks or the publication of tabular aggregated data at upper tier local authority level or below which have not been appropriately masked.

6. Appendices

Appendix 1. UKHSA’s bloodborne virus, HIV and STI surveillance systems

This includes:

Appendix 2. Example of revised masking requirements

This applies where data masking requirements have changed over time (due to revised data content) which may allow masked data to be deduced by comparing the content of previous and current data releases. In these instances, the same cells should be masked across both releases – see section 2.4.

Example of revised masking requirements between data releases

The revised masking requirements shown in the example above are described as follows:

  1. The current release presents data that has been revised since the previous release shown in Tables 1a and 2a (outlined in blue).
  2. The change in data between the previous release and the current release results in a change to the data masking requirements shown in Tables 1b and 2b (outlined in red).
  3. The current release now reveals data that was masked in the previous release shown in Tables 1b and 2b (highlighted in grey).
  4. The data revealed in the current release shown in Table 2b (highlighted in grey) can be used to unmask the equivalent data from the previous release shown in Table 1b (highlighted in grey) which can then be used to unmask the rest of the previously masked data shown in Table 1b (outlined in red).
  5. The data unmasked in the previous release shown in Table 1b (outlined in red) can be used to unmask the data in the current release shown in Table 2b (outlined in red).
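The check described in this appendix can be sketched in Python. This is an illustration under assumptions: cells are identified by hypothetical (row, column) keys, the function names are invented for the example, and taking the union of the two sets is one interpretation of the instruction that the same cells should be masked across both releases.

```python
def newly_revealed(previous_masked, current_masked):
    """Cells masked in the previous release that the current release would show."""
    return set(previous_masked) - set(current_masked)


def cells_to_mask_in_both(previous_masked, current_masked):
    """Mask any cell in both releases if either release requires it to be masked."""
    return set(previous_masked) | set(current_masked)
```

If `newly_revealed` returns any cells, publishing the current release as it stands could allow the previous release to be unmasked, so the combined set from `cells_to_mask_in_both` would be applied instead.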

References

1. Office for National Statistics (ONS). ‘The review of the dissemination of health statistics in England, supporting paper’. 2007

2. ONS. ‘Review of the dissemination of health statistics: confidentiality guidance’. 2006