Official Statistics

Pathways between probation and addiction treatment in England: methodology

Published 30 March 2023

Applies to England

1. Introduction

This document provides technical details to support the report Pathways between probation and addiction treatment in England. It explains the methods and procedures used to link probation data from the Ministry of Justice (MOJ) and the National Drug Treatment Monitoring System from the Office for Health Improvement and Disparities (OHID), which is part of the Department of Health and Social Care (DHSC).

This document also describes:

  • the approach taken to ensure sound information governance
  • the lawful basis for linking the data
  • how individuals’ privacy was protected
  • how treatment pathways were analysed

2. Methods

2.1 Databases

This study was based on linking 2 databases, which were:

  1. The national probation database (nDelius).

  2. The National Drug Treatment Monitoring System (NDTMS).

NDTMS contains data on all publicly funded treatment for drugs and alcohol in England. It captures clinical information about people receiving treatment, along with:

  • sociodemographic information
  • treatment interventions delivered
  • the outcomes of treatment

nDelius is the probation case management system that captures information about offenders who received either an alcohol treatment requirement (ATR) or a drug rehabilitation requirement (DRR) in England, as defined by the Criminal Justice Act 2003.

2.2 Data quality

The data from nDelius is a direct extract from an operational system upon which the probation service depends for managing offenders locally. You can find more information about the probation data in nDelius in the guide to proven reoffending statistics, which is available in each data release in the proven reoffending statistics collection.

For more information about the quality of the NDTMS data and the methodology for collecting alcohol and drug treatment data and producing statistics, see the NDTMS quality and methodology information paper.

2.3 Information governance

This study put in place 3 levels of information governance.

  1. A formal data protection impact assessment (DPIA) was carried out. The DPIA was approved by the data protection officers of both departments.

  2. A formal data-sharing agreement was signed by senior management representatives in both departments.

  3. Authorisation was obtained from the UK Health Security Agency (UKHSA) Caldicott Guardian, since the data was hosted by UKHSA. This ensured the project adhered to the 8 Caldicott principles.

People accessing specialist drug and alcohol treatment in England are asked to provide consent for their information to be shared with NDTMS. Almost 98% of people provide this consent, which allows DHSC to link NDTMS information with other systems, such as prison, probation and hospital data sets. This satisfies DHSC’s common law duty of confidentiality. For more information about NDTMS consent, see NDTMS: consent and privacy notice.

DHSC cannot identify individuals accessing treatment to other government departments. This means that only staff from DHSC conducted the linkage and analysed the information used in this project.

Data protection legislation requires us to have a valid legal reason to process and use data for this project. This is often called a legal basis.

UK General Data Protection Regulation (UK GDPR) requires us to be clear about the legal basis we rely on to process this information. Under Articles 6 and 9 of the UK GDPR, the legal bases we rely on for processing the information are that it’s necessary:

  • for the performance of a task carried out in the public interest or in the exercise of the controller’s official authority
  • for reasons of public interest in the area of public health (for example, to ensure high standards of quality and safety of care)
  • for archiving, scientific or historical research or statistical purposes

These legal bases only apply if we take suitable and specific measures to protect people’s rights, and only use their information for the purposes described in this document.

The Schedule 1 conditions of the Data Protection Act 2018 that are satisfied are:

  • condition 2: health or social care purposes
  • condition 3: public health
  • condition 4: research

2.6 Protecting privacy

To further ensure people’s privacy, MOJ transferred 2 separate files to DHSC. One file contained the necessary information to conduct the linkage. The other file contained the sociodemographic and offending-related information. Data scientists at DHSC also created 2 files for this project. The first contained identifiers for the linkage, and the second had core information about the clinical profile and treatment outcomes.

Two separate data scientists at DHSC conducted the linkage and created a ‘key’ file that would enable the 2 sets of attribute data to be combined. At this stage, a different analyst was responsible for combining the attribute data. Once combined, the analyst ensured that the final data set was anonymised, following the Information Commissioner’s Office code of practice on anonymisation (PDF, 1.9MB). This means that the likelihood of being able to re-identify individuals was remote. Once this final file was quality assured, we permanently deleted all preceding files.
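The separation of identifiers from attribute data described above can be sketched as follows. This is a minimal illustration only, not the code used in the project: the record IDs, field names and values are all made-up assumptions, and the exact-match step simply stands in for whichever linkage method is applied.

```python
# Hypothetical files: each source is split into identifiers and attributes
# that share only a record ID. All names and values are illustrative.
moj_identifiers = {"P1": ("AB", "1980-01-01"), "P2": ("CD", "1975-06-30")}
moj_attributes = {"P1": {"order_type": "DRR"}, "P2": {"order_type": "ATR"}}
ndtms_identifiers = {"T9": ("AB", "1980-01-01"), "T7": ("EF", "1990-03-15")}
ndtms_attributes = {
    "T9": {"discharge": "treatment completed"},
    "T7": {"discharge": "dropped out"},
}

def build_key_file(left_ids, right_ids):
    """Linkage step: sees identifiers only and returns pairs of record IDs."""
    right_index = {ident: rid for rid, ident in right_ids.items()}
    return [
        (lid, right_index[ident])
        for lid, ident in left_ids.items()
        if ident in right_index
    ]

def combine_attributes(key_file, left_attrs, right_attrs):
    """Analysis step: sees attributes and the key file, never the identifiers.

    The record IDs are dropped from the combined output.
    """
    return [{**left_attrs[l], **right_attrs[r]} for l, r in key_file]

key_file = build_key_file(moj_identifiers, ndtms_identifiers)
linked = combine_attributes(key_file, moj_attributes, ndtms_attributes)
```

In this sketch, the person running `combine_attributes` never handles initials or dates of birth, which mirrors the division of roles described above.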

It is important to note that, while this project depends on individual-specific information, we did not intend to use the information to affect any specific individual. Instead, our purpose was to determine whether the probation and treatment systems were working as intended.

2.7 Data linkage procedures

Data linkage

Data linkage is the process by which personal records from one data set are attached to personal records from another. Ideally, both data sets would share the same unique identifier, such as a national insurance number or NHS number, so that records from different sources can be aligned directly. However, there is no common unique identifier available to link NDTMS and nDelius data.

NDTMS does not collect a person’s full forename or surname, only the initials for those names. It also holds the person’s:

  • date of birth
  • sex
  • ethnicity
  • postcode sector (up to the first digit after the space)
  • local authority of residence

This has been sufficient when previously linking offending or state benefit data with NDTMS. For this project, we conducted both deterministic and probabilistic linkage approaches.

Deterministic linkage

Deterministic linkage is defined by a clear set of rules. The simplest method to apply in this project is to require a record in NDTMS and a record in nDelius to share the same set of initials, date of birth, sex and local authority. Where such a pair of records exists across the 2 data sets, our interpretation is that they belong to the same individual.
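The exact-match rule above can be sketched in a few lines. This is an illustrative example under stated assumptions, not the project's code: the field names, record IDs and records are hypothetical.

```python
from collections import defaultdict

# Fields that must agree exactly (field names are illustrative assumptions).
MATCH_FIELDS = ("initials", "date_of_birth", "sex", "local_authority")

def deterministic_links(ndelius_records, ndtms_records, fields=MATCH_FIELDS):
    """Return (nDelius id, NDTMS id) pairs that agree on every field."""
    index = defaultdict(list)
    for record in ndtms_records:
        index[tuple(record[f] for f in fields)].append(record["id"])
    links = []
    for record in ndelius_records:
        key = tuple(record[f] for f in fields)
        links.extend((record["id"], ndtms_id) for ndtms_id in index.get(key, []))
    return links

# Made-up records for illustration only.
ndelius = [
    {"id": "P1", "initials": "AB", "date_of_birth": "1980-01-01",
     "sex": "M", "local_authority": "Leeds"},
    {"id": "P2", "initials": "CD", "date_of_birth": "1975-06-30",
     "sex": "F", "local_authority": "Bristol"},
]
ndtms = [
    {"id": "T1", "initials": "AB", "date_of_birth": "1980-01-01",
     "sex": "M", "local_authority": "Leeds"},
    {"id": "T2", "initials": "CD", "date_of_birth": "1975-06-30",
     "sex": "F", "local_authority": "Bath"},
]
links = deterministic_links(ndelius, ndtms)
```

Here P2 and T2 differ only on local authority, so the rule rejects the pair. A person who moves between areas would be missed in the same way, which is one reason a deterministic rule can give a lower linkage rate.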

Probabilistic linkage

Probabilistic linkage is a more flexible approach that can lead to a higher linkage rate. Fellegi-Sunter is a probabilistic model for linking 2 data sets over several fields. This model has been incorporated into MOJ’s software Splink, which enables probabilistic record linkage on a large scale. The model allows for different fields to have higher discriminatory power than others (for example, if 2 records have the same sex, that is less discriminatory for linking purposes than the 2 records having the same set of first name and surname initials).

For small data sets, it is possible to compare each record in one data set with each record in another data set, but the software allows us to incorporate ‘blocking rules’. This helps us to narrow down the number of record pairs being compared and enhance the efficiency of the linkage. Blocking rules are a set of criteria that any 2 records must meet (for example, initials and dates of birth must match) before any other comparisons are done. In practice, we can develop multiple blocking rules (for example, initials and dates of birth must match or initials and postcodes must match).

This approach reduces how many record pairs are compared by discarding implausible matches. For example, if there was one data set with 10,000 records and another with 100,000 records, there are potentially a billion comparisons that could be made. The blocking rules allow for most of this potential billion to be ignored.
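To show how blocking rules cut down the number of comparisons, the following sketch builds candidate pairs from 2 rules combined with 'or'. It does not use Splink's API. The function, field names and records are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def candidate_pairs(left, right, blocking_rules):
    """Return index pairs (i, j) that satisfy at least one blocking rule.

    Each rule is a tuple of field names that must all match exactly.
    """
    pairs = set()
    for rule in blocking_rules:
        index = defaultdict(list)
        for j, record in enumerate(right):
            index[tuple(record[f] for f in rule)].append(j)
        for i, record in enumerate(left):
            for j in index.get(tuple(record[f] for f in rule), []):
                pairs.add((i, j))
    return pairs

# Made-up records for illustration only.
left = [
    {"initials": "AB", "dob": "1980-01-01", "postcode": "M1 1"},
    {"initials": "CD", "dob": "1975-06-30", "postcode": "LS2 9"},
    {"initials": "EF", "dob": "1990-03-15", "postcode": "B1 2"},
]
right = [
    {"initials": "AB", "dob": "1980-01-01", "postcode": "M4 5"},
    {"initials": "CD", "dob": "1975-07-30", "postcode": "LS2 9"},
    {"initials": "GH", "dob": "1990-03-15", "postcode": "B1 2"},
]
rules = [("initials", "dob"), ("initials", "postcode")]

pairs = candidate_pairs(left, right, rules)
all_comparisons = len(left) * len(right)   # every record against every record
full_scale = 10_000 * 100_000              # the example sizes in the text
```

With these records, only 2 of the 9 possible pairs pass a blocking rule and go on to be scored. At the sizes quoted above, `full_scale` is 1,000,000,000, which is why comparing every pair is impractical.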

Conducting the probabilistic linkage involves calculating 3 fundamental statistics.

  1. The m-probability is the likelihood that 2 records match on a given field, if the records are a true match (the records belong to the same person).

  2. The u-probability is the likelihood that 2 records match on a given field, if the records are a false match (the records belong to different people).

  3. Lambda is the overall probability that any 2 randomly selected records are a match.

We can then apply Bayes’ formula to assign a single probability that each pair of records is related to the same individual across the 2 systems.
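The way these 3 statistics combine can be sketched as below. This follows the standard textbook form of the Fellegi-Sunter model, which assumes fields agree or disagree independently given match status. It is not Splink's implementation, and every parameter value is a made-up assumption rather than an estimate from this project.

```python
def match_probability(lam, m, u, agreement):
    """Posterior probability that a record pair is a true match.

    lam: prior probability that a random pair is a match (lambda)
    m, u: per-field m- and u-probabilities
    agreement: per-field True/False, whether the 2 records agree
    """
    odds = lam / (1 - lam)  # prior odds of a match
    for field, agrees in agreement.items():
        if agrees:
            odds *= m[field] / u[field]              # agreement raises the odds
        else:
            odds *= (1 - m[field]) / (1 - u[field])  # disagreement lowers them
    return odds / (1 + odds)

# Illustrative values only.
lam = 0.001
m = {"initials": 0.95, "dob": 0.98, "sex": 0.99}
u = {"initials": 0.005, "dob": 0.0001, "sex": 0.5}

p_all_agree = match_probability(
    lam, m, u, {"initials": True, "dob": True, "sex": True})
p_only_sex = match_probability(
    lam, m, u, {"initials": False, "dob": False, "sex": True})
```

In this sketch, agreement on sex multiplies the odds by much less than agreement on initials, because the ratio m/u is far smaller for sex. This is the sense in which some fields have more discriminatory power than others.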

2.8 Pathways between probation and specialist addiction treatment

Defining successful pathways

The initial plan for analysing the pathways between community sentence treatment requirements and specialist addiction services in England assumed that people would only access treatment after their sentencing date. However, early analysis showed that a sizeable minority of people were already engaged in treatment at their sentencing date.

In response, we decided to define a successful pathway between the probation system and specialist addiction treatment. This pathway included people who were either already engaged in treatment at the point of disposal or who entered the treatment system after their disposal date.

Characteristics associated with accessing treatment

A limited set of offender attributes were available in this data-sharing project.

Sociodemographic information available included the offender’s:

  • gender
  • age
  • ethnicity
  • accommodation
  • employment status

Offending-related data contains:

  • the year in which the treatment requirement was issued
  • whether the requirement was a DRR or an ATR
  • the number of requirements issued
  • the length of the order
  • whether the order was a community order or a suspended sentence order
  • the main offence category
  • the risk of serious reoffending
  • the offender group reoffending scale score (a predictor of reoffending based on age, sex and criminal history)
  • the termination reason for the order

We entered these characteristics into a statistical model (specifically, a multilevel multivariable logistic regression) to estimate whether they were associated with the likelihood of accessing treatment.

Multilevel models are required when the data has a multilevel structure. This happens when people are in clusters – for example, by attending different treatment providers. Two random people who attend one provider may have outcomes that are more similar than 2 random people attending 2 different treatment providers. This can happen if, for example, one provider treats people in a particular way that is different from other providers, or if the 2 random people are treated by the same keyworker.

The multivariable part of the model means there are many factors that might be associated with achieving the outcome, and they are all entered into the model to calculate the probability of something happening.
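As an illustration, a multilevel multivariable logistic regression with a single random intercept can be written as below. The clustering unit and the random-effects structure are assumptions made for this sketch. This document does not specify them, and the symbols are introduced here only for illustration.

```latex
\operatorname{logit}\,\Pr(y_{ij} = 1)
  = \beta_0 + \sum_{k=1}^{K} \beta_k x_{kij} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Here y_{ij} is the outcome (for example, accessing treatment) for person i in cluster j, x_{kij} are the K characteristics entered into the model, and u_j is a cluster-level effect that lets people in the same cluster have more similar outcomes.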

2.9 Treatment outcomes

Primary treatment outcome

The primary treatment outcome in this report is whether people who engage in treatment complete their treatment. The discharge status of ‘treatment completed’ is defined as a clinician-verified report that a person has recovered from their substance use disorder and has met their care plan goals. These people cannot still be using illicit opioids or crack cocaine.

People may also drop out of treatment. The number of people who drop out is measured by a collection of discharge codes that includes people who:

  • decide to leave treatment before they are clinically ready to do so
  • have been sent to prison
  • have had their treatment withdrawn by the provider (usually due to the person breaking their treatment contract)

People can also still be engaged in treatment at the end of the data period (March 2022).

Secondary treatment outcome

The change in a person’s drug and alcohol use when they are in treatment is measured by the treatment outcomes profile (TOP). The service collects data about the person at the start of treatment and then every 6 months. The service carries out a final TOP review when the person leaves treatment. You can find more information about TOP in the NDTMS drug and alcohol treatment business definitions.

The TOP captures the frequency of use of several substances in the 28-day period before the assessment, alongside other critical concerns, such as housing, employment, and physical and psychological health.

Characteristics associated with successfully completing treatment

For offenders who engage in treatment, NDTMS can attach several clinical attributes to each record, including:

  • the substances people are in treatment for
  • injecting status
  • age of first use for the main substance that brought a person into treatment
  • information from the TOP that is captured at the start of treatment

We entered these additional information points into a statistical model (specifically, a multilevel multivariable logistic regression) to estimate whether they were associated with completing treatment. All variables used in the analysis are available in the data tables (tables 16 and 17).

2.10 Sensitivity analysis

The main analyses presented in this report are based on a linkage between nDelius and NDTMS at a probability threshold of 0.999. This means there is a 99.9% probability that an individual identified in both systems is the same person. However, it is important to assess whether the main analysis in this report is supported by ‘sensitivity analysis’. For example, if women are less likely to access treatment under the probabilistic linkage condition, there would be a greater level of confidence in the results if they are also less likely to access treatment under the deterministic linkage condition.

It is possible to lower the probability threshold to a different level. However, this means that the results would link a greater proportion of individuals with at least one treatment record. And crucially, there would be less confidence that the correct offenders are linked with the correct treatment records.
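The trade-off described above can be sketched by applying different thresholds to the same set of scored record pairs. The function name, record IDs and match probabilities below are hypothetical and chosen only for illustration.

```python
def links_at_threshold(scored_pairs, threshold):
    """Keep record pairs whose match probability meets the threshold."""
    return [(a, b) for a, b, prob in scored_pairs if prob >= threshold]

# Made-up (nDelius id, NDTMS id, match probability) triples.
scored_pairs = [
    ("P1", "T1", 0.9999),
    ("P2", "T2", 0.9992),
    ("P3", "T3", 0.97),
    ("P4", "T4", 0.91),
    ("P5", "T5", 0.40),
]

strict_links = links_at_threshold(scored_pairs, 0.999)  # threshold in the text
looser_links = links_at_threshold(scored_pairs, 0.9)    # a lower alternative
```

In this sketch, the lower threshold links 4 pairs rather than 2, but the extra pairs carry a lower probability of being the same person.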

3. Contact details

You can send enquiries and feedback on these experimental statistics to MOJ at [email protected].