Design your evaluation: evaluating digital health products
Guidance on choosing evaluation study types and methods.
This page is part of a collection of guidance on evaluating digital health products.
Decide what data to collect
You will need to work out how to collect the data you need for your evaluation. This may mean balancing the effectiveness of different data collection methods against the time and resources needed to carry them out.
Whether you plan to run the evaluation yourself or commission an external team, you should design your product to collect the data you need. Considering these practicalities will make your evaluation simpler to run:
Develop your product with evaluation in mind
This is best practice. Find out what you need to evaluate and make sure you can collect, record and store it in a consistent way over the period of the evaluation.
Consider evaluation when getting ethics approval
If you want to follow up with users, especially those who disengaged from your product, you need to have permission to email them. To associate data with the individual, you also need informed consent in advance.
Ensure the design of the product supports randomisation
The product should support randomisation between users and between events for a single user.
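As a sketch, randomisation at both levels can be very simple in code. The arm names and the 50% event probability below are illustrative assumptions, not part of this guidance:

```python
import random

def randomise_users(user_ids, seed=2024):
    """Randomise between users: assign each user to one of two
    balanced arms. Arm names are illustrative."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    arms = {u: "intervention" for u in shuffled[:half]}
    arms.update({u: "control" for u in shuffled[half:]})
    return arms

def randomise_event(rng):
    """Randomise between events for a single user: for each event
    (for example, each prompt), decide at random whether the user
    receives the intervention this time."""
    return rng.random() < 0.5
```

A fixed seed makes the allocation reproducible for auditing; in a live study you would record each allocation as it is made.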
Types of data collection
Data collection through the digital product
Collecting data through the digital product is straightforward but means you cannot collect data on people who have stopped using the product. Some data collection from a digital product, like usage data, can be automatic. This is useful, but only some data can be collected in this way. Other data collection relies on users describing their own behaviour (self-report data), which can be inaccurate or biased.
Data collection must comply with the General Data Protection Regulation (GDPR) – read guidance on GDPR. It’s important to know that:
- users may not always consent to data collection
- GDPR requirements will introduce some bias in your data – for example, the requirement for users of a website to accept cookies
App store data
App stores, or similar services, generally provide some statistics about an app, which you can access easily.
The number of people who have downloaded a product is a useful measure of impact. However, people often download an app but never use it, or only use it once. Downloads cannot tell you about longer-term engagement or whether a product is effective.
App stores give average user ratings (quantitative data) and access to individual reviews (qualitative data). These provide valuable feedback and will also influence whether someone will download an app. Reviews of your product and rival products can help to inform improvements.
However, popular apps are not always the most effective. People leaving reviews may also not be representative of most users.
Usage data
You can draw data from the usage of your digital product. The data you can collect will depend on how the product has been coded, so think about this when developing your product.
Commonly available data includes:
- when the user accesses the product
- what parts of the product they viewed
Usage data is good for assessing engagement – unlike downloads, it shows individuals are actually engaging with the product and how they are engaging with it.
Usage data cannot show whether an app is effective. However, usage data can be correlated with outcome data. This can show whether users who access the product more show a greater improvement in the outcome.
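As an illustration of correlating usage with outcomes, a Pearson correlation can be computed from paired values. The session counts and improvement scores below are made-up values for demonstration only:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired lists,
    for example usage counts and outcome improvements."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: sessions per week vs improvement in an outcome score
sessions = [1, 2, 3, 5, 8, 10]
improvement = [0.5, 1.0, 1.2, 2.0, 3.1, 3.8]
r = pearson_r(sessions, improvement)  # close to 1 for this made-up data
```

Note that a correlation like this shows association, not causation: heavier users may differ from lighter users in other ways.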
Data from smartphones
Many digital devices record data without the user having to do anything different (passive data collection). This reduces missing data and biases associated with who records data.
Smartphones typically have motion sensors and location data, which can be used, for example, to estimate physical activity. These measures are not perfect. For example:
- users may not have the phone with them during exercise
- users might disable data collection if it impacts on battery usage
- users may not consent to data collection
- phones cannot always differentiate between activities
Users may be more likely to consent to data collection if they see a benefit to it – for example, if they are using an app for self-tracking purposes.
Data from wearable devices
Wearables offer more possibilities for passive data collection. Separate motion sensors may give more reliable data. Watch-like devices may directly measure things like heart rate and blood pressure (physiological indicators). Some devices target specific conditions, like continuous glucometers for diabetes.
Data collection can still be incomplete – devices may require charging and upkeep and people may not always wear them. Data may also not be accurate – for example, step count data is sometimes unreliable.
Self-tracking and self-report data
Many digital health products ask the user to input data (active data collection) as part of how they function. For example, if the app is used for self-tracking, users may track:
- behaviours, such as physical activity or diet
- how they feel, for example with a symptom tracker
Because it requires action from the individual, incomplete data collection is a common problem. You could still use this data for your evaluation and also ask users questions directly, separate to the product’s functions. For example, you can ask users:
- what they think of your product (to generate quantitative or qualitative data)
- to complete standardised questionnaires, for example, the Alcohol Use Disorders Identification Test (AUDIT)
- questions specific to your evaluation
Questions should be informed by your model of how your product works. Ask questions in a clear and neutral manner. It is worthwhile piloting your questions to check they work well.
This approach cannot tell you about the experience of non-users and usually cannot tell you about the lasting effects of your product. For example, does an app to promote physical activity still have an impact 5 years later? If someone stops using a product, this may not mean the product hasn’t worked for them. For example, a Couch to 5K user may have stopped using the app because they became a confident and regular runner.
Linked data
Data about users of a digital health product or service may be collected for other reasons – for example, when they attend NHS services in person. This data may be useful for your evaluation.
To link data from another source to your evaluation of a digital health product, you need to be able to identify the user of the product with the data collected elsewhere. This has legal and technical challenges, so discuss this with the relevant NHS body and consider what permissions you need.
Data collection outside the digital product
Sometimes the best way to collect data on an individual is to approach them directly, outside of and separate to the digital health product. For example, it may be possible to send them a questionnaire through contact details they provided or to meet them in person.
These approaches have problems with response rates but are better at reaching people who have disengaged from the digital product.
Sometimes you will want to collect more objective data from individuals. This might require specific testing, for example, measuring cotinine levels in saliva to objectively measure smoking behaviour.
Choose your design
Start by thinking about why you are doing an evaluation. Different types of evaluations answer different questions and may be required at different times. Different types of evaluation can complement each other. You rarely want to do just 1 evaluation over the life of a product.
Common reasons for doing evaluations include to:
- check whether your product is effective
- go through a regulatory process
- inform the development of the product
- understand users’ needs
There are some general points to consider when designing your study:
Choosing an approach
Choosing an approach often involves compromise. A more complicated study will require more time and resources but will give an answer in which you can have more confidence. A cheaper and quicker evaluation may be less rigorous. The Medical Research Council (MRC) has advice on evaluating complex interventions.
You may be in a situation where you can do something very rigorous relatively cheaply, depending on how your product works, what data is available and other factors. Thinking about evaluation early in your development cycle may allow you to design systems that make evaluations cheaper and easier to do later on.
These resources divide evaluation approaches into 4 broad categories:
- descriptive
- comparative
- qualitative
- economic studies
The guidance Choose evaluation methods describes these and will help you choose what sort of evaluation you want.
Mixing methods
In practice, some evaluations fall between categories or use mixed methods that combine more than 1 design. For example, you could collect data in a before-and-after study on a large number of participants, and also interview a subset of participants about their experience of using the product. Read more about carrying out a Mixed methods study.
Independent evaluations
Consider who should carry out the evaluation. You may want an independent organisation or group to conduct it. Demonstrating independence will give your evaluation more credibility as it avoids potential bias. However, it adds cost and reduces flexibility.
Independent evaluations are more usual when you have a product and want to check it works, and you want to communicate your findings to an external audience. It is rare to use independent evaluations when you are still developing your product.
Sampling
Most evaluations involve sampling – collecting data from only some users of your product.
You want the participants in your sample to be representative of all users. Usually, the best way to do this is a random sample. A consecutive series of users (for example, everyone who downloads an app between 2 dates) often works like a random sample. In some situations, it will be easier to use a specific group. Consider whether this will introduce any biases in your study – for example, might the selected users like the product more, or respond better to it?
Representativeness is less important to some evaluation approaches – for example, many qualitative methods. With these methods, you may want to pick a deliberately diverse sample.
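The two sampling approaches mentioned above can be sketched in a few lines. The user identifiers and sample size here are illustrative assumptions:

```python
import random

def random_sample(user_ids, n, seed=7):
    """Draw a simple random sample of n users for follow-up."""
    rng = random.Random(seed)
    return rng.sample(list(user_ids), n)

def consecutive_series(user_ids, start, stop):
    """A consecutive series: every user in a contiguous range,
    for example everyone who downloaded between two dates."""
    return list(user_ids)[start:stop]
```

A seeded random sample is reproducible, which helps when documenting how participants were selected.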
Statistical power
One approach used in quantitative analysis is hypothesis testing. Statistical power is the likelihood of a hypothesis test detecting an effect if there is an effect to detect. For example, if your app really does cause users to smoke fewer cigarettes (in which case your hypothesis is ‘this app helps users to smoke fewer cigarettes’), what is the likelihood of measuring that change? Calculation of statistical power tells you how many product users you need to have information about (for example, information about their smoking before and after using the app) for the test to be reliable.
The size of your sample will affect the statistical power. A power analysis or calculation can help you decide if your sample size is large enough. Read more about power calculations on the University of Manchester website.
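As a rough illustration, a standard normal-approximation formula gives the sample size per group needed to compare two group means. The effect size and standard deviation below are made-up values; a real power calculation should use parameters appropriate to your outcome and study design:

```python
import math
from statistics import NormalDist

def n_per_group(effect, sd, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-sample comparison
    of means (normal approximation). effect is the smallest difference
    you want to detect; sd is the standard deviation of the outcome."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)

# Hypothetical example: detect a reduction of 2 cigarettes a day,
# sd of 6 cigarettes, 5% significance level, 80% power
n = n_per_group(effect=2, sd=6)  # 142 users per group
```

Smaller effects or noisier outcomes require much larger samples, which is why deciding the smallest effect worth detecting matters before you recruit.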
Updates to this page
Published 30 January 2020
Last updated 2 December 2020

- Information added under Decide what data to collect.
- First published.