From the Apple Watch to MRI scanners, wearable devices and advances in artificial intelligence (AI) are transforming medicine and healthcare.
However, the use of AI and, specifically, machine learning raises a fundamental question from a safety point of view: how is it possible to define standards and regulations for system behaviour that might be impossible to guarantee?
Anil Anthony Bharath, of Imperial College London, is the author of BSI's white paper Recent advancements in AI – implications for medical device technology and certification. In this summary we examine his views on the use of AI in medicine and healthcare and its likely developments.
Two types of artificial intelligence
It's important to make a key distinction between the following two types of AI:
- AI that displays partially autonomous behaviour based on a well-defined rule set or scheme; and
- AI that incorporates machine learning, developing its own set of rules by learning from example.
Because AI can help improve how complex software systems are engineered, it is common to see AI used in some form across industry sectors: running hardware and software to interpret data, controlling the actions of devices or interacting with human users.
How is AI currently used in medicine and healthcare?
The aim of AI in medicine and healthcare is to improve patient outcomes. Any device that interprets data through software could potentially be improved by integrating AI, and these extended capabilities can, in turn, improve diagnostic accuracy.
AI is already beginning to have an impact on systems that can really benefit from it. A prime example is imaging systems and radiomics. The latter is used to create biomarkers that categorize patients, helping to determine appropriate therapies, surgical margins and the effects of treatment. There is increasing reliance on machine learning to implement image-based biomarker detection and measurement.
AI can also enable new measurement capabilities – for example, free-breathing MRI scanning that uses optical flow algorithms to correct for chest wall and cavity motion. Such learning-based systems can exceed the performance of existing hand-engineered algorithms, meaning that, in future, trained networks might replace human-designed algorithms.
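The white paper contains no code, but the underlying idea is easy to sketch: dense optical flow estimates a per-pixel displacement field between two frames, which can then drive motion correction. The sketch below uses OpenCV's Farneback method on synthetic frames; everything in it is illustrative rather than taken from any real MRI pipeline.

```python
# Minimal sketch: dense optical flow between two frames, as used for
# motion correction in free-breathing imaging. The frames are synthetic
# stand-ins, not MRI data (illustrative only).
import numpy as np
import cv2

# Two 8-bit greyscale "frames": a bright square shifted down by 3 pixels.
prev_frame = np.zeros((128, 128), dtype=np.uint8)
next_frame = np.zeros((128, 128), dtype=np.uint8)
prev_frame[40:80, 40:80] = 255
next_frame[43:83, 40:80] = 255

# Dense per-pixel flow field (dx, dy) via the Farneback algorithm.
flow = cv2.calcOpticalFlowFarneback(
    prev_frame, next_frame, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

# The estimated displacements can drive motion correction; here we just
# report the mean motion over the moving region.
mask = prev_frame > 0
print("mean displacement (dx, dy):", flow[mask].mean(axis=0))
```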
Large-scale collection of clinical and laboratory measurements and patient outcomes is also likely to employ machine learning. Instead of waiting for the results of large-scale studies and trials, patient risk can now be inferred directly from the data. The Apple Watch Series 4 already has a feature, trained on data from around 9,000 patients, that detects atrial fibrillation with moderate-to-high accuracy.
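As a hedged illustration of what inferring risk from pooled data can look like, the sketch below fits a simple classifier to synthetic measurements. The feature names and numbers are invented and bear no relation to the Apple Watch study.

```python
# Illustrative only: inferring per-patient risk from pooled measurements
# with a simple classifier, rather than waiting for a trial readout.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Hypothetical features: resting heart rate and heart-rate variability.
X = np.column_stack([rng.normal(70, 10, n), rng.normal(50, 15, n)])
# Synthetic outcome: risk rises with heart rate, falls with variability.
logit = 0.08 * (X[:, 0] - 70) - 0.05 * (X[:, 1] - 50)
y = rng.random(n) < 1 / (1 + np.exp(-logit))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)

print("estimated risk for one patient:", model.predict_proba(X_te[:1])[0, 1])
print("held-out accuracy:", model.score(X_te, y_te))
```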
There is also scope for AI to improve diagnoses. The diagnostic decision process has traditionally relied on carefully designed decision trees, which do not learn from outcomes. Machine learning, natural language processing and maintaining histories on individual patients could remedy this. For example, Babylon Health is developing AI systems that interact directly with patients through natural language processing.
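To make the contrast concrete, the sketch below learns a small decision tree from synthetic outcome data; unlike a hand-designed tree, it can simply be refitted as new outcomes accumulate. All features, thresholds and labels here are invented.

```python
# Sketch: a diagnostic decision tree learned from outcome data rather
# than hand-designed. Features and labels are synthetic and illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
# Hypothetical inputs: temperature (degrees C) and white-cell count.
X = np.column_stack([rng.normal(37.0, 0.8, 500), rng.normal(7.5, 2.5, 500)])
y = (X[:, 0] > 37.8) & (X[:, 1] > 9.0)   # synthetic outcome label

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
# Unlike a fixed hand-crafted tree, these rules can be refitted whenever
# new outcome data accumulate.
print(export_text(tree, feature_names=["temperature", "wcc"]))
```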
How might AI be increasingly seen in medical devices and certified for use?
The interpretation of sensor data relies on well-defined mappings and algorithms. Redefining these mappings, or adding new ones, raises questions about the new parameters and may cause a device to behave in unexpected ways.
For AI that provides partially autonomous behaviour based on a set of rules, its usage in medical devices is essentially 'business as usual'.
For AI that incorporates machine learning, however, there may be extra requirements on system design before certification. For example, it is possible to change a system's function by altering the weights of the network (the 'impact' that a node has on the next node in a network).
Since this is easy to do, and the difference between two trained networks can be difficult to detect, errors can occur. There are two obvious solutions to this. One is to agree a set of sensor signal examples, symptoms and even images to represent an accepted standard. All systems that are candidates for certification must then perform to the same level as expert human diagnoses.
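A toy illustration of both the problem and the proposed remedy: two networks whose weights differ almost imperceptibly can still disagree on some inputs, and evaluating both against an agreed reference set is one way to expose the difference. All values below are invented.

```python
# Toy illustration: a tiny change to one network weight alters the
# function, which only evaluation on an agreed reference set reveals.
import numpy as np

def predict(x, W, b):
    """One-layer network: positive score means class 1."""
    return (x @ W + b > 0).astype(int)

rng = np.random.default_rng(2)
W = rng.normal(size=(3,))
b = 0.0
W_tampered = W.copy()
W_tampered[0] += 0.05   # small, hard-to-spot weight change

reference_set = rng.normal(size=(1000, 3))   # agreed benchmark inputs
disagree = (predict(reference_set, W, b)
            != predict(reference_set, W_tampered, b)).mean()
print(f"outputs differ on {disagree:.1%} of the reference set")
```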
Another, complementary, solution is to capture information on the entire software stack. Network weights, which largely determine function, should be uniquely coded.
A further question is whether the data and examples used to 'train' the machine learning system should be subject to scrutiny. The large data sizes make this seem impractical, yet tools can be created to find pathological items in datasets that might lead to bad device behaviour, and datasets and trained models can be assigned a joint signature.
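One plausible reading of 'uniquely coded' weights and a joint dataset–model signature is a cryptographic fingerprint, sketched below with SHA-256. The arrays stand in for real weight and dataset files, and this construction is an assumption of this summary, not a prescription from the white paper.

```python
# Sketch: fingerprinting network weights, and a joint signature that binds
# a trained model to its training data. Arrays stand in for real files.
import hashlib
import numpy as np

weights = np.array([0.8, -0.3, 1.1], dtype=np.float64)  # stand-in weights
dataset = np.arange(100, dtype=np.float64)              # stand-in dataset

weights_sig = hashlib.sha256(weights.tobytes()).hexdigest()
dataset_sig = hashlib.sha256(dataset.tobytes()).hexdigest()

# Joint signature: changes if either the model or its training data change.
joint_sig = hashlib.sha256((weights_sig + dataset_sig).encode()).hexdigest()
print("weights signature:", weights_sig[:16], "...")
print("joint signature  :", joint_sig[:16], "...")

# Even a minuscule edit to the weights yields a completely new signature.
weights[0] += 1e-9
print("after tiny edit  :", hashlib.sha256(weights.tobytes()).hexdigest()[:16], "...")
```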
Other challenges: open learning and gold standards
Another key topic is open learning, in which models are updated while an inference system is live. Though most relevant to pooling large amounts of data from many individuals for diagnostic purposes, it can also take the form of reinforcement learning on a specific device for a single patient. Any such adaptation, however, would have to remain within specified bounds.
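As a sketch of what adaptation 'within specified bounds' might look like, the example below clips each live update so the deployed weights never drift beyond a fixed envelope around their certified values. The envelope, weights and update rule are all invented for illustration.

```python
# Sketch: open (online) learning constrained to a certified envelope.
# The certified weights, bound and updates are invented for illustration.
import numpy as np

certified_w = np.array([0.8, -0.3, 1.1])   # weights approved at certification
max_drift = 0.1                            # permitted deviation per weight
w = certified_w.copy()

def bounded_update(w, gradient, lr=0.01):
    """One gradient step, then projection back into the certified envelope."""
    w = w - lr * gradient
    return np.clip(w, certified_w - max_drift, certified_w + max_drift)

rng = np.random.default_rng(3)
for _ in range(1000):                      # live updates from streaming data
    w = bounded_update(w, rng.normal(size=3))

print("max drift from certified weights:", np.abs(w - certified_w).max())
```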
Finally, there are questions around the development of a 'gold standard' of performance. Vital questions still need to be answered. How can one establish a gold standard for treatment when AI is deployed for patient triage? How does one take quality of life into account when defining the 'best' outcome for palliative care?
This could be a challenge to medicine as a whole. The best way forward might be to have agreement on standardization and sharing of clinical records, pooled across hospitals and with fine-grained clinical data. In these conditions, gold standards could emerge naturally.
You can read the full white paper at https://www.bsigroup.com/en-GB/medical-devices/resources/whitepapers/.