Introduction by Croakey: Artificial intelligence (AI) tools for clinicians and patients are advancing at a rapid pace, but are they fit for purpose in healthcare?
That was the overarching question for an expert panel at a webinar recently hosted by the Consumers Health Forum of Australia (CHF), as Marie McInerney reports below for the Croakey Conference News Service.
Marie McInerney writes:
In the artificial intelligence (AI) world, they’re known as ‘hallucinations’.
It’s where emerging AI technology gets the information wrong or, more technically, where the so-called large language model that sits behind AI ‘scribing’ tools like Otter or the increasingly popular ChatGPT generates content that is “not verified or accurate”.
In healthcare applications, as elsewhere, these hallucinations can be funny, such as one clinician’s letter, scribed by AI, that was signed off as being from ‘the world’s greatest cardiologist’.
But they can also be grave and dangerous, as speakers told a webinar on AI in healthcare, convened by the Consumers Health Forum of Australia (CHF) on 13 June.
The webinar focused particularly on generative AI, algorithms such as ChatGPT and other tools that can be used to create new content, including being able to turn a patient interaction into clinical notes in seconds.
Webinar participants heard that, in one informal test, ChatGPT extolled the nutritional virtues of adding a potentially fatal substance to breast milk, while a study published last year in the European Heart Journal showed how ChatGPT provided harmful medical advice, as well as references to study results and journal articles that did not exist.
Despite those risks, the webinar heard there is significant excitement and hope about what generative AI might deliver in healthcare for patients, clinicians and communities, especially in reducing the burden of documentation, which contributes to burnout for health professionals.
But speakers also raised many issues that need to be addressed, with a landmark citizens’ jury having recently recommended rigorous evaluation, fairness and patient rights, clinical governance and training, technical and data requirements, and community education and involvement, before we can be sure that these applications are “fit for purpose” for healthcare.
These discussions come as the Federal Government conducts a consultation on safe and responsible AI in Australia, with a Senate inquiry on Adopting AI also gathering submissions from health groups, including the CHF.
While healthcare consumers, healthcare services, and healthcare financing institutions could all potentially benefit from wider adoption of generative AI, the risks are primarily borne by consumers, CHF said.
Cautionary tales
Introducing the 90-minute panel discussion, the latest in a series of #CHFTalks events on health issues, CHF CEO Dr Elizabeth Deveny said it would not come as a surprise to anyone attending to hear that the “future of healthcare is increasingly going to be digital and heavily shaped by technology”.
“We know that artificial intelligence or AI can improve things, but we shouldn’t assume that that’ll be the case,” she said.
Deveny added that the question she always seeks to answer is “does a service or model of care need transformation?” If it does, then we need to ask whether a digital mechanism is a good way to deliver that, or whether transformation requires changing culture, funding mechanisms or other factors.
The webinar put consumer perspectives front and centre, with leading consumer representative Jen Morris providing reflections on each speaker’s presentation.
Morris, who has a particular interest in patient safety and harm prevention, is not confident that increasing use of AI in healthcare will improve health.
She opened the discussion by saying that two of the biggest public administration scandals of recent times, Robodebt in Australia and the Post Office Horizon scandal in the UK — which both involved “powerful entities weaponising [technology] against vulnerable people” — should serve as cautionary tales as nations grapple to find the right balance for AI in healthcare.
Addressing burnout
As a tech startup founder, #CHFTalks panelist Sydney GP Dr Roy Sugeeharan Mariathas unsurprisingly admits to being a ‘techno optimist’, seeing big early benefits for doctors and patients in the use of generative AI to streamline the production of clinical notes and other time-consuming documentation.
Mariathas, who is a member of the Royal Australian College of General Practitioners (RACGP) Expert Committee for Practice Technology and Management, quit his general practice work last year to work on a startup, still in its fledgling stage, which he said is aimed at addressing “the problem of lost time and cognitive burden for healthcare workers from administrative work”.
He has seen a mixed response to generative AI from health professionals to date – “there are early adopters and also nay-sayers” – but he knows of GPs being brought to tears of relief when they realise “how much of a burden it would lift off them”.
“For some of these doctors, it meant that they went home and saw their families, for the first time in like weeks, on time….they got to kiss their kids good night,” Mariathas told the webinar, saying uptake of the technology would go a long way to addressing burnout in the medical profession.
At the consumer level, Mariathas says many people are moving on from reliance on ‘Dr Google’ for health advice to ChatGPT. This holds promise for people who speak English as a second language, he noted, saying his Tamil-speaking mother tested one application in her native language, “and I have to say it handled it extremely well”.
However, he said, there is a risk when the large language model cannot reference websites or resources and is relying on incomplete training data. This could produce responses that are “funny and not so funny”, including the “very obscure and highly dangerous answer” on breast milk.
Mariathas personally has found AI tools particularly helpful for long consultations that carry some form of medico-legal implication: “When I need to be a little bit defensive, these tools are highly helpful in order to provide some really in-depth documentation.”
But he readily concedes there are concerns that need to be addressed, including that these products are easy to create, so the barrier to entry for startups is quite low. He also raised concerns about hallucinations and agreed the take-up of these technologies poses complex questions around accuracy, privacy, data sovereignty, and consent.
“We’ve still got a long way to go, that’s for sure,” he said.
In response, Jen Morris raised concerns about what impact scribing applications might have on “depth of listening” by clinicians in consultations, saying that lack of deep and attentive listening is a contributing factor in many cases of diagnostic error and patient harm.
People could argue that clinicians will be able to check the recording or notes later if they haven’t paid deep attention, but she said “active listening in the moment changes the questions you ask in the moment too”.
Morris also is concerned about the issue of patient consent for the use of generative AI by their doctors, particularly if it comes via an ‘opt out’ option, given that a patient may struggle to tell a clinician, ‘No, I’m not going to let you use the technology that makes your life easier’.
“That’s a very difficult thing to do as a patient, [it’s] something we should have the right to do, but it does create an interesting and difficult power dynamic,” she said.
Early days
Professor Farah Magrabi, Professor of Biomedical and Health Informatics at the Australian Institute of Health Innovation at Macquarie University, leads research to improve clinical safety and the effectiveness of digital health and AI for clinicians and consumers.
With the health system under enormous pressure, she believes AI has the potential to solve some of its intractable problems but, at the same time, we are seeing tech giants in a race to build ever more powerful AI, with more and more of these AIs being integrated into everyday consumer products.
A lot of AI today relies on or is built with health data, but she said the market is also seeing emerging foundation models, which are built on general data but are being applied in healthcare.
Magrabi highlighted a study published in NEJM Catalyst last year, reporting on a trial of AI technology for 10,000 physicians and staff.
It is one of the first such studies, and a full evaluation of the benefits and risks of a scribing tool designed for clinical consultations is still underway.
To date, she said, the research had shown “enormous potential” in terms of improving the clinician/patient interaction, with both groups “quite happy overall”.
Assessing 35 representative examples of transcripts, it found that overall they averaged a score of 48 out of a possible 50 points, with strong results for the domains of being free from bias, synthesised, internally consistent, and succinct.
However, it also highlighted instances of hallucinations. For example, when a physician mentioned scheduling a prostate examination for a patient, this was summarised as having been performed. In another case, when issues with a patient’s hand, feet and mouth were mentioned, the AI summary reported the patient being diagnosed with hand, foot and mouth disease.
Magrabi said it remains to be seen whether these tools are “going to help us or make things more difficult for us”, but stressed that it’s vital to “assure ourselves that these things work, that they’re fit for purpose before we deploy them in health care settings”.
Reflecting on Morris’s concerns about Robodebt, Magrabi said healthcare was not currently at the same level of risk, because clinicians are relying on these tech tools as an additional source of information, rather than having them “completely taking over their job”.
There is AI that goes further, such as being able to detect diabetic retinopathy, “so it’s actually making a decision or a diagnosis”, and this can powerfully improve access to screening services in places where specialists are not available, she said.
But the vast majority of generative AI in healthcare fits into the category of assisting clinicians rather than providing decisions.
Seminal research
Magrabi and colleagues at Macquarie University conducted a world-first review into the safety of AI enabled medical devices approved for use by the United States Food and Drug Administration (FDA).
Their study, published in the Journal of the American Medical Informatics Association (JAMIA), identified safety problems across all stages of medical device use, such as imaging for diagnosis and treatment, radiotherapy planning software, insulin dose calculators, clinical patient monitors, and cardiac arrhythmia detection apps.
The researchers looked at whether a reported problem was due to the device or to the way it was used.
By far the majority of safety issues actually come from bad data being fed into AI systems, Magrabi said. But, interestingly, while user errors featured in only seven percent of the events, they were four times more likely to harm patients than data or algorithm issues.
It’s important to understand that AI models generate plausible responses to queries: “They do not model logic, they do not model facts, or laws of the physical world, or any morality…it’s not looking at the big picture at all.”
That said, generative AI like ChatGPT can be beguiling, she said, giving as an example the difference between a Google search and ChatGPT for someone seeking help and information about how best to support a father who has been diagnosed with Parkinson’s disease.
Google will bring up links to reputable and, importantly, verifiable information and websites but leaves it to the consumer to sift through them and piece them together.
By comparison, ChatGPT is more empathetic, informative and directive, a “much nicer user experience”. Its drawback is that it’s “a black box”: it doesn’t tell the user the source of the information.
“If it’s something I can verify, then I’m okay, but if it’s something I can’t verify then obviously I’m in trouble,” she said, though noting the data capturing is “only getting better”.
Responding to Magrabi, Morris said she worried about medical records providing the data for generative AI when studies have over many years shown that many records contain factual errors (for example, in acute care in Australia).
“And that’s before we take into account bias, poor clinical reasoning, sloppy note taking, transcribing errors, medical legal defensiveness [and] all the other things that dirty the quality of medical records,” she added.
Citing the risk in data of “rubbish in, rubbish out”, Morris said she did not believe medical records data is “close to ready” to being a reliable input for healthcare AI.
She also raised concerns about how to ensure that generative AI systems could “unlearn” things when they are superseded by significant social and scientific shifts – for example, how knowledge about the causes of stomach ulcers has changed – and about how to address issues like misogyny, whereby women with endometriosis have their pain misdiagnosed and misattributed to anxiety.
Deliberative and democratic
So under what circumstances, if any, should artificial intelligence be used in Australian health systems to detect or diagnose disease?
The first national citizens’ jury on AI in healthcare sought to address that question last year, Professor Stacy Carter told the webinar.
The landmark findings, now published in The Medical Journal of Australia, are organised into ten recommendation categories: an overarching, independently governed charter and framework; balancing benefits and harms; fairness and bias; patients’ rights and choices; clinical governance and training; technical governance and standards; data governance and use; open source software; AI evaluation and assessment; and education and communication.
Carter, founding director of the Australian Centre for Health Engagement, Evidence and Values, and Professor of Empirical Ethics in Health at the University of Wollongong, told the webinar about the careful deliberative democratic methods involved in the jury process.
In terms of oversight and accountability, Carter endorsed the work of the National Policy Roadmap for AI in Healthcare, published by the Australian Alliance for Artificial Intelligence in Healthcare.
And she encouraged interest in global moves towards so-called “horizontal legislation”, which seeks to regulate AI as a technology across multiple domains.
“It’s a rapidly, rapidly changing space. Everyone is trying to run to keep up right now and I think there’s going to be a lot of change in the next couple of years,” she said.
Asked about the role and risk of bias in data, a big concern internationally with AI, Carter said there was a risk that AI systems would replicate and potentially intensify the bias that sees particular groups in the community receive less care, or lesser quality care, as a result of systemic prejudice.
“That could be a good thing…it might actually mean we see the bias that was previously invisible,” she said. “But it could be a terrible thing if we don’t look for it, because it could just strengthen and intensify the bias, the prejudice, the systematic discrimination that’s already in health systems.
“So I think AI is an inflection point for thinking about discrimination in healthcare, and we really need to take that seriously and use it as an opportunity.”
More viewing and reading
See this summary of the webinar via live posts on X/Twitter by Marie McInerney.
Also see this recent article at The Conversation: What OpenAI’s deal with News Corp means for journalism (and for you)
The Croakey Conference News Service is covering a selection of the #CHFTalks webinars in coming months. Bookmark this link to follow the coverage.