Concerns Machine Learning AI Spots False Correlations In Cancer Data

A leading scientist last week raised concerns that growing faith in patterns spotted by deep learning AI algorithms could lead researchers down false paths. Recent machine learning-based conclusions drawn from the analysis of cancer data were highlighted as a particularly clear example of the danger. The unreliability of such conclusions is attributed to the fact that algorithms are built to return a finding or prediction rather than to admit to uncertainty or to the failure to spot a pattern.

The concerns were raised by Dr Genevera Allen, a leading scientist at Baylor College of Medicine and Rice University, at last week’s annual meeting of the American Association for the Advancement of Science. She warned:

“I would not trust a very large fraction of the discoveries that are currently being made using machine learning techniques applied to large data sets”.

AI-powered big data analysis is currently being applied extensively in medical research. It is making a hugely positive contribution, turbo-charging advances and discoveries across medicine and pharmaceuticals. But Dr Allen believes the hype around the latest AI technology in biomedicine carries a hidden danger: over-reliance on, and a failure to question, the ‘findings’ returned by algorithms.

In modern biomedicine, AI is being used to drill down into data in an attempt to find causal connections between an individual’s genetics and disease. The patterns uncovered can relate to an individual’s genetic risk of developing a particular disease or condition, to how the disease manifests itself given that individual’s genetics, and to personalised treatment recommendations. Thanks to AI, the direction modern medicine is now taking is to break patients down into smaller categories based on relevant similarities in their DNA profiles, rather than taking a ‘one size fits all’ approach to prevention and treatment.
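To make that concrete, here is a minimal, hypothetical sketch of the kind of unsupervised patient stratification described above. Everything in it is an assumption for illustration: the ‘genetic markers’ are synthetic random numbers, and scikit-learn’s KMeans stands in for whatever clustering method a real study would choose.

```python
# Illustrative sketch only: synthetic "genetic" features stand in for real
# patient data, and KMeans is just one of many possible clustering choices.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Pretend each row is a patient and each column a genetic marker.
n_patients, n_markers = 200, 50
X = rng.normal(size=(n_patients, n_markers))

# Standardise the markers, then partition patients into candidate subgroups.
X_scaled = StandardScaler().fit_transform(X)
subgroups = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_scaled)

# Each patient now belongs to one of four candidate subtypes that could,
# in principle, be given different prevention or treatment strategies.
print(np.bincount(subgroups))
```

Note that the algorithm obligingly returns four groups whether or not any genuine subtypes exist in the data, which is precisely the behaviour Allen is worried about.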

But despite its increasingly important contribution to medicine, Allen is wary of researchers failing to apply enough critical caution to the ‘findings’ that AI analysis can return.

“There are cases where discoveries aren’t reproducible. The clusters discovered in one study are completely different from the clusters found in another. Why? Because most machine-learning techniques today always say: ‘I found a group’. Sometimes, it would be far more useful if they said: ‘I think some of these are really grouped together, but I’m uncertain about these others.’”

Allen says that human researchers are then naturally inclined to rationalise why the DNA profile clusters grouped by an AI might be affected by a disease in a particular way. But she fears that a plausible rationalisation is not necessarily a correct one. Often the patterns highlighted by AI prove to be random associations that represent correlation rather than causation, and even then, correlation that isn’t repeated across new data sets.
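Allen’s warning is easy to demonstrate. The hedged sketch below (synthetic data and plain NumPy, none of it from the article itself) searches a few thousand random ‘markers’ for the one most correlated with a random outcome. A strong-looking correlation always turns up in the discovery cohort, and it collapses on a fresh cohort, because it was never anything but chance.

```python
# Illustrative sketch: search enough random "markers" and one will always
# appear to predict the outcome, in the discovery data only.
import numpy as np

rng = np.random.default_rng(42)
n_patients, n_markers = 50, 2000

# Pure noise: markers and outcome are independent by construction.
markers = rng.normal(size=(n_patients, n_markers))
outcome = rng.normal(size=n_patients)

# Correlate every marker with the outcome and keep the strongest hit.
corrs = np.array([np.corrcoef(markers[:, j], outcome)[0, 1]
                  for j in range(n_markers)])
best = int(np.argmax(np.abs(corrs)))
print(f"marker {best}: r = {corrs[best]:.2f} in discovery cohort")

# Re-test the same marker on a fresh cohort: the 'finding' evaporates.
markers_new = rng.normal(size=(n_patients, n_markers))
outcome_new = rng.normal(size=n_patients)
r_new = np.corrcoef(markers_new[:, best], outcome_new)[0, 1]
print(f"marker {best}: r = {r_new:.2f} in replication cohort")
```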

The answer, says Allen, is twofold: teach the AI algorithms used to analyse medical data to critique their own analysis and indicate how likely it is that a finding is a genuine correlation rather than a random association; and ensure that human researchers don’t get carried away with the hype around AI analysis and accept the patterns it highlights unquestioningly.
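One standard way to give an algorithm that kind of self-critique, offered here as an assumption rather than as Allen’s own prescription, is bootstrap stability analysis: re-run the clustering on many resampled versions of the data and measure how often each pair of patients lands in the same group. Pairs that co-cluster in nearly every resample look like genuine structure; pairs that flip between groups are exactly the ‘I’m uncertain about these others’ cases.

```python
# Hedged sketch of bootstrap cluster-stability scoring on synthetic data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n_patients, n_markers, k, n_boot = 100, 30, 3, 50
X = rng.normal(size=(n_patients, n_markers))

together = np.zeros((n_patients, n_patients))

for _ in range(n_boot):
    # Refit the clustering on a bootstrap resample of the patients...
    idx = rng.choice(n_patients, size=n_patients, replace=True)
    km = KMeans(n_clusters=k, n_init=10).fit(X[idx])
    # ...then assign every original patient under that fitted model.
    labels = km.predict(X)
    together += labels[:, None] == labels[None, :]

stability = together / n_boot
# Pairs that co-cluster in ~100% of resamples are credible groupings;
# pairs near the chance level (~1/k) are the ones to flag as uncertain.
pair_scores = stability[np.triu_indices(n_patients, k=1)]
print(f"share of highly stable pairs (>0.9): {(pair_scores > 0.9).mean():.2f}")
```

On pure noise like this, few pairs should clear the 0.9 threshold, which is the honest answer Allen is asking for; on data with genuine subtypes the stable share rises, which is what makes a score like this a usable confidence indicator.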
