Sekhar Subramoney

Data science in the fight against Covid-19

By Sekhar Subramoney

 

The novel coronavirus pandemic has shaken much of the world as we know it and is, without doubt, in the process of transforming it, says Sekhar Subramoney, Industrial Engineer, renewable energy consultant and technical content writer of case studies in Data Science.

With more than 3,5 million people infected and more than 250 000 deaths worldwide in four months, there most likely isn’t a soul on earth who hasn’t heard of the SARS-CoV-2 virus, probably not by that name but simply as Coronavirus or a similar shorthand, or who hasn’t been affected by it in some way, either directly or indirectly through the lockdowns and severe restrictions on daily life.

One thing, though, has become starkly clear as the virulent disease spread throughout the world: the massive importance of data science and analytics in helping to stem the tsunami-like flow of the pandemic.

Authorities the world over have come to rely on data science to find ways to reduce the infection and death rates. This is not to say that medical science doesn’t play its part. In fact, it is the medics who are leading the fight against the virus, but they are doing so together with data scientists, who are adept at applying various mathematical and computational tools to ‘play’ the data and provide incisive insights into the measures that the medics have taken or may take.

As the data pool from the entire world increases each day, data scientists are discovering various ways by which the infection rate may be reduced, including ways in which individual behaviour can stem the ‘tsunami’.

While face masks cannot stop the spread of the virus, Data Science (DS) found that where they were widely used the infection rate was reduced. DS found that, generally, people over 60 were more likely to die from the disease, and that fewer women than men perish from it. It was DS that determined the close correlation between co-morbidity (patients with underlying ailments) and Covid-19 deaths. DS also confirmed the effectiveness (or lack of it) of certain drugs administered, and the list of contributions by DS goes on.

There is a fundamental premise upon which medical science determines the efficacy of medical drugs and treatments: that premise is statistics. It takes several years for new drugs and vaccines to be approved by the responsible medical authority of a country (such as the FDA of the US). Medical drugs go through several series and layers of testing; in the final analysis, scientifically proven methodologies are applied in which a test group is given the drug and a control group is not, to determine which group’s condition improves.

If a significant number of people in the test group respond positively while a significant number in the control group (who are given what is called a placebo) do not get better, only then does a new drug make it through to the final administrative stages for approval. The word ‘significant’ here is significant: it is used in its statistical sense. In the final analysis, therefore, it is statistics that forms the last threshold for acceptance.
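As a rough illustration of what ‘significant’ means in this statistical sense, the sketch below compares the improvement rates of a test group and a placebo group using a two-proportion z-test. The function name and the trial numbers are hypothetical, chosen only for illustration, and a real drug trial would use a far more careful design and analysis.

```python
import math

def two_proportion_z_test(improved_test, n_test, improved_control, n_control):
    """Return (z, two-sided p-value) for the difference in improvement rates."""
    p_test = improved_test / n_test
    p_control = improved_control / n_control
    # Pooled rate under the null hypothesis that the drug makes no difference.
    pooled = (improved_test + improved_control) / (n_test + n_control)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_control))
    z = (p_test - p_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical numbers: 60 of 100 improve on the drug, 40 of 100 on the placebo.
z, p = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
print("significant at the 5% level" if p < 0.05 else "not significant at the 5% level")
```

A small p-value says that a gap this large between the two groups would be unlikely if the drug did nothing; the 5% cut-off used here is a common convention, not a rule taken from any particular regulator.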

 

What about the random selection of test participants? This too is based on a mathematical concept, that of randomness: random selection avoids inadvertent bias. The level of confidence for acceptance of the test, i.e. the percentage of all possible samples that are expected to include the true population parameter, is another element of statistics in the genesis of medical drugs.

The close connection between medical sciences and mathematical sciences has a strong and long history; is it any wonder, then, that during this unprecedented viral attack these two sciences are together devising the strategy to arrest the ‘Corona tsunami’?
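Both ideas, random assignment and the level of confidence, can be sketched in a few lines. The helper names, the participant IDs and the 60-out-of-100 figure below are assumptions made purely for illustration; the interval uses the simple normal approximation, which is only one of several ways to compute it.

```python
import math
import random

def assign_groups(participant_ids, seed=None):
    """Randomly split participants into a test group and a control group."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation interval; z = 1.96 corresponds to about 95% confidence."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)

# Hypothetical trial with 200 participants numbered 1 to 200.
test_group, control_group = assign_groups(range(1, 201), seed=42)
print(len(test_group), len(control_group))

# Hypothetical result: 60 of the 100 people in the test group improved.
low, high = proportion_confidence_interval(60, 100)
print(f"95% confidence interval for the improvement rate: {low:.3f} to {high:.3f}")
```

Because the shuffle decides who lands in which group, nobody running the trial can steer healthier patients towards the drug, which is the inadvertent bias that random selection is meant to avoid.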

There are reliable sources of data on Covid-19 that are updated daily and available for free, from the Johns Hopkins University in Baltimore, USA, the WHO (World Health Organisation), the disease control and medical research agencies of many countries, such as the ICMR of India, and so forth.

As new data is added each day and DS algorithms are applied to it, new factors or insights surrounding infection, disease and death emerge, which inform authorities on the measures to be taken to stem the flow. It must be understood that lockdown and other restrictive measures have been imposed by authorities after testing the data. DS techniques such as predictive and prescriptive modelling, hypothesis testing, correlation coefficient calculations, regression analyses, sample vs population mean comparisons, distribution models, etc, are in use to gain greater insights into SARS-CoV-2 and Covid-19 (the former being the name of this damnable virus and the latter the name of the resultant disease).
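To give a flavour of one of these techniques, the sketch below applies a simple regression analysis to a short series of cumulative case counts in order to estimate how quickly cases are doubling. The counts are made up for illustration and do not come from any of the data sources mentioned above; the function name is likewise hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up cumulative case counts for six consecutive days (illustration only).
days = [0, 1, 2, 3, 4, 5]
cases = [100, 141, 200, 283, 400, 566]

# Exponential growth becomes a straight line once the counts are log-transformed.
intercept, growth_rate = fit_line(days, [math.log(c) for c in cases])
doubling_time = math.log(2) / growth_rate
print(f"estimated doubling time: {doubling_time:.1f} days")
```

A lengthening doubling time after a restriction is introduced is one simple signal that the measure may be slowing the spread, although real analyses have to account for testing rates, reporting delays and much else.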

The use of operations research and quantitative techniques goes back to World War II. However, the advent of DS (a recent branch of information technology) has popularised these number-crunching techniques outside of books and universities. DS has now become the de facto ‘home’ for many of the mathematical and statistical techniques; they are being used successfully to transform the bottom lines of companies and the handling of large events (such as the current virus crisis), bringing about greater effectiveness and efficiency wherever DS is applied.

Besides the use of algorithms and quantitative techniques, the fight against the c-virus has also seen technologies such as AI and bots being used. It has been widely reported that during the height of the epidemic in China, face recognition software was used extensively to track the movement of people, drones were used to sanitise places remotely and to broadcast warning messages, and various other innovative mobile phone technologies were used (and are still being used in many countries) to stem the flow of the virus.

One of the open-source technologies in use with big data on the c-virus is Nextstrain, a tool that tracks the movement and mutation tendencies of infectious agents such as viruses and bacteria. This tool was developed some years ago to help epidemiologists understand the evolution of pathogens in different conditions, countries, environments, climates, etc.

In this case, data scientists specialising in this tool are working with the epidemiologists. (In other industries, Data Science boffins may be adept with tools such as RapidMiner or Yellowfin BI.) Using transparent and accessible public data, the WHO has facilitated the development of Big Data dashboards to track the spread of the virus.

This allows users (governments and scientists working on the pandemic) to access real-time updates easily; the WHO dashboard is accessible through several platforms. Other similar dashboards depict infection concentration graphically, which makes it easier to decide where much-needed and scarce resources should be distributed and in which areas travel must be seriously minimised.

AI, a technology that is closely associated with DS, is also in wide use in the pandemic, for example in image recognition based on machine learning algorithms. Machine learning-based computer vision algorithms are reportedly being used to analyse thousands of CT scans related to Covid-19 in a matter of seconds, with an accuracy rate of 96 percent.

Also, robots are in use in many hospitals to minimise contact between medical staff and patients; they are used to deliver food and medicines to patients (AI recognises which patient requires which medicine). They are also used in various other hospital tasks such as remote measurement and logging of body temperatures, washing critical equipment, etc. AI is at the core of robotics these days.

 

Machine Learning, a core function of DS, is at the heart of AI. Yup, we keep coming back to the fact that it is Data Science that is a key partner in several technological applications.

Researchers from different parts of the world are collaboratively using AI to create a prediction model for antiviral drugs, as shown by a team from South Korea’s Dankook University working with researchers from Deargen and from US academia to run a series of tests using commercially available antivirals that may act upon Covid-19. A prediction model is, again, a function of DS.

The versatility of data science, with its ability to take a deep dive and adapt to highly interdisciplinary use cases, is inexorably making it the biggest thing in the world since the advent of oil.

If you are a graduate with Mathematics as part of your graduate / post-graduate qualification, you may want to consider enhancing your current CV with a course in Data Science.

Sekhar Subramoney is an Industrial Engineer, renewable energy consultant and technical content writer of case studies in Data Science with www.aptuslearn.in