COVID-19 Visual Data Science Update: Virus, Variants, Vaccines

10:49am Apr 08, 2021




There are now over 10 SARS-CoV-2 vaccines in use around the world. Developing vaccines in less than 1 year from the date the genome was sequenced has been a remarkable scientific achievement. As of April 8, 2021, more than 710M vaccine doses had been administered worldwide, and more than 170M in the US. Approximately 3% of the world's population has been fully vaccinated and a further 8% partially vaccinated. Approximately 19% of the US population has been fully vaccinated and a further 13% partially vaccinated. 

That is great progress, but we still have cause for concern. The large number of COVID-19 infections creates a huge pool of viruses that are mutating rapidly, and COVID-19 cases are surging in Europe, South America and some US states (e.g., Michigan). This blog explores the progression of the 3 V’s: virus, variants and vaccines. It has four main sections:

  • Virus, variants and vaccine trends: relationships among cases, fatalities and vaccine jabs in local regions worldwide 
  • Virus and variants bioinformatics and epidemiology: genomic structure, mutations and mechanism of action
  • Vaccine development, testing and effectiveness
  • Pandemic future state and predictions

The reader can follow along with much of the visual data science analysis in this blog in the TIBCO COVID-19 Visual Analysis Hub COVID-19 Live Report, an interactive TIBCO Spotfire application that refreshes data from 15 global sources daily. 

The Spotfire COVID-19 Live Report is a visual data science application, featuring visual analytics best practices for graphics design, layout and interactive visual data discovery methodologies. 

There is also an on-demand webinar including software demonstrations of the virus/variant genomics, and the visual data science of vaccine-virus race. Both of these analyses are done with Spotfire, the premier visual analytics and data science platform for scientific and business BI and analytics. 


2. Virus, Variants and Vaccines Relationships - Worldwide

As of April 8, there were >130M cases and >2.8M deaths worldwide. The global epidemic case curve is now on the rise, and many of the countries with higher case counts are experiencing a surge in cases (Figure 1).

Figure 1. Worldwide cases and deaths associated with COVID-19 (as of April 8, 2021). 
This pandemic overview includes KPI tiles for cases and deaths on the top left, a date slider in the center for viewing cases or deaths over time, and navigation on the top right for toggling between cases and deaths and drilling down into regions. The graphical table on the top right includes a bar chart that sorts cases/deaths by country, along with new cases/deaths and increase/decrease rates. The epidemic curve for the chosen geography (cases/deaths vs time) is shown on the lower right.

Meanwhile, more than 600M doses of vaccine have been administered worldwide, approx. 8 doses per 100 people. These are concentrated in high-income countries in North America, South America and Europe. 
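The "doses per 100 people" figure is a simple normalization of total doses by population. A minimal sketch, assuming a world population of roughly 7.8 billion in early 2021 (a round figure we supply for illustration, not one taken from the Live Report):

```python
# Doses per 100 people = total doses / population * 100.
# WORLD_POPULATION is an assumed round figure for early 2021.
WORLD_POPULATION = 7.8e9

def doses_per_100(total_doses: float, population: float) -> float:
    """Return vaccine doses administered per 100 people."""
    return 100 * total_doses / population

world = doses_per_100(600e6, WORLD_POPULATION)
print(f"{world:.1f} doses per 100 people")  # 7.7, i.e. approx. 8
```

Because doses rather than people are counted, a two-dose schedule means this figure can run well ahead of the share of people who have received a jab.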

To study the effects of vaccinations on new cases, we examine the relationship between the percentage of people vaccinated (including partial doses) and daily new cases, benchmarked by vaccination rates across geographies (Figure 2). 

Highlights of this analysis include:

  • Israel – has vaccinated more than 60% of its population; and daily new cases have fallen dramatically to ~4/100K people. One might claim they have crushed the virus. 
  • UK – has partially vaccinated (one dose) more than 45% of its population; and daily new cases have fallen to ~5/100K people. 
  • US – has partially vaccinated more than 30% of its population (>100M people!) and daily new cases have flattened to ~16/100K people. As we will see below, new cases are rising in some regions of the US.  
  • Chile – has vaccinated more than 35% of its population since early February, but cases have risen from ~20 to ~40 per 100K people during March.
  • France – has partially vaccinated just ~9% of its population and cases have risen from ~20 to ~60 per 100K people in 2021. 
  • Italy – has partially vaccinated just ~12% of its population (~4% fully vaccinated) and cases have risen from ~20 to ~35 per 100K people during February-March.
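The comparisons above rely on daily new cases normalized per 100,000 people. A minimal sketch of that normalization, assuming a trailing 7-day average of daily counts (our assumption; the Live Report's exact smoothing is not described here). The function name, counts and population are hypothetical:

```python
from collections import deque

def daily_cases_per_100k(daily_counts, population, window=7):
    """Trailing `window`-day average of new cases, per 100,000 people.

    Returns one value per day once a full window of counts is available.
    """
    buf = deque(maxlen=window)  # keeps only the most recent `window` days
    rates = []
    for count in daily_counts:
        buf.append(count)
        if len(buf) == window:
            avg = sum(buf) / window
            rates.append(100_000 * avg / population)
    return rates

# Hypothetical daily counts for a country of 9 million people
counts = [400, 380, 360, 350, 340, 330, 320, 310]
print([round(r, 2) for r in daily_cases_per_100k(counts, 9_000_000)])
```

Normalizing per 100K is what lets a small country such as Israel sit on the same axis as the US.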

Figure 2. Relationships between vaccinations administered and new cases by worldwide countries (as of April 4, 2021). 
Lower left: percent of people vaccinated (including partial dose) vs daily new cases per 100,000 people; labels are based on markings above, tooltips on mouseover. 
Lower center: epidemic curve of marked region (cases per 100,000 people vs time). 
Lower right: vaccinations in the marked countries (pale blue = partial vaccination, dark blue = full vaccination); reference lines are averages from marked countries. 
The axes of the epidemic curve and vaccination graphics are fixed, to allow rapid comparison of cases and vaccinations by clicking on the tiles or bars across the top. Zoom sliders also allow detailed review of individual countries. 
Top row: By default the top 10 countries by vaccination rate (%) are shown in the bar chart and map, and all countries are sorted by vaccination rate in the graphical table on the right. Countries can be added by choosing “all countries” in the top right, and using Ctrl-Select on Windows or Shift-Cmd on Mac to add individual countries. 

It is useful to view the recent history and progression of cases and vaccinations over time. We show this by fixing the axes (x: cases, y: vaccinations) and allowing the points representing individual countries to move over time. This is achieved using an animated bubble chart from the new Spotfire Mods environment, an adaptation of Hans Rosling's Gapminder analysis.

Figure 3. Animated bubble chart relationship between worldwide vaccinations administered (%population) and new cases (per 100K population); as of April 8, 2021. 
Countries are sized by total vaccinations and colored by case velocity: -10% is green and +10% is red. 
This animation starts in January and runs through the present time. The animation is launched by hovering over the tip in lower left and pressing the play arrow. The speed can be controlled by hovering over the speed dial on lower right. The graphic is implemented via an Animated Bubble Chart Mod for Spotfire, available from the TIBCO Community Exchange. 
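The caption does not define "case velocity" precisely. As a hedged sketch, assume it is the percent change in the latest 7-day average of new cases versus the previous 7 days, clipped to the ±10% range of the color scale and mapped linearly from green (-10%) to red (+10%). Both function names and this definition are our assumptions, not the Live Report's implementation:

```python
def case_velocity(daily_counts):
    """Percent change of the latest 7-day mean vs. the preceding 7-day mean.

    Assumed definition; needs at least 14 days of counts.
    """
    if len(daily_counts) < 14:
        raise ValueError("need at least 14 days of data")
    current = sum(daily_counts[-7:]) / 7
    previous = sum(daily_counts[-14:-7]) / 7
    if previous == 0:
        raise ValueError("previous week has no cases")
    return 100 * (current - previous) / previous

def velocity_to_rgb(velocity, limit=10.0):
    """Map a velocity in [-limit, +limit] percent to a green-to-red RGB tuple."""
    clipped = max(-limit, min(limit, velocity))
    t = (clipped + limit) / (2 * limit)  # 0.0 = green, 1.0 = red
    return (round(255 * t), round(255 * (1 - t)), 0)

week1 = [100] * 7
week2 = [105] * 7
v = case_velocity(week1 + week2)
print(round(v, 1), velocity_to_rgb(v))
```

Clipping keeps countries with extreme swings at the ends of the scale instead of stretching the color range.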

Highlights of this analysis in Figure 3 include:

Countries moving towards defeating the virus: right to left (cases), bottom to top (vaccinations)

  • Israel – rapid move from bottom-right (high cases, low vaccinations) to top-far-left (low cases, high vaccinations) from mid-February to early April.
  • UK – rapid move from bottom-right (high cases, low vaccinations) to middle-far-left (low cases, moderate vaccinations) from mid-February to early April.
  • US – rapid move from bottom-right (high cases, low vaccinations) to middle-left (lower cases, moderate vaccinations) from mid-February to early April.
  • Jordan, Czech Republic – move from lower-right to slightly-higher-left from end-March through present
  • Portugal, Spain - move from lower-right to slightly-higher-left from mid-March through present

Countries moving towards worsening viral infection rates: left to right (cases)

  • Chile – rapid move from bottom-right (high cases, low vaccinations) to middle-left (lower cases, moderate vaccinations) from mid-February to mid-March; then a shift to middle-right (rising cases) from mid-March through the present.
  • Turkey, Hungary, France, Poland, Netherlands, Belgium, Austria, Ukraine – steady move from lower-left to lower-right from early March through the present
  • Brazil, India – moving from lower-left to middle-right from early March through the present

The bottom line is that SARS-CoV-2 cases are surging in much of Europe, apart from the UK. There are some signs of improvement in Portugal and Spain and, very recently, in countries such as Jordan and the Czech Republic, though these should be investigated from a data quality perspective. 

Local regions within countries can be assessed from the top level of the Live Report. 

Figure 4. Local case trends in Brazil (as of April 8, 2021). 


3. Virus, Variants and Vaccines Relationships – United States

As of April 8, there were ~30M cases and ~550K deaths in the United States (Figure 5).

Meanwhile, more than 170M doses of vaccine have been administered in the US, approx. 51 doses per 100 people. Approx. 32% of the population has received at least one jab and approx. 19% is fully vaccinated. 
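The dose count and coverage percentages can be cross-checked against each other. A minimal sketch, assuming a US population of about 331 million and a two-dose schedule for everyone fully vaccinated (this ignores the single-dose J&J vaccine, so it is only approximate):

```python
US_POPULATION = 331e6  # assumed, approx. 2020 US population

def doses_from_coverage(population, pct_one_dose, pct_full):
    """Estimate total doses from % with >=1 dose and % fully vaccinated.

    Assumes a two-dose schedule: each person with a first jab contributes
    one dose, and each fully vaccinated person one more (second) dose.
    """
    first_doses = population * pct_one_dose / 100
    second_doses = population * pct_full / 100
    return first_doses + second_doses

total = doses_from_coverage(US_POPULATION, 32, 19)
per_100 = 100 * total / US_POPULATION
print(f"{total / 1e6:.0f}M doses, {per_100:.0f} per 100 people")
```

The estimate of about 169M doses agrees with the reported ~170M, which corresponds to roughly 51 doses (not people) per 100 population.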

To study the effects of vaccinations on new cases, we examine the relationship between the percentage of people vaccinated (including partial doses) and daily new cases, benchmarked by vaccination rates across geographies (Figure 6). 

Figure 5. Local case trends in US counties (as of April 8, 2021). 

Figure 6. Relationships between vaccinations administered and new cases in US States (as of April 8, 2021). 

Highlights of this analysis include:

  • The top 10 states by % vaccinated have all administered at least one jab to more than 28% of their populations. 
  • New Jersey has the highest share vaccinated, with more than 36% of its population having received at least one jab.
  • California has administered the most vaccinations, with more than 13M people receiving at least one jab.
  • Michigan, New York and New Jersey are still logging more than 40 cases per 100K people. 

Figure 7 shows the recent history and progression of cases and vaccinations over time for US states in an animated bubble chart. We fix the vaccinations and cases axes, and allow the points representing individual states to move.  

Figure 7. Animated bubble chart relationship between vaccinations administered (%population) and new cases (per 100K population) in the United States; as of April 4, 2021. 
States are sized by total vaccinations and colored by case velocity: -10% is green and +10% is red. 
This animation starts in January and runs through the present. The animation is launched by hovering over the tip in lower left and pressing the play arrow. The speed can be controlled by hovering over the speed dial on lower right. The graphic is implemented via an Animated Bubble Chart Mod for Spotfire, available from the TIBCO Community Exchange. 

Highlights of this analysis include:

  • All states move rapidly to the left from mid-January, when vaccine delivery began in earnest

States moving towards worsening viral infection rates: left to right (cases)

  • Michigan
  • New Jersey, New York

States moving towards defeating the virus: right to left (cases)

  • New Mexico, Oklahoma
  • California


4. SARS-CoV-2 Bioinformatics


The SARS-CoV-2 virus has been mutating at an estimated rate of ~0.001 nucleotide changes / site / year, which makes it challenging to label and track the proliferation of distinct virus strains. Rambaut et al (2020) proposed a methodology called PANGO lineages (Phylogenetic Assignment of Named Global Outbreak lineages), which has been broadly adopted by the international scientific community. An illustration of how PANGO lineage identifiers are assigned is shown in Figure 8, and more details are available on the PANGO lineage website.
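To put the per-site rate in perspective, it can be scaled by genome length. A back-of-the-envelope sketch, assuming the ~29,903-nucleotide reference genome (Wuhan-Hu-1); it gives an expected number of changes along a lineage, not a measured count:

```python
GENOME_LENGTH_NT = 29_903      # SARS-CoV-2 reference genome length (Wuhan-Hu-1)
RATE_PER_SITE_PER_YEAR = 1e-3  # ~0.001 changes/site/year, as quoted above

per_year = GENOME_LENGTH_NT * RATE_PER_SITE_PER_YEAR
per_month = per_year / 12
print(f"~{per_year:.0f} changes per genome per year, ~{per_month:.1f} per month")
```

Roughly two to three new changes per month along each transmission lineage is why a systematic naming scheme is needed to keep track of them.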

Figure 8. PANGO lineages of SARS-CoV-2 as of March 30 (Rambaut et al, 2021)
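PANGO names are hierarchical: each dot-separated number denotes a descendant of the lineage to its left (B.1.1.7 descends from B.1.1, which descends from B.1 and then B), and when names would grow too long a new letter alias is introduced (for example, P.1 is an alias for B.1.1.28.1). A minimal sketch of expanding a name into its ancestry, assuming a hypothetical alias table that contains only the P alias; the full alias list is maintained by the PANGO project and is not reproduced here:

```python
# Hypothetical minimal alias table; only P (= B.1.1.28) is included.
ALIASES = {"P": "B.1.1.28"}

def unalias(lineage: str) -> str:
    """Expand a leading alias letter into its full PANGO name."""
    prefix, _, rest = lineage.partition(".")
    if prefix in ALIASES:
        full = ALIASES[prefix]
        return f"{full}.{rest}" if rest else full
    return lineage

def ancestors(lineage: str) -> list[str]:
    """Return the lineage and all of its ancestors, root first."""
    parts = unalias(lineage).split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]

print(ancestors("B.1.1.7"))  # ['B', 'B.1', 'B.1.1', 'B.1.1.7']
print(ancestors("P.1"))      # ['B', 'B.1', 'B.1.1', 'B.1.1.28', 'B.1.1.28.1']
```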

Variants of concern
Of the >7500 PANGO lineages identified to date, a handful have become variants of concern. These include:

  • B.1.1.7: first emerged in the UK in September 2020. Davies et al (March 2021) estimate a 43-90% higher reproduction number than pre-existing variants; in a separate article, Davies et al (March 2021) estimate a 39-72% increase in the hazard of death. 
  • B.1.351: first detected in October 2020 in South Africa. There is no evidence thus far of an increased reproduction number or fatality. There is some evidence that one of its mutations may reduce neutralization by monoclonal antibody treatments, making them less effective.
  • P.1: first detected at a Japanese airport in travelers from Brazil. This variant appears to be associated with the COVID-19 case surge in Brazil since mid-December. 

Table 1. SARS-CoV-2 Emerging Variants. CDC Science Brief.

Variants, Vaccines and Case Relationships

The appearance of the B.1.1.7 variant in Europe, coupled with missteps in the vaccination rollout, has led to a surge in cases in some European countries. Figure 9 shows B.1.1.7 distribution and prevalence (green) along with the less virulent B.1.2 (bone color).

Figure 9. Most common SARS-CoV-2 variants of concern distribution and prevalence.
This shows the 5 most common variants of SARS-CoV-2, specifically B.1.1.7, B.1.177, B.1, B.1.2, B.1.1.29. 
Top-right: the world map shows the relative frequency of each variant in different geographies, by country and US state, highlighting that there are significant differences in virus variants between the US and the rest of the world. 
Bottom-left: the relative frequency of each SARS-CoV-2 variant by month between January 2020 and March 2021, showing that variants which were dominant early (B.1) have been entirely subsumed by later emerging variants (B.1.1.7, VOC 202012/01). 
Bottom-right: the absolute number of samples detected with each of the variants, showing both that the rate of sample sequencing is much higher and that B.1.1.7 is predominant. 

It is pretty clear that aggressive vaccination is highly effective in crushing the virus, even the more aggressive variants. The UK has done a remarkable job in thwarting the B.1.1.7 variant by leading with a broad single-dose strategy. Israel has rapidly vaccinated a large percentage of its population, and the appearance of new cases is minimal. However, some European countries have struggled with cautious vaccination programs and delayed rollouts. The difference in vaccinations (%) and cases (per 100K) between these countries (Figure 10) is stark. 

The US has largely dodged the B.1.1.7 bullet, with a preponderance of cases in just Michigan, the far northeast, Arizona and Idaho. With the rapid vaccination push, the goal of crushing the virus across the US is attainable. 

Cases continue to surge in other geographies such as Brazil and India. 

a) Israel

b) United Kingdom

c) France
Figure 10. Differences in vaccination rates and corresponding epidemic case curves. 
Top: Israel (62% vaccination, 4 cases/100K); Middle: United Kingdom (47%, 5); Bottom: France (14%, 48)

5. SARS-CoV-2 vaccines and their effectiveness

Table 2. Leading vaccines (Milken Institute, 2021).

Table 3. Vaccine categories (Milken Institute, 2021).

As of April 8, more than 700M jabs have been administered around the world. Of the 10 SARS-CoV-2 vaccines used in the world, the Oxford/AstraZeneca and Pfizer/BioNTech vaccines are the most prevalent. Both are being used in over 80 countries. The rest of the vaccines, including Moderna, are used in fewer countries (~35 for Moderna and between 1 and 20 for the rest). 

Figure 11. Pfizer/BioNTech vaccine distribution 

The Pfizer vaccine has predominantly been used by European countries and North America. The vaccine is also being used by Middle Eastern and South American countries.

Figure 12. Oxford/AstraZeneca vaccine distribution 

The AstraZeneca vaccine has predominantly been used in Europe. It is also more widespread across continents, being used across Latin America, Africa, and South Asia. Unlike the Pfizer and Moderna vaccines, AstraZeneca’s vaccine does not need frozen storage and can be kept at normal refrigerator temperatures, which makes it easier to distribute in warmer and less developed countries, where cold-chain transportation logistics raise concerns.

Figure 13. Vaccine jabs by country 

Vaccine supply and waiting times vary. This is a matter of politics and economics (more versus less developed countries), but also of countries’ differing vaccine regulations. Chile, with over a third of its population partially vaccinated, has mainly used Beijing-based Sinovac's vaccine. The United States has used approximately equal doses of Moderna and Pfizer and has not approved AstraZeneca’s vaccine. European countries including Italy and Germany suspended the use of AstraZeneca’s vaccine in mid-March over concerns about side effects. With more vaccines being tested and approved, countries are also facing the logistics of ‘vaccine diplomacy’. Still, the development of several SARS-CoV-2 vaccines has enabled rapid, increasing vaccination around the world.

Vaccine efficacy

The mRNA-based BioNTech/Pfizer and Moderna vaccines are highly safe and efficacious (Table 4), but they have a cold storage requirement that is challenging for the developing world. While the Oxford/AstraZeneca vaccine’s low cost and simple storage make it a frontrunner to be the leading ‘vaccine for the world’, some rare but serious side effects have occurred with it. Kate Bingham, who led the UK’s successful Vaccine Taskforce, says that the Novavax vaccine shows promise for worldwide use.

Table 4. Vaccine efficacies from clinical trial data (Eric Topol Twitter)

95% efficacy of the Pfizer/Moderna vaccines means that the vaccine reduces the attack rate of the virus by 95%. In most populations the attack rate of the virus is currently running at around 1%. So a 95% vaccine efficacy means that in a population of 100,000 people receiving the vaccine, instead of 1,000 COVID-19 cases (1%) we would expect 50 cases. This translates to 99.95% of the vaccinated population remaining COVID-free. So in reality, after getting the Pfizer/Moderna vaccine, based on the clinical trial data, you have only a 1 in 2,000 chance (50 in 100,000) of getting the virus.
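The arithmetic above follows from the definition of efficacy as a relative reduction in attack rate: expected cases in the vaccinated group = population × attack rate × (1 − efficacy). A minimal sketch, using the ~1% attack rate from the text as an illustrative assumption:

```python
def expected_cases(population, attack_rate, efficacy):
    """Expected cases in a group: efficacy scales down the attack rate."""
    return population * attack_rate * (1 - efficacy)

unvaccinated = expected_cases(100_000, 0.01, 0.0)
vaccinated = expected_cases(100_000, 0.01, 0.95)
print(round(unvaccinated), round(vaccinated))     # 1000 50
print(f"1 in {round(100_000 / vaccinated):,}")   # 1 in 2,000
```

Note that the absolute risk depends on the attack rate: the same 95% efficacy gives proportionally more cases when the virus is circulating more widely.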

Note that it is not entirely appropriate to compare the efficacies of the various vaccines from the clinical trial data alone, as the trials were run under different conditions and with different inclusion criteria. For example, the Pfizer and Moderna trials were done earlier than the others, largely in the US, during a period of lower case counts; the J&J trial was done later, when cases were higher, and in different geographies where more aggressive virus variants had emerged. More important is that all the vaccines are highly effective at preventing death and hospitalization; in fact, there were no deaths or hospitalizations in the vaccine arm of any of the vaccine clinical trials. This video from Vox Explainer (2021) describes these issues.

Since the clinical trials, these efficacy rates have held up in multiple observational studies. For example, in a study at the University of Texas Southwestern Medical Center (UTSW), 23,234 employees were eligible to receive one of the mRNA vaccines; during the study period 8,121 were fully vaccinated, 6,144 were partially vaccinated and 8,969 remained unvaccinated. In the following month, 350 of the 23,234 eligible employees (1.5%) were identified as newly infected with SARS-CoV-2. As shown in Figure 14, the percentages of persons who became infected differed according to vaccination status, with infections in 234 of 8969 unvaccinated employees (2.61%; 95% confidence interval [CI], 2.29 to 2.96), 112 of 6144 partially vaccinated employees (1.82%; 95% CI, 1.50 to 2.19), and 4 of 8121 fully vaccinated employees (0.05%; 95% CI, 0.01 to 0.13) (P<0.01 for all pairwise comparisons).

Figure 14. mRNA vaccine efficacy from observational data (Daniel et al, 2021)
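The infection percentages and their confidence intervals can be approximately reproduced from the raw counts. A minimal sketch using the Wilson score interval for a binomial proportion; this choice is ours, and Daniel et al may have used a different method, so the bounds can differ slightly in the last digit:

```python
import math

def wilson_interval(k, n, z=1.96):
    """Wilson score interval (95% for z=1.96) for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

groups = {
    "unvaccinated": (234, 8969),
    "partially vaccinated": (112, 6144),
    "fully vaccinated": (4, 8121),
}
for name, (k, n) in groups.items():
    lo, hi = wilson_interval(k, n)
    print(f"{name}: {100 * k / n:.2f}% (95% CI {100 * lo:.2f} to {100 * hi:.2f})")
```

The Wilson interval behaves better than the simple normal approximation for small counts such as the 4 infections among fully vaccinated employees.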

Vaccine development timeline

Much has been written about the rapid development (less than one year) of several highly efficacious vaccines against a previously unknown viral pathogen. As described by Fauci (2021) in a recent Science publication, “What is not fully appreciated is that the starting point of the timeline for SARS-CoV-2 vaccines was not 10 January 2020, when the Chinese published the genetic sequence of the virus. Rather, it began decades earlier, out of the spotlight, with the utilization of highly adaptable vaccine platforms such as RNA (among others) and the adaptation of structural biology tools to design agents (immunogens) that powerfully stimulate the immune system. The RNA approach evolved over several years owing to the ingenuity of individual scientists, including Drew Weissman and Katalin Karikó, and the concentrated efforts of several biotech and pharmaceutical companies.”

The timeline in Table 5 shows the vaccine development timeline and milestones (Topol, 2021):

Table 5. Timeline from COVID-19 appearance to initial vaccination for healthcare professionals.


6. Summaries and predictions for the pandemic future

The analyses in this blog explore the trends in SARS-CoV-2 infections and their relationships with vaccinations at a local level worldwide. Countries with high vaccination rates are seeing modest rates of new cases. Countries and states with low vaccination rates are experiencing increased case rates, especially in regions with a preponderance of SARS-CoV-2 variants of concern, such as France, the Netherlands and Turkey. The B.1.1.7 variant is dominant in many European countries; its reproduction number is estimated to be 77% higher than prior variants, with a 55% higher risk of death within 28 days of infection (Nicholas et al, 2021; Davies et al, 2021). B.1.1.7 prevalence is increasing in the US as well, now making up 50% of cases in several states (MI, MN, FL, TX, NC, PA, MA, IL, GA, IN). Cases are now rising again in many US states (New York Times, April 6), and it is becoming a race between vaccines and virus variants in local regions worldwide. 

In general, SARS-CoV-2 is mutating rapidly as it spreads, at a rate of ~0.001 nucleotide changes / site / year. Mutations in the spike glycoprotein are of most concern (Tegally et al, 2021). Figure 15 shows the distribution of spike protein mutations for the B.1.1.7 variant and highlights that, while each PANGO lineage has defining common mutations, there are many other, less common mutations within the lineage as well.

This mutation rate is approx. one quarter the rate of HIV and one half the rate of influenza. As such, vaccine booster shots may be needed every 2 years or so. These estimates are speculative for many reasons, not least the lack of sequence data from much of the developing world.

Figure 15. Common mutations in SARS-CoV-2 variant B.1.1.7. In this Spotfire SARS-CoV-2 mutation analysis dashboard, B.1.1.7 has been selected to show the number of times a given mutation has been detected within that lineage (upper-left), the relative frequency of each mutation (lower-left), and the position of the given mutation juxtaposed with a cartoon of the spike protein structure (upper-right). 


7. References

Alanagreh L, Alzoughool F, Atoum M (2020). The Human Coronavirus Disease COVID-19: Its Origin, Characteristics, and Insights into Potential Drugs and Its Mechanisms. Pathogens, 9(5), 331.

CDC. SARS-CoV-2 Variant Classifications and Definitions

Corbett, K.S., Edwards, D.K., Leist, S.R. et al. SARS-CoV-2 mRNA vaccine design enabled by prototype pathogen preparedness. Nature 586, 567–571 (2020).

Daniel, W., Nivet, M., Warner, J., Podolsky, D. (2021). Early Evidence of the Effect of SARS-CoV-2 Vaccine at One Medical Center. New England Journal of Medicine (letter), March 23, 2021.

Davies, N.G., Abbott, S., Barnard, R., Jarvis, C.I., Kucharski, A., Munday, J.D., Pearson, C.A.B., Russell, T., Tully, D., Washburne, A.D., Wenseleers, T., Gimma, A., Waites, W., Wong, K.L.M., van Zandvoort, K., Silverman, J.D., Diaz-Ordaz, K., Keogh, R., Eggo, R.M., Funk, S., Jit, M., Atkins, K.E., Edmunds, W.J., CMMID COVID-19 Working Group, COVID-19 Genomics UK (COG-UK) Consortium (2021). Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England. Science, 03 Mar 2021. DOI: 10.1126/science.abg3055

Davies, N.G., Jarvis, C.I., CMMID COVID-19 Working Group, Edmunds, W.J., Jewell, N.P., Diaz-Ordaz, K., Keogh, R.H. (2021). Increased mortality in community-tested cases of SARS-CoV-2 lineage B.1.1.7. Nature.

Elbe, S., and Buckland-Merrett, G. (2017) Data, disease and diplomacy: GISAID’s innovative contribution to global health. Global Challenges, 1:33-46. DOI:10.1002/gch2.1018  PMCID: 31565258

Fauci, A. (2021). The story behind COVID-19 vaccines. Science  09 Apr 2021: Vol. 372, Issue 6538, pp. 109, DOI: 10.1126/science.abi8397

Financial Times (April 2, 2021). UK vaccine supremo Kate Bingham: ‘The bickering needs to stop’.

Milken Institute (2021). Vaccine tracker.

New York Times (April 6, 2021). As Variants Have Spread, Progress Against the Virus in U.S. Has Stalled

Rambaut A, Holmes EC, O’Toole Á, Hill V, McCrone JT, Ruis C, du Plessis L, Pybus OG (2020). A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nature Microbiology. DOI:10.1038/s41564-020-0770-5

Shu, Y., McCauley, J. (2017). GISAID: Global initiative on sharing all influenza data – from vision to reality. EuroSurveillance, 22(13). DOI:10.2807/1560-7917.ES.2017.22.13.30494  PMCID: PMC5388101

Tegally, H., Wilkinson, E., Giovanetti, M. et al.  Emergence of a SARS-CoV-2 variant of concern with mutations in spike glycoprotein. Nature (2021).

Topol, E. (2021) Vaccine efficacies from clinical trial data. Twitter @EricTopol

Vox Explainer. What a vaccine's "efficacy rate" actually means. 



Michael O'Connell, Ph.D., is the chief analytics officer at TIBCO, where he helps clients with analytics software applications that drive business value. He has written a bunch of scientific papers and software packages on statistical methods. He also likes listening to electronic music; watching basketball, football and cricket; going to art galleries and walking around neighborhoods.

Neil Kanungo is a Data Scientist at TIBCO and specializes in data visualization and business analytics. He helps deliver unique solutions to industry’s biggest challenges. Neil takes a special interest in operationalizing analytics across organizations at multiple levels, and in fostering user engagement. In his free time, Neil enjoys hiking with his dog, live music, and playing pinball. 

David Katz is a Principal Consultant at TIBCO. With a long career in data analysis, model building and statistical consulting, David enjoys tackling challenging problems with real-world benefits, in particular using advanced regression methods and making the invisible visible. The most fun is the variety of applications he has been able to work with, from Formula One racing to marketing and operations. In his spare time he likes to bike, hike and do yoga.

Adam Faskowitz is a Senior Data Scientist at TIBCO and recent graduate from UC Berkeley. His interests include data visualization, machine learning, and communication within data science. His passion for data science is driven by his curiosity for trying to understand complex problems and goal of creating meaningful solutions. In his free time, Adam enjoys watching and playing sports, going to art museums, and relaxing at the beach.

Sweta Kotha is a Data Scientist at TIBCO and a recent graduate from Carnegie Mellon University. Her experience spans data analysis, natural language processing, and biostatistics. She strives to unravel complex data and create meaningful and explainable insights. In her free time, Sweta enjoys reading, running, and traveling.

Dan Weaver is a Senior Solutions Architect at PerkinElmer.