COVID-19: Visual Data Science Part 2 - Update & Methodologies
Follow along with this blog in Spotfire, and for live updates
Live Spotfire application available here
This blog and Spotfire application are authored by the TIBCO Data Science team
Contact: Michael O’Connell, @MichOConnell
Introduction
We are now in the midst of many COVID-19 regional outbreaks. Worldwide confirmed cases have topped half a million and are growing rapidly. Italy now has more confirmed cases than China; and Italy, Spain, Germany, UK and US are on an approximate 3-5 day doubling rate of cases. There have been more than 20 thousand deaths. Italy and Spain now have more deaths than China; and Italy, Spain, UK and US are on an approximate 3-5 day doubling rate for deaths.
Note that errors around any predictions of future cases are substantial: with exponential parameters come exponential prediction errors! It is only by modeling, visualizing and predicting emerging infections that everyone can understand the pandemic in their own region, assess the effects of preventive measures, apply best protective practices in their local communities, and understand their personal risk.
This paper provides an update on our analyses and some details on our modeling, simulation and analytics methodologies. This includes:

COVID-19 Trajectories: interpretation and normalization, including auto-clustering of trajectories

Data Science Modeling: Rt progression

Compartment Modeling: epidemiology and statistical parameters

Healthcare resource requirements modeling

GeoSpatial Analysis: map layers, cartograms, choropleths
The analyses are presented using Spotfire visual analytics in a hosted environment. Figure 1 shows the Spotfire application Global Overview.
Figure 1. Spotfire application Global Overview. Shows worldwide cases, fatalities, recoveries and country-level stats. Includes a slider for stepping through time by date.
The analyses refresh hourly, depending on availability of data sources. Spotfire apps and code will be made available for download. Links to various trusted data sources are provided. Collaboration is encouraged and Spotfire will be available for use by those who don’t have it. TIBCO customers who are struggling with data and analytics issues around COVID-19 effects can contact the authors for more information and assistance.
Data: Johns Hopkins University. Spotfire application available here; this includes regular refresh of the data in Figure 1 above.
COVID-19 Trajectories
Figures 2 and 3 show COVID-19 case trajectories by country and US state. Figure 4 shows a cluster analysis of case trajectories by country. Figures 5 and 6 show COVID-19 deaths by country and US state. All of these analyses update hourly as data permit, or by a refresh button click in the Spotfire apps.
For case trajectories, the y-axis is the cumulative number of confirmed cases on the log scale, and the x-axis is the time in days after the first <100> confirmed cases. The dashed lines are at slopes representing 1-day, 2-day, 3-day, 5-day and 7-day doubling.
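As a minimal sketch of how such doubling reference lines can be computed (in Python rather than the Spotfire/R code behind the application; all function names are illustrative assumptions): a line that doubles every d days from a starting count c0 is c0 * 2^(t/d), which is straight on a log axis with slope log10(2)/d per day.

```python
import math

def doubling_reference_line(doubling_days, start_cases=100, n_days=30):
    """Cumulative cases that double every `doubling_days`, starting at `start_cases`."""
    return [start_cases * 2 ** (day / doubling_days) for day in range(n_days + 1)]

def log10_slope(doubling_days):
    """Slope per day of the reference line when the y-axis is log10(cases)."""
    return math.log10(2) / doubling_days

def observed_doubling_time(cases_start, cases_end, days_elapsed):
    """Doubling time implied by growth from cases_start to cases_end over days_elapsed."""
    return days_elapsed * math.log(2) / math.log(cases_end / cases_start)

# Reference lines for the doubling rates drawn on the trajectory charts.
lines = {d: doubling_reference_line(d) for d in (1, 2, 3, 5, 7)}
```

The same relationship can be inverted to read a doubling time off any two points of an observed trajectory, as in `observed_doubling_time`.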
Note that we use raw and cumulative cases rather than normalizing by total population. Normalized numbers are good at showing *relatively* how much strain a country is under, but they’re not suited to tracking the extent/state of a country’s outbreak, which spreads at approximately the same pace regardless of country size. Also note that cases are a function of the number of tests performed; this varies considerably by country. As such, the number of confirmed cases should not be interpreted as reflective of actual infections.
Figure 2. COVID-19 case trajectories by country. The y-axis is the number of confirmed cases (log scale), and the x-axis is the number of days after the first <100> confirmed cases. The <100>-case threshold aligns the curves to a common starting point in the epidemic outbreaks, and is configurable in the Spotfire application. The dashed lines indicate various doubling rates in days.
Data: Johns Hopkins University. Spotfire application available here; this includes regular refresh of the data in Figure 2 above.
Figure 3. COVID-19 case trajectories by US state. The y-axis is the number of confirmed cases (log scale), and the x-axis is the number of days after the first <100> confirmed cases. The <100>-case threshold aligns the curves to a common starting point in the epidemic outbreaks, and is configurable in the Spotfire application. The dashed lines indicate various doubling rates in days.
Data: Johns Hopkins University. Spotfire application available here; this includes regular refresh of the data in Figure 3 above.
Figure 4. COVID-19 case trajectories clustered by country. The y-axis is the number of confirmed cases (log scale), and the x-axis is the number of days after the first <100> confirmed cases. The <100>-case threshold aligns the curves to a common starting point in the epidemic outbreaks, and is configurable in the Spotfire application. The dashed lines indicate various doubling rates in days. Countries are clustered using the Hartigan–Wong algorithm in Spotfire, using the silhouette value for automatic selection of the number of clusters. Sequences longer than those available in the US are truncated.
Data: Johns Hopkins University. Spotfire application available here; this includes regular refresh of the data in Figure 4 above.
Figure 5. COVID-19 fatality trajectories by country. The y-axis is the number of fatalities (log scale), and the x-axis is the number of days after the first <10> deaths. The <10>-death threshold aligns the curves to a common starting point in the epidemic outbreaks, and is configurable in the Spotfire application. The dashed lines indicate various doubling rates in days.
Data: Johns Hopkins University. Spotfire application available here; this includes regular refresh of the data in Figure 5 above.
Figure 6. COVID-19 fatality trajectories by US state. The y-axis is the number of fatalities (log scale), and the x-axis is the number of days after the first <10> deaths. The <10>-death threshold aligns the curves to a common starting point in the epidemic outbreaks, and is configurable in the Spotfire application. The dashed lines indicate various doubling rates in days.
Data: Johns Hopkins University. Spotfire application available here; this includes regular refresh of the data in Figure 6 above.
Modeling the Outbreaks: Visual Data Science
See O’Connell (18 March) for an outline of our analysis to date, and epidemiology modeling basics. In summary:
The reproduction number R0 (pronounced R-nought) is the average number of people infected from a person with an infection, without any interventions in place. This is a crucial parameter in describing an epidemic. The effective reproduction number Re includes intervention effects. If Re is bigger than 1, the disease spreads. Conversely, if Re (or the time-varying reproduction number Rt) can be reduced below 1 over time, the disease can be contained.
The reproduction number R0 can be expressed as the product D*O*T*S (Kucharski), where:
D = duration (number of days someone is infectious)
O = opportunities for transmission (number of person-to-person greetings / day)
T = probability of transmission
S = susceptibility (proportion of population susceptible)
Delamater et al. describe R0, its use and misuse.
For COVID-19, without intervention (per Kucharski, TED Interview):

D (number of days someone is infectious) is approx. 1-2 weeks, before isolation. This includes ~5-6 days incubation until symptoms, and often an additional ~2-5 days before isolation. Flu is slightly shorter e.g. ~3 days. STDs can be several months.

O (number of person-to-person greetings / day) is modeled as ~5-10 people/day under usual behavior.

T (probability of the virus being transmitted in an interaction) is approx. 1/3. This is high compared to Flu and SARS.

S (proportion of population susceptible) is high i.e. 95-100%. Per Kucharski (TED Interview), based on early Wuhan data, ~95% of the initial population were still susceptible up to the end of January.
Kucharski describes R0 = 2 to 3 in uncontrolled outbreaks for COVID-19, compared with Flu where R0 = ~1.2.
The other key parameter is the case fatality rate (CFR): this measures the risk that someone who develops symptoms will eventually die from the infection.
For COVID-19, Kucharski (TED Interview) says this about the CFR: “I’d say on best available data, when we adjust for unreported cases and the various delays involved, we’re probably looking at a fatality risk of probably between maybe 0.5 and 2 percent for people with symptoms.” By comparison, the CFR for Flu is ~0.1%. Kucharski summarizes by stating that COVID-19 is ~10X+ more deadly than Flu. This is in line with other experts and studies, e.g. Paul Atwater (Johns Hopkins) stated that "CFR is clearly going to be less than 2%, but at the moment we just don’t know what that number is".
Early estimates of CFR in epidemics are typically high, as the focus is on the sickest of the sick. The early CDC estimates were 3.5% in China, 4.2% across 82 countries, and 0.6% on a cruise ship. They suggested a wide range of 0.25%-3.0%.
Wu et al. estimate the CFR of COVID-19 in Wuhan at 1.4% (0.9–2.1%). This is a big dataset as Wuhan was the epicenter for the initial outbreak. They note that this is substantially lower than the corresponding naïve confirmed case fatality risk of 2,169/48,557 = 4.5%; and the approximator of deaths/(deaths + recoveries): 2,169/(2,169 + 17,572) = 11%, as of 29 February 2020. The risk of symptomatic infection increased with age, and those above 59 years were 5.1 (4.2–6.1) times more likely to die after developing symptoms, compared to those aged 30–59.
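The two crude estimators quoted from Wu et al. can be reproduced with a few lines of Python. This is only an arithmetic sketch with illustrative function names; it does not implement the adjusted estimate of 1.4%, which requires modeling of unreported cases and of the delay from onset to death.

```python
def naive_cfr(deaths, confirmed_cases):
    """Deaths divided by all confirmed cases (ignores unresolved and unreported cases)."""
    return deaths / confirmed_cases

def resolved_cfr(deaths, recoveries):
    """Deaths divided by resolved cases only, i.e. deaths / (deaths + recoveries)."""
    return deaths / (deaths + recoveries)

# Wuhan counts as of 29 February 2020, as quoted in the text above.
deaths, cases, recoveries = 2169, 48557, 17572
naive = naive_cfr(deaths, cases)
resolved = resolved_cfr(deaths, recoveries)
print(f"naive: {naive:.1%}, deaths/(deaths+recoveries): {resolved:.1%}")
```

The gap between the two values illustrates why neither is reliable mid-epidemic: the first counts still-unresolved cases as survivors, while the second ignores them entirely.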
Ruan summarizes a number of studies and shows wide variability in CFR by region (2·9% in Hubei vs 0·4% in other areas of China), in different phases of the outbreak (e.g., 14·4% before Dec 31, 15·6% for Jan 1–10, 5·7% for Jan 11–20, 1·9% for Jan 21–31, and 0·8% after Feb 1), and by sex (2·8% for males vs 1·7% for females). They also quote the Chinese CDC reports that the case fatality ratio increases with age (from 0·2% for people aged 11–19 years, to 14·8% for people aged ≥80 years), and with the presence of comorbid conditions (10·5% for cardiovascular disease, 7·3% for diabetes, 6·0% for hypertension, 6·3% for chronic respiratory disease, and 5·6% for cancer).
Verity et al. analyze deaths in mainland China and recoveries outside of China, estimating the mean duration from onset of symptoms to death to be 17·8 days (95% credible interval 16·9–19·2) and to hospital discharge to be 24·7 days (22·9–28·1). With adjustment for demography and under-reporting, they estimate a case fatality rate in China of 1·38% (1·23–1·53), with substantially higher ratios in older age groups (0·32% [0·27–0·38] in those aged <60 years vs 6·4% [5·7–7·2] in those aged ≥60 years), up to 13·4% (11·2–15·9) in those aged 80 years or older. Estimates of case fatality rate from international cases stratified by age were consistent with those from China (parametric estimate 1·4% [0·4–3·5] in those aged <60 years [n=360] and 4·5% [1·8–11·1] in those aged ≥60 years [n=151]). These early estimates give an indication of the fatality ratio across the spectrum of COVID-19 disease and show a strong age gradient in risk of death.
It is tricky to calculate the CFR. The best way to calculate CFR would be to track a large group of people from the point when they develop symptoms until they later die or recover, and to then calculate the proportion of all these cases who had died. This is not possible in the real world. It is incorrect to just divide the total number of deaths by total number of cases as this does not account for unreported cases or the delay from illness to death.
It is widely recognized that there are many unreported cases, e.g. due to unavailable test kits. In the US analysis below, Bedford estimates an approximately 10X under-reporting of cases on March 13. Regarding the time delay, consider 20 new people admitted to a hospital with confirmed COVID-19 infection on a given day: that doesn’t mean the CFR is zero! We need to wait to see what happens to them. Conversely, any deaths that occur are among people who showed symptoms some weeks before.
Fauci et al. state that "if one assumes that the number of asymptomatic or minimally symptomatic cases is several times as high as the number of reported cases, the case fatality rate may be considerably less than 1%. This suggests that the overall clinical consequences of Covid-19 may ultimately be more akin to those of a severe seasonal influenza (which has a case fatality rate of approximately 0.1%) or a pandemic influenza (similar to those in 1957 and 1968) rather than a disease similar to SARS or MERS, which have had case fatality rates of 9 to 10% and 36%, respectively."
Bendavid and Bhattacharya also write about the under-reporting and effects of limited testing, and suggest CFR could be more like 0.01 to 0.1%, i.e. more in line with seasonal Flu or perhaps less deadly.
For all the reasons above, there is a wide range of both estimates and opinions on the CFR. It seems clear that CFR is higher in people older than 60 and in those with comorbid conditions. The CFR will become clearer as more people are tested and more people are followed from infection through recovery or death. It is important to get an accurate estimate of CFR soon, so as to best focus interventions at appropriate levels in regional and global communities.
Compartment models
Compartment models are a technique used to simplify the mathematical modelling of infectious disease. The population is divided into compartments, with the assumption that individuals in the same compartment have the same characteristics. The models are defined with ordinary differential equations (ODEs, deterministic), and can also be viewed in a stochastic framework, which is more realistic but more complex to analyze (Wikipedia: Compartmental Models).
Compartment models may be used to predict properties of how a disease spreads, for example the prevalence (total number of infected), reproduction number (average number of people infected from a person with an infection) and the duration of an epidemic. Also, the models enable understanding how different interventions may affect the outcome of the epidemic, and can be used to simulate various scenarios.
The SIR model is one of the simplest compartmental models, and many models are derivations of this basic form. With the SIR model, people transition from susceptible (S) to infected (I) to removed (R), with S+I+R = N (the total population size); where R can be recovered or death. Because the numbers of susceptible, infected and removed individuals vary over time (even if the total population size remains constant), we make them functions of time t: S(t), I(t) and R(t). This model is reasonably predictive for infectious diseases which are transmitted from human to human, and where recovery confers lasting resistance, such as measles, mumps and rubella.
COVID-19 has a significant incubation period, with an estimated median of 5.1 days. This requires at least one additional compartment for modeling. In the SEIR model, people transition from susceptible (S) to exposed (E) to infected (I) to removed (R), with S+E+I+R = N (the total population size); where R can be recovered or death.
We can fit SIR and SEIR models in R, with packages such as EpiModel. Tim Churches has provided an excellent blog on fitting compartment models and modeling the epidemic trajectory and the effective reproduction number over time. For the purpose of simulating and forecasting healthcare resource scenarios, we use models with additional compartments. Figure 5 shows one such model. We are developing approaches for healthcare resource planning using the work of Althaus (25 March), who presents a method for modeling and projecting the COVID-19 epidemic in Switzerland, using an SEIR model and the daily number of reported deaths. Althaus includes additional compartments for hospitalization and critical care (ICU). He assumes constant uncontrolled transmission until the lockdown that was set in Switzerland on 17 Mar 2020, and then varies a parameter kappa = Re/R0 as a measure of the effectiveness of the subsequent interventions.
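To make the SEIR transitions concrete, here is a minimal deterministic sketch in Python using simple Euler steps. It is not the EpiModel code referred to in this blog; the function name, the step size and the values beta = 0.5 and gamma = 0.2 (giving R0 = 2.5) are illustrative assumptions, while sigma = 1/5.1 uses the median incubation period quoted above.

```python
def simulate_seir(population, beta, sigma, gamma,
                  initial_infected=1, days=300, steps_per_day=10):
    """Euler integration of the SEIR ODEs; returns one (S, E, I, R) tuple per day."""
    dt = 1.0 / steps_per_day
    s = float(population - initial_infected)
    e, i, r = 0.0, float(initial_infected), 0.0
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt   # S -> E
            new_infectious = sigma * e * dt                 # E -> I
            new_removed = gamma * i * dt                    # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        history.append((s, e, i, r))
    return history

# Assumed illustrative parameters: R0 = beta / gamma = 2.5, incubation 5.1 days.
history = simulate_seir(population=1_000_000, beta=0.5, sigma=1 / 5.1, gamma=0.2)
peak_day = max(range(len(history)), key=lambda d: history[d][2])
```

Because every flow leaves one compartment and enters the next, S+E+I+R stays equal to N at each step, which is a useful sanity check on any compartment model implementation.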
These two modeling scenarios are covered in sections below.
Figure 5. Compartment model for studying and simulating scenarios of COVID-19 outbreaks.
The time sequence of virus and human host states is outlined in Figure 6. This shows a number of epidemiology parameters:

The Latent Period is the time between the occurrence of infection and the onset of infectiousness (when the infected individual becomes infectious).

The Serial Interval is the duration of time between the onset of symptoms in a primary case and the onset of symptoms in a secondary case infected by the primary case.

The Incubation Period is the time between the occurrence of infection (or transmission) and the onset of disease symptoms.
Figure 6. Infection and transmission timeline of COVID-19. Based on supplement to: Anderson et al. Lancet 2020.
Modeling the Effective Reproduction Number over time: Rt
R0 is a base rate, with no interventions in place and the population in its unmodified state. For COVID-19, R0 has been widely reported to be in the range 2-3. The effective reproduction number Re includes intervention efforts (drug and non-drug). If the effective reproduction number Re > 1, the disease spreads. Rt, the time-varying reproduction number, tracks Re over time. Our current non-pharmaceutical interventions (NPIs) are aimed at reducing Re. If Re can be reduced below 1 with interventions, the virus stops spreading.
We have been estimating the time-varying reproduction number Rt, at a state and county level, across the US and worldwide. While some data are thin, early results are encouraging, showing a downward trend in Re over time in some countries and states.
Figure 7 shows some results of modeling Rt for different countries and Figure 8 some results of this Rt modeling for different US states. Models are fit using the package EpiEstim. This package can be added to Spotfire via the TERR Tools menu, and configured to run via a Spotfire data function. User-selected markings on maps and other visuals then invoke the Rt estimates to run interactively, in the context of exploratory visual data analysis.
EpiEstim (Cori et al, 2019) analyzes time series incidence data to estimate time-varying reproduction numbers as outlined in Cori et al 2013. EpiEstim incorporates uncertainty in the distribution of the serial interval, i.e. the time between the onset of symptoms in a primary case and the onset of symptoms in secondary cases.
There are five estimation methods in EpiEstim; these vary in the way the serial interval distribution is specified. In the first two methods, a unique serial interval distribution is considered, whereas in the last three, a range of serial interval distributions is integrated over:
 "non_parametric_si": the user specifies the discrete distribution of the serial interval
 "parametric_si": the user specifies the mean and sd of the serial interval
 "uncertain_si": the mean and sd of the serial interval are each drawn from truncated normal distributions, with parameters specified by the user
 "si_from_data": the serial interval distribution is directly estimated, using MCMC, from interval censored exposure data, with data provided by the user together with a choice of parametric distribution for the serial interval
 "si_from_sample": the user directly provides the sample of serial interval distributions to use for estimation of R.
Zhanwei et al. (CDC EID) estimate the distribution of serial intervals for 468 confirmed cases of COVID-19 reported in China as of February 8, 2020. They found a mean interval of 3.96 days (95% CI 3.53–4.39 days), and SD 4.75 days (95% CI 4.46–5.07 days).
We have been exploring all the above methods, following the logic and approach set out by Churches. Our live Spotfire application currently uses a serial interval (SI) with mean 2.6 days and standard deviation 1.5 days; and we are exploring mean 4.7 days and standard deviation 2.9 days. Churches’ reasoning for these higher values is that they better account for transmission before the onset of symptoms, which results in shorter serial intervals than expected, possibly even shorter than the incubation period (see Figure 9). As we explore additional approaches, e.g. by Abbott et al., and with application to estimates of Rt for US states, we will update the Spotfire app. In particular, we are planning to expose these parameters to the R functions in Spotfire, e.g. with ranges (3.7, 6.0) and (1.9, 4.9). We are using a window length of 7 days. We are also planning to expose this and let people change the window length within the range (1, 7) as a parameter in Spotfire.
Figure 7. Rt modeling of countries as of March 26, highlighting France, Germany, Italy, the Netherlands, Spain and the UK. The colored bands show Rt < 1.0 (green), 1.0 < Rt < 2.0 (yellow), 2.0 < Rt < 3.0 (amber). The dark line is the Rt estimate and the gray lines are 95% credible intervals. The models use the R package EpiEstim invoked through a Spotfire data function.
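The core idea behind this kind of Rt estimate (Cori et al. 2013) can be sketched in plain Python: over a sliding window, Rt is roughly the number of new cases divided by the "total infectiousness" of earlier cases, weighted by the serial interval distribution. The sketch below is a simplification and is not the EpiEstim implementation: the serial interval is discretized by evaluating a gamma density at whole days, only the posterior mean is returned, and the gamma prior (shape 1, scale 5) and all function names are assumptions. The SI mean of 4.7 days, SD of 2.9 days and the 7-day window are the values discussed above.

```python
import math

def discretized_si(mean, sd, max_days=30):
    """Approximate serial-interval weights w[1..max_days] from a gamma density (w[0] = 0)."""
    shape = (mean / sd) ** 2
    rate = mean / sd ** 2
    dens = [0.0] + [
        rate ** shape * s ** (shape - 1) * math.exp(-rate * s) / math.gamma(shape)
        for s in range(1, max_days + 1)
    ]
    total = sum(dens)
    return [d / total for d in dens]

def estimate_rt(incidence, si_mean=4.7, si_sd=2.9, window=7,
                prior_shape=1.0, prior_scale=5.0):
    """Posterior mean of Rt for each day t, using a trailing window of `window` days."""
    w = discretized_si(si_mean, si_sd)
    # Total infectiousness on day t: past incidence weighted by the serial interval.
    lam = [
        sum(incidence[t - s] * w[s] for s in range(1, min(t, len(w) - 1) + 1))
        for t in range(len(incidence))
    ]
    estimates = {}
    for t in range(window, len(incidence)):
        days = range(t - window + 1, t + 1)
        post_shape = prior_shape + sum(incidence[d] for d in days)
        post_rate = 1.0 / prior_scale + sum(lam[d] for d in days)
        estimates[t] = post_shape / post_rate
    return estimates

# Synthetic daily incidence for illustration only (not real case data).
growing = [round(10 * 1.2 ** t) for t in range(30)]
declining = [round(1000 * 0.85 ** t) for t in range(30)]
rt_growing = estimate_rt(growing)
rt_declining = estimate_rt(declining)
```

A longer assumed serial interval pushes the estimate further from 1 for the same incidence curve, which is why the choice of SI mean and SD matters so much in the discussion above.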
Data: Johns Hopkins University. Spotfire application available here; this includes regular refresh of the data in Figure 7 above.
Figure 8. Rt modeling of US states as of March 26, highlighting California, Connecticut, Louisiana, Massachusetts, Michigan and New York. The colored bands show Rt < 1.0 (green), 1.0 < Rt < 2.0 (yellow), 2.0 < Rt < 3.0 (amber). The dark line is the Rt estimate and the gray lines are 95% credible intervals. The models use the R package EpiEstim invoked through a Spotfire data function.
Data: Johns Hopkins University. Spotfire application available here; this includes regular refresh of the data in Figure 8 above.
Note that Rt values can change quickly in response to non-pharmaceutical interventions (NPIs). Kucharski et al. (March 11, 2020) found that the median daily Rt in Wuhan declined from 2·35 one week before travel restrictions were introduced on Jan 23, 2020, to 1·05 just one week after. A later section assesses the effects of various non-pharmaceutical interventions.
Associating and Interpreting Rt and Case Data
Italy and Spain have had major outbreaks of COVID-19 in March. We show estimates of Re over time for Italy and Spain through April 2nd in Figure 9 below.
During March, Italy and Spain aggressively adopted non-pharmaceutical interventions (NPIs) such as social distancing, so as to reduce the effective reproduction number and slow down the rate of spread. Results from the Rt estimates show that this appears to be working. This is not surprising, as we have seen Re change quickly in response to NPIs in other regions. As outlined above, Kucharski et al. (March 11, 2020) found that the median daily Rt in Wuhan declined from 2·35 one week before travel restrictions were introduced on Jan 23, 2020, to 1·05 just one week after.
Note that today’s cases are people whose infections began some 2-3 weeks ago. It takes time for the virus to go from one host to another, and for that person to get tested and to be confirmed as a case. So the estimates of Re reflect cases that initiated some weeks prior. As such, the Re estimates around 1 on April 2nd indicate a solid effect of social distancing in March, and bode well for the future in terms of reducing cases and flattening the epidemic curve.
Figure 9. Rt modeling of Spain and Italy as of April 2. The colored bands show Rt < 1.0 (green), 1.0 < Rt < 2.0 (yellow), 2.0 < Rt < 3.0 (amber). The dark line is the Rt estimate and the gray lines are 95% credible intervals. The models use the R package EpiEstim invoked through a Spotfire data function.
We check the case counts in Italy over the latter half of March in Figure 10. As predicted from the Rt estimates presented in Figure 9, we see a drop in daily case counts in Italy over the last days of March. While there are many sources of error, including low numbers of tests, there is some comfort that these trends are now aligning and the epidemic curves are flattening.
Figure 10. Case Counts in Italy and Spain up until April 4th. Note the fall in cases over the prior 7 days.
We study the effects of individual NPIs in the next section. We have collated NPIs across all worldwide locations and are making these available on the TIBCO Community - COVID-19 Visual Data Science Headquarters. These can be referred to in the context of the previous two figures.
Effects of Interventions
The objective of any public health response during a pandemic is to slow or stop the spread of the virus by employing mitigation strategies that reduce Rt. Typical interventions include:

testing and isolating infected people

reducing opportunities for transmission (e.g. via social distancing, school closures)

changing the duration of infectiousness (e.g., through antiviral use)

reducing the number of susceptible individuals (e.g., by vaccination)
The initial focus of public health experts with COVID-19 has been on suppression, i.e. reducing the effective reproduction number Re to below 1 by isolating infected people, reducing case numbers and maintaining this situation until a vaccine is available. This worked well for SARS, but is more challenging for COVID-19 because many infected people are asymptomatic and go undetected.
The current focus is on mitigation, i.e. reducing Re to slow the spread:

Opportunity parameter: to get Rt below 1, Kucharski (TED Interview) describes the need for everybody in the population to cut interactions by one-half to two-thirds. This can be achieved by initiatives such as working from home (WFH), school closures, reducing social dinners etc.

As a simple analogy, there is an 84% chance of rolling at least one 6 in 10 rolls of a die. This reduces to 31% in 2 rolls (using 1 - (5/6)^n for n rolls). So you can reasonably expect to cut your odds by one-half to two-thirds by reducing usual social meetings from say 10 meetings to 2 meetings per day.

Measures such as handwashing, reducing contacts with others and cleaning surfaces can reduce the Transmission probability.
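The die-rolling analogy above can be checked with a short Python sketch (function and variable names are illustrative). It computes the chance of at least one "hit" in n independent tries, 1 - (1 - p)^n, and the relative reduction in that chance when going from 10 meetings to 2.

```python
def prob_at_least_one(p_event, n_trials):
    """Probability that an event of probability p_event occurs at least once in n_trials."""
    return 1 - (1 - p_event) ** n_trials

p_ten_rolls = prob_at_least_one(1 / 6, 10)   # about 84%
p_two_rolls = prob_at_least_one(1 / 6, 2)    # about 31%
relative_reduction = 1 - p_two_rolls / p_ten_rolls

print(f"{p_ten_rolls:.0%} -> {p_two_rolls:.0%}, reduction {relative_reduction:.0%}")
```

The relative reduction lands between one-half and two-thirds, consistent with the range quoted from Kucharski; the die probability of 1/6 is only an analogy and not an estimate of the transmission probability.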
Note that the fatality rate in people aged 60-70 is increased to ~5%, in people aged 70-80 to ~10%, and in people older than 80 to 15-20%. People with comorbid conditions are at increased risk. So a key mitigation strategy to reduce deaths is to reduce interactions with the elderly.
Ferguson et al. (Imperial College, 16 March, 2020) describe interventions such as case isolation, household quarantines, restricting large events, closing social gathering spots, closing schools and universities, encouraging individuals to stay at home, and pausing sporting and arts events; and how these NPIs can affect the rate of contact and hence R0.
Label | Policy | Description
CI | Case isolation in the home | Symptomatic cases stay at home for 7 days, reducing non-household contacts by 75% for this period. Household contacts remain unchanged. Assume 70% of households comply with the policy.
HQ | Voluntary home quarantine | Following identification of a symptomatic case in the household, all household members remain at home for 14 days. Household contact rates double during this quarantine period, contacts in the community reduce by 75%. Assume 50% of households comply with the policy.
SDO | Social distancing of those over 70 years of age | Reduce contacts by 50% in workplaces, increase household contacts by 25% and reduce other contacts by 75%. Assume 75% compliance with policy.
SD | Social distancing of entire population | All households reduce contact outside household, school or workplace by 75%. School contact rates unchanged, workplace contact rates reduced by 25%. Household contact rates assumed to increase by 25%.
PC | Closure of schools and universities | Closure of all schools, 25% of universities remain open. Household contact rates for student families increase by 50% during closure. Contacts in the community increase by 25% during closure.
Table 1. Summary of NPI Interventions. Based on Ferguson et al. March 16
They also model these mitigation strategy scenarios for Great Britain (GB) to estimate hospital bed and critical care (ICU) requirements.
They predict that for R0 = 2.4 with a "do nothing" approach, 81% of the Great Britain and US populations would be infected over the course of the epidemic. They then show the effects of the interventions in Table 1 applied to this R0 = 2.4 scenario, in terms of critical care beds required. The resulting estimated effects are shown in Table 2.
Non-Pharmaceutical Intervention (NPI) | Maximum critical care beds required
Do nothing | 280
Closing schools and universities | 240
Case isolation | 180
Case isolation and household quarantine | 130
Case isolation, home quarantine, social distancing of >70s | 90
Table 2. Predicted Effects of NPI Interventions on maximum critical care beds required (per 100,000 population). Based on Ferguson et al. March 16. NPI measures are described in Table 1.
Ferguson et al. suggest that the interventions remain in place for as much of the epidemic period as possible (they show April to July, 2020). They note that “Introducing such interventions too early risks allowing transmission to return once they are lifted (if insufficient herd immunity has developed); it is therefore necessary to balance the timing of introduction with the scale of disruption imposed and the likely period over which the interventions can be maintained.”
The Predictive Healthcare team at Penn Medicine recently released CHIME, a tool for COVID-19 hospital capacity planning. CHIME features an interface where users input parameters as follows:
 number of days to project
 currently hospitalized COVID-19 patients
 doubling time before social distancing
 social distancing (% reduction in social contact)
 hospitalization % (total infections)
 ICU % (total infections)
 ventilated % (total infections)
 hospital length of stay
 ICU length of stay
 vent length of stay
 regional population
 currently known regional infections
Results of a CHIME run include projections for hospitalized, ICU and ventilated cases.
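To illustrate how a doubling time and a social distancing percentage can interact in an SIR-style planning tool, here is a hedged Python sketch. It is not CHIME's code: it assumes early-epidemic growth g = beta - gamma, a contact reduction that scales beta, and a hypothetical infectious period of 14 days; all names are illustrative.

```python
import math

def growth_rate_from_doubling_time(doubling_days):
    """Daily growth rate implied by a doubling time (cases multiply by 1 + g each day)."""
    return 2 ** (1 / doubling_days) - 1

def doubling_time_after_distancing(doubling_days, contact_reduction, infectious_days=14):
    """Doubling time after cutting contacts by `contact_reduction` (0..1) in an SIR setting."""
    gamma = 1 / infectious_days                       # assumed recovery rate
    growth = growth_rate_from_doubling_time(doubling_days)
    beta = growth + gamma                             # early phase: g = beta - gamma
    reduced_growth = beta * (1 - contact_reduction) - gamma
    if reduced_growth <= 0:
        return math.inf                               # epidemic no longer growing
    return math.log(2) / math.log(1 + reduced_growth)

for reduction in (0.0, 0.3, 0.6):
    print(reduction, doubling_time_after_distancing(4, reduction))
```

Under these assumptions, a large enough contact reduction makes the growth rate non-positive, which corresponds to an effective reproduction number at or below 1.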
Draugelis and Hanish compare the Penn CHIME and Imperial College team’s ‘Do nothing’ scenarios; and analyze CHIME’s Social Distancing parameter with different scenarios from the Imperial College model.
In a similar base scenario, CHIME and Imperial College results are comparable (Table 3).
Parameter | CHIME Scenario | Imperial College
Peak Date | Mid June | Mid-End June
Peak Ventilated or Critical Care Census | 600,000 | 730,000
Table 3. Comparison of CHIME and Imperial College results from a similar base scenario, i.e. no social distancing.
They use a paper by Zhaoyang et al. (2018) on adult daily social interactions to do a rough conversion of Imperial College scenarios to CHIME social distancing scenarios. Table 4 shows the results of running CHIME with these roughly comparable parameters.
% reduction of social contact | Imperial College Scenario | CHIME | Imperial College (their Table 3)
4% | PC | 15% | 14%
12% | PC + noness_SD | 31% |
15% | PASD | 42% | 33%
31% | PC + noness_SD + PASD | 65% | 69%
Table 4. Comparison of CHIME’s Social Distancing parameter settings with different scenarios from the Imperial College model.
While this comparison is rough, it is encouraging that the base scenario projections are similar (Table 3) and that the CHIME Social Distancing parameter scenarios (top-down overall % reduction in contact) can be lined up with the bottom-up estimates of the Imperial College scenarios.
Bottom line: in unmitigated exponential growth, health systems can be quickly overburdened. The NPI measures are designed to save hospital resources, e.g. ICU beds, to serve the patients in serious condition. Given that an ICU bed may be taken for 2 weeks, the protective measures need to be aggressive.
Modeling Required Healthcare Resources
In order to understand the application of compartment models to healthcare resource requirements, we are exploring a similar compartment modeling approach to CHIME. Althaus (25 March) presents a method for modeling and projections of the COVID-19 epidemic in Switzerland. He fits an SEIR transmission model to the daily number of reported deaths, with additional compartments for hospitalization and critical care (ICU). He assumes constant uncontrolled transmission until the lockdown that was put in place on 17 Mar 2020, and then varies the transmission rate relative to the epidemic spread before the lockdown.
A schematic of the extended SEIR model used by Althaus is depicted in Figure 11.
Figure 11. Schematic of the extended SEIR model from Althaus (25 March).
The parameters include :
Population groups
 S=Susceptible / E=Exposed / I=Infected / R=Recovered, H=Hospitalized / V=ICU / D=died
Free parameters
 C = the cumulative number of cases
 beta = (# contacts per person per time) * probability of infection per contact
Fixed parameters
 omega1 = 1/ hospital stay, days, for mild and severe cases
 omega2 = 1/ hospital stay, days, for critical cases
 epsilon1 = proportion of Infected patients needing hospitalization
 epsilon2 = proportion of Hospitalized patients moving to ICU
 epsilon3 = proportion of ICU patients fatalities
 gamma = 1/(duration of disease, days)
 sigma = 1/(incubation period, days)
Parameters that can be controlled with NPIs
 kappa = the NPI effectiveness multiplier; kappa in [0,1], where 1 = no intervention, 0 = max intervention
 Re = kappa * beta / gamma (the effective reproduction number)
 where beta / gamma = R0, the basic reproduction number
Althaus (25 March) varies kappa to reflect NPIs and show effects on hospitalization and ICU bed requirements.
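To make the role of kappa concrete, here is a minimal Python sketch of an extended SEIR model using the parameter names listed above. It is not the Althaus code: the flow structure (I to H to V to D, with everyone else recovering) is our reading of the parameter definitions, and the integration is a simple Euler scheme. Only R0 = 2.99 comes from the text; the population of 8.6 million, the seed of 10 infections, the lockdown on day 30 and all other parameter values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class SeirParams:
    beta: float      # transmission rate before any intervention
    sigma: float     # 1 / incubation period (days)
    gamma: float     # 1 / duration of disease (days)
    omega1: float    # 1 / hospital stay (days), mild and severe cases
    omega2: float    # 1 / hospital stay (days), critical cases
    epsilon1: float  # proportion of infected needing hospitalization
    epsilon2: float  # proportion of hospitalized moving to ICU
    epsilon3: float  # proportion of ICU patients who die


def simulate(p, kappa, population, lockdown_day, days, seed=10.0, dt=0.1):
    """Euler integration; returns one dict of compartment sizes per day."""
    S, E, I = population - seed, 0.0, seed
    H = V = R = D = 0.0
    C = seed  # cumulative cases
    steps_per_day = round(1 / dt)
    history = []
    for day in range(days):
        history.append(dict(day=day, S=S, E=E, I=I, H=H, V=V, R=R, D=D, C=C))
        k = 1.0 if day < lockdown_day else kappa  # NPIs act only after lockdown
        for _ in range(steps_per_day):
            # Compute all flows from the current state before updating it.
            infections = k * p.beta * S * I / population
            onsets = p.sigma * E
            leave_i = p.gamma * I
            leave_h = p.omega1 * H
            leave_v = p.omega2 * V
            S -= infections * dt
            E += (infections - onsets) * dt
            I += (onsets - leave_i) * dt
            H += (p.epsilon1 * leave_i - leave_h) * dt
            V += (p.epsilon2 * leave_h - leave_v) * dt
            D += p.epsilon3 * leave_v * dt
            R += ((1 - p.epsilon1) * leave_i + (1 - p.epsilon2) * leave_h
                  + (1 - p.epsilon3) * leave_v) * dt
            C += onsets * dt
    return history


# Illustrative values only: R0 = 2.99 is from the text, everything else is assumed.
GAMMA = 1 / 5.0
ILLUSTRATIVE = SeirParams(beta=2.99 * GAMMA, sigma=1 / 5.0, gamma=GAMMA,
                          omega1=1 / 8.0, omega2=1 / 10.0,
                          epsilon1=0.05, epsilon2=0.25, epsilon3=0.5)

for kappa in (1.0, 0.6, 0.3):
    run = simulate(ILLUSTRATIVE, kappa, population=8_600_000, lockdown_day=30, days=365)
    peak_icu = max(row["V"] for row in run)
    print(f"kappa={kappa:.1f}  Re={kappa * 2.99:.2f}  peak ICU census={peak_icu:,.0f}")
```

Running the loop for a few kappa values gives the kind of scenario comparison described above: the smaller kappa is, the lower the effective reproduction number after lockdown and the lower the peak ICU census.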
Before the lockdown in Switzerland, Althaus reported the basic reproduction number R0 of COVID19 at 2.99 (95% confidence interval: 2.54 to 3.59). We checked this using EpiEstim applied to the case data available from Althaus (the table swiss_covid_epidemic). The case data is plotted in Figure 12 (upper left), with the marking (orange line) indicating the data prior to lockdown. In the Spotfire analysis shown in Figure 12, we provide an input field for the serial interval (SI) distribution parameter (upper right). We found that using SI = 5.0 gives good agreement with the Re reported in Althaus. Note that the Wuhan data analysis by Li et al. provided an estimate of 5.3 for the mean SI, so this is a reasonable value. Figure 12 shows Rt dropping from 4.0 to 2.2 over the time sequence prior to lockdown on 17 March, with an average of 2.8, i.e., close to the value of 2.99 (2.54 to 3.59) reported by Althaus. This indicates agreement in R0 estimates using different data (case data from the Althaus website) and a different method (EpiEstim) as compared to the Althaus compartment model.
Figure 12. Calibrating the Althaus model with estimated Rt from case data. Fit uses EpiEstim package on case data from Althaus (25 March)
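For readers who want to see what sits behind an Rt estimate of this kind, the following Python sketch computes the posterior mean of Rt over a sliding window, following the formulation in Cori et al. (2013). It is a simplified re-implementation for illustration, not the EpiEstim package (which is written in R): the serial interval is discretized crudely by evaluating a gamma density at whole days, the SI standard deviation of 3.0 days is an assumption (the text only fixes the mean at 5.0), and the incidence series is synthetic.

```python
import math


def discretized_gamma_si(mean_si, sd_si, max_days=30):
    """Serial interval weights for lags 1..max_days (simplified discretization)."""
    shape = (mean_si / sd_si) ** 2
    scale = sd_si ** 2 / mean_si

    def pdf(x):
        return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)

    weights = [pdf(lag) for lag in range(1, max_days + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def estimate_rt(incidence, si_weights, window=7, prior_shape=1.0, prior_scale=5.0):
    """Posterior mean of Rt for each day t, using the window of days ending at t."""
    n = len(incidence)
    # Infection pressure: past incidence weighted by the serial interval.
    pressure = [
        sum(incidence[t - lag] * si_weights[lag - 1]
            for lag in range(1, min(t, len(si_weights)) + 1))
        for t in range(n)
    ]
    rt = {}
    for t in range(window, n):
        days = range(t - window + 1, t + 1)
        shape = prior_shape + sum(incidence[d] for d in days)
        rate = 1.0 / prior_scale + sum(pressure[d] for d in days)
        rt[t] = shape / rate
    return rt


# Synthetic incidence growing 20% per day (hypothetical, not the Swiss data).
si = discretized_gamma_si(mean_si=5.0, sd_si=3.0)
cases = [10 * 1.2 ** day for day in range(40)]
rt = estimate_rt(cases, si)
print(f"Estimated Rt on the last day: {rt[39]:.2f}")
```

Changing mean_si in this sketch shows the sensitivity that the Spotfire input field exposes: for the same case curve, a longer assumed serial interval produces a higher Rt.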
We are working on calibrating the kappa parameter from the Althaus model to the social distancing results from CHIME and Ferguson et al. as presented in Table 4. Our goal is to create an interactive application for modeling regional hospitals and healthcare systems in the US and other WW regions. In order to do the projections at a state / region level we need:
1) Population size / census (including age distribution)
2) Hospital stats : hospital locations, beds and capacity stats
3) Case data
 Daily case data
 Daily fatality data
4) Epidemic data
 Duration of hospitalization for mild and severe cases
 Additional duration of hospitalization for critical cases
 Proportion of hospitalized cases
 Case fatality rate (use 1%)
 Reproduction number (variable)
Our interactive Spotfire application allows end users to plug in their data for hospital capacity and generate scenarios for resource planning. We are calibrating our models against the Penn CHIME Hospital Impact Model. Our goal is to enable regional healthcare organizations to drill into their local area and interactively obtain live-update forecasts of hospital resources needed to meet emerging demand. The forecasts include kappa scenarios combined with user-entered data on hospital stats from the American Hospital Association and CDC age band risks, along with epidemic and case data selections. We are using census data in age bands, along with hospital data at the county level, to make the what-if scenarios more targeted.
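One way age-banded census data can feed a compartment model is by turning age-specific risks into a single population-weighted proportion for a region. The Python sketch below shows that calculation; the function name, the age bands, the county population and the risk values are all hypothetical placeholders, not CDC figures and not the logic of the Spotfire application.

```python
def weighted_proportion(population_by_band, risk_by_band):
    """Population-weighted average of an age-specific risk."""
    total = sum(population_by_band.values())
    weighted = sum(population_by_band[band] * risk_by_band[band]
                   for band in population_by_band)
    return weighted / total


# Hypothetical county census counts and placeholder hospitalization risks.
county_population = {"0-19": 25_000, "20-44": 35_000, "45-64": 25_000, "65+": 15_000}
hospitalization_risk = {"0-19": 0.01, "20-44": 0.05, "45-64": 0.10, "65+": 0.30}

proportion = weighted_proportion(county_population, hospitalization_risk)
print(f"Proportion of cases needing hospitalization: {proportion:.1%}")
```

A county with an older population than this one would get a higher weighted proportion from the same risk table, which is the reason for working in age bands rather than with a single national figure.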
GeoSpatial Data Science
Spotfire’s map charts display multiple layers of information, including points, lines, WKB objects like shapefiles and polylines, and TMS and WMS layers that show e.g. geology, live weather, or customized image, terrain, or other information. Map layers with points, lines, and WKB objects can be configured to respond to marking, and can be refreshed by Spotfire data functions including model fitting in R and Python. This provides a convenient means of injecting calculations and predictions into interactive map presentations e.g. interactive contour lines, heatmaps, polygons, territory calculations, and route optimization.
Figures 13 and 14 show US county-level case data, with drill-down into hotspots in the NY area and the Southeast. The hotspot colorings are relative within the markings. The companion visuals show confirmed cases sorted by county, and a combination chart of daily and cumulative cases from the marking.
Figure 13. Interactive marking around the NY hotspot on 3 April. The hotspot coloring shows the NYC hotspot and surrounding area. The companion visuals show confirmed cases sorted by counties in the marking, and combination daily and cumulative cases. The combination chart shows no evidence of case flattening.
Figure 14. Interactive marking around the Southeast on 3 April. The companion visuals show confirmed cases sorted by counties in the marking, including cities in IN, GA, DC, MO and TN. The combination chart of daily and cumulative cases shows no evidence of case flattening.
Figure 15 shows an area cartogram (Dorling 1996) of confirmed cases in the US. This is a set of non-overlapping regions with state areas proportional to the number of cases, using a rubber sheet distortion algorithm (Dougenik et al. 1985). The cartogram is invoked via a data function in Spotfire, with the R package Cartogram (Jeworutzki et al.) run inside Spotfire on a mouse marking, using the built-in TIBCO Runtime for R engine.
Figure 15. Cartograms of COVID19 confirmed cases from March 19 and March 30. This shows a shifting dominance of cases from WA and CA to NY.
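The core idea of an area cartogram, areas proportional to case counts, can be shown without any geometry. The Python sketch below only computes the target area for each region and how far its current map area is from that target; it is not the rubber sheet algorithm and not the R cartogram package, and the function names, case counts and areas are hypothetical.

```python
def target_areas(cases_by_region, total_area):
    """Area each region should occupy so that area is proportional to cases."""
    total_cases = sum(cases_by_region.values())
    return {region: total_area * cases / total_cases
            for region, cases in cases_by_region.items()}


def size_errors(current_area, desired_area):
    """Ratio of desired to current area; above 1 means the region must grow."""
    return {region: desired_area[region] / current_area[region]
            for region in current_area}


# Hypothetical case counts and map areas (arbitrary units).
cases = {"NY": 60_000, "CA": 8_000, "WA": 5_000}
areas = {"NY": 141.0, "CA": 424.0, "WA": 185.0}

desired = target_areas(cases, total_area=sum(areas.values()))
for region, ratio in size_errors(areas, desired).items():
    print(f"{region}: scale area by {ratio:.2f}")
```

A distortion algorithm then has the job of reshaping the boundaries until every ratio is close to 1 while neighboring regions stay attached to each other.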
Summary and Community Actions
Reading Adam Kucharski and other experienced epidemiologists, this COVID19 virus is clearly highly contagious and deadly. However, from a statistical perspective, with exponential growth parameters there are similarly exponential errors on predictions, and many different scenarios could eventuate. The case fatality rate is particularly unclear, with estimates ranging from 0.01% up to 2%. That's an enormous range of outcomes, perhaps implying a range of total deaths from 50 thousand to 2 million.
When our predictive models are this uncertain, it is no wonder that we are seeing a wide range of human reactions, from terror to indifference. And the community measures that are being implemented have been shown to be effective. At one level we can think of communities and populations, where deaths in the thousands are certain, the economy is in turmoil and our life savings are under attack. At the other end of the spectrum there is us and our individual friends and families. If I/we assume a 10% risk of infection and a 0.1% mortality rate, my/our personal death rate is 1 in 10,000. Or perhaps better said, my/our chance of being just fine is 9,999/10,000.
I guess what I’m saying is that from a personal perspective it's ok to be afraid and take every measure to protect myself. But I'm not going to take these highly uncertain outcomes as events that are likely to happen. In the current situation, I take comfort in the uncertainty. We are moving forward with hope and confidence in the analytics and predictions, and the uncertainties of the predictions, that are summarized in this paper.
The interventions are being driven by our medical and epidemiology experts (e.g., CDC and WHO) and these are measures we know to work since the Spanish Flu of 1918. It's clear that we have to all chip in, in our everyday lives to enforce these :
Be aware of the path to infection: hand to face, etc.
 Stop things like handshakes
 Clean hands often
 Clean and disinfect surfaces
Practice Social Distancing
 Avoid gatherings
 Maintain distance between yourself and others
Think about old people and their high infection and mortality rates
 Cover coughs and sneezes
 If sick, stay home. If that is not possible wear a facemask
We are all in this together. Be kind. Watch out for others in your orbit. Educate others with the knowledge you have. Be generous to others in our lives who are struggling. Help keep the young ones away from the elderly and immunocompromised. Good luck. We will be back with another visual data science update on COVID19 soon.
Updates
This blog was updated on April 4 as follows:
 added new section on GeoSpatial Data Science
 updated the Compartment Model Section
 updated the section Modeling the Effective Reproduction Number over time, Rt
 updated the Modeling Healthcare Resources section
 added CDC reference on age band risk
 added references on Case Fatality Rate
 added Cartogram references
Future updates will likely appear in a new blog, including :
 error sources: testing, case and fatality reporting
 COVID19 risk in context of existing risk
 CFR and case reporting
 testing and diagnostics
 healthcare resource planning
 updates on visual and geospatial data science
 natural language generation
Appendix: TIBCO Analytics
Acknowledgements:
Special thanks to the awesome TIBCO Data Science team who are working on these analyses using Spotfire (Visual Analytics; R, Python) : Neil Kanungo, Peter Shaw, Prem Shah and David Katz did the heavy lifting, and were well supported by Vinoth Manamala, Eric Hsu, Andrew Berridge, Heleen Snelting, Mike Alperin, Colin Gray and Dan Rope.
Blog contact author: Michael O’Connell, @MichOConnell
References
 Abbott, S, Hellewell, J, Munday, JD, Young Chun, J, Thompson, RN, Bosse, NI, Chan, YWD, Russell, TW, Jarvis, CI. Temporal variation in transmission during the COVID19 outbreak, online March 14, 2020
 Althaus, C. Real-time modeling and projections of the COVID19 epidemic in Switzerland. March 25, 2020
 Anderson RM, Heesterbeek H, Klinkenberg D, Hollingsworth TD. How will country-based mitigation measures influence the course of the COVID19 epidemic? Lancet 2020, with appendices; published online March 6, 2020
 Becker M and Chivers C. Announcing CHIME, A tool for COVID19 capacity planning. March 14, 2020.
 Bendavid E and Bhattacharya J. Is the Coronavirus as Deadly as They Say? WSJ March 27 2020
 CDC. Severe Outcomes Among Patients with Coronavirus Disease 2019 (COVID19) - United States, February 12–March 16, 2020. MMWR Morb Mortal Wkly Rep 2020;69:343-346. DOI: http://dx.doi.org/10.15585/mmwr.mm6912e2
 Churches, T. Analyzing COVID19 outbreak data with R, part 1. Published online February 7, 2020
 Community mitigation guidelines to prevent pandemic influenza. https://stacks.cdc.gov/view/cdc/45220 United States, 2017
 Cori A, Cauchemez S, Ferguson NM, Fraser C, Dahlqwist E, Demarsh A, Jombart T, Kamvar ZN, Lessler J, Li S, Polonsky JA, Stockwin J, Thompson R, van Gaalen R. EpiEstim, 2019.
 Cori A, Ferguson NM, Fraser C, Cauchemez S. A New Framework and Software to Estimate Time-Varying Reproduction Numbers During Epidemics. Am J Epidemiology, 2013
 Delamater PL, Street EJ, Leslie TF, Yang T and Jacobsen KH. (2019). Complexity of the Basic Reproduction Number (R0). CDC Emerging Infectious Diseases, 25(1), January 2019
 Dorling, D. (1996). Area Cartograms: Their Use and Creation. In Concepts and Techniques in Modern Geography. Catmog, 59.
 Dougenik JA, Chrisman NR, Niemeyer DR. (1985). An Algorithm to Construct Continuous Area Cartograms. Professional Geographer, 37(1), 75-81.
 Draugelis M and Hanish A. CHIME comparison with Imperial College COVID19 Publication March 18, 2020
 Du Z, Xu X, Wu Y, Wang L, Cowling BJ, Meyers LA. Serial Interval of COVID19 among Publicly Reported Confirmed Cases. CDC Emerging Infectious Diseases, Vol 26(6), June 2020.
 Fauci AS, Lane HC, Redfield RR. Covid19 - Navigating the Uncharted. NEJM March 26, 2020; 382:1268-1269. DOI: 10.1056/NEJMe2002387
 Ferguson NM, Laydon D, Nedjati-Gilani G, Imai N, Ainslie K, Baguelin M, Bhatia S, Boonyasiri A, Cucunubá Z, Cuomo-Dannenburg G, Dighe A, Dorigatti I, Fu H, Gaythorpe K, Green W, Hamlet A, Hinsley W, Okell LC, van Elsland S, Thompson H, Verity R, Volz E, Wang H, Wang Y, Walker PGT, Walters C, Winskill P, Whittaker C, Donnelly CA, Riley S, Ghani AC. Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand. Imperial College, 16 March 2020
 Jeworutzki S, Giraud T, Lambert N, Bivand R, Pebesma E, Nowosad J. Cartogram R package. Version 0.2. CRAN 2019-12-07
 Jones, J. Notes on R0, Stanford University, 2007
 Jones, J. Models of Infectious Disease, Stanford Spring Workshop in Formal Demography, May 2008.
 Kucharski, Adam. The TED Interview, March 12, 2020
 Kucharski et al. Early dynamics of transmission and control of COVID19: a mathematical modeling study, March 11, 2020
 Lauer et al. The Incubation Period of Coronavirus Disease 2019 (COVID19) From Publicly Reported Confirmed Cases: Estimation and Application, Pubmed, March 10, 2020
 Interim pre-pandemic planning guidance : community strategy for pandemic influenza mitigation in the United States : early, targeted, layered use of nonpharmaceutical interventions. https://stacks.cdc.gov/view/cdc/11425, CDC, 2007
 O'Connell M. COVID19 : A Visual Data Science Analysis and Review TIBCO Blog, 18 March 2020
 Ridenhour, B., Kowalik, J. and Shay, D. Unraveling R0: Considerations for Public Health Applications. Am J Public Health. Doi: 10.2105/AJPH.2013.301704. Published online February 2014
 Riou J, Hauser A, Counotte MJ, Althaus CL. Adjusted Age-Specific Case Fatality Ratio during the COVID19 Epidemic in Hubei, China, January and February 2020. 3 March 2020, Preprint.
 Ruan S. Likelihood of survival of coronavirus disease 2019. March 30, 2020 DOI: https://doi.org/10.1016/S1473-3099(20)30257-7
 Spiegelhalter D. How much 'normal' risk does Covid represent? Medium
 Stanway, A. Real Time COVID19 Tracking. Medium, March 14
 Verity R, LC Okell, I Dorigatti, P Winskill, C Whittaker, N Imai, G Cuomo-Dannenburg, H Thompson, P Walker, H Fu, A Dighe, J Griffin, A Cori, Marc Baguelin, Sangeeta Bhatia, Adhiratha Boonyasiri, ZM Cucunuba, R Fitzjohn, KAM Gaythorpe, W Green, A Hamlet, W Hinsley, D Laydon, G Nedjati-Gilani, S Riley, S van Elsland, E Volz, H Wang, Y Wang, X Xi, C Donnelly, A Ghani, N Ferguson. Estimates of the severity of COVID19 disease. doi: https://doi.org/10.1101/2020.03.09.20033357
 Wilson N, Kvalsvig A, Barnard LT, Baker MG. Case-Fatality Risk Estimates for COVID19 Calculated by Using a Lag Time for Fatality. CDC EID Journal. Volume 26, Number 6, June 2020.
 Wu JT, Leung K, Bushman M, Kishore N, Niehus R, de Salazar PM, Cowling BJ, Lipsitch M, Leung GM: Estimating clinical severity of COVID19 from the transmission dynamics in Wuhan, China, Nature Medicine, March 19, 2020
 Zhaoyang R, Sliwinski MJ, Martire LM, Smyth JM. (2018). Age Differences in Adults’ Daily Social Interactions: An Ecological Momentary Assessment Study. Psychol Aging. 2018 Jun; 33(4): 607–618.
Websites with data updates
 1Point3Acres: COVID19 in US and Canada
 Johns Hopkins: Coronavirus Resource Center
 KCDC: Daily cases update from Korea
 Our World in Data: Coronavirus Testing – Source Data
 Wikipedia: Case data for US States
 World Health Organization: Coronavirus situation reports
Twitter feeds
 Trevor Bedford : @trvrb
 Nextstrain : @Nextstrain
 Hannah Ritchie : @_HannahRitchie
 Eric Topol : @EricTopol
 Adam Kucharski : @AdamJKucharski
 Sam Abbott : @seabbs
Michael O'Connell, Ph.D., is the chief analytics officer at TIBCO, where he helps clients with analytics software applications that drive business value. He has written a bunch of scientific papers and software packages on statistical methods. He also likes listening to electronic music; watching basketball, football and cricket; going to art galleries and walking around neighborhoods.  
Neil Kanungo is a Data Scientist at TIBCO and specializes in data visualization and business analytics. He helps deliver unique solutions to industry’s biggest challenges. Neil takes a special interest in operationalizing analytics across organizations at multiple levels, and in fostering user engagement. In his free time, Neil enjoys hiking with his dog, live music, and playing pinball.  
Peter Shaw is a data scientist in the TIBCO Data Science team, based in Seattle. His interests include computational geoanalytics, mapping, pattern recognition, optimization, time series and routing. He views data science as a contact sport, with the analyst, the data, and analytical models as the players. Other interests include photography, drawing, music, and partner dancing.  
Prem Shah is a data scientist working in the Data Science Team at TIBCO based out of their Seattle office. He has a strong inclination to figure out data driven and automated solutions and wants to work with new technologies to get insights. He likes to play the keyboard in his spare time and usually is working on pet projects that involve combining deep learning with his interests. 