Netflix announced a game associated with its popular show The Queen’s Gambit last month. The announcement reminded me of a class assignment in which I evaluated whether there is any relationship between the TV show and the popularity of chess in America.
In this project, we analyzed the interplay between the increase in downloads of chess.com on iOS and Android devices and the launch of the popular Netflix series The Queen’s Gambit in 2020. Our approach was to study the following three phenomena to evaluate this interplay:
- Duration dependence (time-varying hazard) in the download propensities of Android and iOS users, and heterogeneity across users.
- Impact of other time-varying covariates, in addition to the launch of the series, that may affect the underlying propensity to download chess.com.
- Latent class/finite mixture models to determine segments with different types of propensities to explain the behavior of the users.
Note: All modeling was done in Excel by estimating the parameters of probability distributions fitted to chess.com downloads. Details on the distributions considered are described below.
Summary of findings
- In this analysis we fit and evaluated 4 classes of models (Pareto II, Burr XII, Burr XII + covariates, and Weibull-gamma latent class) to explain the interplay between chess popularity and The Queen’s Gambit’s viewership.
- Data transformations and covariates:
  - We accounted for the lingering effects of TV show viewership over multiple weeks by applying exponential decay, and for the promotional effects of the show by applying weights.
  - We accounted for macro factors by collecting Google search trends data on the keyword ‘chess’ and by including participation data from chess events conducted by the US Chess Federation.
  - We standardized all covariates so that we can compare the level of impact each has on the observed data.
- Model findings:
  - We believe WG+cov3 (Weibull-gamma with 3 covariates: viewership of The Queen’s Gambit (TQG), internet trends and chess events in the US) is the best model to explain the behavior of both iOS and Android users.
  - r (the shape parameter of the distribution of lambdas across users) is fairly large for iOS users, indicating that iOS users are more homogeneous than Android users. Additionally, c (the duration dependence parameter) is greater than 1, indicating positive duration dependence.
  - The Weibull-gamma model with covariates and the latent class model say very similar things in different ways for iOS and Android users. The covariate model suggests that Netflix viewership, along with a few macro trends, explains variation in chess.com downloads. Similarly, the 2-segment shifted latent class model suggests that a new segment of users with positive duration dependence arises during the same period as the launch of the show.
- The Queen’s Gambit won’t have a second season (link), and the announcement of a game associated with TQG is refreshing, as we hope it will trigger fresh interest in the game. Our analysis shows that chess events have a positive impact on the popularity of the game. Hence, we hope that the US Chess Federation will continue to conduct more events popularizing the game.
Modeling
High level question
We are interested in understanding the sudden increase in the popularity of chess by studying the increase in chess.com downloads on iOS and Android from Q3 2020 to Q1 2021 (ref: New York Times (11/23/2020), Business Insider (11/24/2020), Variety (11/25/2020)). This understanding will help us evaluate the demand for chess games and how it changes across different types of users during different time periods.
Data considerations and covariates
Below is a brief description of the dataset and various transformations used for this analysis.
Starting dataset
Our starting dataset consists of the number of downloads of chess.com on iOS and Android, collected from App Annie. We also have an unofficial scaled number of views of The Queen’s Gambit from Netflix (shared with The Wharton School). This dataset is left truncated, and we treat it as if chess.com had launched in the first week of the available data. As shown in the chart below, adoption of chess.com on both Android and iOS is increasing over time. This suggests that the underlying propensity to download chess.com is possibly positively duration dependent, meaning that as time passes, an individual who has not yet downloaded chess.com becomes more likely to do so (a.k.a. absence makes the heart grow fonder).
Transformations and additional covariates
Accommodating lingering effects of seeing a show
We know that platforms such as Netflix launch an entire series on a particular day and viewers often binge-watch the whole series. The Queen’s Gambit had 7 episodes of ~45-60 mins in its first season. We can expect a typical user to watch it over 1-2 weeks, with the strongest impact of the show persisting for ~4 weeks. Hence, viewers who started watching the show in week 1 probably felt a strong lingering effect of the show for about 4 weeks. We can think of this effect as similar to how radioactive substances decay over time, which is often modeled with an exponential decay function.
Applying exponential decay (see appendix), the effect of the show in any given week is the current week’s viewership plus the decayed effect from previous weeks (we included decay for up to 8 weeks). Calculation below:
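The original calculation was done in Excel. As a rough Python sketch of the same idea, assuming the 4-week half-life from the appendix and an 8-week look-back window (the function name `decayed_viewership` and the sample numbers are hypothetical):

```python
import math

HALF_LIFE_WEEKS = 4   # assumption from the appendix: the effect halves after ~4 weeks
MAX_LAG_WEEKS = 8     # decayed effects are carried forward for up to 8 weeks
K = math.log(0.5) / HALF_LIFE_WEEKS  # decay rate constant, about -0.1733


def decayed_viewership(views):
    """For each week, return current views plus decayed views from up to 8 prior weeks."""
    effect = []
    for t in range(len(views)):
        total = 0.0
        for lag in range(MAX_LAG_WEEKS + 1):
            if t - lag < 0:
                break  # no data before the first observed week
            total += views[t - lag] * math.exp(K * lag)
        effect.append(total)
    return effect


# Hypothetical scaled weekly views: nothing before launch, then a spike that fades.
weekly_views = [0, 0, 100, 60, 30, 10]
print([round(x, 1) for x in decayed_viewership(weekly_views)])
```

With this weighting, a week's viewership still counts at half strength four weeks later and at a quarter strength eight weeks later, after which it is dropped.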
Accommodating promotional effects in the weeks before launch of a show
Show producers usually run extensive campaigns to promote their show, either on Netflix or on other platforms, a few weeks before the start date. In this analysis, we assume that ~10% of the users who watched the show in a given week were already aware of, and hooked on, the concepts of the show a week before watching it. We include this by adding 10% of the following week’s viewership to a given week. See calculations below:
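A minimal Python sketch of this one-week lead adjustment, assuming the 10% share is applied to the raw weekly views (the function name `add_promo_lead` and the numbers are hypothetical, and the original work was done in Excel):

```python
PROMO_SHARE = 0.10  # assumption from the text: ~10% of next week's viewers were already primed


def add_promo_lead(views, share=PROMO_SHARE):
    """Add `share` of the following week's views to each week (the last week gets no lead)."""
    adjusted = []
    for t, v in enumerate(views):
        next_week = views[t + 1] if t + 1 < len(views) else 0.0
        adjusted.append(v + share * next_week)
    return adjusted


weekly_views = [0, 0, 100, 60, 30]  # hypothetical scaled views
print(add_promo_lead(weekly_views))
```

In this sketch the week before launch picks up a small non-zero effect even though nobody has watched the show yet, which is the point of the adjustment.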
Additional covariates
To understand the interplay between the launch of The Queen’s Gambit (TQG) and the popularity of chess.com, we need to incorporate other covariates that may explain the rise in popularity of chess. In this analysis, we have included the following covariates:
- Chess events conducted in the US (size of chess events) – We collected data from the US Chess Federation (link) to determine the general popularity of chess during our study weeks by summing the number of players who participated in tournaments conducted by the federation.
- Search index from Google’s keyword search trends (internet trends) – We collected daily search trends on the keyword ‘chess’ from Google Trends (link) and took a weekly average for our study weeks. As per Google’s definition, search trends are scaled from 0 to 100, where 100 represents the highest popularity. Thus an average across 7 days, representing the popularity of the keyword in a given week, gives us a directional sense of the popularity of chess in the US. These search trends may be influenced by various macro factors, including chess events conducted in the US and the Netflix launch of TQG. Hence, this covariate helps us separate out the impact of other factors that are not directly observed.
Additionally, we looked at adding weighted dummy variables for seasonality during Christmas, to account for the free time people have, and for the first few weeks of the analysis, to capture any lingering unobserved effects from weeks prior to our study period (to address left truncation). While these variables gave us a better fit in the calibration period, the gains were not sufficient to justify their inclusion in the final model used to explain the popularity of chess.com.
Standardization of the covariates
We want to be able to interpret the impact of each covariate relative to the others in our model. This required us to standardize each covariate by calculating a z-score for each data point using the following formula:
$z = \frac{X - \mu}{\sigma}$, where $X$ = value of the covariate, $\mu$ = mean across all weeks, $\sigma$ = standard deviation across all weeks. Applying this operation before model building put all variables on the same scale, so that we can compare the impact of each covariate with that of the others.
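As a small illustration of the z-score step in Python (the source does not say whether the population or sample standard deviation was used; this sketch assumes the population version, and `standardize` is a hypothetical helper name):

```python
import statistics


def standardize(values):
    """Return z-scores: (x - mean) / standard deviation, computed across all weeks."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population SD; an assumption, not confirmed by the source
    return [(x - mu) / sigma for x in values]


# Hypothetical weekly values of one covariate.
weekly_search_index = [2, 4, 4, 4, 5, 5, 7, 9]
print(standardize(weekly_search_index))
```

After this step every covariate has mean 0 and standard deviation 1, so a coefficient can be read as the effect of a one-standard-deviation move in that covariate.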
Updated trend plot after standardization:
Calibration and forecasting periods
We use the first 24 weeks of the data to build and calibrate model parameters. The last 3 weeks are used for forecasting and evaluating how our model does on unobserved data. Though we would have liked to validate our forecasts over more than 3 weeks, shrinking the training dataset further, when we are trying to learn as many as 7 parameters (for some models) from 27 weeks of data, would have made the results uninterpretable.
Model building
Individual level, mixing and mixture distribution
We modeled the behavior of iOS and Android users separately. For each set of users, we started by assuming a simple memoryless exponential adoption distribution at the individual level, along with a gamma mixing distribution to account for heterogeneity. This gave us a Pareto II mixture model with parameters r and alpha. This model did not fit well with our story, as adoption of chess.com was increasing over time, so assuming that the individual-level propensity to download chess.com is not duration dependent seemed implausible.
This encouraged us to explore the Weibull distribution, which has a duration dependence parameter, for the individual-level adoption probability, as described by the following CDF and hazard function:
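For reference, a sketch of these two functions, assuming the common parameterization in which $\lambda > 0$ is a user's baseline propensity and $c > 0$ is the duration dependence parameter (the symbols follow the parameter names used in the findings; the exact form used in the original spreadsheet is an assumption):

```latex
F(t \mid \lambda, c) = 1 - e^{-\lambda t^{c}}, \qquad
h(t \mid \lambda, c) = c\,\lambda\,t^{\,c-1}
```

Under this form, $c = 1$ gives back the memoryless exponential, $c > 1$ gives a hazard that rises with time (positive duration dependence), and $c < 1$ gives a hazard that falls.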
We assumed a gamma distribution to account for heterogeneity across users and used the following function (Weibull-gamma, a.k.a. Burr XII) to estimate the probability of adoption in each week of our calibration period.
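A sketch of that mixture, assuming an individual-level Weibull CDF of the form $1 - e^{-\lambda t^{c}}$ and a gamma distribution over $\lambda$ with shape $r$ and rate $\alpha$ (this parameterization is an assumption chosen to match the parameter names r, alpha and c used in this post):

```latex
F(t \mid r, \alpha, c)
  = \int_{0}^{\infty} \left(1 - e^{-\lambda t^{c}}\right)
    \frac{\alpha^{r} \lambda^{r-1} e^{-\alpha \lambda}}{\Gamma(r)} \, d\lambda
  = 1 - \left( \frac{\alpha}{\alpha + t^{c}} \right)^{r}
```

The probability of adopting in week $t$ is then $F(t) - F(t-1)$, and setting $c = 1$ recovers the Pareto II model.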
The above model did not do well in specific weeks when the volume of downloads rose sharply relative to the usual download trend. Hence, we incorporated the previously described time-varying covariates to get a better fit. This resulted in the following individual-level CDF, where B(t) incorporates the impact of the time-varying covariates:
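One standard way to write this, assuming a proportional-hazards style adjustment in which $x(i)$ is the vector of standardized covariates in week $i$ and $\beta$ the vector of coefficients (this exact form is an assumption, not confirmed by the source):

```latex
F(t \mid \lambda, c, \beta) = 1 - e^{-\lambda B(t)}, \qquad
B(t) = \sum_{i=1}^{t} \left[ i^{c} - (i-1)^{c} \right] e^{\beta^{\top} x(i)}
```

When $\beta = 0$ the sum telescopes to $B(t) = t^{c}$, which is the plain Weibull case, so the covariates act by stretching or compressing time in the weeks where they are high or low.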
Next, by assuming that the individual-level lambdas follow a gamma distribution, we arrive at the final mixture model (WG+cov, or Burr XII + cov), defined as:
We tried different combinations of the covariates: one at a time, in pairs, and all together. The parameter estimates shown below are for the best model from each combination size (for example, among the single-covariate models, the internet trends covariate had the best fit, so that model is the one shown in the comparison table).
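As a hedged Python sketch of this mixture, assuming the form $F(t) = 1 - (\alpha / (\alpha + B(t)))^{r}$ with $B(t) = \sum_{i \le t} (i^{c} - (i-1)^{c}) e^{\beta^{\top} x(i)}$ (the function name and the parameter values are hypothetical, not the fitted estimates):

```python
import math


def burr_xii_cov_cdf(r, alpha, c, betas, covariates):
    """Cumulative adoption probability F(t) for weeks t = 1..T.

    Assumed form: F(t) = 1 - (alpha / (alpha + B(t)))**r, with
    B(t) = sum over i <= t of (i**c - (i-1)**c) * exp(beta . x(i)).
    `covariates` is a list of per-week covariate vectors x(1), ..., x(T).
    """
    cdf = []
    b_t = 0.0
    for i, x in enumerate(covariates, start=1):
        lift = math.exp(sum(b * xi for b, xi in zip(betas, x)))
        b_t += (i ** c - (i - 1) ** c) * lift
        cdf.append(1.0 - (alpha / (alpha + b_t)) ** r)
    return cdf


# Hypothetical parameters and three weeks of two standardized covariates.
x = [[-0.5, 0.1], [1.2, 0.4], [0.3, -0.2]]
print(burr_xii_cov_cdf(r=2.0, alpha=400.0, c=1.4, betas=[0.3, 0.1], covariates=x))
```

Weekly adoption probabilities follow by differencing consecutive values of the returned list.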
Lastly, we also explored using finite mixture/latent class models in which we created segments with different levels of heterogeneity and duration dependence within each segment as described with the CDF below:
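A sketch of such a CDF, assuming each of the $S$ segments follows its own Weibull-gamma model and $\pi_s$ is the share of users in segment $s$ (whether the original latent class models also carried covariates is not stated, so none are shown here):

```latex
F(t) = \sum_{s=1}^{S} \pi_s \left[ 1 - \left( \frac{\alpha_s}{\alpha_s + t^{c_s}} \right)^{r_s} \right],
\qquad \sum_{s=1}^{S} \pi_s = 1
```

For a segment that only becomes active at some week $t_0$, one natural choice (again an assumption) is to evaluate that segment's term at $t - t_0$ and set it to zero for $t \le t_0$.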
In this model, we introduced the 2nd segment in week 8 (shifted LCM), when Netflix launched TQG, to account for users who would otherwise not have downloaded a chess app. The comparison table below shows the finite mixture model that gave us the lowest Bayesian information criterion.
Model parameter comparison
We used the above model definitions and estimated model parameters by maximizing the log likelihood of the observed data. Below is a comparison of the parameters estimated under the previously described models. Our interpretation of the trend is based on the best model, selected on grounds of robustness (measured via calibration-period and forecast-period Mean Absolute Percent Error, MAPE), parsimony (measured by the number of parameters in the model and the Bayesian information criterion, BIC) and quality of fit (measured by log likelihood, R-squared and the significance of the chi-square error). All of the models had a very high R-squared, indicating that they were good at explaining variance in the data, and also had a high chi-square p-value if we ignore the first 2-3 weeks. Hence the quality of fit was mostly determined by studying log likelihood changes, as displayed below.
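To make these criteria concrete, here is a minimal Python sketch for the covariate-free Weibull-gamma model. It assumes grouped weekly data, a known population size, and the CDF form $1 - (\alpha / (\alpha + t^{c}))^{r}$. The population size, the choice of observation count for BIC, and all numbers are hypothetical, and the original estimation was done in Excel.

```python
import math


def wg_cdf(t, r, alpha, c):
    """Weibull-gamma (Burr XII) CDF, assumed form 1 - (alpha / (alpha + t**c))**r."""
    return 1.0 - (alpha / (alpha + t ** c)) ** r


def log_likelihood(params, weekly_downloads, population):
    """Grouped-data log likelihood: weekly adopters plus users who have not adopted yet."""
    r, alpha, c = params
    ll = 0.0
    for t, n in enumerate(weekly_downloads, start=1):
        p_week = wg_cdf(t, r, alpha, c) - wg_cdf(t - 1, r, alpha, c)
        ll += n * math.log(p_week)
    survivors = population - sum(weekly_downloads)
    ll += survivors * math.log(1.0 - wg_cdf(len(weekly_downloads), r, alpha, c))
    return ll


def bic(ll, n_params, n_obs):
    """Bayesian information criterion; lower is better."""
    return -2.0 * ll + n_params * math.log(n_obs)


def mape(actual, predicted):
    """Mean absolute percent error, in percent."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data: three weeks of downloads out of an assumed population of 10,000.
downloads = [50, 80, 120]
ll = log_likelihood((1.0, 500.0, 1.5), downloads, population=10_000)
print(ll, bic(ll, n_params=3, n_obs=10_000), mape([100, 200], [110, 180]))
```

In practice the parameters would be chosen by handing the negative of `log_likelihood` to a numerical optimizer; nested models can then be compared on log likelihood and BIC, and MAPE computed separately on the calibration and forecast weeks.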
Parameters for ios users
*Note: All covariates are highly significant with a very low p-value. All nested models have a significant LRT p-value.
Findings for ios users
- We believe WG+cov3 (Weibull-gamma with 3 covariates: TQG viewership, internet trends and chess events in the US) is the best model to explain the behavior of iOS users. It has a high log likelihood and a low BIC compared to other models. WG+cov4 (which adds a covariate for Christmas seasonality) has a higher LL but does not justify the additional parameter (parsimony), as the MAPEs do not change compared to the 3-covariate model.
- r is fairly large for iOS users, indicating that iOS users are more homogeneous than Android users. Additionally, c (the duration dependence parameter) is greater than 1, indicating positive duration dependence and confirming our hypothesis. c is also less than 2, which suggests that although the likelihood of downloading for a user who hasn’t downloaded yet rises over time, it rises at a decreasing rate.
- It seems like internet trends have a higher influence on chess.com downloads than other time varying covariates. This covariate includes some of the influence from TQG viewership as users usually search after watching a show. However, it can capture additional macro trends.
- The size of chess events has a very small but significant influence.
- The latent class model does really well on MAPE and has some interesting learnings:
  - The best LCM gives us 2 segments, one of which is shifted to start when the Netflix show launched.
  - The first segment is fairly homogeneous and memoryless (with c=0.99). This segment represents 84% of the users, whose willingness to download (if they haven’t downloaded) does not change over time.
  - The second segment has a lot of heterogeneity and positive duration dependence (c=4.58). This segment represents 16% of the users, whose willingness to download increases if they haven’t downloaded yet. It is perhaps safe to assume that the second segment originated from the first one due to the launch of the show.
- Both the WG+cov model and the latent class model say very similar things in different ways for iOS users. The covariate model suggests that Netflix viewership, along with a few macro trends, explains variation in chess.com downloads. Some of these macro trends may have been influenced by the show, and the ones that were not have a small impact. Similarly, the 2-segment shifted latent class model suggests that a new segment of users with positive duration dependence arises during the same period as the launch of the show.
Below is a CDF plot of the actual vs estimated downloads for ios users (WG+Cov3, WG+Cov4 and WG + 2 segment shifted are all pretty indistinguishable in the plot):
Parameters for Android users
*Note: All covariates are highly significant with a very low p-value. All nested models have a significant LRT p-value.
Findings for android users
- We again believe that WG+cov3 (Weibull-gamma with 3 covariates: TQG viewership, internet trends and chess events in the US) is the best model to explain the behavior of Android users. It has a high log likelihood and a low BIC compared to other models. The LCM (WG + 2 segment shifted) has a higher LL but does not justify the additional parameters (parsimony), as we believe it is possibly over-fitting during the calibration period.
- r is large for Android users but smaller than for iOS users, indicating that Android users are more heterogeneous. Additionally, c (the duration dependence parameter) is greater than 1, indicating positive duration dependence and confirming our hypothesis. c is also greater than 2, which suggests that for those who haven’t downloaded yet, the propensity to download is increasing at an increasing rate.
- It seems like internet trends have a higher influence on chess.com downloads than the other time-varying covariates. This covariate includes some of the influence of TQG viewership, as users usually search after watching a show. However, it can also capture additional macro trends. Additionally, the coefficient of TQG viewership is negative for Android users. This possibly suggests that there is a lag between when Android users view the show and when they download the app (our time decay approach was not able to capture this well). This is explained further in the interpretation of the LCM.
- The size of chess events has a very small but significant influence.
- The latent class model does really well on MAPE and has some interesting learnings:
  - The best LCM gives us 2 segments, one of which is shifted to start when the Netflix show launched.
  - The first segment is fairly homogeneous and memoryless or somewhat negatively duration dependent (with c=0.96). This segment represents only 6% (compared to 84% for iOS) of the users, whose willingness to download (if they haven’t downloaded) does not change or reduces over time.
  - The second segment has a lot of heterogeneity (very low r) and positive duration dependence (c=6.59). This segment represents 94% of the users, whose willingness to download increases if they haven’t downloaded yet. It is perhaps safe to assume that the second segment originated from the first one due to the launch of the show.
  - This segment represents a large portion of Android users, and their willingness to download if they haven’t downloaded yet is increasing at a fast rate. These users are the reason we see a negative relationship between TQG viewership and app downloads in our WG+cov model. As time passes after they view the show, a larger portion of these users is expected to download the app, which in turn shows up as a negative relationship with TQG viewership.
- Similar to iOS users, both the WG+cov model and the latent class model say very similar things in different ways for Android users. The covariate model suggests that Netflix viewership, along with a few macro trends, explains variation in chess.com downloads. Some of these macro trends may have been influenced by the show, and the ones that were not have a small impact. The relationship between TQG viewership and app downloads is negative, as the show seems to have created a large new segment of users with very high positive duration dependence, as observed in the LC model. These users are likely to download the app eventually but don’t do it immediately.
Below is a CDF plot of the actual vs estimated downloads for Android users (WG+Cov3, WG+Cov4 and WG + 2 segment shifted are all pretty indistinguishable in the plot):
Discussions and interpretations
We have concluded that TQG viewership impacted chess.com downloads on both iOS and Android. However, there are other macro factors that explain some of the downloading behavior. Interestingly, it seems like the impact of TQG is delayed for Android users, as it resulted in a fairly large segment of positively duration dependent users whose propensity to download, if they haven’t downloaded, increases as time passes. Often these downloads are observed much later than the peak viewership season.
In this analysis we haven’t looked at viewership of TQG by device ownership, which could have shed more light on the delay in app downloads on Android. We have also not considered the impact of social contagion and network effects, where app downloads or viewership are influenced by the network a person is part of. We could also learn more if we had individual-level data on viewership and app downloads.
Conclusion and next steps
It is interesting to study how a TV show can impact the popularity of a sport or game. We have often observed the reverse relationship, where a sporting event (such as the soccer World Cup or the Olympics) resulted in increased viewership of TV shows and movies on related topics. The Queen’s Gambit won’t have a second season (link), and the announcement of a game associated with TQG is refreshing, as we hope it will trigger fresh interest in the game. Our analysis shows that chess events have a positive impact on the popularity of the game. Hence, we hope that the US Chess Federation will conduct more events and continue popularizing the game.
Appendix
Exponential decay determination for lingering effect of tv show viewership
We can represent the lingering effect, felt at time t, of watching a show at time 0 using the formula:
$A(t) = A_0 e^{Kt}$
Here $A_0$ is the effect of watching the show at time 0 and $K$ is the decay rate constant that determines how much of the effect remains with a user after time t. According to our earlier assumption that the effect stays strong until about week 4, we can take the half-life of the show in the user’s mind to be ~4 weeks. This half-life of 4 weeks can be used to determine the value of K using the following derivation (more details here):
$\frac{A_0}{2} = A_0 e^{K t_{1/2}} \implies K = \frac{\ln(1/2)}{t_{1/2}}$
Replacing $t_{1/2}$ with 4, we get K ≈ -0.1732867951.
Additional things considered
We tried incorporating seasonality by adding dummy variables with different weights during the first few weeks of the study period and during Christmas. The dummy variable for the first few weeks was intended to capture any behavior from past unobserved data, and the dummy variable for Christmas was meant to capture any impact of the free time people have during the holidays. Both models gave us a great fit (probably overfitting) but did not justify the need for additional parameters. CDFs below: