Introduction

This analysis is intended to examine the relationship between site/program attendance and member renewal rate.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

The aggregated dataset "total_perfs" is necessary for simple plotting because the raw data's lowest level of detail is at the att_type value (distinguishing between General and Program admission). This distinction will be useful later, but for now we just want to see if there is an easily identifiable relationship between attendance and likelihood of renewal.

In [2]:
sns.set(style='darkgrid')
raw = pd.read_csv('mbr_data.csv')
total_perfs = raw.groupby(['customer_no', 'renew_ind', 'first_yr_ind']).sum().reset_index()

Now that we have an initial aggregate, we can aggregate that again to get counts of how many customers are assigned to each grouping of did/did not renew and how many events they attended.

In [3]:
agg = total_perfs.groupby(['num_perfs', 'renew_ind']).count().reset_index()
g = sns.relplot(x='num_perfs', y='renew_ind', size='customer_no', sizes=(15,200), data=agg,
               facet_kws={'subplot_kws': {'title': 'Renewal Rate Over Events Attended'}})
g.set_axis_labels('Events Attended', 'Renewal (0/1)')
Out[3]:
<seaborn.axisgrid.FacetGrid at 0x1df5d8fe7f0>

There does seem to be at least some relationship, but it looks like there's a large chunk of non-attending members, which makes it harder to tell at a glance if there's a difference between the renewal rates of members who attended 1 event or not. By filtering out the members whose num_perfs = 0, we can compare the size of the 1 time admittances more readily.

In [4]:
agg_no_zero = total_perfs[total_perfs['num_perfs']>=1].groupby(['num_perfs', 'renew_ind']).count().reset_index()
g = sns.relplot(x='num_perfs', y='renew_ind', size='customer_no', sizes=(15,200), 
            data=agg_no_zero, facet_kws={'subplot_kws': {'title': 'Renewal Rate Over Events Attended'}})
g.set_axis_labels('Events Attended', 'Renewal (0/1)')
Out[4]:
<seaborn.axisgrid.FacetGrid at 0x1df5da77d68>

That makes the relationship a little clearer, and it does appear that members who attend one event were more likely to renew than not for that year of data. However, these scatter plots aren't really that easy to read.

Analysis

This analysis will use a couple new columns added to total_perfs, namely bin_att_ind and att_ind. The former is a 0 or a 1, meaning that they did not attend or did attend at least 1 event. The latter is a "trinary" indicator, 0 for no attendance, 1 for they attended 1 event, and 2 if they attended 2 or more.

In [5]:
bin_att = total_perfs
bin_att["bin_att_ind"] = bin_att['num_perfs'].apply(lambda x: 1 if x > 0 else 0)
bin_att["att_ind"] = bin_att['num_perfs'].apply(lambda x: 2 if x > 1 else (1 if x==1 else 0))

Tables

In [6]:
bin_att.groupby(['bin_att_ind', 'renew_ind']).count()
Out[6]:
customer_no first_yr_ind num_perfs paid_amt att_ind
bin_att_ind renew_ind
0 0 2808 2808 2808 2808 2808
1 6487 6487 6487 6487 6487
1 0 5503 5503 5503 5503 5503
1 6414 6414 6414 6414 6414
In [7]:
bin_att.groupby(['att_ind', 'renew_ind']).count()
Out[7]:
customer_no first_yr_ind num_perfs paid_amt bin_att_ind
att_ind renew_ind
0 0 2808 2808 2808 2808 2808
1 6487 6487 6487 6487 6487
1 0 2333 2333 2333 2333 2333
1 3222 3222 3222 3222 3222
2 0 3170 3170 3170 3170 3170
1 3192 3192 3192 3192 3192

Bar Plots

Since we are interested in the ratios (renewal rate) between members who attended and who did not attend events, bar plots should provide a clearer look at the relationship than the scatter plots did.

General Attendance Analysis

We'll begin by examining member attendance of any kind.

In [8]:
g = sns.barplot(x='bin_att_ind', y='customer_no', hue='renew_ind', 
                data=bin_att.groupby(['bin_att_ind', 'renew_ind']).count().reset_index())
g.set_ylabel('Number of Members')
g.set_xlabel('Attendance Status')
g.set_xticklabels(['Did Not Attend', 'Attended'])
g.set_title('Number of Members Over Attendance Status')
Out[8]:
Text(0.5,1,'Number of Members Over Attendance Status')

Interestingly, the renewal rate for members who did not attend anything in FY17 is actually way better than members who attended 1 or more events. This is counter to what was expected, so further investigation will be worthwhile.

In [9]:
g = sns.barplot(x='att_ind', y='customer_no', hue='renew_ind', 
                data=bin_att.groupby(['att_ind', 'renew_ind'])
                            .count()
                            .reset_index())
g.set_ylabel('Number of Members')
g.set_xlabel('Events Attended')
g.set_xticklabels(['0', '1', '2+'])
g.set_title('Number of Members Over Attendance Status')
Out[9]:
Text(0.5,1,'Number of Members Over Attendance Status')

By separating out the "did attend" group into "attended 1 event" and "attended 2+" events, an interesting relationship emerges. The renewal rate for the members who attended just one event is better than the members who attended 2 or more. Perhaps there is a diminishing return here, where members who attend many events are more likely to feel like they have no further need for their membership (i.e. they've gotten the benefit they wanted, and so don't renew).

In [10]:
g = sns.barplot(x='bin_att_ind', y='customer_no', hue='renew_ind', 
                data=bin_att[bin_att['first_yr_ind']==1]
                                .groupby(['bin_att_ind', 'renew_ind'])
                                .count()
                                .reset_index())
g.set_ylabel('Number of Members')
g.set_xlabel('Attendance Status')
g.set_xticklabels(['Did Not Attend', 'Attended'])
g.set_title('Number of NEW Members Over Attendance Status')
Out[10]:
Text(0.5,1,'Number of NEW Members Over Attendance Status')
In [11]:
g = sns.barplot(x='bin_att_ind', y='customer_no', hue='renew_ind', 
                data=bin_att[bin_att['first_yr_ind']==0]
                                .groupby(['bin_att_ind', 'renew_ind'])
                                .count()
                                .reset_index())
g.set_ylabel('Number of Members')
g.set_xlabel('Attendance Status')
g.set_xticklabels(['Did Not Attend', 'Attended'])
g.set_title('Number of Longstanding Members Over Attendance Status')
Out[11]:
Text(0.5,1,'Number of Longstanding Members Over Attendance Status')

Clearly, new members renew at a much lower rate than longstanding members, which makes sense. Here we also see that new members that attended anything in FY17 renewed at 3 times the rate of new members that attended nothing.

Interestingly, the renewal rate for longstanding members who attending something is lower than that of longstanding members who attending nothing, which explains the unexpected outcome of the first bar plot. Possibly this is because members who don't use their membership for free or discounted admission are members for other, potentially more reliable reasons (e.g. mission affinity rather than desire for a deal).

Programs vs General Admission

This section will examine differences in members who attended programs vs members who came for general admission.

In [12]:
prog_compare = raw
prog_compare["bin_att_ind"] = prog_compare['num_perfs'].apply(lambda x: 1 if x > 0 else 0)
prog_compare["att_ind"] = prog_compare['num_perfs'].apply(lambda x: 2 if x > 1 else (1 if x==1 else 0))
prog_compare.head()
Out[12]:
customer_no att_type num_perfs paid_amt renew_ind first_yr_ind bin_att_ind att_ind
0 28 None 0 NaN 0 1 0 0
1 100070 Program 1 44.0 1 0 1 1
2 100176 Program 1 16.0 0 1 1 1
3 100470 General 2 16.0 1 1 1 2
4 100663 General 1 0.0 0 1 1 1
In [13]:
g = sns.barplot(x='att_type', y='customer_no', hue='renew_ind', 
                data=prog_compare.groupby(['att_type', 'renew_ind']).count().reset_index())
g.set_ylabel('Number of Members')
g.set_xlabel('Attendance Type')
g.set_title('Number of Members Over Attendance Status')
Out[13]:
Text(0.5,1,'Number of Members Over Attendance Status')
In [14]:
g = sns.barplot(x='att_type', y='customer_no', hue='renew_ind', 
                data=prog_compare[prog_compare['first_yr_ind']==1].groupby(['att_type', 'renew_ind']).count().reset_index())
g.set_ylabel('Number of Members')
g.set_xlabel('Attendance Type')
g.set_title('Number of New Members Over Attendance Status')
Out[14]:
Text(0.5,1,'Number of New Members Over Attendance Status')
In [15]:
g = sns.barplot(x='att_type', y='customer_no', hue='renew_ind', 
                data=prog_compare[prog_compare['first_yr_ind']==0].groupby(['att_type', 'renew_ind']).count().reset_index())
g.set_ylabel('Number of Members')
g.set_xlabel('Attendance Type')
g.set_title('Number of Longstanding Members Over Attendance Status')
Out[15]:
Text(0.5,1,'Number of Longstanding Members Over Attendance Status')

New members who attended programs did renew at a higher rate than general admission attendees. Longstanding members who attended programs also renewed at a slightly higher rate than general admission attendees, though we obviously see that the most favorable rate is still non-attendees.

Conclusion

It seems, then, that what can be gleaned from this is that members who joined in FY17 were more likely to renew in FY18 if they had attended something, and more likely to renew if what they attended was a program rather than general admission.

Once a member has been a member for some time, they are more likely to renew, so if more first year members can be encouraged to attend programs, perhaps they will renew and then stay on?

It will be interesting to revist this analysis at the end of this fiscal year, when we can look at two full years of member data (FY17 and FY18) and compare that with two full years of complete attendance data.