By Brijesh Jani, senior data scientist, analytics, University of Chicago, and Justin Cox, prospect management analyst, University of Chicago
Advanced machine learning concepts can seem like a domain reserved only for large companies that need complex algorithms, like Netflix and Amazon. We’re here to show otherwise. Our traditional fundraising problems can indeed be tackled using these very same concepts. We just need to identify the right opportunities to use them.
Where’s the Opportunity?
Like many fundraising shops, a significant portion of our fundraising success at the University of Chicago is attributed to our managed gift program. In this program, our analysts and gift officers partner to cultivate prospects, plan solicitations, propose gifts and track what actually happened throughout the lifecycle of each solicitation. Having just closed a large capital campaign, we had five years’ worth of relatively recent and robust solicitation data that we could use to analyze our fundraising and managed gift practices.
Through this analysis we identified two opportunities to improve. First, we had not been providing actionable guidance on how to better manage a solicitation to increase its probability of closing successfully. Second, we were dependent on gift officer feedback to forecast how managed gifts would contribute to fundraising progress, especially at the highest levels of giving. We needed the following:
- A better system to recommend changes that gift officers could make in the management of a solicitation, to increase its probability of closure
- More reliable predictions of which solicitations were most likely to close
How Do We Move Forward With This Opportunity?
To address these two issues, we needed to dig into our solicitation data even further to determine which factors of managed gifts were most significantly related to the closing of these solicitations. Based on our findings, we could then recommend adjustments to these factors to better manage, and more reliably identify, those gifts that would help us reach our fundraising goals. The statistical methods through which we obtained these findings might seem familiar: logistic regression and random forest (RF).
Both logistic regressions and RFs can comb through patterns in data to help answer a probability-based question of interest (e.g., are your prospects likely to donate?). The key difference, which ultimately led us to use both methods, is a trade-off: a logistic regression gives us a good prediction with a relatively easy-to-interpret algorithm, while an RF generally yields more accurate predictions that are harder to interpret and explain.
We realized we could combine the classification accuracy of an RF model with our knowledge of the average lifespan of solicitations at different gift levels, allowing us to forecast how much we’d raise by the end of the year. Concurrently, we could use the explicable nature of a logistic regression algorithm to create an interactive recommendation tool, which would allow us to help gift officers better manage their solicitations.
Using R, an open-source statistical programming environment, we went through several iterations of running our data through both algorithms and arrived at a final set of factors and models. Because we were interested in creating recommendations for improved pipeline management, we focused primarily on factors that were within our ability to change (e.g., how often someone is contacted, as opposed to someone’s age). Note that the following is not a comprehensive list of the factors that went into our models but simply a set of examples to give an idea of the types of factors included:
- Time lapse factors: These addressed questions about the solicitation that dealt with “how long before” or “how long after.” For instance, what was the number of days between the opening of a solicitation and when the ask was made?
- Counts: These addressed questions about the solicitation that dealt with “how much,” “how many” and “how often.” For instance, how many solicitors were involved in the solicitation, or how many outreach attempts were made?
- Flags: These addressed other circumstances that might have influenced the success of the solicitation, like whether the prospect had other pledges to fulfill going into the new solicitation.
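The three factor types above can be sketched in a few lines of code. This is a minimal illustration in Python (the models in this article were built in R), and every field name and value below is hypothetical, not taken from the actual solicitation data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Solicitation:
    # All field names are hypothetical, chosen only for illustration.
    opened_on: date
    asked_on: Optional[date]   # None if the ask has not been made yet
    solicitor_count: int
    outreach_attempts: int
    has_open_pledge: bool


def build_factors(s: Solicitation) -> dict:
    """Turn one solicitation record into model-ready factors."""
    if s.asked_on is None:
        days_open_to_ask = None  # missing data stays missing
    else:
        days_open_to_ask = (s.asked_on - s.opened_on).days
    return {
        "days_open_to_ask": days_open_to_ask,        # time lapse factor
        "solicitor_count": s.solicitor_count,        # count
        "outreach_attempts": s.outreach_attempts,    # count
        "has_open_pledge": int(s.has_open_pledge),   # flag (0 or 1)
    }


example = Solicitation(date(2019, 1, 10), date(2019, 3, 1), 2, 5, True)
factors = build_factors(example)
print(factors)
```

Leaving a missing ask date as `None` rather than guessing a value is one possible choice; how missing data is handled in practice depends on the model.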
The following table breaks down the resulting models we obtained alongside our factors:
Note that we created two logistic models, one for <$100K solicitations and the other for $100K+, where $100K represented our Major Gift-level threshold. The reason for this setup is that a single logistic regression wouldn’t be able to pick up the nuances of managing higher level solicitations vs. lower level solicitations. We tried separating the solicitation ask amounts into more than just two bands, but this simple two sub-model approach yielded the best accuracies.
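The two sub-model setup amounts to routing each solicitation to a model based on its ask amount. A minimal Python sketch follows; only the $100K threshold comes from the article, while the sub-model names and coefficient values are placeholders made up for illustration.

```python
MAJOR_GIFT_THRESHOLD = 100_000  # the $100K Major Gift-level threshold

# Hypothetical coefficient sets; real values would come from the fitted models.
SUBMODELS = {
    "under_100k": {"intercept": -0.5, "outreach_attempts": 0.20},
    "100k_plus": {"intercept": -1.0, "outreach_attempts": 0.35},
}


def select_submodel(ask_amount: float) -> str:
    """Return the key of the sub-model that should score this solicitation."""
    if ask_amount >= MAJOR_GIFT_THRESHOLD:
        return "100k_plus"
    return "under_100k"


for amount in (25_000, 100_000, 5_000_000):
    key = select_submodel(amount)
    print(amount, "->", key, SUBMODELS[key])
```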
Shifting our focus to the RF forecasting model, we decided it wasn’t necessary to create sub-models. Although its overall accuracy was not much higher than that of the logistic regressions, the RF was better able to capture the nuances of managed gifts at different solicitation levels. For instance, when we tested both approaches on predicting the outcomes of past solicitations in our Principal Gift (PG) band of $5 million+, the RF reached an accuracy of 84%, considerably outperforming the regressions, which reached only 70%. These PGs make up a small number of our managed gifts but contribute a significant portion of the dollars we raise. Thus, being able to accurately predict which of these gifts will successfully close will go a long way toward improving the accuracy of our fiscal year forecasts.
How Do We Share This Opportunity?
With our models trained and finalized, we proceeded to create our tools and visualize our results.
First, we used our logistic regressions to create two Excel workbooks, one for each model. They were designed to help generate recommendations for managed gifts by focusing, row by row, on the factors surrounding each individual solicitation along with its probability of closing successfully.
Figure 1 below is an image taken from one of these workbooks. Note how column AE, which displays the probabilities, also houses the underlying logistic model algorithm. This allows the user to manually modify the factors and see the immediate effect on those probabilities, which is how we’d generate our recommendations. For example, we could say that if the gift officer cultivates the prospect in a more personalized manner several times by phone or letter, as opposed to email, they increase the chances of closing successfully by X%.
Figure 1. This view is of just one tab in the entire workbook where many columns have been hidden and renamed to simplify the example. There are many other pieces of functionality within the workbook, including color coding that tells the user what factors seem to be hurting or helping the chances of closing that solicitation successfully, several simple graphs to give an overview of the open solicitation pipeline and definitions for each of the factors that feed into the algorithm.
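The what-if behavior of the probability column can be sketched outside a spreadsheet as well. The Python example below is an illustration under stated assumptions, not the workbook's actual formula: the factor names and coefficients are invented, and only the general form (a logistic function applied to a weighted sum of factors) reflects how a logistic regression produces a probability.

```python
import math

# Hypothetical coefficients for illustration only; not fitted values.
COEFFICIENTS = {
    "intercept": -1.2,
    "phone_or_letter_contacts": 0.40,
    "email_contacts": 0.05,
    "has_open_pledge": -0.60,
}


def close_probability(factors: dict) -> float:
    """Logistic model: sigmoid of the intercept plus the weighted factors."""
    z = COEFFICIENTS["intercept"] + sum(
        COEFFICIENTS[name] * value for name, value in factors.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def what_if(factors: dict, **changes) -> float:
    """Change in close probability after overriding some factors."""
    return close_probability({**factors, **changes}) - close_probability(factors)


current = {"phone_or_letter_contacts": 0, "email_contacts": 3, "has_open_pledge": 1}
delta = what_if(current, phone_or_letter_contacts=3, email_contacts=0)
print(f"Change in probability of closing: {delta:+.1%}")
```

Because each coefficient has a fixed sign and size, a user can see directly which factor moved the probability and in which direction. That transparency is why the logistic model suits a recommendation tool.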
We hope to improve these tools based on the feedback we get from user testing. As of now, the user is required to feed currently open solicitation data into the workbooks from a data extract that gets refreshed on a weekly basis. In future iterations, however, we’d like to create a completely automated, extract-free tool in which the user would simply access a single dashboard that gives them all the necessary functionality, if not more.
Our second set of outputs transformed the RF model’s classifications into Tableau charts that allow the user to see which areas of the university need help with regard to managed gifts. Figure 2 below is an example of just one breakdown of a certain unit’s pipeline, showing the user how many of their solicitations are likely to close, likely to fail or missing key data (thus yielding no prediction).
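The pipeline breakdown described above is essentially a count of solicitations per unit and outcome category. A minimal Python sketch follows, assuming each classification arrives as `True`, `False` or `None` (no prediction); the unit names and records are made up.

```python
from collections import Counter
from typing import Optional


def label(prediction: Optional[bool]) -> str:
    """Map a model classification to one of the three chart categories."""
    if prediction is None:
        return "missing key data"
    return "likely to close" if prediction else "likely to fail"


# Hypothetical (unit, classification) pairs standing in for model output.
predictions = [
    ("Unit A", True),
    ("Unit A", False),
    ("Unit A", None),
    ("Unit A", True),
    ("Unit B", True),
]

summary = Counter((unit, label(pred)) for unit, pred in predictions)
for (unit, category), count in sorted(summary.items()):
    print(f"{unit}: {count} {category}")
```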
We’re still working toward visualizing the forecasting piece, again using the RF model classifications in conjunction with average lifespans of solicitations per gift level, which will allow the user to see how much money is likely to be raised and from where it’s coming.
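One possible reading of the forecasting idea is sketched below in Python. It is not the method in use: the average lifespans, the fiscal year end date and the rule of counting the full ask amount are all assumptions made for illustration. Only the $100K and $5 million band boundaries come from this article.

```python
from datetime import date, timedelta

# Hypothetical average lifespans (days from open to close) per gift band.
AVG_LIFESPAN_DAYS = {"under_100k": 120, "100k_to_5m": 270, "5m_plus": 540}


def gift_band(ask_amount: float) -> str:
    """Assign a solicitation to a gift band by its ask amount."""
    if ask_amount >= 5_000_000:
        return "5m_plus"
    if ask_amount >= 100_000:
        return "100k_to_5m"
    return "under_100k"


def forecast(solicitations: list, fiscal_year_end: date) -> float:
    """Sum ask amounts of solicitations predicted to close by year end."""
    total = 0.0
    for s in solicitations:
        if not s["predicted_to_close"]:
            continue  # the classifier expects this one to fail
        lifespan = timedelta(days=AVG_LIFESPAN_DAYS[gift_band(s["ask_amount"])])
        if s["opened_on"] + lifespan <= fiscal_year_end:
            total += s["ask_amount"]  # assumption: gift closes at the ask amount
    return total


pipeline = [
    {"opened_on": date(2019, 9, 1), "ask_amount": 50_000, "predicted_to_close": True},
    {"opened_on": date(2019, 9, 1), "ask_amount": 250_000, "predicted_to_close": True},
    {"opened_on": date(2020, 3, 1), "ask_amount": 250_000, "predicted_to_close": True},
    {"opened_on": date(2019, 9, 1), "ask_amount": 75_000, "predicted_to_close": False},
]
total = forecast(pipeline, fiscal_year_end=date(2020, 6, 30))
print(f"Forecast for the fiscal year: ${total:,.0f}")
```

In this sketch the third solicitation is excluded because its expected close date falls after the year end, even though it is predicted to close eventually.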
Although this is still a project in its adolescence, we’re excited to have taken this opportunity to identify areas in need of improvement with regard to managed gifts and address them using a few advanced machine learning concepts. We’re eager to see how this project will evolve as testing continues, and we are confident it will open our eyes to other opportunities to improve some of our most important fundraising practices.
This article relates to the Data Science domain in the Apra Body of Knowledge.