You will not need to rewrite your donor prediction models every year if you build your data files sustainably from the start.
I still remember the excitement I felt several years ago when the annual fund results began to confirm the validity of the first donor prediction model I built. As it turned out, the most time-consuming part of the project had been writing the code to calculate all the variables associated with each record: age, who is in a reunion year, years since graduation, who gave last year, who gave in each of the past five years, etc.
And then something terrible happened. With a single flip of a calendar page, suddenly it was next year, and I realized that in my haste to build the data files for the model, I had used hard-coded date values in all the programs that calculated date-dependent variables.
Also, in the excitement to add new variables while the model was being built, the code to calculate some variables was intertwined with the code used to calculate other variables (the technical term for this technique is “spaghetti code”). It was a nightmare to build the files needed for the second year of the modeling project because my software programs were not sustainable.
My definition of a sustainable program is one that can create a data file for any specified date range without the need to modify any of the code inside the program. It should also use independent subroutines (think of these as building blocks) to calculate each variable. Whoever is writing the program that will build your data file needs to know this is the requirement at the very start of the project because it affects the way she or he will write the program. Sustainable programming is important because predictive modeling is a continuous process, not a one-time event.
The most time-consuming part of any modeling project is getting to know your variables: how they relate to one another, which ones tend to produce outliers, and which ones blow up if a birth date is left out of a record in the database. In short, your variables will become old friends. When you present management with your plan for a modeling program, explaining this part of the project will help set expectations for when you might see the first results.
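As one illustration of a variable that can blow up, here is a minimal Python sketch of an age calculation that tolerates a missing birth date. The function name `age_on` and the choice to return `None` for a missing value are assumptions made for this example, not part of the original programs.

```python
from datetime import date
from typing import Optional


def age_on(birth_date: Optional[date], as_of: date) -> Optional[int]:
    """Age in whole years on the as_of date, or None if no birth date is on file."""
    if birth_date is None:
        # Missing birth date: return None rather than crashing, so the
        # modeler can decide later how to treat the gap.
        return None
    years = as_of.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred by as_of.
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


fiscal_year_end = date(2015, 6, 30)
print(age_on(date(1965, 7, 1), fiscal_year_end))
print(age_on(None, fiscal_year_end))
```

Passing the reference date in as a parameter, rather than calling `date.today()` inside the function, is what lets the same code produce correct ages for any fiscal year.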
Over time, your fundraising program and constituents will change. As this happens, it can change which variables correlate with being a donor, as well as the weight each variable carries in the model formula. So, at the start of each fiscal year, you will want to check in with all of your old friends to see if anything has changed with them and tweak your model to account for these changes. Doing this means that you will need an updated data file, so the program that builds your data file must be ready to run on demand.
Occasionally, you will want to experiment with new variables, and you may decide to remove variables that don’t correlate with the behavior you are trying to model (being a donor, in our fundraising example). By using an independent subroutine to calculate each variable in the model, adding and removing variables can be done with surgical efficiency and will not require a major rewrite of the program.
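One way to get that surgical efficiency is to keep a simple registry that maps each variable name to its own function. The following Python sketch is only an illustration: the record layout (`class_year`, `gift_dates`), the function names and the `VARIABLES` registry are all hypothetical.

```python
from datetime import date

# Each variable lives in its own small function with the same signature:
# (record, start, end) -> value. All names here are hypothetical.


def years_since_graduation(record, start, end):
    """Years between the class year and the end of the date range."""
    return end.year - record["class_year"]


def gave_last_year(record, start, end):
    """1 if any gift falls in the 12 months before the date range, else 0."""
    # replace(year=...) is safe for a July 1 to June 30 fiscal year;
    # it would raise ValueError for a February 29 boundary.
    prior_start = start.replace(year=start.year - 1)
    prior_end = end.replace(year=end.year - 1)
    return int(any(prior_start <= d <= prior_end for d in record["gift_dates"]))


# Adding or removing a variable is a one-line change to this registry.
VARIABLES = {
    "years_since_graduation": years_since_graduation,
    "gave_last_year": gave_last_year,
}


def build_row(record, start, end, variables=VARIABLES):
    """Compute every registered variable for one record."""
    return {name: fn(record, start, end) for name, fn in variables.items()}


record = {"class_year": 1990, "gift_dates": [date(2014, 3, 15)]}
row = build_row(record, date(2014, 7, 1), date(2015, 6, 30))
print(row)
```

Because no function reads another function's intermediate results, deleting an entry from the registry cannot break the variables that remain.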
After building your model based on a sample population, you will want to apply the model to a target population. For example, you would build a model using a sample population of everyone you solicited last year. But your target population will consist of everyone you plan to solicit in the current year. The data file for the sample and target populations must have the same variables; however, time-dependent variables will have different values for each record in the two files. A sustainably built program will not require any modification when you switch from building your sample file to building the target file.
The first key to building a sustainable program is to include a date range parameter as an input variable when running the program, then use the date range parameter to calculate all time-dependent variables in the file.
Let’s look at some specific variables taken from the fundraising program of an academic institution. We’ll use the fiscal year start and end dates as date range parameters. For example, if your fiscal year runs from July to June and we want to identify the donors for fiscal year 2015, anyone with a qualifying gift between July 1, 2014, and June 30, 2015, will be marked as a donor in the data file. In fact, all the time-dependent variables in your data file need to be calculated based on the start and end dates for the fiscal year.
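The donor example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the helper `fiscal_year_range`, the function `is_donor` and the convention of naming a fiscal year by the calendar year in which it ends are all choices made for this example, and the helper assumes a fiscal year that does not start in January.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple


def fiscal_year_range(fiscal_year: int, start_month: int = 7) -> Tuple[date, date]:
    """Return (start, end) for a fiscal year named by the year it ends in.

    Assumes the fiscal year begins on the first day of start_month and
    that start_month is not January.
    """
    start = date(fiscal_year - 1, start_month, 1)
    # The end date is the day before the next fiscal year begins.
    end = date(fiscal_year, start_month, 1) - timedelta(days=1)
    return start, end


def is_donor(gift_dates: Iterable[date], start: date, end: date) -> int:
    """1 if any qualifying gift falls inside the date range, else 0."""
    return int(any(start <= d <= end for d in gift_dates))


start, end = fiscal_year_range(2015)  # July 1, 2014 through June 30, 2015
print(is_donor([date(2014, 12, 31)], start, end))  # gift inside the range
print(is_donor([date(2015, 7, 1)], start, end))    # gift one day too late
```

Nothing inside `is_donor` mentions a specific year, so the same function serves every fiscal year once the range is passed in.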
A popular variable to examine in a little more detail is a 50th reunion variable. Many schools put extra effort into soliciting alums celebrating their 50th, or some other, milestone reunion (for example, the 25th or fifth). So it shouldn’t be a surprise if the 50th reunion year is a variable that correlates with being a donor. Building a data file with 2015 as the date range parameter should cause alums from the class of 1965 to have their 50th reunion variable set to 1. For people who graduated in any other year, for example 1966, the 50th reunion variable would be set to 0.
The second key to building variables sustainably is to write the code to compute each variable in its own subroutine. If you plan to have variables for the 50th, 25th and fifth reunion milestones, use a separate subroutine to calculate each variable. Use comments within your code to identify the start and end of each subroutine. This will allow you to find, modify or remove this variable without affecting any other variables.
If the program that builds your data file was written sustainably, when you run the program to build the fiscal year 2016 data file, the members of the class of 1966 will have a 1 in the 50th reunion variable, and the members of the class of 1965, and all other classes, will have that variable set to 0.
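The reunion variables described above might look like the following in Python. This is a sketch rather than the original code: the function names are hypothetical, and it assumes each record carries a class year and that the fiscal year is passed in as a single integer.

```python
# --- START: 50th reunion variable ---
def reunion_50th(class_year: int, fiscal_year: int) -> int:
    """1 if the class celebrates its 50th reunion in fiscal_year, else 0."""
    return int(fiscal_year - class_year == 50)
# --- END: 50th reunion variable ---


# --- START: 25th reunion variable ---
def reunion_25th(class_year: int, fiscal_year: int) -> int:
    """1 if the class celebrates its 25th reunion in fiscal_year, else 0."""
    return int(fiscal_year - class_year == 25)
# --- END: 25th reunion variable ---


# --- START: fifth reunion variable ---
def reunion_5th(class_year: int, fiscal_year: int) -> int:
    """1 if the class celebrates its fifth reunion in fiscal_year, else 0."""
    return int(fiscal_year - class_year == 5)
# --- END: fifth reunion variable ---


# The same code handles both years; only the parameter changes.
for fiscal_year in (2015, 2016):
    print(fiscal_year, reunion_50th(1965, fiscal_year), reunion_50th(1966, fiscal_year))
# prints:
# 2015 1 0
# 2016 0 1
```

The three functions are deliberately repetitive. A single function with a milestone argument would be shorter, but keeping one subroutine per variable follows the rule stated above and lets you delete one variable without touching the others.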
Within every model building cycle, you will need to build two data files. For example, to build models for fiscal year 2016, you will first want to build a file containing all the results from your fiscal year 2015 campaign. This sample data file will be used to design your model. It will tell you the weight each variable carries in the formula to predict the probability of being a donor. After building the model formula, you will want to apply the formula to a target data file built using fiscal 2016 date parameters (see Figure 1). Using sustainable programming techniques will allow you to build each of these files by simply changing the date range parameters and hitting the RUN button.
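The two-file cycle can be sketched as one function called twice. In this Python illustration, `build_data_file`, the record layout and the two variables are hypothetical, and the same two records are reused for both files purely to keep the example short. In practice the sample and target populations would differ.

```python
def years_since_graduation(record, fiscal_year):
    """Time-dependent variable: changes every fiscal year."""
    return fiscal_year - record["class_year"]


def reunion_50th(record, fiscal_year):
    """1 if the record's class has its 50th reunion in fiscal_year, else 0."""
    return int(fiscal_year - record["class_year"] == 50)


def build_data_file(records, fiscal_year):
    """Build one row per record using only the fiscal_year parameter."""
    return [
        {
            "id": r["id"],
            "years_since_graduation": years_since_graduation(r, fiscal_year),
            "reunion_50th": reunion_50th(r, fiscal_year),
        }
        for r in records
    ]


records = [{"id": 1, "class_year": 1965}, {"id": 2, "class_year": 1966}]

sample_file = build_data_file(records, fiscal_year=2015)  # used to design the model
target_file = build_data_file(records, fiscal_year=2016)  # used to apply the model

# Both files must expose the same variables, even though the values differ.
assert sample_file[0].keys() == target_file[0].keys()
print(sample_file)
print(target_file)
```

The only difference between the two calls is the `fiscal_year` argument, which is the code equivalent of changing the date range parameter and pressing RUN.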
At the start of my first modeling project, I made the mistake of thinking that the project was just an experiment; therefore, I did not take the time to build the data files sustainably. I paid the price for my mistake in year two of the project when I had to rewrite all my programs using the techniques described in this article. Whether you are writing your own programs or asking a programmer to do the work for you, assume the project will be successful and build your data files sustainably from the start of the project.
Mike Pasqua is a development services professional with over 12 years of experience in the field. He is currently the director of development services at the University of San Francisco. He has been doing predictive modeling work for the past six years and has implemented predictive modeling programs at several schools where he worked previously.