Tutorial: Cleaning and Using the Billionaire Data Set

The following are instructions from Steve Grimes, related to his article “Don't Be Tardy to the Party: Identifying Your Billionaires with Open Data.” Read the article here before diving into this tutorial.

Watch the video linked here for a step-by-step guide from Steve, in addition to the written instructions provided. 

Here is how I have approached cleaning and using the billionaire data set for my needs at Jazz.

Using the billionaire data set for matching names within your database:

Here we are simply going to use these names to see if there are any matches in your database.

  • Download Fuzzy Lookup, an Excel attachment, to match names to records in your file.
  • Clean up names for matching, using a “First Name Last Name” format in each cell. The names in the data set have three basic formats:

o   “First Name Last Name”: This format is fine and we want to keep it this way for now

o   “First Name Last Name Suffix or ‘Family’”: For name matching purposes, get rid of the suffixes and the ‘family’ (use Find and Replace option or the some of the formulas in the appendix)

o   “First Name, ‘&’, First Name, Last Name”: For name matching purposes and ease, delete everything before and including the “&” (use Find and Replace option or the some of the formulas in the appendix)

o   Other formats can be fixed manually, as that there are so few of them

  • Once all names are cleaned up, place in a separate worksheet in one column and format as a table
  • Gather all names within your database that you would like to match to the list of names just cleaned up. Keep the list of names to no more than 300k records (depending on the speed of your computer); these names should be formatted with “First Name Last Name” in one cell.
  • Place your database’s names in another worksheet and format as a table (I suggest also having a column for IDs just in case you have any matches to search in your CRM)
  • Use Fuzzy Lookup to match names (use the directions in the Read Me file of the installation folder)

Using the billionaire data set to create an alert system in your CRM (full names):

Here we’ll use full names to create an alert system that can run nightly to check if any activity takes place in that data set. Depending on your CRM, you will need to work with your IT team to set up a system for this. However, if you have full access to your records and a CRM that can set up a nightly report to cross check the names (like Tessitura), simply format the names in a way that they can be matched in your system.

  • If you fall into the latter option, use the concatenate function in Excel to join first and last names in your CRM if they are stored in their own separate fields (see appendix)
  • Conversely, and with a bit more complication, you can split the billionaire names up by first and last names by using the Text to Columns option in Excel, if the names within your CRM are stored that way

Using the billionaires list to create an alert system or match names in your CRM (last names):

While it is great to know if a billionaire has a relationship with your organization, it is equally important to know if family members have a relationship as well. To that end, you can use the last names in the billionaire data set to accomplish this task.

  • Separate the last names in the billionaire dataset; follow the directions above to get the names into the “First Name, Last Name” format
  • Use a formula to separate the names to get just the last name (see appendix)
  • Get rid of the most popular surnames. The alert system will do you no good if it returns information every time for every “Smith.” Use the following data set to cut down on the most popular last names within the US, found here (use a VLOOKUP for this, see appendix)
  • Once you have collected unique last names, match names in your database with Fuzzy Lookup or create an alert system that is triggered every time a last name match occurs with an action in your CRM. 

APPENDIX: Useful Excel Formulas for Data Cleaning

Concatenate

=CONCATENATE(A2," ",B2)

‘A2’ and ‘B2” represent first name and last name respectively. ‘ “ “ ‘ gives a space between each. 

 

VLookup

=VLOOKUP(A1,Sheet#!A:H,2,FALSE)

‘A1’ should be the beginning of the “unique IDs” to match to sets of fields. The unique ID can be anything, as long as it is in the first column and it exists in both sets of data. If the data is in the same Excel document use the ‘ ! ‘ to denote this (i.e. Sheet2!). “A:H” denotes the columns to be searched in the document being imported over. “2” denotes the field that will be imported over. “FALSE” means you are asking for an exact match.

 

Get First Word

=LEFT(A1,FIND(" ",A1)-1)

 

Get Last Word

=IF(ISERR(FIND(" ",A1)),"",RIGHT(A1,LEN(A1)-FIND("*",SUBSTITUTE(A1," ","*",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))))))

 

Return Everything Before Character

=LEFT(A1,(FIND(" &",A1,1)-1))

‘ “&” ‘ denotes the character.

 

Return Everything After Character

=MID(A1,FIND("&",A1)+1,256)

‘ “&” ‘ denotes the character.

 

Switch Order of Words

=MID(A1,FIND(",",A1)+2,4e4)&" "&LEFT(A1,FIND(",",A1)-1)

Recent Stories
Message from the President: Taking Action With Our Strategic Plan

Rethinking Gift Tables

Religious Roots of Charity