Adding SQL to Your Analytics or Prospect Research Tool Belt: A Beginner’s Tutorial

By Farouk Aregbe, director of analytics, University of Missouri

A vast majority of nonprofit and higher education data live in databases that require ‌Structured Query Language‌ (‌also‌ ‌known as‌ SQL or Sequel) ‌‌‌to‌ ‌retrieve. This makes SQL an important key to unlocking critical business information for those in data analytics/science or prospect research/development. In this article, I make a case for learning SQL with the hope of serving up some inspiration or a little nudge, followed by a hands-on tutorial for beginners, answering 10 common questions with SQL. 

When ‌considering‌ a career in ‌data‌ analytics/science‌, ‌people often ‌ask‌ ‌which‌ ‌language‌ ‌they‌ ‌should‌ ‌learn‌ ‌first‌: ‌R‌ or‌ ‌Python. ‌My‌ ‌response‌ ‌is‌ ‌often‌ ‌neither.‌ ‌‌SQL‌‌ ‌should‌ ‌be‌ ‌learned‌ ‌first‌ ‌because‌, 80% of most data scientists/analysts time is spent collecting and cleaning data. Understanding SQL saves you a lot of hours and headaches, since this is the tool used to extract and transform ‌data in most organizations.

As‌ ‌a prospect analyst‌ ‌or‌ ‌researcher, ‌having‌ SQL in your tool belt ‌differentiates‌ ‌you‌ ‌from‌ ‌others. Understanding how the database is set up equips you to ask better questions. Also, many programming teams I have worked with have a two-week turnaround to get you data because they are inundated with requests. By learning SQL, you help change the dynamic — you add value by fetching the answers to your questions and you make your programmers happy by relieving some pressure created by the backlog of requests. ‌ ‌ ‌

Additionally, in ‌times‌ ‌of‌ ‌crisis, ‌‌one‌ ‌of‌ ‌the‌ ‌best‌ ‌things‌ ‌we‌ ‌can‌ ‌do‌ ‌for‌ ‌our‌ ‌career‌ ‌is‌ ‌to‌ ‌learn‌ ‌a‌ ‌new‌ ‌skill. Many‌ ‌of‌ ‌us‌ ‌are‌ ‌aware‌ ‌of‌ ‌friends‌ ‌and‌ ‌colleagues‌‌ ‌that‌ ‌have‌ ‌lost‌ ‌their‌ ‌jobs‌ ‌due‌ ‌to‌ ‌COVID-19. Knowing a friend had survived‌ ‌two‌ ‌big‌ ‌layoffs‌ ‌at‌ ‌AT&T‌, ‌even‌ ‌prior‌ ‌to‌ ‌COVID‌-19, I asked, ‌“Why‌ ‌do‌ ‌you‌ ‌think‌ ‌you‌ ‌didn't‌ ‌get‌ ‌cut?” He said, “I‌ ‌have‌ ‌a‌ ‌skill‌ ‌others‌ ‌didn't‌ ‌have, ‌I’m‌ ‌able‌ ‌to‌ ‌run‌ ‌SQL‌ ‌queries‌ ‌myself‌ ‌to‌ ‌retrieve‌ ‌data ‌from‌ ‌multiple‌ ‌databases.”‌ ‌Having‌ ‌the‌ knowledge‌ ‌of‌ ‌SQL‌ ‌was‌ ‌the‌ ‌differentiator‌ ‌that‌ ‌allowed‌ ‌him‌ ‌to‌ ‌keep‌ ‌his‌ ‌job‌ ‌when‌ ‌other‌ ‌folks‌ ‌were‌ being‌ ‌let‌ ‌go. ‌ ‌

While SQL enhances your tool belt, many may find the idea of learning programming intimidating. However, SQL is made to be a very accessible language that is relatively easy to learn. Depending‌ ‌on‌ ‌your‌ ‌learning‌ ‌style‌, ‌a‌ ‌solid‌ ‌weekend‌ ‌of‌ ‌immersion‌ ‌or‌ even ‌a‌ ‌few‌ ‌weeks‌ ‌of‌ ‌daily‌ ‌study‌ ‌will‌ ‌provide‌ ‌you‌ ‌a‌ ‌solid‌ ‌foundation‌ ‌on‌ ‌the‌ ‌basics‌. ‌From‌ there, ‌you‌ ‌can‌ ‌add‌ ‌more‌ ‌advanced‌ ‌concepts over time.‌ ‌ ‌ ‌

A majority of ‌nonprofits‌ ‌and‌ ‌higher‌ ‌education‌ ‌institutions‌ ‌will‌ ‌have‌ ‌either‌ ‌SQL‌ ‌Server‌ ‌from‌ ‌Microsoft,‌ ‌Oracle‌ ‌SQL‌ ‌from‌ ‌Oracle‌ ‌or‌ ‌MySQL‌ ‌once‌ ‌owned‌ ‌by‌ ‌Swedish‌ software company‌ ‌MySQL‌ ‌AB‌ ‌(now‌ ‌owned‌ ‌by‌ ‌Oracle).‌ The‌ ‌important‌ ‌thing‌ ‌to‌ ‌know‌ ‌is‌ ‌that‌ ‌while‌ some‌ ‌of‌ ‌the‌ ‌SQL‌ ‌code‌ ‌for‌ ‌these‌ ‌products (or dialects)‌ ‌are‌ ‌similar,‌ ‌there‌ ‌are‌ ‌notable‌ ‌differences‌ ‌in‌ ‌the‌ ‌code‌ ‌for‌ ‌each.‌ ‌For‌ ‌starters,‌ ‌find‌ ‌out‌ ‌which‌ ‌product‌ ‌your‌ ‌organization‌ ‌uses‌ ‌and‌ ‌learn‌ ‌that ‌first. This will make it easier‌ ‌to‌ ‌pick‌ ‌up‌ ‌others‌ ‌afterwards.‌       

Ready to dive in? Let’s start with a brief tutorial, to answer 10 common questions ‌to‌ ‌get‌ ‌you‌ ‌started‌ ‌regardless‌ ‌of‌ ‌the product.‌ 

What Is a Database?

Let’s assume that you received an Excel spreadsheet with ‌a‌ ‌table‌ named alumni.‌ The‌ alumni ‌table‌ ‌has‌ ‌seven‌ ‌columns‌: ‌ID,‌ Name,‌ ‌GradYear,‌ ‌TotalDonated‌ ‌and‌ ‌so‌ ‌on.‌ ‌Each‌ ‌row‌ ‌contains‌ ‌information‌ ‌about‌ ‌an‌ ‌individual‌ ‌alum.‌ ‌

00.1.jpg

A‌ ‌database‌ ‌is‌ ‌simply‌ ‌a‌ ‌collection‌ ‌of‌ ‌tables like the one above. Your Excel spreadsheet might have another table named organizations (like below) ‌for‌ ‌‌alumni that were involved in student organizations ‌while‌ ‌in‌ ‌school. ‌ ‌ 

00.2.jpg

Notice each table has an Org_ID column that allows us to connect or relate the data in both tables. For instance, we can see from the alumni table that our first alum Nicholas Tremper was in an organization with Org_ID 30. This helps us make the connection that this alum was in the Biology Club when we reference Org_ID 30 in the organizations table. To get information from these tables, we write sentences, called queries, that the computer interprets and outputs results.      

With that in mind, let’s focus our attention on the alumni table and get some hands-on practice with SQL queries.      

Get Real-Time Practice on RexTester.com

In this tutorial, I have written each query in a blue box (like below), followed by a picture of the result right under the blue box.

SELECT column FROM table


00.3.jpg

You may choose to get real-time practice using RexTester.com‌ which‌ ‌allows‌ us‌ ‌to‌ ‌run‌ ‌queries‌ ‌without a‌ ‌‌login,‌ ‌download‌ ‌or‌ ‌setup‌ of ‌software.‌ ‌Click here to access‌ ‌the RexTester‌ ‌database‌‌‌ for this tutorial ‌and ‌play‌ ‌along‌ ‌by‌ ‌typing‌ ‌each‌ ‌query. The website will pop up in your browser and should look something like this:

RexTester (2).png

To follow along, type your query in‌ ‌the‌ ‌box‌ ‌on‌ ‌the‌ top ‌left ‌side of RexTester, just like I have done in the blue boxes. Next, click the ‌“Run it”‌ button on the bottom left to execute the query. Then, find your results on the right side of the screen and compare them to the results I have provided below each query. In this tutorial, I used Oracle SQL dialect because that’s what I use regularly. Again, as you learn, I would recommend starting with what your organization uses. Be sure to pay attention to the little details in the code as you switch dialects and versions of SQL.

That’s it for the introduction. Let’s jump into answering some questions with SQL.

Answering 10 Common Questions With SQL

Let’s‌ ‌write‌ an ‌SQL ‌query ‌to‌ ‌show‌ ‌all‌ ‌the‌ ‌columns‌ ‌and‌ ‌rows‌ ‌in‌ ‌the‌ alumni ‌table.‌ ‌ 

1. Can you provide the information on all alumni in the database?

Let’s‌ ‌write‌ an ‌SQL ‌query ‌to‌ ‌show‌ ‌all‌ ‌the‌ ‌columns‌ ‌and‌ ‌rows‌ ‌in‌ ‌the‌ alumni ‌table.‌ ‌

SELECT id, name, gradyear, totaldonated, membership, city, degree, org_id, FROM alumni

01.jpg

SELECT ‌tells‌ ‌SQL‌ ‌to‌ ‌show‌ ‌each of the chosen columns FROM‌ the alumni table. ‌‌SELECT‌ ‌and FROM‌ ‌are‌ ‌in‌ ‌uppercase‌ ‌because‌ ‌they‌ ‌are‌ ‌considered‌ ‌keywords‌ in ‌SQL.‌ Keywords‌ ‌are‌ ‌words‌ ‌that‌ ‌have‌ ‌a‌ ‌specific‌ ‌meaning‌ ‌within‌ ‌the‌ ‌SQL‌ ‌language‌, ‌and‌ ‌as‌ ‌such‌ ‌are‌ ‌not‌ ‌used‌ ‌other‌ ‌than‌ ‌to‌ ‌tell ‌SQL‌ ‌to‌ ‌take‌ ‌a‌ ‌particular‌ ‌action.‌ The‌ ‌proper way to ‌‌write‌ ‌SQL‌ ‌keywords‌ ‌is‌ ‌in‌ ‌uppercase. Like highlighting important phrases on a document, writing keywords in uppercase improves readability and makes it easier for us (or anyone reading our code) to focus on the action SQL is asked to execute. However, ignoring the uppercase doesn’t stop the code from working properly. 

SQL sometimes has shortcuts that come in handy, too. For example, instead of writing out all the column names (‌‌ID, ‌‌Name, ‌‌GradYear,‌ ‌TotalDonated‌ ‌and‌ ‌so‌ ‌on) in the above query, we can do the following:

SELECT * FROM alumni

The ‌asterisk (*‌) ‌represents‌ ‌all‌ ‌columns and saves us from having to write out a long list of column names while the semi-colon (;) indicates the end of the query.

2. Which alums graduated before the year 2000?
SELECT * FROM alumni WHERE gradyear < 2000

02.jpg

SELECT‌ again ‌tells‌ ‌SQL‌ ‌to‌ ‌show‌ us ‌data‌ ‌in the chosen ‌columns‌. ‌This time, limit rows to only those WHERE the alums have a gradyear less than 2000.

WHERE is one of the keywords that allows‌ us to limit what alums we see in the rows. The less than sign (<) is called an operator, like the term you might be familiar with from your early math instruction.           

Here are some other operators you will find valuable in your SQL journey:

> greater than

= equal

!= or <>  not equal

>= greater than or equal

<= less than or equal

3. Which alums graduated before 2000 and have donated more than $20,000?
SELECT *FROM alumni WHERE gradyear < 2000 AND totaldonated > 20000

03.jpg

AND is another keyword that instructs SQL to return alums that satisfy two conditions. In this case, it’s that gradyear is less than 2000 and totaldonated is greater than $20,000. As the queries get longer, a good tip to keep your query readable is to start your keywords on a new line. Wouldn’t you agree the following lines are easier to read?

SELECT *

FROM alumni

WHERE gradyear < 2000

AND totaldonated > 20000

4. Which alums live in Houston or majored in Biology?

Note: For those following along in RexTester, in the next query we will encounter quotation marks, which are not RexTester friendly. Please ensure that you type the query rather than copy and pasteOtherwise, you will get a “ORA-00911: invalid character” error, which means your characters are not recognized. I might also take the opportunity to encourage you to type rather than copy and paste the queries going forward, as a best practice for active learning.

SELECT *

FROM alumni

WHERE city = ‘Houston’

OR degree = ‘BIOLOGY’

04a.jpg

Here, we encounter our first characters (also called strings), Houston and BIOLOGY. Characters are case-sensitive and must be enclosed in single quotation marks. OR is a keyword that instructs SQL to return alums that satisfy the first or second condition. In this case, those who live in the city of Houston or graduated with a BIOLOGY degree.

To further simplify things and provide just the columns that are needed, lets include only the name, city and degree columns.

SELECT name, city, degree

FROM alumni

WHERE city = ‘Houston’

OR degree = ‘BIOLOGY’

04b.jpg
5. What is the total amount alumni have donated by city?

SELECT city, ‌SUM(totaldonated‌)

FROM alumni

GROUP BY city

05a.jpg

SUM is part of a short list of words called aggregate functions. Aggregate functions work alongside another keyword called GROUP BY to summarize groups within the data set. For example, the aggregate function SUM will add up the total donated for the group of alumni in Houston, then it will do the same for all the other cities. Other aggregate functions include AVG, COUNT, MAX and MIN, to name a few.

If we have a long list of cities, arranging the list in alphabetical order will allow us to quickly find a city. Here is how to get the cities in alphabetical or ascending order:

SELECT city, ‌SUM(totaldonated‌)

FROM alumni

GROUP BY city

ORDER BY city ASC

05b.jpg

ORDER BY is a keyword that tells SQL to arrange the city column in ascending order (ASC — from A to Z), as opposed to descending order (DESC — from Z to A).

6. What is the single largest donation in each city?

SELECT city, ‌SUM(totaldonated‌)

FROM alumni

GROUP BY city

ORDER BY city ASC

06a.jpg

MAX is another aggregate function. When used with the GROUP BY keyword, it returns the highest or maximum totaldonated in each city. 

If others will be viewing the list, it might be helpful to make the column header more descriptive than Max(TotalDonated). Let’s call it largest_donation instead of Max(TotalDonated). Here is how to do that:

SELECT city, MAX(totaldonated‌) AS largest_donation

FROM alumni

GROUP BY city

ORDER BY city ASC

06b.jpg

AS is a keyword that tells SQL to rename (or give an alias to) the column from Max(TotalDonated) to largest_donation. The underscore between the words largest and donation helps SQL read this as one word rather than two words. We have done this to simplify things, as renaming works differently for various products. You can learn the nuances as you increase your knowledge of SQL and choose a product to master.

7. What is the average donated for alumni in each city?

Note: For those following along in RexTester, you may encounter some incorrect data on this example. Please know that the output you see below is correct and we have alerted the developers of RexTester about this error.

SELECT city, AVG(totaldonated‌) AS average_donation

FROM alumni

GROUP BY city

07.jpg

AVG is another aggregate function. When used with GROUP BY, it returns the mean or average of the totaldonated in each city.

8. How many alumni have donated in each city?

SELECT ‌city, COUNT(DISTINCT id‌) AS no_of_donors

FROM alumni

WHERE totaldonated > 0

GROUP BY city

08a.jpg

COUNT is another aggregate function. When used with GROUP BY, it returns the count or number of alums with a totaldonated in each city.      

DISTINCT is a keyword that asks SQL to count only one occurrence of each ID number. Therefore, if one individual has two donations, they are still counted once.

In this case, we included WHERE totaldonated > 0, because alums with zero haven’t donated.

Finally, it might make sense that we arrange the No_of_Donors from highest to lowest so we might know what cities to focus on for an event or campaign. Here is how we would accomplish that:

SELECT ‌city, COUNT(DISTINCT id‌) AS no_of_donors

FROM alumni

WHERE totaldonated > 0

GROUP BY city

ORDER BY‌ COUNT(DISTINCT ID) DESC

08b.jpg

ORDER BY keyword arranges the column referred to as COUNT(DISTINCT id) in descending DESC (highest to lowest) or ascending (lowest to highest) order.

9. Can you generate an alphabetical list of cities with alumni listed from highest to lowest amount donated?

SELECT ‌city, name, ‌totaldonated‌

FROM alumni

WHERE totaldonated > 0

ORDER BY city ASC, totaldonated DESC

09.jpg

With the ORDER BY keyword you can ask SQL to arrange multiple columns and it does so in the order they appear in the code. Therefore, the city is arranged alphabetically (ASC), then the totaldonated is displayed highest donation to lowest (DESC) while honoring the order already created in city. 

10. Can you generate a list of alumni and the student organization(s) they participated in while in school?

For our final question, we’ll need information from two tables: name from the alumni table and org_name from the organizations table. You’ll remember both tables have an Org_ID column, which helps us connect them. Just as we did earlier, SQL finds Nicholas Tremper’s record with Org_ID 30 in the alumni table, matches that to Org_ID 30 in the organizations table and returns the Biology Club as the org_name associated with the Org_ID. This is repeated for all the alumni in each row of the alumni table. Here is the query:

SELECT alumni.name, organizations.org_name

FROM alumni, organizations

WHERE alumni.org_id = organizations.org_id

10.jpg

With alumni.name, the word alumni before the period tells SQL to reference the name column from the alumni table, and conversely with organizations.org_name to reference the org_name column from the organizations table.

Next, following the FROM keyword, we list all the tables we are referencing — in this case alumni and organizations. Lastly, after the WHERE keyword, we ask SQL to match the Org_ID in the alumni table to the Org_ID in the organization table. Therefore, Nicholas Tremper with Org_ID 30 in the alumni table is matched to Biology Club with Org_ID 30 in the organizations table. Next, Taylor Woltz with Org_ID 20 in the alumni table is matched to Delta Sigma Theta with Org_ID 20 in the organizations table, and so on and so forth.

That’s it for the tutorial. If you have made it this far, you have the desire and follow-through it takes to learn SQL. You may choose one or more of these resources to guide your learning. For those that prefer learning through video courses, you may find courses like these on Udemy helpful. Just make sure you buy them when they are on sale, or find free courses like this one on YouTube to begin building your SQL foundation. Happy learning!

 

This article relates to the Data Science domain in the Apra Body of Knowledge.

 

 

 

  


Learn more about the author featured in this article on the Connections Thought Leadership Page .

Recent Stories
Apra Fundamentals: Advice to My Former Self

Looking North: Understanding Your Canadian Prospects in 10 Questions

Adding SQL to Your Analytics or Prospect Research Tool Belt: A Beginner’s Tutorial