U.S. Universities - Regional Performance#
By Sarina Noone
Using datasets from the Integrated Postsecondary Education Data System, or IPEDS, I will explore the range of American 4-year universities (public, private or for-profit), their distribution across the country and look at one basic indicator of student outcomes: years spent in attaining a bachelor’s degree.
Hypothesis: Americans tend to hold biases towards elite, east coast universities because of their historical excellence and the reputation of the Ivy League. While based on density and presence of major employers in coastal cities probably leads to a higher number of institutions of higher education, this may not be linked to school performance.
import pandas as pd
#import first dataset with all university data incl name, state, geo region
allunis = pd.read_csv('/home/jovyan/python-public-policy/Sarina_Noone_Final/hd2020.csv')
#total number of universities in dataset?
allunis['UNITID'].count()
6440
allunis.head()
UNITID | INSTNM | IALIAS | ADDR | CITY | STABBR | ZIP | FIPS | OBEREG | CHFNM | ... | CBSATYPE | CSA | NECTA | COUNTYCD | COUNTYNM | CNGDSTCD | LONGITUD | LATITUDE | DFRCGID | DFRCUSCG | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 100654 | Alabama A & M University | AAMU | 4900 Meridian Street | Normal | AL | 35762 | 1 | 5 | Dr. Andrew Hugine, Jr. | ... | 1 | 290 | -2 | 1089 | Madison County | 105 | -86.568502 | 34.783368 | 109 | 1 |
1 | 100663 | University of Alabama at Birmingham | Administration Bldg Suite 1070 | Birmingham | AL | 35294-0110 | 1 | 5 | Ray L. Watts | ... | 1 | 142 | -2 | 1073 | Jefferson County | 107 | -86.799345 | 33.505697 | 95 | 1 | |
2 | 100690 | Amridge University | Southern Christian University Regions University | 1200 Taylor Rd | Montgomery | AL | 36117-3553 | 1 | 5 | Michael C.Turner | ... | 1 | 388 | -2 | 1101 | Montgomery County | 102 | -86.174010 | 32.362609 | 126 | 2 |
3 | 100706 | University of Alabama in Huntsville | UAH University of Alabama Huntsville | 301 Sparkman Dr | Huntsville | AL | 35899 | 1 | 5 | Darren Dawson | ... | 1 | 290 | -2 | 1089 | Madison County | 105 | -86.640449 | 34.724557 | 99 | 2 |
4 | 100724 | Alabama State University | 915 S Jackson Street | Montgomery | AL | 36104-0271 | 1 | 5 | Quinton T. Ross | ... | 1 | 388 | -2 | 1101 | Montgomery County | 107 | -86.295677 | 32.364317 | 118 | 1 |
5 rows × 73 columns
allunis['SECTOR'].dtypes
dtype('int64')
For the sake of this exploration, we’ll limit our scope to “traditional” four-year college experiences. To do this, I have identified the appropriate sector codes in the IPEDS data set to filter out all community colleges, trade schools, certificate programs and graduate/professional degree granting insitutions.
sectorcodes = [1,2,3]
fouryearunis = allunis[allunis.SECTOR.isin(sectorcodes)]
#total number of four year universities to confirm filtered list?
fouryearunis['UNITID'].count()
2846
fouryearunis_cleaned = fouryearunis[["UNITID","INSTNM","CITY","STABBR","OBEREG","LONGITUD","LATITUDE","SECTOR"]]
fouryearunis_cleaned.head()
UNITID | INSTNM | CITY | STABBR | OBEREG | LONGITUD | LATITUDE | SECTOR | |
---|---|---|---|---|---|---|---|---|
0 | 100654 | Alabama A & M University | Normal | AL | 5 | -86.568502 | 34.783368 | 1 |
1 | 100663 | University of Alabama at Birmingham | Birmingham | AL | 5 | -86.799345 | 33.505697 | 1 |
2 | 100690 | Amridge University | Montgomery | AL | 5 | -86.174010 | 32.362609 | 2 |
3 | 100706 | University of Alabama in Huntsville | Huntsville | AL | 5 | -86.640449 | 34.724557 | 1 |
4 | 100724 | Alabama State University | Montgomery | AL | 5 | -86.295677 | 32.364317 | 1 |
#to contextualize sector number, add a column explaining sector code
def label_sectortype(row):
if row['SECTOR']==1:
return "Public 4-Year"
elif row['SECTOR']==2:
return "Private 4-Year"
elif row['SECTOR']==3:
return "For-Profit 4-Year"
else:
return 'Invalid Sector'
#applying that label to the dataset
fouryearunis_cleaned['sectortype'] = fouryearunis_cleaned.apply(label_sectortype, axis=1)
fouryearunis_cleaned.head()
/tmp/ipykernel_1558/675621173.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
fouryearunis_cleaned['sectortype'] = fouryearunis_cleaned.apply(label_sectortype, axis=1)
UNITID | INSTNM | CITY | STABBR | OBEREG | LONGITUD | LATITUDE | SECTOR | sectortype | |
---|---|---|---|---|---|---|---|---|---|
0 | 100654 | Alabama A & M University | Normal | AL | 5 | -86.568502 | 34.783368 | 1 | Public 4-Year |
1 | 100663 | University of Alabama at Birmingham | Birmingham | AL | 5 | -86.799345 | 33.505697 | 1 | Public 4-Year |
2 | 100690 | Amridge University | Montgomery | AL | 5 | -86.174010 | 32.362609 | 2 | Private 4-Year |
3 | 100706 | University of Alabama in Huntsville | Huntsville | AL | 5 | -86.640449 | 34.724557 | 1 | Public 4-Year |
4 | 100724 | Alabama State University | Montgomery | AL | 5 | -86.295677 | 32.364317 | 1 | Public 4-Year |
#to get a sense of the type of universities represented in this set
fouryear_bytype = fouryearunis_cleaned.groupby('sectortype').UNITID.size().reset_index(name='counts')
print(fouryear_bytype)
sectortype counts
0 For-Profit 4-Year 367
1 Private 4-Year 1673
2 Public 4-Year 806
import plotly.express as px
fig = px.bar(fouryear_bytype, x='sectortype', y='counts')
fig.show()
It’s not surprising to see that the number of Private Four-Year universities far exceeds the number of for-profit and public universities combined.
Next up, we’ll see how these institutions are spread across the country. To do this, I will add context to the IPEDS dataset’s OBEREG data flag to indicate which part of the country is represented.
#to contextualize regions in column OBEREG, add a column with more detail
def label_region(row):
if row['OBEREG']==1:
return "New England"
elif row['OBEREG']==2:
return "Mid Atlantic"
elif row['OBEREG']==3:
return "Great Lakes"
elif row['OBEREG']==4:
return "Plains"
elif row['OBEREG']==5:
return "Southeast"
elif row['OBEREG']==6:
return "Southwest"
elif row['OBEREG']==7:
return "Rocky Mountains"
elif row['OBEREG']==8:
return "Far West"
elif row['OBEREG']==9:
return "US Territories"
else:
return 'N/A or Other'
#applying that label to the dataset
fouryearunis_cleaned['region'] = fouryearunis_cleaned.apply(label_region, axis=1)
fouryearunis_cleaned.head()
/tmp/ipykernel_1558/1056732486.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
UNITID | INSTNM | CITY | STABBR | OBEREG | LONGITUD | LATITUDE | SECTOR | sectortype | region | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 100654 | Alabama A & M University | Normal | AL | 5 | -86.568502 | 34.783368 | 1 | Public 4-Year | Southeast |
1 | 100663 | University of Alabama at Birmingham | Birmingham | AL | 5 | -86.799345 | 33.505697 | 1 | Public 4-Year | Southeast |
2 | 100690 | Amridge University | Montgomery | AL | 5 | -86.174010 | 32.362609 | 2 | Private 4-Year | Southeast |
3 | 100706 | University of Alabama in Huntsville | Huntsville | AL | 5 | -86.640449 | 34.724557 | 1 | Public 4-Year | Southeast |
4 | 100724 | Alabama State University | Montgomery | AL | 5 | -86.295677 | 32.364317 | 1 | Public 4-Year | Southeast |
fouryearunis_fullset = fouryearunis_cleaned.groupby(['region','sectortype'])['UNITID'].agg('count').reset_index()
print(fouryearunis_fullset)
region sectortype UNITID
0 Far West For-Profit 4-Year 87
1 Far West Private 4-Year 216
2 Far West Public 4-Year 110
3 Great Lakes For-Profit 4-Year 26
4 Great Lakes Private 4-Year 264
5 Great Lakes Public 4-Year 107
6 Mid Atlantic For-Profit 4-Year 39
7 Mid Atlantic Private 4-Year 364
8 Mid Atlantic Public 4-Year 120
9 N/A or Other Public 4-Year 7
10 New England For-Profit 4-Year 5
11 New England Private 4-Year 138
12 New England Public 4-Year 44
13 Plains For-Profit 4-Year 26
14 Plains Private 4-Year 172
15 Plains Public 4-Year 60
16 Rocky Mountains For-Profit 4-Year 25
17 Rocky Mountains Private 4-Year 34
18 Rocky Mountains Public 4-Year 45
19 Southeast For-Profit 4-Year 94
20 Southeast Private 4-Year 340
21 Southeast Public 4-Year 199
22 Southwest For-Profit 4-Year 52
23 Southwest Private 4-Year 96
24 Southwest Public 4-Year 93
25 US Territories For-Profit 4-Year 13
26 US Territories Private 4-Year 49
27 US Territories Public 4-Year 21
import plotly.express as px
df = fouryearunis_fullset
fig = px.histogram(df, x="region", y="UNITID",
color='sectortype', barmode='group',
height=400)
fig.show()
I was honestly surprised to find that the Southeast has nearly the same number of private 4-year universities as the Mid Atlantic region, and almost twice as many public universities. The Great Lakes region surprised me at first, but then I remembered that includes all Chicago/IL schools, Wisconsin, Michigan, etc. To represent these findings a little more simply, I’ll run some sums by region next.
total_regional = fouryearunis_cleaned.groupby(['region'],sort=False).count()
sorted_regional = total_regional.sort_values('UNITID',ascending=False)['UNITID']
print(sorted_regional)
region
Southeast 633
Mid Atlantic 523
Far West 413
Great Lakes 397
Plains 258
Southwest 241
New England 187
Rocky Mountains 104
US Territories 83
N/A or Other 7
Name: UNITID, dtype: int64
Now that we have established a sense of where American universities are located and the range of four year institutions, we’ll look broadly at the number of students they serve and what kind of outcomes they generally have. To do so, I’ll import a new data set from IPEDS that focuses on student outcomes.
#import second dataset with university outcome measures such as enrollment, degree attainment
uni_outcomes = pd.read_csv('/home/jovyan/python-public-policy/Sarina_Noone_Final/om2020.csv')
uni_outcomes.head()
UNITID | OMCHRT | XOMRCHRT | OMRCHRT | XOMEXCLS | OMEXCLS | XOMACHRT | OMACHRT | XOMCERT4 | OMCERT4 | ... | XOMAWDP8 | OMAWDP8 | XOMENRTP | OMENRTP | XOMENRYP | OMENRYP | XOMENRAP | OMENRAP | XOMENRUP | OMENRUP | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 100654 | 10 | R | 969 | R | 4 | R | 965 | R | 0 | ... | R | 30.0 | R | 32.0 | R | 2.0 | R | 30.0 | R | 38.0 |
1 | 100654 | 11 | R | 788 | R | 4 | R | 784 | R | 0 | ... | R | 28.0 | R | 32.0 | R | 2.0 | R | 30.0 | R | 40.0 |
2 | 100654 | 12 | R | 181 | R | 0 | R | 181 | R | 0 | ... | R | 39.0 | R | 33.0 | R | 2.0 | R | 31.0 | R | 28.0 |
3 | 100654 | 20 | R | 106 | R | 1 | R | 105 | R | 0 | ... | R | 14.0 | R | 34.0 | R | 0.0 | R | 34.0 | R | 51.0 |
4 | 100654 | 21 | R | 80 | R | 1 | R | 79 | R | 0 | ... | R | 13.0 | R | 34.0 | R | 0.0 | R | 34.0 | R | 53.0 |
5 rows × 54 columns
Reviewing the IPEDS variable list and descriptions, I’m most interested in looking at the total number of students an institution serves. The relevant variable is OMCHRT, where a value of 50 = total entering students; 51 = Total entering Pell Grant recipients; and 52 = total entering non-Pell Grant recipients. While it would definitely be interesting to explore different outcomes for students based on their financial aid status, for the sake of this assignment, I’ll use the total number of students entering in a cohort (OMCHRT = 50). The value in OMACHRT is the number of students who fit the descriptor in OMCHRT.
To assess outcomes, we’ll look at data in columns OMBACH4, OMBACH6 and OMNOAWD, which represent, respectively, the number of students who earned a bachelor’s degree within four years, within six years or who at 8 years have not earned a degree yet.
PellCodes = [50]
uni_students = uni_outcomes[uni_outcomes.OMCHRT.isin(PellCodes)]
uni_students.head()
UNITID | OMCHRT | XOMRCHRT | OMRCHRT | XOMEXCLS | OMEXCLS | XOMACHRT | OMACHRT | XOMCERT4 | OMCERT4 | ... | XOMAWDP8 | OMAWDP8 | XOMENRTP | OMENRTP | XOMENRYP | OMENRYP | XOMENRAP | OMENRAP | XOMENRUP | OMENRUP | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
12 | 100654 | 50 | R | 1277 | R | 5 | R | 1272 | R | 0 | ... | R | 31.0 | R | 31.0 | R | 1.0 | R | 30.0 | R | 38.0 |
27 | 100663 | 50 | R | 3526 | R | 5 | R | 3521 | R | 0 | ... | R | 57.0 | R | 26.0 | R | 1.0 | R | 25.0 | R | 17.0 |
41 | 100690 | 50 | R | 147 | R | 0 | R | 147 | R | 0 | ... | R | 36.0 | R | 6.0 | R | 0.0 | R | 6.0 | R | 58.0 |
56 | 100706 | 50 | R | 1573 | R | 0 | R | 1573 | R | 0 | ... | R | 54.0 | R | 33.0 | R | 2.0 | R | 31.0 | R | 13.0 |
71 | 100724 | 50 | R | 1874 | R | 1 | R | 1873 | R | 0 | ... | R | 32.0 | R | 36.0 | R | 1.0 | R | 35.0 | R | 32.0 |
5 rows × 54 columns
#to contextualize student population served (OMCHRT) number, add a descriptor column
def label_studentdetails(row):
if row['OMCHRT']==50:
return "Total Students"
else:
return 'Data Unavailable'
#applying that label to the dataset
uni_students['studentdetails'] = uni_students.apply(label_studentdetails, axis=1)
uni_students.head()
/tmp/ipykernel_1558/3244937734.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
UNITID | OMCHRT | XOMRCHRT | OMRCHRT | XOMEXCLS | OMEXCLS | XOMACHRT | OMACHRT | XOMCERT4 | OMCERT4 | ... | OMAWDP8 | XOMENRTP | OMENRTP | XOMENRYP | OMENRYP | XOMENRAP | OMENRAP | XOMENRUP | OMENRUP | studentdetails | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
12 | 100654 | 50 | R | 1277 | R | 5 | R | 1272 | R | 0 | ... | 31.0 | R | 31.0 | R | 1.0 | R | 30.0 | R | 38.0 | Total Students |
27 | 100663 | 50 | R | 3526 | R | 5 | R | 3521 | R | 0 | ... | 57.0 | R | 26.0 | R | 1.0 | R | 25.0 | R | 17.0 | Total Students |
41 | 100690 | 50 | R | 147 | R | 0 | R | 147 | R | 0 | ... | 36.0 | R | 6.0 | R | 0.0 | R | 6.0 | R | 58.0 | Total Students |
56 | 100706 | 50 | R | 1573 | R | 0 | R | 1573 | R | 0 | ... | 54.0 | R | 33.0 | R | 2.0 | R | 31.0 | R | 13.0 | Total Students |
71 | 100724 | 50 | R | 1874 | R | 1 | R | 1873 | R | 0 | ... | 32.0 | R | 36.0 | R | 1.0 | R | 35.0 | R | 32.0 | Total Students |
5 rows × 55 columns
uni_students_cleaned = uni_students [["UNITID","OMACHRT","OMBACH4","OMBACH6","OMNOAWD","studentdetails"]]
uni_students_cleaned.head()
UNITID | OMACHRT | OMBACH4 | OMBACH6 | OMNOAWD | studentdetails | |
---|---|---|---|---|---|---|
12 | 100654 | 1272 | 118.0 | 349.0 | 875 | Total Students |
27 | 100663 | 3521 | 1366.0 | 1899.0 | 1531 | Total Students |
41 | 100690 | 147 | 43.0 | 49.0 | 94 | Total Students |
56 | 100706 | 1573 | 553.0 | 807.0 | 724 | Total Students |
71 | 100724 | 1873 | 239.0 | 553.0 | 1278 | Total Students |
For each university and each subset of students, we’ll calculate the number of students that are “well served” as those who earn their degree within the four years; we’ll calculate those “poorly served” as those who do not have a degree after eight years. Each of these will be represented as a percentage of the total subpopulation.
uni_students_cleaned.dtypes
UNITID int64
OMACHRT int64
OMBACH4 float64
OMBACH6 float64
OMNOAWD int64
studentdetails object
dtype: object
uni_students_cleaned['pct_well_served']=(uni_students_cleaned['OMBACH4']/uni_students_cleaned['OMACHRT'])
uni_students_cleaned['pct_poorly_served']=(uni_students_cleaned['OMNOAWD']/uni_students_cleaned['OMACHRT'])
uni_students_cleaned
/tmp/ipykernel_1558/3501207542.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
/tmp/ipykernel_1558/3501207542.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
UNITID | OMACHRT | OMBACH4 | OMBACH6 | OMNOAWD | studentdetails | pct_well_served | pct_poorly_served | |
---|---|---|---|---|---|---|---|---|
12 | 100654 | 1272 | 118.0 | 349.0 | 875 | Total Students | 0.092767 | 0.687893 |
27 | 100663 | 3521 | 1366.0 | 1899.0 | 1531 | Total Students | 0.387958 | 0.434820 |
41 | 100690 | 147 | 43.0 | 49.0 | 94 | Total Students | 0.292517 | 0.639456 |
56 | 100706 | 1573 | 553.0 | 807.0 | 724 | Total Students | 0.351558 | 0.460267 |
71 | 100724 | 1873 | 239.0 | 553.0 | 1278 | Total Students | 0.127603 | 0.682328 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
48192 | 495031 | 2 | 0.0 | 2.0 | 0 | Total Students | 0.000000 | 0.000000 |
48196 | 495147 | 2 | NaN | NaN | 2 | Total Students | NaN | 1.000000 |
48200 | 495183 | 2 | NaN | NaN | 0 | Total Students | NaN | 0.000000 |
48206 | 495280 | 13 | 0.0 | 2.0 | 5 | Total Students | 0.000000 | 0.384615 |
48220 | 495767 | 20061 | 9848.0 | 13258.0 | 5915 | Total Students | 0.490903 | 0.294851 |
3694 rows × 8 columns
To attempt to put this into context with the data on institution type and region, I will merge these datasets using their unique UNITIDs.
finaldata = pd.merge(
left=fouryearunis_cleaned,
right=uni_students_cleaned,
how="left",
on=None,
left_on='UNITID',
right_on='UNITID',
left_index=False,
right_index=False,
sort=True,
suffixes=("_x", "_y"),
copy=True,
indicator=False,
validate=None,
)
finaldata.head()
UNITID | INSTNM | CITY | STABBR | OBEREG | LONGITUD | LATITUDE | SECTOR | sectortype | region | OMACHRT | OMBACH4 | OMBACH6 | OMNOAWD | studentdetails | pct_well_served | pct_poorly_served | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 100654 | Alabama A & M University | Normal | AL | 5 | -86.568502 | 34.783368 | 1 | Public 4-Year | Southeast | 1272.0 | 118.0 | 349.0 | 875.0 | Total Students | 0.092767 | 0.687893 |
1 | 100663 | University of Alabama at Birmingham | Birmingham | AL | 5 | -86.799345 | 33.505697 | 1 | Public 4-Year | Southeast | 3521.0 | 1366.0 | 1899.0 | 1531.0 | Total Students | 0.387958 | 0.434820 |
2 | 100690 | Amridge University | Montgomery | AL | 5 | -86.174010 | 32.362609 | 2 | Private 4-Year | Southeast | 147.0 | 43.0 | 49.0 | 94.0 | Total Students | 0.292517 | 0.639456 |
3 | 100706 | University of Alabama in Huntsville | Huntsville | AL | 5 | -86.640449 | 34.724557 | 1 | Public 4-Year | Southeast | 1573.0 | 553.0 | 807.0 | 724.0 | Total Students | 0.351558 | 0.460267 |
4 | 100724 | Alabama State University | Montgomery | AL | 5 | -86.295677 | 32.364317 | 1 | Public 4-Year | Southeast | 1873.0 | 239.0 | 553.0 | 1278.0 | Total Students | 0.127603 | 0.682328 |
Lastly, I’ll try a few visualizations to see if there are any trends in quality of institutions by type or by region.
finaldata_grouped = finaldata.groupby(['region','sectortype'])['pct_well_served','pct_poorly_served'].agg('mean').reset_index()
finaldata_grouped
/tmp/ipykernel_1558/2067835919.py:1: FutureWarning:
Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.
region | sectortype | pct_well_served | pct_poorly_served | |
---|---|---|---|---|
0 | Far West | For-Profit 4-Year | 0.152861 | 0.424287 |
1 | Far West | Private 4-Year | 0.471077 | 0.368449 |
2 | Far West | Public 4-Year | 0.251621 | 0.471987 |
3 | Great Lakes | For-Profit 4-Year | 0.120212 | 0.542366 |
4 | Great Lakes | Private 4-Year | 0.463626 | 0.395726 |
5 | Great Lakes | Public 4-Year | 0.246805 | 0.563913 |
6 | Mid Atlantic | For-Profit 4-Year | 0.168820 | 0.564898 |
7 | Mid Atlantic | Private 4-Year | 0.425702 | 0.401470 |
8 | Mid Atlantic | Public 4-Year | 0.436490 | 0.399721 |
9 | N/A or Other | Public 4-Year | 0.828581 | 0.144768 |
10 | New England | For-Profit 4-Year | 0.196760 | 0.553050 |
11 | New England | Private 4-Year | 0.551633 | 0.312201 |
12 | New England | Public 4-Year | 0.428574 | 0.396982 |
13 | Plains | For-Profit 4-Year | 0.145377 | 0.583044 |
14 | Plains | Private 4-Year | 0.414622 | 0.407241 |
15 | Plains | Public 4-Year | 0.334296 | 0.485910 |
16 | Rocky Mountains | For-Profit 4-Year | 0.347062 | 0.417599 |
17 | Rocky Mountains | Private 4-Year | 0.313327 | 0.500538 |
18 | Rocky Mountains | Public 4-Year | 0.194103 | 0.558302 |
19 | Southeast | For-Profit 4-Year | 0.120625 | 0.597062 |
20 | Southeast | Private 4-Year | 0.366566 | 0.489800 |
21 | Southeast | Public 4-Year | 0.300463 | 0.495468 |
22 | Southwest | For-Profit 4-Year | 0.164060 | 0.555494 |
23 | Southwest | Private 4-Year | 0.319113 | 0.517798 |
24 | Southwest | Public 4-Year | 0.315166 | 0.518157 |
25 | US Territories | For-Profit 4-Year | 0.078183 | 0.537755 |
26 | US Territories | Private 4-Year | 0.140022 | 0.553903 |
27 | US Territories | Public 4-Year | 0.123628 | 0.530386 |
import plotly.express as px
df = finaldata_grouped
fig = px.histogram(df, x="region", y="pct_well_served", histfunc='avg',
color='sectortype', barmode='group',
height=400,
title="Percent of Students Earning BA in 4 years, by Region")
fig.show()
The outlier on the right under “N/A or Other region” for public 4-year institution is likely representative of US Armed Forces academies which had their own classification in IPEDS for regions. It is also worth noting that in calculating the size of the student cohort and degree attainment, students who are called into active duty, injured or deceased are excluded from the data set, which would also impact the military colleges’ performance data.
df = finaldata_grouped
fig = px.histogram(df, x="region", y="pct_poorly_served", histfunc='avg',
color='sectortype', barmode='group',
height=400,
title="Percent of Students Not Earning a Degree in 8 Years, by Region")
fig.show()
These visualizations show generally little variation in terms of academic outcomes for the students served. There may be a slightly higher percentage of students who are “well-served” by Mid-Atlantic private universities, but this may, of course, be conflated with the academic competitiveness of gaining admissions to certain schools and a student’s past performance and aptitude.
Altogether, this study on U.S. universities and regional performance reveals the real depth of educational data and complexity in comparing school-to-school. As we know, student experiences vary based on their PK-12 educational preparation, household support and income, community resources and so many other factors that are out of the hands of the learner.
A more rigorous study could leverage IPEDS data on students’ SAT scores or high school GPAs, family income, post-college job placement or more to gauge the quality or impact of the university on student outcomes. It would also be interesting to dive into the specifics of one region, for example looking within the Mid Atlantic to surface deeper variation. The four-year university dataset included 2,846 colleges which are difficult to compare.
Personal note: I was glad to have the chance to engage with IPEDS data through this assignment as I will be graduating this month and working in postsecondary education consulting. This was my first foray into using this robust data set myself, though I’ve read countless studies that leverage the data. I know this is a very amateur first step to exploring here, but appreciated the chance to get more familiar with it and learn how to read the descriptions for variables a little more closely.