Importing Libraries¶

In [883]:
from pgmpy.models import BayesianModel, BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

Data¶

Loading the Data¶

In [884]:
# Loading Data
df = pd.read_csv('Financial_well_being_literacy_Romania_csv.csv', encoding='cp1252')
In [885]:
# Check the first 5 rows of the data
df.head()
Out[885]:
id NUTS3 NUTS2 SD1 SD1a SD2 SD3 age age2 SD4 ... C3 C4 C5 C6 C7 C8 weight age16_61 self FWB
0 2989 IF Bucuresti-Ilfov Urban area, with fewer than 30,000 people Urban Female 49 40-59 years 45-54 years High school (12 grades) ... False They are equally rich The same Multiple business of investments Bonds Stocks 0.054678 1 0 67
1 3346 IF Bucuresti-Ilfov Urban area, with fewer than 30,000 people Urban Female 32 16-39 years 25-34 years Bachelor and master education ... False They are equally rich The same Multiple business of investments Bonds Stocks 0.054678 1 0 75
2 2574 IF Bucuresti-Ilfov Urban area, with fewer than 30,000 people Urban Female 41 40-59 years 35-44 years Bachelor and master education ... False They are equally rich The same Multiple business of investments Bonds Stocks 0.054678 1 0 68
3 3337 IF Bucuresti-Ilfov Urban area, with fewer than 30,000 people Urban Female 44 40-59 years 35-44 years High school (12 grades) ... Don’t know They are equally rich The same Multiple business of investments Bonds Stocks 0.054678 1 0 48
4 3332 IF Bucuresti-Ilfov Urban area, with fewer than 30,000 people Urban Female 75 60+ years 65+ years Middle school (8 grades) ... True Don’t know The same Don’t know Savings deposit Don’t know 0.054678 0 0 53

5 rows × 73 columns

Looking into the Columns¶

In [886]:
# Showing the column
columns = df.columns
columns
Out[886]:
Index(['id', 'NUTS3', 'NUTS2', 'SD1', 'SD1a', 'SD2', 'SD3', 'age', 'age2',
       'SD4', 'SD5', 'SD6', 'SD7', 'SD8', 'SD9', 'I1', 'I2_1', 'I2_2', 'I2_3',
       'I2_4', 'I2_5', 'I2_6', 'I2_7', 'I2_8', 'I2_0', 'I3_1', 'I3_2', 'I3_3',
       'I3_4', 'I3_5', 'I3_6', 'I4_1', 'I4_2', 'I4_3', 'I4_4', 'I5', 'I6_1',
       'I6_2', 'I6_3', 'I6_4', 'I6_5', 'I6_7', 'I6_8', 'I6_9', 'I7_1', 'I7_2',
       'I7_3', 'I7_4', 'I7_5', 'I7_6', 'I7_0', 'A1_1', 'A1_2', 'A1_3', 'A1_4',
       'A1_5', 'A1_6', 'B1_1', 'B1_2', 'B1_3', 'B1_4', 'C1', 'C2', 'C3', 'C4',
       'C5', 'C6', 'C7', 'C8', 'weight', 'age16_61', 'self', 'FWB'],
      dtype='object')

As shown in the result above, the dataset will need to be used alongside with a table that describes each of the column. The table can be referred in the appendix.

The first 'id' is the unique identifier for each row.

The columns with 'SD' in its name, are questions relating to Sociodemographic variables.

The columns with 'I' in its name, are question relating to Financial Behaviour and Attitudes.

The columns with 'A' in its name, are question relating to Financial Well-Being.

The column with 'C; in its name, are questions relating to Financial Literacy.

There are also columns related to weight as 'weight'. As for columns 'age16_61' and 'self' are attributes that contribute to the final financial well being score. This is because, the test selected has different scoring for those who are older than 61 and those who are doing the survey by themselves. The financial well being index method used in for the paper is from the Consumer Financial Protection Bureau (CFPB) and also the financial literacy index, is also adopted from a well established standard, called the 'The Big Three' questions with additional questions that further evaluate an individual's financial literacy. This method is from Lusardi and Mitchell, 2009.

The full dataset and the accompanying paper can be downloaded here. The word file will clearly list the column id with its corresponding question, and the answer with its represented value.

In [887]:
# Checking for the total number of null values in each columns
for col_name in df.columns:
  count_nan = df[col_name].isna().sum()
  print (f"Column = {col_name}; Count of NaN = {count_nan}")
Column = id; Count of NaN = 0
Column = NUTS3; Count of NaN = 0
Column = NUTS2; Count of NaN = 0
Column = SD1; Count of NaN = 0
Column = SD1a; Count of NaN = 0
Column = SD2; Count of NaN = 0
Column = SD3; Count of NaN = 0
Column = age; Count of NaN = 0
Column = age2; Count of NaN = 0
Column = SD4; Count of NaN = 0
Column = SD5; Count of NaN = 0
Column = SD6; Count of NaN = 0
Column = SD7; Count of NaN = 0
Column = SD8; Count of NaN = 0
Column = SD9; Count of NaN = 0
Column = I1; Count of NaN = 0
Column = I2_1; Count of NaN = 0
Column = I2_2; Count of NaN = 0
Column = I2_3; Count of NaN = 0
Column = I2_4; Count of NaN = 0
Column = I2_5; Count of NaN = 0
Column = I2_6; Count of NaN = 0
Column = I2_7; Count of NaN = 0
Column = I2_8; Count of NaN = 0
Column = I2_0; Count of NaN = 0
Column = I3_1; Count of NaN = 997
Column = I3_2; Count of NaN = 997
Column = I3_3; Count of NaN = 997
Column = I3_4; Count of NaN = 997
Column = I3_5; Count of NaN = 996
Column = I3_6; Count of NaN = 997
Column = I4_1; Count of NaN = 0
Column = I4_2; Count of NaN = 0
Column = I4_3; Count of NaN = 0
Column = I4_4; Count of NaN = 0
Column = I5; Count of NaN = 0
Column = I6_1; Count of NaN = 0
Column = I6_2; Count of NaN = 0
Column = I6_3; Count of NaN = 0
Column = I6_4; Count of NaN = 0
Column = I6_5; Count of NaN = 0
Column = I6_7; Count of NaN = 0
Column = I6_8; Count of NaN = 0
Column = I6_9; Count of NaN = 0
Column = I7_1; Count of NaN = 0
Column = I7_2; Count of NaN = 0
Column = I7_3; Count of NaN = 0
Column = I7_4; Count of NaN = 0
Column = I7_5; Count of NaN = 0
Column = I7_6; Count of NaN = 0
Column = I7_0; Count of NaN = 0
Column = A1_1; Count of NaN = 0
Column = A1_2; Count of NaN = 0
Column = A1_3; Count of NaN = 0
Column = A1_4; Count of NaN = 0
Column = A1_5; Count of NaN = 0
Column = A1_6; Count of NaN = 0
Column = B1_1; Count of NaN = 0
Column = B1_2; Count of NaN = 0
Column = B1_3; Count of NaN = 0
Column = B1_4; Count of NaN = 0
Column = C1; Count of NaN = 0
Column = C2; Count of NaN = 0
Column = C3; Count of NaN = 0
Column = C4; Count of NaN = 0
Column = C5; Count of NaN = 0
Column = C6; Count of NaN = 0
Column = C7; Count of NaN = 0
Column = C8; Count of NaN = 0
Column = weight; Count of NaN = 0
Column = age16_61; Count of NaN = 0
Column = self; Count of NaN = 0
Column = FWB; Count of NaN = 0

Only I3 consisted of null values because that question asks what kind of support do individuals use to support their financial decisions, and they participants can choose up to 2 of the options available. They also had the option of not choosing any of them if it is not applicable. Hence, the larger number of null values as compared to the other columns. The following is the question.

I3. Which of the following sources do you use to support your financial decisions? Choose up to 2 answers

Yes No
I3_1 Mass-media (TV and radio) 1 2
I3_2 Online and printed newspapers 1 2
I3_3 Financial websites and mobile apps 1 2
I3_4 Advice from friends 1 2
I3_5 Personal experience and knowledge 1 2
I3_6 Other sources 1 2

We will be using all the columns available. For sociodemographic, we cannot derive a value or score to aggregate the columns, while financial behaviour & attitude, financial literacy and financial well being can. Therefore, the 3 mentioned will be aggregated into one value and produce a set of probability.

Sociodemographic Columns¶

We will be using only 3 sociaodemographic columns, because using them all, will cause the final distribution table to have many missing values as there will be instances of where they do not exists. For example, someone who is 16 of age, with high financial literacy and having an annual income of the top 10%, does not exist, which causes columns that have this particular set of characteristics, amongst others, to have empty values.

Gender¶

In [888]:
df['SD2'].describe()
Out[888]:
count       1391
unique         2
top       Female
freq         722
Name: SD2, dtype: object
In [889]:
# Printing unique value in a list format
for unique_value in list(set(df['SD2'])):
    print(unique_value)
Male
Female
In [890]:
# Histogram plot for visualization
df['SD2'].hist()
Out[890]:
<Axes: >
No description has been provided for this image
In [891]:
# Get the count of each unique value
pd.crosstab(df['SD2'], 'Count')
Out[891]:
col_0 Count
SD2
Female 722
Male 669
In [892]:
# Get the probability by normalizing the counts of unique value
gender_probability = pd.crosstab(df['SD2'], 'Probability', normalize=True)
gender_probability
Out[892]:
col_0 Probability
SD2
Female 0.519051
Male 0.480949

Age¶

In [893]:
df['SD3'].describe()
Out[893]:
count    1391.000000
mean       47.386053
std        15.071763
min        16.000000
25%        36.000000
50%        47.000000
75%        58.000000
max        90.000000
Name: SD3, dtype: float64
In [894]:
# Histogram plot for visualization
df['SD3'].hist(figsize=(7,5))
Out[894]:
<Axes: >
No description has been provided for this image

The ages are between 16 to 90. We shall break and discretize them into groups. We will start from 16 to 30 and then it will continue in the increments of 10, until we reach the age of 70, we will them group the remaining 71 and above into one group. This is because they have the lower count if we were to split the groups in 10.

In [895]:
# Setting age labels
age_labels = [
    '16-36', '37-56', '57-76', '76+'
]

# Edge value of each bins
age_groups = [15,36,56,76,90]

# Sort according to the age groups set
df['SD3_processed'] = pd.cut(df['SD3'], age_groups, labels=age_labels)
df['SD3_processed']
Out[895]:
0       37-56
1       16-36
2       37-56
3       37-56
4       57-76
        ...  
1386    37-56
1387    37-56
1388    16-36
1389    37-56
1390    37-56
Name: SD3_processed, Length: 1391, dtype: category
Categories (4, object): ['16-36' < '37-56' < '57-76' < '76+']
In [896]:
# Bar plot for visualization
df['SD3_processed'].value_counts().loc[age_labels].plot.bar()
Out[896]:
<Axes: xlabel='SD3_processed'>
No description has been provided for this image

Because the last group has too little count, we shall group the final group as 57+.

In [897]:
# Setting age labels
age_labels = [
    '16-36', '37-56', '57+'
]

# Edge value of each bins
age_groups = [15,36,56,90]

# Sort according to the age groups set
df['SD3_processed'] = pd.cut(df['SD3'], age_groups, labels=age_labels)
df['SD3_processed']
Out[897]:
0       37-56
1       16-36
2       37-56
3       37-56
4         57+
        ...  
1386    37-56
1387    37-56
1388    16-36
1389    37-56
1390    37-56
Name: SD3_processed, Length: 1391, dtype: category
Categories (3, object): ['16-36' < '37-56' < '57+']
In [898]:
# Bar plot for visualization
df['SD3_processed'].value_counts().loc[age_labels].plot.bar()
Out[898]:
<Axes: xlabel='SD3_processed'>
No description has been provided for this image
In [899]:
# Get the count of each unique value
pd.crosstab(df['SD3_processed'], 'Count')
Out[899]:
col_0 Count
SD3_processed
16-36 366
37-56 650
57+ 375
In [900]:
# Get the probability by normalizing the counts of unique value
age_probability = pd.crosstab(df['SD3_processed'], 'Probability', normalize=True)
age_probability
Out[900]:
col_0 Probability
SD3_processed
16-36 0.26312
37-56 0.46729
57+ 0.26959

Educational Attainment¶

In [901]:
df["SD4"].describe()
Out[901]:
count                        1391
unique                          5
top       High school (12 grades)
freq                          697
Name: SD4, dtype: object
In [902]:
# Printing unique value in a list format
for unique_value in list(set(df['SD4'])):
	print(unique_value)
High school (12 grades)
Bachelor and master education
Middle school (8 grades)
Primary school (4 grades) 
Post-graduate education
In [903]:
# Educational Attainment in ascending order
education_order = [
    'Primary school (4 grades) ',
    'Middle school (8 grades)',
    'High school (12 grades)',
    'Bachelor and master education',
    'Post-graduate education'
]

# Bar plot for visualization
df['SD4'].value_counts().loc[education_order].plot.bar()
Out[903]:
<Axes: xlabel='SD4'>
No description has been provided for this image

Similarly, we can see that 'Post-graduate education' and 'Primary school (4 grades), has very low count, we will group them according to the adjacent groups.

In [904]:
education_group_changes = {
    'Primary school (4 grades) ': 'Middle school (8 grades) and below',
    'Middle school (8 grades)': 'Middle school (8 grades) and below',
    'Bachelor and master education': 'Bachelor and above',
    'Post-graduate education': 'Bachelor and above'
}

df['SD4_processed'] = df['SD4'].replace(education_group_changes)
df['SD4_processed']
Out[904]:
0                  High school (12 grades)
1                       Bachelor and above
2                       Bachelor and above
3                  High school (12 grades)
4       Middle school (8 grades) and below
                       ...                
1386                    Bachelor and above
1387               High school (12 grades)
1388               High school (12 grades)
1389               High school (12 grades)
1390               High school (12 grades)
Name: SD4_processed, Length: 1391, dtype: object
In [905]:
# Get the count of each unique value
pd.crosstab(df['SD4_processed'], 'Count')
Out[905]:
col_0 Count
SD4_processed
Bachelor and above 547
High school (12 grades) 697
Middle school (8 grades) and below 147
In [906]:
# Get the probability by normalizing the counts of unique value
education_probability = pd.crosstab(df['SD4_processed'], 'Probability', normalize=True)
education_probability
Out[906]:
col_0 Probability
SD4_processed
Bachelor and above 0.393242
High school (12 grades) 0.501078
Middle school (8 grades) and below 0.105679
In [907]:
# Educational Attainment in ascending order
education_order = [
    'Middle school (8 grades) and below',
    'High school (12 grades)',
    'Bachelor and above'
]

# Bar plot for visualization
df['SD4_processed'].value_counts().loc[education_order].plot.bar()
Out[907]:
<Axes: xlabel='SD4_processed'>
No description has been provided for this image

Financial Behaviour and Attitude¶

We will need to make some general assumptions, using background knowledge of the field, to give each column a score, and then we make an aggregate score in the end to assess the final financial behaviour and attitude. The steps taken to do so will be shown in the following.

After carefully going through the lists of questions for financial behaviour and attitude, we noted that there are only I1 and I2 that we can properly give a value ranking the financial behaviour and attitude being financially favourable or not. Therefore, the aggregate of these two will be used represent the value of Financial Behaviour and Attitude.

Record Keeping¶

The question in the survey is 'Do you or other person in your household keep a record of income and expenses on a monthly basis?'

In [908]:
df['I1'].describe()
Out[908]:
count                                                  1391
unique                                                    4
top       No, we don’t keep records, but we know how muc...
freq                                                    550
Name: I1, dtype: object
In [909]:
# Printing unique value in a list format
for unique_value in list(set(df['I1'])):
    print(unique_value)
Yes, we keep records, but not all revenues and expenses are recorded
No, we don’t keep records, but we know how much money we earn and spend during a month
No, we don’t keep records, and we don’t know how much money we earn and spend during a month
Yes, we keep records of all revenues and all expenses
In [910]:
# Score given to each of the statement
record_keeping_assessment = {
    'Yes, we keep records of all revenues and all expenses': 3,
    'No, we don’t keep records, and we don’t know how much money we earn and spend during a month':0,
    'No, we don’t keep records, but we know how much money we earn and spend during a month': 1,
    'Yes, we keep records, but not all revenues and expenses are recorded':2
}

# Replacing the statement with the score
df['I1_processed'] = df['I1'].replace(record_keeping_assessment)

# Check the unprocessed column with the processed column
df[['I1','I1_processed']]
/var/folders/4d/3zg2grqx5kj9w9kfm1mqfcvw0000gn/T/ipykernel_26276/1258281535.py:10: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df['I1_processed'] = df['I1'].replace(record_keeping_assessment)
Out[910]:
I1 I1_processed
0 Yes, we keep records, but not all revenues and... 2
1 No, we don’t keep records, but we know how muc... 1
2 Yes, we keep records, but not all revenues and... 2
3 Yes, we keep records, but not all revenues and... 2
4 No, we don’t keep records, but we know how muc... 1
... ... ...
1386 Yes, we keep records, but not all revenues and... 2
1387 No, we don’t keep records, but we know how muc... 1
1388 No, we don’t keep records, but we know how muc... 1
1389 Yes, we keep records of all revenues and all e... 3
1390 No, we don’t keep records, but we know how muc... 1

1391 rows × 2 columns

In [911]:
# Bar plot for visualization
df['I1_processed'].value_counts().loc[[0,1,2,3]].plot.bar()
Out[911]:
<Axes: xlabel='I1_processed'>
No description has been provided for this image

Money Invested¶

The question in the survey for this is 'In the past three years, have you saved or invested money in any of the following instruments'. We will be awarding more points for those that generally require a little bit more financial knowledge to apply or invest in. Such as Stocks, Bonds, Real Estate, etc. It is a multiple choice questions, and the answer is split into multiple columns. Therefore, we will need to replace some of the values according to the assessment and the summing it into one final column. The following are the columns and the corresponding options in the survey.

I2_0 I have not saved or invested
I2_1 Savings deposit
I2_2 Stocks
I2_3 Bonds
I2_4 Real estate
I2_5 Investment funds
I2_6 Life insurance
I2_7 Cryptocurrency
I2_8 I saved and kept money at home
In [912]:
# Create a list for the columns
list_of_columns_for_money_invested = []
for i in range(1,9):
    list_of_columns_for_money_invested.append("I2_" + str(i))

df[['I2_0',*list_of_columns_for_money_invested]]
Out[912]:
I2_0 I2_1 I2_2 I2_3 I2_4 I2_5 I2_6 I2_7 I2_8
0 No Yes No No Yes No Yes No No
1 No Yes No No Yes No No No Yes
2 No Yes No No Yes No Yes No No
3 No No No No No No No No Yes
4 No Yes No No No No No No Yes
... ... ... ... ... ... ... ... ... ...
1386 No Yes No Yes Yes No No No No
1387 No Yes No No No No No No Yes
1388 No No No No Yes No No No No
1389 Yes No No No No No No No No
1390 Yes No No No No No No No No

1391 rows × 9 columns

The first column, which asks if participant has saved or invested or not, will be processed separately because of the way the questions is ask as compared to the rest. When you answer yes, it is a negative outcome while the rest of the columns, they are positive when you were to answer yes.

In [913]:
# Score given to each of financial instrument, these are the values applied if it is a yes.
financial_instrument_assessment = [2,3,3,3,3,3,3,1]

# Looping through the columns and replacing yes with the values listed in the list above, while a no will be 0
for no, column in enumerate(list_of_columns_for_money_invested):
    df[column + '_processed'] = df[column].apply(
        lambda x: financial_instrument_assessment[no] if x == 'Yes' else 0
    )

df['I2_0_processed'] = df['I2_0'].apply(lambda x: 0 if x == "Yes" else 1)

# Create a list for the processed columns
list_of_columns_for_money_invested_processed = [x + '_processed' for x in list_of_columns_for_money_invested]

# Include the first column that was processed separately
list_of_columns_for_money_invested_processed.append('I2_0_processed')

# Sort the list in ascending order.
list_of_columns_for_money_invested_processed.sort()

df[list_of_columns_for_money_invested_processed]
Out[913]:
I2_0_processed I2_1_processed I2_2_processed I2_3_processed I2_4_processed I2_5_processed I2_6_processed I2_7_processed I2_8_processed
0 1 2 0 0 3 0 3 0 0
1 1 2 0 0 3 0 0 0 1
2 1 2 0 0 3 0 3 0 0
3 1 0 0 0 0 0 0 0 1
4 1 2 0 0 0 0 0 0 1
... ... ... ... ... ... ... ... ... ...
1386 1 2 0 3 3 0 0 0 0
1387 1 2 0 0 0 0 0 0 1
1388 1 0 0 0 3 0 0 0 0
1389 0 0 0 0 0 0 0 0 0
1390 0 0 0 0 0 0 0 0 0

1391 rows × 9 columns

Aggregate into Financial Behaviour Column¶

Now that we have a list of processed columns, we will combine them into one final column, and then aggregate it once more with the processed record keeping to get the final financial behaviour and attitude value.

In [914]:
# Sum all the processed columns of I2
df['I2_processed'] = df[list_of_columns_for_money_invested_processed].sum(axis=1)
df['I2_processed']
Out[914]:
0       9
1       7
2       9
3       2
4       4
       ..
1386    9
1387    4
1388    4
1389    0
1390    0
Name: I2_processed, Length: 1391, dtype: int64
In [915]:
# Bar plot for visualization
df['I2_processed'].value_counts().plot.bar(figsize=(10,7))
Out[915]:
<Axes: xlabel='I2_processed'>
No description has been provided for this image
In [916]:
# Showing both columns together
df[['I2_processed', 'I1_processed']]
Out[916]:
I2_processed I1_processed
0 9 2
1 7 1
2 9 2
3 2 2
4 4 1
... ... ...
1386 9 2
1387 4 1
1388 4 1
1389 0 3
1390 0 1

1391 rows × 2 columns

In [917]:
# Combining both I1 Processed and I2 Processed for Financial Behaviour and Attitude Index
df['I1&I2'] = ( (df['I2_processed'] * 0.5) + (df['I1_processed'] * 0.5) )
df['I1&I2']
Out[917]:
0       5.5
1       4.0
2       5.5
3       2.0
4       2.5
       ... 
1386    5.5
1387    2.5
1388    2.5
1389    1.5
1390    0.5
Name: I1&I2, Length: 1391, dtype: float64
In [918]:
df['I1&I2'].describe()
Out[918]:
count    1391.000000
mean        1.706326
std         1.268400
min         0.000000
25%         0.500000
50%         1.500000
75%         2.500000
max         6.500000
Name: I1&I2, dtype: float64
In [919]:
# Visualize the data
df['I1&I2'].hist()
Out[919]:
<Axes: >
No description has been provided for this image
In [920]:
# Setting the Financial Behaviour labels
financial_behaviour_score_labels = ['0-2', '2-4', '4+']

# Edge values for each bin
financial_behaviour_groups = [-1,2,4,7]

# Group the values into the three groups
df['Financial Behaviour'] = pd.cut(df['I1&I2'], financial_behaviour_groups, labels=financial_behaviour_score_labels)

df[['I1&I2','Financial Behaviour']]
Out[920]:
I1&I2 Financial Behaviour
0 5.5 4+
1 4.0 2-4
2 5.5 4+
3 2.0 0-2
4 2.5 2-4
... ... ...
1386 5.5 4+
1387 2.5 2-4
1388 2.5 2-4
1389 1.5 0-2
1390 0.5 0-2

1391 rows × 2 columns

In [921]:
# Visualize final financial behaviour column
df['Financial Behaviour'].value_counts().plot.bar()
Out[921]:
<Axes: xlabel='Financial Behaviour'>
No description has been provided for this image
In [922]:
# Get the count of each unique value
pd.crosstab(df['Financial Behaviour'], 'Count')
Out[922]:
col_0 Count
Financial Behaviour
0-2 965
2-4 347
4+ 79
In [923]:
# Get the probability by normalizing the counts of unique value
financial_behaviour_probability = pd.crosstab(df['Financial Behaviour'], 'Probability', normalize=True)
financial_behaviour_probability
Out[923]:
col_0 Probability
Financial Behaviour
0-2 0.693746
2-4 0.249461
4+ 0.056794

Financial Literacy¶

The financial literacy questions, are those with column name that starts with 'C'. The questions each have a correct answer as this is more like a test. Even though, the survey participants could be wrong, but they will still be awarded some points for being close to the answer.

Because the values that are collect in this dataset are the answer value rather than the points awarded, we will need to go column by column and process them, and then aggregate them into one column, similar to the financial behaviour column done above.

Note that, in the specification paper for the survey, the lower point of 1 is the answer, while 3 is the option furthest from the answer, and 0 indicates individual who did not answer the question. Therefore, we will need to re-calibrate the values accordingly.

Question 1¶

The question in the survey is 'Which of the following represents the highest probability of something happening?'. The following are the choices and corresponding points awarded.

1 in 10 1
1 in 1,000 2
1 in 1,000,000 3
Don’t know 0
In [924]:
df['C1'].describe()
Out[924]:
count        1391
unique          4
top       1 in 10
freq          537
Name: C1, dtype: object
In [925]:
# Printing unique value in a list format
for unique_value in list(set(df['C1'])):
	print(unique_value)
1 in 1,000,000
1 in 10
1 in 1,000
Don’t know
In [926]:
list_of_columns_for_financial_literacy = []
In [927]:
# Make a dictionary based on the values awarded
c1_score = {
    'Don’t know':0,
    '1 in 10':3,
    '1 in 1,000':2,
    '1 in 1,000,000':1
}

# Temp variable for new column name
temp_col = 'C1_processed'

# Append new column name
list_of_columns_for_financial_literacy.append(temp_col)

df[temp_col] = df['C1'].replace(c1_score)
df[['C1',temp_col]]
/var/folders/4d/3zg2grqx5kj9w9kfm1mqfcvw0000gn/T/ipykernel_26276/1883059767.py:15: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df[temp_col] = df['C1'].replace(c1_score)
Out[927]:
C1 C1_processed
0 1 in 1,000 2
1 1 in 1,000 2
2 1 in 1,000 2
3 1 in 1,000 2
4 1 in 1,000 2
... ... ...
1386 1 in 1,000 2
1387 1 in 1,000 2
1388 Don’t know 0
1389 Don’t know 0
1390 Don’t know 0

1391 rows × 2 columns

Question 2¶

'Suppose you had LEI 100 in a savings account and the interest rate was 10 percent per year. After 5 years, how much do you think you would have in the account if you left the money to grow? '

More than LEI 150 1
Exactly LEI 150 lei 2
Less than LEI 150 lei 3
Don’t know 0
In [928]:
df['C2'].describe()
Out[928]:
count                   1391
unique                     4
top       More than LEI 150 
freq                     464
Name: C2, dtype: object
In [929]:
# Printing unique value in a list format
for unique_value in list(set(df['C2'])):
	print(unique_value)
Don’t know
More than LEI 150 
Less than LEI 150
Exactly LEI 150 
In [930]:
# Make a dictionary based on the values awarded
c2_score = {
    'Don’t know':0,
    'More than LEI 150 ':3,
    'Exactly LEI 150 ':2,
    'Less than LEI 150':1
}

# Temp variable for new column name
temp_col = 'C2_processed'

# Append new column name
list_of_columns_for_financial_literacy.append(temp_col)

df[temp_col] = df['C2'].replace(c2_score)
df[['C2',temp_col]]
/var/folders/4d/3zg2grqx5kj9w9kfm1mqfcvw0000gn/T/ipykernel_26276/670163980.py:15: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df[temp_col] = df['C2'].replace(c2_score)
Out[930]:
C2 C2_processed
0 Exactly LEI 150 2
1 Exactly LEI 150 2
2 Exactly LEI 150 2
3 Exactly LEI 150 2
4 Exactly LEI 150 2
... ... ...
1386 Exactly LEI 150 2
1387 Exactly LEI 150 2
1388 More than LEI 150 3
1389 Exactly LEI 150 2
1390 Don’t know 0

1391 rows × 2 columns

Question 3¶

'True or false: A 15-year mortgage typically requires higher monthly payments than a 30-year mortgage but the total interest over the life of the loan will be less'

True 1
False 2
Don’t know 0
In [931]:
df['C3'].describe()
Out[931]:
count     1391
unique       3
top       True
freq       753
Name: C3, dtype: object
In [932]:
# Printing unique value in a list format
for unique_value in list(set(df['C3'])):
	print(unique_value)
Don’t know
True
False
In [933]:
# Make a dictionary based on the values awarded
c3_score = {
    'Don’t know':0,
    'True':3,
    'False':1
}

# Temp variable for new column name
temp_col = 'C3_processed'

# Append new column name
list_of_columns_for_financial_literacy.append(temp_col)

df[temp_col] = df['C3'].replace(c3_score)
df[['C3',temp_col]]
/var/folders/4d/3zg2grqx5kj9w9kfm1mqfcvw0000gn/T/ipykernel_26276/1063833420.py:14: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df[temp_col] = df['C3'].replace(c3_score)
Out[933]:
C3 C3_processed
0 False 1
1 False 1
2 False 1
3 Don’t know 0
4 True 3
... ... ...
1386 Don’t know 0
1387 Don’t know 0
1388 True 3
1389 Don’t know 0
1390 True 3

1391 rows × 2 columns

Question 4¶

'Assume a friend inherits LEI 50,000 and his sibling inherits LEI 50,000 3 years from now. Who is richer because of the inheritance?'

They are equally rich 1
My friend 2
His sibling 3
Don’t know 0

Based on the specification paper and the research paper, there are some conflict in the data presented. The answer to this question is 'My Friend', it is also stated in the research paper. Therefore, the score of 'My Friend' will set to the highest.

In [934]:
df['C4'].describe()
Out[934]:
count                      1391
unique                        4
top       They are equally rich
freq                        516
Name: C4, dtype: object
In [935]:
# Printing unique value in a list format
for unique_value in list(set(df['C4'])):
	print(unique_value)
His sibling
They are equally rich
My friend
Don’t know
In [936]:
# Make a dictionary based on the values awarded
c4_score = {
    'Don’t know':0,
    'My friend':3,
    'They are equally rich': 2,
    'His sibling':1
}

# Temp variable for new column name
temp_col = 'C4_processed'

# Append new column name
list_of_columns_for_financial_literacy.append(temp_col)

df[temp_col] = df['C4'].replace(c4_score)
df[['C4',temp_col]]
/var/folders/4d/3zg2grqx5kj9w9kfm1mqfcvw0000gn/T/ipykernel_26276/3573706696.py:15: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df[temp_col] = df['C4'].replace(c4_score)
Out[936]:
C4 C4_processed
0 They are equally rich 2
1 They are equally rich 2
2 They are equally rich 2
3 They are equally rich 2
4 Don’t know 0
... ... ...
1386 They are equally rich 2
1387 My friend 3
1388 His sibling 1
1389 They are equally rich 2
1390 Don’t know 0

1391 rows × 2 columns

Question 5¶

'Suppose over the next 10 years the prices of things you buy double. If your income also doubles, will you be able to buy less than you can buy today, the same as you can buy today, or more than you can buy today?'

More 1
Less 2
The same 3
Don’t know 0
In [937]:
df['C5'].describe()
Out[937]:
count         1391
unique           4
top       The same
freq           635
Name: C5, dtype: object
In [938]:
# Printing unique value in a list format
for unique_value in list(set(df['C5'])):
	print(unique_value)
More
The same
Less
Don’t know
In [939]:
# Make a dictionary based on the values awarded
c5_score = {
    'Don’t know':0,
    'The same':3,
    'Less': 2,
    'More':1
}

# Temp variable for new column name
temp_col = 'C5_processed'

# Append new column name
list_of_columns_for_financial_literacy.append(temp_col)

df[temp_col] = df['C5'].replace(c5_score)
df[['C5',temp_col]]
/var/folders/4d/3zg2grqx5kj9w9kfm1mqfcvw0000gn/T/ipykernel_26276/4079600377.py:15: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df[temp_col] = df['C5'].replace(c5_score)
Out[939]:
C5 C5_processed
0 The same 3
1 The same 3
2 The same 3
3 The same 3
4 The same 3
... ... ...
1386 The same 3
1387 The same 3
1388 Don’t know 0
1389 Don’t know 0
1390 Don’t know 0

1391 rows × 2 columns

Question 6¶

' Suppose you have some money. It is safer to put your money into one business or investment, or to put your money into multiple businesses or investments? '

One business or investment 1
Multiple business of investments 2
Don’t know 0

Similarly, to question 4, where the correct answer is 'Multiple business of investments', but the score here is not 1, and we will have to make changes accordingly.

In [940]:
df['C6'].describe()
Out[940]:
count                                 1391
unique                                   3
top       Multiple business of investments
freq                                   566
Name: C6, dtype: object
In [941]:
# Printing unique value in a list format
for unique_value in list(set(df['C6'])):
	print(unique_value)
Multiple business of investments
One business or investment
Don’t know
In [942]:
# Make a dictionary based on the values awarded
c6_score = {
    'Don’t know':0,
    'Multiple business of investments':3,
    'One business or investment':1
}

# Temp variable for new column name
temp_col = 'C6_processed'

# Append new column name
list_of_columns_for_financial_literacy.append(temp_col)

df[temp_col] = df['C6'].replace(c6_score)
df[['C6',temp_col]]
/var/folders/4d/3zg2grqx5kj9w9kfm1mqfcvw0000gn/T/ipykernel_26276/982456800.py:14: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df[temp_col] = df['C6'].replace(c6_score)
Out[942]:
C6 C6_processed
0 Multiple business of investments 3
1 Multiple business of investments 3
2 Multiple business of investments 3
3 Multiple business of investments 3
4 Don’t know 0
... ... ...
1386 Multiple business of investments 3
1387 Multiple business of investments 3
1388 One business or investment 1
1389 Don’t know 0
1390 Don’t know 0

1391 rows × 2 columns

Question 7¶

' Considering a long time period (for example, 10 or 20 years), which asset normally gives the highest return? '

Savings deposit 1
Bonds 2
Stocks 3
Don’t know 0

Similar to question 4 and 6, where the values are inconsistent pattern as before. This time, it is according to the values we would want.

In [943]:
df['C7'].describe()
Out[943]:
count           1391
unique             4
top       Don’t know
freq             541
Name: C7, dtype: object
In [944]:
# Printing unique value in a list format
for unique_value in list(set(df['C7'])):
    print(unique_value)
Savings deposit
Bonds
Stocks
Don’t know
In [945]:
# Make a dictionary based on the values awarded
c7_score = {
    'Don’t know':0,
    'Stocks':3,
    'Bonds': 2,
    'Savings deposit':1
}

# Temp variable for new column name
temp_col = 'C7_processed'

# Append new column name
list_of_columns_for_financial_literacy.append(temp_col)

df[temp_col] = df['C7'].replace(c7_score)
df[['C7',temp_col]]
/var/folders/4d/3zg2grqx5kj9w9kfm1mqfcvw0000gn/T/ipykernel_26276/4083590857.py:15: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df[temp_col] = df['C7'].replace(c7_score)
Out[945]:
C7 C7_processed
0 Bonds 2
1 Bonds 2
2 Bonds 2
3 Bonds 2
4 Savings deposit 1
... ... ...
1386 Bonds 2
1387 Bonds 2
1388 Don’t know 0
1389 Don’t know 0
1390 Don’t know 0

1391 rows × 2 columns

Question 8¶

' Normally, which asset displays the highest fluctuations over time? '

Savings deposit 1
Bonds 2
Stocks 3
Don’t know 0

Similar, to the 3 questions above, the value shown in the table are according to the correct answer

In [946]:
df['C8'].describe()
Out[946]:
count           1391
unique             4
top       Don’t know
freq             576
Name: C8, dtype: object
In [947]:
# Printing unique value in a list format
for unique_value in list(set(df['C8'])):
	print(unique_value)
Savings deposit
Bonds
Stocks
Don’t know
In [948]:
# Make a dictionary based on the values awarded
c8_score = {
    'Don’t know':0,
    'Stocks':3,
    'Bonds': 2,
    'Savings deposit':1
}

# Temp variable for new column name
temp_col = 'C8_processed'

# Append new column name
list_of_columns_for_financial_literacy.append(temp_col)

df[temp_col] = df['C8'].replace(c8_score)
df[['C8',temp_col]]
/var/folders/4d/3zg2grqx5kj9w9kfm1mqfcvw0000gn/T/ipykernel_26276/1833694699.py:15: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  df[temp_col] = df['C8'].replace(c8_score)
Out[948]:
C8 C8_processed
0 Stocks 3
1 Stocks 3
2 Stocks 3
3 Stocks 3
4 Don’t know 0
... ... ...
1386 Stocks 3
1387 Stocks 3
1388 Don’t know 0
1389 Don’t know 0
1390 Don’t know 0

1391 rows × 2 columns

Get the sum of all the columns and then group them into categories¶

In [949]:
df[list_of_columns_for_financial_literacy]
Out[949]:
C1_processed C2_processed C3_processed C4_processed C5_processed C6_processed C7_processed C8_processed
0 2 2 1 2 3 3 2 3
1 2 2 1 2 3 3 2 3
2 2 2 1 2 3 3 2 3
3 2 2 0 2 3 3 2 3
4 2 2 3 0 3 0 1 0
... ... ... ... ... ... ... ... ...
1386 2 2 0 2 3 3 2 3
1387 2 2 0 3 3 3 2 3
1388 0 3 3 1 0 1 0 0
1389 0 2 0 2 0 0 0 0
1390 0 0 3 0 0 0 0 0

1391 rows × 8 columns

In [950]:
# Getting the sum of the 8 columns into a column
df['C1-C8'] = df[list_of_columns_for_financial_literacy].sum(axis=1)
df[[*list_of_columns_for_financial_literacy,'C1-C8']]
Out[950]:
C1_processed C2_processed C3_processed C4_processed C5_processed C6_processed C7_processed C8_processed C1-C8
0 2 2 1 2 3 3 2 3 18
1 2 2 1 2 3 3 2 3 18
2 2 2 1 2 3 3 2 3 18
3 2 2 0 2 3 3 2 3 17
4 2 2 3 0 3 0 1 0 11
... ... ... ... ... ... ... ... ... ...
1386 2 2 0 2 3 3 2 3 17
1387 2 2 0 3 3 3 2 3 18
1388 0 3 3 1 0 1 0 0 8
1389 0 2 0 2 0 0 0 0 4
1390 0 0 3 0 0 0 0 0 3

1391 rows × 9 columns

In [951]:
df['C1-C8'].describe()
Out[951]:
count    1391.000000
mean       13.636233
std         5.604593
min         0.000000
25%        10.000000
50%        15.000000
75%        18.000000
max        24.000000
Name: C1-C8, dtype: float64
In [952]:
df['C1-C8'].hist()
Out[952]:
<Axes: >
No description has been provided for this image
In [953]:
# Setting the Financial Behaviour labels
financial_literacy_score_labels = ['0-5', '6-10', '11-15', '16-20', '+20']

# Edge values for each bin
financial_literacy_groups = [-1,5,10,15,20,25]

# Group the values into the three groups
df['Financial Literacy'] = pd.cut(df['C1-C8'], financial_literacy_groups, labels=financial_literacy_score_labels)

df[['C1-C8','Financial Literacy']]
Out[953]:
C1-C8 Financial Literacy
0 18 16-20
1 18 16-20
2 18 16-20
3 17 16-20
4 11 11-15
... ... ...
1386 17 16-20
1387 18 16-20
1388 8 6-10
1389 4 0-5
1390 3 0-5

1391 rows × 2 columns

In [954]:
# Visualize final financial literacy column
df['Financial Literacy'].value_counts().loc[financial_literacy_score_labels].plot.bar()
Out[954]:
<Axes: xlabel='Financial Literacy'>
No description has been provided for this image
In [955]:
# Get the count of each unique value
pd.crosstab(df['Financial Literacy'], 'Count')
Out[955]:
col_0 Count
Financial Literacy
0-5 144
6-10 218
11-15 421
16-20 495
+20 113
In [956]:
# Get the probability by normalizing the counts of unique value
financial_literacy_probability = pd.crosstab(df['Financial Literacy'], 'Probability', normalize=True)
financial_literacy_probability
Out[956]:
col_0 Probability
Financial Literacy
0-5 0.103523
6-10 0.156722
11-15 0.302660
16-20 0.355859
+20 0.081237

Financial Well Being¶

As for financial well being, they have already given the values in the dataset. It will require quite a bit of preprocessing as not only do we need to process the scores of each question, similar to the financial literacy, but also take age into consideration, prepare 2 custom list of scores to be used, according to the information on CFPB. The score from the questions will need to be converted based on the tables provided by CFPB. One table for those that are above 61 or not, and the other is whether it is self administered or assisted test.

In [957]:
df['FWB'].describe()
Out[957]:
count    1391.000000
mean       51.118620
std         9.963438
min        16.000000
25%        45.000000
50%        51.000000
75%        57.000000
max        91.000000
Name: FWB, dtype: float64
In [958]:
df['FWB'].hist()
Out[958]:
<Axes: >
No description has been provided for this image
In [959]:
# Setting the Financial Behaviour labels
financial_well_being_labels = ['Very Low', 'Low', 'Medium', 'High', 'Very High']

# Group the values into the 5 groups
df['Financial Well Being'] = pd.cut(df['FWB'], 5, labels=financial_well_being_labels)

df[['FWB','Financial Well Being']]
Out[959]:
FWB Financial Well Being
0 67 High
1 75 High
2 68 High
3 48 Medium
4 53 Medium
... ... ...
1386 65 High
1387 48 Medium
1388 60 Medium
1389 55 Medium
1390 53 Medium

1391 rows × 2 columns

In [960]:
# Visualize the final financial well being column
df['Financial Well Being'].value_counts().loc[financial_well_being_labels].plot.bar()
Out[960]:
<Axes: xlabel='Financial Well Being'>
No description has been provided for this image
In [961]:
# Get the count of each unique group
pd.crosstab(df['Financial Well Being'], 'Count')
Out[961]:
col_0 Count
Financial Well Being
Very Low 34
Low 393
Medium 771
High 178
Very High 15
In [962]:
# Get the probability by normalizing the counts of unique value
financial_well_being_probability = pd.crosstab(df['Financial Well Being'], 'Probability', normalize=True)
financial_well_being_probability
Out[962]:
col_0 Probability
Financial Well Being
Very Low 0.024443
Low 0.282531
Medium 0.554277
High 0.127965
Very High 0.010784

Defining the Network¶

We will start defining the network and making a list of the connection that they have. We will be connecting the Sociodemographic columns, as they are more descriptive and inherit characteristics of an individual, to both Financial Literacy and Financial Behaviour. This is because given the traits an individual have, it will affect their financial literacy and behaviour. Then, both these financial behaviour and financial literacy will be connected to financial well being, as we would like to explore the relationship between the three of them.

In [963]:
model = BayesianNetwork(
    [
        ('Gender', 'Financial Literacy'),
        ('Gender', 'Financial Behaviour'),
        ('Age', 'Financial Literacy'),
        ('Age', 'Financial Behaviour'),
        ('Education', 'Financial Literacy'),
        ('Education', 'Financial Behaviour'),
        ('Financial Literacy', 'Financial Well Being'),
        ('Financial Behaviour', 'Financial Well Being')
    ]
)

Defining the CPD to the Bayesian Network¶

Gender¶

In [964]:
gender_probability
Out[964]:
col_0 Probability
SD2
Female 0.519051
Male 0.480949
In [965]:
gender_cpd = TabularCPD(
    variable='Gender',
    variable_card=2,
    values = [[0.519051],[0.480949]],
    state_names={
        'Gender':[
            'Female',
            'Male'
        ]
    }
)
In [966]:
model.add_cpds(gender_cpd)
In [967]:
print(model.get_cpds('Gender'))
+----------------+----------+
| Gender(Female) | 0.519051 |
+----------------+----------+
| Gender(Male)   | 0.480949 |
+----------------+----------+

Age¶

In [968]:
age_probability
Out[968]:
col_0 Probability
SD3_processed
16-36 0.26312
37-56 0.46729
57+ 0.26959
In [969]:
age_cpd = TabularCPD(
    variable='Age',
    variable_card=3,
    values=[
        [0.26312],[0.46729],[0.26959]
    ],
    state_names={
        'Age':[
            '16-36', '37-56', '76+'
        ]
    }
)
In [970]:
model.add_cpds(age_cpd)
In [971]:
print(model.get_cpds('Age'))
+------------+---------+
| Age(16-36) | 0.26312 |
+------------+---------+
| Age(37-56) | 0.46729 |
+------------+---------+
| Age(76+)   | 0.26959 |
+------------+---------+

Education¶

In [972]:
education_probability
Out[972]:
col_0 Probability
SD4_processed
Bachelor and above 0.393242
High school (12 grades) 0.501078
Middle school (8 grades) and below 0.105679
In [973]:
income_cpd = TabularCPD(
    variable='Education',
    variable_card=3,
    values=[
        [0.3932],[0.5011],[0.1057],
    ],
    state_names={
        'Education':[
            'Middle school (8 grades) and below',
            'High school (12 grades)',
            'Bachelor and above',
        ]
    }
)
In [974]:
model.add_cpds(income_cpd)
In [975]:
print(model.get_cpds('Education'))
+-----------------------------------------------+--------+
| Education(Middle school (8 grades) and below) | 0.3932 |
+-----------------------------------------------+--------+
| Education(High school (12 grades))            | 0.5011 |
+-----------------------------------------------+--------+
| Education(Bachelor and above)                 | 0.1057 |
+-----------------------------------------------+--------+

Financial Literacy¶

In [976]:
sociodemographic_columns = []
sociodemographic_columns.append('SD2')
sociodemographic_columns.append('SD3_processed')
sociodemographic_columns.append('SD4_processed')

FL_socio_probability = pd.crosstab(
    df['Financial Literacy'],
    [df['SD2'],df['SD3_processed'],df['SD4_processed']],
    normalize='columns',dropna=False,margins=True
)

#FL_socio_probability = FL_socio_probability.applymap('{:.4f}'.format)
#FL_socio_probability = FL_socio_probability.astype(np.float16)

FL_socio_probability
Out[976]:
SD2 Female Male All
SD3_processed 16-36 37-56 57+ 16-36 37-56 57+
SD4_processed Bachelor and above High school (12 grades) Middle school (8 grades) and below Bachelor and above High school (12 grades) Middle school (8 grades) and below Bachelor and above High school (12 grades) Middle school (8 grades) and below Bachelor and above High school (12 grades) Middle school (8 grades) and below Bachelor and above High school (12 grades) Middle school (8 grades) and below Bachelor and above High school (12 grades) Middle school (8 grades) and below
Financial Literacy
0-5 0.020690 0.097222 0.250 0.058065 0.052632 0.615385 0.043478 0.164706 0.46 0.013514 0.052632 0.3 0.030303 0.069892 0.346154 0.058824 0.103175 0.500 0.103523
6-10 0.055172 0.083333 0.250 0.064516 0.152047 0.153846 0.086957 0.270588 0.26 0.040541 0.175439 0.5 0.121212 0.193548 0.230769 0.098039 0.285714 0.325 0.156722
11-15 0.337931 0.319444 0.125 0.277419 0.362573 0.230769 0.260870 0.400000 0.26 0.243243 0.228070 0.1 0.313131 0.301075 0.346154 0.215686 0.333333 0.150 0.302660
16-20 0.441379 0.472222 0.375 0.496774 0.415205 0.000000 0.565217 0.141176 0.02 0.432432 0.473684 0.1 0.404040 0.370968 0.038462 0.372549 0.238095 0.025 0.355859
+20 0.144828 0.027778 0.000 0.103226 0.017544 0.000000 0.043478 0.023529 0.00 0.270270 0.070175 0.0 0.131313 0.064516 0.038462 0.254902 0.039683 0.000 0.081237
In [977]:
# Check for sum of every column 
FL_testing = FL_socio_probability[FL_socio_probability.columns].sum()

for FL in FL_testing:
    print(FL)

print('Sum: ', sum(FL_testing))
1.0
1.0
1.0
1.0
1.0
1.0
0.9999999999999999
1.0
1.0
1.0
1.0
1.0
1.0
1.0
0.9999999999999999
1.0
0.9999999999999999
1.0
1.0
Sum:  19.0
In [978]:
distinct_value_count_for_socio_col = [len(list(set(df[x]))) for x in sociodemographic_columns]
distinct_value_count_for_socio_col
Out[978]:
[2, 3, 3]
In [979]:
[x for x in FL_socio_probability.index]
Out[979]:
['0-5', '6-10', '11-15', '16-20', '+20']
In [980]:
FL_socio_cpd = TabularCPD(
    variable='Financial Literacy',
    variable_card=5,
    evidence=['Gender', 'Age','Education'],
    evidence_card=distinct_value_count_for_socio_col,
    values = [
        [x for x in FL_socio_probability.loc['0-5'][:-1]],
        [x for x in FL_socio_probability.loc['6-10'][:-1]],
        [x for x in FL_socio_probability.loc['11-15'][:-1]],
        [x for x in FL_socio_probability.loc['16-20'][:-1]],
        [x for x in FL_socio_probability.loc['+20'][:-1]]
    ],
    state_names = {
        'Financial Literacy':[x for x in FL_socio_probability.index],

        'Gender':[
            'Female',
            'Male'
        ],

        'Age':[
            '16-36', '37-56', '76+'
        ],

        'Education':[
            'Middle school (8 grades) and below',
            'High school (12 grades)',
            'Bachelor and above',
        ]
    }
)
In [981]:
model.add_cpds(FL_socio_cpd)
In [982]:
print(model.get_cpds('Financial Literacy'))
+---------------------------+-----+-------------------------------+
| Gender                    | ... | Gender(Male)                  |
+---------------------------+-----+-------------------------------+
| Age                       | ... | Age(76+)                      |
+---------------------------+-----+-------------------------------+
| Education                 | ... | Education(Bachelor and above) |
+---------------------------+-----+-------------------------------+
| Financial Literacy(0-5)   | ... | 0.5                           |
+---------------------------+-----+-------------------------------+
| Financial Literacy(6-10)  | ... | 0.325                         |
+---------------------------+-----+-------------------------------+
| Financial Literacy(11-15) | ... | 0.15                          |
+---------------------------+-----+-------------------------------+
| Financial Literacy(16-20) | ... | 0.025                         |
+---------------------------+-----+-------------------------------+
| Financial Literacy(+20)   | ... | 0.0                           |
+---------------------------+-----+-------------------------------+

Financial Behaviour¶

In [983]:
df['Financial Behaviour']
Out[983]:
0        4+
1       2-4
2        4+
3       0-2
4       2-4
       ... 
1386     4+
1387    2-4
1388    2-4
1389    0-2
1390    0-2
Name: Financial Behaviour, Length: 1391, dtype: category
Categories (3, object): ['0-2' < '2-4' < '4+']
In [984]:
FB_socio_probability = pd.crosstab(
    df['Financial Behaviour'],
    [df['SD2'],df['SD3_processed'],df['SD4_processed']],
    normalize='columns',dropna=False,margins=True
)
#FB_socio_probability = FB_socio_probability.applymap('{:.4f}'.format)
#FB_socio_probability = FB_socio_probability.astype(np.float16)
FB_socio_probability
Out[984]:
SD2 Female Male All
SD3_processed 16-36 37-56 57+ 16-36 37-56 57+
SD4_processed Bachelor and above High school (12 grades) Middle school (8 grades) and below Bachelor and above High school (12 grades) Middle school (8 grades) and below Bachelor and above High school (12 grades) Middle school (8 grades) and below Bachelor and above High school (12 grades) Middle school (8 grades) and below Bachelor and above High school (12 grades) Middle school (8 grades) and below Bachelor and above High school (12 grades) Middle school (8 grades) and below
Financial Behaviour
0-2 0.537931 0.736111 1.0 0.445161 0.754386 0.846154 0.608696 0.776471 0.8 0.486486 0.842105 1.0 0.585859 0.833333 0.961538 0.490196 0.81746 0.925 0.693746
2-4 0.331034 0.236111 0.0 0.406452 0.204678 0.153846 0.347826 0.223529 0.2 0.351351 0.122807 0.0 0.343434 0.155914 0.038462 0.431373 0.18254 0.075 0.249461
4+ 0.131034 0.027778 0.0 0.148387 0.040936 0.000000 0.043478 0.000000 0.0 0.162162 0.035088 0.0 0.070707 0.010753 0.000000 0.078431 0.00000 0.000 0.056794
In [985]:
# Check for sum of every column 
FB_Testing = FB_socio_probability[FB_socio_probability.columns].sum()

for FB in FB_Testing:
    print(FB)

print('Sum: ', sum(FB_Testing)) 
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
0.9999999999999999
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
Sum:  19.0
In [986]:
FB_socio_cpd = TabularCPD(
    variable='Financial Behaviour',
    variable_card=3,
    evidence=['Gender', 'Age','Education'],
    evidence_card=distinct_value_count_for_socio_col,
    values = [
        [x for x in FB_socio_probability.loc['0-2'][:-1]],
        [x for x in FB_socio_probability.loc['2-4'][:-1]],
        [x for x in FB_socio_probability.loc['4+'][:-1]],
    ],
    state_names = {
        'Financial Behaviour':[x for x in FB_socio_probability.index],

        'Gender':[
            'Female',
            'Male'
        ],

        'Age':[
            '16-36', '37-56', '76+'
        ],

        'Education':[
            'Middle school (8 grades) and below',
            'High school (12 grades)',
            'Bachelor and above',
        ]
    }
)
In [987]:
model.add_cpds(FB_socio_cpd)
In [988]:
print(model.get_cpds('Financial Behaviour'))
+--------------------------+-----+-------------------------------+
| Gender                   | ... | Gender(Male)                  |
+--------------------------+-----+-------------------------------+
| Age                      | ... | Age(76+)                      |
+--------------------------+-----+-------------------------------+
| Education                | ... | Education(Bachelor and above) |
+--------------------------+-----+-------------------------------+
| Financial Behaviour(0-2) | ... | 0.925                         |
+--------------------------+-----+-------------------------------+
| Financial Behaviour(2-4) | ... | 0.075                         |
+--------------------------+-----+-------------------------------+
| Financial Behaviour(4+)  | ... | 0.0                           |
+--------------------------+-----+-------------------------------+

Financial Well Being¶

In [989]:
FWB_probability = pd.crosstab(
    df['Financial Well Being'],
    [df['Financial Behaviour'], df['Financial Literacy']],
    normalize='columns', dropna=False, margins=True
)

FWB_probability
Out[989]:
Financial Behaviour 0-2 2-4 4+ All
Financial Literacy 0-5 6-10 11-15 16-20 +20 0-5 6-10 11-15 16-20 +20 0-5 6-10 11-15 16-20 +20
Financial Well Being
Very Low 0.085271 0.062857 0.019672 0.013115 0.000000 0.000000 0.023810 0.000000 0.006803 0.000000 0.0 0.0 0.0 0.000000 0.000000 0.024443
Low 0.457364 0.400000 0.370492 0.301639 0.137255 0.200000 0.261905 0.166667 0.122449 0.000000 0.0 0.0 0.1 0.046512 0.000000 0.282531
Medium 0.418605 0.462857 0.544262 0.570492 0.666667 0.666667 0.523810 0.687500 0.585034 0.595745 0.0 1.0 0.7 0.627907 0.533333 0.554277
High 0.038760 0.062857 0.055738 0.108197 0.196078 0.133333 0.190476 0.135417 0.278912 0.319149 0.0 0.0 0.2 0.279070 0.466667 0.127965
Very High 0.000000 0.011429 0.009836 0.006557 0.000000 0.000000 0.000000 0.010417 0.006803 0.085106 0.0 0.0 0.0 0.046512 0.000000 0.010784
In [990]:
# Check for sum of every column 
FWB_Testing = FWB_probability[FWB_probability.columns].sum()

for FWB in FWB_Testing:
    print(FWB)

print('Sum: ', sum(FWB_Testing)) 
0.9999999999999999
1.0
0.9999999999999999
1.0
1.0
1.0
1.0
0.9999999999999999
1.0
1.0
0.0
1.0
1.0
0.9999999999999999
1.0
1.0
Sum:  15.0

Based on the checking of columns, because there are columns that are not filled, as shown by the 0 in the print above. Therefore, even though financial well being is well established and standardized, we will regroup them once again.

In [991]:
# Visualize the final financial well being column
df['Financial Well Being'].value_counts().loc[financial_well_being_labels].plot.bar()
Out[991]:
<Axes: xlabel='Financial Well Being'>
No description has been provided for this image

Also as shown in in the plots, because value of 'Very Low' and 'Very High', are very low, we will group them to 'Low' and 'High', respectively, and renaming them to 'Low and below', and 'High and above'.

In [992]:
FWB_group_changes = {
    'Very Low': 'Low and Below',
    'Low': 'Low and Below', 
    'High': 'High and Above', 
    'Very High': 'High and Above'
}

df['Financial Well Being_processed'] = df['Financial Well Being'].replace(FWB_group_changes)
df['Financial Well Being_processed'] 
/var/folders/4d/3zg2grqx5kj9w9kfm1mqfcvw0000gn/T/ipykernel_26276/1069431917.py:8: FutureWarning: The behavior of Series.replace (and DataFrame.replace) with CategoricalDtype is deprecated. In a future version, replace will only be used for cases that preserve the categories. To change the categories, use ser.cat.rename_categories instead.
  df['Financial Well Being_processed'] = df['Financial Well Being'].replace(FWB_group_changes)
Out[992]:
0       High and Above
1       High and Above
2       High and Above
3               Medium
4               Medium
             ...      
1386    High and Above
1387            Medium
1388            Medium
1389            Medium
1390            Medium
Name: Financial Well Being_processed, Length: 1391, dtype: category
Categories (3, object): ['Low and Below' < 'Medium' < 'High and Above']
In [993]:
# Get the count of each unique value
pd.crosstab(df['Financial Well Being_processed'], 'Count')
Out[993]:
col_0 Count
Financial Well Being_processed
Low and Below 427
Medium 771
High and Above 193
In [994]:
# Financial Well Being 
fwb_processed_order = [
    'Low and Below', 
    'Medium', 
    'High and Above'
]

# Visualize the final financial well being column
df['Financial Well Being_processed'].value_counts().loc[fwb_processed_order].plot.bar()
Out[994]:
<Axes: xlabel='Financial Well Being_processed'>
No description has been provided for this image
In [995]:
# Get the probability by normalizing the counts of unique value
fwb_processed_probability = pd.crosstab(df['Financial Well Being_processed'], 'Probability', normalize=True)
fwb_processed_probability 
Out[995]:
col_0 Probability
Financial Well Being_processed
Low and Below 0.306973
Medium 0.554277
High and Above 0.138749
In [996]:
FWB_probability = pd.crosstab(
    df['Financial Well Being_processed'],
    [df['Financial Behaviour'], df['Financial Literacy']],
    normalize='columns', dropna=False, margins=True
)

FWB_probability
Out[996]:
Financial Behaviour 0-2 2-4 4+ All
Financial Literacy 0-5 6-10 11-15 16-20 +20 0-5 6-10 11-15 16-20 +20 0-5 6-10 11-15 16-20 +20
Financial Well Being_processed
Low and Below 0.542636 0.462857 0.390164 0.314754 0.137255 0.200000 0.285714 0.166667 0.129252 0.000000 0.0 0.0 0.1 0.046512 0.000000 0.306973
Medium 0.418605 0.462857 0.544262 0.570492 0.666667 0.666667 0.523810 0.687500 0.585034 0.595745 0.0 1.0 0.7 0.627907 0.533333 0.554277
High and Above 0.038760 0.074286 0.065574 0.114754 0.196078 0.133333 0.190476 0.145833 0.285714 0.404255 0.0 0.0 0.2 0.325581 0.466667 0.138749
In [997]:
# Check for sum of every column 
FWB_Testing = FWB_probability[FWB_probability.columns].sum()

for FWB in FWB_Testing:
    print(FWB)

print('Sum: ', sum(FWB_Testing)) 
0.9999999999999999
1.0
0.9999999999999999
1.0
1.0
1.0
1.0
1.0
1.0
1.0
0.0
1.0
1.0
1.0
1.0
1.0
Sum:  15.0

And still, after making the changes, we might need to make further changes. The changes are to the 'Financial Literacy' columns. As it can be seen that in the column where 'Financial Behaviour' is 4+ and 'Financial Literacy' is 0-5, there are no instances or outcome for this. Therefore, we will need to lower the number of groups for financial literacy once again.

In [998]:
# Setting the Financial Behaviour labels
financial_literacy_score_labels = ['Below Average','Above Average']

# Edge values for each bin
financial_literacy_groups = [-1,14,25]

# Group the values into the three groups
df['Financial Literacy_processed'] = pd.cut(df['C1-C8'], financial_literacy_groups, labels=financial_literacy_score_labels)
df['Financial Literacy_processed'].value_counts().loc[financial_literacy_score_labels].plot.bar()
Out[998]:
<Axes: xlabel='Financial Literacy_processed'>
No description has been provided for this image
In [999]:
# Get the count of each unique value
pd.crosstab(df['Financial Literacy_processed'], 'Count')
Out[999]:
col_0 Count
Financial Literacy_processed
Below Average 686
Above Average 705
In [1000]:
FL_socio_probability = pd.crosstab(
    df['Financial Literacy_processed'],
    [df['SD2'],df['SD3_processed'],df['SD4_processed']],
    normalize='columns',dropna=False,margins=True
)

FL_socio_probability
Out[1000]:
SD2 Female Male All
SD3_processed 16-36 37-56 57+ 16-36 37-56 57+
SD4_processed Bachelor and above High school (12 grades) Middle school (8 grades) and below Bachelor and above High school (12 grades) Middle school (8 grades) and below Bachelor and above High school (12 grades) Middle school (8 grades) and below Bachelor and above High school (12 grades) Middle school (8 grades) and below Bachelor and above High school (12 grades) Middle school (8 grades) and below Bachelor and above High school (12 grades) Middle school (8 grades) and below
Financial Literacy_processed
Below Average 0.344828 0.430556 0.625 0.322581 0.48538 0.923077 0.347826 0.764706 0.94 0.243243 0.403509 0.8 0.363636 0.494624 0.846154 0.333333 0.634921 0.975 0.49317
Above Average 0.655172 0.569444 0.375 0.677419 0.51462 0.076923 0.652174 0.235294 0.06 0.756757 0.596491 0.2 0.636364 0.505376 0.153846 0.666667 0.365079 0.025 0.50683
In [1001]:
FL_socio_cpd = TabularCPD(
    variable='Financial Literacy',
    variable_card=2,
    evidence=['Gender', 'Age','Education'],
    evidence_card=distinct_value_count_for_socio_col,
    values = [
        [x for x in FL_socio_probability.loc['Below Average'][:-1]],
        [x for x in FL_socio_probability.loc['Above Average'][:-1]]
    ],

    state_names={
        'Financial Literacy':[x for x in FL_socio_probability.index],

        'Age':[
            '16-36', '37-56', '76+'
        ],

        'Gender':[
            'Female',
            'Male'
        ],

        'Education':[
            'Middle school (8 grades) and below',
            'High school (12 grades)',
            'Bachelor and above',
        ]
    }
)
In [1002]:
model.add_cpds(FL_socio_cpd)
print(model.get_cpds('Financial Literacy'))
WARNING:pgmpy:Replacing existing CPD for Financial Literacy
+-----------------------------------+-----+-------------------------------+
| Gender                            | ... | Gender(Male)                  |
+-----------------------------------+-----+-------------------------------+
| Age                               | ... | Age(76+)                      |
+-----------------------------------+-----+-------------------------------+
| Education                         | ... | Education(Bachelor and above) |
+-----------------------------------+-----+-------------------------------+
| Financial Literacy(Below Average) | ... | 0.975                         |
+-----------------------------------+-----+-------------------------------+
| Financial Literacy(Above Average) | ... | 0.025                         |
+-----------------------------------+-----+-------------------------------+
In [1003]:
FWB_probability = pd.crosstab(
    df['Financial Well Being_processed'],
    [df['Financial Behaviour'], df['Financial Literacy_processed']],
    normalize='columns', dropna=False, margins=True
)

FWB_probability
Out[1003]:
Financial Behaviour 0-2 2-4 4+ All
Financial Literacy_processed Below Average Above Average Below Average Above Average Below Average Above Average
Financial Well Being_processed
Low and Below 0.459410 0.293144 0.201550 0.110092 0.133333 0.03125 0.306973
Medium 0.479705 0.588652 0.658915 0.582569 0.666667 0.62500 0.554277
High and Above 0.060886 0.118203 0.139535 0.307339 0.200000 0.34375 0.138749
In [1004]:
FWB_socio_cpd = TabularCPD(
    variable='Financial Well Being', 
    variable_card=3,
    evidence=['Financial Behaviour', 'Financial Literacy'],
    evidence_card=[3, 2],
    values = [
        [x for x in FWB_probability.loc['Low and Below'][:-1]],
        [x for x in FWB_probability.loc['Medium'][:-1]],
        [x for x in FWB_probability.loc['High and Above'][:-1]]

    ], 

    state_names = {
        'Financial Literacy': [x for x in FL_socio_probability.index],
        'Financial Behaviour': [x for x in FB_socio_probability.index],
        'Financial Well Being': [x for x in FWB_probability.index]
    }
)
In [1005]:
model.add_cpds(FWB_socio_cpd)

Inferences and Queries¶

In [1017]:
infer = VariableElimination(model)
print(infer.query(variables = ['Financial Literacy']))
+-----------------------------------+---------------------------+
| Financial Literacy                |   phi(Financial Literacy) |
+===================================+===========================+
| Financial Literacy(Below Average) |                    0.4850 |
+-----------------------------------+---------------------------+
| Financial Literacy(Above Average) |                    0.5150 |
+-----------------------------------+---------------------------+
In [1007]:
print(infer.query(variables = ['Financial Well Being']))
+--------------------------------------+-----------------------------+
| Financial Well Being                 |   phi(Financial Well Being) |
+======================================+=============================+
| Financial Well Being(Low and Below)  |                      0.3055 |
+--------------------------------------+-----------------------------+
| Financial Well Being(Medium)         |                      0.5587 |
+--------------------------------------+-----------------------------+
| Financial Well Being(High and Above) |                      0.1358 |
+--------------------------------------+-----------------------------+
In [1008]:
print(infer.query(
    variables = ['Financial Well Being'],
    evidence = {'Financial Literacy':'Above Average'}
))
+--------------------------------------+-----------------------------+
| Financial Well Being                 |   phi(Financial Well Being) |
+======================================+=============================+
| Financial Well Being(Low and Below)  |                      0.2254 |
+--------------------------------------+-----------------------------+
| Financial Well Being(Medium)         |                      0.5894 |
+--------------------------------------+-----------------------------+
| Financial Well Being(High and Above) |                      0.1852 |
+--------------------------------------+-----------------------------+
In [1009]:
print(infer.query(
    variables = ['Financial Well Being'],
    evidence = {'Financial Literacy':'Below Average'}
    )
)
+--------------------------------------+-----------------------------+
| Financial Well Being                 |   phi(Financial Well Being) |
+======================================+=============================+
| Financial Well Being(Low and Below)  |                      0.3906 |
+--------------------------------------+-----------------------------+
| Financial Well Being(Medium)         |                      0.5260 |
+--------------------------------------+-----------------------------+
| Financial Well Being(High and Above) |                      0.0833 |
+--------------------------------------+-----------------------------+
In [1020]:
print(infer.query(
    variables = ['Financial Well Being'],
    evidence = {'Age':'76+', 'Gender':'Female'}
    )
)
+--------------------------------------+-----------------------------+
| Financial Well Being                 |   phi(Financial Well Being) |
+======================================+=============================+
| Financial Well Being(Low and Below)  |                      0.3304 |
+--------------------------------------+-----------------------------+
| Financial Well Being(Medium)         |                      0.5494 |
+--------------------------------------+-----------------------------+
| Financial Well Being(High and Above) |                      0.1202 |
+--------------------------------------+-----------------------------+
In [1024]:
print(infer.query(
    variables = ['Financial Well Being'],
    evidence = {'Age':'16-36', 'Age':'37-56', 'Gender':'Female'}
    )
)
+--------------------------------------+-----------------------------+
| Financial Well Being                 |   phi(Financial Well Being) |
+======================================+=============================+
| Financial Well Being(Low and Below)  |                      0.2887 |
+--------------------------------------+-----------------------------+
| Financial Well Being(Medium)         |                      0.5648 |
+--------------------------------------+-----------------------------+
| Financial Well Being(High and Above) |                      0.1465 |
+--------------------------------------+-----------------------------+
In [1028]:
print(infer.query(
    variables = ['Age', 'Gender'],
    evidence = {'Financial Well Being':'Low and Below'}
    )
)
+------------+----------------+-------------------+
| Age        | Gender         |   phi(Age,Gender) |
+============+================+===================+
| Age(16-36) | Gender(Female) |            0.1302 |
+------------+----------------+-------------------+
| Age(16-36) | Gender(Male)   |            0.1214 |
+------------+----------------+-------------------+
| Age(37-56) | Gender(Female) |            0.2292 |
+------------+----------------+-------------------+
| Age(37-56) | Gender(Male)   |            0.2331 |
+------------+----------------+-------------------+
| Age(76+)   | Gender(Female) |            0.1513 |
+------------+----------------+-------------------+
| Age(76+)   | Gender(Male)   |            0.1349 |
+------------+----------------+-------------------+
In [1032]:
print(infer.query(
    variables = ['Age', 'Gender'],
    evidence = {'Financial Well Being':'Low and Below', 'Financial Literacy':'Below Average'}
    )
)
+------------+----------------+-------------------+
| Age        | Gender         |   phi(Age,Gender) |
+============+================+===================+
| Age(16-36) | Gender(Female) |            0.1148 |
+------------+----------------+-------------------+
| Age(16-36) | Gender(Male)   |            0.1024 |
+------------+----------------+-------------------+
| Age(37-56) | Gender(Female) |            0.2246 |
+------------+----------------+-------------------+
| Age(37-56) | Gender(Male)   |            0.2289 |
+------------+----------------+-------------------+
| Age(76+)   | Gender(Female) |            0.1796 |
+------------+----------------+-------------------+
| Age(76+)   | Gender(Male)   |            0.1498 |
+------------+----------------+-------------------+
In [1025]:
print(infer.query(
    variables = ['Financial Literacy'],
    evidence = {'Age':'76+', 'Gender':'Female'}
    )
)
+-----------------------------------+---------------------------+
| Financial Literacy                |   phi(Financial Literacy) |
+===================================+===========================+
| Financial Literacy(Below Average) |                    0.6193 |
+-----------------------------------+---------------------------+
| Financial Literacy(Above Average) |                    0.3807 |
+-----------------------------------+---------------------------+
In [1026]:
print(infer.query(
    variables = ['Financial Literacy'],
    evidence = {'Age':'16-36', 'Age':'37-56', 'Gender':'Female'}
    )
)
+-----------------------------------+---------------------------+
| Financial Literacy                |   phi(Financial Literacy) |
+===================================+===========================+
| Financial Literacy(Below Average) |                    0.4676 |
+-----------------------------------+---------------------------+
| Financial Literacy(Above Average) |                    0.5324 |
+-----------------------------------+---------------------------+
In [1029]:
print(infer.query(
    variables = ['Financial Well Being'],
    evidence = {'Age':'76+', 'Gender':'Female', 'Financial Literacy':'Below Average'}
    )
)
+--------------------------------------+-----------------------------+
| Financial Well Being                 |   phi(Financial Well Being) |
+======================================+=============================+
| Financial Well Being(Low and Below)  |                      0.3925 |
+--------------------------------------+-----------------------------+
| Financial Well Being(Medium)         |                      0.5258 |
+--------------------------------------+-----------------------------+
| Financial Well Being(High and Above) |                      0.0817 |
+--------------------------------------+-----------------------------+
In [1016]:
model.check_model()
Out[1016]:
True