Init

In [1]:
import pandas as pd
import survey
from IPython.display import display
from pylab import *
get_ipython().run_line_magic(
    "matplotlib",
    "inline" if "ZMQInteractiveShell" in str(get_ipython()) else "tk"
)

Most accessible variables carry a prefix such as ur_ or ugh_; the sections below explain each prefix.

Response forms

We provide counts of whether participants responded online or over the phone (prefix: ur_). These are broken down by education

In [2]:
survey.ur_edu
Out[2]:
Online Direct
0 133034 101271
1 28859 18142
2 30510 11910
3 100020 21623
4 68119 7221

and age

In [3]:
survey.ur_age
Out[3]:
Online Telephone
(-0.117, 11.715] 217367 109838
(11.715, 23.43] 130002 40103
(23.43, 35.145] 10760 6160
(35.145, 46.86] 2092 2473
(46.86, 58.575] 299 1015
(58.575, 70.29] 22 405
(70.29, 82.005] 0 119
(82.005, 93.72] 0 39
(93.72, 105.435] 0 10
(105.435, 117.15] 0 5

You can plot these DataFrames using, e.g.,

In [4]:
survey.ur_edu.plot.bar()
xlabel('Education')
ylabel("Respondents")
Out[4]:
Text(0, 0.5, 'Respondents')

There are five education levels: none (0), GCSE (1), A-levels (2), undergraduate (3), graduate (4).
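If you prefer readable labels on the axis, the numeric codes can be mapped to the level names listed above. This is a minimal sketch on a hand-built copy of the ur_edu table; the dictionary name edu_labels is our own and not part of the survey module.

```python
import pandas as pd

# Mapping taken from the list of education levels above
edu_labels = {0: "none", 1: "GCSE", 2: "A-levels", 3: "undergraduate", 4: "graduate"}

# Stand-in for survey.ur_edu, copied from the output above
ur_edu = pd.DataFrame(
    {
        "Online": [133034, 28859, 30510, 100020, 68119],
        "Direct": [101271, 18142, 11910, 21623, 7221],
    }
)

# rename() returns a relabelled copy and leaves the original untouched
labelled = ur_edu.rename(index=edu_labels)
print(labelled)
```

With the real data, the same rename call on survey.ur_edu before .plot.bar() should give named tick labels.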

Univariate data

We provide some univariate data, split by habitat. The split itself is in u_habitat

In [5]:
display(survey.u_habitat)
survey.u_habitat.plot.pie(y=0)
Grassland?
False 292181
True 228528
Out[5]:
<AxesSubplot: ylabel='Grassland?'>

For the grasslands (prefix ug_) we have species

In [6]:
display(survey.ug_species)
survey.ug_species.plot.pie(y=0)
Species
m 117905
e 59728
l 30867
g 20028
Out[6]:
<AxesSubplot: ylabel='Species'>

education level

In [7]:
survey.ug_edu
Out[7]:
Education
0 110937
3 45658
1 27888
4 24848
2 19197

and response type

In [8]:
survey.ug_response
Out[8]:
Response
True 152841
False 75687
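As a quick sanity check, each of the three grassland breakdowns above should add up to the grassland total reported in u_habitat. The sketch below uses only the counts printed in the outputs above; the variable names are our own.

```python
# Counts copied from the outputs above
grassland_total = 228528  # survey.u_habitat, Grassland? == True
ug_species = {"m": 117905, "e": 59728, "l": 30867, "g": 20028}
ug_edu = {0: 110937, 3: 45658, 1: 27888, 4: 24848, 2: 19197}
ug_response = {True: 152841, False: 75687}

breakdowns = [("species", ug_species), ("education", ug_edu), ("response", ug_response)]
for name, counts in breakdowns:
    total = sum(counts.values())
    print(f"{name}: {total} (matches grassland total: {total == grassland_total})")
```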

The same data is available for the forest (prefix uf_). For the continuous variables we provide histograms (prefixes ugh_ and ufh_) over age ranges

In [9]:
display(survey.ugh_age)
survey.ugh_age.plot.bar()
xlabel("Age range")
ylabel("Respondents")
Age
(-0.112, 7.463] 102817
(7.463, 14.927] 77043
(14.927, 22.39] 43756
(22.39, 29.853] 3562
(29.853, 37.317] 778
(37.317, 44.78] 305
(44.78, 52.243] 148
(52.243, 59.707] 56
(59.707, 67.17] 36
(67.17, 74.633] 17
(74.633, 82.097] 5
(82.097, 89.56] 2
(104.487, 111.95] 2
(89.56, 97.023] 1
(97.023, 104.487] 0
Out[9]:
Text(0, 0.5, 'Respondents')

and income ranges

In [10]:
display(survey.ugh_income)
survey.ugh_income.plot.bar()
xlabel("Income range")
ylabel("Respondents")
Income
(9908.162, 16122.511] 163645
(22245.023, 28367.534] 13260
(16122.511, 22245.023] 12297
(28367.534, 34490.045] 11780
(34490.045, 40612.557] 9816
(46735.068, 52857.579] 5017
(40612.557, 46735.068] 4627
(52857.579, 58980.091] 3945
(58980.091, 65102.602] 1680
(65102.602, 71225.113] 988
(71225.113, 77347.625] 676
(77347.625, 83470.136] 417
(83470.136, 89592.647] 268
(89592.647, 95715.159] 93
(95715.159, 101837.67] 19
Out[10]:
Text(0, 0.5, 'Respondents')
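The histograms above are listed in order of decreasing count rather than by bin, so a bar chart drawn straight from them has some bins out of order (for example, (22245.023, 28367.534] appears before (16122.511, 22245.023]). Sorting by the index first restores the bin order. The sketch below uses the first four rows of the income output as a stand-in and assumes the real index holds pandas intervals, which the printed labels suggest but do not confirm.

```python
import pandas as pd

# Stand-in for the first four rows of survey.ugh_income, in displayed order
bins = pd.IntervalIndex.from_tuples(
    [
        (9908.162, 16122.511),
        (22245.023, 28367.534),
        (16122.511, 22245.023),
        (28367.534, 34490.045),
    ]
)
income = pd.Series([163645, 13260, 12297, 11780], index=bins, name="Income")

# Sort by bin edges instead of by count
ordered = income.sort_index()
print(ordered)
```

With the real data, survey.ugh_income.sort_index().plot.bar() should then draw the bins from lowest to highest income.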

We also provide the deciles, i.e. the quantiles in steps of 10% (prefixes ugq_ and ufq_)

In [11]:
survey.ugq_age
Out[11]:
Age
0.1 1.60
0.2 3.24
0.3 4.91
0.4 6.60
0.5 8.43
0.6 10.56
0.7 12.85
0.8 15.25
0.9 17.79

and a summary report (prefixes ugs_ and ufs_); note that the median is stored under the key 'media'

In [12]:
survey.ugs_age
Out[12]:
{'mean': 9.363614436742981,
 'std': 6.505748238948622,
 'media': 8.43,
 'range': [0.0, 111.95]}
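The deciles in ugq_age give nine points on the cumulative distribution, so a rough estimate of the fraction of grassland respondents below a given age can be obtained by linear interpolation. This is only a sketch: the helper fraction_younger_than is our own name, and linear interpolation between deciles is an approximation, not something the survey module provides.

```python
import numpy as np

# Deciles copied from the survey.ugq_age output above
probs = np.arange(1, 10) / 10
ages = np.array([1.60, 3.24, 4.91, 6.60, 8.43, 10.56, 12.85, 15.25, 17.79])


def fraction_younger_than(age):
    """Approximate cumulative fraction by interpolating between deciles.

    np.interp clamps outside the known range, so results below the 10%
    decile or above the 90% decile are reported as 0.1 and 0.9.
    """
    return float(np.interp(age, ages, probs))


print(round(fraction_younger_than(10.0), 3))
```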

Correlations and 2D plots

We also provide the following 2D histograms

  • age v. income (h_age_income)
  • age v. education (h_age_edu)
  • income v. education (h_income_edu)

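To protect our respondents' privacy, we do not split this further. Plotting these takes a little extra work, because each histogram is stored as a Series with a two-level index rather than as a 2D array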

In [13]:
survey.h_age_income
Out[13]:
Age_cut          Income_cut              
(-0.117, 23.43]  (9467.548, 116490.402]      492176
                 (116490.402, 222980.804]        12
                 (222980.804, 329471.206]      2096
                 (329471.206, 435961.608]      2354
                 (435961.608, 542452.01]        672
(23.43, 46.86]   (9467.548, 116490.402]       20819
                 (116490.402, 222980.804]         0
                 (222980.804, 329471.206]       242
                 (329471.206, 435961.608]       322
                 (435961.608, 542452.01]        102
(46.86, 70.29]   (9467.548, 116490.402]        1741
                 (116490.402, 222980.804]         0
                 (222980.804, 329471.206]         0
                 (329471.206, 435961.608]         0
                 (435961.608, 542452.01]          0
(70.29, 93.72]   (9467.548, 116490.402]         158
                 (116490.402, 222980.804]         0
                 (222980.804, 329471.206]         0
                 (329471.206, 435961.608]         0
                 (435961.608, 542452.01]          0
(93.72, 117.15]  (9467.548, 116490.402]          15
                 (116490.402, 222980.804]         0
                 (222980.804, 329471.206]         0
                 (329471.206, 435961.608]         0
                 (435961.608, 542452.01]          0
dtype: int64
In [14]:
dat = survey.h_age_income.to_numpy().reshape((5,5))
dat
Out[14]:
array([[492176,     12,   2096,   2354,    672],
       [ 20819,      0,    242,    322,    102],
       [  1741,      0,      0,      0,      0],
       [   158,      0,      0,      0,      0],
       [    15,      0,      0,      0,      0]])
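Reshaping to a bare array drops the bin labels. One way to keep them is to wrap the array in a DataFrame, as sketched below with the labels typed in from the output above (as plain strings, which is a simplification). With the real object, survey.h_age_income.unstack() would likely give an equivalent labelled table directly.

```python
import numpy as np
import pandas as pd

# Bin labels copied from the survey.h_age_income output above
age_bins = ["(-0.117, 23.43]", "(23.43, 46.86]", "(46.86, 70.29]",
            "(70.29, 93.72]", "(93.72, 117.15]"]
income_bins = ["(9467.548, 116490.402]", "(116490.402, 222980.804]",
               "(222980.804, 329471.206]", "(329471.206, 435961.608]",
               "(435961.608, 542452.01]"]

dat = np.array([[492176, 12, 2096, 2354, 672],
                [20819, 0, 242, 322, 102],
                [1741, 0, 0, 0, 0],
                [158, 0, 0, 0, 0],
                [15, 0, 0, 0, 0]])

# Rows are age bins, columns are income bins
table = pd.DataFrame(
    dat,
    index=pd.Index(age_bins, name="Age_cut"),
    columns=pd.Index(income_bins, name="Income_cut"),
)
print(table)
```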
In [15]:
imshow(np.log(dat))
/tmp/ipykernel_38/126979549.py:1: RuntimeWarning: divide by zero encountered in log
  imshow(np.log(dat))
Out[15]:
<matplotlib.image.AxesImage at 0x7fe4acce8be0>
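The RuntimeWarning above comes from taking the logarithm of empty bins. One way to avoid it, sketched here with NumPy only, is to compute the log just where the count is positive and leave the empty bins as NaN. The name log_dat is our own.

```python
import numpy as np

# Same array as in Out[14] above
dat = np.array([[492176, 12, 2096, 2354, 672],
                [20819, 0, 242, 322, 102],
                [1741, 0, 0, 0, 0],
                [158, 0, 0, 0, 0],
                [15, 0, 0, 0, 0]])

# Start from NaN everywhere, then fill in log(count) where count > 0
log_dat = np.full(dat.shape, np.nan)
np.log(dat, out=log_dat, where=dat > 0)
print(log_dat.round(2))
```

Passing log_dat to imshow should then leave the empty bins blank, since matplotlib treats NaN as missing data.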

We also provide the full $3\times3$ correlation matrix

In [16]:
survey.corr
Out[16]:
Education Age Income
Education 1.000000 0.700183 0.405668
Age 0.700183 1.000000 0.351382
Income 0.405668 0.351382 1.000000

Surveys

You can run surveys that ask at most three questions each, chosen from the following list.

In [17]:
survey.valid_questions
Out[17]:
['Species',
 'Age',
 'Education',
 'Income',
 'Grassland?',
 'Increase taxes?',
 'Maintain forest']

You may poll at most 1000 animals per survey, either online or over the telephone. The following is a relatively small online survey:

In [18]:
results = survey.survey(
    ['Species', 'Increase taxes?', 'Maintain forest'],
    'online',
    100
)
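The limits described above (at most three questions, at most 1000 animals, online or telephone) can be checked before a survey is requested. The helper below is purely hypothetical and not part of the survey module; in particular, the mode string 'telephone' is an assumption, since only 'online' appears in this notebook.

```python
# List copied from survey.valid_questions above
VALID_QUESTIONS = ['Species', 'Age', 'Education', 'Income',
                   'Grassland?', 'Increase taxes?', 'Maintain forest']
MAX_QUESTIONS = 3
MAX_SAMPLE = 1000
MODES = ('online', 'telephone')  # 'telephone' is an assumed spelling


def check_request(questions, mode, n):
    """Return a list of problems with a survey request (empty if none)."""
    problems = []
    if not 1 <= len(questions) <= MAX_QUESTIONS:
        problems.append(f"need 1 to {MAX_QUESTIONS} questions, got {len(questions)}")
    unknown = [q for q in questions if q not in VALID_QUESTIONS]
    if unknown:
        problems.append(f"unknown questions: {unknown}")
    if mode not in MODES:
        problems.append(f"unknown mode: {mode!r}")
    if not 1 <= n <= MAX_SAMPLE:
        problems.append(f"sample size must be 1 to {MAX_SAMPLE}, got {n}")
    return problems


print(check_request(['Species', 'Increase taxes?', 'Maintain forest'], 'online', 100))
```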
In [19]:
dictionary = dict(list(results.groupby('Species')))
In [20]:
figure()
# One scatter series per species, all drawn on the same axes
for key, colour in [('m', 'blue'), ('l', 'red'), ('e', 'green'), ('g', 'black')]:
    dictionary[key].plot.scatter('Increase taxes?', 'Maintain forest', c=colour, ax=gca())
legend(['Monkey', 'Lion', 'Elephant', 'Gazelle'])
Out[20]:
<matplotlib.legend.Legend at 0x7fe4ac93b3d0>

ONS players

As an ONS player, you import the module ons instead

In [21]:
import ons

You have access to all of the same data, plus the full census data

In [22]:
ons.all_data
Out[22]:
Species Age Education Income Grassland? Response
0 m 16.71 1 24457.97 True False
1 m 9.39 3 40965.07 True True
2 m 6.46 0 10000.00 True True
3 m 8.25 2 52457.45 False True
4 m 14.19 4 64831.25 False True
... ... ... ... ... ... ...
520704 g 2.62 0 10000.00 False False
520705 g 44.35 4 12000.00 False True
520706 g 2.65 0 10000.00 False False
520707 g 9.50 2 50730.52 False False
520708 g 2.79 0 10000.00 False True

520709 rows × 6 columns
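Because all_data has one row per animal, the aggregated tables earlier in this notebook can in principle be recomputed with groupby. The sketch below uses only the ten rows displayed above as a stand-in, so the resulting numbers are illustrative and not representative of the full census.

```python
import pandas as pd

# Stand-in for ons.all_data: the ten rows shown in the output above
all_data = pd.DataFrame(
    {
        "Species": ["m", "m", "m", "m", "m", "g", "g", "g", "g", "g"],
        "Age": [16.71, 9.39, 6.46, 8.25, 14.19, 2.62, 44.35, 2.65, 9.50, 2.79],
        "Education": [1, 3, 0, 2, 4, 0, 4, 0, 2, 0],
        "Income": [24457.97, 40965.07, 10000.00, 52457.45, 64831.25,
                   10000.00, 12000.00, 10000.00, 50730.52, 10000.00],
        "Grassland?": [True, True, True, False, False,
                       False, False, False, False, False],
        "Response": [False, True, True, True, True,
                     False, True, False, False, True],
    },
    index=[0, 1, 2, 3, 4, 520704, 520705, 520706, 520707, 520708],
)

# Share of rows with Response == True, per species
response_share = all_data.groupby("Species")["Response"].mean()
print(response_share)
```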