import pandas as pd
import survey
from IPython.display import display
from pylab import *
get_ipython().run_line_magic(
"matplotlib",
"inline" if "ZMQInteractiveShell" in str(get_ipython()) else "tk"
)
The available variables are grouped by prefix, such as ur_ or ugh_.
We provide respondent counts split by whether participants responded online or over the phone (prefix: ur_).
These are broken down by education
survey.ur_edu
| Education | Online | Direct |
|---|---|---|
| 0 | 133034 | 101271 |
| 1 | 28859 | 18142 |
| 2 | 30510 | 11910 |
| 3 | 100020 | 21623 |
| 4 | 68119 | 7221 |
and age
survey.ur_age
| Age | Online | Telephone |
|---|---|---|
| (-0.117, 11.715] | 217367 | 109838 |
| (11.715, 23.43] | 130002 | 40103 |
| (23.43, 35.145] | 10760 | 6160 |
| (35.145, 46.86] | 2092 | 2473 |
| (46.86, 58.575] | 299 | 1015 |
| (58.575, 70.29] | 22 | 405 |
| (70.29, 82.005] | 0 | 119 |
| (82.005, 93.72] | 0 | 39 |
| (93.72, 105.435] | 0 | 10 |
| (105.435, 117.15] | 0 | 5 |
You can plot these DataFrames using, e.g.,
survey.ur_edu.plot.bar()
xlabel('Education')
ylabel("Respondents")
Text(0, 0.5, 'Respondents')
The five education levels are: none (0), GCSE (1), A-levels (2), undergraduate (3), graduate (4).
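As a rough sketch of how these counts can be used, the share of online responses per education level can be computed directly from the table. The DataFrame below is retyped from the printed `survey.ur_edu` output as a stand-in, and the names `levels` and `online_share` are illustrative only.

```python
import pandas as pd

# Stand-in for survey.ur_edu, retyped from the table printed above.
ur_edu = pd.DataFrame(
    {
        "Online": [133034, 28859, 30510, 100020, 68119],
        "Direct": [101271, 18142, 11910, 21623, 7221],
    },
    index=[0, 1, 2, 3, 4],
)

# Human-readable labels for the five education codes.
levels = {0: "none", 1: "GCSE", 2: "A-levels", 3: "undergraduate", 4: "graduate"}

# Fraction of respondents at each level who answered online.
online_share = (ur_edu["Online"] / ur_edu.sum(axis=1)).rename(index=levels)
print(online_share.round(3))
```

With the real module, replacing the stand-in by `survey.ur_edu` should give the same result, assuming the column names match the printed table.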
We provide some univariate data by habitat.
The split between habitats is in u_habitat
display(survey.u_habitat)
survey.u_habitat.plot.pie(y=0)
| Grassland? | |
|---|---|
| False | 292181 |
| True | 228528 |
<AxesSubplot: ylabel='Grassland?'>
For the grasslands (prefix ug_) we have counts by species
display(survey.ug_species)
survey.ug_species.plot.pie(y=0)
| Species | |
|---|---|
| m | 117905 |
| e | 59728 |
| l | 30867 |
| g | 20028 |
<AxesSubplot: ylabel='Species'>
education level
survey.ug_edu
| Education | |
|---|---|
| 0 | 110937 |
| 3 | 45658 |
| 1 | 27888 |
| 4 | 24848 |
| 2 | 19197 |
and response type
survey.ug_response
| Response | |
|---|---|
| True | 152841 |
| False | 75687 |
We have the same for the forest (prefix uf_).
For the continuous variables (prefixes ugh_ and ufh_) we provide histograms over age ranges
display(survey.ugh_age)
survey.ugh_age.plot.bar()
xlabel("Age range")
ylabel("Respondents")
| Age | |
|---|---|
| (-0.112, 7.463] | 102817 |
| (7.463, 14.927] | 77043 |
| (14.927, 22.39] | 43756 |
| (22.39, 29.853] | 3562 |
| (29.853, 37.317] | 778 |
| (37.317, 44.78] | 305 |
| (44.78, 52.243] | 148 |
| (52.243, 59.707] | 56 |
| (59.707, 67.17] | 36 |
| (67.17, 74.633] | 17 |
| (74.633, 82.097] | 5 |
| (82.097, 89.56] | 2 |
| (104.487, 111.95] | 2 |
| (89.56, 97.023] | 1 |
| (97.023, 104.487] | 0 |
Text(0, 0.5, 'Respondents')
and income ranges
display(survey.ugh_income)
survey.ugh_income.plot.bar()
xlabel("Income range")
ylabel("Respondents")
| Income | |
|---|---|
| (9908.162, 16122.511] | 163645 |
| (22245.023, 28367.534] | 13260 |
| (16122.511, 22245.023] | 12297 |
| (28367.534, 34490.045] | 11780 |
| (34490.045, 40612.557] | 9816 |
| (46735.068, 52857.579] | 5017 |
| (40612.557, 46735.068] | 4627 |
| (52857.579, 58980.091] | 3945 |
| (58980.091, 65102.602] | 1680 |
| (65102.602, 71225.113] | 988 |
| (71225.113, 77347.625] | 676 |
| (77347.625, 83470.136] | 417 |
| (83470.136, 89592.647] | 268 |
| (89592.647, 95715.159] | 93 |
| (95715.159, 101837.67] | 19 |
Text(0, 0.5, 'Respondents')
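The interval labels in these tables resemble what `pd.cut` produces when asked for 15 equal-width bins, and the rows are ordered by count rather than by interval. How the tables were actually built is not stated, so the following is only a sketch under that assumption, using synthetic ages rather than survey data.

```python
import numpy as np
import pandas as pd

# Synthetic ages for illustration only; these are not the survey data.
rng = np.random.default_rng(0)
ages = pd.Series(rng.exponential(scale=9.0, size=1000), name="Age")

# Cut into 15 equal-width intervals, then count respondents per interval.
binned = pd.cut(ages, bins=15)
hist = binned.value_counts()      # ordered by count, like the tables above
hist_by_age = hist.sort_index()   # ordered by interval, easier to read as a bar plot

print(hist_by_age)
```

If the provided histograms behave the same way, calling `sort_index()` before `plot.bar()` should put the bars in age or income order.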
We also provide the deciles, i.e. the quantiles in steps of 10% (prefixes ugq_ and ufq_)
survey.ugq_age
| Quantile | Age |
|---|---|
| 0.1 | 1.60 |
| 0.2 | 3.24 |
| 0.3 | 4.91 |
| 0.4 | 6.60 |
| 0.5 | 8.43 |
| 0.6 | 10.56 |
| 0.7 | 12.85 |
| 0.8 | 15.25 |
| 0.9 | 17.79 |
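The decile table can be used as a coarse cumulative distribution. The sketch below retypes the values printed above and interpolates linearly between them; `approx_cdf` and `approx_quantile` are hypothetical helper names, and linear interpolation between deciles is an approximation, not something the survey module provides.

```python
import numpy as np

# Deciles of age in the grasslands, retyped from the ugq_age table above.
probs = np.arange(1, 10) / 10
ages = np.array([1.60, 3.24, 4.91, 6.60, 8.43, 10.56, 12.85, 15.25, 17.79])


def approx_cdf(age):
    """Linearly interpolate the fraction of respondents no older than `age`."""
    return float(np.interp(age, ages, probs))


def approx_quantile(p):
    """Linearly interpolate the age at cumulative probability `p` (0.1 to 0.9)."""
    return float(np.interp(p, probs, ages))


print(approx_cdf(8.43))       # the tabulated median maps back to 0.5
print(approx_quantile(0.25))  # rough lower quartile
```

Note that `np.interp` clamps to the end values outside the tabulated range, so results below the first decile or above the ninth are not meaningful.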
and a summary report (ugs_ and ufs_)
survey.ugs_age
{'mean': 9.363614436742981,
'std': 6.505748238948622,
'media': 8.43,
'range': [0.0, 111.95]}
We also provide the following 2D histograms:
- h_age_income
- h_age_edu
- h_income_edu

To protect our respondents' privacy, we do not split this further. Plotting these takes a little extra work
survey.h_age_income
| Age_cut | Income_cut | |
|---|---|---|
| (-0.117, 23.43] | (9467.548, 116490.402] | 492176 |
| | (116490.402, 222980.804] | 12 |
| | (222980.804, 329471.206] | 2096 |
| | (329471.206, 435961.608] | 2354 |
| | (435961.608, 542452.01] | 672 |
| (23.43, 46.86] | (9467.548, 116490.402] | 20819 |
| | (116490.402, 222980.804] | 0 |
| | (222980.804, 329471.206] | 242 |
| | (329471.206, 435961.608] | 322 |
| | (435961.608, 542452.01] | 102 |
| (46.86, 70.29] | (9467.548, 116490.402] | 1741 |
| | (116490.402, 222980.804] | 0 |
| | (222980.804, 329471.206] | 0 |
| | (329471.206, 435961.608] | 0 |
| | (435961.608, 542452.01] | 0 |
| (70.29, 93.72] | (9467.548, 116490.402] | 158 |
| | (116490.402, 222980.804] | 0 |
| | (222980.804, 329471.206] | 0 |
| | (329471.206, 435961.608] | 0 |
| | (435961.608, 542452.01] | 0 |
| (93.72, 117.15] | (9467.548, 116490.402] | 15 |
| | (116490.402, 222980.804] | 0 |
| | (222980.804, 329471.206] | 0 |
| | (329471.206, 435961.608] | 0 |
| | (435961.608, 542452.01] | 0 |

dtype: int64
dat = survey.h_age_income.to_numpy().reshape((5,5))
dat
array([[492176, 12, 2096, 2354, 672],
[ 20819, 0, 242, 322, 102],
[ 1741, 0, 0, 0, 0],
[ 158, 0, 0, 0, 0],
[ 15, 0, 0, 0, 0]])
# Mask empty bins before taking the log so that log(0) is never evaluated
imshow(np.log(np.where(dat > 0, dat, np.nan)))
<matplotlib.image.AxesImage at 0x7fe4acce8be0>
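One way to make this plot less awkward is to let pandas pivot the MultiIndex with `unstack()` and to let matplotlib do the log scaling via `LogNorm`. The sketch below rebuilds a stand-in for `survey.h_age_income` from the intervals and counts printed above; that the real object has index levels named `Age_cut` and `Income_cut` is taken from the printed output.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

# Stand-in for survey.h_age_income, retyped from the output printed above.
age_bins = pd.IntervalIndex.from_breaks([-0.117, 23.43, 46.86, 70.29, 93.72, 117.15])
income_bins = pd.IntervalIndex.from_breaks(
    [9467.548, 116490.402, 222980.804, 329471.206, 435961.608, 542452.01]
)
values = [492176, 12, 2096, 2354, 672,
          20819, 0, 242, 322, 102,
          1741, 0, 0, 0, 0,
          158, 0, 0, 0, 0,
          15, 0, 0, 0, 0]
index = pd.MultiIndex.from_product(
    [age_bins, income_bins], names=["Age_cut", "Income_cut"]
)
counts = pd.Series(values, index=index)

# unstack() pivots the inner index level into columns, so no manual reshape is needed.
grid = counts.unstack("Income_cut")

fig, ax = plt.subplots()
# LogNorm applies the log scaling; zero counts are masked first so they stay blank.
image = ax.imshow(np.ma.masked_equal(grid.to_numpy(), 0), norm=LogNorm())
ax.set_xticks(range(grid.shape[1]))
ax.set_xticklabels([str(c) for c in grid.columns], rotation=45, ha="right")
ax.set_yticks(range(grid.shape[0]))
ax.set_yticklabels([str(r) for r in grid.index])
ax.set_xlabel("Income_cut")
ax.set_ylabel("Age_cut")
fig.colorbar(image, ax=ax, label="Respondents")
fig.tight_layout()
```

Compared with a manual `reshape((5, 5))`, this keeps the bin labels attached to the rows and columns, so the axes can be annotated without retyping the intervals.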
We also provide the full $3\times3$ correlation matrix
survey.corr
| | Education | Age | Income |
|---|---|---|---|
| Education | 1.000000 | 0.700183 | 0.405668 |
| Age | 0.700183 | 1.000000 | 0.351382 |
| Income | 0.405668 | 0.351382 | 1.000000 |
You can run surveys, each of which may ask at most three questions.
survey.valid_questions
['Species', 'Age', 'Education', 'Income', 'Grassland?', 'Increase taxes?', 'Maintain forest']
You may poll at most 1000 animals, either online or over the telephone.
The following is a relatively small online survey
results = survey.survey(
['Species', 'Increase taxes?', 'Maintain forest'],
'online',
100
)
# Split the results by species and draw one scatter series per species
groups = dict(list(results.groupby('Species')))
species = {'m': ('Monkey', 'blue'), 'l': ('Lion', 'red'),
           'e': ('Elephant', 'green'), 'g': ('Gazelle', 'black')}
figure()
for code, (name, colour) in species.items():
    groups[code].plot.scatter('Increase taxes?', 'Maintain forest',
                              c=colour, label=name, ax=gca())
legend()
<matplotlib.legend.Legend at 0x7fe4ac93b3d0>
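Since each survey is limited, it can help to check a request before sending it. The helper below is a hypothetical sketch, not part of the `survey` module: `check_request` is an invented name, the 1000-animal limit is read here as a per-survey limit, and `'telephone'` is an assumed spelling of the second mode (only `'online'` appears in the example above).

```python
# Limits stated in the text: at most three questions and at most 1000 respondents.
VALID_QUESTIONS = ['Species', 'Age', 'Education', 'Income',
                   'Grassland?', 'Increase taxes?', 'Maintain forest']
MAX_QUESTIONS = 3
MAX_RESPONDENTS = 1000
# 'online' is taken from the example above; 'telephone' is an assumed spelling.
VALID_MODES = ('online', 'telephone')


def check_request(questions, mode, n):
    """Hypothetical helper: raise ValueError if a request breaks the stated limits."""
    unknown = [q for q in questions if q not in VALID_QUESTIONS]
    if unknown:
        raise ValueError(f"unknown questions: {unknown}")
    if not 1 <= len(questions) <= MAX_QUESTIONS:
        raise ValueError(f"ask between 1 and {MAX_QUESTIONS} questions")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    if not 1 <= n <= MAX_RESPONDENTS:
        raise ValueError(f"poll between 1 and {MAX_RESPONDENTS} animals")
    return True


# The request used in the example above passes the checks.
print(check_request(['Species', 'Increase taxes?', 'Maintain forest'], 'online', 100))
```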
ONS players import the module as ons instead
import ons
You have access to all the same data, plus the full census data
ons.all_data
| | Species | Age | Education | Income | Grassland? | Response |
|---|---|---|---|---|---|---|
| 0 | m | 16.71 | 1 | 24457.97 | True | False |
| 1 | m | 9.39 | 3 | 40965.07 | True | True |
| 2 | m | 6.46 | 0 | 10000.00 | True | True |
| 3 | m | 8.25 | 2 | 52457.45 | False | True |
| 4 | m | 14.19 | 4 | 64831.25 | False | True |
| ... | ... | ... | ... | ... | ... | ... |
| 520704 | g | 2.62 | 0 | 10000.00 | False | False |
| 520705 | g | 44.35 | 4 | 12000.00 | False | True |
| 520706 | g | 2.65 | 0 | 10000.00 | False | False |
| 520707 | g | 9.50 | 2 | 50730.52 | False | False |
| 520708 | g | 2.79 | 0 | 10000.00 | False | True |
520709 rows × 6 columns
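With row-level data, summaries similar to the ones above can be recomputed directly. The sketch below uses only the ten rows visible in the preview as a stand-in for `ons.all_data`; that the published tables were derived in exactly this way is an assumption, although the totals of the grassland tables do match the grassland count in u_habitat.

```python
import pandas as pd

# The ten rows visible in the ons.all_data preview above, as a stand-in.
all_data = pd.DataFrame({
    "Species": ["m"] * 5 + ["g"] * 5,
    "Age": [16.71, 9.39, 6.46, 8.25, 14.19, 2.62, 44.35, 2.65, 9.50, 2.79],
    "Education": [1, 3, 0, 2, 4, 0, 4, 0, 2, 0],
    "Income": [24457.97, 40965.07, 10000.00, 52457.45, 64831.25,
               10000.00, 12000.00, 10000.00, 50730.52, 10000.00],
    "Grassland?": [True, True, True, False, False,
                   False, False, False, False, False],
    "Response": [False, True, True, True, True,
                 False, True, False, False, True],
})

# Restrict to grassland respondents, mirroring the ug_ prefix.
grass = all_data[all_data["Grassland?"]]

species_counts = grass["Species"].value_counts()         # analogue of ug_species
edu_counts = grass["Education"].value_counts()           # analogue of ug_edu
corr = all_data[["Education", "Age", "Income"]].corr()   # analogue of survey.corr

print(species_counts)
print(corr.round(3))
```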