Input data for creating a synthetic population

This section describes all the input data that can be used to create a synthetic population. Only a few inputs are required: the age distribution is the only necessary one. To create a more specific synthetic population, provide more precise data about the population of each region through the optional inputs described below.

You can specify these inputs globally, as described in Synthpops configuration file. To do so, supply a csv/xlsx file that contains the paths to the data files. Each path can be set as 'default' or 'all', for an individual region (such as 'CZ01'), or for a parent region name (such as 'Czechia').

NOTE: All keys (column names) must be used in the default format. DO NOT RENAME THEM!

Types of input data

Necessary input data:

  • Age distribution

Optional input data:

  • Region mobility data
  • Employment rate
  • Enrollment rate
  • Household head age brackets
  • Household head age distribution by family size
  • Household size distribution
  • School size brackets
  • School size distribution
  • School size distribution by type
  • School types by age
  • Workplace size counts by the number of people

Age distribution data

The age distribution file contains one row per region with that region's age distribution. It is also where you provide each region's code, its name and its total population. You can use 16, 18 or 20 age brackets. For each region, the values of all brackets must sum to 1.0.

  • For 16 brackets: 00_04,...,70_74,75_100
  • For 18 brackets: 00_04,...,80_84,85_100
  • For 20 brackets: 00_04,...,90_94,95_100

An example can be downloaded here.

location_code  name                population  00_04              05_09              ...  95_99
CZ01           Hlavní město Praha  1241664     0.057312606308953  0.041531364362662  ...  0.0007804043606
CZ02           Středočeský kraj    1279345     0.063014276836975  0.051952366249917  ...  0.000469771640957
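You can check the two rules above (16, 18 or 20 brackets, and a sum of 1.0 per region) before running Synthpops. The following is a minimal sketch. It assumes a comma-separated file whose non-bracket columns are exactly location_code, name and population. `check_age_distribution` is a hypothetical helper, not part of Synthpops, and the toy row is not real data.

```python
import csv
import io
import math

# Columns that describe the region itself rather than an age bracket.
META_COLUMNS = {"location_code", "name", "population"}


def check_age_distribution(csv_text, tol=1e-6):
    """Hypothetical validator (not a Synthpops function).

    Returns {location_code: number_of_brackets} if every row is valid.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    brackets = [c for c in reader.fieldnames if c not in META_COLUMNS]
    if len(brackets) not in (16, 18, 20):
        raise ValueError(f"expected 16, 18 or 20 brackets, got {len(brackets)}")
    result = {}
    for row in reader:
        total = sum(float(row[c]) for c in brackets)
        if not math.isclose(total, 1.0, abs_tol=tol):
            raise ValueError(f"{row['location_code']}: brackets sum to {total}")
        result[row["location_code"]] = len(brackets)
    return result


# Toy file with 16 equal brackets (00_04, ..., 70_74, 75_100); not real data.
header = ["location_code", "name", "population"]
header += [f"{5 * i:02d}_{5 * i + 4:02d}" for i in range(15)] + ["75_100"]
row = ["CZ01", "Toy region", "1000"] + ["0.0625"] * 16
toy_csv = ",".join(header) + "\n" + ",".join(row) + "\n"
print(check_age_distribution(toy_csv))  # {'CZ01': 16}
```

Bracket columns are detected by exclusion rather than by name, so the check works whichever label the last bracket uses.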

Region mobility data

Region mobility data can be used to add people from other regions to a selected region; in effect, it simulates the number of people traveling between regions. The traveling agents are picked randomly from the agents of each region they travel from. The diagonal of the matrix is empty, because people cannot travel from a region to itself.

If the distribution across individual regions is not known, you can specify a wider area, for example a country: use Czechia instead of CZ01, ..., CZXX. You can also specify every region individually, or specify only some regions and let the rest be filled with the country-level data. NOTE: For a wider area, you have to specify the country name not only in the input data but also in the Synthpops region configuration creator.

An example can be downloaded here.

location_code  name                CZ01    CZ02   ...  CZ08
CZ01           Hlavní město Praha          15847  ...  169
CZ02           Středočeský kraj    100045         ...  121
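The following is a minimal sketch of reading the mobility matrix while enforcing the empty diagonal. It assumes a comma-separated file in which empty cells are left blank. `load_mobility_matrix` is a hypothetical name, not a Synthpops function, and the sketch makes no claim about whether rows or columns are the travel origin.

```python
import csv
import io


def load_mobility_matrix(csv_text):
    """Hypothetical loader (not a Synthpops function).

    Returns {row_code: {column_code: count}}; empty cells are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    region_cols = [c for c in reader.fieldnames if c not in ("location_code", "name")]
    matrix = {}
    for row in reader:
        code = row["location_code"]
        # The diagonal (a region traveling to itself) must stay empty.
        if (row.get(code) or "").strip():
            raise ValueError(f"diagonal cell {code}/{code} must be empty")
        matrix[code] = {c: int(row[c]) for c in region_cols if (row[c] or "").strip()}
    return matrix


# Excerpt of the example above, keeping only the CZ01, CZ02 and CZ08 columns.
excerpt = (
    "location_code,name,CZ01,CZ02,CZ08\n"
    "CZ01,Hlavní město Praha,,15847,169\n"
    "CZ02,Středočeský kraj,100045,,121\n"
)
matrix = load_mobility_matrix(excerpt)
print(matrix["CZ02"]["CZ01"])  # 100045
```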

Employment rate distribution by age

The employment rate distribution defines the probability of being employed at each age, from a start age to an end age, commonly from 16 to 100 years. The file can also hold columns for other regions, but only the regions selected in the main synthetic population settings are used. Columns can be keyed by region codes or, optionally, by region names. NOTE: Column lookup depends on region_data_name defined in the region settings. The supplied data must contain at least one column whose name corresponds to the region location code, or a column for a wider area name.

An example can be downloaded here.

Age CZ01 CZ02 ... CZ08 Czechia
16 0.332 0.319 ... 0.241 0
17 0.332 0.237 ... 0.310 0
... ... ... ... ... 0
100 0.064 0.039 ... 0.015 0
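The following is a minimal sketch of selecting a region's column with a fallback to a wider area. The fallback order is an assumption for illustration: the region code is tried first, then the wider area name. It does not describe how Synthpops resolves region_data_name. `employment_rates` is a hypothetical helper.

```python
import csv
import io


def employment_rates(csv_text, location_code, wider_area=None):
    """Hypothetical lookup (not a Synthpops function): {age: rate}.

    Uses the column named after the region code if present, otherwise the
    column of the wider area (for example a country name).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if location_code in reader.fieldnames:
        column = location_code
    elif wider_area is not None and wider_area in reader.fieldnames:
        column = wider_area
    else:
        raise KeyError(f"no column for {location_code!r} or {wider_area!r}")
    return {int(row["Age"]): float(row[column]) for row in reader}


# Excerpt of the example above (ages 16 and 17, regions CZ01, CZ02, Czechia).
excerpt = "Age,CZ01,CZ02,Czechia\n16,0.332,0.319,0\n17,0.332,0.237,0\n"
print(employment_rates(excerpt, "CZ02"))  # {16: 0.319, 17: 0.237}
print(employment_rates(excerpt, "CZ05", wider_area="Czechia"))  # {16: 0.0, 17: 0.0}
```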

School enrollment rate distribution by age

The school enrollment rate distribution defines the probability of being enrolled in school at each age, from a start age to an end age; in the example below, ages run from 0 to 100. The same rules for region columns apply as for the employment rate distribution above.

An example can be downloaded here.

Age CZ01 CZ02 ... CZ08 Czechia
0 0 0 ... 0 0
... ... ... ... ... ...
17 0.974 0.974 ... 0.974 0.974
18 0.707 0.707 ... 0.707 0.707
... ... ... ... ... ...
100 0 0 ... 0 0
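Both rate tables are indexed by single years of age. A useful sanity check is therefore that the ages are consecutive and that every value is a probability. The following is a minimal sketch assuming comma-separated files with an Age column. `check_rate_table` is a hypothetical name, and all ages except 17 and 18 are toy values.

```python
import csv
import io


def check_rate_table(csv_text):
    """Hypothetical check (not a Synthpops function).

    Verifies that ages are consecutive and every rate is a probability in
    [0, 1]; returns the (first_age, last_age) covered by the table.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    regions = [c for c in reader.fieldnames if c != "Age"]
    ages = []
    for row in reader:
        age = int(row["Age"])
        ages.append(age)
        for region in regions:
            rate = float(row[region])
            if not 0.0 <= rate <= 1.0:
                raise ValueError(f"age {age}, {region}: {rate} is not a probability")
    if not ages or ages != list(range(ages[0], ages[-1] + 1)):
        raise ValueError("ages must be consecutive with no gaps")
    return ages[0], ages[-1]


# Toy table for ages 0-100: ages 17 and 18 use the example values above,
# every other age is set to 0 for brevity (not real data).
known = {17: 0.974, 18: 0.707}
lines = ["Age,CZ01,Czechia"]
lines += [f"{age},{known.get(age, 0)},{known.get(age, 0)}" for age in range(101)]
print(check_rate_table("\n".join(lines)))  # (0, 100)
```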

Household head age brackets

Household head age brackets define the age brackets of household heads and are used to build the household head age distribution. Each bracket is defined by a minimum and a maximum age. The brackets commonly cover 15 to 100 years in steps of 5 years, with a wider last bracket. NOTE: All household-related input data must be based on these brackets.

An example can be downloaded here.

min_age max_age
15 19
20 24
... ...
75 79
80 100
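The following is a minimal sketch showing how the brackets can be loaded, checked for gaps or overlaps, and mapped to the 0-based index used as a column number in the household head age distribution by family size. `load_age_brackets` and `bracket_index` are hypothetical helpers, not Synthpops functions.

```python
import csv
import io


def load_age_brackets(csv_text):
    """Hypothetical loader (not a Synthpops function): [(min_age, max_age), ...].

    Raises if consecutive brackets leave a gap or overlap.
    """
    brackets = [(int(r["min_age"]), int(r["max_age"]))
                for r in csv.DictReader(io.StringIO(csv_text))]
    for (_, high), (next_low, _) in zip(brackets, brackets[1:]):
        if next_low != high + 1:
            raise ValueError(f"gap or overlap between {high} and {next_low}")
    return brackets


def bracket_index(age, brackets):
    """0-based index of the bracket containing age, or None if none does.

    This index is the column number used by the household head age
    distribution by family size file.
    """
    for index, (low, high) in enumerate(brackets):
        if low <= age <= high:
            return index
    return None


# Brackets 15-19, 20-24, ..., 75-79 and a last bracket 80-100.
rows = ["min_age,max_age"]
rows += [f"{15 + 5 * i},{19 + 5 * i}" for i in range(13)] + ["80,100"]
brackets = load_age_brackets("\n".join(rows))
print(len(brackets), bracket_index(42, brackets))  # 14 5
```

With these 14 brackets, the indices run from 0 to 13, matching the column headers of the example in the next section.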

Household head age distribution by family size

Household head age distribution by family size defines the number of household heads in each household head age bracket. Each row in this table specifies the distribution for one family size. The first entry in the row is the family size. The remaining entries give, for each household head age bracket, the number or percentage of households whose head falls into that bracket. In the *.csv file, the household head age brackets are referred to by their index, from 0 to N-1, where N is the total number of brackets.

An example can be downloaded here.

number 0 1 2 ... 13
1 1.0 1.0 1.0 ... 1.0
2 163.0 999. 2316.0 ... 2230
... ... ... ... ... ...
7 24.0 33.0 63.0 ... 144.0
8 0.0 0.0 0.0 ... 0.0
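Each row may hold either counts or percentages, so one way to use it is to normalize each row into probabilities over the age brackets. The following is a minimal sketch of that idea. It is an assumption for illustration, not a description of Synthpops internals. `head_age_distribution` is a hypothetical name, and the 3-bracket file is a toy.

```python
import csv
import io


def head_age_distribution(csv_text):
    """Hypothetical loader (not a Synthpops function).

    Returns {family_size: [p_0, ..., p_(N-1)]}, one probability per household
    head age bracket. Each row is normalized to sum to 1, so rows may hold
    either counts or percentages; an all-zero row stays all zeros.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    n_brackets = len(header) - 1  # bracket columns 0..N-1 after "number"
    if header[1:] != [str(i) for i in range(n_brackets)]:
        raise ValueError("bracket columns must be numbered 0, 1, ..., N-1")
    result = {}
    for row in reader:
        counts = [float(x) for x in row[1:]]
        total = sum(counts)
        result[int(row[0])] = [c / total for c in counts] if total > 0 else counts
    return result


# Toy file with only 3 brackets (the real file has one column per bracket).
toy = "number,0,1,2\n1,1.0,1.0,2.0\n8,0.0,0.0,0.0\n"
print(head_age_distribution(toy))  # {1: [0.25, 0.25, 0.5], 8: [0.0, 0.0, 0.0]}
```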

Household size distribution

Household size distribution defines how households are distributed across household sizes. Each row in this table gives, for one household size (the number column), the fraction of all households that have that size. NOTE: This household input data depends on the household head age brackets. The fractions across all rows must sum to 1.

An example can be downloaded here.

number distribution
1 0.064
2 0.108
... ...
7 0.128
8 0.0
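The following is a minimal sketch of loading this table, checking that it sums to 1, and drawing household sizes from it with the standard library's weighted sampling. `load_household_sizes` and `sample_household_sizes` are hypothetical helpers, and the toy distribution is not real data.

```python
import csv
import io
import math
import random


def load_household_sizes(csv_text, tol=1e-6):
    """Hypothetical loader (not a Synthpops function): {size: probability}."""
    dist = {int(r["number"]): float(r["distribution"])
            for r in csv.DictReader(io.StringIO(csv_text))}
    total = sum(dist.values())
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"distribution sums to {total}, expected 1")
    return dist


def sample_household_sizes(dist, n, seed=0):
    """Draw n household sizes with the probabilities from dist."""
    rng = random.Random(seed)
    sizes = list(dist)
    return rng.choices(sizes, weights=[dist[s] for s in sizes], k=n)


# Toy distribution (not real data); size 4 has probability 0 and is never drawn.
toy = "number,distribution\n1,0.25\n2,0.25\n3,0.5\n4,0.0\n"
dist = load_household_sizes(toy)
print(sample_household_sizes(dist, 5))
```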

School size brackets

School size brackets define the ranges of school sizes and are used to build the school size distribution. Each bracket is defined by a minimum and a maximum size. NOTE: All school-related input data must be based on these brackets.

An example can be downloaded here.

min max
20 50
51 100
101 300
... ...
2301 2700
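Every other school input refers to these brackets by position, so it helps to check them once and record how many there are. The following is a minimal sketch assuming a comma-separated file with min and max columns. `load_school_size_brackets` is a hypothetical name.

```python
import csv
import io


def load_school_size_brackets(csv_text):
    """Hypothetical loader (not a Synthpops function): [(min, max), ...] in file order."""
    brackets = [(int(r["min"]), int(r["max"]))
                for r in csv.DictReader(io.StringIO(csv_text))]
    for i, (low, high) in enumerate(brackets):
        if low > high:
            raise ValueError(f"bracket {i}: min {low} is larger than max {high}")
        if i > 0 and low <= brackets[i - 1][1]:
            raise ValueError(f"bracket {i} overlaps bracket {i - 1}")
    return brackets


# First three rows of the example above.
excerpt = "min,max\n20,50\n51,100\n101,300\n"
brackets = load_school_size_brackets(excerpt)
# One value per bracket is then expected in the school size distribution.
print(len(brackets))  # 3
```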

School size distribution

School size distribution defines how schools are distributed across the school size brackets. It depends on the school size brackets: each row holds the value for one bracket, in the same order as in the school size brackets file.

An example can be downloaded here.

distribution
0.06024096385542162
0.07831325301204821
...
0.006024096385542101
0.0
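The following is a minimal sketch of pairing this column with the brackets to draw school sizes. It picks a bracket by weight and then a size uniformly inside it. Uniform sampling within a bracket is an assumption for illustration, not necessarily what Synthpops does. `load_distribution_column` and `sample_school_sizes` are hypothetical helpers, and the weights are toy values.

```python
import csv
import io
import random


def load_distribution_column(csv_text):
    """Hypothetical loader: the single "distribution" column as a list of floats."""
    return [float(r["distribution"]) for r in csv.DictReader(io.StringIO(csv_text))]


def sample_school_sizes(brackets, distribution, n, seed=0):
    """Pick a bracket by its weight, then a size uniformly inside that bracket."""
    if len(brackets) != len(distribution):
        raise ValueError("need exactly one distribution row per school size bracket")
    rng = random.Random(seed)
    chosen = rng.choices(brackets, weights=distribution, k=n)
    return [rng.randint(low, high) for low, high in chosen]


brackets = [(20, 50), (51, 100), (101, 300)]  # from the school size brackets excerpt
distribution = load_distribution_column("distribution\n0.5\n0.3\n0.2\n")  # toy weights
print(sample_school_sizes(brackets, distribution, 10))
```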

School size distribution by type

School size distribution by type defines the school size distribution separately for each school type. Each row in this table specifies the distribution for one school type, as one value per school size bracket in bracket order. You can define your own school types and set their age ranges in the School types by age input. NOTE: This school input data depends on the school size brackets and on the school types by age. The values in each row must sum to 1.

pk = preschool, es = elementary school, ms = middle school, hs = high school, uv = university

An example can be downloaded here.

school_type distribution
pk "0.012658227848101266, 0.0, ..., 0.43037974683544306, 0.0, 0.0, 0.0, 0.0, 0.0,"
es "0.012658227848101266, 0.0, ..., 0.43037974683544306, 0.0, 0.0, 0.0, 0.0, 0.0,"
hs "0.07407407407407407, 0.1111111111111111, ... , 0.18518518518518517, 0.0, 0.0 , 0.0"
uv "0.027522935779816515, 0.009174311926605505, ... , 0.2018348623853211, 0.3944954128440367, 0.0"

NOTE: The distribution must be enclosed in double quotes to be parsed correctly, because its values are separated by commas.
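The double quotes matter because a CSV reader would otherwise split the list into separate fields. The following is a minimal sketch using Python's csv module that parses the quoted list and checks its length and sum. Dropping the empty piece left by a trailing comma is an assumption of this sketch. `load_size_distribution_by_type` is a hypothetical helper, and the 3-bracket file is a toy.

```python
import csv
import io
import math


def load_size_distribution_by_type(csv_text, n_brackets, tol=1e-6):
    """Hypothetical loader (not a Synthpops function): {school_type: [p_0, ...]}."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The double quotes keep the comma-separated list in a single CSV field.
        # Empty pieces, such as the one left by a trailing comma, are ignored.
        values = [float(v) for v in row["distribution"].split(",") if v.strip()]
        if len(values) != n_brackets:
            raise ValueError(
                f"{row['school_type']}: {len(values)} values, expected {n_brackets}")
        if not math.isclose(sum(values), 1.0, abs_tol=tol):
            raise ValueError(f"{row['school_type']}: values sum to {sum(values)}")
        result[row["school_type"]] = values
    return result


# Toy file with 3 school size brackets (not real data).
toy = 'school_type,distribution\npk,"0.5, 0.5, 0.0,"\nuv,"0.25, 0.25, 0.5"\n'
print(load_size_distribution_by_type(toy, n_brackets=3))
# {'pk': [0.5, 0.5, 0.0], 'uv': [0.25, 0.25, 0.5]}
```

Without the quotes, the row pk,0.5,0.5,0.0 would leave only "0.5" in the distribution field, and the length check would fail.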

School types by age

School types by age defines the age range of each school type and is used to create the school age distribution. Each row in this table specifies the minimum and maximum age for one school type. This is also where you define the school types themselves.

pk = preschool, es = elementary school, ms = middle school, hs = high school, uv = university

An example can be downloaded here.

school_type min_age max_age
pk 3 5
es 6 10
hs 15 18
uv 19 100
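The following is a minimal sketch of loading the age ranges, listing which school types cover a given age, and flagging types that are used in the school size distribution by type but never defined here. All three functions are hypothetical helpers, not Synthpops functions.

```python
import csv
import io


def load_school_types(csv_text):
    """Hypothetical loader (not a Synthpops function): {school_type: (min_age, max_age)}."""
    return {r["school_type"]: (int(r["min_age"]), int(r["max_age"]))
            for r in csv.DictReader(io.StringIO(csv_text))}


def school_types_for_age(age, school_types):
    """School types whose age range contains age (may be empty)."""
    return [t for t, (low, high) in school_types.items() if low <= age <= high]


def undefined_types(types_in_size_distribution, school_types):
    """Types used in the size distribution by type but missing here."""
    return sorted(set(types_in_size_distribution) - set(school_types))


table = "school_type,min_age,max_age\npk,3,5\nes,6,10\nhs,15,18\nuv,19,100\n"
types = load_school_types(table)
print(school_types_for_age(16, types))  # ['hs']
print(school_types_for_age(12, types))  # []
print(undefined_types(["pk", "es", "ms"], types))  # ['ms']
```

In this example no school type covers ages 11 to 14, because the table has no ms row.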

Workplace size counts by number of people

Workplace size counts describe how many workplaces there are of each size, where size is the number of people working in a workplace. Each row defines a bracket from a minimum to a maximum number of agents per workplace, together with the total count of workplaces in that bracket.

An example can be downloaded here.

min_people max_people count
1 4 107682.0
5 9 35584.0
... ... ...
500 999 245.0
1000 1999 150.0
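Because this file holds raw counts rather than fractions, a common first step is to turn it into the share of workplaces per size bracket. The following is a minimal sketch assuming a comma-separated file with the column names shown above. `workplace_size_shares` is a hypothetical helper, and it is run on the four example rows only, so the shares differ from those of the full file.

```python
import csv
import io


def workplace_size_shares(csv_text):
    """Hypothetical helper (not a Synthpops function).

    Turns workplace counts into the share of workplaces in each size bracket:
    [(min_people, max_people, share), ...].
    """
    rows = [(int(r["min_people"]), int(r["max_people"]), float(r["count"]))
            for r in csv.DictReader(io.StringIO(csv_text))]
    total = sum(count for _, _, count in rows)
    return [(low, high, count / total) for low, high, count in rows]


# The four rows shown above only; shares of the full file will differ.
excerpt = ("min_people,max_people,count\n"
           "1,4,107682.0\n5,9,35584.0\n500,999,245.0\n1000,1999,150.0\n")
for low, high, share in workplace_size_shares(excerpt):
    print(f"{low}-{high}: {share:.3f}")
```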