Redistribute data from providers as-is
We want our users to be aware that the data found on DBnomics is similar to the provider data. On the other hand, we want our users to avoid dealing with data representation specificities.
As a consequence, DBnomics distinguishes data from its format, and simplifies format only.
If DBnomics simplified or harmonized provider data, this would require more manual work (i.e. data curation), which is incompatible with DBnomics automatic data fetching (see next section). It would also make it impossible for the user to know what the provider data was. So data curation is left to the user.
The following items are kept as-is from the provider:
- time series and their observations
- dataset dimensions: DBnomics does not harmonize dimension names and values.
- NA (non-available) values usage: DBnomics does not add or remove them. If a provider distributes a time series with an incomplete calendar (with some missing periods), DBnomics does not try to complete it.
However, some data formatting is harmonized:
- periods: providers use different codes to represent them (e.g. 2020M01 for January 2020). DBnomics always uses 2020-01. See below for all period formats.
- NA (non-available) values: some providers use NaN, others -9999, etc. DBnomics always uses NA.
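The harmonization of NA values described above can be sketched as follows. This is a minimal illustration, not the actual DBnomics fetcher code: the function name `normalize_value`, the `PROVIDER_NA_MARKERS` set, and the choice of the string "NA" as the harmonized marker are assumptions made for the example.

```python
# Hypothetical set of provider-specific NA markers, taken from the examples
# in the text. A real fetcher would define its own set per provider.
PROVIDER_NA_MARKERS = {"NaN", "-9999"}

# Assumed harmonized marker for non-available values.
NA = "NA"


def normalize_value(raw, na_markers=PROVIDER_NA_MARKERS):
    """Return a float for a numeric observation, or NA for a missing one."""
    text = str(raw).strip()
    # Check markers first: float("NaN") would otherwise parse successfully.
    if text in na_markers:
        return NA
    return float(text)


observations = ["1.5", "NaN", "-9999", "2.0"]
normalized = [normalize_value(value) for value in observations]
```

Only the representation of missing values changes here. No observation is added or removed, which matches the rule that NA usage is kept as-is from the provider.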
Some providers distribute time series with no observation, or with only NA values, and DBnomics keeps them as-is as well. Here are some examples:
Update data regularly
We want up-to-date data on DBnomics, so data has to be updated automatically.
Data acquisition is done by small programs called DBnomics fetchers which are run automatically by the DBnomics platform.
Any manual data acquisition (e.g. copy-pasting values from a spreadsheet) would lead to outdated data.
We also want to keep track of the execution of fetchers, and that's why we have a dashboard.
Keep versions of provider data
Access data from programming languages
Access data from external software
Harmonized data model
Dimensions are provided as-is from provider data.
Period format is normalized:
- YYYY-MM for months (e.g. 2020-01)
- YYYY-MM-DD for days (MUST be padded, e.g. 2000-01-01)
- YYYY-Q[1-4] for year quarters: 2018-Q1 represents Jan to Mar 2018, and 2018-Q4 represents Oct to Dec 2018
- YYYY-S[1-2] for year semesters (aka bi-annual, semi-annual): 2018-S1 represents Jan to Jun 2018, and 2018-S2 represents Jul to Dec 2018
- YYYY-B[1-6] for pairs of months (aka bi-monthly): 2018-B1 represents Jan + Feb 2018, and 2018-B6 represents Nov + Dec 2018
- YYYY-W[01-53] for year weeks (MUST be padded)
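As an illustration, the normalized formats listed above could be checked with regular expressions. This is a sketch under assumptions: the names `NORMALIZED_PERIOD_PATTERNS` and `detect_period_kind` are hypothetical, and the patterns cover only the formats listed in this section. They check the shape of the string, not calendar validity (for example, 2018-02-31 is accepted).

```python
import re

# One pattern per normalized format listed above (hypothetical names).
NORMALIZED_PERIOD_PATTERNS = {
    "month": re.compile(r"[0-9]{4}-(0[1-9]|1[0-2])"),
    "day": re.compile(r"[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])"),
    "quarter": re.compile(r"[0-9]{4}-Q[1-4]"),
    "semester": re.compile(r"[0-9]{4}-S[12]"),
    "bimester": re.compile(r"[0-9]{4}-B[1-6]"),
    "week": re.compile(r"[0-9]{4}-W(0[1-9]|[1-4][0-9]|5[0-3])"),
}


def detect_period_kind(period):
    """Return the kind of a normalized period string, or None if it is invalid."""
    for kind, pattern in NORMALIZED_PERIOD_PATTERNS.items():
        # fullmatch requires the whole string to match, so unpadded values
        # such as "2018-1" or "2018-W7" are rejected.
        if pattern.fullmatch(period):
            return kind
    return None
```

For example, `detect_period_kind("2018-Q1")` returns `"quarter"`, while `detect_period_kind("2018-W7")` returns `None` because the week number is not padded.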
Normalization is done by each fetcher based on the knowledge of the provider data.
For example, a period like 2000-qII would be normalized as 2000-Q2 by the conversion script of the fetcher.
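A conversion step of this kind might look like the sketch below. It is not taken from any actual fetcher: the function name `normalize_period` is hypothetical, and it handles only the two provider formats mentioned in this document (2020M01 and 2000-qII). A real fetcher would encode the formats of its own provider.

```python
import re

# Roman numerals used by the hypothetical provider quarter format.
ROMAN_QUARTERS = {"I": 1, "II": 2, "III": 3, "IV": 4}


def normalize_period(raw):
    """Convert a provider-specific period string to the normalized format."""
    # Monthly format such as "2020M01" or "2020M1" -> "2020-01".
    match = re.fullmatch(r"([0-9]{4})M([0-9]{1,2})", raw)
    if match:
        year, month = match.group(1), int(match.group(2))
        if not 1 <= month <= 12:
            raise ValueError(f"Invalid month in period: {raw!r}")
        return f"{year}-{month:02d}"

    # Quarterly format with Roman numerals such as "2000-qII" -> "2000-Q2".
    match = re.fullmatch(r"([0-9]{4})-q(IV|III|II|I)", raw)
    if match:
        return f"{match.group(1)}-Q{ROMAN_QUARTERS[match.group(2)]}"

    # Fail loudly rather than guess: the fetcher author knows the provider data.
    raise ValueError(f"Unsupported period format: {raw!r}")
```

Raising an error on unknown formats, instead of passing the value through, is a design choice of this sketch. It makes a format change on the provider side visible when the fetcher runs.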
Note: when time series periods have a daily format but a lower frequency (e.g. monthly), the period format is simplified to match the frequency. For example, periods like 2000-01-01, 2000-02-01, 2000-03-01 are simplified as 2000-01, 2000-02, 2000-03, but periods like 2000-01-15, 2000-02-15, 2000-03-15 can't be simplified because we would lose the day-of-month information they convey.
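The simplification rule in the note above can be sketched for the monthly case. This is an assumption-laden illustration: the function name `simplify_daily_to_monthly` is hypothetical, the series is assumed to be known as monthly, and other frequencies (quarterly, annual, etc.) are not handled.

```python
from datetime import date


def simplify_daily_to_monthly(periods):
    """Simplify YYYY-MM-DD periods to YYYY-MM when no information is lost.

    Periods are simplified only if every one of them falls on the first day
    of its month. Otherwise they are returned unchanged.
    """
    days = [date.fromisoformat(period) for period in periods]
    if all(day.day == 1 for day in days):
        return [f"{day.year:04d}-{day.month:02d}" for day in days]
    # Dropping the day would lose information, e.g. the 15th of each month.
    return list(periods)


first_days = simplify_daily_to_monthly(["2000-01-01", "2000-02-01", "2000-03-01"])
mid_month = simplify_daily_to_monthly(["2000-01-15", "2000-02-15", "2000-03-15"])
```

Here `first_days` is `["2000-01", "2000-02", "2000-03"]`, while `mid_month` keeps its original daily periods.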
Support different data models
DBnomics defines a data model inspired by SDMX, which has to be compatible with all supported providers, even if their own data model is not SDMX-compliant.
As a consequence, the DBnomics data model defines some hard constraints, while other constraints have to be soft (cf. data model).