If you want to contribute to DBnomics by adding support for a new data provider, or enhance an existing one, please read the dedicated sections below.
The DBnomics core team is open to external contributions, and we are making increasing efforts to ease the contribution path. Thank you for your interest in DBnomics!
Avoid redundant work
Suppose you want to add a new dataset of a specific provider to DBnomics. To avoid duplicating work, you should first check that it is not already available in DBnomics.
Of course, when adding a completely new fetcher, there is no question. But sometimes it is not as straightforward as it seems: a particular dataset can be published by a provider in different places on its website, possibly under different names, etc.
In case of doubt, just ask the DBnomics community on the forum.
Only open data, or data under a permissive licence, can be contributed to DBnomics.
Please double-check that the licence of the source data is permissive enough before writing a fetcher.
See also: Can I have my private data on DBnomics?
Should I write a new fetcher or contribute to an existing one?
When adding support for a completely new provider, create a new fetcher.
When fixing errors in existing data, submit a patch to the corresponding fetcher via a GitLab merge-request.
When adding new datasets to an existing provider, it depends. If you feel confident with the source code of the existing fetcher and your changes fit well, go ahead and submit a merge-request. If, on the other hand, you feel more confident working in your own project, go ahead and create a personal Git repository.
When several fetchers write data for the same provider, each should have its own source-data repository, but they should share the same json-data repository. This case has not been encountered yet.
In case of doubt, just ask the DBnomics community on the forum.
Contributing to an existing fetcher
When contributing to an existing fetcher, you should respect the code style and use the code-quality tools defined by its maintainers.
In case of error, the maintainer of the fetcher will have to fix it, so the contributed changes should be crystal clear to them.
And of course, the data produced by your changes must be valid (cf. the acceptance process section below).
When a fetcher is ready to be submitted, its author can follow this process in order for the fetcher to be accepted, deployed to production, and its data visible on DBnomics website.
One of the main conditions for a fetcher to be accepted is to produce valid data. Follow the task: Validate data produced by a fetcher. Once data is valid, the fetcher can be submitted to the DBnomics core team.
Criteria to meet
Technically the fetcher must meet some criteria in order to be run in an automated job.
- the fetcher must be installable from a fresh virtual env: commit dependencies (requirements.in) and locked versions (requirements.txt) as explained in this section
- convert.py must be executable with the common script arguments (like all fetchers do) as explained in the download and convert sections
- convert.py must produce data that is valid against the DBnomics data model, as explained in this section
- the fetcher must define a license as explained in this section
Optionally, to ease contribution:
- the Python source files should be formatted automatically as explained in this section
- the Python source files should be linted as explained in this section
Submit the fetcher
The contributor can create an account on the GitLab instance of DBnomics (click on register).
By default, to avoid spam, new accounts are created as external users, which cannot create repositories. The contributor can send an email to email@example.com to ask for the external status to be removed.
At first, the source code is just executed by the DBnomics core team, not really audited: it is considered a black box. Only the validity of the data produced by convert.py is checked.
If data is valid, the fetcher is deployed manually to DBnomics pre-production instance by a developer of the core team.
An economist of the core team will check data corresponding to the fetcher on the pre-production instance, and compare it to source data available on the provider website.
If there are problems they will be discussed on the issue opened earlier.
If everything is OK, then a core team developer will deploy the new fetcher to production.
Production and maintenance
Deploying a fetcher to production consists of configuring a pipeline of jobs scheduled to run the fetcher every day. This is done by a developer of the core team.
After running in production, fetchers often break, and this can hardly be avoided. In order to keep data up to date on DBnomics, it is recommended to do fetcher maintenance as quickly as possible.
Each fetcher has a maintainer, who is its author by default. The dashboard shows them.
In case of a problem with a fetcher, an issue is created and assigned to its maintainer, who is responsible for solving it. DBnomics core members will look at the issue as a second priority.
Also, if questions are asked on the forum about a fetcher, its maintainer is expected to participate in the discussion. DBnomics core members will participate as well.
See also: Why do fetchers break after a while?
Writing a new fetcher
Install and configure environment
Here we use a classic Python workflow with virtualenv.
See also: Virtualenv section of The Hitchhiker’s Guide to Python!
The recommended Python version is the latest stable.
Initialize a new project from dbnomics-fetcher-cookiecutter.
```shell
pip install virtualenv
mkvirtualenv my-fetcher
pip install cookiecutter
git clone https://git.nomics.world/dbnomics/dbnomics-fetcher-cookiecutter.git
cookiecutter dbnomics-fetcher-cookiecutter

# Prepare directories to write data to.
mkdir source-data json-data
```
The download script, named download.py, downloads data from the provider and writes it to a target directory named source-data. It expects the target directory to be empty.
- Your download script is a bot, similar to the search-engine bots that index webpages. Respect the directives exposed by the robots.txt file of the provider website.
- Write your script in a resilient way, so that it is less likely to break when the data evolves. Of course, one cannot completely anticipate every possible change.
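The robots.txt check mentioned above can be sketched with Python's standard-library urllib.robotparser; the bot name, paths, and robots.txt content below are hypothetical:

```python
import urllib.robotparser

# A hypothetical robots.txt, as a provider might publish it.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# Check each URL before downloading it (bot name is illustrative).
assert parser.can_fetch("my-fetcher-bot", "https://example.org/data/dataset.csv")
assert not parser.can_fetch("my-fetcher-bot", "https://example.org/private/internal.csv")
```

In real use, one would call parser.set_url() and parser.read() against the provider's live robots.txt instead of parsing a literal string.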
Run the download script:
```shell
python download.py source-data
```
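As a minimal sketch, a download script could look like the following; the provider URL and file name are hypothetical, and only the empty-target-directory convention is shown in full:

```python
"""Minimal sketch of a download script (hypothetical provider URL)."""
import sys
import urllib.request
from pathlib import Path


def ensure_empty(target_dir: Path) -> None:
    """Abort unless the target directory exists and is empty, per the convention above."""
    if not target_dir.is_dir() or any(target_dir.iterdir()):
        raise SystemExit(f"{target_dir} must be an existing, empty directory")


def main() -> None:
    target_dir = Path(sys.argv[1])
    ensure_empty(target_dir)
    # Download a single hypothetical file from the provider.
    data = urllib.request.urlopen("https://example.org/data/dataset.csv").read()
    (target_dir / "dataset.csv").write_bytes(data)


# The argument-count guard lets the module also be imported without side effects.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

A real script would typically download several files and handle network errors; this only illustrates the expected command-line shape.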
The convert script, named convert.py, converts downloaded data from source-data to the DBnomics data model and writes it to a target directory named json-data. It expects the target directory to be empty.
Run the convert script:
```shell
python convert.py source-data json-data
```
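For illustration, here is a sketch of what a convert step could write; the file layout, JSON fields, and TSV format below are assumptions to be checked against the DBnomics data model documentation:

```python
"""Sketch of a convert step writing a tiny dataset (layout and fields are assumptions)."""
import json
from pathlib import Path


def convert(source_dir: Path, target_dir: Path) -> None:
    # A real script would parse files downloaded into source_dir.
    # provider.json: metadata about the provider (field names assumed).
    (target_dir / "provider.json").write_text(
        json.dumps({"code": "DEMO", "name": "Demo provider"})
    )
    # One directory per dataset, holding its metadata and its series.
    dataset_dir = target_dir / "D1"
    dataset_dir.mkdir(exist_ok=True)
    (dataset_dir / "dataset.json").write_text(
        json.dumps({"code": "D1", "name": "Demo dataset"})
    )
    # Series observations as TSV with PERIOD and VALUE columns (assumed format).
    (dataset_dir / "S1.tsv").write_text("PERIOD\tVALUE\n2019\t1.5\n2020\t2.0\n")
```

Whatever the exact layout, the validation step described below is the authoritative check that the output matches the data model.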
Follow the task: Validate data produced by a fetcher.
Define a license
Every source code repository that is published publicly should include a license file named LICENSE.
For example, for the AGPL-3.0:
```shell
wget https://www.gnu.org/licenses/agpl-3.0.txt -O LICENSE
```
The license file should be committed.
Code style and quality
In order to improve the maintainability of a fetcher, it is highly recommended to follow the code style and quality guidelines recommended by the DBnomics project. Indeed, once a fetcher fails in production, it is very likely that a member of the DBnomics maintenance team will handle the problem instead of its original author.
The recommended guidelines mainly follow Python good practices, with additional DBnomics-specific ones.
Declare dependencies with versions
Your fetcher may use external Python packages that are installed in your virtualenv with pip. It is highly recommended to pin the version numbers of those dependencies in requirements.txt. It is not sufficient to just mention the names of the packages: someone installing them later would get whatever versions are available at that time, and those may behave differently from the ones you worked with. It is also important to pin versions recursively, including the dependencies of your dependencies.
There are several solutions in the Python community to achieve version pinning. DBnomics fetchers usually use pip-tools, like this:
```
# requirements.in
python-slugify
ujson
```

```shell
pip install pip-tools
pip-compile
```
The following file is generated:
```
# requirements.txt
#
# This file is autogenerated by pip-compile
# To update, run:
#
#    pip-compile
#
python-slugify==4.0.0   # via -r requirements.in
text-unidecode==1.3     # via python-slugify
ujson==2.0.3            # via -r requirements.in
```
requirements.txt must be committed.
Format your code automatically
Format Python source code with an opinionated formatter, and ship the configuration you used along with the source code.
This almost completely avoids commits that only change formatting, and makes it easier to find bugs by reading through a clean source-code history.
Configure your source code editor to use that formatter.
Use a linter
A linter is a tool that catches common errors in the source code.
Source code editors can take advantage of linters to highlight errors directly under the lines of the source code.
Use Python types
It is highly recommended to use Python type annotations in the source code of a fetcher. This improves source-code editor instrumentation such as auto-completion and tooltips, and helps catch errors.
This mainly consists of using mypy in your source code editor.
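As a small illustration (a hypothetical helper, not part of any actual fetcher), annotations make the accepted inputs and outputs explicit, so mypy and editors can catch misuse:

```python
from typing import Optional


def parse_observation_value(raw: str) -> Optional[float]:
    """Parse a raw observation value; "NA" as the missing-value marker is an assumption."""
    if raw == "NA":
        return None
    return float(raw)
```

Calling this helper with a float instead of a string, or forgetting to handle the None return value, would be flagged by mypy before the fetcher ever runs.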
Follow common conventions
Some well-known Python libraries like Pandas propose a de-facto standard way to name variables, like df for data-frames.
DBnomics recommends following those conventions.
Do not abbreviate data model concepts
DBnomics data model defines concepts such as provider, dataset, time series, dimension or observation. Do not abbreviate those terms.
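For instance (the variable names below are illustrative only):

```python
# Spell out the data-model terms: the code stays searchable and unambiguous.
provider_code = "DEMO"
dataset_code = "D1"
observations_by_series_code: dict = {}

# Avoid abbreviating the data-model vocabulary, e.g.:
# prov_cd, ds_code, obs_by_s_code
```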
Plural of time series
In English, the noun "series" is invariable: the plural of "time series" is "time series".
In order to distinguish a single series from a list of series:
- name a single series
- name a list of series
Submit your fetcher
Follow the acceptance process.
Report problems with data
If you notice wrong data on the website, you can help by contributing at different levels.
First of all, you can tell the DBnomics core team about the problem by creating a new issue and filling in the template named "Problem with data". This template contains placeholders that you can replace with real values. The idea is to give as much detail as possible to help the DBnomics team investigate!
Then, if you'd like to, you can try to solve the issue yourself. Once you have identified the source-code repository of the fetcher, you can fork it and submit a merge-request. We recommend doing that after a discussion with the DBnomics core team on the issue you created.
In any case: thank you for your contribution!
Validate data produced by a fetcher
Suppose you just finished writing or fixing a fetcher. Now you'd like to check the validity of the data produced by convert.py. Run your fetcher if not already done:
```shell
mkdir source-data json-data
python download.py source-data
python convert.py source-data json-data
```
Now install the validation script and run it:
```shell
pip install dbnomics-data-model
dbnomics-validate --all-series --all-observations --developer-mode json-data
```
```
- Series "RBA/A3-4/AFROMOTD" at location AFROMOTD.tsv (line 3)
  Error code: duplicated-observations-period
  Message: Duplicated period
  Context: period: '2013-11-11'
- Series "RBA/A3-4/AFROMOTD" at location AFROMOTD.tsv (line 5)
  Error code: duplicated-observations-period
  Message: Duplicated period
  Context: period: '2013-11-12'
[...]
Encountered errors codes:
- duplicated-observations-period: 12448
```
At the end of the output you'll find a summary of the count of errors by type.
The --developer-mode option displays all errors, in particular non-fatal ones, to help you improve the quality of your fetcher. In production this option is not used, in order to speed up validation.
If your fetcher writes a huge quantity of data, you can remove the --all-series option to validate only a randomly chosen sample of series per dataset. You can also remove the --all-observations option to validate only a few observations per series.
View data produced by a fetcher in a local instance of DBnomics
See the dbnomics-docker project.