I'm setting up data collection for an African genomics study and have been pointed to the H3ABioNet 'Data Collection Toolkit'. What is it, what's in it, and how do I use it?
"Data Collection Toolkit" refers to H3ABioNet's set of
harmonised standards and reusable modules for collecting
clinical phenotype and exposure data in genomics studies. It is
not a single piece of software — it's a curated combination of
standards, ontologies, and pre-built data-collection forms.
What's in the toolkit:
- Phenotype-specific data collection modules. Developed
with the H3Africa Phenotype Harmonisation Working Group,
these are reusable form templates and standardised variable
definitions for common disease areas (cardiometabolic,
infectious disease, cancer, etc.). The intent is that
studies across the continent can collect comparable data,
which makes downstream joint analyses possible.
- REDCap as the collection tool. H3ABioNet provides
support for REDCap and recommends it as the primary data
capture platform. See
https://www.afrigen-d.org/data-resources#data-collection-toolkits.
- PhenX-based protocols. Where appropriate, modules draw
on the international PhenX Toolkit
(https://www.phenxtoolkit.org) for consistent measurement
protocols.
- Standardised consent codes (GA4GH DUO). Datasets are
tagged with Data Use Ontology terms so downstream consumers
can computationally check that their intended use matches
the consent.
- GECKO ontology for cohort description. Standardises how
cohorts are described so they can be discovered and compared
across studies.
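To make the "computationally check" idea concrete, here is a minimal sketch of matching an intended use against a dataset's primary DUO permission. The breadth ranking, the intended-use labels ("general", "health", "disease") and the function name are assumptions made for illustration, not an official DUO matching algorithm, and the term IDs should be confirmed against the current DUO release before you rely on them.

```python
# Hypothetical sketch: this ranking is a simplification for illustration,
# not an official DUO matching algorithm. Verify term IDs against DUO itself.

# Primary DUO permission terms, with a larger number meaning broader consent.
DUO_BREADTH = {
    "DUO:0000004": 3,  # no restriction (NRES)
    "DUO:0000042": 2,  # general research use (GRU)
    "DUO:0000006": 1,  # health or medical or biomedical research (HMB)
    "DUO:0000007": 0,  # disease specific research (DS)
}

# Breadth that each kind of intended use needs (labels invented for this sketch).
REQUIRED_BREADTH = {"general": 2, "health": 1, "disease": 0}


def use_is_permitted(dataset_term, intended_use,
                     dataset_disease=None, intended_disease=None):
    """Return True if the intended use fits within the dataset's DUO permission."""
    if dataset_term not in DUO_BREADTH:
        raise ValueError(f"unknown DUO term: {dataset_term}")
    if intended_use not in REQUIRED_BREADTH:
        raise ValueError(f"unknown intended use: {intended_use}")
    if DUO_BREADTH[dataset_term] < REQUIRED_BREADTH[intended_use]:
        return False
    # Disease-specific consent only covers research on the named disease.
    if dataset_term == "DUO:0000007":
        return dataset_disease is not None and dataset_disease == intended_disease
    return True
```

A real access decision would also need to honour DUO modifier terms (for example, restrictions on commercial use), which this sketch ignores.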
Why it matters:
- Without harmonised collection, joint analyses across studies
become difficult or impossible: you can't simply pool data when one
study collected blood pressure with two readings and another with
three.
- DUO-tagged consent means future data-access decisions can be
made consistently and at scale, including by automated tooling.
- Studies that follow H3ABioNet standards have a much easier
path to publishing their data via the H3Africa catalogue
and EGA, because the metadata aligns with what the catalogue
expects.
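The blood pressure case can be caught before any data is pooled by comparing how each study defined its measurements. The sketch below assumes each study can be summarised as a list of measurement specifications; the `MeasurementSpec` fields and the `pooling_conflicts` helper are hypothetical names for illustration, not part of the toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementSpec:
    """How one study collected one variable (hypothetical structure)."""
    variable: str
    unit: str
    n_readings: int


def pooling_conflicts(study_a, study_b):
    """Return the shared variables whose unit or number of readings differ."""
    specs_b = {spec.variable: spec for spec in study_b}
    conflicts = []
    for spec_a in study_a:
        spec_b = specs_b.get(spec_a.variable)
        # Only variables collected by both studies can conflict.
        if spec_b is not None and spec_a != spec_b:
            conflicts.append(spec_a.variable)
    return conflicts


study_a = [MeasurementSpec("systolic_bp", "mmHg", 2),
           MeasurementSpec("height", "cm", 1)]
study_b = [MeasurementSpec("systolic_bp", "mmHg", 3),
           MeasurementSpec("height", "cm", 1)]
print(pooling_conflicts(study_a, study_b))
```

Here the two studies agree on height but not on systolic blood pressure, so only the latter is flagged. This is the kind of mismatch that shared modules are meant to prevent at collection time.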
How to use it for your study:
- Read the overview and the standards page at
https://www.afrigen-d.org/data-resources.
- Decide which phenotype modules you need (they are
disease-area specific). If your area isn't currently covered, raise a
helpdesk ticket — the Phenotype Harmonisation Working Group
may already have an in-progress module, or your study could
feed into the next iteration.
- Set up REDCap (or get access to your institution's
H3ABioNet-supported REDCap instance — see the REDCap queue
for access requests).
- Import the relevant data dictionary / module template,
adapt it to your study, and validate it against the standards.
- Tag your datasets with appropriate DUO consent codes from
the outset; this is much harder to do retrospectively.
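For the "adapt and validate" step, one possible check is to confirm that your adapted REDCap data dictionary still contains every variable the module expects, with the expected field type. The sketch below is an illustration under assumptions: `MODULE_FIELDS` and its variable names are made up, and the CSV column headers ("Variable / Field Name", "Field Type") should be checked against your own REDCap data dictionary export.

```python
import csv
import io

# Hypothetical module definition: variable name -> expected REDCap field type.
MODULE_FIELDS = {
    "sbp_reading_1": "text",
    "sbp_reading_2": "text",
    "smoking_status": "radio",
}


def check_dictionary(csv_text, module_fields):
    """Compare a data dictionary CSV against a module's expected fields.

    Returns (missing, wrong_type): sorted lists of expected variables that are
    absent, and of variables that are present with a different field type.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    found = {row["Variable / Field Name"]: row["Field Type"] for row in reader}
    missing = sorted(name for name in module_fields if name not in found)
    wrong_type = sorted(
        name for name, expected in module_fields.items()
        if name in found and found[name] != expected
    )
    return missing, wrong_type


# Made-up dictionary excerpt: one variable dropped, one changed to a dropdown.
EXAMPLE_CSV = """Variable / Field Name,Form Name,Field Type,Field Label
sbp_reading_1,vitals,text,Systolic BP reading 1
smoking_status,lifestyle,dropdown,Smoking status
"""

missing, wrong_type = check_dictionary(EXAMPLE_CSV, MODULE_FIELDS)
print("missing:", missing)
print("wrong type:", wrong_type)
```

Running a check like this each time the dictionary changes makes drift from the module visible early, while it is still cheap to fix.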
Where to get help:
Raise a helpdesk ticket in the Data Collection Toolkits
queue with: your study design (briefly), the phenotype area
you're collecting on, and your REDCap status. The data
standards team will route you to the right module or
collaborator.