Data Collection Toolkit

H3ABioNet's harmonised phenotype data collection standards and modules: how to use them, how they relate to REDCap, PhenX, DUO and GECKO, and how to request a specific module.

What is the H3ABioNet Data Collection Toolkit?

I'm setting up data collection for an African genomics study and have been pointed to the H3ABioNet 'Data Collection Toolkit'. What is it, what's in it, and how do I use it?

"Data Collection Toolkit" refers to H3ABioNet's set of
harmonised standards and reusable modules for collecting
clinical phenotype and exposure data in genomics studies. It is
not a single piece of software but a curated combination of
standards, ontologies, and pre-built data-collection forms.

What's in the toolkit:

  • Phenotype-specific data collection modules. Developed
    with the H3Africa Phenotype Harmonisation Working Group,
    these are reusable form templates and standardised variable
    definitions for common disease areas (cardiometabolic,
    infectious disease, cancer, etc.). The intent is that
    studies across the continent can collect comparable data,
    which makes downstream joint analyses possible.
  • REDCap as the collection tool. H3ABioNet provides
    support for REDCap and recommends it as the primary data
    capture platform. See
    https://www.afrigen-d.org/data-resources#data-collection-toolkits.
  • PhenX-based protocols. Where appropriate, modules draw
    on the international PhenX Toolkit
    (https://www.phenxtoolkit.org) for consistent measurement
    protocols.
  • Standardised consent codes (GA4GH DUO). Datasets are
    tagged with Data Use Ontology terms so downstream consumers
    can computationally check that their intended use matches
    the consent.
  • GECKO ontology for cohort description. Standardises how
    cohorts are described so they can be discovered and compared
    across studies.
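
To illustrate what "computationally check" can mean for DUO-tagged data, here is a minimal Python sketch. The four DUO term IDs are real Data Use Ontology permission terms, but the matching rules are a deliberate simplification (they ignore DUO modifiers such as ethics-approval or geographic restrictions), and the function name and the plain-string disease labels are hypothetical. Real DUO disease-specific tags normally reference a disease ontology term rather than free text.

```python
# Simplified sketch of a DUO consent check. NOT the official DUO matching
# semantics and not an H3ABioNet tool: modifiers are ignored entirely.

NO_RESTRICTION = "DUO:0000004"             # no restriction
GENERAL_RESEARCH_USE = "DUO:0000042"       # general research use
HEALTH_MEDICAL_BIOMEDICAL = "DUO:0000006"  # health/medical/biomedical research
DISEASE_SPECIFIC = "DUO:0000007"           # disease specific research

# For each dataset permission, the intended-use codes it is assumed to cover.
COVERS = {
    NO_RESTRICTION: {NO_RESTRICTION, GENERAL_RESEARCH_USE,
                     HEALTH_MEDICAL_BIOMEDICAL, DISEASE_SPECIFIC},
    GENERAL_RESEARCH_USE: {GENERAL_RESEARCH_USE,
                           HEALTH_MEDICAL_BIOMEDICAL, DISEASE_SPECIFIC},
    HEALTH_MEDICAL_BIOMEDICAL: {HEALTH_MEDICAL_BIOMEDICAL, DISEASE_SPECIFIC},
    DISEASE_SPECIFIC: {DISEASE_SPECIFIC},
}


def use_is_permitted(dataset_code, intended_code,
                     dataset_disease=None, intended_disease=None):
    """Return True if the intended use falls within the dataset's consent."""
    # Unknown dataset codes cover nothing, so they are always refused.
    if intended_code not in COVERS.get(dataset_code, set()):
        return False
    # Disease-specific consent additionally requires the same disease.
    if dataset_code == DISEASE_SPECIFIC:
        return dataset_disease is not None and dataset_disease == intended_disease
    return True
```

For example, under these assumptions a dataset tagged for general research use would admit a biomedical research request, while a dataset tagged for disease-specific research on one condition would refuse a request about a different condition.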

Why it matters:

  • Without harmonised collection, joint analyses across studies
    become difficult or impossible: measurements cannot be pooled
    directly when one study recorded blood pressure as two
    readings and another as three.
  • DUO-tagged consent means future data-access decisions can be
    made consistently and at scale, including by automated tooling.
  • Studies that follow H3ABioNet standards have a much easier
    path to publishing their data via the H3Africa catalogue
    and EGA, because the metadata aligns with what the catalogue
    expects.

How to use it for your study:

  1. Read the overview and the standards page, both at
    https://www.afrigen-d.org/data-resources.
  2. Decide which phenotype modules you need (disease area
    specific). If your area isn't currently covered, raise a
    helpdesk ticket — the Phenotype Harmonisation Working Group
    may already have an in-progress module, or your study could
    feed into the next iteration.
  3. Set up REDCap (or get access to your institution's
    H3ABioNet-supported REDCap instance — see the REDCap queue
    for access requests).
  4. Import the relevant data dictionary / module template,
    adapt to your study, and validate against the standards.
  5. Tag your datasets with appropriate DUO consent codes from
    the outset; doing this retrospectively is much harder.
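
As a rough illustration of the "validate against the standards" part of step 4, the sketch below checks a REDCap data dictionary CSV for a set of required variables and field types. The column headers "Variable / Field Name" and "Field Type" are the ones REDCap uses in its data dictionary export; everything else (the variable names, the required-field list, the sample rows, and the function name) is hypothetical and is not taken from any actual H3ABioNet module. A real data dictionary also has more columns than the trimmed sample shown here.

```python
import csv
import io

# Hypothetical required variables and REDCap field types for an
# illustrative module; real modules define their own.
REQUIRED_FIELDS = {
    "sbp_reading_1": "text",
    "sbp_reading_2": "text",
    "sex": "radio",
}

# Trimmed, made-up data dictionary used only to demonstrate the check.
SAMPLE = """Variable / Field Name,Form Name,Field Type,Field Label
sbp_reading_1,vitals,text,Systolic BP reading 1
sex,demographics,dropdown,Sex
"""


def check_data_dictionary(csv_text, required=REQUIRED_FIELDS):
    """Return a list of problems found in a REDCap data dictionary CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Map each variable name in the dictionary to its declared field type.
    found = {row["Variable / Field Name"]: row["Field Type"] for row in reader}
    problems = []
    for name, expected_type in required.items():
        if name not in found:
            problems.append(f"missing variable: {name}")
        elif found[name] != expected_type:
            problems.append(
                f"{name}: expected field type {expected_type!r}, "
                f"found {found[name]!r}"
            )
    return problems


if __name__ == "__main__":
    for problem in check_data_dictionary(SAMPLE):
        print(problem)
```

Running a check like this before data collection starts is cheap; the same idea could be extended to compare choice codings or validation types, if the module you are using specifies them.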

Where to get help:

Raise a helpdesk ticket in the Data Collection Toolkits
queue with: your study design (briefly), the phenotype area
you're collecting on, and your REDCap status. The data
standards team will route you to the right module or
collaborator.