African Genome Variation Database (AGVD)

What AGVD is, what data it offers, how to look up regional African allele frequencies, and how it relates to gnomAD and the rest of the AfriGen-D platform.

What is AGVD and what data does it provide?

I keep seeing AGVD referenced in AfriGen-D materials and in Beacon variant pages. What is it, what data does it contain, and what can I use it for?

AGVD (the African Genome Variation Database) is the
AfriGen-D resource that publishes allele frequencies of
human genetic variants across African populations, broken
down by sub-region.

Live at https://agvd.afrigen-d.org.

What it is:

  • An aggregated, summary-level resource. AGVD does not host
    raw sequencing data or individual-level genotypes — it
    publishes minor allele frequencies (MAF) and related summary
    statistics computed from underlying African cohorts.
  • Regional stratification: African allele frequencies are
    reported per sub-region (Central, Eastern, Northern, Southern,
    and Western Africa) where sample sizes allow.
  • Open access for read / lookup. No data access committee
    approval is needed to query AGVD.

What it's for:

  • Looking up the "right" allele frequency for a variant in a
    specific African population, when the global gnomAD figure
    isn't representative.
  • Powering variant-frequency callouts on the AfriGen-D Beacon
    (https://beacon.afrigen-d.org) variant detail pages, which
    cross-link out to AGVD.
  • Reference data for clinical interpretation, study power
    calculations, and population genetics analyses where African
    ancestry needs to be considered properly.

For raw genotype or whole-genome sequencing data, see the
related KB article on getting raw WGS data — AGVD is not the
right resource for that.

How do I look up the African allele frequency for a variant?

I have a variant of interest (rsID, or genomic coordinates) and want to know its allele frequency across African populations. How do I search AGVD?

  1. Go to https://agvd.afrigen-d.org.
  2. Use the search box to enter your variant identifier — either
    an rsID (e.g. rs1801133) or genomic coordinates in
    chr:position form (e.g. chr1:11796321) on GRCh38.
  3. The variant detail page shows allele frequencies grouped by:
    • Overall African — pooled across all contributing
      African cohorts.
    • Per-region — Central, Eastern, Northern, Southern, and
      Western Africa, where sample sizes are sufficient.
    • gnomAD global for comparison.

Worked example — rs1801133 (MTHFR C677T):

  • Central Africa — ~3.1%
  • Western Africa — ~6.8%
  • Eastern Africa — ~7.2%
  • Southern Africa — ~8.0%
  • Northern Africa — ~25.0%
  • gnomAD global (for comparison) — ~25.7%

The roughly eight-fold within-Africa range here (about 3% in
Central Africa against 25% in Northern Africa) illustrates why a
single "African" or "global" figure is often inadequate for
clinical or population-genetics decisions.

If your variant isn't found, first confirm that any coordinates
you entered are on GRCh38. Otherwise, the most common cause is
that the variant doesn't pass quality filters in any contributing
cohort. Open a helpdesk ticket with the rsID / coordinate and
we'll check.

How does AGVD compare to gnomAD for African populations?

gnomAD already reports African / African-American allele frequencies. Why do I need a separate African database, and when should I prefer AGVD over gnomAD's `afr` column?

gnomAD's afr category is dominated by African-American samples,
which reflect a mix of West/Central African ancestry plus
non-African admixture. A single afr figure therefore cannot
stand in for within-Africa structure.

AGVD differs in three ways that matter for African research:

  1. Sub-regional stratification. AGVD splits frequencies
    across five African sub-regions (Central, Eastern, Northern,
    Southern, Western). gnomAD reports a single pooled afr
    figure. For many variants the within-Africa range is several-
    fold (see the rs1801133 example in the variant-lookup KB
    article).
  2. Continental African cohorts. AGVD aggregates studies that
    sampled people living in Africa, not diaspora cohorts. For
    clinical interpretation in an African setting, this is the
    more relevant denominator.
  3. African data governance. Contributing cohorts retain
    control of their data; AGVD publishes summary statistics
    under terms agreed by the originating studies.

When to prefer AGVD:

  • Powering / interpreting a study in a specific African country
    or region.
  • Variant prioritisation for an African clinical case.
  • Building a non-overlap or enrichment argument that depends on
    within-Africa variation.

When gnomAD is still appropriate:

  • Looking at non-African ancestry strata.
  • Variants AGVD doesn't yet cover (e.g. ultra-rare variants
    dependent on cohorts not yet contributing).
  • Cross-platform comparisons where gnomAD is the de-facto
    global denominator.

For most African-focused projects, use both — gnomAD for context
and AGVD for the African-specific number.
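
To see why the choice of denominator matters for study planning,
the sketch below compares expected genotype counts for two of the
rs1801133 frequencies quoted in the variant-lookup article. It
assumes Hardy-Weinberg equilibrium and a hypothetical cohort of
1,000 people; both are simplifications for illustration, not
properties of AGVD or gnomAD data.

```python
def expected_genotype_counts(allele_freq, n_people):
    """Expected genotype counts under Hardy-Weinberg equilibrium."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must be between 0 and 1")
    p = allele_freq       # frequency of the variant allele
    q = 1.0 - p           # frequency of the other allele
    return {
        "hom_ref": q * q * n_people,
        "het": 2 * p * q * n_people,
        "hom_alt": p * p * n_people,
    }


cohort_size = 1000  # hypothetical study size

examples = [
    ("Central Africa (AGVD example)", 0.031),
    ("gnomAD global (example)", 0.257),
]

for label, af in examples:
    counts = expected_genotype_counts(af, cohort_size)
    print(
        f"{label}: ~{counts['hom_alt']:.0f} homozygotes, "
        f"~{counts['het']:.0f} heterozygotes per {cohort_size} people"
    )
```

Under these assumptions the Central Africa figure implies about 1
homozygote per 1,000 people, while the global figure implies about
66, so a study powered on the global number would badly
overestimate the carriers it could expect to recruit.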

Can I get raw WGS / sequencing data from AGVD?

I need raw whole-genome sequencing data from African cohorts for my research (e.g. variant discovery, case-control study, ML training). Can AGVD provide this?

No. AGVD is an aggregated summary statistics resource.
It publishes per-variant allele frequencies and related summary
data; it does not host individual-level genotypes, VCFs, or
sequencing reads.

If you need raw or individual-level data, the path depends on
what kind of data and which cohort:

Reference panel data (for imputation):

  • The H3Africa reference panel is accessible through the
    AfriGen-D Imputation Service at
    https://fedimpute.afrigen-d.org. It is open access for
    imputation use but is not distributed for download.

Individual-level cohort data:

  • Browse https://catalog.h3africa.org to discover H3Africa
    studies and their underlying datasets. Most H3Africa data is
    controlled access — each study has a Data Access Committee
    (DAC) that reviews and approves requests.
  • For each dataset you're interested in, the catalogue entry
    lists the DAC contact, the access conditions, and the data
    repository (typically EGA — European Genome-phenome Archive —
    for sequence-level data).
  • You'll need: an institutional affiliation, ethics clearance,
    a data access agreement (DAA), and a research protocol.
    Approval timelines are usually weeks to months.

Disease-specific cohorts not in H3Africa:

  • AGVD covers what its contributing studies cover — which is
    broad but not exhaustive. For specific disease cohorts (e.g.
    brain cancer in African populations) the data may simply not
    exist in any African consortium yet.
  • In that case, the options are to contact African research
    groups working in that area directly; submit a sequencing
    proposal to a service such as one of the H3Africa-affiliated
    nodes; or design your study around what's accessible (e.g.
    summary statistics from AGVD plus targeted local sequencing).

For help routing your specific request, raise a helpdesk ticket
in the AGVD queue with: the data type, the sample size you need,
the disease / phenotype, and your institutional / ethics status.
We can then suggest the most realistic path.