Platform & General Questions

Cross-cutting how-to articles for the AfriGen-D platform: checksums, container conversion, dataset discovery, VCF-to-MAF conversion, and where to report problems.

How do I verify a download with MD5 or SHA256?

I downloaded a large file and want to confirm it transferred without corruption. The download page provides an MD5 / SHA256 checksum — how do I compare?

The standard verification step is to compute the checksum locally
and compare it to the published value. Commands per OS:

Linux / macOS terminal:

# MD5
md5sum file.zip                    # Linux
md5 file.zip                       # macOS

# SHA256
sha256sum file.zip                 # Linux
shasum -a 256 file.zip             # macOS

Windows PowerShell:

Get-FileHash file.zip -Algorithm MD5
Get-FileHash file.zip -Algorithm SHA256

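Compare the output (a long hex string) to the value on the download
page. Letter case does not matter (PowerShell prints uppercase hex,
md5sum and sha256sum print lowercase), but every character must
otherwise match. If they match, the file is intact. If they differ,
re-download: a mismatch is not normally something you can fix after
the fact.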
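
To avoid comparing long hex strings by eye, the check above can be scripted. The following is a minimal sketch for Linux / macOS, not an official AfriGen-D tool: the function name verify_sha256 is hypothetical, and it assumes either sha256sum or shasum is installed.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Hypothetical helper: prints OK and returns 0 when the SHA256 of FILE
# equals EXPECTED_HEX (compared case-insensitively), returns 1 otherwise.
verify_sha256() {
    file=$1
    # Normalise the published value to lowercase hex.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')

    # Linux ships sha256sum; macOS ships shasum. Reading from stdin keeps
    # the file name out of the output, so field 1 is always the hash.
    if command -v sha256sum >/dev/null 2>&1; then
        actual=$(sha256sum < "$file" | awk '{print $1}')
    else
        actual=$(shasum -a 256 < "$file" | awk '{print $1}')
    fi

    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        echo "  expected: $expected" >&2
        echo "  actual:   $actual" >&2
        return 1
    fi
}

# Example call (paste the value shown on the download page):
# verify_sha256 file.zip <sha256-from-download-page>
```

A non-zero return status makes the helper usable in scripts, for example to stop a pipeline before it processes a corrupted download.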

How do I convert a Docker container to Singularity?

I have a Docker image and need to run it on an HPC cluster that only allows Singularity / Apptainer. How do I convert?

Two paths depending on where the Docker image lives:

From a container registry (Docker Hub, quay.io, GHCR):

singularity pull docker://quay.io/h3abionet_org/imputation_tools:latest
# produces imputation_tools_latest.sif

From a local Docker daemon (docker images shows it):

docker save myimage:latest -o myimage.tar
singularity build myimage.sif docker-archive://myimage.tar
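
On shared clusters it is often worth skipping the pull when the .sif file already exists. The sketch below illustrates one way to do that for the registry case; the helper names sif_name_for and pull_if_missing are hypothetical, and the naming rule (last path component, with the tag joined by an underscore) is an assumption modelled on the imputation_tools_latest.sif example above. References pinned by digest (@sha256:...) are not handled.

```shell
# sif_name_for DOCKER_URI
# Derive a file name such as imputation_tools_latest.sif from a docker:// URI.
sif_name_for() {
    ref=${1#docker://}            # drop the URI scheme
    name=${ref##*/}               # keep the last path component, e.g. imputation_tools:latest
    case $name in
        *:*) ;;                   # a tag was given
        *)   name="$name:latest" ;;   # no tag: assume Docker's default tag
    esac
    printf '%s.sif\n' "$(printf '%s' "$name" | tr ':' '_')"
}

# pull_if_missing DOCKER_URI
# Pull the image into the current directory unless the .sif is already there.
pull_if_missing() {
    uri=$1
    sif=$(sif_name_for "$uri")
    if [ -f "$sif" ]; then
        echo "Using cached $sif"
    else
        # Passing the output file explicitly keeps the name predictable.
        singularity pull "$sif" "$uri"
    fi
}

# Example call:
# pull_if_missing docker://quay.io/h3abionet_org/imputation_tools:latest
```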

Common issues:

  • ENTRYPOINT handling differs: singularity exec invokes a command
    directly; singularity run runs the runscript. If your Docker
    image relies on ENTRYPOINT, you may need to invoke singularity
    run, or pass the original entrypoint explicitly.
  • User-id remapping: Docker runs as root by default;
    Singularity runs as your user. Files written to mounted volumes
    will be owned by you, not root.
  • Bind paths: pass --bind /scratch:/scratch (or similar) to
    make HPC scratch directories available inside the container.
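
The run / exec distinction and the bind option can be wrapped in two small helpers. This is a sketch under stated assumptions rather than a recommended interface: the function names and the CONTAINER_BIN variable are hypothetical, and the host directory is bound at the same path inside the container, as in the /scratch:/scratch example.

```shell
# Container runtime. Set CONTAINER_BIN=apptainer on clusters that ship
# Apptainer, or CONTAINER_BIN=echo to print the command without running it.
CONTAINER_BIN=${CONTAINER_BIN:-singularity}

# exec_with_bind IMAGE HOST_DIR COMMAND [ARGS...]
# Invoke COMMAND directly inside IMAGE, bypassing the runscript.
exec_with_bind() {
    image=$1
    host_dir=$2
    shift 2
    "$CONTAINER_BIN" exec --bind "$host_dir:$host_dir" "$image" "$@"
}

# run_with_bind IMAGE HOST_DIR [ARGS...]
# Run the image's runscript, which is what an ENTRYPOINT-based image expects.
run_with_bind() {
    image=$1
    host_dir=$2
    shift 2
    "$CONTAINER_BIN" run --bind "$host_dir:$host_dir" "$image" "$@"
}

# Example calls:
# exec_with_bind myimage.sif /scratch ls /scratch
# run_with_bind  myimage.sif /scratch --help
```

Running either helper with CONTAINER_BIN=echo prints the full command line, which is a cheap way to check the arguments before submitting a batch job.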

For an AfriGen-D / H3ABioNet pipeline, the recommended approach is
to use the Nextflow singularity profile, which handles the pull
and caching automatically.

How do I find existing African genomic datasets?

I'm preparing a grant / proposal / paper and need to discover what African genomic datasets (genotyping, sequencing, microbiome) are already available. Where do I look?

Several catalogues index African genomic resources:

  • H3Africa Data & Biospecimens Catalogue
    https://catalog.h3africa.org. Searchable index of H3Africa
    consortium datasets and biospecimen collections, with contact
    information for each study.
  • AfriGen-D Reference Panels Explorer
    https://fedimpute.afrigen-d.org (Explorer tab). Lists the imputation
    reference panels available on the AfriGen-D service, with sample
    counts and per-population allele frequency information.
  • eLwazi Open Data Science Platform
    https://elwazi.org. A platform that aggregates African data
    science resources, including pointers to consortium catalogues.

Most H3Africa datasets are controlled access — discovery is open,
but obtaining the data requires a Data Access Agreement (DAA) with
the originating study. The catalogue entry for each dataset lists
the contact / data access committee responsible.

For dataset-specific questions that the catalogue can't answer
(custom subsets, joint analyses), reach out to the AfriGen-D /
H3ABioNet leadership via the helpdesk, and we'll route your request
to the right collaborator.

How do I convert a VCF file to MAF format?

I have variant calls in VCF format and need MAF (Mutation Annotation Format) to use tools like maftools in R for oncoplots and lollipop plots. What's the conversion path?

The standard converter is vcf2maf from Memorial Sloan
Kettering Cancer Center:

# Clone the repository
git clone https://github.com/mskcc/vcf2maf.git
cd vcf2maf

# Install Perl dependencies
cpanm --installdeps .

# Convert (--ncbi-build defaults to GRCh37, so set it to match the reference FASTA)
perl vcf2maf.pl --input-vcf sample.vcf --output-maf sample.maf \
  --tumor-id SAMPLE_TUMOR --normal-id SAMPLE_NORMAL \
  --vep-path /path/to/vep --vep-data /path/to/vep_cache \
  --ref-fasta /path/to/GRCh38.fa --ncbi-build GRCh38

vcf2maf requires Ensembl VEP for variant annotation, so install
VEP first if you don't have it. The tool annotates each variant
and selects the canonical transcript effect, which is what MAF
format expects.
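
After the conversion, a quick sanity check is to tally variants by effect class before loading the file into R. The sketch below assumes the output is a tab-separated MAF with a Variant_Classification column and optional leading comment lines starting with #; the function name count_variant_classes is hypothetical.

```shell
# count_variant_classes MAF_FILE
# Print "count<TAB>class" for each Variant_Classification value,
# most frequent first.
count_variant_classes() {
    awk -F'\t' '
        /^#/ { next }                      # skip comment lines such as "#version 2.4"
        !col {                             # first non-comment line is the header
            for (i = 1; i <= NF; i++)
                if ($i == "Variant_Classification") col = i
            next
        }
        { n[$col]++ }                      # tally each data row by its class
        END { for (c in n) print n[c] "\t" c }
    ' "$1" | sort -k1,1nr -k2,2
}

# Example call:
# count_variant_classes sample.maf
```

An empty result usually means the header was not found, for example because the file is not tab-separated.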

For a quick route that skips vcf2maf, maftools can build a MAF from
variants that are already annotated: maftools::annovarToMaf in R
converts ANNOVAR output into a maftools-ready table. This is useful
if you only need to drive maftools visualisations and not the full
cancer-genomics annotation set. A raw, unannotated VCF still has to
be annotated first.

Alternatives: bcftools + a custom awk script can produce a minimal
MAF for simple cases, but vcf2maf is the standard most maftools
workflows expect.

How do I report a problem with the AfriGen-D platform?

I'm seeing an error in the UI, my job didn't run, or I have a question that isn't answered in the documentation. How do I get help?

Open a helpdesk ticket at https://helpdesk.afrigen-d.org/helpdesk/
and choose the queue that best matches:

  • Genotyping & Imputation Service — anything to do with
    imputation jobs, reference panels, or fedimpute.afrigen-d.org.
  • Pangenome & Reference Graph — questions about the pangenome
    reference work.
  • System/Technical Administration — server access, account
    issues, infrastructure.
  • General Enquiries — anything that doesn't fit a specific queue.

To help the team respond faster, include:

  1. What you were doing — the URL or feature you used.
  2. What you expected — the result you wanted.
  3. What happened instead — the error message (paste full text;
    screenshots are also useful).
  4. The Job ID or ticket reference — if the issue is about a
    specific imputation run or earlier ticket.

You'll receive an email confirmation with the ticket number.
Replies to that email are added to the ticket thread automatically,
so you can continue the conversation from your inbox.