Comprehensive Peer-Reviewed Literature Report

Browser-Based Genomic Data Processing: Standards, Tools, and Architectures

Date: May 22, 2026  |  Status: Comprehensive Review & Platform Specification

1. The SAM/BAM Format: Foundational Standard for Sequence Alignment

The Sequence Alignment/Map (SAM) format, introduced by Li et al. (2009), is the de facto standard for storing read alignments against reference sequences. It supports short and long reads (up to 128 Mbp) from different sequencing platforms and was designed to be flexible in style, compact in size, and efficient for random access. The companion SAMtools suite implements utilities for post-processing alignments, including indexing, variant calling, and format conversion. BAM is the binary, compressed equivalent of SAM; a coordinate-sorted BAM file can be indexed with a BAI file for rapid random access.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. PMID: 19505943.[reference:0]
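As a concrete illustration of the text form of the format, the sketch below splits one SAM alignment line into its eleven mandatory tab-separated fields and decodes the bitwise FLAG. The field order and flag bits follow the SAM specification; the names `SamRecord`, `parse_sam_line`, and `decode_flag` are hypothetical helpers chosen for this example, not part of SAMtools or of any library cited here.

```python
from dataclasses import dataclass
from typing import List

# FLAG bits as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


@dataclass
class SamRecord:
    qname: str
    flag: int
    rname: str
    pos: int        # 1-based leftmost mapping position (0 if unavailable)
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: List[str]  # optional TAG:TYPE:VALUE fields, left unparsed here


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line (not a header line starting with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(fields)}")
    return SamRecord(
        qname=fields[0], flag=int(fields[1]), rname=fields[2],
        pos=int(fields[3]), mapq=int(fields[4]), cigar=fields[5],
        rnext=fields[6], pnext=int(fields[7]), tlen=int(fields[8]),
        seq=fields[9], qual=fields[10], tags=fields[11:],
    )


def decode_flag(flag: int) -> List[str]:
    """Return the names of all FLAG bits that are set, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]
```

A caller would skip header lines (those beginning with `@`) before passing lines to `parse_sam_line`. BAM stores the same fields in a binary layout, so this text parser applies only to SAM.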

The SAM/BAM v1.5 extensions (2015) added support for de novo assemblies, padded reference sequences (with gap characters), annotation of reads or regions of the reference, and the option of embedding the reference sequence within the file.[reference:1] The canonical specification (SAMv1.tex) defines the SAM format, its BAM binary equivalent, and the BAI indexing format; the CRAM compressed format is specified in a separate document (CRAMv3.tex) in the same specification repository.[reference:2]
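The BAI index groups alignments into hierarchical bins so that a reader can find the reads overlapping a region without scanning the whole file. The sketch below is a Python transcription of the `reg2bin` reference function given in C in the specification's indexing section; it assumes the standard BAI scheme (minimum bin size of 16 kbp, five levels below the root) and should be checked against the specification before being relied on.

```python
def reg2bin(beg: int, end: int) -> int:
    """Return the smallest BAI bin that fully contains [beg, end).

    Coordinates are 0-based and half-open, as in the specification's
    reference function. Bin 0 spans 512 Mbp; each level below it
    divides a bin into 8 (64 Mbp, 8 Mbp, 1 Mbp, 128 kbp, 16 kbp).
    """
    end -= 1  # convert to the last covered base
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)   # 16 kbp bins start at 4681
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)   # 128 kbp bins start at 585
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)    # 1 Mbp bins start at 73
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)    # 8 Mbp bins start at 9
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)    # 64 Mbp bins start at 1
    return 0
```

An alignment that fits inside a single 16 kbp window lands in one of the leaf bins; an alignment that straddles a window boundary is promoted to the smallest enclosing bin at a coarser level.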

Key SAM/BAM Processing Tools

  • Sambamba: A faster multi-core alternative to samtools exploiting parallel processing for coverage analysis and powerful filtering.[reference:3]
  • BamSnap: Lightweight viewer for sequencing reads in BAM files utilizing graphics libraries and BAM indexing.[reference:4]
  • Bambino: Variant detector and graphical alignment viewer for SAM/BAM format capable of pooling data from multiple source files.[reference:5]
  • cljam: Library for handling SAM/BAM data with parallel processing supporting cloud and PC cluster environments.[reference:6]

2. VCF Format and Processing: The Variant Call Ecosystem

The Variant Call Format (VCF) is the standard output format for software tools that identify genetic variation from DNA sequencing experiments. Downstream analyses require the ability to query, filter, and modify VCF files efficiently. Multiple tools have emerged to address these needs.

Pedersen BS, Quinlan AR. Vcfexpress: flexible, rapid user-expressions to filter and format VCFs. Bioinformatics. 2025 Mar 4;41(3):btaf097. doi: 10.1093/bioinformatics/btaf097. PMID: 40037622.[reference:7]

Key VCF processing tools include:

  • vcfexpress: High-performance Rust-based toolset nearly as fast as BCFTools with Lua user expressions for precise filtering and reporting.[reference:8]
  • BCFTools: The widely-used C-based suite for VCF/BCF manipulation, filtering, and statistical analysis.
  • @gmod/vcf (vcf-js): High-performance streaming Variant Call Format parser in pure JavaScript, enabling browser-based VCF processing without server-side dependencies.[reference:9][reference:10]
  • 123VCF: Java-based tool with disk-streaming real-time filtering algorithm for managing sizable variant files on conventional desktop computers.[reference:11]
  • vcfpp: C++ API for rapid processing of the VCF format.[reference:12]
  • cyvcf2: Fast, flexible variant analysis with Python.[reference:13]
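To make concrete what these tools query and filter, here is a minimal sketch of streaming over VCF text, parsing the eight fixed columns of each data line and keeping records that pass a quality filter. It is written in plain Python with no dependencies and does not reproduce the API of any tool listed above; `VcfRecord`, `parse_vcf_line`, `iter_records`, and `passes` are hypothetical names, and the default QUAL threshold of 30 is an arbitrary illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional, Set


@dataclass
class VcfRecord:
    chrom: str
    pos: int                 # 1-based position
    id: str
    ref: str
    alts: List[str]          # [] when ALT is "."
    qual: Optional[float]    # None when QUAL is "."
    filters: List[str]       # [] when FILTER is "."
    info: Dict[str, str]     # flag-style keys map to ""


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed columns of one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 columns, got {len(fields)}")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    info_dict: Dict[str, str] = {}
    if info != ".":
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_dict[key] = value
    return VcfRecord(
        chrom=chrom, pos=int(pos), id=vid, ref=ref,
        alts=[] if alt == "." else alt.split(","),
        qual=None if qual == "." else float(qual),
        filters=[] if flt == "." else flt.split(";"),
        info=info_dict,
    )


def iter_records(lines: Iterable[str]) -> Iterator[VcfRecord]:
    """Yield records one at a time, skipping header and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


def passes(rec: VcfRecord, min_qual: float = 30.0,
           chroms: Optional[Set[str]] = None) -> bool:
    """Keep records with FILTER == PASS, QUAL >= min_qual, and an allowed chromosome."""
    if rec.filters != ["PASS"]:
        return False
    if rec.qual is None or rec.qual < min_qual:
        return False
    return chroms is None or rec.chrom in chroms
```

Because `iter_records` is a generator, a caller can filter a large file line by line, for example `kept = [r for r in iter_records(open(path)) if passes(r)]`, without holding every record in memory at once.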

3. Browser-Based Genome Visualization: HTML5, JavaScript & WebAssembly

The landscape of browser-based genomic visualization has evolved dramatically, with several mature, peer-reviewed platforms now available:

3.1 IGV.js — Embeddable Integrative Genomics Viewer

igv.js is an embeddable JavaScript implementation of the Integrative Genomics Viewer (IGV). It can be dropped into any web page with a single line of code, has no external dependencies, and runs completely in the web browser with no backend server and no data pre-processing required. It supports a wide range of genomic track types and file formats, including aligned reads (BAM/CRAM), variants (VCF), coverage, signal peaks, annotations, eQTLs, GWAS, and copy number variation.[reference:14][reference:15]

Robinson JT, Thorvaldsdóttir H, Turner D, et al. igv.js: an embeddable JavaScript implementation of the Integrative Genomics Viewer (IGV). bioRxiv 2020.05.03.075499.[reference:16]

The Track System is the core visualization engine of IGV.js. BAMTrack handles sequencing alignment data with coverage visualization and individual alignment rendering sub-components. The modular architecture centers around a Browser controller orchestrating track views, data management, and user interactions through a column-based layout system.[reference:17]
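The coverage view in an alignment track is derived from the individual alignments. The sketch below shows one simple way to compute per-base depth from each read's start position and CIGAR string. It is not igv.js code and makes no claim about how BAMTrack computes coverage; the function name `coverage` is hypothetical, and the choice to count only aligned bases (M, =, X) while leaving deletions and skipped regions uncounted is an assumption of this example.

```python
import re
from typing import Dict, Iterable, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference
COVERING = set("M=X")         # operations that place a read base on the reference


def coverage(alignments: Iterable[Tuple[int, str]]) -> Dict[int, int]:
    """Return {reference position: depth} for (pos, cigar) pairs.

    `pos` is the 1-based leftmost mapping position from the SAM POS field.
    Soft clips, hard clips, insertions, and padding do not move along the
    reference; deletions and skips move along it without adding depth.
    """
    depth: Dict[int, int] = {}
    for pos, cigar in alignments:
        ref_pos = pos
        for length, op in CIGAR_RE.findall(cigar):
            n = int(length)
            if op in COVERING:
                for p in range(ref_pos, ref_pos + n):
                    depth[p] = depth.get(p, 0) + 1
            if op in REF_CONSUMING:
                ref_pos += n
    return depth
```

A production viewer would typically accumulate depth into a fixed-size array for the visible window rather than a dictionary, but the position bookkeeping is the same.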

3.2 JBrowse & JBrowse 2

JBrowse is a fast, scalable genome browser built completely with JavaScript and HTML5. It supports GFF3, BED, FASTA, Wiggle, BigWig, BAM, CRAM, VCF, and REST. BAM, BigBed, BigWig, and VCF data are displayed directly from chunks of the compressed binary files with no conversion needed. JBrowse requires no back-end server code—it reads chunks of files directly over HTTP using byte-range requests.[reference:18]
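Reading a region from an indexed BAM file over HTTP comes down to translating index entries into byte ranges. In the BAI format, each chunk boundary is a BGZF virtual offset: the upper 48 bits give the byte offset of a compressed block in the file, and the lower 16 bits give an offset inside that block once decompressed. The sketch below turns a chunk into an HTTP `Range` header value. It is not JBrowse code; the function names are hypothetical, and fetching one full maximum-size block past the chunk's end offset is a conservative assumption that guarantees the last block is complete.

```python
from typing import Tuple

MAX_BGZF_BLOCK = 1 << 16  # a compressed BGZF block is at most 64 KiB


def split_virtual_offset(voffset: int) -> Tuple[int, int]:
    """Split a 64-bit virtual offset into (compressed offset, offset within block)."""
    return voffset >> 16, voffset & 0xFFFF


def range_header_for_chunk(chunk_beg: int, chunk_end: int) -> str:
    """Build a Range header value covering every BGZF block the chunk touches.

    HTTP byte ranges are inclusive at both ends, hence the `- 1`.
    """
    beg_block, _ = split_virtual_offset(chunk_beg)
    end_block, _ = split_virtual_offset(chunk_end)
    last_byte = end_block + MAX_BGZF_BLOCK - 1
    return f"bytes={beg_block}-{last_byte}"
```

After downloading the range, a client decompresses the blocks and discards the bytes before the starting in-block offset and after the ending one. A server that honours the request answers with status 206 (Partial Content).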

JBrowse 2, published in Genome Biology (2023), is a ground-up rewrite using ReactJS and TypeScript offering enhanced visualization of complex structural variation and evolutionary relationships with syntenic visualizations. It is a modular, pluggable, open-source platform for visualizing and integrating biological data.[reference:19][reference:20]

Diesh C, Stevens GJ, Xie P, et al. JBrowse 2: a modular genome browser with views of synteny and structural variation. Genome Biology. 2023;24:74.[reference:21]

3.3 Dalliance / Biodalliance

Dalliance is a genome viewing tool that offers a high level of interactivity while working entirely within the web browser. It integrates data from a wide variety of sources and can load data directly from popular genomics file formats including bigWig, BAM, and VCF. It was the first pure JavaScript component to natively support NGS data in binary BAM files in a web browser.[reference:22][reference:23]

3.4 Chromatic — WebAssembly Cancer Genome Viewer

Chromatic is described as the first cancer bioinformatics tool developed with WebAssembly, a portable, low-level bytecode format. It lets researchers visually inspect genomic variants called from NGS cancer data sets to judge whether those calls are valid.[reference:24]

3.5 Scribl — HTML5 Canvas Genomic Graphics Library

Scribl is an HTML5 Canvas-based graphics library for visualizing genomic data over the web, specifically targeting coordinate-based data such as genomic features, DNA sequence, and genetic variants.[reference:25]

3.6 Genome Maps

Genome Maps is an open-source high-performance HTML5 web-based genome browser that allows local upload of huge genomic data files (e.g., VCF or BAM) that can be dynamically visualized in real time at the client side, facilitating management of medical data affected by privacy restrictions.[reference:26]

3.7 Genoverse & Pileup.js

Genoverse is a JavaScript and HTML5-based genome browser that is portable, customizable, and backend-independent. Pileup.js is an interactive in-browser track viewer built from the ground up using modern JavaScript (ES2015, React.js, Promises).[reference:27][reference:28]

4. WebAssembly for Genomics: Running Native Tools in the Browser

Biowasm is a repository of genomics tools compiled from C/C++ to WebAssembly so they can run in a web browser. Supported packages include samtools, bedtools, bcftools, minimap2, seqtk, fastp, kalign, and others. Key use cases include sandbox.bio (interactive tutorials), 42basepairs (genomic file preview), and Ribbon (BAM parsing, coverage estimation, and subsampling).[reference:29]

Aboukhalil R. biowasm: WebAssembly modules for genomics. GitHub repository. 2019.[reference:30]

Aioli is the companion framework for running genomics command-line tools in the browser using WebAssembly and WebWorkers. It creates a single WebWorker in which all WebAssembly tools run, uses a PROXYFS virtual filesystem so output of one tool can be used as input of another, and uses WORKERFS to mount local files efficiently without loading all contents into memory.[reference:31][reference:32]

The @biowasm/aioli npm package (v3.1.0) provides a clean API for integrating these capabilities. WebAssembly SIMD detection ensures fallback to non-SIMD versions when needed. Communication with the WebWorker uses the Comlink library for seamless async operations.[reference:33]

5. The 1000 Genomes Project: Ancestry-Stratified Reference Datasets

The 1000 Genomes Project (1KGP) provides high-coverage whole genome sequences encompassing five biogeographical populations: African (AFR), American (AMR), East Asian (EAS), European (EUR), and South Asian (SAS). Recent long-read sequencing efforts (2024) identified an average of 24,543 high-confidence structural variants per genome, including pathogenic expansions within disease-associated repeats undetected by short reads.[reference:34][reference:35]

Schloissnig S, et al. Long-read sequencing and structural variant characterization in 1,019 samples from the 1000 Genomes Project. bioRxiv 2024.[reference:36]

These population-stratified reference datasets are essential for algorithmic fairness testing in genomic pipelines: they make it possible to check that variant calling and clinical risk assessment tools perform equitably across ancestry groups. A 2024 pharmacogenetic analysis of structural variation in 1KGP characterized pharmacogenomic (PGx) variation across 2,504 samples, using tools such as PyPGx for star-allele calling.[reference:37]
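One simple way to turn fairness testing into a number is to compute the same accuracy metric separately for each of the five groups and report the spread. The sketch below does this for recall (sensitivity). The metric choice, the function names, and the idea of summarizing with a best-minus-worst gap are assumptions of this example rather than a method taken from the cited studies, and any counts passed in would have to come from a real benchmark.

```python
from typing import Dict, Tuple

SUPERPOPULATIONS = ("AFR", "AMR", "EAS", "EUR", "SAS")


def recall_by_group(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Compute recall per group.

    `counts` maps each group code to (true_positives, false_negatives),
    e.g. variants in a truth set that a caller found versus missed.
    """
    recalls: Dict[str, float] = {}
    for group in SUPERPOPULATIONS:
        tp, fn = counts[group]
        total = tp + fn
        if total == 0:
            raise ValueError(f"no truth variants for group {group}")
        recalls[group] = tp / total
    return recalls


def max_recall_gap(recalls: Dict[str, float]) -> float:
    """Difference between the best- and worst-performing group."""
    values = list(recalls.values())
    return max(values) - min(values)
```

A pipeline could flag a release for review whenever `max_recall_gap` exceeds a threshold agreed in advance; the same pattern applies to precision or to genotype concordance.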

6. Client-Side PDF Generation for Genomic Reports

jsPDF is an HTML5 client-side library for generating PDF documents directly in the browser. Because it needs no backend server, it reduces server load and gives the user immediate feedback. Its JavaScript API constructs PDF files according to the PDF specification, allowing developers to add text, shapes, tables, and images programmatically.[reference:38][reference:39]

Combined with jsPDF-AutoTable, genomic variant tables can be rendered with proper pagination, column formatting, and header repetition—critical for producing publication-quality variant call reports (VCR) directly from browser-based analyses.

7. Platform Architecture: Unified Browser-Based Processing Pipeline

Drawing from the peer-reviewed literature above, the present platform implements:

  1. File Ingestion: Client-side parsing of VCF (using @gmod/vcf-compatible streaming parser), SAM (text-based line parser), and BAM (via WebAssembly SAMtools integration or text-based SAM fallback).
  2. Variant Processing: Filtering by QUAL threshold, PASS filter status, and chromosome; computation of per-chromosome variant counts, transition/transversion ratios, and allele frequency distributions.
  3. Alignment Processing: Coverage depth estimation, mapping quality distribution, flag-based read categorization, and per-reference statistics.
  4. Visualization: HTML5 Canvas-based coverage track rendering, variant density histograms, and chromosome distribution bar charts.
  5. PDF Export: jsPDF-based report generation with embedded statistics tables, variant lists, and quality metrics.
  6. CSV Export: SheetJS-based spreadsheet export for downstream analysis.
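The variant statistics named in step 2 can be sketched as follows: per-chromosome counts for all variants, plus the transition/transversion (Ts/Tv) ratio over single-nucleotide variants. Transitions are purine-to-purine or pyrimidine-to-pyrimidine changes (A<->G, C<->T); every other single-base change is a transversion. The function name `summarize_variants` and its input shape are hypothetical, and excluding indels and non-ACGT alleles from the ratio is an assumption of this example, not a statement about the platform's implementation.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = set("ACGT")


def summarize_variants(variants: Iterable[Tuple[str, str, str]]) -> Dict[str, object]:
    """Summarize (chrom, ref, alt) triples, one ALT allele per triple.

    Every variant is counted per chromosome. Only single-base ACGT
    substitutions contribute to the Ts/Tv tallies.
    """
    per_chrom: Counter = Counter()
    ts = tv = 0
    for chrom, ref, alt in variants:
        per_chrom[chrom] += 1
        ref, alt = ref.upper(), alt.upper()
        is_snv = ref in BASES and alt in BASES and ref != alt
        if not is_snv:
            continue
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    ratio: Optional[float] = ts / tv if tv else None  # undefined without transversions
    return {"per_chrom": dict(per_chrom), "ts": ts, "tv": tv, "ts_tv": ratio}
```

Multi-allelic records would be split into one triple per ALT allele before being passed in. The resulting dictionary maps directly onto the chromosome distribution chart in step 4 and the statistics tables in step 5.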

References

[1] Li H, Handsaker B, Wysoker A, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078-9. PMID: 19505943.
[2] SAM/BAM format v1.5 extensions for de novo assemblies. 2015.
[3] Pedersen BS, Quinlan AR. Vcfexpress: flexible, rapid user-expressions to filter and format VCFs. Bioinformatics. 2025;41(3):btaf097. PMID: 40037622.
[4] Robinson JT, et al. igv.js: an embeddable JavaScript implementation of the IGV. bioRxiv 2020.05.03.075499.
[5] Diesh C, Stevens GJ, Xie P, et al. JBrowse 2: a modular genome browser. Genome Biology. 2023;24:74.
[6] Down TA, et al. Dalliance: interactive genome viewing on the web. Bioinformatics. 2011.
[7] Aboukhalil R. biowasm: WebAssembly modules for genomics. 2019.
[8] Aioli: Running genomics CLI tools in the browser using WebAssembly. npm @biowasm/aioli v3.1.0.
[9] @gmod/vcf: High performance VCF parser in pure JavaScript. GitHub GMOD/vcf-js.
[10] Schloissnig S, et al. Long-read sequencing and SV characterization in 1,019 1KGP samples. bioRxiv 2024.
[11] Pharmacogenetic analysis of structural variation in the 1000 Genomes Project. PMC 2024.
[12] Chromatic: WebAssembly-Based Cancer Genome Viewer. 2018.
[13] Scribl: HTML5 Canvas-based graphics library for genomic data visualization.
[14] Genome Maps: open-source high-performance HTML5 genome browser.
[15] jsPDF: Client-side JavaScript PDF generation.
