Skip to content

samplesheet-parser

Format-agnostic parser for Illumina SampleSheet.csv files.

Supports both the classic IEM V1 format (bcl2fastq era) and the modern BCLConvert V2 format (NovaSeq X series) — with automatic format detection, bidirectional conversion, index validation, Hamming distance checking, diff comparison, multi-sheet merging, splitting, filtering, programmatic sheet creation, and a full-featured CLI.

PyPI version Python 3.12+ License: Apache 2.0 Tests codecov DOI

samplesheet-parser overview


The problem this solves

Labs running mixed instrument fleets — older NovaSeq 6000 alongside newer NovaSeq X series — produce two incompatible SampleSheet formats. BCLConvert V2 sheets use [BCLConvert_Settings] / [BCLConvert_Data] sections, OverrideCycles for UMI encoding, and FileFormatVersion in the header. IEM V1 sheets use IEMFileVersion and a flat [Data] section.

Existing tools either hard-code one format or require the caller to know which format they have. samplesheet-parser auto-detects the format, exposes a consistent interface for both, converts between formats, validates index integrity (including Hamming distance), diffs sheets to catch accidental changes before a run starts, and writes new sheets programmatically — so you never have to hand-edit a CSV again.


Key features

Feature Description
Auto-detection Three-step format detection — no hints required
V1 & V2 parsing Consistent samples() / index_type() interface for both formats
Bidirectional conversion V1 → V2 and V2 → V1 (lossy, with warnings)
Validation 9 checks covering index chars, length, duplicates, Hamming distance, adapters
Diff Cross-format structural comparison with per-field change records
Merge Combine multiple per-project sheets with collision detection
Split Divide a combined sheet into per-project or per-lane files
Filter Extract a sample subset by project, lane, or ID (glob patterns supported)
Writer Fluent API for building or editing sheets programmatically
CLI Full shell interface with --format json for pipeline integration
UMI parsing Decode OverrideCycles to extract UMI length and location

Quickstart

pip install samplesheet-parser
from samplesheet_parser import SampleSheetFactory, SampleSheetValidator

sheet = SampleSheetFactory().create_parser("SampleSheet.csv", parse=True)
result = SampleSheetValidator().validate(sheet)
print(result.summary())
# PASS — 0 error(s), 0 warning(s)

See Installation for full setup options, jump to the Quickstart guide, or browse the Examples for end-to-end runnable scenarios.