API Reference

All public classes and the normalize_index_lengths helper are importable directly from the top-level package:

```python
from samplesheet_parser import (
    SampleSheetFactory,
    SampleSheetV1,
    SampleSheetV2,
    SampleSheetConverter,
    SampleSheetValidator,
    SampleSheetDiff,
    SampleSheetWriter,
    SampleSheetMerger,
    SampleSheetSplitter,
    SampleSheetFilter,
    normalize_index_lengths,
)
```

SampleSheetFactory

| Method / attribute | Returns | Description |
|---|---|---|
| `create_parser(path, *, clean, experiment_id, parse)` | `SampleSheetV1 \| SampleSheetV2` | Auto-detect the format and return the appropriate parser |
| `get_umi_length()` | `int` | UMI length from the current parser |
| `.version` | `SampleSheetVersion \| None` | Detected format version |
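This page does not describe how the factory detects the format. As a rough illustration only, the hypothetical `detect_version` helper below tells the two formats apart from raw text using well-known markers: BCL Convert V2 sheets declare `FileFormatVersion,2` in `[Header]` and use `[BCLConvert_*]` sections, while IEM V1 sheets carry a plain `[Data]` section. It returns plain strings, not `SampleSheetVersion`, and is not the library's detector.

```python
from typing import Optional


def detect_version(text: str) -> Optional[str]:
    """Guess "V1" or "V2" from raw sample sheet text (heuristic sketch)."""
    sections = set()
    file_format = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and "]" in line:
            # Section names such as [Header], [Data], [BCLConvert_Data]
            sections.add(line[1:line.index("]")].lower())
        elif line.lower().startswith("fileformatversion,"):
            # Trailing CSV commas (e.g. "FileFormatVersion,2,,") are ignored.
            file_format = line.split(",")[1].strip()
    if file_format == "2" or "bclconvert_data" in sections:
        return "V2"
    if "data" in sections:
        return "V1"
    return None  # neither marker found
```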

SampleSheetV1 / SampleSheetV2 (shared interface)

| Method / attribute | Returns | Description |
|---|---|---|
| `parse(do_clean=True)` | `None` | Parse all sections |
| `samples()` | `list[dict]` | One record per unique sample |
| `index_type()` | `str` | `"dual"`, `"single"`, or `"none"` |
| `.adapters` | `list[str]` | Adapter sequences |
| `.experiment_name` | `str \| None` | Run/experiment name |
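The rule behind `index_type()` is not spelled out here. A minimal sketch of one plausible rule, assuming sample records keyed like a V1 `[Data]` section (`index`, `index2`): a sheet is dual-indexed if any sample has a second index, single-indexed if any has only a first index. The helper name `classify_index_type` is hypothetical.

```python
def classify_index_type(samples: list[dict], index_key: str = "index",
                        index2_key: str = "index2") -> str:
    """Return "dual", "single", or "none" for a list of sample records."""

    def has(sample: dict, key: str) -> bool:
        # Treat missing, None, and whitespace-only values as "no index".
        return bool(str(sample.get(key) or "").strip())

    if any(has(s, index2_key) for s in samples):
        return "dual"
    if any(has(s, index_key) for s in samples):
        return "single"
    return "none"
```

For V2 records the keys would be passed as `index_key="Index", index2_key="Index2"`, matching BCL Convert column names.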

V2-only

| Method | Returns | Description |
|---|---|---|
| `get_umi_length()` | `int` | UMI length from `OverrideCycles` |
| `get_read_structure()` | `ReadStructure` | Parsed read structure dataclass |
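An `OverrideCycles` value such as `Y151;I8U9;I8;Y151` describes each read as runs of `Y` (read), `I` (index), `U` (UMI), and `N` (skipped) cycles. The sketch below parses one into hypothetical `ReadSegment` records and derives a UMI length. It is not the library's `ReadStructure`; in particular, returning the UMI cycles of the first read that has any is an assumption about reads that carry UMIs in more than one place.

```python
import re
from dataclasses import dataclass

# One token: segment type (Y=read, I=index, U=UMI, N=skip) plus a cycle count.
_TOKEN = re.compile(r"([YIUN])(\d+)")


@dataclass
class ReadSegment:  # hypothetical stand-in, not the library's ReadStructure
    kind: str
    cycles: int


def parse_override_cycles(value: str) -> list[list[ReadSegment]]:
    """Split e.g. "Y151;I8U9;I8;Y151" into one segment list per read."""
    reads = []
    for part in value.strip().split(";"):
        tokens = _TOKEN.findall(part)
        # Reject anything the token pattern did not consume completely.
        if not tokens or "".join(k + n for k, n in tokens) != part:
            raise ValueError(f"unrecognised OverrideCycles segment: {part!r}")
        reads.append([ReadSegment(k, int(n)) for k, n in tokens])
    return reads


def umi_length(value: str) -> int:
    """UMI cycles in the first read containing any; 0 if there are none."""
    for read in parse_override_cycles(value):
        total = sum(seg.cycles for seg in read if seg.kind == "U")
        if total:
            return total
    return 0
```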

SampleSheetConverter

`SampleSheetConverter(path, *, workflow: Workflow | str | None = None)`

| Method / attribute | Returns | Description |
|---|---|---|
| `to_v2(output_path)` | `Path` | Convert IEM V1 → BCLConvert V2 |
| `to_v1(output_path)` | `Path` | Convert BCLConvert V2 → IEM V1 (lossy) |
| `.source_version` | `SampleSheetVersion \| None` | Auto-detected format of the input |
| `.workflow_override` | `Workflow \| None` | Resolved workflow override, if any |

The `workflow` parameter accepts `"a"`, `"b"`, or a `Workflow` enum value. When supplied, it overrides the i5 orientation workflow that would otherwise be auto-detected from the instrument named in the sheet header. See Conversion → Index 2 orientation.
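The precedence described above (explicit override first, then the instrument in the header, otherwise no decision) can be sketched as follows. Everything here is hypothetical: `Workflow` is a local stand-in with assumed member values, and the two instrument sets are a small illustrative subset (MiSeq and HiSeq 2500 read i5 forward; NextSeq 500 and iSeq 100 read it as reverse complement), not the library's `WORKFLOW_*_INSTRUMENTS`.

```python
from enum import Enum
from typing import Optional, Union


class Workflow(str, Enum):  # stand-in; member values are assumptions
    A = "a"  # i5 read in forward orientation
    B = "b"  # i5 read as reverse complement

# Illustrative subsets only, stored as normalised (lowercase alphanumeric) names.
_WORKFLOW_A = frozenset({"miseq", "hiseq2500"})
_WORKFLOW_B = frozenset({"nextseq500", "iseq100"})


def _normalise(name: str) -> str:
    return "".join(ch for ch in name.lower() if ch.isalnum())


def resolve_workflow(override: Union[Workflow, str, None],
                     instrument: Optional[str]) -> Optional[Workflow]:
    """An explicit override wins; otherwise classify the header instrument."""
    if isinstance(override, Workflow):
        return override
    if override is not None:
        # Accepts "a"/"b" in any case; anything else raises ValueError.
        return Workflow(override.strip().lower())
    if instrument:
        key = _normalise(instrument)
        if key in _WORKFLOW_A:
            return Workflow.A
        if key in _WORKFLOW_B:
            return Workflow.B
    return None  # unknown or ambiguous: the caller must supply an override
```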


samplesheet_parser.instruments

i5 orientation workflow classification helpers.

```python
from samplesheet_parser.instruments import (
    Workflow,
    detect_workflow,
    parse_workflow,
    reverse_complement,
    WORKFLOW_A_INSTRUMENTS,
    WORKFLOW_B_INSTRUMENTS,
    AMBIGUOUS_INSTRUMENTS,
)
```

| Name | Kind / returns | Description |
|---|---|---|
| `Workflow` | `StrEnum` | `Workflow.A` (i5 forward) / `Workflow.B` (i5 reverse-complemented on chip) |
| `detect_workflow(name)` | `Workflow \| None` | Classify an instrument name; returns `None` for unknown or ambiguous instruments (e.g. NovaSeq 6000) |
| `parse_workflow(value)` | `Workflow \| None` | Coerce a CLI string (`"a"` / `"b"`) to a `Workflow` |
| `reverse_complement(seq)` | `str` | Reverse-complement a DNA sequence, preserving `N` and letter case |
| `WORKFLOW_A_INSTRUMENTS` | `frozenset[str]` | Normalised names of workflow-A instruments |
| `WORKFLOW_B_INSTRUMENTS` | `frozenset[str]` | Normalised names of workflow-B instruments |
| `AMBIGUOUS_INSTRUMENTS` | `frozenset[str]` | Instruments whose workflow depends on chemistry and require an explicit override |
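A reverse complement that keeps `N` and letter case, as the table describes for `reverse_complement`, can be sketched with a single translation table. This is an independent sketch, not the library's code; raising `ValueError` on non-DNA characters is an assumption.

```python
# Complement map covering upper- and lower-case bases; N maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string, keeping N and each base's case."""
    invalid = set(seq) - set("ACGTNacgtn")
    if invalid:
        raise ValueError(f"non-DNA characters: {sorted(invalid)}")
    return seq.translate(_COMPLEMENT)[::-1]
```

Applying the function twice returns the original sequence, which makes a convenient sanity check when converting i5 indexes between workflows.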

SampleSheetValidator

| Method | Returns | Description |
|---|---|---|
| `validate(sheet, *, min_hamming_distance=3)` | `ValidationResult` | Run all checks and return a structured result |
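The `min_hamming_distance` threshold concerns how many bases separate the indexes of samples sharing a lane. A minimal sketch of such a check follows, assuming records keyed `Sample_ID`, `Lane`, `index`, `index2`, scoring dual indexes as i7 mismatches plus i5 mismatches, and comparing unequal-length indexes over the shorter length. These rules are assumptions; the validator's exact scoring is not documented here.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatches over the shorter of the two sequences."""
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def low_distance_pairs(samples: list[dict],
                       min_hamming_distance: int = 3) -> list[tuple[str, str, int]]:
    """List (sample_a, sample_b, distance) for same-lane pairs below the threshold."""
    by_lane: dict[str, list[dict]] = {}
    for s in samples:
        by_lane.setdefault(str(s.get("Lane", "")), []).append(s)
    pairs = []
    for lane_samples in by_lane.values():
        for a, b in combinations(lane_samples, 2):
            d = (hamming(a.get("index") or "", b.get("index") or "")
                 + hamming(a.get("index2") or "", b.get("index2") or ""))
            if d < min_hamming_distance:
                pairs.append((a["Sample_ID"], b["Sample_ID"], d))
    return pairs
```

Identical indexes in one lane score 0 and are always flagged; samples in different lanes are never compared.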

ValidationResult

| Attribute / method | Type | Description |
|---|---|---|
| `is_valid` | `bool` | `False` if any errors are present |
| `errors` | `list[ValidationIssue]` | Structured error records |
| `warnings` | `list[ValidationIssue]` | Structured warning records |
| `summary()` | `str` | One-line human-readable summary |

ValidationIssue

| Attribute | Type | Description |
|---|---|---|
| `code` | `str` | Issue code, e.g. `"DUPLICATE_INDEX"` |
| `message` | `str` | Human-readable description |
| `context` | `dict` | Relevant sample IDs, lane, etc. |
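The relationship between `errors`, `warnings`, and `is_valid` can be mirrored with two small dataclasses. `IssueSketch` and `ResultSketch` are hypothetical stand-ins, the `LOW_HAMMING` code is invented for the example, and the summary wording is an assumption rather than the library's output format.

```python
from dataclasses import dataclass, field


@dataclass
class IssueSketch:  # hypothetical mirror of ValidationIssue
    code: str
    message: str
    context: dict = field(default_factory=dict)


@dataclass
class ResultSketch:  # hypothetical mirror of ValidationResult
    errors: list[IssueSketch] = field(default_factory=list)
    warnings: list[IssueSketch] = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        # Warnings never invalidate a sheet; any single error does.
        return not self.errors

    def summary(self) -> str:
        state = "PASS" if self.is_valid else "FAIL"
        return f"{state}: {len(self.errors)} error(s), {len(self.warnings)} warning(s)"
```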

SampleSheetDiff

| Method | Returns | Description |
|---|---|---|
| `compare()` | `DiffResult` | Full comparison across header, reads, settings, and samples |

DiffResult

| Attribute / method | Type | Description |
|---|---|---|
| `has_changes` | `bool` | `True` if any difference is detected |
| `summary()` | `str` | Human-readable one-paragraph summary |
| `header_changes` | `list[HeaderChange]` | Header, reads, and settings diffs |
| `samples_added` | `list[dict]` | Records present only in the new sheet |
| `samples_removed` | `list[dict]` | Records present only in the old sheet |
| `sample_changes` | `list[SampleChange]` | Per-sample field-level diffs |
| `source_version` | `SampleSheetVersion` | Format of the old sheet |
| `target_version` | `SampleSheetVersion` | Format of the new sheet |
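The added/removed/changed split reported by `DiffResult` can be illustrated with a keyed comparison. This sketch assumes samples are matched on `(Sample_ID, Lane)` and returns plain dicts instead of `SampleChange` objects; the helper `diff_samples` is hypothetical.

```python
def diff_samples(old: list[dict], new: list[dict]) -> tuple[list[dict], list[dict], dict]:
    """Return (added, removed, changed) for two lists of sample records.

    `changed` maps each (Sample_ID, Lane) key to {field: (old_value, new_value)}.
    """
    def key(s: dict) -> tuple[str, str]:
        return (s["Sample_ID"], str(s.get("Lane", "")))

    old_map = {key(s): s for s in old}
    new_map = {key(s): s for s in new}
    added = [s for k, s in new_map.items() if k not in old_map]
    removed = [s for k, s in old_map.items() if k not in new_map]
    changed = {}
    for k in old_map.keys() & new_map.keys():
        a, b = old_map[k], new_map[k]
        # Compare every field present on either side; missing counts as None.
        fields = {f: (a.get(f), b.get(f))
                  for f in sorted(a.keys() | b.keys()) if a.get(f) != b.get(f)}
        if fields:
            changed[k] = fields
    return added, removed, changed
```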

SampleSheetWriter

| Method / attribute | Returns | Description |
|---|---|---|
| `SampleSheetWriter(version=)` | | Instantiate for `SampleSheetVersion.V1` or `.V2` |
| `from_sheet(sheet, version=)` | `SampleSheetWriter` | Load a parsed sheet for editing; optionally change its format |
| `set_header(*, run_name, platform, ...)` | `self` | Set header fields (fluent) |
| `set_reads(*, read1, read2, index1, index2)` | `self` | Set read cycle counts (fluent) |
| `set_adapter(adapter_read1, adapter_read2)` | `self` | Set adapter sequences (fluent) |
| `set_override_cycles(override)` | `self` | Set `OverrideCycles`; V2 only (fluent) |
| `set_software_version(version)` | `self` | Set `SoftwareVersion`; V2 only (fluent) |
| `set_setting(key, value)` | `self` | Set an arbitrary settings key/value (fluent) |
| `add_sample(sample_id, *, index, ...)` | `self` | Append a sample row (fluent) |
| `remove_sample(sample_id, *, lane=)` | `self` | Remove sample(s) by ID, optionally scoped to a lane (fluent) |
| `update_sample(sample_id, *, lane=, **fields)` | `self` | Update fields on an existing sample in place (fluent) |
| `clear_samples()` | `self` | Remove all samples while preserving header/reads/settings (fluent) |
| `write(path, *, validate=True)` | `Path` | Serialise to disk; validates first by default |
| `to_string()` | `str` | Serialise to a string without writing to disk |
| `.sample_count` | `int` | Number of samples currently in the writer |
| `.sample_ids` | `list[str]` | Sample IDs currently in the writer |
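The "(fluent)" setters work because each returns `self`, so calls can be chained. The hypothetical `MiniV2Writer` below is not the library's `SampleSheetWriter`; it only illustrates that pattern together with the standard BCL Convert V2 section layout (`[Header]`, `[Reads]`, `[BCLConvert_Settings]`, `[BCLConvert_Data]`). Its method subset, column order, and always-present `Index2` column are simplifying assumptions.

```python
from typing import Optional


class MiniV2Writer:
    """Hypothetical fluent writer for a minimal BCL Convert V2 sheet."""

    def __init__(self) -> None:
        self._header = {"FileFormatVersion": "2"}
        self._reads: dict[str, int] = {}
        self._settings: dict[str, str] = {}
        self._samples: list[dict] = []

    def set_header(self, *, run_name: str) -> "MiniV2Writer":
        self._header["RunName"] = run_name
        return self  # returning self is what makes the calls chainable

    def set_reads(self, *, read1: int, index1: int,
                  read2: Optional[int] = None,
                  index2: Optional[int] = None) -> "MiniV2Writer":
        cycles = {"Read1Cycles": read1, "Read2Cycles": read2,
                  "Index1Cycles": index1, "Index2Cycles": index2}
        # Omit reads that are not part of the run (e.g. single-end).
        self._reads.update({k: v for k, v in cycles.items() if v is not None})
        return self

    def set_setting(self, key: str, value: str) -> "MiniV2Writer":
        self._settings[key] = value
        return self

    def add_sample(self, sample_id: str, *, index: str,
                   index2: str = "") -> "MiniV2Writer":
        self._samples.append({"Sample_ID": sample_id, "Index": index,
                              "Index2": index2})
        return self

    def to_string(self) -> str:
        def section(name: str, pairs: dict) -> list[str]:
            return [f"[{name}]", *(f"{k},{v}" for k, v in pairs.items()), ""]

        lines = (section("Header", self._header)
                 + section("Reads", self._reads)
                 + section("BCLConvert_Settings", self._settings)
                 + ["[BCLConvert_Data]", "Sample_ID,Index,Index2"])
        lines += [f"{s['Sample_ID']},{s['Index']},{s['Index2']}"
                  for s in self._samples]
        return "\n".join(lines) + "\n"
```

Usage mirrors the chained style of the real writer: `MiniV2Writer().set_header(run_name="Run42").set_reads(read1=151, index1=8).add_sample("S1", index="ACGTACGT").to_string()`.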

SampleSheetMerger

| Method / attribute | Returns | Description |
|---|---|---|
| `SampleSheetMerger(target_version=, min_hamming_distance=3)` | | Instantiate with a target format and an optional Hamming-distance threshold |
| `add(path)` | `self` | Register an input sheet path (fluent) |
| `merge(output_path, *, validate=True, abort_on_conflicts=True)` | `MergeResult` | Run the merge and write the output |

MergeResult

| Attribute / method | Type | Description |
|---|---|---|
| `has_conflicts` | `bool` | `True` if any conflict was recorded |
| `sample_count` | `int` | Samples in the merged output |
| `output_path` | `Path \| None` | Path written; `None` if the write was aborted |
| `source_versions` | `dict[str, str]` | Detected version of each input file |
| `conflicts` | `list[MergeConflict]` | Structured conflict records |
| `warnings` | `list[MergeConflict]` | Structured warning records |
| `summary()` | `str` | One-line human-readable summary |
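What counts as a merge conflict is not enumerated on this page. A minimal sketch, assuming two hypothetical rules (a `Sample_ID` reused within a lane, or an identical `(index, index2)` pair given to two samples in the same lane) and V1-style lowercase index keys, might look like this. It returns plain strings rather than `MergeConflict` records.

```python
def merge_samples(sheets: dict[str, list[dict]]) -> tuple[list[dict], list[str]]:
    """Concatenate samples from several sheets and report conflicts."""
    merged: list[dict] = []
    conflicts: list[str] = []
    seen_ids: dict[tuple[str, str], str] = {}       # (lane, Sample_ID) -> source file
    seen_idx: dict[tuple[str, str, str], str] = {}  # (lane, i7, i5) -> Sample_ID
    for source, samples in sheets.items():
        for s in samples:
            sid, lane = s["Sample_ID"], str(s.get("Lane", ""))
            id_key = (lane, sid)
            idx_key = (lane, s.get("index", ""), s.get("index2", ""))
            if id_key in seen_ids:
                conflicts.append(f"duplicate Sample_ID {sid} in {seen_ids[id_key]} and {source}")
            elif idx_key in seen_idx:
                conflicts.append(f"index collision in lane {lane or '(none)'}: "
                                 f"{seen_idx[idx_key]} vs {sid}")
            # Remember the first owner of each ID and index pair.
            seen_ids.setdefault(id_key, source)
            seen_idx.setdefault(idx_key, sid)
            merged.append(s)
    return merged, conflicts
```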

SampleSheetSplitter

| Method / attribute | Returns | Description |
|---|---|---|
| `SampleSheetSplitter(path, *, by="project", target_version=None, unassigned_label="unassigned")` | | Instantiate with an input path and grouping strategy |
| `split(output_dir, *, prefix="", suffix="_SampleSheet.csv", validate=True)` | `SplitResult` | Parse the input and write one file per group |

SplitResult

| Attribute / method | Type | Description |
|---|---|---|
| `output_files` | `dict[str, Path]` | Group key → path of the written file |
| `sample_counts` | `dict[str, int]` | Group key → number of samples written |
| `warnings` | `list[str]` | Non-fatal issues (incomplete records, unassigned samples) |
| `source_version` | `str` | `"V1"` or `"V2"` |
| `summary()` | `str` | One-line human-readable summary |
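Splitting by project amounts to grouping samples on their project value, sending blanks to the `unassigned_label` group, and deriving one output path per group. The sketch below assumes the project lives in a `Sample_Project` column and that file names are `prefix + key + suffix`; `plan_split` is a hypothetical helper and writes nothing to disk.

```python
from collections import defaultdict
from pathlib import Path
from typing import Union


def plan_split(samples: list[dict], output_dir: Union[str, Path], *,
               prefix: str = "", suffix: str = "_SampleSheet.csv",
               unassigned_label: str = "unassigned") -> dict[str, tuple[Path, list[dict]]]:
    """Map each project key to (output path, samples in that group)."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for s in samples:
        # Missing or blank projects fall into the unassigned group.
        project = (s.get("Sample_Project") or "").strip() or unassigned_label
        groups[project].append(s)
    out = Path(output_dir)
    return {key: (out / f"{prefix}{key}{suffix}", rows) for key, rows in groups.items()}
```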

SampleSheetFilter

| Method / attribute | Returns | Description |
|---|---|---|
| `SampleSheetFilter(path, *, target_version=None)` | | Instantiate with an input path |
| `filter(output_path, *, project=None, lane=None, sample_id=None, validate=True)` | `FilterResult` | Write a filtered copy to `output_path`; at least one criterion is required |

`sample_id` supports glob patterns (e.g. `"CTRL_*"`) via `fnmatch.fnmatchcase`, so matching is always case-sensitive.
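The filter semantics (every supplied criterion must match, at least one is required, and `sample_id` is a case-sensitive glob) can be sketched with the standard-library `fnmatch.fnmatchcase`. The column names `Sample_Project` and `Lane`, and exact matching for `project`, are assumptions; `filter_samples` is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional, Union


def filter_samples(samples: list[dict], *, project: Optional[str] = None,
                   lane: Union[int, str, None] = None,
                   sample_id: Optional[str] = None) -> list[dict]:
    """Keep samples matching every criterion supplied (logical AND)."""
    if project is None and lane is None and sample_id is None:
        raise ValueError("at least one filter criterion is required")
    kept = []
    for s in samples:
        if project is not None and s.get("Sample_Project") != project:
            continue
        # Compare lanes as strings so 1 and "1" are treated alike.
        if lane is not None and str(s.get("Lane", "")) != str(lane):
            continue
        # Case-sensitive glob: "CTRL_*" matches "CTRL_01" but not "ctrl_01".
        if sample_id is not None and not fnmatchcase(s.get("Sample_ID", ""), sample_id):
            continue
        kept.append(s)
    return kept
```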

FilterResult

| Attribute / method | Type | Description |
|---|---|---|
| `matched_count` | `int` | Samples that passed all filter criteria |
| `total_count` | `int` | Total samples in the input sheet |
| `output_path` | `Path \| None` | Path written; `None` when no samples matched |
| `source_version` | `str` | `"V1"` or `"V2"` |
| `summary()` | `str` | One-line human-readable summary |

normalize_index_lengths

```python
def normalize_index_lengths(
    samples: list[dict],
    strategy: str,                  # "trim" or "pad"
    index1_key: str | None = None,  # auto-detected if None
    index2_key: str | None = None,  # auto-detected if None
) -> list[dict]: ...
```

Normalizes index sequence lengths across a list of sample dicts. See Index Utilities for details.
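The two strategies can be illustrated for a single index column: `"trim"` cuts every index to the shortest length present, and `"pad"` extends every index to the longest. This sketch is hypothetical and handles one explicit key only; padding on the 3' end with `"N"` is an assumption, and the real helper's pad base and side (which matter for i5 orientation) may differ.

```python
def normalize_lengths(samples: list[dict], strategy: str, key: str,
                      pad_char: str = "N") -> list[dict]:
    """Return copies of `samples` with `key` trimmed or padded to one length."""
    if strategy not in ("trim", "pad"):
        raise ValueError(f"strategy must be 'trim' or 'pad', got {strategy!r}")
    lengths = [len(s[key]) for s in samples if s.get(key)]
    if not lengths:
        return [dict(s) for s in samples]  # nothing to normalise
    target = min(lengths) if strategy == "trim" else max(lengths)
    out = []
    for s in samples:
        copy = dict(s)  # leave the caller's records untouched
        seq = copy.get(key) or ""
        if seq:
            copy[key] = seq[:target] if strategy == "trim" else seq.ljust(target, pad_char)
        out.append(copy)
    return out
```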


Enums

```python
from samplesheet_parser.enums import SampleSheetVersion, InstrumentPlatform, UMILocation

SampleSheetVersion.V1   # IEM / bcl2fastq
SampleSheetVersion.V2   # BCLConvert
```