Changelog
The full changelog is maintained in the repository:
All notable changes to samplesheet-parser are documented here.
The format follows Keep a Changelog.
[Unreleased]
[1.3.0] - 2026-05-20
Fixed
- `SampleSheetConverter` now reverse-complements `Index2` for workflow-B instruments (NovaSeq X / X Plus, NextSeq 500/550/1000/2000, iSeq 100, MiniSeq, HiSeq 3000/4000) on V1 ↔ V2 conversion. `bcl2fastq` records i5 as read on the chip (reverse-complemented for workflow-B instruments) while `BCLConvert` expects i5 in the forward orientation; previously the converter passed `Index2` through verbatim, silently producing V2 sheets that demultiplexed to the wrong samples (#30).
  - The workflow is auto-detected from `[Header] Instrument Type` (V1) or `InstrumentPlatform` / `InstrumentType` (V2). When the sheet has a non-empty `Index2` column and the workflow cannot be determined, the converter raises `ValueError` and the CLI exits non-zero rather than guessing. Behavior change: dual-index sheets with no recognised instrument header that previously converted (with silently wrong `Index2`) will now fail loudly until `workflow=` / `--workflow` is supplied.
  - V2 → V1 conversion now preserves the instrument as `Instrument Type` in the V1 `[Header]` so the workflow signal survives a round trip.
Added
- `samplesheet_parser.instruments`: public module exposing `Workflow` (`StrEnum`: `A` / `B`), `WORKFLOW_A_INSTRUMENTS` / `WORKFLOW_B_INSTRUMENTS` / `AMBIGUOUS_INSTRUMENTS` tables, `detect_workflow()`, `parse_workflow()`, and `reverse_complement()`.
- `SampleSheetConverter(path, *, workflow=...)`: explicit workflow override accepting `"a"`, `"b"`, or a `Workflow` enum value. Required for ambiguous instruments such as `NovaSeq 6000` (workflow depends on v1.0 vs v1.5 chemistry).
- `samplesheet convert --workflow {a,b}` / `-w`: CLI override that takes precedence over auto-detection.
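The `Index2` orientation rule described in this release can be pictured with a minimal sketch. The names `reverse_complement_sketch` and `convert_index2` are hypothetical stand-ins written for this illustration and are not the library's implementation; only the behaviour (reverse-complement for workflow B, pass-through for workflow A, `ValueError` when `Index2` is non-empty and the workflow is unknown) is taken from the entries above.

```python
from __future__ import annotations

# Complement table for DNA bases; N stays N. Illustrative only.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement_sketch(seq: str) -> str:
    """Return the reverse complement of an index sequence (hypothetical helper)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def convert_index2(index2: str, workflow: str | None) -> str:
    """Re-orient an i5 index for conversion (hypothetical helper).

    An empty index needs no workflow. A non-empty index with an unknown
    workflow raises instead of guessing.
    """
    if not index2:
        return index2
    if workflow is None:
        raise ValueError("workflow could not be determined; pass 'a' or 'b'")
    key = workflow.lower()
    if key == "a":
        return index2  # workflow A: orientation is unchanged
    if key == "b":
        return reverse_complement_sketch(index2)
    raise ValueError(f"unknown workflow: {workflow!r}")


print(convert_index2("AACG", "b"))  # CGTT
```

Because reverse-complementing is its own inverse, applying the same step in both conversion directions returns the original `Index2` after a round trip, provided the workflow is known on the way back.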
[1.2.0] - 2026-04-14
Changed
- Minimum `typer` version raised from `>=0.9` to `>=0.12` to ensure compatibility with `click >=8.2` (`make_metavar` signature change).
Added
- `SampleSheetSplitter` and `SplitResult`: splits a combined sheet into one file per project or per lane (the inverse of `SampleSheetMerger`).
  - `SampleSheetSplitter(path, *, by="project"|"lane", target_version=None, unassigned_label="unassigned")`
  - `split(output_dir, *, prefix="", suffix="_SampleSheet.csv", validate=True) → SplitResult`
  - Header, reads, and settings are copied into every output file; only sample rows are divided. Samples with no project/lane are grouped under a configurable `unassigned_label`.
  - Incomplete records (missing `Sample_ID` or `Index`) are skipped with a warning. Groups that produce no valid samples are omitted with a warning.
  - `SampleSheetSplitter` and `SplitResult` are exported from the top-level package.
- `SampleSheetFilter` and `FilterResult`: extracts a subset of samples from a sheet by project, lane, or sample ID (with glob support).
  - `SampleSheetFilter(path, *, target_version=None)`
  - `filter(output_path, *, project=None, lane=None, sample_id=None, validate=True) → FilterResult`
  - Multiple criteria are ANDed: a sample must match all provided criteria.
  - `sample_id` supports glob patterns (e.g. `"CTRL_*"`, `"SAMPLE_00[1-3]"`) via `fnmatch.fnmatchcase` for consistent case-sensitive behavior across platforms.
  - No file is written when no samples match; `FilterResult.output_path` is `None` in that case.
  - `SampleSheetFilter` and `FilterResult` are exported from the top-level package.
- `samplesheet split` CLI command: splits a combined sheet into per-group files. Accepts `--by project|lane`, `--output-dir` / `-d`, `--to v1|v2`, `--format json|text`, and `--prefix`. Exits 0 on clean split, 1 if warnings were produced, 2 on bad arguments or unreadable files.
- `samplesheet filter` CLI command: filters samples by `--project` / `-p`, `--lane` / `-l`, and `--sample-id` / `-s` (glob patterns supported). Accepts `--output` / `-o`, `--to v1|v2`, and `--format json|text`. Exits 0 when samples match, 1 when no samples match, 2 on bad arguments.
- `SampleSheetWriter.clear_samples()`: public method to remove all samples from a writer while preserving header, reads, and settings. Enables the copy-metadata-then-repopulate pattern without accessing the private `_samples` attribute.
- `--format json` for `samplesheet convert`: the convert command now accepts `--format json` and emits a structured JSON object with `input`, `output`, `source_version`, and `target_version` keys. All seven CLI subcommands now support `--format json` uniformly.
- Bioconda recipe (`recipes/meta.yaml`): a `noarch: python` conda recipe with `loguru` as the only runtime dependency. Python version is not pinned (noarch packages rely on the conda solver for compatibility). The CLI extra (`typer`) is intentionally omitted from the base recipe so the conda package stays lightweight; users who need the `samplesheet` CLI can `conda install typer` alongside.
- nf-core compatible Nextflow modules: three modules following nf-core/modules conventions, each with `main.nf`, `meta.yml`, `environment.yml`, nf-test tests, and `tags.yml`:
  - `SAMPLESHEETPARSER_VALIDATE`: validates V1/V2 sheets and emits a structured JSON report; exits 1 on errors so pipelines fail before demultiplexing.
  - `SAMPLESHEETPARSER_CONVERT`: bidirectional V1↔V2 conversion with target version passed as a channel value.
  - `SAMPLESHEETPARSER_INFO`: emits a JSON metadata summary (format, sample count, lanes, index type, read lengths, adapters) for run logging and conditional pipeline branching.
- Example scripts: `examples/demo_splitter.py` and `examples/demo_filter.py` demonstrating the new split and filter APIs with three and four runnable scenarios respectively.
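The filter semantics listed for `SampleSheetFilter` (ANDed criteria, case-sensitive glob matching on the sample ID) can be sketched without the library. The helper `sample_matches` and the dict keys `Sample_ID`, `Sample_Project`, and `Lane` are assumptions made for this illustration; only the use of `fnmatch.fnmatchcase` and the AND rule come from the entries above.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def sample_matches(sample: dict, *, project=None, lane=None, sample_id=None) -> bool:
    """Return True only if the sample satisfies every criterion that was given."""
    if project is not None and sample.get("Sample_Project") != project:
        return False
    # Compare lanes as strings so that lane=1 and lane="1" behave the same.
    if lane is not None and str(sample.get("Lane")) != str(lane):
        return False
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.fnmatch.
    if sample_id is not None and not fnmatchcase(str(sample.get("Sample_ID", "")), sample_id):
        return False
    return True


samples = [
    {"Sample_ID": "CTRL_1", "Sample_Project": "P1", "Lane": "1"},
    {"Sample_ID": "SAMPLE_002", "Sample_Project": "P1", "Lane": "2"},
    {"Sample_ID": "ctrl_3", "Sample_Project": "P2", "Lane": "1"},
]

kept = [s["Sample_ID"] for s in samples if sample_matches(s, project="P1", sample_id="CTRL_*")]
print(kept)  # ['CTRL_1']
```

A criterion left as `None` is simply not applied, which is what makes the provided criteria combine with AND rather than OR.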
Tests
- `tests/test_splitter.py`: 46 tests covering split by project and lane, header/reads/settings preservation, unassigned samples (custom label), incomplete records (missing `Sample_ID` or `Index`, all-incomplete group), empty input, target version override, custom prefix/suffix, output directory creation, `SplitResult` summary and `source_version`, and error conditions (`FileNotFoundError`, `ValueError` on invalid `--by`).
- `tests/test_filter.py`: 46 tests covering filter by project, lane (string and int), and sample ID (exact and glob); multiple ANDed criteria; no-match behaviour (no file written, `matched_count == 0`); header/settings preservation; target version override; `FilterResult` attributes; incomplete records; and error conditions (`ValueError`, `FileNotFoundError`).
- `TestCLISplit` and `TestCLIFilter` in `tests/test_cli.py`: 13 tests each, covering happy paths, JSON output, warning exit codes, error exit codes, exception paths (via `monkeypatch`), and text output formatting.
- Three new `TestCLIConvert` tests covering `--format json` exit code, JSON output structure (`source_version`, `target_version`, `input`, `output` keys), and the `--format xml` invalid-format guard.
[1.1.0] - 2026-04-05
Added
- `--version` / `-V` CLI flag: prints the installed package version and exits. Reads the version via `importlib.metadata` so the full package is not loaded just to print a version string.
- `demo_converter.py`: runnable example covering V1→V2 conversion, V2→V1 (lossy) conversion, and a full V1→V2→V1 roundtrip with sample identity verification.
- `demo_diff.py`: runnable example covering five diff scenarios: identical sheets, header change, sample added, index correction, and cross-format (V1 vs its V2 conversion) diff.
- `demo_writer.py`: runnable example covering the fluent `SampleSheetWriter` API: building V1 and V2 sheets from scratch, correcting a sample index on an existing sheet, and removing a sample before submission.
- `demo_index_utils.py`: runnable example covering `normalize_index_lengths` with trim and pad strategies, dual-index normalization, and a real-sheet walkthrough.
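The version lookup behind `--version` can be sketched with the standard library alone. The function name `installed_version` is hypothetical; the use of `importlib.metadata` and the `"unknown"` fallback on `PackageNotFoundError` follow the entries for this release.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str) -> str:
    """Look up a distribution's version from its installed metadata.

    Only the metadata is read; the package itself is never imported.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Not installed (for example, running from a source checkout).
        return "unknown"


print(installed_version("samplesheet-parser"))
```

The printed value depends on the environment: it is the installed version string when the distribution is present and `unknown` otherwise.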
Fixed
- `SampleSheetFactory.create_parser()` now returns a typed local variable instead of `self.parser`, resolving a mypy `return-value` error caused by the instance attribute being typed as `SampleSheetV1 | SampleSheetV2 | None`.
- `cli.py` fallback type aliases (`_FormatOption`, `_OutputOption`, `_VersionOption`) reduced from `type: ignore[assignment,misc]` to `type: ignore[misc]`; the `assignment` suppression was unused under current mypy.
Tests
- `TestCLIVersion`: four new tests covering `--version` exit code, `-V` short flag, package name in output, version string in output, and `PackageNotFoundError` fallback to `"unknown"`.
- Five Copilot PR #23 review comments resolved: long test signatures and `runner.invoke(...)` calls wrapped to the 100-char line limit.
[1.0.0] - 2026-04-05
Added
- `py.typed` marker: package now ships inline type information per PEP 561, enabling mypy and pyright to type-check downstream code without extra configuration.
- `InstrumentPlatform` and `UMILocation` enums exported: both were already defined in `enums.py` but not part of the public API. They are now importable directly from the top-level package and listed in `__all__`.
- `.pre-commit-config.yaml`: pre-commit hook configuration included in the repository (black, ruff with `--fix`, mypy, and standard file hygiene hooks) so contributors get the same checks locally that CI enforces.
Fixed
- `SampleSheetFactory.create_parser()` now returns a typed local variable instead of `self.parser`, resolving a mypy `return-value` error caused by the instance attribute being typed as `SampleSheetV1 | SampleSheetV2 | None`.
- `SampleSheetMerger._parse_all()` guards against `factory.version` being `None` before accessing `.value`, fixing a potential `AttributeError` on unexpected parse paths.
- Removed redundant `type: ignore[assignment]` suppressions in `index_utils.py` that were no longer needed under strict mypy.
Changed
- Development status classifier updated from `3 - Alpha` to `5 - Production/Stable`.
- Ruff config adds `[tool.ruff.lint.per-file-ignores]` to suppress E402 for `samplesheet_parser/__init__.py`, where the version-detection block intentionally precedes the package re-exports.
- Stability guarantee: `1.0.0` marks the first stable release. The public API (all names in `__init__.__all__`) is now subject to semantic versioning: breaking changes will not be made without a major version bump.
[0.3.4] - 2026-04-04
Added
- `samplesheet info` CLI command: prints a concise summary of any V1 or V2 sample sheet (format, sample count, lanes, index type, read lengths, adapters, experiment name, instrument). Supports `--format json` for machine-readable output; exits 0 on success, 2 on unreadable files.
- Configurable Hamming distance threshold: `SampleSheetValidator.validate()` now accepts a `min_hamming_distance` keyword argument (default: 3) so labs using longer indexes can enforce stricter thresholds without changing the module-level constant.
  - `SampleSheetMerger` accepts the same parameter in `__init__()` and applies it to both the intra-sheet and cross-sheet Hamming checks as well as the post-merge validation step.
  - `samplesheet validate` exposes `--min-hamming N` (must be ≥ 1; exits 2 on invalid input). The JSON output includes `min_hamming_distance` for auditability.
- `normalize_index_lengths()` utility: normalizes index sequence lengths across a list of sample dicts (output of `sheet.samples()`) to a consistent length before merging sheets with mixed-length indexes.
  - `strategy="trim"`: trims all indexes to the shortest sequence length.
  - `strategy="pad"`: pads shorter indexes to the longest length using `"N"` wildcard characters (supported by BCLConvert ≥ 3.9 and bcl2fastq ≥ 2.20).
  - Auto-detects V1-style (`index` / `index2`) and V2-style (`Index` / `Index2`) field names; explicit `index1_key` / `index2_key` overrides supported.
  - Exported from the top-level package as `normalize_index_lengths`.
- CI / pre-commit integration guide in README: GitHub Actions workflow and pre-commit hook configuration for automatic sample sheet validation on every commit or pull request that touches a `SampleSheet.csv`.
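The two length-normalization strategies can be illustrated on a plain list of index strings. `normalize_lengths_sketch` is a hypothetical function, not the signature of `normalize_index_lengths()`; it also assumes that trimming and padding both act on the right-hand end of the sequence, which the entries above do not state.

```python
from __future__ import annotations


def normalize_lengths_sketch(indexes: list[str], strategy: str = "trim") -> list[str]:
    """Bring all index sequences to a common length (illustrative only)."""
    if not indexes:
        return []
    if strategy == "trim":
        # Cut every index down to the shortest length present.
        target = min(len(seq) for seq in indexes)
        return [seq[:target] for seq in indexes]
    if strategy == "pad":
        # Extend shorter indexes with "N" up to the longest length present.
        # Assumption: padding is appended on the right.
        target = max(len(seq) for seq in indexes)
        return [seq.ljust(target, "N") for seq in indexes]
    raise ValueError(f"unknown strategy: {strategy!r}")


mixed = ["ACGTACGT", "ACGTAC"]
print(normalize_lengths_sketch(mixed, "trim"))  # ['ACGTAC', 'ACGTAC']
print(normalize_lengths_sketch(mixed, "pad"))   # ['ACGTACGT', 'ACGTACNN']
```

Note that trimming made the two example indexes identical, so a duplicate-index check would be worth running after a trim.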
Fixed
- `_detect_key()` in `index_utils` now selects the key with at least one non-empty value before falling back to key presence, preventing silent normalization skip when a key exists but all its values are `None` or `""`.
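The selection order described in this fix can be sketched as a two-pass lookup. `detect_key_sketch` and its arguments are hypothetical; the private `_detect_key()` helper may differ in signature and details.

```python
from __future__ import annotations


def detect_key_sketch(samples: list[dict], candidates: tuple[str, ...]) -> str | None:
    """Pick which candidate field name a list of sample dicts actually uses."""
    # Pass 1: prefer a key that carries at least one non-empty value.
    for key in candidates:
        if any(sample.get(key) for sample in samples):
            return key
    # Pass 2: fall back to mere presence of the key.
    for key in candidates:
        if any(key in sample for sample in samples):
            return key
    return None


samples = [
    {"index": "", "Index": "ACGTACGT"},
    {"index": None, "Index": "TGCATGCA"},
]
print(detect_key_sketch(samples, ("index", "Index")))  # Index
```

With a presence-only check the empty `index` key would be chosen first and every sample would appear to have no index, which is the silent skip the fix addresses.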
Changed
- `--min-hamming` CLI option default and help text are now derived from the `MIN_HAMMING_DISTANCE` constant in `validators.py` to prevent drift.
[0.3.3] - 2026-03-13
Documentation
- Add architecture diagram showing full library structure including CLI and `SampleSheetMerger`
- Update README with architecture overview, solid vs dashed line legend
- Add `[Custom_Sections*]` to V1 and V2 format descriptions
[0.3.2] - 2026-03-12
Added
- `.zenodo.json` metadata file for automatic Zenodo archival and DOI minting on GitHub releases
- `CITATION.cff` file enabling GitHub's "Cite this repository" button and standardized software citation for downstream users
[0.3.1] - 2026-03-11
Fixed
- `SampleSheetMerger`: `INDEX_DISTANCE_TOO_LOW` and `DUPLICATE_INDEX` were reported twice in `--force` merges (once by the pre-merge cross-sheet check, once by the post-merge validator). Duplicate codes are now suppressed in `_validate_merged`; the more descriptive pre-merge message is always preferred.
[0.3.0] - 2026-03-10
Added
- `SampleSheetMerger`: combines multiple per-project sample sheets into a single sheet for a flow cell run.
  - `add(path)`: register an input sheet (V1 or V2); mixed formats are auto-converted to the target version before merging.
  - `merge(output_path, validate=True, abort_on_conflicts=True)`: merges all registered sheets, writes the combined output, and returns a `MergeResult`.
  - Index collision detection: raises a conflict when two samples share the same lane and index sequence across project boundaries.
  - Hamming distance check: warns when the combined I7+I5 distance between any two samples across sheets falls below 3.
  - Read-length conflict detection: raises a conflict when registered sheets specify incompatible `Read1Cycles` / `Read2Cycles` (V2) or `[Reads]` lengths (V1).
  - Adapter conflict detection: warns when adapter sequences differ across sheets.
  - Mixed-format warning: emits a warning when V1 and V2 sheets are combined, with the auto-conversion strategy logged.
  - `MergeResult` dataclass: exposes `conflicts`, `warnings`, `sample_count`, `source_versions`, `output_path`, `has_conflicts`, and `summary()`; consistent with `ValidationResult` and `DiffResult`.
  - `abort_on_conflicts=True` (default): skips writing the output file when any conflict is present; set `False` (via `--force` in the CLI) to write despite conflicts.
  - `SampleSheetMerger` and `MergeResult` are exported from the top-level package.
- `samplesheet` CLI: command-line interface exposing the four core operations, available as an optional extra (`pip install "samplesheet-parser[cli]"`; adds `typer` as a dependency).
  - `samplesheet validate <file>`: exits 0 if clean, 1 if errors, 2 on usage/parse errors. Supports `--format json` for machine-readable output.
  - `samplesheet convert <file> --to <v1|v2> --output <path>`: converts between formats; exits 0 on success, 1 on conversion error, 2 on bad arguments.
  - `samplesheet diff <old> <new>`: exits 0 if identical, 1 if differences detected (useful in CI pre-run checks). Supports `--format json`.
  - `samplesheet merge <files...> --output <path>`: merges two or more sheets; exits 0 on clean merge, 1 on conflicts or warnings, 2 on bad arguments. Supports `--force`, `--to <v1|v2>`, and `--format json`.
  - All commands print errors to stderr and structured data to stdout.
  - Entry point configured in `pyproject.toml`: `samplesheet = "samplesheet_parser.cli:main"`.
  - Module imports cleanly without `typer` installed; the missing-extra error is surfaced only at invocation time.
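The index collision check listed for `SampleSheetMerger` in this release can be pictured as grouping samples by lane and index and flagging groups that span more than one input sheet. The function `find_index_collisions`, the input shape (sheet name mapped to a list of sample dicts), and the dict keys are assumptions made for this sketch, not the merger's actual data model.

```python
from __future__ import annotations

from collections import defaultdict


def find_index_collisions(sheets: dict[str, list[dict]]) -> list[tuple]:
    """Return (key, owners) for every lane/index combination used by more than one sheet."""
    seen: dict[tuple, list[tuple[str, str]]] = defaultdict(list)
    for sheet_name, samples in sheets.items():
        for sample in samples:
            key = (
                str(sample.get("Lane", "")),
                sample.get("Index", ""),
                sample.get("Index2", ""),
            )
            seen[key].append((sheet_name, sample.get("Sample_ID", "")))
    # A collision is a key whose owners come from at least two different sheets.
    return [
        (key, owners)
        for key, owners in seen.items()
        if len({name for name, _ in owners}) > 1
    ]


sheets = {
    "projA.csv": [{"Sample_ID": "A1", "Lane": "1", "Index": "ACGTACGT", "Index2": "TTTTAAAA"}],
    "projB.csv": [
        {"Sample_ID": "B1", "Lane": "1", "Index": "ACGTACGT", "Index2": "TTTTAAAA"},
        {"Sample_ID": "B2", "Lane": "2", "Index": "ACGTACGT", "Index2": "TTTTAAAA"},
    ],
}
for key, owners in find_index_collisions(sheets):
    print(key, owners)
```

In this example `A1` and `B1` collide on lane 1, while `B2` reuses the same index pair on lane 2 and is not reported.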
Changed
- README updated to document `SampleSheetMerger`, the `samplesheet` CLI, all new API reference tables, and installation instructions for the `[cli]` extra.
- `CONTRIBUTING.md` updated with CLI testing instructions and the new `[dev,cli]` install target.
[0.2.0] - 2026-02-25
Added
- `SampleSheetWriter`: programmatic creation and editing of IEM V1 and BCLConvert V2 sample sheets.
  - Build sheets from scratch with a fluent API: `set_header()`, `set_reads()`, `set_adapter()`, `set_override_cycles()`, `set_software_version()`, `set_setting()`, `add_sample()`.
  - `from_sheet(sheet, version=)` class method: load any parsed V1/V2 sheet, edit in place, and write back; pass a different `version` to convert format while editing.
  - `remove_sample(sample_id, lane=)` and `update_sample(sample_id, **fields)` for surgical edits to existing sheets.
  - `write(path, validate=True)`: runs `SampleSheetValidator` before writing by default; raises `ValueError` with the full error list if validation fails.
  - `to_string()`: serialise to a string without writing to disk (useful for testing and inspection).
  - CSV safety: `_validate_field` rejects commas, newlines, and quotes in all free-text inputs (`sample_id`, `index`, `project`, `run_name`, adapter sequences, custom column keys/values, etc.) at input time with a clear error message.
  - `SampleSheetWriter` is now exported from the top-level package.
- `SampleSheetDiff`: structured comparison of two sample sheets across any combination of V1 and V2 formats.
  - Compares header, reads, settings, and samples in a single `compare()` call.
  - Returns a `DiffResult` dataclass with `header_changes`, `samples_added`, `samples_removed`, and `sample_changes`.
  - V1-only metadata columns (`I7_Index_ID`, `I5_Index_ID`, `Sample_Name`, `Description`) are suppressed during cross-format comparison to avoid format-noise diffs.
  - `DiffResult.summary()` and `DiffResult.has_changes` for quick inspection.
- `INDEX_DISTANCE_TOO_LOW` validation check: `SampleSheetValidator` now computes the Hamming distance between every pair of index sequences within each lane and warns when the distance falls below the recommended minimum of 3. For dual-index sheets the combined I7+I5 sequence is used so that pairs well-separated on I5 are not incorrectly flagged.
- `_hamming_distance` helper: module-level pure function, independently testable, handles sequences of unequal length by comparing up to the shorter sequence length.
- `scripts/demo_writer.py`: smoke-test script demonstrating V1/V2 from-scratch creation and round-trip editing.
- `scripts/demo_diff.py`: smoke-test script demonstrating identical, modified, and cross-format diff scenarios.
- `.github/copilot-instructions.md`: Copilot review instructions scoping suggestions to logic bugs, test coverage gaps, and type errors.
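The `INDEX_DISTANCE_TOO_LOW` check and its Hamming helper can be sketched as follows. Both function names are hypothetical stand-ins for the private implementation, and the input is simplified to `(sample_id, i7, i5)` tuples for a single lane with equal-length indexes.

```python
from __future__ import annotations

from itertools import combinations


def hamming_distance_sketch(a: str, b: str) -> int:
    """Count mismatches, comparing only up to the shorter sequence's length."""
    return sum(x != y for x, y in zip(a, b))


def low_distance_pairs(
    samples: list[tuple[str, str, str]], min_distance: int = 3
) -> list[tuple[str, str, int]]:
    """Flag sample pairs whose combined I7+I5 distance is below the threshold."""
    flagged = []
    for (id_a, i7_a, i5_a), (id_b, i7_b, i5_b) in combinations(samples, 2):
        distance = hamming_distance_sketch(i7_a + i5_a, i7_b + i5_b)
        if distance < min_distance:
            flagged.append((id_a, id_b, distance))
    return flagged


lane_samples = [
    ("S1", "ACGTACGT", "TTTTTTTT"),
    ("S2", "ACGTACGA", "GGGGGGGG"),  # close to S1 on i7, far apart on i5
    ("S3", "ACGTACGT", "TTTTTTTA"),  # one base away from S1 overall
]
print(low_distance_pairs(lane_samples))  # [('S1', 'S3', 1)]
```

`S1` and `S2` differ by a single base on I7 but are not flagged, because the combined sequence puts them nine mismatches apart.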
Changed
- README updated to document `SampleSheetDiff`, `SampleSheetWriter`, Hamming distance validation, and the full API reference tables.
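The CSV safety rule listed for `SampleSheetWriter` in this release amounts to rejecting characters that would break a CSV row before they ever reach the file. `validate_field_sketch` is a hypothetical stand-in for the private `_validate_field`; the exact character set is an assumption (comma, line feed, carriage return, and double quote here), since the entry only says commas, newlines, and quotes.

```python
# Characters that would corrupt an unquoted CSV cell.
# Assumption: the real helper may reject a different set of quote characters.
_FORBIDDEN = (",", "\n", "\r", '"')


def validate_field_sketch(name: str, value: str) -> str:
    """Return the value unchanged, or raise if it contains a forbidden character."""
    for char in _FORBIDDEN:
        if char in value:
            raise ValueError(
                f"{name} contains forbidden character {char!r}: {value!r}"
            )
    return value


print(validate_field_sketch("sample_id", "SAMPLE_001"))  # SAMPLE_001

try:
    validate_field_sketch("project", "Project,A")
except ValueError as exc:
    print(f"rejected: {exc}")
```

Failing at input time keeps the error next to the call that supplied the bad value, rather than surfacing later as a misaligned column in the written sheet.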
[0.1.5] - 2026-02-23
Added
- `SampleSheetConverter`: bidirectional V1 ↔ V2 format conversion.
  - `to_v2(output_path)`: converts IEM V1 to BCLConvert V2.
  - `to_v1(output_path)`: converts BCLConvert V2 to IEM V1 (lossy; V2-only fields dropped with a warning).
  - Auto-detects source format via `SampleSheetFactory`.
- `scripts/demo_converter.py`: smoke-test script for converter scenarios including V1→V2→V1 and V2→V1→V2 round-trips.
- `CONTRIBUTING.md`: local development setup, test instructions, and PR checklist.
[0.1.1] – [0.1.4] - 2026-02-22 / 2026-02-23
Fixed
- CI workflow not triggering on tag push: added `tags` trigger to `ci.yml` (was gated on tags but never configured to run on them).
- PyPI README image not rendering: switched from `badge.fury.io` to `shields.io` dynamic badge; bumped versions to force PyPI to re-render the README on each new release.
- Minor ruff and mypy fixes surfaced during initial CI runs.

These were infrastructure-only patch releases with no API or behaviour changes.
[0.1.0] - 2026-02-22
Added
- `SampleSheetV1`: parser for IEM V1 (bcl2fastq-era) sample sheets. Parses `[Header]`, `[Reads]`, `[Settings]`, `[Manifests]`, and `[Data]` sections. Exposes `samples()`, `index_type()`, `adapters`, `read_lengths`, and all standard header fields.
- `SampleSheetV2`: parser for BCLConvert V2 (NovaSeq X series) sample sheets. Parses `[Header]`, `[Reads]`, `[BCLConvert_Settings]`, `[BCLConvert_Data]`, and optional `[Cloud_Data]` sections. Adds `get_umi_length()` and `get_read_structure()` for `OverrideCycles` decoding.
- `SampleSheetFactory`: auto-detects V1 vs V2 format using a three-step strategy (header key scan → section name scan → V1 fallback) and returns the appropriate parser.
- `SampleSheetValidator`: validates parsed sheets for `EMPTY_SAMPLES`, `INVALID_INDEX_CHARS`, `INDEX_TOO_SHORT`, `INDEX_TOO_LONG`, `DUPLICATE_INDEX`, `MISSING_INDEX2`, `DUPLICATE_SAMPLE_ID`, `NO_ADAPTERS`, and `ADAPTER_MISMATCH`. Returns a structured `ValidationResult`.
- Initial PyPI release. Requires Python 3.10+, depends only on `loguru`.
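The three-step detection order named for `SampleSheetFactory` (header key scan, then section name scan, then V1 fallback) can be sketched on raw sheet text. Which header keys and section names the library actually inspects is not stated here, so the choices below (`FileFormatVersion,2` and `IEMFileVersion` as header signals, a `[BCLConvert_` prefix as the section signal) are assumptions, and `detect_version_sketch` is a hypothetical function.

```python
def detect_version_sketch(text: str) -> str:
    """Guess 'v1' or 'v2' from sample sheet text (illustrative only)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    # Step 1: header key scan.
    in_header = False
    for line in lines:
        if line.startswith("["):
            in_header = line.lower().startswith("[header]")
            continue
        if in_header:
            cells = [cell.strip() for cell in line.split(",")]
            if cells[0] == "FileFormatVersion" and len(cells) > 1 and cells[1] == "2":
                return "v2"
            if cells[0] == "IEMFileVersion":
                return "v1"

    # Step 2: section name scan.
    if any(line.startswith("[BCLConvert_") for line in lines):
        return "v2"

    # Step 3: nothing conclusive, fall back to V1.
    return "v1"


v2_sheet = "[Header]\nFileFormatVersion,2\n[BCLConvert_Data]\nSample_ID,Index\n"
v1_sheet = "[Header]\nIEMFileVersion,4\n[Data]\nSample_ID,index\n"
print(detect_version_sketch(v2_sheet), detect_version_sketch(v1_sheet))  # v2 v1
```

Ordering the checks from most to least specific means a sheet with an explicit version key is never overridden by a weaker signal further down the file.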