Parsing

Format auto-detection

SampleSheetFactory detects the format with a three-step strategy, so the caller does not need to supply a format hint:

1. Header discriminator: scan [Header] for FileFormatVersion (→ V2) or IEMFileVersion (→ V1).
2. Section name scan: if neither header key is found, look anywhere in the file for [BCLConvert_Settings] or [BCLConvert_Data] (→ V2).
3. Default: fall back to V1, which has the broadest compatibility with legacy files.

The detector reads only as much of the file as it needs; in the common case it stops once [Header] has been scanned.

```python
from samplesheet_parser import SampleSheetFactory

factory = SampleSheetFactory()
sheet = factory.create_parser("SampleSheet.csv", parse=True)

print(factory.version)   # SampleSheetVersion.V1 or .V2
```
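The three-step strategy can be illustrated with the standard library alone. This is a sketch, not the library's implementation: `Version`, `detect_version`, and `detect_file_version` are hypothetical names, and the matching rules (first CSV cell as the header key, exact section names) are assumptions.

```python
import re
from enum import Enum
from typing import Iterable


class Version(Enum):
    # Hypothetical stand-in for the library's SampleSheetVersion enum.
    V1 = "V1"
    V2 = "V2"


V2_SECTIONS = {"BCLConvert_Settings", "BCLConvert_Data"}
_SECTION = re.compile(r"^\[([^\]]+)\]")


def detect_version(lines: Iterable[str]) -> Version:
    """Header key first, then V2-only section names, then default to V1."""
    section = None
    saw_v2_section = False
    for raw in lines:
        line = raw.strip().lstrip("\ufeff")  # tolerate a UTF-8 BOM on line 1
        match = _SECTION.match(line)
        if match:
            section = match.group(1).strip()
            saw_v2_section = saw_v2_section or section in V2_SECTIONS
            continue
        if section == "Header":
            key = line.split(",", 1)[0].strip()
            # Step 1: a discriminator key decides immediately, so iteration
            # stops here and later lines are never consumed.
            if key == "FileFormatVersion":
                return Version.V2
            if key == "IEMFileVersion":
                return Version.V1
    # Step 2: no header key, so use the section names seen anywhere.
    # Step 3: otherwise assume the broadly compatible V1 layout.
    return Version.V2 if saw_v2_section else Version.V1


def detect_file_version(path: str) -> Version:
    # Iterating the file object lazily lets an early return stop reading.
    with open(path, newline="", encoding="utf-8-sig") as fh:
        return detect_version(fh)
```

Because `detect_version` consumes its input lazily, passing an open file object lets it return as soon as the header key appears instead of loading the whole sheet.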

V1 parser

```python
from samplesheet_parser import SampleSheetV1

sheet = SampleSheetV1("SampleSheet.csv")
sheet.parse()

print(sheet.experiment_name)   # "MyRun_20240115"
print(sheet.read_lengths)      # [151, 151]
print(sheet.adapters)          # ["CTGTCTCTTATACACATCT"]
print(sheet.index_type())      # "dual"

for sample in sheet.samples():
    print(sample["sample_id"], sample["index"], sample["index2"])
```
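To show where each of these values lives in a V1 file, here is an illustrative IEM-style sheet with a hypothetical stdlib-only reader (`split_sections`, `summarize_v1`, and `infer_index_type` are not part of samplesheet_parser). The section-to-attribute mapping and the dual/single/none rule are assumptions based on the usual IEM layout; unlike the library example, the sketch keeps the raw column names (`Sample_ID`, `index`, `index2`).

```python
import csv
import io

# Illustrative V1 (IEM-style) sheet; sample names and indexes are placeholders.
V1_SHEET = """\
[Header]
IEMFileVersion,5
Experiment Name,MyRun_20240115
[Reads]
151
151
[Settings]
Adapter,CTGTCTCTTATACACATCT
[Data]
Sample_ID,Sample_Name,index,index2
S1,S1,ATTACTCG,TATAGCCT
S2,S2,TCCGGAGA,ATAGAGGC
"""


def split_sections(text: str) -> dict:
    """Group non-blank CSV rows under the [Section] line that precedes them."""
    sections: dict = {}
    current = None
    for row in csv.reader(io.StringIO(text)):
        cells = [c.strip() for c in row]
        if not any(cells):
            continue  # blank or all-comma separator row
        if cells[0].startswith("[") and cells[0].endswith("]"):
            current = cells[0][1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(cells)
    return sections


def infer_index_type(samples: list, i7: str = "index", i5: str = "index2") -> str:
    """Assumed rule: any second index -> dual, else any first index -> single."""
    if any(s.get(i5) for s in samples):
        return "dual"
    if any(s.get(i7) for s in samples):
        return "single"
    return "none"


def summarize_v1(text: str) -> dict:
    sections = split_sections(text)
    header = {r[0]: r[1] for r in sections.get("Header", []) if len(r) > 1}
    # V1 [Reads] lists one cycle count per line.
    read_lengths = [int(r[0]) for r in sections.get("Reads", [])]
    adapters = [r[1] for r in sections.get("Settings", [])
                if len(r) > 1 and r[0] in ("Adapter", "AdapterRead2")]
    columns, *rows = sections["Data"]
    samples = [dict(zip(columns, r)) for r in rows]
    return {
        "experiment_name": header.get("Experiment Name"),
        "read_lengths": read_lengths,
        "adapters": adapters,
        "samples": samples,
        "index_type": infer_index_type(samples),
    }
```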

V2 parser

```python
from samplesheet_parser import SampleSheetV2

sheet = SampleSheetV2("SampleSheet.csv")
sheet.parse()

print(sheet.reads)             # {"Read1Cycles": 151, "Read2Cycles": 151}
print(sheet.adapters)          # ["CTGTCTCTTATACACATCT"]
print(sheet.index_type())      # "dual"

for sample in sheet.samples():
    print(sample["Sample_ID"], sample["Index"], sample["Index2"])
```
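The V2 layout differs mainly in that [Reads] holds key/value pairs and the sample table lives in [BCLConvert_Data]. A minimal sketch of reading those sections, assuming adapters come from the AdapterRead1/AdapterRead2 settings and are de-duplicated in order (both are assumptions; `key_values` and `summarize_v2` are hypothetical helpers, and the sheet contents are placeholders):

```python
import csv
import io

# Illustrative V2 sheet; run name and index sequences are placeholders.
V2_SHEET = """\
[Header]
FileFormatVersion,2
RunName,MyRun_20240115
[Reads]
Read1Cycles,151
Read2Cycles,151
[BCLConvert_Settings]
AdapterRead1,CTGTCTCTTATACACATCT
AdapterRead2,CTGTCTCTTATACACATCT
OverrideCycles,Y151;I10;I10;Y151
[BCLConvert_Data]
Sample_ID,Index,Index2
S1,ATTACTCGAT,TATAGCCTGT
S2,TCCGGAGAGA,ATAGAGGCAC
"""


def split_sections(text: str) -> dict:
    """Group non-blank CSV rows under the [Section] line that precedes them."""
    sections: dict = {}
    current = None
    for row in csv.reader(io.StringIO(text)):
        cells = [c.strip() for c in row]
        if not any(cells):
            continue
        if cells[0].startswith("[") and cells[0].endswith("]"):
            current = cells[0][1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(cells)
    return sections


def key_values(rows: list) -> dict:
    """Turn two-column rows such as ['Read1Cycles', '151'] into a dict."""
    return {r[0]: r[1] for r in rows if len(r) > 1}


def summarize_v2(text: str) -> dict:
    sections = split_sections(text)
    # V2 [Reads] is key/value (Read1Cycles,151), unlike V1's bare numbers.
    reads = {k: int(v) for k, v in key_values(sections.get("Reads", [])).items()}
    settings = key_values(sections.get("BCLConvert_Settings", []))
    # Keep AdapterRead1/AdapterRead2 values, dropping duplicates in order.
    adapters = list(dict.fromkeys(
        v for k, v in settings.items() if k.startswith("AdapterRead")))
    columns, *rows = sections["BCLConvert_Data"]
    return {
        "reads": reads,
        "adapters": adapters,
        "override_cycles": settings.get("OverrideCycles"),
        "samples": [dict(zip(columns, r)) for r in rows],
    }
```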

UMI / OverrideCycles parsing

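The V2 OverrideCycles setting encodes the read structure, including UMI positions. Its semicolon-separated fields follow the order Read 1; Index 1; Index 2; Read 2, and each field is a run of Y (template), I (index), U (UMI), and N (skipped) cycle counts: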

| OverrideCycles | UMI length (bp) | UMI location |
| --- | --- | --- |
| `Y151;I10;I10;Y151` | 0 | n/a |
| `Y151;I10U9;I10;Y151` | 9 | `index2` |
| `U5Y146;I8;I8;U5Y146` | 5 | `read1` |

```python
# OverrideCycles: Y151;I10U9;I10;Y151 → 9 bp UMI in Index1
print(sheet.get_umi_length())       # 9
rs = sheet.get_read_structure()
print(rs.umi_location)              # "index2"
print(rs.read_structure)
# {"read1_template": 151, "index2_length": 10, "index2_umi": 9, ...}
```
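A tokenizer for strings like those in the table can be sketched with one regular expression. This assumes Illumina's documented field order (Read 1; Index 1; Index 2; Read 2) and reports the first UMI it finds, which reproduces the UMI lengths in the table; the location labels follow that field order and may not match the names SampleSheetV2 reports. `parse_override_cycles`, `find_umi`, and `UmiInfo` are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

# One token: Y = template, I = index, U = UMI, N = skipped cycles.
_TOKEN = re.compile(r"([YINU])(\d+)")
# Illumina's documented field order for OverrideCycles.
ILLUMINA_FIELDS = ("read1", "index1", "index2", "read2")


@dataclass(frozen=True)
class UmiInfo:
    length: int              # cycles in the first UMI token found (0 if none)
    location: Optional[str]  # field that holds it, or None


def parse_override_cycles(value: str) -> list[list[tuple[str, int]]]:
    """Split 'Y151;I10U9;I10;Y151' into per-field lists of (op, cycles)."""
    fields = []
    for field in value.split(";"):
        field = field.strip()
        tokens = _TOKEN.findall(field)
        # Reject fields that the token pattern does not consume completely.
        if not tokens or "".join(op + n for op, n in tokens) != field:
            raise ValueError(f"malformed OverrideCycles field: {field!r}")
        fields.append([(op, int(n)) for op, n in tokens])
    return fields


def find_umi(value: str, field_names: tuple = ILLUMINA_FIELDS) -> UmiInfo:
    """Return the first UMI token's length and the field it appears in."""
    for name, tokens in zip(field_names, parse_override_cycles(value)):
        for op, cycles in tokens:
            if op == "U":
                return UmiInfo(cycles, name)
    return UmiInfo(0, None)


for oc in ("Y151;I10;I10;Y151", "Y151;I10U9;I10;Y151", "U5Y146;I8;I8;U5Y146"):
    print(oc, find_umi(oc))
```

Passing a different `field_names` tuple relabels the fields without changing how the string is tokenized.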

Shared interface

Both SampleSheetV1 and SampleSheetV2 expose:

| Method / attribute | Returns | Description |
| --- | --- | --- |
| `parse(do_clean=True)` | `None` | Parse all sections |
| `samples()` | `list[dict]` | One record per unique sample |
| `index_type()` | `str` | `"dual"`, `"single"`, or `"none"` |
| `.adapters` | `list[str]` | Adapter sequences |
| `.experiment_name` | `str \| None` | Run/experiment name |