sevaht_utility.parsing
Text, CSV, and JSON parsing helpers.
The centerpiece is csv_load(), which streams rows from any
TextProvider into plain dictionaries or typed dataclass instances,
with optional column-name normalization (via
sevaht_utility.naming.NameStyle) and explicit field mapping for
awkward headers. Supporting utilities include get_text() /
open_text() for uniform text access, StringParser for
string-to-value conversion, and json5_load() for JSON with comments and
trailing commas.
- sevaht_utility.parsing.get_text(source: TextProvider) → str
Return the full text from any supported TextProvider.
- sevaht_utility.parsing.open_text(source: TextProvider) → Iterator[TextIO]
Yield a readable TextIO. Must always be used as a context manager.
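The Iterator[TextIO] return type suggests a generator-based context manager, which would explain why open_text() must be used in a with block. The sketch below is a standalone stand-in written only from this page, not the library's code. The name open_text_sketch is hypothetical, and so are the choices to treat a str as literal text (not a path), to read Path objects as UTF-8, and to leave caller-supplied streams open.

```python
import io
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, TextIO


@contextmanager
def open_text_sketch(source) -> Iterator[TextIO]:
    """Hypothetical stand-in: yield a readable text stream for several source kinds."""
    if isinstance(source, Path):
        # Files are opened here and closed when the with block exits.
        with source.open(encoding="utf-8") as handle:
            yield handle
    elif isinstance(source, str):
        # Assumption: a plain string is the text itself, not a file name.
        yield io.StringIO(source)
    elif hasattr(source, "read"):
        # An already-open stream is passed through and left open for its owner.
        yield source
    else:
        # Anything else is treated as an iterable of lines.
        yield io.StringIO("\n".join(source))


with open_text_sketch(["name,score", "Ada,95"]) as stream:
    text = stream.read()
```

Only the Path branch owns a resource, which is why a context manager is a convenient common shape: callers write the same with block regardless of the source kind.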
- sevaht_utility.parsing.parse_bool(value: str) → bool
Parse a string as a boolean.
- Parameters:
value – The string to parse.
- Returns:
True if value is (case-insensitively) one of "1", "true", or "yes"; otherwise False.
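To make the rule concrete, here is a minimal standalone sketch of the behavior described above. parse_bool_sketch is a hypothetical name, and this is a re-implementation from the description, not the library's source. Whether the real function strips surrounding whitespace is not stated, so the sketch does not.

```python
def parse_bool_sketch(value: str) -> bool:
    """Stand-in for parse_bool, written from its documented return rule."""
    # Case-insensitive membership test; every other string is False.
    return value.lower() in {"1", "true", "yes"}


accepted = [parse_bool_sketch(s) for s in ("1", "TRUE", "Yes")]
rejected = [parse_bool_sketch(s) for s in ("0", "on", "y", "")]
```

Under this rule, unrecognized spellings such as "on" or "y" fall through to False without raising an error, so validate the input first if a typo must not be read as False.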
- exception sevaht_utility.parsing.UnconsumedColumnsError(columns: Sequence[str])
Bases:
Exception
- exception sevaht_utility.parsing.NotADataclassError(obj: object)
Bases:
TypeError
Raised when an argument expected to be a dataclass is not one.
- exception sevaht_utility.parsing.ShortRowError(*, line_number: int, field_name: str, column_index: int, column_count: int)
Bases:
ValueError
Raised when a CSV row has too few columns to fill a mapped field.
- class sevaht_utility.parsing.DataMapping(column_names: Sequence[str] | None = None, field_to_column_name: Mapping[str, str] | None = None, field_to_column_index: Mapping[str, int] | None = None, name_style: NameStyle | None = None)
Bases:
object
How CSV columns map onto target fields in csv_load().
Every attribute is optional; an empty DataMapping lets csv_load match columns to fields by name. Provide attributes to override that matching for awkward or ambiguous headers. When several apply, the precedence (highest first) is field_to_column_index -> field_to_column_name -> field/parameter names -> dataclass metadata -> raw column names.
- column_names
Column names to use instead of reading a header row. Supply this when the data has no header, or to override/rename the existing header positionally.
- Type:
collections.abc.Sequence[str] | None
- field_to_column_name
Maps each target field to the source column name it should read. Use for headers whose text differs from the field name (e.g. {"user_id": "acct#"}).
- Type:
collections.abc.Mapping[str, str] | None
- field_to_column_index
Maps each target field to a zero-based column index. Highest precedence; use to disambiguate duplicate headers (e.g. two "a" columns) or to bypass name matching entirely.
- Type:
collections.abc.Mapping[str, int] | None
- name_style
When set, both source column names and target field names are normalized to this NameStyle before matching, so a camelCase header can feed a snake_case field. The target names are normalized too on purpose: it lets your dataclass keep idiomatic PEP 8 snake_case members no matter how the file is cased, instead of renaming fields to match the header. Normalization that makes two columns collide raises AmbiguousColumnNamesError.
- Type:
NameStyle | None
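The effect of name_style can be illustrated with a small standalone sketch that normalizes both sides before matching. Nothing here uses the NameStyle API, which this page does not show: to_snake and match_by_style are hypothetical helpers, the regex rules are an assumption about what snake_case normalization does, and ValueError stands in for AmbiguousColumnNamesError.

```python
import re


def to_snake(name: str) -> str:
    """Hypothetical normalizer: camelCase, spaces, and hyphens become snake_case."""
    # Put an underscore at each lower-to-upper boundary, then unify separators.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return re.sub(r"[\s\-]+", "_", spaced.strip()).lower()


def match_by_style(columns: list[str], fields: list[str]) -> dict[str, int]:
    """Map each field to a column index after normalizing both names."""
    by_canonical: dict[str, list[int]] = {}
    for index, column in enumerate(columns):
        by_canonical.setdefault(to_snake(column), []).append(index)
    resolved: dict[str, int] = {}
    for field_name in fields:
        indices = by_canonical.get(to_snake(field_name), [])
        if len(indices) > 1:
            # Stand-in for AmbiguousColumnNamesError: two columns collapsed
            # onto a name that a field needs.
            raise ValueError(f"columns {indices} all normalize to {field_name!r}")
        if indices:
            resolved[field_name] = indices[0]
    return resolved


resolved = match_by_style(["userId", "Full Name"], ["user_id", "full_name"])
```

In this sketch a collision only matters when a field actually needs the collided name, which follows the wording of the Raises entry for csv_load(). A collision of this kind is the case field_to_column_index is described as resolving.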
- exception sevaht_utility.parsing.AmbiguousColumnNamesError(*, canonical_name: str, columns: Sequence[tuple[int, str]])
Bases:
ValueError
- exception sevaht_utility.parsing.AmbiguousFieldMappingsError(*, canonical_name: str, fields: Sequence[str])
Bases:
ValueError
- exception sevaht_utility.parsing.ColumnIndexOutOfRangeError(*, field_name: str, column_index: int, column_count: int)
Bases:
ValueError
- class sevaht_utility.parsing.CsvLoadOptions(delimiter: str = ', ', field_metadata_key: str = 'csv_key', allow_column_subset: bool = True, string_parser: StringParser = <factory>)
Bases:
object
Tuning options for csv_load() (the how, not the what).
- field_metadata_key
Dataclass field-metadata key consulted for a custom column name, i.e. field(metadata={field_metadata_key: "Header"}).
- Type:
str
- allow_column_subset
If True (default), columns with no matching field are ignored. If False, an unmatched column raises UnconsumedColumnsError.
- Type:
bool
- string_parser
The StringParser used to convert cell strings to field types. Defaults to the shared StringParser.default() instance.
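The field_metadata_key option relies on the standard dataclasses metadata mechanism. The sketch below uses only the standard library to show how a per-field column name can be declared and read back with the default key 'csv_key'. The Account class, the "Account #" header, and the column_name_for helper are illustrative assumptions, not part of sevaht_utility.

```python
from dataclasses import Field, dataclass, field, fields


@dataclass
class Account:
    # The header text differs from the field name, so it is recorded in metadata.
    user_id: int = field(metadata={"csv_key": "Account #"})
    # No metadata: the field name itself is the expected column name.
    name: str = ""


def column_name_for(f: Field, metadata_key: str = "csv_key") -> str:
    """Hypothetical lookup: metadata entry if present, else the field name."""
    return f.metadata.get(metadata_key, f.name)


column_names = {f.name: column_name_for(f) for f in fields(Account)}
```

Keeping the header text in metadata leaves the attribute name free to follow normal Python naming, and changing field_metadata_key would let several loaders read different keys from the same dataclass.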
- class sevaht_utility.parsing.ColumnResolution(resolved_indices: 'Mapping[str, int]', ambiguous_columns: 'AmbiguousColumns', column_count: 'int', mapping: 'DataMapping')
Bases:
object
- sevaht_utility.parsing.csv_load(source: TextProvider, *, dataclass: None = None, init_function: None = None, mapping: DataMapping | None = None, options: CsvLoadOptions | None = None) → Iterator[dict[str, str]]
- sevaht_utility.parsing.csv_load(source: TextProvider, *, dataclass: None = None, init_function: Callable[[...], dict[str, object]], mapping: DataMapping | None = None, options: CsvLoadOptions | None = None) → Iterator[dict[str, object]]
- sevaht_utility.parsing.csv_load(source: TextProvider, *, dataclass: type[T], init_function: Callable[[...], T] | None = None, mapping: DataMapping | None = None, options: CsvLoadOptions | None = None) → Iterator[T]
Stream CSV rows as dictionaries or typed dataclass instances.
Rows are yielded lazily, so very large inputs are processed without being held in memory. Blank lines are skipped. With no dataclass the result is a dict per row; with a dataclass each row becomes an instance, its cells converted to the annotated field types by options.string_parser. A field type may define a from_string(cls, s) classmethod to control its own conversion.
Columns are matched to fields by name. Override that for awkward headers via mapping; the precedence, highest first, is:
1. mapping.field_to_column_index (explicit zero-based index)
2. mapping.field_to_column_name (explicit source column name)
3. init_function parameter names
4. Dataclass field metadata (options.field_metadata_key) or field name
5. Dict mode: the raw column names
When mapping.name_style is set, both source and target names are normalized to that style before matching (e.g. a camelCase header feeding a snake_case field).
- Parameters:
source – Any TextProvider (string, Path, open text stream, or list of lines).
dataclass – When given, each row is built into an instance of this type.
init_function – A factory called with the resolved field values instead of the dataclass constructor; its parameter names drive matching.
mapping – Column-to-field mapping overrides. See DataMapping.
options – Reader/conversion tuning. See CsvLoadOptions.
- Yields:
dict[str, str] per row in dict mode, or one dataclass instance per row otherwise.
- Raises:
NotADataclassError – dataclass is not a dataclass type.
AmbiguousColumnNamesError – Normalization collapses two columns onto one name that a field needs.
ColumnIndexOutOfRangeError – A field_to_column_index entry is out of range for the header.
ShortRowError – A row has too few columns to fill a mapped field.
UnconsumedColumnsError – options.allow_column_subset is False and a column matched no field.
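The precedence order documented for csv_load() can be sketched for a single field as follows. This is a simplified standalone illustration, not the library's resolver: resolve_field and its parameters are hypothetical, init_function parameter names and dict mode are left out, negative indices are assumed to be out of range, and ValueError stands in for ColumnIndexOutOfRangeError.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence


def resolve_field(
    field_name: str,
    header: Sequence[str],
    field_to_column_index: Mapping[str, int] | None = None,
    field_to_column_name: Mapping[str, str] | None = None,
    metadata_name: str | None = None,
) -> int | None:
    """Return the column index feeding field_name, or None if nothing matches."""
    # 1. An explicit index wins outright and never consults the header text.
    if field_to_column_index and field_name in field_to_column_index:
        index = field_to_column_index[field_name]
        if not 0 <= index < len(header):
            # Stand-in for ColumnIndexOutOfRangeError.
            raise ValueError(f"{field_name}: index {index} outside the header")
        return index
    # 2. Next, an explicit source column name.
    if field_to_column_name and field_name in field_to_column_name:
        wanted = field_to_column_name[field_name]
    else:
        # 3. Otherwise the metadata name if one exists, else the field name.
        wanted = metadata_name or field_name
    # Simplification: the first matching column is used; duplicates are not detected.
    return header.index(wanted) if wanted in header else None


header = ["a", "acct#", "a"]
by_name = resolve_field("user_id", header, field_to_column_name={"user_id": "acct#"})
by_index = resolve_field("second_a", header, field_to_column_index={"second_a": 2})
unmatched = resolve_field("missing", header)
```

The by_index case shows why the explicit index sits at the top of the order: with two "a" columns, no name-based rule can say which one is meant.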
Example
Dict mode reads the header and yields one dict per row:
>>> from sevaht_utility.parsing import csv_load
>>> list(csv_load(["name,score", "Ada,95"]))
[{'name': 'Ada', 'score': '95'}]
Dataclass mode converts cells to the annotated types:
>>> from dataclasses import dataclass
>>> @dataclass
... class Person:
...     name: str
...     score: int
>>> list(csv_load(["name,score", "Ada,95"], dataclass=Person))
[Person(name='Ada', score=95)]
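The from_string(cls, s) hook mentioned in the description of csv_load() can be pictured with the standalone sketch below. It does not call csv_load() or StringParser: convert_cell is a hypothetical converter that prefers from_string when the annotated type defines it, and Percent and Result are illustrative types.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Percent:
    value: float

    @classmethod
    def from_string(cls, s: str) -> "Percent":
        # Accept "95%" or "95" and store the value as a fraction.
        return cls(float(s.strip().rstrip("%")) / 100)


@dataclass
class Result:
    name: str
    score: Percent


def convert_cell(target_type: type, cell: str) -> object:
    """Hypothetical converter: use from_string if defined, else call the type."""
    hook = getattr(target_type, "from_string", None)
    return hook(cell) if callable(hook) else target_type(cell)


row = {"name": "Ada", "score": "95%"}
result = Result(**{f.name: convert_cell(f.type, row[f.name]) for f in fields(Result)})
```

Putting the parsing rule on the type keeps format knowledge (here, the optional percent sign) next to the value it produces, so every dataclass that uses Percent gets the same conversion.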