sevaht_utility.parsing

Text, CSV, and JSON parsing helpers.

The centerpiece is csv_load(), which streams rows from any TextProvider into plain dictionaries or typed dataclass instances, with optional column-name normalization (via sevaht_utility.naming.NameStyle) and explicit field mapping for awkward headers. Supporting utilities include get_text() / open_text() for uniform text access, StringParser for string-to-value conversion, and json5_load() for JSON with comments and trailing commas.

sevaht_utility.parsing.get_text(source: TextProvider) -> str

Return the full text from any supported TextProvider.

sevaht_utility.parsing.open_text(source: TextProvider) -> Iterator[TextIO]

Yield a readable TextIO. Must always be used as a context manager.
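As a rough illustration of why a context manager is needed here, the sketch below shows one way a uniform opener could be built from the standard library. It is not the library's implementation: open_text_sketch is a hypothetical name, only Path, list-of-lines, and already-open streams are handled, and leaving caller-owned streams open is an assumption.

```python
import io
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, TextIO


@contextmanager
def open_text_sketch(source) -> Iterator[TextIO]:
    """Hypothetical stand-in for open_text(); handles three source kinds."""
    if isinstance(source, Path):
        # Files opened here are closed again when the with-block exits.
        with source.open("r", encoding="utf-8") as handle:
            yield handle
    elif isinstance(source, list):
        # A list of lines is wrapped in an in-memory stream.
        yield io.StringIO("\n".join(source))
    else:
        # Assumed to be an already-open text stream owned by the caller,
        # so it is yielded as-is and left open afterwards.
        yield source


with open_text_sketch(["name,score", "Ada,95"]) as stream:
    print(stream.readline().rstrip())  # name,score
```

Because the opener may or may not own the underlying resource, the with-block is the only place where cleanup can be decided consistently.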

sevaht_utility.parsing.parse_bool(value: str) -> bool

Parse a string as a boolean.

Parameters:

value – The string to parse.

Returns:

True if value is (case-insensitively) one of "1", "true", or "yes"; otherwise False.
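The documented rule can be mirrored in a few lines. This is a sketch of the behaviour described above, not the library's code; parse_bool_sketch is a hypothetical name, and whether the real function strips surrounding whitespace is not stated, so the sketch does not.

```python
def parse_bool_sketch(value: str) -> bool:
    """Return True only for "1", "true", or "yes", ignoring case."""
    return value.lower() in {"1", "true", "yes"}


print(parse_bool_sketch("YES"))  # True
print(parse_bool_sketch("on"))   # False
```

Note the consequence of "otherwise False": unrecognized input such as "on" or an empty string yields False rather than raising.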

exception sevaht_utility.parsing.StringParserError(value: object)

Bases: TypeError

class sevaht_utility.parsing.StringParser

Bases: object

exception sevaht_utility.parsing.UnconsumedColumnsError(columns: Sequence[str])

Bases: Exception

exception sevaht_utility.parsing.NotADataclassError(obj: object)

Bases: TypeError

Raised when an argument expected to be a dataclass is not one.

exception sevaht_utility.parsing.ShortRowError(*, line_number: int, field_name: str, column_index: int, column_count: int)

Bases: ValueError

Raised when a CSV row has too few columns to fill a mapped field.

class sevaht_utility.parsing.DataMapping(column_names: Sequence[str] | None = None, field_to_column_name: Mapping[str, str] | None = None, field_to_column_index: Mapping[str, int] | None = None, name_style: NameStyle | None = None)

Bases: object

How CSV columns map onto target fields in csv_load().

Every attribute is optional; an empty DataMapping lets csv_load() match columns to fields by name. Provide attributes to override that matching for awkward or ambiguous headers. When several apply, the precedence (highest first) is field_to_column_index -> field_to_column_name -> init_function parameter names -> dataclass field metadata or field names -> raw column names (dict mode).

column_names

Column names to use instead of reading a header row. Supply this when the data has no header, or to override/rename the existing header positionally.

Type:

collections.abc.Sequence[str] | None

field_to_column_name

Maps each target field to the source column name it should read. Use for headers whose text differs from the field name (e.g. {"user_id": "acct#"}).

Type:

collections.abc.Mapping[str, str] | None

field_to_column_index

Maps each target field to a zero-based column index. Highest precedence; use to disambiguate duplicate headers (e.g. two "a" columns) or to bypass name matching entirely.

Type:

collections.abc.Mapping[str, int] | None

name_style

When set, both source column names and target field names are normalized to this NameStyle before matching, so a camelCase header can feed a snake_case field. The target names are normalized too on purpose: it lets your dataclass keep idiomatic PEP 8 snake_case members no matter how the file is cased, instead of renaming fields to match the header. Normalization that makes two columns collide raises AmbiguousColumnNamesError.

Type:

sevaht_utility.naming.NameStyle | None
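The effect of normalizing both sides, and how a collision arises, can be sketched with the standard library. Everything here is an assumption for illustration: to_snake_case is a hypothetical stand-in for whatever NameStyle does, and a plain ValueError stands in for AmbiguousColumnNamesError (which is documented as a ValueError subclass).

```python
import re
from collections import defaultdict


def to_snake_case(name: str) -> str:
    """Hypothetical normalizer: camelCase and spaced names to snake_case."""
    with_breaks = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return re.sub(r"[\s\-]+", "_", with_breaks).lower()


def normalize_header(header):
    """Map each normalized column name to its index, rejecting collisions."""
    seen = defaultdict(list)
    for index, column in enumerate(header):
        seen[to_snake_case(column)].append((index, column))
    collisions = {name: cols for name, cols in seen.items() if len(cols) > 1}
    if collisions:
        # Two source columns normalized to the same canonical name.
        raise ValueError(f"ambiguous columns after normalization: {collisions}")
    return {name: cols[0][0] for name, cols in seen.items()}


print(normalize_header(["userId", "Full Name"]))
# {'user_id': 0, 'full_name': 1}
```

With a header such as ["userId", "user_id"], both columns normalize to user_id and the sketch raises, which is the situation the collision error describes.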

exception sevaht_utility.parsing.AmbiguousColumnNamesError(*, canonical_name: str, columns: Sequence[tuple[int, str]])

Bases: ValueError

exception sevaht_utility.parsing.AmbiguousFieldMappingsError(*, canonical_name: str, fields: Sequence[str])

Bases: ValueError

exception sevaht_utility.parsing.ColumnIndexOutOfRangeError(*, field_name: str, column_index: int, column_count: int)

Bases: ValueError

class sevaht_utility.parsing.CsvLoadOptions(delimiter: str = ',', field_metadata_key: str = 'csv_key', allow_column_subset: bool = True, string_parser: StringParser = <factory>)

Bases: object

Tuning options for csv_load() (the how, not the what).

delimiter

Field delimiter passed to the underlying CSV reader.

Type:

str

field_metadata_key

Dataclass field-metadata key consulted for a custom column name, i.e. field(metadata={field_metadata_key: "Header"}).

Type:

str
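Field metadata is a standard dataclasses feature, so the lookup can be shown without the library. The sketch below assumes the default key "csv_key"; Account and column_name_for are hypothetical names, and falling back to the field name when the key is absent follows the precedence described for csv_load().

```python
from dataclasses import dataclass, field, fields


@dataclass
class Account:
    # The header text "acct#" is not a valid identifier, so it is
    # attached to the field as metadata under the "csv_key" key.
    user_id: int = field(metadata={"csv_key": "acct#"})
    name: str = ""


def column_name_for(f, metadata_key: str = "csv_key") -> str:
    """Return the custom column name if present, else the field name."""
    return f.metadata.get(metadata_key, f.name)


print({f.name: column_name_for(f) for f in fields(Account)})
# {'user_id': 'acct#', 'name': 'name'}
```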

allow_column_subset

If True (default), columns with no matching field are ignored. If False, an unmatched column raises UnconsumedColumnsError.

Type:

bool
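The two settings differ only in what happens to leftover columns. A minimal sketch of that check, with hypothetical names throughout (UnconsumedColumnsSketchError stands in for UnconsumedColumnsError, and the real check may work on indices rather than names):

```python
class UnconsumedColumnsSketchError(Exception):
    """Hypothetical stand-in for UnconsumedColumnsError."""


def check_columns_sketch(header, consumed, allow_column_subset=True):
    """Return the columns no field consumed, or raise in strict mode."""
    leftover = [name for name in header if name not in consumed]
    if leftover and not allow_column_subset:
        raise UnconsumedColumnsSketchError(leftover)
    return leftover


print(check_columns_sketch(["name", "score", "notes"], {"name", "score"}))
# ['notes']
```

Strict mode is useful when an unexpected column more likely signals a schema change than harmless extra data.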

string_parser

The StringParser used to convert cell strings to field types. Defaults to the shared StringParser.default() instance.

Type:

sevaht_utility.parsing.StringParser

class sevaht_utility.parsing.ColumnResolution(resolved_indices: Mapping[str, int], ambiguous_columns: AmbiguousColumns, column_count: int, mapping: DataMapping)

Bases: object

sevaht_utility.parsing.csv_load(source: TextProvider, *, dataclass: None = None, init_function: None = None, mapping: DataMapping | None = None, options: CsvLoadOptions | None = None) -> Iterator[dict[str, str]]
sevaht_utility.parsing.csv_load(source: TextProvider, *, dataclass: None = None, init_function: Callable[..., dict[str, object]], mapping: DataMapping | None = None, options: CsvLoadOptions | None = None) -> Iterator[dict[str, object]]
sevaht_utility.parsing.csv_load(source: TextProvider, *, dataclass: type[T], init_function: Callable[..., T] | None = None, mapping: DataMapping | None = None, options: CsvLoadOptions | None = None) -> Iterator[T]

Stream CSV rows as dictionaries or typed dataclass instances.

Rows are yielded lazily, so very large inputs are processed without being held in memory. Blank lines are skipped. With no dataclass the result is a dict per row; with a dataclass each row becomes an instance, its cells converted to the annotated field types by options.string_parser. A field type may define a from_string(cls, s) classmethod to control its own conversion.
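The from_string hook can be illustrated as follows. Percent is a hypothetical field type, and convert_cell_sketch is only an assumed approximation of how a converter might prefer the hook over calling the type directly; the actual StringParser logic is not shown in this reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Percent:
    value: float

    @classmethod
    def from_string(cls, s: str) -> "Percent":
        # Accept cells such as "50%" and store them as a fraction.
        return cls(float(s.rstrip("%")) / 100)


def convert_cell_sketch(target_type, cell: str):
    """Use target_type.from_string when it exists, else call the type."""
    hook = getattr(target_type, "from_string", None)
    if callable(hook):
        return hook(cell)
    return target_type(cell)


print(convert_cell_sketch(Percent, "50%"))  # Percent(value=0.5)
print(convert_cell_sketch(int, "95"))       # 95
```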

Columns are matched to fields by name. Override that for awkward headers via mapping; the precedence, highest first, is:

  1. mapping.field_to_column_index (explicit zero-based index)

  2. mapping.field_to_column_name (explicit source column name)

  3. init_function parameter names

  4. Dataclass field metadata (options.field_metadata_key) or field name

  5. Dict mode: the raw column names

When mapping.name_style is set, both source and target names are normalized to that style before matching (e.g. a camelCase header feeding a snake_case field).
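The first two precedence levels and the name-based fallback can be sketched as a single lookup. This is an illustration under assumptions, not the library's resolver: resolve_index_sketch is a hypothetical name, levels 3 to 5 are collapsed into a plain lookup by field name, and no ambiguity or range checking is done.

```python
from typing import Mapping, Optional, Sequence


def resolve_index_sketch(
    field_name: str,
    header: Sequence[str],
    field_to_column_index: Optional[Mapping[str, int]] = None,
    field_to_column_name: Optional[Mapping[str, str]] = None,
) -> int:
    """Pick the column index for one field, highest precedence first."""
    # 1. An explicit zero-based index wins outright.
    if field_to_column_index and field_name in field_to_column_index:
        return field_to_column_index[field_name]
    # 2. Next, an explicit source column name.
    if field_to_column_name and field_name in field_to_column_name:
        return list(header).index(field_to_column_name[field_name])
    # 3-5. Otherwise fall back to matching the field's own name.
    return list(header).index(field_name)


header = ["acct#", "name", "name"]
by_name = {"user_id": "acct#"}
print(resolve_index_sketch("user_id", header, field_to_column_name=by_name))  # 0
print(resolve_index_sketch("name", header, field_to_column_index={"name": 2}))  # 2
print(resolve_index_sketch("user_id", header, {"user_id": 1}, by_name))  # 1
```

The last call supplies both overrides for the same field; the index mapping takes effect, matching the order of the list above.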

Parameters:
  • source – Any TextProvider (string, Path, open text stream, or list of lines).

  • dataclass – When given, each row is built into an instance of this type.

  • init_function – A factory called with the resolved field values instead of the dataclass constructor; its parameter names drive matching.

  • mapping – Column-to-field mapping overrides. See DataMapping.

  • options – Reader/conversion tuning. See CsvLoadOptions.

Yields:

dict[str, str] per row in dict mode (dict[str, object] when init_function is given without a dataclass), or one dataclass instance per row otherwise.

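Raises:
  • UnconsumedColumnsError – If options.allow_column_subset is False and a column matches no field.

  • AmbiguousColumnNamesError – If normalization to mapping.name_style makes two columns collide.

  • ShortRowError – If a row has too few columns to fill a mapped field.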

Example

Dict mode reads the header and yields one dict per row:

>>> list(csv_load(["name,score", "Ada,95"]))
[{'name': 'Ada', 'score': '95'}]
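The lazy, blank-line-skipping behaviour of dict mode can be approximated with the standard csv module. This is a sketch, not how csv_load is implemented: iter_rows_sketch is a hypothetical name and it accepts only an iterable of lines rather than any TextProvider.

```python
import csv
from typing import Dict, Iterable, Iterator


def iter_rows_sketch(lines: Iterable[str], delimiter: str = ",") -> Iterator[Dict[str, str]]:
    """Yield one dict per data row without materializing the input."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to yield
    for row in reader:
        if not row:
            continue  # csv.reader returns [] for a blank line
        yield dict(zip(header, row))


rows = iter_rows_sketch(["name,score", "", "Ada,95", "Grace,98"])
print(next(rows))  # {'name': 'Ada', 'score': '95'}
```

Only the first data row has been read at this point; the remaining lines are consumed as the caller keeps iterating.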

Dataclass mode converts cells to the annotated types:

>>> from dataclasses import dataclass
>>> @dataclass
... class Person:
...     name: str
...     score: int
>>> list(csv_load(["name,score", "Ada,95"], dataclass=Person))
[Person(name='Ada', score=95)]

sevaht_utility.parsing.json5_load(source: TextProvider) -> JsonValue

Parse JSON with comments and trailing commas into JSON data.

Comments and trailing commas are stripped only outside of string literals, so contents such as "a,}" or "// not a comment" survive intact.
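One common way to strip only outside string literals is to let a regular expression match string literals first and hand them back unchanged. The sketch below uses that technique with the standard json module; it is an assumed approach for illustration, not the library's implementation, and loads_relaxed is a hypothetical name.

```python
import json
import re

# A string literal, a line comment, or a block comment. Strings come first
# in the alternation, so comment-like text inside them is never a comment.
_COMMENT_RE = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*|/\*.*?\*/', re.DOTALL)
# A string literal, or a comma followed only by whitespace before } or ].
_TRAILING_COMMA_RE = re.compile(r'"(?:\\.|[^"\\])*"|,(?=\s*[}\]])')


def _keep_strings(match: "re.Match[str]") -> str:
    """Return string literals unchanged and drop every other match."""
    text = match.group(0)
    return text if text.startswith('"') else ""


def loads_relaxed(text: str):
    """Parse JSON that may contain comments and trailing commas."""
    without_comments = _COMMENT_RE.sub(_keep_strings, text)
    without_commas = _TRAILING_COMMA_RE.sub(_keep_strings, without_comments)
    return json.loads(without_commas)


sample = '''{
  // a line comment
  "a": "a,}",  /* a block comment */
  "b": "// not a comment",
  "c": [1, 2, ],
}'''
print(loads_relaxed(sample))
# {'a': 'a,}', 'b': '// not a comment', 'c': [1, 2]}
```

Comments are removed before trailing commas so that a comma followed by a comment and then a closing bracket is still recognized as trailing.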