IO utilities

pasted._io

XYZ format serialization and deserialization helpers.

Public API

format_xyz(atoms, positions, charge, mult, metrics, prefix="") → str

Serialize a structure to an extended-XYZ string. The second line (the XYZ comment line) includes the prefix, charge, multiplicity, composition, and all metric values as key=value pairs.

parse_xyz(text) → list[tuple]

Parse one or more XYZ frames from text. Each frame is returned as an (atoms, positions, charge, mult, metrics) tuple. Blank lines between frames are silently skipped.

Notes

  • The extended-XYZ comment line written by format_xyz() is machine-readable: every field is a KEY=VALUE token with no spaces around the =, so downstream tools can recover each field by splitting on whitespace and then on the first =, without regular expressions.

  • parse_xyz() accepts files produced by any tool that writes standard XYZ (atom-count line, then free-form comment line, then N coordinate lines). Unrecognized comment-line content is ignored and does not raise an error.
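As a rough illustration of the first note, the sketch below pulls the raw KEY=VALUE pairs out of a comment line using only str.split() and str.partition(). The helper name comment_fields and the metric names metric_a and metric_b are made up for this example; they are not part of pasted.

```python
def comment_fields(comment: str) -> dict[str, str]:
    """Return the raw KEY=VALUE pairs found in an XYZ comment line."""
    fields: dict[str, str] = {}
    for token in comment.split():            # tokens are whitespace-separated
        key, sep, value = token.partition("=")
        if sep and key:                      # ignore tokens without '='
            fields[key] = value
    return fields


# Hypothetical comment line following the layout documented on this page.
line = "sample=1 mode=gas charge=+0 mult=1 comp=[C:1,H:4]  metric_a=1.2345  metric_b=nan"
fields = comment_fields(line)
print(fields["charge"], fields["comp"], float(fields["metric_a"]))
```

Values are kept as strings here so the caller decides how to convert them; float() accepts the nan placeholder directly.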

pasted._io.format_xyz(atoms: list[str], positions: list[Vec3], charge: int, mult: int, metrics: dict[str, float], prefix: str = '') → str

Serialise a structure to the extended XYZ format.

The second line (comment line) encodes prefix, charge, multiplicity, composition, and all metric values.

Parameters:
  • atoms – Element symbols.

  • positions – Cartesian coordinates (Å), one per atom.

  • charge – Total system charge.

  • mult – Spin multiplicity 2S+1.

  • metrics – Dict of computed disorder metrics.

  • prefix – Prepended to the comment line (e.g. "sample=1 mode=gas").

Return type:

A multi-line string (no trailing newline).
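To make the output shape concrete, here is a minimal stand-alone sketch that builds a string with the same overall layout. It is an illustrative re-implementation under assumptions, not the code in pasted._io: the name sketch_format_xyz, the coordinate column widths, the single-space token separator, and the metric names are all hypothetical.

```python
from collections import Counter

Vec3 = tuple[float, float, float]


def sketch_format_xyz(
    atoms: list[str],
    positions: list[Vec3],
    charge: int,
    mult: int,
    metrics: dict[str, float],
    prefix: str = "",
) -> str:
    """Build an extended-XYZ-style frame (illustration only)."""
    # Composition as sorted Element:count pairs, e.g. comp=[H:2,O:1].
    counts = Counter(atoms)
    comp = ",".join(f"{el}:{n}" for el, n in sorted(counts.items()))

    parts = [prefix] if prefix else []
    parts += [f"charge={charge:+d}", f"mult={mult}", f"comp=[{comp}]"]
    # Four decimal places; a NaN value is rendered as 'nan'.
    parts += [f"{key}={value:.4f}" for key, value in metrics.items()]

    lines = [str(len(atoms)), " ".join(parts)]
    for el, (x, y, z) in zip(atoms, positions):
        lines.append(f"{el:<2s} {x:12.6f} {y:12.6f} {z:12.6f}")
    return "\n".join(lines)                  # no trailing newline


text = sketch_format_xyz(
    ["O", "H", "H"],
    [(0.0, 0.0, 0.1), (0.0, 0.76, -0.47), (0.0, -0.76, -0.47)],
    charge=0,
    mult=1,
    metrics={"metric_a": 0.5, "metric_b": float("nan")},
    prefix="sample=1 mode=gas",
)
print(text)
```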

pasted._io.parse_xyz(text: str) → list[tuple[list[str], list[Vec3], int, int, dict[str, float]]]

Parse a (possibly multi-frame) XYZ string — standard or extended format.

Supports both:

  • Standard XYZ — atom count line, comment line, then coordinate lines. charge defaults to 0, mult to 1, metrics is empty.

  • Extended XYZ (as written by PASTED) — the comment line may contain charge=+0, mult=1, and KEY=VALUE metric tokens.

Parameters:

text – Full contents of one or more XYZ frames (concatenated).

Return type:

list of (atoms, positions, charge, mult, metrics) tuples, one per frame.

Raises:

ValueError – When the atom-count line or a coordinate line cannot be parsed.
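The frame layout described above (atom-count line, comment line, then N coordinate lines) can be sketched as a small splitter. This is a hedged illustration rather than the parser shipped in pasted._io: split_frames is a hypothetical name, and the wording of the coordinate-line error is an assumption.

```python
Vec3 = tuple[float, float, float]


def split_frames(text: str) -> list[tuple[str, list[str], list[Vec3]]]:
    """Split concatenated XYZ text into (comment, atoms, positions) frames."""
    lines = text.splitlines()
    frames = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():             # blank line between frames
            i += 1
            continue
        try:
            n_atoms = int(lines[i].strip())
        except ValueError:
            raise ValueError(
                f"Expected atom count on line {i + 1}, got {lines[i]!r}"
            ) from None
        # The comment line is taken by position, so it may be empty.
        comment = lines[i + 1] if i + 1 < len(lines) else ""
        atoms: list[str] = []
        positions: list[Vec3] = []
        for j in range(i + 2, i + 2 + n_atoms):
            if j >= len(lines):
                raise ValueError("Unexpected end of input inside a frame")
            fields = lines[j].split()
            try:
                symbol = fields[0]
                x, y, z = (float(v) for v in fields[1:4])
            except (IndexError, ValueError):
                raise ValueError(
                    f"Cannot parse coordinate line {j + 1}: {lines[j]!r}"
                ) from None
            atoms.append(symbol)
            positions.append((x, y, z))
        frames.append((comment, atoms, positions))
        i += 2 + n_atoms
    return frames


two_frames = "1\nfirst\nH 0 0 0\n\n2\n\nO 0 0 0\nH 0 0 1\n"
frames = split_frames(two_frames)
print(len(frames), frames[1][1])
```

Extracting charge, mult, and metrics from each comment string is a separate step and is deliberately left out of this sketch.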

Note

Extended XYZ comment-line format.

The comment line written by PASTED follows this structure:

sample=N mode=M charge=+Q mult=M comp=[El1:n1,El2:n2,...]  KEY1=V1  KEY2=V2  ...

comp= encodes the composition as a sorted comma-separated list of Element:count pairs. All metric keys from ALL_METRICS appear in order; nan is written for any metric that could not be computed. Metric values are formatted to 4 decimal places.

parse_xyz() extracts charge, mult, and any KEY=FLOAT tokens from this line. Unknown keys are silently ignored, making the format forward-compatible with future metric additions.
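The extraction step can be sketched as follows. This is one possible reading of the behavior described above, not the actual implementation: parse_comment is a hypothetical name, KNOWN_METRICS is a made-up stand-in for ALL_METRICS, and filtering against a fixed list of known keys is an assumption about how unknown keys are ignored.

```python
import math

# Hypothetical stand-in for ALL_METRICS; the real metric names are not listed here.
KNOWN_METRICS = ("metric_a", "metric_b")


def parse_comment(comment: str) -> tuple[int, int, dict[str, float]]:
    """Extract charge, mult, and known KEY=FLOAT metrics from a comment line."""
    charge, mult = 0, 1                      # defaults for standard XYZ
    metrics: dict[str, float] = {}
    for token in comment.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue                         # free-form word, not KEY=VALUE
        try:
            if key == "charge":
                charge = int(value)          # int() accepts a leading sign
            elif key == "mult":
                mult = int(value)
            elif key in KNOWN_METRICS:
                metrics[key] = float(value)  # float() accepts 'nan'
        except ValueError:
            continue                         # unparseable value: skip silently
    return charge, mult, metrics


charge, mult, metrics = parse_comment(
    "sample=1 mode=gas charge=-1 mult=2 comp=[C:1,H:4]  "
    "metric_a=0.1234  metric_b=nan  future_metric=9.0"
)
print(charge, mult, metrics["metric_a"], math.isnan(metrics["metric_b"]))
```

A plain free-form comment such as "water molecule" yields the defaults (0, 1, {}), which matches the standard-XYZ case described earlier.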

High-level helpers

For most use-cases the higher-level methods on Structure are more convenient than calling format_xyz() and parse_xyz() directly:

Method / function               Description
Structure.to_xyz()              Serialise one structure to an extended XYZ string in memory.
Structure.write_xyz()           Write or append one frame to a file.
pasted._generator.read_xyz()    Load all frames from a file or raw string and return list[Structure], with metrics recomputed by default.

Note

Bug fix: read_xyz() now raises FileNotFoundError for missing paths (v0.4.0).

Prior to v0.4.0, calling read_xyz("missing.xyz") with a newline-free string that does not name an existing file fell through the path-existence check. The function then tried to parse the path string itself as XYZ text and raised a confusing ValueError: Expected atom count on line 1, got 'missing.xyz'. It now raises FileNotFoundError (or IsADirectoryError for directory paths), matching the behavior of from_xyz().
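The corrected dispatch can be pictured with the sketch below. It is an assumption-laden illustration, not the code in pasted._generator: load_xyz_text is a hypothetical helper, and treating every newline-free string as a path is inferred from the description above.

```python
from pathlib import Path


def load_xyz_text(source: str) -> str:
    """Return XYZ text given either raw text or a file path (illustration)."""
    if "\n" in source:
        return source                        # multi-line input: raw XYZ text
    path = Path(source)                      # newline-free input: treat as a path
    if path.is_dir():
        raise IsADirectoryError(source)
    if not path.exists():
        raise FileNotFoundError(source)
    return path.read_text()


try:
    load_xyz_text("missing.xyz")             # assumed not to exist on disk
except FileNotFoundError as exc:
    print("not found:", exc)
```

Checking for a path before attempting to parse keeps the error specific to the real problem instead of surfacing a parser message about the atom-count line.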