StructureGenerator & generate()

class pasted._generator.StructureGenerator(config: GeneratorConfig | None = None, *, n_atoms: int | None = None, **kwargs: Any)[source]#

Bases: object

Generate random atomic structures with disorder metrics.

All parameters use Python snake_case names that correspond 1-to-1 with their CLI --flag counterparts.

Parameters:

n_atoms – Number of atoms per structure (before optional H augmentation).
charge – Total system charge (applied to every structure).
mult – Spin multiplicity 2S+1.
mode – Placement mode: "gas" (default), "chain", "shell", or "maxent".
region – Bounding-region spec: "sphere:R" | "box:L" | "box:LX,LY,LZ". Required when mode is "gas" or "maxent"; ignored for "chain" and "shell" (those modes use their own geometry parameters such as shell_radius and bond_range). Example: region="sphere:8" places atoms inside an 8 Å-radius sphere.
branch_prob – [chain] Branching probability (default: 0.3).
chain_persist – [chain] Directional persistence ∈ [0, 1] (default: 0.5).
chain_bias – [chain] Global-axis drift strength ∈ [0, 1] (default: 0.0). The direction of the first bond becomes the bias axis; each subsequent step is blended toward that axis before normalization. 0.0 → no bias (backwards-compatible); higher values produce more elongated structures with larger shape_aniso.
bond_range – [chain / shell tails] Bond-length range in Å (default: (1.2, 1.6)).
center_z – [shell] Atomic number of center atom. None → random per sample.
coord_range – [shell] Coordination-number range (default: (4, 8)).
shell_radius – [shell] Shell-radius range in Å (default: (1.8, 2.5)).
elements –
Element pool. Three forms are accepted:
- Atomic-number spec string — a comma-separated list of integers and/or integer ranges, e.g. "6,7,8" (C, N, O) or "1-30" (H to Zn) or "1-10,26,28" (H–Ne plus Fe and Ni). Ranges are inclusive. Symbol strings such as "C,N,O" are not accepted and will raise ValueError; use the numeric form "6,7,8" or pass a list instead.
- Explicit list of element symbols, e.g. ["C", "N", "O"] or ["Cr", "Mn", "Fe", "Co", "Ni"]. Each symbol must be a valid IUPAC element symbol of at most two characters that PASTED recognises.
- None — all Z = 1–106 (default).
element_fractions – Relative sampling weights for elements in the pool, as a {symbol: weight} dict (e.g. {"C": 0.5, "N": 0.3, "O": 0.2}). Weights are relative — they are normalized internally and need not sum to 1. Elements absent from the dict receive a weight of 1.0. When None (default), every element in the pool is sampled with equal probability.
element_min_counts – Minimum number of atoms per element guaranteed in every generated structure (e.g. {"C": 2, "N": 1}). The required atoms are placed first; remaining slots are filled by weighted random sampling. None (default) → no lower bounds. The sum of all minimum counts must not exceed n_atoms.
element_max_counts –
Maximum number of atoms allowed per element (e.g. {"N": 5, "O": 3}). Elements that have reached their cap are excluded from sampling for the remaining slots. None (default) → no upper bounds.

Note

When both element_min_counts and element_max_counts are given, each element’s min must be ≤ its max.

Note

The automatic hydrogen augmentation step (add_hydrogen=True) runs after the constrained sampling and may temporarily exceed element_max_counts for H. Set add_hydrogen=False if H count limits are critical.
cov_scale – Minimum-distance scale factor: d_min(i,j) = cov_scale × (r_i + r_j) using Pyykkö (2009) single-bond covalent radii. Default: 1.0.
relax_cycles – Maximum repulsion-relaxation iterations (default: 1500).
add_hydrogen – Automatically append H atoms when H is in the pool but the sampled composition contains none (default: True).
affine_strength –
Global dimensionless scale of the affine transformation applied to every generated structure before relax_positions() (default: 0.0 = disabled). When > 0 a random stretch/compress + shear is applied once per structure, creating more anisotropic initial geometries before the repulsion-relaxation step. Practical range: 0.05–0.4. At 0.1 the structure is stretched / compressed by up to ±10 % along a random axis and sheared by up to ±5 %. Works identically across all placement modes (gas, chain, shell, maxent). 0.0 preserves the behavior of all versions prior to v0.2.3.

Use affine_stretch, affine_shear, and affine_jitter to override individual operation strengths independently.
affine_stretch – Strength of the stretch/compress operation only ∈ (0, 1). When None (default) affine_strength is used. Set to 0.0 to disable stretching while keeping shear and jitter active.
affine_shear – Strength of the shear operation only ∈ (0, 1). When None (default) affine_strength is used. Set to 0.0 to disable shearing while keeping stretch and jitter active.
affine_jitter – Per-atom jitter scale ∈ (0, 1) relative to the move step. When None (default) affine_strength is used. For StructureGenerator the move step is always 0.0, so jitter is never applied during generation regardless of this value; the parameter exists for symmetry with StructureOptimizer.
n_samples – Maximum number of placement attempts (default: 1). Use 0 to allow unlimited attempts (only valid when n_success is also set, otherwise a ValueError is raised).
n_success –
Target number of structures that must pass all filters before generation stops (default: None).
- None → make exactly n_samples attempts and return every structure that passed (original behavior).
- N > 0 with n_samples > 0 → stop as soon as N structures pass or n_samples attempts are exhausted, whichever comes first. Returns the structures collected so far with a warning if fewer than N were found.
- N > 0 with n_samples = 0 → unlimited attempts; stop only when N structures have passed.
seed – Random seed for reproducibility (None → non-deterministic).
n_bins – Histogram bins for H_spatial and RDF_dev (default: 20).
w_atom – Weight of H_atom in H_total (default: 0.5).
w_spatial – Weight of H_spatial in H_total (default: 0.5).
cutoff – Distance cutoff in Å for Steinhardt and graph metrics. None → auto-computed as cov_scale × 1.5 × median(r_i + r_j) over the element pool.
filters – Filter strings of the form "METRIC:MIN:MAX" (use "-" for an open bound). Only structures satisfying all filters are returned.
verbose – Print progress and statistics to stderr (default: False). The CLI always passes True; library callers usually leave it off.
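The composition constraints above (element_fractions, element_min_counts, element_max_counts) can be pictured with a short pure-Python sketch. It does not use PASTED; sample_composition is a hypothetical helper, and the exact sampling order inside the library is an assumption. It only mirrors the documented rules: required atoms are placed first, absent weights default to 1.0, and capped elements drop out of the pool.

```python
import random
from collections import Counter


def sample_composition(pool, n_atoms, rng, fractions=None,
                       min_counts=None, max_counts=None):
    """Hypothetical helper: draw n_atoms symbols from pool under count bounds."""
    fractions = fractions or {}
    min_counts = min_counts or {}
    max_counts = max_counts or {}
    if sum(min_counts.values()) > n_atoms:
        raise ValueError("sum of minimum counts exceeds n_atoms")
    for sym, lo in min_counts.items():
        if sym in max_counts and lo > max_counts[sym]:
            raise ValueError(f"min count for {sym} exceeds its max count")

    # Required atoms are placed first.
    counts = Counter(min_counts)
    atoms = [sym for sym, k in min_counts.items() for _ in range(k)]

    # Remaining slots: weighted sampling over elements still below their cap.
    while len(atoms) < n_atoms:
        open_syms = [s for s in pool if counts[s] < max_counts.get(s, n_atoms)]
        if not open_syms:
            raise ValueError("max counts are too tight to reach n_atoms")
        # Weights are relative; elements absent from the dict get 1.0.
        weights = [fractions.get(s, 1.0) for s in open_syms]
        sym = rng.choices(open_syms, weights=weights, k=1)[0]
        atoms.append(sym)
        counts[sym] += 1
    return atoms


rng = random.Random(0)
atoms = sample_composition(
    ["C", "N", "O"], 10, rng,
    fractions={"C": 0.5, "N": 0.3, "O": 0.2},
    min_counts={"C": 2, "N": 1},
    max_counts={"N": 5, "O": 3},
)
print(Counter(atoms))
```

Because random.choices normalizes relative weights itself, the weights in the sketch need not sum to 1, which matches the documented behavior of element_fractions.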

Examples

Class API (config-based, recommended):

from pasted import GeneratorConfig, StructureGenerator

cfg = GeneratorConfig(
    n_atoms=12, charge=0, mult=1,
    mode="gas", region="sphere:9",
    elements="1-30", n_samples=50, seed=42,
    filters=["H_total:2.0:-"],
)
gen = StructureGenerator(cfg)
structures = gen.generate()
for s in structures:
    print(s)
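The region string used in this example follows the "sphere:R" | "box:L" | "box:LX,LY,LZ" grammar. As a rough illustration of that grammar (not PASTED's own parser), the sketch below splits such a string into a shape name and dimensions in Å. parse_region is a hypothetical name, and treating "box:L" as a cube of edge L is an assumption.

```python
def parse_region(spec: str):
    """Hypothetical parser for 'sphere:R', 'box:L' and 'box:LX,LY,LZ'."""
    kind, _, args = spec.partition(":")
    # float("") raises ValueError, so a missing size is rejected here.
    values = [float(v) for v in args.split(",")]
    if any(v <= 0 for v in values):
        raise ValueError(f"region sizes must be positive: {spec!r}")
    if kind == "sphere" and len(values) == 1:
        return ("sphere", (values[0],))
    if kind == "box" and len(values) == 1:
        # Assumption: a single length means a cubic box.
        return ("box", (values[0],) * 3)
    if kind == "box" and len(values) == 3:
        return ("box", tuple(values))
    raise ValueError(f"unrecognised region spec: {spec!r}")


print(parse_region("sphere:9"))   # ('sphere', (9.0,))
print(parse_region("box:4,5,6"))  # ('box', (4.0, 5.0, 6.0))
```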

Functional API (keyword-based, backward-compatible):

from pasted import generate

structures = generate(
    n_atoms=12, charge=0, mult=1,
    mode="chain", elements="6,7,8",
    n_samples=20, seed=0,
)

__iter__() → Iterator[Structure][source]#: Iterate over generated structures (delegates to stream()).

__repr__() → str[source]#: Return repr(self).

property config: GeneratorConfig#: The GeneratorConfig that was used to construct this generator.

property cutoff: float#: Distance cutoff in Å used for Steinhardt and graph metrics.
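When cutoff is left as None, the documented default is cov_scale × 1.5 × median(r_i + r_j) over the element pool, and the minimum pair distance is d_min(i,j) = cov_scale × (r_i + r_j). The sketch below evaluates both formulas outside PASTED. The radii table is a small illustrative subset of Pyykkö single-bond radii, the function names are hypothetical, and taking the median over all unordered element pairs (including same-element pairs) is an assumption about how the pool is enumerated.

```python
from itertools import combinations_with_replacement
from statistics import median

# Illustrative subset of single-bond covalent radii in Å.
RADII = {"H": 0.32, "C": 0.75, "N": 0.71, "O": 0.63}


def d_min(sym_i, sym_j, cov_scale=1.0):
    """Minimum allowed distance between two atoms, per the documented formula."""
    return cov_scale * (RADII[sym_i] + RADII[sym_j])


def auto_cutoff(pool, cov_scale=1.0):
    """Default cutoff: cov_scale * 1.5 * median(r_i + r_j)."""
    # Assumption: every unordered pair of pool elements counts once.
    pair_sums = [RADII[a] + RADII[b]
                 for a, b in combinations_with_replacement(pool, 2)]
    return cov_scale * 1.5 * median(pair_sums)


pool = ["C", "N", "O"]
print(f"d_min(C, O) = {d_min('C', 'O'):.2f} Å")
print(f"auto cutoff = {auto_cutoff(pool):.2f} Å")
```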

property element_pool: list[str]#: A copy of the resolved element pool (list of symbols).
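The pool is resolved from the elements argument, whose string form is a comma-separated list of atomic numbers and inclusive ranges. A minimal sketch of that expansion step follows. parse_atomic_number_spec is a hypothetical name rather than a PASTED function, and the 1–106 bound is taken from the documented default pool.

```python
def parse_atomic_number_spec(spec: str) -> list[int]:
    """Expand '6,7,8' or '1-10,26,28' into a sorted list of atomic numbers."""
    numbers = set()
    for token in spec.split(","):
        token = token.strip()
        if "-" in token:
            lo_text, hi_text = token.split("-", 1)
            lo, hi = int(lo_text), int(hi_text)
            if lo > hi:
                raise ValueError(f"descending range: {token!r}")
            numbers.update(range(lo, hi + 1))  # ranges are inclusive
        else:
            # int("C") raises ValueError, so symbol strings are rejected.
            numbers.add(int(token))
    if any(z < 1 or z > 106 for z in numbers):
        raise ValueError("atomic numbers must lie in 1-106")
    return sorted(numbers)


print(parse_atomic_number_spec("6,7,8"))       # [6, 7, 8]
print(parse_atomic_number_spec("1-10,26,28"))  # H to Ne plus Fe and Ni
```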

generate() → GenerationResult[source]#

Generate structures and return a GenerationResult.

Collects all structures yielded by the internal generation loop, attaches generation metadata (attempt counts, rejection breakdowns), and returns a GenerationResult that behaves like a list[Structure] in all normal usage while also carrying the diagnostics needed for automated pipelines.

Run statistics (n_attempted, n_passed, etc.) are obtained directly from _stream_with_stats() rather than via a shared instance variable, so there is no hidden coupling between stream() and generate(). Calling one does not affect the other, and partial iteration of stream() cannot leave stale counters for a subsequent generate() call.

GenerationResult supports the full list interface (indexing, iteration, len, bool) so existing code that does result[0] or for s in result continues to work without modification.

Warnings are also emitted via warnings.warn() (category UserWarning) when:

Any attempts are rejected by the charge/multiplicity parity check.
No structures pass the metric filters.
The attempt budget is exhausted before n_success is reached.
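For readers unfamiliar with the parity check named in the first item, the usual rule is that the electron count and the number of unpaired electrons (mult − 1) must have the same parity. The sketch below states that rule in plain Python. parity_ok is a hypothetical helper, and whether PASTED applies exactly this test is an assumption.

```python
def parity_ok(atomic_numbers, charge, mult):
    """Return True if charge and multiplicity are compatible with the composition."""
    n_electrons = sum(atomic_numbers) - charge
    n_unpaired = mult - 1
    # Unpaired electrons cannot exceed the total, and the remainder must pair up.
    return n_electrons >= n_unpaired >= 0 and (n_electrons - n_unpaired) % 2 == 0


water = [8, 1, 1]     # 10 electrons when neutral
hydroxyl = [8, 1]     # 9 electrons when neutral

print(parity_ok(water, charge=0, mult=1))     # True
print(parity_ok(water, charge=0, mult=2))     # False
print(parity_ok(hydroxyl, charge=0, mult=2))  # True
```

A randomly sampled composition with an odd electron count can never satisfy mult=1, which is why a fixed charge and mult can reject a share of attempts.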

Each call creates a fresh random.Random seeded with self._cfg.seed, so repeated calls with the same seed are reproducible.

Returns:: Wraps the list of passing structures together with generation metadata. Use result.structures for the raw list or result.summary() for a one-line diagnostic string.
Return type:: GenerationResult

Examples

Drop-in list usage:

result = gen.generate()
for s in result:
    print(s.to_xyz())

Metadata access:

result = gen.generate()
if result.n_rejected_parity > 0:
    print(result.summary())

stream() → Iterator[Structure][source]#

Generate structures one by one, yielding each that passes all filters.

Unlike generate(), structures are yielded immediately as they pass, so callers can write output or stop early without waiting for all attempts to complete.

Respects both n_samples (maximum attempts) and n_success (target number of passing structures):

If n_success is set, the iterator stops as soon as that many structures have been yielded — even if n_samples attempts have not been exhausted.
If n_samples is 0 (unlimited), the iterator runs until n_success structures have been yielded.
If n_samples attempts are exhausted before n_success is reached, a warning is emitted to stderr and the iterator ends.
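The interaction of the two limits can be sketched as a plain loop. This is not PASTED's implementation: collect_passing and the attempt callable are hypothetical, and the warning is raised through warnings.warn (which prints to stderr by default) as a stand-in for whatever the library emits.

```python
import itertools
import warnings


def collect_passing(attempt, n_samples=1, n_success=None):
    """Run attempt(i) under an attempt budget and an optional success target."""
    if n_samples == 0 and n_success is None:
        raise ValueError("n_samples=0 (unlimited) requires n_success")
    # n_samples == 0 means no upper bound on attempts.
    indices = itertools.count() if n_samples == 0 else range(n_samples)
    passed = []
    for i in indices:
        result = attempt(i)  # None stands for a rejected attempt
        if result is None:
            continue
        passed.append(result)
        if n_success is not None and len(passed) >= n_success:
            break  # target reached before the budget ran out
    if n_success is not None and len(passed) < n_success:
        warnings.warn(
            f"budget exhausted: {len(passed)} of {n_success} found", UserWarning
        )
    return passed


def every_third(i):
    # Toy attempt: only indices divisible by 3 "pass the filters".
    return i if i % 3 == 0 else None


print(collect_passing(every_third, n_samples=10))               # [0, 3, 6, 9]
print(collect_passing(every_third, n_samples=10, n_success=2))  # [0, 3]
print(collect_passing(every_third, n_samples=0, n_success=5))   # [0, 3, 6, 9, 12]
```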

Each call creates a fresh random.Random seeded with self._cfg.seed, so repeated calls with the same seed are reproducible.

Yields:: Structure – Each structure that passed all filters, in generation order.

Examples

Write structures to a file as they are found:

gen = StructureGenerator(
    n_atoms=12, charge=0, mult=1,
    mode="gas", region="sphere:9",
    elements="1-30", n_success=10, n_samples=500, seed=42,
)
for s in gen.stream():
    s.write_xyz("out.xyz")

pasted._generator.generate(config: GeneratorConfig | None = None, *, n_atoms: int | None = None, charge: int | None = None, mult: int | None = None, **kwargs: Any) → GenerationResult[source]#

Create a StructureGenerator and immediately call generate().

Two calling conventions are supported:

Config-based (recommended for new code): Pass a GeneratorConfig as the first positional argument. Provides full mypy / IDE type-checking on every field:

from pasted import generate, GeneratorConfig

cfg = GeneratorConfig(n_atoms=10, charge=0, mult=1,
                      mode="gas", region="sphere:8",
                      elements="6,7,8", n_samples=20, seed=0)
result = generate(cfg)

Keyword-based (backward-compatible, original API): Pass all parameters as keyword arguments. n_atoms, charge, and mult are required; all others are optional:

result = generate(
    n_atoms=10, charge=0, mult=1,
    mode="gas", region="sphere:8",
    elements="6,7,8", n_samples=20, seed=0,
)

The two forms cannot be mixed: if config is given, all other keyword arguments are ignored.

Parameters:

config – A fully-populated GeneratorConfig instance. When given, all other keyword arguments are ignored.
n_atoms – Number of atoms per structure (required when config is None).
charge – Total system charge (required when config is None).
mult – Spin multiplicity 2S+1 (required when config is None).
**kwargs – Any optional GeneratorConfig field, e.g. mode, region, elements, n_samples, seed, filters, affine_strength, … Ignored when config is provided.

Returns:

A list-compatible object containing the structures that passed all filters plus metadata about the generation run.

Return type:

GenerationResult

pasted._generator.read_xyz(source: str | Path, *, recompute_metrics: bool = True, cutoff: float | None = None, n_bins: int = 20, w_atom: float = 0.5, w_spatial: float = 0.5, cov_scale: float = 1.0) → list[Structure][source]#

Read one or more structures from an XYZ file or string.

Convenience wrapper around Structure.from_xyz() that reads all frames from a (possibly multi-frame) XYZ source and returns them as a list. Both plain XYZ and PASTED extended XYZ are supported.

Parameters:

source – Path to an XYZ file or a raw XYZ string.
recompute_metrics – Recompute all disorder metrics after loading each structure (default: True).
cutoff – Distance cutoff (Å) for metric computation. Auto-computed from each structure’s element pool when None.
n_bins – Histogram bins for H_spatial / RDF_dev (default: 20).
w_atom – Weight of H_atom in H_total (default: 0.5).
w_spatial – Weight of H_spatial in H_total (default: 0.5).
cov_scale – Minimum distance scale factor used for metrics (default: 1.0).

Returns:

One Structure per frame, in file order.

Return type:

list[Structure]

Raises:

FileNotFoundError – When source looks like a file path (no newlines) but the path does not exist on disk.
IsADirectoryError – When source is a path that points to a directory.
ValueError – When the XYZ content cannot be parsed.
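To make the multi-frame behavior concrete, here is a small sketch of how a concatenated plain-XYZ string can be cut into frames (atom count, comment line, then one line per atom). It is independent of PASTED; split_xyz_frames is a hypothetical helper and the error messages are illustrative only.

```python
def split_xyz_frames(text: str) -> list[list[str]]:
    """Split concatenated XYZ text into frames of 2 + n_atoms lines each."""
    lines = text.strip().splitlines()
    frames = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate blank lines between frames
            continue
        try:
            n_atoms = int(lines[i].strip())
        except ValueError:
            raise ValueError(f"expected an atom count on line {i + 1}") from None
        block = lines[i:i + 2 + n_atoms]
        if len(block) < 2 + n_atoms:
            raise ValueError(f"frame starting on line {i + 1} is truncated")
        frames.append(block)
        i += 2 + n_atoms
    return frames


xyz_text = """2
charge=0 mult=1
H 0.0 0.0 0.0
H 0.0 0.0 0.74
1

He 0.0 0.0 0.0
"""
frames = split_xyz_frames(xyz_text)
print(len(frames))    # 2
print(frames[1][-1])  # He 0.0 0.0 0.0
```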

Examples

Load a PASTED output file and pass the first structure to the optimizer:

from pasted import read_xyz, StructureOptimizer

structs = read_xyz("results.xyz")
opt = StructureOptimizer(
    n_atoms=len(structs[0]),
    charge=structs[0].charge,
    mult=structs[0].mult,
    objective={"H_total": 1.0},
    elements=sorted(set(structs[0].atoms)),  # sorted: set order is not stable across runs
    max_steps=3000,
    seed=42,
)
result = opt.run(initial=structs[0])

Compose with GenerationResult via +:

from pasted import GenerationResult, generate, read_xyz

existing = generate(n_atoms=10, charge=0, mult=1,
                    mode="gas", region="sphere:9",
                    elements="6,7,8", n_samples=5, seed=0)
loaded = read_xyz("previous_run.xyz")
# read_xyz returns a plain list[Structure]; wrap it before combining with +.
all_structs = existing + GenerationResult(structures=loaded,
                                          n_passed=len(loaded),
                                          n_attempted=len(loaded))

Structure

class pasted._generator.Structure(atoms: list[str], positions: list[tuple[float, float, float]], charge: int, mult: int, metrics: dict[str, float], mode: str, sample_index: int = 0, center_sym: str | None = None, seed: int | None = None)[source]

Bases: object

A single generated atomic structure with its computed disorder metrics.

atoms

Element symbols, one per atom.

Type:: list[str]

positions

Cartesian coordinates in Å, one (x, y, z) tuple per atom.

Type:: list[tuple[float, float, float]]

charge

Total system charge.

Type:: int

mult

Spin multiplicity 2S+1.

Type:: int

metrics

Computed disorder metrics (see pasted._atoms.ALL_METRICS).

Type:: dict[str, float]

mode

Placement mode used ("gas", "chain", "shell", "maxent", or "opt_<method>" for optimizer results).

Type:: str

sample_index

1-based index within the batch of structures that passed filters.

Type:: int

center_sym

Element symbol of the shell center atom (shell mode only).

Type:: str | None

seed

Random seed used for generation (None if unseeded).

Type:: int | None

Properties

comp: Read-only composition string derived from atoms, sorted in alphabetical order by element symbol, e.g. 'C5N2O3'. Computed on access; not stored as a field.

Note

The sort order is alphabetical (sorted() on symbol strings), not Hill order (C first, H second, then alphabetical). Structures containing only C, H, N, O look identical in both orderings, but others differ: ['Ar', 'C', 'H', 'H'] gives 'ArCH2', whereas Hill order would give 'CH2Ar'.

Examples

Access the composition string directly:

from pasted import generate

s = generate(n_atoms=10, charge=0, mult=1, mode="gas",
             region="sphere:8", elements="6,7,8", n_samples=5, seed=0)[0]
print(s.comp)          # e.g. 'C4N3O3'
print(repr(s))         # Structure(n=10, comp='C4N3O3', mode='gas', H_total=…)

__len__() → int[source]

__repr__() → str[source]: Return repr(self).

atoms: list[str]

center_sym: str | None = None

charge: int

property comp: str

Alphabetically-sorted composition string derived from atoms.

Elements are sorted in ascending alphabetical order by symbol and counts above one are appended as a suffix, e.g. 'C5N2O3'. Single-atom elements are written without a count, e.g. 'C' rather than 'C1'.

Note

The sort order is alphabetical (Python sorted()), not Hill order (which would place C first, H second, then all other elements alphabetically). For structures containing only C, H, N, O the two orderings coincide, but in a carbon-containing structure any element whose symbol sorts before H, such as Ar or Fe, appears at its alphabetical position rather than after H. For example, ['Na', 'C', 'H', 'H'] yields 'CH2Na' under both orderings, whereas ['Ar', 'C', 'H', 'H'] yields 'ArCH2' (alphabetical), not 'CH2Ar' (Hill).

This property is computed on each access and is not persisted as a dataclass field.

Returns:: Compact composition label, e.g. 'C5N2O3'.
Return type:: str

Examples

s.comp          # 'C5N2O3'
s.comp in repr(s)  # True
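The formatting rule described for comp is easy to reproduce outside the library. The sketch below builds the alphabetical label from a list of symbols and, for comparison only, a Hill-ordered label. Both function names are hypothetical and neither is part of PASTED.

```python
from collections import Counter


def _label(counts, order):
    # Counts above one become a suffix; single atoms are written bare.
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in order)


def alphabetical_formula(atoms):
    """Composition label sorted by symbol, as described for comp."""
    counts = Counter(atoms)
    return _label(counts, sorted(counts))


def hill_formula(atoms):
    """Hill order: C, then H, then the rest alphabetically (if carbon is present)."""
    counts = Counter(atoms)
    if "C" not in counts:
        return _label(counts, sorted(counts))
    rest = sorted(s for s in counts if s not in ("C", "H"))
    return _label(counts, ["C"] + (["H"] if "H" in counts else []) + rest)


print(alphabetical_formula(["C"] * 5 + ["N"] * 2 + ["O"] * 3))  # C5N2O3
print(alphabetical_formula(["Ar", "C", "H", "H"]))              # ArCH2
print(hill_formula(["Ar", "C", "H", "H"]))                      # CH2Ar
```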

classmethod from_xyz(source: str | Path, *, frame: int = 0, recompute_metrics: bool = True, cutoff: float | None = None, n_bins: int = 20, w_atom: float = 0.5, w_spatial: float = 0.5, cov_scale: float = 1.0) → Structure[source]

Load a Structure from an XYZ file or string.

Supports both plain XYZ and PASTED extended XYZ (with charge=, mult=, and metric tokens on the comment line). When recompute_metrics is True (default), all disorder metrics are recomputed from the loaded geometry so that the returned structure is fully usable as optimizer input or for filtering.

Parameters:

source – Path to an XYZ file or a raw XYZ string.
frame – Zero-based frame index when source contains multiple concatenated structures (default: first frame).
recompute_metrics – Recompute all disorder metrics after loading. Set to False to skip the recomputation and return the structure with whatever metric values were embedded in the extended XYZ comment (or an empty dict for plain XYZ).
cutoff – Distance cutoff (Å) for metric computation. Auto-computed from the element pool when None.
n_bins – Histogram bins for H_spatial / RDF_dev (default: 20).
w_atom – Weight of H_atom in H_total (default: 0.5).
w_spatial – Weight of H_spatial in H_total (default: 0.5).
cov_scale – Minimum distance scale factor used for metrics (default: 1.0).

Return type:

Structure

Raises:

FileNotFoundError – When source looks like a file path (no newlines) but the path does not exist on disk.
IsADirectoryError – When source is a path that points to a directory rather than a regular file.
ValueError – When the XYZ content cannot be parsed, or frame is out of range.

Examples

Load and immediately use as optimizer initial structure:

from pasted import Structure, StructureOptimizer

s = Structure.from_xyz("my_structure.xyz")
opt = StructureOptimizer(
    n_atoms=len(s), charge=s.charge, mult=s.mult,
    objective={"H_total": 1.0},
    elements=sorted(set(s.atoms)),  # sorted: set order is not stable across runs
    max_steps=2000, seed=42,
)
result = opt.run(initial=s)

metrics: dict[str, float]

mode: str

mult: int

property n: int: Number of atoms in the structure.

positions: list[tuple[float, float, float]]

sample_index: int = 0

seed: int | None = None

to_xyz(prefix: str = '') → str[source]

Serialise to extended XYZ format.

Parameters:: prefix – Custom prefix for the comment line. When omitted the standard "sample=N mode=M …" string is generated automatically.
Returns:: Multi-line string (no trailing newline).
Return type:: str

write_xyz(path: str | Path, *, append: bool = True) → None[source]

Write this structure to an XYZ file.

Parameters:

path – Output file path.
append – If True (default) the file is opened in append mode so that multiple structures can be written in sequence. Use append=False to overwrite.

GenerationResult

class pasted._generator.GenerationResult(structures: list[Structure] = <factory>, n_attempted: int = 0, n_passed: int = 0, n_rejected_parity: int = 0, n_rejected_filter: int = 0, n_success_target: int | None = None)[source]

Bases: object

Return value of generate() and StructureGenerator.generate().

Behaves like a list[Structure] in all normal usage (indexing, iteration, len, boolean test, for s in result) while also carrying metadata about how many attempts were made and why samples were rejected. This metadata is especially useful when integrating PASTED into automated pipelines such as ASE or high-throughput workflows, where a silent empty list would be indistinguishable from a successful run that just produced no results.

structures

Structures that passed all filters.

Type:: list[pasted._generator.Structure]

n_attempted

Total placement attempts made.

Type:: int

n_passed

Number of structures that passed all filters (equals len(structures) unless the caller mutates the list).

Type:: int

n_rejected_parity

Attempts rejected by the charge/multiplicity parity check.

Type:: int

n_rejected_filter

Attempts rejected by user-supplied metric filters.

Type:: int

n_success_target

The n_success value that was in effect during generation (None when not set).

Type:: int | None

Examples

Drop-in replacement for list[Structure]:

from pasted import generate

result = generate(n_atoms=10, charge=0, mult=1,
                  mode="gas", region="sphere:8",
                  elements="6,7,8", n_samples=20, seed=0)
for s in result:          # iterates like a list
    print(s.to_xyz())
print(len(result))        # number that passed

Inspect rejection metadata:

if result.n_rejected_parity > 0:
    print(f"{result.n_rejected_parity} samples failed parity check")
print(result.summary())

Notes

GenerationResult is a dataclass; downstream code should treat it as immutable. The structures field is a plain list and may be sorted or sliced freely.

__iter__() → Iterator[Structure][source]

__len__() → int[source]

__repr__() → str[source]: Return repr(self).

n_attempted: int = 0

n_passed: int = 0

n_rejected_filter: int = 0

n_rejected_parity: int = 0

n_success_target: int | None = None

structures: list[Structure]

summary() → str[source]

Return a human-readable one-line summary of the generation run.

Returns:: E.g. "passed=5 attempted=20 rejected_parity=2 rejected_filter=13".
Return type:: str

Note

Attribute naming — always use the n_ prefix.

The one-line string returned by summary() uses short labels (passed, attempted, rejected_parity, rejected_filter). The corresponding Python attributes carry an n_ prefix: result.n_passed, result.n_attempted, result.n_rejected_parity, result.n_rejected_filter. Accessing result.passed or result.attempted directly raises AttributeError.
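The split between summary labels and attribute names can be illustrated with a stand-in dataclass. ResultSketch below is hypothetical and is not the PASTED class. It only mimics the documented field names, the list-like behavior, and the summary format shown above.

```python
from dataclasses import dataclass, field


@dataclass
class ResultSketch:
    """Hypothetical stand-in that mirrors the documented fields."""
    structures: list = field(default_factory=list)
    n_attempted: int = 0
    n_passed: int = 0
    n_rejected_parity: int = 0
    n_rejected_filter: int = 0

    # List-like access delegates to the wrapped list.
    def __iter__(self):
        return iter(self.structures)

    def __len__(self):
        return len(self.structures)  # also drives bool(result)

    def __getitem__(self, index):
        return self.structures[index]

    def summary(self) -> str:
        # Short labels in the string, n_-prefixed names on the object.
        return (f"passed={self.n_passed} attempted={self.n_attempted} "
                f"rejected_parity={self.n_rejected_parity} "
                f"rejected_filter={self.n_rejected_filter}")


demo = ResultSketch(structures=["s1", "s2", "s3", "s4", "s5"],
                    n_attempted=20, n_passed=5,
                    n_rejected_parity=2, n_rejected_filter=13)
print(demo.summary())
print(hasattr(demo, "passed"))  # False: only n_passed exists
```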

Note

Automatic UserWarning signals.

Both generate() and StructureGenerator.generate() emit a UserWarning (via Python’s warnings module) whenever:

any attempt is rejected by the parity check (n_rejected_parity > 0),
no structures pass the metric filters, or
the attempt budget is exhausted before n_success is reached.

These warnings fire regardless of verbose so that downstream tools receive a machine-visible signal even when PASTED is silent:

import warnings
from pasted import generate

with warnings.catch_warnings(record=True) as w:
    warnings.simplefilter("always")
    result = generate(
        n_atoms=8, charge=0, mult=1,
        mode="gas", region="sphere:8",
        elements="6",
        n_samples=10, seed=0,
        filters=["H_total:999:-"],   # impossible — nothing will pass
    )
if w:
    print(w[0].message)
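The filter string in this example uses the "METRIC:MIN:MAX" form with "-" for an open bound. A minimal sketch of how such strings could be parsed and applied to a metrics dict follows. parse_filter and passes_filters are hypothetical names, and treating both bounds as inclusive is an assumption.

```python
def parse_filter(spec: str):
    """Split 'METRIC:MIN:MAX' into (metric, lower, upper); '-' means unbounded."""
    metric, lo_text, hi_text = spec.split(":")  # wrong field count raises ValueError
    lower = None if lo_text == "-" else float(lo_text)
    upper = None if hi_text == "-" else float(hi_text)
    return metric, lower, upper


def passes_filters(metrics: dict, specs: list) -> bool:
    """True only if every filter is satisfied (bounds assumed inclusive)."""
    for spec in specs:
        metric, lower, upper = parse_filter(spec)
        value = metrics[metric]
        if lower is not None and value < lower:
            return False
        if upper is not None and value > upper:
            return False
    return True


print(parse_filter("H_total:2.0:-"))                         # ('H_total', 2.0, None)
print(passes_filters({"H_total": 1.2}, ["H_total:999:-"]))  # False
```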
