StructureGenerator & generate()
- class pasted._generator.StructureGenerator(config: GeneratorConfig | None = None, *, n_atoms: int | None = None, **kwargs: Any)
Bases: object
Generate random atomic structures with disorder metrics.
All parameters use Python snake_case names that correspond 1-to-1 with their CLI --flag counterparts.
- Parameters:
n_atoms – Number of atoms per structure (before optional H augmentation).
charge – Total system charge (applied to every structure).
mult – Spin multiplicity 2S+1.
mode – Placement mode: "gas" (default), "chain", "shell", or "maxent".
region – Bounding-region spec: "sphere:R" | "box:L" | "box:LX,LY,LZ". Required when mode is "gas" or "maxent"; ignored for "chain" and "shell" (those modes use their own geometry parameters such as shell_radius and bond_range). Example: region="sphere:8" places atoms inside an 8 Å-radius sphere.
branch_prob – [chain] Branching probability (default: 0.3).
chain_persist – [chain] Directional persistence ∈ [0, 1] (default: 0.5).
chain_bias – [chain] Global-axis drift strength ∈ [0, 1] (default: 0.0). The direction of the first bond becomes the bias axis; each subsequent step is blended toward that axis before normalization. 0.0 → no bias (backwards-compatible); higher values produce more elongated structures with larger shape_aniso.
bond_range – [chain / shell tails] Bond-length range in Å (default: (1.2, 1.6)).
center_z – [shell] Atomic number of center atom. None → random per sample.
coord_range – [shell] Coordination-number range (default: (4, 8)).
shell_radius – [shell] Shell-radius range in Å (default: (1.8, 2.5)).
elements – Element pool. Three forms are accepted:
Atomic-number spec string: a comma-separated list of integers and/or integer ranges, e.g. "6,7,8" (C, N, O), "1-30" (H to Zn), or "1-10,26,28" (H–Ne plus Fe and Ni). Ranges are inclusive. Symbol strings such as "C,N,O" are not accepted and raise ValueError; use the numeric form "6,7,8" or pass a list instead.
Explicit list of element symbols: e.g. ["C", "N", "O"] or ["Cr", "Mn", "Fe", "Co", "Ni"]. Symbols must be valid IUPAC symbols of at most two characters that PASTED recognises.
None: all Z = 1–106 (default).
element_fractions – Relative sampling weights for elements in the pool, as a {symbol: weight} dict (e.g. {"C": 0.5, "N": 0.3, "O": 0.2}). Weights are relative: they are normalized internally and need not sum to 1. Elements absent from the dict receive a weight of 1.0. When None (default), every element in the pool is sampled with equal probability.
element_min_counts – Minimum number of atoms per element guaranteed in every generated structure (e.g. {"C": 2, "N": 1}). The required atoms are placed first; remaining slots are filled by weighted random sampling. None (default) → no lower bounds. The sum of all minimum counts must not exceed n_atoms.
element_max_counts – Maximum number of atoms allowed per element (e.g. {"N": 5, "O": 3}). Elements that have reached their cap are excluded from sampling for the remaining slots. None (default) → no upper bounds.
Note
When both element_min_counts and element_max_counts are given, each element’s min must be ≤ its max.
Note
The automatic hydrogen augmentation step (add_hydrogen=True) runs after the constrained sampling and may temporarily exceed element_max_counts for H. Set add_hydrogen=False if H count limits are critical.
cov_scale – Minimum-distance scale factor: d_min(i,j) = cov_scale × (r_i + r_j) using Pyykkö (2009) single-bond covalent radii. Default: 1.0.
relax_cycles – Maximum repulsion-relaxation iterations (default: 1500).
add_hydrogen – Automatically append H atoms when H is in the pool but the sampled composition contains none (default: True).
affine_strength – Global dimensionless scale of the affine transformation applied to every generated structure before relax_positions() (default: 0.0 = disabled). When > 0, a random stretch/compress + shear is applied once per structure, creating more anisotropic initial geometries before the repulsion-relaxation step. Practical range: 0.05–0.4. At 0.1 the structure is stretched / compressed by up to ±10 % along a random axis and sheared by up to ±5 %. Works identically across all placement modes (gas, chain, shell, maxent). 0.0 preserves the behavior of all versions prior to v0.2.3.
Use affine_stretch, affine_shear, and affine_jitter to override individual operation strengths independently.
affine_stretch – Strength of the stretch/compress operation only ∈ (0, 1). When None (default) affine_strength is used. Set to 0.0 to disable stretching while keeping shear and jitter active.
affine_shear – Strength of the shear operation only ∈ (0, 1). When None (default) affine_strength is used. Set to 0.0 to disable shearing while keeping stretch and jitter active.
affine_jitter – Per-atom jitter scale ∈ (0, 1) relative to the move step. When None (default) affine_strength is used. For StructureGenerator the move step is always 0.0, so jitter is never applied during generation regardless of this value; the parameter exists for symmetry with StructureOptimizer.
n_samples – Maximum number of placement attempts (default: 1). Use 0 to allow unlimited attempts (only valid when n_success is also set, otherwise a ValueError is raised).
n_success – Target number of structures that must pass all filters before generation stops (default: None).
None → generate exactly n_samples attempts and return all that passed (original behavior).
N > 0 with n_samples > 0 → stop as soon as N structures pass or n_samples attempts are exhausted, whichever comes first. Returns the structures collected so far with a warning if fewer than N were found.
N > 0 with n_samples = 0 → unlimited attempts; stop only when N structures have passed.
seed – Random seed for reproducibility (None → non-deterministic).
n_bins – Histogram bins for H_spatial and RDF_dev (default: 20).
w_atom – Weight of H_atom in H_total (default: 0.5).
w_spatial – Weight of H_spatial in H_total (default: 0.5).
cutoff – Distance cutoff in Å for Steinhardt and graph metrics. None → auto-computed as cov_scale × 1.5 × median(r_i + r_j) over the element pool.
filters – Filter strings of the form "METRIC:MIN:MAX" (use "-" for an open bound). Only structures satisfying all filters are returned.
verbose – Print progress and statistics to stderr (default: False). The CLI always passes True; library callers usually leave it off.
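The element_fractions, element_min_counts, and element_max_counts rules described above can be pictured with a small standalone sketch. It is not PASTED's implementation: the function name sample_composition and its signature are hypothetical, and it only mirrors the documented behavior (required atoms first, absent weights default to 1.0, capped elements drop out).

```python
import random


def sample_composition(pool, n_atoms, fractions=None, min_counts=None,
                       max_counts=None, seed=None):
    """Hypothetical helper: draw n_atoms symbols from pool under count bounds."""
    fractions = fractions or {}
    min_counts = min_counts or {}
    max_counts = max_counts or {}
    if sum(min_counts.values()) > n_atoms:
        raise ValueError("sum of minimum counts exceeds n_atoms")
    for sym, lo in min_counts.items():
        if sym in max_counts and lo > max_counts[sym]:
            raise ValueError(f"min count for {sym} exceeds its max count")

    rng = random.Random(seed)
    # Required atoms are placed first.
    atoms = [sym for sym, lo in min_counts.items() for _ in range(lo)]
    counts = {sym: atoms.count(sym) for sym in pool}

    while len(atoms) < n_atoms:
        # Elements that have reached their cap are excluded from sampling.
        open_syms = [s for s in pool if counts[s] < max_counts.get(s, n_atoms)]
        if not open_syms:
            raise ValueError("caps leave no element available")
        # Elements absent from `fractions` get weight 1.0; random.choices
        # treats weights as relative, so they need not sum to 1.
        weights = [fractions.get(s, 1.0) for s in open_syms]
        pick = rng.choices(open_syms, weights=weights, k=1)[0]
        atoms.append(pick)
        counts[pick] += 1
    return atoms


atoms = sample_composition(
    ["C", "N", "O"], 10,
    fractions={"C": 0.5},
    min_counts={"C": 2, "N": 1},
    max_counts={"N": 2, "O": 3},
    seed=0,
)
print(len(atoms), atoms.count("C"), atoms.count("N"), atoms.count("O"))
```

Hydrogen augmentation is left out on purpose; as the note above says, it runs after this step and is not bound by the caps.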
Examples
Class API (config-based, recommended):
    from pasted import GeneratorConfig, StructureGenerator

    cfg = GeneratorConfig(
        n_atoms=12, charge=0, mult=1,
        mode="gas", region="sphere:9",
        elements="1-30", n_samples=50, seed=42,
        filters=["H_total:2.0:-"],
    )
    gen = StructureGenerator(cfg)
    structures = gen.generate()
    for s in structures:
        print(s)
Functional API (keyword-based, backward-compatible):
    from pasted import generate

    structures = generate(
        n_atoms=12, charge=0, mult=1,
        mode="chain", elements="6,7,8",
        n_samples=20, seed=0,
    )
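The filters=["H_total:2.0:-"] entry in the class example follows the "METRIC:MIN:MAX" form. The sketch below shows one way such a string could be interpreted. The helper names parse_filter and passes are hypothetical, and treating both bounds as inclusive is an assumption, since the text does not say how boundary values are handled.

```python
def parse_filter(spec):
    """Split "METRIC:MIN:MAX" into (metric, lo, hi); "-" means an open bound."""
    metric, lo, hi = spec.split(":")
    lo_val = float("-inf") if lo == "-" else float(lo)
    hi_val = float("inf") if hi == "-" else float(hi)
    return metric, lo_val, hi_val


def passes(metrics, filter_specs):
    """Return True when every filter's metric lies inside its window.

    Bounds are treated as inclusive here (an assumption, not documented).
    """
    for spec in filter_specs:
        metric, lo, hi = parse_filter(spec)
        if not lo <= metrics[metric] <= hi:
            return False
    return True


print(parse_filter("H_total:2.0:-"))
print(passes({"H_total": 2.5}, ["H_total:2.0:-"]))
print(passes({"H_total": 1.0}, ["H_total:2.0:-"]))
```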
- property config: GeneratorConfig
The GeneratorConfig that was used to construct this generator.
- generate() → GenerationResult
Generate structures and return a GenerationResult.
Collects all structures yielded by the internal generation loop, attaches generation metadata (attempt counts, rejection breakdowns), and returns a GenerationResult that behaves like a list[Structure] in all normal usage while also carrying the diagnostics needed for automated pipelines.
Run statistics (n_attempted, n_passed, etc.) are obtained directly from _stream_with_stats() rather than via a shared instance variable, so there is no hidden coupling between stream() and generate(). Calling one does not affect the other, and partial iteration of stream() cannot leave stale counters for a subsequent generate() call.
GenerationResult supports the full list interface (indexing, iteration, len, bool) so existing code that does result[0] or for s in result continues to work without modification.
Warnings are also emitted via warnings.warn() (category UserWarning) when:
Any attempts are rejected by the charge/multiplicity parity check.
No structures pass the metric filters.
The attempt budget is exhausted before n_success is reached.
Each call creates a fresh random.Random seeded with self._cfg.seed, so repeated calls with the same seed are reproducible.
- Returns:
Wraps the list of passing structures together with generation metadata. Use result.structures for the raw list or result.summary() for a one-line diagnostic string.
- Return type:
GenerationResult
Examples
Drop-in list usage:
    result = gen.generate()
    for s in result:
        print(s.to_xyz())
Metadata access:
    result = gen.generate()
    if result.n_rejected_parity > 0:
        print(result.summary())
- stream() → Iterator[Structure]
Generate structures one by one, yielding each that passes all filters.
Unlike generate(), structures are yielded immediately as they pass, so callers can write output or stop early without waiting for all attempts to complete.
Respects both n_samples (maximum attempts) and n_success (target number of passing structures):
If n_success is set, the iterator stops as soon as that many structures have been yielded, even if n_samples attempts have not been exhausted.
If n_samples is 0 (unlimited), the iterator runs until n_success structures have been yielded.
If n_samples attempts are exhausted before n_success is reached, a warning is emitted to stderr and the iterator ends.
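The interplay of the two budgets can be sketched as a plain generator. This is an illustration of the rules listed above, not PASTED's code: stream_passing and the attempt callable (which returns a result, or None for a rejected attempt) are hypothetical names.

```python
import warnings


def stream_passing(attempt, n_samples=1, n_success=None):
    """Yield passing results from attempt(i) under both budgets.

    `attempt` is a hypothetical callable: it returns a result, or None
    when the attempt is rejected.
    """
    if n_samples == 0 and n_success is None:
        raise ValueError("n_samples=0 requires n_success")
    n_attempted = 0
    n_passed = 0
    # n_samples == 0 means unlimited attempts.
    while n_samples == 0 or n_attempted < n_samples:
        if n_success is not None and n_passed >= n_success:
            return  # target reached before the attempt budget ran out
        result = attempt(n_attempted)
        n_attempted += 1
        if result is not None:
            n_passed += 1
            yield result
    if n_success is not None and n_passed < n_success:
        warnings.warn(
            f"only {n_passed} of {n_success} requested results passed "
            f"after {n_attempted} attempts",
            UserWarning,
        )


def keep_even(i):
    """Toy attempt: even indices pass, odd ones are rejected."""
    return i if i % 2 == 0 else None


print(list(stream_passing(keep_even, n_samples=10, n_success=2)))
print(list(stream_passing(keep_even, n_samples=0, n_success=3)))
```

Because the function is a generator, a caller can also break out of the loop early; no further attempts run after that.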
Each call creates a fresh random.Random seeded with self._cfg.seed, so repeated calls with the same seed are reproducible.
- Yields:
Structure – Each structure that passed all filters, in generation order.
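Why a fresh random.Random per call gives reproducible results can be seen in isolation. SeededSource below is a hypothetical stand-in, not a PASTED class; it only demonstrates the seeding pattern described above.

```python
import random


class SeededSource:
    """Hypothetical stand-in: every call builds its own random.Random(seed)."""

    def __init__(self, seed=None):
        self.seed = seed

    def draw(self, n=3):
        rng = random.Random(self.seed)  # fresh generator on each call
        return [rng.random() for _ in range(n)]


src = SeededSource(seed=42)
first = src.draw()
second = src.draw()
print(first == second)  # same seed, same sequence on every call
```

Since no generator state is shared between calls, consuming only part of one sequence does not shift the next one.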
Examples
Write structures to a file as they are found:
    gen = StructureGenerator(
        n_atoms=12, charge=0, mult=1,
        mode="gas", region="sphere:9",
        elements="1-30", n_success=10, n_samples=500, seed=42,
    )
    for s in gen.stream():
        s.write_xyz("out.xyz")
- pasted._generator.generate(config: GeneratorConfig | None = None, *, n_atoms: int | None = None, charge: int | None = None, mult: int | None = None, **kwargs: Any) → GenerationResult
Create a StructureGenerator and immediately call generate().
Two calling conventions are supported:
Config-based (recommended for new code): Pass a GeneratorConfig as the first positional argument. Provides full mypy / IDE type-checking on every field:
    from pasted import generate, GeneratorConfig

    cfg = GeneratorConfig(
        n_atoms=10, charge=0, mult=1,
        mode="gas", region="sphere:8",
        elements="6,7,8", n_samples=20, seed=0,
    )
    result = generate(cfg)
Keyword-based (backward-compatible, original API): Pass all parameters as keyword arguments. n_atoms, charge, and mult are required; all others are optional:
    result = generate(
        n_atoms=10, charge=0, mult=1,
        mode="gas", region="sphere:8",
        elements="6,7,8", n_samples=20, seed=0,
    )
The two forms cannot be mixed: if config is given, all other keyword arguments are ignored.
- Parameters:
config – A fully-populated GeneratorConfig instance. When given, all other keyword arguments are ignored.
n_atoms – Number of atoms per structure (required when config is None).
charge – Total system charge (required when config is None).
mult – Spin multiplicity 2S+1 (required when config is None).
**kwargs – Any optional GeneratorConfig field, e.g. mode, region, elements, n_samples, seed, filters, affine_strength, … Ignored when config is provided.
- Returns:
A list-compatible object containing the structures that passed all filters plus metadata about the generation run.
- Return type:
GenerationResult
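The elements="6,7,8" strings used in the calls above are atomic-number specs. As a rough illustration of that format (comma-separated integers and inclusive ranges, with symbol strings rejected), here is a standalone sketch. parse_element_spec is a hypothetical name and is not claimed to match PASTED's own parser.

```python
def parse_element_spec(spec):
    """Expand a spec such as "1-10,26,28" into a sorted list of atomic numbers."""
    numbers = set()
    for token in spec.split(","):
        token = token.strip()
        try:
            if "-" in token:
                lo, hi = (int(part) for part in token.split("-"))
                numbers.update(range(lo, hi + 1))  # ranges are inclusive
            else:
                numbers.add(int(token))
        except ValueError:
            # Symbol strings such as "C" end up here.
            raise ValueError(f"not an atomic-number spec: {token!r}") from None
    return sorted(numbers)


print(parse_element_spec("6,7,8"))
print(parse_element_spec("1-10,26,28"))
```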
- pasted._generator.read_xyz(source: str | Path, *, recompute_metrics: bool = True, cutoff: float | None = None, n_bins: int = 20, w_atom: float = 0.5, w_spatial: float = 0.5, cov_scale: float = 1.0) → list[Structure]
Read one or more structures from an XYZ file or string.
Convenience wrapper around Structure.from_xyz() that reads all frames from a (possibly multi-frame) XYZ source and returns them as a list. Both plain XYZ and PASTED extended XYZ are supported.
- Parameters:
source – Path to an XYZ file or a raw XYZ string.
recompute_metrics – Recompute all disorder metrics after loading each structure (default: True).
cutoff – Distance cutoff (Å) for metric computation. Auto-computed from each structure’s element pool when None.
n_bins – Histogram bins for H_spatial / RDF_dev (default: 20).
w_atom – Weight of H_atom in H_total (default: 0.5).
w_spatial – Weight of H_spatial in H_total (default: 0.5).
cov_scale – Minimum distance scale factor used for metrics (default: 1.0).
- Returns:
One Structure per frame, in file order.
- Return type:
list[Structure]
- Raises:
FileNotFoundError – When source looks like a file path (no newlines) but the path does not exist on disk.
IsADirectoryError – When source is a path that points to a directory.
ValueError – When the XYZ content cannot be parsed.
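For orientation, the sketch below shows how concatenated frames sit in plain XYZ text (an atom-count line, a comment line, then one line per atom) and the "no newlines means path" heuristic mentioned under FileNotFoundError. Both helper names are hypothetical; this is not PASTED's parser and it ignores the extended-XYZ comment tokens.

```python
def looks_like_path(source):
    """Heuristic from the text: a source without newlines is taken as a path."""
    return "\n" not in str(source)


def split_xyz_frames(text):
    """Split concatenated XYZ text into (comment, atom_lines) frames."""
    lines = text.strip().splitlines()
    frames = []
    i = 0
    while i < len(lines):
        try:
            n_atoms = int(lines[i].strip())
        except ValueError:
            raise ValueError(f"expected an atom count on line {i + 1}") from None
        body = lines[i + 2 : i + 2 + n_atoms]
        if i + 1 >= len(lines) or len(body) != n_atoms:
            raise ValueError("truncated XYZ frame")
        frames.append((lines[i + 1], body))
        i += 2 + n_atoms  # jump to the next frame's count line
    return frames


text = "2\nfirst\nH 0 0 0\nH 0 0 0.74\n1\nsecond\nHe 0 0 0\n"
for comment, atom_lines in split_xyz_frames(text):
    print(comment, len(atom_lines))
```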
Examples
Load a PASTED output file and pass the first structure to the optimizer:
    from pasted import read_xyz, StructureOptimizer

    structs = read_xyz("results.xyz")
    opt = StructureOptimizer(
        n_atoms=len(structs[0]),
        charge=structs[0].charge,
        mult=structs[0].mult,
        objective={"H_total": 1.0},
        elements=list(set(structs[0].atoms)),
        max_steps=3000, seed=42,
    )
    result = opt.run(initial=structs[0])
Compose with GenerationResult via +:
    from pasted import read_xyz, generate

    existing = generate(
        n_atoms=10, charge=0, mult=1,
        mode="gas", region="sphere:9",
        elements="6,7,8", n_samples=5, seed=0,
    )
    loaded = read_xyz("previous_run.xyz")
    # loaded is a list[Structure]; wrap manually if needed:
    from pasted import GenerationResult

    all_structs = existing + GenerationResult(
        structures=loaded, n_passed=len(loaded), n_attempted=len(loaded),
    )
Structure
- class pasted._generator.Structure(atoms: list[str], positions: list[tuple[float, float, float]], charge: int, mult: int, metrics: dict[str, float], mode: str, sample_index: int = 0, center_sym: str | None = None, seed: int | None = None)
Bases: object
A single generated atomic structure with its computed disorder metrics.
- positions
Cartesian coordinates in Å, one (x, y, z) tuple per atom.
- charge
Total system charge.
- Type:
int
- mult
Spin multiplicity 2S+1.
- Type:
int
- metrics
Computed disorder metrics (see pasted._atoms.ALL_METRICS).
- mode
Placement mode used ("gas", "chain", "shell", "maxent", or "opt_<method>" for optimizer results).
- Type:
str
- sample_index
1-based index within the batch of structures that passed filters.
- Type:
int
- center_sym
Element symbol of the shell center atom (shell mode only).
- Type:
str | None
- seed
Random seed used for generation (None if unseeded).
- Type:
int | None
Properties
- comp
Read-only composition string derived from atoms, sorted in alphabetical order by element symbol, e.g. 'C5N2O3'. Computed on access; not stored as a field.
Note
The sort order is alphabetical (sorted() on symbol strings), not Hill order (C first, H second, then alphabetical). Structures containing only C, H, N, O look the same in both orders, but others can differ: ['Ar', 'C', 'H'] gives 'ArCH' alphabetically, whereas Hill order gives 'CHAr'.
Examples
Access the composition string directly:
    from pasted import generate

    s = generate(
        n_atoms=10, charge=0, mult=1,
        mode="gas", region="sphere:8",
        elements="6,7,8", n_samples=5, seed=0,
    )[0]
    print(s.comp)   # e.g. 'C4N3O3'
    print(repr(s))  # Structure(n=10, comp='C4N3O3', mode='gas', H_total=…)
- charge: int
- property comp: str
Alphabetically-sorted composition string derived from atoms.
Elements are sorted in ascending alphabetical order by symbol and counts above one are appended as a suffix, e.g. 'C5N2O3'. Single-atom elements are written without a count, e.g. 'C' rather than 'C1'.
Note
The sort order is alphabetical (Python sorted()), not Hill order (which places C first, H second, then all other elements alphabetically). For structures containing only C, H, N, O the two orderings coincide, but in carbon-containing structures any element that sorts before H, such as Ar or Fe, appears at its alphabetical position rather than after H. For example, ['Na', 'C', 'H', 'H'] yields 'CH2Na' under both orderings, but ['Ar', 'C', 'H', 'H'] yields 'ArCH2' (alphabetical), not 'CH2Ar' (Hill).
This property is computed on each access and is not persisted as a dataclass field.
- Returns:
Compact composition label, e.g. 'C5N2O3'.
- Return type:
str
Examples
    s.comp             # 'C5N2O3'
    s.comp in repr(s)  # True
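The difference between alphabetical and Hill ordering is easy to check with a few lines of standalone Python. The functions below are illustrative re-implementations under hypothetical names; they do not call PASTED and only follow the ordering rules stated in the note.

```python
from collections import Counter


def _label(symbols, counts):
    """Join symbols, appending a count only when it exceeds one."""
    return "".join(f"{s}{counts[s] if counts[s] > 1 else ''}" for s in symbols)


def alphabetical_comp(atoms):
    """Composition string with symbols in plain sorted() order."""
    counts = Counter(atoms)
    return _label(sorted(counts), counts)


def hill_comp(atoms):
    """Composition string in Hill order: C, then H, then the rest sorted.

    Without carbon, Hill order is fully alphabetical.
    """
    counts = Counter(atoms)
    if "C" not in counts:
        return _label(sorted(counts), counts)
    head = [s for s in ("C", "H") if s in counts]
    tail = sorted(s for s in counts if s not in ("C", "H"))
    return _label(head + tail, counts)


for atoms in (["Na", "C", "H", "H"], ["Ar", "C", "H", "H"]):
    print(atoms, alphabetical_comp(atoms), hill_comp(atoms))
```

The first list gives the same label under both orderings; the second gives 'ArCH2' alphabetically and 'CH2Ar' in Hill order.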
- classmethod from_xyz(source: str | Path, *, frame: int = 0, recompute_metrics: bool = True, cutoff: float | None = None, n_bins: int = 20, w_atom: float = 0.5, w_spatial: float = 0.5, cov_scale: float = 1.0) → Structure
Load a Structure from an XYZ file or string.
Supports both plain XYZ and PASTED extended XYZ (with charge=, mult=, and metric tokens on the comment line). When recompute_metrics is True (default), all disorder metrics are recomputed from the loaded geometry so that the returned structure is fully usable as optimizer input or for filtering.
- Parameters:
source – Path to an XYZ file or a raw XYZ string.
frame – Zero-based frame index when source contains multiple concatenated structures (default: first frame).
recompute_metrics – Recompute all disorder metrics after loading. Set to False to skip the recomputation and return the structure with whatever metric values were embedded in the extended XYZ comment (or an empty dict for plain XYZ).
cutoff – Distance cutoff (Å) for metric computation. Auto-computed from the element pool when None.
n_bins – Histogram bins for H_spatial / RDF_dev (default: 20).
w_atom – Weight of H_atom in H_total (default: 0.5).
w_spatial – Weight of H_spatial in H_total (default: 0.5).
cov_scale – Minimum distance scale factor used for metrics (default: 1.0).
- Return type:
Structure
- Raises:
FileNotFoundError – When source looks like a file path (no newlines) but the path does not exist on disk.
IsADirectoryError – When source is a path that points to a directory rather than a regular file.
ValueError – When the XYZ content cannot be parsed, or frame is out of range.
Examples
Load and immediately use as optimizer initial structure:
    from pasted import Structure, StructureOptimizer

    s = Structure.from_xyz("my_structure.xyz")
    opt = StructureOptimizer(
        n_atoms=len(s),
        charge=s.charge,
        mult=s.mult,
        objective={"H_total": 1.0},
        elements=[sym for sym in set(s.atoms)],
        max_steps=2000, seed=42,
    )
    result = opt.run(initial=s)
- mode: str
- mult: int
- property n: int
Number of atoms in the structure.
- sample_index: int = 0
GenerationResult
- class pasted._generator.GenerationResult(structures: list[Structure] = <factory>, n_attempted: int = 0, n_passed: int = 0, n_rejected_parity: int = 0, n_rejected_filter: int = 0, n_success_target: int | None = None)
Bases: object
Return value of generate() and StructureGenerator.generate().
Behaves like a list[Structure] in all normal usage (indexing, iteration, len, boolean test, for s in result) while also carrying metadata about how many attempts were made and why samples were rejected. This metadata is especially useful when integrating PASTED into automated pipelines such as ASE or high-throughput workflows, where a silent empty list would be indistinguishable from a successful run that just produced no results.
- structures
Structures that passed all filters.
- Type:
list[pasted._generator.Structure]
- n_attempted
Total placement attempts made.
- Type:
int
- n_passed
Number of structures that passed all filters (equals len(structures) unless the caller mutates the list).
- Type:
int
- n_rejected_parity
Attempts rejected by the charge/multiplicity parity check.
- Type:
int
- n_rejected_filter
Attempts rejected by user-supplied metric filters.
- Type:
int
- n_success_target
The n_success value that was in effect during generation (None when not set).
- Type:
int | None
Examples
Drop-in replacement for list[Structure]:
    result = generate(
        n_atoms=10, charge=0, mult=1,
        mode="gas", region="sphere:8",
        elements="6,7,8", n_samples=20, seed=0,
    )
    for s in result:     # iterates like a list
        print(s.to_xyz())
    print(len(result))   # number that passed
Inspect rejection metadata:
    if result.n_rejected_parity > 0:
        print(f"{result.n_rejected_parity} samples failed parity check")
    print(result.summary())
Notes
GenerationResult is a dataclass; downstream code should treat it as immutable. The structures field is a plain list and may be sorted or sliced freely.
- n_attempted: int = 0
- n_passed: int = 0
- n_rejected_filter: int = 0
- n_rejected_parity: int = 0
- structures: list[Structure]
Note
Attribute naming — always use the n_ prefix.
The one-line string returned by summary()
uses short labels (passed, attempted, rejected_parity,
rejected_filter). The corresponding Python attributes carry an n_
prefix: result.n_passed, result.n_attempted,
result.n_rejected_parity, result.n_rejected_filter.
Accessing result.passed or result.attempted directly raises
AttributeError.
Note
Automatic UserWarning signals.
Both generate() and StructureGenerator.generate() emit a
UserWarning (via Python’s warnings module) whenever:
any attempt is rejected by the parity check (n_rejected_parity > 0),
no structures pass the metric filters, or
the attempt budget is exhausted before n_success is reached.
These warnings fire regardless of verbose so that downstream tools
receive a machine-visible signal even when PASTED is silent:
    import warnings
    from pasted import generate

    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")
        result = generate(
            n_atoms=8, charge=0, mult=1,
            mode="gas", region="sphere:8",
            elements="6",
            n_samples=10, seed=0,
            filters=["H_total:999:-"],  # impossible: nothing will pass
        )
    if w:
        print(w[0].message)