StructureOptimizer#

class pasted._optimizer.StructureOptimizer(*, n_atoms: int, charge: int, mult: int, objective: dict[str, float] | Callable[[dict[str, float]], float] | Callable[[dict[str, float], EvalContext], float], elements: str | list[str] | None = None, method: str = 'annealing', max_steps: int = 5000, T_start: float = 1.0, T_end: float = 0.01, frag_threshold: float = 0.3, move_step: float = 0.5, allow_composition_moves: bool = True, allow_displacements: bool = True, allow_affine_moves: bool = False, affine_strength: float = 0.1, affine_stretch: float | None = None, affine_shear: float | None = None, affine_jitter: float | None = None, lcc_threshold: float = 0.0, cov_scale: float = 1.0, relax_cycles: int = 1500, cutoff: float | None = None, n_bins: int = 20, w_atom: float = 0.5, w_spatial: float = 0.5, n_restarts: int = 1, n_replicas: int = 4, pt_swap_interval: int = 10, max_init_attempts: int = 0, seed: int | None = None, verbose: bool = False)[source]#

Bases: object

Optimize a single structure to maximize a disorder objective.

Parameters:

n_atoms – Number of atoms.
charge – Total system charge.
mult – Spin multiplicity 2S+1.
objective –
Weight dict {"METRIC": weight, ...} or any callable. The optimizer maximizes the returned scalar.

Two calling conventions are supported:
- 1-argument f(m) — m is a dict[str, float] of disorder metrics. Fully backward-compatible.
- 2-argument f(m, ctx) — m is the same metrics dict; ctx is an EvalContext that exposes:
  - Structure: ctx.atoms, ctx.positions, ctx.charge, ctx.mult, ctx.n_atoms, ctx.to_xyz()
  - Optimizer state: ctx.step, ctx.temperature, ctx.f_current, ctx.best_f, ctx.progress, ctx.per_atom_q6, ctx.restart_idx
  - Configuration: ctx.element_pool, ctx.cutoff, ctx.method, ctx.T_start, ctx.T_end, ctx.seed
  - PT-only (None for other methods): ctx.replica_idx, ctx.replica_temperature, ctx.n_replicas
Dispatch is based on the number of required positional parameters via inspect.signature(). A callable with a default for the second argument (lambda m, ctx=None:) is treated as 1-argument. EvalContext construction is skipped entirely for 1-argument and dict objectives — no overhead for existing code.
elements – Element pool — spec string ("6,7,8"), list of symbols, or None for all Z = 1–106. When a list is given, duplicate symbols are silently removed while preserving insertion order (e.g. ['C', 'H', 'H', 'H', 'H'] is treated as ['C', 'H']). To bias sampling toward a particular element use element_fractions in StructureGenerator instead.
method – "annealing" (default), "basin_hopping", or "parallel_tempering".
max_steps – Number of MC steps per restart (or per replica per restart for "parallel_tempering"; default: 5000).
T_start – Initial temperature (default: 1.0). For "parallel_tempering" this is the highest replica temperature.
T_end – Final temperature for SA (default: 0.01). For "parallel_tempering" this is the lowest replica temperature (the coldest, most selective replica). BH uses T_start throughout.
n_replicas – Number of temperature replicas for "parallel_tempering" (default: 4). Ignored for other methods. Temperatures are spaced geometrically between T_end and T_start.
pt_swap_interval – Attempt a replica-exchange swap every this many MC steps (default: 10). Ignored for other methods.
allow_displacements – When True (default), fragment moves — small random displacements of one or more atoms — are included in the MC step pool as an independent move type. When False, fragment moves are excluded; coordinates are only modified by affine moves (if allow_affine_moves is True). If the initial structure passed to run() contains atoms whose symbols are not in elements, those atoms are automatically replaced with parity-compatible pool elements before the MC loop begins. This sanitization applies to all three methods (SA, BH, and PT); see _sanitize_atoms_to_pool(). Cannot be False simultaneously with allow_composition_moves and allow_affine_moves (at least one move type must be enabled).
allow_composition_moves – When True (default), composition moves — replacing a random atom with a different element drawn from elements while preserving charge/multiplicity parity — are included in the MC step pool as an independent move type. When False, element types are held fixed throughout the run. Cannot be False simultaneously with allow_displacements and allow_affine_moves (at least one move type must be enabled).
allow_affine_moves – When True, affine moves — a random stretch, compress, or shear applied to the entire structure, followed by a small per-atom jitter — are included in the MC step pool as an independent move type alongside (not as a subset of) displacement and composition moves. Affine moves allow the optimizer to explore elongated or compressed configurations that fragment moves cannot reach efficiently. When allow_displacements is False, affine moves are the only way positions change; the distance-constraint relaxation is not applied after affine moves (consistent with the allow_displacements=False semantics). Default: False (backward-compatible). Cannot be False simultaneously with allow_displacements and allow_composition_moves (at least one move type must be enabled).
affine_strength – Global dimensionless scale of the affine transform (default: 0.1). At 0.1 the structure is stretched / compressed by up to ±10 % along a random axis and sheared by up to ±5 %. Practical range: 0.02–0.4. Has no effect when allow_affine_moves is False. Use affine_stretch, affine_shear, and affine_jitter to override individual operation strengths independently.
affine_stretch – Strength of the stretch/compress operation alone, in [0, 1). When None (default), affine_strength is used. Set to 0.0 to disable stretching while keeping shear and jitter active. Has no effect when allow_affine_moves is False.
affine_shear – Strength of the shear operation alone, in [0, 1). When None (default), affine_strength is used. Set to 0.0 to disable shearing while keeping stretch and jitter active. Has no effect when allow_affine_moves is False.
affine_jitter – Per-atom jitter scale in [0, 1), relative to move_step. When None (default), affine_strength is used. Set to 0.0 to disable per-atom jitter in affine moves. Has no effect when allow_affine_moves is False.
frag_threshold – Local Q6 threshold for fragment selection (default: 0.3). Atoms with local Q6 > threshold are preferentially displaced.
move_step – Maximum displacement magnitude per coordinate step (Å, default: 0.5). Also used as the per-atom jitter scale in affine moves (× 0.25).
lcc_threshold – Minimum graph_lcc required to accept a step (default: 0.0, i.e. no connectivity constraint). Set to 0.8 to enforce that at least 80 % of atoms remain connected.
cov_scale – Minimum distance scale factor for relax_positions().
relax_cycles – Max repulsion-relaxation cycles per step. Basin-Hopping uses 3× this value for its local-minimisation step.
cutoff – Distance cutoff (Å) for Steinhardt / graph metrics. Auto-computed from the element pool when None.
n_bins – Histogram bins for H_spatial / RDF_dev (default: 20).
w_atom – Weight of H_atom in H_total (default: 0.5).
w_spatial – Weight of H_spatial in H_total (default: 0.5).
n_restarts – Independent optimization runs (default: 1). The best result across all restarts is returned.
max_init_attempts –
Maximum number of single-sample tries that _make_initial() makes per restart when generating the starting structure (default: 0 = unlimited).
- 0 — unlimited retries (recommended for production runs with large or constrained element pools). Safe because __init__() validates at construction time that the element pool can satisfy the charge/multiplicity parity constraint; if that check passes, a valid structure is guaranteed to be found eventually.
- > 0 — at most max_init_attempts tries per restart. If exhausted the restart is skipped and a UserWarning is emitted. Useful as a time-budget guard in automated pipelines.
Note

__init__() raises ValueError immediately when the element pool is structurally incompatible with charge/mult (e.g. an all-nitrogen pool with charge=0, mult=1), making an infinite loop impossible for well-formed inputs.
seed – Random seed (None → non-deterministic).
verbose – Print per-step progress to stderr (default: False).
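The n_replicas entry above says that parallel-tempering temperatures are spaced geometrically between T_end and T_start. A minimal sketch of such a ladder follows. The function name replica_temperatures is hypothetical and not part of PASTED, and the ordering (index 0 = coldest) is taken from the EvalContext.replica_idx description rather than from the optimizer source.

```python
def replica_temperatures(T_start: float, T_end: float, n_replicas: int) -> list[float]:
    """Geometric temperature ladder, coldest replica first (illustrative only)."""
    if n_replicas < 2:
        raise ValueError("this sketch needs at least two replicas")
    ratio = T_start / T_end
    # T_k = T_end * ratio**(k / (n - 1)), so consecutive temperatures share one ratio.
    return [T_end * ratio ** (k / (n_replicas - 1)) for k in range(n_replicas)]


temps = replica_temperatures(T_start=1.0, T_end=0.01, n_replicas=4)
```

With the defaults shown, the ladder runs from T_end to T_start and every neighbouring pair differs by the same multiplicative factor.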

Examples

Class API:

from pasted import StructureOptimizer

opt = StructureOptimizer(
    n_atoms=50,
    charge=0, mult=1,
    elements="24,25,26,27,28",      # Cantor alloy
    objective={"H_atom": 1.0, "H_spatial": 1.0, "Q6": -2.0},
    method="annealing",
    max_steps=5000,
    lcc_threshold=0.8,
    seed=42,
)
best = opt.run()

Callable objective:

opt = StructureOptimizer(
    ...,
    objective=lambda m: m["H_spatial"] - 2.0 * m["Q6"],
)
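To relate the two objective forms, the sketch below builds a callable from a weight dict. It assumes a dict objective is scored as the weighted sum of the named metrics. The page does not spell that rule out, so treat it as an assumption. The helper name weighted_objective is hypothetical.

```python
def weighted_objective(weights: dict[str, float]):
    """Return a 1-argument callable that scores metrics as a weighted sum (assumed rule)."""
    def score(m: dict[str, float]) -> float:
        return sum(w * m[name] for name, w in weights.items())
    return score


f = weighted_objective({"H_atom": 1.0, "H_spatial": 1.0, "Q6": -2.0})
value = f({"H_atom": 1.0, "H_spatial": 0.5, "Q6": 0.25})
```

Under that assumption, the dict in the Class API example and a lambda returning m["H_atom"] + m["H_spatial"] - 2.0 * m["Q6"] would rank structures identically.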

__repr__() → str[source]#: Return repr(self).

property cutoff: float#: Distance cutoff (Å) used for Steinhardt and graph metrics.

property element_pool: list[str]#: A copy of the resolved element pool.
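The elements parameter states that duplicate symbols in a list are dropped while insertion order is preserved. One common way to get that behavior in Python is sketched here. The helper name is hypothetical and this is not necessarily how PASTED resolves the pool.

```python
def dedupe_preserving_order(symbols: list[str]) -> list[str]:
    """Drop repeated symbols, keeping the first occurrence of each."""
    # dict keys keep insertion order, so this is an order-preserving "set".
    return list(dict.fromkeys(symbols))


pool = dedupe_preserving_order(["C", "H", "H", "H", "H"])
```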

run(initial: Structure | None = None) → OptimizationResult[source]#

Run n_restarts optimizations and return an OptimizationResult.

Each restart begins from an independently generated random gas-mode structure (or from initial if provided). All per-restart results are collected, sorted by objective value (highest first), and returned together in an OptimizationResult.

OptimizationResult is list-compatible: result[0] and result.best both return the highest-scoring structure, and for s in result iterates all restarts in rank order. Existing code that calls opt.run() and uses the return value as a single Structure should switch to opt.run().best or opt.run()[0].

A UserWarning is emitted when one or more restarts fail to produce a valid initial structure after all internal retries are exhausted. Transient parity-check failures inside the initial-structure generation loop are silenced internally and do not reach the caller; only a definitive inability to start a restart is reported. The retry limit is controlled by max_init_attempts (0 = unlimited, the default).
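The retry semantics described above (0 means unlimited, a positive value caps the tries, and exhaustion produces a UserWarning and a skipped restart) can be sketched as follows. The names make_initial_sketch and sample_once are hypothetical stand-ins, and the warning text is invented for illustration.

```python
import itertools
import warnings


def make_initial_sketch(sample_once, max_init_attempts: int = 0):
    """Call sample_once() until it returns a structure or the budget runs out."""
    # 0 means "keep trying"; any positive value is a hard cap on tries.
    attempts = itertools.count() if max_init_attempts == 0 else range(max_init_attempts)
    for _ in attempts:
        candidate = sample_once()
        if candidate is not None:  # transient failures are swallowed silently
            return candidate
    warnings.warn("no valid initial structure; restart skipped", UserWarning)
    return None
```

A caller in an automated pipeline would check for None and move on to the next restart.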

Parameters:: initial – Starting structure. When None (default), a random gas-mode structure is generated automatically for each restart.
Returns:: All per-restart structures sorted by objective value (highest first), plus summary metadata.
Return type:: OptimizationResult
Raises:: RuntimeError – When all restarts fail to produce a valid initial structure.

Examples

Best structure only:

result = opt.run()
print(result.best)          # highest-scoring structure
print(result[0])            # same — index 0 is always the best
print(result.summary())     # one-line diagnostic

All restarts:

result = opt.run()
for rank, s in enumerate(result, 1):
    print(f"rank {rank}: f={result.objective_scores[rank-1]:.4f}  {s}")

pasted._optimizer.parse_objective_spec(specs: list[str]) → dict[str, float][source]#

Parse ["METRIC:WEIGHT", ...] into a weight dict.

Parameters:: specs – Each string must be of the form "METRIC:WEIGHT", e.g. ["H_atom:1.0", "Q6:-2.0"].
Return type:: dict[str, float]
Raises:: ValueError – On malformed strings or unknown metric names.
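For readers who want to see the "METRIC:WEIGHT" format in action without importing the package, here is a standalone sketch of the same idea. It is not the library's implementation: parse_specs_sketch is a hypothetical name, and the unknown-metric check that the real function performs is omitted because the list of valid metric names is not given on this page.

```python
def parse_specs_sketch(specs: list[str]) -> dict[str, float]:
    """Turn ["METRIC:WEIGHT", ...] into {"METRIC": weight} (illustrative only)."""
    weights: dict[str, float] = {}
    for spec in specs:
        name, sep, raw = spec.partition(":")
        if not sep or not name.strip() or not raw.strip():
            raise ValueError(f"expected 'METRIC:WEIGHT', got {spec!r}")
        try:
            weights[name.strip()] = float(raw)
        except ValueError:
            raise ValueError(f"weight is not a number in {spec!r}") from None
    return weights


parsed = parse_specs_sketch(["H_atom:1.0", "Q6:-2.0"])
```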

Note

Parity validation at construction time.

StructureOptimizer checks at __init__ time that the element pool can produce at least one composition of n_atoms atoms that satisfies the charge/multiplicity parity constraint. If it cannot, ValueError is raised immediately — before any call to run(). This makes max_init_attempts=0 (unlimited retries) safe: if construction succeeds, a valid initial structure is guaranteed to eventually be found.
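The parity constraint can be illustrated with a small feasibility test. The sketch assumes the usual rule that the electron count (sum of atomic numbers minus charge) must have the same parity as the number of unpaired electrons, mult - 1. The function name parity_feasible is hypothetical, and the actual check in PASTED may cover more cases.

```python
def parity_feasible(pool_z: list[int], n_atoms: int, charge: int, mult: int) -> bool:
    """Can n_atoms drawn from pool_z match the charge/multiplicity parity?"""
    has_even = any(z % 2 == 0 for z in pool_z)
    has_odd = any(z % 2 == 1 for z in pool_z)
    if has_even and has_odd:
        odd_counts = range(n_atoms + 1)   # any number of odd-Z atoms is reachable
    elif has_odd:
        odd_counts = [n_atoms]            # every atom has odd Z
    elif has_even:
        odd_counts = [0]                  # no atom has odd Z
    else:
        return False                      # empty pool
    # Electron-count parity is (number of odd-Z atoms - charge) mod 2.
    return any((k - charge) % 2 == (mult - 1) % 2 for k in odd_counts)
```

Under this rule an all-nitrogen pool (Z = 7) with charge=0, mult=1 fails for an odd n_atoms, since the electron count is then odd while a singlet needs an even count.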

Note

Move-type constraints.

allow_displacements=False and allow_composition_moves=False cannot both be set at the same time unless allow_affine_moves=True. Setting all three to False raises ValueError.
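The rule amounts to "at least one of the three move types must be enabled". A minimal sketch of such a guard, with a hypothetical function name and error message, might look like this:

```python
def check_move_types(
    allow_displacements: bool = True,
    allow_composition_moves: bool = True,
    allow_affine_moves: bool = False,
) -> None:
    """Raise ValueError when every move type is disabled (illustrative guard)."""
    if not (allow_displacements or allow_composition_moves or allow_affine_moves):
        raise ValueError("at least one move type must be enabled")


# Affine-only runs are allowed: positions then change only through affine moves.
check_move_types(allow_displacements=False, allow_composition_moves=False,
                 allow_affine_moves=True)
```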

Affine moves

When allow_affine_moves=True, random affine transforms (stretch / compress along one axis, shear of one axis pair, and per-atom jitter) are added to the MC step pool as a move type of their own, alongside displacement and composition moves. This lets the optimizer explore anisotropic configurations that fragment moves cannot reach efficiently.

Unlike in StructureGenerator, the affine_jitter term does have a visible effect here because move_step is non-zero during MC steps.

Position-only optimization

Set allow_composition_moves=False to fix the stoichiometry and only move atoms:

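# opt must have been constructed with allow_composition_moves=False
result = opt.run(initial=my_structure)
# The element symbols are unchanged; only the positions moved
assert sorted(result.best.atoms) == sorted(my_structure.atoms)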

Composition-only optimization

Set allow_displacements=False to fix the atomic coordinates and only swap element labels. Atoms outside the pool are automatically replaced by parity-compatible pool elements before the first MC step, so cross-pool starting structures work with all three methods:

import numpy as np

# opt must have been constructed with allow_displacements=False
result = opt.run(initial=my_structure)
np.testing.assert_allclose(
    np.array(result.best.positions), np.array(my_structure.positions)
)

OptimizationResult#

class pasted._optimizer.OptimizationResult(all_structures: list[Structure] = <factory>, objective_scores: list[float] = <factory>, n_restarts_attempted: int = 0, method: str = 'annealing')[source]

Bases: object

Return value of StructureOptimizer.run().

Wraps all per-restart results and exposes the best structure as a first-class attribute. Behaves like a list[Structure] — indexing, iteration, len, and bool all work — so callers that only want the best result can access it without changing existing code:

result = opt.run()
best   = result.best        # highest-scoring Structure
best   = result[0]          # same — index 0 is always the best
for s in result:            # iterate all restarts, best first
    print(s.metrics["H_total"])
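The list-like behavior described above (indexing, iteration, len, and bool) needs only a few special methods. The stand-in class below is a sketch of that protocol, not the real OptimizationResult. ResultSketch and from_unsorted are hypothetical names, and plain strings take the place of Structure objects.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class ResultSketch:
    all_structures: list = field(default_factory=list)
    objective_scores: list = field(default_factory=list)

    @classmethod
    def from_unsorted(cls, structures: list, scores: list) -> "ResultSketch":
        # Rank restarts by objective value, highest first.
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return cls([structures[i] for i in order], [scores[i] for i in order])

    @property
    def best(self):
        return self.all_structures[0]

    def __getitem__(self, idx):
        return self.all_structures[idx]

    def __iter__(self) -> Iterator:
        return iter(self.all_structures)

    def __len__(self) -> int:
        # bool(result) falls back to len(result) != 0.
        return len(self.all_structures)


result = ResultSketch.from_unsorted(["a", "b", "c"], [0.5, 1.2, 0.8])
```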

all_structures

All structures produced by each restart, sorted by objective value (highest first). all_structures[0] is always the best.

Type:: list[pasted._generator.Structure]

objective_scores

Scalar objective values corresponding to each entry in all_structures.

Type:: list[float]

n_restarts_attempted

Number of restarts that were actually run (may be less than n_restarts when initial-structure generation fails).

Type:: int

method

The optimization method used ("annealing", "basin_hopping", or "parallel_tempering").

Type:: str

Notes

Parallel Tempering result count. For method="parallel_tempering", each restart contributes one entry for the global best plus one entry for each replica whose final objective value differs from the global best. The total len(result) therefore satisfies:

n_restarts <= len(result) <= n_restarts * (n_replicas + 1)

For method="annealing" and method="basin_hopping", each restart contributes exactly one entry, so len(result) == n_restarts_attempted.

Examples

Single-structure usage (backward-compatible):

result = opt.run()
result.best.to_xyz()      # best structure
result[0].to_xyz()        # same

All-restarts usage:

result = opt.run()
print(result.summary())
for rank, s in enumerate(result, 1):
    print(f"rank {rank}: H_total={s.metrics['H_total']:.3f}")

__iter__() → Iterator[Structure][source]

__len__() → int[source]

__repr__() → str[source]: Return repr(self).

all_structures: list[Structure]

property best: Structure: The structure with the highest objective value.

method: str = 'annealing'

n_restarts_attempted: int = 0

objective_scores: list[float]

summary() → str[source]

Return a human-readable one-line summary of the optimization run.

Returns:: E.g. "restarts=5 best_f=1.2294 worst_f=0.7823 method='annealing'".
Return type:: str

EvalContext#

class pasted._optimizer.EvalContext(atoms: tuple[str, ...], positions: tuple[tuple[float, float, float], ...], charge: int, mult: int, n_atoms: int, metrics: dict[str, float], step: int, max_steps: int, temperature: float, f_current: float, best_f: float, restart_idx: int, n_restarts: int, per_atom_q6: ndarray, replica_idx: int | None, replica_temperature: float | None, n_replicas: int | None, element_pool: tuple[str, ...], cutoff: float, method: str, T_start: float, T_end: float, seed: int | None)[source]#

Bases: object

Full evaluation context passed as the second argument to a 2-parameter objective callable.

EvalContext consolidates every piece of information available at the moment the objective function is called: the current structure (atoms, positions, charge/mult), all pre-computed disorder metrics, and the live optimizer state (step number, temperature, best score seen so far, etc.). This design allows user-supplied objective functions to call external quantum-chemistry or machine-learning potential tools without depending on PASTED internals, and to implement adaptive or state-aware objectives.

Attributes — Structure#

atoms:: Element symbols for the current candidate structure, one per atom (e.g. ("C", "H", "O", ...)).
positions:: Cartesian coordinates in Å, one (x, y, z) tuple per atom.
charge:: Total system charge.
mult:: Spin multiplicity 2S+1.
n_atoms:: Number of atoms (len(atoms)).
metrics:: Computed disorder metrics dict — same reference as the m argument in the objective callable. Treat as read-only.

Attributes — Optimizer Runtime State#

step:: Current MC step index, 0-based. Ranges from 0 to max_steps - 1. Useful for progress-dependent or curriculum objectives.
max_steps:: Total number of MC steps per restart.
temperature:: Current temperature at this step. For "annealing" this decreases exponentially; for "basin_hopping" it is fixed at T_start; for "parallel_tempering" it is this replica’s fixed temperature.
f_current:: Objective value of the most recently accepted state. Use this to compute improvement margins or relative scores.
best_f:: Best objective value seen across all steps so far in this restart.
restart_idx:: 0-based index of the current restart.
n_restarts:: Total number of restarts configured.
per_atom_q6:: Per-atom Steinhardt Q6 values from the previous accepted step (shape [n_atoms], dtype float64). Already computed by the optimizer loop; available at zero additional cost. Treat the array as read-only — it is a reference, not a copy.
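The temperature entry above says the annealing temperature decreases exponentially. The exact schedule is not given on this page, so the sketch below assumes one common form, an interpolation in log space between T_start and T_end driven by step / max_steps. Both the formula and the name annealing_temperature are assumptions.

```python
def annealing_temperature(
    step: int, max_steps: int, T_start: float = 1.0, T_end: float = 0.01
) -> float:
    """Exponential cooling from T_start toward T_end (assumed schedule)."""
    progress = step / max_steps          # same quantity as ctx.progress
    return T_start * (T_end / T_start) ** progress


schedule = [annealing_temperature(s, 5000) for s in (0, 1250, 2500, 3750, 4999)]
```

Because step stops at max_steps - 1, this particular form approaches T_end without quite reaching it.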

Attributes — Parallel Tempering (`None` for other methods)#

replica_idx:: 0-based index of the current replica (0 = coldest, n_replicas - 1 = hottest). None when method != "parallel_tempering".
replica_temperature:: This replica’s fixed temperature. None when method != "parallel_tempering".
n_replicas:: Total number of replicas. None when method != "parallel_tempering".

Attributes — Optimizer Configuration#

element_pool:: Tuple of element symbols available for composition moves.
cutoff:: Distance cutoff in Å used for Steinhardt and graph metrics.
method:: Optimization method: "annealing", "basin_hopping", or "parallel_tempering".
T_start:: Starting temperature.
T_end:: Ending temperature (for "annealing").
seed:: Random seed, or None if unseeded.

Calling conventions

Two calling conventions are supported for the objective parameter of StructureOptimizer:

1-argument f(m) — m is a dict[str, float] of disorder metrics. Fully backward-compatible with all existing code.
2-argument f(m, ctx) — m is the same metrics dict; ctx is an EvalContext. Dispatch is based on the number of required positional parameters via inspect.signature(). A callable with a default for the second argument (lambda m, ctx=None:) is treated as 1-argument.
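The dispatch rule can be reproduced with the standard inspect module. The sketch below counts positional parameters without defaults and treats two or more as the 2-argument convention. The helper names are hypothetical, and the optimizer's own check may differ in edge cases such as *args.

```python
import inspect


def count_required_positional(func) -> int:
    """Number of positional parameters that have no default value."""
    positional_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    return sum(
        1
        for p in inspect.signature(func).parameters.values()
        if p.kind in positional_kinds and p.default is inspect.Parameter.empty
    )


def wants_context(func) -> bool:
    return count_required_positional(func) >= 2


one_arg = lambda m: m["H_total"]
defaulted = lambda m, ctx=None: m["H_total"]   # second parameter is optional
two_arg = lambda m, ctx: m["H_total"] - ctx.progress
```

Only two_arg would receive an EvalContext under this rule; defaulted is called with the metrics dict alone.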

ObjectiveType alias

ObjectiveType = (
    dict[str, float]
    | Callable[[dict[str, float]], float]
    | Callable[[dict[str, float], EvalContext], float]
)

EvalContext is exported from the top-level pasted namespace:

from pasted import EvalContext

Example — adaptive curriculum objective

from pasted import EvalContext, StructureOptimizer

def curriculum_objective(m: dict, ctx: EvalContext) -> float:
    """Broad exploration early, strong Q6 penalty late."""
    base = m["H_total"]
    if ctx.progress < 0.5:
        return base
    else:
        return base - 3.0 * m["Q6"]

opt = StructureOptimizer(
    n_atoms=15, charge=0, mult=1, elements="6,7,8,16",
    objective=curriculum_objective,
    method="annealing", max_steps=4000, seed=7,
)

Example — per-atom Q6 locality penalty

import numpy as np
from pasted import EvalContext

def local_disorder_objective(m: dict, ctx: EvalContext) -> float:
    q6_var = float(np.var(ctx.per_atom_q6))
    q6_max = float(np.max(ctx.per_atom_q6))
    return m["H_total"] + q6_var * 0.5 - q6_max * 1.0

T_end: float#

T_start: float#

__repr__()#: Return repr(self).

atoms: tuple[str, ...]#

best_f: float#

charge: int#

cutoff: float#

element_pool: tuple[str, ...]#

f_current: float#

max_steps: int#

method: str#

metrics: dict[str, float]#

mult: int#

n_atoms: int#

n_replicas: int | None#

n_restarts: int#

per_atom_q6: ndarray#

positions: tuple[tuple[float, float, float], ...]#

property progress: float#

Fractional progress of the current restart: step / max_steps.

Returns a float in [0.0, 1.0), useful for curriculum-style objectives that change behavior over the course of a run.
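As a variation on a hard switch at the halfway point, progress can also scale a penalty smoothly. The sketch below uses types.SimpleNamespace as a stand-in for EvalContext so it runs without the package. The names ramped_objective and fake_ctx are hypothetical.

```python
from types import SimpleNamespace


def ramped_objective(m: dict, ctx) -> float:
    """Q6 penalty grows linearly from 0 to (almost) 3.0 over the restart."""
    return m["H_total"] - 3.0 * ctx.progress * m["Q6"]


def fake_ctx(step: int, max_steps: int) -> SimpleNamespace:
    # Mirrors the documented definition: progress = step / max_steps.
    return SimpleNamespace(step=step, max_steps=max_steps, progress=step / max_steps)


metrics = {"H_total": 1.0, "Q6": 0.2}
early = ramped_objective(metrics, fake_ctx(0, 4000))
late = ramped_objective(metrics, fake_ctx(3999, 4000))
```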

replica_idx: int | None#

replica_temperature: float | None#

restart_idx: int#

seed: int | None#

step: int#

temperature: float#

to_xyz(comment: str = '') → str[source]#

Return a well-formed XYZ-format string for the current structure.

The string is suitable for writing directly to a .xyz file and passing to external tools such as xTB, ORCA, or any ASE calculator.

Parameters:: comment – Optional comment placed on the second line of the XYZ block. When empty, a default comment containing charge and multiplicity is generated automatically.
Returns:: Multi-line XYZ string (no trailing newline).
Return type:: str
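For reference, an XYZ block consists of the atom count, one comment line, and one "symbol x y z" line per atom. The sketch below builds such a string from the atoms and positions tuples described above. The function name, column widths, and default comment text are assumptions, not the output format of EvalContext.to_xyz().

```python
def to_xyz_sketch(atoms, positions, charge: int = 0, mult: int = 1, comment: str = "") -> str:
    """Build an XYZ-format string with no trailing newline (illustrative)."""
    if not comment:
        comment = f"charge={charge} mult={mult}"   # hypothetical default comment
    lines = [str(len(atoms)), comment]
    for symbol, (x, y, z) in zip(atoms, positions):
        lines.append(f"{symbol:<2s} {x:14.8f} {y:14.8f} {z:14.8f}")
    return "\n".join(lines)


xyz_text = to_xyz_sketch(("O", "H"), ((0.0, 0.0, 0.0), (0.96, 0.0, 0.0)))
```

The resulting string can be written to a .xyz file as-is.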
