

Once training has completed, the weight matrix W is quantized to signed 9-bit integers using
$$W \approx \frac{\max(W)}{255}\,\operatorname{clamp}\!\left(\operatorname{round}\!\left(\frac{W}{\max(W)}\times 255\right)\right)_{\min=-256}^{\max=255} = \frac{\max(W)}{255}\,W_{\mathrm{Q}}, \tag{4}$$
with the rounded and clamped matrix on the right-hand side being the quantized weight matrix $W_{\mathrm{Q}}$. Whenever we report AOC-DT results, we report results obtained with the quantized matrix. Exporting trained models to the AOC requires several further steps. First, the model inputs x and the bias term b are condensed into a single vector $b_{\mathrm{AOC}} = b + x$, followed by a clamp to ensure the values fit into the dynamic range of the AOC device (Supplementary Information section D). Second, as the optical matrix multiplication is implemented using SLMs, elements of the weight matrix are bounded by one, such that all quantization-related factors disappear. However, the original maximum element of the matrix max(W) needs to be re-injected, which we achieve via the β gain in equation (2), approximately restoring the original matrix W. The quantized matrix is split into positive and negative parts, $W_{\mathrm{Q}} = W_{\mathrm{Q}}^{+} - W_{\mathrm{Q}}^{-}$, and each part is displayed on its respective SLM.
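The quantization and SLM-splitting step can be illustrated with a minimal NumPy sketch; the interpretation of max(W) as the largest absolute entry and the function name are our assumptions, not the exported implementation.

```python
import numpy as np

def quantize_weights(W):
    """Sketch of equation (4): signed 9-bit quantization of the weight matrix.

    Assumes max(W) denotes the largest absolute entry of W; the scale
    max(W)/255 is re-applied in hardware via the beta gain of equation (2).
    """
    w_max = np.max(np.abs(W))
    W_q = np.clip(np.round(W / w_max * 255), -256, 255)  # integer matrix W_Q
    W_restored = w_max / 255 * W_q                        # approximately restores W
    # Split W_Q into positive and negative parts, one per SLM.
    W_q_pos = np.maximum(W_q, 0.0)
    W_q_neg = np.maximum(-W_q, 0.0)
    return W_q, W_q_pos, W_q_neg, W_restored
```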
AOC sampling and workflow
Each classification instance (that is, each MNIST or Fashion-MNIST test image) is run once on the AOC, and the fixed point is sampled at the point marked in Extended Data Fig. 3, after a short 2.5-μs cooldown window following the closing of the switch, as shown in Extended Data Fig. 5a,b. The sampling window extends over 40 samples at 6.25 MHz, corresponding to 6.4 μs. This ensures that the search for fixed points of the equilibrium models happens entirely in the analog domain. Once sampled, we digitally project the vector into the output space. For classification, the input is projected from 784 to 16 dimensions and the output is projected from 16 to 10 classes. The label is then determined by the index of the largest element in the output vector (argument-max). For regression tasks, the IP and OP layers transform a scalar to 16 dimensions and back, respectively. The MSE results in Fig. 2c were obtained by averaging over 11 repeats for each input. This means that we restart the solution process, including the sampling window, 11 times and average the resulting latent fixed-point vectors. Importantly, the solve-to-solve variability appears to be centred close to the curve produced by the AOC-DT, enabling us to average this variability out (Supplementary Fig. 6).
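A sketch of the digital post-processing for a single classification run is given below, assuming the 40-sample window is averaged to a single latent vector and that OP denotes the 16-to-10 output projection; the averaging, names and shape conventions are illustrative assumptions.

```python
import numpy as np

def classify_from_window(window_samples, OP):
    """Digital post-processing sketch for one AOC classification run.

    window_samples: (40, 16) array of latent samples from the 6.4-us window.
    OP: (10, 16) output projection (shape convention is illustrative).
    """
    latent = window_samples.mean(axis=0)  # fixed-point estimate (averaging is an assumption)
    logits = OP @ latent                  # project 16 -> 10 classes
    return int(np.argmax(logits))         # label = index of the largest element
```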
The 4,096-weight ensemble model
We can expand the model sizes supported by the hardware by using an ensemble of small models that fit on it. These smaller 256-weight models are independent at inference time but are trained jointly by receiving 16-sized slices of a larger input vector and stacking their outputs before the OP. To scale to a 4,096-weight equilibrium model, we expand the input space from 16 to 16 × 16 = 256 dimensions and the output space from 10 to 10 × 16 = 160 dimensions. The IP matrix is consequently a 784 × 256-shaped matrix and the OP matrix is shaped 160 × 10. MNIST or Fashion-MNIST images are scaled to the range (−1, 1), projected to 256 dimensions and split into 16 slices of 16 dimensions. Each of the 16 equilibrium models then runs its respective slice of the input vector to a fixed point. Once all 16 models have run on the AOC, we concatenate their outputs and project them into the 10-dimensional output space, where the largest dimension determines the predicted class.
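A sketch of the ensemble inference path follows, under the assumption that each of the 16 models contributes a 10-dimensional output slice (so that the concatenation is 160-dimensional) and that run_on_aoc stands in for solving one 16-dimensional equilibrium model on hardware; all names and shape conventions are illustrative.

```python
import numpy as np

def ensemble_predict(image, IP, OP, run_on_aoc):
    """Inference sketch for the 16-model (4,096-weight) ensemble."""
    x = image.reshape(-1)              # flattened image, scaled to (-1, 1)
    latent = IP @ x                    # project 784 -> 256 dimensions (IP assumed 256 x 784 here)
    slices = latent.reshape(16, 16)    # 16 slices of 16 dimensions each
    # Each small model maps its 16-dim slice to a 10-dim output slice (assumed).
    outputs = [run_on_aoc(i, s) for i, s in enumerate(slices)]
    stacked = np.concatenate(outputs)  # 10 x 16 = 160-dim concatenated output
    logits = stacked @ OP              # OP is shaped 160 x 10, giving 10 class scores
    return int(np.argmax(logits))      # largest dimension determines the predicted class
```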
Nonlinear regression
The first curve (I) is a Gaussian rescaled such that it approximately stretches from −1 to 1, $f_{\mathrm{I}}(x) = 2\mathrm{e}^{-x^{2}/2\sigma^{2}} - 1$ for σ = 0.25 and x ∈ (−1, 1). The second curve (II) is given by $f_{\mathrm{II}}(x) = \sqrt{|x|}\,\sin(3\pi x)$. For training sets, we choose 10,000 equidistant points $x_i$ in the range (−1, 1), whereas for test regression datasets, we choose 200 points randomly, $x_i \sim U((-1, 1))$.
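The two regression targets and the train/test sampling can be reproduced with a few lines of NumPy (variable names are ours):

```python
import numpy as np

def f_I(x, sigma=0.25):
    """Curve (I): Gaussian rescaled to stretch approximately from -1 to 1."""
    return 2.0 * np.exp(-x**2 / (2.0 * sigma**2)) - 1.0

def f_II(x):
    """Curve (II): sqrt(|x|) * sin(3*pi*x)."""
    return np.sqrt(np.abs(x)) * np.sin(3.0 * np.pi * x)

x_train = np.linspace(-1.0, 1.0, 10_000)         # 10,000 equidistant training points
x_test = np.random.uniform(-1.0, 1.0, size=200)  # 200 random test points, x ~ U((-1, 1))
y_train, y_test = f_I(x_train), f_I(x_test)      # likewise for f_II
```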
Error estimation
For regression tasks, we concatenate the 40 samples from all 11 repeats and calculate the standard deviation per point on the curve.
Classification datasets
We trained the MNIST and Fashion-MNIST models on 48,000 images from their respective training sets, validated them on a set of 12,000 images and tested them on the full test set comprising 10,000 images.
Error estimation
For experimental results, the error bars in Fig. 2d were estimated using a Bayesian approach for the decision variable $c_t \in \{0, 1, \ldots, 9\}$ for each sample t along the sampling window per image. We assume an uninformative prior $p(c_t) = \mathrm{Beta}(1, 1)$, which we then update with the observed number of correct decisions $n_{\mathrm{success}}$ and failures $n_{\mathrm{failure}}$ over the sampling window. The variance of the conjugate posterior of a beta distribution is given by $\mathrm{Var}(c_{t} \mid n_{\mathrm{success}}, n_{\mathrm{failure}}) = \frac{(1+n_{\mathrm{success}})(1+n_{\mathrm{failure}})}{(2+n_{\mathrm{success}}+n_{\mathrm{failure}})^{2}(3+n_{\mathrm{success}}+n_{\mathrm{failure}})}$. We use this to estimate the variance and, by taking the square root, the standard deviation per input image. The dataset error bars are then estimated as the mean of the standard deviations over all members of the dataset.
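A small sketch of this error estimate, assuming the per-image decisions are available as a boolean array of correct/incorrect outcomes along the sampling window (the array layout is our assumption):

```python
import numpy as np

def beta_posterior_std(n_success, n_failure):
    """Standard deviation of the Beta(1 + n_success, 1 + n_failure) posterior."""
    a, b = 1 + n_success, 1 + n_failure
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))  # matches the variance expression above
    return np.sqrt(var)

def dataset_error_bar(correct):
    """correct: (n_images, n_window_samples) boolean array of per-sample decisions."""
    n_success = correct.sum(axis=1)
    n_failure = correct.shape[1] - n_success
    return beta_posterior_std(n_success, n_failure).mean()  # mean std over the dataset
```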
Optimization methods
Positive and negative problem weights
To address optimization problems involving positive and negative weights on the AOC hardware, QUMO instances without linear terms can have up to eight variables; this applies to both the transaction-settlement scenarios and the reconstruction of a one-dimensional line of the Shepp–Logan phantom image. The weight matrices are unsigned in the synthetic QUMO and QUBO hardware benchmarks; hence the AOC hardware can accommodate up to 16-variable instances in the absence of linear terms. This difference in instance size arises because, when both positive and negative weights are present, non-idealities in the dual-SLM configuration reduce the accuracy of the matrix–vector multiplication. To mitigate this, a single SLM is used to process both positive and negative weights, effectively halving the number of variables per instance.
Industrial optimization problems
For the transaction-settlement scenario and the Shepp–Logan phantom image slice, their 41-variable and 64-variable QUMO instances are decomposed into smaller 7-variable QUMO instances. For each of these subinstances, the 7 variables are connected with the rest of the variables via a linear vector b, which is incorporated into the quadratic matrix W via an additional binary variable. This decomposition is repeated for each subinstance and the linear vector b is updated at the end of each BCD iteration to create the next QUMO instance. Such an approach yields 8-variable QUMO instances, and a single SLM is used to represent their positive and negative matrix elements, with analog electronics handling their subtraction, which effectively utilizes the full 16-variable capacity available in hardware. The required number of BCD iterations varies depending on factors such as the initial random state of the optimization instance variables, the selection of variable blocks among subinstances, and the order in which they are optimized.
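One BCD iteration over a 7-variable block can be sketched as follows; the energy convention E(x) = xᵀWx with symmetric W, the sign-based handling of the auxiliary variable and the solve_subinstance placeholder (standing in for the AOC solving the 8-variable subinstance) are assumptions for illustration.

```python
import numpy as np

def bcd_step(W, x, block, solve_subinstance):
    """One block-coordinate-descent step on a 7-variable block (sketch)."""
    block = np.asarray(block)
    rest = np.setdiff1d(np.arange(len(x)), block)
    W_bb = W[np.ix_(block, block)]                # 7 x 7 quadratic part of the block
    b = 2.0 * W[np.ix_(block, rest)] @ x[rest]    # linear field from the fixed variables
    # Fold b into an 8 x 8 matrix via an additional binary variable pinned to +1.
    W_sub = np.zeros((8, 8))
    W_sub[:7, :7] = W_bb
    W_sub[:7, 7] = W_sub[7, :7] = b / 2.0
    x_sub = solve_subinstance(W_sub)              # 8-variable QUMO solve (on the AOC)
    # If the auxiliary variable comes back as -1, flip the block (equivalent by symmetry).
    x[block] = x_sub[:7] * x_sub[7]
    return x
```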
For the one-dimensional Shepp–Logan phantom image, 12 out of 32 measurements are omitted, corresponding to a 37.5% data loss or a 1.6× undersampling (acceleration) rate. Although typical MRI acceleration ranges from 2× to 8×, this rate is used here owing to the image's non-smoothness at a 32-pixel resolution.
Binary and continuous variables
In the AOC, binary variables are encoded using a hyperbolic tangent function, whereas continuous variables utilize the near-linear region of the function, connecting optimization variables to state variables via x = f(s). In simulations at scale with the AOC-DT, linear and sign functions are used for continuous and binary variables, respectively.
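A sketch of the state-to-variable map x = f(s); the gain separating the saturated (binary) and near-linear (continuous) regimes is an illustrative assumption.

```python
import numpy as np

def encode_variables(s, is_binary, gain=5.0):
    """Map state variables s to optimization variables x = f(s) (sketch)."""
    s = np.asarray(s, dtype=float)
    # AOC: saturated tanh for binary variables, near-linear tanh region for continuous ones.
    x_hw = np.where(is_binary, np.tanh(gain * s), np.tanh(s))
    # AOC-DT at scale: sign for binary variables, linear (identity) for continuous ones.
    x_dt = np.where(is_binary, np.sign(s), s)
    return x_hw, x_dt
```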
Hardware QUMO instances
To ensure that some variables indeed take continuous values in the globally optimal solution, we plant random continuous values and generate synthetic 16-variable QUMO instances. As the number of continuous variables increases for a given problem size, the problem instances become computationally easier to solve. Consequently, we consider instances with up to eight continuous variables.
Hardware QUBO instances
We generate up to 8-bit dense and sparse instances. The sparse instances belong to the QUBO model on three-regular graphs, which is NP-hard51, although NP-hardness does not imply that every random instance is difficult to solve. To make these instances more challenging to solve, we verify that their global objective minimizer is distinct from the signs of the principal eigenvector of the weight matrix52.
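This screening step can be sketched by brute force for instances of this size; the ±1 variable convention and the choice of the largest-magnitude eigenvalue's eigenvector as the principal eigenvector are our assumptions.

```python
import itertools
import numpy as np

def is_nontrivial_instance(W):
    """Keep an instance only if the brute-force minimizer of x^T W x over
    x in {-1, +1}^n differs (up to a global sign) from the signs of the
    principal eigenvector of W (sketch)."""
    n = W.shape[0]
    eigvals, eigvecs = np.linalg.eigh(W)
    principal = eigvecs[:, np.argmax(np.abs(eigvals))]   # principal eigenvector (assumed)
    x_eig = np.sign(principal)
    best_x, best_e = None, np.inf
    for bits in itertools.product((-1.0, 1.0), repeat=n):
        x = np.array(bits)
        e = x @ W @ x
        if e < best_e:
            best_x, best_e = x, e
    same = np.array_equal(best_x, x_eig) or np.array_equal(best_x, -x_eig)
    return not same
```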
QPLIB benchmark
The QPLIB is a library of quadratic programming instances23 collected over an almost year-long open call from various communities, with the selected instances being challenging for state-of-the-art solvers. As described in the main part of the paper, we consider only the hardest instances within the QPLIB:QBL class of problems, which contains instances with quadratic objective and linear inequality constraints. The QPLIB:QCBO class of problems, which contains instances with quadratic objective and linear equality constraints, and the QPLIB:QBN class of problems, which contains QUBO instances, are considered in Supplementary Information section G.5.
AOC-DT operation and parameters
The distinguishing feature of the AOC-DT algorithm is the simultaneous inclusion of both momentum and annealing terms, which markedly improves the performance of the standard steepest gradient-descent method on non-convex optimization problems. Typically, multiple hyperparameters need to be calibrated for heuristic methods to achieve their best performance in solving optimization problems. We consider α