Neural networks
===============

Construction of a neural network (NN) interatomic potential requires three steps:

* generation of reference data, i.e., *ab initio* energies and, optionally, forces for relevant structures
* data parsing, i.e., converting the structural data into suitable NN inputs
* model training, i.e., fitting the NN model's parameters to the reference data

The following sections overview these steps in the framework of the MAISE package.

Reference data generation
-------------------------

An *ab initio* dataset can be generated with any code that produces the standard target values for training NN models with MAISE: total energies and atomic forces. Although the *ab initio* method for the reference calculations is chosen by the user, the data should be represented in a particular way to be readable by MAISE. The structural information should be specified in the VASP **POSCAR** file format and named ``POSCAR.0``, while the total energy, unit cell stress components (currently not used for NN training), and atomic forces corresponding to each structure should be given in a ``dat.dat`` file with the following format:

.. literalinclude:: ./file/dat.dat
   :language: none

This information can be easily extracted from an **OUTCAR** file of a single-point total energy/enthalpy VASP calculation using these bash commands:

.. literalinclude:: ./file/jdat
   :language: none

Each data directory should contain only one pair of the corresponding ``POSCAR.0`` and ``dat.dat`` files. The full set of directories should be organized in a particular fashion for MAISE to properly process the data into training and testing sets. The following diagram illustrates the expected hierarchy, while the `Data parsing`_ section provides further details on data organization.

.. figure:: ./figs/dataset.png
   :align: center

In this structure, each data point (a pair of ``POSCAR.0`` and ``dat.dat`` files) should be inside its own directory, and the collection of these data points should be in a parent directory, e.g., ``main_dir/``. In this example, MAISE will process the full directory ``CuAg`` specified in the ``setup`` file with the `DEPO <#depo1>`_ FLAG. It will determine that there are two direct subdirectories, or **batches**, and treat each **batch** as a group of comparable data in terms of composition, dimensionality, conditions, etc. For instance, ``108`` could be a collection of 3D crystal structures evaluated at 0 GPa, while ``114`` could be a set of small clusters. This separation allows MAISE to filter out unphysical or irrelevant structures using settings in the ``setup`` file. By optionally placing a ``tag`` file in a subdirectory, one can override the settings for that **batch**. Please see `Data filtering options`_ for more detail.

.. note:: In Ref. [1_] we introduced our approach to generating the NN training data with unconstrained evolutionary searches. A further generalization of that approach is provided in our recent study [2_], in which dataset generation is carried out in cycles of evolutionary runs. As discussed in these studies, we have found the following to be good practices in dataset generation:

   * inclusion of customized data in the training set, e.g., equation of state (EOS) data for select structures (the dimer, FCC, BCC, HCP, etc.), as such data teach the NN to disfavor unphysical configurations that can be inadvertently probed in global searches or MD runs
   * elimination of structures that are either too similar to each other or belong to regions of configuration space that are not of interest for the problem at hand
   * exclusion of structures with unphysically high energies or forces
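Before parsing, the directory layout can be sanity-checked with a few lines of Python. The sketch below is an illustrative helper, not part of MAISE; the ``CuAg`` parent directory name simply matches the example above. It reports any data-point directory that is missing one of the two required files:

.. code-block:: python

   from pathlib import Path

   def check_dataset(parent="CuAg"):
       """Flag data-point directories missing the POSCAR.0/dat.dat pair."""
       for batch in sorted(p for p in Path(parent).iterdir() if p.is_dir()):
           for point in sorted(p for p in batch.iterdir() if p.is_dir()):
               missing = [f for f in ("POSCAR.0", "dat.dat")
                          if not (point / f).is_file()]
               if missing:
                   print(f"{point}: missing {', '.join(missing)}")

   check_dataset()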
Data parsing
------------

The data processing step allows the user to filter out irrelevant configurations, earmark structures for training and testing, and parse atomic environments into NN inputs. Here, the idea is to precompute and store the NN inputs for each structure only once, to avoid performing this costly operation at each NN fitting step. Data parsing is done in a single run with the `JOBT <#jobt1>`__ = 30 FLAG in the ``setup`` file; it produces a file for each structure with the parsed energy/force NN inputs and collects statistics on the energy, force, volume, and RDF distributions in the full dataset.

The parsing task includes filtering, earmarking, and parsing operations. These operations can be customized by (i) choosing FLAGs in the ``setup`` file; (ii) arranging the data by type into subdirectories and specifying energy thresholds and the intended application type (training or testing) for acceptable data in a ``tag`` file; and (iii) specifying Behler-Parrinello (BP) symmetry functions [3_] in the ``basis`` file for converting atomic environments into NN input.

Data parsing setup
~~~~~~~~~~~~~~~~~~

`Table 1 <#setupparsetable>`_ lists the FLAGs in the MAISE ``setup`` file that define the data parsing task.

.. _setupparsetable:

.. table:: Table 1: Setup FLAGs that define the data parsing task.

   +------------------+-----------------------------------------------------------------------------+
   | FLAG             | Short description                                                           |
   +==================+=============================================================================+
   | `JOBT <#jobt1>`_ | Data parsing (30)                                                           |
   +------------------+-----------------------------------------------------------------------------+
   | `NPAR <#npar1>`_ | Number of cores for the parsing task                                        |
   +------------------+-----------------------------------------------------------------------------+
   | `TEFS <#tefs1>`_ | Parsing for: Energy (0); Energy-Force (1)                                   |
   +------------------+-----------------------------------------------------------------------------+
   | `FMRK <#fmrk1>`_ | Fraction of atoms in each structure parsed for EF training                  |
   +------------------+-----------------------------------------------------------------------------+
   | `NSPC <#nspc1>`_ | Number of element types for dataset parsing and training                    |
   +------------------+-----------------------------------------------------------------------------+
   | `TSPC <#tspc1>`_ | Atomic numbers of the elements specified with the NSPC tag                  |
   +------------------+-----------------------------------------------------------------------------+
   | `NSYM <#nsym1>`_ | Number of BP symmetry functions used for data parsing                       |
   +------------------+-----------------------------------------------------------------------------+
   | `NCMP <#ncmp1>`_ | The length of the input vector of the neural network                        |
   +------------------+-----------------------------------------------------------------------------+
   | `ECUT <#ecut1>`_ | Parse only this fraction of lowest-energy structures (from 0 to 1)          |
   +------------------+-----------------------------------------------------------------------------+
   | `EMAX <#emax1>`_ | Maximum energy above the lowest-energy structure that is parsed             |
   +------------------+-----------------------------------------------------------------------------+
   | `FMAX <#fmax1>`_ | Will not parse data with forces larger than this value                      |
   +------------------+-----------------------------------------------------------------------------+
   | `RAND <#rand1>`_ | Random seed for parsing: time (0); seed value (+); no randomization (-)     |
   +------------------+-----------------------------------------------------------------------------+
   | `DEPO <#depo1>`_ | Path to the DFT datasets to be parsed                                       |
   +------------------+-----------------------------------------------------------------------------+
   | `DATA <#data1>`_ | Location where the parsed data will be written                              |
   +------------------+-----------------------------------------------------------------------------+

.. _jobt1:

**JOBT**, set to 30, initiates the parsing task.

.. _npar1:

**NPAR** is the number of cores to be used in the parsing job. The parallelization is done over the atoms in each structure.

.. _tefs1:

**TEFS** defines what type of data parsing is performed for subsequent use in the NN model training and testing. With **TEFS** = 0, only energy (E) data are processed; with **TEFS** = 1, both energy and force (EF) data are processed.

.. _fmrk1:

**FMRK** is a real number between 0.0 and 1.0 defining what fraction of atoms in each structure, provided that `TEFS <#tefs1>`__ = 1, will be processed for subsequent energy-force training. For each marked atom, the code parses all x, y, and z components of the force. Note that forces below :math:`10^{-5}` eV/A are ignored, as they are likely close to zero by symmetry.

.. _nspc1:

**NSPC** is the total number of atomic species present in the dataset.

.. _tspc1:

**TSPC** is a list of the atomic numbers of all species present in the dataset. This list should be ordered from the lowest to the highest atomic number.

.. _nsym1:

**NSYM** is the total number of BP symmetry functions that will be used for data parsing. This number should match the number of symmetry functions introduced in the ``basis`` file.

.. _ncmp1:

**NCMP** is the total number of input vector components (per species in the case of multi-component systems) produced from the introduced BP symmetry functions. This number depends on the number of species in the system (`NSPC <#nspc1>`__) and on the split of the total into radial and angular functions. With :math:`N_2` radial and :math:`N_3` angular BP functions, this number equals :math:`\mathrm{NSPC} \times N_2 + \tfrac{1}{2}\mathrm{NSPC}(\mathrm{NSPC}+1) \times N_3`, where the second term counts one angular block for each unordered pair of neighbor species. If the user does not provide the correct input number for **NCMP**, MAISE will exit with a message that suggests the correct number.
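As an illustration of this counting, the expected **NCMP** value can be computed with the hypothetical helper below (not part of MAISE; the example :math:`N_2`/:math:`N_3` values are made up, as the actual split is set by the ``basis`` file):

.. code-block:: python

   def ncmp(nspc, n2, n3):
       # NSPC*N2 radial components plus one angular block for each
       # unordered pair of neighbor species: NSPC*(NSPC+1)/2 blocks
       return nspc * n2 + nspc * (nspc + 1) // 2 * n3

   print(ncmp(1, 8, 43))   # 1 species:  8 + 1*43 = 51
   print(ncmp(2, 8, 43))   # 2 species: 16 + 3*43 = 145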
.. _ecut1:

**ECUT** is a real number between 0.0 and 1.0 defining what fraction of the lowest-enthalpy structures will be kept in each **batch** (see `Reference data generation`_) after the structures in the **batch** are ranked by enthalpy per atom. The default value is 0.9.

.. _emax1:

**EMAX** is a cutoff in eV/atom for the highest-energy structure that will be parsed. This energy window is measured with respect to the lowest-enthalpy structure in each **batch**. The default value is 5.0 eV/atom.

.. _fmax1:

**FMAX** is a cutoff in eV/A. Any structure with an atomic force component higher than this value will not be parsed. The default value is 50 eV/A.

.. _rand1:

**RAND** is a random number seed that determines the arrangement of the structures in the training and testing sets. For **RAND** = 0, the system time will be used; a **RAND** > 0 value will be used as the seed; and for any **RAND** < 0 the structures will be parsed in the order they appear in the operating system list output. This list of parsed structures will be used at the training stage to pick the training and testing sets.

.. _depo1:

**DEPO** is the path to the DFT data to be parsed. MAISE expects the dataset to be arranged in the format described in the `Reference data generation`_ section.

.. _data1:

**DATA** is the path to the location in which the parsed data will be stored.

Data filtering options
~~~~~~~~~~~~~~~~~~~~~~

In data filtering, the `ECUT <#ecut1>`__, `EMAX <#emax1>`__, and `FMAX <#fmax1>`__ FLAGs described in `Table 1 <#setupparsetable>`_ control the maximum values of energy (enthalpy) and forces allowed in the database. A single energy cutoff is ill-defined or not helpful if the database contains entries with different structure types (clusters or crystal structures), compositions (in multielement systems), or simulation conditions (pressure values). Provided that the data is sorted into **batch** subdirectories by type, `ECUT <#ecut1>`__ and `EMAX <#emax1>`__ are applied to the energy per atom within each subset. These values can be overridden for a specific subset by placing a ``tag`` file in the corresponding subdirectory. This ``tag`` file can also be used to promote the inclusion of the subset, e.g., EOS data, into the training set. Below is an example of a ``tag`` file illustrating which ``setup`` FLAGs will be overridden when this **batch** subdirectory is processed.

.. literalinclude:: ./file/tag

Behler-Parrinello descriptor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MAISE relies on BP symmetry functions for describing the atomic environments [9_], [10_]. A customizable set of BP functions is defined in the ``basis`` file. Below is an example of a ``basis`` file with a typical descriptor that has been used for generating the standard library of NN models in MAISE. In this set of models, we typically use 51 functions per element, with the cutoff expanded from 6.0 A to 7.5 A and the corresponding :math:`\eta` parameters rescaled by a factor of :math:`1/1.25^2` (for more details see our previous study [4_]).

The GN values of 2 and 4 correspond to the pair and triplet BP functions defined as G1 and G2 in Ref. [9_] or as G2 and G4 in Ref. [10_]. n1 and n2 are the :math:`\eta` parameters in the G2 and G4 functions, while l is :math:`\lambda` in G4. k is an obsolete parameter no longer used in the BP functions. z is the :math:`\zeta` power value in G4.

The original set of BP parameters was defined in Bohr, while MAISE performs calculations in Angstroms. So, if the Rc, n1, and n2 parameters are specified in Bohr, they will be converted to Angstroms using the conversion factor a = 0.529177249 as follows: Rc = a*Rc, n1 = n1/(a*a), and n2 = n2/(a*a). The original BP functions' cutoff radius was 6.0 A, while our current default is 7.5 A. So, we use r = 1.25 to rescale these parameters in the same way: Rc = r*Rc, n1 = n1/(r*r), and n2 = n2/(r*r). Note that if you define your own BP parameters in Angstroms, you can set both a and r to 1.0.

.. literalinclude:: ./file/basis
   :language: none
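The conversion and rescaling above are easy to verify with a short Python sketch. The function names and example values below are illustrative (the G2 form with :math:`R_s = 0` follows Ref. [10_]); this is not MAISE source code:

.. code-block:: python

   import numpy as np

   A = 0.529177249   # Bohr -> Angstrom conversion factor (a)
   R = 1.25          # cutoff rescaling factor (r), 7.5 A / 6.0 A

   def convert(rc_bohr, eta_bohr):
       """Apply Rc = a*r*Rc and eta = eta/(a*a*r*r)."""
       return rc_bohr * A * R, eta_bohr / (A * A * R * R)

   def fc(rij, rc):
       """BP cosine cutoff function, zero beyond Rc."""
       return np.where(rij <= rc, 0.5 * (np.cos(np.pi * rij / rc) + 1.0), 0.0)

   def g2(rij, eta, rc):
       """Pair symmetry function G2 (Rs = 0): sum_j exp(-eta*Rij^2)*fc(Rij)."""
       rij = np.asarray(rij)
       return float(np.sum(np.exp(-eta * rij ** 2) * fc(rij, rc)))

   rc, eta = convert(11.338, 0.5)   # 11.338 Bohr is about the original 6.0 A cutoff
   print(rc, eta, g2([2.5, 3.1, 4.2], eta, rc))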
Data parsing execution
~~~~~~~~~~~~~~~~~~~~~~

With the required input files, i.e., the ``setup`` and ``basis`` files, in the current directory, the parsing task can be performed by running MAISE:

.. code-block:: console

   $ maise

For a typical database of a few thousand medium-size structures, the job takes a few minutes.

Data parsing output
~~~~~~~~~~~~~~~~~~~

The output of the data parsing job is as follows:

* A set of ``e*`` files contains the *ab initio* target energy/force values along with the structural information converted to NN input vectors with the help of the chosen BP symmetry functions. These ``e*`` files will be imported at the time of NN fitting to create the training and testing sets.
* A ``stamp.dat`` file summarizes the most important parameters/specifications of the parsed data. An example of the ``stamp.dat`` file is presented here:

  .. literalinclude:: ./file/stamp.dat
     :language: none

* The ``index.dat`` file contains an ordered list of the parsed structures. The order is determined by the `RAND <#rand1>`__ FLAG in the parsing setup.
* ``ve.dat`` contains a list of the volume per atom, energy per atom, and maximum atomic force component for each parsed structure (see the sketch after this list).
* The ``RDFP.dat`` file contains the average RDF profile of the parsed dataset.
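Since ``ve.dat`` is a plain text file, the parsed dataset can be inspected quickly. The sketch below assumes a whitespace-separated, three-column layout matching the description above; adjust it if the actual file differs:

.. code-block:: python

   import numpy as np

   # Assumed columns: volume/atom (A^3), energy/atom (eV), max force (eV/A)
   v, e, f = np.loadtxt("ve.dat", unpack=True)

   print(f"{len(e)} parsed structures")
   print(f"energy/atom range: {e.min():.3f} to {e.max():.3f} eV")
   print(f"volume/atom range: {v.min():.2f} to {v.max():.2f} A^3")
   print(f"largest force component: {f.max():.2f} eV/A")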
Neural network training
-----------------------

The default NN implemented in MAISE has a standard feed-forward architecture with one bias per input or hidden layer. Signals are processed with hyperbolic tangent activation functions in the hidden layers and with the linear function in the output neuron. The filtered and parsed data can be split into training and testing sets; data earmarked for training with ``tag`` files in the corresponding subdirectories (see the `Data filtering options`_ section) has a higher priority for placement into the training set. NN fitting via backpropagation can be performed with the BFGS [5_] or CG [6_] algorithms, as implemented in the GSL [7_], by minimizing the root-mean-square error between the target and NN output values.

Energy-only (E) and energy-force (EF) training types are available. For the latter, please make sure that the data is parsed into both energy and force NN inputs (FLAG `TEFS <#tefs1>`_ = 1). Besides the traditional full training in which all NN weights are optimized, MAISE has an option to use the **stratified** and **generalized stratified** training schemes. These approaches are briefly introduced in the following sections, while a detailed description is available in Refs. [1_] and [2_], respectively.
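To make the architecture concrete, the toy sketch below evaluates such a network for one atomic input vector. The layer sizes and random weights are made up for illustration; this is a conceptual sketch, not MAISE's internal implementation:

.. code-block:: python

   import numpy as np

   def nn_output(x, weights, biases):
       """Feed-forward pass: tanh in the hidden layers, linear single-neuron
       output, one bias vector per hidden/output layer."""
       h = x
       for w, b in zip(weights[:-1], biases[:-1]):
           h = np.tanh(w @ h + b)
       return float(weights[-1] @ h + biases[-1])

   # Toy shapes: 51 inputs, two hidden layers with 10 neurons each
   rng = np.random.default_rng(1)
   weights = [rng.normal(size=s) for s in ((10, 51), (10, 10), (1, 10))]
   biases = [rng.normal(size=s) for s in (10, 10, 1)]
   print(nn_output(rng.normal(size=51), weights, biases))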
NN training setup
~~~~~~~~~~~~~~~~~

A NN training job is fully configured in a single ``setup`` file.

.. _setuptraintable:

.. table:: Table 2: Parameters for NN training.

   +------------------+-----------------------------------------------------------------------------+
   | FLAG             | Short description                                                           |
   +==================+=============================================================================+
   | `JOBT <#jobt2>`_ | Training type: full training (40); stratified training (41)                |
   +------------------+-----------------------------------------------------------------------------+
   | `NPAR <#npar2>`_ | Number of cores for parallel training                                       |
   +------------------+-----------------------------------------------------------------------------+
   | `MINT <#mint2>`_ | The optimizer algorithm for neural network training                         |
   +------------------+-----------------------------------------------------------------------------+
   | `MITR <#mitr2>`_ | Number of optimization steps for training                                   |
   +------------------+-----------------------------------------------------------------------------+
   | `ETOL <#etol2>`_ | Error tolerance for training                                                |
   +------------------+-----------------------------------------------------------------------------+
   | `TEFS <#tefs2>`_ | Training target value: E (0); EF (1)                                        |
   +------------------+-----------------------------------------------------------------------------+
   | `NSPC <#nspc2>`_ | Number of element types for dataset parsing and training                    |
   +------------------+-----------------------------------------------------------------------------+
   | `TSPC <#tspc2>`_ | Atomic numbers of the elements specified with the NSPC tag                  |
   +------------------+-----------------------------------------------------------------------------+
   | `NSYM <#nsym2>`_ | Number of BP symmetry functions used for data parsing                       |
   +------------------+-----------------------------------------------------------------------------+
   | `NCMP <#ncmp2>`_ | The length of the input vector of the neural network                        |
   +------------------+-----------------------------------------------------------------------------+
   | `NTRN <#ntrn2>`_ | Number of structures used for training (negative number means percentage)   |
   +------------------+-----------------------------------------------------------------------------+
   | `NTST <#ntst2>`_ | Number of structures used for testing (negative number means percentage)    |
   +------------------+-----------------------------------------------------------------------------+
   | `NNNN <#nnnn2>`_ | Number of hidden layers (does not include input vector and output neuron)   |
   +------------------+-----------------------------------------------------------------------------+
   | `NNNU <#nnnu2>`_ | Number of neurons in hidden layers                                          |
   +------------------+-----------------------------------------------------------------------------+
   | `NNGT <#nngt2>`_ | Activation function of the hidden layers' neurons: linear (0); tanh (1)     |
   +------------------+-----------------------------------------------------------------------------+
   | `LREG <#lreg2>`_ | Regularization parameter                                                    |
   +------------------+-----------------------------------------------------------------------------+
   | `SEED <#seed2>`_ | Random seed for generating NN weights (0 for system time)                   |
   +------------------+-----------------------------------------------------------------------------+
   | `DATA <#data2>`_ | Location of the parsed data to read from for training                       |
   +------------------+-----------------------------------------------------------------------------+
   | `OTPT <#otpt2>`_ | Directory for storing model parameters in the training process              |
   +------------------+-----------------------------------------------------------------------------+
   | `EVAL <#eval2>`_ | Directory for model testing data                                            |
   +------------------+-----------------------------------------------------------------------------+

.. _jobt2:

**JOBT** specifies the training task and its type:

* **40** full training
* **41** stratified training

.. _npar2:

**NPAR** is the number of cores to be used for the NN training. Here, the parallelization is done over all structures in the training/testing set.

.. _mint2:

**MINT** specifies the type of optimizer to be used for the NN training. BFGS2 typically provides the most efficient optimization.

* **0** BFGS2
* **1** CG-FR
* **2** CG-PR
* **3** steepest descent

.. _mitr2:

**MITR** is the number of optimization steps (epochs) to be performed in the NN training task.

.. _etol2:

**ETOL** is a minimization stopping criterion. The training exits if the difference between the total errors for two subsequent steps falls below this value. Usually, it is better to set **ETOL** to a very small value and control the length of the NN training with the `MITR <#mitr2>`__ FLAG.

.. _tefs2:

**TEFS** is the type of target value to be used in the NN training:

* **0** energy only
* **1** energy and force

This FLAG should be consistent with the corresponding `TEFS <#tefs1>`_ FLAG in the data parsing. Force training is possible only if the force data is processed during the parsing stage.

.. _nspc2:

**NSPC** is the total number of atomic species present in the dataset.

.. _tspc2:

**TSPC** is a list of the atomic numbers of all species present in the dataset. This list should be ordered from the lowest to the highest atomic number.

.. _nsym2:

**NSYM** is the total number of BP symmetry functions used for data parsing. This number should match the number of symmetry functions introduced in the ``basis`` file and be consistent with the corresponding value used at the parsing stage. The value can be retrieved from the ``stamp.dat`` file.

.. _ncmp2:

**NCMP** is the total number of input vector components (per species in the case of multi-component systems) produced from the introduced BP symmetry functions. It should be consistent with the corresponding value used at the parsing stage. The value can be retrieved from the ``stamp.dat`` file.

.. _ntrn2:

**NTRN** is the number (if **NTRN** is positive) or the fraction (if **NTRN** is negative) of the parsed data which will be used for the NN training. At the time of parsing, a list of structures is generated in the ``index.dat`` file in the parsed data location; the **NTRN** FLAG will read that list and import **NTRN** number/percent of the data from the *beginning* of the list for training.

.. _ntst2:

**NTST** is the number (if **NTST** is positive) or the fraction (if **NTST** is negative) of the parsed data which will be used for the NN testing. At the time of parsing, a list of structures is generated in the ``index.dat`` file in the parsed data location; the **NTST** FLAG will read that list and import **NTST** number/percent of the data from the *end* of the list for the NN testing.
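This selection can be reproduced conceptually with the sketch below. It is an illustration of the NTRN/NTST semantics just described, not MAISE code; it assumes ``index.dat`` lists one structure per line:

.. code-block:: python

   def split(structures, ntrn, ntst):
       """Training from the start of the list, testing from the end;
       negative NTRN/NTST values are interpreted as percentages."""
       n = len(structures)
       n_trn = round(-ntrn * n / 100) if ntrn < 0 else ntrn
       n_tst = round(-ntst * n / 100) if ntst < 0 else ntst
       return structures[:n_trn], structures[n - n_tst:]

   with open("index.dat") as f:
       names = f.read().splitlines()
   train, test = split(names, ntrn=-80, ntst=-20)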
.. _nnnn2:

**NNNN** is the number of hidden layers in the NN, excluding the input and the single-neuron output layers. Currently, MAISE supports up to 2 hidden layers (hence, a total of 4 layers).

.. _nnnu2:

**NNNU** is a list specifying the number of neurons in each hidden layer. The number of provided values should match the `NNNN <#nnnn2>`__ FLAG.

.. _nngt2:

**NNGT** is the type of the activation function for neurons in each hidden layer:

* **0** linear
* **1** tanh

The input layer has no neurons, while the neuron in the output layer always has the linear activation function.

.. _lreg2:

**LREG** is the magnitude of the L\ :sub:`2`\  regularization parameter. Typical values are between :math:`10^{-8}` and :math:`10^{-6}`.

.. _seed2:

**SEED** is the seed for randomizing the initial values of the NN weights at the beginning of the NN training. For **SEED** = 0, the system time will be used; a **SEED** > 0 value will be used as the seed.

.. _data2:

**DATA** specifies the location of the parsed data to read from for the NN training.

.. _otpt2:

**OTPT** specifies the directory for storing model parameters in the training process.

.. _eval2:

**EVAL** specifies the directory for model testing data. The format of this evaluation data is not yet described in this manual.

Training job submission
~~~~~~~~~~~~~~~~~~~~~~~

With the ``setup`` file in the current directory, the training task can be performed by running MAISE:

.. code-block:: console

   $ maise

However, as training is a time-consuming task, it is advisable to submit the job to a compute node using a queueing system. Make sure to match the number of requested cores with the `NPAR <#npar2>`_ parallelization FLAG. An example of the submission script for a **slurm** cluster is as follows:

.. literalinclude:: ./file/jtrn
   :language: none

.. note::

   * In the case of **stratified training**, the constituent models should be placed in the working directory, e.g., ``Cu.dat`` and ``Pd.dat`` for fitting the Cu-Pd binary NN, or ``CuPd.dat``, ``CuAg.dat``, and ``PdAg.dat`` for fitting the Cu-Pd-Ag ternary NN.
   * Presently, MAISE allows for training NN models with up to 3 elements. While the treatment of systems with more elements is conceptually possible, the practical cost of data generation and parameter optimization becomes expensive.

NN training output
~~~~~~~~~~~~~~~~~~

After the training job is finished, a set of output files is generated by MAISE. These files and their contents are as follows:

* The ``model`` file is the main output of the training job. It contains a header section with the model/system specifications and training/testing errors, the optimized parameters of the NN model, and a copy of the ``basis`` file used for the data parsing. Here is an example of the header section of a ``model`` file:

  .. literalinclude:: ./file/model
     :lines: 1-38
     :language: none

* ``err-out.dat`` stores the residual error during the optimization process.
* ``err-ene.dat`` contains a list of the *ab initio* target energies, the energies produced by the optimized model, and their difference (i.e., the error in estimating the total energy) for all structures in the training and testing sets; a quick way to summarize these errors is sketched below.
* ``err-frc.dat`` is produced only in the case of energy-force training and contains the average errors in evaluating the atomic force components for the structures in the training and testing sets.

Available MAISE models are listed `here `__.
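For a quick summary of the energy errors, ``err-ene.dat`` can be post-processed as sketched below. The plain three-column numeric layout is an assumption based on the description above; adjust for the actual file contents:

.. code-block:: python

   import numpy as np

   # Assumed columns: target energy, NN energy, difference (eV/atom)
   target, nn, diff = np.loadtxt("err-ene.dat", unpack=True)

   print(f"RMSE: {1000.0 * np.sqrt(np.mean(diff ** 2)):.1f} meV/atom")
   print(f"MAE:  {1000.0 * np.mean(np.abs(diff)):.1f} meV/atom")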
Stratified training
~~~~~~~~~~~~~~~~~~~

Under ideal conditions - given a complete basis for representing atomic environments within a large cutoff sphere, an unlimited number of adjustable parameters and reference data, and a powerful fitting algorithm - a multielement NN with fully optimized elemental and interspecies weights is expected to accurately map the PES for all subsystems. In practice, the use of approximations leads to the following problem.

Suppose one wishes to fit a model describing A, B, and AB phases given three datasets of A, B, and AB structures. Let's say that the PES of element A happens to be trivial and can be approximated with negligible error in the region spanned by the A data. If one now fits all parameters simultaneously to the full A, B, and AB dataset, the larger error will be distributed across all elemental and binary systems. In other words, the addition of B and AB data unphysically alters the description of the elemental A phases. In the stratified scheme, the elemental NNs are fitted first, and their weights are then inherited and kept fixed while only the new interspecies weights are optimized, which keeps the description of the elemental subsystems intact. It should be noted that the constrained NN architecture does account for the change in the interaction strength between A atoms induced by the presence of B atoms, because the AA/AAA inputs are mixed in with the AB/AAB/ABB inputs via the neurons' non-linear activation functions.

In addition to having a more sound foundation, the stratification procedure significantly accelerates the creation of NN libraries. For example, the full training of a binary AB model on all A, B, and AB data takes about the same time as the sequential training of A, B, and AB models on the corresponding data subsets. However, for an extended block of A, B, and C elements, the standard approach involves the fitting of AC and BC NNs from scratch, while the inheritance of A and B weights in the stratified scheme reduces the total fitting time by at least a factor of two. The speed-up increases dramatically as more elements are added and ternary models are built. Here is a schematic of the stratified training implemented in the MAISE code:

.. figure:: ./figs/strat.png
   :scale: 60%
   :align: center

   Schematic representation of stratified training of multilayer NNs. Here we show only one hidden layer of neurons and only pair input symmetry functions. Weights and element species shown in bold green are adjusted, while the ones in thin red are kept fixed during the training from the bottom up. The letters denote species types in the input vector; for example, AAB describes a triplet symmetry function centered on an A-type atom with A-type and B-type neighbors (the order of neighbors is irrelevant).
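The essence of the scheme, freezing the inherited elemental weights while optimizing the new interspecies ones, can be written as a masked gradient step. The toy vector below follows the binary first-layer example from the caption (inputs AA, AB, AAA, AAB, ABB), with 60% of the components adjustable; the numbers are made up and the sketch is purely conceptual:

.. code-block:: python

   import numpy as np

   def masked_step(w, grad, mask, lr=0.01):
       """Gradient update that leaves frozen (mask = 0) weights intact."""
       return w - lr * grad * mask

   # Toy first-layer weights for inputs [AA, AB, AAA, AAB, ABB]:
   # elemental AA/AAA weights are inherited and frozen, the rest adjust.
   w = np.array([0.30, -0.10, 0.70, 0.20, -0.50])
   mask = np.array([0.0, 1.0, 0.0, 1.0, 1.0])
   grad = np.array([0.05, -0.02, 0.01, 0.03, -0.04])
   print(masked_step(w, grad, mask))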
Generalized stratified training
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to extend the stratified procedure to materials with more complex interactions and an arbitrary number of elements, we have considered more flexible NN architectures that still preserve the intact description of the subsystems. Compared to the original stratified NN layout [1_], they involve the addition of new neurons, shown as green units in the figure below, with different connection patterns and conditions.

.. figure:: ./figs/stratplus.png
   :scale: 60%
   :align: center

   Schematic illustration of stratified+ (top row) and stratified± (bottom row) NN architectures for a binary chemical system. The expansion of the original stratified architecture is done with the addition of new neurons shown in green. The weights of elemental NNs (middle row) are copied and kept fixed in all stratified variations. Free, coupled, and fixed weights are shown in green, yellow, and red, respectively. (a) Connections in a simplified NN with one hidden layer and only pair inputs. The partial constraints shown in yellow and explained in the main text ensure an intact description of the elemental structures. (b) Color-coded degrees of weight constraints in NNs with pair and triplet inputs. The original and stratified+ schemes have 60% adjustable weights in the first layer in binaries, 11% in ternaries (e.g., only the last one among AA, AB, AC, AAA, AAB, AAC, ABB, ACC, ABC; see Ref. [1_]), and none in quaternaries. The stratified± architecture can be used for an arbitrary number of chemical elements.

The schematic of a 'stratified+' binary NN (top row in the figure above) illustrates that as long as there are no connections from the inputs or neurons in the elemental subnets to the inserted neurons, the new adjustable weights do not alter the signal processing for pure elemental structures. Despite the added flexibility, the NN still does not allow the proper fitting of interactions in compounds with more than three chemical elements. Indeed, the adjustable parts of such NNs involve 60% of inputs in binaries (top right box in the figure above), 11% of inputs in ternaries (see the figure caption), and none for systems with more elements. This restriction is imposed by the NN architecture and can be lifted as follows.

The 'stratified±' expansion (bottom row in the figure above) introduces semi-adjustable links even in the inherited parts of the merged NN. We add neurons in pairs, coupling the two weights incoming from each subsystem input to have opposite values while coupling the two outgoing weights to be the same. For a purely elemental structure, the interspecies input values are zero, and the net signal (at neuron 5) from each elemental input (1) passed through the paired neurons (3&4) will be zero as well, regardless of the coupled weight magnitudes. For a binary structure, the non-zero binary inputs multiplied by fully unconstrained weights will unbalance the elemental signals because of the non-linear nature of the activation function, resulting in a non-zero contribution at neuron 5 that depends on both elemental and binary (semi)adjustable weights. The set of new partially constrained weights shown in yellow in the figure above enables the stratified± NN to better capture the screening and charge transfer effects as well as describe interactions in systems with an unlimited number of species.

In a trial implementation, we imposed the constraint by penalizing the mismatch between the coupled weights as :math:`\sum_N \sigma (w_{1,N} \pm w_{2,N})^2`. We have observed no need to adjust the :math:`\sigma` penalty factor during the NN optimization, as the differences between the coupled weight magnitudes become negligible after a few dozen training steps; near the end of the optimization, we set the magnitudes to their average and keep them fixed without any appreciable effect on the error. To the best of our knowledge, this semi-constrained solution for systematically expanding NN features has not been considered in the field of materials modeling.
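A minimal sketch of such a penalty term for one pair of inserted neurons is given below; the notation is ours and the values are made up, so this is an illustration of the constraint rather than MAISE source code:

.. code-block:: python

   import numpy as np

   def coupling_penalty(w1_in, w2_in, w1_out, w2_out, sigma=1.0):
       """sigma * sum_N (w1_N +/- w2_N)^2: incoming weight pairs are pushed
       toward opposite values (+), outgoing pairs toward equal values (-)."""
       return sigma * (np.sum((w1_in + w2_in) ** 2) +
                       np.sum((w1_out - w2_out) ** 2))

   w1_in, w2_in = np.array([0.40, -0.20]), np.array([-0.38, 0.21])
   w1_out, w2_out = np.array([0.90]), np.array([0.88])
   print(coupling_penalty(w1_in, w2_in, w1_out, w2_out))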
One way to determine whether the use of the expanded NN architectures is warranted is to reoptimize the standard stratified NN without any constraints on the full dataset. A significant reduction in the training and testing errors would indicate the need for additional NN flexibility. In our studies of metal alloys, the error reductions are usually in the 0-15% range (e.g., see Figure 4 in Ref. [1_]). Our preliminary tests have shown that both the stratified+ and stratified± architectures end up with errors about midway between those in the stratified and full NNs. In order to quantify the improvements arising from the additional degrees of freedom in each scheme, we plan to investigate more challenging systems comprised of different element types in future studies.

.. _1: https://journals.aps.org/prb/abstract/10.1103/PhysRevB.95.014114
.. _2: https://arxiv.org/abs/2005.12131
.. _3: https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.98.146401
.. _4: https://pubs.rsc.org/en/content/articlelanding/2018/CP/C8CP05314F#!divAbstract
.. _5: https://onlinelibrary.wiley.com/doi/abs/10.1002/jcc.540030212
.. _6: https://nvlpubs.nist.gov/nistpubs/jres/049/jresv49n6p409_A1b.pdf
.. _7: http://www.gnu.org/software/gsl/
.. _8: https://github.com/maise-guide/maise-net
.. _9: https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.98.146401
.. _10: https://aip.scitation.org/doi/abs/10.1063/1.3553717?journalCode=jcp