Neural networks
===============
Construction of a neural network (NN) interatomic potential requires three steps:
* generation of reference data, i.e., *ab initio* energy and, optionally, forces for relevant structures
* data parsing, i.e., converting the structural data into suitable NN input
* model training, i.e., fitting the NN model's parameters to the reference data
The following sections overview these steps in the framework of the MAISE package.
Reference data generation
-------------------------
An *ab initio* dataset can be generated with any code that produces
standard target values for training NN models with MAISE: total
energies and atomic forces. Although the *ab initio* method for
reference calculations is chosen by the user, the data should be
represented in a particular way to be readable by MAISE. The
structural information should be specified in the VASP **POSCAR** file
format and named ``POSCAR.0`` while the total energy, unit cell stress
components (currently not used for NN training), and atomic forces
corresponding to each structure should be given in a ``dat.dat`` file
with the following format:
.. literalinclude:: ./file/dat.dat
:language: none
This information can be easily extracted from an **OUTCAR** file of a
single-point total energy/enthalpy VASP calculation using these bash
commands:
.. literalinclude:: ./file/jdat
:language: none
Each data directory should contain only one pair of the corresponding
``POSCAR.0`` and ``dat.dat`` files. The full set of directories should
be organized in a particular fashion for MAISE to properly process the
data into training and testing sets. The following diagram illustrates
the expected hierarchy, while the `Data parsing`_ section provides
further details on data organization.
.. figure:: ./figs/dataset.png
:align: center
In this structure, each data point (a pair of ``POSCAR.0`` and
``dat.dat`` files) should be inside a directory and the collection of
these data points should be in a parent directory, e.g.,
``main_dir/``.
In this example, MAISE will process the full directory ``CuAg``
specified in the ``setup`` file with the `DEPO <#depo1>`_ FLAG. It will
determine that there are two direct subdirectories, or **batches**, and
treat each **batch** as a group of comparable data in terms of
composition, dimensionality, conditions, etc. For instance, ``108``
could be a collection of 3D crystal structures evaluated at 0 GPa,
while ``114`` could be a set of small clusters. This separation allows
MAISE to filter out unphysical or irrelevant structures using settings
in the ``setup`` file. By optionally placing a ``tag`` file in a
subdirectory, one can overwrite the settings for that
**batch**. Please see `Data filtering options`_ for more detail.
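In plain-text form, the hierarchy from this example might look as
follows (the structure directory names under each **batch** are
arbitrary):

.. code-block:: none

   CuAg/                  <-- full directory given by the DEPO FLAG
   |-- 108/               <-- batch of, e.g., 3D crystals at 0 GPa
   |   |-- 0001/
   |   |   |-- POSCAR.0
   |   |   `-- dat.dat
   |   `-- 0002/
   |       |-- POSCAR.0
   |       `-- dat.dat
   `-- 114/               <-- batch of, e.g., small clusters
       |-- tag            <-- optional: overrides setup FLAGs for this batch
       `-- 0001/
           |-- POSCAR.0
           `-- dat.dat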
.. note::
   In Ref. [1_] we introduced our approach to generating the NN
   training data using unconstrained evolutionary searches. A further
   generalization of that approach is provided in our recent study
   [2_], in which dataset generation is carried out in cycles of
   evolutionary runs. As discussed in these studies, we have found the
   following practices to be effective in dataset generation:

   * inclusion of some customized data in the training set, e.g.,
     equation of state (EOS) data for select structures (the dimer,
     FCC, BCC, HCP, etc.), as they teach the NN to disfavor unphysical
     configurations that can be inadvertently probed in global
     searches or MD runs
   * elimination of structures that are either too similar to each
     other or clearly outside the regions of configuration space
     relevant to the problem at hand
   * exclusion of structures with unphysically high energies or forces
Data parsing
------------
The data processing step allows the user to filter out irrelevant
configurations, earmark structures for training and testing, and parse
atomic environments into NN inputs. Here, the idea is to precompute
and store NN inputs for each structure only once to avoid performing
this costly operation at each NN fitting step. Data parsing is done in
a single run with the `JOBT <#jobt1>`__ = 30 FLAG in the ``setup``
file, produces a file for each structure with parsed energy/force NN
inputs, and collects statistics on the energy, force, volume, and RDF
distributions in the full dataset. The parsing task includes
filtering, earmarking, and parsing operations. These operations can be
customized by (i) choosing FLAGs in the ``setup`` file; (ii) arranging
the data by type into subdirectories and specifying energy thresholds
and the intended application type (training or testing) for acceptable
data in a ``tag`` file; and (iii) specifying Behler-Parrinello (BP)
symmetry functions [3_] in the ``basis`` file for converting atomic
environments into NN input.
Data parsing setup
~~~~~~~~~~~~~~~~~~
`Table 1 <#setupparsetable>`_ lists FLAGs in the MAISE ``setup`` file that define
the data parsing task.
.. _setupparsetable:
.. table:: Table 1: Setup FLAGs that define the data parsing task.
   +------------------+-----------------------------------------------------------------------------+
   | FLAG             | Short description                                                           |
   +==================+=============================================================================+
   | `JOBT <#jobt1>`_ | Data parsing (30)                                                           |
   +------------------+-----------------------------------------------------------------------------+
   | `NPAR <#npar1>`_ | Number of cores for the parsing task                                        |
   +------------------+-----------------------------------------------------------------------------+
   | `TEFS <#tefs1>`_ | Parsing for: Energy (0); Energy-Force (1)                                   |
   +------------------+-----------------------------------------------------------------------------+
   | `FMRK <#fmrk1>`_ | Fraction of atoms in each structure to be marked for EF training            |
   +------------------+-----------------------------------------------------------------------------+
   | `NSPC <#nspc1>`_ | Number of element types for dataset parsing and training                    |
   +------------------+-----------------------------------------------------------------------------+
   | `TSPC <#tspc1>`_ | Atomic numbers of the elements specified with the NSPC FLAG                 |
   +------------------+-----------------------------------------------------------------------------+
   | `NSYM <#nsym1>`_ | Number of the BP symmetry functions used for parsing                        |
   +------------------+-----------------------------------------------------------------------------+
   | `NCMP <#ncmp1>`_ | The length of the input vector of the neural network                        |
   +------------------+-----------------------------------------------------------------------------+
   | `ECUT <#ecut1>`_ | Parse only this fraction of lowest-energy structures (from 0 to 1)          |
   +------------------+-----------------------------------------------------------------------------+
   | `EMAX <#emax1>`_ | Maximum energy above the lowest-energy structure that is parsed             |
   +------------------+-----------------------------------------------------------------------------+
   | `FMAX <#fmax1>`_ | Will not parse data with forces larger than this value                      |
   +------------------+-----------------------------------------------------------------------------+
   | `RAND <#rand1>`_ | Random seed for the parsing: time (0); seed value (+); no randomization (-) |
   +------------------+-----------------------------------------------------------------------------+
   | `DEPO <#depo1>`_ | Path to the DFT datasets to be parsed                                       |
   +------------------+-----------------------------------------------------------------------------+
   | `DATA <#data1>`_ | Location where the parsed data will be written                              |
   +------------------+-----------------------------------------------------------------------------+
.. _jobt1:
**JOBT**, set to 30, initiates the parsing task.
.. _npar1:
**NPAR** is the number of cores to be used in the parsing job. The
parallelization is done over the atoms in each structure.
.. _tefs1:
**TEFS** defines what type of data parsing is performed for subsequent
use in the NN model training and testing. In **TEFS** = 0 parsing,
only energy (E) data is processed. In **TEFS** = 1 parsing, both
energy and force (EF) data are processed.
.. _fmrk1:
**FMRK** is a real number between 0.0 and 1.0 defining what fraction
of atoms in each structure, provided that `TEFS <#tefs1>`__ = 1, will
be processed for subsequent energy-force training. For each marked
atom, the code parses all x, y, and z components of the force. Note
that forces below :math:`10^{-5}` eV/A are ignored, as they are likely close
to zero by symmetry.
.. _nspc1:
**NSPC** is the total number of the atomic species that are present in the
dataset.
.. _tspc1:
**TSPC** is a list of all species atomic numbers present in the
dataset. This list should be ordered from the lowest to highest atomic
number.
.. _nsym1:
**NSYM** is the total number of the BP symmetry functions that will be used
for data parsing. This number should match the number of symmetry
functions introduced in the ``basis`` file.
.. _ncmp1:
**NCMP** is the total number of input vector components (per species
in the case of multi-component systems) produced from the introduced
BP symmetry functions. This number depends on the number of species in
the system (`NSPC <#nspc1>`__) and on the split of the total into
radial and angular functions: each radial function generates one
component per neighbor species, while each angular function generates
one component per unordered pair of neighbor species. With
N\ :sub:`2`\  radial and N\ :sub:`3`\  angular BP functions, the total
is :math:`NSPC \times N_2 + \tfrac{1}{2} NSPC (NSPC+1) \times N_3`.
If the user does not provide the correct input number for **NCMP**,
MAISE will exit with a message that suggests the correct number.
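As a quick sanity check, the expected **NCMP** value can be computed
by hand. Here is a minimal Python sketch (the function and the example
counts are illustrative, not part of MAISE):

.. code-block:: python

   def ncmp(nspc: int, n2: int, n3: int) -> int:
       """Input vector length per central-atom species: one component per
       neighbor species for each radial function plus one per unordered
       pair of neighbor species for each angular function."""
       return nspc * n2 + nspc * (nspc + 1) // 2 * n3

   # Illustrative counts: 8 radial and 43 angular functions per element.
   print(ncmp(1, 8, 43))   # elemental:  8 +  43 =  51
   print(ncmp(2, 8, 43))   # binary:    16 + 129 = 145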
.. _ecut1:
**ECUT** is a real number between 0.0 and 1.0 defining what fraction
of the lowest-enthalpy structures will be kept in each **batch** (see
`Reference data generation`_) after the structures in the **batch**
are ranked by enthalpy per atom. The default value is 0.9.
.. _emax1:
**EMAX** is a cutoff in eV/atom for the highest energy structure that will be
parsed. This energy window is measured with respect to the lowest
enthalpy structure in each **batch**. The default value is 5.0 eV/atom.
.. _fmax1:
**FMAX** is a cutoff in eV/A. Any structure with an atomic force
component higher than this value will not be parsed. The default value
is 50 eV/A.
.. _rand1:
**RAND** is a random number seed that determines the arrangement of the
structures in the training and testing sets. For **RAND** = 0, the
system time will be used; a **RAND** > 0 value will be used as the
seed; and for any **RAND** < 0, the structures will be parsed in the
order they appear in the operating system's directory listing. This
list of parsed structures will be used at the training stage to pick
the training and testing sets.
.. _depo1:
**DEPO** is a path to the DFT data to be parsed. MAISE expects the
dataset to be arranged in a format described in the `Reference data generation`_ section.
.. _data1:
**DATA** is a path to the location in which the parsed data will be stored.
Data filtering options
~~~~~~~~~~~~~~~~~~~~~~
In data filtering, the `ECUT <#ecut1>`__, `EMAX <#emax1>`__, and `FMAX
<#fmax1>`__ FLAGs described in `Table 1 <#setupparsetable>`_
control the maximum values of energy (enthalpy) and
forces allowed in the database. A single energy cutoff is ill-defined
or not helpful if the database contains entries with different
structure types (clusters or crystal structures), compositions (in
multielement systems), or simulation conditions (pressure
values). Provided that the data is sorted into **batch** subdirectories
by type, `ECUT <#ecut1>`__ and `EMAX <#emax1>`__ are applied to the
energy per atom within each subset. These values can be overwritten
for a specific subset by placing a ``tag`` file in the corresponding
subdirectory. This ``tag`` file can also be used to promote the
inclusion of the subset, e.g., EOS data, into the training set.
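The batch-level effect of these FLAGs can be pictured with a short
Python sketch (an illustration of the filtering rules described above,
not MAISE's actual implementation):

.. code-block:: python

   def filter_batch(batch, ecut=0.9, emax=5.0, fmax=50.0):
       """batch: list of (enthalpy_per_atom, max_force_component) pairs.

       Keep the ECUT fraction of lowest-enthalpy structures, then apply
       the EMAX window (eV/atom) and the FMAX force cutoff (eV/A)."""
       if not batch:
           return []
       ranked = sorted(batch, key=lambda s: s[0])
       kept = ranked[: int(ecut * len(ranked))]
       e_min = ranked[0][0]
       return [s for s in kept if s[0] - e_min <= emax and s[1] <= fmax]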
This is an example of a ``tag`` file illustrating which ``setup`` FLAGs
will be overwritten when this **batch** subdirectory is processed.
.. literalinclude:: ./file/tag
Behler-Parrinello descriptor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
MAISE relies on BP symmetry functions for describing the atomic
environments [9_][10_]. A customizable set of BP functions is defined
in the ``basis`` file. Below is an example of a ``basis`` with a
typical descriptor that has been used for generating the standard
library of NN models in MAISE. In this set of models, we typically use
51 functions per element with the cutoff expanded from 6.0 A to 7.5 A
and the corresponding η parameters rescaled by a factor of
1/(1.25*1.25) (for more details see our previous study [4_]).
The GN values of 2 and 4 correspond to the pair and triplet BP
functions defined as G1 and G2 in Ref. [9_] or as G2 and G4 in
Ref. [10_]. n1 and n2 are the eta parameters in the G2 and G4
functions, l is the lambda parameter in G4, and z is the zeta power in
G4. k is an obsolete parameter no longer used in the BP functions.
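For reference, in the notation of Ref. [10_] these pair and triplet
functions have the standard Behler-Parrinello form

.. math::

   G^{2}_{i} = \sum_{j} e^{-\eta (R_{ij} - R_s)^2} f_c(R_{ij}), \qquad
   G^{4}_{i} = 2^{1-\zeta} \sum_{j,k \neq i} (1 + \lambda \cos \theta_{ijk})^{\zeta}
   e^{-\eta (R_{ij}^2 + R_{ik}^2 + R_{jk}^2)} f_c(R_{ij}) f_c(R_{ik}) f_c(R_{jk}),

where :math:`f_c` is a cutoff function that vanishes smoothly at
:math:`R_c` and :math:`R_s` is a radial shift.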
The original set of BP parameters was defined in Bohr while MAISE
performs calculations in Angstroms. So, if the Rc, n1, and n2
parameters are specified in Bohr, they will be converted to Angstroms
using the conversion factor a = 0.529177249 as follows: Rc = a*Rc, n1
= n1/(a*a), and n2 = n2/(a*a). The original BP cutoff radius was 6.0
Ang, while our current default is 7.5 Ang, so we use r = 1.25 to
rescale these parameters in the same way: Rc = r*Rc, n1 = n1/(r*r),
and n2 = n2/(r*r). Note that if you define your own BP parameters in
Angstroms, you can set both a and r to 1.0.
.. literalinclude:: ./file/basis
:language: none
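The two rescalings above can be written compactly as follows (a
minimal Python sketch using the factors quoted in this section):

.. code-block:: python

   A = 0.529177249   # Bohr -> Angstrom conversion factor
   R = 7.5 / 6.0     # cutoff rescaling factor, r = 1.25

   def rescale_bp(rc, n1, n2, a=A, r=R):
       """Convert BP parameters from Bohr to Angstrom, then rescale them
       for the larger cutoff; set a = r = 1.0 if the parameters are
       already defined in Angstroms for the desired cutoff."""
       rc, n1, n2 = a * rc, n1 / (a * a), n2 / (a * a)   # unit conversion
       return r * rc, n1 / (r * r), n2 / (r * r)         # cutoff rescaling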
Data parsing execution
~~~~~~~~~~~~~~~~~~~~~~
With the required input files, i.e., ``setup`` and ``basis`` files in the
current directory, the parsing task can be performed by running MAISE:
.. code-block:: console
$ maise
For a typical database of a few thousand medium-size structures, the
job takes a few minutes.
Data parsing output
~~~~~~~~~~~~~~~~~~~
The output of the data parsing job is as follows:
* A set of ``e*`` files contains the *ab initio* target energy/force
values along with structural information converted to NN input
  vectors with the help of the chosen BP symmetry
functions. These ``e*`` files will be imported at the time of NN
fitting to create the training and testing sets.
* A ``stamp.dat`` file summarizes the most important
parameters/specifications of the parsed data. An example of the
``stamp.dat`` file is presented here:
.. literalinclude:: ./file/stamp.dat
:language: none
* An ``index.dat`` file contains an ordered list of the parsed
structures. The order is determined by the `RAND <#rand1>`__ FLAG
in the parsing setup.
* ``ve.dat`` contains a list of the volume per atom, energy per atom,
and maximum atomic force component for each parsed structure.
* An ``RDFP.dat`` file contains the average RDF profile of the parsed dataset.
Neural network training
-----------------------
The default NN implemented in MAISE has a standard feed-forward
architecture with one bias per input or hidden layer. Signals are
processed with hyperbolic tangent activation functions in the hidden
layers and with a linear function in the output neuron.
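As an illustration of this architecture (a minimal NumPy sketch, not
MAISE's internal code), the evaluation of one atomic energy from a
parsed input vector proceeds as follows:

.. code-block:: python

   import numpy as np

   def nn_energy(g, weights, biases):
       """weights/biases hold one matrix/vector pair per hidden layer plus
       one for the output layer; hidden layers use tanh activations, and
       the single output neuron is linear."""
       x = np.asarray(g)                    # parsed BP input vector
       for w, b in zip(weights[:-1], biases[:-1]):
           x = np.tanh(w @ x + b)           # hidden layers
       return float(weights[-1] @ x + biases[-1])   # linear output neuron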
The filtered and parsed data can be split into training and testing
sets; data earmarked for training with ``tag`` files in the
corresponding subdirectories (see `Data filtering options`_ section)
has a higher priority to be placed into the training set.
NN fitting via backpropagation can be performed with the BFGS [5_] or
CG [6_] algorithms as implemented in the GSL [7_] by minimizing the
root-mean-square error between target and NN output values.
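For :math:`N` training targets, this error has the standard form

.. math::

   \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i^{\mathrm{NN}} - y_i^{\mathrm{DFT}} \right)^2 },

where the targets :math:`y_i` are total energies (and, for
energy-force training, atomic force components).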
Energy-only (E) and energy-force (EF) training types are
available. For the latter, please make sure that the data is parsed
into both energy and force NN inputs (FLAG `TEFS <#tefs1>`_ =1).
Besides the traditional full training in which all NN weights are
optimized, MAISE has an option to use the **stratified** and
**generalized stratified** training schemes. These approaches are
briefly introduced in the following sections, while a detailed
description of these schemes is available in Refs. [1_] and
[2_], respectively.
NN training setup
~~~~~~~~~~~~~~~~~
A NN training job is fully configured in a single ``setup`` file.
.. _setuptraintable:
.. table:: Table 2: Parameters for NN training.
   +------------------+-----------------------------------------------------------------------------+
   | FLAG             | Short description                                                           |
   +==================+=============================================================================+
   | `JOBT <#jobt2>`_ | Training type: full training (40); stratified training (41)                |
   +------------------+-----------------------------------------------------------------------------+
   | `NPAR <#npar2>`_ | Number of cores for parallel training                                       |
   +------------------+-----------------------------------------------------------------------------+
   | `MINT <#mint2>`_ | The optimizer algorithm for neural network training                         |
   +------------------+-----------------------------------------------------------------------------+
   | `MITR <#mitr2>`_ | Number of the optimization steps for training                               |
   +------------------+-----------------------------------------------------------------------------+
   | `ETOL <#etol2>`_ | Error tolerance for training                                                |
   +------------------+-----------------------------------------------------------------------------+
   | `TEFS <#tefs2>`_ | Training target value: E (0); EF (1)                                        |
   +------------------+-----------------------------------------------------------------------------+
   | `NSPC <#nspc2>`_ | Number of element types for dataset parsing and training                    |
   +------------------+-----------------------------------------------------------------------------+
   | `TSPC <#tspc2>`_ | Atomic numbers of the elements specified with the NSPC FLAG                 |
   +------------------+-----------------------------------------------------------------------------+
   | `NSYM <#nsym2>`_ | Number of the BP symmetry functions used for parsing                        |
   +------------------+-----------------------------------------------------------------------------+
   | `NCMP <#ncmp2>`_ | The length of the input vector of the neural network                        |
   +------------------+-----------------------------------------------------------------------------+
   | `NTRN <#ntrn2>`_ | Number of structures used for training (negative number means percentage)   |
   +------------------+-----------------------------------------------------------------------------+
   | `NTST <#ntst2>`_ | Number of structures used for testing (negative number means percentage)    |
   +------------------+-----------------------------------------------------------------------------+
   | `NNNN <#nnnn2>`_ | Number of hidden layers (does not include input vector and output neuron)   |
   +------------------+-----------------------------------------------------------------------------+
   | `NNNU <#nnnu2>`_ | Number of neurons in hidden layers                                          |
   +------------------+-----------------------------------------------------------------------------+
   | `NNGT <#nngt2>`_ | Activation function of the hidden layers’ neurons: linear (0); tanh (1)     |
   +------------------+-----------------------------------------------------------------------------+
   | `LREG <#lreg2>`_ | Regularization parameter                                                    |
   +------------------+-----------------------------------------------------------------------------+
   | `SEED <#seed2>`_ | Random seed for generating NN weights (0 for system time)                   |
   +------------------+-----------------------------------------------------------------------------+
   | `DATA <#data2>`_ | Location of the parsed data to read from for training                       |
   +------------------+-----------------------------------------------------------------------------+
   | `OTPT <#otpt2>`_ | Directory for storing model parameters in the training process              |
   +------------------+-----------------------------------------------------------------------------+
   | `EVAL <#eval2>`_ | Directory for model testing data                                            |
   +------------------+-----------------------------------------------------------------------------+
.. _jobt2:
**JOBT** specifies the training task and its type:
* **40** full training
* **41** stratified training
.. _npar2:
**NPAR** is the number of cores to be used for the NN training. Here,
the parallelization is done over all structures in the training/testing set.
.. _mint2:
**MINT** specifies the type of the optimizer to be used for the NN training.
BFGS2 typically provides the most efficient optimization.
* **0** BFGS2
* **1** CG-FR
* **2** CG-PR
* **3** steepest descent
.. _mitr2:
**MITR** is the number of the optimization steps (epochs) to be
performed in the NN training task.
.. _etol2:
**ETOL** is a minimization stopping criterion. The training exits if
the difference between the total errors for two subsequent steps falls
below this value. Usually, it is better to set **ETOL** to a very
small value and control the length of NN training with the `MITR
<#mitr2>`__ FLAG.
.. _tefs2:
**TEFS** is the type of the target value to be used in the NN training:
* **0** energy only
* **1** energy and force
This FLAG should be consistent with the corresponding `TEFS
<#tefs1>`_ FLAG used in the data parsing. Force training is possible
only if the force data was processed during the parsing stage.
.. _nspc2:
**NSPC** is the total number of the atomic species that are present in the
dataset.
.. _tspc2:
**TSPC** is a list of all species atomic numbers present in the
dataset. This list should be ordered from the lowest to highest atomic
number.
.. _nsym2:
**NSYM** is the total number of the BP symmetry functions used for
data parsing. This number should match the number of symmetry
functions introduced in the ``basis`` file and be consistent with the
corresponding value used at the parsing stage. The value can be
retrieved from the ``stamp.dat`` file.
.. _ncmp2:
**NCMP** is the total number of input vector components (per species
in the case of multi-component systems) produced from the introduced
BP symmetry functions. It should be consistent with the corresponding
value used at the parsing stage. The value can be retrieved from the
``stamp.dat`` file.
.. _ntrn2:
**NTRN** is the number (if **NTRN** is positive) or the fraction (if
**NTRN** is negative) of the parsed data which will be used for the NN
training. At the time of the parsing, a list of structures is
generated in the ``index.dat`` file in the parsed data location; the
**NTRN** FLAG will read that list and import **NTRN** number/percent
of the data from the *beginning* of the list for training.
.. _ntst2:
**NTST** is the number (if **NTST** is positive) or the fraction (if
**NTST** is negative) of the parsed data which will be used for the NN
testing. At the time of the parsing, a list of structures is
generated in the ``index.dat`` file in the parsed data location; the
**NTST** FLAG will read that list and import **NTST** number/percent
of the data from the *end* of the list for the NN testing.
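The effect of these two FLAGs can be pictured as slicing the
``index.dat`` list from opposite ends (an illustrative sketch, not
MAISE code):

.. code-block:: python

   def split_dataset(index, ntrn, ntst):
       """index: ordered structure list from index.dat (order set by RAND).

       A negative NTRN/NTST value is interpreted as a percentage of the
       parsed data."""
       n_trn = int(-ntrn / 100 * len(index)) if ntrn < 0 else ntrn
       n_tst = int(-ntst / 100 * len(index)) if ntst < 0 else ntst
       # training set from the beginning, testing set from the end
       return index[:n_trn], index[len(index) - n_tst:]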
.. _nnnn2:
**NNNN** is the number of the hidden layers in the NN excluding the
input and the single-neuron output layers. Currently MAISE supports up
to 2 hidden layers (hence, a total of 4 layers).
.. _nnnu2:
**NNNU** is a list specifying the number of neurons in each hidden
layer. The number of provided values should match the `NNNN <#nnnn2>`__
FLAG.
.. _nngt2:
**NNGT** is the type of the activation function for neurons in each hidden
layer:
* **0** linear
* **1** tanh
The input layer has no neurons while the neuron in the output layer
always has the linear activation function.
.. _lreg2:
**LREG** is the magnitude of the L\ :sub:`2`\ regularization parameter. Typical
values are between :math:`10^{-8}` and :math:`10^{-6}`.
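Schematically, the penalty augments the training objective with a
weight-decay term (the exact combination of the error and penalty
terms is an implementation detail):

.. math::

   \mathrm{Loss} = \mathrm{RMSE}^2 + \mathrm{LREG} \sum_{k} w_k^2 ,

where the sum runs over all adjustable NN weights :math:`w_k`.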
.. _seed2:
**SEED** is the seed for randomizing initial values of the NN weights
at the beginning of the NN training. For **SEED** = 0, the system time
will be used; a **SEED** > 0 value will be used as seed.
.. _data2:
**DATA** specifies the location of the parsed data to read from for the NN training.
.. _otpt2:
**OTPT** specifies the directory for storing model parameters in the training
process.
.. _eval2:
**EVAL** specifies the directory for model testing data. The format of this
evaluation data is not yet described in this manual.
Training job submission
~~~~~~~~~~~~~~~~~~~~~~~
With the ``setup`` file in the current directory, the training task
can be performed by running MAISE:
.. code-block:: console
$ maise
However, as training is a time-consuming task, it is advisable to
submit the job to a compute node using a queueing system. Make sure to
match the number of requested cores with the `NPAR <#npar2>`_
parallelization FLAG. An example of a submission script for a
**slurm** cluster is as follows:
.. literalinclude:: ./file/jtrn
:language: none
.. note::
   * For the case of **stratified training**, the constituent models
     should be placed in the working directory, e.g., ``Cu.dat`` and
     ``Pd.dat`` for fitting a Cu-Pd binary NN, or ``CuPd.dat``,
     ``CuAg.dat``, and ``PdAg.dat`` for fitting a Cu-Pd-Ag ternary NN.

   * Presently, MAISE allows for training NN models with up to 3
     elements. While the treatment of systems with more elements is
     conceptually possible, the practical cost of data generation and
     parameter optimization becomes prohibitive.
NN training output
~~~~~~~~~~~~~~~~~~
After the training job is finished, a set of output files is
generated by MAISE. These files and their contents are as follows:
* The ``model`` file is the main output of the training job. It contains a
header section with the model/system specifications and
training/testing errors, the optimized parameters of the NN model,
and a copy of the ``basis`` file used for the data parsing. Here is
an example of the header section of a ``model`` file:
.. literalinclude:: ./file/model
:lines: 1-38
:language: none
* ``err-out.dat`` stores the residual error during the optimization process.
* ``err-ene.dat`` contains a list of the *ab initio* target
energies, energies produced by the optimized model, and their
difference (i.e., the error in estimating the total energy) for all
structures in the training and testing sets.
* The ``err-frc.dat`` file, which is produced only in the case of
  energy-force training, contains average errors in evaluating the
  atomic force components for the structures in the training and
  testing sets.
Available MAISE models are listed `here `__.
Stratified training
~~~~~~~~~~~~~~~~~~~
Under ideal conditions - given a complete basis for representing
atomic environments within a large cutoff sphere, an unlimited number of
adjustable parameters and reference data, and a powerful fitting
algorithm - a multielement NN with fully optimized elemental and
interspecies weights is expected to accurately map the PES for all
subsystems. In practice, the use of approximations leads to the
following problem. Suppose one wishes to fit a model describing A, B,
and AB phases given three datasets of A, B, and AB structures. Let’s
say that the PES of element A happens to be trivial and can be
approximated with negligible error in the region spanned by the A
data. If one now fits all parameters simultaneously to the full A, B,
and AB dataset, the larger error will be distributed across all
elemental and binary systems. In other words, the addition of B and AB
data unphysically alters the description of the elemental A phases. It
should be noted that the constrained NN architecture does account for
the change in the interaction strength between A atoms induced by the
presence of B atoms because the AA/AAA inputs are mixed in with the
AB/AAB/ABB inputs via neurons’ non-linear activation functions.
In addition to having a more sound foundation, the stratification
procedure significantly accelerates the creation of NN libraries. For
example, the full training of a binary AB model on all A, B, and AB
data takes about the same time as the sequential training of A, B, and
AB models on the corresponding data subsets. However, for an extended
block of A, B, and C elements, the standard approach involves the
fitting of AC and BC NNs from scratch, while the inheritance of A and
B weights in the stratified scheme reduces the total fitting time by
at least a factor of two. The speed-up increases dramatically as more
elements are added and ternary models are built.
Here is a schematic of the stratified training implemented in the
MAISE code:
.. figure:: ./figs/strat.png
:scale: 60%
:align: center
Schematic representation of stratified training of multilayer NNs. Here we
show only one hidden layer of neurons and only pair input symmetry
functions. Weights and element species shown in bold green are
adjusted while the ones in thin red are kept fixed during the
training from the bottom up. The letters denote species types in the
input vector; for example, AAB describes a triplet symmetry function
centered on an A-type atom with A-type and B-type neighbors (the
order of neighbors is irrelevant)
Generalized stratified training
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In order to extend the stratified procedure to materials with more
complex interactions and an arbitrary number of elements, we have
considered more flexible NN architectures that still preserve the
intact description of the subsystems. Compared to the original
stratified NN layout [1_], it involves the addition of new neurons,
shown as green units in the figure below, with different connection
patterns and conditions.
.. figure:: ./figs/stratplus.png
:scale: 60%
:align: center
Schematic illustration of stratified+ (top row) and stratified±
(bottom row) NN architectures for a binary chemical system. The
expansion of the original stratified architecture is done with the
addition of new neurons shown in green. The weights of elemental
NNs (middle row) are copied and kept fixed in all stratified
variations. Free, coupled, and fixed weights are shown in green,
yellow, and red, respectively. (a) Connections in a simplified NN
with one hidden layer and only pair inputs. The partial constraints
shown in yellow and explained in the main text ensure intact
description of the elemental structures. (b) Color-coded degrees of
weight constraints in NNs with pair and triplet inputs. The
original and stratified+ schemes have 60% adjustable weights in the
first layer in binaries, 11% in ternaries (e.g., only the last one
among AA, AB, AC, AAA, AAB, AAC, ABB, ACC, ABC, see Ref. [1_]), and
none in quaternaries. The stratified± architecture can be used for
an arbitrary number of chemical elements.
The schematic of a ’stratified+’ binary NN (top row in the figure above)
illustrates that as long as there are no connections from the inputs
or neurons in the elemental subnets to the inserted neurons, the new
adjustable weights do not alter the signal processing for pure
elemental structures. Despite the added flexibility, the NN still does
not allow the proper fitting of interactions in compounds with more
than three chemical elements. Indeed, the adjustable parts of such NNs
involve 60% of inputs in binaries (top right box in the figure above),
11% of inputs in ternaries (see the figure caption), and none for
systems with more elements. This restriction is imposed by the NN
architecture and can be lifted as follows.
The ’stratified±’ expansion (bottom row in the figure above) introduces
semi-adjustable links even in the inherited parts of the merged NN. We
add neurons in pairs, coupling the two weights incoming from each
subsystem input to have opposite values while coupling the two
outgoing weights to be the same. For a purely elemental structure, the
interspecies input values are zero and the net signal (at neuron 5)
from each elemental input (1) passed through the paired neurons (3&4)
will be zero as well regardless of the coupled weight magnitudes. For
a binary structure, the non-zero binary inputs multiplied by fully
unconstrained weights will unbalance the elemental signals because of
the non-linear nature of the activation function, resulting in a
non-zero contribution at neuron 5 that depends on both elemental and
binary (semi)adjustable weights.
The set of new partially constrained weights shown in yellow in the
above figure enables the stratified± NN to better capture the
screening and charge transfer effects as well as describe interactions
in systems with an unlimited number of species. In a trial
implementation, we imposed the constraint by penalizing the mismatch
between the coupled weights as :math:`\Sigma_N \sigma(w_{1,N}\pm
w_{2,N})^2`. We have observed no need to adjust the :math:`\sigma`
penalty factor during the NN optimization, as the differences between
coupled weight magnitudes become negligible after a few dozen training
steps; near the end of optimization, we set the magnitudes to their
average and keep them fixed without any appreciable effect on the
error. To the best of our knowledge, this semi-constrained solution
for systematically expanding NN features has not been considered in
the field of materials modeling.
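In sketch form, this penalty can be written as follows (an
illustrative Python snippet of the quoted expression, where ``sign``
selects the :math:`+` or :math:`-` coupling; not MAISE's actual
implementation):

.. code-block:: python

   import numpy as np

   def coupling_penalty(w1, w2, sigma, sign=1.0):
       """sigma * sum_N (w1_N +/- w2_N)^2: drives each coupled weight pair
       toward opposite values (sign = +1) or equal values (sign = -1)."""
       return sigma * np.sum((np.asarray(w1) + sign * np.asarray(w2)) ** 2)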
One way to determine whether the use of the expanded NN architectures
is warranted is to reoptimize the standard stratified NN without any
constraints on the full dataset. A significant reduction in the
training and testing errors would indicate the need for additional NN
flexibility. In our studies of metal alloys, the error reductions are
usually in the 0-15% range (e.g., see Figure 4 in Ref. [1_]). Our
preliminary tests have shown that both stratified+ and ± architectures
end up with errors about midway between those in the stratified and
full NNs. In order to quantify the improvements arising from the
additional degrees of freedom in each scheme, we plan to investigate
more challenging systems comprised of different element types in
future studies.
.. _1: https://journals.aps.org/prb/abstract/10.1103/PhysRevB.95.014114
.. _2: https://arxiv.org/abs/2005.12131
.. _3: https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.98.146401
.. _4: https://pubs.rsc.org/en/content/articlelanding/2018/CP/C8CP05314F#!divAbstract
.. _5: https://onlinelibrary.wiley.com/doi/abs/10.1002/jcc.540030212
.. _6: https://nvlpubs.nist.gov/nistpubs/jres/049/jresv49n6p409_A1b.pdf
.. _7: http://www.gnu.org/software/gsl/
.. _8: https://github.com/maise-guide/maise-net
.. _9: https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.98.146401
.. _10: https://aip.scitation.org/doi/abs/10.1063/1.3553717?journalCode=jcp