# Part 2: Replica-Exchange Grand-Canonical (REGC) scheme

This Replica-Exchange Grand-Canonical (REGC) code mainly focuses on calculating the phase diagrams of materials including cluster, nanoparticle, and surface in a reactive gas phase. The basic physical theories and techniques in REGC method are described in:

1 Y. Zhou, M. Scheffler, and L. M. Ghiringhelli, "Determining surface phase diagrams including anharmonic effects," Phys. Rev. B 100, 174106 (2019) https://link.aps.org/pdf/10.1103/PhysRevB.100.174106

2 Y. Zhou. Surface phase diagrams including anharmonic effects via a replica-exchange grand-canonical method. Doctoral thesis, Technische Universität Berlin, Berlin, 2020. https://depositonce.tu-berlin.de/handle/11303/11723

Here we will run a very small example of a $$\mathrm{Si}_2$$ cluster in a $$\mathrm{H}_2$$ gas phase to demonstrate the workflow of the REGC method.

## Getting started with FHI-panda

FHI-panda is written in python2.7. The FHI-panda package is distributed in source code form, the following prerequisites are needed:

To compile FHI-aims with PLUMED, add the following setting in your current CMake file and recompile

set(USE_PLUMED ON CACHE BOOL “”)


If you have used the point release 210716_2 the name of the executable compiled with PLUMED should be the following:

aims.210716_2.meta.scalapack.mpi.x


All FHI-panda scripts are shipped with the tutorial material, which you can download or clone from the following site:

https://gitlab.com/FHI-aims-club/tutorials/introduction-of-ab-initio-thermodynamics-and-regc/

Change into the directory:

introduction-of-ab-initio-thermodynamics-and-regc/Tutorials/P2-REGC/REGC


It includes exercises_1 for perform a REGC-MD run and exercise_2 for the post-processing of REGC data.

## The Input files for REGC molecular dynamics

We will perform a REGC molecular dynamics calculation for a $$\mathrm{Si}_2$$ cluster in contact with $$H_2$$ gas. Here, FHI-aims code was used for total energy and molecular dynamics calculations. Please go to the following directory:

introduction-of-ab-initio-thermodynamics-and-regc/Tutorials/P2-REGC/REGC/exercise_1/Si2_run


The working directory should contain the following necessary files:

1. REGC.dat: The REGC main input file
2. *.py: All the python source files of FHI-panda
3. run.sh: The shell script for submit the REGC job
4. control.in.E: the FHI-aims input file. Note that the geometry info is already included in REGC.dat.
5. subjob: a bash script that defines how to run the replica in parallel.

## Preparing the REGC.dat input file

The main input file REGC.dat contains all the necessary parameters for the structure prediction. For the $$\mathrm{Si}_2$$ cluster in a $$\mathrm{H}_2$$ gas phase it should look like this:

################################ The Basic Parameters of REGCMD ################################
SystemName = Si2Hx
NumberOfSpecies = 2
NameOfAtoms = Si D
Center = 0.0 0.0 0.0
Numberofatom = 2
Numberofmetalatom = 2
@DistanceOfIon
2.14 1.4
1.4 0.75
@End
Current_step = 0
Num_T = 2
Num_u = 2
Temperature =  300 400
Mu = -0.3 -0.2
MDt = 0.02
@Atomposition
0.94787         3.59052         0.22819
0.17231         3.14166         2.76918
@End
N_total = 20000
mass = 28.086 2.014101
PickUp = F


The file consists of input tags that can be given in any order, or be omitted while the default values are used. Here we offer a quick overview of REGC.dat syntax:

• The general syntax is "tag labels" = "value1 value2 value3 ...". For matrix inputs, it started with \@tag and ended with \@End. Values that are not specified in the REGC.dat file are assigned a default value. Input values should be separated by space.
• The lines beginning with # is taken as explanation of the following tag label.
• Logical values can be given as T or True, F or False.
• The labels are not case sensitive.

In the following the input settings are described in detail. You can skip this subsection and continue with the exercise by continuing with the next section.

### Input settings

• SystemName = <string>: A string gives a name for the description system. Default value: REGCMD

• NumberOfSpecies = <integer>: Number of different chemical species in the system. For example, this number is "2" for Si$$_2$$ in contact with $$\mathrm{H}_2$$ gas. Default value: There is no default. You must provide the number.

• NameOfAtoms = <string1> <string2> <string3> ...: Elemental symbols of each chemical species separated by space. Taking $$\mathrm{Si}_2$$ in contact with $$\mathrm{H}_2$$ gas as examples, one will write NameOfAtoms = Si H. Default value: There is no default. You must define it.

• Mass = <real1> <real2> <real3> ...: The atomic mass of each chemical species separated by space. Taking $$\mathrm{Si}_2$$ in contact with H$$_2$$ gas as examples, one will write Mass = 28.0855 1.0079. Default value: There is no default. You must define it.

• NumberOfmetalAtoms = <integer>: The total number of metal particles in the initial structure. Here the term metal particles refers to the particles of bare cluster/surface. Default value: There is no default. You must define it.

• @DistanceOfIon (array)

@DistanceOfIon
<real11> <real12> <real13> ...
<real21> <real22> <real23> ...
...
@End


Minimal interatomic distances (in unit of angstrom) in a format of $$n×n$$ matrix. The rank $$n$$ of the matrix is determined by the NumberOfSpecies. For $$\mathrm{Si}_4\mathrm{H}_x$$ NumberOfSpecies=2, the $$2×2$$ matrix is defined as:

@DistanceOfIon
d_11 d_12
d_11 d_12
@End

where $$d_{11}$$, $$d_{12}$$ are the minima distances between Si-Si and Si-H, respectively. Thereby, are the minima distance between H-Si and H-H, respectively. Default value: There is no default. You must define it.

• SystemType = <integer>: Our method targets at determining phase diagrams of materials, including clusters(<integer>=0), nanoparticles(<integer>=1), surfaces(<integer>=2) and bulk (<integer>=3). Default value: There is no default. You must define it.

• Num_T = <integer>: the number of different temperatures selected in the simulation. Default value: There is no default. You must define it.

• Num_U = <integer>: the number of different chemical potentials selected in the simulation. Default value: There is no default. You must define it.

The total number of replicas

In total, the number of replicas is Num_T×Num_U!

• Current_step = <integer>: the number of REGC steps in the simulation. If you just begin from scratch, set it to 0, otherwise, find the number in the head line of you restart.dat files. Default value: There is no default. You must define it.

• PickUp = <logical>: If False, the calculation is from scratch. If True, it will start from a previous simulation, together set with Current_step. Default value: There is no default. You must define it.

• MDt = <real>: in ps unit. The simulation time for the single molecular dynamics. Default value: There is no default. You must define it.

• N_total=<integer>: The total number of REGC steps for the simulation. Default value: There is no default. You must define it.

• NumberOfAtoms = <integer> or <integer1>, <integer2, integer3, ...>: The total number of particles in the initial structure. If all the replicas share the same initial structure, then only one integer should be given, on the other hand, each replica has different initial structures. Default value: There is no default. You must define it.

The following tag labels are system-dependent. We give the examples for clusters and surfaces. First of all, the tag labels are illustrated for clusters (SystemType = 0):

• Radius = <real>: The radius of the sphere where cluster are confined during REGC simulation in angstrom unit. The sphere should be large enough so that the adsorption of gas is able to reach its condensed line. Default value: There is no default. You must define it.

• @Atomposition (array)

@Atomposition
<real11> <real12> <real13>
<real21> <real22> <real23>
...
@End\


If the same initial structure are shared with all replicas, please set this tag

@Replica0atomposition
<real11> <real12> <real13>
<real21> <real22> <real23>
...
@End
@Replica1atomposition
<real11> <real12> <real13>
<real21> <real22> <real23>
...
@End
@Replica2atomposition
<real11> <real12> <real13>
<real21> <real22> <real23>
...
\@End
...


## Running REGC jobs in parallel

Note that one of merits of Replica Exchange method is its embarrassingly parallel nature, which indicating that the total energy/molecular dynamics calculations of replicas can perform simultaneously. The following subjobs file executes 4 replicas in parallel assuming you have 4 MPI tasks available on your machine:

cd replica_00
mpirun -n 4 aims.x >> aims.out&
cd ..
cd replica_01
mpirun -n 4 aims.x >> aims.out&
cd ..
cd replica_02
mpirun -n 4 aims.x >> aims.out&
cd ..
cd replica_03
mpirun -n 4 aims.x >> aims.out&
cd ..
wait


The subjobs file uses the MPI executable mpirun, but this depends on the actual computer setup (it could be srun, orterun, etc.). Also the name of the FHI-aims executable can be different on your machine. Please adapt it to your setup.

Now, we are ready to start the calculation:

python2.7 core.py >log.core


Excercise 1

Run a REGCMD simulation for $$\mathrm{Si}_2$$ systems. Temperatures are 300 and 500 K. The chemical potentials ($$\Delta \mu_{\mathrm{H}}$$) are -0.3 and -0.2 eV. The input files to fulfill this task are prepared for you in the folder Tutorials/P2-REGC/REGC/exercise_1/Si2_run. You only need to do the following two steps

1. Adapt the MPI executable and FHI-aims executable name in the subjob file according to your system setup.
2. Execute python2.7 core.py > log.core

## The Output files of the REGC molecular dynamics

During a REGC simulation, we define two kinds of replicas: one is called physical replica which is an actual collection of atoms (configuration) throughout the course of the simulation; the other is logical replica referred to a specific thermodynamic state $$T_i$$ and $$\mu_i$$ The major output files and folders are listed in the running directory:

• restart.dat: The file contains all the information for restart a REGC calculation including: total energy, particle position

• log_rex: The file contains all the information regarding to the swapping process.

• *_*_.out: The file collects the MD simulation of each logical replica (specific thermodynamic state) after each REGC step.

• rex_*_*: The file collects the MD simulation of each logical replica (specific thermodynamic state) after each replica swapping.

• replica_*/: The folder contains the trajctory files.

• trajectory: The file collects the MD calculation of the physical replica (specific configuration) after each REGC step.

300.0_-0.2.out  400.0_-0.3.out  replica_02  rex_300.0_-0.3
300.0_-0.3.out  replica_00  replica_03  rex_400.0_-0.2
400.0_-0.2.out  replica_01  rex_300.0_-0.2  rex_400.0_-0.3

Here is an example of the output files of a specific REGC-MD simulation for $$\mathrm{Si}_2$$ cluster in contact with $$\mathrm{H}_2$$. The so-called physical replicas are under the replica_*/ folders:

ls replica_00
>>> trajectory


## Analysis of Results

Explosive amount of data are generated in a REGC simulation. Here we only explain in detail analysis REGC data in two aspects: Convergence and a posteriori analysis via MBAR.

### Convergence

One simple criteria to meet for a convergence is that the histogram (distribution) of a variable of interest at each thermodynamical state become stable or in the other term does not change with the further more simulation time. This variable can be any properties you are interested in. For the applications in Ref. [1], the distribution of adsorption energy or the number of adsorbed particle for the logical replicas are chosen. There is a python script named check_convergence.py taking care of the convergence check. Just type the following command in your running directory:

python check_convergence.py


You will get a collection of files for each thermodynamic state:

N_*_* E_*_*


Then the idea of Block Average [3] can be used to check whether the simulation is long enough.

### Post-processing via MBAR

There is an example that MBAR is applied to analyze the data of T-only replica exchange simulation in the pymbar code (https://github.com/choderalab/pymbar). The example is given in our package to illustrate how to use MBAR to analyze our data in the directory Tutorials/P2-REGC/REGC/exercise_2/N_H:

data/
reweight_regcmc.py


The reweight_regcmc.py file is python script to execute this data analysis and can be used to estimate a 2D potential of mean force from REGC simulation data at target temperature and chemical potential. All the required input data are under the data/ directory.

Excercise 2

Run a simple point post-processing for the $$\mathrm{Si}_2$$ data of ref.[1] at (360K, -0.1 eV) and identify the number of adsorbed hydrogen of most probably phase. Do the following:

1. Go to REGC/excercise_2/N_H
2. Execute python reweight_regcmc.py
3. Inspect the resulting file f_k.out

The relative free energy surface ('g' column in f_k.out) and the number of chemisorbed D atoms ('phi' column in f_k.out) is shown as below:

2D PMF

bin    phi        N          g         dg
0 0.500000       21      4.658      0.242
1 1.500000      133     52.301      0.424
2 2.500000     1647     45.040      0.120
3 3.500000       76     43.130      1.002
4 4.500000      803     32.017      0.258
5 5.500000       99     21.015      1.002
6 6.500000     3304      0.000      0.000
7 7.500000        7     37.755      0.959


Here, the free energy is the relative free energy with respect to the lowest free energy. For the most probable phase, the relative free energy is 0.0 ("g = 0.0"). The corresponding "phi = 6.5" is the bin center of the $$N_\mathrm{H}$$ bin [6, 7.0), so the most probable $$N_\mathrm{H}$$ = 6 at (360K, -0.1 eV). The info of the most probable phase is also written in 'PTD' file. For more information about MBAR, you may go to this website https://pymbar.readthedocs.io/en/master/#.

In order to construct phase diagram, the post-processing of REGC data need to be performed at every 1 K and every 0.01 eV for the whole T-P region of relevant interest.

After the post-processing, 3D data including $$T$$, $$\mu$$ and $$N_\mathrm{H}$$ are obtained as shown in "3D_MuN" file. With the relation between $$T$$, $$p$$ and $$\mu$$, "3D_phasediagram" is ready for a $$T$$-$$p$$ phase diagram as shown below.

### Reference

1 Y. Zhou, M. Scheffler, and L. M. Ghiringhelli, "Determining surface phase diagrams including anharmonic effects," Phys. Rev. B 100, 174106 (2019) https://link.aps.org/pdf/10.1103/PhysRevB.100.174106

2 Y. Zhou. Surface phase diagrams including anharmonic effects via a replica-exchange grand-canonical method. Doctoral thesis, Technische Universität Berlin, Berlin, 2020. https://depositonce.tu-berlin.de/handle/11303/11723

3 D. Frenkel and B. Smit. Understanding molecular simulation: from algorithms to applications, volume 1. Elsevier, 2001.