GW for solids

Here, we will discuss the band structure of bulk silicon in the diamond structure, calculated with the G\(_0\)W\(_0\) approach based on the DFT-PBE functional. This is perhaps the most important standard example of G\(_0\)W\(_0\) calculations and, as shown below, the actual technical keywords to be used are simple. Further details on this particular implementation and on the meaning of technical choices made can be found in¹:

Warning: periodic GW calculations are expensive.

Similar to the calculation of the periodic RPA energy, periodic G\(_0\)W\(_0\) computations are computationally very demanding. With the current code version, a qualitative calculation with light species defaults for Si, using a 4x4x4 k-point grid, can be done with 4 CPUs and roughly about 8GB of RAM. However, the cost of numerically well converged G\(_0\)W\(_0\) calculations is significantly higher. The code is, however, usable and is also being worked on actively to achieve higher efficiency. For the specific calculations in this tutorial, we recommend to survey the input and output files, which are provided here: https://gitlab.com/FHI-aims-club/tutorials/rpa-and-gw-for-molecules-and-solids/-/tree/main/Tutorials/Part-3/solutions/Si-GW

Requesting a G₀W₀ calculation for periodic solids

We will use the geometry.in file from the tutorial Basics of Running FHI-aims, Part 3, which has been obtained from a structure optimization based on the HSE06 functional:

lattice_vector      0.00013684      2.72413122      2.72426803
lattice_vector      2.72413117      0.00013679      2.72426800
lattice_vector      2.72385756      2.72385758     -0.00027363
atom_frac       0.00000000     -0.00000000     -0.00000000 Si
atom_frac       0.25000000      0.25000000      0.25000000 Si

In FHI-aims, a G\(_0\)W\(_0\) calculation can be requested using the following keywords in the control.in file:

xc                pbe
relativistic      atomic_zora scalar
qpe_calc          gw_expt
frequency_points  40
anacon_type       0
k_grid            4 4 4
output            band   0.00000  0.00000  0.00000   0.50000  0.00000  0.50000   10 G  X
periodic_gw_optimize inverse

[attach light species defaults for Si]

This input file will trigger the following calculation:

First, a normal DFT calculation with the PBE functional will be executed.
After the DFT-PBE SCF cycle has converged, the G\(_0\)W\(_0\) calculation will start. This calculation is requested by the keyword qpe_calc gw_expt. The argument gw_expt indicates that this is a rather new feature (experimental) in FHI-aims and any results should always checked carefully.
The keyword anacon_type specifies the type of analytical continuation used to extrapolate the self-energy from the imaginary frequency axis to the real-valued frequency axis. The argument 0 means that the two-pole approximation will be used.
The keyword periodic_gw_optimize enforces FHI-aims to employ a newly implemented performance optimization feature (Cholesky Decomposition for the G\(_0\)W\(_0\) eigenvalue problem), which will save some additional time in G\(_0\)W\(_0\) calculations. This enhancement will likely become the default setting in the near future. We are also anticipating further efficiency improvements elsewhere in the future, so check back with later code versions over time.

Warning: Request of band output mandatory!

It is necessary to request the band output to obtain the GW eigenvalues. At the moment of writing this tutorial, you will get no GW output, if you don't request output band.

In addition, attach the defaults_2020/light species default for Si to the above control.in file.

When executed with 4 current CPU cores (i.e., MPI tasks), this simple calculation will require about 5 GB of memory. Please ensure that the computer used for performing this calculation is sufficiently powerful.

Execute the calculation as follows:

mpirun -n 4 aims.210716_2.scalapack.mpi.x | tee output

In our test and with 4 CPUs, this calculation took about 30 minutes to finish. Using 40 cores on Intel Skylake (solutions folder), about 8 minutes were needed.

After the calculation has successfully finished you should find the following files in your folder (see Tutorials/Part-3/solutions/Si-GW/1-FirstTest/Light-444-f40 for solutions):

GW_band1001.out
aims.out
control.in
geometry.in

The GW_band1001.out file contains the GW quasiparticle eigenvalues along the band segment "1" - here, this is the Gamma-X segment requested in the control.in file. For Si, we know that this segment contains both the valence band maximum and the conduction band minimum. For other materials, identifying the most appropriate band segments to plot is (of course) a separate task.

The energy band segment can be simply plotted by uploading the results to GIMS output analyzer, or using clims (version >=0.4.2) using the following command (at the command line in the terminal window):

clims-aimsplot

Already from the output file aims.out itself, we learn that the G\(_0\)W\(_0\) band gap calculated with these rather restricted settings along this band segment amounts to 1.12 eV:

  "GW Band gap" of total set of bands:
  | Lowest unoccupied state:      -4.69549072 eV at k point       0.44444      0.00000      0.44444
  | Highest occupied state :      -5.81714083 eV at k point       0.00000      0.00000      0.00000
  | Energy difference      :       1.12165012 eV

Actually, E\(_g\)=1.12 eV isn't a bad value. In fact, we believe that this is pretty close to what should be the numerically converged result as reported in Table I in¹. However, the fact this value comes out so well is, to some extent, a consequence of multiple small systematic errors cancelling one another. The above result is certainly encouraging. However, actual numerical convergence should always be tested with better settings.

In the subsequent sections, we will examine the numerical parameters that need to be tested in order to to obtain a numerically converged GW band structure. These next steps require more computational resources and cannot currently be executed on typical laptop computers. The tests to be performed are as follows:

Convergence with the number of frequency points used for the self-energy.
Convergence with the basis set used.
Convergence with the number of k-points.

Convergence with the number of frequency points

Similar to G\(_0\)W\(_0\) calculations for non-periodic systems, it is necessary to verify that the frequency grid for self-energy calculations is sufficiently well converged.

For simplicity, we here perform this test for the same light settings and 4x4x4 k-space grid as used in our initial test. However, we will double the number of frequency grid points to 80:

frequency_points  80

Create a new folder to execute this calculation, copy over the earlier geometry.in file and control.in file and change the number of frequency points to 80 in control.in.

From this simple test, we learn that nothing drastic changes for the band gap in this particular calculation of bulk Si - the predicted band gap is still 1.12 eV.

However, a better check would be to compare the predicted energy band structure (do this using clims-aimsplot or GIMS). Furthermore, the number of frequency points can make a more significant difference for other materials. The solutions for this part can be found under Tutorials/Part-3/solutions/Si-GW/2-frq/Light-444-f80 directory.

For the remaining part of this tutorial, we will use 40 frequency points (also in the interest of saving some computational resources).

Convergence with the basis set

The biggest impact on the numerical convergence of G\(_0\)W\(_0\) calculations arises from the basis set. As seen in other parts of this tutorial, the basis set can make a drastic difference in non-periodic RPA or G\(_0\)W\(_0\) calculations of small molecules.

For solids and for G\(_0\)W\(_0\) calculations, the situation appears to be somewhat more benign, since basis functions from different unit cells overlap and create an overall higher density of basis functions per volume (better numerical representation). Still, Table I in¹ shows that for some materials the influence of the basis set is very significant - not just in FHI-aims, but in other G\(_0\)W\(_0\) codes in the community as well. As laid out in¹ and elsewhere, it is not just the basis set used to expand the orbitals but also the so-called auxiliary basis set used to represent the Coulomb operator that could matter.

Nevertheless, the actual protocol to test the basis set convergence is straightforward. FHI-aims' species defaults and (in some cases) simple edits of the included basis functions are enough to get started.

Here, we will study the convergence behavior with the basis set for the 4x4x4 k-point grid using 40 frequency points (cf. previous sections). As mentioned above, we will study the impact of the species defaults. Create a separate folder for each of the following species defaults:

light
intermediate
tight
tight-tier2

In each folder, attach the corresponding species defaults to the control.in files.

For the tight-tier2 case, attach the tight species defaults but additionally uncomment the two remaining basis functions that are needed to include the full second tier of basis functions:

[...]
#  "First tier" - improvements: -571.96 meV to -37.03 meV
     hydro 3 d 4.2
     hydro 2 p 1.4
     hydro 4 f 6.2
     ionic 3 s auto
#  "Second tier" - improvements: -16.76 meV to -3.03 meV
     hydro 3 d 9
     hydro 5 g 9.4
     hydro 4 p 4
     hydro 1 s 0.65
[...]

It is instructive to compare the species default changes between light and intermediate. In addition to overall changes (integration grids, Hartree potential and radial extent of basis functions), one major change is the inclusion of an additional g function, but only for the purpose of constructing the auxiliary basis set used to represent the Coulomb operator:

for_aux     hydro 5 g 9.4

This small addition makes a difference. Note that this high-angular momentum function is not included in the basis set used to expand orbitals, but the high-angular momentum function does improve the representation of the Coulomb operator itself. The details of this modification, which is triggered by the for_aux keyword, are explained in http://dx.doi.org/10.1088/1367-2630/17/9/093020. For practical purposes here, it is enough to go ahead and see what the improved species defaults do.

In short, here is the plot of the predicted GW band gap values:

Verify these values in the output files. Also, plot the associated Gamma-X segments to see if any changes occur. Tutorials/Part-3/solutions/Si-GW/3-basis contains the solutions for this part.

One key point is that the species defaults matter and that we have now moved away from the ultimate numerically converged value of 1.12 eV in the literature. The deviation is not large but it is apparent that especially the better representation from light to intermediate does make a difference for Si.

Convergence with the number of k-points

We finally study the numerical convergence of G\(_0\)W\(_0\) for bulk Si with the chosen k-space grid. This is another very important test, since the 4x4x4 k-space grid used in the earlier tests is itself heavily underconverged for Si.

We will compare the following settings (create a folder for each one and modify the control.in file accordingly):

k_grid 4 4 4
k_grid 6 6 6
k_grid 8 8 8

In order to reflect our findings from the preceding exercise, "tight" settings are used. The result is, actually, somewhat of a warning sign when it comes to the predicted band gap - the predicted value is reduced down to 1.06 eV:

The computational resources needed for a better k-space grid have, unfortunately, increased quite drastically. From a technical point of view, this is due to a double sum over k-points that leads to N\(_k^2\) scaling, where N\(_k\) is the number of k-points considered. You can find the corresponding solutions in Tutorials/Part-3/solutions/Si-GW/4-kgrid for your reference.

Particularly fast k-point grid convergence for the Si diamond structure

The convergence of the G\(_0\)W\(_0\) band gap for the Si in the diamond structure with the number of k-points is actually quite fast. Other systems might need even denser k-space grids for convergence.

G₀W₀@PBE vs. PBE vs. HSE06 band structure

We are finally in a position to assess how far we have come. We next compare the full energy band structures for DFT-PBE, DFT-HSE06 (a hybrid functional) and G\(_0\)W\(_0\)@PBE, for FHI-aims tight species default settings, and a 8x8x8 k-space grids. Instructions regrading how to perform band structures calculation for periodic systems with FHI-aims are demonstrated in Basics of Running FHI-aims tutorial. The comparison of band structures look like:

The full header of control.in used to produce this band structure (G\(_0\)W\(_0\)@PBE) is as follows:

xc                pbe
relativistic      atomic_zora scalar
qpe_calc          gw_expt
frequency_points  40
anacon_type       0
k_grid            8 8 8
output                     band   0.00000  0.00000  0.00000   0.50000  0.00000  0.50000   17 G  X
output                     band   0.50000  0.00000  0.50000   0.50000  0.25000  0.75000   8 X  W
output                     band   0.50000  0.25000  0.75000   0.37500  0.37500  0.75000   6 W  K
output                     band   0.37500  0.37500  0.75000   0.00000  0.00000  0.00000   18 K  G
output                     band   0.00000  0.00000  0.00000   0.50000  0.50000  0.50000   15 G  L
output                     band   0.50000  0.50000  0.50000   0.62500  0.25000  0.62500   10 L  U
output                     band   0.62500  0.25000  0.62500   0.50000  0.25000  0.75000   6 U  W
output                     band   0.50000  0.25000  0.75000   0.50000  0.50000  0.50000   12 W  L
output                     band   0.50000  0.50000  0.50000   0.37500  0.37500  0.75000   10 L  K
output                     band   0.62500  0.25000  0.62500   0.50000  0.00000  0.50000   6 U  X
periodic_gw_optimize inverse

For DFT-PBE and DFT-HSE06 calculations, simply replace GW related keywords with PBE and HSE06 related one while keeping the output band lines unchanged, as shown below:

xc              pbe
relativistic    atomic_zora scalar
k_grid          8 8 8

xc                         hse06 0.11
hse_unit                   bohr
hybrid_xc_coeff            0.25
exx_band_structure_version 1
relativistic               atomic_zora scalar
k_grid                     8 8 8

The complete solutions to this part can be found in Tutorials/Part-3/solutions/Si-GW/5-Comparison.

As can be seen in the plot above, there are significant differences between all three methods. Near the band gap, G\(_0\)W\(_0\)@PBE and HSE06 agree qualitatively reasonably well for this case (this is, of course, one of the key reasons that hybrid density functionals are often used in lieu of much more expensive but more accurate G\(_0\)W\(_0\) calculations in the literature). DFT-PBE is clearly not sufficient to predict the band gap (we expected nothing else).

However, for energy bands away from the gap, there are also undeniable differences between G\(_0\)W\(_0\)@PBE and HSE06 and, generally, the detailed screening model embodied in G\(_0\)W\(_0\) would be expected to be closer to the actual behavior of these quasiparticle states, e.g., in photoemission spectroscopy.

For bulk Si, a customized, larger basis set than "tight" would recover the remaining difference to the literature G\(_0\)W\(_0\)@PBE band gap value of 1.12 eV, compared to the value of 1.06 eV obtained here using simple "tight" settings. This process is discussed in much greater details in¹, as well as in an appendix below. However, for the present tutorial, we can conclude that we are able to obtain qualitatively reasonable values for the band gap of Si in the diamond structure, for the right underlying mathematical reasons.

Conclusion: So, why G₀W₀ instead of computationally cheaper choices?

Importantly, the G\(_0\)W\(_0\) approach cures the significant inaccuracy of the Kohn-Sham band gap of "simple" DFT-PBE when (incorrectly) compared to the experimental quasiparticle band gap of a material.

Furthermore, the G\(_0\)W\(_0\) approach frequently allows one to obtain an essentially parameter-free estimate of the band gap and the band structure with reasonable accuracy already based on a single, simple DFT functional such as PBE.

In contrast, hybrid density functionals (such a HSE06) are frequently employed with hand-optimized parameters (the exchange mixing parameters) since the "optimal" parameterization of hybrid functionals actually varies with the band gap. In complex materials that contain two or more different components, the so-called "optimal" parameterization of hybrid density functionals often cannot be attained for both components at once. Furthermore, those parameterizations are only formally justified (to an extent) for the band gap itself, not for the remaining energy band structure.

The G\(_0\)W\(_0\) approach cures these downsides of computationally simpler methods such as HSE06. While the method is currently computationally more expensive, it is (in the authors' view) ultimately more physically robust than the available, simpler alternatives - even given the more detailed effort needed to quantify the technical uncertainties (degree of numerical convergence achieved) that still accompany the method.

Appendix: Complete numerical convergence of G₀W₀ calculations for Si

As noted above, it is possible to produce a fully numerically converged solution, albeit at yet higher expense. The crux is that our basis set needs to resolve the electron-electron interaction via the Green's function, the screened Coulomb potential and, ultimately, the self-energy everywhere in space. The underlying physical quantity (the so-called electron-electron cusp of the many-electron wave function) is sharply peaked. The need for a large, relatively dense basis set to resolve this cusp is generic to any basis set type (not limited to numeric atom-centered orbitals) and also appears in explicitly correlated methods other than G\(_0\)W\(_0\).

So how to proceed? In our present periodic \(G_0\)W\(_0\) implementation, we use a representation of the Coulomb interaction and other two-electron operators in terms of a so-called auxiliary basis set. The specific auxiliary basis set prescription used here is explained in detail in Ref.².

It turns out that what is needed for better convergence of the electron-electron cusp are additional auxiliary basis functions that provide good spatial resolution far away from a nucleus. From a technical point of view, these auxiliary basis functions are high-angular momentum functions that are extended relatively far away from each atom.

As mentioned before, additional basis functions to create a better auxiliary basis set can be added using the for_aux sub-tag after any species settings. A complete description of how these additional basis functions create the full auxiliary basis set is provided in Ref.\(^2\).

If more than one species is present in control.in, we normally add additional for_aux functions to all species. The additional auxiliary function of choice here are constructed as hydrogen-like functions. However, we here choose an effective charge 'Z=0' to generate them. As a result, the function(s) in question technically correspond to a spherical quantum well with no attractive nucleus in the center. These basis functions should be spherical Bessel functions, with a confinement given by the usual, configurable confinement radius that limits the extent of numerical basis functions in FHI-aims.

Specifically, for Si we add the following, high-angular momentum (f and g) basis functions to the set of functions that constructs the basis set. Despite the lengthy preceding explanation, the actual extra keywords are quite simple:

for_aux hydro 4 f 0.0

for_aux hydro 4 f 0.0
for_aux hydro 5 g 0.0

The following figure shows how the resulting additions to the auxiliary basis set improve the results for 4x4x4 and 8x8x8 k-grids, otherwise using FHI-aims normal "tight" settings. The corresponding calculations can be found in the folder Tutorials/Part-3/solutions/Si-GW/6-aux.

Clearly, the predicted band gap changes significantly especially with the first, added 4f type for_aux function. We now obtain a value of E\(_g\)=1.11 eV, very close to the converged value of 1.12-1.13 eV, reported in Ref.¹ for a yet larger benchmark basis set.

The computational cost has, however, also gone up further. For very high-precision G\(_0\)W\(_0\) calculations, this is an exercise that one can pursue successfully. However, as shown in the main part of the tutorial, qualitatively reasonable G\(_0\)W\(_0\) predictions for Si in the diamond structure can already be obtained at much lower cost.

Xinguo Ren et al., All-electron periodic G0W0 implementation with numerical atomic orbital basis functions: algorithm and benchmarks. Phys. Rev. Materials 5, 013807 (2021). https://doi.org/10.1103/PhysRevMaterials.5.013807 ↩↩↩↩↩↩
Arvid Ihrig et al., Accurate localized resolution of identity approach for linear-scaling hybrid density functionals and for many-body perturbation theory. New Journal of Physics 17, 093020 (2015). http://dx.doi.org/10.1088/1367-2630/17/9/093020 ↩