VASP:	Some	Accumulated	Wisdom
J.	M.	Skelton
WMD	Group	Meeting
21st September	2015
WMD	Group	Meeting,	September	2015	|	Slide	2
Convergence:	Parameters
• Four	key	technical	parameters	in	a	VASP	calculation:
o Basis	set:	ENCUT and	PREC (or,	alternatively,	NGX,	NGY,	NGZ)
o k-point	sampling:	KPOINTS file	and	SIGMA
o [For	certain	types	of	pseudopotential.]	Augmentation	grid:	ENAUG and	PREC (or,	
alternatively,	NGXF,	NGYF,	NGZF)
o Which	space	the	projection	operators	are	applied	in	(LREAL)
Convergence:	Augmentation	grid
• A	second,	finer	mesh	is	used	to	represent	the	charge	density	near	the	ion	cores:	
controlled	by	ENAUG (or	PREC +	EAUG in	the	POTCAR files),	which	determines	NG*F
Convergence:	ZnS revisited
• For calculations on ZnS with TPSS, ENAUG needs to be increased from the default (but ENCUT = 550 eV is fine). This is equivalent to increasing NG*F, but without also increasing NG* as in the QHA-ExC paper, which is evidently unnecessary (!) and wasteful
ENCUT / eV   ENAUG / eV   NG*   NG*F   Noise?   t / min
550          575.892      120   160    ✗        -
650          575.892      128   160    ✗        -
750          575.892      140   160    ✗        -
850          575.892      150   160    ✗        -
550          675.892      120   180    ✓        116
550          775.892      120   192    ✓        108
550          875.892      120   200    ✓        113
The	VASP	SCF	cycle
• The	SCF	cycle	proceeds	in	two	phases:
o The	plane-wave	coefficients	are	initialised	randomly	and	“pre-optimised”	within	a	
fixed	potential	given	by	the	superposition	of	atomic	densities	(INIWAV,	NELMDL)
o The	wavefunctions and	density	are	then	optimised	self-consistently	to	convergence	
o If	an	initial	charge	density	exists	(e.g.	from	a	previous	SCF	or	converged	
CHGCAR/WAVECAR),	the	first	step	can	be	skipped	(ISTART,	ICHARG)
• To	accelerate	convergence,	the	output	density	from	a	step	N is	not	fed	directly	into	the	
next	step	N+1,	but	is	mixed with	the	input	density	(IMIX,	INIMIX,	MIXPRE,	MAXMIX,	
• For	the	mathematically-minded:	http://th.fhi-berlin.mpg.de/th/Meetings/DFT-workshop-
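A minimal sketch of why mixing helps, using plain linear mixing on a toy scalar "density". This is not VASP's mixer, which is more sophisticated; the map `toy_scf_map`, the parameter `alpha` and the function names are hypothetical and chosen only for illustration.

```python
def toy_scf_map(rho_in):
    """Hypothetical stand-in for one SCF step: input density -> output density.

    The fixed point is rho = 2.0; the slope of -1.5 makes naive iteration diverge.
    """
    return -1.5 * rho_in + 5.0


def iterate(alpha, rho0=0.0, max_steps=50, tol=1e-8):
    """Iterate rho_in <- rho_in + alpha * (rho_out - rho_in).

    alpha = 1.0 feeds the output straight back in (no mixing).
    Returns (final_rho, steps_taken, converged).
    """
    rho = rho0
    for step in range(1, max_steps + 1):
        residual = toy_scf_map(rho) - rho
        if abs(residual) < tol:
            return rho, step, True
        rho = rho + alpha * residual
    return rho, max_steps, False


unmixed = iterate(alpha=1.0)
mixed = iterate(alpha=0.3)
print("no mixing :", unmixed)
print("alpha=0.3 :", mixed)
```

In this toy problem the unmixed iteration oscillates with growing amplitude, while the mixed one settles onto the fixed point; a smaller `alpha` trades speed for stability.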
The	VASP	SCF	cycle
N E dE d eps ncg rms rms(c)
DAV: 1 0.425437171796E+04 0.42544E+04 -0.38613E+05 920 0.178E+03
DAV: 2 -0.114846409831E+04 -0.54028E+04 -0.51653E+04 1130 0.323E+02
DAV: 3 -0.169662738043E+04 -0.54816E+03 -0.53994E+03 1130 0.100E+02
DAV: 4 -0.171494085624E+04 -0.18313E+02 -0.18206E+02 1160 0.198E+01
DAV: 5 -0.171553585547E+04 -0.59500E+00 -0.59387E+00 1220 0.331E+00 0.706E+01
RMM: 6 -0.159733114612E+04 0.11820E+03 -0.21124E+02 920 0.147E+01 0.352E+01
RMM: 7 -0.157358217358E+04 0.23749E+02 -0.82778E+01 920 0.937E+00 0.173E+01
RMM: 8 -0.157195752202E+04 0.16247E+01 -0.10028E+01 922 0.344E+00 0.736E+00
RMM: 9 -0.157170732229E+04 0.25020E+00 -0.24051E+00 920 0.173E+00 0.186E+00
RMM: 10 -0.157170709721E+04 0.22508E-03 -0.17654E-01 932 0.561E-01 0.965E-01
RMM: 11 -0.157173130475E+04 -0.24208E-01 -0.10240E-01 920 0.332E-01 0.466E-01
RMM: 12 -0.157174953342E+04 -0.18229E-01 -0.23004E-02 920 0.198E-01 0.213E-01
RMM: 13 -0.157175624413E+04 -0.67107E-02 -0.12470E-02 920 0.134E-01 0.938E-02
RMM: 14 -0.157175705572E+04 -0.81159E-03 -0.49641E-03 922 0.781E-02 0.577E-02
RMM: 15 -0.157175711576E+04 -0.60039E-04 -0.62130E-04 922 0.302E-02 0.211E-02
RMM: 16 -0.157175714692E+04 -0.31162E-04 -0.18825E-04 932 0.152E-02 0.146E-02
RMM: 17 -0.157175715237E+04 -0.54516E-05 -0.37827E-05 935 0.701E-03 0.564E-03
RMM: 18 -0.157175715526E+04 -0.28845E-05 -0.88070E-06 824 0.340E-03 0.361E-03
RMM: 19 -0.157175715551E+04 -0.24851E-06 -0.27408E-06 657 0.209E-03
1 F= -.15717572E+04 E0= -.15717572E+04 d E =-.291254-147
Annotations to the output above:
o N: SCF step; the first NELMDL steps are in a fixed potential, and between NELMIN and NELM steps are performed
o E: total free energy
o dE, d eps: change in total energy and eigenvalues
o ncg: number of evaluations of H|Ψ⟩
o rms(c): difference in input and output density; oscillations probably indicate convergence problems
o F, E0 (final line): total free and zero-broadening (σ → 0) energy
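As a rough illustration of how the rms(c) column could be monitored, the sketch below parses OSZICAR-style SCF lines and counts how often rms(c) increases from one step to the next. It assumes whitespace-separated columns as in the output shown earlier; the function names are hypothetical, and the sample lines are copied from that output.

```python
def parse_scf_lines(text):
    """Extract (algorithm, step, energy, rms, rms_c) from OSZICAR-style SCF lines.

    rms_c is None on lines where the last column is absent.
    """
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # SCF lines start with a label such as "DAV:" or "RMM:".
        if len(tokens) < 7 or not tokens[0].endswith(":"):
            continue
        algo = tokens[0].rstrip(":")
        step = int(tokens[1])
        energy = float(tokens[2])
        rms = float(tokens[6])
        rms_c = float(tokens[7]) if len(tokens) > 7 else None
        rows.append((algo, step, energy, rms, rms_c))
    return rows


def count_rms_c_increases(rows):
    """Count steps where rms(c) went up relative to the previous value."""
    values = [r[4] for r in rows if r[4] is not None]
    return sum(1 for a, b in zip(values, values[1:]) if b > a)


sample = """\
DAV: 5 -0.171553585547E+04 -0.59500E+00 -0.59387E+00 1220 0.331E+00 0.706E+01
RMM: 6 -0.159733114612E+04 0.11820E+03 -0.21124E+02 920 0.147E+01 0.352E+01
RMM: 7 -0.157358217358E+04 0.23749E+02 -0.82778E+01 920 0.937E+00 0.173E+01
RMM: 8 -0.157195752202E+04 0.16247E+01 -0.10028E+01 922 0.344E+00 0.736E+00
RMM: 19 -0.157175715551E+04 -0.24851E-06 -0.27408E-06 657 0.209E-03
"""
rows = parse_scf_lines(sample)
print(len(rows), "SCF steps parsed;", count_rms_c_increases(rows), "increases in rms(c)")
```

A count well above zero late in the cycle would be a hint, not proof, that the density is oscillating.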
The	ALGO tag
• ALGO is	the	“recommended”	tag	for	selecting	the	electronic-minimisation	algorithm
• Most	of	the	algorithms	have	“subswitches”,	which	can	be	selected	using	IALGO
• I	tend	to	use	one	of	four	ALGOs:
• RMM-DIIS	(ALGO = VeryFast):	fastest	per	SCF	step,	best	parallelised,	and	
converges	quickly	close	to	a	minimum,	but	can	struggle	with	difficult	systems
• Blocked	Davidson	(ALGO = Normal):	slower than	RMM-DIIS,	but	usually	stable,	
although	can	still	struggle	with	difficult	problems (e.g.	magnetism,	meta-GGAs	and	
• Davidson/RMM-DIIS	(ALGO = Fast):	Uses	ALGO = Normal for	the	“pre-
optimisation”,	then	switches	to	ALGO = VeryFast;	a	good	default	choice
• All-band	conjugate	gradient	(ALGO = All):	Slow,	but	very	stable;	use	as	a	fallback
when	ALGO = Normal struggles,	and	for	hybrids
Taming	TPSS	(and	other	meta-GGAs)
• In my experience, meta-GGAs can sometimes be more difficult to converge than standard GGA functionals (or even hybrids)
• INCAR settings to consider (commented out with "!"; enable as needed):
!ALGO = Normal | All
!ENAUG = MAX(EAUG) * 1.5
!NGXF = <>; NGYF = <>; NGZF = <>;
• Notes on the settings:
o ALGO: RMM-DIIS (ALGO = Fast | VeryFast) sometimes struggles
o ENAUG/NG*F: may need to be increased if very accurate forces are needed (e.g. phonons)
o Don’t forget: (rev)TPSS are based on PBE
o Aspherical gradient corrections inside PAW spheres (LASPH)
o Pass kinetic-energy density to the charge-density mixer (LMIXTAU)
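The ENAUG = MAX(EAUG) * 1.5 suggestion can be sketched as a small helper. This assumes that each dataset in the POTCAR carries a line of the form `EAUG = <value>`; the function name `suggest_enaug` is hypothetical, and the second EAUG value in the sample text is made up for illustration (the first is the default ENAUG from the ZnS table).

```python
import re


def suggest_enaug(potcar_text, factor=1.5):
    """Return factor * max(EAUG) over all EAUG entries found in POTCAR-style text."""
    matches = re.findall(r"EAUG\s*=\s*([0-9]+(?:\.[0-9]*)?)", potcar_text)
    if not matches:
        raise ValueError("no EAUG entries found")
    return factor * max(float(v) for v in matches)


# Illustrative fragment only; 400.000 is an invented value.
sample_potcar = """\
   EAUG   =  575.892
   EAUG   =  400.000
"""
enaug = suggest_enaug(sample_potcar)
print(f"ENAUG = {enaug:.3f}")
```

In practice the text would come from reading the concatenated POTCAR file, and the resulting value would still need to be checked against the forces of interest.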
Parallelisation
• The	newest	versions	of	VASP	implement	four	levels	of	parallelism:
o k-point	parallelism:	KPAR
o Band	parallelism	and	data	distribution:	NCORE and	NPAR
o Parallelisation	and	data	distribution	over	plane-wave	coefficients	(=	FFTs;	done	over	
planes	along	NGZ):	LPLANE
o Parallelisation	of	some	linear-algebra	operations	using	ScaLAPACK (notionally	set	at	
compile	time,	but	can	be	controlled	using	LSCALAPACK)
• Effective	parallelisation	will…:
o …	minimise	(relatively	slow)	communication	between	MPI	processes,	…
o …	distribute	data	to	reduce	memory	requirements…
o …	and	make	sure	the	MPI	processes	have	enough	work	to	keep	them	busy
Parallelisation:	Workload	distribution
[Figure: cores split into KPAR k-point groups and NPAR band groups (?)]
• Workload	distribution	over	KPAR k-point	groups,	NBANDS band	groups	and	NGZ plane-
wave	coefficient	(FFT)	groups	[not	100	%	sure	how	this	works…]
Parallelisation:	Data	distribution
[Figure: data split into KPAR k-point groups and NPAR band groups (?)]
• Data	distribution	over	NBANDS band	groups	and	NGZ plane-wave	coefficient	(FFT)	
groups	[also	not	100	%	sure	how	this	works…]
Parallelisation:	KPAR
• During a standard DFT calculation, k-points are independent -> k-point parallelism should
scale linearly, although perhaps not in practice
• <#cores> must	be	divisible	by	KPAR,	but	the	parallelisation	is	via a	“round-robin”	
algorithm,	so	<#k-points> does	not	need	to	be	divisible	by	KPAR ->	check	how	many	
irreducible k-points	you	have	(head IBZKPT)	and	set	KPAR	accordingly
Example with three k-points (k1, k2, k3):
KPAR = 1: t = 3 [OK]
KPAR = 2: t = 2 [Bad]
KPAR = 3: t = 1 [Good]
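The three-k-point example can be reproduced with a short sketch, on the assumption that t counts round-robin passes over the k-point groups. The helper names are hypothetical, and the 48-core figure is only an illustrative input.

```python
import math


def kpar_rounds(n_kpoints, kpar):
    """Number of round-robin passes needed to cover all k-points."""
    return math.ceil(n_kpoints / kpar)


def idle_slots(n_kpoints, kpar):
    """k-point 'slots' left empty in the final pass."""
    return kpar * kpar_rounds(n_kpoints, kpar) - n_kpoints


def valid_kpar(n_cores, n_kpoints):
    """KPAR values that divide the core count and do not exceed the k-point count."""
    return [k for k in range(1, min(n_cores, n_kpoints) + 1) if n_cores % k == 0]


for kpar in (1, 2, 3):
    print(f"KPAR = {kpar}: passes = {kpar_rounds(3, kpar)}, idle slots = {idle_slots(3, kpar)}")

print("candidate KPAR for 48 cores, 3 k-points:", valid_kpar(48, 3))
```

KPAR = 2 leaves one group idle in the second pass, which is why it is marked as the bad choice: half the cores wait while the last k-point is finished.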
Parallelisation: NCORE and NPAR
NCORE : number of cores in each band group
NPAR : number of bands treated simultaneously
NPAR = <#cores> / NCORE
• Why not NCORE = 1, i.e. NPAR = <#cores> (the default)? More band groups
(probably) increase memory pressure and incur a substantial communication overhead
Parallelisation:	NCORE and	NPAR
• WARNING: VASP will round the default NBANDS up to the next multiple of the number
of band groups (NPAR)
• Since	the	electronic	minimisation	scales	as	a	power	of	NBANDS, this	can	backfire	in	
calculations	with	a	large	NPAR (e.g.	those	requiring	NPAR = <#cores>)
NPAR   Default NBANDS   Adjusted NBANDS
96     455              480
128    455              512
192    455              576
256    455              512
384    455              768
512    455              512
Example	system:
• 238	atoms	w/	272	electrons
• Default	NBANDS =	455
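The adjustment can be checked with a one-line rounding rule. The sketch below assumes NBANDS is simply rounded up to the next multiple of the number of band groups, which reproduces the adjusted values listed for the example system; `adjusted_nbands` is a hypothetical helper name.

```python
def adjusted_nbands(nbands, n_groups):
    """Round nbands up to the next multiple of n_groups."""
    return -(-nbands // n_groups) * n_groups


default_nbands = 455
adjusted = {npar: adjusted_nbands(default_nbands, npar)
            for npar in (96, 128, 192, 256, 384, 512)}

for npar, new in adjusted.items():
    extra = 100 * (new - default_nbands) / default_nbands
    print(f"NPAR = {npar:3d}: NBANDS {default_nbands} -> {new} ({extra:.0f} % extra bands)")
```

Running this before submitting a job makes it easy to spot group counts, such as 384 here, that inflate NBANDS far beyond the default.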
Parallelisation:	Memory
• KPAR: the current implementation does not distribute data over k-point groups -> KPAR =
N will use N times as much memory as KPAR = 1
• NPAR/NCORE: data is distributed over band groups -> decreasing NPAR/increasing
NCORE by a factor of N will reduce memory requirements by a factor of N
• NPAR takes	precedence	over	NCORE - if	you	use	“master”	INCAR files,	make	sure	you	
don’t	define	both
• The defaults for NPAR/NCORE (NPAR = <#cores>, NCORE = 1) are usually a poor
choice for both memory and performance
• Band	parallelism	for	hybrid	functionals has	been	supported	since	VASP	5.3.5;	for	
memory-intensive	calculations,	it	is	a	good	alternative	to	underpopulating nodes
• LPLANE:	distributes	data	over	plane-wave	coefficients,	and	speeds	things	up	by	
reducing	communication	during	FFTs	- the	default	is	LPLANE = .TRUE.,	and	should	
only	need	to	be	changed	for	massively-parallel	architectures	(e.g.	BG/Q)
Parallelisation:	ScaLAPACK
• RMM-DIIS	(ALGO = VeryFast | Fast)	involves	three	steps:
EDDIAG :			subspace	diagonalisation
RMM-DIIS :			electronic	minimisation
ORTHCH :			wavefunction orthogonalisation
Routine    312 atoms         624 atoms         1,248 atoms        1,872 atoms
EDDIAG     2.90 (18.64 %)    12.97 (22.24 %)   75.26 (26.38 %)    208.29 (31.31 %)
RMM-DIIS   12.39 (79.63 %)   42.73 (73.27 %)   187.62 (65.78 %)   379.80 (57.10 %)
ORTHCH     0.27 (1.74 %)     2.62 (4.49 %)     22.36 (7.84 %)     77.11 (11.59 %)
• EDDIAG and ORTHCH formally scale as N^3, and rapidly begin to dominate the SCF cycle
time for large calculations
• A good ScaLAPACK library can improve the performance of these routines in massively-
parallel calculations
See	also:	https://www.nsc.liu.se/~pla/blog/2014/01/30/vasp9k/
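As a rough check on the scaling, the timings in the table can be fitted on a log-log scale. The sketch below uses the number of atoms as a stand-in for N and a plain least-squares slope; the function name `loglog_slope` is hypothetical, and four data points only give an indicative exponent.

```python
import math

atoms = [312, 624, 1248, 1872]
timings = {
    "EDDIAG": [2.90, 12.97, 75.26, 208.29],
    "RMM-DIIS": [12.39, 42.73, 187.62, 379.80],
    "ORTHCH": [0.27, 2.62, 22.36, 77.11],
}


def loglog_slope(x, y):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


slopes = {name: loglog_slope(atoms, t) for name, t in timings.items()}
for name, slope in slopes.items():
    print(f"{name}: t ~ N^{slope:.2f}")
```

The fitted exponents for EDDIAG and ORTHCH come out larger than that for RMM-DIIS, consistent with their growing share of the cycle time in the table.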
Parallelisation:	My	“rules	of	thumb”
• For	x86_64	IB	systems	(Archer,	Balena,	Neon…):
o Use	KPAR in	preference	to	NPAR
o Set	NPAR = (<#nodes>/KPAR) or	NCORE = <#cores/node>
o 1	node/band	group	per	50	atoms;	may	want	to	use	2	nodes/50	atoms	for	hybrids,	
or	decrease	to	½	node	per	band	group	for	<	10	atoms
o ALGO = Fast is usually a good choice, except for badly-behaved systems
o Leave	LPLANE at	the	default	(.TRUE.)
• For the IBM BG/Q (STFC Hartree):
o The	Hartree machine	currently	uses	VASP	5.2.x	->	no	KPAR
o Try	to	choose	a	square	number	of	cores,	and	set	NPAR = sqrt(<#cores>)
o Consider	setting	LPLANE = .FALSE. if	<#cores> ≥	NGZ
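The NPAR/NCORE rules of thumb can be written down as a small helper. This is only a sketch: the function names are hypothetical, the node and core counts in the usage lines are illustrative, and the requirement that the node count be divisible by KPAR is an assumption made here so that NPAR comes out as an integer.

```python
import math


def x86_settings(n_nodes, cores_per_node, kpar=1):
    """NPAR and NCORE following the x86_64 rule of thumb.

    Only one of NPAR/NCORE should go into the INCAR, since NPAR takes precedence.
    """
    if n_nodes % kpar != 0:
        # Assumption of this sketch, so that NPAR = <#nodes>/KPAR is an integer.
        raise ValueError("number of nodes should be divisible by KPAR")
    return {"KPAR": kpar, "NPAR": n_nodes // kpar, "NCORE": cores_per_node}


def bgq_npar(n_cores):
    """NPAR = sqrt(<#cores>) for a square number of cores."""
    root = math.isqrt(n_cores)
    if root * root != n_cores:
        raise ValueError("choose a square number of cores")
    return root


x86 = x86_settings(n_nodes=8, cores_per_node=24, kpar=2)
bgq = bgq_npar(1024)
print(x86)
print("BG/Q NPAR:", bgq)
```

The two NPAR and NCORE values returned for the x86_64 case describe the same layout, one band group per node within each k-point group.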

