build_dataset

The high-level API in this script is BuildDatabase.

For example:

database = BuildDatabase()
database.build()

See https://jzhang-github.github.io/AGAT/Tutorial/Build_database.html for more info.

class CrystalGraph(object)

Read structural file and return a graph.

Caution

The constructed crystal graph may be unreasonable for high-entropy materials if the connections are determined by the Voronoi method.

Code example:

from Crystal2Graph import CrystalGraph
cg = CrystalGraph(cutoff = 6.0, mode_of_NN='distance', adsorbate=True)
cg.get_graph('POSCAR')

Hint

Although we recommend representing atoms with one-hot codes, you can use another representation with: self.all_atom_feat = get_atomic_features()
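
As a rough illustration of the one-hot representation mentioned in this hint, the sketch below encodes element symbols against a fixed vocabulary. The function name one_hot_elements and the three-element vocabulary are hypothetical; they are not part of AGAT.

```python
def one_hot_elements(symbols, vocabulary):
    """Return one one-hot row per symbol, with columns ordered as in `vocabulary`."""
    index = {element: i for i, element in enumerate(vocabulary)}
    rows = []
    for symbol in symbols:
        row = [0] * len(vocabulary)
        row[index[symbol]] = 1  # raises KeyError for elements outside the vocabulary
        rows.append(row)
    return rows


# Hypothetical vocabulary; a real one would cover every element in the dataset.
features = one_hot_elements(["Ni", "O", "H", "Ni"], ["H", "O", "Ni"])
```

Each atom gets a vector with a single 1 in the column of its element, so the representation carries no assumption about chemical similarity between elements.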

Hint

To build a reasonable graph, a small cell should be repeated. You can modify “self._cell_length_cutoff” for special needs.
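
One way to picture the repetition is to repeat the cell along each axis until its length reaches a threshold. The sketch below is an assumption about how such repeat counts could be computed; the helper name repeat_counts and the threshold value are hypothetical, and AGAT's use of self._cell_length_cutoff may differ.

```python
import math


def repeat_counts(cell_lengths, length_cutoff):
    """Smallest integer repeats so that every repeated cell length >= length_cutoff."""
    return [max(1, math.ceil(length_cutoff / length)) for length in cell_lengths]


# Cell lengths in angstrom; 15.0 is an illustrative threshold, not AGAT's default.
repeats = repeat_counts([3.6, 3.6, 20.0], 15.0)
```

Here the two short axes would be repeated five times each, while the long axis is left as it is.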

Hint

We encourage you to use the ase module to build crystal graphs. The pymatgen module requires some dependencies that conflict with other modules.

__init__(self, **data_config)
Parameters:

**data_config (str/dict) –

Configuration file for building database. See https://jzhang-github.github.io/AGAT/Default%20parameters.html#default-data-config for the detailed info.

Returns:

A DGL.graph.

Return type:

DGL.graph.

Hint

Mode of how to get the neighbors, which can be:

  • 'voronoi': consider Voronoi neighbors only.

  • 'pymatgen_dist': build graph based on a constant distance using pymatgen module.

  • 'ase_dist': build graph based on a constant distance using ase module.

  • 'ase_natural_cutoffs': build graph with ase using a dynamic cutoff scheme. In this case, the cutoff parameter is ignored because ase uses the dynamic cutoffs from ase.neighborlist.natural_cutoffs().
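
To show how a dynamic cutoff differs from a constant one, the sketch below treats two atoms as neighbors when spheres of their covalent radii overlap. It is a plain-Python illustration of the idea and does not call ase; the radii are approximate values hard-coded for the example, and the function name is_neighbor is hypothetical.

```python
import math

# Approximate covalent radii in angstrom, hard-coded for illustration only.
RADII = {"H": 0.31, "O": 0.66, "Pt": 1.36}


def is_neighbor(symbol_a, pos_a, symbol_b, pos_b, mult=1.0):
    """Two atoms are neighbors when their scaled radius spheres overlap."""
    cutoff = mult * (RADII[symbol_a] + RADII[symbol_b])
    return math.dist(pos_a, pos_b) < cutoff


o_h_bonded = is_neighbor("O", (0.0, 0.0, 0.0), "H", (0.0, 0.0, 0.9))
pt_h_bonded = is_neighbor("Pt", (0.0, 0.0, 0.0), "H", (0.0, 0.0, 3.0))
```

The pair-specific cutoff depends on the two elements involved, which is why a single constant cutoff is not needed in this mode.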

Parameters:

adsorbate (bool) – Identify the adsorbate or not.

get_adsorbate_bool(self, element_list)

Identify adsorbates based on elements: H and O.

Parameters:

element_list (list) – a list of element symbols.

Returns:

a list of bool values.

Return type:

torch.tensor
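
A minimal sketch of the element-based rule described above, assuming that every H and O atom counts as an adsorbate atom. It returns a plain Python list rather than a torch.tensor, and the name adsorbate_flags is hypothetical.

```python
ADSORBATE_ELEMENTS = {"H", "O"}


def adsorbate_flags(element_list):
    """Return True for each symbol that is treated as an adsorbate element."""
    return [symbol in ADSORBATE_ELEMENTS for symbol in element_list]


flags = adsorbate_flags(["Ni", "Ni", "O", "H"])
```

Because the rule looks only at element symbols, an oxygen atom inside an oxide substrate would be flagged as well.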

get_crystal(self, crystal_fpath, super_cell=True)

Read structural file and return a pymatgen crystal object.

Parameters:
  • crystal_fpath (str) – the path to the crystal structure file.

  • super_cell (bool) – repeat the cell or not.

Returns:

a pymatgen structure object.

Return type:

pymatgen.core.structure.

get_1NN_pairs_voronoi(self, crystal)

The get_connections_new() of VoronoiConnectivity object is modified.

Parameters:

crystal (pymatgen.core.structure) – a pymatgen structure object.

Returns:
  • index of senders

  • index of receivers

  • a list of distances between senders and receivers

get_1NN_pairs_distance(self, crystal)

Find the indices of senders and receivers, and the distances between them, based on the distance_matrix of the pymatgen crystal object.

Parameters:

crystal (pymatgen.core.structure) – a pymatgen crystal object.

Returns:
  • index of senders

  • index of receivers

  • a list of distances between senders and receivers
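
The three return values can be pictured with the sketch below, which finds all ordered atom pairs closer than a constant cutoff. It works on bare Cartesian coordinates and ignores periodic boundary conditions, which a pymatgen distance matrix would account for; the function name pairs_within_cutoff is hypothetical.

```python
import math


def pairs_within_cutoff(positions, cutoff):
    """Return (senders, receivers, distances) for ordered pairs i != j within cutoff."""
    senders, receivers, distances = [], [], []
    for i, p in enumerate(positions):
        for j, q in enumerate(positions):
            if i == j:
                continue  # self-pairs are not neighbors
            d = math.dist(p, q)
            if d <= cutoff:
                senders.append(i)
                receivers.append(j)
                distances.append(d)
    return senders, receivers, distances


positions = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
senders, receivers, distances = pairs_within_cutoff(positions, cutoff=2.0)
```

Every pair appears twice, once in each direction, so the sender and receiver lists already describe a bidirectional set of edges.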

get_1NN_pairs_ase_distance(self, ase_atoms)
Parameters:

ase_atoms (ase.atoms) – ase.atoms object.

Returns:
  • index of senders

  • index of receivers

  • a list of distances between senders and receivers

get_ndata(self, crystal)
Parameters:

crystal (pymatgen.core.structure) – a pymatgen crystal object.

Returns:

ndata: the atomic representations of a crystal graph.

Return type:

numpy.ndarray

get_graph_from_ase(self, fname, include_forces=False)

Build graphs with ase.

Parameters:
  • fname (str/ase.Atoms) – File name or ase.Atoms object.

  • include_forces (bool) – Include forces into graphs or not.

Returns:

A bidirectional graph with self-loop connection.
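
To make "bidirectional graph with self-loop connection" concrete, the sketch below expands a list of undirected atom pairs into directed edges in both directions and adds one self-loop per node. It uses plain Python lists; the function name is hypothetical and this is not AGAT's own construction code.

```python
def bidirectional_with_self_loops(pairs, num_nodes):
    """Expand undirected pairs into directed edges both ways, plus one self-loop per node."""
    edges = set()
    for i, j in pairs:
        edges.add((i, j))
        edges.add((j, i))
    for node in range(num_nodes):
        edges.add((node, node))
    ordered = sorted(edges)  # deterministic edge order
    senders = [s for s, _ in ordered]
    receivers = [r for _, r in ordered]
    return senders, receivers


src, dst = bidirectional_with_self_loops([(0, 1), (1, 2)], num_nodes=3)
```

Sender and receiver lists of this form are what DGL accepts when creating a graph, for example via dgl.graph((src, dst)).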

get_graph_from_pymatgen(self, crystal_fname, super_cell=True, include_forces=False)

Build graphs with pymatgen.

Parameters:
  • crystal_fname (str) – File name.

  • super_cell (bool) – repeat small cell or not.

  • include_forces (bool) – Include forces into graphs or not.

Returns:

A bidirectional graph with self-loop connection.

get_graph(self, crystal_fname, super_cell=False, include_forces=True)

This method selects the graph-construction method according to the mode_of_NN attribute.

Hint

You can call this method to build one graph.

Parameters:
  • crystal_fname (str) – File name.

  • super_cell (bool) – repeat small cell or not.

  • include_forces (bool) – Include forces into graphs or not.

Returns:

A bidirectional graph with self-loop connection.
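
The dispatch on mode_of_NN might look roughly like the sketch below. The mapping from mode to backend is an assumption inferred from the mode descriptions in this document, and choose_backend is a hypothetical helper rather than an AGAT function.

```python
# Assumed mapping from neighbor mode to backend, inferred from the mode descriptions.
BACKEND_BY_MODE = {
    "voronoi": "pymatgen",
    "pymatgen_dist": "pymatgen",
    "ase_dist": "ase",
    "ase_natural_cutoffs": "ase",
}


def choose_backend(mode_of_NN):
    """Return the backend name for a neighbor mode, or raise on unknown modes."""
    try:
        return BACKEND_BY_MODE[mode_of_NN]
    except KeyError:
        raise ValueError(f"Unsupported mode_of_NN: {mode_of_NN!r}") from None


backend = choose_backend("ase_dist")
```

Failing loudly on an unknown mode avoids silently building a graph with an unintended neighbor scheme.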

class ReadGraphs

This object is used to build a list of graphs.

__init__(self, **data_config)
Parameters:

**data_config (dict) –

Configuration file for building database. See https://jzhang-github.github.io/AGAT/Default%20parameters.html#default-data-config for the detailed info.

Hint

Mode of how to get the neighbors, which can be:

  • 'voronoi': consider Voronoi neighbors only.

  • 'pymatgen_dist': build graph based on a constant distance using pymatgen module.

  • 'ase_dist': build graph based on a constant distance using ase module.

  • 'ase_natural_cutoffs': build graph from ase which has a dynamic cutoff scheme. In this case, the cutoff is deprecated because ase will use the dynamic cutoffs in ase.neighborlist.natural_cutoffs().

read_batch_graphs(self, batch_index_list, batch_num)

Read graphs with batches.

Note

The loaded graphs are saved under the path given by the dataset_path attribute.

Parameters:
  • batch_index_list (list) – a list of graph indices.

  • batch_num (str) – the number used to label this batch of graphs.
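
How the graph indices are divided into batches is not specified here. As one possibility, the sketch below splits a list of indices into contiguous chunks of near-equal size and pairs each chunk with a string batch number; the helper name split_into_batches is hypothetical.

```python
def split_into_batches(indices, num_batches):
    """Split indices into num_batches contiguous chunks of near-equal size."""
    base, extra = divmod(len(indices), num_batches)
    batches, start = [], 0
    for b in range(num_batches):
        size = base + (1 if b < extra else 0)  # the first `extra` batches get one more
        batches.append(indices[start:start + size])
        start += size
    return batches


batches = split_into_batches(list(range(7)), 3)
# Pair each batch with a string label, matching the (list, str) parameter types above.
jobs = [(batch, str(number)) for number, batch in enumerate(batches)]
```

Each entry of jobs then has the shape of one (batch_index_list, batch_num) argument pair.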

read_all_graphs(self, scale_prop=False, ckpt_path='.')

Read all graphs specified in the csv file.

Note

The loaded graphs are saved under the path given by the dataset_path attribute.

Danger

Do not scale the label unless you know what you are doing.

Parameters:
  • scale_prop (bool) – scale the label or not. DO NOT scale unless you know what you are doing.

  • ckpt_path (str) – checkpoint directory of the well-trained model.

Returns:
  • graph_list: a list of DGL graphs.

  • graph_labels: a list of labels.

class TrainValTestSplit(object)

Split the dataset.

Note

This object is deprecated.

class ExtractVaspFiles(object)

Extract VASP outputs for building AGAT database.

Parameters:

data_config['dataset_path'] (str) – Absolute path where the collected data is saved.

Note

Always save the property per node as the label. For example: energy per atom (eV/atom).
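
As a small worked example of a per-node label, the sketch below converts a total energy into an energy per atom. The function name and the numbers are illustrative only.

```python
def energy_per_atom(total_energy_ev, num_atoms):
    """Convert a total energy (eV) into a per-atom label (eV/atom)."""
    if num_atoms <= 0:
        raise ValueError("num_atoms must be positive")
    return total_energy_ev / num_atoms


# Illustrative values: a 64-atom cell with a total energy of -256.0 eV.
label = energy_per_atom(-256.0, 64)
```

A per-atom label keeps structures of different sizes on a comparable scale.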

__init__(self, **data_config)
Parameters:

**data_config (dict) –

Configuration file for building database. See https://jzhang-github.github.io/AGAT/Default%20parameters.html#default-data-config for the detailed info.

read_oszicar(self, fname='OSZICAR')

Get the electronic steps of a VASP run.

Parameters:

fname (str, optional) – file name, defaults to ‘OSZICAR’

Returns:

electronic steps of a VASP run.

Return type:

list
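
The exact content of the returned list is not specified here. One reading is the number of electronic steps per ionic step, which the sketch below extracts from OSZICAR-style text. The parsing rules, the function name count_electronic_steps, and the simplified sample are assumptions, not AGAT's implementation.

```python
def count_electronic_steps(oszicar_text):
    """Return the number of electronic steps for each ionic step."""
    steps, count = [], 0
    for line in oszicar_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].endswith(":") and len(tokens) > 1 and tokens[1].isdigit():
            count += 1  # electronic step line, e.g. "DAV:   3 ..." or "RMM:   3 ..."
        elif "F=" in line:
            steps.append(count)  # an "F=" line closes one ionic step
            count = 0
    return steps


# Simplified, illustrative OSZICAR content (columns shortened).
sample = """\
       N       E          dE
DAV:   1     0.1E+03    0.1E+03
DAV:   2    -0.1E+02   -0.1E+03
   1 F= -.10E+02 E0= -.10E+02  d E =-.10E+02
       N       E          dE
RMM:   1    -0.1E+02   -0.1E-03
   2 F= -.10E+02 E0= -.10E+02  d E =-.10E-03
"""
electronic_steps = count_electronic_steps(sample)
```

A count equal to the maximum number of electronic steps allowed in the run would suggest that the corresponding ionic step did not converge electronically.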

split_output(self, process_index)
Parameters:

process_index (int) – A number to index the process.

__call__(self)

The __call__ function

class BuildDatabase

Build a database. Detailed information: https://jzhang-github.github.io/AGAT/Tutorial/Build_database.html

__init__(self, **data_config)
Parameters:

**data_config (dict) –

Configuration file for building database. See https://jzhang-github.github.io/AGAT/Default%20parameters.html#default-data-config for the detailed info.

build(self)

Run the construction process.

concat_graphs(*list_of_bin)

Concatenate binary graph files.

Parameters:

*list_of_bin

input file names of binary graphs.

Returns:

A new file is saved to the current directory: concated_graphs.bin.

Return type:

None. A new file.

Example:

concat_graphs('graphs1.bin', 'graphs2.bin', 'graphs3.bin')