build_dataset
Hint
The high-level API in this script is BuildDatabase.
For example:
database = BuildDatabase()
database.build()
See https://jzhang-github.github.io/AGAT/Tutorial/Build_database.html for more info.
Warning
Some functions on this page will be deprecated in the future, including select_graphs_random and concat_graphs. Use select_graphs_from_dataset_random and concat_dataset instead, respectively.
- class ReadGraphs
This object is used to build a list of graphs.
- __init__(self, **data_config)
- Parameters:
**data_config (dict) –
Configuration file for building database. See https://jzhang-github.github.io/AGAT/Default%20parameters.html#default-data-config for the detailed info.
Hint
Mode of how to get the neighbors, which can be:
'voronoi': consider Voronoi neighbors only.
'pymatgen_dist': build graph based on a constant distance using the pymatgen module.
'ase_dist': build graph based on a constant distance using the ase module.
'ase_natural_cutoffs': build graph from ase, which has a dynamic cutoff scheme. In this case, the cutoff is deprecated because ase will use the dynamic cutoffs in ase.neighborlist.natural_cutoffs().
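To make the constant-distance modes more concrete, the following is a minimal sketch of building graph edges from a fixed cutoff. It is not AGAT's implementation: the helper name edges_within_cutoff is hypothetical, only the standard library is used, and periodic boundary conditions (which pymatgen and ase handle) are ignored for brevity.

```python
import math
from itertools import combinations


def edges_within_cutoff(positions, cutoff):
    """Return directed edges (i, j) for atom pairs no farther apart than `cutoff`.

    Hypothetical helper for illustration only; periodic images are NOT considered.
    """
    edges = []
    for i, j in combinations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) <= cutoff:
            # Store both directions so each atom sees the other as a neighbor.
            edges.append((i, j))
            edges.append((j, i))
    return edges


# Three atoms on a line: 1.0 and 2.5 angstrom apart (made-up coordinates).
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.5, 0.0, 0.0)]
short_edges = edges_within_cutoff(positions, cutoff=1.5)
long_edges = edges_within_cutoff(positions, cutoff=3.0)
```

With the smaller cutoff only the first two atoms are connected; the larger cutoff also links the second and third atoms. A dynamic scheme such as 'ase_natural_cutoffs' would instead choose the cutoff per atom pair.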
- read_batch_graphs(self, batch_index_list, batch_num)
Read graphs with batches.
Note
The loaded graphs are saved under the attribute of dataset_path.
- Parameters:
batch_index_list (list) – a list of graph indices.
batch_num (str) – the number used to label the graph batch.
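A batch_index_list is simply a list of graph indices for one batch. The sketch below shows one way such lists could be produced from a full index range. The helper split_into_batches is hypothetical, and AGAT's own batching logic may differ.

```python
def split_into_batches(indices, num_batches):
    """Split `indices` into `num_batches` contiguous chunks of near-equal size.

    Hypothetical helper for illustration; not part of the AGAT API.
    """
    if num_batches <= 0:
        raise ValueError("num_batches must be positive")
    base, extra = divmod(len(indices), num_batches)
    batches, start = [], 0
    for b in range(num_batches):
        # The first `extra` batches take one additional index each.
        size = base + (1 if b < extra else 0)
        batches.append(indices[start:start + size])
        start += size
    return batches


batches = split_into_batches(list(range(10)), 3)
```

Each inner list could then be passed as batch_index_list, with its position in batches serving as the batch number.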
- read_all_graphs(self, scale_prop=False, ckpt_path='.')
Read all graphs specified in the csv file.
Note
The loaded graphs are saved under the attribute of dataset_path.
Danger
Do not scale the label if you don’t know what you are doing.
- Parameters:
scale_prop (bool) – scale the label or not. DO NOT scale unless you know what you are doing.
ckpt_path (str) – checkpoint directory of the well-trained model.
- Returns:
graph_list: a list of DGL graphs.
graph_labels: a list of labels.
- class TrainValTestSplit(object)
Split the dataset.
Note
This object is deprecated.
- class ExtractVaspFiles(object)
Extract VASP outputs for building AGAT database.
- Parameters:
data_config['dataset_path'] (str) – Absolute path where the collected data are saved.
Note
Always save the property per node as the label. For example: energy per atom (eV/atom).
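The per-node convention amounts to dividing a total property by the number of atoms, which keeps labels comparable across structures of different sizes. A minimal sketch, using a hypothetical helper name and made-up numbers:

```python
def energy_per_atom(total_energy_ev, num_atoms):
    """Convert a total energy (eV) into a per-atom label (eV/atom).

    Hypothetical helper for illustration; not part of the AGAT API.
    """
    if num_atoms <= 0:
        raise ValueError("num_atoms must be positive")
    return total_energy_ev / num_atoms


# An 8-atom cell with a total energy of -32.0 eV (illustrative values).
label = energy_per_atom(-32.0, 8)
```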
- __init__(self, **data_config)
- Parameters:
**data_config (dict) – Configuration file for building database. See https://jzhang-github.github.io/AGAT/Default%20parameters.html#default-data-config for the detailed info.
- read_oszicar(self, fname='OSZICAR')
Get the electronic steps of a VASP run.
- Parameters:
fname (str, optional) – file name, defaults to ‘OSZICAR’
- Returns:
electronic steps of a VASP run.
- Return type:
list.
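As an illustration of what reading electronic steps from an OSZICAR file can look like, the sketch below counts the electronic iterations in each ionic step. It rests on assumptions: the function name is hypothetical, only the DAV: and RMM: iteration tags are recognized, the sample text uses made-up numbers, and the exact content of the list returned by read_oszicar may differ.

```python
SAMPLE_OSZICAR = """\
       N       E                     dE             d eps       ncg     rms          rms(c)
DAV:   1     0.512E+02    0.512E+02   -0.301E+03   480   0.621E+02
DAV:   2    -0.104E+02   -0.616E+02   -0.587E+02   560   0.113E+02
DAV:   3    -0.121E+02   -0.170E+01   -0.165E+01   520   0.214E+01
   1 F= -.12100000E+02 E0= -.12100000E+02  d E =-.121000E+02
       N       E                     dE             d eps       ncg     rms          rms(c)
DAV:   1    -0.122E+02   -0.100E+00   -0.950E-01   480   0.400E+00
DAV:   2    -0.122E+02   -0.200E-02   -0.190E-02   500   0.500E-01
   2 F= -.12200000E+02 E0= -.12200000E+02  d E =-.100000E+00
"""


def count_electronic_steps(lines):
    """Return the number of electronic iterations for each ionic step.

    Hypothetical helper: an ionic step is assumed to end at a line containing 'F='.
    """
    steps, current = [], 0
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(("DAV:", "RMM:")):
            current += 1            # one electronic iteration
        elif "F=" in stripped:
            steps.append(current)   # ionic step finished
            current = 0
    return steps


electronic_steps = count_electronic_steps(SAMPLE_OSZICAR.splitlines())
```

To read a real file, the same function could be given the lines of an open file handle instead of the sample string.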
- split_output(self, process_index)
- Parameters:
process_index (int) – A number to index the process.
- __call__(self)
The __call__ function
- class BuildDatabase
Build a database. Detailed information: https://jzhang-github.github.io/AGAT/Tutorial/Build_database.html
- __init__(self, **data_config)
- Parameters:
**data_config (dict) – Configuration file for building database. See https://jzhang-github.github.io/AGAT/Default%20parameters.html#default-data-config for the detailed info.
- build(self)
Run the construction process.
- concat_graphs(*list_of_bin)
Concat binary graph files.
- Parameters:
*list_of_bin (strings) – input file names of binary graphs.
- Returns:
A new file is saved to the current directory: concated_graphs.bin.
- Return type:
None. A new file.
Example:
concat_graphs('graphs1.bin', 'graphs2.bin', 'graphs3.bin')
- concat_dataset(*list_of_datasets, save_file=False, fname='concated_graphs.bin')
Concatenate agat.dataset.Dataset objects in RAM.
- Parameters:
*list_of_datasets (agat.dataset.Dataset) – a list of agat.dataset.Dataset objects.
save_file (bool) – save to a new file or not. Default: False
fname (str) – The saved file name if save_file=True. Default: ‘concated_graphs.bin’
- Returns:
The concatenated dataset. If save_file=True, a new file is also saved to the current directory (default: concated_graphs.bin).
- Return type:
agat.dataset.Dataset
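Conceptually, concatenating datasets in memory means joining their graph lists and label lists while keeping each graph aligned with its label. The sketch below uses plain (graphs, labels) tuples as a stand-in for agat.dataset.Dataset. Both this representation and the helper name are assumptions made for illustration.

```python
def concat_in_memory(*datasets):
    """Join several (graphs, labels) pairs into one pair, preserving order.

    Hypothetical stand-in for dataset concatenation; not the AGAT implementation.
    """
    graphs, labels = [], []
    for ds_graphs, ds_labels in datasets:
        if len(ds_graphs) != len(ds_labels):
            raise ValueError("each dataset needs one label per graph")
        graphs.extend(ds_graphs)
        labels.extend(ds_labels)
    return graphs, labels


# Strings stand in for graph objects; labels are made-up numbers.
first = (["g0", "g1"], [-4.0, -3.9])
second = (["g2"], [-4.2])
all_graphs, all_labels = concat_in_memory(first, second)
```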
- select_graphs_random(fname: str, num: int)
Randomly split graphs from a binary file.
- Parameters:
fname (str) – input file name.
num (int) – number of selected graphs (should be smaller than the number of all graphs).
- Returns:
A new file is saved to the current directory: Selected_graphs.bin.
- Return type:
None. A new file.
Example:
select_graphs_random('graphs1.bin', 100)
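The selection itself can be pictured as sampling indices without replacement and keeping each graph paired with its label. The sketch below shows this on plain lists. The helper select_random and its seed argument are hypothetical, and the library's own sampling code may differ.

```python
import random


def select_random(graphs, labels, num, seed=None):
    """Pick `num` graph/label pairs at random, without replacement.

    Hypothetical helper for illustration; not part of the AGAT API.
    """
    if num > len(graphs):
        raise ValueError("num must not exceed the number of graphs")
    rng = random.Random(seed)
    # Sample indices so graphs and labels stay aligned; sort to keep file order.
    chosen = sorted(rng.sample(range(len(graphs)), num))
    return [graphs[i] for i in chosen], [labels[i] for i in chosen]


# Strings stand in for graph objects; each label matches its graph's index.
graphs = [f"g{i}" for i in range(10)]
labels = list(range(10))
picked_graphs, picked_labels = select_random(graphs, labels, 4, seed=0)
```

Fixing the seed makes the selection reproducible, which can help when comparing models trained on the same subset.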
- select_graphs_from_dataset_random(dataset, num: int, save_file=False, fname='selected_graphs.bin')
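Randomly select graphs from a dataset.
- Parameters:
dataset – the input dataset.
num (int) – number of selected graphs (should be smaller than the number of all graphs).
save_file (bool) – save to a new file or not. Default: False
fname (str) – The saved file name if save_file=True. Default: ‘selected_graphs.bin’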
- Returns:
A new file is saved to the current directory: Selected_graphs.bin.
- Return type:
None. A new file.
Example:
select_graphs_from_dataset_random(dataset, 100)