build_dataset
Hint
The high-level API in this script is BuildDatabase.
For example:
database = BuildDatabase()
database.build()
See https://jzhang-github.github.io/AGAT/Tutorial/Build_database.html for more info.
Warning
Some functions on this page will be deprecated in the future, including select_graphs_random and concat_graphs. Use select_graphs_from_dataset_random and concat_dataset, respectively.
- class ReadGraphs
This object is used to build a list of graphs.
- __init__(self, **data_config)
- Parameters:
**data_config (dict) –
Configuration file for building database. See https://jzhang-github.github.io/AGAT/Default%20parameters.html#default-data-config for the detailed info.
Hint
The mode used to find neighbors can be one of:
'voronoi': consider Voronoi neighbors only.
'pymatgen_dist': build the graph based on a constant distance using the pymatgen module.
'ase_dist': build the graph based on a constant distance using the ase module.
'ase_natural_cutoffs': build the graph with ase, which has a dynamic cutoff scheme. In this case, the cutoff is not used because ase will use the dynamic cutoffs in ase.neighborlist.natural_cutoffs().
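To make the difference between a constant cutoff and a dynamic cutoff concrete, here is a minimal pure-Python sketch. It is not the implementation used by AGAT, pymatgen, or ase. The helper names and the radii table are assumptions made for illustration, and periodic boundary conditions are ignored. It only mirrors the general idea that a dynamic scheme gives each atom its own radius and connects two atoms when their distance is below the sum of their radii.

```python
import math
from itertools import combinations

# Illustrative covalent radii in angstrom (assumed values for this sketch only).
RADII = {"H": 0.31, "C": 0.76, "O": 0.66}


def neighbors_constant(symbols, positions, cutoff):
    """Return index pairs (i, j) closer than one fixed cutoff distance."""
    return [
        (i, j)
        for i, j in combinations(range(len(symbols)), 2)
        if math.dist(positions[i], positions[j]) < cutoff
    ]


def neighbors_dynamic(symbols, positions, mult=1.0):
    """Return index pairs (i, j) closer than the sum of their per-element radii."""
    pairs = []
    for i, j in combinations(range(len(symbols)), 2):
        limit = mult * (RADII[symbols[i]] + RADII[symbols[j]])
        if math.dist(positions[i], positions[j]) < limit:
            pairs.append((i, j))
    return pairs


# A CO2-like linear molecule with 1.16 angstrom C-O bonds.
symbols = ["O", "C", "O"]
positions = [(-1.16, 0.0, 0.0), (0.0, 0.0, 0.0), (1.16, 0.0, 0.0)]

fixed = neighbors_constant(symbols, positions, cutoff=2.5)
dynamic = neighbors_dynamic(symbols, positions)
print(fixed)    # the fixed 2.5 angstrom cutoff also links the two oxygen atoms
print(dynamic)  # the per-element limit keeps only the two C-O pairs
```

With a single large cutoff, the two oxygen atoms become neighbors even though they are not bonded, whereas the per-element limit keeps only the C-O pairs.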
- read_batch_graphs(self, batch_index_list, batch_num)
Read graphs with batches.
Note
The loaded graphs are saved under the attribute of dataset_path.
- Parameters:
batch_index_list (list) – a list of graph indices.
batch_num (str) – the number used to label the graph batch.
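One way to prepare index lists for batched reading is to cut the full index range into consecutive chunks. The sketch below is an assumption about how such lists might be built. `make_batches` is a hypothetical helper, not part of AGAT, and the string labels simply follow the documented `str` type of batch_num.

```python
def make_batches(num_graphs, batch_size):
    """Split range(num_graphs) into consecutive index lists of at most batch_size."""
    indices = list(range(num_graphs))
    return [indices[i:i + batch_size] for i in range(0, num_graphs, batch_size)]


batches = make_batches(num_graphs=10, batch_size=4)

# Pair each index list with a string label, since batch_num is documented as str.
labelled = [(str(n), batch) for n, batch in enumerate(batches)]

for batch_num, batch_index_list in labelled:
    print(batch_num, batch_index_list)
```

Each (label, index list) pair could then be handed to a batch reader one at a time.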
- read_all_graphs(self, scale_prop=False, ckpt_path='.')
Read all graphs specified in the csv file.
Note
The loaded graphs are saved under the attribute of dataset_path.
Danger
Do not scale the label if you don’t know what you are doing.
- Parameters:
scale_prop (bool) – scale the label or not. DO NOT scale unless you know what you are doing.
ckpt_path (str) – checkpoint directory of the well-trained model.
- Returns:
graph_list: a list of DGL graphs.
graph_labels: a list of labels.
- class TrainValTestSplit(object)
Split the dataset.
Note
This object is deprecated.
- class ExtractVaspFiles(object)
Extract VASP outputs for building AGAT database.
- Parameters:
data_config['dataset_path'] (str) – Absolute path where the collected data will be saved.
Note
Always save the property per node as the label. For example: energy per atom (eV/atom).
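As a small illustration of the per-node convention, the sketch below converts a total energy into an energy per atom. The function name and the numbers are made up for this example and are not taken from AGAT.

```python
def per_atom(total_value, num_atoms):
    """Convert an extensive property (e.g. total energy in eV) to a per-atom value."""
    if num_atoms <= 0:
        raise ValueError("num_atoms must be positive")
    return total_value / num_atoms


# Hypothetical numbers: a cell with 8 atoms and a total energy of -43.2 eV.
total_energy_ev = -43.2
num_atoms = 8
label = per_atom(total_energy_ev, num_atoms)  # energy per atom in eV/atom
print(label)
```

Storing the per-atom value keeps labels comparable between structures with different numbers of atoms.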
- __init__(self, **data_config)
- Parameters:
**data_config (dict) – Configuration file for building database. See https://jzhang-github.github.io/AGAT/Default%20parameters.html#default-data-config for the detailed info.
- read_oszicar(self, fname='OSZICAR')
Get the electronic steps of a VASP run.
- Parameters:
fname (str, optional) – file name, defaults to ‘OSZICAR’
- Returns:
electronic steps of a VASP run.
- Return type:
list.
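The following sketch shows one way to count electronic steps from OSZICAR-style text. It is not AGAT's read_oszicar. It assumes that each electronic step line starts with a three-letter tag followed by a colon (such as DAV: or RMM:), that each ionic step ends with a line containing `F=`, and that the wanted result is the number of electronic steps per ionic step. The sample text contains made-up numbers.

```python
import re

# Electronic-step lines, e.g. "DAV:   3   -0.110E+02 ..." (assumed layout).
ELECTRONIC_LINE = re.compile(r"^\s*[A-Z]{3}:\s+\d+")
# Ionic-step summary lines, e.g. "   1 F= -.11000000E+02 E0= ..." (assumed layout).
IONIC_LINE = re.compile(r"^\s*\d+\s+F=")


def count_electronic_steps(text):
    """Return the number of electronic steps taken in each ionic step."""
    steps = []
    count = 0
    for line in text.splitlines():
        if ELECTRONIC_LINE.match(line):
            count += 1
        elif IONIC_LINE.match(line):
            steps.append(count)
            count = 0  # start counting for the next ionic step
    return steps


# Made-up sample in the style of an OSZICAR file.
SAMPLE = """\
       N       E                     dE             d eps       ncg     rms
DAV:   1     0.512E+02    0.512E+02   -0.300E+03   480   0.700E+02
DAV:   2    -0.101E+02   -0.613E+02   -0.590E+02   560   0.120E+02
DAV:   3    -0.110E+02   -0.900E+00   -0.880E+00   520   0.150E+01
   1 F= -.11000000E+02 E0= -.11000000E+02  d E =-.110000E+02
       N       E                     dE             d eps       ncg     rms
RMM:   1    -0.111E+02   -0.100E+00   -0.900E-01   500   0.400E+00
RMM:   2    -0.111E+02   -0.100E-02   -0.900E-03   480   0.500E-01
   2 F= -.11100000E+02 E0= -.11100000E+02  d E =-.100000E+00
"""

steps = count_electronic_steps(SAMPLE)
print(steps)
```

To apply the same function to a real file, read its contents first, for example with `open('OSZICAR').read()`.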
- split_output(self, process_index)
- Parameters:
process_index (int) – A number to index the process.
- __call__(self)
The __call__ function
- class BuildDatabase
Build a database. Detailed information: https://jzhang-github.github.io/AGAT/Tutorial/Build_database.html
- __init__(self, **data_config)
- Parameters:
**data_config (dict) – Configuration file for building database. See https://jzhang-github.github.io/AGAT/Default%20parameters.html#default-data-config for the detailed info.
- build(self)
Run the construction process.
- concat_graphs(*list_of_bin)
Concat binary graph files.
- Parameters:
*list_of_bin (strings) – input file names of binary graphs.
- Returns:
A new file is saved to the current directory: concated_graphs.bin.
- Return type:
None. A new file.
Example:
concat_graphs('graphs1.bin', 'graphs2.bin', 'graphs3.bin')
- concat_dataset(*list_of_datasets, save_file=False, fname='concated_graphs.bin')
Concat agat.dataset.Dataset objects in the RAM.
- Parameters:
*list_of_datasets (agat.dataset.Dataset) – a list of agat.dataset.Dataset objects.
save_file (bool) – save to a new file or not. Default: False
fname (str) – The saved file name if save_file=True. Default: ‘concated_graphs.bin’
- Returns:
The concatenated dataset. If save_file=True, a new file is also saved to the current directory (default name: concated_graphs.bin).
- Return type:
agat.dataset.Dataset
- select_graphs_random(fname: str, num: int)
Randomly select graphs from a binary file.
- Parameters:
fname (str) – input file name.
num (int) – number of selected graphs (should be smaller than the total number of graphs).
- Returns:
A new file is saved to the current directory: Selected_graphs.bin.
- Return type:
None. A new file.
Example:
select_graphs_random('graphs1.bin', 100)
- select_graphs_from_dataset_random(dataset, num: int, save_file=False, fname='selected_graphs.bin')
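Randomly select graphs from a dataset.
- Parameters:
dataset – the dataset to select graphs from.
num (int) – number of selected graphs (should be smaller than the total number of graphs).
save_file (bool) – save to a new file or not. Default: False
fname (str) – the saved file name if save_file=True. Default: ‘selected_graphs.bin’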
- Returns:
If save_file=True, a new file is saved to the current directory (default name: selected_graphs.bin).
- Return type:
None. A new file.
Example:
select_graphs_from_dataset_random(dataset, 100)
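The idea behind random selection can be sketched with the standard library alone. This is not AGAT's implementation. `select_random` is a hypothetical helper, and the strings stand in for graph objects. It samples without replacement and, following the note on num above, rejects requests that are not smaller than the total number of items.

```python
import random


def select_random(items, num, seed=None):
    """Return num distinct items chosen at random, without replacement."""
    if not 0 < num < len(items):
        raise ValueError("num must be positive and smaller than len(items)")
    rng = random.Random(seed)  # a seed makes the selection reproducible
    return rng.sample(items, num)


# Placeholder strings standing in for graph objects.
graphs = [f"graph_{i}" for i in range(10)]
subset = select_random(graphs, 3, seed=0)
print(subset)
```

Passing a fixed seed makes the selected subset reproducible between runs.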