dataset
Hint
The objects on this page help you manipulate datasets.
Examples of manipulating datasets:
from agat.data import Dataset, concat_dataset
from agat.data import select_graphs_from_dataset_random
dataset = Dataset('graphs.bin')
da = dataset[11]
db = dataset[1:4]
dc = concat_dataset(da, db)
dd = select_graphs_from_dataset_random(dataset, 10, save_file=False,
fname=None)
print(dataset, da, db, dc, dd)
from agat.data import select_graphs_from_dataset_random
de = select_graphs_from_dataset_random(dd, 3, save_file=False, fname=None)
print(de)
from agat.data import save_dataset
save_dataset(dd, fname='new_dataset.bin')
- class Dataset(torch.utils.data.Dataset)
This object is used to build a list of graphs.
Load the binary graphs.
Example:
import os
from agat.data import LoadDataset
dataset = LoadDataset(os.path.join('dataset', 'all_graphs.bin'))
# You can index or slice the dataset.
g0, props0 = dataset[0]
g_batch, props = dataset[0:100]
# g_batch is a batched collection of graphs. See https://docs.dgl.ai/en/1.1.x/generated/dgl.batch.html
- __init__(self, dataset_path=None, from_file=True, graph_list=None, props=None)
Tip
You can load a dataset from a file or from RAM.
From file: set dataset_path='string_example' and from_file=True.
From RAM: specify from_file=False, and provide graph_list and props.
- Parameters:
dataset_path (str) – Path to the binary DGL graph file.
from_file (bool) – Load from file or not.
graph_list (list) – A list of graphs.
props (list) – Properties tensor corresponding to a list of graphs.
- Returns:
a graph dataset.
- Return type:
list
- __getitem__(self, index)
Index or slice the dataset.
- Parameters:
index (int/slice) – list index or slice
- Returns:
Dataset
- Return type:
agat.data.Dataset
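The indexing behaviour described above can be illustrated with a small stand-in class. This is a minimal sketch, not AGAT's implementation: `TinyDataset` is a hypothetical name, and plain strings and floats stand in for DGL graphs and property tensors. It shows one way an integer index and a slice can both return a dataset object.

```python
class TinyDataset:
    """Hypothetical stand-in for a graph dataset (not the AGAT class)."""

    def __init__(self, graph_list, props):
        # Keep graphs and their properties aligned by position.
        if len(graph_list) != len(props):
            raise ValueError("graph_list and props must have the same length")
        self.graph_list = list(graph_list)
        self.props = list(props)

    def __len__(self):
        return len(self.graph_list)

    def __getitem__(self, index):
        # Turn an int into a one-element slice so both cases return a dataset.
        if isinstance(index, int):
            if index < 0:
                index += len(self)
            if not 0 <= index < len(self):
                raise IndexError("dataset index out of range")
            index = slice(index, index + 1)
        return TinyDataset(self.graph_list[index], self.props[index])

    def __repr__(self):
        return f"TinyDataset(num_graphs={len(self)})"


# Strings and floats are placeholders for graphs and property values.
dataset = TinyDataset([f"g{i}" for i in range(5)], [0.1 * i for i in range(5)])
single = dataset[2]    # a dataset holding one graph
subset = dataset[1:4]  # a dataset holding three graphs
print(single, subset)
```

Returning the same type for both cases means the result can be indexed, sliced, or concatenated again without special handling.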
- __repr__(self)
The output shown when you print() a dataset.
- __len__(self)
The value returned when you call len() on a dataset.
- save(self, file='graphs.bin')
Save the dataset in RAM to the disk.
- Parameters:
file (str) – The output file name.
- Returns:
None. A file will be saved to the disk.
- class Collater(object)
The collate function used in torch.utils.data.DataLoader: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
The collate function determines how to merge the batch data.
Example:
import os
from agat.data import LoadDataset, Collater
from torch.utils.data import DataLoader
dataset = LoadDataset(os.path.join('dataset', 'all_graphs.bin'))
collate_fn = Collater(device='cuda')
data_loader = DataLoader(dataset, batch_size=64, shuffle=True, collate_fn=collate_fn)
- __init__(self, device='cuda')
- Parameters:
device (str) – The device for manipulating Dataset(s)
- __call__(self, data)
Collate the data into batches.
- Parameters:
data (AGAT Dataset) – the output of Dataset
- Returns:
AGAT Dataset with dgl batch graphs. See https://docs.dgl.ai/en/1.1.x/generated/dgl.batch.html
- Return type:
AGAT Dataset
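To make the role of a collate callable concrete, here is a minimal pure-Python sketch. It is not the AGAT Collater: `SimpleCollater` is a hypothetical name, the sample layout (a graph paired with a dict of properties) and the `energy` key are assumptions, and strings stand in for DGL graphs. A real collater would typically merge the graphs with dgl.batch and move tensors to the chosen device.

```python
class SimpleCollater:
    """Hypothetical collate callable (not the AGAT Collater)."""

    def __init__(self, device="cpu"):
        # Stored for illustration only; nothing is moved to a device here.
        self.device = device

    def __call__(self, data):
        # `data` is the list of samples a loader picked for one batch.
        # Assumed layout: each sample is (graph, props_dict).
        graphs = [graph for graph, _ in data]
        keys = data[0][1].keys()
        # Merge the per-sample property dicts into one dict of lists.
        props = {key: [sample_props[key] for _, sample_props in data]
                 for key in keys}
        return graphs, props


samples = [
    ("g0", {"energy": -1.0}),
    ("g1", {"energy": -2.5}),
    ("g2", {"energy": -0.3}),
]
collate_fn = SimpleCollater(device="cpu")
batch_graphs, batch_props = collate_fn(samples)
print(batch_graphs, batch_props)
```

Making the collater a class rather than a bare function lets it carry configuration such as the target device between calls.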
- concat_graphs(*list_of_bin)
Concatenate binary graph files.
- Parameters:
*list_of_bin (strings) – input file names of binary graphs.
- Returns:
A new file is saved to the current directory: concated_graphs.bin.
- Return type:
None. A new file.
Example:
concat_graphs('graphs1.bin', 'graphs2.bin', 'graphs3.bin')
- concat_dataset(*list_of_datasets, save_file=False, fname='concated_graphs.bin')
Concatenate agat.dataset.Dataset objects in RAM.
- Parameters:
*list_of_datasets (agat.dataset.Dataset) – a list of agat.dataset.Dataset objects.
save_file (bool) – save to a new file or not. Default: False
fname (str) – The saved file name if save_file=True. Default: 'concated_graphs.bin'
- Returns:
The concatenated dataset. If save_file=True, the dataset is also saved to fname.
- Return type:
agat.dataset.Dataset
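In-memory concatenation amounts to appending the graphs and their properties in the same order. The sketch below shows that idea on plain lists. It is an illustration under stated assumptions, not the library's code: `concat_in_memory` is a hypothetical helper, and each dataset is represented as a (graph_list, props) pair of Python lists.

```python
def concat_in_memory(*datasets):
    """Concatenate (graph_list, props) pairs in the order given.

    Hypothetical helper for illustration; not part of AGAT.
    """
    graphs, props = [], []
    for graph_list, prop_list in datasets:
        # Graphs and properties must stay aligned by position.
        if len(graph_list) != len(prop_list):
            raise ValueError("each dataset needs one property entry per graph")
        graphs.extend(graph_list)
        props.extend(prop_list)
    return graphs, props


first = (["g0", "g1"], [0.0, 0.1])
second = (["g2"], [0.2])
merged_graphs, merged_props = concat_in_memory(first, second)
print(merged_graphs, merged_props)
```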
- select_graphs_random(fname: str, num: int)
Randomly select graphs from a binary file.
- Parameters:
fname (str) – input file name.
num (int) – number of selected graphs (should be smaller than the number of all graphs).
- Returns:
A new file is saved to the current directory: Selected_graphs.bin.
- Return type:
None. A new file.
Example:
select_graphs_random('graphs1.bin', 10)
- select_graphs_from_dataset_random(dataset, num: int, save_file=False, fname='selected_graphs.bin')
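Randomly select graphs from a dataset in RAM.
- Parameters:
dataset (agat.data.Dataset) – the input dataset.
num (int) – number of selected graphs (should be smaller than the number of all graphs).
save_file (bool) – save to a new file or not. Default: False
fname (str) – The saved file name if save_file=True. Default: 'selected_graphs.bin'
- Returns:
The selected graphs as a new dataset. If save_file=True, the dataset is also saved to fname.
- Return type:
agat.data.Dataset
Example:
dd = select_graphs_from_dataset_random(dataset, 10, save_file=False, fname=None)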
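Random selection without replacement can be sketched with the standard library. This is a minimal illustration, not how AGAT implements it: `select_random` and its `seed` parameter are hypothetical, and plain lists stand in for graphs and properties. It draws `num` distinct indices and uses them to pick graphs together with their matching properties.

```python
import random


def select_random(graph_list, props, num, seed=None):
    """Pick `num` distinct graphs and their properties.

    Hypothetical helper for illustration; not part of AGAT.
    """
    if num > len(graph_list):
        raise ValueError("num must not exceed the number of graphs")
    rng = random.Random(seed)  # a local generator keeps the choice reproducible
    # Sample indices rather than graphs so props can be picked with the same indices.
    picked = sorted(rng.sample(range(len(graph_list)), num))
    return [graph_list[i] for i in picked], [props[i] for i in picked]


graphs = [f"g{i}" for i in range(10)]
values = list(range(10))
chosen_graphs, chosen_values = select_random(graphs, values, 3, seed=0)
print(chosen_graphs, chosen_values)
```

Sampling indices instead of the graphs themselves keeps each selected graph paired with its own property entry.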