dataset

Hint

The objects on this page help with manipulating dataset(s).

Examples of manipulating dataset(s):

from agat.data import Dataset, concat_dataset
from agat.data import select_graphs_from_dataset_random
dataset = Dataset('graphs.bin')
da = dataset[11]
db = dataset[1:4]
dc = concat_dataset(da, db)
dd = select_graphs_from_dataset_random(dataset, 10, save_file=False,
                                       fname=None)
print(dataset, da, db, dc, dd)

# Select again, this time from the already selected subset.
de = select_graphs_from_dataset_random(dd, 3, save_file=False, fname=None)
print(de)

from agat.data import save_dataset
save_dataset(dd, fname='new_dataset.bin')
class Dataset(torch.utils.data.Dataset)

This object is used to build a list of graphs.

Load the binary graphs.

Example:

import os
from agat.data import Dataset
dataset = Dataset(os.path.join('dataset', 'all_graphs.bin'))

# you can index or slice the dataset.
g0, props0 = dataset[0]
g_batch, props = dataset[0:100] # the g_batch is a batch collection of graphs. See https://docs.dgl.ai/en/1.1.x/generated/dgl.batch.html
__init__(self, dataset_path=None, from_file=True, graph_list=None, props=None)

Tip

You can load a dataset from file or from RAM.
From file: set dataset_path='string_example' and from_file=True.
From RAM: set from_file=False, and provide graph_list and props.

Parameters:
  • dataset_path (str) – Path to the binary DGL graph file.

  • from_file (bool) – Load from file or not.

  • graph_list (list) – A list of graphs.

  • props (list) – Properties tensor corresponding to a list of graphs.

Returns:

a graph dataset.

Return type:

list

__getitem__(self, index)

Index or slice the dataset.

Parameters:

index (int/slice) – List index or slice.

Returns:

Dataset

Return type:

agat.data.Dataset

__repr__(self)

The output shown when you print() a dataset.

__len__(self)

The output returned when you call len() on a dataset.
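The indexing, print() and len() behaviour described above follows the standard Python container protocol. The sketch below illustrates that protocol with a hypothetical ToyDataset class, in which plain strings and floats stand in for DGL graphs and property tensors. It is not AGAT's implementation. Returning a (graph, properties) pair for an integer index is an assumption that mirrors the usage example in the class description.

```python
class ToyDataset:
    """Hypothetical stand-in for a graph dataset; not AGAT's Dataset."""

    def __init__(self, graph_list, props):
        if len(graph_list) != len(props):
            raise ValueError("graph_list and props must have the same length")
        self.graph_list = list(graph_list)
        self.props = list(props)

    def __getitem__(self, index):
        # A slice yields a new, smaller dataset.
        if isinstance(index, slice):
            return ToyDataset(self.graph_list[index], self.props[index])
        # An integer yields one (graph, properties) pair (assumed behaviour).
        return self.graph_list[index], self.props[index]

    def __len__(self):
        # Number of graphs held in RAM.
        return len(self.graph_list)

    def __repr__(self):
        return f"ToyDataset(num_graphs={len(self)})"


toy = ToyDataset(["g0", "g1", "g2", "g3"], [0.1, 0.2, 0.3, 0.4])
subset = toy[1:3]       # slicing keeps the dataset type
graph, prop = toy[0]    # integer indexing unpacks one sample
print(toy, subset, len(subset))
```

Keeping slices as dataset objects means a subset can be passed on to anything that accepts a dataset, such as a concatenation or save helper.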

save(self, file='graphs.bin')

Save the dataset in RAM to the disk.

Parameters:

file (str) – The output file name.

Returns:

None. A file will be saved to the disk.

class Collater(object)

The collate function used in torch.utils.data.DataLoader: https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader

The collate function determines how to merge the batch data.

Example:

import os
from agat.data import Dataset, Collater
from torch.utils.data import DataLoader

dataset = Dataset(os.path.join('dataset', 'all_graphs.bin'))
collate_fn = Collater(device='cuda')
data_loader = DataLoader(dataset, batch_size=64, shuffle=True, collate_fn=collate_fn)
__init__(self, device='cuda')
Parameters:

device (str) – The device used when manipulating Dataset(s), for example 'cuda'.

__call__(self, data)

Collate the data into batches.

Parameters:

data (AGAT Dataset) – the output of Dataset

Returns:

AGAT Dataset with dgl batch graphs. See https://docs.dgl.ai/en/1.1.x/generated/dgl.batch.html

Return type:

AGAT Dataset
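To make the idea of a collate function concrete, here is a minimal sketch of a callable that merges a list of samples into one batch. ToyCollater is a hypothetical name. Two things are assumptions for illustration and not AGAT's implementation: that each sample is a (graph, props) pair, and that props is a dict. A real collater would typically batch the graphs with dgl.batch and move tensors to the chosen device.

```python
class ToyCollater:
    """Hypothetical collate callable; not AGAT's Collater."""

    def __init__(self, device="cpu"):
        # Only stored here; a real collater would move data to this device.
        self.device = device

    def __call__(self, data):
        # `data` is the list of samples drawn by the loader for one batch.
        graphs = [graph for graph, _ in data]
        keys = data[0][1].keys()
        # Merge the per-sample property dicts into one dict of lists.
        props = {key: [sample_props[key] for _, sample_props in data]
                 for key in keys}
        return graphs, props


samples = [("g0", {"energy": -1.0}), ("g1", {"energy": -2.5})]
collate_fn = ToyCollater()
batch_graphs, batch_props = collate_fn(samples)
print(batch_graphs, batch_props)
```

Because the object only needs to be callable on a list of samples, it can be passed directly as the collate_fn argument of torch.utils.data.DataLoader.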

concat_graphs(*list_of_bin)

Concatenate binary graph files.

Parameters:

*list_of_bin (strings) – input file names of binary graphs.

Returns:

A new file is saved to the current directory: concated_graphs.bin.

Return type:

None. A new file.

Example:

concat_graphs('graphs1.bin', 'graphs2.bin', 'graphs3.bin')
concat_dataset(*list_of_datasets, save_file=False, fname='concated_graphs.bin')

Concatenate agat.dataset.Dataset objects in RAM.

Parameters:
  • *list_of_datasets (agat.dataset.Dataset) – one or more agat.dataset.Dataset objects.

  • save_file (bool) – save to a new file or not. Default: False

  • fname (str) – The saved file name if save_file=True. Default: 'concated_graphs.bin'

Returns:

The concatenated dataset. If save_file=True, it is also saved to fname.

Return type:

agat.dataset.Dataset
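Conceptually, concatenating datasets in RAM means joining their graph lists and their property lists in the same order, so that each graph keeps its own properties. The sketch below shows this with a hypothetical helper, concat_toy, which works on plain (graph_list, props) tuples. It is an illustration under those assumptions, not AGAT's concat_dataset.

```python
def concat_toy(*datasets):
    """Join (graph_list, props) pairs in order; hypothetical helper."""
    graphs, props = [], []
    for graph_list, prop_list in datasets:
        if len(graph_list) != len(prop_list):
            raise ValueError("each dataset needs one property per graph")
        # Extend both lists together so graph i still matches property i.
        graphs.extend(graph_list)
        props.extend(prop_list)
    return graphs, props


a = (["g0", "g1"], [0.1, 0.2])
b = (["g2"], [0.3])
merged_graphs, merged_props = concat_toy(a, b)
print(merged_graphs, merged_props)
```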

select_graphs_random(fname: str, num: int)

Randomly select graphs from a binary file.

Parameters:
  • fname (str) – input file name.

  • num (int) – number of selected graphs (should be smaller than the number of all graphs).

Returns:

A new file is saved to the current directory: Selected_graphs.bin.

Return type:

None. A new file.

Example:

select_graphs_random('graphs1.bin', 10)
select_graphs_from_dataset_random(dataset, num: int, save_file=False, fname='selected_graphs.bin')

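Randomly select graphs from a dataset in RAM.

Parameters:
  • dataset (agat.dataset.Dataset) – input dataset in RAM.

  • num (int) – number of selected graphs (should be smaller than the number of all graphs).

  • save_file (bool) – save the selected graphs to a new file or not. Default: False

  • fname (str) – The saved file name if save_file=True. Default: 'selected_graphs.bin'

Returns:

The selected graphs as a new dataset. If save_file=True, they are also saved to fname.

Return type:

agat.dataset.Dataset

Example:

select_graphs_from_dataset_random(dataset, 10)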
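Selecting a fixed number of graphs at random amounts to sampling indices without replacement. The sketch below shows one way to do that with the standard library. The helper name select_random and its seed parameter are hypothetical, and the sketch makes no claim about how AGAT performs the selection.

```python
import random


def select_random(items, num, seed=None):
    """Pick `num` distinct items at random; hypothetical helper."""
    if num > len(items):
        raise ValueError("num must not exceed the number of items")
    rng = random.Random(seed)
    # Sample indices without replacement, then sort them so the
    # selected items keep their original relative order.
    indices = sorted(rng.sample(range(len(items)), num))
    return [items[i] for i in indices]


graphs = [f"g{i}" for i in range(20)]
picked = select_random(graphs, 5, seed=0)
print(len(picked))
```

Sampling indices, not the items themselves, makes it easy to apply the same selection to a parallel list of properties.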
save_dataset(dataset: Dataset, fname='graphs.bin')

Save an agat.dataset.Dataset to a binary file.

Parameters:
  • dataset (agat.dataset.Dataset) – AGAT dataset in RAM.

  • fname (str) – output file name.