fedlib.datasets.DatasetPartitioner

class DatasetPartitioner(num_clients: int, random_seed: int = 123, client_id_generator: Callable[[], Iterator] = None)[source]

Bases: ABC

An abstract base class for dataset splitting strategies that considers random states from both NumPy and PyTorch.

generate_subsets(dataset: Dataset) → Dict[str, Subset][source]

Generates subsets from a single dataset.

Parameters:

dataset – The dataset to be split.

Returns:

A dictionary with client IDs as keys and corresponding subsets as values.
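The signatures suggest that the dictionary could be built by pairing the IDs from generate_client_ids() with the list returned by split_dataset(). This is an assumption, not something the documentation confirms. A minimal sketch of that pairing follows; key_by_client is a hypothetical helper, and plain index lists stand in for Subset objects.

```python
from typing import Any, Dict, List, Sequence


def key_by_client(client_ids: Sequence[Any], subsets: Sequence[List[int]]) -> Dict[str, List[int]]:
    """Pair each client ID with one subset (hypothetical helper, not part of the library)."""
    if len(client_ids) != len(subsets):
        raise ValueError("expected exactly one subset per client ID")
    # Keys are converted to strings because the documented return type is Dict[str, Subset].
    return {str(cid): subset for cid, subset in zip(client_ids, subsets)}


mapping = key_by_client(["client_0", "client_1"], [[0, 2], [1, 3]])
```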

generate_paired_subsets(train_dataset: Union[Dataset, HuggingFaceDataset], test_dataset: Union[Dataset, HuggingFaceDataset]) → Dict[str, Tuple[Subset, Subset]][source]

Generates paired subsets from two datasets that may interact with each other.

Parameters:

train_dataset – The training dataset to be split.
test_dataset – The testing dataset to be split.

Returns:

A dictionary with client IDs as keys and tuples of corresponding training and testing subsets as values.

generate_client_datasets(train_dataset: Union[Dataset, HuggingFaceDataset], test_dataset: Union[Dataset, HuggingFaceDataset], **kwargs) → List[ClientDataset][source]

Generates client datasets from two datasets that may interact with each other.

Parameters:

train_dataset – The training dataset to be split.
test_dataset – The testing dataset to be split.

Returns:

A list of ClientDataset instances.

abstract split_dataset(dataset: Dataset) → List[Subset][source]

Split a single dataset into multiple subsets, each keyed by a unique client_id.

Parameters:

dataset (Dataset) – The dataset to be split.

Returns:

A dictionary where the key is a string client_id and the value is a Subset.

Return type:

Dict[str, Subset]
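A concrete strategy has to implement split_dataset. The sketch below shows one thing such an implementation might compute: a seeded, uniformly shuffled (IID) partition of sample indices. iid_split_indices is a hypothetical name, the standard-library random module stands in for the NumPy and PyTorch random states, and index lists stand in for Subset objects.

```python
import random
from typing import List


def iid_split_indices(num_samples: int, num_clients: int, random_seed: int = 123) -> List[List[int]]:
    """Shuffle sample indices with a fixed seed and deal them out round-robin."""
    rng = random.Random(random_seed)  # local RNG, so the global random state is untouched
    indices = list(range(num_samples))
    rng.shuffle(indices)
    # Client i receives every num_clients-th index, starting at position i.
    return [indices[i::num_clients] for i in range(num_clients)]


parts = iid_split_indices(num_samples=10, num_clients=3)
```

With PyTorch, each index list could be wrapped as torch.utils.data.Subset(dataset, indices) to obtain the subsets themselves.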

abstract split_datasets(train_dataset: Dataset, test_dataset: Dataset) → List[Tuple[Subset, Subset]][source]

Split two datasets (e.g., training and testing datasets) into multiple pairs of subsets, each keyed by a unique client_id.

Parameters:

train_dataset (Dataset) – The training dataset to be split.
test_dataset (Dataset) – The testing dataset to be split.

Returns:

A dictionary where the key is a string client_id and the value is a tuple of two Subsets (training and testing).

Return type:

Dict[str, Tuple[Subset, Subset]]
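As an illustration of a paired split, the sketch below partitions training and testing indices with a single seeded generator and pairs the shards per client. It assumes the simplest case, in which the two splits are independent. A strategy where the datasets interact (for example, matching each client's test labels to its training labels) would need more logic. paired_iid_split and shuffled_shards are hypothetical names, and index lists stand in for Subset objects.

```python
import random
from typing import List, Tuple


def shuffled_shards(num_samples: int, num_clients: int, rng: random.Random) -> List[List[int]]:
    """Shuffle indices with the given RNG and deal them out round-robin."""
    indices = list(range(num_samples))
    rng.shuffle(indices)
    return [indices[i::num_clients] for i in range(num_clients)]


def paired_iid_split(
    num_train: int, num_test: int, num_clients: int, random_seed: int = 123
) -> List[Tuple[List[int], List[int]]]:
    """Return one (train_indices, test_indices) pair per client."""
    rng = random.Random(random_seed)  # one RNG drives both splits for reproducibility
    train_shards = shuffled_shards(num_train, num_clients, rng)
    test_shards = shuffled_shards(num_test, num_clients, rng)
    return list(zip(train_shards, test_shards))


pairs = paired_iid_split(num_train=10, num_test=4, num_clients=2)
```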

generate_client_ids() → List[Any][source]

Generate a list of client IDs using the specified client ID generator.
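The constructor types client_id_generator as Callable[[], Iterator], that is, a zero-argument callable that returns an iterator. A minimal sketch of such a callable, and of drawing num_clients IDs from it, is shown below. sequential_ids and take_client_ids are hypothetical names, and how the library actually consumes the iterator is an assumption.

```python
import itertools
from typing import Any, Callable, Iterator, List


def sequential_ids() -> Iterator[str]:
    """Zero-argument factory that yields client_0, client_1, ... without end."""
    return (f"client_{i}" for i in itertools.count())


def take_client_ids(client_id_generator: Callable[[], Iterator], num_clients: int) -> List[Any]:
    """Draw the first num_clients IDs from a fresh iterator (hypothetical helper)."""
    return list(itertools.islice(client_id_generator(), num_clients))


ids = take_client_ids(sequential_ids, 3)
print(ids)  # ['client_0', 'client_1', 'client_2']
```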