sebs.storage package

Submodules

sebs.storage.config module

Configuration classes for storage backends in the Serverless Benchmarking Suite.

All configuration classes support serialization/deserialization for caching and provide environment variable mappings for runtime configuration.

class sebs.storage.config.MinioConfig(address: str = '', mapped_port: int = -1, access_key: str = '', secret_key: str = '', instance_id: str = '', output_buckets: ~typing.List[str] = <factory>, input_buckets: ~typing.List[str] = <factory>, version: str = '', data_volume: str = '', type: str = 'minio', remove_containers: bool = False)[source]

Bases: PersistentStorageConfig

Configuration for MinIO object storage.

MinIO provides a local S3-compatible object storage service that runs in a Docker container. This configuration class stores all the necessary parameters for deploying and connecting to a MinIO instance.

address

Network address where MinIO is accessible (auto-detected)

Type:

str

mapped_port

Host port mapped to MinIO’s internal port 9000

Type:

int

access_key

Access key for MinIO authentication (auto-generated)

Type:

str

secret_key

Secret key for MinIO authentication (auto-generated)

Type:

str

instance_id

Docker container ID of the running MinIO instance

Type:

str

output_buckets

List of bucket names used for benchmark output

Type:

List[str]

input_buckets

List of bucket names used for benchmark input

Type:

List[str]

version

MinIO Docker image version to use

Type:

str

data_volume

Host directory path for persistent data storage

Type:

str

type

Storage type identifier (always “minio”)

Type:

str

access_key: str = ''
address: str = ''
data_volume: str = ''
static deserialize(data: Dict[str, Any]) MinioConfig[source]

Deserialize configuration from a dictionary.

Creates a new MinioConfig instance from dictionary data, typically loaded from cache or configuration files. Only known configuration fields are used; unknown fields are ignored.

Parameters:

data – Dictionary containing configuration data

Returns:

New configuration instance

Return type:

MinioConfig
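The filtering behavior described above can be sketched with a plain dataclass. This is a minimal illustration, not the SeBS implementation: `ExampleConfig` and the sample values are hypothetical, and only the standard `dataclasses.fields` helper is used to find the known field names.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class ExampleConfig:
    # Hypothetical stand-in for a storage configuration dataclass.
    address: str = ""
    mapped_port: int = -1
    input_buckets: List[str] = field(default_factory=list)

    @staticmethod
    def deserialize(data: Dict[str, Any]) -> "ExampleConfig":
        # Keep only keys that match declared dataclass fields;
        # anything else in the dictionary is silently dropped.
        known = {f.name for f in fields(ExampleConfig)}
        return ExampleConfig(**{k: v for k, v in data.items() if k in known})


cfg = ExampleConfig.deserialize(
    {"address": "172.17.0.2:9000", "mapped_port": 9011, "unexpected": True}
)
```

Fields missing from the dictionary fall back to their dataclass defaults, so a partially filled cache entry still produces a usable object.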

envs() Dict[str, str][source]

Generate environment variables for MinIO configuration.

Creates environment variables that can be used by benchmark functions to connect to the MinIO storage instance.

Returns:

Environment variables for MinIO connection

Return type:

Dict[str, str]

input_buckets: List[str]
instance_id: str = ''
mapped_port: int = -1
output_buckets: List[str]
remove_containers: bool = False
secret_key: str = ''
serialize() Dict[str, Any][source]

Serialize the configuration to a dictionary.

Returns:

All configuration fields as a dictionary

Return type:

Dict[str, Any]

type: str = 'minio'
update_cache(path: List[str], cache: Cache) None[source]

Update the cache with this configuration’s values.

Stores all configuration fields in the cache using the specified path as a prefix. This allows the configuration to be restored later from the cache.

Parameters:
  • path – Cache key path prefix for this configuration

  • cache – Cache instance to store configuration in
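One way to picture the path-prefix behavior is a nested dictionary in which each configuration field is stored under the same list of leading keys. The sketch below assumes a hypothetical `DictCache` class with an `update(keys, value)` method; the real `Cache` interface may differ.

```python
from typing import Any, Dict, List


class DictCache:
    """Hypothetical in-memory cache addressed by a list of path components."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}

    def update(self, keys: List[str], value: Any) -> None:
        # Walk (and create) nested dictionaries for all but the last key.
        node = self.data
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value


def update_cache(fields: Dict[str, Any], path: List[str], cache: DictCache) -> None:
    # Store every field under the shared path prefix.
    for name, value in fields.items():
        cache.update([*path, name], value)


cache = DictCache()
update_cache(
    {"address": "localhost:9011", "mapped_port": 9011},
    ["minio", "storage"],
    cache,
)
```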

version: str = ''
class sebs.storage.config.NoSQLStorageConfig[source]

Bases: ABC

Abstract base class for NoSQL database storage configuration.

This class defines the interface that all NoSQL storage configurations must implement. It provides serialization methods used for caching and configuration management.

This class will be overridden by specific implementations for different FaaS systems.

Subclasses must implement:
  • serialize(): Convert configuration to dictionary for caching

abstractmethod serialize() Dict[str, Any][source]

Serialize the configuration to a dictionary.

Returns:

Serialized configuration data suitable for JSON storage

Return type:

Dict[str, Any]

class sebs.storage.config.PersistentStorageConfig[source]

Bases: ABC

Abstract base class for persistent object storage configuration.

This class defines the interface that all object storage configurations must implement. It provides methods for serialization and environment variable generation that are used for caching and runtime configuration.

This is used by MinioStorage in different deployments.

Subclasses must implement:
  • serialize(): Convert configuration to dictionary for caching

  • envs(): Generate environment variables for benchmark runtime

abstractmethod envs() Dict[str, str][source]

Generate environment variables for the storage configuration.

Returns:

Environment variables to be set in benchmark runtime

Return type:

Dict[str, str]

abstractmethod serialize() Dict[str, Any][source]

Serialize the configuration to a dictionary.

Returns:

Serialized configuration data suitable for JSON storage

Return type:

Dict[str, Any]
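The two-method contract can be illustrated with a small abstract base class. The names `StorageConfigSketch`, `InMemoryConfig`, and `EXAMPLE_STORAGE_ADDRESS` are hypothetical and chosen only for this sketch; the environment variable names that SeBS actually emits are not shown in this reference.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class StorageConfigSketch(ABC):
    """Hypothetical mirror of the serialize()/envs() interface."""

    @abstractmethod
    def serialize(self) -> Dict[str, Any]:
        ...

    @abstractmethod
    def envs(self) -> Dict[str, str]:
        ...


class InMemoryConfig(StorageConfigSketch):
    def __init__(self, address: str) -> None:
        self.address = address

    def serialize(self) -> Dict[str, Any]:
        return {"address": self.address}

    def envs(self) -> Dict[str, str]:
        # Illustrative variable name, not the one used by SeBS.
        return {"EXAMPLE_STORAGE_ADDRESS": self.address}


config = InMemoryConfig("localhost:9011")

# The abstract base itself cannot be instantiated.
try:
    StorageConfigSketch()
    instantiated = True
except TypeError:
    instantiated = False
```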

class sebs.storage.config.ScyllaDBConfig(address: str = '', mapped_port: int = -1, alternator_port: int = 8000, access_key: str = 'None', secret_key: str = 'None', instance_id: str = '', region: str = 'None', cpus: int = -1, memory: int = -1, version: str = '', data_volume: str = '', remove_containers: bool = False)[source]

Bases: NoSQLStorageConfig

Configuration for ScyllaDB DynamoDB-compatible NoSQL storage.

ScyllaDB provides a high-performance NoSQL database with DynamoDB-compatible API through its Alternator interface. This configuration class stores all the necessary parameters for deploying and connecting to a ScyllaDB instance.

address

Network address where ScyllaDB is accessible (auto-detected)

Type:

str

mapped_port

Host port mapped to ScyllaDB’s Alternator port

Type:

int

alternator_port

Internal port for DynamoDB-compatible API (default: 8000)

Type:

int

access_key

Access key for DynamoDB API (placeholder value)

Type:

str

secret_key

Secret key for DynamoDB API (placeholder value)

Type:

str

instance_id

Docker container ID of the running ScyllaDB instance

Type:

str

region

AWS region placeholder (not used for local deployment)

Type:

str

cpus

Number of CPU cores allocated to ScyllaDB container

Type:

int

memory

Memory allocation in MB for ScyllaDB container

Type:

int

version

ScyllaDB Docker image version to use

Type:

str

data_volume

Host directory path for persistent data storage

Type:

str

access_key: str = 'None'
address: str = ''
alternator_port: int = 8000
cpus: int = -1
data_volume: str = ''
static deserialize(data: Dict[str, Any]) ScyllaDBConfig[source]

Deserialize configuration from a dictionary.

Creates a new ScyllaDBConfig instance from dictionary data, typically loaded from cache or configuration files. Only known configuration fields are used; unknown fields are ignored.

Parameters:

data – Dictionary containing configuration data

Returns:

New configuration instance

Return type:

ScyllaDBConfig

instance_id: str = ''
mapped_port: int = -1
memory: int = -1
region: str = 'None'
remove_containers: bool = False
secret_key: str = 'None'
serialize() Dict[str, Any][source]

Serialize the configuration to a dictionary.

Returns:

All configuration fields as a dictionary

Return type:

Dict[str, Any]

update_cache(path: List[str], cache: Cache) None[source]

Update the cache with this configuration’s values.

Stores all configuration fields in the cache using the specified path as a prefix. This allows the configuration to be restored later from the cache.

Parameters:
  • path – Cache key path prefix for this configuration

  • cache – Cache instance to store configuration in

version: str = ''

sebs.storage.minio module

Module for MinIO S3-compatible storage in the Serverless Benchmarking Suite.

MinIO runs in a Docker container and provides persistent storage for benchmark data and results. It is primarily used for local testing and on platforms that lack built-in object storage, such as OpenWhisk.

class sebs.storage.minio.Minio(docker_client: DockerClient, cache_client: Cache, resources: Resources, replace_existing: bool)[source]

Bases: PersistentStorage

This class manages a self-hosted MinIO storage instance running in a Docker container. It handles bucket creation, file uploads/downloads, and container lifecycle management.

config

MinIO configuration settings

connection

MinIO client connection

MINIO_REGION = 'us-east-1'
T = ~T
clean_bucket(bucket_name: str) None[source]

Remove all objects from a bucket.

Deletes all objects within the specified bucket but keeps the bucket itself. Logs any errors that occur during object deletion.

Parameters:

bucket_name – Name of the bucket to clean

property config: MinioConfig

Get the MinIO configuration.

Returns:

The configuration object

Return type:

MinioConfig

configure_connection() None[source]

Configure the connection to the MinIO container.

Determines the appropriate address to connect to the MinIO container based on the host platform. For Linux, it uses the container’s bridge IP address, while for Windows, macOS, or WSL it uses localhost with the mapped port.

Raises:

RuntimeError – If the MinIO container is not available or if the IP address cannot be detected
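The platform-dependent choice can be sketched as a pure function. This is an assumption-laden illustration: the helper names are hypothetical, WSL is detected here by looking for "microsoft" in the kernel release string (SeBS may use a different check), and port 9000 is the internal MinIO port mentioned in the configuration documentation.

```python
import platform


def is_linux_host(system: str, release: str) -> bool:
    # WSL reports "Linux", but its kernel release string mentions Microsoft.
    return system == "Linux" and "microsoft" not in release.lower()


def choose_address(system: str, release: str, container_ip: str, mapped_port: int) -> str:
    if is_linux_host(system, release):
        if not container_ip:
            raise RuntimeError("Could not detect the container IP address")
        # Native Linux: reach the container directly on its bridge IP.
        return f"{container_ip}:9000"
    # Windows, macOS, WSL: go through the port published on the host.
    return f"localhost:{mapped_port}"


# Address for the machine running this snippet (sample IP and port).
address = choose_address(platform.system(), platform.release(), "172.17.0.2", 9011)
```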

correct_name(name: str) str[source]

Format a bucket name to comply with MinIO naming requirements.

For MinIO, no name correction is needed (unlike some cloud providers that enforce additional restrictions).

Parameters:

name – Original bucket name

Returns:

Bucket name (unchanged for MinIO)

Return type:

str

static deployment_name() str[source]

Get the deployment platform name.

Returns:

Deployment name (‘minio’)

Return type:

str

static deserialize(cached_config: MinioConfig, cache_client: Cache, res: Resources) Minio[source]

Deserialize a MinIO instance from cached configuration.

Creates a new Minio instance from cached configuration data.

Parameters:
  • cached_config – Cached MinIO configuration

  • cache_client – Cache client

  • res – Resources configuration

Returns:

Deserialized Minio instance

Return type:

Minio

download(bucket_name: str, key: str, filepath: str) None[source]

Download an object from a bucket to a local file.

Parameters:
  • bucket_name – Name of the source bucket

  • key – Object key/path in the bucket

  • filepath – Local destination path

Raises:
  • RuntimeError – If the bucket does not exist

  • minio.error.ResponseError – If the download fails

exists_bucket(bucket_name: str) bool[source]

Check if a bucket exists.

Parameters:

bucket_name – Name of the bucket to check

Returns:

True if the bucket exists, False otherwise

Return type:

bool

get_connection() Minio[source]

Create a new MinIO client connection.

Creates a connection to the MinIO server using the configured address, credentials, and HTTP client settings.

Returns:

Configured MinIO client

Return type:

minio.Minio

list_bucket(bucket_name: str, prefix: str = '') List[str][source]

List all objects in a bucket with an optional prefix filter.

Parameters:
  • bucket_name – Name of the bucket to list

  • prefix – Optional prefix to filter objects

Returns:

List of object names in the bucket

Return type:

List[str]

Raises:

RuntimeError – If the bucket does not exist

list_buckets(bucket_name: str | None = None) List[str][source]

List all buckets, optionally filtered by name.

Parameters:

bucket_name – Optional filter for bucket names

Returns:

List of bucket names

Return type:

List[str]

remove_bucket(bucket: str) None[source]

Delete a bucket completely.

Removes the specified bucket from the MinIO storage. The bucket must be empty before it can be deleted.

Parameters:

bucket – Name of the bucket to remove

serialize() Dict[str, Any][source]

Serialize MinIO configuration to a dictionary.

Returns:

Serialized configuration data

Return type:

Dict[str, Any]

start() None[source]

Start a MinIO storage container.

Creates and runs a Docker container with MinIO, configuring it with random credentials and mounting a volume for persistent storage. The container runs in detached mode and is accessible via the configured port.

Raises:

RuntimeError – If starting the MinIO container fails

stop() None[source]

Stop the MinIO container.

Gracefully stops the running MinIO container if it exists. Logs an error if the container is not known.

static typename() str[source]

Get the qualified type name of this class.

Returns:

Full type name including deployment name

Return type:

str

upload(bucket_name: str, filepath: str, key: str) None[source]

Upload a file to a bucket.

Not implemented for this class. Use fput_object directly or uploader_func.

Raises:

NotImplementedError – This method is not implemented

uploader_func(path_idx: int, file: str, filepath: str) None[source]

Upload a file to the MinIO storage.

Uploads a file to the specified input prefix in the benchmarks bucket. This function is passed to benchmarks for uploading their input data.

Parameters:
  • path_idx – Index of the input prefix to use

  • file – Name of the file within the bucket

  • filepath – Local path to the file to upload

Raises:

minio.error.ResponseError – If the upload fails

sebs.storage.resources module

Resource management for self-hosted storage deployments in SeBS.

Its main responsibility is providing a consistent interface and cache behavior for self-hosted storage across the entire SeBS system.

Key Classes:
  • SelfHostedResources: Configuration management for self-hosted storage resources

  • SelfHostedSystemResources: System-level resource management and service provisioning

class sebs.storage.resources.SelfHostedResources(name: str, storage_cfg: PersistentStorageConfig | None = None, nosql_storage_cfg: NoSQLStorageConfig | None = None)[source]

Bases: Resources

Resource configuration for self-hosted storage deployments.

_object_storage

Configuration for object storage (MinIO)

_nosql_storage

Configuration for NoSQL storage (ScyllaDB)

property nosql_storage_config: NoSQLStorageConfig | None

Get the NoSQL storage configuration.

Returns:

NoSQL storage configuration or None

Return type:

Optional[NoSQLStorageConfig]

serialize() Dict[str, Any][source]

Serialize the resource configuration to a dictionary.

Returns:

Serialized configuration containing storage and/or nosql sections

Return type:

Dict[str, Any]

property storage_config: PersistentStorageConfig | None

Get the object storage configuration.

Returns:

Object storage configuration or None

Return type:

Optional[PersistentStorageConfig]

update_cache(cache: Cache) None[source]

Update the configuration cache with current resource settings.

Stores both object storage and NoSQL storage configurations in the cache for later retrieval.

Parameters:

cache – Cache instance to store configurations in

class sebs.storage.resources.SelfHostedSystemResources(name: str, config: Config, cache_client: Cache, docker_client: DockerClient, logger_handlers: LoggingHandlers)[source]

Bases: SystemResources

System-level resource management for self-hosted storage deployments.

_name

Name of the deployment

_logging_handlers

Logging configuration handlers

_storage

Active persistent storage instance (MinIO)

_nosql_storage

Active NoSQL storage instance (ScyllaDB)

get_nosql_storage() NoSQLStorage[source]

Get or create a NoSQL storage instance.

Creates a ScyllaDB storage instance if one doesn’t exist, or returns the existing instance. The storage is deserialized from a serialized config of an existing storage deployment.

Returns:

ScyllaDB storage instance

Return type:

NoSQLStorage

Raises:

RuntimeError – If NoSQL storage configuration is missing or unsupported

get_storage(replace_existing: bool | None = None) PersistentStorage[source]

Get or create a persistent storage instance.

Creates a MinIO storage instance if one doesn’t exist, or returns the existing instance. The storage is deserialized from a serialized config of an existing storage deployment.

Parameters:

replace_existing – Whether to replace existing buckets (optional)

Returns:

MinIO storage instance

Return type:

PersistentStorage

Raises:

RuntimeError – If storage configuration is missing or unsupported
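The get-or-create pattern with a missing-configuration check might look like the following sketch. `ResourcesSketch` is hypothetical, and copying a dictionary stands in for deserializing a real storage object from its configuration.

```python
from typing import Any, Dict, Optional


class ResourcesSketch:
    """Hypothetical holder that creates its storage object on first use."""

    def __init__(self, storage_config: Optional[Dict[str, Any]]) -> None:
        self._storage_config = storage_config
        self._storage: Optional[Dict[str, Any]] = None
        self.created = 0

    def get_storage(self) -> Dict[str, Any]:
        if self._storage is None:
            if self._storage_config is None:
                raise RuntimeError("Storage configuration is missing")
            # Stand-in for deserializing a storage instance from its config.
            self._storage = dict(self._storage_config)
            self.created += 1
        return self._storage


resources = ResourcesSketch({"type": "minio", "address": "localhost:9011"})
first = resources.get_storage()
second = resources.get_storage()  # returns the instance created above
```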

sebs.storage.scylladb module

ScyllaDB NoSQL storage implementation for the Serverless Benchmarking Suite.

This module implements NoSQL database storage using ScyllaDB, which provides a DynamoDB-compatible API through its Alternator interface. ScyllaDB runs locally in a Docker container for development and testing purposes, and the implementation accesses it through boto3.

class sebs.storage.scylladb.ScyllaDB(docker_client: DockerClient, cache_client: Cache, config: ScyllaDBConfig, resources: Resources | None = None)[source]

Bases: NoSQLStorage

ScyllaDB implementation for DynamoDB-compatible NoSQL storage.

This class manages a ScyllaDB instance running in a Docker container, providing DynamoDB-compatible NoSQL storage through ScyllaDB’s Alternator interface. It handles table creation, data operations, and container lifecycle management.

_docker_client

Docker client for container management

_storage_container

Docker container running ScyllaDB

_cfg

ScyllaDB configuration settings

_tables

Mapping of benchmark names to table mappings

_serializer

DynamoDB type serializer for data conversion

client

Boto3 DynamoDB client configured for ScyllaDB

SCYLLADB_REGION = 'None'
T = ~T
clear_table(name: str) str[source]

Clear all data from a table.

Parameters:

name – Name of the table to clear

Returns:

Table name

Return type:

str

Raises:

NotImplementedError – This method is not yet implemented

property config: ScyllaDBConfig

Get the ScyllaDB configuration.

Returns:

The configuration object

Return type:

ScyllaDBConfig

configure_connection() None[source]

Configure the connection to the ScyllaDB container.

Determines the appropriate address to connect to the ScyllaDB container based on the host platform. For Linux, it uses the container’s IP address, while for Windows, macOS, or WSL it uses localhost with the mapped port.

Creates a boto3 DynamoDB client configured to connect to ScyllaDB’s Alternator interface.

Raises:

RuntimeError – If the ScyllaDB container is not available or if the IP address cannot be detected

create_table(benchmark: str, name: str, primary_key: str, secondary_key: str | None = None) str[source]

Create a DynamoDB table in ScyllaDB.

Creates a new DynamoDB table with the specified primary key and optional secondary key. The table name is constructed to be unique across benchmarks and resource groups.

Note: Unlike cloud providers with hierarchical database structures, ScyllaDB requires unique table names at the cluster level.

Note: PAY_PER_REQUEST billing mode has no effect here.

Parameters:
  • benchmark – Name of the benchmark

  • name – Logical table name

  • primary_key – Name of the primary key attribute

  • secondary_key – Optional name of the secondary key attribute

Returns:

The actual table name that was created

Return type:

str

Raises:

RuntimeError – If table creation fails for unknown reasons
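Because table names must be unique at the cluster level, a logical name has to be expanded into a physical one and the mapping remembered per benchmark. The sketch below uses a hypothetical naming scheme and hypothetical helper names; the exact format SeBS uses is not specified in this reference.

```python
from typing import Dict

# Mapping: benchmark name -> {logical table name -> physical table name}.
tables: Dict[str, Dict[str, str]] = {}


def physical_table_name(resources_id: str, benchmark: str, name: str) -> str:
    # Hypothetical scheme: combine resource group, benchmark, and logical name.
    return f"sebs-benchmarks-{resources_id}-{benchmark}-{name}"


def register_table(resources_id: str, benchmark: str, name: str) -> str:
    actual = physical_table_name(resources_id, benchmark, name)
    tables.setdefault(benchmark, {})[name] = actual
    return actual


created = register_table("abc123", "130.crud-api", "shopping_cart")
```

A lookup such as `tables["130.crud-api"]` then plays the role of the per-benchmark mapping from logical to actual table names.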

static deployment_name() str[source]

Get the deployment platform name.

Returns:

Deployment name (‘scylladb’)

Return type:

str

static deserialize(cached_config: ScyllaDBConfig, cache_client: Cache, resources: Resources) ScyllaDB[source]

Deserialize a ScyllaDB instance from cached configuration.

Creates a new ScyllaDB instance from cached configuration data.

Parameters:
  • cached_config – Cached ScyllaDB configuration

  • cache_client – Cache client

  • resources – Resources configuration

Returns:

Deserialized ScyllaDB instance

Return type:

ScyllaDB

envs() Dict[str, str][source]

Generate environment variables for ScyllaDB configuration.

Creates environment variables that can be used by benchmark functions to connect to the ScyllaDB storage instance.

Returns:

Environment variables for ScyllaDB connection

Return type:

Dict[str, str]

get_tables(benchmark: str) Dict[str, str][source]

Get the table name mappings for a benchmark.

Parameters:

benchmark – Name of the benchmark

Returns:

Mapping from original table names to actual table names

Return type:

Dict[str, str]

remove_table(name: str) str[source]

Remove a table completely.

Parameters:

name – Name of the table to remove

Returns:

Table name

Return type:

str

Raises:

NotImplementedError – This method is not yet implemented

retrieve_cache(benchmark: str) bool[source]

Retrieve cached table configuration for a benchmark.

Checks if table configuration for the given benchmark is already loaded in memory, and if not, attempts to load it from the cache.

Parameters:

benchmark – Name of the benchmark

Returns:

True if table configuration was found, False otherwise

Return type:

bool
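The two-level lookup (memory first, then the persistent cache) can be sketched as follows. `TableCacheSketch` and its dictionary-backed cache are hypothetical stand-ins for the real cache client.

```python
from typing import Dict


class TableCacheSketch:
    def __init__(self, persisted: Dict[str, Dict[str, str]]) -> None:
        self._persisted = persisted  # stand-in for the persistent cache
        self._tables: Dict[str, Dict[str, str]] = {}  # in-memory mappings

    def retrieve_cache(self, benchmark: str) -> bool:
        # Already loaded in memory: nothing to do.
        if benchmark in self._tables:
            return True
        # Otherwise try the persistent cache and keep the result in memory.
        cached = self._persisted.get(benchmark)
        if cached is not None:
            self._tables[benchmark] = cached
            return True
        return False


store = TableCacheSketch({"130.crud-api": {"cart": "physical-cart-table"}})
found = store.retrieve_cache("130.crud-api")
missing = store.retrieve_cache("unknown-benchmark")
```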

serialize() Tuple[NoSQLStorage, Dict[str, Any]][source]

Serialize ScyllaDB configuration to a tuple.

Returns:

Storage type and serialized configuration

Return type:

Tuple[StorageType, Dict[str, Any]]

start() None[source]

Start a ScyllaDB storage container.

Creates and runs a Docker container with ScyllaDB, configuring it with the specified CPU and memory resources. The container runs in detached mode and exposes the Alternator DynamoDB-compatible API on the configured port.

The method waits for ScyllaDB to fully initialize by checking the nodetool status until the service is ready.

Raises:

RuntimeError – If starting the ScyllaDB container fails or if ScyllaDB fails to initialize within the timeout period
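Waiting for a service to initialize is typically a bounded polling loop. The sketch below is generic: `wait_until_ready`, its parameters, and the fake probe are hypothetical, and a real probe would run the status check inside the container instead.

```python
import time
from typing import Callable


def wait_until_ready(is_ready: Callable[[], bool], attempts: int, delay_s: float) -> int:
    """Poll is_ready up to `attempts` times; return the number of polls used."""
    for attempt in range(1, attempts + 1):
        if is_ready():
            return attempt
        time.sleep(delay_s)
    raise RuntimeError(f"Service not ready after {attempts} attempts")


# Fake probe that reports readiness on the third poll.
responses = iter([False, False, True])
polls = wait_until_ready(lambda: next(responses), attempts=5, delay_s=0.0)
```

Bounding the number of attempts is what turns a hung container into a `RuntimeError` instead of an indefinite wait.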

stop() None[source]

Stop the ScyllaDB container.

Gracefully stops the running ScyllaDB container if it exists.

static typename() str[source]

Get the qualified type name of this class.

Returns:

Full type name including deployment name

Return type:

str

update_cache(benchmark: str) None[source]

Update the cache with table configuration for a benchmark.

Stores the table configuration for the specified benchmark in the cache for future retrieval.

Parameters:

benchmark – Name of the benchmark

write_to_table(benchmark: str, table: str, data: Dict[str, Any], primary_key: Tuple[str, str], secondary_key: Tuple[str, str] | None = None) None[source]

Write data to a DynamoDB table in ScyllaDB.

Serializes the data using DynamoDB type serialization and writes it to the specified table with the provided primary and optional secondary keys.

Parameters:
  • benchmark – Name of the benchmark

  • table – Logical table name

  • data – Data to write to the table

  • primary_key – Tuple of (key_name, key_value) for the primary key

  • secondary_key – Optional tuple of (key_name, key_value) for the secondary key

Raises:

AssertionError – If the table name is not found
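DynamoDB type serialization wraps each value in a one-key dictionary naming its type, for example `{"S": "text"}` or `{"N": "3"}`. The sketch below is a simplified standard-library version with hypothetical helper names; boto3 ships `boto3.dynamodb.types.TypeSerializer` for this purpose, and the actual implementation may differ in how it merges keys into the item.

```python
from typing import Any, Dict, Optional, Tuple


def to_attribute_value(value: Any) -> Dict[str, Any]:
    # bool is tested before int because bool is a subclass of int.
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):
        return {"BOOL": value}
    if isinstance(value, int):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"Unsupported type: {type(value).__name__}")


def build_item(
    data: Dict[str, Any],
    primary_key: Tuple[str, str],
    secondary_key: Optional[Tuple[str, str]] = None,
) -> Dict[str, Any]:
    # Serialize the payload, then add the (name, value) key pairs.
    item = {k: to_attribute_value(v) for k, v in data.items()}
    item[primary_key[0]] = to_attribute_value(primary_key[1])
    if secondary_key is not None:
        item[secondary_key[0]] = to_attribute_value(secondary_key[1])
    return item


item = build_item({"count": 3, "tags": ["a"]}, ("id", "run-1"), ("ts", "0001"))
```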

Module contents

This module provides storage abstractions and implementations for SeBS, supporting both object storage (S3-compatible) and NoSQL database storage.

It includes:
  • Configuration classes for different storage backends

  • MinIO implementation for local S3-compatible object storage

  • ScyllaDB implementation for local DynamoDB-compatible NoSQL storage

  • Resource management classes for self-hosted storage deployments

The storage module enables benchmarks to work with persistent data storage across different deployment environments while maintaining consistent interfaces. Thus, we can seamlessly port benchmarks between clouds and open-source serverless platforms.

Key Components:
  • config: Configuration dataclasses for storage backends

  • minio: MinIO-based object storage implementation

  • scylladb: ScyllaDB-based NoSQL storage implementation

  • resources: Resource management for self-hosted storage deployments

Example

To use MinIO object storage in a benchmark:

```python
from sebs.storage.minio import Minio
from sebs.storage.config import MinioConfig

# Configure and start MinIO.
# docker_client, cache_client, and resources are assumed to exist already.
config = MinioConfig(mapped_port=9000, version="latest")
storage = Minio(docker_client, cache_client, resources, False)
storage.config = config
storage.start()
```