sebs.storage package
Submodules
sebs.storage.config module
Configuration classes for storage backends in the Serverless Benchmarking Suite.
All configuration classes support serialization/deserialization for caching and provide environment variable mappings for runtime configuration.
- class sebs.storage.config.MinioConfig(address: str = '', mapped_port: int = -1, access_key: str = '', secret_key: str = '', instance_id: str = '', output_buckets: ~typing.List[str] = <factory>, input_buckets: ~typing.List[str] = <factory>, version: str = '', data_volume: str = '', type: str = 'minio', remove_containers: bool = False)[source]
Bases: PersistentStorageConfig
Configuration for MinIO object storage.
MinIO provides a local S3-compatible object storage service that runs in a Docker container. This configuration class stores all the necessary parameters for deploying and connecting to a MinIO instance.
- address
Network address where MinIO is accessible (auto-detected)
- Type:
str
- mapped_port
Host port mapped to MinIO’s internal port 9000
- Type:
int
- access_key
Access key for MinIO authentication (auto-generated)
- Type:
str
- secret_key
Secret key for MinIO authentication (auto-generated)
- Type:
str
- instance_id
Docker container ID of the running MinIO instance
- Type:
str
- output_buckets
List of bucket names used for benchmark output
- Type:
List[str]
- input_buckets
List of bucket names used for benchmark input
- Type:
List[str]
- version
MinIO Docker image version to use
- Type:
str
- data_volume
Host directory path for persistent data storage
- Type:
str
- type
Storage type identifier (always “minio”)
- Type:
str
- access_key: str = ''
- address: str = ''
- data_volume: str = ''
- static deserialize(data: Dict[str, Any]) MinioConfig[source]
Deserialize configuration from a dictionary.
Creates a new MinioConfig instance from dictionary data, typically loaded from cache or configuration files. Only known configuration fields are used; unknown fields are ignored.
- Parameters:
data – Dictionary containing configuration data
- Returns:
New configuration instance
- Return type:
MinioConfig
- envs() Dict[str, str][source]
Generate environment variables for MinIO configuration.
Creates environment variables that can be used by benchmark functions to connect to the MinIO storage instance.
- Returns:
Environment variables for MinIO connection
- Return type:
Dict[str, str]
- input_buckets: List[str]
- instance_id: str = ''
- mapped_port: int = -1
- output_buckets: List[str]
- remove_containers: bool = False
- secret_key: str = ''
- serialize() Dict[str, Any][source]
Serialize the configuration to a dictionary.
- Returns:
All configuration fields as a dictionary
- Return type:
Dict[str, Any]
- type: str = 'minio'
- update_cache(path: List[str], cache: Cache) None[source]
Update the cache with this configuration’s values.
Stores all configuration fields in the cache using the specified path as a prefix. This allows the configuration to be restored later from the cache.
- Parameters:
path – Cache key path prefix for this configuration
cache – Cache instance to store configuration in
- version: str = ''
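The serialize()/deserialize() round trip, and the rule that unknown keys are ignored, can be sketched with standard dataclasses. This is a minimal, self-contained stand-in, not the SeBS class: MinioConfigSketch is a hypothetical name, and the environment variable names returned by envs() are assumptions, so check the SeBS source for the exact keys.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class MinioConfigSketch:
    # Hypothetical stand-in mirroring the documented MinioConfig fields.
    address: str = ""
    mapped_port: int = -1
    access_key: str = ""
    secret_key: str = ""
    instance_id: str = ""
    output_buckets: List[str] = field(default_factory=list)
    input_buckets: List[str] = field(default_factory=list)
    version: str = ""
    data_volume: str = ""
    type: str = "minio"
    remove_containers: bool = False

    def serialize(self) -> Dict[str, Any]:
        # Every dataclass field becomes a dictionary entry.
        return asdict(self)

    @staticmethod
    def deserialize(data: Dict[str, Any]) -> "MinioConfigSketch":
        # Keep only keys that match a declared field; silently drop the rest.
        known = {f.name for f in fields(MinioConfigSketch)}
        return MinioConfigSketch(**{k: v for k, v in data.items() if k in known})

    def envs(self) -> Dict[str, str]:
        # Assumed variable names; the real SeBS keys may differ.
        return {
            "MINIO_ADDRESS": self.address,
            "MINIO_ACCESS_KEY": self.access_key,
            "MINIO_SECRET_KEY": self.secret_key,
        }


cfg = MinioConfigSketch.deserialize(
    {"address": "172.17.0.2:9000", "mapped_port": 9011, "unknown": 1}
)
restored = MinioConfigSketch.deserialize(cfg.serialize())
```

Because deserialize() filters on the declared fields, cache entries written by an older or newer version with extra keys still load without a TypeError.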
- class sebs.storage.config.NoSQLStorageConfig[source]
Bases: ABC
Abstract base class for NoSQL database storage configuration.
This class defines the interface that all NoSQL storage configurations must implement. It provides serialization methods used for caching and configuration management.
This class is extended by specific implementations for different FaaS systems.
- Subclasses must implement:
serialize(): Convert configuration to dictionary for caching
- abstractmethod serialize() Dict[str, Any][source]
Serialize the configuration to a dictionary.
- Returns:
Serialized configuration data suitable for JSON storage
- Return type:
Dict[str, Any]
- class sebs.storage.config.PersistentStorageConfig[source]
Bases: ABC
Abstract base class for persistent object storage configuration.
This class defines the interface that all object storage configurations must implement. It provides methods for serialization and environment variable generation that are used for caching and runtime configuration.
This is used by MinioStorage in different deployments.
- Subclasses must implement:
serialize(): Convert configuration to dictionary for caching
envs(): Generate environment variables for benchmark runtime
- abstractmethod envs() Dict[str, str][source]
Generate environment variables for the storage configuration.
- Returns:
Environment variables to be set in benchmark runtime
- Return type:
Dict[str, str]
- abstractmethod serialize() Dict[str, Any][source]
Serialize the configuration to a dictionary.
- Returns:
Serialized configuration data suitable for JSON storage
- Return type:
Dict[str, Any]
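The subclass contract can be illustrated with Python's abc module. In this hypothetical sketch, LocalDirConfig is an invented backend used only to show that a subclass must implement both serialize() and envs() before it can be instantiated; NoSQLStorageConfig follows the same pattern with serialize() alone.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class PersistentStorageConfigSketch(ABC):
    # Hypothetical mirror of the documented abstract interface.
    @abstractmethod
    def serialize(self) -> Dict[str, Any]: ...

    @abstractmethod
    def envs(self) -> Dict[str, str]: ...


class IncompleteConfig(PersistentStorageConfigSketch):
    # Implements serialize() but forgets envs().
    def serialize(self) -> Dict[str, Any]:
        return {}


class LocalDirConfig(PersistentStorageConfigSketch):
    # Hypothetical backend used only for illustration.
    def __init__(self, root: str) -> None:
        self.root = root

    def serialize(self) -> Dict[str, Any]:
        return {"type": "localdir", "root": self.root}

    def envs(self) -> Dict[str, str]:
        return {"LOCALDIR_ROOT": self.root}


# Instantiating a subclass that misses an abstract method raises TypeError.
try:
    IncompleteConfig()
    incomplete_rejected = False
except TypeError:
    incomplete_rejected = True
```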
- class sebs.storage.config.ScyllaDBConfig(address: str = '', mapped_port: int = -1, alternator_port: int = 8000, access_key: str = 'None', secret_key: str = 'None', instance_id: str = '', region: str = 'None', cpus: int = -1, memory: int = -1, version: str = '', data_volume: str = '', remove_containers: bool = False)[source]
Bases: NoSQLStorageConfig
Configuration for ScyllaDB DynamoDB-compatible NoSQL storage.
ScyllaDB provides a high-performance NoSQL database with DynamoDB-compatible API through its Alternator interface. This configuration class stores all the necessary parameters for deploying and connecting to a ScyllaDB instance.
- address
Network address where ScyllaDB is accessible (auto-detected)
- Type:
str
- mapped_port
Host port mapped to ScyllaDB’s Alternator port
- Type:
int
- alternator_port
Internal port for DynamoDB-compatible API (default: 8000)
- Type:
int
- access_key
Access key for DynamoDB API (placeholder value)
- Type:
str
- secret_key
Secret key for DynamoDB API (placeholder value)
- Type:
str
- instance_id
Docker container ID of the running ScyllaDB instance
- Type:
str
- region
AWS region placeholder (not used for local deployment)
- Type:
str
- cpus
Number of CPU cores allocated to ScyllaDB container
- Type:
int
- memory
Memory allocation in MB for ScyllaDB container
- Type:
int
- version
ScyllaDB Docker image version to use
- Type:
str
- data_volume
Host directory path for persistent data storage
- Type:
str
- access_key: str = 'None'
- address: str = ''
- alternator_port: int = 8000
- cpus: int = -1
- data_volume: str = ''
- static deserialize(data: Dict[str, Any]) ScyllaDBConfig[source]
Deserialize configuration from a dictionary.
Creates a new ScyllaDBConfig instance from dictionary data, typically loaded from cache or configuration files. Only known configuration fields are used; unknown fields are ignored.
- Parameters:
data – Dictionary containing configuration data
- Returns:
New configuration instance
- Return type:
ScyllaDBConfig
- instance_id: str = ''
- mapped_port: int = -1
- memory: int = -1
- region: str = 'None'
- remove_containers: bool = False
- secret_key: str = 'None'
- serialize() Dict[str, Any][source]
Serialize the configuration to a dictionary.
- Returns:
All configuration fields as a dictionary
- Return type:
Dict[str, Any]
- update_cache(path: List[str], cache: Cache) None[source]
Update the cache with this configuration’s values.
Stores all configuration fields in the cache using the specified path as a prefix. This allows the configuration to be restored later from the cache.
- Parameters:
path – Cache key path prefix for this configuration
cache – Cache instance to store configuration in
- version: str = ''
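Storing fields "using the specified path as a prefix" can be pictured with a nested dictionary. DictCache, its update_config() method, and the example path are hypothetical stand-ins for the SeBS Cache, whose real API may differ; the sketch only shows how each field ends up under path + [field_name].

```python
from typing import Any, Dict, List


class DictCache:
    # Hypothetical in-memory cache; the real SeBS Cache API may differ.
    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}

    def update_config(self, val: Any, keys: List[str]) -> None:
        # Walk (and create) nested dictionaries along keys[:-1], then set the leaf.
        node = self.data
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = val


def update_cache(config: Dict[str, Any], path: List[str], cache: DictCache) -> None:
    # Store each configuration field under the path prefix, one entry per field.
    for name, value in config.items():
        cache.update_config(val=value, keys=[*path, name])


cache = DictCache()
scylla_fields = {"address": "172.17.0.3:8000", "mapped_port": 9012, "alternator_port": 8000}
update_cache(scylla_fields, ["local", "resources", "nosql", "scylladb"], cache)
```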
sebs.storage.minio module
Module for MinIO S3-compatible storage in the Serverless Benchmarking Suite.
MinIO runs in a Docker container and provides persistent storage for benchmark data and results. It is primarily used for local testing and on serverless platforms without a native object storage service, e.g., OpenWhisk.
- class sebs.storage.minio.Minio(docker_client: DockerClient, cache_client: Cache, resources: Resources, replace_existing: bool)[source]
Bases: PersistentStorage
This class manages a self-hosted MinIO storage instance running in a Docker container. It handles bucket creation, file uploads/downloads, and container lifecycle management.
- config
MinIO configuration settings
- connection
MinIO client connection
- MINIO_REGION = 'us-east-1'
- T = ~T
- clean_bucket(bucket_name: str) None[source]
Remove all objects from a bucket.
Deletes all objects within the specified bucket but keeps the bucket itself. Logs any errors that occur during object deletion.
- Parameters:
bucket_name – Name of the bucket to clean
- property config: MinioConfig
Get the MinIO configuration.
- Returns:
The configuration object
- Return type:
MinioConfig
- configure_connection() None[source]
Configure the connection to the MinIO container.
Determines the appropriate address to connect to the MinIO container based on the host platform. For Linux, it uses the container’s bridge IP address, while for Windows, macOS, or WSL it uses localhost with the mapped port.
- Raises:
RuntimeError – If the MinIO container is not available or if the IP address cannot be detected
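The platform-dependent choice of address can be sketched as a pure function. The WSL check (looking for "microsoft" in the kernel release string) is a common heuristic and an assumption here, and resolve_minio_address is a hypothetical helper; in practice the bridge IP would be read from the container's Docker metadata (NetworkSettings.Networks.bridge.IPAddress).

```python
import platform


def is_wsl(release: str) -> bool:
    # Heuristic (assumption): WSL kernels report "microsoft" in their release string.
    return "microsoft" in release.lower()


def resolve_minio_address(system: str, release: str, bridge_ip: str, mapped_port: int) -> str:
    if system == "Linux" and not is_wsl(release):
        # Native Linux: the bridge network is reachable, so use MinIO's internal port.
        if not bridge_ip:
            raise RuntimeError("Could not detect the IP address of the MinIO container")
        return f"{bridge_ip}:9000"
    # Windows, macOS, WSL: go through the port published on the host.
    return f"localhost:{mapped_port}"


print(resolve_minio_address(platform.system(), platform.release(), "172.17.0.2", 9011))
```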
- correct_name(name: str) str[source]
Format a bucket name to comply with MinIO naming requirements.
For MinIO, no name correction is needed (unlike some cloud providers that enforce additional restrictions).
- Parameters:
name – Original bucket name
- Returns:
Bucket name (unchanged for MinIO)
- Return type:
str
- static deployment_name() str[source]
Get the deployment platform name.
- Returns:
Deployment name (‘minio’)
- Return type:
str
- static deserialize(cached_config: MinioConfig, cache_client: Cache, res: Resources) Minio[source]
Deserialize a MinIO instance from cached configuration.
Creates a new Minio instance from cached configuration data.
- Parameters:
cached_config – Cached MinIO configuration
cache_client – Cache client
res – Resources configuration
- Returns:
Deserialized Minio instance
- Return type:
Minio
- download(bucket_name: str, key: str, filepath: str) None[source]
Download an object from a bucket to a local file.
- Parameters:
bucket_name – Name of the source bucket
key – Object key/path in the bucket
filepath – Local destination path
- Raises:
RuntimeError – If the bucket does not exist
minio.error.ResponseError – If the download fails
- exists_bucket(bucket_name: str) bool[source]
Check if a bucket exists.
- Parameters:
bucket_name – Name of the bucket to check
- Returns:
True if the bucket exists, False otherwise
- Return type:
bool
- get_connection() Minio[source]
Create a new MinIO client connection.
Creates a connection to the MinIO server using the configured address, credentials, and HTTP client settings.
- Returns:
Configured MinIO client
- Return type:
minio.Minio
- list_bucket(bucket_name: str, prefix: str = '') List[str][source]
List all objects in a bucket with an optional prefix filter.
- Parameters:
bucket_name – Name of the bucket to list
prefix – Optional prefix to filter objects
- Returns:
List of object names in the bucket
- Return type:
List[str]
- Raises:
RuntimeError – If the bucket does not exist
- list_buckets(bucket_name: str | None = None) List[str][source]
List all buckets, optionally filtered by name.
- Parameters:
bucket_name – Optional filter for bucket names
- Returns:
List of bucket names
- Return type:
List[str]
- remove_bucket(bucket: str) None[source]
Delete a bucket completely.
Removes the specified bucket from the MinIO storage. The bucket must be empty before it can be deleted.
- Parameters:
bucket – Name of the bucket to remove
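Because remove_bucket() requires an empty bucket, deleting a bucket completely means cleaning it first. A minimal sketch, assuming a client with the minio-py methods list_objects(), remove_object(), and remove_bucket(); clean_and_remove_bucket is a hypothetical helper, not part of the Minio class.

```python
from typing import Any


def clean_and_remove_bucket(client: Any, bucket_name: str) -> int:
    """Delete every object in bucket_name, then the bucket itself.

    client is expected to behave like minio.Minio.
    Returns the number of objects removed.
    """
    removed = 0
    # recursive=True lists objects under all prefixes, not only the top level.
    for obj in client.list_objects(bucket_name, recursive=True):
        client.remove_object(bucket_name, obj.object_name)
        removed += 1
    # The bucket is empty now, so it can be deleted.
    client.remove_bucket(bucket_name)
    return removed
```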
- serialize() Dict[str, Any][source]
Serialize MinIO configuration to a dictionary.
- Returns:
Serialized configuration data
- Return type:
dict
- start() None[source]
Start a MinIO storage container.
Creates and runs a Docker container with MinIO, configuring it with random credentials and mounting a volume for persistent storage. The container runs in detached mode and is accessible via the configured port.
- Raises:
RuntimeError – If starting the MinIO container fails
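The container launch described for start() can be approximated by the keyword arguments one would pass to docker-py's containers.run(). This is a sketch under assumptions: the image tag format, the /data mount point, and the credential variable names (MINIO_ACCESS_KEY and MINIO_SECRET_KEY; recent MinIO releases use MINIO_ROOT_USER and MINIO_ROOT_PASSWORD) may differ from what SeBS actually uses, and minio_run_kwargs is a hypothetical helper.

```python
import secrets
from typing import Any, Dict


def minio_run_kwargs(version: str, mapped_port: int, data_volume: str, remove: bool) -> Dict[str, Any]:
    # Random credentials, regenerated for every fresh container.
    access_key = secrets.token_hex(16)
    secret_key = secrets.token_hex(32)
    return {
        "image": f"minio/minio:{version}",
        "command": "server /data",
        # Assumed variable names (see the lead-in).
        "environment": {"MINIO_ACCESS_KEY": access_key, "MINIO_SECRET_KEY": secret_key},
        # Publish the container's port 9000 on mapped_port of the host.
        "ports": {"9000/tcp": mapped_port},
        # Bind-mount the host directory so data survives container restarts.
        "volumes": {data_volume: {"bind": "/data", "mode": "rw"}},
        "detach": True,
        "remove": remove,
    }


kwargs = minio_run_kwargs("latest", 9011, "/tmp/minio-volume", remove=False)
# With docker-py installed: docker.from_env().containers.run(**kwargs)
```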
- stop() None[source]
Stop the MinIO container.
Gracefully stops the running MinIO container if it exists. Logs an error if the container is not known.
- static typename() str[source]
Get the qualified type name of this class.
- Returns:
Full type name including deployment name
- Return type:
str
- upload(bucket_name: str, filepath: str, key: str) None[source]
Upload a file to a bucket.
Not implemented for this class. Use the MinIO client’s fput_object() directly, or uploader_func().
- Raises:
NotImplementedError – This method is not implemented
- uploader_func(path_idx: int, file: str, filepath: str) None[source]
Upload a file to the MinIO storage.
Uploads a file to the specified input prefix in the benchmarks bucket. This function is passed to benchmarks for uploading their input data.
- Parameters:
path_idx – Index of the input prefix to use
file – Name of the file within the bucket
filepath – Local path to the file to upload
- Raises:
minio.error.ResponseError – If the upload fails
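The key that uploader_func() writes to combines the selected input prefix with the file name. A hypothetical sketch, where input_prefixes and upload_input stand in for the internal state and method; the client call uses minio-py's fput_object(bucket_name, object_name, file_path).

```python
import posixpath
from typing import Any, List


def upload_input(client: Any, bucket_name: str, input_prefixes: List[str],
                 path_idx: int, file: str, filepath: str) -> str:
    # Object keys always use "/" separators, so posixpath is used even on Windows.
    key = posixpath.join(input_prefixes[path_idx], file)
    # Stream the local file into the benchmarks bucket under that key.
    client.fput_object(bucket_name, key, filepath)
    return key
```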
sebs.storage.resources module
Resource management for self-hosted storage deployments in SeBS.
Its main responsibility is providing a consistent interface and caching behavior for self-hosted storage across the entire SeBS system.
- Key Classes:
SelfHostedResources: Configuration management for self-hosted storage resources
SelfHostedSystemResources: System-level resource management and service provisioning
- class sebs.storage.resources.SelfHostedResources(name: str, storage_cfg: PersistentStorageConfig | None = None, nosql_storage_cfg: NoSQLStorageConfig | None = None)[source]
Bases: Resources
Resource configuration for self-hosted storage deployments.
- _object_storage
Configuration for object storage (MinIO)
- _nosql_storage
Configuration for NoSQL storage (ScyllaDB)
- property nosql_storage_config: NoSQLStorageConfig | None
Get the NoSQL storage configuration.
- Returns:
NoSQL storage configuration or None
- Return type:
Optional[NoSQLStorageConfig]
- serialize() Dict[str, Any][source]
Serialize the resource configuration to a dictionary.
- Returns:
Serialized configuration containing storage and/or nosql sections
- Return type:
Dict[str, Any]
- property storage_config: PersistentStorageConfig | None
Get the object storage configuration.
- Returns:
Object storage configuration or None
- Return type:
Optional[PersistentStorageConfig]
- update_cache(cache: Cache) None[source]
Update the configuration cache with current resource settings.
Stores both object storage and NoSQL storage configurations in the cache for later retrieval.
- Parameters:
cache – Cache instance to store configurations in
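Serialization emits a section only for each configured backend. A minimal sketch, assuming the section keys are "storage" and "nosql" and that each backend configuration has already been serialized to a dictionary; serialize_resources is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def serialize_resources(storage: Optional[Dict[str, Any]],
                        nosql: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    out: Dict[str, Any] = {}
    # Only configured backends produce a section.
    if storage is not None:
        out["storage"] = storage
    if nosql is not None:
        out["nosql"] = nosql
    return out
```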
- class sebs.storage.resources.SelfHostedSystemResources(name: str, config: Config, cache_client: Cache, docker_client: DockerClient, logger_handlers: LoggingHandlers)[source]
Bases: SystemResources
System-level resource management for self-hosted storage deployments.
- _name
Name of the deployment
- _logging_handlers
Logging configuration handlers
- _storage
Active persistent storage instance (MinIO)
- _nosql_storage
Active NoSQL storage instance (ScyllaDB)
- get_nosql_storage() NoSQLStorage[source]
Get or create a NoSQL storage instance.
Creates a ScyllaDB storage instance if one doesn’t exist, or returns the existing instance. The storage is deserialized from a serialized config of an existing storage deployment.
- Returns:
ScyllaDB storage instance
- Return type:
NoSQLStorage
- Raises:
RuntimeError – If NoSQL storage configuration is missing or unsupported
- get_storage(replace_existing: bool | None = None) PersistentStorage[source]
Get or create a persistent storage instance.
Creates a MinIO storage instance if one doesn’t exist, or returns the existing instance. The storage is deserialized from a serialized config of an existing storage deployment.
- Parameters:
replace_existing – Whether to replace existing buckets (optional)
- Returns:
MinIO storage instance
- Return type:
PersistentStorage
- Raises:
RuntimeError – If storage configuration is missing or unsupported
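Both getters follow a get-or-create pattern: build the storage object once from its serialized configuration and return the same instance afterwards. LazyStorage and its factory mapping are hypothetical and only illustrate the caching and the two documented RuntimeError cases.

```python
from typing import Callable, Dict, Optional


class LazyStorage:
    """Hypothetical holder mirroring get_storage()/get_nosql_storage()."""

    def __init__(self, config: Optional[dict],
                 factories: Dict[str, Callable[[dict], object]]) -> None:
        self._config = config
        self._factories = factories
        self._instance: Optional[object] = None

    def get(self) -> object:
        if self._instance is None:
            if self._config is None:
                raise RuntimeError("Storage configuration is missing")
            kind = self._config.get("type")
            if kind not in self._factories:
                raise RuntimeError(f"Unsupported storage type: {kind}")
            # Build once from the serialized config, then reuse the instance.
            self._instance = self._factories[kind](self._config)
        return self._instance
```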
sebs.storage.scylladb module
ScyllaDB NoSQL storage implementation for the Serverless Benchmarking Suite.
This module implements NoSQL database storage using ScyllaDB, which provides a DynamoDB-compatible API through its Alternator interface. ScyllaDB runs in a Docker container, and the implementation accesses it through the boto3 DynamoDB client; it is intended for local development and testing.
- class sebs.storage.scylladb.ScyllaDB(docker_client: DockerClient, cache_client: Cache, config: ScyllaDBConfig, resources: Resources | None = None)[source]
Bases: NoSQLStorage
ScyllaDB implementation for DynamoDB-compatible NoSQL storage.
This class manages a ScyllaDB instance running in a Docker container, providing DynamoDB-compatible NoSQL storage through ScyllaDB’s Alternator interface. It handles table creation, data operations, and container lifecycle management.
- _docker_client
Docker client for container management
- _storage_container
Docker container running ScyllaDB
- _cfg
ScyllaDB configuration settings
- _tables
Mapping of benchmark names to table mappings
- _serializer
DynamoDB type serializer for data conversion
- client
Boto3 DynamoDB client configured for ScyllaDB
- SCYLLADB_REGION = 'None'
- T = ~T
- clear_table(name: str) str[source]
Clear all data from a table.
- Parameters:
name – Name of the table to clear
- Returns:
Table name
- Return type:
str
- Raises:
NotImplementedError – This method is not yet implemented
- property config: ScyllaDBConfig
Get the ScyllaDB configuration.
- Returns:
The configuration object
- Return type:
ScyllaDBConfig
- configure_connection() None[source]
Configure the connection to the ScyllaDB container.
Determines the appropriate address to connect to the ScyllaDB container based on the host platform. For Linux, it uses the container’s IP address, while for Windows, macOS, or WSL it uses localhost with the mapped port.
Creates a boto3 DynamoDB client configured to connect to ScyllaDB’s Alternator interface.
- Raises:
RuntimeError – If the ScyllaDB container is not available or if the IP address cannot be detected
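Since Alternator speaks the DynamoDB HTTP API, pointing boto3 at ScyllaDB is mostly a matter of overriding the endpoint. A sketch, assuming the address already includes the port (container IP with the Alternator port on Linux, localhost with the mapped port elsewhere) and that the placeholder credentials from ScyllaDBConfig are passed through; alternator_client_kwargs is a hypothetical helper.

```python
from typing import Dict


def alternator_client_kwargs(address: str, access_key: str,
                             secret_key: str, region: str) -> Dict[str, str]:
    return {
        "service_name": "dynamodb",
        # Alternator serves the DynamoDB API over plain HTTP at this address.
        "endpoint_url": f"http://{address}",
        "region_name": region,
        # Placeholder credentials; boto3 still needs some value to sign requests.
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }


kwargs = alternator_client_kwargs("172.17.0.3:8000", "None", "None", "None")
# With boto3 installed: client = boto3.client(**kwargs)
```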
- create_table(benchmark: str, name: str, primary_key: str, secondary_key: str | None = None) str[source]
Create a DynamoDB table in ScyllaDB.
Creates a new DynamoDB table with the specified primary key and optional secondary key. The table name is constructed to be unique across benchmarks and resource groups.
Note: Unlike cloud providers with hierarchical database structures, ScyllaDB requires unique table names at the cluster level.
Note: PAY_PER_REQUEST billing mode has no effect here.
- Parameters:
benchmark – Name of the benchmark
name – Logical table name
primary_key – Name of the primary key attribute
secondary_key – Optional name of the secondary key attribute
- Returns:
The actual table name that was created
- Return type:
str
- Raises:
RuntimeError – If table creation fails for unknown reasons
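The request that create_table() sends can be sketched as the keyword arguments of boto3's DynamoDB create_table(). The table naming scheme below (prefix, resource ID, benchmark, logical name) is an assumption chosen to satisfy the cluster-wide uniqueness requirement, both key attributes are assumed to be strings ("S"), and create_table_request is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional


def create_table_request(resources_id: str, benchmark: str, name: str,
                         primary_key: str, secondary_key: Optional[str] = None) -> Dict[str, Any]:
    # Hypothetical naming scheme: unique per resource group and benchmark,
    # because table names share a single cluster-wide namespace.
    table_name = f"sebs-benchmarks-{resources_id}-{benchmark}-{name}"
    key_schema: List[Dict[str, str]] = [{"AttributeName": primary_key, "KeyType": "HASH"}]
    attributes = [{"AttributeName": primary_key, "AttributeType": "S"}]
    if secondary_key is not None:
        # The optional secondary key becomes the sort (RANGE) key.
        key_schema.append({"AttributeName": secondary_key, "KeyType": "RANGE"})
        attributes.append({"AttributeName": secondary_key, "AttributeType": "S"})
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attributes,
        # Accepted by the API, but ScyllaDB ignores the billing mode.
        "BillingMode": "PAY_PER_REQUEST",
    }
```

With a configured client, the request would be sent as client.create_table(**request).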
- static deployment_name() str[source]
Get the deployment platform name.
- Returns:
Deployment name (‘scylladb’)
- Return type:
str
- static deserialize(cached_config: ScyllaDBConfig, cache_client: Cache, resources: Resources) ScyllaDB[source]
Deserialize a ScyllaDB instance from cached configuration.
Creates a new ScyllaDB instance from cached configuration data.
- Parameters:
cached_config – Cached ScyllaDB configuration
cache_client – Cache client
resources – Resources configuration
- Returns:
Deserialized ScyllaDB instance
- Return type:
ScyllaDB
- envs() Dict[str, str][source]
Generate environment variables for ScyllaDB configuration.
Creates environment variables that can be used by benchmark functions to connect to the ScyllaDB storage instance.
- Returns:
Environment variables for ScyllaDB connection
- Return type:
Dict[str, str]
- get_tables(benchmark: str) Dict[str, str][source]
Get the table name mappings for a benchmark.
- Parameters:
benchmark – Name of the benchmark
- Returns:
Mapping from original table names to actual table names
- Return type:
Dict[str, str]
- remove_table(name: str) str[source]
Remove a table completely.
- Parameters:
name – Name of the table to remove
- Returns:
Table name
- Return type:
str
- Raises:
NotImplementedError – This method is not yet implemented
- retrieve_cache(benchmark: str) bool[source]
Retrieve cached table configuration for a benchmark.
Checks if table configuration for the given benchmark is already loaded in memory, and if not, attempts to load it from the cache.
- Parameters:
benchmark – Name of the benchmark
- Returns:
True if table configuration was found, False otherwise
- Return type:
bool
- serialize() Tuple[NoSQLStorage, Dict[str, Any]][source]
Serialize ScyllaDB configuration to a tuple.
- Returns:
Storage type and serialized configuration
- Return type:
Tuple[StorageType, Dict[str, Any]]
- start() None[source]
Start a ScyllaDB storage container.
Creates and runs a Docker container with ScyllaDB, configuring it with the specified CPU and memory resources. The container runs in detached mode and exposes the Alternator DynamoDB-compatible API on the configured port.
The method waits for ScyllaDB to fully initialize by checking the nodetool status until the service is ready.
- Raises:
RuntimeError – If starting the ScyllaDB container fails or if ScyllaDB fails to initialize within the timeout period
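Waiting for initialization amounts to polling a readiness check until it succeeds or a timeout expires. A sketch, assuming the check parses nodetool status output for a node line starting with "UN" (Up/Normal); in a real setup the output could come from docker-py's Container.exec_run(). Both helpers are hypothetical.

```python
import time
from typing import Callable


def wait_until_ready(is_ready: Callable[[], bool], timeout: float, interval: float) -> int:
    """Poll is_ready() until it returns True; return the number of attempts."""
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        if is_ready():
            return attempts
        if time.monotonic() >= deadline:
            raise RuntimeError("ScyllaDB did not become ready before the timeout")
        time.sleep(interval)


def nodetool_reports_up_normal(output: str) -> bool:
    # In nodetool status output, a node line starting with "UN" means Up/Normal.
    return any(line.lstrip().startswith("UN") for line in output.splitlines())
```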
- stop() None[source]
Stop the ScyllaDB container.
Gracefully stops the running ScyllaDB container if it exists.
- static typename() str[source]
Get the qualified type name of this class.
- Returns:
Full type name including deployment name
- Return type:
str
- update_cache(benchmark: str) None[source]
Update the cache with table configuration for a benchmark.
Stores the table configuration for the specified benchmark in the cache for future retrieval.
- Parameters:
benchmark – Name of the benchmark
- write_to_table(benchmark: str, table: str, data: Dict[str, Any], primary_key: Tuple[str, str], secondary_key: Tuple[str, str] | None = None) None[source]
Write data to a DynamoDB table in ScyllaDB.
Serializes the data using DynamoDB type serialization and writes it to the specified table with the provided primary and optional secondary keys.
- Parameters:
benchmark – Name of the benchmark
table – Logical table name
data – Data to write to the table
primary_key – Tuple of (key_name, key_value) for the primary key
secondary_key – Optional tuple of (key_name, key_value) for the secondary key
- Raises:
AssertionError – If the table name is not found
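The write path merges the key attributes into the item and converts every value to DynamoDB's typed format before calling put_item(). A sketch, assuming the serializer is injected: with boto3 installed one would pass TypeSerializer().serialize from boto3.dynamodb.types and then call client.put_item(**request). put_item_request is a hypothetical helper.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def put_item_request(table_name: Optional[str], data: Dict[str, Any],
                     primary_key: Tuple[str, str],
                     secondary_key: Optional[Tuple[str, str]],
                     serialize: Callable[[Any], Dict[str, Any]]) -> Dict[str, Any]:
    # Mirrors the documented AssertionError for an unknown table.
    assert table_name is not None, "table not found"
    item = dict(data)
    # Key attributes are written into the item alongside the payload.
    for key in (primary_key, secondary_key):
        if key is not None:
            item[key[0]] = key[1]
    # Convert every value into DynamoDB's typed format, e.g. {"S": "..."}.
    return {"TableName": table_name, "Item": {k: serialize(v) for k, v in item.items()}}
```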
Module contents
This module provides storage abstractions and implementations for SeBS, supporting both object storage (S3-compatible) and NoSQL database storage.
It includes:
- Configuration classes for different storage backends
- MinIO implementation for local S3-compatible object storage
- ScyllaDB implementation for local DynamoDB-compatible NoSQL storage
- Resource management classes for self-hosted storage deployments
The storage module enables benchmarks to work with persistent data storage across different deployment environments while maintaining consistent interfaces. Thus, we can seamlessly port benchmarks between clouds and open-source serverless platforms.
- Key Components:
config: Configuration dataclasses for storage backends
minio: MinIO-based object storage implementation
scylladb: ScyllaDB-based NoSQL storage implementation
resources: Resource management for self-hosted storage deployments
Example
To use MinIO object storage in a benchmark:
```python
from sebs.storage.minio import Minio
from sebs.storage.config import MinioConfig

# docker_client, cache_client and resources are assumed to exist already
# (e.g. docker_client = docker.from_env()).

# Configure and start MinIO
config = MinioConfig(mapped_port=9000, version="latest")
storage = Minio(docker_client, cache_client, resources, False)
storage.config = config
storage.start()
```