sebs.experiments package

Submodules

sebs.experiments.config module

Configuration management for benchmark experiments.

This module provides the configuration class for benchmark experiments, handling settings such as:

  • Runtime environment (language, version)

  • Architecture (x64, arm64)

  • Deployment type (container, package)

  • Code and storage update flags

  • Experiment-specific settings

The Config class handles serialization and deserialization of experiment configurations, allowing them to be loaded from and saved to configuration files.

class sebs.experiments.config.Config[source]

Bases: object

Configuration class for benchmark experiments.

This class manages the configuration settings for benchmark experiments, including runtime environment, architecture, deployment type, and experiment-specific settings.

_update_code

Whether to update function code

_update_storage

Whether to update storage resources

_container_deployment

Whether to use container-based deployment

_download_results

Whether to download experiment results

_architecture

CPU architecture (e.g., “x64”, “arm64”)

_flags

Dictionary of boolean flags for custom settings

_experiment_configs

Dictionary of experiment-specific settings

_runtime

Runtime environment (language and version)

property architecture: str

Get the CPU architecture.

Returns:

CPU architecture (e.g., “x64”, “arm64”)

check_flag(key: str) bool[source]

Check if a specific experiment flag is set.

Currently, it is only used to let the benchmark know that Docker volumes are disabled (e.g., in a CircleCI environment).

Parameters:

key – Name of the flag to check

Returns:

Value of the flag, or False if the flag is not set
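
The fallback to False described above can be sketched with a plain dictionary lookup. This is a minimal stand-in operating on a bare dictionary, not the SeBS implementation, and the flag name `docker_copy_build_files` is a hypothetical example.

```python
from typing import Dict


def check_flag(flags: Dict[str, bool], key: str) -> bool:
    """Return the flag's value, or False when the flag was never set."""
    return flags.get(key, False)


# Hypothetical flag name, used only for illustration.
flags = {"docker_copy_build_files": True}
print(check_flag(flags, "docker_copy_build_files"))  # True
print(check_flag(flags, "unknown_flag"))  # False
```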

property container_deployment: bool

Get whether to use container-based deployment.

Returns:

True if container-based deployment should be used, False otherwise

static deserialize(config: dict) Config[source]

Deserialize a configuration from a dictionary.

This method creates a new configuration object from a dictionary representation, which may have been loaded from a file or passed from another component.

Parameters:

config – Dictionary representation of the configuration

Returns:

A new configuration object with settings from the dictionary

Note

This method requires Python 3.7+ for proper type annotations. The string type annotation is a forward reference to the Config class.

experiment_settings(name: str) dict[source]

Get settings for a specific experiment.

Parameters:

name – Name of the experiment

Returns:

Dictionary of experiment-specific settings

Raises:

KeyError – If the experiment name is not found in the configuration

property runtime: Runtime

Get the runtime environment.

Returns:

Runtime environment (language and version)

serialize() dict[source]

Serialize the configuration to a dictionary.

This method converts the configuration object to a dictionary that can be saved to a file or passed to other components.

Returns:

Dictionary representation of the configuration
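
As a rough illustration of the serialize/deserialize round trip, the sketch below uses a hypothetical `SketchConfig` class that mirrors a few of the documented attributes. The dictionary key names are assumptions made for this example and may differ from what the real Config class writes.

```python
from typing import Any, Dict


class SketchConfig:
    """Hypothetical stand-in that mirrors a few documented attributes."""

    def __init__(self) -> None:
        self._update_code = False
        self._update_storage = False
        self._architecture = "x64"
        self._flags: Dict[str, bool] = {}

    def serialize(self) -> Dict[str, Any]:
        # Key names are assumptions chosen for this sketch.
        return {
            "update_code": self._update_code,
            "update_storage": self._update_storage,
            "architecture": self._architecture,
            "flags": dict(self._flags),
        }

    @staticmethod
    def deserialize(config: Dict[str, Any]) -> "SketchConfig":
        # The quoted return annotation is a forward reference to the class.
        cfg = SketchConfig()
        cfg._update_code = config.get("update_code", False)
        cfg._update_storage = config.get("update_storage", False)
        cfg._architecture = config.get("architecture", "x64")
        cfg._flags = dict(config.get("flags", {}))
        return cfg


original = {"update_code": True, "architecture": "arm64", "flags": {"demo": True}}
restored = SketchConfig.deserialize(original)
print(restored.serialize()["architecture"])  # arm64
```

Missing keys fall back to defaults here, so a partial dictionary still yields a complete configuration after one round trip.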

property update_code: bool

Get whether to update function code.

Returns:

True if function code should be updated, False otherwise

property update_storage: bool

Get whether to update storage resources.

Returns:

True if storage resources should be updated, False otherwise

sebs.experiments.environment module

Environment management for experiment execution.

This module provides the ExperimentEnvironment class for managing CPU settings and system configuration during benchmark experiments. This is useful for local, Docker-based executions. It handles:

  • CPU frequency scaling and governor management

  • Hyperthreading control (enable/disable)

  • CPU boost control

  • Memory management (page cache dropping)

  • Intel CPU-specific optimizations

Currently supports only Intel CPUs with the intel_pstate driver.

Note

This module assumes that all CPU cores are online at initialization. Future versions should use lscpu to discover online cores dynamically.

class sebs.experiments.environment.ExperimentEnvironment[source]

Bases: object

Environment management for benchmark experiments.

This class provides methods to control CPU settings, memory management, and other system configurations that can affect benchmark results. It focuses on creating a stable, reproducible environment for experiments.

_cpu_mapping

Dictionary mapping physical cores to logical cores

_vendor

CPU vendor identifier (currently only “intel” supported)

_governor

CPU frequency scaling governor (e.g., “intel_pstate”)

_prev_boost_status

Previous boost status for restoration

_prev_min_freq

Previous minimum frequency setting for restoration

after_benchmarking(cores: List[int]) None[source]

Restore environment settings after benchmarking.

This method restores the system to its previous state after benchmarking is complete:

  • Re-enables CPU boost/turbo

  • Re-enables hyperthreading

  • Restores frequency settings

Parameters:

cores – List of physical core IDs to restore

disable_boost(cores: List[int]) None[source]

Disable CPU boost (turbo) for specified cores.

Parameters:

cores – List of physical core IDs to disable boost for

Raises:

NotImplementedError – If CPU governor is not intel_pstate

disable_hyperthreading(cores: List[int]) None[source]

Disable hyperthreading for specified cores.

Parameters:

cores – List of physical core IDs to disable hyperthreading for

drop_page_cache() None[source]

Drop system page cache to ensure clean memory state.

This method clears the page cache to prevent cached data from affecting benchmark measurements.

enable_boost(cores: List[int]) None[source]

Enable CPU boost (turbo) for specified cores.

Restores the previous boost status that was saved when boost was disabled.

Parameters:

cores – List of physical core IDs to enable boost for

Raises:

NotImplementedError – If CPU governor is not intel_pstate

enable_hyperthreading(cores: List[int]) None[source]

Enable hyperthreading for specified cores.

Parameters:

cores – List of physical core IDs to enable hyperthreading for

set_frequency(max_freq: int) None[source]

Set minimum CPU frequency percentage.

Parameters:

max_freq – Minimum frequency percentage (0-100); the parameter name notwithstanding, this method is documented as setting the minimum frequency

setup_benchmarking(cores: List[int]) None[source]

Set up the environment for stable benchmarking.

This method applies a standard set of optimizations to create a stable environment for benchmarking:

  • Disables CPU boost/turbo

  • Disables hyperthreading

  • Sets CPU frequency to maximum

  • Drops page cache

Parameters:

cores – List of physical core IDs to configure
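
Since setup_benchmarking and after_benchmarking form a pair, one way to guarantee restoration even when a benchmark fails is to wrap them in a context manager. The sketch below is an assumption about usage, not part of the package: `stable_environment` is a hypothetical helper, and `FakeEnvironment` only records calls so the example runs without touching system settings.

```python
from contextlib import contextmanager
from typing import Iterator, List


class FakeEnvironment:
    """Hypothetical stand-in that only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def setup_benchmarking(self, cores: List[int]) -> None:
        self.calls.append("setup")

    def after_benchmarking(self, cores: List[int]) -> None:
        self.calls.append("after")


@contextmanager
def stable_environment(env, cores: List[int]) -> Iterator[None]:
    """Apply the benchmarking setup and always restore it afterwards."""
    env.setup_benchmarking(cores)
    try:
        yield
    finally:
        # Runs even if the benchmark body raises.
        env.after_benchmarking(cores)


demo_env = FakeEnvironment()
with stable_environment(demo_env, [0, 1]):
    pass  # the benchmark would run here
print(demo_env.calls)  # ['setup', 'after']
```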

unset_frequency() None[source]

Restore previous minimum CPU frequency setting.

Restores the frequency setting that was saved when set_frequency was called.
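
The save-and-restore behaviour of set_frequency and unset_frequency can be sketched as follows. The Linux intel_pstate driver exposes a `min_perf_pct` file under `/sys/devices/system/cpu/intel_pstate/`; whether this module writes exactly that file is an assumption, and the hypothetical `FrequencyGuard` class takes the path as a parameter so the example can run against a temporary file.

```python
import tempfile
from pathlib import Path
from typing import Optional


class FrequencyGuard:
    """Hypothetical helper: remember the old value before overwriting it."""

    def __init__(self, min_perf_path: Path) -> None:
        self._path = min_perf_path
        self._prev_min_freq: Optional[int] = None

    def set_frequency(self, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be within 0-100")
        # Save the current setting so it can be restored later.
        self._prev_min_freq = int(self._path.read_text().strip())
        self._path.write_text(str(percent))

    def unset_frequency(self) -> None:
        if self._prev_min_freq is None:
            return  # nothing was saved, so there is nothing to restore
        self._path.write_text(str(self._prev_min_freq))
        self._prev_min_freq = None


# Demonstration against a temporary file instead of the real sysfs entry.
with tempfile.TemporaryDirectory() as tmp:
    fake_file = Path(tmp) / "min_perf_pct"
    fake_file.write_text("20\n")
    guard = FrequencyGuard(fake_file)
    guard.set_frequency(100)
    print(fake_file.read_text())  # 100
    guard.unset_frequency()
    print(fake_file.read_text())  # 20
```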

write_cpu_status(cores: List[int], status: int) None[source]

Write CPU online status for specified cores.

Parameters:
  • cores – List of physical core IDs to modify

  • status – Status to set (0 for offline, 1 for online)

sebs.experiments.eviction_model module

Container eviction model experiment implementation.

This module provides the EvictionModel experiment implementation, which measures how serverless platforms manage function container eviction. It determines how long idle containers are kept alive before being recycled by the platform, which affects cold start frequency.

The experiment involves invoking functions at increasing time intervals and observing when cold starts occur, thus inferring the platform’s container caching and eviction policies.

This implementation is slightly different from the original one, which used the 010.sleep benchmark. Here, we use the 040.server-reply benchmark to double-check that all functions are “alive” at the same time. However, the sleep logic is not currently implemented in 040.server-reply.

class sebs.experiments.eviction_model.EvictionModel(config: Config)[source]

Bases: Experiment

Container eviction model experiment.

This experiment measures how serverless platforms manage function container eviction. It determines how long idle containers are kept alive before being recycled by the platform, which affects cold start frequency.

The experiment invokes functions at different time intervals (defined in the ‘times’ list) and observes when cold starts occur, thus inferring the platform’s container caching and eviction policies.

times

List of time intervals (in seconds) between invocations

_function

Function to invoke

_trigger

Trigger to use for invocation

_out_dir

Directory for storing results

_deployment_client

Deployment client to use

_sebs_client

SeBS client

static accept_replies(port: int, invocations: int) None[source]

Accept TCP connections from functions and respond to them.

This static method acts as a TCP server, accepting connections from functions and responding to them. It runs two rounds of connection acceptance to ensure functions receive a response. The method logs all activity to a file.

This is used by the ‘040.server-reply’ benchmark to confirm function execution.

Parameters:
  • port – TCP port to listen on

  • invocations – Number of expected function invocations

static execute_instance(sleep_time: int, pid: int, tid: int, func: Function, payload: dict) dict[source]

Execute a single instance of the eviction model test.

This method performs two invocations of a function with a sleep interval between them. The first invocation should be a cold start, and the second will indicate whether the container was evicted during the sleep period.

This function is intended to be run in a separate thread; it performs two synchronous HTTP invocations of the given function.

Parameters:
  • sleep_time – Time to sleep between invocations (seconds)

  • pid – Process ID for logging

  • tid – Thread ID for logging

  • func – Function to invoke

  • payload – Payload to send to the function

Returns:

Dictionary with invocation results and timing information

Raises:

RuntimeError – If the first invocation fails
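
The two-invocation probe described above can be sketched without any cloud dependency. In this minimal example the function is replaced by a plain callable, and the result keys `success` and `is_cold` are hypothetical names chosen for illustration rather than the fields SeBS actually returns.

```python
import time
from typing import Any, Callable, Dict

Invoker = Callable[[Dict[str, Any]], Dict[str, Any]]


def execute_instance(
    sleep_time: float, invoke: Invoker, payload: Dict[str, Any]
) -> Dict[str, Any]:
    """Invoke twice with an idle gap and report whether the second call was cold."""
    first = invoke(payload)
    if not first.get("success", False):
        raise RuntimeError("First invocation failed")
    time.sleep(sleep_time)  # idle period during which eviction may happen
    second = invoke(payload)
    return {
        "sleep_time": sleep_time,
        "first": first,
        "second": second,
        # A cold second invocation suggests the container was evicted.
        "evicted": bool(second.get("is_cold", False)),
    }


def make_fake_invoker() -> Invoker:
    """Fake function: cold on the first call, warm on every later call."""
    state = {"calls": 0}

    def invoke(payload: Dict[str, Any]) -> Dict[str, Any]:
        state["calls"] += 1
        return {"success": True, "is_cold": state["calls"] == 1}

    return invoke


result = execute_instance(0.0, make_fake_invoker(), {})
print(result["evicted"])  # False
```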

function_copies_per_time = 1

static name() str[source]

Get the name of the experiment.

Returns:

The name “eviction-model”

prepare(sebs_client: SeBS, deployment_client: System) None[source]

Prepare the experiment for execution.

This method sets up the benchmark, functions, and output directory for the experiment. It retrieves the ‘040.server-reply’ benchmark, sets up result storage, and creates a separate function for each combination of time interval and copy, allowing different eviction times to be tested in parallel.

Parameters:
  • sebs_client – The SeBS client to use

  • deployment_client – The deployment client to use

static process_function(repetition: int, pid: int, invocations: int, functions: List[Function], times: List[int], payload: dict) List[dict][source]

Process a function with multiple time intervals.

This method executes multiple functions with different sleep times in parallel, starting with the largest sleep time to overlap executions. Because the executions overlap, the total time should be roughly equal to the longest single execution.

Parameters:
  • repetition – Current repetition number

  • pid – Process ID for logging

  • invocations – Number of invocations to perform

  • functions – List of functions to invoke

  • times – List of sleep times corresponding to functions

  • payload – Payload to send to functions

Returns:

List of dictionaries containing invocation results

Raises:

RuntimeError – If any execution fails
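
A minimal sketch of the overlap strategy, assuming a thread pool is an acceptable stand-in for the package's own threading logic: tasks are submitted in descending order of sleep time, results are returned in the original order, and any failure is re-raised as a RuntimeError. The names `run_overlapped` and `sleep_and_report` are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Sequence


def run_overlapped(times: Sequence[float], task: Callable[[float], Any]) -> List[Any]:
    """Run task(t) for every sleep time, submitting the longest sleeps first."""
    # Indices sorted by descending sleep time, so long waits start earliest.
    order = sorted(range(len(times)), key=lambda i: times[i], reverse=True)
    results: Dict[int, Any] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(times))) as pool:
        futures = [(i, pool.submit(task, times[i])) for i in order]
        for i, future in futures:
            try:
                results[i] = future.result()
            except Exception as exc:
                raise RuntimeError(
                    f"Execution with sleep time {times[i]} failed"
                ) from exc
    # Return results in the same order as the input list.
    return [results[i] for i in range(len(times))]


def sleep_and_report(sleep_time: float) -> float:
    time.sleep(sleep_time)
    return sleep_time


print(run_overlapped([0.01, 0.03, 0.02], sleep_and_report))  # [0.01, 0.03, 0.02]
```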

run() None[source]

Execute the eviction model experiment.

This method runs the main eviction model experiment by:

  1. Setting up server instances to handle function responses

  2. Executing parallel invocations with different sleep times

  3. Collecting and storing results

The experiment determines container eviction patterns by measuring whether functions experience cold starts after different idle periods.

times = [1]

static typename() str[source]

Get the type name of the experiment.

Returns:

The type name “Experiment.EvictionModel”

sebs.experiments.experiment module

Base abstract class for implementing serverless benchmark experiments.

This module provides the base Experiment abstract class that defines the common interface and functionality for all benchmark experiments in the serverless benchmarking suite. Each experiment type inherits from this class and implements its specific logic for executing benchmarks, measuring performance, and analyzing results.

The Experiment class handles:

  • Configuration management

  • Parallel invocation coordination

  • Logging setup

  • Type and name identification for experiments

class sebs.experiments.experiment.Experiment(cfg: Config)[source]

Bases: ABC, LoggingBase

Abstract base class for all serverless benchmark experiments.

This class provides the common functionality and interface for all experiment implementations. It manages configuration, handles logging, and defines the abstract methods that must be implemented by specific experiment types.

config

Experiment configuration settings

_threads

Number of concurrent threads to use for the experiment

_invocations

Number of function invocations to perform

_invocation_barrier

Semaphore for coordinating parallel invocations

property config: Config

Get the experiment configuration.

Returns:

The experiment configuration

abstractmethod static name() str[source]

Get the name of the experiment.

This method must be implemented by all subclasses to return a unique name for the experiment type, which is used for configuration and identification.

Returns:

A string name for the experiment

abstractmethod static typename() str[source]

Get the type name of the experiment.

This method must be implemented by all subclasses to return a human-readable type name for the experiment, which is used for display and reporting.

Returns:

A string type name for the experiment
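
The contract of the two abstract static methods can be illustrated with a stripped-down base class. `ExperimentSketch` and `ColdStartProbe` are hypothetical names used only for this example; the real Experiment class also takes a configuration and mixes in logging.

```python
from abc import ABC, abstractmethod


class ExperimentSketch(ABC):
    """Hypothetical, reduced version of an abstract experiment base class."""

    @staticmethod
    @abstractmethod
    def name() -> str:
        """Unique name used for configuration and identification."""

    @staticmethod
    @abstractmethod
    def typename() -> str:
        """Human-readable type name used for display and reporting."""


class ColdStartProbe(ExperimentSketch):
    """Hypothetical concrete experiment implementing both methods."""

    @staticmethod
    def name() -> str:
        return "cold-start-probe"

    @staticmethod
    def typename() -> str:
        return "Experiment.ColdStartProbe"


print(ColdStartProbe.name())  # cold-start-probe
print(ColdStartProbe().typename())  # Experiment.ColdStartProbe
```

A subclass that omits either method cannot be instantiated, which is how the base class enforces the interface.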

sebs.experiments.invocation_overhead module

Invocation overhead measurement experiment implementation.

This module provides the InvocationOverhead experiment implementation, which measures the overhead associated with invoking serverless functions. It can measure:

  • Overhead of different invocation methods (HTTP, SDK)

  • Impact of code package size on deployment and invocation time

  • Overhead of different input data sizes

  • Cold vs. warm start invocation times

The experiment is designed to help identify performance bottlenecks and optimize function deployment and invocation. We deploy the microbenchmark 030.clock-synchronization to precisely measure the network latency between the client and the function.

class sebs.experiments.invocation_overhead.CodePackageSize(deployment_client: System, benchmark: Benchmark, settings: dict)[source]

Bases: object

Helper class for code package size experiments.

This class handles creating and deploying functions with different code package sizes to measure the impact of package size on deployment and invocation overhead.

_benchmark_path

Path to the benchmark code

_benchmark

Benchmark instance

_deployment_client

Deployment client to use

sizes

List of code package sizes to test

functions

Dictionary mapping size to function instances

before_sample(size: int, input_benchmark: dict) None[source]

Prepare the benchmark with a specific code package size.

Creates a file named ‘randomdata.bin’ with the specified size of random bytes within the benchmark’s code package. Then, updates the function on the deployment.

Parameters:
  • size – Size of the code package to create

  • input_benchmark – Benchmark input configuration (unused)
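
The padding step can be sketched with the standard library alone. The helper name `write_random_payload` is hypothetical, the example writes into a temporary directory rather than a real code package, and the function update on the deployment is omitted.

```python
import os
import tempfile
from pathlib import Path


def write_random_payload(code_dir: Path, size: int) -> Path:
    """Create 'randomdata.bin' with `size` random bytes inside code_dir."""
    target = code_dir / "randomdata.bin"
    target.write_bytes(os.urandom(size))
    return target


with tempfile.TemporaryDirectory() as tmp:
    path = write_random_payload(Path(tmp), 4096)
    print(path.name, path.stat().st_size)  # randomdata.bin 4096
```

Random bytes are a reasonable choice here because they do not compress well, so the packaged size should track the requested size closely.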

class sebs.experiments.invocation_overhead.InvocationOverhead(config: Config)[source]

Bases: Experiment

Invocation overhead measurement experiment.

This experiment measures the overhead associated with invoking serverless functions. It can measure the impact of code package size, input data size, and different invocation methods on performance.

settings

Experiment-specific settings

_benchmark

Benchmark to use

benchmark_input

Input data for the benchmark

_storage

Storage service to use

_function

Function to invoke

_code_package

Code package size experiment helper

_out_dir

Directory for storing results

_deployment_client

Deployment client to use

_sebs_client

SeBS client

static name() str[source]

Get the name of the experiment.

Returns:

The name “invocation-overhead”

prepare(sebs_client: SeBS, deployment_client: System) None[source]

Prepare the experiment for execution.

This method sets up the benchmark, function, storage, and output directory for the experiment. It uses the clock-synchronization benchmark as a base and prepares the necessary resources for measuring invocation overhead.

Parameters:
  • sebs_client – The SeBS client to use

  • deployment_client – The deployment client to use

process(sebs_client: SeBS, deployment_client, directory: str, logging_filename: str, extend_time_interval: int) None[source]

Process experiment results and generate summary statistics.

This method processes the raw experiment results by:

  1. Loading client-side timing data from CSV files and server-side UDP datagram timestamps

  2. Computing clock drift and Round-Trip Time (RTT)

  3. Creating a processed results file with invocation times

Parameters:
  • sebs_client – SeBS client instance

  • deployment_client – Deployment client instance

  • directory – Directory containing experiment results

  • logging_filename – Name of the logging file (unused)

receive_datagrams(input_benchmark: dict, repetitions: int, port: int, ip: str) List[source]

Receive UDP datagrams from the function for clock synchronization.

This method implements a UDP server that communicates with the function to measure clock synchronization and network timing. It opens a UDP socket, triggers an asynchronous function invocation, and then listens for a specified number of datagrams, recording timestamps for received and sent datagrams.

Saves server-side timestamps to a CSV file named server-{request_id}.csv.

Parameters:
  • input_benchmark – Benchmark input configuration

  • repetitions – Number of repetitions to perform

  • port – UDP port to listen on

  • ip – IP address of the client

Returns:

List containing invocation results: [is_cold, connection_time, start_timestamp, finish_timestamp, request_id]

Raises:

RuntimeError – If function invocation fails

run() None[source]

Execute the invocation overhead experiment.

This method runs the main experiment by:

  1. Setting up either code package size or payload size experiments

  2. Running warm-up and cold start invocations

  3. Measuring invocation overhead for different sizes (either code package or payload, based on settings)

  4. Collecting and storing results in CSV format, including client-side and server-side timestamps

static typename() str[source]

Get the type name of the experiment.

Returns:

The type name “Experiment.InvocOverhead”

class sebs.experiments.invocation_overhead.PayloadSize(settings: dict)[source]

Bases: object

Helper class for payload size experiments.

This class handles creating different payload sizes to measure the impact of input data size on function invocation overhead.

pts

List of payload sizes to test

before_sample(size: int, input_benchmark: dict) None[source]

Prepare the benchmark input with a specific payload size.

Generates different payload sizes by creating base64 encoded byte arrays.

Parameters:
  • size – Size of the payload to create

  • input_benchmark – Benchmark input configuration to modify
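
Generating a payload from a base64-encoded byte array might look like the sketch below. Whether `size` refers to the raw or the encoded length is not stated in the documentation; this example treats it as the raw length, and the input key `data` is a hypothetical name.

```python
import base64
from typing import Any, Dict


def make_payload(size: int) -> str:
    """Return a base64 string encoding `size` zero bytes."""
    raw = bytearray(size)  # zero-filled buffer of the requested length
    return base64.b64encode(raw).decode("ascii")


input_benchmark: Dict[str, Any] = {}
# "data" is a hypothetical key chosen for this sketch.
input_benchmark["data"] = make_payload(1024)
print(len(input_benchmark["data"]))  # 1368
```

Base64 expands every 3 input bytes into 4 output characters, so the transmitted text is roughly one third larger than the raw size.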

sebs.experiments.network_ping_pong module

Network latency and throughput measurement experiment implementation.

This module provides the NetworkPingPong experiment implementation, which measures network latency and throughput characteristics between client and serverless functions. It determines various latency characteristics of the network connection in the cloud.

class sebs.experiments.network_ping_pong.NetworkPingPong(config: Config)[source]

Bases: Experiment

Network latency and throughput measurement experiment.

This experiment measures the network RTT (Round-Trip Time) between the client and serverless functions using a ping-pong mechanism. It deploys the ‘020.network-benchmark’ benchmark, which echoes back UDP datagrams. The experiment sends a series of datagrams and measures the time taken for each to return.

benchmark_input

Input configuration for the benchmark

_storage

Storage service to use for testing

_function

Function to invoke

_triggers

Dictionary of triggers by type

_out_dir

Directory for storing results

_deployment_client

Deployment client to use

_sebs_client

SeBS client

static name() str[source]

Get the name of the experiment.

Returns:

The name “network-ping-pong”

prepare(sebs_client: SeBS, deployment_client: System) None[source]

Prepare the experiment for execution.

This method sets up the ‘020.network-benchmark’ benchmark, triggers, storage, and output directory for the experiment. It creates or gets the function and its HTTP trigger, and prepares the input data for the benchmark.

Parameters:
  • sebs_client – The SeBS client to use

  • deployment_client – The deployment client to use

process(directory: str) None[source]

Process the experiment results.

This method processes the CSV files generated during the experiment execution, computes round-trip times (RTT), and generates summary statistics and a histogram of the RTT distribution.

Parameters:

directory – Directory containing the experiment results

receive_datagrams(repetitions: int, port: int, ip: str) None[source]

Receive UDP datagrams from the function and respond to them.

This method acts as a UDP server, receiving datagrams from the function and responding to them. It measures the timestamps of packet reception and response, and records them for later analysis.

Parameters:
  • repetitions – Number of repetitions to execute

  • port – UDP port to listen on

  • ip – IP address to include in the function invocation input
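
The server side of the ping-pong exchange can be sketched with the standard socket module. This is a loopback-only illustration under stated assumptions: `serve_datagrams` and `ping` are hypothetical helpers, the `ping` function stands in for the deployed cloud function, and no results are written to disk.

```python
import socket
import threading
import time
from typing import Dict, List, Tuple


def serve_datagrams(sock: socket.socket, repetitions: int) -> List[Tuple[float, float]]:
    """Echo `repetitions` datagrams and record (received, sent) timestamps."""
    stamps = []
    for _ in range(repetitions):
        data, addr = sock.recvfrom(1024)
        received = time.time()
        sock.sendto(data, addr)  # reply to the sender with the same bytes
        stamps.append((received, time.time()))
    return stamps


def ping(server_addr: Tuple[str, int], repetitions: int) -> List[float]:
    """Stand-in for the function: send datagrams and time each round trip."""
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(2.0)
        for i in range(repetitions):
            begin = time.perf_counter()
            client.sendto(str(i).encode(), server_addr)
            client.recvfrom(1024)
            rtts.append(time.perf_counter() - begin)
    return rtts


server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server.settimeout(2.0)
collected: Dict[str, List[Tuple[float, float]]] = {}
worker = threading.Thread(
    target=lambda: collected.update(stamps=serve_datagrams(server, 3))
)
worker.start()
rtts = ping(server.getsockname(), 3)
worker.join()
server.close()
print(len(collected["stamps"]), len(rtts))  # 3 3
```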

run() None[source]

Run the network ping-pong experiment.

This method executes the experiment, measuring network latency and throughput between the client and the serverless function. It first determines the client’s public IP address to include in the results.

static typename() str[source]

Get the type name of the experiment.

Returns:

The type name “Experiment.NetworkPingPong”

sebs.experiments.perf_cost module

Performance and cost measurement experiment implementation.

This module provides the PerfCost experiment implementation, which measures the performance characteristics and execution costs of serverless functions. It can run several experiment types:

  • Cold: Measures cold start performance by enforcing container recreation

  • Warm: Measures warm execution performance with reused containers

  • Burst: Measures performance under concurrent burst load

  • Sequential: Measures performance with sequential invocations

The experiment collects detailed metrics about execution time, memory usage, and costs, and provides statistical analysis of the results.

class sebs.experiments.perf_cost.PerfCost(config: Config)[source]

Bases: Experiment

Performance and cost measurement experiment.

This experiment measures the performance characteristics and execution costs of serverless functions under different execution conditions. It can measure cold starts, warm execution, burst load, and sequential execution patterns.

The experiment can be configured to run with different memory sizes, allowing for comparison of performance across different resource allocations.

_benchmark

The benchmark to execute

_benchmark_input

The input data for the benchmark

_function

The function to invoke

_trigger

The trigger to use for invocation

_out_dir

Directory for storing results

_deployment_client

The deployment client to use

_sebs_client

The SeBS client

class RunType(*values)[source]

Bases: Enum

Types of experiment runs.

This enum defines the different types of experiment runs:

  • WARM: Measure warm execution performance (reused containers)

  • COLD: Measure cold start performance (new containers)

  • BURST: Measure performance under concurrent burst load

  • SEQUENTIAL: Measure performance with sequential invocations

BURST = 2

COLD = 1

SEQUENTIAL = 3

WARM = 0

str() str[source]

Get the string representation of the run type.

Returns:

The lowercase name of the run type
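
A minimal sketch of such an enum, using the member values listed above; the method body is an assumption based on the documented return value.

```python
from enum import Enum


class RunType(Enum):
    WARM = 0
    COLD = 1
    BURST = 2
    SEQUENTIAL = 3

    def str(self) -> str:
        # Assumed behaviour: the member name in lowercase.
        return self.name.lower()


print(RunType.COLD.str())  # cold
print([run_type.str() for run_type in RunType])
```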

compute_statistics(times: List[float]) None[source]

Compute statistical analysis of execution times.

This method computes basic statistics (mean, median, standard deviation, coefficient of variation) and confidence intervals for the given times. It computes both parametric (Student’s t-distribution) and non-parametric confidence intervals.

Parameters:

times – List of execution times in milliseconds
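
The statistics named above can be approximated with the standard library. This sketch is not the package's implementation: to avoid third-party dependencies it swaps a normal quantile in for Student's t in the parametric interval, which is only a good approximation for larger samples (SciPy's `scipy.stats.t.ppf` would give the t quantile). The non-parametric interval uses the usual order-statistic ranks for the median, and the function name `summarize` is hypothetical.

```python
import math
from statistics import NormalDist, mean, median, stdev
from typing import Dict, List


def summarize(times: List[float], confidence: float = 0.95) -> Dict[str, object]:
    """Basic statistics plus two confidence intervals for execution times."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(times)
    avg = mean(ordered)
    std = stdev(ordered)
    # Normal quantile used in place of Student's t (an approximation).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * std / math.sqrt(n)
    # 1-based ranks of the order statistics bounding the median,
    # clamped to the sample; for small n the clamped interval is only rough.
    low_rank = max(1, math.floor((n - z * math.sqrt(n)) / 2))
    high_rank = min(n, math.ceil(1 + (n + z * math.sqrt(n)) / 2))
    return {
        "mean": avg,
        "median": median(ordered),
        "std": std,
        "cv": std / avg,  # assumes a non-zero mean, as with positive timings
        "ci_mean": (avg - half_width, avg + half_width),
        "ci_median": (ordered[low_rank - 1], ordered[high_rank - 1]),
    }


summary = summarize([float(i) for i in range(1, 101)])
print(summary["mean"], summary["median"])  # 50.5 50.5
print(summary["ci_median"])  # (40.0, 61.0)
```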

static name() str[source]

Get the name of the experiment.

Returns:

The name “perf-cost”

prepare(sebs_client: SeBS, deployment_client: System) None[source]

Prepare the experiment for execution.

This method sets up the benchmark, function, trigger, and output directory for the experiment. It creates or gets the function and its HTTP trigger, and prepares the input data for the benchmark.

Parameters:
  • sebs_client – The SeBS client to use

  • deployment_client – The deployment client to use

process(sebs_client: SeBS, deployment_client: System, directory: str, logging_filename: str, extend_time_interval: int) None[source]

Process experiment results and generate a CSV report.

This method processes the experiment results, downloads additional metrics if needed, and generates a CSV report with the results. The report includes memory usage, execution times, and other metrics for each experiment type and invocation.

Parameters:
  • sebs_client – The SeBS client to use

  • deployment_client – The deployment client to use

  • directory – Directory where results are stored

  • logging_filename – Filename for logs

  • extend_time_interval – Time interval to extend metrics retrieval by (in minutes)

run() None[source]

Run the experiment.

This method runs the experiment with the configured settings. If memory sizes are specified, it runs the experiment for each memory size, updating the function configuration accordingly. Otherwise, it runs the experiment once with the default memory configuration.

run_configuration(settings: dict, repetitions: int, suffix: str = '') None[source]

Run experiments for each configured experiment type.

This method runs the experiment for each experiment type specified in the settings. It dispatches to the appropriate run type handler for each experiment type.

Parameters:
  • settings – Experiment settings

  • repetitions – Number of repetitions to run

  • suffix – Optional suffix for output file names (e.g., memory size)

Raises:

RuntimeError – If an unknown experiment type is specified

static typename() str[source]

Get the type name of the experiment.

Returns:

The type name “Experiment.PerfCost”

sebs.experiments.result module

Experiment result collection and management.

This module provides the Result class for managing experiment results, including:

  • Function invocation results

  • Metrics from cloud providers

  • Experiment start and end times

  • Configuration information

The Result class handles serialization, deserialization, and analysis of experiment results, making it easier to process and visualize the data.

class sebs.experiments.result.Result(experiment_config: Config, deployment_config: Config | None = None, invocations: Dict[str, Dict[str, ExecutionResult]] | None = None, metrics: Dict[str, dict] | None = None, result_bucket: str | None = None)[source]

Bases: object

Experiment result collection and management.

This class stores and manages the results of experiments, including function invocation results, metrics from cloud providers, and configuration information. It provides methods for adding invocation results, retrieving metrics, and serializing/deserializing results.

config

Dictionary containing experiment and deployment configurations

_invocations

Dictionary mapping function names to invocation results

_metrics

Dictionary mapping function names to metrics

_start_time

Experiment start time

_end_time

Experiment end time

result_bucket

Optional bucket name for storing results

logging_handlers

Logging handlers for the result

add_invocation(func: Function, invocation: ExecutionResult) None[source]

Add an invocation result for a specific function.

If the invocation doesn’t have a request ID (likely due to failure), a synthetic ID is generated.

Parameters:
  • func – Function the invocation belongs to

  • invocation – Execution result to add
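
The synthetic-ID fallback can be illustrated as follows. The documentation does not say how the ID is generated, so the `failed-N` scheme here is purely an assumption, and `ResultSketch` is a hypothetical stand-in that stores plain dictionaries instead of ExecutionResult objects.

```python
from typing import Dict, Optional


class ResultSketch:
    """Hypothetical stand-in storing invocations per function name."""

    def __init__(self) -> None:
        self._invocations: Dict[str, Dict[str, dict]] = {}

    def add_invocation(
        self, func_name: str, request_id: Optional[str], invocation: dict
    ) -> str:
        per_function = self._invocations.setdefault(func_name, {})
        if not request_id:
            # Assumed scheme: derive an ID from the number of stored entries.
            request_id = f"failed-{len(per_function)}"
        per_function[request_id] = invocation
        return request_id


results = ResultSketch()
print(results.add_invocation("my-func", "req-1", {"ok": True}))  # req-1
print(results.add_invocation("my-func", None, {"ok": False}))  # failed-1
```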

add_result_bucket(result_bucket: str) None[source]

Set the result bucket for storing experiment results.

Parameters:

result_bucket – Name of the bucket to store results in

begin() None[source]

Mark the beginning of the experiment.

This method records the start time of the experiment.

static deserialize(cached_config: dict, cache: Cache | None, handlers: LoggingHandlers | None) Result[source]

Deserialize a result from a dictionary representation.

This static method creates a new Result object from a dictionary representation, which may have been loaded from a file or cache.

Parameters:
  • cached_config – Dictionary representation of the result

  • cache – Cache instance for resolving references

  • handlers – Logging handlers for the result

Returns:

A new Result object with settings from the dictionary

end() None[source]

Mark the end of the experiment.

This method records the end time of the experiment.

functions() List[str][source]

Get a list of all function names in the results.

Returns:

List of function names

invocations(func: str) Dict[str, ExecutionResult][source]

Get invocation results for a specific function.

Parameters:

func – Name of the function to get invocation results for

Returns:

Dictionary mapping request IDs to execution results

Raises:

KeyError – If function name is not found in results

metrics(func: str) dict[source]

Get metrics for a specific function.

If no metrics exist for the function, an empty dictionary is created and returned.

Parameters:

func – Name of the function to get metrics for

Returns:

Dictionary of metrics for the function
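
The create-on-first-access behaviour matches what `dict.setdefault` provides. A minimal sketch on a bare dictionary, with hypothetical function and metric names:

```python
from typing import Dict


def metrics(store: Dict[str, dict], func: str) -> dict:
    """Return the metrics dict for func, creating an empty one if missing."""
    return store.setdefault(func, {})


store: Dict[str, dict] = {}
entry = metrics(store, "my-func")  # created on first access
entry["billed_ms"] = 120  # hypothetical metric name
print(store)  # {'my-func': {'billed_ms': 120}}
```

Because the returned dictionary is the stored object itself, callers can fill in metrics in place without writing them back.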

times() Tuple[float, float][source]

Get the start and end times of the experiment.

Returns:

Tuple of (start_time, end_time) as Unix timestamps

sebs.experiments.startup_time module

Module contents

Experiment implementations for serverless benchmarking.

This package provides a collection of experiment implementations for measuring various aspects of serverless function performance:

  • PerfCost: Measures performance and cost characteristics

  • NetworkPingPong: Measures network latency and throughput

  • EvictionModel: Measures container eviction patterns

  • InvocationOverhead: Measures function invocation overhead

Each experiment is designed to evaluate specific aspects of serverless platforms, enabling detailed comparison between different providers, configurations, and workloads.