cache_s3_based

Note

TTL and automatic S3 cleanup

When ttl is set, an expired entry is deleted from S3 the first time it is read back after expiry, so no S3 Lifecycle Rule or background job is required to keep the bucket clean. An expired entry that is never read again stays in the bucket.

Because TTLs are typically set to hours or days, expirations are infrequent; the extra DeleteObject call on each expiry has negligible overhead compared to the cost of recomputing the cached value.

S3-backed caching decorator.

class core_aws.decorators.cache_s3_based._S3Backend(fcn_qualname: str, bucket: str, key_prefix: str = 'cache/', ttl: float | None = None, s3_kwargs: Dict[str, Any] | None = None)

Bases: L2Backend

L2Backend that stores cache entries as pickle files in S3.

Each entry is stored under the key {key_prefix}{qualname}/{md5_of_cache_key} and serialized as {"v": value, "t": timestamp}, matching the on-disk format used by _DiskBackend.
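The key layout and payload shape described above can be sketched roughly as follows. This is an illustration only: the helper names (object_key, encode_entry, decode_entry) are hypothetical, and hashing the repr() of the cache key is an assumption, since the source does not say how the MD5 digest is derived.

```python
import hashlib
import pickle
import time
from typing import Any, Tuple


def object_key(key_prefix: str, qualname: str, cache_key: Any) -> str:
    """Build '{key_prefix}{qualname}/{md5_of_cache_key}'."""
    # Assumption: the digest is taken over repr(cache_key); the real
    # backend may serialize the key differently before hashing.
    digest = hashlib.md5(repr(cache_key).encode("utf-8")).hexdigest()
    return f"{key_prefix}{qualname}/{digest}"


def encode_entry(value: Any) -> bytes:
    """Pickle the value together with the time it was written."""
    return pickle.dumps({"v": value, "t": time.time()})


def decode_entry(blob: bytes) -> Tuple[Any, float]:
    """Return (value, timestamp) from a pickled entry."""
    entry = pickle.loads(blob)
    return entry["v"], entry["t"]
```

Because the qualified name is part of the key, two functions called with identical arguments still map to different S3 objects.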

The S3 client is created lazily on the first cache access.
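One common way to get this lazy behaviour is a cached property that builds the client on first access. The sketch below is not the library's implementation: LazyClientHolder and client_factory are hypothetical names, and the factory stands in for whatever constructs the real S3Client.

```python
from functools import cached_property
from typing import Any, Callable, Dict, Optional


class LazyClientHolder:
    """Defers client construction until the client is first needed."""

    def __init__(
        self,
        client_factory: Callable[..., Any],
        s3_kwargs: Optional[Dict[str, Any]] = None,
    ) -> None:
        # Only the recipe is stored here; nothing is connected yet.
        self._client_factory = client_factory
        self._s3_kwargs = dict(s3_kwargs or {})

    @cached_property
    def client(self) -> Any:
        # Runs once, on first access; the result is then reused.
        return self._client_factory(**self._s3_kwargs)
```

Deferring construction means that decorating a function at import time does not need credentials or network access until the cache is actually used.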

__init__(fcn_qualname: str, bucket: str, key_prefix: str = 'cache/', ttl: float | None = None, s3_kwargs: Dict[str, Any] | None = None) → None
Parameters:
  • fcn_qualname – __qualname__ of the decorated function; used to derive a unique S3 prefix so different functions never share entries.

  • bucket – S3 bucket where cached objects are stored.

  • key_prefix – Prefix prepended to every S3 key. Default: "cache/".

  • ttl

    Time-to-live in seconds. Entries older than this are deleted from S3 and treated as a miss on load(). None means entries never expire.

    Because TTLs are typically set to hours or days, expirations are infrequent; the extra delete_object call on expiry has negligible overhead and keeps the bucket clean automatically, without requiring an S3 Lifecycle Rule.

  • s3_kwargs – Extra keyword arguments forwarded to S3Client.

property _s3_client: S3Client
_object_key(cache_key: Any) → str
load(cache_key: Any) → Any

Return the cached value, or _MISS when:

  • the S3 object does not exist (NoSuchKey), or

  • the entry is older than ttl seconds, in which case the stale object is deleted from S3 before returning _MISS, keeping the bucket clean without relying on S3 Lifecycle Rules.

Because TTLs are typically set to hours or days, these deletions are infrequent and the extra delete_object call has negligible overhead.

All other ClientError exceptions (e.g. AccessDenied) are re-raised.

save(cache_key: Any, value: Any) → None

Persist value in S3 under cache_key.

core_aws.decorators.cache_s3_based.cache_s3_based(*, bucket: str, key_prefix: str = 'cache/', maxsize: int | None = None, ttl: float | None = None, s3_kwargs: Dict[str, Any] | None = None) → Callable

Write-through caching decorator: L1 is a bounded in-memory LRU (_CacheWrapper); the fallback is an S3 bucket (_S3Backend).

Every new result is written to both L1 and S3 immediately. When L1 is full, the least-recently-used entry is evicted from memory only; the S3 object is kept. A subsequent call with the same arguments (from the same or a different process or machine) reloads the value from S3 without invoking the wrapped function.
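The write-through and eviction behaviour described above can be illustrated with a small two-tier cache. This is a sketch under simplifying assumptions, not the decorator's actual code: TwoTierCache is a hypothetical name, a plain dict stands in for the S3 layer, only positional hashable arguments are supported, and TTL handling is omitted.

```python
from collections import OrderedDict
from typing import Any, Callable, Dict, Optional, Tuple


class TwoTierCache:
    """Bounded in-memory LRU (L1) in front of a persistent store (L2)."""

    def __init__(self, fn: Callable[..., Any], maxsize: Optional[int] = None) -> None:
        self.fn = fn
        self.maxsize = maxsize
        self.l1: "OrderedDict[Tuple[Any, ...], Any]" = OrderedDict()
        self.l2: Dict[Tuple[Any, ...], Any] = {}  # plain dict standing in for S3
        self.calls = 0  # how many times the wrapped function actually ran

    def _remember(self, key: Tuple[Any, ...], value: Any) -> None:
        self.l1[key] = value
        self.l1.move_to_end(key)
        if self.maxsize is not None and len(self.l1) > self.maxsize:
            self.l1.popitem(last=False)  # evict the LRU entry from memory only

    def __call__(self, *args: Any) -> Any:
        if args in self.l1:  # L1 hit
            self.l1.move_to_end(args)
            return self.l1[args]
        if args in self.l2:  # L1 miss, L2 hit: reload without recomputing
            value = self.l2[args]
            self._remember(args, value)
            return value
        self.calls += 1  # miss in both layers: compute
        value = self.fn(*args)
        self.l2[args] = value  # write-through: persist immediately
        self._remember(args, value)
        return value
```

With maxsize=1, calling the cache with 1, then 2, then 1 again runs the wrapped function twice: the third call finds its entry evicted from L1 but still present in L2.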

Parameters:
  • bucket – S3 bucket where cached objects are stored.

  • key_prefix – Prefix prepended to every S3 key. Default: "cache/".

  • maxsize – Maximum number of entries kept in the in-memory L1 cache. None means unbounded.

  • ttl

    Time-to-live in seconds, applied to both layers. Expired L1 entries are evicted from memory. Expired S3 entries are deleted from S3 when they are next loaded, and the load returns _MISS, which keeps the bucket clean without S3 Lifecycle Rules. None (default) means entries never expire.

    TTLs are typically set to hours or days, so expirations are infrequent and the extra delete_object call has negligible overhead.

  • s3_kwargs – Extra keyword arguments forwarded to S3Client (e.g. region_name, endpoint_url).

Returns:

The wrapped function.

Example

from core_aws.decorators import cache_s3_based

@cache_s3_based(bucket="my-cache-bucket", key_prefix="etl/", ttl=3600)
def fetch_reference_data(dataset: str) -> dict:
    ...