Content-addressed storage

Content-addressed storage for a real file cloud.

Cotton does not treat files as loose objects in user directories. Visible files are assembled from content-addressed chunks, manifests, and layout metadata, which makes deduplication, resumable upload, snapshots, restore, and cleanup part of one model.

SHA-256 chunks
File manifests
Deduplication
Layout graph

Chunk identity

Every physical content chunk is addressed by its hash. The hash is not a label added after storage; it is the key the storage model uses to decide whether content already exists and can be safely reused.

Duplicate bytes do not need duplicate physical storage.
Interrupted uploads can retry only missing chunks.
The server can validate that the uploaded bytes match the claimed identity.
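The three properties above can be sketched in a few lines. This is a minimal in-memory illustration, not Cotton's implementation: the `ChunkStore` class and its `put` and `missing` methods are hypothetical names chosen for the example.

```python
import hashlib


class ChunkStore:
    """In-memory stand-in for a content-addressed chunk store (hypothetical)."""

    def __init__(self) -> None:
        self._chunks: dict[str, bytes] = {}

    def put(self, claimed_hash: str, data: bytes) -> bool:
        """Store data under its SHA-256. Returns True only if new bytes were written."""
        actual = hashlib.sha256(data).hexdigest()
        if actual != claimed_hash:
            # The uploaded bytes must match the identity the client claimed.
            raise ValueError("uploaded bytes do not match the claimed hash")
        if claimed_hash in self._chunks:
            return False  # duplicate content: nothing new to store
        self._chunks[claimed_hash] = data
        return True

    def missing(self, hashes: list[str]) -> list[str]:
        """Return the hashes the store does not hold yet, in the given order."""
        return [h for h in hashes if h not in self._chunks]
```

A client resuming an interrupted upload would call something like `missing` with its full chunk list and send only what comes back.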

Manifests, not loose files

A visible file points to a manifest, and the manifest describes ordered chunks plus file-level properties. This is why a file can be large, seekable, previewable, versioned, shared, and restorable without becoming a single fragile blob.
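As a rough sketch of what such a manifest could look like, the example below models an ordered chunk list and shows how a byte range maps onto chunks, which is what makes a large file seekable. The field names (`content_type`, `chunks`) and the `chunks_for_range` helper are assumptions for illustration; the source does not specify Cotton's manifest schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    sha256: str  # hex digest identifying the chunk
    size: int    # chunk length in bytes


@dataclass(frozen=True)
class Manifest:
    content_type: str             # example of a file-level property (assumed)
    chunks: tuple[ChunkRef, ...]  # order defines the byte layout of the file

    @property
    def size(self) -> int:
        return sum(c.size for c in self.chunks)

    def chunks_for_range(self, start: int, length: int) -> list[tuple[ChunkRef, int, int]]:
        """Return (chunk, offset_in_chunk, byte_count) triples covering the range."""
        if start < 0 or length <= 0:
            return []
        end = min(start + length, self.size)
        result = []
        position = 0  # file offset of the current chunk's first byte
        for chunk in self.chunks:
            chunk_end = position + chunk.size
            if chunk_end > start and position < end:
                first = max(start, position)
                result.append((chunk, first - position, min(end, chunk_end) - first))
            position = chunk_end
        return result
```

Reading bytes 3 to 5 of a file made of 4-byte chunks touches only the first two chunks; the rest of the file is never fetched.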

Layouts and nodes

Cotton stores the user-facing tree as layout and node metadata. The folder tree is lightweight metadata over stable content references, so copy, snapshot, restore, trash, and version flows do not have to rewrite every physical byte.
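To illustrate why tree operations stay cheap, here is a small sketch in which a node holds only a name and a manifest reference. The `Node` type and the `move` and `copy_file` functions are hypothetical; they stand in for whatever layout records Cotton actually keeps.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    manifest_id: Optional[str] = None  # set for files, None for folders
    children: dict[str, "Node"] = field(default_factory=dict)


def move(src_parent: Node, name: str, dst_parent: Node) -> None:
    """Re-parent a node. Only tree metadata changes; no chunk is touched."""
    if name in dst_parent.children:
        raise FileExistsError(name)
    dst_parent.children[name] = src_parent.children.pop(name)


def copy_file(src: Node, dst_parent: Node, new_name: str) -> Node:
    """Copy a file by creating a second node that points at the same manifest."""
    if src.manifest_id is None:
        raise ValueError("copy_file expects a file node")
    node = Node(name=new_name, manifest_id=src.manifest_id)
    dst_parent.children[new_name] = node
    return node
```

In this sketch a copy costs one new metadata record regardless of file size, because both nodes reference the same stored content.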

Cross-user deduplication posture

Deduplication saves real storage on multi-user instances, but it can also act as a side channel: if an upload is skipped or completes noticeably faster, a user can infer that someone else already stores the same content. Cotton's design can separate physical deduplication from user-visible behavior, so operators can reduce obvious cross-user existence signals.
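One possible way to separate the two, shown here purely as an assumption and not as Cotton's documented behavior, is to answer "which chunks are missing?" from the requesting user's own upload history while still keeping a single physical copy per hash. The `DedupServer` class and its methods are hypothetical.

```python
import hashlib


class DedupServer:
    """Sketch: shared physical storage, per-user view of what already exists."""

    def __init__(self) -> None:
        self._physical: dict[str, bytes] = {}        # one copy per hash, shared
        self._uploaded_by: dict[str, set[str]] = {}  # user -> hashes that user has sent

    def missing_for(self, user: str, hashes: list[str]) -> list[str]:
        """Answer only from this user's own history, never from other users'."""
        known = self._uploaded_by.get(user, set())
        return [h for h in hashes if h not in known]

    def upload(self, user: str, claimed_hash: str, data: bytes) -> None:
        # The bytes are always received and hashed, whether or not a copy exists.
        if hashlib.sha256(data).hexdigest() != claimed_hash:
            raise ValueError("uploaded bytes do not match the claimed hash")
        self._physical.setdefault(claimed_hash, data)  # physical dedup happens here
        self._uploaded_by.setdefault(user, set()).add(claimed_hash)

    def physical_chunk_count(self) -> int:
        return len(self._physical)
```

The trade-off in this sketch is bandwidth: a second user uploads bytes the server already holds, in exchange for not learning that they were already there.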

Backend paths

Filesystem-backed storage can segment hash keys into directory paths, while S3-compatible storage can use the same logical object identity. The upper layers do not need to care whether the bytes land on local disk or object storage.
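A sketch of that mapping follows. The two-level, two-character fan-out and the `chunks` prefix are illustrative choices, not values taken from Cotton; `fs_path` and `s3_key` are hypothetical helper names.

```python
from pathlib import PurePosixPath

_HEX = set("0123456789abcdef")


def _check(hex_hash: str) -> None:
    # A SHA-256 digest is 64 lowercase hex characters.
    if len(hex_hash) != 64 or not set(hex_hash) <= _HEX:
        raise ValueError("expected a 64-character lowercase hex SHA-256 digest")


def fs_path(root: str, hex_hash: str, levels: int = 2, width: int = 2) -> PurePosixPath:
    """Fan a hash out into nested directories so no single directory grows huge."""
    _check(hex_hash)
    parts = [hex_hash[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, hex_hash)


def s3_key(prefix: str, hex_hash: str) -> str:
    """Object stores have flat key spaces, so the hash can be used directly."""
    _check(hex_hash)
    return f"{prefix.rstrip('/')}/{hex_hash}"
```

Both functions take the same hash as input, so code above the storage layer only ever handles the logical identity.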

Reclaim-safe cleanup

A chunk is only safe to remove when the database says no live feature still references it. Snapshots, previews, backup artifacts, versions, shares, and active manifests all need explicit retention paths.
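The rule can be expressed as a set difference: a chunk is reclaimable only if it appears in no retention source at all. The sketch below assumes each feature can list the chunk hashes it still needs; the `reclaimable` function and the source names in the usage example are hypothetical.

```python
from typing import Iterable, Mapping


def reclaimable(stored: set[str], references: Mapping[str, Iterable[str]]) -> set[str]:
    """Return stored chunk hashes that no reference source still names.

    `references` maps a source name (for example "snapshots" or "versions")
    to the chunk hashes that source wants kept.
    """
    live: set[str] = set()
    for hashes in references.values():
        live.update(hashes)
    return stored - live
```

For instance, with `stored = {"c1", "c2", "c3"}` and references `{"active_manifests": ["c1"], "snapshots": ["c2"]}`, only `c3` is returned. Forgetting to register a source, such as previews, would make its chunks look unreferenced, which is why every feature needs an explicit retention path.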

Reference graph proof

The model is concrete: clients upload SHA-256 chunks, manifests describe ordered chunk lists, layout records describe where files appear, and cleanup waits for the reference graph before reclaiming physical content.

Chunk upload is idempotent when the same hash already exists.
Visible files can move without moving the underlying bytes.
Snapshots and versions can preserve references instead of copying whole trees.
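As an illustration of the last point, a snapshot can be modeled as a frozen copy of the path-to-manifest mapping. The `Tree` class below is a hypothetical sketch: it copies only small path metadata, never chunk bytes, and it reports which manifests cleanup must treat as live.

```python
from types import MappingProxyType
from typing import Mapping


class Tree:
    """Hypothetical visible tree: path -> manifest id."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}
        self.snapshots: dict[str, Mapping[str, str]] = {}

    def snapshot(self, label: str) -> None:
        # Freeze the current path -> manifest mapping; chunk bytes are not read or copied.
        self.snapshots[label] = MappingProxyType(dict(self.files))

    def restore(self, label: str) -> None:
        # Point the visible tree back at the manifests the snapshot recorded.
        self.files = dict(self.snapshots[label])

    def live_manifests(self) -> set[str]:
        """Manifests that cleanup must keep: current files plus every snapshot."""
        live = set(self.files.values())
        for snap in self.snapshots.values():
            live.update(snap.values())
        return live
```

After a file is overwritten, its old manifest stays in `live_manifests()` for as long as a snapshot references it, so its chunks remain protected from reclamation.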

The product follows the hash

Cotton is easier to trust because the storage model explains the product. Deduplication, resume, previews, snapshots, restore, and integrity are not separate tricks; they are consequences of content identity.

The discipline it demands

Content-addressed storage demands more bookkeeping than a simple folder wrapper. Because the visible tree and the physical bytes are separate, Cotton has to keep database references, background verification, and garbage collection disciplined.
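Background verification is straightforward to express when the key is the hash: re-hash what is stored and compare. The function below is a minimal sketch over an in-memory mapping; `find_corrupt` is a hypothetical name, and a real verifier would presumably stream chunks from the backend rather than hold them in memory.

```python
import hashlib
from typing import Mapping


def find_corrupt(chunks: Mapping[str, bytes]) -> list[str]:
    """Return the keys whose stored bytes no longer hash to the key."""
    return sorted(
        key
        for key, data in chunks.items()
        if hashlib.sha256(data).hexdigest() != key
    )
```

An empty result means every stored chunk still matches its identity; any key returned points at bytes that have changed since they were written.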

FAQ

Direct answers

Why use content-addressed storage in a self-hosted cloud?

It gives the cloud a stable identity model for deduplication, resumable uploads, integrity checks, snapshots, and safe cleanup. That matters more as datasets and user counts grow.

Does content addressing replace encryption?

No. Content addressing identifies logical content. Cotton still writes stored data through its streaming encryption pipeline before persistence.

Can the same model work on local disk and object storage?

Yes. Filesystem and S3-compatible backends can use the same logical chunk identities, while Postgres remains the source of truth for live references and layout metadata.