Chunk identity
Every physical content chunk is addressed by its hash. The hash is not a label added after storage; it is the key the storage model uses to decide whether content already exists and can be safely reused.
- Duplicate bytes do not need duplicate physical storage.
- Interrupted uploads can retry only missing chunks.
- The server can validate that the uploaded bytes match the claimed identity.
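The three properties above can be illustrated with a minimal in-memory sketch. `ChunkStore`, `put`, `missing`, and `chunk_hash` are hypothetical names chosen for illustration, not Cotton's actual API; SHA-256 is used because it is the hash this document names for chunk uploads.

```python
import hashlib


def chunk_hash(data: bytes) -> str:
    """Content identity of a chunk: the hex SHA-256 of its bytes."""
    return hashlib.sha256(data).hexdigest()


class ChunkStore:
    """In-memory stand-in for a content-addressed chunk store (hypothetical)."""

    def __init__(self) -> None:
        self._chunks: dict[str, bytes] = {}

    def missing(self, hashes: list[str]) -> list[str]:
        """Return the hashes the store does not hold yet, in the given order."""
        return [h for h in hashes if h not in self._chunks]

    def put(self, claimed_hash: str, data: bytes) -> bool:
        """Store data under its hash. Returns True only if new bytes were written."""
        # Validate that the uploaded bytes match the identity the client claims.
        if chunk_hash(data) != claimed_hash:
            raise ValueError("uploaded bytes do not match the claimed hash")
        # Duplicate content is already stored: nothing new to write.
        if claimed_hash in self._chunks:
            return False
        self._chunks[claimed_hash] = data
        return True
```

Under these assumptions, a client resuming an interrupted upload would pass its full hash list to `missing` and send only the chunks that come back.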
Manifests, not loose files
A visible file points to a manifest, and the manifest describes ordered chunks plus file-level properties. This is why a file can be large, seekable, previewable, versioned, shared, and restorable without becoming a single fragile blob.
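As a rough sketch of why an ordered chunk list keeps a large file seekable, the example below maps a byte offset to the chunk that holds it. The `Manifest` and `ChunkRef` types, their fields, and the `locate` method are assumptions made for illustration; the real manifest schema is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    digest: str  # content hash of the chunk
    size: int    # chunk length in bytes


@dataclass(frozen=True)
class Manifest:
    chunks: tuple[ChunkRef, ...]  # ordered: concatenating them yields the file
    content_type: str             # example of a file-level property

    @property
    def size(self) -> int:
        return sum(chunk.size for chunk in self.chunks)

    def locate(self, offset: int) -> tuple[int, int]:
        """Map a file offset to (chunk index, offset inside that chunk)."""
        if not 0 <= offset < self.size:
            raise IndexError("offset outside file")
        for index, chunk in enumerate(self.chunks):
            if offset < chunk.size:
                return index, offset
            offset -= chunk.size
        raise AssertionError("unreachable: offset was range-checked above")
```

A reader that wants bytes from the middle of a file only needs the chunks that `locate` points at, not the whole file.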
Layouts and nodes
Cotton stores the user-facing tree as layout and node metadata. The folder tree is lightweight metadata over stable content references, so copy, snapshot, restore, trash, and version flows do not have to rewrite every physical byte.
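A minimal sketch of the idea, assuming a hypothetical `Node` record that holds a name, a parent, and a manifest reference: moving or copying a file only rewrites that small record, and the referenced content is never touched. The field names and the `move` and `copy` helpers are illustrative, not Cotton's schema.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Node:
    name: str
    parent: Optional[str]       # id of the parent folder node, None for the root
    manifest_id: Optional[str]  # None for folders; files reference a manifest


def move(nodes: dict[str, Node], node_id: str, new_parent: str) -> None:
    """Re-parent a node. Only metadata changes; no chunk is read or written."""
    nodes[node_id] = replace(nodes[node_id], parent=new_parent)


def copy(nodes: dict[str, Node], node_id: str, new_id: str, new_parent: str) -> None:
    """Create a second node that points at the same manifest."""
    nodes[new_id] = replace(nodes[node_id], parent=new_parent)
```

Both operations cost the same for a one-kilobyte file and a one-terabyte file, because the content reference is carried over unchanged.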
Cross-user deduplication posture
Deduplication saves real storage on multi-user instances, but it must not reveal, for example through response timing, whether another user already holds the same content. Cotton's design can separate physical deduplication from user-visible behavior, so operators can reduce obvious cross-user existence signals.
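One possible way to separate the two, shown here only as a sketch and not as Cotton's confirmed behavior: the server answers "do I need to upload this?" from the requesting user's own references, while still storing each chunk once physically. `DedupStore`, `needs_upload`, and `upload` are hypothetical names, and this policy narrows the obvious existence signal without claiming to remove every timing difference.

```python
import hashlib


class DedupStore:
    """Physical dedup across users, existence answers scoped per user (sketch)."""

    def __init__(self) -> None:
        self._physical: dict[str, bytes] = {}   # one copy per distinct chunk
        self._owners: dict[str, set[str]] = {}  # chunk hash -> users referencing it

    def needs_upload(self, user: str, digest: str) -> bool:
        # Answer from this user's own references only, never from global existence.
        return user not in self._owners.get(digest, set())

    def upload(self, user: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Keep the first physical copy; later uploads of the same bytes add no storage.
        self._physical.setdefault(digest, data)
        self._owners.setdefault(digest, set()).add(user)
        return digest

    def physical_count(self) -> int:
        return len(self._physical)
```

The trade-off in this sketch is bandwidth: a second user still transfers bytes the server already has, in exchange for not learning that someone else uploaded them first.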
Backend paths
Filesystem-backed storage can segment hash keys into directory paths, while S3-compatible storage can use the same logical object identity. The upper layers do not need to care whether the bytes land on local disk or object storage.
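The mapping from one logical identity to two physical layouts might look like the sketch below. The two-level, two-character fan-out and the `prefix/hash` object key are assumptions picked for illustration; the actual segmentation scheme is not specified here.

```python
from pathlib import PurePosixPath


def fs_path(root: str, chunk_hash: str, levels: int = 2, width: int = 2) -> PurePosixPath:
    """Segment the leading characters of the hash into nested directories."""
    parts = [chunk_hash[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *parts, chunk_hash)


def object_key(prefix: str, chunk_hash: str) -> str:
    """Key for an S3-compatible bucket: same identity, flat namespace."""
    return f"{prefix}/{chunk_hash}"
```

With these defaults, a hash beginning `ab12...` lands under `ab/12/` on disk, which keeps any single directory from accumulating every chunk, while the object store uses the hash directly.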
Reclaim-safe cleanup
A chunk is safe to remove only when the database shows that no live feature still references it. Snapshots, previews, backup artifacts, versions, shares, and active manifests all need explicit retention paths.
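The rule can be sketched as a set difference: a chunk is a reclaim candidate only if no retention source lists it. The `reclaimable` function and the shape of its `references` argument are hypothetical; a real implementation would query the database and would also have to account for uploads still in flight.

```python
def reclaimable(stored: set[str], references: dict[str, set[str]]) -> set[str]:
    """Return stored chunk hashes that no retention source references.

    `references` maps a retention source (for example "snapshots" or
    "versions") to the set of chunk hashes that source keeps alive.
    """
    # Union of every hash that any source still needs.
    live = set().union(*references.values())
    return stored - live
```

Adding a new feature that holds content means adding a new entry to `references`; forgetting to do so is exactly the failure the explicit retention paths are meant to prevent.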
Reference graph proof
The model is concrete: clients upload SHA-256 chunks, manifests describe ordered chunk lists, layout records describe where files appear, and cleanup reclaims physical content only after the reference graph shows that nothing still points to it.
- Chunk upload is idempotent when the same hash already exists.
- Visible files can move without moving the underlying bytes.
- Snapshots and versions can preserve references instead of copying whole trees.
The product follows the hash
Cotton is easier to trust because the storage model explains the product. Deduplication, resume, previews, snapshots, restore, and integrity are not separate tricks; they are consequences of content identity.
The discipline it demands
Content-addressed storage is more explicit than a simple folder wrapper. Cotton has to keep database references, background verification, and garbage collection disciplined because the visible tree and physical bytes are separate.