Architecture

How Cotton stores a file. End to end.

Content-addressed chunks. Streaming AES-GCM with per-chunk auth tags. Compression before encryption. Manifests over filesystem or S3. One pipeline from HTTP to disk — no cross-runtime glue, no plugin layer, no patchwork.

  • Content-addressed chunks
  • Manifest graph
  • Seekable streams
  • Reference snapshots

Upload shape

The browser hashes chunks in a worker and uploads them in parallel. After an interruption, it re-sends only the chunks that are still missing. Small and large files follow the same route, so the system needs neither a special, fragile path for big uploads nor a full-file memory buffer.

  • Chunk identity is SHA-256 based.
  • A file manifest assembles ordered chunks into visible content.
  • Interrupted uploads can re-send only the missing pieces.
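
The resume behavior above can be sketched as follows. This is a minimal illustration, not Cotton's actual client code: the fixed chunk size, the function names `chunk_ids` and `missing_chunks`, and the idea that the server reports the set of chunk ids it already holds are all assumptions made for the example.

```python
import hashlib

# Hypothetical chunk size, kept tiny so the example is easy to follow.
# The real chunk size and chunking strategy are not stated in this page.
CHUNK_SIZE = 4


def chunk_ids(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return each chunk's SHA-256 hex digest."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def missing_chunks(manifest: list[str], stored: set[str]) -> list[str]:
    """Return the chunk ids that still need uploading, in order, without duplicates."""
    seen: set[str] = set()
    todo: list[str] = []
    for chunk_id in manifest:
        if chunk_id not in stored and chunk_id not in seen:
            seen.add(chunk_id)
            todo.append(chunk_id)
    return todo


data = b"aaaabbbbaaaacc"
ids = chunk_ids(data)                       # ordered chunk ids for the file
already_uploaded = {ids[0]}                 # pretend the upload was interrupted here
to_send = missing_chunks(ids, already_uploaded)
```

Because identity is derived from content, the repeated `aaaa` chunk maps to one id, and a resumed upload only needs the ids in `to_send`.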

Storage model

Cotton separates the layout graph from stored content. A visible file is a database-backed abstraction over manifests and chunks, not a single loose object sitting in a user folder. Folders, nodes, file versions, snapshots, manifests, and chunks each have clear jobs, which makes navigation, restore, deduplication, and cleanup predictable.

  • Layouts describe where content appears.
  • Manifests describe what content is.
  • Chunks can be reused safely when multiple files reference the same bytes.
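
A toy in-memory version of this separation might look like the sketch below. The `Store` class, its field names, and the choice to derive a manifest id by hashing the ordered chunk ids are assumptions for illustration only; the real schema is database-backed and is not described here.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Store:
    # chunk id -> chunk bytes (content)
    chunks: dict[str, bytes] = field(default_factory=dict)
    # manifest id -> ordered chunk ids (what the content is)
    manifests: dict[str, list[str]] = field(default_factory=dict)
    # visible path -> manifest id (where the content appears)
    layout: dict[str, str] = field(default_factory=dict)

    def put_file(self, path: str, data: bytes, chunk_size: int = 4) -> str:
        ids = []
        for i in range(0, len(data), chunk_size):
            piece = data[i:i + chunk_size]
            chunk_id = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(chunk_id, piece)  # identical bytes are stored once
            ids.append(chunk_id)
        # Assumed scheme: the manifest id is a hash over the ordered chunk ids.
        manifest_id = hashlib.sha256("\n".join(ids).encode()).hexdigest()
        self.manifests[manifest_id] = ids
        self.layout[path] = manifest_id
        return manifest_id

    def read_file(self, path: str) -> bytes:
        manifest_id = self.layout[path]
        return b"".join(self.chunks[c] for c in self.manifests[manifest_id])


store = Store()
first = store.put_file("/docs/a.txt", b"aaaabbbb")
second = store.put_file("/copy/b.txt", b"aaaabbbb")   # same bytes, different place
store.layout["/docs/renamed.txt"] = store.layout.pop("/docs/a.txt")  # layout-only move
```

In this sketch a rename touches only `layout`, while two files with the same bytes share one manifest and one set of chunks.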

Backend contract

The storage backend only needs a small contract: write data for a key and read it back. Filesystem storage can place chunk blobs under hash-derived segments, while S3-compatible storage can use the same logical keys. Listing and delete support make verification and cleanup richer, but the core model stays simple.

  • The public chunk hash is the storage identity.
  • Backend objects do not reveal the user's folder tree.
  • The database remains the source of truth for live references.
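
The contract is small enough to sketch in a few lines. Everything named here is hypothetical: the `Backend` protocol, the two-level `ab/cd/<key>` directory split, and the in-memory class standing in for an S3-compatible store are illustrations of the idea, not Cotton's actual interfaces or on-disk layout.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class Backend(Protocol):
    """Minimal contract: write data for a key and read it back."""

    def write(self, key: str, data: bytes) -> None: ...
    def read(self, key: str) -> bytes: ...


def segmented_path(root: Path, key: str) -> Path:
    """Place a blob under hash-derived segments (assumed layout: root/ab/cd/<key>)."""
    return root / key[:2] / key[2:4] / key


class FilesystemBackend:
    def __init__(self, root: Path) -> None:
        self.root = root

    def write(self, key: str, data: bytes) -> None:
        path = segmented_path(self.root, key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def read(self, key: str) -> bytes:
        return segmented_path(self.root, key).read_bytes()


class MemoryBackend:
    """Stand-in for an object store: same logical keys, no directory segments."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def read(self, key: str) -> bytes:
        return self.objects[key]


def roundtrip(backend: Backend, key: str, data: bytes) -> bytes:
    """Works against any backend that satisfies the contract."""
    backend.write(key, data)
    return backend.read(key)
```

Note that neither backend sees a folder name or file name: the key is all it gets, which is consistent with backend objects not revealing the user's folder tree.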

Transform pipeline

Data is compressed before encryption and then written to either filesystem or S3-backed storage. Compression, encryption, and backend persistence are not optional afterthoughts; they are the normal ingest path, built around streaming buffers instead of loading the full object.

  • Inline Zstd keeps storage savings in the hot path.
  • Streaming AES-GCM authenticates content per chunk.
  • Filesystem and S3 backends share the same logical pipeline.
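
One way to see why compression has to come before encryption: well-encrypted output is statistically close to random bytes, and random bytes do not compress. The sketch below demonstrates only that ordering argument. It uses `zlib` from the standard library as a stand-in for Zstd and `os.urandom` as a stand-in for ciphertext; it performs no real encryption and is not Cotton's pipeline.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Size of the data after compression (zlib here, standing in for Zstd)."""
    return len(zlib.compress(data))


# Repetitive plaintext, the kind of data compression helps with.
plain = b"cotton stores chunks. " * 4096

# Stand-in for the same data after encryption: random bytes of equal length.
ciphertext_like = os.urandom(len(plain))

size_compress_first = compressed_size(plain)            # compress, then encrypt
size_encrypt_first = compressed_size(ciphertext_like)   # encrypt, then try to compress
```

Compressing the plaintext shrinks it dramatically, while the ciphertext-like input stays at roughly its original size, so a pipeline that encrypted first would give up the storage savings.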

Serving model

Reads are assembled from chunks into a byte stream without rebuilding a whole file first. That is why downloads, previews, media seeking, range requests, share pages, and WebDAV can all ride the same storage design.

  • Large media stays seekable.
  • Preview generators can work against encrypted chunk streams.
  • HTTP range responses do not require temporary full-file copies.
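
Serving a byte range from chunks comes down to offset arithmetic. The sketch below assumes a manifest that knows each chunk's plaintext size; `Slice`, `plan_range`, and `read_range` are hypothetical names, and the chunks are plain bytes for simplicity.

```python
from typing import NamedTuple


class Slice(NamedTuple):
    chunk_index: int
    start: int  # offset inside the chunk, inclusive
    end: int    # offset inside the chunk, exclusive


def plan_range(chunk_sizes: list[int], first: int, last: int) -> list[Slice]:
    """Map an inclusive byte range (as in `Range: bytes=first-last`) to per-chunk slices."""
    plan: list[Slice] = []
    offset = 0
    for index, size in enumerate(chunk_sizes):
        chunk_start, chunk_end = offset, offset + size
        if chunk_end > first and chunk_start <= last:
            plan.append(
                Slice(
                    index,
                    max(first, chunk_start) - chunk_start,
                    min(last + 1, chunk_end) - chunk_start,
                )
            )
        offset = chunk_end
        if offset > last:
            break  # later chunks cannot overlap the requested range
    return plan


def read_range(chunks: list[bytes], first: int, last: int) -> bytes:
    """Assemble only the requested bytes, touching only the chunks that overlap."""
    sizes = [len(c) for c in chunks]
    return b"".join(
        chunks[s.chunk_index][s.start:s.end] for s in plan_range(sizes, first, last)
    )


chunks = [b"abcd", b"efgh", b"ij"]
middle = read_range(chunks, 2, 5)
```

With per-chunk authenticated encryption, each touched chunk would presumably be fetched and verified as a whole before slicing, but chunks outside the range never need to be read.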

Recovery model

Snapshots record references rather than copying the whole tree. Restore can switch layout state without turning rollback into a giant background copy job, while garbage collection still has a clear retention contract.

  • Snapshots are first-class layout operations.
  • Versions and trash fit into the same lifecycle.
  • Unreferenced content is rechecked before reclaim.
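
A reference-based snapshot and a conservative reclaim pass can be sketched as below. The dictionaries and function names are illustrative assumptions: a layout is modeled as path to manifest id, and a snapshot is simply a copy of those references. The real retention rules (versions, trash, shares) are richer than this.

```python
def take_snapshot(layout: dict[str, str]) -> dict[str, str]:
    """Copy the path -> manifest id references. No file bytes are copied."""
    return dict(layout)


def live_chunks(manifests: dict[str, list[str]], roots: list[dict[str, str]]) -> set[str]:
    """Collect every chunk reachable from any root (live layout or snapshot)."""
    live: set[str] = set()
    for root in roots:
        for manifest_id in root.values():
            live.update(manifests[manifest_id])
    return live


def reclaimable(all_chunks: set[str], manifests: dict[str, list[str]],
                roots: list[dict[str, str]]) -> set[str]:
    """Chunks that no root references."""
    return all_chunks - live_chunks(manifests, roots)


def confirm_reclaim(candidates: set[str], all_chunks: set[str],
                    manifests: dict[str, list[str]],
                    roots: list[dict[str, str]]) -> set[str]:
    """Recheck candidates against current references right before deleting."""
    return candidates & reclaimable(all_chunks, manifests, roots)


manifests = {"m1": ["c1", "c2"], "m2": ["c2", "c3"]}
all_chunks = {"c1", "c2", "c3"}
layout = {"/a": "m1", "/b": "m2"}

snapshot = take_snapshot(layout)
del layout["/b"]                                   # user deletes a file

with_snapshot = reclaimable(all_chunks, manifests, [layout, snapshot])
without_snapshot = reclaimable(all_chunks, manifests, [layout])

restored = dict(snapshot)                          # restore switches references back
rechecked = confirm_reclaim(without_snapshot, all_chunks, manifests, [restored])
```

While the snapshot exists, nothing is reclaimable. Once it is gone, `c3` becomes a candidate, and the recheck drops it again if a restore has brought the reference back in the meantime.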

Operational model

Background manifest hashing, storage consistency checks, preview generation, token cleanup, and temp-file cleanup are part of the product path. Operators get explicit warnings instead of discovering silent storage drift later.
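
A storage consistency check of this kind can be reduced to comparing two sets: the chunk keys the database references and the keys the backend actually lists. The sketch below is an assumption about the shape of such a check, with hypothetical names (`DriftReport`, `check_consistency`, `drift_warnings`); it does not reflect Cotton's actual job or warning format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftReport:
    missing: frozenset[str]   # referenced by the database, absent from the backend
    orphaned: frozenset[str]  # present in the backend, referenced by nothing


def check_consistency(referenced: set[str], stored: set[str]) -> DriftReport:
    """Compare database references with the backend's key listing."""
    return DriftReport(
        missing=frozenset(referenced - stored),
        orphaned=frozenset(stored - referenced),
    )


def drift_warnings(report: DriftReport) -> list[str]:
    """Turn a report into explicit operator-facing messages."""
    messages = []
    for key in sorted(report.missing):
        messages.append(f"missing chunk: {key}")
    for key in sorted(report.orphaned):
        messages.append(f"orphaned chunk: {key}")
    return messages


report = check_consistency(referenced={"c1", "c2"}, stored={"c2", "c3"})
messages = drift_warnings(report)
```

Missing chunks are the serious case, since a live file can no longer be read in full. Orphaned chunks are candidates for cleanup once references have been rechecked.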

Storage model proof

The architecture page is not asking visitors to trust a diagram. The product proof is visible in the storage vocabulary and page surface: chunks, manifests, snapshots, versions, range reads, WebDAV, previews, and integrity checks all point back to the same model.

  • SHA-256 chunk identity is the storage anchor.
  • Manifests describe file bytes separately from folder layout.
  • Snapshots and restore operate on references instead of copying every byte again.

Why the model sells

Cotton fits best when you want a file cloud whose internals explain the product instead of fighting it. The same architecture that makes large files practical also makes recovery, sharing, previews, and cleanup easier to reason about.

When a simpler stack wins

A chunk-first storage engine is more deliberate than mounting a folder and calling it a cloud. If you mainly need a broad app suite or direct filesystem semantics, a groupware stack or simple file server may fit better.

Architecture proof

The product behavior comes from the storage model.

Cotton is not a folder tree with a web skin. Upload, read, preview, share, restore, verify, and cleanup all depend on the same content model: chunks, manifests, layout records, and explicit references.

01 Chunk
02 SHA-256
03 Compress
04 AES-GCM
05 Backend
06 Manifest

Layout is separate

Folders, nodes, and visible files are layout state. Stored bytes live behind manifests and chunks.

Content has identity

Chunks and manifests make deduplication, idempotent upload, verification, and reuse part of the normal model.

Reads stay seekable

Range responses, video seeking, and preview extraction can assemble byte ranges without rebuilding whole objects first.

Cleanup sees references

Snapshots, shares, versions, backup artifacts, manifests, and live files all need to be visible before reclaim is safe.

Model conclusion

A file cloud gets simpler when content and layout stop pretending to be the same thing.

The visible file tree can move, restore, share, and version while content identity stays stable underneath. That is why the same architecture keeps showing up in performance, previews, snapshots, WebDAV, and integrity checks.

Model boundary

The model is more explicit than a plain filesystem wrapper. Cotton needs metadata correctness, background verification, and careful reclaim logic because the database is the source of truth for live references.

FAQ

Direct answers

Is Cotton just a filesystem wrapper?

No. Cotton uses a content-addressed storage engine with chunks, manifests, layout records, snapshots, and background verification. It can still expose WebDAV for compatibility, but the internal model is not a plain folder tree on disk.

Why does content addressing matter for users?

It makes duplicate content safe to reuse, interrupted uploads cheaper to resume, snapshots less copy-heavy, and integrity checks more direct.

Does this remove the need for backups?

No. The architecture makes snapshots, versions, restore, and cleanup more coherent inside a live instance. Full server loss still needs tested backups and a restore plan.