Git

Storage

Together with the database, storage is among the most important topics, for two reasons: it will become the primary cost driver at some point, and it must deliver high performance because the repository data is queried heavily.

To keep costs low, it is important to offload all assets of secondary importance (packages, avatars, images) to an alternative storage provider, e.g. S3. This greatly reduces storage costs without any user-facing downsides.
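
As a rough illustration of such offloading, Forgejo's `app.ini` has a `[storage]` section that can point non-repository assets at an S3-compatible backend. The fragment below is a minimal sketch, not the configuration actually in use: the endpoint, bucket name, region, and credentials are all placeholders.

```ini
; Sketch only: every value below is a placeholder, not the real deployment.
[storage]
; "minio" selects the S3-compatible object storage backend
STORAGE_TYPE = minio
; Hypothetical endpoint of the S3 provider
MINIO_ENDPOINT = s3.example.com
MINIO_ACCESS_KEY_ID = <access-key>
MINIO_SECRET_ACCESS_KEY = <secret-key>
; Hypothetical bucket name and region
MINIO_BUCKET = forgejo-assets
MINIO_LOCATION = us-east-1
MINIO_USE_SSL = true
```

The Git repository data itself is not affected by this setting and stays on local disk, which is why disk speed remains relevant.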

Because the repository files are stored on disk, disk speed matters. In addition, a high-availability (HA) architecture is required to tolerate hardware failures and allow for seamless node updates. This is why we are opting for a Ceph cluster across the three nodes, which we are currently working on. Right now, the Git service is running on a single node with frequent backups.

Even with all the HA ideas in mind, there is still a lot of work to be done on the Forgejo side. Having an HA storage setup and an HA database is only half the battle. Forgejo is not yet HA-ready, i.e. custom adaptations are required to make the individual components (queue, cache, cron) work reliably in HA mode.
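
To make the shared-state problem more concrete: multiple Forgejo instances can only cooperate if queue and cache do not live in a single process. A typical prerequisite is to move both to an external service, sketched below with Redis. This is an assumption for illustration only; the hostname is hypothetical, and the fragment is not the custom adaptation referred to above.

```ini
; Sketch only: Redis and the hostname are assumptions, not the actual setup.
[queue]
; Use a shared queue instead of the per-instance default
TYPE = redis
CONN_STR = redis://redis.example.internal:6379/0

[cache]
; Use a shared cache instead of in-process memory
ADAPTER = redis
HOST = redis://redis.example.internal:6379/1
```

Such a configuration does not cover cron, and on its own it is not sufficient to make Forgejo HA-ready.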