Deployment¶
Overview¶
The Green Web Foundation Admin Portal uses GitHub Actions for continuous integration and deployment. The deployment process is automated through a series of reusable workflows that handle testing, permission checks, and deployment to both staging and production environments.
The deployment process is orchestrated through three main workflows:
CI Workflow (ci.yml) - Coordinates testing and deployment
Test Workflow (test.yml) - Runs the test suite
Deploy Workflow (deploy.yml) - Handles deployment to servers
What is happening with each automated deploy via GitHub Actions?¶
How a GitHub Actions-driven deployment is handled depends on who is making it.
Pull requests from external contributors require manual approval before tests run. Collaborators and owners have tests run automatically, allowing them to get changes into staging or production in one go.
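The routing rule above can be summarised in a few lines of Python. This is only an illustrative sketch: the function `select_test_environment` and its arguments are hypothetical and do not appear in the workflows. The environment names `test` and `test-external` are the ones the test workflow accepts.

```python
def select_test_environment(event: str, is_collaborator: bool) -> dict:
    """Hypothetical helper mirroring how CI routes a run to a test environment."""
    # Pushes to master/staging are treated as coming from collaborators.
    if event == "push" or is_collaborator:
        return {"environment": "test", "needs_approval": False}
    # Pull requests from external contributors wait for manual approval.
    return {"environment": "test-external", "needs_approval": True}


print(select_test_environment("pull_request", is_collaborator=False))
```

Keeping the approval requirement attached to a separate environment means the test steps themselves stay identical for both kinds of contributor.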
Once a workflow starting with tests begins running¶
Once the decision to run the tests has been made, they run as follows:
Testing:
Tests run in a matrix against Python 3.11 and 3.12
MariaDB 10.11 and RabbitMQ 3.8 services are automatically started, then the tests run against the services they have exposed.
Energy consumption for each CI run is tracked with Eco CI
Deployment:
Deploys happen on a push to the master or staging branches, and only if the tests and a few safety checks pass. These checks include looking for pending migrations and for a deployment that is already in progress.
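As a rough sketch of the gating described above, the decision can be written as a single function. The name `deploy_target` and its parameters are assumptions made for illustration, not code taken from the workflows. The serialisation of concurrent deploys is left out, since that step queues a deploy rather than deciding whether it should happen.

```python
from typing import Optional

# Branch-to-environment mapping taken from the flow described above.
DEPLOY_TARGETS = {"master": "prod", "staging": "staging"}


def deploy_target(event: str, branch: str, tests_passed: bool,
                  pending_migrations: bool) -> Optional[str]:
    """Return the environment to deploy to, or None if no deploy should happen.

    Hypothetical helper, not part of the actual workflows.
    """
    if event != "push" or branch not in DEPLOY_TARGETS:
        return None  # pull requests and other branches never deploy
    if not tests_passed or pending_migrations:
        return None  # a failed safety check blocks the deploy
    return DEPLOY_TARGETS[branch]


print(deploy_target("push", "staging", tests_passed=True, pending_migrations=False))
```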
Representing this process visually¶
The (rather imposing) flowchart below is intended to help you trace progress through a deploy triggered by an update to the staging or master branches.
GitHub Actions Deployment Flow
flowchart TD
Start([Code Push or PR Event]) --> EventCheck{Event Type?}
EventCheck -->|Push to master/staging| PushFlow[Direct Push]
EventCheck -->|Pull Request| PRFlow[Pull Request]
PushFlow --> IsCollab1[User is Collaborator]
IsCollab1 --> SetRef1[Set ref to branch]
PRFlow --> CheckCollab{Is Collaborator?}
CheckCollab -->|Yes| IsCollab2[Collaborator Status]
CheckCollab -->|No| NotCollab[External Contributor]
IsCollab2 --> SetRef2[Set ref to PR head SHA]
NotCollab --> SetRef3[Set ref to PR head SHA]
SetRef1 --> TestEnv1[Environment: test]
SetRef2 --> TestEnv2[Environment: test]
SetRef3 --> TestEnv3[Environment: test-external]
TestEnv3 --> WaitApproval[Wait for Manual Approval]
WaitApproval --> RunTests3
TestEnv1 --> RunTests1[Run Test Suite]
TestEnv2 --> RunTests2[Run Test Suite]
RunTests1 --> Matrix1[Matrix: Python 3.11, 3.12]
RunTests2 --> Matrix2[Matrix: Python 3.11, 3.12]
RunTests3[Run Test Suite] --> Matrix3[Matrix: Python 3.11, 3.12]
Matrix1 --> Services1[Start Services:<br/>MariaDB, RabbitMQ]
Matrix2 --> Services2[Start Services:<br/>MariaDB, RabbitMQ]
Matrix3 --> Services3[Start Services:<br/>MariaDB, RabbitMQ]
Services1 --> Setup1[Setup Environment:<br/>Python, uv, dependencies]
Services2 --> Setup2[Setup Environment:<br/>Python, uv, dependencies]
Services3 --> Setup3[Setup Environment:<br/>Python, uv, dependencies]
Setup1 --> Pytest1[Run pytest]
Setup2 --> Pytest2[Run pytest]
Setup3 --> Pytest3[Run pytest]
Pytest1 --> TestResult1{Tests Pass?}
Pytest2 --> TestResult2{Tests Pass?}
Pytest3 --> TestResult3{Tests Pass?}
TestResult1 -->|No| Fail1[CI Failed]
TestResult2 -->|No| Fail2[CI Failed]
TestResult3 -->|No| Fail3[CI Failed]
TestResult1 -->|Yes| DeployCheck{Push Event?}
TestResult2 -->|Yes| PRSuccess[PR Tests Passed]
TestResult3 -->|Yes| PRSuccess2[PR Tests Passed]
DeployCheck -->|No - PR| PRSuccess
DeployCheck -->|Yes| BranchCheck{Which Branch?}
BranchCheck -->|master| DeployProd[Deploy to Production]
BranchCheck -->|staging| DeployStaging[Deploy to Staging]
DeployProd --> CheckMigrations1[Check No Pending Migrations]
DeployStaging --> CheckMigrations2[Check No Pending Migrations]
CheckMigrations1 --> MigrationCheck1{Migrations OK?}
CheckMigrations2 --> MigrationCheck2{Migrations OK?}
MigrationCheck1 -->|No| MigrationFail1[Deploy Failed:<br/>Run migrations manually]
MigrationCheck2 -->|No| MigrationFail2[Deploy Failed:<br/>Run migrations manually]
MigrationCheck1 -->|Yes| Serialize1[Serialize Deploy<br/>with turnstyle]
MigrationCheck2 -->|Yes| Serialize2[Serialize Deploy<br/>with turnstyle]
Serialize1 --> AnsibleDeploy1[Run Ansible: deploy.yml]
Serialize2 --> AnsibleDeploy2[Run Ansible: deploy.yml]
AnsibleDeploy1 --> AnsibleWorkers1[Run Ansible: deploy-workers.yml]
AnsibleDeploy2 --> AnsibleWorkers2[Run Ansible: deploy-workers.yml]
AnsibleWorkers1 --> DeployComplete1[Deployment Complete]
AnsibleWorkers2 --> DeployComplete2[Deployment Complete]
style Start fill:#e1f5ff
style DeployComplete1 fill:#d4edda
style DeployComplete2 fill:#d4edda
style Fail1 fill:#f8d7da
style Fail2 fill:#f8d7da
style Fail3 fill:#f8d7da
style MigrationFail1 fill:#fff3cd
style MigrationFail2 fill:#fff3cd
style WaitApproval fill:#fff3cd
Manual Deployment with Ansible¶
While deployment is automated via GitHub Actions when pushing to master or staging branches, you can still deploy manually when needed.
Standard Deployment Process¶
The automated deployment (and manual deployment) runs through the following steps:
Check Python and Node.js are installed at recent versions
Fetch dependencies using uv and npm
Run build steps to generate static files for Django and frontend pipelines
Reload the servers
Update caddy/nginx (static files), gunicorn (web requests), and dramatiq (queue workers)
See ansible/deploy.yml and ansible/deploy-workers.yml for more information.
When to Deploy Manually¶
Database Migrations: If your code includes database migrations, the automatic deployment will fail with a migration check error. You must deploy manually with migrations.
Emergency Fixes: When you need to deploy outside the normal GitHub Actions flow.
Staging Testing: To test changes in the staging environment before merging to master.
Manual Deployment Commands¶
To deploy manually, you will need the following prerequisites:
SSH access set up for the correct servers:
A copy of the most up-to-date env file for the environment you’re deploying to (env.staging or env.prod), copied to the .env file in your current local working directory. These are available from the GWF 1Password account. Remember to replace this with your local env file again after deploying!
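Because it is easy to forget to restore your local env file, the swap can be wrapped in a small context manager. The sketch below is a hypothetical helper, not something that exists in the repository: the name `swapped_env_file` and the `.local-backup` suffix are assumptions chosen for illustration.

```python
from contextlib import contextmanager
from pathlib import Path
import shutil


@contextmanager
def swapped_env_file(deploy_env: Path, local_env: Path = Path(".env")):
    """Temporarily replace local_env with deploy_env, restoring it afterwards."""
    backup = local_env.with_name(local_env.name + ".local-backup")
    had_local = local_env.exists()
    if had_local:
        shutil.copy2(local_env, backup)  # keep your own settings safe
    shutil.copy2(deploy_env, local_env)
    try:
        yield local_env
    finally:
        if had_local:
            backup.replace(local_env)  # put the local env file back
        else:
            local_env.unlink()  # there was no local file to begin with


# Example use (commented out so that nothing is deployed by accident):
# import subprocess
# with swapped_env_file(Path("env.staging")):
#     subprocess.run(["just", "release", "staging"], check=True)
```

The `finally` clause restores the local file even if the deploy command fails partway through.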
Standard deployment (no migrations):
# Production
just release
# Staging
just release staging
# Or run ansible directly:
ansible-playbook -i ansible/inventories/prod.yml ./ansible/deploy.yml
ansible-playbook -i ansible/inventories/prod.yml ./ansible/deploy-workers.yml
Deployment with migrations:
# Production
just release_migrate
# Staging
just release_migrate staging
# Or run ansible directly:
ansible-playbook -i ansible/inventories/prod.yml ./ansible/deploy.yml
ansible-playbook -i ansible/inventories/prod.yml ./ansible/migrate.yml
ansible-playbook -i ansible/inventories/prod.yml ./ansible/deploy-workers.yml
Important: The GitHub Actions workflow includes a migration check that will prevent deployment if migrations are pending. This is a safety feature - always run migrations explicitly using just release_migrate or the migrate playbook.
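For a sense of what such a check can look like, the sketch below uses Django's `migrate --check` option, which exits with a non-zero status when unapplied migrations are detected (available since Django 3.1). This is an assumption about one possible approach; the actual workflow may implement its check differently, and the function name `has_pending_migrations` is hypothetical.

```python
import subprocess
import sys


def has_pending_migrations(run=subprocess.run) -> bool:
    """Return True if `manage.py migrate --check` exits with a non-zero status.

    Note that a non-zero status can also mean the command itself failed,
    for example because the database was unreachable.
    """
    result = run([sys.executable, "manage.py", "migrate", "--check"])
    return result.returncode != 0


# Example use from the project root (commented out, as it needs a configured database):
# if has_pending_migrations():
#     raise SystemExit("Pending migrations found: deploy manually with migrations")
```

The `run` parameter exists only so the function can be exercised without a real Django project.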
If you need to make changes to how GitHub Actions are set up¶
The deployment system consists of three coordinated workflows and one shared composite action:
Main CI Workflow (.github/workflows/ci.yml)¶
The main coordinator that:
Determines user permissions (collaborator vs external contributor)
Routes to appropriate test environment
Triggers deployment for push events to master or staging
Test Workflow (.github/workflows/test.yml)¶
A reusable workflow that:
Accepts environment (test or test-external) and ref parameters
Sets up MariaDB and RabbitMQ services
Runs pytest against Python 3.11 and 3.12
Requires approval for external contributors via the test-external environment
Deploy Workflow (.github/workflows/deploy.yml)¶
A reusable workflow that:
Accepts environment (staging or prod) parameter
Checks for pending migrations (fails if any exist)
Uses turnstyle to serialize deployments
Runs Ansible playbooks against the specified inventory
Tracks energy consumption with Eco CI
Setup Environment Action (.github/actions/setup-environment)¶
A shared composite action that:
Installs specified Python version
Installs uv for dependency management
Creates virtual environment and syncs locked dependencies
Used by both test and deploy workflows
Note on MyST Markdown Syntax¶
This documentation uses MyST (Markedly Structured Text) syntax with ```{mermaid} fences. This is compatible with Sphinx documentation but may not render in the standard VS Code markdown preview without the Mermaid extension.
Understanding our infrastructure¶
Broadly speaking, the green web platform is deployed onto a set of servers that are kept under config management, using a combination of Ansible, mainly for provisioning VMs, and Nomad for scheduling the jobs run on these VMs.
See the staff-only private GitHub repository for more, and if you have access, see the infrastructure tag on the team Trello board.
Other relevant links:¶
Google doc design document for making recent infrastructure updates
Our Nomad instance, which shows the state of most scheduled services and jobs
Access to the servers, and wider config, is maintained using a set of Ansible scripts in a private GitHub repository
How our web servers are deployed¶
The chart below outlines a high level model of how different moving parts serve web requests.
A request comes in, and normally caddy either serves static files or reverse proxies the request to gunicorn, our Django web server. Once the request reaches gunicorn, which is running at least one “worker” process, a worker actually services the request. Represented visually, it looks like so:
The simplest example
flowchart LR
request[http request]
request-->caddy
caddy-->master
master
subgraph gunicorn
master-->worker1
subgraph worker1[worker ]
%% render left to right
%% to make them stack
direction LR
end
end
One gunicorn master process with one worker cannot serve many requests by itself, so in production we use multiple workers.
Gunicorn allows us to use multiple types of workers, to fit the workload we are serving and the resources we have available, like available RAM, free cores, and CPU cycles.
For a workload where we have free RAM and CPU, we’d run 4 sync workers, like so:
A model closer to production - 4 sync workers¶
flowchart LR
request
request-->master
subgraph gunicorn
master-->worker1
master-->worker2
master-->worker3
master-->worker4
subgraph worker1[worker 1]
end
subgraph worker2[worker 2]
end
subgraph worker3[worker 3]
end
subgraph worker4[worker 4]
end
end
Where RAM is constrained, but we have spare CPU capacity and the work is IO bound, we can allocate multiple threads within a worker. Because the threads within a worker share memory, a single worker can serve multiple requests at once without us needing to allocate as much memory:
2 workers, each with 2 threads¶
flowchart LR
request
request-->master
subgraph gunicorn
master-->worker1
master-->worker2
subgraph worker1[worker 1]
%% render left to right
%% to make them stack
direction LR
thread1
thread2
end
subgraph worker2[worker 2]
%% render left to right
%% to make them stack
direction LR
thread3
thread4
end
end
As the workloads we serve change, we may need to update the number and kinds of workers, to make the best use of the resources available to serve the workloads facing us. See “scaling processes” below for more.
See the code
Run the code below from the project root, to run gunicorn:
# run gunicorn using the `greenweb.wsgi` for defining the behavior inside django,
# the file `gunicorn.conf.py` to define gunicorn's behaviour,
# and binding to port 8000 of the network address 0.0.0.0
gunicorn greenweb.wsgi --bind 0.0.0.0:8000 -c gunicorn.conf.py
See gunicorn.conf.py in the code base for further information about the workers in use, and greenweb/wsgi.py to see which Django config file is used to define how the Django application behaves.
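To make the worker and thread trade-off concrete, here is a hypothetical config file matching the "2 workers, each with 2 threads" model. The values are illustrative assumptions, not the project's actual settings. `bind`, `workers`, `threads` and `worker_class` are standard gunicorn settings; `max_concurrent_requests` is just a local variable for the arithmetic and is not a gunicorn setting.

```python
# Hypothetical gunicorn.conf.py values, NOT the project's actual settings.
bind = "0.0.0.0:8000"      # same address and port as the command above
workers = 2                # number of worker processes
threads = 2                # threads per worker process
worker_class = "gthread"   # threaded worker; gunicorn's default is "sync"

# Rough upper bound on the requests that can be in progress at the same time.
max_concurrent_requests = workers * threads
print(max_concurrent_requests)  # 4
```

Four sync workers give the same upper bound of four concurrent requests, but each worker is a separate process with its own memory, which is why the threaded layout suits a RAM-constrained server.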
Further reading
Workers¶
We use dramatiq to handle out-of-band requests, for actions that will take longer than we would like a user to wait to receive a response.
Here we use RabbitMQ as our queue system, and dramatiq for managing workers. Dramatiq relies on an actor model for picking up work from a queue and allocating enough workers.
If you have a series of very heavy, computationally expensive jobs in the queue, there is a risk that all the workers will be stuck working on these, as lots of smaller jobs pile up.
To avoid this, we have multiple queues: regular, fast-finishing throughput work is allocated to the default queue. Heavier batch-processing work to generate stats should be allocated to the stats queue.
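The effect of splitting the queues can be seen in a small simulation. This sketch does not use dramatiq or RabbitMQ; it is a simplified first-in, first-out model with made-up job durations (60 seconds for stats jobs, 1 second for fast jobs), and the function `finish_times` is hypothetical.

```python
import heapq


def finish_times(jobs, workers):
    """Simulate a FIFO queue served by `workers` identical workers.

    `jobs` is a list of (name, duration_seconds) in queue order.
    Returns {name: time at which the job finishes}.
    """
    free_at = [0.0] * workers           # when each worker is next idle
    heapq.heapify(free_at)
    done = {}
    for name, duration in jobs:
        start = heapq.heappop(free_at)  # next worker to become idle
        done[name] = start + duration
        heapq.heappush(free_at, done[name])
    return done


heavy = [(f"stats-{i}", 60) for i in range(4)]  # slow batch jobs, queued first
fast = [(f"fast-{i}", 1) for i in range(3)]     # quick jobs queued behind them

# One shared queue, 4 workers: every worker picks up a heavy job first.
shared = finish_times(heavy + fast, workers=4)

# Two queues: 3 workers on "default", 1 worker on "stats".
default_only = finish_times(fast, workers=3)
stats_only = finish_times(heavy, workers=1)

print(shared["fast-0"], default_only["fast-0"], stats_only["stats-3"])
```

With a shared queue the first fast job finishes after 61 seconds. With separate queues it finishes after 1 second, at the cost of the last stats job taking 240 seconds on its single worker.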
Typical queue operation - serving fast and slow responses¶
flowchart LR
subgraph dramatiq
worker[worker ]
worker2[worker 2]
worker3[worker 3]
worker4[worker 4]
end
subgraph rabbitmq
%% render left to right
%% to make them stack
direction LR
default
stats
end
worker-->default
worker2-->default
worker3-->default
worker4-->stats
You can run worker jobs with the following commands:
# serve one worker, using one thread per worker, just for the default queue
manage.py rundramatiq --threads 1 --processes 1 --queues default
# serve one worker, using one thread per worker, just for the stats queue
manage.py rundramatiq --threads 1 --processes 1 --queues stats
# serve the default: as many workers as cores available, each with 8 threads, for all queues
manage.py rundramatiq
Update the number of threads and processes accordingly to allocate the appropriate amounts of resources for the workloads.
Scaling processes with ansible¶
Each new deploy using the deploy.yml ansible playbook deploys the version of the branch specified in project_deploy_branch, including the number of processes for both the gunicorn web server and the dramatiq queue workers.
If you only want to scale the workers up and down, and don’t want to run through the whole deployment process, you can update just the processes.
You have two options. First, you can pass the systemd tag to the deploy playbook. This will only run the steps tagged with systemd in the deploy playbook.
ansible-playbook -i ansible/inventories/prod.yml ./ansible/deploy.yml --tags systemd
Alternatively, you can run the dedicated scale-processes.yml playbook. This includes the same tasks as are defined in the larger deploy playbook:
ansible-playbook -i ansible/inventories/prod.yml ./ansible/scale-processes.yml
These playbooks template out new scripts that systemd uses to run both the gunicorn web servers and dramatiq queue workers, then send a command to stop, start or restart these processes.
Further reading
Logging¶
As mentioned before, we use systemd to run both our workers and our web server processes. This means processes are restarted automatically for us, and logs are rotated for us.
Gunicorn logging¶
The logs on each app server are sent to the Loki server on our monitoring node, accessible at https://grafana.greenweb.org. This allows for centralised querying of logs.