If Part 1 was the “what” and Part 2 was the “how,” this part is the “oh no, it’s broken… okay, I fixed it.” Welcome to the deployment saga. This is the story of taking a perfectly functional local Docker setup and launching it into a live, secure, and fully automated production environment on the internet.
This is where the real DevOps work begins. It’s a process filled with challenges that test your understanding of networking, security, and automation. Let’s dive in.
Step 1: Hardening the Production Server
The first step was preparing the battlefield: a fresh Linux VPS. A default server is not secure. Before deploying any code, I performed essential server hardening:
- Created a Non-Root User: All operations are performed by a user with `sudo` privileges, never directly as `root`.
- Configured UFW (Uncomplicated Firewall): I immediately locked down all ports, allowing only essential traffic: SSH (initially), HTTP, and HTTPS.
- Implemented a “Zero-Trust” SSH Policy with Tailscale: Instead of leaving the SSH port (22) open to the world and relying on IP whitelisting, I took a more secure approach. I installed Tailscale on both my local machine and the VPS, creating a private, encrypted mesh network. UFW was then configured to only allow SSH connections originating from within this private Tailscale network (see the sketch after this list). To the public internet, my server’s SSH port is completely invisible, drastically reducing the attack surface.
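For reference, the firewall rules amounted to something like the sketch below. It’s a minimal approximation rather than my exact configuration, and it assumes Tailscale’s default interface name, `tailscale0`:

```bash
# Web traffic stays public; SSH is reachable only over the Tailscale interface.
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow in on tailscale0 to any port 22 proto tcp  # SSH via the private mesh only
sudo ufw allow 80/tcp    # HTTP
sudo ufw allow 443/tcp   # HTTPS
sudo ufw enable
```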
Step 2: The CI/CD Pipeline – Automation is King
The goal was a true Continuous Deployment pipeline: every `git push` to the `main` branch should automatically and safely update the live application.
The Tool: GitHub Actions. The Challenge: How do you get GitHub Actions to securely connect to a server whose SSH port is locked down behind a VPN?
The Solution: A Self-Hosted Runner.
Instead of using GitHub’s cloud runners, I configured a self-hosted runner directly on the VPS. This brilliant piece of software runs as a service, polling GitHub for new jobs. This inverts the connection model: the server initiates the connection outwards, meaning no inbound ports need to be opened for the CI/CD pipeline.
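Setting one up is straightforward. The gist looks roughly like this, with placeholders for the repository URL and registration token (both come from the repo’s Settings → Actions → Runners page):

```bash
# On the VPS, inside the downloaded actions-runner directory:
# register the runner against the repository (placeholder URL and token).
./config.sh --url https://github.com/<owner>/<repo> --token <REGISTRATION_TOKEN>

# Install and start it as a systemd service so it keeps polling after reboots.
sudo ./svc.sh install
sudo ./svc.sh start
```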
Here is the final, resilient `deploy.yml` workflow file:
```yaml
name: Deploy to Production VPS

on:
  push:
    branches: [ "main" ]

jobs:
  deploy:
    runs-on: self-hosted
    steps:
      - name: Clean and Prepare Workspace
        run: |
          if [ -d "${{ github.workspace }}" ]; then
            sudo chown -R vanessa:vanessa "${{ github.workspace }}"
          fi

      - name: Checkout code
        uses: actions/checkout@v4

      - name: Create .env file
        run: echo "${{ secrets.DOT_ENV }}" > .env

      - name: Deploy Application
        run: |
          docker compose -f docker-compose.prod.yml up --build -d
          docker image prune -f
```
Step 3: Troubleshooting in the Trenches (The Real DevOps Work)
A green pipeline is a beautiful thing, but it’s usually built on the ashes of many red ones. Here are the key “boss battles” I fought and won during this deployment.
Battle 1: The 502 Bad Gateway Mystery
The pipeline ran successfully, `docker ps` showed all containers were “Up,” but the website showed a dreaded 502 error.
- Symptom: Nginx (the “gatekeeper”) was running but couldn’t communicate with the Flask app (the “upstream”).
- Investigation: The `docker logs app` output showed no errors, only successful startups. This was the clue. The app was crashing so fast that Gunicorn couldn’t log the error before Docker restarted it. The `docker logs nginx` output finally revealed the truth: `connect() failed (113: Host is unreachable)`.
- Solution: The app container wasn’t correctly registering on Docker’s internal network after the CI/CD run. A simple `docker compose restart app` forced the container to re-register and resolved the issue, confirming a transient network glitch within Docker (the diagnostic sketch after this list shows the checks that narrow it down).
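In practice the triage boils down to a few commands. This is a rough sketch rather than a transcript of my session; the network-inspect check is an extra verification step, and the network name follows Compose’s default `<project>_default` convention, so it’s a placeholder:

```bash
# 1. Ask Nginx what it actually sees when proxying to the app.
docker logs nginx --tail 50

# 2. Confirm the app container is attached to the Compose network
#    (placeholder network name; run `docker network ls` to find yours).
docker network inspect <project>_default --format '{{range .Containers}}{{.Name}} {{end}}'

# 3. Force the container to re-register on the network.
docker compose -f docker-compose.prod.yml restart app
```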
Battle 2: The Permission Denied Paradox
The second pipeline run failed spectacularly with an `EACCES: permission denied` error during the “Checkout code” step.
- Symptom: The runner couldn’t clean its own workspace.
- Investigation: The first `docker compose` run had created certificate folders (`certbot/`) owned by the `root` user. The runner, operating as the `vanessa` user, was now forbidden from touching these folders.
- Solution: This was a classic permissions battle. The fix was twofold: first, running `sudo chown -R vanessa:vanessa ~/actions-runner` to fix the immediate problem; second, adding the “Clean and Prepare Workspace” step to the `deploy.yml` file. This made the pipeline self-healing, ensuring it fixes its own permissions before every run.
Battle 3: The Case of the Vanishing Certificates
After fixing the permissions, the 502 error returned, but this time the Nginx logs were different: `cannot load certificate... No such file or directory`.
- Symptom: The SSL certificates were disappearing after every deploy.
- Investigation: I realized the “Clean and Prepare Workspace” step, while necessary, was too effective. It was wiping the entire project directory, including the `certbot` folder containing the live SSL certificates!
- Solution: This highlighted a critical architectural principle: separating stateful data from stateless code. The certificates are stateful data; they should not live inside the ephemeral project folder. I created a permanent, absolute path on the server (`/opt/komocred/certs`), moved the `certbot` folder there, and updated `docker-compose.prod.yml` to mount the certificates from this new, safe location (see the sketch after this list). The CI/CD pipeline could now clean its workspace without destroying critical data.
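The relevant change to `docker-compose.prod.yml` looked roughly like the excerpt below. The service name and container-side path are illustrative; only the host path `/opt/komocred/certs` is the real location on the server:

```yaml
# docker-compose.prod.yml (excerpt) -- service name and container path are illustrative
services:
  nginx:
    image: nginx:stable
    ports:
      - "80:80"
      - "443:443"
    volumes:
      # Stateful data lives on an absolute host path outside the project folder,
      # so the CI/CD workspace cleanup can never touch it.
      - /opt/komocred/certs:/etc/letsencrypt:ro
```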
Conclusion: A Resilient System Forged in Fire
Deploying an application is a journey. The final, stable production environment is a testament not just to a good plan, but to the ability to diagnose and solve the unexpected problems that arise along the way. Through systematic troubleshooting, I transformed a fragile deployment process into a secure, resilient, and fully automated CI/CD pipeline.
In the final post of this series, I’ll shift focus from the infrastructure to the user, showcasing the application’s key features, the UI/UX decisions I made, and the final polish that turned a tool into a product.