-
Create a new Google Compute Engine instance from the
sdow-web-server instance template, which is configured with the following specs:
- Name: sdow-web-server-#
- Zone: us-central1-c
- Machine Type: e2-micro (2 vCPU, 1 core, 1 GB memory)
- Boot disk: 32 GB SSD, Debian GNU/Linux 12 (bookworm)
- Notes: Click "Set access for each API" and use default values for all APIs except set Storage to "Read Write"
- Firewall: Allow HTTP and HTTPS traffic
- Monitoring: Install Ops Agent for Monitoring and Logging
-
Set the default region and zone for the gcloud CLI:
$ gcloud config set compute/region us-central1
$ gcloud config set compute/zone us-central1-c
-
SSH into the machine:
$ gcloud compute ssh sdow-web-server-# --project=sdow-prod
-
Install required operating system dependencies to run the Flask app:
$ sudo apt-get -q update
$ sudo apt-get -yq install git pigz sqlite3
$ sudo apt install python3-virtualenv
-
Clone this repository via HTTPS and navigate into it:
$ git clone https://github.com/jwngr/sdow.git
$ cd sdow/
-
Create and activate a new virtualenv environment:
$ virtualenv -p python3 env
$ source env/bin/activate
-
Install the required Python libraries:
$ pip install -r requirements.txt
-
Copy the latest compressed SQLite file from the sdow-prod GCS bucket:
$ gsutil -u sdow-prod cp gs://sdow-prod/dumps/<YYYYMMDD>/sdow.sqlite.gz sdow/
-
Decompress the SQLite file:
# Warning: This may take ~10 minutes.
$ pigz -d sdow/sdow.sqlite.gz
-
Create the searches.sqlite file:
$ sqlite3 sdow/searches.sqlite ".read sql/createSearchesTable.sql"
Note: Alternatively, copy a backed-up version of searches.sqlite:
$ gsutil -u sdow-prod cp gs://sdow-prod/backups/<YYYYMMDD>/searches.sql.gz sdow/searches.sql.gz
$ pigz -d sdow/searches.sql.gz
$ sqlite3 sdow/searches.sqlite ".read sdow/searches.sql"
$ rm sdow/searches.sql
-
Install required operating system dependencies to generate an SSL certificate (this and the following instructions are based on these blog posts):
$ sudo apt-get -q update
$ sudo apt install nginx snapd
$ sudo snap install --classic certbot
$ sudo ln -s /snap/bin/certbot /usr/bin/certbot
-
Add this location block inside the server block in /etc/nginx/sites-available/default:
location ~ /.well-known {
  allow all;
}
-
Start NGINX:
$ sudo systemctl restart nginx
-
Ensure the VM has been assigned the proper static IP address (sdow-web-server-static-ip) by editing it on the GCP console.
-
Create an SSL certificate using Let's Encrypt's certbot:
$ sudo certbot certonly -a webroot --webroot-path=/var/www/html -d api.sixdegreesofwikipedia.com --email wenger.jacob@gmail.com
-
Ensure auto-renewal of the SSL certificate is configured properly:
$ sudo certbot renew --dry-run
-
Configure the following cron jobs:
$ crontab -e
# Add the entries below and save.
# Auto-renew the SSL certificate daily.
0 4 * * * sudo /usr/bin/certbot renew --noninteractive --renew-hook "sudo /bin/systemctl reload nginx"
# Restart the web server every ten minutes (to defend against hangs).
*/10 * * * * /home/jwngr/sdow/env/bin/supervisorctl -c /home/jwngr/sdow/config/supervisord.conf restart gunicorn
# Back up the searches database weekly.
0 6 * * 0 /home/jwngr/sdow/scripts/backupSearchesDatabase.sh
Note: Let's Encrypt debug logs can be found at /var/log/letsencrypt/letsencrypt.log.
Note: Supervisor debug logs can be found at /tmp/supervisord.log.
-
Install a mail service in order to read logs from cron jobs:
$ sudo apt-get -yq install postfix
# Choose "Local only" and use the default email address.
Note: Cron job logs will be written to /var/mail/jwngr.
-
Generate a strong Diffie-Hellman group to further increase security:
$ sudo openssl dhparam -out /etc/ssl/certs/dhparam.pem 2048
-
Copy over the NGINX configuration, making sure to back up the original configuration:
$ sudo cp /etc/nginx/nginx.conf /etc/nginx/nginx.conf.backup
$ sudo cp config/nginx.conf /etc/nginx/nginx.conf
-
Restart nginx:
$ sudo systemctl restart nginx
-
Activate the virtualenv environment:
$ cd sdow/
$ source env/bin/activate
-
Start the Flask web server via Supervisor, which runs Gunicorn:
$ cd config/
$ supervisord
-
Use supervisorctl to manage the running web server:
$ supervisorctl status            # Get status of running processes
$ supervisorctl stop gunicorn     # Stop web server
$ supervisorctl start gunicorn    # Start web server
$ supervisorctl restart gunicorn  # Restart web server
Note: supervisord and supervisorctl must either be run from the config/ directory or be given the configuration file via the -c argument; otherwise they return an obscure "http://localhost:9001 refused connection" error message.
Note: Log output from supervisord is written to /tmp/supervisord.log and log output from gunicorn is written to /tmp/gunicorn-stdout---supervisor-<HASH>.log. Logs are also written to Stackdriver Logging.
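One way to avoid the "refused connection" error when working outside config/ is a small shell wrapper that always passes -c. The following is only a sketch, assuming the repo is checked out at $HOME/sdow; the function name sdow_ctl and the SDOW_HOME and SDOW_DRY_RUN variables are hypothetical and not part of the repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run supervisorctl from any directory by always
# passing the configuration file explicitly via -c.
# Assumption: the repo lives at $HOME/sdow unless SDOW_HOME says otherwise.
SDOW_HOME="${SDOW_HOME:-$HOME/sdow}"

sdow_ctl() {
  local conf="$SDOW_HOME/config/supervisord.conf"
  if [ -n "${SDOW_DRY_RUN:-}" ]; then
    # Print the command instead of running it (handy for checking the path).
    echo "supervisorctl -c $conf $*"
    return 0
  fi
  supervisorctl -c "$conf" "$@"
}
```

With this sourced into the shell, `sdow_ctl status` or `sdow_ctl restart gunicorn` should behave the same from any working directory.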
To update the web server to a more recent sdow.sqlite file with minimal downtime, run the
following commands after SSHing into the web server:
$ cd sdow/
$ source env/bin/activate
$ gsutil -u sdow-prod cp gs://sdow-prod/dumps/<YYYYMMDD>/sdow.sqlite.gz sdow/sdow_new.sqlite.gz
$ pigz -d sdow/sdow_new.sqlite.gz # This takes ~10 minutes and causes search to be non-responsive.
$ mv sdow/sdow_new.sqlite sdow/sdow.sqlite
$ cd config/
$ supervisorctl restart gunicorn

To update the Python server code which powers the SDOW backend, run the following commands after SSHing into the web server:
$ cd sdow/
$ source env/bin/activate
$ git pull
$ pip install -r requirements.txt
$ cd config/
$ supervisorctl restart gunicorn
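
For the sdow.sqlite update procedure described earlier, it may be worth verifying the downloaded archive before it replaces the live database, since a truncated download would otherwise only surface after the swap. The sketch below is one possible approach, not a script from the repo: swap_sqlite_dump is a hypothetical helper, and it assumes pigz or gzip is on the PATH.

```shell
#!/usr/bin/env bash
# Sketch: validate a downloaded dump before swapping it in.
# swap_sqlite_dump is a hypothetical helper, not part of the repo.
swap_sqlite_dump() {
  local archive="$1"   # e.g. sdow/sdow_new.sqlite.gz
  local target="$2"    # e.g. sdow/sdow.sqlite
  local gz=gzip
  # Prefer pigz (installed earlier in this guide); fall back to gzip.
  if command -v pigz >/dev/null 2>&1; then gz=pigz; fi

  # -t tests archive integrity without writing any output.
  "$gz" -t "$archive" || { echo "corrupt archive: $archive" >&2; return 1; }

  # Decompress next to the target so the final mv is a rename on the
  # same filesystem. Note: -dc keeps the archive, unlike plain -d.
  local tmp="${target}.new"
  "$gz" -dc "$archive" > "$tmp" || { rm -f "$tmp"; return 1; }
  mv "$tmp" "$target"
}
```

A call such as `swap_sqlite_dump sdow/sdow_new.sqlite.gz sdow/sdow.sqlite` would stand in for the pigz and mv steps; the gunicorn restart is still needed afterwards, and the archive can be deleted once the swap succeeds.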