In this post I am going to document the steps I took to implement a fully automated deployment of my blog using GitHub Actions and GitHub Pages.
As always, I started my journey with the definition of what I really wanted to get at the end:
The website is published on GitHub Pages
Since the website is static and all of its content can be easily downloaded using a web crawler (like wget --mirror https://website.tld), I was OK with exposing its structure in a public repository, which is what GitHub offers on the free plan.
The code to generate the website should be private
I do a lot of work on the SSG (which is Pelican in my case) itself: extend it with plug-ins that may contain API tokens to reach out to some third party APIs, hack the core code when I want to quickly test stuff, etc. – so, I really did not have any desire to publish publicly all the commotions I did in the background (sometimes I do more than a hundred commits per day just to experiment with different ideas I have).
There should be a valid history of changes in both repositories
Well, I would get the history in my private repository for free, since that is the core value of maintaining a repository in a VCS, but I also wanted a clean history of changes to the content I publish publicly.
It would be a pleasant bonus if the changes in the public repository could refer back to the corresponding commit in the private repository.
One may say that to do what I set out to do I would need to subscribe to a paid GitHub plan, since according to their help page GitHub Pages for private repositories is only available on the paid plans.
However, as I pointed out above, it does not make sense to hide the content of the actual static website, hence all I needed was to find a way to “publish” the resulting artefact to the GitHub Pages repository and, preferably, that “publishing” should happen on GitHub’s side.
Luckily for me, GitHub started to support GitHub Actions on the free plan some time ago and as long as it is not abused according to their terms and conditions, it is a perfect vehicle for what I am trying to do, in my opinion.
Setting up GitHub Pages
There are multiple howtos and tutorials on the Internet on how to set GitHub Pages up, including the official help section on this topic, so I will only elaborate on the details where I did something specific for the purposes of achieving my goals.
There are two different types of GitHub Pages:
- user or organisation
- project
The difference between the two is subtle (the former requires a dedicated repository for your website, while the latter allows you to keep it in a branch of an existing repository), but for the purposes of this article I am assuming that we are working with user-level GitHub Pages residing in the repository named “<username>.github.io” (where <username> is your GitHub user name), as per the official documentation.
A few caveats I found and spent some time solving after following the official documentation are listed below:
GitHub’s documentation assumes the use of Jekyll for site generation.
It is not obvious how to use a different SSG (like Pelican). As far as I understand, there are multiple triggers for GitHub to consider the website to be in the “published” state, so just ignore any references to Jekyll in the documentation: you will trip one of the triggers sooner or later, for example by pushing HTML files into your repository.
Configure your DNS before setting the custom domain name in GitHub Pages.
A CNAME file with the name of your custom domain inside it will trigger a DNS check from GitHub to verify that your custom domain name is pointing back to GitHub Pages.
DNS heavily relies on caching and, depending on the TTL settings in your zone, if a negative check is performed (that is, when GitHub fails to retrieve the corresponding record) you will likely need to wait for quite a while before GitHub retries.
Setting up the CNAME record in advance and then verifying it with a query before you commit the CNAME file to your repository ensures that you will get the quickest validation response from GitHub, e.g. I set up my CNAME records and then verified them from the command line before submitting the request to GitHub, along these lines (www.example.com stands in for your actual domain):

    [user@localhost ~]$ dig +short CNAME www.example.com
    <username>.github.io.
There are some shenanigans with the “Enforce HTTPS” option.
It is not obvious from the documentation, but the enforcement of HTTPS for custom domains on GitHub’s side depends on several things:
- before the checkbox is enabled, your custom domain name should be confirmed by GitHub (your CNAME file is in place and the repository settings show that the name was recognised);
- the CNAME record should point to your “<username>.github.io.” DNS record (or, you can point it directly to the GitHub Pages IP addresses if you want to conceal the repository name in the DNS output);
- if GitHub did not like something and you adjusted anything in the above dot points, the only way to trigger the enforcement of HTTPS is to re-add the CNAME file to the repository (yes, you read it right: you need to delete the file and push it to the repository again);
- removing the CNAME file from the repository is a disruptive action – the site will not be accessible for the duration of the file being missing.
OK, you have your public repository configured the way you want, so let’s look at the settings we need to be able to publish our code to this public repository.
When I try to automate something, I usually start with writing down manual steps I would do to achieve the results. This helps me to see patterns and to understand what I can easily automate and what will require some brain-storming to resolve.
In the case of updating the repository it is quite trivial: if I were to push updates manually, all I need is a private SSH key whose corresponding public SSH key is configured with write privileges for the repository, and I could run git push from my local copy of the repository.
My private keys are called “private” for a reason – they are not supposed to leave my device(s) under any circumstances (except for backup purposes such as storing them in a safe). So, please pay attention when you read or hear somebody advising you to upload your private keys somewhere – it is usually bad advice.
For integration purposes, GitHub provides so-called “Deploy keys” and “Personal access tokens”. The former is just an SSH key pair associated with a particular repository (you can configure it in the repository’s settings) while the latter is an OAuth access token associated with your account.
While you can successfully use both, I would recommend using “Deploy keys” only: although you can try to scope a personal access token down, the scoping is not granular enough, and the actions performed using that token will look like you are executing them yourself.
To configure a “Deploy key” we need to do two things:
Generate an SSH key pair, e.g.:

    [user@localhost ~]$ ssh-keygen -t ed25519 -N '' -C 'Updating the blog from GH Action' -f ~/gh-action
Here, I chose the ed25519 key type since it is the shortest of the key types GitHub supports at the moment, yet it is strong enough.
I also made the key pair passphrase-less (-N '') since the purpose of the key pair is to automate things in an unattended fashion and there will be no one to type in the passphrase.
The key pair comment just makes it easier to maintain your keys, but is optional.
The -f ~/gh-action option specifies where the generated private key is going to be stored. The public counterpart will use the same path with the .pub suffix appended to it.
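If you want to sanity-check the pair before configuring anything on GitHub, you can print its fingerprint with ssh-keygen -l. The sketch below generates a throwaway pair in a temporary directory (the path is just an example) and inspects the public half:

```shell
# Generate a throwaway ed25519 pair in a temporary directory (example path)
tmp=$(mktemp -d)
ssh-keygen -t ed25519 -N '' -C 'Updating the blog from GH Action' -f "$tmp/gh-action" -q
# Print the fingerprint of the public half -- this part is safe to share
ssh-keygen -l -f "$tmp/gh-action.pub"
```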
Set the newly generated public key up as the “Deploy key”:
All you need to do is go to the repository settings for the public repository you created for GitHub Pages, click “Deploy keys” in the left side menu, then click the “Add deploy key” button in the upper right corner.
On the next page, provide a sensible description for the deploy key (I used the same text as I put into the key’s comment, i.e. “Updating the blog from GH Action”). GitHub does not allow you to upload files there, so you need to copy the content of the public key file (~/gh-action.pub) and paste it into the form.
NOTE: Ensure that you tick the “Allow write access” checkbox, otherwise we will not be able to push to the repository with the corresponding private key.
This actually concludes the configuration of the GitHub Pages repository for now. In later articles I will document how one could leverage the repository’s Issues for managing comments on the website and maintaining counters for likes on pages, but that would be a completely separate post :).
Setting up the private code repository
A typical Pelican repository layout is quite simple: it comprises one mandatory directory and one semi-mandatory file, and everything else is optional but could be used to enhance your experience.
The mandatory directory is the so-called content directory (in Pelican’s terms). The name of the directory can be anything you want, but it had better be reflected in the PATH directive of the configuration file.
I am saying “had better be” since Pelican can operate without any configuration files, but the result will be limited, hence I call pelicanconf.py (which is the default name for the configuration file) “semi-mandatory”. The name of the configuration file can also be anything you like; however, I suggest sticking with the default for now.
Basically, you can quickly start by following the Pelican documentation, by doing something as follows (the session output is trimmed and the paths here are illustrative):

    [user@localhost ~]$ virtualenv ~/.venv/blog
    [user@localhost ~]$ . ~/.venv/blog/bin/activate
    (blog) [user@localhost ~]$ pip install pelican markdown
    (blog) [user@localhost ~]$ mkdir ~/blog && cd ~/blog && git init .
    (blog) [user@localhost blog]$ pelican-quickstart
    (blog) [user@localhost blog]$ rm -f publishconf.py tasks.py Makefile
    (blog) [user@localhost blog]$ git add . && git commit -m 'Initial commit'
A short breakdown of the above session snippet:
- On line 1 we create a virtual Python environment, so we could install Pelican locally;
- We enter the newly created virtual environment on line 2, which makes Pelican available to us;
- We create an empty repository (~/blog) and initialise it using Pelican’s quickstart;
- Since we are not using the default publishing capabilities and we are not interested in storing the generated pages in our code repository, we clean things up a bit;
- Finally, we commit the generated skeleton to Git.
A good test at this stage would be to ensure that Pelican is working and likes our structure:
So far so good, but it is not a real test since there are no source files to generate something from, so let’s give Pelican something to work on:
[user@localhost ~]$ printf 'Title: First Post\nDate: 2020-05-10\n\n#First post\nPelican is awesome!' > content/first.md
The output from elinks was truncated on purpose since I just wanted to showcase that Pelican has indeed generated the structure for a static website from just one article file we created.
Before we push our local repository to GitHub we may want to do some housekeeping first, such as creating the .gitignore file and listing the temporary things we do not want Git to track. A good enough version of the .gitignore file I am using for my code repository is the following:
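My exact file is not reproduced here; a minimal sketch for a Pelican repository (the directory names are assumptions and should match your OUTPUT_PATH and CACHE_PATH settings) could be:

```
# Generated website -- it lives in the public repository, not here
output/
# Pelican's cache
cache/
# Python byte-code leftovers
__pycache__/
*.pyc
```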
Do not forget to actually commit that .gitignore file to your local repository using git add .gitignore && git commit -m 'Added .gitignore', by the way.
Now, we need to create a private repository on GitHub, so jump into your browser, go to your GitHub account, press the “+” icon in the upper right corner (right next to your profile icon), and select “New repository”.
On the “Create repository” page put whatever you desire as the name and the description of the repository you are about to create. Ensure that the “Private” radio button is selected and uncheck the “Initialize this repository with a README” if it was checked.
Once the repository is created, you will be presented with a page that enumerates your options for the next step, but I will just go ahead and show a session dump of what you will need to do. In the following session snippet, blog is the repository name I chose for my private code repository and you will need to replace it with your private repository name (the working directory is our newly created local repository):
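In essence, the steps boil down to adding GitHub as a remote and pushing. The sketch below uses a scratch repository so it is self-contained, and my user/repository names as placeholders, so substitute your own (and run the final push for real from your blog directory):

```shell
# Create a scratch repository so the example is self-contained
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
# Point "origin" at the private GitHub repository (placeholder name)
git remote add origin git@github.com:galaxy4public/blog.git
git remote get-url origin
# The actual publishing step would then be:
#   git push -u origin master
```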
Do you remember how we generated a deploy key pair earlier and installed the public key part into the public blog repository, so GitHub would allow the bearer of the private key to authenticate and deploy changes to the public blog repository? Well, since the purpose of this article is to introduce the full automation, the bearer of the key would be the GitHub Action associated with the private repository, hence we need to provide the action with the private key somehow.
GitHub has a feature called “repository secrets” and it is a perfect candidate to pass the private key to the GitHub Action. We need to follow the official documentation for the feature and create a secret called “DEPLOY_KEY” with the content of the private part of the deploy key. This will be used in the last step of the GitHub Action we are about to define.
Configuring the GitHub Action for publishing
Everything is well and good, but “where is the automation?” you may ask. After all, I suspect this was the primary reason you are reading this post. Well, we are about to start to look into the automation part and it is rather short in comparison to all the steps we did to set repositories up.
Our automation relies on the GitHub Actions feature of GitHub. In plain terms, a GitHub Action is a free compute resource provided by GitHub (there are some limits, but for the purposes of a personal blog it is unlikely that you will ever hit them).
Each GitHub Action is associated with a specific repository and is defined using quite a simple YAML configuration file that instructs GitHub on how to provision the required compute environment and what to run inside that environment. The YAML file can be arbitrarily named and resides in the .github/workflows/ subdirectory (starting from the root of the corresponding repository).
The GitHub Action I am using for my blog website is stored in .github/workflows/pelican.yml and contains the following (we will dissect it further down the post):
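The live file is dissected step by step further down; as an orientation aid, a condensed sketch of its shape could look like the following (the step names are real, while the trigger branch and the shell bodies here are simplified approximations rather than my exact code):

```yaml
name: Publish the blog
on:
  push:
    branches: [ master ]   # assumed trigger branch
jobs:
  publish:
    runs-on: ubuntu-latest
    env:
      LANG: en_AU.UTF-8
    steps:
      - name: Initialise locale
        run: |
          sudo sed -i -e "s/^# *\(${LANG}.*\)/\1/" /etc/locale.gen
          sudo locale-gen
      - name: Checkout the primary repo
        uses: actions/checkout@v2
        with:
          fetch-depth: 0        # full history, needed to restore mtimes
          submodules: recursive
      - name: Restore modification times for content
        run: |
          # walk `git log` and re-apply commit timestamps (see the dissection)
          ...
      - name: Checkout Pages repo
        uses: actions/checkout@v2
        with:
          repository: galaxy4public/galaxy4public.github.io
          path: output
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.x'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Generate the website
        run: |
          # clean everything but hidden entries (.git), then regenerate
          find output -mindepth 1 -maxdepth 1 ! -name '.*' -exec rm -rf {} +
          pelican content
      - name: Push the changes to the Pages repo
        env:
          DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }}
        run: |
          # commit and push via a short-lived ssh-agent holding the key
          ...
```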
This is a copy of my live GitHub Action for deploying the blog you are most likely reading right now, and I decided not to edit anything, so if you just want to re-use it you will need to replace a few things, namely:
- en_AU.UTF-8 => the locale you are using (you can run locale -a if you are running Linux to see the list of locales available on your system);
- content => you may need to change that to the name of your content directory (if you did not use the default name);
- themes/mind-drops/content => you will need to drop this line since it is my theme’s content directory and you would not have it;
- galaxy4public/galaxy4public.github.io => <your_username>/<your_blog_repo_name>, obviously :)
This being sorted, let’s look a bit more closely to understand how this GitHub Action is structured and what each step is doing.
It all starts with the definition of the action itself, the conditions that trigger it, and how it runs; you can find a formal description of the YAML structure of this configuration file in the official GitHub documentation on Workflows.
Here, we are only going to focus on steps defined under the “jobs:” section of the file since these steps are defining the logic we are after.
The “Initialise locale” step is quite important for Pelican since with a misconfigured locale Pelican tends to produce incorrect output (which is kind of expected). So in this step we are trying to determine whether the user (us :) ) has supplied the LANG variable and, if they did, we update the /etc/locale.gen file, then run the locale-gen command to update the corresponding files, and finally we set the locale of the container to the supplied value.
The “Checkout the primary repo” step leverages the official “Checkout V2” Action and checks out a full copy of the source code repository of our blog and all the linked submodules. Initially, I was using a shallow copy with fetch-depth: 1, but the next step required the full repository history to do its job reliably, so I changed it to a full history clone.
git does not store timestamps for the files and directories under its control, yet Pelican relies on timestamps to populate the modification time of the artefacts – so we need to find a way to reconstruct at least the file timestamps after the tree is checked out. One possible approach would be to create a plugin that could determine whether we are inside a git working tree or not and, depending on that, apply different timestamp extraction policies, but I thought that a much easier way would be to prepare the checked out tree, hence making it compatible with the way Pelican expects things to be.
The “Restore modification times for content” step is my variant of how one could reconstruct the file timestamps closely enough to be usable with Pelican. The approach relies on the fact that git records the timestamp of each commit, including commits adding, updating, and deleting files. We create a list of all these file events using git log for the file trees under “content” (where our blog content lives) and “themes/mind-drops/content” (where my custom theme injects some content such as the Web service worker script), then we use sed to filter and re-arrange the output a bit, followed by reverse sorting to help remove the entries that were introduced and later deleted. In the end, we have a list of file names with timestamps, so we go through the list in a loop and set the timestamps on the files using touch.
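My action does this in a single git log pass; as an illustration of the same idea, here is a simpler (and slower) per-file variant that you could drop into any checkout (GNU touch with -d @<epoch> is assumed):

```shell
# For every tracked file under the given directory, set its mtime to the
# timestamp of the last commit that touched it -- a simpler per-file
# variant of the single `git log` pass used in the real action.
restore_mtimes() {
  git ls-files -- "$1" | while IFS= read -r f; do
    ts=$(git log -1 --pretty='format:%at' -- "$f")
    [ -n "$ts" ] && touch -d "@$ts" "$f"
  done
}
```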
The “Checkout Pages repo” step is cloning the public blog repository into the “output” directory where Pelican will put the generated files. This is needed to ensure that we can track the changes to the public repository, since Pelican is careful enough (if not instructed otherwise) only to update the files it generates and leave everything else in place as is. We use this later to determine whether any new content has been generated or not.
The “Set up Python” and the “Install dependencies” steps are pretty generic: the former uses the official GitHub Action to install and configure the latest available version of Python 3.x and the latter leverages pip to install all of the blog’s dependencies (including Pelican itself).
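The dependency list itself lives in the repository; mine is not reproduced here, but a minimal requirements.txt for such a setup (the package set is an assumption) would be along the lines of:

```
pelican
markdown
```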
The “Generate the website” step runs pelican to process our articles and pages and to generate the result in the “output” directory. There are a couple of tricks with this step, though.
The first trick, which is not that obvious, is that we are removing the content of the “output” directory. It seems a bit weird since we just checked it out several steps before, does it not? Well, we are removing everything except hidden files and directories, which happen to include the “.git” subdirectory with all the actual data about the repository. Why do we do it? It is simple: this helps us to detect the situation where some file or directory was removed, so we can propagate this knowledge to the public blog repository. If we did not clean up the content of the “output” directory we would only append new changes and would never remove anything – this is how it was before I stumbled upon the problem, by the way. :)
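The cleanup can be done with a single find invocation; the sketch below recreates a toy “output” checkout in a temporary directory to demonstrate that only the hidden entries (such as .git) survive:

```shell
# Build a toy "output" tree: a fake .git plus some generated files
tmp=$(mktemp -d) && cd "$tmp"
mkdir -p output/.git output/posts
touch output/index.html output/posts/first.html
# Remove everything at the top level of output/ except hidden entries
find output -mindepth 1 -maxdepth 1 ! -name '.*' -exec rm -rf {} +
ls -A output   # only .git remains
```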
The second trick of the “Generate the website” step is the extraction of the time zone information from the configuration files and setting the TZ variable correctly just before we call pelican. Without this, Pelican may fail, or, if it does not, it will produce UTC-based dates and times, which would be undesirable (at least for me, since my time zone is in Australia).
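The extraction can be as simple as grepping the TIMEZONE setting out of pelicanconf.py (TIMEZONE is Pelican's standard setting name; the sed expression here is an illustrative take, not my exact one):

```shell
# Create a sample configuration so the snippet is self-contained
tmp=$(mktemp -d) && cd "$tmp"
printf "TIMEZONE = 'Australia/Sydney'\n" > pelicanconf.py
# Pull the quoted value of TIMEZONE out of the file and export it as TZ
TZ=$(sed -n -E "s/^TIMEZONE *= *['\"]([^'\"]+)['\"].*/\1/p" pelicanconf.py)
export TZ
echo "$TZ"
```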
The final step is to push the updated content to the public blog repository, which will make it visible via GitHub Pages. Several things to notice there are:
- In the env: section we are setting up the DEPLOY_KEY variable – this syntax is used to retrieve a named secret value from the secrets associated with the repository. We stored the private part of the deploy key we generated specifically for this purpose at the beginning of this article in the private repository’s secrets.
- git status is used to determine whether there are any changes between what we have in the working tree and the repository index. If no changes were detected we just exit gracefully.
- If any change to the generated content was detected, we temporarily load the private part of the deploy key into the ssh-agent (for 5 minutes), push the changes to the public blog repository, then clean up the key from the agent and kill the agent itself.
From this point on, any push to the private codebase repository will trigger the GitHub Action and, if the change resulted in any updated content, that content will be published to GitHub Pages!
There are quite a few things we could improve, such as introducing a broken-link check, doing some sanity checks, etc. – but that would be for another article, I guess. :)