Serverless computing … with Pascal

This story is cross-posted on Medium as well.

Interested in serverless computing but don’t want to use some newfangled programming language like Python, JavaScript, or Go? Then why not write your serverless web app in Pascal? Let’s go back to the 1970s by taking a Pascal program from the definitive Pascal User Manual and Report, published in 1974, and deploying it to the Google Cloud Run serverless platform. Oh, and here’s a 1970s background music playlist to get you focused.

What’s that? You don’t have a need to run serverless Pascal? Then maybe you have some other legacy software that exists only in executable form, or is written in a language not widely supported by serverless platforms, or is built out of several different programs. The techniques shown here can solve those problems, too.

Some background

Google Cloud Functions and similar serverless solutions make it extremely easy to deploy functionality to the cloud. You just write your application code, upload it to the service, and they handle deployment, provisioning, infrastructure, scaling, logging, and security for you. They’re wonderful, but require a trade-off. They will only accept a single program written in one of their supported languages. And Pascal is not (yet?) one of those supported languages.

But Cloud Run provides similar capabilities with a slightly different trade-off. You can use other languages (like Pascal!), executable files, or multiple programs, but you need to provide a container, not just source code. That’s a little more work, but less than you might think, because Cloud Build will do it for you. And you get all the normal serverless benefits like automatic scaling (even to zero when your code isn’t running). Let’s see how.

TL;DR: One-button deployment

The sample project repository is on GitHub. Take a look at it. The README file is displayed and there’s a big button near the top of it:

If you have a Google account (such as a Gmail account), you can click that button, answer any prompts it displays, and in a few minutes you will have a running web service built from the Pascal program in the repository. The URL will be displayed when the service is deployed.

You can fork this repository and change it to run your own Pascal code (or, with a little more tweaking, any other code) and launch your own new service the same way. It’s even possible to make the service run at a URL on your own domain.

How it works

I’ve taken a Pascal program and deployed it to Cloud Run as a web service. It takes a number and returns the same number, but in Roman numerals. Want to see what 1974 (the year the program I’m using was published) looks like in Roman numerals? Just call the RESTful service via https://roman.engelke.dev/1974 to find out. Or you can put a different number in the URL to convert it. This program doesn’t use Roman numeral shortcuts (like IX for 9) so there can be four of one letter in a row.
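
For example, you can call the service from the command line with curl; given the no-shortcut rule, 1974 should come back as MDCCCCLXXIIII:

curl https://roman.engelke.dev/1974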

To build this from scratch yourself, first create a new folder to hold all the pieces, or simply clone the GitHub repository to a new folder with the command below:

git clone https://github.com/engelke/cloud-run-pascal.git

The folder will contain the Pascal program and any other needed pieces to make it work in Cloud Run. First up is the Pascal program itself:

Clean and simple, and look: clauses are separated by semicolons and the program ends with a period! Newer languages aren’t so well punctuated. All this program does is read an integer from standard input and write the Roman numeral equivalent to standard output. No modern web technology is needed for that. Which is good, because Pascal is decades older than the web. It’s even older than the Internet Protocol, IP.

The Pascal program doesn’t understand the web, but the container must include a web server. When a web request comes in to Cloud Run, it will run the container and send the request to the web server in it. That web server is provided by a Python wrapper program. The wrapper program doesn’t understand the application; it’s glue software that listens for a web request, pulls the number off the end of the URL, and runs the Pascal program with that number as its input. If the Pascal program crashes, the wrapper returns a 500 Server Error message. If the program writes an error message, the wrapper returns a 400 Bad Request message. Otherwise, the wrapper takes the output of the Pascal program and returns it as the body of the response.
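
Here’s a minimal sketch of such a wrapper, using Flask and Python’s subprocess module. The actual app.py in the repository may differ in its details; the binary name roman comes from the Dockerfile step described below.

import subprocess

from flask import Flask

app = Flask(__name__)

@app.route('/<number>')
def convert(number):
    # Run the compiled Pascal program with the number as its standard input.
    result = subprocess.run(['./roman'], input=number + '\n',
                            capture_output=True, text=True)
    if result.returncode != 0:
        return 'Server Error', 500   # the Pascal program crashed
    if result.stderr:
        return 'Bad Request', 400    # it wrote an error message
    return result.stdout, 200        # otherwise, its output is the response body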

Along with the Python wrapper program there is a requirements.txt file that specifies which Python libraries are needed by the program. Again, this is just needed to wrap around the real program we need to run, which is in Pascal.
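
For a Flask-and-gunicorn wrapper like the one sketched above, requirements.txt can be as short as two lines:

flask
gunicorn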

The only other thing needed is the Dockerfile, a text file that tells how to build the container. Let’s take a look at it.

  • FROM python:3.7-slim
    Build the container on top of a standard one called python:3.7-slim.
  • ENV APP_HOME /app
    WORKDIR $APP_HOME
    COPY . ./

    Specify where in the container to place the code, and copy the files in the current directory there.
  • RUN pip install -r requirements.txt
    As part of building the container, run this command to install the libraries needed by the Python wrapper program.
  • RUN apt-get update -y -q
    RUN apt-get install -y -q fpc
    RUN fpc roman.pas

    Pascal source code can’t be run directly; it must be compiled (converted to a binary executable) first. So, when building the container, these lines say to run commands to install a Pascal compiler called fpc and then use it to compile the roman.pas program. The binary executable produced inside the container will just be called roman.
  • CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 app:app
    After the container is built and deployed, whenever it is run it should invoke this command, which starts the Python wrapper program and has it listen for web requests on the network port provided by the container.

If you take a look at a basic Cloud Run quickstart tutorial, you’ll see most of these pieces in it. The only new part, which is needed to run the Pascal code, is the three lines that install and then run the Pascal compiler to turn the Pascal source into an executable file.
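
Put together, in the order described above, the whole Dockerfile is just these lines:

FROM python:3.7-slim
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./
RUN pip install -r requirements.txt
RUN apt-get update -y -q
RUN apt-get install -y -q fpc
RUN fpc roman.pas
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 app:app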

Build and deploy

Once you’ve created or cloned a folder with the four needed files (roman.pas, app.py, requirements.txt, and Dockerfile), you can deploy it to Cloud Run. You will need a Google account (such as a Gmail account), and you can either install the Google Cloud SDK on your own computer, or use Cloud Shell (which already has the SDK installed) from inside your browser to run the necessary commands.

Ready? Here are the steps:

  1. Go to the Google Cloud console in your browser, and log in if you aren’t already. If this is the first time you’ve used the console you will probably have to agree to terms and conditions.
  2. Create a new project by clicking on the drop-down at the top of the page (that will either say “Select a project” or show a selected project) then clicking NEW PROJECT and entering the name you want. Wait a few minutes for the project to be created, then select it from the drop-down at the top of the page.
  3. Back at the command line on your computer, or in Cloud Shell, have Google Cloud build your container:
    gcloud builds submit --tag gcr.io/PROJECT-ID/my-program-name
    (where PROJECT-ID is the ID of the project you just created, and my-program-name is any name you want to use to describe the program). Answer any prompts you are shown — the choices should be clear. This builds your container and saves it in a cloud container repository under your control.
  4. Now deploy the container to Cloud Run:
    gcloud beta run deploy \
    --image gcr.io/PROJECT-ID/my-program-name \
    --platform managed

    You can also put the whole command on a single long line; remove the backslashes if you do. Again, answer any prompts displayed.
  5. In a few minutes, the command line will display the service URL. You can open it in your browser, but remember to append /number to it, for some decimal number, to get the Roman numeral version back.

You now have a 45-year-old Pascal program running as a serverless cloud app. Maybe you don’t need that, but some day you might need to run something else that’s a bit outside what most serverless platforms support yet fits comfortably in a container. That’s when Cloud Run will pay off for you.

Final thoughts

Your deployed app’s URL will be provided by Google, but you might want a friendlier option. You can connect your app to a URL on a domain you own if you’d like. It’s a bit tricky, but if you have set up your own domain it shouldn’t pose a problem. I did it, and my Roman numeral service is available at roman.engelke.dev, e.g., https://roman.engelke.dev/2345.

One of the slowest parts of building the container in this example is installing the Pascal compiler. I could have compiled the Pascal program on my own computer and used the executable instead of the source code in my folder when doing the build. That would let me skip the steps that install the compiler in the container. But since building a container happens rarely, I decided I’d rather be sure that my latest source code was always being used by compiling it as part of that step.

This tutorial uses fully managed Cloud Run, which has Google handle all the work for you. But you can also deploy to Cloud Run on GKE, either on Google Cloud Platform or even on your own premises, if that’s important for your app.

Building and Publishing the Site

This post is fifth (and for now, last) in a series, beginning with Automating Web Site Updates.

We’ve been narrowing the “miracle” step in our solution. This post fills in the remaining gap.

We have a Cloud Function ready to kick off the missing step that will build and deploy the updated site. At a high level, this step will need to:

  1. Fetch a repository with static web content plus source directories that each need to be converted to web pages
  2. Convert each source directory to web pages in the desired format
  3. Build new static website structure
  4. Deploy the pages to a Firebase hosting project

We need a place to run our code that has some high-level tools available: git, the Firebase CLI, and whatever converts the source directories to web pages. And we need a file system to build the new static site in. Plus, this process may take longer than a cloud function is allowed to run (or at least longer than the GitHub webhook is willing to wait for a response). Those requirements are why we couldn’t just do these steps in the cloud function that responds to the GitHub webhook. We need something more general purpose than that.

Cloud Run looks like a possible solution. The managed version is a lot like Cloud Functions in that it takes your code, runs it in response to HTTP requests, and only charges for resources while the code is running. But instead of just providing source code in a supported language, you provide Cloud Run with a container. That container could run any supporting software you need, not just the supported language environment of a cloud function. Cloud Run will even build the container for you, from your specifications.

Any negatives to using Cloud Run? There are several for this use case, though they can possibly be worked around:

  • Cloud Run is still in beta, so it is subject to change before becoming final.
  • Containers require maintenance. If a security update is needed for any software, the container needs to be rebuilt with the new versions.
  • The container runs only so long as it is serving a web request, so if the requesting program is only willing to wait a short time (for example, 10 seconds for a GitHub webhook or 10 minutes for a Google Cloud Pub/Sub push subscription) we have to be able to build and deploy our site in that amount of time.
  • In any case, each run is limited to no more than 15 minutes at this time.
  • The file system size is limited by the memory allocated to the service (no more than 2GB).
  • Invocations of the service can be concurrent, so if you are building a site in the file system you have to be sure concurrent invocations don’t step on each other, and don’t use up all the memory.

Despite these negatives, I find Cloud Run an intriguing option for this kind of problem. I’m not going to use it here, but I’ll keep thinking about how it can solve problems like this one.

So what is the solution for the current problem? I’m going to go old school, and use a virtual machine for this. In Google terms, I’m going to use a Compute Engine instance. At first look this may seem to go against my goal to use “services that require little or no customization or coding on our part”. And I also said I don’t want to maintain any servers. But the way I’m going to use Compute Engine will not require any coding other than that specifically aimed at our business logic, and won’t need any server maintenance, either.

We will launch a virtual machine with a startup script that will:

  • Install the standard tools we need (git, Firebase CLI, etc.)
  • Fetch the source from a GitHub repository
  • Build the web pages using existing tools specific to our needs
  • Deploy the site to Firebase hosting
  • Destroy itself when done

The first and last steps are key: this virtual machine installs what it needs when run and then deletes itself when it has finished the task. That way we aren’t paying for an idle machine standing by waiting for work to do; we’re only paying for what we really use. Further, by creating a new machine for each task and then throwing it away, we don’t need to worry about updates – we always launch and then install the latest versions of the tools we’re using.

For more background on this technique of creating, using, then deleting Compute Engine instances, see this tutorial by Laurie White and me.

The virtual machine’s actions all need to be scripted in advance, so they can run without human intervention. Once the script is in place, we can enhance the cloud function from the last blog post to create a new Compute Engine instance that will run that script. The script needs to end by deleting the instance it runs on.

We aren’t going to build the whole solution here, just give the outline. Here’s what the script will look like:

#!/bin/sh
apt update; apt install -y git
#
# Install other tools, get code from GitHub, run business
# logic to build web site pages, deploy to Firebase hosting
# -- Not included in this post
#
# Instance deletes itself below (see tutorial for details)
export METADATA=metadata.google.internal/computeMetadata/v1/instance
export NAME=$(curl -X GET http://$METADATA/name -H 'Metadata-Flavor: Google')
export ZONE=$(curl -X GET http://$METADATA/zone -H 'Metadata-Flavor: Google')
gcloud --quiet compute instances delete $NAME --zone=$ZONE

If we launch a machine with the startup script above (filled in with all the business logic specific details) it will pull our source content from GitHub, build website pages from that, then deploy it to our Firebase hosting site. Which leaves us with one more question: how do we launch such a machine from our Cloud Function (that unfinished update_the_site() function from the last post)? We use the google-api-python-client library. It’s pretty low-level, but there’s good sample code available that you can adapt to do this.
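
Here is a rough sketch of that launch code, in the style of Google’s published samples. The instance name, machine type, and boot image below are placeholder choices, and real code would need error handling:

import googleapiclient.discovery

def launch_builder(project, zone, startup_script):
    """Create a Compute Engine instance that runs startup_script and then deletes itself."""
    compute = googleapiclient.discovery.build('compute', 'v1')
    config = {
        'name': 'site-builder',
        'machineType': 'zones/{}/machineTypes/n1-standard-1'.format(zone),
        'disks': [{
            'boot': True,
            'autoDelete': True,
            'initializeParams': {
                'sourceImage': 'projects/debian-cloud/global/images/family/debian-9',
            },
        }],
        'networkInterfaces': [{
            'network': 'global/networks/default',
            'accessConfigs': [{'type': 'ONE_TO_ONE_NAT', 'name': 'External NAT'}],
        }],
        # The startup script rides along as instance metadata.
        'metadata': {
            'items': [{'key': 'startup-script', 'value': startup_script}],
        },
        # The instance needs permission to delete itself when it finishes.
        'serviceAccounts': [{
            'email': 'default',
            'scopes': ['https://www.googleapis.com/auth/cloud-platform'],
        }],
    }
    return compute.instances().insert(project=project, zone=zone, body=config).execute()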

So that’s the pipeline now:

I’m going to put this topic to rest for a while, but there are tips and tricks regarding secrets and permissions I’ll probably talk about soon.

Responding to GitHub Updates

This post is fourth in a series, beginning with Automating Web Site Updates.

Most of the picture of the process we need is filled in now. We have to deal with what happens between a GitHub PR being merged and an updated website being deployed on Firebase Hosting. This post is going to just deal with responding to a GitHub PR merge.

We need to know when a PR is merged so we can kick off the rest of the update process. Lucky for us, GitHub has a feature that will tell us that: webhooks. At its core it’s a really simple idea: when an event you care about happens, GitHub will make a web request to a URL of your choosing with information about the event in its body. You just need to provide a web request handler to receive it. So before we set up the webhook, let’s figure out what we will use to receive those requests. We need:

  • to run our own custom code
  • when triggered by an HTTP (actually HTTPS) request
  • containing information about a merged PR
  • without costing much when nothing is happening (which in this case, is probably 99% or more of the time)

That sounds tailor-made for a Cloud Function. We can write code in Go, Node, or Python and say we want it triggered by an HTTPS request. Cloud Functions gives us a URL and runs our code whenever a request is sent to that URL. We don’t pay for anything except time and memory while our code is running, not while it is idle waiting for a notification. The only problem is that functions are limited in what they can do. They can’t run long jobs, they have only a small file system available, and they only provide a few language options. We can’t install other software in them, either. But none of that is a problem because we are not trying to handle the website update in the cloud function, we just need to kick that off when it’s appropriate (a subject for the next blog post).

So we will use the Google Cloud Platform console to create a new HTTP-triggered Python Cloud Function. For now, we’ll leave the default sample code in it; we just want to know the URL for the next step: setting up a GitHub webhook.

Authorized GitHub repository users can set up a GitHub webhook in the repository’s Settings section. There’s a section just for Webhooks, and a button to add a new webhook. After you click that, there are some choices to be made:

  • The Payload URL is the address that GitHub will send the request to. That’s our cloud function’s URL from the step above.
  • The Content type specifies the format of the body of the request GitHub will send. The default is application/x-www-form-urlencoded, which is what a web page might send when a user submits a form. Since we want to get a possibly complicated data structure from GitHub, the second option, application/json, is a better choice for us.
  • The Secret is a string (shared between GitHub and your receiving application) that GitHub will use to create a signature for each web request. This is a non-standard way to check that a request really comes from the GitHub webhook you created, and not somewhere else. I created a long random password with a password manager for this.
  • We finally reach the question “Which events would you like to trigger this webhook?” We can choose “just the push event”, but we don’t much care about pushes, we want to know about merges. The second option, “send me everything,” would certainly include the merge events, but we don’t want to be bothered about the vast majority of events we’d be told about then. So we can say “Let me select individual events” and just hear about Pull Requests. That choice still includes a lot of events we don’t care about (creating PRs, closing them unmerged, labeling them, and so on) but it seems to be the narrowest choice that includes PR merges.
  • And we’re going to want to make this webhook Active.

When we click the Add Webhook button, GitHub will send a test request to the URL to see if there’s something at that address accepting incoming data. That should pass, since we already created a cloud function there, but if not, that’s okay for now. It only needs to work once the cloud function’s real code is in place.

Now that we have a webhook, every action on a PR on this repository will cause GitHub to send a JSON object to our cloud function. We need to verify that this request describes a PR merge, since we will get notification of all sorts of other PR events, too. And we need to make sure that this information really comes from our GitHub webhook, and not somebody trying to fool us into thinking a merge happened. Here’s an outline of what we need to do:

  • Load the JSON data in the request body into a Python object
    notification = request.get_json()
  • Check to see that this is a PR that is closed by a merge; if not, just exit, nothing to do here
    if notification.get('action') != 'closed':
        return 'OK', 200
    else:
        if notification.get('pull_request') is None:
            return 'OK', 200
        else:
            if not notification['pull_request'].get('merged'):
                return 'OK', 200
  • Check that the signature is valid; if not, just return a Forbidden response and exit, we aren’t going to deal with fake requests
    import hashlib, hmac, os
    secret = os.environ.get('SECRET', 'Missing!').encode()
    signature = request.headers.get('X-Hub-Signature')
    body = request.get_data()
    calc_sig = hmac.new(secret, body, hashlib.sha1)
    if signature != 'sha1={}'.format(calc_sig.hexdigest()):
        return 'Forbidden', 403
  • Kick off the next step that will actually publish an updated website based on the contents of the repository
    update_the_site()

Notice that the secret for the signature, which was provided to GitHub when creating the webhook, is fetched from an environment variable. That returns a string, which needs to be converted to bytes before being passed to the hash function. You can set up the necessary environment variable when creating or redeploying a cloud function. This is a better option than keeping the secret in the source code itself, which might be available to others in a source repository at some point.
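
For example, a deployment command along these lines sets that variable (the function name and secret value here are placeholders):

gcloud functions deploy github_webhook --runtime python37 --trigger-http \
  --set-env-vars SECRET=your-long-random-string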

Which leaves us with one big piece left to build, update_the_site(). That will be covered in the next post. Spoiler alert: the cloud function won’t be doing the update, it will just kick off some other tool to handle that.

So, our update process picture is nearly complete:

Jumping to the Other End

This post is third in a series, beginning with Automating Web Site Updates.

Readers are going to use web browsers to look at content, so we need some kind of web server to deliver the final formatted web pages. It’s static content, so there are lots of choices:

  1. A virtual machine running web server software like Apache or NGINX
  2. A web hosting service
  3. Cloud storage set for public access
  4. Serverless platforms

Option 1 is right out. I do not want to configure, manage, patch, and monitor a server. The second option might be okay, but most of them aren’t amenable to full automation. The third choice could be okay, but the cloud storage I’d prefer to use (Google Cloud Storage) doesn’t offer the ability to use HTTPS on a custom domain. That leaves a serverless solution.

Which serverless solution? I’m going to stick with Google Cloud Platform, but other cloud providers offer many similar services. GCP’s serverless offerings include Cloud Functions, Cloud Run, App Engine, and Firebase. I’d have to write code to respond to web requests for Cloud Functions or Cloud Run, so they’re out. Both App Engine and Firebase can serve static web pages without my writing any code, so they’re still looking good.

We want static web page hosting, via HTTPS, on a custom domain, and we want it cheap. We’d rather have it free (hey, is there an option to have them pay us?). Well, both App Engine and Firebase Hosting have free tiers available. So how to choose? I’ve used them both and they’d both work for this. We have to pick one, and I found Firebase Hosting to be easy, scalable, and affordable.

The solution will use Firebase Hosting for the last step.

We will need to build a static copy of the desired website, and use Firebase tools to deploy it to the service. Other than that, we don’t need to do anything to have the pages served in a scalable and reliable manner.
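
Concretely, the deployment step boils down to a couple of Firebase CLI commands run from the folder holding the built site (the init command is a one-time setup that picks the project and the public directory):

firebase init hosting
firebase deploy --only hosting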

The picture is beginning to be filled in:

Contributors → GitHub → ? → Firebase Hosting → Readers

The unknown center is shrinking. Next time we will jump back to the other side of that unknown and expand on what we need GitHub to do.

Working from the Outside In

This post is second in a series, beginning with Automating Web Site Updates.

Let’s start searching for a solution by starting at the edges, then filling in the middle. That is, “where does this process begin, and where does it end up?”

Well, it begins with contributors uploading their content in markdown format, and ends up with web pages delivered to readers. Here’s a vague picture:

Contributors → ? → Readers

That picture looks kind of familiar:

"Then a miracle occurs" Cartoon by S. Harris. Copyright ScienceCartoonsPlus.com, used with permission.

We need to fill in the middle of this picture, ideally with services that require little or no customization or coding on our part.

Let’s start with the contributors. They produce markdown files with their content, and need to send them somewhere where they can be reviewed, commented on, and possibly changed (by them or others) before they can be used. That sounds like a Git repository. We could set up our own git repo on a server somewhere, but remember: we want to use existing services whenever we can. And GitHub is just such a service (there are others, like GitLab, that would work fine too).

We will use GitHub for this first step. The contributors are all at least somewhat technical so they should have little trouble forking the main repository, adding content, and creating a pull request (PR) from their fork. If they need help there’s a lot of documentation at the site, in books, and on StackOverflow. And the editors can use regular GitHub tools and processes to manage the contributions, eventually resulting in a merged PR.

So, we’ve started to fill in the picture:

Contributors → GitHub → ? → Readers

Next time we will continue to work from the outside in by jumping to the other side and deciding on how to deliver web content to the readers. Then we’ll jump back to see where we go from GitHub.

Automating Web Site Updates: a Case Study

I was presented with a problem recently: automate the process of updating a web site when new contributions come in. The contributions are articles in markdown format; they need to be converted to web pages and inserted into the site’s content, and parts of the site (such as index pages) need to be updated accordingly. The contributors don’t have full publishing authority on their own – each submitted article is reviewed and perhaps sent for editing before it is accepted. The translation from markdown to web format can be done by an automated tool, but is usually run by hand by an editor.

The goal: automate this process as fully as possible. My goal: build a solution with cloud-native, preferably serverless, technology.

I’m going to take the next several blog posts to go over my approach to the problem and the eventual solution. I hope to get a new post up every few days. Stay tuned.

Runs of Digits in Pi

My colleague, Emma Haruka Iwao, just published the decimal value of Pi to 31,415,926,535,897 places. It’s an amazing accomplishment for all sorts of reasons, not least being how to run a calculation across many machines over several months successfully. Google Compute Engine’s Live Migration capability was key in keeping it running without a break.

But now, how about a much less amazing accomplishment? I wrote a program to scan those digits to look for runs of the same digit. I got the digits from Emma’s results. She has set up a web service anyone can use to fetch a bunch of the digits on demand. Technical details after a few results below.

How far do you have to go in the decimal value of Pi to find two identical successive digits? It turns out to be only a couple of dozen places. ’33’ is found at positions 25 and 26 (counting the first ‘3’ in Pi as position 0, the decimal point as position 1, and so on). Want to find three identical digits in a row? There’s a run of ‘111’ starting at position 34.

Interestingly (at least to me), the first run of more than three digits has six identical digits in a row: ‘999999’ starting at position 763. Want a run of seven? You’ll have to go to position 710,101 to find ‘3333333’. How about a run of eight? I don’t have the answer. I stopped my program from scanning the digits after about 10,000,000 places without finding that long a run.

I wrote a simple Python program to ask the web service for a thousand digits at a time (the most it will serve for each request), and scanned for long runs of digits. I say it’s simple, but it took me at least a dozen tweaks to get it working right. Still, the only special feature of it is how it gets its data: it makes an HTTP GET request to https://api.pi.delivery/v1/pi?start=0&numberOfDigits=1000 (with the 0 and 1000 replaced by the actual starting point and number of digits you want). The response is a JSON object with one field called content. The value of that field is a string containing the digits.
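
Here’s a sketch of that approach using the requests library. This isn’t my exact program, and note that whether the returned string includes the leading 3 and the decimal point affects how its positions line up with the counts above:

import requests

API = 'https://api.pi.delivery/v1/pi'
CHUNK = 1000  # the most digits the service returns per request

def digits(start, count=CHUNK):
    """Fetch count digits of Pi beginning at position start."""
    resp = requests.get(API, params={'start': start, 'numberOfDigits': count})
    resp.raise_for_status()
    return resp.json()['content']

def first_run(length, limit=10000000):
    """Return the position of the first run of length identical digits, or None."""
    current, run_length, position = '', 0, 0
    while position < limit:
        for ch in digits(position):
            if ch == current:
                run_length += 1
            else:
                current, run_length = ch, 1
            if run_length == length:
                return position - length + 1  # position of the run's first digit
            position += 1
    return None

print(first_run(2), first_run(3), first_run(6))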

It’s a nice little programming exercise. Try it yourself, and see if you can make as many errors as I did before you get it working.