Finding Patterns in π

π is an interesting number. It’s not only irrational, it’s transcendental. Its decimal representation “looks random” (it isn’t random, it’s precisely defined, but seems to pass tests for randomness). It seems likely that any pattern of digits you want can be found somewhere in that representation. It may take a while to find them, but given enough digits of π you can expect to find any given number. For example, Jenny’s number, 8675309, is found at the 9,202,591st position (counting the initial 3 in the number as being at position 0).

I found that out using a cloud application I created to search for any string of decimal digits in π. The approach I took to building it demonstrates some useful approaches to cloud application development and architecture. That’s what this post is about: not a comprehensive cloud architecture framework, just an example of how to think about architecting applications for the cloud. This post doesn’t have a detailed description of the application, but there is example code available at https://github.com/engelke/where-in-pi-is/. With a bit of work you could use that code to build and deploy your own π digit searcher. Or you can get inspiration on how to build applications to solve other problems you encounter.

The Data

If I’m going to find strings in the digits of π, I’ll have to have those digits available somehow. In theory, a program could calculate more and more digits as it searched for a string, stopping when the string is found. But that would be ridiculous; calculating those digits is an intensive, slow process.

Lucky for me, someone already calculated quite a few digits of π. On March 14, 2019, Google announced that Emma Haruka Iwao had calculated more than 31.4 trillion digits of π using Google Cloud Platform tools. That was a new world record when announced. And Google set up a web API for returning selected decimal places at pi.delivery. To get the seven digits starting at the 9,202,591st decimal place, make a web request to https://api.pi.delivery/v1/pi?start=9202591&numberOfDigits=7. You should see:

{"content":"8675309"}

So one way to search for a seven digit number in π would be to just request the seven digits starting at position 0, then at position 1, and so on, until the number being looked for is returned.

Don’t try that. Really, don’t.

The service is pretty fast, taking about 75 ms per request of that size when I tried it out. At that rate, you’d find this number in about four years, unless Google blocked you first for abusing the API. I want faster answers.

So let’s figure out a faster way by building a cloud application to find digit strings in π. When we design for the cloud, we don’t think of a computer as the application platform. No, the entire collection of available web services is our platform. And we commonly will use several of those services, communicating in various ways with each other, to build an application. For the π searcher, we start by looking at each different action that needs to be done, and then perform each action as an independent piece. Then we’ll use cloud technology to get those pieces to work together.

First Piece: Searching

Something has to store the digits of π and search through them. There are links on the pi.delivery site to download the digits to your own storage for processing. The simplest fast way to stream through data is to put it in a file connected to a virtual machine, so that’s what I did: launched a virtual machine Google Compute Engine and put the digits into a text file named pi.txt. I wrote a Python program that memory-mapped that file to a Python byte string, and used the built-in find method for the search. I’d look for other ways if this was too slow.

A search for the seven digit string 8675309 took only 20ms. That’s much better than four years for the repeated calls to the API site. In fact, it’s good enough for me, for now. For every extra digit in length, we can expect to take about ten times as long to search. So looking for an 8 digit number (like a date) should take about a fifth of a second, ten digits (like a phone number with area code) about 20 seconds, and so on. And though this is fairly fast, a lot of searches take too long for the requester to wait for a response, as they would with a web page. Users will have to ask for a search and then later somehow get the result.

That’s the first piece of the application: a search box that can find a requested string of digits reasonably fast. See the where_is function in https://github.com/engelke/where-in-pi-is/blob/master/computeengine/find-in-pi.py for a minimal solution. Now let’s look at what else is needed.

Second Piece: Requesting a Search

The next problem to be solved is how someone can request a search. I completely control the search box virtual machine, so I can just SSH into it and run Python code. That’s not going to work for anybody else. There needs to be some way people can ask for a search to be run. I can think of a few ways:

  1. Fill out a form on a web page
  2. Send an email to a special address
  3. Post a tweet, mentioning a particular Twitter account
  4. Send a text via SMS
  5. Make a voice phone call and say a number (or use a keypad)

We will have to build one or more of those ways, but before getting into that, let’s figure out how that front end that a user interacts with will get the request to the search box. We want the method used to be asynchronous (so the requesting program doesn’t have to wait and possibly get overloaded) and authenticated (so malicious attackers can’t flood the searcher with fake requests). Cloud platforms offer services exactly tailored to that need: message passing.

There are different flavors of message passing available; Google offers PubSub. An application can create one or more topics (for example, one called search_requests) and components can publish messages to the topic and other components can subscribe to the topic, hence the name. Google PubSub guarantees published messages will be delivered at least once, and only rarely delivers a message more than once. That’s okay for our use, since the worst case is that we (very rarely) search for the same requested string more than once.

The front end getting a user request for a search is going to publish a message to the search_requests topic, and the search box will subscribe to that same topic. A PubSub subscription can be configured as push messaging or pull messaging. Push messaging forces each message on the subscriber as soon as it is available, and keeps pushing each message until the subscriber acknowledges it or a retry or timeout limit is exceeded. Pull messaging waits for the subscriber to ask for available messages before delivering them.

Pull messaging is a good fit for this use, since the search box can have a loop that asks for messages, performs the searches being requested one at a time, and then asks for more messages. The search box can’t be overloaded this way, since it’s only running one search at a time. Of course, the backlog of messages will grow if the search box doesn’t consume them fast enough, so that will have to be watched. If the box has the capacity to perform more than one search at a time, we can just have multiple processes, each fetching and processing messages on their own. If we need to scale way up, we can deploy multiple search boxes.

Third Piece: User Request Front End

I listed five possible ways for a user to request a search earlier. We need to implement at least one of them, or there’s no point to having a search box. We will just do one of them for this example, and the easiest seems to be the first: give the user a web page with a form to fill out, requesting the search. If we needed to we could always implement more front ends later, just by having them each publish to the search_requests PubSub topic.

There are lots of ways to provide a web page with a form and then accept the filled in form in return. If you are used to managing your own computing resources, you’ll likely think of setting up a virtual machine with web server software for that. That will work just fine, but requires doing our own system administration and maintenance. Serverless computing products let you write your application code and leave everything else, including scaling, monitoring, and logging, up to the platform.

We need a component that responds to web requests with a page containing a form for a GET request, and by publishing a message to the search_requests topic for a POST request. The simplest way that I know to do that is with a Cloud Function. Providing this web user interface just requires writing a Python function that the platform will route web requests to. The code needs to see if the request is a GET, in which case it returns a web page with a form, or a POST, in which case it reads the values submitted with the form, and asks for a search by publishing a message to the search_requests PubSub topic. You can see how easy it is to implement this as a Cloud Function at https://github.com/engelke/where-in-pi-is/blob/master/function/main.py.

The overall application now looks like:

Architectural diagram

Fourth Piece: Pushing a Result

One thing front end does not do is return the result of the search to the user. Since a search might take a while, or need to wait for other earlier searches to finish before it can start, the front end function is just responsible for triggering a search. The user is going to have to get the answer somewhere else. Possible “somewhere else” choices would include a web page (either requiring user login or having a unique request given to the user for each request) or an email message (to an address provided to the user).

For the moment, we’re going to postpone figuring out how to deliver a result to a user. Instead, we will set up another PubSub topic: send_result. When the search box gets an answer, it will publish a message to that topic and trust that there’s a subscriber to that topic that will do the job of delivery. The message is going to have include information on how to deliver that result, which is going to vary depending on how the request came in. For many requests, the natural way to send a result will be via email. For those situations, the front end will have to collect an email address, including it in the message to the search_requests topic, and the search box will have to then include it in the message to the send_result topic. Other forms of requests might need responses via other methods, needing a Twitter username or a phone number to send a text to. The messages can simply include the type of response and appropriate address. Until we get to the next step, nothing needs to much care what those are.

Final Piece: Sending an Email Result

The final piece of the application is a component that subscribes to the send_result topic and does the delivery. If we have multiple delivery methods, we could either have a single delivery component that handles them all, or separate components each subscribing to the topic and skipping any messages that they can’t handle. We will keep it simple here and handle every message by sending an email.

How do you send email using Google Cloud Platform? Well, there’s no easy way. There’s no email sending API, as some cloud providers have. If you set up a virtual machine with email software, it will probably be blocked to prevent possible spam. There are ways around this, and third-party services you can use, but I remember that Google App Engine used to have an email sending API. The latest version of App Engine no longer provides that API, but the previous App Engine version is still available, so that’s what we will use.

That’s right: we will set up an App Engine instance just to send emails based on PubSub messages sent to it.

Code for the App Engine for Python 2.7 application is at https://github.com/engelke/where-in-pi-is/tree/master/appengine. It uses a PubSub push subscription to send HTTP requests to it whenever a message is published to the send_result topic. Those incoming messages include the email address to send the result to and the result itself, and include a shared secret token that is configured in the push subscription so that App Engine can be sure those incoming requests aren’t forged.

Wrapping It All Up

Our complete app looks like this:

The user interacts with a web form provided by a Cloud Function, which publishes a message to the search_requests topic. The searching is done by a Compute Engine virtual machine that pulls search requests, finds the answer, and publishes the answer to the send_result topic. Finally, an App Engine instance gets those requests to send results via a push subscription that topic, and emails the address originally provided by the user.

The code for the various pieces is available at https://github.com/engelke/where-in-pi-is. For the time being, there’s a live version of this application available at https://us-central1-engelke-pi-blog.cloudfunctions.net/where-in-py. I don’t promise to keep it running, but while it is live you can try it out. It will only search the first couple billion digits of π because storing the full 31.4 trillion digits that Emma calculated would be pretty expensive for a demo like this. Still, that will find most 7 or 8 digit numbers you ask for.

Give this a try, or build your own serverless cloud application using some of these ideas!

Serverless computing … with Pascal

This story is cross-posted on Medium as well.

Interested in serverless computing but don’t want to use some newfangled programming language like Python, JavaScript, or Go? Then why not write your serverless web app in Pascal? Let’s go back to the 1970s by taking a Pascal program from the definitive Pascal User Manual and Report, published in 1974, and deploy it to the Google Cloud Run serverless platform. Oh, and here’s a 1970s background music playlist to get you focused.

What’s that? You don’t have a need to run serverless Pascal? Then maybe you have some other legacy software in only executable form, or a language not widely supported by serverless platforms, or built out of several different apps. The techniques shown here can solve those problems, too.

Some background

Google Cloud Functions and similar serverless solutions make it extremely easy to deploy functionality to the cloud. You just write your application code, upload it to the service, and they handle deployment, provisioning, infrastructure, scaling, logging, and security for you. They’re wonderful, but require a trade-off. They will only accept a single program written in one of their supported languages. And Pascal is not (yet?) one of those supported languages.

But Cloud Run provides similar capabilities with a slightly different trade-off. You can use other languages (like Pascal!), executable files, or multiple programs but you need to provide a container, not just source code. That’s a little more work, but less than you might think, because Cloud Build will do it for you. And you get all the normal serverless benefits like automatic scaling (even to zero when your code isn’t running). Let’s see how.

TL;DR: One-button deployment

The sample project repository is on GitHub. Take a look at it. The README file is displayed and there’s a big button near the top of it:

If you have a Google Cloud account, such as a Gmail account, you can click that button, answer any prompts it displays, and in a few minutes you will have a running web service built from the Pascal program in the repository. The URL will be displayed when the service is deployed.

You can fork this repository and change it to run your own Pascal code (or, with a little more tweaking, any other code) and launch your own new service the same way. It’s even possible to make the service run at a URL on your own domain.

How it works

I’ve taken a Pascal program and deployed it to Cloud Run as a web service. It takes a number and returns the same number, but in Roman numerals. Want to see what 1974 (the year the program I’m using was published) looks like in Roman numerals? Just call the RESTful service via https://roman.engelke.dev/1974 to find out. Or you can put a different number in the URL to convert it. This program doesn’t use Roman numeral shortcuts (like IX for 9) so there can be four of one letter in a row.

To build this from scratch yourself, first create a new folder to hold all the pieces, or simply clone the GitHub repository to a new folder with the command below:

git clone https://github.com/engelke/cloud-run-pascal.git

The folder will contain the Pascal program and any other needed pieces to make it work in Cloud Run. First up is the Pascal program itself:

Clean and simple, and look: clauses are separated by semicolons and the program ends with a period! Newer languages aren’t so well punctuated. All this program does is read an integer from standard input and write the Roman numeral equivalent to standard output. No modern web technology is needed for that. Which is good, because Pascal is decades older than the web. It’s even older than the Internet Protocol, IP.

The Pascal program doesn’t understand the web, but the container must include a web server. When a web request comes in to Cloud Run, it will run the container and send the request to the web server in it. That web server is provided by a Python wrapper program. The wrapper program doesn’t understand the application, it’s glue software that listens for a web request, pulls the number out of the end of the URL, and runs the Pascal program with that number as its input. If the Pascal program crashes, the wrapper returns a 500 Server Error message. If the program writes an error message, the wrapper returns a 400 Bad Request message. Otherwise, the wrapper takes the output of the Pascal program and returns it as the body of the response.

Along with the Python wrapper program there is a requirements.txt file that specifies which Python libraries are needed by the program. Again, this is just needed to wrap around the real program we need to run, which is in Pascal.

The only other thing needed is the Dockerfile, a text file that tells how to build the container. Let’s take a look at it.

  • FROM python:3.7-slim
    Build the container on top of a standard one called python:3.7-slim.
  • ENV APP_HOME /app
    WORKDIR $APP_HOME
    COPY . ./

    Specify where in the container to place the code, and copy the files in the current directory there.
  • RUN pip install -r requirements.txt
    As part of building the container, run this command to install the libraries needed by the Python wrapper program.
  • RUN apt-get update -y -q
    RUN apt-get install -y -q fpc
    RUN fpc roman.pas

    Pascal source code can’t be run directly, it must be compiled (converted to a binary executable) first. So, when building the container, these lines say to run commands to install a Pascal compiler called fpc and then use it to compile the roman.pas program. The binary executable produced inside the container will be just be called roman.
  • CMD exec gunicorn --bind:$PORT --workers 1 --threads 8 app:app
    After the container is built and deployed, whenever it is run it should invoke this command, which starts the Python wrapper program and has it listen for web requests on the network port provided by the container.

If you take a look at a basic Cloud Run quickstart tutorial, you’ll see most of these pieces in it. The only new part, which is needed to run the Pascal code, is the three lines that install and then run the Pascal compiler to turn the Pascal source into an executable file.

Build and deploy

Once you’ve created or cloned a folder with the four needed files: roman.pasapp.pyrequirements.txt, and Dockerfile, you can deploy it to Cloud Run. You will need a Google account (such as a Gmail account) and you can either install the Google Cloud SDK or your own computer, or use Cloud Shell (which already has the SDK installed) from inside your browser to run the necessary commands.

Ready? Here are the steps:

  1. Go to the Google Cloud console in your browser, and log in if you aren’t already. If this is the first time you’ve used the console you will probably have to agree to terms and conditions.
  2. Create a new project by clicking on the drop-down at the top of the page (that will either say “Select a project” or show a selected project) then clicking NEW PROJECT and entering the name you want. Wait a few minutes for the project to be created, then select it from the drop-down at the top of the page.
  3. Back at the command line on your computer, or in Cloud Shell, have Google Cloud build your container:
    gcloud builds submit --tag gcr.io/PROJECT-ID/my-program-name
    (where PROJECT-ID is the one created when you created the project, and my-program-name is any name you want to use to describe it). Answer any prompts you are shown — the choices should be clear. This builds your container and saves it in a cloud container repository under your control.
  4. Now deploy the container to Cloud Run:
    gcloud beta run deploy \
    --image gcr.io/PROJECT-ID/my-program-name \
    --platform managed

    You can put the command just on a single long line. Remove the backslashes if you do. Again, answer any prompts displayed.
  5. In a few minutes, the command line will display the service URL. You can open it in your browser, but remember to append /number to it, for some decimal number, to get the Roman numeral version back.

You now have a 45 year old Pascal program running as a serverless cloud app. Maybe you don’t need that, but some day you might need something else that’s a bit outside what most serverless platforms support, but which you can do in a container. That’s when Cloud Run will pay off for you.

Final thoughts

Your deployed app’s URL will be provided by Google, but you might want a friendlier option. You can connect your app to a URL on a domain you own if you’d like. It’s a bit tricky, but if you have set up your own domain it shouldn’t pose a problem. I did it, and my Roman numeral service is available at roman.engelke.dev, e.g., https://roman.engelke.dev/2345.

One of the slowest parts of building the container in this example is installing the Pascal compiler. I could have compiled the Pascal program on my own computer and used the executable instead of the source code in my folder when doing the build. That would let me skip the steps that install the compiler in the container. But since building a container happens rarely, I decided I’d rather be sure that my latest source code was always being used by compiling it as part of that step.

This tutorial uses fully managed Cloud Run, which has Google handle all the work for you. But you can also deploy to Cloud Run on GKE, either on Google Cloud Platform or even on your own premises, if that’s important for your app.

Runs of Digits in Pi

My colleague, Emma Haruka Iwao, just published the decimal value of Pi to 31,415,926,535,897 places. It’s an amazing accomplishment for all sorts of reasons, not least being how to run a calculation across many machines over several months successfully. Google Compute Engine’s Live Migration capability was key in keeping it running without a break.

But now, how about a much less amazing accomplishment? I wrote a program to scan those digits to look for runs of the same digit. I got the digits from Emma’s results. She has set up a web service anyone can use to fetch a bunch of the digits on demand. Technical details after a few results below.

How far do you have to go in the decimal value of Pi to find two identical successive digits? It turns out to be only a couple of dozen places. ’33’ is found at positions 25 and 26 (counting the first ‘3’ in Pi as position 0, the decimal point as position 1, and so on). Want to find three identical digits in a row? There’s a run of ‘111’ starting at position 34.

Interestingly (at least to me), the first run of more than three digits has six identical digits in a row: ‘999999’ starting at position 763. Want a run of seven? You’ll have to go to position 710,101 to find ‘3333333’. How about a run of eight? I don’t have the answer. I stopped my program from scanning the digits after about 10,000,000 places without finding that long a run.

I wrote a simple Python program to ask the web service for a thousand digits at a time (the most it will serve for each request), and scanned for long runs of digits. I say it’s simple, but it took me at least a dozen tweaks to get it working right. Still, the only special feature of it is how it gets its data: it makes an HTTP GET request to https://api.pi.delivery/v1/pi?start=0&numberOfDigits=1000 (with the 0 and 1000 replaced by the actual starting point and number of digits you want). The response is a JSON object with one field called content. The value of that field is a string containing the digits.

It’s a nice little programming exercise. Try it yourself, and see if you can make as many errors as I did before you get it working.