π is an interesting number. It’s not only irrational, it’s transcendental. Its decimal representation “looks random” (it isn’t random, it’s precisely defined, but seems to pass tests for randomness). It seems likely that any pattern of digits you want can be found somewhere in that representation. It may take a while to find them, but given enough digits of π you can expect to find any given number. For example, Jenny’s number, 8675309, is found at the 9,202,591st position (counting the initial 3 in the number as being at position 0).
I found that out using a cloud application I created to search for any string of decimal digits in π. The approach I took to building it demonstrates some useful approaches to cloud application development and architecture. That’s what this post is about: not a comprehensive cloud architecture framework, just an example of how to think about architecting applications for the cloud. This post doesn’t have a detailed description of the application, but there is example code available at https://github.com/engelke/where-in-pi-is/. With a bit of work you could use that code to build and deploy your own π digit searcher. Or you can get inspiration on how to build applications to solve other problems you encounter.
If I’m going to find strings in the digits of π, I’ll have to have those digits available somehow. In theory, a program could calculate more and more digits as it searched for a string, stopping when the string is found. But that would be ridiculous; calculating those digits is an intensive, slow process.
Lucky for me, someone already calculated quite a few digits of π. On March 14, 2019, Google announced that Emma Haruka Iwao had calculated more than 31.4 trillion digits of π using Google Cloud Platform tools. That was a new world record when announced. And Google set up a web API for returning selected decimal places at pi.delivery. To get the seven digits starting at the 9,202,591st decimal place, make a web request to https://api.pi.delivery/v1/pi?start=9202591&numberOfDigits=7. You should see:
So one way to search for a seven digit number in π would be to just request the seven digits starting at position 0, then at position 1, and so on, until the number being looked for is returned.
Don’t try that. Really, don’t.
The service is pretty fast, taking about 75 ms per request of that size when I tried it out. At that rate, you’d find this number in about four years, unless Google blocked you first for abusing the API. I want faster answers.
So let’s figure out a faster way by building a cloud application to find digit strings in π. When we design for the cloud, we don’t think of a computer as the application platform. No, the entire collection of available web services is our platform. And we commonly will use several of those services, communicating in various ways with each other, to build an application. For the π searcher, we start by looking at each different action that needs to be done, and then perform each action as an independent piece. Then we’ll use cloud technology to get those pieces to work together.
First Piece: Searching
Something has to store the digits of π and search through them. There are links on the pi.delivery site to download the digits to your own storage for processing. The simplest fast way to stream through data is to put it in a file connected to a virtual machine, so that’s what I did: launched a virtual machine Google Compute Engine and put the digits into a text file named pi.txt. I wrote a Python program that memory-mapped that file to a Python byte string, and used the built-in find method for the search. I’d look for other ways if this was too slow.
A search for the seven digit string 8675309 took only 20ms. That’s much better than four years for the repeated calls to the API site. In fact, it’s good enough for me, for now. For every extra digit in length, we can expect to take about ten times as long to search. So looking for an 8 digit number (like a date) should take about a fifth of a second, ten digits (like a phone number with area code) about 20 seconds, and so on. And though this is fairly fast, a lot of searches take too long for the requester to wait for a response, as they would with a web page. Users will have to ask for a search and then later somehow get the result.
That’s the first piece of the application: a search box that can find a requested string of digits reasonably fast. See the
where_is function in https://github.com/engelke/where-in-pi-is/blob/master/computeengine/find-in-pi.py for a minimal solution. Now let’s look at what else is needed.
Second Piece: Requesting a Search
The next problem to be solved is how someone can request a search. I completely control the search box virtual machine, so I can just SSH into it and run Python code. That’s not going to work for anybody else. There needs to be some way people can ask for a search to be run. I can think of a few ways:
- Fill out a form on a web page
- Send an email to a special address
- Post a tweet, mentioning a particular Twitter account
- Send a text via SMS
- Make a voice phone call and say a number (or use a keypad)
We will have to build one or more of those ways, but before getting into that, let’s figure out how that front end that a user interacts with will get the request to the search box. We want the method used to be asynchronous (so the requesting program doesn’t have to wait and possibly get overloaded) and authenticated (so malicious attackers can’t flood the searcher with fake requests). Cloud platforms offer services exactly tailored to that need: message passing.
There are different flavors of message passing available; Google offers PubSub. An application can create one or more topics (for example, one called
search_requests) and components can publish messages to the topic and other components can subscribe to the topic, hence the name. Google PubSub guarantees published messages will be delivered at least once, and only rarely delivers a message more than once. That’s okay for our use, since the worst case is that we (very rarely) search for the same requested string more than once.
The front end getting a user request for a search is going to publish a message to the
search_requests topic, and the search box will subscribe to that same topic. A PubSub subscription can be configured as push messaging or pull messaging. Push messaging forces each message on the subscriber as soon as it is available, and keeps pushing each message until the subscriber acknowledges it or a retry or timeout limit is exceeded. Pull messaging waits for the subscriber to ask for available messages before delivering them.
Pull messaging is a good fit for this use, since the search box can have a loop that asks for messages, performs the searches being requested one at a time, and then asks for more messages. The search box can’t be overloaded this way, since it’s only running one search at a time. Of course, the backlog of messages will grow if the search box doesn’t consume them fast enough, so that will have to be watched. If the box has the capacity to perform more than one search at a time, we can just have multiple processes, each fetching and processing messages on their own. If we need to scale way up, we can deploy multiple search boxes.
Third Piece: User Request Front End
I listed five possible ways for a user to request a search earlier. We need to implement at least one of them, or there’s no point to having a search box. We will just do one of them for this example, and the easiest seems to be the first: give the user a web page with a form to fill out, requesting the search. If we needed to we could always implement more front ends later, just by having them each publish to the
search_requests PubSub topic.
There are lots of ways to provide a web page with a form and then accept the filled in form in return. If you are used to managing your own computing resources, you’ll likely think of setting up a virtual machine with web server software for that. That will work just fine, but requires doing our own system administration and maintenance. Serverless computing products let you write your application code and leave everything else, including scaling, monitoring, and logging, up to the platform.
We need a component that responds to web requests with a page containing a form for a GET request, and by publishing a message to the
search_requests topic for a POST request. The simplest way that I know to do that is with a Cloud Function. Providing this web user interface just requires writing a Python function that the platform will route web requests to. The code needs to see if the request is a GET, in which case it returns a web page with a form, or a POST, in which case it reads the values submitted with the form, and asks for a search by publishing a message to the
search_requests PubSub topic. You can see how easy it is to implement this as a Cloud Function at https://github.com/engelke/where-in-pi-is/blob/master/function/main.py.
The overall application now looks like:
Fourth Piece: Pushing a Result
One thing front end does not do is return the result of the search to the user. Since a search might take a while, or need to wait for other earlier searches to finish before it can start, the front end function is just responsible for triggering a search. The user is going to have to get the answer somewhere else. Possible “somewhere else” choices would include a web page (either requiring user login or having a unique request given to the user for each request) or an email message (to an address provided to the user).
For the moment, we’re going to postpone figuring out how to deliver a result to a user. Instead, we will set up another PubSub topic:
send_result. When the search box gets an answer, it will publish a message to that topic and trust that there’s a subscriber to that topic that will do the job of delivery. The message is going to have include information on how to deliver that result, which is going to vary depending on how the request came in. For many requests, the natural way to send a result will be via email. For those situations, the front end will have to collect an email address, including it in the message to the
search_requests topic, and the search box will have to then include it in the message to the
send_result topic. Other forms of requests might need responses via other methods, needing a Twitter username or a phone number to send a text to. The messages can simply include the type of response and appropriate address. Until we get to the next step, nothing needs to much care what those are.
Final Piece: Sending an Email Result
The final piece of the application is a component that subscribes to the
send_result topic and does the delivery. If we have multiple delivery methods, we could either have a single delivery component that handles them all, or separate components each subscribing to the topic and skipping any messages that they can’t handle. We will keep it simple here and handle every message by sending an email.
How do you send email using Google Cloud Platform? Well, there’s no easy way. There’s no email sending API, as some cloud providers have. If you set up a virtual machine with email software, it will probably be blocked to prevent possible spam. There are ways around this, and third-party services you can use, but I remember that Google App Engine used to have an email sending API. The latest version of App Engine no longer provides that API, but the previous App Engine version is still available, so that’s what we will use.
That’s right: we will set up an App Engine instance just to send emails based on PubSub messages sent to it.
Code for the App Engine for Python 2.7 application is at https://github.com/engelke/where-in-pi-is/tree/master/appengine. It uses a PubSub push subscription to send HTTP requests to it whenever a message is published to the
send_result topic. Those incoming messages include the email address to send the result to and the result itself, and include a shared secret token that is configured in the push subscription so that App Engine can be sure those incoming requests aren’t forged.
Wrapping It All Up
Our complete app looks like this:
The user interacts with a web form provided by a Cloud Function, which publishes a message to the
search_requests topic. The searching is done by a Compute Engine virtual machine that pulls search requests, finds the answer, and publishes the answer to the
send_result topic. Finally, an App Engine instance gets those requests to send results via a push subscription that topic, and emails the address originally provided by the user.
The code for the various pieces is available at https://github.com/engelke/where-in-pi-is. For the time being, there’s a live version of this application available at https://us-central1-engelke-pi-blog.cloudfunctions.net/where-in-py. I don’t promise to keep it running, but while it is live you can try it out. It will only search the first couple billion digits of π because storing the full 31.4 trillion digits that Emma calculated would be pretty expensive for a demo like this. Still, that will find most 7 or 8 digit numbers you ask for.
Give this a try, or build your own serverless cloud application using some of these ideas!