Elixir on Google Cloud Platform: hot code reloading?

I stepped through this guide for a Hello World app running on Google Cloud Platform, to see how quickly you can get something working and how convenient it would be to use longer-term.

In doing so, I confirmed GCP doesn't make use of Erlang's hot code reloading when re-deploying apps, but how important is that?

How long does it take to get started?

Starting point: OS X machine with Python installed; Google account with developer access and an existing payment profile.

Minutes elapsed   Step
0                 Start!
1                 Create new Google Cloud project
3                 Add billing information (using existing payment profile) – they labour the point that you won't actually be charged
8                 Google Cloud SDK installed and initialised
13                Elixir and Node packages installed locally
15                Local development server up and running
19                Distillery-based release running locally
20                Start initial deploy to Google Cloud
33                Initial deploy finished
35                Start to deploy update to the app
45                App update finished

Surprises, good and bad

  • Considering I deliberately began from a pretty unprepared starting point, I was pleased with how quickly the initial setup stages went (creating a new GCP project, installing the SDK locally, …). Quite often, these types of quick start tutorials seem quick on the surface of it, but it turns out there's a bunch of annoying admin you need to do first to get to the real starting point. Not so with this one.
  • This is a community-provided tutorial, and there are other very similarly named tutorials (see this list) with little to no distinction made about the differences between them, or why you might want to choose this over that. In addition, there are rudimentary Getting Started instructions on the GCP Elixir page which are different from the tutorials. It's easy to get confused or distracted when there's not a single, official source of information.
  • Erlang applications (and Elixir apps by extension) have a great feature, hot code loading, which means that the application can be changed live, without needing to take down and restart the service. However, as far as I know, this feature isn't (and can't be) used on any of the full-service cloud platforms. You could implement it yourself on EC2 or similar, of course, but I'd prefer not to. There's more detail about why this is important below.
  • The update to the app took longer than I was expecting. The initial deploy needs to set up all sorts of infrastructure, I'm sure, so it's understandable it took a while (about 13 minutes in my case). However, the subsequent deploy should be much more straightforward and I'm surprised it took 10 minutes.
  • As is the norm (I find) with Google services, there wasn't a focus on an amazing developer experience. The platform is undoubtedly more extensible and configurable than something like Heroku, and the trade-off is that you have to wade through a bit more complexity to get things up and running, and to do common tasks.

Hot code upgrades

As mentioned above, Erlang and Elixir apps can be updated on the fly without stopping the app. In GCP's case, this feature isn't being used. I proved this to myself with the following test:

By including an Agent in the Phoenix app, I could store state between requests:

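# words.ex
defmodule AppengineExample.Words do
  use Agent

  # Starts the Agent with an empty list, registered under the module name.
  def start_link do
    Agent.start_link(fn -> [] end, name: __MODULE__)
  end

  # Appends a word to the end of the stored list.
  def put(value) do
    Agent.update(__MODULE__, fn state -> state ++ [value] end)
  end

  # Returns every word stored so far, oldest first.
  def get do
    Agent.get(__MODULE__, fn state -> state end)
  end
end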

I hooked my controllers up to that Agent, to save and retrieve words submitted from a <form>:

# page_controller.ex
defmodule AppengineExampleWeb.PageController do
  use AppengineExampleWeb, :controller

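  # Renders the page with every word stored in the Agent so far.
  def index(conn, _params) do
    render(conn, "index.html", words: AppengineExample.Words.get())
  end

  # Stores the submitted word, then redirects back to the list.
  def save(conn, %{"word" => word}) do
    AppengineExample.Words.put(word)
    redirect(conn, to: "/")
  end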
end

Locally, I could update the code of a running server with something like:

# create the initial release
env MIX_ENV=prod mix release --env=prod

# run the server in the background
env PORT=8080 _build/prod/rel/appengine_example/bin/appengine_example foreground &

# edit mix.exs to bump the version
vim mix.exs

# create the upgrade release
env MIX_ENV=prod mix release --upgrade --env=prod

# deploy the upgrade
_build/prod/rel/appengine_example/bin/appengine_example upgrade 0.0.2

Crucially, state stored in the Agent was preserved throughout this upgrade.
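
The state survives because a hot upgrade swaps module code inside the running VM, while the Agent process itself keeps running. As a much-simplified sketch of that principle (not of what Distillery does, which also involves appup and relup files), the script below reloads a module at runtime and checks that an Agent's state is untouched. The names `VersionSketch` and `:words_sketch` are made up for illustration.

```elixir
# Silence the "redefining module" warning for this demonstration.
Code.compiler_options(ignore_module_conflict: true)

# Load "version 1" of a hypothetical module into the running VM.
Code.eval_string("""
defmodule VersionSketch do
  def version, do: 1
end
""")

# Start an Agent and give it some state, as the Words module does.
{:ok, _pid} = Agent.start_link(fn -> [] end, name: :words_sketch)
Agent.update(:words_sketch, fn state -> state ++ ["hello"] end)

before_reload = VersionSketch.version()

# Load "version 2" over the top, without stopping anything.
Code.eval_string("""
defmodule VersionSketch do
  def version, do: 2
end
""")

after_reload = VersionSketch.version()
words = Agent.get(:words_sketch, fn state -> state end)

IO.inspect({before_reload, after_reload, words})
```

Run as a script, this should print `{1, 2, ["hello"]}`: the module's behaviour changed, but the word stored before the reload is still there.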

When performing an analogous update on the Google Cloud app, the Agent's state isn't preserved. GCP appears to follow the usual process of spinning up a new version of the app, then cutting over at the load-balancer level.

For webapps in frameworks like Rails or Django, this isn't a problem, as you don't store state in the app server. To achieve something similar to what I did using Agents in Elixir, you'd use Redis or similar.

However, for Elixir apps it's a shame, as process- or Agent-based state is extremely convenient. Not only that, but because it's so idiomatic there's a plethora of documentation encouraging its usage. New users in particular might get a nasty shock if they lean on process-based state then find that app updates wipe it out.
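
To make that failure mode concrete, here is a minimal sketch of what a restart does to Agent state. It stands in for a redeploy by stopping and restarting the process within a single script. The name `:words_demo` is hypothetical, and on a real platform the whole VM would be replaced rather than one process.

```elixir
# Start an Agent with an empty list and store a word in it.
{:ok, pid} = Agent.start_link(fn -> [] end, name: :words_demo)
Agent.update(:words_demo, fn state -> state ++ ["hello"] end)
before_restart = Agent.get(:words_demo, fn state -> state end)

# Stop the process (standing in for the old app version being shut down)...
:ok = Agent.stop(pid)

# ...and start a fresh one (standing in for the new version coming up).
{:ok, _new_pid} = Agent.start_link(fn -> [] end, name: :words_demo)
after_restart = Agent.get(:words_demo, fn state -> state end)

IO.inspect({before_restart, after_restart})
```

The second list comes back empty, because nothing carried the state across the restart. Anything that must outlive a deploy needs to live outside the process, as with the Redis approach mentioned above.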