Storing JSON Blobs in Amazon S3 With Elixir
The idea for using S3 with Elixir came from a real project and a blog post I stumbled on.
I was working on a Ruby on Rails application with millions of rows of data quite a while ago. We would often pull reports from this data, and since the app ran on the Heroku platform, we would regularly see warnings in our Slack monitoring channel about memory consumption being over 100%.
We were worried that this approach would not be sustainable, so we wanted to start archiving the data that no longer needed to be available in real time. I found a great post on the Moz blog about how they essentially used Amazon S3 buckets as a cold storage solution (from their description, I assume they used something similar to a nested JSON data file).
JSON Blob Experiment
This led me to start experimenting with storing JSON files (let’s call them “JSON blobs”) on Amazon S3 with Elixir. It followed the approach Moz described in the aforementioned blog post and gave me a chance to play with Elixir.
Here is the five-step approach I used to have an Elixir project store JSON blobs on S3.
Note: At the time I created this, Phoenix 1.2 was the current version. I’m updating the commands to use Phoenix 1.3, although the structure of the code may resemble Phoenix 1.2.
Step 1: Create Mix Project with ex_aws
First, I generated a new Phoenix project.
mix phx.new json_blobber
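In the mix.exs file, I also added the ex_aws package as a dependency for storing files in Amazon S3, along with sweet_xml, hackney, and poison, which it relies on for XML parsing, HTTP requests, and JSON handling, as shown below.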
# Type `mix help deps` for examples and options.
defp deps do
  [
    {:phoenix, "~> 1.3.0-rc"},
    {:phoenix_pubsub, "~> 1.0"},
    {:phoenix_ecto, "~> 3.2"},
    {:postgrex, ">= 0.0.0"},
    {:phoenix_html, "~> 2.6"},
    {:phoenix_live_reload, "~> 1.0", only: :dev},
    {:gettext, "~> 0.11"},
    {:ex_aws, "~> 1.1"},
    {:sweet_xml, "~> 0.6"},
    {:hackney, "~> 1.7"},
    {:poison, "~> 3.1"},
    {:cowboy, "~> 1.0"}
  ]
end
Step 2: Module to Generate the JSON records
The next step was to create a JsonGenerator module. This module simply generated the “JSON blobs” I wanted to store on S3, as shown below.
defmodule JsonGenerator do
  # Builds n random records and encodes them as a single JSON array.
  def generate_records(n) do
    for _ <- 1..n do
      json_record()
    end
    |> Poison.encode!()
  end

  # One fake video-stats record with randomized values.
  def json_record do
    %{
      video_id: :rand.uniform(200_000),
      snapshot_id: :rand.uniform(56_000),
      view_count: :rand.uniform(50_000),
      comment_count: :rand.uniform(500),
      like_count: :rand.uniform(150),
      dislike_count: nil,
      share_count: :rand.uniform(400),
      created_at: random_date_string(),
      updated_at: random_date_string(),
      status: "fetched"
    }
  end

  # Returns a zero-padded "YYYY-MM-DD HH:MM:SS" string (hours run 0..23).
  defp random_date_string do
    [year, month, day, hours, minutes, seconds] =
      [Enum.random(1900..2020), Enum.random(1..12), Enum.random(1..28),
       Enum.random(0..23), Enum.random(0..59), Enum.random(0..59)]
      |> Enum.map(fn e ->
        e
        |> Integer.to_string()
        |> String.pad_leading(2, "0")
      end)

    "#{year}-#{month}-#{day} #{hours}:#{minutes}:#{seconds}"
  end
end
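As an aside, the hand-padded date string above could also be built with the standard library's NaiveDateTime, which validates the fields and handles the zero-padding. This is only a sketch of an alternative, not the original approach; the module name RandomTimestamp is hypothetical, and it assumes Elixir 1.3 or later, where NaiveDateTime is available.

```elixir
defmodule RandomTimestamp do
  # Hypothetical helper, not part of the original JsonGenerator module.
  # Returns a random "YYYY-MM-DD HH:MM:SS" string.
  def generate do
    # Days are capped at 28 so every month/day combination is valid.
    {:ok, timestamp} =
      NaiveDateTime.new(
        Enum.random(1900..2020),
        Enum.random(1..12),
        Enum.random(1..28),
        Enum.random(0..23),
        Enum.random(0..59),
        Enum.random(0..59)
      )

    NaiveDateTime.to_string(timestamp)
  end
end

IO.puts(RandomTimestamp.generate())
```

Because NaiveDateTime.new returns an error tuple for an invalid field, a typo such as an hour range of 0..24 would surface as a match error rather than as a silently invalid timestamp.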
Step 3: Module to Write to AWS
To hew to the “separation of concerns” principle, I created another module specifically to write to AWS. Using Elixir’s import statement, I imported the JsonGenerator module to use the JSON blob generation I created in Step 2.
The API for the JsonAws module is simple. The write_records function writes a user-specified number of JSON blobs to S3 using the configured S3 bucket. The read_file function then reads back the file that stores the JSON blobs and decodes it.
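defmodule JsonAws do
  # The bucket name is read from the :ex_aws config at compile time.
  @s3_bucket Application.get_env(:ex_aws, :s3_bucket)

  import JsonGenerator

  # Generates n records and uploads them as a single S3 object.
  def write_records(n) do
    n
    |> generate_records()
    |> (&ExAws.S3.put_object(@s3_bucket, "json_file.txt", &1)).()
    |> ExAws.request!()
  end

  # Downloads the object and returns the result of Poison.decode/1
  # ({:ok, records} on success).
  def read_file(file \\ "json_file.txt") do
    file
    |> (&ExAws.S3.get_object(@s3_bucket, &1)).()
    |> ExAws.request!()
    |> parse_json()
  end

  # Only a 200 response is handled; anything else raises FunctionClauseError.
  defp parse_json(%{body: body, status_code: 200}) do
    Poison.decode(body)
  end
end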
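The parse_json function above only matches a response with status code 200. One possible way to make that step more forgiving is to add a fallback clause that returns an error tuple. The sketch below illustrates the idea in isolation: the ResponseParser module, the decoder parameter (standing in for Poison.decode/1 so the example runs without dependencies), and the :unexpected_status tag are all hypothetical names, not part of ex_aws or the original code.

```elixir
defmodule ResponseParser do
  # `decoder` is a hypothetical stand-in for Poison.decode/1.
  def parse(%{status_code: 200, body: body}, decoder) when is_function(decoder, 1) do
    decoder.(body)
  end

  # Any other status becomes an error tuple instead of a FunctionClauseError.
  def parse(%{status_code: code}, _decoder) do
    {:error, {:unexpected_status, code}}
  end
end

# A toy decoder that just wraps the upcased body, for demonstration only.
decoder = fn body -> {:ok, String.upcase(body)} end

IO.inspect(ResponseParser.parse(%{status_code: 200, body: "abc", headers: []}, decoder))
IO.inspect(ResponseParser.parse(%{status_code: 404, body: "", headers: []}, decoder))
```

In the real module the decoder argument would not be needed; the second clause is the part worth borrowing.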
Step 4: Set Up Your AWS Keys
One of the last steps is to configure ex_aws. You can copy the configuration below into your config.exs file.
In config.exs:
config :ex_aws,
  access_key_id: [{:system, "AWS_ACCESS_KEY_ID"}, :instance_role],
  secret_access_key: [{:system, "AWS_SECRET_ACCESS_KEY"}, :instance_role],
  region: System.get_env("AWS_DEFAULT_REGION"),
  s3_bucket: System.get_env("S3_BUCKET")
To make this all work, you’ll have to set up your AWS keys in your Amazon account, put those values in a .env file in the top level of your application, and load that file into your shell (for example, with source .env) before starting the app.
Set up a .env file as follows:
export AWS_ACCESS_KEY_ID=yyy
export AWS_SECRET_ACCESS_KEY=yyy
export AWS_DEFAULT_REGION=us-east-1
export S3_BUCKET=yyy
You get these values from the Amazon AWS management console. You may have to sign up for an account if you don’t have one.
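Since a missing variable here tends to show up later as a confusing AWS error, it can help to check the environment up front. The following is a minimal sketch of such a check; the EnvCheck module is a hypothetical helper and not part of ex_aws, and the variable names are the ones from the .env file above.

```elixir
defmodule EnvCheck do
  # Hypothetical helper: the names match the .env file in this post.
  @required ~w(AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION S3_BUCKET)

  # Returns the required variable names that are unset or empty.
  # `env` defaults to the real environment, but a map can be passed for testing.
  def missing(env \\ System.get_env()) do
    Enum.filter(@required, fn name -> Map.get(env, name) in [nil, ""] end)
  end
end

case EnvCheck.missing() do
  [] -> IO.puts("All AWS settings are present.")
  names -> IO.puts("Missing: " <> Enum.join(names, ", "))
end
```

Running this in the same shell where you plan to start iex is a quick way to confirm that the .env file was actually loaded.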
Step 5: Run Your New Program
Finally, boot up an iex session as follows and run the write_records function, specifying the number of JSON records you want to create and store in S3.
$ iex -S mix
iex(1)> JsonAws.write_records(10)
Summary
Boom! And that’s all there is to storing simple JSON data on S3. It’s a good way to get your feet wet with Elixir.