May 13, 2013

778 words 4 mins read

Wrapping my brain around DynamoDB

At work I’ve been working on a big Amazon Web Services related project. For part of this project, I wanted to record some information in a database. Previously, all I’ve used has been relational databases like MySQL. One could use AWS Relational Database Service, but that’s expensive if you’re only storing a small amount of data. I decided to take the plunge and learn how to use DynamoDB and more importantly, how to use this “NoSQL” thing people have been talking about.

First stop. What is DynamoDB? “DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability”. Next stop, what is NoSQL? “NoSQL database provides a mechanism for storage and retrieval of data that use looser consistency models”. What does all this mean? Well if you think of an RDBMS (Relational Database Management System) as a spreadsheet with a fixed number of columns, NoSQL is like a key-value array of indefinite length.

dynamo create
In DynamoDB, the primary key can be one of two types: “Hash” or “Hash and Range”. The primary key must be unique, just like a primary key in a RDBMS. If you’re using Hash and Range, the combo of the two must be unique (so you can have several records with hash of “a”, as long as the range is unique). Keep in mind that string type fields are case sensitive, so you hash “a” != hash “A”. Now, once you’ve entered values into your Dynamo primary key, you cannot modify it. In MySQL, you can modify primary key values (even if they are set to be unique). Now, one clarification, the word “range” in “Hash and Range” really doesn’t mean anything (that I can tell). Your range can be something like a date/time stamp, or it can be a string value, doesn’t matter — as long as the hash/range combo are unique.

index cards
What about indexes? Well in a RDBMS, any field or combo of fields you’ll be querying — you want to index (though this depends on the RDBMS you use, and how exactly you want to index). In Dynamo, there are a number of limitations on “local secondary indexes” (LSI). First and foremost is that you can only have LSIs if you have a Hash and Range primary key. The second limit is that LSIs are single attribute only (DynamoDB calls them “attributes”, RDBMS would call them a column). Third, you can only have 5 LSIs per table. Lastly, you cannot add/modify/remove LSI after you’ve created the table. Since this is NoSQL, all attributes beyond your primary key are flexible — but if you want to use LSI, plan carefully.

So how about an example?

One of my applications receives text messages via Twilio. Every time my number receives a text message, Twilio sends a post to my server. I decided to log every message received to DynamoDB. The table is named “SMSLog” with a primary key type of Hash and Range. I happen to know the “SmsSid” that Twilio sends is always unique, so I used that as my Hash key. I don’t actually need a Range key on this table, however I wanted to use local secondary indexes, so I added a “Timestamp” field (also a good backup in case I’m wrong about SmsSid being unique). The two LSI I added were on attributes named “Body” and “From” (because I will want to check the logs against a particular phone number, i.e. the “From” column), which you’ll notice are the names of the Twilio request parameters.

So what do I do with this table? As soon as I receive a Twilio POST to my script, I foreach through Twilios key/values and push those directly into DynamoDB. I’ve posted the code for ddbLogSMS() to show how this works. This function also takes the results from SMS messages sent outbound via the Twilio REST API. Why bother with two tables with completely different schemas when NoSQL means… No Schema! Not only can I get away with mashing two different sets of data into the same table, I don’t need to worry about Twilio changing their API and breaking my app. As long as “SmsSid” doesn’t change, my function will take any and all data thrown at it.

Unlike RDBMS, you need to specify the read & write requirements per table. Since this is a low activity application, I’ve provisioned 1 write capacity unit and 5 read capacity units (this translates to 1 write per second and 5 reads per second). In fiduciary terms, this table will cost me an entire $0.97/mo. If my application ever gets uber popular, I can always increase the capacity on the fly.