AWS’s DynamoDB is a NoSQL database. It is also part of Amazon’s serverless offering, meaning that you don’t have to carry the administrative burden of operating a database cluster, and you pay only for what you use. You can use DynamoDB as part of the AWS Free Tier.
NoSQL vs. SQL
There are some key differences between NoSQL databases and SQL databases. SQL databases, often called relational databases or RDBMSs, are table based, whereas NoSQL databases can be document stores, key-value stores, or graph databases. If you know about data structures, a NoSQL database can be structured much like a Hashtable: you have a key and can quickly retrieve values from the database with it.
Relational databases structure their data in tables and can be queried with SQL as follows:
SELECT * FROM USER WHERE LASTNAME='SMITH'
DynamoDB does not support general-purpose SQL (Structured Query Language) queries like this one; data is retrieved by key instead.
In a DynamoDB table you must define a primary key that uniquely identifies each item in the table. There are two kinds of primary key:
Partition key (a single attribute only) : this single value determines where the item is stored, using an internal hash function. It is best to use high-cardinality values for the partition key (email address, employee number, session ID).
Partition key and sort key (composite primary key) : the partition key value is used as input to the hash function to determine the partition where the item is stored. Items with the same partition key are stored together, but they must have different sort keys. The partition key is also known as the “hash attribute” and the sort key as the “range attribute”.
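To make the two key types more concrete, here is a minimal in-memory sketch of how a partition key might be hashed to a partition and how items sharing that key could be kept ordered by sort key. It is only an illustration: the partition count, the SHA-256 stand-in hash, and the `state`/`city` attribute names are assumptions, and DynamoDB's real hash function and partition management are internal to the service.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; DynamoDB manages the real partition count itself


def partition_for(partition_key: str) -> int:
    """Map a partition key to a partition number using a stand-in hash."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def store(items):
    """Group items by partition, then by partition key, ordered by sort key."""
    partitions = defaultdict(lambda: defaultdict(list))
    for item in items:
        pk = item["state"]  # partition key ("hash attribute")
        partitions[partition_for(pk)][pk].append(item)
    for by_key in partitions.values():
        for rows in by_key.values():
            rows.sort(key=lambda row: row["city"])  # sort key ("range attribute")
    return partitions


items = [
    {"state": "TX", "city": "Houston"},
    {"state": "TX", "city": "Austin"},
    {"state": "CA", "city": "Fresno"},
]
layout = store(items)
```

Both `TX` items land in the same partition because they share a partition key, and within that key they are held in sort key order.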
A key point to take away is that DynamoDB is a serverless service and you only pay for what you use. Scanning a whole table reads every item, which is inefficient and costly, so it should be avoided; a proper table design lets you retrieve data efficiently and more cost-effectively.
In DynamoDB, the primary key alone lets you look up items. This means that “out of the box” DynamoDB functions very much like a Hashtable. For more flexible querying, DynamoDB offers two kinds of secondary index:
Global Secondary Index (GSI) : An index with a partition key and sort key that can be different from those on your table. GSIs can be created at any time on an existing table. There is currently a default limit of 20 GSIs per table. Storing a global secondary index costs extra, and this should be considered when designing your tables.
Simple GSI(username as Partition Key, email as GSI)
In the example above, username is the partition key and, on its own, the primary key. By default we can look up items in the DynamoDB table by username. If we needed to search by email, we would create a Global Secondary Index on the email field. The GSI adds cost, but it finds users by email quickly and efficiently without scanning the whole table, which would cost more.
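The username/email example could be expressed roughly as follows. This sketch only builds the keyword arguments for boto3's low-level `create_table` and `query` calls, so it runs without AWS access. The table name `Users`, the index name `email-index`, and the capacity values are hypothetical choices for illustration, not taken from a real deployment.

```python
def users_table_definition():
    """Build keyword arguments for a create_table call with one GSI."""
    return {
        "TableName": "Users",  # hypothetical table name
        "AttributeDefinitions": [
            {"AttributeName": "username", "AttributeType": "S"},
            {"AttributeName": "email", "AttributeType": "S"},
        ],
        # username alone is the primary key (partition key only)
        "KeySchema": [{"AttributeName": "username", "KeyType": "HASH"}],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "email-index",  # hypothetical index name
                "KeySchema": [{"AttributeName": "email", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},
                "ProvisionedThroughput": {
                    "ReadCapacityUnits": 1,
                    "WriteCapacityUnits": 1,
                },
            }
        ],
        "ProvisionedThroughput": {"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
    }


def find_by_email_query(email):
    """Build keyword arguments for a query against the email GSI."""
    return {
        "TableName": "Users",
        "IndexName": "email-index",
        "KeyConditionExpression": "email = :email",
        "ExpressionAttributeValues": {":email": {"S": email}},
    }


table_params = users_table_definition()
query_params = find_by_email_query("jane@example.com")
# With boto3 installed and AWS credentials configured, these could be used as:
#   import boto3
#   client = boto3.client("dynamodb")
#   client.create_table(**table_params)
#   client.query(**query_params)
```

Because the query names the index and supplies its partition key value, it reads only the matching items instead of scanning the table.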
Local Secondary Index (LSI) : This index has the same partition key as the table but a different sort key. Special consideration: LSIs can only be created at table creation time. There is a limit of 5 LSIs per table. An LSI extends the functionality of the sort key to other attributes so that you can perform more efficient queries, which helps lower costs and improve performance. Because an LSI shares the table's partition key, a query against it always works within a single partition key value.
Simple LSI (Composite Primary Key(state/city), population as LSI)
In this example, we have a composite primary key of state and city. This lets us query by state and narrow the results further by city. If we had an attribute such as population, we could add an LSI on it to query by population more effectively, for example to find the cities with the highest or lowest populations within a state.
When setting up a DynamoDB table you can configure provisioned capacity by setting the Read Capacity Units and the Write Capacity Units. Lower numbers for both will lower your costs, but if they are set too low, throttling may occur and hamper performance.
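A sketch of the state/city table with a population LSI might look like the following. As before, it only builds the keyword arguments for boto3's low-level `create_table` and `query` calls. The table name `Cities` and the index name `population-index` are hypothetical, and the `#s` placeholder is used so that the attribute name cannot clash with a DynamoDB reserved word.

```python
def cities_table_definition():
    """Build create_table arguments: state/city primary key plus a population LSI."""
    return {
        "TableName": "Cities",  # hypothetical table name
        "AttributeDefinitions": [
            {"AttributeName": "state", "AttributeType": "S"},
            {"AttributeName": "city", "AttributeType": "S"},
            {"AttributeName": "population", "AttributeType": "N"},
        ],
        "KeySchema": [
            {"AttributeName": "state", "KeyType": "HASH"},
            {"AttributeName": "city", "KeyType": "RANGE"},
        ],
        "LocalSecondaryIndexes": [
            {
                "IndexName": "population-index",  # hypothetical index name
                # Same partition key as the table, different sort key
                "KeySchema": [
                    {"AttributeName": "state", "KeyType": "HASH"},
                    {"AttributeName": "population", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "ProvisionedThroughput": {"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
    }


def largest_cities_query(state, limit=3):
    """Build query arguments for the most populous cities in one state."""
    return {
        "TableName": "Cities",
        "IndexName": "population-index",
        "KeyConditionExpression": "#s = :state",
        "ExpressionAttributeNames": {"#s": "state"},
        "ExpressionAttributeValues": {":state": {"S": state}},
        "ScanIndexForward": False,  # descending order of the index sort key
        "Limit": limit,
    }


cities_params = cities_table_definition()
top_texas = largest_cities_query("TX")
```

Setting `ScanIndexForward` to `True` instead would return the least populous cities first.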
One read capacity unit(RCU) represents one strongly consistent read per second or two eventually consistent reads per second for an item up to 4 KB in size.
If you set a table with 10 RCUs it will allow you to perform 10 strongly consistent reads per second or 20 eventually consistent reads per second for items up to 4KB.
One write capacity unit(WCU) represents one write per second for an item up to 1KB in size.
If you set a table with 10 WCUs, it will allow you to perform 10 writes per second for items up to 1KB in size.
If your items are smaller than 1KB and you do not query often, your table might get away with an RCU of 1 and a WCU of 1 without experiencing any throttling. The larger the items and the more operations performed per second, the more an increase in RCUs and WCUs becomes warranted.
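The capacity arithmetic above can be sketched as a small calculator. It assumes 1 KB means 1,024 bytes and that items larger than the 4 KB (read) or 1 KB (write) unit size consume additional units, rounded up. The function names are illustrative and the result is a rough estimate, not a billing tool.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # one RCU covers a read of up to 4 KB
WRITE_UNIT_BYTES = 1 * 1024  # one WCU covers a write of up to 1 KB


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """Estimate the RCUs needed for a steady read rate."""
    units_per_read = max(1, math.ceil(item_size_bytes / READ_UNIT_BYTES))
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        total = total / 2  # eventually consistent reads cost half as much
    return math.ceil(total)


def write_capacity_units(item_size_bytes, writes_per_second):
    """Estimate the WCUs needed for a steady write rate."""
    units_per_write = max(1, math.ceil(item_size_bytes / WRITE_UNIT_BYTES))
    return units_per_write * writes_per_second


# The figures from the text: 10 RCUs give 10 strongly consistent reads
# or 20 eventually consistent reads per second for items up to 4 KB.
strong = read_capacity_units(4096, 10)
eventual = read_capacity_units(4096, 20, strongly_consistent=False)
writes = write_capacity_units(1024, 10)
```

For example, a 6,000-byte item needs two read units per strongly consistent read, because it spills past the first 4 KB.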
In 2018, On-Demand mode was introduced as an alternative to Provisioned mode; it allows your DynamoDB table to scale with the workload as it ramps up and down. To use it, set the Billing Mode to PAY_PER_REQUEST. Note, however, that On-Demand mode is not part of the free tier.
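As a rough illustration of the difference between the two modes, the helper below converts a provisioned table definition (keyword arguments for boto3's `create_table`) into an On-Demand one by setting `BillingMode` and dropping the capacity settings. The helper name and the `Users` table are hypothetical.

```python
import copy


def to_on_demand(table_definition):
    """Return a copy of a create_table definition that uses On-Demand billing."""
    definition = copy.deepcopy(table_definition)
    definition["BillingMode"] = "PAY_PER_REQUEST"
    # On-Demand tables and their GSIs do not declare RCUs or WCUs
    definition.pop("ProvisionedThroughput", None)
    for index in definition.get("GlobalSecondaryIndexes", []):
        index.pop("ProvisionedThroughput", None)
    return definition


provisioned = {
    "TableName": "Users",  # hypothetical table name
    "AttributeDefinitions": [{"AttributeName": "username", "AttributeType": "S"}],
    "KeySchema": [{"AttributeName": "username", "KeyType": "HASH"}],
    "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
}
on_demand = to_on_demand(provisioned)
```

The original definition is left untouched, so both variants can be compared side by side.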
Some additional thoughts on DynamoDB
Another thing to be aware of with DynamoDB is that you cannot search values with the LIKE operator as you can in SQL. In a query's key condition, the only string-matching function available is begins_with. This is another reason the structure and indexing of DynamoDB tables must be well thought out from the start, to avoid a bad design that results in costly rework or poor performance.
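A prefix search with begins_with might be built as shown below. The sketch assumes a hypothetical `Cities` table whose partition key is `state` and whose sort key is `city`, and it only constructs the keyword arguments for boto3's low-level `query` call. A query needs an equality condition on the partition key, and begins_with is then applied to the sort key.

```python
def cities_starting_with(state, prefix):
    """Build query arguments for cities in one state whose name starts with prefix."""
    return {
        "TableName": "Cities",  # hypothetical table name
        # Equality on the partition key, prefix match on the sort key
        "KeyConditionExpression": "#s = :state AND begins_with(#c, :prefix)",
        "ExpressionAttributeNames": {"#s": "state", "#c": "city"},
        "ExpressionAttributeValues": {
            ":state": {"S": state},
            ":prefix": {"S": prefix},
        },
    }


san_cities = cities_starting_with("CA", "San")
# With boto3 installed and AWS credentials configured:
#   import boto3
#   boto3.client("dynamodb").query(**san_cities)
```

There is no equivalent key condition for a pattern such as SQL's `LIKE '%ville'`, which is why suffix or substring lookups have to be planned for in the key design.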
DynamoDB is a great service but some situations are not a good fit. Situations that can introduce complexity for DynamoDB include :
- when your access patterns are not clearly defined (which may result in costly refactoring)
- when multi-item or cross table transactions are required
- when complex queries and joins are required
- when real-time analytics on historic data is required
Our next article will include some real code, taking a hands-on approach to creating two DynamoDB tables and demonstrating how they are populated, queried and used in general.
If you want to continue with the hands-on segment for these concepts, please proceed to DynamoDB Introduction Hands On With GSI and LSI
Feel free to get in touch with us at Xerris and we can help you with your cloud focused solutions!