I’m currently working on a project that will store large amounts of time-series data. I need to write this data to a database very quickly, and read it back even faster. This workload isn’t really suited to a traditional SQL server, and I don’t fancy taking on SysAdmin duties for what is currently a side project, so I’m testing out Azure Table Storage as a possible solution.
Table Storage is Microsoft’s take on a persistent key-value database (similar in spirit to Redis). However, Table Storage makes it easy to store multiple fields per key and partitions that data seamlessly across hosts, managed for you based on whatever partition keys you set. It’s designed to magically scale across machines without you noticing.
I read a particularly good article from Troy Hunt showing his real world usage and outlining the performance he was able to achieve when storing 153 million records. The article appealed to me because it’s an honest warts-and-all look at the platform, using very basic client-side code.
I’ve run into a few cryptic error messages while prototyping a solution with the Python library (although these errors are reported directly by the Azure API, so I expect they apply to the other SDKs too).
The most frustrating error I’ve received so far is “One Of The Request Inputs Is Not Valid”. I’ve been using batching to insert a variable number of records per second (anywhere from tens to hundreds), but it was throwing this error pretty consistently. Googling returns more people complaining than actual answers. As far as errors go, it’s pretty unhelpful.
If you look at the error message carefully, you’ll notice it’s actually prefaced by a number (e.g. “99:One Of The Request Inputs Is Not Valid”). This number seems to indicate which record in the batch is causing the problem.
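As a rough illustration, here’s a small sketch of how you could pull that index out of the error text (the failing_record_index helper is just an illustration, not something provided by the SDK):

```python
import re

def failing_record_index(error):
    """Extract the leading record index from an Azure Table Storage batch
    error message, e.g. "99:One of the request inputs is not valid."
    Returns None if the message isn't in that format."""
    match = re.match(r"(\d+):", str(error))
    return int(match.group(1)) if match else None

print(failing_record_index("99:One of the request inputs is not valid."))  # -> 99
```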
Initially, my errors were caused by trying to store non-string values in the RowKey and PartitionKey fields. I thought I’d be smart and store floats in there (as these are the only two indexed fields and I wanted to query primarily on a specific float value). However, it seems strings are the only supported datatype for these two fields. Also, make sure you exclude all of the following characters from your keys (there’s a small key-sanitising sketch after this list):
- The forward slash (/) character
- The backslash (\) character
- The number sign (#) character
- The question mark (?) character
- Control characters from U+0000 to U+001F, including:
  - The horizontal tab (\t) character
  - The linefeed (\n) character
  - The carriage return (\r) character
- Control characters from U+007F to U+009F
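Here’s a minimal sketch of one way to build safe keys; the make_key helper and its regex are just an illustration of stripping the characters above, not part of the Azure SDK:

```python
import re

# Characters Azure Table Storage disallows in PartitionKey/RowKey values
DISALLOWED = re.compile(r'[/\\#?\u0000-\u001F\u007F-\u009F]')

def make_key(value):
    """Convert a value (e.g. a float) to a string and strip any
    characters that aren't allowed in PartitionKey/RowKey fields."""
    return DISALLOWED.sub('', str(value))

print(make_key(3.14159))          # '3.14159'
print(make_key('sensor/1\t#42'))  # 'sensor142'
```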
After fixing that, I was still getting errors. It turns out a batch can only accept 100 records; if you want to send more than 100 records at a time, you must split them up into multiple batches.
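For instance, here’s a quick sketch of that splitting using a generic chunking helper (nothing Azure-specific):

```python
def chunks(records, size=100):
    """Yield successive chunks of at most `size` records,
    since a single Table Storage batch accepts at most 100."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

# 250 records -> chunks of 100, 100 and 50, each committed as its own batch
for chunk in chunks(list(range(250))):
    print(len(chunk))
```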
It also turns out the Python SDK’s batching code has changed since v0.30.0:
Table batches are now constructed using the TableBatch class rather than by turning batching on and off via the TableService. The TableService can then execute a batch using commit_batch(table_name, batch).
Azure Table Storage Batching Example Code
Here’s some sample code to demonstrate how I’ve been doing batching with the latest version of the Python SDK:
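A minimal sketch of the approach, assuming the azure-storage TableBatch API; the account credentials, table name and entity fields below are placeholders:

```python
from azure.storage.table import TableService, TableBatch

# Placeholder credentials and table name
table_service = TableService(account_name='myaccount', account_key='mykey')
table_service.create_table('readings')

# Keys must be strings; all entities in one batch share the same PartitionKey
records = [{'PartitionKey': 'sensor-1', 'RowKey': str(i), 'value': float(i)}
           for i in range(250)]

# A batch accepts at most 100 entities, so commit the records in chunks
for start in range(0, len(records), 100):
    batch = TableBatch()
    for record in records[start:start + 100]:
        batch.insert_entity(record)
    table_service.commit_batch('readings', batch)
```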
Hopefully this information helps you debug your Azure Table Storage code!