Transactional workloads are commonly defined as being small atomic pieces of work. This is in contrast to decision support, Data Warehouse, Business Intelligence or otherwise labeled reporting systems that require fewer, larger, more sequential workloads. Updates, inserts, deletes and even small result-set selects are all included in OLTP, transactional efforts.
A defining bottleneck for an OLTP database transaction is that it cannot return as committed to the user until it has been fully committed to storage. This means that the faster (lower latency) a system can write to storage, the faster it can commit transactions. So, latency is the key defining metric for how fast the log writes can be. (This, by the way, is the obvious reason for choosing flash as your storage media for OLTP workloads.)
Also, it is commonly thought that log files are sequential in nature and therefore a random access storage architecture like Violin won’t be much faster. But, it is most common that there are many databases in an environment, the storage/SAN folks have created an ‘expensive’ RAID 10 over two spindles and have therefore effectively stacked several log files over this same disk pair. This means that the sequential file you thought you were writing to is sharing platters with all the other log files creating a sprinkler effect on the swing arm. Logically sequential and physically sequential are two different things and layers of abstraction aren’t always your friend.
Less obvious still is that before a transaction can be made on a database page, the page must be in memory. This makes the data reads required to start the transaction "virtually synchronous" in that they have to happen before the write of the log can happen.
Application Locks vs RAM Utilization
It is very common for designers to create an "unrelated ID" scheme whereby the records that will need to be touched in a simultaneous fashion will be purposely placed on different database pages. This helps avoids lock and latch based blocking issues. But, this causes data to be spread all over the storage system and eliminates adding more RAM as a solution. And, the event of bringing one page into the memory buffer does nothing for pre-loading the next user’s data into memory as the second user is purposefully placed on a different data page. So, data reads are purposely designed to be a storage hit versus a buffer hit.
Transactional workloads usually have many users each doing tiny bits of work. For each transaction, when the data page is not found in the memory buffer, the engine must release an asynchronous read IO and go idle until it returns. The scheduler will now context switch the processor core to go work on another transaction which will release yet another asynchronous IO and go idle. This continues on until an IO returns or all transactions are in an idle state. With modern processors running in the gigahertz range a multi-millisecond round trip for an IO can leave a CPU mostly idle. The faster IO’s can be returned the higher the CPU utilization can go and the more of the licensing fees for the deployed cores can be justified.
Distributed Block Architecture
In the case of Violin’s memory-like architecture, every storage address is equally accessible at the same speed at all times. The more cores, threads and workloads are thrown at the system the faster it will go. With spinning disks the more random the workload the more the swing arm thrashes and the throughput goes down. Spinning media is designed for large sequential IO’s whereas Violin’s random access storage is designed specifically scale with your workload. Rather nice, don't you think?
It is time to make sure that your multi-CPU, multi-core system can run at full throttle. If the CPU isn’t peaking while queries linger, if transactions per second have capped while the CPU isn’t pegged, look to slow storage latencies keeping the engines at idle.