On the modern web, few metrics matter more than speed, scalability, and high availability when delivering an application to consumers. Fortunately, computers are faster than ever and the cost of bandwidth continues to fall.
While the computing world is seeing more progress today than at any other point in time, that growth isn't happening in a vacuum. The demands we place on our software and hardware have been increasing drastically, and at times traditional processing, even on modern architectures, just doesn't cut it.
Could in-memory computing be the savior we've been waiting for?
Why Use In-Memory Computing?
Like most technology, in-memory computing isn't applicable to every problem you can think of. To understand why it emerged as a problem-solving strategy, it helps to look at current approaches and where they fall short.
As businesses grow, the amount of data they consume grows, and as new technology is invented, data that was previously useless suddenly finds a myriad of applications. This usually necessitates either vertical or horizontal scaling.
Vertical scaling involves stacking up a single computer with more processing power, memory, and disk space to meet demand. This might include moving to newer technology, such as SSDs, wherever possible. Horizontal scaling involves adding more computers and dividing the processing between them.
Vertical and horizontal scaling are incredibly useful techniques for keeping the engines going on a fast-growing system. But small, incremental changes to hardware alone can't make up for the slowest part of most modern tech stacks: the read-write cycles of disk-based systems.
One of the main reasons in-memory computing has become so widespread is its ability to dramatically reduce a system's latency. RAM is inherently faster than any solid-state drive, and faster still compared to hard drives. Storing information in memory makes it available much sooner, which in turn means faster processing.
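The gap is easy to demonstrate. Here is a minimal, hypothetical micro-benchmark (not from any real system) that looks up one record from an in-memory dict and then re-reads the same record from a file standing in for disk storage:

```python
import os
import tempfile
import time

# Build 100,000 records in memory.
records = {i: f"value-{i}" for i in range(100_000)}

# Write the same records to a temporary file to stand in for disk storage.
path = os.path.join(tempfile.mkdtemp(), "records.txt")
with open(path, "w") as f:
    for k, v in records.items():
        f.write(f"{k}\t{v}\n")

def read_from_disk(key):
    # A deliberately naive "disk" lookup: scan the file line by line.
    with open(path) as f:
        for line in f:
            k, v = line.rstrip("\n").split("\t")
            if int(k) == key:
                return v

start = time.perf_counter()
mem_value = records[99_999]          # in-memory lookup
mem_time = time.perf_counter() - start

start = time.perf_counter()
disk_value = read_from_disk(99_999)  # disk-backed lookup
disk_time = time.perf_counter() - start

print(f"memory: {mem_time:.6f}s, disk: {disk_time:.6f}s")
```

A real database would index the file rather than scan it, but even with indexing, the ordering holds: the memory lookup wins by orders of magnitude.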
The biggest consequence of the reduced latency of in-memory processing is real-time processing. Real-time processing, as opposed to deferred or batch processing, means analyzing data and sending it where it needs to go immediately (or so fast that the end user can barely notice a delay).
Real-time processing has become extremely important over the last few years, with live-streamed video, games, and movies a mainstay of social media and the internet.
To tap the full potential of your hardware and serve data in real time, you will likely have to use several layers of caching and at least some amount of in-memory processing.
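One of those caching layers can be as simple as memoizing expensive lookups in application memory. A sketch, using Python's standard `functools.lru_cache` with a `time.sleep` standing in for slow I/O (the function and its cost are hypothetical):

```python
import time
from functools import lru_cache

@lru_cache(maxsize=1024)
def lookup(key):
    # Stand-in for a slow disk or database read.
    time.sleep(0.05)
    return key.upper()

start = time.perf_counter()
lookup("frame")            # cache miss: pays the I/O cost
miss = time.perf_counter() - start

start = time.perf_counter()
lookup("frame")            # cache hit: served from memory
hit = time.perf_counter() - start

print(f"miss: {miss:.4f}s, hit: {hit:.6f}s")
```

Production systems layer the same idea further out: a process-local cache like this, then a shared in-memory store such as Redis or Memcached, and only then the disk-backed database.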
The importance of real-time processing should be immediately apparent in cases such as transaction processing and fraud detection.
At the risk of sounding redundant, in-memory processing is very fast. But since all this data is stored in RAM, an obvious concern is volatility. What happens to all those terabytes of data if there's a sudden blackout, or the system crashes for whatever reason?
The developers of projects like Apache Spark and Apache Kafka recognized this problem early on and designed their systems to be fault-tolerant. They rely on both disk and memory at the same time in case anything goes wrong, and that same design pays off when rebooting your system, too.
In-memory processing requires that all your data be in RAM before it can be processed, but even the fastest RAM modules can take hours to fully load terabytes of data.
Rather than waiting around for all that data to become available, in-memory systems lean on their fault tolerance to process data straight from disk while the rest loads into memory. The result is virtually zero downtime.
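The serve-while-warming idea can be sketched in a few lines. This is a hypothetical toy, not any real framework's API: reads are answered from disk until a background thread finishes copying records into memory, after which the fast path takes over.

```python
import json
import os
import tempfile
import threading

# Seed some records on "disk" (a JSON file for simplicity).
path = os.path.join(tempfile.mkdtemp(), "store.json")
with open(path, "w") as f:
    json.dump({"user:1": {"name": "Ada"}, "user:2": {"name": "Lin"}}, f)

class WarmingStore:
    def __init__(self, path):
        self.path = path
        self.cache = {}              # in-memory copy, filled in the background
        self.lock = threading.Lock()

    def warm(self):
        # Background thread: copy disk records into memory one by one.
        with open(self.path) as f:
            on_disk = json.load(f)
        for key, value in on_disk.items():
            with self.lock:
                self.cache[key] = value

    def get(self, key):
        with self.lock:
            if key in self.cache:    # fast path once warmed
                return self.cache[key]
        with open(self.path) as f:   # slow path during warm-up
            return json.load(f).get(key)

store = WarmingStore(path)
threading.Thread(target=store.warm, daemon=True).start()
print(store.get("user:1"))  # correct whether or not warming has finished
```

The key property is that `get` returns the same answer on either path, so clients never see downtime, only a temporary latency difference.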
Drawbacks of In-Memory Computing
In-memory processing can make certain applications so much faster that it may seem like magic to an inexperienced developer. That can lead them into a harmful spiral of trying to leverage it for every application they come across. But working with in-memory computing can be tricky, and mistakes are costly.
As such, some crucial factors must be taken into consideration before deciding on a full migration to in-memory processing.
Like any technology, in-memory processing has its own set of drawbacks. The one that should be most apparent is price.
By virtue of how it operates, in-memory processing requires that as much as possible of the data to be processed fit in RAM. This means that anyone looking to adopt the technology needs a substantial amount of memory.
The cost of memory has dropped drastically in the last decade; anyone who knows where to put their money can afford 64 GB of RAM. However, RAM is still far more expensive than hard drives, and even more costly than SSDs, which are already pretty expensive to begin with.
Limits on Database Size
It's already been mentioned, but RAM is far more expensive per gigabyte than disk storage. One of the most obvious implications is that it's nearly impossible to fit all your data in memory, especially if your company is working on a tight budget and can't funnel hundreds of thousands of dollars into new hardware.
Risk of Data Loss
The biggest risk you will have to contend with is losing all your data. RAM is volatile, which means it can only hold data while it is powered. In the event of a blackout, a system-wide crash, or a reboot, all of that data is lost.
Ultimately, no in-memory system can survive on its own: disk persistence is crucial if a system is to withstand data loss from abrupt power outages. Apache ZooKeeper, the coordination service that Kafka clusters have traditionally relied on, is an example of this design: it keeps its working data in memory while writing a transaction log and periodic snapshots to disk so it can recover after a restart.
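The write-ahead-log idea behind that kind of recovery fits in a short sketch. This is a hypothetical toy store, not ZooKeeper's actual implementation: every write is appended to a log on disk before it is acknowledged, and on restart the log is replayed to rebuild the in-memory state.

```python
import json
import os
import tempfile

LOG = os.path.join(tempfile.mkdtemp(), "wal.log")

class DurableKV:
    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}               # in-memory state
        self._replay()

    def _replay(self):
        # Rebuild in-memory state from the write-ahead log, if present.
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    entry = json.loads(line)
                    self.data[entry["k"]] = entry["v"]

    def set(self, key, value):
        # Append to the log *before* updating memory, so a crash at any
        # point afterward can always be recovered from disk.
        with open(self.log_path, "a") as f:
            f.write(json.dumps({"k": key, "v": value}) + "\n")
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

kv = DurableKV(LOG)
kv.set("balance", 100)
del kv                       # simulate a crash or restart
kv2 = DurableKV(LOG)         # state is rebuilt from the log
print(kv2.get("balance"))    # → 100
```

Real systems add snapshots so the log doesn't grow forever and replay stays fast, but the guarantee is the same: RAM holds the hot copy, disk holds the truth.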