Modern Batch Processing
Batch processing is not a new concept. In fact, it may seem a bit old-fashioned because it has been around since the days of the mainframe. Rather than shy away from something that may seem outdated or uncool, Runly embraces batch processing as a core concept, reinvented for asynchronous applications.
Batch processing is an abstraction — a programming pattern — that fits most of today’s background job problems. Let’s dive into the reasoning behind Runly’s job architecture and see how batch processing is done.
Batch Processing Redefined
First, let’s define batch processing. There are two parts: a list of one or more items and an action that is performed for each item. Runly jobs perform batch processing and can optionally do something before and after.
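To make that concrete, here is a rough Python sketch of that shape. The class and method names are mine for illustration, not Runly’s actual API: a source of items, an action that runs once per item, and optional hooks before and after.

```python
# Hypothetical job shape; the names are illustrative, not Runly's API.
import asyncio


class GreetingJob:
    async def before(self):
        # optional setup before the batch
        print("setting up")

    async def get_items(self):
        # the list: one or more items
        for name in ["Ada", "Grace", "Linus"]:
            yield name

    async def process(self, item):
        # the action, performed once for each item
        print(f"hello, {item}")

    async def after(self):
        # optional cleanup after the batch
        print("cleaning up")


async def run(job):
    await job.before()
    async for item in job.get_items():
        await job.process(item)
    await job.after()


asyncio.run(run(GreetingJob()))
```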
So what kind of data sources look like batches?
- API calls that return a list
- File or memory streams
- Database queries
- Message queues
- Event publishers
Developers typically think of the first three examples as cases where the data is finite, even if the items are streamed in such a way that the total count isn’t known until the end of the data is reached. The defining characteristic is that there is an end to the data. These are examples of processing a list.
Queues and events, on the other hand, are usually thought of as indefinite data streams. The message queue processor or event handler waits idly for the next message or event, and the data stream never has a defined end. These are examples of processing an event.
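A quick Python sketch makes the distinction concrete (the function names are hypothetical and have nothing to do with Runly’s API): both sources look like async generators, but one runs out of data and the other waits forever.

```python
# Illustration only; names are hypothetical, not part of any Runly API.
import asyncio


async def database_rows():
    """A finite source: yields each row and then stops -- the data has an end."""
    for row in [{"id": 1}, {"id": 2}, {"id": 3}]:
        yield row


async def queue_messages(queue: asyncio.Queue):
    """An indefinite source: waits idly for the next message and never ends."""
    while True:
        yield await queue.get()


async def main():
    # The finite source is simply exhausted by iteration.
    async for row in database_rows():
        print("row:", row)

    # The indefinite source would wait forever for the next message,
    # so for the demo we pre-load one message and stop after reading it.
    queue = asyncio.Queue()
    queue.put_nowait("first message")
    async for message in queue_messages(queue):
        print("message:", message)
        break  # we stop; the source itself never ends


asyncio.run(main())
```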
Modern Batches
The modern batch takes on properties of both a list and an event processor. This mirrors a broader trend in languages and their libraries: support for asynchronous iteration through async move-next operations, which unifies the two types of data streams.
Jobs are implemented using asynchronous list patterns in each of the supported languages. It doesn’t matter whether a job processes lists, streams, messages, or events; they’re all a natural fit.
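In async-iteration terms, the runner only needs something it can loop over asynchronously. Here is a hypothetical sketch, not Runly’s implementation, of a loop that consumes a finite source or an endless one without knowing the difference.

```python
# Hypothetical runner, not Runly's implementation: any async iterable will do.
import asyncio
from typing import AsyncIterable, Awaitable, Callable


async def run_job(items: AsyncIterable, process: Callable[..., Awaitable]) -> None:
    async for item in items:      # "async move-next" over any kind of stream
        await process(item)


async def numbers():
    """A finite source; an endless queue-backed generator plugs in identically."""
    for n in range(3):
        yield n


async def show(item):
    print("processed", item)


asyncio.run(run_job(numbers(), show))
```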
What if your job doesn’t have any items?
That’s easy: running a block of code with no items is a simple edge case that fits this pattern too. By our definition of batch processing, it’s just an action with a single item.
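Here is a rough sketch of that edge case (again hypothetical, not Runly’s API): a source that yields a single placeholder item, so the action runs exactly once.

```python
# Sketch of the no-items edge case: an action over a single placeholder item.
import asyncio


async def get_items():
    yield None                        # one placeholder item


async def process(_item):
    print("running the one-off block of code")


async def main():
    async for item in get_items():    # the loop body runs exactly once
        await process(item)


asyncio.run(main())
```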
It’s not your traditional group-of-things-in-a-transaction batch, but the structure of a group being processed together is a simple pattern to use as a foundation for more complex workflows.
Runly is unique in taking this approach to provide features that developers would otherwise have to build on their own. Out of the box, Runly offers valuable features that don’t come with competing frameworks, such as the following (the first and last are sketched in code after the list):
- Individual item retries
- Item-level debug and performance data
- Job progress API and components
- Multi-threaded execution
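To make the first and last of those concrete, here is a rough, hand-rolled sketch of per-item retries and parallel item execution, with concurrent tasks standing in for threads. This is hypothetical plumbing, not Runly’s internals; it shows the kind of code you would otherwise end up writing yourself.

```python
# Hypothetical plumbing, not Runly internals: per-item retries plus a pool of
# concurrent workers draining a queue of items.
import asyncio


async def process_with_retries(process, item, attempts=3):
    """Retry one failing item without failing the rest of the batch."""
    for attempt in range(1, attempts + 1):
        try:
            return await process(item)
        except Exception:
            if attempt == attempts:
                raise
            await asyncio.sleep(2 ** attempt)       # simple backoff


async def run_job(items, process, workers=4):
    """Fan items out to a fixed number of concurrent workers."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=workers)

    async def worker():
        while True:
            item = await queue.get()
            if item is None:                        # sentinel: no more items
                return
            try:
                await process_with_retries(process, item)
            except Exception as exc:
                print(f"item {item!r} failed after retries: {exc}")

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    async for item in items:
        await queue.put(item)
    for _ in range(workers):
        await queue.put(None)                       # one sentinel per worker
    await asyncio.gather(*tasks)


async def items():
    for n in range(10):
        yield n


async def handle(n):
    print("handled", n)


asyncio.run(run_job(items(), handle))
```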
This isn’t your dad’s batch processing. Runly is a modern, cloud-friendly, intuitive approach to breaking down background work.
Runly is open-source and the API is free for individuals.
Let me know what you think on Twitter: @WillSossamon.