Elixir Dataflows

In support of a colleague’s data analysis, I built a live online data flow application in Elixir to ingest large quantities (hundreds of gigabytes) of social media data, process and filter it, and load it into a separate database for direct querying. We used an iterative, agile methodology to develop the data processing and filtering techniques. I chose Elixir for its easy parallelism and functional nature.

Flows the Wrong Way, Part 2: The Right Way


In my last post, I covered my first attempt to implement TCP streaming in Flow, a data flow library for Elixir. That attempt involved a bunch of failed Unix sockets and a GenStage implementation that failed for reasons I didn’t understand. I eventually settled on this:

Flows the Wrong Way: Streaming into Elixir

As part of a new and exciting project, I was faced with the task of ingesting a large amount of more or less homogeneous JSON data into a SQL database so that an associate of mine could run some rudimentary business intelligence analysis on it. The context complicated things: the bulk data was historical social media data, and in future he would also want to ingest data from the live API in addition to this archive.