Enhancing Performance: Moving from TypeScript & Firebase Functions to Elixir & Oban
I recently completed a major backend overhaul for my client’s Flutter app. The app helps people locate their favorite non-alcoholic beverages by displaying brand locations on a map. It's served by an incredibly complex backend that scrapes various brand store locators weekly, cleans the data, and updates a Firestore database for the Flutter app to consume. At the time of this overhaul, the backend supported 26 brands, with hopes of supporting roughly three times that number.
When I joined the project, this crucial part of the infrastructure hadn't been working for months. The developers who created it had finished their contract and left a fragile system without documentation. Even worse, the client didn't know that the backend had stopped working. I was initially hired to add new features, but it quickly became clear that the existing backend was not scalable and didn't meet their needs.
Inheriting poorly documented legacy code is my specialty, so I jumped in to assess the situation. What I found was a mess: a system that was complex in design but lacked scalability, fault tolerance, and proper error handling. Firebase costs were rising, processing times stretched up to 20 hours per run, and infinite loops effectively crippled the system. It became clear that a complete redesign was the only sustainable path forward.
The State of the Legacy System
I needed to figure out how things currently worked and what technology was involved so I could wrap my mind around the state of things and determine a path forward.
The issues were overwhelming: the weekly job consumed too many resources and only occasionally finished without errors or timeouts, the original flowcharts were outdated, and I had to read through the code by hand to understand it and write new documentation to keep myself oriented.
The legacy stack:
Flutter: The mobile app framework.
TypeScript: The language for Google Cloud Functions, embedded in the Flutter app.
MongoDB: A NoSQL database used to store “dirty” data from the scrapers.
Firebase Hosting: Hosting integrated with Google Cloud services.
Google Cloud Functions: Used to run weekly cron jobs for scraping and cleaning data.
Legacy Yelp API: Used for location data, with strict rate limits of 5,000 requests/day.
Firestore: A database for “clean” data consumed by Flutter.
Codemagic: CI/CD for mobile app builds.
My best attempt at visualizing the chaos:
Choosing the New Tech Stack
While my background is in Ruby on Rails, I knew it wasn't the right tool for this job. I considered using Python, but after discussing it with a colleague who encouraged me to use Elixir and doing some comparative research, I decided that the benefits were worth the effort to learn something new. Elixir's strengths in concurrency, scalability, and fault tolerance made it the clear choice to get this backend back on track and support the client’s future goals.
Taking into account all my colleague’s suggestions, I landed on this new stack:
Elixir: A functional programming language built for concurrency and scalability.
Phoenix: A framework for Elixir to decouple the backend from the Flutter app.
Supabase: Replacing MongoDB’s NoSQL with PostgreSQL for relational data.
Oban Pro: A background job processing library for Elixir, replacing Google Functions.
Google API: Providing more robust data than Yelp, with a pay-as-you-go model.
Fly.io: For hosting.
I also came up with a new strategy that reduced the number of steps and complexity:
Diving into the project
Adapting to Elixir and functional programming was a challenge. I started by reading From Ruby to Elixir by Stephen Bussey to get a high-level handle on the syntax and quirks from a Rubyist's perspective. The Elixir community, Stack Overflow, and GitHub offered only sparse guidance on the specific challenges I was encountering, so getting my footing was difficult. I am used to the robust user base of Ruby and Rails, so this was a big shift.
Migrating Scrapers
Starting with the scrapers was the obvious entry point to deepen my understanding of Elixir syntax, pattern matching, and functional programming. Completing that step felt like a huge win and gave me the confidence to move forward with the more complex portions of the system.
The original backend supported 26 brands, but many used similar store finder APIs. Fortunately, I ended up with only 8 unique API types. Each API had its quirks—some returned paginated results, others JSON, HTML, or XML. The original system looped through 1,000 cities; I optimized it to 100 strategically selected zip codes while maintaining coverage.
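To keep those 8 API types manageable, each scraper can funnel its response through a single parsing entry point. Here is a minimal sketch of that idea; the module name and the choice of Jason, Floki, and SweetXml are illustrative assumptions rather than the project's actual dependencies.
defmodule MyApp.Scrapers.Parser do
  # Sketch: normalize the different response formats the store-locator APIs return.

  # JSON store finders
  def parse(:json, body), do: Jason.decode!(body)

  # HTML store finders, parsed into a traversable document tree
  def parse(:html, body), do: Floki.parse_document!(body)

  # XML store finders
  def parse(:xml, body), do: SweetXml.parse(body)
end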
Working through these scrapers, I started thinking more strategically about how the background jobs would work. There would ultimately be a lot of trial and error as I figured out which Oban modules suited this system best.
Implementing Oban for Background Jobs
Why Oban?
Google Cloud Functions were a poor fit for this use case. Jobs couldn’t be stopped once initiated, only deleted, and debugging often required sifting through hours of logs. Oban, by contrast, offered clear error logging, automatic retries, and a web UI for job management. It became the backbone of the new backend, adding transparency and making jobs far easier to manage.
Using Batches
Tracking scraper completion was critical. I implemented Oban Batches to track progress and handle callbacks. Admittedly, I went back and forth between Batches and Workflows before ultimately landing on Batches. Batches provide handle_completed and handle_discarded hooks, enabling seamless transitions between stages of the scraping workflow. This approach also ensured that failed jobs didn’t block the entire process.
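For a sense of what that looks like, here is a minimal sketch of a batch worker in the shape Oban Pro's Batch module expects; the module names and arguments are placeholders rather than the project's real code.
defmodule MyApp.Workers.ScrapeBrand do
  use Oban.Pro.Workers.Batch, queue: :scrapers

  # Each job in the batch scrapes a single brand's store locator.
  @impl true
  def process(%Oban.Job{args: %{"brand" => brand}}) do
    MyApp.Scrapers.run(brand)
  end

  # Fires once every job in the batch has completed successfully.
  @impl true
  def handle_completed(%Oban.Job{meta: %{"batch_id" => batch_id}}) do
    MyApp.Pipeline.start_cleaning(batch_id)
    :ok
  end

  # Fires when any job in the batch is discarded after exhausting its retries.
  @impl true
  def handle_discarded(%Oban.Job{meta: %{"batch_id" => batch_id}}) do
    MyApp.Alerts.notify_discarded_batch(batch_id)
    :ok
  end
end
All the jobs inserted together share a batch_id in their meta, which is why these callbacks fire once per batch rather than once per job, and why a single failed brand doesn't hold up the rest of the run.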
Moving to Supabase
In their own words, “Supabase is an open-source Firebase alternative.” It uses a Postgres database, which I prefer over NoSQL databases like Firestore and MongoDB. Additionally, Supabase works well with Elixir, and their Realtime architecture is even built with Elixir.
The legacy app used a shared cluster on Mongo, which led to many avoidable bugs. If the process ran during high-traffic times, timeouts would cause data discrepancies or even crash the entire process. Switching to a Pro Supabase subscription currently costs $25/month. While Mongo was cheaper at about $10/month, the shared cluster was unreliable for the demands of this system, and a dedicated cluster starts at $57/month. Ultimately, having a reliable and easy-to-use system outweighs a slight reduction in operating costs.
Cleaning Data
The data scraped from the store locators is shaky at best. Each location for each brand is entered by a human, with varying degrees of accuracy. All the sanitization, standardization, and matching is done with plain functions rather than AI, so I likely didn’t catch every edge case.
Standardization
One of the most important pieces to standardize was the address. The likelihood of multiple brands existing at the same location was high (e.g. a grocery store would likely carry many different brands). If the same address looked wildly different from one brand's data to another's, it would be hard to avoid doing duplicate work. Across the APIs used, address data could be expressed as any of these fields: full_address, streetaddress, address, street, address_line_1, or add1.
In addition, the location name needs to match as closely as possible, and a whole range of errant characters has to be stripped out. Data is cleaned as it comes in from the scrapers, making it available for the next step.
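As a rough illustration, the sanitization boils down to small functions like the following; the field list mirrors the variations above, while the regexes here are simplified stand-ins for the real ones.
# Pull whichever address field a given store-locator API provides.
defp extract_address(attrs) do
  Enum.find_value(
    ~w(full_address streetaddress address street address_line_1 add1),
    &Map.get(attrs, &1)
  )
end

# Strip errant characters and normalize whitespace and casing so the same
# venue matches across brands.
defp standardize(value) when is_binary(value) do
  value
  |> String.trim()
  |> String.replace(~r/[^\w\s#,.-]/u, "")
  |> String.replace(~r/\s+/, " ")
  |> String.upcase()
end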
Enhancing Data with Google Places API
The old backend used an outdated Yelp API with a strict limit of 5,000 requests per day, and access was tied to one of the original developers. This API was used to get the Yelp ID for a location, gather additional data like venue type, and check whether the location was still open. Incoming locations were compared against a Yelp endpoint to find a match and collect extra data. Then a check was performed on all the "Yelp locations" to see if they were still in business. Since there was no effort to ensure that location data was unique for each brand, the brand tables became bloated with duplicate data. Data integrity suffered because the rate limit was reached quickly, so most locations were never verified as operational.
Why Google Places API?
The Google Places API operates on a pay-as-you-go model, offers higher-quality data, and has no hard daily rate limit. The endpoints provide more flexibility for finding the corresponding Google Place ID needed for gathering data. Using a combination of Text Search and Nearby Search, I collect and store the place IDs using the standardized data from the previous step. There are some differences between the venue types offered by Yelp and Google. For example, Google doesn’t provide information on LGBTQ+-friendly locations, a filter type in the app and something the Yelp API includes. While omitting this data isn't ideal, it is a relatively small tradeoff considering the overall benefits of this new API.
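A simplified version of the Text Search lookup might look like this; it assumes the newer Places API endpoint and the Req HTTP client, and the field mask is trimmed down for the example.
# Look up the Google Place ID for a standardized name and address.
def find_place_id(name, address, api_key) do
  response =
    Req.post!("https://places.googleapis.com/v1/places:searchText",
      json: %{textQuery: "#{name} #{address}"},
      headers: [
        {"X-Goog-Api-Key", api_key},
        {"X-Goog-FieldMask", "places.id,places.businessStatus"}
      ]
    )

  case response.body do
    %{"places" => [%{"id" => place_id} | _rest]} -> {:ok, place_id}
    _ -> :not_found
  end
end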
Comparing Geohashes
Creating the geohash
After gathering the necessary data from the Google API, the Google locations are finally ready to be added to an existing or newly created geohash. Working with Firestore from Elixir was not easy. The Firestore ecosystem works best with JavaScript and mobile development languages, so I had to roll a lot of custom code to make communication between the two work.
To get the geohashes as close as possible to what Firestore calculates—since I was working with a large amount of existing data—I needed to replicate the GeoFire library. The encoding was straightforward once I realized that this was the library being used. I created a service, input the latitude and longitude, and received a 10-digit geohash. From there, I added all the necessary data into the geohash, including the brand array, alcohol percentage array, and several required fields for the Flutter app.
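The encoding step itself ends up being quite small. Here is a sketch assuming the geohash Hex package; precision 10 matches the 10-character hashes GeoFire produces, and the extra fields shown are illustrative.
# Encode coordinates at precision 10 (matching GeoFire) and attach the
# arrays the Flutter app expects on each geohash document.
def build_geohash_entry(%{lat: lat, lng: lng} = place) do
  geohash = Geohash.encode(lat, lng, 10)

  %{
    geohash: geohash,
    geohash5: String.slice(geohash, 0, 5),
    brands: place.brands,
    alcohol_percentages: place.alcohol_percentages
  }
end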
Supabase geohashes vs Firestore geohashes
We're finally at the most important piece: comparing all the geohash data in Supabase with the geohash data in Firestore. First, I fetch all the geohashes from Firestore and store only the essential data in a temporary table on Supabase. For reference, there are about 30,000 geohashes, each potentially containing 0 to over 25 locations. That's a lot. I also needed to standardize the incoming Firestore data to make it comparable to the Supabase data.
Once I stored the Firestore geohashes locally, I enqueued batches of 250 Supabase geohashes. Next, some pretty cool pattern matching happens to compare the two:
defp compare_geohashes(nil, nil), do: :ok
# Supabase has the geohash but Firestore does not
defp compare_geohashes(supabase, nil) do
# add to new_places temporary table
end
# Firestore has the geohash but Supabase does not
defp compare_geohashes(nil, firestore) do
# figure out if we need to add to delete_places temp table
end
# Geohash exists in both Supabase and Firestore
defp compare_geohashes(supabase, firestore) do
# figure out if we need to add to new_places,
# delete_places, or edit_places temp tables
end
There are three temporary tables that store geohash5 (the first five characters of the previously created geohash) and place_id. Initially, I used ETS tables as lightweight in-memory stores. However, when testing in production with multiple processes, I kept losing all the data in those tables, so I switched to creating these Supabase tables on the fly. Once all the comparisons are done, a Slack message is sent to the client with the counts for Add, Edit, and Delete.
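Each of those temp tables is tiny. A minimal sketch of one of them, assuming a standard Ecto schema:
defmodule MyApp.Comparisons.NewPlace do
  use Ecto.Schema

  # Just enough to know which place needs to be written into which
  # Firestore geohash during the final sync step.
  schema "new_places" do
    field :geohash5, :string
    field :place_id, :string

    timestamps()
  end
end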
Sending data to Firestore
As I mentioned earlier, there isn't a straightforward way to send data from Elixir to Firestore, so this final step took most of my time. Firestore and the Flutter app need the data in a specific format, and it took a lot of trial and error to get that format right.
I utilized batches again, 25 geohashes at a time, across create, update, and delete. While patching the legacy backend, I had found that anything more than that risked Firestore timing out, while sending geohashes one by one risked 429 Too Many Requests errors from all the concurrent hits.
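Concretely, that just means chunking the pending geohashes before enqueuing the writer jobs. A rough sketch, shown with a plain Oban worker and placeholder module names:
# Enqueue Firestore write jobs 25 geohashes at a time to stay clear of
# Firestore timeouts and 429 responses.
def enqueue_firestore_writes(geohashes, operation) do
  geohashes
  |> Enum.chunk_every(25)
  |> Enum.map(&MyApp.Workers.FirestoreWriter.new(%{operation: operation, geohashes: &1}))
  |> Oban.insert_all()
end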
Firestore Client
I created a custom Firestore client to sanitize, unwrap, and wrap data depending on the direction it was going, as well as fetch specific geohash records and build the payload. I've thought about turning this into an open-source library, but we shall see; it feels pretty niche to my project. Using pattern matching again, I have 25 clauses of a wrap_value_for_firestore function to handle formatting the data about to be passed to Firestore.
I had to manually build my payloads to send data from Elixir to Firestore. For delete and update operations, I retrieved the existing Firestore geohash data, removed or updated the necessary places, and then sent back the updated and wrapped geohash data. While creating new entries, I checked if a Firestore geohash already existed. If it was empty, I created a new one; if it existed, I merged the new place with the existing ones. For newly created Firestore geohashes, I needed to capture the document ID from the callback and update the new record with additional information required for the Flutter app to load.
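To give a flavor of what that wrapping looks like, here are a few simplified clauses following Firestore's REST value format; the real module handles many more cases.
# Wrap plain Elixir values in Firestore's typed REST representation.
defp wrap_value_for_firestore(nil), do: %{"nullValue" => nil}
defp wrap_value_for_firestore(value) when is_boolean(value), do: %{"booleanValue" => value}
defp wrap_value_for_firestore(value) when is_binary(value), do: %{"stringValue" => value}
defp wrap_value_for_firestore(value) when is_float(value), do: %{"doubleValue" => value}

defp wrap_value_for_firestore(value) when is_integer(value),
  do: %{"integerValue" => Integer.to_string(value)}

defp wrap_value_for_firestore(values) when is_list(values) do
  %{"arrayValue" => %{"values" => Enum.map(values, &wrap_value_for_firestore/1)}}
end

defp wrap_value_for_firestore(map) when is_map(map) do
  %{"mapValue" => %{"fields" => Map.new(map, fn {k, v} -> {k, wrap_value_for_firestore(v)} end)}}
end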
Once I got all that working and verified on staging, I was ready to set up production and deploy.
Production Deployment
Deploying to Fly.io was a new experience for me. Configuring secrets was mostly straightforward, except for Oban. I occasionally need to pass the Oban Pro auth key as a build secret when deploying. The Google service account setup is also a bit tricky. DevOps isn't my strong suit, but Fly.io makes it quite user-friendly — they use Docker images and .toml configs. I had to adjust the configs to ensure everything ran smoothly without running out of memory. I ended up using the “performance” CPU because of the large amount of data being processed. I'm sure there's room for improvement in my configurations, but I was relieved when the deployment was successful.
Conclusion
End Results
Processing times were reduced from 14-18 hours to under 4.
The system is now scalable and fault-tolerant, ready to handle more brands.
There is comprehensive documentation and an easy-to-follow workflow.
Opportunities for improvement
There are a few areas where performance and organization can be significantly improved:
After google_places records are created, we need to ensure all brands from the newly scraped locations are included, which means removing outdated brands and adding new ones. This step is currently very slow.
Now that all the Oban Batches and workers are running successfully, I might find a better way to organize them, possibly by using Oban Workflows.
I am currently using only 2 of the several Oban Batch callbacks; I could make better use of the handle_discarded callback.
More tests. We could always use more tests.
Personal Takeaways
This project is the most challenging thing I've ever done. There were times when I doubted I could finish it. It was scary and difficult. It felt impossible at times, but somehow I managed to succeed.
Solo development is challenging even on a good day. I love being an independent programmer but have mixed feelings about the isolation. I like being able to focus deeply with few distractions, but I sometimes miss having someone to share ideas with, help me when I'm stuck, offer a different perspective, or share the workload so I can take a break.
I'm glad I took my colleague's advice and wrote this backend using Elixir and Oban. I was consistently impressed by the elegance and speed of these technologies. I still see myself as an Elixir beginner, but I'm excited to find more opportunities to build and improve my skills. The potential is enormous, and this language is truly underutilized.
That’s it. Thanks for reading. 🖤