Beyond the Distributed Monolith: Rearchitecting the Big Data Platform

The BBC’s Audience Platform Data team collects, transforms and delivers billions of events each day from audience interactions with mobile apps and websites such as BBC News, BBC Sport, iPlayer and Sounds.

Last year we migrated to a new analytics provider and took this as an opportunity to re-architect our distributed monolith. We will share the lessons learnt from operating it for nearly 3 years, and how we designed our new microservices architecture so that it is easier to test, scales to cater for increasing demand, lets us keep track of the message flow, and replays errors without stopping the rest of the messages from being processed.

We will also discuss the ideas behind the tooling we have developed which helps us operate our pipeline and has helped new members of the team share the understanding required to troubleshoot problems.

We have been in production for over a year, and as demand on our big data platform increases we are beginning to discuss what the platform may look like in the future and the steps we will take to get there.

What are you doing now?

I am a Principal Systems Engineer at the BBC, where I work in the area that deals with personalization across our services. What this means is that when you sign up for a BBC account, you gain access to all these personalization features: you can get show recommendations, follow shows, and stay up to date by receiving notifications when things that you are interested in become available. Those personalizations involve a lot of data, from the moment you create your account and provide us with your personal information through to saving all the different activities and tracking what you do throughout the website. The team that I'm in is called the data platform team. We aim to be the single point of reference within the BBC where all this data is stored and made available for internal use. There are teams of data scientists in different areas who report on the data and use it to understand whether the features and products that we provide are actually serving what people need.

What is the goal for your talk?

The talk is about rearchitecting the monolith. A few years ago we had a different data platform. It was not intentional, but it ended up being a distributed monolith. We realized this when we started having operational issues: even though it was a bunch of microservices, at the end of the day it was really, really hard to operate. It appeared to be one big thing that was just blowing up, and there was no way to cope with it. We had to operate this platform for a couple of years, and there wasn't really a business case to change and re-architect the whole thing. But because things end up changing at the BBC, we migrated to a new analytics provider, and that is when we saw the golden opportunity to re-architect. Sometimes at the BBC we see a great increase in data based on a single news item, so we can have spikes in load. How can we cope with that? When our users are querying the data, they don't really see the impact of what's going on. My main goals for the talk are to cover applying the lessons learned, how microservices limit failure, how you can recover, and how we cope with different loads and with data evolving over time.
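
One common way to replay errors without stopping the rest of the messages from being processed, as the abstract puts it, is to park failing messages in a separate dead-letter store and retry them later. The sketch below illustrates that general pattern only; it is not the BBC's implementation, and every name in it (`process_batch`, `replay`, `parse_event`, the event fields) is a hypothetical chosen for the example.

```python
from collections import deque


def process_batch(messages, handler, dead_letters):
    """Apply handler to each message; park failures instead of stopping."""
    processed = []
    for message in messages:
        try:
            processed.append(handler(message))
        except Exception as error:
            # Keep the failed message and the reason so it can be replayed later.
            dead_letters.append((message, repr(error)))
    return processed


def replay(dead_letters, handler):
    """Retry every parked message once; messages that fail again are re-parked."""
    pending = list(dead_letters)
    dead_letters.clear()
    return process_batch([message for message, _ in pending], handler, dead_letters)


def parse_event(raw):
    """Hypothetical strict handler: raises KeyError if 'action' is missing."""
    return {"user": raw["user"], "action": raw["action"].lower()}
```

In this sketch a malformed event ends up in `dead_letters` while the remaining events in the batch are still processed; once the handler has been fixed or relaxed, calling `replay(dead_letters, fixed_handler)` pushes the parked events through again.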

You also mentioned in your abstract that you developed your own tooling to operate your pipelines. Can you give us a little preview of what tooling you developed?

Because this microservices architecture can be quite complex, if you're operating it you have to know the names of the services, where they reside, how you deploy them: a lot of names of things that you need to keep track of. Most companies have what are called runbooks. At the end of the day, these runbooks are wiki documents that give you the links to all these things, and there's a lot of nitty-gritty detail that you have to know. The interesting thing is that we've had a lot of new people joining the team, and they really only need to understand things at a high level: how do you connect to certain services? So we developed a command line interface that you can ask very simple questions. Developing this command line interface has allowed us to automate a lot of these tasks, so we don't have to intervene manually as we used to back in the days of the distributed monolith.
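
To make the idea of a command line interface that answers simple operational questions more concrete, here is a minimal sketch using Python's standard `argparse` module. The interview does not describe the actual tool, so the command names (`list`, `where`), the service names, and the `example.invalid` URLs are all assumptions made up for illustration.

```python
import argparse

# Hypothetical registry replacing the links normally buried in a wiki runbook.
SERVICES = {
    "ingest": {
        "dashboard": "https://example.invalid/dashboards/ingest",
        "logs": "https://example.invalid/logs/ingest",
    },
    "enrich": {
        "dashboard": "https://example.invalid/dashboards/enrich",
        "logs": "https://example.invalid/logs/enrich",
    },
}


def build_parser():
    """Build a parser with two hypothetical subcommands: list and where."""
    parser = argparse.ArgumentParser(
        prog="pipeline", description="Answer simple questions about the pipeline."
    )
    commands = parser.add_subparsers(dest="command", required=True)
    commands.add_parser("list", help="list the known services")
    where = commands.add_parser("where", help="show where to find something")
    where.add_argument("service", choices=sorted(SERVICES))
    where.add_argument("what", choices=["dashboard", "logs"])
    return parser


def run(argv):
    """Return the answer to a question as a string so it is easy to test."""
    args = build_parser().parse_args(argv)
    if args.command == "list":
        return "\n".join(sorted(SERVICES))
    return SERVICES[args.service][args.what]
```

For example, `run(["where", "ingest", "logs"])` returns the logs link for the hypothetical `ingest` service, so a new team member does not need to know where that link lives in the runbook.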

What would be the key takeaways?

The big lesson is that microservices architectures are always evolving. Whenever you build something, assume that it should be easy to change. If you make different components easily replaceable, it is going to make your life easier in the future. Another big lesson is that when you are designing, think ahead about how you want to operate the new architecture. If you only think about it at the end, then you probably haven't addressed those questions, you don't have enough metrics, or the system is really hard to expand. Also, invest in testing in the early days: think about unit tests early on and create a framework for testing. Another thing is the need to discuss which technologies to use, so that people can choose the language for their microservice while taking full ownership and responsibility for their work. And the last one: how do you do this in a cost-effective way? We spend quite a bit of time doing cost forecasting, looking at how different technologies would impact the cost.


Blanca Garcia-Gil

Principal Engineer on data platform @BBC

Blanca Garcia-Gil is a Principal Systems Engineer at the BBC. She currently works on a team whose aim is to provide a reliable platform at petabyte scale for data engineering and machine learning. She provides leadership on ensuring that the development team has the correct infrastructure and tooling...



Fleming, 3rd flr.


Next Generation Microservices: Building Distributed Systems the Right Way


Interview Available, London, Distributed Systems


From the same track

SESSION + Live Q&A Microservices

Monolith Decomposition Patterns

Patterns to help you incrementally migrate from a monolith to microservices. Big Bang rebuilds of systems are so 20th century. With our users expecting new functionality to be shipped more frequently than ever before, we no longer have the luxury of a complete system rebuild. In fact, a big bang...

Sam Newman

Microservice, Cloud, CI/CD Expert

SESSION + Live Q&A Interview Available

Monitoring All the Things: Keeping Track of a Mixed Estate

Monitoring all of a team’s systems can be tricky when you have a microservice architecture. But what happens when you have many teams, each building systems using totally different technology stacks? Add in decades of legacy systems and a sprinkling of third-party tools and you’ve got...

Luke Blaney

Principal Engineer Operations and Reliability Programme @FT

SESSION + Live Q&A Distributed Systems

Why Distributed Systems Are Hard

Every company that has adopted microservices architecture operates a complex distributed system. It's basically a full-time endeavor to keep up with the ever-changing landscape of technologies and tools to build, maintain, and scale these towering production systems, but the fundamentals of...

Denise Yu

Senior Software Engineer @GitHub

SESSION + Live Q&A Silicon Valley

To Microservices and Back Again

From the start, Segment embraced a microservice architecture in our control plane and data plane. Microservices have many benefits: improved modularity, reduced testing burden, better functional composition, environmental isolation, and development team autonomy, etc. but when implemented wrong...

Alexandra Noonan

Software Engineer @segment

PANEL DISCUSSION + Live Q&A Microservices

Panel: Microservices - Are they still worth it?

Lots of us have moved away from monolithic architectures and embraced microservices but do we see the bang for the buck? Is the impact they are having a positive one or negative one? Is there an alternative middle ground? Have we learnt how to wrangle all the operational complexity inherent with...

Luke Blaney

Principal Engineer Operations and Reliability Programme @FT

Alexandra Noonan

Software Engineer @segment

Manuel Pais

IT Organizational Consultant and co-author of Team Topologies

Matt Heath

Senior Staff Engineer @Monzo
