Hi all,
On December 7th, we had elevated error rates in our database throughout the peak of the workday. This left many users unable to update their Streak boxes and contacts, and caused sporadic issues loading data throughout the day.
We know you rely on Streak to keep your business running and we apologize for the problems that this caused you. I wanted to give a little bit of context on what happened, how we resolved the issue, and what we’re doing to make sure it doesn’t happen again.
For background, Streak has historically been powered by the Google Cloud Datastore database. Cloud Datastore is very reliable and easy to maintain, but it’s restrictive in how we can access data. For instance, to get all of the boxes connected to the contacts in an email thread, we first have to fetch all of the contacts connected to the thread, and then in a second step fetch all of the boxes connected to those contacts. This makes the Streak experience slower and limits the amount of context we can give you about who you’re talking to, which means more manual work for you.
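To make that concrete, here’s a minimal sketch of that two-step fan-out pattern using the Cloud Datastore Python client. The kind names (Contact, Box) and property names (thread_key, contact_key) are illustrative only, not our actual schema.

```python
from google.cloud import datastore

client = datastore.Client()

def boxes_for_thread(thread_key):
    # Step 1: fetch the contacts attached to the email thread.
    contact_query = client.query(kind="Contact")
    contact_query.add_filter("thread_key", "=", thread_key)
    contacts = list(contact_query.fetch())

    # Step 2: fetch the boxes attached to each of those contacts.
    # Datastore has no joins, so the fan-out happens in application
    # code, one query per contact.
    boxes = []
    for contact in contacts:
        box_query = client.query(kind="Box")
        box_query.add_filter("contact_key", "=", contact.key)
        boxes.extend(box_query.fetch())
    return boxes
```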
To better support these kinds of lookups, we’ve been migrating some data from Cloud Datastore onto a platform based on MySQL, an industry-standard database that handles context-based questions like this much more directly. Early in the migration, we ran into problems where query performance was limited by the disk performance of Google’s hosted MySQL service. To work around that limitation, we dramatically over-provisioned processing power and memory to make up for the disk shortfall.
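For comparison, here’s a sketch of what the same question looks like against a relational schema, where a single query can walk from thread to contacts to boxes. Again, the table and column names are illustrative rather than our actual schema, and the connection details are placeholders.

```python
import mysql.connector  # assumes the mysql-connector-python package

def boxes_for_thread(conn, thread_id):
    # One query expresses the whole question: thread -> contacts -> boxes.
    cursor = conn.cursor(dictionary=True)
    cursor.execute(
        """
        SELECT DISTINCT b.*
        FROM boxes AS b
        JOIN box_contacts AS bc ON bc.box_id = b.id
        JOIN thread_contacts AS tc ON tc.contact_id = bc.contact_id
        WHERE tc.thread_id = %s
        """,
        (thread_id,),
    )
    return cursor.fetchall()

# Placeholder connection details for illustration only.
conn = mysql.connector.connect(
    host="localhost", user="app", password="...", database="example"
)
```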
We wanted a stable foundation for future work, so on Sunday we moved to a different set of instances with much better disk performance. As part of that move, we returned to the instance size we had been using before the disk issues. Unfortunately, in the intervening period we had deployed additional queries that legitimately used more of the extra processing power than we anticipated, and this didn’t become evident until we hit the workday peak. Since our database was at full capacity, it wasn’t feasible to migrate to a larger instance until the workload lessened as folks signed off for the evening in Europe and North America. We made some gains by optimizing queries, but error rates and latency remained higher than we consider acceptable for the remainder of the workday.
On the evening of December 7th, we migrated to instances that have both the better disk performance and the higher processing power and memory. Our metrics are back to their target range, and we’ve added additional monitoring in this area.
We’ve also changed our migration process so that, wherever possible, we add extra capacity in advance of needing it. We appreciate your trust in us and apologize again for the issue.
Sincerely,
Fred Wulff
Engineering @ Streak