Inconsistent platform functionality – Mar 9
- March 9, 2021 at 4:00 pm #115303
Between approximately 15:20 and 16:40 UTC on Tuesday, a malfunction of the main database caused repeated disruptions to multiple services.
Among other symptoms, WordPress backends were at times unable to register or start new events, some ongoing live events were turned off or discontinued, and streaming bandwidth consumption was not accounted for. Overall, this affected an estimated 25% of customers.
The issue was acknowledged immediately; however, efforts to restore full functionality took more than an hour. During this time some services were restarted or rerouted, leading to intermittent downtime of the ‘my-account’ area and, in some regions, of the whole wpstream.net website.
The issue has now been isolated and full functionality is restored. Investigation into the root cause is still ongoing, and we are closely monitoring the services until a definitive fix is in place.
Will update the thread with details as soon as available.
- March 11, 2021 at 9:54 am #116088 christina (Moderator)
The root of the issue has been traced to a glitch in the OAuth plugin. As we have been running a customized older version of it, we will need to take the time to customize its latest version and roll it out in production. We expect this to take up to 10 days; meanwhile, we will closely monitor the infrastructure for recurrence of the problem.
- March 24, 2021 at 9:48 am #117392
A permanent solution has been gradually deployed in production over the last few days, and the rollout is now complete.
We sincerely apologize for any inconvenience you may have experienced and appreciate your patience while we worked to resolve this.
- March 30, 2021 at 8:51 am #118435
The issue has resurfaced, and the platform was once again functioning inconsistently for about 30 minutes, starting at 8:09 GMT.
We are diagnosing the problem and will post updates here.
- March 30, 2021 at 1:03 pm #118525 christina (Moderator)
Much like the first occurrence, what appear to have been faulty or inefficient routines in the OAuth plugin led to a database overload, causing a range of symptoms (see above). We realize that having been reassured by the plugin creators that this would not happen again once we upgraded (and yet it did) is no good excuse. So far, we have applied the following countermeasures:
- drastically lightened the DB: many of the OAuth-specific entries were old (yet not properly recycled), redundant, or unneeded
- increased overall DB capacity and scalability
- improved the alerting system so that we are notified of a similar failure minutes before it actually happens
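For illustration only, the first countermeasure (clearing out stale OAuth entries) could be sketched roughly as follows. This is not the routine that was actually run: the table name `oauth_tokens`, the `expires_at` column, the use of SQLite, and the batch size are all assumptions made for the example. Deleting in small batches is one way to avoid putting further load on an already strained database.

```python
import sqlite3


def purge_expired_tokens(conn, now, batch_size=1000):
    """Delete expired rows in small batches and return how many were removed.

    Hypothetical schema: oauth_tokens(token TEXT, expires_at INTEGER).
    """
    total = 0
    while True:
        # Remove at most `batch_size` expired rows per statement so that
        # each transaction stays short.
        cur = conn.execute(
            "DELETE FROM oauth_tokens WHERE rowid IN ("
            "SELECT rowid FROM oauth_tokens WHERE expires_at < ? LIMIT ?)",
            (now, batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total
```

Called periodically (for example from a scheduled job) with the current Unix timestamp as `now`, a routine like this would keep such a table from accumulating entries that are never recycled.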
Steps to be taken over the next few days:
- get back in touch with the plugin creators in hope of permanent solutions
- further investigate the sum of factors that led to this
Longer term goals:
- handle authentication via a different plugin or a home-grown solution if we are unable to sort out the current system
- fully separate website logic from streaming platform functionality, so as to ensure streaming operability independent of other components
We apologize once again for the issues that this may have caused, and we can assure you that we are taking serious measures to overcome this for good. Thank you for your patience and continued cooperation.