A widespread outage at Amazon and AWS has led to heavy disruption of our services. Live streaming events frequently cannot be started or broadcast to.
The WpStream platform relies extensively on multiple services from Amazon Web Services (AWS). As AWS has not yet announced an ETA for a return to normal, we are anxiously waiting for updates from them. We’ve run a full checkup and expect full functionality as soon as they’re fully back online.
The services WpStream relies on resumed normal operation (with zero or very small error rates) around 2 AM UTC. In all, WpStream was impaired for slightly over 10 hours, with issues ranging from subtle malfunctions to complete unavailability.
We are still analyzing the impact and implications of employing cloud services for critical operations.
The failure affected WpStream on multiple fronts:
- (moderate, intermittent) inability to sign in via the WordPress plugin
- (moderate, intermittent) inability to start live events
- (moderate, intermittent) inability to broadcast
- (moderate, intermittent) inability to watch live events
- (moderate, intermittent) support ticket malfunctions
- (reduced) payment failures and failure to properly allocate resources after payment
- (reduced) failure to dispatch transactional emails
- (reduced) other malfunctions
These were all outcomes of an AWS malfunction detailed here: https://www.reuters.com/markets/commodities/amazons-prime-ring-other-apps-down-thousands-users-2021-12-07/
Because multiple services (database, computing, networking) were malfunctioning at once, we have concluded that no feasible amount of preparation would have kept the platform operating fully unaffected. Redundancy and failover are built into WpStream at multiple levels; however, these are meant to address common issues, not (rare) catastrophic failures.
Future plans and opportunities for improvement on our end:
- continue to rely on AWS (alongside other cloud providers) for various operations
- work on improved failover for payment processing and streaming resource allocation
- improve emergency response procedures and support availability