Search This Blog

Showing posts with label ECS Instances. Show all posts

Microsoft Hit by Huge Service Outage


This week's 6-hour-long global outage of Microsoft 365 was caused by a flawed Enterprise Configuration Service (ECS) deployment, as per a preliminary post-incident review. This deployment caused cascade errors and availability effects across numerous locations.

ECS is an internal central configuration repository created to allow Microsoft services to make targeted updates, such as particular configurations per tenant or user, as well as broad-scope dynamic changes affecting many services and features.

According to Microsoft, a recent deployment that featured a "broken link to an internal storage service" was the most likely reason for an outage that prevented many customers from accessing or using a variety of Microsoft 365 products for several hours.

Access to several Microsoft services, including Microsoft Teams, Exchange Server, Microsoft 365 admin center, Microsoft Word, and other Office programs, was slowed down as a result of the service issues, which began on Wednesday, July 20 in the evening and persisted into Thursday morning. Microsoft Managed Desktop and other services were also not able to auto-patch due to the problem.

Overview of the outage

Through its public Twitter statements, Microsoft failed to mention the location of the disruptions. According to comments in Microsoft's Twitter statement, the Teams outage appears to have impacted users in Los Angeles, Dallas, New York City, Hong Kong, and Eastern Australia.

With its cloud computing, Microsoft does have a complex service level agreement. Accordingly, the sole form of compensation for any downtime that an organization can receive is a service-time credit. Additionally, since it is not automatically applied, they must ask for the service credit.

"Telemetry shows that this incident had an impact on about 300,000 calls. Due to business hours falling inside the effect timeframe, the Asia Pacific (APAC) region was the most impacted. Direct Routing and Skype MFA were also significantly affected," the company explained.


What sparked the outage?

In the end, the incident had an impact on users seeking to use one or more of the Microsoft 365 apps and services, according to Bleeping Computer.

The botched Enterprise Configuration Service (ECS) deployment was the initial root cause of this outage, as stated by Redmond in their incident report. "Backward compatibility with services that use ECS was impacted by a deployment of the ECS service that had a code flaw. The end result was that it would send inaccurate configurations to all of its partners for services using ECS " the firm stated.

As a result, downstream services received a status response with the code 200, suggesting that the pull was successful, but it just included a JSON object that was poorly formatted. How each Microsoft service used the flawed configuration supplied by ECS determined the impact's severity. Impact varied from services collapsing, like Teams, to low or no impact on other services.

Microsoft claims that as a result of this incident, they are working to strengthen the Microsoft Teams service's resilience so that it may fall back to a previous version of the ECS configuration in the case of a future ECS failure.


Alibaba Cloud Servers Hacked, Trend Micro Reports

 

Trend Micro announced on Monday that numerous hacking groups have been targeting Alibaba Cloud servers to install cryptocurrency mining malware known as "cryptojacking". 

One of the challenges with Alibaba ECS, as per Trend Micro, is the absence of distinct privilege tiers configured on an instance, including all instances providing root privileges by default. This allows malicious actors who obtain access to login credentials to connect to the targeted system via SSH as root without performing any preparatory (escalation of privilege) work. 

Alibaba is a Chinese technology behemoth with an international market presence, with cloud services mainly used throughout Southeast Asia. 

The ECS service, in specific, is advertised as having fast memory, Intel CPUs, and favorable low-latency operations. Perhaps better, ECS comes with a security agent pre-installed to safeguard against malware such as crypto miners. 

"The threat actor has the highest possible privilege upon compromise, including vulnerability exploitation, any misconfiguration issue, weak credentials, or data leakage," explains Trend Micro's report. 

Moreover, the cyber attackers can use these administrative privileges to generate firewall rules that drop incoming packets from IP ranges about internal Alibaba servers, preventing the installed security agent from sensing suspicious behavior. 

Owing to the ease with which kernel module rootkits and cryptojacking malware can be planted considering the elevated privileges, it is not surprising that numerous threat actors compete to take over Alibaba Cloud ECS instances. 

Trend Micro has also noticed scripts that search for processes running on specific ports frequently used by malware and backdoors and terminate the associated processes to eliminate competing malware. An auto-scaling system, which allows the service to automatically adjust computing resources depending on the volume of user queries, is yet another ECS feature used by the threat actors. 

This is to prevent future service disruptions and niggles caused by unexpected traffic loads, but it also provides an opportunity for cryptojackers. Abusing this while it is involved on the targeted account allows the actors to increase their Monero mining power while incurring extra costs to the instance owner.