Amazon resolves major AWS outage that disrupted apps, websites, and banks globally



 


A widespread disruption at Amazon Web Services (AWS) on Monday caused several high-profile apps, websites, and banking platforms to go offline for hours before the issue was finally resolved later in the night. The outage, which affected one of Amazon’s main cloud regions in the United States, drew attention to how heavily the global digital infrastructure depends on a few large cloud service providers.

According to Amazon’s official update, the problem stemmed from a technical fault in its Domain Name System (DNS), a core internet function that translates website names into the numerical addresses that computers use to reach one another. When DNS is interrupted, browsers and applications can no longer locate and connect to servers, causing widespread loading failures. The company confirmed the issue affected its DynamoDB API endpoint in the US-EAST-1 region, one of its busiest hubs.
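To make the name-to-address step concrete, here is a minimal Python sketch of a lookup that reports failure the way a client would experience it during a DNS fault. The function name `resolve_addresses` and its injectable `resolver` parameter are illustrative assumptions, not anything taken from Amazon's systems; only `socket.getaddrinfo` and `socket.gaierror` come from the Python standard library.

```python
import socket


def resolve_addresses(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses for hostname.

    Returns an empty list when the lookup fails. That is the position a
    client is in during a DNS fault: the server may be healthy, but the
    client has no address to connect to.
    """
    try:
        results = resolver(hostname, None)
    except socket.gaierror:
        # Name resolution failed, so there is nothing to connect to.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in results})
```

Calling `resolve_addresses("example.com")` (a placeholder hostname) on a machine with working DNS returns one or more addresses. The same call returns an empty list when resolution fails, even though the server behind the name may still be running.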

The first reports of disruptions appeared around 7:00 a.m. BST on Monday, when users began facing difficulties accessing multiple platforms. As the issue spread, users of services such as Snapchat, Fortnite, and Duolingo were unable to log in or perform basic functions. Several banking websites, including Lloyds and Halifax, also reported temporary connectivity problems.

The outage quickly escalated to a global scale. According to the monitoring website Downdetector, more than 11 million user complaints were recorded throughout the day, an unprecedented figure that reflected the magnitude of the disruption. Early in the incident, Downdetector noted over four million reports from more than 500 affected platforms within just a few hours, which was more than double its usual weekday average.

AWS engineers worked through the day to isolate the source of the issue and restore affected systems. To stabilize its network, Amazon temporarily limited some internal operations to prevent further cascading failures. By 11:00 p.m. BST, the company announced that all services had “returned to normal operations.”

Experts said the incident underlined the vulnerabilities of an increasingly centralized internet. Professor Alan Woodward of the University of Surrey explained that modern online systems are highly interdependent, meaning that an error within one major provider can ripple across numerous unrelated services. “Even small technical mistakes can trigger large-scale failures,” he said, pointing out how human or software missteps in one corner of the infrastructure can have global consequences.

Professor Mike Chapple from the University of Notre Dame compared the recovery process to restoring electricity after a large power outage. He said the system might “flicker” several times as engineers fix underlying causes and bring services gradually back online.

Industry observers say such incidents reflect a growing systemic risk within the cloud computing sector, which is dominated by a handful of major firms: Amazon, Microsoft, and Google collectively control nearly 70% of the market. Cori Crider, director of the Future of Technology Institute, described the current model as “unsustainable,” warning that heavy reliance on a few global companies poses economic and security risks for nations and organizations alike.

Other experts suggested that responsibility also lies with companies using these services. Ken Birman, a computer science professor at Cornell University, noted that many organizations fail to develop backup mechanisms to keep essential applications online during provider outages. “We already know how to build more resilient systems,” he said. “The challenge is that many businesses still rely entirely on their cloud providers instead of investing in redundancy.”
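As a rough illustration of the kind of redundancy Birman describes, the Python sketch below tries a primary provider first and falls back to a secondary one if the call fails. Everything in it is hypothetical: `call_with_fallback`, the provider names, and the handler callables are stand-ins for whatever clients an organization actually uses, and a production design would also need data replication, health checks, and timeouts.

```python
def call_with_fallback(providers, request):
    """Try each provider in order; return (name, response) from the first success.

    providers is a list of (name, handler) pairs. Each handler takes the
    request and either returns a response or raises an exception.
    """
    errors = []
    for name, handler in providers:
        try:
            return name, handler(request)
        except Exception as exc:  # a real system would catch narrower errors
            # Record the failure and move on to the next provider.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

With a list such as `[("primary", primary_client), ("secondary", secondary_client)]`, where both clients are assumed callables, a request is served by the secondary provider whenever the primary raises an error. The caller sees an error only if every provider fails.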

Although AWS has not released a detailed technical report yet, its preliminary statement confirmed that the outage originated from a DNS-related fault within its DynamoDB service. The incident, though resolved, highlights a growing concern within the cybersecurity community: as dependence on cloud computing deepens, so does the scale of disruption when a single provider experiences a failure.


ChatGPT Outage in the UK: OpenAI Faces Reliability Concerns Amid Growing AI Dependence

 


ChatGPT Outage: OpenAI Faces Service Disruption in the UK

On Thursday, OpenAI’s ChatGPT experienced a significant outage in the UK, leaving thousands of users unable to access the popular AI chatbot. The disruption, which began around 11:00 GMT, saw users encountering a “bad gateway error” message when attempting to use the platform. According to Downdetector, a website that tracks service interruptions, over 10,000 users reported issues during the outage, which persisted for several hours and caused widespread frustration.

OpenAI acknowledged the issue on its official status page, confirming that a fix was implemented by 15:09 GMT. The company assured users that it was monitoring the situation closely, but no official explanation for the cause of the outage has been provided so far. This lack of transparency has fueled speculation among users, with theories ranging from server overload to unexpected technical failures.

User Reactions: From Frustration to Humor

As the outage unfolded, affected users turned to social media to voice their concerns and frustrations. On X (formerly Twitter), one user humorously remarked, “ChatGPT is down again? During the workday? So you’re telling me I have to… THINK?!” While some users managed to find humor in the situation, others raised serious concerns about the reliability of AI services, particularly those who depend on ChatGPT for professional tasks such as content creation, coding assistance, and research.

ChatGPT has become an indispensable tool for millions since its launch in November 2022. OpenAI CEO Sam Altman recently revealed that by December 2024, the platform had reached over 300 million weekly users, highlighting its rapid adoption as one of the most widely used AI tools globally. However, the incident has raised questions about service reliability, especially among paying customers. OpenAI’s premium plans, which offer enhanced features, cost up to $200 per month, prompting some users to question whether they are getting adequate value for their investment.

The outage comes at a time of rapid advancements in AI technology. OpenAI and other leading tech firms have pledged significant investments in AI infrastructure, with a commitment of $500 billion toward AI development in the United States. While these investments aim to bolster the technology’s capabilities, incidents like this serve as a reminder of the growing dependence on AI tools and the potential risks associated with their widespread adoption.

The disruption highlights the importance of robust technical systems to ensure uninterrupted service, particularly for users who rely heavily on AI for their daily tasks. Despite restoring services relatively quickly, OpenAI’s ability to maintain user trust and satisfaction may hinge on its efforts to improve its communication strategy and technical resilience. Paying customers, in particular, expect transparency and proactive measures to prevent such incidents in the future.

As artificial intelligence becomes more deeply integrated into everyday life, service disruptions like the ChatGPT outage underline both the potential and limitations of the technology. Users are encouraged to stay informed through OpenAI’s official channels for updates on any future service interruptions or maintenance activities.

Moving forward, OpenAI may need to implement backup systems and alternative solutions to minimize the impact of outages on its user base. Clearer communication during disruptions and ongoing efforts to enhance technical infrastructure will be key to ensuring the platform’s reliability and maintaining its position as a leader in the AI industry.

Cell Service Restored Following Extensive AT&T Outage

 

AT&T has resolved issues affecting its mobile phone customers following widespread outages on Thursday, according to a company announcement. Throughout the day, tens of thousands of cell phone users across the United States reported disruptions.

Reports on Downdetector.com, a platform monitoring outages, indicated instances of no service or signal after 04:00 EST (09:00 GMT).

AT&T issued an apology to its customers and confirmed that services were fully operational again by early afternoon. The company stated its commitment to taking preventive measures to avoid similar incidents in the future. The cause of the outage is currently being investigated.

Verizon and T-Mobile informed the BBC that their networks were functioning normally. However, they acknowledged that some customers may have experienced service issues while attempting to communicate with users on different networks.

Downdetector logged more than 74,000 complaints from AT&T customers, with significant clusters in southern and eastern regions of the country.

Smaller carriers such as Cricket Wireless, UScellular, and Consumer Cellular also reported interruptions in service. Complaints ranged from difficulties with calls and texts to problems with internet access, with many users reporting no service or signal.

Downdetector's data showed that major cities including Los Angeles, Chicago, Houston, and Atlanta saw high numbers of outage reports.

Some individuals also faced challenges with 911 services, prompting officials to advise the use of landlines, social media, or cell phones from alternative carriers in emergencies.

The widespread outage has garnered the attention of the US government, with the FBI and Department of Homeland Security launching investigations, as confirmed by John Kirby, spokesperson for the US National Security Council.

Eric Goldstein, executive assistant director for cybersecurity at the US Cybersecurity and Infrastructure Security Agency, stated that they are collaborating with AT&T to understand the root cause of the outage and are ready to provide assistance as necessary.

Although a confidential memo reported by ABC News suggested no signs of malicious activity, CISA officials are actively investigating the incident.