Saturday, 16 November 2024

AI startup Anthropic is accused of bypassing anti-scraping rules

Websites accuse AI startup Anthropic of bypassing anti-scraping protocols, causing disruptions and sparking debates over compliance and licensing.

AI startup Anthropic, known for developing the Claude large language models, has been accused by multiple websites of disregarding their anti-scraping protocols. Freelancer and iFixit have raised concerns over Anthropic’s alleged behaviour, claiming that the company’s web crawler has been excessively active on their sites.

Freelancer’s complaints

Matt Barrie, CEO of Freelancer, has stated that Anthropic’s ClaudeBot is “the most aggressive scraper by far.” Barrie said the crawler visited Freelancer’s website 3.5 million times within four hours, causing significant disruption. This traffic volume is reportedly “about five times the volume of the number two” AI crawler. Barrie noted that this aggressive scraping has negatively impacted the site’s performance and revenue. Freelancer initially tried to refuse the bot’s access requests, then blocked Anthropic’s crawler to prevent further issues.

iFixit’s experience

Kyle Wiens, CEO of iFixit, echoed similar concerns. Wiens said on X (formerly Twitter) that Anthropic’s bot hit iFixit’s servers one million times within 24 hours. The volume of requests put considerable strain on iFixit’s resources and set off the team’s high-traffic alarms, waking staff at 3 AM. The situation improved only after iFixit specifically disallowed Anthropic’s bot in its robots.txt file.

This isn’t the first time an AI company has been accused of ignoring the Robots Exclusion Protocol, or robots.txt. Back in June, Wired reported that AI firm Perplexity had been crawling its website despite the presence of a robots.txt file, which typically instructs web crawlers on which pages they can and cannot access. Adherence to robots.txt is voluntary, and bad bots often ignore it. After Wired’s report, startup TollBit revealed that other AI firms, including OpenAI and Anthropic, have also bypassed robots.txt signals.
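For readers unfamiliar with the mechanism, the following is a minimal sketch of how a well-behaved crawler could consult robots.txt rules before requesting a page, using Python’s standard urllib.robotparser module. The rules, the example.com URLs and the "SomeOtherBot" name are hypothetical and chosen only for illustration; the sketch also assumes the crawler identifies itself with the user-agent token "ClaudeBot". It is not iFixit’s actual file or Anthropic’s actual crawler logic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (an assumption for illustration only).
# The first group blocks one named crawler from the whole site; the second
# group applies to every other crawler and blocks only /private/.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler asks this question before every request.
claudebot_allowed = parser.can_fetch("ClaudeBot", "https://example.com/guides/")
other_allowed = parser.can_fetch("SomeOtherBot", "https://example.com/guides/")
other_private = parser.can_fetch("SomeOtherBot", "https://example.com/private/page")

print("ClaudeBot may fetch /guides/:", claudebot_allowed)
print("SomeOtherBot may fetch /guides/:", other_allowed)
print("SomeOtherBot may fetch /private/page:", other_private)
```

Nothing in this check is enforced by the server. The file only states the site’s wishes, which is why compliance depends on the crawler choosing to run a check like this one and honour the answer.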

Anthropic’s response and ongoing issues

Anthropic has responded to these accusations, telling The Information that it respects robots.txt and that its crawler “respected that signal when iFixit implemented it.” The company said it strives for minimal disruption by being thoughtful about how quickly it crawls the same domains, and that it is investigating the issue to ensure compliance.

AI firms frequently use web crawlers to collect content to train their generative AI technologies. However, this practice has led to multiple lawsuits from publishers accusing these firms of copyright infringement. Companies like OpenAI have started forming partnerships with content providers to mitigate the risk of further legal action. OpenAI’s content partners include News Corp., Vox Media and the Financial Times, among others.

Wiens from iFixit is willing to discuss a potential licensing agreement with Anthropic, suggesting that a formal deal could benefit both parties. This approach could pave the way for a more collaborative relationship between content providers and AI developers, reducing the friction caused by unauthorised scraping activities.
