On Tuesday, Anthropic unveiled a new version of its AI chatbot, Claude 2, which the company describes as “helpful and harmless.”
Claude 2 offers a familiar repertoire: it can create summaries, write code, translate text, and perform other software-related tasks.
The latest version of the generative AI is available to the public through an API and a web interface in the United States and the United Kingdom. Previously, it was accessible only to businesses by request or as a Slack app.
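For readers curious what that API access looks like in practice, here is a minimal sketch using Anthropic’s Python SDK as it shipped around Claude 2’s launch; the prompt text and token limit below are illustrative, not taken from Anthropic’s announcement.

    # Minimal sketch: calling Claude 2 via Anthropic's API
    # (pip install anthropic). Prompt and token limit are illustrative.
    from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

    # The client reads the ANTHROPIC_API_KEY environment variable by default.
    client = Anthropic()

    completion = client.completions.create(
        model="claude-2",
        max_tokens_to_sample=300,
        prompt=f"{HUMAN_PROMPT} Summarize this memo in three bullet points: ... {AI_PROMPT}",
    )

    print(completion.completion)  # the model's reply text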
Anthropic said in a statement: “Think of Claude as a friendly, enthusiastic colleague or personal assistant who can be instructed in natural language to help you with many tasks.”
“Anthropic has been trying to get into the personal assistant market,” noted Will Duffield, a policy analyst at the Cato Institute, a Washington, D.C. think tank. Claude aims to be a better personal assistant than its rivals, he told TechNewsWorld.
Improved Reasoning Scores
Anthropic claims that Claude 2 has improved on previous models when it comes to coding, mathematics, and reasoning.
Claude 2, for instance, scored 76.5% on the multiple-choice section of the bar exam, up from 73.0% for previous models.
On the GRE reading and writing exams taken by students applying to graduate school, Claude 2 scored above the 90th percentile, and it performed similarly to the median applicant on quantitative reasoning.
On the Codex HumanEval, a Python coding test, Claude 2 scored 71.2%, a big improvement over previous models, which scored 56.0%.
On GSM8K, a large set of grade-school math problems, Claude 2 was only slightly more successful than Claude 1.3, scoring 88.0% versus 85.2%.
Claude 2 has improved performance compared to our previous models on evaluations including Codex HumanEval, GSM8K, and MMLU. Our model card contains the complete set of evaluations: https://t.co/fJ210d9utd pic.twitter.com/LLOuUNfOFV
— Anthropic (@AnthropicAI) July 11, 2023
Knowledge Lag
Anthropic has also expanded how much input Claude can handle.
Claude 2’s context window can process up to 75,000 words of text, enough to digest hundreds of pages of documentation or even an entire book. By comparison, ChatGPT can handle about 3,000 words.
Anthropic added that Claude can now also write longer documents — from memos to letters to stories up to a few thousand words.
Like ChatGPT, Claude is not connected to the internet. It is trained on data that ends in December 2022. That gives it a slight edge over ChatGPT, whose training data currently cuts off in September 2021, but leaves it lagging behind Bing and Bard.
“Bing gives you up-to-date search results,” explained Greg Sterling, co-founder of Near Media, a news, commentary, and analysis website.
However, this may have a limited effect on Claude 2. “Most people won’t see any major differences unless they use these apps side by side,” Sterling told TechNewsWorld. The differences people might perceive, he added, are primarily in the interfaces.
Anthropic also touted the safety improvements in Claude 2. It said its “red team” scores its models against a large set of harmful prompts. While the tests are automated, they are still regularly checked by hand. In its latest evaluation, Anthropic noted that Claude 2 was twice as good at giving harmless responses as Claude 1.3.
Claude 2 also has a set of principles, called a constitution, built into the system that can temper its responses without the need for a human moderator.
Stopping Harm
Anthropic isn’t alone in trying to limit its generative AI software’s potential for harm, observed Rob Enderle, president and principal analyst of the Enderle Group, an advisory services firm in Bend, Ore. “The difference among providers will be in their execution,” he told TechNewsWorld.
He said that companies like Microsoft, Nvidia, and IBM took AI safety seriously when they first entered the industry, while some startups seem more interested in launching something than in shipping a safe and reliable product.
“I take issue with language like harmless, because tools that are useful can be used to do harm in some way,” Duffield said.
Efforts to reduce harm can sap a generative AI program’s value, but that doesn’t appear to be the case with Claude 2. Duffield noted that Claude 2 doesn’t seem to have been blunted to the point of uselessness.
Conquering the Noise Barrier
Enderle argued that being an “honest” AI is key to trusting it. A harmful, dishonest AI doesn’t do us much good, he said. “But if we don’t trust the technology, we shouldn’t be using it.”
“AIs work at machine speeds and we don’t,” he added, “so they could do much more damage in a very short time than we could.”
“AI can make up things that sound plausible but are inaccurate,” Sterling continued. “That’s very dangerous if people rely on incorrect information.”
In some cases, he said, AI can also spread biased or toxic information.
Even if Claude 2 lives up to its billing as a “helpful, harmless, and honest” AI bot, it will have to work hard to stand out in a market that has become very crowded.
“We’re overwhelmed by all the announcements,” Enderle said. “It’s hard to rise above the noise.”
“ChatGPT, Bing, and Bard have the most mindshare, and people will have little reason to use any other application,” Sterling added.
He pointed out that positioning Claude as the “friendly AI” probably won’t be enough to differentiate it from the other players in the market. “It’s an abstraction,” he explained. “Claude will have to perform better or prove more useful to be adopted. People won’t be able to tell the difference between it and ChatGPT.”
On top of all that noise, there’s also a degree of ennui. “It’s harder than ever to impress people with any kind of new chatbot,” Duffield observed. “There’s some chatbot fatigue starting to set in.”