OpenAI's GPT-5 and Google's Gemini appear to be at a significant disadvantage when it comes to trading crypto. DeepSeek has taken an early lead in one of the first live AI crypto trading experiments, outpacing rivals such as Grok, Claude and GPT-5 in real-time market performance. In the experiment, known as Alpha Arena and hosted by Nof1, six large language models (LLMs) each began with $10,000 to trade perpetual contracts on six cryptocurrencies on the exchange Hyperliquid. The event was designed to assess how advanced AI models handle real-time decision-making in fast-moving financial markets. DeepSeek V3.1 established an early advantage, reporting a gain of roughly 10% within the first three days. Its lead held as trading continued: the model's account value later peaked at around $14,200 before coming back to earth. DeepSeek achieved this by maintaining leveraged long positions on five of the six assets, including Ethereum (ETH) and Bitcoin (BTC); its only unprofitable position was in XRP. The other participants were Qwen3 Max, Grok 4, Claude Sonnet 4.5, Gemini and GPT-5. The table below summarizes the preliminary results for all six AI models in the competition.
How DeepSeek compares with the other AI models
| Model | Approximate return | Account value | Biggest single win | Biggest single loss |
|---|---|---|---|---|
| DeepSeek V3.1 | +9% | $10,848 | $1,490 | -$348.33 |
| Qwen3 Max | +8.4% | $10,476 | $1,453 | -$586.18 |
| Grok 4 | +0.24% | $9,901 | $1,356 | -$638.22 |
| Claude Sonnet 4.5 | -16.63% | $8,320 | $1,807 | -$1,579 |
| Gemini | -55.7% | $4,428 | $347.70 | -$750.02 |
| GPT-5 | -65.1% | $3,495 | $265.59 | -$621.81 |
Source: Alpha Arena (Nof1). The table is subject to change, as trading is ongoing.
Gemini and GPT-5 performed particularly poorly, each losing more than half of its starting balance.
Preliminary results and notes on their limitations
The competition is at an early stage, only days or weeks in, and the AI models trade without human intervention. A strong showing by one model does not mean it matches or surpasses human traders across all market conditions.
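Because every model started with $10,000, the return implied by an account value is simply (account value - 10,000) / 10,000. The short Python sketch below applies that formula to the account values listed in the table above. It is an illustration only: the dictionary and the `implied_return_pct` helper are hypothetical names introduced here, and since the table is a live snapshot, the computed figures may not match the listed return column exactly (the two columns may have been captured at different moments).

```python
# Starting balance given to each model in Alpha Arena, per the article.
STARTING_BALANCE = 10_000.0

# Account values copied from the snapshot table; these change as trading continues.
account_values = {
    "DeepSeek V3.1": 10_848.0,
    "Qwen3 Max": 10_476.0,
    "Grok 4": 9_901.0,
    "Claude Sonnet 4.5": 8_320.0,
    "Gemini": 4_428.0,
    "GPT-5": 3_495.0,
}


def implied_return_pct(account_value: float, start: float = STARTING_BALANCE) -> float:
    """Return the percentage gain or loss of an account value relative to the starting balance."""
    return (account_value - start) / start * 100.0


if __name__ == "__main__":
    # Print each model's implied return with an explicit sign and two decimals.
    for model, value in account_values.items():
        print(f"{model:<18} {implied_return_pct(value):+7.2f}%")
```

For example, DeepSeek's $10,848 works out to about +8.48%, while GPT-5's $3,495 works out to about -65.05%.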
The results reflect this particular setup, in which all models received identical inputs and leverage; different market conditions would likely produce different results. The experiment shows that LLMs can process market data and execute trades autonomously, which points to possible future applications in both institutional and retail trading. However, the absence of human judgment introduces significant risks, which are amplified by the inherent volatility of leveraged cryptocurrency markets.
Challenges still abound with AI
Beyond trading performance, the experiment draws attention to significant ethical and security challenges with AI. Independent research has shown that models such as DeepSeek and Grok can generate malicious content, including phishing emails and fraudulent smart contracts, a capability that makes it possible to automate sophisticated fraud schemes. Security firm Sophos has also documented LLMs being used to automate "pig butchering" scams, in which AI generates personalized messages to build false trust with victims. Separately, Anthropic reported catching a cybercriminal who used its Claude AI to research and extort multiple companies; the hacker reportedly used the Claude chatbot to automate the entire process. These incidents show that the same analytical capabilities that power market analysis can be weaponized for large-scale deception.
Is AI the Killer Crypto Trading App? DeepSeek Posts Surprising Results