Domain Knowledge

In the specialized field of crypto investing, domain knowledge is paramount, and data is the gold to success. dolphin.fm approach leverages the expertise of our in-house specialists in crypto trading, blockchain technology, quantitative analysis, and risk management. These experts are responsible for labeling and creating our proprietary dataset of mined financial texts. This dataset is meticulously curated from various sources, including:

On-chain transactional data
Academic blockchain papers
New research and protocols
Emerging mining pools
Innovative smart contract designs
Decentralized exchanges
Social media insights

Our specialists select and curate content based on its relevance, timeliness, factual accuracy, and momentum in the market, ensuring that the model is trained with the most pertinent and up-to-date information.

Consistency and the mitigation of hallucinations

Accuracy and factuality are critical in financial applications, where inaccurate or fabricated information can lead to significant financial losses. Hallucinations often stem from outdated data, inherent biases, incorrect information, and fake news. This issue is particularly pronounced in the crypto space, where information is fragmented, and market sentiment can be highly volatile.

To address these challenges, we employ several advanced techniques:

Curated Training Datasets: As detailed above, we use carefully selected and updated datasets to train the model, reducing the risk of hallucinations.
Document Summarization: This technique reinforces the importance of grounding information in the original text, ensuring that summaries retain the critical context and factual accuracy of the source material.
Open Book Training: Self-Retrieval-Augmented Generation (self-RAG) and open-book Q&A methods help maintain consistency and alignment with accurate information. These techniques enable the model to reference source documents during the generation process, improving reliability.
Named Entity Recognition (NER): NER acts as a safeguard against hallucinations by encouraging the model to recognize and validate entities within the text. This technique enhances the accuracy of the model’s output by ensuring that it correctly identifies and contextualizes key entities, such as names, organizations, and specific crypto assets.

PreviousLarge-Language-Model Specialized in Investing NextTabular Understanding

Last updated 1 year ago