🤗 Readme.md 🤗
Goals
Fact & Evidence's goal is to create an AI-driven tool that helps check the factuality of any paragraph, whether human- or machine-generated. Absolute factuality, a binary True or False, is not only hard to define but also sensitive to the selection of evidence. Thus, we view factuality as a spectrum: how much support a paragraph receives from evidence on the internet.
Note: we try to show you 3 articles, but in case of scraping or retrieval failures we might show you fewer.
Here is how the Fact & Evidence pipeline works, step by step (a code sketch follows the list):
- Paragraph Breakdown: The input paragraph is broken down into sentences, and each sentence into atomic claims.
- Evidence Gathering: Each claim is searched on Google to find related evidence.
- Evidence Retrieval: Using sparse or dense retrieval (see Customizing Configuration), Fact & Evidence finds the most relevant sentences and extracts their surrounding context.
- LLM Analysis: A large language model (LLM) judges the atomic claims and their corresponding evidence to determine whether the evidence supports the claim.
- Evidence Classification: Each piece of evidence is classified into categories (news, blogs, wikis, social media, scientific/medical articles, government websites, etc.).
- Scoring and Filtering: Users can view the overall factuality scores across all evidence types or filter by specific categories or individual pieces of evidence.
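
Putting these steps together, here is a minimal sketch of the pipeline in Python. The helper callables (`search_fn`, `retrieve_fn`, `judge_fn`) and the simplified sentence-level claim splitting are hypothetical stand-ins for the real components, and evidence classification and filtering are omitted for brevity:

```python
import re

def split_into_claims(paragraph: str) -> list[str]:
    # Step 1 (simplified): treat each sentence as one claim. The real system
    # further decomposes sentences into atomic claims.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]

def check_paragraph(paragraph, search_fn, retrieve_fn, judge_fn, n_evidence=5):
    records = []
    for claim in split_into_claims(paragraph):
        for page in search_fn(claim)[:n_evidence]:       # evidence gathering
            passages = retrieve_fn(claim, page["text"])  # sparse or dense retrieval
            supported = judge_fn(claim, passages)        # LLM analysis
            records.append((claim, page["url"], supported))
    # Scoring: fraction of (claim, evidence) pairs the LLM judged as supported.
    score = sum(r[2] for r in records) / max(len(records), 1)
    return score, records
```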
How To
Fact & Evidence can be used with the following steps:
- Input Paragraph: Enter the paragraph you want to fact-check in the input box.
- Configure Settings: Choose appropriate configuration settings for your use case.
- Submit: Click the "Submit" button to initiate the fact-checking process.
- Analysis: Allow Fact & Evidence a few minutes to browse the internet and carefully analyze the results.
- Examine Results: Examine the factuality score. Expand any claim in the sentence to see the supporting evidence, and click "More Details" to understand why the LLM considers the evidence supportive.
- Filter Evidence: Filter any evidence or evidence type as appropriate for your needs.
Customizing Configuration
- LLM: Select the Large Language Model (LLM) that will judge whether the evidence supports each claim.
- Retrieval Mode: There are two options for retrieval mode:
  - sparse: This is the default option. It uses the `BM25` algorithm to retrieve the most relevant sentences based on word-frequency statistics shared between the evidence and the claim (see the sparse-retrieval sketch after this list).
  - dense: This option uses `jina-embeddings-v3` as the embedding model. It creates vector representations of the evidence and claim sentences, then finds the pieces of evidence most related to the claim (see the dense-retrieval sketch after this list).
- Evidence Per Document: For each piece of evidence, more than one passage may be relevant to the claim. This setting determines how many evidence chunks are used to judge the claim.
- Context Window Size: Once the most relevant sentences are found, the context window size determines how much surrounding text is extracted, so that context is preserved.
- Number of Evidence: The number of pieces of evidence retrieved from the internet for each claim, up to a maximum of 5.
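
For intuition, here is a minimal sparse-retrieval sketch. The `rank_bm25` package and the example text are our illustrative choices; the app only specifies the BM25 algorithm.

```python
# pip install rank-bm25
from rank_bm25 import BM25Okapi

evidence_sentences = [
    "The Eiffel Tower was completed in March 1889.",
    "It served as the entrance arch to the 1889 World's Fair.",
    "The tower is 330 metres tall.",
]
claim = "The Eiffel Tower was finished in 1889."

# Tokenize by whitespace and rank evidence sentences by BM25 word-frequency overlap.
bm25 = BM25Okapi([s.lower().split() for s in evidence_sentences])
print(bm25.get_top_n(claim.lower().split(), evidence_sentences, n=2))
```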
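
And a corresponding dense-retrieval sketch, assuming the `jinaai/jina-embeddings-v3` checkpoint on Hugging Face and an illustrative Context Window Size of one sentence on each side:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

sentences = [
    "Construction of the tower began in 1887.",
    "The Eiffel Tower was completed in March 1889.",
    "It was the tallest structure in the world until 1930.",
]
claim = "The Eiffel Tower was finished in 1889."

# Embed the claim and the evidence sentences, then pick the closest sentence.
scores = util.cos_sim(model.encode([claim]), model.encode(sentences))[0]
best = int(scores.argmax())

# Context Window Size = 1: keep one sentence on each side of the best match.
window = 1
context = sentences[max(0, best - window): best + window + 1]
print(context)
```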