Episyche
Published on 2023-06-15
ChatGPT limits how much text a prompt can contain (roughly 4,096 tokens for free users), so we can't paste in a large body of custom data as a source for it to draw on. We also can't hand it a text, PPT, or PDF document directly. Langchain is a tool that works around these limits by adding functionality on top of LLMs. With it, we can use a large number of PDF, text, and PPT documents as input.
Langchain is a Python library for building applications on top of large language models. It provides document loaders, text splitters, embedding wrappers, vector store integrations, and chains that connect these pieces to an LLM, which makes it a practical tool for answering questions over your own text data.
With Langchain, developers can add these capabilities to their Python projects in just a few lines of code and extract insights from documents that would not fit into a single prompt.
Navigate to https://platform.openai.com/docs/introduction
Sign up for an account.
Create an API key (API usage is billed to your account).
Copy the API key and store it in a secure place.
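One common way to keep the key out of your source code is to store it in an environment variable and read it at run time. The sketch below assumes the variable is named OPENAI_API_KEY; the helper name load_api_key is made up for this example.

```python
import os


def load_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key stored under `name`, or raise a clear error.

    `env` defaults to the process environment; passing a dict makes the
    function easy to try out without touching real settings.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first.")
    return key


# Usage, after running `export OPENAI_API_KEY=...` in your shell:
# openai_api_key = load_api_key()
```

Failing early with a readable message is usually easier to debug than an authentication error raised later by the API call.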
Install the following Python packages:
OpenAI
pip install openai
Langchain
pip install langchain
Supporting Python packages (the code below needs faiss-cpu, unstructured, and tiktoken; the others depend on the file types you load):
pip install python-magic
pip install pypdf
pip install faiss-cpu
pip install pdf2image
pip install unstructured
pip install nltk
pip install tiktoken
pip install tabulate
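Several of these packages are imported under a different name than the one passed to pip (for example, python-magic is imported as magic and faiss-cpu as faiss). As a rough sketch, the standard library can report which ones are still missing; the helper name missing_packages is made up for this example.

```python
from importlib.util import find_spec

# pip name -> module name used in `import` statements
OPTIONAL_PACKAGES = {
    "python-magic": "magic",
    "pypdf": "pypdf",
    "faiss-cpu": "faiss",
    "pdf2image": "pdf2image",
    "unstructured": "unstructured",
    "nltk": "nltk",
    "tiktoken": "tiktoken",
    "tabulate": "tabulate",
}


def missing_packages(packages=OPTIONAL_PACKAGES):
    """Return the pip names whose modules cannot be found."""
    return [pip_name for pip_name, module in packages.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    # Print a ready-to-run install command for each missing package.
    for pip_name in missing_packages():
        print(f"missing: pip install {pip_name}")
```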
import os
import magic
import nltk
from langchain.chains import RetrievalQA
from langchain.chains.question_answering import load_qa_chain
from langchain.document_loaders import DirectoryLoader, UnstructuredPDFLoader, OnlinePDFLoader, PyPDFLoader, PyPDFDirectoryLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
# Read the API key from the environment (assumes OPENAI_API_KEY has been set)
openai_api_key = os.environ["OPENAI_API_KEY"]
To load a large number of files from a directory, use DirectoryLoader from Langchain.
# Get your loader ready
loader = DirectoryLoader('../data/PaulGrahamEssaySmall/', glob='**/*.txt')
# Load up your text into documents
documents = loader.load()
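The glob argument decides which files get loaded. As an illustration only, the sketch below uses pathlib from the standard library to show which files a pattern like '**/*.txt' selects in a throwaway directory tree. It assumes DirectoryLoader follows the same glob semantics; the helper matching_files and the file names are made up.

```python
import tempfile
from pathlib import Path


def matching_files(root, pattern="**/*.txt"):
    """Return files under `root` matching `pattern`, relative to root, sorted."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix()
                  for p in root.glob(pattern) if p.is_file())


# Build a temporary directory tree to show which files the pattern selects.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "essays").mkdir()
    (base / "a.txt").write_text("top-level text file")
    (base / "essays" / "b.txt").write_text("nested text file")
    (base / "essays" / "c.pdf").write_text("not matched by *.txt")
    found = matching_files(base)

print(found)  # ['a.txt', 'essays/b.txt']
```

The '**' part makes the match recursive, so text files in nested folders are picked up while the PDF is skipped.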
Split your documents into smaller pieces for efficient search using RecursiveCharacterTextSplitter from Langchain.
# Get your text splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
# Split your documents into texts
texts = text_splitter.split_documents(documents)
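To see what chunk_size and chunk_overlap control, here is a deliberately simplified fixed-width splitter. It is not the algorithm RecursiveCharacterTextSplitter uses (that splitter prefers to break at paragraph, line, and word boundaries); the function name split_text is made up for this example.

```python
def split_text(text, chunk_size=1000, chunk_overlap=0):
    """Cut `text` into pieces of at most `chunk_size` characters.

    Consecutive pieces share `chunk_overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each new chunk starts `step` characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


print(split_text("abcdefghij", chunk_size=4, chunk_overlap=0))
# ['abcd', 'efgh', 'ij']
print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij', 'j']
```

A non-zero overlap repeats the end of one chunk at the start of the next, which can help when an answer straddles a chunk boundary.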
Convert the document pieces into embeddings.
OpenAIEmbeddings is Langchain's wrapper around OpenAI's embedding models, which are trained on large amounts of text from diverse sources such as books, articles, and websites.
The embeddings are numerical vectors that encode the semantic meaning of words or texts. Similar words or texts will have similar vector representations.
# Turn your texts into embeddings
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
# Get your docsearch ready
docsearch = FAISS.from_documents(texts, embeddings)
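The idea that similar texts have similar vectors can be made concrete with a toy example. The sketch below ranks a few hand-written 3-dimensional vectors by cosine similarity to a query vector. The vectors and their labels are invented for illustration (real embeddings have many more dimensions), and this brute-force search is a stand-in for the optimized index FAISS builds, not a description of how FAISS works internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k vectors most similar to query_vec."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]


# Made-up vectors standing in for embedded text chunks.
doc_vecs = [
    [0.9, 0.1, 0.0],  # chunk about internet speed
    [0.8, 0.2, 0.1],  # chunk about download rates
    [0.0, 0.1, 0.9],  # chunk about cooking
]
query_vec = [1.0, 0.0, 0.0]  # question about download speed

print(top_k(query_vec, doc_vecs, k=2))  # [0, 1]
```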
Load the OpenAI LLM
# Load up your LLM
llm = OpenAI(openai_api_key=openai_api_key)
Use RetrievalQA from Langchain to perform a search.
RetrievalQA is a chain that answers a question by first retrieving the most relevant passages from a collection of documents and then passing them to the LLM to produce the answer.
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=docsearch.as_retriever(),
    return_source_documents=True,
)
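With chain_type="stuff", all retrieved passages are placed ("stuffed") into a single prompt together with the question. The sketch below shows the general idea only; the function name and the prompt wording are invented here and are not Langchain's actual template.

```python
def build_stuff_prompt(question, passages):
    """Concatenate all passages into one context block ahead of the question."""
    context = "\n\n".join(passages)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical passages standing in for retrieved document chunks.
prompt = build_stuff_prompt(
    "What is the download speed?",
    ["Passage one about speed.", "Passage two about plans."],
)
print(prompt)
```

Because everything goes into one prompt, this approach only works while the retrieved passages plus the question fit within the model's context limit.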
query = "What is the download speed of AT&T?"
result = qa({"query": query})
# Full output: the query, the answer, and the source documents
result
# Just the answer text
result['result']
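Because return_source_documents=True was set, the output should also carry the passages the answer was based on under the "source_documents" key. The sketch below lists where each passage came from. It runs against a stand-in object rather than a real chain call: FakeDocument, summarize_sources, and the sample data are all hypothetical, and the code assumes each document exposes page_content and a metadata dict with a "source" entry.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a Langchain document, for illustration only."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_sources(result, preview_chars=40):
    """Return one 'source: preview' line per source document in `result`."""
    lines = []
    for doc in result.get("source_documents", []):
        source = doc.metadata.get("source", "unknown")
        preview = doc.page_content[:preview_chars].replace("\n", " ")
        lines.append(f"{source}: {preview}")
    return lines


# Made-up result with the same shape as the chain output described above.
fake_result = {
    "result": "placeholder answer",
    "source_documents": [FakeDocument("First passage text.", {"source": "a.txt"})],
}

for line in summarize_sources(fake_result):
    print(line)  # a.txt: First passage text.
```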
To ask your own question, change the value of the query variable and run the Python code again to get the search results.