This repo contains several folders, each containing sample code for a specific langchain-box object.
In the document_loader_samples directory, there is sample code covering various scenarios with the BoxLoader object. This object takes either a List[str] of Box file ids or a str containing a Box folder id and returns a List of Document objects based on the text representation available in Box. This only works for file types that have text representations in Box.
- Single file
- Multiple files
- All files in a folder
- All files in a folder recursively
- Character limits
- Extra fields
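As a rough sketch of the scenarios above, the snippet below builds the keyword arguments for a BoxLoader and then loads the documents. The parameter names (box_file_ids, box_folder_id, recursive, character_limit, box_developer_token) and the BOX_DEVELOPER_TOKEN environment variable reflect my understanding of langchain-box and are assumptions to check against your installed version. The helpers loader_kwargs and load_documents are hypothetical and not part of this repo.

```python
import os
from typing import Any, Dict, List, Optional


def loader_kwargs(
    file_ids: Optional[List[str]] = None,
    folder_id: Optional[str] = None,
    recursive: bool = False,
    character_limit: Optional[int] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for BoxLoader (hypothetical helper)."""
    # The loader is described as taking either file ids or a folder id.
    if (file_ids is None) == (folder_id is None):
        raise ValueError("Provide exactly one of file_ids or folder_id")
    kwargs: Dict[str, Any] = {}
    if file_ids is not None:
        kwargs["box_file_ids"] = list(file_ids)
    else:
        kwargs["box_folder_id"] = folder_id
        kwargs["recursive"] = recursive
    if character_limit is not None:
        kwargs["character_limit"] = character_limit
    return kwargs


def load_documents(**kwargs: Any):
    """Run the loader. Requires langchain-box and a valid Box token."""
    # Imported lazily so loader_kwargs works without the package installed.
    from langchain_box.document_loaders import BoxLoader

    loader = BoxLoader(
        box_developer_token=os.environ["BOX_DEVELOPER_TOKEN"],  # assumed name
        **kwargs,
    )
    return loader.load()
```

For example, load_documents(**loader_kwargs(folder_id="YOUR_FOLDER_ID", recursive=True)) would cover the "All files in a folder recursively" scenario, where YOUR_FOLDER_ID is a placeholder.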
In the blob_loader_samples directory, there is sample code covering various scenarios with the BoxBlobLoader object. This object accepts one of the following inputs:
- List[str] of Box file ids
- str containing a Box folder id
- str containing a search query
- BoxMetadataQuery specifying a Metadata template, query, and parameters
Based on the input, this object returns a List of Blob objects containing the raw data from any document or image file type in Box. These Blob objects can be passed to your favorite BlobParser to generate Document objects.
- Single file
- Multiple files
- All files in a folder
- All files in a folder recursively
- Files in a folder if a document file type
- Files in a folder if an image file type
- Files in a folder if filename matches glob
- Files in a folder if filename doesn't match glob
- Files in a folder if a document has a matching extension
- Files based on Metadata query
- Files based on search
- Files based on search with search filters
- Extra fields
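The Blob-to-Document step can be sketched as below. This is only an illustration: blobs_to_documents and make_folder_blob_loader are hypothetical helpers, and I am assuming the loader exposes yield_blobs() and the parser exposes lazy_parse(blob), as LangChain blob loaders and blob parsers generally do. The box_folder_id and box_developer_token parameter names are likewise assumptions to verify against your langchain-box version.

```python
from typing import Any, List


def blobs_to_documents(loader: Any, parser: Any) -> List[Any]:
    """Parse every Blob the loader yields into Document objects."""
    documents: List[Any] = []
    for blob in loader.yield_blobs():
        # lazy_parse is expected to return an iterator of Documents per Blob.
        documents.extend(parser.lazy_parse(blob))
    return documents


def make_folder_blob_loader(folder_id: str, token: str) -> Any:
    """Build a BoxBlobLoader for one folder. Requires langchain-box."""
    # Imported lazily so blobs_to_documents can be used with any loader.
    from langchain_box.blob_loaders import BoxBlobLoader

    return BoxBlobLoader(box_developer_token=token, box_folder_id=folder_id)
```

Keeping the parsing loop separate from loader construction means the same function works whichever input mode or filter you configure on the loader.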
In the retriever_samples directory, there is sample code covering various scenarios with the BoxRetriever object. This object accepts either a str containing a search query or both a str with a Box AI query and a List[str] with Box file ids.
Based on the input, this object returns a List of Document objects containing either the text representation from document file types in Box or the answer and/or citations from a Box AI API call.
- Search
- Search with filters
- Box AI with one file
- Box AI with multiple files
- Box AI with answer and citations
- Box AI with citations only
- Search as an agent tool
- Search as part of a chain
- Extra fields
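The two retriever modes can be sketched as follows. The parameter names (box_file_ids, answer, citations, box_developer_token) are my understanding of the BoxRetriever constructor and should be treated as assumptions. retriever_kwargs and ask_box are hypothetical helpers, not code from this repo.

```python
from typing import Any, Dict, List, Optional


def retriever_kwargs(
    file_ids: Optional[List[str]] = None,
    answer: bool = True,
    citations: bool = False,
) -> Dict[str, Any]:
    """Build BoxRetriever keyword arguments (hypothetical helper)."""
    if not file_ids:
        # No file ids: the query is treated as a Box search.
        return {}
    if not (answer or citations):
        raise ValueError("Box AI mode needs answer, citations, or both")
    # File ids present: the query is sent to Box AI for those files.
    return {
        "box_file_ids": list(file_ids),
        "answer": answer,
        "citations": citations,
    }


def ask_box(query: str, token: str, **kwargs: Any):
    """Run one query. Requires langchain-box and a valid Box token."""
    from langchain_box.retrievers import BoxRetriever

    retriever = BoxRetriever(box_developer_token=token, **kwargs)
    return retriever.invoke(query)
```

Under these assumptions, retriever_kwargs(["FILE_ID"], answer=False, citations=True) would correspond to the "Box AI with citations only" scenario, with FILE_ID as a placeholder.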
This repo also enables tests for multiple authentication methods:
- Developer token as an environment variable
- Developer token passed to Loader
- Developer token passed as BoxAuth
- JWT with service account
- JWT as user
- Client credentials grant with service account
- Client credentials grant as user
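The auth methods above roughly map to BoxAuth arguments as sketched here. The field names (box_jwt_path, box_client_id, box_client_secret, box_enterprise_id, box_user_id) and the environment variable names are assumptions based on my reading of langchain-box, and the mode labels such as "jwt_user" are made up for this example.

```python
from typing import Any, Dict, Mapping


def auth_kwargs(mode: str, env: Mapping[str, str]) -> Dict[str, Any]:
    """Map a (hypothetical) mode label to BoxAuth keyword arguments."""
    if mode == "token":
        return {"auth_type": "TOKEN",
                "box_developer_token": env["BOX_DEVELOPER_TOKEN"]}
    if mode in ("jwt_service", "jwt_user"):
        kwargs: Dict[str, Any] = {"auth_type": "JWT",
                                  "box_jwt_path": env["BOX_JWT_PATH"]}
    elif mode in ("ccg_service", "ccg_user"):
        kwargs = {"auth_type": "CCG",
                  "box_client_id": env["BOX_CLIENT_ID"],
                  "box_client_secret": env["BOX_CLIENT_SECRET"]}
        if mode == "ccg_service":
            # Service account access is scoped by the enterprise id.
            kwargs["box_enterprise_id"] = env["BOX_ENTERPRISE_ID"]
    else:
        raise ValueError(f"Unknown auth mode: {mode}")
    if mode.endswith("_user"):
        # Acting as a user adds the user id to the same credentials.
        kwargs["box_user_id"] = env["BOX_USER_ID"]
    return kwargs


def make_auth(mode: str, env: Mapping[str, str]) -> Any:
    """Create a BoxAuth object. Requires langchain-box."""
    from langchain_box.utilities import BoxAuth, BoxAuthType

    kwargs = auth_kwargs(mode, env)
    # Convert the string label into the library's enum member by name.
    kwargs["auth_type"] = BoxAuthType[kwargs["auth_type"]]
    return BoxAuth(**kwargs)
```

The resulting object would then be passed to a loader or retriever in place of a developer token, which is the "Developer token passed as BoxAuth" pattern generalized to JWT and CCG.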
Now that we have langchain ready to use locally, we can set up the tests to run. The test scripts rely on two key components: the environment files located in the config folder and the box search object.
The environment files are used to configure the tests.
File | Purpose | Required
---+---+---
.openai.env.template | Configure OpenAI API key | Yes
.box.env.template | Configure Box-specific fields like file and folder ids | Yes
.token.env.template | Configure developer token auth | Only for token tests
.jwt.env.template | Configure JWT auth | Only for JWT tests
.ccg.env.template | Configure CCG auth | Only for CCG tests
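One way to check the table's requirements before running a test is sketched below. It assumes each env file is named after its template with the .template suffix removed (confirmed in this README only for .box.env and .openai.env) and that the files live in the config folder. missing_env_files and the "token", "jwt", and "ccg" labels are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

# Files the table marks as always required.
ALWAYS_REQUIRED = [".openai.env", ".box.env"]
# Files needed only for the matching auth tests (names are assumed).
AUTH_FILES = {"token": ".token.env", "jwt": ".jwt.env", "ccg": ".ccg.env"}


def missing_env_files(config_dir: str, auth_modes: Iterable[str]) -> List[str]:
    """Return the env files that are expected but not present."""
    wanted = ALWAYS_REQUIRED + [AUTH_FILES[mode] for mode in auth_modes]
    base = Path(config_dir)
    return [name for name in wanted if not (base / name).is_file()]
```

Calling missing_env_files("config", ["jwt"]) before a JWT test run returns an empty list when everything is in place.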
The box search object enables a "real-life" scenario after the test scripts load the appropriate Documents from Box. box_search provides two methods, train_ai and box_search.
The train_ai method accepts the documents returned from the BoxLoader as an argument. It then does several things with those documents. First, it splits the documents into logical chunks using LangChain's RecursiveCharacterTextSplitter. It then converts those chunks of text to OpenAI embeddings and commits them to a local Chroma vector store. Finally, it instantiates a ChatOpenAI as the LLM of choice, creates an LLMChainExtractor as a compressor, and combines the compressor with the Chroma store in a ContextualCompressionRetriever.
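The steps described for train_ai could look roughly like the sketch below. It is not the repo's actual implementation: the chunk sizes are arbitrary, the import paths vary between LangChain versions (Chroma, for instance, also ships in the separate langchain_chroma package), and it needs an OpenAI API key in the environment to run.

```python
from typing import Any, List


def train_ai(documents: List[Any], chunk_size: int = 1000,
             chunk_overlap: int = 100) -> Any:
    """Sketch: split, embed, store, and wrap in a compression retriever."""
    # Imported lazily; these paths are assumptions for a recent LangChain.
    from langchain.retrievers import ContextualCompressionRetriever
    from langchain.retrievers.document_compressors import LLMChainExtractor
    from langchain_community.vectorstores import Chroma
    from langchain_openai import ChatOpenAI, OpenAIEmbeddings
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # 1. Split the loaded documents into chunks.
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    chunks = splitter.split_documents(documents)

    # 2. Embed the chunks and commit them to a local Chroma store.
    vector_store = Chroma.from_documents(chunks, embedding=OpenAIEmbeddings())

    # 3. Use a chat model as the compressor over the Chroma retriever.
    compressor = LLMChainExtractor.from_llm(ChatOpenAI())
    return ContextualCompressionRetriever(
        base_compressor=compressor,
        base_retriever=vector_store.as_retriever(),
    )
```

The returned retriever can then be queried with the prompt set in each test script, for example via its invoke method.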
Many thanks to HTMLFiveDev for their video, on which this object is based.
OK, assuming you have completed the steps above to get the langchain fork installed and prepped, let's get this started.
- In your terminal, run git clone https://github.com/shurrey/box-langchain-documentloader-tests.git to clone this repository to your local machine. This should not be inside of the langchain directory.
- Run cd box-langchain-documentloader-tests to change directories to the project. Open the folder in your favorite editor.
- In the config directory, copy the environment templates to new files; for example, copy .box.env.template to .box.env. You must have .box.env, .openai.env, and the env files for whichever auth modes you plan to use. Open the newly created files and add your values.
- Install the libraries you need. I recommend using a virtual environment. Follow these instructions to install it. Once installed, run virtualenv .venv at the command line in the root directory of this application to create a virtual environment.
- Once you complete that step, run source .venv/bin/activate at the command line to activate your virtual environment. This should change your command line prompt and prepend it with (.venv).
- Now run pip install -r requirements.txt at the command line in the root directory of this application to install your dependencies.
- You should now be all set to run the tests. Each test has a variable called prompt, which you can set based on the file(s) or folder you choose. The prompt is sent to OpenAI, so you will get a real answer based on the file(s) you provide.
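If you would rather script the template-copying step than do it by hand, a minimal sketch follows. It assumes every template in the config directory ends in .env.template and that the target name is the template name without the .template suffix. copy_templates is a hypothetical helper, and it never overwrites an env file that already exists.

```python
import shutil
from pathlib import Path
from typing import List

SUFFIX = ".template"


def copy_templates(config_dir: str) -> List[str]:
    """Copy each *.env.template to its .env name; return the files created."""
    created: List[str] = []
    for template in sorted(Path(config_dir).iterdir()):
        if not template.name.endswith(".env" + SUFFIX):
            continue
        target = template.with_name(template.name[: -len(SUFFIX)])
        if target.exists():
            # Keep values you have already filled in.
            continue
        shutil.copyfile(template, target)
        created.append(target.name)
    return created
```

After running copy_templates("config"), you still need to open the new files and add your own values.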
To run the tests, you can either use the tools provided by your development environment or, from the command line, run python TEST_NAME.py, where TEST_NAME is the file name of the test you wish to run. For example, to test one file, run cd document_loader_test_scripts to change to the appropriate directory and then python test_one_-_file.py to run the test.