Dynamic imports in LangChain document loaders causes lots of problems in different environments #7469

Open
idutta2007 opened this issue Jan 4, 2025 · 3 comments
Labels: auto:bug (Related to a bug, vulnerability, unexpected error with an existing feature)

idutta2007 commented Jan 4, 2025

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf"

// Load document
const loader = new PDFLoader("C:/test/example.pdf")
const docs = await loader.load()

Error Message and Stack Trace (if applicable)

No response

Description

I am trying to use the PDFLoader class to load a PDF document in a Jest test, with testEnvironment set to "node" in jest.config.js. Running the test produces this error:

Error: Failed to load fs/promises. TextLoader available only on environment 'node'. It appears you are running environment 'node'

The error is caused by the dynamic import in the BufferLoader class. The message is misleading because the environment is correctly set to node.

static async imports() {
    try {
        const { readFile } = await import("node:fs/promises");
        return { readFile };
    }
    catch (e) {
        console.error(e);
        throw new Error(`Failed to load fs/promises. TextLoader available only on environment 'node'. It appears you are running environment '${(0, env_1.getEnv)()}'. See https://<link to docs> for alternatives.`);
    }
}
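
One way to make a failure like this less misleading is to carry the underlying error's message into the thrown error instead of only logging it. The following is a minimal sketch under that assumption, not LangChain's implementation; `loadOptional` and its parameters are hypothetical names introduced for illustration.

```typescript
// Hypothetical helper (not part of LangChain): runs a dynamic import supplied
// by the caller and, on failure, reports the real reason instead of guessing
// that the environment is wrong.
async function loadOptional<T>(
  specifier: string,
  importer: () => Promise<T>
): Promise<T> {
  try {
    return await importer();
  } catch (e) {
    // Keep the original message so the caller can see what actually failed.
    const reason = e instanceof Error ? e.message : String(e);
    throw new Error(`Failed to load "${specifier}": ${reason}`);
  }
}
```

A caller would pass the import as a function, for example `loadOptional("node:fs/promises", () => import("node:fs/promises"))`, so the literal specifier stays visible to bundlers.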

You can get past the above error by passing the --experimental-vm-modules option to node, but then you get another error:

Failed to load pdf-parse. Please install it with eg. 'npm install pdf-parse'

even though pdf-parse is already installed. This second error, raised in PDFLoader, also comes from a dynamic import, in this case of a path inside a very specific version of pdf-parse, which I did not have. Again, the error message is misleading.

async function PDFLoaderImports() {
    try {
        const { default: mod } = await import("pdf-parse/lib/pdf.js/v1.10.100/build/pdf.js");
        const { getDocument, version } = mod;
        return { getDocument, version };
    }
    catch (e) {
        console.error(e);
        throw new Error("Failed to load pdf-parse. Please install it with eg. `npm install pdf-parse`.");
    }
}
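
Because the import above pins one deep path inside pdf-parse, the resulting error does not say which path was attempted. A minimal sketch of a more diagnosable approach, assuming more than one entry point could be acceptable: try each candidate specifier in turn and report every attempt. `importFirst` and the injected `importer` are hypothetical names, not LangChain or pdf-parse APIs.

```typescript
// Hypothetical helper: tries candidate module specifiers in order and, if all
// of them fail, reports each specifier together with the reason it failed.
async function importFirst<T>(
  candidates: string[],
  importer: (specifier: string) => Promise<T>
): Promise<T> {
  const failures: string[] = [];
  for (const specifier of candidates) {
    try {
      return await importer(specifier);
    } catch (e) {
      const reason = e instanceof Error ? e.message : String(e);
      failures.push(`${specifier}: ${reason}`);
    }
  }
  throw new Error(`Could not load any of:\n${failures.join("\n")}`);
}
```

In real code the importer would typically be `(s) => import(s)`. The trade-off is that bundlers cannot statically analyze non-literal specifiers, so this suits Node-only code paths better than browser builds.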

IMO, dynamic imports should be removed from LangChain because they cause problems in different environments. Instead, PDFLoader should just import parsers like pdf-parse normally. The document loaders are useless with these kinds of problems. Just look at the number of problems people are having due to this issue.
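
To illustrate the static-import proposal, here is a minimal sketch of a Node-only loader that imports `readFile` statically and accepts a replacement for testing. `StaticFileLoader` and `ReadFile` are hypothetical names, not LangChain classes, and the sketch assumes the module is only ever loaded in Node.

```typescript
import { readFile } from "node:fs/promises";

// Signature for reading a file into bytes; tests can substitute a fake.
type ReadFile = (path: string) => Promise<Uint8Array>;

// Hypothetical loader (not LangChain's BufferLoader). It uses a statically
// imported readFile by default, so a missing module fails when this file is
// loaded, with the runtime's own error message.
class StaticFileLoader {
  private readonly filePath: string;
  private readonly read: ReadFile;

  constructor(filePath: string, read: ReadFile = (path) => readFile(path)) {
    this.filePath = filePath;
    this.read = read;
  }

  // Reads the file and decodes it as UTF-8 text.
  async load(): Promise<string> {
    const bytes = await this.read(this.filePath);
    return new TextDecoder("utf-8").decode(bytes);
  }
}
```

The cost of this design is that the module can no longer be imported in non-Node environments at all, which is presumably why the dynamic imports were introduced in the first place.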

System Info

Platform: windows
node version: 22.12.0
npm version: 10.9.0
Langchain versions:
"@langchain/community": "^0.3.22",
"@langchain/core": "^0.3.27",
"@langchain/openai": "^0.3.16",
"langchain": "^0.3.9",

dosubot bot commented Jan 4, 2025

Hey @idutta2007, I'm here to assist you with any bugs or questions you have regarding LangChain.js. I can help you solve issues, answer questions, and become a contributor. Let me know how I can assist you!

There are two similar unsolved discussions related to the issues you're experiencing with dynamic imports in LangChain:

  1. Module not found: Error: Can't resolve 'fs' when using langchain document loader [1].
  2. Getting Module build failed: UnhandledSchemeError: Reading from "node:fs/promises" is not handled by plugins (Unhandled scheme). in building React app with Langchain JS [2].

To address the issues caused by dynamic imports in LangChain's PDFLoader when running tests in a Node environment, consider the following solutions:

  1. Ensure Node Environment: Verify that your test environment is correctly set up to run Node.js. This can be done by configuring your test runner (e.g., Jest) to use Node as the environment.

  2. Mock Dynamic Imports: If your environment does not support Node.js modules, you can mock the dynamic imports. For example, using Jest, you can mock the fs/promises module:

    jest.mock('node:fs/promises', () => ({
      readFile: jest.fn().mockResolvedValue(Buffer.from('mocked file content')),
    }));

  3. Use Custom PDF.js: Specify a custom pdfjs implementation when creating the WebPDFLoader to resolve issues with pdf-parse:

    import * as fs from "node:fs/promises";
    import { WebPDFLoader } from "@langchain/community/document_loaders/web/pdf";

    // filePath is the path to the PDF on disk
    const loader = new WebPDFLoader(
      new Blob([await fs.readFile(filePath)], {
        type: "application/pdf",
      }),
      {
        pdfjs: () => import("pdf-parse/lib/pdf.js/v1.10.100/build/pdf.js"),
      }
    );

  4. Check Environment Variables: Ensure that any environment variables or configurations that determine the runtime environment are correctly set to 'node'.

These steps should help mitigate the misleading error messages and ensure that dynamic imports work correctly in your test environment [3][4][5].

@dosubot dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label Jan 4, 2025
jacoblee93 (Collaborator) commented

Agreed, we should fix this.

@jacoblee93 jacoblee93 self-assigned this Jan 9, 2025
jacoblee93 (Collaborator) commented

@dqbd if you have time to look would be a nice win
