Improve PDF handling: increase size limit and extract images from PDFs #555

@karamvirsingh1998

@MaheshtheDev I've been testing supermemory and found two issues with PDF uploads:

Issue 1: The 10MB limit is too small

  • Most research papers with images hit this limit
  • Users have to split files, which breaks context
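
To make the limit discussion concrete, here is a minimal sketch of a configurable size check. It is not taken from the supermemory codebase: the names `DEFAULT_MAX_PDF_MB` and `check_pdf_size` are hypothetical, and it assumes the 10MB limit means 10 × 1024 × 1024 bytes.

```python
from typing import Tuple

# Hypothetical constant: mirrors the 10MB limit reported in this issue.
DEFAULT_MAX_PDF_MB = 10


def check_pdf_size(size_bytes: int, max_mb: int = DEFAULT_MAX_PDF_MB) -> Tuple[bool, str]:
    """Return (ok, message) for an upload of size_bytes against a max_mb limit."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    # Assumption: 1 MB is treated as 1024 * 1024 bytes.
    limit_bytes = max_mb * 1024 * 1024
    if size_bytes <= limit_bytes:
        return True, "ok"
    size_mb = size_bytes / (1024 * 1024)
    return False, f"PDF is {size_mb:.1f} MB; the limit is {max_mb} MB"
```

Keeping the limit as a parameter rather than a hard-coded value would let it be raised (for example, `check_pdf_size(size, max_mb=50)`) without touching the check itself.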

Issue 2: Images in PDFs are not processed

  • Charts, diagrams, and figures are ignored
  • Only text gets extracted
  • Important visual information is lost
  • Can't search for content that's in images

Example: I uploaded a research paper containing images, and the images were completely ignored.
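
As a rough illustration of how an upload could be flagged when it contains images that text-only extraction would drop, the sketch below counts image XObject markers in the raw PDF bytes. This is a heuristic under stated assumptions, not how supermemory processes PDFs: the function names are hypothetical, it does not decode or extract any image data, it will miss inline images, and it can be fooled by matching bytes inside stream data.

```python
import re

# Matches the dictionary entry that marks an image XObject, e.g. "/Subtype /Image".
# The trailing \b keeps names such as "/ImageMask" from matching.
_IMAGE_XOBJECT = re.compile(rb"/Subtype\s*/Image\b")


def count_image_markers(pdf_bytes: bytes) -> int:
    """Count image XObject markers found in the raw bytes of a PDF."""
    return len(_IMAGE_XOBJECT.findall(pdf_bytes))


def has_images(pdf_bytes: bytes) -> bool:
    """Return True if the heuristic scan finds at least one image marker."""
    return count_image_markers(pdf_bytes) > 0
```

A check like `has_images(data)` could be used to warn that figures and charts were skipped; actually extracting and indexing the images would need a real PDF parser.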

What I want to fix

I want to work on both of these issues:

  1. Increase the PDF size limit

  2. Extract and process images from PDFs

Why I'm the right person

  • I've identified exactly where the issues are
  • I understand the codebase structure
  • I have a clear solution in mind
  • Ready to write tests and docs

I want to take ownership of this issue and submit a PR.

@MaheshtheDev Let me know if this sounds good!
