
Access uploaded files in pipelines #164

Open
gqoew opened this issue Jul 17, 2024 · 10 comments

Comments

@gqoew

gqoew commented Jul 17, 2024

Hi there,

Being able to access uploaded files would be a great addition to pipelines. It would greatly expand what pipelines can do, since they would no longer be limited to text input.

It would also be great to let pipelines return files in the chat.

Is there any plan to move this feature forward in the near future? I'd be happy to test it.

Related issues: #66 #19 #81

@g453030291
Contributor

Looking forward to it.
Hopefully, we can add the ability to process files in custom pipelines as soon as possible.
This will greatly enhance the scalability of the project.
Is there anything I can do? I'd like to help.

@chandan-artpark

Hey,
Is there a way to get the files the user has selected in the Pipeline class? Currently the only arguments are user_message, model_id, messages and body. In the default RAG pipeline, information such as file names and collection names is provided, i.e. which file or collection the user selected for the message.
Can this information also be accessed in Pipelines?

@JiangYain

I have the same problem. Without access to user-uploaded files, a lot of functionality is out of reach 😶 It's also hard to get other parameters passed by the front end, such as whether a new session has been created (which bothers me: even when a new session is created, there is no way to start a fresh context), likes or dislikes, etc.

@InquestGeronimo

You can access uploaded files by adding an inlet function. If you upload a file, you should see it in the body:

    async def inlet(self, body: dict, user: dict) -> dict:
        # This function is called before the OpenAI API request is made. You can modify the form data before it is sent to the OpenAI API.
        print(f"inlet:{__name__}")

        print(body)
        print(user)

        return body

@Fusseldieb

Fusseldieb commented Aug 28, 2024

@InquestGeronimo Sorry for pinging you, but did the API change? Some weeks ago I tried to make an example pipeline, and it errored out as soon as I attached an image (#66). Is it now "supported"?

That's the main thing holding me back from using Pipelines instead of OpenAI so far. I don't want to lose image capabilities.


EDIT: Looks like something HAS changed! The pipeline doesn't error out anymore. Yay! Guess I'll be using Pipelines now!

@tjbck Care to close this issue? I'm not OP but I guess this is solved.

@S1M0N38

S1M0N38 commented Oct 4, 2024

Here is a hacky way to access uploaded files.
Define an inlet function as suggested by @InquestGeronimo and query the file's URL with /content appended:

async def inlet(self, body: dict, user: dict) -> dict:
    print(f"Received body: {body}")
    files = body.get("files", [])
    for file in files:
        content_url = file["url"] + "/content"
        print(f"file available at {content_url}")
        # read the file content as binary and do something ...
    return body
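A minimal sketch of actually fetching that /content URL, assuming the pipelines process can reach your Open WebUI server at a base URL you supply (`http://localhost:8080` in the comment is a placeholder) and that the server accepts an Open WebUI API key as a Bearer token. `build_content_request` and `fetch_file_bytes` are hypothetical helpers, not part of the Pipelines API; `urljoin` handles `file["url"]` whether it is a relative path or an absolute URL.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def build_content_request(base_url: str, file_url: str, api_key: str) -> Request:
    """Build a GET request for '<file url>/content' (hypothetical helper).

    urljoin leaves file_url unchanged if it is already absolute and
    resolves it against base_url if it is a path such as /api/v1/files/<id>.
    """
    full_url = urljoin(base_url, file_url).rstrip("/") + "/content"
    # Assumption: the server accepts an API key as a Bearer token.
    return Request(full_url, headers={"Authorization": f"Bearer {api_key}"})


def fetch_file_bytes(base_url: str, file_url: str, api_key: str, timeout: float = 30.0) -> bytes:
    """Download the raw uploaded file (performs a real network call)."""
    with urlopen(build_content_request(base_url, file_url, api_key), timeout=timeout) as resp:
        return resp.read()


# Inside inlet(), for example (base URL and key are placeholders):
# for file in body.get("files", []):
#     raw = fetch_file_bytes("http://localhost:8080", file["url"], "YOUR_API_KEY")
```

Keeping the request construction separate from the network call makes it easy to check the URL and headers before pointing it at a live server.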

@jeandelest

Hi @InquestGeronimo, your solution works for me. Thanks! I still have an issue: the data does not seem to be passed as-is. Do you know why? Is there a way to get the original file content?

Here is the original data:

PlayerID;FirstName;LastName;Team;Position;Goals;Assists;Appearances
1;Leo;Messi;Paris Saint-Germain;Forward;672;305;786
2;Cristiano;Ronaldo;Al Nassr;Forward;700;223;900
3;Neymar;Da Silva Santos;Al Hilal;Forward;398;200;600
4;Kylian;Mbappe;Paris Saint-Germain;Forward;300;150;400
5;Robert;Lewandowski;FC Barcelona;Forward;500;150;700
6;Kevin;De Bruyne;Manchester City;Midfielder;100;200;500
7;Luka;Modric;Real Madrid;Midfielder;120;170;600
8;N'Golo;Kante;Chelsea;Midfielder;30;80;400
9;Ruben;Dias;Manchester City;Defender;10;20;250
10;Virgil;Van Dijk;Liverpool;Defender;20;15;250

And here is what I got from the pipeline:

{
  "id": "6547e61d-dc1d-4544-a4fa-b796d40303e5",
  "user_id": "80ce7079-c367-41e5-89f7-7de8534b90e4",
  "hash": "75e40889b84327411325d75964484104733eb18c58ff14ff1d0c8f057defa1e0",
  "filename": "6547e61d-dc1d-4544-a4fa-b796d40303e5_players.csv",
  "data": {
    "content": "PlayerID: 1\nFirstName: Leo\nLastName: Messi\nTeam: Paris Saint-Germain\nPosition: Forward\nGoals: 672\nAssists: 305\nAppearances: 786 PlayerID: 2\nFirstName: Cristiano\nLastName: Ronaldo\nTeam: Al Nassr\nPosition: Forward\nGoals: 700\nAssists: 223\nAppearances: 900 PlayerID: 3\nFirstName: Neymar\nLastName: Da Silva Santos\nTeam: Al Hilal\nPosition: Forward\nGoals: 398\nAssists: 200\nAppearances: 600 PlayerID: 4\nFirstName: Kylian\nLastName: Mbappe\nTeam: Paris Saint-Germain\nPosition: Forward\nGoals: 300\nAssists: 150\nAppearances: 400 PlayerID: 5\nFirstName: Robert\nLastName: Lewandowski\nTeam: FC Barcelona\nPosition: Forward\nGoals: 500\nAssists: 150\nAppearances: 700 PlayerID: 6\nFirstName: Kevin\nLastName: De Bruyne\nTeam: Manchester City\nPosition: Midfielder\nGoals: 100\nAssists: 200\nAppearances: 500 PlayerID: 7\nFirstName: Luka\nLastName: Modric\nTeam: Real Madrid\nPosition: Midfielder\nGoals: 120\nAssists: 170\nAppearances: 600 PlayerID: 8\nFirstName: N'Golo\nLastName: Kante\nTeam: Chelsea\nPosition: Midfielder\nGoals: 30\nAssists: 80\nAppearances: 400 PlayerID: 9\nFirstName: Ruben\nLastName: Dias\nTeam: Manchester City\nPosition: Defender\nGoals: 10\nAssists: 20\nAppearances: 250 PlayerID: 10\nFirstName: Virgil\nLastName: Van Dijk\nTeam: Liverpool\nPosition: Defender\nGoals: 20\nAssists: 15\nAppearances: 250"
  },
  "meta": {
    "name": "players.csv",
    "content_type": "text/csv",
    "size": 579,
    "path": "/app/backend/data/uploads/6547e61d-dc1d-4544-a4fa-b796d40303e5_players.csv",
    "collection_name": "file-6547e61d-dc1d-4544-a4fa-b796d40303e5"
  },
  "created_at": 1729251218,
  "updated_at": 1729251218
}
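In the dump above, data.content holds a "Key: value" text rendering of each row rather than the raw CSV, while meta.path still points at the original upload. A minimal sketch for recovering the rows, assuming the pipelines process can read that path (for example because the Open WebUI data volume is mounted into the pipelines container at the same location, which depends on your deployment): re-read the file and parse it with Python's csv module. `read_original_csv` is a hypothetical helper; the default `;` delimiter matches the sample file above.

```python
import csv
from typing import Dict, List


def read_original_csv(file_record: dict, delimiter: str = ";") -> List[Dict[str, str]]:
    """Re-read an uploaded CSV from the path recorded in its metadata.

    file_record is a dict shaped like the dump above (it has a "meta" key).
    Assumes meta["path"] is readable from the pipelines process.
    """
    path = file_record["meta"]["path"]
    # newline="" is what the csv module expects for files it parses.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter=delimiter))


# rows = read_original_csv(file_record)
# For the sample file, rows[0]["LastName"] would be "Messi".
```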

Regards

@sir3mat

sir3mat commented Nov 7, 2024

I have "solved" the issue with this approach. It works when files are uploaded inside the chat:

import logging
import os
from dataclasses import dataclass
from typing import Generator, Iterator, List, Optional, Union

from pydantic import BaseModel

logger = logging.getLogger(__name__)


@dataclass
class OIFile:
    # Minimal container for an uploaded file (own helper, not part of Pipelines)
    id: str
    filename: str
    content: str


class Pipeline:
    class Valves(BaseModel):
        # Example valves used in pipe(); replace with your own settings
        LLM_MAX_TOKENS: int = 1024
        LLM_TEMPERATURE: float = 0.7

    def __init__(self):
        self.name = "pipeline_custom_name"
        self.valves = self._initialize_valves()
        # In-memory map: user ID -> files attached to that user's current request
        self.file_contents = {}

    def _initialize_valves(self) -> Valves:
        """Initialize valves using environment variables (names are examples)."""
        return self.Valves(
            LLM_MAX_TOKENS=int(os.getenv("LLM_MAX_TOKENS", "1024")),
            LLM_TEMPERATURE=float(os.getenv("LLM_TEMPERATURE", "0.7")),
        )

    async def on_startup(self):
        """Called when the server is started."""
        logger.info(f"Server {self.name} is starting.")

    async def on_shutdown(self):
        """Called when the server is stopped."""
        logger.info(f"Server {self.name} is shutting down.")

    async def on_valves_updated(self):
        """Called when the valves are updated."""
        logger.info("Valves updated.")

    async def inlet(self, body: dict, user: dict) -> dict:
        """Modifies form data before the OpenAI API request."""
        logger.info("Processing inlet request")

        # Extract file info for all files in the body and link it
        # to the user who sent the request
        file_info = self._extract_file_info(body)
        self.file_contents[user["id"]] = file_info
        return body

    def _extract_file_info(self, body: dict) -> list:
        """Extracts the file info from the request body for all files."""
        files = []
        for file_data in body.get("files", []):
            file = file_data["file"]
            file_id = file["id"]
            filename = file["filename"]
            file_content = file["data"]["content"]

            # Create an OIFile object and append it to the list
            files.append(OIFile(file_id, filename, file_content))

        return files

    def pipe(
        self, body: dict, user_message: str, model_id: str, messages: List[dict]
    ) -> Union[str, Generator, Iterator]:
        logger.info("Starting PIPE process")

        # Extract parameters from body with default fallbacks
        stream = body.get("stream", True)
        max_tokens = body.get("max_tokens", self.valves.LLM_MAX_TOKENS)
        temperature = body.get("temperature", self.valves.LLM_TEMPERATURE)

        # Extract user ID from the body
        user = body.get("user", {})
        user_id = user.get("id", "")

        # Files stored by inlet() for this user, if any
        user_files = self.file_contents.get(user_id)

        # Your own processing goes here (e.g. call an LLM using stream,
        # max_tokens and temperature). Placeholder: list the received files.
        if not user_files:
            return "No files received."
        return "Received files: " + ", ".join(f.filename for f in user_files)

    async def outlet(self, body: dict, user: Optional[dict] = None) -> dict:
        print(f"outlet:{__name__}")
        print(f"Received body: {body}")

        # Forget this user's files once the response is complete
        if user and user.get("id") in self.file_contents:
            del self.file_contents[user["id"]]

        return body

Open WebUI calls inlet, pipe and outlet every time the user sends a query to the pipeline.
If you create a custom model (from the UI) that uses your pipeline as its base model, Open WebUI only calls the pipe method (I don't understand why).
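The hand-off above relies on inlet running before pipe for the same user. A small, self-contained simulation of that flow (plain Python, no Pipelines server; `FileHandoff` is a hypothetical stand-in for the pattern, storing only file names) shows why it breaks when only pipe is called: the per-user dictionary is empty, so pipe sees no files.

```python
from typing import Dict, List, Optional


class FileHandoff:
    """Stand-in for the inlet -> pipe -> outlet hand-off described above."""

    def __init__(self) -> None:
        self.file_contents: Dict[str, List[str]] = {}

    def inlet(self, body: dict, user: dict) -> dict:
        # Remember the file names for this user until pipe() runs.
        self.file_contents[user["id"]] = [
            f["file"]["filename"] for f in body.get("files", [])
        ]
        return body

    def pipe(self, body: dict) -> Optional[List[str]]:
        user_id = body.get("user", {}).get("id", "")
        return self.file_contents.get(user_id)

    def outlet(self, body: dict, user: Optional[dict] = None) -> dict:
        # Drop the stored entry once the response is done.
        if user:
            self.file_contents.pop(user.get("id"), None)
        return body


h = FileHandoff()
user = {"id": "u1"}
body = {"user": user, "files": [{"file": {"filename": "players.csv"}}]}

# Normal chat with the pipeline: inlet, pipe, outlet.
h.inlet(body, user)
print(h.pipe(body))  # ['players.csv']
h.outlet(body, user)

# Custom model built on the pipeline: only pipe() is called.
print(h.pipe(body))  # None
```

Because entries are keyed by user ID, two overlapping requests from the same user could also overwrite each other's files, so this pattern is best treated as a workaround rather than a robust design.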

@rigvedrs

rigvedrs commented Jan 3, 2025

@sir3mat I tried the above method and printed just the body dictionary to see what is being passed. This dictionary seems to contain only the chat info, such as the RAG prompt and the context for the user query. It does not pass the complete document, which is what we need in order to perform our own custom retrieval in the pipeline.

To reproduce:

from typing import List, Union, Generator, Iterator


class Pipeline:
    def __init__(self):
        self.name = "00 Repeater Example"

    async def on_startup(self):
        # This function is called when the server is started.
        print("on_startup")

    async def on_shutdown(self):
        # This function is called when the server is shut down.
        print("on_shutdown")

    async def inlet(self, body: dict, user: dict) -> dict:
        return body

    def pipe(self, user_message: str, model_id: str, messages: List[dict], body: dict) -> Union[str, Generator, Iterator]:
        # Echo the type and contents of body back to the UI
        return f"Type of body: {type(body)} \n {body}"

@chandan-artpark

The body passed to pipe() does not contain details such as the filename or collection_name, so you have to read them in the inlet function and store them in an instance variable, like this:

class Pipeline:
    def __init__(self):
        # File details captured in inlet() for later use in pipe()
        self.inlet_details = []

    async def inlet(self, body: dict, user: dict) -> dict:
        print(f"Received body: {body}")
        files = body.get("files", [])
        for file in files:
            self.inlet_details.append({
                "filename": file.get("filename", "unknown"),
                "url": file.get("url", "unknown"),
            })
        return body

Using these additional details, you can look up the file in the uploads directory and read its content. Alternatively, you can send a request to an Open WebUI endpoint using an API key from your account (created in the Open WebUI settings). Hope this helps.
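A minimal sketch of the uploads-directory lookup. It is based on the file record posted earlier in this thread, where the stored name has the form <file id>_<original name> under /app/backend/data/uploads; that layout, and the pipelines process being able to see that directory, are observations and assumptions, not a documented contract. It also assumes the captured url ends with the file ID (e.g. /api/v1/files/<id>). `file_id_from_url` and `find_upload` are hypothetical helpers.

```python
from pathlib import Path
from typing import Optional


def file_id_from_url(url: str) -> str:
    """Take the last path segment of a URL like /api/v1/files/<id> (assumed shape)."""
    return url.rstrip("/").rsplit("/", 1)[-1]


def find_upload(uploads_dir: str, file_id: str) -> Optional[Path]:
    """Return the stored upload whose name starts with '<file_id>_', if any."""
    matches = sorted(Path(uploads_dir).glob(f"{file_id}_*"))
    return matches[0] if matches else None


# for detail in self.inlet_details:
#     path = find_upload("/app/backend/data/uploads", file_id_from_url(detail["url"]))
#     if path is not None:
#         data = path.read_bytes()
```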


10 participants