Access uploaded files in pipelines #164

gqoew · 2024-07-17T08:00:54Z

Hi there,

Being able to access uploaded files would be a great addition to pipelines. It would greatly expand their potential, since they would no longer be limited to text input.

It would also be great to enable pipelines to return files in the chat.

Is there any plan to move this feature forward in the near future? I would be happy to test it.

Related issues: #66 #19 #81

g453030291 · 2024-07-19T08:41:54Z

Looking forward to it.
Hopefully, we can add the ability to process files in custom pipelines as soon as possible.
This will greatly enhance the scalability of the project.
Is there anything I can do? I'd like to help.

chandan-artpark · 2024-07-25T09:48:21Z

Hey,
Is there a way to get the files the user has selected in the pipelines class? Currently the only arguments are "user_message, model_id, messages and body". In the default RAG pipeline, information such as file names and collection_names is provided, that is, information about which file/collection the user has selected in the message.
Can this information also be accessed in Pipelines?

JiangYain · 2024-08-14T04:28:08Z

I have the same problem. Without access to user-uploaded files, a lot of functionality is limited 😶 It's also hard to get other parameters passed by the front end, such as whether a new session has been created (this bothers me: even when a new session is created, there is no way to start a fresh context), likes or dislikes, etc.

InquestGeronimo · 2024-08-20T15:31:07Z

You can access uploaded files by adding an inlet function. If you upload a file, you should see it in the body:

    async def inlet(self, body: dict, user: dict) -> dict:
        # This function is called before the OpenAI API request is made. You can modify the form data before it is sent to the OpenAI API.
        print(f"inlet:{__name__}")

        print(body)
        print(user)

        return body

Fusseldieb · 2024-08-28T16:29:30Z

@InquestGeronimo Sorry for pinging you, but did the API change? Some weeks ago I tried to make an example pipeline, and it errored out as soon as I attached an image (#66). Is it now "supported"?

That's the main thing that's holding me back from integrating pipelines instead of OpenAI so far. I don't want to lose image capabilities.

EDIT: Looks like something HAS changed! The pipeline doesn't error out anymore. Yay! Guess I'll be using Pipelines now!

@tjbck Care to close this issue? I'm not OP but I guess this is solved.

S1M0N38 · 2024-10-04T16:07:34Z

Here is a hacky way to access uploaded files.
Define an inlet function as suggested by @InquestGeronimo and query the file URL with /content appended:

async def inlet(self, body: dict, user: dict) -> dict:
    print(f"Received body: {body}")
    files = body.get("files", [])
    for file in files:
        content_url = file["url"] + "/content"
        print(f"file available at {content_url}")
        # read the file content as binary and do something ...
    return body
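
The content URL built above can then be fetched over HTTP. A minimal sketch, assuming `file["url"]` is a path relative to the Open WebUI server, that the pipeline can reach that server at a base URL you supply, and that the endpoint accepts a bearer API key. `base_url`, `api_key`, and both helper names are hypothetical and not part of the pipelines API:

```python
import urllib.request
from urllib.parse import urljoin


def content_urls(body: dict, base_url: str) -> list:
    """Build an absolute ".../content" URL for every file entry in the body."""
    urls = []
    for file in body.get("files", []):
        url = file.get("url")
        if not url:
            continue  # skip entries that carry no URL
        urls.append(urljoin(base_url, url.rstrip("/") + "/content"))
    return urls


def fetch_bytes(url: str, api_key: str, timeout: float = 30.0) -> bytes:
    """Download one file as raw bytes, sending the key as a bearer token (assumed auth scheme)."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Inside `inlet`, something like `for url in content_urls(body, base_url): data = fetch_bytes(url, api_key)` would yield the binary content; whether the request succeeds depends on your deployment and credentials.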

jeandelest · 2024-10-18T13:43:50Z

Hi @InquestGeronimo, your solution works for me. Thanks! I still have an issue: it seems the data is not passed as is. Do you know why? Is there a way to get the original file content?

Here is the original data:

PlayerID;FirstName;LastName;Team;Position;Goals;Assists;Appearances
1;Leo;Messi;Paris Saint-Germain;Forward;672;305;786
2;Cristiano;Ronaldo;Al Nassr;Forward;700;223;900
3;Neymar;Da Silva Santos;Al Hilal;Forward;398;200;600
4;Kylian;Mbappe;Paris Saint-Germain;Forward;300;150;400
5;Robert;Lewandowski;FC Barcelona;Forward;500;150;700
6;Kevin;De Bruyne;Manchester City;Midfielder;100;200;500
7;Luka;Modric;Real Madrid;Midfielder;120;170;600
8;N'Golo;Kante;Chelsea;Midfielder;30;80;400
9;Ruben;Dias;Manchester City;Defender;10;20;250
10;Virgil;Van Dijk;Liverpool;Defender;20;15;250

And here is what I got from the pipeline:

{
  "id": "6547e61d-dc1d-4544-a4fa-b796d40303e5",
  "user_id": "80ce7079-c367-41e5-89f7-7de8534b90e4",
  "hash": "75e40889b84327411325d75964484104733eb18c58ff14ff1d0c8f057defa1e0",
  "filename": "6547e61d-dc1d-4544-a4fa-b796d40303e5_players.csv",
  "data": {
    "content": "PlayerID: 1\nFirstName: Leo\nLastName: Messi\nTeam: Paris Saint-Germain\nPosition: Forward\nGoals: 672\nAssists: 305\nAppearances: 786 PlayerID: 2\nFirstName: Cristiano\nLastName: Ronaldo\nTeam: Al Nassr\nPosition: Forward\nGoals: 700\nAssists: 223\nAppearances: 900 PlayerID: 3\nFirstName: Neymar\nLastName: Da Silva Santos\nTeam: Al Hilal\nPosition: Forward\nGoals: 398\nAssists: 200\nAppearances: 600 PlayerID: 4\nFirstName: Kylian\nLastName: Mbappe\nTeam: Paris Saint-Germain\nPosition: Forward\nGoals: 300\nAssists: 150\nAppearances: 400 PlayerID: 5\nFirstName: Robert\nLastName: Lewandowski\nTeam: FC Barcelona\nPosition: Forward\nGoals: 500\nAssists: 150\nAppearances: 700 PlayerID: 6\nFirstName: Kevin\nLastName: De Bruyne\nTeam: Manchester City\nPosition: Midfielder\nGoals: 100\nAssists: 200\nAppearances: 500 PlayerID: 7\nFirstName: Luka\nLastName: Modric\nTeam: Real Madrid\nPosition: Midfielder\nGoals: 120\nAssists: 170\nAppearances: 600 PlayerID: 8\nFirstName: N'Golo\nLastName: Kante\nTeam: Chelsea\nPosition: Midfielder\nGoals: 30\nAssists: 80\nAppearances: 400 PlayerID: 9\nFirstName: Ruben\nLastName: Dias\nTeam: Manchester City\nPosition: Defender\nGoals: 10\nAssists: 20\nAppearances: 250 PlayerID: 10\nFirstName: Virgil\nLastName: Van Dijk\nTeam: Liverpool\nPosition: Defender\nGoals: 20\nAssists: 15\nAppearances: 250"
  },
  "meta": {
    "name": "players.csv",
    "content_type": "text/csv",
    "size": 579,
    "path": "/app/backend/data/uploads/6547e61d-dc1d-4544-a4fa-b796d40303e5_players.csv",
    "collection_name": "file-6547e61d-dc1d-4544-a4fa-b796d40303e5"
  },
  "created_at": 1729251218,
  "updated_at": 1729251218
}

Regards
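
The `data.content` field above looks like text extracted from the upload rather than the raw file, while `meta.path` points at the stored file. A minimal sketch of reading the original bytes from that path, assuming the pipeline process can see the same filesystem as the Open WebUI backend (often not the case when the two run in separate containers). `read_original` is a hypothetical helper, and it falls back to the extracted text when the path is not readable:

```python
from pathlib import Path


def read_original(file_record: dict) -> bytes:
    """Return the raw uploaded bytes if meta.path is readable, else the extracted text."""
    path = file_record.get("meta", {}).get("path")
    if path and Path(path).is_file():
        return Path(path).read_bytes()
    # Fallback: the already-processed text that arrives in the request body.
    return file_record.get("data", {}).get("content", "").encode("utf-8")
```

For the record shown above, `read_original(record).decode("utf-8")` would give back the semicolon-separated CSV only if `/app/backend/data/uploads/` is mounted into the pipelines container as well.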

sir3mat · 2024-11-07T09:16:51Z

I have "solved" the issue with this approach. It works when files are uploaded inside the chat:

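import logging
from typing import Generator, Iterator, List, Optional, Union

from pydantic import BaseModel

logger = logging.getLogger(__name__)


class OIFile:
    # Not defined in the original comment; a minimal stand-in container (assumption).
    def __init__(self, file_id: str, filename: str, content: str):
        self.id = file_id
        self.filename = filename
        self.content = content


class Pipeline:
    class Valves(BaseModel):
        # Put your own valves here. Only the two names used in pipe() are known;
        # the defaults below are illustrative placeholders.
        LLM_MAX_TOKENS: int = 1024
        LLM_TEMPERATURE: float = 0.7

    def __init__(self):
        self.name = "pipeline_custom_name"
        self.valves = self._initialize_valves()
        self.file_contents = {}

    def _initialize_valves(self) -> Valves:
        """Initialize valves using environment variables."""
        return self.Valves(
            # your valves initialization goes here
        )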

    async def on_startup(self):
        """Called when the server is started."""
        logger.info(f"Server {self.name} is starting.")

    async def on_shutdown(self):
        """Called when the server is stopped."""
        logger.info(f"Server {self.name} is shutting down.")

    async def on_valves_updated(self):
        """Called when the valves are updated."""
        logger.info("Valves updated.")


    async def inlet(self, body: dict, user: dict) -> dict:
        """Modifies form data before the OpenAI API request."""
        logger.info("Processing inlet request")

        # Extract file info for all files in the body
        # here i have created an inmemory dictionary to link users to their owned files
        file_info = self._extract_file_info(body)
        self.file_contents[user["id"]] = file_info
        return body

    def _extract_file_info(self, body: dict) -> list:
        """Extracts the file info from the request body for all files."""
        files = []
        for file_data in body.get("files", []):
            file = file_data["file"]
            file_id = file["id"]
            filename = file["filename"]
            file_content = file["data"]["content"]

            # Create a OIFile object and append it to the list
            files.append(OIFile(file_id, filename, file_content))

        return files
        
    def pipe(
        self, body: dict, user_message: str, model_id: str, messages: List[dict]
    ) -> Union[str, Generator, Iterator]:
        
        logger.info("Starting PIPE process")

        # Extract parameters from body with default fallbacks
        stream = body.get("stream", True)
        max_tokens = body.get("max_tokens", self.valves.LLM_MAX_TOKENS)
        temperature = body.get("temperature", self.valves.LLM_TEMPERATURE)

        # Extract user ID from the body
        user = body.get("user", {})
        user_id = user.get("id", "")

        # Extract user files if available
        if user_id in self.file_contents:
            user_files = self.file_contents[user_id]
        else:
            user_files = None
        
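        # DO YOUR STUFF here: build the response from user_files and the parameters above.
        result = f"{len(user_files or [])} file(s) attached"  # placeholder result
        return result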

    async def outlet(self, body: dict, user: Optional[dict] = None) -> dict:
        print(f"outlet:{__name__}")
        print(f"Received body: {body}")
      
        # user defaults to None, so guard before indexing it
        if user and user["id"] in self.file_contents:
            del self.file_contents[user["id"]]

        return body

Open WebUI calls inlet, pipe, and outlet every time the user sends a query to the pipeline.
If you create a custom model (from the UI) that uses your pipeline as its base_model, Open WebUI only calls the pipe method (I don't understand why).
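
The handoff described above, where inlet stores per-user file data and pipe reads it back, can be exercised outside Open WebUI. A minimal sketch with a stripped-down class; `FileHandoff`, `simulate`, and the sample body are hypothetical, and an in-memory dict like this is only safe if each user has one request in flight at a time:

```python
import asyncio


class FileHandoff:
    """Stores file contents per user in inlet() so that pipe() can read them."""

    def __init__(self):
        self.file_contents = {}

    async def inlet(self, body: dict, user: dict) -> dict:
        # Keep only the extracted text of each uploaded file, keyed by user id.
        self.file_contents[user["id"]] = [
            entry["file"]["data"]["content"] for entry in body.get("files", [])
        ]
        return body

    def pipe(self, body: dict) -> str:
        user_id = body.get("user", {}).get("id", "")
        files = self.file_contents.get(user_id, [])
        return f"{len(files)} file(s) available"

    async def outlet(self, body: dict, user: dict = None) -> dict:
        # Drop the cached entry once the response has been produced.
        if user:
            self.file_contents.pop(user["id"], None)
        return body


def simulate() -> str:
    """Call inlet, pipe, and outlet in the order described in this comment."""
    handoff = FileHandoff()
    user = {"id": "u1"}
    body = {"user": user, "files": [{"file": {"data": {"content": "hello"}}}]}
    asyncio.run(handoff.inlet(body, user))
    answer = handoff.pipe(body)
    asyncio.run(handoff.outlet(body, user))
    return answer
```

If inlet is never called, as reported for custom models built on the pipeline, `pipe` simply finds no cached entry for the user and sees an empty list.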

rigvedrs · 2025-01-03T18:56:18Z

@sir3mat I tried the above method and printed just the body dictionary to see what is being passed. The dictionary seems to contain only the chat info, such as the RAG prompt and context for the user query. It does not pass the complete document, which is what we want, so that we can perform our own custom retrieval in our pipeline.

To reproduce:

from typing import List, Union, Generator, Iterator


class Pipeline:
    def __init__(self):
        self.name = "00 Repeater Example"

    async def on_startup(self):
        # This function is called when the server is started.
        print(f"on_startup")
        pass

    async def on_shutdown(self):
        # This function is called when the server is shutdown.
        print(f"on_shutdown")
        pass

    async def inlet(self, body: dict, user: dict) -> dict:
        return body

    
    def pipe(self, user_message: str, model_id: str, messages: List[dict], body: dict) -> Union[str, Generator, Iterator]:
        return (f"Type of body: {type(body)} \n {body}") #user_message to the UI

chandan-artpark · 2025-01-06T07:31:29Z

The body passed to pipe() does not contain details such as filename and collection_name, so you have to get them in the inlet function and store them in a variable, like this:

class Pipeline:
    def __init__(self):
        # Details collected in inlet() so that pipe() can read them later
        self.inlet_details = []

    async def inlet(self, body: dict, user: dict) -> dict:
        print(f"Received body: {body}")
        files = body.get("files", [])
        for file in files:
            self.inlet_details.append({
                "filename": file.get("filename", "unknown"),
                "url": file.get("url", "unknown"),
            })
        return body

Using these additional details, you can look up the file in the uploads dir and get the content. Alternatively, you can send a request to a webui endpoint using an API key from your account in the webui settings. Hope this helps.

jeandelest mentioned this issue Oct 18, 2024

Access to uploaded file in pipline #260

Open
