Skip to content

Latest commit

 

History

History
80 lines (58 loc) · 1.4 KB

api_ref_data.rst

File metadata and controls

80 lines (58 loc) · 1.4 KB

torchtune.data

.. currentmodule:: torchtune.data

Text templates

Templates for instruct prompts and chat prompts. Includes some specific formatting for difference datasets and models.

.. autosummary::
    :toctree: generated/
    :nosignatures:

    GrammarErrorCorrectionTemplate
    SummarizeTemplate
    QuestionAnswerTemplate
    PromptTemplate
    PromptTemplateInterface
    ChatMLTemplate

Types

.. autosummary::
    :toctree: generated/
    :nosignatures:

    Message
    Role

Message transforms

Converts data from common schema and conversation JSON formats into a list of torchtune :class:`Message`.

.. autosummary::
    :toctree: generated/
    :nosignatures:

    InputOutputToMessages
    ShareGPTToMessages
    OpenAIToMessages
    ChosenRejectedToMessages
    AlpacaToMessages

Collaters

Collaters used to collect samples into batches and handle any padding.

.. autosummary::
    :toctree: generated/
    :nosignatures:

    padded_collate
    padded_collate_tiled_images_and_mask
    padded_collate_sft
    padded_collate_dpo
    left_pad_sequence

Helper functions

Miscellaneous helper functions used in modifying data.

.. autosummary::
    :toctree: generated/
    :nosignatures:

    validate_messages
    truncate
    load_image
    format_content_with_images