Develop Lua filters for common LaTeX macros not handled by pandoc directly #35

Open
mtmorgan opened this issue Oct 15, 2022 · 5 comments

@mtmorgan

Following on #34, this StackOverflow post shows how to write a Lua filter; a set of these might be developed for the BiocStyle macros as a kind of 'meta' resource for this project.

This

return {
  {
    RawInline = function (raw)
      local macro = raw.text:match '\\R{}'
      if raw.format == 'latex' and macro then
        return pandoc.RawInline('markdown', '_R_')
      end
    end
  }
}

would replace the Rnw macro \R{} with the markdown _R_ and, if saved in a file BiocStyle-Rnw-to-Rmd.lua, would be used as

pandoc -f latex+raw_tex -t markdown file.Rnw --lua-filter BiocStyle-Rnw-to-Rmd.lua -o file.Rmd

The next macros to tackle are likely \CRANpkg{<package name>} and \Biocpkg{<package name>}, which translate to the markdown links [<package name>](https://cran.r-project.org/package=<package name>) and [<package name>](https://bioconductor.org/packages/<package name>), followed by \Rcode{<inline code>}, which translates to `<inline code>`. I think Sweave code chunks <<...>>= ... @ could also be translated automatically.
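As a rough sketch of those next steps (not a tested filter), the link and inline-code translations could look something like the following. The names translate_macro, sweave_to_rmd and repos are made up for illustration, and the URL prefixes are the ones quoted above. The sketch assumes macro arguments contain no nested braces. The Sweave part is meant as a plain-text pre-processing step rather than a pandoc filter, and it assumes the chunk options can be carried over unchanged, which may not hold for every Sweave option.

```lua
-- URL prefixes as quoted in this issue; the table name is hypothetical.
local repos = {
  CRANpkg = 'https://cran.r-project.org/package=',
  Biocpkg = 'https://bioconductor.org/packages/',
}

-- Translate one raw LaTeX snippet to markdown; returns nil if no rule applies.
function translate_macro(text)
  -- %a+ captures the macro name, [^}]* its (brace-free) argument.
  local name, arg = text:match('\\(%a+){([^}]*)}')
  if not name then return nil end
  if repos[name] then
    return '[' .. arg .. '](' .. repos[name] .. arg .. ')'
  elseif name == 'Rcode' then
    return '`' .. arg .. '`'
  end
  return nil
end

-- pandoc applies a global function named RawInline to each raw inline element.
function RawInline(raw)
  if raw.format ~= 'latex' then return nil end
  local md = translate_macro(raw.text)
  if md then return pandoc.RawInline('markdown', md) end
end

-- Built with string.rep so the sketch contains no literal code fence.
local fence = string.rep('`', 3)

-- Rewrite Sweave chunks (<<opts>>= ... @) as R Markdown chunks, line by line.
function sweave_to_rmd(src)
  local out = {}
  for line in (src .. '\n'):gmatch('(.-)\n') do
    local opts = line:match('^<<(.*)>>=%s*$')
    if opts then
      -- Chunk header: carry the options over unchanged (an assumption).
      out[#out + 1] = fence .. (opts == '' and '{r}' or '{r ' .. opts .. '}')
    elseif line:match('^@%s*$') then
      -- A lone @ closes the chunk.
      out[#out + 1] = fence
    else
      out[#out + 1] = line
    end
  end
  return table.concat(out, '\n')
end
```

Under plain Lua the two helpers can be tried directly on strings; the RawInline function only comes into play when the file is passed to pandoc with --lua-filter.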

@jwokaty
Contributor

jwokaty commented Oct 15, 2022

Thank you again. This is exactly where we wanted to go! I think this would be an interesting task for our future Outreachy fellow.

@jwokaty jwokaty added the enhancement New feature or request label Oct 15, 2022
@jwokaty jwokaty moved this to Todo in Sweave2Rmd Oct 15, 2022
@LiNk-NY

LiNk-NY commented Dec 20, 2022

Thanks Martin, @mtmorgan

Here is a working filter that I was able to come up with. The language is a bit unwieldy and I'm a novice :)

function RawInline (raw)
    local formula = raw.text:match '\\Rpackage{(.*)}'
    if raw.format == 'latex' and formula then
        return pandoc.RawInline('markdown', '`r Biocpkg(' .. formula .. ')`')
    end

    local formula = raw.text:match '\\Robject{(.*)}'
    if raw.format == 'latex' and formula then
        return pandoc.RawInline('markdown', '`' .. formula .. '`')
    end

    local formula = raw.text:match '\\Rfunction{(.*)}'
    if raw.format == 'latex' and formula then
        return pandoc.RawInline('markdown', '`' .. formula .. '`')
    end
end

@mtmorgan
Author

It would probably be helpful to come up with a test Rnw document and a corresponding expected Rmd document, with one line per LaTeX 'test' --> corresponding Rmd. I tweaked your code & my code a bit:

return {
  {
    RawInline = function (raw)
      local macro = raw.text:match '\\R{}'
      if raw.format == 'latex' and macro then
        return pandoc.RawInline('markdown', '*R*')
      end

      local macro = raw.text:match '\\R$'
      if raw.format == 'latex' and macro then
        return pandoc.RawInline('markdown', '*R*')
      end

      local formula = raw.text:match '\\Bioconductor{}'
      if raw.format == 'latex' and formula then
         return pandoc.RawInline('markdown', '*Bioconductor*')
      end

      local formula = raw.text:match '\\CRANpkg{([^}]*)}'
      if raw.format == 'latex' and formula then
         return pandoc.RawInline('markdown', '`r CRANpkg(' .. formula .. ')`')
      end

      local formula = raw.text:match '\\Biocpkg{([^}]*)}'
      if raw.format == 'latex' and formula then
         return pandoc.RawInline('markdown', '`r Biocpkg(' .. formula .. ')`')
      end

      local formula = raw.text:match '\\Githubpkg{([^}]*)}'
      if raw.format == 'latex' and formula then
         return pandoc.RawInline('markdown', '`r Githubpkg(' .. formula .. ')`')
      end

      local formula = raw.text:match '\\Rpackage{([^}]*)}'
      if raw.format == 'latex' and formula then
         return pandoc.RawInline('markdown', '`' .. formula .. '`')
      end

      local formula = raw.text:match '\\Robject{(.*)}'
      if raw.format == 'latex' and formula then
         return pandoc.RawInline('markdown', '`' .. formula .. '`')
      end

      local formula = raw.text:match '\\Rcode{(.*)}'
      if raw.format == 'latex' and formula then
         return pandoc.RawInline('markdown', '`' .. formula .. '`')
      end

      local formula = raw.text:match '\\software{(.*)}'
      if raw.format == 'latex' and formula then
         return pandoc.RawInline('markdown', '`' .. formula .. '`')
      end

      local formula = raw.text:match '\\file{(.*)}'
      if raw.format == 'latex' and formula then
         return pandoc.RawInline('markdown', '`' .. formula .. '`')
      end

      local formula = raw.text:match '\\Rfunction{(.*)}'
      if raw.format == 'latex' and formula then
         return pandoc.RawInline('markdown', '`' .. formula .. '`')
      end
    end
  }
}

to translate

The \R{} programming language

\R\ is a programming language.

The name of one programming language is simply \R.

\Biocpkg{BiocStyle} is a \Bioconductor{} package.

The \CRANpkg{knitr} is used to create markdown vignettes.

Sometimes packages, like \Githubpkg{AnVILAz} are only found on Github.

The \R{} package \Rpackage{foo} is not found in any common repository

\software{samtools} is pretty important in Bioinformatics...

\Robject{mtcars} is a \Rcode{data.frame}.

\Rfunction{data.frame} is a function used to create a \Rcode{data.frame}.

\Rfunction{data.frame()} is a function used to create a \Rcode{data.frame}.

Sometimes inline \R{} code \Rcode{x <-
1 + 1} can span two lines.

to get something that is mostly correct(?)

The *R* programming language

*R* is a programming language.

The name of one programming language is simply *R*.

`r Biocpkg(BiocStyle)` is a *Bioconductor* package.

The `r CRANpkg(knitr)` is used to create markdown vignettes.

Sometimes packages, like `r Githubpkg(AnVILAz)` are only found on
Github.

The *R* package `foo` is not found in any common repository

`samtools` is pretty important in Bioinformatics\...

`mtcars` is a `data.frame`.

`data.frame` is a function used to create a `data.frame`.

`data.frame()` is a function used to create a `data.frame`.

Sometimes inline *R* code `x <-
1 + 1` can span two lines.

As you note, probably there are much better ways of implementing the Lua code, which is highly repetitive now! Also, maybe we could start a Lua repository that might start to follow better practices (than an issue thread!) for Lua development...
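One way the repetition might be reduced is to drive the filter from lookup tables, so that adding a macro means adding a table entry rather than another if block. The sketch below is only that, a sketch: the names translate, r_helpers, code_like and fixed are invented here, and arguments are assumed to contain no nested braces. It also differs from the output shown above in one respect: the package name is wrapped in double quotes, on the assumption that the BiocStyle R helpers expect a character string, e.g. Biocpkg("BiocStyle").

```lua
-- Macros whose argument becomes a call to the R helper of the same name.
local r_helpers = { CRANpkg = true, Biocpkg = true, Githubpkg = true }

-- Macros whose argument is rendered as inline code.
local code_like = {
  Rpackage = true, Robject = true, Rcode = true,
  Rfunction = true, software = true, file = true,
}

-- Argument-free macros replaced by fixed markdown.
local fixed = { R = '*R*', Bioconductor = '*Bioconductor*' }

-- Translate one raw LaTeX snippet; returns nil when no rule applies.
function translate(text)
  local name, arg = text:match('\\(%a+){([^}]*)}')
  if not name then
    -- Bare form without braces, such as \R at the end of the raw text.
    name = text:match('\\(%a+)%s*$')
  end
  if not name then return nil end
  if fixed[name] then return fixed[name] end
  if arg and r_helpers[name] then
    -- Quoting the argument is an assumption about the R helpers.
    return '`r ' .. name .. '("' .. arg .. '")`'
  end
  if arg and code_like[name] then
    return '`' .. arg .. '`'
  end
  return nil
end

-- pandoc applies a global function named RawInline to each raw inline element.
function RawInline(raw)
  if raw.format ~= 'latex' then return nil end
  local md = translate(raw.text)
  if md then return pandoc.RawInline('markdown', md) end
end
```

Keeping translate free of any pandoc calls means it can be exercised with plain Lua strings, which would make it easier to check against a test document.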

@jwokaty
Contributor

jwokaty commented Dec 20, 2022

@mcarlsn @villafup @BerylKanali It might be that you've noticed things that we repeatedly have to edit manually to get them into the right format. It might be good to start documenting those here, so that we can make sure those cases are included. I agree with @mtmorgan that it would be nice to come up with a test .Rnw. Maybe @BerylKanali can help with this given some guidance?
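To make the 'one line per test' idea concrete, here is a minimal sketch of a case table and a checker in plain Lua. Everything named here (run_cases, stand_in, cases) is hypothetical, and the stand-in translator works on whole lines with simple substitutions; a real check would presumably run pandoc with the filter on the test .Rnw and compare the result with the expected .Rmd. The two cases are taken from the example lines earlier in this thread.

```lua
-- Each case pairs one LaTeX line with the Rmd line expected back.
local cases = {
  { 'The \\R{} programming language', 'The *R* programming language' },
  { '\\software{samtools} is pretty important', '`samtools` is pretty important' },
}

-- Stand-in translator for the demonstration: it handles only two macros,
-- so the second case above is expected to be reported as a failure.
function stand_in(line)
  line = line:gsub('\\R{}', '*R*')
  line = line:gsub('\\Bioconductor{}', '*Bioconductor*')
  return line
end

-- Run every case through `translate` and collect a message per mismatch.
function run_cases(translate, case_list)
  local failures = {}
  for i, case in ipairs(case_list) do
    local input, expected = case[1], case[2]
    local actual = translate(input)
    if actual ~= expected then
      failures[#failures + 1] = string.format(
        'case %d: expected %q, got %q', i, expected, tostring(actual))
    end
  end
  return failures
end

-- Print one line per failing case.
for _, msg in ipairs(run_cases(stand_in, cases)) do
  print(msg)
end
```

Passing the translator in as an argument keeps the checker independent of any particular filter, so the same case table could be reused as the filter grows.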

@mtmorgan
Author

@jwokaty perhaps it makes sense to create a lua branch and add an inst/lua directory with the progress so far? I've iterated a bit on @LiNk-NY's work, and things look pretty promising. Definitely @BerylKanali could help with the test Rnw file!
