Markov Chains and Context Free Grammars

Markov Chains

Markov Project References

CFG Resources

CFG Project References

CFG Visual Art

Reading

These readings are background material to help you think about text data in the upcoming weeks, when I plan to move toward using ML models. They are from several years ago, but I believe the themes and questions they raise are even more relevant today in the landscape of large language models.

Assignment

Try using Markov chains or context-free grammars. Feel free to pick just one or try both!

(It is not required to write any new code for this assignment. You are welcome to run one or more of the provided examples with your own data. You can document the results in a blog post (or link to a web page where the text is generated). I'll include some other ideas below in case you are feeling ambitious.)

Markov Chains

Use one of the existing examples to generate text with your own input data. Experiment with the "order" and "maximum" length variables. Try mixing multiple texts. Copy and paste your favorite outputs from the browser and document them in a blog post.
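To make the "order" and "maximum" length variables concrete, here is a minimal character-level sketch. It is written in Python rather than taken from the course examples, and the names `build_model`, `generate`, `order`, and `max_length` are illustrative assumptions, not the variables used in any provided example. Passing more than one text to `build_model` is one simple way to mix sources.

```python
import random
from collections import defaultdict


def build_model(texts, order):
    """Map each n-gram of length `order` to the characters that follow it."""
    model = defaultdict(list)
    starts = []
    for text in texts:
        if len(text) <= order:
            continue  # too short to contribute any n-gram
        starts.append(text[:order])
        for i in range(len(text) - order):
            gram = text[i:i + order]
            model[gram].append(text[i + order])
    return dict(model), starts


def generate(model, starts, order, max_length, rng=random):
    """Begin with the opening n-gram of a source text, then add one character at a time."""
    result = rng.choice(starts)
    while len(result) < max_length:
        followers = model.get(result[-order:])
        if not followers:
            break  # dead end: this n-gram was only seen at the very end of a text
        result += rng.choice(followers)
    return result


# Two tiny made-up source texts, mixed into a single model.
corpus = [
    "the quick brown fox jumps over the lazy dog",
    "the lazy cat naps while the quick dog barks",
]
order = 3
model, starts = build_model(corpus, order)
print(generate(model, starts, order, max_length=60))
```

A lower order tends to produce more scrambled output, while a higher order tends to reproduce longer stretches of the source texts verbatim.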

Emily Martinez proposes a series of questions to ask when working with a corpus of text data. In your documentation post, reflect on these questions and how they played into your process of working with source texts.

You are not required to write any new code for this assignment; however, I'll include some ideas for further exploration below.

  • Design a webpage that displays the output of a Markov generator a la Allison Parrish's ITP course creator.
  • Create a bot that generates its output based on a Markov chain.
  • Use a Markov chain on something other than text. Record your own sequence of daily habits. Try musical notes. Could colors or shapes be generated with a Markov chain? What else? You can find examples for musical Markov chains from Luisa Pereira's Code of Music materials.
  • Thinking back to the word counting material, visualize n-gram frequencies and/or Markov probabilities.
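As a starting point for the last two ideas, the sketch below counts n-grams and estimates Markov transition probabilities from any sequence, not only text. It is in Python, the function names are hypothetical, and the list of notes is invented purely for illustration. The resulting numbers could then be fed to whatever drawing or charting tool you prefer.

```python
from collections import Counter, defaultdict


def ngram_counts(text, n):
    """Count every run of n consecutive characters in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def transition_probabilities(sequence, order=1):
    """Estimate P(next item | previous `order` items) from observed counts."""
    counts = defaultdict(Counter)
    for i in range(len(sequence) - order):
        state = tuple(sequence[i:i + order])
        counts[state][sequence[i + order]] += 1
    return {
        state: {item: c / sum(followers.values()) for item, c in followers.items()}
        for state, followers in counts.items()
    }


# Bigram frequencies for a short word.
print(ngram_counts("banana", 2).most_common())

# A made-up melody: the same function works on notes, habits, colors, and so on.
notes = ["C", "E", "G", "C", "E", "C", "G", "C"]
for state, followers in transition_probabilities(notes).items():
    print(state, followers)
```

Each state's probabilities sum to 1, so they can be drawn directly as bar heights or as edge weights in a state diagram.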

Context-Free Grammars

Invent your own grammar and generate text. I suggest using Tracery, but you can base your code on any of my examples, or try RiGrammar from the RiTa library.

Getting results from a context-free grammar can be tricky. Short and sweet, highly structured ideas tend to work well. For example:

  • A coffee drink order generator.
  • An apology generator.
  • An ITP project idea generator.
  • A knock knock joke generator.
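To show how small such a grammar can be, here is a hedged sketch of a coffee order generator. It is plain Python, not Tracery itself: it only borrows Tracery's `#symbol#` placeholder convention and its `origin` start symbol, and the `expand` function and all of the vocabulary are invented for illustration.

```python
import random
import re

# Each symbol maps to a list of possible productions.
# Words wrapped in #...# are non-terminals that get expanded again.
grammar = {
    "origin": [
        "#size# #temperature# #drink# with #extra#",
        "#size# #drink#, #temperature#",
    ],
    "size": ["small", "medium", "large"],
    "temperature": ["hot", "iced"],
    "drink": ["latte", "americano", "#milk# cappuccino"],
    "milk": ["oat milk", "whole milk"],
    "extra": ["an extra shot", "caramel syrup", "no foam"],
}


def expand(symbol, rules, rng=random):
    """Pick one production for `symbol`, then recursively expand each #placeholder#."""
    production = rng.choice(rules[symbol])
    return re.sub(r"#(\w+)#", lambda m: expand(m.group(1), rules, rng), production)


for _ in range(3):
    print(expand("origin", grammar))
```

Because every production is short and the structure is fixed, almost every output reads as a plausible order, which is why tightly constrained ideas like these tend to work well.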

Something you might consider is pulling the "terminal" words for your grammar from an API or other data source. You are also welcome to explore generative visual art with context-free grammars, basing your exercise on the L-system material described above. Or what else can you generate from a context-free grammar? Music?
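For the visual direction, an L-system can be sketched as repeated string rewriting in which every character is replaced in parallel on each generation. The Python below is a minimal illustration with a hypothetical `rewrite` function. It is not the course's L-system example, and reading `F`, `+`, and `-` as "draw forward", "turn left", and "turn right" is a common convention that is assumed here, not something specified in this assignment.

```python
def rewrite(axiom, rules, generations):
    """Apply the rules to every character in parallel, once per generation."""
    current = axiom
    for _ in range(generations):
        # Characters without a rule are copied through unchanged.
        current = "".join(rules.get(ch, ch) for ch in current)
    return current


# Lindenmayer's algae system: A -> AB, B -> A
print(rewrite("A", {"A": "AB", "B": "A"}, 4))  # ABAABABA

# A Koch-style curve: only F has a rule, so + and - pass through as they are.
koch = rewrite("F", {"F": "F+F-F-F+F"}, 2)
print(len(koch), koch.count("F"))  # 49 25
```

The output string is only a list of instructions. Turning it into an image means walking through it character by character with a drawing routine of your choice.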

Add your assignment below via Pull Request

(Please note you are welcome to post under a pseudonym and/or password protect your published assignment. Here is some helpful information on privacy options for an NYU blog. Finally, if you prefer not to post your assignment at all here, you may email the submission.)

Emoji Key for Video Tutorials, Readings, and more

  • 🚨 Watch this video tutorial! (This is technical info needed for the examples.) Of course, if you already know this material, you can skip it.
  • 🔢 This is found in a group, maybe pick just one to check out!
  • 🍿 Additional video if you have a particular interest and want to do a deeper dive.
  • 📕 Required reading! Let's make sure we all have read this.
  • 📚 Optional additional reading for a deeper dive.
  • 💻 Code examples here!
  • 📈 Class presentation slides
  • 🔗 Extra reference material / link

Emily Martinez Questions

  • How can we be more intentional about what we build given the current limitations, problems, and constraints of ML algorithms?
  • How do we prepare datasets and set up guidelines that protect the bodies of knowledge of our communities, that honors lineage, that upholds ethical frameworks rooted in shared, agreed-upon values?
  • How do we work in consensual and respectful ways with texts by marginalized authors that are not as well-represented, and by virtue of that fact alone, much more likely to be misrepresented, misappropriated, or misunderstood if we are not careful?
  • How well can we ensure that the essence of these texts doesn’t dissolve into a word-soup that gets misconstrued?
  • Given that so many of the existing “big data” language models are trained with Western texts and proprietary datasets, what does it even mean to try to decolonize AI?
  • Who do we entrust to do this work?
  • How do we deal with credit and attribution of our new creations?
  • How do we really do ethics with machine learning?
  • How do we get through this whole list of concerns and still build AI that is fun, respectful, tender, pleasurable, kind?