Import csv #63
base: main
Edited code and created new text files for TED Talks
With this code, all rows of the csv are now successfully converted into txt files, regardless of the characters in the title column. In addition, the code in auto_instances now uses the Document manager to easily and efficiently create new instances of the Document model.
Wrote new code that adds the Document instances for each TED Talk file into a Corpus object (an instance of the Corpus model).
The program no longer needs to turn each row in the csv into its own text file. Each row of data in the csv is now converted directly into a Document, and the Documents are all combined into a Corpus.
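As a rough illustration of the row-to-Document flow described above, here is a minimal sketch. The `Document` and `Corpus` dataclasses are hypothetical stand-ins for the project's models, and the column names `title` and `transcript`, the `corpus_title` parameter, and the sample rows are assumptions made for the example, not taken from the actual code.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical stand-in for the project's Document model.
    title: str
    text: str


@dataclass
class Corpus:
    # Hypothetical stand-in for the project's Corpus model.
    title: str
    documents: list = field(default_factory=list)


def parse_csv(csv_file, corpus_title, title_col="title", text_col="transcript"):
    """Build one Document per csv row and collect them all in a Corpus."""
    corpus = Corpus(title=corpus_title)
    # DictReader handles quoting, so titles may contain commas, slashes, etc.
    for row in csv.DictReader(csv_file):
        corpus.documents.append(Document(title=row[title_col], text=row[text_col]))
    return corpus


# Synthetic sample data, not rows from the real TED Talks file.
sample = io.StringIO(
    "title,transcript\n"
    '"Talk A, part 1","First transcript."\n'
    '"What/why: talk B?","Second transcript."\n'
)
corpus = parse_csv(sample, corpus_title="Sample talks")
print(len(corpus.documents), [d.title for d in corpus.documents])
```

Because no intermediate text files are written, characters that are awkward in file names (such as `/` or `?`) never need special handling in this version.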
Created a POST request that took in a csv file and returned a serialized version of the resulting corpus.
Wrote an API endpoint function that receives, through a POST request, the filename of a file in the backend (e.g. small_talks.csv). It then runs the corresponding file through parse_csv and returns, as a Response, a text representation of the corpus created from that file.
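The endpoint logic described above could be sketched, independent of any web framework, roughly as follows. The function name `corpus_endpoint`, the `data_dir` parameter, the `(status, body)` return shape, and the check that rejects anything other than a bare file name are all assumptions for illustration; the real view returns a framework Response and calls the project's own parse_csv.

```python
import csv
import tempfile
from pathlib import Path


def corpus_endpoint(request_data, data_dir):
    """Handle a POST body such as {"filename": "small_talks.csv"}.

    Returns (status_code, body) in place of a framework Response object.
    """
    filename = request_data.get("filename", "")
    # Accept only a bare file name so a request cannot point outside data_dir.
    if not filename or Path(filename).name != filename:
        return 400, {"error": "invalid filename"}
    path = Path(data_dir) / filename
    if not path.is_file():
        return 404, {"error": "file not found"}
    with path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    # Stand-in for parse_csv plus serialization: list the document titles.
    return 200, {"corpus": filename, "documents": [r.get("title", "") for r in rows]}


# Usage with a temporary directory and synthetic data.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "small_talks.csv").write_text(
        "title,transcript\nTalk A,Hello\n", encoding="utf-8"
    )
    ok = corpus_endpoint({"filename": "small_talks.csv"}, tmp)
    bad = corpus_endpoint({"filename": "../secrets.csv"}, tmp)
    missing = corpus_endpoint({"filename": "nope.csv"}, tmp)
print(ok, bad, missing)
```

Since the client supplies only a name and the file itself lives on the server, validating that name before opening anything seems a sensible precaution in a design like this.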
Awesome work! You've tackled numerous parts of the stack and have greatly expanded the input capabilities of the project. I've left a number of comments above, a few of which will need to be addressed before we can merge this. If possible, I'd also love to see a test around parse_csv. Great work!
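A test around parse_csv, as requested in the review, might look something like the sketch below. The `parse_csv` defined here is a hypothetical stand-in (returning plain tuples) so that the example runs on its own; an actual test would import the project's function and assert on its Document and Corpus objects. The column names and sample rows are likewise assumptions.

```python
import csv
import io
import unittest


def parse_csv(csv_file):
    # Hypothetical stand-in so the sketch is self-contained; a real test
    # would import the project's parse_csv instead.
    return [(row["title"], row["transcript"]) for row in csv.DictReader(csv_file)]


class ParseCsvTest(unittest.TestCase):
    def test_one_document_per_row(self):
        data = io.StringIO("title,transcript\nTalk A,Hello\nTalk B,World\n")
        self.assertEqual(len(parse_csv(data)), 2)

    def test_special_characters_in_title(self):
        # Quoted field containing a slash, a comma, and escaped double quotes.
        data = io.StringIO('title,transcript\n"A/B: ""quoted"", talk?",Hello\n')
        self.assertEqual(parse_csv(data)[0][0], 'A/B: "quoted", talk?')


# Run the suite programmatically so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseCsvTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The second case targets the situation the earlier commits mention: titles containing characters that previously caused trouble.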
Cleaned up code, removed debugging statements, and deleted unnecessary files in preparation for merging into the main branch.
This PR adds a POST request that receives a csv file, parses it with code written for csv files with certain attributes, and then adds its contents to a corpus.