Importing from Wikipedia
Importing Wikipedia pages is currently done in several steps. The import/getwiki.js script starts at a given category and descends a fixed number of subcategory levels, collecting every article title in those categories. It creates two text files, article.txt and category.txt, which list the imported articles and categories, respectively. The category and depth are currently hard-coded at the end of the script, but it could easily be modified to accept command-line parameters.
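The descent described above can be sketched as a depth-limited, breadth-first walk. This is only an illustration, not the code in import/getwiki.js: the function name collectTitles and the getMembers callback are hypothetical, and the callback stands in for whatever Wikipedia lookup the real script performs (here it is synchronous and backed by an in-memory object for brevity).

```javascript
// Depth-limited walk over a category tree, collecting article titles.
// `getMembers` is a hypothetical callback standing in for the real Wikipedia
// lookup; it returns { articles: [...], subcategories: [...] } for a category.
function collectTitles(rootCategory, maxDepth, getMembers) {
  const articles = new Set();
  const categories = new Set([rootCategory]);
  let frontier = [rootCategory];

  // Depth 0 is the root category itself; each pass descends one level.
  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const category of frontier) {
      const members = getMembers(category);
      members.articles.forEach((title) => articles.add(title));
      if (depth === maxDepth) continue; // deepest level: do not descend further
      for (const sub of members.subcategories) {
        // A category can be reachable by several paths (or via a cycle),
        // so each one is visited only once.
        if (!categories.has(sub)) {
          categories.add(sub);
          next.push(sub);
        }
      }
    }
    frontier = next;
  }
  return { articles: [...articles], categories: [...categories] };
}

// Made-up sample data, including a duplicate title and a cycle back to the root.
const tree = {
  Physics: { articles: ["Force", "Energy"], subcategories: ["Mechanics"] },
  Mechanics: { articles: ["Torque", "Force"], subcategories: ["Statics", "Physics"] },
  Statics: { articles: ["Truss"], subcategories: [] },
};
const result = collectTitles("Physics", 1, (name) => tree[name]);
// result.articles   -> ["Force", "Energy", "Torque"]
// result.categories -> ["Physics", "Mechanics"]
```

Tracking visited categories in a set matters because Wikipedia categories are not a strict tree, so the same subcategory can appear under several parents.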
Next, the makeNodes.js script takes a list of article names (such as those in article.txt) and asks KnowNodes to create a new node for each one. It accepts the file to read on the command line, as well as a line number to start on (in case the script was interrupted previously) and a maximum number of lines to read.
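The resume-from-a-line behaviour can be illustrated with a small helper. This is a sketch under assumptions, not the actual makeNodes.js code: the name selectTitles is hypothetical, the start line is assumed to be 1-based, and the input is assumed to hold one title per line.

```javascript
// Pick the slice of a title list to process.
// `startLine` is assumed to be 1-based; `maxLines` caps how many lines are read.
function selectTitles(text, startLine = 1, maxLines = Infinity) {
  const lines = text.split(/\r?\n/);
  const begin = Math.max(startLine, 1) - 1;
  // Slice by file line number first, then drop blank lines (such as the
  // empty string left by a trailing newline).
  return lines
    .slice(begin, begin + maxLines)
    .filter((line) => line.trim() !== "");
}

const sample = "Force\nEnergy\nTorque\nTruss\n";
const resumed = selectTitles(sample, 2, 2);
// resumed -> ["Energy", "Torque"]
```

In a real script the text would come from the file named on the command line, and the last processed line number would be logged so that an interrupted run can be restarted from there.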
Finally, the actual import of the Wikipedia article is done within the KnowNodes API, in controllers/knownodes/index.coffee. It uses the "nodemw" module to request the article text as well as its links to other Wikipedia articles. The API endpoint is a POST request to /knownodes/wikinode, with a form argument of title: <article title>.
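A request of that shape might be assembled as follows. This is a minimal sketch: the helper name buildWikinodeRequest and the host http://localhost:3000 are placeholders, and it assumes the endpoint accepts a standard URL-encoded form body. Only the path and the title field come from the description above.

```javascript
// Build the pieces of a form-encoded POST to /knownodes/wikinode.
// `baseUrl` is a placeholder; the real host and port depend on the deployment.
function buildWikinodeRequest(baseUrl, title) {
  // URLSearchParams handles escaping of spaces, ampersands, commas, etc.
  const body = new URLSearchParams({ title }).toString();
  return {
    url: new URL("/knownodes/wikinode", baseUrl).toString(),
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body,
  };
}

const request = buildWikinodeRequest("http://localhost:3000", "Albert Einstein");
// request.url  -> "http://localhost:3000/knownodes/wikinode"
// request.body -> "title=Albert+Einstein"
```

On a Node.js version that provides a global fetch, the result could be sent with fetch(request.url, request). Any authentication the API may require is not covered here.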
- #20
- article.txt and category.txt are CSV files, but Wikipedia names can have commas in them! They should be changed to be tab-separated.
- There is no way to update a Wikipedia page or handle its removal.
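To illustrate the comma problem and the proposed tab-separated fix, here is a small sketch. The helper names toTsvLine and fromTsvLine are hypothetical, and the two-column layout (title, category) is made up for the example, since the actual columns of article.txt are not described here.

```javascript
// Write and read one tab-separated record.
// As a precaution, any stray tab or newline inside a field becomes a space,
// so a field can never be mistaken for a separator.
function toTsvLine(fields) {
  return fields.map((f) => String(f).replace(/[\t\r\n]/g, " ")).join("\t");
}

function fromTsvLine(line) {
  return line.split("\t");
}

const record = ["Washington, D.C.", "Capitals"]; // hypothetical columns

// Naive comma-separated output splits the title in two.
const csvFields = record.join(",").split(",");
// csvFields -> ["Washington", " D.C.", "Capitals"]

// Tab-separated output round-trips cleanly.
const tsvFields = fromTsvLine(toTsvLine(record));
// tsvFields -> ["Washington, D.C.", "Capitals"]
```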