<br/> is written in plain text in some places #1093
Comments
Had a look in our code, and it seems this isn't a typo on our side but part of the data that the handbook returns. Tag removal kicks in during preprocessing, but the data here is taken after the formatting step. The function for removing tags is:
import re

def delete_HTML(processed: str) -> str:
    """Remove HTML tags"""
    # Will replace with a space because they sometimes appear in the middle of the text
    # so "and<br/>12 UOC" would turn into and12 UOC
    return re.sub("<[a-z]*/>", " ", processed, flags=re.IGNORECASE)
Another TODO: if the tag already has a space to its left or right, do not add extraneous spacing but replace with FIX:
Other reference for the source data: see in
"MARK3088": {
"title": "Product Analytics",
"code": "MARK3088",
"UOC": "6",
"gen_ed": "true",
"level": "3",
"description": "<p>Today\u2019s data-rich environment and advances in data mining techniques have enabled product idea generation from the crowd. Many innovative data-based products or services development and effective marketing of new product ideas are being born in crowdfunding platforms. Today, "data\u201d itself may form part of the \u201ccore material\u201d of new products or services. This course integrates the principles of product development with machine learning techniques by covering text and sentiment analysis to analyse social media posts, product reviews or start-ups campaign on crowdfunding platforms, and data product or service development such as recommendation algorithms. Students will exercise hands-on data analytics to develop and test the machine learning models and conduct exploratory product data analysis and visualisation.</p>",
"study_level": "Undergraduate",
"school": "School of Marketing",
"faculty": "UNSW Business School",
"campus": "Sydney",
"terms": "Term 1, Term 2",
"calendar": "3+",
"field_of_education": "080505 Marketing",
"attributes": [
{
"type": "general_education",
"description": "This course is available as <a href=\"https://www.student.unsw.edu.au/general-education\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">general education</a> and normally taken outside the study area in which the student\u2019s program is based. Availability of general education courses outside of the owning Faculty may be restricted by the Program Authority, usually because they are closely related to the study area of the student\u2019s program."
}
],
"equivalents": {},
"exclusions": {},
"enrolment_rules": "Pre-requisite: ECON1203 or COMM1190 or INFS1609 or MATH1041 or MATH1231 or MATH1241 or MATH1251 or MARK2052 or COMM2050/COMM3050 or COMM2501 or INFS2605 or INFS2609.<br/>Students with equivalent Statistics knowledge can seek pre-requisite waiver via webforms<br/><br/>"
},
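One way the TODO above might be approached is sketched below. This is only an illustration, not the project's actual fix: the name `delete_html_collapsing`, the widened pattern, and the final `strip()` are all assumptions. It keeps the original `<[a-z]*/>` tag pattern but also consumes any whitespace around a run of tags, so each run becomes exactly one space.

```python
import re

# Assumption: same tag pattern as delete_HTML, extended to swallow the
# whitespace around a run of one or more adjacent self-closing tags.
_TAG_RUN = re.compile(r"\s*(?:<[a-z]*/>\s*)+", flags=re.IGNORECASE)


def delete_html_collapsing(processed: str) -> str:
    """Hypothetical variant of delete_HTML that avoids extraneous spacing."""
    # Each run of tags (plus surrounding whitespace) becomes a single space,
    # so "and<br/>12" and "and <br/> 12" both end up as "and 12".
    without_tags = _TAG_RUN.sub(" ", processed)
    # Assumption: leading and trailing whitespace left by tags is unwanted.
    return without_tags.strip()


if __name__ == "__main__":
    sample = "or INFS2609.<br/>Students can seek a waiver via webforms<br/><br/>"
    print(delete_html_collapsing(sample))
```

Like the original function, this only handles self-closing tags such as `<br/>`; the `<p>` and `<a href=...>` markup in the record above would need separate handling.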
Technically my tenure is over, but I will try to get a fix up :)
Describe the bug
In some courses, the word <br/> is in plain text.
Screenshots
[image]
Thanks team!