<br/> is written in plain text in some places #1093

Open
amazhangwinz opened this issue Nov 28, 2023 · 2 comments

Comments

amazhangwinz commented Nov 28, 2023

Describe the bug
In some courses, the <br/> tag is displayed as plain text.

Screenshots
image

Thanks team!

@imagine-hussain
Contributor

Had a look at our code, and it seems this isn't a typo on our side but part of the data that the handbook returns.

Tag removal runs during conditions preprocessing, but the data shown here comes from after the formatting step.
So the tags are stripped when these course conditions are shown from the condition side, but not from the course side.

Function for removing tags in backend/data/processors/conditions_preprocessing.py:191

import re


def delete_HTML(processed: str) -> str:
    """Remove HTML tags"""
    # Replace with a space because tags sometimes appear in the middle of the text;
    # otherwise "and<br/>12 UOC" would turn into "and12 UOC".
    # Note: the pattern only matches self-closing tags such as <br/>.
    return re.sub("<[a-z]*/>", " ", processed, flags=re.IGNORECASE)

Another TODO: if there is already a space to the left or right of the tag, do not add extra spacing; replace the tag with "" instead.
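
A rough sketch of how that TODO could look, in case it helps. The name `delete_html_tight` is hypothetical and not in the repo. It assumes it is acceptable to collapse a run of tags plus any surrounding whitespace into a single space, and to trim leading and trailing whitespace from the result:

```python
import re

# Hypothetical variant of delete_HTML; not the function currently in the repo.
# Matches one or more self-closing tags together with any whitespace around them.
_TAG_RUN = re.compile(r"(?:\s*<[a-z]*/>)+\s*", flags=re.IGNORECASE)


def delete_html_tight(processed: str) -> str:
    """Replace each run of self-closing tags (and adjacent whitespace) with one space."""
    # Assumption: trimming the ends is fine, so a trailing "<br/><br/>" leaves nothing behind.
    return _TAG_RUN.sub(" ", processed).strip()
```

With this, "and<br/>12 UOC" and "and <br/> 12 UOC" should both come out as "and 12 UOC", so no double spaces are introduced.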


FIX:

  1. Add this to courses_formatting and programs_formatting, but replace the tags with \n characters instead of spaces so readability is not broken :)
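
A minimal sketch of what the newline version might look like. The helper name `br_to_newline` is hypothetical, and how it would be wired into courses_formatting and programs_formatting is not shown because that depends on the existing code. It assumes consecutive <br/> tags should collapse into a single newline and that trailing breaks can be dropped:

```python
import re

# Hypothetical helper; the name and the collapsing behaviour are assumptions.
# Matches a run of <br>, <br/> or <br /> tags plus the whitespace around them.
_BR_RUN = re.compile(r"\s*(?:<br\s*/?>\s*)+", flags=re.IGNORECASE)


def br_to_newline(text: str) -> str:
    """Turn each run of <br/> tags into one newline and trim the ends."""
    return _BR_RUN.sub("\n", text).strip()
```

Whether the frontend renders "\n" as a line break depends on how the text is displayed, so that would need checking separately.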

Source data for reference:

See in backend/scrapers/coursesFormattedRaw.json:

    "MARK3088": {
        "title": "Product Analytics",
        "code": "MARK3088",
        "UOC": "6",
        "gen_ed": "true",
        "level": "3",
        "description": "<p>Today\u2019s data-rich environment and advances in data mining techniques have enabled product idea generation from the crowd. Many innovative data-based products or services development and effective marketing of new product ideas are being born in crowdfunding platforms. Today, &#34;data\u201d itself may form part of the \u201ccore material\u201d of new products or services. This course integrates the principles of product development with machine learning techniques by covering text and sentiment analysis to analyse social media posts, product reviews or start-ups campaign on crowdfunding platforms, and data product or service development such as recommendation algorithms. Students will exercise hands-on data analytics to develop and test the machine learning models and conduct exploratory product data analysis and visualisation.</p>",
        "study_level": "Undergraduate",
        "school": "School of Marketing",
        "faculty": "UNSW Business School",
        "campus": "Sydney",
        "terms": "Term 1, Term 2",
        "calendar": "3+",
        "field_of_education": "080505 Marketing",
        "attributes": [
            {
                "type": "general_education",
                "description": "This course is available as <a href=\"https://www.student.unsw.edu.au/general-education\" target=\"_blank\" rel=\"noopener noreferrer nofollow\">general education</a> and normally taken outside the study area in which the student\u2019s program is based. Availability of general education courses outside of the owning Faculty may be restricted by the Program Authority, usually because they are closely related to the study area of the student\u2019s program."
            }
        ],
        "equivalents": {},
        "exclusions": {},
        "enrolment_rules": "Pre-requisite: ECON1203 or COMM1190 or INFS1609 or MATH1041 or MATH1231 or MATH1241 or MATH1251 or MARK2052 or COMM2050/COMM3050 or COMM2501 or INFS2605 or INFS2609.<br/>Students with equivalent Statistics knowledge can seek pre-requisite waiver via webforms<br/><br/>"
    },
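
To get a sense of how many courses are affected, something like the sketch below could be run over the parsed JSON. The function name `courses_with_tags` is hypothetical. It assumes the file is a dict keyed by course code, as in the excerpt above, and reuses the same pattern as delete_HTML:

```python
import re
from typing import Dict, List

# Same pattern as delete_HTML: self-closing tags such as <br/>.
_TAG = re.compile(r"<[a-z]*/>", flags=re.IGNORECASE)


def courses_with_tags(courses: Dict[str, dict], field: str = "enrolment_rules") -> List[str]:
    """Return the sorted course codes whose `field` still contains a self-closing tag."""
    return sorted(
        code
        for code, course in courses.items()
        if _TAG.search(str(course.get(field, "")))
    )


# Tiny made-up sample in the same shape as coursesFormattedRaw.json.
sample = {
    "MARK3088": {"enrolment_rules": "Pre-requisite: ECON1203 or COMM1190.<br/>Waiver via webforms<br/><br/>"},
    "TEST0000": {"enrolment_rules": ""},  # placeholder entry, not a real course
}
affected = courses_with_tags(sample)
```

In practice the dict would come from `json.load` on the scraped file rather than the inline sample.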

@imagine-hussain
Contributor

Technically my tenure is over, but I will try to get a fix up :)
