This project scrapes job postings from a Telegram channel and presents them through a frontend built with Next.js. The jobs are categorized "manually" (with BeautifulSoup) and with the assistance of large language models (LLMs).
- Automated Scraping: Scrapes and parses job posts from a dynamic Telegram group chat.
- Categorization: Job postings are categorized both manually and using LLM assistance.
- Frontend with Next.js: Displays categorized job posts with a clean UI.
- Hosted on Vercel: Free and easy-to-use deployment.
The scraping process targets the DCCEmpleo Telegram Channel, which regularly posts job listings for the Chilean tech scene. I scraped over 1,400 job posts, categorizing them with a combination of manual tagging and language model assistance (GPT-4 mini).
- Scraper Code: All the scraping logic is located in the /processing/ directory.
- Output Data: The final job post data is stored as JSON and can be found in /frontend/src/job-posts.json.
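As a rough illustration of how the output file could be inspected, the sketch below loads a job-posts JSON file and tallies posts by one field. It assumes the file holds a list of objects shaped like the submission template in this README; the `load_job_posts` and `tally_by_field` helpers are hypothetical and not part of the repository.

```python
import json
from collections import Counter
from pathlib import Path


def load_job_posts(path):
    """Read the JSON file and return its contents (assumed: a list of dicts)."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def tally_by_field(posts, field):
    """Count posts per value of `field`; missing or null values count as 'unknown'."""
    return Counter((post.get(field) or "unknown") for post in posts)


if __name__ == "__main__":
    # Path taken from this README, relative to the repository root.
    posts = load_job_posts("frontend/src/job-posts.json")
    for value, count in tally_by_field(posts, "employment_type").most_common():
        print(f"{value}: {count}")
```

Run from the repository root, this prints one line per employment type with its count, which is a quick way to spot posts that ended up without a category.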
I used GPT-4 mini to assist in processing and categorizing job posts where manual effort was too time-consuming. You can view the LLM prompt I used here.
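The split between rule-based tagging and LLM assistance could look roughly like the sketch below: simple keyword rules label the posts they can, and everything else is set aside for the LLM. This is only an illustration; the keyword patterns and the `classify_remote_policy` and `split_for_llm` helpers are assumptions, not the rules used in /processing/.

```python
import re

# Hypothetical keyword rules (Spanish and English); the project's actual rules may differ.
REMOTE_POLICY_RULES = [
    ("hybrid", re.compile(r"\b(h[ií]brido|hybrid)\b", re.IGNORECASE)),
    ("remote", re.compile(r"\b(remoto|remote)\b", re.IGNORECASE)),
    ("in person", re.compile(r"\b(presencial|in[- ]person|on[- ]site)\b", re.IGNORECASE)),
]


def classify_remote_policy(text):
    """Return the label of the first rule that matches, or None if no rule applies."""
    for label, pattern in REMOTE_POLICY_RULES:
        if pattern.search(text):
            return label
    return None


def split_for_llm(posts):
    """Label what the rules can handle; collect the rest for LLM-assisted review."""
    labeled, needs_llm = [], []
    for post in posts:
        label = classify_remote_policy(post.get("text") or "")
        if label is None:
            needs_llm.append(post)
        else:
            labeled.append({**post, "remote_work_policy": label})
    return labeled, needs_llm
```

Keeping the cheap rules first means only the ambiguous posts need an LLM call, which matches the stated goal of using the model where manual effort was too time-consuming.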
The frontend is built with Next.js, which makes it easy to generate a static data API that serves the job data scraped during the first phase. The frontend consumes the JSON data and displays it through a clean, responsive UI.
- Frontend Code: Check the frontend implementation in the /frontend/ directory.
- Hosting: Deployed on Vercel, which provides free hosting and seamless integration with Next.js.
The API is built with the Next.js static API. You can find it under /frontend/src/app/api.
If you want to submit a job post, create an issue that includes the following data:
{
  "sender": "<your-telegram-username-starting-with-an-@>",
  "contact_email": ["<your-email-(optional)>"],
  "contact_phone": ["<your-phone-(optional)>"],
  "links": ["<any-link-(optional)>"],
  "text": "<your-job-offer-text>",
  "date": {
    "day": "<current-day-number>",
    "month": "<current-month-number>",
    "year": "<current-year-number>",
    "time": "<current-time>"
  },
  "company_name": "<your-company-name>",
  "remote_work_policy": "<remote, hybrid, in person>",
  "employment_type": "<practica, fulltime, trabajo de título, part time>",
  "salary_range": {
    "currency": "<CLP, USD>",
    "min_bound": "<min-bound-for-position>",
    "max_bound": "<max-bound-for-position>"
  },
  "technologies": ["<any-technologies-the-applicant-should-know>"],
  "id": null
}
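Before opening an issue, a submission can be sanity-checked locally. The sketch below is a minimal validator based only on the template above; the `validate_submission` function and its specific rules (for example, treating `sender` and `text` as required) are assumptions, not the repository's actual checks.

```python
# Allowed values copied from the placeholders in the submission template.
ALLOWED_VALUES = {
    "remote_work_policy": {"remote", "hybrid", "in person"},
    "employment_type": {"practica", "fulltime", "trabajo de título", "part time"},
}
ALLOWED_CURRENCIES = {"CLP", "USD"}
LIST_FIELDS = ("contact_email", "contact_phone", "links", "technologies")


def validate_submission(post):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []

    sender = post.get("sender")
    if not isinstance(sender, str) or not sender.startswith("@"):
        problems.append("sender must be a Telegram username starting with '@'")

    text = post.get("text")
    if not isinstance(text, str) or not text.strip():
        problems.append("text must be a non-empty string")

    # Optional enumerated fields: only checked when present.
    for field, allowed in ALLOWED_VALUES.items():
        value = post.get(field)
        if value is not None and value not in allowed:
            problems.append(f"{field} must be one of {sorted(allowed)}")

    # Optional list fields: only checked when present.
    for field in LIST_FIELDS:
        value = post.get(field)
        if value is not None and not isinstance(value, list):
            problems.append(f"{field} must be a list")

    salary = post.get("salary_range")
    if salary is not None:
        if not isinstance(salary, dict):
            problems.append("salary_range must be an object")
        elif salary.get("currency") not in ALLOWED_CURRENCIES:
            problems.append(f"salary_range.currency must be one of {sorted(ALLOWED_CURRENCIES)}")

    return problems
```

Calling `validate_submission(json.loads(issue_body))` on the JSON you plan to paste returns an empty list when nothing looks wrong, or a list of messages describing what to fix.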