2 - OSINT.md

OSINT Web Resources

https://osintframework.com/

Scraping usernames from LinkedIn

Manual scraping:

Install the Chrome 'XPath Helper' extension.

Log in to LinkedIn. Search for the company name. Click on "See all 'n' employees on LinkedIn".

Click on the XPath Helper icon in the Chrome toolbar. In the "QUERY" pane, enter the following expression: //*[contains(@class, 'name actor-name')]

Copy and paste the names from the 'RESULTS' pane.
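If the results page is saved to disk instead of copied by hand, the same class-based match can be approximated offline. The sketch below uses only the Python standard library and assumes the markup still carries the 'name actor-name' class from the XPath query above; LinkedIn's markup changes often, so treat the marker as an assumption. `NameExtractor` and `extract_names` are hypothetical helper names.

```python
from html.parser import HTMLParser


class NameExtractor(HTMLParser):
    """Collect the text of elements whose class attribute contains a marker."""

    # Void elements never get a closing tag, so they must not affect depth.
    VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}

    def __init__(self, marker="name actor-name"):
        super().__init__()
        self.marker = marker  # assumed class substring, taken from the XPath query
        self.depth = 0        # > 0 while inside a matching element
        self.parts = []       # text fragments of the current match
        self.names = []       # finished names, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.marker in (dict(attrs).get("class") or ""):
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in self.VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse internal whitespace so each name is a single clean line.
            name = " ".join("".join(self.parts).split())
            if name:
                self.names.append(name)
            self.parts = []

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_names(html):
    """Return the names found in an HTML string."""
    parser = NameExtractor()
    parser.feed(html)
    parser.close()
    return parser.names
```

For example, with a hypothetical saved page, `extract_names(open("employees.html").read())` returns a list of names that can be written to employees.txt, one per line.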

Format usernames from employee names:

Convert First Last to first initial last name: awk '{print tolower(substr($1,1,1) $2)}' employees.txt

Convert First Last to first.last: awk '{print tolower($1 "." $2)}' employees.txt

Convert First Last to firstlast: awk '{print tolower($1 $2)}' employees.txt

Simple Bash script:

#!/bin/bash

awk '{print tolower(substr($1,1,1) $2)}' employees.txt > usernames.txt
awk '{print tolower($1 "." $2)}' employees.txt >> usernames.txt
awk '{print tolower($1 $2)}' employees.txt >> usernames.txt
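The three awk conversions above can also be sketched in Python, which makes it easier to skip lines that lack a second field. This is a minimal sketch, not a replacement for the script: `format_username` and `build_usernames` are hypothetical names. As in the awk version, only the first two whitespace-separated fields are used.

```python
def format_username(full_name, fmt):
    """Return a lowercase username for 'First Last', or None if it can't be built."""
    parts = full_name.split()
    if len(parts) < 2:
        return None  # single-word or empty line: nothing sensible to build
    first, last = parts[0], parts[1]
    if fmt == "flast":
        username = first[0] + last
    elif fmt == "first.last":
        username = first + "." + last
    elif fmt == "firstlast":
        username = first + last
    else:
        raise ValueError(f"unknown format: {fmt}")
    return username.lower()


def build_usernames(names, formats=("flast", "first.last", "firstlast")):
    """Apply every format to every name, grouped by format like the Bash script."""
    result = []
    for fmt in formats:
        for name in names:
            username = format_username(name, fmt)
            if username:
                result.append(username)
    return result
```

Calling `build_usernames(open("employees.txt").read().splitlines())` produces the same ordering as the Bash script: all flast entries first, then first.last, then firstlast.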

ToDo: Add LinkedIn and Zoom scrapers

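This script logs in to LinkedIn, pages through a company's employee search results, and prints the employee names. If a username format is given with -uf, it prints formatted usernames instead.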

#!/usr/bin/env python3

from splinter import Browser
import argparse
import re

parser = argparse.ArgumentParser(description="LinkedIn Scraper. Author: Steve Campbell, @lpha3ch0")
parser.add_argument("-u", required=True, help="LinkedIn username")
parser.add_argument("-p", required=True, help="LinkedIn password")
parser.add_argument("-id", required=True, help="Company ID")
parser.add_argument("-n", required=True, help="Number of pages to scrape")
parser.add_argument("-uf", help="Username format: flast, first.last, firstlast. If omitted this script will print out employee names instead of formatted usernames.")
args = parser.parse_args()
username = args.u
password = args.p
company_id = args.id
pages = int(args.n)
uFormat = args.uf

# browser = Browser('firefox', user_agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/76.0.3809.87 Chrome/76.0.3809.87 Safari/537.36")
url = 'https://www.linkedin.com/search/results/people/?facetCurrentCompany=%5B%22{}%22%5D&page={}'
xpath = "//*[contains(@class, 'name actor-name')]"
employees = []
counter = 1
browser = Browser('firefox')
browser.visit('https://www.linkedin.com')
browser.find_by_text('Sign in').click()
browser.fill('session_key', username)
browser.fill('session_password', password)
browser.find_by_text('Sign in').click()
while counter <= pages:
  browser.visit(url.format(company_id, str(counter)))
  names = browser.find_by_xpath(xpath)
  for name in names:
    employees.append(name.text)
  counter += 1

browser.quit()

usernames = []

for employee in employees:
  # Strip commas, periods, and quotes that would pollute the username.
  employee = re.sub('[,."\']', '', employee)
  names = employee.split()
  # A username format needs both a first and a last name; skip anything shorter.
  if uFormat and len(names) < 2:
    continue
  if uFormat == 'flast':
    usernames.append(names[0][0] + names[1])
  elif uFormat == 'first.last':
    usernames.append(names[0] + '.' + names[1])
  elif uFormat == 'firstlast':
    usernames.append(names[0] + names[1])
  else:
    print(employee)
    

if uFormat:
  for username in usernames:
    print(username.lower())
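The search URL template in the script hard-codes `%5B%22{}%22%5D`, which is the percent-encoded form of `["<company id>"]`. The sketch below builds the same URLs with `urllib.parse.quote` so the encoding is explicit. `build_search_urls` is a hypothetical helper, and whether LinkedIn still accepts the `facetCurrentCompany` parameter is an assumption carried over from the script.

```python
from urllib.parse import quote

# Same endpoint and query parameters as the script's url template.
SEARCH_URL = ("https://www.linkedin.com/search/results/people/"
              "?facetCurrentCompany={}&page={}")


def build_search_urls(company_id, pages):
    """Return one search URL per results page, starting at page 1."""
    # The facet value is a one-element list literal, e.g. ["1234"].
    facet = quote('["{}"]'.format(company_id))
    return [SEARCH_URL.format(facet, page) for page in range(1, pages + 1)]
```

Generating the list up front also makes it easy to check the number of pages before the browser session starts.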