Mikhail Krechetov Personal page

Parsing LinkedIn.

When looking for a new job, having a referral at a promising company can be crucial, since it may be difficult to pass HR screening these days. Ideally, a referral is somebody you know; however, a person from the same university or town is also fine. At first, I tried to search for potential referrals on LinkedIn manually and got frustrated quite fast: a list or a CSV file would be a lot more convenient.

Recently, I found a very convenient tool: linkedin_api, a Python-based API for LinkedIn. In this short post, I will show some use cases.

Preparations

import argparse
import pandas as pd
from linkedin_api import Linkedin

First, log in with your LinkedIn credentials (any regular profile will work):

api = Linkedin('login@gmail.com', 'password')
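Hardcoding a password in a script is risky if you ever share or commit it. A minimal sketch of reading the credentials from environment variables instead; the variable names LINKEDIN_LOGIN and LINKEDIN_PASSWORD and the helper load_credentials are hypothetical, chosen only for this example:

```python
import os


def load_credentials(login_var="LINKEDIN_LOGIN", password_var="LINKEDIN_PASSWORD"):
    """Read LinkedIn credentials from environment variables (hypothetical names)."""
    login = os.environ.get(login_var)
    password = os.environ.get(password_var)
    if not login or not password:
        # Fail early with a readable message instead of a failed login later on.
        raise RuntimeError(
            "Set the {} and {} environment variables first".format(login_var, password_var)
        )
    return login, password
```

With the variables exported in your shell, the login line becomes api = Linkedin(*load_credentials()).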

Extract the company’s information

company = api.get_company(company_name)

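Here company_name is the company’s public identifier, i.e. the part of its page URL after linkedin.com/company/ (for example, google). The result of this query is a dictionary; for further manipulations, we need to extract the company’s unique Id: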

def get_urn(company):
    fullUrn = company['entityUrn']
    return fullUrn[fullUrn.rfind(':')+1:]

This unique Id, called a URN, is a longer colon-separated string, but we need only the number at the end.
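To see what get_urn does, here it is applied to a hand-made dictionary. The URN value below is made up and only mimics the colon-separated format; get_urn_rsplit is a hypothetical equivalent written with str.rsplit for comparison:

```python
def get_urn(company):
    # Keep only the part after the last colon.
    full_urn = company['entityUrn']
    return full_urn[full_urn.rfind(':') + 1:]


def get_urn_rsplit(company):
    # Same result: split once from the right and take the last piece.
    return company['entityUrn'].rsplit(':', 1)[-1]


# Illustrative input; the number is made up to show the format.
sample_company = {'entityUrn': 'urn:li:fs_normalized_company:123456'}
print(get_urn(sample_company))         # 123456
print(get_urn_rsplit(sample_company))  # 123456
```

Both versions return the whole string unchanged if it contains no colon, so they behave identically on any input.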

Now, having the company’s URN, we can search through its employees:

# kwargs: additional search filters, e.g. profile_languages=['ru'] or keyword_school='MIT'
people_of_company = api.search_people(current_company=[company_urn], **kwargs)

The result of this query is a list of dictionaries (one per person), so we are ready to write a simple script.
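Because each search returns a plain list of dictionaries, the results of several searches can simply be concatenated, but a person who matches two searches then appears twice. A minimal sketch of merging such lists and deduplicating by the urn_id key; the records are made up, and since the exact keys returned depend on the linkedin_api version, treat the fields as an assumption (merge_unique is a hypothetical helper):

```python
import pandas as pd


def merge_unique(*result_lists, key='urn_id'):
    """Concatenate search results, keeping the first record seen for each key."""
    seen = set()
    merged = []
    for results in result_lists:
        for person in results:
            person_id = person.get(key)
            if person_id is None:
                merged.append(person)  # cannot deduplicate without an id
                continue
            if person_id in seen:
                continue  # already collected by an earlier search
            seen.add(person_id)
            merged.append(person)
    return merged


# Made-up records that only mimic the shape of search results.
by_language = [{'urn_id': 'A1', 'name': 'Alice'}, {'urn_id': 'B2', 'name': 'Boris'}]
by_school = [{'urn_id': 'B2', 'name': 'Boris'}, {'urn_id': 'C3', 'name': 'Chen'}]

people = merge_unique(by_language, by_school)
df = pd.DataFrame(people)  # one row per unique person
print(df)
```

Building the DataFrame once from the merged list is also cheaper than growing it row by row.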

Result

Finally, the resulting script may look as follows (be patient: each search call may take about half a minute):

import argparse
import os

import pandas as pd
from linkedin_api import Linkedin

api = Linkedin('login@gmail.com', 'password')


def get_urn(company):
    # entityUrn is a colon-separated string; the numeric Id is the last part
    full_urn = company['entityUrn']
    return full_urn[full_urn.rfind(':') + 1:]


def get_uni_list(profile):
    # Unique school names from the profile's education section
    try:
        schools = profile['education']
        return list({school['school']['schoolName'] for school in schools})
    except (KeyError, TypeError):
        return []


def get_job_list(profile):
    # Unique company names from the profile's experience section
    try:
        jobs = profile['experience']
        return list({job['companyName'] for job in jobs})
    except (KeyError, TypeError):
        return []


def parse(person):
    # Fetch the full profile, first by URN and then by public Id as a fallback
    try:
        profile = api.get_profile(urn_id=person['urn_id'])
    except Exception:
        try:
            profile = api.get_profile(public_id=person['public_id'])
        except Exception:
            return None

    # .get() because the keys returned by search_people differ between library versions
    return {'Company': company_name, 'Employee': person.get('name'),
            'Position': person.get('jobtitle'), 'Location': person.get('location'),
            'Id': person.get('public_id'),
            'Universities': get_uni_list(profile), 'Experience': get_job_list(profile)}


parser = argparse.ArgumentParser(description='LinkedIn Parser. Usage example: python script.py --company google --college "Your College Name"')
parser.add_argument('--company', type=str, default='google', help='Name of a Company')
parser.add_argument('--college', type=str, default='MIT', help='Name of your College')
args = parser.parse_args()

company_name = args.company
company = api.get_company(company_name)
company_urn = get_urn(company)

people_by_rus = api.search_people(current_company=[company_urn], profile_languages=['ru'])
print('Found {} Russian-speaking people at the company {}'.format(len(people_by_rus), company_name))

people_by_uni = api.search_people(current_company=[company_urn], keyword_school=args.college)
print('Found {} people from {} at the company {}'.format(len(people_by_uni), args.college, company_name))

# Merge both searches, skipping people found by both
people, seen = [], set()
for person in people_by_rus + people_by_uni:
    if person.get('urn_id') not in seen:
        seen.add(person.get('urn_id'))
        people.append(person)

rows = []
for counter, person in enumerate(people, start=1):
    print('Parsing person no.{} out of {}.'.format(counter, len(people)))
    result = parse(person)
    if result:
        rows.append(result)

# DataFrame.append was removed in pandas 2.0, so build the frame once from a list of rows
df = pd.DataFrame(rows, columns=['Company', 'Employee', 'Position', 'Location', 'Id', 'Universities', 'Experience'])
# Append to the CSV across runs, writing the header only when the file is new
df.to_csv('scraped.csv', index=False, mode='a', header=not os.path.exists('scraped.csv'))
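Note that to_csv stores the Universities and Experience lists as their string representation, e.g. ['MIT', 'Harvard']. A minimal sketch of reading the file back and keeping only people who list a given school: it parses those strings with ast.literal_eval and uses an in-memory CSV with made-up rows in place of scraped.csv (load_scraped and same_school are hypothetical helpers):

```python
import ast
import io

import pandas as pd

# Made-up rows in the same format the script writes to scraped.csv.
csv_text = """Company,Employee,Position,Location,Id,Universities,Experience
google,Alice,Engineer,Zurich,alice-1,"['MIT', 'ETH Zurich']","['google', 'IBM']"
google,Boris,Manager,London,boris-2,['Oxford'],['google']
"""


def load_scraped(source):
    df = pd.read_csv(source)
    # The list columns come back as strings; turn them into real Python lists.
    for column in ('Universities', 'Experience'):
        df[column] = df[column].apply(ast.literal_eval)
    return df


def same_school(df, college):
    # Case-insensitive substring match; this is loose ('mit' would also match 'Smith College').
    college = college.lower()
    mask = df['Universities'].apply(lambda unis: any(college in u.lower() for u in unis))
    return df[mask]


df = load_scraped(io.StringIO(csv_text))
print(same_school(df, 'mit')['Employee'].tolist())  # ['Alice']
```

For a real run, pass the path 'scraped.csv' to load_scraped instead of the StringIO object.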