From the buzzfeed data, we now have a lot of companies in our dataset, we would like to know more about these companies than just their name.
So we look at OpenCorporates.com which is an online registry (with an API) of over 126 million corporations world-wide with detailed information like addresses, filings, officers etc.
Our first attempts at matching were not so successful because of spelling differences, there is a lot more that you can do, for now we only remove all punctuation, and remove any "CORP|CO|CORPORATION|INCORPORATION|COMPANY" suffixes.
For the queries please have a look at ../01-trumpworld/cleanup.adoc
A new beta feature of OpenCorporates is "Grouping", which creates lists of companies that belong together under some topic, or conglomerate.
Ben Parker thankfully collected many companies (1056) which belong to Donald Trump’s various businesses (and other members of his cabinet and family).
We can download that grouping also via the OpenCorporates API.
The JSON looks like this (we’re only interested in the results.corporate_groupings.memberships.membership.company
information)
{ "api_version": "0.4", "results": { "corporate_grouping": { "name": "Donald Trump", "wikipedia_id": "Donald_Trump", "companies_count": 1052, "created_at": "2016-05-04T19:59:57+00:00", "updated_at": "2017-03-04T14:43:48+00:00", "curators": [], "memberships": [{ "membership": { "source": { "publisher": "Ben Parker", "retrieved_at": "2016-05-04T20:00:00+00:00", "url": "https://opencorporates.com/users/23275", "source_type": "user" }, "company": { "name": "TRUMP SALES & LEASING CHICAGO MEMBER CORP", "jurisdiction_code": "us_de", "company_number": "4744125", "opencorporates_url": "https://opencorporates.com/companies/us_de/4744125", "inactive": null }}}, ...]
So for the names coming back in that JSON file we do the same cleanup, then try to merge them on the "cleaned" attribute with our organizations and set a few attributes, most importantly the jurisdiction and the opencorporates number which we need later.
call apoc.load.json("https://api.opencorporates.com/v0.4/corporate_groupings/Donald+Trump") yield value
UNWIND value.results.corporate_grouping.memberships as entry
WITH entry.membership.company as company
WITH company, apoc.text.regreplace(apoc.text.regreplace(toUpper(company.name),"([^A-Z0-9 ]+) ?"," ")," (CORP|CO|CORPORATION|INCORPORATION|COMPANY)$","") as cname
MERGE (o:Organization {cleaned:cname})
ON CREATE SET o.name=company.name
SET o.jurisdiction = company.jurisdiction_code, o.oc_number = company.company_number,
o.inactive = company.inactive = true;
Added 154 labels, created 154 nodes, set 2717 properties, statement completed in 3428 ms.
match (o:Organization) WHERE not exists(o.oc_number)
WITH o LIMIT 100
call apoc.load.json("https://api.opencorporates.com/v0.4/companies/search?q="+apoc.text.urlencode(o.name)) yield value
with o, value.results as results
where size(results.companies) = 1
WITH results.companies[0].company as c,o
SET o.oc_number = c.company_number, o.company_type = c.company_type,
o.current_status = c.current_status, o.inactive = c.inactive,
o.incorporation_date = c.incorporation_date, o.dissolution_date = c.dissolution_date,
o.created_at = c.created_at, o.updated_at = c.updated_at, o.retrieved_at = c.retrieved_at,
o.address = c.registered_address_in_full, o.source = c.source.publisher,
o.previous_names = [n IN c.previous_names | n.company_name];
match (o:Organization) WHERE exists(o.oc_number) and exists(o.jurisdiction)
return "curl -o oc_"+o.oc_number+".json -L https://api.opencorporates.com/v0.4/companies/"+o.jurisdiction+"/"+o.oc_number limit 10
MATCH (o:Organization) WHERE exists(o.oc_number) and exists(o.jurisdiction)
WITH o LIMIT 100
CALL apoc.load.json("https://api.opencorporates.com/v0.4/companies/"+o.jurisdiction+"/"+o.oc_number) yield value
WITH o, value.results as results
WITH results.companies[0].company as c,o
SET o.groupings = [g IN c.corporate_groupings | g.corporate_grouping.name],
WITH o, c, apoc.map.groupByMulti([d IN c.data.most_recent WHERE d.datum.description IS NOT null | {key:datum.data_type,value:datum.description}],"key") as agg
WITH o, c, apoc.map.fromPairs([k in keys(agg) | [k,[v IN agg[k] | v.value]]]) as values
SET o += values
UNWIND c.officers as officer
MERGE (p:Person {name:toUpper(officer.name)})
WITH * WHERE not exists (p)-->(o)
CALL apoc.create.relationship(p,toUpper(officer.position),{oc_number:officer.id, start:officer.start_date,end:officer.end_date, source:officer.opencorporates_url},o) yield rel
RETURN count(*);
-
Doesn’t work well for officers yet b/c they are not all people
-
we extract groupings, most-recent data into array properties, keyed by data-type
MATCH (o:Organization) WHERE exists(o.oc_number) and exists(o.jurisdiction)
WITH o LIMIT 1
CALL apoc.load.json("https://api.opencorporates.com/v0.4/companies/"+o.jurisdiction+"/"+o.oc_number) yield value
WITH value.results.company as c
WITH [g IN c.corporate_groupings | g.corporate_grouping.name] as groupings, c
WITH groupings, c, [d IN c.data.most_recent WHERE d.datum.description IS NOT null | {key:d.datum.data_type,value:d.datum.description}] as agg0
WITH c,groupings,apoc.map.groupByMulti(agg0,"key") as agg
WITH *, apoc.map.fromPairs([k in keys(agg) | [k,apoc.coll.toSet([v IN agg[k] | v.value])]]) as values
UNWIND c.officers as po
RETURN groupings,values,
{name:po.officer.name,position:toUpper(po.officer.position),
oc_number:po.officer.id, start:po.officer.start_date,end:po.officer.end_date,
source:po.officer.opencorporates_url} as person;
-
we extract groupings, most-recent data into array properties, keyed by data-type
-
if you don’t have an API key, you can load about 100 organizations worth of data
-
OpenCorporates doesn’t denote if an officer is a person or an organization, so we figure it out by checking the name for common organization keywords "(CORP|AGENT|INC|LLC|LLP|SYSTEM|LTD|LIMITED|ORGA)"
-
could work by looking at id-pattern/space
MATCH (o:Organization) WHERE exists(o.oc_number) and exists(o.jurisdiction) AND NOT exists(o.loaded)
WITH o LIMIT 100
CALL apoc.util.sleep(1000)
CALL apoc.load.json("https://api.opencorporates.com/v0.4/companies/"+o.jurisdiction+"/"+o.oc_number) yield value
WITH o, value.results.company as c
SET o.loaded = true
SET o.groupings = [g IN c.corporate_groupings | g.corporate_grouping.name]
WITH o, c
WITH *, [d IN coalesce(c.data.most_recent,[]) WHERE d.datum.description IS NOT null | {key:d.datum.data_type,value:d.datum.description}] as agg0
WITH *, apoc.map.groupByMulti(agg0,"key") as agg
WITH *, apoc.map.fromPairs([k in keys(agg) | [k,apoc.coll.toSet([v IN agg[k] | v.value])]]) as values
SET o += values
WITH *
UNWIND c.officers as po
WITH *, toUpper(lpo.officer.name) =~ ".*(CORP|AGENT|INC|LLC|LLP|SYSTEM|LTD|LIMITED|ORGA).*" as isOrg,toUpper(po.officer.name) as name,
replace(toUpper(po.officer.position)," ","_") as position
FOREACH (_ IN case when isOrg then [true] else [] end | MERGE (:Organization {name:name}))
FOREACH (_ IN case when isOrg then [] else [true] end | MERGE (:Person {name:name}))
WITH *
MATCH (p {name:name})
SET p:Officer
WITH * WHERE NOT position IS NULL AND not exists ((p)-->(o))
CALL apoc.create.relationship(p,position,{oc_number:po.officer.id},o) yield rel
SET rel.start=po.officer.start_date, rel.end=po.officer.end_date
RETURN labels(p),type(rel),po.officer,values,o.name;
{
"company": {
"branch_status": "branch of an out-of-jurisdiction company",
"company_number": "344214",
"company_type": "Alien Corporations (RICO) - Foreign",
"created_at": "2011-09-30T12:52:25+01:00",
"current_status": "Active/Compliance",
"dissolution_date": null,
"inactive": false,
"incorporation_date": "2005-03-28",
"jurisdiction_code": "us_ga",
"name": "BANK OF SCOTLAND",
"opencorporates_url": "https://opencorporates.com/companies/us_ga/344214",
"previous_names": [],
"registry_url": "http://corp.sos.state.ga.us/corp/soskb/Corp.asp?344214",
"retrieved_at": "2011-10-04T15:15:15+01:00",
"source": {
"publisher": "Georgia Secretary of State",
"retrieved_at": "2011-10-04T15:15:15+01:00",
"url": "http://corp.sos.state.ga.us/corp/soskb/Corp.asp?344214"
},
"updated_at": "2014-02-16T16:18:16+00:00",
"registered_address_in_full": "THE MOUND,EDINBURGH,EHI 1YZ, SCOTLAND ZF"
}
-
jurisdiction_code
-
name
-
branch_status
-
company_number
-
company_type
-
current_status
-
inactive
-
incorporation_date
-
dissolution_date
-
created_at
-
updated_at
-
previous_names: [.company_name]
-
registered_address_in_full
-
source source.publisher
-
retrieved_at
-
industry_codes [industry_code.description]
-
corporate_groupings: [corporate_grouping.name]
"inactive": false,
"incorporation_date": "1909-04-14",
"industry_codes": [
{
"industry_code": {
"code": "70100",
"description": "Activities of head offices",
"code_scheme_id": "uk_sic_2007",
"code_scheme_name": "UK SIC Classification 2007"
}
}
],
"jurisdiction_code": "gb",
"name": "BP P.L.C.",
"officers": [
{
"officer": {
"end_date": null,
"id": 32609673,
"name": "DAVID JOHN JACKSON",
"opencorporates_url": "https://opencorporates.com/officers/32609673",
"position": "secretary",
"start_date": "2003-07-24",
"uid": null
}
},
// todo turn values into list WITH apoc.map.groupByMulti([d IN c.data.most_recent WHERE d.datum.description IS NOT null | {key:datum.data_type,value:datum.description}],"key") as agg WITH apoc.map.fromPairs([k in keys(agg) | [k,[v IN agg[k] | v.value]]]) as values o += values
"results": {
"company": {
"branch_status": null,
"company_number": "00102498",
"company_type": "Public Limited Company",
"corporate_groupings": [
{
"corporate_grouping": {
"name": "bp",
"opencorporates_url": "https://opencorporates.com/corporate_groupings/bp",
"updated_at": "2014-02-16T11:22:24+00:00",
"wikipedia_id": "BP"
}
}
],
"created_at": "2010-10-21T18:20:50+01:00",
"current_status": "Active",
"data": {
"most_recent": [
{
"datum": {
"data_type": "WipoTrademark",
"description": null,
"id": 9790033,
"opencorporates_url": "https://opencorporates.com/data/9790033",
"title": "International Trademark Registration"
}
},
{
"datum": {
"data_type": "CompanyAddress",
"description": "1 St James's Square, London SW1Y 4PD, GB",
"id": 9788778,
"opencorporates_url": "https://opencorporates.com/data/9788778",
"title": "Company Address"
}
},
{
"datum": {
"data_type": "Website",
"description": "http://www.bp.com/sectiongenericarticle.do?categoryId=9021231&contentId=7039279",
"id": 8474113,
"opencorporates_url": "https://opencorporates.com/data/8474113",
"title": "Website"
}
},
{
"datum": {
"data_type": "OfficialRegisterEntry",
"description": "register id: 313807",
"id": 2452824,
"opencorporates_url": "https://opencorporates.com/data/2452824",
"title": "SEC Edgar entry"
}
}
],
"total_count": 125,
"url": "https://opencorporates.com/companies/gb/00102498/data"
},
"dissolution_date": null,
"filings": [
{
"filing": {
"date": "2014-02-13",
"id": 199825350,
"opencorporates_url": "https://opencorporates.com/filings/199825350",
"title": "Return of purchase of own shares",
"uid": "284acfeJxjZRd2YnXi4ohPyU9OSS1OBnHYQJzMFCdxfgMDA0djQy9/Cw8TUxMzAycuzvi0/KLcksqCVCdxlmAPA2MnLtb4lOTEEidxRguYMYklQFkOIwNDEwMjQ+MmNef83ILEvMzUYo/80uJUK6sIXx/3cCsrz9zE9FQo5Z1aSaQybgYGBkYgZgJiZiBmAWJWIGYDYnYg5gBiTiDmAmJuANC4NJA="
}
},
{
"filing": {
"date": "2014-02-13",
"id": 199825349,
"opencorporates_url": "https://opencorporates.com/filings/199825349",
"title": "Notice of cancellation of shares",
"uid": "771a5aeJxjZRd2YnXi4ohPyU9OSS1OBnHYQJzMFCdxfgMDA0djQy9/iwBXUxMzAycuzvi0/KLcksqCVCdxlmAPAzMnLtb4lOTEEidxRkuYMYklQFkOIwNDEwMjQ+MmNef83ILEvMzUYo/80uJUK6sIXx/3cCsrz9zE9FQo5Z1aSaQybgYGBkYgZgJiZiBmAWJWIGYDYnYg5gBiTiDmAmJuAOT2NK0="
}
}
],
"inactive": false,
"incorporation_date": "1909-04-14",
"industry_codes": [
{
"industry_code": {
"code": "70100",
"description": "Activities of head offices",
"code_scheme_id": "uk_sic_2007",
"code_scheme_name": "UK SIC Classification 2007"
}
}
],
"jurisdiction_code": "gb",
"name": "BP P.L.C.",
"officers": [
{
"officer": {
"end_date": null,
"id": 32609673,
"name": "DAVID JOHN JACKSON",
"opencorporates_url": "https://opencorporates.com/officers/32609673",
"position": "secretary",
"start_date": "2003-07-24",
"uid": null
}
},
{
"officer": {
"end_date": "2012-03-30",
"id": 32609674,
"name": "DAVID JOHN PEARL",
"opencorporates_url": "https://opencorporates.com/officers/32609674",
"position": "secretary",
"start_date": "2001-11-01",
"uid": null
}
},
{
"officer": {
"end_date": null,
"id": 32609675,
"name": "PAUL MILTON ANDERSON",
"opencorporates_url": "https://opencorporates.com/officers/32609675",
"position": "director",
"start_date": "2010-02-01",
"uid": null
}
},
{
"officer": {
"end_date": null,
"id": 32609676,
"name": "FRANK BOWMAN",
"opencorporates_url": "https://opencorporates.com/officers/32609676",
"position": "director",
"start_date": "2010-11-08",
"uid": null
}
}
],
"opencorporates_url": "https://opencorporates.com/companies/gb/00102498",
"previous_names": [
{
"company_name": "BP AMOCO P.L.C.",
"con_date": "2001-05-01"
},
{
"company_name": "THE BRITISH PETROLEUM COMPANY P.L.C.",
"con_date": "1998-12-31"
}
],
"registered_address_in_full": "1 ST JAMES'S SQUARE, LONDON, SW1Y 4PD",
"registry_url": "http://data.companieshouse.gov.uk/doc/company/00102498",
"retrieved_at": "2014-02-16T10:06:59+00:00",
"source": {
"publisher": "UK Companies House",
"retrieved_at": "2014-02-16T10:06:59+00:00",
"terms": "UK Crown Copyright",
"url": "http://xmlgw.companieshouse.gov.uk/"
},
"updated_at": "2014-02-16T10:07:06+00:00"
}
o.groupings = [g IN c.corporate_groupings | g.corporate_grouping.name],
UNWIND c.officers as officer
MERGE (p:Person {name:toUpper(officer.name)})
CALL apoc.create.relationship(p,toUpper(officer.position),{oc_number:officer.id, start:officer.start_date,end:officer.end_date, source:officer.opencorporates_url},o) yield rel