Libraries.io already has support for most of the largest package managers but there are many more that we've not added yet. This guide will take you through the steps for adding support for another.
Adding support for a new package manager is fairly easy assuming that the package manager repository has an API for extracting data about its packages over the internet. Follow these steps:
Add a new file to app/models/package_manager. This will be a Ruby class, so the filename should be all lower case and end in .rb, for example: app/models/package_manager/foobar.rb
The basic structure of the class should look like this:
module PackageManager
  class Foobar < Base
  end
end
Note that the class name must begin with a capital letter and contain only letters, numbers and underscores. Ideally the class name will match the formatting of the package manager's official name, e.g. CocoaPods.
There are three basic methods that each package manager class needs to implement to enable minimal support in Libraries.io:
Libraries.io needs to know the names of all the projects available in a package manager to be able to index them. The #project_names method should return an array of name strings.
Different package managers provide different ways of getting this data. Here are some examples:
- npm provides one huge JSON endpoint containing all the packages. We pluck just the keys from the top-level object in the response:
def self.project_names
  get("https://registry.npmjs.org/-/all").keys[1..-1]
end
- Haxelib lists all the project names on an HTML page, so we use Nokogiri to pluck them all out:
def self.project_names
  get_html("https://lib.haxe.org/all/").css('.project-list tbody th').map { |th| th.css('a').first.try(:text) }
end
- Julia stores all the packages in a git repository. Here we clone the repo and list the top-level folder names, which is not ideal but works:
def self.project_names
  @project_names ||= `rm -rf METADATA.jl; git clone https://github.com/JuliaLang/METADATA.jl --depth 1; ls METADATA.jl`.split("\n")
end
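As a rough illustration of the npm approach above, here is a minimal sketch that plucks names from a JSON index without touching the network. The sample payload and the helper name project_names_from_index are assumptions made for this example; it also assumes the first top-level key is a metadata entry rather than a package, which is why the first key is skipped.

```ruby
require 'json'

# Hypothetical sample of a registry index: one top-level key per package,
# preceded by a metadata key that is not a package name.
SAMPLE_INDEX = <<~JSON
  {
    "_updated": 1500000000,
    "left-pad": { "name": "left-pad" },
    "express": { "name": "express" }
  }
JSON

# Returns the package names from the index, skipping the first key
# (assumed to be metadata), in the same way as keys[1..-1] above.
def project_names_from_index(json_string)
  JSON.parse(json_string).keys[1..-1]
end

puts project_names_from_index(SAMPLE_INDEX).inspect
```

Whatever the source looks like, the important part is the return value: a flat array of name strings.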
Once we have a list of package names, we need to be able to get the information for each package by its name from the registry. This is also used for syncing/updating a package we already know about when a new version is published.
This method takes the name as a string argument, usually makes an HTTP request to the registry for that name, and returns a Ruby hash of information, often parsed from JSON or XML.
Some examples:
- Packagist has a JSON endpoint and we select just the package attribute from the response:
def self.project(name)
  get("https://packagist.org/packages/#{name}.json")['package']
end
- npm has a JSON endpoint but we need to escape / for scoped module names:
def self.project(name)
  get("http://registry.npmjs.org/#{name.gsub('/', '%2F')}")
end
- Hackage doesn't have a JSON endpoint for package information, so we scrape the HTML of the page instead:
def self.project(name)
  {
    name: name,
    page: get_html("http://hackage.haskell.org/package/#{name}")
  }
end
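To make the escaping step in the npm example concrete, here is a small sketch that only builds the URL. The helper name registry_url is hypothetical and exists just for this illustration; the real method passes the URL straight to get.

```ruby
# Builds the registry URL for a package name, escaping "/" so that a scoped
# name such as "@scope/pkg" stays within a single path segment.
def registry_url(name)
  "http://registry.npmjs.org/#{name.gsub('/', '%2F')}"
end

puts registry_url('@babel/core') # scoped name, "/" is escaped
puts registry_url('express')     # unscoped name, unchanged
```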
After getting the information about a package from the registry, we need to format that data into something that will fit nicely in the Libraries.io database. The mapping method takes the result of the #project method and returns a hash with some or all of the following keys:
- name - the name of the project, usually the same as the name originally passed to #project
- description - a description of the project, usually a couple of paragraphs, not the whole readme
- repository_url - URL where the source code for the project is hosted, often a GitHub, GitLab or Bitbucket repo page
- homepage - URL for the homepage of the project, if different from the repository_url
- licenses - an array of SPDX license short names that the project is licensed under, e.g. ['MIT', 'GPL-2.0']
- keywords_array - an array of keywords or tags that can be used to categorize the project
Here's an example from Cargo:
def self.mapping(raw_project)
  MappingBuilder.build_hash({
    name: raw_project['crate']['id'],
    homepage: raw_project['crate']['homepage'],
    description: raw_project['crate']['description'],
    keywords_array: Array.wrap(raw_project['crate']['keywords']),
    licenses: raw_project['crate']['license'],
    repository_url: repo_fallback(raw_project['crate']['repository'], raw_project['crate']['homepage'])
  })
end
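The shape of the mapping can be tried outside the application with plain Ruby. The sketch below is not the Libraries.io implementation: it returns a bare hash instead of going through MappingBuilder, uses Array() in place of Array.wrap, and its repo_fallback is a simplified, hypothetical stand-in for the real helper. The sample crate data is invented.

```ruby
# Hypothetical stand-in for repo_fallback: prefer the repository URL,
# fall back to the homepage, and return '' if neither is present.
def repo_fallback(repo, homepage)
  [repo, homepage].map(&:to_s).find { |url| !url.strip.empty? } || ''
end

# Maps a raw registry response onto the keys described above.
def mapping(raw_project)
  crate = raw_project['crate']
  {
    name: crate['id'],
    homepage: crate['homepage'],
    description: crate['description'],
    keywords_array: Array(crate['keywords']), # nil becomes []
    licenses: crate['license'],
    repository_url: repo_fallback(crate['repository'], crate['homepage'])
  }
end

# Invented sample response with no repository and no keywords.
raw = {
  'crate' => {
    'id' => 'foobar',
    'homepage' => 'https://example.com/foobar',
    'description' => 'An example crate',
    'keywords' => nil,
    'license' => 'MIT',
    'repository' => nil
  }
}

puts mapping(raw).inspect
```

Missing values are normalised here so that later code does not have to special-case nil keywords or an absent repository.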
Not all package managers have these concepts, but many do. More features in Libraries.io can be enabled if these methods are implemented in a PackageManager class:
For package managers that have a concept of discrete versions being published.
This method takes the data returned from the #project method and should return an array of hashes, one for each version, each with a number and a published_at date recording when that version was originally published.
Here's an example from NuGet:
def self.versions(raw_project, _name)
  raw_project[:releases].map do |item|
    VersionBuilder.build_hash(
      number: item['catalogEntry']['version'],
      published_at: item['catalogEntry']['published']
    )
  end
end
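For a feel of the expected return value, here is a standalone sketch of the same transformation on invented data. It builds plain hashes rather than calling VersionBuilder, and parsing the timestamp with Time.iso8601 is an addition made for this example, on the assumption that the registry reports ISO 8601 strings.

```ruby
require 'time'

# Turns raw release entries into one hash per version, with a version
# number and the time it was published.
def versions(raw_project)
  raw_project[:releases].map do |item|
    {
      number: item['catalogEntry']['version'],
      published_at: Time.iso8601(item['catalogEntry']['published'])
    }
  end
end

# Invented sample data in the same shape as the NuGet example above.
raw = {
  releases: [
    { 'catalogEntry' => { 'version' => '1.0.0', 'published' => '2020-01-02T03:04:05Z' } },
    { 'catalogEntry' => { 'version' => '1.1.0', 'published' => '2020-06-07T08:09:10Z' } }
  ]
}

versions(raw).each { |v| puts "#{v[:number]} published #{v[:published_at].year}" }
```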
For package managers that we can update using a single version instead of all versions.
This method takes the data returned from the #project method and a version string, and should return a single version with the same data that #versions returns for it.
def self.one_version(raw_project, version_string)
  item = raw_project["versions"].find { |v| v["number"] == version_string }
  return nil unless item # no version with that number

  {
    number: item["number"],
    published_at: item["published"]
  }
end
For package managers that have a concept of versions and of versions having dependencies.
This method returns the dependencies for a particular version of a package, so it receives a name, a version and optionally the data returned from the #project method, and should return an array of hashes, one for each dependency.
Each dependency hash should include the following attributes:
- project_name - the name of the package being depended on
- requirements - the version requirements of this dependency, for example ~> 2.0
- kind - regular dependencies are runtime, but this could also be development, test, build or something else
They can also potentially have extra attributes:
- optional - some package managers have the concept of optional dependencies; if yours does, set this as a boolean
- platform - this will almost always be self.name.demodulize, the same platform as the package manager, but if dependencies come from a different package manager you can override it
Example from Haxelib:
def self.dependencies(name, version, _mapped_project)
  json = get_json("https://lib.haxe.org/p/#{name}/#{version}/raw-files/haxelib.json")
  return [] unless json['dependencies']
  json['dependencies'].map do |dep_name, dep_version|
    {
      project_name: dep_name,
      requirements: dep_version.empty? ? '*' : dep_version,
      kind: 'runtime',
      platform: self.name.demodulize
    }
  end
rescue
  []
end
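The conversion step in the Haxelib example can be exercised without a network request. In this sketch the helper name dependencies_from is hypothetical, the platform is passed in as a plain string instead of being derived from the class name, and the manifest contents are invented.

```ruby
# Converts the "dependencies" section of an already-parsed manifest into
# the dependency hashes described above. An empty version requirement is
# treated as "any version".
def dependencies_from(json, platform)
  return [] unless json.is_a?(Hash) && json['dependencies']

  json['dependencies'].map do |dep_name, dep_version|
    {
      project_name: dep_name,
      requirements: dep_version.to_s.empty? ? '*' : dep_version,
      kind: 'runtime',
      platform: platform
    }
  end
end

# Invented manifest: one unconstrained and one pinned dependency.
manifest = { 'dependencies' => { 'hxcpp' => '', 'lime' => '7.0.0' } }

dependencies_from(manifest, 'Haxelib').each do |dep|
  puts "#{dep[:project_name]} #{dep[:requirements]} (#{dep[:kind]})"
end
```

Returning an empty array when the manifest has no dependencies keeps callers from having to handle nil.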
For package managers with a lot of packages, downloading the full list of names can take a long time. If you can provide a list of names of recently added or updated packages, then Libraries.io can check that on a more regular basis. The #recent_names method should return a list of names in the same way that #project_names does, for example:
- Pub's project list page is ordered by most recently updated, so we can just grab the first page of packages and map the names out:
def self.recent_names
  get("https://pub.dartlang.org/api/packages?page=1")['packages'].map { |project| project['name'] }
end
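A minimal offline sketch of the same idea follows, assuming a response body shaped like the Pub example (a top-level packages array of objects with a name key). The helper name and sample body are invented, and dropping blank or duplicate names is a defensive choice made for this example, not something the original method does.

```ruby
require 'json'

# Invented sample of the first page of a "recently updated" listing.
SAMPLE_PAGE = <<~JSON
  {
    "packages": [
      { "name": "http" },
      { "name": "path" },
      { "name": "http" }
    ]
  }
JSON

# Extracts package names from one page of results, ignoring entries
# without a name and collapsing duplicates.
def recent_names_from(json_string)
  JSON.parse(json_string).fetch('packages', []).map { |project| project['name'] }.compact.uniq
end

puts recent_names_from(SAMPLE_PAGE).inspect
```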
Many package managers have a command line interface for installing individual packages. If you add this method, Libraries.io will show the instructions on the project page so anyone can easily install it.
This method is passed a project object and optionally a version number. Here are some examples:
- Rubygems adds a -v flag if a version is passed:
def self.install_instructions(db_project, version = nil)
  "gem install #{db_project.name}" + (version ? " -v #{version}" : "")
end
- The Go CLI doesn't support specifying a version, so it's ignored:
def self.install_instructions(db_project, version = nil)
  "go get #{db_project.name}"
end
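Both variants can be tried with a stand-in project object. In this sketch the Project struct is a hypothetical substitute for the database record, and the two methods are given distinct names only so that they can sit side by side in one file.

```ruby
# Hypothetical stand-in for the project record; only #name is needed here.
Project = Struct.new(:name)

# Rubygems style: append -v when a version is given.
def gem_install_instructions(db_project, version = nil)
  "gem install #{db_project.name}" + (version ? " -v #{version}" : "")
end

# Go style: the version argument is accepted but ignored.
def go_install_instructions(db_project, _version = nil)
  "go get #{db_project.name}"
end

puts gem_install_instructions(Project.new('rails'))          # gem install rails
puts gem_install_instructions(Project.new('rails'), '7.0.0') # gem install rails -v 7.0.0
puts go_install_instructions(Project.new('github.com/foo/bar'), '1.2.3')
```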
If the package manager's official name doesn't fit with Ruby's class name rules, you can return its official name from this method. For example, npm is always lower case but the class name is NPM, so we have added the following:
def self.formatted_name
  'npm'
end
If the package manager registry has a predictable URL structure, we can generate useful URLs for each project, which are used where available:
If the package manager registry website has individual pages for each package, add this method to return a URL for it.
It takes a project object and an optional version number, for example:
def self.package_link(db_project, version = nil)
  "https://rubygems.org/gems/#{db_project.name}" + (version ? "/versions/#{version}" : "")
end
If the package manager provides predictable URLs to the tarball or zip archive of the package, add this method to return a URL for it.
It takes a project object and an optional version number, for example:
def self.download_url(db_project, version = nil)
  "https://rubygems.org/downloads/#{db_project.name}-#{version}.gem"
end
If the package manager provides hosted documentation for each package, add this method to return a URL for it.
It takes a package name and an optional version number, for example:
def self.documentation_url(name, version = nil)
  "http://www.rubydoc.info/gems/#{name}/#{version}"
end
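The three URL helpers above can be checked together with a stand-in project. This is a sketch under assumptions: GemProject is a hypothetical substitute for the project record, and the nil guard in download_url is a suggestion made for this example (without it, a missing version yields a URL ending in "-.gem"); it is not taken from the Libraries.io code.

```ruby
# Hypothetical stand-in for the project record; only #name is needed here.
GemProject = Struct.new(:name)

def package_link(db_project, version = nil)
  "https://rubygems.org/gems/#{db_project.name}" + (version ? "/versions/#{version}" : "")
end

# Returns nil when no version is given, since the archive URL needs one.
def download_url(db_project, version = nil)
  return nil unless version

  "https://rubygems.org/downloads/#{db_project.name}-#{version}.gem"
end

def documentation_url(name, version = nil)
  "http://www.rubydoc.info/gems/#{name}/#{version}"
end

project = GemProject.new('rake')
puts package_link(project, '13.0.6')
puts download_url(project, '13.0.6')
puts documentation_url(project.name, '13.0.6')
```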
Libraries.io will try to ping the #package_link URL on a regular basis to check for a 200 status code. If the package manager registry always returns a 200 or doesn't have a #package_link method, you can add this method to provide a different URL that will return a 200 if the package still exists or a 404 if it's been removed.
It takes a project object, for example:
def self.check_status_url(db_project)
  "https://rubygems.org/api/v1/versions/#{db_project.name}"
end
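To illustrate how the status URL is meant to be interpreted, here is a hypothetical sketch of the 200/404 decision. The real check runs inside Libraries.io and may differ; the package_status helper, its symbols, and the injected fetch_status callable (used so the example runs without a network) are all assumptions.

```ruby
# Interprets the HTTP status returned for a status URL.
# fetch_status is any callable that takes a URL and returns a status code.
def package_status(url, fetch_status)
  case fetch_status.call(url)
  when 200 then :ok
  when 404 then :removed
  else :unknown
  end
end

# Stubbed fetcher standing in for a real HTTP client.
fake_registry = lambda do |url|
  url.end_with?('/rake') ? 200 : 404
end

puts package_status('https://rubygems.org/api/v1/versions/rake', fake_registry)
puts package_status('https://rubygems.org/api/v1/versions/gone', fake_registry)
```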
Constants are added to each PackageManager to provide more metadata about the level of support that Libraries.io has for that package manager:
If the PackageManager class has a #versions method then set this to true:
HAS_VERSIONS = true
If the PackageManager class has a #dependencies method then set this to true:
HAS_DEPENDENCIES = true
If the package manager has a website then set this to the full URL, including the protocol:
URL = 'https://rubygems.org'
Most application-level package managers have a main programming language that they focus on. This should be set to the hex value for that language from the github-linguist gem. You can see the full list of colours in languages.yml:
COLOR = '#701516'
HIDDEN
This doesn't need to be set for any active package managers, but if one is shut down and should no longer be shown on the site, set it to true:
HIDDEN = true
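Put together, the constants for an active package manager might look like the sketch below. The Foobar class, its URL and the empty Base stub are placeholders for illustration; in the application, Base already exists and the colour should be the one listed for the relevant language.

```ruby
module PackageManager
  # Stub so the example is self-contained; the real Base lives in the app.
  class Base; end

  class Foobar < Base
    HAS_VERSIONS = true          # a #versions method is implemented
    HAS_DEPENDENCIES = true      # a #dependencies method is implemented
    URL = 'https://foobar.example' # placeholder registry website
    COLOR = '#701516'            # hex colour of the main language
    # HIDDEN is left unset because this package manager is still active.
  end
end

puts PackageManager::Foobar::URL
```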
Once your PackageManager class is ready, you can add the required rake tasks to download.rake.
Depending on the size, popularity and frequency of updates, there are different tasks to add:
If there's a #recent_names method defined on the PackageManager class, then Libraries.io can check for new updates frequently by calling #import_recent_async on the class. Add a rake task that looks like this:
desc 'Download recent Rubygems packages asynchronously'
task rubygems: :environment do
  PackageManager::Rubygems.import_recent_async
end
For package managers that don't have a proper concept of versions (Go and Bower are good examples that fall back to git tags), we don't need to check packages we already know about. The #import_new_async task will only download packages we don't already have in the database:
desc 'Download new Bower packages asynchronously'
task bower: :environment do
  PackageManager::Bower.import_new_async
end
For the initial import of all packages, add a foobar_all task which calls #import_async. This will be run on a daily basis if there's no #recent_names method defined:
desc 'Download all Rubygems packages asynchronously'
task rubygems_all: :environment do
  PackageManager::Rubygems.import_async
end
For package managers where the download process can't easily be parallelized (if it requires cloning a git repo, for example), the import can be done synchronously instead, with the following task that calls #import on the class:
desc 'Download all Inqlude packages'
task inqlude: :environment do
  PackageManager::Inqlude.import
end
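As a rough summary of the rules above for the frequently scheduled task, the choice of import method could be expressed as follows. This helper is purely hypothetical and not part of Libraries.io; it only encodes the rule that #import_recent_async is used when #recent_names is defined and #import_async otherwise, and the two stub classes are invented.

```ruby
# Picks the import method for the regularly scheduled task, based on
# whether the package manager class defines .recent_names.
def scheduled_import_method(package_manager)
  package_manager.respond_to?(:recent_names) ? :import_recent_async : :import_async
end

# Invented stub classes standing in for real PackageManager classes.
class WithRecentNames
  def self.recent_names
    ['example']
  end
end

class WithoutRecentNames; end

puts scheduled_import_method(WithRecentNames)
puts scheduled_import_method(WithoutRecentNames)
```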
Once the PackageManager class is ready, there are some optional updates that can be made to some other repositories to enable more functionality.
Depper polls RSS feeds and JSON API endpoints to check for new and updated packages and then enqueues jobs to download those packages. It helps reduce the load on the package manager registries and push new data into the system faster. You will want to add a new ingestor that understands how to track changes in the package manager.
If your package manager has an icon, adding it to the Pictogram repository will enable it to show up on the site.
Check out the documentation on adding a logo for a new package manager in the Pictogram repo: https://github.com/librariesio/pictogram