Is this still active? #134

Open
cristianocca opened this issue Sep 21, 2017 · 13 comments

Comments

@cristianocca

I guess it's a silly question, since the last commit was 3 years ago, but I tried out a few options and this one seems to yield the best results so far. Did this project continue somewhere else?

@cj13579

cj13579 commented Sep 22, 2017

I've seen a few forks of this around but they only seem to contain the commits that have been PRd back to this repo 😞

@cristianocca
Author

Hmm, the library has some issues that seem to be very common, like Unicode decode errors or timeouts. They have been reported and are probably waiting for fixes in some pull request.

@purplehat7

+1, I've found the same as @cristianocca

@mikkokotila

This guy did an amazing job with this package. He defied the mythical whois monster. I'm surprised how little work it took to get it to a level where it yields better results than anything I've ever tried before (and I did try a lot...I think I have at least 1,000,000 sites down so far).

@Ni-Knight

Is the library dead? Is there any active fork?

@mikkokotila

@Ni-Knight yes, there is: https://github.com/botlabio/pywhois

The package has been somewhat cleaned up, slightly refactored, and tested with 100,000 sites. As a result of that test there is now a better idea of the actual coverage, how to improve it, etc.

@joepie91
Owner

Hi all, apologies for the long radio silence. I've had a bunch of personal issues to deal with for the past few years, so maintenance of libraries has slipped quite a bit; especially for my Python libraries.

As I generally use Node.js for my own projects nowadays, my original intention was to maintain this in parallel with a JS implementation, sharing the parsing ruleset between them; but I've not found the time to get anything done on it.

Given the increasing number of registries that have totally shut off WHOIS data access due to the GDPR (even for company data, where this isn't necessary!), I'm unsure about the future of this library. On the one hand, it's something I'd like to maintain and I have some ideas for improving it; on the other hand, it may not be very useful in the future with decreasing WHOIS data access.

The persisting encoding issues have also contributed significantly to this library falling by the wayside; there doesn't seem to have ever been a canonical solution for these issues that works in both Python 2 and Python 3 without introducing an additional dependency - and adding a dependency is not something I really like to do considering the rather... lacking and conflict-prone dependency model in Python, which is a big part of why I gave up on Python in the first place.
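For illustration only, here is a minimal sketch of a dependency-free fallback decoder. It is not the library's code and it is not true auto-detection: the function name `decode_whois_response` and the candidate list are assumptions, and a wrong guess still produces mojibake rather than an error.

```python
def decode_whois_response(raw, candidates=("utf-8", "cp1252")):
    """Decode raw WHOIS bytes by trying strict codecs in order.

    Hypothetical helper: the candidate order is an assumption, not
    something taken from this library.
    """
    for encoding in candidates:
        try:
            # Strict decoding raises on byte sequences the codec cannot map.
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # latin-1 maps every possible byte to a code point, so this never raises.
    return raw.decode("latin-1")


print(decode_whois_response("Registrant Name: Müller".encode("latin-1")))
```

Trying UTF-8 first is a reasonable default because arbitrary non-UTF-8 text rarely happens to be valid UTF-8, but the later fallbacks are guesses.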

So... I'm not sure how to proceed.

A few questions for you all, as users of this library, that'll help me determine how to continue:

  • Do you still feel that parsing WHOIS data is worth it considering the increase in "GDPR walls"?
  • What parsed data do and don't you use?
  • Would you be willing to deal with an increased risk of dependency conflicts in your project (due to version mismatches) if it meant that encoding was correctly auto-detected and handled?

@mikkokotila

@joepie91 I guess many will be pleased to hear from you here! GDPR wall sounds bad, I did not know about this yet but was wondering how this will play out.

To your questions:

  • do you have information about the extent of the GDPR walls?
  • I think the most interesting are registration date, expiry date, registrant email, and registrant country
  • if I understand your question correctly, deps are ok, but maybe just drop support for Python 2 from the roadmap?

I think that one big thing with this library is that the code needs to be refactored. IMO all special cases should be handled in separate functions that reside in their own files in a /exceptions sub module or something like that.
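As a rough sketch of what that suggestion could look like (not how the library is currently organized), special cases can be registered per WHOIS server and looked up at parse time. All names here, including `special_case`, `parse_generic`, and the `whois.example-registry.test` server, are hypothetical; in a real layout each handler would live in its own file under the sub module.

```python
# Hypothetical registry: WHOIS server hostname -> special-case handler.
_HANDLERS = {}


def special_case(server):
    """Decorator that registers a handler for one WHOIS server."""
    def register(func):
        _HANDLERS[server] = func
        return func
    return register


def parse_generic(raw):
    """Fallback parser: collect non-empty 'Key: value' lines into a dict."""
    data = {}
    for line in raw.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            # Keep the first occurrence of each key.
            data.setdefault(key.strip(), value.strip())
    return data


@special_case("whois.example-registry.test")
def parse_example_registry(raw):
    """Made-up registry that writes 'key = value' instead of 'Key: value'."""
    return parse_generic(raw.replace(" = ", ": "))


def parse(raw, server):
    """Dispatch to a registered special case, else use the generic parser."""
    handler = _HANDLERS.get(server, parse_generic)
    return handler(raw)


print(parse("domain = example.test", "whois.example-registry.test"))
```

The dispatch key is the server hostname here purely for simplicity; a real implementation might need to key on the response format instead.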

@joepie91
Owner

> do you have information about the extent of the GDPR walls?

With the introduction of the GDPR, it is no longer allowed to process or publish personally identifiable information of EU residents, without either a) a predetermined legal basis for doing so, or b) explicit and voluntary permission to do so (and it's not allowed to require that 'permission' to use a service).

For WHOIS data, this basically works out to "you can no longer legally publish registrant data for EU residents". While this does not apply to non-EU residents and organizations (e.g. companies), an increasing number of registries are simplifying their implementation by just hiding all registrant data for everybody.

For example, if you WHOIS my domain cryto.net, you will see something like this:

Registrant Name: Not disclosed Not disclosed
Registrant Organization: 
Registrant Street: Not disclosed, Not disclosed, Not disclosed
Registrant City: Not disclosed
Registrant State/Province: 
Registrant Postal Code: 00000
Registrant Country: NL
Registrant Phone: +1.5163872248
Registrant Phone Ext: 
Registrant Fax: 
Registrant Fax Ext: 
Registrant Email: c1bf3e6f696fe288b0b943cea7abda1c.gdrp@customers.whoisprivacycorp.com

There is no standardized way for indicating such removal/replacement of PII, so there would need to be special rules for detecting GDPR-related information removals per registry. It also means that you will get less and less data out of registries over time.
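To make the detection problem concrete, here is a small sketch that flags placeholder values in an already-parsed record. The pattern list is an assumption for illustration: it covers the "Not disclosed" style shown above, an all-zero postal code, and the word "redacted", and it would need per-registry additions in practice. `is_redacted` and `redacted_fields` are hypothetical names.

```python
import re

# Assumed placeholder patterns; illustrative only, real registries vary.
REDACTION_PATTERNS = [
    re.compile(r"^(not disclosed[\s,]*)+$", re.IGNORECASE),
    re.compile(r"redacted", re.IGNORECASE),
    re.compile(r"^0+$"),  # e.g. a postal code of "00000"
]


def is_redacted(value):
    """Return True if a field value looks like a privacy placeholder."""
    value = value.strip()
    return any(pattern.search(value) for pattern in REDACTION_PATTERNS)


def redacted_fields(record):
    """Return the sorted names of non-empty fields that look redacted."""
    return sorted(k for k, v in record.items() if v and is_redacted(v))


record = {
    "Registrant Name": "Not disclosed Not disclosed",
    "Registrant Street": "Not disclosed, Not disclosed, Not disclosed",
    "Registrant Postal Code": "00000",
    "Registrant Country": "NL",
}
print(redacted_fields(record))
```

Empty fields are skipped rather than counted as redacted, since an empty value alone does not show whether data was withheld or never existed.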

> if I understand your question correctly, deps are ok, but maybe just drop support for python 2 from the roadmap?

I'd prefer to avoid dependencies altogether, since Python uses a flat dependency model; if two dependencies in the same project require different (incompatible) versions of an encoding-detection package, it will cause a potentially unresolvable version conflict. This also applies when only supporting Python 3.

> I think that one big thing with this library is that the code needs to be refactored. IMO all special cases should be handled in separate functions that reside in their own files in a /exceptions sub module or something like that.

That was the original plan, but I kept running into weirder and weirder edge cases; and since there's no centralized repository of formats and edge cases, any such architecture would need to be changed over time anyway to accommodate newly discovered kinds of edge cases.

That's not to say that things can't be refactored, but it's an ongoing and never-ending process rather than a one-off todo item.

@mikkokotila

This "not disclosed" business looks bad. How will this end? Will we be deprived of the joys of mindless parsing of whois records?

@joepie91
Owner

Likely with increasingly smaller amounts of (useful) information being present in WHOIS data over time, hence my uncertainty on how to proceed with this project.

@mikkokotila

@joepie91 generally speaking I prefer signals that are consistent across all observations, which as you know was kind of a struggle with WHOIS data to start with.

That said, my use-case is quite specific and might not be affected too much here. Actually I'd be ok with just registration date, and I think that's not going to be masked at any point for any reason. Instead of regex, a deep learning model could be used for detecting it, which would avoid a lot of the headache.

My second use-case is to identify use of whois privacy, which I think could be mostly done in the current scenario.

The third case is a more conventional reverse lookup with reg email, org name, etc., which is obviously affected (and actually already was, because of wider adoption of whois privacy). But I think there are better, much harder to mask ways to do that these days.
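For reference, the regex-based approach to the registration-date use-case can be sketched as below. This is not the library's ruleset: the label variants and date formats are a small assumed sample, and `registration_date` is a hypothetical name. The length of those two lists in a real ruleset is exactly the headache mentioned above.

```python
import re
from datetime import datetime

# Assumed label variants; real WHOIS output uses many more.
_CREATED_RE = re.compile(
    r"^\s*(?:Creation Date|Created(?: On)?|Registered(?: On)?|Registration Date)"
    r"\s*:\s*(\S.*?)\s*$",
    re.IGNORECASE | re.MULTILINE,
)

# Assumed date formats; numeric only, to stay independent of locale.
_DATE_FORMATS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d", "%Y.%m.%d", "%d.%m.%Y")


def registration_date(raw):
    """Return the first recognizable creation date as a datetime, or None."""
    match = _CREATED_RE.search(raw)
    if not match:
        return None
    text = match.group(1)
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    # A label was found, but its value used an unknown format.
    return None


print(registration_date("Domain Name: example.com\nCreation Date: 2017-09-21T10:00:00Z\n"))
```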

@guyskk

guyskk commented Jan 24, 2022

This fork works well 👍
https://github.com/kilgoretrout1985/pythonwhois-alt
