
scraper, project, superblock: Implement automated greylisting #2778

Open, wants to merge 42 commits into development

jamescowens (Member) commented Oct 2, 2024

Automated Greylisting Design Highlights

Executive Summary

This document provides the highlights for the functionality included in this PR, which implements both manual and automated greylisting. The Gridcoin network rules for project whitelisting and the conditions for greylisting are documented on the main Gridcoin website at Whitelist Process.

When a project temporarily does not meet the requirements for whitelisting (the two main greylisting conditions being a Work Availability Score (WAS) of less than 0.1 and/or a Zero Credit Days (ZCD) count greater than 7), the project is "greylisted". Traditionally, from a network operations perspective, this has meant temporarily removing the project from the whitelist using the administrative protocol update procedures, and then re-adding the project when both rules return to normal. This is a manual and labor-intensive process that does not scale well as the number of whitelisted projects increases, and it depends on the administrator taking action in a timely manner.

To address scalability and administrative stability, it was recognized several years ago that some form of automatic greylisting would be important functionality to implement. This PR addresses that need.

This PR implements manual greylisting via a new project entry status, MAN_GREYLISTED, which is set by administrative contract just like whitelisting. The purpose of manual greylisting is twofold: 1) not all possible issues that could warrant greylisting are covered by the WAS and ZCD rules, and 2) if a project's availability stays low but fairly consistent for more than 40 days, the WAS will pass, because the averages forming its numerator and denominator will be similar. This project status is stored in the project registry.

Automatic greylisting is implemented via a new AutoGreylist class. This class records total credit data across the entire project (all CPIDs, whether or not they have an active beacon) for all whitelisted projects. The data is collected by the scrapers and taken from a pending or existing (last) superblock, together with the history of the 40 superblocks preceding it, and the class evaluates the WAS and ZCD rules against that history. Because it operates at superblock granularity, the rules as implemented differ slightly from the documented definitions, since the time between superblocks is slightly more than 24 hours; in practice this is essentially equivalent. Because the automatic greylisting status only changes on superblock boundaries, the AutoGreylist class is implemented as a caching singleton: as long as the superblock hash referenced by the cache has not changed, repeated calls for the greylist simply report the cached state rather than repeating the heavyweight walk of 40 superblocks to recompute it.
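A minimal sketch of the hash-keyed cache described above. All names here are hypothetical: `SuperblockHash` stands in for `QuorumHash`, `GreylistMap` for the real greylist map, the expensive 40-superblock walk is injected as a callback, and a `std::mutex` replaces Gridcoin's `CCriticalSection`. The singleton accessor is not modeled.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

using SuperblockHash = std::string;              // hypothetical stand-in for QuorumHash
using GreylistMap = std::map<std::string, bool>; // project name -> meets greylisting criteria

class CachedGreylist
{
public:
    // The heavyweight recomputation over the superblock history is injected so
    // that only the caching logic is shown here.
    explicit CachedGreylist(std::function<GreylistMap(const SuperblockHash&)> recompute)
        : m_recompute(std::move(recompute))
    {
        assert(m_recompute);
    }

    std::shared_ptr<const GreylistMap> Refresh(const SuperblockHash& current_hash)
    {
        std::lock_guard<std::mutex> lock(m_lock);

        // Cache hit: the reference superblock is unchanged, so report the cached state.
        if (m_greylist && current_hash == m_superblock_hash) {
            return m_greylist;
        }

        // Cache miss: recompute from the superblock history and remember the hash.
        m_greylist = std::make_shared<GreylistMap>(m_recompute(current_hash));
        m_superblock_hash = current_hash;
        ++m_recompute_count;

        return m_greylist;
    }

    unsigned RecomputeCount() const
    {
        std::lock_guard<std::mutex> lock(m_lock);
        return m_recompute_count;
    }

private:
    mutable std::mutex m_lock;
    std::function<GreylistMap(const SuperblockHash&)> m_recompute;
    std::shared_ptr<const GreylistMap> m_greylist;
    SuperblockHash m_superblock_hash;
    unsigned m_recompute_count = 0;
};
```

Repeated calls with the same hash return the same shared pointer without recomputing; only a new superblock hash triggers the expensive walk.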

The AutoGreylist class overrides a project registry status of ACTIVE or MAN_GREYLISTED with the status AUTO_GREYLISTED if the project meets the greylisting criteria. This is done directly in the Whitelist::Snapshot method, preserving the underlying project entries. Because the underlying registry entries are preserved, the project reverts to its underlying status as soon as its greylist entry returns to normal. Conversely, the project status AUTO_GREYLIST_OVERRIDE can be set in the project registry via addkey, and it takes precedence over automatic greylisting. This allows an override to keep a project active even if the automatic greylisting rules would greylist it. A good example is a project that had a one-day correction of total credit (TC) due to a database issue, which distorts the WAS and causes a false failure. In that case it would be a good idea to override automatic greylisting for that project temporarily, after evaluation by the community.
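The precedence described above can be sketched as a pure function. `EffectiveStatus` is a hypothetical helper, not the actual `Whitelist::Snapshot` code; it only shows that the registry status is never modified and that the automatic result is layered on top of it when a snapshot is taken.

```cpp
#include <cassert>

enum class ProjectEntryStatus
{
    UNKNOWN,
    DELETED,
    ACTIVE,
    MAN_GREYLISTED,
    AUTO_GREYLISTED,
    AUTO_GREYLIST_OVERRIDE,
    OUT_OF_BOUND
};

// Hypothetical helper: computes the status reported in a snapshot from the
// (unchanged) registry status and the AutoGreylist result for the project.
ProjectEntryStatus EffectiveStatus(ProjectEntryStatus registry_status, bool meets_greylisting_crit)
{
    assert(registry_status != ProjectEntryStatus::OUT_OF_BOUND);

    // Only ACTIVE and MAN_GREYLISTED are overridden. AUTO_GREYLIST_OVERRIDE keeps
    // the project active, and DELETED projects are left alone.
    if (meets_greylisting_crit
        && (registry_status == ProjectEntryStatus::ACTIVE
            || registry_status == ProjectEntryStatus::MAN_GREYLISTED)) {
        return ProjectEntryStatus::AUTO_GREYLISTED;
    }

    return registry_status;
}
```

Because the override is computed rather than stored, a project whose greylisting criteria clear automatically reports its registry status again.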

Note that the automatic greylisting ruleset is entirely contained within the AutoGreylist class. While the rules are encapsulated in that class, no attempt was made to write a formal rules engine, as that would be too heavyweight for just two rules. Additionally, WAS is implemented using 64-bit integer arithmetic and the Gridcoin Fraction class to avoid consensus issues due to floating point. For ease of display, WAS values in reports are shown as floating point.
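To illustrate why integer arithmetic is sufficient, the WAS threshold test can be done by cross-multiplication: avg_7 / avg_40 < 1/10 holds exactly when 10 * avg_7 < avg_40. This sketch does not use the Gridcoin Fraction API; the function name is hypothetical, and treating a zero 40-superblock average as failing is an assumption the PR does not state.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true when WAS = avg_7 / avg_40 is strictly below 0.1,
// evaluated with 64-bit integers only, so every node gets the same answer.
bool WasMeetsGreylistCriterion(uint64_t avg_7, uint64_t avg_40)
{
    // Assumption for this sketch: no credit at all over the lookback fails the rule.
    if (avg_40 == 0) return true;

    // If 10 * avg_7 would overflow, it is certainly larger than any uint64_t avg_40.
    if (avg_7 > std::numeric_limits<uint64_t>::max() / 10) return false;

    return 10 * avg_7 < avg_40;
}
```

A WAS of exactly 0.1 (for example averages of 10 and 100) does not meet the criterion, matching the "less than 0.1" wording.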

The excluded project functionality of the scraper convergence, and of the superblocks formed from it, still applies. This means that a project that does not export statistics at all for a 48 hour period will be excluded from superblocks until access to its statistics is restored. This is related to the 48 hour retention rule for scraper statistics that has been in place since the scraper rewrite a number of years ago. Exclusion of a project effectively overrides all project registry statuses as well as the operation of automatic greylisting.

Changes to the scraper

The scraper was modified to collect total credit across the entire project for each project whose status in the project registry is not DELETED. This was accomplished via the following changes (this list is not exhaustive):

  1. Addition of two fields to the scraper file manifest registry, All_cpid_total_credit and No_records. All_cpid_total_credit captures the total credit summed across the entire project, regardless of whether each cpid is active or not. No_records is a flag recording that a file has no active cpids. Formerly such a file and its corresponding entry would have been deleted, but now the file and entry must be retained to record the all_cpid_total_credit even if there are currently no active cpids in that project.

  2. Implementation of an additional pseudo-project entry in the CScraperManifest named ProjectsAllCpidTotalCredits, so that each node can calculate the convergence from the scraper data. This means the total credit data must match, just like other project data, for a convergence to occur, which ensures the integrity of the total credit data in the same way as for other project data.

  3. Extension of the ScraperStatsVerifiedBeacons structure used in the convergence to include total credits, with the structure renamed to ScraperStatsVerifiedBeaconsTotalCredits. This is the primary means of transferring the total credit state for each project from a manifest convergence to the superblock.

  4. Modification of the scraper machinery to assign zero magnitude to greylisted projects for purposes of magnitude calculation.

  5. Modification of the ProcessProjectRacFileByCPID function to accumulate total credit for a project from all CPIDs regardless of status.

  6. The reporting for CScraperManifest and the convergence report RPC functions was extended to provide information on the total credits across all projects.
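The accumulation in item 5, together with the No_records flag from item 1, can be sketched as a single pass over a project's rows. `CreditRow`, `ProjectFileSummary`, and `SummarizeProject` are hypothetical names; the real `ProcessProjectRacFileByCPID` also parses the project export files and computes RAC, which is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of one row of a project's credit export.
struct CreditRow
{
    std::string cpid;
    uint64_t total_credit;
};

struct ProjectFileSummary
{
    std::map<std::string, uint64_t> active_cpid_credit; // kept for magnitude calculation
    uint64_t all_cpid_total_credit = 0;                  // summed across every CPID
    bool no_records = true;                              // true if no active CPID was found
};

ProjectFileSummary SummarizeProject(const std::vector<CreditRow>& rows,
                                    const std::set<std::string>& active_beacon_cpids)
{
    ProjectFileSummary summary;

    for (const auto& row : rows) {
        // Every row contributes to the project-wide total, beaconed or not.
        summary.all_cpid_total_credit += row.total_credit;

        // Only CPIDs with an active beacon are retained for per-CPID statistics.
        if (active_beacon_cpids.count(row.cpid)) {
            summary.active_cpid_credit[row.cpid] = row.total_credit;
            summary.no_records = false;
        }
    }

    return summary;
}
```

A project with no active beaconholders still yields a non-zero all-CPID total, which is why the file and its manifest entry must now be retained.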

Changes to the registry_db.h template

The registry db template provides the generic common code underlying the registry implementations for each contract type that requires historized, revertible state maintenance of its objects. It implements versioned state storage of the registered contract objects in leveldb, keyed by the defined key. The AutoGreylist class needs to know when the first entry for a project actually occurred in order to apply the rules properly to projects whitelisted within the 40 superblock lookback scope. As a result, the registry db template had to be extended with a generic type and code to accommodate this.

Each of the corresponding contract type classes had to be modified to accommodate the changes in the registry template, even if it does not actually use the first occurrence functionality; i.e. there are trivial modifications to all contract types other than project.
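A minimal sketch of first-occurrence tracking, using a hypothetical append-only `HistorizedRegistry` and a stripped-down `ProjectEntry`. The real registry_db.h template is generic over contract types, backed by leveldb, and revertible; none of that is modeled here, so reverting a first entry is not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, stripped-down historized entry: one record per contract.
struct ProjectEntry
{
    std::string name;
    int64_t height; // block height at which the contract was applied
    int status;     // stand-in for ProjectEntryStatus
};

class HistorizedRegistry
{
public:
    void Apply(const ProjectEntry& entry)
    {
        assert(!entry.name.empty());

        m_history.push_back(entry);

        // try_emplace does nothing if the key already exists, so only the
        // first occurrence of each project is retained.
        m_first_entries.try_emplace(entry.name, entry);
    }

    const std::map<std::string, ProjectEntry>& FirstEntries() const { return m_first_entries; }

private:
    std::vector<ProjectEntry> m_history;
    std::map<std::string, ProjectEntry> m_first_entries;
};
```

The AutoGreylist can then compare a project's first-entry height against the lookback window to shorten the lookback for recently whitelisted projects.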

Changes to the superblock

The superblock class version was incremented to v3 and was extended to include the m_projects_all_cpids_total_credits map, which contains the total credit data for all projects and is populated from the ScraperStatsVerifiedBeaconsTotalCredits structure in the scraper's manifest convergence. This data is serialized and stored on the blockchain when the superblock is staked, and is in turn used by the AutoGreylist class for greylist status calculations.

The superblock also was extended to store projects that have been greylisted.

Implementation of the AutoGreylist class and changes to the project class and project registry (whitelist)

ProjectEntryStatus enum class changes

The ProjectEntryStatus enum class was extended and moved to fwd.h to avoid recursive include problems.

//!
//! \brief Enumeration of project entry status. Unlike beacons this is for both storage
//! and memory.
//!
//! UNKNOWN status is only encountered in trivially constructed empty
//! project entries and should never be seen on the blockchain.
//!
//! DELETED status corresponds to a removed entry.
//!
//! ACTIVE corresponds to an active entry.
//!
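//! MAN_GREYLISTED means that the project has been manually greylisted by administrative contract.
//!
//! AUTO_GREYLISTED means that the project meets the automatic greylisting criteria. This status is
//! applied as an override in the whitelist snapshot rather than stored in the registry.
//!
//! AUTO_GREYLIST_OVERRIDE means that the project is active and automatic greylisting does not apply.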
//!
//! OUT_OF_BOUND must go at the end and be retained for the EnumBytes wrapper.
//!
enum class ProjectEntryStatus
{
    UNKNOWN,
    DELETED,
    ACTIVE,
    MAN_GREYLISTED,
    AUTO_GREYLISTED,
    AUTO_GREYLIST_OVERRIDE,
    OUT_OF_BOUND
};

MAN_GREYLISTED, AUTO_GREYLISTED, and AUTO_GREYLIST_OVERRIDE are new states. The existing enum entries keep their original positions, with the new states added before OUT_OF_BOUND, to avoid serialization issues with older project entries.

Implementation of project filter for whitelist snapshots

A ProjectFilterFlag was implemented to accomplish easy filtering of the whitelist snapshot depending on intended use.

    //!
    //! \brief Project filter flag enumeration.
    //!
    //! This controls what project entries by status are in the project whitelist snapshot. Note that REG_ACTIVE
    //! is the original "ACTIVE" and represents project entries with a status of "ACTIVE" in the registry. The
    //! filter flag "ACTIVE" here includes both REG_ACTIVE and AUTO_GREYLIST_OVERRIDE project entry statuses from
    //! the registry, since both mean the project is active assuming a convergence can be formed.
    //!
    enum ProjectFilterFlag : uint8_t {
        NONE                   = 0b00000,
        DELETED                = 0b00001,
        MAN_GREYLISTED         = 0b00010,
        AUTO_GREYLISTED        = 0b00100,
        GREYLISTED             = MAN_GREYLISTED | AUTO_GREYLISTED,
        REG_ACTIVE             = 0b01000,
        AUTO_GREYLIST_OVERRIDE = 0b10000,
        ACTIVE                 = REG_ACTIVE | AUTO_GREYLIST_OVERRIDE,
        NOT_ACTIVE             = 0b00111,
        ALL_BUT_DELETED        = 0b11110,
        ALL                    = 0b11111
    };

Note that the ACTIVE enum value is actually a combination of the original ACTIVE (now labeled REG_ACTIVE, short for registry active) and AUTO_GREYLIST_OVERRIDE, since a project status of AUTO_GREYLIST_OVERRIDE means that the project is not only active, but also overrides any determination by automatic greylisting.
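A sketch of how such a bitmask filter can be applied to a status. The enum values are copied from above (declared at namespace scope here for brevity); `StatusToFlag` and `PassesFilter` are hypothetical helpers, not functions from the PR. The status passed in is assumed to already include any automatic greylist override.

```cpp
#include <cassert>
#include <cstdint>

enum class ProjectEntryStatus { UNKNOWN, DELETED, ACTIVE, MAN_GREYLISTED, AUTO_GREYLISTED, AUTO_GREYLIST_OVERRIDE, OUT_OF_BOUND };

enum ProjectFilterFlag : uint8_t {
    NONE                   = 0b00000,
    DELETED                = 0b00001,
    MAN_GREYLISTED         = 0b00010,
    AUTO_GREYLISTED        = 0b00100,
    GREYLISTED             = MAN_GREYLISTED | AUTO_GREYLISTED,
    REG_ACTIVE             = 0b01000,
    AUTO_GREYLIST_OVERRIDE = 0b10000,
    ACTIVE                 = REG_ACTIVE | AUTO_GREYLIST_OVERRIDE,
    NOT_ACTIVE             = 0b00111,
    ALL_BUT_DELETED        = 0b11110,
    ALL                    = 0b11111
};

// Hypothetical helper: maps each status to its single filter bit.
uint8_t StatusToFlag(ProjectEntryStatus status)
{
    switch (status) {
    case ProjectEntryStatus::DELETED:                return DELETED;
    case ProjectEntryStatus::MAN_GREYLISTED:         return MAN_GREYLISTED;
    case ProjectEntryStatus::AUTO_GREYLISTED:        return AUTO_GREYLISTED;
    case ProjectEntryStatus::ACTIVE:                 return REG_ACTIVE;
    case ProjectEntryStatus::AUTO_GREYLIST_OVERRIDE: return AUTO_GREYLIST_OVERRIDE;
    default:                                         return NONE;
    }
}

// A status passes the filter if its bit is set in the filter mask.
bool PassesFilter(ProjectEntryStatus status, ProjectFilterFlag filter)
{
    return (StatusToFlag(status) & filter) != 0;
}
```

Because ACTIVE sets both the REG_ACTIVE and AUTO_GREYLIST_OVERRIDE bits, a single mask test admits both kinds of active project.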

ProjectEntry class modifications

The current version of the ProjectEntry class has been incremented to v4, and the class now includes a requires_ext_adapter boolean that replaces the temporary protocol-entry-based approach to showing this status in the GUI. This boolean is serialized, and its serialization is conditioned on the version to ensure compatibility with older project records.
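The idea of version-conditioned serialization can be sketched with a hypothetical `ProjectEntrySketch` and a hand-rolled byte layout. Gridcoin uses its own serialization framework, so the layout and method names here are illustrative assumptions only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified record with a hand-rolled little-endian layout.
struct ProjectEntrySketch
{
    static constexpr uint32_t CURRENT_VERSION = 4;

    uint32_t m_version = CURRENT_VERSION;
    bool m_requires_ext_adapter = false;

    std::vector<uint8_t> Serialize() const
    {
        std::vector<uint8_t> out;
        for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(m_version >> (8 * i)));

        // The new field is only written for v4 and later records.
        if (m_version >= 4) out.push_back(m_requires_ext_adapter ? 1 : 0);

        return out;
    }

    static ProjectEntrySketch Deserialize(const std::vector<uint8_t>& in)
    {
        assert(in.size() >= 4);

        ProjectEntrySketch entry;
        entry.m_version = 0;
        for (int i = 0; i < 4; ++i) entry.m_version |= static_cast<uint32_t>(in[i]) << (8 * i);

        // Older records carry no trailing byte, so the field keeps its default.
        entry.m_requires_ext_adapter = entry.m_version >= 4 && in.size() > 4 && in[4] != 0;

        return entry;
    }
};
```

Gating both the write and the read on the version keeps older records readable without a migration.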

AutoGreylist class implementation

GreylistCandidateEntry

The GreylistCandidateEntry is a class implemented within the AutoGreylist class that formalizes the greylist state maintenance and state history for each project in the whitelist filtered by ALL_BUT_DELETED (i.e. all but deleted). This class uses a "reverse" bookmark based approach to effectively deal with a number of tricky situations involving lack of availability of project total credit data (drop-outs). In addition, each update is stored in the m_update_history vector to provide visibility of the historical evolution of the total credit data and greylist status according to the rules.

The GreylistCandidateEntry contains an "empty" constructor and a parameterized constructor, the latter of which both creates the GreylistCandidateEntry and establishes the baseline for the measurements.

uint8_t GetZCD()

This method simply returns the m_zcd_20_SB_count member variable, which is a count of the number of zero credit days in the 20 superblock lookback from the baseline.

Fraction GetWAS()

The method computes the average total credit over a 7 superblock lookback and over a 40 superblock lookback, and then constructs a Fraction from the two averages. If fewer than 40 superblocks of history are available, the number of superblocks in the 40 superblock average is reduced accordingly, and similarly for the 7 superblock average. Because this can lead to odd WAS behavior while data is first being collected for a newly listed project, a grace period is applied to the rules before the m_meets_greylisting_crit flag in the GreylistCandidateEntry is set. This grace period is currently 7 superblocks.
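A sketch of the averaging with shortened windows, returning the two averages as a numerator/denominator pair in place of a Fraction. `ComputeWAS`, `GRACE_PERIOD_SBS`, and `WasRuleApplies` are hypothetical names, and whether the grace period boundary is inclusive is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical sketch: returns {7 SB average, 40 SB average}, i.e. the
// numerator and denominator of WAS. sb_processed is how many superblocks of
// history are actually available.
std::pair<uint64_t, uint64_t> ComputeWAS(uint64_t tc_7_sb_sum, uint64_t tc_40_sb_sum, uint8_t sb_processed)
{
    assert(sb_processed > 0);

    // Shorten each averaging window when less history is available.
    const uint64_t n7 = std::min<uint8_t>(sb_processed, 7);
    const uint64_t n40 = std::min<uint8_t>(sb_processed, 40);

    return {tc_7_sb_sum / n7, tc_40_sb_sum / n40};
}

// Hypothetical grace period: the WAS rule is not applied until enough history exists.
constexpr uint8_t GRACE_PERIOD_SBS = 7;

bool WasRuleApplies(uint8_t sb_processed) { return sb_processed > GRACE_PERIOD_SBS; }
```

With a constant 10 credits per superblock over 40 superblocks, both averages are 10, so WAS is 1 no matter how low the constant level is. This is the long-term low-availability case that manual greylisting is meant to cover.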

void UpdateGreylistCandidateEntry(std::optional<uint64_t> total_credit, uint8_t sb_from_baseline)

This method is used by the RefreshWithSuperblock method to update each GreylistCandidateEntry and add each update to the entry history.
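The "reverse" bookmark idea can be sketched as follows, with updates arriving newest-first (sb_from_baseline 0 is the baseline). This `CandidateSketch` is a hypothetical simplification, not the PR's implementation: a missing TC, or an older TC that is not strictly lower than the newer bookmark, counts as a zero credit day within 20 superblocks, and the credit sums telescope into one difference from the initial bookmark.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified candidate entry; updates are applied newest-first.
struct CandidateSketch
{
    std::optional<uint64_t> initial_bookmark; // newest TC seen (the baseline, if it had data)
    std::optional<uint64_t> bookmark;         // TC of the most recent processed SB with data
    uint8_t zcd_20_sb_count = 0;
    uint64_t tc_7_sb_sum = 0;
    uint64_t tc_40_sb_sum = 0;

    void Update(std::optional<uint64_t> total_credit, uint8_t sb_from_baseline)
    {
        if (sb_from_baseline == 0) {
            initial_bookmark = bookmark = total_credit;
            return;
        }

        // Going backwards, a missing TC, or an older TC that is not strictly lower
        // than the newer bookmark, means no credit was observed for this step.
        const bool zero_credit = !total_credit || !bookmark || *total_credit >= *bookmark;
        if (zero_credit && sb_from_baseline <= 20) ++zcd_20_sb_count;

        if (!total_credit) return; // drop-out: bookmarks and sums are unchanged

        if (!initial_bookmark) initial_bookmark = total_credit;

        // Credit earned from this SB up to the baseline is a single difference.
        if (*initial_bookmark > *total_credit) {
            const uint64_t earned = *initial_bookmark - *total_credit;
            if (sb_from_baseline <= 7) tc_7_sb_sum = earned;
            if (sb_from_baseline <= 40) tc_40_sb_sum = earned;
        }

        bookmark = total_credit;
    }
};
```

For the sequence 300, no data, 200, 200, 100 (newest first), the drop-out and the repeated 200 each count as a zero credit day, and 200 credits are attributed to the lookback.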

struct UpdateHistoryEntry

This struct stores the greylist state of the greylist candidate at a given update. Note that this history is viewed BACKWARDS, as a lookback from the current state, not forwards, which can be misleading if not kept in mind. Each time the AutoGreylist class is updated because the current superblock hash has changed, the historical entries are rebuilt from the current superblock backwards. Most of the member variables of this struct are std::optionals to accommodate missing information at a particular update.

const std::vector<UpdateHistoryEntry> GetUpdateHistory() const

This is a getter that returns a constant version of m_update_history.

Public member variables
        const std::string m_project_name;

        uint8_t m_zcd_20_SB_count;
        uint64_t m_TC_7_SB_sum;
        uint64_t m_TC_40_SB_sum;
        bool m_meets_greylisting_crit;

m_project_name contains the project name, which is the key of the greylist map. The next three variables contain the underlying state from which the ZCD and WAS are computed; since they are public, they can also be accessed independently. m_meets_greylisting_crit stores the current greylist qualification state of the entry: if it is true, the project currently meets the automatic greylisting criteria.

Private member variables
        std::optional<uint64_t> m_TC_initial_bookmark; //!< This is a "reverse" bookmark - we are going backwards in SB's.
        std::optional<uint64_t> m_TC_bookmark;
        uint8_t m_sb_from_baseline_processed;

        std::vector<UpdateHistoryEntry> m_update_history;

These store the bookmarks (which are for internal use only) and the m_update_history vector, which is accessed via GetUpdateHistory().

The Greylist map

    typedef std::map<std::string, GreylistCandidateEntry> Greylist;

    //!
    //! \brief Smart pointer around a collection of projects.
    //!
    typedef std::shared_ptr<Greylist> GreylistPtr;

The actual greylist entries are collected into a std::map keyed by project name. This in turn is wrapped by a shared_ptr.

AutoGreylist iterator overloads

Similar to other registry and registry-like classes in Gridcoin, AutoGreylist provides iterator overloads that allow the AutoGreylist map to be accessed using range-based for loops and other iterator-based idioms.
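A minimal sketch of such iterator forwarding, using hypothetical `AutoGreylistSketch` and `EntrySketch` types in place of the real classes. The locking done by the real class is omitted.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct EntrySketch
{
    bool meets_greylisting_crit = false;
};

class AutoGreylistSketch
{
public:
    using Greylist = std::map<std::string, EntrySketch>;
    using GreylistPtr = std::shared_ptr<Greylist>;
    using const_iterator = Greylist::const_iterator;

    AutoGreylistSketch() : m_greylist_ptr(std::make_shared<Greylist>()) {}

    void Set(const std::string& project, bool meets_crit)
    {
        (*m_greylist_ptr)[project].meets_greylisting_crit = meets_crit;
    }

    // Forwarding the wrapped map's iterators enables range-based for loops.
    const_iterator begin() const { return m_greylist_ptr->cbegin(); }
    const_iterator end() const { return m_greylist_ptr->cend(); }
    Greylist::size_type size() const { return m_greylist_ptr->size(); }

private:
    GreylistPtr m_greylist_ptr;
};
```

A caller can then write `for (const auto& [name, entry] : greylist)` directly against the object.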

void Refresh()

This refreshes the AutoGreylist object from the current superblock. The cached state is used if the superblock hash has not changed to reduce overhead.

void RefreshWithSuperblock(SuperblockPtr superblock_ptr_in, std::shared_ptr<std::map<int, std::pair<CBlockIndex*, SuperblockPtr>>> unit_test_blocks = nullptr)

This refreshes the AutoGreylist object from an input Superblock pointer. An optional second parameter provides an alternate way to supply test superblocks for unit testing.

void AutoGreylist::RefreshWithSuperblock(Superblock& superblock)

This refreshes the AutoGreylist object from an input Superblock that is going to be associated with the current head of the chain (i.e. a stake). This mode is used by the scraper during construction of the superblock contract, as part of the call chain from the miner, and the superblock object is updated with the greylist status. This is a critical distinction: the other two forms simply use the superblocks on the chain, while this form freezes the state into the provided superblock object in addition to doing the historical lookback.

void Reset()

Resets the AutoGreylist object. This is called from the Whitelist Reset().

Private members

    mutable CCriticalSection autogreylist_lock;

    GreylistPtr m_greylist_ptr;
    QuorumHash m_superblock_hash;

autogreylist_lock is an internal critical section that ensures thread safety, since the AutoGreylist singleton can be accessed by multiple threads. m_greylist_ptr is the shared smart pointer to the actual greylist map, and m_superblock_hash stores the hash of the superblock used for the last AutoGreylist update. A change in this hash signals a state change that requires recomputation; otherwise the cached information is used.

Changes to Whitelist (Project Registry) class

Change to WhitelistSnapshot Snapshot method

This method has been extended to take the project filter as an argument (defaulting to ACTIVE), along with the refresh_greylist boolean (defaulting to true) and the include_override boolean (defaulting to true). It implements the AUTO_GREYLISTED override of the project status when the corresponding AutoGreylist candidate entry meets the greylisting criteria according to the rules.

const ProjectEntryMap GetProjectsFirstActive() const

This is a new method that provides a map of the first entry for each project in the registry. This is used by the AutoGreylist class to determine abbreviated lookbacks for projects that were whitelisted within the 40 superblock lookback window.

std::shared_ptr<AutoGreylist> GetAutoGreylist()

This is a new method that returns a shared smart pointer to the AutoGreylist object. This object is a singleton just like the Whitelist registry.

New private members

ProjectEntryMap m_project_first_actives stores the first (active) entry for each project keyed by project name, and is returned read-only by GetProjectsFirstActive(). This map is populated by the registry contract handlers and the leveldb initialization method. std::shared_ptr<AutoGreylist> m_auto_greylist is the shared smart pointer to the AutoGreylist object.

Change to the WhitelistSnapshot class

The constructor of this class was changed to accept the project filter that was used as a parameter, and the filter is stored in the WhitelistSnapshot object for convenience.

Quorum changes

The QuorumHash ComputeQuorumHash() was extended (implicitly) to include the project all cpid total credit data as part of the superblock hash. The superblock version is validated to ensure that no superblocks less than v4 are accepted once the superblock v4 block height has been reached.

GUI - ResearcherModel and ProjectTableModel changes

The researcher and project table models were extended to handle the auto greylisted status appropriately, following the order of precedence. In the project table displayed in the GUI, automatic and manual greylisting statuses take precedence over excluded, because those statuses have the same effect as exclusion but are not a scraper directive.

RPC changes

UniValue SuperblockToJson(const GRC::Superblock& superblock)

This helper function that provides JSON superblock outputs to several different RPC functions has been modified to include the project greylist status and the project all CPID total credits.

UniValue addkey(const UniValue& params, bool fHelp)

This administrative function has been extended to handle manual greylisting and the automatic greylisting override.

Unit Tests

superblock_tests.cpp

The superblock tests were modified to use superblock version 2. A remaining to-do is to update them for the new v3 structures to fully test serialization and deserialization; this has, however, been covered in the isolated testnet live network test.

project_tests.cpp

A unit test has been implemented that uses the unit_test_blocks parameter (std::shared_ptr<std::map<int, std::pair<CBlockIndex*, SuperblockPtr>>>) of the AutoGreylist::RefreshWithSuperblock method, with 47 superblocks of test data, to test the operation of the greylisting rules. These superblocks exercise every real-world condition expected to be encountered that can be handled by automatic application of the implemented rules, including no statistics on the first superblock after whitelisting; statistics drop-outs with no data, no increase in total credit, or both (i.e. a total credit number, then no data, then another total credit number that is the same); and a drastic drop in total credit change per superblock that results in WAS meeting the greylisting criteria. More superblocks than the lookback limit were included to ensure the lookback stops at the appropriate place.


The original notes from the PR, written in the middle of development, are retained below for historical purposes:

This PR is for tracking the progress of implementing automated greylisting.

Please see the following set of notes for design considerations that need to be discussed.

  • Basic manual greylisting and scraper machinery for determining automatic greylisting complete
      • Manual greylisting is an administrative contract type that rides normal transactions
        • Scrapers will now collect statistics on projects that have a greylisted status of either AUTO_GREYLISTED or MANUAL_GREYLISTED. Credits and average credits will be recorded in the project payloads, but the project magnitude will be zero, and they will not contribute to CPID magnitude
        • The Scraper code does not deal DIRECTLY with greylisting rules as this is an individual node responsibility
        • The convergence rules in terms of required number of projects use ACTIVE projects and do not include greylisted projects even though the stats are being collected for greylisted projects. This is because a project may be greylisted and available or literally not available at all, so convergences cannot be always expected at the project level for greylisted projects.
  • TODO
    • Wire up automatic greylisting
      • These exist along superblock boundaries, like superblocks themselves (claims with a valid superblock contract) and beacon activation
      • Need to compute ZCD and WAS
      • ZCD rule is <= 7 zero credit days out of 40, WAS rule is last 7 days average project credits / 40 days average project credits >= 0.1
      • Since this needs to be on superblock boundaries, we can slightly change the rules for implementation to be in superblocks rather than days. Since almost all of the time superblocks are very close to one day, this is almost the same.
      • Implies an algorithm that operates over 40 days of superblock history
      • No stats for a whitelisted project in a superblock (i.e. because the project is hard down) needs to be counted in ZCD, with zero entry for WAS averaging, even though last project convergence may be from the 48 hour stats carryover. This may require a tweak to the scraper convergence code
      • Choice of
        • stateless methods that repeatedly iterate over 40 superblock history to apply rules
        • methods over a cache structure that stores a subset of information from up to 40 superblocks relevant to the rule computation
        • Advantage of stateless is simplicity
        • Disadvantage is that it is fairly expensive, as the superblock registry has to be iterated over and processed, which involves disk I/O.
        • Advantage of caching is speed
        • Disadvantage of caching is complexity
        • Once client is synced this is only called when the superblock is staked and processed by the client, ~1 per 24 hours.
        • During sync will be approx 1 per 960 blocks
      • Need to define order of precedence of manual greylist versus automatic. Status of manual greylist must always override automatic as the whole point to manual greylisting once this is put into place is to deal with corner case issues that are not handled by the ZCD and WAS rules.
        • Manual greylisting is granular to each block (i.e. an administrative contract of type project)
        • Automatic greylisting is granular to the superblock (valid staked superblock claim)
        • Walking this through…
          • Example 1

block → MAN_GREYLISTED

superblock → AUTO_GREYLISTED → status still MAN_GREYLISTED

superblock → removed from AUTO_GREYLISTED → status still MAN_GREYLISTED

block → removal from MAN_GREYLISTED → ACTIVE

          • Example 2

block → MAN_GREYLISTED

superblock → AUTO_GREYLISTED → status still MAN_GREYLISTED

block → removed from MAN_GREYLISTED → AUTO_GREYLISTED

superblock → removed from AUTO_GREYLISTED → ACTIVE

        • I think this means we have to do the cache. The most convenient way to deal with this order of precedence is to store the underlying AUTO_GREYLIST status in the cache and have methods to utilize this information
          - Have status in cache of something like AUTO_GREYLIST_QUAL which means project has met the conditions for AUTO_GREYLIST by the rules, but was already MAN_GREYLISTED
          - This would be checked for each contract injection to change MAN_GREYLIST status, to decide whether new status is either AUTO_GREYLIST or ACTIVE
          - The AUTO_GREYLIST_QUAL is a flag on the project at the current (head) state
          - Maybe this really belongs in the in memory superblock structure? This does not need to be in the on-disk (chain) superblock structure at the cost of some computations.
          - There is an existing superblock cache (SuperblockIndex g_superblock_index) that currently stores the last three superblocks and could be expanded to 40 superblocks as an easy way to do the cache. This means more computation on top of the cache but much faster because it operates on in memory structures rather than reading from the superblock registry (disk I/O). It also means more memory usage.
          - Maybe best to modify the cache to be a hybrid and store more limited information for superblocks 4 – 40. But this makes the cache updating more complicated.
          - The memory usage of the additional superblocks is minimal compared to the current size of other data structures with the current active beacon count and chain length; however, when the benefactor contracts are implemented, this will no longer be true.
      • Create more detailed automated greylist reporting
          • Simple listing status on project not sufficient, because users will want to know the details of why a project is greylisted (this is ZCD/WAS reporting)
            • Probably should be something that operates on the project grid in the GUI and allows “clicking” the project whitelisting status and then having a pop-up window that displays the details of ZCD/WAS.

@jamescowens jamescowens self-assigned this Oct 2, 2024
@jamescowens jamescowens added this to the Natasha milestone Oct 2, 2024
@jamescowens jamescowens force-pushed the implement_greylist branch 2 times, most recently from 715dba6 to 11cbd3b on October 6, 2024 20:19
div72 (Member) commented Oct 6, 2024

Scrapers will now collect statistics on projects that have a greylisted status of either AUTO_GREYLISTED or MANUAL_GREYLISTED.

Makes sense for automatic greylisted projects for de-greylisting, but why are statistics collected for manually greylisted projects? The greylister can then operate on projects with either ACTIVE or AUTO_GREYLISTED status.

Also, considering that the WAS & ZCD calculation is done once per day, I am not sure caching is worth the bother. It might be worthwhile to make a dumb implementation and benchmark it.

Could adding a separate -projectnotify parameter be useful? I've been thinking about making a mailing list for new polls, adding project state changes doesn't sound bad.

jamescowens (Member Author) commented Oct 7, 2024

Excellent question. Depending on the reasons for the manual greylist, statistics may still be available for a project. If so they should continue to be collected, because the ZCD and WAS rules would then apply if the manual greylist status was removed.

What are you thinking in terms of functionality for the -projectnotify parameter?

div72 (Member) commented Oct 8, 2024

If so they should continue to be collected, because the ZCD and WAS rules would then apply if the manual greylist status was removed.

Good point.

block → removal from MAN_GREYLISTED → ACTIVE

Manual greylisting should instantly take effect, but should ungreylisting do so too? It should be simpler to make manual ungreylisting put the project in an auto greylist state. It'll take until the next superblock for the project to become active but that's ok imo. I'm imagining a FSM like this:

                              ⣀⡤⠤⠒⠒⠒⠉⠉⠉⠉⠉⠉⠙⠒⠒⠢⠤⣄⡀                         
                            ⡴⠋⠁                    ⠉⠳⡄                       
                            ⢣⣀    AUTO_GREYLIST    ⢀⣠⠇                       
                             ⠈⠙⠒⠤⠤⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⡠⠤⠴⠚⠉                         
                          ⠠⠤⣔⠶⢢               ⡠⣤⢄⣀⡀                       
                        ⢀⡠⠔⠉   ⠁             ⠈  ⠑⠢⡀                       
                     ⠘⠴⠮⠥⠤                         ⠈⠑⠤⡀                      
   ⣀⣠⠤⠤⠖⠒⠒⠒⠒⠒⠒⠒⠒⠒⠲⠤⠤⣄⣀                               ⣈⣑⢄⡀⡔                          
⣠⠖⠋⠁                   ⠈⠙⠲⣄                              ⣉⡭⠤⠖⠒⠒⠒⠒⠦⠤⣄⡀                          
⡇      MANUAL_GREYLIST     ⢸   ⠒⠾⣛⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒⠒  ⢸⡁  ACTIVE ⢀⡹                           
⠙⠲⢤⣀                   ⣀⡤⠖⠋                              ⠈⠙⠒⠒⠒⠒⠒⠒⠒⠚⠉                           
   ⠈⠉⠙⠒⠒⠲⠤⠤⠤⠤⠤⠤⠤⠖⠒⠒⠋⠉⠁                                               
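One reading of the proposed state machine, written as a transition function. The state and event names are hypothetical labels for this sketch: manual greylisting takes effect immediately, manual ungreylisting parks the project in AUTO_GREYLIST, and only superblock evaluations move a project between AUTO_GREYLIST and ACTIVE.

```cpp
#include <cassert>

enum class State { ACTIVE, AUTO_GREYLIST, MANUAL_GREYLIST };
enum class Event { MANUAL_GREYLIST, MANUAL_UNGREYLIST, SB_MEETS_CRITERIA, SB_CLEARS_CRITERIA };

State Next(State s, Event e)
{
    switch (e) {
    case Event::MANUAL_GREYLIST:
        // Manual greylisting takes effect immediately from any state.
        return State::MANUAL_GREYLIST;
    case Event::MANUAL_UNGREYLIST:
        // The project waits in AUTO_GREYLIST until the next superblock decides.
        return s == State::MANUAL_GREYLIST ? State::AUTO_GREYLIST : s;
    case Event::SB_MEETS_CRITERIA:
        // Superblock evaluation does not touch a manually greylisted project.
        return s == State::MANUAL_GREYLIST ? s : State::AUTO_GREYLIST;
    case Event::SB_CLEARS_CRITERIA:
        return s == State::AUTO_GREYLIST ? State::ACTIVE : s;
    }
    return s;
}
```

Routing manual ungreylisting through AUTO_GREYLIST means the manual and automatic paths never have to be reconciled outside a superblock boundary.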

What are you thinking in terms of functionality for the -projectnotify parameter?

Similar to other notify commands. It should be triggered on project status changes (added to whitelist, removed, greylisted, etc.) and should call a script with the contract hash (or the superblock hash in the case of an automatic greylist).

jamescowens (Member Author)
That is a good simplification actually.

@jamescowens jamescowens force-pushed the implement_greylist branch 2 times, most recently from 6dbd145 to 23e6b12 on November 25, 2024 01:00
jamescowens (Member Author) commented Jan 6, 2025

Well... back to the drawing board. With "Initial implementation of AutoGreylist" I have created an auto greylist class that executes the ZCD and WAS rules. It is not wired into the whitelist or the superblock yet, but you can see the results using getautogreylist rpc.

The idea is to use the AutoGreylist as an override in the Whitelist class, marking the status of projects that meet the automatic greylisting criteria as AUTO_GREYLISTED. This would take precedence over ACTIVE or even MANUALLY_GREYLISTED. There are no project entry (registry) status updates made for the automatic greylist. It is maintained as a global singleton cache which is intended to be refreshed when a new superblock is staked, so when a project comes off the auto greylist, it will automatically revert to the original status dictated by the last valid project entry status.

There is a big problem with the input to the AutoGreylist class, however. The scrapers filter the projects to select only the statistics of actively beaconed crunchers, to save processing time and network bandwidth/node memory. This means that the TC reported in the project files provided by the scrapers for greylisted projects (indeed, for all projects), when summed, covers only the active beacon holders for that project, not ALL crunchers. The AutoGreylist class uses the information in the historical superblocks, which is a reduced form of the scraper statistics, so it inherits the same problem.

Unfortunately, this causes serious issues with the current rules. For example, here is the output after the SB at 3473369 was posted on mainnet:

{
  "auto_greylist_projects": [
    {
      "project:": "SiDock@home",
      "zcd": 8,
      "WAS": 0.05824032176973353
    },
    {
      "project:": "asteroids@home",
      "zcd": 6,
      "WAS": 0
    },
    {
      "project:": "gpugrid",
      "zcd": 4,
      "WAS": 0.02413605533979817
    },
    {
      "project:": "rosetta@home",
      "zcd": 10,
      "WAS": 0.01227473310563061
    }
  ]
}
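As a quick sanity check, applying the two headline thresholds from the whitelist rules (greylist when WAS < 0.1 or ZCD > 7) to the four entries above shows why each one is listed: SiDock@home and rosetta@home fail both rules, while asteroids@home and gpugrid are listed on WAS alone. The struct and function names are illustrative only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row of the getautogreylist output shown above (illustrative type).
struct GreylistRow {
    std::string project;
    int zcd;    // zero credit days in the lookback window
    double was; // work availability score
};

// Headline thresholds from the whitelist rules: greylist when WAS < 0.1 or ZCD > 7.
bool FailsZcd(const GreylistRow& r) { return r.zcd > 7; }
bool FailsWas(const GreylistRow& r) { return r.was < 0.1; }
bool MeetsGreylistCriteria(const GreylistRow& r) { return FailsZcd(r) || FailsWas(r); }

// Values copied from the mainnet output after the SB at 3473369.
const std::vector<GreylistRow> kSample = {
    {"SiDock@home", 8, 0.05824032176973353},
    {"asteroids@home", 6, 0.0},
    {"gpugrid", 4, 0.02413605533979817},
    {"rosetta@home", 10, 0.01227473310563061},
};
```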

I haven't fully traced the WAS yet, but for the asteroids ZCD I traced void AutoGreylist::RefreshWithSuperblock(SuperblockPtr superblock_ptr_in) as it updated the greylist. What became immediately apparent is that the asteroids TC for the latest superblock at the time of the run (3473369) was actually LESS than the TC for the previous SB (3472381). How can this be? It is absolutely possible. For example, beacon holders that contributed to the TC for that project may have their beacons expire between SBs, which actually causes the TC summed across all active beacon holders to decline. This is clearly not what is intended by the rules.

The rules, as applied by the script-based greylist checker, operate on the total TC for the entire project across ALL CPIDs, whether registered beacon holders or not, and as such are much more stable.
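The decline can be reproduced with a toy example. Each CPID's total credit (TC) never decreases, but when the sum is taken only over CPIDs with active beacons, a beacon that expires between two superblocks removes that CPID's entire TC from the sum. The types, function names, and numbers are made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Total credit per CPID for one project (hypothetical data layout).
using CreditByCpid = std::map<std::string, double>;

// Project-wide TC: summed over every CPID in the stats export.
double SumAll(const CreditByCpid& tc)
{
    double sum = 0.0;
    for (const auto& entry : tc) sum += entry.second;
    return sum;
}

// Beacon-filtered TC: summed only over CPIDs that currently hold an active beacon,
// which is effectively what the scraper-filtered project files provide.
double SumBeaconed(const CreditByCpid& tc, const std::set<std::string>& active_beacons)
{
    double sum = 0.0;
    for (const auto& entry : tc) {
        if (active_beacons.count(entry.first) > 0) sum += entry.second;
    }
    return sum;
}
```

For instance, if CPID "b" holds a beacon at one superblock but the beacon has expired by the next, the beaconed sum drops even though no CPID's credit went down. A rule that counts a non-positive TC change as a zero credit day would then record a spurious one.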

I was hoping to get away with minimal changes to the scraper plumbing and the project stats objects that are put in the manifests of the scrapers, to avoid additional points of possible problems, but it looks like I am going to have to bite the bullet and:

  1. Sum the TC for all of the users in the source stats export file for a project as it is being processed by the scraper.
  2. Modify the manifest processing to handle that information, which will have to be provided as another manifest object.
  3. Modify the superblock to store the project-wide TCs for all projects across ALL CPIDs, not just the per-project TC sum for actively beaconed CPIDs.
  4. Have the AutoGreylist class use this info instead of the existing project TC sums in the superblock.
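Step 1 amounts to a running sum over every user record in the project's stats export, taken before (and independently of) the beacon filter. A minimal sketch, assuming a BOINC-style XML user export in which each user record carries a <total_credit> element; real scraper code would use a proper XML parser on the decompressed stream. The function name is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical: sum every <total_credit> value in a BOINC-style user export,
// regardless of whether the user's CPID has an active beacon.
double SumProjectTotalCredit(const std::string& xml)
{
    const std::string open = "<total_credit>";
    const std::string close = "</total_credit>";
    double total = 0.0;
    std::string::size_type pos = 0;

    while ((pos = xml.find(open, pos)) != std::string::npos) {
        const std::string::size_type start = pos + open.size();
        const std::string::size_type end = xml.find(close, start);
        if (end == std::string::npos) break; // malformed trailing record: stop scanning
        total += std::stod(xml.substr(start, end - start));
        pos = end + close.size();
    }
    return total;
}
```

The resulting per-project total is the kind of value that steps 2 and 3 would carry through the manifest and into the superblock.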

This is going to increase the scraper processing load somewhat.

Ugh.

@jamescowens jamescowens force-pushed the implement_greylist branch 10 times, most recently from f7a64df to 6cf7fde January 12, 2025 23:10
@jamescowens jamescowens changed the title from "scraper, project: Implement automated greylisting (tracking PR for WIP)" to "scraper, project, superblock: Implement automated greylisting (tracking PR for WIP)" Jan 13, 2025
@jamescowens jamescowens force-pushed the implement_greylist branch 3 times, most recently from 2fe6d02 to e58b82a January 13, 2025 15:36
This change introduces a map in the projects registry (whitelist)
that tracks when a project was first added. This is used in the
AutoGreylist::RefreshWithSuperblock method.

When projects are newly added to the whitelist, their first entry
(and resultant superblock) will be close to the head of the chain,
well within the 40-superblock lookback used to execute the greylist ruleset.

The update loop in RefreshWithSuperblock will only post
greylist entries (remember it goes backwards from the present)
when the entry timestamp is greater than or equal to the first
project entry for that project (i.e. when it was first put on the
whitelist).

This is required to ensure the ZCD and WAS rules work correctly
for newly whitelisted projects that do not yet have the full 40 SBs
to sample.
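The cutoff described in this commit can be sketched as a backward walk over the lookback window that stops at the first superblock older than the project's first whitelist entry. The SuperblockSample type, the function name, and the plain integer timestamps are assumptions for illustration, not the actual RefreshWithSuperblock code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-superblock sample for one project.
struct SuperblockSample {
    int64_t timestamp;   // superblock time
    double total_credit; // project TC recorded in that superblock
};

// Collect at most `lookback` samples, newest first, walking backwards from the most
// recent superblock (the last element of `chain`). Only samples whose timestamp is
// greater than or equal to the project's first entry time are kept, so a newly added
// project is evaluated only on superblocks from its time on the whitelist.
std::vector<SuperblockSample> SamplesSinceFirstEntry(const std::vector<SuperblockSample>& chain,
                                                     int64_t first_entry_time,
                                                     std::size_t lookback)
{
    std::vector<SuperblockSample> out;
    for (auto it = chain.rbegin(); it != chain.rend() && out.size() < lookback; ++it) {
        if (it->timestamp < first_entry_time) break; // older than the first project entry
        out.push_back(*it);
    }
    return out;
}
```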

Also includes some changes due to unit testing issues and locking
optimization.

This allows the automatic greylist to be overridden for a project
if for some reason it is not functioning correctly.

Made small adjustments as a result of unit testing. Note that
a hard-to-track-down error was occurring due to an incorrect
action from the MarkAsSuperblock() call on the CBlockIndex object.
It turns out this was an enum confusion caused by SUPERBLOCK
being used in two different enums in the same namespace. This
has been corrected by putting the MinedType enum in the GRC
namespace.

The rules have also been finalized: "7 out of 20" ZCDs means a
lookback of 20 SBs from the current one, which is actually 21 SBs
including the baseline, and correspondingly for the other
intervals.

Also, a grace period of 7 SBs from the baseline is used before the
rules are applied, so that enough statistics are collected for the
results to be meaningful.
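To make the interval counting concrete: a lookback of N superblocks from the current one spans N intervals between N + 1 samples, the oldest being the baseline. The sketch below encodes that convention under several stated assumptions: a zero credit day is an interval whose project-wide TC did not increase, WAS is the ratio of the average per-interval credit over the last 7 intervals to the average over the last 40, and the 7-SB grace period is treated as a minimum number of intervals before any rule fires. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical constants mirroring the rules as described: "7 out of 20" ZCDs,
// WAS < 0.1 over 7/40-superblock windows, and a 7-SB grace period.
constexpr std::size_t kZcdWindow = 20;
constexpr std::size_t kZcdLimit = 7;
constexpr std::size_t kWasShortWindow = 7;
constexpr std::size_t kWasLongWindow = 40;
constexpr double kWasLimit = 0.1;
constexpr std::size_t kGracePeriod = 7;

enum class Verdict { GRACE_PERIOD, OK, GREYLIST };

// tc holds the project-wide TC per superblock, oldest first; tc.front() is the
// baseline. Returns the credit deltas of the newest `intervals` intervals, or of
// all available intervals if the history is shorter. N intervals need N + 1 samples.
std::vector<double> RecentDeltas(const std::vector<double>& tc, std::size_t intervals)
{
    const std::size_t available = tc.empty() ? 0 : tc.size() - 1;
    const std::size_t n = std::min(intervals, available);
    std::vector<double> deltas;
    for (std::size_t i = tc.size() - n; i < tc.size(); ++i) {
        deltas.push_back(tc[i] - tc[i - 1]);
    }
    return deltas;
}

// A zero credit "day" is taken here to be any interval whose TC did not increase.
std::size_t ZeroCreditDays(const std::vector<double>& tc)
{
    const std::vector<double> d = RecentDeltas(tc, kZcdWindow);
    return static_cast<std::size_t>(
        std::count_if(d.begin(), d.end(), [](double delta) { return delta <= 0.0; }));
}

double Mean(const std::vector<double>& v)
{
    if (v.empty()) return 0.0;
    double sum = 0.0;
    for (double x : v) sum += x;
    return sum / static_cast<double>(v.size());
}

Verdict EvaluateRules(const std::vector<double>& tc)
{
    const std::size_t intervals = tc.empty() ? 0 : tc.size() - 1;
    if (intervals < kGracePeriod) return Verdict::GRACE_PERIOD; // too little history to judge

    const double long_avg = Mean(RecentDeltas(tc, kWasLongWindow));
    // No credit at all over the long window is treated as zero availability.
    const double was = long_avg > 0.0 ? Mean(RecentDeltas(tc, kWasShortWindow)) / long_avg : 0.0;

    return (ZeroCreditDays(tc) > kZcdLimit || was < kWasLimit) ? Verdict::GREYLIST : Verdict::OK;
}
```

With 21 samples the 20-interval ZCD window is exactly full, matching the "21 SBs including the baseline" wording; in this sketch the 40-interval WAS window simply uses whatever history exists until 41 samples are available.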
@jamescowens jamescowens force-pushed the implement_greylist branch 2 times, most recently from b69ecc1 to 976b5eb February 3, 2025 22:18
Implement v3 superblock tests. Also fix a potential deadlock issue
involving the cs_main lock.