
IP cache #53

Open: wants to merge 66 commits into base: develop
Conversation

mazhurin (Collaborator):

Local cache for the challenged IPs. Every challenged IP is cached in order to:

  • prevent extra challenge commands if the same IP shows up again in the following batches
  • let the banjax report thread look up the IP while processing ip_failed_challenge or ip_passed_challenge banjax reports
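The two purposes above could be served by a small thread-safe cache along these lines. This is a minimal sketch, not the PR's actual implementation; the class name matches the PR, but the method names, TTL parameter, and dict-based storage are assumptions.

```python
import threading
import time


class IPCache:
    """Process-local cache of challenged IPs (illustrative sketch)."""

    def __init__(self, ttl_seconds=3600):
        self._lock = threading.Lock()
        self._ips = {}  # ip -> timestamp of the last challenge
        self.ttl = ttl_seconds

    def challenge(self, ip):
        """Return True if a challenge command should be sent,
        False if the IP was already challenged recently (skip the extra command)."""
        now = time.time()
        with self._lock:
            ts = self._ips.get(ip)
            if ts is not None and now - ts < self.ttl:
                return False
            self._ips[ip] = now
            return True

    def is_challenged(self, ip):
        """Lookup used when processing ip_failed_challenge /
        ip_passed_challenge banjax reports."""
        with self._lock:
            ts = self._ips.get(ip)
            return ts is not None and time.time() - ts < self.ttl
```

A single lock around the dict keeps the cache safe to share between the batch-processing thread and the banjax report thread.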

@mazhurin mazhurin requested a review from mkaranasou September 28, 2020 14:47
@mazhurin mazhurin changed the base branch from master to develop September 28, 2020 14:48
@mkaranasou (Collaborator) left a comment:

I have many questions I think :P
Good job in general, many good ideas in here 👍

try:
    if num_fails >= self.config.engine.banjax_num_fails_to_ban:
        self.ip_cache.ip_banned(ip)
        sql = f'update request_sets set banned = 1 where ' \
Collaborator:

Have you tested the performance of the update? I think we could consider a separate table for the banjax bans, since they will be far fewer rows than request sets.
Also, do you use raw SQL strings for better performance?
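The separate-table idea could look roughly like this, sketched with an in-memory SQLite stand-in (the real project uses a different database; the `banjax_bans` table name and columns are assumptions, not the PR's schema):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# A dedicated bans table: one row per banned IP, far fewer rows
# than request_sets, so the update/lookup touches a small table.
conn.execute('''
    CREATE TABLE banjax_bans (
        ip        TEXT PRIMARY KEY,
        banned_at TEXT NOT NULL
    )
''')
conn.execute(
    "INSERT OR REPLACE INTO banjax_bans (ip, banned_at) "
    "VALUES (?, datetime('now'))",
    ('203.0.113.7',),
)
rows = conn.execute('SELECT ip FROM banjax_bans').fetchall()
```

Marking a ban then becomes an upsert into the small table instead of an update scan over request_sets.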

Collaborator (author):

  • No, I did not test the performance of the update; I just monitor the performance of the postprocessing pipeline.
  • Not for performance. I was a bit concerned about that mysterious 1h shift issue and thought that a SQL update with an explicit time would be solid and perhaps more readable.
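A parameterized update with an explicit timestamp, as described above, might look like this sketch (again using in-memory SQLite as a stand-in; the table and column names beyond `request_sets`/`banned` are assumptions):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE request_sets (ip TEXT, banned INTEGER, updated_at TEXT)'
)
conn.execute("INSERT INTO request_sets VALUES ('203.0.113.7', 0, NULL)")

# Passing the time explicitly from the application avoids depending on
# the DB server's clock/timezone (the "1h shift" concern) and keeps the
# statement readable.
explicit_time = '2020-09-28 14:47:00'
conn.execute(
    'UPDATE request_sets SET banned = 1, updated_at = ? WHERE ip = ?',
    (explicit_time, '203.0.113.7'),
)
row = conn.execute('SELECT banned, updated_at FROM request_sets').fetchone()
```

Note the `?` placeholders: binding parameters this way also avoids the injection risk of interpolating values into an f-string SQL statement.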

Review threads on:
  • src/baskerville/models/engine.py (resolved)
  • src/baskerville/models/ip_cache.py (resolved)
  • src/baskerville/models/ip_cache.py (outdated, resolved)
  • src/baskerville/models/ip_cache.py (resolved)
@@ -1251,13 +1255,68 @@ def __init__(self, config, steps=()):
        super().__init__(config, steps)
        self.df_chunks = []
        self.df_white_list = None
        self.ip_cache = IPCache(config, self.logger)
Collaborator:

Will the IPCache be used in other steps? It could be a TaskWithIPCache task. Also, I think all the Banjax logic should live in separate tasks, like many of the methods in attack detection, e.g. as steps of AttackDetection; it would be more modular. I remember you said it was a bit tricky because of the difference in DataFrame needs, but, whenever we have some time of course, we could figure it out.
Again, whenever we have some time :) We could rethink this when doing performance tuning after the cluster set-up, for example.

Collaborator (author):

TaskWithIPCache looks like overkill. If you need an IPCache in other steps, you just create an instance and use it; it's a singleton. But yes, we could rethink this later. For now, at least, using the singleton does not block us from moving in any direction.
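For reference, a thread-safe singleton of the kind src/baskerville/util/singleton_thread_safe.py presumably provides can be sketched with a metaclass like this. This is an assumption about the implementation, not the file's actual contents:

```python
import threading


class SingletonThreadSafe(type):
    """Metaclass: at most one instance per class, safe under concurrent first use."""
    _instances = {}
    _lock = threading.Lock()

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            with cls._lock:
                # Double-checked locking: re-test after acquiring the lock,
                # in case another thread created the instance meanwhile.
                if cls not in cls._instances:
                    cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class IPCache(metaclass=SingletonThreadSafe):
    def __init__(self, config=None, logger=None):
        self.config = config
        self.logger = logger
```

With this pattern, `IPCache(config, logger)` in any step returns the same instance, which is why no dedicated TaskWithIPCache is needed to share it.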

Review threads on:
  • src/baskerville/models/pipeline_tasks/tasks.py (outdated, resolved)
  • src/baskerville/util/helpers.py (resolved)
  • src/baskerville/util/singleton_thread_safe.py (resolved)
num_records = len(records)
if num_records > 0:
    challenged_ips = self.spark.createDataFrame(records).withColumn('challenged', F.lit(1))
Collaborator:

Have you tried with persist? If yes, does it help? This also applies to the whitelist df.

Collaborator (author):

I don't remember, but this part is not a bottleneck. I tried persist for the whitelisting; it did not help.
