Alex Sandro Garzão edited this page Nov 14, 2015 · 6 revisions

Welcome to the yapc wiki!

Open questions

  • Is it possible for a single cache system to reach the goals of all three versions?
  • Is a reverse proxy basically a proxy cache that accesses only local content?
  • Where (and how) could the URL be rewritten?
  • How does NGX define domain rules and resource rules?
  • Log manager/statistics: is it a good idea to use the same approach as Varnish, or should an improvement be considered?
  • Is it a good idea to have more than one option for storage, logging, and so on?
    • Maybe, based on the config, I can rewrite some code to use a specific storage backend, log system, … With this approach, the project could be used as a framework for building a custom proxy cache system :-)

Startup sequence (final version)

  • If the config has changed, generate the sources from the configuration, then compile and build the new proxy binary
  • Run the proxy binary
  • Bind to the network port
  • Listen
  • For each connection request
    • Accept
    • Read request
    • Match the request against the configuration
    • Send 404 if appropriate
    • Connect to the upstream
    • While there is object data
      • Read object data (4KB buffer)
      • Send data to the module pipeline
      • Send data to cache (storage)
      • Send data to downstream
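
The inner loop of the sequence above (read 4KB, run the module pipeline, write to the cache, write downstream) can be sketched as follows. This is only a sketch: the names `relay`, `Module` and `BUFFER_SIZE` are hypothetical, file-like objects stand in for sockets and storage, and it assumes the cache receives the data after the pipeline has run, which these notes do not decide.

```python
import io
from typing import BinaryIO, Callable, Iterable

BUFFER_SIZE = 4096  # 4KB read buffer, as in the startup sequence

# Assumption: a module is modeled as a function from one chunk to another.
Module = Callable[[bytes], bytes]


def relay(upstream: BinaryIO, downstream: BinaryIO, cache: BinaryIO,
          modules: Iterable[Module] = ()) -> int:
    """Copy one object from upstream to the cache and downstream.

    Returns the number of bytes written downstream.
    """
    modules = list(modules)
    sent = 0
    while True:
        chunk = upstream.read(BUFFER_SIZE)
        if not chunk:            # empty read: no more object data
            break
        for module in modules:   # module pipeline, applied in order
            chunk = module(chunk)
        cache.write(chunk)       # storage (assumed to see pipeline output)
        downstream.write(chunk)  # client
        sent += len(chunk)
    return sent


if __name__ == "__main__":
    src = io.BytesIO(b"a" * 10000)
    client, store = io.BytesIO(), io.BytesIO()
    n = relay(src, client, store, [bytes.upper])
    print(n, client.getvalue() == store.getvalue())  # prints "10000 True"
```

Because each chunk is handled independently, a module that needs to see data across chunk boundaries would have to keep its own state; that is left open here.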

Tasks

  • Define the possible module types and where they can be executed
    • Some modules can generate content
    • Some modules can change the content that was read from the upstream
    • Some modules can change the content that was read from the cache
    • In specific situations, a module can generate the headers or alter them
    • Remember that the headers are the first content that must be sent
  • BDD, TDD, performance tests, stress tests, FDD, …
  • Focus on testing from the first line of code!
  • Are the Varnish tools valuable? They would be fun to implement, but are they useful?
    • varnishlog, varnishtop, varnishhist, varnishstat
  • purge?
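
One possible shape for the module types listed in the first task is a base class with two hooks, one for headers and one for body chunks. The sketch below is an assumption, not a decided design: `Module`, `AddHeader`, `Upper` and `run_pipeline` are hypothetical names, and it only shows the ordering constraint that the headers leave the pipeline before any body data.

```python
from typing import Dict, Iterable, Iterator, List

Headers = Dict[str, str]


class Module:
    """Base module: both hooks pass data through unchanged."""

    def on_headers(self, headers: Headers) -> Headers:
        return headers

    def on_body(self, chunk: bytes) -> bytes:
        return chunk


class AddHeader(Module):
    """Example of a module that alters the headers."""

    def __init__(self, name: str, value: str) -> None:
        self.name, self.value = name, value

    def on_headers(self, headers: Headers) -> Headers:
        return {**headers, self.name: self.value}


class Upper(Module):
    """Example of a module that changes the content."""

    def on_body(self, chunk: bytes) -> bytes:
        return chunk.upper()


def run_pipeline(modules: List[Module], headers: Headers,
                 body: Iterable[bytes]) -> Iterator[object]:
    """Yield the final headers first, then each transformed body chunk."""
    for module in modules:
        headers = module.on_headers(headers)
    yield headers                  # headers always go out first
    for chunk in body:
        for module in modules:
            chunk = module.on_body(chunk)
        yield chunk
```

A module that changes the body length would also have to adjust headers such as Content-Length before they are sent; this sketch does not handle that case.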

Sample config

.*.youtube.com:	pass
aabbcc.*		pass
abcxy.br		-> 192.168.1.2 // proxy
// Is it a good approach? Maybe using a module would be more interesting, but how can I integrate this into the config?
// Well, an NGX module can register its own tokens in the configuration…
.*			fail // reverse proxy
.*			accept // proxy or proxy cache

In the future, this config could also hold cache parameters (global and per rule), TTL, ...
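
To check that the sample config is easy to read mechanically, here is a small parser and matcher sketch. Several things are assumptions, since the notes do not fix them: the pattern is treated as a regular expression that must match the whole host name, the first matching rule wins, a trailing `:` on the pattern is ignored, and `->` is mapped to a hypothetical `forward` action. `Rule`, `parse_config` and `match` are illustrative names.

```python
import re
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    pattern: re.Pattern
    action: str            # "pass", "fail", "accept" or "forward"
    target: Optional[str]  # upstream address, only for "forward"


def parse_config(text: str) -> List[Rule]:
    """Parse lines of the form '<pattern> <action>' or '<pattern> -> <addr>'."""
    rules = []
    for line in text.splitlines():
        # Drop // comments. Note: this would also cut a target containing "//".
        line = line.split("//", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"rule without action: {line!r}")
        pattern, rest = parts[0].rstrip(":"), parts[1].strip()
        if rest.startswith("->"):
            action, target = "forward", rest[2:].strip()
        else:
            action, target = rest, None
        rules.append(Rule(re.compile(pattern), action, target))
    return rules


def match(rules: List[Rule], host: str) -> Optional[Rule]:
    """Return the first rule whose pattern matches the whole host name."""
    for rule in rules:
        if rule.pattern.fullmatch(host):
            return rule
    return None
```

With first-match semantics, only one of the two final `.*` rules in the sample would ever be active, which fits the idea that `fail` and `accept` are alternative defaults for a reverse proxy and a plain proxy.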

Proxy architecture

Components:

  • Config manager: Responsible for reading the configuration, generating the sources, and compiling and building the new binary;
  • Startup or Setup manager: Responsible for loading the plugins (or will they be built into the binary?) and for network setup (bind, listen);
  • Requests manager: Responsible for accepting requests, reading them, matching and validating them against the config, serving a hit (read) or a miss (fetch), sending data to the module pipeline, saving to storage, and sending downstream;
  • Permanent log manager;
  • Object manager: Responsible for keeping the object index in memory to provide a fast search (hit or miss?);
  • Storage manager: Responsible for saving and loading the objects. It is possible to have many storage managers, for example memory storage, HDD storage, SSD storage, an MMAP version, one file per object, one file with many objects, … In some cases each object lives in only one storage, but in other cases objects can be duplicated across storages (for fast retrieval);
  • Evict manager: Responsible for selecting the objects that will be removed;
  • Memory manager: Responsible for memory management, avoiding syscalls to the OS;
  • Statistics manager: Responsible for keeping object statistics such as hits, misses, fetches, object count, KB in/out per second, new objects, removed objects, and so on;
  • Core: Responsible for the workflow of the other components, signal handling, and error handling.
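
To see how the object, storage, evict and statistics managers might fit together, here is a deliberately small sketch. Everything in it is an assumption made for illustration: the class names `MemoryStorage` and `Cache`, the use of LRU as the eviction policy (the notes do not choose one), and a limit expressed as a number of objects rather than bytes.

```python
from collections import OrderedDict
from typing import Callable, Dict


class MemoryStorage:
    """Storage manager: saves and loads object bodies (here, in a dict)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def save(self, key: str, body: bytes) -> None:
        self._data[key] = body

    def load(self, key: str) -> bytes:
        return self._data[key]

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class Cache:
    """Object index + LRU eviction + hit/miss counters around one storage."""

    def __init__(self, storage: MemoryStorage, max_objects: int) -> None:
        self.storage = storage
        self.max_objects = max_objects
        self.index: "OrderedDict[str, int]" = OrderedDict()  # key -> size
        self.stats = {"hits": 0, "misses": 0, "evicted": 0}

    def get(self, key: str, fetch: Callable[[str], bytes]) -> bytes:
        if key in self.index:                # hit: answered from the index
            self.stats["hits"] += 1
            self.index.move_to_end(key)      # mark as most recently used
            return self.storage.load(key)
        self.stats["misses"] += 1            # miss: fetch from the upstream
        body = fetch(key)
        self.storage.save(key, body)
        self.index[key] = len(body)
        while len(self.index) > self.max_objects:
            victim, _ = self.index.popitem(last=False)  # least recently used
            self.storage.delete(victim)
            self.stats["evicted"] += 1
        return body
```

Keeping the index separate from the storage means the hit-or-miss decision never touches the storage backend, so a disk-based storage class with the same three methods could be swapped in without changing `Cache`.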

VERSION 1 (PROXY ONLY)

The goal of this version is to have a functional proxy system. Optimizations are out of scope.

Tasks:

  • Load configuration (Config manager)
    • Based on rules, without any kind of code generation
    • Each rule states whether the request is sent to the upstream or rejected
    • A section with common settings such as the port to bind, IP, ...
  • Bind to the network port (Startup manager)
  • Listen (Startup manager)
  • For each connection request (Requests manager)
    • Log request
    • Accept
    • Read request
    • Match the request against the configuration
    • Send 404 if appropriate
    • Connect to the upstream
    • While there is object data
      • Read object data (max size = 4KB buffer)
      • Send data to downstream
  • Logs (Permanent log manager)
  • Intercept signals (TERM, INT, …; KILL and STOP cannot be caught), handle errors, and coordinate the workflow of the components (Core)
  • Tests:
    • BDD
    • TDD
    • Performance tests
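
The per-connection part of version 1 (read request, match, 404 or connect, relay in 4KB reads) could look roughly like the sketch below. It is not a complete proxy: `handle`, `read_head` and `host_of` are hypothetical names, matching is assumed to be done on the Host header, the `connect` callback stands in for both the rule lookup and the upstream connection, and the upstream is assumed to close its side when the object ends.

```python
import socket
from typing import Callable, Optional

BUFFER_SIZE = 4096
NOT_FOUND = (b"HTTP/1.1 404 Not Found\r\n"
             b"Content-Length: 0\r\nConnection: close\r\n\r\n")


def read_head(sock: socket.socket) -> bytes:
    """Read from sock until the blank line that ends the request headers."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(BUFFER_SIZE)
        if not chunk:
            break
        data += chunk
    return data


def host_of(head: bytes) -> Optional[str]:
    """Extract the Host header value, or None if it is missing."""
    for line in head.split(b"\r\n")[1:]:   # skip the request line
        if not line:                       # blank line ends the headers
            break
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"host":
            return value.strip().decode("latin-1")
    return None


def handle(client: socket.socket,
           connect: Callable[[str], Optional[socket.socket]]) -> int:
    """Serve one accepted connection; return the bytes relayed downstream.

    connect(host) returns a connected upstream socket, or None to reject.
    """
    head = read_head(client)
    host = host_of(head)
    upstream = connect(host) if host else None
    if upstream is None:
        client.sendall(NOT_FOUND)
        return 0
    sent = 0
    with upstream:
        upstream.sendall(head)                  # forward the request as read
        while True:
            chunk = upstream.recv(BUFFER_SIZE)  # at most 4KB per read
            if not chunk:                       # upstream closed: object done
                break
            client.sendall(chunk)
            sent += len(chunk)
    return sent
```

Passing `connect` in as a callback keeps the network code testable with `socket.socketpair()`, which fits the goal of writing tests from the first line of code.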