-module(retriever).
-export([open/2]).
%A very high level abstraction that lets you query this custom database I have written and
%retrieve the result of the query.
%open has two arguments. The first is a Directory string that specifies the directory that this
%database is stored within. It is expected that in Directory there will be indexed directory files
%for each possible permutation of a record. The files will be named according to a format specified
%in createDb.erl but with the string "_indexed" inserted before the "." of the filename.
%This process minimizes the number of operations necessary by always looking for the record(s)
%within its collection that will satisfy a query in the minimum number of steps. It has a few
%strategies that allow it to do this:
%1) It contains N! files, where N is the number of (non-indexed) attributes that belong to the
%records in this database, so each file represents one sorted permutation of the records. Knowing
%this, the process will attempt to read from a file that has the records searched for in
%contiguous order. Ties are broken by always picking the file that contains the smallest number
%of records.
%2) Each record contains an index. The index tells us how many records match what we are looking
%for, which lets us zoom in quickly on the record we want. Once a record with all the correct
%attributes is found, the index tells us exactly which records should be read in order to get
%the entire collection.
%3) It keeps its own static sprinkling of records for each file in the database always in memory.
%These records are a breadth first drop into each file. The nature of how these records are
%chosen guarantees a safe sprinkling of records across many unique buckets of records. When
%determining where it should start looking for a match in a file, it first consults these
%memory resident records to see whether any of them can help it zoom in on its desired record
%faster.
%Finally, each Database contains its own queue of records of size StackSize. Any record that is
%needed will first be searched for in here.
%So when a call is made, it first checks its stack to determine whether the record exists. It then
%attempts to zoom in as closely as possible to the location of the record based upon the "index
%sprinkles". If an index sprinkle happens to contain the record, it simply returns that.
%Finally, it searches the correct file for the record it is looking for, zooming quickly down
%using the index information in each record it reads.
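%A minimal sketch of strategy 1 above, not the actual implementation. It assumes (hypothetically)
%that a file is identified by a tuple of attribute positions giving its sort order, and that the
%records matching a query are contiguous in a file when the queried attributes come first in
%that order. sketch_permutations/1 and sketch_candidate_orders/2 are illustrative names only,
%and neither is exported.
sketch_permutations([]) -> [[]];
sketch_permutations(L) -> [[H | T] || H <- L, T <- sketch_permutations(L -- [H])].

%Returns every sort order over N attributes whose first positions are exactly the Queried ones.
%For example, sketch_candidate_orders(3, [2]) gives [{2,1,3},{2,3,1}].
sketch_candidate_orders(N, Queried) ->
    Wanted = lists:sort(Queried),
    K = length(Queried),
    [list_to_tuple(P) || P <- sketch_permutations(lists:seq(1, N)),
                         lists:sort(lists:sublist(P, K)) =:= Wanted].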
%GetFile is a fun that takes a tuple of integers and returns either {error, no_file_found} or a
%myIO (or one of its descendants) object that points to an io stream that matches the specified tuple.
%open(fun(), tuple())
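%The body of open is not given in this file; this placeholder only lets the module compile.
open(_GetFile, _Format) -> erlang:error(not_implemented).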
%You can only send {pid(), string()} and record() to this process.
%The pid is the process id that the response should be sent to; the string is the query string.
%To end, you should send the record 'close' to this process.
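%A minimal client-side sketch of the protocol above. It assumes that Db is the pid of this
%process and that the reply (described under "Query returns" below) arrives as a bare message;
%neither is confirmed by this file. sketch_query/3 and Timeout are illustrative only.
sketch_query(Db, QueryString, Timeout) ->
    Db ! {self(), QueryString},
    receive
        {error, Message} -> {error, Message};
        Result when is_function(Result, 1) -> {ok, Result}
    after Timeout ->
        {error, timeout}
    end.
%When finished with the database, a client would send: Db ! close.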
%The query string:
%The query string allows you to specify in English what you are looking for.
%The smallest thing you can write is a comparison operator (=, <, <=, >, >=, !=). Comparison
%operators are infix operators. The thing to the left of one must be an attribute name, and the
%thing to the right of it must be a value. You can link comparisons by specifying "and" or "or"
%between them. The same attribute can be specified twice, for instance you can specify
%"id = 6 and id != 6". No logical checks are done to "reduce" these logical statements, so be a
%little careful in specifying them.
%Query returns:
%The query will return a fun() or {error, Message}. If an error tuple is received, then consult
%Message. Otherwise the fun takes one argument, which can be one of two things. The first is the
%record 'size', which returns the total number of records contained in this return. The second is
%a (base 1) index, which retrieves the record at that index in this return value. The reason a fun
%is used rather than a list is that sometimes the number of records returned may be quite large.
%If this is the case then the fun will only contain so many records, and when you ask for a record
%that it does not contain, it will retrieve the next batch of them behind the scenes and return
%them. However, you should attempt to use this fun as you would a list (that is to say, don't use
%anything except the head of a list).
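%A minimal sketch of consuming a query result as described above, assuming the fun accepts
%either size or a base 1 index. sketch_result_to_list/1 is an illustrative name only. It reads
%the indexes in ascending order so that any batches are fetched front to back, and it holds
%every record in memory at once, so it only suits small results.
sketch_result_to_list(Result) ->
    Size = Result(size),
    [Result(I) || I <- lists:seq(1, Size)].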