-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] arithmetic combination (addition, multiplication, etc) of sequences #47
Conversation
Related: https://github.com/sktime/sktime/blob/main/sktime/dists_kernels/algebra.py - in case you are interested, there are some work items around symbolic combinations of distances. |
One question might be how you would like to handle the Should that not be a dependency, given that your scope seems to involve sequence calculations as well, rather than just pre-defined sequences? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This module should not be in sequence.core ? Or it should be not a script in sequence.core? Perhaps arithmetic_operations.py or just operations.py?
Because the classes are helping to create new sequence but are not sequence their-self.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, they are both, aren't they?
Instances of this class will be full sequences, first-class citizens in the interface. In sklearn
, sktime
, etc, such classes sit in "compose"-like modules, where they can be used by the user.
Whereas, core/base, are usually base classes that users won't see, and implementers wlil inherit from but not interface.
Of course, your package, your rules.
sequence/sequences/compose/arith.py
Outdated
import numpy as np | ||
|
||
|
||
class Arith(Sequence): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps ArithmeticOperations
or IntegerSequenceOperations
?
In fact, the method is_finite
is correct just for integer sequences.
For example, take the two non-integer sequences (-1)^n
and (-1)^(n+1)
. Both are infinite, but their sum is finite, as it is the constant sequence 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, the method
is_finite
is correct just for integer sequences.
Do we have the same definition of "finite"? Would appreciate if you could link the one you are using.
I thought this property meant that the sequence is of finite length. "finite sequence" afaik is a standing term with this meaning. So, the constant sequence with a countably infinite number of terms, according to this definition, would not be finite, but infinite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re name, I prefer short ones, otherwise object string identifiers become long and unreadable quickly... how about ArithOp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have the same definition of "finite"? Would appreciate if you could link the one you are using.
I thought this property meant that the sequence is of finite length. "finite sequence" afaik is a standing term with this meaning. So, the constant
I see your point, and I understand the misunderstanding between us. Perhaps, I expressed myself bad. Let re-explain the situation:
Suppose we have two sequences a_n and b_n, and we want to add them. We have 3 cases:
- both are infinite
- one is finite and the other is infinite
- both are finite
In the first case, no problem.
In the second case, how should be defined the sum of a finite sequence with an infinite one?
in the third case we have two subclasses:
- if the two sequences have the same length -> no problem
- if the two sequences have different length -> how it is defined the addition?
Possible solutions:
- you can just add infinite sequences or sequences with the same length
- we consider the finite sequence as the sub-vector space of all sequence which have compact support in the vector space of all sequences ( REF)
Both solution have pro and contro.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the best option is to take the minimum of lengths, for the terms where the combination is defined.
(that operation is also associative)
sequence/sequences/compose/arith.py
Outdated
---------- | ||
sequences : List[Sequence] | ||
List of sequences to combine. | ||
operation : None, str, function, or numpy ufunc, optional, default = None = mean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is default = None = mean
? Actually operation
as no default value, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that is an oversight. Also, perhaps "addition" is a more sensible default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or no default? As the class should not be used directly by the final user or?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not? A custom operation
can be provided. Of course dunders are more convenient for most use cases.
sequence/sequences/compose/arith.py
Outdated
sequences : List[Sequence] | ||
List of sequences to combine. | ||
operation : None, str, function, or numpy ufunc, optional, default = None = mean | ||
if str, must be one of "mean", "+" (add), "*" (multiply), "max", "min" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you take the vector space of all integer sequences, you have that it is close by taking addition, multiplication, min and max (actually is an abstract Riesz space). But it is not close by taking mean
. If we restrict the class just to consider integer sequence, this can be a problem.
Moreover, what do you mean for mean
? Is the pointiwise mean? i.e., if a_n
and b_n
are two sequences, then the mean(a_n, b_n)
is the sequence (a_n + b_n )/ 2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, pointwise mean. I copied this coding without much thinking from sktime
distances, where mean is a sensible default.
There is of course also a question about scope here: are you interested only in integer sequences? Seems like an unnecessary restriction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is of course also a question about scope here: are you interested only in integer sequences? Seems like an unnecessary restriction.
Actually I'm interested in all kind of sequences but I started with the integer ones which are already very interesting and in some sense easy to work with as you know their image.
The main ideas was to code as much as possible sequences from https://oeis.org
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main ideas was to code as much as possible sequences from https://oeis.org/
Yes, I get that ofc.
How about covering all of them with a parametric object OEIS(id)
, via https://oeis.org/wiki/JSON_Format,_Compressed_Files ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
symbolic resolution would also be neat, e.g., asking, is OEIS(xyz) + OEIS(abc) = OEIS(def)
, as integer series
(I would like that for skpro
distributions)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually a good idea.
I have to admit that, I see the philosophy behind but I don't see how to implement it :-)
From this PR are coming so many good point of inspiration. I'm thinking to open a GitHub-project (or using the one already done) to clarify every point and decide how to proceed :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, absolutely!
How about meeting on discord (e.g., one of the sktime channels), and discussing a mini-roadmap for sequentium, and the "sets" project?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes... I will contact you... good idea
sequence/sequences/compose/arith.py
Outdated
return Arith(sequences=[*self.sequences, *other.sequences], operation="+") | ||
|
||
def __mul__(self, other): | ||
from sequence.sequences.compose.arith import Arith |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you importing Arith
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy-paste accident, from the base class (where this is indeed needed)
sequence/sequences/compose/arith.py
Outdated
self._op = self._resolve_operation(operation) | ||
|
||
def __add__(self, other): | ||
if not isinstance(other, Arith) or self._op != other._op: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we have self._op != other._op
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line is about dealing with the situation where you have Arith([a, b, c], sym) + Arith([d, e, f], sym2)
.
If sym == sym2
, then you'd want this to become Arith([a,b,c,d,e,f], sym)
rather than Arith(Arith([a,b,c],sym), Arith([d,e,f],sym))
, because all sym
here are associative. Well, except the mean, but that's an oversight which we probably need to fix in sktime
as well.
The "or" is for checking that indeed sym==sym2
. We could do so by the string, but the same operation has multiple string aliases ("+"
or "add"
), so we check _op
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah... good point...
But will the final user be able to access this class? Because if the answer is no, we don't have to have different alias for the same operation, not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of it as user facing, potentially. Even if not, it makes sense to minimize the size of the parse tree.
def _as_generator(self): | ||
op = self._op | ||
while True: | ||
yield op([next(sequence) for sequence in self.sequences]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is gonna to fail if you have the sum of a finite sequence with an infinite one, or not?
and how it is defined the product of two sequences, one finite and the other infinite?
In general, we say that a sequence a_n
is finite if there is N > 0
such that a_n = 0
for every n > N
. Therefore, the product of a finite sequence with an infinite one should be finite. Is this respected in this implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, I see.
I didn't fully understand your model. I thought finite sequence indeed had finite length, I would expect tee generator to be finite too, in this case.
Given this explanation, I now have a strong opinion that that is how it should be, i.e., next should not give 0, but a generator-end.
Because otherwise, you cannot symbolically distinguish infinite sequences that have 0s, and finite sequencese; and, you cannot distinguish finite sequences with trailing 0s, from the same sequence but shorter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I explain myself bad. I hope my explanation above is better :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which explanation, is there a new one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about that: #47 (comment)
In fact, I was confusing what should be a finite sequence: a sequence with finite length or one which is eventually always 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, what should our definition of finite be for the package?
(I would prefer agreeing with the one on wikipedia - finite length, rather than "becomes constant zero")
lists = [sequence._as_list(stop, start, step) for sequence in self.sequences] | ||
return [op(z) for z in zip(*lists)] | ||
|
||
def _at(self, index: int) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the return type is wrong. if you take the mean of the constant sequence 1 with itself, then the resulting sequence is the constant sequence 1/2. Therefore, if you apply the method _at, the return type will be 1/2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you take the mean of the constant sequence 1 with itself, then the resulting sequence is the constant sequence 1/2.
I don't see that, and I must be stupid for not seeing that.
Can you please carry out the computation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here's my computation:
we consider the pointwise mean, so each value in the constant sequence ArithOp([one, one], "mean")
will have the value that is the mean of two ones.
Substituting definitions for the mean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha sorry :-) I'm the stupid one...
You are right. Let's find a better example:
Suppose a_n is the constant sequence = 1 and b_n is the constant sequence = 2. Then 1/2*(1+2) = 3/2
But this is always the question if we should restrict ourself to integer sequences or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any scope explosion from allowing real sequences too (in terms of software complexity) - so, why not?
I agree of course that the mean is not a good choice for a default, or an option at all.
For instance, I was just giving a set of finite sequence already coded that the final users could use. But it was an idea to create a method which the final user can use to create its own sequence. Something like, instantiate_sequence(type="explicit", formula="...") and it returns a I didn't want to use external libraries for this project, but Numpy seems really interesting to solve the problem of arithmetic operation between sequences. I will probably add it |
Hey @fkiraly We have some thing to discuss about, but it seems really interesting how it is going ;-) |
The two dependencies I'd find plausible for this project are |
Yes exact. I think it is possible to rewrite your class using the math operation of the math module of Python. As you mentioned, for a small sets of sequences we should not see a difference but for big ones yes |
ok! Could you list the changes you'd like me to make? We've discussed a lot but should align on the end state of this PR. |
? |
Yes, sorry. With work I don't have so much time. I will take the time to list the changes this weekend for sure |
no worries, take your time. Just wanted to make sure this didn't get buried in an inbox or similar. |
Summary:
In general, I'm not happy about the CI/CD of the package. Until I work alone it is working fine, but just because I know perfectly how it is made. After your changed are released (in version Finally, thank you very much for the contribution, the effort and, overall, the ideas that this PR created :-) |
Addressed changes:
I removed numpy and operations that leave the integers. However, I've added other interesting associative operations that stay in the integers - gcd, lcm.
Ok, also added a length dunder.
No need to close for that. You can change the target with "edit" (top of GitHub GUI)
My preference is merge to main, and manual release trigger. GitHub allows creating releases manually, and GHA allows to set these as release action trigger, see
I think that isn't good practice - should the tests not run still even if in
Sure - you know you can also have tihs auto-generated from the commits? If you (a) squash PRs, and (b) in settings, ensure the commit name defaults to PR title, and commit text to PR first post, the GitHub release interface creates a very readable changelog for you. |
I would recommend to open an issue to discuss end stage of CI and testing framework? I would suggest sth like "collect all sequences and run the same set of interface tests on all of them". An option is doing this with skbase (minimal work for me), but that's not the only way ofc. |
Fixes #21, implements a class
Arith
which allows to create symbolic arithmetic combinations of sequences. Evaluation methods carry out the eager computation.Also implements dunders
__add__
,__mul__
in the base class.Does not implement
__len__
, or other sub-class specific methods for now - my intuition would be to make some of the subclasses property tags instead (using scikit-base, see #45).I'm also a bit puzzled about the test framework, so I did not add any tests, but I reckon the class has to be added to a config somewhere. Advice appreciated.