Commit

Merge pull request #1181 from automl/master
Synchronize dev and master again
mfeurer authored Jul 27, 2021
2 parents dbc7170 + 6f1e5c3 commit 611cf5c
Showing 46 changed files with 44,176 additions and 13,006 deletions.
29 changes: 29 additions & 0 deletions .github/ISSUE_TEMPLATE/question.md
@@ -0,0 +1,29 @@
---
name: Question
about: Ask a question!
title: "[Question] My Question?"
labels: ''
assignees: ''

---

# Short Question Description
A clear, single-sentence question we can try to help with.

With some extra context to follow it up. This way the question is clear for both you and us without it being lost in the paragraph.
Some useful information to help us with your question:
* How did this question come about?
* Would a small code snippet help?
* What have you already looked at?

Before you ask, please have a look at the
* [Documentation](https://automl.github.io/auto-sklearn/master/manual.html)
    * If it's related but not clear, please include it in your question with a link; we'll try to make it better!
* [Examples](https://automl.github.io/auto-sklearn/master/examples/index.html)
    * Likewise, an example can answer many questions! We can't cover every question with examples, but if you think your question would benefit from one, let us know!
* [Issues](https://github.com/automl/auto-sklearn/issues?q=label%3Aquestion+)
    * We try to label all questions with the label `Question`, so maybe someone has already asked. If the question is about a feature, try searching more of the issues. If you find something related that doesn't directly answer your question, please link to it with #(issue number)!

# System Details (if relevant)
* Which version of `auto-sklearn` are you using?
* Are you running this on Linux / Mac / ... ?
6 changes: 4 additions & 2 deletions .github/workflows/stale.yaml
@@ -9,7 +9,7 @@ jobs:
steps:
- uses: actions/stale@v3
with:
days-before-stale: 60
days-before-stale: 30
days-before-close: 7
stale-issue-message: >
This issue has been automatically marked as stale because it has not had
@@ -18,5 +18,7 @@ jobs:
close-issue-message: >
This issue has been automatically closed due to inactivity.
stale-issue-label: 'stale'
only-issue-labels: 'Answered,Feedback-Required,invalid,wontfix'
# Only issues with ANY of these labels are checked.
# Separate multiple labels with commas (eg. "incomplete,waiting-feedback").
any-of-labels: 'Answered,Feedback-Required,invalid,wontfix'
exempt-all-milestones: true
31 changes: 0 additions & 31 deletions COPYING

This file was deleted.

43 changes: 24 additions & 19 deletions LICENSE.txt
@@ -1,24 +1,29 @@
Copyright (c) 2014, Matthias Feurer
BSD 3-Clause License

Copyright (c) 2014-2021, AutoML Freiburg
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the <organization> nor the
names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL <COPYRIGHT HOLDER> BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
3 changes: 2 additions & 1 deletion MANIFEST.in
@@ -4,5 +4,6 @@ recursive-include autosklearn/metalearning/files *.txt
include autosklearn/util/logging.yaml
include requirements.txt
include autosklearn/requirements.txt
recursive-include autosklearn/experimental/askl2_portfolios *.json
recursive-include autosklearn/experimental/ *.json
include autosklearn/experimental/askl2_training_data.json
include LICENSE.txt
2 changes: 1 addition & 1 deletion autosklearn/__version__.py
@@ -1,4 +1,4 @@
"""Version information."""

# The following line *must* be the last in the module, exactly as formatted:
-__version__ = "0.12.6"
+__version__ = "0.12.7"
18 changes: 17 additions & 1 deletion autosklearn/automl.py
@@ -885,9 +885,10 @@ def subsample_if_too_large(
task: int,
):
if memory_limit and isinstance(X, np.ndarray):

if X.dtype == np.float32:
multiplier = 4
elif X.dtype in (np.float64, np.float):
elif X.dtype in (np.float64, float):
multiplier = 8
elif (
# In spite of the names, np.float96 and np.float128
@@ -903,6 +904,21 @@ def subsample_if_too_large(
multiplier = 8
logger.warning('Unknown dtype for X: %s, assuming it takes 8 bytes/number',
str(X.dtype))

megabytes = X.shape[0] * X.shape[1] * multiplier / 1024 / 1024
if memory_limit <= megabytes * 10 and X.dtype != np.float32:
cast_to = {
8: np.float32,
16: np.float64,
}.get(multiplier, np.float32)
logger.warning(
'Dataset too large for memory limit %dMB, reducing the precision from %s to %s',
memory_limit,
X.dtype,
cast_to,
)
X = X.astype(cast_to)

megabytes = X.shape[0] * X.shape[1] * multiplier / 1024 / 1024
if memory_limit <= megabytes * 10:
new_num_samples = int(
Expand Down
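The size check in the `subsample_if_too_large` hunk above can be sketched in isolation. This is a minimal illustration and not the library's code: the names `BYTES_PER_VALUE`, `CAST_TO`, `estimate_megabytes` and `reduced_dtype` are hypothetical, and dtypes are represented as plain strings so the sketch runs without NumPy. It keeps the rule from the diff that data counts as too large when ten times its estimated size reaches `memory_limit`.

```python
# Bytes per value for a few float widths. The string keys are illustrative
# labels for this sketch, not NumPy dtype objects.
BYTES_PER_VALUE = {"float32": 4, "float64": 8, "float128": 16}

# One step down in precision, mirroring the cast_to mapping in the hunk.
CAST_TO = {"float64": "float32", "float128": "float64"}


def estimate_megabytes(n_rows: int, n_cols: int, dtype: str) -> float:
    """Size of a dense n_rows x n_cols array, in units of 1024 * 1024 bytes."""
    return n_rows * n_cols * BYTES_PER_VALUE[dtype] / 1024 / 1024


def reduced_dtype(n_rows: int, n_cols: int, dtype: str, memory_limit: int) -> str:
    """Return the dtype to keep after the precision check."""
    if dtype == "float32":
        # Already the smallest width handled here; nothing to reduce.
        return dtype
    if memory_limit <= estimate_megabytes(n_rows, n_cols, dtype) * 10:
        # Too large: fall back one precision step (float32 if unknown).
        return CAST_TO.get(dtype, "float32")
    return dtype
```

For a 1024 x 1024 `float64` array (8 MB) and a limit of 80, `reduced_dtype` returns `"float32"`, and re-estimating with that dtype gives 4 MB. The sketch derives the byte width from the new dtype before re-estimating; in the hunk above, `multiplier` is not reassigned between the cast and the second `megabytes` computation.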
115 changes: 68 additions & 47 deletions autosklearn/experimental/askl2.py
@@ -15,53 +15,60 @@
import autosklearn
from autosklearn.classification import AutoSklearnClassifier
import autosklearn.experimental.selector
from autosklearn.metrics import Scorer
from autosklearn.metrics import Scorer, balanced_accuracy, roc_auc, log_loss, accuracy

metrics = (balanced_accuracy, roc_auc, log_loss)
selector_files = {}
this_directory = pathlib.Path(__file__).resolve().parent
training_data_file = this_directory / 'askl2_training_data.json'
with open(training_data_file) as fh:
training_data = json.load(fh)
fh.seek(0)
m = hashlib.md5()
m.update(fh.read().encode('utf8'))
training_data_hash = m.hexdigest()[:10]
selector_filename = "askl2_selector_%s_%s_%s.pkl" % (
autosklearn.__version__,
sklearn.__version__,
training_data_hash
)
selector_directory = os.environ.get('XDG_CACHE_HOME')
if selector_directory is None:
selector_directory = pathlib.Path.home()
selector_directory = pathlib.Path(selector_directory).joinpath('auto-sklearn').expanduser()
selector_file = selector_directory / selector_filename
metafeatures = pd.DataFrame(training_data['metafeatures'])
y_values = np.array(training_data['y_values'])
strategies = training_data['strategies']
minima_for_methods = training_data['minima_for_methods']
maxima_for_methods = training_data['maxima_for_methods']
if not selector_file.exists():
selector = autosklearn.experimental.selector.OneVSOneSelector(
configuration=training_data['configuration'],
default_strategy_idx=strategies.index('RF_SH-eta4-i_holdout_iterative_es_if'),
rng=1,
for metric in metrics:
training_data_file = this_directory / metric.name / 'askl2_training_data.json'
with open(training_data_file) as fh:
training_data = json.load(fh)
fh.seek(0)
m = hashlib.md5()
m.update(fh.read().encode('utf8'))
training_data_hash = m.hexdigest()[:10]
selector_filename = "askl2_selector_%s_%s_%s_%s.pkl" % (
autosklearn.__version__,
sklearn.__version__,
metric.name,
training_data_hash
)
selector.fit(
X=metafeatures,
y=y_values,
methods=strategies,
minima=minima_for_methods,
maxima=maxima_for_methods,
)
selector_file.parent.mkdir(exist_ok=True, parents=True)
try:
with open(selector_file, 'wb') as fh:
pickle.dump(selector, fh)
except Exception as e:
print("AutoSklearn2Classifier needs to create a selector file under "
"the user's home directory or XDG_CACHE_HOME. Nevertheless "
"the path {} is not writable.".format(selector_file))
raise e
selector_directory = os.environ.get('XDG_CACHE_HOME')
if selector_directory is None:
selector_directory = pathlib.Path.home()
selector_directory = pathlib.Path(selector_directory).joinpath('auto-sklearn').expanduser()
selector_files[metric.name] = selector_directory / selector_filename
metafeatures = pd.DataFrame(training_data['metafeatures'])
strategies = training_data['strategies']
y_values = pd.DataFrame(training_data['y_values'], columns=strategies, index=metafeatures.index)
minima_for_methods = training_data['minima_for_methods']
maxima_for_methods = training_data['maxima_for_methods']
default_strategies = training_data['tie_break_order']
if not selector_files[metric.name].exists():
selector = autosklearn.experimental.selector.OVORF(
configuration=training_data['configuration'],
random_state=np.random.RandomState(1),
n_estimators=500,
tie_break_order=default_strategies,
)
selector = autosklearn.experimental.selector.FallbackWrapper(selector, default_strategies)
selector.fit(
X=metafeatures,
y=y_values,
minima=minima_for_methods,
maxima=maxima_for_methods,
)
selector_files[metric.name].parent.mkdir(exist_ok=True, parents=True)

try:
with open(selector_files[metric.name], 'wb') as fh:
pickle.dump(selector, fh)
except Exception as e:
print("AutoSklearn2Classifier needs to create a selector file under "
"the user's home directory or XDG_CACHE_HOME. Nevertheless "
"the path {} is not writable.".format(selector_files[metric.name]))
raise e
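The per-metric cache naming in the loop above can be sketched with the standard library alone. The helper names (`content_hash`, `cache_path`, `selector_filename`) and the `environ` parameter are assumptions made for this illustration; only the format string, the ten-character MD5 prefix, and the `XDG_CACHE_HOME` fallback to the home directory are taken from the diff.

```python
import hashlib
import os
import pathlib
from typing import Mapping, Optional


def content_hash(text: str, length: int = 10) -> str:
    """Short MD5 prefix of the training data, used to invalidate stale caches."""
    return hashlib.md5(text.encode("utf8")).hexdigest()[:length]


def selector_filename(
    package_version: str, sklearn_version: str, metric_name: str, training_text: str
) -> str:
    """File name that changes whenever a version, the metric, or the data changes."""
    return "askl2_selector_%s_%s_%s_%s.pkl" % (
        package_version,
        sklearn_version,
        metric_name,
        content_hash(training_text),
    )


def cache_path(
    filename: str, environ: Optional[Mapping[str, str]] = None
) -> pathlib.Path:
    """Resolve the cache location: XDG_CACHE_HOME if set, else the home directory."""
    env = os.environ if environ is None else environ
    base = env.get("XDG_CACHE_HOME")
    if base is None:
        base = pathlib.Path.home()
    return pathlib.Path(base).joinpath("auto-sklearn").expanduser() / filename
```

Because the metric name is part of the file name, each of the three metrics gets its own pickled selector, and a change to any training data file yields a new hash, so an outdated selector is not reused.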


class SmacObjectCallback:
@@ -286,7 +293,7 @@ def __init__(
Attributes
----------
cv_results\_ : dict of numpy (masked) ndarrays
cv_results_ : dict of numpy (masked) ndarrays
A dict with keys as column headers and values as columns, that can be
imported into a pandas ``DataFrame``.
@@ -334,10 +341,22 @@ def fit(self, X, y,
feat_type=None,
dataset_name=None):

if self.metric is None:
if len(y.shape) == 1 or y.shape[1] == 1:
self.metric = accuracy
else:
self.metric = log_loss

if self.metric in metrics:
metric_name = self.metric.name
selector_file = selector_files[metric_name]
else:
metric_name = 'balanced_accuracy'
selector_file = selector_files[metric_name]
with open(selector_file, 'rb') as fh:
selector = pickle.load(fh)

metafeatures = np.array([len(np.unique(y)), X.shape[1], X.shape[0]])
metafeatures = pd.DataFrame({dataset_name: [X.shape[1], X.shape[0]]}).transpose()
selection = np.argmax(selector.predict(metafeatures))
automl_policy = strategies[selection]
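The metric handling added to `fit` above can be traced with a small sketch. It works on metric names rather than `Scorer` objects, which is a simplification; `default_metric_name`, `resolve_metric_name` and `portfolio_relative_path` are hypothetical helpers, not functions in auto-sklearn. The sketch also shows how the resolved name scopes the portfolio path further down in the same method.

```python
import pathlib
from typing import Sequence, Tuple

# Metrics for which a selector is trained, per the module-level tuple in the diff.
SUPPORTED_METRICS = ("balanced_accuracy", "roc_auc", "log_loss")
FALLBACK_METRIC = "balanced_accuracy"


def default_metric_name(y_shape: Tuple[int, ...]) -> str:
    """Default when no metric is given: accuracy for a single target column."""
    if len(y_shape) == 1 or y_shape[1] == 1:
        return "accuracy"
    return "log_loss"


def resolve_metric_name(
    metric_name: str, supported: Sequence[str] = SUPPORTED_METRICS
) -> str:
    """Pick the selector to load; unsupported metrics use the fallback."""
    return metric_name if metric_name in supported else FALLBACK_METRIC


def portfolio_relative_path(metric_name: str, policy: str) -> pathlib.PurePosixPath:
    """Portfolio location relative to the package directory."""
    return pathlib.PurePosixPath(metric_name) / "askl2_portfolios" / ("%s.json" % policy)
```

One consequence worth noting: with a one-dimensional `y` and no metric set, the default is `accuracy`, which is not among the three supported metrics, so the `balanced_accuracy` selector and portfolios are used.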

@@ -388,7 +407,9 @@ def fit(self, X, y,
else:
resampling_strategy_kwargs = None

portfolio_file = this_directory / 'askl2_portfolios' / ('%s.json' % automl_policy)
portfolio_file = (
this_directory / metric_name / 'askl2_portfolios' / ('%s.json' % automl_policy)
)
with open(portfolio_file) as fh:
portfolio_json = json.load(fh)
portfolio = portfolio_json['portfolio']