<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Smile - FAQ</title>
<meta name="description" content="Statistical Machine Intelligence and Learning Engine">
<!-- prettify js and CSS -->
<script src="https://cdn.rawgit.com/google/code-prettify/master/loader/run_prettify.js?lang=scala&lang=kotlin&lang=clj"></script>
<style>
.prettyprint ol.linenums > li { list-style-type: decimal; }
</style>
<!-- Bootstrap core CSS -->
<link href="css/cerulean.min.css" rel="stylesheet">
<link href="css/custom.css" rel="stylesheet">
<script src="https://code.jquery.com/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js"></script>
<!-- slider -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/owl-carousel/1.3.3/owl.carousel.min.js"></script>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/owl-carousel/1.3.3/owl.carousel.css" type="text/css" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/owl-carousel/1.3.3/owl.transitions.css" type="text/css" />
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/owl-carousel/1.3.3/owl.theme.min.css" type="text/css" />
<!-- table of contents auto generator -->
<script src="js/toc.js" type="text/javascript"></script>
<!-- styles for pager and table of contents -->
<link rel="stylesheet" href="css/pager.css" type="text/css" />
<link rel="stylesheet" href="css/toc.css" type="text/css" />
<!-- Vega-Lite Embed -->
<script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
<script src="https://cdn.jsdelivr.net/npm/vega-lite@5"></script>
<script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-57GD08QCML"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-57GD08QCML');
</script>
<!-- Sidebar and testimonial-slider -->
<script type="text/javascript">
$(document).ready(function(){
// scroll/follow sidebar
// #sidebar is defined in the content snippet
// This script has to be executed after the snippet loaded.
// $.getScript("js/follow-sidebar.js");
$("#testimonial-slider").owlCarousel({
items: 1,
singleItem: true,
pagination: true,
navigation: false,
loop: true,
autoPlay: 10000,
stopOnHover: true,
transitionStyle: "backSlide",
touchDrag: true
});
});
</script>
</head>
<body>
<div class="container" style="max-width: 1200px;">
<header>
<div class="masthead">
<p class="lead">
<a href="index.html">
<img src="images/smile.jpg" style="height:100px; width:auto; vertical-align: bottom; margin-top: 20px; margin-right: 20px;">
<span class="tagline">Smile — Statistical Machine Intelligence and Learning Engine</span>
</a>
</p>
</div>
<nav class="navbar navbar-default" role="navigation">
<!-- Brand and toggle get grouped for better mobile display -->
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#navbar-collapse">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
</div>
<!-- Collect the nav links, forms, and other content for toggling -->
<div class="collapse navbar-collapse" id="navbar-collapse">
<ul class="nav navbar-nav">
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Overview <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="quickstart.html">Quick Start</a></li>
<li><a href="overview.html">What's Machine Learning</a></li>
<li><a href="data.html">Data Processing</a></li>
<li><a href="visualization.html">Data Visualization</a></li>
<li><a href="vegalite.html">Declarative Visualization</a></li>
<li><a href="gallery.html">Gallery</a></li>
<li><a href="faq.html">FAQ</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Supervised Learning <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="classification.html">Classification</a></li>
<li><a href="regression.html">Regression</a></li>
<li><a href="deep-learning.html">Deep Learning</a></li>
<li><a href="feature.html">Feature Engineering</a></li>
<li><a href="validation.html">Model Validation</a></li>
<li><a href="missing-value-imputation.html">Missing Value Imputation</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Unsupervised Learning <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="clustering.html">Clustering</a></li>
<li><a href="vector-quantization.html">Vector Quantization</a></li>
<li><a href="association-rule.html">Association Rule Mining</a></li>
<li><a href="mds.html">Multi-Dimensional Scaling</a></li>
<li><a href="manifold.html">Manifold Learning</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">LLM & NLP <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="llm.html">Large Language Model (LLM)</a></li>
<li><a href="nlp.html">Natural Language Processing (NLP)</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Math <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="linear-algebra.html">Linear Algebra</a></li>
<li><a href="statistics.html">Statistics</a></li>
<li><a href="wavelet.html">Wavelet</a></li>
<li><a href="interpolation.html">Interpolation</a></li>
<li><a href="graph.html">Graph Data Structure</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">API <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="api/java/index.html" target="_blank">Java</a></li>
<li><a href="api/scala/index.html" target="_blank">Scala</a></li>
<li><a href="api/kotlin/index.html" target="_blank">Kotlin</a></li>
<li><a href="api/clojure/index.html" target="_blank">Clojure</a></li>
<li><a href="api/json/index.html" target="_blank">JSON</a></li>
</ul>
</li>
<li><a href="https://mybinder.org/v2/gh/haifengl/smile/notebook?urlpath=lab%2Ftree%2Fshell%2Fsrc%2Funiversal%2Fnotebooks%2Findex.ipynb" target="_blank">Try It Online</a></li>
</ul>
</div>
<!-- /.navbar-collapse -->
</nav>
</header>
<div id="content" class="row">
<div class="col-md-3 col-md-push-9 hidden-xs hidden-sm">
<div id="sidebar">
<div class="sidebar-toc" style="margin-bottom: 20px;">
<p class="toc-header">Contents</p>
<div id="toc"></div>
</div>
<div id="search">
<script>
(function() {
var cx = '010264411143030149390:ajvee_ckdzs';
var gcse = document.createElement('script');
gcse.type = 'text/javascript';
gcse.async = true;
gcse.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') +
'//cse.google.com/cse.js?cx=' + cx;
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(gcse, s);
})();
</script>
<gcse:searchbox-only></gcse:searchbox-only>
</div>
</div>
</div>
<div class="col-md-9 col-md-pull-3">
<h1 id="faq-top" class="title">FAQ</h1>
<h2 class="question" id="cite">How should I cite Smile?</h2>
<p class="answer">Please cite Smile in your publications if it helps your research. Here is an example BibTeX entry:</p>
<pre><code>
@misc{Li2014Smile,
title={Smile},
author={Haifeng Li},
year={2014},
howpublished={\url{https://haifengl.github.io}},
}
</code></pre>
<h2 class="question" id="link-with-smile">Link with Smile</h2>
<p class="answer">
Smile artifacts are hosted in <a href="https://oss.sonatype.org/#nexus-search;quick~smile-core">Sonatype Nexus</a>.
Add the following dependency to your build file:</p>
<ul class="nav nav-tabs">
<li class="active"><a href="#maven_1" data-toggle="tab">Maven</a></li>
<li><a href="#gradle_1" data-toggle="tab">Gradle (Kotlin)</a></li>
<li><a href="#sbt_1" data-toggle="tab">SBT</a></li>
</ul>
<div class="tab-content">
<div class="tab-pane active" id="maven_1">
<div class="code" style="text-align: left;">
<pre class="prettyprint"><code>
&lt;dependency&gt;
    &lt;groupId&gt;com.github.haifengl&lt;/groupId&gt;
    &lt;artifactId&gt;smile-core&lt;/artifactId&gt;
    &lt;version&gt;4.1.0&lt;/version&gt;
&lt;/dependency&gt;
</code></pre>
</div>
</div>
<div class="tab-pane" id="gradle_1">
<div class="code" style="text-align: left;">
<pre class="prettyprint"><code>
implementation("com.github.haifengl:smile-core:4.1.0")
</code></pre>
</div>
</div>
<div class="tab-pane" id="sbt_1">
<div class="code" style="text-align: left;">
<pre class="prettyprint"><code>
libraryDependencies += "com.github.haifengl" % "smile-core" % "4.1.0"
</code></pre>
</div>
</div>
</div>
<p>To leverage GPU and deep learning, add</p>
<ul class="nav nav-tabs">
<li class="active"><a href="#maven_2" data-toggle="tab">Maven</a></li>
<li><a href="#gradle_2" data-toggle="tab">Gradle (Kotlin)</a></li>
<li><a href="#sbt_2" data-toggle="tab">SBT</a></li>
</ul>
<div class="tab-content">
<div class="tab-pane active" id="maven_2">
<div class="code" style="text-align: left;">
<pre class="prettyprint"><code>
&lt;dependency&gt;
    &lt;groupId&gt;com.github.haifengl&lt;/groupId&gt;
    &lt;artifactId&gt;smile-deep&lt;/artifactId&gt;
    &lt;version&gt;4.1.0&lt;/version&gt;
&lt;/dependency&gt;
</code></pre>
</div>
</div>
<div class="tab-pane" id="gradle_2">
<div class="code" style="text-align: left;">
<pre class="prettyprint"><code>
implementation("com.github.haifengl:smile-deep:4.1.0")
</code></pre>
</div>
</div>
<div class="tab-pane" id="sbt_2">
<div class="code" style="text-align: left;">
<pre class="prettyprint"><code>
libraryDependencies += "com.github.haifengl" % "smile-deep" % "4.1.0"
</code></pre>
</div>
</div>
</div>
<p>For the Scala API, add</p>
<div class="code" style="text-align: left;">
<pre class="prettyprint"><code>
libraryDependencies += "com.github.haifengl" %% "smile-scala" % "4.1.0"
</code></pre>
</div>
<p>Some algorithms rely on BLAS and LAPACK (e.g. manifold learning,
some clustering algorithms, Gaussian Process regression, MLP, etc.).
By default, Smile includes OpenBLAS for macOS, Windows, and Linux:</p>
<div class="code" style="text-align: left;">
<pre class="prettyprint"><code>
libraryDependencies ++= Seq(
"org.bytedeco" % "javacpp" % "1.5.11" classifier "macosx-arm64" classifier "macosx-x86_64" classifier "windows-x86_64" classifier "linux-x86_64",
"org.bytedeco" % "openblas" % "0.3.28-1.5.11" classifier "macosx-arm64" classifier "macosx-x86_64" classifier "windows-x86_64" classifier "linux-x86_64",
"org.bytedeco" % "arpack-ng" % "3.9.1-1.5.11" classifier "macosx-x86_64" classifier "windows-x86_64" classifier "linux-x86_64" classifier ""
)
</code></pre>
</div>
<p>For mobile platforms, you may need to add <code>classifier "android-arm64"</code>
or <code>classifier "ios-arm64"</code>. In general, you should include only
the platforms you need to save space.</p>
<p>If you prefer another BLAS implementation, you can use any library found
on <code>java.library.path</code> or on the class path by specifying it with
the <code>org.bytedeco.openblas.load</code> system property. For example, to use
the BLAS library from the Accelerate framework on macOS, pass
options such as <code>-Djava.library.path=/usr/lib/ -Dorg.bytedeco.openblas.load=blas</code>.</p>
<p>For a default installation of MKL, the option would be <code>-Dorg.bytedeco.openblas.load=mkl_rt</code>.
Alternatively, you may add the following dependencies to your project, which
include the MKL binaries. Smile will automatically switch to MKL.</p>
<ul class="nav nav-tabs">
<li class="active"><a href="#maven_3" data-toggle="tab">Maven</a></li>
<li><a href="#gradle_3" data-toggle="tab">Gradle (Kotlin)</a></li>
<li><a href="#sbt_3" data-toggle="tab">SBT</a></li>
</ul>
<div class="tab-content">
<div class="tab-pane active" id="maven_3">
<div class="code" style="text-align: left;">
<pre class="prettyprint"><code>
&lt;dependency&gt;
    &lt;groupId&gt;org.bytedeco&lt;/groupId&gt;
    &lt;artifactId&gt;mkl-platform&lt;/artifactId&gt;
    &lt;version&gt;2024.0-1.5.10&lt;/version&gt;
&lt;/dependency&gt;
&lt;dependency&gt;
    &lt;groupId&gt;org.bytedeco&lt;/groupId&gt;
    &lt;artifactId&gt;mkl-platform-redist&lt;/artifactId&gt;
    &lt;version&gt;2024.0-1.5.10&lt;/version&gt;
&lt;/dependency&gt;
</code></pre>
</div>
</div>
<div class="tab-pane" id="gradle_3">
<div class="code" style="text-align: left;">
<pre class="prettyprint"><code>
implementation("org.bytedeco:mkl-platform:2024.0-1.5.10")
implementation("org.bytedeco:mkl-platform-redist:2024.0-1.5.10")
</code></pre>
</div>
</div>
<div class="tab-pane" id="sbt_3">
<div class="code" style="text-align: left;">
<pre class="prettyprint"><code>
libraryDependencies ++= {
val version = "2024.0-1.5.10"
Seq(
"org.bytedeco" % "mkl-platform" % version,
"org.bytedeco" % "mkl-platform-redist" % version
)
}
</code></pre>
</div>
</div>
</div>
<h2 class="question" id="model-serialization">Model serialization</h2>
<p class="answer">To serialize a model, you may use</p>
<ul class="nav nav-tabs">
<li class="active"><a href="#java_3" data-toggle="tab">Java</a></li>
<li><a href="#scala_3" data-toggle="tab">Scala</a></li>
</ul>
<div class="tab-content">
<div class="tab-pane" id="scala_3">
<div class="code" style="text-align: left;">
<pre class="prettyprint lang-scala"><code>
import smile._
write(model, file)
</code></pre>
</div>
</div>
<div class="tab-pane active" id="java_3">
<div class="code" style="text-align: left;">
<pre class="prettyprint lang-java"><code>
import smile.io.Write;
Write.object(model, file);
</code></pre>
</div>
</div>
</div>
<p>This method serializes the model in Java serialization format. This is handy
if you want to use a model in Spark.</p>
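<p>Because the output is in the standard Java serialization format, a saved model can presumably be read back with the plain <code>java.io</code> object streams. The following is a minimal sketch of such a round trip that uses only the JDK. <code>ToyModel</code>, <code>save</code>, and <code>load</code> are hypothetical names introduced for illustration, and the sketch assumes (without confirming it here) that <code>Write.object</code> relies on <code>ObjectOutputStream</code>.</p>

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class SerializationRoundTrip {
    /** Hypothetical stand-in for a trained model; any Serializable object behaves the same way. */
    static class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[] weights;

        ToyModel(double[] weights) {
            this.weights = weights;
        }

        /** Returns the dot product of the weights and the input. */
        double predict(double[] x) {
            double sum = 0.0;
            for (int i = 0; i < weights.length; i++) {
                sum += weights[i] * x[i];
            }
            return sum;
        }
    }

    /** Writes an object in the standard Java serialization format. */
    static void save(Serializable model, Path file) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the object back; the caller casts it to the expected type. */
    static Object load(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Model class is not on the class path", e);
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("toy-model", ".ser");
        save(new ToyModel(new double[] {0.5, 2.0}), file);
        ToyModel restored = (ToyModel) load(file);
        System.out.println(restored.predict(new double[] {2.0, 1.0})); // prints 3.0
        Files.delete(file);
    }
}
```

<p>Note that Java deserialization requires the model's class to be on the class path of the reading process, which is why the Smile jars must also be available wherever the model is loaded.</p>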
<h2 class="question" id="data-format">Data Format</h2>
<p class="answer">
Most Smile algorithms take plain <code>double[]</code> arrays or a <code>DataFrame</code> as input.
A <code>DataFrame</code> can also be easily constructed from a <code>double[][]</code>.
So you can use your favorite method
or library to import the data as long as the samples end up in double arrays. Smile also provides
several parsers in <code>smile.io</code> for popular data formats, such as CSV, Weka's ARFF files,
LibSVM's file format, delimited text files, SAS, Parquet, Avro, Arrow, JSON, and binary sparse data.
</p>
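<p>As an illustration of importing data "by hand", the sketch below parses delimited numeric text into a <code>double[][]</code> using only the JDK. The class and method names (<code>DelimitedToArray</code>, <code>parse</code>) are hypothetical and not part of Smile. For real files, the parsers in <code>smile.io</code> are likely the better choice.</p>

```java
import java.util.Arrays;
import java.util.List;

class DelimitedToArray {
    /** Parses delimited numeric lines into a row-major double[][]; blank lines are skipped. */
    static double[][] parse(List<String> lines, String delimiterRegex) {
        return lines.stream()
                .filter(line -> !line.isBlank())
                .map(line -> Arrays.stream(line.trim().split(delimiterRegex))
                        .mapToDouble(Double::parseDouble)
                        .toArray())
                .toArray(double[][]::new);
    }

    public static void main(String[] args) {
        // Two samples with three tab-separated features each.
        List<String> lines = List.of("1.0\t2.5\t3.0", "4.0\t5.5\t6.0");
        double[][] x = parse(lines, "\t");
        System.out.println(x.length + " rows, " + x[0].length + " columns"); // prints "2 rows, 3 columns"
    }
}
```

<p>The resulting array holds one sample per row, which is the layout described above. It can then be passed to an algorithm directly or wrapped in a <code>DataFrame</code>. Check the API documentation of your Smile version for the exact factory method.</p>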
<h2 class="question" id="pom">Cannot build Smile with Maven</h2>
<p class="answer">
We have moved to SBT to build packages. The Maven pom.xml files were deprecated and then removed in v1.2.0.
</p>
<h2 class="question" id="headless-plot">Headless Plot</h2>
<p class="answer">
If your environment does not have a display, or you need to generate and save
a lot of plots without showing them on the screen, you may run Smile in headless mode.
</p>
<ul class="nav nav-tabs">
<li class="active"><a href="#java_1" data-toggle="tab">Java</a></li>
<li><a href="#scala_1" data-toggle="tab">Scala</a></li>
</ul>
<div class="tab-content">
<div class="tab-pane" id="scala_1">
<div class="code" style="text-align: left;">
<pre class="prettyprint lang-sh"><code>
bin/smile -Djava.awt.headless=true
</code></pre>
</div>
</div>
<div class="tab-pane active" id="java_1">
<div class="code" style="text-align: left;">
<pre class="prettyprint lang-sh"><code>
bin/jshell.sh -R-Djava.awt.headless=true
</code></pre>
</div>
</div>
</div>
<p>The following example shows how to save a plot in headless mode.</p>
<ul class="nav nav-tabs">
<li class="active"><a href="#java_2" data-toggle="tab">Java</a></li>
<li><a href="#scala_2" data-toggle="tab">Scala</a></li>
</ul>
<div class="tab-content">
<div class="tab-pane" id="scala_2">
<div class="code" style="text-align: left;">
<pre class="prettyprint lang-scala"><code>
val toy = read.csv("data/classification/toy200.txt", delimiter="\t", header=false)
val canvas = plot(toy, "V2", "V3", "V1", '.')
val image = canvas.toBufferedImage(400, 400)
javax.imageio.ImageIO.write(image, "png", new java.io.File("headless.png"))
</code></pre>
</div>
</div>
<div class="tab-pane active" id="java_2">
<div class="code" style="text-align: left;">
<pre class="prettyprint lang-java"><code>
import java.awt.Color;
import smile.io.*;
import smile.plot.swing.*;
import org.apache.commons.csv.CSVFormat;
var toy = Read.csv("data/classification/toy200.txt", CSVFormat.DEFAULT.withDelimiter('\t'));
var canvas = ScatterPlot.of(toy, "V2", "V3", "V1", '.').canvas();
var image = canvas.toBufferedImage(400, 400);
javax.imageio.ImageIO.write(image, "png", new java.io.File("headless.png"));
</code></pre>
</div>
</div>
</div>
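<p>In a standalone application, the same flag can also be set programmatically, provided this happens before AWT is first used. The sketch below uses only JDK classes to show that off-screen rendering to a <code>BufferedImage</code> and saving it as a PNG works without a display. <code>HeadlessRender</code> and <code>renderTo</code> are hypothetical names, and the drawing code merely stands in for a real plot.</p>

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.GraphicsEnvironment;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

class HeadlessRender {
    /** Draws a diagonal line off-screen and saves it as a PNG; returns true if a PNG writer handled it. */
    static boolean renderTo(File file) {
        BufferedImage image = new BufferedImage(400, 400, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, 400, 400);
            g.setColor(Color.BLUE);
            g.drawLine(0, 0, 399, 399);
        } finally {
            g.dispose(); // release the graphics context
        }
        try {
            return ImageIO.write(image, "png", file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Has the same effect as -Djava.awt.headless=true if set before AWT is first used.
        System.setProperty("java.awt.headless", "true");
        System.out.println("headless: " + GraphicsEnvironment.isHeadless());
        System.out.println("written: " + renderTo(new File("headless-sketch.png")));
    }
}
```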
<h2 class="question" id="seed">How can I set the random number generator seed for random forest?</h2>
<p class="answer">
This is a common question for stochastic algorithms such as random forest.
In general, setting the seed is discouraged because it is easy to choose a poor seed
without sufficient knowledge of random number generation. However, you may want repeatable
results for testing purposes. In that case, call <code>smile.math.MathEx.setSeed</code> before
training the model.
</p>
<p>
Note that we don't provide a method to set the seed for a particular algorithm.
Many algorithms are multithreaded, and each thread has its own random number
generator. We chose this design because each random number generator maintains
an internal state and is therefore not thread-safe. If multiple threads shared a
random number generator, we would have to use locks, which would significantly reduce
performance.
</p>
<p>
A <code>setSeed()</code> method on the algorithm would also be troublesome.
For algorithms like random forest, it is not right to initialize every thread
with the same seed. Otherwise, identical decision trees would be created,
and we would lose the randomness of the "random" forest. It is also complicated
to pass a sequence of random numbers because it is not clear
how many random number generators many algorithms need. Even worse, it would break
encapsulation, as the caller would have to know the details of the algorithm.
</p>
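<p>The problem with seeding every thread identically can be shown with the JDK's <code>java.util.Random</code> alone. The sketch below is not Smile's internal implementation. <code>PerThreadSeeds</code>, <code>sample</code>, and <code>deriveSeeds</code> are hypothetical names used to illustrate two points: generators with the same seed produce the same draws, and deriving one seed per worker from a single master seed keeps a run repeatable while giving each worker its own stream.</p>

```java
import java.util.Arrays;
import java.util.Random;

class PerThreadSeeds {
    /** Draws n bootstrap-style indices in [0, bound) from a generator with the given seed. */
    static int[] sample(long seed, int n, int bound) {
        Random rng = new Random(seed);
        int[] indices = new int[n];
        for (int i = 0; i < n; i++) {
            indices[i] = rng.nextInt(bound);
        }
        return indices;
    }

    /** Derives one seed per worker from a single master seed, so the whole run is repeatable. */
    static long[] deriveSeeds(long masterSeed, int workers) {
        Random master = new Random(masterSeed);
        long[] seeds = new long[workers];
        for (int i = 0; i < workers; i++) {
            seeds[i] = master.nextLong();
        }
        return seeds;
    }

    public static void main(String[] args) {
        // Two workers with the same seed draw exactly the same sample,
        // so they would grow the same tree from the same bootstrap data.
        System.out.println(Arrays.equals(sample(42L, 10, 100), sample(42L, 10, 100))); // prints true

        // Workers seeded from a master generator get their own streams,
        // and rerunning with the same master seed reproduces every one of them.
        long[] seeds = deriveSeeds(42L, 2);
        System.out.println(Arrays.toString(sample(seeds[0], 10, 100)));
        System.out.println(Arrays.toString(sample(seeds[1], 10, 100)));
    }
}
```

<p>Each worker here owns its generator, so no locking is needed. This mirrors the design rationale described above.</p>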
</div>
<script type="text/javascript">
$('#toc').toc({exclude: 'h1, h5, h6', context: '', autoId: true, numerate: false});
</script>
</div>
</div>
<a href="https://github.com/haifengl/smile"><img style="position: fixed; top: 0; right: 0; border: 0" src="/images/forkme_right_orange.png" alt="Fork me on GitHub"></a>
<!-- Place this tag right after the last button or just before your close body tag. -->
<script async defer id="github-bjs" src="https://buttons.github.io/buttons.js"></script>
</body>
</html>