<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Concepts — Digdag 0.10.5 documentation</title>
<script type="text/javascript" src="_static/js/modernizr.min.js"></script>
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<script type="text/javascript" src="_static/language_data.js"></script>
<script type="text/javascript" src="_static/js/theme.js"></script>
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/custom.css" type="text/css" />
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Workflow definition" href="workflow_definition.html" />
<link rel="prev" title="Architecture" href="architecture.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="index.html" class="icon icon-home"> Digdag
</a>
<div class="version">
0.10
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="getting_started.html">Getting started</a></li>
<li class="toctree-l1"><a class="reference internal" href="architecture.html">Architecture</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">Concepts</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#projects-and-revisions">Projects and revisions</a></li>
<li class="toctree-l2"><a class="reference internal" href="#sessions-and-attempts">Sessions and attempts</a></li>
<li class="toctree-l2"><a class="reference internal" href="#scheduled-execution-and-session-time">Scheduled execution and session_time</a></li>
<li class="toctree-l2"><a class="reference internal" href="#tasks">Tasks</a></li>
<li class="toctree-l2"><a class="reference internal" href="#export-and-store-parameters">Export and store parameters</a></li>
<li class="toctree-l2"><a class="reference internal" href="#operators-and-plugins">Operators and plugins</a></li>
<li class="toctree-l2"><a class="reference internal" href="#dynamic-task-generation-and-check-error-tasks">Dynamic task generation and _check/_error tasks</a></li>
<li class="toctree-l2"><a class="reference internal" href="#task-naming-and-resuming">Task naming and resuming</a></li>
<li class="toctree-l2"><a class="reference internal" href="#workspace">Workspace</a></li>
<li class="toctree-l2"><a class="reference internal" href="#next-steps">Next steps</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="workflow_definition.html">Workflow definition</a></li>
<li class="toctree-l1"><a class="reference internal" href="scheduling_workflow.html">Scheduling workflow</a></li>
<li class="toctree-l1"><a class="reference internal" href="operators.html">Operators</a></li>
<li class="toctree-l1"><a class="reference internal" href="command_reference.html">Command reference</a></li>
<li class="toctree-l1"><a class="reference internal" href="python_api.html">Language API - Python</a></li>
<li class="toctree-l1"><a class="reference internal" href="ruby_api.html">Language API - Ruby</a></li>
<li class="toctree-l1"><a class="reference internal" href="rest_api.html">REST API</a></li>
<li class="toctree-l1"><a class="reference internal" href="command_executor.html">Command Executor</a></li>
<li class="toctree-l1"><a class="reference internal" href="internal.html">Internal architecture</a></li>
<li class="toctree-l1"><a class="reference internal" href="metrics.html">Digdag metrics (experimental)</a></li>
<li class="toctree-l1"><a class="reference internal" href="community_contributions.html">Community Contributions</a></li>
<li class="toctree-l1"><a class="reference internal" href="releases.html">Release Notes</a></li>
<li class="toctree-l1"><a class="reference internal" href="logo.html">Logo</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">Digdag</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="index.html">Docs</a> »</li>
<li>Concepts</li>
<li class="wy-breadcrumbs-aside">
<a href="_sources/concepts.md.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<div class="section" id="concepts">
<h1>Concepts<a class="headerlink" href="#concepts" title="Permalink to this headline">¶</a></h1>
<div class="section" id="projects-and-revisions">
<h2>Projects and revisions<a class="headerlink" href="#projects-and-revisions" title="Permalink to this headline">¶</a></h2>
<p>In Digdag, workflows are packaged together with the other files they use. These files can be anything: SQL scripts, Python/Ruby/Shell scripts, configuration files, and so on. This set of workflow definitions and supporting files is called a project.</p>
<p>When a project is uploaded to a Digdag server, the server stores it as a new version and keeps the old versions. A version of a project is called a revision. When you run a workflow, Digdag uses the latest revision by default, but you can also use old revisions for the following purposes:</p>
<ul class="simple">
<li><p>Check the definition of a past workflow execution.</p></li>
<li><p>Run a workflow using an old revision to reproduce earlier results.</p></li>
<li><p>Revert to an old revision to fix problems accidentally introduced in the latest revision.</p></li>
</ul>
<p>A project can contain multiple workflows. However, you should create a new project if a new workflow is not related to the others, because all workflows in a project are updated together when you upload a new revision.</p>
</div>
<div class="section" id="sessions-and-attempts">
<h2>Sessions and attempts<a class="headerlink" href="#sessions-and-attempts" title="Permalink to this headline">¶</a></h2>
<p>A session is a plan to run a workflow which should complete successfully. An attempt is an actual execution of a session. A session has multiple attempts if you retry a failed workflow.</p>
<p>Sessions and attempts are separated because an execution may fail. When you list sessions, the expected state is that all sessions are green. If you find a failing session, you check its attempts and debug the problem from the logs. You may upload a new revision to fix the issue, then start a new attempt. Sessions let you easily confirm that all planned executions completed successfully.</p>
</div>
<div class="section" id="scheduled-execution-and-session-time">
<h2>Scheduled execution and session_time<a class="headerlink" href="#scheduled-execution-and-session-time" title="Permalink to this headline">¶</a></h2>
<p>A session has a timestamp called <code class="docutils literal notranslate"><span class="pre">session_time</span></code>. It identifies the time the workflow runs for. For example, if a workflow is scheduled every day, the time is usually 00:00:00 of a day, such as 2017-01-01 00:00:00. The actual execution time may differ. You may want to delay execution by 2 hours because some data needs 1 hour to be prepared. You may also run a workflow on the next day to backfill the previous day’s results. In both cases, the time the workflow runs for, 2017-01-01 00:00:00 in this example, is the <code class="docutils literal notranslate"><span class="pre">session_time</span></code>.</p>
<p><code class="docutils literal notranslate"><span class="pre">session_time</span></code> is unique in the history of a workflow. If you submit two sessions with the same <code class="docutils literal notranslate"><span class="pre">session_time</span></code>, the later request is rejected. This prevents accidentally submitting a session that has already run for the same time. If you need to run a workflow again for the same time, retry the past session instead of submitting a new one.</p>
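<p>As a minimal sketch of the difference between <code class="docutils literal notranslate"><span class="pre">session_time</span></code> and the actual execution time, the following workflow assumes a daily schedule that starts at 02:00:00, matching the 2-hour delay mentioned above. The task name <code class="docutils literal notranslate"><span class="pre">+show</span></code>, the timezone, and the message are hypothetical. Under these assumptions, <code class="docutils literal notranslate"><span class="pre">${session_time}</span></code> should refer to 00:00:00 of the scheduled day rather than to the time the task actually starts.</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><span class="c1"># Hypothetical workflow: runs daily at 02:00:00, for the day starting at 00:00:00</span>
<span class="nt">timezone</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">UTC</span>

<span class="nt">schedule</span><span class="p">:</span>
  <span class="nt">daily></span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">02:00:00</span>

<span class="nt">+show</span><span class="p">:</span>
  <span class="nt">echo></span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">running for ${session_time}</span>
</pre></div>
</div>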
</div>
<div class="section" id="tasks">
<h2>Tasks<a class="headerlink" href="#tasks" title="Permalink to this headline">¶</a></h2>
<p>When an attempt of a session starts, the workflow is transformed into a set of tasks. Tasks have dependencies on each other. For example, task +dump depends on +process1 and +process2, and tasks +process1 and +process2 depend on +prepare. Digdag understands these dependencies and runs the tasks in order.</p>
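<p>One way such a dependency structure could be written is sketched below. It assumes a hypothetical parent task <code class="docutils literal notranslate"><span class="pre">+process</span></code> that groups +process1 and +process2 and runs them in parallel with <code class="docutils literal notranslate"><span class="pre">_parallel:</span> <span class="pre">true</span></code>; the script names are placeholders. Tasks at the same level run in the order they are written, so +dump starts only after both children of +process have finished.</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><span class="nt">+prepare</span><span class="p">:</span>
  <span class="nt">sh></span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">./prepare.sh</span>

<span class="c1"># Hypothetical grouping task: its children run in parallel after +prepare</span>
<span class="nt">+process</span><span class="p">:</span>
  <span class="nt">_parallel</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">true</span>
  <span class="nt">+process1</span><span class="p">:</span>
    <span class="nt">sh></span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">./process1.sh</span>
  <span class="nt">+process2</span><span class="p">:</span>
    <span class="nt">sh></span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">./process2.sh</span>

<span class="nt">+dump</span><span class="p">:</span>
  <span class="nt">sh></span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">./dump.sh</span>
</pre></div>
</div>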
</div>
<div class="section" id="export-and-store-parameters">
<h2>Export and store parameters<a class="headerlink" href="#export-and-store-parameters" title="Permalink to this headline">¶</a></h2>
<p>There are 3 kinds of parameters for a task.</p>
<ul class="simple">
<li><p><strong>local</strong>: parameters directly set to the task</p></li>
<li><p><strong>export</strong>: parameters exported from parent tasks</p></li>
<li><p><strong>store</strong>: parameters stored by previous tasks</p></li>
</ul>
<p>They are merged into one object when a task runs. Local parameters have the highest priority. Export and store parameters override each other and thus parameters set at later tasks have higher priority.</p>
<p>Export parameters are used for a parent task to pass values to children. Store parameters are used for a task to pass values to all following tasks including children.</p>
<p>The influence of export parameters is limited compared to store parameters. This lets workflows be “modularized”. For example, suppose your workflow uses some scripts to process data. You may set some parameters for those scripts to control their behavior. On the other hand, you don’t want other scripts to be affected by those parameters (e.g. the data loading part shouldn’t be affected by any changes in the data processing part). In this case, you can put your scripts under a single parent task and let the parent task export the parameters.</p>
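<p>A minimal sketch of this pattern follows. The task names and the parameter <code class="docutils literal notranslate"><span class="pre">threshold</span></code> are hypothetical. The parent task +process exports the parameter with <code class="docutils literal notranslate"><span class="pre">_export</span></code>, so its children can use it, while the sibling task +load is outside the parent and is not affected.</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><span class="nt">+process</span><span class="p">:</span>
  <span class="c1"># Exported to the children of +process only</span>
  <span class="nt">_export</span><span class="p">:</span>
    <span class="nt">threshold</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">10</span>
  <span class="nt">+filter</span><span class="p">:</span>
    <span class="nt">echo></span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">filtering with threshold ${threshold}</span>
  <span class="nt">+report</span><span class="p">:</span>
    <span class="nt">echo></span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">reporting with threshold ${threshold}</span>

<span class="c1"># Not a child of +process, so the exported threshold is not passed here</span>
<span class="nt">+load</span><span class="p">:</span>
  <span class="nt">echo></span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">loading data</span>
</pre></div>
</div>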
<p>Store parameters are visible to all following tasks, but not to previous tasks. For example, suppose you ran a workflow and retried it. In this case, parameters stored by a task won’t be visible to previous tasks even if the task finished successfully in the last execution.</p>
<p>Store parameters are not global variables. When two tasks run in parallel, they use different store parameters. This makes the workflow behavior consistent regardless of actual execution timing. For example, when another task runs depending on the two parallel tasks, the parameter stored by the last task in order of task submission is used.</p>
</div>
<div class="section" id="operators-and-plugins">
<h2>Operators and plugins<a class="headerlink" href="#operators-and-plugins" title="Permalink to this headline">¶</a></h2>
<p>Operators are the executors of tasks. Operators are set in a workflow definition as <code class="docutils literal notranslate"><span class="pre">sh></span></code>, <code class="docutils literal notranslate"><span class="pre">pg></span></code>, etc. When a task runs, Digdag picks one operator, merges all parameters (local, export, and store parameters), then gives the merged parameters to the operator.</p>
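<p>As an illustrative sketch, the workflow below passes the same parameter to two different operators. The parameter <code class="docutils literal notranslate"><span class="pre">greeting</span></code> and the task names are hypothetical. Each task names exactly one operator (<code class="docutils literal notranslate"><span class="pre">echo></span></code> or <code class="docutils literal notranslate"><span class="pre">sh></span></code>), and the exported value is part of the merged parameters that operator receives.</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><span class="c1"># Hypothetical parameter exported to every task in this workflow</span>
<span class="nt">_export</span><span class="p">:</span>
  <span class="nt">greeting</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">hello</span>

<span class="nt">+say</span><span class="p">:</span>
  <span class="nt">echo></span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">${greeting} from the echo operator</span>

<span class="nt">+run</span><span class="p">:</span>
  <span class="nt">sh></span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">echo "${greeting} from the sh operator"</span>
</pre></div>
</div>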
<p>An operator can be thought of as a package for a common workload. With operators, you can do more with fewer scripts.</p>
<p>Operators are designed to be plugins (although this is not fully implemented yet). You install operators to simplify your workflow, and you create an operator so that other workflows can reuse it. Digdag itself is meant to be a simple platform for running many operators.</p>
</div>
<div class="section" id="dynamic-task-generation-and-check-error-tasks">
<h2>Dynamic task generation and _check/_error tasks<a class="headerlink" href="#dynamic-task-generation-and-check-error-tasks" title="Permalink to this headline">¶</a></h2>
<p>Digdag transforms a workflow into a set of tasks with dependencies. This graph of tasks is called a DAG (Directed Acyclic Graph). A DAG is well suited to executing tasks in dependency order, from the first task to the last. However, it can’t represent loops. Representing <code class="docutils literal notranslate"><span class="pre">if</span></code> branches is also not straightforward.</p>
<p>But loops and branches are useful. To solve this issue, Digdag dynamically appends tasks to an executing DAG. In the following example, Digdag generates 3 tasks to represent a loop: <code class="docutils literal notranslate"><span class="pre">+example^sub+loop-0</span></code>, <code class="docutils literal notranslate"><span class="pre">+example^sub+loop-1</span></code>, and <code class="docutils literal notranslate"><span class="pre">+example^sub+loop-2</span></code> (the name of a dynamically generated task contains <code class="docutils literal notranslate"><span class="pre">^sub</span></code>):</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><span class="nt">+example</span><span class="p">:</span>
  <span class="nt">loop></span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">3</span>
  <span class="nt">_do</span><span class="p">:</span>
    <span class="nt">echo></span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">this is ${i}th loop</span>
</pre></div>
</div>
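<p>A branch can be sketched in a similar way with the <code class="docutils literal notranslate"><span class="pre">if></span></code> operator. The parameter <code class="docutils literal notranslate"><span class="pre">run_extra</span></code> and the task name <code class="docutils literal notranslate"><span class="pre">+maybe</span></code> are hypothetical. The subtask under <code class="docutils literal notranslate"><span class="pre">_do</span></code> is expected to run only when the condition evaluates to true.</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><span class="c1"># Hypothetical flag that controls the branch</span>
<span class="nt">_export</span><span class="p">:</span>
  <span class="nt">run_extra</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">true</span>

<span class="nt">+maybe</span><span class="p">:</span>
  <span class="nt">if></span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">${run_extra}</span>
  <span class="nt">_do</span><span class="p">:</span>
    <span class="nt">echo></span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">running the extra step</span>
</pre></div>
</div>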
<p>The <code class="docutils literal notranslate"><span class="pre">_check</span></code> and <code class="docutils literal notranslate"><span class="pre">_error</span></code> options use dynamic task generation. Digdag uses these options to run another task only when the task succeeds or fails.</p>
<p>A <code class="docutils literal notranslate"><span class="pre">_check</span></code> task is generated after successful completion of a task. This is especially useful when you want to validate the results of a task before starting the next tasks.</p>
<p>An <code class="docutils literal notranslate"><span class="pre">_error</span></code> task is generated after failure of a task. This is useful for notifying external systems that a task failed.</p>
<p>The following example outputs <code class="docutils literal notranslate"><span class="pre">success</span></code> when the task succeeds and <code class="docutils literal notranslate"><span class="pre">fail</span></code> when it fails.</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><span class="nt">+example</span><span class="p">:</span>
  <span class="nt">sh></span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">your_script.sh</span>
  <span class="nt">_check</span><span class="p">:</span>
    <span class="nt">+succeed</span><span class="p">:</span>
      <span class="nt">echo></span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">success</span>
  <span class="nt">_error</span><span class="p">:</span>
    <span class="nt">+failed</span><span class="p">:</span>
      <span class="nt">echo></span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">fail</span>
</pre></div>
</div>
</div>
<div class="section" id="task-naming-and-resuming">
<h2>Task naming and resuming<a class="headerlink" href="#task-naming-and-resuming" title="Permalink to this headline">¶</a></h2>
<p>A task has a unique name within an attempt. When you retry an attempt, this name is used to match tasks in the last attempt.</p>
<p>Child tasks have the parent task’s name as a prefix. The workflow name is also prefixed as the root task. In the following example, the task names will be <code class="docutils literal notranslate"><span class="pre">+my_workflow+load+from_mysql+tables</span></code>, <code class="docutils literal notranslate"><span class="pre">+my_workflow+load+from_postgres</span></code>, and <code class="docutils literal notranslate"><span class="pre">+my_workflow+dump</span></code>.</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><span class="c1"># my_workflow.dig</span>
<span class="nt">+load</span><span class="p">:</span>
  <span class="nt">+from_mysql</span><span class="p">:</span>
    <span class="nt">+tables</span><span class="p">:</span>
      <span class="l l-Scalar l-Scalar-Plain">...</span>
  <span class="nt">+from_postgres</span><span class="p">:</span>
    <span class="l l-Scalar l-Scalar-Plain">...</span>
<span class="nt">+dump</span><span class="p">:</span>
  <span class="l l-Scalar l-Scalar-Plain">...</span>
</pre></div>
</div>
</div>
<div class="section" id="workspace">
<h2>Workspace<a class="headerlink" href="#workspace" title="Permalink to this headline">¶</a></h2>
<p>A workspace is the directory where a task runs. Digdag extracts files from a project archive into this directory, changes to that directory, and executes the task (note: local-mode execution does nothing to create a workspace because the current working directory is assumed to be the workspace).</p>
<p>Plugins do not allow access to parent directories of the workspace, because the Digdag server runs in a shared environment. A project should be self-contained so that it doesn’t depend on external environments. Scripting operators (e.g. the sh> operator) are an exception. It’s recommended to run scripts using the <code class="docutils literal notranslate"><span class="pre">docker:</span></code> option.</p>
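<p>A minimal sketch of the <code class="docutils literal notranslate"><span class="pre">docker:</span></code> option is shown below. The image name, task name, and script path are placeholders chosen for illustration. Here the option is exported at the top level so that it applies to every task in the workflow.</p>
<div class="highlight-yaml notranslate"><div class="highlight"><pre><span></span><span class="c1"># Hypothetical image: scripts run inside this container</span>
<span class="nt">_export</span><span class="p">:</span>
  <span class="nt">docker</span><span class="p">:</span>
    <span class="nt">image</span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">ubuntu:latest</span>

<span class="nt">+step</span><span class="p">:</span>
  <span class="nt">sh></span><span class="p">:</span> <span class="l l-Scalar l-Scalar-Plain">./run.sh</span>
</pre></div>
</div>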
</div>
<div class="section" id="next-steps">
<h2>Next steps<a class="headerlink" href="#next-steps" title="Permalink to this headline">¶</a></h2>
<ul class="simple">
<li><p><a class="reference external" href="internal.html">Internal architecture</a></p></li>
<li><p><a class="reference external" href="command_reference.html">Command reference</a></p></li>
</ul>
</div>
</div>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="workflow_definition.html" class="btn btn-neutral float-right" title="Workflow definition" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
<a href="architecture.html" class="btn btn-neutral float-left" title="Architecture" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<p>
© Copyright 2016-2024, Digdag Project
</p>
</div>
Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/rtfd/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
<br/>
<br/>
<p><a href="https://digdag.io/">Digdag</a> is an open source project, invented and sponsored by <a href="https://www.treasuredata.com/">Treasure Data, Inc.</a> under the Apache 2.0 Licence.</p>
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>