2014-12-06 Ronan Keryell <[email protected]>
* Par4All 1.4.6
* Integrate latest PIPS version for recent Linux versions
* There are still issues with headers of modern GNU libc
2014-02-03 Ronan Keryell <[email protected]>
* Par4All 1.4.5
* Integrate ASTRAD support for the SIMILAN collaborative research project
* Add support to build and publish Par4All for Ubuntu in a
VirtualBox virtual machine
* Integrate WWW site of the project in the repository itself
* Move the WWW site from WordPress to GitHub Pages based on ReST format
via Sphinx
2014-01-14 Ronan Keryell <[email protected]>
* Par4All 1.4.4
* Update the include recovery for gcc-4.8 and recent Debian & Ubuntu
* Improve documentation
* Improve compilation documentation for OpenSuse
* Fix const-ness issues in Par4All Accel back-end
* Update to latest PIPS version
2012-11-29 Ronan Keryell <[email protected]>
* Par4All 1.4.3
* XML output of some memory access information for OpenGPU Project.
New --spear-xml option taking as input an XML description of
functions to transform and generate XML descriptors
* Improve outlining
* Fix documentation
* Improve compilation documentation for OpenSuse
* Fix post-processing in the OpenCL back-end for Thales TRT in SMECY
project
2012-06-22 Ronan Keryell <[email protected]>
* Par4All 1.4.2
* Change Stars-PM demo Makefile to avoid nvcc choking on recent FFTW3 headers
* Fix normalization for Chirp code from SMECY project
* Make different privatization phases easier to select
* Improved complexity analysis for SMECY project (cluster 3)
* Added a phase limit_parallelism_using_complexity to
automatically skip the parallelization of loops that are not
compute-intensive enough
* Add a git-svn-switch script to ease upstream SVN URL changes
* Fix scalarization to avoid scalarizing a local variable used in an
array subscript
* New taskify phase to begin producing StarPU tasks for MediaGPU project
* Take into account CUDA 3.0 and 3.5 architectures
* Take into account the new name of HPC Project: SILKAN
* Add an OpenCL target to the benchmarks for time measurements in Mehdi
Amini's PhD thesis
* Choose OpenCL workgroup allocation instead of the vendor's
2012-05-17 Ronan Keryell <[email protected]>
* Par4All 1.4.1
* Update package dependencies for Ubuntu 12.04 and Debian/testing
& /unstable.
* Make Par4All relocatable after compile-time (.deb package and .tar.gz)
* Improve the coding rules guide
* Update dependent package list for Ubuntu 12.04 and to be able to run
Stars-PM demo
* Do not add "register" qualifier in scalarization phase
* The license of the PIPS C3 Linear library changes from GPLv3 to LGPL
2012-05-04 Ronan Keryell <[email protected]>
* Par4All 1.4
* Outline kernel callees with a prefix to avoid conflicts
* New kernel generation algorithm for p4a: no longer inline callees in
kernels but call device functions instead
* Apply some callees unfolding after array linearization for non-C99
VLA devices
* nvcc compilation chain can use C++ compiler specified by user
* Deal with OpenCL pointer in kernel callees
* Add another loop fusion algorithm based on array regions
* Fix --select-module and other module filtering to cope with the new
kernel generation algorithm
* Improve documentation
* Liveness analysis
* Generated variables are now prefixed p4a_ instead of P4A_ reserved
for the runtime itself
* More pointer values, effects and regions interaction
* New PHPiPS interface to ease cloud interfacing
* Loop nests with some variable declaration with initializations are
not perfect loop nests
* New --kernel-unroll option to unroll loops inside kernels
* Display OpenCL kernel compilation messages in the case of errors
or in debug mode
* Improve memory effects and regions analysis in the case of casts
* New privatization phase dealing even with global variables based on
liveness analysis
* New pointer values
* Improvement of the coding rules guide
* Improve packaging scripts
* Improve integration script
* Add OS distribution in error report email
2012-02-09 Ronan Keryell <[email protected]>
* Par4All 1.3.2
* Improved redundancy elimination in Linear
* Add some missing intrinsics for Fortran
* Do not generate loops for degenerated iteration domains
* Fix effects computation
* Improve for- to do-loop translation
* Do not privatize global variable in simple privatization
* New experimental -pointer-analysis option to take into account pointer
analysis
* Fix interface with PoCC
* Add a --pocc-options option to p4a
* Improve transformers computation
* Improve semantics analysis in the case of unsigned loop index
* Fix bug in timing method in Accel runtime
* Check for Accel runtime non-initialization
* Test if a CUDA device is available in CUDA mode for Accel runtime
* Add a lot of information in CUDA mode for Accel runtime
* Improve documentation
* Control simplification deals with do-while loops
* Fix bugs in loop-fusion
* Now clean_up_sequence can deal more gracefully with variable scoping
* Improve sorting in Linear to have more reproducible behaviour
* Add detection of 1-trip while-loops
* Fix bugs in isolate_statement
* Improve outliner in the case of a new compilation unit
* Add isnan C intrinsic
* Handle already existing OpenMP pragmas by overriding them with the
newly generated ones
* Fix sharing in the case of cast in a function call
* --cuda-cc now accepts 2.1 version
* Add support for AMD fp64 OpenCL extension in Accel runtime, if available
* Add debug kernel call information in OpenCL Accel runtime
* p4a can now accept Fortran files with .F extension
2012-01-18 Serge Guelton <[email protected]>
* Par4All 1.3.1
* Check for CUDA memory allocation errors
* Restore PIPS properties after a PyPS exception
* Better manage pointers to 1D arrays on GPU
* Fix loop bound generation in the presence of unsigned index types
* Fix p4a --report
* Fix regressions in examples because of PIPS validation
compatibility.
* Stars-PM example can work in OpenCL and OpenGL
* New --no-pointer-aliasing option in p4a
* --fine option is now --fine-grain. Coarse grain is again the
default parallelization.
* Remove parasitic kernel launchers from GPU code with static
functions
* Put libraries at the end of link options since any other position
chokes some linkers.
* Fix bugs in loop fusion
* Fix bugs in PIPS linear around overflow handling
* Improve array linearization to deal with pointers to arrays of
structs, VLA, arrays with static size and others, skip local arrays...
* Fix bugs in statement isolation, accept structs, deal better
with partial arrays...
* Fix bugs in outliner
* Fix bugs in GPU-ify
* Describe better the versions in scripts and validation
* Update package list for recent Ubuntu
* Update installation guide
* p4a_scpp is now more user friendly
* Better linear constraints, improve normalization
* Improve redundancy reduction in Linear
* Improve ipyps portability
* Improve OpenMP reduction pragma when several reductions occur at the
same time
* Fix bug in localize_declaration
* Improve simplify control
* Improve p4a_git integration script
* Improve organization of the Par4All download server directories
* Fix typo in stub broker
* Fix P4A Accel runtime to deal with constants in OpenCL kernel
invocation
* Differentiate more OpenMP and GPU compilation flows
* Restructure Par4All Accel Runtime
* Avoid launching kernels with no iteration
* Accept multiple source files with same base name
* More resilient with spaces in path names
* Improve effects for structures
* Keep more comments
* More tolerant with __asm(...)
* Improve transformers
* Improve conflict testing in dependence graph
2011-11-11 Ronan Keryell <[email protected]>
* Par4All 1.3
* Huge improvements in compilation speed of big linear source codes
with a lot of variables to cope with the output of the Scilab-to-C
compiler from HPC Project. Use aggressive caching of internal
structures and remove many memory leaks. Mainly in def-use analyses,
effects...
* Can now generate OpenCL code.
* Generate up to 3D kernel for CUDA 2.0+ architectures (Fermi...)
* Now compile for Fermi architecture with --cuda. Ask for more cache
than shared memory
* Improved debug information on GPU target about kernels. Now P4A_DEBUG
is an environment variable and no longer a compilation flag
* New block of threads layout for CUDA
* Use loop fusion both for OpenMP and Accel
* Use fine grain parallelization before coarse grain for Accel
* Many many bug fixes
* Generate better types for variables with aggregated type in kernels
* Added a phase to promote sequential code to GPU kernel code
* Improved loop fusion
* Do not parallelize loops with control side effects (return,
abort()...)
* Better handling of const
* Cope with inlining of functions lacking a return
* New quick scalarization phase
* Improved variable localization/privatization
* Improved #include recovery
* New PyPS-made unfolding of functions
* Subtler linearization of arrays in GPU kernel that cannot be C99
* Fix bugs in the GPU communication optimization
* Accept trivial (idempotent) cast in C code
* Function calls can now happen in declarations
* Improved PIPS infrastructure
* Added automatic benchmarking and graphics generation to speed-up
scientific article generation (see article at LCPC this year). :-)
Can also run with PGI and HMPP compilers
* Register variables cannot conflict
* Improved C switch() desugaring
* Can now also accept less standard complex double instead of double
complex
* Par4All building infrastructure improved to be able to easily skip
some faulty commits from upstream projects (PIPS...)
* Can compile on Fedora 15
* Improved documentation
* Improved examples
2011-07-15 Ronan Keryell <[email protected]>
* Par4All 1.2.1
* Unstable version, unreleased. Look at version 1.3 for the real changes.
2011-07-07 Ronan Keryell <[email protected]>
* Par4All 1.2
* This version targets mainly the Wild Cruncher, a parallelizing
environment from HPC Project for Scilab programs. Par4All is used
to parallelize the output of the Scilab-to-C compiler from HPC
Project
* Added in examples/Benchmarks some benchmark examples we use in
our publications so that anybody can verify Par4All performance on
them with their own hardware
* Improved support for CUDA atomic update for reductions
* Better deal with scalars in GPU parallelization
* Improved memory effect analysis
* Fixed outlining for kernel generation with scalar parameters
* Improved loop fusion, deal with local variable declarations
* Improved array scalarization
* Make package publication more resilient to network failures
* Fixed GPU code generation for non rectangular iteration spaces
* Fixed communication optimization between GPU and CPU
* Added support for CEA SCMP embedded system
* Installation directory can now be changed also after a first
installation
* Use the broker concept to deal with stubs to manage non-parallelized
or already parallelized libraries
* Now install LICENSE.txt
* Updated to new PyPS interface
* GPU kernel can be outlined in separated source files on demand,
for OpenCL or use a separate non C99 compiler (CUDA nvcc), at
kernel, launcher, wrapper grain...
* Fixed compilation flags in PIPS/linear to prevent recompilation from
failing when an API changes too much
2011-04-20 Ronan Keryell <[email protected]>
* Par4All 1.1.2
* Improved support for Scilab compiler output
* Some work on effects in PIPS
* p4a -g now puts device code in debug mode too.
* Improved Stars-PM example
Added nVidia SM11 GPU, FFT timing...
Added a PGI Accelerator version for comparison
* Fixed wrong Makefile targets for Jacobi demo
Bug reported by Richard Membarth from Universität Erlangen.
2011-04-12 Ronan Keryell <[email protected]>
* Par4All 1.1.1
* Added support for CEA SCMP task dataflow machine (European
project ARTEMIS SCALOPES)
* Improved GPU kernel generation for loop nests with complex
declarations.
Bug reported by Richard Membarth from Universität Erlangen.
* Added new options to apply PIPS transformations in the Par4All
compilation transit (--apply-before-parallelization...)
* Added a programming guide describing best practices to get
better performance with Par4All
2011-03-01 Ronan Keryell <[email protected]>
* Par4All 1.1
* C99 declarations anywhere in a block and in C99 for-loops are now
supported.
* Fixed code generation for C99 declarations.
* New --apply-before-parallelization option to apply phases before
parallelization.
* Improved compilation speed.
* No longer rely on Python 3.x since there were some issues on
some systems with coping with both 2.y and 3.x versions.
* Fixed encoding issues.
2011-02-03 Ronan Keryell <[email protected]>
* Par4All 1.0.5
* Par4All 1.0.5 fixes a bug when a code to be kernelized uses some
global variables.
Thanks to Sarnath Kannan for this bug report. It should work now
on common cases.
* Prototype on lazy CUDA communication optimizations to remove
redundant host-accelerator communications.
* Fixed a space iteration transposition bug in accelerator mode
that was killing performance. But right now, better results are
obtained with 2D kernels.
* C99 for(int i;...;...) are now accepted.
* Can generate kernels with less perfectly nested loops.
* Updated examples directory to new options and communication
optimizations
* Better error and warning messages.
* Script cleaning by using Python module names everywhere.
2010-11-22 Ronan Keryell <[email protected]>
* Par4All 1.0.4
* Par4All 1.0.4 introduces a new P4A Accel runtime for OpenMP and CUDA.
* In previous months, PIPS and PyPS have evolved a lot, especially
in the code generation for various accelerators. This version
tries to cope with these evolutions.
* Added the Stars-pm cosmological N-body simulation program as an
example.
* Now the runtime can deal with subarray transfers between the
host and the accelerator, up to 4D arrays. However, right now the
phases chosen in PIPS do not use them yet.
* The code generation for non-C99 CUDA is more robust.
2010-11-05 Ronan Keryell <[email protected]>
* Par4All 1.0.3
* Par4All 1.0.3 is a base version to be used from Windows in a
Wild Cruncher.
* NVIDIA_GPU_Computing_SDK is no longer needed to produce CUDA
code for nVidia GPU.
* Can now compile on Fedora.
* New option to use simpler #include recovery.
* Can move produced files to a given directory.
* Improved documentation.
* New options for Python code injection.
* Better error messages.
2010-09-16 Ronan Keryell <[email protected]>
* Par4All 1.0.2
* Clean up examples and their README files.
Added an option to run Hyantes example in single precision
(for demos on laptops with small GPU).
* Updated to current PIPS and PyPS version.
* --accel without --cuda works again for GPU emulation in OpenMP.
* Do not use new PyPS #include recovery.
* Recover any #include, not only standard ones (useful for Par4All
Scilab).
* Do not use PIPS capply by default when running phases.
* Added path normalization in p4a_setup so that configure can
take relative path.
* p4a_validate post-processing utility updated to cope with new
PIPS validation output.
2010-07-23 Ronan Keryell <[email protected]>
* Par4All 1.0.1
* Corrected library name issue for libcutil on 32 bit x86.
* Fixed bugs in array_to_pointer.
* Corrected behaviour of NULL for nvcc.
* Fixed usage of limit_nested_parallelism in p4a_process.py.
* Better error message display.
* Fixed OpenMP prettyprint of C parallel loops with label.
* Deals with clock() intrinsic function.
2010-07-16 Ronan Keryell <[email protected]>
* Par4All 1.0
* Initial version of Par4All released.
%%% Local Variables:
%%% ispell-local-dictionary: "american"
%%% End: