NuttX TODO List (Last updated July 2, 2020)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This file summarizes known NuttX bugs, limitations, inconsistencies with
standards, things that could be improved, and ideas for enhancements. This
TODO list does not include issues associated with individual board ports. See
also the individual README.txt files in the boards/ sub-directories for
issues related to each board port.
nuttx/:
(16) Task/Scheduler (sched/)
(5) SMP
(1) Memory Management (mm/)
(0) Power Management (drivers/pm)
(5) Signals (sched/signal, arch/)
(2) pthreads (sched/pthread, libs/libc/pthread)
(0) Message Queues (sched/mqueue)
(1) Work Queues (sched/wqueue)
(6) Kernel/Protected Build
(3) C++ Support
(5) Binary loaders (binfmt/)
(17) Network (net/, drivers/net)
(4) USB (drivers/usbdev, drivers/usbhost)
(2) Other drivers (drivers/)
(9) Libraries (libs/libc/, libs/libm/)
(12) File system/Generic drivers (fs/, drivers/)
(10) Graphics Subsystem (graphics/)
(1) Build system / Toolchains
(2) Linux/Cygwin simulation (arch/sim)
(5) ARM (arch/arm/)
apps/ and other Add-Ons:
(1) Network Utilities (apps/netutils/)
(1) NuttShell (NSH) (apps/nshlib)
(2) System libraries apps/system (apps/system)
(1) Modbus (apps/modbus)
(5) Other Applications & Tests (apps/examples/)
o Task/Scheduler (sched/)
^^^^^^^^^^^^^^^^^^^^^^^
Title: CHILD PTHREAD TERMINATION
Description: When a task exits, shouldn't all of its child pthreads also be
terminated?
This behavior was implemented as an option controlled by the
configuration setting CONFIG_SCHED_EXIT_KILL_CHILDREN. This
option must be used with caution, however. It should not be
used unless you are certain of what you are doing. Uninformed
use of this option can often lead to memory leaks since, for
example, memory allocations held by threads are not
automatically freed!
Status: Closed. No, this behavior will not be implemented unless
specifically selected.
Priority: Medium, required for good emulation of process/pthread model.
The current behavior allows for the main thread of a task to
exit() and any child pthreads will persist. That does raise
some issues: The main thread is treated much like just-another-
pthread but must follow the semantics of a task or a process.
That results in some inconsistencies (for example, with robust
mutexes, what should happen if the main thread exits while
holding a mutex?)
Title: pause() NON-COMPLIANCE
Description: In the POSIX description of this function the pause() function
must suspend the calling thread until delivery of a signal whose
action is either to execute a signal-catching function or to
terminate the process. The current implementation only waits for
any non-blocked signal to be received. It should only wake up if
the signal is delivered to a handler.
Status: Open.
Priority: Medium Low.
Title: ON-DEMAND PAGING INCOMPLETE
Description: On-demand paging has recently been incorporated into the RTOS.
The design of this feature is described here:
https://nuttx.apache.org/docs/latest/components/paging.html.
As of this writing, the basic feature implementation is
complete and much of the logic has been verified. The test
harness for the feature exists only for the NXP LPC3131 (see
boards/arm/lpc31xx/ea3131/configs/pgnsh and locked
directories). There are some limitations of this testing so
I still cannot say that the feature is fully functional.
Status: Open. This has been put on the shelf for some time.
Priority: Medium-Low
Title: GET_ENVIRON_PTR()
Description: get_environ_ptr() (sched/sched_getenvironptr.c) is not implemented.
The representation of the environment strings selected for
NuttX is not compatible with the operation. Some significant
re-design would be required to implement this function and that
effort is thought to be not worth the result.
Status: Open. No change is planned.
Priority: Low -- There is no plan to implement this.
Title: TIMER_GETOVERRUN()
Description: timer_getoverrun() (sched/timer_getoverrun.c) is not implemented.
Status: Open
Priority: Low -- There is no plan to implement this.
Title: INCOMPATIBILITIES WITH execv() AND execl()
Description: Simplified 'execl()' and 'execv()' functions are provided by
NuttX. NuttX does not support processes and hence the concept
of overlaying a task's process image with a new process image
does not make any sense. In NuttX, these functions are
wrapper functions that:
1. Call the non-standard binfmt function 'exec', and then
2. exit(0).
As a result, the current implementations of 'execl()' and
'execv()' suffer from some incompatibilities. The most
serious of these is that the exec'ed task will not have
the same task ID as the vfork'ed function. So the parent
function cannot know the ID of the exec'ed task.
Status: Open
Priority: Medium Low for now
Title: ISSUES WITH atexit(), on_exit(), AND pthread_cleanup_pop()
Description: These functions execute with the following bad properties:
1. They run with interrupts disabled,
2. They run in supervisor mode (if applicable), and
3. They do not obey any setup of PIC or address
environments. Do they need to?
4. In the case of task_delete() and pthread_cancel() without
deferred cancellation, these callbacks will run on the
thread of execution and address context of the caller of
task_delete() or pthread_cancel(). That is very bad!
The fix for all of these issues is to have the callbacks
run on the caller's thread as is currently done with
signal handlers. Signals are delivered differently in
PROTECTED and KERNEL modes: The delivery involves a
signal handling trampoline function in the user address
space and two signal handlers: One to call the signal
handler trampoline in user mode (SYS_signal_handler) and
one within the signal handler trampoline to return to
supervisor mode (SYS_signal_handler_return).
The primary difference is in the location of the signal
handling trampoline:
- In PROTECTED mode, there is only a single user space blob
with a header at the beginning of the block (at a well-known
location). There is a pointer to the signal handler
trampoline function in that header.
- In the KERNEL mode, a special process signal handler
trampoline is used at a well-known location in every
process address space (ARCH_DATA_RESERVE->ar_sigtramp).
Status: Open
Priority: Medium Low. This is an important change to some less
important interfaces. For the average user, these
functions are just fine the way they are.
Title: execv() AND vfork()
Description: There is a problem when vfork() calls execv() (or execl()) to
start a new application: When the parent thread calls vfork()
it receives the pid of the vforked task, and *not*
the pid of the desired execv'ed application.
The same tasking arrangement is used by the standard function
posix_spawn(). However, posix_spawn uses the non-standard, internal
NuttX interface task_reparent() to replace the child's parent task
with the caller of posix_spawn(). That cannot be done with vfork()
because we don't know what vfork() is going to do.
Any solution to this is either very difficult or impossible without
an MMU.
Status: Open
Priority: Low (it might as well be low since it isn't going to be fixed).
Title: errno IS NOT SHARED AMONG THREADS
Description: In NuttX, the errno value is unique for each thread. But for
bug-for-bug compatibility, the same errno should be shared by
the task and each thread that it creates. It is *very* easy
to make this change: Just move the tls_errno field from
struct tls_info_s to struct task_group_s. However, I am still
not sure if this should be done or not.
NOTE: glibc behaves this way unless __thread is defined; in
that case, it behaves like NuttX (using TLS to save the
thread local errno).
Status: Closed. The existing solution is better and compatible with
thread-aware GLIBC (although its incompatibilities could show
up in porting some code). I will retain this issue for
reference only.
Priority: N/A
Title: SCALABILITY
Description: Task control information is retained in simple lists. This
is completely appropriate for small embedded systems where
the number of tasks, N, is relatively small. Most list
operations are O(N). This could become an issue if N gets
very large.
In that case, these simple lists should be replaced with
something more performant such as a balanced tree in the
case of ordered lists. Fortunately, most internal lists are
hidden behind simple accessor functions and so the internal
data structures can be changed if needed with very little impact.
Explicit references to the list structure are hidden behind
the macro this_task().
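As a rough illustration of why these operations are O(N), here is a
minimal sketch of a prioritized insertion into a simple linked list.
The type demo_task_s and the function demo_add_prioritized() are
hypothetical names for this example and are not the NuttX list
implementation.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a task list entry */

struct demo_task_s
{
  struct demo_task_s *flink;   /* Next entry in the list */
  int priority;                /* Larger value means higher priority */
};

/* Insert 'task' so that the list stays ordered by descending priority.
 * The walk visits up to N entries, hence O(N).  Returns the new head.
 */

struct demo_task_s *demo_add_prioritized(struct demo_task_s *head,
                                         struct demo_task_s *task)
{
  struct demo_task_s **link = &head;

  while (*link != NULL && (*link)->priority >= task->priority)
    {
      link = &(*link)->flink;
    }

  task->flink = *link;
  *link       = task;
  return head;
}
```

A balanced tree would reduce the insertion cost to O(log N) at the
price of more code and more memory per entry.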
Status: Open
Priority: Low. Things are just the way that we want them for the way
that NuttX is used today.
Title: INTERNAL VERSIONS OF USER FUNCTIONS
Description: The internal NuttX logic uses the same interfaces as does
the application. That sometimes produces a problem because
there is "overloaded" functionality in those user interfaces
that is not desirable.
For example, having cancellation points hidden inside of the
OS can cause non-cancellation point interfaces to behave
strangely.
Here is another issue: Internal OS functions should not set
errno and should never have to look at the errno value to
determine the cause of the failure. The errno is provided
for compatibility with POSIX application interface
requirements and really doesn't need to be used within the
OS.
Both of these could be fixed if there were special internal
versions of these functions. For example, there could be an
nxsem_wait() that does all of the same things as sem_wait()
but does not create a cancellation point and does not set
the errno value on failures.
Everything inside the OS would use nxsem_wait().
Applications would call sem_wait() which would just be a
wrapper around nxsem_wait() that adds the cancellation point
and that sets the errno value on failures.
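The internal/wrapper split described above might look roughly like
the following sketch. The functions demo_internal_take() and
demo_take() are hypothetical stand-ins and not the real semaphore
code; the sketch only shows the errno convention (the internal
version returns a negated errno value, the wrapper sets errno) and
leaves out the cancellation point.

```c
#include <errno.h>

/* Hypothetical internal version: returns 0 on success or a negated
 * errno value on failure.  It never modifies errno.
 */

int demo_internal_take(int *count)
{
  if (*count <= 0)
    {
      return -EAGAIN;
    }

  (*count)--;
  return 0;
}

/* Hypothetical application-facing wrapper: same operation, but it
 * reports failures the POSIX way (return -1 and set errno).  A real
 * wrapper would also be the place to add the cancellation point.
 */

int demo_take(int *count)
{
  int ret = demo_internal_take(count);
  if (ret < 0)
    {
      errno = -ret;
      return -1;
    }

  return 0;
}
```

Code inside the OS would call only the internal version, so errno
is left untouched there.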
One particularly difficult issue is the use of common memory
manager, C, and NX libraries in the build. For the PROTECTED
and KERNEL builds, this issue is resolved. In that case,
the OS links with a different version of the libraries than
does the application: The OS version would use the OS internal
interfaces and the application would use the standard
interfaces.
But for the FLAT build, both the OS and the applications use
the same library functions. For applications, the library
functions *must* support errno's and cancellation and, hence,
these are also used within the OS.
But that raises yet another issue: If the application
version of the libraries use the standard interfaces
internally, then they may generate unexpected cancellation
points. For example, the memory management would take a
semaphore using sem_wait() to get exclusive access to the
heap. That means that every call to malloc() and free()
would be a cancellation point, a clear POSIX violation.
Changes like that could clean up some of this internal
craziness.
UPDATE:
2017-10-03: This change has been completed for the case of
semaphores used in the OS. Still need to check out signals
and message queues that are also used in the OS. Also
backed out commit b4747286b19d3b15193b2a5e8a0fe48fa0a8638c.
2017-10-06: This change has been completed for the case of
signals used in the OS. Still need to check out message
queues that are also used in the OS.
2017-10-10: This change has been completed for the case of
message queues used in the OS. I am keeping this issue
open because (1) there are some known remaining calls that
will modify the errno (such as dup(), dup2(),
nxtask_activate(), kthread_create(), exec(), mq_open(),
mq_close(), and others) and (2) there may still be calls that
create cancellation points. Need to check things like open(),
close(), read(), write(), and possibly others.
2018-01-30: This change has been completed for the case of
scheduler functions used within the OS: sched_getparam(),
sched_setparam(), sched_getscheduler(), sched_setscheduler(),
and sched_setaffinity().
2018-09-15: This change has been completed for the case of
open() used within the OS. There are places under libs/ and
boards/ that have not been converted. I also note cases
where fopen() is called under libs/libc/netdb/.
2019-09-11: builtin_isavail() no longer sets the errno variable.
Status: Open
Priority: Low. Things are working OK the way they are. But the design
could be improved and made a little more efficient with this
change.
Title: IDLE THREAD TCB SETUP
Description: There are issues with setting IDLE thread stacks:
1. One problem is stack-related data in the IDLE thread's TCB.
A solution might be to standardize the use of g_idle_topstack.
Then you could add initialization like this in nx_start:
@@ -344,6 +347,11 @@ void nx_start(void)
g_idleargv[1] = NULL;
g_idletcb.argv = g_idleargv;
+ /* Set the IDLE task stack size */
+
+ g_idletcb.cmn.adj_stack_size = CONFIG_IDLETHREAD_STACKSIZE;
+ g_idletcb.cmn.stack_alloc_ptr = (void *)(g_idle_topstack - CONFIG_IDLETHREAD_STACKSIZE);
+
/* Then add the idle task's TCB to the head of the ready to run list */
dq_addfirst((FAR dq_entry_t *)&g_idletcb, (FAR dq_queue_t *)&g_readytorun);
The g_idle_topstack variable is available for almost all architectures:
$ find . -name "*.h" | xargs grep g_idle_top
./arm/src/common/up_internal.h:EXTERN const uint32_t g_idle_topstack;
./avr/src/avr/avr.h:extern uint16_t g_idle_topstack;
./avr/src/avr32/avr32.h:extern uint32_t g_idle_topstack;
./hc/src/common/up_internal.h:extern uint16_t g_idle_topstack;
./mips/src/common/up_internal.h:extern uint32_t g_idle_topstack;
./misoc/src/lm32/lm32.h:extern uint32_t g_idle_topstack;
./renesas/src/common/up_internal.h:extern uint32_t g_idle_topstack;
./renesas/src/m16c/chip.h:extern uint32_t g_idle_topstack; /* Start of the heap */
./risc-v/src/common/up_internal.h:EXTERN uint32_t g_idle_topstack;
./x86/src/common/up_internal.h:extern uint32_t g_idle_topstack;
That omits these architectures: sh1, sim, xtensa, z16, z80,
ez80, and z8. All would have to support this common
global variable.
Also, the stack itself may be 8, 16, or 32 bits wide,
depending upon the architecture, and may have differing
alignment requirements.
2. Another problem is colorizing that stack to use with
stack usage monitoring logic. There is logic in some
start functions to do this in a function called go_nx_start.
It is available in these architectures:
./arm/src/efm32/efm32_start.c:static void go_nx_start(void *pv, unsigned int nbytes)
./arm/src/kinetis/kinetis_start.c:static void go_nx_start(void *pv, unsigned int nbytes)
./arm/src/sam34/sam_start.c:static void go_nx_start(void *pv, unsigned int nbytes)
./arm/src/samv7/sam_start.c:static void go_nx_start(void *pv, unsigned int nbytes)
./arm/src/stm32/stm32_start.c:static void go_nx_start(void *pv, unsigned int nbytes)
./arm/src/stm32f7/stm32_start.c:static void go_nx_start(void *pv, unsigned int nbytes)
./arm/src/stm32l4/stm32l4_start.c:static void go_nx_start(void *pv, unsigned int nbytes)
./arm/src/tms570/tms570_boot.c:static void go_nx_start(void *pv, unsigned int nbytes)
./arm/src/xmc4/xmc4_start.c:static void go_nx_start(void *pv, unsigned int nbytes)
But no others.
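For reference, stack coloring and the matching usage check can be
sketched as below. This is a simplified illustration and not the
go_nx_start logic itself: the names demo_stack_color() and
demo_stack_used() and the pattern value DEMO_STACK_COLOR are
assumptions, and a 32-bit wide, push-down stack is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed fill pattern for this sketch */

#define DEMO_STACK_COLOR 0xdeadbeefu

/* Fill the whole stack with the known pattern before it is used */

void demo_stack_color(uint32_t *base, size_t nwords)
{
  size_t i;

  for (i = 0; i < nwords; i++)
    {
      base[i] = DEMO_STACK_COLOR;
    }
}

/* The stack grows down from base + nwords.  Scan up from the lowest
 * address until the first word that no longer holds the pattern;
 * everything from there to the top counts as used.
 */

size_t demo_stack_used(const uint32_t *base, size_t nwords)
{
  size_t i;

  for (i = 0; i < nwords; i++)
    {
      if (base[i] != DEMO_STACK_COLOR)
        {
          break;
        }
    }

  return nwords - i;
}
```

The result is a high-water mark in words. It can under-report if the
stack happens to hold the pattern value as real data.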
Status: Open
Priority: Low, only needed for more complete debug.
Title: PRIORITY INHERITANCE WITH SPORADIC SCHEDULER
Description: The sporadic scheduler manages CPU utilization by a task by
alternating between a high and a low priority. In either
state, it may have its priority boosted. However, under
some circumstances, it is impossible in the current design to
switch to the correct priority if a semaphore held by the
sporadic thread is participating in priority inheritance:
There is an issue when switching from the high to the low
priority state. If the priority was NOT boosted above the
higher priority, it may still need to be boosted with
respect to the lower priority. If the highest priority
thread waiting on a semaphore held by the sporadic thread is
higher in priority than the low priority but less than the
higher priority, then the new thread priority should be set to
that middle priority, not to the lower priority.
In order to do this we would need to know the highest
priority from among all tasks waiting for all of the semaphores
held by the sporadic task. That information could be
retained by the priority inheritance logic for use by the
sporadic scheduler. The boost priority could be retained in
a new field of the TCB (say, pend_priority). That
pend_priority could then be used when switching from the
higher to the lower priority.
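The selection rule suggested above amounts to taking the larger of
the low priority and the retained boost priority. A minimal sketch,
assuming that pend_priority holds the highest priority among all
waiters (0 if there are none) and using the hypothetical function
name demo_sporadic_low_priority():

```c
#include <stdint.h>

/* Priority to apply when the sporadic thread drops from its high
 * priority state to its low priority state.  'pend_priority' is the
 * proposed TCB field: the highest priority of any task waiting on a
 * semaphore held by this thread, or 0 if there is no such task.
 */

uint8_t demo_sporadic_low_priority(uint8_t low_priority,
                                   uint8_t pend_priority)
{
  return pend_priority > low_priority ? pend_priority : low_priority;
}
```

With a low priority of 50, a high priority of 150 and a waiter at
priority 100, the thread would drop to 100 rather than to 50.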
Status: Open
Priority: Low. Does anyone actually use the sporadic scheduler?
Title: SIMPLIFY SPORADIC SCHEDULER DESIGN
Description: I have been planning to re-implement sporadic scheduling for
some time. I believe that the current implementation is
unnecessarily complex. There is no clear statement for the
requirements of sporadic scheduling that I could find, so I
based the design on some behaviors of another OS that I saw
published (QNX as I recall).
But I think that the bottom line requirement for sporadic
scheduling is that it should make a best attempt to
control a fixed percentage of CPU bandwidth for a task
during an interval only by modifying its priority between
a low and a high priority. The current design involves
several timers: A "budget" timer plus a variable number of
"replenishment" timers and a lot of nonsense to duplicate QNX
behavior that I think is not necessary.
I think that the sporadic scheduler could be re-implemented
with only the single "budget" timer. Instead of starting a
new "replenishment" timer when the task is resumed, that
single timer could just be extended.
Status: Open
Priority: Low. This is an enhancement. And does anyone actually use
the sporadic scheduler?
Title: REMOVE NESTED CANCELLATION POINT SUPPORT
Description: The current implementation supports nested cancellation points.
The TCB field cpcount keeps track of that nesting level.
However, cancellation points should not be calling other
cancellation points so this design could be simplified by
removing all support for nested cancellation points.
Status: Open
Priority: Low. No harm is being done by the current implementation.
This change is primarily for aesthetic reasons. It would
reduce memory usage by a very small but probably
insignificant amount.
Title: DAEMONIZE ELF PROGRAM
Description: It is a common practice to "daemonize" to detach a task from
its parent. This is used with NSH, for example, so that NSH
will not stall, waiting in waitpid() for the child task to
exit.
Daemonization is done by creating a new task which continues
to run while the original task exits (sending the SIGCHLD
signal to the parent and awakening waitpid()). In a pure
POSIX system, this is done with fork(), perhaps like:
if (fork() != 0)
  {
    exit(0);
  }
but is usually done with task_create() in NuttX. But when
task_create() is called from within an ELF program, a very
perverse situation is created:
The basic problem involves address environments and task groups:
"Task groups" are emulations of Linux processes. For the
case of the FLAT, ELF module, the address environment is
allocated memory that contains the ELF module.
When you call task_create() from the ELF program, you now
have two task groups running in the same address environment.
That is a perverse situation for which there is no standard
solution. There is nothing comparable to that. Even in
Linux, fork() creates another address environment (although
it is an exact copy of the original).
When the ELF program was created, the function exec() in
binfmt/binfmt_exec.c runs. It sets up a call back that will
be invoked when the ELF program exits.
When the ELF program exits, the address environment is destroyed
and the other task running in the same address environment is
then running in stale memory and will eventually crash.
Nothing special happens when the other created task running
in the allocated address environment exits since it has no such
call backs.
In order to make this work you would need logic like:
1. When the ELF task calls task_create(), it would need to:
a. Detect that task_create() was called from an ELF program,
b. increment a reference count on the address environment, and
c. Set up the same exit hook for the newly created task.
2. Then when either the ELF program task or the created task
in the same address environment exits, it would decrement
the reference count. When the last task exits, the reference
count would go to zero and the address environment could be
destroyed.
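The reference counting in steps 1 and 2 could be sketched like this.
The structure demo_addrenv_s and both functions are hypothetical;
the real work of detecting the ELF caller, installing the exit hook
and freeing the memory is not shown and is only represented by a
flag.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for one shared address environment */

struct demo_addrenv_s
{
  int  crefs;       /* Number of tasks running in this environment */
  bool destroyed;   /* Stand-in for actually freeing the memory */
};

/* Step 1b: called for the ELF task itself and again for each task it
 * creates with task_create().
 */

void demo_addrenv_attach(struct demo_addrenv_s *env)
{
  env->crefs++;
}

/* Step 2: the exit hook shared by all tasks in the environment.
 * Returns true once the environment has been destroyed.
 */

bool demo_addrenv_detach(struct demo_addrenv_s *env)
{
  if (env->crefs > 0 && --env->crefs == 0)
    {
      env->destroyed = true;
    }

  return env->destroyed;
}
```

With this arrangement it no longer matters whether the ELF task or
the created task exits first. The memory is released by whichever
task exits last.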
This is complex work and would take some effort and probably
requires redesign of existing code and interfaces to get a
proper, clean, modular solution.
Status: Open
Priority: Medium-Low. A simple work-around when using NSH is to use
the '&' postfix to put the started ELF program into background.
o SMP
^^^
Title: SMP AND DATA CACHES
Description: When spinlocks, semaphores, etc. are used in an SMP system with
a data cache, then there may be problems with cache coherency
in some CPU architectures: When one CPU modifies the shared
object, the changes may not be visible to another CPU if it
does not share the data cache. That would cause failure in
the IPC logic.
Flushing the D-cache on writes and invalidating before a read is
not really an option. That would essentially affect every memory
access and there may be side-effects due to cache line sizes
and alignment.
For the same reason a separate, non-cacheable memory region is
not an option. Essentially all data would have to go in the
non-cached region and you would have no benefit from the data
cache.
On ARM Cortex-A, each CPU has a separate data cache. However,
the MPCore's Snoop Control Unit supports coherency among
the different caches. The SCU is enabled by the SCU control
register and each CPU participates in the SMP coherency by
setting the ACTLR_SMP bit in the auxiliary control register
(ACTLR).
Status: Closed
Priority: High on platforms that may have the issue.
Title: MISUSE OF sched_lock() IN SMP MODE
Description: The OS API sched_lock() disables pre-emption and locks a
task in place. In the single CPU case, it is also often
used to enforce a simple critical section since no other
task can run while pre-emption is locked.
This, however, does not generalize to the SMP case. In the
SMP case, there are multiple tasks running on multiple CPUs.
The basic behavior is still correct: The task that has
locked pre-emption will not be suspended. However, there
is no longer any protection for use as a critical section:
tasks running on other CPUs may still execute that
unprotected code region.
The solution is to replace the use of sched_lock() with
stronger protection such as spin_lock_irqsave().
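To show the kind of protection that does work across CPUs, here is a
minimal spinlock sketch built on C11 atomics. It is not the NuttX
spin_lock_irqsave() implementation: the demo_spin_*() names are
hypothetical and, unlike spin_lock_irqsave(), this sketch does not
disable local interrupts.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock type for this sketch */

typedef struct
{
  atomic_flag locked;
} demo_spinlock_t;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once to take the lock.  Returns true if the lock was taken */

bool demo_spin_trylock(demo_spinlock_t *lock)
{
  return !atomic_flag_test_and_set_explicit(&lock->locked,
                                            memory_order_acquire);
}

/* Busy-wait until the lock is taken.  Code running on any other CPU
 * that takes the same lock is kept out until demo_spin_unlock().
 */

void demo_spin_lock(demo_spinlock_t *lock)
{
  while (!demo_spin_trylock(lock))
    {
    }
}

void demo_spin_unlock(demo_spinlock_t *lock)
{
  atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

The key difference from sched_lock() is that the lock is a shared
object that every CPU must acquire, rather than a property of the
task that is currently running.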
Status: Open
Priority: Medium for SMP system. Not critical to single CPU systems.
NOTE: There are no known bugs from this potential problem.
Title: CORTEX-A GIC SGI INTERRUPT MASKING
Description: In the ARMv7-A GICv2 architecture, the inter-processor
interrupts (SGIs) are non-maskable and will occur even if
interrupts are disabled. This adds a lot of complexity
to the ARMv7-A critical section design.
Masayuki Ishikawa has suggested the use of the GICv2 ICCPMR
register to control SGI interrupts. This register (much like
the ARMv7-M BASEPRI register) can be used to mask interrupts
by interrupt priority. Since SGIs may be assigned priorities
the ICCPMR should be able to block execution of SGIs as well.
Such an implementation would be very similar to the BASEPRI
(vs PRIMASK) implementation for the ARMv7-M: (1) The
up_irq_save() and up_irq_restore() functions would have to
set/restore the ICCPMR register, (2) register setup logic in
arch/arm/src/armv7-a for task start-up and signal dispatch
would have to set the ICCPMR correctly, and (3) the 'xcp'
structure would have to be extended to hold the ICCPMR
register; logic would have to be added to save/restore the
ICCPMR register in the 'xcp' structure on each interrupt and
context switch.
This would also be an essential part of a high priority,
nested interrupt implementation (unrelated).
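The masking rule that such an implementation relies on can be stated
in a few lines. This is only a model of the GIC priority mask
comparison, with the hypothetical function name
demo_gic_irq_signaled(); it does not touch any hardware register.
In the GIC, a numerically lower value means a higher priority.

```c
#include <stdbool.h>
#include <stdint.h>

/* An interrupt is signaled to the CPU only if its priority is higher
 * (numerically lower) than the value in the priority mask register.
 */

bool demo_gic_irq_signaled(uint8_t irq_priority, uint8_t priority_mask)
{
  return irq_priority < priority_mask;
}
```

So an SGI assigned priority 0x80 would be blocked while the mask is
set to 0x80 or lower, and would be delivered again once the mask is
restored to a larger value such as 0xff. The values here are
illustrative only.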
Status: Open
Priority: Low. There are no known issues with the current non-maskable
SGI implementation. This change would, however, lead to
simplification in the design and permit commonality with
other, non-GIC implementations.
Title: ISSUES WITH ACCESSING CPU INDEX
Description: The CPU number is accessed usually with the macro this_cpu().
The returned CPU number is then used for various things,
typically as an array index. However, if pre-emption is
not disabled, then it is possible that a context switch
could occur and that logic could run on another CPU with
possibly fatal consequences.
We need to evaluate all use of this_cpu() and assure that
it is used in a way that guarantees that the code continues
to execute on the same CPU.
Status: Open
Priority: Medium. This is a logical problem but I have never seen
any bugs caused by this. But I believe that failures are
possible.
Title: POSSIBLE FOR TWO CPUs TO HOLD A CRITICAL SECTION?
Description: The SMP design includes logic that will support multiple
CPUs holding a critical section. Is this necessary? How
can that occur? I think it can occur in the following
situation:
The log below was reported from NuttX running on a two-core
Cortex-A7 architecture in SMP mode. You can see that
when nxsched_add_readytorun() was called, the g_cpu_irqset is 3.
nxsched_add_readytorun: irqset cpu 1, me 0 btcbname init, irqset 1 irqcount 2.
nxsched_add_readytorun: nxsched_add_readytorun line 338 g_cpu_irqset = 3.
This can happen, but only under a very certain condition.
g_cpu_irqset only exists to support this certain condition:
a. A task running on CPU 0 takes the critical section. So
g_cpu_irqset == 0x1.
b. A task exits on CPU 1 and a waiting, ready-to-run task
is re-started on CPU 1. This new task also holds the
critical section. So when the task is re-started on
CPU 1, we then have g_cpu_irqset == 0x3
So we are in a very perverse state! There are two tasks
running on two different CPUs and both hold the critical
section. I believe that is a dangerous situation and there
could be undiscovered bugs that could happen in that case.
However, as of this moment, I have not heard of any specific
problems caused by this weird behavior.
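The values 0x1 and 0x3 follow from treating g_cpu_irqset as a bit
set with one bit per CPU. A small sketch of that arithmetic, using
hypothetical demo_irqset_*() helpers rather than the real scheduler
code:

```c
#include <stdint.h>

/* Mark 'cpu' as holding the critical section */

uint32_t demo_irqset_take(uint32_t irqset, int cpu)
{
  return irqset | (UINT32_C(1) << cpu);
}

/* Mark 'cpu' as no longer holding the critical section */

uint32_t demo_irqset_release(uint32_t irqset, int cpu)
{
  return irqset & ~(UINT32_C(1) << cpu);
}

/* Count how many CPUs currently hold the critical section */

int demo_irqset_holders(uint32_t irqset)
{
  int count = 0;

  while (irqset != 0)
    {
      count  += (int)(irqset & 1);
      irqset >>= 1;
    }

  return count;
}
```

Step a sets bit 0 (0x1) and step b adds bit 1, giving 0x3. A holder
count greater than one is the situation being questioned here.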
A possible solution would be to add a new task state that
would exist only for SMP.
- Add a new SMP-only task list and state. Say,
g_csection_wait[]. It should be prioritized.
- When a task acquires the critical section, all tasks in
g_readytorun[] that need the critical section would be
moved to g_csection_wait[].
- When any task is unblocked for any reason and moved to the
g_readytorun[] list, if that unblocked task needs the
critical section, it would also be moved to the
g_csection_wait[] list. No task that needs the critical
section can be in the ready-to-run list if the critical
section is not available.
- When the task releases the critical section, all tasks in
g_csection_wait[] need to be moved back to
g_readytorun[].
- This may result in a context switch. The tasks should be
moved back to g_readytorun[] highest priority first. If a
context switch occurs and the critical section is re-taken
by the re-started task, the lower priority tasks in
g_csection_wait[] must stay in that list.
That is really not as much work as it sounds. It is
something that could be done in 2-3 days of work if you know
what you are doing. Getting the proper test setup and
verifying the change would be the more difficult task.
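The list movements proposed above can be modeled in a few lines. This
is a deliberately simplified sketch with hypothetical names: each
task carries a state field instead of living on a real list, the
array is assumed to hold only tasks that are not currently running,
and the priority ordering and the re-take case from the last bullet
are not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

enum demo_state_e
{
  DEMO_READYTORUN,       /* Models membership in g_readytorun[] */
  DEMO_CSECTION_WAIT     /* Models membership in g_csection_wait[] */
};

struct demo_task_s
{
  bool needs_csection;   /* Task holds a count on the critical section */
  enum demo_state_e state;
};

/* Some running task has just acquired the critical section: park
 * every ready-to-run task that also needs it.
 */

void demo_csection_acquired(struct demo_task_s *tasks, size_t ntasks)
{
  size_t i;

  for (i = 0; i < ntasks; i++)
    {
      if (tasks[i].state == DEMO_READYTORUN && tasks[i].needs_csection)
        {
          tasks[i].state = DEMO_CSECTION_WAIT;
        }
    }
}

/* The critical section has been released: all parked tasks become
 * ready-to-run again.
 */

void demo_csection_released(struct demo_task_s *tasks, size_t ntasks)
{
  size_t i;

  for (i = 0; i < ntasks; i++)
    {
      if (tasks[i].state == DEMO_CSECTION_WAIT)
        {
          tasks[i].state = DEMO_READYTORUN;
        }
    }
}
```

The invariant is that no task needing the critical section is
ready-to-run while the critical section is held, so a second CPU
cannot start such a task.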
Status: Open
Priority: Unknown. Might be high, but first we would need to confirm
that this situation can occur and that it actually causes
a failure.
o Memory Management (mm/)
^^^^^^^^^^^^^^^^^^^^^^^
Title: FREE MEMORY ON TASK EXIT
Description: Add an option to free all memory allocated by a task when the
task exits. This is probably not worth the overhead for a
deeply embedded system.
There would be complexities with this implementation as well
because often one task allocates memory and then passes the
memory to another: The task that "owns" the memory may not
be the same as the task that allocated the memory.
Update. From the NuttX forum:
...there is a good reason why task A should never delete task B.
That is because you will strand memory resources. Another feature
lacking in most flat address space RTOSs is automatic memory
clean-up when a task exits.
That behavior just comes for free in a process-based OS like Linux:
Each process has its own heap and when you tear down the process
environment, you naturally destroy the heap too.
But RTOSs have only a single, shared heap. I have spent some time
thinking about how you could clean up memory required by a task
when a task exits. It is not so simple. It is not as simple as
just keeping memory allocated by a thread in a list then freeing
the list of allocations when the task exits.
It is not that simple because you don't know how the memory is
being used. For example, if task A allocates memory that is used
by task B, then when task A exits, you would not want to free that
memory needed by task B. In a process-based system, you would
have to explicitly map shared memory (with reference counting) in
order to share memory. So the life of shared memory in that
environment is easily managed.
I have thought that the way that this could be solved in NuttX
would be: (1) add links and reference counts to all memory allocated
by a thread. This would increase the memory allocation overhead!
(2) Keep the list head in the TCB, and (3) extend mmap() and munmap()
to include the shared memory operations (which would only manage
the reference counting and the life of the allocation).
Then what about pthreads? Memory should not be freed until the last
pthread in the group exits. That could be done with an additional
reference count on the whole allocated memory list (just as streams
and file descriptors are now shared and persist until the last
pthread exits).
I think that would work but to me is very unattractive and
inconsistent with the NuttX "small footprint" objective. ...
Other issues:
- Memory free time would go up because you would have to remove
the memory from that list in free().
- There are special cases inside the RTOS itself. For example,
if task A creates task B, then initial memory allocations for
task B are created by task A. Some special allocators would
be required to keep this memory on the correct list (or on
no list at all).
Updated 2016-06-25:
For processors with an MMU (Memory Management Unit), NuttX can be
built in a kernel mode. In that case, each process will have a
local copy of its heap (filled with sbrk()) and when the process
exits, its local heap will be destroyed and the underlying page
memory is recovered.
So in this case, NuttX works just like Linux or other *nix systems:
All memory allocated by processes or threads in processes will
be recovered when the process exits.
But not for the flat memory build. In that case, the issues
above do apply. There is no safe way to recover the memory in
that case (and even if there were, the additional overhead would
not be acceptable on most platforms).
This does not prohibit anyone from creating a wrapper for malloc()
and an atexit() callback that frees memory on task exit. People
are free and, in fact, encouraged, to do that. However, since
it is inherently unsafe, I would never incorporate anything
like that into NuttX.
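For anyone who wants to try the wrapper approach mentioned above, here is a minimal sketch. It is not part of NuttX: tracked_malloc(), tracked_free(), tracked_free_all(), and tracked_count() are hypothetical names, the list is a single global with no locking (so it assumes one single-threaded task), and it carries exactly the safety problem described above because memory handed to another task is freed as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header placed in front of every tracked allocation.  The max_align_t
 * member keeps the user memory that follows suitably aligned.
 */

union header_u
{
  struct
  {
    union header_u *flink;
    union header_u *blink;
  } link;
  max_align_t align;
};

static union header_u *g_head;   /* Assumption: one list, one task */
static int g_registered;

/* atexit() callback: free everything still on the list. */

void tracked_free_all(void)
{
  while (g_head != NULL)
    {
      union header_u *next = g_head->link.flink;
      free(g_head);
      g_head = next;
    }
}

void *tracked_malloc(size_t size)
{
  union header_u *hdr;

  if (size > SIZE_MAX - sizeof(*hdr))
    {
      return NULL;
    }

  if (!g_registered)
    {
      if (atexit(tracked_free_all) != 0)
        {
          return NULL;
        }

      g_registered = 1;
    }

  hdr = malloc(sizeof(*hdr) + size);
  if (hdr == NULL)
    {
      return NULL;
    }

  /* Insert at the head of the doubly linked list. */

  hdr->link.blink = NULL;
  hdr->link.flink = g_head;
  if (g_head != NULL)
    {
      g_head->link.blink = hdr;
    }

  g_head = hdr;
  return hdr + 1;
}

void tracked_free(void *mem)
{
  union header_u *hdr;

  if (mem == NULL)
    {
      return;
    }

  /* This unlink step is the extra free() cost noted above. */

  hdr = (union header_u *)mem - 1;
  if (hdr->link.blink != NULL)
    {
      hdr->link.blink->link.flink = hdr->link.flink;
    }
  else
    {
      assert(g_head == hdr);
      g_head = hdr->link.flink;
    }

  if (hdr->link.flink != NULL)
    {
      hdr->link.flink->link.blink = hdr->link.blink;
    }

  free(hdr);
}

/* Number of allocations currently tracked (for diagnostics). */

size_t tracked_count(void)
{
  size_t n = 0;

  for (union header_u *p = g_head; p != NULL; p = p->link.flink)
    {
      n++;
    }

  return n;
}
```

The header adds at least two pointers to every allocation, which is the memory overhead that makes this unattractive for a small-footprint system.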
Status: Open. No changes are planned. NOTE: This applies to the FLAT
and PROTECTED builds only. There is no such leaking of memory
in the KERNEL build mode.
Priority: Medium/Low, a good feature to prevent memory leaks but would
have negative impact on memory usage and code size.
o Power Management (drivers/pm)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
o Signals (sched/signal, arch/)
^^^^^^^^^^^^^^^^^^^^^^^
Title: STANDARD SIGNALS
Description: 'Standard' signals and signal actions are not fully
supported. The SIGCHLD signal is supported and, if the
option CONFIG_SIG_DEFAULT=y is included, some signals will
perform their default actions (dependent upon additional
configuration settings):
Signal  Action               Additional Configuration
------- -------------------- -------------------------
SIGUSR1 Abnormal Termination CONFIG_SIG_SIGUSR1_ACTION
SIGUSR2 Abnormal Termination CONFIG_SIG_SIGUSR2_ACTION
SIGALRM Abnormal Termination CONFIG_SIG_SIGALRM_ACTION
SIGPOLL Abnormal Termination CONFIG_SIG_SIGPOLL_ACTION
SIGSTOP Suspend task         CONFIG_SIG_SIGSTOP_ACTION
SIGTSTP Suspend task         CONFIG_SIG_SIGSTOP_ACTION
SIGCONT Resume task          CONFIG_SIG_SIGSTOP_ACTION
SIGINT  Abnormal Termination CONFIG_SIG_SIGKILL_ACTION
SIGKILL Abnormal Termination CONFIG_SIG_SIGKILL_ACTION
Status: Open. No further changes are planned.
Priority: Low, required by standards but not so critical for an
embedded system.
Title: SIGEV_THREAD
Description: Implementation of support for SIGEV_THREAD is available
only in the FLAT build mode because it uses the OS work queues to
perform the callback. The alternative for the PROTECTED and KERNEL
builds would be to create pthreads in the user space to perform the
callbacks. That is not a very attractive solution due to performance
issues. It would also require some additional logic to specify the
TCB of the parent so that the pthread could be bound to the correct
group.
There is also some user-space logic in libs/libc/aio/lio_listio.c.
That logic could use the user-space work queue for the callbacks.
Status: Low, there are alternative designs. However, these features
are required by the POSIX standard.
Priority: Low for now
Title: SIGNAL NUMBERING
Description: In signal.h, the range of valid signals is listed as 0-31. However,
in many interfaces, 0 is not a valid signal number. The valid
signal number should be 1-32. The signal set operations would need
to map bits appropriately.
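The bit mapping could look something like the following sketch. The type sigset32_t and the sigset32_*() helpers are hypothetical names used only for illustration; they are not the existing NuttX signal set functions. Signal 1 maps to bit 0 and signal 32 to bit 31, and signal 0 is rejected.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_SIGNO 1
#define MAX_SIGNO 32

typedef uint32_t sigset32_t;   /* Hypothetical 32-bit signal set */

/* Map a valid signal number (1-32) to its bit in the set (bit 0-31). */

static inline uint32_t signo_to_mask(int signo)
{
  assert(signo >= MIN_SIGNO && signo <= MAX_SIGNO);
  return UINT32_C(1) << (signo - MIN_SIGNO);
}

/* Add a signal to the set.  Returns 0 on success, -1 if signo is not
 * a valid signal number.
 */

int sigset32_add(sigset32_t *set, int signo)
{
  if (signo < MIN_SIGNO || signo > MAX_SIGNO)
    {
      return -1;
    }

  *set |= signo_to_mask(signo);
  return 0;
}

/* Returns 1 if signo is in the set, 0 if not, -1 if signo is invalid. */

int sigset32_ismember(const sigset32_t *set, int signo)
{
  if (signo < MIN_SIGNO || signo > MAX_SIGNO)
    {
      return -1;
    }

  return (*set & signo_to_mask(signo)) != 0;
}
```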
Status: Open
Priority: Low. Even if there are only 31 usable signals, that is still a lot.
Title: NO QUEUING of SIGNAL ACTIONS
Description: In the architecture specific implementation of struct xcptcontext,
there are fields used by signal handling logic to pass the state
information needed to dispatch signal actions to the appropriate
handler.
There is only one copy of this state information in the
implementations of struct xcptcontext and, as a consequence,
if there is a signal handler executing on a thread, then additional
signal actions will be lost until that signal handler completes
and releases those resources.
Status: Open
Priority: Low. This design flaw has been around for ages and no one has yet
complained about it. Apparently the visibility of the problem is
very low.
Title: QUEUED SIGNAL ACTIONS ARE INAPPROPRIATELY DEFERRED
Description: The implementation of nxsig_deliver() does the following in a loop:
- It takes the next queued signal action from a list
- Calls the architecture-specific up_sigdeliver() to perform
the signal action (through some sleight of hand in
up_schedule_sigaction())
- up_sigdeliver() is a trampoline function that performs the
actual signal action as well as some housekeeping functions
then
- up_sigdeliver() performs a context switch back to the normal,
uninterrupted thread instead of returning to nxsig_deliver().
The loop in nxsig_deliver() then will not have the opportunity to
run again until that normal, uninterrupted thread is suspended.
Then the loop will continue with the next queued signal
action.
Normally signals execute immediately. That is the whole reason
why almost all blocking APIs return when a signal is received
(with errno equal to EINTR).
Status: Open
Priority: Low. This design flaw has been around for ages and no one has yet
complained about it. Apparently the visibility of the problem is
very low.
o pthreads (sched/pthreads libs/libc/pthread)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Title: PTHREAD_PRIO_PROTECT
Description: Extend pthread_mutexattr_setprotocol(). It should support
PTHREAD_PRIO_PROTECT (and so should its non-standard counterpart
sem_setproto()).
"When a thread owns one or more mutexes initialized with the
PTHREAD_PRIO_PROTECT protocol, it shall execute at the higher of its
priority or the highest of the priority ceilings of all the mutexes
owned by this thread and initialized with this attribute, regardless of
whether other threads are blocked on any of these mutexes or not.
"While a thread is holding a mutex which has been initialized with
the PTHREAD_PRIO_INHERIT or PTHREAD_PRIO_PROTECT protocol attributes,
it shall not be subject to being moved to the tail of the scheduling queue
at its priority in the event that its original priority is changed,
such as by a call to sched_setparam(). Likewise, when a thread unlocks
a mutex that has been initialized with the PTHREAD_PRIO_INHERIT or
PTHREAD_PRIO_PROTECT protocol attributes, it shall not be subject to
being moved to the tail of the scheduling queue at its priority in the
event that its original priority is changed."
Status: Open. No changes planned.
Priority: Low -- about zero, probably not that useful. Priority inheritance is
already supported and is a much better solution. And it turns out
that priority protection is just about as complex as priority inheritance.
Excerpted from my post in a Linked-In discussion:
"I started to implement this HLS/"PCP" semaphore in an RTOS that I
work with (https://apache.nuttx.org) and I discovered after doing the
analysis and basic code framework that a complete solution for the
case of a counting semaphore is still quite complex -- essentially
as complex as is priority inheritance.
"For example, suppose that a thread takes 3 different HLS semaphores
A, B, and C. Suppose that they are prioritized in that order with
A the lowest and C the highest. Suppose the thread takes 5 counts
from A, 3 counts from B, and 2 counts from C. What priority should
it run at? It would have to run at the priority of the highest
priority semaphore C. This means that the RTOS must maintain
internal information of the priority of every semaphore held by
the thread.
"Now suppose it releases one count on semaphore B. How does the
RTOS know that it still holds 2 counts on B? With some complex
internal data structure. The RTOS would have to maintain internal
information about how many counts from each semaphore are held
by each thread.
"How does the RTOS know that it should not decrement the priority
from the priority of C? Again, only with internal complexity. It
would have to know the priority of every semaphore held by
every thread.
"Providing the HLS capability on a simple pthread mutex would not
be quite such a complex job if you allow only one mutex per
thread. However, the more general case seems almost as complex
as priority inheritance. I decided that the implementation does
not have value to me. I only wanted it for its reduced
complexity; in all other ways I believe that it is the inferior
solution. So I discarded a few hours of programming. Not a
big loss from the experience I gained."
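The bookkeeping described in that post can be illustrated with a small sketch. It is only a model of the A/B/C example, not an implementation: struct hls_thread_s, g_ceiling[], the priority values, and the hls_*() functions are all hypothetical. It shows why the RTOS would need per-thread counts for every semaphore in order to know when the priority may drop.

```c
#include <assert.h>

#define NSEMS 3   /* Semaphores A, B, and C from the example */

/* Hypothetical priority ceilings: A is lowest, C is highest. */

static const int g_ceiling[NSEMS] =
{
  10, 20, 30
};

struct hls_thread_s
{
  int base_priority;   /* Priority when no semaphore is held */
  int held[NSEMS];     /* Counts held from each semaphore */
};

void hls_take(struct hls_thread_s *t, int sem, int counts)
{
  assert(sem >= 0 && sem < NSEMS && counts > 0);
  t->held[sem] += counts;
}

void hls_release(struct hls_thread_s *t, int sem, int counts)
{
  assert(sem >= 0 && sem < NSEMS && counts > 0);
  assert(t->held[sem] >= counts);
  t->held[sem] -= counts;
}

/* The thread runs at the higher of its base priority and the highest
 * ceiling of any semaphore from which it still holds a count.
 */

int hls_effective_priority(const struct hls_thread_s *t)
{
  int prio = t->base_priority;

  for (int i = 0; i < NSEMS; i++)
    {
      if (t->held[i] > 0 && g_ceiling[i] > prio)
        {
          prio = g_ceiling[i];
        }
    }

  return prio;
}
```

With 5 counts taken from A, 3 from B, and 2 from C, the thread runs at the ceiling of C. Releasing one count on B changes nothing; the priority drops to the ceiling of B only after both counts on C are released.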
Title: INAPPROPRIATE USE OF sched_lock() BY pthreads
Description: In the implementation of standard pthread functions, the non-
standard NuttX function sched_lock() is used. This is very
strong since it disables pre-emption for all threads in all
task groups. I believe it is only really necessary in most
cases to lock threads in the task group with a new non-
standard interface, say pthread_lock().
This is because the OS resources used by a thread such as
mutexes, condition variables, barriers, etc. are only
meaningful from within the task group. So, in order to
perform exclusive operations on these resources, it is
only necessary to block other threads executing within the
task group.
This is an easy change: pthread_lock() and pthread_unlock()
would simply operate on a semaphore retained in the task
group structure. I am, however, hesitant to make this change:
In the FLAT build model, there is nothing that prevents people
from accessing the inter-thread controls from threads in
different task groups. Making this change, while correct,
might introduce subtle bugs in code by people who are not
using NuttX correctly.
Status: Open
Priority: Low. This change would improve real-time performance of the
OS but is not otherwise required.
o Message Queues (sched/mqueue)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
o Work Queues (sched/wqueue)
^^^^^^^^^^^^^^^^^^^^^^^^^^
Title: WORK QUEUE DELAY INACCURACIES
Description: Each queued work may have an optional delay value associated
with it. That delay should be with respect to the time that the
work is queued. However, since we do not know the time the
work is queued, the actual delay will be with respect to the time
that the work is processed. Under certain conditions, the
work may sit in the queue for some time before it is
processed, leading to an inaccuracy in the delay.
One solution might involve saving the time in the work
structure when the work is queued. Then the delay logic can
take the difference between the processing time and the
queued time to get a more accurate delay.
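A minimal sketch of that idea follows. The names struct delayed_work_s, tick_t, work_stamp(), and work_remaining() are hypothetical and are not the NuttX work queue API; time is assumed to be a free-running 32-bit tick counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t tick_t;   /* Assumption: free-running tick counter */

struct delayed_work_s
{
  tick_t queued_at;   /* Time at which the work was queued */
  tick_t delay;       /* Requested delay, relative to queued_at */
};

/* Record the queue time along with the requested delay. */

void work_stamp(struct delayed_work_s *work, tick_t now, tick_t delay)
{
  assert(work != NULL);
  work->queued_at = now;
  work->delay = delay;
}

/* Ticks still to wait when the work is finally examined.  The unsigned
 * subtraction gives the right elapsed time across counter wraparound.
 */

tick_t work_remaining(const struct delayed_work_s *work, tick_t now)
{
  tick_t elapsed;

  assert(work != NULL);
  elapsed = now - work->queued_at;
  return elapsed >= work->delay ? 0 : work->delay - elapsed;
}
```

Work that sat in the queue for 30 ticks with a 50 tick delay would then wait only 20 more ticks rather than a further 50.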
Status: Open
Priority: Low in all known use cases. A problem
would only occur if the work queue is overloaded or if work in
the work queue suspends waiting for a resource (both of which
are much bigger problems).
o Kernel/Protected Build
^^^^^^^^^^^^^^^^^^^^^^
Title: C++ CONSTRUCTORS HAVE TOO MANY PRIVILEGES (PROTECTED MODE)
Description: When a C++ ELF module is loaded, its C++ constructors are called
via sched/task_starthook.c logic. This logic runs in protected mode.
This is a security hole because the user code runs with kernel
privileges when the constructor executes.
Destructors likely have the opposite problem. They probably try to
execute some kernel logic in user mode? Obviously this needs to
be investigated further.
Status: Open
Priority: Low (unless you need to build a secure C++ system).
Title: TOO MANY SYSCALLS
Description: There are a few syscalls that operate very often in user space.
Since syscalls are (relatively) time consuming this could be
a performance issue. Here are some numbers that I collected
in an application that was doing mostly printf output:
sem_post - 18% of syscalls
sem_wait - 18% of syscalls
getpid - 59% of syscalls
--------------------------
95% of syscalls
Obviously system performance could be improved greatly by simply
optimizing these functions so that they do not need to make system calls
so frequently. This getpid() call is part of the re-entrant
semaphore logic used with printf() and other C buffered I/O.
Something like TLS might be used to retain the thread's ID
locally.
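As a rough sketch of the TLS idea, the thread ID could be fetched once and cached per thread. Everything here is hypothetical: sys_getpid() is a stand-in for the real system call, fast_getpid() is an illustrative name, and C11 _Thread_local is assumed to be available in user space.

```c
#include <assert.h>

int g_syscall_count;   /* Counts trips into the "kernel" (illustration) */

/* Hypothetical stand-in for the real getpid() system call. */

static int sys_getpid(void)
{
  g_syscall_count++;
  return 42;
}

/* Per-thread cached ID; -1 means "not fetched yet". */

static _Thread_local int tl_cached_pid = -1;

int fast_getpid(void)
{
  if (tl_cached_pid < 0)
    {
      tl_cached_pid = sys_getpid();
      assert(tl_cached_pid >= 0);
    }

  return tl_cached_pid;
}
```

Only the first call on each thread pays for the system call; every later call is a plain memory read.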
Linux, for example, has functions called up() and down(). up()
increments the semaphore count but does not call into the kernel
unless incrementing the count unblocks a task; similarly, down()
decrements the count and does not call into the kernel unless
the count becomes negative and the caller must be blocked.
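The fast path described above can be sketched with C11 atomics. This is only an illustration under assumptions: struct usem_s, usem_up(), and usem_down() are hypothetical names, and kernel_block() and kernel_wake_one() are stubs standing in for the real system calls (they only count calls and do not actually block or wake anything).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct usem_s
{
  atomic_int count;   /* Negative value: number of waiting tasks */
};

int g_kernel_calls;   /* Counts trips into the "kernel" (illustration) */

/* Hypothetical stand-ins for the blocking and waking system calls. */

static void kernel_block(struct usem_s *sem)
{
  (void)sem;
  g_kernel_calls++;
}

static void kernel_wake_one(struct usem_s *sem)
{
  (void)sem;
  g_kernel_calls++;
}

/* Take the semaphore.  Enter the kernel only if the count went
 * negative, that is, only if the caller must be blocked.
 */

void usem_down(struct usem_s *sem)
{
  assert(sem != NULL);
  if (atomic_fetch_sub(&sem->count, 1) <= 0)
    {
      kernel_block(sem);
    }
}

/* Release the semaphore.  Enter the kernel only if the previous count
 * was negative, that is, only if a waiter must be unblocked.
 */

void usem_up(struct usem_s *sem)
{
  assert(sem != NULL);
  if (atomic_fetch_add(&sem->count, 1) < 0)
    {
      kernel_wake_one(sem);
    }
}
```

In the uncontended case, which is the common one for the printf() locking measured above, neither function makes a system call.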
Update:
"I am thinking that there should be a "magic" global, user-
accessible variable that holds the PID of the currently
executing thread; basically the PID of the task at the head
of the ready-to-run list. This variable would have to be reset
each time the head of the ready-to-run list changes.
"Then getpid() could be implemented in user space with no system call
by simply reading this variable.