<!doctype html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=1200">
<script src="website/scripts/distill.pub_template.v2.js"></script>
<script src="https://d3js.org/d3.v7.min.js"></script>
<link rel="stylesheet" href="website/style.css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.css" integrity="sha384-zTROYFVGOfTw7JV7KUu8udsvW2fx4lWOsCEDqhBreBwlHI4ioVRtmIvEThzJHGET" crossorigin="anonymous">
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.js" integrity="sha384-GxNFqL3r9uRJQhR+47eDxuPoNE7yLftQM8LcxzgS4HT73tp970WS/wV5p8UzCOmb" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/contrib/auto-render.min.js" integrity="sha384-vZTG03m+2yp6N6BNi5iM4rW4oIwk5DfcNdFfxkk9ZWpDriOkXX8voJBFrAO7MpVl" crossorigin="anonymous"></script>
<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-PBVN09FJG8"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-PBVN09FJG8');
</script>
<script >
document.addEventListener("DOMContentLoaded", function() {
renderMathInElement(document.body, {
// customised options
// • auto-render specific keys, e.g.:
delimiters: [
{left: '$$', right: '$$', display: true},
{left: '$', right: '$', display: false},
{left: '\\(', right: '\\)', display: false},
{left: '\\[', right: '\\]', display: true}
],
// • rendering keys, e.g.:
throwOnError : false
});
});
</script>
</head>
<d-front-matter>
<script type="text/json">{"authors": []}</script>
</d-front-matter>
<d-title>
<h1>Requirements for self organization</h1>
<p>
What do you need to have a structure emerge spontaneously?<br>
</p>
</d-title>
<d-byline>
<div class="byline grid">
<div class="authors-affiliations grid">
<h3>Authors</h3>
<h3>Affiliations</h3>
<p class="author">
<a class="name" href="https://github.com/Francesco215">Francesco Sacco</a>
</p>
<p class="affiliation">
<a class="affiliation" href="https://allencenter.tufts.edu/">Allen Discovery Center at Tufts University</a>
</p>
<p class="author">
<a class="name" href="https://darsakthi.github.io/">Dalton Sakthivadivel</a>
</p>
<p class="affiliation">
<a class="affiliation" href="https://www.verses.ai/">VERSES</a>
</p>
<p class="author">
<a class="name" href="https://www.drmichaellevin.org/">Michael Levin</a>
</p>
<p class="affiliation">
<a class="affiliation" href="https://allencenter.tufts.edu/">Allen Discovery Center at Tufts University,</a>
</p>
<p class="author">
<a class="name" href=" "> </a>
</p>
<p class="affiliation">
<a class="affiliation" href="https://wyss.harvard.edu/"> Wyss Institute at Harvard University</a>
</p>
</div>
<div></div>
<div>
<h3>DOI</h3>
<p>10.5281/zenodo.8416764</p>
<h3>Video Abstract</h3>
<a href="https://youtu.be/cGcY-ReeGDU?si=FxXN_8mmSa-YK-GZ"><p>youtu.be/cGcY-ReeGDU</p></a>
</div>
</div>
</d-byline>
<script src="website/scripts/audience-selection.js"></script>
<d-article>
<div style="position:relative;" class="l-body">
<div id="colab-hero-div" style="float:left;" >
<div class="colab-root" id="colab-hero" onclick="showContent('b')">I am a Biologist
</div>
</div>
<div id="colab-hero-div" style="float:left;">
<div class="colab-root" id="colab-hero" onclick="showContent('m')">I am a ML researcher
</div>
</div>
<div id="colab-hero-div"style="float:left;">
<div class="colab-root" id="colab-hero" onclick="showContent('p')">I am a Physicist
</div>
</div>
</div>
<div style="padding:10px 10px ;"></div>
<div>
</div>
<div target_audience="n" style="display: block;">
<d-section id="abs">Adaptive Abstract</d-section>
<p>
This article is tailored to your preferences. Simply choose the researcher profile that resonates with you the most, and enjoy a personalized reading experience.
</p>
<p>
Feel free to join the <a href="https://github.com/Francesco215/Language_CA/discussions">discussion section</a> to exchange ideas and offer feedback. Additionally, you have the opportunity to propose edits to the article in the discussion or issue section, or even by submitting a pull request. Your input is valued!
</p>
</div>
<div target_audience="b" style="display: none;">
<d-section id="abs">Abstract for Biologists</d-section>
<p>
One of the most striking examples of self-organization is the formation of multicellular organisms, where most cells interact with their local environment through their cell membrane. The signals that each cell receives from its neighbors, and shifts in its own internal state, enable it to reach a specific cell type and self-regulate a variety of behaviors such as differentiation, proliferation, migration, and signaling. Despite the limited information available to each individual cell, incredibly complex organisms can be formed by cellular collectives. Crucially, the result is not just emergent complexity, but a kind of homeodynamic ability to take context-appropriate actions to reach the same large-scale form despite perturbations of internal components and external environment.
</p>
<p>
We have previously suggested that the ability of cell groups to traverse anatomical morphospace (during regulative development, regeneration, and cancer suppression) is a kind of unconventional problem-solving capacity that shares important features with other kinds of intelligence, such as familiar behavior and communication at the level of individual organisms.
</p>
<p>
In this work we show how the same process of self-organization that occurs in biological systems can be copied to improve narrative skills in linguistic space, where the words in a text self-organize to create a coherent whole. This is an important step towards characterizing the fundamental invariants that unify not merely the emergence of complexity, but goal-directed behavior, across diverse problem spaces and physical embodiments.
</p>
<p>
But what makes a system capable of self-organizing in the first place? In this paper we show that the ability of a system to self-organize relies on having the right topology for cell communication.
</p>
<p>
Furthermore, using tools from statistical physics, we propose that current state-of-the-art architectures for text generation, as well as biological brains, lack the appropriate topology to generate arbitrarily long, coherent output, and we show how organisms developed a habit of using the external environment to circumvent this problem. Thus, we show how exploring the symmetry of ideas between biology, physics, and computer science identifies important aspects of information-processing architectures relevant to diverse implementations.
</p>
</div>
<div target_audience="p" style="display: none;">
<d-section id="abs">Abstract for Physicists</d-section>
<p>
One of the things that makes phase transitions beautiful is their universality. Despite the vast variety and complexity of systems of many particles, there are just a few different phases of aggregation, and phase transitions are described by just a handful of critical exponents.
</p>
<p>
Furthermore, we observe that many systems tend to attain an ordered state at low temperatures. For instance, in Ising models, the ferromagnetic phase emerges with aligned spins, while materials exhibit a solid phase with a crystalline structure.
</p>
<p>
In this paper we are going to show that asking if it is possible to generate arbitrarily long, coherent text is equivalent to asking if the system is capable of having an ordered phase.
</p>
<p>
But what makes a system capable of self-organizing in the first place? In this paper we show that the ability of a system to self-organize relies on having the right topology for cell communication. Furthermore, we show that current state of the art architectures for text generation lack the appropriate topology to be able to generate arbitrarily long, coherent text.
</p>
</div>
<div target_audience="m" style="display: none;">
<d-section id="abs">Abstract for ML researchers</d-section>
<p>
The complexity of generating a text with $N$ words using the Transformer architecture is $O(N^2)$ <d-cite key="vaswani2017attention"></d-cite>. This makes it computationally infeasible to generate coherent pieces of text as long as books. Self-organizing systems, on the other hand, have two big strengths that transformers lack:
</p>
<p>
<ul>
<li>
Efficiency: self-organizing systems often exhibit efficient resource allocation and optimization of processes. The emergent patterns and structures that arise from local interactions can lead to efficient distribution of tasks, information, or resources, resulting in overall system efficiency.
</li>
<li>
Scalability: self-organizing systems can often scale effectively. As the number of components increases, the system can still organize itself without requiring additional centralized control. This property is valuable in various domains, such as computer networks, where scalability is crucial.
</li>
</ul>
</p>
<p>Inspired by biology and using tools from statistical physics, we show that:</p>
<p>
<ul>
<li>
Autoregressive models, no matter the architecture, the number of parameters, the window size, or the training procedure, will never be able to generate text that is coherent at arbitrarily long distances.
</li>
<li>
Transformers whose attention matrices have a specifically crafted topology can, in principle, generate text that is coherent at arbitrarily long distances.
</li>
<li>
An agent with a finite context size can be capable of enacting long-term plans if it is able to interact with an external environment by deliberately storing and retrieving information.
</li>
</ul>
</p>
</div>
<div target_audience="b" style="display: block;">
<d-section id="abs">Introduction for Biologists</d-section>
</div>
<div target_audience="p" style="display: none;">
<d-section id="abs">Introduction for Physicists</d-section>
</div>
<div target_audience="m" style="display: none;">
<d-section id="abs">Introduction for ML researchers</d-section>
</div>
<div target_audience="bmp">
<figure class="l-side">
<img src="website/images/organizing.webp">
<figcaption>
In many self-organizing systems, you see hierarchical structures emerge
</figcaption>
</figure>
<p>
Advancements in machine learning technologies have brought about a revolution in various areas of biomedical research, by using AI as a tool to facilitate advancing existing paradigms in the field. However, can the exchange of ideas between biology and machine learning work both ways? Can insights from biology prove beneficial in the field of machine learning?
</p>
<div target_audience="bmp" style="display: block;">
<p>
Specifically, can we find powerful invariants across domains, i.e. principles of system-level function which help to unify phenomena that have, until now, been treated as disparate sub-fields <d-cite key="fields2023regulative, fields2020scale"></d-cite>? The emerging discipline of “diverse intelligence” combines biology, computer science, and behavioral science to study the features common to active agents of very different provenance (evolved, engineered, and hybrid), material embodiment, and intelligence level
<d-cite key="lyon2021reframing, lyon2006biogenic, levin2022technological, miller2023revised, baluvska2023cellular, reber2021cognition, sridhar2021geometry, rahwan2019machine, deisboeck2009collective"></d-cite>.
</p>
<p>
A highly related concept is that of collective intelligence and swarm cognition <d-cite key="couzin2009collective, bazazi2008collective, couzin2007collective"></d-cite>, which is developing conceptual tools to help understand how evolution pivots and enlarges problem-solving capabilities of individual subunits into greater degrees of competency in larger problem spaces <d-cite key="levin2019computational"></d-cite>.
</p>
<p>
One of the most interesting aspects of biology is morphogenesis - the ability of cell groups to harness individual cell behaviors into reliable construction, repair, and maintenance of complex anatomical forms. We have recently argued that it is empirically fruitful to view this process not merely as emergence of complexity but as a kind of behavior (in anatomical morphospace) of a collective intelligence of cells which is able to solve a variety of problems <d-cite key="pio2023scaling, mathews2023cellular, levin2023bioelectric, levin2023darwin, levin2022collective, lagasse2023future"></d-cite>.
</p>
</div>
<p>
This paper aims to showcase how the process of self-organization, observed in biological systems, can be harnessed to develop coherent narrative skills within linguistic space. By allowing words in a text to self-organize, a cohesive and meaningful whole can be created.
</p>
<p>
In a textual context, words organize themselves into sentences, which further organize into chapters, ultimately forming complete books; similarly, in biological organisms, cells organize into specific structures, which then combine to form organs, culminating in fully grown organisms and swarms.
</p>
<p>
Certain machine learning algorithms, such as neural cellular automata, have shown remarkable ability to replicate aspects of the behavior of cells in biological organisms <d-cite key="mordvintsev2020growing"></d-cite> .
</p>
<p>
Here, we demonstrate that attempting to teach a machine learning algorithm to generate text by simply providing each word with information solely from its neighboring words is unsuccessful in achieving self-organization.
In this paper we will prove that this is because text is a 1-dimensional sequence of words, and self-organization in 1D is impossible. Biological organisms are 3-dimensional, so they don't have this problem.
</p>
<p>
We also show that, in principle, it is possible to generate long, coherent text by making distant elements of the text interact via long-range connections.
</p>
<p>
Furthermore, we demonstrate that biological brains are likely incapable of retaining memories forever, and thus many species, including humans, have developed skills that allow them to store information for the long term by interacting with the external environment.<br> Thus, interacting with the external environment is indispensable for long-term storage in all living beings.
</p>
</div>
<div target_audience="b" style="display: block;">
<p>
These results support the consistent patterns we discussed earlier, highlighting the need for systems trying to store memories or maintain patterns to put that information into their environment. This need arises due to limitations in their self-organizing abilities. Using a generic model we prove that a large class of self-organizing systems face fundamental restrictions on how long they can maintain coherence, and can escape these limitations by writing information to a static environment. We follow up with examples like stigmergy and the human brain which fall under the “universality class” of self-organizational behavior described by this model.
</p>
<p>
<b>Why focus on text to study self-organization?</b>
<ul>
<li>
Long-range cohesion is a necessary feature of text, and it is something that current language processing algorithms don't possess. This means that any improvement on the theoretical front will directly translate into an application in the field of natural language processing.
</li>
<li>
In text, spotting a misspelled word is easy. In biological (and virtual) organisms, spotting an out-of-place cell is hard.<br>
This raises the bar of difficulty, providing a more rigorous test of the abilities of self-organizing systems.
</li>
<li>
It’s easier to run experiments on code rather than biological organisms.
</li>
<li>
Since text is a 1-dimensional sequence it can give insights into other types of 1D sequences, for example a sequence of actions performed by an agent.<br>
Having long term coherence for a sequence of actions performed by an agent means that the agent is able to stick to a plan and use information from the distant past to make decisions on what to do next.
</li>
</ul>
</p>
</div>
<!-- <p>
This results not only apply to text generation, but to any generative algorithm, and in reinforcment learning
</p> -->
<d-subsection>Morphogenesis and Language Modelling</d-subsection>
<div>
<figure class="l-side">
<img src="website/images/navigation.webp">
<figcaption>
Side by side comparison between text generation and morphogenesis
</figcaption>
</figure>
<p>
Many phenomena in self-organizing systems are interpreted in the literature as navigation in a morphological space
<d-cite key="fields2022competency"></d-cite>.
</p>
<p>
In various fields like biology, linguistics and computer science, a morphological space refers to a conceptual space where each point represents a specific configuration or form of an entity. The term "morphological" refers to the form, structure, or shape of something.
</p>
<p>
Navigating a morphological space involves exploring and moving through this conceptual space to observe or analyze the different configurations or forms that an entity can take. It is often used to understand the relationships between different variations or possibilities within a system. Better configurations are associated with a lower energy value, and the system evolves rolling down this energy landscape.
</p>
<p>
One common example is in biology, where the concept of morphological space can be used to describe the range of physical forms that a species can exhibit <d-cite key="stone1997spirit, raup1965theoretical, abzhanov2017old"></d-cite>. By navigating this space, the organism changes shape over time. Both processes of morphogenesis and damage repair can be thought of as navigating this morphological space towards the stable adult form of the organism. Damage becomes impossible for the organism to heal if it puts the organism at a point in morphological space outside the basin of attraction of the stable configuration.
</p>
<p>
We propose that a number of biological phenomena, especially morphogenetic control, are kinds of navigation of a problem space (e.g., physiological, transcriptional, metabolic, and behavioral spaces) <d-cite key="fields2022competency"></d-cite>. This suggests a research program to understand what basic features of the organization of a swarm of subunits are necessary to enable efficient behavior of the system toward specific goals in various spaces, and potential applications driven by the use of basic biological principles to advance engineering and AI.
</p>
<p>
Drawing parallels between text generation and morphogenesis, text generation starts with a prompt (analogous to an embryo) and progresses, generating the rest of the text until completion. Similarly, correcting errors in text, such as misspelled or missing words, is akin to repairing cell-level defects, and repairing damage in the whole organism can be associated with keeping a conversation on-topic. Both tasks are error correction procedures that involve navigating a morphological space to rectify errors and achieve the desired final state.
</p>
<p>
Specifically, we are interested in discovering the architectures needed to facilitate the kind of collective behavior that enables efficient competency of the collective, and deploying them in other domains. For our test case of moving across domains and exploring plasticity, we examined the parallels between maintaining a trajectory in morphospace (reliably creating a complex anatomy in novel circumstances) and maintaining a trajectory in linguistic space (keeping the thread of a coherent narrative).
</p>
</div>
<details target_audience="b m" open="">
<summary class="custom_spoiler" style="display: inline-block">
<d-section id="intro">The simplest example of self-organization: The Ising Model</d-section>
</summary>
<p>
Initially, when students delve into the study of the Ising model, they often find it utterly useless and struggle to comprehend why physicists are so fixated on this seemingly mundane model.<br><br>
The problem is that it's hardly ever mentioned that the Ising model is key to understanding how self-organization works: it is the simplest example of self-organization. Thus, if you want to understand how more complex self-organizing systems work, you have to start by understanding how the Ising model works.
</p>
<p>
But what exactly is the Ising model? Originally conceived as a simple explanation for magnetization effects in matter, the Ising model explores the interplay between interacting spins. These spins, when in proximity, have a preference to align with each other, leading to an interaction energy denoted by $H=-Js_1\cdot s_2$.<br><br>
What happens when lots of spins interact? Well, it depends on how they interact: the Ising model comes in many different flavors, and we are going to look at the three simplest ones: the fully connected model, the 1D chain, and the 2D lattice.
</p>
<d-subsection>Fully connected Ising model</d-subsection>
<p>
In the fully connected Ising model we have $N$ spins, all interacting with each other. The Hamiltonian of the system is
$$H=-\frac JN\sum_{i\neq j}s_is_j$$
Now let’s see how the system behaves as a function of the temperature.
</p>
<figure>
<canvas id="FC_ising-canvas" width="700" height="700"></canvas>
<div>
<label for="temperature-slider">$T=$</label>
<span id="FC_temperature-value" class="value">1.0</span>
<input type="range" id="FC_temperature-slider" min="0.1" max="5.0" step=".01" value="1.0">
</div>
<figcaption>
Simulation of the Fully connected Ising Model
</figcaption>
</figure>
<p>
As you can see, below a certain <i>Critical Temperature</i> $T_\textrm c$ you have an <i>Ordered Phase</i>, while above it you have a <i>Disordered Phase</i>.
</p>
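<p>
To make the kind of dynamics behind the simulation above concrete, here is a minimal Metropolis-style sketch of the fully connected model. This is only a sketch: it assumes numpy, and the parameters and update schedule are illustrative rather than the code actually driving the figure.
</p>
<d-code block language="python">
import numpy as np

def metropolis_step(spins, J, T, rng):
    """One Metropolis update for H = -(J/N) * sum_{i != j} s_i s_j (fully connected)."""
    N = len(spins)
    i = rng.integers(N)
    # Flipping s_i changes the sign of each of its 2(N-1) ordered-pair terms, so:
    delta_E = (4.0 * J / N) * spins[i] * (spins.sum() - spins[i])
    if np.exp(-delta_E / T) > rng.random():   # accept with probability min(1, e^{-dE/T})
        spins[i] *= -1

rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=200)
for _ in range(50_000):
    metropolis_step(spins, J=1.0, T=0.5, rng=rng)
print("magnetization per spin:", spins.mean())   # close to +-1 well below T_c, near 0 above
</d-code>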
<d-subsection>1D Ising Model</d-subsection>
<p>
In the 1D Ising Model, spins are arranged in a chain and interact with their neighboring spins. The system's Hamiltonian can be expressed as:
$$
H=-J\sum_{i}s_is_{i+1}
$$
Now, let's examine the behavior of this system in relation to temperature
</p>
<figure>
<canvas id="canvas1D" width="700" height="600"></canvas>
<div>
<label for="temperature">$T=$</label>
<span id="1D_temperature-value" class="value"></span>
<input type="range" id="temperature" min="0.1" max="3" step="0.1" value="1">
</div>
<figcaption>
Simulation of the 1D Ising model: each spin interacts with the spin before it and the spin after it. For space efficiency, the spins have been arranged in a serpentine shape. As you can see, even at low temperature the system stays disordered.
</figcaption>
</figure>
<p>
As observed, regardless of how low the temperature is, the system never manages to achieve an ordered phase. This can be proven mathematically with the following theorem.
</p>
<b class="theorem">Theorem</b>
<p class="theorem_text">
The 1D Ising chain doesn't have an ordered phase
</p>
<details>
<summary>
<b>proof</b>
</summary>
<div class="proof_text">
<p>
At thermal equilibrium, the Free Energy $F$ is minimized.<d-footnote>This is because the Free Energy measures the amount of useful energy that can be extracted from a thermodynamic system. At thermal equilibrium, no useful energy can be extracted.</d-footnote></p>
<p>
To demonstrate that the ordered state, where all spins align, is not the minimum of Free Energy, we have to examine what occurs when a domain wall forms.
</p>
<figure>
<img src="website/images/domain_wall.png" style="width:100%; height:auto; float:left;">
<figcaption>
The top image shows the ordered phase of the Ising model, while the bottom image depicts a state with a domain wall.
</figcaption>
</figure>
<p>
Upon the formation of a domain wall, the energy increases by $2J$. This is because a single link now comprises two opposite spins.<br><br>
If the chain is $L$ spins long, there can be $L-1$ possible locations for the domain wall. Consequently, the size of the configuration space where a domain wall is present, denoted as $\Omega$, equals $\Omega = L-1$. Since entropy is the logarithm of the configuration space, we have $\Delta S = \log(L-1)$.<br><br>
Therefore, the change in Free Energy upon the formation of a domain wall is given by:
$$
\Delta F= \Delta E -T\Delta S= 2J - T\log(L-1)
$$
In the thermodynamic limit, as $L$ approaches infinity, and since $J$ is a constant, we find that for any non-zero temperature $T$, $\Delta F < 0$.<br><br>
Thus, no matter how low the temperature (or the noise) is, it is impossible to reach an ordered state.
</p>
</div>
</details>
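<p>
The free-energy balance in the proof is easy to check numerically. The sketch below simply evaluates $\Delta F = 2J - T\log(L-1)$ for growing chain lengths, with arbitrary illustrative values of $J$ and $T$.
</p>
<d-code block language="python">
import numpy as np

J, T = 1.0, 0.1                           # coupling and (low) temperature, illustrative values
for L in [10, 1_000, 10**6, 10**9]:
    delta_F = 2 * J - T * np.log(L - 1)   # free-energy cost of a single domain wall
    print(f"L = {L:>12d}   delta_F = {delta_F:+.3f}")
# However low T is, delta_F eventually turns negative as L grows:
# domain walls proliferate and the 1D chain cannot stay ordered.
</d-code>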
<d-subsection>2D Ising model</d-subsection>
<p>
In the 2D Ising model the spins are arranged in a grid, and each one of them interacts with its neighbors. The system's Hamiltonian can be expressed as
$$
H=-J\sum_{\langle i,j\rangle} s_is_j
$$
where $\langle i, j\rangle$ means that we sum over all the neighboring spins. Let's see how it behaves.
</p>
<figure>
<canvas id="canvas2D" width="700" height="700" style="border: 1px solid #000;"></canvas>
<div>
<label for="temperature2D">$T=$</label>
<span id="temperature2D-value" class="value">1.0</span>
<input type="range" id="temperature2D" min="0.1" max="3" step="0.1" value="1">
</div>
<figcaption>
Simulation of the 2D Ising model. If you hover over the image, the interaction window is displayed in orange. The critical temperature is at $T=1$.
</figcaption>
</figure>
<p>
As you can see, when the temperature is sufficiently low, the system will slowly reach an ordered phase.
</p>
<b class="theorem">Theorem</b>
<p class="theorem_text">The 2D Ising model does have an ordered phase </p>
<details>
<summary>
<b>proof</b>
</summary>
<div class="proof_text">
<p>The following proof is also known as <i>Peierls Argument</i> <d-cite key="bonati2014peierls, peierls1936ising"></d-cite>.</p>
<p>
Suppose we start in a configuration where all the spins are in the $+1$ state. We now create a domain with a perimeter $\mathcal P$ made of spins in the $-1$ state, and see what happens to the free energy.
</p>
<p>
The change in energy will be $$\Delta E= 2J\times \mathcal P$$
where $J$ is the coupling strength between spins.
</p>
<p>
The change in entropy is harder to estimate; fortunately, we only need an upper bound.<br>
We need to find the number of domains with perimeter $\mathcal P$.
</p>
<p>
We can always fit a curve of perimeter $\mathcal P$ inside a box $\mathcal P\times \mathcal P$ centered at the origin, so the number of starting points is at most $\mathcal P^2$. Then, at each step along the perimeter, the curve has at most 3 choices for where to step next, so for each starting point there are at most $3^{\mathcal P}$ curves <d-footnote>The choices are usually fewer than 3 because the curve is a closed self-avoiding curve, and satisfying this constraint usually leaves fewer choices. It can be proven that there are at least $2^{\mathcal P/2}$ possible configurations</d-footnote>
</p>
<p>
We can thus conclude that the number of possible configurations is $\Omega\le \mathcal P^23^\mathcal P$, and the change in entropy is
$$
\Delta S\le \mathcal P\log3 +2\log\mathcal P
$$
Putting it all together we have that
$$\Delta F \ge 2J\mathcal P - T \mathcal P\log3 - 2T\log \mathcal P$$
If $T$ is sufficiently low we have $\Delta F>0$; thus, at low temperature, the 2D Ising model at equilibrium tends to an ordered phase.
</p>
</div>
</details>
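<p>
As a quick sanity check of Peierls' argument, the sketch below evaluates the lower bound $\Delta F \ge \mathcal P(2J - T\log 3) - 2T\log\mathcal P$ for growing perimeters; the coupling and temperatures are arbitrary illustrative values.
</p>
<d-code block language="python">
import numpy as np

def peierls_lower_bound(P, J, T):
    """Lower bound on the free-energy cost of a domain of perimeter P in the 2D Ising model."""
    return P * (2 * J - T * np.log(3)) - 2 * T * np.log(P)

J = 1.0
for T in [0.5, 5.0]:                      # one temperature well below 2J/log(3), one well above
    bounds = [round(peierls_lower_bound(P, J, T), 1) for P in (10, 100, 1000)]
    print(f"T = {T}: bounds for P = 10, 100, 1000 are {bounds}")
# When T is small enough, the bound grows with P, so large domains are suppressed
# and an ordered phase can survive; at high T the bound collapses.
</d-code>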
<p>
However, there is a caveat here: the theorem doesn't tell us anything about how much time the system takes to reach equilibrium; in fact, if the system is sufficiently large, it can take more than the age of the universe. In this article we will not worry about the time needed to reach equilibrium, as this lies outside the scope of this work.
</p>
<p>
If you want some great resources for learning about the Ising model and phase transitions in general, I suggest you look at these books <d-cite key="tong2017statistical, kardar2007statistical"></d-cite>.
</p>
<details>
<summary><b>Ising model on a Tree</b></summary>
<p>
Many self-organizing systems seem to gravitate towards a hierarchical structure; in earlier sections of the paper we talked about how cells, and words in a text, organize into bigger and bigger structures. The same goes for how people organize into teams, companies, and so on.
</p>
<p>
So, let's take a brief look at the Ising model on a tree<d-cite key="breuckmann2020critical"></d-cite>. <br>
Interestingly, the Ising model on a tree can be thought of as a 2D Ising model in hyperbolic geometry. This means that the proof for the existence of an ordered phase is essentially the same as the one in the 2D case, but with the extra flavor of hyperbolic geometry.
</p>
<figure>
<img src="website/images/hyperbolic_tiling.png" width="100%">
<img src="website/images/pentagon_tree.png" width="100%">
<figcaption>
Ising model in hyperbolic geometry. On the left, the triangular hyperbolic lattice is equivalent to a tree where each node has two children; on the right, the pentagonal hyperbolic lattice is equivalent to the tree below it.
</figcaption>
</figure>
<p>
It also has the additional benefit that the average distance between a pair of spins grows as $\log N$, so information needs to travel shorter distances to get from one point to another.
<!-- WRITE SOME OF THE ADVANTAGES ON REACHING THE EQULIBRIUM FASTER (does it?). maybe an interesting note would be the interesting properties of error correcting codes in hyperbolic geometry -->
</p>
</details>
<!-- <details>
<summary><b>1D Ising model with a fully connected spin</b></summary>
<p>
As a last example we will consider a slightly modified 1D Ising chain. In this example there is an extra spin particle $e=\pm 1$ linked to all the other spins
$$
H=-J\sum_{i}(s_is_{i+1} +es_i)
$$
With this simple extra modification the system is capable or reaching an ordered phase. The proof is left as an excercise for the reader.
PUT THE IMAGE OR SIMULATION HERE
</p>
</details> -->
</details>
<d-subsection>Main takeaway from the Ising model</d-subsection>
<p>
We have seen that the ability of a self-organizing system to reach an ordered state can indeed be understood through the lens of phase transitions. In fact developmental biology has studied this phenomenon in the context of planar polarity of cells in a 2D tissue <d-cite key="goodrich2011principles, zallen2007planar, henderson2018planar"></d-cite>, including the use of Ising models <d-cite key="weber2016cellular, tuszynski2012mean, torquato2011toward"></d-cite> to understand collective decision-making in this patterning task.
</p>
<p>
When it comes to self-organizing systems, the concept of an ordered state typically refers to a highly organized and coherent configuration. For example, in the Ising model, an ordered state corresponds to spins aligning in a particular direction. In other self-organizing systems, it may involve patterns, structures, or coordinated behaviors emerging among the individual elements of the system.
</p>
<p>
Understanding the behavior of self-organizing systems in terms of phase transitions allows us to analyze their equilibrium properties, and gain insights into the underlying mechanisms driving the emergence of order. By studying the conditions and factors that influence phase transitions in self-organizing systems, we can better comprehend their ability to reach ordered states.
</p>
<p>
We will now give an overview of a more complicated self-organizing system through the lens of phase transitions: the Hopfield network.
</p>
<d-section>Associative Memory</d-section>
<p>
The brain is capable of storing and retrieving memories, but the way it does so is very different from how memories are managed in a computer.
<ul>
<li>
<b>Address-based memories</b>, the kind of memories stored in a computer, work as follows: given a memory address, they return the information stored at that specific address.
</li>
<li>
<b>Associative memories</b> are how memories are stored and retrieved in human brains. In an associative memory system, when given an input, the system returns the most closely associated memory it has stored; furthermore, a memory becomes easier to retrieve the more “hints” or related memories are given.
</li>
</ul>
</p>
<p>
An example of an associative memory is a search engine: what a search engine does is return the results most closely associated with the query given by the user.
</p>
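<p>
Here is a minimal sketch of the distinction (the stored items and the similarity measure are made up for illustration): an address-based memory is a plain lookup table, while an associative memory returns the stored item closest to the query.
</p>
<d-code block language="python">
import numpy as np

# Address-based memory: exact address in, stored content out.
ram = {0x10: "homer", 0x20: "marge"}
print(ram[0x10])                                   # "homer"

# Associative memory: a (possibly noisy or partial) pattern in, the closest stored pattern out.
stored = {"homer": np.array([1, 1, -1, -1]), "marge": np.array([-1, 1, 1, -1])}

def recall(query):
    # Return the stored pattern with the largest overlap (dot product) with the query.
    return max(stored, key=lambda name: stored[name] @ query)

print(recall(np.array([1, 1, -1, 1])))             # a noisy "homer" still retrieves "homer"
</d-code>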
<p>
In general, an associative memory can be modeled as an energy landscape with a discrete number of minima, each of which corresponds to a stored memory.
<figure>
<img src="website/images/associative_memory-1.png" width="100%">
<figcaption>
Example of associative memory; in this case each member of the Simpson family has its own energy minimum. When given a guess, the system will converge to the closest Simpson family member. Images adapted from <d-cite key="ramsauer2020hopfield"></d-cite>.
</figcaption>
</figure>
</p>
<p>
Associative memories, just like the memory inside a computer, have finite capacity. How do we quantify this?<br>
Intuitively, what ends up happening when you try to store too many separate memories is that the configuration space becomes too crammed and some of the saved patterns fuse together.
<d-figure>
<img src="website/images/associative_memory-2.png" width="100%">
<figcaption>
Example of a saturated associative memory: when lots of patterns are stored, minima that are close together fuse and are no longer distinguishable. Images adapted from <d-cite key="ramsauer2020hopfield"></d-cite>.
</figcaption>
</d-figure>
</p>
<d-subsection>Hopfield Networks</d-subsection>
<p>
Hopfield networks <d-cite key="hopfield1982neural"></d-cite> are one of the most famous examples of associative memories, and it turns out that, when generalized, they are equivalent to the Transformer architecture <d-cite key="ramsauer2020hopfield"></d-cite>. They are the perfect workbench for understanding the capabilities of associative memories.
</p>
<p>
The equation of the standard Hopfield network with $N$ stored patterns is the following:
$$
H=-\sum_{ij}W_{ij}s_is_j\quad\textrm{ with }\quad W_{ij}=\sum_p^NX^p_{i}X^p_{j}
$$
Notice how this is similar to the Hamiltonian of the fully connected Ising model <d-footnote>Actually, if there is just one stored pattern they are equivalent up to a gauge transformation $s_i\to X_is_i$</d-footnote>. What it effectively does is create a number of local minima equal to the number of stored patterns; here is a simulation.
<d-figure>
<canvas id="canvasHop" width="600" height="600"></canvas>
<div>
<label for="temperatureHop-slider">$T=$</label>
<span id="temperatureHop-value">1.0</span>
<input type="range" id="temperatureHop-slider" min="0.1" max="3.0" step=".01" value="1.0">
</div>
<figcaption>
Here is a simulation of the Hopfield model; the two stored patterns are images of Homer and Marge Simpson.<br>
You can go from one saved pattern to the other by increasing the temperature and then lowering it again (it works about 50% of the time). If you don't want to wait for the system to thermalize, you can just reload the page.
</figcaption>
</d-figure>
</p>
<p>
As you can see at low temperature the system converges to the closest state, but at high temperature the system is noisy.
</p>
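<p>
Here is a minimal sketch of a standard (binary) Hopfield network with the Hebbian weights $W_{ij}=\sum_p X^p_{i}X^p_{j}$ defined above and asynchronous zero-temperature updates; the stored patterns are random placeholders rather than the Simpson images used in the figure, and all sizes are illustrative.
</p>
<d-code block language="python">
import numpy as np

rng = np.random.default_rng(0)
n_spins, n_patterns = 200, 2
X = rng.choice([-1, 1], size=(n_patterns, n_spins))   # stored patterns X^p
W = X.T @ X                                           # Hebbian couplings W_ij = sum_p X^p_i X^p_j
np.fill_diagonal(W, 0)

def recall(s, n_sweeps=10):
    """Asynchronous updates s_i = sign(sum_j W_ij s_j): zero-temperature dynamics on H = -sum_ij W_ij s_i s_j."""
    s = s.copy()
    for _ in range(n_sweeps):
        for i in rng.permutation(n_spins):
            s[i] = 1 if W[i] @ s > 0 else -1
    return s

noisy = X[0] * rng.choice([1, -1], p=[0.85, 0.15], size=n_spins)   # corrupted copy of pattern 0
recovered = recall(noisy)
print("overlap with pattern 0:", recovered @ X[0] / n_spins)       # close to 1: the memory is retrieved
</d-code>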
<d-subsection>Hopfield networks with local interaction</d-subsection>
<p>
Many of the self-organizing systems in real life are capable of reaching a highly ordered state even with just local interactions, so what happens if the Hamiltonian of the Hopfield network makes spins interact only if they are close by?
$$
H=-\sum_{\langle i,j\rangle}W_{ij}s_is_j\quad\textrm{ with }\quad W_{ij}=\sum_p^NX^p_{i}X^p_{j}
$$
In this case $\langle i,j\rangle$ means that $i$ and $j$ are in the same window of interaction. Here is a simulation to play with.
</p>
<figure id="local_hop_sim">
<canvas id="canvasHop2D" width="600" height="600"></canvas>
<div>
<label for="temperatureHop2D-slider">$T=$</label>
<span class="value" id="temperatureHop2D-value">1.0</span>
<input type="range" id="temperatureHop2D-slider" min="0.01" max="5.0" step=".1" value="1.0">
<label for="windowHop2D-slider">$W=$</label>
<span class="value" id="windowHop2D-value">5</span>
<input type="range" id="windowHop2D-slider" min="3" max="21" step="2" value="9">
</div>
<figcaption>
Example of a Hopfield network with just local interactions. By hovering the mouse over the figure you can see all the interaction windows. You can set the size of the interaction window $W$ by moving the corresponding slider.
</figcaption>
</figure>
<p>
As you can see, the system struggles more to converge to one of the saved patterns, and the smaller the window size is, the harder it becomes.<br>
What we can conclude even from this small demo is that the memory capacity of a local Hopfield model is lower; we will later analyze by how much.
</p>
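<p>
The local variant only requires masking the Hebbian couplings so that spins interact within a finite window. Here is a minimal sketch (using a 1D window for simplicity, whereas the demo above uses a 2D one; the sizes and noise level are illustrative).
</p>
<d-code block language="python">
import numpy as np

rng = np.random.default_rng(1)
n_spins, window = 200, 9
X = rng.choice([-1, 1], size=(2, n_spins))                        # two stored patterns
idx = np.arange(n_spins)
mask = (window // 2 - np.abs(idx[:, None] - idx[None, :])) >= 0   # 1D interaction window of width W
W = (X.T @ X) * mask                                              # Hebbian couplings kept only inside the window
np.fill_diagonal(W, 0)

s = X[0] * rng.choice([1, -1], p=[0.85, 0.15], size=n_spins)      # corrupted copy of pattern 0
for _ in range(20):                                               # asynchronous zero-temperature sweeps
    for i in rng.permutation(n_spins):
        s[i] = 1 if W[i] @ s > 0 else -1
print("overlap with pattern 0:", s @ X[0] / n_spins)              # compare with the fully connected case
</d-code>
<p>
The only change with respect to the fully connected sketch is the element-wise mask applied to $W$; everything else is identical.
</p>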
<!-- <d-subsection>Transformers and Hopfield networks</d-subsection>
<p>
MAYBE ITS BEST TO PUT THIS SUBSECTION LATER
</p> -->
<d-section>
What makes a self-organizing system capable of organizing?
</d-section>
<p>
So far we have seen that
<ul>
<li>The 1D Ising model is not capable of self-organizing into an ordered phase</li>
<li>The 2D Ising model is capable of self-organizing into an ordered phase, but it takes time</li>
<li>The fully connected Ising model is capable of self-organizing into an ordered phase, and it does so quickly</li>
<li>The standard Hopfield network, which is a generalization of the fully connected Ising model, is capable of self-organizing quickly, but it doesn't have a large memory capacity</li>
<li>Windowed Hopfield networks, which are a generalization of the 2D Ising model, are in principle capable of self-organizing, but their memory capacity is lower than that of standard Hopfield networks, and it takes time for them to self-organize</li>
</ul>
But is there a unified way of determining whether any system is capable of self-organizing?
</p>
<p>
It turns out that there is: a system can be capable of self-organizing if it is topologically equivalent to an Ising model which has a condensed phase at low temperatures.
</p>
<p>
But before proving this in the general case, we will first prove why autoregressive models for text are incapable of generating text that remains coherent at arbitrary lengths, as this gives us the intuition for the proof in the more general case.
</p>
<details>
<summary style="display: inline-block" class="custom_spoiler">
<h3 class="topo">Topological requirements for self-organization</d-subsection>
</summary>
<p>
First, we generalize the values that the spins can take: instead of $-1$ and $1$, they can now take any integer value from $1$ to $V$<d-footnote>For now we are going to ignore the possibility of having continuous-valued elements</d-footnote>
</p>
<d-subsection>The 1D case</d-subsection>
<b class="Definition">Definition: Potts Chain</b>
<p class="definition_text">
A Potts chain $\mathcal C$ is a chain of spins $s_i$ with $i \in \mathbb{Z}$, where each $s_i$ can assume any integer value from $1$ to the vocabulary size $V$.
</p>
<p>
Ok, now we are going to define how these new spins interact with one another. We want them to interact only with their neighbours, thus only interactions inside a finite window are non-zero.
</p>
<b class="Definition">Definition: Hamiltonian Window</b>
<p class="definition_text">
A Hamiltonian window $H_i$ is a Hamiltonian that acts only on the spins inside its finite window of interaction $\mathcal W_i$. Without loss of generality, we are going to assume that the lowest energy state has a value of zero.
</p>
<b class="Definition">Definition: Local Hamiltonian</b>
<p class="definition_text">
A local Hamiltonian $H=\sum_i H_i$ is a Hamiltonian that can be written as the sum of several windowed Hamiltonians $H_i$, all with the same window width $\mathcal W$. For the sake of simplicity, we are going to assume that there exists an upper bound to the highest energy of every windowed Hamiltonian, $E^\textrm{max}_i < E^\textrm{max}$.
</p>
<figure style="width: 1050px; left: -173px;">
<div class="graph-container">
<svg id="graph"></svg>
</div>
<figcaption>
The time evolution of this chain is governed by a local Hamiltonian which is the sum of several Hamiltonian windows; you can see all the Hamiltonian windows by hovering on top of the chain elements. In this case the elements of the sequence interact in groups of five.
</figcaption>
</figure>
<p>
As you can see, this is a very general structure that can be used to model cells, agents, particles, or words arranged in a line or evolving through time.<br>
Ok, now what is a memory in this kind of situation? A stored pattern is a particular configuration of spins with zero energy, or more precisely:
</p>
<b class="Definition">Definition: Memory or Pattern stored</b>
<p class="definition_text">
A stored pattern $\mathcal P$ is a particular sequence of spins $(\dots, s_{-1}, s_0, s_1,\dots )$ such that the energy of the Hamiltonian $H$ associated with this pattern is equal to zero. If more than one stored pattern is present, they can be numbered as $\mathcal P^{(n)} = (\dots,s^{(n)}_{-1},s^{(n)}_0,s^{(n)}_1,\dots)$.
</p>
<p>
Notice how we don't really care what the spins represent: they can be the words in a text, the states of the cells of an organism, the actions performed by an agent, or the price of a stock in the stock market.<br>
<!-- <ul>
<li>Words in a text</li>
<li>State of the cells in an organism</li>
<li>Actions performed by an agent</li>
<li>The value of a title in the stock market</li>
</ul> -->
We also don't really care how the values of the Hamiltonian are evaluated: The architecture of the model doesn't matter.
</p>
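<p>
To make these definitions concrete, here is a small sketch in which a local Hamiltonian is assembled as a sum of windowed terms acting on a Potts chain. The per-window energy function is a made-up placeholder standing in for whatever the model actually computes; only its properties (non-negative, zero on a stored pattern, bounded) matter here.
</p>
<d-code block language="python">
import numpy as np

V, window = 50, 5                                  # vocabulary size and window width W

def window_energy(chunk):
    """Placeholder windowed Hamiltonian H_i: zero for a 'stored' chunk, positive otherwise."""
    # As an arbitrary example, the stored pattern inside each window is 'all spins equal';
    # any bounded, non-negative function of the chunk would do.
    return 0.0 if np.all(chunk == chunk[0]) else 1.0

def local_hamiltonian(chain):
    """H = sum_i H_i, each H_i acting only on the spins inside its window."""
    return sum(window_energy(chain[i:i + window]) for i in range(len(chain) - window + 1))

pattern = np.full(100, 7)                          # a zero-energy stored pattern (all spins equal to 7)
corrupted = pattern.copy()
corrupted[40] = 3                                  # a single 'defect'
print(local_hamiltonian(pattern), local_hamiltonian(corrupted))   # 0.0 vs. a small positive energy
</d-code>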
<p>
We are now going to prove that local Hamiltonians in 1D, no matter how low the temperature, cannot have a condensed phase.
</p>
<b class="theorem">Theorem</b>
<p class="theorem_text">
Let $H$ be a local Hamiltonian with $N > 1$ stored patterns. At non-zero temperature, the system will be unable to converge to a single stored pattern.
</p>
<details >
<summary>
<b>proof</b>
</summary>
<div class="proof_text">
<p>
Suppose that our Potts chain starts out equal to our first stored pattern, $\mathcal C=\mathcal P^{(1)}$. Now we want to know if the formation of a single domain barrier is thermodynamically favorable:
$$
\Delta F=\Delta E -T\Delta S< 0
$$
For that to be true, the free energy of the system must decrease upon the formation of a domain barrier.
Upon the formation of a domain barrier, the windowed Hamiltonians that intersect it will have a non-zero, positive energy. Therefore $\Delta E >0$; however, we know that the energy of each windowed Hamiltonian is smaller than $E^\textrm{max}$ and that no more than $\mathcal W-1$ windows can be affected by a domain wall, therefore
$$
0\le\Delta E\le (\mathcal W-1)E^\textrm{max}
$$
At the same time, we know that in a sequence of length $L$ there are $L-1$ possible places where a domain wall can appear, and each of these possible places can lead to any of the $N-1$ other saved patterns; therefore there are $(L-1)(N-1)$ possible configurations where the system has a single domain wall. This means that the change in the entropy of the system is
$$
\Delta S=\log [(N-1)(L-1)]
$$
Putting all the equations together we have that
$$
\Delta F\le (\mathcal W-1)E^\textrm{max} - T\log [(N-1)(L-1)]
$$
In the thermodynamic limit ($L\to \infty$) the right-hand side of the equation eventually becomes negative; therefore, domain barriers are inevitable.
</p>
</div>
</details>
<p>
We have now seen that 1-dimensional self-organizing systems can't really exist<d-footnote>We are assuming that there are at least two different ground states.</d-footnote>.
<!-- Sure, there are worms and snakes that are more or less one dimensional, but they are usually less than 50 times longer than their diameter. -->
</p>
<d-subsection>Autoregressive models</d-subsection>
<b class="Definition">Definition: Autoregressive Model</b>
<p class="definition_text">
Given some input tokens $\{s_{i_\textrm{first}},\dots, s_{i_\textrm{last}}\}$, an auto-regressive model $M$ returns a $V$-dimensional vector $(p_1,\dots,p_V)$ with an estimate of the probability for the next element in the sequence, $i_\textrm{pred}=i_\textrm{last}+1$:
$$
p_c=M(s_{i_\textrm{pred}}=c|s_{i_\textrm{first}},\dots,s_{i_\textrm{last}})
$$
where $p_c$ is the probability that the next spin $s_{i_\textrm{pred}}$ is equal to the $c$-th token of the vocabulary.
</p>
<figure>
<canvas id="canvasAR" width="700" height="20" style="border: 1px solid #000;"></canvas>
<div>
<label for="temperatureAR">$T=$</label>
<span class="value" id="AR_temperature-value">1.0</span>
<input type="range" id="temperatureAR" min="0.1" max="3" step="0.1" value="1">
</div>
<figcaption>
Simulation of an autoregressive Ising model. In this example, to predict the next element in the sequence $s_{i+1}$, the model only looks at the last known element of the sequence $s_i$, and the vector with an estimate of the probability for $s_{i+1}$ is just a 1-dimensional vector equal to $p=\left(1+e^{J/T}\right)^{-1}$. We can write the Hamiltonian windows of the system, which are equal to $H_t=-Js_ts_{t+1}$.
</figcaption>
</figure>
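<p>
Here is a minimal sketch of the autoregressive generation illustrated above: the next spin is sampled from the Boltzmann weights of the window Hamiltonian $H_t=-Js_ts_{t+1}$, conditioned on the last generated spin. The parameters, and the exact normalization convention, are illustrative and may differ from those of the interactive demo.
</p>
<d-code block language="python">
import numpy as np

def next_spin(prev, J, T, rng):
    """Sample s_{t+1} from p(s_{t+1} | s_t) proportional to exp(-H_t / T), with H_t = -J s_t s_{t+1}."""
    weights = np.exp(np.array([-prev, prev]) * J / T)   # Boltzmann weights for s_{t+1} = -1 and +1
    return rng.choice([-1, 1], p=weights / weights.sum())

rng = np.random.default_rng(0)
J, T = 1.0, 0.5
chain = [1]
for _ in range(999):
    chain.append(next_spin(chain[-1], J, T, rng))
print("magnetization:", np.mean(chain))   # tends toward 0 for long chains: no long-range order
</d-code>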
<p>
We will now make the parallel with natural language processing explicit. Most state-of-the-art language models today are autoregressive, meaning that to predict the next word (or token) they look at all the previous elements of the text.<br>
However, as the context grows, it becomes computationally infeasible to predict the next token, and only the last few words are fed as input to the model.
</p>
<p>
Intuitively, this leads the model to completely forget what is left outside of its input window.
</p>
<p>
Keep in mind that how well you can predict the next element in a sequence can be considered a measure of intelligence. After all, what any intelligent agent does is just find the best next action.
</p>
<p>
But how do we prove <i>mathematically</i> that autoregressive models with finite context are bound to lose the thread?<br>
We show that autoregressive models, no matter the architecture, are equivalent to a thermodynamic system with a specifically crafted local Hamiltonian.
</p>
<p>
The significance of demonstrating the incapability of autoregressive models doesn't lie solely in the theorem itself. Intuitively, we already understand that autoregressive models, due to their inherent loss of information over time, are unable to generate lengthy texts. What is important is that, through the mathematical proof, we gain the knowledge necessary to establish how coherent text generation can be achieved through alternative methods.
</p>
<b class="theorem">Theorem</b>
<p class="theorem_text">
It is possible to associate a local Hamiltonian to any auto-regressive model.
</p>
<details >
<summary>
<b>proof</b>
</summary>
<div class="proof_text">
<p>
Let $M$ be our autoregressive model, and $s_{i_\textrm{first}},\dots,s_{i_\textrm{last}}$ our input sequence.
Then the probability that the next token in the sequence is equal to $c$ is
$$
p_c=M(s_{i_\textrm{pred}}=c|s_{i_\textrm{first}},\dots,s_{i_\textrm{last}})
$$
Through Boltzmann's equation it's possible to turn a probability distribution into a set of energy levels
$$
p_c=\frac 1Ze^{-\frac{E_c}{T}}\quad\textrm{with}\quad c=1\dots V
$$
Without loss of generality, we can assume $T=1$; the energy associated with every prediction then turns out to be
$$
E_c=-\log p_c +\textrm{const} \quad\textrm{with}\quad c=1\dots V
$$
where we can set the constant in such a way that the lowest energy state has an energy equal to zero.<br>
We can now define a windowed Hamiltonian
$$
H_{i_\textrm{pred}}(s_{i_\textrm{first}},\dots,s_{i_\textrm{last}},s_{i_\textrm{pred}})=-\log\left[ M(s_{i_\textrm{pred}}|s_{i_\textrm{first}},\dots,s_{i_\textrm{last}})\right]+\textrm{const}
$$
The full local Hamiltonian can now be seen as the sum of all such terms over the sequence, $H=\sum H_{i_\textrm{pred}}$.<br>
The generation process of an autoregressive model can now be seen as sampling from the Boltzmann distribution given by
$$
p_\textrm{sequence}=\frac 1Z
e^{-\frac 1TH(\textrm{sequence})}
$$
</p>
</div>
</details>
<p>
So, we now know that autoregressive data generation is equivalent to sampling from the Boltzmann distribution of a local Hamiltonian (usually at $T\approx 1$), and since 1-dimensional local Hamiltonians are inherently unstable, autoregressive models with a finite context window are too.
</p>
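<p>
The construction used in the proof is straightforward to write down: given any function returning next-token probabilities, the corresponding windowed energies are $E_c=-\log p_c+\textrm{const}$, shifted so that the most likely continuation has zero energy. In the sketch below, <code>predict_next</code> is a made-up stand-in returning random probabilities, in place of a real language model.
</p>
<d-code block language="python">
import numpy as np

def predict_next(context):
    """Stand-in autoregressive model: returns a probability vector over a vocabulary of size 5."""
    rng = np.random.default_rng(abs(hash(tuple(context))) % 2**32)   # deterministic made-up p_c
    p = rng.random(5)
    return p / p.sum()

def window_hamiltonian(context, next_token):
    """H_{i_pred}(context, s_{i_pred}) = -log M(s_{i_pred} | context) + const, with minimum energy 0."""
    E = -np.log(predict_next(context))
    return E[next_token] - E.min()

def local_hamiltonian(sequence, window=3):
    """Sum of windowed terms: the energy the model implicitly assigns to a whole sequence."""
    return sum(window_hamiltonian(sequence[max(0, i - window):i], sequence[i])
               for i in range(1, len(sequence)))

print(local_hamiltonian([0, 3, 1, 4, 2]))   # total energy the stand-in model assigns to this sequence
</d-code>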
<!-- <d-subsection>The 2D case</d-subsection>
<p>
2-dimensional self-organizing systems are plentiful in nature (cities, the human economy before globalization, ecosystems, those fungi with long roots, those trees ...). Most of them are 2D because the surface of the earth is also 2D. REWRITE THIS
</p>
<p>MAYBE ITS NOT THAT IMPORTANT TO TALK ABOUT IT...</p> -->
<d-subsection>More complex Topologies</d-subsection>
<p>
As you have seen from the examples, determining whether or not an ordered phase can exist boils down to a counting problem:
<ol>
<li>
Start with the system being equal to one of the patterns stored
</li>
<li>
Create a domain wall
</li>
<li>
Count the number of such domain walls
</li>
<li>
Estimate the energy gained by the system
</li>
<li>
See if there exists a finite temperature for which the free energy decreases as the size of the domain walls goes to infinity
</li>
</ol>
</p>
<p>
This can be applied to systems with very different topologies; we are now going to explore that.
</p>
<p>
The most general way to describe the interactions between the elements of our system is to represent them with a graph Hamiltonian.
</p>
<p>
For simplicity we are going to consider that the links of the graph are fixed in time.<d-footnote>
This is a simplification, and plenty of self-organizing systems have an interaction graph that changes with time; for example, most of the people you interact with daily are different from the people you interacted with in your childhood.<br> Sometimes even the topology of the entire graph changes: for example, before the invention of means of rapid communication each person interacted only with the people nearby, thus the topology was 2-dimensional. Today, with the internet, we can interact with people far away.<br>
However, we can say that even though the interaction graph changes with time, it usually changes slowly relative to the dynamics of the time evolution of its elements.
</d-footnote></p>
<b class="Definition">Definition: Graph Hamiltonian</b>
<p class="definition_text">
Let $G$ be the adjacency matrix of a graph; then a graph Hamiltonian $H$ is a Hamiltonian whose couplings satisfy
$$H=H*G$$
where the ($*$) operator is element-wise multiplication, i.e. interactions exist only along the edges of the graph.
</p>
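<p>
A concrete toy reading of this definition: take some dense coupling matrix and mask it element-wise with the adjacency matrix $G$, so that only couplings along the edges of the graph survive. The graph and couplings below are arbitrary examples.
</p>
<d-code block language="python">
import numpy as np

rng = np.random.default_rng(0)
n = 6
couplings = rng.normal(size=(n, n))            # some dense interaction matrix (toy example)
couplings = (couplings + couplings.T) / 2      # symmetrize

# Adjacency matrix G of a ring graph: node i is linked to nodes i-1 and i+1.
G = np.zeros((n, n), dtype=int)
for i in range(n):
    G[i, (i + 1) % n] = G[(i + 1) % n, i] = 1

H_graph = couplings * G                        # element-wise mask: the graph Hamiltonian couplings

def energy(spins):
    """H = -sum_ij H_graph[i, j] s_i s_j: interactions exist only along the edges of G."""
    return -spins @ H_graph @ spins

print(energy(rng.choice([-1, 1], size=n)))
</d-code>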
<p>
Now, to see if an ordered phase can ever exist, we must generalize Peierls' argument. We only care about how the energy and the entropy increase when we create a domain wall.
</p>
<b class="Definition">Definition: Entropy Scaling</b>
<p class="definition_text">
Let $H$ be a graph Hamiltonian, and let $P$ be the perimeter length (or surface area) of a domain wall. As the perimeter length increases, the number of possible configurations of the domain barrier increases, thus increasing the entropy of the system $\Delta S$.
We say that the entropy gained scales as $f_S$ if
\[
\Delta S=O(f_S(P))
\]
</p>
<b class="Definition">Definition: Energy Scaling</b>
<p class="definition_text">
Let $H$ be a graph Hamiltonian, and let $P$ be the perimeter length (or surface area) of a domain wall. As the perimeter length increases, the upper and lower bounds on the energy gained $\Delta E$ scale as $O(f_E^\textrm{high}(P))$ and $O(f_E^\textrm{low}(P))$ respectively. If $f_E^\textrm{high}=f_E^\textrm{low}\equiv f_E$, we say that the energy gained scales as $f_E$
\[
\Delta E=O(f_E(P))
\]
</p>
<p>
Once we know how the entropy and the energy scale, we know if there is an ordered phase.
</p>
<b class="Theorem">Theorem</b>
<p class="theorem_text">
If $O(f_S)=O(f_E)$, there exists an ordered phase.
</p>
<details>
<summary>
<b>proof</b>
</summary>
<div class="proof_text">
<p>
Let's define $O(f)=O(f_S)=O(f_E)$; as $P\to \infty$, the change in free energy is
$$
\Delta F= \Delta E -T\Delta S=O(f(P))-T\,O(f(P))
$$
If we now take the limit $T\to0$, the term on the right disappears; therefore the creation of a domain wall increases the free energy, and so a coherent phase is favored.
</p>
</div>
</details>
<p>
This means that we don't really care about the exact formulas for $E(P)$ and $S(P)$, but only about how they scale as $P\to \infty$, so we can ignore some details that could lead to complications, for example the number of stored patterns.
</p>
<b class="Theorem">Theorem</b>
<p class="theorem_text">
Let $H$ be a graph Hamiltonian with $N>1$ stored patterns. At thermal equilibrium, the ability to converge to an ordered phase doesn't depend on $N$.
</p>
<details>
<summary>
<b>proof</b>
</summary>
<div class="proof_text">
<p>
The change in entropy due to the creation of a domain barrier can always be written as
$$
\Delta S= \log\left[
(N-1)N_\textrm{barriers}\right]=\log N_\textrm{barriers} + \log (N-1)
$$
where $N_\textrm{barriers}$ is the number of barriers of a certain shape. In the thermodynamic limit, the term depending on the number of barriers increases, while the one depending on the number of stored patterns stays constant and can therefore be ignored, as it doesn't change the entropy scaling.
</p>
</div>
</details>
<p>
The importance of this last theorem is that, since the number and the shape of the stored patterns don't affect the thermodynamics of the problem, we might as well stick with a system with just two ground states, equal to all spins up and all spins down. Like in the Ising model!<br>
We can also go one step further and show that we don't really care about how strong or complicated the interactions are, nor about how big the interaction windows are.
</p>
<b class="Theorem">Theorem</b>
<p class="theorem_text">
Let $H=\sum_i H_i$, and suppose there exist two energies $E^\textrm{max}$ and $E^\textrm{min}$ which are the biggest and smallest non-zero energy levels of all the windowed Hamiltonians $H_i$. At thermal equilibrium, the ability to converge to an ordered phase does not depend on the energy levels or the window sizes.
</p>
<details>
<summary>
<b>proof</b>
</summary>
<div class="proof_text">
<p>
Let $\mathcal W$ be the biggest window size, and $1$ the smallest window size, of any $H_i$, and let $P$ be the perimeter length of our domain wall. The energy gained by creating such a domain wall is bounded by
$$
P E^\textrm{min}\le\Delta E\le \mathcal W PE^\textrm {max}
$$
In both cases we have that
$$
\Delta E=O(P)
$$
</p>
</div>
</details>
<p>This means that, even if we have a really complicated local Hamiltonian, with large window sizes, a large vocabulary, and many stored patterns, we can just look at a simple nearest-neighbour Ising model to see if we have a coherent phase.</p>
</details>
<d-subsection>Main Takeaway</d-subsection>
<p>
In this section we saw how the only thing that matters in determining whether a system is capable of self-organizing at thermal equilibrium is the topology of the system.
</p>
<p>
This also means that to see if a system has the required topology for an ordered phase, you need to check whether a topologically equivalent Ising model has an ordered phase.
</p>
<!-- <b class="Theorem">Theorem</b>
<p class="theorem_text">