```{r, include = FALSE}
ottrpal::set_knitr_image_path()
```
# Binary Data to Computations
Now that we are familiar with transistors and binary data, we will next discuss how computers process and store data.
```{r, echo = FALSE, fig.alt= "Learning Objectives: 1. Describe what a computer chip is and the basics of how it works, 2. Understand how computers use and store data, 3. Explain the difference between RAM and long-term storage like hard disk drives and solid-state drives, 4. Recognize what aspects of a computer are hardware or software 5. Describe how the operating system is involved in telling the computer what computations to process, 6. Explain how computing systems have evolved to what they are today", out.width= "100%"}
ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.g11383d0152c_0_0")
```
### **CPU** - Central Processing Unit
The CPU is often called **the brain** of the computer, because it performs and orchestrates computational tasks. Because it is such an important and prominent part of the computer, it has also picked up some confusing additional names [@braunl_central_2008; @CPU_redhat; @Wikipedia_CPU_2021].
It is sometimes called a **processor** or **microprocessor** (but technically these terms include both the CPU and other elements). The CPU is often what people are referring to when they describe a **"computer chip"** (which again technically includes other elements) [@braunl_central_2008; @CPU_redhat; @Wikipedia_CPU_2021].
The CPU is made up of several components. A few are particularly important (two of which we have already discussed):
* Arithmetic Logic Unit (ALU)
* Registers
* Control Unit (CU)
A group of these components together is called a **core**. Multiple cores together are also referred to as CPU**s**. As you can see, describing this can get kind of tricky.
The component that we haven't yet discussed, the Control Unit, coordinates the ALU and the data stored in the registers, so that the ALU can perform the operations on the right data stored in the registers at the right time [@braunl_central_2008].
```{r, fig.align='center', echo = FALSE, fig.alt= "Figure of how the processor/chip/ or CPU which includes the ALU, registers and the Control Unit are grouped together.", out.width= "100%"}
ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.g1076ceee833_0_1")
```
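To make the division of labor a little more concrete, here is a minimal sketch of a toy core written in Python. It is only an illustration and not how a real chip is built: the register names (`R0` to `R3`), the three operations, and the tiny two-step program are all made up for this example.

```python
# Toy model of one core (illustrative only; all names here are made up).
# Registers hold values, the ALU does the math, and the control unit
# decides which operation runs on which registers, and in what order.

def alu(operation, a, b):
    """Arithmetic Logic Unit: perform one operation on two values."""
    if operation == "ADD":
        return a + b
    if operation == "SUB":
        return a - b
    if operation == "AND":
        return a & b
    raise ValueError(f"unknown operation: {operation}")


def control_unit(program, registers):
    """Step through the instructions in order, feeding the ALU the
    right register values and storing each result in a register."""
    for operation, destination, source_1, source_2 in program:
        registers[destination] = alu(
            operation, registers[source_1], registers[source_2]
        )
    return registers


registers = {"R0": 6, "R1": 3, "R2": 0, "R3": 0}
program = [
    ("ADD", "R2", "R0", "R1"),  # R2 = R0 + R1
    ("SUB", "R3", "R2", "R1"),  # R3 = R2 - R1
]

result = control_unit(program, registers)
print(result)
```

Notice that the ALU function knows nothing about registers or ordering. That coordination is the control unit's job.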
Modern computers now have multiple cores. What does this mean?
This means that there are multiple groups of the above components that can each process data within the same computer. A dual-core CPU is a chip with two cores. A quad-core CPU is a chip with four cores, and so on. This allows modern computers to perform multiple tasks at the same time instead of sequentially. For example, a typical current laptop with 4 cores can work on 4 tasks simultaneously. This makes our computers much faster than they used to be [@Wikipedia_CPU_2021].
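If you are curious how this looks from inside a program, here is a small Python sketch that hands four independent tasks to four workers. The function `square` and the list `numbers` are invented for this example. The workers are Python threads, used here only because they are simple to set up. Whether the work truly lands on separate cores is up to Python and the operating system.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def square(number):
    """A stand-in for one independent task."""
    return number * number


numbers = [1, 2, 3, 4]

# Ask the operating system how many processors it reports.
print("Processors reported by the operating system:", os.cpu_count())

# Hand the four tasks to a pool of four workers instead of
# running them one after another.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(square, numbers))

print(results)
```

The results come back in the same order as the inputs, even though the workers may finish at different times.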
In addition to the main CPU or CPUs or cores (choose your favorite name), computers may be equipped with specialized processors called [GPUs](https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html#), which stands for graphics processing units. These are especially efficient at tasks involving images [@GPU]. Thus tasks that involve images are often done using the GPU(s) and not the CPU(s). This frees up the CPU(s) to work more efficiently on the tasks that don't involve images. Note, however, that GPUs are also "generally programmable" (meaning they can work with different types of data) and can also be used to perform tasks that don't involve images [@GPU]. GPUs are also really good at something called parallel processing, which means dividing up a single task into multiple pieces that can be run simultaneously, so that the task runs more efficiently overall. People also use graphics cards, which can add additional GPUs for more computational power [@GPU].
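The idea of dividing one task into pieces can be sketched in a few lines of Python. This is an illustration of the concept rather than real GPU code: the helper `split_into_chunks` and the list `pixels` are made up, and the pieces are processed one after another here, whereas a GPU would work on them at the same time.

```python
def split_into_chunks(values, n_chunks):
    """Divide one list into roughly equal pieces."""
    size = -(-len(values) // n_chunks)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


# One big task: add up 100 pretend pixel values.
pixels = list(range(1, 101))

# Divide the task into 4 pieces. Each piece is independent of the
# others, so each could be handed to a different processing unit.
chunks = split_into_chunks(pixels, 4)
partial_sums = [sum(chunk) for chunk in chunks]

# Combine the partial answers into the final answer.
total = sum(partial_sums)
print(partial_sums, total)
```

The combined total matches what you would get by adding up the whole list in one go, which is what makes this kind of splitting safe.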
```{r, fig.align='center', echo = FALSE, fig.alt= "A computer chip is also sometimes called the CPU. Inside this CPU or chip are often multiple cores.", out.width= "100%"}
ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf6e632d05f_0_381")
```
Hyper-threading is also an option for improving processing. Intel introduced this technology in 2002 [@Wikipedia_hyper-threading]. The idea is that while part of a core is idle or waiting during a given task, another part of the same core can work on another task. This isn't as efficient as having another core or CPU, but it does improve efficiency [@hyperthreading; @Wikipedia_hyper-threading]. Many modern computer chips actually use all three efficiency boosters (having multiple cores, having GPUs, and using hyper-threading). For example, a chip with 4 cores that also has hyper-threading can work on 8 tasks simultaneously. Since it is now much easier to produce chips with multiple cores and because there are some security concerns with hyper-threading, the field seems to be moving away from hyper-threading [@hyperthreading; @Wikipedia_hyper-threading].
```{r, fig.align='center', echo = FALSE, fig.alt= "A computer chip that has hyper-threading can perform more tasks by single cores more efficiently. Thus a 4 core chip with hyper-threading can work on 8 tasks simultaneously.", out.width= "100%"}
ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gfb2e21ecdc_0_75")
```
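The arithmetic behind "4 cores, 8 tasks" can be written out as a tiny Python sketch. The function name `simultaneous_tasks` is made up for this example, and it assumes that hyper-threading lets each core work on two tasks at once.

```python
def simultaneous_tasks(cores, hyper_threading):
    """How many tasks a chip can work on at once.

    Assumes hyper-threading lets each core work on two tasks.
    """
    tasks_per_core = 2 if hyper_threading else 1
    return cores * tasks_per_core


print(simultaneous_tasks(cores=4, hyper_threading=False))
print(simultaneous_tasks(cores=4, hyper_threading=True))
```

This is also why some tools report more "processors" than the number of physical cores in your computer.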
### **Memory or RAM** - short-term memory
OK, so we have already talked about how data can be stored in the registers within the CPU. This data or memory is used directly by the CPU during operations or tasks. However, the CPU also needs quick access to the instructions that tell it what operations to perform and on what data, as well as to the data in any file that we are working with at a particular moment in time [@RAM_ComputerHope]. This brings us to [RAM](https://www.computerhope.com/jargon/r/ram.htm), which stands for **Random Access Memory**. It is often simply referred to as **memory**. Like the registers within the CPU, RAM is made out of transistors and capacitors, but it is located outside of the CPU (although nearby) [@RAM_ComputerHope; @RAM_HowStuff_Works]. A key feature of this type of memory is that it is temporary. Data is stored in RAM for only a short time, while your computer is running a task on it, but then it disappears. Because what is stored disappears, this type of memory is also called volatile. This is why when you are working on a file, but forget to save it, you might lose your work [@RAM_ComputerHope; @RAM_HowStuff_Works].
For more information about how RAM works, check out this [website](https://computer.howstuffworks.com/ram.htm) [@RAM_HowStuff_Works].
### **Storage** - long-term memory
We can also store data that we aren't directly using when our computer is performing operations, for example, our Excel files and Word files that aren't currently in use. This type of memory is called storage and is sometimes referred to as long-term or non-volatile memory because electricity is not required to preserve this data. This type of data is stored using [hard disk drives (HDDs) also called hard drives](https://www.computerhope.com/jargon/h/harddriv.htm) or more recently [solid-state drives (SSDs)](https://www.computerhope.com/jargon/s/ssd.htm). Accessing this memory is slower than accessing data stored in RAM for a few reasons. First, storage is located further away from the CPU, and data needs to be transferred from the storage to the CPU along a wire when a user wants to perform operations on such data. Second, the right data needs to be found out of all of your files, which also takes some time. Third, the way in which data is retrieved from HDDs and SSDs is slower than that of RAM. On the other hand, this type of storage allows for much larger data capacity than RAM and it is also cheaper [@hard_drive; @hard_drive_works].
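Here is a small Python sketch of the difference between data that only lives in memory and data that has been saved to storage. The file name `notes.txt` and the temporary folder are made up for this example, and deleting the variable is just a stand-in for what happens to unsaved work when a program closes or the power goes out.

```python
import os
import tempfile

# While a program runs, its variables live in RAM.
notes = "My unsaved work"

# Saving writes a copy of the data to long-term storage.
# (A temporary folder is used so the example cleans up easily.)
folder = tempfile.mkdtemp()
path = os.path.join(folder, "notes.txt")
with open(path, "w") as file:
    file.write(notes)

# Stand-in for the program closing: the copy in memory is gone.
del notes

# The saved copy can still be read back from storage.
with open(path) as file:
    recovered = file.read()

print(recovered)
```

Without the step that writes the file, there would be nothing to read back after the in-memory copy was gone.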
Hard disk drives store memory using [magnetic methods](https://www.extremetech.com/computing/88078-how-a-hard-drive-works) [@hard_drive_works], while solid-state drives store memory using chips that have guess what??
They are made of yet again the important basic building block of computers - tiny bees! Oops, I mean transistors yet again, just like the CPU chip! See, those transistors are really important.
SSDs allow for much faster reading and writing of files, as well as increased reliability. However, they are more expensive and they also wear out eventually [@SSD].
Here's a great explanation for how HDDs work and the difference with SSDs. It will also introduce the concept of [caching](https://en.wikipedia.org/wiki/CPU_cache), which allows for faster use of data from storage for the CPU. It is a special kind of memory that's even faster and closer to the CPU than RAM [@Wikipedia_cache_2021]:
```{r, fig.align="center", fig.alt = "video", echo=FALSE, out.width="100%"}
knitr::include_url("https://www.youtube.com/embed/wI0upu9eVcw?start=22")
```
See this [link](https://computer.howstuffworks.com/solid-state-drive.htm) for more information about how SSDs work, and see [here](https://arstechnica.com/information-technology/2012/06/inside-the-ssd-revolution-how-solid-state-disks-really-work) for an in-depth explanation.
### Hardware and software
So far we have talked about the [hardware](https://simple.wikipedia.org/wiki/Computer_hardware) of a computer, which consists of its physical components. [Software](https://simple.wikipedia.org/wiki/Software), on the other hand, is the code that tells the hardware how to function [@Wikipedia_hardware_2021; @Wikipedia_software_2021].
Software is also important to know about. In particular, it is useful to know about operating systems.
### Operating systems
The [operating system](https://en.wikipedia.org/wiki/Operating_system) (sometimes simply called the OS) is a set of code or software that translates user interactions with the computer to tell the hardware (including memory and the CPU) of the computer what tasks the user wants the computer to perform and when [@Wikipedia_OS_2021].
You can think of this as the basic code to keep the computer running and functional and to allow the user to use other forms of software, such as applications [@Wikipedia_OS_2021]. Applications are specialized software programs, like Microsoft Word or an internet browser like Chrome, that allow a user to do specific tasks on the computer. So your OS is what allows you to name, rename, move and save files. It keeps track of memory, decides what memory should be used when, and runs all of your application software. It also allows you to talk to other devices like printers or other computers.
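Applications don't move files around on the hardware themselves. They ask the operating system to do it. The Python sketch below shows a program making a few of those requests. The file names `draft.txt` and `final.txt` and the temporary folder are made up for this example.

```python
import platform
import tempfile
from pathlib import Path

# Ask which operating system this program is running on
# (for example "Windows", "Darwin" for macOS, or "Linux").
print("Operating system:", platform.system())

# Each step below is a request that the operating system carries out.
folder = Path(tempfile.mkdtemp())   # make a new folder
original = folder / "draft.txt"
original.write_text("hello")        # save a file

renamed = folder / "final.txt"
original.rename(renamed)            # rename the file

print(renamed.name, "exists:", renamed.exists())
```

The same Python code works on different operating systems because each operating system handles the details of its own hardware and file system.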
Examples of commonly used operating systems on computers and phones are:
* Microsoft Windows (such as Windows 10, Windows 11 etc.)
* macOS (notice the OS here - it might make more sense now why it is called this)
* Unix
* Linux
* Android
Recall that we previously talked about how computers today are often called 64-bit? Operating systems are also designed in this way. A 64-bit operating system expects the hardware of the computer to allow for processing at least 64 bits of data at a time (the word size) [@Wikipedia_word_length_2021]. If we have registers of at least this length in the CPU, then we can in fact perform operations on data that may be up to 64 bits in length. This also means that we can perform operations on values that take up less than 64 bits. This can be important because if you try to use an operating system that expects a longer word size than the hardware can accommodate, for example a 64-bit operating system on a 32-bit computer, this will not work. Application programs are also designed according to different word sizes, and again you need to choose options that are equal to or less than the word size that your CPU can accommodate [@ComputerHope_64-bit].
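To get a feel for what a larger word size buys you, here is a short Python sketch. The function `largest_unsigned` is made up for this example. It computes the largest whole number that fits in a given number of bits if every bit is used for the value (no bit is reserved for a sign). The last few lines check whether the copy of Python you are running was built as a 32-bit or 64-bit program, which is a property of that program and not a direct measurement of your CPU.

```python
import struct


def largest_unsigned(bits):
    """Largest whole number that fits in the given number of bits,
    assuming every bit is used for the value."""
    return 2 ** bits - 1


print(largest_unsigned(8))
print(largest_unsigned(32))
print(largest_unsigned(64))

# Size, in bits, of a memory address in this build of Python.
# This describes the Python program itself, not the CPU directly.
pointer_bits = struct.calcsize("P") * 8
print("This Python is a", pointer_bits, "bit program")
```

A 32-bit program can run on a 64-bit computer, but not the other way around, which matches the rule above about choosing options equal to or less than your CPU's word size.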
### Historical context
Previously, back when a university might have one single computer, as they were so large and expensive (they didn't use those nifty small transistors of today), computers didn't have sophisticated operating systems and only one task could be performed at a time by one person at a time. Back then, tasks were just manually started, prioritized, and scheduled by humans. Tasks or programs (including sometimes data) could be printed or punched on cards (called punchcards, punch cards or punched cards) that would be loaded into the machine. Data and code would be manually indicated by punching or creating a hole in the card in certain locations. For example, columns might indicate different numeric or alphabetical values. It could really be a pain for users if they accidentally dropped the cards for the program they wanted to run, as you can imagine [@punched_card_2021]!
```{r, fig.align='center', echo = FALSE, fig.alt= "Image of a punchcard", out.width="100%"}
ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf96b1d997a_0_1")
```
There were many [different kinds](https://www.jkmscott.net/data/Punched%20Cards.html) of punch cards over time; see @scott_collection_2016 for a collection.
The first operating system just allowed different programs to be run sequentially without someone manually starting each one. Now our personal computers can perform multiple tasks at the same time and schedule future tasks that are automatically run.
Check out this [video](https://www.youtube.com/watch?v=KG2M4ttzBnY) if you want to learn more about how these punch cards worked. See @OS_2017 for more information about operating systems and @punched_card_2021 for really interesting information about the history of punched cards.
Also check out @hardware_history_2021 for really interesting and more extensive history about how computer hardware was developed.
Also, here is some fascinating additional reading on the role of women as computer operators starting in the 1940s. Initially computer science was actually thought of as a field for women. However, this changed over time (and now women and gender minorities are hopefully becoming more represented):
* [Article titled: Women pioneered computer programming. Then men took their industry over](https://timeline.com/women-pioneered-computer-programming-then-men-took-their-industry-over-c2959b822523) [@visions_women_2017]
* [Article titled: Untold History of AI: Invisible Women Programmed America's First Electronic Computer The “human computers” who operated ENIAC have received little credit](https://spectrum.ieee.org/untold-history-of-ai-invisible-woman-programmed-americas-first-electronic-computer) [@untold_2019]
## Conclusion
We hope that this chapter has given you some more knowledge about how computers actually function.
In conclusion, here are some of the major take-home messages:
1) The central processing unit or CPU contains the Arithmetic Logic Unit or ALU which performs operations on data using transistor logic gates
2) A CPU chip can contain multiple cores (also called CPUs) allowing a computer to perform multiple operational tasks at a time
3) RAM is the memory for a computer for the tasks that it's currently working on and is very fast to access because it is close to the CPU
4) Storage on a hard drive or solid state drive is the memory for a computer that is long-term, such as files that you aren't currently working on. It takes longer to access data from this memory as it has to travel to the CPU
5) The operating system is what tells the computer what the user wants the computer to do and when
Now that we know how a computer works in general, we will next discuss computing capacity, especially for informatics research, and how servers and cloud computing can help.