When they lose power, they forget everything
All data must be stored in secondary memory
Today secondary memory is
The disks store a huge amount of data
To organize it we use files
To organize the files we use folders
also called directories
Like the main memory, a file is just a list of bytes
The meaning of the file depends on the context
Most of the time, the name of the file suggests a context
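A minimal Python sketch of the idea that a file is just a list of bytes whose meaning depends on the context. The file name example.txt and its content are made up for illustration; the file is created in a temporary folder.

```python
import os
import tempfile

# Create a small file in a temporary folder (hypothetical name and content).
folder = tempfile.mkdtemp()
path = os.path.join(folder, "example.txt")
with open(path, "wb") as f:
    f.write(b"ACGT\n")

# Read the file back as raw bytes: a list of numbers between 0 and 255.
with open(path, "rb") as f:
    raw = f.read()

byte_values = list(raw)
print(byte_values)   # [65, 67, 71, 84, 10]

# In a "text" context, the same bytes are interpreted as characters.
as_text = raw.decode("ascii")
print(as_text)
```

The numbers and the letters are the same data; only the interpretation changes.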
Besides the data itself, files have metadata
That is, data about the data. For example
The names of the files are “words”: a series of letters, numbers and some symbols
Technically, a filename is a string, that is, a list of characters
Keep filenames under 250 characters (most systems allow at most about 255)
Avoid “/”, “:”, “+”, “|”, “<”, “*”, “>” and quotes
Use letters (A-Z, a-z), numbers (0-9), “.”, “-”, “_”, “ ”
In some systems lowercase and UPPERCASE letters are not equivalent.
Be systematic and coherent
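The naming rules above can be sketched as a small Python check. The function name is_safe_filename and the constants are hypothetical, and the 250-character limit is simply the one quoted in these notes; real limits vary between systems.

```python
import string

# Characters recommended in the notes: letters, digits, ".", "-", "_" and space.
ALLOWED = set(string.ascii_letters + string.digits + ".-_ ")
MAX_LENGTH = 250  # limit quoted in the notes; an assumption, not a universal rule


def is_safe_filename(name):
    """Return True if name is non-empty, short enough and uses only allowed characters."""
    if not name or len(name) > MAX_LENGTH:
        return False
    return all(ch in ALLOWED for ch in name)


print(is_safe_filename("sample_01.txt"))   # True
print(is_safe_filename("a/b:c*.txt"))      # False
```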
If the filename includes “.”, the text after it is called the extension
In Microsoft Windows (c) extensions are usually 3 letters
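As an illustration, Python's standard os.path.splitext splits a filename at its last “.”. The helper get_extension and the example filenames are hypothetical; note that splitext keeps the dot, so the helper strips it.

```python
import os.path


def get_extension(filename):
    """Return the text after the last "." in filename, or "" if there is none."""
    return os.path.splitext(filename)[1].lstrip(".")


for filename in ["report.txt", "archive.tar.gz", "README"]:
    print(filename, "->", repr(get_extension(filename)))
```

When a name contains several dots, only the text after the last one counts as the extension here.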
At the low level there is only one type of file
For us, it is useful to separate them into two: text files and binary files
Among binary files we have EXE files, which are programs for Windows
When disks became big, people could put thousands of files on them
But then finding the files became an issue
Directories came as a solution.
At that time (the 1970s) people used “phone directories”: big books with the name and phone number of everybody
In the 1980s, with graphical screens, people drew folders instead of directories.
A directory is a set of files. A file belongs to a single directory
A directory can also contain sub-directories
In Windows we also have separate disks, labeled A:, B:, C:, but nobody uses A: or B: nowadays
Directories are organized in a hierarchy
“Parent” directories contain “child” directories
There cannot be any “cycle”
The topmost folder is called the root directory
In Windows, each disk has its own tree and its own root
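A short Python sketch of the hierarchy, using the standard pathlib module. The folder names are only illustrative; the "Pure" path classes work on the text of the path and do not touch the disk.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Walk from a directory up to the root: each step is the parent of the previous one.
path = PurePosixPath("/home/user/data")
ancestors = [str(p) for p in path.parents]
print(ancestors)   # ['/home/user', '/home', '/']

# In Windows, the disk label tells us which tree the path belongs to.
disk = PureWindowsPath("C:/Program Files").drive
print(disk)        # C:
```

Because there are no cycles, walking up through the parents always ends at the root.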
Each program in the computer knows about at least one folder: the current folder
The files in it are the ones the program can “see” immediately
To see other files, our program has to indicate in which folder to find the file
It is like using given names and family names.
When you are at home, family names are implicit
A program can change its current directory
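A minimal Python sketch of a program asking for its current directory and changing it. The file name notes.txt is hypothetical, and a temporary folder is used so nothing important is touched.

```python
import os
import tempfile

# Ask the system which folder is the current one.
start = os.getcwd()
print("current directory:", start)

# Move to a new, empty temporary folder.
new_folder = tempfile.mkdtemp()
os.chdir(new_folder)

# No folder is given, so the file is created in the current directory.
with open("notes.txt", "w") as f:
    f.write("hello\n")

visible = os.listdir(".")
print(visible)   # ['notes.txt']

# Go back to where we started.
os.chdir(start)
```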
When accessing a file X outside the current directory, we have to specify the folder of the file
There are two ways of doing that:
The (absolute or relative) list of folders is called the path of the file. It is a string with a / between each folder name.
In Windows, \ also separates folder names.
An absolute path starts with the character /
In Windows it may also start with the disk label, as in C:/
Example:
/home/user/data
/Documents and Settings/user/Desktop
C:/Program Files
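These examples can be checked with a short Python sketch based on the standard pathlib module. It only analyses the text of each path, so the folders do not need to exist; the relative path user/data is an added illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# A path is a string of folder names joined by "/".
unix_path = PurePosixPath("/home/user/data")
unix_parts = unix_path.parts
print(unix_parts)                 # ('/', 'home', 'user', 'data')
print(unix_path.is_absolute())    # True

# Without the leading "/", the path is not absolute.
relative_is_absolute = PurePosixPath("user/data").is_absolute()
print(relative_is_absolute)       # False

# In Windows the disk label comes first, and "\" and "/" are both separators.
win_path = PureWindowsPath("C:/Program Files")
print(win_path.is_absolute())     # True
same_path = win_path == PureWindowsPath("C:\\Program Files")
print(same_path)                  # True
```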
Relative paths are easy: they do not start with /
Each directory knows its parent directory. It is called ..
If necessary, the current directory can be referred to as .
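A small Python sketch of how a relative path is combined with the current directory, including the special names .. and . described above. The helper resolve and all folder names are hypothetical; posixpath is used so the result is the same on any system.

```python
import posixpath


def resolve(current_directory, path):
    """Combine a path with the current directory and simplify "." and ".."."""
    # join() keeps an absolute path as it is and appends a relative one.
    return posixpath.normpath(posixpath.join(current_directory, path))


current = "/home/user/data"
print(resolve(current, "results.txt"))           # /home/user/data/results.txt
print(resolve(current, "../docs/./notes.txt"))   # /home/user/docs/notes.txt
print(resolve(current, "/etc/hosts"))            # /etc/hosts
```

The same relative path gives a different file when the current directory changes, while an absolute path always gives the same one.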
Many disciplines, including Molecular Biology and Genetics, have become more and more data-driven.
Starting now, we will use RStudio, free software for data analysis
Many users of R are molecular biologists, but it is also used by economists, psychologists and marketing specialists
You have to install R and RStudio on your computer
You have to execute RStudio. Then
RStudio, like almost all serious programs, is controlled by the keyboard
The mouse can be used for some shortcuts, but the real deal is the keyboard
A goal of this course is to become comfortable with the keyboard
These tools are for people who read books and don’t watch TV