[UCSD CSE120]文件系统-filesystem

[UCSD CSE120]文件系统-filesystem

本文是我在上UCSD的 CSE 120: Principles of Operating Systems (Winter 2020) 整理的笔记，这一课主要介绍了操作系统里面文件系统以及文件的概念，包括文件系统的组成以及访问具体文件/文件夹的方法。

Basic

Preview
- File: logical unit of storage, container of data
  - Accessed by (name, region within file)
- File System: a structured collection of files
  - Access control, name space, persistent storage
File System Abstraction
- Repository of objects
  - Objects are data, programs for system and users
  - Objects referenced by name, to be read/written
- More than a repository
  - Objects can be r/w, protected, shared, locked
  - Contains I/O devices: disk, keyboard, display
  - Processes: memory
- Pesistent: remains “forever”
- Large: “unlimited” size
- Sharing: controlled access
- Security: protecting information
Hierarchical File Name Space
- Name space is organized as a tree
  - Name has components, branches start from root
  - No size restrictions
  - Intuitive for users
- Example: UNIX “Pathnames”
  - Absolute: /a/b/c
  - Relative: b/c relative to /a
  - Not strictly a tree: links

File

Attributes
- Type (recognized by system or users)
- Times: creation, accessed, modified
- Sizes: current size, (maximum size)
- Access control (permissions)
Operations
- Creation: create, delete
- Prepare for access: open, close, mmap
- Access: read, write
- Search: move to location
- Attributes: get, set(e.g., permissions)
- Mutual exclusion: lock, unlock
- Name management: rename
Read/Write model
- (file descriptor)fd = open(fname, usage)
- nr = read(fd, buf, size)
- nw = write(fd, buf, size)
- close
Memory-mapped model
- Map file into address space
  - mmap(addr, n, …, fd, …)
  - addr = mmap(NULL, n, …, fd, …)
- Use memory ops
  - x = addr[5]
  - strcpy(addr, “hello”)
- Issues
  - Efficient for multiple process sharing memory
  - If memory is written, how is file actually updated?
Access Control
- Who can access file
- What operations are allowed
- User interface must be simple and intuitive
- Example: Unix
  - r/w/x permissions for owner, group and everyone

File System Implementation

Goals
- Archival storage
  - Keep forever, including previous versions
- Support various storage technologies
  - Disks (different types), remote disks, …
- How to best achieve and balance:
  - Performance
  - Reliability
  - Security
Storage Abstraction
- Hide complexity of device
  - Model as array of blocks of data
  - Randomly addressable by block number
  - Typical block size: 1KB (also 4KB ~ 64KB)
    - Generally multiple of disk sector size: 512B
- Simple interface
  - read(block_num, mem_addr)
  - write(block_num, mem_addr)
Typical Implementation Structure
- Three major regions: Sequence of blocks for each one
- Region 1: File System Metadata
  - Information about file system
  - Sizes
    - Files in use, free entries
    - Data blocks in use, free entries
  - Free lists (or bitmaps)
    - File control blocks
    - Data blocks
- Region 2: File Metadata (File Control Blocks)
  - Information about a file
  - Referenced by number/index
  - Contains
    - Attributes: type, size, permissions,…
    - References to data blocks: disk block map
  - Note: many file control blocks may fit in single storage block
  - Example:
    - Number: 88 (index in file control block array)
    - Size: 4096 bytes
    - Permissions: rw-r–r–
    - Data blocks: set of indexes into storage array, may not be contiguous (such as 567, 7076, 9201)
- Region 3: Data Blocks
  - File contents

File control blocks

Keeping track of allocated blocks
- Contiguous blocks
  - Single sequence of blocks
- Extents
  - Groups of contiguous blocks
- Non-contiguous blocks
  - Blocks individually named
Keeping track of free blocks
- Free Block Map
  - Compact if lots of free regions of space
- Doubly Linked List
  - Easy to keep ordered due to fast inserts and deletes
- Bit Map
  - Fixed size regardless of fragmentation
File Name to File Control Block
- Users access files using file names
- Problem: how to translate
  - from file name: “/sports/baseball/Padres”
  - to file control block number: 88
- Must parse file name
- Each branch corresponds to a directory/folder
- Each directory/folder may itself be a file

Example: UNIX v.7 Block Map

Block Map UNIX v.7
- Array of pointers to data blocks
- 13 Pointers
  - 10 direct: references 10 data blocks
  - 1 singly-indirect: references $n$ data blocks
  - 1 doubly-indirect: reference $n^2$ data blocks
  - 1 triply-indirect: reference $n^3$ data blocks
- $n$ depends on how many pointers fit in a block
  - Example: 256 4-byte pointers will fit in 1KB block
Implementing UNIX Directories
- Table where each entry contains
  - name and attributes
  - name and pointer to file control structure
- Unix (name and pointer) - pre-BSD
  - Each entry: branch name (14), i-node number (2)
  - Berkeley Unix uses a more complex scheme to support long names
Example of parsing names in UNIX
- Given pathname: /sports/baseball/Padres
  - Inode 0 block map points to data block(s) of root directory
  - Look up “sports” in root directory to get inode 22
  - Inode 22 blocks map points to data block(s) of sports directory
  - Look up “baseball” in sports directory to get inode 15
  - …

Storage

File Systems use disks for storage
- pros: persistent, random access, cheap
- cons: slow (mechanical)
- Performance
  - Accesses are time expensive: 5 ~ 20 msec
    - Rotational latency: 2 ~ 6 msec (5200 ~ 15000 RPM)
    - Seek time: 3 ~ 13 msec
    - Transfer rate: 100+ MB/sec
  - Reduced accesses by
    - reading multiple blocks in one access (read ahead)
    - maintaining a block cache
  - Cluster related blocks to reduce seek time
Solid State Drives (SSD)

NAND-based flash memory, non-volatile
Unaffected by shock, magnetic fields; no noise
Limited number of writes, wears out with age

Performance

Caching
- Data blocks of files
- File system metadata (keep in memory)
- File metadata
  - Currently active file
  - Recently used
- Block maps
- File names
  - Name to file metadata translations
Clustering
- Blocks that exhibit locality of reference
  - Directory, and files within that directory
  - The inodes of the directory and files
- Strategy
  - Place related blocks close to each other: clustering
  - Reduces disk head movement and seek time
Block size
- trade off
  - the larger the block, the better the throughput
  - The smaller the block, the less wasted space
- technology trend
  - Disk density is increasing faster than disk speed
  - Make disk blocks larger: 1KB $\rightarrow$ 8KB, 64KB, 1MB

Reliability

Consistency
- Buffer cache reduces disk access
- If system crashes, block modifications lost
- To improve file system consistency
  - write out modified blocks periodically
  - write out critical blocks
- Critical blocks: file system meta-data
  - Directories, i-nodes, free block lists
Journaling
- Journal: log of file (or file system) updates
- For every update, create log entry
- Write log entry out to disk (part of journal)
- If crash occurs:
  - Look at journal entries
  - Check if mods properly reflected in file system
  - Update appropriately