Computing Infrastructures: Hardware and Software Systems Overview

Document from Politecnico di Milano about Computing Infrastructures. The PDF, a summary of a university course, details hardware and software infrastructures, including data centers, servers, and virtualisation concepts for Computer Science students.

50 Pages

A.A. 2023/2024

Computing Infrastructures

Summary of the course by prof. Manuel Roveri

Federica Laudizi – Federico Liuzzi

POLITECNICO DI MILANO

Hardware Infrastructures
1  System level
1.1  Introduction
1.1.1  Data centers
1.1.2  Edge computing
1.1.3  Embedded PCs
1.1.4  IoT
1.2  Data Center architectures
1.3  Warehouse scale computers
1.3.1  Regions
1.3.2  Availability zones
2  Node level
2.1  Servers
2.2  Form factors
2.2.1.1  Racks
2.2.1.2  Towers
2.2.1.3  Blade
2.2.2  Hardware accelerators
2.2.2.1  GPUs
2.2.2.2  TPUs
2.2.2.3  FPGAs
2.3  Storage
2.3.1  Hard disk drive
2.3.2  Solid state drive
2.3.2.1  Garbage Collection
2.3.2.2  Flash Translation Layer
2.3.2.3  Wear Leveling
2.3.2.4  Summary
2.3.3  Storage Systems
2.3.3.1  DAS
2.3.3.2  NAS
2.3.3.3  SAN
2.3.4  RAID Disks
2.3.4.1  RAID level 0 (striping)
2.3.4.2  RAID level 1 (mirroring)
2.3.4.3  Combined RAID levels


Hardware Infrastructures Overview

System Level Computing

Introduction to Computing Infrastructures

What is a computing infrastructure? It is a technological infrastructure that provides hardware and software for computation to other systems and services.

A computing infrastructure may be formed of various layers, each layer being formed by computing units of diverse computational power and/or power usage. The lower the layer, the closer we are to the physical world we are measuring and/or interacting with.

[Figure: layers of a computing infrastructure (data center, edge computing systems, PC, embedded PCs, embedded devices, Internet of Things), compared by RAM and computation speedup.]

Each layer (or type of computing unit) has its own characteristics.

Data Centers Characteristics

Data centers are large conglomerates of interconnected machines working together. This solution is less expensive overall and offers high storage and computational capacity with higher reliability. However, it requires a constant, high-bandwidth internet connection, and it comes with higher latency and power consumption, together with privacy and security issues.

Edge Computing Systems

Moving down the layers, we reach edge computers. They provide high computational capacity, higher security levels, and lower latencies. They also require a relatively large amount of power and an internet connection.

[Figure: cloud servers connected to Edge Computing (EC) nodes serving vehicle-to-vehicle communication, smart devices, and intelligent monitoring systems.]

Embedded PCs Functionality

Embedded PCs are low-power, low-performance units that are still capable of running complex operating systems such as Linux. They allow for a pervasive implementation of a system, good performance for the price, high hardware availability, and large support communities around each board. They still need a substantial amount of power (in the tens of watts), and some hardware design is often needed.

IoT Device Limitations

IoT devices are highly pervasive, extremely low cost, and low power, but they have substantial limitations in computing ability and memory, which makes them difficult to program.

Data Center Architectures

[Figure: components of a data center: building and infrastructure, servers, networking, cooling systems, storage, power supply, failure recovery.]

In the last few decades, computing and storage have moved from PC-like clients to smaller, often mobile, devices, combined with large internet services. Traditional enterprises are also shifting to this new paradigm of cloud computing.

This shift benefits vendors: it enables them to deliver products in a SaaS fashion, allowing for faster application development, easier improvements and fixes to the software, and more robust deployment thanks to the use of a few well-tested configurations.

Some workloads require so much computing capability that they are a more natural fit for a datacenter than for client-side computing. These range from web search services to machine learning. (As an example, it would take 355 years to train GPT-3 on the fastest GPU on the market.)
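The 355-year figure can be reproduced with a back-of-the-envelope calculation. The sketch below is only illustrative: the total training cost (about 3.14e23 floating-point operations) and the sustained single-GPU throughput (28 TFLOPS) are assumptions taken from commonly quoted public estimates, not from these notes, and the `devices` parameter assumes perfect linear scaling, which real clusters do not achieve.

```python
# Assumed inputs (not stated in the course notes): commonly quoted estimates.
TOTAL_TRAINING_FLOP = 3.14e23        # total floating-point operations to train GPT-3
SUSTAINED_FLOP_PER_SECOND = 28e12    # one GPU sustaining 28 TFLOPS
SECONDS_PER_YEAR = 365 * 24 * 3600


def training_years(total_flop: float, flop_per_second: float, devices: int = 1) -> float:
    """Return the training time in years, assuming ideal linear scaling over devices."""
    if flop_per_second <= 0:
        raise ValueError("flop_per_second must be positive")
    if devices < 1:
        raise ValueError("devices must be at least 1")
    seconds = total_flop / (flop_per_second * devices)
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    single = training_years(TOTAL_TRAINING_FLOP, SUSTAINED_FLOP_PER_SECOND)
    print(f"single GPU: about {single:.0f} years")
```

Under these assumptions a single GPU needs roughly 355 years, which is why such workloads are spread over many accelerators in a datacenter.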

Warehouse Scale Computers (WSCs)

The trends toward server-side computing and widespread internet services created a new class of computing systems: Warehouse-scale computers (WSCs). In one of these systems, a program may be a whole internet service, consisting of tens or more individual programs that interact in complex ways.

Data centers are buildings where multiple servers and communication units are co-located because of their common environmental requirements and physical security needs, and for ease of maintenance.

Traditional data centers typically host a large number of small- or medium-sized applications. Each application runs on a dedicated hardware infrastructure that is decoupled and protected from other systems in the same facility; because of this, applications tend not to communicate with each other.

These data centers host hardware and software for multiple organizational units or even different companies.

WSCs, instead, belong to a single organization, use a homogeneous hardware and system software platform, and share a common systems management layer.

WSCs run a smaller number of large applications (or internet services). The common resource management infrastructure allows significant deployment flexibility.

[Figure: traditional datacenters, where customers A, B, and C each run their own services and IT, compared with a warehouse-scale computer, where many internet services share a single provider's IT.]

WSCs were initially designed for online data-intensive web workloads (such as managing Amazon's ecosystem). When the companies that developed these systems realized the power of these infrastructures, they started selling public cloud computing systems (e.g., Amazon, Google, Microsoft). Such public clouds do run many small applications, like a traditional data center, but these applications rely on Virtual Machines (or Containers) and access large, common services for block or database storage, load balancing, and so on, fitting very well within the WSC model.

These big data centers are often replicated in various physical locations to reduce user latency and improve serving throughput. Nevertheless, a single request is processed within one data center.

Geographic Regions in WSCs

The world is divided into Geographic Areas (GAs), defined by geopolitical boundaries (or country borders) or determined by data residency. In each GA there are at least 2 Computing Regions (CRs). Customers see regions as the finer-grained discretization of the infrastructure; this means that multiple DCs in the same region are not exposed.

[Figure: two regions, Region1 and Region2, annotated with ">100 miles" and "<2ms".]

Each region has a perimeter defined by the maximum latency to reach its DC.

Availability Zones (AZs)

Availability Zones (AZs) are finer-grained locations within a single computing region. They allow customers to run mission-critical applications with high availability and fault tolerance to datacenter failures. Each availability zone is fault-isolated, with redundant power, cooling, and networking.
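The benefit of spreading an application over several AZs can be illustrated with a simple parallel-redundancy estimate. This is a minimal sketch under stated assumptions: AZ failures are treated as fully independent, the application is considered up as long as at least one AZ is up, and the 0.99 per-zone availability is a made-up illustrative value. The function name `multi_az_availability` is hypothetical.

```python
def multi_az_availability(az_availability: float, zones: int) -> float:
    """Probability that at least one of `zones` independent AZs is available."""
    if not 0.0 <= az_availability <= 1.0:
        raise ValueError("az_availability must be between 0 and 1")
    if zones < 1:
        raise ValueError("zones must be at least 1")
    # The service is down only if every zone is down at the same time.
    all_zones_down = (1.0 - az_availability) ** zones
    return 1.0 - all_zones_down


if __name__ == "__main__":
    # 0.99 is an illustrative per-zone availability, not a provider figure.
    for zones in (1, 2, 3):
        print(zones, multi_az_availability(0.99, zones))
```

In practice failures are never perfectly independent, so the result is an optimistic upper bound rather than a guarantee.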

[Figure: a region with four availability zones (AZ1 to AZ4); Customer 1 runs without multi-AZ deployment, while Customers 2 and 3 use multiple AZs.]

Node Level Hardware

Servers and Form Factors

Racks
Towers
Blade

Hardware Accelerators

GPUs
TPUs
FPGAs

Storage Systems

Hard disk drive
Solid state drive
- Garbage Collection
- Flash Translation Layer
- Wear Leveling
Storage Systems
- DAS
- NAS
- SAN
RAID Disks
- RAID level 0 (striping)
- RAID level 1 (mirroring)
- Combined RAID levels
- RAID level 4
- RAID level 5
- RAID level 6
- RAID disks: reliability calculations

Networking Architectures

Switch-centric architecture
- Traffic definitions
- Three Layer Network
  - Top of Rack
  - End of Row
- Leaf-spine
- Pod-based model
Server centric architecture

Building Level Infrastructure

Cooling Systems

Power Consumption

Software Infrastructures

Virtualization Concepts

Process and System VMs

System VMs
Process VMs

Virtualization Types

Multi-programmed systems
Emulation
High-Level Language VM
Classic System VMs
Hosted VM
Whole System VMs

Virtualization Implementation

Hypervisors

Para Virtualization
Full Virtualization

Containers

Computing Architectures

Cloud Computing

From Cloud to Edge and Fog Computing

Methods for Infrastructure Management

Dependability in Systems

Introduction to Dependability

Effects of failure
When to think about dependability
Application areas
Providing dependability
Implementations levels
Tradeoffs

Dependability Attributes

Reliability
Availability
Related indices
Terminology
Reliability block diagram
Standby redundancy

Scalability and Performance

Performance Modeling

Model-based
Queue Networks
- Arrival
- Service
- Queue
- Population
- Routing

Operational Laws

Utilization Law
Little's Law
Interactive Response Time Law
Visits
Forced Flow Law
Response and residence times
General Response Time Law

Performance Bounds

Bounding Analysis
Bottlenecks

Notation
Asymptotic bounds
- Open Models
- Closed Models

Useful Formulas

Hardware Infrastructure Formulas

Hard Disk Drives Calculations

RAID Calculations

Dependability and Performance Formulas

Reliability and Availability Metrics

Operational Laws Equations

Asymptotic Bounds Formulas

RAID Levels Calculations
