Intel Donated Server Rebuild at Montana State University CS Department [PART-1]

Introduction

About a year ago a large shipment of servers arrived at MSU for use in a new grid and cluster system for students and faculty of the Montana State University Computer Science department. Once the servers were unboxed, we realized that installing the operating system of our choice was going to be complicated, since the units were "engineering samples" that had never been used in production. In addition, each unit was a 1U form factor with no external media device (CDR/DVD/etc). On the newer PIV-Xeon systems (referred to as Server Type 1), the install was not as complicated as originally anticipated. The installation of Gentoo on the older Dual-PIII systems (referred to as Server Type 2), on the other hand, was more complicated because of the alpha version of the BIOS installed on those systems.

The tutorial you are currently reading has been broken into the parts listed below, both to keep the reader captivated (ha ha, captivating) and to describe the short-cut techniques that were learned along the way. By the end of this tutorial one should grasp the basic concept of how to build a cluster, along with some simple tricks that make maintaining one easier for a cluster administrator.

Part 1. Introduction and Purpose
Part 2. Tools and Equipment
Part 3. Deployment
Part 4. Maintenance
Part 5. Research

Purpose

The purpose of building a cluster at MSU was to provide services for research students and faculty. The request included installation of a grid and/or cluster to facilitate research in parallel computing, artificial intelligence, and other broad computer science topics. Specific examples include parallel algorithms, neural networks, adaptive services, networking protocols, and many others. Some of these research areas require a tightly coupled system, while others do not. For the research that does not, we chose semi-closed source grid-based software. For the more tightly coupled system, we chose an open source solution that will be obvious to even the newest Linux users. The tools that were evaluated are covered in the next part, which goes into more detail on how and why we chose these methods.

We have chosen to leave the details about the research to the end of the article so the reader can get to the "meat" of installing a cluster or grid on their own. One will find that little equipment is needed to build a cluster or grid. With the current turnover time of computers in the university system, older computers are being thrown away while they are still "computationally" strong. In other words, many of the systems that make up the MSU CS-Grid and Clusters are older PIIIs and PIVs. This means most nodes (fancy term for 1 computer or CPU) contain generic consumer memory, chipsets, video devices, and hard drives, which keeps the cost down. The turnaround time for computers in the university is three years on average, and although half of the population of computers is replaced every year to make up that total, the number of retired computer systems is still staggering. One solution to this issue was to build a grid or cluster to create a large computational resource for research.

In today's green-minded world anyone can see the need for recycling. Interviews with employees at Google and Yahoo have revealed that even the corporate world uses its retired workstations for computing power, although there are some funny jokes about late hours at Google spent trying to find out which server died in a mess of workstations. It is obvious that the computing market is in need of the holy grail breakthrough of performance gain through parallel computing. For many years artificial intelligence has demanded more and more processing power. The massive gaming world of today requires some of the most efficient and complicated hardware and software, and that hardware and software is starting to hit a brick wall (unless you read and agree with the latest article by Knuth). With this has come a surge of job openings in the grid, cluster, parallel computing ... [place keyword here] ... markets around the world.