
Tuesday, June 18, 2013

Crash during md array recovery

I just got a machine crash while my RAID5 array was rebuilding. After the system came back, the array was marked as inactive:
md2 : inactive sdb1[0] sde1[4](S) sdc1[3] sdd1[1]
      7814047744 blocks
No matter what I tried, the array wouldn't activate. For example, I tried the following command:
mdadm --assemble --force --scan /dev/md2
But there was no message and no change. I also tried a reboot; that didn't help either. The thing is, I have very important data on that array and I didn't want to experiment with commands and risk data loss. Googling around, I found several possible solutions, for example this, this, or this. Even though those had some similarities with my situation, none helped me. In the end, I realized that the array was already started (it is shown in /proc/mdstat), and that fact is the reason why the previous assemble command didn't do anything, and also why I was receiving errors like this one:
# mdadm -A /dev/md/2 /dev/sd{b,c,d,e}1
mdadm: cannot open device /dev/sdb1: Device or resource busy
In the end, what helped was stopping the array using:
mdadm -S /dev/md2
and then I started it again. First, I tried:
# mdadm --assemble --scan /dev/md2
mdadm: /dev/md2 assembled from 3 drives and 1 spare - not enough to start the array while not clean - consider --force.
which obviously didn't help. But, as suggested, I tried with the --force option:
mdadm --assemble --force --scan /dev/md2
After this, the array was again in an inactive state. Then I stopped it and started it again, and suddenly it was active and in recovery mode. I'm not sure whether this restart was necessary. The main reason I did it was that I thought the array had been brought up in a state that would destroy the data on it, so I quickly stopped it. After stopping it I realized I was mistaken, so I started it again, and this time it was in recovery mode.

One more thing that bothered me was whether the array was rebuilding properly, i.e. writing corrected data onto the failed device and not onto some healthy one. Namely, looking into /proc/mdstat shows the disk status [UU_U], which could mean that the third disk, i.e. sdd1, was being rebuilt, which would be wrong. But then, using the command:
mdadm --misc --detail /dev/md2
I saw that sde1 was being rebuilt, which was what I expected. And while I'm at the output of this command, it is interesting to know that the MD subsystem knows which disks are out of sync using the event count that is displayed too.
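For the record, the [UU_U] status string is ordered by raid slot number (the [N] suffix after each device name in /proc/mdstat), not by the order in which the devices are listed, and a role number equal to or larger than the number of raid devices (like sde1[4] here) marks a spare being rebuilt into the missing slot. A small awk sketch decoding the string from above:

```shell
# Decode the status string: each position corresponds to a raid slot,
# so the underscore tells us which slot is missing or being rebuilt.
status='[UU_U]'
awk -v s="$status" 'BEGIN {
  for (i = 2; i < length(s); i++) {           # skip the surrounding brackets
    c = substr(s, i, 1)
    printf "slot %d: %s\n", i - 2, (c == "U" ? "in sync" : "missing/rebuilding")
  }
}'
# prints slots 0-3, with slot 2 reported as missing/rebuilding,
# i.e. the slot that sde1 is being rebuilt into
```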

Finally, as disks become larger and larger, rebuilding an array takes more and more time. My array of 4 disks, each 2T in size, takes about 10 hours to rebuild. This is a lot of time. So, maybe it's time to switch to btrfs or zfs, which have integrated RAID functionality and can therefore rebuild an array much faster. Alternatively, the MD subsystem should be taught to take note of which blocks changed and rebuild only those blocks instead of a whole disk.
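As an aside, for transient failures md already has something close to this: the write-intent bitmap. If an array has one, a member that merely dropped out and is re-added gets resynced only over the regions written while it was missing, not the whole disk. A sketch (run against a healthy array; requires a reasonably recent mdadm/kernel):

```shell
# Add an internal write-intent bitmap to an existing array.
mdadm --grow /dev/md2 --bitmap=internal
# Verify; recent mdadm versions show an "Intent Bitmap : Internal" line.
mdadm --misc --detail /dev/md2 | grep -i bitmap
```

Note this only helps with re-adds of a briefly missing member; a replacement disk still needs a full rebuild.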


Sunday, February 5, 2012

Getting CentOS on RAID after using text mode installation...

Well, this was a hack. I got a firewall to install with only 512MB of RAM, which isn't enough for a graphical install. And using the text install means that there are no customizations available, in particular no custom disk partitioning. But since I don't install anything important without RAID, I had to do it somehow. The options were:
  1. Temporarily add more RAM only during installation process.
  2. Use kickstart file.
  3. Juggle with partitions.
Option 1 wasn't acceptable since I didn't have any extra RAM modules available, and option 2 seemed too complicated as I don't have a USB flash drive to store the kickstart file on, and a network install is a bit too much on a home network. So, I decided to go with option 3. Note that there was one more thing in favor of option 3, namely that one disk has a capacity of 80GB while the other has 250GB, which means there is extra space I can use as temporary storage. Still, even without it, option 3 would be viable with a bit more juggling.

So, the general idea is as follows:
  1. Install CentOS on a temporary partition.
  2. Create RAID array and move CentOS there.
  3. Fix boot.
Those three steps are not as strictly sequential and clearly separated as they look, as we'll see, but they are logically grouped. Also, just to clarify: the helper partition is the disk space after the 80th GB on the second (larger) disk!

Install CentOS on a helper partition

This is the first step and it's easy. Boot from the DVD and start the installation process. Except for one small detail: namely, how to persuade the installer to use the helper partition when it does almost everything automatically? Well, that's actually easy to solve. After Anaconda starts, but before doing anything, switch to the second virtual terminal (Alt+F2) and use fdisk to create partitions for RAID. In my case those were:
  1. /dev/sda1 and /dev/sdb1 of size 256MB for /boot partition
  2. /dev/sda2 and /dev/sdb2 of size 2G for swap
  3. /dev/sda3 and /dev/sdb3 that take the rest of the space up to 80GB; these will be the root (/) partition.
This will take the first 80GB on both disks and leave empty space on the second disk. Now, go back to the installer and continue the installation. When the installer asks where to install Linux, select "Empty space". Note that later I realized it would have been better to create another partition so that the free space is smaller. This speeds up the installation process, as creating file systems is faster! Anyway, for a suggestion on a minimal CentOS installation you can look at what I wrote in this post.
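If you prefer to script the layout instead of using interactive fdisk, a hypothetical sfdisk sketch for the same partitioning would look like this (modern sfdisk size syntax; type fd is "Linux raid autodetect"; repeat for /dev/sdb):

```shell
# WARNING: destructive, and only a sketch. Sizes match the list above;
# everything past ~80GB is left as free space on the larger disk.
sfdisk /dev/sda <<'EOF'
,256M,fd
,2G,fd
,77G,fd
EOF
```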

Create RAID arrays and move CentOS there

After the installer finishes, don't boot into the new system. Boot again from the DVD and select rescue mode. Also, allow the installer to search for existing installations and select to mount them in read-write mode! Finally, select shell from the menu that appears.

Now, create RAID arrays. If you used partitions as I did, then the following sequence of commands will do the work.
mdadm -C /dev/md0 --metadata=0.90 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
This creates the first RAID1 array, for the boot partition. Note that I'm using metadata version 0.90. This is because grub doesn't understand later formats! That "little" fact cost me a lot of time!
mdadm -C /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
mdadm -C /dev/md2 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3
Those two commands create RAID1 arrays for swap and root. Maybe at this point you'll need to assemble the arrays manually (i.e. if it isn't done automatically by the above mdadm commands):
mdadm --assemble --scan
Now, create file systems on those arrays:
mkfs.ext4 /dev/md0
mkfs.ext4 /dev/md2
mkswap /dev/md1
RAID arrays are now created. The next step is to move the installed CentOS from the temporary partition. First, you have to mount the destination filesystems:
mkdir /mnt/s
mount /dev/md2 /mnt/s
mkdir /mnt/s/boot
mount /dev/md0 /mnt/s/boot
After they are mounted, copy all the files:
cd /mnt/sysimage
rsync -av bin boot etc home lib lib64 opt root sbin tmp usr var /mnt/s/
Note that I skipped some in-memory file systems like proc, dev, and similar. You should create only the directories, without content, as their content is recreated during each boot and held in memory:
cd /mnt/s
mkdir dev media mnt proc selinux srv sys
That's it for the filesystems. We now need to adjust the /etc/fstab and /etc/grub.conf files, as they reference the temporary partition/filesystem used by the installer. So, change /etc/fstab to contain the following lines/filesystems:
/dev/md2    /       ext4    defaults   1  1
/dev/md0    /boot   ext4    defaults   1  2
/dev/md1    swap    swap    defaults   0  0
There will also be lines starting with tmpfs, devpts, sysfs and proc. Leave those as-is and remove all the other lines. As for the /etc/grub.conf file, you need to modify any occurrences of (hdN,M). Those will be in two places: one in the splashimage line, which isn't so important, and the other in the line starting with the root keyword. That one IS important! Also, remove from the line that starts with the keyword kernel any word that contains the substring LV (logical volume!). After the change, this line should look something like this:
kernel /vmlinuz-2.6.32-220.el6.x86_64 ro root=/dev/md2 LANG=en_US.UTF-8 crashkernel=auto KEYBOARDTYPE=pc KEYTABLE=us
Note that this is a single line! The important part is root=; the others might differ if you selected some other language or keyboard layout during the installation process. Also, I removed the keywords rhgb and quiet. Those prevent me from seeing kernel messages during the boot process, and those messages could be very important, especially in this case when we are doing something that could impact the early boot process itself!

Finally, remove the temporary partition. Do that using fdisk: just delete the partition. Note that this will take effect on the next reboot, so nothing will happen immediately and the computer won't crash. :)

And, that's it for moving CentOS to RAID partition.

Fixing boot loader

This is the last step. Boot from the DVD and select rescue mode. Again, allow the installer to scan the disks for Linux partitions. This time the installer should find the RAID partitions and CentOS on them. Mount them in read-write mode and enter the shell again. Execute the following command:
chroot /mnt/sysimage
This will switch you into the target CentOS installation on RAID.

First, we have to recreate /etc/mdadm.conf. Note that if you don't do this, the system won't boot. This troubled me until I figured out that the file contained stale data filled in by Anaconda and that I hadn't refreshed its content. So, to refresh it, do the following:
mdadm --examine --scan > /etc/mdadm.conf
Open /etc/mdadm.conf in an editor and add the following line at the beginning:
DEVICE /dev/sd*[0-9] /dev/hd*[0-9]
Also, be careful: I typed etc instead of dev and it cost me two additional reboots. :)

Now, recreate initramfs image. Do that with the following command:
mkinitrd -f -v /boot/initramfs-2.6.32-220.el6.x86_64.img 2.6.32-220.el6.x86_64
Be careful: -f forces mkinitrd to overwrite the existing initramfs file, so it might be good to make a copy of that file first, just in case.

One final step and that's it, reinstall the boot loader! Do this with the following commands:
grub-install /dev/sda
grub-install /dev/sdb
If the two commands return an error (some problem with stage1 or stage2 files), then do it "manually" like this:
# grub
grub> root (hd0,0)
grub> setup (hd0)
grub> root (hd1,0)
grub> setup (hd1)
grub> exit
Note that you type the text after the grub> prompt, while everything else is the response from the system. And that's it! Reboot the system, this time from the hard disk, and you should have a minimal CentOS installation on RAID partitions!

Thursday, November 24, 2011

Re-adding SATA disk to software RAID without rebooting...

For the second time now, on one of the servers I maintain, one of the SATA disks was suddenly disconnected from the server. Looking into the log files, I found the following error messages:
kernel: mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00) cb_idx mptbase_reply
kernel: mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00) cb_idx mptscsih_io_done
kernel: mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00) cb_idx mptscsih_io_done
last message repeated 62 times
and then a lot of messages like the following one:
kernel: sd 0:0:1:0: SCSI error: return code = 0x00010000
kernel: end_request: I/O error, dev sdb, sector 1264035833
This triggered RAID to log the following type of messages:
kernel: raid5:md0: read error not correctable (sector 28832 on sdb2)
and finally to remove failed disk from array:
kernel: RAID5 conf printout:
kernel:  --- rd:3 wd:2 fd:1
kernel:  disk 0, o:1, dev:sda2
kernel:  disk 1, o:0, dev:sdb2
kernel:  disk 2, o:1, dev:sdc2
kernel: RAID5 conf printout:
kernel:  --- rd:3 wd:2 fd:1
kernel:  disk 0, o:1, dev:sda2
kernel:  disk 2, o:1, dev:sdc2
I have yet to find out what happened, but in the meantime the consequence of those error messages was that one disk was disconnected and removed from the RAID array, and I received the following mail from the mdmonitor process on the server:
This is an automatically generated mail message from mdadm
running on mail.somedomain

A Fail event had been detected on md device /dev/md0.

It could be related to component device /dev/sdb2.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc2[2] sdb2[3](F) sda2[0]
      1952989696 blocks level 5, 256k chunk, algorithm 2 [3/2] [U_U]
     
unused devices: <none>
Since this happened exactly at noon, which is a time when everybody uses the mail server, rebooting the server wasn't exactly an option, not unless I absolutely had to. In this case I decided that I would reboot it after work hours, and in the meantime I could either just wait or try to rebuild the RAID. If I waited, there was a risk of another disk failing, which would bring the server down. So, as this had already happened before, and I knew that the disk was OK and would be re-added after a reboot, I decided to try to do that immediately, on the live system.

So, the first thing is to request kernel to rescan SATA/SCSI bus in order to find "new" devices. This is done using the following command:
 echo "- - -" > /sys/class/scsi_host/host0/scan
After this, the disk reappeared, but the problem was that its name was now /dev/sde and not /dev/sdb. To always get the same name for the disk I would need to mess with udev, which I was not prepared to do now. (And, by the way, I recently read about a patch that allows you to do just that, to rename an existing device, but I think it was rejected on the grounds that this kind of stuff is better done in user space, i.e. by modifying udev rules.)
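For completeness, a persistent name can be had with a one-line udev rule. This is only a sketch: the file name and the symlink name are made up, and the serial number is a placeholder (check the real one with smartctl -i or udevadm info):

```
# /etc/udev/rules.d/99-raid-disks.rules (hypothetical file name)
# Create a stable symlink /dev/raiddisk-b for the disk with the given serial,
# so the array member keeps the same name even after a bus rescan.
KERNEL=="sd?", ENV{ID_SERIAL}=="WDC_WD1001FALS-0_PLACEHOLDER", SYMLINK+="raiddisk-b"
```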

Now, the only problem was to "convince" the RAID subsystem to re-add the disk. I thought that it would find the disk and attach it by itself but, eventually, I just used the following command:
mdadm --manage /dev/md0 --add /dev/sde2
The command notified me that the disk was already a member of the array and that it was being re-added. Afterwards, the sync process started, which would take some time:
 # cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sde2[3] sdc2[2] sdb2[4](F) sda2[0]
      1952989696 blocks level 5, 256k chunk, algorithm 2 [3/2] [U_U]
      [=>...................]  recovery =  7.6% (74281344/976494848) finish=204.9min speed=73355K/sec
    
unused devices: <none>
It would be ideal for transient errors like this one if the RAID subsystem memorized only the changes and, when the disk is re-added, applied only those changes. But I didn't manage to find a way to do that, and I also think that that functionality is not implemented at all.

Anyway, after synchronization process finished this is the content of /proc/mdstat file:
# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sde2[1] sdc2[2] sdb2[3](F) sda2[0]
      1952989696 blocks level 5, 256k chunk, algorithm 2 [3/3] [UUU]
     
unused devices: <none>
As you can see, sdb2 is still here. Removing it isn't possible because there is no corresponding device node:
# mdadm --manage /dev/md0 -r /dev/sdb2
mdadm: cannot find /dev/sdb2: No such file or directory
[root@mail ~]# mdadm --manage /dev/md0 -r sdb2
mdadm: cannot find sdb2: No such file or directory
So, I decided to wait until reboot.
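For the record, newer mdadm versions accept the special words "failed" and "detached" in place of a device name, which might have avoided waiting for the reboot. I haven't tried this in that situation, so treat it as a hint rather than a tested recipe:

```shell
# Remove all array members that are marked faulty:
mdadm --manage /dev/md0 --remove failed
# Or remove members whose underlying device node has disappeared:
mdadm --manage /dev/md0 --remove detached
```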

Edit: I did reboot a few days ago, and after the reboot everything returned to its normal state, i.e. the state from before the disk was removed from the array!

[20121114] Update: Again, this happened almost exactly at noon. Here is what was recorded in the log files:
Nov 14 12:00:02 mail kernel: mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00) cb_idx mptbase_reply
Nov 14 12:00:07 mail kernel: mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00) cb_idx mptscsih_io_done
Nov 14 12:00:08 mail kernel: mptbase: ioc0: LogInfo(0x31110d00): Originator={PL}, Code={Reset}, SubCode(0x0d00) cb_idx mptscsih_io_done
Nov 14 12:00:08 mail kernel: mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000) cb_idx mptscsih_io_done
Nov 14 12:00:08 mail kernel: sd 0:0:2:0: Unhandled error code
Nov 14 12:00:08 mail kernel: sd 0:0:2:0: SCSI error: return code = 0x00010000
Nov 14 12:00:08 mail kernel: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Nov 14 12:00:08 mail kernel: mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000) cb_idx mptscsih_io_done
Nov 14 12:00:08 mail kernel: mptbase: ioc0: LogInfo(0x31130000): Originator={PL}, Code={IO Not Yet Executed}, SubCode(0x0000) cb_idx mptscsih_io_done
Nov 14 12:00:08 mail kernel: sd 0:0:2:0: Unhandled error code
Nov 14 12:00:08 mail kernel: sd 0:0:2:0: SCSI error: return code = 0x00010000
Nov 14 12:00:08 mail kernel: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Nov 14 12:00:08 mail kernel: raid5: Disk failure on sdc2, disabling device. Operation continuing on 2 devices
Nov 14 12:00:08 mail kernel: sd 0:0:2:0: Unhandled error code
Nov 14 12:00:08 mail kernel: sd 0:0:2:0: SCSI error: return code = 0x00010000
Nov 14 12:00:08 mail kernel: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Nov 14 12:00:08 mail kernel: raid5:md0: read error not correctable (sector 1263629840 on sdc2).
Nov 14 12:00:08 mail kernel: RAID5 conf printout:
Nov 14 12:00:08 mail kernel:  --- rd:3 wd:2 fd:1
Nov 14 12:00:08 mail kernel:  disk 0, o:1, dev:sda2
Nov 14 12:00:08 mail kernel:  disk 1, o:1, dev:sdb2
Nov 14 12:00:08 mail kernel:  disk 2, o:0, dev:sdc2
Nov 14 12:00:08 mail kernel: RAID5 conf printout:
Nov 14 12:00:08 mail kernel:  --- rd:3 wd:2 fd:1
Nov 14 12:00:08 mail kernel:  disk 0, o:1, dev:sda2
Nov 14 12:00:08 mail kernel:  disk 1, o:1, dev:sdb2
And then the system re-scanned the bus by itself, but it didn't re-add the disk to the array:
Nov 14 12:00:44 mail kernel: mptsas: ioc0: attaching sata device: fw_channel 0, fw_id 6, phy 2, sas_addr 0x8a843926a69f9691
Nov 14 12:00:44 mail kernel:   Vendor: ATA       Model: WDC WD1001FALS-0  Rev: 0K05
Nov 14 12:00:44 mail kernel:   Type:   Direct-Access                      ANSI SCSI revision: 05
Nov 14 12:00:44 mail kernel: SCSI device sde: 1953525168 512-byte hdwr sectors (1000205 MB)
Nov 14 12:00:44 mail kernel: sde: Write Protect is off
Nov 14 12:00:44 mail kernel: SCSI device sde: drive cache: write back
Nov 14 12:00:44 mail kernel: SCSI device sde: 1953525168 512-byte hdwr sectors (1000205 MB)
Nov 14 12:00:44 mail kernel: sde: Write Protect is off
Nov 14 12:00:44 mail kernel: SCSI device sde: drive cache: write back
Nov 14 12:00:44 mail kernel:  sde: sde1 sde2
Nov 14 12:00:44 mail kernel: sd 0:0:4:0: Attached scsi disk sde
Nov 14 12:00:44 mail kernel: sd 0:0:4:0: Attached scsi generic sg2 type 0
So I had to manually issue the following command:
mdadm --manage /dev/md0 --add /dev/sde2

Friday, October 28, 2011

Installing minimal CentOS 6.0 distribution

This post starts a three part series in which I'll describe in detail how to install Zimbra Open Source mail server on 64-bit CentOS distribution. The first part deals with CentOS installation itself. The second part talks about setting up split DNS server, and finally, the third part will talk about setting up Zimbra server itself.

Before describing installation, I'm going to define the environment, and some basic parameters, in which this server is going to be deployed. Note that you can implement this network topology using VMWare or some similar product and in that way you can test that everything is working before doing actual installation.

So, the network topology I'm going to assume is given in the following figure:

Network topology for Zimbra Mail server
What you can see in this figure is the future Zimbra server (on the right) with the IP address 10.0.0.2/24. This is the server whose installation I'm going to describe. We'll also assume the domain example-domain.com. For the moment no additional parameters are needed; in later posts I'll introduce all the necessary parameters on an as-needed basis.

Preinstallation considerations

When I perform a CentOS installation I usually do a minimal install because that way I get a more secure system. Then, as the need arises, I add additional packages. Sometimes it happens that even the minimal installation (as defined by the CentOS installer) has some packages I don't need, and so I remove them. But this changes from release to release. For example, at one time the minimal installation included isdn4k-tools, which I didn't need as I was connecting my servers to an Ethernet LAN. Apart from the security argument, there used to be an additional reason to make a minimal installation: namely, to save disk space. But because of the abundance of available disk space today, that reason is not valid any more, at least not for the majority of cases.
 
Performing base system installation is in principle very easy. The potential problem is that you need to anticipate some parameters, three of which we are going to discuss in some detail. Those are file systems (and disks), network configuration and 32 or 64-bit installation.

For file systems, the following details have to be considered: partition sizes, the use of logical volume management, and RAID. There is also the question of the exact file system type to use, but I won't discuss that one here; ext4 suffices in the majority of cases.

When we talk about the sizes of different directories, the especially problematic ones in general are /var and /home, but also, for example, /opt, or any other directory with application data and/or logs. Directories like /etc, /usr, /lib, and some others are in general constant in size during the system's deployment. What I would suggest is that you start with the minimum disk space required, and when one of the aforementioned directories has to have more space, you just create a new partition, move the content of the directory this partition will replace, and finally mount the partition. Additionally, the application you intend to install could significantly influence how your partitions are laid out. In any case, I don't let the installer do the partitioning by itself.

I try to avoid logical volume management if I can, if for nothing else then to remove one additional layer of complexity. But in certain scenarios you'll have no choice but to use it, unless of course you want to have some nightmares later. When, for example, you are installing a production system that is going to be used for a long time and will hold a large quantity of data (but you are uncertain how much exactly), I would suggest that you use logical volume management. So, we have two extremes: on one side there is a static system that won't grow much in size, with a simple file system layout, and on the other side there is a heavily loaded server with lots of recorded data and/or a very complex file system layout. Note that for small systems, and maybe medium ones too, where you can afford a few hours of downtime, any decision you make can later be changed. For example, you start without LVM and then decide that you need it, so you add an LVM partition under a single directory only, or you convert everything apart from the boot partition. It is relatively easy to do so and I'll describe that process in some future post.

Finally, there is also question about the use of RAID, should you use it or not. There are several different possibilities:
  1. You are installing system on a local disk subsystem, with or without hardware RAID support.
  2. You are using remote disk storage.
  3. Installation is performed within virtualized environment (e.g. VMware, Citrix Xen, KVM)
In case you are using a virtualized environment, you don't have to use RAID; actually, it is overkill. The assumption is that the host itself has RAID to protect all the hosted virtual machines. Still, there is one exception, and that is a production server running within ESXi. In case you are using ESXi with local storage and you don't have hardware RAID, then you have to implement RAID in the virtual machine. But I suppose this case will be rare, as it signals that you are using some poor hardware for a production environment. Nevertheless, it is possible to do so, and maybe I'll describe that scenario in some future post too.

Next, if you are installing a test server or something not particularly important, RAID is definitely overkill. And finally, if you are using remote storage, it is also not necessary to use RAID, because the remote storage takes care of that (or at least it should).

This leaves us with the scenario of using local storage and installing an important server, and the question of whether to use software or hardware RAID (if there is no hardware RAID, there is obviously no dilemma). I personally prefer software RAID for the simple reason that it allows me to access individual disks using the smartctl tool to monitor their health status. It is also a better solution than a number of low-cost RAID controllers, because those are, in essence, software RAIDs anyway. Still, when you have some high-end hardware with a very good hardware RAID and/or you need high performance, then your way to go is definitely hardware RAID.

So, the last thing to consider is how to combine software RAID and LVM. I personally prefer using md RAID and installing LVM on top of it.

While we are at disks, we also have to consider the swap partition size. I doubt that more than a few gigs of swap is of any use. It used to be a rule to have twice as much swap as you have RAM, but in case you have 64G of RAM, having 128G of swap is an exaggeration. I usually put in 2G, maybe 4G at most. Simply, this can be considered as space for dormant applications. If you have so many dormant applications that they fill that much swap, then you should probably tune your applications. And yes, if swap is used as short-term space for applications (i.e. they are swapped out, and then shortly afterwards swapped in), that is also not good, as it severely impacts the performance of the server. Finally, RAM is cheap: buy more RAM, not larger disks.

The second consideration, after file systems, is the network. Basically, there are only two options: dynamic or static addresses. That choice is relatively easy. If you are installing some sort of server, a machine that will be accessed by other machines/people, then it's better to assign a static IP address. With a dynamic address it could happen that the DHCP server is unreachable for some reason, and then the server loses its IP address and stops functioning. On the other hand, if you are installing a workstation, that is, a machine that will access other machines, then the better option in the majority of cases is dynamic assignment of addresses, i.e. DHCP. It brings some flexibility into the system, at the price of lower security (which can also be adequately addressed).

Finally, the third consideration is whether to install a 32 or 64-bit system. I strongly suggest that you install a 64-bit system. Only in case you are running some application that is supported only on a 32-bit operating system should you use a 32-bit system. In all other cases, as I said, use 64-bit. Here I implicitly assume that the hardware you use is 64-bit; if it is not, then that's also a case where you'll use a 32-bit operating system. Note that it is possible to run 32-bit applications on a 64-bit operating system! That is, a 32-bit installation is not mandatory for using 32-bit applications!

So, that's all about preinstallation considerations. Let us proceed to base system installation.

Installing base system

After all the preinstallation considerations, I'll assume that we are going to install a 64-bit system in a virtualized environment and that we don't expect this system to grow much in terms of installed size and recorded data. So, I won't use RAID, nor am I going to use LVM. Furthermore, it's definitely a server, so we'll use a static IP address. Also, we'll assume that the server has 8G of RAM, and we'll allocate 2G of swap and 4G for a single root partition (no separate /var, /home, etc.). Actually, a minimal installation takes about 600MB, but this will grow by about 200M after the first update. So, you have to have at least 1G for the base system install.

Start by putting in the CD and booting the machine (or attaching the ISO image and starting the virtual machine).

After the installation starts, it asks you the following series of questions:
  1. Should the installer check the CD/DVD? In case you are using an ISO image there is certainly no need for that. If you are using real DVD media, decide for yourself; I usually skip this step. After this question, the graphical installation starts. Note that if you don't have enough RAM, you'll be forced into a text-based installation, which has a severely restricted number of options, e.g. you cannot manually partition the hard disk! Take a look at this post in case you did the installation in text mode and want to switch to RAID.
  2. After you select Next you are first asked for the language to be used during installation, as well as for the keyboard layout. The two are used only during the installation process. Select the ones that suit you, and select Next.
  3. Storage types used for installation. There are two options: Basic Storage Devices and Specialized Storage Devices. The first one you use when you are performing installation on local disks, while the second one is for a shared storage. Just select Basic Storage Devices.
  4. Then, if this is a new computer or a new disk, you are presented with a warning that the disk(s) need to be reinitialized. Select the button 'Re-initialize all'.
  5. You are asked to provide the computer name. Enter mail.example-domain.com here. Then, click on the button Configure Network. A new dialog will open.
  6. In the newly opened dialog select the tab Wired (if it isn't already selected), select the option 'Auto eth0' and click on the button Edit. A new dialog will open.
  7. It is not necessary, but I change the name to be only eth0. Then, I select the checkbox Connect automatically. This is mandatory, because otherwise your server will be unavailable until someone logs into it and connects it to the network. This isn't something you want. :)
  8. Click on the tab IPv4 Settings. Under Method you'll see the option Automatic (DHCP). Change that to Manual and click on the Add button. Then, add the address 10.0.0.2, change the network mask to 24 (you'll automatically be offered 8) and enter the gateway 10.0.0.1. Also, enter the IP address of the public DNS server you are using, until we configure our own DNS server. Finally, click Apply. Click Close to close the network connections editor.
  9. Select the zone you are in and click Next.
  10. Next, you have to enter the root password. Note that this is a very important password, so you should pick a strong one, or be certain of what you are doing! Anyway, after entering the root password (twice) click Next. If you entered a weak password you'll be warned about it. Decide for yourself whether to ignore the warning or change the password to a better one. In any case, eventually you'll proceed to the next step.
  11. Now we come to the partitioning step. Select Create Custom Layout and then Next. You'll be transferred to the disk editor. In the disk editor create a swap partition (2G) and a root (6G) partition. Both are standard partitions, so when asked about the partition type (after clicking the Create button) just confirm the default value (i.e. Standard Partition). When you click Next, you'll be asked whether you are certain that the changes should be written to disk. To confirm, press the button Write Changes to Disk.
  12. When asked about grub loader, just select Next.
  13. Now you are presented with a screen to select the set of packages to be installed. Select Minimal and then Next.
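For reference, the static network configuration entered in steps 7 and 8 ends up in a file along these lines (a sketch: Anaconda also writes HWADDR/UUID lines, which I omit, and the DNS address below is a placeholder for whatever public server you chose):

```
# /etc/sysconfig/network-scripts/ifcfg-eth0 (sketch)
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none
IPADDR=10.0.0.2
PREFIX=24
GATEWAY=10.0.0.1
DNS1=192.0.2.53
```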
The installation now starts, so wait a bit. Because it is a minimal install, it finishes quite soon. When all the packages are installed, press Reboot. At this moment, on CentOS 6.2, the disk usage is:
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             7,4G  759M  6,3G  11% /

As the final step of the base system installation you should do an update. But in order to do so you'll have to add an additional repository that isn't included by default; see some details here. In short, you should run the following command as root (this is one line, but it could be broken because of formatting in your browser!):
rpm -ivh ftp://ftp.funet.fi/pub/mirrors/centos.org/6/cr/i386/RPMS/centos-release-cr-6-0.el6.centos.i686.rpm
After that command successfully finishes, run the following command to pick up all the updates. Update: this additional repository isn't used any more, as far as I know, so you can skip the rpm command above and just use the following command to update the installation:
yum update
When asked, confirm the update. You'll also be asked to import the CentOS signing key into the RPM database. Check that it is a valid key and confirm the import process. That's all, the base system is installed! Don't forget to reboot the machine after the upgrade, since many important packages were probably replaced with newer versions, and to activate them in already running processes you should reboot the machine.

After the update finished, my disk usage was:

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             7,4G  986M  6,1G  14% /
But the exact values heavily depend on the number of updates, so take this only as a rough guideline.

Adding some useful packages

As the final step of the base system installation, I'll list some additional packages you might want to install. I find them very useful for debugging problems and checking the system's correctness. Those packages are:
  • tcpdump - the packet sniffer. If something is wrong with the network, you'll use this tool to see what's going on (or not, depending on the problem :)).
  • strace - sometimes processes behave oddly, and in those cases you can use this tool to trace them and see what's going on. It's not exactly dtrace, but in many cases it is very helpful.
  • telnet - when some server is apparently listening on some port and you cannot access it for whatever reason, this simple telnet client lets you try to connect and, using tcpdump, see what's going on. It will even allow you to interact with the server, e.g. a mail server, to send a test email message.
  • lsof - a swiss army knife that allows many things to be queried from processes. For example, which ports are opened by a process, or to which process a particular port belongs. Then which files are opened, etc. A very useful tool, indeed.
  • ntpdate - a network time protocol client that allows you to synchronize your machine's clock with some accurate time server (e.g. zg1.ntp.carnet.hr).
  • rsync - for more efficient copying of data to and from the server.
  • openssh-clients - to allow rsync to work, and also to allow you to connect to remote machines from this server.
All those packages can be installed using yum followed by the package name (the name at the beginning of each bullet).
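In one go, that would be something like this (package names exactly as listed above):

```shell
yum install tcpdump strace telnet lsof ntpdate rsync openssh-clients
```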
