Tuesday, June 16. 2009
As promised in the previous post this is the second part in a series of testing/benchmarking 8.4 under various circumstances.
The topic in this post is bulk loading of data. Knowing about expected and theoretical bulk loading performance is a very important thing for a DBA. It affects not only data warehouse style operations but also plays an important part in disaster recovery scenarios because the total time it takes to restore your database from your backup is directly related to its bulk loading performance.
Continue reading "Benchmarking 8.4 - Chapter 2/bulk loading"
Friday, June 12. 2009
Computing platforms are constantly changing, evolving and improving. This holds true for both the hardware and the software running on that hardware.
With PostgreSQL 8.4 just around the corner and a 2U Nehalem based IBM x3650M2 at my hands I thought I would do some benchmarking and see how well PostgreSQL does on that kind of hardware under a range of different workloads. This is planned as a series of posts starting with read-only benchmarking, followed up by bulk load testing and maybe some OLTP benchmarks as well.
Continue reading "Benchmarking 8.4 - Chapter 1/Read-Only workloads"
Tuesday, May 22. 2007
ok just bought me a new (and rather expensive) toy after some people said "ok if you buy one today I will buy you a game for that".
So I went to the electronic shop and bought one (including the optional remote and two games) but I guess that might have only been the start because now a nice new LCD television would suddenly make sense ...
Saturday, May 19. 2007
Today I spent some time doing a bit of maintenance work on some of my buildfarm boxes and suddenly I thought it would be nice to present some of the more weird ones to others.
Of course there is also Magnus' complaint about a lack of blog posts on planetpostgresql.org that I just had to react to ;-)
I have a total of 14 registered hosts on the buildfarm with nine of them actively reporting, and four of them (those that I think are the most weird and interesting ones) are worth mentioning in a bit more detail:
lionfish:
- Hardware: Cobalt Cube 2
- CPU: 250Mhz MIPS in little endian mode
- Bogomips: ~250
- Memory: 48MB of RAM (+196MB of swap)
- Disk: 4GB IDE
- Time to complete a build farm run: ~5.5-6 hours (this makes lionfish by far the slowest box on the farm)
- On the farm since: 2004
- Operating System: Debian/Sarge 3.1
issues found by lionfish:
quagga:
- Hardware: ALLNET6500 (identical to the Thecus 2100 but with 256MB of RAM instead of 128MB)
- CPU: Intel IOP 80219 ARMv5TE running at 600Mhz
- Bogomips: ~250
- Memory: 256MB DDR-SDRAM
- Disk: 250GB SATA
- Time to complete a build farm run: ~2.5 hours
- On the buildfarm since: January 2007
- Operating System: Debian/Etch 4.0
issues found by quagga:
- tcl upstream bug on ARM and MIPS
spoonbill:
- Hardware: Sun Ultra 10 Workstation
- CPU: 300Mhz UltraSPARC-IIi
- Memory: 1GB
- Disk: 40GB IDE
- Time to complete a buildfarm run: ~1.5 hours
- On the buildfarm since: at least autumn 2004
- Operating System: OpenBSD 3.9/Sparc64 (upgraded a few times though)
issues found by spoonbill:
sponge:
- Hardware: IBM RS/6000 7046-B50
- CPU: PowerPC 604e 375Mhz
- Bogomips: ~41
- Memory: 256MB
- Disk: 18GB SCSI
- Time to complete a buildfarm run: 1-1.5 hours
- On the farm since: spring 2006
- Operating System: Fedora Core 5/ppc
issues found by sponge:
While hardware of that kind is not likely to be found in any serious or performance critical production use (at least I hope so!) this summary clearly shows the importance of the buildfarm as well as the value of having not-so-mainstream boxes there :-)
I would be interested in getting details on other weird boxes people have on the farm or are planning to add ...
Monday, April 30. 2007
The discussion on using SAN vs. DASD based storage is nearly a religious war (as can be seen in a lot of discussions on pgsql-performance) and in many ways similar to the infamous emacs vs. vi debate.
From personal experience I have found the IBM DS4300 and IBM DS4300 Turbo (basically the same as the DS4300 but with more memory/cache and a hefty markup in price) quite a reliable and basically maintenance free solution.
However - for some workloads those types of SAN are not really that appropriate. A DS4300 (which is now withdrawn from marketing) can do only a bit above 100MB/s of sequential IO (nearly independent of the number of disks!) per controller (about 135MB/s if both are used together), which is really not much when one considers how fast modern hard drives are.
I recently got a SAN Array to play with that looks quite interesting since while expensive it still seems reasonably priced compared to what companies like IBM or others want for similar gear.
The array I got for testing is basically a non-branded LSI/Engenio 3994 with 16 2Gbit 10k 146GB FC drives and 2GB of battery backed cache per controller.
It is directly connected via two QLogic QLE2460 PCI-Express adapters to a HP DL380 G5 running CentOS 5 for testing.
The first impression of the array is a solid one - it looks very familiar for people that are used to the IBM DS4000 storage line and the Management GUI is basically the same (with an Engenio logo in place of the IBM one).
The controller chassis can hold 16 disks (up from 14 in the older designs) in 3U. The available expansion enclosures have the same capacity and dimensions (up to 6 are supported) and can be added online (untested!) without disruption to ongoing IO.
Due to the use of disks that are only capable of 2Gbit/s, the speed of the two drive channel loops is also limited to 2Gbit/s (using 4GBit FC drives it can be configured to use 4Gbit/s on the drive channels).
The following is not meant as a thorough benchmark of either the array or PostgreSQL but rather some ad-hoc testing and playing around to get some impression of the overall performance characteristics of the device. All tests are done using ext3 in the default journaling mode (I'm fully aware of the fact that other file systems - especially XFS - might provide noticeably better streaming performance, but I have a much higher level of trust in ext3 and that's my choice in production environments).
In the following test (test case 1) we use two volume groups - each a RAID10 (8 disks) - combined with a software RAID0 in the OS, and with write cache mirroring between the controllers enabled (this keeps both controller caches in sync so in case one controller fails the other one can take over). To utilize both controllers the HBAs are set up so that controller A is using one and controller B the other.
Version 1.03 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
convm004 16G 51121 98 188347 76 97426 28 58961 98 378240 38 732.5 2
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
512 19599 94 257598 100 8272 27 19270 91 331205 99 4879 17
convm004,16G,51121,98,188347,76,97426,28,58961,98,378240,38,732.5,2,512,19599,94,257598,100,8272,27,19270,91,331205,99,4879,17
and the same with write mirroring disabled for both logical volumes (test case 2):
Version 1.03 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
convm004 16G 51372 99 235020 96 122183 35 58880 98 369848 37 723.0 1
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
512 19888 95 256732 99 10037 32 19704 93 332286 99 5541 19
convm004,16G,51372,99,235020,96,122183,35,58880,98,369848,37,723.0,1,512,19888,95,256732,99,10037,32,19704,93,332286,99,5541,19
so write cache mirroring seems to carry a penalty of about 20% for sequential writes and rewriting but does not have much impact on the other tests - so it might be worth keeping it turned on due to the additional data integrity guarantees it provides.
It further seems that the device is bottlenecked by the speed of the drive channels (there are two loops in the array head, with half the drives on one and the other half on the other) due to the 2Gbit disks.
But it also shows that the device seems to scale fairly well - until it hits the bandwidth limit - at least for RAID10.
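The percentages quoted above can be re-derived from the two bonnie++ CSV summary lines of test case 1 and test case 2. The following is only a small sketch in Python: it assumes the bonnie++ 1.03 CSV column order (machine, size, then value/%CPU pairs for per-char output, block output, rewrite, per-char input, block input and seeks), and the helper names are made up for illustration.

```python
# Compare two bonnie++ 1.03 CSV summary lines (copied from test case 1 and 2).
# The column positions below are an assumption based on the 1.03 CSV layout.
COLUMNS = {"block_write": 4, "rewrite": 6, "block_read": 10}  # K/sec fields

MIRRORED = ("convm004,16G,51121,98,188347,76,97426,28,58961,98,378240,38,"
            "732.5,2,512,19599,94,257598,100,8272,27,19270,91,331205,99,4879,17")
UNMIRRORED = ("convm004,16G,51372,99,235020,96,122183,35,58880,98,369848,37,"
              "723.0,1,512,19888,95,256732,99,10037,32,19704,93,332286,99,5541,19")


def parse_line(line):
    """Pick the sequential throughput fields (K/sec) out of one CSV line."""
    fields = line.strip().split(",")
    return {name: float(fields[idx]) for name, idx in COLUMNS.items()}


def penalty_percent(slower, faster):
    """Percentage by which `slower` falls short of `faster`, per metric."""
    return {name: (1.0 - slower[name] / faster[name]) * 100.0 for name in slower}


penalty = penalty_percent(parse_line(MIRRORED), parse_line(UNMIRRORED))
for name, pct in penalty.items():
    # A negative value means the mirrored run was actually faster.
    print(f"{name}: {pct:.1f}% slower with write cache mirroring enabled")
```

Block writes and rewrites come out at roughly a fifth slower with mirroring enabled, while block reads are within a few percent of each other, which is about what one would expect from run-to-run noise.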
and now for comparison a test using only one volume group and a single controller (test case 3):
Version 1.03 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
convm004 16G 51346 98 134822 56 69414 17 58651 97 251779 23 758.8 1
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
512 20048 91 259694 99 5846 18 18707 84 338671 99 2935 9
convm004,16G,51346,98,134822,56,69414,17,58651,97,251779,23,758.8,1,512,20048,91,259694,99,5846,18,18707,84,338671,99,2935,9
so let's see what PostgreSQL is able to do in terms of sequential IO on such a device:
bench=# select version();
version
--------------------------------------------------------------------------------------------------------------
PostgreSQL 8.3devel on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-52)
(1 row)
bench=#
simple sequential scan on a large table (pgbench schema generated with a scale of 10000) using only a single controller (same setup as in test case 3):
bench=# select count(1) from accounts;
count
------------
1000000000
(1 row)
Time: 619865.939 ms
bench=# select pg_relation_size('accounts')/619::float;
?column?
------------------
216998258.558966
(1 row)
so we are getting about 215MB/s out of 250MB/s which looks ok.
so what happens with software raid 0 over two 8 disk RAID10 volume groups on different controllers (same setup as test case 1):
bench=# select count(1) from accounts;
count
------------
1000000000
(1 row)
Time: 478785.617 ms
bench=# select pg_relation_size('accounts')/478::float;
?column?
------------------
281008205.121339
(1 row)
Time: 265.791 ms
so that is more interesting - it seems that PostgreSQL is getting CPU bottlenecked here (the array/file system can do >370MB/s) and those ~280MB/s are pretty much in line with what Luke usually quotes (PostgreSQL getting CPU bottlenecked at around 300MB/s even on very fast AMD Opteron based boxes).
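The two throughput figures above are simply the relation size divided by the scan time. As a rough sketch in Python (the relation size is not printed in the post; the constant below is back-computed from the two query results, which agree with each other, so treat it as an assumption):

```python
# Sequential scan throughput = relation size / elapsed time.
# RELATION_BYTES is derived from the post's output
# (216998258.558966 * 619 and 281008205.121339 * 478 both give this value).
RELATION_BYTES = 134_321_922_048


def throughput_mb_s(size_bytes, elapsed_ms):
    """Throughput in MB/s (10^6 bytes per second) for a scan of `size_bytes`."""
    return size_bytes / (elapsed_ms / 1000.0) / 1e6


single = throughput_mb_s(RELATION_BYTES, 619865.939)   # one controller
striped = throughput_mb_s(RELATION_BYTES, 478785.617)  # software RAID0, both controllers
speedup = striped / single

print(f"single controller: {single:.1f} MB/s")
print(f"both controllers:  {striped:.1f} MB/s")
print(f"speedup:           {speedup:.2f}x")
```

Using the full timings instead of the truncated 619 and 478 seconds changes the results only marginally, so the conclusion stays the same: adding the second controller buys roughly 30% on this query, well short of what the array itself can deliver.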
for those curious here are some other random tests (uncommented so judge by yourself):
single raid 5 with 4 logical volumes (each 500GB) and software RAID0 in the OS - two volumes per channel
Version 1.03 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
convm004 16G 50978 99 208098 82 89274 25 59058 98 236993 24 488.5 1
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
512 20860 94 258368 99 7770 25 21087 95 335918 100 4291 14
convm004,16G,50978,99,208098,82,89274,25,59058,98,236993,24,488.5,1,512,20860,94,258368,99,7770,25,21087,95,335918,100,4291,14
A single RAID5 array over all 16 disks and two identically sized logical volumes each around 1TB in size.
bonnie++:
on one LUN:
Version 1.03 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
convm004 16G 51245 98 121190 49 69406 17 56902 94 256111 22 840.9 1
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
512 20781 94 235507 91 7233 23 18685 84 338017 99 4125 14
using both LUNs and software RAID0:
Version 1.03 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
convm004 16G 51423 99 204881 84 83740 23 59040 98 232573 23 554.7 1
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
512 20303 93 259481 99 7230 23 20357 93 337312 100 3793 13
convm004,16G,51423,99,204881,84,83740,23,59040,98,232573,23,554.7,1,512,20303,93,259481,99,7230,23,20357,93,337312,100,3793,13
with disabled write cache mirroring:
Version 1.03 ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
convm004 16G 51751 99 242637 97 105392 30 58859 98 235541 23 563.2 1
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
512 21034 95 255377 99 6485 21 19269 88 337095 99 4119 14
convm004,16G,51751,99,242637,97,105392,30,58859,98,235541,23,563.2,1,512,21034,95,255377,99,6485,21,19269,88,337095,99,4119,14
Thursday, March 29. 2007
while trying to put new hardware into production today I found that the box (running Debian Etch/i386 with a 2.6.18 based kernel) would start to drop network connections during large transfers at Gigabit speeds.
Simply scping a large file over from a nearby box would result in stalled scp transfers and a large number of "TX unit hang" errors appearing in the kernel log as well as debugging output similar to:
Mar 29 17:30:05 xxx kernel: Tx Queue <0>
Mar 29 17:30:05 xxx kernel: TDH <56>
Mar 29 17:30:05 xxx kernel: TDT <57>
Mar 29 17:30:05 xxx kernel: next_to_use <57>
Mar 29 17:30:05 xxx kernel: next_to_clean <56>
Mar 29 17:30:05 xxx kernel: buffer_info[next_to_clean]
Mar 29 17:30:05 xxx kernel: time_stamp <1eed17>
Mar 29 17:30:05 xxx kernel: next_to_watch <56>
Mar 29 17:30:05 xxx kernel: jiffies <1eefd1>
Mar 29 17:30:05 xxx kernel: next_to_watch.status <0>
Mar 29 17:30:07 xxx kernel: Tx Queue <0>
Mar 29 17:30:07 xxx kernel: TDH <56>
Mar 29 17:30:07 xxx kernel: TDT <57>
Mar 29 17:30:07 xxx kernel: next_to_use <57>
Mar 29 17:30:07 xxx kernel: next_to_clean <56>
Mar 29 17:30:07 xxx kernel: buffer_info[next_to_clean]
The nic in question is an embedded Intel 82573E/L on a Supermicro PDSM4+ with the latest BIOS-Update available (1.2):
0d:00.0 Ethernet controller: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03)
0e:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
A fair bit of research turned up this small shell script. The script basically greps the output of ethtool -e interface (which in itself dumps the EEPROM contents) and flips a bit in the EEPROM:
~# sh fixeep-82573-dspd.sh eth0
eth0: is a "82573E Gigabit Ethernet Controller"
This fixup is applicable to your hardware
executing command: ethtool -E eth0 magic 0x108c8086 offset 0x1e value 0xdf
Change made. You *MUST* reboot your machine before changes take effect!
after the reboot the nic just works fine - no more stalls and transmit timeouts ...
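To illustrate what such a fixup does, here is a minimal sketch in Python. It is not the original script: the sample dump is made up, and the assumption that the relevant bit is the lowest bit of the byte at offset 0x1e is only inferred from the 0xdf value in the output above. It parses an ethtool -e style hex dump and builds (but does not run) the matching ethtool -E command.

```python
# Sketch only: parse an `ethtool -e`-style hex dump and build the write command.
# SAMPLE_DUMP is hypothetical data; offset, mask and magic mirror the values
# visible in the output above, the bit position itself is an assumption.
SAMPLE_DUMP = """Offset\t\tValues
------\t\t------
0x0000\t\t00 30 48 00 00 00 ff ff ff ff ff ff ff ff ff ff
0x0010\t\tff ff ff ff ff ff ff ff ff ff ff ff ff ff de ff
"""


def parse_eeprom_dump(text):
    """Return a dict mapping byte offset to value for every dumped byte."""
    data = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].lower().startswith("0x"):
            continue  # skip the header and separator lines
        base = int(fields[0].rstrip(":"), 16)
        for i, byte in enumerate(fields[1:]):
            data[base + i] = int(byte, 16)
    return data


def fixup_command(iface, dump_text, offset=0x1E, mask=0x01, magic=0x108C8086):
    """Build the ethtool -E argument list, or None if the bit is already set."""
    current = parse_eeprom_dump(dump_text)[offset]
    if current & mask:
        return None
    return ["ethtool", "-E", iface, "magic", hex(magic),
            "offset", hex(offset), "value", hex(current | mask)]


command = fixup_command("eth0", SAMPLE_DUMP)
print(" ".join(command) if command else "nothing to do")
```

Writing to a NIC EEPROM is not something to do blindly, so the sketch deliberately only prints the command instead of executing it.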
Thursday, March 1. 2007
System monitoring is both an art and a pain. It is nice to have pretty graphs that show what's going on with a server or a service, as well as something that does proper notification of current or potential issues, but on the other side there is also a lot of pain and (boring) work involved in getting this up and running in a proper way.
I'm quite a fan of doing proper and detailed monitoring of systems - and after the latest issues with tribble I took a stab at improving the monitoring of that box but - well tribble is running FreeBSD and doing hardware related monitoring (vs. checking for things in the OS) is often more difficult there for various reasons.
The first thing I wanted to get monitored is the hardware itself - modern servers usually carry some sort of BMC (Baseboard Management Controller) or some even more sophisticated solution (RSAII, iLO - just to name a few) that is basically a small independent computer on the mainboard.
Accessing the data those BMCs can provide is often done through complex, binary-only drivers available only for Microsoft Windows and a limited number of commercially supported Linux distributions (and some of them are even bloated Java based GUI things) - however in the last few years a standards-based solution to that kind of task has appeared - Intelligent Platform Management Interface (IPMI).
IPMI provides a standardized interface to manage and monitor servers even in the absence(!) of an operating system - it is a cool idea though in practice it bears a lot of similarity to ACPI in the sense that every vendor is implementing it a bit differently and especially early implementations are buggy as hell.
Luckily for us tribble is running FreeBSD 6.2, which is the first FreeBSD release to support ipmi(4) despite the fact that the man page claims it got added in 7.0 ...
For integration into the postgresql.org monitoring infrastructure I hacked up a small nagios check script which is simply calling ipmitool and looking for interesting output.
Sample output for tribble of that script looks like:
[stefan@tribble ~]$ sudo /usr/local/libexec/nagios/check_ipmi
OK - IPMI: (Ambient_Temp = 23 degrees C, CPU_1_Temp = 34 degrees C, CPU_2_Temp = 34 degrees
C, DASD_Temp = 31 degrees C, Fan_10_Presence = 0x02, Fan_10_Tach = 1830 RPM, Fan_11_Presence
= 0x02, Fan_11_Tach = 1800 RPM, Fan_12_Presence = 0x02, Fan_12_Tach = 1740 RPM,
Fan_1_Presence = 0x02, Fan_1_Tach = 1710 RPM, Fan_2_Presence = 0x02, Fan_2_Tach = 1650 RPM,
Fan_3_Presence = 0x02, Fan_3_Tach = 1830 RPM, Fan_4_Presence = 0x02, Fan_4_Tach = 1830 RPM,
Fan_5_Presence = 0x02, Fan_5_Tach = 1890 RPM, Fan_6_Presence = 0x02, Fan_6_Tach = 1680 RPM,
Fan_7_Presence = 0x02, Fan_7_Tach = 1680 RPM, Fan_8_Presence = 0x02, Fan_8_Tach = 1680 RPM,
Fan_9_Presence = 0x02, Fan_9_Tach = 1800 RPM, PS_1_Fan_Fault = 0x01, PS_1_Status = 0x01,
PS_2_Fan_Fault = 0x01, PS_2_Status = 0x01)
which is a bit verbose but I will work on that later ;-)
The script will also check the System Event Log (SEL) - which is basically a small NVRAM backed memory on the BMC holding all kinds of hardware monitoring events - for entries (in this case there are none) and will return a warning if it finds something.
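The general shape of such a check can be sketched as follows. This is not the actual check_ipmi script, just a minimal Python illustration under some assumptions: sensor data comes from `ipmitool sdr` in its usual "name | reading | status" layout, the event log comes from `ipmitool sel list`, the status handling is simplified, and the sample data is made up.

```python
import subprocess

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical sample in the "name | reading | status" layout of `ipmitool sdr`.
SAMPLE_SDR = """Ambient Temp     | 23 degrees C      | ok
CPU 1 Temp       | 34 degrees C      | ok
Fan 1 Tach       | 1710 RPM          | ok
"""


def parse_sdr(text):
    """Return (name, reading, status) tuples, with spaces in names replaced."""
    sensors = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) >= 3:
            sensors.append((parts[0].replace(" ", "_"), parts[1], parts[2]))
    return sensors


def sel_entries(text):
    """Return non-empty SEL lines, ignoring the 'no entries' notice."""
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    return [l for l in lines if "no entries" not in l.lower()]


def evaluate(sdr_text, sel_text):
    """Map sensor and SEL output to a (return code, message) pair."""
    sensors = parse_sdr(sdr_text)
    if not sensors:
        return UNKNOWN, "UNKNOWN - IPMI: no sensor readings"
    summary = ", ".join(f"{name} = {value}" for name, value, _ in sensors)
    # Simplification: anything that is not "ok" or "ns" (no reading) is critical.
    failed = [name for name, _, status in sensors if status not in ("ok", "ns")]
    if failed:
        return CRITICAL, f"CRITICAL - IPMI: {', '.join(failed)} ({summary})"
    events = sel_entries(sel_text)
    if events:
        return WARNING, f"WARNING - IPMI: {len(events)} SEL entries ({summary})"
    return OK, f"OK - IPMI: ({summary})"


def run_check():
    """Query the local BMC; needs ipmitool installed and usually root."""
    sdr = subprocess.run(["ipmitool", "sdr"], capture_output=True,
                         text=True, check=True).stdout
    sel = subprocess.run(["ipmitool", "sel", "list"], capture_output=True,
                         text=True, check=True).stdout
    return evaluate(sdr, sel)


# Demonstration with the sample data instead of a real BMC.
print(evaluate(SAMPLE_SDR, "SEL has no entries")[1])
```

A real plugin would print the message returned by run_check() and exit with the matching return code; keeping the parsing separate from the ipmitool calls makes it possible to test the logic on a machine without a BMC.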
Ok now that we had the basic hardware covered only one major thing is left - the monitoring of the integrated IBM ServeRAID 7k adapter which has two arrays (a 2 disk RAID 1 for the OS and related data and a 4 disk RAID 10 for the VMs).
Monitoring hardware RAID is a delicate thing on most BSDs (though OpenBSD made some promising progress on that front lately) - the lack of vendor support often results in only rudimentary drivers at best and useful tools to check the array status or even initiate rebuilds are often simply not available.
A bit of research turned the following post on the freebsd-scsi mailing list up.
Once compiled this tool indeed gives basic information about the status of ips(4) based raid controllers on FreeBSD - wrapping it once again into a nagios compatible check script results in:
[stefan@tribble ~]$ sudo /usr/local/libexec/nagios/check_raid
OK: /dev/ips0 - Volume: 0, ArrayState: OK; Volume: 1, ArrayState: OK;
so at the end of the day we have nice hardware monitoring for at least one of the project's servers - but there is still a lot to do in the future ...
Sunday, February 18. 2007
I have been playing with a LSILogic MegaRAID 8480E lately (with 24 disks in two IBM EXP3000 Enclosures attached).
Those who have used SCSI based products from said company before might know that there was a useful little curses based tool on Linux for managing arrays (I think even Dell shipped it for their LSI based PERCs).
But hey - times have changed and SAS is the new hip thing now and new technology requires new tools ...
So what did it get replaced with? Well, there is the inevitable Java based GUI monster called MegaRAID Storage Manager that nobody really wants to have on a server, and then there is MegaCLI.
MegaCLI comes as a RPM containing only a single statically linked 32 bit Linux binary (since I'm running Debian here I just used alien to extract the binary).
Given that the package comes with NO documentation, the "-help" output was a bit irritating:
MegaCLI SAS RAID Management Tool Ver 1.01.09 May 25, 2006
(c)Copyright 2006, LSI Logic Corporation, All Rights Reserved.
MegaCli -v
MegaCli -help|-h|?
MegaCli -adpCount
MegaCli -AdpSetProp {CacheFlushInterval -val}|{ RebuildRate -val}
|{PatrolReadRate -val}|{BgiRate -val}|{CCRate -val}
|{ReconRate -val}|{SpinupDriveCount -val}|{SpinupDelay -val}
|{CoercionMode -val}|{ClusterEnable -val}|{PredFailPollInterval -val}
|{BatWarnDsbl -val} |{EccBucketSize -val} | {EccBucketLeakRate -val}
| AlarmEnbl | AlarmDsbl | AlarmSilence -aN|-a0,1,2|-aALL
MegaCli -AdpGetProp CacheFlushInterval | RebuildRate | PatrolReadRate | BgiRate
| CCRate | ReconRate | SpinupDriveCount | SpinupDelay | CoercionMode
| PredFailPollInterval | EccBucketSize | EccBucketLeakRate | EccBucketCount
| ClusterEnable | BatWarnDsbl | AlarmDsply -aN|-a0,1,2|-aALL
MegaCli -AdpAllInfo -aN|-a0,1,2|-aALL
MegaCli -AdpGetTime -aN|-a0,1,2|-aALL
MegaCli -AdpSetTime yyyymmdd hh:mm:ss -aN
MegaCli -AdpSetVerify -f fileName -aN|-a0,1,2|-aALL
MegaCli -AdpBIOS {-Enbl [SOE|BE]}|-Dsbl|-Dsply -aN|-a0,1,2|-aALL
MegaCli -AdpBootDrive {-Set -Lx}|-Get -aN|-a0,1,2|-aALL
MegaCli -AdpAutoRbld -Enbl|-Dsbl|-Dsply -aN|-a0,1,2|-aALL
MegaCli -AdpCacheFlush -aN|-a0,1,2|-aALL
MegaCli -AdpPR -Dsbl|EnblAuto|EnblMan|Start|Stop|Info|{SetDelay Val}
-aN|-a0,1,2|-aALL
MegaCli -FwTermLog -BBUoff|BBUoffTemp|BBUon|BBUGet|Dsply|Clear -aN|-a0,1,2|-aALL
MegaCli -AdpDiag [val] -aN|-a0,1,2|-aALL
val - Time in second.
MegaCli -AdpBatTest -aN|-a0,1,2|-aALL
MegaCli -PDList -aN|-a0,1,2|-aALL
MegaCli -PDGetNum -aN|-a0,1,2|-aALL
MegaCli -pdInfo -PhysDrv[E0:S0,E1:S1,...] -aN|-a0,1,2|-aALL
MegaCli -PDOnline -PhysDrv[E0:S0,E1:S1,...] -aN|-a0,1,2|-aALL
MegaCli -PDOffline -PhysDrv[E0:S0,E1:S1,...] -aN|-a0,1,2|-aALL
MegaCli -PDMakeGood -PhysDrv[E0:S0,E1:S1,...] -aN|-a0,1,2|-aALL
MegaCli -PDHSP {-Set [-Dedicated [-ArrayN|-Array0,1,2...]] [-EnclAffinity] [-nonRevertible]}
|-Rmv -PhysDrv[E0:S0,E1:S1,...] -aN|-a0,1,2|-aALL
MegaCli -PDRbld -Start|-Stop|-ShowProg |-ProgDsply
-PhysDrv [E0:S0,E1:S1,...] -aN|-a0,1,2|-aALL
MegaCli -PDClear -Start|-Stop|-ShowProg |-ProgDsply
-PhysDrv [E0:S0,E1:S1,...] -aN|-a0,1,2|-aALL
MegaCli -PdLocate {[-start] | -stop} -physdrv[E0:S0,E1:S1,...] -aN|-a0,1,2|-aALL
MegaCli -PdMarkMissing -physdrv[E0:S0,E1:S1,...] -aN|-a0,1,2|-aALL
MegaCli -PdGetMissing -aN|-a0,1,2|-aALL
MegaCli -PdReplaceMissing -physdrv[E0:S0] -arrayA, -rowB -aN
MegaCli -PdPrpRmv [-UnDo] -physdrv[E0:S0] -aN|-a0,1,2|-aALL
MegaCli -EncInfo -aN|-a0,1,2|-aALL
MegaCli -PhyInfo -phyM -aN|-a0,1,2|-aALL
MegaCli -LDInfo -Lx|-L0,1,2|-Lall -aN|-a0,1,2|-aALL
MegaCli -LDSetProp {-Name LdNamestring} | -RW|RO|Blocked | WT|WB|RA|NORA|ADRA
| Cached|Direct | -EnDskCache|DisDskCache -Lx|-L0,1,2|-Lall -aN|-a0,1,2|-aALL
MegaCli -LDGetProp -Cache | -Access | -Name | -DskCache -Lx|-L0,1,2|-LALL
-aN|-a0,1,2|-aALL
MegaCli -LDInit {-Start [-full]}|-Abort|-ShowProg|-ProgDsply -Lx|-L0,1,2|-LALL -aN|-a0,1,2|-aALL
MegaCli -LDCC -Start|-Abort|-ShowProg|-ProgDsply -Lx|-L0,1,2|-LALL -aN|-a0,1,2|-aALL
MegaCli -LDBI -Enbl|-Dsbl|-getSetting|-Abort|-ShowProg|-ProgDsply -Lx|-L0,1,2|-LALL -aN|-a0,1,2|-aALL
MegaCli -LDRecon {-Start -rX [{-Add | -Rmv} -Physdrv[E0:S0,...]]}|-ShowProg|-ProgDsply
-Lx -aN
MegaCli -LdPdInfo -aN|-a0,1,2|-aALL
MegaCli -LDGetNum -aN|-a0,1,2|-aALL
MegaCli -CfgLdAdd -rX[E0:S0,E1:S1,...] [WT|WB] [NORA|RA|ADRA] [Direct|Cached]
[-szXXX [-szYYY ...]] [-strpszM] [-Hsp[E0:S0,...]] [-AfterLdX] -aN
MegaCli -CfgEachDskRaid0 [WT|WB] [NORA|RA|ADRA] [Direct|Cached][-strpszM] -aN|-a0,1,2|-aALL
MegaCli -CfgClr -aN|-a0,1,2|-aALL
MegaCli -CfgDsply -aN|-a0,1,2|-aALL
MegaCli -CfgLdDel -LX|-L0,2,5...|-LALL -aN|-a0,1,2|-aALL
MegaCli -CfgFreeSpaceinfo -aN|-a0,1,2|-aALL
MegaCli -CfgSpanAdd -r10 -Array0[E0:S0,E1:S1] -Array1[E0:S0,E1:S1] [-ArrayX[E0:S0,E1:S1] ...] -aN
MegaCli -CfgSpanAdd -r50 -Array0[E0:S0,E1:S1,E2:S2,...] -Array1[E0:S0,E1:S1,E2:S2,...]
[-ArrayX[E0:S0,E1:S1,E2:S2,...] ...] [WT|WB] [NORA|RA|ADRA] [Direct|Cached]
[-strpszM] -aN
MegaCli -CfgSave -f filename -aN
MegaCli -CfgRestore -f filename -aN
MegaCli -CfgForeign -Scan -aN|-a0,1,2|-aALL
MegaCli -CfgForeign -Dsply [x] -aN|-a0,1,2|-aALL
MegaCli -CfgForeign -Preview [x] -aN|-a0,1,2|-aALL
MegaCli -CfgForeign -Import [x] -aN|-a0,1,2|-aALL
MegaCli -CfgForeign -Clear [x] -aN|-a0,1,2|-aALL
x - index of foreign configurations. Optional. All by default.
MegaCli -AdpEventLog -GetEventLogInfo -aN|-a0,1,2|-aALL
MegaCli -AdpEventLog -GetEvents -f <fileName> -aN|-a0,1,2|-aALL
MegaCli -AdpEventLog -GetSinceShutdown -f <fileName> -aN|-a0,1,2|-aALL
MegaCli -AdpEventLog -GetSinceReboot -f <fileName> -aN|-a0,1,2|-aALL
MegaCli -AdpEventLog -IncludeDeleted -f <fileName> -aN|-a0,1,2|-aALL
MegaCli -AdpEventLog -GetLatest n -f <fileName> -aN|-a0,1,2|-aALL
MegaCli -AdpEventLog -Clear -aN|-a0,1,2|-aALL
MegaCli -AdpBbuCmd -aN|-a0,1,2|-aALL
MegaCli -AdpBbuCmd -GetBbuStatus -aN|-a0,1,2|-aALL
MegaCli -AdpBbuCmd -GetBbuCapacityInfo -aN|-a0,1,2|-aALL
MegaCli -AdpBbuCmd -GetBbuDesignInfo -aN|-a0,1,2|-aALL
MegaCli -AdpBbuCmd -GetBbuProperties -aN|-a0,1,2|-aALL
MegaCli -AdpBbuCmd -BbuLearn -aN|-a0,1,2|-aALL
MegaCli -AdpBbuCmd -BbuMfgSleep -aN|-a0,1,2|-aALL
MegaCli -AdpBbuCmd -BbuMfgSeal -aN|-a0,1,2|-aALL
MegaCli -AdpBbuCmd -SetBbuProperties -f <fileName> -aN|-a0,1,2|-aALL
MegaCli -AdpFacDefSet -aN
MegaCli -AdpFwFlash -f filename [-NoSigChk] [-NoVerChk] -aN|-a0,1,2|-aALL
Somebody who knows a bit about the underlying technology and has worked with previous SCSI based LSI products can actually guess most of the things - but that's not what one should have to do with enterprise class RAID hardware that is often used to protect valuable data.
LSI really needs to look into bundling proper docs with this tool because "let's guess what this cryptic switch means" is NOT appropriate at all.
Oh - by the way there is a README linked on the lsilogic website for that tool - but guess what? Except for a bit of revision history it only contains the very same output I showed above ...
Friday, February 16. 2007
This is actually pretty easy to do but the information on how to do it is pretty well hidden on the IBM website.
The first thing one needs is ASU, IBM's Advanced Settings Utility. The other component needed is the Remote Supervisor Adapter II USB Daemon.
After compiling and loading the driver (though not officially supported it works fine on Debian Etch/amd64) one can simply:
[root@somewhere ~]# ./asu resetrsa
Resetting RSA/RSA2..........done