Tuesday, June 16, 2009
As promised in the previous post, this is the second part in a series on testing/benchmarking 8.4 under various circumstances.
Bulk loading requirements can appear in various ways: people might want to load CSV-style data from external sources, populate development databases with vast amounts of test data to get an idea of how their application scales, or simply restore their database after losing a server.
The testbed here is the same as the one mentioned in the first chapter, so I won't spend much time repeating that information. All of the following benchmarks use COPY, which is the PostgreSQL-specific way to load large amounts of data. Given that COPY (or \copy in psql) is not always a feasible solution, I might do some further testing using multi-value INSERTs and transactional batching in a later post. All the tables used here are NOT indexed - testing this is planned for a future post as well.
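For reference, the two COPY variants look like this (the file name is made up for illustration):

COPY lineitem FROM '/tmp/lineitem.tbl' WITH DELIMITER '|';
\copy lineitem from '/tmp/lineitem.tbl' with delimiter '|'

The first form runs entirely inside the server (so the file has to be readable by the backend process), while the second one streams the file from the psql client.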
The data loaded here is what is generated by the dbgen utility from Mark Wong's DBT-3 project. The file containing the test data holds ~60M rows and is used multiple times (coming out of the OS buffer cache). This means that in the case of, say, 16 processes a total of 60000000*16=960000000 rows got loaded, so the larger the process count, the more data PostgreSQL had to deal with in total.
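For anybody who wants to generate a similar file: the dbgen shipped with DBT-3 is the TPC-H one, and it can produce just the lineitem data; scale factor 10 yields roughly the 60M rows used here (flag spelling as in the reference dbgen - check dbgen -h for your build):

./dbgen -s 10 -T L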
The schema for that table is:
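lineitem is the standard table from the TPC-H schema, so modulo minor type differences in the DBT-3 variant it comes down to:

create table lineitem (
    l_orderkey bigint not null,
    l_partkey integer not null,
    l_suppkey integer not null,
    l_linenumber integer not null,
    l_quantity numeric(15,2) not null,
    l_extendedprice numeric(15,2) not null,
    l_discount numeric(15,2) not null,
    l_tax numeric(15,2) not null,
    l_returnflag char(1) not null,
    l_linestatus char(1) not null,
    l_shipdate date not null,
    l_commitdate date not null,
    l_receiptdate date not null,
    l_shipinstruct char(25) not null,
    l_shipmode char(10) not null,
    l_comment varchar(44) not null
);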
And for those who are curious, some sample data:
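dbgen simply emits pipe-delimited lines; the well-known first lineitem row of the scale factor 1 data set, for example, looks like this:

1|155190|7706|1|17|21168.23|0.04|0.02|N|O|1996-03-13|1996-02-12|1996-03-22|DELIVER IN PERSON|TRUCK|egular courts above the|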
The following graph shows three different benchmark runs, each done at connection counts of 1, 2, 4, 8, 12 and 16. The one labeled "single table" was created by loading the data concurrently into the same table. The one labeled "multiple tables" loads into one separate table per connection (i.e. process one loads into table "lineitem1", the second one into "lineitem2", and so on). The last one does the same as the multiple-tables test, except that it makes use of the fact that recent versions of PostgreSQL can bypass writing to the WAL altogether under certain circumstances. In this test that was done by wrapping the COPY in a "BEGIN; TRUNCATE lineitem1; COPY FROM ... COMMIT;".
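Spelled out, each connection in that third run executes something like the following (file name made up again; note that this optimization only applies while WAL archiving is turned off, because with archiving enabled the WAL data is needed anyway):

BEGIN;
TRUNCATE lineitem1;
-- lineitem1 was truncated in this very transaction, so COPY can skip the WAL
COPY lineitem1 FROM '/tmp/lineitem.tbl' WITH DELIMITER '|';
COMMIT;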
The first thing this graph tells us is that the WAL overhead is HUGE. It also shows that there is not that much of a difference between loading into the same table and loading into separate ones.
On closer look we can see that the scaling problem for concurrent COPY with WAL logging already starts with the second concurrent process inserting data. In the following graph I plotted the theoretical scaling (based on multiplying the single-connection performance by the number of available cores/threads):
The main scalability issue here seems to be internal locking in PostgreSQL - namely WALInsertLock (taken in XLogInsert), which shows up on top in a lot of other update/insert-heavy workloads as well. The second process only results in a 50% performance increase, and with 8 cores we reach the performance peak at a bit better than 3x the single-core performance. At higher connection counts performance degrades again, with the server showing around 70% idle time and abysmal performance at 16 connections.
The profile for the server at 8 connections looks fairly similar to the following:
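(For those who want to capture a comparable profile themselves, oprofile does the job, roughly along these lines - adjust the path of the postgres binary to your installation:)

opcontrol --no-vmlinux
opcontrol --start
# ... run the load ...
opcontrol --shutdown
opreport -l /usr/local/pgsql/bin/postgres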
As one can see in the first graph, PostgreSQL is able to avoid writing to the WAL under certain circumstances (the table has to be created or truncated in the same transaction, and WAL archiving must be off), resulting in a significant performance increase.
This graph shows much better scaling behavior - near-perfect scaling up to 4 cores and an 8x speedup at 16 connections, without the degradation at higher connection counts seen in the former tests. This is also what the new pg_restore in 8.4 will do in most circumstances. With 16 connections the profile during that test looks like:
At this load rate the server is really busy, with almost no idle time left, but still only shows iowait in the single-digit percentage range. It almost seems that there is still opportunity left to optimize here :)
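As a closing pointer: the parallel restore mentioned above is driven by pg_restore's new -j option and requires a custom-format dump, along these lines (database and file names made up):

pg_dump -Fc mydb > mydb.dump
pg_restore -j 16 -d mydb mydb.dump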