Lourens Naudé has been working professionally with Rails for over a year in the travel and hospitality industry.
He has deep exerpience from the perspective of making servers perform at their peek (whether its Rails, PHP or other languages).
Note: Original question text in blockquotes, Lourens’ response inline. Minor editing/linking for clarity.
For ActiveRecord object pulls, would you mind rating the following (1-3/X, in order of fastest to slowest) in terms of mean response time?
I’m assuming storing a marshaled AR object. Marshaling overhead is the same, regardless of backend storage (except with no AR object caching, just the overhead of converting a record to an AR instance).
* Filestore (top of the line hard-drives, SCSI or SATA, whatever’s faster these days =)
1 ( fastest ) : All kernels version 2.6.x is compiled with tmpfs support ( think RAM DISK, with size limits ), which allows you to create a filesystem within memory. Downside being the contents is discarded when the tmpfs filesystem is unmounted. This is fine for temporary data such as caches. Disk speed (SCSI, SATA) isn’t a factor at all. See this page for further details. I have not had the time to play with this just yet, but I think this is a heavily overlooked area for performance gains.
A directory structure as per this post (eg. 0000/0317) should yield good performance. Major downside being Capistrano tasks to create/mount the tmpfs partition and a custom ruby storage module for working with this.
* MySQL database (again 2-3GB of memory allocated to it for a 20GB+ database, usually maxing out memory usage as is)
2 (decent) : No Marshaling.There’s a ton of variables here, but I’ll run through the most important (MySQL only, assuming dominant InnoDB table space, read (SELECT) performance only) :
STATUS VARIABLES
- Handler_read_key : You’re indexes is pretty good if this is high, however …
- Handler_read_next : If this is proportionally much higher than Handler_read_key, you’re doing too many table scans.Remember, MySQL only uses one index per query per table …. perhaps an index spanning more than 1 column would yield improvements if more than one key is often referenced in a query?Has many polymorphic associations with STI. Model -> has_many :assets, :as => attachable, where different types of assets is supported.Table structure (id | item | type | attachable_id | attachable_type).Have one or two indexes only eg. one index for type, attachable_id and attachable_type (see slow query log below).
- Handler_read_rnd : If very high, too many table scans ( via joins ), optimize your queries (see slow query log below).
- Handler_read_rnd_next : If very high, too many table scans ( via joins ), optimize your queries (see slow query log below).
- Created_tmp_disk_tables; If this is high, increase tmp_table_size.
- Key_reads : Increase key_buffer_size if this is very high.
FINDING THE OFFENDERS
Use the slow query log.
log-slow-queries = /var/log/mysql-slow.log
long_query_time = 2
log-queries-not-using-indexes
Let it run for at least 2 days, then use the mysqldumpslow utility to aggregate slow query counts and results.
mysqldumpslow
* Memcached - 2GB of memory dedicated to memcache 3 (slowest) : The connection and handshake overhead of Memcached used only from a single host ( non-distributed ) is higher than an already open DB thread dedicated to a Mongrel/FCGI process. In a multiple app server cluster environment, each with an amount of memory dedicated to Memcached, yes, you will see significant gains. For cached HTML fragments, would you mind rating the following (1-3/X, in order of fastest to slowest) in terms of mean response time? * Filestore (top of the line hard-drives, SCSI or SATA, whatever’s faster these days =) 1 (fastest) tmpfs, as above. * Memcached - 2GB of memory dedicated to memcache 2 (decent) : See handshake problem and single host issue above. * Storing fragments as rows in a MySQL database (2-3GB of memory allocated for a 20GB+ database, usually maxing out memory usage as is) 3 (slowest) : Isn’t the whole point of cache fragments to eliminate rendering AND DB access time? —————- Lourens Naudé is employed at Madeira Online (madeiraapartments.com and madeiraproperty.com) and has been spending most of the last year perfecting a hosted Hospitality and Vehicle Rental app for the SME travel sector. Coming soon.


Good information Lourens! We definitely took some different routes in our analysis.
You know, ActiveRecord, as great as it is, can’t be tweaked much. In our experience, if you need intensive DB work on the backend, don’t expect AR to come to the rescue. Sure, it’s great for the front end, but is painfully slow on the backend.
I recently tweaked a project I was working on by writing a plain ol’e C app to do backend processing. I was able to take a 13 second SELECT (complicated join) with AR and turn it into a 1 sec tight loop with C. Got the same work done.
Sure it was more lines of code, but if you need intensive backend processing, AR isn’t up to the task, yet.
I’ll be posting some more detailed benchmarks on my site soon.
Robert,
I’d love to see how you implemented the C version.
- Shanti
We used the regular mysql.h code.
We just did mysql queries. I tried it first with MySQL++, but the speed of that isn’t much better than AR.
I would share the code but it is a closed source project.
Oliver,
I think AR slowness is inherited by the language, more specific Garbage Collection.Depending on the size of the result set, there’s a good chance Garbage Collection could be triggered after conversion of raw result to AR objects.
http://beaver.net/slides/ruby/ruby-gc-slides.pdf
http://seattlerb.rubyforge.org/mem_inspect/
Complex joins is catch 22 ground.Object caching (memcached) just unmarshals the AR object and is unlike to trigger GC as often, but is only well suited for single row queries or the odd couple of rows.Fetching via AR would likely trigger GC in a scenario where one Mongrel/FCGI processes the same db intensive (multiple joins etc. ) action at regular intervals.
The performance difference between Model.find_by_sql ( conversion to AR object ) and Model.connection.execute ( raw MySQL result ) is also significant.
Applying http://railsexpress.de/blog/articles/2006/10/05/make-ruby-mysql-create-less-garbage to latest ruby-mysql tarball ( requires MySQL devel libs >= 5.0.27 ) would also cut down on large JOIN retrieval.
But of course, none of the above would yield native C performance