Obvious trick to reduce amount of HABTM relationship tables

by syegorov
Trick howto reduce amount of HABTM tables in complex application

Intro

Here is some trick I would like to share to you

Assume that your application already has 20 models. Now you want to add some extra model which HABTM to all others (ex: tag). According to manual that gives to you +1 table for tag and +20 tables for models relationship. And now you want to add extra model which will HABTM all others one more time :) (ex: comment all feature). It gives to you +1 table for model and +21 for relationship.

Trick

Use common sequence for id fields of models to which you would like to reduce amount of HABTM relationship tables. On MySQL where you wont find sequence - play with initial autoincrement value.
For sure do not forget to define HABTM in models accordingly

Example

Original DB:

CREATE TABLE `blobsters` (
  `id` int(10) NOT NULL auto_increment,
  .......,
  PRIMARY KEY  (`id`)
) AUTO_INCREMENT=0;
CREATE TABLE `shmobsters` (
  `id` int(10) NOT NULL auto_increment,
  .......,
  PRIMARY KEY  (`id`)
) AUTO_INCREMENT=10000;
CREATE TABLE `hopohopos` (
  `id` int(10) NOT NULL auto_increment,
  .......,
  PRIMARY KEY  (`id`)
) AUTO_INCREMENT=30000;
I am assume that for lifecycle that would be enough to have up to 10000 blobsters objects in db, 20000 shmobsters and billion hopohopos

Now I need to add 'comments all' feature. First create tables.

-- Comments model table
CREATE TABLE `comments` (
  `id` int(10) NOT NULL auto_increment,
  .......,
  `text` varchar(4096) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1030000;
-- Comments HABTM table
CREATE TABLE `comments_links` (
  `object_id` int(10) unsigned NOT NULL default '0',
  `comment_id` int(10) unsigned NOT NULL default '0',
  PRIMARY KEY  (`object_id`,`comment_id`)
);

Add comment model which HABTM to all existing model:

Model Class:

<?php 
class Comment extends AppModel {
    var 
$hasAndBelongsToMany = array(
        
'Blobster' => array(
        
'className'             => 'Blobster',
        
'joinTable'             => 'comments_links',
        
'foreignKey'            => 'comment_id',
        
'associationForeignKey' => 'object_id'
    
),
        
'Shmobster' => array(
        
'className'             => 'Shmobster',
        
'joinTable'             => 'comments_links',
        
'foreignKey'            => 'comment_id',
        
'associationForeignKey' => 'object_id'
    
),
        
'Hopohopo' => array(
        
'className'             => 'Hopohopo',
        
'joinTable'             => 'comments_links',
        
'foreignKey'            => 'comment_id',
        
'associationForeignKey' => 'object_id'
    
),
    );
}
?>

To the each of other models we have to define HABTM too:

var $hasAndBelongsToMany = array(

'Comment' => array(
        'className'             => 'Comment',
        'joinTable'             => 'comments_links',
        'foreignKey'            => 'object_id',
        'associationForeignKey' => 'comment_id'
    )
);

I specially marked that some of model has already defined HABTM array, but part of code you need to add is same per all models.

Conclusion

[p]I have test it with scaffold and it does working. I don't see reason why it should not to work. Bet similar technique will works with others types of relationships.

[p]Advantages which I can see:

  1. Less tables to implement complex HABTM
  2. Easy to extend already existing application (new part of code is same and could be shared between models trough include file or base model. Could be used in application which supports add-ons(one add-on per new thing\feature such could be tags, 'comment all', raiting, ...) and different users can have different sets of add-ons(no needs to care in one add-on about others))

Report

More on Models

Tags

Advertising

Comments

  • reconbot posted on 04/09/10 01:04:38 PM
    If anyone every wanted to see what a markov chain looked like, you can see it right here.
  • reconbot posted on 03/03/08 10:51:31 AM
    the bakery was loading really slow - sorry
  • reconbot posted on 03/03/08 10:43:23 AM
    This is a perfect case for using UUID's for your id field. UUID's wont have any of the trouble of running out of space in an arbitrarily range and cake supports using them just fine. I use them in a similar way you've mentioned using integer IDs for an thumbnail model that services a few other models.
  • reconbot posted on 03/03/08 10:37:23 AM
    the bakery was loading really slow - sorry
  • renan.saddam posted on 01/30/08 05:49:21 AM
    Can I use CONSTRAINTS (with InnoDB Engine) using your way ?
    I need to keep the database consistency too.
  • markstory posted on 12/22/07 09:27:49 AM
    I don't think that this proposed solution is a good idea. A better solution would to have a parent 'object' table that all other model types inherit or belong to. And do your habtm relationships off of the parent class table.
  • coeus posted on 10/03/07 02:15:24 AM
    I believe Drupal works in a similar fashion. They have a table called "nodes" and all other types of nodes (ex: a blog story, a page, a poll, a gallery, etc) belong to a node. So essentially every 'thing' is a node and the node ids correspond with the id of that 'thing'.

    So, when creating a commenting system you can simply create the table `node_comments`... which would virtually add a comment system to every 'thing' since they're all considered nodes.

    "the potential to calculate the auto-increment numbers incorrectly is inexcusably high"

    I really don't see how this is difficult. Drupal does it really well. Everytime you create something, like a poll or a page, it creates the node with an auto-incremented id, then you simply apply that node's id to the poll record in the poll table.
  • Skiedr posted on 08/20/07 12:26:36 AM
    Better way to solve such poblem is the way рефе used in translate behavior. You need two field in join table - ModelName and RowId. Also you need to slighly fix the conditions for each HABTM relation in all models (you need create it in any cases so this will no the problem).
    You ever can use same condition in all models for example
    'model_name='.$this->name
    • syegorov posted on 08/20/07 09:18:28 AM
      Better to share sequence between models which could be in "ModelName" field proposed, then have additional text field. Will be faster and no fix requred. Highly probably you even do not need 64bit for ID to last forever.
  • lamby posted on 08/18/07 10:30:41 AM
    This is possibly the worst database idea I've ever seen.

    Not only is this a great ugly hack, the potential to calculate the auto-increment numbers incorrectly is inexcusably high, as is the obvious possibility of collisions within the `id` ranges (think of a DoS spam attack).

    Furthermore, the claimed advantages are bogus. Firstly, yes, there would be more tables. But why is this a problem? I can only think that you either optimising prematurely, or you are finding the 20-odd tables unwieldy to manage. The former has an obvious fix, and the latter can be fixed by investing time in better tools.

    Secondly, is certainly not easier to extend. An add-on would require at least one table for the main entity it is storing - why is it suddenly difficult for it to add the join-table to a pre-existing 'Tag' table? And let us also revisit the problem of collisions in the big join table: how does the add-in know which ranges are safe? What happens when you decide the add-in (or any entity) needs a larger range?

    Finally, it is also completely avoidable and I would encourage anyone to do so: just use normal HABTM. If you are spending ages creating SQL tables, try writing a script to generate 'CREATE TABLE' stanzas for you.
    • syegorov posted on 08/20/07 09:06:56 AM
      Idea was not about to use ranges (use sequence if you can - sequence for MySQL can be simulated by model behavior), which btw you can control (hope you do something if so scared of DDOS, not only pray on cake)

      Idea was to use one join table per model which HABTM to models with same sequence.
      Big join table is not so big - there is no "total join table per db" - there is just join table which specify joins for certain model.

      I am totally agree with your comments about having it normal way. And for sure when you doing your own only server that is correct.

      But assume that you are doing something which could be extended by entities(plugins), set of which could be very different from installation to installation. And plugin developers should not be notes (or even worry) about others possible plugins.
    • richardathome posted on 08/19/07 12:52:28 PM
      I agree with lamby on this.

      If you think creating 20+ extra table and models is a chore you should really think about automating the process.

login to post a comment.