HCELL selection can be a controversial topic. I would like to describe a few different techniques and the rationale behind them, and hopefully generate some discussion based on your experiences.
One technique focuses on performance. In the interest of the fastest turnaround time, we can choose cells based strictly on the potential time savings that each cell may bring. A cell that contains just a few devices and is placed just a few times offers little value from the standpoint of performance. On the other hand, a cell that contains several thousand devices and is placed multiple times is a good candidate for the hcell list. Calibre Interactive can help create an hcell list of this nature. From the command line, it can be invoked with "calibre -gui -lvs". Several fields will need to be filled in, such as netlist names and top cell names. The process is described in the "Calibre Interactive User's Manual" in the section titled "Performing Hcell Analysis in LVS". A high-performance hcell list may have a surprisingly small number of hcells, possibly fewer than 10 or 20 cell names. While the performance aspect of this method seems clear enough, I do sometimes wonder whether the small number of hcells ever has the negative side effect of making LVS debug more difficult.
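To make the performance reasoning concrete, here is a rough Python sketch of how one might rank candidate cells by potential savings. This is only my illustration, not how Calibre Interactive performs its hcell analysis. The CellStats record, the savings formula (devices times the number of repeated placements), and the min_savings and max_cells thresholds are all assumptions I made up for the example.

```python
from typing import NamedTuple


class CellStats(NamedTuple):
    # Hypothetical per-cell statistics; not a Calibre data structure.
    name: str
    devices: int      # device count inside one instance of the cell
    placements: int   # number of times the cell is placed in the design


def estimated_savings(cell):
    # Assumed metric: devices that need not be compared again if the
    # cell is matched once and that result is reused for every other placement.
    return cell.devices * (cell.placements - 1)


def pick_hcells(cells, min_savings=10000, max_cells=20):
    # Rank by the assumed savings metric, drop low-value cells,
    # and keep the list short. Both thresholds are arbitrary.
    ranked = sorted(cells, key=estimated_savings, reverse=True)
    keep = [c.name for c in ranked if estimated_savings(c) >= min_savings]
    return keep[:max_cells]


cells = [
    CellStats("sram_bank", 5000, 8),  # large and repeated: good candidate
    CellStats("inv", 2, 3),           # tiny and rarely placed: little value
    CellStats("pll", 20000, 1),       # large but placed once: nothing to reuse
]
print(pick_hcells(cells))
```

With these made-up numbers only sram_bank survives, which matches the intuition above: size alone or repetition alone is not enough.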
Another technique focuses on "design methodology" instead of performance. Some people use hcells as a means of enforcing a design methodology where hundreds or thousands of cells with the same name in the layout and source are expected to match at the cell level. Either the "-automatch" switch or an exhaustive hcell list of practically all the layout and source cells may be used for this method. It's not necessarily best for performance, and it can lead to nuisance errors in many cases, but many people use this method. I would be interested to hear your opinions on it.
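For anyone who wants to build the exhaustive list by hand rather than rely on "-automatch", a minimal Python sketch follows. It assumes you have already extracted the cell names from the layout and the subcircuit names from the source netlist by some other means, and it assumes the hcell file is plain text with one "layout_cell source_cell" pair per line. Please check the file format against the Calibre documentation before relying on it. The function name and the sample cell names are hypothetical.

```python
def matching_hcells(layout_cells, source_cells):
    # Keep only names that appear on both sides, since this method
    # expects same-named cells to match at the cell level.
    common = sorted(set(layout_cells) & set(source_cells))
    # Assumed hcell file line format: "<layout_cell> <source_cell>".
    return [f"{name} {name}" for name in common]


layout_cells = ["nand2", "inv", "top_lay"]   # hypothetical layout cell names
source_cells = ["inv", "nand2", "top_src"]   # hypothetical source subcircuit names

for line in matching_hcells(layout_cells, source_cells):
    print(line)
```

Note that this is an exact, case-sensitive name comparison. Cells whose names differ between layout and source are simply left out here and would need to be paired manually.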
A third technique begins with listing all the standard cells, and adding certain other cells based on some criteria. I presume that familiarity with the design is necessary for this method. I haven't used this method myself so I'm a little vague on the details. If you have had good or bad experiences with this third method, please share.
I have often wondered if Calibre could offer a simple and automatic hcell selection option that gives optimum performance, avoids false errors and promotes easy debug. Maybe our collective discussions will lead to the answer.
I will collect and summarize replies for this thread into a document we can refer to. Here's a link to that document: HCELL selection methods