[9693] in Athena Bugs
decmips 7.4G: Bug in tags.el
daemon@ATHENA.MIT.EDU (Reid M. Pinchback)
Tue Jul 28 11:00:35 1992
To: bugs@Athena.MIT.EDU
Date: Tue, 28 Jul 92 11:00:23 EDT
From: "Reid M. Pinchback" <reidmp@Athena.MIT.EDU>
System name: whirlwind
Type and version: KN02ca 7.4G (1 update(s) to same version)
Display type: PMAG-DV
What were you trying to do?
Use etags to locate functions in C files.
What's wrong:
The emacs tags code sometimes finds the wrong tag in the tag buffer,
and then uses this incorrect tag to try to find the function. As
a result, it finds the wrong function. This is bad enough when you
are using find-tag interactively, but a real problem when you are
trying to use it in elisp programs.
What should have happened:
It should be more careful about finding the correct tag. I can
understand it searching through source files and not quite hitting
the proper entry, but it should at least search its own TAGS file
properly.
Please describe any relevant documentation references:
In the file /usr/athena/lib/gnuemacs/lisp/tags.el, at least one
mistake can be found at line 153, inside the "find-tag" defun. The
line of code currently is:
(if (not (search-forward tagname nil t))
This is doing an ordinary string search *of the tags buffer* to find
the tag. If you have a tag name that is a substring of another tag
name (for instance, "spell" and "capspell"), then "find-tag" may hit
the substring before hitting the symbol (depending on the ordering of
tags in the buffer). It could even match a return type instead of
any function at all, if the name of the return type contains the
function name as a substring.
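
A minimal sketch of the failure, assuming a modern GNU Emacs. The
buffer contents are a simplified stand-in for a tags buffer (not the
real TAGS format), and the name demo-plain-search is hypothetical:

```elisp
;; Simplified stand-in for a tags buffer: one line of source text per tag.
;; This is NOT the real TAGS file format, just enough to show the mismatch.
(defun demo-plain-search (tagname)
  "Return the line on which a plain `search-forward' first finds TAGNAME."
  (with-temp-buffer
    (insert "int capspell(word)\n"
            "int spell(word)\n")
    (goto-char (point-min))
    (when (search-forward tagname nil t)
      (buffer-substring (line-beginning-position) (line-end-position)))))

;; "capspell" comes first in the buffer, so the substring search stops there.
(demo-plain-search "spell")   ; => "int capspell(word)"
```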
This is relatively fixable. For instance, something like:
(if (not (re-search-forward
(concat "\\<" (regexp-quote tagname) "\\>") nil t))
This still isn't quite right; it won't match tags that start or end
with symbol constituents (like "_" for C files). That is yet another
bug in how tags work: the "etags" program allows for symbol
constituents in tags, but tags.el loads the TAGS buffer and leaves it
in fundamental mode. This means that the tags code, even if given
a simple patch, still won't be able to properly deal with symbols
across various languages since it almost never uses the appropriate
syntax table. Of course, this would be tricky to fix since that
TAGS file can contain tags for more than one source file, and
the different source files may be in different languages.
I suspect the only easy fix would be to search the tags buffer
twice... first with the code that I gave above, and if it fails,
then by falling back to the original string search method. This
won't be perfect, but the hit rate on the tags will be higher
without (I think) ever being made worse than it is now.
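
The two-pass idea could be sketched roughly as follows, assuming a
modern GNU Emacs. The function names and the sample buffer are
hypothetical and this is not the actual tags.el code:

```elisp
(defun demo-find-tag-position (tagname)
  "Search the current buffer for TAGNAME; return point after it, or nil.
First try a word-delimited regexp match.  If that fails, fall back to
the original plain substring search."
  (goto-char (point-min))
  (or (re-search-forward (concat "\\<" (regexp-quote tagname) "\\>") nil t)
      (progn
        (goto-char (point-min))
        (search-forward tagname nil t))))

;; Hypothetical test harness: a simplified stand-in for a tags buffer.
(defun demo-find-tag-line (tagname)
  "Return the line of the sample buffer on which TAGNAME is found."
  (with-temp-buffer
    (insert "int capspell(word)\n"
            "int spell(word)\n"
            "int _init(void)\n")
    (when (demo-find-tag-position tagname)
      (buffer-substring (line-beginning-position) (line-end-position)))))

(demo-find-tag-line "spell")   ; => "int spell(word)"
(demo-find-tag-line "_init")   ; => "int _init(void)"
```

The "_init" lookup only succeeds because of the second pass: in the
default syntax table "_" is not a word constituent, so the "\\<" form
cannot match in front of it.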
To do any better you would have to search for page breaks (^L), check the
extension on the file name, set the appropriate mode, and limit
the region of the re-search-forward so that it doesn't go beyond
the next page break. Then keep looping this code until you run
through all the pages in the TAGS file. It would also help if
function tags were the first thing on each line in the buffer.
The return type being in front is a pain, and isn't very reliable
anyway (since it only contains the portion of the return type
that is on the same line as the function name). This way you
could stop function tags from inadvertently matching return type
strings. This is the way ctags generates its tag file.
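
The page-by-page loop described above might look something like this
sketch, assuming a modern GNU Emacs. Instead of switching major modes
it swaps in a syntax table with with-syntax-table, which is a
substitution on my part. All demo- names, the extension mapping, and
the sample buffer (page break, then a "filename,size" line, then
simplified tag lines) are illustrative assumptions:

```elisp
;; Hypothetical syntax table for C-like files in which "_" counts as a
;; word constituent, so "\\<" and "\\>" treat "_init" as a single word.
(defvar demo-c-like-syntax-table
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?_ "w" table)
    table))

(defun demo-syntax-table-for-file (filename)
  "Pick a syntax table from FILENAME's extension (illustrative mapping only)."
  (if (string-match "\\.[ch]\\'" filename)
      demo-c-like-syntax-table
    (standard-syntax-table)))

(defun demo-find-tag-by-page (tagname)
  "Search each ^L-delimited page of the current buffer for TAGNAME.
Return the file name heading the first page that contains a
word-delimited match, or nil if no page does."
  (let ((regexp (concat "\\<" (regexp-quote tagname) "\\>"))
        (found nil))
    (goto-char (point-min))
    (while (and (not found) (search-forward "\f\n" nil t))
      (let* ((filename (buffer-substring
                        (point)
                        (progn (skip-chars-forward "^,\n") (point))))
             ;; The page ends at the next page break, or at end of buffer.
             (page-end (save-excursion
                         (if (search-forward "\f" nil t)
                             (match-beginning 0)
                           (point-max)))))
        (forward-line 1)
        ;; Limit the search to this page, using this file's syntax table.
        (with-syntax-table (demo-syntax-table-for-file filename)
          (when (re-search-forward regexp page-end t)
            (setq found filename)))
        (goto-char page-end)))
    found))

;; Hypothetical test harness with two pages of simplified tag lines.
(defun demo-page-search (tagname)
  "Run `demo-find-tag-by-page' for TAGNAME on a small sample buffer."
  (with-temp-buffer
    (insert "\f\ncaps.c,40\n"
            "int capspell(\n"
            "static int _init(\n"
            "\f\nspell.c,20\n"
            "int spell(\n")
    (demo-find-tag-by-page tagname)))

(demo-page-search "spell")   ; => "spell.c"
(demo-page-search "_init")   ; => "caps.c"
```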