[32183] in Perl-Users-Digest


Perl-Users Digest, Issue: 3448 Volume: 11

daemon@ATHENA.MIT.EDU (Perl-Users Digest)
Thu Jul 21 14:09:43 2011

Date: Thu, 21 Jul 2011 11:09:22 -0700 (PDT)
From: Perl-Users Digest <Perl-Users-Request@ruby.OCE.ORST.EDU>
To: Perl-Users@ruby.OCE.ORST.EDU (Perl-Users Digest)

Perl-Users Digest           Thu, 21 Jul 2011     Volume: 11 Number: 3448

Today's topics:
    Re: a little parsing challenge ☺ <xahlee@gmail.com>
    Re: a little parsing challenge ☺ <rustompmody@gmail.com>
    Re: a little parsing challenge ☺ <xahlee@gmail.com>
    Re: a little parsing challenge ☺ <xahlee@gmail.com>
    Re: a little parsing challenge ☺ <xahlee@gmail.com>
    Re: a little parsing challenge ☺ <steve+comp.lang.python@pearwood.info>
    Re: a little parsing challenge ☺ <rabbits77@my-deja.com>
    Re: a little parsing challenge ☺ <RedGrittyBrick@spamweary.invalid>
    Re: a little parsing challenge ☺ <uri@StemSystems.com>
    Re: a little parsing challenge ☺ (Randal L. Schwartz)
    Re: a little parsing challenge ☺ <jearl@notengoamigos.org>
    Re: a little parsing challenge ☺ <uri@StemSystems.com>
    Re: a little parsing challenge ☺ <uri@StemSystems.com>
    Digest Administrivia (Last modified: 6 Apr 01) (Perl-Users-Digest Admin)

----------------------------------------------------------------------

Date: Wed, 20 Jul 2011 03:31:24 -0700 (PDT)
From: Xah Lee <xahlee@gmail.com>
Subject: Re: a little parsing challenge ☺
Message-Id: <02e81438-f03a-491f-a105-bc758ebfce83@z7g2000prh.googlegroups.com>

i've just cleaned up my elisp code and wrote a short elisp tutorial.

Here:

〈Emacs Lisp: Batch Script to Validate Matching Brackets〉
http://xahlee.org/emacs/elisp_validate_matching_brackets.html

plain text version follows. Please let me know what you think.

am still working on going thru all code in other langs. Will get to
the ruby one, and that perl regex, and the other fixed python ones.
(possibly also the 2 common lisp codes but am not sure they are
runnable as is or just some non-working showoff. lol)

===============================================
Emacs Lisp: Batch Script to Validate Matching Brackets

Xah Lee, 2011-07-19

This page shows you how to write an elisp script that checks thousands
of files for mismatched brackets.

----------------------------------------------------------------
The Problem

------------------------------------------------
Summary

I have 5 thousand files containing many matching pairs. I want to
know if any of them contains mismatched brackets.

------------------------------------------------
Detail

The matching pairs include these: () {} [] “” ‹› «» 〈〉 《》 【】 〖〗 「」
『』.

The program should be able to check all files in a dir, and report any
file that has mismatched brackets, and also indicate the line number or
position where a mismatch occurs.
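
Since a char position (as in emacs's “point”) and a line/column pair are
both acceptable here, it may help to see how one converts to the other.
A minimal Python sketch, assuming a 0-based char offset as input and
1-based line and column as output; the function name is made up for
illustration:

```python
def line_and_column(text, offset):
    """Convert a 0-based char offset into a 1-based (line, column) pair."""
    # Every newline before the offset starts a new line.
    line = text.count("\n", 0, offset) + 1
    # rfind returns -1 when the offset is still on the first line.
    last_newline = text.rfind("\n", 0, offset)
    column = offset - last_newline
    return line, column


sample = "ab\nc(d\n"
print(line_and_column(sample, sample.index("(")))  # (2, 2)
```

Note that emacs positions are 1-based, so an emacs “point” would need
adjusting before being fed to a function like this.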

For those curious, if you want to know what these brackets are, see:

    • Syntax Design: Use of Unicode Matching Brackets as Specialized
Delimiters
    • Intro to Chinese Punctuation with Computer Language Syntax
Perspectives

For other notes and conveniences about dealing with brackets in emacs,
see:

    • Emacs: Defining Keys to Navigate Brackets
    • “extend-selection” at A Text Editor Feature: Extend Selection by
Semantic Unit
    • “select-text-in-quote” at Suggestions on Emacs's mark-word
Command

----------------------------------------------------------------
Solution

Here's an outline of the steps.

    • Go thru the file char by char, find a bracket char.
    • If it is a closing char and the char on top of the stack is its
matching opening char, remove that one from the stack. Else, push the
current char onto the stack.
    • Repeat the above till no more bracket chars in the file.
    • If the stack is not empty, then the file has mismatched
brackets. Report it.
    • Do the above on all files.
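
For readers who don't use emacs, the steps above can be sketched in a
few lines of Python. This is only an illustration of the outline, not
the elisp script itself; the names are made up, and offsets here are
0-based while emacs positions are 1-based:

```python
# Hypothetical Python mirror of the outline; all names are illustrative.
MATCH_PAIRS = {
    "(": ")", "{": "}", "[": "]",
    "“": "”", "‹": "›", "«": "»",
    "【": "】", "〖": "〗", "〈": "〉",
    "《": "》", "「": "」", "『": "』",
}
# Reverse lookup: closing char -> its opening char.
OPENER_OF = {close: open_ for open_, close in MATCH_PAIRS.items()}


def leftover_brackets(text):
    """Return the stack of unmatched (char, offset) entries; empty means balanced."""
    stack = []  # the top of the stack is the last element
    for offset, char in enumerate(text):
        if char not in MATCH_PAIRS and char not in OPENER_OF:
            continue  # not a bracket char
        wanted = OPENER_OF.get(char)  # None when char is an opening bracket
        if stack and wanted is not None and stack[-1][0] == wanted:
            stack.pop()  # closing char matches the opener on top
        else:
            stack.append((char, offset))  # an opener, or a closer with no match
    return stack


print(leftover_brackets("(a {b} “c”)"))  # []
print(leftover_brackets("( ( )"))        # [('(', 0)]
```

As in the outline, a closing char that does not match is pushed too, so
whatever remains on the stack at the end marks where things went wrong.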

Here's some interesting use of lisp features to implement the above.

------------------------------------------------
Define Matching Pair Chars as “alist”

We begin by defining the chars we want to check, as an “association
list” (aka “alist”). Like this:

(setq matchPairs '(
                   ("(" . ")")
                   ("{" . "}")
                   ("[" . "]")
                   ("“" . "”")
                   ("‹" . "›")
                   ("«" . "»")
                   ("【" . "】")
                   ("〖" . "〗")
                   ("〈" . "〉")
                   ("《" . "》")
                   ("「" . "」")
                   ("『" . "』")
                   )
      )

If you care only to check for curly quotes, you can remove elements
above. This is convenient because some files necessarily have
mismatched pairs such as the parenthesis, because that char is used
for many non-bracketing purposes (e.g. ASCII smiley).

An “alist” in lisp is basically a list of pairs (called key and value),
with the ability to search for a key or a value. The first element of
a pair is called its key, the second element is its value. Each pair
is a “cons”, like this: (cons mykey myvalue), which can also be
written using this syntax: (mykey . myvalue) for easier reading.

The purpose of lisp's “alist” is similar to Python's dictionary or
PHP's array. It is also similar to a hashmap, except that an alist can
have duplicate keys, can be searched by value, maintains order, and is
not intended for a massive number of elements. Elisp has a hashmap
datatype if you need that. (See: Emacs Lisp Tutorial: Hash Table.)

(info "(elisp) Association Lists")
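
To make the comparison with other languages concrete, here is a rough
Python imitation of an alist: a plain list of (key, value) tuples with
lookup by key and by value. The helper names assoc and rassoc are
borrowed from elisp; the Python functions themselves are hypothetical
and only meant to show the idea:

```python
# A list of (key, value) tuples stands in for the alist.
match_pairs = [("(", ")"), ("{", "}"), ("[", "]"), ("“", "”")]


def assoc(key, alist):
    """Return the first (key, value) pair whose key equals `key`, else None."""
    for pair in alist:
        if pair[0] == key:
            return pair
    return None


def rassoc(value, alist):
    """Return the first (key, value) pair whose value equals `value`, else None."""
    for pair in alist:
        if pair[1] == value:
            return pair
    return None


print(assoc("{", match_pairs))   # ('{', '}')
print(rassoc("”", match_pairs))  # ('“', '”')
print(rassoc("x", match_pairs))  # None
```

Unlike a Python dict, this keeps duplicate keys and insertion order, at
the cost of a linear search on every lookup.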

------------------------------------------------
Generate Regex String from alist

To search for a set of chars in emacs, we can read the buffer
char-by-char, or, we can simply use “search-forward-regexp”. To use
that, first we need to generate a regex string from our matchPairs
alist.

First, we define/declare the string. Not a necessary step, but we do
it for clarity.

(setq searchRegex "")

Then we go thru the matchPairs alist. For each pair, we use “car” and
“cdr” to get the chars and “concat” them to the string. Like this:

(mapc
 (lambda (mypair) ""
   (setq searchRegex (concat searchRegex (regexp-quote (car mypair))
"|" (regexp-quote (cdr mypair)) "|") )
   )
 matchPairs)

Then we remove the ending “|”.

(setq searchRegex (substring searchRegex 0 -1)) ; remove the ending “|”

Then, change | to \\|. In elisp regex, the | is literal. The “regex
or” is \|. And if you are using regex in elisp, elisp does not have a
special regex string syntax, it only understands normal strings. So,
to feed \| to the regex engine, you need to escape the backslash. So,
your string needs to have \\|. Here's how we do it:

(setq searchRegex (replace-regexp-in-string "|" "\\|" searchRegex t t)) ; change | to \\| for regex “or” operation

You could shorten the above into just 2 lines by using \\| in the
“mapc” step instead of an extra step of replacing | by \\|.
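
For comparison, the same regex-building idea can be sketched in Python.
This assumes Python's re module, where the “or” operator is a bare |
rather than \|; the variable names and the short pair list are only for
illustration:

```python
import re

# Illustrative subset of the pairs; the variable names are assumptions.
match_pairs = [("(", ")"), ("{", "}"), ("[", "]"), ("“", "”")]

# Flatten the pairs into one list of bracket chars.
chars = [c for pair in match_pairs for c in pair]

# Escape each char, then join with |. Joining avoids the trailing "|"
# that has to be trimmed when the string is built by appending.
search_regex = "|".join(re.escape(c) for c in chars)

# Collect every bracket char together with its 0-based offset.
found = [(m.group(), m.start()) for m in re.finditer(search_regex, "a (b] “c”")]
print(found)  # [('(', 2), (']', 4), ('“', 6), ('”', 8)]
```

Here re.escape plays the role of “regexp-quote”.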

See also: emacs regex tutorial.

------------------------------------------------
Implement Stack Using Lisp List

The stack is done using lisp's list, e.g. '(1 2 3). The top of the
stack is the first element. To add to the stack, do it like this:
(setq mystack (cons newitem mystack)). To remove an item from the
stack: (setq mystack (cdr mystack)). The stack begins as an empty
list: '().

For each element in the stack, we need the char and also its position,
so that we can report the position if the file does have mismatched
pairs.

We use a vector as entries for the stack. Each entry is like this:
(vector char pos). (See: Emacs Lisp Tutorial: List ＆ Vector.)

Here's how to fetch a char from the top of the stack, check if the
current char matches, push to stack, pop from stack.

; check if current char is a closing char and is in our match pairs alist.
; use “rassoc” to check alist's set of “values”.
; It returns the first key/value pair found, or nil
(rassoc char matchPairs)

; add to stack
(setq myStack (cons (vector char pos) myStack) )

; pop stack
(setq myStack (cdr myStack) )
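
If cons cells are unfamiliar, the following Python sketch imitates the
same stack with nested tuples: each stack is either None (empty) or a
(first, rest) pair, so the most recently pushed entry is always the
first element. The helper names are hypothetical and exist only to
mirror cons, car and cdr:

```python
def push(item, stack):
    """Like (cons item stack): the new item becomes the first element."""
    return (item, stack)


def pop(stack):
    """Like (cdr stack): everything except the first element."""
    return stack[1]


def top(stack):
    """Like (car stack): the most recently pushed item."""
    return stack[0]


stack = None                    # the empty list '()
stack = push(("(", 5), stack)   # entries are (char, position) pairs
stack = push(("[", 9), stack)
print(top(stack))               # ('[', 9)
stack = pop(stack)
print(top(stack))               # ('(', 5)
```

Pushing and popping never modify an existing stack; they just build or
return another one, which is why the elisp code reassigns with setq.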

------------------------------------------------
Complete Code

Here's the complete code.

;; -*- coding: utf-8 -*-
;; 2011-07-15
;; go thru a file, check if all brackets are properly matched.
;; e.g. good: (…{…}… “…”…)
;; bad: ( [)]
;; bad: ( ( )

(setq inputFile "xx_test_file.txt" ) ; a test file.
(setq inputDir "~/web/xahlee_org/") ; must end in slash

(defvar matchPairs '() "An alist. For each pair, the car is the opening char, cdr is the closing char.")
(setq matchPairs '(
                   ("(" . ")")
                   ("{" . "}")
                   ("[" . "]")
                   ("“" . "”")
                   ("‹" . "›")
                   ("«" . "»")
                   ("【" . "】")
                   ("〖" . "〗")
                   ("〈" . "〉")
                   ("《" . "》")
                   ("「" . "」")
                   ("『" . "』")
                   )
      )

(defvar searchRegex "" "regex string of all pairs to search.")
(setq searchRegex "")
(mapc
 (lambda (mypair) ""
   (setq searchRegex (concat searchRegex (regexp-quote (car mypair))
"|" (regexp-quote (cdr mypair)) "|") )
   )
 matchPairs)

(setq searchRegex (replace-regexp-in-string "|$" "" searchRegex t t)) ; remove the ending “|”

(setq searchRegex (replace-regexp-in-string "|" "\\|" searchRegex t t)) ; change | to \\| for regex “or” operation

(defun my-process-file (fpath)
  "process the file at fullpath fpath ..."
  (let (myBuffer myStack ξchar ξpos)

    (setq myStack '() ) ; each element is a vector [char position]
    (setq ξchar "") ; the current char found

    (when t
      ;; (not (string-match "/xx" fpath)) ; in case you want to skip certain files

      (setq myBuffer (get-buffer-create " myTemp"))
      (set-buffer myBuffer)
      (insert-file-contents fpath nil nil nil t)

      (goto-char 1)
      (setq case-fold-search t)
      (while (search-forward-regexp searchRegex nil t)
        (setq ξpos (point)  )
        (setq ξchar (buffer-substring-no-properties ξpos (- ξpos 1))  )

        ;; (princ (format "-----------------------------\nfound char: %s\n" ξchar) )

        (let ((isClosingCharQ nil) (matchedOpeningChar nil) )
          (setq isClosingCharQ (rassoc ξchar matchPairs))
          (when isClosingCharQ (setq matchedOpeningChar (car isClosingCharQ) ) )

          ;; (princ (format "isClosingCharQ is: %s\n" isClosingCharQ) )
          ;; (princ (format "matchedOpeningChar is: %s\n" matchedOpeningChar) )

          (if
              (and
               (car myStack) ; not empty
               (equal (elt (car myStack) 0) matchedOpeningChar )
               )
              (progn
                ;; (princ (format "matched this top item on stack: %s\n" (car myStack)) )
                (setq myStack (cdr myStack) )
                )
            (progn
              ;; (princ (format "did not match this top item on stack: %s\n" (car myStack)) )
              (setq myStack (cons (vector ξchar ξpos) myStack) ) )
            )
          )
        ;; (princ "current stack: " )
        ;; (princ myStack )
        ;; (terpri )
        )

      (when (not (equal myStack nil))
        (princ "Error file: ")
        (princ fpath)
        (print (car myStack) )
        )
      (kill-buffer myBuffer)
      )
    ))

(require 'find-lisp)

(let (outputBuffer)
  (setq outputBuffer "*xah match pair output*" )
  (with-output-to-temp-buffer outputBuffer
    ;; (my-process-file inputFile)
    (mapc 'my-process-file (find-lisp-find-files inputDir "\\.txt$"))
    (princ "Done deal!")
    )
  )

I added many comments and debug code for easy understanding. If you
are not familiar with the many elisp idioms such as opening files,
buffers, printing to output, see: Emacs Lisp Idioms (for writing
interactive commands) ◇ Text Processing with Emacs Lisp Batch Style.

To run the code, simply open it in emacs. Edit the line at the top for
“inputDir”. Then call “eval-buffer”.

Here's a sample output:

Error file: c:/Users/h3/web/xahlee_org/p/time_machine/Hettie_Potter_orig.txt
[")" 3625]
Error file: c:/Users/h3/web/xahlee_org/p/time_machine/Hettie_Potter.txt
[")" 2338]
Error file: c:/Users/h3/web/xahlee_org/p/arabian_nights/xx/v1fn.txt
["”" 185795]
Done deal!

The weird ξ you see in my code is the Greek letter xi. I use unicode
chars in variable names for experimental purposes. You can just ignore
it. (See: Programing Style: Variable Naming: English Words Considered
Harmful.)

------------------------------------------------
Advantages of Emacs Lisp

Note that the great advantage of using elisp for text processing,
instead of {perl, python, ruby, …}, is that many things are taken care
of by the emacs environment.

I don't need to write code to declare the file's encoding (emacs
detects it automatically). No file reading is involved; just open,
save, or move thru characters. No code is needed for making safety
backups (the var “make-backup-files” controls that). You can easily
open the files by their paths with a click or key press. I can add
just 2 lines so that clicking on the error char in the output jumps to
the location in the file.

Any elisp script you write inside emacs automatically becomes an
extension of emacs and can be used in an interactive way.

This problem was posted to a few comp.lang newsgroups as a fun
challenge. You can see several solutions in python, ruby, perl, common
lisp, at: a little parsing challenge ☺ (2011-07-17) @ Source
groups.google.com.

 Xah


------------------------------

Date: Wed, 20 Jul 2011 10:30:03 -0700 (PDT)
From: rusi <rustompmody@gmail.com>
Subject: Re: a little parsing challenge ☺
Message-Id: <b1168dba-5270-4710-a191-3520e91c7c6d@f17g2000prf.googlegroups.com>

On Jul 20, 9:31 pm, "Uri Guttman" <u...@StemSystems.com> wrote:
> a better parsing challenge. how can you parse usenet to keep this troll
> from posting on the wrong groups on usenet? first one to do so, wins the
> praise of his peers. 2nd one to do it makes sure the filter stays in
> place. all the rest will be rewarded by not seeing the troll anymore.
>
> anyone who actually engages in a thread with the troll should parse
> themselves out of existance.

Goedelian paradox: Is this thread in existence?


------------------------------

Date: Thu, 21 Jul 2011 05:58:48 -0700 (PDT)
From: Xah Lee <xahlee@gmail.com>
Subject: Re: a little parsing challenge ☺
Message-Id: <caf3941f-ec97-447d-bb84-c02dbcaf2487@h7g2000prf.googlegroups.com>

On Jul 19, 11:07 am, Thomas Jollans <t...@jollybox.de> wrote:
> On 19/07/11 18:54, Xah Lee wrote:
>
> > On Sunday, July 17, 2011 2:48:42 AM UTC-7, Raymond Hettinger wrote:
> >> On Jul 17, 12:47 am, Xah Lee <xah...@gmail.com> wrote:
> >>> i hope you'll participate. Just post solution here. Thanks.
>
> >>http://pastebin.com/7hU20NNL
>
> > just installed py3.
> > there seems to be a bug.
> > in this file
>
> >http://xahlee.org/p/time_machine/tm-ch04.html
>
> > there's a mismatched double curly quote. at position 28319.
>
> > the python code above doesn't seem to spot it?
>
> > here's the elisp script output when run on that dir:
>
> > Error file: c:/Users/h3/web/xahlee_org/p/time_machine/tm-ch04.html
> > ["“" 28319]
> > Done deal!
>
> That script doesn't check that the balance is zero at the end of file.
>
> Patch:
>
> --- ../xah-raymond-old.py       2011-07-19 20:05:13.000000000 +0200
> +++ ../xah-raymond.py   2011-07-19 20:03:14.000000000 +0200
> @@ -16,6 +16,8 @@
>          elif c in closers:
>              if not stack or c != stack.pop():
>                  return i
> +    if stack:
> +        return i
>      return -1
>
>  def scan(directory, encoding='utf-8'):

Thanks a lot for the fix Raymond.

Though, the code seems to have a minor problem.
It works, but the report is wrong.
e.g. output:

30068: c:/Users/h3/web/xahlee_org/p/time_machine\tm-ch04.html

that 30068 position is the last char in the file.
The correct should be 28319. (or at least point somewhere in the file
at a bracket char that doesn't match.)
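
To show what i mean by a usable position: one way is to keep each
opener's offset on the stack and, at end of file, report where the
orphan bracket was opened instead of the last index. A rough Python
sketch under that assumption. This is not Raymond's or Thomas's code;
the names and the short pair list are made up for illustration:

```python
PAIRS = {"(": ")", "{": "}", "[": "]", "“": "”"}  # illustrative subset
CLOSERS = set(PAIRS.values())


def first_mismatch(text):
    """Return the 0-based offset of the offending bracket, or -1 if balanced."""
    stack = []  # entries: (expected closing char, offset of the opener)
    for offset, char in enumerate(text):
        if char in PAIRS:
            stack.append((PAIRS[char], offset))
        elif char in CLOSERS:
            if not stack or stack.pop()[0] != char:
                return offset  # stray or wrong closing bracket
    if stack:
        return stack[-1][1]  # orphan opener: report where it was opened
    return -1


print(first_mismatch("a “b c"))  # 2
print(first_mismatch("(a)"))     # -1
```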

Today, i tried 3 more scripts. 2 fixed python3 versions, 1 ruby, all
failed again. I've reported the problems i encounter at python or ruby
newsgroups. If you are the author, a fix is very much appreciated.
I'll get back to your code and eventually do a blog of summary of all
different lang versions.

Am off to test that elaborate perl regex now... cross fingers.

 Xah. Mood: quite discouraged.


------------------------------

Date: Thu, 21 Jul 2011 06:23:58 -0700 (PDT)
From: Xah Lee <xahlee@gmail.com>
Subject: Re: a little parsing challenge ☺
Message-Id: <09e533d2-543f-4fb7-8355-a9c6d5635a97@f17g2000prf.googlegroups.com>


2011-07-21

On Jul 18, 12:09 am, Rouslan Korneychuk <rousl...@msn.com> wrote:
> I don't know why, but I just had to try it (even though I don't usually
> use Perl and had to look up a lot of stuff). I came up with this:
>
> /(?|
>      (\()(?&matched)([\}\]”›»】〉》」』]|$) |
>      (\{)(?&matched)([\)\]”›»】〉》」』]|$) |
>      (\[)(?&matched)([\)\}”›»】〉》」』]|$) |
>      (“)(?&matched)([\)\}\]›»】〉》」』]|$) |
>      (‹)(?&matched)([\)\}\]”»】〉》」』]|$) |
>      («)(?&matched)([\)\}\]”›】〉》」』]|$) |
>      (【)(?&matched)([\)\}\]”›»〉》」』]|$) |
>      (〈)(?&matched)([\)\}\]”›»】》」』]|$) |
>      (《)(?&matched)([\)\}\]”›»】〉」』]|$) |
>      (「)(?&matched)([\)\}\]”›»】〉》』]|$) |
>      (『)(?&matched)([\)\}\]”›»】〉》」]|$))
> (?(DEFINE)(?<matched>(?:
>      \((?&matched)\) |
>      \{(?&matched)\} |
>      \[(?&matched)\] |
>      “(?&matched)” |
>      ‹(?&matched)› |
>      «(?&matched)» |
>      【(?&matched)】 |
>      〈(?&matched)〉 |
>      《(?&matched)》 |
>      「(?&matched)」 |
>      『(?&matched)』 |
>      [^\(\{\[“‹«【〈《「『\)\}\]”›»】〉》」』]++)*+))
> /sx;
>
> If the pattern matches, there is a mismatched bracket. $1 is set to the
> mismatched opening bracket. $-[1] is its location. $2 is the mismatched
> closing bracket or '' if the bracket was never closed. $-[2] is set to
> the location of the closing bracket or the end of the string if the
> bracket wasn't closed.
>
> I didn't write all that manually; it was generated with this:
>
> my @open = ('\(','\{','\[','“','‹','«','【','〈','《','「','『');
> my @close = ('\)','\}','\]','”','›','»','】','〉','》','」','』');
>
> '(?|'.join('|',map
> {'('.$open[$_].')(?&matched)(['.join('',@close[0..($_-1),($_+1)..$#close]). ']|$)'}
> (0 .. $#open)).')(?(DEFINE)(?<matched>(?:'.join('|',map
> {$open[$_].'(?&matched)'.$close[$_]} (0 ..
> $#open)).'|[^'.join('',@open,@close).']++)*+))'

Thanks for the code.

are you willing to make it complete and standalone? i.e. i can run it
like this:

perl Rouslan_Korneychuk.pl dirPath

and it prints any file that has mismatched pair and line/column number
or the char position?

i'd do it myself but so far i tried 5 codes, 3 fixes, all failed. Not
a complaint, but it does take time to gather the code, of different
langs by different people, properly document their authors and
original source urls, etc, and test it out on my environment. All
together in the past 3 days i spent perhaps a total of 4 hours running
several codes and writing back etc and so far not one really worked.

i know perl well, but your code is a bit out of the ordinary ☺. If
past days have been good experience, i might dive in and study for
fun.

 Xah


------------------------------

Date: Thu, 21 Jul 2011 08:36:18 -0700 (PDT)
From: Xah Lee <xahlee@gmail.com>
Subject: Re: a little parsing challenge ☺
Message-Id: <65c72670-9030-41fa-af4d-bcf5189d0aae@f17g2000prf.googlegroups.com>

Ok. Here's a preliminary report.

〈Lisp, Python, Perl, Ruby … Code to Validate Matching Brackets〉
http://xahlee.org/comp/validate_matching_brackets.html

it's taking too much time to go thru.

right now, i consider only one code valid, by Raymond Hettinger (with
minor edit from others).

right now, there are 2 other possibly correct solutions. One by Robert
Klemme, but it requires ruby19 and i only have ruby18x. One by Thomas
Jollans in Python 3, but it didn't run on my machine, perhaps due to
some unix/Windows issue; yet to be sorted out.

the other 3 or 4 seem to be incomplete or just suggestions of ideas.

i haven't done extensive testing on my own code either.
I'll revisit maybe in a few days.

Feel free to grab my report and make it nice. If you would like to fix
your code, feel free to email.

 Xah

On Jul 21, 7:26 am, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> On Thu, Jul 21, 2011 at 6:58 AM, Xah Lee <xah...@gmail.com> wrote:
> > Thanks a lot for the fix Raymond.
>
> That fix was from Thomas Jollans, not Raymond Hettinger.
>
> > Though, the code seems to have a minor problem.
> > It works, but the report is wrong.
> > e.g. output:
>
> > 30068: c:/Users/h3/web/xahlee_org/p/time_machine\tm-ch04.html
>
> > that 30068 position is the last char in the file.
> > The correct should be 28319. (or at least point somewhere in the file
> > at a bracket char that doesn't match.)
> Previously you wrote:
> > If a file has mismatched matching-pairs, the script will display the
> > file name, and the line number and column number of the first
> > instance where a mismatched bracket occurs. (or, just the char number
> > instead (as in emacs's “point”))
>
> I submit that as the file contains no mismatched brackets (only an
> orphan bracket), the output is correct to specification (indeed you
> did not define any output for this case), if not necessarily useful.
>
> In other words, stop being picky.  You may be willing to spend an hour
> or more on this, but that doesn't mean anybody else is.  Raymond gave
> you a basically working Python solution, but forgot one detail.
> Thomas fixed that detail for you but didn't invest the time to rewrite
> somebody else's function to get the output "correct".  Continuing to
> harp on it at this point is verging on trolling.


------------------------------

Date: Thu, 21 Jul 2011 15:58:54 +1000
From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Subject: Re: a little parsing challenge ☺
Message-Id: <4e27c01f$0$29965$c3e8da3$5496439d@news.astraweb.com>

On Thu, 21 Jul 2011 02:31 am Uri Guttman wrote:

> 
> a better parsing challenge. how can you parse usenet to keep this troll
> from posting on the wrong groups on usenet? first one to do so, wins the
> praise of his peers. 2nd one to do it makes sure the filter stays in
> place. all the rest will be rewarded by not seeing the troll anymore.
> 
> anyone who actually engages in a thread with the troll should parse
> themselves out of existance.

Somebody should make one of those "Douchebag foo" image memes. As in, 


Douchebag Poster

 - Complains about people engaging with cross-posting trolls

 - Cross-posts to five newsgroups while engaging with troll




-- 
Steven



------------------------------

Date: Thu, 21 Jul 2011 10:01:13 -0400
From: rabbits77 <rabbits77@my-deja.com>
Subject: Re: a little parsing challenge ☺
Message-Id: <ab57a$4e28312a$ad30ae40$1625@news.eurofeeds.com>

> how would you then ejimicate all the groups to ignore the troll? the
> troll has been doing this for years and many ignore/killfile him. yet he
> always lands a few fish who xpost replies. he thrives on this.
Just what makes him a troll? I mean besides your opinion?
It seems he has engaged a large group of people in a profitable
discussion. It isn't like he is inciting a flame war. In fact, you seem
to be the only person offended by this activity.


------------------------------

Date: Thu, 21 Jul 2011 15:22:30 +0100
From: RedGrittyBrick <RedGrittyBrick@spamweary.invalid>
Subject: Re: a little parsing challenge ☺
Message-Id: <4e28362a$0$2934$fa0fcedb@news.zen.co.uk>

On 21/07/2011 15:01, rabbits77 wrote:
>> how would you then ejimicate all the groups to ignore the troll? the
>> troll has been doing this for years and many ignore/killfile him. yet he
>> always lands a few fish who xpost replies. he thrives on this.
> Just what makes him a troll? I mean besides your opinion?
> It seems he has engaged a large group of people in a profitable
> discussion. It isn't like he is inciting a flame war. In fact, you seem
> to be the only person offended by this activity.

It may *seem* that way if you are unfamiliar with the long history of 
the OP's crossposts and the large number of published objections to them.

I'm not especially keen to see a Perl newsgroup filled with examples of 
Python, Lisp, Ruby and Scheme. Which is what the OP encourages.

-- 
RGB


------------------------------

Date: Wed, 20 Jul 2011 12:31:05 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: a little parsing challenge ☺
Message-Id: <87k4bcq5h2.fsf@quad.sysarch.com>


a better parsing challenge. how can you parse usenet to keep this troll
from posting on the wrong groups on usenet? first one to do so, wins the
praise of his peers. 2nd one to do it makes sure the filter stays in
place. all the rest will be rewarded by not seeing the troll anymore.

anyone who actually engages in a thread with the troll should parse
themselves out of existance.

uri

-- 
Uri Guttman  --  uri AT perlhunter DOT com  ---  http://www.perlhunter.com --
------------  Perl Developer Recruiting and Placement Services  -------------
-----  Perl Code Review, Architecture, Development, Training, Support -------


------------------------------

Date: Wed, 20 Jul 2011 12:06:18 -0700
From: merlyn@stonehenge.com (Randal L. Schwartz)
Subject: Re: a little parsing challenge ☺
Message-Id: <86zkk822mt.fsf@red.stonehenge.com>

>>>>> "Uri" == Uri Guttman <uri@StemSystems.com> writes:

Uri> a better parsing challenge. how can you parse usenet to keep this troll
Uri> from posting on the wrong groups on usenet? first one to do so, wins the
Uri> praise of his peers. 2nd one to do it makes sure the filter stays in
Uri> place. all the rest will be rewarded by not seeing the troll anymore.

Uri> anyone who actually engages in a thread with the troll should parse
Uri> themselves out of existance.

Since the newsgroups: line is not supposed to have spaces in it, that
makes both his post and your post invalid.  Hence, filter on invalid
posts.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.posterous.com/ for Smalltalk discussion


------------------------------

Date: Wed, 20 Jul 2011 14:57:25 -0600
From: Jason Earl <jearl@notengoamigos.org>
Subject: Re: a little parsing challenge ☺
Message-Id: <87k4bc3c22.fsf@notengoamigos.org>

On Wed, Jul 20 2011, Randal L. Schwartz wrote:

>>>>>> "Uri" == Uri Guttman <uri@StemSystems.com> writes:
>
> Uri> a better parsing challenge. how can you parse usenet to keep this troll
> Uri> from posting on the wrong groups on usenet? first one to do so, wins the
> Uri> praise of his peers. 2nd one to do it makes sure the filter stays in
> Uri> place. all the rest will be rewarded by not seeing the troll anymore.
>
> Uri> anyone who actually engages in a thread with the troll should parse
> Uri> themselves out of existance.
>
> Since the newsgroups: line is not supposed to have spaces in it, that
> makes both his post and your post invalid.  Hence, filter on invalid
> posts.

I suspect that the spaces you are seeing are being added by Gnus.  I see
them too (and I see them in your post as well), but they disappear when
I use "C-u g" and view the source of the posts.

Jason


------------------------------

Date: Thu, 21 Jul 2011 02:18:34 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: a little parsing challenge ☺
Message-Id: <87zkk8cg1x.fsf@quad.sysarch.com>

>>>>> "SD" == Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:

  SD> On Thu, 21 Jul 2011 02:31 am Uri Guttman wrote:
  >> 
  >> a better parsing challenge. how can you parse usenet to keep this troll
  >> from posting on the wrong groups on usenet? first one to do so, wins the
  >> praise of his peers. 2nd one to do it makes sure the filter stays in
  >> place. all the rest will be rewarded by not seeing the troll anymore.
  >> 
  >> anyone who actually engages in a thread with the troll should parse
  >> themselves out of existance.

  SD> Somebody should make one of those "Douchebag foo" image memes. As in, 


  SD> Douchebag Poster

  SD>  - Complains about people engaging with cross-posting trolls

  SD>  - Cross-posts to five newsgroups while engaging with troll

how would you then ejimicate all the groups to ignore the troll? the
troll has been doing this for years and many ignore/killfile him. yet he
always lands a few fish who xpost replies. he thrives on this. 

uri

-- 
Uri Guttman  --  uri AT perlhunter DOT com  ---  http://www.perlhunter.com --
------------  Perl Developer Recruiting and Placement Services  -------------
-----  Perl Code Review, Architecture, Development, Training, Support -------


------------------------------

Date: Thu, 21 Jul 2011 13:01:40 -0400
From: "Uri Guttman" <uri@StemSystems.com>
Subject: Re: a little parsing challenge ☺
Message-Id: <87pql3bma3.fsf@quad.sysarch.com>

>>>>> "r" == rabbits77  <rabbits77@my-deja.com> writes:

  >> how would you then ejimicate all the groups to ignore the troll? the
  >> troll has been doing this for years and many ignore/killfile him. yet he
  >> always lands a few fish who xpost replies. he thrives on this.

  r> Just what makes him a troll? I mean besides your opinion?
  r> It seems he has engaged a large group of people in a profitable
  r> discussion. It isn't like he is inciting a flame war. In fact, you seem
  r> to be the only person offended by this activity.

besides all the others agreeing he is a troll, notice how he NEVER
replies to any posts about his trolling. also no one with half a brain
would post to 5 different language groups with crap like he does. you
need to read some usenet history and see.

uri

-- 
Uri Guttman  --  uri AT perlhunter DOT com  ---  http://www.perlhunter.com --
------------  Perl Developer Recruiting and Placement Services  -------------
-----  Perl Code Review, Architecture, Development, Training, Support -------


------------------------------

Date: 6 Apr 2001 21:33:47 GMT (Last modified)
From: Perl-Users-Request@ruby.oce.orst.edu (Perl-Users-Digest Admin) 
Subject: Digest Administrivia (Last modified: 6 Apr 01)
Message-Id: <null>


Administrivia:

To submit articles to comp.lang.perl.announce, send your article to
clpa@perl.com.

Back issues are available via anonymous ftp from
ftp://cil-www.oce.orst.edu/pub/perl/old-digests. 

#For other requests pertaining to the digest, send mail to
#perl-users-request@ruby.oce.orst.edu. Do not waste your time or mine
#sending perl questions to the -request address, I don't have time to
#answer them even if I did know the answer.


------------------------------
End of Perl-Users Digest V11 Issue 3448
***************************************

