[3687] in BarnOwl Developers
Re: [barnowl] Perl logging (#54)
daemon@ATHENA.MIT.EDU (Jason Gross)
Fri Jan 3 11:28:50 2014
Date: Fri, 03 Jan 2014 08:28:46 -0800
From: Jason Gross <notifications@github.com>
Reply-To: barnowl/barnowl <reply+p-8637730-f72a051d799c95c5be6f956fdf0ba42f65ba7769-4475081@reply.github.com>
To: barnowl/barnowl <barnowl@noreply.github.com>
In-Reply-To: <barnowl/barnowl/pull/54@github.com>
----==_mimepart_52c6e53ec97c4_68781277d00947f0
Content-Type: text/plain;
charset=UTF-8
Content-Transfer-Encoding: 7bit
> +}
> +
> +=head2 sanitize_filename BASE_PATH FILENAME
> +
> +Sanitizes C<FILENAME> and concatenates it with C<BASE_PATH>.
> +
> +In any filename, the characters C<"/">, C<"~">, and anything before
> +C<"!"> get replaced by underscores. If the resulting filename is
> +empty or equal to C<"."> or C<"..">, it is replaced with C<"weird">.
> +
> +=cut
> +
> +sub sanitize_filename {
> + my $base_path = BarnOwl::Internal::makepath(shift);
> + my $filename = shift;
> + $filename =~ s/[\/~\0- ]/_/g;
quoting @quentinmit:
> Is it safe to use a range like this in a potentially non-POSIX locale?
Looking at [perldoc for character ranges](http://perldoc.perl.org/perlrecharclass.html#Character-Ranges):
> Note that the two characters on either side of the hyphen are not necessarily both letters or both digits. Any character is possible, although not advisable. `['-?]` contains a range of characters, but most people will not know which characters that means. Furthermore, such ranges may lead to portability problems if the code has to run on a platform that uses a different character set, such as EBCDIC.
So maybe I should use `[\0-\x20]`? Or is that not safe either? On the other hand, according to [Wikipedia](http://en.wikipedia.org/wiki/EBCDIC#Codepage_layout), `[\0- ]` is closer to what we want to do than `[\0-\x20]`, because space is `\x40` in EBCDIC and there's nothing we want before that. Alternatively, maybe we should specify permitted characters to be anything above `\x7F` (anything above 127), together with `[[:graph:]]` or `[[:print:]]` (see [perldoc's POSIX Character Classes](http://perldoc.perl.org/perlrecharclass.html#POSIX-Character-Classes))? Do people who know more about Perl/locales have input?
> Also, the commit message says "less than !" (0x21), but the range appears to be "less than space" (0x20).
The range is "less than or equal to space", which in ASCII (space is 0x20, `!` is 0x21) is the same as "less than !".
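
For reference, here is a rough sketch of what the `[\0-\x20]` spelling could look like, plus a quick check that it matches the same bytes as `[\0- ]`. This is only an illustration under assumptions: `sanitize_name` is a made-up name (not the function in the patch), the `"weird"` fallback is taken from the POD above rather than from the rest of the real sub, and the equivalence check only says anything about ASCII/UTF-8 platforms, not EBCDIC.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the substitution under review: the range
# endpoints are written as hex escapes instead of a literal space.
sub sanitize_name {
    my ($filename) = @_;
    # Replace "/", "~", and every code point from 0x00 through 0x20.
    $filename =~ s{[/~\x00-\x20]}{_}g;
    # Placeholder for empty or dot-only names, as the POD describes.
    $filename = 'weird'
        if $filename eq '' || $filename eq '.' || $filename eq '..';
    return $filename;
}

# Collect every byte value on which the literal-space range and the
# hex-escape range disagree. On an ASCII platform this list is empty.
my @disagree = grep {
    my $c = chr($_);
    ($c =~ /[\0- ]/ ? 1 : 0) != ($c =~ /[\x00-\x20]/ ? 1 : 0)
} 0 .. 255;

print "ranges differ on ", scalar(@disagree), " byte values\n";
print sanitize_name("foo/bar baz~"), "\n";    # prints "foo_bar_baz_"
```

The hex form makes the intended code points explicit to the reader; whether that is the right trade-off for a non-ASCII platform is exactly the open question above.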
---
Reply to this email directly or view it on GitHub:
https://github.com/barnowl/barnowl/pull/54/files#r8637730
----==_mimepart_52c6e53ec97c4_68781277d00947f0
Content-Type: text/html;
charset=UTF-8
Content-Transfer-Encoding: 7bit
<p>In perl/lib/BarnOwl/Logging.pm:</p>
<pre style='color:#555'>> +}
> +
> +=head2 sanitize_filename BASE_PATH FILENAME
> +
> +Sanitizes C<FILENAME> and concatenates it with C<BASE_PATH>.
> +
> +In any filename, the characters C<"/">, C<"~">, and anything before
> +C<"!"> get replaced by underscores. If the resulting filename is
> +empty or equal to C<"."> or C<"..">, it is replaced with C<"weird">.
> +
> +=cut
> +
> +sub sanitize_filename {
> + my $base_path = BarnOwl::Internal::makepath(shift);
> + my $filename = shift;
> + $filename =~ s/[\/~\0- ]/_/g;
</pre>
<p>quoting <a href="https://github.com/quentinmit" class="user-mention">@quentinmit</a>:</p>
<blockquote>
<p>Is it safe to use a range like this in a potentially non-POSIX locale?</p>
</blockquote>
<p>Looking at <a href="http://perldoc.perl.org/perlrecharclass.html#Character-Ranges">perldoc for character ranges</a>:</p>
<blockquote>
<p>Note that the two characters on either side of the hyphen are not necessarily both letters or both digits. Any character is possible, although not advisable. <code>['-?]</code> contains a range of characters, but most people will not know which characters that means. Furthermore, such ranges may lead to portability problems if the code has to run on a platform that uses a different character set, such as EBCDIC.</p>
</blockquote>
<p>So maybe I should use <code>[\0-\x20]</code>? Or is that not safe either? On the other hand, according to <a href="http://en.wikipedia.org/wiki/EBCDIC#Codepage_layout">Wikipedia</a>, <code>[\0- ]</code> is closer to what we want to do than <code>[\0-\x20]</code>, because space is <code>\x40</code> in EBCDIC and there's nothing we want before that. Alternatively, maybe we should specify permitted characters to be anything above <code>\x7F</code> (anything above 127), together with <code>[[:graph:]]</code> or <code>[[:print:]]</code> (see <a href="http://perldoc.perl.org/perlrecharclass.html#POSIX-Character-Classes">perldoc's POSIX Character Classes</a>)? Do people who know more about Perl/locales have input?</p>
<blockquote>
<p>Also, the commit message says "less than !" (0x21), but the range appears to be "less than space" (0x20).</p>
</blockquote>
<p>The range is "less than or equal to space", which in ASCII (space is 0x20, <code>!</code> is 0x21) is the same as "less than !".</p>
<p style="font-size:small;-webkit-text-size-adjust:none;color:#666;">—<br>Reply to this email directly or <a href='https://github.com/barnowl/barnowl/pull/54/files#r8637730'>view it on GitHub</a>.</p>
----==_mimepart_52c6e53ec97c4_68781277d00947f0--