[36979] in Kerberos


Re: Erratic behavior of full resync process

daemon@ATHENA.MIT.EDU (Greg Hudson)
Wed May 13 13:56:56 2015

Message-ID: <55539050.3080907@mit.edu>
Date: Wed, 13 May 2015 13:56:32 -0400
From: Greg Hudson <ghudson@mit.edu>
MIME-Version: 1.0
To: "Leonard J. Peirce" <leonard.peirce+kerberos@wmich.edu>, kerberos@mit.edu
In-Reply-To: <5552662D.7090500@wmich.edu>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: kerberos-bounces@mit.edu

On 05/12/2015 04:44 PM, Leonard J. Peirce wrote:
>     Authentication attempt failed: 172.30.110.46, GSS-API error strings are:
>         Unspecified GSS failure.  Minor code may provide more information
>         Clock skew too great

I don't know of a reason why this would happen with synchronized clocks.
You could try instrumenting the code in lib/krb5/krb/rd_req_dec.c which
calls krb5_check_clockskew() to find out what the authenticator timestamp
and server local time are; I don't know of an easier way to investigate.
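
For orientation, the comparison behind a "Clock skew too great" error can
be sketched roughly as below. This is only an illustration of the idea,
not the krb5 source: the function names are made up, the 300-second
tolerance is the documented default for the "clockskew" setting, and the
exact handling of the boundary value is an assumption. Logging the two
timestamps and their difference is the kind of output the suggested
instrumentation would produce.

```python
# Illustrative sketch only; NOT the actual MIT krb5 implementation.
# Function names are hypothetical.

DEFAULT_CLOCKSKEW = 300  # seconds; documented default for "clockskew"


def within_clockskew(authenticator_time, server_time,
                     clockskew=DEFAULT_CLOCKSKEW):
    """Return True if the two timestamps (seconds since the epoch)
    differ by no more than the allowed skew.

    Treating the boundary as inclusive is an assumption of this sketch.
    """
    return abs(server_time - authenticator_time) <= clockskew


def describe_skew(authenticator_time, server_time,
                  clockskew=DEFAULT_CLOCKSKEW):
    """Format the values that instrumentation would want to log."""
    delta = authenticator_time - server_time
    if within_clockskew(authenticator_time, server_time, clockskew):
        verdict = "ok"
    else:
        verdict = "Clock skew too great"
    return ("authenticator=%d server=%d delta=%+ds -> %s"
            % (authenticator_time, server_time, delta, verdict))
```

If the logged delta is large while both hosts report the same wall-clock
time, the authenticator timestamp itself is suspect rather than the
clocks.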

> On the slave I see syslog entries showing repeated problems with kpropd
> connecting to the master:
> 
>     /usr/sbin/kpropd: GSS-API (or Kerberos) error while initializing /usr/sbin/kpropd interface, retrying

I assume these correspond to the failed authentication attempts logged
by kadmind.

> I start kpropd with -d -S and use strace on it and I see that repeatedly
> opens /dev/urandom and reads from it just before I see the above error.

That doesn't seem unusual.

>     /usr/sbin/kpropd: Connection reset by peer while reading database block starting at offset 92340224
>     Full resync was unsuccessful

> Unfortunately, the resync was not successful.  Often (but not always), when
> kprop -f starts on the master, the slave_datatrans file will *partially*
> copy to the slave, often 60-90% of the data, before the connection hangs
> and then times out.  I have run strace on both the kprop and kpropd processes
> while they are connected.  The kprop on the master hangs during a write()
> for several minutes and then eventually times out:

>     Process 3183 attached - interrupt to quit
>     writev(4, [{"\240\37\26+[\16\247\tC\21\6/\243\217\340\0231f\362\245\3\214$\246\227\231N\265\351\366\1\233"..., 22106}], 1) = -1 ETIMEDOUT (Connection timed out)

You don't say what's happening on the slave at this point.  Is it also
hanging in a read() at the same time?  Can you correlate these events
with packet captures on both ends to see if a network element
interjected an RST?

> In my debugging attempts, I tried starting kpropd with
> 
>     kpropd -S -d -P NNN
> 
> and then attempt to run
> 
>     kprop -f slave_datatrans -P NNN r.test.admin.private
> 
> on the master but kpropd on the slave doesn't appear to be listening
> on port NNN.  Am I misunderstanding something?

In 1.10, with incremental propagation configured, kpropd doesn't listen
for kprop connections except when it has just requested a full dump from
kadmind.  In 1.13 it should always be listening.
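
One way to confirm from the master whether anything is accepting
connections on the chosen port is a plain TCP connect test. The sketch
below is a generic probe, not part of kprop or kpropd; the function name
is hypothetical, and the host and port are whatever was passed to
kpropd -P.

```python
import socket


def is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within
    the timeout, False if it is refused, unreachable, or times out."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

On 1.10 with incremental propagation, a False result outside of a
full-dump request would be consistent with the behavior described above.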
________________________________________________
Kerberos mailing list           Kerberos@mit.edu
https://mailman.mit.edu/mailman/listinfo/kerberos
