[8178] in Release_7.7_team

Future of Athena development

daemon@ATHENA.MIT.EDU (Jonathan D Reed)
Tue Dec 22 13:51:28 2015

From: Jonathan D Reed <jdreed@mit.edu>
To: release-team <release-team@mit.edu>
CC: Patricia Sheppard <pshepp@mit.edu>, Matthew Harrington <mjharrin@mit.edu>
Date: Tue, 22 Dec 2015 18:51:23 +0000
Message-ID: <3FFC7F4D2201CE49B49E5580FE0B5C0A01126DF71D@OC11EXPO30.exchange.mit.edu>
Content-Language: en-US
Content-Type: multipart/alternative;
	boundary="_000_3FFC7F4D2201CE49B49E5580FE0B5C0A01126DF71DOC11EXPO30exc_"
MIME-Version: 1.0

--_000_3FFC7F4D2201CE49B49E5580FE0B5C0A01126DF71DOC11EXPO30exc_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

(CC'ing Pat & Matt since they were involved in my offboarding
discussion; Sean & Garry should already be on this list)

Hi folks,

There are two major issues I'd like to get some input on:

1) Changes to production, going forward

I took an extended lunch break today and looked at the machines in
Hayden that were hanging on login.  There does indeed seem to be
something going on, possibly account-dependent, and/or possibly an
obscure race condition or an upstream bug involving gsettings.
/etc/X11/Xsession.d/90qt-a11y contains a single invocation of
"gsettings get", which should not be even a little bit controversial,
yet it is somehow causing gsettings to become a zombie process.  A
stack trace is unenlightening, and people who have a better
understanding of the kernel and userspace than I do are out of ideas.
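
For anyone trying to reproduce the hang outside of a login session, a minimal sketch along these lines might help. It only bounds how long a single command is allowed to run and reports whether it finished; it does not explain the root cause. The schema and key in A11Y_QUERY are an assumption about what 90qt-a11y queries and should be checked against the script on an affected machine; run_with_timeout is a hypothetical helper name.

```python
import subprocess
import sys

# Assumption: the schema/key that 90qt-a11y queries. Verify against the
# actual script on an affected machine before relying on this.
A11Y_QUERY = ["gsettings", "get",
              "org.gnome.desktop.interface", "toolkit-accessibility"]


def run_with_timeout(cmd, timeout_s=5.0):
    """Run cmd and return (status, stdout).

    status is one of "ok", "failed", "timeout", or "missing".
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before raising.
        return ("timeout", "")
    except FileNotFoundError:
        return ("missing", "")
    status = "ok" if proc.returncode == 0 else "failed"
    return (status, proc.stdout.strip())


if __name__ == "__main__":
    # Safe demonstration: simulate a hanging command with a sleeping child.
    hang = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(run_with_timeout(hang, timeout_s=1.0))
```

On a cluster machine, calling run_with_timeout(A11Y_QUERY) as an affected user and as an unaffected user would show whether the hang follows the account.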

The quick-and-dirty fix is to simply divert that file out of the way,
but that's clearly a band-aid, because the root cause has not been
identified.  Neither gsettings nor dconf has changed since 2014, but
the Unity login process is a maze of twisty passages, and it would take
me at least 40 FTE hours to eliminate all possible culprits (and I
don't have dedicated hardware anymore, even if I had the free time).
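
As a rough illustration of what "divert that file out of the way" could look like with plain dpkg-divert, here is a sketch that only builds the command lines and does not run them. The ".disabled" suffix and the helper names are hypothetical, and this is not necessarily how the change would be packaged for the clusters. It also assumes that Xsession skips file names containing a dot, which should be confirmed before relying on it.

```python
import shlex

XSESSION_SCRIPT = "/etc/X11/Xsession.d/90qt-a11y"


def divert_command(path, suffix=".disabled"):
    """Build a dpkg-divert call that renames path to path + suffix."""
    return ["dpkg-divert", "--local", "--rename",
            "--divert", path + suffix, "--add", path]


def undivert_command(path):
    """Build the dpkg-divert call that removes the diversion again."""
    return ["dpkg-divert", "--rename", "--remove", path]


def as_shell(cmd):
    """Render a command list as a shell-quoted string for review."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    # Print the commands so they can be reviewed before anyone runs them.
    print(as_shell(divert_command(XSESSION_SCRIPT)))
    print(as_shell(undivert_command(XSESSION_SCRIPT)))
```

Printing rather than executing keeps the rollback command next to the change itself, which seems useful given the sign-off question below.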

That having been said, I'd like to come up with a process for someone
in IS&T to sign off on changes to the clusters in production, lest
something go wrong.  Obviously we can roll back, and we have the update
hook, but the previous workflow policy operated under the assumption
that there was full-time staff devoted to development and release
engineering, and that's no longer true.

2) Debathena 16.04

I have heard rumors that planning for a 16.04 release is in the works.
All my off-boarding and knowledge transfer operated under the
assumption that there were no future cluster releases.  The last
communication I have on the subject is from Pat, on a thread from
April 29, 2015, which is as follows:

"[...] 14.04 is to be the last release from IS&T, with the caveat that
SIPB may do a 15.04 release which generate some support tickets for us
in the future in terms of private workstations."

If that is no longer true, I'd strongly encourage whoever is doing the
development to make some drastic changes, and throw away as much legacy
code as possible.  Back when 16.04 and/or VDI was still on the table,
we had talked about giving up completely on the idea of AFS home
directories: instead, users would get a temporary home directory on
local disk, and would explicitly navigate to their AFS homedir (via
/afs or /mit) if they wished to access files in it.  Obviously there
are some challenges there, such as preventing users from losing data by
accidentally saving it to local disk, and retaining some shared state,
such as browser bookmarks and certs.  This would also mean that users'
customizations wouldn't follow them around, but I think we established
long ago that the vast majority of cluster users do not care, and in
fact those who are most vocal about losing legacy features are
typically people with no MIT affiliation who use the clusters as
general-purpose public computing, and would be better served by their
local public library.
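
To make the local-homedir idea a bit more concrete, here is a minimal sketch of the "temporary home directory plus explicit path to AFS" shape. Everything in it is an assumption for illustration: the helper names, the "afs-home" link name, and the /afs/athena.mit.edu/user/<first>/<second>/<username> layout. It does nothing about the harder problems mentioned above (data saved to local disk, bookmarks, certs).

```python
import os
import tempfile


def afs_home_path(username, cell="athena.mit.edu"):
    """Return the assumed AFS homedir path for username."""
    if len(username) < 2:
        raise ValueError("username must have at least two characters")
    return f"/afs/{cell}/user/{username[0]}/{username[1]}/{username}"


def make_temp_home(username, base_dir=None, link_name="afs-home"):
    """Create a throwaway local home containing a symlink to the AFS homedir.

    The symlink may dangle if AFS is not mounted; creating it does not
    touch the target.
    """
    home = tempfile.mkdtemp(prefix=f"{username}-", dir=base_dir)
    os.symlink(afs_home_path(username), os.path.join(home, link_name))
    return home


if __name__ == "__main__":
    home = make_temp_home("jdoe")
    print(home, "->", os.readlink(os.path.join(home, "afs-home")))
```

A visible link like this is one way to make the navigation to AFS explicit; whether it should live in the home directory or on the desktop is a design question the sketch does not try to answer.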

I'm happy to be involved in some brainstorming or planning meetings, if
indeed there are plans for a 16.04 release, but at this point my
schedule is such that I can't really commit to any substantive
development effort.

-Jon

--_000_3FFC7F4D2201CE49B49E5580FE0B5C0A01126DF71DOC11EXPO30exc_--