[26020] in Source-Commits


Re: /svn/athena r25349 - trunk/debathena/config/auto-update/debian

daemon@ATHENA.MIT.EDU (Geoffrey Thomas)
Wed Aug 3 00:50:56 2011

Date: Wed, 3 Aug 2011 00:50:49 -0400 (EDT)
From: Geoffrey Thomas <geofft@MIT.EDU>
To: Jonathan D Reed <jdreed@mit.edu>
cc: source-commits@mit.edu
In-Reply-To: <201108030336.p733aFN7006371@drugstore.mit.edu>
Message-ID: <alpine.DEB.2.00.1108030017430.11210@tyger.mit.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

I'm a little skeptical that this is safe on arbitrary hardware. Do we 
support athena-auto-upgrade on hardware we don't own? We've been telling 
people they can run cluster on private hardware...

On my laptop (Toshiba Satellite L635, integrated Intel graphics, Debian 
Squeeze running Wheezy's 2.6.39 kernel, using inteldrmfb), running 
`vbetool post` snow-crashes the display. The machine is still running, but 
unloading the framebuffer (while in X) doesn't do anything. Rebooting via 
kexec, with the current kernel loaded, works fine; the display goes black 
with some pixels in the top-left corner for a few seconds, and the 
framebuffer loads normally. However, GDM/X fails to load, and I get the 
following in dmesg:

[   41.251096] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[   41.251110] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[   47.703071] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[   47.703105] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 4 at 0, next 5)
[   50.619043] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
[   50.619065] [drm:i915_wait_request] *ERROR* i915_wait_request returns -11 (awaiting 9 at 0, next 10)
[   50.619224] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[   50.619229] [drm:i915_reset] *ERROR* Failed to reset chip.

and the following in Xorg's logs (/var/log/gdm/:0.log) repeated hundreds 
of times:

(EE) intel(0): Failed to submit batch buffer, expect rendering corruption or even a frozen display: Input/output error.

and after about a minute, I get a cursor on a black screen. Logging in on 
Ctrl-Alt-F1 and restarting gdm seems to mostly get things working again.
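Roughly, the reproduction above amounts to the steps below. This is a sketch, not the exact commands that were run: the Debian-style /boot file names, the `run`/`reproduce` helper names and the DRY_RUN switch are assumptions for illustration. Like the quoted commit, it relies on the kexec-tools init script to perform the actual kexec at shutdown once a kernel has been staged with `kexec -l`.

```shell
#!/bin/sh
# Sketch of the reproduction (run as root, on a machine you don't mind
# wedging). Assumes Debian-style /boot names and that the kexec-tools
# init script performs the kexec during shutdown after "kexec -l".

# Hypothetical helper: print commands instead of running them when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

reproduce() {
    kver=$(uname -r)
    # Re-run the video BIOS POST: the "dead chicken" from r25349.
    run vbetool post
    # Try to unbind the framebuffer console, exactly as the commit does.
    run sh -c 'echo 0 > /sys/class/vtconsole/vtcon1/bind'
    # Stage the currently running kernel, reusing the current command line.
    run kexec -l "/boot/vmlinuz-$kver" \
        --initrd="/boot/initrd.img-$kver" \
        --command-line="$(cat /proc/cmdline)"
    # Reboot; the kexec-tools init script is expected to jump into it.
    run reboot
}
```

With DRY_RUN=1 set, calling `reproduce` only prints the steps, which is a safe way to check them before trying this on real hardware.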


I'd be happier if we conditionalized this to hardware we've tested this 
transition on. Even a `*OptiPlex*)` case pattern is probably fine. (This is 
ultimately probably just cosmetic, since we do a machine reboot at the end 
of the install, but I worry that, since this is a real-mode call, it might 
do things we don't anticipate.)
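A minimal sketch of that conditional, assuming `product_name` holds the DMI product string as in athena-auto-upgrade. The helper names `chicken_tested_on` and `maybe_wave_chicken` are hypothetical, and the model list is illustrative rather than a list of hardware anyone has actually tested.

```shell
#!/bin/sh
# Sketch: restrict the "dead chicken" to models where the transition has
# been tested. Helper names and the model list are hypothetical.

# Succeed if the given DMI product name is on the tested list.
chicken_tested_on() {
    # DMI strings may carry trailing padding ("Vostro 320      "),
    # so strip trailing whitespace before matching.
    model=$(printf '%s' "$1" | sed 's/[[:space:]]*$//')
    case "$model" in
        "Vostro 320"|*OptiPlex*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

maybe_wave_chicken() {
    if chicken_tested_on "$product_name"; then
        # Run the video card's POST to reset it
        vbetool post
        # Disable the framebuffer, falling back to VGA
        echo 0 > /sys/class/vtconsole/vtcon1/bind
    fi
}
```

On anything not matched, such as the Toshiba above, the script would skip `vbetool post` and the unbind entirely.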


It does occur to me that unloading the framebuffer is a bit irrelevant, 
since we expect this to be run while X owns the display. We could `chvt 1` 
beforehand if watching the kexec (from shutdown to the new kernel loading 
the framebuffer) is interesting.
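If we do want to watch the handoff, something like the following would switch to VT 1 first. It is a sketch under assumptions: it looks up the framebuffer console by the "frame buffer" string in /sys/class/vtconsole/vtcon*/name rather than hard-coding vtcon1, and the helper names and the optional sysfs-root argument (there only so the lookup can be tried against a fake tree) are hypothetical.

```shell
#!/bin/sh
# Sketch: switch to VT 1 before the dead chicken so the kexec handoff is
# visible, and find the framebuffer console instead of assuming vtcon1.

# Print the vtconsole directory whose name mentions a frame buffer device.
# Optional $1 overrides the sysfs root (hypothetical, for testing).
find_fb_vtcon() {
    root=${1:-/sys/class/vtconsole}
    for d in "$root"/vtcon*; do
        [ -r "$d/name" ] || continue
        if grep -q 'frame buffer' "$d/name"; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1
}

prepare_console() {
    # Move off X's VT so shutdown and the new kernel's framebuffer show up.
    chvt 1
    # Run the video card's POST to reset it
    vbetool post
    # Unbind the framebuffer console, if any, falling back to VGA.
    if fb=$(find_fb_vtcon); then
        echo 0 > "$fb/bind"
    fi
}
```

Looking the console up by name means a machine whose framebuffer console isn't vtcon1 (or that has none) doesn't get the wrong console unbound.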

-- 
Geoffrey Thomas
geofft@mit.edu

On Tue, 2 Aug 2011, Jonathan D Reed wrote:

> Author: jdreed
> Date: 2011-08-02 23:36:15 -0400 (Tue, 02 Aug 2011)
> New Revision: 25349
>
> Modified:
>   trunk/debathena/config/auto-update/debian/athena-auto-upgrade
>   trunk/debathena/config/auto-update/debian/changelog
> Log:
> In auto-update:
>  * Always wave dead chickens
>
>
> Modified: trunk/debathena/config/auto-update/debian/athena-auto-upgrade
> ===================================================================
> --- trunk/debathena/config/auto-update/debian/athena-auto-upgrade	2011-08-02 22:20:58 UTC (rev 25348)
> +++ trunk/debathena/config/auto-update/debian/athena-auto-upgrade	2011-08-03 03:36:15 UTC (rev 25349)
> @@ -178,15 +178,6 @@
> # Yes, really.  The actual value of product_name on the Vostro is:
> # "Vostro 320                   "
> case "$product_name" in
> -  "Vostro 320")
> -    # This is not so much Vostro 320-specific as it is i915 specific
> -    # Known to be necessary on PCI ID 8086:2e32, rev 03
> -    debug "Waving dead chicken..."
> -    # Run the video card's POST to reset it
> -    vbetool post
> -    # Disable the framebuffer, falling back to VGA
> -    echo 0 > /sys/class/vtconsole/vtcon1/bind
> -    ;;
>   "OptiPlex 790")
>     debug "Adding 'reboot=pci' for Dell 790"
>     kargs="$kargs reboot=pci"
> @@ -195,6 +186,14 @@
> if [ $(echo $kargs | wc -c) -ge 512 ]; then
>   complain "kargs exceeds 512 bytes.  That's not good."
> fi
> +# This is not so much Vostro 320-specific as it is i915 specific
> +# Known to be necessary on PCI ID 8086:2e32, rev 03
> +# Actually, on all hardware, because Dell sucks.  Or possibly Ubuntu.
> +debug "Waving dead chicken..."
> +# Run the video card's POST to reset it
> +vbetool post
> +# Disable the framebuffer, falling back to VGA
> +echo 0 > /sys/class/vtconsole/vtcon1/bind
> # Don't kexec -e here, because modern Ubuntu is unable to kexec while
> # X is running.  Instead, kexec -l and let the init script take care of.
> # Until Oneiric, when this will probably stop working if kexec-tools hasn't
>
> Modified: trunk/debathena/config/auto-update/debian/changelog
> ===================================================================
> --- trunk/debathena/config/auto-update/debian/changelog	2011-08-02 22:20:58 UTC (rev 25348)
> +++ trunk/debathena/config/auto-update/debian/changelog	2011-08-03 03:36:15 UTC (rev 25349)
> @@ -1,3 +1,9 @@
> +debathena-auto-update (1.28) unstable; urgency=low
> +
> +  * Always wave dead chickens
> +
> + -- Jonathan Reed <jdreed@mit.edu>  Tue, 02 Aug 2011 23:36:10 -0400
> +
> debathena-auto-update (1.27) unstable; urgency=low
>
>   [ Geoffrey Thomas ]
>
>
