Opened 15 years ago

Closed 14 years ago

#7135 closed defect (invalid)

multiple [mythfrontend] <defunct>

Reported by: simons.philippe@…
Owned by: Isaac Richards
Priority: trivial
Milestone: unknown
Component: MythTV - General
Version: head
Severity: low
Keywords: mythwelcome
Cc:
Ticket locked: no

Description

My box starts mythwelcome automatically (through autologin of the mythtv user and a .xinitrc). I have 3 instances of mythfrontend, and 2 are <defunct>.

This could be because mythwelcome tries to launch mythfrontend before mythbackend is completely up and running.

Attachments (8)

mythfrontend.log (3.3 KB) - added by simons.philippe@… 15 years ago.
mythfrontend.log
process.txt (378 bytes) - added by Josh Winters <fuxxociety@…> 14 years ago.
ps -efw
mlog.gz (22.1 KB) - added by Bill Meek <llibkeem@…> 14 years ago.
mythfrontend log
mediamonitor.diff (1.5 KB) - added by Bill Meek <llibkeem@…> 14 years ago.
hard codes udevadm rather than udevinfo (deprecated?)
mythtv-7135-defunct_processes.patch (585 bytes) - added by sphery 14 years ago.
mediamonitor-unix.cpp-svn-diff (769 bytes) - added by Bill Meek <llibkeem@…> 14 years ago.
Patch as modified per mdean's request.
udev.c (1.1 KB) - added by Bill Meek <llibkeem@…> 14 years ago.
Standalone udev library test.
mdean.patch.mediamonitor-unix.cpp (1.8 KB) - added by Bill Meek <llibkeem@…> 14 years ago.
From mdean, http://www.gossamer-threads.com/lists/mythtv/users/408825#408825


Change History (35)

comment:1 Changed 15 years ago by cpinkham

Status: changed from new to infoneeded_new

Please provide log files from mythwelcome and mythfrontend, otherwise it is impossible for us to diagnose.

Changed 15 years ago by simons.philippe@…

Attachment: mythfrontend.log added

mythfrontend.log

comment:2 Changed 15 years ago by simons.philippe@…

Here is the mythwelcome log, but there is not much interesting in it...

comment:3 Changed 15 years ago by danielk

Milestone: changed from 0.22 to unknown

Nothing notable in the log. We're probably just not waiting on the child process somewhere in mythwelcome. As a rule this doesn't consume any resources aside from the entry in the process table, so it is not urgent to fix before 0.22.

Simon, you are seeing this with trunk, not 0.21-fixes? In trunk, we use myth_system(), which should be waiting for the mythfrontend pid to exit.
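A <defunct> (zombie) entry is what remains of a child that has exited while its parent has not yet collected its exit status with wait() or waitpid(); it keeps only its process-table entry. The sketch below is a generic Linux C illustration, not MythTV or myth_system() code: the helper names are hypothetical, and reading the state letter from /proc/<pid>/stat assumes Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Return the state letter from /proc/<pid>/stat ('R', 'S', 'Z', ...),
 * or '?' once the entry is gone. Linux-only. */
char proc_state(pid_t pid)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return '?';

    char buf[512];
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Layout is "pid (comm) state ..."; comm may contain spaces, so the
     * state letter is taken from just after the last ')'. */
    const char *rp = strrchr(buf, ')');
    return (rp && rp[1] == ' ' && rp[2] != '\0') ? rp[2] : '?';
}

/* Fork a child that exits at once, and deliberately do not wait for it. */
pid_t spawn_unreaped_child(void)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);          /* child */
    return pid;            /* parent (or -1 if fork failed) */
}

/* Poll for up to ~1 s until the child is listed as a zombie ('Z'). */
int wait_until_zombie(pid_t pid)
{
    const struct timespec tick = {0, 10L * 1000 * 1000};  /* 10 ms */
    for (int i = 0; i < 100; i++) {
        if (proc_state(pid) == 'Z')
            return 1;
        nanosleep(&tick, NULL);
    }
    return 0;
}

/* Collect the exit status; this is what removes the <defunct> entry. */
int reap_child(pid_t pid)
{
    int status;
    return waitpid(pid, &status, 0) == pid;
}
```

Calling spawn_unreaped_child() repeatedly without reap_child() reproduces the symptom in miniature: one [name] <defunct> line per child under the parent in ps, until the parent waits or exits.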

comment:4 Changed 15 years ago by simons.philippe@…

Yup, only with trunk; I didn't see this with 0.21-fixes.

comment:5 Changed 15 years ago by paulh

Status: changed from infoneeded_new to new

Are you sure this is a problem with MythWelcome? I'm seeing a defunct mythfrontend process from just starting it and letting it sit on the menu.

comment:6 Changed 15 years ago by simons.philippe@…

Honestly, no, it was an assumption (it seems I was wrong).

comment:7 Changed 14 years ago by Josh Winters <fuxxociety@…>

At the request of sphery from #mythtv-users:

I'm seeing the mythfrontend <defunct> processes and am NOT using mythwelcome.

My system is a mythbuntu based machine, without VDPAU, operating as a remote frontend.

I removed all traces of mythtv that I could find before installing mythtv 0.22-fixes.

This includes removing the old autostart entry from "Startup Programs" and replacing it with my own.

Attaching a snippet of 'ps -efw' for reference.

Changed 14 years ago by Josh Winters <fuxxociety@…>

Attachment: process.txt added

ps -efw

comment:8 Changed 14 years ago by sphery

Component: changed from MythTV - Mythwelcome & Mythshutdown to MythTV - General
Summary: changed from mythwelcome creating mythfrontend <defunct> to multiple [mythfrontend] <defunct>

Seems unrelated to mythwelcome.

comment:9 Changed 14 years ago by Dibblah

Status: changed from new to infoneeded_new

Can you please provide compressed logs with -v all, and a matching ps -efw from when you see the issue?

comment:10 Changed 14 years ago by Josh Winters <fuxxociety@…>

This may or may not be important, but I notice that once I get the <defunct> processes, they are reaped by the kernel when mythfrontend is killed (as they should be).

However, when I restart mythfrontend, the defunct processes come back with the new mythfrontend instance. This happens as soon as mythfrontend starts; no interaction with mythfrontend is needed.

comment:11 Changed 14 years ago by derliebegott@…

I am also doing the same:

  1. autologin with mingetty on tty7
  2. start mythwelcome from .xinitrc

I always see two defunct mythfrontend processes but there are no visible problems. Logs are absolutely ok.

  root@mythbox:/tmp# ps -efw | grep mythfront
  mythtv    4004  3944  0 08:43 tty7     00:00:04 /usr/bin/mythfrontend -d -v general
  mythtv    4023  4004  0 08:43 tty7     00:00:00 [mythfrontend] <defunct>
  mythtv    4026  4004  0 08:43 tty7     00:00:00 [mythfrontend] <defunct>

Changed 14 years ago by Bill Meek <llibkeem@…>

Attachment: mlog.gz added

mythfrontend log

comment:12 Changed 14 years ago by Bill Meek <llibkeem@…>

On my combined frontend/backend running mythbuntu 9.04, when mythfrontend is started, I get:

  PID  PPID USER     STAT COMMAND
  13399     1 bill     Rl   mythfrontend --verbose all --logfile /var/log/mythtv/0.22-fe.log
  13420 13399 bill     Z     \_ [mythfrontend] <defunct>
  13423 13399 bill     Z     \_ [mythfrontend] <defunct>
  13425 13399 bill     Z     \_ [mythfrontend] <defunct>
  13428 13399 bill     Z     \_ [mythfrontend] <defunct>
  13431 13399 bill     Z     \_ [mythfrontend] <defunct>
  13434 13399 bill     Z     \_ [mythfrontend] <defunct>
  13437 13399 bill     Z     \_ [mythfrontend] <defunct>
  13440 13399 bill     Z     \_ [mythfrontend] <defunct>
  13443 13399 bill     Z     \_ [mythfrontend] <defunct>
MythTV Version   : 22679M
MythTV Branch    : trunk
Network Protocol : 50
Library API      : 0.22.20091022-1
QT Version       : 4.5.0

My frontend is not started automatically.

This was for a 4-minute session: I started the frontend, waited for the log to go quiet, and exited.

Logfile attached (I think.)

Bill
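Counting the <defunct> children of one PID, as the ps listings in this ticket do by eye, can be scripted. A Linux-only C sketch, assuming the /proc/<pid>/stat layout "pid (comm) state ppid ..."; count_zombie_children() and count_my_zombie_children() are hypothetical names, not part of MythTV.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Count processes in state 'Z' whose parent PID is `ppid`, i.e. the
 * entries ps shows as "[name] <defunct>" under that parent.
 * Returns -1 if /proc cannot be opened. */
int count_zombie_children(pid_t ppid)
{
    DIR *proc = opendir("/proc");
    if (!proc)
        return -1;

    int zombies = 0;
    struct dirent *de;
    while ((de = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;                       /* not a PID directory */

        char path[300];
        snprintf(path, sizeof path, "/proc/%s/stat", de->d_name);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;                       /* process vanished meanwhile */

        char buf[512];
        size_t n = fread(buf, 1, sizeof buf - 1, f);
        fclose(f);
        buf[n] = '\0';

        /* comm may contain spaces: parse " <state> <ppid>" after the last ')'. */
        char *rp = strrchr(buf, ')');
        char state;
        int parent;
        if (rp && sscanf(rp + 1, " %c %d", &state, &parent) == 2
                && state == 'Z' && parent == (int)ppid)
            zombies++;
    }
    closedir(proc);
    return zombies;
}

/* Convenience wrapper: zombies left behind by the calling process itself. */
int count_my_zombie_children(void)
{
    return count_zombie_children(getpid());
}
```

For example, while PID 13399 from the listing above is running, count_zombie_children(13399) would count the Z entries shown under it.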

comment:13 Changed 14 years ago by Bill Meek <llibkeem@…>

I started wondering why I have 13 defuncts when the report before mine and the original report had only 2. So I plugged an SD card into my card reader, restarted the frontend, and my defunct count dropped from 13 to 12.

Most of the time, there are no cards plugged into the card reader.

On a roll here, I shut down and unplugged the card reader's USB cable (the reader has 5 slots: CF/SD/uSD...). After restarting the frontend again, I got 2 defuncts (for /dev/sd0?), which I'm guessing match these log entries:

MMUnix::AddDevice() Error: failed to stat /dev/bdi, 
MMUnix::AddDevice() Error: failed to stat /dev/power,

When the card reader is plugged in, there are 12 Error entries: 2 each for /dev/sd[defgh] and /dev/sr0.

bill@rc1:~/Download$ zcat mlog.gz|cut -c25- |grep /dev/
MMUnix::AddDevice() Error: failed to stat /dev/bdi, 
MMUnix::AddDevice() Error: failed to stat /dev/power,  
MMUnix::AddDevice() - Added /dev/sdd
MMUnix::AddDevice() Error: failed to stat /dev/bdi, 
MMUnix::AddDevice() Error: failed to stat /dev/power, 
MMUnix::AddDevice() - Added /dev/sde
MMUnix::AddDevice() Error: failed to stat /dev/bdi, 
MMUnix::AddDevice() Error: failed to stat /dev/power, 
MMUnix::AddDevice() - Added /dev/sdf
MMUnix::AddDevice() Error: failed to stat /dev/bdi, 
MMUnix::AddDevice() Error: failed to stat /dev/power, 
MMUnix::AddDevice() - Added /dev/sdg
MMUnix::AddDevice() Error: failed to stat /dev/bdi, 
MMUnix::AddDevice() Error: failed to stat /dev/power, 
MMUnix::AddDevice() - Added /dev/sdh
MMUnix::AddDevice() Error: failed to stat /dev/bdi, 
MMUnix::AddDevice() Error: failed to stat /dev/power, 
MMUnix::AddDevice() - Added /dev/sr0

There are truly no /dev/bdi or /dev/power files; however,

/sys/devices/pci0000:00/0000:00:14.1/host6/target6:0:0/6:0:0:0/block/sr0/bdi

exists.

My frontend logs go back as far as 2009-06-16, which is when I started running trunk. The first entry with this type of error appeared on 2009-07-21, when I was at revision 20844. I update my box about every 100 commits. The card reader was purchased on 2008-11-12 and was most likely installed the same day, although I won't swear to that.

Hope this helps.

Bill

Changed 14 years ago by Bill Meek <llibkeem@…>

Attachment: mediamonitor.diff added

hard codes udevadm rather than udevinfo (deprecated?)

comment:14 Changed 14 years ago by Bill Meek <llibkeem@…>

The attached changes work on a 9.04 mythbuntu distribution. If there are still distributions without udevadm, this 'fix' will give them the same problem we're seeing in this ticket.

Point me in the right direction and give me a shove and I'd be happy to make a real fix.

Details:

trunk/mythtv/libs/libmyth/mediamonitor-unix.cpp executes udevinfo, which doesn't exist in Mythbuntu 9.04 or Ubuntu 9.10 (the two distributions I have).

% type udevinfo
-bash: type: udevinfo: not found

% type udevadm
udevadm is /sbin/udevadm

If the device is valid, both return the full path, as in:

udevinfo -q name -rp /sys/block/sdd      (existing code)
udevadm info -q name -rp /sys/block/sdd  (proposed) 
    /dev/sdd

In the error case (... -q name -rp /sys/block/sdfoo), the existing udevinfo code checks for a response of:

device not found in database

but udevadm returns:

device path not found

Also, if udevinfo is used but linked to udevadm, the following will appear in mythfrontend.log:

MMUnix::GetDeviceFile(/sys/block/sdd) - udevinfo error...
the program '/usr/local/bin/mythfrontend' called 'udevinfo',
it should use 'udevadm info <options>', this will stop working
in a future release
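Because the two tools differ in their failure text, a caller that accepts the first output line beginning with /dev/ and recognizes both "not found" messages would work with either. A minimal C sketch under that assumption: the helper names are hypothetical, this is not the mediamonitor-unix.cpp code, and it assumes stdout and stderr were captured into one string.

```c
#include <string.h>

/* Failure messages quoted in this ticket. */
static const char *const kNotFound[] = {
    "device not found in database",  /* udevinfo */
    "device path not found",         /* udevadm info */
};

/* Return 1 if `reply` contains either known "not found" message. */
int udev_reply_not_found(const char *reply)
{
    for (size_t i = 0; i < sizeof kNotFound / sizeof kNotFound[0]; i++)
        if (strstr(reply, kNotFound[i]))
            return 1;
    return 0;
}

/* Copy the first line of `reply` that starts with "/dev/" into `out`
 * (without its newline). Other lines, such as the udevinfo deprecation
 * warning, are skipped. Returns 0 on success, -1 if no such line fits. */
int udev_reply_devnode(const char *reply, char *out, size_t outlen)
{
    const char *line = reply;
    while (*line) {
        size_t len = strcspn(line, "\n");
        if (strncmp(line, "/dev/", 5) == 0 && len < outlen) {
            memcpy(out, line, len);
            out[len] = '\0';
            return 0;
        }
        line += len;
        if (*line == '\n')
            line++;
    }
    return -1;
}
```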

comment:15 Changed 14 years ago by sphery

Refs #6137

comment:16 Changed 14 years ago by stuartm

Status: changed from infoneeded_new to new

We can't switch to udevadm, it's root-only on some distributions.

Changed 14 years ago by sphery

Attachment: mythtv-7135-defunct_processes.patch added

comment:17 Changed 14 years ago by sphery

Status: changed from new to infoneeded_new

The attached patch, mythtv-7135-defunct_processes.patch, might prevent the zombie processes. I can't reproduce the issue, so I'm posting the patch for others to test.

The only way I could get defunct processes with my contrived test application was to delete the QProcess before the process exited. It's possible that this can happen when deleteLater() is called right after kill(), so the patch just puts another waitForFinished() call after the kill() to allow the process to die before deleteLater() gets called (it will wait much less than 2s, but will give up after 2s if the kill() fails). This extra waitForFinished() should probably be done regardless of whether it solves the issue or not.

However, in theory, if the specified application doesn't exist, waitForStarted() should return false, meaning that in the described cases, we should be returning from the waitForStarted() block above. (If you enable -v important,general,media on mythfrontend, you should see "Error - udevinfo failed to start!" and/or "Error - udevinfo failed to end! Terminating" which will indicate whether the problem is in the waitForStarted() or waitForFinished() block.) If the problem is in the waitForStarted() block, we may need to do the same kill() and waitForFinished() in it before deleteLater().

If someone can test and adjust the patch as necessary, please report back whether it works (and upload any modified version of the patch). #6137 (udevadm vs udevinfo) will be handled separately.
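On Unix, QProcess::kill() sends SIGKILL; the extra waitForFinished() in the patch is meant to let the child exit before deleteLater() destroys the QProcess. The same kill-then-wait pattern in plain POSIX C, as a sketch: kill_and_reap() is a hypothetical helper, and polling waitpid() with WNOHANG every 10 ms is an assumed stand-in for Qt's internal waiting, not a copy of it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Send SIGKILL to child `pid`, then poll waitpid(WNOHANG) every 10 ms for
 * up to timeout_ms (compare waitForFinished(2000) in the patch).
 * Returns 1 once the child has been reaped, 0 on timeout, -1 on error
 * (e.g. `pid` is not a child of this process). */
int kill_and_reap(pid_t pid, int timeout_ms)
{
    if (kill(pid, SIGKILL) != 0)
        return -1;

    const struct timespec tick = {0, 10L * 1000 * 1000};  /* 10 ms */
    for (int waited = 0; waited <= timeout_ms; waited += 10) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return 1;            /* exit status collected: no zombie left */
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);  /* r == 0: not exited yet, try again */
    }
    return 0;                    /* still not reaped after timeout_ms */
}
```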

Changed 14 years ago by Bill Meek <llibkeem@…>

Attachment: mediamonitor-unix.cpp-svn-diff added

Patch as modified per mdean's request.

Changed 14 years ago by Bill Meek <llibkeem@…>

Attachment: udev.c added

Standalone udev library test.

comment:18 Changed 14 years ago by Bill Meek <llibkeem@…>

Thanks for the quick response. Unfortunately, the kill()/waitFor...() patch didn't eliminate the defunct processes.

Your theory is spot on, the "Error - udevinfo failed to start!" leg is the one taken.

I've attached your patch, to be sure I modified it correctly. Also attached is a standalone file with libudev tests that might solve #6137 and eliminate the need for these changes.

The program will take /sys/block/sda and find /dev/sda like udevinfo/adm do.

Full disclosure, I don't know spit about udev, but would be willing to try modifying the program if it looks reasonable. It did require getting libudev-dev, which I'm guessing is a drawback.
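For comparison, the same /sys/block/sda to /dev/sda mapping can be sketched without libudev and without starting any child process, by reading the DEVNAME= line that sysfs exposes in a device's uevent file. This is a swapped-in technique, not the udev.c attachment: the helper names are hypothetical, and the sketch assumes udev's default naming (/dev/<DEVNAME>), so rules that rename device nodes are not reflected.

```c
#include <stdio.h>
#include <string.h>

/* Find "DEVNAME=<name>" in the text of a sysfs uevent file and write
 * "/dev/<name>" to `out`. Returns 0 on success, -1 if absent or too long. */
int uevent_devnode(const char *uevent, char *out, size_t outlen)
{
    const char *line = uevent;
    while (*line) {
        size_t len = strcspn(line, "\n");
        if (len > 8 && strncmp(line, "DEVNAME=", 8) == 0) {
            int n = snprintf(out, outlen, "/dev/%.*s",
                             (int)(len - 8), line + 8);
            return (n > 0 && (size_t)n < outlen) ? 0 : -1;
        }
        line += len;
        if (*line == '\n')
            line++;
    }
    return -1;
}

/* Read <syspath>/uevent (e.g. /sys/block/sda/uevent) and map it to /dev/... */
int sysfs_to_devnode(const char *syspath, char *out, size_t outlen)
{
    char path[4096];
    snprintf(path, sizeof path, "%s/uevent", syspath);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    char buf[4096];
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return uevent_devnode(buf, out, outlen);
}
```

Because no helper program runs, there is nothing to reap, which sidesteps the QProcess path discussed in this ticket altogether.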

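Changed 14 years ago by Bill Meek <llibkeem@…>

Attachment: mdean.patch.mediamonitor-unix.cpp added

From mdean, http://www.gossamer-threads.com/lists/mythtv/users/408825#408825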

comment:19 Changed 14 years ago by Bill Meek <llibkeem@…>

The fix from the mailing list eliminates the defunct processes.

Nice work!

To be clear, the udev.c attachment I added wasn't intended as a fix or workaround, but a test of the udev library which could replace the existing calls to udevinfo.

What I don't know is how to find out which distributions have the library and/or if they require root access to run them.

comment:20 Changed 14 years ago by Bill Meek <llibkeem@…>

Bad test.

I had a udevinfo script that called udevadm in place when I tested.

Removed it and the defuncts are still happening.

Sorry.

comment:21 Changed 14 years ago by JohnnyJboss <johnnyjboss@…>

I have also tested this fix and it does not work.

  1648 ?        Rsl    0:11 /usr/local/bin/mythfrontend
  1746 ?        Z      0:00  \_ [mythfrontend] <defunct>
  1748 ?        Z      0:00  \_ [mythfrontend] <defunct>

comment:22 Changed 14 years ago by stuartm

Severity: changed from medium to low
Status: changed from infoneeded_new to new

comment:23 Changed 14 years ago by Bill Meek <llibkeem@…>

Update: I haven't given up, but the number of defuncts varies and I can't figure out why.

I usually had about 15 defunct processes. The following patch dropped that to a steady 5. Both "bdi" and "power" appear on my machine; "trace" was added because I saw a comment on gossamer-threads by jarpublic. This seems like a keeper.

Index: mediamonitor-unix.cpp
===================================================================
--- mediamonitor-unix.cpp	(revision 22872)
+++ mediamonitor-unix.cpp	(working copy)
@@ -553,7 +553,9 @@
 
             // skip some sysfs dirs that are _not_ sub-partitions
             if (*pit == "device" || *pit == "holders" || *pit == "queue"
-                                 || *pit == "slaves"  || *pit == "subsystem")
+                                 || *pit == "slaves"  || *pit == "subsystem"
+                                 || *pit == "bdi"     || *pit == "power"
+                                 || *pit == "trace")
                 continue;
 
             found_partitions |= FindPartitions(

I suspect that mediamonitor-unix.cpp isn't and wasn't causing the actual defuncts. When I was at 5, there were 5 matching mount failures. That makes sense, because there are no memory cards plugged into any of the 5 slots. In these cases, myth_system() is called to do the mounts, so I tried the change below. The number of defuncts now floats between 1 and 6. I'm not recommending this as a fix, but the change does make a difference.

Index: mythmedia.cpp
===================================================================
--- mythmedia.cpp	(revision 22872)
+++ mythmedia.cpp	(working copy)
@@ -121,7 +121,7 @@
                 .arg(m_DevicePath);
     
         VERBOSE(VB_MEDIA, QString("Executing '%1'").arg(MountCommand));
-        if (0 == myth_system(MountCommand)) 
+        if (0 == myth_system(MountCommand, MYTH_SYSTEM_DONT_BLOCK_PARENT)) 
         {
             if (DoMount)
             {

I keep tripping over why udevinfo works and the error case doesn't. udevinfo returns the same string, although the default has no trailing newline. I tried appending a newline to the default case (ret.append(QChar('\n'))), to no avail.
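The first patch in comment:23 extends a deny-list of sysfs entries that are not partitions. That check, plus an allow-list alternative, is sketched in C below with hypothetical function names. The alternative rests on an assumption: that each partition directory (for example /sys/block/sda/sda1) carries a "partition" attribute file that bdi, power, trace and the other non-partition entries lack, so the list would not need to grow as new entries appear.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Deny-list from the comment:23 patch: entries under /sys/block/<dev>/
 * that are known not to be partitions. */
static const char *const kNotPartitions[] = {
    "device", "holders", "queue", "slaves", "subsystem",
    "bdi", "power", "trace",
};

/* Return 1 if `name` is on the deny-list and should be skipped. */
int skip_sysfs_entry(const char *name)
{
    for (size_t i = 0; i < sizeof kNotPartitions / sizeof kNotPartitions[0]; i++)
        if (strcmp(name, kNotPartitions[i]) == 0)
            return 1;
    return 0;
}

/* Allow-list alternative (assumption: partition dirs carry a "partition"
 * attribute file). Returns 1 if <blockdir>/<name>/partition exists. */
int looks_like_partition(const char *blockdir, const char *name)
{
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s/partition", blockdir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return 0;
    return access(path, F_OK) == 0;
}
```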

comment:24 Changed 14 years ago by stuartm

Priority: changed from minor to trivial

There is a known bug with mythsystem in 0.22/trunk: multiple concurrent processes started with mythsystem share the same pid file, so they aren't cleaned up properly when they complete. This is probably related.

comment:25 Changed 14 years ago by Bill Meek <llibkeem@…>

I added VERBOSE lines to print out the child PIDs in myth_system and found that they didn't match those of the defunct processes. So I did the same (udevinfo->pid()) in mediamonitor-unix.cpp right after udevinfo->start() and got an exact match! The usleep() in the following patch has eliminated the defuncts for me. The additional tests below it cut down the number of (presumably unnecessary) attempts.

Index: mediamonitor-unix.cpp
===================================================================
--- mediamonitor-unix.cpp	(revision 22889)
+++ mediamonitor-unix.cpp	(working copy)
@@ -229,6 +229,8 @@
     args << sysfs;
     udevinfo->start("udevinfo", args);
 
+    usleep(100000);
+
     if (!udevinfo->waitForStarted(2000 /*ms*/))
     {
         VERBOSE(VB_MEDIA, msg + ", Error - udevinfo failed to start!");
@@ -553,7 +555,10 @@
 
             // skip some sysfs dirs that are _not_ sub-partitions
             if (*pit == "device" || *pit == "holders" || *pit == "queue"
-                                 || *pit == "slaves"  || *pit == "subsystem")
+                                 || *pit == "slaves"  || *pit == "subsystem"
+                                 || *pit == "bdi"     || *pit == "power"
+                                 || *pit == "trace")
+
                 continue;
 
             found_partitions |= FindPartitions(

Of course, those affected could instead install a shell script like this one as /sbin/udevinfo:

#!/bin/sh
# If you already have udevinfo, you don't want this script!!!
UDEVADM=/sbin/udevadm

if [ ! -e $UDEVADM ]; then
    echo "Strange, you don't have $UDEVADM either, bye!"
    exit 1
fi

RESULT=$("$UDEVADM" info "$@" 2>&1)

RETURN_CODE=$?

if [ $RETURN_CODE = 0 ]; then
    echo "$RESULT"
else
    echo "device not found in database"
fi

exit $RETURN_CODE

comment:26 Changed 14 years ago by Bill Meek <llibkeem@…>

Update:

This link http://bugreports.qt.nokia.com/browse/QTBUG-5990 seems to address the defunct/zombie problem.

Also, http://qt.nokia.com/doc/4.5/qprocess.html#starts speaks to starting a process that is still running. The log shows all my udevinfo->start()s (up to 16) were done in a 122msec window. I saw a log from another user with 5 removable devices do it in 112msec.

Also, the echo "device not found in database" line above should have 1>&2 appended, so that the message goes to stderr and emulates udevinfo.
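If overlapping children are what trigger the problem (the log above shows up to 16 udevinfo->start() calls within 122 ms), one conservative pattern is to run each helper to completion, capture its output, and reap it before starting the next. A POSIX C sketch of that pattern with fork(), execvp() and a pipe; run_and_capture() is hypothetical and is not how MythTV or QProcess start processes.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (searched in PATH) with stdout and stderr captured into
 * `out` (NUL-terminated, truncated to outlen - 1 bytes), then wait for it.
 * Returns the exit status (0-255), or -1 if it could not be run or reaped. */
int run_and_capture(char *const argv[], char *out, size_t outlen)
{
    int fds[2];
    if (outlen == 0 || pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                       /* child: output goes into the pipe */
        dup2(fds[1], STDOUT_FILENO);
        dup2(fds[1], STDERR_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);                       /* only reached if exec failed */
    }

    close(fds[1]);                        /* parent keeps only the read end */
    size_t used = 0;
    char chunk[256];
    ssize_t n;
    while ((n = read(fds[0], chunk, sizeof chunk)) > 0) {
        /* Keep what fits, but keep draining so the child never blocks
         * on a full pipe. */
        size_t room = outlen - 1 - used;
        size_t take = (size_t)n < room ? (size_t)n : room;
        memcpy(out + used, chunk, take);
        used += take;
    }
    close(fds[0]);
    out[used] = '\0';

    int status;
    if (waitpid(pid, &status, 0) != pid)  /* always reap: no zombie left */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the helper in this ticket, a caller would invoke something like run_and_capture((char *const[]){"udevinfo", "-q", "name", "-rp", "/sys/block/sdd", NULL}, buf, sizeof buf) once per device, strictly in sequence.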

comment:27 Changed 14 years ago by sphery

Resolution: invalid
Status: changed from new to closed

The problem is a bug in Qt ( http://bugreports.qt.nokia.com/browse/QTBUG-5990 ). When #6137 is fixed, it will prevent our seeing the symptoms of this Qt bug, even on broken Qt versions. Until #6137 is fixed, users may use workarounds mentioned above or keep an eye on QTBUG-5990.

Thanks to Bill Meek for tracking down the Qt bug and to Bill Meek and Josh Winters and all the others for all the debugging help.
