Opened 15 years ago
Closed 14 years ago
#7135 closed defect (invalid)
multiple [mythfrontend] <defunct>
Reported by: | Owned by: | Isaac Richards | |
---|---|---|---|
Priority: | trivial | Milestone: | unknown |
Component: | MythTV - General | Version: | head |
Severity: | low | Keywords: | mythwelcome |
Cc: | Ticket locked: | no |
Description
My box starts mythwelcome automatically (through autologin of the mythtv user and a .xinitrc). I have 3 instances of mythfrontend, and 2 of them are <defunct>.
This could be because mythwelcome tries to launch mythfrontend before mythbackend is completely up and running.
Attachments (8)
Change History (35)
comment:1 Changed 15 years ago by
Status: | new → infoneeded_new |
---|
comment:3 Changed 15 years ago by
Milestone: | 0.22 → unknown |
---|
Nothing notable in the log. We're probably just not waiting on the child process some place in mythwelcome. As a rule this doesn't consume any resources aside from the entry in the process table, so not a big deal to fix before 0.22.
Simon, are you seeing this with trunk, not 0.21-fixes? In trunk, we use myth_system(), which should be waiting on the mythfrontend pid to exit.
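For readers unfamiliar with the mechanism discussed here: a child that has exited stays in the process table as <defunct> until its parent collects its exit status. The following is a minimal POSIX sketch of that reaping step. `spawnAndReap` is a hypothetical helper written for illustration only; it is not MythTV's myth_system() and makes no claim about how that function is implemented.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not MythTV code): start a child that exits at once,
// then reap it so that no <defunct> entry is left behind.
// Returns the child's exit code, or -1 on failure.
int spawnAndReap(int exitCode)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;          // fork failed
    if (pid == 0)
        _exit(exitCode);    // child: exit immediately

    int status = 0;
    // Without this waitpid() the exited child would remain a zombie
    // until the parent itself exits.
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `spawnAndReap(7)` should return 7 and leave no defunct child in `ps` output, because the parent waited on the pid.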
comment:5 Changed 15 years ago by
Status: | infoneeded_new → new |
---|
Are you sure this is a problem with MythWelcome?? I'm seeing a defunct mythfrontend process from just starting it and letting it sit on the menu.
comment:7 Changed 14 years ago by
At the request of sphery from #mythtv-users:
I'm seeing the mythfrontend <defunct> processes and am NOT using mythwelcome.
My system is a mythbuntu based machine, without VDPAU, operating as a remote frontend.
I removed all traces of mythtv that I could find before installing mythtv 0.22-fixes.
This includes removing the old autostart entry from "Startup Programs" and replacing it with my own.
Attaching a snippet of 'ps -efw' for reference.
comment:8 Changed 14 years ago by
Component: | MythTV - Mythwelcome & Mythshutdown → MythTV - General |
---|---|
Summary: | mythwelcome creating mythfrontend <defunct> → multiple [mythfrontend] <defunct> |
Seems unrelated to mythwelcome.
comment:9 Changed 14 years ago by
Status: | new → infoneeded_new |
---|
Can you provide compressed logs with -v all, please - and a matching ps -efw for when you see the issue.
comment:10 Changed 14 years ago by
This may or may not be important, but I notice that once I get the <defunct> processes, they are reaped by the kernel when mythfrontend is killed (as they should be).
However, when I restart mythfrontend, the defunct processes come back with the new mythfrontend instance. This behavior is occurring as soon as mythfrontend is started, no sort of interaction with mythfrontend has been done otherwise.
comment:11 Changed 14 years ago by
I am also doing the same:
- autologin with mingetty on tty7
- start mythwelcome from .xinitrc
I always see two defunct mythfrontend processes but there are no visible problems. Logs are absolutely ok.
    root@mythbox:/tmp# ps -efw | grep mythfront
    mythtv 4004 3944 0 08:43 tty7 00:00:04 /usr/bin/mythfrontend -d -v general
    mythtv 4023 4004 0 08:43 tty7 00:00:00 [mythfrontend] <defunct>
    mythtv 4026 4004 0 08:43 tty7 00:00:00 [mythfrontend] <defunct>
comment:12 Changed 14 years ago by
On my combined frontend/backend running mythbuntu 9.04, when mythfrontend is started, I get:
      PID  PPID USER STAT COMMAND
    13399     1 bill Rl   mythfrontend --verbose all --logfile /var/log/mythtv/0.22-fe.log
    13420 13399 bill Z     \_ [mythfrontend] <defunct>
    13423 13399 bill Z     \_ [mythfrontend] <defunct>
    13425 13399 bill Z     \_ [mythfrontend] <defunct>
    13428 13399 bill Z     \_ [mythfrontend] <defunct>
    13431 13399 bill Z     \_ [mythfrontend] <defunct>
    13434 13399 bill Z     \_ [mythfrontend] <defunct>
    13437 13399 bill Z     \_ [mythfrontend] <defunct>
    13440 13399 bill Z     \_ [mythfrontend] <defunct>
    13443 13399 bill Z     \_ [mythfrontend] <defunct>
    MythTV Version   : 22679M
    MythTV Branch    : trunk
    Network Protocol : 50
    Library API      : 0.22.20091022-1
    QT Version       : 4.5.0
My frontend is not started automatically.
This was for a 4 minute session. Started, waited for 'quiet' log and exited.
Logfile attached (I think.)
Bill
comment:13 Changed 14 years ago by
Started wondering why I have 13 defuncts and the report before mine and the original had only 2. So, I plugged in an SD card into my card reader, restarted the frontend and my defunct count dropped from 13 to 12.
Most of the time, there are no cards plugged into the card reader.
On a roll here, I shut down and disconnected the USB plug for the card reader (which has 5 slots: CF/SD/uSD...). Restarting the frontend again, I got 2 defuncts (for /dev/sd0?), which I'm guessing match these log entries:
    MMUnix::AddDevice() Error: failed to stat /dev/bdi,
    MMUnix::AddDevice() Error: failed to stat /dev/power,
When the card reader is plugged in, there are 12 Error entries. 2 each for /dev/sd[defgh] and /dev/sr0.
    bill@rc1:~/Download$ zcat mlog.gz|cut -c25- |grep /dev/
    MMUnix::AddDevice() Error: failed to stat /dev/bdi,
    MMUnix::AddDevice() Error: failed to stat /dev/power,
    MMUnix::AddDevice() - Added /dev/sdd
    MMUnix::AddDevice() Error: failed to stat /dev/bdi,
    MMUnix::AddDevice() Error: failed to stat /dev/power,
    MMUnix::AddDevice() - Added /dev/sde
    MMUnix::AddDevice() Error: failed to stat /dev/bdi,
    MMUnix::AddDevice() Error: failed to stat /dev/power,
    MMUnix::AddDevice() - Added /dev/sdf
    MMUnix::AddDevice() Error: failed to stat /dev/bdi,
    MMUnix::AddDevice() Error: failed to stat /dev/power,
    MMUnix::AddDevice() - Added /dev/sdg
    MMUnix::AddDevice() Error: failed to stat /dev/bdi,
    MMUnix::AddDevice() Error: failed to stat /dev/power,
    MMUnix::AddDevice() - Added /dev/sdh
    MMUnix::AddDevice() Error: failed to stat /dev/bdi,
    MMUnix::AddDevice() Error: failed to stat /dev/power,
    MMUnix::AddDevice() - Added /dev/sr0
There are truly no /dev/bdi or /dev/power files, however,
/sys/devices/pci0000:00/0000:00:14.1/host6/target6:0:0/6:0:0:0/block/sr0/bdi
exists.
My frontend logs go back as far as 2009-06-16, which is when I started running the trunk. The 1st entry with this type of error started on 2009-07-21 and I was at 20844. I update my box about every 100 commits. The card reader was purchased on 2008-11-12 and most likely installed the same day, although I won't swear to that.
Hope this helps.
Bill
Changed 14 years ago by
Attachment: | mediamonitor.diff added |
---|
hard codes udevadm rather than udevinfo (deprecated?)
comment:14 Changed 14 years ago by
The attached changes work on a 9.04 mythbuntu distribution. If there are still distributions without udevadm, this 'fix' will give them the same problem we're seeing in this ticket.
Point me in the right direction and give me a shove and I'd be happy to make a real fix.
Details:
trunk/mythtv/libs/libmyth/mediamonitor-unix.cpp executes udevinfo, which doesn't exist in mythbuntu 9.04 and ubuntu 9.10 (the two distributions I have).
    % type udevinfo
    -bash: type: udevinfo: not found
    % type udevadm
    udevadm is /sbin/udevadm
If the device is valid, both return the full path, as in:
    udevinfo -q name -rp /sys/block/sdd         (existing code)
    udevadm info -q name -rp /sys/block/sdd     (proposed)
    /dev/sdd
In the error case, (... -q name -rp /sys/block/sdfoo) the existing 'udevinfo' code checks for a response of:
device not found in database
but udevadm returns:
device path not found
Also, if udevinfo is used but linked to udevadm, the following will appear in mythfrontend.log:
MMUnix::GetDeviceFile(/sys/block/sdd) - udevinfo error... the program '/usr/local/bin/mythfrontend' called 'udevinfo', it should use 'udevadm info <options>', this will stop working in a future release
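The difference in error strings described in this comment can be handled by accepting either message. The following is only a sketch: `isDeviceNotFound` is a hypothetical helper, not a function in mediamonitor-unix.cpp, and the two strings are the ones quoted above for udevinfo and udevadm on the reporter's systems. Other versions of either tool may print something different.

```cpp
#include <string>

// Hypothetical helper (not MythTV code): returns true if the captured
// tool output contains either of the "not found" messages quoted in
// this ticket.
bool isDeviceNotFound(const std::string &output)
{
    // Message checked by the existing udevinfo code path.
    if (output.find("device not found in database") != std::string::npos)
        return true;
    // Message reported for "udevadm info" in the error case.
    return output.find("device path not found") != std::string::npos;
}
```

With this check, `isDeviceNotFound("device path not found\n")` is true, while a successful lookup such as `"/dev/sdd\n"` is not treated as an error.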
comment:16 Changed 14 years ago by
Status: | infoneeded_new → new |
---|
We can't switch to udevadm, it's root-only on some distributions.
Changed 14 years ago by
Attachment: | mythtv-7135-defunct_processes.patch added |
---|
comment:17 Changed 14 years ago by
Status: | new → infoneeded_new |
---|
The attached patch, mythtv-7135-defunct_processes.patch, might work to prevent the zombie processes. I can't reproduce the issue, so I'm posting the patch for others to test.
The only way I could get defunct processes with my contrived test application was to delete the QProcess before the process exited. It's possible that this can happen when deleteLater() is called right after kill(), so the patch just puts another waitForFinished() call after the kill() to allow the process to die before deleteLater() gets called (it will wait much less than 2s, but will give up after 2s if the kill() fails). This extra waitForFinished() should probably be done regardless of whether it solves the issue or not.
However, in theory, if the specified application doesn't exist, waitForStarted() should return false, meaning that in the described cases, we should be returning from the waitForStarted() block above. (If you enable -v important,general,media on mythfrontend, you should see "Error - udevinfo failed to start!" and/or "Error - udevinfo failed to end! Terminating" which will indicate whether the problem is in the waitForStarted() or waitForFinished() block.) If the problem is in the waitForStarted() block, we may need to do the same kill() and waitForFinished() in it before deleteLater().
If someone can test and adjust the patch as necessary, please report back whether it works (and upload any modified version of the patch). #6137 (udevadm vs udevinfo) will be handled separately.
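The ordering proposed in this comment (kill the process, then wait a bounded time for it to finish before releasing the handle) can be illustrated without Qt. The sketch below is a POSIX analogue, not the patch itself: `killAndReap` and `spawnSleeper` are hypothetical names, and the 10 ms polling interval is an arbitrary choice. It does not reproduce QProcess behaviour; it only shows the "kill, then wait with a timeout" idea.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical analogue of kill() followed by waitForFinished(timeoutMs):
// send SIGKILL, then poll waitpid() until the child has been reaped or
// the timeout expires. Returns true only if the child was reaped.
bool killAndReap(pid_t pid, int timeoutMs)
{
    if (pid <= 0)
        return false;                  // never signal pid 0 or -1
    kill(pid, SIGKILL);
    for (int waited = 0; waited <= timeoutMs; waited += 10)
    {
        int status = 0;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return true;               // child reaped, no zombie left
        if (r < 0)
            return false;              // not our child, or another error
        usleep(10 * 1000);             // still running: poll again in 10 ms
    }
    return false;                      // gave up after the timeout
}

// Hypothetical demo helper: start a child that blocks until signalled.
// Returns the child's pid, or -1 if fork() failed.
pid_t spawnSleeper()
{
    pid_t pid = fork();
    if (pid == 0)
    {
        pause();                       // child: wait for a signal
        _exit(0);
    }
    return pid;
}
```

As in the comment above, the wait normally returns far sooner than the full timeout; the timeout only matters if the kill fails.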
Changed 14 years ago by
Attachment: | mediamonitor-unix.cpp-svn-diff added |
---|
Patch as modified per mdean's request.
comment:18 Changed 14 years ago by
Thanks for the quick response. Unfortunately, the kill()/waitFor...() patch didn't eliminate the defunct processes.
Your theory is spot on, the "Error - udevinfo failed to start!" leg is the one taken.
I've attached your patch, to be sure I modified it correctly. Also, attached is a stand alone file with libudev tests that would solve??? #6137 and eliminate the need for these changes.
The program will take /sys/block/sda and find /dev/sda like udevinfo/adm do.
Full disclosure, I don't know spit about udev, but would be willing to try modifying the program if it looks reasonable. It did require getting libudev-dev, which I'm guessing is a drawback.
Changed 14 years ago by
Attachment: | mdean.patch.mediamonitor-unix.cpp added |
---|
comment:19 Changed 14 years ago by
The fix from the mailing list eliminates the defunct processes.
Nice work!
To be clear, the udev.c attachment I added wasn't intended as a fix or workaround, but a test of the udev library which could replace the existing calls to udevinfo.
What I don't know is how to find out which distributions have the library and/or if they require root access to run them.
comment:20 Changed 14 years ago by
Bad test.
I had a udevinfo script that called udevadm in place when I tested.
Removed it and the defuncts are still happening.
Sorry.
comment:21 Changed 14 years ago by
I have also tested this fix and it does not work.
    1648 ? Rsl 0:11 /usr/local/bin/mythfrontend
    1746 ? Z   0:00  \_ [mythfrontend] <defunct>
    1748 ? Z   0:00  \_ [mythfrontend] <defunct>
comment:22 Changed 14 years ago by
Severity: | medium → low |
---|---|
Status: | infoneeded_new → new |
comment:23 Changed 14 years ago by
Update: haven't given up, but the number of defuncts changes. I can't figure out why.
I usually had about 15 defunct processes. The next patch dropped that to a solid 5. Both "bdi" and "power" appear on my machine, "trace" was added because I saw a comment on gossamer-threads by jarpublic. This seems like a keeper.
    Index: mediamonitor-unix.cpp
    ===================================================================
    --- mediamonitor-unix.cpp (revision 22872)
    +++ mediamonitor-unix.cpp (working copy)
    @@ -553,7 +553,9 @@
                 // skip some sysfs dirs that are _not_ sub-partitions
                 if (*pit == "device" || *pit == "holders" || *pit == "queue"
    -             || *pit == "slaves" || *pit == "subsystem")
    +             || *pit == "slaves" || *pit == "subsystem"
    +             || *pit == "bdi" || *pit == "power"
    +             || *pit == "trace")
                     continue;
                 found_partitions |= FindPartitions(
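As the list of sysfs entries to skip grows, the same test can be expressed as a set lookup. This is only an illustrative sketch: `isNonPartitionSysfsDir` is a hypothetical stand-alone function, not code from mediamonitor-unix.cpp, and the names in the set are exactly the ones that appear in the diff above.

```cpp
#include <set>
#include <string>

// Hypothetical helper (not MythTV code): returns true for sysfs directory
// names that the diff above treats as "not sub-partitions".
bool isNonPartitionSysfsDir(const std::string &name)
{
    static const std::set<std::string> skip = {
        "device", "holders", "queue", "slaves", "subsystem",  // existing
        "bdi", "power", "trace"                               // added by the diff
    };
    return skip.count(name) != 0;
}
```

A caller walking /sys/block/sdd would skip an entry when `isNonPartitionSysfsDir(entry)` is true and treat anything else (for example `sdd1`) as a candidate partition.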
I suspect that mediamonitor-unix.cpp isn't and wasn't causing the actual defuncts to occur. When I was at 5, there were a matching 5 failures to mount. That makes sense, because there are no memory cards plugged in any of the 5 slots. In these cases, myth_system() is called to do the mounts and I tried the below. But the number of defuncts now floats between 1 and 6. Not recommending this as a fix, but the change does make a difference.
    Index: mythmedia.cpp
    ===================================================================
    --- mythmedia.cpp (revision 22872)
    +++ mythmedia.cpp (working copy)
    @@ -121,7 +121,7 @@
                 .arg(m_DevicePath);
             VERBOSE(VB_MEDIA, QString("Executing '%1'").arg(MountCommand));
    -        if (0 == myth_system(MountCommand))
    +        if (0 == myth_system(MountCommand, MYTH_SYSTEM_DONT_BLOCK_PARENT))
             {
                 if (DoMount)
                 {
I keep tripping over why is it that udevinfo works and the error case doesn't? udevinfo returns the same string, although the default has no trailing new line. I tried appending a new line to the default case, (ret.append(QChar '\n')) to no avail.
comment:24 Changed 14 years ago by
Priority: | minor → trivial |
---|
There is a known bug with mythsystem in 0.22/trunk: multiple concurrent processes started with mythsystem share the same pid file, meaning they aren't cleaned up properly when complete. This is probably related.
comment:25 Changed 14 years ago by
I added VERBOSE lines to print out the child PIDs in myth_system and found that they didn't match those of the defunct processes. So I did the same [udevinfo->pid()] in mediamonitor-unix.cpp right after udevinfo->start and got an exact match! The usleep() in the following has eliminated the defuncts for me. The additional tests below that cut down the number of ?unrequired? attempts.
    Index: mediamonitor-unix.cpp
    ===================================================================
    --- mediamonitor-unix.cpp (revision 22889)
    +++ mediamonitor-unix.cpp (working copy)
    @@ -229,6 +229,8 @@
         args << sysfs;
         udevinfo->start("udevinfo", args);
    +    usleep(100000);
    +
         if (!udevinfo->waitForStarted(2000 /*ms*/))
         {
             VERBOSE(VB_MEDIA, msg + ", Error - udevinfo failed to start!");
    @@ -553,7 +555,10 @@
                 // skip some sysfs dirs that are _not_ sub-partitions
                 if (*pit == "device" || *pit == "holders" || *pit == "queue"
    -             || *pit == "slaves" || *pit == "subsystem")
    +             || *pit == "slaves" || *pit == "subsystem"
    +             || *pit == "bdi" || *pit == "power"
    +             || *pit == "trace")
    +
                     continue;
                 found_partitions |= FindPartitions(
Of course those affected could just add a shell script something like this as /sbin/udevinfo:
    # If you already have udevinfo, you don't want this script!!!
    UDEVADM=/sbin/udevadm
    if [ ! -e $UDEVADM ]; then
        echo "Strange, you don't have $UDEVADM either, bye!"
        exit 1
    fi
    RESULT=`$UDEVADM info $1 $2 $3 $4 2>&1`
    RETURN_CODE=$?
    if [ $RETURN_CODE = 0 ]; then
        echo "$RESULT"
    else
        echo "device not found in database"
    fi
    exit $RETURN_CODE
comment:26 Changed 14 years ago by
Update:
This link http://bugreports.qt.nokia.com/browse/QTBUG-5990 seems to address the defunct/zombie problem.
Also, http://qt.nokia.com/doc/4.5/qprocess.html#starts speaks to starting a process that is still running. The log shows all my udevinfo->start()s (up to 16) were done in a 122msec window. I saw a log from another user with 5 removable devices do it in 112msec.
Also, the echo "device not found in database" above should have: 1>&2 appended in order to emulate udevinfo.
comment:27 Changed 14 years ago by
Resolution: | → invalid |
---|---|
Status: | new → closed |
The problem is a bug in Qt ( http://bugreports.qt.nokia.com/browse/QTBUG-5990 ). When #6137 is fixed, it will prevent our seeing the symptoms of this Qt bug, even on broken Qt versions. Until #6137 is fixed, users may use workarounds mentioned above or keep an eye on QTBUG-5990.
Thanks to Bill Meek for tracking down the Qt bug and to Bill Meek and Josh Winters and all the others for all the debugging help.
Please provide log files from mythwelcome and mythfrontend, otherwise it is impossible for us to diagnose.