Opened 14 years ago

Closed 10 years ago

#7762 closed Patch - Feature (Won't Fix)

Autoshutdown slave feature enhanced

Reported by: gmembre@… Owned by: stuartm
Priority: minor Milestone: 0.28
Component: MythTV - General Version: 0.24.1
Severity: medium Keywords: auto shutdown slave dayly wakeup period mythshutdown
Cc: Ticket locked: no

Description

Here is an enhancement of the auto shutdown feature of slave. It was introduced in 0.22. This patch allows the MB to ask a slave to shutdown, the slave executes a script to check if it can and then shutdown if allowed. If not, the MB will ask again few minutes later and so on... It is the same functionnality of the auto shutdown feature of the MB with mythshutdown but for slaves. The MB also checks for daily wakeup period of slaves and wakes it up if needed.

I have been running this patch on my production system version 0.22 for 2 weeks without any issue.

I'll add another patch to add a new input in mythtv-setup.

Attachments (6)

shutdownSlaveEnhanced.diff (17.6 KB) - added by gmembre@… 14 years ago.
patch of this enhancement
mythtv-setup (1.8 KB) - added by gmembre@… 14 years ago.
options added to mythtv-setup
SWpatch_shutdownEnhanced09Feb.diff (19.4 KB) - added by simonwalls@… 14 years ago.
Modified patch. Closer to [20084] and uses SleepCommand? not HaltCommand?
shutdownSlaveEnhanced_v0.2.diff (18.7 KB) - added by gmembre@… 14 years ago.
Here is a new version of this feature : SleepCommand? is back, DailyWakeup?*Period can be set per slave host, and the slave doesn't start a new Scheduler
shutdownSlaveEnhanced_v0.3.diff (19.1 KB) - added by gmembre@… 14 years ago.
Better handling of daily wakeup period for the slave
shutdownSlaveEnhanced_v0.3+0.24.1.diff (20.7 KB) - added by Cédric Schieli <cschieli@…> 13 years ago.
Updated patch based on v0.3 and forwardported to the fixes/0.24 branch

Download all attachments as: .zip

Change History (30)

Changed 14 years ago by gmembre@…

Attachment: shutdownSlaveEnhanced.diff added

patch of this enhancement

comment:1 Changed 14 years ago by cpinkham

Owner: changed from Isaac Richards to cpinkham
Status: newassigned

Changed 14 years ago by gmembre@…

Attachment: mythtv-setup added

options added to mythtv-setup

comment:2 Changed 14 years ago by simonwalls@…

Just to let everyone know that I am testing this patch.

I'm running svn22759 which is virtually 0.22, and am making daily use of the Slave Shutdown feature that cpinkham added.

I'll report my findings. One difference I have noted so far is that I'm not using Mythwelcome, and the patch seems to have some dependence upon that -- it's not working for me 100% correct yet. I am using a 24/7 Master server and sleep/wake Slave.

Cheers, Simon.

comment:3 Changed 14 years ago by gmembre@…

I don't use mythwelcome, could you explain a little more what is your problem ? My setup is also a MBE 24/7 with a slave backend which wakes up upon recordings.

comment:4 Changed 14 years ago by simonwalls@…

Hi gmembre,

Aha. Glad to know you have the same configuration as me, that helps.

I guess the sight of DB accesses for DailyWakeup? periods got me looking, since I use a cron-based daily wakeup for transcoding. Do you now support the use of the DailyWakeup? DB definitions to wake up a slave (for transcoding, for example) ? If so, they would need to be set for each Slave backend, whereas right now there is one set only - intended I think for a combined BE/FE system.

Anyway I have got to the root of my problem. My system is set up as described in cpinkham's patch [20084] where each Slave has a DB value SleepCommand? (I used /sbin/shutdown -h +3). Your patch changes that to use the HaltCommand? which is under Setup->General. I think this is aimed at making the Master server halt whereas cpinkham's can be different for each Slave. You may be lucky in that the same command works on all your machines.

The reason mine was not sleeping was due to this:

2010-02-09 19:19:54.724 CheckShutdownServer returned - OK to shutdown
2010-02-09 19:19:54.726 MainServer, Warning: Unknown socket closing MythSocket(0x8e12688)
2010-02-09 19:19:54.736 MythSocket(8e12688:-1): writeStringList: Error, socket went unconnected.
                        We wrote 0 of 10 bytes with 1 errors
2010-02-09 19:19:54.739 New DB connection, total: 4
2010-02-09 19:19:54.758 Connected to database 'mythconverg' at host: 192.168.1.7
2010-02-09 19:19:54.769 Running the command to shutdown this computer :-
                                                sudo /sbin/halt -p
sudo: sorry, you must have a tty to run sudo
2010-02-09 19:19:54.872 ServerHaltCommand failed, shutdown aborted

Not only was it using the HaltCommand?, which I had left at default (I believe), it was failing on my system with a sudo problem. This is on a Fedora 11 system, and that particular problem may affect others trying to use sudo from within Myth.

So, I have changed my copy of the patch to be more like [20084] and use the SleepCommand? again, sorry about that, but I think it fits better that way. I've also added a few more diagnostic log entries, showing what is going on with the OK or BUSY responses from the slave. I changed one of your lines from SCHEDULE+EXTRA back to IMPORTANT since it's important to know if a slave is asked to shut down, you might not see it if you don't run with schedule+extra logging all the time. I'll be testing it in the next few days as the system is busy tonight. I've attached it anyway, though, but be aware it is untested for now other than compilation. File: SWpatch_shutdownEnhanced09Feb.diff

Cheers, Simon.

Changed 14 years ago by simonwalls@…

Modified patch. Closer to [20084] and uses SleepCommand? not HaltCommand?

comment:5 Changed 14 years ago by anonymous

Simon,

This is correct, the MBE uses the DailyWakeup? period set for the whole system, not for each slave. It may be a very simple enhancement to make it work for each slave. But as I have only one slave, I didn't though about it. It is also good idea to change some log level. I used HaltCommand? because it was the way it works before cpinkham add this patch, but SleepCommand? is fine :) Do I need to produce a new patch with your idea based on my patch ? I see in your diff that your are using an older version of mythtv than mine. My patch applies against r22882 and your against r22759.

Regards Guillaume

comment:6 Changed 14 years ago by cpinkham

Thanks for the feature enhancement, but there are a few things that I think need to be corrected before this patch can be accepted. Here are the issues I noticed:

1) You can't instantiate a Scheduler anywhere but the master. Just calling DisableScheduling?() still lets the main parts of the scheduler run but it doesn't fire off any recordings. The Scheduler code is not guaranteed to be 'multi-backend-safe', meaning that it potentially could do things to the database that should only be done from the master Scheduler. I don't think there is any code like this currently, but it's not something we want to deal with in the future. Even the "mythbackend --printsched" functionality opens a connection to the master scheduler and downloads the list of scheduled recordings. If Scheduler::CheckShutdownServer?() is the only thing you need from the scheduler, then you can make this method static and call it as "bool status = Scheduler::CheckShutdownServer?(0, idleSince, blockShutdown);" in your code.

2) There are a couple reasons that EncoderLink::GoToSleep?() can return false other than when the slave returns a non "OK" status, so you can't rip out the while loop that I have that sets the slave's status to undefined. It sounds like EncoderLink::GoToSleep?() and PlaybackSock::GoToSleep?() both need to return an int status. Something like -1 == error, set status to undefined, 0 == OK slave is going to sleep, and 1 == slave was busy and can't go to sleep.

3) The changes to MainServer::HandleGoToSleep?() seem wrong in a couple ways. a) you are calling SendResponse?() twice in the case where 'status' is true, but the SleepCommand? being empty. b) you don't need to get the value of blockSDWUwithoutClient prior to calling CheckShutdownServer?. The 'blockshutdown' parameter to CheckShutdownServer? is passed by reference, and updated inside the method so that the caller can get the value. You can just declare "bool blockShutdown;" without the GetNumSetting? since you don't need the value of blockShutdown.

4) please observe myth coding style and put brackets { and } on their own lines.

5) the Feb 9 version of the patch still uses the AllowSlaveToSleep? setting in one place.

6) since Scheduler::WakeUpSlaves? is now checking if a slave IsAwake?(), it probably should also check IsWaking?() in case we just told it to wake up a minute ago and the slave's mythbackend isn't up and running yet.

I think that's about all I noticed from my first pass through the patch.

comment:7 Changed 14 years ago by anonymous

Thanks for the attention cpinkham,

I am testing this patch because I have a couple of problems with [20084] as found in my 22759 svn, that I thought this may help with. I haven't reported them elsewhere yet but was planning on it.

  • I'm getting a race condition or similar shutting down the SBE just after a transcode job has begun from a cron wakeup (since the SBE cannot refuse to shutdown in the original incarnation)
  • And also it would be nice if the JobQueue? could wake up SBEs that are sleeping.

The patch has merit, the use of an OK or BUSY response based on a check of mythshutdown is great, just seems there are a few issues with the implementation - you mention way more than I could find.

I'm running the Feb 9 version of the patch now, but it still has problems. True I am running a version earlier than the patch was written for, and I am seeing socket problems as follows:

2010-02-10 20:56:35.359 Scheduler, Checking for slaves that can be wake up
2010-02-10 20:56:35.361 Scheduler, Checking for slaves that can be shut down
2010-02-10 20:56:35.362   Getting list of slaves that will be active in the next 15 minutes.
2010-02-10 20:56:35.363 Checking scheduler's reclist
2010-02-10 20:56:35.365   Checking inuseprograms table:
2010-02-10 20:56:35.369     No entries.
2010-02-10 20:56:35.370     Asking Server to go to sleep.
2010-02-10 20:56:35.546 MainServer::ANN Monitor
2010-02-10 20:56:35.548 adding: Server as a client (events: 0)
2010-02-10 20:56:35.551 MainServer::ANN Monitor
2010-02-10 20:56:35.553 adding: Server as a client (events: 1)
2010-02-10 20:57:05.373 MythSocket(a1f7858:31): readStringList: Error, timed out after 30000 ms.
2010-02-10 20:57:05.384 Slave backend: Server no longer connected
2010-02-10 20:57:05.387 MythSocket(a1f7858:-1): writeStringList: Error, called with unconnected socket.
2010-02-10 20:57:05.395 MythSocket(a1f7858:-1): readStringList: Error, called with unconnected socket.
2010-02-10 20:57:05.397 PlaybackSock::SendReceiveStringList(): No response.
2010-02-10 20:57:05.398 PlaybackSock, Error: GetEncoderState: QUERY_REMOTEENCODER 4 gave us no response.
2010-02-10 20:57:05.397 PlaybackSock::SendReceiveStringList(): No response.
2010-02-10 20:57:05.408       Slave Server responded that it will not go to sleep.
2010-02-10 20:57:05.419 Reschedule requested for id 0.
2010-02-10 20:57:05.425 BuildWorkList...
2010-02-10 20:57:05.427 AddNewRecords...

So, the SBE actually goes to sleep (log not copied here) but the MBE thinks it has refused due to the socket issue. That may be because something important has changed between versions 22759 and 22882, as Guillame reports it is working for him. I assumed nothing vital would have changed between those versions but looks like I was incorrect.

Since I am stuck with a system that doesn't work properly I will back out the change for now and look at upgrading to 22882 to test further.

Cheers, Simon.

comment:8 Changed 14 years ago by anonymous

Thanks a lot both for your remark. I'm correcting this patch and send it back to you asap (~ 2-3 days)

comment:9 Changed 14 years ago by anonymous

Thanks Guillaume,

I have been through cpinkham's list in detail and see what he's saying. I did not understand originally the point about the AllowSlaveToSleep? setting. That is not present in the original incarnation (as found in 0.22), so maybe he is saying why introduce a new setting. The logic being, if an individual SBE has a sleep command defined, it should not need further settings to allow it to sleep. To disable sleep, you ensure the SleepCommand? for that slave is null. Since you were using HaltCommand? (which is a global setting) each SBE _did_ need individual permission to sleep. That's clear now, so I would say the definition of (in mythtv-setup) and the check for AllowSlaveToSleep? should both be removed.

Also I think I accidentally introduced the second call to SendResponse? when I was adding back cpinkham's detection of the SleepCommand? being empty.... apologies. (That may be why I had socket error messages in my log).

With the issues resolved, the patch could be accepted :-) You might not get agreement from the devs that the DailyWakeup? period should apply to SBE's as well..... I think the reasoning would be that the MBE (only) should be woken using DailyWakeup? and then it would command SBE's to wake as-needed via WOL. In a system with many SBE's it may not be desirable for all of them to wake up at the same time, if they do not have jobs to perform. I was thinking of a patch to get JobQueue? to wake a SBE which is sleeping and has jobs due, so that would integrate nicely.

Hope that helps, I look forward to testing a revised patch, Simon.

comment:10 Changed 14 years ago by gmembre@…

@Chris : when you say that EncoderLink::GoToSleep?() and PlaybackSock::GoToSleep?() should return an integer, it mean that MainServer::HandleGoToSleep?() is able to return something else than OK or BUSY. It may be usefull in case the SleepCommand? failed to execute, so I have to move SendResponse?(pbs->getSocket(), strlist); at the end of this function in order to return an error to the MBE if needed. But, myth_system() is a blocking function, if the SleepCommand? doesn't return quickly while shutting down the system, the slave mythtv backend while be killed before it can send its response to the MBE, so the MBE will think something wrong occurs in the slave. What do you think about it ?

@simon : When I wrote my patch, I didn't saw SleepCommand? in this manner, but now, I better understand. I'll make the correction. I also can make the DailyWakeup? period be setup for each slave (but not thrugh any setup screen for the moment). For me, any job in the jobqueue will be consumed when a slave wakes up but it is not a reason to wake up a slave to just do job. At home, my slave backend is my desktop computer, it wakes up every day at 7pm, a few minutes before I came back from work : my computer is waiting for me and it starts working in background on the jobqueue.

comment:11 Changed 14 years ago by simonwalls@…

For me, any job in the jobqueue will be consumed when a slave wakes up but it is not a reason to wake up a slave to just do job.

Everybody uses Myth in a different way, because it's so flexible. If your patch is too restrictive, it might not get accepted. Keeping it configurable makes it suitable for lots of users, better in my opinion.

I also can make the DailyWakeup?? period be setup for each slave

There is already a kind of Daily wakeup available for each Slave, it's under the JobQueue? settings (jobqueue start time and end time). It sounds like you have the same need as I, and could use these rather than add new ones - but you want the Slave to start up whether there are jobs or not. It does sound outside of Myth's purpose - perhaps you could use a cron job from your Master to ensure it's waiting for you at 7pm even if there are no jobs for it?

My SBE has all my tuners, and a fast processor, but no storage. It wakes automatically for recordings under cpinkham's change (yay) then sleeps again. I also wake it up once a day via cron, at a time when no recordings are likely, so that it can clear any transcodes I've queued (sometimes there are none so it shuts down again). It would be nice to get the JobQueue? to wake it up (at the JobQueue? start time for that SBE) only if there are jobs due. That's the patch I had in mind.

BR, Simon.

Changed 14 years ago by gmembre@…

Here is a new version of this feature : SleepCommand? is back, DailyWakeup?*Period can be set per slave host, and the slave doesn't start a new Scheduler

comment:12 Changed 14 years ago by simonwalls@…

Looks good, I will get to testing it as soon as possible.

Also that will let me use one of the DailyWakeup? periods for my SBE to do my transcode processing now, thank you.

Simon.

comment:13 Changed 14 years ago by simonwalls@…

Well I am running it now, still on svn22759 and it compiles and runs.

It's still early days in testing, so far the only issue is that I see this in the SBE's log:

Broadcast message from root@Server
        (unknown) at 22:15 ...

The system is going down for halt in 3 minutes!
2010-02-17 22:15:45.841 CheckShutdownServer returned - OK to shutdown
2010-02-17 22:15:45.851 Running the command to shutdown this computer :-
                                                /sbin/shutdown -h +3
2010-02-17 22:15:45.852 MainServer, Warning: Unknown socket closing MythSocket(0x83955e8)
2010-02-17 22:15:45.854 MythSocket(83955e8:-1): writeStringList: Error, socket went unconnected.
                        We wrote 0 of 10 bytes with 1 errors
2010-02-17 22:15:46.852 Connecting to master server: 192.168.1.7:6543
2010-02-17 22:15:46.854 Connected successfully
2010-02-17 22:15:54.203 JobQueue: Currently set to run up to 1 job(s) max.
2010-02-17 22:15:54.207 JobQueue: GetJobsInQueue: findJobs search bitmask 4, found 10 total jobs
2010-02-17 22:15:54.208 JobQueue: GetJobsInQueue: Found 'User Job - Save to MythVideo' Job for chanid 1001 @ 20090112223400 in Queued state.
2010-02-17 22:15:54.210 JobQueue: GetJobsInQueue: Ignore 'Flag Commercials' Job for chanid 1005 @ 20100215065900 in Finished state.
2010-02-17 22:15:54.211 JobQueue: GetJobsInQueue: Ignore 'Flag Commercials' Job for chanid 1071 @ 20100215163100 in Finished state.

Looks like the communications isn't perfect yet, but hey, the SBE shuts down ok. I have a wakeup pending overnight so will verify that works too.

At this time, the only additional suggestion I can make is that JobQueue? should not parse the queue if the machine has a pending shutdown, and should display a message to say so. You can see it still parsing the queue in the above log clip.

I'll update when I have some more results from testing, Simon.

comment:14 Changed 14 years ago by gmembre@…

Your error is strange, could you double ckech that the file mainserver.cpp is clean of any patch before applying my patch. I'm also investigating for your problem.

Changed 14 years ago by gmembre@…

Better handling of daily wakeup period for the slave

comment:15 Changed 14 years ago by simonwalls@…

Hi, Now I'm using svn 22882 (just mythbackend for now, hope that is not a big problem) with v0.3 of the patch. SBE does shut down, but according to MythWeb it goes to "currently not connected" and log entries are as follows:

MBE

2010-02-18 21:02:57.766 Scheduler, Checking for slaves that can be wake up
2010-02-18 21:02:57.768 Scheduler, Checking for slaves that can be shut down
2010-02-18 21:02:57.769   Getting list of slaves that will be active in the next 15 minutes.
2010-02-18 21:02:57.770 Checking scheduler's reclist
2010-02-18 21:02:57.772 2  Checking inuseprograms table:
2010-02-18 21:02:57.780     Asking Server to go to sleep.
2010-02-18 21:02:58.708 adding: Server as a client (events: 0)
2010-02-18 21:02:58.711 adding: Server as a client (events: 1)
2010-02-18 21:03:27.782 MythSocket(9b73830:31): readStringList: Error, timed out after 30000 ms.
2010-02-18 21:03:27.788 Slave backend: Server no longer connected
2010-02-18 21:03:27.791 MythSocket(9b73830:-1): writeStringList: Error, called with unconnected socket.
2010-02-18 21:03:27.793 MythSocket(9b73830:-1): readStringList: Error, called with unconnected socket.
2010-02-18 21:03:27.795 PlaybackSock::SendReceiveStringList(): No response.
2010-02-18 21:03:27.796 PlaybackSock, Error: GetEncoderState: QUERY_REMOTEENCODER 4 gave us no response.
2010-02-18 21:03:27.791 PlaybackSock::SendReceiveStringList(): No response.
2010-02-18 21:03:27.809 Scheduler, Error: Unable to shutdown Server slave backend, setting sleep status to undefined.
2010-02-18 21:03:27.821 BuildWorkList...
2010-02-18 21:03:27.822 AddNewRecords...
2010-02-18 21:03:27.967  |-- Start DB Query...
2010-02-18 21:03:28.207  |-- 198 results in 0.234584 sec. Processing...
2010-02-18 21:03:28.297  +-- Cleanup...

and SBE

2010-02-18 21:03:17.413 JobQueue: Currently set to run new jobs from 15:00 to 21:00
2010-02-18 21:03:17.414 JobQueue: Currently Running 0 jobs.  Jobs in Queue, but we are outside of the Job Queue time window, no new jobs can be started.
2010-02-18 21:03:17.415 JobQueue: Skipping 'Flag Commercials' job for chanid 1009 @ 20100218200600, current time is outside of the Job Queue processing window.

Broadcast message from root@Server
        (unknown) at 21:03 ...

The system is going down for halt in 3 minutes!
2010-02-18 21:03:26.804 CheckShutdownServer returned - OK to shutdown
2010-02-18 21:03:26.819 Received a request to shut down this machine.... response is OK.
2010-02-18 21:03:26.822 Running the command to shutdown this computer :-
                                                /sbin/shutdown -h +3
2010-02-18 21:03:26.823 MainServer, Warning: Unknown socket closing MythSocket(0x95f51a0)
2010-02-18 21:03:26.825 MythSocket(95f51a0:-1): writeStringList: Error, socket went unconnected.
                        We wrote 0 of 10 bytes with 1 errors
2010-02-18 21:03:27.416 JobQueue: Currently set to run up to 1 job(s) max.
2010-02-18 21:03:27.419 JobQueue: GetJobsInQueue: findJobs search bitmask 4, found 10 total jobs
2010-02-18 21:03:27.422 JobQueue: GetJobsInQueue: Ignore 'User Job - Save to MythVideo' Job for chanid 1001 @ 20090112223400 in Finished state.

(The broadcast message from root announcing the shutdown on SBE appears on my terminal where I am running a tail -f mythbackend.log , and is not in the file).

It does seem as though the slave backend breaks the connection with the master too quickly, since some socket comms fail immediately after it disconnects at 21:03:27

Cheers, Simon.

comment:16 Changed 14 years ago by robertm

Guys, help us out here. *Please* take all this discussion to the dev list. This ticket is a perfect example of why a ticket would not get applied because any one of us would look at it and be irritated just trying to find relevant information. I don't want to lock the ticket, but discussion needs to happen in the right place.

comment:17 Changed 14 years ago by stuartm

Milestone: unknown

comment:18 Changed 14 years ago by cpinkham

Milestone: unknown0.25

comment:19 Changed 14 years ago by stuartm

Milestone: 0.25

Milestone 0.25 deleted

Changed 13 years ago by Cédric Schieli <cschieli@…>

Updated patch based on v0.3 and forwardported to the fixes/0.24 branch

comment:20 Changed 13 years ago by Cédric Schieli <cschieli@…>

I attached an updated version of the patch based on v0.3 and forwardported to the fixes/0.24 branch. I'll try to work on forwardporting it to master too, but I can't promise it in a foreseeable future.

comment:21 Changed 12 years ago by cpinkham

Milestone: 0.25unknown

comment:22 Changed 12 years ago by stuartm

Type: enhancementPatch - Feature
Version: unknown0.24.1

comment:23 Changed 11 years ago by stuartm

Milestone: unknown0.28
Owner: changed from cpinkham to stuartm
Status: assignedaccepted

comment:24 Changed 10 years ago by stuartm

Resolution: Won't Fix
Status: acceptedclosed

Unfortunately this patch doesn't apply at all against master/0.28 and I don't have the time to re-write it.

Note: See TracTickets for help on using tickets.