Opened 15 years ago

Closed 14 years ago

#6478 closed patch (wontfix)

Add update timestamp to program table for incremental program data retrieval

Reported by: chasedouglas@… Owned by: danielk
Priority: minor Milestone: unknown
Component: MythTV - Mythfilldatabase Version: head
Severity: medium Keywords:
Cc: Ticket locked: no

Description

As a developer of MyMote, I would like to be able to download incremental sets of program data each time MyMote is opened. The attached patch implements this update capability.

It does a number of things:

  1. Adds a timestamp column to the program table
  2. Modifies ProgramData::clearDataByChannel() to leave the program table as-is by default
  3. Modifies mythfilldatabase so it updates the first and last flags only when the resultant value is different than the current value
  4. Modifies the DataDirectProcessor::DataDirectProgramUpdate() so that it:
    1. Removes program entries that do not match any of the new data direct entries
    2. Removes new data direct entries that match any of the old program entries
    3. Inserts what is left of the new data direct entries into the program table

I have tested this with schedules direct and have not found any issues. I believe that other forms of program table changes are done through updates and not through wipe and reinsert like DataDirectProcessor does. Thus, this should be all the changes necessary.

Attachments (5)

program-timestamp.patch (16.5 KB) - added by chasedouglas@… 15 years ago.
program-timestamp.2.patch (25.6 KB) - added by chasedouglas@… 15 years ago.
New patch adding PreferredListingsSource? setting
program-timestamp.3.patch (25.8 KB) - added by chasedouglas@… 15 years ago.
Modified patch with a few sql query accuracy changes
program-timestamp.4.patch (25.7 KB) - added by chasedouglas@… 15 years ago.
New patch with change to delete query so it doesn't use dd_genre subquery
program-timestamp.5.patch (14.7 KB) - added by chasedouglas@… 15 years ago.
New patch based against revision 20676

Download all attachments as: .zip

Change History (20)

Changed 15 years ago by chasedouglas@…

Attachment: program-timestamp.patch added

comment:1 Changed 15 years ago by danielk

Milestone: 0.22unknown
Owner: changed from stuartm to danielk
Status: newassigned

Changed 15 years ago by chasedouglas@…

Attachment: program-timestamp.2.patch added

New patch adding PreferredListingsSource? setting

comment:2 Changed 15 years ago by anonymous

I attached a new version of the patch after consulting with danielk. This new patch adds support for the 'PreferredListingsSource?' setting. For example, if this setting is set to 'datadirect', then the datadirect program guide info can replace and remove any data it wants in the program table. If the setting isn't set then any program data fetch can overwrite any data in the program table, just as it does today. If the setting is set to something other than 'datadirect', then datadirect will only remove program table entries it previously added, and will only insert new program entries when they completely fit into an empty portion of the program guide.

This should be extended to the other program guide insertion routines as well.

Changed 15 years ago by chasedouglas@…

Attachment: program-timestamp.3.patch added

Modified patch with a few sql query accuracy changes

comment:3 Changed 15 years ago by anonymous

I added one last revision of the patch to fix some sql query issues with the datadirect program table update code.

  1. originalairdate may be NULL, and thus the null-safe comparison operator (<=>) should be used.
  2. The bit shift for the audioprop comparison was oriented in the wrong direction.
  3. A few extra WHERE clauses to deal with situations where the dd_genre table does not have any information about a program in dd_v_program.

After this patch I got one days worth of program table updates down from 406 changes to just two row changes I can't work around. The two changes were due to marking programs as the first part of a multiple part series. I can't think of a good way to get around having to update such rows on every mythfilldatabase run.

These last changes were nagging at me because I thought 406/~2000 programs per day being updated on each mythfilldatabase run wasn't as good as it should be. Now that it's down to just a few programs that are marked specially due to their part numbers in a series I believe this patch is functionally correct. Only regression testing should need to be done at this point.

comment:4 Changed 15 years ago by chasedouglas@…

In the big delete query in datadirect.cpp I refer to dd_genre twice, once in the main query and once in a subquery. This was not an issue during my testing because I had made the temporary dd tables non-temporary (I just deleted the TEMPORARY flag from the CreateATempTable query). However, now that I've restored the TEMPORARY flag and deleted the non-temporary tables, the delete query won't prepare. According to the MySQL docs (http://dev.mysql.com/doc/refman/5.1/en/temporary-table-problems.html) this is not allowed, and the QSqlError text after it tries to bind the query states "Can't reopen table: 'dd_genre'."

I don't have a fix in mind yet, and I may just have to scrap that subquery. This would increase the number of unnecessary updates to include programs without matching genre table entries. If I remember correctly this may represent about 2% of the programs I've retrieved through schedules direct.

Changed 15 years ago by chasedouglas@…

Attachment: program-timestamp.4.patch added

New patch with change to delete query so it doesn't use dd_genre subquery

comment:5 Changed 15 years ago by anonymous

The attachment I just posted fixes the temporary table multiple reference issue. I will continue testing to ensure everything works properly.

comment:6 Changed 15 years ago by danielk

Status: assignedinfoneeded

Chase, I think you need too look at this a bit more. There are SQL statements prepared but not executed in datadirect.cpp and there is a file missing in the patch (but I was able to reconstruct it for testing.)

Also, why is the program table treated differently from the other related tables in ProgramData::clearDataByChannel() ?

Did you do before and after benchmarking for this? What kind of numbers?

DataDirectProgramUpdate?() will give us overlaps if there is pre-existing non-DataDirect? data in the table, unless I'm missing something?

comment:7 in reply to:  6 ; Changed 15 years ago by anonymous

Replying to danielk:

Chase, I think you need too look at this a bit more. There are SQL statements prepared but not executed in datadirect.cpp and there is a file missing in the patch (but I was able to reconstruct it for testing.)

There are three queries prepared in datadirect.cpp, and all three are executed. If they weren't it wouldn't work right at all.

What file is missing?

Also, why is the program table treated differently from the other related tables in ProgramData::clearDataByChannel() ?

The current clearDataByChannel() code completely deletes all the program information in the channel. Currently, mythtv deletes all program information, and then reloads it all back in. We want to leave the program information and just update it with what has changed since the last mythfilldatabase run.

Did you do before and after benchmarking for this? What kind of numbers?

I have not yet, but I will try to soon.

DataDirectProgramUpdate?() will give us overlaps if there is pre-existing non-DataDirect? data in the table, unless I'm missing something?

If DataDirect? is not the preferred listings source then it won't delete the pre-existing non-DataDirect? programs in the first query of the function. During the update, the temporary table dd_v_program holds all the programs that will be copied into the real program table. The third query in DataDirectProgramUpdate?() deletes any programs in dd_v_program that would overlap with any pre-existing programs that weren't cleared away by the first query in the function.

comment:8 in reply to:  7 ; Changed 15 years ago by danielk

Replying to anonymous:

Replying to danielk:

Chase, I think you need too look at this a bit more. There are SQL statements prepared but not executed in datadirect.cpp and there is a file missing in the patch (but I was able to reconstruct it for testing.)

There are three queries prepared in datadirect.cpp, and all three are executed. If they weren't it wouldn't work right at all.

Right, that's why I want you to look at it. See line 885, that query is prepared but not executed.

What file is missing?

listingsources.h, just try applying your patch to virgin sources gcc will report missing files.

Also, why is the program table treated differently from the other related tables in ProgramData::clearDataByChannel() ?

The current clearDataByChannel() code completely deletes all the program information in the channel. Currently, mythtv deletes all program information, and then reloads it all back in. We want to leave the program information and just update it with what has changed since the last mythfilldatabase run.

I think you misunderstood my question. You changed clearDataByChannel() to not delete some of the program information by default. I see two things wrong with this: First, when you add an argument to a function that has a default it should always default to the old behavior to avoid introducing bugs and to lessen the effort needed in code review. Second, the information in programrating, credits and programgenres is tied to the information in program, if you delete an row in one or more of them it stands to reason you should delete the matched rows in all the others (unless you can explain why that wouldn't be the case?)

Did you do before and after benchmarking for this? What kind of numbers?

I have not yet, but I will try to soon.

k, thanks.

DataDirectProgramUpdate?() will give us overlaps if there is pre-existing non-DataDirect? data in the table, unless I'm missing something?

If DataDirect? is not the preferred listings source then it won't delete the pre-existing non-DataDirect? programs in the first query of the function. During the update, the temporary table dd_v_program holds all the programs that will be copied into the real program table. The third query in DataDirectProgramUpdate?() deletes any programs in dd_v_program that would overlap with any pre-existing programs that weren't cleared away by the first query in the function.

k, makes sense.

comment:9 in reply to:  8 Changed 15 years ago by chasedouglas@…

Replying to danielk:

Replying to anonymous:

There are three queries prepared in datadirect.cpp, and all three are executed. If they weren't it wouldn't work right at all.

Right, that's why I want you to look at it. See line 885, that query is prepared but not executed.

The last patch I attached must not be what I'm currently running. I'll try to recreate it and attach it again.

What file is missing?

listingsources.h, just try applying your patch to virgin sources gcc will report missing files.

I'll make sure it's in the new patch.

I think you misunderstood my question. You changed clearDataByChannel() to not delete some of the program information by default. I see two things wrong with this: First, when you add an argument to a function that has a default it should always default to the old behavior to avoid introducing bugs and to lessen the effort needed in code review. Second, the information in programrating, credits and programgenres is tied to the information in program, if you delete an row in one or more of them it stands to reason you should delete the matched rows in all the others (unless you can explain why that wouldn't be the case?)

I understand now. The reason I changed the default behavior was that DataDirect? was the only code path that invoked the function. I mostly added the extra argument in case some future code wanted to use the functionality. It could be completely stripped out if that would be preferred.

comment:10 Changed 15 years ago by danielk

(In [20650]) Refs #6478. Applies part of the program table listingsource+timestamp patch from Chase Douglas. There is plenty of good stuff not in this commit, this partial commit is just to shrink the patch of the simple stuff while other parts of the patch are tested and refined.

comment:11 Changed 15 years ago by danielk

(In [20656]) Refs #6478. This merges DBEvent in the EIT program data classes with ProgInfo? in the XMLTV program data classes.

The insertion algorithms have not been merged, just the base data representation class. In my runs of mythfilldatabase this reduces the memory footprint of mfd by 15-20% by using the more compact representation developed for EIT for the common data and by compacting the QStrings. The wall clock time is the same before and after.

This also adds some cleanups from #6478 -- mostly to address changes in QSqlQuery fro Qt3 to Qt4 and to workaround the Qt bindings prefix bug in some early Qt4 implementations (bugfixes).

I've tested with DataDirect?, XMLTV, EIT and a combined DataDirect?/EIT video source, but there may be problems which I did not see with my sources.

NOTE: The plan is to merge the data insertion algorithms, but not until after this data representation changeset has been in the tree a week or two.

Changed 15 years ago by chasedouglas@…

Attachment: program-timestamp.5.patch added

New patch based against revision 20676

comment:12 Changed 15 years ago by chasedouglas@…

I rebased the patch against revision 20676. The only major difference between the fourth version and this fifth version is that the prepared but not executed query in DataDirectProgramUpdate?() is now completely removed. I realized that it was redundant with respect to the next query in the function, and the next query should execute faster.

I left the additional parameter to ClearDataByChannel?() for delete_program since there has not been an agreement reached on how to deal with it. I am happy to leave the details up to daniel2k or any other committer as long as the functionality is maintained.

I benchmarked mythfilldatabase using revision 20676 and then with my patch added in. In all cases I ran it as "time mythfilldatabase --refresh-all". I used a generic "basic cable" source with 62 channels. I ran the tests on Ubuntu Jaunty (9.04). In the first test, the program table starts completely empty and is filled. The second test then runs immediately after to show statistics for a redundant run. The last day's worth of data is deleted from the program table before running the third test to simulate the performance of a normal daily mythfilldatabase run.

User time increased between 2.5 and 4.5%, and system time increased around 14% for each test. However, the absolute values are relatively small, especially compared to the total run time due to the guide data fetching.

Old (revision 20676)

complete refill:

57.57user 13.04system 5:32.55elapsed 21%cpu (0avgtext+0avgdata 0maxresident)k 200inputs+128outputs (1major+24878minor)pagefaults 0swaps

redundant update:

54.14user 11.81system 5:16.88elapsed 20%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+120outputs (0major+23551minor)pagefaults 0swaps

single day update:

52.89user 12.12system 5:28.31elapsed 19%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+120outputs (0major+23582minor)pagefaults 0swaps

New with program timestamp patch

complete refill:

59.05user 14.06system 6:06.39elapsed 19%CPU (0avgtext+0avgdata 0maxresident)k 1112inputs+128outputs (6major+24870minor)pagefaults 0swaps

redundant update:

55.47user 13.35system 5:32.57elapsed 20%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+120outputs (0major+23586minor)pagefaults 0swaps

single day update:

55.32user 13.77system 5:36.75elapsed 20%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+120outputs (0major+23576minor)pagefaults 0swaps

comment:13 Changed 15 years ago by stuartm

Milestone: unknown0.22
Status: infoneededassigned

comment:14 Changed 15 years ago by danielk

Milestone: 0.22unknown

comment:15 Changed 14 years ago by danielk

Resolution: wontfix
Status: assignedclosed

I think the only way to get this applied is if we create a new table with the timestamps linked to program by chanid+startts. If such a patch is submitted it will be looked at.

Note: See TracTickets for help on using tickets.