Opened 14 years ago

Closed 14 years ago

#8812 closed defect (fixed)

RingBuffer has a number of thread safety issues

Reported by: danielk Owned by: danielk
Priority: minor Milestone: 0.24
Component: MythTV - General Version: Master Head
Severity: medium Keywords:
Cc: Ticket locked: no

Description

There are a large number of thread safety problems and the attempts to add multiple locks to deal with individual segfaults and races has only made the it harder to ensure correctness.

Attachments (9)

8812-v1.patch (43.7 KB) - added by danielk 14 years ago.
Initial patch
8812-v2.patch (53.7 KB) - added by danielk 14 years ago.
Updated patch
8812-v4.patch (67.2 KB) - added by danielk 14 years ago.
This patch adds some optimizations
8812-v10.patch (77.7 KB) - added by danielk 14 years ago.
Updated patch
8812-v13.patch (89.9 KB) - added by danielk 14 years ago.
Update patch
combined-v3.patch.bz2 (35.3 KB) - added by danielk 14 years ago.
Updated patch (this is actually combined with the preview gen patch as I'm testing those together)
combined-v4.patch.bz2 (35.3 KB) - added by danielk 14 years ago.
Added fix for slow streaming startup & ~RingBuffer? getting called before RingBuffer::run() can set readaheadrunning
combined-v5.patch.bz2 (35.4 KB) - added by danielk 14 years ago.
8812-v15.patch (94.6 KB) - added by danielk 14 years ago.
Just the ring buffer portion of the previous patch (sans mythversion bits)

Download all attachments as: .zip

Change History (14)

Changed 14 years ago by danielk

Attachment: 8812-v1.patch added

Initial patch

comment:1 Changed 14 years ago by danielk

This patch reduces the number of instance locks to two and protects most of the variables with the rwlock.

The locking hasn't been checked completely yet nor run tested. There may also be lock contention problems. Finally, the main read ahead loop needs to have it's locking simplified, but it should at least be correct now.

Changed 14 years ago by danielk

Attachment: 8812-v2.patch added

Updated patch

comment:2 Changed 14 years ago by danielk

Owner: set to danielk
Status: newassigned

This adds back more locks than in the first version of the patch so that the locking for reads from the buffer can be relaxed to rwlock.lockForRead() allowing the reads from the buffer to happen while reads from the disk or other medium is in progress. The performance should now be as good or better than before. The multitude of QWaitConditions has also been reduced to just one wait condition, which simplifies the code a bit and allows performance to be improved.

The two locks that were added can probably be eliminated using QAtomicInt, but it has a complicated API and several means to shoot yourself in the foot. Considering the mess that was made of RingBuffer? in just a few years I would like to keep this as simple as possible.

Changed 14 years ago by danielk

Attachment: 8812-v4.patch added

This patch adds some optimizations

comment:3 Changed 14 years ago by danielk

The latest patch adds optimizations for seeks near our current position and seeks close to the end of the file. These are special cases for ffmpeg which at the start of playback makes several passes over the first 8Mbits of the file and then a pass over the last 2Mbit, and then finally seeks back to the beginning of the file and starts reading.

This also adds a lot of debugging which will slow things down, including checksumming every block that ffmpeg reads. This has been done to assure correctness, this additional debugging will be removed later.

Changed 14 years ago by danielk

Attachment: 8812-v10.patch added

Updated patch

Changed 14 years ago by danielk

Attachment: 8812-v13.patch added

Update patch

comment:4 Changed 14 years ago by danielk

The latest version fixes problems with LiveTV that the earlier patch had and updates the patch to apply after the recent changes in RingBuffer? in trunk.

Changed 14 years ago by danielk

Attachment: combined-v3.patch.bz2 added

Updated patch (this is actually combined with the preview gen patch as I'm testing those together)

Changed 14 years ago by danielk

Attachment: combined-v4.patch.bz2 added

Added fix for slow streaming startup & ~RingBuffer? getting called before RingBuffer::run() can set readaheadrunning

Changed 14 years ago by danielk

Attachment: combined-v5.patch.bz2 added

Changed 14 years ago by danielk

Attachment: 8812-v15.patch added

Just the ring buffer portion of the previous patch (sans mythversion bits)

comment:5 Changed 14 years ago by danielk

Resolution: fixed
Status: assignedclosed

(In [26101]) Fixes #8812. This primarily fixes a number of locking problems in the RingBuffer? code.

This also introduces 4 optimizations:

1/ If a seek is to a position already contained in the read ahead buffer, we move the read position to that location and do not reset read ahead. 2/ If there is a seek to within 300,000 bytes of the end of the file, read ahead is not reset and instead we enter a special state which makes reads directly. This prevents the read ahead thread from being cleared when in practice this is usually followed by a seek back to the beginning of the thread. 3/ When the read ahead thread has no work to do it now waits for another thread to give it work instead of attempting to read repeatedly. This happens when the buffer is full, the end of file is seen, we've hit an irrecoverable error, etc. 4/ The read ahead read size is now incresed in size by 50% whenever it's not keeping up with the reader instead of increasing in 32K blocks. This results in faster response when the bitrate estimation is bogis (ffmpeg problem) or absent. Shrinkage is still done slowly in 32K blocks.

OpenFile? and subsequently the RingBuffer? ctor and ANN FileTransfer? have also been changed to use a timeout rather than a retry count parameter. OpenFile? was already using a timeout internally so that local and remote files would be treated similarly, but this had not been extended to the interface yet.

MythPlayer?'s Peek code now retries it's peeks for a couple seconds if they fail due to a too small file initially. This has been a problem ever since the LiveTVChain was introduced.. but it became noticable with some ffmpeg changes and then went from being relatively rare to an everytime occurance with the faster RingBuffer?. This fixes #8847.

Further optimations are possible. For instance, opt #2 could just read in the remaining bytes into an aux buffer which would have no effect on local files but shave at least 100 ms of mythstreaming. Opt #2 could reserve a portion of the buffer for skip backs, while right now those are opportunistic. Opt #4 could also resize the read ahead buffer itself, not just the block size; easy to do if we used a vector for the buffer. The bitrate could be passed from the local ringbuffer to the remote one so it could better estimate it's read ahead requirements. We could fix the bitrate calculations in ffmpeg or perform our own.

Note: There are two ANN FileTransfer? calls outside of the C++ code, one in the python bindings and another in the perl bindings. These are were not using the retry param and so should work fine with the change to a timeout. If there were any I missed in my grep that you know of please check them.

Note2: This ups both the binary version and protocol version, please distclean and upgrade all systems.

Note: See TracTickets for help on using tickets.