Leaderboard

Popular Content

Showing content with the highest reputation on 11/08/19 in Report Comments

  1. The corruption occurred as a result of failing a read-ahead I/O operation with "BLK_STS_IOERR" status. In the Linux block layer each READ or WRITE can have various modifier bits set. In the case of a read-ahead you get READ|REQ_RAHEAD which tells I/O driver this is a read-ahead. In this case, if there are insufficient resources at the time this request is received, the driver is permitted to terminate the operation with BLK_STS_IOERR status. Here is an example in Linux md/raid5 driver. In case of Unraid it can definitely happen under heavy load that a read-ahead comes along and there are no 'stripe buffers' immediately available. In this case, instead of making calling process wait, it terminated the I/O. This has worked this way for years. When this problem first happened there were conflicting reports of the config in which it happened. My first thought was an issue in user share file system. Eventually ruled that out and next thought was cache vs. array. Some reports seemed to indicate it happened with all databases on cache - but I think those reports were mistaken for various reasons. Ultimately decided issue had to be with md/unraid driver. Our big problem was that we could not reproduce the issue but others seemed to be able to reproduce with ease. Honestly, thinking failing read-aheads could be the issue was a "hunch" - it was either that or some logic in scheduler that merged I/O's incorrectly (there were kernel bugs related to this with some pretty extensive patches and I thought maybe developer missed a corner case - this is why I added config setting for which scheduler to use). This resulted in release with those 'md_restrict' flags to determine if one of those was the culprit, and what-do-you-know, not failing read-aheads makes the issue go away. What I suspect is that this is a bug in SQLite - I think SQLite is using direct-I/O (bypassing page cache) and issuing it's own read-aheads and their logic to handle failing read-ahead is broken. But I did not follow that rabbit hole - too many other problems to work on
    1 point