LFletcher

Everything posted by LFletcher

  1. I've been having a look. In reality it shouldn't be too difficult to work out what goes where, as it's either a movie or a TV series/episode. I suppose I didn't expect everything to be in a lost+found directory, but as I've never done this before - you live and learn. I'll wait for the scan for corrupt files to finish, but what is my next step? Is it to unassign Disk 5 (physically remove it from the server) and then assign this cloned disk in its place? If I do that, will unRAID recreate the old folder structure and I'll just have to manually move things into the correct place, or will I have to do something else? Also, what are the next steps for sorting out the issues with both Disk 3 (which we unassigned earlier) and the Parity 2 drive, which still has issues? Thanks
  2. ddrescue has now finished. I then ran xfs_repair against the cloned drive. I've now run the following commands from the ddrescue FAQ:
     printf "unRAID " >~/fill.txt
     ddrescue -f --fill=- ~/fill.txt /dev/sdd /boot/ddrescue.log
     find /mnt/disks/Z2GBNVET -type f -exec grep -l "unRAID" '{}' ';'
     The last of these is still running. Looking at the data on the mounted cloned drive, everything now appears to be in a lost+found directory. Shouldn't the cloned drive have a directory structure that mirrors the original disk? I assumed that after the check I would have been able to unassign the old bad drive (Disk 5), assign the cloned drive in its place, restart the array, and this part of the issue would be resolved. I guess with just the lost+found directory that is not going to be the case - or am I missing something?
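Not from the FAQ, but a sketch that can help take stock of a lost+found full of inode-numbered files: list them largest first, so recognisable items (big video files) surface at the top. The mount point defaults to the one in the post; adjust as needed.

```shell
# List everything xfs_repair dropped into lost+found, largest files first,
# so recognisable items (e.g. big videos) surface at the top.
# DIR defaults to the mount point mentioned in the post; override as needed.
DIR="${DIR:-/mnt/disks/Z2GBNVET/lost+found}"
find "$DIR" -type f -printf '%s\t%p\n' 2>/dev/null | sort -rn | head -n 50
```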
  3. Is there any way to speed up the ddrescue process? It's been running for about 30 hours and it's less than 70% of the way through pass 1 (ignore the run time on the screenshot - I had to restart it after 24 hours, so this is the second run).
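For reference, these are the standard GNU ddrescue options that usually speed up a first pass on a failing source disk (general ddrescue advice, not something from this thread). The run() wrapper below only prints the commands rather than executing them, and the device names are placeholders.

```shell
# Options that usually speed up ddrescue's first pass on a failing disk.
# run() only prints each command; change echo to "$@" to execute for real.
run() { echo "+ $*"; }

SRC=/dev/sdX          # failing source disk (placeholder name)
DST=/dev/sdY          # clone target (placeholder name)
MAP=/boot/ddrescue.log

# -d: direct disc access, bypassing the kernel cache
# -n: skip the slow scrape phase on the first run; rerun without -n later
#     (the mapfile lets ddrescue resume where it left off)
# -b 4096: match the physical sector size on 4Kn drives
run ddrescue -f -d -n -b 4096 "$SRC" "$DST" "$MAP"
```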
  4. I have copied the important stuff onto another (external) drive. Ran the command with -v and got this: So then ran it with -vL and got this: And also got these notifications:
  5. It's not my machine (I'm trying to sort it out for a friend), but it's safe to assume there won't be any backups. I know there are photos on the array, but I don't know where they are, or whether they are likely to be on any of the impacted drives. Obviously in an ideal world we'll be able to restore all of the drives without losing any data, but in an ideal world he would have paid more attention when the box started having issues (and given it to me sooner). What options do we have, assuming we have no backups to rely on and I need to try and save as much of the data as possible? All of the assistance I have been given so far is very much appreciated.
  6. OK, so I restarted the array in maintenance mode and ran the check with -nv, and this was the output. I assume I now need to run: xfs_repair -v /dev/md5
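For reference, the usual xfs_repair order on an unRAID md device in maintenance mode looks like the sketch below (the device name is taken from the post; the run() wrapper only prints the commands rather than executing them, since these must be run against a stopped/maintenance-mode array).

```shell
# Typical xfs_repair order against an unRAID md device in maintenance mode.
# run() only prints each command; change echo to "$@" to execute for real.
run() { echo "+ $*"; }

DEV=/dev/md5   # Disk 5's md device, as in the post

run xfs_repair -nv "$DEV"   # dry run: report problems, change nothing
run xfs_repair -v "$DEV"    # the actual repair
# Only if xfs_repair refuses to start because of a dirty log:
run xfs_repair -vL "$DEV"   # zero the log first (may drop in-flight changes)
```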
  7. Yes, the disk was unassigned when I started the server, so I reassigned it, but I hadn't started the array until now. I have unassigned Disk 3 and started the array. Disk 3 now resides in the unassigned devices section. Disk 5 isn't happy though, and states it's unmountable. I've attached the updated diagnostics file. tower-diagnostics-20221123-1623.zip
  8. Parity 2 passes a SMART test, but unRAID isn't happy with the results. Disk 3 was unassigned when I booted the server up; when I reassigned the drive, that's when the UDMA CRC message popup came up. The drives are on a miniSAS backplane, so it's unlikely to be a cable issue causing the CRC errors. To the best of my knowledge, nothing has been written to the emulated Disk 3. Thanks
  9. Hi, I'm helping a friend who has some issues with his unRAID server. My understanding is that a parity drive (Parity 2) had issues first and went offline. Then there were issues with a data drive (Disk 3 - sdd), which appears to have UDMA CRC errors. I did SMART checks on the other drives, and it appears Disk 5 (sdh) has issues as well, although SMART hasn't failed the drive (yet). I've got three 14TB drives, which I had originally planned to use to replace both parity drives and Disk 3. Now that Disk 5 also has potential issues, I can get another 14TB drive to replace that too. My question is: can the data from Disk 3 be recovered, or is it lost? If it can be recovered, what's the correct order to do things in? Do I need to copy the data off Disk 5 before it has any more issues? I've attached the diagnostics file as well. Thanks in advance. tower-diagnostics-20221123-1310.zip
  10. Thanks for the response - very much appreciated. New hard drive arriving tomorrow. Fingers crossed I don't have another one die on me with this rebuild.
  11. I had a drive failure a few days ago (Disk 5). I replaced it with a new drive, precleared it, and then started the data rebuild. That finished a few hours ago, but it appears that I have had another drive failure (Disk 8 this time). I am mostly concerned that the data rebuild wasn't successful for Disk 5 - although the message on screen was green and said there were 0 errors. I have checked some random files on Disk 5 and they appear to play back fine. I've attached the diagnostics file; would someone be able to confirm that the rebuild of Disk 5 did complete successfully? I'm paranoid that it wasn't successful and that I have lost data on that drive. Is it common for other drives to fail during a rebuild? I thought the data was reconstructed from information on the parity drives (or is it stored across other drives in the array as well)? Once that's done, I'll have to start the process all over again with Disk 8. Thanks in advance unraid-diagnostics-20210224-1852.zip
  12. Hi, I've been running on 6.5.3 since it was released. I finally upgraded this box to 6.6.7 and I can no longer access the SMB shares via the server name. I thought it might have been an issue with that specific release, so I upgraded to the Next branch release of 6.7.0-rc5, but that has the same issue. Before upgrading I could access \\UNRAID\media from both my Windows 8.1 and 10 machines without issue. Since the upgrade, the only way I can access the box (from the Windows clients) is via the IP address, so \\192.168.1.235\media. If I try to access it via the server name I get a Network Error message - "Windows cannot access \\UNRAID", specifically "The network path was not found". None of the dockers or anything else appears to have any issues accessing the shares on the box. Having searched the forums, most resolutions seem to point back to issues on the client machines, but I do not believe the issue is with the Windows clients in this instance. I have another unRAID server on 6.7.0-rc5 (which was previously running 6.6.7, and 6.5.3 before that) and both Windows clients can access all of its SMB shares as expected. I have checked Settings -> SMB Settings and both servers appear to have the same values. Another thread mentioned removing an extras folder from the flash drive, but I don't have one of those. I checked the Fix Common Problems plugin and that only mentions a few deprecated plugins. Does anyone have any idea what might be causing the issue? It seems odd that one server running this release is fine and the other isn't. Thanks for your help. unraid-diagnostics-20190323-1521.zip
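Some generic name-resolution checks for a situation like this (standard ping and Samba client tools, not anything specific from the thread). The run() wrapper below only prints the commands rather than executing them.

```shell
# Generic checks for an SMB server name that stopped resolving.
# run() only prints each command; change echo to "$@" to execute for real.
run() { echo "+ $*"; }

run ping -c1 UNRAID     # from a client: does the name resolve at all?
run nmblookup UNRAID    # NetBIOS name lookup (Samba client tools)
run testparm -s         # on the server: dump the effective smb.conf
```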
  13. I have tried putting the drive back into its original bay (it's in a 16-bay server case using HBAs and SFF-8087 connectors) and it won't show up. I've moved it to two other bays and it also doesn't show up. I tried a different hard drive in the original bay of this disk, and that is recognised without any issues. I can't leave the drive attached to the SATA connector on the motherboard, so are there any other options as to why Unraid won't recognise it now unless it's connected to the motherboard's SATA connector? It worked fine for months before this. I note these messages are still in the log:
      Feb 6 22:16:02 Media kernel: sd 9:0:3:0: rejecting I/O to offline device
      Feb 6 22:16:02 Media kernel: sdd: unable to read partition table
      Feb 6 22:16:02 Media kernel: sd 9:0:3:0: [sdd] Attached SCSI disk
      Feb 6 22:16:03 Media emhttpd: device /dev/sdd problem getting id
      Any ideas as to whether reformatting the drive on a different system would help? media-diagnostics-20190206-2216.zip
  14. Right, I have connected it to an onboard SATA port and the disk shows up. It's sdd rather than sdl this time. It won't let me run a SMART test against it (either short or long) - when I click on start, it flashes stop and then shows the start button again. Any thoughts on what I need to do next? Thanks media-diagnostics-20190204-2219.zip
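For reference, SMART self-tests can also be started from the command line with smartctl when the GUI button does nothing. The run() wrapper below only prints the commands (run them as root to execute for real); /dev/sdd is the device name from the post.

```shell
# Starting SMART self-tests from the shell when the GUI button does nothing.
# run() only prints each command; change echo to "$@" to execute as root.
run() { echo "+ $*"; }

DEV=/dev/sdd   # the drive's current device name, as in the post

run smartctl -t short "$DEV"   # kick off a short self-test
run smartctl -a "$DEV"         # full report, including the self-test log
# Drives behind some HBAs need an explicit device type:
run smartctl -d sat -a "$DEV"
```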
  15. Hi, Turned on a box I am building for someone else (after it had been off for a few weeks) and one of the two parity drives has disappeared. The GUI tells me it is missing. Looking at the logs, it appears (if I am reading them correctly) that the partition table on the parity drive cannot be read:
      Feb 1 11:55:50 Media kernel: sd 10:0:6:0: [sdl] tag#4 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
      Feb 1 11:55:50 Media kernel: sd 10:0:6:0: [sdl] tag#4 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
      Feb 1 11:55:50 Media kernel: print_req_error: I/O error, dev sdl, sector 0
      Feb 1 11:55:50 Media kernel: Buffer I/O error on dev sdl, logical block 0, async page read
      Feb 1 11:55:50 Media kernel: sd 10:0:6:0: rejecting I/O to offline device
      Feb 1 11:55:50 Media kernel: ldm_validate_partition_table(): Disk read failed.
      Feb 1 11:55:50 Media kernel: sd 10:0:6:0: rejecting I/O to offline device
      Feb 1 11:55:50 Media kernel: sdl: unable to read partition table
      The drive doesn't appear in unassigned drives and I cannot get to any SMART information. I have tried the drive in a different slot (just in case) and that also didn't resolve the issue. Any idea how I can resolve this? I assume I should just be able to format the drive, put it back in the box and rebuild parity from scratch - but as I can't get to the drive, how would I go about this? I've attached the diagnostics file as well. Thanks for your help media-diagnostics-20190201-1157.zip
  16. I'd rather have the option for multiple arrays than wider ones. I have a 48-bay case (and an additional 60-bay expander) which I'd like to use with unRAID, but due to the current protected-drive limit I would have to swap to another product in order to utilize them properly.
  17. Thanks for the response. So in summary, some data on Disk 7 may or may not be corrupt once I restore it to a new disk? And I should also check the new data which was copying off the cache when the issue occurred, just in case. Are there any tools which would assist with checking the files? In the past I've used mediainfo, as that won't show container info if the file is corrupt. I assume I could create a disk share once the restore is complete and just scan that?
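A sketch of the kind of scan described: walk a disk share and flag video files whose container mediainfo cannot read. The share path is a hypothetical example, and it assumes mediainfo is on the PATH; mediainfo's text output starts with a "General" section for readable files.

```shell
# Walk a restored disk share and flag video files whose container
# mediainfo cannot read (often a sign of truncation or corruption).
# SHARE is a hypothetical example path; point it at the real disk share.
SHARE="${SHARE:-/mnt/disk7}"
find "$SHARE" -type f \( -name '*.mkv' -o -name '*.mp4' -o -name '*.avi' \) \
    -print0 2>/dev/null |
while IFS= read -r -d '' f; do
    if ! mediainfo "$f" 2>/dev/null | grep -q '^General'; then
        echo "SUSPECT: $f"
    fi
done
```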
  18. Hi, Disk 7 has gone into an error state with a nice big red cross next to it. I followed the steps in this section of the troubleshooting guide, https://wiki.lime-technology.com/Troubleshooting#What_do_I_do_if_I_get_a_red_X_next_to_a_hard_disk.3F and have the diagnostics from before and after the reboot (see attached). From looking at the info in the syslog, this is when the issue occurred:
      Nov 22 18:23:31 unraid kernel: sd 1:0:12:0: task abort: SUCCESS scmd(ffff8807e28d1080)
      Nov 22 18:23:31 unraid kernel: sd 1:0:12:0: [sdn] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
      Nov 22 18:23:31 unraid kernel: sd 1:0:12:0: [sdn] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 a4 92 1b 98 00 00 00 08 00 00
      Nov 22 18:23:31 unraid kernel: blk_update_request: I/O error, dev sdn, sector 2761038744
      Nov 22 18:23:31 unraid kernel: md: disk7 read error, sector=2761038680
      Nov 22 18:23:31 unraid kernel: md: disk7 read error, sector=5824529848
      Nov 22 18:23:31 unraid kernel: md: disk7 read error, sector=5824529856
      Nov 22 18:23:31 unraid kernel: md: disk7 read error, sector=5824529864
      Looking at the SMART info for Disk 7, I can see that the Reallocated_Sector_Ct isn't great:
      5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 480
      I ran a quick SMART test on the drive after the reboot and it appeared to get stuck at 90%. The sector count increased to:
      5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 536
      So at this stage I assume I'd better RMA the drive back to Seagate, as if it's not dead yet it will be soon? I have two questions with regard to replacing the drive, as this will be the first time I've had to do it with unRAID and I don't want to do anything stupid and lose any data. I have no idea if anything was being copied specifically to this drive at the time of the failure, but I was moving stuff off my cache into the main array.
How would I know if any of the data I was copying at the time has become corrupt - or, more to the point, how does unRAID deal with a write failure? Looking at the re-enable a drive section (https://wiki.lime-technology.com/Troubleshooting#Re-enable_the_drive) seems to indicate the data went to an emulated drive, so it should be OK and I won't have to hunt for the file(s) which may now be corrupt - is that assumption correct? Reading the Replacing a Data Drive section (https://wiki.lime-technology.com/Replacing_a_Data_Drive), is the following procedure correct for replacing the bad drive?
      1. Stop the array
      2. Unassign the old drive if still assigned (to unassign, set it to No Device)
      3. Power down
      4. [Optional] Pull the old drive (you may want to leave it installed for preclearing or testing)
      5. Install the new drive
      6. Power on
      7. Assign the new drive in the slot of the old drive
      8. Go to the Main -> Array Operation section
      9. Put a check in the Yes, I'm sure checkbox (next to the information indicating the drive will be rebuilt), and click the Start button
      Does the checkbox mentioned in step 9 appear once you have unassigned the old drive and reassigned the new drive, as this option is currently available with the rebooted and stopped array? Thanks for any help, it's very much appreciated. unraid-diagnostics-20171122-2037.zip unraid-diagnostics-20171122-2209.zip
  19. Not sure if this is related to running the beta or not. I came home yesterday and all of my drives said they were missing. Rebooted the machine and they all came back. Left it running overnight and then switched it off again this morning. When I came home this evening and started it up again, once again all of the disks were missing. Rebooted, and all but one came back (Disk 7 was missing). Rebooted again and all the drives were back. A few minutes after starting the array I received the message that Disk 7 had returned to normal operation (sdi). There may have been a power cut yesterday (I can't be 100% certain, as the server was running and the uptime on the web GUI said it had been running for 7 days). Anyone got any ideas what the issue might be? My thinking was that if there was a power cut it may have borked some hardware - the motherboard has an onboard LSI controller which is in IT mode, and the Supermicro case has a SAS expander backplane - either of which may have been affected. I've run a short test against the hard drives (with no issues reported), but my feeling is that it isn't related to the hard drives. I've attached the diagnostics from the most recent reboots. Any help appreciated. unraid-diagnostics-20160512-2106.zip unraid-diagnostics-20160512-2116.zip
  20. Just started getting this error today whilst running the latest beta. I'm in the process of copying everything onto the unRAID server (from various Windows machines) and today that copy seemed really slow. The copy stopped with the following error message: Error 0x8007003B: An unexpected network error occurred. I captured the diagnostics file after this error occurred. It's worth noting that I was copying files from a Windows machine, but also trying to move several files from one share on unRAID to another, via a Windows PC. I then couldn't access any other files on the unRAID machine. I tried to stop the array and reboot. That didn't work. I also tried to use the powerdown plugin, which also didn't work, so I did a hard reboot. After the box came back up, I tried to copy one of the same files which was copying when the transfer died the first time (a copy from one unRAID share to another, via a Windows machine). This caused the same unexpected network error again. I've attached the diagnostics file for the second failure as well. I've searched the forum and this error appears to have occurred frequently (so it probably isn't related to the latest beta). The machine has only been in service for 6-8 weeks. I ran memtest on it before starting. All the hard drives went through three rounds of preclear before being used. If I can't copy files either internally or externally to my unRAID machine I have a bit of an issue. Does anyone have any idea what's causing it and how to resolve it? Thanks unraid-diagnostics-20160426-2032.zip unraid-diagnostics-20160426-2232.zip
  21. From recent experience of preclearing 11 8TB drives, make sure you make your /var/log bigger. The advice I was given was the following; mount -o remount,size=256m /var/log I only ran 6 in parallel. As already mentioned although CPU and memory usage didn't get very high, the gui was unresponsive at times. That was on a box with 64GB of RAM.
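To see whether /var/log is actually filling up, and to confirm the remount above took effect, a quick check with standard coreutils (the remount line is commented out here since it needs root and a live unRAID box):

```shell
# Check how full the /var/log tmpfs is, before and after resizing it.
df -h /var/log
# The resize itself (as root; not persisted across reboots, so it would
# need re-running, e.g. from the go file on the flash drive, to survive):
# mount -o remount,size=256m /var/log
df -h /var/log
```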
  22. I had an issue with beta20 (that thread is locked so I thought this was the best place to mention it) which occurred when copying files from Windows onto my unRAID server. It appears to have corrupted files as part of the process and I had to hard reboot it. A more lengthy description and diagnostics file is in this thread, http://lime-technology.com/forum/index.php?topic=48209.0 Any help appreciated.
  23. I have had a closer look at the files that were copied as part of this process and rather worryingly it appears that the final 4 files that were copied before the issue took place have been corrupted. A little more concerning is that they were copied using TeraCopy and that successfully copied them based on the checksums matching. However when I try and play any of the files, although they start, they eventually hang before reaching the end.
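A checksum matching at copy time doesn't guarantee what is on disk later; a sketch of re-verifying a copy after the fact, independently of the copy tool's own checksumming (the paths are hypothetical examples):

```shell
# Re-verify a copy after the fact: hash the source and destination copies
# independently and compare. SRC/DST are hypothetical example paths.
SRC="${SRC:-/mnt/windows/video.mkv}"
DST="${DST:-/mnt/user/media/video.mkv}"
a=$(md5sum 2>/dev/null < "$SRC" | cut -d' ' -f1)
b=$(md5sum 2>/dev/null < "$DST" | cut -d' ' -f1)
if [ -n "$a" ] && [ "$a" = "$b" ]; then
    echo "MATCH"
else
    echo "MISMATCH or unreadable"
fi
```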
  24. Hi, Still trying to find my feet with unRAID. I had an issue last week when running a number of preclears in parallel, https://lime-technology.com/forum/index.php?topic=47893.0 but this new issue appears different. I was copying several TB of video files from my Windows server to unRAID via TeraCopy. When I left it copying this morning it was fine. When I returned this evening the transfer had stopped and the TeraCopy error was "The specified network name is no longer available". I can browse to unRAID via a Windows machine. It lets me go through all the shares (none appear to be missing); however, if I double-click on a video file to play it, it just hangs. Nothing in the dashboard indicates there's an issue with the array. I can't see any issues in the syslog either. I checked the /var/log folder, as that's what caused the issue last week, and it isn't very full, so I don't believe it's that. I'm currently running 6.2.0-beta20. I've also got a few dockers running which I was playing around with over the last week. Now, I assume if I reboot the unRAID machine the issue will be resolved. Currently I feel like I'm flying blind, as I don't really know where to start looking to find where the issue might be. Does anyone have any advice as to what could be the issue, and also where you start looking for clues as to what could have caused it? I've attached a diagnostics file. I've had a quick look at it, but as I don't really know what to look for, nothing immediately springs out as to what could have caused the share to disappear from Windows. I've seen mentions of unRAID being a little flaky with SMB, so is this just one of those issues? Once again, if it is, how do you diagnose that? Any help, much appreciated. unraid-diagnostics-20160406-1914.zip
  25. Hi, I've had similar issues to others with the new beta plugin with regards to it filling up the /var/log space. I started a thread here, as I didn't realise it was due to the plugin: https://lime-technology.com/forum/index.php?topic=47893.0 I've run: mount -o remount,size=256m /var/log and I can see that has extended the space. Will it have broken the two preclears I have been running against some 8TB Seagate drives, or will they have kept on running? It also appears to have had the side effect of preventing me from accessing my unRAID shares from a Windows machine (but I assume that's due to the /var/log space filling up).