snatchos

  1. I decided to skip the SMART test for now just to get my music off of disk9. If I can accomplish that, which I'm doing right now, then pretty much everything else is replaceable/backed up on another server. So if I get my music off disk9 I'm just inclined to remove disk1 and disk9 completely and build a new array starting with 10 of the 12 data disks and rebuilding parity on a new, larger parity drive at the same time. My number one priority is getting the music off disk9 before anything catastrophic happens to it - is that paranoid?
  2. Thanks for the feedback, everyone. Work and life had to take priority this week and I'm just getting back to this. Today when I rebooted my server the dashboard showed an error on Disk9. I've decided my number one priority right now is to get all my music off Disk9, and I'm working on doing that. I synced my server with a friend of mine several years ago and he has everything that's on Disk1, so I can get that back if I lose all that data. Given that it looks like Disk1 and Disk9 are in bad shape (and now I know I have a backup of Disk1), I'm thinking of getting all my music off Disk9 and then setting my server up as a new configuration. While doing so I would increase the parity drive size. I have two 8TB drives that I'd use for this purpose (one for parity, one for data). Does this seem like a good idea? I'm sure there are threads out there with instructions, but I may continue to reach out for help and advice on this thread. Any thoughts on my proposed course of action?
  3. I'm not sure what's going on. I'm on another computer, and looking at the GUI it doesn't look like the self-test is running on disk1 after all, so I'm not sure why the browser on my main PC is showing the self-test as 90% complete. Maybe I need to clear the browser cache? Perhaps I should just reboot the server and start the self-test again. Is there any way to kick off the self-test through telnet instead?
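On the question of starting the test from a telnet session rather than the GUI: a minimal sketch, assuming `smartctl` (from smartmontools) is available on the server, that only builds the command lines involved. The device path `/dev/sdc` is an assumption taken from the "disk1 (sdc)" attachment name; device letters can change between boots, so it should be checked before use. The helper name `selftest_commands` is hypothetical.

```python
import shlex


def selftest_commands(device: str) -> dict[str, list[str]]:
    """Return smartctl argument lists for an extended self-test on one device."""
    return {
        # Start the extended (long) self-test; the drive runs it internally.
        "start": ["smartctl", "-t", "long", device],
        # Show drive capabilities, including the self-test execution status.
        "status": ["smartctl", "-c", device],
        # Show the log of completed (or aborted) self-tests.
        "log": ["smartctl", "-l", "selftest", device],
        # Abort a self-test that is currently running.
        "abort": ["smartctl", "-X", device],
    }


if __name__ == "__main__":
    # /dev/sdc is an assumed device name; confirm it on the Main page first.
    for name, argv in selftest_commands("/dev/sdc").items():
        print(f"{name:>6}: {shlex.join(argv)}")
```

The printed lines can be typed into the telnet session as they are. The sketch deliberately does not run anything itself, so nothing touches a drive until a command is entered by hand.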
  4. When I went to do an extended test on disk9, I clicked the start button and nothing happened. So I went back to disk1 to do an extended test there, and an extended test (the one I kicked off last night, I guess) is still in progress. It's 90% complete after about 16 hours; it's taking so long that it's maybe not a good sign. Anyway, I'll wait until the disk1 test is finished and then go back to the disk9 test. Once disk1 is done, do I need to post full diags, or is there somewhere I can find just the self-test info for the drive? Maybe that will become obvious to me once the test finishes.
  5. Thanks JorgeB. Any idea why they would start running the self-test and not complete? Any idea why I keep getting a 404 Not Found error when I don't usually receive those? I'll try kicking off another extended self-test on disk9 and see how it goes. Possibly the self-test stopped because I had a spin-down delay of 1 hour. I've set it to 'Never' for now, so hopefully the self-test will complete this time.
  6. Last night I powered down my server and moved it to a more open space just to ensure there would be no issues with airflow when doing the extended self-tests for disk1 and disk9. I set them running and left them going overnight. When I came back to the Unraid GUI it wasn't clear to me whether they were running or not. When I opened the GUI in a new tab I got a 404 Not Found error. I've been seeing this fairly regularly over the last few days and it's not something I've ever experienced before. In order to see the GUI again I have to close the tab, close out of the browser, then reopen the browser and navigate to tower\main again. So that's probably not a good sign, but anyway... I'm not sure whether the extended self-tests are still going on or not. I kicked them off about 15 hours ago, and both are 3TB drives. So I'm not sure I have any update right now. In any event, I've downloaded the syslogs and the diags zip file and am attaching them here. Is there a way to tell if the self-tests are still happening? syslog.txt tower-diagnostics-20210505-1217.zip
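One way to tell whether a self-test is still running, independent of the browser, is to read the "Self-test execution status" section that `smartctl -c` prints for ATA drives. The sketch below parses that text. The `SAMPLE` string is illustrative of the usual layout, not output captured from these drives, and the function name `percent_remaining` is hypothetical.

```python
import re
from typing import Optional

# Illustrative text in the layout smartctl -c typically uses while a test runs.
SAMPLE = (
    "Self-test execution status:      ( 249)\tSelf-test routine in progress...\n"
    "\t\t\t\t\t90% of test remaining.\n"
)


def percent_remaining(smartctl_c_output: str) -> Optional[int]:
    """Return the percent of the self-test still to run, or None if none is reported."""
    match = re.search(r"(\d+)% of test remaining", smartctl_c_output)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    remaining = percent_remaining(SAMPLE)
    if remaining is None:
        print("No self-test in progress (check the self-test log for results).")
    else:
        print(f"Self-test in progress, {remaining}% remaining.")
```

Note that smartctl reports the percentage remaining rather than the percentage completed, so a value that never changes across several checks suggests the test has stalled or been interrupted.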
  7. Thanks for clarifying. Am I able to run the extended test on both drives at the same time? Or do I need to do them one at a time?
  8. Thanks JorgeB. I power cycled and the SMART report for disk9 is now in the diagnostics. I'm attaching it for reference. My next step is to run one or more extended SMART tests. I was planning to run them on disk1 and disk9. You've also suggested I run another on disk3. Did you see something in the logs that looks concerning? Should I run an extended SMART test on the parity drive as well? Am I able to run multiple extended SMART tests concurrently? Thanks again for the help. WDC_WD30EZRX-00MMMB0_WD-WCAWZ2186031-20210503-1511 disk9 (sdm) - DISK_DSBL.txt
  9. Just to be 100% clear on my earlier comment: while I am in a position to buy a new prebuilt server, it is very much a last resort. It is a lot of money, and I would much rather spend time and a much smaller amount of $ on a few new drives than buy a whole new server. My number one priority here is ensuring that I don't lose any data if possible. I'm assuming that this is possible by adding drives to the server, mounting them (but not adding them to the array), copying the data over from the sketchy drives, and then putting the new drives, now holding the data, into the slots where the dodgy drives were. But I'm really not sure how to do this and I don't want to do it incorrectly and risk losing data through my own ignorance or lack of understanding. Thanks again for the advice/input/suggestions, as I'm kind of dead in the water without your expert input.
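The copy step described above can be sketched as a copy followed by a checksum comparison, so that a silent read or write error does not go unnoticed. This is a minimal sketch, not an Unraid procedure: the function names are hypothetical, and the example paths in the usage note are assumptions. It reads each source file twice (once to copy, once to hash), which is extra work for a failing disk.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large media files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_tree_verified(src_root: Path, dst_root: Path) -> list[Path]:
    """Copy every file under src_root into dst_root.

    Returns the source files that could not be read, written, or verified.
    """
    failed: list[Path] = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        dst = dst_root / src.relative_to(src_root)
        try:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copies contents and timestamps
            if sha256_of(src) != sha256_of(dst):
                failed.append(src)
        except OSError:
            # An I/O error on one file should not stop the rest of the copy.
            failed.append(src)
    return failed
```

A call might look like `copy_tree_verified(Path("/mnt/disk9/Music"), Path("/mnt/disks/newdrive/Music"))`, where both paths are assumed examples. Any paths returned in the list are the files to retry or investigate.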
  10. Thanks for the comments. When I read codefaux's reply it makes me very worried, sounding like I'm at risk of losing one of my data drives (disk 1, the WD) and parity (the Seagate). Then there's the drive that is disabled (disk 9, WD) that may or may not be compromised (or it may be that the controller just dropped the drive). I didn't include the log for disk 9 as it says it was skipped due to it being disabled. JorgeB's response makes me feel not quite as bad, but I'm still worried about losing the array and a bunch of data. The temperature situation is weird: my server normally doesn't run that hot. Right now all drives are spun up and all are sitting between 37 and 39 C. The airflow situation of the server is, I think, okay and the fans are running. So I'm not sure why it was running so hot yesterday.
Codefaux, it sounds like you're saying that my priority should be to get the data off disk 1 (the WD with the one uncorrectable error logged) and then replace that with a new drive. It sounds like you're also saying to replace the parity drive (the Seagate with eight uncorrected errors). Is that right? Can't I only replace one drive at a time? My understanding was that if I lose two drives then I lose the whole array (but I'm a newbie despite running an Unraid server for 15+ years). Then what am I supposed to do about disk 9, which is disabled? All of my music collection is on there. If there's anything that's irreplaceable on the server it's that music on disk 9 and some personal data on disk 8. I think I should prioritize getting the data I actually need to keep off first and then worry about fixing the other drives. But I don't want to risk bringing down the entire array by copying irreplaceable data off the server if that's likely to cause issues for the entire server.
Just for my own understanding (sorry for my ignorance): is it true that I can copy all data off disk 1 (the WD) onto a new 3TB disk, swap both the data disk (with the copied data) and the parity (with a fresh 3TB drive) into the correct bays in my server at the same time, and then rebuild the parity drive? Or do I need to replace disk 1 and run a parity check, and then replace the parity drive and rebuild parity? And at what point do I deal with disk 9? Is it worth running an extended SMART test on that drive as a step 1 to see if there's actually anything wrong with it? It feels to me like I'm on the verge of a catastrophic situation of losing a bunch of data. If this isn't the case, I'd appreciate it if someone could walk me back off the ledge.
I'm fortunate enough to be in a position where I can afford to buy a new prebuilt server from Greenleaf (who built my current server around 10 years ago). I've already got someone willing to buy my existing rackmount server and populate it with their own drives, which will help offset the cost slightly. This is an option for me, assuming that my existing server will "hang on" long enough for the new server to be built and shipped and for me to get all the data off the old server and onto the new one. Given the current state of the drives on my server, is that asking too much? I'm attaching the diagnostics zip file as well for more info.
Sorry for all the questions. I'm a long-time Unraid user but have always been a "set it and forget it" type of user because Unraid has always been a bulletproof solution for me. I welcome and appreciate any and all advice, thanks so much for the help. tower-diagnostics-20210502-0825.zip
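The one-drive-at-a-time question above comes down to how single parity works. The toy example below assumes plain XOR parity across equal-sized blocks, which is how a single parity drive is generally understood to work; the three tiny "disks" are made-up data, not anything from this array.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three made-up data "disks" and the parity computed across them.
data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity = xor_blocks(data)

# One disk missing: parity plus the surviving disks gives it back exactly.
rebuilt = xor_blocks([parity, data[0], data[2]])
assert rebuilt == data[1]

# Two disks missing: parity plus the survivor only yields the XOR of the two
# missing disks, which is not enough to recover either one on its own.
combined = xor_blocks([parity, data[0]])
assert combined == xor_blocks([data[1], data[2]])
assert combined not in (data[1], data[2])
```

This is why a single parity drive can stand in for exactly one missing disk. As far as I understand Unraid's design, each data disk also carries its own filesystem, so disks that are still healthy stay readable even when more drives fail than parity can cover; only the contents of the failed disks are at risk.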
  11. Sorry, this is a bit of a long post. The short version is:
(1) one of my data drives is currently disabled (since yesterday)
(2) my parity drive is showing a SMART error in the dashboard (but has a green ball on Main screen)
(3) one of my data drives is showing a SMART error in the dashboard (but has a green ball on Main screen)
I would really appreciate input/advice on the following:
(a) Are my parity and data drive that are showing SMART errors on the dashboard “okay”? If so then I can proceed to (b)
(b) What happened to my data drive (why did it become disabled)?
(c) What is the best way to rebuild my disabled data drive?
(d) MOST IMPORTANTLY – Am I at imminent risk of losing any (all?) of the data in my array?
I’m attaching the following:
(i) a screenshot of my dashboard showing the SMART error on the parity and disk 1 devices.
(ii) my syslog
(iii) log files for parity and disk 1
(iv) A file showing the PCI devices (not sure if this is useful)
This server was built for me by Greenleaf and has been running with no problems for at least a decade. I really appreciate any input/advice from the experts here. Thanks for any help you can provide.
The long story (with some more details):
This morning I noticed a disabled icon next to one of the drives on my dashboard. I recalled that when moving some items over to that particular drive yesterday I had some sort of communication blip (“network location not available”) – I skipped over the file and everything else finished copying. Then I went back and moved the last file over. That must’ve been when the drive became disabled. I say that because I was looking at the dashboard yesterday to see which drives had the most space. I’m sure I would’ve noticed the little x next to the drive in question showing it as disabled. I made a rookie mistake of rebooting my server without first taking a syslog so I don’t have anything to show what went wrong (but the syslog I attached has data from yesterday so maybe it’s okay).
My primary concern is ensuring no data loss (outside of what I’m willing to lose). The drive in question is a 3TB drive with a little over 2TB of data on it. There’s content on there that I was planning to get rid of at some point, but that’s the only data on the 30+TB array that I’m “willing” to lose. There’s a little less than 1TB of data on that drive that I actually need.
I would appreciate some advice on next steps. My inclination is to copy the data I want to keep onto either Dropbox or the new HTPC I’m setting up. Copying that much data over may take a while, but it’s kind of a “set it and forget it” task. I can leave the copy task running and get on with my life. Once I have all the data that I need copied over from that drive, my intention is to rebuild that drive onto itself. If it works then great, I can run a parity check and, if it’s successful, be happy with the server. If it fails then I can buy a new 3TB drive, preclear it, and rebuild the data drive on that. I’m also thinking that, if there is some sort of issue with the drive and/or controller, then the copy task will identify the issue (i.e. there will be some communication glitch and the syslog will help identify what’s going on).
I also found on the dashboard that two of my drives (parity and disk 1) have a SMART error showing. They both have green icons on the Main screen and are currently enabled. I’m wondering whether I should be concerned about this? Are these drives at risk of failing and, in combination with the disabled drive, taking out my whole array?
Thanks for taking the time to read (if you made it this far). Any and all advice/input on next steps would be appreciated. Thanks in advance.
syslog.txt ST3000DM001-9YN166_W1F07ZZC-20210501-1212 parity (sdo).txt WDC_WD30EZRX-00MMMB0_WD-WCAWZ2214731-20210501-1212 disk1 (sdc).txt System Devices - PCI Devices.txt
  12. Thanks Joe. Even though I had formatted the drive in Windows, when I connected it back to my PC it was showing as uninitialized and I was getting cyclic redundancy errors. So I reformatted a fresh drive and mounted it via unmenu, then mounted it with write access and then shared it. But the share button never changed to unshare. At the bottom of the page it says sharing /dev/sdm1 as sdm1. But when I navigate to it in MC I don't see the 'Movies' folder I created. I do see it on my network as a share, but when I try to access the folder it prompts me for a password. I've tried 'root' and I get an error message. So my drive's mounted and appears to be shared, but I can't navigate to the 'Movies' folder I created after formatting in Windows. When I try to copy a movie folder using MC to the root directory of mnt/disk/sdm1 I get an error that the folder already exists there, which it doesn't. So I'm kind of stuck on how to proceed. Any thoughts?
  13. I'm not even sure how to ask this question! Sorry if this is the wrong forum. I wanted to plug a SATA drive into my server's USB port so I could copy stuff directly onto it using mc. After hours of research I settled on installing unmenu and the ntfs-3g package. Then I went and mounted the drive (already formatted NTFS with folders for Movies and TV Shows) so I could start copying files onto it. I got some weird message about an unrecognized format and it asked if I wanted to format with reiserfs, which I don't, because the drive is going into a media player once it's been filled with content. Now I can't seem to access the drive or the folders on the drive, or unmount it, or do anything with it in unmenu or telnet. It shows up in Disk Management like this: usb-WDC_WD20_EZRX-00DC0B0_000000000000-0:0 * 36893488T /dev/sdm I have no clue about Linux. Could someone please help me figure out how to (1) unmount the drive and then (2) mount the drive so I can begin copying content over with Midnight Commander? Any help much appreciated. Thanks!