[OmniOS-discuss] Overheating faults with ST4000NM0023

Schweiss, Chip chip at innovates.com
Thu Apr 17 15:40:02 UTC 2014


You can get the Seagate firmwares from this link:

https://apps1.seagate.com/downloads/request.html

It seems they no longer link to this on their site. I found it in an old
email from their site.

-Chip


On Tue, Apr 15, 2014 at 5:30 PM, Saso Kiselkov <skiselkov.ml at gmail.com> wrote:

> Hi,
>
> I've hit this exact same issue on my recent SEAGATE ST2000NM0023 drives.
> Can you please direct me to where I can get the firmware package?
> Perhaps we could also post the link publicly, so that people can find it
> through Google or some similar method.
>
> Thanks!
>
> Best wishes,
> --
> Saso
>
> On 2/13/14, 11:18 AM, Thibault VINCENT wrote:
> > On 02/12/2014 09:59 PM, Steamer wrote:
> >> Did you ever find a solution to the overheating faults with the
> >> ST4000NM0023?
> >>
> >> I'm currently having the exact same issue with ST1000NM0023 drives;
> >> it seems Seagate has the user temperature probe set at 40°C. The manual
> >> states that the temperature settings are programmable via SMART, but I
> >> haven't found a way to do that.
> >
> > Hello Emile,
> >
> > I've found a workaround, but I guess the definitive fix should be
> > handled in Illumos. There is no open ticket; I was first waiting for
> > something to happen with #4051 before going back to using that distro
> > and kernel.
> >
> > Here's the story:
> > The SCSI specification defines two registers to store the temperature
> > thresholds in SMART data. One contains the recommended maximum operating
> > temperature for best MTBF, and the other holds the absolute maximum
> > rating. The industry has usually put the same value in both, namely the
> > absolute maximum. That's why we always see something like 60/65°C from
> > SMART. But recently Seagate changed that because a large OS company
> > asked it to comply with the specification for better hardware
> > monitoring integration. The change did not only occur in newer products
> > but also in a firmware update for existing disks, and that update was
> > applied on the production line, which explains why some disks may or
> > may not exhibit this problem although they are the same model. Our
> > disks are of the Megalodon series and all share the same firmware code
> > base.
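
To make the two values described above visible, here is a minimal sketch
that reads the output of `sg_logs --page=0x0d` and prints the current
temperature, the reference temperature, and the remaining margin. The
function name `temp_margin` is hypothetical, and the line format is an
assumption based on the 'Reference temperature = 68 C' string that is
grepped for further down in this thread.

```shell
#!/bin/sh
# temp_margin (hypothetical helper): reads temperature log page text on
# stdin and prints "<current> <reference> <margin>" in degrees C.
# Assumes lines of the form "Current temperature = 34 C" and
# "Reference temperature = 68 C".
temp_margin() {
  awk '
    /Current temperature =/   { cur = $4 }
    /Reference temperature =/ { ref = $4 }
    END {
      # Fail if either value was not present in the input.
      if (cur == "" || ref == "") exit 1
      print cur, ref, ref - cur
    }'
}
```

It would be used as `sg_logs --page=0x0d /dev/sdX | temp_margin`, where
/dev/sdX is a placeholder for one of the affected drives.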
> >
> > So any Seagate disk will now trigger faults in FMA if it has a
> > firmware with the newer policy. I also think other brands will follow
> > the same path.
> >
> > As other members suggested in that thread, maybe nothing should change
> > in FMA, but let's face it: you can't keep the temperature steadily
> > under 40°C in a JBOD of hundreds of busy disks, especially in
> > eco-friendly datacenters. IMHO we should not trigger a fault on the
> > lower threshold, and certainly not a drive retirement. It breaks storage
> > servers on reboot or before a pool import, and spare disks could
> > disappear when the retirement is triggered.
> >
> > The workaround is to downgrade the firmware to the last version before
> > the change, and to reset the register with a SCSI command. It is not
> > possible to set the register to a user-specified value as the
> > documentation suggests; they confirmed this.
> >
> > I'm sending a working firmware to you in a private mail. I'm not aware
> > of any issue working with that older version and hopefully it should
> > upload to 1TB drives as well.
> > I'm applying it like this but from Linux not OmniOS:
> > # ./dl_sea_fw-0.2.3_32 -f Megalodon_StdOEM_SAS_0002+C84C.lod -m ST4000NM0023
> > # ./dl_sea_fw-0.2.3_32 -i
> >
> > Then you should reset the drives so they reload the firmware.
> > Here's our example for 4TB drives:
> > -------------
> > for i in $(lsscsi | grep 'ST4000NM0023' | awk '{print $6}') ; do
> >   sg_reset -d $i
> > done
> > -------------
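
The loop above depends on the device node being the sixth column of the
lsscsi output. As a slightly more defensive sketch, the helper below
takes the last field instead and skips entries that have no /dev node.
The name `devices_for_model` is hypothetical, and the assumption is that
lsscsi prints the device node last and a '-' when there is none.

```shell
#!/bin/sh
# devices_for_model MODEL (hypothetical helper): reads lsscsi output on
# stdin and prints the device node of every line that mentions MODEL.
# Lines whose last field is not a /dev path are skipped.
devices_for_model() {
  awk -v model="$1" '
    index($0, model) && $NF ~ /^\/dev\// { print $NF }'
}
```

For example: `for i in $(lsscsi | devices_for_model ST4000NM0023) ; do
sg_reset -d "$i" ; done`.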
> >
> > Then reset the register, which still contains the value from the
> > previous firmware. The reset doesn't work reliably, so we run this
> > script a few times until all disks have taken it. Again, it matches 4TB
> > Megalodon.
> > -------------
> > for i in $(lsscsi | grep 'ST4000NM0023' | awk '{print $6}') ; do
> >   echo -n "$i "
> >   if sg_logs $i --page=0x0d | grep 'Reference temperature = 68 C' > /dev/null ; then
> >     echo 'ok'
> >   else
> >     sg_logs $i --page=0x0d --reset
> >     echo 'reset'
> >   fi
> > done
> > -------------
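
Since the script above has to be run a few times by hand, here is a
rough sketch of the same check-and-reset step wrapped in a bounded retry.
The function names `retry_until`, `reference_ok` and `reset_once` are
hypothetical; the sg_logs invocations and the 68 C value are the ones
from the script above, and the attempt limit is an arbitrary assumption.

```shell
#!/bin/sh
# retry_until MAX CMD...: run CMD up to MAX times and stop at the first
# success. Returns 1 if every attempt failed.
retry_until() {
  max=$1; shift
  n=1
  while [ "$n" -le "$max" ]; do
    if "$@"; then return 0; fi
    n=$((n + 1))
  done
  return 1
}

# reference_ok DEV: succeeds when the reference temperature reads 68 C.
reference_ok() {
  sg_logs "$1" --page=0x0d | grep -q 'Reference temperature = 68 C'
}

# reset_once DEV: reset the log page if needed, then check again.
reset_once() {
  reference_ok "$1" || { sg_logs "$1" --page=0x0d --reset; reference_ok "$1"; }
}
```

A call such as `retry_until 5 reset_once /dev/sdX || echo "/dev/sdX still
not reset"` would then report drives that never took the reset, with
/dev/sdX standing in for each device found by lsscsi.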
> >
> >
> > Cheers
> >
>