• DocumentCode
    2703159
  • Title
    Post-silicon validation: It's the unique fails that hurt you
  • Author
    Ahuja, P.K.
  • Author_Institution
    Sun Microsystems, USA
  • fYear
    2009
  • fDate
    1-6 Nov. 2009
  • Firstpage
    1
  • Lastpage
    1
  • Abstract
    We have found that logic, design, and architectural bugs do not determine the difficulty of bringing up a new microprocessor. Anything that can be reproduced in simulation can be fixed rapidly. The bugs we remember are the ones that are hard to reproduce, occur sporadically, and do not fail consistently with voltage or temperature. We describe one such bug, called SSEL after the system error message it caused, which one test engineer called the strangest bug of his long career. It affected only one output and did not occur on other, similar outputs. It never failed on a consistent schedule. Failure rates showed a strong correlation with wafer location. Finally, one of the best system-level tests for the failure was letting the system sit idle at the command-line prompt, since the failure was not related to system activity. We will describe the characteristics of the bug, the results of experiments with it, our mitigation strategy, our fix, and the root cause. Reliability and availability features built into our servers allowed us to protect customers from the impact of the problem. We will show a large amount of real data from the effort to find the cause of this problem.
  • Keywords
    Availability; Computer bugs; Engineering profession; Logic design; Microprocessors; Protection; Sun; System testing; Temperature; Voltage;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Test Conference, 2009. ITC 2009. International
  • Conference_Location
    Austin, TX, USA
  • Print_ISBN
    978-1-4244-4868-5
  • Electronic_ISBN
    978-1-4244-4867-8
  • Type
    conf
  • DOI
    10.1109/TEST.2009.5355606
  • Filename
    5355606