Jim Gray and D. Siewiorek. High-availability computer systems. IEEE Computer, 24(9), 1991. Includes a criticism of N-version programming.

This paper is cited in the following contexts:
An Evaluation of the Recovery-Related Properties of Software Faults - Chandra (2000)   (Correct)

....But these applications have a high rate of failure and faults in them are directly visible to the common man or woman. Unfortunately, we have not yet succeeded in building fault free computer systems. Computers fail due to a variety of problems with their hardware and software. Field studies [Gray91] and everyday experience show that the dominant cause of failures today is software faults, both in the application and system layers. Reducing the number of software faults and surviving the ones that remain has been an important challenge for the fault tolerance community. Researchers have ....

....in the kernel text and modifies the instruction to reflect the kind of fault we intend to inject. CHAPTER 2: SOFTWARE FAULT RECOVERY AND ASSUMPTIONS. 2.1 Introduction. As computers become an integral part of today's society, making them dependable becomes increasingly important. Field studies [Gray91] and everyday experience make it clear that the dominant cause of failures today is software faults, both in the application and system layers. Reducing the number of failures caused by software faults is therefore an important challenge for the fault tolerance community. The best way to ensure ....

[Article contains additional citation context not shown here]

Jim Gray and Daniel P. Siewiorek. High-Availability Computer Systems. IEEE Computer, 24(9):39--48, September 1991.


Reducing the Cost of System Administration of a Disk Storage.. - Asami (2000)   (18 citations)  (Correct)

....even with an operator with an around the clock pager that can be called in even during the middle of the night, cannot be expected to recover in a few minutes, let alone a few seconds like our design. It usually takes much longer for human operators to repair problems than machines do [GS91]. 2.3 Our Approach. I am proposing a self maintaining approach to system administration, in which the storage system maintains itself with only minimal help from a human operator. Instead of having someone constantly on call look after the system, our system is designed to mask problems and ....

....is important to reduce the system's down time as seen from across the Internet. Note that a system that relies on an operator to keep it running is not as available as one that maintains itself, as it will take minutes or maybe even hours for the operator to actually be able to repair the damage [GS91]. Our goal is to have the system repair any interruption of service within a few seconds, and continue to function unattended until the next scheduled visit by the operator. For a web server application such as the one we are running, this is illustrated by the slogan "repair by reload". As Mary ....

Jim Gray and Daniel P. Siewiorek. High-availability computer systems. IEEE Computer, 24(9), September 1991.


RAID Organization and Performance - Schwarz (1992)   (3 citations)  (Correct)

....RAID organization that accommodates multiple failures within reliability groups while retaining its excellent storage utilization, response time, and fault recovery properties. These schemes constitute a step towards the high availability computer systems recently advocated by Gray and Siewiorek [5]. We are concerned with exploring the performance improvements that are available within very large disk arrays. We will consider workloads featuring operations that involve a small quantity of data typical of database transactions. We are interested in considering the run time effects of various ....

Jim Gray and Daniel P. Siewiorek. "High-Availability Computer Systems," COMPUTER, pp. 39-48, 1991.


Fault Tolerance and Scalability in DSM Coherence Protocols - A.. - Shah (1997)   (Correct)

....issues related to, reliable DSM systems. 2.3.1 Terminology. Fault tolerance discussions benefit from terminology and concepts developed by the International Federation for Information Processing Working Group 10.4 and by the IEEE Computer Society Technical Committee on Fault Tolerant Computing [26]. We may view a system as consisting of multiple modules, which are in turn composed of sub modules. A module has an ideal specified behavior and an observed actual behavior. A failure is deviation of the actual behavior from the specified behavior. A failure is caused by an error, which is a ....

Jim Gray and Daniel P. Siewiorek. High-availability computer systems. IEEE Computer, pages 39--48, September 1991.


A Scalable Key Distribution Hierarchy - McDaniel, Jamin (1998)   (Correct)

....these channels independent, an enterprise root can validate received certificates. Users may also opt to validate certificates through several independent channels, resulting in increased confidence in their authenticity. This approach is similar to mechanisms for high availability proposed in [GS91]. The architecture must be flexible. The topology of the Internet is constantly changing, so the architecture and underlying protocols must not be dependent on the physical connectivity or location of any singular authority. Mobility of users is of equal importance. As users travel from one domain ....

Jim Gray and Daniel P. Siewiorek. High-Availability Computer Systems. IEEE Computer, 24(9):39--48, September 1991.


AFRAID - A Frequently Redundant Array of Independent Disks - Savage, Wilkes (1996)   (21 citations)  (Correct)

....support components. An aggregate MTTDL of a million hours (114 years) translates into only a 2.6% likelihood of any data loss at all during a typical 3 year array lifetime. This is much lower than the rate of problems due to software failures, operator errors, and other environmental difficulties [Gray90, Gray91a]; that is, a small to medium sized array that achieves an overall MTTDL of 1M hours or better will probably be entirely adequate for the majority of its applications. In addition to reduced failure rates, modern disks also provide feedback mechanisms for predicting when such failures will occur. ....

Jim Gray and Daniel P. Siewiorek. High-availability computer systems. IEEE Computer, 24(9):39--48, September 1991.
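
The MTTDL figures quoted in the AFRAID excerpt above (a million hours, roughly 114 years, and a likelihood of about 2.6% over a 3 year lifetime) can be checked with a short calculation. The sketch below is only illustrative: it assumes data-loss events follow an exponential distribution with mean equal to the MTTDL, which the excerpt does not state, and the function name loss_probability is hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def loss_probability(mttdl_hours: float, lifetime_years: float) -> float:
    """Probability of at least one data-loss event during the lifetime.

    Assumption (not from the excerpt): time to data loss is exponentially
    distributed with mean mttdl_hours.
    """
    lifetime_hours = lifetime_years * HOURS_PER_YEAR
    return 1.0 - math.exp(-lifetime_hours / mttdl_hours)


# Figures taken from the excerpt: MTTDL of 1M hours, 3 year array lifetime.
mttdl_years = 1_000_000 / HOURS_PER_YEAR
p = loss_probability(1_000_000, 3)
print(f"MTTDL = {mttdl_years:.0f} years, loss probability = {p:.1%}")
```

Under this assumption the result is close to the simple ratio of lifetime to MTTDL (26,280 hours out of 1,000,000), because the lifetime is small compared with the MTTDL.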


Designing a Self-Maintaining Storage System - Asami, Talagala, Patterson (1999)   (Correct)

....it is important to reduce the system's down time as seen from across the Internet. Note that a system that relies on an operator to keep it running is not as available as one that maintains itself, as it will take minutes or maybe even hours for the operator to actually be able to repair the damage [22]. Our goal is to have the system repair any interruption of service within a few seconds, and continue to function unattended until the next scheduled visit by the operator. For a web server application such as the one we are running, this is illustrated by the slogan "repair by reload". As Mary ....

Jim Gray and Daniel P. Siewiorek. High-availability computer systems. IEEE Computer, 24(9), September 1991.


A Review of Software Upgrade Techniques for Distributed Systems - Ajmani (2004)   (1 citation)  (Correct)

No context found.

Jim Gray and D. Siewiorek. High-availability computer systems. IEEE Computer, 24(9), 1991. Includes a criticism of N-version programming.


Building Secure and Reliable Network Applications - Birman (1996)   (121 citations)  (Correct)

No context found.

Jim Gray. High Availability Computer Systems. IEEE Computer, Sept. 1991.

CiteSeer.IST at NUS - Copyright Penn State and NEC. Hosted by the School of Computing, National University of Singapore.