The PEGASUS Method

Description of the PEGASUS-Method

The following chapter describes the different elements of the PEGASUS method, together with the methods, tools, and processes used within each element.

0. Description of test object (Use Case)

    The Highway Chauffeur

Among the project partners, the Highway Chauffeur was picked as the exemplary test object. The system was designed as an SAE L3 conditional automation system. The system design was chosen to be, on the one hand, simple enough that all major partners could implement (or had already implemented) a system like it and, on the other hand, complex enough that a wide variety of scenarios could be derived from it, including critical scenarios and automation risks. Figure 1 shows the system's capabilities (dark blue) and limitations (light blue).

The following Table 1 summarizes the capabilities of the automated driving function on the level of path guidance and stabilization.

The system limitations are mainly a consequence of its exemplary and simple design and therefore led to the following restrictions of the ODD (see Table 2).

Extended Application Scenario

One of the goals of the PEGASUS project was to ensure that our tools and methods can be adapted to different domains. To avoid spending time and effort on defining a completely new system, we analyzed our Highway Chauffeur's key capabilities and defined five exemplary functional scenarios that differ from the original system on different levels. Describing the new system in scenarios not only helped us describe the desired functionality and performance efficiently, but also allowed us to systematically generate executable, and therefore testable, concrete scenarios which could then be used in simulation without additional effort.

We ended up picking functional scenarios that extend the ODD to highway and rural roads and chose five specific situations our SUT could encounter there. Table 3 shows a description of those scenarios.

1. Knowledge


As a basis for defining requirements or deducing test cases, the operating scenarios of the item to be developed have to be specified. For this purpose, various sources of knowledge, such as standards, regulations, or expert knowledge, can be utilized. Well-established codes & standards on nomenclature like SAE J3016 (Society of Automotive Engineers, 2014), as well as safety-related documents like ISO 26262 (International Organization for Standardization, 2009), are the basis of any development of highly automated driving systems. Furthermore, the respective legal requirements, like country-specific road law and international conventions, as well as related regulations, need to be followed. Finally, there exist non-binding documents from government institutions or related committees with specific recommendations. One prominent example is the document published by the ethics committee of the German transportation ministry (Ethics Commission Automated and Connected Driving appointed by the German Federal Minister of Transport and Digital Infrastructure, 2019). In addition, expert knowledge on the specific use case and its requirements should be collected via a structured expert peer review.

Guidelines

Within the scope of PEGASUS, the German guideline for the design of freeways (RAA) (Forschungsgesellschaft für Straßen- und Verkehrswesen, 2008) was used as a source of knowledge for specifying the operational design domain of the Autobahn-Chauffeur. The guideline provides a set of rules regarding the process of constructing freeways and defines design patterns.

In a first step of constructing freeways, a design class is chosen based on criteria like freeway type (for example urban freeway or transregional freeway). This design class specifies the standard cross section, the limit and standard values of the design elements, the elementary shapes of junctions and distances between junctions, as well as speed limits.


Cross sections
The cross section is chosen based on the design class and the expected traffic volume. The cross section defines the arrangement and dimensions of freeway components like lanes, as shown in Figure 1. Each cross section may contain different amounts and types of lanes like traffic lane, shoulder or central reservation.

Alignment
Each freeway consists of multiple segments. Besides the standard cross section, each segment implements a layout and an elevation profile, as seen in Figure 2.

The layout describes the lateral alignment of the freeway. For this purpose, the RAA defines multiple design patterns, like straight, circular arc, or clothoid, and specifies parameter ranges to be adhered to.

To describe the vertical alignment of the freeway, the RAA defines multiple patterns for the elevation profile like plane, crest curve, or sag curve. In addition, guidelines how to calculate the parameters for these patterns are formulated.
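To make the layout patterns concrete, the defining property of a clothoid (curvature growing linearly with arc length) can be sketched numerically. This is a minimal illustration; the radius and transition length used below are illustrative values, not figures taken from the RAA tables.

```python
import math

def clothoid_curvature(s, A):
    """Curvature of a clothoid at arc length s for clothoid parameter A:
    kappa(s) = s / A**2 (curvature grows linearly with arc length)."""
    return s / A**2

def clothoid_point(s, A, n=1000):
    """Approximate the (x, y) position on a clothoid starting at the origin
    by numerically integrating x' = cos(theta), y' = sin(theta), with
    heading theta(s) = s**2 / (2 * A**2)."""
    ds = s / n
    x = y = 0.0
    for i in range(n):
        si = (i + 0.5) * ds  # midpoint rule
        theta = si**2 / (2 * A**2)
        x += math.cos(theta) * ds
        y += math.sin(theta) * ds
    return x, y

# Transition from a straight (kappa = 0) into a circular arc of radius
# 900 m over a length of 300 m, i.e. A = sqrt(R * L) (illustrative values):
A = math.sqrt(900 * 300)
print(clothoid_curvature(300, A))  # 1/900 -> radius 900 m at the end
print(clothoid_point(300, A))
```

Such element descriptions are what a knowledge-based scenario generation can sample from when assembling synthetic road networks.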


Junctions
To connect multiple roads, the RAA defines various patterns for junctions, like clover leaf or windmill. Which pattern is used, depends on the traffic volume, the constructional effort, installation space, and elevation profile.

These design patterns specified in the RAA can be used to define the operational design domain of the Highway Chauffeur. In addition, these patterns build the basis for a knowledge-based scenario generation with the help of an ontology.

Codes & Standards

As a necessary pre-requisite generally accepted industry standards are to be followed, as well as national and international regulations. For the development of HAD systems ISO 26262 (International Organization for Standardization, 2009) and ISO/PAS 21448 (International Organization for Standardization, 2019) are essential.

ISO 26262
Functional safety processes in the automotive industry are described in ISO 26262, an adaptation of the IEC 61508 functional safety standard with a focus on automotive electric/electronic (E/E) systems. The goal of ISO 26262 is to provide process steps assuring that the developed E/E system is functionally safe, meaning the "absence of unreasonable risk […] due to hazards […] caused by malfunctioning behavior […] of E/E systems" (International Organization for Standardization, 2009). The most recent update of this standard was released in December 2018 as the ISO 26262:2018 iteration.

After defining the functional specification of the system, a hazard analysis and risk assessment is performed. The hazard and risk analysis determines the "Automotive Safety Integrity Levels" (ASIL), which are based on a matrix with Severity (from S1, light injuries, to S3, life-threatening injuries), Exposure (from E1, very low probability, to E4, high probability, i.e. 10%) and Controllability (from C1, simply controllable, i.e. >99%, to C3, difficult to control, i.e. <90%) as input values. Thereby, an ASIL represents a risk-based classification of a safety goal.
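As an illustration of how the three input values combine, the published risk graph of ISO 26262-3 can be encoded compactly. The sum rule below is a sketch that reproduces the QM/ASIL A-D table of the standard; it is a convenient encoding, not the normative text itself.

```python
def asil(s, e, c):
    """Determine the ASIL from Severity (1-3), Exposure (1-4) and
    Controllability (1-3) classes. The sum rule reproduces the
    ISO 26262-3 risk graph: sums up to 6 map to QM (no ASIL),
    sums of 7-10 map to ASIL A-D."""
    assert 1 <= s <= 3 and 1 <= e <= 4 and 1 <= c <= 3
    total = s + e + c
    if total <= 6:
        return "QM"
    return {7: "ASIL A", 8: "ASIL B", 9: "ASIL C", 10: "ASIL D"}[total]

# Worst case: life-threatening injuries (S3), high exposure (E4),
# difficult to control (C3):
print(asil(3, 4, 3))  # ASIL D
print(asil(1, 1, 1))  # QM
```

Safety goals then inherit the highest ASIL among the hazards they address.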

Based on these results, the system is designed and a safety concept is developed. The hardware and software development then takes place in further parallel runs of the V-model. On the right branch of the V, integration tests for the developed components are performed, the safety goals are validated, and a functional safety assessment of the complete vehicle against the functional specification takes place. Since the V-model itself as well as ISO 26262 and IEC 61508 are well established, no further details are provided within the scope of this document. Instead, the interested reader is referred to the full documents. It is recommended to always follow ISO 26262 (International Organization for Standardization, 2009) as the basic procedure.


SOTIF
Regarding the PEGASUS analysis, for highly automated driving, the safety of the intended behavior of the automated driving system also needs to be addressed, reaching beyond E/E system malfunctions. For this purpose, activities are currently ongoing within standardization bodies to address the so-called "safety of the intended functionality". Originally these activities focused on driver assistance and partially automated systems, but they have now been extended to higher levels of automation as well (SAE level >= 3). As stated by ISO, "the absence of unreasonable risk due to hazards resulting from functional insufficiencies of the intended functionality or by reasonably foreseeable misuse by persons is referred to as the Safety of the Intended Functionality (SOTIF)". In order to cover these issues, guidance on the applicable design, verification, and validation is needed. A document was published by ISO in January 2019 as the Publicly Available Specification ISO/PAS 21448:2019 (International Organization for Standardization, 2019).

The split between ISO 26262 and ISO/PAS 21448 is shown in Figure 3. Regarding the cyber security of road vehicles, activities are also currently ongoing: ISO/SAE 21434 reached the status of a committee draft in September 2018.

Recommendation

A prominent example of a government recommendation regarding automated driving is the document by the ethics committee of the German transportation ministry that was released on June 20, 2017 (Ethics Commission Automated and Connected Driving appointed by the German Federal Minister of Transport and Digital Infrastructure, 2019). This committee was headed by the former judge of Germany's Federal Constitutional Court, Udo di Fabio. The Ethics Commission on Automated and Connected Driving was convened as an interdisciplinary panel of experts by the German Federal Minister of Transport and Digital Infrastructure in 2016. Its goal was "to develop the necessary ethical guidelines for automated and connected driving". The results of the five working groups ("Situations involving unavoidable harm", "Data availability, data security, data-driven economy", "Conditions of human-machine interaction", "Consideration of the ethical context beyond road traffic", "Scope of responsibility for software and infrastructure") were presented as a publicly available report in 2017 (Ethics Commission Automated and Connected Driving appointed by the German Federal Minister of Transport and Digital Infrastructure, 2019). Among other things, this report contains 20 ethical rules for automated and connected vehicular traffic. Of particular importance for the PEGASUS project is the second rule: "The protection of individuals takes precedence over all other utilitarian considerations. The objective is to reduce the level of harm until it is completely prevented. The licensing of automated systems is not justifiable unless it promises to produce at least a diminution in harm compared with human driving, in other words a positive balance of risks." Within the framework of the PEGASUS safety argumentation, this particular rule is set as the primary safety goal to be met.

2. Data


Within PEGASUS, a scenario-based approach is chosen to assess the Highway Chauffeur function. Those scenarios can be derived in two ways, either systematically (see P4) or data-driven (this paragraph). For the data-driven way, the project aimed at using available, not strictly confidential data to develop the methodology and derive scenario characteristics. Therefore, within D2, relevant data is collected from multiple sources such as naturalistic driving studies (NDS), field operational tests (FOT), simulation data from user studies, accident data (GIDAS), as well as recorded data from test drives. This is the baseline for all adjacent metrics and tools used to search for relevant scenarios, derive the characteristic scenario parameters and statistics, and eventually assess the driving function.

NDS / FOT / Test Drives

The PEGASUS approach of scenario-based combined testing requires insights from real-world traffic in order to understand which situations are relevant for highly automated driving (HAD) and how frequently they occur. Thus, the field operational test (FOT) and naturalistic driving study (NDS) data analysis within PEGASUS derives traffic characteristics from real driving data. The naturalistic methods NDS and FOT are field studies in real traffic in which drivers are observed over a longer period of time during everyday driving. The data gathered this way is characterized by a high external validity. Furthermore, the data enables comparing the performance of an automated driving function in a test scenario with human driving performance in similar situations in real traffic. To a certain extent, evaluation criteria for this comparison can be derived from it.

Driving Simulator Data

Driving simulator data allows analyzing the underlying causal relationships of accidents and systematically approaching and assessing the threshold of human driving performance.

Thus, the application of the driving simulator supports closing the gap between the assessment and description of human driving performance in critical and non-critical scenarios up to the point of accidents. Furthermore, the driving simulator can be used to assess human driving performance in scenarios that are available neither in accident databases nor in NDS/FOT data due to their low probability of occurrence.

Within PEGASUS, the threshold of human driving performance was assessed using driving simulator studies, e.g. by repeatedly producing critical scenarios with varying stimulus intensities. The varying stimuli were operationalized by different criticalities, e.g. the TTC (time to collision) of a vehicle merging in front. The threshold of human driving performance was then described as the probability of having an accident depending on the different levels of criticality. Additionally, further performance parameters, e.g. deceleration or reaction behavior, were analyzed, as well as the accident severity. Using the simulator data, the controllability metric for the human driver was described by means of a logistic regression and can thus be directly compared with the performance of the automated driving function.

In a first simulator study, 52 participants drove on the left lane of a two-lane highway at a constant speed of 130 km/h. A preceding car driving in a column of vehicles merged 63 times at a velocity of 80 km/h from the right to the left lane. In doing so, criticality was systematically varied by varying the time to collision (TTC) (0.5, 0.7, 0.9, 1.1, 1.3, 1.5 s). During the experiment, the TTC was calculated when the merging vehicle drove on the midline (see Figure 1, target TTC). However, the reported TTC of the following results is based on the actual beginning of the merging maneuver in order to achieve a realistic estimation of the criticality. [In case the merging vehicle was not yet visible, the earliest point of visibility was used for calculating the TTC.]
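The relation between the TTC levels and the remaining gap to the merging vehicle follows directly from the closing speed. A minimal sketch using the speeds from the study setup above:

```python
def ttc(gap_m, v_ego_kmh, v_lead_kmh):
    """Time to collision: remaining gap divided by closing speed.
    Returns infinity when the ego vehicle is not closing in."""
    closing = (v_ego_kmh - v_lead_kmh) / 3.6  # km/h -> m/s
    return gap_m / closing if closing > 0 else float("inf")

# Gaps at which a car merging at 80 km/h in front of an ego vehicle at
# 130 km/h produces some of the study's criticality levels:
for target_ttc in (0.5, 0.9, 1.3):
    gap = target_ttc * (130 - 80) / 3.6
    print(f"TTC {target_ttc} s -> gap {gap:.1f} m")
```

This shows why small TTC differences correspond to only a few meters of gap at highway speeds.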

It should be noted that the assessment of driving performance in this experiment focused on the stabilization of the vehicle. This driving task, which is largely skill-based, i.e. relying on stimulus-response automatisms, was selected due to its direct influence on the criticality of the driving situation (see Preuk & Schießl, 2017) [Preuk & Schießl, 2017, Menschliche Leistungsfähigkeit als Gütekriterium für die Zulassung automatisierter Fahrzeuge: Methode zur Ermittlung der Grenzen menschlicher Leistungsfähigkeit. 9. VDI-Tagung Der Fahrer im 21. Jahrhundert, VDI-Berichte 2311, S. 15-24.]

Within this study, accidents were hardly avoidable in critical situations with a TTC < 1.3 s; on the other hand, accidents occurred very seldom with a TTC > 1.7 s. Accordingly, the threshold of human driving performance should lie in between.

As described above, the controllability metric for the human driver was described by means of a logistic regression and can thus be directly compared with the performance of the automated driving function. For this purpose, the accident probability was equated with the controllability and predicted on the basis of the measured TTCs. Four different logistic regressions were calculated in order to indicate the performance of different driver groups (see Figure 2):

  • the whole sample without outliers (orange, n = 48),
  • the middle 50% of the sample (middle blue, n = 26),
  • the best 10% of the sample (light blue, n = 7),
  • the worst 10% of the sample (dark blue, n = 4).

The threshold for human driving performance was defined by an accident probability of 50%, which corresponds to a TTC of 1.38 s for the whole sample within this study.
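Inverting a fitted logistic model to obtain the 50% threshold can be sketched as follows. The coefficients below are hypothetical placeholders chosen so that the threshold lands at the reported 1.38 s; they are not the study's fitted values.

```python
import math

def accident_probability(ttc_s, b0, b1):
    """Logistic model for accident probability as a function of TTC.
    A negative slope b1 means: lower TTC -> higher accident probability."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * ttc_s)))

def ttc_at_probability(p, b0, b1):
    """Invert the logistic model: the TTC at which the model predicts
    accident probability p. For p = 0.5 this reduces to -b0 / b1."""
    return (math.log(p / (1.0 - p)) - b0) / b1

# Hypothetical coefficients placing the 50% threshold at 1.38 s:
b0, b1 = 9.66, -7.0
print(round(ttc_at_probability(0.5, b0, b1), 2))  # 1.38
print(round(accident_probability(1.0, b0, b1), 2))
```

The same inversion applied to each driver group yields the group-specific thresholds shown in Figure 2.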

GIDAS and PCM data

The German In-Depth Accident Study (GIDAS) is a database of real-world crashes which are investigated by the MHH (Medizinische Hochschule Hannover (MHH), 2019) and the VUFO (Verkehrsunfallforschung an der TU Dresden GmbH, 2019). The project has been funded by the Bundesanstalt für Straßenwesen (BASt) and the Forschungsvereinigung Automobiltechnik e. V. (FAT) since 1999. For a collision to be documented in GIDAS, it must adhere to sampling criteria, such that the data can be assumed to be a representative subsample of the German crash scene. The criterion requires that at least one person is at least slightly injured. GIDAS contains information about the involved participants (for example vehicles, cyclists, pedestrians, etc.) and individuals, their sustained injuries, and the infrastructure. Each crash is reconstructed, and the findings are also stored in the database.

The Pre-Crash-Matrix (PCM) is a branch of the GIDAS data. It is not representative of the German crash scene; rather, it details information regarding the pre-crash phase of two participants. The database contains detailed information regarding the reconstructed trajectories during the five seconds prior to the initial impact.

The GIDAS database with the PCM extension collects in-depth data from real-world crashes in the regions of Hannover and Dresden. Each year, up to 2000 crashes are investigated, including the documentation of over 3000 crash-relevant properties. This information includes:

  • Conditions of the environment
  • The road design with traffic regulations
  • Vehicle deformations
  • Impact sites of occupants and other road users
  • Technical characteristics such as vehicle type and technical equipment
  • Crash information and characteristic values, e.g. collision and driving speed, Δv and EES
  • Crash causes and details how the crash occurs
  • Information of the individuals such as weight, height, age
  • Injury patterns, preclinical and clinical care

The PCM data extends the existing data with detailed information on the vehicle dynamics during the pre-crash phase of five seconds before the first impact of the two participants that collided first. The criteria for a case to have a PCM are:

  • At least one passenger car is involved
  • No participant skids
  • No participant drives in reverse
  • No participant has a trailer
  • No participant has a technical defect.
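The PCM inclusion criteria above can be expressed as a simple filter over a case record. The field names below are illustrative, not the actual GIDAS/PCM schema.

```python
def pcm_eligible(case):
    """Check the PCM inclusion criteria for one crash case, given as a
    dict with a list of participant dicts (illustrative field names)."""
    parts = case["participants"]
    return (
        any(p["type"] == "passenger_car" for p in parts)     # a car is involved
        and not any(p["skidding"] for p in parts)            # nobody skids
        and not any(p["reversing"] for p in parts)           # nobody reverses
        and not any(p["has_trailer"] for p in parts)         # no trailers
        and not any(p["technical_defect"] for p in parts)    # no defects
    )

case = {"participants": [
    {"type": "passenger_car", "skidding": False, "reversing": False,
     "has_trailer": False, "technical_defect": False},
    {"type": "cyclist", "skidding": False, "reversing": False,
     "has_trailer": False, "technical_defect": False},
]}
print(pcm_eligible(case))  # True
```

Applied to the full database, such a filter explains why only a subset of the coded crashes carries a PCM (8,651 of 29,514 in the version discussed below).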

In the considered version of GIDAS (2016-12-31), 29,514 crashes are coded and 8,651 of them have a PCM. Across all GIDAS cases there are 24,359 collisions with car involvement, and in these cases 47,983 participants (33,920 cars) are coded. In all cases with car involvement, 64,263 persons are involved and 5,303 of them sustained at least one AIS 2 (or higher) injury. More information about the injury scale AIS can be found in (AAAM, 1998) and (AAAM, 2005).

3. Requirements Analysis


In general, various methodologies exist for assessing the tolerable risk of new technologies. CEN/CENELEC outlines the concepts of risk assessment in EN 50126 and refers to the established principles of GAMAB ("at least as good as"), ALARP ("as low as reasonably practicable"), and minimum endogenous mortality (MEM). The RAMS approach is established for the specification and development of safety-relevant railway systems. [QUELLE]

The approach includes the system definition as well as the implementation of risk analyses, hazard determination, and safety approvals. Details and deductions are provided in the following paragraphs.

Method to define acceptance criteria

The common expectation for HAD vehicles is that their introduction will lead to increased road safety and a reduction in traffic fatalities. However, the quantitative safety requirements are still under discussion. This clause analyzes the risk acceptance according to several acceptance principles and applies the safety level of today's traffic to derive references for acceptable risks. The focus is on macroscopic safety requirements, meaning accident rates per mileage rather than the behavior in individual driving situations. It is concluded that the acceptable risk varies with the focus group involved and with the market share of automated vehicles. Increased safety of conventional driving in the future could lead to higher requirements as well. It is also pointed out that it is not guaranteed that the given acceptable risk levels are actually accepted by the customer, because other factors besides the accident statistics are relevant. An argumentation for a proof of safety is given; however, as none of these risk levels can be proven before introduction, a monitoring of vehicles in the field is suggested. Different introduction and risk management strategies are briefly described. Finally, the risk during the introduction of new generations of airplanes is discussed.


Detailed description of the content (methods and goals):
The description of the content is an executive summary of a paper by Philipp Junietz, Udo Steininger and Hermann Winner (see (Philipp Junietz, 2019)).

The focus of the following description is on the results and their embedding into the overall approach, especially on the synchronization with work packages about defining a safety level for HAD-F and derivation of requirements for an HAD-F.

Quantitative risk assessment

The usual quantitative risk definition, "risk equals frequency times severity", is illustrated in Figure 1. Considering the occurrence of unintended events (frequency) and the extent of damage (severity), we find two typical areas. In the green area, the system is in a safe state; the corresponding risk is accepted. In the red area, the system is in an unsafe state; the corresponding risk is not accepted. The borderline between these areas is probably not sharp; there can be a kind of transition area. Although this simple definition is very useful for many questions in technology and the insurance industry, it neglects aspects like aversion against high severity, lack of controllability, and personal benefit that are relevant for risk perception and acceptance by individuals and society.

Fritzsche discussed risk acceptance in relation to the voluntary nature of exposure in 1986. The results are summarized in Figure 2. The numbers are still valid today. The reason might be that risk perception studies reached their peak in the 1970s with the spread of nuclear power plants. Later studies delivered no relevant changes in risk perception.

For voluntary activities, Fritzsche found that the willingness to accept risks is nearly unlimited, depending on the experienced personal benefit. We can see this in the example of high-risk sports or other leisure activities, e.g. free climbing, motorcycling, etc. Job-related activities are important for a deeper understanding of the subject. On the one hand, acceptance is relatively well investigated in this field, and there is a common understanding of an accepted individual mortality risk in the order of 10⁻⁵ per person and year, for example by professional associations and insurance companies. On the other hand, job-related risks are useful to bridge the gap between voluntary and involuntary risks.

Figure 2: Risk acceptance versus voluntary nature of risk exposure [see (Philipp Junietz, 2019), based on Steininger, U., and L. Wech. Wie sicher ist sicher genug? Sicherheit und Risiko zwischen Wunsch und Wirklichkeit. VDI-Berichte, No. 2204, 2013, and Fritzsche, A. F. Wie sicher leben wir? Verlag TÜV Rheinland, 1986]

Fritzsche found that for involuntary risks, e.g. the death of passengers in a train or airplane crash, the acceptance level is an order of magnitude lower than for job-related risks. Moreover, acceptance decreases by another order of magnitude if the risk is caused by major technology, e.g. the chemical industry or nuclear power generation. Besides the fact that the experienced personal benefit of those technologies is low (at least from a subjective point of view), the low degree of self-determination, or rather controllability, by individuals plays an important role for the low acceptance level, as does the potentially high number of mortalities (severity). Nevertheless, Figure 2 shows that it is generally possible to deal with risk, risk perception, and acceptance in a quantitative manner.

To implement safety requirements based on risk acceptance, several principles have been developed in different application areas. Because of the relationship between railway and road traffic, it is useful to refer to the CENELEC safety standard EN 50126. The development of this standard started in the 1990s, when safety requirements based on quantitative risk analysis were implemented and ALARP, MEM, and GAMAB were introduced as principles for risk acceptance.

The as low as reasonably practicable (ALARP) principle tries to assess what is technically feasible considering economic sense and social acceptance. Between the two regions of generally unaccepted and broadly accepted risk, there is a tolerance range where risk is undertaken only if a benefit is desired and where each risk must be made as low as reasonably practicable. The two key levels lie around the road death statistics (about 10⁻⁴ per person and year) and the chance of being struck by lightning (about 10⁻⁷ per person and year). If something is more dangerous than driving a car, the risk is unacceptable. If something is less dangerous than being struck by lightning, then we do not expect anyone to do anything about it. In the range between these two figures, cost-benefit studies are appropriate to reduce the risk to as low as reasonably practicable [EN 50126].
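Using the two anchor values quoted above, the ALARP classification can be sketched as a simple decision rule:

```python
def alarp_region(annual_fatality_risk):
    """Classify an individual annual fatality risk into the three ALARP
    regions, using the anchor values from the text: road deaths
    (~1e-4 per person and year) and lightning strike (~1e-7)."""
    if annual_fatality_risk > 1e-4:
        return "unacceptable"
    if annual_fatality_risk < 1e-7:
        return "broadly acceptable"
    return "tolerable (reduce as low as reasonably practicable)"

print(alarp_region(6e-6))   # in the tolerance range
print(alarp_region(1e-8))   # broadly acceptable
```

A risk in the tolerance range is where the cost-benefit studies mentioned above come into play.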

Minimum endogenous mortality (MEM) is based upon age- and gender-specific mortality rates. Although the absolute values of the mortality rates change with the birth cohort, they show a typical development over age as well as a significant minimum at an age of about 10 years. The related mortality at an age of 10 years is defined as the "minimum endogenous mortality". The MEM principle demands that a new system does not significantly contribute to the existing minimum endogenous mortality. EN 50126 specifies that the individual risk due to a certain technical system must not exceed 1/20th of the minimum endogenous mortality, taking into account that people are normally exposed to the risks of several technical systems. This means that the accepted individual risk of a certain technical system should be about 2.5·10⁻⁶ per person and year. [Statistisches Bundesamt, „Kohortensterbetafeln für Deutschland, Methoden- und Ergebnisbericht zu den Modellrechnungen für Sterbetafeln der Geburtsjahrgänge 1871 – 2017", 5126206-17900-4, 23.06.2017]
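The 1/20th rule can be checked with a rough value for the minimum endogenous mortality of about 5·10⁻⁵ per person and year; this value is an assumption consistent with the resulting figure in the text, not a number stated there.

```python
# Assumed minimum endogenous mortality (mortality minimum around age 10),
# roughly 5e-5 per person and year:
mem = 5e-5

# EN 50126: a single technical system may contribute at most 1/20th of it.
per_system = mem / 20
print(per_system)  # 2.5e-06 per person and year
```

This reproduces the accepted individual risk of about 2.5·10⁻⁶ per person and year stated above.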

Globalement au moins aussi bon (GAMAB), in English "generally at least as good as", requires, unlike MEM, the existence of a reference system with currently accepted residual risks. According to GAMAB, residual risks caused by a new system must not exceed those of the reference system. In other words: a new system must offer a level of risk generally at least as good as the one offered by any equivalent existing system. This makes it necessary to identify the risk of the equivalent existing system. Looking for the acceptable risk of an HAD-F in a certain application area according to GAMAB, we have to identify the current risk of the equivalent existing system in the same application area. To derive acceptance requirements for a controlled-access highway pilot, we analyze the current risk on German controlled-access highways during manual driving. In this way, we obtain an average rate for fatal accidents of 1.52·10⁻⁹ per km [Schöner, H.-P. Challenges and Approaches for Testing of Highly Automated Vehicles, Paris, 04.12.2014]. With the help of known driving profiles, this can be transferred into an accident rate of 6·10⁻⁶ per person and year [VDA 702 Situationskatalog E-Parameter nach ISO 26262-3. VDA-Empfehlungen, VERBAND DER AUTOMOBILINDUSTRIE E. V. (VDA), 2015. https://www.vda.de/de/services/Publikationen/situationskatalog-e-parameter-nach-iso-26262-3.html].
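The two GAMAB figures imply an average annual mileage on controlled-access highways per person, which can be back-calculated; this mileage is a derived quantity, not a value stated in the source.

```python
fatal_accident_rate_per_km = 1.52e-9   # German controlled-access highways
annual_risk_per_person = 6e-6          # quoted result per person and year

# Implied annual highway mileage per person (back-calculation):
implied_km_per_year = annual_risk_per_person / fatal_accident_rate_per_km
print(round(implied_km_per_year), "km per year")
```

The result, on the order of a few thousand kilometers per year, is the driving-profile assumption hidden in the conversion.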

In Figure 3, the different approaches for the mortality risk are summarized. On the one hand, it shows that the application of different risk acceptance principles delivers comparable and consistent results. On the other hand, it demonstrates that we have to deal with a relatively broad range of applicable acceptance criteria.

Figure 3: Implementation of several risk acceptance principles for a Highway-Pilot
[see (Philipp Junietz, 2019), based on Steininger, U., and L. Wech. Wie sicher ist sicher genug? Sicherheit und Risiko zwischen Wunsch und Wirklichkeit. VDI-Berichte, No. 2204, 2013, and Fritzsche, A. F. Wie sicher leben wir? Verlag TÜV Rheinland, 1986]

In a first step, comparison with other technologies – especially other traffic systems and technologies that deliver a high personal benefit – seems to be useful. Afterwards, different focus groups have to be distinguished, for example users of highly automated driving systems and other traffic participants, taking into account the impact of voluntary exposure. Finally, a decrease in total mortality risk is expected in the future, following the trend in the last decades and centuries. Therefore, risk acceptance might change over time.

Introduction of New Technologies in Aviation
In aviation, passengers are exposed to a technical system without having personal control. Although severe accidents happen, its safety is accepted by most of the population. Due to the long travel distances and the fact that accidents mostly happen during take-off and landing, accident rates are typically given per flight and not per travel distance. Accidents and critical situations are strictly reported and collected in databases, so the data is even more profound than for road traffic. Depending on the number of flights per year, we can observe an annual risk that is similar to driving a car on a highway. A fatal accident happens about once per ten million flights [Airbus. Commercial Aviation Accidents 1958-2016. A Statistical Analysis, 2017. Accessed August 1, 2017]. With a typical exposure of two flights per year, the risk of a fatal accident would be lower than the risk of involuntary exposure f_inv and about one order of magnitude lower than driving on a highway. However, with 20 flights per year, one would be exposed to a risk of the same order of magnitude. Therefore, the levels of risk are in fact comparable when only driving on controlled-access highways is considered. However, users typically drive on all types of roads. The total risk of car traffic is at least one order of magnitude higher, so the superior reputation of air traffic is justified.

Aviation has become increasingly automated over the past decades (although today’s systems are still SAE level 2 because they are supervised by the crew). The detailed collection of data in aviation allows an analysis per generation of airplanes, which was summarized by Airbus Industries (see [Airbus. Commercial Aviation Accidents 1958-2016. A Statistical Analysis, 2017. Accessed August 1, 2017]). As depicted in Figure 4, with every introduction of a new generation, the fatal accident rate for this new generation was higher than the state of the art. Due to the low number of new airplanes at introduction, this trend cannot be observed in the total accident rate. Nevertheless, the introduction was clearly beneficial to society in total because after an introduction phase of five to ten years, the new generation had the lowest accident rate of all.

Judging from this data, new generations of airplanes are not tested in a way that statistically proves the new system to be superior to the former. In fact, this is impossible, because the knowledge about the new system's behavior is incomplete and only field experience can reduce the unknowns. Similar to HAD, statistical testing is neither economically feasible nor necessary, because the strict supervision of air traffic allows efficient improvement in case of critical situations or accidents. However, the highest automation in commercial air traffic is still comparable to level 2, so human error remains a factor. Nevertheless, the leap in accident rate occurred with the introduction of technology, be it because of flaws in human-machine interaction or in the technology itself. One could argue that it is unethical to release a system that is not tested in the best way possible. But first, it is impossible to completely test a system operating in an uncontrolled environment, because there might be situations that the tester was not aware of. These unknown unknowns cannot be tested. Second, a stricter approval process would prevent technical progress, because economically oriented development would become impossible. It seems possible, if not likely, that the accident rate of automated vehicles will behave in a similar way. We should be aware of that possibility and focus on improving the system whenever a critical situation or accident is detected. A delayed introduction of HAD could in fact risk the lives of many people, because the system is expected to improve safety over time.

With a combined testing strategy of simulation, proving ground tests, and real traffic tests, it remains unlikely that a complete logical proof of safety can be given, because every validation test rests on certain underlying assumptions. To deal with this uncertain safety performance, accidents, unexpected critical situations, and near misses must be monitored, similar to air traffic, in order to find flaws in the system (including infrastructure and human interaction) with the chance to improve them. This is discussed in detail later.

Figure 4: Fatal accidents with different generations of airplanes in commercial traffic. Dotted line means less than one million flights a year. First generation: Early commercial jets, Second generation: More integrated Auto Flight System, Third generation: Glass cockpit and Flight-Management-System, Fourth generation: Fly-By-Wire with flight envelope protection. [Airbus. Commercial Aviation Accidents 1958-2016. A Statistical Analysis, 2017. Accessed August 1, 2017]
4. Systematic Identification of Scenarios

Knowledge-based approaches to scenario generation complement data-based approaches. Their strengths and weaknesses are complementary: data-based approaches represent usual situations with adequate frequencies, while knowledge-based approaches cover rare events. The two can therefore support each other and counterbalance each other's disadvantages.
Several approaches for the systematic knowledge-based identification of scenarios were pursued in PEGASUS. One approach is the description of "normal" scenarios on highways using ontologies. This approach is extended and completed by methods that systematically identify automation risks, as developed for and applied to the Highway Chauffeur; they reveal possibly critical scenarios early in the development process and help guarantee sufficient test-case coverage. Furthermore, an expert-based approach for the identification of scenarios on layer 4 of the 6-layer-model was developed, which is suitable for parameterization with measurement data.

Application of ontologies for scenario generation

As shown in Figure 1, the process for the knowledge-based scenario generation using an ontology implemented in PEGASUS is realized in two steps.

Figure 1: Ontology-based process for scenario creation based on [2]. Rectangles represent working products, rounded corners represent process steps.

In the first step, linguistically described knowledge about road traffic patterns on German highways is identified, conceptualized, and formalized. To formalize the knowledge, an ontology is implemented in the Web Ontology Language (OWL); the knowledge is represented through hierarchic classes as well as semantic relations and restrictions between these classes. The knowledge modeled in the knowledge base is structured according to a 6-layer-model (see Figure 2). Based on prior work by Schuldt [3], the model has been adapted to a representation in an ontology and to an automated creation of functional scenarios. On the first and second layers, the road network is described according to the German guidelines for motorway construction. Conceptually, the third layer describes temporary manipulations of layers one and two (such as road construction sites); within the scope of PEGASUS, however, this layer has not been implemented. On the fourth layer, the interactions of traffic participants are represented through maneuvers. On the fifth layer, weather conditions are modeled. The sixth layer describes digital information, such as V2X communication and digital data. Like the third layer, the sixth layer has not been detailed within PEGASUS.
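A minimal Python stand-in can illustrate how such a knowledge base is organized: hierarchic classes connected by is-a relations, grouped along the 6-layer-model. All class names and relations below are illustrative assumptions, not content of the actual PEGASUS OWL ontology.

```python
# Illustrative stand-in for the OWL knowledge base: hierarchic classes plus
# is-a relations, structured along the 6-layer-model. Names are invented.

LAYERS = {
    1: "Road level",
    2: "Traffic infrastructure",
    3: "Temporary manipulation of layers 1 and 2",  # not implemented in PEGASUS
    4: "Traffic participants and maneuvers",
    5: "Environment (weather)",
    6: "Digital information",                       # not detailed in PEGASUS
}

# Hierarchic classes: child -> parent ("is-a" relations).
IS_A = {
    "Motorway": "Road",
    "EntranceRamp": "Motorway",
    "Car": "TrafficParticipant",
    "Truck": "TrafficParticipant",
    "LaneChange": "Maneuver",
    "FollowLane": "Maneuver",
}

def ancestors(cls):
    """Walk the is-a hierarchy upwards to the root."""
    chain = []
    while cls in IS_A:
        cls = IS_A[cls]
        chain.append(cls)
    return chain

print(ancestors("EntranceRamp"))  # ['Motorway', 'Road']
```

In the actual toolchain, such relations and restrictions are expressed in OWL, where a reasoner can check consistency; the dictionaries above only mimic the class hierarchy.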

Figure 2: 6-layer-model for structuring scenarios according to (Bock, et al., 2018) [2] based on [3] .

In the second step, scenarios are automatically and systematically deduced by varying the classes and instances defined in the ontology. The possible combinations and restrictions specified in the knowledge base are considered during variation. Scenarios are thus created stepwise.

In the beginning, possible sceneries including relations (like arrangement relations) of the respective elements are created. The elements to be considered for variation can be specified by the user through properties. Afterwards, for every scenery, all combinations of traffic participants are generated, based on a defined position grid, and for each traffic participant all possible maneuvers are assigned. At this point, multiple combinations of sceneries and traffic participants, including their arrangement and all possible maneuvers, have been generated.

In the next step, an individual maneuver to be executed is chosen for every traffic participant in order to build a start scene. Afterwards, for every start scene, an end scene is deduced based on constraints (such as relative velocities) arising from the individual maneuvers. Furthermore, the driving paths are checked for collisions; if a collision occurs, the respective end scene is dismissed. The combination of a start scene and an end scene defines a scenario. As shown in Figure 3, each scenario is exported as an abstracted HTML-based visualization and as a scenario graph for a detailed linguistic description. In addition, exporting the start and end scene to the technical formats OpenDRIVE and OpenSCENARIO allows the scenario to be executed in simulation.
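The stepwise variation described above can be sketched as a combinatorial enumeration. The sceneries, position grid, maneuvers, and the collision check below are simplified placeholders, not the real ontology content.

```python
from itertools import product

# Placeholder content standing in for classes/instances from the ontology.
sceneries = ["2-lane straight", "3-lane straight"]
positions = ["ahead-left", "ahead-ego-lane"]   # position grid around the ego
maneuvers = ["follow_lane", "lane_change_left", "lane_change_right"]

def collides(scenery, assignment):
    """Toy constraint check standing in for the driving-path collision test:
    two participants must not change towards each other's lane."""
    return (assignment["ahead-left"] == "lane_change_right"
            and assignment["ahead-ego-lane"] == "lane_change_left")

scenarios = []
for scenery in sceneries:
    # Assign every possible maneuver to every participant -> candidate scenes.
    for choice in product(maneuvers, repeat=len(positions)):
        assignment = dict(zip(positions, choice))
        if not collides(scenery, assignment):  # dismiss colliding end scenes
            scenarios.append((scenery, assignment))

print(len(scenarios))  # 2 sceneries x (9 combinations - 1 colliding) = 16
```

Each surviving (scenery, assignment) pair corresponds to a start/end scene combination that could then be exported, e.g., to OpenDRIVE/OpenSCENARIO.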

Figure 3: Left: HTML-based visualization as abstracted overview; Right: scenario graph as detailed linguistic description

References
[1] A. Reschka, "Fertigkeiten- und Fähigkeitengraphen als Grundlage des sicheren Betriebs von automatisierten Fahrzeugen im öffentlichen Straßenverkehr in städtischer Umgebung", Dissertation, TU Braunschweig, Braunschweig, 2017.
[2] G. Bagschik, T. Menzel, and M. Maurer, "Ontology based Scene Creation for the Development of Automated Vehicles", in 2018 IEEE Intelligent Vehicles Symposium (IV), Changshu, 2018.
[3] F. Schuldt, "Ein Beitrag für den methodischen Test von automatisierten Fahrfunktionen mit Hilfe von virtuellen Umgebungen", Dissertation, TU Braunschweig, Braunschweig, 2017.

Identification of automation risks

The methods to identify automation risks make use of well-established techniques for hazard and risk assessment as well as a thorough safety analysis. They serve the purpose of identifying the causes of risks at an early stage, especially those arising from the introduction of highly automated driving functions (HAD-F). Within the scope of the analysis, possibly hazardous scenarios are identified and described in the form of parameter ranges of logical scenarios. This enables targeted testing later in the test process and sufficient test-case coverage of potentially critical scenarios.

For each class of automation risks, a different method was developed in order to address the specific challenges of that class. However, the methods to identify automation risks of classes 1 and 2 are structurally similar.

For the automation risks of class 1, an iterative method based on traditional safety analysis methods such as HAZOP and fault tree analysis (cf. (Ericson, 2005)) has been developed.

The safety analysis methods were adapted as necessary to address the specific challenges emerging from the introduction of HAD-F. Class 1 automation risks are those arising from external influences on the automation. They correspond to situations in which the HAD-F has difficulty coping with the environment or the behavior of other traffic participants. These include the non-detection or misclassification of objects in the environment, the erroneous detection of objects that do not exist in reality, and the misprediction of the future development of a situation. These problems do not necessarily stem from a fault in the E/E systems but may be inherent to the system (e.g., limitations in perception due to a specific sensor set-up) and are in general only visible in a certain scenario or a small range of scenarios.

The first step to identify these scenarios is to model the driving function. Taking a behavioral perspective, this entails modeling the inputs, computations, and outputs of the different components/computational units. This step will in general be refined iteratively during later stages. Next, the possible hazards stemming from deviations from the intended functionality or from issuing actions in the wrong context are identified. Identification is pursued using a keyword-based approach on the vehicle behavior level, covering different basic scenarios. The resulting table is used for a second keyword-based approach, in which the reasons inside the system that may lead to the previously identified deviations are recorded in another table (see Figure 5).

The next step is to identify possible causes of hazardous behavior (with a focus on environmental triggers) with the help of a modified fault tree analysis. Here, we follow the identified outputs of each function from step 1 and identify possible reasons for unintended output. These reasons are classified as:

  • Random hardware faults (covered by ISO 26262 (International Organization for Standardization, 2009)): e.g., hardware fault in the ECU for calculating trajectories
  • Design faults in HW or SW (deviations of the implementation from the specification): e.g., fault in the algorithm that calculates the trajectories
  • Specification fault: e.g., trajectories of other traffic participants are predicted without a dynamical model of these
  • Propagated faults (the input the component is receiving is already faulty): e.g., an inappropriate maneuver was chosen by an upstream planner

On the one hand, certain environmental conditions may be necessary for the faults to actually become visible or relevant (e.g., the HAD-F can only mispredict the trajectory of another traffic participant if one is present). On the other hand, these environmental conditions may be the reason for an unintended function of the component (e.g., a small metal object may be responsible for a misclassification by a RADAR sensor). In order to identify and model these conditions, the fault tree analysis was extended by a further logical gate (a variation of the inhibit gate), which makes it possible to specify environmental conditions that are necessary for fault propagation.
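The extended gate can be sketched as a simple Boolean fault-tree evaluation. The faults and environmental conditions below are invented examples, not results of the PEGASUS analysis.

```python
# Sketch of the extended fault-tree gate: a variation of the inhibit gate
# that only propagates a fault when a specified environmental condition holds.

def and_gate(*inputs):
    return all(inputs)

def or_gate(*inputs):
    return any(inputs)

def env_inhibit_gate(fault, environment, condition):
    """Propagate `fault` only if the environmental `condition` is present."""
    return fault and environment.get(condition, False)

# Example: a small metal object may trigger a RADAR misclassification,
# which only becomes hazardous if another traffic participant is present.
env = {"small_metal_object": True, "other_participant_present": True}

radar_misclassification = env_inhibit_gate(True, env, "small_metal_object")
hazard = env_inhibit_gate(radar_misclassification, env, "other_participant_present")
print(hazard)  # True: both environmental conditions hold, so the fault propagates
```

Removing either environmental condition from `env` blocks propagation, which is exactly the purpose of the gate: it ties a fault path to the scenario conditions under which it can become effective.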

In the last step, the environmental triggers for hazardous behavior are deduced and translated into the necessary specification language for scenarios.

Automation risks of class 2 correspond to scenarios in which the driving behavior of the HAD-F deviates from the behavior expected of human drivers and subsequently triggers a critical situation. When identifying these scenarios systematically, it is assumed that there is no malfunction within the driving function; the risk thus arises from the interaction between different systems rather than from local causes. A typical case is the interaction between the HAD-F and the human driver. A risk analysis method that is particularly well suited for identifying risks arising from the interaction of several complex individual systems is the Systems-Theoretic Process Analysis (STPA) (Leveson, 2012). To facilitate its application, the STPA method was adapted to the problem at hand, and the analysis was carried out on our use case. In the following, a short summary of the approach is given.

The first step is to define the fundamentals. This includes the control structure diagram including the control actions, the scenarios, and the expectations of the human driver in these scenarios. When defining the control structure diagram, the process under consideration and the control actions occurring in this process are described. In the given use case, the considered process consists of the HAD-F, the human driver, and the interaction between them. Within this process, the HAD-F can perform the following tactical control actions: 1) keep speed and keep lane, 2) keep speed and change lane, 3) change speed and keep lane, 4) change speed and change lane, 5) abort lane change, 6) emergency braking and 7) emergency stop. In order to determine the scenarios, a classification based on the six-layer model is used and the most important ones are selected based on an expert judgement so that the number of scenarios is reduced to a manageable size. Finally, the behavior of the HAD-F as expected by the human driver is determined in the identified scenarios.

In the second step, the actual STPA method is performed. This includes determining unsafe control actions (UCAs), hazard identification, and causal analysis. When determining the UCAs, the impact of improper control actions is examined, especially regarding what happens if control actions are

a)    not provided,
b)    provided,
c)    provided at the wrong time, or
d)    stopped too soon or applied for too long.

For example, an otherwise reasonable control action becomes an unsafe control action if the human driver does not expect this control action to be provided in this situation. Subsequently, the hazard identification systematically analyses the consequences of the UCAs in the scenarios under consideration. In this step, the automation risks of class 2 are identified. The final causal factor analysis provides information on the triggers of the identified risk.
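The systematic part of the UCA determination can be sketched by pairing the seven tactical control actions with the four guide phrases a)–d); judging which candidates are actually unsafe in a given scenario remains an expert step.

```python
from itertools import product

# The seven tactical control actions of the HAD-F, as listed in the text.
control_actions = [
    "keep speed and keep lane",
    "keep speed and change lane",
    "change speed and keep lane",
    "change speed and change lane",
    "abort lane change",
    "emergency braking",
    "emergency stop",
]

# The four STPA guide phrases a)-d).
guide_phrases = [
    "not provided",
    "provided",
    "provided at the wrong time",
    "stopped too soon or applied for too long",
]

# Candidate unsafe control actions (UCAs): every action/guide-phrase pair.
uca_candidates = [f"{ca} -- {gp}"
                  for ca, gp in product(control_actions, guide_phrases)]
print(len(uca_candidates))  # 7 actions x 4 guide phrases = 28 candidates
```

Each of the 28 candidates is then examined per scenario against the driver's expectation to decide whether it constitutes an actual UCA.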

For the automation risks of class 3, according to the definition of this class, the possible difficulties in the interaction between the automation and the driver were identified (e.g., mode confusion or misuse). However, these do not result in scenarios but in requirements on the HMI concept.

For the automation risks of classes 1 and 2, the resulting environmental triggers are translated into a JSON file and can be used in the PEGASUS database to mark potentially critical combinations of parameter ranges.
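A hypothetical shape of such a JSON file is sketched below. The actual PEGASUS database schema is not specified here, so all field names, the scenario name, and the parameter ranges are assumptions.

```python
import json

# Hypothetical export format for environmental triggers: each entry marks a
# potentially critical combination of parameter ranges of a logical scenario.
triggers = {
    "automation_risk_class": 1,
    "logical_scenario": "cut-in",  # illustrative scenario name
    "critical_parameter_ranges": {
        "relative_velocity_mps": {"min": 10.0, "max": 20.0},
        "lateral_distance_m": {"min": 0.0, "max": 0.5},
    },
}

encoded = json.dumps(triggers, indent=2)   # what would be written to the file
decoded = json.loads(encoded)              # what the database would read back
print(decoded["critical_parameter_ranges"]["lateral_distance_m"]["max"])
```

The round-trip through `json.dumps`/`json.loads` illustrates that the marked parameter ranges survive serialization unchanged and can be matched against logical-scenario parameters in the database.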

Definition of logical scenarios on layer 4

In order to draw concrete scenarios from field data, it is necessary to elaborate a framework for deriving a scenario catalogue. It should be possible to store all relevant elements of a scenario with regard to layer 4 of the 6-layer-model. Since safety assurance focuses on safety-relevant situations, the underlying systematics focus on the avoidance of collisions between a subject vehicle (SV) and another object. For the PEGASUS use case, the highway chauffeur, the object with which such a collision would occur is typically another vehicle.

The central underlying assumption of the framework is that a safety-relevant situation can be characterized by the need for a reaction by the SV to avoid a collision with another object. This object is referred to as the challenging object or the challenger. The challenging object does not have to be the accident perpetrator, but it is the object with which the SV would collide if no collision avoidance action were executed. Based on this assumption, a limited number of safety-relevant logical scenarios can be defined.

A first criterion for defining these logical scenarios is the area of the SV with which the challenging object would collide. A distinction between front, rear, and side impacts can be made. The second criterion is the initial position of the challenging object. Depending on the impact plane, different initial positions for the challenger can be distinguished by whether the outline of the challenger overlaps with the outline of the SV in longitudinal or lateral direction. For front impacts, one position can be identified that overlaps laterally with the SV. Second, a position in front of the SV can be identified that does not overlap laterally; a front impact from this initial position would thus require an additional lateral movement by the challenger compared to a challenger coming from a position with lateral overlap. A third option is a challenger coming from an initial position that does not overlap in lateral direction and either overlaps in longitudinal direction or requires a transition through a state of longitudinal overlap. This requires the challenger to initially have a larger velocity than the SV but then decelerate to a slower longitudinal velocity, combined with a lateral movement. The different positions and the paths leading to the different types of impact are pictured in Figure 6.

Positions for rear impacts can be defined analogously to the positions for a front impact. For side impacts, one position can be identified that overlaps in lateral direction. Furthermore, a distinction is made between positions that do not overlap in longitudinal direction and positions in front of or behind the SV. The decisions leading to the resulting scenarios are depicted in Figure 7; the associated paths leading to a collision are also pictured in Figure 6.
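The two classification criteria can be sketched as a small decision function. The returned labels are descriptive placeholders, not the nomenclature from Table 1.

```python
# Sketch of the classification of the challenger's initial position by the
# two criteria above: impact plane and lateral/longitudinal overlap.

def classify_initial_position(impact_plane, lateral_overlap, longitudinal_overlap):
    """Return a descriptive label for the challenger's initial position.
    Labels are placeholders; the project's nomenclature is defined in Table 1."""
    assert impact_plane in ("front", "rear", "side")
    if impact_plane in ("front", "rear"):
        if lateral_overlap:
            return f"{impact_plane}: laterally overlapping start position"
        if longitudinal_overlap:
            return f"{impact_plane}: transition through longitudinal overlap"
        return f"{impact_plane}: additional lateral movement required"
    # Side impacts: laterally overlapping position, or positions without
    # longitudinal overlap in front of / behind the SV.
    if lateral_overlap:
        return "side: laterally overlapping position"
    return "side: position in front of or behind the SV"

print(classify_initial_position("front", True, False))
```

Enumerating the function over all flag combinations reproduces the decision tree depicted in Figure 7 in code form.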

A nomenclature for the safety relevant logical scenarios was chosen as shown in Table 1.

Apart from the challenger defining the safety relevant logical scenarios, further objects might be of relevance in a scenario. In order to represent only objects that are relevant for the safety relevant scenarios, different roles have been identified for the additional objects in a scenario. These additional objects either may increase the challenge for the SV to handle the situation safely or are important in the sequence of events of a scenario, so that the SV may react to an emerging situation although a collision with a challenging object is not yet imminent.

Action Restrictions
The first group of objects that increase the challenge for the SV are action constraints. These objects limit the space for collision avoidance maneuvers of the subject vehicle. An action constraint alone would not require a collision avoidance action by the SV, but in combination with a challenger, an inadequate avoidance maneuver may result in a collision with the action constraint. Initially, action constraints can be identified that are realized by a single object. This object may be located in front of, behind, or to the side of the SV. In contrast to the initial positions of the challenging vehicle, no distinction is made between positions 2, 3, and 4 in Figure 6. Instead, the positions to the side of the object are treated as a continuous space.

Further action constraints can be identified that are realized by multiple objects. One configuration is the complete blocking of one side of the ego vehicle. Such a blocking may result from a queue of vehicles with headways so low that practically no collision avoidance maneuver to this side is possible for the SV. Alternatively, this blocking of the side may include a gap which is large enough for the SV to move into while requiring a certain longitudinal trajectory. In order to keep the number of dimensions of each scenario as small as possible, it should be possible to describe the position and size of this gap as parameters of the scenario instead of describing trajectories for all vehicles contributing to the existence of this gap.

The described action constraints can be summarized as follows:

  • Object in front of the SV
  • Object behind SV
  • Object to the side of SV
  • Complete blocking of one side of the SV
  • Blocking to the side of the SV with gap sufficiently large for a collision avoidance maneuver
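The five configurations can be captured as an enumeration for tagging additional objects in a scenario. This is a sketch, not the project's data model.

```python
from enum import Enum

# The five action-constraint configurations listed above as an enumeration,
# usable for tagging additional (non-challenger) objects in a stored scenario.
class ActionConstraint(Enum):
    OBJECT_IN_FRONT = "object in front of the SV"
    OBJECT_BEHIND = "object behind the SV"
    OBJECT_TO_SIDE = "object to the side of the SV"
    SIDE_FULLY_BLOCKED = "complete blocking of one side of the SV"
    SIDE_BLOCKED_WITH_GAP = "blocking to the side with a sufficiently large gap"

print(len(ActionConstraint))  # 5 configurations
```

Using an enumeration keeps the scenario dimensionality small, matching the goal stated above of describing, e.g., the gap by position and size parameters rather than by individual vehicle trajectories.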

Dynamic Occlusions
Another role of additional objects in a scenario that increase the challenge for the SV to handle the situation safely are dynamic occlusions. Similar to action restrictions, a dynamic occlusion itself does not require an avoidance maneuver by the SV. Objects are represented as dynamic occlusions if they obstruct the view of the challenging object from a theoretical point of view of the ego vehicle. This allows storing scenarios that are often referred to as "cut-outs" (Figure 8): a vehicle in front of the SV makes a lane change to a neighboring lane because a vehicle further in front is decelerating. This decelerating vehicle then generates the safety-relevant logical scenario A for the SV. If the occluding object were not present, the SV could have perceived the challenging object at an earlier stage and already initiated a collision avoidance maneuver.

The presence of a dynamic occlusion does not necessarily imply that the SV does not perceive the challenging object. Instead, it describes the kinematic behavior of the object serving as obstruction. Depending on the sensor set-up, it may still be possible for the SV to perceive the challenger although it is completely obstructed in a planar representation, since, e.g., a radar sensor may sense the obstructed challenger via reflections from the ground. When realizing the scenarios in a simulation or on a proving ground, the role of the dynamic occlusion will be executed by a vehicle, so that, depending on the type of vehicle, possible sensory advantages for the SV can still be represented.

Multiple Challengers
It is also possible that two objects require collision avoidance action by the SV. For instance, the SV might be challenged at the same time by a challenger from the side and by a rear-end challenger. Yet, it is unlikely that the hypothetical collisions would happen at exactly the same time. Thus, it is necessary to store the temporal relation between the two challengers. Scenarios in which multiple challenging objects are present are extremely rare in driving data, so it is not yet possible to study the underlying mechanisms. Applying the metrics for identifying a single challenger to multiple objects within a limited time window is implemented within the database mechanics. Hence, it is possible to store candidates of scenarios involving multiple challengers, which can be subject to further analysis.

Causal Chains
Apart from objects that increase the challenge for the SV to handle the scenario safely, certain objects may induce a tactical behavior of the SV which may lead to an earlier reaction to the challenging object. Comparable to a human driver, the SV may look further ahead than just one vehicle and sense relevant behavior. An example is a vehicle two vehicles ahead performing a harsh braking maneuver, which forces the vehicle in front of the SV into a harsh braking maneuver as well, so that it becomes the challenger from the SV's point of view. These causal chains can be incorporated in the logical scenarios by treating the challenging vehicle as a vehicle that was initially challenged by another vehicle itself and applying the systematics of the safety-relevant logical scenarios.

5. Preprocessing / Reconstruction

coming soon

6. Process Guidelines + Metrics for HAD Assessment

In general, an automated vehicle poses a risk to different focus groups. The first two groups are the users of the HAD vehicle and the potentially involved accident partners, who can be any individual traffic participants (non-users). As previously discussed, the different views on accepted risk of these two groups result from the benefits the groups receive. The third group is society. In contrast to the first two groups, the fate of an individual is not relevant for society, but the total accident number is. Hereinafter, we only discuss the occurrence of fatalities (index d), so instead of risk, we can give quantitative requirements for the occurrence rate or frequency of fatal accidents.

Quantitative requirements for different types of usage are given in Figure 1. It is concluded that the accepted frequency for a person's death per year is ƒinv = 10⁻⁶ kd/a for involuntary exposure, ƒprof = 10⁻⁵ kd/a for professional exposure, and theoretically unlimited for voluntary exposure, with typical acceptance rates ƒvol of up to 10⁻² kd/a. In general, the accepted risk varies with the benefit for the user or focus group. Most of these considerations date from the 1970s and 1980s and the discussions on the safety of nuclear power plants. The assumptions are still valid in general but should be adapted to today's level of safety. We suggest a factor of one fourth, similar to the development of MEM from the 1970s to today. (Reduction by a change in accident rate would be another approach with a similar outcome.) In the following, the lower risk today compared to the numbers above is indicated by an index with the year and country of the underlying statistic.

Users
The fatal risk for users is assumed to be equivalent to the risk of a fatal accident of a HAD vehicle (neglecting the higher damage with more than one user at the same time). As depicted in Figure 1, the type of exposure is relevant for accepting risks. In most use cases, HAD-F are used voluntarily; they must be actively bought and activated. Professional use is also plausible, but use for work is not expected during the first introduction phase. Involuntary use is excluded in typical use cases. This consideration suggests following the risk acceptance rate of ƒprof,2016,GER = 1.4·10⁻⁹ kd/km. Similar rates are also present in the US (2.5·10⁻⁹ kd/km) [FATALITY RATE PER 100 MILLION ANNUAL VMT - 2013, U.S. Department of Transportation Federal Highway Administration. https://www.fhwa.dot.gov/policyinformation/statistics/2013/pdf/fi30.pdf. Accessed July 17, 2018]. For other countries, data about the mileage on different road types is not always available. The accident rate on all roads' combined mileage is of a similar order of magnitude for most developed countries [Oguchi, T. Achieving safe road traffic—the experience in Japan. IATSS research, Vol. 39, No. 2, 2016, pp. 110–116] [Comparison of 2013 VMT Fatality Rates in U.S. States and in High-Income Countries, U.S. Department of Transportation NHTSA, October 2016. https://crashstats.nhtsa.dot.gov/Api/Public/Publication/812340. Accessed July 17, 2018].

However, the substitution of conventional driving also suggests comparing the risk of today's driving with the suggested rate of ƒprof. In the following, both considerations will be examined and compared. As a reference for today's driving risk, driving on the Autobahn in Germany will be used. This has several advantages. First, driving on controlled-access highways is one of the safest, if not the safest, ways of travelling in a car, especially when taking the accident rate per mileage as a reference. Second, it is likely that the first HAV will drive on controlled-access highways. Third, accident data on highways is well documented: even minor accidents often involve the police because of the traffic disturbance, and the travelled distance can be estimated from the measured traffic density. Assuming an average travel distance d, a time-based frequency (index t) can be converted into a distance-based frequency (index s) and vice versa. In this example, 4000 km/a is assumed as the average travel distance according to VDA (see footnote 4), with an average velocity of 100 km/h.
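The conversion between the time-based and the distance-based frequency can be written down directly with the stated exposure assumption of d = 4000 km/a:

```python
# Conversion between time-based (per year) and distance-based (per km)
# fatal-accident frequencies, using the exposure assumption from the text:
# d = 4000 km/a average travel distance on the Autobahn.

d_km_per_year = 4000.0

def to_time_based(f_s_per_km):
    """f_t [1/a] = f_s [1/km] * d [km/a]"""
    return f_s_per_km * d_km_per_year

def to_distance_based(f_t_per_year):
    """f_s [1/km] = f_t [1/a] / d [km/a]"""
    return f_t_per_year / d_km_per_year

# Example with the German professional-exposure rate quoted above:
f_s = 1.4e-9                 # fatal accidents per km (f_prof, 2016, GER)
f_t = to_time_based(f_s)
print(f"{f_t:.1e} per year")
```

With this exposure, 1.4·10⁻⁹ kd/km corresponds to roughly 5.6·10⁻⁶ kd/a, i.e., the same order of magnitude as the accepted frequency for professional exposure discussed in the text.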

In the following equation (1), the GAMAB principle and the MEM principle are combined. We consider this the upper limit of the tolerable frequency, because a new technology is introduced that comes with a new risk. Additional risk is acceptable because the user experiences a benefit from the new technology. Note that we only consider the risk for the user in this section. This does not apply to non-users or society, which will be discussed in the following sections.

Interestingly, the order of magnitude according to equation (1) corresponds to the accepted frequency for professional exposure. This strengthens the hypothesis that both estimations result in acceptable values for users of automated vehicles. However, higher risk could be accepted by the user (similar to motorbikes or extreme sport) but the user should be aware of this potentially increased risk.

Non-users
For all other traffic participants, HAD has no personal benefit (besides the decreased total risk for all traffic participants, assuming that HAD is generally safer than the average driver). However, non-users could have a lower risk acceptance threshold because they are skeptical about the new technology or might even have (subjective) disadvantages, e.g., due to slow vehicles on the road. Particularly critical is the risk of new types of accidents, because non-users would blame HAD for those accidents despite a potential reduction of the total number.

New risks could be caused, for example, by performance limitations (according to ISO/PAS 21448), systematic software failures, or cyber-attacks. The total new risk of the technology for an individual non-user should be below ƒinv,2016,GER = 2.5·10⁻⁷ kd/a (= ¼ of the value for involuntary exposure). So how can the individual risk for a non-user be calculated? As long as there are not many HAD vehicles on the market, the exposure is very low, and the probability that an individual traffic participant is involved in a HAD accident is low. The risk is therefore multiplied by the field share µ: the risk for non-users is diluted by the exposure to vehicles equipped with HAD.

According to equation (2), the accepted risk for a single HAD vehicle decreases with an increasing number of HAD vehicles. This is intuitively obvious because the exposure multiplies with the number of potential single threats. Comparing this risk level with equation (1) results in a number of 1.625·10⁶ HAD vehicles in Germany at which the risk acceptance of the other traffic participants becomes dominant. This deliberately neglects that the non-user also benefits if the system is safer than the human driver it replaces. Non-users will only acknowledge this benefit if there is an undeniable difference in the accident statistics; otherwise, the (subjective) disadvantage of the new technology stays dominant.
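The dilution argument can be sketched as follows; the total fleet size used to translate vehicle counts into a field share is an illustrative assumption, not a figure from the text.

```python
# Sketch of the dilution argument behind equation (2): an individual
# non-user's exposure to HAD vehicles scales with the field share mu, so
# the tolerable frequency per HAD vehicle decreases as mu grows.

F_INV = 2.5e-7   # tolerable fatal-accident frequency for non-users [1/a]

def tolerable_per_vehicle(mu):
    """Tolerable per-vehicle frequency so that mu * f_vehicle <= F_INV."""
    assert 0 < mu <= 1
    return F_INV / mu

fleet = 45e6  # ASSUMED total vehicle fleet size, for illustration only
for n_had in (1e4, 1e6):
    mu = n_had / fleet
    print(f"{n_had:.0e} HAD vehicles -> tolerable {tolerable_per_vehicle(mu):.1e} 1/a per vehicle")
```

The inverse proportionality to µ is the point: at very low field shares, a comparatively high per-vehicle rate is tolerable for non-users, while at full market share the per-vehicle requirement collapses to F_INV itself.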

Society
For society, the fate of individuals is of lesser importance. Benefits and costs of HAD are measured by the total number of accidents and whether it is reduced over time. In general, a decreasing trend in the accident rate can be observed over the years in Germany [Statistisches Bundesamt (Destatis). Verkehrsunfälle - Fachserie 8 Reihe 7 - 2015, 2015] and the US [Fatality Analysis Reporting System, U.S. Department of Transportation NHTSA, 2017. https://www-fars.nhtsa.dot.gov/Main/index.aspx. Accessed July 18, 2018]. However, this trend has diminished over the last five years for accidents with injuries, which even showed a slight (but insignificant) increase during these years. For fatal accidents, this slowdown of the decrease is observable as well, but it is less significant. One could suspect that there is a natural limit given the current road network, traffic density, and state-of-the-art vehicles.

When introducing HAD, there will still be a non-zero risk of severe accidents and therefore it is likely that HAD will be involved in those severe or even fatal accidents. So, what are the requirements by society if individual accidents do not influence the total number significantly? What is the upper total accident rate limit accepted by society?

The overall target is to reduce the number of accidents over time with the introduction of new technology. If we follow the argumentation of Wachenfeld [Wachenfeld, W. H. K., Dissertation, Technische Universität Darmstadt, 2017] and Kalra [Kalra, N., and D. G. Groves. The Enemy of Good. Estimating the Cost of Waiting for Nearly Perfect Automated Vehicles, 2017], we should allow a certain risk to bring HAD to the market and to gain further knowledge. At the same time, it is not acceptable for society as a whole that the total risk is increased in a noticeable way. There is, however, no way to check how accident numbers would have evolved without the technology once it has entered the market. In the last decade, the decrease of fatal accidents and accidents with injuries diminished while the annual travel distance increased. Hence, it seems justified to use recent numbers as reference. When using the accident rate for fatal accidents f_s,d, an exponential regression is a better fit than a linear regression. Interestingly, this is also the case for accidents in aviation (comp. footnote 5). The standard deviation of the exponential regression for all years since 2010 results in:

Multiplying the standard deviation with the average annual mileage in 2016 results in 23 fatal accidents per year. It must be pointed out that the type of regression and the number of years influence the result. It is also possible to use the double or triple standard deviation as a measure. However, the results will be in a similar order of magnitude. In the following, the result from equation (3) will be used.

The requirement by society should be that the risk from HAD is significantly lower than the described exponential trend observed in the latest data, so HAD should be at least one standard deviation σ_7y,exp better than the predicted performance of conventional driving. However, society should give HAD time to reach this high safety reference. Similar to air traffic, it is necessary to monitor the performance to allow improvement in functions, infrastructure, and user experience. In the following formula, it is suggested to allow an additional risk of one standard deviation at the beginning of the introduction and to demand that the risk shall be three standard deviations lower than the extrapolation when full market share is reached. Therefore, the acceptable risk not only depends on the development of the risk in conventional traffic over the years, but also on the market share µ of HAD.

In the following, a field share µ is assumed that develops similarly to the field share of other driving functions such as electronic stability control (comp. (5)). Full field share is assumed to be reached after T = 30 years and is described by the sine function (1 + sin(π·t/T))/2 for 0 ≤ t ≤ T. In Figure 1, not only µ but also other parameters are time dependent, because the actual safety on the roads is expected to change over time, also without HAD.
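The assumed market ramp can be sketched in a few lines. Note that the phase shift by T/2 below is an assumption chosen so that the field share starts at 0 and reaches 1 at t = T; the function name and parameters are illustrative:

```python
import math

T = 30.0  # years until full field share (from the text)

def field_share(t: float, T: float = T) -> float:
    """Assumed S-shaped market ramp: 0 at introduction, 1 at full share.
    Phase-shifted reading of the sine function in the text (an assumption),
    so that mu(0) = 0 and mu(T) = 1."""
    if t <= 0:
        return 0.0
    if t >= T:
        return 1.0
    return (1.0 + math.sin(math.pi * (t / T - 0.5))) / 2.0

# The field share scales the non-user exposure: risk_nonuser = mu * risk_single
for year in (0, 15, 30):
    print(year, round(field_share(year), 3))  # -> 0 0.0 / 15 0.5 / 30 1.0
```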


Summary of Safety Requirements
In the previous sections, safety requirements for three different stakeholders were deduced. For society, the required risk depends on the market share of HAD. The authors suggest allowing an increase of the total risk by one standard deviation of the predicted accident rate, so HAD can be introduced while the knowledge about its safety level is not yet complete. In addition to society's requirements, non-users (as part of society) have increased requirements for the new risks that come with automation. For users, we suggest constant risk requirements, although they might increase with the current traffic safety over the years. However, the user's requirements are only dominant in the early introduction phase (comp. Figure 1), when the market share is relatively low. From a market share of about 10% on, the requirements of society (and non-users) are dominant. However, once the market share reaches 100%, there are no non-users remaining.

In Table 1, the requirements are summed up. Note that we currently only give values for Germany, because statistics about mileage on controlled-access highways are available. Since the data for the accident rate on the whole road network is in the same order of magnitude for developed countries (see above), we do not expect significant changes in the safety requirements.

Proof of Safety and Probation in the Field
Although we have found individual and societal risk acceptance criteria for HAD, it is neither possible

  • to derive safety requirements or test criteria for specific vehicles from the risk acceptance criteria, nor
  • to integrate a set of scenarios to prove that the individual and societal risks of HAD are acceptable.

For the first bullet point we need a Proof of Safety, for the second a Probation in the Field.

Proof of Safety
Social consensus regarding acceptable risk is regulated by liability laws, e.g. the German ProdSG §5(2), according to which a product that conforms to standards or other relevant technical specifications is presumed to comply with product safety requirements, as far as they are covered by these standards or specifications. Development according to ISO 26262 ensures the "absence of unreasonable risk" (i.e. "risk, judged to be unacceptable in a certain context according to valid societal moral concepts"). Another relevant technical specification is, e.g., ISO/PAS 21448 for SOTIF, represented by automation risks in PEGASUS. However, conformity with standards or other relevant technical specifications is only a necessary condition.

A sufficient condition is given by the rules of the Ethics Committee [Ethik-Kommission Automatisiertes und Vernetztes Fahren, BMVI, June 2017], which state:

  • HAD is reasonable if it promises to reduce damage in the sense of a positive balance of risk compared to human performance.
  • If there is a fundamentally positive balance of risk, technically unavoidable residual risks do not preclude an introduction.

A fundamentally positive balance of risk compared to human performance can be argued following experts' expectation of a general benefit of vehicle automation for traffic safety by

  • avoidance of accidents and
  • reduction of consequences [Rechtsfolgen zunehmender Fahrzeugautomatisierung, Berichte der Bundesanstalt für Straßenwesen, Heft F 83, 2012].

Moreover, the test concept developed in PEGASUS exemplarily ensures that the systems achieve at least human driving performance.
Besides the general benefit for traffic safety, HAD will introduce automation risks in individual cases because of performance limitations and inadequate interaction with drivers and other traffic participants. Those automation risks are identified and, as far as possible, quantified in PEGASUS to ensure that HAD systems cause only technically unavoidable residual risks.
However, product development according to standards and specifications does not replace product tests in the frame of verification and validation. An evaluation of different test strategies is given by Junietz et al. [Junietz, P., W. Wachenfeld, K. Klonecki, and H. Winner. Evaluation of Different Approaches to Address Safety Validation of Automated Driving. Accepted Paper. In 2018 Intelligent Transportation Systems Conference (ITSC), 2018].
A crucial limitation of all a-priori considerations is that a valid statistical proof that the systems actually meet the expectations cannot be provided before they are launched on the market. This results in requirements for the Probation in the Field as well as in measures, to be derived therefrom, for the continuous improvement of the systems.

Limited Introduction and Probation in the Field
Despite all considerations, normative specifications, and product tests, there will be uncertainty about the future performance of HAD at the time of the initial introduction to the market. This uncertainty can either be accepted, if all stakeholders agree that the residual risk is sufficiently small, or controlled by limiting the number of sold vehicles in the introduction phase (see footnote 15). The market share µ is thereby controlled and the exposure for society and non-users reduced. Only the user has to accept the uncertainty if he wants to use HAD. However, the performance of HAD should be observed and the statistics publicly discussed, to build trust in customers and society, but also to identify critical situations and improve the system with updates.

Current measures of field observation have limitations regarding the consideration of special aspects of HAD, the detection of rare events in a timely manner, and consistency during the complete product lifecycle. Potential improvements are, for example:

  • Further development of accident models
  • Definition of indicators for deficiencies of HAD systems
  • Increase density of accident analyses
  • Field studies with HAD systems across manufacturers
  • Surveys of customers who own vehicles with HAD systems
  • Effective testing of HAD systems during Periodical Technical Inspection.

It is necessary to implement those improvements before market introduction and to install an independent organization for the execution of the proof of probation in the field. For efficient product monitoring and continuous optimization of HAD, event data recording (EDR) is necessary. Therefore, standards have to be developed and implemented, as well as procedures to deal with the data.

The legislature has just created the framework for the introduction of HAD with the amendment of the German Road Traffic Act (StVG) in 2017. A proof of Probation in the Field is therefore also an essential prerequisite for the desired and required further development of the legal framework for the introduction of higher levels of automation.

7. Data in PEGASUS-format

For the PEGASUS project, a feasible way to gather data from different sources and different partners needed to be developed. As pointed out in step 2, the sources range from simulated data over FOT and NDS data up to data from development vehicles. In order to get these data into a single database, a common data format needed to be established. The main goal is to ensure a consistent handling of data throughout the different processing steps. To achieve this, different requirements were checked, and a definition based on this information was generated.

First, a structure for all the different data needed to be developed. Since the processing of the data is done in Matlab, a Matlab structure was chosen. However, Matlab is not available to every partner in the project. Therefore, hdf5 files can also be used, but they need to have the same internal structure as the Matlab files.

The main structure inside the data files is designed to have a channel for every signal that is saved in it. Inside this channel, a Time as well as a Value vector corresponding to each other need to be provided for every signal. The time vector should have a (mostly) fixed frequency throughout the entire measurement, but the frequency can vary between different signals. Values which do not change within an entire measurement can be stored only once, with a timestamp of 0. With this structure, all required data can be stored. All mandatory, but also further useful, signals are defined. Additional signals can be enclosed, so that special data, which may be available in different test vehicles, can be integrated into the format.
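A minimal sketch of this channel layout in Python (the signal names below are hypothetical examples, not the official PEGASUS signal catalogue):

```python
# Illustrative sketch of the PEGASUS channel layout: every signal owns a
# Time and a Value vector; constant signals carry a single sample at t = 0.
# Signal names are hypothetical examples, not the official catalogue.
measurement = {
    "EgoVelocity": {          # sampled at a (mostly) fixed rate
        "Time":  [0.00, 0.01, 0.02, 0.03],
        "Value": [33.1, 33.2, 33.2, 33.3],
    },
    "VehicleWidth": {         # constant within the measurement
        "Time":  [0.0],
        "Value": [1.85],
    },
}

def is_valid_channel(channel: dict) -> bool:
    """Each channel must provide Time and Value vectors of equal length."""
    return ("Time" in channel and "Value" in channel
            and len(channel["Time"]) == len(channel["Value"]))

assert all(is_valid_channel(ch) for ch in measurement.values())
```

The same nesting (one group per signal, with `Time` and `Value` datasets inside) would apply to the hdf5 variant of the format.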
Different requirements were assessed during this process step. The main requirements were given by the different data sources as well as by the further processing of the data. The main goal was identified as being able to represent every possible situation the highway chauffeur function may encounter in the project. Since the highway chauffeur is bound to highway scenarios, the data format only needs to be capable of representing these scenarios. Since most of the data is captured from vehicles, an ego perspective is used in the developed format.

Generally two things need to be represented by the data format:

  • Objects in the current scenario
  • Environment of the current scenario


Used coordinate systems
To ensure a consistent representation, two coordinate systems are used in the format: a global coordinate system (North, East, Altitude) to describe the overall movement of the ego vehicle, and a local coordinate system (x, y, z) bound to the rear axle of the ego vehicle. The directions of, as well as the connections between, these two coordinate systems are shown in Figure 1.

Most of the data is located in the vehicle coordinate system, since the main sources of data are recordings made in different vehicles. Nevertheless, a global coordinate system is necessary to be able to represent the overall trajectories of the ego vehicle and all surrounding objects.


Object representation
The object representation has a maximum of 65 object slots. This includes the ego vehicle and up to 64 surrounding vehicles. To assign the data to these object slots, every signal is replicated for every object slot by including the slot number in the name of every signal. The signal names thus become idX_SignalName, with X ranging from 0 to 64. If there are fewer objects, not all of these slots need to be used. For every object, signals for typical translation and rotation values as well as other properties can be stored. All these values are stored relative to the ego vehicle, since this is the common format for measurements taken from a test vehicle.

To be capable of storing more than 65 vehicles with this format, a global object ID is stored as one signal for every object slot. With this global object ID, every slot is able to store different objects over time, since single objects are likely to leave the sensor range after some time. Every object within the measurement data therefore gets a unique global object ID. For the further data processing steps, these global object IDs, and thus the current object, need to stay within a single slot of the data format. Since the number of used slots needs to stay constant within a single measurement, currently unused object slots are indicated by a global object ID of -1.
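A small sketch of this slot convention (the helper functions and the snapshot are illustrative; only the idX_SignalName pattern and the -1 marker come from the text):

```python
# Illustrative helpers for the idX_SignalName slot convention: slot 0 is
# the ego vehicle, slots 1..64 hold surrounding objects, and a
# GlobalObjectID of -1 marks a currently unused slot.
UNUSED = -1

def signal_name(slot: int, name: str) -> str:
    """Build the per-slot signal name, e.g. id1_PosX."""
    if not 0 <= slot <= 64:
        raise ValueError("only 65 object slots (0..64) exist")
    return f"id{slot}_{name}"

def active_objects(global_ids_per_slot: dict[int, int]) -> dict[int, int]:
    """Map global object ID -> slot for all slots currently in use."""
    return {gid: slot for slot, gid in global_ids_per_slot.items()
            if gid != UNUSED}

# Slot 2 is empty; slot 1 currently holds global object 17.
snapshot = {0: 0, 1: 17, 2: UNUSED}
print(signal_name(1, "PosX"))        # -> id1_PosX
print(active_objects(snapshot))      # -> {0: 0, 17: 1}
```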

For the ego vehicle, additional signals are necessary to give a full representation of the current scenario. To be able to recalculate the entire trajectories, values for Northing and Easting of the ego vehicle as well as its current heading angle need to be provided. These values are not needed for the surrounding objects; their complete trajectories can be derived from the ego vehicle's position.


Environment representation
As already mentioned, the environment in the PEGASUS project is restricted to highway scenarios. Therefore, a simple approach to describe the current situation can be used in the data format. The environment is described by the lane markings and the roadside infrastructure. All of these are described by a third-degree polynomial using four signals each (distance, heading, curvature, and curvature derivative). With the help of these values, the position of each feature along the road can be calculated. Figure 2 and Figure 3 show the general concept of the lane markings and also the definitions of heading and distance. Just like distance and heading in Figure 3, the curvature and curvature derivative also need to be given at the rear axle of the ego vehicle.

Equation 1 to Equation 3 show how to compute the position, heading, and curvature in the ego vehicle's coordinate system:
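Equations 1 to 3 themselves are not reproduced in this excerpt. A common clothoid-style reading of the four signals (lateral distance d, heading ψ, curvature c, curvature derivative c', all given at the rear axle) is sketched below under a small-angle approximation; this is an assumption for illustration, not the official PEGASUS equations:

```python
import math

def lane_marking_at(x: float, d: float, psi: float, c: float, c1: float):
    """Evaluate a lane-marking description (distance d, heading psi,
    curvature c, curvature derivative c1, all given at the ego rear axle)
    at longitudinal distance x. Small-angle clothoid approximation; an
    illustrative reading, not the official PEGASUS Equations 1-3."""
    y = d + math.tan(psi) * x + (c / 2.0) * x**2 + (c1 / 6.0) * x**3
    heading = psi + c * x + (c1 / 2.0) * x**2
    curvature = c + c1 * x
    return y, heading, curvature

# Straight marking 1.8 m to the left: the lateral offset stays constant.
y, h, k = lane_marking_at(10.0, d=1.8, psi=0.0, c=0.0, c1=0.0)
print(y, h, k)   # -> 1.8 0.0 0.0
```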

Additional data, like lane marking color, lane marking type or type of roadside infrastructure (barrier, guard rail, grass, …) can be given with an additional signal.

 

Necessary, recommended, and optional signals
Based on the requirements of the database process, some signals need to be available in every measurement. These signals are shown in Figure 4. For a basic representation of the environment, at least the left and right lane markings need to be available. These are bound to the ego vehicle. Furthermore, the positions and speeds, as well as the global object IDs of all objects, are mandatory for every measurement. For the recalculation of the global trajectories of all objects, the absolute position of the ego vehicle is also needed in every measurement.

If these data are available in a measurement, all further process steps can be carried out. As already mentioned, many more signals are defined within the format. These can be used for further extensions of the database processes or other specific calculations. The remaining signals are divided into recommended and optional signals. The recommended signals should be provided if possible; the optional ones are only necessary for special applications.

In the following, a list of the mandatory PEGASUS JSON signals is shown. These signals must be available in the test data. The convention is that the index for the ego vehicle is 0, while indices between 1 and 64 are used for surrounding objects.

8. Application of Metrics + Mapping to Logical Scenarios

Mapping to Logical Scenarios

For the mapping from measurement data to logical scenarios, different signals and data are needed. For this purpose, a processing framework is set up. It takes the recorded data as input and outputs the mapped scenarios together with key performance indicators (KPIs) that describe the logical scenario.
In a first step, the data is checked for errors and, if possible, these errors are handled and corrected automatically. If not, the data conversion step must be repeated. Since there are many different KPIs that may be desired for the analysis of the different logical scenarios, the processing framework is flexible concerning the computed measures. To achieve this, it analyzes the given scripts and functions needed to derive the measures and builds a computation tree from the requirements of the distinct scripts. It analyzes the input and output of each script and orders them in such a way that, when a script runs, all needed signals have already been calculated.
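This ordering step is essentially a topological sort over the declared inputs and outputs. A minimal sketch, with hypothetical script and signal names (the actual PEGASUS framework is Matlab-based; Python is used here only for illustration):

```python
# Illustrative sketch of the dependency ordering described in the text:
# each script declares its input and output signals, and a topological
# sort guarantees every input is computed before it is consumed.
# Script and signal names are hypothetical.
from graphlib import TopologicalSorter

scripts = {
    "derive_thw":      {"inputs": {"distance", "ego_speed"}, "outputs": {"thw"}},
    "lane_assignment": {"inputs": {"lane_markings"},         "outputs": {"object_lane"}},
    "cut_in_prob":     {"inputs": {"thw", "object_lane"},    "outputs": {"p_cut_in"}},
}

def execution_order(scripts: dict) -> list[str]:
    # Which script produces which signal; raw signals have no producer.
    producer = {sig: name for name, s in scripts.items() for sig in s["outputs"]}
    deps = {name: {producer[i] for i in s["inputs"] if i in producer}
            for name, s in scripts.items()}
    return list(TopologicalSorter(deps).static_order())

order = execution_order(scripts)
# cut_in_prob must come after both of its producers:
assert order.index("cut_in_prob") > order.index("derive_thw")
assert order.index("cut_in_prob") > order.index("lane_assignment")
```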

The functions and scripts are put into four distinct groups: derived signals, scenario probabilities, scenario extraction, and indicators. Derived signals are the first step of the computation. Here, signals such as the time-to-collision, the time headway, or the assignment of objects to particular lanes are calculated. Scenario probability functions return a probability for each time step of the recording, indicating whether, and with which probability, that time step is part of a scenario. Functions for scenario extraction then take these probabilities and extract scenarios, thereby not only taking into account the single probabilities but also accounting for sensory deficiencies and processing hiccups. The last group of functions, the indicators, extracts KPIs for each of the extracted scenarios. Depending on the type of scenario, these indicators can be more or less extensive.
After all these functions have been processed, the data is aggregated for further use. In the current state of the processing framework, it is passed on to the database for further aggregation across trips and recordings. For this purpose, the extent of a scenario and its KPIs are transferred to a JavaScript Object Notation (JSON) file, which can then be read in by the database.

The entire framework is executed for one sole purpose: the extraction of scenarios and of the KPIs describing those scenarios. Due to its modular nature, the framework can always be extended to extract more scenarios or calculate more KPIs. If the naming conventions of the used signals are kept and the needed signals are available, the framework will easily integrate any new function.

The before-mentioned scenarios are recognized in the functions of the scenario probability group. From the data, the probabilities are derived using a four-stage process. For this purpose, in the first stage, comfort zones are set up around the ego vehicle. These zones are shown in Figure 1. The zone shown by the green square is limited by the ego lane markings to the left and the right. Limitation to the front is handled dynamically by the time headway (THW).

Figure 1: The two defined comfort zones around the ego vehicle. The first is laid along the ego lane (green), the second is in proximity around the ego vehicle (red)

For the second comfort zone (red in Figure 1), the dimensions are set statically. A common selection here is a few meters before and behind the car and 1 m beside the car. Bear in mind that other objects are handled by their center, i.e. if the distance from the side of the ego vehicle to the object is 1 m, it is already very close.

As noted above, the framework has a modular approach. Therefore, the assignment of objects to the lanes is given as a derived signal. This means that, when the mapping to the logical scenarios is performed, the function already knows whether objects are in the ego lane at a distinct point in time or not. The same goes for the infringement of the proximity (red) zone. Therefore, the second stage of the recognition is computing all entrances into the ego lane across the complete trip. This is then complemented with the information about the exact position of the infringing object, thereby taking into account the THW thresholds used to define the relevance of the logical scenarios. Entrances into the (green) comfort zone are only considered relevant if they occur below a certain threshold.

Once a candidate for entering the comfort zone is found, the third stage is the mapping to the actual logical scenario. For this purpose, the time before and after the object entering the comfort zone is investigated. Using the object's dynamic information, the framework can deduce which logical scenario this entrance belongs to.

The fourth and last stage is the reduction of the parameters needed for the description of the scenario. For brevity in the description of the scenarios, and thereby also for memory efficiency, the complete trajectory is not stored. Instead, only parameters allowing a precise reconstruction of the trajectory are computed. For entrances from the front and rear, this is rather simple, since a straight movement relative to the road can be assumed. For entrances that include a lane change, a simpler function describing the lane change is fitted. This way, even complex scenarios can be described using only a small subset of the parameters in the raw recording data.
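The text does not specify which function is fitted to lane changes. A common low-parameter choice is a minimum-jerk (quintic) profile, sketched below as an assumption: the whole lateral motion reduces to three parameters (start time, duration, lateral offset), which is exactly the kind of compression the fourth stage aims at.

```python
# Hypothetical lane-change parameterization: the lateral motion is reduced
# to three parameters (start time, duration, lateral offset) by assuming a
# minimum-jerk quintic profile. The quintic is an assumption for
# illustration; the PEGASUS framework may fit a different function.
def lane_change_y(t: float, t0: float, duration: float, width: float) -> float:
    """Lateral offset at time t for a lane change starting at t0."""
    if t <= t0:
        return 0.0
    if t >= t0 + duration:
        return width
    tau = (t - t0) / duration
    return width * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)

# 3.5 m lane change over 4 s: half the offset is reached at mid-maneuver.
print(lane_change_y(2.0, t0=0.0, duration=4.0, width=3.5))  # -> 1.75
```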

After a scenario has been fully recognized and its parameters have been extracted, the probability of the scenario is stored for further usage in the extraction process and for the computation of the KPIs.

Application of Metrics

In the following, the purpose and conception of the criticality metric are described as one example of an application of metrics. More detail can be found in the publication by Junietz et al. [1]. When deducing scenarios from real-world data, a filter to find relevant scenarios is required. Most driving, especially on motorways, is monotone and does not require a special test case [2]. The metric should detect all scenarios with increased driving requirements in order to derive test cases that are challenging. It can be applied to data from NDS/FOT and during HAD test drives to detect relevant scenarios. When the metric is applied online during driving, real-time computation is necessary. If this requirement is not met, a metric that filters out all definitely irrelevant scenarios could be used in a first assessment step in order to reduce the number of scenarios that need further computational effort. The Worst-Time-To-Collision [2] is a metric designed to work as such a first filter to efficiently find those scenarios and could be used here.


Requirements for a Criticality Metric

Besides the absence of accidents, metrics that describe the criticality of the test scenario could be used in order to assess safety (compare section 4.3.2). Criticality is hereby defined as the driving requirements that are present in the scenario. The driving requirements are highly dependent on the scenario the HAD-F is exposed to. Hence, a general requirement cannot be deduced. The occurrence of scenarios with high requirements can only be compared statistically in real-world driving. If two different HAD-F perform in a different way in the same artificial test (simulation/test field), there is not necessarily a difference in safety, as a more conservative driving strategy does not necessarily mean increased safety if both HAD-F drive accident-free. An exception would be if the driving capabilities of the HAD-F are known (e.g. a reaction time until a full brake due to computation time and brake system delay). In this case, it is a requirement that the driving capabilities are sufficient to perform accident-free, and the driving strategy has to be adapted accordingly.

As all kinds of accidents shall be prevented, the accident severity is not considered here; hence, it is not a risk metric. Ideally, the metric should describe the probability of an accident in a specific situation for AD or a human driver. As different drivers and different AD systems have different driving performance and therefore different collision probabilities, the metric shall describe the driving requirements in a situation independently of the available driving performance [3]. The driving requirements consist of the following entities:

  1. Necessary acceleration in lateral and longitudinal direction
  2. Margin for corrections of course angle (side distance)
  3. Margin for correction of speed (front distance)

These entities depend on each other. A higher margin for corrections can be bought with a higher deceleration early on. Hence, the three entities should be normalized and combined into a joint value.


Comparison with State of the Art

Various metrics exist for the identification of scenarios, trajectory planning, and the assessment of criticality in general. They can be classified into a posteriori metrics, which are often used to analyze NDS [4, 5], and metrics with deterministic [6–9] and probabilistic trajectory prediction [9–13]. An overview and a classification of metrics with trajectory prediction can be found in [10] and [14]. However, those metrics do not combine all of the mentioned entities. In [15, 16], Monte Carlo sampling is used to address combined maneuvers. In [17], the reachable and available free space is assessed to derive criticality. However, these approaches do not address the difficulty of driving on those trajectories. A combination of different criteria is often done in trajectory planning and optimization [18, 19]. Defining an optimization function to be minimized offers the opportunity to combine different entities and weighting factors. Typically, the resulting trajectory is of interest, together with the required control values in case model predictive control (MPC) is used. In PEGASUS, we use a similar approach with an optimization function designed to describe criticality as the driving requirements mentioned above. The minimized value of the optimization function is the criticality of the situation. It is important to note the difference between this approach and most use cases of trajectory optimization. The metric has no direct feedthrough to the trajectory control. Its only purpose is to optimize possible future trajectories in order to calculate the criticality. This can be done either online during a test drive or a posteriori from recorded data.


Computation

To obtain the desired criticality metric, we define the state-space model with state vector x = [x_w, y_w, v_v, ψ_c] and input vector u = [a_x, a_y]. The problem depends on the current ego velocity in natural coordinates v_v, the course angle ψ_c, and the accelerations a_x, a_y in natural coordinates. Additionally, the position in world coordinates is required because the positions of the objects are not updated into vehicle coordinates once the prediction horizon is initialized. For most MPC applications, a linear single-track model is used. This has the advantage that the actuator inputs can be optimized directly. However, in the application of the criticality metric, the accelerations themselves are of interest. As we do not use the model for actual vehicle control, additional deviations from a simplified model (e.g. due to neglecting the tire cornering stiffness) can be neglected. Instead, a point-mass model is used that is extended by a course angle, which is required to transform the vehicle motion into world coordinates. The non-linear vehicle model is described by:

Assuming small course angle changes and small changes in velocity allows linearization of the model. Initializing the model at course angle zero and using a constant velocity vold, which is updated only at the beginning of the prediction horizon (comp. [18]), the equations simplify to:

Resulting in the system matrices:
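The model equations themselves do not survive in this excerpt. A reconstruction consistent with the described point-mass model with course angle (an assumption based on the surrounding text, not the original equations) is:

```latex
% Non-linear point-mass model with course angle (reconstruction):
\dot{x}_w = v_v \cos\psi_c, \qquad
\dot{y}_w = v_v \sin\psi_c, \qquad
\dot{v}_v = a_x, \qquad
\dot{\psi}_c = \frac{a_y}{v_v}

% Linearization around \psi_c = 0 with constant velocity v_{old}:
\dot{x}_w = v_v, \qquad
\dot{y}_w = v_{old}\,\psi_c, \qquad
\dot{v}_v = a_x, \qquad
\dot{\psi}_c = \frac{a_y}{v_{old}}

% Resulting continuous-time system matrices for
% x = [x_w \; y_w \; v_v \; \psi_c]^T and u = [a_x \; a_y]^T:
A = \begin{bmatrix}
0 & 0 & 1 & 0\\
0 & 0 & 0 & v_{old}\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0
\end{bmatrix},
\qquad
B = \begin{bmatrix}
0 & 0\\
0 & 0\\
1 & 0\\
0 & \tfrac{1}{v_{old}}
\end{bmatrix}
```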

The goal of the MPC is to optimize the trajectory with respect to the criticality within a given prediction horizon N. The first two influence factors are the accelerations, which are minimized. The margin for corrections of the course angle depends on the lateral distance and the velocity. A higher distance gives more space to correct the course angle. A lower velocity helps in two ways: a disturbance (e.g. side wind) has a smaller influence on the position deviation (because the velocity is integrated), and the course angle can be corrected with less effort in a_y. As we want to minimize a term that includes the maximization of distances, we use the position with the maximum lateral distance to all obstacles as reference r_y and a maximum deviation d_y, resulting in the following term:

The margin in the longitudinal direction cannot be used in such a way, because only the front distance is relevant. As reference r_x, we use the recommended following distance in Germany (0.5·v in m per km/h). We use the piecewise linear function:
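Neither term survives in this excerpt. A hedged reconstruction matching the descriptions (a quadratic lateral margin term around the reference r_y, and a piecewise linear longitudinal term that only penalizes falling below the recommended following distance r_x; symbol choices are assumptions) could read:

```latex
% Lateral margin term (reconstruction): deviation from the position r_y
% with the maximum lateral distance to all obstacles, normalized by the
% maximum deviation d_y:
R_y = \left( \frac{y_w - r_y}{d_y} \right)^2

% Longitudinal term (reconstruction): only a front distance x_{rel}
% below the recommended following distance r_x is penalized:
R_x = \begin{cases}
\dfrac{r_x - x_{rel}}{r_x} & x_{rel} < r_x \\[4pt]
0 & x_{rel} \ge r_x
\end{cases}
```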

With the minimization of an objective function alone, a collision with other objects is not excluded. Hence, the outputs are bounded by constraints that are updated at the start of the prediction horizon, assuming constant velocity and course angle. Again, we neglect objects coming from behind and represent the left, right, and front boundaries by the constraints c_l, c_r, and c_f, respectively. Additionally, the inputs must be bounded because the dynamics of a vehicle are limited by the available friction coefficient. According to Kamm's circle:
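The constraint itself is omitted in this excerpt. Kamm's circle bounds the combined acceleration by the available friction (the symbol names below are assumptions):

```latex
% Kamm's circle: the combined longitudinal and lateral acceleration must
% not exceed the friction limit \mu_{fric}\, g:
a_x^2 + a_y^2 \le \left( \mu_{fric}\, g \right)^2
```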

However, to allow efficient solving of the optimization problem, Kamm's circle is approximated with 12 linear constraints, with the parameters taken from [18]. Modeling the objective function as the sum of the defined elements with constant weighting factors gives us the following optimization problem:

The weighting factors are chosen in a way that gives equal weight to the four elements of the optimization function. However, R_x typically results in values 10 to 20 times higher than the other elements, because the ego vehicle is approaching and the elements of the objective function are summed up over the subsequent steps. Hence, we choose a weighting factor of 1/10 here, while all other factors remain 1. The final criticality is the maximum of the acceleration during the prediction divided by the maximum possible acceleration.

Developed Workflow
The computational effort of the proposed metric is high compared to state-of-the-art metrics. Hence, it is suggested to preselect scenarios that might be relevant, or in other words, to filter out scenarios that are definitely not relevant. We suggest using the Worst-Time-To-Collision (WTTC) metric [2], as it addresses all kinds of scenarios. It calculates the time until a collision if both vehicles perform the maneuver that leads to the fastest possible collision (Figure 2).
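A simplified, purely longitudinal reading of this idea can be sketched as follows. Note that the full WTTC of [2] also covers lateral worst-case maneuvers; this one-dimensional version, with assumed maximum accelerations, is an illustration only:

```python
import math

def wttc_1d(gap_m: float, closing_speed_mps: float,
            a_ego: float = 9.81, a_obj: float = 9.81) -> float:
    """One-dimensional worst-case time-to-collision: both vehicles
    accelerate toward each other with their (assumed) maximum
    acceleration. Simplified illustration of the WTTC idea from [2];
    the original metric also covers lateral worst-case maneuvers."""
    a = a_ego + a_obj
    # gap = v * t + a/2 * t^2  ->  smallest non-negative root for t
    disc = closing_speed_mps**2 + 2.0 * a * gap_m
    return (-closing_speed_mps + math.sqrt(disc)) / a

# 20 m gap, currently neither closing nor opening:
print(round(wttc_1d(20.0, 0.0), 2))  # -> 1.43
```

Because the worst-case maneuver is always assumed, the WTTC is finite even for vehicles that are currently not on a collision course, which is what makes it usable as a broad first filter.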

As threshold, a WTTC value of at least one second is recommended based on calibration with real-world scenarios [3] (Figure 3). Other metrics would also be possible in scenarios where they can be applied (e.g. TTC, TTB or required acceleration for front-to-rear collisions). The identified thresholds are similar to those found in the SHRP2 database [20].
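A strongly simplified point-mass variant of the WTTC idea can be sketched as follows: both objects’ reachable sets grow as circles of radius v·t + 0.5·a·t², and the worst collision time is when the circles first touch. The full metric in [2] models the vehicles in more detail; the function names and default accelerations here are illustrative assumptions.

```python
import math

def wttc(d: float, v1: float, v2: float, a1: float = 9.81, a2: float = 9.81) -> float:
    """Simplified worst-time-to-collision: solve
    0.5*(a1+a2)*t^2 + (v1+v2)*t - d = 0 for the time t at which the
    gap d between the two point masses can first be closed."""
    v, a = v1 + v2, a1 + a2
    return (-v + math.sqrt(v * v + 2 * a * d)) / a

def is_relevant(d: float, v1: float, v2: float, threshold: float = 1.0) -> bool:
    """Pre-filter: keep a scenario only if its WTTC falls below the threshold."""
    return wttc(d, v1, v2) < threshold

print(wttc(10.0, 20.0, 0.0))
```

With the one-second threshold, only scenarios in which a collision is reachable within a second survive the filter and are passed to the expensive optimization-based metric.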

If sufficient data is available, machine learning could be used to classify scenarios for a first evaluation. During training, false negatives should be prevented by using a cost matrix penalizing those events. For classification, an ordinal scale is suggested that rates a scenario as uncritical, slightly critical, or critical. For a cut-in scenario, the metrics WTTC, Time-Headway, Time-to-Steer, and Collision Index (CI = v²/TTC) were favored using feature-selection methods. With more data available, a trained classification model could replace the suggested filter based on WTTC alone.
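The ordinal scale can be sketched with a simple rule-based rating over the favored metrics. All thresholds below are invented for the sketch, not calibrated project values; a trained classifier would replace these hand-set rules.

```python
def collision_index(v: float, ttc: float) -> float:
    """CI = v^2 / TTC as named in the text (units depend on the inputs)."""
    return v * v / ttc

def rate_scenario(wttc_s: float, time_headway_s: float, ci: float) -> str:
    """Illustrative ordinal rating on the suggested three-level scale.
    The thresholds are placeholders for what a trained model would learn."""
    if wttc_s < 0.5 or ci > 400.0:
        return "critical"
    if wttc_s < 1.0 or time_headway_s < 0.9:
        return "slightly critical"
    return "uncritical"

print(rate_scenario(0.4, 1.5, collision_index(v=20.0, ttc=2.0)))
```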


References
[1]    P. Junietz, F. Bonakdar, B. Klamann, and H. Winner, “Criticality Metric for the Safety Validation of Automated Driving using Model Predictive Trajectory Optimization,” in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), 2018, pp. 60–65.
[2]    W. Wachenfeld, P. Junietz, R. Wenzel, and H. Winner, “The worst-time-to-collision metric for situation identification,” in 2016 IEEE Intelligent Vehicles Symposium (IV), 2016, pp. 729–734.
[3]    P. Junietz, J. Schneider, and H. Winner, “Metrik zur Bewertung der Kritikalität von Verkehrssituationen und -szenarien,” in 11. Workshop Fahrerassistenzsysteme, Walting, 2017.
[4]    M. Benmimoun, Automatisierte Klassifikation von Fahrsituationen: fka Forschungsgesellschaft Kraftfahrwesen mbH, 2015.
[5]    T. A. Dingus, R. J. Hanowski, and S. G. Klauer, “Estimating crash risk,” Ergonomics in Design: The Quarterly of Human Factors Applications, vol. 19, no. 4, pp. 8–12, 2011.
[6]    C. Rodemerk, S. Habenicht, A. Weitzel, H. Winner, and T. Schmitt, “Development of a general criticality criterion for the risk estimation of driving situations and its application to a maneuver-based lane change assistance system,” in Intelligent Vehicles Symposium (IV), 2012 IEEE, 2012, pp. 264–269.
[7]    H. Winner, S. Geyer, and M. Sefati, “Maße für den Sicherheitsgewinn von Fahrerassistenzsystemen,” in Maßstäbe des sicheren Fahrens. 6. Darmstädter Kolloquium Mensch + Fahrzeug, 2013.
[8]    R. K. Satzoda and M. M. Trivedi, “Safe maneuverability zones & metrics for data reduction in naturalistic driving studies,” in Intelligent Vehicles Symposium (IV), 2016 IEEE, 2016, pp. 1015–1021.
[9]    F. Damerow and J. Eggert, Eds., Predictive risk maps. Intelligent Transportation Systems (ITSC), 2014 IEEE 17th International Conference on, 2014.
[10]    M. Schreier, Bayesian environment representation, prediction, and criticality assessment for driver assistance systems. Dissertation, Technische Universität Darmstadt, 2016. Düsseldorf: VDI Verlag GmbH, 2016.
[11]    J. Eggert and T. Puphal, “Continuous Risk Measures for ADAS and AD,” in FAST-zero '17, 2017.
[12]    J. Eggert, “Risk estimation for driving support and behavior planning in intelligent vehicles,” at-Automatisierungstechnik, vol. 66, no. 2, pp. 119–131, 2018.
[13]    D. Althoff, J. Kuffner, D. Wollherr, and M. Buss, “Safety assessment of robot trajectories for navigation in uncertain and dynamic environments,” Auton Robot, vol. 32, no. 3, pp. 285–302, http://dx.doi.org/10.1007/s10514-011-9257-9, 2012.
[14]    S. Lefèvre, D. Vasquez, and C. Laugier, “A survey on motion prediction and risk assessment for intelligent vehicles,” (English), Robomech J, vol. 1, no. 1, pp. 1–14, http://dx.doi.org/10.1186/s40648-014-0001-z, 2014.
[15]    A. Broadhurst, S. Baker, and T. Kanade, “Monte carlo road safety reasoning,” in Intelligent Vehicles Symposium, 2005. Proceedings. IEEE, 2005, pp. 319–324.
[16]    A. Eidehall and L. Petersson, “Statistical Threat Assessment for General Road Scenes Using Monte Carlo Sampling,” Intelligent Transportation Systems, IEEE Transactions on, vol. 9, no. 1, pp. 137–147, 2008.
[17]    C. Schmidt, Fahrstrategien zur Unfallvermeidung im Straßenverkehr für Einzel- und Mehrobjektszenarien. KIT, Diss.--Karlsruher Institut für Technologie, 2013. Karlsruhe, Baden: KIT Scientific Publishing, 2014.

9. Logical Scenarios & Parameter Space


Logical scenarios that were defined in the previous steps are stored in this data container together with their variants extracted from measurement data. Logical scenarios are an abstract description of a traffic scenario. They are characterized by the fact that not all parameters, such as distances or speeds, have a concrete value. Instead, some values are left open and only parameter ranges are specified. How often which parameter combinations occur can be extracted from measurement data. Several components are required to store these logical scenarios together with the variants found in measurement data (concrete scenarios). First, parameters that uniquely describe the concrete characteristics of the scenarios are derived from the modeling of the logical scenarios. These parameters are then determined automatically in the extraction step for the scenarios found. In order to be able to use the modelled scenarios in simulations or on the test track, templates are created in OpenSCENARIO and OpenDRIVE. These templates are created manually for each scenario and stored in the database.

After scenarios have been found and extracted from measurement data in the previous process steps, e.g. from test runs, they are stored in this data container. However, it is not enough to extract scenarios; it must also be possible to play them back in the simulation and on the test track. On the one hand, it is important that a scenario found in measurement data, including the static environment, can be algorithmically correctly extracted and modelled. On the other hand, it is at least as important that these scenarios can be described technically so that the events can be reproduced precisely and that the description format can be used both in simulation and on the test track.

Within this project, the two description formats OpenDRIVE and OpenSCENARIO are used to fulfill this task. Both formats are open standards and are based on the XML markup language. They address the need for an open standard to describe the static and dynamic environment. Thus, within the database, a single combination of standards can be used to aggregate data from different sources such as FOTs, NDSs or simulations and simultaneously use them on the test track or in simulations from different vendors.

OpenDRIVE was first released in 2005 and allows describing a static environment with a focus on roads. Individual road elements such as straight lines and curves can be parameterized and combined with other elements to create entire routes. OpenSCENARIO was first introduced in 2014 and was further developed during the PEGASUS project. Unlike OpenDRIVE, OpenSCENARIO is used to describe the temporally dynamic elements of a scenario. For example, the position of a traffic light is defined in OpenDRIVE, while its switching times are defined in OpenSCENARIO. However, the focus is on the vehicles themselves. For example, vehicles are defined in catalogs and positioned on a route that is referenced in OpenDRIVE. The behavior of the vehicles in the scenario can be described by a story. Vehicle actions can be triggered when conditions occur. The actions include, for example, a lateral movement to change lanes at different levels of abstraction: a vehicle can be given only a new destination lane, the lateral movement can be described by geometric primitives, or it can be given precisely as a trajectory in the form of a list of positions.

Although OpenDRIVE and OpenSCENARIO are technically powerful enough to define scenarios, there is still room for improvement. For example, the definition of routes in OpenDRIVE is often time-consuming, especially for more complex routes. For this reason, DLR introduced the SimpleRoad description language. It makes it easier to generate routes by providing a shorter and easier-to-understand description. Files in the SimpleRoad format can be translated into OpenDRIVE by a provided converter.

Even when describing the dynamic parts of a scenario in OpenSCENARIO, compromises must be made in the current version with regard to parameterization. This becomes clear with the example of a lane-change trajectory. Within the scenario concept, lane changes are described by fourth-degree polynomials. However, since the current version of the OpenSCENARIO standard does not support these as geometric primitives, such a trajectory must be defined as a list of positions. This makes it possible to use the polynomial for lane changes, but the trajectory cannot be described by a few parameters in OpenSCENARIO itself. Supported geometric shapes such as straight lines, on the other hand, can easily be modified by parameters, which is an important requirement for stochastic variation. The solution to this problem is to allow the introduction of non-standard-compliant elements into OpenSCENARIO. For these, however, an appropriate script must be provided, which is executed by an introduced Transpiler and translates the new definition into standards-compliant OpenSCENARIO. Thus, new maneuvers and other elements can easily be integrated and distributed to partners for evaluation before eventually being integrated into the standard.
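Sampling a polynomial lane change into a position list can be sketched as follows. The boundary conditions chosen here (zero lateral velocity at both ends, smooth arrival) are one common way to fix the coefficients of a fourth-degree polynomial; the exact conditions used in the project may differ.

```python
def lane_change_offset(t: float, duration: float, width: float) -> float:
    """Fourth-degree lateral offset y(t) = w * (6 s^2 - 8 s^3 + 3 s^4), s = t/T,
    which satisfies y(0) = 0, y(T) = w and zero lateral velocity at both ends
    (an assumed set of boundary conditions for illustration)."""
    s = t / duration
    return width * (6 * s**2 - 8 * s**3 + 3 * s**4)

def as_position_list(duration: float, width: float, v_longitudinal: float, n: int = 11):
    """Sample the polynomial into the (x, y) position list that an
    OpenSCENARIO trajectory element expects."""
    dt = duration / (n - 1)
    return [(v_longitudinal * i * dt, lane_change_offset(i * dt, duration, width))
            for i in range(n)]

traj = as_position_list(duration=4.0, width=3.5, v_longitudinal=30.0)
print(traj[0], traj[-1])
```

This illustrates the compromise described above: the maneuver is fully defined by three parameters (duration, width, speed), but what ends up in the scenario file is the sampled list of positions.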

These data formats (OpenDRIVE and OpenSCENARIO) and extensions (SimpleRoad and the Transpiler) are used to define the logical scenarios. As already mentioned, not all parameters are assigned a concrete value. Since elements such as distances and speeds are not fixed in a logical scenario, placeholder variables are specified in the files at these points. Before the scenarios can be used in a simulation, for example, these variables must be replaced by concrete values so that a concrete scenario is created from the logical scenario.
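The substitution step can be sketched in a few lines. The template fragment and placeholder names below are hypothetical; the real files are complete OpenSCENARIO/OpenDRIVE documents.

```python
from string import Template

# Hypothetical logical-scenario fragment with placeholder variables.
logical_template = Template(
    '<SpeedAction target="$ego_speed"/> <distance value="$gap"/>'
)

def make_concrete(params: dict) -> str:
    """Replace every placeholder with a concrete value drawn from the
    parameter space, turning the logical scenario into a concrete one."""
    return logical_template.substitute(params)

print(make_concrete({"ego_speed": 33.3, "gap": 42.0}))
```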

Method to define acceptance criteria

As an example, consider the single cut-in from the right, which is shown in Figure 30. For the dynamic elements of this logical scenario, a total of 8 parameters were required, which are listed in Table 1. If a value is defined for each parameter, the constellation of the vehicles shown in this scenario can be uniquely described. The space of all possible parameter combinations is the parameter space. It contains all possible variations of this one scenario. After, for example, a right-hand cut-in has been detected in the data, the values of all parameters are determined from the trajectories and saved. After this has been carried out for a large number of scenarios, analyses can be performed in the next process steps, for example on the frequency of certain characteristics, in order to use the findings for the simulation.

10. Integration of Pass / Fail Criteria


Pass/fail criteria are implemented as fixed thresholds. These thresholds are expressed in a function-style format; for this purpose, a unified language offering basic operators and functions was defined. As long as the threshold inequality is fulfilled for all time steps during a test, the test is considered passed. The format is based on MATLAB function syntax to reduce the implementation effort needed to integrate pass/fail criteria into existing testing toolchains.
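The “inequality must hold at every time step” semantics can be sketched as follows, here in Python rather than the MATLAB-based format; the signal names and the example threshold are invented for illustration.

```python
# A pass/fail criterion is an inequality evaluated at every time step;
# the test passes only if it holds for the whole trace.
def passed(test_trace, criterion) -> bool:
    return all(criterion(sample) for sample in test_trace)

# Example criterion: front gap (m) stays at or above 0.5 * speed (km/h),
# i.e. the "half speedometer" following-distance rule.
gap_rule = lambda s: s["front_gap_m"] >= 0.5 * s["speed_kmh"]

trace = [{"front_gap_m": 60.0, "speed_kmh": 100.0},
         {"front_gap_m": 48.0, "speed_kmh": 100.0}]
print(passed(trace, gap_rule))  # second sample violates the rule
```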

11. Space of logical Test Cases

coming soon

12. Application of Test Concept incl. Variation Method

coming soon

13. Test Cases

coming soon

14. Test HAD-F: Simulation, Proving Ground, Real World Drive

coming soon

15. Test Data


Measurements in the test data container generally contain the mandatory PEGASUS JSON signals. Simulation test data and proving-ground test data have their own data-description formats depending on the applied measuring technology, the simulation architecture or tools, and the bus architecture of the test object. In order to standardize the treatment of test data, a general test data format was defined in the PEGASUS project. This test data format should have the following hard characteristics:

  • Be able to fully describe the tested scenario
  • Allow metrics to be applied to the test data

It also should have the following soft characteristic:

  • Be easily usable as input for the PEGASUS method, especially for real-world drive tests

The hard characteristics fully match the requirements of the PEGASUS scenario database. Therefore, it was decided to use the PEGASUS JSON format description, with the benefit that database scripts can easily be applied to the test data sets. The only difference between this container and the container of step 7 is that here the ego-describing signals are provided by a vehicle with the driving function activated (closed loop). Within the data container of step 7, the ego signals may be provided by a vehicle without an automated driving function, e.g. driven open loop or manually.

In order to compare test data sets from different test instances, a converter needs to be implemented which transforms the measured signals and their format into the PEGASUS JSON format. This converter can be used in simulation, on the proving ground and in real-world tests as long as the input signals are chosen carefully. It also ensures the comparability of test results between the different test instances and thus allows the results to be verified against each other.
The test data from simulation, proving grounds, and real-world drives can also be used as input for a second iteration of the PEGASUS method. For this purpose, a converter should be implemented that converts the result data to the common input format of the database.
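The core of such a converter is a signal-name mapping plus the test-instance flag described below. The mapping and signal names here are hypothetical; the actual PEGASUS JSON signal list is defined by the project.

```python
import json

# Hypothetical mapping from a proprietary bus naming to the common format.
SIGNAL_MAP = {"VehSpd": "ego_velocity", "LatPos": "ego_lateral_position"}

def convert(measurement: dict, test_instance: str) -> str:
    """Rename the measured signals and attach a flag naming the
    originating test instance (simulation / proving ground / real drive)."""
    record = {SIGNAL_MAP.get(k, k): v for k, v in measurement.items()}
    record["test_instance"] = test_instance
    return json.dumps(record)

print(convert({"VehSpd": [27.0, 27.1], "LatPos": [0.1, 0.1]}, "simulation"))
```

Because every test instance emits the same output format, database scripts and metrics can run unchanged on simulation, proving-ground and real-drive data.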

Due to the standardized interface, the test data can easily be uploaded into the PEGASUS database, including a flag signaling the originating test instance. This flag must be set when uploading simulation or proving-ground data in order not to distort the calculated distributions: the calculated database parameter distributions must reflect reality, while simulation searches for critical scenes and proving-ground data test critical scenes, which would artificially increase the probability of critical situations. By contrast, real-world drive tests should always be uploaded into the scenario database to improve the calculated parameter and scenario distributions.

16. Evaluation and Classification


Within the processing step test result evaluation, all or part of the test data are categorized and evaluated for different purposes. The results are summarized and, depending on the purpose, possibly passed back to one test instance for another iterative assessment step. If the results fulfill certain criteria, the iterative assessment process terminates and the results are passed to the next processing step, the overall risk assessment.

The iterative assessment process has two main purposes:

  • iterative assessment within the simulation in order to find critical scenario subspaces
  • passing of concrete scenario test results to another test instance for cross-verification

First, there will be an evaluation of the single test results of each concrete scenario test, originating either from the proving ground or from the simulation. Within the iterative assessment process of the simulation (see step 12), there will be groups of test results within a logical scenario space that are evaluated with the goal of finding critical scenario subspaces. If such a subspace cannot be described sufficiently, the results are passed back to the test automation, including the stochastic variation, for further assessment. Another purpose of passing back test results is cross-verification, e.g. if a concrete scenario test result from the simulation must be verified in a real but clinical testing environment.

 

17. Test Results

coming soon

18. Risk Assessment


The PEGASUS Behavioral Safety Assessment (BSA) focuses on the assessment of the HAD-F in individual test cases. For each individual test case, different metrics are applied to confirm the HAD-F's compliance with pre-defined behavioral criteria. In the specific PEGASUS context, these pre-defined criteria are (a) keeping appropriate safety distances, (b) not causing collisions and (c), if possible, mitigating collisions, in line with the test concept described in step 12. During the BSA, it is evaluated for each of these criteria whether the HAD-F complied with it or not. Based on the result for each criterion, a method is proposed to decide whether a single test case is passed or failed.

Furthermore, the BSA discusses the necessary information needed to extrapolate the result of an individual test case to its semi-concrete scenario and its logical scenario as well as to the overall operational design domain, to derive more expressive results.

Multi-Stage Behavioral Safety Assessment
The first part of the BSA focuses on the assessment of individual test cases. It is assumed that the test cases from the different focus areas (e.g. crash analysis, automation risks, FOT data, and simulation) are provided by D18 in a unified format, as described above. Figure 1 shows the different stages of the BSA. The first stage assesses whether the HAD-F complies with the required safety distances, based on metrics like time-to-collision or the RSS model [S. Shalev-Shwartz, S. Shammah, and A. Shashua, “On a formal model of safe and scalable self-driving cars,” arXiv:1708.06374, 2017].

If a safety distance is violated, stage 1 is failed; otherwise it is passed. Note that at this point it is not distinguished whether the HAD-F or another traffic participant causes the violation. Stage 1 may be obsolete depending on the test concept, which is highlighted by the dashed line.
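The RSS minimum longitudinal gap used in such a stage-1 check can be sketched from the cited paper. The parameter values (response time, acceleration and braking bounds) are placeholders, and the helper names are illustrative.

```python
def rss_min_gap(v_rear: float, v_front: float, rho: float = 1.0,
                a_max: float = 3.0, b_min: float = 4.0, b_max: float = 8.0) -> float:
    """RSS minimum longitudinal gap (Shalev-Shwartz et al., arXiv:1708.06374):
    the rear car accelerates with up to a_max during the response time rho and
    then brakes with at least b_min, while the front car may brake with up to
    b_max. Negative gaps are clipped to zero."""
    d = (v_rear * rho + 0.5 * a_max * rho ** 2
         + (v_rear + rho * a_max) ** 2 / (2 * b_min)
         - v_front ** 2 / (2 * b_max))
    return max(d, 0.0)

def stage1_passed(gap: float, v_rear: float, v_front: float) -> bool:
    """Stage-1 style check on a single time step (illustrative helper)."""
    return gap >= rss_min_gap(v_rear, v_front)

print(rss_min_gap(v_rear=20.0, v_front=20.0))
```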

While stage 2 performs the actual collision check, stage 3 performs the causality assessment. Stage 3 is of great importance, since up to this point it cannot be determined whether the HAD-F or someone else caused possible fails in stage 1 or stage 2. Therefore, it is first evaluated whether the situation would have been controllable for the HAD-F. In PEGASUS, this controllability evaluation is limited to assessing the physical limits of driving for collision avoidance. If the HAD-F would not have been able to control the situation leading to the collision, an additional examination is performed to determine whether the HAD-F caused the collision.

While stage 3 is important in the PEGASUS project, it may be difficult to assess causality automatically for all relevant scenarios. While it may be possible to estimate the driving limits with a certain uncertainty, estimating who caused the accident may be challenging, since generic rules for all possible scenarios would have to be derived. However, the RSS model cited above already contains a causality model which covers a significant number of situations, and more progress in this field can be expected in the future. Until then, the judgment of an expert group could be used.

Finally, stage 4 determines whether the HAD-F appropriately mitigated the collision (e.g. by applying an appropriate braking force). Here it is irrelevant whether the HAD-F caused the collision or not, because the PEGASUS project assumes that the HAD-F should try to mitigate a collision in any case if it is physically possible.

After evaluating all stages, it is determined whether the test case is passed or failed. The approach taken in PEGASUS is depicted in Figure 2. The first row is an overall FAIL because the HAD-F does not keep the safety distance, causes a collision and does not mitigate appropriately. The second row shows a case where the HAD-F does not comply with the safety distances but has no collision. In this case, the overall result may only be a qualified PASS, since even without a collision the HAD-F's behavior is critical. The result in the third row differs depending on whether the HAD-F caused the collision (stage 3: 0) or not (stage 3: 1), which clarifies the importance of the causality stage.
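One possible reading of this decision logic is sketched below; the authoritative decision table is Figure 2, and the stage flags are booleans where True means the stage was passed.

```python
def overall_result(stage1: bool, stage2: bool, stage3: bool, stage4: bool) -> str:
    """Illustrative combination of the four stage results:
    stage1 = safety distances kept, stage2 = no collision,
    stage3 = collision not caused by the HAD-F, stage4 = mitigated."""
    if not stage2:                  # a collision occurred
        if stage3:                  # HAD-F did not cause it ...
            return "PASS" if stage4 else "FAIL"  # ... but must still mitigate
        return "FAIL"               # collision caused by the HAD-F
    if not stage1:                  # no collision, but distance violated
        return "qualified PASS"
    return "PASS"
```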

Extrapolation of individual test-case results
A central question of the behavioral safety assessment is which conclusions can be drawn from the individual test result of Figure 2. The most fundamental conclusion is whether a test case is passed or not and, if not, for what reason. This result can, for example, be used to compare two HAD-F releases if the same arbitrarily selected test cases are used during the BSA.
More expressive conclusions can be drawn if additional information on the context of the test case is available, as shown in Table 1.

Table 1: Example for scaling the result of a single test case to its logical scenario and to an average yearly driving distance. Bottom row: basic result. Other rows: specification of the additional inputs required to derive more expressive results.

One possibility is to select the test cases by performing an equidistant sampling of the parameter space of a logical scenario (see Table 1, second row from the bottom). In this case, the importance π_i^R of a test case is defined by the total number of selected test cases N_R as π_i^R = 1/N_R. The result is the ratio by which the logical scenario under test is solved. Further, Table 1 shows possibilities to draw conclusions on the HAD-F behavior over an average yearly driving distance. However, while such extrapolations of the individual test case results are theoretically possible, the additionally required input parameters may only be available after extensive real-world testing.
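Under the equidistant-sampling assumption, this extrapolation reduces to a weighted pass rate, as the following minimal sketch shows (function and variable names are invented):

```python
def solved_ratio(results: list) -> float:
    """Equidistant sampling of the logical scenario's parameter space:
    each of the N_R test cases carries the importance pi_i^R = 1/N_R,
    so the ratio of the logical scenario that is solved is the sum of
    the weights of the passed test cases."""
    n_r = len(results)
    weight = 1.0 / n_r  # pi_i^R for every test case
    return sum(weight for passed in results if passed)

print(solved_ratio([True, True, False, True]))
```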

19. Contribution to Safety Statement


This step contains the results of the behavioral safety assessment as described in step 18, as well as information that allows identifying its logical scenario. The content of this data container is the behavioral safety assessment of step 18 and a data file which includes the ID of the test case and the ID of its logical scenario, the results of the multiple stages of step 18 and, if available, the extrapolated results as described in Table 1 of step 18.

 

20. Contribution to Safety Argumentation


The PEGASUS Safety Argumentation is to be understood as a conceptual framework to support the safeguarding and approval of higher levels of automation through structure, formalization, coherence, integrity and relevance. It is structured into five layers. Established formalizations are used wherever possible to describe each layer's elements. Those elements are linked across the layers in order to form a coherent argumentation. It is also suggested to evaluate each element's integrity in order to establish a reliable safety argumentation, and assessing each element's relevance is proposed as a useful measure as well. This framework is a proposal for integrating research on the topic of technology acceptance, existing guidelines or standards to be taken into account when bringing a HAD-F to market, and the logical structure that builds a safety case. The central assumption of the PEGASUS Safety Argumentation is: if a chain of arguments, created taking into account the proposed framework of the PEGASUS Safety Argumentation, stands up to critical examination, this will support the safeguarding and approval of higher levels of automation.

5 Questions, 5 Paradigms, 5 Layers

It is assumed that a safety argumentation for HAD-F needs to address more or less the following five questions:

  1. How to ensure the necessary compliance with all relevant standards and directives needed for homologation?
  2. Which elements must a safety argumentation contain to be sufficient?
  3. How to support a safety argumentation with evidence?
  4. How to monitor a HAD-F over its entire life cycle?
  5. What do people expect from highly automated mobility in the larger context?

The first question is addressed in large parts by established quality management processes. A product which complies with relevant guidelines or standards shall be presumed to comply with product safety requirements insofar as they are covered by those guidelines or standards.

The question of which design principles should be taken into account during the development of a HAD-F is currently still answered differently. There is room for standardization here to develop a unified view of which design principles are relevant and how they are defined. Provided that those design principles have a positive effect on the safety of a HAD-F, applying state-of-the-art design principles seems to be a key prerequisite for achieving sufficient conditions for the safeguarding and approval of higher levels of automation. This can be seen as a feasible first step in operationalizing general objectives, as recommended for example by the German ethics commission: if a generally positive risk balance in comparison to human driving performance can be assumed, technically unavoidable residual risks do not stand in the way of a market launch (Ethics Commission Automated and Connected Driving appointed by the German Federal Minister of Transport and Digital Infrastructure, 2019).

Coming up with answers to question number three is one of the main drivers of the PEGASUS project, as described in the previous sections of this document. The methods and tools developed within PEGASUS all aim to provide evidence for a safety argumentation. For safeguarding a HAD-F, it is crucial to link the above-mentioned design principles and the derived safety goals with the results achieved by applying actions, methods, and tools like those developed by the members of the PEGASUS project. Only if there is a logical connection between the design principles and the results obtained by applying actions, methods and tools can these results be considered as evidence in the sense of a safety argumentation.

Question number four refers to the systematic monitoring of the market launch and throughout the product life cycle. It is assumed that the product needs to be monitored during its life cycle with particular attention to deficits and unexpected behavior. As it is not possible to provide a statistically valid proof of a positive risk balance for a HAD-F before market launch, particular attention should be paid here (Philipp Junietz, 2019). At this point there is a need for further clarification regarding a uniform approach.

The fifth question seeks to embed a safety argumentation in the broader context of technology acceptance. The assumption is that a safe product is necessary for the acceptance of a particular technology, but it is very likely that this is not the whole story. Answering this question goes beyond safeguarding a HAD-F. Rather, it is about understanding people's needs in relation to HAD-F and drawing the right conclusions for the development of highly automated mobility.
As a conceptual framework, the PEGASUS safety argumentation addresses the previously mentioned questions methodically: in a nutshell, it proposes five layers upon which elements are located. Layer 0 embeds a safety argumentation in the broader context of technology acceptance. Within layer 1, top-level safety goals are defined which need to be achieved. Layer 2 links elements across the layers to build a safety case. Within layer 3, actions, methods and tools are developed and implemented in order to achieve results. Those results become evidence (layer 4) when they can be traced back to a safety goal within layer 1.

Therefore, the overarching goal is to create a relevant, coherent and thereby verifiable safety argumentation with integrity, in a structured and formalized manner. The activities here are guided by the five paradigms mentioned: structuring, formalization, coherence, integrity and relevance.

Structuring here means organizing the PEGASUS safety argumentation into five layers. This structure attempts to bring more clarity into the continuing discussions regarding the safeguarding and approval of higher levels of automation. A differentiation must be made between two kinds of layers: layers outside the focus of PEGASUS, which can be placed within the proposed framework in order to put PEGASUS into a wider context (layer 0), and layers whose elements directly contribute to answering the question of how to verify safety and reliability (layers 1–4).

Formalization within the PEGASUS safety argumentation means making the individual elements of a layer explicit by means of a defined, standardized and ideally established notation. The choice of the notation is dependent on the layer and the contained elements to be formalized.

Coherence is defined as the linking of individual elements within one of the postulated five layers of the PEGASUS safety argumentation as well as between layers. Only once elements are reasonably linked across layers can one show how safety and reliability can be verified and how an argumentation can be made for socially accepted, highly automated mobility in the larger context.

Integrity in the sense of the PEGASUS safety argumentation is intended as a quality criterion for the individual elements of the five layers. It is oriented around the quality criteria of empirical research. Different levels of integrity are conceivable in principle for the different layers and their elements.

Relevance may also be seen as a quality criterion but, unlike integrity, may not be applicable to all five layers. Relevance is especially linked to the elements of layer 2. It is intended to show the maturity level of each element. For example, a peer-reviewed and published method, which is applied for the purpose of generating evidence, is ranked higher on an ordinal scale than an unpublished method. Integrity does not necessarily have to correlate with relevance; however, it can be expected that an element with high relevance also has high integrity.

Layer 0 – Acceptance Model

Layer 0 embeds PEGASUS in a larger context. The specifics are not the focus of PEGASUS. The key element of this layer is a scientific model describing the dependence of individual or social acceptance of HAD-F on several factors. A key premise here is that individual or social acceptance cannot be explained by a single cause. This premise is in line with established models of individual technology acceptance such as the Technology Acceptance Model (TAM) (Davis, 1985), (Davis, 1989), the Theory of Planned Behavior (TPB) (Ajzen, 1991), and the Unified Theory of Acceptance and Use of Technology (UTAUT) (Venkatesh, et al., 2003). Depending on the model, different factors are postulated. Rahman et al. examined the utility of the models mentioned for describing the acceptance of Advanced Driver Assistance Systems (ADAS) and give an overview of studies which build upon these models (Rahman, et al., 2017). They come to the conclusion that in principle all three models can be used in the context of acceptance of ADAS. However, further work is necessary to adapt the theories and models to the domain, to develop them further and to explain a greater proportion of the variance (cf. (Madigan, et al., 2017), (Haboucha, et al., 2017)).

Layer 1 – Top Level Safety Goals

Currently there are various suggestions on how to address this topic: OEMs, suppliers, associations, research institutes as well as national and international regulators and legislators all have their perspectives on which points need to be taken into account in order to fulfill the promise of enhanced road safety through the introduction of HAD-F. Depending on the author, these are called design principles, (priority) safety design elements, guidelines or (ethical) rules, for example. They all share the pursuit of automated vehicle technology safety. Merging those approaches to establish a unified and complete view seems reasonable. To the authors' knowledge, this is the subject of ongoing discussions. Within the scope of the PEGASUS safety argumentation, 13 top-level safety goals are used exemplarily: one directly derived from the report of the German Ethics Commission on Automated and Connected Driving (G_01) as well as the 12 priority design elements proposed by NHTSA (NHTSA, 2017) (G_02 – G_13).

It is argued here that the fulfillment of G_02 to G_13 positively influences the achievement of the more general requirement associated with G_01. For the scope of PEGASUS, G_05 “Process and procedure for testing and validation of HAD-F functionality with the prescribed ODD is documented and considered” is of particular interest, as it is in line with the overall project goals.

Layer 2 – Logical Structure

The logical structure of the safety argument as defined by the PEGASUS safety argumentation is all about deriving, from higher-level goals, the steps necessary to establish a verification of safety and reliability. The logical structure of the argumentation is made explicit in order to answer the following question: why is a certain result, once achieved, relevant for the verification of safety and reliability? In principle, various formalizations are applicable here: examples are ISO 15026-2 (Anon., n.d.) and the Goal Structuring Notation (GSN) (Kelly & Weaver, 2004) (Origin Consulting (York) Limited on behalf of the Contributors, 2011), which is also used in the example given later. Both approaches formulate action-guiding goals, which to a certain extent serve as a monitor of success in the argumentation chain. By means of a standardized graphical notation, safety goals are defined as elements of the second layer, together with strategies and solutions for achieving these safety goals. Further information regarding justifications, assumptions, and context can also be attached to these elements.

As mentioned earlier, it is proposed to add a relevancy ranking to the individual elements of the logical structure of the safety argument.
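The GSN-style structure described above — goals supported by strategies, strategies supported by solutions, each element optionally carrying a relevancy ranking — can be illustrated as a small data structure. The following Python sketch is purely illustrative: the class, node names, fields, and the example goal text are assumptions for demonstration, not part of any PEGASUS tooling.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative sketch of GSN core elements (goal, strategy, solution)
# as a simple supported-by tree; names and fields are hypothetical.
@dataclass
class GsnNode:
    id: str                          # e.g. "G_05" or "S_02"
    kind: str                        # "goal", "strategy", or "solution"
    text: str                        # statement of the element
    relevance: Optional[str] = None  # proposed relevancy ranking, e.g. "B"
    children: List["GsnNode"] = field(default_factory=list)

    def add(self, child: "GsnNode") -> "GsnNode":
        """Attach a supporting element and return it for chaining."""
        self.children.append(child)
        return child

def solutions(node: GsnNode) -> List[str]:
    """Collect the solution ids that ultimately underpin a goal."""
    if node.kind == "solution":
        return [node.id]
    return [sid for c in node.children for sid in solutions(c)]

# Tiny argument fragment: goal -> strategy -> solution.
g05 = GsnNode("G_05", "goal",
              "Testing and validation process for HAD-F within the "
              "prescribed ODD is documented and considered")
s02 = g05.add(GsnNode("S_02", "strategy",
                      "Argue over the product life-cycle phases"))
s02.add(GsnNode("Sn_01", "solution", "Documented test process"))

print(solutions(g05))  # ['Sn_01']
```

Walking the tree from a goal down to its solutions mirrors the question the logical structure is meant to answer: which concrete results ultimately support which safety goal.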

Layer 3 – Methods and Tools

The leading question for layer 3 of the PEGASUS safety argumentation is: how can one prove that the specific safety goals of layer 2 have been achieved? To do so, one first needs results. As part of the overall PEGASUS method, a continuous and flexible tool chain has been described in the previous chapters of this document. All these actions, methods, and tools can be linked to specific safety goals. This will later be depicted in an example.

Layer 4 – Evidence

Results are generated by applying the actions, methods, and tools located in layer 3. The challenge now is to embed these results and evaluate their contribution to achieving a specific safety goal. Only through this step does a result become evidence as intended by the PEGASUS safety argumentation. The leading question for layer 4 is therefore: can a result be considered as evidence for achieving a specific safety goal? The key difference between the terms “result” and “evidence” in the sense of the PEGASUS safety argumentation is therefore whether they can be assigned to specific safety goals.

The methods and tools described in layer 3 can differ as widely as the specific safety goals to be achieved. This can lead to heterogeneous results and to heterogeneous formats in which evidence may exist. A uniform formalization therefore cannot reasonably be proposed for all results in layer 4.
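The distinction between a mere result and evidence can be made concrete as a traceability mapping. The sketch below is an illustrative assumption (the result ids, descriptions, and the `links` table are invented); it merely encodes the rule stated above: a result counts as evidence only once it is assigned to a specific safety goal.

```python
# Hypothetical results produced by layer-3 methods and tools.
results = [
    {"id": "R_01", "desc": "Simulation campaign report"},
    {"id": "R_02", "desc": "Proving-ground measurement log"},
    {"id": "R_03", "desc": "Tool output without goal assignment"},
]

# Hypothetical traceability table: result id -> supported safety goal.
links = {"R_01": "G_15", "R_02": "G_15"}

def partition_evidence(results, links):
    """Split results into evidence (goal-linked) and unlinked results."""
    evidence, unlinked = [], []
    for r in results:
        goal = links.get(r["id"])
        if goal is not None:
            evidence.append((r["id"], goal))
        else:
            unlinked.append(r["id"])
    return evidence, unlinked

evidence, unlinked = partition_evidence(results, links)
print(evidence)  # [('R_01', 'G_15'), ('R_02', 'G_15')]
print(unlinked)  # ['R_03'] -- remains a result, not evidence
```

R_03 stays a "result" in the PEGASUS sense: without a link to a safety goal it cannot contribute to the argumentation chain, regardless of its own quality.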

This brings us to a key element of the PEGASUS safety argumentation: the critical testing of the argumentation chains. If the argumentation chains that connect the individual elements of the layers can withstand critical evaluation, an approval recommendation can be made.

Example

Following the above introduction of the PEGASUS safety argumentation, a brief demonstration of how it could be applied now follows. This example is given in graphic form and draws on the formalizations proposed above. Note that this example does not claim to show a complete or sufficiently detailed argumentation chain. Instead, it is intended to illustrate how the layers, described above in theory, could be applied in practice.

Layer 0 shows a model based on UTAUT, slightly altered and depicted without moderating effects for the sake of simplicity. The ellipses used to visualize the latent variables should not be confused with GSN elements (the formalization here draws on Structural Equation Modelling (SEM), an approach from multivariate statistics). Relevance could in principle be rated B for research on technology acceptance, as there is a large body of research on this topic. Nevertheless, the particular model depicted here lacks validity (integrity), mainly for two reasons: it was altered by the author, and it is designed for individual technology acceptance, not social acceptance.

In Layer 1 the top-level safety goals G_01–G_13 are addressed. Strategy S_01 reflects the argument that G_02–G_13 support the achievement of the overarching goal G_01. This example focuses on G_05.

Layer 2 mainly addresses the operationalization of G_05. S_02 aims to reflect the fourth leading question presented earlier: how to monitor a HAD-F over its entire life cycle? G_14 (early research and development), G_15 (product development), and G_16 (after market) aim to address processes and procedures for testing and validation of HAD-F functionality with the prescribed ODD. They differ in the phase of the product life cycle they address, which expresses that different processes and procedures are needed depending on the life-cycle phase. This example focuses on G_15: Process and procedure for testing and validation of HAD-F functionality with the prescribed ODD is documented and considered to bring HAD-F to market.

Strategy S_03 expresses the overall project goal of PEGASUS: to provide a solution for safeguarding HAD-F for market introduction. At this point it becomes clear where PEGASUS can support the securing and approval of higher levels of automation.

The overall PEGASUS method is reflected by G_17–G_20. These correspond to the four main clusters of the PEGASUS method, namely Data Processing (G_17), Requirements Definition & Conversion for Database (G_18), Scenario Compilation and Database (G_19), and Assessment of Highly Automated Driving Function (incl. Human) (G_20). As stated earlier, methods developed within these clusters can be traced back to the fulfillment of goal G_05 and therefore count as evidence for this particular safety goal.
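The decomposition described in this example — G_17–G_20 supporting G_15, which in turn supports G_05 and ultimately G_01 — can be read as a simple supported-by chain. The following sketch only encodes the goal ids named in this section; the function name and dictionary representation are illustrative assumptions, not a PEGASUS artifact.

```python
# Supported-by relations as described in the example:
# the four method-cluster goals support G_15 (product development),
# which supports G_05, which supports the overarching goal G_01.
parents = {
    "G_17": "G_15", "G_18": "G_15", "G_19": "G_15", "G_20": "G_15",
    "G_15": "G_05",
    "G_05": "G_01",
}

def trace_to_top(goal: str) -> list:
    """Walk the supported-by chain from a goal up to the top-level goal."""
    chain = [goal]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain

print(trace_to_top("G_20"))  # ['G_20', 'G_15', 'G_05', 'G_01']
```

Such a trace makes explicit why a result attached to one of the cluster goals can count as evidence for G_05: the chain from, say, G_20 terminates at the top-level goals via G_15 and G_05.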

The example focuses on G_20, in particular the composition of a safety statement as reflected by G_21, which was explained in detail in one of the previous sections. S_05 expresses the basic assumption of that approach: decomposing the safety statement into four individual ratings (stages). Integrity could be rated high for Level 2, as the proposed approach offers traceability. Relevance could be rated C-B.

Applying this particular method as described in step 18 is located in layer 3. As the proposed method is not yet fully implemented as an algorithm, it lacks reproducibility and reliability, which impacts integrity. Relevance could be rated C-B depending on the stage. Results obtained by applying this method count as evidence, as they are linked to the overall safety goal, and are located in layer 4.