
WO2023161993A1 - Training data generation program, training data generation method, and information processing device - Google Patents

Training data generation program, training data generation method, and information processing device

Info

Publication number
WO2023161993A1
Authority
WO
WIPO (PCT)
Prior art keywords
attribute
value
data
values
combination
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2022/007230
Other languages
English (en)
Japanese (ja)
Inventor
Keisuke Goto (後藤 啓介)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to PCT/JP2022/007230 priority Critical patent/WO2023161993A1/fr
Publication of WO2023161993A1 publication Critical patent/WO2023161993A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Definitions

  • the present invention relates to a training data generation program, a training data generation method, and an information processing device.
  • For example, machine learning may be used to determine whether an applicant for a job should be hired (whether the applicant is suitable for hiring), according to the applicant's attributes, based on training data about companies' past hiring results.
  • the training data includes, for example, an attribute of the applicant such as gender, and a class label indicating whether or not the applicant was determined to be employable.
  • Suppose that, in the hiring results, the rate of being hired was significantly higher for males than for non-males. Such training data has a bias such that, if the gender is male, the probability of being hired is very high.
  • If machine learning is performed using such training data, a learning result (machine learning model) that reflects the bias is obtained. For example, a machine learning model is generated that judges a male applicant to be employable with a higher probability than other applicants. However, gender is not an indicator of job performance, so using such a machine learning model leads to gender-unfair results and is not appropriate.
  • For example, a learning data generation device has been proposed that suppresses bias in the attributes of learning data.
  • an information processing apparatus has been proposed that reduces the influence of the bias of information that is the target of inference in learning data during machine learning of an inference device, and also reduces the influence of the bias of other information related to the information.
  • Systems have also been proposed to detect and mitigate bias in intelligent virtual assistants. Devices have also been proposed that can prevent unethical behavior related to training data.
  • When bias correction processing is applied to training data, it is desirable to be able to clearly explain the reason for the bias correction.
  • the ease of explaining the reason for bias correction will be referred to as the interpretability of bias correction.
  • If the training data contains various biases, removing all of them results in a wide variety of changes to the training data. Such a wide variety of changes impairs the interpretability of the bias correction as a whole.
  • In one aspect, the present application aims to prevent deterioration of the interpretability of bias correction for training data.
  • In one aspect, a training data generation program causes a computer to execute the following processing. Based on the number of data corresponding to each combination of attribute values among a plurality of data each having a plurality of attributes, the computer identifies first combinations of one or more attribute values having a data bias equal to or greater than a threshold. The computer selects a specific attribute value according to the number of times each attribute value is included in the first combinations of one or more attribute values. Then, the computer generates training data by changing the attribute values of one or more of the plurality of data, in accordance with the condition that the data bias is less than the threshold for second combinations of one or more attribute values that include the specific attribute value among the first combinations of one or more attribute values.
  • FIG. 1 is a diagram showing an example of a training data generation method according to the first embodiment.
  • FIG. 2 is a diagram showing an example of computer hardware used in the second embodiment.
  • FIG. 3 is a block diagram showing functions that the computer has for machine learning.
  • FIG. 4 is a diagram showing an example of input data.
  • FIG. 5 is a diagram showing an example of generation of change rules.
  • FIG. 6 is a diagram showing an example of bias correction.
  • FIG. 7 is a diagram showing an example of an input data change that impairs interpretability.
  • FIG. 8 is a diagram showing an example of an input data change that reduces the bias correction effect.
  • FIG. 9 is a diagram showing an example of the procedure of bias correction processing.
  • FIG. 10 is a flowchart showing an example of the procedure of change rule generation processing.
  • FIG. 11 is a diagram showing an example of a change rule list.
  • FIG. 12 is a flowchart showing an example of the procedure of sensitive attribute selection processing.
  • FIG. 13 is a diagram showing an example of an applied change rule list.
  • FIG. 14 is a flowchart showing an example of the procedure of sensitive attribute change processing.
  • FIG. 15 is a diagram showing an example of generated training data.
  • FIG. 16 is a diagram showing an example of an interpretability management table.
  • FIG. 17 is a flowchart showing an example of the procedure of sensitive attribute selection processing according to importance.
  • FIG. 18 is a diagram showing an example of sensitive attribute selection according to importance.
  • the first embodiment is a training data generation method capable of suppressing deterioration in interpretability of bias correction for training data in machine learning.
  • FIG. 1 is a diagram showing an example of a training data generation method according to the first embodiment.
  • FIG. 1 shows an information processing device 10 for implementing the training data generation method.
  • the information processing device 10 can implement the training data generation method by executing, for example, a training data generation program.
  • the information processing device 10 has a storage unit 11 and a processing unit 12 .
  • the storage unit 11 is, for example, a memory or a storage device that the information processing device 10 has.
  • the processing unit 12 is, for example, a processor or an arithmetic circuit included in the information processing device 10 .
  • the storage unit 11 stores the input data 1.
  • Input data 1 is data used for supervised learning in machine learning.
  • Input data 1 includes a plurality of data (records) each having a plurality of attributes.
  • each of the plurality of data is given a label (C) value indicating the correct answer when learning the object (person, animal, object, phenomenon, etc.) represented by the data.
  • A flag of "1" or "0" indicates whether the value of each attribute is a predetermined value.
  • A flag "1" indicates that the value of the attribute is the predetermined value (for example, "male" for the attribute "gender").
  • A flag "0" indicates that the value of the attribute is other than the predetermined value (for example, "female" for the attribute "gender").
  • Input data 1 may contain bias.
  • If a difference in the value of a first attribute causes an excessive difference in the appearance frequency of the assigned label values (for example, in the ratio of the value "1"), there is a bias with respect to that value of the first attribute.
  • The processing unit 12 generates training data 2 in which the bias contained in the input data 1 is corrected. For example, based on the number of data corresponding to each combination of attribute values among the plurality of data included in the input data 1, the processing unit 12 identifies first combinations of one or more attribute values having a data bias equal to or greater than a threshold α.
  • a combination of values of the first one or more attributes is, for example, a combination of a value of the first attribute and a value of the second attribute.
  • the processing unit 12 calculates an index (bias value) indicating data bias for each combination of attribute values.
  • If the calculated bias value is equal to or greater than the threshold α, the processing unit 12 determines that the combination of attribute values is biased.
  • Next, the processing unit 12 selects a specific attribute value according to the number of times each attribute value is included in the identified first combinations of one or more attribute values. Some of the attribute values included in the first combinations are selected as the specific attribute value.
  • the processing unit 12 generates training data 2 according to predetermined conditions.
  • The predetermined condition is that the data bias is less than the threshold α with respect to the second combinations of one or more attribute values, among the first combinations, that include the specific attribute value.
  • the processing unit 12 generates training data 2 that satisfies a predetermined condition by changing attribute values of one or more data out of the plurality of data.
  • In the generated training data 2, the data bias is less than the threshold α for the second combinations of one or more attribute values that include the specific attribute value. That is, the biases of the combinations containing the specific attribute value are corrected.
  • The processing unit 12 selects the specific attribute value from among the attribute values included in the first combinations. Accordingly, it is possible to prevent an unfair inference result from being output by a model obtained through machine learning using the generated training data 2.
  • In generating the training data 2, the processing unit 12 generates, for example, a change rule corresponding to each second combination of one or more attribute values.
  • Each change rule specifies, as a change target, data in which the attribute corresponding to the specific attribute value is set to a value different from the specific attribute value, the attribute values indicated in the second combination are set, and a predetermined label is assigned.
  • The change rule further specifies that, in the data to be changed, the value of the attribute corresponding to the specific attribute value should be changed to the specific attribute value.
  • For example, the processing unit 12 selects, as the specific attribute value, at least one attribute value in descending order of the number of first combinations in which it is included. This allows more biases to be corrected by changing the value of a single attribute; that is, a high bias correction effect can be obtained by changing the value of only one type of attribute.
  • The processing unit 12 may also select the specific attribute value in consideration of the interpretability of each attribute. For example, for each attribute value included in the first combinations, the processing unit 12 calculates an importance obtained by weighting the number of first combinations that contain the attribute value. Then, the processing unit 12 selects the specific attribute value according to the importance of each attribute value; for example, it selects at least one attribute value in descending order of importance. As a result, biases related to attributes with higher interpretability are more likely to be corrected, and the decrease in interpretability caused by correcting biases is suppressed.
  • In the second embodiment, bias correction processing is performed as preprocessing of the training data used for machine learning.
  • FIG. 2 is a diagram showing an example of computer hardware used in the second embodiment.
  • a computer 100 is entirely controlled by a processor 101 .
  • a memory 102 and a plurality of peripheral devices are connected to the processor 101 via a bus 109 .
  • Processor 101 may be a multiprocessor.
  • the processor 101 is, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a DSP (Digital Signal Processor).
  • At least part of the functions realized by the processor 101 executing a program may instead be realized by an electronic circuit such as an ASIC (Application Specific Integrated Circuit) or a PLD (Programmable Logic Device).
  • the memory 102 is used as the main storage device of the computer 100.
  • the memory 102 temporarily stores at least part of an OS (Operating System) program and application programs to be executed by the processor 101 .
  • the memory 102 stores various data used for processing by the processor 101 .
  • As the memory 102, a volatile semiconductor memory device such as a RAM (Random Access Memory) is used.
  • Peripheral devices connected to the bus 109 include a storage device 103 , a GPU (Graphics Processing Unit) 104 , an input interface 105 , an optical drive device 106 , a device connection interface 107 and a network interface 108 .
  • the storage device 103 electrically or magnetically writes data to and reads data from a built-in recording medium.
  • a storage device 103 is used as an auxiliary storage device for the computer 100 .
  • the storage device 103 stores an OS program, application programs, and various data.
  • As the storage device 103, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive) can be used.
  • the GPU 104 is an arithmetic unit that performs image processing, and is also called a graphics controller.
  • a monitor 21 is connected to the GPU 104 .
  • the GPU 104 displays an image on the screen of the monitor 21 according to instructions from the processor 101 .
  • Examples of the monitor 21 include a display device using an organic EL (Electro Luminescence), a liquid crystal display device, and the like.
  • a keyboard 22 and a mouse 23 are connected to the input interface 105 .
  • the input interface 105 transmits signals sent from the keyboard 22 and mouse 23 to the processor 101 .
  • the mouse 23 is an example of a pointing device, and other pointing devices can also be used.
  • Other pointing devices include touch panels, tablets, touchpads, trackballs, and the like.
  • the optical drive device 106 reads data recorded on the optical disc 24 or writes data to the optical disc 24 using laser light or the like.
  • the optical disc 24 is a portable recording medium on which data is recorded so as to be readable by light reflection.
  • the optical disc 24 includes DVD (Digital Versatile Disc), DVD-RAM, CD-ROM (Compact Disc Read Only Memory), CD-R (Recordable)/RW (ReWritable), and the like.
  • the device connection interface 107 is a communication interface for connecting peripheral devices to the computer 100 .
  • the device connection interface 107 can be connected to the memory device 25 and the memory reader/writer 26 .
  • the memory device 25 is a recording medium equipped with a communication function with the device connection interface 107 .
  • the memory reader/writer 26 is a device that writes data to the memory card 27 or reads data from the memory card 27 .
  • the memory card 27 is a card-type recording medium.
  • the network interface 108 is connected to the network 20.
  • Network interface 108 transmits and receives data to and from other computers or communication devices via network 20 .
  • the network interface 108 is a wired communication interface that is connected by a cable to a wired communication device such as a switch or router.
  • the network interface 108 may be a wireless communication interface that communicates with a wireless communication device such as a base station or an access point via radio waves.
  • the computer 100 can implement the processing functions of the second embodiment with the above hardware.
  • the information processing apparatus 10 shown in the first embodiment can also be realized by hardware similar to the computer 100 shown in FIG.
  • the computer 100 implements the processing functions of the second embodiment, for example, by executing a program recorded on a computer-readable recording medium.
  • a program describing the processing content to be executed by the computer 100 can be recorded in various recording media.
  • a program to be executed by the computer 100 can be stored in the storage device 103 .
  • the processor 101 loads at least part of the program in the storage device 103 into the memory 102 and executes the program.
  • the program to be executed by the computer 100 can also be recorded in a portable recording medium such as the optical disc 24, memory device 25, memory card 27, or the like.
  • a program stored in a portable recording medium can be executed after being installed in the storage device 103 under the control of the processor 101, for example.
  • the processor 101 can read and execute the program directly from the portable recording medium.
  • FIG. 3 is a block diagram showing functions that a computer has for machine learning.
  • Computer 100 has storage unit 110 , change rule generation unit 120 , sensitive attribute selection unit 130 , sensitive attribute change unit 140 and machine learning unit 150 .
  • the storage unit 110 stores input data 111 and training data 112 .
  • the input data 111 is a data set prepared for supervised learning.
  • Input data 111 includes a plurality of data. Each piece of data is set with values for sensitive attributes, non-sensitive attributes, and class labels.
  • a sensitive attribute is an example of the first attribute in the first embodiment.
  • a non-sensitive attribute is an example of the second attribute in the first embodiment.
  • Input data 111 may contain multiple kinds of biases.
  • the training data 112 is data obtained by removing some bias from the input data 111 . Training data 112 is the input to the learning phase of machine learning.
  • the change rule generation unit 120 generates bias change rules based on the input data 111 . For example, the change rule generation unit 120 calculates a numerical value that serves as an index for bias determination using a predetermined formula for the sensitive attribute included in the input data 111, and if the calculated numerical value is equal to or greater than a predetermined value, the sensitive attribute value Generate change rules that change the .
  • the sensitive attribute selection unit 130 selects a sensitive attribute to be changed from among the generated change rules. For example, the sensitive attribute selection unit 130 selects the sensitive attribute included in the largest number of change rules.
  • the sensitive attribute changing unit 140 changes the value of the selected sensitive attribute so that the bias is corrected. For example, the sensitive attribute changing unit 140 changes the value of the selected sensitive attribute for one or more pieces of data in the input data 111 so that the numerical value used as the index for bias determination is less than the threshold. Then, the sensitive attribute changing unit 140 stores the changed input data 111 as the training data 112 in the storage unit 110 .
  • The machine learning unit 150 performs machine learning using the training data 112. For example, the machine learning unit 150 receives the training data 112 and generates an inference model. Thereafter, when data to be inferred is input, the machine learning unit 150 performs inference processing using the generated model. For example, when a model for judging whether an applicant should be hired has been generated, the machine learning unit 150 takes the attributes of the applicant as input, performs inference using the model, and outputs an inference result indicating whether the applicant should be hired.
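  • As an illustration of how the machine learning unit 150 might consume such training data, the following minimal sketch assumes scikit-learn and flag-encoded records; the use of logistic regression, the feature names, and the toy records are assumptions for illustration, not the model of this embodiment.

```python
# Minimal sketch (assumed): train a classifier on flag-encoded training data
# and use it to infer whether a new applicant would be hired.
from sklearn.linear_model import LogisticRegression

FEATURES = ["gender", "race", "hometown", "income_over_1_3m", "age_over_20"]  # hypothetical names
LABEL = "hired"

training_data = [  # toy records standing in for the generated training data 112
    {"gender": 1, "race": 1, "hometown": 0, "income_over_1_3m": 1, "age_over_20": 1, "hired": 1},
    {"gender": 1, "race": 0, "hometown": 1, "income_over_1_3m": 0, "age_over_20": 1, "hired": 1},
    {"gender": 0, "race": 1, "hometown": 0, "income_over_1_3m": 1, "age_over_20": 1, "hired": 1},
    {"gender": 0, "race": 0, "hometown": 1, "income_over_1_3m": 0, "age_over_20": 0, "hired": 0},
]

X = [[record[f] for f in FEATURES] for record in training_data]
y = [record[LABEL] for record in training_data]

model = LogisticRegression().fit(X, y)   # learning phase
applicant = [[0, 1, 0, 1, 1]]            # flags of a new applicant to be inferred
print(model.predict(applicant))          # inference phase: 1 = hired, 0 = not hired
```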
  • The lines connecting the elements shown in FIG. 3 indicate only some of the communication paths; communication paths other than those illustrated can also be set. The function of each element shown in FIG. 3 can be realized, for example, by causing a computer to execute a program module corresponding to that element.
  • FIG. 4 is a diagram showing an example of input data.
  • the input data 111 shown in FIG. 4 is data indicating the result of determination of whether or not to hire an applicant for recruiting.
  • values are set for multiple sensitive attributes, multiple non-sensitive attributes, and class labels.
  • the value in the data corresponding to each attribute or class label is indicated by a "1" or "0" flag.
  • If the flag of an attribute or class label of a data item is "1", it indicates that the value of that attribute or class label is the predetermined value (for example, the value of the attribute "gender" is "male").
  • If the flag of an attribute or class label of a data item is "0", it indicates that the value is other than the predetermined value (for example, the value of the attribute "gender" is other than "male").
  • Sensitive attributes include "gender” and "race”.
  • Non-sensitive attributes include "hometown", "annual income", and "age".
  • In the "annual income" column, for example, a flag "1" is set if the applicant's annual income exceeds 1.3 million (annual income > 1.3 million), and a flag "0" is set otherwise.
  • In the "age" column, a flag "1" is set if the applicant's age is over 20 (age > 20), and a flag "0" is set otherwise.
  • the class label is "recruitment”.
  • the value of the class label indicates the result determined by the hiring manager based on the results of interviews, practical tests, and the like.
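  • To make the flag representation concrete, the following is a minimal sketch of how such input data could be held in memory; the attribute names, the label name, and the four toy records are invented for illustration and are not the records of FIG. 4.

```python
# A hypothetical flag-encoded data set in the spirit of FIG. 4 (records invented).
# Each flag is 1 if the attribute takes its predetermined value
# (e.g. gender == "male", annual income > 1.3 million), otherwise 0.
input_data = [
    {"gender": 1, "race": 1, "hometown": 0, "income_over_1_3m": 1, "age_over_20": 1, "hired": 1},
    {"gender": 1, "race": 0, "hometown": 1, "income_over_1_3m": 0, "age_over_20": 1, "hired": 1},
    {"gender": 0, "race": 1, "hometown": 0, "income_over_1_3m": 1, "age_over_20": 1, "hired": 0},
    {"gender": 0, "race": 0, "hometown": 1, "income_over_1_3m": 0, "age_over_20": 0, "hired": 0},
]

SENSITIVE = ["gender", "race"]       # attributes designated as sensitive by the user
NON_SENSITIVE = ["hometown", "income_over_1_3m", "age_over_20"]
LABEL = "hired"                      # class label ("recruitment")

# Hiring rate per gender flag, showing the kind of imbalance described in the text.
for flag in (1, 0):
    group = [r for r in input_data if r["gender"] == flag]
    print("gender flag", flag, "hire rate:", sum(r[LABEL] for r in group) / len(group))
```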
  • which attribute in the input data 111 is set as the sensitive attribute is preset by the user.
  • "sex” and “race” are designated as sensitive attributes in order to correct biases due to sex discrimination and racial discrimination. For example, if discrimination based on place of birth is rampant and correction of bias due to place of birth discrimination is required, "hometown” may be designated as a sensitive attribute.
  • The input data 111 from which the training data 112 is generated may contain bias. In that case, it is inappropriate to use the input data 111 directly as training data for machine learning. For example, a model trained with biased input data 111 may exhibit discriminatory behavior.
  • the computer 100 then generates the training data 112 based on change rules that change the values of the sensitive attributes in the input data 111 so as to correct the bias.
  • When a bias value calculated for a set of a predetermined value of one sensitive attribute and a predetermined value of one non-sensitive attribute is equal to or greater than a predetermined threshold, a change rule is generated in association with that set.
  • For example, let A be the sensitive attribute and B be the non-sensitive attribute of the pair subject to bias calculation, and let C be the class label in the input data 111. Further, let α be the threshold of the bias value. At this time, if the bias value elift(A, B → C) of the selected pair is equal to or greater than the threshold α, it is determined that the selected pair is biased.
  • The bias value is calculated by the following formulas:
    elift(A, B → C) = (sup(A, B, C) / sup(A, B)) / (sup(B, C) / sup(B)) ... (1)
    sup(A, B, C) / sup(A, B) ... (2)
    sup(B, C) / sup(B) ... (3)
  • sup(A, B) is the number of data in which the value of the sensitive attribute of the set to be calculated is a predetermined value and the value of the non-sensitive attribute is a predetermined value.
  • sup(A, B, C) is the number of data in which the value of the sensitive attribute of the set to be calculated is a predetermined value, the value of the non-sensitive attribute is a predetermined value, and the value of the class label is a predetermined value.
  • sup(B) is the number of data in which the value of the non-sensitive attribute of the set to be calculated is the predetermined value.
  • sup(B, C) is the number of data in which the value of the non-sensitive attribute of the set to be calculated is the predetermined value and the value of the class label is the predetermined value.
  • Formula (2) indicates the ratio of data with a predetermined value for the class label among the data for which both the values of the sensitive attribute and the non-sensitive attribute are predetermined values.
  • Equation (3) indicates the ratio of data with a predetermined value of the class label to data with a predetermined value of the non-sensitive attribute.
  • The bias value on the left side of Formula (1) indicates how much the probability that the class label takes the predetermined value changes depending on whether or not the sensitive attribute is taken into consideration. The larger the ratio when the sensitive attribute is taken into consideration (the value obtained by Formula (2)) is relative to the ratio when it is not (the value obtained by Formula (3)), the larger the bias value. If the bias value for a pair of a sensitive attribute and a non-sensitive attribute is equal to or greater than the threshold α, a change rule for that pair is generated.
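  • As a concrete reading of Formulas (1) to (3), the following minimal sketch computes the bias value from flag counts; the function names sup and elift, the toy records, and the attribute names are assumptions for illustration.

```python
# Minimal sketch (assumed names): the bias value elift(A, B -> C) from flag counts.
def sup(data, **conds):
    """Number of records whose flags equal all of the given attribute values."""
    return sum(all(r[k] == v for k, v in conds.items()) for r in data)

def elift(data, a, b, c):
    """Formula (1): (sup(A,B,C)/sup(A,B)) / (sup(B,C)/sup(B))."""
    conf_abc = sup(data, **{a: 1, b: 1, c: 1}) / sup(data, **{a: 1, b: 1})   # Formula (2)
    conf_bc = sup(data, **{b: 1, c: 1}) / sup(data, **{b: 1})                # Formula (3)
    return conf_abc / conf_bc

# Toy records: A = "gender", B = "age_over_20", C = "hired" (values invented).
data = [
    {"gender": 1, "age_over_20": 1, "hired": 1},
    {"gender": 1, "age_over_20": 1, "hired": 1},
    {"gender": 0, "age_over_20": 1, "hired": 1},
    {"gender": 0, "age_over_20": 1, "hired": 0},
]

ALPHA = 1.2  # example threshold (the same value is used in the FIG. 5 example below)
value = elift(data, "gender", "age_over_20", "hired")
print(value, value >= ALPHA)  # 1.333..., True -> a change rule would be generated
```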
  • FIG. 5 is a diagram showing an example of generation of change rules. For example, it is assumed that the bias value is calculated for a set of the value "male" of the sensitive attribute "gender” and the value "20" of the non-sensitive attribute "age”.
  • the input data 41 includes four data as shown in FIG.
  • It is assumed that the threshold α of the bias value is 1.2.
  • FIG. 6 is a diagram showing an example of bias correction.
  • the training data 42 is data in which the value of the sensitive attribute has been changed.
  • the input data may contain various biases. At that time, applying change rules that correspond to all biases impairs the interpretability of bias correction.
  • FIG. 7 is a diagram showing an example of input data change that impairs interpretability.
  • a change rule is generated to change the value of the sensitive attribute "gender” to "male” for the data whose value of the sensitive attribute "gender” is other than “male” and whose value of the class label "employment” is "impossible”. (conditions for non-sensitive attributes are omitted).
  • Similarly, for data whose value of the sensitive attribute "race" is other than "white" and whose value of the class label "employment" is "impossible", a change rule is generated to change the value of the sensitive attribute "race" to "white" (conditions for non-sensitive attributes are omitted).
  • the training data 112a generated by applying all generated change rules to the input data 111 has changed values for many sensitive attributes. Changing the values of a large number of sensitive attributes in this manner removes many of the biases in the input data 111, but reduces the ease of explanation of the bias correction process. That is, the interpretability of the bias correction process is lowered.
  • FIG. 8 is a diagram showing an example of input data change that reduces the bias correction effect.
  • a change rule for changing the value of the sensitive attribute "gender” to "male” for data whose value of the sensitive attribute "gender” is other than “male” and whose value of the class label "employment” is “impossible” is generated 10 times.
  • the conditions (B1, B2, . . . , B10) for non-sensitive attributes in each change rule are different.
  • training data 112b is generated in which the value of the sensitive attribute "race” is changed from other than “white” to "white” for some data.
  • the sensitive attribute selection unit 130 selects the value of the sensitive attribute with the highest appearance frequency among the generated change rules as a change target in the bias correcting process. This makes it possible to correct more biases by changing the value of one type of sensitive attribute. In other words, a large bias correcting effect can be obtained by changing the value of a small number of sensitive attributes.
  • FIG. 9 is a diagram illustrating an example of the procedure for bias correction processing. The processing shown in FIG. 9 will be described below along with the step numbers. [Step S101]
  • the change rule generation unit 120 reads the input data 111 from the storage unit 110 .
  • [Step S102] Based on the input data 111, the change rule generation unit 120 generates change rules for the biases included in the input data 111. One or more change rules are generated.
  • the change rule indicates the set of sensitive and non-sensitive attributes that cause the bias.
  • the value of the sensitive attribute indicated in the change rule is subject to change when correcting the bias. Details of the change rule generation process will be described later (see FIG. 10).
  • [Step S103] The sensitive attribute selection unit 130 selects a sensitive attribute to be changed for bias correction from among the sensitive attributes included in any of the generated change rules. Details of the sensitive attribute selection processing will be described later (see FIG. 12).
  • [Step S104] The sensitive attribute changing unit 140 changes the value of the selected sensitive attribute in the input data 111 based on the change rules that include that sensitive attribute. Details of the sensitive attribute change processing will be described later (see FIG. 14).
  • [Step S105] The sensitive attribute changing unit 140 stores the input data 111 with the changed sensitive attribute values in the storage unit 110 as the training data 112. The training data 112 is thus generated based on the input data 111. Next, the change rule generation processing will be described in detail.
  • FIG. 10 is a flowchart illustrating an example of the procedure of change rule generation processing. The processing shown in FIG. 10 will be described below along with the step numbers.
  • [Step S111] The change rule generation unit 120 selects one unselected pair from the pairs that can be formed from one sensitive attribute and one non-sensitive attribute.
  • [Step S112] The change rule generation unit 120 calculates a bias value for the selected pair. For example, the change rule generation unit 120 calculates the bias value elift(A, B → C) shown in Formula (1).
  • [Step S113] The change rule generation unit 120 determines whether the calculated bias value is equal to or greater than the threshold α. If the bias value is equal to or greater than the threshold α, the change rule generation unit 120 advances the process to step S114. If the bias value is less than the threshold α, the change rule generation unit 120 advances the process to step S115.
  • [Step S114] The change rule generation unit 120 generates a change rule for the selected pair. For example, the change rule generation unit 120 generates a change rule whose change target is data in which the value of the selected sensitive attribute is other than the predetermined value (flag "0"), the value of the selected non-sensitive attribute is the predetermined value (flag "1"), and the value of the class label is other than the predetermined value (flag "0"). The generated change rule indicates that the value of the sensitive attribute should be changed to the predetermined value (flag "1"). The change rule generation unit 120 registers the generated change rule in the change rule list.
  • [Step S115] The change rule generation unit 120 determines whether there is an unselected pair of a sensitive attribute and a non-sensitive attribute. If there is an unselected pair, the change rule generation unit 120 advances the process to step S111. If all pairs have been selected, the change rule generation unit 120 ends the processing.
  • In this way, bias values are calculated for all pairs of sensitive and non-sensitive attributes, and change rules corresponding to pairs whose bias values are equal to or greater than the threshold α are generated. That is, change rules for correcting each bias included in the input data 111 are generated.
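  • A minimal sketch of this change rule generation loop, under an assumed flag-encoded record format and an assumed dict representation of the rules, could look as follows; it is an illustration, not the implementation of the embodiment.

```python
# Sketch: generate one change rule per (sensitive, non-sensitive) pair whose
# bias value is at or above the threshold ALPHA (all names and records invented).
ALPHA = 1.2
SENSITIVE = ["gender", "race"]
NON_SENSITIVE = ["hometown", "income_over_1_3m", "age_over_20"]
LABEL = "hired"

input_data = [
    {"gender": 1, "race": 1, "hometown": 0, "income_over_1_3m": 1, "age_over_20": 1, "hired": 1},
    {"gender": 1, "race": 0, "hometown": 1, "income_over_1_3m": 0, "age_over_20": 1, "hired": 1},
    {"gender": 0, "race": 1, "hometown": 0, "income_over_1_3m": 1, "age_over_20": 1, "hired": 0},
    {"gender": 0, "race": 0, "hometown": 1, "income_over_1_3m": 0, "age_over_20": 0, "hired": 0},
]

def sup(data, **conds):
    return sum(all(r[k] == v for k, v in conds.items()) for r in data)

def elift(data, a, b, c=LABEL):
    if sup(data, **{a: 1, b: 1}) == 0 or sup(data, **{b: 1, c: 1}) == 0:
        return 0.0  # pair never observed together; treated as unbiased in this sketch
    return (sup(data, **{a: 1, b: 1, c: 1}) / sup(data, **{a: 1, b: 1})) / \
           (sup(data, **{b: 1, c: 1}) / sup(data, **{b: 1}))

change_rule_list = []
for a in SENSITIVE:                                   # step S111: enumerate the pairs
    for b in NON_SENSITIVE:
        if elift(input_data, a, b) >= ALPHA:          # steps S112-S113: bias value vs. threshold
            # Step S114: target data with sensitive flag 0, non-sensitive flag 1,
            # class label flag 0; the rule changes the sensitive flag to 1.
            change_rule_list.append({"sensitive": a, "non_sensitive": b, "set_to": 1})

print(change_rule_list)  # in this toy data, three rules for "gender" and none for "race"
```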
  • the generated change rule is shown, for example, in a change rule list.
  • FIG. 11 is a diagram showing an example of a change rule list.
  • In the change rule list 51 shown in FIG. 11, ten change rules including the value "male" of the sensitive attribute "gender" are registered.
  • In the change rule list 51, one change rule including the value "white" of the sensitive attribute "race" is registered.
  • FIG. 12 is a flowchart showing an example of the procedure of sensitive attribute selection processing. The processing shown in FIG. 12 will be described below along with the step numbers.
  • the sensitive attribute selection unit 130 selects one unselected sensitive attribute value included in one of the change rules from the change rule list 51 .
  • the sensitive attribute selection unit 130 determines whether there is an unselected sensitive attribute value in the change rule list 51 or not. If there is an unselected sensitive attribute value, the sensitive attribute selection unit 130 advances the process to step S121. If all sensitive attribute values have been selected, sensitive attribute selection section 130 advances the process to step S124.
  • the sensitive attribute selection unit 130 identifies the value of the sensitive attribute with the highest appearance frequency as the value of the sensitive attribute to be applied to the change for bias correction.
  • the sensitive attribute selection unit 130 extracts from the change rule list 51 a change rule that includes the value of the specified sensitive attribute.
  • the sensitive attribute selection unit 130 registers, for example, the extracted change rule in the applicable change rule list.
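  • The selection described above can be illustrated with a minimal sketch; the list-of-dicts representation of the change rules and the attribute names are assumptions for illustration, not the data structures of the change rule list 51.

```python
from collections import Counter

# Assumed representation of the change rule list: one dict per change rule.
change_rule_list = [
    {"sensitive": "gender", "non_sensitive": "hometown"},
    {"sensitive": "gender", "non_sensitive": "income_over_1_3m"},
    {"sensitive": "gender", "non_sensitive": "age_over_20"},
    {"sensitive": "race",   "non_sensitive": "income_over_1_3m"},
]

# Count how often each sensitive attribute appears across the rules and keep
# only the rules for the most frequent one (the applied change rule list).
counts = Counter(rule["sensitive"] for rule in change_rule_list)
selected, _ = counts.most_common(1)[0]
applied_change_rule_list = [r for r in change_rule_list if r["sensitive"] == selected]

print(selected)                        # -> "gender"
print(len(applied_change_rule_list))   # -> 3
```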
  • FIG. 13 is a diagram showing an example of an applied change rule list.
  • FIG. 14 is a flowchart illustrating an example of the procedure of sensitive attribute change processing. The processing shown in FIG. 14 will be described below according to the step numbers.
  • [Step S131] The sensitive attribute changing unit 140 selects an unselected change rule in the applied change rule list 52 as the change rule to be applied.
  • [Step S132] The sensitive attribute changing unit 140 selects, from the input data 111, a piece of data that satisfies the change condition of the selected change rule.
  • [Step S133] The sensitive attribute changing unit 140 changes the value of the sensitive attribute of the selected data according to the selected change rule. For example, the sensitive attribute changing unit 140 changes the value of the sensitive attribute indicated by the change rule in the selected data from a value other than the predetermined value (flag "0") to the predetermined value (flag "1").
  • [Step S134] The sensitive attribute changing unit 140 determines whether the bias value of the bias targeted by the selected change rule is less than the threshold α. If the bias value is less than the threshold α, the sensitive attribute changing unit 140 advances the process to step S135. If the bias value is equal to or greater than the threshold α, the sensitive attribute changing unit 140 advances the process to step S132.
  • [Step S135] The sensitive attribute changing unit 140 determines whether or not there is an unselected change rule in the applied change rule list 52. If there is an unselected change rule, the sensitive attribute changing unit 140 advances the process to step S131. If all the change rules in the applied change rule list 52 have been selected, the sensitive attribute changing unit 140 ends the sensitive attribute change processing.
  • the input data 111 in which the value of the sensitive attribute has been changed in this manner is stored in the storage unit 110 as the training data 112 .
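  • The following is a minimal sketch of this change loop; the record and rule representations are the same illustrative assumptions as above, and the loop follows the procedure of FIG. 14 only in outline.

```python
# Sketch: apply a change rule to matching records until the bias value of the
# targeted (sensitive, non-sensitive) pair drops below ALPHA (names and data invented).
ALPHA = 1.2

input_data = [
    {"gender": 1, "age_over_20": 1, "hired": 1},
    {"gender": 1, "age_over_20": 1, "hired": 1},
    {"gender": 0, "age_over_20": 1, "hired": 1},
    {"gender": 0, "age_over_20": 1, "hired": 0},
]
applied_change_rule_list = [{"sensitive": "gender", "non_sensitive": "age_over_20"}]

def sup(data, **conds):
    return sum(all(r[k] == v for k, v in conds.items()) for r in data)

def elift(data, a, b, c="hired"):
    return (sup(data, **{a: 1, b: 1, c: 1}) / sup(data, **{a: 1, b: 1})) / \
           (sup(data, **{b: 1, c: 1}) / sup(data, **{b: 1}))

for rule in applied_change_rule_list:                    # step S131: pick a rule
    a, b = rule["sensitive"], rule["non_sensitive"]
    for record in input_data:                            # steps S132-S133: change matching records
        if elift(input_data, a, b) < ALPHA:              # step S134: bias already below threshold
            break
        if record[a] == 0 and record[b] == 1 and record["hired"] == 0:
            record[a] = 1                                # flip the sensitive flag 0 -> 1

training_data = input_data                               # step S105: stored as training data 112
print(training_data)  # the last record's "gender" flag has been changed to 1
```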
  • FIG. 15 is a diagram showing an example of generated training data. Similar to the input data 111, the training data 112 includes a plurality of data having sensitive attribute, non-sensitive attribute, and class label values. The values set for each data item are the same as those of the input data 111, except for the values changed by the sensitive attribute change processing. In the example of FIG. 15, for at least some of the data whose value of the sensitive attribute "gender" was other than "male" (flag "0"), the value has been changed to "male" (flag "1").
  • the third embodiment performs bias correction processing in consideration of the difference in interpretability for each sensitive attribute.
  • Sensitive attributes include those that clearly cause unfairness when biased, and those that are difficult to assert as being unfair.
  • the user can set a value indicating the ease of interpretation for each sensitive attribute. For example, a higher value is set for the interpretability of a sensitive attribute that is more interpretable when changed for bias correction.
  • the interpretability for each sensitive attribute is set, for example, in an interpretability management table.
  • FIG. 16 is a diagram showing an example of the interpretability management table.
  • In the interpretability management table 61, values indicating interpretability are set in association with sensitive attribute values.
  • the sensitive attribute selection unit 130 weights the appearance frequency of the sensitive attribute in the change rule list 51 according to the interpretability value. For example, the sensitive attribute selection unit 130 takes the result of multiplying the appearance frequency of the sensitive attribute by the value (weight value) of the easiness of interpretation of the sensitive attribute as the degree of importance. Then, the sensitive attribute selection unit 130 selects the sensitive attribute with the highest importance as the sensitive attribute to be changed.
  • FIG. 17 is a flowchart showing an example of the procedure of sensitive attribute selection processing according to importance. The processing shown in FIG. 17 will be described below along with the step numbers. [Step S201] The sensitive attribute selection unit 130 selects one unselected sensitive attribute value included in one of the change rules from the change rule list 51.
  • [Step S202] The sensitive attribute selection unit 130 counts the appearance frequency of the value of the selected sensitive attribute in the change rule list 51.
  • [Step S203] The sensitive attribute selection unit 130 calculates the importance of the value of the selected sensitive attribute. For example, the sensitive attribute selection unit 130 acquires the interpretability value of the selected sensitive attribute value from the interpretability management table 61. Then, the sensitive attribute selection unit 130 sets the product of the appearance frequency and the interpretability value as the importance of the value of the selected sensitive attribute (a sketch of this calculation is shown after the step descriptions below).
  • [Step S204] The sensitive attribute selection unit 130 determines whether there is an unselected sensitive attribute value in the change rule list 51. If there is an unselected sensitive attribute value, the sensitive attribute selection unit 130 advances the process to step S201. If all sensitive attribute values have been selected, the sensitive attribute selection unit 130 advances the process to step S205.
  • [Step S205] The sensitive attribute selection unit 130 identifies the sensitive attribute value with the highest importance as the sensitive attribute value to which the change for bias correction is applied.
  • the sensitive attribute selection unit 130 extracts from the change rule list 51 a change rule that includes the value of the specified sensitive attribute.
  • the sensitive attribute selection unit 130 registers, for example, the extracted change rule in the applied change rule list 52 .
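  • As an illustration of the importance calculation in steps S202 and S203, the following minimal sketch assumes that the change rules are held as a list of dicts and that the interpretability management table is a plain dict; the attribute names and weight values are invented for illustration.

```python
from collections import Counter

# Assumed change rule list (only the sensitive attribute matters here).
change_rule_list = [
    {"sensitive": "gender"}, {"sensitive": "gender"}, {"sensitive": "gender"},
    {"sensitive": "race"},
]

# Assumed interpretability management table 61: higher value = easier to explain
# when that attribute's value is changed for bias correction (weights invented).
interpretability = {"gender": 0.2, "race": 1.0}

frequency = Counter(rule["sensitive"] for rule in change_rule_list)     # step S202
importance = {attr: freq * interpretability[attr]                       # step S203
              for attr, freq in frequency.items()}

selected = max(importance, key=importance.get)                          # step S205
applied_change_rule_list = [r for r in change_rule_list if r["sensitive"] == selected]

print(importance)  # e.g. {'gender': 0.6..., 'race': 1.0}
print(selected)    # with these invented weights, "race" outweighs the more frequent "gender"
```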
  • FIG. 18 is a diagram showing an example of sensitive attribute selection according to importance.
  • the sensitive attribute selection unit 130 selects, for example, a predetermined number of sensitive attribute values in descending order of appearance frequency in the change rule list 51 as sensitive attribute values to which bias correction processing is applied.
  • As a result, among the data whose value of the sensitive attribute "gender" is other than "male", the ratio of data whose class label indicates that the applicant was hired increases, and the bias is corrected.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention suppresses a decrease in the interpretability of bias correction for training data. Among a plurality of data items each having a plurality of attributes, an information processing device (10) identifies first combinations of one or more attribute values having a data bias equal to or greater than a threshold, on the basis of the number of data items corresponding to each combination of attribute values. The information processing device (10) also selects a specific attribute value according to the number of those combinations in which each attribute value is included. The information processing device (10) generates training data (2) by changing the attribute values of one or more data items among the plurality of data items, in accordance with the condition that the data bias is less than the threshold for second combinations of one or more attribute values that include the specific attribute value, among the first combinations of attribute values.
PCT/JP2022/007230 2022-02-22 2022-02-22 Training data generation program, training data generation method, and information processing device Ceased WO2023161993A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/007230 WO2023161993A1 (fr) 2022-02-22 2022-02-22 Training data generation program, training data generation method, and information processing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/007230 WO2023161993A1 (fr) 2022-02-22 2022-02-22 Training data generation program, training data generation method, and information processing device

Publications (1)

Publication Number Publication Date
WO2023161993A1 true WO2023161993A1 (fr) 2023-08-31

Family

ID=87765170

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/007230 Ceased WO2023161993A1 (fr) 2022-02-22 2022-02-22 Training data generation program, training data generation method, and information processing device

Country Status (1)

Country Link
WO (1) WO2023161993A1 (fr)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2022500747A (ja) * 2018-09-10 2022-01-04 Google LLC Rejecting biased data using a machine learning model

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2022500747A (ja) * 2018-09-10 2022-01-04 Google LLC Rejecting biased data using a machine learning model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHE YU; JOYMALLYA CHAKRABORTY; TIM MENZIES: "FairBalance: How to Achieve Equalized Odds With Data Pre-processing", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 26 April 2023 (2023-04-26), 201 Olin Library Cornell University Ithaca, NY 14853, XP091493897 *

Similar Documents

Publication Publication Date Title
US10886025B2 (en) Drug adverse event extraction method and apparatus
US11687553B2 (en) System and method for generating analytical insights utilizing a semantic knowledge graph
JPWO2013125482A1 (ja) Document evaluation device, document evaluation method, and program
CN111125529A (zh) Product matching method and apparatus, computer device, and storage medium
US20220335315A1 (en) Application of local interpretable model-agnostic explanations on decision systems without training data
US7716145B2 (en) System for supporting user's behavior
US20250086432A1 (en) Modified inputs for artificial intelligence models
JP6930195B2 (ja) Model identification device, prediction device, monitoring system, model identification method, and prediction method
JP7733300B2 (ja) Questionnaire result analysis program, questionnaire result analysis method, and information processing device
WO2023161993A1 (fr) Training data generation program, training data generation method, and information processing device
EP4350585A1 (fr) Machine learning program, machine learning method, and machine learning device
JPWO2013105404A1 (ja) Reliability calculation device, reliability calculation method, and program
Skarzyńska et al. Risks, failures, and ethical dilemmas of AI technologies and trust
US20230385690A1 (en) Computer-readable recording medium storing determination program, determination apparatus, and method of determining
JP7593473B2 (ja) Compatibility evaluation device, compatibility evaluation method, and program
JP5826893B1 (ja) Change point prediction device, change point prediction method, and computer program
WO2023064158A1 (fr) Prévision auto-adaptative pour des modèles d'apprentissage automatique de prévision multi-horizons
WO2021152801A1 (fr) Learning device, learning method, and recording medium
WO2021245850A1 (fr) Diagnosis assistance program, device, and method
JP2007164346A (ja) Decision tree modification method, abnormality determination method, and program
JP2021033392A (ja) Information processing device and information processing program
JP7762251B2 (ja) Report evaluation method, information processing device for executing the method, and program
US20230244960A1 (en) Computer-readable recording medium having stored therein machine learning program, method for machine learning, and information processing apparatus
JP7603887B2 (ja) Information processing device, program, and information processing method
US20080120263A1 (en) Computer-readable recording medium, apparatus and method for calculating scale-parameter

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22928524

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22928524

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP