
WO2025194342A1 - Weight adaptation of a proportional-integral-derivative neural network to facilitate thermal regulation - Google Patents

Weight adaptation of a proportional-integral-derivative neural network to facilitate thermal regulation

Info

Publication number
WO2025194342A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
variable
circuitry
evaluation
pidnn
Prior art date
Legal status
Pending
Application number
PCT/CN2024/082472
Other languages
English (en)
Inventor
Zheng SONG
Wenbin Tian
Haifeng GONG
Nishi AHUJA
Xiaoguo Liang
Current Assignee
Intel Corp
Original Assignee
Intel Corp
Priority date
Filing date
Publication date
Application filed by Intel Corp filed Critical Intel Corp
Priority to PCT/CN2024/082472 priority Critical patent/WO2025194342A1/fr
Publication of WO2025194342A1 publication Critical patent/WO2025194342A1/fr
Pending legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G – PHYSICS
    • G06 – COMPUTING OR CALCULATING; COUNTING
    • G06N – COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 – Computing arrangements based on biological models
    • G06N3/02 – Neural networks
    • G06N3/04 – Architecture, e.g. interconnection topology
    • G06N3/045 – Combinations of networks
    • G06N3/048 – Activation functions
    • G06N3/06 – Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 – Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/08 – Learning methods
    • G06N3/084 – Backpropagation, e.g. using gradient descent

Definitions

  • This disclosure generally relates to neural networks and more particularly, but not exclusively, to operation of a proportional-integral-derivative neural network (PIDNN) for thermal regulation.
  • FIG. 1 shows a block diagram illustrating a system to provide thermal regulation with a proportional-integral-derivative neural network (PIDNN) according to an embodiment.
  • FIG. 2 shows a flow diagram illustrating features of a method to determine new weights of a PIDNN according to an embodiment.
  • FIGs. 3A, 3B show respective block diagrams each illustrating features of a controller to adapt weights of a PIDNN based on regularization terms according to an embodiment.
  • FIG. 4 shows a graph illustrating features of a PIDNN for which weights are variously adapted according to an embodiment.
  • FIG. 5 shows a flow diagram illustrating features of a method to operate a PIDNN which facilitates thermal regulation according to an embodiment.
  • FIG. 6 illustrates an exemplary system.
  • FIG. 7 illustrates a block diagram of an example processor that may have more than one core and an integrated memory controller.
  • FIG. 8A is a block diagram illustrating both an exemplary in-order pipeline and an exemplary register renaming, out-of-order issue/execution pipeline according to examples.
  • FIG. 8B is a block diagram illustrating both an example of an in-order architecture core and an exemplary register renaming, out-of-order issue/execution architecture core to be included in a processor according to examples.
  • FIG. 9 illustrates examples of execution unit (s) circuitry.
  • FIG. 10 is a block diagram of a register architecture according to some examples.
  • Embodiments discussed herein variously provide techniques and mechanisms for weights of a proportional-integral-derivative neural network to be dynamically updated based on regularization terms which are determined from an output of said proportional-integral-derivative neural network.
  • PID controllers have been utilized for fan speed control (FSC) in servers and many other areas.
  • conventional PID controllers suffer from a few drawbacks, including difficulty in tuning, repeated re-tuning requirements for different hardware and environmental settings, and sensitivity to model uncertainties and disturbances. For at least these reasons, conventional PID controllers are ill-suited for adapting to varying operating conditions, such as fluctuating server workloads, different ambient temperatures and add-in card setups.
  • Embodiments described herein variously extend or otherwise adapt functionality of a neural network –referred to herein as a PID neural network (PIDNN) –which is operable to approximate or otherwise implement a PID controller function, wherein the PIDNN includes, is coupled to access, or otherwise operates in combination with circuitry which dynamically adjusts one or more operational parameters of the PIDNN.
  • a PIDNN is operated (for example) in any of a wide range of applications to take advantage of both the characteristics of a PID controller circuit, and those of a neural network, such as self-learning ability, adaptability, and resilience.
  • PIDNN controllers are able to achieve better performance than traditional PID controllers –e.g., by detecting environmental characteristics and/or operational characteristics of a server system, and adapting thermal regulation to workload changes, temperature changes, or the like.
  • Some embodiments variously provide an improved weight modification algorithm whereby a PIDNN is less likely, as compared to conventional techniques, to overfit in long-term operation.
  • “overfit” refers to the tendency of a neural network to adapt one or more parameters too much and/or too often based on fluctuating input data. In many applications, such overfitting tends to contribute to instability in the control output (s) provided by such a neural network.
  • While a conventional PIDNN controller may be able to maintain a system temperature at or near some target temperature, this maintenance is often inefficient in terms of power consumption, and/or exposes a fan (or other suitable thermal transfer device) to excessive wear.
  • some embodiments variously enable and/or utilize a regularization technique to prevent overfitting by conditionally incorporating penalty terms into a parameter update process for a neural network.
  • a PIDNN controller is able to continuously learn and adapt to any of a relatively wide variety of operational conditions.
  • embodiments variously enable efficient and steady operation of a server (or other computer system), even under operational conditions which may not have been considered during system design.
  • a PIDNN controller need not rely on prior knowledge of the thermal model or physical characteristics of the system for which thermal regulation is to be provided.
  • Such a PIDNN controller is more easily deployable and adaptable (for example) to any of various types of server platforms with different system architectures and hardware configurations.
  • Non-limiting examples of electronic devices that may utilize the technologies described herein include any kind of mobile device and/or stationary device, such as cameras, cell phones, computer terminals, desktop computers, electronic readers, facsimile machines, kiosks, laptop computers, netbook computers, notebook computers, internet devices, payment terminals, personal digital assistants, media players and/or recorders, servers (e.g., blade server, rack mount server, combinations thereof, etc. ) , set-top boxes, smart phones, tablet personal computers, ultra-mobile personal computers, wired telephones, combinations thereof, and the like. More generally, the technologies described herein may be employed in any of a variety of electronic devices including a PIDNN and/or circuitry which is operable to adapt weights of such a PIDNN.
  • FIG. 1 shows a system 100 which provides thermal regulation with a proportional-integral-derivative neural network (PIDNN) according to an embodiment.
  • System 100 illustrates features of one example embodiment wherein one or more PIDNNs each generate a respective control signal to operate a corresponding thermal transfer device (such as a fan, a blower, a thermoelectric cooler, or the like) .
  • the PIDNN includes, is coupled to, or otherwise operates based on hardware, firmware and/or executing software which is suitable to adapt one or more weights of the PIDNN.
  • system 100 comprises a device 102 which, for example, is any of various suitable computer devices including, but not limited to, a server, a laptop, a desktop, an electronic tablet, a hybrid or convertible personal computer (PC) , etc.
  • the device 102 provides thermal regulation for one or more components which are susceptible to generating heat and/or being heated, for example, by the environment and/or by one or more other components.
  • such one or more components include some or all of a display screen 103, a keyboard 104, one or more pointing devices 106, a processor 130, one or more image sensors 122, one or more motion sensors 123, one or more microphones 124 and/or the like.
  • device 102 provides thermal regulation for any of various additional or alternative components. However, some embodiments are not limited with respect to a particular type of component or components for which thermal regulation is to be controlled.
  • a keyboard is presented via display screen 103, and a user provides inputs on the keyboard by touching the screen.
  • the one or more pointing devices 106 comprise a mouse, a touchpad, or the like.
  • the keyboard 104 and the pointing device(s) 106 are carried by a housing of the device 102 and accessible via an exterior surface of the housing and, thus, can be considered on-board user input devices for the device 102.
  • the device 102 includes image sensor (s) 122.
  • the image sensor (s) 122 of the device 102 include one or more cameras to capture image data of the surrounding environment in which the device 102 is located.
  • the image sensor (s) 122 include depth-sensing camera (s) .
  • the image sensor (s) 122 can be carried by a bezel of the display screen 103.
  • the example device 102 of FIG. 1 includes the motion sensor (s) 123.
  • the motion sensor (s) 123 can include, for example, infrared sensor (s) to detect user movements. Data generated by the motion sensor (s) 123 can be analyzed to identify gestures performed by the user of the device 102.
  • the motion sensor (s) 123 can be carried by the device 102 proximate to, for example, a touchpad of the device 102, a bezel of the display screen 103, etc. so as to detect user motion (s) occurring proximate to the device 102.
  • the device 102 includes the microphone (s) 124 to detect sounds in an environment in which the device 102 is located.
  • the microphone (s) 124 can be carried by the device 102 at one or more locations, such as on a lid of the device 102, on a base of the device 102 proximate to the keyboard 104, etc.
  • the device 102 additionally or alternatively includes one or more external devices communicatively coupled to the device 102, such as an external keyboard 108, external pointing device (s) 110 (e.g., wired or wireless mouse (s) ) , and/or headphones 112.
  • the external keyboard 108, the external pointing device (s) 110, and/or the headphones 112 can be communicatively coupled to the device 102 via one or more wired or wireless connections.
  • the device 102 includes one or more device configuration sensors 120 that provide means for detecting whether user input(s) are being received via the external keyboard 108 and/or the external pointing device(s) 110, and/or whether output(s) (e.g., audio output(s)) are being delivered via the headphones 112, while such external devices are coupled to the device 102.
  • the device configuration sensor (s) 120 detect a wired connection of one or more of the external devices 108, 110, 112 via a hardware interface (e.g., USB port, etc. ) .
  • the device configuration sensor (s) 120 detect the presence of the external device (s) 108, 110, 112 via wireless connection (s) (e.g., Bluetooth) .
  • the device configuration sensor (s) 120 include accelerometers to detect an orientation of the device 102 (e.g., tablet mode) and/or sensor (s) to detect an angle of, for instance, a screen of a laptop (e.g., facing the laptop base, angled away from the base, etc. ) .
  • the sensor(s) 120, 122, 123, 124, 126 can transmit data to a cloud-based device 129 (e.g., one or more server(s), processor(s), and/or virtual machine(s)).
  • processor 130 executes software to interpret and output response (s) based on one or more user input event (s) (e.g., touch event (s) , keyboard input (s) , etc. ) .
  • the device 102 of FIG. 1 includes, or accommodates coupling to, one or more power sources 116 such as a battery to provide power to the processor 130 and/or other components of the device 102 which are communicatively coupled to power source (s) 116–e.g., via one or more power busses (not shown) .
  • the hardware components of the device 102 generate heat during operation of the device 102.
  • the example device 102 includes temperature sensor (s) 126 to measure temperature (s) associated with the hardware component (s) of the device 102.
  • the temperature sensor (s) 126 measure a temperature within a housing of device 102, a temperature of a skin of the housing of the device 102, and/or an exterior surface of the user device that can be touched by a user (e.g., a base of a laptop) (the terms “user” and “subject” are used interchangeably herein and both refer to a biological creature such as a human being) .
  • the temperature sensor (s) 126 can be disposed in the housing of the device 102 proximate to the skin (e.g., coupled to a side of the housing opposite the side of the housing that is visible to the user) .
  • the temperature sensor (s) 126 can include one or more thermometers.
  • device 102 comprises one or more thermal transfer devices, such as the illustrative one or more fans 160 shown, which facilitate convective and/or other cooling.
  • the fan (s) 160 provide means for cooling and/or regulating the temperature of the hardware component (s) (e.g., the processor 130) of the device 102 in response to temperature data generated by the temperature sensor (s) 126.
  • operation of the fan(s) 160 is controlled in view of one or more thermal constraints for the device 102 that define temperature settings for the hardware component (s) of the device 102 and/or a skin temperature of the device 102.
  • operation of the fan (s) 160 is responsive to a controller 140 which includes, is coupled to, or otherwise operates based on a proportional-integral-derivative neural network (PIDNN) 144.
  • parameters (e.g., weights) of PIDNN 144 are dynamically updated during operation of device 102 based on the contribution by a proportional term and/or by a derivative term to a control signal which is output by PIDNN 144 –e.g., wherein the control signal is to communicate the variable v shown.
  • PIDNN 144 comprises an input layer, an output layer, and one or more hidden layers which include nodes variously coupled between the input layer and the output layer. PIDNN 144 receives input information via the one or more nodes of the input layer, and outputs a fan control signal (for example) based on said input information.
  • the input layer comprises a node R and a node Y which are configured to receive (respectively) an indication of a target thermal condition, and an indication of a detected thermal condition.
  • a variable y communicated to node Y, specifies or otherwise indicates temperature and/or other suitable thermal condition –e.g., as detected by the temperature sensor (s) 126 –of such one or more components of device 102.
  • a value r is communicated to node R, wherein r specifies or otherwise indicates a target thermal condition (such as a target temperature) at which a given one or more components of device 102 are to be maintained with the fan (s) 160.
  • the value r is maintained at (and, for example, provided from) a repository 142 such as one or more memories, registers, and/or any of various other suitable resources of device 102.
  • some embodiments are not limited with respect to the source from which, and/or the basis on which, value r is determined and provided to controller 140.
  • one or more nodes in the hidden layer (s) of PIDNN 144 –e.g., including the illustrative node P shown – provide proportional calculation functionality.
  • a node V in the output layer of PIDNN 144 is coupled to receive a proportional term, an integral term, and a derivative term from the one or more hidden layers.
  • The particular number and configuration of the nodes of PIDNN 144 is merely illustrative, and not limiting on various embodiments.
  • While PIDNN 144 is shown as comprising only the six nodes R, Y, P, I, D, V, functionality of a given one of such nodes is implemented with multiple nodes, in other embodiments.
  • node R generates a signal s 11 which is based on the value r and which corresponds to a weight w 11 , wherein signal s 11 is communicated from node R to node P.
  • node R determines a first value based on the value r, wherein signal s 11 represents a product of that first value and the weight w 11 .
  • a weight is applied by a receiver node, rather than a transmitter node.
  • the signal s 11 represents the first value, wherein node P applies the weight w 11 to signal s 11 (e.g., by multiplying the represented first value by weight w 11 ) .
  • node R further generates a signal s 12 which is based on the value r, and which corresponds to (e.g., which is a product of, or is to be multiplied by) a weight w 12 , wherein signal s 12 is communicated from node R to node I.
  • Node R further generates a signal s 13 which is based on the value r, and which corresponds to (e.g., which is a product of, or is to be multiplied by) a weight w 13 , wherein signal s 13 communicated from node R to node D.
  • controller 140 provides functionality to dynamically update one or more weights of PIDNN 144. To prevent or otherwise mitigate the possibility of variable v exhibiting characteristics of overfitting, controller 140 calculates one or more terms –referred to herein as “regularization terms” –which mitigate the chance of a given weight being changed by an excessive amount over time (or being changed at an excessive rate, in some embodiments) . Such updating of weights is provided, for example, with an evaluation unit 146 and an adjustment unit 148 of controller 140. In various embodiments, functionality such as that of evaluation unit 146 and/or adjustment unit 148 is implemented with any of various suitable types of hardware, firmware and/or executing software.
  • evaluation unit 146 and adjustment unit 148 are implemented with integrated circuitry or, in another embodiment, with a software process that is executed with processor 130 (or any other suitable processor resource of device 102) . It is to be noted that some embodiments are implemented solely with logic which implements functionality of evaluation unit 146 and adjustment unit 148 –e.g., wherein such embodiments omit some or all other features of controller 140, device 102, or system 100.
  • a node of a PIDNN applies a weight to generate a signal which is then communicated to another node of said PIDNN.
  • a cost metric is determined based on a regularization term which is indicative of a characteristic (e.g., a volatility) of the signal.
  • a PIDNN node receives a signal from another PIDNN node, and applies a weight to that received signal.
  • a cost metric is determined based on a regularization term which is, more particularly, indicative of a characteristic of a combination of –e.g., a product of –the signal and the weight.
  • evaluation unit 146 determines the value of a regularization term Tp which (for example) is indicative of a volatility of the signal s1 –e.g., by indicating a volatility of the product (s1 · w1) –and is further indicative of a stability of variable y.
  • determining a value of the regularization term Tp comprises evaluation unit 146 conditionally setting the regularization term Tp to be equal to a calculated value which is based on both a square of the weight w 21 and a square of the weight w1.
  • evaluation unit 146 conditionally selects between setting the regularization term Tp to be equal to the calculated value or some baseline value –e.g., zero (0) .
  • evaluation unit 146 performs a first evaluation to detect whether a first test criteria (e.g., a condition for setting the regularization term Tp to be equal to the calculated value) has been met.
  • the first test criteria includes or is otherwise based on both a metric of a volatility of the signal s1 (which, in some embodiments, corresponds to a volatility of a product (s1 · w1)), and a metric of a stability of the variable y.
  • the first evaluation determines whether a change (if any) to the signal s1 is less than some first threshold amount, and further determines whether a magnitude of a change (if any) to the variable y is less than a second threshold amount.
  • evaluation unit 146 additionally or alternatively determines the value of a regularization term Td which (for example) is indicative of a volatility of the variable v.
  • determining a value of the regularization term Td comprises evaluation unit 146 conditionally setting the regularization term Td to be equal to another calculated value which is based on both a square of the weight w 23 and a square of the weight w3.
  • evaluation unit 146 conditionally selects between setting the regularization term Td to be equal to the other calculated value or some baseline value –e.g., zero (0) .
  • evaluation unit 146 performs a second evaluation to detect whether a second test criteria (e.g., a condition for setting the regularization term Td to be equal to the other calculated value) has been met.
  • the second test criteria includes or is otherwise based on a metric of a volatility of the signal v.
  • the second evaluation determines whether a spike of the signal v (if any) is sufficiently quick, and/or is sufficiently large.
  • evaluation unit 146 calculates the value of a cost metric J based on the respective values of such regularization terms Tp, Td –e.g., wherein the value of cost metric J is further based on a difference between the value r, and the variable y.
  • determining a value of the cost metric J comprises evaluation unit 146 calculating an average of multiple (e.g., previously sampled, calculated, or otherwise determined) cost terms which each comprise a sum of a respective value of the regularization term Tp, a respective value of the regularization term Td, and a square of a difference between the target thermal condition and a respective detected thermal condition. After calculation by evaluation unit 146, the value of cost metric J is communicated to adjustment unit 148 of controller 140.
  • adjustment unit 148 determines one or more adjustments each to be made to a respective weight of PIDNN 144.
  • an adjustment of a given weight is determined by adjustment unit 148 based on a gradient of the cost metric J with respect to that particular given weight.
  • adjustment unit 148 calculates an adjustment Δw11 to be made to weight w11, wherein adjustment Δw11 is based on a gradient of the cost metric J with respect to the weight w11.
  • adjustment unit 148 calculates an adjustment Δw12 to be made to weight w12 based on a gradient of the cost metric J with respect to the weight w12.
  • Adjustment unit 148 calculates any of various additional weight adjustments, in some embodiments.
  • adjustment unit 148 outputs one or more signals to variously implement one or more weight adjustments each at a respective node of PIDNN 144.
  • FIG. 2 shows a method 200 for determining new weights of a PIDNN according to an embodiment.
  • Operations such as those of method 200 are performed with any of various combinations of suitable hardware (e.g., circuitry) , firmware and/or executing software which, for example, provide some or all of the functionality of device 102.
  • method 200 is performed at least with evaluation unit 146 and adjustment unit 148, in an embodiment.
  • method 200 comprises (at 210) determining a value of a control variable v which is calculated by an output node V of a PIDNN based on each of variables s1, s3.
  • evaluation unit 146 is coupled to receive, snoop or otherwise detect the value of the control variable v generated by PIDNN 144.
  • the variable s1 represents a proportional term, and the variable s3 represents a derivative term, of the proportional-integral-derivative calculation.
  • Method 200 further comprises (at 212) determining weights w1, w3, w21, w23 which correspond to respective variables s1, s3, s21, s23.
  • an input layer of the PIDNN calculates respective values of variables s21, s23 (e.g., those of PIDNN 144) based on a variable y which indicates a detected thermal condition.
  • a hidden layer of the PIDNN calculates respective values of variables s1, s3 based on variables s21, s23.
  • Method 200 further comprises (at 214) performing a first evaluation based on a first metric of volatility of the variable s1.
  • the first evaluation is to determine whether, according to some predetermined first test criteria, the variable s1 has undergone some sufficiently large decrease while the detected thermal condition, indicated by variable y, has remained sufficiently stable.
  • method 200 sets a regularization term Tp to be equal to a first value which, for example, is based on each of the weights w21, w1.
  • the first value is based on both a square of the weight w21 and a square of the weight w1.
  • Method 200 further comprises (at 218) performing a second evaluation with a second metric of volatility of the variable v.
  • the second evaluation is to determine whether, according to some predetermined second test criteria, the variable v has exhibited a spike which (for example) is sufficiently short in duration and/or sufficiently large in magnitude.
  • method 200 sets a regularization term Td to be equal to a second value which is based on each of the weights w23, w3.
  • the second value is based on both a square of the weight w23 and a square of the weight w3.
  • Method 200 further comprises (at 224) adjusting one or more weights of the PIDNN based on the value of a cost metric J.
  • adjusting the one or more weights at 224 comprises changing a first weight of the PIDNN based on a gradient of the cost metric J with respect to the first weight.
  • method 200 further comprises operating one or more fans based on the value of the variable v –e.g., wherein the variable v is provided to a pulse width modulator circuit with which a speed of a given fan is regulated.
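A minimal sketch of one such update cycle, in the spirit of method 200, is shown below. Everything concrete in it is an assumption introduced for illustration rather than a detail taken from the patent: the class and parameter names (PIDNNController, lam, s1_drop_th, y_change_th, v_spike_th), the numeric defaults, the [-1, 1] saturation limits, and the use of the classic sign(Δy/Δv) approximation in place of the patent's exact gradient expressions.

```python
# Hypothetical sketch of one PIDNN weight-update cycle with gated regularization,
# loosely following method 200. All concrete names, thresholds, and defaults are
# illustrative assumptions, not values taken from the patent.
import numpy as np


class PIDNNController:
    def __init__(self, lr=1e-3, lam=(1e-3, 1e-3, 1e-3, 1e-3),
                 s1_drop_th=0.5, y_change_th=0.2, v_spike_th=0.3):
        self.w_ih = np.array([[-1.0, -1.0, -1.0],   # weights on the r input (node R -> P, I, D)
                              [1.0, 1.0, 1.0]])     # weights on the y input (node Y -> P, I, D)
        self.w_ho = np.array([0.1, 0.1, 0.1])       # hidden -> output weights (P, I, D paths)
        self.lr, self.lam = lr, lam
        self.s1_drop_th, self.y_change_th, self.v_spike_th = s1_drop_th, y_change_th, v_spike_th
        self.i_state = 0.0       # integral-node memory
        self.prev_d_in = 0.0     # derivative-node memory
        self.prev = None         # previous (s1, y, v) sample
        self.cost_samples = []   # (error^2 + Tp + Td) samples for the cost metric J

    def forward(self, r, y):
        u = self.w_ih[0] * r + self.w_ih[1] * y                        # weighted inputs to P, I, D
        xp = float(np.clip(u[0], -1.0, 1.0))                           # proportional node
        self.i_state = float(np.clip(self.i_state + u[1], -1.0, 1.0))  # integral node
        xd = float(np.clip(u[2] - self.prev_d_in, -1.0, 1.0))          # derivative node
        self.prev_d_in = float(u[2])
        self.x_h = np.array([xp, self.i_state, xd])
        s = self.w_ho * self.x_h                                       # weighted hidden outputs s1..s3
        v = float(np.clip(s.sum(), 0.0, 1.0))                          # output node -> fan duty [0, 1]
        return v, s

    def update(self, r, y, v, s):
        if self.prev is None:                        # the volatility gates need one earlier sample
            self.prev = (float(s[0]), y, v)
            return None
        s1_prev, y_prev, v_prev = self.prev
        l1, l2, l3, l4 = self.lam
        # Gate 1: the P-path contribution s1 dropped sharply while the temperature stayed stable.
        tp = (l1 * self.w_ih[1, 0] ** 2 + l2 * self.w_ho[0] ** 2
              if (s1_prev - s[0]) > self.s1_drop_th and abs(y - y_prev) < self.y_change_th else 0.0)
        # Gate 2: the control output v spiked.
        td = (l3 * self.w_ih[1, 2] ** 2 + l4 * self.w_ho[2] ** 2
              if abs(v - v_prev) > self.v_spike_th else 0.0)
        err = r - y
        self.cost_samples.append(err ** 2 + tp + td)
        J = float(np.mean(self.cost_samples[-50:]))                    # windowed cost metric

        # Gradient descent on J; sign((y - y_prev)/(v - v_prev)) stands in for the unknown plant
        # response, and activation derivatives are treated as 1 for simplicity.
        dy_dv = float(np.sign((y - y_prev) / (v - v_prev + 1e-9))) or 1.0
        g_ho = -2.0 * err * dy_dv * self.x_h                           # d(err^2)/d(w_ho)
        g_ho[0] += 2.0 * l2 * self.w_ho[0] * (tp > 0)                  # penalty gradient, only when gated on
        g_ho[2] += 2.0 * l4 * self.w_ho[2] * (td > 0)
        g_iy = -2.0 * err * dy_dv * self.w_ho * y                      # d(err^2)/d(weights on the y input)
        g_iy[0] += 2.0 * l1 * self.w_ih[1, 0] * (tp > 0)
        g_iy[2] += 2.0 * l3 * self.w_ih[1, 2] * (td > 0)
        self.w_ho -= self.lr * g_ho
        self.w_ih[1] -= self.lr * g_iy   # r is treated as constant, so only the y-input weights adapt
        self.prev = (float(s[0]), y, v)
        return J


# Toy usage: close the loop around a crude first-order thermal model.
ctrl, temp, target = PIDNNController(), 55.0, 45.0
for _ in range(200):
    duty, s = ctrl.forward(target, temp)
    ctrl.update(target, temp, duty, s)
    temp += 0.1 * (60.0 - temp) - 1.5 * duty   # ambient heating minus fan cooling
```

The point of the gating is visible in update(): the weight-decay-style penalties, and their gradients, are applied only in sampling periods where a volatility test fires, so steady operation is not penalized and the weights are free to track slow changes in workload or ambient temperature.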
  • FIGs. 3A, 3B variously show features of a controller 300 which adapts weights of a PIDNN based on regularization terms according to an embodiment. More particularly, FIG. 3A illustrates a forward propagation of information in a PIDNN 310 of the controller 300, whereas FIG. 3B shows a view 301 of a backward propagation which updates one or more parameters of the PIDNN 310. Controller 300 illustrates one example embodiment wherein adjustments to neural network weights are variously determined based on a cost term which, in turn, is calculated based on one or more regularization terms. In some embodiments, controller 300 provides functionality such as that of controller 140 –e.g., wherein operations of method 200 are performed with some or all of controller 300.
  • controller 300 comprises a PIDNN 310, an evaluation unit 320, and an adjustment unit 330 which (for example) correspond functionally to PIDNN 144, evaluation unit 146, and adjustment unit 148, respectively.
  • an input layer of PIDNN 310 comprises nodes (or “neurons”) R, Y, wherein a hidden layer of PIDNN 310 comprises nodes P, I, D, and an output layer of PIDNN 310 comprises a node V.
  • controller 300 is implemented with executing software logic –e.g., wherein a given node of PIDNN 310 (or, for example, evaluation unit 320 and/or adjustment unit 330) is implemented by the execution of a subroutine and/or any of various other suitable software processes.
  • some or all of controller 300 is implemented with an application-specific integrated circuit.
  • evaluation unit 320 and adjustment unit 330 are implemented in PIDNN 310, or as part of a larger neural network which includes PIDNN 310.
  • nodes R and Y of PIDNN 310 receive (respectively) a value r which represents a target thermal condition, and a variable y which represents a thermal condition such as one detected by the temperature sensor(s) 126. Based on value r, node R generates variables s11[h], s12[h], s13[h] which correspond to respective weights w11[h], w12[h], w13[h] –e.g., wherein variables s11[h], s12[h], s13[h] are communicated via signals s11, s12, s13, respectively. In one such embodiment, variables s11[h], s12[h], s13[h] are communicated to nodes P, I, and D (respectively) of PIDNN 310.
  • Based on variable y, node Y generates respective variables s21[h], s22[h], s23[h] which correspond to weights w21[h], w22[h], w23[h] (respectively) –e.g., wherein variables s21[h], s22[h], s23[h] are communicated via signals s21, s22, s23, respectively. In one such embodiment, variables s21[h], s22[h], s23[h] are communicated to nodes P, I, and D (respectively) of PIDNN 310.
  • a given node applies a weight to determine a value of a corresponding variable which is to be communicated from that node.
  • node R applies the weight w 11 [h] to the value r (e.g., by multiplying r by w 11 [h] ) to generate variable s 11 [h] .
  • node Y applies the weight w 21 [h] to the variable y (e.g., by multiplying y by w 21 [h] ) to generate variable s 21 [h] .
  • a given node applies a weight to a variable which has been received from another node.
  • variable s 11 [h] is equal to or otherwise represents the value r, wherein node P applies the weight w 11 [h] to the value r by multiplying variable s 11 [h] by the weight w 11 [h] .
  • variable s 21 [h] is equal to or otherwise represents the value y, wherein node P applies the weight w 21 [h] to the value y by multiplying variable s 21 [h] by the weight w 21 [h] .
  • each of variables s12[h], s13[h] is equal to, or otherwise indicates, the value r –e.g., wherein variable s12[h] is equal to a product of r and weight w12[h], and/or wherein variable s13[h] is equal to a product of r and weight w13[h].
  • each of variables s22[h], s23[h] is equal to, or otherwise indicates, the variable y –e.g., wherein variable s22[h] is equal to a product of y and weight w22[h], and/or wherein variable s23[h] is equal to a product of y and weight w23[h].
  • the hidden layer of PIDNN 310 generates respective variables s 11 [o] , s 21 [o] , s 31 [o] which correspond to weights w 11 [o] , w 21 [o] , w 31 [o] (respectively) –i.e., wherein node P generates variable s 11 [o] based on variables s 11 [h] , s 21 [h] , node I generates variable s 21 [o] based on variables s 12 [h] , s 22 [h] , and node D generates variable s 31 [o] based on variables s 13 [h] , s 23 [h] .
  • the node P's activation function mimics a proportional function, as shown in (1) .
  • In these equations, u and x are the input and output of a node, respectively; j indicates the specific jth neuron in each layer; and k represents the sample time.
  • node I’s activation function mimics the integral function, as shown in (2) .
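Equations (1) and (2) themselves are not reproduced in this extract. For reference, the activation functions conventionally used for PIDNN nodes take the form sketched below; this is a hedged reconstruction consistent with the surrounding description, not a copy of the patent's equations, and the derivative-node form (typically numbered (3)) is included only for completeness.

```latex
% Typical PIDNN node activations (reconstruction; outputs assumed saturated to [-1, 1]).
\begin{align}
  x_j(k) &= u_j(k)              && \text{proportional node P} \tag{1}\\
  x_j(k) &= x_j(k-1) + u_j(k)   && \text{integral node I}     \tag{2}\\
  x_j(k) &= u_j(k) - u_j(k-1)   && \text{derivative node D}   \tag{3}
\end{align}
```

Under this convention the output node simply combines the weighted hidden outputs, e.g. \(v(k) = \sum_j w_{j1}^{[o]}\, x_j^{[h]}(k)\), consistent with the description of node V above.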
  • a cost function J includes, is based on, or otherwise corresponds to equation (4):
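Equation (4) itself is not reproduced in this extract. Based on the description elsewhere in the document (an average of cost terms, each the sum of the squared tracking error and the two regularization terms), its general form is plausibly as follows; the exact normalization and sampling window are assumptions.

```latex
% Hedged reconstruction of the cost metric J referenced as equation (4).
\[
  J = \frac{1}{N}\sum_{k=1}^{N}\Big[\bigl(r - y(k)\bigr)^{2} + T_p(k) + T_d(k)\Big]
\]
```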
  • calculation logic 322 determines whether a change (if any) to the variable s11[o] –the change indicated by the product w11[o] · x1[h] –is less than (or equal to, in some embodiments) the threshold vth2.
  • some embodiments comprise calculation logic 324 of evaluation unit 320 conditionally setting the regularization term to be equal to a value which is based on both a square of the weight w 23 [h] and a square of the weight w 31 [o] .
  • calculation logic 324 performs an evaluation of a test criteria which is based on a metric of a volatility of the variable v.
  • calculation logic 324 evaluates whether a spike (if any) of the variable v is detected –e.g., by determining whether a duration of said spike is sufficiently short and/or determining whether a magnitude of said spike is sufficiently large.
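Equations (5) and (6) are likewise not reproduced here. Combining the threshold tests described for calculation logic 322 and 324 with the weight-squared values given in the discussion of method 500, the gated regularization terms plausibly take a form along the lines sketched below; the λ coefficients and the exact pairing of the thresholds vth1–vth4 with each test are assumptions, and only the structure (a penalty applied when the test fires, zero otherwise) follows the surrounding description.

```latex
% Hedged reconstruction of the gated regularization terms referenced as equations (5), (6).
\[
  T_p(k) =
  \begin{cases}
    \lambda_1 \bigl(w_{21}^{[h]}\bigr)^{2} + \lambda_2 \bigl(w_{11}^{[o]}\bigr)^{2},
      & \text{if } s_{11}^{[o]} \text{ drops sharply (e.g., tested against } v_{th2}\text{) while } y \text{ stays stable}\\
    0, & \text{otherwise}
  \end{cases}
\]
\[
  T_d(k) =
  \begin{cases}
    \lambda_3 \bigl(w_{23}^{[h]}\bigr)^{2} + \lambda_4 \bigl(w_{31}^{[o]}\bigr)^{2},
      & \text{if } v \text{ exhibits a short, large spike (e.g., tested against } v_{th3}, v_{th4}\text{)}\\
    0, & \text{otherwise}
  \end{cases}
\]
```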
  • a graph 400 includes various plots to illustrate the control of a fan with a PIDNN, for which weights are updated according to an embodiment.
  • graph 400 includes a first plot which represents a target temperature 410 (e.g., represented by the value r) to be maintained with a fan which is controlled with a PIDNN.
  • Graph 400 further shows a second plot which represents a sensed temperature 420 (e.g., represented by the variable y) of one or more components for which thermal regulation is provided with the fan.
  • graph 400 shows a third plot which represents a contribution 430, by a proportional (P) term, to a control signal which is generated with a PIDNN.
  • graph 400 shows a fourth plot which represents a pulse width modulation (PWM) 440 as determined, for example, by a control signal v which is output by the PIDNN.
  • a point 450 at which contribution 430 is sampled corresponds to a sample time (k − n)
  • another point 452 at which contribution 430 is sampled corresponds to some later sample time (k)
  • the sample points 450, 452 indicate a volatility of the contribution 430.
  • contribution 430 indicates a steep increase in the variable s11[o] at sample point 450 –e.g., the increase indicated by the product w11[o] · x1[h] in equation (5). This steep increase is based on an excessively large value of weight w21[h] and an excessively large value of weight w11[o].
  • Some embodiments thus adjust the respective values of weights w 21 [h] , w 11 [o] when a subsequent steep decrease of contribution 430 is detected.
  • contribution 430 indicates a steep drop in the variable s 11 [o] at sample point 452.
  • This steep drop –e.g., in combination with the steep increase at sample point 450 –is indicative of both the contribution 430 and the variable v (as indicated by PWM 440) having been too high in the near past.
  • some embodiments are variously able to perform a detection, using the threshold parameters v th1 , v th2 , v th3 and v th4 , of over-tuning of weights on the P and D path –e.g., wherein the over-tuning makes a given weight too big or too small.
  • a regularization term for weights w 11 [h] , w 13 [h] is not necessary in the calculation of cost function J –e.g., where the value r is a constant parameter, which results in a gradient of J with respect to w 11 [h] and w 13 [h] always being zero.
  • evaluation of some or all regularization terms is performed independent of the weights w 22 [h] and w 21 [o] on the I node path –e.g., wherein overfitting (if any) over these weights is relatively insignificant.
  • calculation logic 322 and calculation logic 324 communicate the regularization terms Tp, Td (respectively) to calculation logic 326 of evaluation unit 320.
  • calculation logic 326 is further coupled to determine value r and the current value of variable y. Based on regularization terms Tp, Td, value r, and variable y, calculation logic 326 determines a value of the cost function J –e.g., according to equation (4) above.
  • backpropagation includes or is otherwise based on a determination of one or more weight adjustments, which in turn are based on a value of a cost metric J, such as one determined by calculation logic 326 according to equation (4) above.
  • the value of cost metric J is communicated to adjustment unit 330, which is further configured to determine a first set {wab[h]} of some or all of the weights which each correspond to a respective signal communicated between the input layer and the hidden layer of PIDNN 310.
  • adjustment unit 330 further determines a second set {wc1[o]} of some or all of the weights which each correspond to a respective signal communicated between the hidden layer and the output layer of PIDNN 310.
  • adjustment unit 330 determines one or more weight adjustments to minimize or otherwise reduce the amount of the cost metric J.
  • some or all network weights are changed by a modified stochastic gradient descent algorithm (SGD) , wherein respective gradients of the cost metric J, each with respect to a corresponding one of the weights w ih [h] and w ho [o] , are obtained to facilitate a back propagation algorithm.
  • the weights who[o] between the hidden layer and the output layer are updated as follows:
  • weights wih[h] between the input layer and the hidden layer are updated as follows:
  • ηho and ηih are the respective learning rates for the output-layer and the hidden-layer network weights.
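The update formulas referenced just above are not reproduced in this extract. A standard gradient-descent form, consistent with the surrounding text and with the learning rates ηho, ηih, would be:

```latex
% Hedged sketch of the weight updates; the patent's exact partial-derivative
% expressions are not reproduced in this extract.
\[
  w_{ho}^{[o]}(k+1) = w_{ho}^{[o]}(k) - \eta_{ho}\,\frac{\partial J}{\partial w_{ho}^{[o]}},
  \qquad
  w_{ih}^{[h]}(k+1) = w_{ih}^{[h]}(k) - \eta_{ih}\,\frac{\partial J}{\partial w_{ih}^{[h]}}
\]
```

In classic PIDNN training, the plant-response factor ∂y/∂v that appears inside these partial derivatives is commonly approximated by the sign of Δy/Δv, since no explicit thermal model is assumed; whether the patent uses that particular approximation is not stated in this extract.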
  • adjustment unit 330 communicates to PIDNN 310 that an adjustment Δw11[h] is to be made to weight w11[h] based on a gradient of the cost metric J with respect to the weight w11[h].
  • adjustment unit 330 communicates to PIDNN 310 that an adjustment Δw12[h] is to be made to weight w12[h] based on a gradient of the cost metric J with respect to the weight w12[h].
  • adjustment unit 330 communicates to PIDNN 310 that an adjustment Δw13[h] is to be made to weight w13[h] based on a gradient of the cost metric J with respect to the weight w13[h].
  • adjustment unit 330 identifies to controller 300 that an adjustment Δw21[h] is to be made to weight w21[h] based on a gradient of the cost metric J with respect to the weight w21[h], that an adjustment Δw22[h] is to be made to weight w22[h] based on a gradient of the cost metric J with respect to the weight w22[h], and/or that an adjustment Δw23[h] is to be made to weight w23[h] based on a gradient of the cost metric J with respect to the weight w23[h].
  • adjustment unit 330 identifies to controller 300 that an adjustment Δw11[o] is to be made to weight w11[o] based on a gradient of the cost metric J with respect to the weight w11[o], that an adjustment Δw21[o] is to be made to weight w21[o] based on a gradient of the cost metric J with respect to the weight w21[o], and/or that an adjustment Δw31[o] is to be made to weight w31[o] based on a gradient of the cost metric J with respect to the weight w31[o].
  • FIG. 5 shows a method 500 for operating a PIDNN which facilitates thermal regulation according to an embodiment.
  • Operations such as those of method 500 are performed with hardware, firmware and/or executing software which, for example, provides functionality such as that of controller 140, or controller 300.
  • method 500 is performed at least with evaluation unit 146 and adjustment unit 148, in an embodiment –e.g., wherein method 500 includes operations of method 200.
  • Method 500 further comprises (at 516) determining –based on the first evaluation at 514 –whether a first volatility condition is satisfied by the variable s1. Where it is determined at 516 that the first volatility condition is satisfied, method 500 (at 518) sets a regularization term Tp to a first value –e.g., the value (λ1·w21[h]² + λ2·w11[o]²) in equation (5) above –which is based on each of a square of the weight w21 and a square of the weight w1. Where it is instead determined at 516 that the first volatility condition is not satisfied, method 500 (at 520) sets the regularization term Tp to be equal to zero (0), or some other suitable baseline value.
  • Method 500 further comprises (at 522) performing a second evaluation based on a second metric of volatility of the variable v.
  • the second evaluation is to detect for a second volatility condition wherein the variable v exhibits a spiking characteristic during a time when a detected thermal condition, indicated by a variable y, has remained sufficiently stable.
  • Method 500 further comprises (at 524) determining –based on the second evaluation at 522 –whether variable v satisfies a second volatility condition while the temperature variable y satisfies a stability condition.
  • method 500 sets the regularization term Td to a second value –e.g., the value (λ3·w23[h]² + λ4·w31[o]²) in equation (6) above –which is based on each of a square of the weight w23 and a square of the weight w3.
  • Where it is instead determined at 524 that the second volatility condition is not satisfied, method 500 sets the regularization term Td to be equal to zero (0) or some other suitable baseline value.
  • Method 500 further comprises (at 530) calculating a value of a cost metric J based on each of the regularization terms Tp, Td –e.g., wherein said calculating is similar to that indicated by equation (4) above.
  • the calculating at 530 comprises calculating an average of cost sample terms which each comprise a sum of a respective value of the regularization term Tp, a respective value of the regularization term Td, and a square of a difference between the value r and a respective value of the variable y.
  • Method 500 further comprises (at 532) adjusting one or more weights of the PIDNN, wherein each such weight w x is adjusted based on a respective gradient of the cost metric J.
  • method 500 performs a next instance of the determining at 510 –e.g., as part of a next cycle of multiple successive weight adjustment cycles.
  • FIG. 6 illustrates an exemplary system.
  • Multiprocessor system 600 is a point-to-point interconnect system and includes a plurality of processors including a first processor 670 and a second processor 680 coupled via a point-to-point interconnect 650.
  • the first processor 670 and the second processor 680 are homogeneous.
  • first processor 670 and the second processor 680 are heterogenous.
  • While the exemplary system 600 is shown to have two processors, the system may have three or more processors, or may be a single processor system.
  • Processors 670 and 680 are shown including integrated memory controller (IMC) circuitry 672 and 682, respectively.
  • Processor 670 also includes, as part of its interconnect controller, point-to-point (P-P) interfaces 676 and 678; similarly, second processor 680 includes P-P interfaces 686 and 688.
  • Processors 670, 680 may exchange information via the point-to-point (P-P) interconnect 650 using P-P interface circuits 678, 688.
  • IMCs 672 and 682 couple the processors 670, 680 to respective memories, namely a memory 632 and a memory 634, which may be portions of main memory locally attached to the respective processors.
  • Processors 670, 680 may each exchange information with a chipset 690 via individual P-P interconnects 652, 654 using point to point interface circuits 676, 694, 686, 698.
  • Chipset 690 may optionally exchange information with a coprocessor 638 via an interface 692.
  • the coprocessor 638 is a special-purpose processor, such as, for example, a high-throughput processor, a network or communication processor, compression engine, graphics processor, general purpose graphics processing unit (GPGPU) , neural-network processing unit (NPU) , embedded processor, or the like.
  • a shared cache (not shown) may be included in either processor 670, 680 or outside of both processors, yet connected with the processors via P-P interconnect, such that either or both processors’ local cache information may be stored in the shared cache if a processor is placed into a low power mode.
  • first interconnect 616 may be a Peripheral Component Interconnect (PCI) interconnect, or an interconnect such as a PCI Express interconnect or another I/O interconnect.
  • one of the interconnects couples to a power control unit (PCU) 617, which may include circuitry, software, and/or firmware to perform power management operations with regard to the processors 670, 680 and/or co-processor 638.
  • PCU 617 provides control information to a voltage regulator (not shown) to cause the voltage regulator to generate the appropriate regulated voltage.
  • PCU 617 also provides control information to control the operating voltage generated.
  • PCU 617 may include a variety of power management logic units (circuitry) to perform hardware-based power management. Such power management may be wholly processor controlled (e.g., by various processor hardware, and which may be triggered by workload and/or power, thermal or other processor constraints) and/or the power management may be performed responsive to external sources (such as a platform or power management source or system software) .
  • PCU 617 is illustrated as being present as logic separate from the processor 670 and/or processor 680. In other cases, PCU 617 may execute on a given one or more of cores (not shown) of processor 670 or 680. In some cases, PCU 617 may be implemented as a microcontroller (dedicated or general-purpose) or other control logic configured to execute its own dedicated power management code, sometimes referred to as P-code. In yet other examples, power management operations to be performed by PCU 617 may be implemented externally to a processor, such as by way of a separate power management integrated circuit (PMIC) or another component external to the processor. In yet other examples, power management operations to be performed by PCU 617 may be implemented within BIOS or other system software.
  • Various I/O devices 614 may be coupled to first interconnect 616, along with a bus bridge 618 which couples first interconnect 616 to a second interconnect 620.
  • one or more additional processor (s) 615 such as coprocessors, high-throughput many integrated core (MIC) processors, GPGPUs, accelerators (such as graphics accelerators or digital signal processing (DSP) units) , field programmable gate arrays (FPGAs) , or any other processor, are coupled to first interconnect 616.
  • second interconnect 620 may be a low pin count (LPC) interconnect.
  • Various devices may be coupled to second interconnect 620 including, for example, a keyboard and/or mouse 622, communication devices 627 and storage circuitry 628.
  • Storage circuitry 628 may be one or more non-transitory machine-readable storage media as described below, such as a disk drive or other mass storage device which may include instructions/code and data 630 in some examples.
  • an audio I/O 624 may be coupled to second interconnect 620.
  • a system such as multiprocessor system 600 may implement a multi-drop interconnect or other such architecture.
  • Processor cores may be implemented in different ways, for different purposes, and in different processors.
  • implementations of such cores may include: 1) a general purpose in-order core intended for general-purpose computing; 2) a high-performance general purpose out-of-order core intended for general-purpose computing; 3) a special purpose core intended primarily for graphics and/or scientific (throughput) computing.
  • Implementations of different processors may include: 1) a CPU including one or more general purpose in-order cores intended for general-purpose computing and/or one or more general purpose out-of-order cores intended for general-purpose computing; and 2) a coprocessor including one or more special purpose cores intended primarily for graphics and/or scientific (throughput) computing.
  • Such different processors lead to different computer system architectures, which may include: 1) the coprocessor on a separate chip from the CPU; 2) the coprocessor on a separate die in the same package as a CPU; 3) the coprocessor on the same die as a CPU (in which case, such a coprocessor is sometimes referred to as special purpose logic, such as integrated graphics and/or scientific (throughput) logic, or as special purpose cores) ; and 4) a system on a chip (SoC) that may include on the same die as the described CPU (sometimes referred to as the application core (s) or application processor (s) ) , the above described coprocessor, and additional functionality.
  • FIG. 7 illustrates a block diagram of an example processor 700 that may have more than one core and an integrated memory controller.
  • the solid lined boxes illustrate a processor 700 with a single core 702A, a system agent unit circuitry 710, a set of one or more interconnect controller unit (s) circuitry 716, while the optional addition of the dashed lined boxes illustrates an alternative processor 700 with multiple cores 702A-N, a set of one or more integrated memory controller unit (s) circuitry 714 in the system agent unit circuitry 710, and special purpose logic 708, as well as a set of one or more interconnect controller units circuitry 716.
  • the processor 700 may be one of the processors 670 or 680, or co-processor 638 or 615 of FIG. 6.
  • different implementations of the processor 700 may include: 1) a CPU with the special purpose logic 708 being integrated graphics and/or scientific (throughput) logic (which may include one or more cores, not shown) , and the cores 702A-N being one or more general purpose cores (e.g., general purpose in-order cores, general purpose out-of-order cores, or a combination of the two) ; 2) a coprocessor with the cores 702A-N being a large number of special purpose cores intended primarily for graphics and/or scientific (throughput) ; and 3) a coprocessor with the cores 702A-N being a large number of general purpose in-order cores.
  • the processor 700 may be a general-purpose processor, coprocessor or special-purpose processor, such as, for example, a network or communication processor, compression engine, graphics processor, GPGPU (general purpose graphics processing unit circuitry) , a high-throughput many integrated core (MIC) coprocessor (including 30 or more cores) , embedded processor, or the like.
  • the processor may be implemented on one or more chips.
  • the processor 700 may be a part of and/or may be implemented on one or more substrates using any of a number of process technologies, such as, for example, complementary metal oxide semiconductor (CMOS) , bipolar CMOS (BiCMOS) , P-type metal oxide semiconductor (PMOS) , or N-type metal oxide semiconductor (NMOS) .
  • a memory hierarchy includes one or more levels of cache unit (s) circuitry 704A-N within the cores 702A-N, a set of one or more shared cache unit (s) circuitry 706, and external memory (not shown) coupled to the set of integrated memory controller unit (s) circuitry 714.
  • the set of one or more shared cache unit (s) circuitry 706 may include one or more mid-level caches, such as level 2 (L2) , level 3 (L3) , level 4 (L4) , or other levels of cache, such as a last level cache (LLC) , and/or combinations thereof.
  • ring-based interconnect network circuitry 712 interconnects the special purpose logic 708 (e.g., integrated graphics logic), the set of shared cache unit(s) circuitry 706, and the system agent unit circuitry 710; alternative examples use any number of well-known techniques for interconnecting such units.
  • coherency is maintained between one or more of the shared cache unit (s) circuitry 706 and cores 702A-N.
  • the system agent unit circuitry 710 includes those components coordinating and operating cores 702A-N.
  • the system agent unit circuitry 710 may include, for example, power control unit (PCU) circuitry and/or display unit circuitry (not shown) .
  • the PCU may be or may include logic and components needed for regulating the power state of the cores 702A-N and/or the special purpose logic 708 (e.g., integrated graphics logic) .
  • the display unit circuitry is for driving one or more externally connected displays.
  • the cores 702A-N may be homogenous in terms of instruction set architecture (ISA) .
  • the cores 702A-N may be heterogeneous in terms of ISA; that is, a subset of the cores 702A-N may be capable of executing an ISA, while other cores may be capable of executing only a subset of that ISA or another ISA.
  • FIG. 8A is a block diagram illustrating both an exemplary in-order pipeline and an exemplary register renaming, out-of-order issue/execution pipeline according to examples.
  • FIG. 8B is a block diagram illustrating both an example of an in-order architecture core and an exemplary register renaming, out-of-order issue/execution architecture core to be included in a processor according to examples.
  • the solid lined boxes in FIGS. 8A-B illustrate the in-order pipeline and in-order core, while the optional addition of the dashed lined boxes illustrates the register renaming, out-of-order issue/execution pipeline and core. Given that the in-order aspect is a subset of the out-of-order aspect, the out-of-order aspect will be described.
  • a processor pipeline 800 includes a fetch stage 802, an optional length decoding stage 804, a decode stage 806, an optional allocation (Alloc) stage 808, an optional renaming stage 810, a schedule (also known as a dispatch or issue) stage 812, an optional register read/memory read stage 814, an execute stage 816, a write back/memory write stage 818, an optional exception handling stage 822, and an optional commit stage 824.
  • One or more operations can be performed in each of these processor pipeline stages.
  • during the fetch stage 802, one or more instructions are fetched from instruction memory, and during the decode stage 806, the one or more fetched instructions may be decoded, addresses (e.g., load store unit (LSU) addresses) using forwarded register ports may be generated, and branch forwarding (e.g., immediate offset or a link register (LR) ) may be performed.
  • the decode stage 806 and the register read/memory read stage 814 may be combined into one pipeline stage.
  • during the execute stage 816, the decoded instructions may be executed, LSU address/data pipelining to an Advanced Microcontroller Bus (AMB) interface may be performed, multiply and add operations may be performed, arithmetic operations with branch results may be performed, etc.
  • the exemplary register renaming, out-of-order issue/execution architecture core of FIG. 8B may implement the pipeline 800 as follows: 1) the instruction fetch circuitry 838 performs the fetch and length decoding stages 802 and 804; 2) the decode circuitry 840 performs the decode stage 806; 3) the rename/allocator unit circuitry 852 performs the allocation stage 808 and renaming stage 810; 4) the scheduler (s) circuitry 856 performs the schedule stage 812; 5) the physical register file (s) circuitry 858 and the memory unit circuitry 870 perform the register read/memory read stage 814; 6) the execution cluster (s) 860 perform the execute stage 816; 7) the memory unit circuitry 870 and the physical register file (s) circuitry 858 perform the write back/memory write stage 818; 8) various circuitry may be involved in the exception handling stage 822; and 9) the retirement unit circuitry 854 and the physical register file (s) circuitry 858 perform the commit stage 824.
  • FIG. 8B shows a processor core 890 including front-end unit circuitry 830 coupled to an execution engine unit circuitry 850, and both are coupled to a memory unit circuitry 870.
  • the core 890 may be a reduced instruction set architecture computing (RISC) core, a complex instruction set architecture computing (CISC) core, a very long instruction word (VLIW) core, or a hybrid or alternative core type.
  • the core 890 may be a special-purpose core, such as, for example, a network or communication core, compression engine, coprocessor core, general purpose computing graphics processing unit (GPGPU) core, graphics core, or the like.
  • the front end unit circuitry 830 may include branch prediction circuitry 832 coupled to an instruction cache circuitry 834, which is coupled to an instruction translation lookaside buffer (TLB) 836, which is coupled to instruction fetch circuitry 838, which is coupled to decode circuitry 840.
  • in some examples, the instruction cache circuitry 834 is included in the memory unit circuitry 870 rather than the front-end circuitry 830.
  • the decode circuitry 840 (or decoder) may decode instructions, and generate as an output one or more micro-operations, micro-code entry points, microinstructions, other instructions, or other control signals, which are decoded from, or which otherwise reflect, or are derived from, the original instructions.
  • the decode circuitry 840 may further include an address generation unit (AGU, not shown) circuitry.
  • the AGU generates an LSU address using forwarded register ports, and may further perform branch forwarding (e.g., immediate offset branch forwarding, LR register branch forwarding, etc. ) .
  • the decode circuitry 840 may be implemented using various different mechanisms. Examples of suitable mechanisms include, but are not limited to, look-up tables, hardware implementations, programmable logic arrays (PLAs) , microcode read only memories (ROMs) , etc.
  • the core 890 includes a microcode ROM (not shown) or other medium that stores microcode for certain macroinstructions (e.g., in decode circuitry 840 or otherwise within the front end circuitry 830) .
  • the decode circuitry 840 includes a micro-operation (micro-op) or operation cache (not shown) to hold/cache decoded operations, micro-tags, or micro-operations generated during the decode or other stages of the processor pipeline 800.
  • the decode circuitry 840 may be coupled to rename/allocator unit circuitry 852 in the execution engine circuitry 850.
  • the execution engine circuitry 850 includes the rename/allocator unit circuitry 852 coupled to a retirement unit circuitry 854 and a set of one or more scheduler (s) circuitry 856.
  • the scheduler (s) circuitry 856 represents any number of different schedulers, including reservation stations, a central instruction window, etc.
  • the scheduler (s) circuitry 856 can include arithmetic logic unit (ALU) scheduler/scheduling circuitry, ALU queues, address generation unit (AGU) scheduler/scheduling circuitry, AGU queues, etc.
  • the scheduler (s) circuitry 856 is coupled to the physical register file (s) circuitry 858.
  • Each of the physical register file (s) circuitry 858 represents one or more physical register files, different ones of which store one or more different data types, such as scalar integer, scalar floating-point, packed integer, packed floating-point, vector integer, vector floating-point, status (e.g., an instruction pointer that is the address of the next instruction to be executed) , etc.
  • the physical register file (s) circuitry 858 includes vector registers unit circuitry, writemask registers unit circuitry, and scalar register unit circuitry. These register units may provide architectural vector registers, vector mask registers, general-purpose registers, etc.
  • the physical register file (s) circuitry 858 is coupled to the retirement unit circuitry 854 (also known as a retire queue or a retirement queue) to illustrate various ways in which register renaming and out-of-order execution may be implemented (e.g., using a reorder buffer (s) (ROB (s) ) and a retirement register file (s) ; using a future file (s) , a history buffer (s) , and a retirement register file (s) ; using register maps and a pool of registers; etc. ) .
  • the retirement unit circuitry 854 and the physical register file (s) circuitry 858 are coupled to the execution cluster (s) 860.
  • the execution cluster (s) 860 includes a set of one or more execution unit (s) circuitry 862 and a set of one or more memory access circuitry 864.
  • the execution unit (s) circuitry 862 may perform various arithmetic, logic, floating-point or other types of operations (e.g., shifts, addition, subtraction, multiplication) and on various types of data (e.g., scalar integer, scalar floating-point, packed integer, packed floating-point, vector integer, vector floating-point) . While some examples may include a number of execution units or execution unit circuitry dedicated to specific functions or sets of functions, other examples may include only one execution unit circuitry or multiple execution units/execution unit circuitry that all perform all functions.
  • the scheduler (s) circuitry 856, physical register file (s) circuitry 858, and execution cluster (s) 860 are shown as being possibly plural because certain examples create separate pipelines for certain types of data/operations (e.g., a scalar integer pipeline, a scalar floating-point/packed integer/packed floating-point/vector integer/vector floating-point pipeline, and/or a memory access pipeline that each have their own scheduler circuitry, physical register file (s) circuitry, and/or execution cluster; in the case of a separate memory access pipeline, certain examples are implemented in which only the execution cluster of this pipeline has the memory access circuitry 864) . It should also be understood that where separate pipelines are used, one or more of these pipelines may be out-of-order issue/execution and the rest in-order.
  • the execution engine unit circuitry 850 may perform load store unit (LSU) address/data pipelining to an Advanced Microcontroller Bus (AMB) interface (not shown) , as well as address-phase and write-back operations, data-phase loads, stores, and branches.
  • the set of memory access circuitry 864 is coupled to the memory unit circuitry 870, which includes data TLB circuitry 872 coupled to a data cache circuitry 874 coupled to a level 2 (L2) cache circuitry 876.
  • the memory access circuitry 864 may include a load unit circuitry, a store address unit circuit, and a store data unit circuitry, each of which is coupled to the data TLB circuitry 872 in the memory unit circuitry 870.
  • the instruction cache circuitry 834 is further coupled to the level 2 (L2) cache circuitry 876 in the memory unit circuitry 870.
  • the instruction cache 834 and the data cache 874 are combined into a single instruction and data cache (not shown) in L2 cache circuitry 876, a level 3 (L3) cache circuitry (not shown) , and/or main memory.
  • L2 cache circuitry 876 is coupled to one or more other levels of cache and eventually to a main memory.
  • the core 890 may support one or more instruction sets (e.g., the x86 instruction set architecture (optionally with some extensions that have been added with newer versions) ; the MIPS instruction set architecture; the ARM instruction set architecture (optionally with additional extensions such as NEON) ) , including the instruction (s) described herein.
  • the core 890 includes logic to support a packed data instruction set architecture extension (e.g., AVX1, AVX2) , thereby allowing the operations used by many multimedia applications to be performed using packed data.
  • FIG. 9 illustrates examples of execution unit (s) circuitry, such as execution unit (s) circuitry 862 of FIG. 8B.
  • execution unit (s) circuitry 862 may include one or more ALU circuits 901, optional vector/single instruction multiple data (SIMD) circuits 903, load/store circuits 905, branch/jump circuits 907, and/or floating-point unit (FPU) circuits 909.
  • ALU circuits 901 perform integer arithmetic and/or Boolean operations.
  • Vector/SIMD circuits 903 perform vector/SIMD operations on packed data (such as SIMD/vector registers) .
  • Load/store circuits 905 execute load and store instructions to load data from memory into registers or store from registers to memory.
  • Load/store circuits 905 may also generate addresses.
  • Branch/jump circuits 907 cause a branch or jump to a memory address depending on the instruction.
  • FPU circuits 909 perform floating-point arithmetic.
  • the width of the execution unit (s) circuitry 862 varies depending upon the example and can range from 16-bit to 1,024-bit, for example. In some examples, two or more smaller execution units are logically combined to form a larger execution unit (e.g., two 128-bit execution units are logically combined to form a 256-bit execution unit) .
  • FIG. 10 is a block diagram of a register architecture 1000 according to some examples.
  • the register architecture 1000 includes vector/SIMD registers 1010 that vary from 128 bits to 1,024 bits in width.
  • the vector/SIMD registers 1010 are physically 512 bits wide and, depending upon the mapping, only some of the lower bits are used.
  • the vector/SIMD registers 1010 are ZMM registers which are 512 bits: the lower 256 bits are used for YMM registers and the lower 128 bits are used for XMM registers. As such, there is an overlay of registers.
  • a vector length field selects between a maximum length and one or more other shorter lengths, where each such shorter length is half the length of the preceding length.
  • Scalar operations are operations performed on the lowest order data element position in a ZMM/YMM/XMM register; the higher order data element positions are either left the same as they were prior to the instruction or zeroed depending on the example.
  • Segment registers 1020 contain segment pointers for use in accessing memory. In some examples, these registers are referenced by the names CS, DS, SS, ES, FS, and GS.
  • Machine specific registers (MSRs) 1035 control and report on processor performance. Most MSRs 1035 handle system-related functions and are not accessible to an application program. Machine check registers 1060 consist of control, status, and error reporting MSRs that are used to detect and report on hardware errors.
  • One or more instruction pointer register (s) 1030 store an instruction pointer value.
  • Control register (s) 1055 (e.g., CR0-CR4) determine the operating mode of a processor (e.g., processor 670, 680, 638, 615, and/or 700) .
  • Debug registers 1050 control and allow for the monitoring of a processor or core’s debugging operations.
  • Memory (mem) management registers 1065 specify the locations of data structures used in protected mode memory management. These registers may include a GDTR, an IDTR, a task register, and an LDTR register.
  • the register architecture 1000 may, for example, be used in physical register file (s) circuitry 858.
  • signals are represented with lines. Some lines may be thicker, to indicate a greater number of constituent signal paths, and/or have arrows at one or more ends, to indicate a direction of information flow. Such indications are not intended to be limiting. Rather, the lines are used in connection with one or more exemplary embodiments to facilitate easier understanding of a circuit or a logical unit. Any represented signal, as dictated by design needs or preferences, may actually comprise one or more signals that may travel in either direction and may be implemented with any suitable type of signal scheme.
  • connection means a direct connection, such as electrical, mechanical, or magnetic connection between the things that are connected, without any intermediary devices.
  • coupled means a direct or indirect connection, such as a direct electrical, mechanical, or magnetic connection between the things that are connected or an indirect connection, through one or more passive or active intermediary devices.
  • “circuit” or “module” may refer to one or more passive and/or active components that are arranged to cooperate with one another to provide a desired function.
  • “signal” may refer to at least one current signal, voltage signal, magnetic signal, or data/clock signal.
  • the meaning of “a,” “an,” and “the” includes plural references.
  • the meaning of “in” includes “in” and “on.”
  • a device may generally refer to an apparatus according to the context of the usage of that term.
  • a device may refer to a stack of layers or structures, a single structure or layer, a connection of various structures having active and/or passive elements, etc.
  • a device is a three-dimensional structure with a plane along the x-y direction and a height along the z direction of an x-y-z Cartesian coordinate system.
  • the plane of the device may also be the plane of an apparatus which comprises the device.
  • scaling generally refers to converting a design (schematic and layout) from one process technology to another process technology and subsequently being reduced in layout area.
  • scaling generally also refers to downsizing layout and devices within the same technology node.
  • scaling may also refer to adjusting a signal frequency relative to another parameter, for example, a power supply level (e.g., slowing the frequency down, i.e., scaling down, or speeding it up, i.e., scaling up) .
  • the terms “substantially,” “close,” “approximately,” “near,” and “about” generally refer to being within +/-10% of a target value.
  • the terms “substantially equal,” “about equal” and “approximately equal” mean that there is no more than incidental variation between or among things so described. In the art, such variation is typically no more than +/-10% of a predetermined target value.
  • a first material “over” a second material in the context of a figure provided herein may also be “under” the second material if the device is oriented upside-down relative to the context of the figure provided.
  • one material disposed over or under another may be directly in contact or may have one or more intervening materials.
  • one material disposed between two materials may be directly in contact with the two layers or may have one or more intervening layers.
  • a first material “on” a second material is in direct contact with that second material. Similar distinctions are to be made in the context of component assemblies.
  • a list of items joined by the term “at least one of” or “one or more of” can mean any combination of the listed terms.
  • the phrase “at least one of A, B or C” can mean A; B; C; A and B; A and C; B and C; or A, B and C. It is pointed out that those elements of a figure having the same reference numbers (or names) as the elements of any other figure can operate or function in any manner similar to that described, but are not limited to such.
  • combinatorial logic and sequential logic discussed in the present disclosure may pertain both to physical structures (such as AND gates, OR gates, or XOR gates) , or to synthesized or otherwise optimized collections of devices implementing the logical structures that are Boolean equivalents of the logic under discussion.
  • This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer.
  • a computer program may be stored in a machine-readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs) , random access memories (RAMs) such as dynamic RAM (DRAM) , EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and coupled to a computer system bus.
  • one or more non-transitory machine-readable storage media having stored thereon instructions which, when executed by one or more processing units, cause the one or more processing units to perform a method comprising determining a value of a control variable v which is calculated, by an output node V of a proportional-integral-derivative neural network (PIDNN) , based on each of a variable s1 which represents a proportional term, and a variable s3 which represents a derivative term, determining weights w1, w3, w21, w23 corresponding to respective variables s1, s3, s21, s23, wherein an input layer of the PIDNN calculates respective values of the variables s21, s23 based on a variable y which indicates a detected thermal condition, and wherein a hidden layer of the PIDNN calculates respective values of variables s1, s3 based on variables s21, s23, performing a first evaluation based on a first metric of a
  • the method further comprises operating a fan based on the value of the control variable v.
  • the first value is based on both a square of the weight w21 and a square of the weight w1.
  • the second value is based on both a square of the weight w23 and a square of the weight w3.
  • the first evaluation is further based on a third metric of a stability of the variable y.
  • calculating the value of the cost metric J is further based on a difference between the variable y and a value r which indicates a target thermal condition.
  • calculating the value of the cost metric J comprises calculating an average of cost sample terms which each comprise a sum of a respective value of the regularization term Tp, a respective value of the regularization term Td, and a square of a difference between the value r and a respective value of the variable y.
  • the method further comprises performing a third evaluation to detect whether a change to the variable s1 is less than a first threshold amount, performing a fourth evaluation to detect whether a magnitude of a change to the variable y is less than a second threshold amount, and based on the third evaluation and the fourth evaluation, selecting between setting the regularization term Tp to be equal to the first value and setting the regularization term Tp to be equal to a baseline value.
  • the second circuitry to calculate the value of the cost metric J comprises the second circuitry to calculate an average of cost sample terms which each comprise a sum of a respective value of the regularization term Tp, a respective value of the regularization term Td, and a square of a difference between the value r and a respective value of the variable y.
  • the first circuitry is further to perform a first evaluation to detect whether a duration of a spike of the variable v is less than a first threshold amount, perform a second evaluation to detect whether a magnitude of the spike is greater than a second threshold amount, and based on the first evaluation and the second evaluation, select between setting the regularization term Td to be equal to the second value and setting the regularization term Td to be equal to a baseline value.
  • the third circuitry to adjust the one or more weights of the PIDNN based on the cost metric J comprises the third circuitry to reduce a first weight based on a gradient of the cost metric J with respect to the first weight.
  • a method comprises determining a value of a control variable v which is calculated, by an output node V of a proportional-integral-derivative neural network (PIDNN) , based on each of a variable s1 which represents a proportional term, and a variable s3 which represents a derivative term, determining weights w1, w3, w21, w23 corresponding to respective variables s1, s3, s21, s23, wherein an input layer of the PIDNN calculates respective values of the variables s21, s23 based on a variable y which indicates a detected thermal condition, and wherein a hidden layer of the PIDNN calculates respective values of variables s1, s3 based on variables s21, s23, performing a first evaluation based on a first metric of a volatility of the variable s1, based on the first evaluation, setting a regularization term Tp to be equal to a first value which is based on each
  • the method further comprises operating a fan based on the value of the control variable v.
  • the first value is based on both a square of the weight w21 and a square of the weight w1.
  • the second value is based on both a square of the weight w23 and a square of the weight w3.
  • the first evaluation is further based on a third metric of a stability of the variable y.
  • calculating the value of the cost metric J is further based on a difference between the variable y and a value r which indicates a target thermal condition.
  • calculating the value of the cost metric J comprises calculating an average of cost sample terms which each comprise a sum of a respective value of the regularization term Tp, a respective value of the regularization term Td, and a square of a difference between the value r and a respective value of the variable y.
  • the method further comprises performing a third evaluation to detect whether a change to the variable s1 is less than a first threshold amount, performing a fourth evaluation to detect whether a magnitude of a change to the variable y is less than a second threshold amount, and based on the third evaluation and the fourth evaluation, selecting between setting the regularization term Tp to be equal to the first value and setting the regularization term Tp to be equal to a baseline value.
  • the method further comprises performing a third evaluation to detect whether a duration of a spike of the variable v is less than a first threshold amount, performing a fourth evaluation to detect whether a magnitude of the spike is greater than a second threshold amount, and based on the third evaluation and the fourth evaluation, selecting between setting the regularization term Td to be equal to the second value and setting the regularization term Td to be equal to a baseline value.
  • adjusting the one or more weights of the PIDNN based on the value of the cost metric J comprises reducing a first weight based on a gradient of the cost metric J with respect to the first weight (a minimal illustrative sketch of this overall flow follows this list) .
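
To make the claim language above more concrete, the following is a minimal Python sketch of one possible reading of it: a PIDNN forward pass that derives s21 and s23 from the detected thermal condition y, produces the proportional and derivative terms s1 and s3, and combines them into the control variable v; volatility- and spike-gated selection of the regularization terms Tp and Td; the cost metric J as an average of per-sample terms; and a gradient step that reduces the weights contributing to an active regularization term. The class, method, and parameter names (AdaptivePIDNN, reg_terms, lam, eta, the threshold values), the concrete spike test, and the decision to omit the plant-dependent part of the gradient are all assumptions of this sketch, not the actual implementation described in the application.

```python
import numpy as np


class AdaptivePIDNN:
    """Minimal sketch of a PID neural network with spike/volatility-gated
    regularization of its weights.  Only the roles of s21, s23, s1, s3, v,
    Tp, Td and J follow the claim language above; everything else is an
    illustrative assumption."""

    def __init__(self, w21=1.0, w23=1.0, w1=0.6, w3=0.1,
                 tp_baseline=0.0, td_baseline=0.0, lam=1e-3, eta=1e-2):
        self.w21, self.w23 = w21, w23        # input-layer weights
        self.w1, self.w3 = w1, w3            # hidden-layer weights
        self.tp_baseline = tp_baseline
        self.td_baseline = td_baseline
        self.lam = lam                       # regularization scale (assumed)
        self.eta = eta                       # learning rate (assumed)
        self.prev_y = None
        self.prev_s1 = None

    def forward(self, y, r):
        """Input layer -> s21, s23; hidden layer -> s1 (proportional term) and
        s3 (derivative term); output node V -> control variable v.  Signs are
        chosen so a hotter-than-target y raises the control output."""
        err = y - r
        d_y = 0.0 if self.prev_y is None else y - self.prev_y
        s21 = self.w21 * err                 # input-layer value on the P path
        s23 = self.w23 * d_y                 # input-layer value on the D path
        s1 = self.w1 * s21                   # proportional term
        s3 = self.w3 * s23                   # derivative term
        v = s1 + s3                          # e.g., a fan duty command
        self.prev_y = y
        return v, s1, s3, d_y

    def reg_terms(self, s1, d_y, v_trace,
                  ds1_max=1e-3, dy_max=1e-3, spike_len=3, spike_mag=0.2):
        """Tp switches to a value built from w21^2 and w1^2 when s1 changes
        little while y is stable; Td switches to a value built from w23^2 and
        w3^2 when v shows a short, large spike.  Otherwise each term falls
        back to its baseline.  Threshold values are illustrative only."""
        ds1 = 0.0 if self.prev_s1 is None else abs(s1 - self.prev_s1)
        self.prev_s1 = s1
        if ds1 < ds1_max and abs(d_y) < dy_max:
            tp = self.lam * (self.w21 ** 2 + self.w1 ** 2)
        else:
            tp = self.tp_baseline
        if self._short_large_spike(v_trace, spike_len, spike_mag):
            td = self.lam * (self.w23 ** 2 + self.w3 ** 2)
        else:
            td = self.td_baseline
        return tp, td

    @staticmethod
    def _short_large_spike(v_trace, max_len, min_mag):
        """Crude spike test: the last few samples of v deviate from the earlier
        running mean by at least min_mag within a window of max_len samples."""
        if len(v_trace) <= max_len:
            return False
        base = float(np.mean(v_trace[:-max_len]))
        recent = np.asarray(v_trace[-max_len:], dtype=float)
        return bool(np.max(np.abs(recent - base)) >= min_mag)

    @staticmethod
    def cost(samples):
        """J = average of per-sample terms (r - y)^2 + Tp + Td."""
        return float(np.mean([(r - y) ** 2 + tp + td for r, y, tp, td in samples]))

    def adapt_regularization(self, tp_active, td_active):
        """Reduce each weight along dJ/dw for the regularization part of J only;
        the tracking-error part of the gradient needs a plant model (or the
        usual PIDNN back-propagation heuristics) and is omitted here."""
        if tp_active:
            self.w21 -= self.eta * 2.0 * self.lam * self.w21   # d(Tp)/d(w21)
            self.w1 -= self.eta * 2.0 * self.lam * self.w1     # d(Tp)/d(w1)
        if td_active:
            self.w23 -= self.eta * 2.0 * self.lam * self.w23   # d(Td)/d(w23)
            self.w3 -= self.eta * 2.0 * self.lam * self.w3     # d(Td)/d(w3)
```

A short closed-loop usage example of this class is given after the Abstract below.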

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Feedback Control In General (AREA)

Abstract

Techniques and mechanisms are described for dynamically updating weights of a proportional-integral-derivative neural network (PIDNN). In one embodiment, an output node of the PIDNN generates a control signal based on a proportional term and a derivative term that are each provided to the output node. Regularization terms are calculated based on weights which each variously correspond to a respective one of the proportional term and the derivative term. A value of a cost metric is calculated based on each of the regularization terms, and one or more PIDNN weights are adjusted based on the cost metric. In another embodiment, the control signal is provided to operate one or more fans which provide thermal regulation of a computing device.
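
As a usage illustration only: the toy loop below drives a crude first-order thermal plant with the AdaptivePIDNN sketch given after the claim summaries above, clips the control variable v to a fan duty cycle, and periodically applies the regularization-only gradient step. The plant constants, noise level, target temperature, and batch size are arbitrary assumptions for demonstration, not values from the application.

```python
import numpy as np

# Assumes the AdaptivePIDNN sketch defined earlier is in scope.
def run_demo(steps=300, r=60.0, seed=0):
    rng = np.random.default_rng(seed)
    ctrl = AdaptivePIDNN()
    y, v_trace, samples = 75.0, [], []
    for _ in range(steps):
        v, s1, s3, d_y = ctrl.forward(y, r)
        v = float(np.clip(v, 0.0, 1.0))            # fan duty cycle in [0, 1]
        v_trace.append(v)
        tp, td = ctrl.reg_terms(s1, d_y, v_trace)
        samples.append((r, y, tp, td))
        # Crude thermal plant: constant heat load offset by fan-dependent cooling.
        y += 0.5 - 2.0 * v + rng.normal(0.0, 0.05)
        if len(samples) >= 20:                     # small batch of cost samples
            ctrl.adapt_regularization(tp_active=tp > 0.0, td_active=td > 0.0)
            samples.clear()
    return AdaptivePIDNN.cost([(r, y, 0.0, 0.0)])  # final squared tracking error


if __name__ == "__main__":
    print(f"final (r - y)^2 after demo run: {run_demo():.3f}")
```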
PCT/CN2024/082472 2024-03-19 2024-03-19 Adaptation de poids d'un réseau neuronal proportionnel-intégral-dérivé pour faciliter la régulation thermique Pending WO2025194342A1 (fr)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2024/082472 WO2025194342A1 (fr) 2024-03-19 2024-03-19 Adaptation de poids d'un réseau neuronal proportionnel-intégral-dérivé pour faciliter la régulation thermique

Publications (1)

Publication Number Publication Date
WO2025194342A1 true WO2025194342A1 (fr) 2025-09-25

Family

ID=97138313

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/082472 Pending WO2025194342A1 (fr) 2024-03-19 2024-03-19 Adaptation de poids d'un réseau neuronal proportionnel-intégral-dérivé pour faciliter la régulation thermique

Country Status (1)

Country Link
WO (1) WO2025194342A1 (fr)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110663049A (zh) * 2017-04-28 2020-01-07 谷歌有限责任公司 神经网络优化器搜索
US10699190B1 (en) * 2018-03-04 2020-06-30 Facebook, Inc. Systems and methods for efficiently updating neural networks
US20210397964A1 (en) * 2020-06-16 2021-12-23 California Institute Of Technology Resilience determination and damage recovery in neural networks
US20220138564A1 (en) * 2020-10-30 2022-05-05 Graphcore Limited Batch Processing in a Machine Learning Computer

Similar Documents

Publication Publication Date Title
US11567555B2 (en) Software assisted power management
US11836464B2 (en) Method and apparatus for efficient binary and ternary support in fused multiply-add (FMA) circuits
US10929503B2 (en) Apparatus and method for a masked multiply instruction to support neural network pruning operations
US10656697B2 (en) Processor core power event tracing
CN109791513B (zh) 用于检测数值累加误差的指令和逻辑
CN103959236B (zh) 用于提供向量横向多数表决功能的处理器、设备和处理系统
WO2018063703A1 (fr) Instruction et logique pour détection de soupassement précoce et dérivation d'arrondi
US20190196813A1 (en) Apparatus and method for multiplying, summing, and accumulating sets of packed bytes
CN108351785A (zh) 用于部分减少操作的指令和逻辑
US11010166B2 (en) Arithmetic logic unit with normal and accelerated performance modes using differing numbers of computational circuits
WO2018005718A1 (fr) Système et procédé de décodage agrégé dans le désordre
US9851976B2 (en) Instruction and logic for a matrix scheduler
US12455612B2 (en) Device, method and system to provide thread scheduling hints to a software process
US10387797B2 (en) Instruction and logic for nearest neighbor unit
CN108351778A (zh) 用于检测浮点抵消效应的指令和逻辑
CN106030519A (zh) 用于从多个股分派指令的处理器逻辑和方法
US20240103874A1 (en) Instruction elimination through hardware driven memoization of loop instances
CN108292214A (zh) 用于获得数据列的指令和逻辑
WO2025194342A1 (fr) Adaptation de poids d'un réseau neuronal proportionnel-intégral-dérivé pour faciliter la régulation thermique
US10268255B2 (en) Management of system current constraints with current limits for individual engines
EP4395113A1 (fr) Gestion d'énergie pour groupe de batteries
US20240202125A1 (en) Coherency bypass tagging for read-shared data
WO2019005115A1 (fr) Appareil et procédé de multiplication et de cumul de valeurs complexes
US20240202120A1 (en) Integrated circuit chip to selectively provide tag array functionality or cache array functionality
US20250199931A1 (en) Device, method and system for determining a frequency ratio of a processor with inference engine circuitry

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24930028

Country of ref document: EP

Kind code of ref document: A1