   - (**10 pnts**) Version-space agent. Two agents are given, with different hypothesis spaces: the first uses all possible 3-conjunctions (non-negated) of n variables, the second all n-conjunctions of positive and negative literals (see the version-space sketch after the list).
      * (3 pnts) For each agent: does it learn online?
      * (3 pnts) For each agent: does it learn efficiently?
      * (4 pnts) For the first agent: given the first negative observation (0,1,1,1,...,1), what will be the agent's decision on the next observation (0,1,0,1,...)?
   - (**15 pnts**) Relative Least General Generalization (rlgg). The background knowledge is B = {half(4,2), half(2,1), int(2), int(1)}. What will be the rlgg of o1 = even(4) and o2 = even(2) relative to the background? (See the rlgg sketch after the list.)
      * (3 pnts) Compute Pr(Heart Attack|Winter, Bad Sales). (See the enumeration sketch after the list.)
   - (**5 pnts**) Q-learning. There are 5 short questions; answer True/False and provide your reasoning (see the Q-learning update sketch after the list).
      * (1 pnt) Can Q-learning be extended to an infinite state or action space? How would it handle this?
      * (1 pnt) Does Q-learning use an on-policy update? What is the difference from an off-policy update?
      * (1 pnt) Does Q-learning always converge? If so, is convergence conditioned on anything? On what?
      * (1 pnt) Is Q-learning just an instance of temporal difference learning? If not, what is different?
      * (1 pnt) What is the difference between Q-learning and direct utility estimation or adaptive dynamic programming? Which is better?
   - (**5 pnts**) Q-learning representation (see the representation sketch after the list).
      * There is a robot moving in a swimming pool. It can move in any of 3 dimensions and has exactly one propeller for each dimension; it can also move at two different speeds. There is a treasure at a specific place and a specific depth, and there are mines at some places as well. If the robot hits a mine or the wall, it restarts at a random position.
-      * Describe states, actions, rewards of a specific game. You may provide two different representations. +      * (3 pnts) Describe states, actions, rewards of a specific game. You may provide two different representations. 
-      * Describe Q-learning representation,​ the update rule, gamma, alpha value. How are Q values defined?+      * (2 pnts) Describe Q-learning representation,​ the update rule, gamma, alpha value. How are Q values defined?
  
~~DISCUSSION~~
  
  