- (**10 pnts**) Version-space agent. Two agents with different hypothesis spaces are given. The first is all possible 3-conjunctions (non-negated) of n variables; the second is all n-conjunctions of positive and negative literals. (A minimal sketch of the first agent appears after this list.)
* (3 pnts) For each agent: does it learn online?
* (3 pnts) For each agent: does it learn efficiently?
* (4 pnts) For the first agent: given the first negative observation (0,1,1,1,...,1), what will be the agent's decision on the next observation (0,1,0,1,...)?
- (**15 pnts**) Relative Least General Generalization (rlgg). Given the background knowledge B = {half(4,2), half(2,1), int(2), int(1)}, what will be the rlgg of o1 = even(4) and o2 = even(2) relative to the background? (See the rlgg sketch below.)
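A minimal sketch (not an official course solution) of the first agent from the version-space question: it enumerates every 3-conjunction of non-negated variables, drops the hypotheses inconsistent with each labelled observation, and answers only when all surviving hypotheses agree. The class name and the choice n = 6 are assumptions made for illustration.

<code python>
# Version space over all 3-conjunctions of non-negated variables.
from itertools import combinations


class ConjunctionVersionSpace:
    """Keeps every 3-conjunction still consistent with the observations seen."""

    def __init__(self, n):
        # One hypothesis = a frozenset of 3 variable indices; h(x) = 1 iff
        # all three of those variables are 1 in x.
        self.hypotheses = {frozenset(c) for c in combinations(range(n), 3)}

    @staticmethod
    def predict_one(h, x):
        return int(all(x[i] == 1 for i in h))

    def update(self, x, label):
        # Online update: drop every hypothesis that disagrees with (x, label).
        self.hypotheses = {h for h in self.hypotheses
                           if self.predict_one(h, x) == label}

    def decide(self, x):
        # Return 0/1 if all surviving hypotheses agree, otherwise None.
        votes = {self.predict_one(h, x) for h in self.hypotheses}
        return votes.pop() if len(votes) == 1 else None


# The scenario from the 4-point sub-question, with n = 6 chosen arbitrarily.
vs = ConjunctionVersionSpace(6)
vs.update((0, 1, 1, 1, 1, 1), 0)        # first observation is negative
print(vs.decide((0, 1, 0, 1, 1, 1)))    # unanimous vote of the version space
</code>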
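For the rlgg question, a small sketch of Plotkin's construction restricted to ground atoms over constants, which is all that the given B contains: rlgg(o1, o2 | B) is the literal-wise lgg of the clauses o1 ← B and o2 ← B. The atom encoding is an ad-hoc choice for this sketch, and the clause-reduction step that usually follows is deliberately omitted.

<code python>
# Plotkin's lgg / rlgg for ground atoms (constants only).
def lgg_atom(a1, a2, var_map):
    """lgg of two atoms with the same predicate: constants that differ are
    replaced by a variable, reusing one variable per pair of constants."""
    (p1, args1), (p2, args2) = a1, a2
    if p1 != p2 or len(args1) != len(args2):
        return None                      # incompatible literals have no lgg
    new_args = []
    for t1, t2 in zip(args1, args2):
        if t1 == t2:
            new_args.append(t1)
        else:
            new_args.append(var_map.setdefault((t1, t2), f"V{len(var_map)}"))
    return (p1, tuple(new_args))


def rlgg(e1, e2, background):
    """rlgg(e1, e2 | B) = lgg of the clauses (e1 :- B) and (e2 :- B)."""
    var_map = {}
    head = lgg_atom(e1, e2, var_map)
    body = {lgg_atom(b1, b2, var_map)
            for b1 in background for b2 in background}
    body.discard(None)                   # drop incompatible literal pairs
    return head, body


B = [("half", (4, 2)), ("half", (2, 1)), ("int", (2,)), ("int", (1,))]
head, body = rlgg(("even", (4,)), ("even", (2,)), B)
print(head, ":-", sorted(body, key=repr))    # un-reduced rlgg clause
</code>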
* (3 pnts) Compute Pr(Heart Attack | Winter, Bad Sales). (A toy enumeration sketch appears after this list.)
- (**5 pnts**) Q-learning. You are given 5 short questions; answer True/False and provide your reasoning. (A minimal tabular Q-learning sketch appears after this list.)
* (1 pnt) Can Q-learning be extended to infinite state or action spaces? How would it handle them?
* (1 pnt) Does Q-learning use on-policy updates? What is the difference from an off-policy update?
* (1 pnt) Does Q-learning always converge? If so, is convergence conditioned on anything? On what?
* (1 pnt) Is Q-learning just an instance of temporal-difference learning? If not, what is different?
* (1 pnt) What is the difference between Q-learning and direct utility estimation or adaptive dynamic programming? Which is better?
- (**5 pnts**) Q-learning representation.
* A robot moves in a swimming pool; it can move along any of the 3 dimensions and has exactly one propeller per dimension. It can also move at two different speeds. There is a treasure at a specific place and depth, and there are mines at some positions. If the robot hits a mine or the wall, it restarts at a random position. (One possible state/action encoding is sketched below.)
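For the Pr(Heart Attack | Winter, Bad Sales) item: the exam's network is not part of this excerpt, so the structure and every CPT value below are made-up placeholders; the sketch only illustrates answering such a query by enumeration, Pr(X | e) = Pr(X, e) / Pr(e), summing out the unobserved variables.

<code python>
# Enumeration sketch for Pr(HeartAttack | Winter, BadSales).
# Hypothetical structure: Winter -> BadSales, (Winter, BadSales) -> Stress,
# Stress -> HeartAttack.  All numbers are placeholders, not exam data.
P_w = {True: 0.25, False: 0.75}                      # Pr(Winter)
P_b = {True: {True: 0.6, False: 0.4},                # Pr(BadSales | Winter)
       False: {True: 0.3, False: 0.7}}
P_s = {(True, True): {True: 0.8, False: 0.2},        # Pr(Stress | Winter, BadSales)
       (True, False): {True: 0.4, False: 0.6},
       (False, True): {True: 0.5, False: 0.5},
       (False, False): {True: 0.1, False: 0.9}}
P_h = {True: {True: 0.3, False: 0.7},                # Pr(HeartAttack | Stress)
       False: {True: 0.05, False: 0.95}}


def joint(w, b, s, h):
    return P_w[w] * P_b[w][b] * P_s[(w, b)][s] * P_h[s][h]


# Pr(HA = True | Winter = True, BadSales = True): sum out the hidden Stress.
num = sum(joint(True, True, s, True) for s in (True, False))
den = sum(joint(True, True, s, h) for s in (True, False) for h in (True, False))
print(num / den)
</code>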
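For the Q-learning questions, a minimal tabular sketch; the Gym-like `env` interface (`reset()`, `step()`) and the default hyperparameters are assumptions. The target bootstraps on the maximum over actions in the next state rather than on the action the behaviour policy will actually take, which is why Q-learning is an off-policy instance of temporal-difference learning.

<code python>
# Minimal tabular Q-learning.  `env` is an assumed Gym-like placeholder with
# reset() -> state and step(action) -> (next_state, reward, done).
import random
from collections import defaultdict


def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    Q = defaultdict(float)                        # Q[(state, action)]
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy behaviour policy (exploration)
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            s2, r, done = env.step(a)
            # Off-policy TD update: bootstrap on max_a' Q(s', a').
            best_next = 0.0 if done else max(Q[(s2, a_)] for a_ in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
</code>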
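For the representation question, one possible (certainly not the only) tabular encoding of the pool robot: discretize the pool into a 3-D grid of cells for the state, and let an action be a triple (axis, direction, speed), matching one propeller per dimension and two speeds. The grid resolution and the reward values mentioned in the comments are assumptions.

<code python>
# One possible state/action encoding for the pool robot (all sizes assumed).
from itertools import product

GRID = (10, 10, 5)                     # discretized pool: x, y, depth cells

# State: the robot's cell in the discretized pool.
states = list(product(range(GRID[0]), range(GRID[1]), range(GRID[2])))

# Action: which propeller fires (axis 0/1/2), in which direction, at which
# of the two speeds -> 3 * 2 * 2 = 12 discrete actions.
actions = list(product(range(3), (-1, +1), ("slow", "fast")))

# Reward sketch: e.g. +1 for the treasure cell, -1 for a mine or a wall
# (which also restarts the robot at a random cell), 0 otherwise.
print(len(states), "states,", len(actions), "actions")
</code>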