Differences

This page shows the differences between the selected revision and the current version of the page.

--- courses:b4m36smu [2018/06/06 17:37]
rozumden [Exam]
+++ courses:b4m36smu [2025/01/03 18:23] (current)
@@ Line 11: / Line 11: @@
 ===== Exam =====
-* 06.06.2018
+06.06.2018
+  - (**5 pnts**) Difference between a PAC-learning agent and a mistake-bound agent.
-  - (5 pnts) Difference between a PAC-learning agent and a mistake-bound agent. What does it mean for an agent to learn in each framework? What does it mean when it learns efficiently? Online?
+      * (2 pnts) What does it mean for an agent to learn in each framework?
-  - (10 pnts) Version-space agent. Two agents are given with different hypothesis spaces. The first is all possible 3-conjunctions (non-negative) of n variables. The second is all n-conjunctions of positive and negative literals. For each agent: does it learn online? Does it learn efficiently? For the first agent: given the first negative observation (0,1,1,1,...,1), what will be the agent's decision on the next observation (0,1,0,1,...)?
+      * (3 pnts) What does it mean when it learns efficiently? Online?
-  - (15 pnts) Relative Least General Generalization (rlgg). Given background knowledge B = {half(4,2), half(2,1), int(2), int(1)}. What will be the rlgg of o1 = even(4) and o2 = even(2) relative to the background? Apply the algorithm, draw tables, theta functions. Make a reduction step relative to B. Why is it needed?
+  - (**10 pnts**) Version-space agent. Two agents are given with different hypothesis spaces. The first is all possible 3-conjunctions (non-negative) of n variables. The second is all n-conjunctions of positive and negative literals.
-  - (10 pnts) Bayesian networks. Find an optimal, efficient, complete network (something like Season -> Temperature -> (two children: -> Ice Cream Sales, -> Heart Attack Rate)). Then compute the CPTs (conditional probability tables). Compute the probability of two queries: 1) Pr(Spring|Good Ice Cream Sales, No Heart Attack) 2) Pr(Heart Attack|Winter, Bad Sales).
+       * (3 pnts) For each agent: does it learn online?
-  - (5 pnts) Q-learning. Five short questions; answer True/False and justify your reasoning.
+       * (3 pnts) For each agent: does it learn efficiently?
-  - (5 pnts) Q-learning representation. Describe states, actions, rewards.
+       * (4 pnts) For the first agent: given the first negative observation (0,1,1,1,...,1), what will be the agent's decision on the next observation (0,1,0,1,...)?
+  - (**15 pnts**) Relative Least General Generalization (rlgg). Given background knowledge B = {half(4,2), half(2,1), int(2), int(1)}. What will be the rlgg of o1 = even(4) and o2 = even(2) relative to the background?
+      * (10 pnts) Apply the algorithm, draw tables, theta functions.
+      * (5 pnts) Make a reduction step relative to B. Why is it needed?
+  - (**10 pnts**) Bayesian networks.
+     * (2 pnts) Find an optimal, efficient, complete network (something like Season -> Temperature -> (two children: -> Ice Cream Sales, -> Heart Attack Rate)).
+     * (2 pnts) Then compute the CPTs (conditional probability tables).
+     * (3 pnts) Compute Pr(Spring|Good Ice Cream Sales, No Heart Attack).
+     * (3 pnts) Compute Pr(Heart Attack|Winter, Bad Sales).
+  - (**5 pnts**) Q-learning. Five short questions; answer True/False and justify your reasoning.
+     * (1 pnt) Can Q-learning be extended to infinite state or action spaces? How would it handle this?
+     * (1 pnt) Does Q-learning use an on-policy update? How does that differ from an off-policy update?
+     * (1 pnt) Does Q-learning always converge? If so, under what conditions?
+     * (1 pnt) Is Q-learning just an instance of temporal difference learning? If not, what is different?
+     * (1 pnt) What is the difference between Q-learning and direct utility estimation or adaptive dynamic programming? Which is better?
+  - (**5 pnts**) Q-learning representation.
+      * A robot moves in a swimming pool. It can move along any of 3 dimensions and has exactly one propeller for each dimension. It can also move at two different speeds. There is a treasure at a specific place and a specific depth. There are mines at some places as well. If the robot hits a mine or the wall, it restarts at a random position.
+      * (3 pnts) Describe the states, actions, and rewards of the specific game. You may provide two different representations.
+      * (2 pnts) Describe the Q-learning representation, the update rule, and the gamma and alpha values. How are Q values defined?
 ~~DISCUSSION~~
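
As a study aid for the last question, here is a minimal sketch of the standard tabular Q-learning update, Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). It is not taken from the exam or the course materials. The action encoding (axis, direction, speed), the grid-cell states, the reward value, and the names `q_update` and `ACTIONS` are all illustrative assumptions about how the pool robot could be modelled.

```python
from collections import defaultdict

# Hypothetical action set: one propeller per axis, assumed to run in
# either direction, at one of two speeds (3 * 2 * 2 = 12 actions).
ACTIONS = [(axis, direction, speed)
           for axis in "xyz"
           for direction in (-1, 1)
           for speed in (1, 2)]


def q_update(Q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the new Q(s, a).

    alpha is the learning rate, gamma the discount factor. The target
    uses the greedy value of the next state, which is what makes the
    update off-policy.
    """
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


# Q values default to 0.0 for unseen (state, action) pairs.
Q = defaultdict(float)

# Assumed example: states are (x, y, z) grid cells, reward 10 for the step.
new_value = q_update(Q, (0, 0, 0), ACTIONS[0], 10.0, (1, 0, 0), ACTIONS,
                     alpha=0.5, gamma=0.9)
print(new_value)
```

A dictionary keyed by (state, action) keeps the sketch short; for a large or continuous pool the table would have to be replaced by some function approximation.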

courses/b4m36smu.1528299476.txt.gz · Last modified: 2025/01/03 18:16 (external edit)
