Agents and Environments</h1>
<article>
<h1>Agents and Environments</h1>
<p>Mon, 01 Jan 0001 00:00:00 +0000</p>
<h2 id="designing-rational-agents"> Designing Rational Agents <a class="anchor" href="#designing-rational-agents">#</a> </h2>
<p><strong>How do we actually build AI systems?</strong></p>
<ul>
<li>An agent is any entity that <strong>perceives</strong> (through sensors) and <strong>acts</strong> (through actuators or effectors).</li>
<li>One possible goal of building an AI is to create a <strong>rational agent</strong>: one that selects actions that maximize its expected utility.</li>
<li>The <strong>agent function</strong> maps percepts to actions. It is implemented by an <strong>agent program</strong> running on a machine.</li>
</ul>
<h2 id="evaluating-the-environment"> Evaluating the Environment <a class="anchor" href="#evaluating-the-environment">#</a> </h2>
<p>The task environment can be described using <strong>PEAS:</strong></p>
</article>
<article>
<h1>Search Problems</h1>
<p>Mon, 01 Jan 0001 00:00:00 +0000</p>
<p>Search problems consist of:</p>
<ul>
<li>A state space $S$</li>
<li>An initial state $s_0$</li>
<li>Possible actions $A(s)$ in each state</li>
<li>A transition model that maps a current state and an action to the next state</li>
<li>A goal test $G(s)$ that returns true or false depending on whether $s$ is a goal state</li>
<li>An action cost</li>
</ul>
<p>As an example, suppose you are travelling:</p>
<ul>
<li>The state space is the set of all cities you can go to.</li>
<li>The initial state is the city you start in.</li>
<li>The actions are to travel to adjacent cities.</li>
<li>The transition model is the process of travelling to another city.</li>
<li>The goal test checks whether you are in the desired city.</li>
<li>The action cost is the road distance.</li>
</ul>
<p>Search problems are typically solved on <strong>models,</strong> which are imperfect representations of the real world.
Models are almost always wrong to some extent.</p>
</article>
<article>
<h1>Games</h1>
<p>Mon, 01 Jan 0001 00:00:00 +0000</p>
<p>A <strong>game</strong> is a task environment with more than one agent. Games can be characterized along several dimensions:</p>
<ul>
<li>Deterministic vs Stochastic</li>
<li>Fully observable vs Partially observable</li>
<li>Number of players</li>
<li>Team vs Individual</li>
<li>Turn based vs Simultaneous</li>
<li>Zero sum vs General sum
<ul>
<li>Zero sum: agents have opposite utilities (one maximizes, the other minimizes)</li>
<li>General sum: agents have independent utilities, allowing for cooperation, alliances, competition…</li>
</ul>
</li>
</ul>
<p>A <strong>standard game</strong> is deterministic, observable, two-player, turn-based, and zero-sum. It can be formulated using:</p>
</article>
<article>
<h1>Logic</h1>
<p>Mon, 01 Jan 0001 00:00:00 +0000</p>
<h1 id="what-is-logic-and-why-is-it-important"> What is logic and why is it important? <a class="anchor" href="#what-is-logic-and-why-is-it-important">#</a> </h1>
<p>At the beginning of the course, we discussed <strong>atomic, factored, and structured</strong> representations (atomic = each state is a single indivisible point, factored = each state is a set of variables with values, structured = states are described in terms of objects and the relations between them).</p>
<p>While search problems deal with atomic representations, <strong>logic deals with factored representations.</strong></p>
<h2 id="knowledge"> Knowledge <a class="anchor" href="#knowledge">#</a> </h2>
<p>Agents acquire knowledge through perception, learning, and language. Agents need to know:</p>
<ul>
<li>The effects of actions (transition model)</li>
<li>How the world affects sensors (sensor model)</li>
<li>The current state of the world (especially important in partially observable worlds)</li>
</ul>
<p>A <strong>knowledge base</strong> is a set of sentences in a formal language.
It is a declarative approach to building an agent:</p>
</article>
<article>
<h1>Bayes Nets</h1>
<p>Mon, 01 Jan 0001 00:00:00 +0000</p>
<h1 id="probability-review"> Probability Review <a class="anchor" href="#probability-review">#</a> </h1>
<p>For the underlying probability theory (independence, conditional probability, distributions), see [[cs70/probability/probability-overview]]. For Bayesian parameter estimation on the resulting CPTs, see [[data102/parameter estimation]].</p>
<h1 id="bayes-nets"> Bayes Nets <a class="anchor" href="#bayes-nets">#</a> </h1>
<p>Bayes Nets consist of:</p>
<ul>
<li>Nodes, each corresponding to a variable in the problem</li>
<li>A CPT (Conditional Probability Table) for each node</li>
<li>Edges between variables that encode a direct influence (conditional dependence)</li>
</ul>
<p>Properties:</p>
<ul>
<li>Must be a directed acyclic graph (no cycles)</li>
<li>Space complexity: $O(n \cdot d^k)$
<ul>
<li>$n$ variables (number of CPTs)</li>
<li>$d$ possible values per variable</li>
<li>$k$ maximum number of variables in one table</li>
</ul>
</li>
</ul>
<p>Global semantics:</p>
</article>
<article>
<h1>Markov Models</h1>
<p>Mon, 01 Jan 0001 00:00:00 +0000</p>
<p>A Markov Model is essentially a Bayes net in the form of an infinitely long chain (a “time-series Bayes net”).</p>
<p>Typically, each node is a random variable that represents a specific point in time.</p>
<p>Markov models follow the <strong>memoryless property,</strong> which states that, given the random variable at time step $i$, the random variable for time step $i+1$ is conditionally independent of all earlier variables. (For the probability-theory foundation of this construction, see [[cs70/probability/markov-chains]].)</p>
</article>
<article>
<h1>Utilities and Decision Trees</h1>
<p>Mon, 01 Jan 0001 00:00:00 +0000</p>
<h1 id="utility"> Utility <a class="anchor" href="#utility">#</a> </h1>
<p>Utilities are values that quantify the relative benefit of a particular state: the higher the utility, the better.
For the statistical-decision-theory perspective on the same idea (loss functions, estimators, Bayes vs. minimax risk), see [[data102/decision theory]].</p>
<p>Rational agents follow the <strong>principle of maximum expected utility (MEU):</strong> they always choose whichever actions maximize expected utility. Rational agents must also have <strong>rational preferences</strong> and therefore follow the <strong>axioms of rationality:</strong></p>
</article>
<article>
<h1>Markov Decision Processes</h1>
<p>Mon, 01 Jan 0001 00:00:00 +0000</p>
<h2 id="what-is-a-markov-decision-process"> What is a Markov Decision Process? <a class="anchor" href="#what-is-a-markov-decision-process">#</a> </h2>
<p>A Markov Decision Process (MDP) is a Markov model used to solve <strong>nondeterministic search problems</strong> (where an action can result in multiple possible successor states). The same formalism is treated from an inference/decision-theory perspective in [[data102/Markov Decision Processes]].</p>
<p>An MDP is defined by:</p>
<ul>
<li>A set of states $s$</li>
<li>A set of actions $a$</li>
<li>A transition model $T(s, a, s')$ that represents the probability $P(s' \mid s, a)$ that the action $a$ taken at state $s$ will lead to a new state $s'$. (This form is allowed by the memoryless property.)</li>
<li>A reward function $R(s, a, s')$ per transition</li>
<li>A discount factor $\gamma \in [0, 1]$</li>
<li>A start state</li>
<li>A terminal (absorbing) state</li>
</ul>
<p>The utility function of an MDP can be calculated as follows:</p>
</article>
<article>
<h1>Machine Learning</h1>
<p>Mon, 01 Jan 0001 00:00:00 +0000</p>
<h2 id="what-is-machine-learning"> What is Machine Learning? <a class="anchor" href="#what-is-machine-learning">#</a> </h2>
<p>So far, we’ve used Bayes nets, Markov Decision Processes, etc. to solve problems with a given model. But how do we actually determine what those models are in the first place?</p>
<p>This is where machine learning comes in: the process of improving models through experience.
There are two main categories of algorithms: <strong>supervised learning,</strong> where relationships are inferred between given input and output data to predict outputs for new inputs, and <strong>unsupervised learning,</strong> where no outputs are given and the algorithm recognizes structures or patterns in the inputs.</p>
</article>
<article>
<h1>Neural Networks</h1>
<p>Mon, 01 Jan 0001 00:00:00 +0000</p>
<p><strong>General idea:</strong> Combine multiple simple regression models to increase the complexity of the overall model.</p>
<h2 id="optimization"> Optimization <a class="anchor" href="#optimization">#</a> </h2>
<h3 id="gradient-ascent-and-descent"> Gradient Ascent and Descent <a class="anchor" href="#gradient-ascent-and-descent">#</a> </h3>
<p><strong>The difference:</strong> gradient ascent maximizes a log-likelihood function; gradient descent minimizes a loss function.</p>
<p>Gradient Ascent algorithm:</p>
<div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>randomly init w
</span></span><span style="display:flex;"><span><span style="color:#66d9ef">while</span> w <span style="color:#f92672">not</span> converged:
</span></span><span style="display:flex;"><span>    <span style="color:#66d9ef">for</span> weight <span style="color:#f92672">in</span> w:
</span></span><span style="display:flex;"><span>        weight <span style="color:#f92672">=</span> weight <span style="color:#f92672">+</span> learning_rate <span style="color:#f92672">*</span> gradient(log_likelihood(w), weight)
</span></span></code></pre></div><ul>
<li>Convergence is reached when the gradient is 0, or when no change occurs between two iterations</li>
<li>$w$ is a vector of $N$ weights</li>
<li><code>gradient(log_likelihood(w), weight)</code> represents one component of the gradient $\nabla_{\mathbf{w}} \log l(\mathbf{w})$, which is a
vector of $N$ partial derivatives $\partial \log l(\mathbf{w}) / \partial w_i$, one for every weight $w_i$.</li>
</ul>
<p>Gradient Descent algorithm:</p>
</article>
<article>
<h1>Reinforcement Learning</h1>
<p>Mon, 01 Jan 0001 00:00:00 +0000</p>
<h2 id="introduction"> Introduction <a class="anchor" href="#introduction">#</a> </h2>
<p>Reinforcement Learning (RL) is an example of <strong>online planning,</strong> where agents have no prior knowledge of rewards or transitions and must explore an environment before using an estimated policy. The Data 102 treatment of the same material, with more emphasis on the statistical-inference side, is at [[data102/Reinforcement Learning]] (see also [[data102/bandits]] for exploration/exploitation tradeoffs).</p>
<ul>
<li>Model-based learning: estimates the transition and reward functions from samples obtained during exploration, then solves the MDP built from those estimates using value or policy iteration</li>
<li>Model-free learning: estimates the values/Q-values of states directly, without constructing a reward or transition model of the MDP</li>
</ul>
<p>Passive reinforcement learning: the agent is given a policy and learns the values of states under that policy.</p>
</article>
<article>
<p>Mon, 01 Jan 0001 00:00:00 +0000</p>
<p>[[Agents and Environments]] [[Search Problems]] [[Games]] [[Logic]] [[Bayes Nets]] [[Markov Models]] [[Utilities and Decision Trees]] [[Markov Decision Processes]] [[Machine Learning]] [[Neural Networks]] [[Reinforcement Learning]]</p>
<p>[[Project 5 Notes]]</p>
</article>
</main></body></html>