Week 7.3 Bayesian networks: definition, AI Principle, CS221
Author: Le Hoang Long Long
Uploaded: 2025-05-21
Views: 15
Description:
playlist: • CS221: Artificial Intelligence: Principles...
github: https://github.com/hoanglong1712/Stan...
https://stanford-cs221.github.io/autu...
https://stanford-cs221.github.io/autu...
CS221: Artificial Intelligence: Principles and Techniques, Stanford
Bayesian networks: definitions
• In this module, I’ll present the formal definition of Bayesian networks, give a few examples, and talk about an important property called
explaining away.
Review: probability
Random variables: sunshine S ∈ {0, 1}, rain R ∈ {0, 1}
Joint distribution (probabilistic database):
P(S, R) =
  s   r   P(S = s, R = r)
  0   0   0.20
  0   1   0.08
  1   0   0.70
  1   1   0.02
Marginal distribution (aggregate rows):
P(S) =
  s   P(S = s)
  0   0.28
  1   0.72
Conditional distribution (select rows, normalize):
P(S | R = 1) =
  s   P(S = s | R = 1)
  0   0.8
  1   0.2
CS221 2
• Before introducing Bayesian networks, let’s review some basic probability. We start with an example about the weather. Suppose we have two
boolean random variables, S and R representing whether there is sunshine and whether there is rain, respectively. Think of an assignment to
(S, R) as representing a possible state of the world.
• The joint distribution specifies a probability for each assignment to (S, R) (state of the world). We use lowercase letters (e.g., s and r)
to denote values and uppercase letters (e.g., S and R) to denote random variables. Note that P(S = s, R = r) is a probability (a number)
while P(S, R) is a distribution (represented by a table of probabilities). We don’t know what state of the world we’re in, but we know what
the probabilities are (there are no unknown unknowns). Think of the joint distribution as one giant (probabilistic) database that contains full
information about how the world works.
• Sometimes, we might only be interested in a subset of the variables, e.g., sunshine S. From the joint distribution, we can derive a marginal
distribution over that. In the case of S, we get this by summing the probabilities of the rows in the joint distribution table that share the
same value of S. The interpretation is that we are interested in (the marginal probability of) S. We don’t explicitly care about R, but we
still need to take into account R’s effect on S. We say in this case that R is marginalized out.
• Sometimes, we might observe evidence; for example, suppose we know that there’s rain (R = 1). Again from the joint distribution, we
can derive a conditional distribution of the remaining variables (S) given this evidence R = 1. We do this by selecting rows of the table
matching the condition and then normalizing the remaining probabilities so that they sum to 1. Note that this normalization constant is
exactly P(R = 1).
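The two table operations above (aggregate rows for a marginal; select rows and normalize for a conditional) can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function names `marginal_s` and `conditional_s_given_r` are assumptions made here, not part of the course material. The numbers are the ones from the P(S, R) table.

```python
# Joint distribution P(S, R) from the table above, keyed by (s, r).
joint = {
    (0, 0): 0.20,
    (0, 1): 0.08,
    (1, 0): 0.70,
    (1, 1): 0.02,
}


def marginal_s(joint):
    """Aggregate rows: sum out R to obtain P(S)."""
    result = {}
    for (s, r), p in joint.items():
        result[s] = result.get(s, 0.0) + p
    return result


def conditional_s_given_r(joint, r_obs):
    """Select rows with R = r_obs, then normalize to obtain P(S | R = r_obs)."""
    selected = {s: p for (s, r), p in joint.items() if r == r_obs}
    z = sum(selected.values())  # normalization constant, equal to P(R = r_obs)
    return {s: p / z for s, p in selected.items()}


p_s = marginal_s(joint)
p_s_given_rain = conditional_s_given_r(joint, 1)
print(p_s)             # marginal P(S)
print(p_s_given_rain)  # conditional P(S | R = 1)
```

Up to floating-point rounding, the results match the marginal and conditional tables on the slide, and the normalizer `z` is P(R = 1) = 0.08 + 0.02.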
Review: probability
Variables: S (sunshine), R (rain), T (traffic), A (autumn)
Joint distribution (probabilistic database):
P(S, R, T, A)
Marginal conditional distribution (probabilistic inference):
• Condition on evidence (traffic, autumn): T = 1, A = 1
• Interested in query (rain?): R
P(R | T = 1, A = 1)
  query: R
  condition: T = 1, A = 1
(S is marginalized out)
• Let us augment our running example with two other random variables, T (whether there is traffic) and A (whether it’s autumn).
• We have a joint distribution, which again can be thought of as a probabilistic database that tells us how the world works.
• Probabilistic inference is the process of answering questions against this database. In general, we can both condition on evidence and be
interested in a subset of the remaining variables at the same time.
• For example, we might condition on there being traffic and the fact that it’s autumn.
• And we might be interested in whether there is rain (called the query variable), marginalizing out sunshine.
• Together, the conditioning variables, the query variables, and the variables that are marginalized out form a partition of all the variables: each variable plays exactly one of these three roles.
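A general inference query of this kind can be sketched as one function that selects the rows matching the evidence, aggregates over everything that is not a query variable, and normalizes. The notes give no numbers for P(S, R, T, A), so the joint below is entirely hypothetical: the weights are made up for illustration, and the names `infer` and `hypothetical_weight` are assumptions of this sketch.

```python
from itertools import product

VARS = ("S", "R", "T", "A")


def hypothetical_weight(s, r, t, a):
    """HYPOTHETICAL unnormalized weights; not taken from the course material."""
    w = 1.0
    w *= 3.0 if s != r else 1.0                # assumed: sun and rain rarely co-occur
    w *= 2.0 if t == r else 1.0                # assumed: rain tends to bring traffic
    w *= 1.5 if (a == 1 and r == 1) else 1.0   # assumed: slightly more rain in autumn
    return w


# Normalize the made-up weights so the table is a valid joint distribution.
weights = {x: hypothetical_weight(*x) for x in product((0, 1), repeat=4)}
total = sum(weights.values())
joint = {x: w / total for x, w in weights.items()}


def infer(joint, variables, query, evidence):
    """Return P(query | evidence); every other variable is marginalized out."""
    q_idx = [variables.index(v) for v in query]
    e_idx = {variables.index(v): val for v, val in evidence.items()}
    result = {}
    for assignment, p in joint.items():
        # Select only rows consistent with the evidence.
        if any(assignment[i] != val for i, val in e_idx.items()):
            continue
        # Aggregate rows that share the same values of the query variables.
        key = tuple(assignment[i] for i in q_idx)
        result[key] = result.get(key, 0.0) + p
    z = sum(result.values())  # equals the probability of the evidence
    return {k: p / z for k, p in result.items()}


# Query R, condition on T = 1 and A = 1; S is marginalized out.
p_rain = infer(joint, VARS, query=("R",), evidence={"T": 1, "A": 1})
print(p_rain)
```

Enumerating the whole table costs time exponential in the number of variables, which is acceptable only for toy examples like this one.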
A puzzle
Problem: earthquakes, burglaries, and alarms
Earthquakes and burglaries are independent events (probability ).
Either will cause an alarm to go off.
Suppose you get an alarm.
Does hearing that there’s an earthquake increase, decrease, or keep constant the
probability of a burglary?
Joint distribution:
P(E, B, A)
Questions:
P(B = 1 | A = 1) ? P(B = 1 | A = 1, E = 1)
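One way to explore the puzzle is to build the joint table explicitly and compare the two conditional probabilities by enumeration. The sketch below rests on two assumptions that are not stated numerically in the text above: the prior probability of an earthquake or burglary is set to a placeholder value `EPSILON = 0.05`, and the alarm is modeled as going off exactly when there is an earthquake or a burglary. The helper names `prior` and `prob` are likewise illustrative.

```python
from itertools import product

EPSILON = 0.05  # ASSUMED prior for E = 1 and for B = 1; the source leaves the value unspecified


def prior(x):
    """Prior probability of a single binary event taking value x."""
    return EPSILON if x == 1 else 1.0 - EPSILON


# Joint P(E, B, A), keyed by (e, b, a). E and B are independent;
# ASSUMPTION: A = 1 exactly when E = 1 or B = 1 (deterministic alarm).
joint = {}
for e, b, a in product((0, 1), repeat=3):
    alarm_consistent = 1.0 if a == int(e or b) else 0.0
    joint[(e, b, a)] = prior(e) * prior(b) * alarm_consistent


def prob(event, given):
    """P(event | given), computed by summing matching rows of the joint table."""
    numerator = sum(p for x, p in joint.items() if event(x) and given(x))
    denominator = sum(p for x, p in joint.items() if given(x))
    return numerator / denominator


p_b_given_a = prob(lambda x: x[1] == 1, lambda x: x[2] == 1)
p_b_given_a_e = prob(lambda x: x[1] == 1, lambda x: x[2] == 1 and x[0] == 1)
print(p_b_given_a)    # P(B = 1 | A = 1)
print(p_b_given_a_e)  # P(B = 1 | A = 1, E = 1)
```

Under these assumptions, P(B = 1 | A = 1, E = 1) falls back to the prior `EPSILON`, while P(B = 1 | A = 1) is much larger: learning about the earthquake lowers the probability of a burglary, which is the explaining-away effect mentioned at the start of this module.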