💽 Content

| Name | Type | Week |
| --- | --- | --- |
| 🚀 1 - Motivation, States, Actions and Rewards | Post | 1 |
| 🛎️ 2 - Return, Value Functions & Bellman Equations | Post | 1 |
| 🕶️ 3 - Learning from Experience, TD-Learning, epsilon-Greedy | Post | 1 |
| 4 - Generalised Policy Iteration | Post | 2 |
| 🧙 5 - Curse of Dimensionality, Function Approximation & Parameters | Post | 2 |
| 🪶 6 - Feature Vectors | Post | 2 |
| 🍎 7 - Intro to Neural Networks | Post | 3 |
| 🔥 8 - PyTorch | Post | 3 |
| 🏋️‍♀️ 9 - Training in PyTorch | Post | 3 |
| 🌍 10 - Model-free control & Q-Learning | Post | 4 |
| 🕹️ 11 - Deep Q-Networks | Post | 4 |
| 🛹 12 - Double DQN | Post | 4 |
| What is PyTorch? | Post | |
| Mean Squared Error in PyTorch | Post | |
| Stochastic Gradient Descent | Post | |
| nn.Sequential modules | Post | |
| nn.ReLU layers | Post | |
| nn.Linear layers | Post | |
| Activation functions | Post | |
| Tensors | Post | |
| Neurons to layers | Post | |
| Polynomial function approximation | Post | |
| Greedy Actions | Post | |
| 1 step lookahead | Post | |
| Reward Functions | Post | |
| State Transition Functions | Post | |
| Episodic MDPs | Post | |
| The Env class | Post | |
| Reinforcement Learning in Python | Post | |
| Epsilon Greedy Actions | Post | |
| Policy π(s) | Post | |
| Discounting the Future | Post | |
| Supervised Learning the Value Function | Post | |
| Generalised Policy Iteration | Post | |
| Rewards | Post | |
| Neural Networks | Post | |
| Policy Evaluation | Post | |
| Loss Functions in PyTorch | Post | |
| Activation Functions | Post | |
| Training a Model in PyTorch | Post | |
| Backpropagation | Post | |
| Q-learning | Post | |
| Temporal Difference Learning | Post | |
| Normalising Neural Network Inputs | Post | |
| Problems with DQN and Value-based Approaches | Post | |
| Value-Function Approximation | Post | |
| Tensors and NumPy | Post | |
| Autograd | Post | |
| Value Functions | Post | |
| Deep Q-Networks | Post | |
| Training Machine Learning Models | Post | |
| Linear Combination of Features | Post | |
| Neural Network in PyTorch | Post | |
| Exploration-Exploitation Trade-off | Post | |
| Function Approximation for Policies | Post | |
| Model-based and model-free approaches | Post | |
| Optimizers | Post | |
| Learning from Experience | Post | |
| Policy Gradients | Post | |
| Gradient Descent | Post | |
| Curse of Dimensionality | Post | |
| Double DQN | Post | |
| Bellman Optimal Action-Value Equation | Post | |
| Mean Squared Error | Post | |
| Feature Table Lookup | Post | |
| Limitations of model-based approaches | Post | |
| Difficulties with DQN | Post | |
| Neural Network Dimensions | Post | |
| Choosing a Learning Rate | Post | |
| Terminal State Value | Post | |
| Loss Functions | Post | |
| Neural Network Architectures | Post | |
| What is a Model? | Post | |
| Action-Value (Q) Functions | Post | |
| Neurons | Post | |
| Experience Replay | Post | |
| Bellman Equation | Post | |
| Approximation and Convergence Guarantees | Post | |
| Limitations of Polynomials | Post | |
| Generalization | Post | |
| Markov Decision Processes | Post | |
| Temporal Difference Update Equation | Post | |
| Return | Post | |
| Function Approximation methods | Post | |
| Policy Improvement | Post | |
| Actions | Post | |
| What is Reinforcement Learning? | Post | |
| States | Post | |
| Why is Reinforcement Learning useful? | Post | |
| Designing Features | Post | |
| Stochastic State Transition Functions | Post | |

- MCTS as Policy-Improvement Operator
- Residual Learning
- AlphaGo Zero: Architecture
- Batch Normalisation
- AlphaGo Zero: Backup
- AlphaGo Zero: Loss Functions
- AlphaGo Zero: Selection
- AlphaGo Zero: Self-play
- AlphaGo Zero: Expansion and Evaluation
- AlphaGo Zero: Training
- AlphaGo Zero: Action Selection
- AlphaGo: Value Network Training
- AlphaGo Zero Algorithm
- AlphaGo: Why two Policy networks?
- AlphaGo: RL Policy Network Training
- AlphaGo: SL Policy Network Training
- AlphaGo Zero: Neural Network
- AlphaGo versus AlphaGo Zero
- AlphaGo: Neural Networks Used
- AlphaGo: Action Selection
- AlphaGo: Simulation
- AlphaGo: Selection
- AlphaGo: Expansion
- AlphaGo: Backup
- AlphaGo: Rollout Policy
- Why Go is hard to solve
- Convolutional Neural Networks: Padding
- Convolutional Neural Networks: Pooling
- Convolutional Neural Networks in Reinforcement Learning
- AlphaGo Algorithm
- Convolutional Neural Networks: Stride
- Rules of Go
- Convolutional Neural Networks: Kernel Size
- Convolutional Layers
- Convolutional Neural Networks
- Handcrafted features for Computer Vision
- Convolutional Neural Networks: Number of Filters
- Convolution
- Filters for Feature Detection
- Monte Carlo Tree Search: Selection
- Monte Carlo Tree Search: Simulation
- Tree Pruning
- Computer Vision
- Monte Carlo Tree Search: Expansion
- Monte Carlo Tree Search: Backup
- Parallelisation
- Tree Policies
- Half-States
- Rollout Policies
- Sharing Rollout Updates
- Monte Carlo Tree Search
- Two-Player Monte Carlo Tree Search
- Environment and Acting Loop
- Unifying Learning and Planning
- Rollout Algorithms
- Types of Updates
- Decision-Time Planning
- Limitations of Dynamic Programming
- Trajectory Sampling
- What is Planning?
- Policy Iteration
- Value Iteration
- Distribution and Sample Models
- Model-Based Reinforcement Learning
- Dynamic Programming
- Legal Actions Masking
- Proximal Policy Optimization (PPO)
- Clipped Proximal Policy Optimization (PPO)
- Ideal Property of Policy Gradient Updates
- Trust-Region Policy Optimization (TRPO)
- Limitations of TRPO
- Clipped PPO with GAE
- Policy Gradients Instability
- Online Eligibility Traces
- Offline Forward-View GAE(λ)
- Algorithms for Generalised Advantage Estimation (GAE)
- Batch GAE(λ)
- TD-Lambda
- Advantage Function
- Advantage Estimation
- Generalised Advantage Estimation
- Lambda-return
- Actor Critic
- Monte Carlo Learning
- Bias-Variance Trade-Off
- n-Step Bootstrapping
- Averaging n-Step Returns
- Bootstrapping
- Learning a Baseline
- High Variance of Parameter Updates
- Model-Free Prediction
- Losses in Torch
- Vanilla Policy Gradients Update Equation
- Vanilla Policy Gradients
- Entropy Regularization
- Continuous Action Spaces
- Policy Gradient Theorem
- Softmax
- Benefits of Approximating Policies
- Stochastic Policies
- Supervised Learning
- Overfitting