<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-size:small"><div class="gmail_default"><b style="font-family:verdana,sans-serif;font-size:large;color:rgb(80,0,80);background-color:rgb(207,226,243)">Thesis Defense: Falcon Dai, TTIC</b><br clear="all"></div><div class="gmail_default"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);font-weight:700;vertical-align:baseline;white-space:pre-wrap"><br></span></div><div class="gmail_default"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);font-weight:700;vertical-align:baseline;white-space:pre-wrap">When:        </span><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap">Wednesday, September 7th, 2022, from </span><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,0)"><b>12:30 - 2:30 pm CT</b></span><br></div><div class="gmail_default"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,0)"><b><br></b></span></div><div class="gmail_default"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;vertical-align:baseline;white-space:pre-wrap"><span style="color:rgb(0,0,0);font-weight:700">Virtually:</span><font color="#000000">   <a href="https://uchicago.zoom.us/j/98534120153?pwd=SmRDMFo1UTA1M3pNZEZOblhkWG9yQT09" target="_blank"> </a></font></span><b style="font-family:arial,sans-serif;white-space:pre-wrap"><font color="#0000ff"><a href="https://uchicago.zoom.us/j/98534120153?pwd=SmRDMFo1UTA1M3pNZEZOblhkWG9yQT09" target="_blank">Join Virtually Here</a></font></b><span style="color:rgb(0,0,0);font-family:arial,sans-serif;white-space:pre-wrap"> </span><span style="color:rgb(0,0,0);font-family:arial,sans-serif;white-space:pre-wrap">    </span></div><div class="gmail_default"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,0)"><br></span></div><div class="gmail_default"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,0)"><span style="font-weight:700;background-color:rgb(255,255,255)">Who:          Falcon Dai, TTIC</span></span></div><div class="gmail_default"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,0)"><span style="background-color:rgb(255,255,255);font-family:Arial,Helvetica,sans-serif;font-variant-numeric:normal;font-variant-east-asian:normal;color:rgb(34,34,34);vertical-align:baseline"><span style="color:rgb(0,0,0);font-family:arial,sans-serif;font-weight:700"><br></span></span></span></div><div class="gmail_default"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,0)"><span style="background-color:rgb(255,255,255);font-family:Arial,Helvetica,sans-serif;font-variant-numeric:normal;font-variant-east-asian:normal;color:rgb(34,34,34);vertical-align:baseline"><span style="color:rgb(0,0,0);font-family:arial,sans-serif;font-weight:700"><br></span></span></span></div><div class="gmail_default"><span 
style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,0)"><span style="background-color:rgb(255,255,255);font-family:Arial,Helvetica,sans-serif;font-variant-numeric:normal;font-variant-east-asian:normal;color:rgb(34,34,34);vertical-align:baseline"><span style="color:rgb(0,0,0);font-family:arial,sans-serif;font-weight:700">Thesis</span><span style="color:rgb(0,0,0);font-family:arial,sans-serif;font-weight:700"> Title</span>: </span><span style="background-color:rgb(255,255,255);font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34)">On Reward Structures of Markov Decision Processes</span>
</span></div><br><div class="gmail_default"><b>Abstract</b>: <br></div><div class="gmail_default"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"><div style="white-space:normal">A Markov decision process can be parameterized by a transition kernel and a reward function. Both play essential roles in the study of reinforcement learning, as evidenced by their presence in the Bellman equations. In my inquiry into the various kinds of "costs" associated with reinforcement learning, inspired by the demands of robotic applications, I found that rewards are central to understanding the structure of a Markov decision process and that reward-centric notions can elucidate important concepts in reinforcement learning.<br><br>Specifically, I studied the sample complexity of policy evaluation and developed a novel estimator with an instance-specific error bound of $\widetilde{O}(\sqrt{\nicefrac{\tau_s}{n}})$ for estimating a single state value. In the online regret-minimization setting, I refined a transition-based MDP constant, the diameter, into a reward-based constant, the maximum expected hitting cost, and with it provided a theoretical explanation of how a well-known technique, potential-based reward shaping, can accelerate learning with expert knowledge. To study safe reinforcement learning, I modeled hazardous environments with irrecoverability and proposed a quantitative notion of safe learning via reset efficiency. In this setting, I modified a classic algorithm to account for resets, achieving promising preliminary numerical results. Lastly, for MDPs with multiple reward functions, I developed a computationally efficient planning algorithm that finds Pareto-optimal stochastic policies.</div>
</span></div><div class="gmail_default"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"><br></span></div><div class="gmail_default"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"><b>Thesis Advisor:</b><a href="mailto:mwalter@ttic.edu" target="_blank"><b> Matthew Walter</b></a></span></div><div class="gmail_default"><br></div><div class="gmail_default"><br></div></div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><span style="font-family:arial,helvetica,sans-serif;font-size:x-small">Mary C. Marre</span><br></div><div><div><font face="arial, helvetica, sans-serif" size="1">Faculty Administrative Support</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1">6045 S. 
Kenwood Avenue</font></i></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL  60637</font></i><br></font></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif" size="1">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div><br></div></div>