<div dir="ltr"><div class="gmail_default" style="font-size:small"><b style="font-family:verdana,sans-serif;font-size:large;color:rgb(80,0,80);background-color:rgb(207,226,243)"><span class="gmail-il">Thesis</span> <span class="gmail-il">Defense</span>: Falcon Dai, TTIC</b><br clear="all"></div><div class="gmail_default" style="font-size:small"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);font-weight:700;vertical-align:baseline;white-space:pre-wrap"><br></span></div><div class="gmail_default" style="font-size:small"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);font-weight:700;vertical-align:baseline;white-space:pre-wrap">When:        </span><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap">Wednesday, September 7th from </span><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,0)"><b>12:30 - 2:30 pm CT</b></span><br></div><div class="gmail_default" style="font-size:small"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,0)"><b><br></b></span></div><div class="gmail_default" style="font-size:small"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;vertical-align:baseline;white-space:pre-wrap"><span style="background-color:rgb(255,255,255);color:rgb(0,0,0);font-weight:700">Virtually:</span><font color="#000000" style="background-color:rgb(255,255,255)">   <a href="https://uchicago.zoom.us/j/98534120153?pwd=SmRDMFo1UTA1M3pNZEZOblhkWG9yQT09" style=""> </a></font></span><b 
style="background-color:rgb(255,255,255);font-family:arial,sans-serif;white-space:pre-wrap"><font color="#0000ff" style=""><a href="https://uchicago.zoom.us/j/98534120153?pwd=SmRDMFo1UTA1M3pNZEZOblhkWG9yQT09" style="">Join Virtually Here</a></font></b><span style="background-color:rgb(255,255,255)"><span style="color:rgb(0,0,0);font-family:arial,sans-serif;white-space:pre-wrap"> </span><span style="color:rgb(0,0,0);font-family:arial,sans-serif;white-space:pre-wrap">    </span></span></div><div class="gmail_default" style="font-size:small"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,0)"><br></span></div><div class="gmail_default" style="font-size:small"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,0)"><span style="font-weight:700;background-color:rgb(255,255,255)">Who:          Falcon Dai, TTIC</span></span></div><div class="gmail_default" style="font-size:small"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,0)"><span style="background-color:rgb(255,255,255);font-family:Arial,Helvetica,sans-serif;font-variant-numeric:normal;font-variant-east-asian:normal;color:rgb(34,34,34);vertical-align:baseline"><span class="gmail-il" style="color:rgb(0,0,0);font-family:arial,sans-serif;font-weight:700"><br></span></span></span></div><div class="gmail_default" style="font-size:small"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,0)"><span 
style="background-color:rgb(255,255,255);font-family:Arial,Helvetica,sans-serif;font-variant-numeric:normal;font-variant-east-asian:normal;color:rgb(34,34,34);vertical-align:baseline"><span class="gmail-il" style="color:rgb(0,0,0);font-family:arial,sans-serif;font-weight:700"><br></span></span></span></div><div class="gmail_default" style="font-size:small"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,0)"><span style="background-color:rgb(255,255,255);font-family:Arial,Helvetica,sans-serif;font-variant-numeric:normal;font-variant-east-asian:normal;color:rgb(34,34,34);vertical-align:baseline"><span class="gmail-il" style="color:rgb(0,0,0);font-family:arial,sans-serif;font-weight:700">Thesis</span><span style="color:rgb(0,0,0);font-family:arial,sans-serif;font-weight:700"> Title</span>: </span><span style="background-color:rgb(255,255,255);font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34)">On Reward Structures of Markov Decision Processes</span>
</span></div><br class="gmail-Apple-interchange-newline"><div class="gmail_default" style="font-size:small"><b>Abstract</b>: <br></div><div class="gmail_default" style="font-size:small"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"><div style="white-space:normal">A Markov decision process can be parameterized by a transition kernel and a reward function. Both play essential roles in the study of reinforcement learning, as evidenced by their presence in the Bellman equations. In my inquiry into the various kinds of "costs" associated with reinforcement learning, inspired by the demands of robotic applications, I discovered that rewards prove central to understanding the structure of a Markov decision process and that reward-centric notions can elucidate important concepts in reinforcement learning.<br><br>Specifically, I studied the sample complexity of policy evaluation and developed a novel estimator with an instance-specific error bound of $\widetilde{O}(\sqrt{\nicefrac{\tau_s}{n}})$ for estimating a single state value. Under the online regret minimization setting, I refined the diameter, a transition-based MDP constant, into a reward-based constant, the maximum expected hitting cost, and with it provided a theoretical explanation of how a well-known technique, potential-based reward shaping, can accelerate learning with expert knowledge. In an attempt to study safe reinforcement learning, I modeled hazardous environments with irrecoverability and proposed a quantitative notion of safe learning via reset efficiency. In this setting, I modified a classic algorithm to account for resets, achieving promising preliminary numerical results. Lastly, for MDPs with multiple reward functions, I developed a computationally efficient planning algorithm that finds Pareto-optimal stochastic policies.</div>
</span></div><div class="gmail_default" style="font-size:small"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"><br></span></div><div class="gmail_default" style="font-size:small"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;vertical-align:baseline;white-space:pre-wrap"><b>Thesis Advisor:</b><a href="mailto:mwalter@ttic.edu" style=""><b> Matthew Walter</b></a></span></div><div class="gmail_default" style="font-size:small"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,0)"><b><br></b></span></div><div class="gmail_default" style="font-size:small"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,0)"><b><br></b></span></div><div class="gmail_default" style="font-size:small"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,0)"><b><br></b></span></div><div class="gmail_default" style="font-size:small"><span style="font-variant-numeric:normal;font-variant-east-asian:normal;font-family:arial,sans-serif;color:rgb(0,0,0);vertical-align:baseline;white-space:pre-wrap;background-color:rgb(255,255,0)"><b><br></b></span></div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><span style="font-family:arial,helvetica,sans-serif;font-size:x-small">Mary C. 
Marre</span><br></div><div><div><font face="arial, helvetica, sans-serif" size="1">Faculty Administrative Support</font></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1"><b>Toyota Technological Institute</b></font></i></div><div><i><font face="arial, helvetica, sans-serif" color="#3d85c6" size="1">6045 S. Kenwood Avenue</font></i></div><div><font size="1"><i><font face="arial, helvetica, sans-serif" color="#3d85c6">Chicago, IL  60637</font></i><br></font></div><div><b><i><a href="mailto:mmarre@ttic.edu" target="_blank"><font face="arial, helvetica, sans-serif" size="1">mmarre@ttic.edu</font></a></i></b></div></div></div></div></div></div>