
Title image adapted from bigocheatsheet.com

In this series of posts we’ll cover important concepts from computability theory and techniques for designing efficient algorithms for combinatorial, algebraic, and (if I can learn enough about it) number-theoretic problems. It’ll serve as a compact way to familiarize ourselves, through Python, with basic concepts from computational complexity theory such as NP-Completeness.

The only prerequisite is that you know what Big-O notation is conceptually, even if you don’t have a good intuition for why certain algorithms have one complexity versus another. Basically, know that it is a relative representation of the complexity of an algorithm; more specifically, it describes the order of growth of execution time as a function of the length of the input.
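To make that concrete, here is a small Python sketch (my own illustration, not from the series) contrasting a linear scan, which grows as O(n), with a binary search, which grows as O(log n) in the length of the input:

```python
import bisect


def linear_search(sorted_items, target):
    """O(n): in the worst case we inspect every element."""
    for index, value in enumerate(sorted_items):
        if value == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """O(log n): each comparison halves the remaining search space."""
    index = bisect.bisect_left(sorted_items, target)
    if index < len(sorted_items) and sorted_items[index] == target:
        return index
    return -1


if __name__ == "__main__":
    data = list(range(1_000_000))
    # Both return the same answer; the difference is how the amount of
    # work grows as len(data) grows.
    print(linear_search(data, 999_999), binary_search(data, 999_999))
```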

Sutton’s Temporal-Difference Learning

A critical aspect of research is the reproduction of previously published results. Yet most will find reproducing research challenging, since important parameters needed to reproduce results are often not stated in the papers. In the past 5 years there has been a growing reckoning with this lack of reproducibility [1][2][3], and it isn’t an issue for wet-lab science alone [4]. The obvious benefit of reproduction is that it aids your own understanding of the results, which then enables you to extend and compare new contributions against existing publications.

As a grad student working on Reinforcement Learning problems, I struggled with the intuition behind Sutton’s Learning to Predict by the Methods of Temporal Differences paper. In this post we’ll reproduce the paper’s “A Random-Walk” example (Page 19, Section 3.2). I recommend at least skimming the paper and/or chapters 6 & 12 of Sutton’s latest Reinforcement Learning textbook. At minimum, it would help if the reader has a basic understanding of Markov Decision Processes and Q-Learning.

Temporal-Difference (TD) algorithms work without an explicit model and learn from the experience and outcomes of iterating over multiple episodes (or sequences). TD learning is similar to Monte Carlo methods; however, TD can learn from individual steps without needing the final result. Some of the other differences are listed below, followed by a short code sketch of the two update rules:

Monte Carlo (MC) methods vs. Temporal-Difference (TD) learning:

  • MC must wait until the end of the episode before the return is known; TD can learn online after every step and does not need to wait until the end of the episode.
  • MC has high variance and low bias; TD has low variance and some bias.
  • MC does not exploit the Markov property; TD exploits the Markov property.
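To make the contrast concrete, here is a minimal, hypothetical sketch of the two update rules (not the paper’s code); the value table V, the step size alpha, the discount gamma, and the episode format are all assumptions for illustration.

```python
def monte_carlo_update(V, episode, alpha=0.1, gamma=1.0):
    """MC: wait until the episode ends, then move each visited state's
    value estimate toward the full observed return G.

    episode is assumed to be a list of (state, reward) pairs, where
    reward is received on leaving the state; V maps states to values."""
    G = 0.0
    for state, reward in reversed(episode):
        G = reward + gamma * G
        V[state] += alpha * (G - V[state])


def td0_update(V, state, reward, next_state, alpha=0.1, gamma=1.0):
    """TD(0): update immediately after a single step, bootstrapping
    from the current estimate of the next state."""
    td_target = reward + gamma * V[next_state]
    V[state] += alpha * (td_target - V[state])
```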

Rather than computing the estimate of only the next state, TD can estimate n steps into the future. In the case of TD(λ), λ controls how myopic or far-sighted the credit assignment is, and its value can be tuned for a performance/speed tradeoff. λ is also called the trace-decay parameter, with 0 ≤ λ ≤ 1. The higher the value, the longer-lasting the traces; a larger proportion of the credit from a reward is then given to more distant states and actions. λ = 1 is essentially Monte Carlo.
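Here is a minimal tabular sketch of how the trace-decay parameter enters the update, using accumulating eligibility traces; the transition format, step size, and discount are assumptions rather than the paper’s exact setup.

```python
import numpy as np


def td_lambda_episode(V, episode, alpha=0.1, gamma=1.0, lam=0.9):
    """One episode of tabular TD(lambda) with accumulating eligibility traces.

    V is a numpy array of state values; episode is a list of
    (state, reward, next_state) transitions."""
    traces = np.zeros_like(V)
    for state, reward, next_state in episode:
        td_error = reward + gamma * V[next_state] - V[state]
        traces[state] += 1.0            # mark the state as recently visited
        V += alpha * td_error * traces  # credit all recently visited states
        traces *= gamma * lam           # decay: lam=0 ~ TD(0), lam=1 ~ Monte Carlo
    return V
```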

[Figure adapted from Blei 2011]

I found Professor Julian McAuley’s work at UCSD when I was searching for academic work identifying the ontology and utility of products on Amazon. Professor McAuley and his students have done impressive work inferring networks of substitutable and complementary items. They constructed a browseable product graph of related products and discovered topics, or ‘microcategories’, associated with product relationships, which they use to infer those networks of substitutable and complementary products. Much of this work relies on topic modeling, and since I’ve never applied it in academia or at work, this blog will be a practical introduction to Latent Dirichlet Allocation (LDA) through code.

More broadly, what can we do with LDA, and what do we need to know about it?

  • It is an Unsupervised Learning Technique that assumes documents are produced from a mixture of topics
  • LDA extracts key topics and themes from a large corpus of text
  • Each topic is an ordered list of representative words (order is based on the importance of the word to the topic)
  • LDA describes each document in the corpus based on allocation to the extracted topics
  • There are many domain-specific methods for creating training datasets
  • It is easy to use for exploratory analysis

We’ll be using a subset (reviews_Automotive_5.json.gz) of the 142.8 million reviews spanning May 1996 – July 2014 that Julian and his team have compiled and provided in a very convenient manner on their site.
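As a preview of the code-driven approach, here is a hedged sketch that loads the review text from that file and fits an LDA model with gensim; the ‘reviewText’ field name, the preprocessing, and the hyperparameters are my assumptions for illustration, not the exact pipeline we’ll walk through.

```python
import gzip
import json

from gensim import corpora, models
from gensim.utils import simple_preprocess

# Load and tokenize the review text (the file is assumed to contain one
# JSON object per line, with the text under a 'reviewText' field).
documents = []
with gzip.open("reviews_Automotive_5.json.gz", "rt") as f:
    for line in f:
        review = json.loads(line)
        documents.append(simple_preprocess(review.get("reviewText", "")))

# Map each token to an integer id and build a bag-of-words corpus.
dictionary = corpora.Dictionary(documents)
dictionary.filter_extremes(no_below=5, no_above=0.5)  # drop very rare/common tokens
corpus = [dictionary.doc2bow(doc) for doc in documents]

# Fit LDA and print the top words for each extracted topic.
lda = models.LdaModel(corpus, num_topics=10, id2word=dictionary, passes=5)
for topic_id, words in lda.print_topics(num_topics=10, num_words=8):
    print(topic_id, words)
```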