Complexity Science » TV Annual Retreat Projects 2017

Re: Some Reinforcement Learning Problem

Gian Lorenzo Spisso — Tue, 09 May 2017 10:36:27 GMT

It's great you've found this; it was quite interesting to read. It is also nice that we basically got all the features of the game right (although we were reinventing the wheel a bit). Player 1 seemed quite obvious, but apparently player 2 has several mixed strategies he could follow. We would expect any computer learner to be able to settle on the optimal strategy for player 1 and on one of the mixed strategies for player 2.

It is interesting that the author mentions that Player 2 is the advantaged one, as he can choose between more than one strategy, which might yield some advantage if P1 makes a mistake (at least that is my understanding of the author's final comment).

It would be interesting to see if our simple learner can:

1) Actually obtain the optimal strategies

2) Win consistently against a human

Since we have to present these results tomorrow:

1) I can draft an outline of the slide content today

2) Can someone else put it into some slide format?

3) Robert: is the code up and running? Do you think by tomorrow it would be feasible to have a run of the two players playing against each other?

Re: Some Reinforcement Learning Problem

Yihe Lu — Fri, 05 May 2017 17:13:27 GMT

Hi all,

I found a theoretical paper (Schwartz, 1959) on the solution to 2-player spoof (in general with n coins per player, all other rules the same).

I have attached it in case you want to have a look.

Otherwise, here are the key points you may want to take home (subject to any misunderstanding or bias of mine):

1. It is a fair game.

2. Both players play mixed strategies, and these are optimal in the sense that a player who deviates from them suffers a loss in expectation.

3. The proof uses the very classical (old-fashioned) game-theoretic approach throughout, which is just listing all possible strategies with their payoff functions.

At the least, hopefully you will see something similar in the convergence of your AI's strategies in the two-player game.

Attachments (follow link to download)

1959.pdf (846 KB)

Re: Something nontrivial with CUDA

Tim Pollington — Sun, 30 Apr 2017 18:40:24 GMT

Update: OpenCV 2.4 now running to perform elementary matrix operations on data.

Attached: Jack's links and mine all zipped into this one file. Download it yourself to your laptop prior to the retreat in case WiFi is poor.

Plan: all compilation and running of code will happen on my desktop computer. People can write code and transfer it to my machine via USB memory stick. Sorry, that's the most lo-tech way we could think of: I didn't want to try setting up an ssh server on this machine, as I'm treating this CUDA installation as quite fragile. I can't afford to reboot the machine, or I would have to cancel this WARP.

Attachments (follow link to download)

cuda_warps.zip (41 MB)

Game theory and vaccination

Sophie Meakin — Fri, 07 Apr 2017 11:58:51 GMT

The problem of whether to vaccinate or not can be thought of as a game theory problem: choosing to vaccinate incurs some cost, but prevents an individual from becoming infected; however, due to herd immunity if enough other people vaccinate then an individual can reap the benefits of vaccination without actually vaccinating. My idea was to write down the vaccination problem as a game theory problem and investigate that on its own, or to tie it into an epidemic model. It might be nice to code it up and see what happens (maybe on a lattice?).
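One way the "write it down as a game" step could look, as a rough sketch only. Everything here is an assumption made for illustration: the risk function `infection_risk` (which falls to zero at the herd-immunity threshold 1 - 1/R0), the parameter `relative_cost` (cost of vaccinating divided by cost of infection), and the function names themselves. An individual vaccinates only while the infection risk exceeds the relative cost, so the equilibrium coverage is where the two are equal.

```python
def infection_risk(coverage, r0):
    """Assumed risk to an unvaccinated individual at a given vaccine coverage.

    Illustrative form only: the risk is zero once coverage reaches the
    herd-immunity threshold 1 - 1/r0.
    """
    susceptible = 1.0 - coverage
    if r0 * susceptible <= 1.0:
        return 0.0
    return 1.0 - 1.0 / (r0 * susceptible)


def equilibrium_coverage(relative_cost, r0, tol=1e-9):
    """Coverage at which vaccinating and not vaccinating pay the same.

    relative_cost = (cost of vaccinating) / (cost of infection), a
    hypothetical parameter for this sketch.
    """
    # If vaccination costs more than the risk even with nobody vaccinated,
    # nobody has an incentive to vaccinate.
    if infection_risk(0.0, r0) <= relative_cost:
        return 0.0
    lo, hi = 0.0, 1.0
    # Bisection: the risk decreases as coverage increases.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if infection_risk(mid, r0) > relative_cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    r0 = 4.0
    print("herd immunity threshold:", 1.0 - 1.0 / r0)
    print("equilibrium coverage:", equilibrium_coverage(0.25, r0))
```

Under these assumptions the equilibrium coverage sits below the herd-immunity threshold whenever vaccination has any cost, which is the free-riding effect described above. A lattice version would replace `infection_risk` with the outcome of a local epidemic simulation.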

Very open to ideas and discussion before the retreat; I'll post updates as and when they come.

Counting Sea Lions

Rob Eyre — Thu, 06 Apr 2017 10:27:19 GMT

A little while ago some of the MSc students were complaining about some homework that required them to classify very bad photos of fish (from a Kaggle competition). Not sure how well they did (hopefully well). I did initially think a group of us could have a go at the problem. However, Ayman suggested trying a competition about sea lions instead, as they are cuter. Information can be found at https://www.kaggle.com/c/noaa-fisheries-steller-sea-lion-population-count. Essentially we would develop a method to count the number of sea lions in an aerial photo. We might not get very far in three days but it's worth a go. The only issue seems to be that the dataset is a little on the large side, so anyone with a 1-2TB HDD would be useful. We might need to download the dataset in advance due to poor Wi-Fi at the hostel.

Outsourcing Michael's PhD

Rob Eyre — Thu, 06 Apr 2017 10:03:56 GMT

In case you've missed all of Michael's many and varied talks, he does stuff to do with (long sentence, take a breath now) matching the right tool to the right job in a cost effective fashion by taking advantage of the covariances in Gaussian process interpolation. He has spent way too much time developing theories and running simulations and has therefore not actually tried it out on many (if any) real applications/datasets. It could be fun to see whether his stuff actually works or not. Therefore people who do this project can find suitable open access datasets and run his method on them. In doing this they will learn (to quote Michael) "Gaussian process regression and stochastic black box optimisation". Michael promises he won't take the results and run. At the very least you'll all appear in his acknowledgements (names perhaps spelt incorrectly).

Re: Some Reinforcement Learning Problem

Robert Gowers — Wed, 22 Mar 2017 22:22:25 GMT

In this post Gian Lorenzo Spisso wrote:

Hey Robert, this was a very interesting read! Thanks for sharing this!

On the nose of it, it feels that any paradoxical result rests on the definition of the game, and I would tend to side with those claiming that its ill-posedness is the source of the paradox. But even if that's not the case, it seems that it might be a defining exactly what kind of game should be played. I'd say we'd like to keep this as tractable as possible so that we can code something and get some results in the short time available at the retreat. I might be wrong though and would love to chat about this at the retreat.

Also, I wanted to add: a summary of the game of spoof can be found here.

Discussing with Ayman, we might actually go the way of making the guessing simultaneous as a starter to make it even easier.

Yes, the problem has often been ill-posed. My intention would be to avoid the philosophical confusion by avoiding having a "perfect" predictor and simply having the predictor as another agent with their own strategy who makes a secret prediction. Reading up on spoof, I would say that it is a similar type of problem, but with a greater number of variables (it seems like there should be a way to make two-player spoof equivalent to the Newcomb problem: the number of coins you put in your hand is like the prediction stage).

Re: Some Reinforcement Learning Problem

Gian Lorenzo Spisso — Wed, 22 Mar 2017 16:19:24 GMT

*defining

This is getting out of hand quickly.

Re: Some Reinforcement Learning Problem

Gian Lorenzo Spisso — Wed, 22 Mar 2017 16:18:51 GMT

*it might be hard definng exactly...

Apologies, I like my English like I like my operative system: obscure and unfriendly to the user.

Re: Some Reinforcement Learning Problem

Gian Lorenzo Spisso — Wed, 22 Mar 2017 16:15:15 GMT

Hey Robert, this was a very interesting read! Thanks for sharing this!

On the nose of it, it feels that any paradoxical result rests on the definition of the game, and I would tend to side with those claiming that its ill-posedness is the source of the paradox. But even if that's not the case, it seems that it might be a defining exactly what kind of game should be played. I'd say we'd like to keep this as tractable as possible so that we can code something and get some results in the short time available at the retreat. I might be wrong though and would love to chat about this at the retreat.

Also, I wanted to add: a summary of the game of spoof can be found here.

Discussing with Ayman, we might actually go the way of making the guessing simultaneous as a starter to make it even easier.

Re: Some Reinforcement Learning Problem

Robert Gowers — Wed, 22 Mar 2017 14:12:35 GMT

I have added a WARP topic about Newcomb's problem, which is potentially a reinforcement learning problem. It's possible that we could integrate the two WARPs, depending on what is decided for this WARP.

Devising strategies for multiple iterations of Newcomb's problem

Robert Gowers — Wed, 22 Mar 2017 14:10:03 GMT

Newcomb's Problem is an interesting problem that has caused great controversy in the field of decision theory. In Newcomb's Problem there are two boxes and two agents. One box (A) always contains a small amount of money (utility) while the other (B) contains either a large amount of money or nothing. Each round of the game, one agent, the selector, can either choose to take the contents of both boxes or just box B. Prior to each choice, however, the other agent, the predictor, attempts to predict the choice of the selector. A prediction of both boxes means that nothing is placed in box B, while a prediction of choosing a single box means the large amount is placed in box B instead.

The aim of the selector is to acquire as much money as possible, while that of the predictor is simply to predict the choices of the selector as accurately as possible. Thus this game provides an interesting case for applied game theory: understanding the game is very easy, but identifying the optimal strategies for the selector and the predictor across a number of games is nontrivial. This project would likely benefit from reinforcement learning, and thus it is possible that this WARP could be associated or integrated with the reinforcement learning WARP.
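To make the iterated setup concrete, here is a minimal sketch of the game loop described above. The payoff values `SMALL` and `LARGE`, the majority-vote `frequency_predictor`, and the two fixed selector strategies are all assumptions chosen for illustration; none of them are part of the original problem statement.

```python
SMALL, LARGE = 1, 10  # assumed utilities for box A and a filled box B


def payoff(choice, prediction):
    """Selector's payoff for one round; choice and prediction are 'one' or 'two'."""
    box_b = LARGE if prediction == "one" else 0  # B is filled only if one box is predicted
    return box_b if choice == "one" else box_b + SMALL


def frequency_predictor(history):
    """Hypothetical predictor: predict the selector's most frequent past choice."""
    if not history:
        return "one"
    ones = history.count("one")
    return "one" if 2 * ones >= len(history) else "two"


def always_one(history):
    return "one"


def always_two(history):
    return "two"


def run(selector, rounds):
    """Return (selector's total payoff, predictor's accuracy) over repeated rounds."""
    history, total, correct = [], 0, 0
    for _ in range(rounds):
        prediction = frequency_predictor(history)  # made before the choice is seen
        choice = selector(history)
        total += payoff(choice, prediction)
        correct += prediction == choice
        history.append(choice)
    return total, correct / rounds


if __name__ == "__main__":
    for selector in (always_one, always_two):
        print(selector.__name__, run(selector, 100))
```

Against this particular predictor, a selector that always takes both boxes gains only in the first round. A learning selector or a smarter predictor could be swapped in by replacing `selector` or `frequency_predictor`.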

Re: Something nontrivial with CUDA

Tim Pollington — Tue, 14 Mar 2017 17:52:51 GMT

Great idea Jack. I can bring my desktop, which has a GeForce GTX 750 Ti with compute capability 5.0. The nvcc compiler is running and I've already got it to do vector sums. I'm currently experimenting with simple matrix operations to compare it with my CPU and to explore the size limit of arrays I can put on the device, since host<=>device transfers tend to take up most of the computation time. I'm using cuBLAS, a linear algebra library, for matrix operations, as it probably chooses the best algorithm for the specific GPU architecture and requires the least CUDA knowledge(!), but I'm also happy to go into the detail.

Happy to preread beforehand. I have zero PDE experience so would be good if you can walk us through the problem you have and how you've done it in C.

Resources:

Powerpoints from a 2-day CUDA course at the Southampton summer school (http://ngcm.soton.ac.uk/summer-academy/index.html) last year, attached.

Attachments (follow link to download)

cudacourse.zip (22 MB)

Re: Some Reinforcement Learning Problem

Ayman Boustati — Sat, 11 Mar 2017 22:33:24 GMT

This sounds good! I am guessing the game isn't too difficult to code up? We could try it and see how we go; if it is an easy problem, we can look at other games like it. I think this should be fun and we can learn a lot, since RL and game theory share a lot of ideas.

Re: Some Reinforcement Learning Problem

Gian Lorenzo Spisso — Sat, 11 Mar 2017 17:10:15 GMT

It should be a feasible/interesting task for reinforcement learning!

Re: Some Reinforcement Learning Problem

Gian Lorenzo Spisso — Sat, 11 Mar 2017 17:09:40 GMT

Hey Ayman, Giovanni and I were thinking of trying to teach a very simple game to a computer. The game is Spoof, a drinking game where everyone picks a number of coins from 0 to 3 and, in turn, people try to guess the total number held. The game should be simple enough, but at the same time the incomplete-information component makes it very interesting!
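For a feel of how little code the game needs, here is a rough two-player sketch. It makes several simplifying assumptions that are not in the description above: hands are drawn uniformly at random rather than chosen strategically, a later player may not repeat an earlier guess, and a round with no correct guess has no winner. The names `random_guesser`, `informed_guesser`, `play_round` and `tally` are made up for this example.

```python
import random

MAX_COINS = 3  # each player hides between 0 and 3 coins, as in the post


def feasible_totals(own_coins):
    """Totals a player cannot rule out in a two-player game, given their own hand."""
    return range(own_coins, own_coins + MAX_COINS + 1)


def random_guesser(own_coins, legal, rng):
    """Baseline: ignore the hand and pick any guess that is still allowed."""
    return rng.choice(legal)


def informed_guesser(own_coins, legal, rng):
    """Only guess totals that are consistent with the guesser's own hand."""
    consistent = [g for g in legal if g in feasible_totals(own_coins)]
    return rng.choice(consistent or legal)


def play_round(guessers, rng):
    """Play one two-player round; return the winner's index, or None for no winner."""
    hands = [rng.randint(0, MAX_COINS) for _ in guessers]
    total = sum(hands)
    guesses = []
    for hand, guesser in zip(hands, guessers):
        # Assumed rule: a later player may not repeat an earlier guess.
        legal = [g for g in range(2 * MAX_COINS + 1) if g not in guesses]
        guesses.append(guesser(hand, legal, rng))
    return guesses.index(total) if total in guesses else None


def tally(guessers, rounds, seed=0):
    """Count wins per player; index 2 counts rounds that nobody guessed right."""
    rng = random.Random(seed)
    wins = [0, 0, 0]
    for _ in range(rounds):
        winner = play_round(guessers, rng)
        wins[2 if winner is None else winner] += 1
    return wins


if __name__ == "__main__":
    print(tally([informed_guesser, random_guesser], 4000))
```

A learning agent would replace one of the guesser functions and would also need to choose its own hand, which is where the incomplete information starts to matter.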

Can we predict human behaviour using brain imaging data (fMRI)?

Jessie Liu — Fri, 10 Mar 2017 14:57:37 GMT

Aim: I am very interested to see whether certain human behaviours (e.g. drinking/smoking habits, cognitive functions, etc.) or even psychiatric disorders like depression can be predicted from functional MRI (fMRI).

Data: I have got LOTS of BIG DATA we can play around with, including hundreds of behavioural/demographic measures and fMRI for hundreds of subjects.

Plan: I will first explain what sort of data I have, what exactly fMRI is, and how it can be used. I haven't explored many of the behavioural/demographic measures I have, so there is no particular behaviour in mind that can or should be predicted. So the next step, I think, is to look through the data to see which measures are predictable (strongly correlated with fMRI). Once we have selected the target we want to predict, we will need to build models to do the prediction.

Challenges: 1. Dimension reduction: Omg the data are of very high dimensionality. I know some dimension reduction techniques that I could share and will be very happy to learn others.
2. Feature selection: Omg there are so many feature selection methods... We should have a really good discussion on this. I am hoping to apply some DEEP LEARNING technique to this step but I know little about it.
3. Model fitting: Omg there are also so many models that can do the prediction (all kinds of linear and non-linear). We will see where we end up.

I know this may sound like a big project. But there are many possibilities for where this 7-hour WARP could end up:
1. We may end up discovering which behaviours/demographics are most related to, or predictable from, fMRI.
2. We may end up learning about and discussing all kinds of dimension reduction/feature selection techniques, and doing a little comparison/summary of them.
3. We may end up comparing the advantages/disadvantages of linear vs. non-linear models.
4. Or we may end up just turning the outline of this project into a more detailed plan, e.g. apart from the challenges I listed above, what other components should be included in prediction? How can we systematise a prediction procedure that improves reproducibility and minimises variance?
.............

Here are some related papers:
Using connectome-based predictive modeling to predict individual behavior from brain connectivity

DEEP LEARNING FOR ROBUST FEATURE GENERATION IN AUDIOVISUAL EMOTION RECOGNITION

Re: Some Reinforcement Learning Problem

Ayman Boustati — Thu, 09 Mar 2017 15:51:54 GMT

I found three RL environments we can try out:

OpenAI Gym: this seems like the easiest and most straightforward to use.
DeepMind Lab: this one focuses on 3D navigation puzzles.
Project Malmo: this one focuses on building RL agents for Minecraft.

We could either try to play around with one of these, or come up with our own problem to solve.

Something nontrivial with CUDA

Jack Binysh — Fri, 03 Mar 2017 19:08:22 GMT

I want to try using CUDA (the library you use for programming GPUs, used with C code) to write something basic but not totally basic: a PDE simulation, a matrix multiplication, etc. The idea would be to do a bit of reading and decide on a thing to simulate beforehand, then try and code it up in the ~7hrs we have. Depending on time, we could then try profiling it, improving it, comparing it to CPU implementations, etc.

We need a (compute card) GPU: we can ssh to tinis, or try and 'rent' one from Amazon for the retreat, or if someone has a better idea... or a GPU...

links: a book I got recommended

http://www.hds.bme.hu/~fhegedus/C++/programming_massively_parallel_processors.pdf

Papers on PDE simulations with CUDA:

http://sma.epfl.ch/~lmichel/cuda_report.pdf

https://arxiv.org/pdf/1004.0480.pdf

Some Reinforcement Learning Problem

Ayman Boustati — Fri, 03 Mar 2017 11:47:46 GMT

I am quite interested in learning more about Reinforcement Learning. Reinforcement Learning is a field that combines machine learning and game theory (link). I think it will be a good idea to explore the basics of reinforcement learning during the retreat, in the form of a simple (but cool!) project. At the moment, all I have in mind is the multi-armed bandit problem, which Sophie kindly sent me a few resources on. However, I am open to any suggestions for other interesting problems, provided they fit the timescale allocated to the WARPs (i.e. approx. 7 hours). I want to start the conversation about possible projects on this forum or in person in the department, so we can have a clear idea of what to do during the retreat.

Looking forward to suggestions!
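As a starting point for the multi-armed bandit idea, here is a small epsilon-greedy sketch. It is one standard approach among several, and the arm probabilities, the value of `epsilon` and the function name are assumptions picked for illustration only.

```python
import random


def epsilon_greedy(arm_probs, steps, epsilon, rng):
    """Play a Bernoulli bandit with an epsilon-greedy agent.

    arm_probs are the hidden success probabilities of each arm (assumed
    values; the agent never reads them directly, only the rewards).
    Returns (estimated values, pull counts, total reward).
    """
    n_arms = len(arm_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running average reward per arm
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1 if rng.random() < arm_probs[arm] else 0
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return values, counts, total_reward


if __name__ == "__main__":
    rng = random.Random(0)
    values, counts, total = epsilon_greedy([0.2, 0.5, 0.8], 5000, 0.1, rng)
    print("estimates:", [round(v, 2) for v in values])
    print("pulls:", counts, "total reward:", total)
```

With a small `epsilon` the agent spends most of its pulls on the arm it currently believes is best while still sampling the others occasionally, which is the exploration/exploitation trade-off in its simplest form.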
