Moed B exam and final grades are available.

There was a +5 point factor, and grades in the range 55-60 were rounded up to 60.

The exam solution is also available. Enjoy!

Regev

The final grades are in: http://ml-tau-2015.wikidot.com/home-assignments

The project file includes the accuracy calculated on the test set, the derived grade, and the final grade, which is the derived grade minus a penalty for projects that had technical problems.

Let us know if there are any issues. In particular, there are 4 students who don't have a grade on their project yet (and therefore a low final grade) - your grade will be updated once the issues are resolved.

Thanks,

Regev.

Were the project grades (before penalty) calculated as described in the project description?

If I apply the formula for an accuracy of 86, then 20+80*(86/92) = 94.78, yet in the grades pdf the grade for such accuracy is always 94.

What's the reason for that? Are there decimals that are used in the calculation that are not shown?
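One possible explanation (my own guess, not confirmed by the staff) is that the displayed grade is the formula's result truncated rather than rounded. A quick check:

```python
import math

def derived_grade(accuracy, top=92):
    """Project grade formula from the spec: 20 + 80 * (accuracy / top)."""
    return 20 + 80 * (accuracy / top)

raw = derived_grade(86)        # about 94.78
floored = math.floor(raw)      # 94 -- matches the PDF if grades are truncated
rounded = round(raw)           # 95 -- would not match the PDF
```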

Thanks

Do I still have to submit a runnable file? (The user quota at the university is 2 GB, and I don't have room for it.)

Thanks

1. What do you mean by 'hard copy of the files'? Should we submit a flash drive with the ZIP on it in the course box?

2. Should we include the training files in case you would like to run our train function?

3. If one uses code outside of Matlab for training, should one submit a script (a shell script) that does ALL the pretraining (text manipulation, word2vec, etc.) and training?

Thanks!

Will all the test data files be in the following name format: ([0-9]*).txt?

I need the file names to be numeric for the test data matrix.

Thanks

We would like to use an official software package (which, as mentioned, is allowed) that is written in Python, or another one that is written in Java.

Is it OK to write a very short Python/Java wrapper for this software (and use it in the project), so that its output will be more convenient to work with in Matlab?

(for those who are not interested - sorry for the spam)

Also, can we use pre-trained word vectors from an online library?

Thanks.

1. Before I can classify, I have to read data from a text file that I will include in the final .zip. Is this OK? As long as the folder is extracted and the files are in the same directory, my classifier will work.

2. Does the order of the files in predicted.txt matter?

3. I understand that we have an hour to classify - around how many files will we have to classify?

Thanks

2. Homework 4 will be returned today at the usual place. The exercise grades will be uploaded later today. Please take a look at your exercise grades to make sure you have all of them. If there are any questions or reservations about your exercise grades, please let me know as soon as possible, and in any case by next week.

A few clarifications following questions from several students:

1. Any official software ("it has a web page") may be used in Matlab. Running Python from within MATLAB is not a solution; we need to run the check in MATLAB. See also Regev's earlier announcement.

2. We will run on SYSTEM computers such as NOVA.

3. You may send a PATH instead of attaching files, and we will copy the entire directory and run go.m from inside it.

4. We will not train using your code, unless we want to verify that it indeed works. The training code should be documented and ready to run.

5. Please make sure your code runs as required and has no hidden dependencies. Your code must not access other locations (for example, in your directory)

I wanted to re-iterate a few things about the project:

1. The training part will not be part of what is run to test your project. You will do the training yourself, and your predictor function will return a ready-made classifier that will only be run on the test set.

2. The code should be in Matlab only. I know that a few people asked if Matlab could run Python, etc. I checked with Prof. Wolf, and in fact the code should be in Matlab only.

Regev

Thanks

I encountered question 4 from exam 2013b, about transformation of points in the context of soft margin SVM.

In the solution, it is stated that the transformation x -> ax will not change the classification (no explanation is included).

I disagree. To my understanding, if we keep the regularization parameter C constant under the transformation, then x -> ax changes the scale of the problem, which changes the relative weight of the error term in the optimization objective, which could change the classification.
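To make the scaling argument concrete, here is a sketch in my own notation (not taken from the course solution):

```latex
\min_{w,b,\xi}\;\tfrac{1}{2}\|w\|^2 + C\sum_i \xi_i
\quad\text{s.t.}\quad y_i(\langle w, x_i\rangle + b) \ge 1 - \xi_i,\;\; \xi_i \ge 0.

% Substitute x_i \to a x_i and w \to w/a: every constraint is unchanged,
% but the objective becomes
\tfrac{1}{2a^2}\|w\|^2 + C\sum_i \xi_i,
% which is the original problem with C replaced by a^2 C.
```

So training on the scaled data with the same C is equivalent to training on the original data with a different trade-off parameter, and the classification can indeed change.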

Who is correct?

Thanks in advance,

John Francis Anthony Pastorius III

Will the perceptron classify the same after the transformations

1. T(x) = Dx

2. T(x) = Ux

My guess is that the perceptron will classify the same for T(x) = Ux and will not classify the same for T(x) = Dx.
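A small check I ran (my own example, assuming D is diagonal and U is orthogonal): an orthogonal U preserves inner products, so the perceptron makes the same mistake sequence; a diagonal D does not.

```python
def perceptron(points):
    """One pass of the perceptron; returns the mistake indices and final w."""
    w = [0.0, 0.0]
    mistakes = []
    for i, (x, y) in enumerate(points):
        if y * (w[0] * x[0] + w[1] * x[1]) <= 0:   # mistake (ties count too)
            w[0] += y * x[0]
            w[1] += y * x[1]
            mistakes.append(i)
    return mistakes, w

def apply(T, pts):
    return [((T[0][0]*x[0] + T[0][1]*x[1], T[1][0]*x[0] + T[1][1]*x[1]), y)
            for (x, y) in pts]

data = [((2.0, 1.0), 1), ((1.0, 3.0), 1), ((-1.0, -2.0), -1), ((0.5, -2.0), -1)]
U = [[0.0, 1.0], [-1.0, 0.0]]   # orthogonal (90-degree rotation): <Ux, Uw> = <x, w>
D = [[5.0, 0.0], [0.0, 0.2]]    # diagonal scaling: inner products change

m_orig, _ = perceptron(data)            # [0]
m_rot, _ = perceptron(apply(U, data))   # [0] -- identical run
m_scl, _ = perceptron(apply(D, data))   # [0, 3] -- an extra mistake appears
```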

Section 2:

a. Is there any difference if the 100 and 200 hypotheses are from different sets?

b. According to the solution, a bigger |H| implies a bigger epsilon for a fixed m. But what happens to the probability of a mistake?

Overfitting depends on it too.

If so, then for any set of points we can produce a tree with 0 sample error?

thanks

The definitions of the ML and MAP hypotheses are not so clear to me.

I'm guessing the definitions would be as follows (given the sample S):

h_{ml} = argmax Pr[S|h]

h_{map} = argmax Pr[h|S]

My problem is that Pr[S|h] would only take two values: 0 if the error of h on S is greater than 0, or Pr[S] otherwise.

Is my understanding correct?
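A sketch of how I think this resolves (my own example, not from the course): under a noiseless, deterministic labeling model the likelihood is indeed all-or-nothing, but under a label-noise model Pr[S|h] takes many values and the ML hypothesis becomes informative.

```python
def likelihood(h, S, eta=0.1):
    """Pr[labels | h] under a model where each label is flipped w.p. eta.

    With eta = 0 this collapses to the all-or-nothing behaviour in the
    question: 1 if h is consistent with S, 0 otherwise.
    """
    p = 1.0
    for x, y in S:
        p *= (1 - eta) if h(x) == y else eta
    return p

S = [(-1.0, 0), (0.5, 1), (2.0, 1)]           # (x, label) pairs
h_good = lambda x: 1 if x > 0 else 0          # consistent with S
h_bad = lambda x: 1 if x > 1 else 0           # one mistake on S

p_good = likelihood(h_good, S)   # 0.9**3 = 0.729
p_bad = likelihood(h_bad, S)     # 0.9**2 * 0.1 = 0.081
```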

Thanks

In 2013 Moed B Question 7 we are asked to calculate linear regression/PCA. Is there some trick for doing this, or do we have to run the algorithms by hand?

Thanks

If we look at the update rule, it remains the same. Do we assume the "bad" points (within the margin or on the opposite side of it) will not be given to the algorithm after a certain time t? If we don't, we can repeatedly feed the perceptron a bad point until the decision boundary is completely wrong.

I apologize for the lateness. Exercise 3 will be returned to you tomorrow morning.

For Exercise 4, here are references in which you can find the solutions to the theoretical questions, in the hope that they will be helpful:

Q1: http://http-server.carleton.ca/~bchu/Document/ECON5005/lecture05.pdf (Note that there are other, more direct solutions).

Q2: https://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem#Proof

Q3: Page 12 of http://www.cs.nyu.edu/~mohri/mlu/mlu_lecture_11.pdf

Regev

Also, what is expected of us when you ask "For a soft-margin SVM with a polynomial kernel, what will happen if we multiply X by A (s.t. A is of rank n)?" It seems a little hard to visualize…

Thanks again.

Constructing the following example seems to refute the statement in question 2 section d (which according to the solution is true):

Four points: (10,1), (10,-1), (-10, 1), (-10, -1)

U = [0 1; 1 0]

If you apply the transformation you get: (-1,10), (-1,-10), (1, 10), (1, -10)

Which seems to also swap the PCs (the original first becomes the second and vice versa). The solution states that the PCs remain the same. Have I misunderstood something? If so, could you please elaborate on the solution?
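A numeric check I did (my own computation) supports the claim that the principal directions get permuted; perhaps the solution means the set of PCs is the same up to reordering:

```python
def covariance(pts):
    """Scatter matrix about the mean of 2-D points (normalization omitted,
    since it does not affect the principal directions)."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    cxx = sum((p[0] - mx) ** 2 for p in pts)
    cyy = sum((p[1] - my) ** 2 for p in pts)
    cxy = sum((p[0] - mx) * (p[1] - my) for p in pts)
    return [[cxx, cxy], [cxy, cyy]]

pts = [(10, 1), (10, -1), (-10, 1), (-10, -1)]
swapped = [(y, x) for (x, y) in pts]    # the transformation U = [0 1; 1 0]

C1 = covariance(pts)       # [[400, 0], [0, 4]] -> first PC is the x-axis
C2 = covariance(swapped)   # [[4, 0], [0, 400]] -> first PC is the y-axis
```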

Thanks!

In SVD we write: SVD(X) = UDV^T

1) Do UU^T = I and VV^T = I hold?

2) When we refer to the variance covered by a principal component i, is it equal to D_ii or (D_ii)^2?
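My understanding (please correct me if wrong): for the full SVD of a square matrix both UU^T = I and VV^T = I hold, while for the thin SVD only U^T U = I is guaranteed; and the variance of PC i goes with (D_ii)^2, since the eigenvalues of XX^T are the squared singular values. A tiny example of my own where the SVD is known by construction:

```python
# X built with a known SVD: X = U * diag(3, 2) * V^T, with U a 90-degree
# rotation and V = I, so the singular values are 3 and 2 by construction.
X = [[0, 2], [-3, 0]]

def times_own_transpose(A):
    """A * A^T for a 2x2 matrix."""
    return [[A[0][0]**2 + A[0][1]**2, A[0][0]*A[1][0] + A[0][1]*A[1][1]],
            [A[1][0]*A[0][0] + A[1][1]*A[0][1], A[1][0]**2 + A[1][1]**2]]

XXt = times_own_transpose(X)   # [[4, 0], [0, 9]]
# The eigenvalues of X X^T are 9 and 4 == the *squared* singular values
# (3^2, 2^2), so variance covered by PC i corresponds to (D_ii)^2, not D_ii.
```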

Thanks in advance!

In 2013 Moed B, very last section, we are given the points (2,0), (1,0), (0,2), (0,1), (0,0),

And the question is "what is the line defined by the dimension reduction using PCA"?

What does that actually mean? Do we have to compute the re-projected points by computing $UU^{t}x$ (where U is the PCA matrix) and then compute the regression line?
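My reading of the question (a sketch, not an official solution): the PCA line is the line through the mean of the points in the direction of the top principal component. For these five points it can be computed by hand:

```python
pts = [(2, 0), (1, 0), (0, 2), (0, 1), (0, 0)]
n = len(pts)
mx = sum(p[0] for p in pts) / n            # 0.6
my = sum(p[1] for p in pts) / n            # 0.6

# Scatter matrix of the centered points (scaling doesn't affect directions).
sxx = sum((x - mx) ** 2 for x, y in pts)           # 3.2
syy = sum((y - my) ** 2 for x, y in pts)           # 3.2
sxy = sum((x - mx) * (y - my) for x, y in pts)     # -1.8

# For a symmetric matrix [[a, b], [b, a]] the eigenvectors are (1, 1) and
# (1, -1) with eigenvalues a + b and a - b.
lam_plus, lam_minus = sxx + sxy, sxx - sxy         # 1.4 and 5.0
top_dir = (1, -1) if lam_minus > lam_plus else (1, 1)

# PCA line: through the mean (0.6, 0.6) along (1, -1), i.e. y = -x + 1.2
```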

Thanks

What is the policy regarding the use of external Matlab libraries in the project (libsvm, NLP libraries, etc.)?

I've checked the ML-project spec where it's stated:

*'Any official and publicly available software package may be used as long as an exact reference is indicated and usage instructions are detailed. All other software must be original and included in your submission.'*

So are public Matlab NLP libraries that we find considered *official* and valid to use?

Is it possible for a d=2 polynomial kernel to learn a non-linear separator of the shape x

This is in reference to question 3 in 2014b.

Thanks,

Roey

The solutions given in the exam's solution file are not elaborated enough (at all).

Q3: We don't fully understand the question and what is required of us.

Q4: In the third and the last transformations (D(x), A(x)), why does the original solution differ from the transformed one?

Thanks,

Taitai

What I don't understand is: what are we trying to do, compress the data or reduce the features?

Because if we are trying to reduce the features, then we end up with a vector (a principal component) that has the same dimension as the features, and that doesn't help us.

I understand that if U is an orthonormal matrix and x is a vector, then calculating U*x changes the basis of x to the basis spanned by the columns of U.

But in SVD we compute V^T * x, and it is written that this changes the basis of x to the basis of the orthonormal columns of V. However, when we compute V^T * x, the result is a linear combination of the rows of V.

I would like an explanation, please.
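A small numeric sketch (my own example) of why the two descriptions agree: row i of V^T is column i of V, so entry i of V^T x is the inner product <v_i, x>, which is exactly the coordinate of x in the basis of the columns of V.

```python
import math

t = math.pi / 6
# Columns of V form an orthonormal basis (a rotation by 30 degrees).
v1 = (math.cos(t), math.sin(t))
v2 = (-math.sin(t), math.cos(t))

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

x = (2.0, 1.0)

# V^T x: entry i is <v_i, x> -- the coordinate of x in the basis {v1, v2}.
coords = (dot(v1, x), dot(v2, x))

# Reconstruction from the coordinates: x = coords[0]*v1 + coords[1]*v2.
x_rec = (coords[0] * v1[0] + coords[1] * v2[0],
         coords[0] * v1[1] + coords[1] * v2[1])
```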

We discussed using feature decision stumps in the boosting recitation.

However, it was stated at the end that decision stumps cannot be used as weak learners because they might have an error of exactly 1/2 (in symmetric cases).

So which h do we actually use as the weak learner in the AdaBoost algorithm?

Thanks, and hopefully my question was clear enough.

Can you please give the range of relevant pages from the PDF of the PAC model lecture?

Thank you.

What is the answer to Q1 (decision stump)?

My guess is no, but my question is:

Let's assume all the positive examples are in the rectangle a < X < b and c < Y < d.

In general, can we use the same feature multiple times? For example:

    if X < a then y = 0
    else (X > a)
        if X > b then y = 0
        else
            if Y > d then y = 0
            else
                if Y < c then y = 0
                else
                    y = 1

In addition, what about AdaBoost with weak learners?
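For reference, here is a sketch (my own, standard textbook form rather than the course's code) of the stump weak learner AdaBoost typically uses, an exhaustive search over feature, threshold, and sign. Note that the nested rule above is a depth-4 decision tree, not a stump, which makes a single comparison. On XOR-like symmetric data every stump errs on exactly half the weight, which is the failure case mentioned in the recitation:

```python
def best_stump(points, weights):
    """Find the decision stump (feature, threshold, sign) with the smallest
    weighted error by exhaustive search."""
    best = None
    n_features = len(points[0][0])
    for f in range(n_features):
        for th in sorted({p[0][f] for p in points}):
            for sign in (1, -1):
                err = sum(w for (x, y), w in zip(points, weights)
                          if (1 if sign * (x[f] - th) > 0 else -1) != y)
                if best is None or err < best[0]:
                    best = (err, f, th, sign)
    return best

# XOR-like symmetric data: every axis-aligned stump errs on exactly half
# the weight, so no stump is a weak learner here.
pts = [((0, 0), 1), ((1, 1), 1), ((0, 1), -1), ((1, 0), -1)]
w = [0.25] * 4
err, _, _, _ = best_stump(pts, w)   # err == 0.5
```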

In the perceptron, the online training process consists of updates of w at step t, i.e., w_{t+1} = w_t + c(x)*x.

While in SVM and kNN all operations in the algorithm are performed directly on kernels, in the perceptron case, if I understand correctly, during online training an error in the higher dimension is fixed in the original dimension. For instance, if K(w_t, x) > 0 while it should have been < 0, the update w_{t+1} = w_t - x will take place.

The resulting <w*, x> is a classifier which is a linear combination of kernels, but how can we tell that the update process is valid? Said differently, how do we know that an update in the original dimension corresponds to the fix in the higher dimension?
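If I understand the standard resolution (the dual, or kernel, perceptron; standard material, not quoted from the course), the update is never performed in the original dimension: w is kept implicitly as a sum of feature-mapped mistake examples, so adding x_t to the mistake list IS the feature-space update w <- w + y_t phi(x_t). A sketch under that assumption:

```python
def kernel_perceptron(points, K, passes=5):
    """Dual perceptron: w is never formed explicitly. In feature space
    w = sum_i y_i * phi(x_i) over the mistake examples, so
    <w, phi(x)> = sum_i y_i * K(x_i, x), and 'remember x_t' is exactly
    the feature-space update w <- w + y_t * phi(x_t)."""
    mistakes = []                 # (x_i, y_i) pairs on which we erred
    for _ in range(passes):
        for x, y in points:
            score = sum(yi * K(xi, x) for xi, yi in mistakes)
            if y * score <= 0:    # mistake in the *feature* space
                mistakes.append((x, y))
    return mistakes

def poly2(a, b):
    """Degree-2 polynomial kernel K(a, b) = (1 + <a, b>)^2."""
    return (1 + a[0] * b[0] + a[1] * b[1]) ** 2

# XOR is not linearly separable in the input space, but the degree-2
# feature space separates it, and the dual updates live entirely there.
data = [((1, 1), 1), ((-1, -1), 1), ((1, -1), -1), ((-1, 1), -1)]
sv = kernel_perceptron(data, poly2)
errors = sum(1 for x, y in data
             if y * sum(yi * poly2(xi, x) for xi, yi in sv) <= 0)
```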

Are we allowed to bring material to the exam? A calculator?

Thanks,

Roey.

During the lectures/recitations, various formulas for vector/matrix derivatives were used (with a reference to "The Matrix Cookbook").

Looking at it, there are quite a lot of formulas.

Is using them relevant for the test?

Thanks!

In this case the perceptron update may cause it to move in the right direction, but it will not classify x_t correctly after the update. I understand this should work if w_t is normalized, but nothing assures us of that.

Can you please explain?
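A tiny 1-D counterexample I constructed (my own, not from the course) showing that a single update need not fix the point whenever |<w, x>| > ||x||^2:

```python
w, x, y = 5.0, 1.0, -1               # y * <w, x> = -5 < 0: a mistake
w_new = w + y * x                    # standard perceptron update: w <- w + y*x
still_wrong = y * (w_new * x) <= 0   # w_new = 4.0 still has the wrong sign
```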

You asked to plot the eigenvalues we get from the PCA.

I used SVD to get the singular values of the data matrix X.

Should I plot the singular values as they are, or should I square them to get the eigenvalues of the matrix XX^T?

(I guess it doesn't really matter, since it's only a kind of scaling, but I wanted to be sure anyway.)

The general guideline is that the material is whatever was covered in the lecture, no more and no less. The scribes are there to help you.

More specifically, I understand that Tomer reached slide 47 (stopped at Example 2). Example 2 was in fact what we spoke about in Recitation 4, so the recitation is the authoritative material and is sufficient in that regard. Model selection from this lecture is not included. The corresponding parts in the lesson scribe are parts 4.1-4.5 (unless there's a black bar on the left). Parts 4.6 and 4.7 are not mandatory, and part 4.8 is for your reference.

Is it possible to upload solutions to the homework?

It is very hard to understand the correct answer from the checker's feedback alone when we get an exercise wrong.

Thanks,

Amir

Two clarifications about Exercise 4:

1. In the practical assignment, you are not allowed to use Matlab's PCA functions. You can use any other function related to the eigen-decomposition of a matrix.

2. In Q3 (AdaBoost), we wish to find the direction in which the slope is the steepest; therefore, we wish to maximize the *absolute value* of the directional derivative, and not the actual value, which could be negative.

I've updated the exercise to reflect these corrections, and there is a submission deadline extension to Sunday, free of charge.

Regev

I was wondering if we can actually use Matlab's pca function, as it was not explicitly forbidden?

Thanks.

Theta tilde is unbiased iff Dy = 0 and not DX = 0???
