[ Machine learning: Predict an output number from an input vector, trained on "matched" samples ]
I would like to know if it's possible, mathematically, to predict an output number from multiple training samples, each given as an input vector with its matching output.
I don't really know how to explain this, so I will give you an example. The input vector is [hours studied, sleep hours] and the output is the score on a school test:
x.train(100, [5, 8])
x.train(0, [0, 0])
x.predict([2.5, 4]) // should return 50 (because both inputs are half of [5, 8])
x.predict([5, 8]) // should return 100
x.predict([0, 0]) // should return 0
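To make the desired interface concrete, here is a minimal sketch of a hypothetical `ScorePredictor` class. The class and its `train(output, inputs)` / `predict(inputs)` methods are made up to mirror the pseudocode above; no library ships this exact API. It assumes scikit-learn is installed and that a linear relationship between the inputs and the score is acceptable. Samples are stored on `train`, and scikit-learn's `LinearRegression` is refit on all stored samples whenever `predict` is called.

```python
from sklearn.linear_model import LinearRegression


class ScorePredictor:
    """Hypothetical wrapper mimicking x.train(output, inputs) / x.predict(inputs)."""

    def __init__(self):
        self._inputs = []   # stored input vectors, e.g. [hours studied, sleep hours]
        self._outputs = []  # stored target values, e.g. test scores

    def train(self, output, inputs):
        # Only record the sample; fitting is deferred until predict() is called.
        self._inputs.append(list(inputs))
        self._outputs.append(output)

    def predict(self, inputs):
        # Refit an ordinary least-squares model on every sample seen so far,
        # then predict a single output for one input vector.
        model = LinearRegression().fit(self._inputs, self._outputs)
        return float(model.predict([list(inputs)])[0])


x = ScorePredictor()
x.train(100, [5, 8])
x.train(0, [0, 0])
print(x.predict([2.5, 4]), x.predict([5, 8]), x.predict([0, 0]))
```

Refitting on every call keeps the sketch short; with many samples you would fit once after training instead.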
Sorry if I am not clear enough. If you understand what I mean, do you know of a Python or PHP library that can do this, and could you show an example of how to use it?
Thank you very much and have a great day!
Answer 1
Using scikit-learn, you can do:
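from sklearn import linear_model
# Ordinary least-squares linear regression
clf = linear_model.LinearRegression()
# Training inputs are [hours studied, sleep hours]; targets are test scores
clf.fit([[5, 8], [0, 0]], [100, 0])
# Predict scores for three new input vectors (roughly [50, 100, 0])
print(clf.predict([[2.5, 4], [5, 8], [0, 0]]))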
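Why the prediction for [2.5, 4] comes out as 50: a linear model with two weights and an intercept has three parameters, so two samples cannot pin it down uniquely. Assuming the fit reproduces both training samples exactly (which it can, since there are more parameters than samples), every such fit gives the same answer at the midpoint of the two training inputs:

```latex
\begin{aligned}
\hat{y}(\mathbf{x}) &= w_1 x_1 + w_2 x_2 + b \\
\hat{y}(0, 0) = 0 &\;\Rightarrow\; b = 0 \\
\hat{y}(5, 8) = 100 &\;\Rightarrow\; 5 w_1 + 8 w_2 = 100 \\
\hat{y}(2.5, 4) &= 2.5 w_1 + 4 w_2 = \tfrac{1}{2}\,(5 w_1 + 8 w_2) = 50
\end{aligned}
```

For inputs that are not on the line through [0, 0] and [5, 8] (for example [5, 0]), the prediction depends on which of the many valid weight vectors the solver returns, which is one reason to collect more than two samples.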
You should be aware of the risk of overfitting and underfitting. You might want to consider using polynomial features, which increase the complexity of your model and thus the chance of better predictions, but don't pick features that are too complex (in this case, a high polynomial degree), since that would likely lead to overfitting. Use cross-validation to find a trade-off between the two.
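A minimal sketch of choosing the polynomial degree by cross-validation, assuming scikit-learn and NumPy are installed. Two samples are far too few to cross-validate, so the data below is made up purely for illustration (the "true" relationship and noise level are arbitrary assumptions). Each candidate degree is wrapped in a `PolynomialFeatures` + `LinearRegression` pipeline and scored with 5-fold cross-validated R².

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Made-up illustrative data: columns are [hours studied, sleep hours].
rng = np.random.default_rng(0)
X = rng.uniform([0, 4], [10, 10], size=(60, 2))
# Hypothetical underlying relationship plus noise, for demonstration only.
y = 6 * X[:, 0] + 4 * X[:, 1] - 0.3 * X[:, 0] ** 2 + rng.normal(0, 3, size=60)

scores = {}
for degree in (1, 2, 3, 6):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    # Mean R^2 over 5 folds; higher is better.
    scores[degree] = cross_val_score(model, X, y, cv=5, scoring="r2").mean()

best_degree = max(scores, key=scores.get)
for degree, score in scores.items():
    print(f"degree={degree}: mean CV R^2 = {score:.3f}")
print("chosen degree:", best_degree)
```

`GridSearchCV` can automate the same search over `polynomialfeatures__degree` if you prefer not to write the loop yourself.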
Also consider Ridge or Lasso regression, which add a regularization penalty on the weights that helps control overfitting.
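A minimal sketch of swapping in Ridge and Lasso on the two samples from the question, assuming scikit-learn is installed. The penalty strength `alpha=1.0` is an arbitrary illustrative value, not a recommendation. Because the weights are penalized, the fitted models no longer reproduce 100 and 0 exactly; their predictions are pulled toward the mean score of 50.

```python
from sklearn.linear_model import Lasso, Ridge

X = [[5, 8], [0, 0]]  # [hours studied, sleep hours]
y = [100, 0]          # test scores

# alpha sets the penalty strength; 1.0 is chosen only for illustration.
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks all weights
lasso = Lasso(alpha=1.0).fit(X, y)  # L1 penalty: can drive some weights to exactly 0

new_inputs = [[2.5, 4], [5, 8], [0, 0]]
print("Ridge:", ridge.predict(new_inputs), "coef:", ridge.coef_)
print("Lasso:", lasso.predict(new_inputs), "coef:", lasso.coef_)
```

In practice you would pick `alpha` by cross-validation, for example with `RidgeCV` or `LassoCV`, once you have enough samples.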