
Article originally posted on Data Science Central.
| # | Topic | Difficulty Level (High / Low) | Question | Refs / Answers |
|---|-------|-------------------------------|----------|----------------|
| 1 | Text Mining | L | Explain: TF-IDF, Stanford NLP, Sentiment Analysis, Topic Modelling | |
| 2 | Text Mining | H | Explain Word2Vec. Explain how word vectors are created (sketch below). | https://www.tensorflow.org/tutorials/word2vec |
| 3 | Text Mining | L | Explain distances: Hamming, cosine, Euclidean (sketch below). | |
| 4 | Text Mining | H | How can I get a single vector for a sentence / paragraph / document using Word2Vec? | https://radimrehurek.com/gensim/models/doc2vec.html |
| 5 | Dimension Reduction | L | Suppose I have a TF-IDF matrix with dimensions 1000×25000 and I want to reduce it to 1000×500. What ways are available (sketch below)? | PCA, SVD, (max_df, min_df, max_features in TF-IDF) |
| 6 | Dimension Reduction | H | Kernel PCA, t-SNE | http://scikit-learn.org/stable/modules/decomposition.html#decompositions |
| 7 | Supervised Learning | H | Uncorrelated vs. highly correlated features: how do they affect linear regression vs. GBM vs. Random Forest? | GBM and RF are least affected. |
| 8 | Supervised Learning | L | If mentioned in the resume, ask about: Logistic Regression, RF, Boosted Trees, SVM, NN | |
| 9 | Supervised Learning | L | Explain bagging vs. boosting | |
| 10 | Supervised Learning | L | Explain how variable importance is computed in RF and GBM | |
| 11 | Supervised Learning | H | What is out-of-bag (OOB) error in bagging? | |
| 12 | Supervised Learning | H | What is the difference between AdaBoost and gradient boosted trees? | |
| 13 | Supervised Learning | H | What is the learning rate? What will happen if I increase it from 0.01 to 0.6? | Learning will be unnecessarily fast; because of the increased learning rate the global minimum will likely be missed and the weights will fluctuate. With a learning rate of 0.01, learning will be slow and the model may get stuck in a local minimum. The learning rate should be chosen via CV / parameter tuning. |
| 14 | Supervised Learning | L | How would you choose the parameters of any model (sketch below)? | http://scikit-learn.org/stable/modules/grid_search.html |
| 15 | Supervised Learning | L | Evaluation of supervised learning: log loss, accuracy, sensitivity, specificity, AUC-ROC curve, Kappa | http://scikit-learn.org/stable/modules/model_evaluation.html |
| 16 | Supervised Learning | L | My data has 1% label 1 and 99% label 0, and my model has 99% accuracy. Should I be happy? Explain why. | No. This might just mean the model has predicted all 0s with no intelligence. Look at the confusion matrix, sensitivity, specificity, Kappa, etc. Try oversampling, outlier detection, and different algorithms such as RUSBoost. |
| 17 | Supervised Learning | H | How can I increase the representation of the minority class in this case (sketch below)? | SMOTE, random oversampling |
| 18 | Unsupervised Learning | L | Explain k-means | http://scikit-learn.org/stable/modules/clustering.html#clustering |
| 19 | Unsupervised Learning | L | How to choose the number of clusters in k-means (sketch below)? | https://www.quora.com/How-can-we-choose-a-good-K-for-K-means-clustering |
| 20 | Unsupervised Learning | H | How to evaluate unsupervised learning algorithms? | http://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation |
| 21 | Unsupervised Learning | H | Which algorithms don't require the number of clusters as an input? BIRCH, DBSCAN, etc. | http://scikit-learn.org/stable/modules/clustering.html#overview-of-clustering-methods |
| 22 | Unsupervised Learning | H | Explain autoencoders (encoder / decoder) | |
| 23 | Data Preprocessing | L | Normalising the data: how to normalise train and test data (sketch below)? | http://scikit-learn.org/stable/modules/preprocessing.html#custom-transformers |
| 24 | Data Preprocessing | L | Categorical variables: how to convert categorical variables into features, (1) when there is no ordering, (2) when there is an ordering (sketch below)? | Dummy / one-hot encoding, thermometer encoding |
| 25 | Unsupervised Learning | H | How will k-means be affected in the presence of dummy variables? | |
| 26 | Deep Learning | H | Deep learning: explain activation functions: ReLU, Fermi / sigmoid, tanh, etc. | www.deeplearningbook.org |
| 27 | Supervised Learning | L | Explain simple cross-validation. If it is time-series data, can normal cross-validation work (sketch below)? | http://scikit-learn.org/stable/modules/cross_validation.html |
| 28 | Supervised Learning | L | Explain: stratified and leave-one-out (LOO) CV | http://scikit-learn.org/stable/modules/cross_validation.html |
| 29 | Supervised Learning | H | In ensemble learning, what are soft voting and hard voting (sketch below)? | http://scikit-learn.org/stable/modules/ensemble.html#voting-classifier |
| 30 | Supervised Learning | L | Ensemble learning: if the correlation of predictions between 3 classifiers is > 0.95, should I ensemble the outputs? Why, if yes or no? | |
| 31 | Optimisation | H | What is regularisation? Is linear regression regularised? If not, how can it be regularised (sketch below)? | L1, L2: see Ridge and Lasso |
| 32 | Supervised Learning | L | Which of these algorithms will be affected by the random seed: logistic regression, SVM, Random Forest, neural nets? | RF and NN |
| 33 | Supervised Learning | H | What is look-ahead bias? How can it be identified? | |
| 34 | Supervised Learning | H | Situation: I have 1000 samples and 500 features, and I want to select 50 features. I check the correlation of each of the 500 variables with Y using 100 samples and then keep the top 50. After doing this step I run cross-validation on the 1000 samples. What is the problem here? | This has look-ahead bias. |
| 35 | Optimisation | H | Explain gradient descent. Which is better: gradient descent, SGD, or Adam? | http://ruder.io/optimizing-gradient-descent/ |
| 36 | Supervised Learning | L | Which algorithm is faster: GBM trees or XGBoost? Why? | XGBoost: https://arxiv.org/abs/1603.02754 |
| 37 | Deep Learning | H | Explain backpropagation | www.deeplearningbook.org |
| 38 | Deep Learning | H | Explain softmax (sketch below) | www.deeplearningbook.org |
| 39 | Deep Learning | H | DL: for time series, which architecture is used: MLP / LSTM / CNN? Why? | www.deeplearningbook.org |
| 40 | Deep Learning | H | Is it required to normalise the data in neural nets? Why? | www.deeplearningbook.org |
| 41 | Optimisation | L | My model has very high variance but low bias. Is this overfitting or underfitting? If the answer is overfitting (which is correct), how can I make sure I don't overfit? | |
| 42 | Deep Learning | H | Explain early stopping | http://www.deeplearningbook.org/contents/regularization.html#pf20 |
| 43 | Deep Learning | H | Explain dropout. Are bagging and dropout similar concepts? If not, what is the difference (sketch below)? | http://www.deeplearningbook.org/contents/regularization.html#pf20 |
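
The sketches below correspond to the rows marked "(sketch below)" in the table. They are minimal Python illustrations; the toy data, library choices, and parameter values are illustrative assumptions, not part of the original question list.

For questions 2 and 4: a minimal Word2Vec sketch using gensim (the 4.x API, where the dimension parameter is `vector_size`). Averaging word vectors is one crude way to get a single sentence vector; Doc2Vec (linked in the table) learns a dedicated paragraph vector instead.

```python
import numpy as np
from gensim.models import Word2Vec

# Toy corpus: each "sentence" is a list of tokens. Real training needs far more text.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "chased", "the", "cat"],
    ["dogs", "and", "cats", "are", "pets"],
]

# Train a small model; vector_size is the dimensionality of each word vector.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50)

def sentence_vector(tokens, model):
    """Average the word vectors of the tokens (one simple sentence embedding)."""
    vectors = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vectors, axis=0) if vectors else np.zeros(model.vector_size)

print(model.wv["cat"].shape)                                 # (50,) word vector
print(sentence_vector(["the", "cat", "sat"], model).shape)   # (50,) sentence vector
```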
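For question 3: the three distances computed with SciPy on a small made-up pair of vectors.

```python
from scipy.spatial.distance import hamming, cosine, euclidean

a = [1, 0, 1, 1]
b = [1, 1, 0, 1]

print(hamming(a, b))    # fraction of positions that differ: 0.5
print(cosine(a, b))     # 1 minus the cosine similarity
print(euclidean(a, b))  # straight-line (L2) distance
```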
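For question 5: TF-IDF followed by truncated SVD (LSA), which works directly on sparse matrices. The corpus here is a tiny made-up example; on a real 1000×25000 TF-IDF matrix you would set `n_components=500`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
    "neural networks learn representations",
]

# max_df / min_df / max_features already prune the vocabulary at TF-IDF time.
vectorizer = TfidfVectorizer(max_df=0.95, min_df=1, max_features=25000)
X = vectorizer.fit_transform(corpus)            # sparse, n_docs x n_terms

# Truncated SVD reduces the column dimension (use 500 on the real matrix).
svd = TruncatedSVD(n_components=min(500, X.shape[1] - 1), random_state=0)
X_reduced = svd.fit_transform(X)                # dense, n_docs x n_components
print(X.shape, "->", X_reduced.shape)
```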
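For questions 13 and 14: one common way to choose the learning rate (and other hyperparameters) is cross-validated grid search rather than guessing. The grid values and dataset are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

param_grid = {
    "learning_rate": [0.01, 0.1, 0.6],  # 0.6 usually overshoots, 0.01 learns slowly
    "n_estimators": [50, 200],
}
search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid, cv=5, scoring="roc_auc")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```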
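For questions 16 and 17: with a 99/1 split, 99% accuracy can come from predicting all zeros, so the sketch looks at the confusion matrix and minority recall, then oversamples the training split with SMOTE. Note that SMOTE comes from the separate imbalanced-learn package, not scikit-learn itself, and the synthetic dataset is an assumption for illustration.

```python
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=5000, weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(confusion_matrix(y_te, clf.predict(X_te)))
print("minority recall:", recall_score(y_te, clf.predict(X_te)))

# Oversample the minority class in the *training* split only, then refit.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
print(Counter(y_tr), "->", Counter(y_res))
clf_bal = LogisticRegression(max_iter=1000).fit(X_res, y_res)
print("minority recall after SMOTE:", recall_score(y_te, clf_bal.predict(X_te)))
```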
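For questions 18-20: fit k-means for several values of k and compare inertia (the elbow heuristic) and silhouette score, one common way to choose k and to evaluate a clustering without labels. The blob data is synthetic.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=600, centers=4, random_state=0)

for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # Inertia keeps falling as k grows; silhouette typically peaks near the true k.
    print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))
```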
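For question 23: fit the scaler on the training data only and apply the same fitted transform to the test data, so no test-set statistics leak into training.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_train)   # mean/std come from the training set only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)      # same transform, no re-fitting on test data
```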
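For question 24: one-hot encoding when the categories have no ordering, and an ordinal encoding when they do. The table's answer also mentions thermometer encoding (cumulative 1s up to the level), noted in a comment but not implemented here; the category values are made up.

```python
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

colours = [["red"], ["green"], ["blue"], ["green"]]       # no natural ordering
sizes   = [["small"], ["large"], ["medium"], ["small"]]   # ordered levels

onehot = OneHotEncoder().fit(colours)
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]]).fit(sizes)

print(onehot.transform([["green"]]).toarray())   # one column per category, e.g. [[0. 1. 0.]]
print(ordinal.transform([["large"]]))            # ordered integer code, [[2.]]
# Thermometer encoding would instead map small->[1,0,0], medium->[1,1,0], large->[1,1,1].
```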
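For question 27: plain shuffled k-fold lets future observations leak into the training folds, so for time series an expanding-window split such as scikit-learn's TimeSeriesSplit is the usual substitute.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)   # pretend each row is one time step, in order
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    print("train:", train_idx, "test:", test_idx)   # test indices always come after train
```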
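For question 29: hard voting counts predicted labels, while soft voting averages predicted probabilities (so every base model needs predict_proba). The base estimators here are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)
estimators = [("lr", LogisticRegression(max_iter=1000)),
              ("rf", RandomForestClassifier(random_state=0)),
              ("nb", GaussianNB())]

for voting in ("hard", "soft"):
    clf = VotingClassifier(estimators=estimators, voting=voting)
    print(voting, cross_val_score(clf, X, y, cv=5).mean())
```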
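For question 31: ordinary least squares next to L2 (Ridge) and L1 (Lasso) regularised linear regression. The penalty strength alpha=1.0 is arbitrary; the point is that Lasso can drive some coefficients exactly to zero.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso

X, y = make_regression(n_samples=100, n_features=20, noise=10, random_state=0)

for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=1.0)):
    model.fit(X, y)
    n_zero = sum(abs(c) < 1e-6 for c in model.coef_)
    print(type(model).__name__, "zero coefficients:", n_zero)
```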
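For question 38: a numerically stable softmax in plain NumPy.

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    z = z - z.max()        # subtracting the max avoids overflow in exp
    e = np.exp(z)
    return e / e.sum()

print(softmax([2.0, 1.0, 0.1]))   # sums to 1; the largest logit gets the largest probability
```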
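For questions 42 and 43: dropout as a layer and early stopping as a callback in tf.keras. The architecture, patience value, and random data are all illustrative assumptions. Dropout resembles bagging in that it effectively trains an ensemble of subnetworks, but unlike bagging the subnetworks share their weights.

```python
import numpy as np
import tensorflow as tf

# Random toy data: 1000 samples, 20 features, roughly balanced binary target.
X = np.random.rand(1000, 20).astype("float32")
y = (X.sum(axis=1) > 10).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),            # randomly zero 50% of activations during training
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stop when the validation loss has not improved for 3 epochs and keep the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3,
                                              restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=100, callbacks=[early_stop], verbose=0)
```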