Category: Interview Part 1
-
Q.14 What is SVM? Can you name some kernels used in SVM?
SVM stands for support vector machine. SVMs are used for classification and regression tasks. An SVM finds a separating boundary that discriminates between two classes of data points, choosing the one with the largest margin between them. This boundary is known as a hyperplane. Some of the kernels used in SVM are –
-
Q.13 Explain bias, variance tradeoff.
High bias leads to underfitting: the model is oversimplified, so it introduces error by failing to capture the underlying pattern. Variance, by contrast, comes from excessive complexity in the machine learning algorithm. A high-variance model also learns noise and other distortions in the training data, which hurts its overall performance on new data. If you increase…
-
Q.12 How will you create a series from a given list in Pandas?
We will pass the list to the Series() function: ser1 = pd.Series(mylist)
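A minimal runnable version of the call above might look like this. The contents of mylist are made up for illustration.

```python
import pandas as pd

# Hypothetical input list; any Python list works the same way.
mylist = ["a", "b", "c", "d", "e"]

# Series() assigns a default integer index 0..n-1 to the list elements.
ser1 = pd.Series(mylist)
print(ser1)
```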
-
Q.11 Why is Naive Bayes referred to as Naive?
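Naive Bayes assumes that the features are independent of each other given the class, so the probability of each feature is computed separately and the results are multiplied together. Real-world features are rarely truly independent, and it is this simplifying assumption of feature independence that makes Naive Bayes "Naive".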
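As a rough sketch of what the independence assumption means in practice, the toy example below scores each class by multiplying a prior by per-feature likelihoods. The class names, feature names, and probability values are all invented for illustration and do not come from any dataset.

```python
from math import prod

# Hypothetical priors P(class) and likelihoods P(feature | class).
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"contains_offer": 0.7, "contains_link": 0.8},
    "ham": {"contains_offer": 0.1, "contains_link": 0.3},
}

def naive_bayes_scores(observed_features):
    """Return normalized posteriors, treating features as independent given the class."""
    unnormalized = {
        label: priors[label] * prod(likelihoods[label][f] for f in observed_features)
        for label in priors
    }
    total = sum(unnormalized.values())
    return {label: score / total for label, score in unnormalized.items()}

posteriors = naive_bayes_scores(["contains_offer", "contains_link"])
predicted = max(posteriors, key=posteriors.get)
print(posteriors, predicted)
```

The "naive" step is the product inside naive_bayes_scores: each feature contributes its own factor, with no term for how the features interact.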
-
Q.10 How is AUC different from ROC?
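AUC (Area Under the Curve) is a single number that summarizes a curve, most commonly the ROC curve. ROC is the curve itself: it plots the True Positive Rate against the False Positive Rate. The curve that plots precision against recall is the precision-recall curve, where Precision = TP/(TP + FP) and Recall = TP/(TP + FN).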
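The two formulas can be checked with a few lines of Python. This is only a sketch: the counts are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not something the formulas prescribe.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from confusion-matrix counts."""
    # Assumption: report 0.0 when a denominator is zero instead of raising.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
precision, recall = precision_recall(tp=8, fp=2, fn=4)
print(precision, recall)
```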
-
Q.9 Explain ROC curve.
The Receiver Operating Characteristic (ROC) curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) as the classification threshold varies. We calculate the True Positive Rate as TPR = TP/(TP + FN). The False Positive Rate is determined as FPR = FP/(FP + TN), where TP = true positive, TN = true negative, FP = false positive,…
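To make the TPR and FPR definitions concrete, here is a small pure-Python sketch that sweeps a threshold over classifier scores, collects the (FPR, TPR) points, and computes the area under them with the trapezoidal rule. The labels and scores are made up, and the helper names roc_points and auc are hypothetical. The sketch assumes both classes are present and that no two scores are tied.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold step by step."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Visit examples from highest to lowest score; each step admits one more prediction.
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Trapezoidal area under a list of (x, y) points ordered by x."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical ground-truth labels (1 = positive) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
area = auc(points)
print(points)
print(area)
```

The list of points is the ROC curve, while area is a single number summarizing it.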
-
Q.8 How can you convert date-strings to timeseries in a series?
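Input: s = pd.Series(['02 Feb 2011', '02-02-2013', '20160104', '2011/01/04', '2014-12-05', '2010-06-06T12:05']) To solve this, we will use the to_datetime() function: pd.to_datetime(s). Because these strings use several different formats, pandas 2.0 and later need the format to be declared as mixed: pd.to_datetime(s, format='mixed')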
-
Q.7 Can you stack two series horizontally? If so, how?
Yes, we can stack the two series horizontally using the concat() function and setting axis=1: df = pd.concat([s1, s2], axis=1)
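A short runnable sketch of the call above follows. The values and the series names "a" and "b" are invented for illustration.

```python
import pandas as pd

# Hypothetical series; their names become the column labels of the result.
s1 = pd.Series([1, 2, 3], name="a")
s2 = pd.Series(["x", "y", "z"], name="b")

# axis=1 places the series side by side as columns, aligned on their index.
df = pd.concat([s1, s2], axis=1)
print(df)
```

With the default axis=0, the same call would instead stack the series vertically into one longer series.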
-
Q.6 How are KNN and K-means clustering different?
Firstly, KNN is a supervised learning algorithm, so we require labeled data in order to train it. K-means is an unsupervised learning algorithm that looks for patterns that are intrinsic to the data. The K in KNN is the number of nearest data points considered when making a prediction. On the contrary, the K in K-means specifies the number of…
-
Q.5 How to find the positions of numbers that are multiples of 4 from a series?
To find the multiples of 4, we will use NumPy's argwhere() function. First, we will create a series of 10 numbers: s1 = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]). Then we test the underlying NumPy array: np.argwhere(s1.to_numpy() % 4 == 0) Output > [3], [7] (the zero-based positions of 4 and 8)
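Put together as a runnable sketch, the steps look like this. Converting the series with to_numpy() before calling argwhere() is a choice made here so that NumPy receives a plain array. The variable names positions and flat_positions are hypothetical.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Boolean mask of multiples of 4, taken on the underlying NumPy array.
mask = s1.to_numpy() % 4 == 0

# argwhere returns one row per match, so the result has shape (2, 1).
positions = np.argwhere(mask)

# Flatten to a simple list of zero-based positions.
flat_positions = positions.ravel().tolist()
print(positions)
print(flat_positions)
```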