Conversation

@ANugmanova
Owner

No description provided.


train = pd.read_csv(os.path.join(os.path.dirname(__file__), 'data', 't.csv'), header=0, delimiter="\t", quoting=3)  # load the training dataset

train = train[:5000]
Collaborator

Why is everything truncated to just 5000 tweets, both here and when training DeepMoji?
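
For reference, a small self-contained sketch of what the loading and truncation lines above do, run on a hypothetical in-memory TSV instead of `data/t.csv`. `quoting=3` is the constant `csv.QUOTE_NONE`, so double quotes inside tweets stay as literal text, and `train[:5000]` keeps the first 5000 rows in file order rather than a random sample.

```python
import csv
import io

import pandas as pd

# A tiny tab-separated stand-in for data/t.csv (hypothetical contents).
tsv = 'id\tsentiment\ttext\n' + '\n'.join(
    f'{i}\t{i % 2}\ttweet "number" {i}' for i in range(10)
)

# quoting=3 in the original call is the same constant as csv.QUOTE_NONE:
# quote characters are kept as-is instead of being parsed as field delimiters.
train = pd.read_csv(io.StringIO(tsv), header=0, delimiter='\t', quoting=csv.QUOTE_NONE)

# Slicing a DataFrame with [:n] keeps the first n rows, i.e. the first n tweets in the file.
subset = train[:5]
print(len(train), len(subset))
print(subset['text'].iloc[0])
```

If the truncation is only meant to speed up experiments, the first rows of a file are not necessarily representative of the whole dataset.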

from sklearn.model_selection import cross_val_score  # replaces the sklearn.cross_validation module, removed in scikit-learn 0.20
model = LinearSVC(penalty='l2', loss='squared_hinge', dual=True, tol=0.0001, C=1.0, multi_class='ovr',
                  fit_intercept=True, intercept_scaling=1, class_weight=None, verbose=0, random_state=None, max_iter=1000)
model.fit(X, y)
print("20 Fold CV Score. Bag of words: ", np.mean(cross_val_score(model, X, y, cv=20, scoring='roc_auc')))
Collaborator

The other model is evaluated without cross-validation, though, isn't it?

Owner Author

Yes, but this is just old code; I didn't change it.
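
To make the two models comparable, both can be scored the same way. The sketch below uses synthetic data from `make_classification` as a stand-in for the bag-of-words matrix `X` and labels `y` (an assumption; the real features are not shown here) and imports `cross_val_score` from `sklearn.model_selection`, since `sklearn.cross_validation` no longer exists in current scikit-learn. It computes both the 20-fold cross-validated ROC AUC and a single hold-out ROC AUC.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import LinearSVC

# Synthetic binary data standing in for the bag-of-words features (assumption).
X, y = make_classification(n_samples=400, n_features=50, random_state=0)

model = LinearSVC(C=1.0, dual=True, max_iter=10000, random_state=0)

# 20-fold cross-validated ROC AUC, as in the reviewed code.
cv_scores = cross_val_score(model, X, y, cv=20, scoring='roc_auc')
print("20 Fold CV Score:", np.mean(cv_scores))

# A single stratified hold-out split, i.e. evaluation without cross-validation.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)
model.fit(X_tr, y_tr)
# LinearSVC has no predict_proba; its decision_function scores are enough for ROC AUC.
holdout_auc = roc_auc_score(y_te, model.decision_function(X_te))
print("Hold-out ROC AUC:", holdout_auc)
```

Note that `cross_val_score` fits its own clones of the estimator, so a separate `model.fit(X, y)` on the full data does not affect the cross-validated score.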


def train_model(nb_classes, DATASET_PATH, DATASET_PATH_PRETRAINED='',
                PRETRAINED_PATH='', delete_non_raws=False, save_model=False):
    vocab = {
Collaborator

By the way, what are these words?

Owner Author

These are the tags that the original authors add during preprocessing.

Collaborator

Are they needed for Russian too?

Owner Author

Well, CUSTOM_URL and CUSTOM_NUMBER don't depend on the language, but ideally this should still be checked.
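
A rough illustration of why such placeholder tags are language-independent: replacing URLs and numbers with `CUSTOM_URL` and `CUSTOM_NUMBER` works the same way on Russian and English text. The regular expressions and the `add_special_tags` helper are hypothetical; the original authors' preprocessing uses its own tokenizer rules, which may differ.

```python
import re

# Hypothetical patterns, not the original authors' tokenizer rules.
URL_RE = re.compile(r'https?://\S+|www\.\S+')
NUMBER_RE = re.compile(r'\b\d+(?:[.,]\d+)?\b')


def add_special_tags(text):
    """Replace URLs and numbers with language-independent placeholder tags."""
    # URLs first, so digits inside a URL are not turned into CUSTOM_NUMBER.
    text = URL_RE.sub('CUSTOM_URL', text)
    text = NUMBER_RE.sub('CUSTOM_NUMBER', text)
    return text


print(add_special_tags('Смотри http://example.com там 25 фото'))
print(add_special_tags('Check www.example.org for 3.5 stars'))
```

Tags for language-specific phenomena would still need the check mentioned above.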


def review_to_wordlist(review, remove_stopwords=False):
    # review_text = BeautifulSoup(review).get_text()
    review_text = review
Collaborator

Ha :)
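
Since only the start of `review_to_wordlist` is visible here, a hypothetical sketch of what such a function typically does for tweets. With the BeautifulSoup HTML-stripping step skipped, it lowercases the text, keeps only runs of letters (Unicode-aware, so Cyrillic is preserved), and optionally drops stop words. The extra `stopwords` parameter is an assumption, added so the example needs no NLTK download.

```python
import re


def review_to_wordlist(review, remove_stopwords=False, stopwords=frozenset()):
    """Hypothetical sketch: turn raw text into a list of lowercase words."""
    # HTML stripping (the commented-out BeautifulSoup step) is skipped: tweets are plain text.
    review_text = review
    # [^\W\d_] matches Unicode letters only, so Cyrillic words survive and digits are dropped.
    words = re.findall(r'[^\W\d_]+', review_text.lower())
    if remove_stopwords:
        words = [w for w in words if w not in stopwords]
    return words


print(review_to_wordlist('Очень хороший фильм, 10/10!'))
print(review_to_wordlist('This is a GREAT movie', remove_stopwords=True, stopwords={'this', 'is', 'a'}))
```

A pattern such as `[^a-zA-Z]`, common in English-only tutorials, would erase Russian text entirely, which is why a Unicode-aware character class is used here.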

df['sent'].append(emoji_dict[emoji_name])
return df

df = {'text':[], 'id':[], 'sent':[]}
Collaborator

if __name__ == '__main__':

Owner Author

You're such a nitpicker :)

Collaborator

Not at all. I imported the dictionary from this file and got an error saying some file was missing.

Owner Author

Ah, I just created it for other purposes :)
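
The problem raised in this thread is that module-level code runs on every `import`, so importing only the dictionary also triggered the file-dependent dataset building. A minimal sketch of the suggested fix, with hypothetical `emoji_dict` contents and a hypothetical `build_dataset` helper: keep importable objects at module level and move the script part under an `if __name__ == '__main__':` guard.

```python
# Module-level constant that other scripts can import safely (hypothetical contents).
emoji_dict = {'joy': 1, 'sob': 0}


def build_dataset(rows):
    """Collect texts, ids and sentiment labels into column lists."""
    df = {'text': [], 'id': [], 'sent': []}
    for tweet_id, text, emoji_name in rows:
        df['text'].append(text)
        df['id'].append(tweet_id)
        df['sent'].append(emoji_dict[emoji_name])
    return df


if __name__ == '__main__':
    # Runs only when this file is executed directly, not on `from ... import emoji_dict`.
    # File-dependent work (reading the raw tweets, writing outputs) belongs here.
    sample = build_dataset([(1, 'so happy', 'joy'), (2, 'so sad', 'sob')])
    print(sample)
```

With this layout, `from module import emoji_dict` only defines the dictionary and the function; nothing tries to open a data file.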
