Changes from all commits (161 commits)
da52c7c
Add Pegasus wrapper requirement
Jul 8, 2020
d7ca092
Add parameter files
Jul 8, 2020
b2e862b
Add vistautils requirement
Jul 8, 2020
e1adb5a
Initial Pegasus skeleton
Jul 8, 2020
b01a1db
Update root parameters file
Jul 9, 2020
57fe8b2
Pass real job parameters to Pegasus
Jul 9, 2020
f9cd332
Clean up some
Jul 9, 2020
bae796e
Store more job info
Jul 9, 2020
6fcd5d1
Fix _includes in runner.params
Jul 9, 2020
feb5167
More pegasus stuff
Jul 9, 2020
ed3b77c
Use simple loop to read in job parameters for Pegasus runner
Jul 9, 2020
57d8c85
Create parameters files mirroring Hydra configs
Jul 9, 2020
1963323
Clean up SAGA code
Jul 9, 2020
0a23da0
Modify train.py to take parameters files as input
Jul 9, 2020
fc03fb3
Don't overwrite actual parameters with combinations parameters.
Jul 9, 2020
917c595
Fix style issues
Jul 9, 2020
57750a8
Move Slurm configuration into configuration file
Jul 9, 2020
5beadd0
Remove old TODO
Jul 9, 2020
c5bdf99
Fix usage of parameters_only_entry_point()
Jul 10, 2020
55ef547
Fix runner params.
Jul 10, 2020
b9e4cc4
Fix resource request creation.
Jul 10, 2020
40510fd
Fix parameters file.
Jul 10, 2020
d4aeda1
Fix params->mapping conversion.
Jul 10, 2020
5617c4a
Fix parameter name.
Jul 10, 2020
8da14a2
Update resource request creation.
Jul 10, 2020
f77bdab
Fix partition in runner.params
Jul 10, 2020
7709ff5
No task2 goes in _default directory.
Jul 10, 2020
6022168
Start converting ensemble.py
Jul 10, 2020
63379d1
Fix typo
Jul 10, 2020
74adcb6
Fix another typo
Jul 10, 2020
f6a26da
Fix another typo
Jul 10, 2020
2e7fca8
Set up dependencies properly
Jul 10, 2020
e7ec4a0
Remove old comment
Jul 10, 2020
b61eca8
Delete unnecessary parameter
Jul 10, 2020
aa046dc
Fix ensembling parameters setup
Jul 10, 2020
2d84ed8
Re-add depends_on argument
Jul 10, 2020
c15eb85
Eliminate some redundant informatoin in ensembling parameters
Jul 10, 2020
8bb859e
Convert ensembling script to use parameters
Jul 10, 2020
cc1b973
Style fix
spigo900 Jul 10, 2020
ba1ff93
Move backend parameter to another parameter file
spigo900 Jul 10, 2020
5b7937f
Add missing EOL at ends of files
spigo900 Jul 10, 2020
7bb7f15
Temporarily disable use of ephemeral
spigo900 Jul 13, 2020
da07237
Fix invalid reference problem with Pegasus
spigo900 Jul 13, 2020
5758962
Fix another invalid reference in Pegasus script
spigo900 Jul 13, 2020
0d65f18
Specify name in setup.py
spigo900 Jul 13, 2020
4580ffe
Move name back into setup.py from setup.cfg.
spigo900 Jul 13, 2020
29fc1c7
Add missing comma
spigo900 Jul 13, 2020
6a1e9ec
Fix typo
spigo900 Jul 13, 2020
c788a7f
Fix Python references again
spigo900 Jul 13, 2020
11d2fdf
Fix crash when running train.py script
spigo900 Jul 13, 2020
e6c7ef0
Re-fix references in pegasus.py
spigo900 Jul 13, 2020
e4ab577
Try including model parameters in Pegasus train job parameters
spigo900 Jul 13, 2020
eaca966
Fix parameter _includes reference
spigo900 Jul 13, 2020
32d8cda
Fix incorrect memory parameter name
spigo900 Jul 13, 2020
08e3148
Specify time limit
spigo900 Jul 14, 2020
727785d
Fix typo
spigo900 Jul 14, 2020
87dae91
Specify ensembling output correctly in Pegasus
spigo900 Jul 14, 2020
dd72e90
Fix typo
spigo900 Jul 14, 2020
5ee4d2e
Fix issue with "do nothing" options
spigo900 Jul 14, 2020
1159d87
Specify in Pegasus not to use pretrained model when training
spigo900 Jul 14, 2020
7ab0820
Correctly set time limit in 'not alphanli' branch
spigo900 Jul 14, 2020
39cae0d
Don't print parameter options (handled by vistautils)
spigo900 Jul 14, 2020
94a93da
Increase memory request size
spigo900 Jul 14, 2020
c11ea14
Fix how task data paths are defined in parameters
spigo900 Jul 15, 2020
fe37980
Fix typo (_include vs. _includes)
spigo900 Jul 15, 2020
c0b587d
train_data_slice is no longer an int
spigo900 Jul 15, 2020
b09a36b
Treat train_data_slice as an int
spigo900 Jul 15, 2020
7dbff58
Fix data params again
spigo900 Jul 15, 2020
af35c9f
Fix typo
spigo900 Jul 15, 2020
3370ab7
Fix another typo
spigo900 Jul 15, 2020
9676232
Fix yet another typo
spigo900 Jul 15, 2020
2f566c1
Add quotes to includes for consistency
spigo900 Jul 15, 2020
9208b67
Add radically smaller-scale Pegasus script parameters for development
spigo900 Jul 15, 2020
e71a4ef
Rename Pegasus parameter file for consistency
spigo900 Jul 15, 2020
4c78bf8
Clean up Pegasus parameter files
spigo900 Jul 15, 2020
21a6cb3
Specify missing configuration key
spigo900 Jul 15, 2020
e5004cd
Fix typo
spigo900 Jul 15, 2020
cc982d3
Provide task data to ensemble script properly
spigo900 Jul 15, 2020
6f204a4
Fix deprecation warning
spigo900 Jul 15, 2020
170567b
Fix another typo
spigo900 Jul 15, 2020
0100bd1
Fix yet another typo
spigo900 Jul 15, 2020
dc26fe8
Explicitly convert to string when making model name
spigo900 Jul 15, 2020
f156d68
More string conversions
spigo900 Jul 15, 2020
cf5f753
Access parameters as key, not property
spigo900 Jul 15, 2020
921f922
Fix: Store model name, not model map in best_model_per_seed_group
spigo900 Jul 15, 2020
3dbd882
Fix typo
spigo900 Jul 15, 2020
24c8d52
Fix how ensemble output file name is resolved
spigo900 Jul 15, 2020
4a55d71
Refactor AlphaNLI special case slightly
spigo900 Jul 15, 2020
46d37fe
Don't log config (vistautils does this already)
spigo900 Jul 15, 2020
679fd42
Refactor how "without {factor}" ensembling is done
spigo900 Jul 15, 2020
3605367
Delete config directory
spigo900 Jul 15, 2020
293c3c2
Comment Pegasus script more
spigo900 Jul 24, 2020
fd0aa9f
Better typing for parameter_combinations list
spigo900 Jul 24, 2020
db1bc9a
Fix eval.py to use parameters/eval.params
spigo900 Jul 28, 2020
caeea41
PIQA: Switch to internal dev set to match base
spigo900 Jul 30, 2020
4b4e308
Fix invalid reference problem with Pegasus
spigo900 Jul 13, 2020
7b3c794
Delete config directory
spigo900 Jul 15, 2020
7972358
Use parameters better in train.py
spigo900 Jul 28, 2020
3fae60b
Save to experiment root by default instead of project root
spigo900 Jul 28, 2020
3088bd3
Emulate Hydra dated-saving behavior
spigo900 Jul 28, 2020
a737204
Remove Hydra-related training parameters
spigo900 Jul 28, 2020
2e126e8
Convert eval.py to use parameters
spigo900 Jul 28, 2020
d16a13b
eval.py: Fix checkpoint path parameter
spigo900 Jul 29, 2020
294ddc2
eval.py: Fix random seeding
spigo900 Jul 29, 2020
5be58a2
Revert "eval.py: Fix random seeding"
spigo900 Jul 29, 2020
25680b3
eval.py: Match how train.py gets random seed
spigo900 Jul 29, 2020
244f934
train.py: Rework dated-saving functionality
spigo900 Jul 29, 2020
e56bcd5
train.py: Fix how output directory is logged
spigo900 Jul 29, 2020
c3186e1
train.py: Fix call to evaluate using wrong parameters
spigo900 Jul 29, 2020
7464af2
train.py: Fix build_on_pretrained_model parameter
spigo900 Jul 29, 2020
39d2d68
Don't pass build_on_pretrained_model in pegasus.py
spigo900 Jul 29, 2020
50ac2c4
Pass save_by_date_and_parameters=False to Pegasus training jobs
spigo900 Jul 29, 2020
ef764e9
Don't pass build_on_pretrained_model in training parameters file
spigo900 Jul 29, 2020
f4dfbd2
Try to fix SLURM scripts (except cross_task_eval)
spigo900 Jul 28, 2020
1c10d3f
Grab all needed parameters at beginning of Pegasus script
spigo900 Jul 28, 2020
3ca623f
Implement combination-specific overrides
spigo900 Jul 28, 2020
abbbe4c
Pollute training parameters less in Pegasus workflow
spigo900 Jul 28, 2020
172a66c
dict.get() can't take default as a keyword argument
spigo900 Jul 29, 2020
f8592e3
Fix: Multiply by number of possible values
spigo900 Jul 29, 2020
f849ab3
Fix override matching function
spigo900 Jul 29, 2020
28364d6
Fix pegasus-dev overrides so that they make sense
spigo900 Jul 29, 2020
e75f752
Fix override_matches some more
spigo900 Jul 29, 2020
c3c60d2
Fix override_matches even more
spigo900 Jul 29, 2020
e4801f8
Fix override complexity and matching
spigo900 Jul 29, 2020
df1b7ce
Finally fix override code
spigo900 Jul 29, 2020
4adff05
Clean up override code slightly
spigo900 Jul 29, 2020
aaa7ece
Change pegasus-dev experiment root
spigo900 Jul 29, 2020
f171767
Pegasus: Fix how model parameters are passed to training script
spigo900 Jul 30, 2020
412844f
Train: Pass other parameters to model
spigo900 Jul 30, 2020
1acb61e
Add .items()
spigo900 Jul 30, 2020
9fd6047
Fix training parameter files
spigo900 Jul 30, 2020
4a4f162
training: Fix parameter name in files
spigo900 Jul 30, 2020
69ce3b1
Ensemble: Fix try_without
spigo900 Jul 30, 2020
1d1814d
Fix dev ensembling params
spigo900 Jul 30, 2020
6b77a09
Fix how eval.py gets model name
spigo900 Jul 30, 2020
cbe7e6e
Fix how eval.py loads the model configuration
spigo900 Jul 30, 2020
b869efc
eval.py: Include other parameters in model configuration
spigo900 Jul 30, 2020
c1814e0
Fix deprecation warning in eval.py
spigo900 Jul 30, 2020
0482a81
Override testing: Change nonsense parameter instead of real one
spigo900 Jul 30, 2020
4e35777
Rename nonsense parameter
spigo900 Jul 30, 2020
3d9128d
Fix typo
spigo900 Jul 30, 2020
c04cf32
Use ephemeral in pegasus-dev configuration
spigo900 Jul 31, 2020
56a39b9
Fix partition setup in Pegasus workflow
spigo900 Aug 3, 2020
3d7c7b4
Add full-workflow development parameters
spigo900 Aug 3, 2020
976009e
Shorter time limit for AlphaNLI in pegasus-dev-full.params
spigo900 Aug 3, 2020
9183aef
Fix how gold labels are found and frontload parameter-getting
spigo900 Aug 4, 2020
d1e9e4a
Catch only FileNotFoundError in "couldn't find preds" exception block
spigo900 Aug 4, 2020
4b8f34a
Fix how model_without_seed is constructed
spigo900 Aug 4, 2020
7e20de8
Shorten line
spigo900 Aug 4, 2020
da14165
Pegasus: Pass gold labels to ensemble script in new way
spigo900 Aug 4, 2020
e7e8f39
Ensemble: Rename variable to match parameter task_to_threshold
spigo900 Aug 4, 2020
24983ad
Pegasus: Remove unnecessary line
spigo900 Aug 4, 2020
5a7582c
Ensemble: Include parameters in the dict of successful models
spigo900 Aug 4, 2020
1c69984
Pegasus: Fix how iteration over ensembling tasks is done
spigo900 Aug 4, 2020
9b192bd
Pegasus: Actually add task_to_gold parameters to ensembling parameters
spigo900 Aug 4, 2020
0ed048e
Pegasus: Do job throttling
spigo900 Aug 5, 2020
bf5665f
Pegasus: Pass imported modules instead of strings to run
spigo900 Aug 5, 2020
95510e2
Check the list of tasks against
spigo900 Aug 10, 2020
0ab6443
Rearrange comment
spigo900 Aug 10, 2020
8f36dc3
Document how to run the ensembling workflow
spigo900 Aug 14, 2020
0eb9806
Change AlphaNLI, HellaSwag, and SIQA to use internal dev sets
spigo900 Aug 18, 2020
27 changes: 27 additions & 0 deletions README.md
@@ -66,6 +66,33 @@ python eval.py \
--output pred.lst
```

## Ensembling using Pegasus

### Setup

The ensembling workflow is defined and run using the Pegasus workflow management system. To run the
workflow, you'll need to install the [Pegasus wrapper][pegasus_wrapper].

Note that before running you'll need to set up your user-specific parameters file,
`parameters/root.params`. See `parameters/root.sample.params` for an example.
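
As a rough illustration only (the keys below are placeholders, not the definitive schema; `parameters/root.sample.params` is the authoritative reference), a user-specific `root.params` might look like:

```yaml
# Hypothetical user-specific values -- copy root.sample.params and adjust.
experiment_root: /nas/home/your_username/experiments/ai2
partition: ephemeral
```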

### Running the workflow

Once the wrapper is installed, generate the workflow:

```bash
python ai2/pegasus.py parameters/pegasus.params
```

Then submit the workflow:

```bash
cd path/to/experiment_root/ensemble
sh submit.sh
```

[pegasus_wrapper]: https://github.com/isi-vista/vista-pegasus-wrapper/

## Results

### PIQA
File renamed without changes.
193 changes: 193 additions & 0 deletions ai2/ensemble.py
@@ -0,0 +1,193 @@
import csv
import itertools
import os
import numpy as np
from collections import Counter, defaultdict
from typing import Mapping, Any
import heapq

from more_itertools import powerset
from sklearn.metrics import accuracy_score
import pandas as pd
from scipy.stats.stats import pearsonr

from vistautils.parameters_only_entrypoint import parameters_only_entry_point
from vistautils.parameters import Parameters


def get_model_name(model: Mapping[str, Any]) -> str:
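# Join the option values of the model's (parameter, option) pairs into one name,
# e.g. (illustrative) parameters [("task", "alphanli"), ("random_seed", 42)] -> "alphanli_42".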
return "_".join(str(option) for parameter, option in model['parameters'])


def main(params: Parameters):
def run_ensemble(predictions_df, confidences_df, subset):
# confidences_df[confidences_df < 0.2] = 0 # Set low confidence values to 0.
# confidences_df = confidences_df.eq(confidences_df.where(confidences_df != 0).max(1), axis=0).astype(int) # Get the most confident

relevant_confidences = confidences_df[subset]
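# Weighted vote: sum each model's per-example confidence vector across the subset,
# then pick the argmax choice; 1-indexed tasks (socialiqa, alphanli) are shifted below.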
weighted_votes = relevant_confidences.sum(axis=1).apply(np.argmax).to_numpy()
if task in ['socialiqa', 'alphanli']: weighted_votes += 1
final_predictions = weighted_votes.tolist()
stats = []
for _ in range(accuracy_bootstrapping_samples):
indices = [i for i in np.random.randint(0, len(final_predictions), size=len(final_predictions))]
stats.append(accuracy_score([labels[j] for j in indices], [final_predictions[j] for j in indices]))

# Calculate the confidence interval and log it to console
alpha = 0.95
p = ((1.0 - alpha) / 2.0) * 100
lower = max(0.0, np.percentile(stats, p))
p = (alpha + ((1.0 - alpha) / 2.0)) * 100
upper = min(1.0, np.percentile(stats, p))
accuracy = accuracy_score(labels, final_predictions)
print(f'Accuracy: {accuracy}, {alpha * 100:.1f} confidence interval {lower * 100:.1f} and {upper * 100:.1f}, '
f'average: {np.mean(stats) * 100:.1f}')

# print(f'{accuracy},{[int(i in subset) for i in model_to_path.keys()]}'.replace(' ','').replace('[','').replace(']','')) # CSV
# unweighted_votes = predictions_df[subset].mode(axis=1).tolist()
return round(accuracy*100,2)

all_results = {}

task_to_threshold = params.namespace('task_to_threshold').as_nested_dicts()
task_to_gold = params.namespace('task_to_gold')

# Check that all the necessary namespaces and files exist before we go training on them.
gold_labels_paths = {}
for task in task_to_threshold.keys():
gold_labels_paths[task] = task_to_gold.namespace(task).existing_file('val_y')

# Check that all the necessary per-task model lists are present in the parameters.
task_to_models = {}
for task in task_to_threshold.keys():
models_for_task = params.namespace('models').arbitrary_list(task)
task_to_models[task] = models_for_task

data_sizes = params.arbitrary_list('data_sizes')
try_without = params.arbitrary_list('try_without')

accuracy_bootstrapping_samples = params.integer('accuracy_bootstrapping_samples')
output_file = params.creatable_file('output_file')

for task in task_to_threshold.keys():
task_models = task_to_models[task]
labels = pd.read_csv(gold_labels_paths[task], sep='\t', header=None).values.squeeze().tolist()
for data_size in data_sizes:
results = {}
print(f'\nRunning ensemble for {task.upper()}, {data_size}')
relevant_models = [model for model in task_models if model['train_data_slice'] == data_size]

best_score_per_seed_group = defaultdict(float)
best_model_per_seed_group = defaultdict(str)
successful_models = {}
model_to_predictions = {}
model_to_confidences = {}
# Get Accuracies
print('Accuracy of each model:')
for model in relevant_models:
try:
preds = pd.read_csv(model['predictions'], sep='\t', header=None).values.squeeze().tolist()
confs = pd.read_csv(model['confidence'], sep='\t', header=None).values.squeeze().tolist()
accuracy = accuracy_score(labels, preds)

model_name = get_model_name(model)
successful_models[model_name] = {'accuracy': accuracy, 'parameters': dict(model['parameters'])}
model_to_predictions[model_name] = preds
model_to_confidences[model_name] = confs
print(f'{model_name},{round(accuracy*100,2)}')
model_without_task_data_size = '_'.join(
str(option) for parameter, option in model['parameters']
if parameter not in {'task', 'train_data_slice'}
)
results[model_without_task_data_size] = round(accuracy*100,2)

# model_without_seed = model.strip('_'+model.split('_')[-1])
model_without_seed = '_'.join(
str(option) for parameter, option in model['parameters']
if parameter != 'random_seed'
)
if accuracy > best_score_per_seed_group[model_without_seed]:
best_score_per_seed_group[model_without_seed] = accuracy
best_model_per_seed_group[model_without_seed] = model_name
except FileNotFoundError:
print(f'Couldn\'t find preds for {model}')
continue

# Compare Models
# print('Compare pairs of predictions of each model')
# print('ID1,ID22,Pred Sim,Pred Cor,Correctness Cor,Confidence Cor,ConfCor Both Correct,ConfCor One Correct,ConfCor Both Wrong')
# for id1, id2 in itertools.combinations(relevant_models, 2):
# model1, rs1 = tuple(id1.split('_'))
# model2, rs2 = tuple(id2.split('_'))
# if model1 != model2 and rs1 != rs2: continue # skip if both the model and rs are different
# preds1, conf1 = model_to_predictions[id1], model_to_confidences[id1]
# correctness1 = [int(p == labels[i]) for i, p in enumerate(preds1)]
# preds2, conf2 = model_to_predictions[id2], model_to_confidences[id2]
# correctness2 = [int(p == labels[i]) for i, p in enumerate(preds2)]
# # ConfCor Both Correct
# ccbc = pearsonr(*zip(*[(conf1[i], conf2[i]) for i in range(len(preds1)) if correctness1[i] and correctness2[i]]))[0]
# # ConfCor Only One Correct
# ccoc = pearsonr(*zip(*[(conf1[i], conf2[i]) for i in range(len(preds1)) if correctness1[i] != correctness2[i]]))[0]
# # ConfCor Both Wrong
# ccbw = \
# pearsonr(*zip(*[(conf1[i], conf2[i]) for i in range(len(preds1)) if correctness1[i] == correctness2[i] == 0]))[
# 0]
# print(
# f'{id1},{id2},{accuracy_score(preds1, preds2)},{pearsonr(preds1, preds2)[0]},{pearsonr(correctness1, correctness2)[0]},{pearsonr(conf1, conf2)[0]},{ccbc},{ccoc},{ccbw}')
# print('\n')

predictions_df = pd.DataFrame.from_dict(model_to_predictions)
confidences_df = pd.DataFrame.from_dict(model_to_confidences).applymap(np.asarray)
# print(f'accuracy,{list(model_to_path.keys())}'.replace(' ','').replace('\'','').replace('[','').replace(']','')) # print for csv
# Grid search for ensembling
# ensemble_results = {}
# for subset in powerset(successful_models):
# if len(subset) <= 1: continue
# subset = list(subset)
# ensemble_results[tuple(subset)]=run_ensemble(predictions_df, confidences_df, subset)
# best = heapq.nlargest(10, ensemble_results, key=ensemble_results.get)
# print(ensemble_results[best[0]])
# best_performers = [m for ms in best for m in ms]
# counts = Counter(best_performers)
# print(counts.most_common())

print(best_model_per_seed_group)
print(best_score_per_seed_group)
print('Ensemble of all models:')
all_accuracy = run_ensemble(predictions_df, confidences_df, [
m for m, d in successful_models.items() if d['accuracy'] > task_to_threshold[d['parameters']['task']]
])
results['Ensemble - All'] = all_accuracy

print('Ensemble of best-per-architecture:')
best_per_seed_accuracy = run_ensemble(predictions_df, confidences_df, [best_model_per_seed_group[k] for k in best_score_per_seed_group.keys()])
# if task != 'physicaliqa' and task != 'alphanli':
# confidences_df[[best_model_per_seed_group[k] for k in best_score_per_seed_group.keys()]].to_csv(f'{task}_conf_ensemble.csv')

results['Ensemble - best-per-architecture'] = best_per_seed_accuracy
results['Ensemble Improvement best-per-architecture vs all'] = round(best_per_seed_accuracy-all_accuracy,2)
print('Ensemble Improvement best per arc vs all:', results['Ensemble Improvement best-per-architecture vs all'])

for factor in try_without:
without_factor = [m for m in successful_models if factor not in m]
print(f'Without {factor}:')
# print(without_factor)
wf_accuracy = run_ensemble(predictions_df, confidences_df, without_factor)
results[f'Ensemble - Without {factor}'] = wf_accuracy

without_factor_per_arc = [m for m in [best_model_per_seed_group[k] for k in best_score_per_seed_group.keys()] if factor not in m]
print(f'Best-per-arc without {factor}:')
# print(without_factor_per_arc)
bpa_wf_accuracy = run_ensemble(predictions_df, confidences_df, without_factor_per_arc)
results[f'Best-per-arc without {factor}'] = bpa_wf_accuracy
# if factor == 'embed_all_sep_mean' and (task == 'physicaliqa' or task == 'alphanli'):
# confidences_df[without_factor_per_arc].to_csv(f'{task}_conf_ensemble.csv')

all_results[task + '_' + str(data_size)] = results

df = pd.DataFrame.from_dict(all_results)
df.to_csv(output_file, na_rep='-')


if __name__ == '__main__':
parameters_only_entry_point(main)
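
The parameter structure `ensemble.py` expects can be read off the accessors above. A minimal sketch (all paths and values are placeholders, and the exact `.params` syntax follows vistautils conventions; the real files live under `parameters/`):

```yaml
task_to_threshold:
  alphanli: 0.5            # placeholder accuracy threshold for inclusion in the "all" ensemble
  physicaliqa: 0.5
task_to_gold:
  alphanli:
    val_y: /path/to/alphanli/dev-labels.lst
  physicaliqa:
    val_y: /path/to/physicaliqa/dev-labels.lst
models:
  alphanli:
    - predictions: /path/to/run1/predictions.lst
      confidence: /path/to/run1/confidence.lst
      train_data_slice: 100
      parameters:            # ordered (parameter, option) pairs used to build the model name
        - ["task", "alphanli"]
        - ["random_seed", 42]
data_sizes: [100]
try_without: ["roberta"]
accuracy_bootstrapping_samples: 10000
output_file: /path/to/ensemble_results.csv
```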
74 changes: 46 additions & 28 deletions eval.py → ai2/eval.py
@@ -1,58 +1,76 @@
from pathlib import Path
from typing import List, Union
from typing import List, Union, Any

import hydra
from loguru import logger
import numpy as np
import omegaconf
import pandas as pd
from sklearn.metrics import accuracy_score
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from tqdm import tqdm

from model import Classifier

# Save root path as hydra will create copies of this code in a folder
ROOT_PATH = Path(__file__).parent.absolute()


# If script is executed by itself, load in the configuration yaml file and desired checkpoint model
@hydra.main(config_path="config/eval.yaml")
def main(config: omegaconf.Config):
config = omegaconf.OmegaConf.to_container(config)
logger.info(config)
from vistautils.parameters import Parameters
from vistautils.parameters_only_entrypoint import parameters_only_entry_point

from ai2.model import Classifier


def main(params: Parameters):
checkpoint_path = params.existing_file('checkpoint_path')
results_path = params.creatable_file('results_path')
val_x_file = params.existing_file('val_x')
val_y_file = params.optional_existing_file('val_y')
with_true_label = params.boolean('with_true_label')
if with_true_label and val_y_file is None:
raise RuntimeError(
f'with_true_label set to true but no true labels (val_y) provided! '
)
elif not with_true_label and val_y_file is not None:
raise RuntimeError(
f'with_true_label set to false but got true labels val_y!'
)

model_name = params.string('model.model_name')
task_name = params.string('task_name')
maybe_random_seed = params.get('random_seed', object)

# If the evaluation is deterministic for debugging purposes, we set the random seed
if not isinstance(config['random_seed'], bool):
logger.info(f"Running deterministic model with seed {config['random_seed']}")
np.random.seed(config['random_seed'])
torch.manual_seed(config['random_seed'])
if not isinstance(maybe_random_seed, bool):
if not isinstance(maybe_random_seed, int):
raise RuntimeError(
"Random seed must be either false (i.e. no random seed) "
"or an integer seed!"
)
logger.info(f"Running deterministic model with seed {maybe_random_seed}")
np.random.seed(maybe_random_seed)
torch.manual_seed(maybe_random_seed)
if torch.cuda.is_available():
torch.backends.cuda.deterministic = True
torch.backends.cuda.benchmark = False

# Load in the check pointed model
config = params.namespace('model').as_nested_dicts()
config.update((k, v) for k, v in params.as_nested_dicts().items() if k != 'model')
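# Classifier expects a flat hparams-style dict, so merge the 'model' namespace with
# every other top-level parameter (excluding the 'model' block itself).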
model = Classifier(config)
device = 'cpu' if not torch.cuda.is_available() else "cuda"
checkpoint = torch.load(ROOT_PATH / config['checkpoint_path'], map_location=device)
checkpoint = torch.load(checkpoint_path, map_location=device)
model.load_state_dict(checkpoint['state_dict'])

save_path = Path(f"{config['model']}-{config['task_name']}-s{config['random_seed']}")
save_path = Path(f"{model_name}-{task_name}-s{maybe_random_seed}")
save_path.mkdir(parents=True, exist_ok=True)

# Call the main function with appropriate parameters
evaluate(a_classifier=model,
output_path=save_path,
results_path=results_path,
compute_device=device,
val_x=ROOT_PATH / config['val_x'],
val_y=(ROOT_PATH / config['val_y'] if config['with_true_label'] else None))
val_x=val_x_file,
val_y=val_y_file)


# Function to perform the evaluation (This was separated out to be called in train script)
def evaluate(a_classifier: Classifier, output_path: Union[str, Path], compute_device: str,
val_x: Union[str, Path], val_y: Union[str, Path] = None):
def evaluate(a_classifier: Classifier, output_path: Union[str, Path], results_path: Union[str, Path],
compute_device: str, val_x: Union[str, Path], val_y: Union[str, Path] = None):
# Move model to device and set to evaluation mode
a_classifier.to(compute_device)
a_classifier.eval()
Expand Down Expand Up @@ -88,7 +106,7 @@ def evaluate(a_classifier: Classifier, output_path: Union[str, Path], compute_de

stats = []
for _ in range(10000):
indices = [i for i in np.random.random_integers(0, len(predictions) - 1, size=len(predictions))]
indices = [i for i in np.random.randint(0, len(predictions), size=len(predictions))]
stats.append(accuracy_score([labels[j] for j in indices], [predictions[j] for j in indices]))

# Calculate the confidence interval and log it to console
@@ -101,10 +119,10 @@ def evaluate(a_classifier: Classifier, output_path: Union[str, Path], compute_de
f'average: {np.mean(stats) * 100:.1f}')

# Log eval result
with open(ROOT_PATH / f"results.txt", "a+") as resultf:
with open(results_path, "a+") as resultf:
resultf.write(f'{output_path},Accuracy-lower-upper-average,{accuracy_score(labels, predictions):.3f},'
f'{lower * 100:.1f},{upper * 100:.1f},{np.mean(stats) * 100:.1f}\n')


if __name__ == "__main__":
main()
parameters_only_entry_point(main)
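
Read off the accessors in `main` above, an `eval.params` file would roughly take this shape (paths and values are placeholders; `parameters/eval.params` in the repository is the authoritative version):

```yaml
checkpoint_path: /path/to/checkpoints/best.ckpt
results_path: /path/to/results.txt
val_x: /path/to/task/dev.jsonl
val_y: /path/to/task/dev-labels.lst
with_true_label: true
task_name: alphanli
random_seed: 42              # or false for non-deterministic evaluation
model:
  model_name: roberta-large
  # ... any other hyperparameters the Classifier consumes
```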
File renamed without changes.
10 changes: 5 additions & 5 deletions model.py → ai2/model.py
@@ -50,9 +50,9 @@ def __init__(self, hparams):
self.label_offset = 0

# Load Transformer model from cache files (encoder and tokenizer)
self.embedder = AutoModel.from_pretrained(hparams["model"], cache_dir=self.root_path / "model_cache")
self.embedder = AutoModel.from_pretrained(hparams["model_name"], cache_dir=self.root_path / "model_cache")
self.tokenizer = \
AutoTokenizer.from_pretrained(hparams["model"], cache_dir=self.root_path / "model_cache", use_fast=False)
AutoTokenizer.from_pretrained(hparams["model_name"], cache_dir=self.root_path / "model_cache", use_fast=False)
self.embedder.train()
self.dropout = nn.Dropout(hparams["dropout"])

@@ -76,13 +76,13 @@ def forward(self, batch):
assert len(batch["attention_mask"].shape) == 2, "LM only take two-dimensional input"
assert len(batch["token_type_ids"].shape) == 2, "LM only take two-dimensional input"

batch["token_type_ids"] = None if "roberta" in self.hparams["model"] or "lm_finetuned" \
in self.hparams["model"] else batch["token_type_ids"]
batch["token_type_ids"] = None if "roberta" in self.hparams["model_name"] or "lm_finetuned" \
in self.hparams["model_name"] else batch["token_type_ids"]
results = self.embedder(input_ids=batch["input_ids"],
attention_mask=batch["attention_mask"],
token_type_ids=batch["token_type_ids"])

if 't5' in self.hparams["model"]:
if 't5' in self.hparams["model_name"]:
results = self.embedder(input_ids=batch["input_ids"],
decoder_input_ids=batch["input_ids"], )
